Network Working Group B. Lilly
Internet-Draft March 2005
Intended status: Standards Track
Expires: September 15, 2005
Indicating and Negotiating Text Script
draft-lilly-content-script-01
Status of this Memo
By submitting this Internet-Draft, the author represents that any
applicable patent or other IPR claims of which he is aware have been
or will be disclosed, and any of which he becomes aware will be
disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as
Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
Copyright Notice
Copyright © The Internet Society (2005).
Abstract
Some written text in some languages can be represented in multiple
scripts, or writing forms. This memo proposes mechanisms for
identification and negotiation of script for written text.
Lilly Expires September 15, 2005 [Page 1]
Internet-Draft Indicating and Negotiating Text Script March 2005
Table of Contents
1. Introduction
1.1. Script, Language, Charset, and Content
1.1.1. Script is Distinct from Language
1.1.2. Script is Related to Charset
1.1.3. Documents in a Single Language May Have Multiple Scripts
1.1.4. Documents in Multiple Languages May Use a Single Script
2. Requirement Levels
3. ABNF References
4. Header Fields
4.1. Indicating Script; the Content-Script Header Field
4.1.1. Semantics
4.1.2. ABNF
4.1.3. Usage
4.1.4. Header Field Registration Template
4.2. Script Negotiation; the Accept-Script Header Field
4.2.1. Semantics
4.2.2. ABNF
4.2.3. Semantic Details
4.2.4. Usage
4.2.5. Header Field Registration Templates
5. Media Feature Tag
5.1. Media Feature Tag Registration Template
6. Acknowledgments
7. Security Considerations
8. Internationalization Considerations
9. IANA Considerations
Appendix A. Examples
A.1. Script Indication
A.1.1. Simple Example
A.1.2. Multiple Alternatives
A.2. Script Negotiation
Appendix B. Change History
Normative References
Informative References
Author's Address
1. Introduction
Some written text in some languages can be represented in multiple
scripts, or writing forms. This memo proposes mechanisms for
identification and negotiation of script.
1.1. Script, Language, Charset, and Content
1.1.1. Script is Distinct from Language
Language is a characteristic of many forms of human communication.
For example, it applies to oral communication and to writing.
Script, however, applies only to a subset of communication forms.
Therefore, for purposes such as content negotiation, it is desirable
to indicate script separately from language.
1.1.2. Script is Related to Charset
Some charsets [I1.RFC2978] apply only to a single script. For
example, ANSI X3.4 applies only to Latin script, and KOI8 applies
only to Cyrillic script. In other cases, such as ISO 10646, script
can be inferred from the range of character codes used, provided one
has access to the content and is willing to analyze it.
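The kind of content analysis mentioned above can be sketched in Python. This is only a rough heuristic: it assumes that the first word of a character's Unicode name identifies its script, and the mapping table (NAME_PREFIX_TO_SCRIPT) and function name (infer_scripts) are illustrative names, not part of this memo. Only a few scripts are covered.

```python
import unicodedata

# Assumed mapping from Unicode character-name prefixes to ISO 15924 codes.
# Only a handful of scripts are covered; this is not a complete table.
NAME_PREFIX_TO_SCRIPT = {
    "LATIN": "Latn",
    "CYRILLIC": "Cyrl",
    "HIRAGANA": "Hira",
    "KATAKANA": "Kana",
    "GREEK": "Grek",
}


def infer_scripts(text):
    """Return the sorted ISO 15924 codes of the scripts detected in text."""
    found = set()
    for ch in text:
        if not ch.isalpha():
            continue  # digits, punctuation, and spaces carry no script here
        name = unicodedata.name(ch, "")
        prefix = name.split(" ", 1)[0]
        script = NAME_PREFIX_TO_SCRIPT.get(prefix)
        if script:
            found.add(script)
    return sorted(found)
```

Such analysis requires access to the full content, which is why an explicit indication in a header field can be cheaper for a recipient.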
1.1.3. Documents in a Single Language May Have Multiple Scripts
It is desirable to specify script separately from language, as
multiple scripts may be associated with a single language in a single
document or piece of text. It is not uncommon for text in Japanese,
for example, to contain a mix of Katakana and Hiragana, and some text
also contains Latin script for some words of foreign origin.
1.1.4. Documents in Multiple Languages May Use a Single Script
It is desirable to specify script separately from language, as a text
document written in a single script might contain multiple languages.
2. Requirement Levels
The key words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT",
"RECOMMENDED", and "MAY" in this document are to be interpreted as
described in [N1.BCP14].
3. ABNF References
ABNF in this document uses grammar productions defined in
[N2.RFC2234] and [N3.RFC2822].
4. Header Fields
4.1. Indicating Script; the Content-Script Header Field
4.1.1. Semantics
The Content-Script field indicates the script or scripts used in a
piece of content, and (in the case of composite media including
entire MIME messages) any enclosed media content.
4.1.2. ABNF
content-script = "Content-Script:" [CFWS] script-list [CFWS] CRLF
script-list = script *([CFWS] "," [CFWS] script)
script = 4ALPHA ; script tag per ISO 15924:2004; script tags
; are case-insensitive protocol elements
Note that there is no provision for linear whitespace or line-folding
between the field name tag (a case-insensitive protocol element
[N3.RFC2822], [I3.RFC1958]) and the colon separating the field name
from the field body. Generators MUST NOT insert linear whitespace or
line folding between the field name and the colon.
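As an illustration of the ABNF above, the following Python sketch parses a Content-Script field body into a list of script tags. It makes simplifying assumptions that are not in the grammar: the field has already been unfolded and contains no comments, so CFWS reduces to spaces and tabs. The function name parse_content_script is hypothetical.

```python
import re

# One script tag per the ABNF above: exactly four ASCII letters.
_SCRIPT = re.compile(r"[A-Za-z]{4}")


def parse_content_script(field_body):
    """Parse a Content-Script field body into a list of script tags.

    Simplifying assumptions: the field is already unfolded and contains
    no RFC 2822 comments, so CFWS reduces to plain spaces and tabs.
    """
    scripts = []
    for item in field_body.split(","):
        tag = item.strip(" \t")
        if not _SCRIPT.fullmatch(tag):
            raise ValueError(f"not a 4ALPHA script tag: {tag!r}")
        # Tags are case-insensitive; normalize to the initial-capital
        # form (for example "Latn") used in the ISO 15924 code lists.
        scripts.append(tag.capitalize())
    return scripts
```

Normalizing case on input is a design choice of this sketch; it lets later comparisons use simple string equality.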
4.1.3. Usage
4.1.3.1. When
A Content-Script field SHOULD be used to indicate script(s) for
non-trivial sequences of characters in human-readable text [I4.BCP18]
where script is not unique to the language in use.
It MAY be omitted for short texts where script may be determined from
the charset and character codes used, or where only a single script
is used for the language(s) applicable to the text.
It MAY be used for image data representing text, such as facsimile
image data.
It MUST NOT be used where no script is applicable, such as in audio
data of spoken language, or image or video media where no script is
applicable to the content.
It SHOULD NOT be used where visible text is merely incidental to the
content, as may be the case with some content using image, video, or
model media types.
4.1.3.2. Where
The Content-Script field MAY be used in the message header
[N3.RFC2822] of a MIME message [I5.RFC2045], in a MIME-part header
[I6.RFC2046], or in the header of a protocol which uses MIME header
fields to indicate content characteristics, such as HTTP/1.0
[I7.RFC1945] and HTTP/1.1 [I8.RFC2616].
The field MAY be used in the MIME-part header of a composite media
type [I6.RFC2046], if and only if it is equally applicable to each
part of the composite media type. When used with composite media
types, each component piece of content acquires the semantics
associated with the Content-Script field(s) in the enclosing
composite media type MIME-part headers, plus those of any
Content-Script fields in the MIME message header, plus those of any
Content-Script fields in that individual component media type's
MIME-part header. There is no mechanism to remove the semantics
associated with an enclosing composite media type; therefore, a script
code MUST NOT be specified in a Content-Script field in a composite
media type MIME-part header if the concept of script is not
applicable to some enclosed media type or if some enclosed media type
does not use that script.
Its use is RECOMMENDED with media type message/external-body as it
may help to reduce wasted resources that might otherwise be expended
on retrieval of unintelligible content.
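The accumulation rule for composite media types described above can be sketched with Python's standard email package. This is a minimal illustration under stated assumptions: field bodies contain no comments, and the helper names (_own_scripts, effective_scripts) are hypothetical. Each leaf part ends up with the union of the scripts named in its own header and in the headers of all enclosing entities.

```python
from email import message_from_string


def _own_scripts(part):
    """Collect script tags from every Content-Script field on one header."""
    tags = set()
    for body in part.get_all("Content-Script", []):
        tags.update(t.strip().capitalize() for t in body.split(",") if t.strip())
    return tags


def effective_scripts(part, inherited=frozenset()):
    """Yield (leaf_part, scripts) pairs, accumulating enclosing headers."""
    scripts = set(inherited) | _own_scripts(part)
    if part.is_multipart():
        # Composite type: every enclosed part inherits what was seen so far.
        for child in part.get_payload():
            yield from effective_scripts(child, scripts)
    else:
        yield part, scripts
```

Calling effective_scripts(message_from_string(raw)) on a parsed message walks the whole tree. Because scripts only accumulate on the way down, the sketch also shows why a script named on an enclosing header cannot be removed by an enclosed part.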
4.1.3.3. Who
The Content-Script field MAY be set by a message or content author or
a user agent acting on the author's behalf.
It MUST NOT be inserted, modified (except for non-protocol elements),
or deleted by submission, transport, or delivery agents
[I9.Crocker05].
It SHOULD, when present, be used by recipient user agents to assist
in presentation of human-readable content (presentation includes
display as well as text-to-speech conversion and similar
technologies).
4.1.3.4. How Many
It is RECOMMENDED that a single Content-Script field be used in the
header associated with a piece of content.
Multiple Content-Script fields MAY be used, and if present in a
single piece of content MUST be interpreted identically to a single
field listing all scripts listed in all Content-Script fields
applicable to the content.
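The equivalence between several Content-Script fields and a single combined field can be illustrated as follows. The function name merge_content_script_fields is hypothetical, and dropping repeated tags is an assumption of this sketch; the text above only requires that all listed scripts be included.

```python
def merge_content_script_fields(field_bodies):
    """Combine several Content-Script field bodies into one script list.

    Order of first appearance is kept and duplicates are dropped; tags
    are compared case-insensitively by normalizing their capitalization.
    """
    merged = []
    seen = set()
    for body in field_bodies:
        for item in body.split(","):
            tag = item.strip().capitalize()
            if tag and tag not in seen:
                seen.add(tag)
                merged.append(tag)
    return merged
```

Joining the result with ", " gives a field body equivalent to the original set of fields.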
4.1.4. Header Field Registration Template
[I10.BCP90] requires a registration template. The template is
provided in this section.
Header field name: Content-Script
Applicable protocol: mime
Status: standards track
Author/Change controller: IESG
Specification document(s): This document (when approved and an RFC
number assigned)
Related information: none
4.2. Script Negotiation; the Accept-Script Header Field
4.2.1. Semantics
The Accept-Script field indicates a set of preferences related to
script. See below for details of interpretations of preference
values.
4.2.2. ABNF
Note that there is no provision for linear whitespace or line-folding
between the field name tag (a case-insensitive protocol element
[N3.RFC2822], [I3.RFC1958]) and the colon separating the field name
from the field body. Generators MUST NOT insert linear whitespace or
line folding between the field name and the colon.
4.2.3. Semantic Details
Each script may have an associated preference value, indicated as a
decimal floating-point number with at most three decimal places. An
asterisk matches any script not explicitly listed. The default
preference value associated with a script or asterisk is 1. Scripts
with larger preference values are preferable to scripts with lower
preference values. A script SHOULD NOT be named more than once in an
Accept-Script field; if it is, however, the preference value
associated with the script is the last one presented with that script
in left-to-right order in the field body. If an Accept-Script field
is presented, any scripts not explicitly named have an implicit
preference value associated with an asterisk if one is presented in
the field; if there is no asterisk, the preference value for unnamed
scripts is implicitly zero. If no Accept-Script field is presented,
all scripts are to be presumed to be equally preferred.
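The preference rules above can be sketched in Python. The field syntax used here is an assumption inferred from the example in Appendix A.2 (comma-separated items, each a script tag or asterisk optionally followed by "; q = value", with parenthesized comments allowed); it is not taken from a grammar. Nested comments are not handled, and the function names are hypothetical.

```python
import re

_COMMENT = re.compile(r"\([^()]*\)")  # non-nested comments only
_ITEM = re.compile(
    r"\s*([A-Za-z]{4}|\*)\s*(?:;\s*[Qq]\s*=\s*(\d+(?:\.\d{1,3})?)\s*)?"
)


def parse_accept_script(field_body):
    """Return (prefs, default): named-script preferences and the fallback."""
    prefs = {}
    default = 0.0  # no asterisk: unnamed scripts are implicitly zero
    text = _COMMENT.sub("", field_body)
    for raw in text.split(","):
        match = _ITEM.fullmatch(raw)
        if not match:
            raise ValueError(f"unrecognized item: {raw.strip()!r}")
        tag, q = match.groups()
        value = float(q) if q is not None else 1.0  # default preference is 1
        if tag == "*":
            default = value
        else:
            prefs[tag.capitalize()] = value  # a later mention overrides
    return prefs, default


def preference(script, parsed):
    """Look up one script; parsed is None when no field was presented."""
    if parsed is None:
        return 1.0  # all scripts are presumed equally preferred
    prefs, default = parsed
    return prefs.get(script.capitalize(), default)
```

Keeping the asterisk value separate from the named scripts makes the three cases explicit: a named script, an unnamed script with an asterisk present, and an unnamed script without one.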
4.2.4. Usage
4.2.4.1. When
An Accept-Script field MAY be used to indicate script preferences
where a suitable negotiation method, such as [I11.RFC2295], is
available, the requester has a preference, and script is
potentially relevant to one or more media types under consideration.
It SHOULD NOT be used if any of those conditions is not met.
4.2.4.2. Where
Usage of an Accept-Script field is dictated by the negotiation
protocol and is outside of the scope of this document.
4.2.4.3. Who
The Accept-Script field MAY be set by a message or content requester
or a user agent acting on the requester's behalf.
It MUST NOT be inserted, modified (except for non-protocol elements),
or deleted by transport protocols.
It SHOULD, when present, be used by content-serving protocols to
supply preferred content to requesters when content in multiple
scripts otherwise meeting requests is available. This memo does not
address how content-serving protocols should balance preferences for
multiple characteristics of requested content; that is left to
content-serving protocol specifications and/or implementations.
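One way a content server might apply script preferences alone is sketched below. Everything beyond the preference values themselves is an assumption of this sketch: a variant is scored by its least preferred script, a score of zero is treated as unacceptable, ties go to the first variant encountered, and every variant lists at least one script. The function name and the variant names used with it are hypothetical.

```python
def choose_variant(variants, prefs, default):
    """Pick the most preferred variant, or None if none is acceptable.

    variants maps a variant name to the non-empty list of scripts it uses.
    prefs maps script tags to preference values; default applies to any
    script that prefs does not name.
    """
    best_name, best_score = None, 0.0
    for name, scripts in variants.items():
        # Assumption: a variant is only as good as its least preferred script.
        score = min(prefs.get(s, default) for s in scripts)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

A real server would combine this score with preferences for language, charset, and media type in a protocol-specific way, which this sketch does not attempt.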
4.2.4.4. How Many
At most one Accept-Script field may be presented.
4.2.5. Header Field Registration Templates
[I10.BCP90] requires separate templates for different "protocols".
Since the Accept-Script field is not a MIME field, and may be used by
a number of protocols which support content negotiation, templates
are provided in this section for such protocols using header fields
known at the time of writing.
4.2.5.1. HTTP Header Field Registration Templates
There are two versions of the Hypertext Transfer Protocol (HTTP):
HTTP/1.0 [I7.RFC1945] and HTTP/1.1 [I8.RFC2616]. The registration
templates for those protocols are provided in this section.
4.2.5.1.1. HTTP/1.0 template
Header field name: Accept-Script
Applicable protocol: [I7.RFC1945]
Status: informational
Author/Change controller: IESG
Specification document(s): This document (when approved and an RFC
number assigned)
Related information: none
4.2.5.1.2. HTTP/1.1 template
Header field name: Accept-Script
Applicable protocol: http
Status: standards track
Author/Change controller: IESG
Specification document(s): This document (when approved and an RFC
number assigned)
Related information: none
4.2.5.2. RFC 2295 protocol template
Header field name: Accept-Script
Applicable protocol: RFC 2295 [I11.RFC2295]
Status: experimental
Author/Change controller: IESG
Specification document(s): This document (when approved and an RFC
number assigned)
Related information: none
4.2.5.3. HTCPCP template
Header field name: Accept-Script
Applicable protocol: RFC 2324 [I12.RFC2324]
Status: informational
Author/Change controller: IESG
Specification document(s): This document (when approved and an RFC
number assigned)
Related information: none
5. Media Feature Tag
[I13.BCP31] provides a registration template for registration of
media feature tags. Media feature tags may be used for content
negotiation such as in Content-Alternative, Content-Features, and
Media-Accept-Features fields [I14.RFC2912], [I15.RFC3297],
[I16.RFC2533], [I17.RFC2738]. The media feature tag registration
appears below.
5.1. Media Feature Tag Registration Template
Media feature tag name: script
Summary of the media feature indicated by this feature tag:
Indication of script(s) used in a text document using ISO standard
script name tags
Values appropriate for use with this feature tag:
[ ] 1. The feature tag is Boolean and may have values of TRUE or
FALSE. A value of TRUE indicates an available capability. A
value of FALSE indicates the capability is not available.
[X] 2. The feature has an associated numeric or enumerated value.
[ ] 2a. Signed Integer
[ ] 2b. Rational number
[ ] 2c. Token (equality relationship)
[ ] 2d. Token (ordered)
[ ] 2e. String (equality relationship)
[X] 2f. String (defined comparison) Comparison is as
case-insensitive strings. Strings are compared for equality
only (no ordering). The special value "*" matches any
script.
The feature tag is intended primarily for use in the following
applications, protocols, services, or negotiation mechanisms: MIME
Examples of typical use: script=Latn
Related standards or documents: [N4.ISO15924], [I2.15924Lists]
Considerations particular to use in individual applications,
protocols, services, or negotiation mechanisms: none
Interoperability considerations: Applications developed prior to
registration of this tag cannot be expected to recognize the tag.
Such applications will be unable to participate in script content
negotiation.
Security considerations:
Privacy concerns, related to exposure of personal information:
While script may identify an author as belonging to an ethnic
group, and that information might be abused, script information
can be determined from content. Negotiation of script may
reveal a preference for script, and that information also has
potential for abuse.
Denial of service concerns related to consequences of specifying
incorrect values: none known.
Other: none known.
Additional information: none
Keywords: none
Related feature tags: charset, language
Related media types or data formats: all subtypes of the text
media type.
Related markup tags: none known
Name(s) & email address(es) of person(s) to contact for further
information:
Bruce Lilly
blilly@erols.com
Intended usage: COMMON
Author/Change controller: IESG
Requested IANA publication delay: none
Other information: none
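The comparison rule stated in item 2f of the template above (case-insensitive equality, with "*" matching any script) can be illustrated with a short Python sketch. Allowing "*" on either side of the comparison is an assumption of this sketch, and the function name is hypothetical.

```python
def script_feature_matches(requested, offered):
    """Compare two values of the "script" media feature tag.

    Equality is case-insensitive and there is no ordering; the special
    value "*" on either side matches any script.
    """
    if requested == "*" or offered == "*":
        return True
    return requested.lower() == offered.lower()
```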
6. Acknowledgments
The author gratefully acknowledges discussions on this topic which
took place in December 2004 and January 2005 on the IETF discussion
mailing list.
7. Security Considerations
While script may identify an author as belonging to an ethnic group,
and that information might be abused, script information can be
determined from content, as noted in section 1.1.2. Negotiation of
script may reveal a preference for script, and that information also
has potential for abuse.
8. Internationalization Considerations
This memo raises no new internationalization considerations.
9. IANA Considerations
IANA shall register the header field names defined in this document
(on approval by the IESG) in the permanent header field registry.
IANA shall register the media feature tag defined in this document
(on approval by the IESG) in the IETF tree of the media feature tag
registry.
Appendix A. Examples
A.1. Script Indication
A.1.1. Simple Example
MIME-Version: 1.0
Content-Type: text/plain ; charset=iso-2022-jp-2
Content-Language: ja
Content-Script: Hira, Kana
<Japanese language text in a mix of Katakana and Hiragana,
ISO 2022-JP-2 charset goes here>
A.1.2. Multiple Alternatives
MIME-Version: 1.0
Content-Type: multipart/alternative ; boundary=next
Content-Language: ja
--next
Content-Type: text/plain ; charset=iso-2022-jp-2
Content-Script: Kana
<Japanese language text in Katakana, ISO 2022-JP-2 charset goes here>
--next
Content-Type: text/plain ; charset=iso-2022-jp-2
Content-Script: Hira
<Japanese language text in Hiragana, ISO 2022-JP-2 charset goes here>
--next--
A.2. Script Negotiation
Accept-Script: Latn ; q = (foo) 1, Cyrl ; q = 0.5, * ; q = 0.001
The example expresses a strong preference for Latin script, followed
in preference by Cyrillic script, but accepting any script with a low
but non-zero preference value.
Appendix B. Change History
[[This change history will not be part of a published RFC]]
-00 to -01
• added this change history
• fixed ABNF bug in script production
• reformatted ABNF
• added media feature tag description and registration template;
revised title accordingly
Normative References
[N1.BCP14] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997.
[N2.RFC2234] Crocker, D. and P. Overell, "Augmented BNF for Syntax
Specifications: ABNF", RFC 2234, November 1997.
[N3.RFC2822] Resnick, P., "Internet Message Format", RFC 2822, April
2001.
[N4.ISO15924] International Organization for Standardization (ISO),
"ISO 15924:2004 -- Codes for the representation of
names of scripts", March 2003.
Informative References
[I1.RFC2978] Freed, N. and J. Postel, "IANA Charset Registration
Procedures", BCP 19, RFC 2978, October 2000.
[I2.15924Lists] ISO has designated The Unicode Consortium as the
ISO 15924 Registration Authority. Lists of ISO 15924
codes may be obtained free of charge from
http://www.unicode.org/iso15924/codelists.html
[I3.RFC1958] Carpenter, B., "Architectural Principles of the
Internet", RFC 1958, June 1996.
[I4.BCP18] Alvestrand, H., "IETF Policy on Character Sets and
Languages", BCP 18, RFC 2277, January 1998.
[I5.RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet
Mail Extensions (MIME) Part One: Format of Internet
Message Bodies", RFC 2045, November 1996.
[I6.RFC2046] Freed, N. and N. Borenstein, "Multipurpose Internet
Mail Extensions (MIME) Part Two: Media Types",
RFC 2046, November 1996.
[I7.RFC1945] Berners-Lee, T., Fielding, R., and H. Frystyk,
"Hypertext Transfer Protocol -- HTTP/1.0", RFC 1945,
May 1996.
[I8.RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H.,
Masinter, L., Leach, P., and T. Berners-Lee,
"Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616,
June 1999.
[I9.Crocker05] Crocker, D., "Internet Mail Architecture", Work in
progress (February 2005).
[I10.BCP90] Klyne, G., Nottingham, M., and J. Mogul,
"Registration Procedures for Message Header Fields",
BCP 90, RFC 3864, September 2004.
[I11.RFC2295] Holtman, K. and A. Mutz, "Transparent Content
Negotiation in HTTP", RFC 2295, March 1998.
[I12.RFC2324] Masinter, L., "Hyper Text Coffee Pot Control Protocol
(HTCPCP/1.0)", RFC 2324, April 1998.
[I13.BCP31] Holtman, K., Mutz, A., and T. Hardie, "Media Feature
Tag Registration Procedure", BCP 31, RFC 2506, March
1999.
[I14.RFC2912] Klyne, G., "Indicating Media Features for MIME
Content", RFC 2912, September 2000.
[I15.RFC3297] Klyne, G., Iwazaki, R., and D. Crocker, "Content
Negotiation for Messaging Services based on Email",
RFC 3297, July 2002.
[I16.RFC2533] Klyne, G., "A Syntax for Describing Media Feature
Sets", RFC 2533, March 1999.
[I17.RFC2738] Klyne, G., "Corrections to "A Syntax for Describing
Media Feature Sets"", RFC 2738, December 1999.
Author's Address
Bruce Lilly
Email: blilly@erols.com
Full Copyright Statement
Copyright © The Internet Society (2005).
This document is subject to the rights, licenses and restrictions
contained in BCP 78, and except as set forth therein, the author
retains all his rights.
This document and the information contained herein are provided on an
"AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE REPRESENTS OR
IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in
this document or the extent to which any license under such rights
might or might not be available; nor does it represent that it has
made any independent effort to identify any such rights. Information
on the procedures with respect to rights in RFC documents can be
found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Acknowledgment
Funding for the RFC Editor function is currently provided by the
Internet Society.