Description: Notes for header generators. The goal is interoperability.
0. General
0.1 CRLF is the rule for line termination. No lone CR or lone LF (newline) character should appear anywhere in a message, header or body. If you "yahoo", you're breaking the rules. For atonement, you are required to read Jonathan Swift's "Gulliver's Travels".
1. Received header fields
1.0 General considerations
1.0.1 Various RFCs permit different types of whitespace, line folding, and/or comments in various places. The conservative approach is safest: use a single space character to separate elements, and avoid comments unless absolutely necessary. If line folding is required, implement it as a CRLF pair followed by a space character (not a tab). See RFC 821 section 4.1.2 for the location of space characters between elements.
1.1 from domain
1.1.1 The domain must be valid. It should be determined from the connection information (i.e. the TCP source address), not from the SMTP EHLO or HELO parameter. The exception is when that parameter is a fully-qualified domain name whose A RR maps to the TCP source IP address. Less reliable information (e.g. the HELO parameter) should be placed in a comment if it is desired to preserve it for tracing. Note, however, that RFC 821 makes no provision for comments.
1.2 by domain
1.2.1 The domain must be a fully-qualified domain name. To avoid confusion when tracing, use the host's A record name, not a CNAME or other alias.
1.3 via link
1.3.1 Include this only if the message is handled by a UUCP gateway ("uucp" is the only valid parameter per RFC 1700). In particular, "via tcp" should not be used.
1.4 with protocol
1.4.1 The only valid parameters are "smtp" and "esmtp". This clause may be included to indicate whether the client initiated the session with EHLO (esmtp) or HELO (smtp). In particular, cruft such as "with Internet Mail Service" or "with Microsoft SMTPSVC" is illegal and must not be used. Use at most one with clause.
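The clause ordering and spacing rules in 1.0.1 through 1.4.1 can be illustrated with a short Python sketch. This is not a complete Received header generator. The helper names `received_clauses` and `fold` are hypothetical. The caller is assumed to have already derived the from domain from the TCP source address (1.1.1) and to know whether the client sent EHLO. The 78-character line limit is an assumption taken from the recommendation in RFC 2822, section 2.1.1.

```python
from typing import List


def received_clauses(from_domain: str, by_domain: str, used_ehlo: bool,
                     via_uucp: bool = False) -> List[str]:
    """Return the from/by/via/with clauses of a Received header field.

    Hypothetical helper. from_domain is assumed to come from the TCP
    source address (1.1.1); by_domain is this host's A record name (1.2.1).
    """
    clauses = [f"from {from_domain}", f"by {by_domain}"]
    if via_uucp:
        # 1.3.1: only "uucp" is valid; never emit "via tcp".
        clauses.append("via uucp")
    # 1.4.1: a single with clause, "esmtp" after EHLO, "smtp" after HELO.
    clauses.append("with esmtp" if used_ehlo else "with smtp")
    # 1.5.1 and 1.6.1: the optional id and for clauses are deliberately omitted.
    return clauses


def fold(elements: List[str], limit: int = 78) -> str:
    """Join elements with single spaces (1.0.1), folding as CRLF + SP.

    A fold is inserted before an element only when adding it would push the
    current line past `limit` characters; an element is never split.
    """
    lines = [elements[0]]
    for element in elements[1:]:
        if len(lines[-1]) + 1 + len(element) > limit:
            # The continuation line begins with a space character, not a tab.
            lines.append(" " + element)
        else:
            lines[-1] += " " + element
    return "\r\n".join(lines)


if __name__ == "__main__":
    header = fold(["Received:"]
                  + received_clauses("client.example.com", "mx.example.net",
                                     used_ehlo=True))
    print(header)  # Received: from client.example.com by mx.example.net with esmtp
```

You can pass ";" and the date-time stamp to `fold` as further elements. This produces the space before the semicolon that 1.7.1 asks for. Keeping the whole stamp as one element also prevents a fold from landing inside the time.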
1.5 id
1.5.1 Avoid it. RFCs 821 and 822 gave mutually exclusive syntax for the parameter, so if you include it (it is optional) you will violate at least one RFC. RFC 2821 added yet another variant, which conflicts with RFCs 821, 822, and 2822.
1.6 for
1.6.1 Avoid it (it's optional). RFCs 821 and 822 gave mutually exclusive syntax for the parameter.
1.7 the semicolon
1.7.1 Make sure that there is a space character before the semicolon. RFC 821 specifies this as a trailing space character after each of domain, link, protocol, etc. RFC 2821 specifies it as trailing CFWS. However it is implemented, make sure that the space is there; don't run the semicolon up against the preceding element.
1.8 date-time stamp
1.8.1 It is mandatory. RFC 821 introduced the Received header field, referring to it as a "time stamp line". A time stamp line with no time stamp is an oxymoron.
1.8.2 Do not include the day-of-week name. It is optional in RFCs 822, 2821, and 2822, but the RFC 821 syntax for the Received header field does not permit it.
1.8.3 Use only the official 3-letter English month abbreviations.
1.8.4 Use 4-digit year numbers. RFC 822 erroneously specified 2 digits (RFC 733 and earlier permitted 4 digits). RFC 1123 corrected this when it amended the RFC 822 specification.
1.8.5 Make sure that there's a space between the year number and the hour (see 1.0.1).
1.8.6 Specify the time with two digits each for hours, minutes, and seconds, with colons separating the fields. There should not be any whitespace or comments within the time.
1.8.7 Using UTC rather than local time avoids problems such as Daylight Saving Time (or equivalent) and users who set the wrong local time zone. This presumes that the operating system uses UTC internally, which is the only sane thing to do anyway. When the stamp is in UTC, specify the zone as -0000 unless the host's local time really is UTC, in which case +0000 is appropriate (see RFC 2822, section 3.3).
Use of UTC also makes comparisons of time stamps trivial. You might be surprised how many times the computation of UTC from local time and offset, or vice versa, has been botched; even RFC 822 got it wrong for the one-letter military zones. The ANSI/ISO C function gmtime() may be used to obtain UTC on machines supporting an ANSI/ISO C environment.
1.8.8 Use only numeric offsets for the zone. There should be no whitespace, line folding, or comments between the '+' or '-' and the 4-digit offset. If local time is used (see 1.8.7), the offset should be the official offset for the local time zone. For example, -0801 is nonsense, because no time zone has an offset that is not a multiple of 15 minutes. The zone is not optional.
1.8.9 Include seconds. They are optional in RFCs 822, 2821, and 2822, but the RFC 821 syntax requires them.
1.8.10 End the Received header field immediately after the zone. Resist the urge to put a comment after the zone; the RFC 821 syntax does not permit it.
2. Date, Resent-Date, Expires header fields
2.1 See 1.8.3 through 1.8.8.
3. From
3.1 Use a single mailbox specification (RFC 1036 does not permit multiple mailboxes). If the message has more than one author, pick one, and note the existence of the others in a Comments header field.
3.2 Use exactly one From header field. One is required; more than one is forbidden.
3.3 The mailbox should not have a comment associated with it. If you feel the urge to comment, use a Comments header field. See RFC 2822, section 3.4.
3.4 The angle-bracketed mailbox form is slightly easier to parse than the form without the brackets, because the brackets clearly delimit the mailbox. The angle-bracketed form also provides for a "display name" associated with the mailbox. RFC 1036 requires a display name if the angle-bracketed mailbox form is used.
Therefore, if there is a display name associated with the mailbox, use it with the angle-bracketed mailbox format. Make sure that the display name is a quoted string if it contains any special characters (e.g. '.'). Otherwise, use the unbracketed format.
4. Sender, To, Cc, Bcc, Reply-To, Resent-From, etc.
4.1 See 3.1 and 3.2 for Resent-From.
4.2 See 3.3 and 3.4 for the mailbox specification. To, Cc, Bcc, Reply-To, and the Resent- variations of those also permit the more general "address" specification, which is either a mailbox or a group (see 4.3).
4.3 An address is a mailbox (see 3.3 and 3.4) or a group. A group consists of a display name followed by a colon, an optional comma-separated list of mailboxes, and a terminating semicolon. The group's display name is mandatory, not optional (RFC 1036, and RFC 822 prior to its amendment by RFC 1123 section 5.2.15). It must be quoted if it contains any special characters. Comments should not be used with addresses. If there are no mailboxes (i.e. an empty group), put a space character between the colon and the semicolon.
5. Message-ID
5.1 Use it. It is required for Usenet messages (RFC 1036), and it is the source for References and In-Reply-To. The id is an angle-bracketed construct. Inside the angle brackets are, in order, a local-part that contains no special characters, an '@', and a fully-qualified domain name. The local-part should not contain a slash character ('/').
5.2 The Message-ID identifier is required to be globally unique. Using a fully-qualified domain name (corresponding to an A record) meets part of that requirement. The local-part must also be unique for the host named by that fully-qualified domain name, and any suitable mechanism can achieve this. One that works well is to construct the local-part from the date, the time, and some other identifier (e.g. a process identifier number) that ensures uniqueness for that date and time on that host.
For example, process 1234 on host baz.foo.bar.com, generating a message on 28 July 2001 at 12:25:57, would produce <20010728122557.1234@baz.foo.bar.com>. Note that the date and time use fixed-width fields, ordered from most significant (year) to least significant (second).
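The date-time rules in 1.8.2 through 1.8.9 can be illustrated with a short Python sketch built on `time.gmtime()`, Python's counterpart of the C gmtime() mentioned in 1.8.7. By 2.1, the same rules also cover the Date, Resent-Date, and Expires fields. The function name `rfc821_date_time` and the `local_time_is_utc` flag are hypothetical. The day of the month is written without a leading zero, which the RFC 821 day-of-month syntax (one or two digits) allows.

```python
import time
from typing import Optional

# 1.8.3: fixed English abbreviations; strftime("%b") would follow the locale.
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def rfc821_date_time(epoch_seconds: Optional[float] = None,
                     local_time_is_utc: bool = False) -> str:
    """Format a UTC date-time stamp following 1.8.2 through 1.8.9.

    Hypothetical helper. epoch_seconds defaults to the current time.
    """
    t = time.gmtime(epoch_seconds)  # UTC, as recommended in 1.8.7
    # 1.8.7: "+0000" only when local time really is UTC, otherwise "-0000".
    zone = "+0000" if local_time_is_utc else "-0000"
    # No day-of-week name (1.8.2), 4-digit year (1.8.4), single spaces
    # (1.8.5), two-digit hh:mm:ss (1.8.6, 1.8.9), numeric zone (1.8.8).
    return (f"{t.tm_mday} {MONTHS[t.tm_mon - 1]} {t.tm_year:04d} "
            f"{t.tm_hour:02d}:{t.tm_min:02d}:{t.tm_sec:02d} {zone}")


if __name__ == "__main__":
    print(rfc821_date_time(996323157))  # 28 Jul 2001 12:25:57 -0000
```

Formatting the fields by hand rather than with a locale-dependent strftime() format keeps the month names in English regardless of the host's locale settings.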
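The display-name rules in 3.4 and 4.3 can be sketched as follows. The helper names are hypothetical. The set of special characters is the RFC 822 `specials` list. Quoting escapes backslash and double quote with a backslash, as a quoted-string requires. The addr-spec is assumed to be valid already and is not checked. Display names containing CR, LF, or non-ASCII characters are outside the scope of this sketch.

```python
from typing import List, Optional

# RFC 822 "specials"; a display name containing any of them must be quoted.
SPECIALS = set('()<>@,;:\\".[]')


def quote_display_name(name: str) -> str:
    """Return name unchanged, or as a quoted-string if it contains specials."""
    if any(ch in SPECIALS for ch in name):
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return name


def format_mailbox(addr_spec: str, display_name: Optional[str] = None) -> str:
    """3.4: angle brackets only when there is a display name; no comments."""
    if display_name:
        return f"{quote_display_name(display_name)} <{addr_spec}>"
    return addr_spec


def format_group(display_name: str, mailboxes: List[str]) -> str:
    """4.3: display name, colon, optional mailbox list, semicolon.

    An empty group yields "name: ;", keeping a space between colon and semicolon.
    """
    if not display_name:
        raise ValueError("a group requires a display name")
    return f"{quote_display_name(display_name)}: {', '.join(mailboxes)};"
```

For instance, `format_mailbox("jqp@example.com", "John Q. Public")` gives `"John Q. Public" <jqp@example.com>`, because the '.' forces quoting. `format_group("Undisclosed recipients", [])` gives `Undisclosed recipients: ;`.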
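The Message-ID construction described in 5.2 can be sketched in Python. The function name `make_message_id` is hypothetical. The sketch assumes the date and time are taken in UTC, consistent with 1.8.7; the example in 5.2 does not state this explicitly.

```python
import os
import time
from typing import Optional


def make_message_id(fqdn: str, epoch_seconds: Optional[float] = None,
                    pid: Optional[int] = None) -> str:
    """Build <YYYYMMDDhhmmss.pid@fqdn> as described in 5.2 (hypothetical helper).

    The date and time are taken in UTC and written as fixed-width fields,
    most significant first; pid defaults to the current process ID.
    """
    stamp = time.strftime("%Y%m%d%H%M%S", time.gmtime(epoch_seconds))
    if pid is None:
        pid = os.getpid()
    # The local-part is digits plus one '.' separator, so it has no '/' (5.1).
    return f"<{stamp}.{pid}@{fqdn}>"
```

`make_message_id("baz.foo.bar.com", 996323157, 1234)` returns `<20010728122557.1234@baz.foo.bar.com>`, matching the example in 5.2. A process that generates more than one message within the same second would repeat an id. Such a process would need an additional counter in the local-part to keep each identifier unique.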