Description: overview of mparse, its installation and use

mparse READ_ME.txt version 0.106 2006/08/06 00:56:16

Introduction:

Mparse is a validating parser and generator of RFC 822 / 2822 / MIME text messages built as a library archive based on a lexical analyzer and parser derived from the RFC BNF specifications. The library provides for calling supplied functions when message constructs are parsed and can provide warning and error messages interspersed with the message header fields. It can serve as the skeleton for applications which need to be able to process text messages (mail and news user agents, submission agents, transfer agents, delivery agents, spam filters, cancelbots, etc.). Additional library functions provide for generating standards-compliant messages and message components, such as dates, message-identifiers, and domain names and literals.

A program built around the library is provided which will parse messages and report syntactic errors, deprecated constructs, and semantic errors per relevant RFCs. This program can be used as an aid to authors (and editors) who use examples in their texts to ensure that the examples comply with the relevant RFCs. It can also be used to examine messages generated or modified by user agents and transfer agents, reporting on illegal constructs or those which may lead to interoperability problems.

A second program generates a simple RFC-compliant message. A third program generates message fields from supplied arguments, possibly affected by environment variables.

Source code is also included for a few additional sample programs which demonstrate how the library can be used.

This software is OSI Certified Open Source Software. OSI Certified is a certification mark of the Open Source Initiative. The software is distributed under the zlib/libpng license; see the file LICENSE included with the distribution.
Prerequisites:

To rebuild after modifying the grammar or lexical analyzer files, you will need GNU bison 1.33 or later and flex 2.5.31 (beta) or later. To build from the source without modifying the grammar or lexical analyzer, these are not required, as the corresponding C source files are provided. You will of course need a C compiler and something resembling a POSIX-compliant operating system.

To maintain the time zone, charset, country, language code information, etc., obtain the latest data from sources referenced in the corresponding files. If you have wget, this will happen automatically when you run make. You'll also need tar, gperf 3.02 or later, and awk. Data relevant to mparse is not expected to change often, although MIME media types seem to be added rather often.

Configuration:

Edit the lines near the top of the makefile to configure for local conditions. See the comments in the makefile for details.

When you have finished editing the makefile, run make. You should end up with an executable named mparse. If you get link errors referencing strcopy or strcasecmp or strncasecmp, go back and reread the comments in the makefile and try again.

The executable "hooktest" can be built to test the hooks called when message components are encountered. Hooktest is also used for regression testing, which can be invoked by making the target "regression".

The mparse library (or libraries if you have libtool for building shared libraries) and relevant header file can be installed by make install.

To build an application using the parser, write application level functions to process the message header and body components. The parser will call the functions when the corresponding message component is encountered. Pointers to the functions are supplied to the parser via the "mparse_hooks" structure which is pointed to by a pointer in the "mparse_message" structure which is pointed to by an argument passed to the parser.
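The callback arrangement described above can be pictured with a small self-contained sketch. Every name in it (demo_hooks, demo_message, demo_parse, the hook fields, and the counting functions) is hypothetical and chosen only for illustration; the real mparse_hooks and mparse_message declarations are in the installed header file and the man page, and they will differ from this toy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical hook table: one function pointer per construct of interest.
 * These are NOT the fields of the real mparse_hooks structure. */
struct demo_hooks {
    void (*on_field)(const char *name, const char *body, void *ctx);
    void (*on_body_line)(const char *line, void *ctx);
};

/* Hypothetical message object; it points at the hook table. */
struct demo_message {
    const struct demo_hooks *hooks;
    void *ctx;                      /* application data passed to each hook */
};

/* Toy stand-in for a parser: walks already-split lines and calls the hooks.
 * Header lines come first; an empty line starts the body. */
void demo_parse(const char *const *lines, size_t n, struct demo_message *msg)
{
    int in_body = 0;

    for (size_t i = 0; i < n; i++) {
        const char *line = lines[i];

        if (!in_body && line[0] == '\0') {
            in_body = 1;
            continue;
        }
        if (in_body) {
            if (msg->hooks && msg->hooks->on_body_line)
                msg->hooks->on_body_line(line, msg->ctx);
            continue;
        }

        const char *colon = strchr(line, ':');
        if (colon == NULL)
            continue;               /* toy parser: skip malformed lines */

        char name[64];
        size_t len = (size_t)(colon - line);
        if (len >= sizeof name)
            len = sizeof name - 1;
        memcpy(name, line, len);
        name[len] = '\0';

        const char *body = colon + 1;
        while (*body == ' ')
            body++;
        if (msg->hooks && msg->hooks->on_field)
            msg->hooks->on_field(name, body, msg->ctx);
    }
}

/* Example application-level functions: count what the parser reports. */
struct demo_counts { int fields; int body_lines; };

void count_field(const char *name, const char *body, void *ctx)
{
    (void)name; (void)body;
    ((struct demo_counts *)ctx)->fields++;
}

void count_body_line(const char *line, void *ctx)
{
    (void)line;
    ((struct demo_counts *)ctx)->body_lines++;
}
```

An application would fill a demo_hooks with its own functions, point a demo_message at it, and pass that to demo_parse. The same shape (hook table reached through the message object) is what the paragraph above describes for the real library; mparse.c shows the actual calls.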
See mparse.c for an example of how to configure run-time options and how to call the parser.

As distributed, a basic application which can report various types of syntax and compatibility errors and warnings, and which can extract sections from multipart MIME messages is built:

Usage: mparse [options] [files]
  options:
    [-a] report errors relative to all RFCs
    [-B] byte-stuff input (prefix body lines beginning with '.' with '.')
    [-b] remove byte-stuffing (strip '.' from start of body lines)
    [-D[g][l][e][t][h][n][p][r][s]] debug grammar, lexical analysis, storage associated with errors, tokens, fields, entity, mime parameter structures, protocol status structures, and/or lists
    [-d] report errors relative to the DNS RFCs (1034 & 1035)
    [-g] field generation rules (no obsolete forms)
    [-h] fields only; don't parse or echo body
    [-m] report errors relative to the MIME RFCs (2045-2049, 2231)
    [-n] suppress errors
    [-o] only report errors (suppress warnings)
    [-q] (quiet) suppress echoing
    [-R] repair errors if possible
    [-r] report errors relative to the Host Requirements RFC (1123)
    [-s] report errors relative to the current SMTP RFC (2821)
    [-t] report errors relative to the current Text Message RFC (2822)
    [-u] report errors relative to the Usenet RFC (1036)
    [-v] set primary processing context to indicate message validation
    [-w] count warnings as errors in exit status
    [-X] support experimental and private-use names
    [-x N] where N is a number, exclude errors relative to RFC N
    [-N] where N is a number, report errors relative to RFC N
    [-S rfc] report status and extended status for transport protocol RFC rfc
  option letters except D and x (but not numbers) may be combined, e.g.
    -must -822
  recognized RFC numbers include:
    724 Proposed Official Standard for the Format of ARPA Network Messages (obsoleted by 733)
    733 STANDARD FOR THE FORMAT OF ARPA NETWORK TEXT MESSAGES (obsoleted by 822)
    765 FILE TRANSFER PROTOCOL (obsolete)
    772 MAIL TRANSFER PROTOCOL
    780 MAIL TRANSFER PROTOCOL
    788 Simple Mail Transfer Protocol (obsoleted by 821)
    821 Simple Mail Transfer Protocol (obsoleted by 2821)
    822 Standard for the format of ARPA Internet text messages (obsoleted by 2822)
    850 Standard for interchange of USENET messages (obsoleted by RFC 1036)
    977 Network News Transfer Protocol
    987 Mapping Between X.400 and RFC 822
    1026 Addendum to RFC 987 (Mapping between X.400 and RFC-822)
    1034 Domain names - concepts and facilities
    1035 Domain names - implementation and specification
    1036 Standard for interchange of USENET messages
    1049 A CONTENT-TYPE HEADER FIELD FOR INTERNET MESSAGES
    1123 Requirements for Internet Hosts - Application and Support
    1204 Message Posting Protocol (MPP)
    1505 Encoding Header Field for Internet Messages
    1847 Security Multiparts for MIME: Multipart/Signed and Multipart/Encrypted
    1864 The Content-MD5 Header Field
    1958 Architectural Principles of the Internet
    2017 Definition of the URL MIME External-Body Access-Type
    2033 Local Mail Transfer Protocol
    2045 Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies
    2046 Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types
    2047 Multipurpose Internet Mail Extensions (MIME) Part Three: Message Header Extensions for Non-ASCII Text
    2049 Multipurpose Internet Mail Extensions (MIME) Part Five: Conformance Criteria and Examples
    2156 MIXER (Mime Internet X.400 Enhanced Relay): Mapping between X.400 and RFC 822/MIME
    2476 Message Submission
    2503 MIME Types for Use with the ISO ILL Protocol
    2231 MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations
    2298 An Extensible Message Format for Message Disposition Notifications
    2311 S/MIME Version 2 Message Specification
    2530 Media Features using DSN and MDN
    2533 A Syntax for Describing Media Feature Sets
    2557 MIME Encapsulation of Aggregate Documents, such as HTML (MHTML)
    2586 The Audio/L16 MIME content type
    2616 Hypertext Transfer Protocol -- HTTP/1.1
    2633 S/MIME Version 3 Message Specification
    2652 MIME Object Definitions for the Common Indexing Protocol (CIP)
    2660 The Secure HyperText Transfer Protocol
    2821 Simple Mail Transfer Protocol
    2822 Internet Message Format
    2919 List-Id: A Structured Field and Namespace for the Identification of Mailing Lists
    3066 Tags for the Identification of Languages
    3156 MIME Security with OpenPGP
    3297 Content Negotiation for Messaging Services
    3335 MIME-based Secure Peer-to-Peer Business Data Interchange over the Internet
    3458 Message Context for Internet Mail
    3462 The Multipart/Report Content Type for the Reporting of Mail System Administrative Messages
    3464 An Extensible Message Format for Delivery Status Notifications
    3692 Assigning Experimental and Testing Numbers Considered Useful
    3798 An Extensible Message Format for Message Disposition Notifications
    4194 The S Hexdump format
    4409 Message Submission for Mail
  in addition, -0 can be specified to report "common sense" errors

If no files are specified on the command line, standard input is read. If multiple files are specified, the messages will be concatenated on stdout, unless the -q (quiet) option is specified, in which case only the exit status provides information about the message(s) (number of errors, up to 255).

Note that for best results, input should be in RFC 822/2822/2821 format, i.e. lines end with an old-fashioned mechanical teletype "carriage return" control character before the ASCII newline character.

    sed -e 's/$/'$'\r'/g

may be useful to add them to files with lines separated only by ASCII newlines (the quoting above is important!).
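Where sed is not convenient, the same line-ending normalization can be sketched in C. The function name lf_to_crlf is hypothetical and is not part of the mparse library; this is only a minimal illustration of turning bare newlines into carriage return plus newline. Unlike the sed command, it leaves lines that already end in CR LF unchanged.

```c
#include <stddef.h>

/* Hypothetical helper, not part of mparse.
 * Copy n bytes from src to dst, writing CR LF for every bare LF.
 * An LF that is already preceded by CR is copied unchanged.
 * dst must have room for 2 * n bytes in the worst case.
 * Returns the number of bytes written to dst. */
size_t lf_to_crlf(const char *src, size_t n, char *dst)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (src[i] == '\n' && (i == 0 || src[i - 1] != '\r'))
            dst[out++] = '\r';
        dst[out++] = src[i];
    }
    return out;
}
```

A caller would read a message into memory, allocate an output buffer twice that size, and pass the converted bytes on to the parser or to mparse on standard input.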
Hard parse errors (failure to comply with the very liberal grammar) are always reported, unless the -q (quiet) option is specified. Other errors may be reported as given by the options.

RFC 2822 specifies two grammars: one for generation of messages and another for parsing messages. The -g option treats violations of the "generate" syntax as errors (if -a, -t, or -2822 is also specified); otherwise, warnings are issued. The -o option suppresses warning messages, and only reports hard errors. Normally, the exit status is a count of hard errors; the -w option includes warnings as well as errors in the count (up to 255).

Specifying the -h (header only) option will cause the message body to be ignored. The -q (quiet) option suppresses all output, including message header, warnings, and error messages. The exit status of the program gives the number of errors encountered (up to 255).

Messages received via SMTP or POP or similar protocols may have been processed with "byte stuffing", i.e. lines beginning with a '.' may have had an additional '.' prepended. This can be removed by specifying the -b flag. Conversely, messages which are being sent as data to an SMTP or POP client may need to be so processed; this is done by specifying the -B option flag. The -b flag affects body sections which are processed, while -B affects only output. Therefore, a byte-stuffed message body can be processed with byte-stuffing removed while retaining byte-stuffing at the output by specifying both -b and -B. The relevant processing model is:

    message in -> [unstuff] -> parse/extract -> [[stuff] -> message out]

More detail on the implementation and library functions can be found in the included man page, which is also available in PDF format.

Output:

Interspersed with the header fields that appeared in the input are three types of additional header fields (unless suppressed):

X-NG: points to a parse error. An attempt is made to pinpoint the offending input.
References to applicable RFCs are given where appropriate.

X-Warning: warns of an obsolete or deprecated construct. If you're using mparse to check header fields generated by software that you've written, your software is in violation of one of the RFCs (or you've found a bug in mparse). The warning contains a reference to the applicable RFC and a message indicating the nature of the problem.

X-Err: is emitted after the last input header field if there is some problem with the number of input header fields (see RFCs 822, 1036, and 2822 for details of the requirements for standard header fields, other RFCs for extension header fields).

Lines regarding problems encountered in the body of the message are output to stderr (unless suppressed) after the message has been parsed.

Examples:

The first example reports hard errors (but not warnings) relative to RFCs 1036 and 2822 for a sample message:

    mparse -out example
    Path: foo!bar@frammis.gov,blurfl!not.for.mail
    From : a@b.com, c@d.edu
    X-NG: ->, RFC 1036 [2.1.1] requires a single mailbox in the From field (2)
    To (foo) : Foo Bar
    X-NG: (foo) <- RFC 2822 [3.6, 4.5] prohibits CFWS after field name
    Subject : Test
    Date : Fri, 31 Dec 72 09:55:06 -0600
    X-NG: ->Fri RFC 2822 [3.3] prohibits date inconsistency (day-of-week) (Sun)
    Newsgroups: comp.mail, news.general
    X-Err: RFC 1036 [2.1.5] requires exactly one field (0 Message-ID)
    X-Err: RFC 2822 [3.6.2, 4.4] requires a Sender field when there are multiple From field addresses (2 addresses)

    This is a test.
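The "date inconsistency (day-of-week)" error in the example above comes from comparing the stated day name with the one computed from the date. A minimal sketch of such a check follows; it is not taken from the mparse source, and the function names expand_year, day_of_week, and day_name are hypothetical. It uses Sakamoto's well-known day-of-week method and the RFC 2822 section 4.3 rule for interpreting two- and three-digit years.

```c
/* Hypothetical helpers, not part of mparse. */

/* Interpret an obsolete short year per RFC 2822 section 4.3:
 * 00-49 means 2000-2049; 50-99 and three-digit years add 1900. */
int expand_year(int y)
{
    if (y >= 0 && y <= 49)
        return y + 2000;
    if (y >= 50 && y <= 999)
        return y + 1900;
    return y;
}

/* Day of week for a Gregorian date (Sakamoto's method).
 * m is 1..12, d is 1..31; returns 0 for Sunday through 6 for Saturday. */
int day_of_week(int y, int m, int d)
{
    static const int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};

    if (m < 3)
        y -= 1;
    return (y + y / 4 - y / 100 + y / 400 + t[m - 1] + d) % 7;
}

/* Three-letter day name as used in RFC 2822 date-time. */
const char *day_name(int dow)
{
    static const char *const names[] =
        {"Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"};

    return names[dow % 7];
}
```

For the sample Date field, day_name(day_of_week(expand_year(72), 12, 31)) yields "Sun", which disagrees with the "Fri" in the message; that mismatch is what the X-NG line reports.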
To report all errors and warnings except for Usenet (RFCs 850, 1036) idiosyncratic errors:

    mparse -a -x 850 -x 1036 example
    Path: foo!bar@frammis.gov,blurfl!not.for.mail
    From : a@b.com, c@d.edu
    X-Warning: <- RFC 2822 [3.6, 4.5] prohibits (when generating messages) CFWS after field name
    X-Warning: ->, RFC 850 [2.1.3] requires a single mailbox in the From field (2)
    X-Warning: ->, RFC 1036 [2.1.1] requires a single mailbox in the From field (2)
    To (foo) : Foo Bar
    X-NG: (foo) <- RFC 2822 [3.6, 4.5] prohibits CFWS after field name
    X-Warning: ->=0D=0A =0D=0A RFC 2822 [3.2.3, 4.2] prohibits (when generating messages) continuation line with only whitespace
    Subject : Test
    X-Warning: <- RFC 2822 [3.6, 4.5] prohibits (when generating messages) CFWS after field name
    Date : Fri, 31 Dec 72 09:55:06 -0600
    X-Warning: <- RFC 2822 [3.6, 4.5] prohibits (when generating messages) CFWS after field name
    X-NG: ->Fri RFC 822 [5.2] prohibits date inconsistency (day-of-week) (Sun)
    X-NG: ->Fri RFC 2822 [3.3] prohibits date inconsistency (day-of-week) (Sun)
    X-NG: ->72 ridiculous year (1972)
    X-Warning: ->72 RFC 1123 [5.2.14] strongly recommends 4 digits (2 digits)
    X-Warning: ->72 RFC 2822 [3.3] requires (when generating messages) 4+ digits (2 digits)
    Newsgroups: comp.mail, news.general
    X-Err: RFC 822 [4.1] requires a Sender field when there are multiple From field addresses (2 addresses)
    X-Warning: RFC 850 [2.1.7] requires exactly one field (0 Message-ID)
    X-Warning: RFC 850 [2.2.1] requires Relay-Version be first header field
    X-Warning: RFC 850 [2.2.1] requires exactly one field (0 Relay-Version)
    X-Warning: RFC 850 [2.2.2] requires exactly one field (0 Posting-Version)
    X-Warning: RFC 1036 [2.1.5] requires exactly one field (0 Message-ID)
    X-Err: RFC 2821 [4.4] requires exactly one field (0 Return-Path)
    X-Err: RFC 2822 [3.6.2, 4.4] requires a Sender field when there are multiple From field addresses (2 addresses)

    This is a test.
Note that compatibility warnings for RFCs 850 & 1036 were still issued.

To silently report errors via exit status:

    mparse -utqno example || echo $?
    5

The last example tests the generating functions, verifying that all relevant RFC specifications are met:

    gentest | mparse -a

(try it on your local system)

This package also includes a simple program to generate message fields, which demonstrates the method used by the library to enforce generation syntax rules. The program, hdrtest, is invoked with command-line arguments; the first is a field name and any remaining arguments comprise the field body. The colon separating the field name from field body should NOT be supplied (although it is possible to do so by supplying an empty string for the first argument, in which case the second argument's first word is used as the field name). The environment variables CHARSET and LANGUAGE, if present, are used if any field body content must be encoded. Any syntax errors are printed, both before and after mparse attempts to fix errors in the supplied input. Canonical forms of the field name and any relevant keywords are used.
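The encoding referred to above is the RFC 2047 encoded-word form. The following is a minimal sketch of its "Q" encoding, under stated assumptions: the names q_safe and q_encode are hypothetical, the code is not taken from the mparse library, and it does not split output to respect the 75-character limit on a single encoded-word or add the RFC 2231 language suffix.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helpers, not part of mparse. */

/* Characters that may appear literally in a Q-encoded word in any
 * header context (RFC 2047 section 5, rule 3). */
int q_safe(unsigned char c)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
        (c >= '0' && c <= '9'))
        return 1;
    return c != '\0' && strchr("!*+-/", c) != NULL;
}

/* Write "=?charset?q?...?=" for the n bytes at text into out.
 * Space becomes '_', unsafe octets become "=XX" (uppercase hex).
 * Returns 0 on success, -1 if out (outsz bytes) is too small. */
int q_encode(const char *charset, const unsigned char *text, size_t n,
             char *out, size_t outsz)
{
    static const char hex[] = "0123456789ABCDEF";
    int w = snprintf(out, outsz, "=?%s?q?", charset);

    if (w < 0 || (size_t)w >= outsz)
        return -1;
    size_t o = (size_t)w;

    for (size_t i = 0; i < n; i++) {
        unsigned char c = text[i];

        if (o + 6 > outsz)      /* worst case "=XX" now, "?=" and NUL later */
            return -1;
        if (c == ' ') {
            out[o++] = '_';
        } else if (q_safe(c)) {
            out[o++] = (char)c;
        } else {
            out[o++] = '=';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 15];
        }
    }
    if (o + 3 > outsz)
        return -1;
    out[o++] = '?';
    out[o++] = '=';
    out[o] = '\0';
    return 0;
}
```

Given the ISO-8859-1 bytes for "fär nästa möte" and the charset name "ISO-8859-1", this sketch produces =?ISO-8859-1?q?f=E4r_n=E4sta_m=F6te?=, the same encoded-word that appears in the hdrtest output in the examples.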
Examples (text supplied by Jacob Palme):

    hdrtest subject "Tid fär nästa möte med CMC-forskargruppen"
    raw string:
    subject:Tid fär nästa möte med CMC-forskargruppen
    X-NG: ->f=E4r RFC 822 [3.2, 3.3] prohibits non-ASCII octet
    X-NG: ->f=E4r RFC 2822 [3.2.3, 4, 4.1] prohibits non-ASCII octet
    X-NG: ->n=E4sta RFC 822 [3.2, 3.3] prohibits non-ASCII octet
    X-NG: ->n=E4sta RFC 2822 [3.2.3, 4, 4.1] prohibits non-ASCII octet
    X-NG: ->m=F6te RFC 822 [3.2, 3.3] prohibits non-ASCII octet
    X-NG: ->m=F6te RFC 2822 [3.2.3, 4, 4.1] prohibits non-ASCII octet
    errors remain:
    Subject: Tid fär nästa möte med CMC-forskargruppen
    X-NG: ->f=E4r RFC 822 [3.2, 3.3] prohibits non-ASCII octet
    X-NG: ->f=E4r RFC 2822 [3.2.3, 4, 4.1] prohibits non-ASCII octet
    X-NG: ->n=E4sta RFC 822 [3.2, 3.3] prohibits non-ASCII octet
    X-NG: ->n=E4sta RFC 2822 [3.2.3, 4, 4.1] prohibits non-ASCII octet
    X-NG: ->m=F6te RFC 822 [3.2, 3.3] prohibits non-ASCII octet
    X-NG: ->m=F6te RFC 2822 [3.2.3, 4, 4.1] prohibits non-ASCII octet
    rfc 822, errors MPARSE_ERR_BIT8, remedies MPARSE_FIX_ENCODE: "fär" (0)
    rfc 2822, errors MPARSE_ERR_BIT8, remedies MPARSE_FIX_ENCODE: "fär" (0)
    rfc 822, errors MPARSE_ERR_BIT8, remedies MPARSE_FIX_ENCODE: "nästa" (0)
    rfc 2822, errors MPARSE_ERR_BIT8, remedies MPARSE_FIX_ENCODE: "nästa" (0)
    rfc 822, errors MPARSE_ERR_BIT8, remedies MPARSE_FIX_ENCODE: "möte" (0)
    rfc 2822, errors MPARSE_ERR_BIT8, remedies MPARSE_FIX_ENCODE: "möte" (0)

    CHARSET=iso-8859-1 hdrtest subject "Tid fär nästa möte med CMC-forskargruppen"
    raw string:
    subject:Tid fär nästa möte med CMC-forskargruppen
    X-NG: ->f=E4r RFC 822 [3.2, 3.3] prohibits non-ASCII octet
    X-NG: ->f=E4r RFC 2822 [3.2.3, 4, 4.1] prohibits non-ASCII octet
    X-NG: ->n=E4sta RFC 822 [3.2, 3.3] prohibits non-ASCII octet
    X-NG: ->n=E4sta RFC 2822 [3.2.3, 4, 4.1] prohibits non-ASCII octet
    X-NG: ->m=F6te RFC 822 [3.2, 3.3] prohibits non-ASCII octet
    X-NG: ->m=F6te RFC 2822 [3.2.3, 4, 4.1] prohibits non-ASCII octet
    sanitized for your protection and convenience:
    Subject: Tid =?ISO-8859-1?q?f=E4r_n=E4sta_m=F6te?= med CMC-forskargruppen

    CHARSET=iso-8859-1 LANGUAGE=se hdrtest subject "Tid fär nästa möte med CMC-forskargruppen"
    raw string:
    subject:Tid fär nästa möte med CMC-forskargruppen
    X-NG: ->f=E4r RFC 822 [3.2, 3.3] prohibits non-ASCII octet
    X-NG: ->f=E4r RFC 2822 [3.2.3, 4, 4.1] prohibits non-ASCII octet
    X-NG: ->n=E4sta RFC 822 [3.2, 3.3] prohibits non-ASCII octet
    X-NG: ->n=E4sta RFC 2822 [3.2.3, 4, 4.1] prohibits non-ASCII octet
    X-NG: ->m=F6te RFC 822 [3.2, 3.3] prohibits non-ASCII octet
    X-NG: ->m=F6te RFC 2822 [3.2.3, 4, 4.1] prohibits non-ASCII octet
    sanitized for your protection and convenience:
    Subject: =?ISO-8859-1*se?q?Tid_f=E4r_n=E4sta_m=F6te_med_CMC-forskargruppen?=

The following example changes an RFC 724 date-time format with a bogus day-of-week into RFC 2822-legal form:

    hdrtest date "Doomsday, 5/12/77(comment)1234-cdt"
    raw string:
    date:Doomsday, 5/12/77(comment)1234-cdt
    X-NG: Doomsday<- RFC 822 [5.1] prohibits invalid day-of-week
    X-NG: Doomsday<- RFC 850 [2.1.4] prohibits invalid day-of-week
    X-NG: Doomsday<- RFC 1036 [2.1.2] prohibits invalid day-of-week
    X-NG: Doomsday<- RFC 2822 [3.3, 4.3] prohibits invalid day-of-week
    X-NG: ->5 RFC 822 [5.1] prohibits invalid month
    X-NG: ->5 RFC 2822 [3.3, 4.3] prohibits invalid month
    X-NG: ->/ RFC 822 [5.1] prohibits illegal syntax
    X-NG: ->/ RFC 2822 [3.3, 4.3] prohibits illegal syntax
    X-NG: ->/ RFC 2822 [3.3, 4.3] requires FWS (no comments)
    X-NG: ->/ RFC 822 [5.1] prohibits illegal syntax
    X-NG: ->/ RFC 2822 [3.3, 4.3] prohibits illegal syntax
    X-NG: ->/ RFC 2822 [3.3, 4.3] requires FWS (no comments)
    X-Warning: ->77 RFC 1123 [5.2.14] strongly recommends 4 digits (2 digits)
    X-NG: ->77 RFC 2822 [3.3] requires 4+ digits (2 digits)
    X-NG: ->(comment) RFC 2822 [3.3, 4.3] requires FWS (no comments)
    X-NG: ->1234 RFC 822 [5.1] requires 2 digits (4 digits)
    X-NG: ->1234 RFC 850 [2.1.4] requires 2 digits (4 digits)
    X-NG: ->1234 RFC 1036 [2.1.2] requires 2 digits (4 digits)
    X-NG: ->1234 RFC 2822
[3.3, 4.3] requires 2 digits (4 digits) X-NG: ->1234 RFC 2822 [3.3, 4.3] requires colon between hours and minutes X-NG: ->- RFC 822 [5.1] prohibits illegal syntax X-NG: ->- RFC 2822 [3.3, 4.3] prohibits illegal syntax X-NG: ->- RFC 2822 [3.3, 4.3] requires FWS (no comments) X-NG: ->cdt RFC 2822 [3.3] prohibits obsolete zone sanitized for your protection and convenience: Date: Thu, 12 May 1977 12:34 -0500 The next example reorders the Received field components per RFC-specified order and corrects the time stamp format: hdrtest received "with smtp by bar.edu from foo.org via uucp;Doomsday, 5/12/77(comment)123456-cdt" raw string: received:with smtp by bar.edu from foo.org via uucp;Doomsday, 5/12/77(comment)123456-cdt X-NG: ->with RFC 821 [4.1.2] requires one space character X-NG: ->with RFC 2821 [4.4] requires FWS (no comments) X-NG: ->by RFC 821 [4.1.2] requires specific clause order X-NG: ->by RFC 822 [4.1] requires specific clause order X-NG: ->by RFC 2821 [4.4] requires specific clause order X-NG: ->from RFC 821 [4.1.2] requires specific clause order X-NG: ->from RFC 822 [4.1] requires specific clause order X-NG: ->from RFC 2821 [4.4] requires specific clause order X-NG: ->via RFC 821 [4.1.2] requires specific clause order X-NG: ->via RFC 822 [4.1] requires specific clause order X-NG: ->via RFC 2821 [4.4] requires specific clause order X-NG: ->; RFC 821 [4.1.2] requires one space character X-NG: RFC 2821 [4.4] requires CFWS ->; X-NG: RFC 821 [4.1.2] prohibits day-of-week ->Doomsday X-NG: ->Doomsday RFC 821 [4.1.2] requires one space character X-NG: RFC 822 [5.1] prohibits invalid day-of-week ->Doomsday X-NG: ->Doomsday RFC 2821 [4.4] prohibits invalid day-of-week X-NG: RFC 2821 [4.4] requires FWS (no comments) ->Doomsday X-NG: ->Doomsday RFC 2822 [3.3, 4.3] prohibits invalid day-of-week X-NG: RFC 822 [5.1] prohibits invalid month ->5 X-NG: RFC 2822 [3.3, 4.3] prohibits invalid month ->5 X-NG: RFC 821 [4.1.2] requires one space character ->/ X-NG: RFC 822 [5.1] 
prohibits illegal syntax ->/ X-NG: RFC 2822 [3.3, 4.3] prohibits illegal syntax ->/ X-NG: RFC 2822 [3.3, 4.3] requires FWS (no comments) ->/ X-NG: RFC 821 [4.1.2] requires one space character ->/ X-NG: RFC 822 [5.1] prohibits illegal syntax ->/ X-NG: RFC 2822 [3.3, 4.3] prohibits illegal syntax ->/ X-NG: RFC 2822 [3.3, 4.3] requires FWS (no comments) ->/ X-Warning: ->77 RFC 1123 [5.2.14] strongly recommends 4 digits (2 digits) X-NG: RFC 2821 [4.4] requires 4+ digits (2 digits) ->77 X-NG: RFC 2822 [3.3] requires 4+ digits (2 digits) ->77 X-NG: RFC 821 [4.1.2] requires one space character ->(comment) X-NG: RFC 2821 [4.4] prohibits comments ->(comment) X-NG: RFC 2822 [3.3, 4.3] requires FWS (no comments) ->(comment) X-NG: RFC 821 [4.1.2] requires colon between hours and minutes ->123456 X-NG: RFC 821 [4.1.2] requires colon between minutes and seconds ->123456 X-NG: RFC 2821 [4.4] requires colon between hours and minutes ->123456 X-NG: RFC 2821 [4.4] requires colon between minutes and seconds ->123456 X-NG: RFC 2822 [3.3, 4.3] requires colon between hours and minutes ->123456 X-NG: RFC 2822 [3.3, 4.3] requires colon between minutes and seconds ->123456 X-NG: RFC 821 [4.1.2] requires one space character -->- X-NG: RFC 822 [5.1] prohibits illegal syntax -->- X-NG: RFC 2822 [3.3, 4.3] prohibits illegal syntax -->- X-NG: RFC 2822 [3.3, 4.3] requires FWS (no comments) -->- X-NG: RFC 2821 [4.4] prohibits obsolete zone -->cdt X-NG: RFC 2822 [3.3] prohibits obsolete zone -->cdt X-Warning: RFC 2822 [2.1.1, 2.3, 3.5] strongly recommends against line length > 78 octets (88) sanitized for your protection and convenience: Received: from foo.org by bar.edu via uucp with smtp ; 12 May 1977 12:34:56 -0500 The following example adds Received field 'id' and 'for' optional components, which are deleted due to conflicting requirements in RFCs 821 and 822: hdrtest received "with smtp for by bar.edu id <123@bar.edu> from foo.org via uucp;Doomsday, 5/12/77(comment)123456-cdt" raw string: 
received:with smtp for by bar.edu id <123@bar.edu> from foo.org via uucp;Doomsday, 5/12/77(comment)123456-cdt X-NG: ->with RFC 821 [4.1.2] requires one space character X-NG: ->with RFC 2821 [4.4] requires FWS (no comments) X-NG: ->< RFC 822 [4.1] requires addr-spec X-NG: ->by RFC 821 [4.1.2] requires specific clause order X-NG: ->by RFC 822 [4.1] requires specific clause order X-NG: ->by RFC 2821 [4.4] requires specific clause order X-NG: ->id RFC 821 [4.1.2] requires specific clause order X-NG: ->id RFC 822 [4.1] requires specific clause order X-NG: ->id RFC 2821 [4.4] requires specific clause order X-NG: RFC 821 [4.1.2] requires string as id ->< X-NG: RFC 821 [4.1.2] requires specific clause order ->from X-NG: RFC 822 [4.1] requires specific clause order ->from X-NG: RFC 2821 [4.4] requires specific clause order ->from X-NG: RFC 821 [4.1.2] requires specific clause order ->via X-NG: RFC 822 [4.1] requires specific clause order ->via X-NG: RFC 2821 [4.4] requires specific clause order ->via X-NG: RFC 821 [4.1.2] requires one space character -->; X-NG: RFC 2821 [4.4] requires CFWS -->; X-NG: RFC 821 [4.1.2] prohibits day-of-week -->Doomsday X-NG: RFC 821 [4.1.2] requires one space character -->Doomsday X-NG: RFC 822 [5.1] prohibits invalid day-of-week -->Doomsday X-NG: RFC 2821 [4.4] prohibits invalid day-of-week -->Doomsday X-NG: RFC 2821 [4.4] requires FWS (no comments) -->Doomsday X-NG: RFC 2822 [3.3, 4.3] prohibits invalid day-of-week -->Doomsday X-NG: RFC 822 [5.1] prohibits invalid month -->5 X-NG: RFC 2822 [3.3, 4.3] prohibits invalid month -->5 X-NG: RFC 821 [4.1.2] requires one space character -->/ X-NG: RFC 822 [5.1] prohibits illegal syntax -->/ X-NG: RFC 2822 [3.3, 4.3] prohibits illegal syntax -->/ X-NG: RFC 2822 [3.3, 4.3] requires FWS (no comments) -->/ X-NG: RFC 821 [4.1.2] requires one space character -->/ X-NG: RFC 822 [5.1] prohibits illegal syntax -->/ X-NG: RFC 2822 [3.3, 4.3] prohibits illegal syntax -->/ X-NG: RFC 2822 [3.3, 4.3] requires FWS 
(no comments) -->/ X-Warning: RFC 1123 [5.2.14] strongly recommends 4 digits (2 digits) -->77 X-NG: RFC 2821 [4.4] requires 4+ digits (2 digits) -->77 X-NG: RFC 2822 [3.3] requires 4+ digits (2 digits) -->77 X-NG: RFC 821 [4.1.2] requires one space character -->(comment) X-NG: RFC 2821 [4.4] prohibits comments -->(comment) X-NG: RFC 2822 [3.3, 4.3] requires FWS (no comments) -->(comment) X-NG: RFC 821 [4.1.2] requires colon between hours and minutes -->123456 X-NG: RFC 821 [4.1.2] requires colon between minutes and seconds -->123456 X-NG: RFC 2821 [4.4] requires colon between hours and minutes -->123456 X-NG: RFC 2821 [4.4] requires colon between minutes and seconds -->123456 X-NG: RFC 2822 [3.3, 4.3] requires colon between hours and minutes -->123456 X-NG: RFC 2822 [3.3, 4.3] requires colon between minutes and seconds -->123456 X-NG: RFC 821 [4.1.2] requires one space character -->- X-NG: RFC 822 [5.1] prohibits illegal syntax -->- X-NG: RFC 2822 [3.3, 4.3] prohibits illegal syntax -->- X-NG: RFC 2822 [3.3, 4.3] requires FWS (no comments) -->- X-NG: RFC 2821 [4.4] prohibits obsolete zone -->cdt X-NG: RFC 2822 [3.3] prohibits obsolete zone -->cdt X-Warning: RFC 2822 [2.1.1, 2.3, 3.5] strongly recommends against line length > 78 octets (123) sanitized for your protection and convenience: Received: from foo.org by bar.edu via uucp with smtp ; 12 May 1977 12:34:56 -0500 The next example is as the one immediately above, but the (conflicting) RFC 821 and 822 requirements are ignored: hdrtest -x 821 -x 822 received "with smtp for by bar.edu id <123@bar.edu> from foo.org via uucp;Doomsday, 5/12/77(comment)123456-cdt" raw string: received:with smtp for by bar.edu id <123@bar.edu> from foo.org via uucp;Doomsday, 5/12/77(comment)123456-cdt X-NG: ->with RFC 2821 [4.4] requires FWS (no comments) X-NG: ->by RFC 2821 [4.4] requires specific clause order X-NG: ->id RFC 2821 [4.4] requires specific clause order X-NG: RFC 2821 [4.4] requires specific clause order ->from X-NG: RFC 
2821 [4.4] requires specific clause order ->via
X-NG: RFC 2821 [4.4] requires CFWS -->;
X-NG: RFC 2821 [4.4] prohibits invalid day-of-week -->Doomsday
X-NG: RFC 2821 [4.4] requires FWS (no comments) -->Doomsday
X-NG: RFC 2822 [3.3, 4.3] prohibits invalid day-of-week -->Doomsday
X-NG: RFC 2822 [3.3, 4.3] prohibits invalid month -->5
X-NG: RFC 2822 [3.3, 4.3] prohibits illegal syntax -->/
X-NG: RFC 2822 [3.3, 4.3] requires FWS (no comments) -->/
X-NG: RFC 2822 [3.3, 4.3] prohibits illegal syntax -->/
X-NG: RFC 2822 [3.3, 4.3] requires FWS (no comments) -->/
X-Warning: RFC 1123 [5.2.14] strongly recommends 4 digits (2 digits) -->77
X-NG: RFC 2821 [4.4] requires 4+ digits (2 digits) -->77
X-NG: RFC 2822 [3.3] requires 4+ digits (2 digits) -->77
X-NG: RFC 2821 [4.4] prohibits comments -->(comment)
X-NG: RFC 2822 [3.3, 4.3] requires FWS (no comments) -->(comment)
X-NG: RFC 2821 [4.4] requires colon between hours and minutes -->123456
X-NG: RFC 2821 [4.4] requires colon between minutes and seconds -->123456
X-NG: RFC 2822 [3.3, 4.3] requires colon between hours and minutes -->123456
X-NG: RFC 2822 [3.3, 4.3] requires colon between minutes and seconds -->123456
X-NG: RFC 2822 [3.3, 4.3] prohibits illegal syntax -->-
X-NG: RFC 2822 [3.3, 4.3] requires FWS (no comments) -->-
X-NG: RFC 2821 [4.4] prohibits obsolete zone -->cdt
X-NG: RFC 2822 [3.3] prohibits obsolete zone -->cdt
X-Warning: RFC 2822 [2.1.1, 2.3, 3.5] strongly recommends against line length > 78 octets (123)
sanitized for your protection and convenience:
Received: from foo.org by bar.edu via uucp with smtp id <123@bar.edu> for ; Thu, 12 May 1977 12:34:56 -0500

Note that as RFC 821 requirements are ignored, the (corrected) day-of-week is retained (RFC 821 does not permit day-of-week in the time stamp).

Limitations of mparse:

There are some conflicts between RFCs 821, 822, 2821, 2822, 1034, 1123, 1700, 1036, etc. and quite a few gray (grey if you prefer) areas. These may be addressed as the standards are revised.
RFC 1036 attempts to impose structure on the Subject field, which RFC 822 clearly states is unstructured. As RFC 1036 clearly states that RFC 822 has precedence in the case of conflicts, the conflicting RFC 1036 requirements are ignored.

RFC 2822 is rather generous in allowing certain constructs. This parser warns about invalid old dates, invalid time zones, domain names and domain literals that are syntactically legal per RFC 2822, but which are meaningless. This is a feature which is enabled via the -0 (zero) command-line option.

RFC 2822 eliminated the distinction between RFC 822 extension header fields and user-defined (those beginning with "X-") header fields. The RFC 822 scheme reduced the likelihood of namespace clashes. Mparse does distinguish between the two types, although they are treated similarly. Separate user hooks are provided for the two types.

RFC 2184 (obsoleted by RFC 2231) used the number 1 for the first part of a parameter continuation. RFC 2231 uses 0. RFC 2184 (in effect from August to November 1997) parameter continuation numbering is not supported. This is a conflict between the two RFCs which cannot be readily resolved.

RFC 2821 permits a mix of angle bracketed addresses and mailbox specifications in a Received header field "for" clause, as well as permitting multiple mailboxes there. RFC 2822 does not permit a mix or multiple addr-specs. The mix and multiple mailboxes permitted by the 2821 specification lead to parsing conflicts and are therefore not supported.

A number of additional MIME extension header fields have been defined in several RFCs. Content-Base (RFC 2110) and Content-Location (RFCs 2110, 2557) are not yet fully supported.

    Content-Location (RFC 2557, 2616, 2912), also Content-Base (RFC 2110)
        These can't be unambiguously parsed as currently defined in the RFCs. Also, encoding of URIs and handling of long URIs are not appropriately defined. No provision for #fragments.
    Reporting-UA (RFC 2298)
        specification is ambiguous.
A number of additional MIME types have been defined in multiple RFCs. At least the following may have parsing implications which are not yet fully supported:

    application/dicom (RFC 3240) may require id parameter.
    multipart/voice-message (RFCs 2421, 2423) requires text/directory be present, requires certain content in text/directory.
    text/directory (RFC 2425) uses structured fields.

Many header field definitions have been obsoleted (e.g. due to name changes from RFC 1327 to 2156); it may be impractical to fully support all of the obsoleted header fields.

Detailed documentation is available in the files mparse.3 and mparse.pdf.

Bug reports:

If you think that you have found a bug, please supply a test case (a minimal message that demonstrates the bug), a description of what you expected and why what you got didn't meet your expectations, preferably including references to relevant sections of pertinent RFCs, and optionally (if you think you know how to fix the problem) a patch to the relevant source file(s).

Send bug reports to blilly@erols.com. Please include "mparse" in the Subject header field.

Feedback other than bug reports is also welcome. Use the same email address as above. Including "mparse" in the Subject header field would be appreciated.