0.106 06 Aug 2006
    N.B. major change: all publicly-visible macro names, structure names, and function names have been changed to reduce conflicts with other libraries and with the application code namespace. See update.sed for a script which may assist in updating application source code. In conjunction with this major change, the opportunity has been taken to clarify the purpose of defined macros by suitable renaming.
    removed some accumulated cruft
    verified build with gcc 4.0.1 and 4.0.2 and 4.1.0
    tested with gperf 3.02 (lengthtable patch no longer required)
    tested with flex version 2.5.33
    make.file recipe hacked, whacked, sliced, diced, folded, spindled, and mutilated to work with bison version 2.1a
    new application subtypes:
        relax-ng-compact-syntax vnd.3gpp2.bcmcsinfo+xml vnd.medcalcdata vnd.ms-xpsdocument
        vnd.nokia.catalogs vnd.nokia.conml+wbxml vnd.nokia.conml+xml vnd.nokia.pcd+wbxml vnd.nokia.pcd+xml
        vnd.sealed.3df vnd.sealed.csf vnd.sun.wadl+xml vnd.crick.clicker vnd.crick.clicker.keyboard
        vnd.crick.clicker.palette vnd.crick.clicker.template vnd.crick.clicker.wordbank
        vnd.dvb.esgcontainer vnd.dvb.ipdcesgaccess vnd.frogans.fnc vnd.frogans.ltf vnd.omaloc-supl-init
        vnd.otps.ct-kip+xml vnd.pocketlearn vnd.qualcomm.brew-app-res vnd.sealed.tiff nasdata smil+xml
        vnd.3gpp2.sms vnd.fujixerox.HBPL vnd.ms-htmlhelp vnd.oma.dd2+xml vnd.umajin xenc+xml example
        H224 json mxf vnd.chipnuts.karaoke-mmd vnd.cups-ppd vnd.ezpix-album vnd.fujixerox.ART4
        vnd.fujixerox.ART-EX vnd.hp-jlyt vnd.igloader vnd.nokia.iptv.config+xml
        vnd.oasis.opendocument.chart vnd.oasis.opendocument.chart-template
        vnd.oasis.opendocument.formula vnd.oasis.opendocument.formula-template
        vnd.oasis.opendocument.graphics vnd.oasis.opendocument.graphics-template
        vnd.oasis.opendocument.image vnd.oasis.opendocument.image-template
        vnd.oasis.opendocument.presentation vnd.oasis.opendocument.presentation-template
        vnd.oasis.opendocument.spreadsheet vnd.oasis.opendocument.spreadsheet-template
        vnd.oasis.opendocument.text
        vnd.oasis.opendocument.text-master vnd.oasis.opendocument.text-template
        vnd.oasis.opendocument.text-web vnd.scribus vnd.solent.sdkm+xml vnd.syncml.dm+wbxml
        vnd.uoml+xml vnd.vd-study
    new audio subtypes: vnd.CELP vnd.hns.audio asc rtp-midi vnd.4SB example dls eac3 t38
    new video subtypes: vc1 vnd.hns.video example
    new image subtype: example
    new message subtype: example
    new model subtype: example
    new multipart subtype: example
    new text subtype: example
    new media type: example
    updated language tags
    updated country codes
    access-types source revision date changed; no useful substantive changes
    revised access.awk to work around broken source revision date lines in access-types file
    transfer-encodings source revision date changed; no substantive changes
    media-feature-tags source revision date changed; no substantive changes
    mail-parameters source revision date changed; no substantive changes
    cleaned up documentation grammar, usage, formatting
    added recent relevant RFCs to documentation
    handle unique RFC 4194 media type requirements
    added more informative comments to mparse.h
    fixed bug in octet counting when decoding base 64
    more robust handling of broken fields in mparse_interpolate_components and mparse_mailbox_components
    fixed bugs in mparse_tokens_string, mparse_token_string
    added function to enumerate action keywords
    made line folding more robust
    fixed typos in comments
    code and test messages to handle unnecessary media type "example" registrations (documentation could use unregistered, unregisterable "X-example" without buggering up the IANA registry...) Sheesh, I'm amazed Taylor didn't register example+xml, vnd.example, vnd.example+xml, ... (but let's not give him any ideas...)
    improved presentation of illegal time zone error/warning
    added error/warning for RFC 822 lone CR/lone LF
    improved list consolidation
    updated status return functions for RFC 4409
    added S/MIME message examples from RFC 4134 to regression tests
0.105 26 Dec 2005
    new application media types:
        rlmi+xml rtx vnd.anser-web-certificate-issue-initiation
        vnd.anser-web-funds-transfer-initiation vnd.apple.installer+xml vnd.autopackage
        vnd.fluxtime.clip vnd.kahootz vnd.ms-cab-compressed vnd.ms-fontobject vnd.ms-ims
        vnd.osgi.dp vnd.piaccess.application-licence vnd.picsel vnd.preminet
        vnd.proteus.magazine vnd.ruckus.download vnd.zzazz.deck+xml xv+xml
    new audio media types: rtx vnd.cmles.radio-events vnd.dlna.adts
    new media feature tag: sip.message
    new image media type: vnd.adobe.photoshop
    updated language tags
    new URI scheme: info
    new text media type: rtx
    new video media types: rtx vnd.dlna.mpeg-tts
    handle spurious option flags in program names when generating version.c
0.104 12 Nov 2005
    access-types source data date change (no substantive change)
    new application media types:
        atom+xml conference-info+xml poc-settings+xml vnd.sema fastinfoset fastsoap nss
        soap+fastinfoset vnd.marlin.drm.mdcf
    revised encoding.awk to handle bizarre changes to transfer-encodings data file
    new URI scheme: dns
    types source data file date change (no substantive change)
    language tags source updates incorporated
    new audio subtypes: amr-wb+ G7221 t140c VMR-WB
    image media subtypes data file updated; no substantive change to subtypes
    text subtype registry revised, but no new addition
    new video subtype: 3gpp-tt
    feature tags source data file date change (no substantive change)
    via Received component source data file date change (no substantive change)
    verified build with bison 2.1
0.103 21 Aug 2005
    more tweaks to rfc2822grammar_simplified.txt
    encoded-words in comments exclude '?'
    MIME 8bit, binary provision for message body
    new application media type xhtml-voice+xml just before the IESG rescinded its approval of the draft specifying it, and directed IANA to mark the subtype as obsolete
    more new application types: xcap-att+xml xcap-caps+xml xcap-el+xml xcap-error+xml mpeg4-iod mpeg4-iod-xmt mp4
    language tags source updates incorporated
    audio subtypes source data date changed (no substantive change)
    new audio subtypes: 3gpp2 ac3 mp4
    video subtypes source data date changed (no substantive change)
    new video subtypes: 3gpp2 mp4
0.102 21 Jul 2005
    fixed typos in rfc2822grammar_simplified.txt
    adjusted "end" rule to accommodate multiple trailing comments in rfc2822grammar_simplified.txt (thanks to Frank Ellermann)
    language tags source updates incorporated
    via link source data date changed (no substantive change)
0.101 12 Jul 2005
    added new audio subtypes BV16 BV32
    media feature tags source data date changed (no substantive change)
    added code for December 2005 leap second
    updated for 2005j tzdata
    added new application media types:
        ccxml+xml ecmascript javascript pls+xml srgs srgs+xml ssml+xml voicexml+xml xhtml-voice+xml
    Received field "via" link keywords source data date changed (no substantive change)
0.100 12 Jun 2005
    added new application subtypes:
        dialog-info+xml resource-lists+xml rls-services+xml kpml-request+xml kpml-response+xml
    added new feature tags: sip.byeless sip.rendering
    updated language tags
    incorporated URI schemes source data date changes (no substantive change)
    added new text subtypes: csv RED troff
    added RFCs 3930, 3935, 3939, 4027, 4047, 4096 to list of relevant RFCs
0.99 01 May 2005
    made index.html comply with the HTML 4.01 Strict specification
    revised ID strings
    cleaned up some inconsequential issues that led to noisy gcc 4.0.0 warnings
    added some new RFCs to lists in documentation
0.98 26 Apr 2005
    updated RFC lists in documentation
    added MTSN to layer chart
    new application media subtypes:
        csta+xml CSTAdata+xml mbox shf+xml
        simple-filter+xml
    language tags updated
    URI schemes registry updated; no substantive changes
    new video media subtype raw
    tested with gcc 4.0.0; updated ID string declarations (gcc 4.0.0 would otherwise elide them :-/)
    moved object file dependencies to a separate file
0.97 12 Mar 2005
    formatting tweaks to documentation
    removed unused variable in print_entity
0.96 10 Mar 2005
    updated awk script for parsing media-feature-tags registry (IANA changed the format)
0.95 06 Mar 2005
    updated for 2005b tzdata, then again for 2005c and yet again for 2005e, and once more for 2003f (at this rate, the time zone guys are going to run out of letters well before the end of 2005)
    updated documentation RFC lists
    fixed bug in make.file
    application media subtype registry updated 21 Jan, but no substantive change to subtypes
    audio media subtype registry updated 21 Jan, but no substantive change to subtypes
    character-sets IANA registry updated 28 Jan, but no substantive change to charsets
    more URI schemes
    improved checking of domain literal syntax per RFC 2821 (as also referenced by RFC 2822)
    improved grammar debugging output for bison 2.0
    preliminary handling of poorly-designed and poorly-documented Content-Convert and Content-Previous fields from draft-ietf-fax-esmtp-conneg-13
0.94 09 Jan 2005
    updated comments regarding VPIM/MIXER compatibility
    new application media subtypes: samlassertion+xml samlmetadata+xml xop+xml
    new text media subtype vnd.esmertec.theme-descriptor
    text subtype registry revised in Jan 2005, but no new addition
    new video media subtype H264 added
    updated documentation to list RFCs 2305 and 3695
    revised for bison 1.875e, tested/built with bison 2.0
    partial support for RFC 3939 Caller-ID and Caller-Name fields
    more comprehensive error/warning reports for bad language tags
    updated for 2005a tzdata
    de-lint cleanup
    updated copyright notices for the new year (changed files only)
0.93 30 Nov 2004
    fixed handling of i-* language tags
0.92 29 Nov 2004
    revised charsets.awk script for improved portability (same results with different awk implementations)
    fixed missing script version generated by access.awk
    new application subtype "fits"
    new image subtype "fits"
0.91 17 Nov 2004
    some tweaks to makefile: optimization flags split out from other flags
    corrected bug in set of URI path component reserved characters
    corrected bug in html encoding in mailto URI conversion
0.90 06 Nov 2004
    added RFC 3938 to RFC table in documentation
    added argument to count_errors which permits masking of error values that are to be counted
    implemented message-to-mailto URI conversion where the message prototype being converted lacks some mandatory header fields
    updated Archived-At per draft-duerst-archived-at-02.txt
    updated for 2004g tzdata
    message context registration source changed (no substantive change to data)
    new audio media subtypes: dsr-es202050 dsr-es202211 dsr-es202212
    added RFC 2480 to RFC list
0.89 25 Oct 2004
    minor cleanup (typos in comments, etc.)
    more new application subtypes: dns im-iscomposing+xml
    new text media subtype: dns
    updated for 2004e tzdata
    fixed languages.awk vs. SCCS issue
    fixed bug in combine_partial.c
    SCCS version of check_disp.h
0.88 03 Oct 2004
    new context value
    new audio media type ILBC
    language tags updated
    cleaned up some code (removed unused variables, etc.)
    changed calls to error message printing functions; they now call a function taking a va_list for extra arguments
    printing functions now return size_t instead of int
0.87 01 Oct 2004
    more bug fixes to digestify.c
    fixed bug related to sublist copying when copying tokens
    fixed bugs in mailto<->message conversion with binary body content
    fixed bug in minimize_mime_fields
    some performance improvements
    more new application media types: xmpp+xml
    updated for 2004d tzdata
    image media subtypes data file updated 21 Sep 2004; no substantive change to subtypes
    audio media subtypes data file updated 21 Sep 2004 and again 22 Sep 2004; audio/clearmode added
    message media subtypes data file updated 24 Sep 2004; no substantive change to subtypes
    revised references to draft-moore-auto-email-response to point to RFC 3834
    added support for Solicitation field (RFC 3865)
    added more RFCs to reference lists
    added an argument to copy_field to allow specification of insertion point for copy
    added functions to extract sender mailbox, authors' mailboxes, To field addresses, and List-Post field URIs to facilitate various specialized types of responses
    added functions to fragment and reassemble messages and to build and burst message digests
    digestify.c now uses new function
    Message Tracking RFC numbers finally assigned
0.86 25 Aug 2004
    fixed bug in handling planetmirror alternative to elsie for tzdata updates
    more improvements in handling internal interfaces with DO_INCLUDE
    more typo corrections in documentation
    fixed bug introduced a few releases back in digestify.c
0.85 24 Aug 2004
    ensure default message encoding value is set
    significant change in calls to some error output functions -- new API requires caller to provide a function and argument for output, allowing greater flexibility than mere output to a FILE * (e.g. syslog); fputs_wrapper function provided for FILE * output
0.84 23 Aug 2004
    mirror alternative to elsie.nci.nih.gov for tzdata
    fixed declaration in uri.h
    updated expected gperf release number
0.83 22 Aug 2004
    added identification strings to indicate pedigree of files generated from source data via awk scripts and gperf
    added functions to enumerate media types and subtypes
    documented more functions (made public)
    fixed bugs in insert_entity and create_multipart
    corrected typos in documentation pointed out by Laird Breyer
    build option to hide internal functions
0.82 09 Aug 2004
    corrected typos in FAQ
    distribution tar archive name includes version number and unpacks in version-numbered subdirectory
    LICENSE changed to OSI Certified zlib/libpng license
    protection against unsigned integer overflow when counting errors and warnings
    cleaned up some exposed internal functions
    improved handling of errors when processing composite messages and media types
    added function to conditionally encode message entities
    rearranged source code files:
        smaller files
        separate object code (smaller executables)
        maintenance facilitated
    documented more functions (made public)
    fixed bug in quoted-printable encoding of CRLF in non-text media
    reverted to use of literal string for RFC sections to simplify maintenance
0.81 24 Jul 2004
    handle more malformed messages
    2004b time zone data
    audio media types source file updated, but no additions or other changes
    mail-parameters updated
0.80 17 Jul 2004
    new message media type "tracking-status" added
    fixed bug preventing error messages and warnings for missing required header fields
    improved error/warning messages for fields in message/rfc822 media
    improved error/warning reporting for missing DSN and MDN fields
0.79 15 Jul 2004
    parser now accepts (generating appropriate error/warning messages) more liberal syntax in Sender field body (lists, addresses)
    replaced manifest RFC number macros by actual RFC numbers (eliminating namespace issues and in preparation for some expected RFC revisions)
    fixed bug in handling of text media types with unknown charset, added test message
0.78 10 Jul 2004
    fixed more field/token error counting inefficiency bugs
    revised MIME compatibility flags for some charsets
0.77 06 Jul 2004
    fixed bugs in address.c (counting field errors inappropriately)
0.76 05 Jul 2004
    fixed bug in handling of Mail-Copies-To (usefor draft)
    fixed bugs in handling of Path (RFCs 850, 1050, usefor draft) (some issues still undecided)
    removed news field MIME parameters based on RFC 850/1036 incompatibility and WG consensus, except for Injector-Info
    cleaned up RFC section number macros
    added torture test; parse binary non-message file with perfect fidelity and no core dump
    added workaround for gcc optimization bug (mparse.c)
0.75 21 Jun 2004
    fixed bug in recognition of MIME boundary delimiters
0.74 21 Jun 2004
    fixed minor bug in hooktest affecting recursive parsing of complex messages
0.73 21 Jun 2004
    fixed bug in end-of-message processing when header-only processing is specified
    improved handling of generation and repair
0.72 20 Jun 2004
    added RFC 3028 reference and examples
    fixed bug in handling trailing body line whitespace vs. format=flowed, simplified tests
    added error reporting for missing separator lines and for missing MIME closing delimiter
    fixed bug in mime part count when new part is inserted
    fixed bug in generation of MIME-Version field during repair
    fixed bug in handling of malformed messages
    added fixes for missing separator and missing MIME close delimiter
    more fixes implemented and tested:
        missing empty separator line
        missing MIME multipart close delimiter
    fixed bug in handling of EDI response MDN messages
    fixed bug in generation of error/warning messages for extremely long lines
    warning/error messages related to tokens on long lines are compressed
    yet another batch of new application media types:
        rdf+xml soap+xml vnd.nokia.landmark+xml vnd.nokia.landmark+wbxml vnd.syncml+xml
    built with bison 1.875d
    added more test messages
0.71 09 Jun 2004
    hard syntax errors count as errors regardless of RFC modes
    yet another improvement in handling broken messages
0.70 05 Jun 2004
    more improvements to handling of broken messages
0.69 05 Jun 2004
    improved handling of broken MIME messages; less formal MIME parsing in order to accept more broken MIME -- spammers keep coming up with new ways to generate broken messages...
    added LICENSE and FAQ files to the distribution
    fixed omissions in distribution files
    still more new application media types: spirits-event+xml
0.68 02 Jun 2004
    fixed typographical error in make.file (bug reported by Gianluca Ramunno)
    removed unused variable in reply.c
0.67 29 May 2004
    tweaked generated header file for library prefix
    handle lone dot at end of message when byte stuffing output via print_message
    more minor tweaks to documentation
    added description of maintenance via automatic hash table updates from reference data
    envelope_sender now returns what is specified, even if it's an empty path
    still more new application media types: vnd.hcl-bireports
    RFC 2298 superseded by RFC 3798; problems remain
    2004a version of time zone data used
0.66 25 May 2004
    more new video media types
    more new audio media types
    fixed bug in check for message ending with CRLF
    more stringent error reporting for MIME transfer encoding inconsistency errors
0.65 23 May 2004
    more new application media types
    minor tweaks to make.file
0.64 16 May 2004
    fixed minor bug in test for byte-stuffing in print_token and print_tokens; test simplified
    more new application media types
0.63 12 May 2004
    leaner distribution -- most individual files eliminated from distribution site (they of course remain in the compressed tar archive)
    handle byte stuffing in print_token and print_tokens for message output other than in copy mode (e.g. when outputting from in-memory message during end-of-message hook processing)
    more new application media types
0.62 09 May 2004
    improved read from socket with timeout
    more new application media types
    libtool tweaks for version 1.5.6
    improved handling of broken MIME messages; less formal MIME parsing in order to accept more broken MIME -- the 800-pound gorilla keeps coming up with new ways to generate broken messages...
    more minor tweaks to documentation
0.61 02 May 2004
    included RFC 3464 example files
    suppress output of errors/warnings corresponding to drafts
    minor tweaks to documentation
0.60 01 May 2004
    corrected formatting errors in documentation
    corrected some errors in documentation
    corrected some errors in comments in source files
    fixed minor bugs in is_ew_char
    added some reference material to documentation
0.59 28 Apr 2004
    bug fixes, efficiency tweaks to status.c code
    fixed bug in handling some error returns
    fixed bugs in list termination tokens in grammar
    experimental handling of addresses in Followup-To fields
0.58 25 Apr 2004
    more stringent checking of DSN extended status
    support for status and extended status codes associated with errors and warnings (maintained, but not used internally)
    fixed bug related to charsets specified in extended parameters
    improved documentation layout and content
    RFC 886 Illegal-Field and Illegal-Object fields supported
    added options to mparse test program to check transport protocol status, allocation, and to activate message repair
    better handling of Netscape/Mozilla language vs. list bug
    fixed bug in Received field component compatibility test
    fixed bug in handling extended parameters
    more new application media types
0.57 09 Apr 2004
    added RFC references to documentation
    added application/message/transport/session description to documentation
    improved documentation layout
    added function to generate MIME part number string
    provision for prefix and/or suffix when printing message, entity
    added function to print a field
    count message, entity, field, token bytes via print functions with NULL FILE pointer
    added functions to process entity fields, entity body, field tokens via callback functions
    handle address-list in From field (RFCs 724, 733)
    date change in image media type registry (no significant content change)
0.56 04 Apr 2004
    better description for message-type.gperf
    more liberal acceptance of RFC 2156 Message-Type fields (accept trailing FWS)
    unused code #ifdefed
    handle RFC 2821 "IPv6:" tag in domain literals
    improved domain literal error/warning messages
    extended handling of Content-Type field to recognize RFC 1049 syntax
    nested lists handled better (necessary for handling Encoding field)
    RFC 1154/1505 Encoding field handled
    test messages for RFC 3696 and for RFC 1049 Content-Type and Encoding field [N.B. 3696 not yet added to list of supported RFCs due to errors in 3696 w.r.t. quoting in address local-parts]
    more liberal parsing of disposition-notification-options parameter value lists (empty list elements)
    date change in application media type registry (no significant content change)
0.55 28 Mar 2004
    improvements to mailto -> message conversion
    minor tweaks to make.file
    improved field_body() for unstructured fields
    added validity checks for RFC 2156 autosubmitted, importance, sensitivity, and message-type values
    more new application media types
    corrected bugs in RFC 987 field definitions
    explicitly handle RFC 850, 1036 prohibition of "all" in newsgroup name components
0.54 14 Mar 2004
    new application media subtypes
    new audio media subtypes
      [these seem to be appearing at a fast and furious rate; I wish there were a better way to handle updates of this sort. Unfortunately, real-time run-time online lookup is not practical due to (a) disconnected operation, (b) possible server unavailability or overload, and (c) spurious reference document format changes.
      Local text files converted to some sort of DFA at run-time might work (need to investigate available tools & techniques); however, that opens up some security concerns (access rights to the text files, DoS if they're removed, etc.).
      Yet another option would be to put the object files compiled from gperf-generated code in separate libraries, but that won't work for statically-linked applications, and has most of the same security concerns mentioned above. Sometimes, easy registration can lead to hassles in other areas...
      Handling additions at the application level is at least theoretically possible via the extension hooks; but then there's RFC 3692 and its requirement for explicit configuration for experimental name support (which might mean two extension hooks for each type of name: one for application-level support for official extensions, and the other for private-use and experimental names).
      All of which is further complicated by the need to handle specification requirements (e.g. mandatory parameters used with media types). Somebody is going to have to read the relevant specification and code for enforcement of its requirements.
      No changes for now.]
    added new functions to translate between messages and mailto URIs, also some related character class functions for URI character classes
    Resent- field names now appear as a single logical token (made of "Resent-" and remaining name physical tokens)
    check for all-numeric top-level domain names (RFC 3696) [N.B. 3696 not yet added to list of supported RFCs due to errors in 3696 w.r.t. quoting in address local-parts]
    some minor code tweaks and documentation updates
    rough outline for handling W3C PICS-Label field added to grammar file (commented out due to problems with the specification). Full support will also entail handling at least a subset of ISO date formats (I guess W3C hasn't heard of "If there are several ways of doing the same thing, choose one" (RFC 1958) -- we really don't need *two* different date formats in text messages).
0.53 08 Mar 2004
    fixed bug in handling MIXER printablestring
    RFC 987/1026 fields supported:
        Bilateral-Info
        Delivery-Report-Content-Billing-Information
        Delivery-Report-Content-Intermediate-Trace
        Delivery-Report-Content-Original
        Delivery-Report-Content-Reported-Recipient-Info
        Delivery-Report-Content-UA-Content-ID
        P1-Content-Type
        P1-Message-ID
        P1-Recipient
        UA-Content-ID
        X400-Trace
    new application media subtypes
    fixed bug in error messages re. space-after-colon (RFCs 850, 1036)
    improved error/warning messages where offending token is beyond column 78
    fixed bug related to reordering of Received field components
0.52 29 Feb 2004
    updated field definitions per recent Usefor mailing list discussions
    new image and video media subtypes (twice in one day!)
    new application media subtypes
    added support for RFC 788 Mail-From header field
    added RFC 1958 support
    Per RFC 3692, experimental and private-use names are not recognized by default; message structure experimental flag must be set to enable recognition
0.51 22 Feb 2004
    corrected typos in revised RFC 2822 grammar; also fixed some problems
    updated documentation
0.50 15 Feb 2004
    more charset registry additions
    updated simplified RFC 2822 grammar based on ietf-822 feedback; extensive changes address more 2822 grammar bugs
0.49 02 Feb 2004
    ripped out Subject field hacks. Some statistics:
        grammar file                  4752 lines ->   4336
        lexical analyzer source       1207 lines ->   1124
        parser C file               490371 bytes -> 450761
        lexical analyzer C file     431673 bytes -> 415460
    fixed bug related to folding points when RFC 733 slash-date is converted to current date-time format
    handle trailing whitespace appropriately, including silly RFC 2646 requirements
0.48 01 Feb 2004
    no encoded-words in undefined extension fields
    added utility function is_trailing_ws()
    provisions to encode or delete trailing whitespace in token_string and tokens_string
    added newly registered charsets
    accommodate changes to IANA media type registration page formatting
    updated simplified RFC 2822 grammar based on ietf-822 feedback
    added gentest test to regression tests
    added regression test file for obs-utext
    improved handling of obs-utext
0.47 04 Jan 2004
    fixed typos in make.file
    updated time zone data 2003e (12-15-2003)
    updated media type data
    support for "Auto:" subject hack defined in draft-moore-auto-email-response
    fixed bugs in encoded-word syntax checks; improved error messages
    updated copyright notices for the new year
    more explicit subject hack handling
0.46 30 Nov 2003
    fixed typographical errors in documentation
    make.file improvements for builds over NFS
    more media type updates (mostly proprietary cruft)
    conditional compilation of error code consistency checks and lexical analyzer code consistency checks
    major rework of received.gperf: consolidated/simplified item/value compatibility tests
    better repair of field component order
    updated time zone data per 2003d source version
    workarounds for libtool version inconsistencies
    more media type changes (IANA registry)
    overhauled error/warning function call interface and replaced manifest constants with macros
    errors/warnings now counted in 4 categories:
        RFC modes in effect
        RFCs not in effect
        supported internet drafts
        miscellaneous (application errors, system errors, common-sense errors, hard syntax errors)
    repair (fix_*) functions take arguments to control which categories of errors/warnings are repaired
    improved handling of backslash-quoted CRLF (RFC 822 [3.4.3, 3.4.5])
    improved handling of CRLF in comments
0.45 05 Oct 2003
    silently flag lone CR, lone newline tokens in non-binary bodies and in MIME boundary delimiters
    separate (internal) function to fix token errors
    text/prs.fallenstein.rst media type added
    added RFCs 724, 733, 850 to list of RFCs recognized by the mparse executable (usage message)
    added function message_identifier() to obtain message id from a message
    revised auto-submitted in accordance with draft version 4
    internal-use flag to keep track of fields with token errors (instead of storing in 1st token's val)
    multiple Content-Duration fields not permitted in a single entity
0.44 30 Sep 2003
    incorporated updates to IANA registries -- many new media subtypes
    updated for 2003c version of time zone data
    added test message with binary body content including 0x00, lone 0x0a, and 0x0d0a sequence
    minor tweaks (casts, etc.) to clean up type matching
    more robust handling of input with lone CR or newline instead of CRLF
0.43 31 Jul 2003
    fixed bugs in handling of backslash-quoted CRLF pair
    parse_test (regression test) improved
    added support for RFC 3458 Message-Context header field
    built with bison 1.875b
    fixed bug in handling of bad User-Agent fields
    added RFC 850 support: Relay-Version, Posting-Version, Date-Received fields
    consolidated character type tests in new file ctype.c
    cleanup of check_angle_addr_syntax and check_domain_syntax
    added functions to extract reply/followup addresses from a message
    improved error/warning messages (isolated local-part, unfold and squeeze WS in comments)
    made field_body function public
    added functions to count mailboxes and to break mailbox into components (display name, bracket, route, local-part, @, domain)
    updated application media types: vnd.criticaltools.wbs+xml, vnd.wqd, vnd.yamaha.smaf-audio, vnd.yamaha.smaf-phrase are new
    revision to languages.awk to cope with revised source file HTML
    added option mode to token_string, tokens_string to canonicalize quoting; revised function calls to combine token context flags
    fixed bugs in list linking and list traversal code
    more application media types: vnd.kidspiration, vnd.nervana
    more support for Auto-Submitted field updated in draft-moore-auto-email-response-01
    added functions to indicate if a message is a reply, an automatic reply
    tested code with Intel 7.1 compiler, updated makefile
0.42 15 Jun 2003
    added functions to quote a string and to make a phrase
    fixed bug in line-folding algorithm
    protect encapsulated header fields in message/partial first piece from minimize_mime_fields
0.41 09 Jun 2003
    corrected attributes for References field (multiple instances not permitted)
    revised collapse_composite, encapsulate, and related ancillary functions to handle message/partial creation and collapse (for reassembly)
    content-features not cached (multiple fields permitted)
    documented discrepancy between RFC 2046 [5.2.2] and RFC 2822 [3.6.1]
0.40 02 Jun 2003
    revised handling of Received fields for message generation
    new function time_stamp to insert a Received field
    fixed bug in insert_field2
    new function return_path to insert a Return-Path field
    gentest demonstration program revised to use higher level message generation functions
    new function original_recipient to insert an Original-Recipient field
    IANA language tags updated
0.39 27 May 2003
    check/fix number of digits in hours, minutes, seconds, day in date-time
    fixed bug in token_string
    more RFC 733 addressing compatibility, but that leads to some oddities with some addresses:
        A. Jones @ Localhost
      is not legal RFC 733 (no dot in phrase, although RFC 2822 requires accepting one when parsing), nor is it legal RFC 2822 (no whitespace in dot-atom), but it is legal RFC 822. Also,
        Foo At Bar At Host
      (valid RFC 733) is ambiguous; it could be equivalent to
        Foo @ Bar @ Host
      or
        Foo At Bar @ Host
      Address local-parts are likely to appear to be phrases (RFC 733 equivalent of local-part), which is a significant change. Handling the legal RFC 822 address form above means that a phrase is now treated as a single logical (linked) token, just like a dot-atom, which changes some error/warning messages (an error associated with a token in a phrase will now print not only that token, but also the remainder of the phrase).
generalized fix_insert_ws to fix_insert_token RFC 733 date-time syntax now accepted (optional '-' between day and month, month and year, and/or before zone; colon between hours and minutes and between minutes and seconds (if present) is optional; heuristics are used to determine year/hour/minutes/seconds if whitespace is missing) fixed bug in link_list (list element count) put list element count in list structure instead of list-head token's val RFC 561/724 date-time syntax now accepted (slash-date) RFC 724/733 message-id (mach-host-phrase) now accepted RFC 561/724/733 addressing fully recognized added support for RFC 724 Fcc header field application media types updated (vnd.fints, watcherinfo+xml) ISO country codes updated fixed bug related to byte-stuffing multipart messages that don't end in CRLF RFC 724/733 syntax for In-Reply-To and References (comma-separated list) supported
0.38 13 May 2003 revised URI grammar and Content-Base, Content-Location rulesets URIs split into pieces (for folding long URIs) are checked via check_uricstrs_syntax (internal function) in new file uri.c; related changes to list handling handling of split URIs (including bracketed URIs in List- fields) improved; parsing now requires POSIX regex functions, as the RFC 2396 Appendix B regex method is used in validation of split URIs (after recombining, of course). Grammar rules are inadequate, as a URI might be split at an arbitrary place (when generating, very long URIs are only split at syntactic boundaries unless there is no alternative).
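The 0.38 entry validates recombined URIs with the regular expression from RFC 2396 Appendix B, using POSIX regex functions. A minimal sketch of that method with regcomp/regexec follows; `split_uri` and `struct uri_parts` are hypothetical names for illustration (not the library's internal check_uricstrs_syntax), and the fixed buffer sizes are arbitrary assumptions.

```c
#include <regex.h>
#include <string.h>

/* RFC 2396 Appendix B expression. Capture groups of interest:
 * 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment. */
#define URI_APPENDIX_B_RE \
    "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?"

/* Hypothetical container for the five top-level URI components. */
struct uri_parts {
    char scheme[64];
    char authority[256];
    char path[256];
    char query[256];
    char fragment[256];
};

/* Copy capture group n of src into dst; an unmatched group yields "". */
static void copy_group(const char *src, const regmatch_t *m, int n,
                       char *dst, size_t dstlen)
{
    size_t len;

    dst[0] = '\0';
    if (m[n].rm_so < 0)
        return;
    len = (size_t)(m[n].rm_eo - m[n].rm_so);
    if (len >= dstlen)
        len = dstlen - 1;               /* truncate rather than overflow */
    memcpy(dst, src + m[n].rm_so, len);
    dst[len] = '\0';
}

/* Hypothetical helper: split an already-recombined URI string.
 * Returns 0 on success, -1 if the expression fails to compile or match. */
int split_uri(const char *uri, struct uri_parts *p)
{
    regex_t re;
    regmatch_t m[10];                   /* whole match plus 9 groups */
    int rc;

    if (regcomp(&re, URI_APPENDIX_B_RE, REG_EXTENDED) != 0)
        return -1;
    rc = regexec(&re, uri, 10, m, 0);
    regfree(&re);
    if (rc != 0)
        return -1;

    copy_group(uri, m, 2, p->scheme, sizeof p->scheme);
    copy_group(uri, m, 4, p->authority, sizeof p->authority);
    copy_group(uri, m, 5, p->path, sizeof p->path);
    copy_group(uri, m, 7, p->query, sizeof p->query);
    copy_group(uri, m, 9, p->fragment, sizeof p->fragment);
    return 0;
}
```

The Appendix B expression matches every string, so it only separates a URI into components; it does not reject malformed ones, and each component still needs its own syntax checks.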
validation of URI schemes enumeration of registered uri-schemes more URI (RFC 2396) syntax checks 1 token for Content-Location hook liberalized Content-Location, Content-Base and List- (RFC 2369) field grammar rulesets revised hdrtest.c and placed under SCCS control fixed bug in error counting fixed bug in unlink_list corrected typos in test files tm20 & tm21 corrected typos in documentation revised RFC 2156 grammar rulesets to provide more structured token streams revised RFC 2506/2533/2738 grammar rulesets to provide more structured token streams newsgroup and distribution parsing made more liberal, with errors detected and flagged in newsgroups.c added additional regression test improved handling of multiple '@' in addr-spec (specific error messages rather than generic parser "syntax error"); also handled in angle-addr and msg-id handle RFC 733 Message-ID (phrase); also handles some broken modern fields minor performance tweaks
0.37 09 May 2003 fixed some documentation typos added test message with ASCII NUL in various contexts, fixed some bugs related to display of NUL in error/warning messages made is_cfws() more robust added test message with unbalanced parentheses and stray [ in domain literal corrected omission in documentation for obsolete_name field_state structure member added function to enumerate known fields, changed some variable names to avoid clash with new function added function to enumerate mime-compatible charsets, changed some variable names to avoid clash with new function added function to enumerate dispositions, changed some variable names to avoid clash with new function handled reporting errors for some Content-MD5-related requirements combined/cached some information in header_end to improve performance ensure that body_end is called after header_end fixed bugs in error counting functions handle long encoded-words in fix_encode implemented fix_split_enc fixed bug in encode_b64_word fixed bug related to default media type and subtype
when Content-Type field has a syntax error added test messages for Content-Type vs. Content-Transfer-Encoding issues revisions to distinguish entity domain as specified by Content-Transfer-Encoding vs. as determined by examining entity body content gperf for MDN options and modifiers gperf for external-body access-types modified tm? and tm?? files to use Content-Description instead of Comments handle extension and unregistered address-types, MTA-name-types, and diagnostic-types validate disposition notification option parameter attributes (including extensions via application hook) validate MDN disposition modifiers (including extensions via application hook) validate micalg values in Content-Type micalg parameter used with multipart/signed and in Received-content-MIC MDN fields gperf for magic newsgroup keywords all validation application hooks now take a struct token pointer revised date_time.c to handle validation of leap seconds some single-production rules eliminated from grammar improved error checking and correction during generation of In-Reply-To and References fields with obsolete phrase fixed bug in test program which prevented effective memory allocation debugging fixed bug in handling of list of "addresses" (for mixed angle-addr/addr-spec in Received field for clause) fixed bug removing obsolete phrase when generating In-Reply-To and References fields removed some redundant code added assertions revised for gperf 3.0 new application/vnd.ibm.rights-management media type new audio/dsr-es201108 media type
0.36 26 Apr 2003 package name changed to reflect generic message parsing made makefile recipes more robust fixed bug in decode_encoded_word fixed bug in fix_encode revised fix_field_errors to tag text strings if language is specified and valid consolidated tests for WS, FWS, CFWS in public utility functions improved field generation revised some fold_val values simplified lexical analyzer and parser by using gperf for more keywords; some still remain due
to reduce/reduce conflicts which would otherwise be introduced. split parse.c in two revised implementations of token_string, tokens_string eliminated (internal) use of linktok in favor of the slightly slower but more flexible link_tokens revised error processing to defer string creation as long as possible and to prepare for error/warning message i18n added support for some RFC 724 constructs (empty From, To, Cc field bodies) removed attempted handling of spurious RFC 2821 mix of angle-addrs w/ addr-specs, multiple addr-specs (didn't work and caused parsing errors) registration information for RFC 2503 application media subtypes appears to have disappeared from the IANA registry! 2503 removed from list of supported RFCs. combined entity2.c into header_body.c changed append_body_* function calls to handle more of the grunt work of dealing with charset, language, and disposition combined MIME cache items into a structure for ease of maintenance when adding cache items; simplified clear/copy check for and remove duplicate parameters (based on attribute name) during field generation fixed bug when inserting multipart delimiter copied from parent entity's boundary parameter lexical analyzer now puts tokens in a field structure; some performance improvement and some API changes increased use of const qualifier to detect bugs, at the expense of several spurious compiler warning messages removed fdebug for function debugging; can use tracing on some OSes for the same purpose, but without the overhead (2%) and code bulk (6% of source code lines) application media types updated from IANA registry: pkix-pkipath, vnd.llamagraphics.life-balance.exchange+xml, vnd.yamaha.hv-dic, vnd.yamaha.hv-voice, vnd.yamaha.hv-script video media types registry change date updated (no substantive change to registry) performance improvement in lexical analyzer support functions (token allocation and initialization) generation of index.html for navigation; cross-reference generated by cxref
(not in tar archive) added support for Auto-Submitted field defined in draft-moore-auto-email-response-00 fixed error in tm27 revised checks for encoded-word context based on RFC 2047 clarification received from Keith Moore fixed bugs in error list insertion latest time zone information used
0.35 07 Mar 2003 fixed bug in digestify fixed bug in categorizing decoded type of encoded token added some design philosophy text to the documentation functions provided for decoding RFC 2047 Q- and B-encoded encoded-words warning and error messages added for improper use of encoded-words token structure has some added flags to support detection of inappropriate use of encoded-words token structure has added pointer to enclosing header structure field structure has added pointer to enclosing entity structure minor cleanup of grammar file to remove empty braces where possible in header list (bison "type clash" error avoidance) gperf for transfer encodings updated message subtypes (RFC 3261 "sip") gperf for all media types and subtypes corresponding changes to use gperf; performance improved for most functions in utils.c hooks added for extension tags for charsets, languages, media types and subtypes added IANA-registered language tags to ISO tags and added appropriate checks to bad_language entity structure now holds structure pointers to effective media type (and subtype) and transfer encoding (and domain) in addition to the pointers to the header tokens performance tweak for included strcasecmp and strncasecmp message/multipart subtypes are now validated when encapsulating or creating a multipart entity insert_fields_from_string function removed from public interface in preparation for handling RFC 2047/2231 encoding of header content. Use insert_header_line instead.
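The 0.35 entry adds functions for decoding RFC 2047 Q- and B-encoded encoded-words. As a rough illustration of the Q case only, the sketch below decodes the encoded-text part of an encoded-word: '_' stands for the octet 0x20, and '=' followed by two hex digits gives one octet. Both hex-digit cases are accepted, in the spirit of the lower-case hex handling noted elsewhere in the 0.35 entry. `q_decode` is a hypothetical name; B decoding, charset conversion, and parsing the surrounding `=?charset?Q?...?=` syntax are out of scope.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit, accepting both cases; -1 if not a hex digit. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    c = tolower((unsigned char)c);
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper: decode the encoded-text of a "Q" encoded-word
 * (the part between "?Q?" and "?="). Writes at most outlen-1 octets plus
 * a terminating NUL. Returns the number of octets decoded, or -1 on a
 * malformed "=" sequence or if out is too small. */
long q_decode(const char *in, char *out, size_t outlen)
{
    size_t n = 0;

    if (outlen == 0)
        return -1;
    while (*in != '\0') {
        int c = (unsigned char)*in++;

        if (c == '_') {
            c = ' ';                    /* '_' always means 0x20 */
        } else if (c == '=') {
            int hi = hexval(in[0]);
            int lo = (hi < 0) ? -1 : hexval(in[1]);  /* never read past NUL */

            if (lo < 0)
                return -1;
            c = hi * 16 + lo;
            in += 2;
        }
        if (n + 1 >= outlen)
            return -1;                  /* no room for octet plus NUL */
        out[n++] = (char)c;
    }
    out[n] = '\0';
    return (long)n;
}
```

For the RFC 2047 example `=?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?=`, passing `Keld_J=F8rn_Simonsen` yields 18 ISO-8859-1 octets; mapping them to another charset is a separate step.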
eliminated redundant display_name token flag in favor of in_phrase improved error/warning messages for comments and phrases regex length limit on encoded text in encoded-words to avoid REJECT for some errors made as-read token list doubly linked to improve performance of several functions revisions to make.file to streamline build when generated header files are unchanged from previous build revised grammar file and basic field checks to deal more gracefully with Usefor space-after-colon requirement revised format of warning and error message X- headers for maximal portability header generation via parser and correction of errors CRLF between CLOSE_DELIMITER and epilogue is now stored as separator instead of as part of epilogue body text revised grammar to parse (with error messages) broken MIME parameters generated by Netscape/Mozilla more appropriate empty list element error/warning messages revised error handling; attach new struct error to token, field, or entity responsible for error or warning revised handling of MIME parameters so that RFC 2231 encoded parameters are output correctly via print_message, etc. 
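Several 0.35 items concern RFC 2231 extended parameters (generating them, and printing them correctly via print_message). The sketch below builds the extended-value form charset'language'encoded-octets, percent-encoding every octet that is not an RFC 2231 attribute-char: SPACE, CTLs, non-ASCII octets, '*', ''', '%', and the RFC 2045 tspecials are encoded. `rfc2231_encode` is a hypothetical helper; parameter continuations (e.g. `title*0*=`, `title*1*=`) and line folding are not handled.

```c
#include <stdio.h>
#include <string.h>

/* Nonzero if c may appear unencoded in an RFC 2231 extended value. */
static int is_attribute_char(unsigned char c)
{
    if (c <= 0x20 || c >= 0x7F)         /* SPACE, CTLs, DEL, non-ASCII */
        return 0;
    /* RFC 2045 tspecials plus '*', '\'' and '%' excluded by RFC 2231 */
    return strchr("()<>@,;:\\\"/[]?=*'%", c) == NULL;
}

/* Hypothetical helper: write charset'language'percent-encoded-value into
 * out. A NULL lang produces an empty language field. Returns the length
 * written (excluding NUL), or -1 if out is too small. */
int rfc2231_encode(const char *charset, const char *lang,
                   const char *value, char *out, size_t outlen)
{
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char *p;
    size_t pos;
    int n = snprintf(out, outlen, "%s'%s'", charset, lang ? lang : "");

    if (n < 0 || (size_t)n >= outlen)
        return -1;
    pos = (size_t)n;
    for (p = (const unsigned char *)value; *p != '\0'; p++) {
        if (is_attribute_char(*p)) {
            if (pos + 1 >= outlen)
                return -1;
            out[pos++] = (char)*p;
        } else {
            if (pos + 3 >= outlen)
                return -1;
            out[pos++] = '%';
            out[pos++] = hex[*p >> 4];
            out[pos++] = hex[*p & 0x0F];
        }
    }
    out[pos] = '\0';
    return (int)pos;
}
```

With charset `us-ascii`, language `en-us`, and the value `This is ***fun***`, the result is the value from the RFC 2231 example, which would be emitted as `title*=us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A`.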
reorganized functions in source files to maintain reasonable file size and to hide as many of the non-public functions as practical consolidated common token-handling code in lexical analyzer improved handling of message body hooks, particularly for MIME external-body eliminated REJECT and variable trailing context from lexical analyzer added more post-processing of flex-generated files to improve portability separate tokens returned for runs of spaces and runs of tab characters (some header definitions preclude tabs) added robust parsing of lower-case hex digits in quoted-printable encoded body text uncommented Final-Log-ID implementation (appears in 1894bis-02.txt draft) more specific error/warning messages for some types of illegal input preprocessed MIME parameter lists in order to simplify get_parameter various performance improvements case-independent lexical analyzer; case dependencies handled by functions called from parser reformatted source code for consistency (using GNU indent) moved most code out of grammar file function to generate RFC 2231-compliant MIME parameters added some RFC 733 syntax handled added gperf files for MDN Disposition field items updated RFC references removed some special-case lexical tokens (q, 0, 1) improved error/warning messages for extended parameters automatic update of gperf input file sources via wget relocated copies of gperf input file sources to top-level directory improved error/warning messages for msg-id series and lists, also address and mailbox lists moved most code out of lexical analyzer pattern file split header file private.h added RCS/SCCS id strings to document build removed y.output and spamdetect* from distribution
0.34 30 Sep 2002 interim release pending flex update and Usefor draft issues resolution protection of lex regular expressions from POSIX/non-POSIX repetition sequence interpretation differences updated bison.patch (more pointer assignment fixes, unrelated to mailparse) uses 2002c tzdata released
Apr reduced header files needed by applications to one file: mailparse.h (should be installed in the system include path) debugging code always included (avoids problems when library is compiled with DEBUG and application is not or vice versa) shared library via libtool 'install' target added optional timeout for read at EOF, primarily for input from stream associated with a socket which has O_NONBLOCK set (avoids hanging forever on dropped connections) added flag to suppress error headers fixed bug in processing of 'for' clause of Received headers RFC 3282 reference added, replacing draft document reference when unstuffing byte-stuffed messages (e.g. POP RETR data), a line with a lone '.' is considered the end-of-file when byte-stuffing, a line consisting of a lone '.' (terminated with CRLF) is written after the end of the message (as when sending via SMTP DATA) fixed memory fault bug after successful parser return eliminated redundant delete_parameters function (use free_parameters) added argument checking code in many functions uses flex 2.5.11 or later for reentrant lexical analyzer fixed bug when inserting first header from string entity mode 0 used when inserting a single header to avoid spurious warnings about header number and cross-reference issues uses reentrant functions for date/time; consolidated date/time string generation code removed never-reduced rule from grammar (thank you yacc; bison didn't catch it) uses flex 2.5.15 or later for mailflex.h instead of sc.h uses gperf for day-of-week and month names (faster) added IPv6 domain literal parsing added RFC 3297 header Content-alternative added 3297 examples to distribution added RFC 2530 header Media-Accept-Features cached header properties to improve performance improved checking of message header compatibility and maximum count dependencies in make.file improved to include source code directory mailparse.h requires flex 2.5.21 or later (flex keeps changing) simplified header inclusion for build
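The 0.34 byte-stuffing items follow the SMTP DATA / POP3 RETR convention: a line that begins with '.' gets an extra '.', and a line containing only '.' ends the data. A per-line sketch of both directions follows, assuming each line is a NUL-terminated string that still carries its CRLF; `unstuff_line`, `stuff_line`, and `enum unstuff_result` are hypothetical names, and a lone '.' terminated by a bare LF is not treated as an end marker here.

```c
#include <string.h>

/* Result of examining one line of byte-stuffed input. */
enum unstuff_result { UNSTUFF_DATA, UNSTUFF_END };

/* Hypothetical helper: undo dot-stuffing for a single CRLF-terminated
 * line. A line consisting only of "." CRLF marks end of data; otherwise
 * one leading '.' is removed in place. */
enum unstuff_result unstuff_line(char *line)
{
    if (strcmp(line, ".\r\n") == 0)
        return UNSTUFF_END;
    if (line[0] == '.')
        memmove(line, line + 1, strlen(line));  /* moves the NUL too */
    return UNSTUFF_DATA;
}

/* Hypothetical helper: apply dot-stuffing to one line by doubling a
 * leading '.'. Returns the output length, or 0 if out is too small. */
size_t stuff_line(const char *line, char *out, size_t outlen)
{
    size_t len = strlen(line);
    size_t extra = (line[0] == '.') ? 1 : 0;

    if (len + extra + 1 > outlen)
        return 0;
    if (extra)
        out[0] = '.';
    memcpy(out + extra, line, len + 1);         /* copy including NUL */
    return len + extra;
}
```

Because the end-marker test runs before a leading dot is removed, a stuffed data line "..\r\n" is correctly returned as the data line ".\r\n". When sending, the caller stuffs each message line and then writes the terminating "." CRLF itself.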
fixed minor bugs in conversion of token lists to strings added RFC 2047 to documentation list of RFCs tag encoded-words (RFC 2047) in headers check length of header lines with encoded-word per RFC 2047 illegal use of RFC 2047 encoding is checked and reported in quoted strings and parameters insert_header_line, build_header_string argument for header line length added added checks for ISO 639 language codes and ISO 3166 country codes updated gperf patch to use strncasecmp when -Acl gperf options are specified. That permits checking a language subtag without modifying the tag or copying the subtag improved determination of optimal key position set in gperf patch added checks for charset tags added Received-content-MIC MDN field per RFC 3335 removed MS executables from distribution because MS OSes are not POSIX compliant (no fcntl) added detailed syntax checks for address local-part
0.33 31 Mar 2002 some cleanup of fws rules in grammar support for alternative Content-Location header styles (w/o comments, but quoted) compaction and cleanup of grammar file RFC 2184 support removed revised header inclusion order to work around ast/gcc conflict issues
0.32 13 Mar 2002 removed dead code from parameters.c checks encoding of enclosed entities vs. parent encoding
0.31 12 Mar 2002 added RFC 2919 List-ID header fixed some CFWS bugs in token.c removed dead code from token.c cleaned up warnings vs. RFC 1036 and Usefor draft added RFC 2369 headers: List-Archive List-Help List-Post List-Owner List-Subscribe List-Unsubscribe fixed typos in fields.gperf
0.30 10 Mar 2002 fixed some typos in documentation made print_parameters public and added third argument to permit control of which components are output revised method of protecting line numbers while editing SCCS-maintained source files improved handling of unrecognized message subtypes and of unusual message subtype content (e.g. message/html) encapsulate() now will properly handle oddball message media types (e.g.
message/html) fixed small bug in mailparse usage message Content-MD5 header (RFC 1864) added more stringent checking of RFC 1036 header requirements (SP after colon) RFC 1036 cmsg hack no longer supported (draft of successor may prohibit what RFC 1036 strongly recommends with no transition period) Control header checking now handled in mail_parse.y More formatting cleanup of grammar file changes to token structure to make list and CFWS processing simpler; required changes in several files rolled message_id.c into address.c check for default charset parameter in Content-Type header, which can be eliminated if equal to default us-ascii fixed subtle bug in handling extended parameter values handles extended parameters for Disposition-Notification-Options more header inclusion cleanup documentation for token links and list traversal message/external-body access-type "url" added per RFC 2017
0.29 25 Feb 2002 revised key sets for gperf runs no obsolete RFC 1327 headers; headers now final except for possible clarification of questionable ones (Content-Location, Reporting-UA, Final-Log-ID) bison 1.33 now used (patch for misc.
bison issues included) [affected regression output] RFC 1700 has been obsoleted by RFC 3232; references updated more documentation updates documented partial support for problem headers and MIME media types added MIME media types: application/iso-10161-ill-1 (RFC 2503) application/ill-ddi (RFC 2503) application/index.* (RFC 2652) application/pkcs10 (RFC 2311) application/pkcs7-mime (RFC 2311, 2633, 2634) application/pkcs7-signature (RFC 2311, 2633, 2634) audio/L16 (RFC 2586) multipart/encrypted (RFCs 1847, 3156) multipart/related (RFC 2387) multipart/signed (RFCs 1847, 2311, 2633, 2634, 3156) text/directory (RFC 2425, 2927) gperf patch revised again; now also includes code to determine minimal key set (case-independence code also improved)
0.28 20 Feb 2002 revised gperf case-independence patch
0.27 20 Feb 2002 make.file changes to reduce bulk of ms executables combined SCCS ID and copyright in gen* files remaining RFC 2156 headers added: Discarded-X400-IPMS-Extensions Discarded-X400-MTS-Extensions Original-Encoded-Information-Types X400-Content-Type X400-MTS-Identifier X400-Received fixed bug in handling of RFC 1036 "Re:" hack added (buggy) RFC 2156 example some cleanup of header file inclusion
0.26 18 Feb 2002 fixed problem causing multiple SCCS IDs in executables and object files corrected typos in documentation more documentation improvements RFC 1894 headers added: Action Arrival-Date Diagnostic-Code DSN-Gateway Last-Attempt-Date Original-Envelope-ID Received-From-MTA Remote-MTA Reporting-MTA Status Will-Retry-Until minor changes in hooktest resulting in improved regression test output grammar revision handles more broken messages (e.g.
the original RFC 1894 example 9.4) message/delivery-status (RFC 1894, 2852) supported Some RFC 2156 headers added: Alternate-Recipient Autoforwarded Autosubmitted Conversion Conversion-With-Loss DL-Expansion-History Deferred-Delivery Delivery-Date Disclose-Recipients Generate-Delivery-Report Incomplete-Copy Latest-Delivery-Time Message-Type Originator-Return-Address Prevent-NonDelivery-Report Priority Reply-By Supersedes X400-Content-Identifier X400-Content-Return X400-Originator X400-Recipients bugs when changing input buffers fixed spamdetect dehtml cleaned up and bugs fixed make.file revisions to facilitate building w/o gperf control.c now generated and maintained by gperf received.c now generated and maintained by gperf additional files via.gperf, via2.gperf, and with.gperf for related keywords inlist.c no longer used case-independence patch for gperf 2.7.2 included more cleanup of utils.c several changes to facilitate maintenance: header counts tallied in header_info instead of on the fly; also saves considerable space in parameters structure application hooks stored in separate structure; copy is a simple memcpy function call check header fields vs. type of entity and issue error message if incompatible tentative (still awaiting response re.
RFC issues) implementation for some headers: Content-Base (RFC 2110) Content-Location (RFC 2557) Reporting-UA (RFC 2298) combined headers.c into parameters.c, made more functions static more multipart and broken-MIME grammar tweaks option for substituting canonical header capitalization when inserting headers some cleanup and simplification of digestify added hooks for processing at end of MDN fields, end of DSN per-message fields, end of DSN per-recipient fields
0.25 12 Feb 2002 added Prerequisites section to READ_ME.txt restored order of calling hooks destroyed in 0.24; error and warning messages are cached per header until after user hooks are called made more functions static separated public and private function declarations #ifdef'ed some unused functions minor make.file tweaks; mostly for development aids (time) zones now generated from data by script and gperf; offset stored in struct with zone string added encoding functions for quoted-printable and base64 encodings documentation updated cleanup of token structure and some efficiency improvement related to token type matching message body now stored in a header structure; permits separate flags for headers and body content changed names of some functions to better describe what they do
0.24 7 Feb 2002 more make.file refinements for building distribution added ability to parse from a string functions for copying message structure more functions for generating content; generating and inserting folded header lines, body text lines, delimiters, encapsulating messages, generating multipart composite entities digestify executable consolidated several files and made more internal functions static call end of headers hook for messages with no body section fixed small bug in token_string() quoted-printable-like encoding for non-ASCII or control characters in warning and error messages (more concise)
0.23 1 Feb 2002 fixed a few more make.file bugs (distribution) URI parsing nearly ready (waiting for resolution of
Content-Location and Content-Base BNF issues, as well as RFC 2396 issues) fine-tuning of newsgroup and system rulesets for RFC 1036 headers more improvements to error message output formatting and code maintainability resolved more issues with multipart messages; badly constructed encapsulations caught and reported added option to mailparse program to exclude an RFC from error reporting efficiency improvements (removed some old code from linktok, hash table for header field name lookup) eliminated memory leaks initialize_lexer no longer needs to be called prior to mailparse cleared bison/gcc problems with yyparse (now mailparse_internal) declarations, also obviates ugly casts in mail_parse.y additions to parameters structure to support multipart/related; multilinked structure is retained until end of message corrected typos in documentation separated reentrant flex structure from other parameters Disposition header supported (RFC 2298) make.file and makefile cleanup make.file improvements for making distribution make.file rework to eliminate gmake's annoying drivel eliminated most compiler diagnostic warnings (gcc and MSVC++) except for boilerplate code from flex and bison patched bad boilerplate code from bison/flex (with sed) to eliminate more compiler diagnostic warnings moved non-essential code out of grammar definition file restored gen* functions to libmailparse.a, eliminated separate libmailgen.a eliminated some unnecessary flags and data in parameters structure fixed bug in charset parsing (backslash) more robust handling of malformed messages Support for more RFC 2298 headers: Disposition-Notification-Options Disposition-Notification-To Error Failure Final-Recipient MDN-Gateway Original-Recipient Original-Message-Id Warning multipart/report (RFC 1892, 1911) message/disposition-notification (RFC 2298, 2421, 2422, 2423) simplified and improved parser/lexical analyzer interface partial support for message/delivery-status
0.22 17 Jan 2002 fixed bug in parsing
extended predicates in Content-Features fixed make.file bugs
0.21 17 Jan 2002 recursive handling of DEFS in makefile while building Windows/Intel executables RFC 2424 added to mailparse usage message RFC 3066 added to mailparse usage message corrected some subtle bugs in error/warning reporting Content-Disposition header supported (RFC 2183) Content-Duration header supported (RFC 2424) Content-Features header supported (RFC 2533, 2912) Content-Language header supported (draft-alvestrand-content-language-02.txt) Accept-Language header supported (draft-alvestrand-content-language-02.txt) Sensitivity header (RFC 2156) Importance header (RFC 2156) Xref header (RFC 1036) improved language tags support added option for reentrant parser testing to hooktest fixed some remaining bugs in multipart MIME parsing handling of MIME message/external-body type encapsulating multipart or message types improved fixed bugs in addr-spec syntax check (showed up w/ 2822example9) improved X-Err, X-NG, X-Warning output (fold long lines) made some minor changes to RFC example files to avoid parse errors fixed bug triggered when yyerror called major revisions to parser and lexical analyzer: lexical analyzer is simplified and parser does more work in building constructs from simple tokens cleanup (reorganization) of mailparse.y added 2912 examples to distribution suffix rules to simplify makefile fixed bug tickled when looking for a content-type parameter when there are none added all RFC example files to regression test separated configurable part of makefile from static part
0.20 4 Jan 2002 fixed typos in documentation updated reference in zones.c updated copyright notices for the new year verified reentrant operation fixed minor bugs in .skl file; it should now work for non-reentrant lexical analyzers split library functions into two libraries: libmailparse.a for parsing libmailgen.a for generating messages and components cleaned up documentation and makefile removed html.o from
libmailparse.a limit of parsing is at the transfer encoding level, not data content (html, postscript, etc.)
0.19 31 Dec 2001 more MIME issues: multipart/digest and message/* parsing cleanup of mime.c added regression test more examples
0.18 27 Aug 2001 major change: parser is reentrant yacc and lex no longer supported bug in address/mailbox list handling fixed hooks called as C functions at run-time; no need to recompile to change hooks parser and support functions placed in library archive changes to lexical analyzer and parser to support decoding of body section (e.g. quoted-printable) improved handling of 8-bit characters more strict adherence to MIME message/entity model: parser states now 687 (not that that matters now that yacc is no longer supported...) extraction of MIME parts is no longer supported in the example program; fully MIME-compliant extraction requires caching the entity headers, which is a bit too complex for a simple sample program headers in MIME entities are tested for RFC compliance improved handling of ASCII NUL another example program, spamdetect, is included in distribution documentation improved error messages
0.17 12 Aug 2001 added "version resource" to MS executable more syntax checks/warnings for length limits on local-part, subdomains, domains, angle-addrs, domains in routes byte-stuffing/unstuffing code and flags minor changes to mail_parse.y to reduce macro evaluation issues with hooks -lfl not needed with flex 2.5.4
0.16 9 Aug 2001 fixed lexical analyzer handling of difficult messages (1st body line begins w/ whitespace) end-of-body-section hook invoked after each MIME section
0.15 9 Aug 2001 fixed typos in generating removed extraneous code in errors.c cleaned up error handling call some hooks even with erroneous headers added hook for start of message (for initialization of user code) argument to token_string to control writing of CFWS fixed dependencies in makefile for hooks.h
0.14 8 Aug 2001 fixed bugs in newsgroup,
distribution list handling
0.13 8 Aug 2001 Added support for RFC 1036 "Re:"; 666 parser states simplified mime.c and newsgroups.c code errors for empty newsgroups, distribution lists
0.12 8 Aug 2001 separated dows and mons, zones from date_time.c and made global: a) for use by other files, b) for ease in updating (zones) added hooks for end of headers, end of message added code to distribution for generating date-time, host names, domain names, domain literals, and message-ids fixed error messages in message_id.c consolidated error handling for address fields used base rules to handle resent- fields: number of parser states reduced to 660 some changes to handle return-path peculiarities improvements to error reporting for empty list elements and comments in address fields corrected parsing of domain literals corrected parsing of address lists where addresses have source routes fixed typo in errors.c changes to avoid namespace clashes and work around header file inclusion order idiosyncrasies with The World's Buggiest Software (TM)
0.11 5 Aug 2001 forced "binary" I/O mode for Windows executable to prevent extraneous doubled CRs in output fixed header inclusion in date_time.c (stdlib.h for abs) sed scripts in makefile to fix problems with AT&T yacc output corrected use of extension vs.
user-defined header terms fixed typos in generating
0.10 3 Aug 2001 Fixed typo in errors.c Fixed typo in domain.c included strtoul.c moved msg-id syntax check to message_id.c, modified received.c, control.c, mail_parse.y accordingly fixed some problems arising from error handling of angle_addr and msg_id in mail_parse.y removed stray debugging statements angle-addr syntax checking in address.c; modified received.c, mailbox.c accordingly user pointer in token structure renamed struct err to struct token more compact error messages general cleanup of code some efficiency tweaks major changes to error/warning reporting: compatibility (interoperability) issues are always reported (except in quiet mode, and when explicitly suppressed). modes (RFCs and -g) may cause some of these to be treated as parse errors. hard parse errors are always reported (except in quiet mode) fixed handling of RFC 1036 "cmsg" Subject hack: number of parser states increased to 716 revised rfc2822grammar_simplified.txt w.r.t.
ASCII NUL, RFC 1036 "cmsg" hack, obs-text issues and added some notes at the end of the file check start and end of header in each header line ruleset, as well as checking line length at each intermediate CRLF (line folding), saving the composite result in the value of the header field token for use in detecting bad header lines
0.9 29 Jul 2001 fixed error messages related to header counts removed unused token declaration from mail_parse.y notes on generating messages added to distribution added hooks for doing real work consolidated some rulesets in mail_parse.y; number of states is now 709, so AT&T yacc (limit 750 states) can handle it again still more flags in token mailbox validation revised to handle full mailbox syntax address and list (mailbox and address) validation functions added to mailbox.c distribution and list validation functions added to newsgroup.c domain validation bug (null domain) fixed validate_date_time sets a time_t for the date and time if it's valid command-line option to suppress all output put MIME parameter extraction in mime.c reorganized rulesets in mail_parse.y
0.8 27 Jul 2001 RFC 822 no longer in default added capability to specify any numeric RFC as an option added option to extract MIME parts consolidated various types of string tokens RFC 821 warnings added consolidated checking of date-time in date_time.c consolidated checking of Received headers (except date-time) in received.c getopt emulation to work around implementation issues support for RFC 821 "string" in Received header 'id' clause issues with 2821 vs.
2822 Received header 'for' clause require not supporting 2821 mix of addr-spec and angle-addr or multiple addr-specs RFC section numbers for most warnings and error messages buffers for error messages and warnings are allocated as needed added option to not echo message body consolidated Control header checking in control.c some efficiency improvements tweaks to lexical analyzer to handle some errant messages 0.7 22 Jul 2001 added mailparse.exe to distribution bug fixes: cattok revised fixed handling of FWS inside comments and inside quoted strings multipart MIME message boundaries (top level only) recognized in message body command-line -m option for reporting MIME errors makefile tweaks for building distribution files simplification of grammar for comments consolidated warnings about lone CR, lone LF, NUL further simplification of date validation code cleanup of warnings and error messages 0.6 20 Jul 2001 cleaned up headers and header inclusion getopt: don't need to repeat flags for each file (files should be of the same type), can combine flags, can use -- to end options (i.e. can read a file named "-s") bug fixes: lexical analyzer field name rule could eat up too much text in body containing a colon; had to remove trailing context fixed error in rfc2822grammar_simplified.txt thanks to suggestions from Pete Resnick lexical analyzer rule for recognizing MIME multipart messages; but implementation is incomplete added MIME example files to distribution 0.5 19 Jul 2001 eliminated mktime() in favor of pared-down code that just normalizes the date and computes day-of-week (strictly speaking, normalization shouldn't be required), thus avoiding trouble with some library versions of mktime and simplifying the makefile. -t command-line option is needed to produce 2822 error reporting; default is RFC 822 rules 0.4 17 Jul 2001 MIME content-Type syntax confirmed; corresponding tweaks to lexical analyzer and parser lexical analyzer structure declarations vs. 
function prototypes (again; this time for sure) separate file with main() so parser can be called from within an application broke out support functions into separate files command-line options to enable SMTP (RFC 2821) and Usenet (RFC 1036) error reporting, which are now disabled by default (full parsing is still performed). ASCII NUL handled correctly with flex 2.5.4 (2.5.2 had a bug) 0.3 16 Jul 2001 portability issues: structure vs. function declarations in mail_lex.l before in mail_parse.y More general Usenet Path: header parsing. Raised number of shift/reduce conflicts to 94. MIME header parsing. shift/reduce conflicts: 132. handling multiple body parts with MIME headers in the body is not supported. parser grammar is now too complex for AT&T yacc. OK with Berkeley yacc 1.9, bison 1.28. bug fixes: days per month bug fixed 8-bit characters and NUL not allowed in ctext, dtext, qtext[*]. Lex complains loudly about non-portable classes. And lex doesn't ECHO 8-bit characters. NUL handling for flex doesn't yet work. * not even with obs- rules in 2822. RFC 822 permitted NUL in ctext, dtext, and qtext. reporting of lone '\r' and lone '\n' (as well as '\n' as line separator) improved. lexical analyzer now allocates a structure for each token except the empty line separating headers from body. This enables cleanup of all allocated structures in the event of a parse error. Side-effect; each header and each line of body text exists as a string of tokens until EOH (CRLF in body) or a parse error is encountered (this will enable handling of multi-part MIME messages). number of flags used per token reduced; code simplified makefile recognizes LEX = lex and performs needed filtering of source files automatically. However, lex shouldn't be used as it isn't 8-bit clean. many single-production rules eliminated from grammar. shift/reduce conflicts: 128 applied precedence and associativity for WS, CRLF, '('. shift/reduce conflicts: 26 applied precedence and associativity for '.'. 
shift/reduce conflicts: 18 applied precedence and associativity for ATEXTSTR, LANGUAGE, '\''. no remaining conflicts. 0.2 14 Jul 2001 bug fixes: Resent- header handling RFC 2821 doesn't mandate use of registered via and with terms multiple non-fatal parse errors handled per header eliminated most hard-coded RFC numbers Usenet header handling more rule consolidation in lexical analyzer cleanup of validation code added mktime() implementation to work around buggy ones 0.1 Fri, 13 Jul 2001 symbolic names for RFC numbers in constants.h lex compatibility issues: terse start condition names to avoid overflowing small fixed buffer no indented comments in rules section (replaced with blank lines via sed script to preserve line numbering) some changes to patterns (lex still complains about portability) major revisions to definitions and rules; pushed patterns down to rules from definitions and consolidated remaining definitions due to severe lex limits on definitions tables of header field names and received header clause names substantial changes to lexical analyzer - parser tokens for special characters; eliminated a bunch of tokens in favor of returning ASCII value compatibility changes in parser code: eliminated (X/Open) snprintf supplied stub ast.h supplied strcopy, strcasecmp implementations parser rules to handle pathological date-time year/hours constructs too complex to handle in lexical analyzer, e.g. 1Jan200122(10PM):33 -0000 all non resent- address fields are now validated per 2821 3.8.4, 2822 3.6.6 makefile revisions for configuration brief README this list of CHANGES tested with AT&T lex, AT&T yacc, Berkeley yacc, Flex 2.5.2, Bison 1.28 0.0 10 Jul 2001 initial release primarily for debugging issues with some lex implementations Description: change log for mparse