tools: rename convert tools to regular mk*
rfc1345convert: defer source download to makefile
hidden digraph entries for common character details Prepare fake digraphs with $ prepended for non-digraph pages.
charset: page with latin1 character table A 16x16 table with each iso-8859-1 byte in order. Rather than simply using chr(code point), it converts the byte range using Encode::decode, so it can just as easily display any other charset known by Perl. Uses the digraphs include for character details. While this may lack some characters, it is faster and easier. Plug the most glaring gaps by adding the entire ASCII range as single-character "digraphs". Linked from vi i^v (as code points can be entered there), but mostly useful as reference (not necessarily limited to vim).
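The byte-range conversion described above can be sketched as follows. This is a rough Python analogue, not the page's actual Perl code; the function name `charset_table` and the use of `None` for undefined bytes are assumptions made for illustration.

```python
def charset_table(encoding="latin-1"):
    """Return 16 rows of 16 cells, one cell per byte value 0x00..0xFF.

    Each byte is decoded through the named codec rather than mapped
    with chr(), so any single-byte charset the runtime knows will work.
    """
    rows = []
    for high in range(16):
        row = []
        for low in range(16):
            byte = bytes([high * 16 + low])
            try:
                row.append(byte.decode(encoding))
            except UnicodeDecodeError:
                row.append(None)  # byte has no meaning in this charset
        rows.append(row)
    return rows
```

Calling `charset_table("cp1252")` instead of the default yields the Windows-1252 layout from the same code, which is the point of decoding instead of using code points directly.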
rfc1345convert: prevent output encoding warnings Output is fixed code which should always be UTF-8, regardless of STDOUT capabilities (as it's rarely shown directly). Declare this to prevent warnings about wide characters (or worse).
rfc1345convert: documentation and automatic download If no source is specified on the command line, the document is downloaded from ietf.org (the official RFC body). With the addition of some perldoc describing the script (including license and examples) it could potentially even be usable by others (though I admit its scope is limited, but who knows).
digraphs: unicode 5.0 character details Recreate the digraphs include with version 5.0.0 of the Unicode Character Database, encompassing most of the glyphs missing from 4.1 (as well as minor category improvements). Fix the generator to account for undefined 'script' values, which now occur for private use characters which were previously (erroneously) categorized as 'Common'.
digraphs: alternate glyph string in include Allow the digraph include to specify string overrides in cases where a glyph should not be shown literally. These are:
- Combining characters: prepend a placeholder, since dead chars are in fact invalid on their own.
- ASCII control characters: substitute display symbols at U+24xx. Though browsers usually show a character placeholder, it's not very nice to send control chars directly.
- Other control characters: show the replacement character U+FFFD. Actually with some (Linux) fonts, the anonymous code point fallback is more descriptive, but better to be on the safe side (they still have semantic value after all).
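The three override rules can be sketched as a single lookup. This is a Python illustration only, not the generator's Perl code; the function name `display_string` is hypothetical, and the dotted circle U+25CC is an assumed choice of placeholder, since the commit does not say which one is used.

```python
import unicodedata

def display_string(char):
    """Return a safe string to show in place of a single character."""
    cp = ord(char)
    # Combining marks (general categories Mn, Mc, Me) need a base character.
    if unicodedata.category(char).startswith("M"):
        return "\u25cc" + char  # assumption: dotted circle as placeholder
    # ASCII control characters map onto the Control Pictures block.
    if cp < 0x20:
        return chr(0x2400 + cp)
    if cp == 0x7F:
        return "\u2421"  # SYMBOL FOR DELETE
    # Any remaining control character becomes the replacement character.
    if unicodedata.category(char) == "Cc":
        return "\ufffd"
    return char
```

Ordinary characters pass through unchanged, so the override only needs to be stored where it differs from the glyph itself.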
digraphs: map private use characters to modern equivalents RFC-1345 contains several characters in the private use block (for various unofficial proposals at that time) which by now mostly have official Unicode designations. Using the character value instead of intended meaning is imho stupid (as the digraphs don't make any kind of sense for most modern usage) even though Vim and other adopters do so, probably unknowingly. Try to convert these to suitable standard equivalents (going by character names, Google, context, and some guesswork).
digraphs: mark reversed matches At undefined digraph points, at least Vim also recognizes another digraph with its characters swapped if it exists. Style these cases differently, to indicate that the combination might actually do something as well. Also show the code on hover, but omit any additional information, to discourage using these instead of the original.
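The fallback lookup amounts to trying the pair as typed and then swapped. A minimal Python sketch, assuming the digraphs are held in a plain mapping from two-character strings to glyphs; the name `classify` and the status labels are hypothetical.

```python
def classify(digraphs, pair):
    """Classify a two-character cell as defined, reversed, or undefined."""
    if pair in digraphs:
        return ("defined", digraphs[pair])
    swapped = pair[::-1]
    if swapped in digraphs:
        # Not an official digraph, but an editor may still accept it.
        return ("reversed", digraphs[swapped])
    return ("undefined", None)
```

A page generator could then pick a style from the first element of the result and limit the details it shows for "reversed" cells.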
digraphs: control character names Control characters are all named <control>, which is useless for telling them apart (especially since these glyphs aren't very descriptive either). Substitute the old Unicode 1.0 names to identify them properly. In the few (latin1) cases where there's no name, at least add the code point.
digraphs: custom shiar digraphs Quite a few personal additions not in the official RFC-1345. They are indicated as such, so they should not be much of a hindrance (on the contrary, people looking for these missing characters will find and perhaps add them themselves).
digraphs: mark latin/ascii characters Add classes if characters belong to the 'Basic Latin' or 'Latin-1 Supplement' blocks (i.e. are ASCII or latin1), and indicate these on the digraphs page.
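Block membership here is a plain code point range check. A small Python sketch; the class names `ascii` and `latin1` are placeholders, not necessarily the ones used on the page.

```python
def block_classes(char):
    """Return hypothetical style classes for a character's Unicode block."""
    cp = ord(char)
    if cp <= 0x7F:
        return ["ascii"]   # Basic Latin, U+0000..U+007F
    if cp <= 0xFF:
        return ["latin1"]  # Latin-1 Supplement, U+0080..U+00FF
    return []
```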
cache unicode character details in digraph include Looking up UCD data on page generation is quite intensive for this many characters, so instead prefetch it by rfc1345convert and store it with the static digraph data.
rfc1345 digraphs include generator Put all official RFC-1345 digraphs in the digraphs.inc.pl include, by downloading the original RFC text and converting it using rfc1345convert.