unicode-sampler.git
8 years ago  drop greetings in "various" [3] languages
Mischa POSLAWSKY [Wed, 19 Aug 2015 09:21:22 +0000 (11:21 +0200)]
drop greetings in "various" [3] languages

Superfluous (especially near the end) as we already have more extensive
te(x)sts for English, Greek, and Japanese.

8 years ago  more succinct font overview
Mischa POSLAWSKY [Wed, 19 Aug 2015 08:47:25 +0000 (10:47 +0200)]
more succinct font overview

Reorganise assortment to:
- 3 lines (uppercase, lowercase, non-letters) for better discoverability;
- letters grouped together, all with case counterparts;
- repick extended latin to cover a wider range of more common glyphs;
- include more ASCII symbols for programming and internet usage;
- reduce set of mathematical additions to a good comparison of essentials;
- restrict drawing glyphs to one example per type:
  arrow, box drawing (3 styles), block shade, dingbat.

8 years ago  generic font overview at beginning
Mischa POSLAWSKY [Wed, 19 Aug 2015 08:21:56 +0000 (10:21 +0200)]
generic font overview at beginning

Move font selection text near start (but below <pre>) to have a generic
overview/comparison before delving into specifics.

8 years ago  move apl snippet with other code
Mischa POSLAWSKY [Wed, 19 Aug 2015 07:58:52 +0000 (09:58 +0200)]
move apl snippet with other code

Logical progression from Perl to even non-ASCII symbols.
Keep only the life program; the first function doesn't contribute as much,
and the second part is just unneeded gibberish.

8 years ago  perl code snippet for additional programming symbols
Mischa POSLAWSKY [Wed, 19 Aug 2015 07:58:09 +0000 (09:58 +0200)]
perl code snippet for additional programming symbols

Golf entry by teebee for character referencing
<http://golf.shinh.org/p.rb?Break+Lorem+Ipsum+fixed>
which includes a good amount of \W ASCII, including the common dollar sign,
slashes, and quotes.

8 years ago  ruby code for basic programming language overview
Mischa POSLAWSKY [Wed, 19 Aug 2015 07:44:30 +0000 (09:44 +0200)]
ruby code for basic programming language overview

Inconsequential but thematic code (encodes and displays a Unicode entity)
written to include all braces (([{|}])), hash (#), and 0/O distinction.

Inspired by font comparison snippets at
<http://hivelogic.com/articles/top-10-programming-fonts>.

8 years ago  html code at top with open <pre>
Mischa POSLAWSKY [Wed, 19 Aug 2015 07:16:42 +0000 (09:16 +0200)]
html code at top with open <pre>

To compare appearance of basic XML (tags, attribute, entity)
and support rendering the following contents as text/html
by opening a preformatted (monospaced) tag early on.

8 years ago  rewrite introduction to avoid charset and mention updated authorship
Mischa POSLAWSKY [Wed, 19 Aug 2015 07:06:04 +0000 (09:06 +0200)]
rewrite introduction to avoid charset and mention updated authorship

File encoding is insignificant so don't mention it explicitly (test works
equally well if converted to another UCS-compatible charset).
Instead list versions of this file and the covered standard.
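
The claim that the file encoding is insignificant can be checked with a
short round trip.  This is a minimal sketch, not part of the sampler: the
SAMPLE string and the round_trips helper are made up for illustration, and
only the Unicode transformation formats are tried.

```python
# Hypothetical sample mixing ASCII, Greek, a math symbol and a
# supplementary-plane ideograph (U+24D85); not taken from the sampler file.
SAMPLE = "Kuhn \u2260 \u03ba\u03cc\u03c3\u03bc\u03b5 \U00024D85"


def round_trips(text, codec):
    """Return True if text survives encoding to codec and decoding back."""
    return text.encode(codec).decode(codec) == text


# The byte sequences differ per codec, the decoded text does not.
results = {codec: round_trips(SAMPLE, codec)
           for codec in ("utf-8", "utf-16", "utf-32")}
```

Any charset that can represent every character in the file should behave
the same way; a legacy 8-bit charset would fail at the encode step.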

8 years ago  extended cyrillic samples of sakha and kazakh
Mischa POSLAWSKY [Wed, 19 Aug 2015 06:35:16 +0000 (08:35 +0200)]
extended cyrillic samples of sakha and kazakh

Between these languages the most important non-Slavic letters are present,
including common non-Russian glyphs І Ә Ө Ү Һ, a ligature Ҥ, and variants
with descender or bar (Ң Қ Ғ Ұ).
Not complete, but good for a general impression of Turkic support.

Official translations from <http://www.ohchr.org/>.

8 years ago  braille contraction missed by converter
Mischa POSLAWSKY [Wed, 19 Aug 2015 05:29:36 +0000 (07:29 +0200)]
braille contraction missed by converter

Groupsign -en- (lower e) may be used at the end of "queen" (explicitly
mentioned in <http://www.brailleauthority.org/literary/ebae2002.pdf>).

8 years ago  braille of english pangram instead
Mischa POSLAWSKY [Wed, 19 Aug 2015 03:36:55 +0000 (05:36 +0200)]
braille of english pangram instead

Replace long story in "scientific" (and very artificial) GS8 notation
by a transcription of the English panphone using common Grade-2 British
braille, courtesy of <https://www.branah.com/braille-translator>.

Loses coverage of some 8-dot cells, but includes common abbreviations and
practical orthography.  Use braille blanks U+2800 instead of spaces.
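
For reference, Unicode braille patterns are a bit mask on top of U+2800,
with dot n stored in bit n-1.  The sketch below illustrates that layout and
the blank substitution; braille_cell and braille_blanks are hypothetical
helper names, not the converter that was actually used.

```python
import unicodedata

BRAILLE_BASE = 0x2800  # BRAILLE PATTERN BLANK, the cell with no dots raised


def braille_cell(*dots):
    """Return the braille pattern character with the given dots (1-8) raised."""
    mask = 0
    for dot in dots:
        if not 1 <= dot <= 8:
            raise ValueError(f"braille dots are numbered 1-8, got {dot}")
        mask |= 1 << (dot - 1)
    return chr(BRAILLE_BASE + mask)


def braille_blanks(text):
    """Replace ASCII spaces by braille blanks (U+2800)."""
    return text.replace(" ", chr(BRAILLE_BASE))


# "Lower e" is the letter e (dots 1 and 5) shifted down to dots 2 and 6.
lower_e = braille_cell(2, 6)
lower_e_name = unicodedata.name(lower_e)
```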

8 years ago  adjust english phonetic sample to cover more sounds
Mischa POSLAWSKY [Wed, 19 Aug 2015 00:24:38 +0000 (02:24 +0200)]
adjust english phonetic sample to cover more sounds

* Reorder words for a less contrived meaning.
* Replace "wanted before" by "looked for it", to add /ʊ/ (the only absent
  monophthong) without losing any sounds.
* Introduce /ʉ/ by using a slight Scottish accent for "hue".
* Include commonly found glottal stop before "all".

8 years ago  english phonetic pangram/panphone for ipa showcase
Mischa POSLAWSKY [Tue, 18 Aug 2015 23:49:25 +0000 (01:49 +0200)]
english phonetic pangram/panphone for ipa showcase

Replace the poor "linguistics" section by the last example from
<http://www.quora.com/Is-there-a-text-that-covers-the-entire-English-phonetic-range>
which covers most English phonemes including rarer distinctions
(m/ɱ, x, l/ɫ, w/ʍ).

Manually transcribed in an attempt to cover most sounds naturally, using a
mostly Irish/generic pronunciation (I'm not native though).  Compare
<https://en.wikipedia.org?oldid=673810019> for an overview of regional
differences.  Alternate IPA transcriptions in native dialects found at
<https://www.reddit.com/r/conlangs/comments/2quvnf/make_a_dialect_of_english/>
but not used due to more limited inventories.

8 years ago  japanese iroha in all scripts
Mischa POSLAWSKY [Fri, 14 Aug 2015 21:27:27 +0000 (23:27 +0200)]
japanese iroha in all scripts

Kanji, hiragana, and original version downloaded from
<https://en.wikipedia.org?oldid=670286422>.

Katakana transliteration from <http://www.columbia.edu/~fdc/utf8/>.

Halfwidth variant derived using perl -Mcharnames=:full -CS -pe'
package charnames; s/\S/chr vianame("HALFWIDTH ".viacode(ord $&))/ge'
with incompatible characters replaced by small forms (prefer coverage over
natural conversion) and a voiced mark appended for even more coverage.
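
The same name-based lookup can be sketched in Python for comparison.  This
is an illustration rather than the command that was run: to_halfwidth is a
hypothetical name, and unlike the one-liner above it leaves characters
without a halfwidth counterpart unchanged, so the manual replacements by
small forms are not reproduced.

```python
import unicodedata


def to_halfwidth(text):
    """Map each character to its HALFWIDTH-named counterpart where one exists."""
    out = []
    for ch in text:
        name = unicodedata.name(ch, "")  # "" for unnamed characters such as "\n"
        if not name or ch.isspace():
            out.append(ch)  # the Perl version only touches \S as well
            continue
        try:
            out.append(unicodedata.lookup("HALFWIDTH " + name))
        except KeyError:
            out.append(ch)  # no halfwidth form (e.g. KATAKANA LETTER WI)
    return "".join(out)


iroha_start = to_halfwidth("\u30a4\u30ed\u30cf")  # katakana I, RO, HA
```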

8 years ago  replace chinese extension B character from extension A
Mischa POSLAWSKY [Fri, 14 Aug 2015 19:40:34 +0000 (21:40 +0200)]
replace chinese extension B character from extension A

U+4D85 is obviously incorrect; assume U+24D85 was intended.
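
The mix-up is easy to verify from the block ranges.  A small sketch, with
the ranges of the two blocks written out by hand and cjk_block as a
hypothetical helper:

```python
# Block ranges for the two CJK extensions involved (inclusive).
CJK_BLOCKS = {
    "Extension A": (0x3400, 0x4DBF),
    "Extension B": (0x20000, 0x2A6DF),
}


def cjk_block(code_point):
    """Return the name of the listed block containing code_point, or None."""
    for name, (first, last) in CJK_BLOCKS.items():
        if first <= code_point <= last:
            return name
    return None


found = cjk_block(0x4D85)      # the character that was in the file
intended = cjk_block(0x24D85)  # the assumed intended character
```

Dropping the leading 2 keeps the code point valid but moves it from the
supplementary plane into the BMP, which is why the error goes unnoticed
unless the glyph is inspected.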

8 years ago  chinese samples of extended unicode blocks
Mischa POSLAWSKY [Fri, 14 Aug 2015 19:36:51 +0000 (21:36 +0200)]
chinese samples of extended unicode blocks

Random characters from each block from <http://ctext.org/font-test-page>.

8 years ago  chinese sample text: 1st chapter of qian zi wen
Mischa POSLAWSKY [Fri, 14 Aug 2015 19:14:55 +0000 (21:14 +0200)]
chinese sample text: 1st chapter of qian zi wen

Classic coverage poem in traditional orthography downloaded from
<http://www.gutenberg.org/ebooks/24184>.

8 years ago  chinese transliteration of 3 choice characters -ü
Mischa POSLAWSKY [Fri, 14 Aug 2015 19:02:39 +0000 (21:02 +0200)]
chinese transliteration of 3 choice characters -ü

Selected most frequently used characters ending in ü with all their tones.
Covers the most difficult pinyin (multiple accents), some limited bopomofo,
IPA tone bars (combinable into contours), and traditional/simplified glyph
comparison.
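
The "multiple accents" are a diaeresis plus a tone mark on the same base
letter.  As an illustration (not how the sample was produced), the four
toned forms of ü can be built from combining marks and composed with NFC;
TONE_MARKS and u_umlaut_with_tone are names invented for this sketch.

```python
import unicodedata

# Combining marks for the four Mandarin tones as written in pinyin.
TONE_MARKS = {
    1: "\u0304",  # macron
    2: "\u0301",  # acute
    3: "\u030c",  # caron
    4: "\u0300",  # grave
}


def u_umlaut_with_tone(tone):
    """Return precomposed u-diaeresis carrying the given tone mark (1-4)."""
    decomposed = "u" + "\u0308" + TONE_MARKS[tone]
    return unicodedata.normalize("NFC", decomposed)


toned = [u_umlaut_with_tone(tone) for tone in (1, 2, 3, 4)]
```

A font lacking the precomposed glyphs has to stack both marks itself,
which is what makes these letters a useful rendering test.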

8 years ago  chinese selection of 50 most common mandarin characters
Mischa POSLAWSKY [Fri, 14 Aug 2015 18:17:28 +0000 (20:17 +0200)]
chinese selection of 50 most common mandarin characters

Extracted from Modern Chinese Character Frequency List (updated 2005-12-21)
published by 笪骏 [DA Jun] <http://lingua.mtsu.edu/chinese-computing>.
These characters should cover 30% of modern Chinese texts.

8 years ago  tibetan declaration of human rights
Mischa POSLAWSKY [Fri, 14 Aug 2015 17:25:09 +0000 (19:25 +0200)]
tibetan declaration of human rights

Good sample copied from ཝེ་ཁེ་རིག་མཛོད <https://bo.wikipedia.org?oldid=123541>.
Prefix the title for yig-mgo, and adoption date [1948-12-10] as found on
<http://blog.amdotibet.cn/aaa999/archives/82869.aspx> for numbers.

8 years ago  tamil and kannada poems from Kermit UTF-8 Sampler
Mischa POSLAWSKY [Fri, 14 Aug 2015 17:17:51 +0000 (19:17 +0200)]
tamil and kannada poems from Kermit UTF-8 Sampler

Extracted from 2012-05-07 version of <http://www.columbia.edu/~fdc/utf8/>
by Frank da Cruz.

8 years ago  apl function for game of life
Mischa POSLAWSKY [Fri, 14 Aug 2015 17:15:00 +0000 (19:15 +0200)]
apl function for game of life

8 years ago  drop headers and abbreviate descriptions
Mischa POSLAWSKY [Fri, 31 Jul 2015 01:23:13 +0000 (03:23 +0200)]
drop headers and abbreviate descriptions

Get rid of some English clutter;
original sources should be easy to find by searching online.

8 years ago  hebrew sample
Mischa POSLAWSKY [Fri, 31 Jul 2015 00:30:32 +0000 (02:30 +0200)]
hebrew sample

Ideally test RTL, but good for modern script coverage in any case.
Best use of the common Unicode invitation so far, as it mixes direction
and includes niqqud.

8 years ago  devanagari sample
Mischa POSLAWSKY [Fri, 31 Jul 2015 00:19:09 +0000 (02:19 +0200)]
devanagari sample

Copied from <http://r12a.github.io/scripts/summaries/devanagari>.

Context-based positioning at start of last 2 lines; digits at end of line 3;
multiple combining characters at line 2 start; contextual shaping in line 1
and start of line 4.

8 years ago  spell old english in old english
Mischa POSLAWSKY [Fri, 31 Jul 2015 00:16:12 +0000 (02:16 +0200)]
spell old english in old english

Includes capital AE.

8 years ago  replace s in latin old english by long variants
Mischa POSLAWSKY [Fri, 31 Jul 2015 00:13:23 +0000 (02:13 +0200)]
replace s in latin old english by long variants

Also include precomposed st-ligature for good measure (matching runic).

8 years ago  transliterate runes with traditional orthography
Mischa POSLAWSKY [Thu, 30 Jul 2015 23:49:10 +0000 (01:49 +0200)]
transliterate runes with traditional orthography

Prefer original thorn and wynn letters.  Then "modernize" eth and long
vowels for additional coverage of Old English transcription.

8 years ago  update to current upstream version 2002/2009
Markus Kuhn [Mon, 6 Apr 2009 18:13:43 +0000 (20:13 +0200)]
update to current upstream version 2002/2009

Latest <http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt>
removes trailing whitespace.

21 years ago  update to 2002 version
Markus Kuhn [Thu, 25 Jul 2002 12:00:00 +0000 (12:00 +0000)]
update to 2002 version

Retrieved from <http://www.cl.cam.ac.uk/~mgk25>.

22 years ago  UTF-8 encoded sample plain-text file
Markus Kuhn [Fri, 20 Aug 1999 12:00:00 +0000 (12:00 +0000)]
UTF-8 encoded sample plain-text file

Extracted from <http://www.w3.org/2001/06/utf-8-test/UTF-8-demo.html>,
the earliest version I could find.