unicode-sampler.git
Mischa POSLAWSKY [Wed, 19 Aug 2015 05:29:36 +0000 (07:29 +0200)]
braille contraction missed by converter

Groupsign -en- (lower e) may be used at the end of "queen" (explicitly
mentioned in <http://www.brailleauthority.org/literary/ebae2002.pdf>).

Mischa POSLAWSKY [Wed, 19 Aug 2015 03:36:55 +0000 (05:36 +0200)]
braille of english pangram instead

Replace the long story in "scientific" (and very artificial) GS8 notation
with a transcription of the English panphone in common Grade-2 British
braille, courtesy of <https://www.branah.com/braille-translator>.

Loses coverage of some 8-dot cells, but includes common abbreviations and
practical orthography.  Use braille blanks U+2800 instead of spaces.
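
As a rough illustration of the trade-off above, the following Python sketch
relies on the layout of the Unicode Braille Patterns block, where dot n of a
cell sets bit n-1 above U+2800.  The function names are made up for this
example and are not part of the sampler.

```python
BRAILLE_BASE = 0x2800  # U+2800 BRAILLE PATTERN BLANK


def braille_cell(dots):
    """Return the braille pattern character with the given raised dots (1-8)."""
    mask = 0
    for dot in dots:
        if not 1 <= dot <= 8:
            raise ValueError(f"dot number out of range: {dot}")
        mask |= 1 << (dot - 1)  # dot n is stored in bit n-1
    return chr(BRAILLE_BASE + mask)


def uses_dots_7_8(cell):
    """True if the cell raises dot 7 or 8, which only 8-dot braille has."""
    return bool((ord(cell) - BRAILLE_BASE) & 0xC0)


def braille_blanks(text):
    """Replace ASCII spaces by the blank braille pattern U+2800."""
    return text.replace(" ", chr(BRAILLE_BASE))
```

A 6-dot transcription such as Grade-2 braille never makes uses_dots_7_8()
return true, which is the coverage given up here.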

Mischa POSLAWSKY [Wed, 19 Aug 2015 00:24:38 +0000 (02:24 +0200)]
adjust english phonetic sample to cover more sounds

* Reorder words for a less contrived meaning.
* Replace "wanted before" by "looked for it", to add /ʊ/ (the only absent
  monophthong) without losing any sounds.
* Introduce /ʉ/ by using a slight Scottish accent for "hue".
* Include commonly found glottal stop before "all".

Mischa POSLAWSKY [Tue, 18 Aug 2015 23:49:25 +0000 (01:49 +0200)]
english phonetic pangram/panphone for ipa showcase

Replace the poor "linguistics" section by the last example from
<http://www.quora.com/Is-there-a-text-that-covers-the-entire-English-phonetic-range>
which covers most English phonemes, including rarer distinctions
(m/ɱ, x, l/ɫ, w/ʍ).

Manually transcribed in an attempt to cover most sounds naturally, using a
mostly Irish/generic pronunciation (I'm not native though).  Compare
<https://en.wikipedia.org?oldid=673810019> for an overview of regional
differences.  Alternate IPA transcriptions in native dialects were found at
<https://www.reddit.com/r/conlangs/comments/2quvnf/make_a_dialect_of_english/>
but not used because of their more limited inventories.

Mischa POSLAWSKY [Fri, 14 Aug 2015 21:27:27 +0000 (23:27 +0200)]
japanese iroha in all scripts

Kanji, hiragana, and original version downloaded from
<https://en.wikipedia.org?oldid=670286422>.

Katakana transliteration from <http://www.columbia.edu/~fdc/utf8/>.

Halfwidth variant derived using

    perl -Mcharnames=:full -CS -pe'
    package charnames; s/\S/chr vianame("HALFWIDTH ".viacode(ord $&))/ge'

with incompatible characters replaced by small forms (prefer coverage over
natural conversion) and a voiced mark appended for even more coverage.
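
The same name-based lookup can be sketched in Python with the standard
unicodedata module.  This is an approximation and not the command that was
actually run: the function name is invented here, and characters without a
"HALFWIDTH ..." counterpart are simply left unchanged rather than replaced
by small forms.

```python
import unicodedata


def to_halfwidth(text):
    """Map each character to its HALFWIDTH-named counterpart, if one exists."""
    out = []
    for ch in text:
        name = unicodedata.name(ch, "")  # "" for unnamed characters like "\n"
        half = ch
        if name:
            try:
                half = unicodedata.lookup("HALFWIDTH " + name)
            except KeyError:
                pass  # no halfwidth form (e.g. voiced kana): keep as is
        out.append(half)
    return "".join(out)
```

Characters that come back unchanged are the "incompatible" ones that still
need a manual substitute.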

Mischa POSLAWSKY [Fri, 14 Aug 2015 19:40:34 +0000 (21:40 +0200)]
replace chinese extension B character from extension A

U+4D85 is obviously incorrect; assume U+24D85 was intended.
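
A quick way to see why the shorter code point cannot belong to Extension B
is to check it against the block ranges.  A minimal Python sketch, with
ranges taken from the Unicode block list and a hypothetical helper name:

```python
# (block name, first code point, last code point)
CJK_BLOCKS = [
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF),
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF),
]


def cjk_block(code_point):
    """Return the name of the CJK block containing code_point, or None."""
    for name, first, last in CJK_BLOCKS:
        if first <= code_point <= last:
            return name
    return None


for cp in (0x4D85, 0x24D85):
    encoded = chr(cp).encode("utf-8")
    print(f"U+{cp:04X}: {cjk_block(cp)}, {len(encoded)} UTF-8 bytes")
```

Only three of the CJK blocks are listed; the table would need extending to
classify the other extensions.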

Mischa POSLAWSKY [Fri, 14 Aug 2015 19:36:51 +0000 (21:36 +0200)]
chinese samples of extended unicode blocks

Random characters from each block from <http://ctext.org/font-test-page>.

Mischa POSLAWSKY [Fri, 14 Aug 2015 19:14:55 +0000 (21:14 +0200)]
chinese sample text: 1st chapter of qian zi wen

Classic coverage poem in traditional orthography downloaded from
<http://www.gutenberg.org/ebooks/24184>.

Mischa POSLAWSKY [Fri, 14 Aug 2015 19:02:39 +0000 (21:02 +0200)]
chinese transliteration of 3 choice characters -ü

Selected the most frequently used characters ending in ü, in all of its tones.
Covers the most difficult pinyin (multiple accents), some limited bopomofo,
IPA tone bars (combinable into contours), and traditional/simplified glyph
comparison.
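
The "multiple accents" case can be reproduced with Unicode normalization:
u plus a combining diaeresis plus a combining tone mark composes into a
single precomposed letter under NFC.  A small Python sketch (the mapping and
function name are illustrative only; tone 5 stands for the neutral tone):

```python
import unicodedata

# Combining marks for the four pinyin tones; the neutral tone has no mark.
TONE_MARKS = {
    1: "\u0304",  # COMBINING MACRON
    2: "\u0301",  # COMBINING ACUTE ACCENT
    3: "\u030c",  # COMBINING CARON
    4: "\u0300",  # COMBINING GRAVE ACCENT
    5: "",
}


def u_umlaut_with_tone(tone):
    """Return ü carrying the given tone mark as one precomposed character."""
    decomposed = "u\u0308" + TONE_MARKS[tone]  # u + COMBINING DIAERESIS + tone
    return unicodedata.normalize("NFC", decomposed)


print(" ".join(u_umlaut_with_tone(t) for t in range(1, 6)))
```

Fonts that lack the precomposed letters have to stack both marks themselves,
which is what makes these syllables a useful rendering test.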

Mischa POSLAWSKY [Fri, 14 Aug 2015 18:17:28 +0000 (20:17 +0200)]
chinese selection of 50 most common mandarin characters

Extracted from Modern Chinese Character Frequency List (updated 2005-12-21)
published by 笪骏 [DA Jun] <http://lingua.mtsu.edu/chinese-computing>.
These characters should cover 30% of modern Chinese texts.
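
A coverage figure like this can be checked against any corpus by counting
characters.  The Python sketch below shows one way to do so; it is not how
the frequency list itself was compiled, and both function names are
hypothetical.

```python
from collections import Counter


def char_counts(text):
    """Count non-whitespace characters in text."""
    return Counter(ch for ch in text if not ch.isspace())


def coverage(text, chars):
    """Fraction of the non-whitespace characters in text found in chars."""
    counts = char_counts(text)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[ch] for ch in set(chars)) / total


def most_common_chars(text, n):
    """The n most frequent non-whitespace characters of text, as a string."""
    return "".join(ch for ch, _ in char_counts(text).most_common(n))
```

For a corpus string held in a variable named corpus, the expression
coverage(corpus, most_common_chars(corpus, 50)) gives the share covered by
its own top 50 characters.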

Mischa POSLAWSKY [Fri, 14 Aug 2015 17:25:09 +0000 (19:25 +0200)]
tibetan declaration of human rights

Good sample copied from ཝེ་ཁེ་རིག་མཛོད <https://bo.wikipedia.org?oldid=123541>.
Prefix the title to cover yig-mgo, and the adoption date [1948-12-10] as found
on <http://blog.amdotibet.cn/aaa999/archives/82869.aspx> to cover numbers.

Mischa POSLAWSKY [Fri, 14 Aug 2015 17:17:51 +0000 (19:17 +0200)]
tamil and kannada poems from Kermit UTF-8 Sampler

Extracted from 2012-05-07 version of <http://www.columbia.edu/~fdc/utf8/>
by Frank da Cruz.

Mischa POSLAWSKY [Fri, 14 Aug 2015 17:15:00 +0000 (19:15 +0200)]
apl function for game of life

Mischa POSLAWSKY [Fri, 31 Jul 2015 01:23:13 +0000 (03:23 +0200)]
drop headers and abbreviate descriptions

Get rid of some English clutter; original sources should be easy to find
by searching online.

Mischa POSLAWSKY [Fri, 31 Jul 2015 00:30:32 +0000 (02:30 +0200)]
hebrew sample

Ideally test RTL, but good for modern script coverage in any case.
Best use of the common Unicode invitation so far, as it mixes direction
and includes niqqud.
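
Both properties can be verified mechanically from the Unicode character
database.  A minimal Python sketch with invented helper names; the niqqud
test assumes the points lie in U+05B0..U+05C7 and filters out the
punctuation in that range by general category:

```python
import unicodedata


def mixes_direction(text):
    """True if text has both left-to-right and right-to-left letters."""
    classes = {unicodedata.bidirectional(ch) for ch in text}
    return "L" in classes and bool(classes & {"R", "AL"})


def has_niqqud(text):
    """True if text contains a Hebrew vowel point (nonspacing mark)."""
    return any(
        0x05B0 <= ord(ch) <= 0x05C7 and unicodedata.category(ch) == "Mn"
        for ch in text
    )
```

Running both over the sample line confirms that it still exercises
bidirectional layout and mark positioning after later edits.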

Mischa POSLAWSKY [Fri, 31 Jul 2015 00:19:09 +0000 (02:19 +0200)]
devanagari sample

Copied from <http://r12a.github.io/scripts/summaries/devanagari>.

Context-based positioning at start of last 2 lines; digits at end of line 3;
multiple combining characters at line 2 start; contextual shaping in line 1
and start of line 4.

Mischa POSLAWSKY [Fri, 31 Jul 2015 00:16:12 +0000 (02:16 +0200)]
spell old english in old english

Includes capital AE.

Mischa POSLAWSKY [Fri, 31 Jul 2015 00:13:23 +0000 (02:13 +0200)]
replace s in latin old english by long variants

Also include precomposed st-ligature for good measure (matching runic).

Mischa POSLAWSKY [Thu, 30 Jul 2015 23:49:10 +0000 (01:49 +0200)]
transliterate runes with traditional orthography

Prefer original thorn and wynn letters.  Then "modernize" eth and long
vowels for additional coverage of Old English transcription.

Markus Kuhn [Mon, 6 Apr 2009 18:13:43 +0000 (20:13 +0200)]
update to current upstream version 2002/2009

Latest <http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt>
removes trailing whitespace.

Markus Kuhn [Thu, 25 Jul 2002 12:00:00 +0000 (12:00 +0000)]
update to 2002 version

Retrieved from <http://www.cl.cam.ac.uk/~mgk25>.

Markus Kuhn [Fri, 20 Aug 1999 12:00:00 +0000 (12:00 +0000)]
UTF-8 encoded sample plain-text file

Extracted from <http://www.w3.org/2001/06/utf-8-test/UTF-8-demo.html>,
the earliest version I could find.