Shiar's Git - unicode-sampler.git/log

Mischa POSLAWSKY [Tue, 1 Sep 2015 16:27:41 +0000 (18:27 +0200)]

append negative squared letters to mathematical fonts

A, B, and AB indicate blood types, and are supported
as :x: campfire/github/&c emoticon entities.

Mischa POSLAWSKY [Tue, 1 Sep 2015 16:18:17 +0000 (18:18 +0200)]

circled letters to introduce mathematical fonts

More general-purpose, but similar and more commonly supported,
so they leave a basic impression if other glyphs do not render.

Mischa POSLAWSKY [Tue, 1 Sep 2015 16:03:05 +0000 (18:03 +0200)]

mathematical letter symbols (ABC in all styles)

Compare letterlike fonts at U+2100 and U+1D400.
Should not be used to create words.
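
The relation between the two blocks can be sketched in Python. This is only an
illustration: `math_italic` and the exception table are hypothetical names
introduced here, and only the italic lowercase style is covered. It shows that
the U+1D400 block leaves a hole where the symbol already existed among the
letterlike symbols at U+2100 (italic h is U+210E PLANCK CONSTANT), and that
compatibility normalisation folds all of them back to plain letters.

```python
import string
import unicodedata

MATH_ITALIC_SMALL_A = 0x1D44E  # MATHEMATICAL ITALIC SMALL A

# Letters whose italic form was encoded earlier in the Letterlike Symbols
# block, leaving a reserved code point in the mathematical block.
LETTERLIKE_EXCEPTIONS = {"h": "\u210e"}  # PLANCK CONSTANT


def math_italic(letter):
    """Map an ASCII lowercase letter to its mathematical italic symbol."""
    if letter in LETTERLIKE_EXCEPTIONS:
        return LETTERLIKE_EXCEPTIONS[letter]
    return chr(MATH_ITALIC_SMALL_A + ord(letter) - ord("a"))


# The slot where italic h would be is unassigned, so it has no name.
hole = chr(MATH_ITALIC_SMALL_A + ord("h") - ord("a"))  # U+1D455
hole_is_unassigned = unicodedata.name(hole, None) is None

# NFKC maps every styled letter back to its ASCII base.
folded = "".join(
    unicodedata.normalize("NFKC", math_italic(c)) for c in string.ascii_lowercase
)
```

That NFKC erases the styling entirely is one practical reason these symbols
are unsuitable for writing words.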

Mischa POSLAWSKY [Tue, 1 Sep 2015 14:49:46 +0000 (16:49 +0200)]

replace mixed scripts in "stargate" diacritics sample

Prefer latin "turned V" over greek lambda for strokeless A.
May not matter in most fonts, but should be more appropriate,
or at least introduces another (rarer) character.

Mischa POSLAWSKY [Tue, 1 Sep 2015 14:38:52 +0000 (16:38 +0200)]

khoekhoen/nama pangram to cover khoisan orthography

Sample from <http://www.omniglot.com/writing/khoekhoe.htm>
for the most widely spoken "Khoisan" language, featuring 3 click letters
(only lacking ʘ) and 2 distinct tone accents.

Mischa POSLAWSKY [Tue, 1 Sep 2015 14:13:46 +0000 (16:13 +0200)]

igbo single line to feature all non-ascii letters

Keep only second sentence since it contains all unique characters.
This part should remain correct stand-alone, translating as "Rejoice, get
together, speak and agree that it may stand firm, (s)he surely will grow".

Translation and contraction from deleted Wikipedia post available at:
<http://wpedia.goo.ne.jp/enwiki/Wikipedia_talk:Articles_for_creation/Igbo_Pangram>

Unfortunately no suitable samples found featuring tone marks as commonly
found in practical orthographies.

Mischa POSLAWSKY [Wed, 26 Aug 2015 06:15:10 +0000 (08:15 +0200)]

indicate voiced sounds in katakana

Test properly voiced characters which unlike hiragana aren't covered by
the kanji version.

Mischa POSLAWSKY [Wed, 26 Aug 2015 06:13:54 +0000 (08:13 +0200)]

align japanese characters using ideographic spaces

Avoid excessive column misalignment with variable width rendering.
Test monospacing equivalence in header row only.

Mischa POSLAWSKY [Wed, 26 Aug 2015 05:56:02 +0000 (07:56 +0200)]

korean pangram with halfwidth jamo

Another alphabetic equivalent for modern korean, created by:

perl -Mcharnames=:full -CS -pe'package charnames;
s{\S}{chr vianame(viacode(ord $&) =~ s/^(?=HANGUL)/HALFWIDTH /r)}ge'
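
For readers without Perl at hand, the same name-based mapping can be sketched
in Python. This is an illustrative equivalent, not the command used for the
commit: `to_halfwidth` is a hypothetical helper, it assumes the input already
consists of compatibility jamo (HANGUL LETTER ...), and leaving characters
without a halfwidth counterpart unchanged is a choice made here.

```python
import re
import unicodedata


def to_halfwidth(text):
    """Prefix HANGUL character names with HALFWIDTH and look up the result."""

    def convert(match):
        char = match.group(0)
        name = unicodedata.name(char, "")
        if not name.startswith("HANGUL"):
            return char  # not Hangul (or unnamed): keep as-is
        try:
            return unicodedata.lookup("HALFWIDTH " + name)
        except KeyError:
            return char  # no halfwidth variant exists for this character

    # Like the Perl substitution, only touch non-whitespace characters.
    return re.sub(r"\S", convert, text)
```

For example, `to_halfwidth("\u3131\u314f")` maps HANGUL LETTER KIYEOK and
HANGUL LETTER A to their halfwidth forms U+FFA1 and U+FFC2.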

Mischa POSLAWSKY [Wed, 26 Aug 2015 05:50:29 +0000 (07:50 +0200)]

korean pangram with separate jamo

Purely alphabetic variant for further comparison, created using:

perl -Mcharnames=:full -CS -pe 'package charnames;
s/\N{HANGUL CHOSEONG IEUNG}//g;
s{\S}{chr vianame(viacode(ord $&) =~ s/^HANGUL \K\S+/LETTER/r)}ge'
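
A rough Python counterpart of this renaming trick, for comparison. It is a
sketch under assumptions rather than the command actually run:
`to_compat_jamo` is a hypothetical name, the input is assumed to be conjoining
jamo (CHOSEONG/JUNGSEONG/JONGSEONG), and characters whose renamed form does
not exist are passed through untouched.

```python
import re
import unicodedata

CHOSEONG_IEUNG = unicodedata.lookup("HANGUL CHOSEONG IEUNG")  # U+110B


def to_compat_jamo(text):
    """Turn conjoining jamo into stand-alone HANGUL LETTER characters."""
    # Drop the initial ieung first, mirroring the first Perl substitution.
    text = text.replace(CHOSEONG_IEUNG, "")

    def convert(match):
        char = match.group(0)
        name = unicodedata.name(char, "")
        if not name:
            return char  # unnamed character: keep as-is
        # Replace the word after HANGUL (CHOSEONG, JUNGSEONG, ...) by LETTER.
        renamed = re.sub(r"^(HANGUL )\S+", r"\1LETTER", name)
        try:
            return unicodedata.lookup(renamed)
        except KeyError:
            return char  # no such letter: keep the original character

    return re.sub(r"\S", convert, text)
```

Names that do not start with HANGUL are left as they are by the regular
expression, so looking them up simply returns the original character.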

Mischa POSLAWSKY [Wed, 26 Aug 2015 05:39:14 +0000 (07:39 +0200)]

korean pangram to compare jamo decomposition

Seeing how some fonts/terminals/editors mangle the jamo version of
hunminjeongeum, add a line of modern korean in different encodings
to more extensively test equivalent rendering.

This most complete option from <https://ko.wikipedia.org/?oldid=14664370>
contains all jamo including double consonants and combined vowels.

Decomposed version created using:
perl -MLingua::KO::Hangul::Util=:all -CS -ne 'print decomposeSyllable($_)'
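
A comparable decomposition can be sketched in Python with the standard
library. Note the swap: this uses Unicode canonical decomposition (NFD)
instead of Lingua::KO::Hangul::Util, and `decompose_hangul` is a hypothetical
helper. NFD is restricted to the precomposed syllable range here so that
other text, such as accented Latin, is not decomposed as a side effect.

```python
import unicodedata

SYLLABLE_FIRST = "\uac00"  # HANGUL SYLLABLE GA
SYLLABLE_LAST = "\ud7a3"   # HANGUL SYLLABLE HIH


def decompose_hangul(text):
    """Decompose precomposed Hangul syllables into conjoining jamo."""
    return "".join(
        unicodedata.normalize("NFD", char)
        if SYLLABLE_FIRST <= char <= SYLLABLE_LAST
        else char
        for char in text
    )
```

Composing the result again with `unicodedata.normalize("NFC", ...)` should
give back the original syllables, which is the equivalence a renderer is
expected to honour.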

Mischa POSLAWSKY [Wed, 26 Aug 2015 05:19:31 +0000 (07:19 +0200)]

original middle korean hangeul for hunminjeongeum

Based on <http://faq.ktug.or.kr/wiki/uploads/hunmin.uni> (3rd lines)
with private use characters replaced manually, and hangeul syllables
decomposed to match.

Mischa POSLAWSKY [Wed, 26 Aug 2015 05:17:50 +0000 (07:17 +0200)]

korean hunminjeongeum

Original introduction to hangeul in modern korean and classical chinese from
<https://ko.wikipedia.org?oldid=14743128> with 스물여덟 replaced by 28 to
test mixing modern digits.

Mischa POSLAWSKY [Tue, 25 Aug 2015 11:41:26 +0000 (13:41 +0200)]

test diacritic composition with latvian pangram

Compare the same sentence with precomposed and decomposed characters,
which should look alike with correct support for diacritics composition.
The (alternate) Latvian pangram features accents both above and below
letters, and does not match automated Unicode decomposition because
typographically preferred comma accents are used instead of cedillas.
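
The mismatch with automated decomposition can be checked with a small Python
sketch. The letter ķ is picked here purely as an example and
`same_under_nfc` is a hypothetical helper; the sample text itself is not
reproduced.

```python
import unicodedata

precomposed = "\u0137"  # LATIN SMALL LETTER K WITH CEDILLA
# Canonical decomposition yields k + U+0327 COMBINING CEDILLA.
canonical = unicodedata.normalize("NFD", precomposed)
# A hand-made decomposition with U+0326 COMBINING COMMA BELOW instead.
comma_form = "k\u0326"


def same_under_nfc(a, b):
    """Report whether two strings are canonically equivalent."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Here `same_under_nfc(precomposed, canonical)` holds while
`same_under_nfc(precomposed, comma_form)` does not, so the two lines of the
sample can only be compared visually, not by normalisation.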

Mischa POSLAWSKY [Tue, 25 Aug 2015 11:37:06 +0000 (13:37 +0200)]

replace duplicate "text" in introduction by synonym

Mischa POSLAWSKY [Tue, 25 Aug 2015 11:36:36 +0000 (13:36 +0200)]

shavian transcription of english panphone

Mischa POSLAWSKY [Tue, 25 Aug 2015 11:04:23 +0000 (13:04 +0200)]

runic punctuation in rune sentence

Mischa POSLAWSKY [Tue, 25 Aug 2015 11:03:18 +0000 (13:03 +0200)]

move old english near modern english section

Mischa POSLAWSKY [Tue, 25 Aug 2015 10:53:38 +0000 (12:53 +0200)]

border around font overview instead of typography list

Mischa POSLAWSKY [Tue, 25 Aug 2015 10:49:33 +0000 (12:49 +0200)]

adjust font overview to include more ascii characters

Mischa POSLAWSKY [Tue, 25 Aug 2015 10:31:07 +0000 (12:31 +0200)]

german pangram with precomposed ligatures

From HTML sampler, attributed to Karl Pentzlin.  Covers many digraphs and
trigraphs, some of which have been replaced by presentational forms if
available in Unicode.  This is mostly a technical test of code points,
not of proper typesetting: proper ligatures should be determined by fonts
and rarely match only these 6 sets.

Common Fraktur ligatures: ch ck ff ffi ffl fft fi fl ft ll ſch ſi ſſ ſt tz;
replaced by single glyphs for:  ﬀ  ﬃ   ﬄ       ﬁ  ﬂ                  ﬅ.
Usage of long s precludes inclusion of U+FB06 ﬆ, but this is already present
elsewhere.

Mischa POSLAWSKY [Tue, 25 Aug 2015 10:27:26 +0000 (12:27 +0200)]

amend dutch pangram with short afrikaans to cover accents

Closely related languages augment each other well:
'n digraph is only used in afrikaans, ij only in dutch,
accented letters in both but more common in afrikaans.

Mischa POSLAWSKY [Tue, 25 Aug 2015 10:22:05 +0000 (12:22 +0200)]

append agus araile to irish pangram

Meaningless "et cetera" abbreviation to cover Tironian et sign (agus),
with non-standard insular letter form of r to test Latin Extended-D.

Mischa POSLAWSKY [Tue, 25 Aug 2015 10:18:21 +0000 (12:18 +0200)]

uppercase part of turkish pangram

Cover uppercase i.

Mischa POSLAWSKY [Tue, 25 Aug 2015 10:17:26 +0000 (12:17 +0200)]

latin pangrams in 16 languages

Selected mostly from Wikipedia (pangram page in different languages)
to succinctly cover many common latin letters.

Mischa POSLAWSKY [Wed, 19 Aug 2015 09:21:22 +0000 (11:21 +0200)]

drop greetings in "various" [3] languages

Superfluous (especially near the end) as we already have more extensive
te(x)sts for English, Greek, and Japanese.

Mischa POSLAWSKY [Wed, 19 Aug 2015 08:47:25 +0000 (10:47 +0200)]

more succinct font overview

Reorganise assortment to:
- 3 lines (uppercase, lowercase, non-letters) for better discoverability;
- letters grouped together, all with case counterparts;
- repick extended latin to cover a wider range of more common glyphs;
- include more ASCII symbols for programming and internet usage;
- reduce set of mathematical additions to a good comparison of essentials;
- restrict drawing glyphs to one example per type:
arrow, box drawing (3 styles), block shade, dingbat.

Mischa POSLAWSKY [Wed, 19 Aug 2015 08:21:56 +0000 (10:21 +0200)]

generic font overview at beginning

Move font selection text near start (but below <pre>) to have a generic
overview/comparison before delving into specifics.

Mischa POSLAWSKY [Wed, 19 Aug 2015 07:58:52 +0000 (09:58 +0200)]

move apl snippet with other code

Logical progression from Perl to even non-ASCII symbols.
Keep only the life program; the first function doesn't contribute as much,
and the second part is just unneeded gibberish.

Mischa POSLAWSKY [Wed, 19 Aug 2015 07:58:09 +0000 (09:58 +0200)]

perl code snippet for additional programming symbols

Golf entry by teebee for character referencing
<http://golf.shinh.org/p.rb?Break+Lorem+Ipsum+fixed>
which includes a good amount of \W ASCII, including the common dollar sign,
slashes, and quotes.

Mischa POSLAWSKY [Wed, 19 Aug 2015 07:44:30 +0000 (09:44 +0200)]

ruby code for basic programming language overview

Inconsequential but thematic code (encodes and displays a Unicode entity)
written to include all braces (([{|}])), hash (#), and 0/O distinction.

Inspired by font comparison snippets at
<http://hivelogic.com/articles/top-10-programming-fonts>.

Mischa POSLAWSKY [Wed, 19 Aug 2015 07:16:42 +0000 (09:16 +0200)]

html code at top with open <pre>

To compare appearance of basic XML (tags, attribute, entity)
and support rendering the following contents as text/html
by opening a preformatted (monospaced) tag early on.

Mischa POSLAWSKY [Wed, 19 Aug 2015 07:06:04 +0000 (09:06 +0200)]

rewrite introduction to avoid charset and mention updated authorship

File encoding is insignificant so don't mention it explicitly (test works
equally well if converted to another UCS-compatible charset).
Instead list versions of this file and the covered standard.

Mischa POSLAWSKY [Wed, 19 Aug 2015 06:35:16 +0000 (08:35 +0200)]

extended cyrillic samples of sakha and kazakh

Between these languages the most important non-Slavic letters are present,
including common non-Russian glyphs І Ә Ө Ү Һ, a ligature Ҥ, and variants
with descender or bar (Ң Қ Ғ Ұ).
Not complete, but good for a general impression of Turkic support.

Official translations from <http://www.ohchr.org/>.

Mischa POSLAWSKY [Wed, 19 Aug 2015 05:29:36 +0000 (07:29 +0200)]

braille contraction missed by converter

Groupsign -en- (lower e) may be used at the end of "queen" (explicitly
mentioned in <http://www.brailleauthority.org/literary/ebae2002.pdf>).

Mischa POSLAWSKY [Wed, 19 Aug 2015 03:36:55 +0000 (05:36 +0200)]

braille of english pangram instead

Replace long story in "scientific" (and very artificial) GS8 notation
by a transcription of the English panphone using common Grade-2 British
braille, courtesy of <https://www.branah.com/braille-translator>.

Loses coverage of some 8-dot cells, but includes common abbreviations and
practical orthography. Use braille blanks U+2800 instead of spaces.

Mischa POSLAWSKY [Wed, 19 Aug 2015 00:24:38 +0000 (02:24 +0200)]

adjust english phonetic sample to cover more sounds

* Reorder words for a less contrived meaning.
* Replace "wanted before" by "looked for it", to add /ʊ/ (the only absent
monophthong) without losing any sounds.
* Introduce /ʉ/ by using a slight Scottish accent for "hue".
* Include commonly found glottal stop before "all".

Mischa POSLAWSKY [Tue, 18 Aug 2015 23:49:25 +0000 (01:49 +0200)]

english phonetic pangram/panphone for ipa showcase

Replace the poor "linguistics" section by the last example from
<http://www.quora.com/Is-there-a-text-that-covers-the-entire-English-phonetic-range>
which covers most English phonemes including more rare distinctions
(m/ɱ, x, l/ɫ, w/ʍ).

Manually transcribed in an attempt to cover most sounds naturally, using a
mostly Irish/generic pronunciation (I'm not native though). Compare
<https://en.wikipedia.org?oldid=673810019> for an overview of regional
differences. Alternate IPA transcriptions in native dialects found at
<https://www.reddit.com/r/conlangs/comments/2quvnf/make_a_dialect_of_english/>
but not used due to more limited inventories.

Mischa POSLAWSKY [Fri, 14 Aug 2015 21:27:27 +0000 (23:27 +0200)]

japanese iroha in all scripts

Kanji, hiragana, and original version downloaded from
<https://en.wikipedia.org?oldid=670286422>.

Katakana transliteration from <http://www.columbia.edu/~fdc/utf8/>.

Halfwidth variant derived using perl -Mcharnames=:full -CS -pe'
package charnames; s/\S/chr vianame("HALFWIDTH ".viacode(ord $&))/ge'
with incompatible characters replaced by small forms (prefer coverage over
natural conversion) and a voiced mark appended for even more coverage.

Mischa POSLAWSKY [Fri, 14 Aug 2015 19:40:34 +0000 (21:40 +0200)]

replace chinese extension B character from extension A

U+4D85 is obviously incorrect; assume U+24D85 was intended.

Mischa POSLAWSKY [Fri, 14 Aug 2015 19:36:51 +0000 (21:36 +0200)]

chinese samples of extended unicode blocks

Random characters from each block from <http://ctext.org/font-test-page>.

Mischa POSLAWSKY [Fri, 14 Aug 2015 19:14:55 +0000 (21:14 +0200)]

chinese sample text: 1st chapter of qian zi wen

Classic coverage poem in traditional orthography downloaded from
<http://www.gutenberg.org/ebooks/24184>.

Mischa POSLAWSKY [Fri, 14 Aug 2015 19:02:39 +0000 (21:02 +0200)]

chinese transliteration of 3 choice characters -ü

Selected most frequently used characters ending in ü with all its tones.
Covers the most difficult pinyin (multiple accents), some limited bopomofo,
IPA tone bars (combinable into contours), and traditional/simplified glyph
comparison.

Mischa POSLAWSKY [Fri, 14 Aug 2015 18:17:28 +0000 (20:17 +0200)]

chinese selection of 50 most common mandarin characters

Extracted from Modern Chinese Character Frequency List (updated 2005-12-21)
published by 笪骏 [DA Jun] <http://lingua.mtsu.edu/chinese-computing>.
These characters should cover 30% of modern chinese texts.

Mischa POSLAWSKY [Fri, 14 Aug 2015 17:25:09 +0000 (19:25 +0200)]

tibetan declaration of human rights

Good sample copied from ཝེ་ཁེ་རིག་མཛོད <https://bo.wikipedia.org?oldid=123541>.
Prefix the title for yig-mgo, and adoption date [1948-12-10] as found on
<http://blog.amdotibet.cn/aaa999/archives/82869.aspx> for numbers.

Mischa POSLAWSKY [Fri, 14 Aug 2015 17:17:51 +0000 (19:17 +0200)]

tamil and kannada poems from Kermit UTF-8 Sampler

Extracted from 2012-05-07 version of <http://www.columbia.edu/~fdc/utf8/>
by Frank da Cruz.

Mischa POSLAWSKY [Fri, 14 Aug 2015 17:15:00 +0000 (19:15 +0200)]

apl function for game of life

Mischa POSLAWSKY [Fri, 31 Jul 2015 01:23:13 +0000 (03:23 +0200)]

drop headers and abbreviate descriptions

Get rid of some English clutter;
original sources should be easy to find by searching online.

Mischa POSLAWSKY [Fri, 31 Jul 2015 00:30:32 +0000 (02:30 +0200)]

hebrew sample

Ideally test RTL, but good for modern script coverage in any case.
Best use of the common Unicode invitation so far, as it mixes direction
and includes niqqud.

Mischa POSLAWSKY [Fri, 31 Jul 2015 00:19:09 +0000 (02:19 +0200)]

devanagari sample

Copied from <http://r12a.github.io/scripts/summaries/devanagari>.

Context-based positioning at start of last 2 lines; digits at end of line 3;
multiple combining characters at line 2 start; contextual shaping in line 1
and start of line 4.

Mischa POSLAWSKY [Fri, 31 Jul 2015 00:16:12 +0000 (02:16 +0200)]

spell old english in old english

Includes capital AE.

Mischa POSLAWSKY [Fri, 31 Jul 2015 00:13:23 +0000 (02:13 +0200)]

replace s in latin old english by long variants

Also include precomposed st-ligature for good measure (matching runic).

Mischa POSLAWSKY [Thu, 30 Jul 2015 23:49:10 +0000 (01:49 +0200)]

transliterate runes with traditional orthography

Prefer original thorn and wynn letters. Then "modernize" eth and long
vowels for additional coverage of Old English transcription.

Markus Kuhn [Mon, 6 Apr 2009 18:13:43 +0000 (20:13 +0200)]

update to current upstream version 2002/2009

Latest <http://www.cl.cam.ac.uk/~mgk25/ucs/examples/UTF-8-demo.txt>
removes trailing whitespace.

Markus Kuhn [Thu, 25 Jul 2002 12:00:00 +0000 (12:00 +0000)]

update to 2002 version

Retrieved from <http://www.cl.cam.ac.uk/~mgk25>.

Markus Kuhn [Fri, 20 Aug 1999 12:00:00 +0000 (12:00 +0000)]

UTF-8 encoded sample plain-text file

Extracted from <http://www.w3.org/2001/06/utf-8-test/UTF-8-demo.html>,
the earliest version I could find.

Unicode sampler - various texts to test Unicode support
