Shiar's Git - unicode-sampler.git/log

Mischa POSLAWSKY [Sat, 4 Apr 2020 00:45:14 +0000 (02:45 +0200)]

arabic vowel diacritics

Fully vocalised version with roman transliteration copied from:
http://clagnut.com/blog/2380/#Perfect_pangrams_in_English_.2826_letters.29

Mischa POSLAWSKY [Sat, 4 Apr 2020 00:15:02 +0000 (02:15 +0200)]

common arabic pangram

Mentioned in "Computational Linguistics, Speech and Image Processing for
Arabic Language" by El Gayar and Suen at Test image 3.1.4.b as "the most
common Arabic pangram ... containing all of the basic letters".

Copied version from <http://clagnut.com/blog/2380/
#Perfect_pangrams_in_English_.2826_letters.29> with additional diacritics.

Mischa POSLAWSKY [Sat, 4 Apr 2020 00:03:25 +0000 (02:03 +0200)]

larger african sample in taa

Part of a story on <http://archive.phonetics.ucla.edu/Language/NMN
/nmn_story_1972_01.html> transcript using special glyphs ǀǁǂǃʘʔɟʼʰʲa̰ɜʉʉ̰m̩ŋn̩ɲə
(the last two missing in this snippet).

Contains all distinctive characters of the Khoekhoe pangram, so replace that
by a phrase from <https://en.wikipedia.org/wiki/Taa_language?oldid=943791933>
with slightly different orthography including ɢ.

Mischa POSLAWSKY [Sat, 4 Apr 2020 01:04:58 +0000 (03:04 +0200)]

ForTheWin in smallcaps, superscript, subscript, turned letters

Mischa POSLAWSKY [Thu, 19 Mar 2020 02:08:15 +0000 (03:08 +0100)]

compare mathematical letter symbols for ForTheWin

Reintroduce an even more elaborate overview of letterlike scripts, dismissed
in commit 26e8aa257e (2015-09-12) [drop mathematical ABC symbols line].
While it remains as ill-advised for spelling out words, unfortunately it is
widely used that way nowadays (at least in Twitter user names).

Regardless, style should be consistent (especially considering characters
have been introduced in different versions) and distinct (as it's intended
for unique variables), so a comparison makes sense in any case.
Just put it after proper language scripts.

Mischa POSLAWSKY [Sat, 4 Apr 2020 03:42:48 +0000 (05:42 +0200)]

cjk comparison of traditional, simplified, shinjitai

Sample characters copied from <https://en.wikipedia.org/wiki
/Differences_between_Shinjitai_and_Simplified_characters?oldid=932950382>
"Different simplifications in both languages".

Mischa POSLAWSKY [Sat, 14 Mar 2020 22:19:54 +0000 (23:19 +0100)]

random characters from cjk extension F

Mischa POSLAWSKY [Sat, 14 Mar 2020 22:18:11 +0000 (23:18 +0100)]

common line break after ethiopic header

Match /^\N+:\n\n/ syntax like other titles.

Mischa POSLAWSKY [Sat, 14 Mar 2020 22:06:06 +0000 (23:06 +0100)]

single word of zalgo should suffice

Generated by <https://www.zalgotextgenerator.com/>.

Mischa POSLAWSKY [Sat, 14 Mar 2020 21:50:29 +0000 (22:50 +0100)]

append cypriot glyph to linear A space

Previously omitted as it shares the same 0x80 space as Aramaic,
but can be safely moved a bit since the preceding code point is empty.
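
The 0x80 clash can be checked by shifting code points right by 7 bits. A minimal Perl sketch, assuming one overview entry per 0x80-aligned range; the block starts (Linear A U+10600, Cypriot Syllabary U+10800, Imperial Aramaic U+10840) come from the Unicode block list, and `slot` is a hypothetical helper:

```perl
use strict;
use warnings;

# Hypothetical helper: index of the 0x80-code-point range containing $cp.
sub slot { my ($cp) = @_; return $cp >> 7 }

# Block start addresses from the Unicode block list.
my %block = (
    'Linear A'          => 0x10600,
    'Cypriot Syllabary' => 0x10800,
    'Imperial Aramaic'  => 0x10840,
);

# Group blocks by range to report which ones compete for a single entry.
my %by_slot;
push @{ $by_slot{ slot($block{$_}) } }, $_ for sort keys %block;
for my $s (sort { $a <=> $b } keys %by_slot) {
    printf "U+%X: %s\n", $s << 7, join ', ', @{ $by_slot{$s} };
}
```

Only Cypriot and Aramaic share a range, hence moving the Cypriot glyph into the empty position before it.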

Mischa POSLAWSKY [Sat, 14 Mar 2020 21:44:33 +0000 (22:44 +0100)]

match circled ideograph to cjk representation

Reuse a common glyph as outlined in the Last Resort guidelines, improving
over their own choice.

Mischa POSLAWSKY [Sat, 14 Mar 2020 21:32:07 +0000 (22:32 +0100)]

move A homographs to typography section

Good place to test for expected similarities in derived writing systems,
next to unwanted same-script lookalikes.

The large number of O-ish glyphs does not really add much except for testing
script coverage.

Mischa POSLAWSKY [Sat, 14 Mar 2020 21:09:49 +0000 (22:09 +0100)]

include letter Ø as common 0 lookalike

Programming fonts regularly use a slashed zero to distinguish the number,
but in some cases make it look very similar to the scandinavian vowel.
At low quality it could even be mistaken for the 8 shown next.

Confusion with the symbols ∅ and ⌀ is less likely, usually having a
different shape, as do glyphs representing a dotted digit 0 (ʘ, ☉, ⊙, ⨀).

Mischa POSLAWSKY [Sat, 14 Mar 2020 20:22:40 +0000 (21:22 +0100)]

lower pronunciation of Arthur in panphone

Replace duplicate mid-central ɚ by open-mid ɝ for a near-identical result
with a different glyph, already distinguished in all other orthographies.

Mischa POSLAWSKY [Sat, 14 Mar 2020 20:12:12 +0000 (21:12 +0100)]

reorder panphone to drop "on" in second line

Equivalent sentence containing the same sounds, but shorter so it fits
within 76 characters.

Mischa POSLAWSKY [Sat, 14 Mar 2020 20:36:09 +0000 (21:36 +0100)]

panphone in deseret

Manual composition, disregarding the automated translation by <2deseret.com>
as it does not conform to standard spelling outlined in
<http://www.chem.ucla.edu/~jericks/Historical%20or%20Technical/Linguistics/Deseret_Guide.pdf>

Consider it a mostly dead script, so not positioned next to other English.

Mischa POSLAWSKY [Sat, 14 Mar 2020 00:21:44 +0000 (01:21 +0100)]

align U+03x blocks for single width hexagram

Assume U+4DC3 symbol should be one column wide, ignoring double width
displayed by Unifont.

Mischa POSLAWSKY [Thu, 12 Mar 2020 05:31:09 +0000 (06:31 +0100)]

smp block allocation glyphs

Extend representative characters to U+10000-10FFF, usually copying
the Unicode Last Resort font <https://github.com/unicode-org/last-resort-font>.

Mischa POSLAWSKY [Thu, 12 Mar 2020 02:45:16 +0000 (03:45 +0100)]

prefer apple last resort glyphs

Described at <http://developer.apple.com/fonts/LastResortFont/>
for a more uniform style:

> Unicode blocks are illustrated by a representative glyph from the block,
> chosen to be as distinct as possible from glyphs of other blocks.
>
> Examplar glyphs were chosen in a number of ways.  Almost all of the
> Brahmic scripts show the initial consonant ka.  Latin uses the letter A
> because it's the first letter, and because in each Latin block there is
> a letter A so they can be easily differentiated.  Greek and Cyrillic use
> their last letters, omega and ya, because they are so distinctive.  Most
> other alphabets and syllabaries use their initial letter where
> distinctive.

Try to avoid unnecessary exceptions, though in some cases I can't help but
know better (usually improving distinctiveness, especially considering
unknown output variants).

Restrict to a single entry per 0x80, mostly keeping the latest unicode
version for maximum effort.

Mischa POSLAWSKY [Wed, 11 Mar 2020 21:51:08 +0000 (22:51 +0100)]

bmp block allocation glyphs

Overview of BMP blocks similar to <http://sheet.shiar.nl/charset/unicode>
each represented by an identifying glyph copied from Unidings v9.19
<http://users.teilar.gr/~g1951d/Unidings.pdf> by George Douros.

Characters align neatly to 0x40 code points, preferring every other column
if feasible, but keeping various smaller (rtl, brahmic) scripts for now.
Silently break positions between U+3400 and U+A400 because these only
contain cjk and yi with usually uniform font coverage.

Mischa POSLAWSKY [Tue, 10 Mar 2020 01:54:56 +0000 (02:54 +0100)]

update version to unicode 10.0

Required for recently added hentaigana.

Mischa POSLAWSKY [Mon, 9 Mar 2020 23:02:33 +0000 (00:02 +0100)]

hentaigana variant of iroha

Version from 現今児童重宝記 <https://www.sljfaq.org/afaq/iroha.html>
(No. 6) painstakingly matched to unicode glyphs described on
<https://en.wikipedia.org/wiki/Hentaigana?oldid=935497055>
with uncertainties resolved by comparing <https://www.semanticscholar.org
/paper/Distinction-and-Difference:-From-Kana-to-Hiragana-Marks
/f334ea3cb70e933ed4f52e174a41c0242d5204a2/figure/5>.

Mischa POSLAWSKY [Mon, 9 Mar 2020 22:41:25 +0000 (23:41 +0100)]

haskell oneliner with programming ligatures

Some obfuscated code (not particularly typical) as found and explained on
<https://stackoverflow.com/questions/12659951/-obfuscated-haskell-code-work>
featuring multi-character combinations <$>, <*>, =<<, >>= that "modern"
coding fonts such as <https://github.com/tonsky/FiraCode> render as ligatures.

This whole practice seems like an awful idea to me, but regardless needs to
be represented for font comparison.

Mischa POSLAWSKY [Mon, 9 Mar 2020 22:28:48 +0000 (23:28 +0100)]

powerline lookalikes for branch and linenr

Indicators for vc branch and line number are common requirements of modern
status bars. While console fonts still prefer private use area U+E0Ax,
similar symbols for "alternate key" and "newline" as mentioned in
<https://vi.stackexchange.com/a/3363> can be advertised instead.

Mischa POSLAWSKY [Mon, 9 Mar 2020 22:22:57 +0000 (23:22 +0100)]

reorder minority alphabets in overview

Cyrillic before Greek as it's stylistically closer to Latin.
Georgian before Armenian as it aligns better with following lines.

Mischa POSLAWSKY [Mon, 9 Mar 2020 22:16:57 +0000 (23:16 +0100)]

reorganise overview table

Mischa POSLAWSKY [Mon, 9 Mar 2020 20:43:45 +0000 (21:43 +0100)]

restrict currencies to most traded

A "compact" overview does not need 17 different currency symbols, mostly
inherited from the <http://kermitproject.org/utf8.html> sampler line.
Test only internationally significant currencies, guided by the top 28 listed
at <https://en.wikipedia.org/wiki/Currency?oldid=942055810#cite_ref-10>,
keeping recent additions (₹, ₽) and adding a full-width character (元).

Mischa POSLAWSKY [Mon, 25 Nov 2019 13:13:37 +0000 (14:13 +0100)]

homoglyphs of A and O

Collect visually similar characters from different scripts.
Unlike the ASCII lookalikes presented earlier, these are not expected to be
distinguishable if mixed, and represent a worst-case scenario for homograph attacks.
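
Since such lookalikes differ only in code point and script, a small Perl sketch can make that visible, assuming only core `charnames` and Perl's Unicode script properties; the sample code points and the `scripts_used` helper are illustrative, not part of the sampler:

```perl
use strict;
use warnings;
use charnames ();
binmode STDOUT, ':encoding(UTF-8)';

# Latin, Greek and Cyrillic capitals that share (nearly) the same glyph.
for my $cp (0x0041, 0x0391, 0x0410, 0x004F, 0x039F, 0x041E) {
    printf "%s U+%04X %s\n", chr $cp, $cp, charnames::viacode($cp);
}

# Hypothetical check: which of these scripts occur in a string, as a
# first hint that a word mixes lookalikes from different alphabets.
sub scripts_used {
    my ($s) = @_;
    my @found;
    push @found, 'Latin'    if $s =~ /\p{Script=Latin}/;
    push @found, 'Greek'    if $s =~ /\p{Script=Greek}/;
    push @found, 'Cyrillic' if $s =~ /\p{Script=Cyrillic}/;
    return @found;
}

# "paypal" with a Cyrillic small a in second position.
print join(' ', scripts_used("p\x{0430}ypal")), "\n";   # Latin Cyrillic
```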

Mischa POSLAWSKY [Mon, 22 Oct 2018 22:04:28 +0000 (00:04 +0200)]

excessive, scary usage of diacritics; ZALGO!

Copied from <https://knowyourmeme.com/memes/zalgo#scrambled-text>
to stress test combining marks:

- Rendering limitations may exclude glyphs after a certain number.
- Accumulated marks should extend vertically to avoid overlapping.
- Monospace rendering with increased height may cause lines to overlap
or be cropped.
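
To see at which count a renderer starts dropping marks, it helps to know how many the sample piles on each letter. A Perl sketch, assuming Perl's grapheme clusters (`\X`) roughly match what a terminal draws in one cell; `mark_counts` is a hypothetical helper, not part of the sampler:

```perl
use strict;
use warnings;

# Count the combining marks (\p{M}) stacked in each grapheme cluster (\X).
sub mark_counts {
    my ($s) = @_;
    return map { my $n = () = /\p{M}/g; $n } $s =~ /(\X)/g;
}

# Z with three combining marks, then a plain a.
print join(' ', mark_counts("Z\x{0351}\x{036B}\x{0343}a")), "\n";   # 3 0
```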

Mischa POSLAWSKY [Fri, 29 Jun 2018 20:06:21 +0000 (22:06 +0200)]

coptic sample text in old nubian

Equivalents in coptic and greek characters, copied from:
https://en.wikipedia.org/wiki/Old_Nubian?oldid=847789397#Sample_text

Mischa POSLAWSKY [Mon, 9 Mar 2020 22:33:33 +0000 (23:33 +0100)]

different vietnamese dong

Fill available space with a "different" expression, using the đồng currency sign
of its homonym (compensating for the sign's removal from the font overview),
followed by IPA pronunciation averaging several Wiktionary entries including
<https://zh.wiktionary.org/wiki/bất_đồng?oldid=4888545>, most significantly to
provide the missing ɓ, ɗ, ɜ.

Mischa POSLAWSKY [Fri, 29 Jun 2018 20:04:47 +0000 (22:04 +0200)]

pointer compass, triangles in all directions

Black triangle characters and related, similar (and next) to arrows.

Mischa POSLAWSKY [Fri, 29 Jun 2018 20:03:09 +0000 (22:03 +0200)]

random kaomoji faces

Test appearance of some common Japanese face characters, picked from
https://en.wikipedia.org/wiki/Emoticon#Japanese_style and (mixed)
https://en.wikipedia.org/wiki/List_of_emoticons#Eastern containing
some complex Unicode glyphs.

Excellent test of mixed scripts and common visual expectations.

Mischa POSLAWSKY [Fri, 29 Jun 2018 20:01:39 +0000 (22:01 +0200)]

sanskrit transcriptions from wikipedia

Compare some brahmic scripts with a common sentence copied from:
https://commons.wikimedia.org/?title=File:Phrase_sanskrit.png&oldid=308591152

Meaning seems nice and related:

> May Śiva bless those who take delight in the language of the gods

Even though the variants have various issues, and the actual source is unknown
according to http://mendenlama.tumblr.com/post/120050473698/camfoc-issues:

> the ascription of this blessing phrase to Kālidāsa is spurious

Mischa POSLAWSKY [Fri, 29 Jun 2018 14:09:31 +0000 (16:09 +0200)]

shavian corrections for "waters", "heard"

Fix waters being incorrectly transcribed as woiters, losing oil.
Replace expected h-err-d by morphophonemic h-ear-d to include another glyph.

Remaining letters unrepresented: 𐑬𐑭𐑲𐑴 𐑹𐑺𐑾

Mischa POSLAWSKY [Sat, 31 Mar 2018 18:23:14 +0000 (20:23 +0200)]

align hangeul decomposition

Start sentences at the same column, assuming expected character widths.

Mischa POSLAWSKY [Wed, 13 Jul 2016 17:08:18 +0000 (19:08 +0200)]

chinese transliteration below samples

Mixed scripts after more typical CJK.

Mischa POSLAWSKY [Sun, 5 Jun 2016 12:44:23 +0000 (14:44 +0200)]

cantonese transliteration (jyutping, ipa)

One character per line for better overview and space for additional details,
introducing common non-pinyin tone digits and ɵ pronunciation.

Mischa POSLAWSKY [Sun, 13 Sep 2015 18:17:12 +0000 (20:17 +0200)]

update dated update date to uptodate date

Mischa POSLAWSKY [Sun, 13 Sep 2015 18:07:29 +0000 (20:07 +0200)]

symmetric ascii art bunny

Keep to ASCII characters as commonly used (curved quotation marks were
likely substituted due to an erroneous copypaste).

Mischa POSLAWSKY [Sat, 12 Sep 2015 13:29:06 +0000 (15:29 +0200)]

drop mathematical ABC symbols line

Places too much emphasis on a relatively insignificant plane 1 block.
One such character also introduced in commit 30491ef4cf (2015-09-09)
[complex conjugate formula to cover blackletter and italic letters]
remains elsewhere.

Mischa POSLAWSKY [Sat, 12 Sep 2015 13:25:54 +0000 (15:25 +0200)]

insert non-joiner between non-ligature fl in german pangram

Lost during copypaste from original.

Mischa POSLAWSKY [Fri, 11 Sep 2015 18:11:40 +0000 (20:11 +0200)]

fix mistyped letter in greek iliad

Obvious mistake caught while rereading.

Mischa POSLAWSKY [Fri, 11 Sep 2015 17:22:47 +0000 (19:22 +0200)]

glagolitic tower of babel transcription

Another line to properly finish the story.  Preferred succession from
Slavonic would be old Croatian in Glagolitic script.  However, unable to
find any such version online, settle for an original composition.

Based on a different source of Church Slavonic without abbreviations
from <http://www.vechnoe.info/bible/translit/gen/11>:

Прїидѣте и изшедше смѣсимъ имъ ту язы́ки ихъ,
да не услы́шатъ ко́ждо дру́га своего.

Converted to Glagolitic using some naive conversion rules:

tr{абвгдежѕзиїклмнопрстуфхѡщцчшъыьѣёюяѩѫ,.}
  {ⰰⰱⰲⰳⰴⰵⰶⰷⰸⰺⰻⰽⰾⰿⱀⱁⱂⱃⱄⱅⱆⱇⱈⱉⱋⱌⱍⱎⱏⰹⱐⱑⱖⱓⱔⱗⱘ·:};
s/ⰹ/ⱏⰹ/g;
s/\Bⰺ/ⰻ/g;

Arbitrarily appended three dot+paragraphos punctuation to end text.
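
The conversion rules above are Perl `tr` and `s///` statements; as a runnable sketch they can be wrapped in a subroutine. This assumes lowercase input without combining accents (the quoted Slavonic still contains П and acute accents, which these rules leave untouched); the sub name `glagolitic` is hypothetical:

```perl
use strict;
use warnings;
use utf8;
use feature 'unicode_strings';
binmode STDOUT, ':encoding(UTF-8)';

# Naive Cyrillic-to-Glagolitic conversion, rules as listed above.
sub glagolitic {
    my ($s) = @_;
    $s =~ tr{абвгдежѕзиїклмнопрстуфхѡщцчшъыьѣёюяѩѫ,.}
            {ⰰⰱⰲⰳⰴⰵⰶⰷⰸⰺⰻⰽⰾⰿⱀⱁⱂⱃⱄⱅⱆⱇⱈⱉⱋⱌⱍⱎⱏⰹⱐⱑⱖⱓⱔⱗⱘ·:};
    $s =~ s/ⰹ/ⱏⰹ/g;    # ы (mapped to izhe) becomes yer + izhe
    $s =~ s/\Bⰺ/ⰻ/g;   # initial izhe only at word start, elsewhere i
    return $s;
}

print glagolitic('да не услышатъ'), "\n";
```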

Mischa POSLAWSKY [Fri, 11 Sep 2015 14:32:41 +0000 (16:32 +0200)]

cyrillic tower of babel in multiple slavic languages

Replace Russian sample by Genesis 11:1-6 with each line in another
translation from <http://www.omniglot.com/babel/langfam.htm#ie>:
Russian, Serbian, Belarusian, Ukrainian, Macedonian, Church Slavonic.
Adds 16 distinct letters in 29 forms, only loses ф.

Manually transcribed the image of Slavonic (hopefully correctly),
featuring obsolete letters (yat, yus, ou both monographic and digraphic)
and diacritics including U+0483 titlo and U+2DED es with pokrytie.

Mischa POSLAWSKY [Wed, 9 Sep 2015 21:07:56 +0000 (23:07 +0200)]

complex conjugate formula to cover blackletter and italic letters

Elaborate on complex numbers ℂ, covering some more symbols including an
italic i from plane 1. Should provide a new challenge to render correctly
(notably aligning bracket lines after combining mark and 4-byte UTF-8).

Mischa POSLAWSKY [Wed, 9 Sep 2015 20:44:50 +0000 (22:44 +0200)]

reorder languages to transition from semitic to indic

Ethiopic after Hebrew (similar languages, simple rendering);
Thai before Hindi to group the more complex scripts together.

Mischa POSLAWSKY [Wed, 9 Sep 2015 20:42:05 +0000 (22:42 +0200)]

move typography section up

More logical to list general features before going into specific languages.

Mischa POSLAWSKY [Wed, 9 Sep 2015 20:35:43 +0000 (22:35 +0200)]

move font overview to top

Generic introduction before going into specifics.

Initially kept below <pre> for html compatibility, but this should not
influence the normal/intended layout.

Mischa POSLAWSKY [Wed, 9 Sep 2015 20:25:05 +0000 (22:25 +0200)]

replace random signs in typography by basic arithmetics

Keeps minus and dashes but with better context and some other common symbols.
Loses trademark sign, but who wants to see that anyway?

Mischa POSLAWSKY [Wed, 9 Sep 2015 19:53:54 +0000 (21:53 +0200)]

typography item for currency usage (elaborating on euro sign)

Sample Spanish, American, and Japanese pricing syntax equivalent to
"1 USD a piece", featuring the cada/una sign, superscript zeroes,
and small katakana and wide numbers.

Mischa POSLAWSKY [Wed, 9 Sep 2015 00:50:00 +0000 (02:50 +0200)]

improve paragraph headers

Mark sub-headers with leading bullet (improve structural clarity without
another indentation level).

Mischa POSLAWSKY [Tue, 8 Sep 2015 23:17:49 +0000 (01:17 +0200)]

replace ethiopic by multilingual sample

Replace Amharic proverbs by Unicode introduction translations
from <http://www.unicode.org/standard/WhatIsUnicode-more.html>
> Unicode provides a unique number for every character,
> no matter what the platform,
> no matter what the program,
> no matter what the language.
in common Ethiopian/Amharic, non-semitic Agaw/B(i)lin, and supplemented
SebatBe(i)t/Cheha, for a more varied overview: containing 61 letter forms
from 29 distinct consonant groups, previously 82 from 25.

Other translations in Tigrigna and Xamtanga do not fit in 80 columns,
but are related to Ethiopian and Agaw respectively so are not too distinct.

Different punctuation marks in originals were kept for variety.
Cheha includes 2 characters (ᎏ and ᎇ) from Ethiopic Supplement block U+138x
and 1 variant form (ኵ could be rendered with a labialisation loop).

Unfortunately no extended code points (U+2D8x/AB0x) as sources are very hard
to come by: theoretically there should be bible translations of Basketo,
Gumuz, or Bench; but no equivalent texts could be found online, let alone in
a suitable encoding (found only snippets with private-use chars and images).

Mischa POSLAWSKY [Tue, 8 Sep 2015 21:33:23 +0000 (23:33 +0200)]

extend hebrew sample to 5 lines

Two lines is a very minimal test; rather include the entire body from
<http://www.oocities.org/kr/tomchiukc/Language/Unicode/x-utf8.html>
covering the entire alefbet (adds חטפק).

Mischa POSLAWSKY [Tue, 8 Sep 2015 20:58:39 +0000 (22:58 +0200)]

combine short pangrams

Single line showcasing all language-specific letters of
Danish, Hungarian, Polish, Esperanto.

Besides saving space, it gives a good overview of traditional
character set compatibility, respectively listing characters from
ISO-8859/Latin-1 (da), Latin-2 (hu/pl), and Latin-3 (eo).

Bullet separators are only in Windows-125x supersets.
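
Such charset claims can be confirmed by trying to encode each language's letters. A Perl sketch using the core Encode module; the letter selection per language is an illustrative assumption rather than the actual sample line, and `fits` is a hypothetical helper:

```perl
use strict;
use warnings;
use Encode ();

# A few language-specific letters per pangram, plus the bullet separator.
my %sample = (
    da     => "\x{E6}\x{F8}\x{E5}",       # æ ø å
    hu     => "\x{151}\x{171}",           # ő ű
    pl     => "\x{105}\x{142}\x{17C}",    # ą ł ż
    eo     => "\x{109}\x{11D}\x{16D}",    # ĉ ĝ ŭ
    bullet => "\x{2022}",                 # •
);

# Hypothetical helper: true if every character maps into the legacy charset.
sub fits {
    my ($charset, $text) = @_;
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();
    my $bytes = eval { Encode::encode($charset, $text, $check) };
    return defined $bytes;
}

for my $charset (qw(iso-8859-1 iso-8859-2 iso-8859-3 cp1252)) {
    my @ok = grep { fits($charset, $sample{$_}) } sort keys %sample;
    print "$charset: @ok\n";
}
```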

Bedaŭrinde malgajnis mian propran ŝerceton ĉar tro longas :(
(Unfortunately lost my own little joke because it is too long.)

Mischa POSLAWSKY [Tue, 8 Sep 2015 20:33:19 +0000 (22:33 +0200)]

lithuanian pangram

While it does not introduce any new diacritics (just czech háčeks and polish
ogoneks and dot) the different combinations are not as widely supported
(less widely used and not in ISO-8859-2).

Mischa POSLAWSKY [Tue, 8 Sep 2015 19:58:26 +0000 (21:58 +0200)]

more natural ipa transcription of english panphone

The word "again" is rarely pronounced with a palatal g, and usually with
a monophthong. Instead include other non-phonemic realisations like
- common aspiration (frequent after initial voiceless stops),
- retraction (postalveolar variation before rhotic sound), and
- release (not audible for the first stop of a cluster).
Also replace one instance of /ʌ/ by allophone /ɐ/.

Mischa POSLAWSKY [Mon, 7 Sep 2015 16:32:31 +0000 (18:32 +0200)]

hebrew zarka table including all common prosodic signs

Test sentence from <http://www.sagreiss.org/cantillizer/cantillation.htm>
covering 32 distinct marks over 24 letters. Superset of more commonly used
vowel points, with combinations of up to 4 glyphs. Notably Unifont (v8.0.01)
makes no effort at all even to avoid overlapping with letters.

Mischa POSLAWSKY [Mon, 7 Sep 2015 15:27:11 +0000 (17:27 +0200)]

thai pangram

Replace random text containing only 43 distinct letters with 15 diacritics
(103 combinations) by a dedicated pangram used on multiple websites, notably
<http://www.thai-language.com/ref/typographical-styles>. It is said to be
owned by "The Computer Association of Thailand under the Royal Patronage of
His Majesty the King" (abbreviated to fit in header line).

This should cover all commonly used forms (an additional 11 letters in
86 combinations). It still lacks the mostly obsolete consonants ฃ and ฅ,
but includes all vowel signs not found in other pangrams.
Also no traditional numerals, but western digits are more commonly used.

Padded to retain aligned columns (but no longer explicitly indicated as it's
only a minor aspect of correct rendering).

Append "angkhan wisanchani khomut" marking the very end of a written work
to cover khankhu and khomut signs, but leaving out the obsolete fongman mark
at the beginning.

Mischa POSLAWSKY [Sat, 5 Sep 2015 08:52:49 +0000 (10:52 +0200)]

cover all latin1 spacing accents by adding cedilla

Fill line by including missing U+00B8 CEDILLA which exists together with
Unicode extension U+02D8 BREVE on U+1E1D E.
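
The combining marks inside such a precomposed letter can be listed with the core Unicode::Normalize module; the spacing CEDILLA and BREVE correspond to the combining U+0327 and U+0306 in its decomposition. A short Perl sketch, where `decompose` is a hypothetical helper name:

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: code points of the canonical decomposition (NFD).
sub decompose {
    my ($s) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, NFD($s);
}

print decompose("\x{1E1D}"), "\n";   # U+0065 U+0327 U+0306
```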

Mischa POSLAWSKY [Sat, 5 Sep 2015 08:33:29 +0000 (10:33 +0200)]

include vietnamese in diacritics decomposition line

Test multiple combining marks with a sentence containing various accents from
<http://lists.hanoilug.org/pipermail/du-an-most/2011-December/005228.html>
(apparently a common saying).

Replace latvian by a shorter variant from <http://clagnut.com/blog/2380/>,
with unneeded adjectives (brīvi, celofāna) omitted to fit.

Mischa POSLAWSKY [Sat, 5 Sep 2015 08:29:58 +0000 (10:29 +0200)]

single-line slovak pangram

More efficient sentence from <https://sk.wikipedia.org?oldid=6081871>,
lacking some ASCII but including all specific letters.

Mischa POSLAWSKY [Sat, 5 Sep 2015 07:45:42 +0000 (09:45 +0200)]

ancient greek verse from homer's iliad

Slightly non-standard orthography showcasing some archaic features.

Based on polytonic transcription with pneuma (including psili), tonos
(oxeia, bareia, perispomeni), coronis, and diairesis. Includes macron
length marks from <http://www.ancientgreekonline.com/Iliad/Iliad.htm>,
which is also the source of the more minimal punctuation:

> Modern emendations and modern punctuation have been avoided wherever
> possible, especially in the case of commas, which are usually an unwelcome
> intrusion upon the exquisite system of particles that give both clarity
> and effervescence to the epic hexameters.

Reconstructed digamma and qoppa have been ported from the hypothetical
adapter's hand illustrated on p65 of "Homer and the Origin of the Greek
Alphabet", 1991 by Barry B. Powell (available at <http://monoskop.org/>),
resulting in one of each:

> the digamma was written, in recording poetry, only in those cases where the
> sound represented by digamma still made metrical position in the verse.

This seems like a far better approach compared to for example
<http://www.download-free-mp3music.com/song/homers-iliad-1-32/190349993/>
which attempts to recover all parachronistic occurrences.

Qoppa is rarely seen in modern texts, but is an appropriate variation of
kappa before back vowels.

Does not include san variation (too random in this style), nor iota
ligatures (which came later). Does seem to feature everything from the
previous sample, except for the excessive amount of lines.

Mischa POSLAWSKY [Sat, 5 Sep 2015 08:10:00 +0000 (10:10 +0200)]

monotonic greek anthem

Replace hymn lyrics by post-1982 orthography, which should be covered first
and foremost. Includes all letters except Ξ and Φ, including accented
vowels; just no dialytica.

Mischa POSLAWSKY [Sat, 5 Sep 2015 06:10:04 +0000 (08:10 +0200)]

enlarge font overview block to 4 lines

Extend further to be able to feature:

- all 94 ASCII characters;
- uncommon extension examples for georgian, hebrew, arabic;
- many more currency signs (inspired by HTML sampler top row),
  grouped by rarity: common symbols first (last ones are new to Unicode:
  rupee in 6.0, ruble in 7.0), then other significant national symbols
  (assortment from <http://sheet.shiar.nl/unicode>);
- dingbats and technical symbols (mostly personal favourites),
  restrict to plane 0 for now (extended emoji have unreliable width);
- common key/control graphic representations (space, bs, option/alt,
  command, menu, enter, null; also playstation (or symbolics kb) shapes).
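
The count of 94 follows from the printable range excluding space, 0x21 through 0x7E inclusive (0x7E - 0x21 + 1 = 94). A Perl sketch to generate that range and check a candidate line against it, with `missing_ascii` a hypothetical helper:

```perl
use strict;
use warnings;

# Printable ASCII excluding space: 0x21 '!' to 0x7E '~' inclusive.
my $ascii = join '', map { chr($_) } 0x21 .. 0x7E;
printf "%d characters: %s\n", length($ascii), $ascii;   # 94 characters

# Hypothetical helper: which of those characters a sample line still lacks.
sub missing_ascii {
    my ($line) = @_;
    my %seen = map { $_ => 1 } split //, $line;
    return grep { !$seen{$_} } split //, $ascii;
}
```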

Mischa POSLAWSKY [Sat, 5 Sep 2015 05:22:52 +0000 (07:22 +0200)]

extend font overview to 3x70

Maximise width to include more characters, including:

- more punctuation (guillemets, <>, section marks);
- grouped brackets for better discoverability;
- the first 3 letters for other alphabets (testing support of arabic
contextual forms);
- extended cyrillic (old yat, uncommon accent).

Mischa POSLAWSKY [Thu, 3 Sep 2015 10:33:51 +0000 (12:33 +0200)]

separate perl operators with whitespace

Assume keming is never really an issue for ASCII,
so prefer common spacing style outside of golf.

Mischa POSLAWSKY [Thu, 3 Sep 2015 10:32:29 +0000 (12:32 +0200)]

random c code sample

From <http://people.mpi-inf.mpg.de/~uwe/misc/uw-ttyp0/>.
Includes several bitwise operators and other common sequences (->, !=, ++).

Mischa POSLAWSKY [Thu, 3 Sep 2015 01:14:21 +0000 (03:14 +0200)]

chess notation line

Part of the "Immortal Game" between Adolf Anderssen and Lionel Kieseritzky,
from <https://en.wikipedia.org?oldid=644618706> in algebraic notation with:

- figurine pieces (7 out of 12, and 5 kinds (only missing rooks), probably
  as good as we can get without resorting to an uninspired listing),
- chess marks using stylistic daggers,
- precomposed annotation symbols ligature of ?! (along with ⁇/⁉/‼ apparently
  in Unicode for this very purpose),
- precomposed glyphs for ordinal digits with period,
- multiplication symbol for captures (common in books, as summarised in
  Wikipedia discussion <https://en.wikipedia.org?oldid=599812962>
  though they finally prefer compatibility over typography it seems).

Mischa POSLAWSKY [Thu, 3 Sep 2015 01:09:20 +0000 (03:09 +0200)]

lookalikes for 2 and 5

Though rarely indistinct, still good for font comparison.

Mischa POSLAWSKY [Thu, 3 Sep 2015 01:07:02 +0000 (03:07 +0200)]

append guillemets to quoting line

Mischa POSLAWSKY [Thu, 3 Sep 2015 00:46:23 +0000 (02:46 +0200)]

condense typography points

- Quoting styles on single line;
- Additional latin1 spacing accents in sentence;
- Simplify euro sign sample (extremely commonly supported nowadays),
removing need for double spacing.

Mischa POSLAWSKY [Thu, 3 Sep 2015 00:17:57 +0000 (02:17 +0200)]

arrow drawing characters

Random assortment of basic arrow characters, especially to test arrow line
extensions U+23AF/23D0. Also contains all eighth blocks and scan lines.

Mischa POSLAWSKY [Sat, 22 Aug 2015 04:20:17 +0000 (06:20 +0200)]

aztec scan code in block drawing characters

Test proper rendering and alignment (which can be verified by external scanner)
of an original 2D code creatable using Barcode Writer in Pure PostScript:

cat <<-. |
0 0 moveto (Unicode code) (ecaddchars=2)
/azteccode /uk.co.terryburton.bwipp findresource exec
.
cat /usr/share/libpostscriptbarcode/barcode.ps -

Older 2014 version in Debian needs pre-encoded input to create 15x15 code:

(10110111000111101010001001000000101001100000100100100000010100110)(raw)

Then converted via PBM to 2x2 box characters:

pstopnm -pbm -forceplain -stdout -portrait -xsize 15 \
-llx 0 -lly 0 -urx .42 -ury .42 -xborder 0 -yborder 0 |
perl -CO -ln -e'
use utf8;
/^[01]+$/  or next;
sub halfbits ($;$) {
pack "C*", map { $_ << $_[1] } unpack "C*", # multiply values
pack "(b2)*", split /..\K/, $_[0];  # value of every 2 bits
}
($_) = halfbits($_) | halfbits(<>, 2);  # 0..3 + 0,4,8,12
y/\0-\017/ ▘▝▀▖▌▞▛▗▚▐▜▄▙▟█/;
print;
'

Ironically, Aztec Codes do not really support Unicode (raw bytes are
declared to be Latin1), but it's currently the only widely supported 2D code
of this size.
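
The core of the one-liner is packing each 2x2 pixel square into a 4-bit value that indexes the quadrant glyphs. A more explicit Perl sketch of the same mapping, reading two rows of 0/1 text at a time; it assumes plain-PBM-style rows without spaces and pads a missing last row with white, and `quadrant_row` is a hypothetical name:

```perl
use strict;
use warnings;
binmode STDOUT, ':encoding(UTF-8)';

# Quadrant glyphs indexed by a 4-bit value: 1 = upper left, 2 = upper right,
# 4 = lower left, 8 = lower right (the order of the y/// list above).
my @quadrant = map { chr($_) } 0x20, 0x2598, 0x259D, 0x2580, 0x2596,
    0x258C, 0x259E, 0x259B, 0x2597, 0x259A, 0x2590, 0x259C, 0x2584,
    0x2599, 0x259F, 0x2588;

# Hypothetical helper: merge two rows of 0/1 pixels into one row of text.
sub quadrant_row {
    my ($top, $bottom) = @_;
    $bottom //= '0' x length($top);   # odd number of rows: pad with white
    my $out = '';
    for (my $i = 0; $i < length($top); $i += 2) {
        # substr returns '' past the end of an odd-width row; treat as 0
        my @px = map { $_ || 0 } substr($top, $i, 1), substr($top, $i + 1, 1),
            substr($bottom, $i, 1), substr($bottom, $i + 1, 1);
        $out .= $quadrant[ $px[0] + 2 * $px[1] + 4 * $px[2] + 8 * $px[3] ];
    }
    return $out;
}

print quadrant_row('1001', '0110'), "\n";   # ▚▞
```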

Mischa POSLAWSKY [Thu, 3 Sep 2015 00:05:44 +0000 (02:05 +0200)]

ascii/jis art of bunny and japanese smiley thing

Bunny from <https://steamcommunity.com/groups/ascii-art> and elsewhere
to test practical line art.

Kaomoji amalgamated from various sources to mix widths and scripts
(Japanese, Tibetan, Geometric, Maths).

Mischa POSLAWSKY [Thu, 3 Sep 2015 00:04:25 +0000 (02:04 +0200)]

ascii art for practical box drawing test

ANSI art example from <https://commons.wikimedia.org?oldid=117654083>
featuring IBM-CP437 compatible block drawing commonly found in .nfo logos.

Mischa POSLAWSKY [Wed, 2 Sep 2015 22:25:06 +0000 (00:25 +0200)]

redesign box drawing to fit 137 code points in less space

The 5 7x7 box figures can be condensed into 3 with similar aesthetics,
more concisely targeted to:
- simple single and double lines (IBM CP850 superset),
- single/double transitions and round corners (previously 3rd and 4th boxes,
includes all IBM CP437 lines),
- heavy lines and diagonals (5th box with 1st innards).

Introduce another such drawing featuring all dashed lines (including
previously missing quadruple dashed horizontal lines) and block quadrants
(still missing ▚ and ▞).

Keep some smaller figures for heavy line combinations, introducing a single
#-shape to cover half line endings as well as 4 additional heavy transitions,
though various other such code points are still missing.

Mischa POSLAWSKY [Tue, 1 Sep 2015 16:42:16 +0000 (18:42 +0200)]

micro sign in scientific sample

Replace random length unit to cover another common (Latin1) code point.

Mischa POSLAWSKY [Tue, 1 Sep 2015 16:27:41 +0000 (18:27 +0200)]

append negative squared letters to mathematical fonts

A, B, and AB indicate blood types, and are supported
as :x: campfire/github/&c emoticon entities.

Mischa POSLAWSKY [Tue, 1 Sep 2015 16:18:17 +0000 (18:18 +0200)]

circled letters to introduce mathematical fonts

More general-purpose, but similar and more widely supported, so they
leave a basic impression if the other glyphs do not render.

Mischa POSLAWSKY [Tue, 1 Sep 2015 16:03:05 +0000 (18:03 +0200)]

mathematical letter symbols (ABC in all styles)

Compare letterlike fonts at U+2100 and U+1D400.
Should not be used to create words.
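
A minimal Python sketch of such a comparison, assuming only the standard
`unicodedata` module and Unicode character names: each capital is looked up
as `MATHEMATICAL <style> CAPITAL <letter>`, falling back to the Letterlike
Symbols block (U+2100) where U+1D400 leaves a reserved hole. For A to C those
holes are script B, Fraktur C and double-struck C. The helper `styled` and the
table `LETTERLIKE_PREFIX` are hypothetical and only cover the holes needed here.

```python
import unicodedata

# Style names as they appear in "MATHEMATICAL <style> CAPITAL <letter>".
STYLES = [
    "BOLD", "ITALIC", "BOLD ITALIC", "SCRIPT", "BOLD SCRIPT",
    "FRAKTUR", "DOUBLE-STRUCK", "BOLD FRAKTUR", "SANS-SERIF",
    "SANS-SERIF BOLD", "SANS-SERIF ITALIC", "SANS-SERIF BOLD ITALIC",
    "MONOSPACE",
]

# Hypothetical fallback table: name prefixes of the older letterlike symbols
# that fill reserved code points in U+1D400 (only what A, B, C need).
LETTERLIKE_PREFIX = {
    "SCRIPT": "SCRIPT CAPITAL",
    "FRAKTUR": "BLACK-LETTER CAPITAL",
    "DOUBLE-STRUCK": "DOUBLE-STRUCK CAPITAL",
}

def styled(letter, style):
    """Return a capital letter in the given mathematical style, using the
    letterlike symbol when the mathematical code point is reserved."""
    try:
        return unicodedata.lookup(f"MATHEMATICAL {style} CAPITAL {letter}")
    except KeyError:
        return unicodedata.lookup(f"{LETTERLIKE_PREFIX[style]} {letter}")

for style in STYLES:
    print(f"{style:24} {''.join(styled(c, style) for c in 'ABC')}")
```

All of these fold back to plain ABC under NFKC, which is one reason they
make poor substitutes for real words.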

Mischa POSLAWSKY [Tue, 1 Sep 2015 14:49:46 +0000 (16:49 +0200)]

replace mixed scripts in "stargate" diacritics sample

Prefer latin "turned V" over greek lambda for a strokeless A.
It may not matter in most fonts, but should be more appropriate,
or at least introduces another (rarer) character.

Mischa POSLAWSKY [Tue, 1 Sep 2015 14:38:52 +0000 (16:38 +0200)]

khoekhoen/nama pangram to cover khoisan orthography

Sample from <http://www.omniglot.com/writing/khoekhoe.htm>
for the most widely spoken "Khoisan" language, featuring 3 click letters
(only lacking ʘ) and 2 distinct tone accents.

Mischa POSLAWSKY [Tue, 1 Sep 2015 14:13:46 +0000 (16:13 +0200)]

igbo single line to feature all non-ascii letters

Keep only the second sentence, since it contains all the distinctive characters.
This part should remain correct stand-alone, translating as "Rejoice, get
together, speak and agree that it may stand firm, (s)he surely will grow".

Translation and contraction from a deleted Wikipedia page, available at:
<http://wpedia.goo.ne.jp/enwiki/Wikipedia_talk:Articles_for_creation/Igbo_Pangram>

Unfortunately, no suitable samples were found featuring tone marks as
commonly used in practical orthographies.

Mischa POSLAWSKY [Wed, 26 Aug 2015 06:15:10 +0000 (08:15 +0200)]

indicate voiced sounds in katakana

Test properly voiced characters, which unlike hiragana are not covered by
the kanji version.

Mischa POSLAWSKY [Wed, 26 Aug 2015 06:13:54 +0000 (08:13 +0200)]

align japanese characters using ideographic spaces

Avoid excessive column misalignment with variable width rendering.
Test monospacing equivalence in header row only.
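
Ideographic space U+3000 has East Asian Width "F" (fullwidth), so in a
monospaced rendering it takes the same two columns as a kana. A rough Python
sketch of padding cells this way; `columns` and `pad_cell` are hypothetical
helpers, and the two-columns-for-W/F width model is an assumption (terminals
may treat ambiguous-width characters differently).

```python
import unicodedata

IDEOGRAPHIC_SPACE = "\u3000"

def columns(text):
    """Approximate display width: wide (W) and fullwidth (F) count as 2."""
    return sum(2 if unicodedata.east_asian_width(c) in ("W", "F") else 1
               for c in text)

def pad_cell(text, width):
    """Pad a cell of wide characters to `width` columns using ideographic
    spaces; assumes the shortfall is an even number of columns."""
    shortfall = width - columns(text)
    return text + IDEOGRAPHIC_SPACE * (shortfall // 2)

for cell in ["ア", "カ", "キャ"]:
    print(pad_cell(cell, 6) + "|")
```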

Mischa POSLAWSKY [Wed, 26 Aug 2015 05:56:02 +0000 (07:56 +0200)]

korean pangram with halfwidth jamo

Another alphabetic equivalent for modern korean, created by:

perl -Mcharnames=:full -CS -pe'package charnames;
s{\S}{chr vianame(viacode(ord $&) =~ s/^(?=HANGUL)/HALFWIDTH /r)}ge'

Mischa POSLAWSKY [Wed, 26 Aug 2015 05:50:29 +0000 (07:50 +0200)]

korean pangram with separate jamo

Purely alphabetic variant for further comparison, created using:

perl -Mcharnames=:full -CS -pe 'package charnames;
s/\N{HANGUL CHOSEONG IEUNG}//g;
s{\S}{chr vianame(viacode(ord $&) =~ s/^HANGUL \K\S+/LETTER/r)}ge'
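
A Python sketch of the same name rewriting, assuming conjoining jamo input
such as the NFD form of precomposed syllables. `to_compat_jamo` is a
hypothetical name, and every non-whitespace character must have a Unicode
name for `unicodedata.name` to succeed.

```python
import re
import unicodedata

def to_compat_jamo(text):
    """Drop the silent initial ieung, then rename each 'HANGUL <kind> X'
    character to 'HANGUL LETTER X' (compatibility jamo)."""
    text = text.replace(unicodedata.lookup("HANGUL CHOSEONG IEUNG"), "")
    out = []
    for ch in text:
        if not ch.isspace():
            # Same as Perl s/^HANGUL \K\S+/LETTER/: replace the word after HANGUL
            name = re.sub(r"^HANGUL \S+", "HANGUL LETTER", unicodedata.name(ch))
            ch = unicodedata.lookup(name)
        out.append(ch)
    return "".join(out)

decomposed = unicodedata.normalize("NFD", "한글 아")
print(to_compat_jamo(decomposed))
```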

Mischa POSLAWSKY [Wed, 26 Aug 2015 05:39:14 +0000 (07:39 +0200)]

korean pangram to compare jamo decomposition

Seeing how some fonts/terminals/editors mangle the jamo version of
hunminjeongeum, add a line of modern korean in different representations
(precomposed and decomposed) to more extensively test equivalent rendering.

This most complete option from <https://ko.wikipedia.org/?oldid=14664370>
contains all jamo including double consonants and combined vowels.

Decomposed version created using:
perl -MLingua::KO::Hangul::Util=:all -CS -ne 'print decomposeSyllable($_)'
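
Hangul syllable decomposition is purely arithmetic (the conjoining jamo
algorithm in the Unicode Standard), so the same result as
`unicodedata.normalize("NFD", ...)` can be computed directly. A Python sketch
of that arithmetic, not the Lingua::KO::Hangul::Util implementation; the
function names are hypothetical.

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT      # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT      # 11172 precomposed syllables

def decompose_syllable(ch):
    """Split one precomposed Hangul syllable into conjoining jamo;
    any other character is returned unchanged."""
    s_index = ord(ch) - S_BASE
    if not 0 <= s_index < S_COUNT:
        return ch
    l_index, rest = divmod(s_index, N_COUNT)
    v_index, t_index = divmod(rest, T_COUNT)
    jamo = chr(L_BASE + l_index) + chr(V_BASE + v_index)
    if t_index:                  # 0 means the syllable has no final consonant
        jamo += chr(T_BASE + t_index)
    return jamo

def decompose(text):
    return "".join(decompose_syllable(ch) for ch in text)

line = "한글"
print([f"U+{ord(c):04X}" for c in decompose(line)])
print(decompose(line) == unicodedata.normalize("NFD", line))  # True
```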

Mischa POSLAWSKY [Wed, 26 Aug 2015 05:19:31 +0000 (07:19 +0200)]

original middle korean hangeul for hunminjeongeum

Based on <http://faq.ktug.or.kr/wiki/uploads/hunmin.uni> (3rd lines)
with private use characters replaced manually, and hangeul syllables
decomposed to match.

Mischa POSLAWSKY [Wed, 26 Aug 2015 05:17:50 +0000 (07:17 +0200)]

korean hunminjeongeum

Original introduction to hangeul in modern korean and classical chinese from
<https://ko.wikipedia.org?oldid=14743128> with 스물여덟 replaced by 28 to
test mixing modern digits.

Mischa POSLAWSKY [Tue, 25 Aug 2015 11:41:26 +0000 (13:41 +0200)]

test diacritic composition with latvian pangram

Compare the same sentence with precomposed and decomposed characters,
which should look alike with correct support for diacritics composition.
The (alternate) Latvian pangram features accents both above and below
letters, and does not match automated Unicode decomposition because
typographically preferred comma accents are used instead of cedillas.
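
A Python sketch of the mismatch: NFD of the precomposed Latvian letters yields
U+0327 COMBINING CEDILLA, so a decomposed line written with U+0326 COMBINING
COMMA BELOW cannot be recomposed by NFC. The helper `decompose_with_commas` is
hypothetical and simply swaps every cedilla, ignoring that lowercase ģ is
usually drawn with a comma above.

```python
import unicodedata

CEDILLA = "\u0327"       # COMBINING CEDILLA, produced by NFD
COMMA_BELOW = "\u0326"   # COMBINING COMMA BELOW, the preferred Latvian shape

def decompose_with_commas(text):
    """NFD, then replace every combining cedilla with a comma below."""
    return unicodedata.normalize("NFD", text).replace(CEDILLA, COMMA_BELOW)

word = "\u0137\u0113ni\u0146\u0161"   # "ķēniņš" with precomposed letters
decomposed = decompose_with_commas(word)

print(unicodedata.normalize("NFD", "\u0137") == "k" + CEDILLA)  # True
print(unicodedata.normalize("NFC", decomposed) == word)         # False
```

Only ē and š recompose; k and n keep a separate comma below, since Unicode
has no precomposed letters for those combinations.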

Mischa POSLAWSKY [Tue, 25 Aug 2015 11:37:06 +0000 (13:37 +0200)]

replace duplicate "text" in introduction by synonym

Mischa POSLAWSKY [Tue, 25 Aug 2015 11:36:36 +0000 (13:36 +0200)]

shavian transcription of english panphone

Mischa POSLAWSKY [Tue, 25 Aug 2015 11:04:23 +0000 (13:04 +0200)]

runic punctuation in rune sentence

Mischa POSLAWSKY [Tue, 25 Aug 2015 11:03:18 +0000 (13:03 +0200)]

move old english near modern english section

Mischa POSLAWSKY [Tue, 25 Aug 2015 10:53:38 +0000 (12:53 +0200)]

border around font overview instead of typography list

Mischa POSLAWSKY [Tue, 25 Aug 2015 10:49:33 +0000 (12:49 +0200)]

adjust font overview to include more ascii characters

Mischa POSLAWSKY [Tue, 25 Aug 2015 10:31:07 +0000 (12:31 +0200)]

german pangram with precomposed ligatures

From HTML sampler, attributed to Karl Pentzlin.  Covers many digraphs and
trigraphs, some of which have been replaced by presentation forms where
available in Unicode.  This is mostly a technical test of code points,
not of proper typesetting: proper ligatures should be determined by fonts
and rarely match only these 6 sets.

Common Fraktur ligatures: ch ck ff ffi ffl fft fi fl ft ll ſch ſi ſſ ſt tz;
replaced by single glyphs for:  ﬀ  ﬃ   ﬄ       ﬁ  ﬂ                  ﬅ.
Usage of long s precludes inclusion of U+FB06 ﬆ, but this is already present
elsewhere.
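
The six presentation forms can be checked against their compatibility
decompositions in the Unicode Character Database. A Python sketch with a
hypothetical helper `expand`; note that NFKC goes one step further and also
folds ſ to s, so ﬅ and ﬆ both normalise to plain "st".

```python
import unicodedata

# The six presentation forms used in the sample: ﬀ ﬁ ﬂ ﬃ ﬄ ﬅ
LIGATURES = "\ufb00\ufb01\ufb02\ufb03\ufb04\ufb05"

def expand(lig):
    """Return the letters a ligature stands for, from its one-level
    <compat> decomposition in the Unicode Character Database."""
    fields = unicodedata.decomposition(lig).split()
    if not fields or fields[0] != "<compat>":
        raise ValueError(f"U+{ord(lig):04X} is not a compatibility ligature")
    return "".join(chr(int(cp, 16)) for cp in fields[1:])

for lig in LIGATURES:
    print(f"U+{ord(lig):04X} {lig} = {expand(lig)}  {unicodedata.name(lig)}")
```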

Mischa POSLAWSKY [Tue, 25 Aug 2015 10:27:26 +0000 (12:27 +0200)]

amend dutch pangram with short afrikaans to cover accents

Closely related languages augment each other well:
the 'n digraph is only used in afrikaans, ij only in dutch, and
accented letters occur in both but are more common in afrikaans.

Unicode sampler - various texts to test Unicode support
