unicode-sampler.git
5 months agodomino tile celebrating number 21 master
Mischa POSLAWSKY [Sun, 5 Nov 2023 20:25:16 +0000 (21:25 +0100)]
domino tile celebrating number 21

Technically 15 in base 7 (2×7 + 1), but assume the pips are read as base-10 digits.
Alternatively U+1F0F5 🃵 PLAYING CARD TRUMP-21 might be represented by
roman ⅩⅡ, but not semantically.
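
The base-7 reading can be checked with a short Python sketch. It assumes the
horizontal domino tiles are encoded in pip order starting at
DOMINO TILE HORIZONTAL-00-00, so the offset of tile 2-1 should equal its
base-7 value; the helper name `domino_offset` is made up for illustration.

```python
import unicodedata

def domino_offset(left: int, right: int) -> int:
    """Offset of a horizontal domino tile from the 0-0 tile (hypothetical helper)."""
    first = ord(unicodedata.lookup("DOMINO TILE HORIZONTAL-00-00"))
    tile = ord(unicodedata.lookup(f"DOMINO TILE HORIZONTAL-{left:02d}-{right:02d}"))
    return tile - first

# Pips 2 and 1 read as base-7 digits: 2*7 + 1.
print(int("21", 7))
# Distance of tile 2-1 from tile 0-0 within the block.
print(domino_offset(2, 1))
```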

5 months agoextended repertoire of latest muchunicode conversions
Mischa POSLAWSKY [Sun, 5 Nov 2023 18:59:09 +0000 (19:59 +0100)]
extended repertoire of latest muchunicode conversions

More relevant matches from later or obscure unicode extensions:
- Filler lookalikes and/or digraphs for:
  - "c" (with hook to refer to "h" pronunciation),
  - "un" (armenian t),
  - "i" (information symbol),
  - "co" (digraph),
  - "d" (roman 500),
  - "e" (estimated symbol).
- Acute variants for "h" and "d" (fuzzy semantics but similar).
- Letter variants with dot below,
  - only missing "C"/"c", good for combining dot and diaeresis.
- Stroked variants for "m" (mill) and "n" (oblique), improved for:
  - "d" (overlay),
  - "u" (stroke without smallcaps standin),
  - "U" (bar),
  - complementary digits.

5 months agomissing muchunicode conversions
Mischa POSLAWSKY [Thu, 26 Oct 2023 17:23:21 +0000 (19:23 +0200)]
missing muchunicode conversions

Initial copy of Fullwidth (plus Tags) and [A-cute, Rock Dots, Stroked]
pseudoalphabets from <https://qaz.wtf/u/convert.cgi?text=MuchUniCode>.
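
Two of these pseudoalphabets are plain code point offsets from ASCII, which a
minimal Python sketch can reproduce. This is not necessarily how the linked
converter works; the function names are invented here, and the other styles
(A-cute, Rock Dots, Stroked) need hand-picked lookup tables instead.

```python
def to_fullwidth(text: str) -> str:
    """Map printable ASCII (U+0021..U+007E) to Fullwidth Forms by adding 0xFEE0."""
    return "".join(
        chr(ord(ch) + 0xFEE0) if "!" <= ch <= "~" else ch
        for ch in text
    )

def to_tags(text: str) -> str:
    """Map ASCII U+0020..U+007E to the (invisible) Tags block by adding 0xE0000."""
    return "".join(
        chr(ord(ch) + 0xE0000) if " " <= ch <= "~" else ch
        for ch in text
    )

print(to_fullwidth("MuchUniCode"))
print([f"U+{ord(ch):05X}" for ch in to_tags("Much")])
```

The space is left alone in `to_fullwidth` because its wide counterpart
(U+3000) does not follow the same offset.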

12 months agogame emojis (cards, hands, races, environments, foods)
Mischa POSLAWSKY [Mon, 17 Oct 2022 20:24:59 +0000 (22:24 +0200)]
game emojis (cards, hands, races, environments, foods)

Either direct representations, or picked visually most recognisable:

- playing card suits
  - French: diamonds, hearts, spades, clubs
  - Swiss: bells, shields, roses, acorns (chestnut)
  - Latin: clubs (dagger), cups (trophy), swords, coins
- rock paper scissors hands (fist palm V-sign)
  - excluding spock 🖖 and lizard 🤏
- Starcraft races (terran zerg protoss, or man bug alien)
- Magic the Gathering mana colours
  - sun for white plains
  - tree for green forests
  - fire(ball) for red mountains
  - skull for black swamps
  - water (drop) for blue islands
- Wingspan foods (cherries, worm, wheat (rice), fish, rat)

18 months agoencode actual data in base2048, also ecoji
Mischa POSLAWSKY [Sun, 21 Aug 2022 17:50:04 +0000 (19:50 +0200)]
encode actual data in base2048, also ecoji

Replace random payload by the popular 0x09F9 cryptographic key (belatedly)
following the 2007 protest/meme.

Alternate encoding in emoji glyphs <https://github.com/keith-turner/ecoji>
for some random coverage.

18 months agobase2048 encrypted line of random glyphs
Mischa POSLAWSKY [Sun, 21 Aug 2022 14:15:22 +0000 (16:15 +0200)]
base2048 encrypted line of random glyphs

Binary encoding <https://github.com/qntm/base2048> using Twitter "light"
characters, encountered for Hatetris <https://qntm.org/hatetris> replays.

18 months agoreplace fake armenian full stop
Mischa POSLAWSKY [Mon, 26 Sep 2022 15:49:10 +0000 (17:49 +0200)]
replace fake armenian full stop

Proper U+0589 (։) instead of lookalike colon U+3A (:).
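
A quick way to tell the two apart when they render alike is to print their
character names; a small Python sketch using only the standard library:

```python
import unicodedata

# The Armenian full stop and the ASCII colon it was confused with.
for ch in ("\u0589", "\u003A"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```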

18 months agoanimal band learns to play unicode 15 instruments
Mischa POSLAWSKY [Wed, 14 Sep 2022 00:22:02 +0000 (02:22 +0200)]
animal band learns to play unicode 15 instruments

Bring back turtle and snail to show off some flute and maracas.

19 months agoalfauxbet old-lisu letters D/G/L
Mischa POSLAWSKY [Sun, 11 Sep 2022 22:57:09 +0000 (00:57 +0200)]
alfauxbet old-lisu letters D/G/L

Prefer Fraser script ꓓꓖꓡ over (roman/cyrillic/armenian) ⅮԌԼ,
as the latter are more likely to appear visually distinct (serifed).

19 months agoone eight fill blocks in block box
Mischa POSLAWSKY [Sun, 18 Sep 2022 16:03:31 +0000 (18:03 +0200)]
one eight fill blocks in block box

19 months agofill overview table gap by fill blocks
Mischa POSLAWSKY [Sun, 18 Sep 2022 16:16:05 +0000 (18:16 +0200)]
fill overview table gap by fill blocks

19 months agoproper goose instead of swan imposter in duckling story
Mischa POSLAWSKY [Wed, 14 Sep 2022 00:09:08 +0000 (02:09 +0200)]
proper goose instead of swan imposter in duckling story

Finally a literal encoding of "and five white geese" in Unicode 15.0.

19 months agoblock support table updated to unicode 15.0
Mischa POSLAWSKY [Tue, 13 Sep 2022 23:45:30 +0000 (01:45 +0200)]
block support table updated to unicode 15.0

Copy Devanagari-A and Kawi (and bonus CJK-H) representations from just
updated Last resort <https://github.com/unicode-org/last-resort-font>.

19 months agoreplace subscript c replacement x by greek chi
Mischa POSLAWSKY [Sun, 11 Sep 2022 22:58:33 +0000 (00:58 +0200)]
replace subscript c replacement x by greek chi

Transliteration of chi matches the c in much.

19 months agosample characters from cjk extension G
Mischa POSLAWSKY [Fri, 2 Sep 2022 23:44:40 +0000 (01:44 +0200)]
sample characters from cjk extension G

Include named references given on Wikipedia about this block:

> The infamous exotic characters Biáng and Taito are present in this block,
> along with the character for Ky Fan's given name.

20 months agojavascript emoji (food transforms and combined people)
Mischa POSLAWSKY [Fri, 19 Aug 2022 00:41:15 +0000 (02:41 +0200)]
javascript emoji (food transforms and combined people)

Copied from: https://twitter.com/steveluscher/status/741089564329054208

20 months agoanother stable release v2.1
Mischa POSLAWSKY [Thu, 18 Aug 2022 14:17:51 +0000 (16:17 +0200)]
another stable release v2.1

20 months agofuck it animal band
Mischa POSLAWSKY [Fri, 30 Jul 2021 15:25:03 +0000 (17:25 +0200)]
fuck it animal band

Compilation of animals and musical instruments as found on Twitter
https://twitter.com/beIicoso/status/1218722519638794240 and
https://twitter.com/jenniferdaniel/status/1397294999289536515
with more varied animal talent inspired by replies.

20 months agothe fuzzy duckling in emoji
Mischa POSLAWSKY [Thu, 29 Jul 2021 19:17:07 +0000 (21:17 +0200)]
the fuzzy duckling in emoji

Original transcription from the 1949 book by Jane Werner Watson.
Missing geese and cattails replaced by approximating swans and rice ears
respectively.

20 months agocaterpillar uneatable illustrations
Mischa POSLAWSKY [Thu, 29 Jul 2021 19:05:25 +0000 (21:05 +0200)]
caterpillar uneatable illustrations

Addition of everything emojiable from the book.

20 months agothe very hungry caterpillar in emoji
Mischa POSLAWSKY [Thu, 29 Jul 2021 19:02:28 +0000 (21:02 +0200)]
the very hungry caterpillar in emoji

Transcription by Jennifer Daniel of the 1969 book by Eric Carle:
https://twitter.com/jenniferdaniel/status/1397793178351149061

20 months agoremove superfluous key symbols from overview
Mischa POSLAWSKY [Mon, 9 Mar 2020 22:28:39 +0000 (23:28 +0100)]
remove superfluous key symbols from overview

Lesser used symbols for alt ⎇ and newline ␤ are specific to shell prompts;
remaining option ⌥ and return ↵ are enough to get an impression.

20 months agoreorder currency symbols in font overview
Mischa POSLAWSKY [Wed, 17 Aug 2022 00:44:13 +0000 (02:44 +0200)]
reorder currency symbols in font overview

More problematic CJK yuan to secondary line, common pound and won on first.

20 months agoreduced font overview width with 5th line
Mischa POSLAWSKY [Mon, 15 Aug 2022 21:51:09 +0000 (23:51 +0200)]
reduced font overview width with 5th line

Remove less related columns for Georgian, Armenian, and Hebrew/Arabic,
as these are more rarely (expected to be) supported, provide a very
limited summary, and can be seen in the following table.

Move all symbols of ambiguous width to a new row, filled out with wide
emoji samples (common categories in GBoard &al), personal preferences:

- Smileys & Emotion: smiley
- People: raised hand
- Animals & Nature: turtle
- Food & Drink: coffee (only BMP glyph not in U+1F*)
- Travel & Places: ideal transport
- Activities & Events: trophy
- Objects: bulb
- Symbols: keyboard input
- Flags: checkered flag

20 months agoarmenian
Mischa POSLAWSKY [Sun, 14 Aug 2022 21:40:17 +0000 (23:40 +0200)]
armenian

Pangram (if "clunky") suggested by @Armenotype on Twitter.

Paragraph from Wikipedia including a number and Old Armenian quote.

2 years agofücking rock dots
Mischa POSLAWSKY [Mon, 6 Jul 2020 00:58:33 +0000 (02:58 +0200)]
fücking rock dots

Combine common umlaut and uncommon triple dot (logo from Die Ärzte)
with "metal" in each increasingly ill-suited CJK script.
Also attempt a diaeresis below exclamation mark to finish ströng
with overlapping dots.
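
Why fonts struggle here can be illustrated with normalisation: a Latin base
plus diaeresis has a precomposed form, while a Han base keeps the mark as a
separate code point that the renderer has to position itself. A Python sketch,
with 哈 (U+54C8) chosen only as a sample base character:

```python
import unicodedata

latin = unicodedata.normalize("NFC", "u\u0308")     # u + COMBINING DIAERESIS
han = unicodedata.normalize("NFC", "\u54C8\u0308")  # 哈 + COMBINING DIAERESIS

print(len(latin), latin)  # composes into a single precomposed letter
print(len(han), han)      # stays base + mark, no precomposed form exists
```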

Inspired by Chinese Häagen Dazs with a fucking umlaut in 哈̈根達斯 as seen on
<https://twitter.com/GretchenAMcC/status/1279629956406968321>.

2 years agoxml header in html example
Mischa POSLAWSKY [Sat, 4 Apr 2020 04:01:49 +0000 (06:01 +0200)]
xml header in html example

2 years agolatin alfauxbet (or fauxlphabet) A-Z in mostly cyrillic
Mischa POSLAWSKY [Tue, 20 Aug 2019 13:19:20 +0000 (15:19 +0200)]
latin alfauxbet (or fauxlphabet) A-Z in mostly cyrillic

Found 18 [extended] Cyrillic capitals that usually appear (near) identical
to Latin counterparts; so with remaining kin from Greek, Lisu, and Armenian,
and one mathematical symbol, can provide a complete IDN homograph attack
example (similar to AAA lookalikes earlier).
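
Such lookalikes compare unequal even when they render identically. As a rough
sketch (taking the first word of the character name as a stand-in for the
script, which is an assumption that only holds for simple cases), mixed or
unexpected scripts can be spotted like this in Python:

```python
import unicodedata

def scripts(text: str) -> set[str]:
    """Rough script guess: first word of each character's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in text}

latin = "ABC"
faux = "\u0410\u0412\u0421"  # Cyrillic capitals A, VE, ES

print(latin == faux)  # False, despite the similar appearance
print(scripts(latin), scripts(faux))
```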

2 years agoslopes and diagonals from legacy drawing block U+1FBxx
Mischa POSLAWSKY [Thu, 29 Jul 2021 20:24:12 +0000 (22:24 +0200)]
slopes and diagonals from legacy drawing block U+1FBxx

Copied from test samples in vte <https://gitlab.gnome.org/GNOME/vte>
doc/boxes.txt added in commit 0.59.91~41 (2019-11-21).

2 years agocompass comparison in 11 columns of arrows and triangles
Mischa POSLAWSKY [Thu, 12 Aug 2021 03:20:40 +0000 (05:20 +0200)]
compass comparison in 11 columns of arrows and triangles

Besides existing single/double arrows and black triangles, add samples for
paired, white, black, triangle-headed (also w/bar), sans-serif, light barb
arrows, and white and medium triangles.

Double triangles last and separate because they are commonly interpreted as
double-width and emoji.

2 years agolegacy computing shade blocks block
Mischa POSLAWSKY [Thu, 29 Jul 2021 19:26:32 +0000 (21:26 +0200)]
legacy computing shade blocks block

Unicode 13.0 added vertical halves of medium shade and checkered fills,
combined with other gradients into a coherent testing box inspired by the
Unscii 2.0 test picture <http://viznut.fi/unscii/>.

2 years agoreorganise arrows like triangles
Mischa POSLAWSKY [Sat, 14 Nov 2020 20:12:57 +0000 (21:12 +0100)]
reorganise arrows like triangles

2 years agosymmetric kaomoji by reversing a tilde
Mischa POSLAWSKY [Fri, 13 Nov 2020 21:36:46 +0000 (22:36 +0100)]
symmetric kaomoji by reversing a tilde

2 years agochinese periodic table of elements
Mischa POSLAWSKY [Fri, 13 Nov 2020 13:49:35 +0000 (14:49 +0100)]
chinese periodic table of elements

Some chronology requiring up to Unicode 11.0 for the latest additions.
Copied from Wikipedia.

3 years agoaztek code in 2x3 block segments
Mischa POSLAWSKY [Tue, 10 Nov 2020 20:36:25 +0000 (21:36 +0100)]
aztek code in 2x3 block segments

Redraw the image from commit v1.0-30-g378bf4526a (2015-09-05)
using 27 distinct 2x3 block segments from Unicode v13.0,
replacing 14 2x2 drawing characters still covered in an earlier diagram.

Results in a square aspect ratio assuming tall fonts.

Larger micro QR code does not offer more distinct characters.
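
For reference, a bitmap can be packed into 2x3 block characters along these
lines. This is a Python sketch rather than the method used for the drawing: it
assumes the sextants are numbered row by row from the top left, that the 60
new characters run contiguously from U+1FB00, and that the empty, left half,
right half and full patterns come from older blocks. The names `sextant` and
`render` are hypothetical.

```python
def sextant(bits: int) -> str:
    """Character for a 2x3 cell; bit 0 = top left, bit 5 = bottom right."""
    if not 0 <= bits < 64:
        raise ValueError("expected a 6-bit pattern")
    # Four patterns predate Unicode 13.0 and live in other blocks.
    legacy = {0: " ", 21: "\u258C", 42: "\u2590", 63: "\u2588"}
    if bits in legacy:
        return legacy[bits]
    # The other 60 patterns are contiguous, minus the skipped ones below.
    skipped = sum(1 for gap in (0, 21, 42) if gap < bits)
    return chr(0x1FB00 + bits - skipped)

def render(rows: list[str]) -> list[str]:
    """Pack a bitmap ('#' = set pixel) into lines of 2x3 cell characters."""
    width = max(len(r) for r in rows)
    width += width % 2                        # pad to an even width
    rows = [r.ljust(width) for r in rows]
    rows += [" " * width] * (-len(rows) % 3)  # pad to a multiple of 3 rows
    out = []
    for y in range(0, len(rows), 3):
        line = ""
        for x in range(0, width, 2):
            bits = sum(
                1 << (2 * dy + dx)
                for dy in range(3) for dx in range(2)
                if rows[y + dy][x + dx] == "#"
            )
            line += sextant(bits)
        out.append(line)
    return out

print("\n".join(render(["## ##", "#   #", "## ##"])))
```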

3 years agoredraw heavy grid transitions in 7x7 box
Mischa POSLAWSKY [Mon, 9 Nov 2020 20:23:47 +0000 (21:23 +0100)]
redraw heavy grid transitions in 7x7 box

Visually similar presentation of all significant light/heavy line characters.
No longer includes ╇╈╁╀ in addition to other more complex combinations:

  ┞┲┭┮┱┧
  ┟╀┾╈╉┦
  ┡┽╁╊╇┪
  ┢┵┺┹┶┩

3 years agoversion update to unicode 13.0
Mischa POSLAWSKY [Sun, 8 Nov 2020 07:22:41 +0000 (08:22 +0100)]
version update to unicode 13.0

3 years agoblock allocation glyphs for brahmic zone U+11xxx
Mischa POSLAWSKY [Sun, 8 Nov 2020 08:37:18 +0000 (09:37 +0100)]
block allocation glyphs for brahmic zone U+11xxx

Up to unicode 13.0, mostly following Last Resort icons in
<https://github.com/unicode-org/last-resort-font/releases/tag/13.001>.

3 years agoupdate sanskrit transcriptions
Mischa POSLAWSKY [Sat, 7 Nov 2020 17:40:39 +0000 (18:40 +0100)]
update sanskrit transcriptions

Replace flawed contents with recent improvements from:
https://commons.wikimedia.org/wiki/File:संस्कृतम्.png?oldid=495574592

Seems to address concerns raised on
https://mendenlama.tumblr.com/post/120050473698/camfoc-issues

3 years agoinverse bullet in empty center of block drawing
Mischa POSLAWSKY [Sat, 7 Nov 2020 16:13:02 +0000 (17:13 +0100)]
inverse bullet in empty center of block drawing

3 years agodrop icelandic pangram
Mischa POSLAWSKY [Sat, 7 Nov 2020 15:56:32 +0000 (16:56 +0100)]
drop icelandic pangram

Unique thorn and eth both represented in old english just above.

3 years agoyezidi letter at empty U+10E8x block
Mischa POSLAWSKY [Sat, 7 Nov 2020 16:37:07 +0000 (17:37 +0100)]
yezidi letter at empty U+10E8x block

3 years agochange letter symbols to MuchUniCode
Mischa POSLAWSKY [Sat, 7 Nov 2020 14:47:28 +0000 (15:47 +0100)]
change letter symbols to MuchUniCode

Replace symbols by spelling out a more descriptive phrase without as much
encouragement.

Slight increase in glyph coverage, with different advantages, notably:
- Double-struck letters include C from legacy letterlike block.
- Fraktur letters with legacy black-letter C.
- Missing subscript letters for c and d replaced by soundalikes.
- Squared O might be an exceptional emoji :o2: (blood type).
- No good representation for rotated U, substitute big ∩.
- Epigraphic inverted M from latin extended D.
- Include negative enclosed variants instead of missing lowercase.
- Circled japanese and korean to spell out mu-ch.

Count each item with a number in similar style if possible.
Reorder considering those with only single digits available.

Most characters generated by: echo MuchUniCode |
perl -mcharnames -CO -E'
my $line = <STDIN>;
print $line =~ s{\S}{
$name = uc "$_ $&";
$name =~ s/CAPITAL/SMALL/ if $& =~ /\p{Ll}/;
chr(charnames::vianame($name) || 0xFFFD)
}egr for @ARGV;
' \
'MATHEMATICAL MONOSPACE CAPITAL' \
'MATHEMATICAL BOLD CAPITAL' \
'MATHEMATICAL ITALIC CAPITAL' \
'MATHEMATICAL BOLD ITALIC CAPITAL' \
'MATHEMATICAL DOUBLE-STRUCK CAPITAL' \
'MATHEMATICAL SANS-SERIF CAPITAL' \
'MATHEMATICAL SANS-SERIF BOLD CAPITAL' \
'MATHEMATICAL SANS-SERIF ITALIC CAPITAL' \
'MATHEMATICAL SANS-SERIF BOLD ITALIC CAPITAL' \
'LATIN CAPITAL LETTER TURNED' \
'MATHEMATICAL FRAKTUR CAPITAL' \
'MATHEMATICAL BOLD FRAKTUR CAPITAL' \
'MATHEMATICAL SCRIPT CAPITAL' \
'MATHEMATICAL BOLD SCRIPT CAPITAL' \
'MODIFIER LETTER CAPITAL' \
'LATIN SUBSCRIPT SMALL LETTER' \
'PARENTHESIZED LATIN CAPITAL LETTER' \
'CIRCLED LATIN CAPITAL LETTER' \
'SQUARED LATIN CAPITAL LETTER'
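
The same lookup by character name can be sketched in Python for comparison.
It is a rough equivalent rather than a port: `convert` is a name invented
here, and only two of the styles above are shown.

```python
import unicodedata

def convert(text: str, style: str) -> str:
    """Replace each non-space character by its named variant in `style`."""
    out = []
    for ch in text:
        if ch.isspace():
            out.append(ch)
            continue
        name = f"{style} {ch}".upper()
        if ch.islower():
            # Style names are given for capitals; switch for small letters.
            name = name.replace("CAPITAL", "SMALL", 1)
        try:
            out.append(unicodedata.lookup(name))
        except KeyError:
            out.append("\uFFFD")  # no character with that name
    return "".join(out)

for style in ("MATHEMATICAL BOLD CAPITAL", "MATHEMATICAL DOUBLE-STRUCK CAPITAL"):
    print(convert("MuchUniCode", style))
```

Names that do not exist come out as U+FFFD, which marks the gaps to be filled
by hand, such as the double-struck C from the letterlike block.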

3 years agodeobfuscate haskell golf a bit
Mischa POSLAWSKY [Sat, 31 Oct 2020 08:08:33 +0000 (09:08 +0100)]
deobfuscate haskell golf a bit

Keep only a single <$> to test its ligature.
Language comment in front and in full.

3 years agocjk extension G character biáng
Mischa POSLAWSKY [Wed, 11 Mar 2020 19:48:04 +0000 (20:48 +0100)]
cjk extension G character biáng

Famously complex with 58 strokes, and very recently added so minimally
supported (in some cases not even recognised as a wide character).
More details on <https://en.wikipedia.org/wiki/Biangbiang_noodles
#Chinese_character_for_bi.C3.A1ng> of course.

3 years agocompare variant forms in chinese transliteration
Mischa POSLAWSKY [Sat, 31 Oct 2020 03:38:44 +0000 (04:38 +0100)]
compare variant forms in chinese transliteration

Choose characters to showcase differences in simplified (chinese and
japanese) and traditional cjk, sometimes even matching a separately encoded
root symbol.  Also adds some cyrillic of dungan mandarin, and (traditional)
character composition in IDS.

Words mostly chosen at random, pronunciation from wiktionaries.

3 years agoadditional potential A-lookalikes
Mischa POSLAWSKY [Tue, 21 Apr 2020 04:06:45 +0000 (06:06 +0200)]
additional potential A-lookalikes

Related letters found in Pau Cin Hau, Carian, Old Italic and Lydian,
which could look identical depending on the font.

Same for mathematical monospace symbol assuming monospace display.

3 years agostroke diacritics can emulate underline, strikethrough, overscore
Mischa POSLAWSKY [Fri, 30 Oct 2020 23:07:15 +0000 (00:07 +0100)]
stroke diacritics can emulate underline, strikethrough, overscore

Support is remarkably bad in some fonts, with marks not connecting,
being mispositioned, or not even combining.
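
A minimal Python sketch of the technique, appending one combining mark after
every character. The three marks are an assumption about which code points
suit each effect (low line, long stroke overlay, overline), not necessarily
the ones used in the sample; `decorate` is a made-up helper.

```python
MARKS = {
    "underline": "\u0332",      # COMBINING LOW LINE
    "strikethrough": "\u0336",  # COMBINING LONG STROKE OVERLAY
    "overscore": "\u0305",      # COMBINING OVERLINE
}

def decorate(text: str, style: str) -> str:
    """Append the chosen combining mark after every character."""
    mark = MARKS[style]
    return "".join(ch + mark for ch in text)

for style in MARKS:
    print(style, decorate("sample", style))
```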

3 years agoreorder overview to align math and currency symbols
Mischa POSLAWSKY [Tue, 21 Apr 2020 03:11:55 +0000 (05:11 +0200)]
reorder overview to align math and currency symbols

3 years agosample glyphs from kStrange categories A-U
Mischa POSLAWSKY [Sat, 4 Apr 2020 03:49:58 +0000 (05:49 +0200)]
sample glyphs from kStrange categories A-U

From Unicode proposal L2/20-059 by Ken Lunde containing Han ideographs
considered "strange" in 12 categories:

Asymmetric, Bopomofo, Cursive, Fully-reflective, Hangul Component,
Incomplete, Katakana Component, Mirrored, Odd Component, Rotated,
Stroke-heavy, Unusual Arrangement/Structure.

Pick 4 mostly random characters, preferably from different CJK blocks of
increasing version up to extension F.

3 years agovowelless arabic transliteration
Mischa POSLAWSKY [Sat, 31 Oct 2020 07:15:43 +0000 (08:15 +0100)]
vowelless arabic transliteration

Omit spoken sounds not in the written script.

3 years agoarabic test sentences
Mischa POSLAWSKY [Sat, 4 Apr 2020 00:58:53 +0000 (02:58 +0200)]
arabic test sentences

From "Automating the generation and typesetting of Arabic script" by Mansour,
test in §5.4.3:

> The eight sentences were chosen to test these main features:
> - Each letter is correctly generated without compatibility problems with MetaPost
> - Choosing the correct forms of letters (context analysis engine test)
> - Connections between letters (joining problems that require modification in Meta-font files)
> - Kerning

3 years agoarabic presentational forms
Mischa POSLAWSKY [Sat, 4 Apr 2020 00:57:10 +0000 (02:57 +0200)]
arabic presentational forms

Copy contextual forms from vim output +set arabic to compare with rendered
ligatures.  Besides testing compatibility encoding, they mostly serve to
highlight missing support in some terminals (or even rtl if they do).

3 years agoarabic vowel diacritics
Mischa POSLAWSKY [Sat, 4 Apr 2020 00:45:14 +0000 (02:45 +0200)]
arabic vowel diacritics

Fully vocalised version with roman transliteration copied from:
http://clagnut.com/blog/2380/#Perfect_pangrams_in_English_.2826_letters.29

3 years agocommon arabic pangram
Mischa POSLAWSKY [Sat, 4 Apr 2020 00:15:02 +0000 (02:15 +0200)]
common arabic pangram

Mentioned in "Computational Linguistics, Speech and Image Processing for
Arabic Language" by El Gayar and Suen at Test image 3.1.4.b as "the most
common Arabic pangram ... containing all of the basic letters".

Copied version from <http://clagnut.com/blog/2380/
#Perfect_pangrams_in_English_.2826_letters.29> with additional diacritics.

3 years agolarger african sample in taa
Mischa POSLAWSKY [Sat, 4 Apr 2020 00:03:25 +0000 (02:03 +0200)]
larger african sample in taa

Part of a story on <http://archive.phonetics.ucla.edu/Language/NMN
/nmn_story_1972_01.html> transcript using special glyphs ǀǁǂǃʘʔɟʼʰʲa̰ɜʉʉ̰m̩ŋn̩ɲə
(the last two missing in this snippet).

Contains all distinctive characters of the Khoekhoe pangram, so replace that
by a phrase from <https://en.wikipedia.org/wiki/Taa_language?oldid=943791933>
with slightly different orthography including ɢ.

3 years agoForTheWin in smallcaps, superscript, subscript, turned letters
Mischa POSLAWSKY [Sat, 4 Apr 2020 01:04:58 +0000 (03:04 +0200)]
ForTheWin in smallcaps, superscript, subscript, turned letters

3 years agocompare mathematical letter symbols for ForTheWin
Mischa POSLAWSKY [Thu, 19 Mar 2020 02:08:15 +0000 (03:08 +0100)]
compare mathematical letter symbols for ForTheWin

Reintroduce an even more elaborate overview of letterlike scripts, dismissed
in commit 26e8aa257e (2015-09-12) [drop mathematical ABC symbols line].
While it remains as ill-advised for spelling out words, unfortunately it is
widely used that way nowadays (at least in Twitter user names).

Regardless, style should be consistent (especially considering characters
have been introduced in different versions) and distinct (as it's intended
for unique variables), so a comparison makes sense in any case.
Just put it after proper language scripts.

3 years agocjk comparison of traditional, simplified, shinjitai
Mischa POSLAWSKY [Sat, 4 Apr 2020 03:42:48 +0000 (05:42 +0200)]
cjk comparison of traditional, simplified, shinjitai

Sample characters copied from <https://en.wikipedia.org/wiki
/Differences_between_Shinjitai_and_Simplified_characters?oldid=932950382>
"Different simplifications in both languages".

3 years agorandom characters from cjk extension F
Mischa POSLAWSKY [Sat, 14 Mar 2020 22:19:54 +0000 (23:19 +0100)]
random characters from cjk extension F

3 years agocommon line break after ethiopic header
Mischa POSLAWSKY [Sat, 14 Mar 2020 22:18:11 +0000 (23:18 +0100)]
common line break after ethiopic header

Match /^\N+:\n\n/ syntax like other titles.
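
The title convention can be checked mechanically. In this Python sketch
`[^\n]` stands in for Perl's `\N`, and the sample text is invented:

```python
import re

# A title is a non-empty line ending in a colon, followed by a blank line.
TITLE = re.compile(r"^[^\n]+:\n\n", re.MULTILINE)

sample = "Ethiopic:\n\nfirst line\nsecond line\n\nArabic:\nno blank line\n"
print(TITLE.findall(sample))
```

Only the first header matches; the second lacks the empty line after it.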

3 years agosingle word of zalgo should suffice
Mischa POSLAWSKY [Sat, 14 Mar 2020 22:06:06 +0000 (23:06 +0100)]
single word of zalgo should suffice

Generated by <https://www.zalgotextgenerator.com/>.

3 years agoappend cypriot glyph to linear A space
Mischa POSLAWSKY [Sat, 14 Mar 2020 21:50:29 +0000 (22:50 +0100)]
append cypriot glyph to linear A space

Previously omitted as it shares the same 0x80 space as Aramaic,
but can be safely moved a bit since the preceding code point is empty.

3 years agomatch circled ideograph to cjk representation
Mischa POSLAWSKY [Sat, 14 Mar 2020 21:44:33 +0000 (22:44 +0100)]
match circled ideograph to cjk representation

Reuse a common glyph as outlined in the Last Resort guidelines, improving
over their own choice.

3 years agomove A homographs to typography section
Mischa POSLAWSKY [Sat, 14 Mar 2020 21:32:07 +0000 (22:32 +0100)]
move A homographs to typography section

Good place to test for expected similarities in derived writing systems,
next to unwanted same-script lookalikes.

The large number of O-ish glyphs does not really add much except for testing
script coverage.

3 years agoinclude letter Ø as common 0 lookalike
Mischa POSLAWSKY [Sat, 14 Mar 2020 21:09:49 +0000 (22:09 +0100)]
include letter Ø as common 0 lookalike

Programming fonts regularly use a slashed zero to distinguish the number,
but in some cases make it look very similar to the scandinavian vowel.
At low quality it could even be mistaken for the 8 shown subsequently.

Confusion with the symbols ∅ and ⌀ is less likely, usually having a
different shape, as do glyphs representing a dotted digit 0 (ʘ, ☉, ⊙, ⨀).

3 years agolower pronunciation of Arthur in panphone
Mischa POSLAWSKY [Sat, 14 Mar 2020 20:22:40 +0000 (21:22 +0100)]
lower pronunciation of Arthur in panphone

Replace duplicate mid-central ɚ by open-mid ɝ for a near-identical result
with a different glyph, already distinguished in all other orthographies.

3 years agoreorder panphone to drop "on" in second line
Mischa POSLAWSKY [Sat, 14 Mar 2020 20:12:12 +0000 (21:12 +0100)]
reorder panphone to drop "on" in second line

Equivalent sentence containing the same sounds, but shorter so it fits
within 76 characters.

3 years agopanphone in deseret
Mischa POSLAWSKY [Sat, 14 Mar 2020 20:36:09 +0000 (21:36 +0100)]
panphone in deseret

Manual composition, disregarding the automated translation by <2deseret.com>
as it does not conform to standard spelling outlined in
<http://www.chem.ucla.edu/~jericks/Historical%20or%20Technical/Linguistics/Deseret_Guide.pdf>

Consider it a mostly dead script, so not positioned next to other English.

3 years agoalign U+03x blocks for single width hexagram
Mischa POSLAWSKY [Sat, 14 Mar 2020 00:21:44 +0000 (01:21 +0100)]
align U+03x blocks for single width hexagram

Assume U+4DC3 symbol should be one column wide, ignoring double width
displayed by Unifont.
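
The width a renderer is expected to use can be estimated from the
East_Asian_Width property. A rough Python sketch follows; the `columns` helper
is hypothetical and ignores combining marks. The value reported for U+4DC3
depends on the Unicode version bundled with the interpreter, so it is printed
rather than assumed.

```python
import unicodedata

def columns(ch: str) -> int:
    """Guess terminal columns from East_Asian_Width (W and F count as 2)."""
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1

# Latin letter, the hexagram in question, and a common Han ideograph.
for ch in ("A", "\u4DC3", "\u4E2D"):
    print(f"U+{ord(ch):04X}", unicodedata.east_asian_width(ch), columns(ch))
```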

3 years agosmp block allocation glyphs
Mischa POSLAWSKY [Thu, 12 Mar 2020 05:31:09 +0000 (06:31 +0100)]
smp block allocation glyphs

Extend representation characters for U+10000-10FFF usually copying
the Unicode font <https://github.com/unicode-org/last-resort-font>.

3 years agoprefer apple last resort glyphs
Mischa POSLAWSKY [Thu, 12 Mar 2020 02:45:16 +0000 (03:45 +0100)]
prefer apple last resort glyphs

Described at <http://developer.apple.com/fonts/LastResortFont/>
for a more uniform style:

> Unicode blocks are illustrated by a representative glyph from the block,
> chosen to be as distinct as possible from glyphs of other blocks.
>
> Examplar glyphs were chosen in a number of ways.  Almost all of the
> Brahmic scripts show the initial consonant ka.  Latin uses the letter A
> because it's the first letter, and because in each Latin block there is
> a letter A so they can be easily differentiated.  Greek and Cyrillic use
> their last letters, omega and ya, because they are so distinctive.  Most
> other alphabets and syllabaries use their initial letter where
> distinctive.

Try to avoid unnecessary exceptions, though in some cases I can't help but
know better (usually improving distinctiveness, especially considering
unknown output variants).

Restrict to a single entry per 0x80, mostly keeping the latest unicode
version for maximum effort.

3 years agobmp block allocation glyphs
Mischa POSLAWSKY [Wed, 11 Mar 2020 21:51:08 +0000 (22:51 +0100)]
bmp block allocation glyphs

Overview of BMP blocks similar to <http://sheet.shiar.nl/charset/unicode>
each represented by an identifying glyph copied from Unidings v9.19
<http://users.teilar.gr/~g1951d/Unidings.pdf> by George Douros.

Characters align neatly to 0x40 code points, preferring every other column
if feasible, but keeping various smaller (rtl, brahmic) scripts for now.
Silently break positions between U+3400 and U+A400 because these only
contain cjk and yi with usually uniform font coverage.

3 years agoupdate version to unicode 10.0
Mischa POSLAWSKY [Tue, 10 Mar 2020 01:54:56 +0000 (02:54 +0100)]
update version to unicode 10.0

Required for recently added hentaigana.

3 years agohentaigana variant of iroha
Mischa POSLAWSKY [Mon, 9 Mar 2020 23:02:33 +0000 (00:02 +0100)]
hentaigana variant of iroha

Version from 現今児童重宝記 <https://www.sljfaq.org/afaq/iroha.html>
(No. 6) painstakingly matched to unicode glyphs described on
<https://en.wikipedia.org/wiki/Hentaigana?oldid=935497055>
with uncertainties resolved by comparing <https://www.semanticscholar.org
/paper/Distinction-and-Difference:-From-Kana-to-Hiragana-Marks
/f334ea3cb70e933ed4f52e174a41c0242d5204a2/figure/5>.

3 years agohaskell oneliner with programming ligatures
Mischa POSLAWSKY [Mon, 9 Mar 2020 22:41:25 +0000 (23:41 +0100)]
haskell oneliner with programming ligatures

Some obfuscated code (not particularly typical) as found and explained on
<https://stackoverflow.com/questions/12659951/-obfuscated-haskell-code-work>
featuring multi-character combinations <$>, <*>, =<<, >>= substituted by
"modern" coding fonts such as <https://github.com/tonsky/FiraCode>.

This whole practice seems like an awful idea to me, but regardless needs to
be represented for font comparison.

3 years agopowerline lookalikes for branch and linenr
Mischa POSLAWSKY [Mon, 9 Mar 2020 22:28:48 +0000 (23:28 +0100)]
powerline lookalikes for branch and linenr

Indicators for vc branch and line number are common requirements of modern
status bars.  While console fonts still prefer private use area U+E0Ax,
similar symbols for "alternate key" and "newline" as mentioned in
<https://vi.stackexchange.com/a/3363> can be advertised instead.

3 years agoreorder minority alphabets in overview
Mischa POSLAWSKY [Mon, 9 Mar 2020 22:22:57 +0000 (23:22 +0100)]
reorder minority alphabets in overview

Cyrillic before Greek as it's stylistically closer to Latin.
Georgian before Armenian as it aligns better with following lines.

3 years agoreorganise overview table
Mischa POSLAWSKY [Mon, 9 Mar 2020 22:16:57 +0000 (23:16 +0100)]
reorganise overview table

3 years agorestrict currencies to most traded
Mischa POSLAWSKY [Mon, 9 Mar 2020 20:43:45 +0000 (21:43 +0100)]
restrict currencies to most traded

A "compact" overview does not need 17 different currency symbols, mostly
inherited from the <http://kermitproject.org/utf8.html> sampler line.
Test only internationally significant valuta, guided by the top 28 listed
at <https://en.wikipedia.org/wiki/Currency?oldid=942055810#cite_ref-10>,
keeping recent additions (₹, ₽) and adding a full-width character (元).

3 years agohomoglyphs of A and O
Mischa POSLAWSKY [Mon, 25 Nov 2019 13:13:37 +0000 (14:13 +0100)]
homoglyphs of A and O

Collect visually similar characters from different scripts.
Unlike ASCII lookalikes presented earlier, these are not expected to be
distinguishable if mixed, and a worst-case scenario of homograph attacks.

3 years agoexcessive, scary usage of diacritics; ZALGO!
Mischa POSLAWSKY [Mon, 22 Oct 2018 22:04:28 +0000 (00:04 +0200)]
excessive, scary usage of diacritics; ZALGO!

Copied from <https://knowyourmeme.com/memes/zalgo#scrambled-text>
to stress test combining marks:

- Rendering limitations may exclude glyphs after a certain number.
- Accumulated marks should extend vertically to avoid overlapping.
- Monospace rendering with increased height may cause lines to overlap
  or be cropped.
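
To judge how hard a sample stresses a renderer, the number of marks stacked
on each base character can be counted. A small Python sketch, treating every
character of general category M* as a mark; the sample string is invented:

```python
import unicodedata

def mark_counts(text: str) -> list[tuple[str, int]]:
    """Pair each base character with the number of combining marks after it."""
    counts: list[tuple[str, int]] = []
    for ch in text:
        if unicodedata.category(ch).startswith("M") and counts:
            base, n = counts[-1]
            counts[-1] = (base, n + 1)
        else:
            counts.append((ch, 0))
    return counts

sample = "Z\u0300\u0301\u0302a\u0308l"
print(mark_counts(sample))
```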

3 years agocoptic sample text in old nubian
Mischa POSLAWSKY [Fri, 29 Jun 2018 20:06:21 +0000 (22:06 +0200)]
coptic sample text in old nubian

Equivalents in coptic and greek characters, copied from:
https://en.wikipedia.org/wiki/Old_Nubian?oldid=847789397#Sample_text

3 years agodifferent vietnamese dong
Mischa POSLAWSKY [Mon, 9 Mar 2020 22:33:33 +0000 (23:33 +0100)]
different vietnamese dong

Fill available space by a "different" expression with the đồng currency sign
used in its homonym (compensating for its removal from the font overview),
followed by IPA pronunciation averaging several Wiktionary entries including
<https://zh.wiktionary.org/wiki/bất_đồng?oldid=4888545> to most
significantly provide missing ɓ, ɗ, ɜ.

3 years agopointer compass, triangles in all directions
Mischa POSLAWSKY [Fri, 29 Jun 2018 20:04:47 +0000 (22:04 +0200)]
pointer compass, triangles in all directions

Black triangle characters and related, similar (and next) to arrows.

3 years agorandom kaomoji faces
Mischa POSLAWSKY [Fri, 29 Jun 2018 20:03:09 +0000 (22:03 +0200)]
random kaomoji faces

Test appearance of some common Japanese face characters, picked from
https://en.wikipedia.org/wiki/Emoticon#Japanese_style and (mixed)
https://en.wikipedia.org/wiki/List_of_emoticons#Eastern containing
some complex Unicode glyphs.

Excellent test of mixed scripts and common visual expectations.

3 years agosanskrit transcriptions from wikipedia
Mischa POSLAWSKY [Fri, 29 Jun 2018 20:01:39 +0000 (22:01 +0200)]
sanskrit transcriptions from wikipedia

Compare some brahmic scripts with a common sentence copied from:
https://commons.wikimedia.org/?title=File:Phrase_sanskrit.png&oldid=308591152

Meaning seems nice and related:

> May Śiva bless those who take delight in the language of the gods

Even though the variants have various issues, and actual source is unknown
according to http://mendenlama.tumblr.com/post/120050473698/camfoc-issues:

> the ascription of this blessing phrase to Kālidāsa is spurious

3 years agoshavian corrections for "waters", "heard"
Mischa POSLAWSKY [Fri, 29 Jun 2018 14:09:31 +0000 (16:09 +0200)]
shavian corrections for "waters", "heard"

Fix waters being incorrectly transcribed as woiters, losing oil.
Replace expected h-err-d by morphophonemic h-ear-d to include another glyph.

Remaining letters unrepresented: 𐑬𐑭𐑲𐑴 𐑹𐑺𐑾

3 years agoalign hangeul decomposition
Mischa POSLAWSKY [Sat, 31 Mar 2018 18:23:14 +0000 (20:23 +0200)]
align hangeul decomposition

Start sentences at same column assuming expected character widths.

3 years ago chinese transliteration below samples
Mischa POSLAWSKY [Wed, 13 Jul 2016 17:08:18 +0000 (19:08 +0200)]
chinese transliteration below samples

Mixed scripts after more typical CJK.

3 years ago cantonese transliteration (jyutping, ipa)
Mischa POSLAWSKY [Sun, 5 Jun 2016 12:44:23 +0000 (14:44 +0200)]
cantonese transliteration (jyutping, ipa)

One character per line for better overview and space for additional details,
introducing common non-pinyin tone digits and ɵ pronunciation.

8 years ago update dated update date to uptodate date v2.0
Mischa POSLAWSKY [Sun, 13 Sep 2015 18:17:12 +0000 (20:17 +0200)]
update dated update date to uptodate date

8 years ago symmetric ascii art bunny
Mischa POSLAWSKY [Sun, 13 Sep 2015 18:07:29 +0000 (20:07 +0200)]
symmetric ascii art bunny

Keep to ASCII characters as commonly used (curved quotation marks were
likely substituted due to an erroneous copypaste).

8 years ago drop mathematical ABC symbols line
Mischa POSLAWSKY [Sat, 12 Sep 2015 13:29:06 +0000 (15:29 +0200)]
drop mathematical ABC symbols line

Places too much emphasis on a relatively insignificant plane 1 block.
One such character also introduced in commit 30491ef4cf (2015-09-09)
[complex conjugate formula to cover blackletter and italic letters]
remains elsewhere.

8 years ago insert non-joiner between non-ligature fl in german pangram
Mischa POSLAWSKY [Sat, 12 Sep 2015 13:25:54 +0000 (15:25 +0200)]
insert non-joiner between non-ligature fl in german pangram

Lost during copypaste from original.
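
A minimal sketch of such a fix in Python, assuming the "non-joiner" is
U+200C ZERO WIDTH NON-JOINER. The helper name split_fl and the word
"Auflage" are hypothetical illustrations; the commit does not name the
affected word.

```python
# Assumption: the "non-joiner" meant is U+200C ZERO WIDTH NON-JOINER.
ZWNJ = "\u200c"


def split_fl(word: str) -> str:
    """Insert a ZWNJ between f and l so a font does not form an fl ligature."""
    return word.replace("fl", "f" + ZWNJ + "l")


# Hypothetical example word, not taken from the pangram itself.
sample = split_fl("Auflage")
```

The result looks unchanged in most renderers but is one code point longer,
which is also why the character is easily lost when copying text.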

8 years ago fix mistyped letter in greek iliad
Mischa POSLAWSKY [Fri, 11 Sep 2015 18:11:40 +0000 (20:11 +0200)]
fix mistyped letter in greek iliad

Obvious mistake caught while rereading.

8 years ago glagolitic tower of babel transcription
Mischa POSLAWSKY [Fri, 11 Sep 2015 17:22:47 +0000 (19:22 +0200)]
glagolitic tower of babel transcription

Another line to properly finish the story.  Preferred succession from
Slavonic would be old Croatian in Glagolitic script.  However, unable to
find any such version online, settle for an original composition.

Based on a different source of Church Slavonic without abbreviations
from <http://www.vechnoe.info/bible/translit/gen/11>:

Прїидѣте и изшедше смѣсимъ имъ ту язы́ки ихъ,
да не услы́шатъ ко́ждо дру́га своего.

Converted to Glagolitic using some naive conversion rules:

tr{абвгдежѕзиїклмнопрстуфхѡщцчшъыьѣёюяѩѫ,.}
  {ⰰⰱⰲⰳⰴⰵⰶⰷⰸⰺⰻⰽⰾⰿⱀⱁⱂⱃⱄⱅⱆⱇⱈⱉⱋⱌⱍⱎⱏⰹⱐⱑⱖⱓⱔⱗⱘ·:};
s/ⰹ/ⱏⰹ/g;
s/\Bⰺ/ⰻ/g;

Arbitrarily appended three dot+paragraphos punctuation to end text.
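
The conversion rules above could be sketched in Python roughly as follows.
This is an illustration, not the script actually used: the function name
to_glagolitic is hypothetical, the NFC normalisation is an added safeguard,
and uppercase letters and combining accents are passed through unchanged
because the rules above do not cover them.

```python
import re
import unicodedata

# Character sets copied from the rules above; NFC keeps precomposed letters
# as single code points so both strings stay the same length.
CYRILLIC = unicodedata.normalize("NFC", "абвгдежѕзиїклмнопрстуфхѡщцчшъыьѣёюяѩѫ,.")
GLAGOLITIC = unicodedata.normalize("NFC", "ⰰⰱⰲⰳⰴⰵⰶⰷⰸⰺⰻⰽⰾⰿⱀⱁⱂⱃⱄⱅⱆⱇⱈⱉⱋⱌⱍⱎⱏⰹⱐⱑⱖⱓⱔⱗⱘ·:")
TABLE = str.maketrans(CYRILLIC, GLAGOLITIC)


def to_glagolitic(text: str) -> str:
    """Apply the naive rules: transliterate, then two substitutions."""
    out = unicodedata.normalize("NFC", text).translate(TABLE)
    # s/ⰹ/ⱏⰹ/g : write the letter mapped from ы as a digraph
    out = out.replace(GLAGOLITIC[29], GLAGOLITIC[28] + GLAGOLITIC[29])
    # s/\Bⰺ/ⰻ/g : use the second i form except at the start of a word
    out = re.sub(r"\B" + GLAGOLITIC[9], GLAGOLITIC[10], out)
    return out
```

Indexing into GLAGOLITIC instead of repeating the literals keeps the two
substitutions tied to the table, at the cost of some readability.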

8 years ago cyrillic tower of babel in multiple slavic languages
Mischa POSLAWSKY [Fri, 11 Sep 2015 14:32:41 +0000 (16:32 +0200)]
cyrillic tower of babel in multiple slavic languages

Replace Russian sample by Genesis 11:1-6 with each line in another
translation from <http://www.omniglot.com/babel/langfam.htm#ie>:
Russian, Serbian, Belarusian, Ukrainian, Macedonian, Church Slavonic.
Adds 16 distinct letters in 29 forms, only loses ф.

Manually transcribed the image of Slavonic (hopefully correctly),
featuring obsolete letters (yat, yus, ou both monographic and digraphic)
and diacritics including U+0483 titlo and U+2DED es with pokrytie.

8 years ago complex conjugate formula to cover blackletter and italic letters
Mischa POSLAWSKY [Wed, 9 Sep 2015 21:07:56 +0000 (23:07 +0200)]
complex conjugate formula to cover blackletter and italic letters

Elaborate on complex numbers ℂ, covering some more symbols including an
italic i from plane 1.  Should provide a new challenge to render correctly
(notably aligning bracket lines after combining mark and 4-byte UTF-8).
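
To illustrate the 4-byte UTF-8 point, here is a small Python sketch. It
assumes the italic i in question is U+1D456 MATHEMATICAL ITALIC SMALL I,
which the message does not state explicitly; the helper name describe is
hypothetical.

```python
import unicodedata


def describe(char: str) -> dict:
    """Report the Unicode name, plane and UTF-8 length of one character."""
    code_point = ord(char)
    return {
        "name": unicodedata.name(char),
        "plane": code_point >> 16,  # plane 0 is the BMP, plane 1 the SMP
        "utf8_bytes": len(char.encode("utf-8")),
    }


# Assumption: this is the plane 1 italic i referred to above.
italic_i = describe("\U0001D456")
# ℂ (U+2102) sits in plane 0 and needs fewer bytes.
complex_c = describe("\u2102")
```

Renderers that count bytes or UTF-16 code units instead of display columns
tend to misalign text after such characters, which is what makes the line a
useful test.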

8 years ago reorder languages to transition from semitic to indic
Mischa POSLAWSKY [Wed, 9 Sep 2015 20:44:50 +0000 (22:44 +0200)]
reorder languages to transition from semitic to indic

Ethiopic after Hebrew (similar languages, simple rendering);
Thai before Hindi to group the more complex scripts together.