Identify Japanese and Korean unsimplified, canonical characters #5

Transfusion · 2021-01-28T09:12:34Z

卫、衛、衞󠄀

Note that in Japan, 衞󠄀 (https://www.kanjipedia.jp/kanji/0000403800) is the kyūjitai (旧字, old character form) of 衛 (!!)

The Unihan database alone cannot settle which form is canonical, since these forms already exist as variants in G sources too: https://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=U%2B885E
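As a rough illustration of how a character maps to a lookup URL of the shape linked above, here is a minimal sketch. All function names are hypothetical and not part of cjk-radical-search. It also assumes that a character may be followed by a variation selector, which has to be dropped before looking up the base code point.

```typescript
// Hypothetical helpers for illustration only; not taken from the repository.

/** Format a code point as "U+XXXX" (at least four hex digits). */
function toUPlus(codePoint: number): string {
  let hex = codePoint.toString(16).toUpperCase();
  while (hex.length < 4) hex = "0" + hex;
  return "U+" + hex;
}

/** True for variation selectors (VS1-VS16 and VS17-VS256). */
function isVariationSelector(codePoint: number): boolean {
  return (
    (codePoint >= 0xfe00 && codePoint <= 0xfe0f) ||
    (codePoint >= 0xe0100 && codePoint <= 0xe01ef)
  );
}

/** Code points in `text`, in order, with variation selectors dropped. */
function baseCodePoints(text: string): number[] {
  const result: number[] = [];
  for (let i = 0; i < text.length; ) {
    const cp = text.codePointAt(i)!;
    // Code points above U+FFFF occupy two UTF-16 code units.
    i += cp > 0xffff ? 2 : 1;
    if (!isVariationSelector(cp)) result.push(cp);
  }
  return result;
}

/** Build a Unihan lookup URL in the same shape as the link above. */
function unihanUrl(codePoint: number): string {
  return (
    "https://www.unicode.org/cgi-bin/GetUnihanData.pl?codepoint=" +
    encodeURIComponent(toUPlus(codePoint))
  );
}
```

For example, `toUPlus(0x885e)` gives `"U+885E"`, and `unihanUrl(0x885e)` reproduces the URL above, because `encodeURIComponent` encodes `+` as `%2B`.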

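The 說文解字 (Shuowen Jiezi) has 眞 (https://dict.variants.moe.edu.tw/variants/rbt/word_attribute.rbt?quote_code=QTAyNzY4), and furthermore goes on to say: 僊人變形而登天也。从匕从目从乚。 (roughly: "An immortal who changes form and ascends to heaven. Composed of 匕, 目 and 乚.") Korea and Japan consider this variant to be the canonical traditional form.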

The case of 既 and 即 is strange in Japanese: they are 既 and 卽 respectively.

Transfusion · 2021-01-30T10:52:27Z

Unsimplified canonical Japanese variants are mostly available here
https://github.com/cjkvi/cjkvi-variants/blob/e4f1da248c9737a243f9930b5dc497cef5d5ae16/jp-old-style.txt#L64-L69

Korean variants of the same nature are taken from the 1800 Hanja for Everyday Use

I treat variants of this nature (along with Simplified/Traditional Chinese, the numerals, shinjitai in the jōyō kanji, radicals, etc.) as orthographic variants, to ensure they are grouped together:
https://github.com/Transfusion/cjk-radical-search/blob/19d0d1b672d7a652bfcd6cc784dcd43ce7c669e1/etl/variants-fetcher.ts#L109

https://github.com/Transfusion/cjk-radical-search/blob/19d0d1b672d7a652bfcd6cc784dcd43ce7c669e1/etl/variants-fetcher.ts#L241-L260

TODO: investigate the 1800 Korean hanja list and check whether any of them fall outside the commonly used Traditional Chinese set, since I do not include them when computing orthographic variants, only in the expandVariantIslands function (TBD: discussion of what this does and the design issues faced)

https://github.com/Transfusion/cjk-radical-search/blob/19d0d1b672d7a652bfcd6cc784dcd43ce7c669e1/etl/genVariants.ts#L105-L116
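To make "grouped together" concrete, here is a minimal sketch of one plausible way to group characters into islands: treat each variant relation as an edge and collect connected components with union-find. This is an assumption for illustration, not necessarily what expandVariantIslands does. `VariantPair` and `buildIslands` are hypothetical names.

```typescript
// Sketch only: groups characters into "islands" (connected components)
// given pairs of characters known to be variants of each other.
// `VariantPair` and `buildIslands` are hypothetical, not from the repository.

type VariantPair = [string, string];

function buildIslands(pairs: VariantPair[]): string[][] {
  // parent[ch] points towards the representative of ch's island.
  const parent: Record<string, string> = Object.create(null);

  // Find the representative of ch's island, compressing the path on the way.
  function find(ch: string): string {
    if (parent[ch] === undefined) parent[ch] = ch;
    let root = ch;
    while (parent[root] !== root) root = parent[root];
    let cur = ch;
    while (parent[cur] !== root) {
      const next = parent[cur];
      parent[cur] = root;
      cur = next;
    }
    return root;
  }

  // Merge the islands of every pair of related characters.
  for (let i = 0; i < pairs.length; i++) {
    const rootA = find(pairs[i][0]);
    const rootB = find(pairs[i][1]);
    if (rootA !== rootB) parent[rootB] = rootA;
  }

  // Collect the members of each island, keyed by representative.
  const members: Record<string, string[]> = Object.create(null);
  const roots: string[] = [];
  const chars = Object.keys(parent);
  for (let i = 0; i < chars.length; i++) {
    const root = find(chars[i]);
    if (members[root] === undefined) {
      members[root] = [];
      roots.push(root);
    }
    members[root].push(chars[i]);
  }
  return roots.map((root) => members[root]);
}

// Usage with characters mentioned in this thread:
// 卫, 衛 and 衞 end up in one island, 即 and 卽 in another.
const exampleIslands = buildIslands([
  ["卫", "衛"],
  ["衛", "衞"],
  ["即", "卽"],
]);
```

Because the relation is treated as transitive, 卫 and 衞 land in the same island even though no pair links them directly. Whether that transitivity is always desirable is a design question this sketch does not settle.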

Transfusion added a commit that referenced this issue Jan 30, 2021

Generate islands of related variant characters, solve issues #5 and #6

19d0d1b
