
> we need a standard

That’s exactly the benefit. Think of Unicode as “Archive.org for the semantics of text codels.” Every time someone invents a text codel (like those examples you gave, where Slack invented their own text codels), Unicode takes the semantics behind that codel and standardizes its own codepoint equivalent, so that Unicode documents have a way of encoding that text codel at the Unicode-text level.

If Unicode doesn’t do this, then people have to use other encodings on top of Unicode to specify their text; they come up with incompatible encodings of the same semantic characters; and suddenly we’re back to having to create code-pages to specify what set of incompatible codels each text stream is using. It also means that we go back to having to create OS-specific, or GUI-toolkit-specific rendering encodings to translate those code-pages into a text-layout-system specific “normalized” encoding; and thus, we also go back to having OS/GUI-toolkit specific fonts.
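To make the incompatibility concrete, here is a minimal sketch. The two shortcode tables are hypothetical, invented for illustration rather than taken from any real platform. Each "platform" has its own markup for the same semantic character, so a consumer needs one translation table per source. Once both are mapped to the shared Unicode codepoint, the texts compare equal without further special-casing.

```python
# Hypothetical shortcode tables for two imaginary chat platforms.
# Each one invents its own markup for the same semantic characters.
PLATFORM_A = {":thumbsup:": "\U0001F44D", ":sob:": "\U0001F62D"}
PLATFORM_B = {"(y)": "\U0001F44D", ";_;": "\U0001F62D"}


def to_unicode(text: str, table: dict[str, str]) -> str:
    """Replace each platform-specific shortcode with its Unicode codepoint."""
    for code, char in table.items():
        text = text.replace(code, char)
    return text


a = to_unicode("nice :thumbsup:", PLATFORM_A)
b = to_unicode("nice (y)", PLATFORM_B)
print(a == b)  # True: both become "nice " followed by U+1F44D
```

Without a shared codepoint, every consumer would have to carry every platform's table, which is the code-page situation described above.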

> Emoji aren’t programmatically generated.

No, not usually, but they can totally be programmatically consumed. They’re machine-readable! The modifiers allow emoji to be “structured text” in the ML sense. Since there’s one codepoint that always means “sad face”, an algorithm can attach a meaning to that codepoint apart from its modifiers, to do e.g. sentiment analysis. It’s much harder for a model to learn that meaning when there are 100 different “sad face” codepoints, let alone when different documents use different, incompatible encodings with different codels to refer to the same “sad face.”
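A minimal sketch of "meaning apart from modifiers", assuming a toy sentiment lexicon (the `LEXICON` values are hypothetical). It uses thumbs up/down rather than a sad face because face emoji don't accept skin-tone modifiers. The Fitzpatrick skin-tone modifiers (U+1F3FB to U+1F3FF) and the emoji presentation selector (U+FE0F) are stripped, so every skin-tone variant collapses to the same base codepoint and a single lexicon entry covers all of them.

```python
import unicodedata

# Fitzpatrick skin-tone modifiers occupy U+1F3FB..U+1F3FF.
SKIN_TONES = {chr(cp) for cp in range(0x1F3FB, 0x1F400)}
VS16 = "\uFE0F"  # variation selector requesting emoji presentation

# Hypothetical toy lexicon, keyed on base codepoints only.
LEXICON = {"\U0001F44D": 1, "\U0001F44E": -1}  # thumbs up / thumbs down


def base_emoji(text: str) -> str:
    """Drop skin-tone modifiers and VS16, leaving the base codepoints."""
    return "".join(ch for ch in text if ch not in SKIN_TONES and ch != VS16)


def score(text: str) -> int:
    """Sum lexicon scores over base codepoints, ignoring modifiers."""
    return sum(LEXICON.get(ch, 0) for ch in base_emoji(text))


print(unicodedata.name(base_emoji("\U0001F44D\U0001F3FD")))  # THUMBS UP SIGN
print(score("great \U0001F44D\U0001F3FB"))  # 1
print(score("great \U0001F44D\U0001F3FF"))  # 1
print(score("meh \U0001F44E\U0001F3FD"))    # -1
```

Five skin-tone variants of thumbs up need only one lexicon entry, which is the point: the modifier carries extra structure without fragmenting the base meaning.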

> better support for custom emoji

That’d be like saying a dictionary should better support private in-jokes. Unicode, like a dictionary, watches for when things become common or important to define, and then defines them. Until then, it considers them “someone’s attempt at injecting a gibberish in-joke into language.” In-jokes don’t need an entry, but as soon as an in-joke becomes a word, it does. People don’t look up in-jokes, but they do look up slang (= ascended in-jokes); so if you want your dictionary to be useful, you’d better give them definitions for slang terms.

Actually, that’s a very good equivalence: emoji are to Unicode as slang terms are to dictionaries. In both cases, people think it’s ridiculous that the authors of the text would include them; but in both cases, the usefulness of the work would be hampered if they didn’t.
