Back in the days of Total Request Live, the Spice Girls, and AOL, we had to be pretty creative to express ourselves on the internet. Emoticons like <3 and :D were among the few ways to convey emotion in the nascent digital space, and the only tools available to us were the characters on our keyboards.

Fast forward to 2015: Kim Kardashian’s Kimoji app hit the top of the App Store, the world celebrated the humble 🌮’s arrival (if you can’t see that, update your OS already), and “Face with Tears of Joy” 😂 became Oxford Dictionaries’ Word of the Year (is 😂 even in Oxford Dictionaries?).

Why did we have to wait 20 years, which is an eternity in internet 🕑, for the emoji Renaissance?

Decoder Rings

Imagine you want to email a friend, but you spilled ☕️ (this is the last gratuitous emoji, I promise) all over your laptop. Your keyboard is completely busted except for the row of numbers, so you can’t type normally. Since you don’t want to buy a new computer and this is a contrived hypothetical situation, you decide to encode your messages using only numbers. Not too challenging, right? Just replace each letter with its position in the alphabet, like a Cracker Jack decoder ring. You also want your friend to be able to read your cryptic messages, so you give them this handy table (the number 0 signifies a space):

Encoding Table V1

Congratulations, you’re halfway there to inventing the ASCII standard. This is essentially what your computer does when it sends and displays text. Engineers like to call numbers that represent characters “code points” and in this example we’ll represent code points with two digits (like "07") so it’s easy to look up in a table and looks cooler.

Encoded Text: 01 14 04 00 01 14 15 20 08 01 00 15 14 05
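
The table lookup can be sketched in a few lines of Python. This is only a minimal sketch, assuming Table V1 maps 00 to a space and 01 through 26 to A through Z as described above; the names TABLE_V1, REVERSE_V1, encode, and decode are made up for illustration.

```python
import string

# Table V1 as described in the text: 00 is a space, 01-26 are A-Z.
# (TABLE_V1, REVERSE_V1, encode, and decode are illustrative names.)
TABLE_V1 = {0: " "}
TABLE_V1.update({i: letter for i, letter in enumerate(string.ascii_uppercase, start=1)})

# Reverse lookup: character -> code point.
REVERSE_V1 = {char: code for code, char in TABLE_V1.items()}


def encode(message):
    """Turn text into space-separated two-digit code points."""
    return " ".join(f"{REVERSE_V1[char]:02d}" for char in message.upper())


def decode(encoded):
    """Turn space-separated code points back into text."""
    return "".join(TABLE_V1[int(code)] for code in encoded.split())


print(encode("hello"))
print(decode("01 14 04 00 01 14 15 20 08 01 00 15 14 05"))
```

Anything outside the table (a digit, a comma) raises a KeyError here, which is exactly the limitation the next versions of the table try to fix.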

We can now communicate with merely 27 numbers, which is pretty neat. However, to send something like “MEET AT 2 PM” we need to handle numerals, so let's have 00-09 represent 0-9 and bump the alphabet up to code points 10-36.

Encoding Table V2

What if we want to get fancy and support lowercase letters and punctuation? We’ll need to reserve 26 more code points for a-z and 32 slots for punctuation. Our table is getting pretty big!

Encoding Table V3

One Encoding to Rule Them All

Even with our very simple encoding scheme, we breezed through three different versions, and it’s apparent that our table could grow as big as our imaginations. In the ’70s and ’80s, new standards proliferated to handle languages like Russian, Japanese, Chinese, and many more. It was painful to keep track of which encoding standard to use at what time - in our toy example, if you got a message encoded using Table V3 and decoded it using Table V1, you would get gibberish.

Encoded Text: 18 15 22 22 25 00 19 30 82 29 00 23 15
Decoded with Table V1: ROVVY S��� WO
Decoded with Table V3: HELLO IT’S ME
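
The mismatch is easy to reproduce. The sketch below decodes the same message with a Table V1 lookup (assumed: 00 is a space, 01-26 are A-Z) and substitutes U+FFFD, the Unicode replacement character, for any code point the table lacks. The second half shows the same kind of failure with two real Cyrillic encodings from Python's standard codecs. The helper name decode_v1 is made up for illustration.

```python
import string

# Table V1 as described earlier: 00 is a space, 01-26 are A-Z.
TABLE_V1 = {0: " ", **{i: c for i, c in enumerate(string.ascii_uppercase, start=1)}}


def decode_v1(encoded):
    """Decode with Table V1, marking unknown code points with U+FFFD."""
    return "".join(TABLE_V1.get(int(code), "\ufffd") for code in encoded.split())


message = "18 15 22 22 25 00 19 30 82 29 00 23 15"
print(decode_v1(message))  # code points 30, 82, and 29 are not in Table V1

# The same thing happens with real encodings: bytes written as KOI8-R
# but read as Windows-1251 still decode, just into the wrong letters.
original = "Привет"
raw = original.encode("koi8_r")
garbled = raw.decode("cp1251")
print(garbled)

# Reading the bytes with the table they were written with recovers the text.
print(raw.decode("koi8_r") == original)
```

Note that the wrong table does not always fail loudly: the Cyrillic case produces perfectly valid letters, just not the ones that were sent.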

To counter the growing chaos, in 1987 engineers from Xerox and Apple set out to create a unified character set, aptly named Unicode. A few years later, other companies like Sun and Microsoft joined, creating the Unicode Consortium - an organization that determines how characters are encoded and which character sets should be included. Now Unicode is the dominant standard¹, with over a hundred thousand code points assigned and the capacity for over a million, even supporting scripts like Egyptian hieroglyphs.
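
To make code points concrete, here is a short Python sketch using only built-ins. It looks up the code points of a few emoji mentioned in this post and checks the size of the code space. Whether the glyphs actually render depends on your fonts and terminal.

```python
import sys

# ord() returns a character's Unicode code point; chr() goes the other way.
for emoji in ["😂", "🌮", "💩"]:
    print(emoji, f"U+{ord(emoji):04X}")

print(chr(0x1F4A9))

# Code points run from 0 to 0x10FFFF, which is where "over a million" comes from.
print(sys.maxunicode + 1)

# A code point is just a number; an encoding such as UTF-8 decides the bytes.
print("😂".encode("utf-8"))
```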

Dawn of the Emoji

In the late ’90s and early 2000s, when we still had to use B) to convey 😎 like barbarians², Japanese carriers developed proprietary standards for rendering cartoon smiley faces and symbols (hence the word “emoji”, Japanese for “picture character”). The first set was created by Shigetaka Kurita at NTT DoCoMo, and these Japanese pictograms only worked within the carriers’ own systems.

Other companies soon followed suit but were still restricted by platform - you couldn’t send a GChat emoji to someone on AIM. Even the iPhone started out using SoftBank's proprietary emoji encoding; iPhone users could send emoji to each other, but Android was left out in the cold, only able to see those weird little boxes.

It wasn’t until 2010, when smartphones really took off, that the Unicode Consortium created the Emoji Subcommittee and approved the first set of emoji at the behest of Google and Apple, effectively replacing these proprietary encodings with an interoperable standard. This ushered in the modern era where anyone can send a 💩 to their friend regardless of app or platform.


Don’t expect the emoji hype train to stop anytime soon. Each new version of Unicode introduces a new set of emoji, which goes through a rigorous review process that considers universality, distinctiveness, and popularity. Unicode is backwards compatible, so once an emoji is enshrined in the character set, it is there for good. You can take a look at some candidates for Unicode 9.0 (highlights include an avocado, a stack of pancakes, and my personal favorite - “face palm”).

It’s pretty strange that a small committee created to solve an engineering problem could essentially dictate how billions communicate. It’s illustrative of the power technology possesses and the responsibility it shoulders. Recently, the Unicode Consortium added the ability to change skin tones on emoji involving humans, giving digital representation to everyone with skin darker than highlighter yellow.
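
Under the hood, a skin tone is a second code point placed right after the base emoji. A minimal sketch, assuming Python 3 and a font that supports emoji modifiers (the variable names are illustrative):

```python
# Unicode defines five skin tone modifiers, U+1F3FB through U+1F3FF.
# A modifier placed directly after a supported emoji changes how it renders.
THUMBS_UP = "\U0001F44D"
MEDIUM_SKIN_TONE = "\U0001F3FD"

toned = THUMBS_UP + MEDIUM_SKIN_TONE
print(toned)

# What looks like one character is stored as two code points.
print(len(toned))
print([f"U+{ord(ch):04X}" for ch in toned])
```

On a system without modifier support, the same string shows up as a plain thumbs up followed by a colored swatch, so the message still degrades gracefully.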

Unfortunately, the committee doesn’t look too kindly on fads, so chances are Kim Kardashian won’t be inducted into the Unicode Standard anytime soon. 👀

¹  Like all standards in computing, it still takes a long time for everyone to adopt it - especially an entire country with legacy systems. Character encoding can also still cause nasty surprises for developers working with email and SMS.

²  While <3 has largely been replaced by ❤, a new class of emoticons has been created thanks to Unicode’s ubiquity. Modern classics like (╯°□°)╯︵ ┻━┻ and ¯\_(ツ)_/¯ depend on Unicode characters to make those expressive faces.

