Lunduke
The Unlikely Story of UTF-8: The Text Encoding of the Web
Plan 9, Placemats, New Jersey Diners, and last minute ideas
June 22, 2023

If you are reading this on a computer -- of any kind -- odds are good that the words on the screen are all encoded using something called "UTF-8".

UTF-8 (or "Unicode Transformation Format - 8 bit") is, put simply, a format for encoding and storing text -- one which allows for far more text characters than the older "ASCII" encoding (which could only show a total of 95 printable characters).
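To make that concrete, here is a minimal Python sketch (not part of the original story) that prints the UTF-8 bytes for a few characters.  The helper name `utf8_hex` and the sample characters are illustrative choices.  It relies only on Python's built-in `str.encode` and `bytes.hex` (the separator argument needs Python 3.8 or later).

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of one character as space-separated hex."""
    return ch.encode("utf-8").hex(" ")

# Sample characters, written as escapes: A, e-acute, euro sign, grinning-face emoji.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X} -> {utf8_hex(ch)}")
# U+0041 -> 41
# U+00E9 -> c3 a9
# U+20AC -> e2 82 ac
# U+1F600 -> f0 9f 98 80
```

The ASCII letter still takes a single byte, identical to its ASCII value, while the characters outside ASCII take two, three, or four bytes.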

And UTF-8 is, quite simply, everywhere.

Nearly every major computer operating system heavily uses UTF-8 for handling text... likewise it is the standard for websites, with close to 100% of all webpages explicitly using UTF-8 for the text on the page.

The source for Wikipedia.  Like most of the web, using UTF-8.

An argument could be made that UTF-8 is one of the most successful and widely adopted standards in all of computer history.

But this almost wasn't the case.

In fact, UTF-8 was created -- at the very last possible moment -- and it was first implemented in a computer system that most people don't even know existed.

X/Open's search for better text encoding

In the early 1990s, text encoding was... an issue.

While solutions for extended character sets (beyond simple ASCII characters) existed, they were less than ideal.  To put it mildly.  The most popular solution, known as UTF-1 (the original UTF, defined in the ISO 10646 standard), suffered from serious performance issues... and often caused significant problems with software which used plain "ASCII" text (including UNIX file system paths), because the bytes of its multi-byte characters could be mistaken for ASCII characters such as "/".

Having a character encoding on UNIX systems that could cause problems with UNIX file systems?  Not good.

Obviously a new type of text encoding was needed.

So, in 1992, X/Open (originally known as the "Open Group for UNIX Systems", a consortium of UNIX vendors, including: Sun, HP, AT&T, IBM, and several others) set about the task of selecting a proper text encoding standard to be used across all of the UNIX world.

The proposal that gained the most traction was known as FSS/UTF (aka "File System Safe Universal Character Set Transformation Format").  Rolls off the tongue, right?

This proposal was both faster than the old text encoding standard... and, as the name suggests, it was "File System Safe".  Which was a big win.
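What "File System Safe" means in practice can be sketched in a few lines of Python.  This is an illustration under stated assumptions, not a reconstruction of FSS/UTF or UTF-1: it uses today's UTF-8, and it uses UTF-16 (big-endian) purely as a stand-in for an encoding whose multi-byte characters can contain ASCII-looking bytes.  The function names `leaks_ascii` and `utf8_never_leaks` are hypothetical.

```python
def leaks_ascii(ch: str, encoding: str) -> bool:
    """True if encoding the character yields any byte in the ASCII range (0x00-0x7F)."""
    return any(b < 0x80 for b in ch.encode(encoding))

i_ogonek = "\u012f"  # U+012F, a non-ASCII letter

# In UTF-16 (stand-in example) the second byte is 0x2F, the ASCII code for "/".
print(i_ogonek.encode("utf-16-be").hex(" "))  # 01 2f
# In UTF-8 both bytes are 0x80 or above, so no "/" or NUL byte can appear.
print(i_ogonek.encode("utf-8").hex(" "))      # c4 af

def utf8_never_leaks(limit: int = 0x10000) -> bool:
    """Check every non-ASCII code point below `limit` (surrogates skipped)."""
    for cp in range(0x80, limit):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogates are not encodable characters
        if leaks_ascii(chr(cp), "utf-8"):
            return False
    return True

print(utf8_never_leaks())  # True
```

A program that scans a path byte by byte for "/" or a terminating NUL would be fooled by the first encoding and left undisturbed by the second.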

Enter: The Plan 9 Nerds

Which brings us to September 2nd, 1992.  Sometime in the early evening.

The X/Open group was meeting, in Austin, Texas, to formally decide on the text encoding standard.

Looking to get some feedback on the proposal, some members of X/Open made a call to two legendary programmers -- Ken Thompson and Rob Pike -- who were working on the Plan 9 Operating System project at Bell Labs in New Jersey.


A little background...

Ken Thompson worked on MULTICS, co-created UNIX, and created the B programming language (the predecessor to C), among many other accomplishments.

Rob Pike, also a UNIX programmer, was the co-creator of Blit, writer of multiple UNIX and programming books, and the creator of the first UNIX windowing system.

To call these two "absolute legends" in the world of computing would be, perhaps, a bit of an understatement.  At the time, the two were working together on a research operating system, at Bell Labs, called Plan 9.  An attempt to fix some of the perceived shortcomings of UNIX... by the creators of UNIX, itself.


What happened next... after Ken Thompson and Rob Pike received that phone call?  Luckily, we have a detailed accounting... written by Rob Pike, himself.

"We had used the original UTF from ISO 10646 to make Plan 9 support 16-bit characters, but we hated it.  We were close to shipping the system when, late one afternoon, I received a call from some folks, I think at IBM - I remember them being in Austin - who were in an X/Open committee meeting.  They wanted Ken and me to vet their FSS/UTF design."

Asking two legendary engineers for their input?  You can probably guess what happened next...

"Ken and I suddenly realized there was an opportunity to use our experience to design a really good standard and get the X/Open guys to push it out.  We suggested this and the deal was, if we could do it fast, OK."

That's right.  Ken and Rob had some ideas.  And the X/Open folks agreed to listen to those ideas... if they could get them something fast.

And, by fast, they really meant "immediately... like... right now."  Because the X/Open team were, quite literally, all gathered in Austin to decide on this... right then.

"Yeah.  I could eat."

Ken and Rob did what any good programmers would do when placed on an almost impossibly tight deadline -- and needing to come up with an amazing idea that could change the course of computing for decades to come... they went out to grab some grub.

"So we went to dinner, Ken figured out the bit-packing, and when we came back to the lab after dinner we called the X/Open guys and explained our scheme.  We mailed them an outline of our spec, and they replied saying that it was better than theirs (I don't believe I ever actually saw their proposal; I know I don't remember it) and how fast could we implement it?  I think this was a Wednesday night and we promised a complete running system by Monday, which I think was when their big vote was."
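The "bit-packing" mentioned in that quote can be sketched as follows.  This is a minimal Python illustration of UTF-8 as it is defined today (one to four bytes, code points up to U+10FFFF), not Ken Thompson's 1992 code; the 1992 proposal allowed longer sequences for larger values.  The function name `utf8_pack` is hypothetical.

```python
def utf8_pack(code_point: int) -> bytes:
    """Pack one Unicode code point into its UTF-8 byte sequence."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {code_point:#x}")
    if code_point < 0x80:        # 0xxxxxxx (plain ASCII, unchanged)
        return bytes([code_point])
    if code_point < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])

print(utf8_pack(0x20AC).hex(" "))  # e2 82 ac (the euro sign)
```

The lead byte announces how many bytes follow, and every following byte starts with the bits `10`, which is why a reader can always tell where a character begins.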

Remember.  This was 1992.

Which means, while laptops and such certainly existed, most people (even legendary programmers) did not have any sort of mobile, portable computers.  Certainly not the kind you could take out to a restaurant.

So what, pray tell, did they write their new text encoding design on?

A placemat from a New Jersey diner.

This is not the placemat that UTF-8 was designed on.

Seriously.

"UTF-8 was designed, in front of my eyes, on a placemat in a New Jersey diner."

The boys, Ken and Rob, now had just a few days to get all of this done -- before the big vote on the new text encoding standard.  And they sure as heck didn't waste any time.

They got back from dinner, placemat in hand, and got to work.

"So that night Ken wrote packing and unpacking code and I started tearing into the C and graphics libraries.  The next day all the code was done and we started converting the text files on the system itself.  By Friday some time Plan 9 was running, and only running, what would be called UTF-8.  We called X/Open and the rest, as they say, is slightly rewritten history."
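For a feel of what the "unpacking" side involves, here is a minimal, strict decoder sketch in Python.  It follows the modern UTF-8 rules (at most four bytes, overlong forms and surrogates rejected) and is an assumption-laden illustration, not the code written that night.  The name `utf8_unpack` is hypothetical.

```python
def utf8_unpack(data: bytes) -> list[int]:
    """Unpack UTF-8 bytes into a list of code points; raise ValueError if malformed."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        # The lead byte gives the sequence length, the payload bits it carries,
        # and the smallest value that legitimately needs that many bytes.
        if lead < 0x80:
            length, value, minimum = 1, lead, 0
        elif 0xC0 <= lead < 0xE0:
            length, value, minimum = 2, lead & 0x1F, 0x80
        elif 0xE0 <= lead < 0xF0:
            length, value, minimum = 3, lead & 0x0F, 0x800
        elif 0xF0 <= lead < 0xF8:
            length, value, minimum = 4, lead & 0x07, 0x10000
        else:
            raise ValueError(f"invalid lead byte {lead:#04x} at offset {i}")
        tail = data[i + 1:i + length]
        # Every continuation byte must look like 10xxxxxx.
        if len(tail) != length - 1 or any((b & 0xC0) != 0x80 for b in tail):
            raise ValueError(f"truncated or malformed sequence at offset {i}")
        for b in tail:
            value = (value << 6) | (b & 0x3F)
        if value < minimum or value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
            raise ValueError(f"overlong or out-of-range sequence at offset {i}")
        out.append(value)
        i += length
    return out

print([hex(cp) for cp in utf8_unpack(b"A\xc3\xa9\xe2\x82\xac")])
# ['0x41', '0xe9', '0x20ac']
```

Rejecting overlong forms matters for the "File System Safe" property: without that check, the two bytes `c0 af` would decode to "/" and slip past code that only looks for the single byte `2f`.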

They converted an entire operating system over to a brand new -- just designed on a placemat -- text encoding format... in less than two days.

Here's the rough timeline:

  • Wednesday (Sep 2) evening: Dinner at a New Jersey diner.  Ken sketches out the idea on a placemat.
  • Wednesday night: Coding begins.
  • Thursday: Coding complete.
  • Friday: Entire Plan 9 operating system is now using "UTF-8".
  • Monday (Sep 7): X/Open group votes to use the Ken/Rob encoding design.

On Tuesday, September 8th, 1992 (at 3:22am), mere hours after the official vote to accept their text encoding design, Ken Thompson sent out the following email regarding Plan 9 now using UTF-8:

"The code has been tested to some degree and should be pretty good shape.  We have converted Plan 9 to use this encoding and are about to issue a distribution to an initial set of university users."

That's right.

Ken and Rob got a call asking for feedback on a Wednesday.  By the next Tuesday (at 3am) they were ready to ship a version of their Plan 9 OS with all the changes, and their designs had been voted on by the largest UNIX companies in the world.

Like I said.

A recent picture of the two legends, themselves.

These guys are legends.

What about that placemat?

Considering the vast impact of UTF-8 on the world of computing... whatever happened to that original "design document" (aka "the placemat")?  It would certainly be of historic significance.

"I very clearly remember Ken writing on the placemat and wished we had kept it!"

Let this be a lesson to all of the programmers out there:

Keep all of your doodles, notes, and sketches you make for your projects... you never know when one of those projects will become critical to the entire world... making your quick sketch worthy of being in a museum.

Especially if it's on a placemat.  From a diner.  In New Jersey.


Copyright © 2023 by Bryan Lunduke.  All rights reserved.  The contents of this article are licensed under the terms of The Lunduke Content Usage License.
