Guessing Names Based on What They Start With

I’m really bad at names. A lot of the time when I meet someone new, the name goes in one ear and out the other. If I manage to remember the name short-term, remembering long-term is still a toss-up in favor of forgetting.

But sometimes I can remember the first letter and then I can cycle the alphabet on the second letter to jog my memory.

I wonder: If I can remember the first letter or two, can I use name data from the Social Security Administration to make an educated guess about the full name?

Put in your sex, the decade you were born, and start entering your name below. I’ll try to guess your full name before you’re done.
 

 

Oh, so that’s why sometimes people call me Nick (or why people often call Amelia the more common Amanda).

This is based on data from the Social Security Administration, up to 2018. It’s relatively comprehensive, but there are a few limitations. First, it’s data for the United States, so the numbers don’t really apply elsewhere. Second, the SSA doesn’t include names given to fewer than five people in a year, so the chart doesn’t cover rarer names. Third, there were no Social Security Numbers before 1935, so the name counts are fuzzier for years before that.

Still, the data covers a lot of ground. I aggregated the annual data by decade and calculated percentages by dividing name counts by the total number of Social Security Numbers issued.
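Here is a minimal sketch of that aggregation step. The analysis for this piece was done in R, so this Python version is only an illustration, not the original code. The function names, the `(year, sex, name, count)` record layout, and the `totals` lookup of Social Security Numbers per year and sex are all assumptions made for the example.

```python
from collections import defaultdict


def decade_of(year):
    """Map a year to the first year of its decade, e.g. 1987 -> 1980."""
    return year - year % 10


def decade_shares(records, totals):
    """Aggregate annual name counts by decade and convert them to shares.

    records: iterable of (year, sex, name, count) tuples (assumed layout).
    totals:  dict mapping (year, sex) to the total number of Social
             Security Numbers for that year and sex (assumed input).
    Returns a dict mapping (decade, sex, name) to a share between 0 and 1.
    """
    name_counts = defaultdict(int)
    decade_totals = defaultdict(int)

    # Sum each name's annual counts into its decade.
    for year, sex, name, count in records:
        name_counts[(decade_of(year), sex, name)] += count

    # Sum the annual totals into decades the same way.
    for (year, sex), total in totals.items():
        decade_totals[(decade_of(year), sex)] += total

    # Share = decade count for the name / decade total for that sex.
    return {
        (decade, sex, name): count / decade_totals[(decade, sex)]
        for (decade, sex, name), count in name_counts.items()
    }
```

Summing both the counts and the totals before dividing weights each year by how many people were born in it, which seems closer to the description above than averaging the annual percentages would be.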

Before you enter anything, the chart shows the most popular names for the given sex and decade. Then as you enter a name, the chart shows conditional probabilities. The more information you give it, the stronger the guess.
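The conditional part can be sketched in a few lines: keep only the names that start with what has been typed so far, then rescale their shares so they sum to one. This is a Python illustration under assumptions, not the code behind the interactive (which uses D3.js). The function name `guess_names` and the `{name: share}` input for a single sex and decade are hypothetical.

```python
def guess_names(shares, prefix, top=5):
    """Rank names by probability, given that the name starts with `prefix`.

    shares: dict mapping name -> share (or raw count) for one sex and
            decade (assumed input; any non-negative weights work).
    prefix: the letters entered so far; matching ignores case.
    Returns up to `top` (name, conditional probability) pairs, best first.
    """
    prefix = prefix.lower()

    # Keep only the names consistent with the letters entered so far.
    matches = {
        name: weight
        for name, weight in shares.items()
        if name.lower().startswith(prefix)
    }

    total = sum(matches.values())
    if total == 0:
        return []  # no listed name starts with this prefix

    # P(name | prefix) = share of the name / combined share of all matches.
    ranked = sorted(matches.items(), key=lambda item: item[1], reverse=True)
    return [(name, weight / total) for name, weight in ranked[:top]]


# Made-up weights for illustration only, not SSA figures.
example = {"Amanda": 300, "Amelia": 100, "Amy": 100, "Beth": 500}
print(guess_names(example, "Am"))
```

Note that with an empty prefix this sketch rescales over every listed name, so the values are shares among the names in the table rather than shares of all Social Security Numbers.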

I’m sure it’s a simplification of how we remember names, but it seems to do a decent job.

In reality, this is me all of the time:

But from now on, when I can’t remember someone’s name, I’ll make them slowly spell it out so that I can guess based on this data. It will be so awkward, yet so satisfying.

Notes

  • This was inspired by Amelia McNamara’s curiosity as to why so many people call her Amanda.
  • I used name data from the Social Security Administration, which is current up to 2018. But Social Security Numbers weren’t a thing until 1935, so the name data before that is fuzzier.
  • The SSA data, which is annual, doesn’t include names given to fewer than five people in a year. And to keep file size down, I limited the above names to those with at least 100 people in each decade.
  • I used R for analysis and D3.js for the interactive.

