Distribution of letters in the English language

Some letters in the English language appear more often in the beginning of words. Some appear more often at the end, and others show up in the middle. Using the Brown corpus from the Natural Language Toolkit, David Taylor looked closer at letter position and usage.

I’ve had many “oh, yeah” moments looking over the graphs. For example, words almost never begin with “x”, but it’s quite common as the second letter. There’s a little hump near the beginning of “u” that’s caused by its proximity to “q”, which is most common at the beginning of a word. When you remove “q” from the dataset, the hump disappears. “F” occurs toward the extremes, especially in prepositions (“for”, “from”, “of”, “off”) but rarely just before the middle.

Next step: letter proximity.

Favorites

Years You Have Left to Live, Probably

The individual data points of life are much less predictable than the average. Here’s a simulation that shows you how much time is left on the clock.

Top Brewery Road Trip, Routed Algorithmically

There are a lot of great craft breweries in the United States, but there is only so much time. This is the computed best way to get to the top rated breweries and how to maximize the beer tasting experience. Every journey begins with a single sip.

A Day in the Life of Americans

I wanted to see how daily patterns emerge at the individual level and how a person’s entire day plays out. So I simulated 1,000 of them.

Most popular porn searches, by state

We’ve seen that we can learn from what people search for, through the eyes of Google suggestions: state stereotypes, national …