Distribution of letters in the English language

Some letters in the English language appear more often at the beginning of words. Some appear more often at the end, and others show up in the middle. Using the Brown corpus from the Natural Language Toolkit, David Taylor looked more closely at letter position and usage.

I’ve had many “oh, yeah” moments looking over the graphs. For example, words almost never begin with “x”, but it’s quite common as the second letter. There’s a little hump near the beginning of “u” that’s caused by its proximity to “q”, which is most common at the beginning of a word. When you remove “q” from the dataset, the hump disappears. “F” occurs toward the extremes, especially in prepositions (“for”, “from”, “of”, “off”) but rarely just before the middle.
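
If you want to poke at this yourself, here is a rough sketch of how such a tally could be computed. It is not Taylor's actual code; the function name `letter_position_histogram`, the choice of bins, and the tiny sample word list are all assumptions made for illustration. Each letter's position is scaled to run from 0 (first letter) to 1 (last letter) and then counted into a fixed number of bins.

```python
from collections import defaultdict


def letter_position_histogram(words, bins=10):
    """Count, for each letter, how often it falls in each relative-position bin.

    Hypothetical helper for illustration; the binning scheme is an assumption.
    """
    hist = defaultdict(lambda: [0] * bins)
    for word in words:
        # Keep only plain a-z letters so punctuation and digits are ignored.
        letters = [c for c in word.lower() if "a" <= c <= "z"]
        n = len(letters)
        if n < 2:
            continue  # relative position is undefined for one-letter words
        for i, c in enumerate(letters):
            rel = i / (n - 1)  # 0.0 = first letter, 1.0 = last letter
            b = min(int(rel * bins), bins - 1)  # clamp 1.0 into the last bin
            hist[c][b] += 1
    return dict(hist)


# Tiny made-up sample; a real run would use a full corpus instead.
sample = ["for", "from", "of", "off", "quick", "queen", "axe", "extra", "under"]
hist = letter_position_histogram(sample, bins=4)

for letter in "fqx":
    print(letter, hist[letter])
```

To run it on the same data, the sample list could be swapped for `nltk.corpus.brown.words()` once the corpus has been fetched with `nltk.download("brown")`. Dropping the words that contain "q" before counting is one way to check whether the hump in "u" goes away.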

Next step: letter proximity.

