In the most recent update to their atlas coming in September, National Geographic explains the shrinking Arctic through the lens of previous atlas maps. It’s not looking good.
-
When you first learn statistics, visualization, or any data-related subject, the data usually is given to you in a ready-to-use format. This is so that you can spend most of your time on the topic of interest. But once you step outside the learning bubble, data rarely comes in the format you want.
Marc Bellemare, an associate professor in the Department of Applied Economics at the University of Minnesota, provides some practical tips on how to deal with this. Bellemare’s parting advice:
Really, there is no big secret to cleaning data other than “Document everything” and to save everything in different files and in different locations (i.e., your computer, Dropbox, Google Drive), and there is no other way to learn data cleaning than by doing it.
Yep.
Some of the tips are in the context of a specific software environment, but you can easily apply them to more general situations.
-
Tabula, available for Windows and Mac, lets you extract data from PDF files, and it just got an update. The user interface got an overhaul and it’s now easier to grab data from multiple pages. I wrote about Tabula last year, but orgs continue to publish data in PDF files, and sometimes PDF is just all there is. So this is definitely a good thing.
Keep it in your toolbox.
-
Government data sites are typically sluggish and a pain to use. So many forms. So slow. So much cruft in the way of what you really want. We went over this.
Reply All, one of my favorite podcasts, talked to technologist Clay Johnson about why government sites are like this. It’s not so much the people as it is the system that gets in the way of making things better.
-
Malaysia Airlines Flight 370 went down a year ago, and with recently found debris that is possibly from the flight, researchers have a few more bits of data to work from. The New York Times picked up on coverage of what’s going on, and in the latest, they provide an animated map that shows possible routes the debris could have taken. This is based on computer models from the Commonwealth Scientific and Industrial Research Organisation, and suggests a search area.
-
Poverty is on the rise. Justin Palmer mapped it for major cities in the United States.
-
Pennsylvania is considering the use of risk assessment (an estimate of the chances that someone will commit a crime in the future) in criminal sentencing. Risk assessment is already used in every state to some degree, so why not extend the concept? FiveThirtyEight and The Marshall Project look at the WTF-ness of this question.
-
Lukasz Piwek is chipping away at a collection of Tufte-style charts using R, along with the code snippets. Fittingly, the project is called Tufte in R. The Tufte stuff is nice and all, but that’s not why I like this project. Two reasons.
-
How to Map and Use GeoTIFF Files in R
It’s like working with a bunch of tiny dots, and oh look, all of a sudden patterns emerge.
-
The Washington Post mapped power plants in the United States by type and capacity in megawatts. Color indicates the former, and bubble size indicates the latter. There are a lot more natural gas power plants, supplying 30 percent of the nation’s energy, than I expected.
See the article for a map for each type, along with a state-level breakdown.
-
Based on annual high school play and musical rankings from the magazine Dramatics, which date back to 1938, NPR charted the most popular plays by decade. For a variety of reasons — cast size, family-friendliness, and licensing — the oldies still reign.
-
Here’s a straightforward stacked area chart from the Economist that shows shifting market share in the technology sector. It highlights the quick shrinkage of IBM in the 1990s, Microsoft’s reign soon after, and the Apple surge in the mid-2000s. Be sure to look at the nominal and real views too, because even though relative dominance shifted, the sector as a whole is up and up.
-
Todd Schneider likes trivia, and he plays in an online league called LearnedLeague. Curious, Schneider wondered if there was anything interesting he could glean from the performance of the LLamas (Learned League members) that might apply to knowledge in general.
-
You typically hear about data breaches in terms of the number of records that were hacked. “A million email addresses were stolen” or “hackers ripped off 100,000 passwords.” Does anyone care? After the initial gasp-shock-horror, we move on and everyone forgets until the next time it happens.
However, if a hack affects you in some way, you pay closer attention. That long random string password reminds you every time you log in somewhere.
That’s the idea behind this quiz from the New York Times. Answer a few quick questions. See the potential information bits about you that were stolen in the past couple of years.
It’s a good spin on the record tally, and it leads you right into privacy tips and more information about each hack.
-
Waiting in line stinks. I purposely go to the grocery store during off-times with my son, so I don’t have to deal with the long lines. Google, I think currently only on Android phones, now provides information on when people go to the businesses around you, using logic similar to what it uses for auto traffic on Google Maps. Nice.
-
What is machine learning? It sounds like a bunch of computers get together in the library on Tuesdays and study during all-nighters. It’s not quite that.
Stephanie Yee and Tony Chu provide a really good visual explanation of the computer science subfield. The vertical scroller should clear up some misconceptions.
-
Incarceration costs a lot of money. We know this, sort of. But how much really? Million Dollar Blocks, by Daniel Cooper and Ryan Lugalia-Hollon, estimates the cost in Chicago, down to the block level.
-
We usually see Census data in aggregate. It comes in choropleth maps or as statistics about various subpopulations and geographies. Is there value in seeing the numbers as individuals? What about the people behind the numbers? FiveThirtyEight intern Jia Zhang experiments on Twitter.
-
CompStat is a program that started in the New York Police Department, and several other departments have implemented it since. Officers are held accountable by tracking crime over time. Crime goes up, based on the data, and you can ask why. It seems like a fine idea, but problems arise when humans game the system to fill quotas. FiveThirtyEight highlights one such case within the NYPD.