• # A visualization of pi for high school math students

March 22, 2013 to Data Art by Nathan Yau

On Kickstarter: A project that uses a visualization of pi to connect Brooklyn high school students to their community.

They've already made a histogram of emotions in their school's hallway and a stacked area chart mural at a nearby senior center. Next up is a wall currently covered in graffiti.

In Math class, students will construct the golden spiral based on the Fibonacci Sequence and begin to explore the relationship between the golden ratio and Pi. The number Pi will be represented in a color-coded graph within the golden spiral. In this, the numbers will be seen as color blocks that vary in size proportionately within the shrinking space of the spiral, allowing us to visualize the shape of Pi and its negative space.
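As a rough sketch of the ingredients described in that excerpt (not the students' actual design), the snippet below builds a Fibonacci sequence, shows the ratios of consecutive terms approaching the golden ratio, and pairs the leading digits of pi with color blocks that shrink along the spiral. The palette, the number of digits, and the choice to size blocks by Fibonacci numbers are all assumptions made for illustration.

```python
import math

def fibonacci(n):
    """Return the first n Fibonacci numbers, starting 1, 1."""
    seq = [1, 1]
    while len(seq) < n:
        seq.append(seq[-1] + seq[-2])
    return seq[:n]

fib = fibonacci(15)
golden_ratio = (1 + math.sqrt(5)) / 2

# Ratios of consecutive Fibonacci numbers approach the golden ratio.
ratios = [b / a for a, b in zip(fib, fib[1:])]

# First 15 digits of pi, one color block per digit.
PI_DIGITS = "314159265358979"

# Hypothetical palette: one color per digit 0-9 (an assumption, not the project's).
PALETTE = [
    "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
    "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf",
]

# Blocks shrink as the spiral winds inward, so pair the digits with
# Fibonacci side lengths from largest to smallest (also an assumption).
blocks = [
    (int(digit), PALETTE[int(digit)], side)
    for digit, side in zip(PI_DIGITS, reversed(fib))
]

for digit, color, side in blocks[:5]:
    print(f"digit {digit}: color {color}, side length {side}")
print(f"last ratio {ratios[-1]:.6f} vs golden ratio {golden_ratio:.6f}")
```

Drawing the actual spiral would take a plotting library. This only prepares the numbers a drawing step would need.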

• # Internet Census

March 22, 2013 to Statistical Visualization by Nathan Yau

Upon discovering hundreds of thousands of open embedded devices on the Internet, an anonymous researcher conducted a Census of the Internet, mapping 460 million IP addresses around the world.

While playing around with the Nmap Scripting Engine (NSE) we discovered an amazing number of open embedded devices on the Internet. Many of them are based on Linux and allow login to standard BusyBox with empty or default credentials. We used these devices to build a distributed port scanner to scan all IPv4 addresses. These scans include service probes for the most common ports, ICMP ping, reverse DNS and SYN scans. We analyzed some of the data to get an estimation of the IP address usage.

It's a pretty thorough analysis, but the conclusion interested me most:

The why is also simple: I did not want to ask myself for the rest of my life how much fun it could have been or if the infrastructure I imagined in my head would have worked as expected. I saw the chance to really work on an Internet scale, command hundred thousands of devices with a click of my mouse, portscan and map the whole Internet in a way nobody had done before, basically have fun with computers and the Internet in a way very few people ever will. I decided it would be worth my time.

• # Declining songwriter ratings with age

March 21, 2013 to Statistics by Nathan Yau

Do singer-songwriters age well like a fine wine, or does quality decline with age? Kyle Biehle analyzed fan ratings by age.

I understand all of the reasons for not comparing artists in this way. Despite twenty-one Academy Award nominations, Woody Allen never attends the Oscars. His reason is that art isn't competition — judging art is so subjective who's to say who or what is best? After all one man's Poison is another man's Cream. Similarly, Elvis Costello (featured in the viz) is famously credited with saying: "Writing about music is like dancing about architecture - It's a really stupid thing to want to do." I agree that using ratings - whether from fans or critics — to judge artistic merit is at best flawed and at worst a fool's exercise.

But I wanted to do it anyway.

Most peak in their 20s and either stabilize later on or continue to decline. Occasionally, as in the case with Bob Dylan, there's some see-sawing. Take a look at the Tableau interactive for a closer look. [via Waxy]

• # Interactive: Common chord progressions in 1,300 songs

March 21, 2013 to Network Visualization by Nathan Yau

If you listen to the radio long enough, you've probably noticed that many songs sound similar or remind you of a song you've heard before. Hooktheory shows you just how similar some songs are via chord progressions in over 1,300 songs. The small group analyzed the data last year and presented some static charts, but this interactive version takes it a step further.

Simply start by selecting a chord in the network diagram. Songs that use that chord appear on the right. Then select another chord in the network diagram to find songs that use the chord progression from the original to the new. Keep selecting chords to filter further.

So in the end, there are two main things you can do: (1) Find songs that use the same chord progression and (2) see the most likely chord given the current selection.
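The two operations can be sketched in a few lines of Python. This is only an illustration of the idea, not Hooktheory's implementation: the song titles and chord lists are made-up toy data, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: song title -> chord sequence (not from Hooktheory).
SONGS = {
    "Song A": ["C", "G", "Am", "F"],
    "Song B": ["C", "G", "Am", "F", "C", "G"],
    "Song C": ["Am", "F", "C", "G"],
    "Song D": ["C", "F", "G", "C"],
}

def find_matches(progression, songs):
    """Yield (title, index just past the match) for each place the progression occurs."""
    n = len(progression)
    for title, chords in songs.items():
        for i in range(len(chords) - n + 1):
            if chords[i:i + n] == progression:
                yield title, i + n

def songs_using(progression, songs=SONGS):
    """(1) Songs that contain the selected chords consecutively, in order."""
    return sorted({title for title, _ in find_matches(progression, songs)})

def next_chord_counts(progression, songs=SONGS):
    """(2) How often each chord directly follows the selected progression."""
    counts = Counter()
    for title, end in find_matches(progression, songs):
        chords = songs[title]
        if end < len(chords):
            counts[chords[end]] += 1
    return counts

selection = ["C", "G"]
print(songs_using(selection))
print(next_chord_counts(selection).most_common(1))
```

Each extra chord appended to `selection` narrows the list of matching songs, which mirrors the filtering behavior of clicking further into the network diagram.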

My musical knowledge from middle school jazz band is long gone, but it's fun to explore, and you'll likely find relationships to songs that you didn't expect. [Thanks, Dave]

• # A new brand of cartographer

March 20, 2013 to Mapping by Nathan Yau

Emily Underwood on new cartographers and the growing field:

Geographers have traditionally studied how the natural environment contributes to human society and vice versa, whereas cartographers have focused more explicitly on the art and science of mapmaking. Over the past couple of decades, a new field has emerged: geographical information systems (GIS), blending the study and expression of geographic information. Cartography and geography have overlapped and spawned innumerable subspecialties and applications. Modern geographers and cartographers are involved in diverse projects: tracking fleets of vehicles or products, helping customers locate a Dunkin' Donuts, modeling environmental scenarios such as oil spills, and studying the spread of disease.

You could substitute visualization and statistics for cartography throughout, and it'd almost all still be valid. The recurring theme is that although academic programs can be fine resources, most of your success has to do with what you can learn on your own, as data-related fields are changing fast.

• # How to be Interesting by Jessica Hagy

March 19, 2013 to Infographics by Nathan Yau

Jessica Hagy, the one who made Venn diagrams on index cards popular, has a new book out today: How to be Interesting.

You want to leave a mark, not a blemish. Be a hero, not a spectator. You want to be interesting. (Who doesn’t?) But sometimes it takes a nudge, a wake-up call, an intervention!—and a little help. This is where Jessica Hagy comes in. A writer and illustrator of great economy, charm, and insight, she’s created How to Be Interesting, a uniquely inspirational how-to that combines fresh and pithy lessons with deceptively simple diagrams and charts.

The book started from this, which could probably also stand in as a guide on how to enjoy life.

• # Monitor your surroundings with these sensors

March 19, 2013 to Self-surveillance by Nathan Yau

It wasn't long ago that sensors and personal tracking seemed like pure nerdery. In the early stages of graduate school (before smartphones were popular or even widely available), I played around with sensors that had finicky battery life and Internet connectivity. The software was buggy, and the hardware looked clunky.

New tracking devices pop up regularly these days. They're built and designed for a wider audience, and sometimes to my surprise, the devices are embraced by the target audience. It started with personal trackers that are fitness and health-related, but people are branching out now to monitoring their surroundings.

Two showed up on my radar this past week: CubeSensors and Thermodo.

• # An exploration of how Oscar winners express gratitude

March 18, 2013 to Visualization by Nathan Yau

Each year, Oscar speeches seem to follow a similar format, with familiar names and groups sputtered in 30 seconds. For her master's project, Thank the Academy, digital media student Rebecca Rolfe explored these patterns.

• # App shows what the Internet looks like

March 15, 2013 to Mapping by Nathan Yau

In a collaboration between PEER 1 Hosting, Steamclock Software, and Jeff Johnston, the Map of the Internet app provides a picture of what the physical Internet looks like.

• # Feltron 2012 Annual Report

March 14, 2013 to Self-surveillance by Nathan Yau

Today might be pi day, but yesterday was Feltron Report day. The theme this year is visual density — or maybe programmatic graphics. Either way, it looks mighty fine.

• # Choosing the right seat

March 13, 2013 to Infographics by Nathan Yau

It can be tricky picking the right seat at a dinner party. So much depends on how many people there are and what shape the table is. Luckily, Alex Cornell provides a guide on where to sit and when to arrive to get the best seat of the night. The 4-person circle is your best bet.

This is the ideal setup. You are safe sitting in any seat. Regardless how interesting everyone is, you pretty much can’t go wrong. Note: as the diameter of the table increases, so too does the importance that you sit adjacent to someone you like.

Sorry for always sitting at the lonely end seat in the 7-person rectangle. [via kottke]

March 12, 2013 to Network Visualization by Nathan Yau

In 2007, Martin Wattenberg and Fernanda Viégas created the word tree, a search tool for unstructured text. You enter the text, pick a word or phrase, and you can see how other words and phrases branch from the root. Data visualization developer Jason Davies rephrased the visualization in JavaScript, and you can enter a URL or a Twitter username (or enter your own text like with the original). There's also a nice sidebar that makes it easier to browse through the text.

So for example, the above is a word tree for The Cat in the Hat, and you can see what branches from Thing One and Thing Two. The phrase "and Thing Two" often follows "Thing One" as do exclamation points. The reverse feature comes in handy for text like Steve Jobs' commencement speech.
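To make the branching idea concrete, here is a minimal Python sketch of the data structure behind a word tree. It is not Wattenberg and Viégas's code or Jason Davies's JavaScript; the function name, the `depth` parameter, and the sample sentence are all made up for illustration.

```python
import re

def build_word_tree(text, root, depth=3):
    """Nest the words that follow each occurrence of `root`, with counts.

    Returns a dict mapping word -> {"count": int, "children": {...}}.
    """
    # Keep words (with apostrophes) and sentence-ending punctuation as tokens.
    words = re.findall(r"[\w']+|[.!?]", text.lower())
    root_words = root.lower().split()
    n = len(root_words)
    tree = {}
    for i in range(len(words) - n + 1):
        if words[i:i + n] != root_words:
            continue
        level = tree
        # Walk up to `depth` tokens past the root, counting shared branches.
        for word in words[i + n:i + n + depth]:
            entry = level.setdefault(word, {"count": 0, "children": {}})
            entry["count"] += 1
            level = entry["children"]
    return tree

# Made-up sample text, not a quotation from the book.
sample = (
    "Thing One and Thing Two ran fast. "
    "Thing One and Thing Two ran out! "
    "Thing One sat down."
)
tree = build_word_tree(sample, "Thing One")
for word, entry in tree.items():
    print(word, entry["count"])
```

A reverse tree would follow the same pattern, except it would collect the tokens that come before each occurrence of the root instead of after it.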

• # Data hackathon challenges and why questions are important

March 12, 2013 to Statistics by Nathan Yau

Jake Porway, executive director of DataKind, on data hackathons and why they require careful planning to actually work:

Any data scientist worth their salary will tell you that you should start with a question, NOT the data. Unfortunately, data hackathons often lack clear problem definitions. Most companies think that if you can just get hackers, pizza, and data together in a room, magic will happen. This is the same as if Habitat for Humanity gathered its volunteers around a pile of wood and said, "Have at it!" By the end of the day you'd be left with a half of a sunroom with 14 outlets in it.

Without subject matter experts available to articulate problems in advance, you get results like those from the Reinvent Green Hackathon. Reinvent Green was a city initiative in NYC aimed at having technologists improve sustainability in New York. Winners of this hackathon included an app to help cyclists "bikepool" together and a farmer's market inventory app. These apps are great on their own, but they don't solve the city's sustainability problems. They solve the participants' problems because as a young affluent hacker, my problem isn't improving the city's recycling programs, it's finding kale on Saturdays.

Without clear direction on what to do with the data or questions worth answering, hackathons can end up being a bust from all angles. From the organizer side, you end up with a hodgepodge of projects that vary a lot in quality and purpose. From the participant side, you're left up to your own devices and have to approach the data blind, without a clear starting point. From the judging side, you almost always end up having to pick a winner when there isn't a clear one, because the criteria of the contest was fuzzy to begin with.

This also applies to hiring freelancers for visualization work. You should have a clear goal or story to tell with your data. If you expect the hire to analyze your data and produce a graphic, you'd better get someone with a statistics background. Otherwise, you end up with a design-heavy piece with little substance.

Basically, the more specific you can be about what you're looking for, the better.

• # Amiigo: The exercise tracker that identifies exercises

March 11, 2013 to Self-surveillance by Nathan Yau

Self-tracking devices are all the rage these days. I went to the Apple store, and there was practically a whole wall of them. They were all uni-taskers though. There was one for cycling, another for running, and one for golfing. Amiigo, an Indiegogo campaign with four days left to contribute (but funded to completion five times over as of this writing), aims to track multiple exercises and automatically figure out which exercise you're doing.

• # What data brokers know about you

March 11, 2013 to Statistics by Nathan Yau

Lois Beckett for ProPublica has a thorough piece on data brokers — companies that collect and sell information about you — and what they know and where they get the data from.

They start with the basics, like names, addresses and contact information, and add on demographics, like age, race, occupation and "education level," according to consumer data firm Acxiom's overview of its various categories.

But that's just the beginning: The companies collect lists of people experiencing "life-event triggers" like getting married, buying a home, sending a kid to college — or even getting divorced.

Credit reporting giant Experian has a separate marketing services division, which sells lists of "names of expectant parents and families with newborns" that are "updated weekly."

The companies also collect data about your hobbies and many of the purchases you make. Want to buy a list of people who read romance novels? Epsilon can sell you that, as well as a list of people who donate to international aid charities.

So if you're wondering why you received that catalog in the mail, it was probably because a store sold your purchase data to a broker.

• # The world as one city

March 8, 2013 to Data Art by Nathan Yau

When we build models of the world, we often think of it broken down into pieces, such as cities, counties, and countries. In their newly funded project The City of 7 Billion, architects Joyce Hsiang and Bimal Mendis aim to model the world as one city, to study the impact of population growth on the environment and natural resources on a larger scale.

Every corner of the planet, they argue, is "urban" in some sense, touched by farming that feeds cities, pollution that comes out of them, industrialization that has made urban centers what they are today. So why not think of the world as a single urban entity?

Hsiang and Mendis don't yet know exactly what this will look like (that is the question, Mendis says). But they are planning to seed their geo-spatial model with worldwide data on population growth, economic and social indicators, topography, ecology and more. Ultimately, they hope, other researchers will be able to use the open-source platform for research on development patterns or air quality; the public will be able to use it to grasp the implications of building in a flood plain or implementing an energy policy; and architects will be able to use it to view the world as if it were a single project site.

Along with a slew of other challenges, I am sure, one of the big ones is finding comparable data at high granularity. Large cities tend to track (and hopefully release) data about what's going on, but once you step out of the major areas, data grows scarce.

They started with population, which was transformed into the physical installation above.

• # Using search data to find drug side effects

March 8, 2013 to Statistics by Nathan Yau

Along the same lines as Google Flu Trends, researchers at Microsoft, Stanford and Columbia University are investigating whether search data can be used to find interactions between drugs. They recently found an interaction.

Using automated software tools to examine queries by six million Internet users taken from Web search logs in 2010, the researchers looked for searches relating to an antidepressant, paroxetine, and a cholesterol lowering drug, pravastatin. They were able to find evidence that the combination of the two drugs caused high blood sugar.

The idea is that people search for symptoms and medications, and this data is stored in anonymized search logs. The researchers followed a suspicion that taking the two drugs at the same time might cause hyperglycemia. Those who searched for the two drugs were more likely to search for hyperglycemia than the control group (probably those who didn't search for hyperglycemia).
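The comparison boils down to two rates. The Python sketch below shows the arithmetic on a handful of invented users. The logs are entirely made up, and the control group here (people who searched for only one of the two drugs) is an assumption, since the exact definition used in the study isn't given above. Real queries are free text with many ways to describe a symptom, so this is far simpler than the researchers' tools.

```python
# Entirely made-up search logs: user id -> set of terms searched.
LOGS = {
    "u1": {"paroxetine", "pravastatin", "hyperglycemia"},
    "u2": {"paroxetine", "pravastatin"},
    "u3": {"paroxetine", "pravastatin", "hyperglycemia"},
    "u4": {"paroxetine"},
    "u5": {"pravastatin", "hyperglycemia"},
    "u6": {"pravastatin"},
    "u7": {"paroxetine"},
    "u8": {"weather"},
}

DRUGS = {"paroxetine", "pravastatin"}
SYMPTOM = "hyperglycemia"

def symptom_rate(users):
    """Share of users in the group who also searched for the symptom."""
    users = list(users)
    if not users:
        return 0.0
    return sum(SYMPTOM in terms for terms in users) / len(users)

# Users who searched for both drugs.
both = [terms for terms in LOGS.values() if DRUGS <= terms]

# Assumed control group: users who searched for exactly one of the drugs.
one = [terms for terms in LOGS.values() if len(DRUGS & terms) == 1]

print(f"both drugs: {symptom_rate(both):.2f}")
print(f"one drug:   {symptom_rate(one):.2f}")
```

A higher rate in the first group than in the second is the kind of signal the researchers looked for, though a gap in toy data like this says nothing about the real drugs.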

The work is still in its infancy, but it'll be interesting to see how this sort of data can be used to supplement existing work by the Food and Drug Administration.

# How to Make an Animated Growth Map in R

Although time series plots and small multiples can go a long way, animation can make your data feel more real and relatable. Here is how to do it in R via the animated GIF route.
