• Peter J. Rentfrow et al. studied personality clusters across states using data from five surveys, totaling responses from about 1.6 million people. They recently published their results in the Journal of Personality and Social Psychology [pdf].

    There is overwhelming evidence for regional variation across the United States on a range of key political, economic, social, and health indicators. However, a substantial body of research suggests that activities in each of these domains are typically influenced by psychological variables, raising the possibility that psychological forces might be the mediating or causal factors responsible for regional variation in key indicators.

    They found three main clusters, mapped above: friendly and conventional, relaxed and creative, and temperamental and uninhibited.

    The maps suggest that states belong only to specific clusters, but I suspect it’s a more continuous scale. For instance, a state might be partially part of cluster 1 and 2, not really 3, as opposed to just cluster 1. Still though, it’s an interesting start. Now if only the data they used were more easily accessible.
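
    To make the "partially part of cluster 1 and 2" idea concrete, here is a minimal sketch of soft cluster membership. It is not the method from the paper. The trait scores, the centroid positions, and the `soft_memberships` helper are all made up for illustration, and the exponential weighting is just one simple way to turn distances into shares.

```python
import math

def soft_memberships(point, centroids, sharpness=1.0):
    """Return one weight per cluster; weights sum to 1 and closer centroids get more."""
    weights = {
        name: math.exp(-sharpness * math.dist(point, center))
        for name, center in centroids.items()
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical 2-D trait scores (assumed values, not data from the study).
centroids = {
    "friendly and conventional": (1.0, 0.0),
    "relaxed and creative": (0.0, 1.0),
    "temperamental and uninhibited": (-1.0, -1.0),
}
state = (0.5, 0.5)  # a made-up state sitting between the first two clusters

memberships = soft_memberships(state, centroids)
for name, share in memberships.items():
    print(f"{name}: {share:.2f}")
```

    With these invented numbers, the state splits its membership evenly between the first two clusters and gets only a small share of the third, which a hard assignment would hide.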

  • A challenge these days with visualization is that a piece might look great on a computer monitor and then break on a tablet or phone. However, if you design your software with that in mind so that it adapts to the device it’s on — so that it’s responsive — your audience loves you more for it. Chris Amico explains how to get started in D3.js: responsive maps, charts, and legends.

  • It’s been continuous tracking and monitoring for the past couple of days and nights, but none of that matters now. We get to bring him home today.

  • Those who use the ggplot2 package in R and do everything else in Python will appreciate this Python port of the package from yhat.

    Excel makes some great looking plots, but I wouldn’t be the first to say that creating charts in Excel involves a lot of manual work. Data is messy, and exploring it requires considerable effort to clean it up, transform it, and rearrange it from one format to another. R and Python make these tasks easier, allowing you to visually inspect data in several ways quickly and without tons of effort.

    The preeminent graphics packages for R and Python are ggplot2 and matplotlib respectively. Both are feature-rich, well maintained, and highly capable. Now, I’ve always been a ggplot2 guy for graphics, but I’m a Python guy for everything else. As a result, I’m constantly toggling between the two languages which can become rather tedious.

    Once you get the Python library installed (and its dependencies), you’ll be able to use the same layered graphics approach as the R package, with a similar syntax.

  • Check out this awesome new thing called MAP. It’s made of 100% sustainable material, easy to share, unbreakable, fits in your pocket, and most importantly, shares none of your information.

    Pre-ordered.

  • Most of the major pizza chains are within a 5-mile radius of where…

  • James Hamblin for The Atlantic rendered the average American man based on BMI and compared him to the average man in other countries. Hamblin named the average man Todd.

    Though in his face this reads lonesome, Todd does have three international guyfriends. They met at a convention for people with perfectly average bodies, where each won the award for most average body in their respective country: U.S., Japan, Netherlands, and France. The others’ BMIs, based on data from each country’s national health centers, are 23.7, 25.2, and 25.6.

    I named them all Todd, actually, even though it could be confusing, because not everyone’s name is a testament to their cultural heritage.

  • Digital artist Lauri Vanhala animated a day of maritime traffic in the Baltic Sea.

    Here’s a marine traffic and accident visualization that I created for the Baltic Marine Environment Protection Commission. The video was shown last week in a conference where the ministers of environment in the region of Baltic Sea and a bunch of other professionals were discussing how to protect the vulnerable and polluted sea in the future.

    The background music feels cinematic but not surprising given the audience. I particularly like the highlighting and annotation sync around the one-minute mark.

    See also: Britain from Above and Netherlands from Above. Oldies but goodies.

  • In their continued efforts to help potential home buyers find out all they can about the neighborhoods they want to live in, Trulia added median listing prices to their set of local maps. In the zoomed out view, you get prices per county, at medium zoom it’s per ZIP code, and zoomed in all the way it’s per block. You can also see sale price and sale price per square foot.

    With this, supplemented by crime data, commute, schools, and natural hazards, Trulia’s maps are a required stop for home buyers.

  • Dan Delany took a simple look at furloughed employees due to the government shutdown. There are tickers for duration, estimated unpaid salary, and estimated food vouchers unpaid, but the main view is the interactive tree map that shows furloughed proportions by department.

    Data was nicely collated into one spreadsheet from a bunch of government-released PDF files (of course), and the code for the page is available on Github. [Thanks, Dan]

  • Sometimes it can be a challenge to produce data graphics in vector format, which is useful for high-resolution prints. Raw, an alpha-version tool by Density Design, helps make the process smoother.

    Primarily conceived as a tool for designers and vis geeks, Raw aims at providing a missing link between spreadsheet applications (e.g. Microsoft Excel, Apple Numbers, OpenRefine) and vector graphics editors (e.g. Adobe Illustrator, Inkscape, Sketch). In this sense, it is not intended to be a full “visualization tool” like Tableau or other similar products: as the name suggests it is a sketch tool, useful for quick and preliminary data explorations as well as for generating editable visualizations.

    Although still in its early stages, Raw is actually quite usable. Start with a dataset copied and pasted from your spreadsheet, select a visualization format, and then click-and-drag how you want to represent values. Modify options as you see fit and download in the format you need.

  • Last year, URL shortening service bitly and Forbes made a map that showed popular news sources by state. However, the map was based on a static month of data, so what it showed then doesn’t necessarily apply to now. Bitly took it a step further this year and shows media consumption in real-time.

    They also categorized media sources into newspapers, tv and radio, magazines, and online only for a more detailed view. And to top it off, you can click on states to see a list of top sources, and you can see links driving traffic to the listed sites.

    One key thing to keep in mind as you read the maps: They show disproportionality rather than raw counts. So when you see that Texas is a TMZ fiend, that doesn’t mean they click on the celebrity news site more than on Huffington Post. Rather, it means the relative volume of TMZ-clicking from Texas versus other states is higher than the relative volume of Huffington Post-clicking.
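
    A small worked example shows the difference between disproportionality and raw counts. This is a sketch under assumptions: the click counts are invented, the `disproportionality` function is hypothetical, and bitly's actual calculation is not described here, so treat the ratio below as one plausible way to measure it.

```python
def disproportionality(clicks, state, source):
    """Ratio of a source's share of clicks in one state to its share across all states."""
    state_share = clicks[state][source] / sum(clicks[state].values())
    source_total = sum(by_source[source] for by_source in clicks.values())
    grand_total = sum(sum(by_source.values()) for by_source in clicks.values())
    return state_share / (source_total / grand_total)

# Invented click counts, for illustration only.
clicks = {
    "Texas": {"TMZ": 30, "Huffington Post": 70},
    "Everywhere else": {"TMZ": 10, "Huffington Post": 90},
}

tmz_ratio = disproportionality(clicks, "Texas", "TMZ")
huffpo_ratio = disproportionality(clicks, "Texas", "Huffington Post")
print(round(tmz_ratio, 3), round(huffpo_ratio, 3))
```

    In this made-up data, Texas still clicks on Huffington Post more than twice as often as on TMZ, yet TMZ's ratio is above 1 and Huffington Post's is below 1. A ratio above 1 means the source is over-represented in that state compared with the country as a whole.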

  • I’m at an age where my feed is filled with data points. Every day my friends post about their data points, how precious their data points are, and all the peculiarities of their data points. Even though there are hundreds of thousands of data points that pop up every day, each one is a gift that can be compared to nothing and no one else. At the most superficial level, yes, they are easy to compare, but once you look to the details — a wink, a smirk, a bobble — you see individuals, and that’s where we seem to find the most joy.

    I’ll let you know in a few weeks.

    My wife just went on maternity leave, in preparation for our very own data point.

    It’s our first, so I’m not sure how this works, but something tells me I’ll be busier and maybe more tired than usual. I hope to keep FlowingData running, as if nothing changed, but in case you don’t hear from me for a few days, you’ll know why.

  • The video below shows ten seconds of trading on Blackberry on October 2, when they reported a bigger loss than they thought. It might also be a super advanced level of Space Invaders.

  • Mark Graham and Stefano De Sabbata for Information Geographies mapped the most visited site based on Alexa data. Countries are sized by Internet population. There aren’t many surprises with Facebook and Google in the Americas and Europe, but it gets more interesting when you look elsewhere.

    The situation is more complex in Asia, as local competitors have been able to resist the two large American empires. Baidu is well known as the most used search engine in China, which is currently home to the world’s largest Internet population at over half a billion users. At the same time, we see a puzzling fact that Baidu is also listed as the most visited website in South Korea (ahead of the popular South Korean search engine, Naver). We speculate that the raw data that we are using here are skewed. However, we may also be seeing the Baidu empire in the process of expanding beyond its traditional home territory.

    The remaining territories that have escaped being subsumed into the two big empires include Yahoo! Japan in Japan (in joint venture with SoftBank) and Yahoo! in Taiwan (after the acquisition of Wretch). The Al-Watan Voice newspaper is the most visited website in the Palestinian Territories, the e-mail service Mail.ru is the most visited in Kazakhstan, the social network VK the most visited in Belarus, and the search engine Yandex the most visited in Russia.

  • When you go to the United States Census site, Data.gov, or similar government-run sites, you see this. “Due to the lapse in government funding, census.gov sites, services, and all online survey collection requests will be unavailable until further notice.” Now it’s personal.

  • Big data, in all its glory, promises insights into the soul of humankind. There’s a hefty restriction though. Data only tells you about the population and actions of individuals it represents, which inevitably excludes part of the population. Jonas Lerman considers two hypothetical people. The first one:

    The first is a thirty-year-old white-collar resident of Manhattan. She participates in modern life in all the ways typical of her demographic: smartphone, Google, Gmail, Netflix, Spotify, Amazon. She uses Facebook, with its default privacy settings, to keep in touch with friends. She dates through the website OkCupid. She travels frequently, tweeting and posting geotagged photos to Flickr and Instagram. Her wallet holds a debit card, credit cards, and a MetroCard for the subway and bus system. On her keychain are plastic barcoded cards for the “customer rewards” programs of her grocery and drugstore. In her car, a GPS sits on the dash, and an E‑ZPass transponder (for bridge, tunnel, and highway tolls) hangs from the windshield.

    That’s a lot of data. The second person:

    He lives two hours southwest of Manhattan, in Camden, New Jersey, America’s poorest city. He is underemployed, working part-time at a restaurant, paid under the table in cash. He has no cell phone, no computer, no cable. He rarely travels and has no passport, car, or GPS. He uses the Internet, but only at the local library on public terminals. When he rides the bus, he pays the fare in cash.

    The second person has fewer data flows.

    These days, big data exclusion almost sounds like a good thing, if you’re intent on avoiding all marketing-related data collection, but when policy-making, fund allocation, etc. come into play, it’s possible the excluded aren’t counted. That’s not to say people should hurriedly sign up for Facebook and opt in to every tracking study. It’s the opposite. Those in charge of the data and those who decide based on what they see in the data are responsible for knowing the background of their source.

  • Measuring and Mapping Space: Geographic Knowledge in Greco-Roman Antiquity opens at the Institute for the Study of the Ancient World at NYU this Friday. The exhibit serves as an appreciation of maps and, more importantly, the history behind them and what they represent of their time.

    Our modern knowledge of ancient cartography relies almost exclusively on written sources. Despite this paucity of ancient artifacts, it is clear that Greeks and Romans applied topographical studies to the mapping of land and sea routes, to the implementation of an accurate system of recording public and private lands, and to promote specific political agendas. In all these instances, the resulting representations of places presented the viewer with a distorted and schematized version of geographic and topographic elements, transforming those regions both on a conceptual and on a physical level.

    [via The New York Times]

  • Mike Bostock, Shan Carter, and Kevin Quealy for The New York Times explore quarterback streaks in the National Football League since 1970. The longest streak for each team is highlighted yellow, and you can search for your favorite players either by mousing over streaks or via the dropdown/search menu.

    Be sure to also check out the chart iterations of the interactive. First, a couple of bar graphs in R for a visual summary, and then 17 sketches later, out comes the finished product.

    I’m surprised that many of the longest streaks took place in the 1970s and 1980s. You’d think with today’s rules, there’d be more in the latter half of the timespan. Then again, trades and quarterback rotations aren’t the same as they were back then either.

  • Foursquare check-ins can be self-encapsulated and personal to the individual, where each dot represents a specific place in time. Each point represents a stop at a restaurant, store, or place of business. However, look at check-ins from lots of people and movement appears, which is the premise of Foursquare’s latest videos.

    Because it’s Foursquare, there’s an added dimension of location categories, so color codes show people going to work, grabbing lunch, shopping, and getting after-work drinks.

    The above shows the pulse of Tokyo. See also: Chicago, London, Istanbul, San Francisco, and New York. [via Fast Company]