  • Search how phrases have been used via Google Ngram Viewer

    December 20, 2010  |  Online Applications

    Ngram - kindergarten

    Language changes. Culture changes. And we can see some of these changes via what authors write about in books over the years. Google's Book Ngram Viewer lets you search through this data, and shows a graph similar to the output of Google Trends. The above shows the trends for nursery school, kindergarten, and child care:

    This shows trends in three ngrams from 1950 to 2000: "nursery school" (a 2-gram or bigram), "kindergarten" (a 1-gram or unigram), and "child care" (another bigram). What the y-axis shows is this: of all the bigrams contained in our sample of books written in English and published in the United States, what percentage of them are "nursery school" or "child care"? Of all the unigrams, what percentage of them are "kindergarten"? Here, you can see that use of the phrase "child care" started to rise in the late 1960s, overtaking "nursery school" around 1970 and then "kindergarten" around 1973. It peaked shortly after 1990 and has been falling steadily since.
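
    To make the y-axis concrete, here is a minimal sketch of that percentage calculation. The one-sentence "corpus" and the function names are made up for illustration; this is not how Google computes it at scale.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_share(tokens, phrase):
    """Return the phrase's share of all n-grams of the same length, in percent."""
    target = tuple(phrase.lower().split())
    counts = ngram_counts(tokens, len(target))
    total = sum(counts.values())
    return 100.0 * counts[target] / total if total else 0.0


# Toy text standing in for a year's worth of books (invented for this sketch).
text = "the nursery school is near the kindergarten and the child care center"
tokens = text.lower().split()

kindergarten_share = ngram_share(tokens, "kindergarten")  # compared against unigrams
child_care_share = ngram_share(tokens, "child care")      # compared against bigrams
```

    Note that a unigram is only ever compared against other unigrams and a bigram against other bigrams, which is why the three lines can share one axis.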

    Find anything interesting?
    Continue Reading

  • Advanced visualization without programming – Impure

    December 2, 2010  |  Online Applications

    Color map

    Programming can be tough in the beginning, which can make advanced visualization beyond the Excel spreadsheet hard to come by. Bestiario tries to make it easier with their most recent creation Impure:

    Impure is a visual programming language aimed to gather, process and visualize information. With impure is possible to obtain information from very different sources; from user owned data to diverse feeds in internet, including social media data, real time or historical financial information, images, news, search queries and many more.

    It's not a plug-and-play application, but it's not scripting in a text editor either. Think of it as somewhere in between (hence the visual programming language). They've taken the logic behind code and encapsulated it into modules or structures, and you can piece them together like a puzzle. The interface kind of reminds me of Yahoo Pipes.
    Continue Reading

  • R is the need-to-know stat software

    November 17, 2010  |  Software, Statistics

    This Forbes post on the greatness that is R is being passed around by every statistician and his mother today.

    It's not that this type of analysis wasn't possible before — statisticians have existed, and commercial software has been available to support them, for decades. The fact that R is free to use, free to modify, and its source is open to view, extend and improve means students, stock traders-in-training and fantasy football junkies can familiarize themselves with the software. They can write programs against it. They're likely to continue that usage into their professional lives. When they share their work, the community, down the line, benefits. And the virtuous cycle strengthens.

    What's your favorite (graphical) use of R?

  • Format and clean your data with Google Refine

    November 16, 2010  |  Software

    When we first learn how to deal with data in school, it's nicely formatted and fits perfectly into a rectangular spreadsheet. Then when we start to deal with real data, we find missing values, inconsistencies, and for some reason it doesn't plug straight into our software. What the heck?

    The caveman way to fix this problem is to open Excel and manually edit everything. Some ad hoc code can often fix your problems, but still that takes time and can be a pain. Google Refine, the Googley evolution of Freebase Gridworks, can help you.
    Continue Reading

  • Find the names in your data with Mr. People

    November 8, 2010  |  Online Applications

    Inspired by Shan Carter's simple data converter, appropriately named Mr. Data Converter, Matthew Ericson just put Mr. People online. The tool lets you paste a list of names, and it will parse the first and last name, suffix, title, and other parts for you. You can even have multiple names in a single row.

    Years ago, while trying to clean up the names of donors in campaign finance data from the Federal Election Commission, I hacked together a Perl module — loosely based on the Lingua-EN-NameParse module — to standardize names. One port to Ruby later, I've finally put together a Web front end for it.

    Getting data in the right format, whether for analysis or visualization, can be a huge pain. Imagine. All the data you need is right in front of you, but you can't do anything with it yet, because, as is often the case, it's not in a nice and pretty rectangular format. So anything that makes this easier and quicker is an instant bookmark for me.
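
    For a rough idea of what this kind of name parsing involves, here is a naive sketch. It is not Ericson's code; the title and suffix lists and the rule for splitting several names on "and" are my own assumptions, and real names will break it quickly.

```python
TITLES = {"mr", "mrs", "ms", "dr", "prof"}   # hypothetical, far from complete
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}   # hypothetical, far from complete


def parse_name(raw):
    """Split one name into title, first, middle, last, and suffix (naively)."""
    words = raw.replace(",", " ").split()
    parts = {"title": "", "first": "", "middle": "", "last": "", "suffix": ""}
    if words and words[0].rstrip(".").lower() in TITLES:
        parts["title"] = words.pop(0)
    if words and words[-1].rstrip(".").lower() in SUFFIXES:
        parts["suffix"] = words.pop()
    if words:
        parts["first"] = words.pop(0)
    if words:
        parts["last"] = words.pop()
    parts["middle"] = " ".join(words)  # whatever is left between first and last
    return parts


def parse_row(row):
    """Parse several names in one row, assuming they are joined by 'and' or '&'."""
    return [parse_name(name) for name in row.replace(" & ", " and ").split(" and ")]


king = parse_name("Dr. Martin Luther King, Jr.")
couple = parse_row("John Smith and Jane Doe")
```

    Even this toy version shows why a dedicated tool is handy: every rule needs a lookup list, and every lookup list has exceptions.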

    [Mr. People via @mericson]

  • Why everyone should learn programming

    October 28, 2010  |  Coding

    Daniel Shiffman, assistant professor at the NYU Interactive Telecommunications Program, talks programming, computation, data, and why everyone should learn programming in this interview by Mark Webster.

    It's not just about saving time. There are certain things you can discover and be creative with with computation that you can't by hand. They both go together.

    Watch the four-minute interview below. The excitement in Shiffman's voice alone might make you want to learn some Processing (which he wrote a useful book for).
    Continue Reading

  • How people in your area spend money

    October 28, 2010  |  Online Applications

    San Francisco spending

    The personal finance site Mint aggregates spending data from four million users. At the individual level, Mint is useful in that it brings all of your finances into one place. Zoom out and aggregate, and you have spending for a city or a state. This is what Mint Data does.
    Continue Reading

  • Find your flight via visual interface

    October 21, 2010  |  Misc. Visualization, Online Applications

    hipmunk flight search

    Booking flights became so much easier when it all shifted online, but it hasn't changed in years. You put in your preferred dates and times and you get a long list of options. Oftentimes those listings can be a pain as you browse through all of your options. Oh the burden of choice. Hipmunk tries to make flight search easier with a visual interface.

    As usual, you enter your origin and destination, but instead of plain HTML tables, you get something like the above, and you can sort the options from least to greatest amount of agony. Rectangle lengths represent flight times and are color-coded by airline. Flights with the same takeoff and arrival times but a higher price are hidden to help you narrow down your options quicker.
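
    The hiding and sorting described above can be sketched in a few lines. The Flight fields and the agony score here are invented for illustration; Hipmunk's actual formula isn't given in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flight:
    airline: str
    depart: str   # "HH:MM", same-day flights assumed
    arrive: str   # "HH:MM"
    price: float
    stops: int


def minutes(hhmm):
    """Convert "HH:MM" to minutes after midnight."""
    hours, mins = hhmm.split(":")
    return int(hours) * 60 + int(mins)


def hide_pricier_duplicates(flights):
    """Keep only the cheapest flight for each (depart, arrive) pair."""
    cheapest = {}
    for flight in flights:
        key = (flight.depart, flight.arrive)
        if key not in cheapest or flight.price < cheapest[key].price:
            cheapest[key] = flight
    return list(cheapest.values())


def agony(flight):
    """Made-up score: price plus penalties for duration and stops."""
    duration = minutes(flight.arrive) - minutes(flight.depart)
    return flight.price + 0.5 * duration + 50 * flight.stops


flights = [
    Flight("A", "08:00", "11:00", 300.0, 0),
    Flight("B", "08:00", "11:00", 350.0, 0),  # same times as A but pricier
    Flight("C", "09:00", "15:00", 200.0, 1),
]
ranked = sorted(hide_pricier_duplicates(flights), key=agony)
```

    The cheap but long flight with a stop ends up ranked below the pricier nonstop, which is the whole point of sorting by something other than price.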

    Hipmunk is still in the early stages, but a quick search shows a lot of promise.

    [Hipmunk via Matt]

  • How K-12 schools in your area measure up

    October 13, 2010  |  Mapping, Online Applications

    Education scorecard - how does this district compare

    In collaboration with NBC News and The Gates Foundation, Ben Fry-headed Fathom Design shows you how K-12 schools measure up in your area. If you're a parent or soon-to-be parent considering a move, this will be especially interesting to you. The Education Nation Scorecard lets you search for your location or a specific school to see how they perform and how they compare to the rest of the country.
    Continue Reading

  • The state of mapping APIs

    September 15, 2010  |  Mapping, Software

    O'Reilly Radar surveys the state of mapping APIs from old sources (like Google) and new ones (like CloudMade). Spoiler alert: there's a lot of opportunity out there.

    Maps took over the web in mid-2005, shortly after the first Where 2.0 conference. They quickly moved from fancy feature to necessary element of any site that contained even a trace of geographic content. Today we're amidst another location and mapping revolution, with mobile making its impact on the web. And with it, we're seeing even more geo services provided by both the old guard and innovative new mapping platforms.

    [O'Reilly Radar]

  • Graph and explore your Gmail inbox

    September 14, 2010  |  Online Applications, Self-surveillance

    Graph your inbox

    Your email says a lot about who you are, who you interact with, and what you're up to at any given time. Maybe it's receipts from that online travel site or notifications from Facebook. There are lots of tidbits you can extract from your inbox. But how? PhD candidate Bill Zeller provides you with Graph Your Inbox.
    Continue Reading

  • Simple data converter from Excel

    September 6, 2010  |  Online Applications, Statistics

    If you've ever created an interactive graphic or anything else that requires that you feed in data, you will love this barebones data conversion tool by Shan Carter. Copy and paste data from Excel, which I feel like I've done a billion times, and then take your pick from Actionscript, JSON, XML, and Ruby. Simple, but a potential time saver. [via]
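
    If you'd rather script the same idea, here is a minimal sketch of the JSON case. It assumes the pasted cells arrive as tab-separated text with a header row, which is what a copy from Excel typically gives you; the function name is mine, not the tool's.

```python
import csv
import io
import json


def excel_paste_to_json(pasted):
    """Convert tab-separated text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(pasted), delimiter="\t")
    return json.dumps(list(reader), indent=2)


# Two data rows under a header row, as they might look after a copy and paste.
pasted = "name\tvalue\nalpha\t1\nbeta\t2\n"
result = excel_paste_to_json(pasted)
```

    Every value comes out as a string here. Guessing which columns are numbers is one of the details a proper converter has to deal with.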

  • Design advanced online and interactive maps with Polymaps

    August 20, 2010  |  Mapping, Software

    Flickshapes map with polymaps

    SimpleGeo, who makes location data easier to access, and Stamen, who does all kinds of wonderful with maps, announced their collaboration, Polymaps, today. It's a free and open-source JavaScript library for image- and vector-tiled maps using SVG.

    Polymaps provides speedy display of multi-zoom datasets over maps, and supports a variety of visual presentations for tiled vector data, in addition to the usual cartography from OpenStreetMap, CloudMade, Bing, and other providers of image-based web maps.

    Because Polymaps can load data at a full range of scales, it’s ideal for showing information from country level on down to states, cities, neighborhoods, and individual streets. Because Polymaps uses SVG (Scalable Vector Graphics) to display information, you can use familiar, comfortable CSS rules to define the design of your data. And because Polymaps uses the well known spherical mercator tile format for its imagery and its data, publishing information is a snap.

    The above is a map using Flickr shapefiles. Here's a map of pavement quality in San Francisco.
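
    As for the spherical mercator tile format mentioned in the announcement, here is a small sketch of how a longitude and latitude map to a tile at a given zoom level. It follows the XYZ numbering commonly used for web map tiles and is written in Python for illustration, not taken from the Polymaps source (which is JavaScript).

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return (x, y) of the tile containing a point, with y counted from the top."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edges still fall inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


# Roughly downtown San Francisco at zoom level 12.
sf_tile = lonlat_to_tile(-122.4194, 37.7749, 12)
```

    Each zoom level doubles the number of tiles per axis, which is how the same scheme covers everything from countries down to individual streets.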
    Continue Reading

  • Stamen makes experimental prettymaps

    August 16, 2010  |  Mapping, Software

    Los Angeles prettymap by Stamen Design

    Add another toy to Stamen's bag of tricks. The recently launched prettymaps by Aaron Straup Cope uses shapefiles from Flickr, urban areas from Natural Earth, and road, highway, and path data from OpenStreetMap, for an interactive map that's, well, pretty.
    Continue Reading

  • Gapminder makes its way to the desktop

    July 13, 2010  |  Software, Statistical Visualization

    Gapminder Desktop

    You've seen the presentation. You've seen the motion graph tool. But up until now, the data exploration tool, Trendalyzer, has always been in the browser. Now you can download the desktop version, and keep everything on your own computer with Gapminder Desktop:

    Gapminder Desktop is particularly useful for presentations as it allows you to prepare your graphs in advance and you won’t need an Internet connection at your lecture or presentation.

    In the "list of graphs" you will get a preset list of graphs on the left side, but you can also very easily create your own favorite examples. Simply arrange the graph the way you want it and click “bookmark this graph”. Your example will then appear in your own list of favorite graphs. Perfect when you want to prepare a lecture or presentation.

    Basically, it's the exact same thing as the online version, packaged as an Adobe AIR application, which is handy for all you motion graph fans out there.
    Continue Reading

  • Poyozo the personal data gatherer

    July 7, 2010  |  Self-surveillance, Software

    Poyozo the personal data gatherer

    Take a moment and think of all the data you put out there on separate Web services. Email, photos, status updates, documents, location, contacts, and the list goes on. Many of the services are really good, but what if they went down? Where would your data go? Or what if you could bring all that data into one place, so that you didn't have to log in to Flickr, Twitter, Foursquare, and Facebook? Poyozo tries to get all your data in one place - on your own computer - and help "make life make sense."

    Poyozo gives you your own data back by downloading the information you're currently giving to the web on to your own computer. You can opt-in to importing your data from Facebook, Twitter, Foursquare, Last.fm, Google Calendar, any email service, any RSS feed, Flickr, Wesabe, Listit, Skydeck, Dopplr, your Firefox browsing history, the local weather, and your location, allowing you to access all of this personal data as easily as the companies that run these services can.

    Simply install the Firefox plugin, choose what services you want to scrape, and you're good to go. Poyozo then provides an API that you can use to access and query your data. Visualize it any way you want.
    Continue Reading

  • Stack Overflow for data geeks

    July 6, 2010  |  Online Applications

    I can't count how many times I've googled a programming-related question and found myself at Stack Overflow, the question and answer site for programmers. MetaOptimize is like a Stack Overflow for data geeks:

    You and other data geeks can ask and answer questions on machine learning, natural language processing, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization.

    Here you can ask and answer questions, comment and vote for the questions of others and their answers. Both questions and answers can be revised and improved. Questions can be tagged with the relevant keywords to simplify future access and organize the accumulated material.

    Those with some data munging under their belt might find MetaOptimize useful. If you're a n00b, you might want to stick to the FD forums.

    [Thanks, John]

  • JavaScript InfoVis Toolkit 2.0 released

    July 6, 2010  |  Software, Statistical Visualization

    the Jit treemap example

    Visualization in JavaScript is all the rage these days. Just a couple of years ago, this would've seemed ridiculous because the engines were too slow, but no more of that. To that end, Nicolas Garcia Belmonte just released his JavaScript InfoVis Toolkit 2.0. It's got your treemaps, stacked area charts, pie charts, weighted graphs, and so on. You can see all the demos, plus code examples, to get the full picture.

    This is not dissimilar to Protovis from the Stanford visualization group. Although, I'm told the JIT is fully functioning in Internet Explorer. Protovis only partly works in IE right now.

  • Protovis 3.2 released – more examples and layouts

    June 7, 2010  |  Software, Statistical Visualization

    parallel coordinates

    The most recent version of Protovis, the open-source visualization library that uses JavaScript and SVG, was released not too long ago, this time with more layouts and examples. This is especially helpful since Protovis was "designed to be learned by example." Among the new stuff are the ever-popular streamgraphs, along with the force-directed layout. With only 10 to 20 lines of code, you'll have your viz, so lots of bang for the buck.

    There are, however, still some limitations with dreaded Internet Explorer (mainly with interaction), but they're getting there, I think.

    Find plenty of other examples on the Protovis site. Robert Kosara has also started a series of Protovis tutorials on how to use the library if you want some guidance on where to start.

  • R for enterprise?

    June 4, 2010  |  Software

    Norman Nie, co-creator of SPSS (acquired by IBM for $1.2 billion last summer), and his group Revolution Analytics aim to bring analysis to a wider audience with a product built on top of R, the popular statistical computing language. They call it Revolution R.

    Noted in a recent Forbes article:

    R is a powerful tool but difficult for novices to use. Nie's Revolution Analytics aims to make it more accessible with a better-organized library, capabilities for bigger jobs and a user interface that lets users drag and drop statistical analyses into place, outputting easily read charts.

    The rest of the article is about Nie, the growing importance of data, etc.

    I'm curious. Has anyone tried Revolution R? They say that it has "faster performance and greater stability" than base R. Is it that much better?

    [Thanks, Victoria]

Unless otherwise noted, graphics and words by me are licensed under Creative Commons BY-NC. Contact original authors for everything else.