• Data grab bag

    February 14, 2014  |  Miscellaneous

    Spring

    Using Dates and Times in R.

    — Jerzy Wieczorek describes his first semester as a stat PhD student.

    — Apparently there's a stochastic process in probability theory called the Chinese Restaurant Process, and a closely related Indian Buffet Process. Matt Dickenson made a quick visualization to demonstrate the latter. These require more investigation for their names alone.

    — Stravinsky's The Rite of Spring visualized.

    — Why didn't anyone tell me Arduino was so fun and accessible? It's like LEGOs raised to the nerdeth degree.

    Source code in TV and film.

  • Reporter app, for self-discovery through data

    February 13, 2014  |  Self-surveillance

    Reporter app

    Nicholas Felton, Drew Breunig, and Friends of the Web released Reporter for iPhone. The app—$3.99 on the app store—prompts you with quizzes, such as who you're with or what you're doing, sparsely throughout the day to help you collect data about yourself and surroundings. You can also create your own survey questions to collect data on what interests you and use your phone's existing capabilities to record location, sound levels, weather, and photo counts automatically.

  • Basketball analytics

    February 12, 2014  |  Statistics

    Kirk Goldsberry talks about the rise of analytics in the NBA. With cameras above every court recording player movements, analysis at a much finer granularity than the box score is now possible. One of the key metrics is expected possession value, or EPV, which estimates the number of points a possession is worth, given where everyone is on the court and where the ball is.

    But the clearest application of EPV is quantifying a player's overall offensive value, taking into account every single action he has performed with the ball over the course of a game, a road trip, or even a season. We can use EPV to collapse thousands of actions into a single value and estimate a player's true value by asking how many points he adds compared with a hypothetical replacement player, artificially inserted into the exact same basketball situations. This value might be called "EPV-added" or "points added."

    As a basketball fan, I hope this makes the game more fun and interesting to watch, and as a statistician, I hope this work can be applied to other facets of life like traffic or local movements. If just the latter, that'd be fine too.

  • Interactive maps with R

    February 11, 2014  |  Software

    Interactive maps with R

    You can make static maps in R relatively well, if you know what packages to use and what to look for, but there isn't much direct interaction with your graphics. rMaps is a package that helps you create maps that you can mouse over and zoom in to.

    Don't get too excited though. A scan of the docs shows that it's basically a wrapper around the JavaScript libraries Leaflet, DataMaps, and Crosslet, so you could learn those directly instead, and you'd be better for it in the long run if you plan to make more maps. But if you're just working on a one-off or must stay in R because your life depends on it, rMaps might be an option.

  • Olympic event explainer videos

    February 10, 2014  |  Infographics

    Olympics coverage by NYT

    Winter Olympic events are filled with subtleties that, if you know about them, can help you appreciate athletes' skills and the sports a bit more. The New York Times published three explainer videos to help you do just that. So far, there's one on slopestyle, which has roots in the Winter X Games, another on the luge, which is freakin' dangerous, and a third on the halfpipe, from Shaun White's perspective. The features are a nice combination of video, graphics, and narrative.

    If you're watching the Olympics, do yourself a favor and bookmark NYT Olympic coverage.

  • Disinformation visualization

    February 7, 2014  |  Design

    Mushon Zer-Aviv offers up examples and guidance on lying with visualization.

    We don't spread visual lies by presenting false data. That would be lying. We lie by misrepresenting the data to tell the very specific story we're interested in telling. If this is making you slightly uncomfortable, that's a good thing, it should. If you're concerned about adopting this new and scary habit, well, don't worry, it's not new. Just open your CV to be reminded you've lied with truthful data before. This time however, it will be explicit and visual.

    It comes back to the whole "let the data speak" ideal. Data might have something to say, but the analyst, designer, etc. still has to translate, whether that's through statistical methods or visualization. Sometimes meaning gets lost when you're not careful.

  • Texting data to save lives

    February 6, 2014  |  Data Sources

    Remember that TED talk from a couple of years ago on texting patterns to a crisis hotline? The TED talker Nancy Lublin proposed the analysis of these text messages to potentially help the individuals texting. Her group, the Crisis Text Line, plans to release anonymized aggregates in the coming months.

    Ms. Lublin said texts also provided real-time information that showed patterns for people in crisis.

    Crisis Text Line's data, she said, suggests that children with eating disorders seek help more often Sunday through Tuesday, that self-cutters do not wait until after school to hurt themselves, and that depression is reported three times as much in El Paso as in Chicago.

    This spring, Crisis Text Line intends to make the aggregate data available to the public. "My dream," Ms. Lublin said, "is that public health officials will use this data and tailor public policy solutions around it."

    Keeping an eye on this.

  • Map: US bus and Amtrak routes

    February 6, 2014  |  Mapping

    Bus routes

    In case you're wondering how to travel the country without a car (in a way other than running), this map from the American Intercity Bus Riders Association [pdf] shows you all the bus and Amtrak routes that span the United States. Keep in mind that these trains don't run 24/7, so plan accordingly.

  • Olympic events placed in New York for scale

    February 5, 2014  |  Infographics

    Bryant park ski jump

    The New York Times published a fun piece that places Winter Olympic events in the city. Events include the luge in Times Square, ski jump in Bryant Park, and speed skating down Broadway.

    The Winter Olympics sometimes gets flak for being the thing in between the more popular Summer Olympics, but I think it has a lot to do with scale and perception of the events. People know how fast they run, but don't always get how steep the mountains are. I used to go downhill skiing, and from a distance the hills didn't look especially daunting, but when I stood at the top of a black diamond, it looked pretty scary.

  • Where people run

    February 5, 2014  |  Data Underload

    Where people run

    There are many exercise apps that allow you to keep track of your running, riding, and other activities. You can record speed, time, elevation, and location from your phone, and millions of people do, me included. However, when we look at activity logs, whether they be our own, from our friends, or from a public timeline, the activities only appear individually.

    What about all together? Not only is it fun to see, but it can be useful to the data collectors planning future workouts, or even to city planners who make sure citizens have proper bike lanes and running paths.

  • Quotes poster updates

    February 4, 2014  |  Announcements

    For those who ordered a famous quotes poster: I'll be updating the printing and shipping status on this page.

    I sent the poster to the printers on Friday, approved the digital proof yesterday, and the posters might be printing as I'm writing this. I still expect to start mailing to you in the middle of this month, assuming posters and (a lot of) shipping supplies are in my hands as scheduled. Thanks!

    For those who did not order a poster but still want one: It's not too late.

  • Places in the US with the most pleasant days per year

    February 4, 2014  |  Mapping

    Pleasant places to live

    We've seen plenty of maps the past few weeks that show how bad the weather is just about everywhere but California. Kelly Norton looked at it from the other direction and estimated how many pleasant days per year areas of the US get, based on historical NOAA data.

    I decided to take a stab at what constitutes a "pleasant" day and then aggregate NOAA data for the last 23 years to figure out the regions of the United States with the most (and least) pleasant days in a typical year. The results, I think, are not that surprising and pretty much affirm the answer given off the cuff by many of my west coast friends when asked about the best places, "Southern California?" For the areas with the least pleasant days, I admit I would have guessed North Dakota. However, it’s much of Montana that gets an average of a couple of weeks of pleasantness each year.

    Of course the map changes (mainly the geographic range) depending on the definition of a "pleasant" day. In this case it's defined as one where the mean temperature is between 55 and 75 degrees.
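To make that definition concrete, here is a minimal sketch in Python that counts pleasant days using only the mean-temperature rule described above. It is not Kelly Norton's actual code. The function names, the inclusive bounds, and the toy temperature readings are all assumptions for illustration, and real NOAA data would need to be loaded and cleaned first.

```python
from statistics import mean

def is_pleasant(mean_temp_f, low=55.0, high=75.0):
    """True when the daily mean temperature (Fahrenheit) falls in the range.

    Treating both bounds as inclusive is an assumption; the post only says
    "between 55 and 75 degrees".
    """
    return low <= mean_temp_f <= high

def pleasant_days_per_year(daily_means_by_year):
    """Average number of pleasant days across years.

    daily_means_by_year maps a year to a list of daily mean temperatures.
    """
    counts = [
        sum(is_pleasant(t) for t in temps)
        for temps in daily_means_by_year.values()
    ]
    return mean(counts)

# Hypothetical toy readings, four days per year, not real NOAA data.
toy_readings = {
    2012: [50, 60, 70, 80],
    2013: [55, 75, 76, 54],
}
average_pleasant_days = pleasant_days_per_year(toy_readings)
```

Changing `low` and `high` is the quickest way to see how much the result depends on what you call pleasant.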

  • Digitally revamped atlas of historical geography, from 1932

    February 3, 2014  |  Mapping

    Snow cover

    In 1932, Charles O. Paullin and John K. Wright published Atlas of the Historical Geography of the United States, a reference of almost 700 maps about a varied set of topics, such as weather, travel, and population. The Digital Scholarship Lab at the University of Richmond brought the atlas to digital life.

    In this digital edition we've tried to bring—hopefully unobtrusively and respectfully—Paullin and Wright’s maps a bit closer to that ideal. First, with the exception of the historical maps from the cartography section and a handful of others (those that used polar projections, for example), we’ve georeferenced and georectified all of the maps from the atlas so that they can be overlaid consistently within a digital mapping environment. (Georeferencing is a process of linking points on a map to geographic coordinates, and georectification is a process of warping a map using those coordinates to properly align it within a particular projection, here web mercator.) High-quality scans of all of the maps as they appeared on the plates are available too.

    Not only are the maps overlaid on a slippy map, but the lab also added simple interactions with tool tips and animation so you can look more specifically at the data.

    I could spend all day (or several days) looking through this. [Thanks, Lee]
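The quote above mentions aligning maps to web mercator. As a rough illustration of that target projection, this Python sketch converts longitude and latitude to spherical Web Mercator coordinates in meters. It only shows the forward projection, not the georeferencing or warping the lab did, and the function name is my own.

```python
import math

# Sphere radius used by Web Mercator (the WGS84 semi-major axis, in meters).
EARTH_RADIUS_M = 6378137.0

def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to Web Mercator x/y in meters.

    Valid for latitudes strictly between -90 and 90; slippy maps usually
    clip to about +/-85.05 degrees.
    """
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(
        math.tan(math.pi / 4 + math.radians(lat_deg) / 2)
    )
    return x, y

# Richmond, Virginia, using approximate coordinates for illustration.
richmond_x, richmond_y = lonlat_to_web_mercator(-77.44, 37.54)
```

Georeferencing supplies the control points that tie a scanned plate to longitude and latitude. A projection like this one then says where those points should land on the slippy map.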

  • Amount of snow to cancel school

    January 31, 2014  |  Mapping

    Snow day

    Someone ended an email to me last week with "Stay warm." Not to sound like a jerk, but I happened to be answering email outside with my t-shirt on and sweater slung over the chair. I was also half-wondering whether I should change into shorts. Anyway, this map by Alexandr Trubetskoy, or reddit user atrubetskoy, might be of interest to many of you not in California. It shows an estimated amount of snow required to close school for the day, by county.

  • How R came to be

    January 30, 2014  |  Statistics

    Statistician John Chambers, the creator of S and a member of the R core team, talks about how R came to be in the short video below. Warning: Super nerdy waters ahead.

    I've heard this story before, but it was nice to hear it again, since it is about something I use almost every day. I would also like to hear about the invention of the toilet. [via Revolutions]

  • History through the president’s words

    January 30, 2014  |  Infographics

    History through the President’s Words

    The Washington Post visualized the use of specific words throughout the years during State of the Union addresses.

    Since 1900, there have been 116 State of the Union addresses, given by 20 presidents, with some presidents giving two addresses a year. Studying their choice of words, over time, provides glimpses of change in American politics—"communism" fades, "terrorism" increases—and evidence that some things never change ("America" comes up steadily, of course. As does "I.").

    For some reason the interactive won't load for me now (it did yesterday), but there's also a PDF version that you can download. The PDF only goes back to Bush in 1989 though, so try the interactive version first. It was an interesting one. Update: Works again.

    Can you believe it? We made it through an entire SOTU without a single word cloud. Come to think of it, I can't even remember the last time I saw one. I almost feel cheated.

  • Last day to pre-order quotes poster

    January 29, 2014  |  Projects

    Famous movie quotes

    It's been an interesting few days. I thought a few people would find the famous quotes graphic amusing, but I didn't expect so many to share my odd sense of humor. Thanks.

    If you haven't pre-ordered a poster yet, today's the last day to get it at a discounted price.

    Put your order in here.

    I'm going to proof the poster a few more times tonight and then send it to the printers. They should take about a week to get the finished posters to me. From there, I'll be (really) busy signing and rolling.

    I still expect mid-February shipments to you. International shipping takes a little longer of course, depending on where you are.

  • Learn R interactively with the swirl package

    January 29, 2014  |  Software

    R, the statistical computing language of choice and what I use the most, can seem odd to those new to the language or programming. And I think this is what holds a lot of people back and what keeps people stuck in limited software. The swirl package for R helps beginners get over that first hurdle by teaching you within R itself.

    swirl is a software package for the R statistical programming language. Its purpose is to teach users statistics and R simultaneously and interactively. It attempts to do this in the most authentic learning environment possible by guiding users through interactive lessons directly within the R console.

    Assuming you installed R on your computer already, install the package (and the other packages it depends on), make a call to swirl(), and you get a guide through the basics.

  • What a computer sees while watching movies

    January 28, 2014  |  Data Art

    Benjamin Grosser visualized how computers "watch" movies through vision algorithms and artificial intelligence in Computers Watching Movies.

    Computers Watching Movies was computationally produced using software written by the artist. This software uses computer vision algorithms and artificial intelligence routines to give the system some degree of agency, allowing it to decide what it watches and what it does not. Six well-known clips from popular films are used in the work, enabling many viewers to draw upon their own visual memory of a scene when they watch it.

    Above is the bag scene from American Beauty. Contrast this with the more frantic Inception scene, and you get a good idea of how it works. See computer-watching scenes for several more movies here.

  • Members Only
    How to Map Paths in R

    How to Map Geographic Paths in R

    As people and things move through a place, it can be useful to see their connected paths instead of just individual points.
Unless otherwise noted, graphics and words by me are licensed under Creative Commons BY-NC. Contact original authors for everything else.