• Maris Jensen just made SEC filings readable by humans. The motivation:

    But in the twenty years since, despite hundreds of millions invested in rounds of contracted EDGAR modernization efforts and interactive data false starts, the SEC’s EDGAR has remained almost untouched. In 2014, the SEC is quite literally doing less with SEC filings than their predecessors had planned for 1984. Data tagging is the red-headed stepchild of the Commission — out of hundreds of forms, only about a dozen are filed as structured data — and the first program to automate the selection of SEC filings for review, the Division of Economic and Risk Analysis (DERA)’s ‘Robocop’, has been ‘aspirational’ for years. The academics in the division responsible for the SEC’s interactive data initiatives write papers about information asymmetry, using EDGAR data they repurchase in usable form for millions each year, but do nothing to fix it. Companies are chastised for insufficient and inefficient disclosure, while the SEC fails to help retail investors navigate corporate disclosures at all.

    Look up a company and see their financials, ownership, influences, and board members, among other things typically not so straightforward to look up.

  • This is all sorts of neat. Researchers Andrew Adamatzky and Ramon Alonso-Sanz are using a slime mold, P. polycephalum, to find the most efficient road routes and offer guidance on how existing roads might be reworked. P. polycephalum is a single-celled organism that forages for food by extending branches in many directions. When it finds the most efficient path to a food source, it backs away from the others. The video above is a sped-up version of it in action. Adamatzky and Alonso-Sanz put a map underneath.

    We cut agar plates in a shape of Iberian peninsula, place oat flakes at the sites of major urban areas and analyse the foraging network developed. We compare the plasmodial network with principle motorways and also analyse man-made and plasmodium networks in a framework of planar proximity graphs.

    [via infosthetics]

  • Nick Danforth for Al Jazeera delves into the history books for why north is typically on the top of our maps. There’s no single reason for it, but Ptolemy might have had something to do with it.

    The north’s position was ultimately secured by the beginning of the 16th century, thanks to Ptolemy, with another European discovery that, like the New World, others had known about for quite some time. Ptolemy was a Hellenic cartographer from Egypt whose work in the second century A.D. laid out a systematic approach to mapping the world, complete with intersecting lines of longitude and latitude on a half-eaten-doughnut-shaped projection that reflected the curvature of the earth. The cartographers who made the first big, beautiful maps of the entire world, Old and New — men like Gerardus Mercator, Henricus Martellus Germanus and Martin Waldseemuller — were obsessed with Ptolemy. They turned out copies of Ptolemy’s Geography on the newly invented printing press, put his portrait in the corners of their maps and used his writings to fill in places they had never been, even as their own discoveries were revealing the limitations of his work.

    Ptolemy put north on top, although we don’t know why he put it there.

  • Victor Powell, who has visualized the Central Limit Theorem and Simpson’s Paradox, most recently provided a visual explainer for conditional probability.

    Two bars, one blue and one red, represent two events that can happen together or independently of each other. When a ball hits a bar, the corresponding event occurs. What is the probability that one event occurs given that the other does, and vice versa? If the probability of both events occurring rises or falls, how does that change the separate probabilities? Sliders and options let you experiment, and the visual and counters change to help you learn.

    A fun one to tinker with.
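
    To make the idea concrete, here is a minimal sketch in Python. It is not Powell's code. It assumes each bar is an interval on a line from 0 to 1 and that a ball lands at a uniformly random position, so an event "occurs" when the ball lands inside that bar. The function names and the example intervals are made up for illustration.

```python
import random


def overlap(bar_a, bar_b):
    """Length of the intersection of two intervals given as (start, end)."""
    return max(0.0, min(bar_a[1], bar_b[1]) - max(bar_a[0], bar_b[0]))


def conditional_probs(bar_a, bar_b):
    """Exact probabilities, assuming a ball lands uniformly on [0, 1]."""
    p_a = bar_a[1] - bar_a[0]
    p_b = bar_b[1] - bar_b[0]
    if p_a <= 0 or p_b <= 0:
        raise ValueError("each bar needs a positive width")
    p_both = overlap(bar_a, bar_b)
    return {
        "P(A)": p_a,
        "P(B)": p_b,
        "P(A and B)": p_both,
        "P(B|A)": p_both / p_a,  # share of A's hits that also hit B
        "P(A|B)": p_both / p_b,  # share of B's hits that also hit A
    }


def simulate_b_given_a(bar_a, bar_b, balls=100_000, seed=1):
    """Estimate P(B|A) by dropping balls and counting hits."""
    rng = random.Random(seed)
    hits_a = hits_both = 0
    for _ in range(balls):
        x = rng.random()
        in_a = bar_a[0] <= x < bar_a[1]
        in_b = bar_b[0] <= x < bar_b[1]
        hits_a += in_a
        hits_both += in_a and in_b
    return hits_both / hits_a


if __name__ == "__main__":
    blue, red = (0.2, 0.6), (0.4, 0.9)  # hypothetical bar positions
    print(conditional_probs(blue, red))
    print(simulate_b_given_a(blue, red))
```

    Sliding one bar so that it overlaps the other more raises both conditional probabilities, while widening a bar without adding overlap lowers the probability conditioned on that bar.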

  • As most of us know, it’s not easy getting by on minimum wage, and in some places it’s not possible. The New York Times provides a calculator to see how challenging it can be.

    A simple visual on the right shows dollars made per year, one box per dollar colored green initially and then red to signal debt. It’s a good way to make the numbers more relatable. Select a state, enter expenses, and watch dollars disappear, and most likely you’ll end up in the red early.

  • Using Dates and Times in R.

    — Jerzy Wieczorek describes his first semester as a stat PhD student.

    — Apparently there’s a stochastic process in probability theory called the Chinese Restaurant Process, and a closely related Indian Buffet Process. Matt Dickenson made a quick visualization to demonstrate the latter. These require more investigation for their names alone.

    — Stravinsky’s The Rite of Spring visualized.

    — Why didn’t anyone tell me Arduino was so fun and accessible? It’s like LEGOs raised to the nerdeth degree.

    — Source code in TV and film.

  • Nicholas Felton, Drew Breunig, and Friends of the Web released Reporter for iPhone. The app ($3.99 on the App Store) prompts you with short quizzes at scattered times throughout the day, asking things like who you’re with or what you’re doing, to help you collect data about yourself and your surroundings. You can also create your own survey questions to collect data on what interests you, and use your phone’s existing capabilities to record location, sound levels, weather, and photo counts automatically.

  • Kirk Goldsberry talks about the rise of analytics in the NBA. With cameras above every court recording player movements, a more granular analysis is now possible, beyond the box score. One of the key metrics is expected possession value, or EPV, which estimates the number of points a possession is worth, given where everyone is on the court and where the ball is.

    But the clearest application of EPV is quantifying a player’s overall offensive value, taking into account every single action he has performed with the ball over the course of a game, a road trip, or even a season. We can use EPV to collapse thousands of actions into a single value and estimate a player’s true value by asking how many points he adds compared with a hypothetical replacement player, artificially inserted into the exact same basketball situations. This value might be called “EPV-added” or “points added.”

    As a basketball fan, I hope this makes the game more fun and interesting to watch, and as a statistician, I hope this work can be applied to other facets of life like traffic or local movements. If just the latter, that’d be fine too.
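
    The "points added" arithmetic in the quote can be sketched in a few lines of Python. This is a toy illustration, not Goldsberry's model: it assumes some EPV model has already produced, for each action, the expected points after the player acted and the expected points had a replacement player been in the same situation. The `Action` class, its field names, and the numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Action:
    player_epv: float       # expected points after the player's action (assumed model output)
    replacement_epv: float  # expected points had a replacement player acted instead


def points_added(actions):
    """Collapse many actions into one value: total EPV gained over the replacement."""
    return sum(a.player_epv - a.replacement_epv for a in actions)


if __name__ == "__main__":
    # Made-up numbers for three touches in one game.
    game = [Action(1.10, 1.00), Action(0.95, 1.00), Action(1.30, 1.05)]
    print(round(points_added(game), 2))
```

    The hard part is the model that produces the two EPV numbers for every action. The sum itself is simple once those exist.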

  • You can make static maps in R relatively well, if you know what packages to use and what to look for, but there isn’t much direct interaction with your graphics. rMaps is a package that helps you create maps that you can mouse over and zoom in to.

    Don’t get too excited though. A scan of the docs shows that it’s basically a wrapper around the JavaScript libraries Leaflet, DataMaps, and Crosslet, so you could learn those directly instead, and you’d be better for it in the long run if you plan to make more maps. But if you’re just working on a one-off or must stay in R because your life depends on it, rMaps might be an option.

  • Winter Olympic events are filled with subtleties that, if you know about them, can help you appreciate the athletes’ skills and the sports a bit more. The New York Times published three explainer videos to help you do just that. So far, there’s one on slopestyle, which has roots in the Winter X Games, another on the luge, which is freakin’ dangerous, and a third on the halfpipe, from Shaun White’s perspective. The features are a nice combination of video, graphics, and narrative.

    If you’re watching the Olympics, do yourself a favor and bookmark NYT Olympic coverage.

  • Mushon Zer-Aviv offers up examples and guidance on lying with visualization.

    We don’t spread visual lies by presenting false data. That would be lying. We lie by misrepresenting the data to tell the very specific story we’re interested in telling. If this is making you slightly uncomfortable, that’s a good thing, it should. If you’re concerned about adopting this new and scary habit, well, don’t worry, it’s not new. Just open your CV to be reminded you’ve lied with truthful data before. This time however, it will be explicit and visual.

    It comes back to the whole “let the data speak” ideal. Data might have something to say, but the analyst, designer, etc. still has to translate, whether that’s through statistical methods or visualization. Sometimes meaning gets lost when you’re not careful.

  • Remember that TED talk from a couple of years ago on texting patterns to a crisis hotline? The speaker, Nancy Lublin, proposed analyzing these text messages to potentially help the individuals texting. Her group, the Crisis Text Line, plans to release anonymized aggregates in the coming months.

    Ms. Lublin said texts also provided real-time information that showed patterns for people in crisis.

    Crisis Text Line’s data, she said, suggests that children with eating disorders seek help more often Sunday through Tuesday, that self-cutters do not wait until after school to hurt themselves, and that depression is reported three times as much in El Paso as in Chicago.

    This spring, Crisis Text Line intends to make the aggregate data available to the public. “My dream,” Ms. Lublin said, “is that public health officials will use this data and tailor public policy solutions around it.”

    Keeping an eye on this.

  • In case you’re wondering how to travel the country without a car (in a way other than running), this map from the American Intercity Bus Riders Association [pdf] shows you all the bus and Amtrak routes that span the United States. Keep in mind that these buses and trains don’t run 24/7, so plan accordingly.

  • The New York Times published a fun piece that places Winter Olympic events in the city. Events include the luge in Times Square, ski jump in Bryant Park, and speed skating down Broadway.

    The Winter Olympics sometimes gets flak for being the thing in between the more popular Summer Olympics, but I think it has a lot to do with scale and perception of the events. People know how fast they run, but don’t always get how steep the mountains are. I used to go downhill skiing, and from a distance the hills didn’t look especially daunting, but when I stood at the top of the black diamond, it looked pretty scary.

  • There are many exercise apps that allow you to keep track of your…

  • For those who ordered a famous quotes poster: I’ll be updating the printing and shipping status on this page.

    I sent the poster to the printers on Friday, approved the digital proof yesterday, and the posters might be printing as I’m writing this. I still expect to start mailing to you in the middle of this month, assuming posters and (a lot of) shipping supplies are in my hands as scheduled. Thanks!

    For those who did not order a poster but still want one: It’s not too late.

  • We’ve seen plenty of maps the past few weeks that show how bad the weather is just about everywhere but California. Kelly Norton looked at it from the other direction and estimated how many pleasant days per year areas of the US get, based on historical NOAA data.

    I decided to take a stab at what constitutes a “pleasant” day and then aggregate NOAA data for the last 23 years to figure out the regions of the United States with the most (and least) pleasant days in a typical year. The results, I think, are not that surprising and pretty much affirm the answer given off the cuff by many of my west coast friends when asked about the best places, “Southern California?” For the areas with the least pleasant days, I admit I would have guessed North Dakota. However, it’s much of Montana that gets an average of a couple of weeks of pleasantness each year.

    Of course the map changes (mainly the geographic range) depending on the definition of a “pleasant” day. In this case it’s defined as one where the mean temperature is between 55 and 75 degrees.
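
    The counting behind a map like this is simple enough to sketch. The Python below is an illustration, not Norton's code. It uses only the definition stated above (a daily mean temperature between 55 and 75 degrees) and assumes that the bounds are inclusive, that temperatures are in Fahrenheit, and that the daily means have already been pulled from NOAA data into a dictionary keyed by year. The function names and sample numbers are made up.

```python
def is_pleasant(mean_temp_f, low=55.0, high=75.0):
    """True when the daily mean falls inside the pleasant range (inclusive bounds assumed)."""
    return low <= mean_temp_f <= high


def pleasant_days_per_year(daily_means_by_year):
    """Average number of pleasant days per year for one location.

    daily_means_by_year maps a year to a list of daily mean temperatures.
    """
    counts = [
        sum(is_pleasant(temp) for temp in temps)
        for temps in daily_means_by_year.values()
    ]
    return sum(counts) / len(counts)


if __name__ == "__main__":
    # Tiny made-up sample: four days in each of two years.
    sample = {2012: [50, 60, 70, 80], 2013: [55, 75, 76, 40]}
    print(pleasant_days_per_year(sample))
```

    Changing `low` and `high` is the code equivalent of changing the definition, which is what shifts the geographic range on the map.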

  • In 1932, Charles O. Paullin and John K. Wright published Atlas of the Historical Geography of the United States, a reference of almost 700 maps about a varied set of topics, such as weather, travel, and population. The Digital Scholarship Lab at the University of Richmond brought the atlas to digital life.

    In this digital edition we’ve tried to bring—hopefully unobtrusively and respectfully—Paullin and Wright’s maps a bit closer to that ideal. First, with the exception of the historical maps from the cartography section and a handful of others (those that used polar projections, for example), we’ve georeferenced and georectified all of the maps from the atlas so that they can be overlaid consistently within a digital mapping environment. (Georeferencing is a process of linking points on a map to geographic coordinates, and georectification is a process of warping a map using those coordinates to properly align it within a particular projection, here web mercator.) High-quality scans of all of the maps as they appeared on the plates are available too.

    Not only are the maps overlaid on a slippy map, but the lab also added simple interactions with tooltips and animation so you can look more specifically at the data.

    I could spend all day (or several days) looking through this. [Thanks, Lee]
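
    For a sense of what the web mercator part involves, here is a small Python sketch of the standard spherical forward projection from longitude and latitude to web mercator meters. It covers only the projection step. It is not the lab's georectification workflow, which also involves warping scanned images to control points. The function name is my own.

```python
import math

# Sphere radius used by web mercator (EPSG:3857), equal to the WGS 84 semi-major axis.
EARTH_RADIUS_M = 6378137.0


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude in degrees to web mercator x/y in meters."""
    if not -90.0 < lat_deg < 90.0:
        raise ValueError("latitude must be strictly between -90 and 90 degrees")
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


if __name__ == "__main__":
    # A point at 45 degrees north on the prime meridian.
    print(lonlat_to_web_mercator(0.0, 45.0))
```

    The logarithm stretches distances more and more toward the poles, which is one reason the polar-projection maps mentioned in the quote were left out of the overlay.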

  • Someone ended an email to me last week with “Stay warm.” Not to sound like a jerk, but I happened to be answering email outside in a T-shirt, with my sweater slung over the chair. I was also half-wondering whether I should change into shorts. Anyway, this map by Alexandr Trubetskoy, or reddit user atrubetskoy, might be of interest to many of you not in California. It shows an estimated amount of snow required to close school for the day, by county.

  • Statistician John Chambers, the creator of S and a member of the R Core Team, talks about how R came to be in the short video below. Warning: Super nerdy waters ahead.

    I’ve heard this story before, but it was nice to hear it again, since it is about something I use almost every day. I would also like to hear about the invention of the toilet. [via Revolutions]