  • PlotDevice: Draw with Python

    September 2, 2014  |  Software

    PlotDevice

    You've been able to visualize data with Python for a while, but the Mac application PlotDevice, from Christian Swinehart, couples code and graphics more tightly. Write code on the right. Watch graphics change on the left.

    The application gives you everything you need to start writing programs that draw to a virtual canvas. It features a text editor with syntax highlighting and tab completion plus a zoomable graphics viewer and a variety of export options.

    PlotDevice's simple but comprehensive set of graphics commands will be familiar to users of similar graphics tools like NodeBox or Processing. And if you're new to programming, you'll find there's nothing better than being able to see the results of your code as you learn to think like a computer.

    Looks promising, although when I downloaded it and tried to run it, nothing happened. I'm guessing there are still compatibility issues to iron out at version 0.9.4. Hopefully that clears up soon. [via Waxy]

  • CSV Fingerprint: Spot errors in your data at a glance

    August 14, 2014  |  Online Applications

    CSV Fingerprint

    You get your CSV file, snuggle under your blanket with a glass of fine wine, all ready for the perfect Saturday night. Then — what the heck — there's a bunch of missing data and poorly formatted entries. Don't let this happen to you. CSV Fingerprint by Victor Powell provides a simple, wideout view of your CSV file, color-coded for quick quality control.

    To make it easier to spot mistakes, I've made a "CSV Fingerprint" viewer (named after the "Fashion Fingerprints" from The New York Times's "Front Row to Fashion Week" interactive). The idea is to provide a birdseye view of the file without too much distracting detail. The idea is similar to Tufte's Image Quilts...a qualitative view, as opposed to a rendering of the data in the file themselves. In this sense, the CSV Fingerprint is a sort of meta visualization.

    Try it with your own CSV data. Never let a subpar CSV file ruin your Saturday night again.
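    The color-coding idea is easy to approximate in a few lines. Below is a minimal sketch, not Powell's implementation: it labels each cell as empty, number, or text and prints one symbol per cell. The function names, the three categories, and the sample data are all assumptions made up for illustration.

```python
import csv
import io

def classify_cell(value: str) -> str:
    """Label a single CSV cell as 'empty', 'number', or 'text'."""
    stripped = value.strip()
    if stripped == "":
        return "empty"
    try:
        # Caveat: float() also accepts strings like "nan" and "inf".
        float(stripped)
        return "number"
    except ValueError:
        return "text"

def fingerprint(csv_text: str) -> list[list[str]]:
    """Return one list of labels per CSV row."""
    reader = csv.reader(io.StringIO(csv_text))
    return [[classify_cell(cell) for cell in row] for row in reader]

# One character per cell stands in for the color coding.
SYMBOLS = {"empty": ".", "number": "#", "text": "a"}

def render(labels: list[list[str]]) -> str:
    """Draw the labels as a compact text grid, one line per row."""
    return "\n".join("".join(SYMBOLS[label] for label in row) for row in labels)

# Hypothetical sample with a missing value and a badly formatted entry.
sample = "name,age,city\nAda,36,London\nBob,,Paris\nCy,n/a,\n"
print(render(fingerprint(sample)))
# aaa
# a#a
# a.a
# aa.
```

    In the age column, the "." and the stray "a" stand out against the "#", which is the whole point of the wide view.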

  • Vector maps on the web with Mapbox GL

    August 12, 2014  |  Software

    Mapbox GL

    Online mapping just got an upgrade:

    Announcing Mapbox GL JS — a fast and powerful new system for web maps. Mapbox GL JS is a client-side renderer, so it uses JavaScript and WebGL to dynamically draw data with the speed and smoothness of a video game. Instead of fixing styles and zoom levels at the server level, Mapbox GL puts power in JavaScript, allowing for dynamic styling and freeform interactivity.

    For the non-developers: Online maps are typically stored pre-made on a server, in the form of a bunch of image files that are stitched together when you zoom in and out of a map. So developers have to periodically update the image files if they want their base maps to change. It's a hassle, which is why base maps often look similar. With Mapbox GL, making changes is easier because the development pipeline is shorter.

    More details on the JavaScript library here.
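    For a sense of what "a bunch of image files" means, here is a rough sketch of the tile addressing used by many pre-rendered web maps. It assumes the common Web Mercator XYZ scheme, which the announcement does not spell out, and the function name is made up for illustration.

```python
import math

def tile_for(lat_deg: float, lon_deg: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) index of the tile containing a point.

    Assumes the common Web Mercator XYZ scheme, where zoom level z
    splits the world into 2**z by 2**z square tiles. Latitudes beyond
    about +/-85.05 degrees fall outside the projection and are clamped.
    """
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat_deg)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so edge cases (lon = 180, extreme latitudes) stay in range.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y

def tile_count(zoom: int) -> int:
    """Number of image files needed to cover the world at one zoom level."""
    return 4 ** zoom

for zoom in (0, 5, 10, 15):
    print(zoom, tile_count(zoom))
```

    The tile count quadruples with every zoom level, which gives a feel for why re-rendering a raster base map after a style change is a chore.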

  • Accessible Web visuals and code with p5.js

    August 7, 2014  |  Coding

    p5 JavaScript library

    Visualization on the Web can be tricky for those unfamiliar with code. The new JavaScript library p5.js, developed by Lauren McCarthy and collaborators, aims to make your first steps easier and less painful.

    Using the original metaphor of a software sketchbook, p5.js has a full set of drawing functionality. However, you're not limited to your drawing canvas, you can think of your whole browser page as your sketch! For this, p5.js has addon libraries that make it easy to interact with other HTML5 objects, including text, input, video, webcam, and sound.

    The library follows some of the same philosophy as Processing — that is, straightforward to get up and running — and reimagines the implementation and approach for recent web technology. Even if you're not into programming, it's worth visiting if just to watch, listen, and interact with Dan Shiffman as he enthusiastically talks about the library.

  • Mirador: A tool to help you find correlations in complex datasets

    June 25, 2014  |  Software

    Mirador

    Mirador, a collaborative effort led by Andrés Colubri from Fathom Information Design, is a tool that helps you find correlative patterns in datasets with a lot of variables and observations. It's in the early stages of development, but is available to use and test on Windows and Mac. Colubri explains the process, from its early stages to its current iteration.

    Although fields like Machine Learning and Bayesian Statistics have grown enormously in the past decades and offer techniques that allow the computer to infer predictive models from data, these techniques require careful calibration and overall supervision from the expert users who run these learning and inference algorithms. A key consideration is what variables to include in the inference process, since too few variables might result in a highly-biased model, while too many of them would lead to overfitting and large variance on new data (what is called the bias-variance dilemma).

    Leaving aside model building, an exploratory overview of the correlations in a dataset is also important in situations where one needs to quickly survey association patterns in order to understand ongoing processes, for example, the spread of an infectious disease or the relationship between individual behaviors and health indicators.

    Download Mirador to try it for yourself.
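    To make "an exploratory overview of the correlations" concrete, here is a small sketch that ranks every pair of variables by the strength of their Pearson correlation. Pearson is a stand-in chosen for simplicity; the post does not say which association measure Mirador uses. The variable names and numbers are invented.

```python
import math
from itertools import combinations

def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length lists (NaN if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

def rank_pairs(columns: dict[str, list[float]]) -> list[tuple[str, str, float]]:
    """Return all variable pairs, strongest absolute correlation first."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if not math.isnan(r):
            pairs.append((a, b, r))
    return sorted(pairs, key=lambda pair: abs(pair[2]), reverse=True)

# Made-up data: five observations of three variables.
data = {
    "hours_exercise": [1, 2, 3, 4, 5],
    "resting_hr": [80, 76, 73, 69, 66],
    "shoe_size": [9, 7, 10, 8, 9],
}

for a, b, r in rank_pairs(data):
    print(f"{a} vs {b}: r = {r:+.2f}")
```

    With dozens or hundreds of variables the number of pairs grows quickly, which is where a dedicated tool earns its keep.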

  • R meme generator

    June 17, 2014  |  Software

    Nobody asked for it, so you got it. The meme package for R by Thomas Leeper lets you create the web's most popular memes in a line of code. Enjoy.

    R all the things

  • Beaker allows data exploration in various languages

    May 20, 2014  |  Software

    Beaker Notebook

    Currently in beta, Beaker lets you work and experiment with data in different languages, all in one environment.

    Beaker is a code notebook that allows you to analyze, visualize, and document data using multiple programming languages including Python, R, Groovy, Julia, and Node. Beaker's plugin-based polyglot architecture enables you to seamlessly switch between languages and add support for new languages.

    Sounds like a good place to tuck away your snippets or development in the early stages of larger projects.

  • Responsive data tables

    May 13, 2014  |  Coding

    responsive table

    Alyson Hurt for NPR Visuals describes how they make responsive data tables for their articles. A table might look fine on a desktop but be illegible on a mobile device. This is a start in making tables that work in more places.

  • Optimizing your R code

    May 9, 2014  |  Coding

    Hadley Wickham offers a detailed, practical guide to finding and removing the major bottlenecks in your R code.

    It's easy to get caught up in trying to remove all bottlenecks. Don't! Your time is valuable and is better spent analysing your data, not eliminating possible inefficiencies in your code. Be pragmatic: don't spend hours of your time to save seconds of computer time. To enforce this advice, you should set a goal time for your code and only optimise up to that goal. This means you will not eliminate all bottlenecks. Some you will not get to because you've met your goal. Others you may need to pass over and accept either because there is no quick and easy solution or because the code is already well-optimized and no significant improvement is possible. Accept these possibilities and move on to the next candidate.

    This is how I approach it. Some people spend a lot of time optimizing, but I'm usually better off writing code without speed in mind initially. Then I deal with it if it's actually a problem. I can't remember the last time that happened though. Obviously, this approach won't work in all settings. So just use common sense. If it takes you longer to optimize than it does to run your "slow" code, you've got your answer.
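    The "set a goal time" advice translates to any language. Wickham's guide is about R; the sketch below is a Python stand-in, with made-up function names and an arbitrary one-second goal, that times a function and only flags it when it misses the goal.

```python
import time

def time_call(func, *args, repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best

def worth_optimizing(elapsed: float, goal_seconds: float) -> bool:
    """Only bother optimizing when the code misses its goal time."""
    return elapsed > goal_seconds

def slow_sum_of_squares(n: int) -> int:
    """Deliberately naive loop, standing in for 'slow' analysis code."""
    total = 0
    for i in range(n):
        total += i * i
    return total

GOAL_SECONDS = 1.0  # arbitrary goal picked for this example

elapsed = time_call(slow_sum_of_squares, 100_000)
if worth_optimizing(elapsed, GOAL_SECONDS):
    print(f"{elapsed:.4f}s misses the goal; look for the bottleneck")
else:
    print(f"{elapsed:.4f}s meets the goal; move on")
```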

  • Create a barebones R package from scratch

    May 6, 2014  |  Coding

    While we're on an R kick, Hilary Parker described how to create an R package from scratch, not just to share code with others but to save yourself some time on future projects. It's not as hard as it seems.

    This tutorial is not about making a beautiful, perfect R package. This tutorial is about creating a bare-minimum R package so that you don’t have to keep thinking to yourself, "I really should just make an R package with these functions so I don't have to keep copy/pasting them like a goddamn luddite." Seriously, it doesn't have to be about sharing your code (although that is an added benefit!). It is about saving yourself time. (n.b. this is my attitude about all reproducibility.)

    I need to do this. I've been meaning to wrap everything up for a while now, but it seemed like such a chore. Sometimes I even go back to my own tutorials for copy and paste action. Now I know better. And that's half the battle.

  • R for cats and cat lovers

    May 6, 2014  |  Coding

    Programmer cat

    Following the lead of JavaScript for Cats by Maxwell Ogden, Scott Chamberlain and Carson Sievert wrote R for Cats. It's a playful introduction to R intended for those who have little to no programming experience.

    The bulk of it so far is a primer on data structures, and there's a little bit on functions and some dos and don'ts. It's stuff you should know before you get into more advanced tutorials.

    Mainly though: ooo look, kitty.

    Once you're done with that (it only takes about 30 minutes), there are lots of other resources for getting started with R.

  • Combatting the Obsession with New Tools

    April 29, 2014  |  Software

    Michal Migurski thinks about finding the right job for the tool rather than the other way around:

    Near the second half of most nerd debates, your likelihood of hearing the phrase "pick the right tool for the job" approaches 100% (cf. frameworks, rails, more rails, node, drupal, jquery, rails again). "Right tool for the job" is a conversation killer, because no shit. You shouldn't be using the wrong tool. And yet, working in code is working in language (naming things is the second hard problem) so it's equally in-bounds to debate the choice of job for the tool. "Right tool" assumes that the Job is a constant and the Tool is a variable, but this is an arbitrary choice and notably contradicted by our own research into the motivations of idealistic geeks.

    Along the same lines, Frank Chimero on not trying any new tools for the year and how each represents someone's perspective:

    Everything that's made has a bias, but simple implements—a hammer, a lever, a text editor—assume little and ask less. The tool doesn't force the hand. But digital tools for information work are spookier. The tools can force the mind, since they have an ideological perspective baked into them. To best use the tool, you must think like the people who made it. This situation, at its best, is called learning. But more often than not, with my tools, it feels like the tail wagging the dog.

    These approaches apply well to analysis and visualization. In the early going especially, there tends to be an obsession with what tools to use. Which is best? Which is fastest? Which can handle the most data? Which makes everything beautiful? And yeah, it's good to give these some thought in the beginning, but don't get stuck asking so many questions or pondering so many scenarios that you never settle down and do actual work.

    There's always going to be a new application that promises to help you do something with your data. Work on this stuff long enough and you'll find that you probably won't need that new thing.

  • Learn regular expressions with RegExr

    April 29, 2014  |  Online Applications

    RegExr

    Learning regular expressions tends to involve a lot of trial and error and can be confusing for newcomers. RegExr is an online tool that lets you learn more interactively. Add a body of text in one area and type various regular expressions in another. Matches are highlighted and errors are noted on the fly, which is kind of perfect. Even if you aren't new to regular expressions, this is worth bookmarking for later.
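    The same try-a-pattern, see-the-matches loop can be roughed out with Python's built-in re module. This is a sketch of the workflow rather than anything RegExr does internally; the helper name and sample text are made up.

```python
import re

def find_matches(pattern: str, text: str):
    """Return (matches, error): match tuples, or an error message if the pattern is invalid."""
    try:
        compiled = re.compile(pattern)
    except re.error as exc:
        # A broken pattern is reported instead of raising.
        return [], f"invalid pattern: {exc}"
    matches = [(m.start(), m.end(), m.group()) for m in compiled.finditer(text)]
    return matches, None

text = "Posted April 29, 2014 and updated May 6, 2014."

# A valid pattern: four-digit numbers on word boundaries.
matches, error = find_matches(r"\b\d{4}\b", text)
for start, end, value in matches:
    print(f"{value!r} at {start}-{end}")

# An invalid pattern: the group is never closed.
matches, error = find_matches(r"(\d+", text)
print(error)
```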

  • PourOver allows filtering of large datasets in your browser

    April 24, 2014  |  Software

    The New York Times released PourOver, a library that lets you do database-like things client-side, so that (1) you, the developer, can worry less about database optimization and server loads and (2) users get a more responsive, faster experience.

    PourOver is built around the ideal of simple queries that can be arbitrarily composed with each other, without having to recalculate their results. You can union, intersect, and difference queries. PourOver will remember how your queries were constructed and can smartly update them when items are added or modified. You also get useful features like collections that buffer their information periodically, views that page and cache, fast sorting, and much, much more.

    Also: How great is it that The New York Times is now getting into the habit of releasing code?
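    The composability described in the quote is set algebra. PourOver is a JavaScript library, so the sketch below is not its API. It uses plain Python sets and invented records to show how union, intersect, and difference combine query results without re-running the underlying filters.

```python
# Hypothetical records; the field names are made up for illustration.
items = [
    {"id": 1, "section": "maps", "year": 2013},
    {"id": 2, "section": "maps", "year": 2014},
    {"id": 3, "section": "charts", "year": 2014},
    {"id": 4, "section": "charts", "year": 2013},
]

def query(records, predicate) -> frozenset:
    """Run a filter once and keep only the ids of matching records."""
    return frozenset(r["id"] for r in records if predicate(r))

# Each base query scans the data once...
maps = query(items, lambda r: r["section"] == "maps")
recent = query(items, lambda r: r["year"] == 2014)

# ...and compositions reuse the cached id sets instead of rescanning.
maps_or_recent = maps | recent   # union
recent_maps = maps & recent      # intersection
older_maps = maps - recent       # difference

print(sorted(maps_or_recent))  # [1, 2, 3]
print(sorted(recent_maps))     # [2]
print(sorted(older_maps))      # [1]
```

    What this toy version leaves out is the harder part the quote mentions: keeping those cached results up to date when items are added or modified.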

  • Extract CSV data from PDF files with Tabula

    April 8, 2014  |  Software

    Tabula

    Tabula, by Manuel Aristarán, came out months ago, but I've been poking at government data recently and came back to this useful piece of free software to get the data tables out of countless free-floating PDF files.

    If you've ever tried to do anything with data provided to you in PDFs, you know how painful this is — you can't easily copy-and-paste rows of data out of PDF files. Tabula allows you to extract that data in CSV format, through a simple interface.

    It's not the fastest software in the world, but it really is simple to use and it sure beats manual entry. You just load a PDF file into Tabula, which runs on your computer, highlight the table to extract, and the program does the rest. Save as a CSV and do what you want with it.

    Download Tabula here. Find out a little more about it on Source.

  • Interactive maps with R

    February 11, 2014  |  Software

    Interactive maps with R

    You can make static maps in R relatively well, if you know what packages to use and what to look for, but there isn't much direct interaction with your graphics. rMaps is a package that helps you create maps that you can mouse over and zoom in to.

    Don't get too excited though. A scan of the docs shows that it's basically a wrapper around the JavaScript libraries Leaflet, DataMaps and Crosslet, so you could learn those directly instead, and you'd be better for it in the long run if you plan to make more maps. But if you're just working on a one-off or must stay in R because your life depends on it, rMaps might be an option.

  • Learn R interactively with the swirl package

    January 29, 2014  |  Software

    R, the statistical computing language of choice and what I use the most, can seem odd to those new to the language or programming. And I think this is what holds a lot of people back and what keeps people stuck in limited software. The swirl package for R helps beginners get over that first hurdle by teaching you within R itself.

    swirl is a software package for the R statistical programming language. Its purpose is to teach users statistics and R simultaneously and interactively. It attempts to do this in the most authentic learning environment possible by guiding users through interactive lessons directly within the R console.

    Assuming you installed R on your computer already, install the package (and the other packages it depends on), make a call to swirl(), and you get a guide through the basics.

  • Introducing R to a non-programmer, in an hour

    January 7, 2014  |  Coding

    Biostatistics PhD candidate Alyssa Frazee was tasked with teaching her sister, an undergraduate in sociology, how to use R. She had only one hour.

    Once you load in a dataset, things start to get fun. We learned a whole bunch of stuff from this data frame, like how to do basic tabulations and calculate summary statistics, how to figure out if you have missing data, and how to fit a simple linear model. This part was pretty fun because my sister started leading the session: instead of me saying "I'm going to show you how to do this," it was her asking "Hey, could we make a scatterplot?" or "Do you think we could put the best-fit line on that plot?" I was really glad this happened — I hope it meant she was engaged and enjoying herself!

    This is the nice thing about R. There are so many built-in functions and packages that you can get something useful with a few lines of code, and you don't really even have to know what a function is to get started (although you should eventually). Then you can go as far down the rabbit hole as you want.

  • Bokeh, a Python library for interactive visualization

    November 22, 2013  |  Software

    Bokeh

    Bokeh, a Python library by Continuum Analytics, helps you visualize your data on the web.

    Bokeh is a Python interactive visualization library for large datasets that natively uses the latest web technologies. Its goal is to provide elegant, concise construction of novel graphics in the style of Protovis/D3, while delivering high-performance interactivity over large data to thin clients.

    If you're new to this stuff, you might just want to start with D3.js simply to avoid the Python setup, but if you use Python exclusively already, this might fit well in your workflow.

  • Databases for lazy people, a Python library

    November 15, 2013  |  Software

    Friedrich Lindenberg and Gregor Aisch recently released dataset, a Python library to take the grunt work out of using databases in Python.

    Although managing data in relational databases has plenty of benefits, they’re rarely used in day-to-day work with small to medium scale datasets. But why is that? Why do we see an awful lot of data stored in static files in CSV or JSON format, even though they are hard to query and update incrementally?

    The answer is that programmers are lazy, and thus they tend to prefer the easiest solution they find. And in Python, a database isn't the simplest solution for storing a bunch of structured data. This is what dataset is going to change!

    So many times I start with a dataset, try to avoid the busy work in creating a database for a smallish project, and eventually dig up an old script or the most recent version of it. Saving this one for later.
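    For a sense of the busy work in question, here is roughly what storing and querying a few records looks like with only the standard library's sqlite3 module. This deliberately does not use the dataset library; it sketches the schema-first steps such a library aims to hide. The table name, columns, and rows are made up.

```python
import sqlite3

# Hypothetical records, the kind that usually end up in a CSV or JSON file.
people = [
    {"name": "Ada", "age": 36},
    {"name": "Bob", "age": 25},
    {"name": "Cy", "age": 41},
]

def build_database(rows) -> sqlite3.Connection:
    """Create an in-memory database, declare the schema by hand, and load the rows."""
    conn = sqlite3.connect(":memory:")
    # The schema has to be spelled out before anything can be stored.
    conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
    # Named placeholders (:name, :age) are filled from each dict.
    conn.executemany(
        "INSERT INTO people (name, age) VALUES (:name, :age)", rows
    )
    conn.commit()
    return conn

def names_older_than(conn: sqlite3.Connection, min_age: int) -> list[str]:
    """Query with a bound parameter and unpack the single-column rows."""
    cursor = conn.execute(
        "SELECT name FROM people WHERE age > ? ORDER BY name", (min_age,)
    )
    return [row[0] for row in cursor.fetchall()]

conn = build_database(people)
print(names_older_than(conn, 30))  # ['Ada', 'Cy']
```

    None of it is hard, but it is enough ceremony that a flat file often wins for small projects.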

Copyright © 2007-2014 FlowingData. All rights reserved. Hosted by Linode.