Optimizing your R code

Posted to Coding  |  Tags: ,  |  Nathan Yau

Hadley Wickham offers a detailed, practical guide to finding and removing the major bottlenecks in your R code.

It’s easy to get caught up in trying to remove all bottlenecks. Don’t! Your time is valuable and is better spent analysing your data, not eliminating possible inefficiencies in your code. Be pragmatic: don’t spend hours of your time to save seconds of computer time. To enforce this advice, you should set a goal time for your code and only optimise only up to that goal. This means you will not eliminate all bottlenecks. Some you will not get to because you’ve met your goal. Others you may need to pass over and accept either because there is no quick and easy solution or because the code is already well-optimized and no significant improvement is possible. Accept these possibilities and move on to the next candidate.

This is how I approach it. Some people spend a lot of time optimizing, but I’m usually better off writing code without speed in mind initially. Then I deal with it if it’s actually a problem. I can’t remember the last time that happened though. Obviously, this approach won’t work in all settings. So just use common sense. If it takes you longer to optimize than it does to run your “slow” code, you’ve got your answer.


The Best Data Visualization Projects of 2014

It’s always tough to pick my favorite visualization projects. Nevertheless, I gave it a go.

A Day in the Life of Americans

I wanted to see how daily patterns emerge at the individual level and how a person’s entire day plays out. So I simulated 1,000 of them.

Reviving the Statistical Atlas of the United States with New Data

Due to budget cuts, there is no plan for an updated atlas. So I recreated the original 1870 Atlas using today’s publicly available data.

Causes of Death

There are many ways to die. Cancer. Infection. Mental. External. This is how different groups of people died over the past 10 years, visualized by age.