# Challenge: Graphing obesity trends

Apr 29, 2010

Here we have a case of worthwhile data and an interesting story about obesity trends. People are getting heavier younger. The graph was made to show this; however, it’s hard to understand and kind of makes things more confusing. Can you redesign the above, using the same data, to tell the story more clearly?

## The Data

This is what we know about the data. It comes from a study that took place between 1971 and 2006. We have obesity rates, separated by when study participants were born. So for example, in the above graph, the orange line shows the obesity trend for people who were born between 1956 and 1965. When that group was between 30 and 39 years old, about 27% of them were obese.

It’s a little confusing at first, but let it simmer for a little. It’s actually not too bad.

## The Question

Okay, now the question: are people getting fatter faster? The original graph suggests that yes, people are, but the story isn’t as clear as it could be. Plus, it took 19 PowerPoint slides to tell it. Your job is to put it all in one graphic. Are you up for it? I think so. Leave your suggestions and links to remakes in the comments below.

One more time – here’s the data [csv], and you can find more info about the study here.

[via FD forums]

• My solution http://www.visualisingdata.com/index.php/2010/04/obesity-prevalence-graph-makeover/ (with acknowledgement to Bill for his similar and faster creation of his solution!)

• Much better than mine, I’m at work so I didn’t have much time to play with it. A heatmap is definitely the way to go I think but the problem with mine, and to a certain extent, yours, is that they sorta imply that where there is no data the rate is 0%, just due to the white background tying in with the bottom end of the colour scale. Not sure how to fix this without a background colour, but putting the numerical values in, as you’ve done, certainly helps.

• PS, never thought of doing a heatmap in Excel using conditional formatting, worked out nicely. I did mine in R.

• Thanks for your response Bill. Nice one for achieving that graph using R, it’s something I’ve only briefly played with (as far as playing goes) but clearly it’s got a great deal of potential. Your comment about the gaps where there are no data points is completely right. Also, as someone has commented on my site, there is the issue of a visual staircase created which can lead to a pattern perhaps being amplified simply by the absence of data readings for those year-of-birth groups. I’ve not had time to think about a satisfactory solution to either of these, I’m sure the FlowingData community will arrive at something though!

• Nice job, guys. The heatmap didn’t cross my mind when I was thinking about this, but it works, with the exception of that staircase dilemma.

• I transposed the data so that the cohorts are on the X axis and each separate line represents an age group. So each line shows the percentage of obese people in a particular age group. This way the graph tells you the probability of you being obese at a given age in a particular decade.

I made it in R (the details are here – http://www.prettygraph.com/blog/response-to-flowingdata-challenge-graphing-obesity-trends/).

What do you guys think?

I like the heatmaps but a better colour scheme might make them easier to read.
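For readers who want to try the transposition described above, here is a minimal sketch. It is written in Python rather than the R used for the linked graph, and it is not the commenter's code. The row layout and the function name `series_by_age_group` are assumptions, and apart from the 27% figure quoted in the post, the rates are placeholders rather than values from the study's CSV.

```python
from collections import defaultdict

# Hypothetical long-format rows: (birth cohort, age group, % obese).
# Only the 27% value comes from the post; the rest are placeholders,
# NOT values from the study.
rows = [
    ("1946-1955", "20-29", 10.0),
    ("1946-1955", "30-39", 18.0),
    ("1956-1965", "20-29", 14.0),
    ("1956-1965", "30-39", 27.0),
    ("1966-1975", "20-29", 20.0),
]

def series_by_age_group(rows):
    """Regroup rows so each age group is one line, with cohorts along the x axis."""
    series = defaultdict(list)
    # Sorting the tuples orders points by cohort within each age group.
    for cohort, age_group, pct in sorted(rows):
        series[age_group].append((cohort, pct))
    return dict(series)

lines = series_by_age_group(rows)
for age_group, points in lines.items():
    print(age_group, points)
```

Cohort and age combinations with no reading are simply absent from a line here, rather than stored as 0%, which sidesteps the missing-data issue raised earlier in the thread.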

• I like it. Does this show that obesity rates peaked and may be starting to drop off?

With regard to heatmap colours, I always struggle a bit with this in R. There are a very limited number of preset colour vectors to use. Also, does anyone know how to invert the colour vectors? I had to invert the data values instead to get the effect I wanted.

• I think we’d need more data going forward to tell if the obesity rates are peaking off.

You could use the rev() function to invert the colour vectors. For example, rev(heat.colors(10)).

• Thanks Hrishi, just what I needed.

• I like that yours shows that people are getting fatter younger (pick an age group and follow as its line slopes upwards). It is still a little hard to make sense of but I think it portrays this idea better than the heat map which I don’t find super helpful (might as well just give me a table).

• Patrick

Dammit, that’s what I did, about 9 hours too late. Guess I should read the comments first.

Anyway,

http://dl.dropbox.com/u/580244/Obesity.png

So, “great minds…” and such, right?

• Just tinkering with this, and if the focus is *childhood* obesity, just ignoring a bunch of the data makes a reasonable point too:

It’s a less “complete” picture, but might make a more dramatic point?

Still kind of interesting because that last point is a little incomplete, in that there are no 9 year olds who were born in 2005 (yet).

• Also, there is no data for % 9yr olds who are obese for 1926 – 1955. Your graph shows it as 0%. This “jump” from 0% to 6% makes the figures look far more significant than they would if the baseline was at 6%.

• Was wondering about that – but the data is missing there, which I assumed to mean 0 – but that of course may not be correct.

Nonetheless, the biggest problem with this data is the inconsistency. Multiple discontinuous lines are confusing, and bars/heat maps have the step problem. If I were telling a story, I’d pull out the data that best supported my story (making sure it didn’t misrepresent the fuller set!) and go from there. Stories need to be simple…! The best example of this is that everybody chooses to plot that 1996-2005 point which has only one data point when displayed in the broader context. It’s odd and distracting.

But maybe I’m too simple a mind..!

Cool exercise, as always, Nathan.

• Agreed PR, perhaps, rather than one graph, this would be a good candidate for a little video? If I was a rich man I’d have a copy of After Effects and would try this. I’m not a rich man.

• Agreed. To me, that makes the strongest point.

• Hm, no time/tools here to try myself, but I’d suggest one thing: I think the key is that the x-axis should be percentage, and the y-axis should be decade of birth. Then perhaps the age comes in as a height, or contoured in? I’m not sure exactly.

What I do know, however, is that sketching on paper with these two axes and writing the age group first number (eg “40” for 40-49) as a scatter plot is interesting and compelling to me even in its very crude state. Just seeing all the ages shoot up to higher percentages at later dates is unsettling.

• In all fairness to the original, it was shown one line at a time in a presentation with explanations along the way. It was not meant to be shown as a single graph with eight lines. As Nathan points out, it is confusing in this form. One reason it is confusing is that it is not labeled properly without the commentary. Adding “Year of birth” to the legend and “Age group” to the x axis would help quite a bit. Also, without the commentary of the original presentation, the multicolored lines don’t work. It would be easier to see that the lines moving from right to left were in decreasing order of birth dates if a sequential color scheme (http://www.personal.psu.edu/cab38/ColorSch/SchHTMLs/CBColorSeq.html) were used. This comment applies to Hrishi’s figure as well to show that the age groups are in order without having to go back and forth from the legend.

Andy’s figure would be less cluttered without a % sign in every cell. He’s already told us that the data is in percent in the title; I don’t need to be told 31 times.

• for sure. i’m sure it was much more clear during the actual presentation. for the sake of this exercise though, we’ll go with one graphic.

• Naomi, those are good points. I decided against using a sequential colour scheme because I thought it would make distinguishing so many different lines harder. Sequential colours would be good if there were fewer lines and the contrast between the colours would be high. In choosing colours, I think of how the graph can be discussed in words and the names of colours become quite important then.

• Here is my visualization:
http://jonas.sekamane.com/2010/04/obesity-trends-–-makeover/

It is kind of index/histogram-ish :)

• That last graphic is great

• This has the same stair-step problem as some earlier submissions, but it’s not nearly as distracting here. My eye was drawn to each set of bars as distinct datasets, with each set of bars just adding increasing context. I do think >50 yrs could just be dropped since there’s fewer data points, but at least in this case they don’t hurt the graphic.

The “average” grey bar is interesting too – I’m not sure it adds much from an interpretation standpoint – since the focus is the trend, but it works visually I think – particularly because it helps in minimizing the stair-step effect.

I think this is the big winner so far!

• Valeria Montero

I really like your histogram graphic

• Hi guys,

Here is my humble solution:

http://tinyurl.com/3ajwm64

Regards,

Alberto

• Alberto, I like your solution. A small legend on the side would make it perfect.

I am really impressed with graphael and want to learn it. Could you share the code you used to make the graph?

• Hrishi,

I’ve uploaded the HTML code to my post. Graphael has a lot of potential in the field of data vis but has no documentation.

• I really like this one.. Sizing the dots according to percentage is a nice way to make a clean graphic. Nice job!

• Cherie

My thought was that the age group date ranges corresponded with years so you could look at the years as the x axis (which looks like a timeline) and follow the age groups that way.

• To me this one far and away demonstrates the increase in rate over time, especially among young children. Nice job.

• Ok, I did this real quick in Excel because I’m at work, but I really wanted to give a poke at this. You can find my graph here:

I thought a stack chart might be the best for this graph as it is a more visual representation of size or percent, as opposed to heights on a line chart.

I did a couple of things here: first I switched the axes so that the series is by age group rather than decade, and the decade of birth is on the horizontal axis. Next, I put the decades in chronological order. I think this better shows a progressive rise when you compare each series in the stack, and makes it clearer that we are getting fatter faster, because each block is bigger than the last decade’s.

• Sheri Gilley

Here’s mine – http://docs.google.com/View?id=dfxr92f6_131gxjnx8dw

I simplified the labels to make it easier to take in, although you could argue that ‘preteen’ isn’t quite as precise as ‘2-9’ .

Also I color coded the birth year in addition to using it on the X axis, with a different facet for each age group. The colors make it easier to read down the axis to see how age affects obesity within a birth year, while the axis allows you to read across to see how it has been changing over the years within each age grouping.

• Charlotte Wickham

I really like Jonas’ plot. The trend is clear within each age group, rather than emphasizing the trend over age groups within a year. I’m not sure all the different colours are necessary though?

What about looking at the age at which a certain % of people in the age group are obese? Say 20%?

The data aren’t quite what we need to estimate this but here’s a rough go at it. Take the average age in each age group to be its midpoint. Then linearly interpolate at “y = percent obese = 20” to find “x = midpoint of age group we would expect to have 20% obesity”.

There isn’t enough data to estimate this for the recent time periods but the trend is pretty clear.

This relies on you knowing (or assuming) that obesity increases monotonically with age – which is relatively well supported with this data (except in the oldest age group).
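The interpolation described in this comment can be sketched as follows. This is an illustration in Python, not the commenter's code. The function name `age_at_threshold`, the choice of 24.5, 34.5 and 44.5 as age-group midpoints, and the rates themselves are all assumptions made for the example, not values from the study.

```python
def age_at_threshold(points, threshold=20.0):
    """Linearly interpolate the age at which a cohort reaches `threshold` % obese.

    `points` is a list of (age_group_midpoint, percent_obese) pairs.
    Returns None when the observed readings never cross the threshold.
    Assumes obesity rises with age between adjacent readings.
    """
    points = sorted(points)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= threshold <= y1 and y1 != y0:
            # Straight line between the two readings, solved for y = threshold.
            return x0 + (threshold - y0) * (x1 - x0) / (y1 - y0)
    return None

# Placeholder cohort: assumed midpoints for the 20-29, 30-39 and 40-49 groups.
cohort = [(24.5, 10.0), (34.5, 18.0), (44.5, 28.0)]
print(age_at_threshold(cohort))  # 36.5
```

A cohort whose readings all sit above or all sit below the threshold returns `None`, which matches the remark that recent time periods cannot be estimated.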

• Why use an ordinary chart when a pie chart will do? :-)

http://public.tableausoftware.com/views/ObesityTrends/Summary

Reorganized the data so it fits on a horizontal time scale. Generations run from bottom-left to top-right.

Could also distinguish obese, overweight, normal and underweight, if this data was available…

• Horrific and crude! I know… But for this one, pencils (limited..) and paper are what I have to work with. Also, I skipped the 70yr olds.

http://tinyurl.com/2a3o7zw

I like this because I think it provides interesting insights. I like that I can pick a horizontal line and essentially follow that generation’s life going from left to right.

I also like following the curved lines of constant-age, and seeing that, for example, <10% of 20-yr-olds born in the forties were obese, and this was stable for a bit, then that group fattened up rapidly. Those born in the 60s were also around 10%, but then suddenly those born in the 70s had twice as many obese.

I notice that 2yr olds got quite fat when born in the 80s and 90s, but now hopefully that's reversing a bit.

ALL age groups have dramatically fattened up, and basically at the same rate. The 2, 20, 30, and 50-yr-old groups all show a rapid increase at the SAME TIME chronologically: the 80s! I'd like to transform this graph to show that more explicitly. What happened in the 80s? Very interesting.

• Pourmehr

Ok so here’s what I came up with:
http://tinyurl.com/obesity-rates-3D

I hope this isn’t considered cheating since it’s in 3D …

I also added the white dotted line to emphasize the increasing percentage of [young] obese people.

And here’s another attempt:
http://tinyurl.com/obesity-rates-3D-black

This visualization suggests the same thing: an increasing percentage of [young] obese individuals.

• Louis

Turn the graph 90 degrees to the right and draw in a picture of a person getting fatter (side view). The charts would expand to the right in shape of a tummy. This may look like a pregnant woman though.

You could do something similar mirroring the chart (a bit like a population pyramid). You could repeat it for different calendar periods or make them overlap semi-transparently.

• wingster

hi folks!

i’m playing around with R at the moment and found that challenge very motivating, thanx!

i’m reading ggplot2 from hadley wickham at the moment, so i just used it for my try:

http://www.flickr.com/photos/[email protected]/4564890011/

it took me hours, but i really learned a lot about R and ggplot2.

conclusion: more challenges! ;o)

greetings from germany!

wingster

• learning is fun!

• Just read a great posting on obesity on a great site.
http://fatthenfitnow.wordpress.com/2010/04/30/the-biggest-line-of-nonsense/

• Edward Carney

Mine is here.

I re-thought the data as a series of lines beginning at n years post-birth (x axis). So the eldest cohort extends farthest to the right. I used numerals to show the data points. This permits directly reading the percentage of obesity from bottom to top at equivalent years post-birth.

It’s evident that at equivalent points after birth, the younger cohorts are more obese. The youngest cohort is strangely low. A good thing?

• Chuck Aligbe

Hey, this is my first time posting, but I have recently gotten into data visualization as a designer myself, so I am pretty interested in this. Here is one rendition of the obesity data in a coxcomb style.

Coxcomb

The benefit of the coxcomb is that the data is arranged in one place, showing the increasing trend of obesity as age increases. The data, however, is exaggerated by the way area grows in a coxcomb, which can be misleading. But the exaggeration works in favor of making the increasing trend more visible.

Here is the data, represented in a histogram sparkline. This shows the data as individual points that can be scrolled through easily. This is less prone to conceptual problems but cannot show the data in a condensed format.

Sparkline

• In the presentation, it is explained that if you look at a single age group column of data points, in most cases, the newer cohorts for that age group are more obese. This is why the lines for the new cohorts are more to the left.

So the questions I have are:
1. At what times was an age group cohort less obese than the previous one?
2. From their first cohort, what age group has risen the most percentage points?

Here is my viz to answer these questions:
http://public.tableausoftware.com/views/ObesityTrends_1/Dashboard

• Chuck Aligbe

This graph solves one of the problems that exists in the data: the fact that there are three variables (age, date of birth, and percentage of obesity prevalence). By removing the trivial date of birth and interpreting the data as a point of sampling, it makes the data easily readable. However, the data would be more telling in regards to the other two data variables if you use the ages as the x-axis to make a comparison about the correlation between age and rates of obesity prevalence. With a dashboard, it would even be possible to map the three variables in two-variable charts.

• Roger

I’m a little late to the party, but think I’ve at least got a different story out of the data. http://i40.tinypic.com/aeb5nl.jpg

I looked in terms of percentage change and grouped the data by the decades represented (instead of age). It looks like 30 years prior, all groups stopped gaining weight. Then they started back up again the following decade. Weird.

• Roger

sorry. I should be more clear. the graph represents the % more likely a generation is to be obese than the previous generation was at the same age.

The story is that for the snapshot in 1975, all age groups were no fatter than the preceding generation. That was unique to that snapshot.
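One way to read the measure described here is as a relative change between successive birth cohorts at the same age. The sketch below is in Python and is only a guess at the calculation, since the comment does not give a formula. The function name `pct_more_likely` and the rates are placeholders, not study values.

```python
def pct_more_likely(current, previous):
    """Relative change, in percent, of one cohort's rate over the previous cohort's."""
    return (current - previous) / previous * 100.0

# Placeholder rates (% obese) for a single age group across successive cohorts.
rates = [("1946-1955", 10.0), ("1956-1965", 15.0), ("1966-1975", 15.0)]

# Pair each cohort with the one before it and label the result by the later cohort.
changes = [
    (later_cohort, pct_more_likely(later, earlier))
    for (_, earlier), (later_cohort, later) in zip(rates, rates[1:])
]
print(changes)  # [('1956-1965', 50.0), ('1966-1975', 0.0)]
```

A value of 0 means a cohort was no more likely to be obese than the one before it at the same age, which is the pattern described for the 1975 snapshot.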

• Nick Mamich

I find it curious that occurrences of obesity (along with its twin sister diabetes) skyrocketed at the same time smoking was being taxed out of existence.

It appears the actuarial study done by the cigarette companies was right on… A smoker will need fewer lifetime benefits than an overweight tub with diabetes.