<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>FlowingData &#187; Tutorials</title>
	<atom:link href="http://flowingdata.com/category/tutorials/feed/" rel="self" type="application/rss+xml" />
	<link>http://flowingdata.com</link>
	<description>Strength in Numbers</description>
	<lastBuildDate>Fri, 10 Feb 2012 20:28:38 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<atom:link rel="next" href="http://flowingdata.com/category/tutorials/feed/?page=2" />

		<item>
		<title>Members Only: How to Make a Contour Map</title>
		<link>http://flowingdata.com/2012/02/07/how-to-make-a-contour-map/</link>
		<comments>http://flowingdata.com/2012/02/07/how-to-make-a-contour-map/#comments</comments>
		<pubDate>Tue, 07 Feb 2012 09:32:24 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[contour map]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=21149</guid>
		<description><![CDATA[<p><a href="http://flowingdata.com/2012/02/07/how-to-make-a-contour-map/"><img width="625" height="383" src="http://flowingdata.com/wp-content/uploads/2012/02/10-filled-contour-colors.png" class="attachment-medium wp-post-image" alt="Contour plots" title="Contour plots" /></a></p>Filled contour plots are useful for looking at density across two dimensions and are often used to visualize geographic data. It's straightforward to make them in R &#8212; once you get your data in the right format, that is.]]></description>
			<content:encoded><![CDATA[Filled contour plots are useful for looking at density across two dimensions and are often used to visualize geographic data. It's straightforward to make them in R &mdash; once you get your data in the right format, that is.]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2012/02/07/how-to-make-a-contour-map/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Members Only: Using Color Scales and Palettes in R</title>
		<link>http://flowingdata.com/2012/01/19/using-color-scales-and-palettes-in-r/</link>
		<comments>http://flowingdata.com/2012/01/19/using-color-scales-and-palettes-in-r/#comments</comments>
		<pubDate>Thu, 19 Jan 2012 18:55:06 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[color]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=20827</guid>
		<description><![CDATA[<p><a href="http://flowingdata.com/2012/01/19/using-color-scales-and-palettes-in-r/"><img width="625" height="252" src="http://flowingdata.com/wp-content/uploads/2012/01/colormatrix-stripped.png" class="attachment-medium wp-post-image" alt="Colors in R" title="Colors in R" /></a></p>Color can drastically change how a chart reads and what you see in your data, so don't leave it up to chance with defaults.]]></description>
			<content:encoded><![CDATA[Color can drastically change how a chart reads and what you see in your data, so don't leave it up to chance with defaults.]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2012/01/19/using-color-scales-and-palettes-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Members Only: Build Interactive Time Series Charts with Filters</title>
		<link>http://flowingdata.com/2012/01/05/build-interactive-time-series-charts-with-filters/</link>
		<comments>http://flowingdata.com/2012/01/05/build-interactive-time-series-charts-with-filters/#comments</comments>
		<pubDate>Thu, 05 Jan 2012 17:45:39 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[D3]]></category>
		<category><![CDATA[interactive]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[time series]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=20553</guid>
		<description><![CDATA[<p><a href="http://flowingdata.com/2012/01/05/build-interactive-time-series-charts-with-filters/"><img width="625" height="360" src="http://flowingdata.com/wp-content/uploads/2012/01/Area-charts-625x360.png" class="attachment-medium wp-post-image" alt="Area charts" title="Area charts" /></a></p>When you have several time series over many categories, it can be useful to show them separately rather than put it all in one graph. This is one way to do it interactively with categorical filters.]]></description>
			<content:encoded><![CDATA[When you have several time series over many categories, it can be useful to show them separately rather than put it all in one graph. This is one way to do it interactively with categorical filters.]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2012/01/05/build-interactive-time-series-charts-with-filters/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>How to map connections with great circles</title>
		<link>http://flowingdata.com/2011/05/11/how-to-map-connections-with-great-circles/</link>
		<comments>http://flowingdata.com/2011/05/11/how-to-map-connections-with-great-circles/#comments</comments>
		<pubDate>Wed, 11 May 2011 09:27:17 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[airlines]]></category>
		<category><![CDATA[arcs]]></category>
		<category><![CDATA[featured]]></category>
		<category><![CDATA[great circle]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=16555</guid>
		<description><![CDATA[<p><a href="http://flowingdata.com/2011/05/11/how-to-map-connections-with-great-circles/"><img width="575" height="390" src="http://flowingdata.com/wp-content/uploads/2011/05/4-airline-color.jpg" class="attachment-medium wp-post-image" alt="Mapping connections with great circles" title="Mapping connections with great circles" /></a></p>There are various ways to visualize connections, but one of the most intuitive and straightforward ways is to actually connect entities or objects with lines. And when it comes to <em>geographic</em> connections, great circles are a nice way to do this.]]></description>
			<content:encoded><![CDATA[<p><a href="http://flowingdata.com/2011/05/11/how-to-map-connections-with-great-circles/"><img width="575" height="390" src="http://flowingdata.com/wp-content/uploads/2011/05/4-airline-color.jpg" class="attachment-medium wp-post-image" alt="Mapping connections with great circles" title="Mapping connections with great circles" /></a></p><p>Here's the technical definition of great circles on Wikipedia:</p>
<blockquote><p>A great circle, also known as a Riemannian circle, of a sphere is the intersection of the sphere and a plane which passes through the center point of the sphere, as distinct from a small circle. Any diameter of any great circle coincides with a diameter of the sphere, and therefore all great circles have the same circumference as each other, and have the same center as the sphere. A great circle is the largest circle that can be drawn on any given sphere. Every circle in Euclidean space is a great circle of exactly one sphere.</p></blockquote>
<p><img src="http://flowingdata.com/wp-content/uploads/2011/05/220px-Great_circle-210x192.png" alt="" title="Great circle" width="210" height="192" class="alignnone size-thumbnail wp-image-16563 img-right" />The important bit is that the shortest distance between two points on a sphere is the minor arc of a great circle. When currents and wind don't interfere, ships and aircraft use great circle routes, which makes great circles a good fit for showing air carrier coverage. This is what you'll do in this example.</p>
<p>It turns out these maps are relatively easy to make in <a href="http://r-project.org">R</a> once you know how to put the pieces together. The maps that I posted on <a href="http://flowingdata.com/2011/05/05/where-do-major-airlines-fly-in-the-united-states/">flight connections for each airline</a> are what we're after. With only about 30 lines of code, you can produce a series of maps that show flights for every major airline, so you get a lot of bang for the amount of effort.</p>
<h2>Step 0. Setup</h2>
<p><span class="tip">1. I wish they'd update the site; it totally looks like something out of the 1990s, but never mind that. It's useful software.</span>You're going to use R in this example, so <a href="http://www.r-project.org/">download the free and open-source software</a> if you haven't already. It's a straightforward one-click install<sup>1</sup>.</p>
<h2>Step 1. Load packages</h2>
<p>Open R. You need two packages to do the heavy lifting: <code>maps</code> and <code>geosphere</code>. If they're not installed, you can do that via the main menu <strong>Packages &amp; Data &gt; Package Installer</strong>, or by running <code>install.packages(c(&quot;maps&quot;, &quot;geosphere&quot;))</code> in the console. Once installed, load the two packages as follows:</p>
<pre class="brush: r; title: ; notranslate">library(maps)
library(geosphere)</pre>
<p>The first package, <code>maps</code>, is used to draw the base maps, and the second, <code>geosphere</code>, is used to draw the great circle arcs.</p>
<h2>Step 2. Draw base maps</h2>
<p>The <code>maps</code> package makes it easy to draw geographic areas in R with the <code>map()</code> function. Pass it a database name, and you get a map in one line of code. For example, to map the United States, type the following in the R console.</p>
<pre class="brush: r; title: ; notranslate">map(&quot;state&quot;)</pre>
<p>Here's the map that you get:</p>
<p><span class="tip">Blank state map created in R. Nothing else.</span><img src="http://flowingdata.com/wp-content/uploads/2011/05/Screen-shot-2011-05-08-at-6.28.19-PM.png" alt="Contiguous USA map" title="Screen shot 2011-05-08 at 6.28.19 PM.png" border="0" width="575" height="301" /></p>
<p><span class="tip">2. Map projections are finicky in R and can be a pain sometimes, but if you want to map with a different projection like, say, Albers, look into <code>mapproject()</code> in the <code>mapproj</code> package.</span>The projection isn't the prettiest thing in the world, but it'll do for now<sup>2</sup>. The bigger problem is that the <code>state</code> database doesn't include Alaska or Hawaii. To include the two often left out states, you use the <code>world</code> database.</p>
<pre class="brush: r; title: ; notranslate">map(&quot;world&quot;)</pre>
<p>This gives you a full black and white map of the world.</p>
<p><span class="tip">A blank world map is just as easy to make in R.</span><img src="http://flowingdata.com/wp-content/uploads/2011/05/Screen-shot-2011-05-08-at-6.28.37-PM1.png" alt="Screen shot 2011 05 08 at 6 28 37 PM" title="Screen shot 2011-05-08 at 6.28.37 PM.png" border="0" width="575" height="275" /></p>
<h2>Step 3. Limiting boundaries</h2>
<p>The data at hand, which you'll get to soon, covers only domestic flights for the United States, so you should focus on that area. Use the <code>xlim</code> and <code>ylim</code> arguments to limit the map to a rectangle that covers only a given range of longitude and latitude. </p>
<p>You can also play with the color of the base map at this point. By default, <code>map()</code> doesn't fill regions, but it will if you set <code>fill</code> to <code>TRUE</code>. For some reason though, if you set the fill color, you can't change the border color. So instead (if you don't want to edit in Illustrator later) you can set the line width (<code>lwd</code>) to something really skinny. For the purpose of this example, we want the border lines to get out of the way.</p>
<pre class="brush: r; title: ; notranslate">xlim &lt;- c(-171.738281, -56.601563)
ylim &lt;- c(12.039321, 71.856229)
map(&quot;world&quot;, col=&quot;#f2f2f2&quot;, fill=TRUE, bg=&quot;white&quot;, lwd=0.05, xlim=xlim, ylim=ylim)</pre>
<p>And here's what you get with the above code. A simple and clean map of the United States that includes Alaska and Hawaii.</p>
<p><span class="tip">Focusing on all states. That includes Hawaii and Alaska.</span><img src="http://flowingdata.com/wp-content/uploads/2011/05/Screen-shot-2011-05-08-at-6.29.11-PM.png" alt="Screen shot 2011 05 08 at 6 29 11 PM" title="Screen shot 2011-05-08 at 6.29.11 PM.png" border="0" width="575" height="386" /></p>
<h2>Step 4. Draw connecting lines</h2>
<p>Now that you have a map, you can draw connecting lines. This is really easy with <code>gcIntermediate()</code> from the <code>geosphere</code> package. Pass it the longitude and latitude (in that order) of the two connecting points, and <code>gcIntermediate()</code> spits out the coordinates of points on the great circle between them. </p>
<p>The <code>n</code> argument indicates how many points you want the function to return. The more points you indicate, the smoother the resulting line will be, but beyond a certain point, you won't see much difference. The <code>addStartEnd</code> argument indicates that you want to include the start and end points in the great circle coordinates. Lastly, use <code>lines()</code> to actually draw the line. </p>
<pre class="brush: r; title: ; notranslate">lat_ca &lt;- 39.164141
lon_ca &lt;- -121.640625
lat_me &lt;- 45.213004
lon_me &lt;- -68.906250
inter &lt;- gcIntermediate(c(lon_ca, lat_ca), c(lon_me, lat_me), n=50, addStartEnd=TRUE)
lines(inter)</pre>
<p>This draws a great circle arc from California to Maine.</p>
<p><span class="tip">A single connection.</span><img src="http://flowingdata.com/wp-content/uploads/2011/05/1-single-connection1.jpg" alt="1 single connection" title="1-single-connection.jpg" border="0" width="575" height="391" /></p>
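<p>If you're curious what <code>gcIntermediate()</code> actually hands back, here is a small sketch that reuses the same California and Maine coordinates. The variable names <code>ca</code>, <code>me</code>, and <code>pts</code> are just for illustration, and the distance call is an optional extra that isn't needed for the map.</p>
<pre class="brush: r; title: ; notranslate">library(geosphere)

ca &lt;- c(-121.640625, 39.164141)   # longitude first, then latitude
me &lt;- c(-68.906250, 45.213004)

pts &lt;- gcIntermediate(ca, me, n=50, addStartEnd=TRUE)
dim(pts)       # 50 intermediate points plus the start and end: 52 rows, 2 columns
head(pts, 3)   # the first few longitude/latitude pairs

# Great circle distance in meters, using the haversine formula on a sphere
distHaversine(ca, me)</pre>
<p>The result is a plain two-column matrix, which is why you can pass it straight to <code>lines()</code>.</p>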
<p>Similarly, you can add another line simply by doing the same as above with different latitude and longitude coordinates. For example, you can draw a line from California to Texas, and you can use the <code>col</code> argument to set the line color to red.</p>
<pre class="brush: r; title: ; notranslate">lat_tx &lt;- 29.954935
lon_tx &lt;- -98.701172
inter2 &lt;- gcIntermediate(c(lon_ca, lat_ca), c(lon_tx, lat_tx), n=50, addStartEnd=TRUE)
lines(inter2, col=&quot;red&quot;)</pre>
<p><span class="tip">Okay, now let's do two connections.</span><img src="http://flowingdata.com/wp-content/uploads/2011/05/2-addtl-connection1.jpg" alt="2 addtl connection" title="2-addtl-connection.jpg" border="0" width="575" height="393" /></p>
<h2>Step 5. Load flight data</h2>
<p>So now you know how to do the hard part. You just have to iterate over latitude and longitude pairs to get the full map for a specific carrier. Start by loading the data with <code>read.csv()</code>, as shown below:</p>
<pre class="brush: r; title: ; notranslate">airports &lt;- read.csv(&quot;http://datasets.flowingdata.com/tuts/maparcs/airports.csv&quot;, header=TRUE)
flights &lt;- read.csv(&quot;http://datasets.flowingdata.com/tuts/maparcs/flights.csv&quot;, header=TRUE, as.is=TRUE)</pre>
<p><span class="tip">3. I used a five-line Python script to aggregate data that I downloaded from the <a href="http://transtats.bts.gov/Tables.asp?DB_ID=120&DB_Name=Airline%20On-Time%20Performance%20Data&DB_Short_Name=On-Time">Bureau of Transportation Statistics</a>. The original file was about 50mb. You can aggregate in R, but it's usually a better idea to not load large-ish files in R. Luckily airport latitude and longitude coordinates were available on the page for <a href="http://stat-computing.org/dataexpo/2009/">Data Expo 2009</a>. Otherwise, I would've geocoded them myself, which I started to do and then got stalled by API limits.</span>This is processed data that I cleaned up for this tutorial. It's flight counts between each airport, categorized by airline<sup>3</sup>.</p>
<h2>Step 6. Draw multiple connections</h2>
<p>Data in. To map all connections for, say, American Airlines, you filter as shown in line 3. Then you loop over each row of data, which has the codes for two airports and the number of flights between them, and look up the latitude and longitude of each airport in <code>airports</code>.</p>
<pre class="brush: r; title: ; notranslate">map(&quot;world&quot;, col=&quot;#f2f2f2&quot;, fill=TRUE, bg=&quot;white&quot;, lwd=0.05, xlim=xlim, ylim=ylim)

fsub &lt;- flights[flights$airline == &quot;AA&quot;,]
for (j in 1:length(fsub$airline)) {
	air1 &lt;- airports[airports$iata == fsub[j,]$airport1,]
	air2 &lt;- airports[airports$iata == fsub[j,]$airport2,]

	inter &lt;- gcIntermediate(c(air1[1,]$long, air1[1,]$lat), c(air2[1,]$long, air2[1,]$lat), n=100, addStartEnd=TRUE)

	lines(inter, col=&quot;black&quot;, lwd=0.8)
}
</pre>
<p>Here's the mess of black lines that you get from the above code.</p>
<p><span class="tip">Rough image of American Airlines flights.</span><img src="http://flowingdata.com/wp-content/uploads/2011/05/3b-solid.jpg" alt="3b solid" title="3b-solid.jpg" border="0" width="575" height="390" /></p>
<p>Not bad, but you can do better than that with some rearranging and coloring appropriately. </p>
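<p>The loop above filters the whole <code>airports</code> table twice on every pass. That's fine for data of this size, but as an optional alternative you could find all the matching rows up front with <code>match()</code>. This is only a sketch: the rows below are made up (the coordinates are approximate), and only the column names are taken from the tutorial's files.</p>
<pre class="brush: r; title: ; notranslate"># Toy data with the same column names as the tutorial's files (values are made up)
airports &lt;- data.frame(iata=c(&quot;LAX&quot;, &quot;JFK&quot;, &quot;ORD&quot;),
                       lat=c(33.94, 40.64, 41.98),
                       long=c(-118.41, -73.78, -87.90),
                       stringsAsFactors=FALSE)
fsub &lt;- data.frame(airline=&quot;AA&quot;,
                   airport1=c(&quot;LAX&quot;, &quot;ORD&quot;),
                   airport2=c(&quot;JFK&quot;, &quot;LAX&quot;),
                   cnt=c(120, 45),
                   stringsAsFactors=FALSE)

# match() gives the row of airports for every airport code at once
i1 &lt;- match(fsub$airport1, airports$iata)
i2 &lt;- match(fsub$airport2, airports$iata)

# One longitude/latitude pair per connection, in the order gcIntermediate() expects
starts &lt;- cbind(airports$long[i1], airports$lat[i1])
ends &lt;- cbind(airports$long[i2], airports$lat[i2])</pre>
<p>Inside the loop you would then use <code>starts[j,]</code> and <code>ends[j,]</code> as the two points for connection <code>j</code>.</p>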
<h2>Step 7. Color for clarity</h2>
<p>You learned how to change the color of lines back in step 4. You change the <code>col</code> argument in <code>lines()</code>. You hard-coded the color though. You could instead create a vector of colors that scaled from, say, light gray to black, and then pick a shade from farther down the vector for connections with more flights. That way connections with more flights would be more prominent.</p>
<p>If only there were a way to create a color scale automagically. Oh wait, there is. It's called <code>colorRampPalette()</code>. Pass it the base colors you want to use, and it'll fill in everything in between. More specifically, it creates a function that you can pass a number to, indicating how many shades you want to use. In the code below, we use 100 (lines 1 and 2).</p>
<p>Then you do the same as you did in the previous step, but instead of setting all lines to black, you calculate the color from the ratio of the current connection's flight count to the maximum flight count. </p>
<pre class="brush: r; title: ; notranslate">pal &lt;- colorRampPalette(c(&quot;#f2f2f2&quot;, &quot;black&quot;))
colors &lt;- pal(100)

map(&quot;world&quot;, col=&quot;#f2f2f2&quot;, fill=TRUE, bg=&quot;white&quot;, lwd=0.05, xlim=xlim, ylim=ylim)

fsub &lt;- flights[flights$airline == &quot;AA&quot;,]
maxcnt &lt;- max(fsub$cnt)
for (j in 1:length(fsub$airline)) {
	air1 &lt;- airports[airports$iata == fsub[j,]$airport1,]
	air2 &lt;- airports[airports$iata == fsub[j,]$airport2,]

	inter &lt;- gcIntermediate(c(air1[1,]$long, air1[1,]$lat), c(air2[1,]$long, air2[1,]$lat), n=100, addStartEnd=TRUE)
	# Clamp to 1 so that very small counts still get the lightest shade
	colindex &lt;- max(1, round( (fsub[j,]$cnt / maxcnt) * length(colors) ))

	lines(inter, col=colors[colindex], lwd=0.8)
}</pre>
<p>Below is the map that you get. The problem is the longer, less prominent flights are obscuring the more popular connections, because they're being drawn on top. The above code just draws lines in the order that the data comes.</p>
<p><span class="tip">Use color to emphasize more prominent flights.</span><img src="http://flowingdata.com/wp-content/uploads/2011/05/3a-connection.jpg" alt="3a connection" title="3a-connection.jpg" border="0" width="575" height="391" /></p>
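<p>One thing to watch in the index calculation: <code>round()</code> can return 0 for a connection with very few flights, and <code>colors[0]</code> is an empty vector rather than a color. The sketch below uses made-up counts and clamps the index to at least 1 with <code>pmax()</code>. The counts and the clamp are illustrative assumptions, not part of the flight data.</p>
<pre class="brush: r; title: ; notranslate">pal &lt;- colorRampPalette(c(&quot;#f2f2f2&quot;, &quot;black&quot;))
colors &lt;- pal(100)

cnt &lt;- c(3, 250, 1000)   # hypothetical flight counts
colindex &lt;- pmax(1, round( (cnt / max(cnt)) * length(colors) ))

colindex           # 1 25 100
colors[colindex]   # lightest shade, a shade a quarter of the way along, darkest shade</pre>
<p>Without the clamp, the first count would map to index 0.</p>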
<p>To fix the drawing order, we use the <a href="http://paulbutler.org/archives/visualizing-facebook-friends/">same method</a> that Paul Butler used for his Facebook map. We just order connections from least to greatest flight counts. That way less popular connections are drawn first and therefore end up on the bottom, while the darker connections are drawn on top.</p>
<pre class="brush: r; title: ; notranslate">pal &lt;- colorRampPalette(c(&quot;#f2f2f2&quot;, &quot;black&quot;))
colors &lt;- pal(100)

map(&quot;world&quot;, col=&quot;#f2f2f2&quot;, fill=TRUE, bg=&quot;white&quot;, lwd=0.05, xlim=xlim, ylim=ylim)

fsub &lt;- flights[flights$airline == &quot;AA&quot;,]
fsub &lt;- fsub[order(fsub$cnt),]
maxcnt &lt;- max(fsub$cnt)
for (j in 1:length(fsub$airline)) {
	air1 &lt;- airports[airports$iata == fsub[j,]$airport1,]
	air2 &lt;- airports[airports$iata == fsub[j,]$airport2,]

	inter &lt;- gcIntermediate(c(air1[1,]$long, air1[1,]$lat), c(air2[1,]$long, air2[1,]$lat), n=100, addStartEnd=TRUE)
	# Clamp to 1 so that very small counts still get the lightest shade
	colindex &lt;- max(1, round( (fsub[j,]$cnt / maxcnt) * length(colors) ))

	lines(inter, col=colors[colindex], lwd=0.8)
}</pre>
<p><span class="tip">Layer dark on top of light so that it's easier to read.</span><img src="http://flowingdata.com/wp-content/uploads/2011/05/3-airline1.jpg" alt="3 airline" title="3-airline.jpg" border="0" width="575" height="393" /></p>
<p>That's much better. At this point, you can play around with color by changing the shades in <code>colorRampPalette()</code>. The line below goes from a light gray (#f2f2f2) to red, but you can do whatever you like. You can even use more than two colors.</p>
<pre class="brush: r; title: ; notranslate">pal &lt;- colorRampPalette(c(&quot;#f2f2f2&quot;, &quot;red&quot;))</pre>
<p><span class="tip">Red? Sure, you can do that, too.</span><img src="http://flowingdata.com/wp-content/uploads/2011/05/4-airline-color.jpg" alt="4 airline color" title="4-airline-color.jpg" border="0" width="575" height="390" /></p>
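<p>As a quick sketch of a palette with more than two base colors, here is one that passes through orange on the way to red. The color choice and the names <code>pal3</code> and <code>shades</code> are arbitrary and only for illustration.</p>
<pre class="brush: r; title: ; notranslate">pal3 &lt;- colorRampPalette(c(&quot;#f2f2f2&quot;, &quot;orange&quot;, &quot;red&quot;))
shades &lt;- pal3(5)

shades      # five hex colors, interpolated between the base colors
shades[1]   # the first base color, light gray
shades[5]   # the last base color, red</pre>
<p>The returned function works the same way as before, so you can swap it into the map code without changing anything else.</p>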
<h2>Step 8. Map every carrier</h2>
<p>The only thing left to do is make a map for each carrier. You can do it manually by changing the airline code and then rerunning the script, but there's an easier way. Find all unique carriers with the <code>unique()</code> function, and iterate over each one. Wherever you put "AA" before, you now put <code>carriers[i]</code> to indicate the current carrier in the loop. The code below will create a PDF for each carrier and save it to your current working directory.</p>
<pre class="brush: r; title: ; notranslate"># Unique carriers
carriers &lt;- unique(flights$airline)

# Color
pal &lt;- colorRampPalette(c(&quot;#333333&quot;, &quot;white&quot;, &quot;#1292db&quot;))
colors &lt;- pal(100)

for (i in 1:length(carriers)) {

	pdf(paste(&quot;carrier&quot;, carriers[i], &quot;.pdf&quot;, sep=&quot;&quot;), width=11, height=7)
	map(&quot;world&quot;, col=&quot;#191919&quot;, fill=TRUE, bg=&quot;#000000&quot;, lwd=0.05, xlim=xlim, ylim=ylim)
	fsub &lt;- flights[flights$airline == carriers[i],]
	fsub &lt;- fsub[order(fsub$cnt),]
	maxcnt &lt;- max(fsub$cnt)
	for (j in 1:length(fsub$airline)) {
		air1 &lt;- airports[airports$iata == fsub[j,]$airport1,]
		air2 &lt;- airports[airports$iata == fsub[j,]$airport2,]

		inter &lt;- gcIntermediate(c(air1[1,]$long, air1[1,]$lat), c(air2[1,]$long, air2[1,]$lat), n=100, addStartEnd=TRUE)
		# Clamp to 1 so that very small counts still get the lightest shade
		colindex &lt;- max(1, round( (fsub[j,]$cnt / maxcnt) * length(colors) ))

		lines(inter, col=colors[colindex], lwd=0.6)
	}

	dev.off()
}</pre>
<p>Here is the map for American Airlines again, produced by the code above. I fiddled with color some to match the maps I created for the <a href="http://flowingdata.com/2011/05/05/where-do-major-airlines-fly-in-the-united-states/">original flight post</a>.</p>
<p><span class="tip">A dark map background with gray to white to blue paths.</span><img src="http://flowingdata.com/wp-content/uploads/2011/05/5-black-theme1.jpg" alt="5 black theme" title="5-black-theme.jpg" border="0" width="575" height="389" /></p>
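<p>To see how the carrier list and the file names come together, here is a tiny sketch with a made-up <code>airline</code> vector standing in for <code>flights$airline</code>. The carrier codes are only examples.</p>
<pre class="brush: r; title: ; notranslate">airline &lt;- c(&quot;AA&quot;, &quot;DL&quot;, &quot;AA&quot;, &quot;WN&quot;, &quot;DL&quot;)   # made-up airline column

# unique() keeps the first occurrence of each value, in order
carriers &lt;- unique(airline)
carriers   # &quot;AA&quot; &quot;DL&quot; &quot;WN&quot;

# paste() is vectorized, so this builds one file name per carrier
filenames &lt;- paste(&quot;carrier&quot;, carriers, &quot;.pdf&quot;, sep=&quot;&quot;)
filenames  # &quot;carrierAA.pdf&quot; &quot;carrierDL.pdf&quot; &quot;carrierWN.pdf&quot;</pre>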
<p>That's all there is to it. Can you think of other datasets this method could be applied to? Give this tutorial a whirl and post your results in the comments. </p>
<p>For more examples, guidance, and all-around data goodness like this, <a href="http://flowingdata.com/membership/">sign up for FlowingData membership</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2011/05/11/how-to-map-connections-with-great-circles/feed/</wfw:commentRss>
		<slash:comments>48</slash:comments>
		</item>
		<item>
		<title>How to Make Bubble Charts</title>
		<link>http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/</link>
		<comments>http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/#comments</comments>
		<pubDate>Tue, 23 Nov 2010 08:25:48 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[bubbles]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=12845</guid>
		<description><![CDATA[<p><a href="http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/"><img width="625" height="419" src="http://flowingdata.com/wp-content/uploads/2010/11/5-edited-version1-625x419.png" class="attachment-medium wp-post-image" alt="Crime Rates by State" title="Crime Rates by State" /></a></p>Ever since Hans Rosling presented a motion chart to tell his story of the wealth and health of nations, there has been an affinity for proportional bubbles on an x-y axis. This tutorial is for the static version of the motion chart: the bubble chart.]]></description>
			<content:encoded><![CDATA[<p><a href="http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/"><img width="625" height="419" src="http://flowingdata.com/wp-content/uploads/2010/11/5-edited-version1-625x419.png" class="attachment-medium wp-post-image" alt="Crime Rates by State" title="Crime Rates by State" /></a></p><p>A bubble chart can also just be straight up proportionally sized bubbles, but here we're going to cover how to create the variety that is like a scatterplot with a third, bubbly dimension.</p>
<p>The advantage of this chart type is that it lets you compare three variables at once. One is on the x-axis, one is on the y-axis, and the third is represented by area size of bubbles. Have a look at <a href="http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/5-edited-version-2/">the final chart</a> to see what we're making.</p>
<h2>Step 0. Download R</h2>
<p>We're going to use R to do this, so <a href="http://www.r-project.org/">download that</a> before moving on. It's free and open-source, so you have nothing to lose. Plus it's a <a href="http://flowingdata.com/2010/11/17/r-is-the-need-to-know-stat-software/">need-to-know-name of 2011</a>, so you might as well get to know it now. You can thank me later.</p>
<h2>Step 1. Load the data</h2>
<p>Assuming you already have R open, the first thing we'll do is load the data. We're examining the same crime data that we did for our last tutorial. I've added state population this time around. One note about the data: the crime numbers are actually for 2005, while the populations are for 2008. This isn't a huge deal since we're more interested in relative populations than we are in the raw values, but keep that in mind. </p>
<p>Okay, moving on. You can download the tab-delimited file <a href="http://datasets.flowingdata.com/crimeRatesByState2005.csv">here</a> and keep it local, but the easiest way is to load it directly into R with the below line of code:</p>
<p>
<pre class="brush: r; title: ; notranslate">crime &lt;- read.csv(&quot;http://datasets.flowingdata.com/crimeRatesByState2008.csv&quot;, header=TRUE, sep=&quot;\t&quot;)
</pre>
</p>
<p>You're telling R to download the data and read it as a tab-delimited file with a header (that's what <code>sep=&quot;\t&quot;</code> is for). This loads it as a data frame in the <code>crime</code> variable.</p>
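<p>If you want to see what tab-delimited parsing does without downloading anything, here is a minimal sketch on two made-up rows. The column names match the ones used later in this tutorial, but the values are invented, and <code>read.delim()</code> is used because it assumes tabs and a header by default.</p>
<pre class="brush: r; title: ; notranslate"># Two made-up rows in tab-delimited form; &quot;\t&quot; is a tab and &quot;\n&quot; ends a line
raw &lt;- &quot;state\tmurder\tburglary\tpopulation\nAlabama\t8.2\t953.8\t4627851\nAlaska\t4.8\t622.5\t686293&quot;

toy &lt;- read.delim(text=raw)
names(toy)   # the four column names from the header row
nrow(toy)    # 2</pre>
<p>Calling <code>read.csv()</code> with <code>sep=&quot;\t&quot;</code>, as in the line above, reads the same kind of file.</p>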
<h2>Step 2. Draw some circles</h2>
<p>Now we can get right to drawing circles with the <code>symbols()</code> command. Pass it values for the x-axis, y-axis, and circles, and it'll spit out a bubble chart for you.</p>
<p>
<pre class="brush: r; title: ; notranslate">symbols(crime$murder, crime$burglary, circles=crime$population)
</pre>
</p>
<p>Run the line of code above, and you'll get this:</p>
<p><span class="tip">Circles incorrectly sized by radius instead of area. Large values appear much bigger.</span><img class="alignnone size-medium wp-image-12846" title="1-wrong-sized-circles" src="http://flowingdata.com/wp-content/uploads/2010/11/1-wrong-sized-circles-575x519.png" alt="" width="575" height="519" /></p>
<p>All done, right? Wrong. That was a test. The above sizes the radius of the circles by population. We want to size them by <em>area</em>. The relative proportions are all out of whack if you size by radius. </p>
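<p>To get a feel for how much sizing by radius exaggerates things, compare two circles where one radius is twice the other. This is just a quick arithmetic check with made-up radii.</p>
<pre class="brush: r; title: ; notranslate">r &lt;- c(1, 2)             # the second radius is twice the first
area &lt;- pi * r^2         # area of each circle
ratio &lt;- area[2] / area[1]
ratio                    # 4</pre>
<p>So a state with twice the population would get a bubble with four times the area, which is what the eye actually compares.</p>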
<h2>Step 3. Size the circles correctly</h2>
<p>To size radiuses correctly, we look to the equation for area of a circle:</p>
<p>Area of circle = &#960;r<sup>2</sup></p>
<p>In this case area of the circle is population. We want to know <em>r</em>. Move some things around and we get this:</p>
<p>r = &#8730;(Area of circle / &#960;)</p>
<p>Substitute population for the area of the circle, and translate to R, and we get this:</p>
<p>
<pre class="brush: r; title: ; notranslate">radius &lt;- sqrt( crime$population/ pi )
symbols(crime$murder, crime$burglary, circles=radius)
</pre>
</p>
<p><span class="tip">Circles correctly sized by area, but the range of sizes is too much. The chart is cluttered and unreadable.</span><img src="http://flowingdata.com/wp-content/uploads/2010/11/2-correctsize-too-big-575x530.png" alt="" title="2-correctsize-too-big" width="575" height="530" class="alignnone size-medium wp-image-12847" /></p>
<p>Yay. Properly scaled circles. They're way too big though for this chart to be useful. By default, <code>symbols()</code> sizes the largest bubble to one inch, and then scales the rest accordingly. We can change that by using the <code>inches</code> argument. Whatever value you put will take the place of the one-inch default. While we're at it, let's add color and change the x- and y-axis labels.</p>
<p>
<pre class="brush: r; title: ; notranslate">symbols(crime$murder, crime$burglary, circles=radius, inches=0.35, fg=&quot;white&quot;, bg=&quot;red&quot;, xlab=&quot;Murder Rate&quot;, ylab=&quot;Burglary Rate&quot;)
</pre>
</p>
<p>Notice we use <code>fg</code> to change border color, <code>bg</code> to change fill color. Here's what we get:</p>
<p><span class="tip">Scale the circles to make the chart more readable, and use the <code>fg</code> and <code>bg</code> arguments to change colors.</span><img src="http://flowingdata.com/wp-content/uploads/2010/11/3-sized-circles-by-area-575x530.png" alt="" title="3-sized-circles-by-area" width="575" height="530" class="alignnone size-medium wp-image-12848" /></p>
<p>Now we're getting somewhere.</p>
<p>By the way, you can make a chart with other shapes too with <code>symbols()</code>. You can make squares, rectangles, thermometers, boxplots, and stars. They take different arguments than the circle. The squares, for example, are sized by the length of a side. Again, make sure you size them appropriately.</p>
<p>Here's what squares look like, using the below line of code.</p>
<p>
<pre class="brush: r; title: ; notranslate">symbols(crime$murder, crime$burglary, squares=sqrt(crime$population), inches=0.5)</pre>
</p>
<p><span class="tip">You can use squares sized by area instead of circles, too.</span><img src="http://flowingdata.com/wp-content/uploads/2010/11/crime-squares-no-labels-575x457.png" alt="" title="crime-squares-no-labels" width="575" height="457" class="alignnone size-medium wp-image-12863" /></p>
<p>Let's stick with circles for now.</p>
<h2>Step 4. Add labels</h2>
<p>As it is, the chart shows some sense of distribution, but we don't know which circle represents each state. So let's add labels. We do this with <code>text()</code>, whose arguments are x-coordinates, y-coordinates, and the actual text to print. We have all of these. Like the bubbles, the <em>x</em> is murders and the <em>y</em> is burglaries. The actual labels are state names, which is the first column in our data frame.</p>
<p>With that in mind, we do this:</p>
<p>
<pre class="brush: r; title: ; notranslate">text(crime$murder, crime$burglary, crime$state, cex=0.5)
</pre>
</p>
<p>The <code>cex</code> argument controls text size. It is 1 by default. Values greater than one make the labels bigger, and values less than one make them smaller. The labels will center on the x- and y-coordinates.</p>
<p>Here's what it looks like.</p>
<p><span class="tip">Add labels so you know what each circle represents.</span><img src="http://flowingdata.com/wp-content/uploads/2010/11/4-added-labels-575x521.png" alt="" title="4-added-labels" width="575" height="521" class="alignnone size-medium wp-image-12849" /></p>
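<p>If centered labels overlap the circles too much, one option is the <code>pos</code> argument of <code>text()</code>, which nudges each label below, left of, above, or right of its point (values 1 through 4). This is only a sketch with made-up points and names.</p>
<pre class="brush: r; title: ; notranslate"># Made-up points and names for illustration only
x <- c(2, 5, 8)
y <- c(300, 700, 500)
name <- c("A", "B", "C")

plot(x, y, xlim=c(0, 10), ylim=c(0, 1000))

# pos=3 draws each label just above its point instead of centered on it
text(x, y, name, cex=0.5, pos=3)
</pre>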
<h2>Step 5. Clean up</h2>
<p>Finally, as per usual, I clean up in Adobe Illustrator. You can mess around with this in R, if you like, but I've found it's way easier to save my file as a PDF and do what I want with Illustrator. I uncluttered the state labels to make them more readable, rotated the y-axis labels, so that they're horizontal, added a legend for population, and removed the outside border. I also brought Georgia to the front, because most of it was hidden by Texas.</p>
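<p>In case you haven't saved a plot as a PDF from R before, here's a minimal sketch. The file name and the toy circles are placeholders, so swap in your own plotting code and path.</p>
<pre class="brush: r; title: ; notranslate"># Hypothetical file name; change it to wherever you want the PDF saved
pdf_file <- file.path(tempdir(), "bubble-chart.pdf")

# Open a PDF device, draw the chart, then close the device to write the file
pdf(pdf_file, width=8, height=6)
symbols(c(1, 2, 3), c(3, 1, 2), circles=sqrt(c(10, 20, 30) / pi), inches=0.35)
dev.off()
</pre>
<p>The chart is vector-based in the PDF, so you can select and move individual elements in an editor.</p>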
<p>Here's the <a href="http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/5-edited-version-2/">final version</a>. Click the image to see it in full.</p>
<p><span class="tip">Cleanup and a key make the chart more informative.</span><a href="http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/5-edited-version-2/" rel="attachment wp-att-12941"><img src="http://flowingdata.com/wp-content/uploads/2010/11/5-edited-version1-575x385.png" alt="" title="5-edited-version" width="575" height="385" class="alignnone size-medium wp-image-12941" /></a></p>
<p>And there you go. Type in <code>?symbols</code> in R for more plotting options. Go wild.</p>
<p>For more examples, guidance, and all-around data goodness like this, <a href="http://book.flowingdata.com/">buy Visualize This</a>, the new FlowingData book.</p>
]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/feed/</wfw:commentRss>
		<slash:comments>63</slash:comments>
		</item>
		<item>
		<title>How to visualize data with cartoonish faces ala Chernoff</title>
		<link>http://flowingdata.com/2010/08/31/how-to-visualize-data-with-cartoonish-faces/</link>
		<comments>http://flowingdata.com/2010/08/31/how-to-visualize-data-with-cartoonish-faces/#comments</comments>
		<pubDate>Tue, 31 Aug 2010 07:48:57 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[Chernoff Faces]]></category>
		<category><![CDATA[multivariate]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=11227</guid>
		<description><![CDATA[<p><a href="http://flowingdata.com/2010/08/31/how-to-visualize-data-with-cartoonish-faces/"><img width="613" height="312" src="http://flowingdata.com/wp-content/uploads/2010/08/How-to-visualize-data-with-cartoonish-faces.png" class="attachment-medium wp-post-image" alt="How to visualize data with cartoonish faces" title="How to visualize data with cartoonish faces" /></a></p>The goal of Chernoff faces is to show a bunch of variables at once via facial features like lips, eyes, and nose size. Most of the time there are better solutions, but the faces can be interesting to work with.]]></description>
			<content:encoded><![CDATA[<p><a href="http://flowingdata.com/2010/08/31/how-to-visualize-data-with-cartoonish-faces/"><img width="613" height="312" src="http://flowingdata.com/wp-content/uploads/2010/08/How-to-visualize-data-with-cartoonish-faces.png" class="attachment-medium wp-post-image" alt="How to visualize data with cartoonish faces" title="How to visualize data with cartoonish faces" /></a></p><p>FlowingData reader Chris asks:</p>
<blockquote><p>I was wondering, have you ever considered doing a Chernoff faces tutorial for R? I think Chernoff faces are pretty interesting and I haven't seen much about them on the web.</p></blockquote>
<p>This wasn't the first time someone's asked how to make Chernoff faces, so I did a quick search. Guess what. There's an <a href="http://bm2.genes.nig.ac.jp/RGM2/R_current/library/aplpack/man/faces.html">R package for that</a>. This tutorial describes how to apply Chernoff faces to your own data.</p>
<h2>Chernoff Faces</h2>
<p>The point of Chernoff faces is to display multiple variables at once by positioning parts of the human face, such as ears, hair, eyes, and nose, based on numbers in a dataset. The assumption is that we can read people's faces easily in real life, so we should be able to recognize small differences when they represent data. Now that's a pretty big assumption, but debate aside, they're fun to make. </p>
<p><span class="tip">1. Because these are faces rather than abstract geometric shapes, be careful what you show with this method and who you show it to. As was the case in this tutorial, those who aren't familiar with the method might take the faces literally and take offense.</span>We've seen them <a href="http://flowingdata.com/2008/04/04/chernoff-faces-to-display-baseball-managers-from-2007-mlb-season/">applied to baseball players</a> and <a href="http://en.wikipedia.org/wiki/File:Chernoff_faces_for_evaluations_of_US_judges.svg">judge ratings</a>. In this tutorial, we'll look at US crime rate by state.<sup>1</sup></p>
<h2>Download R</h2>
<p>As in previous tutorials, we'll be using R (surprise, surprise), the software environment for statistical computing and graphics, to make our Chernoff faces, so if you haven't already, <a href="http://cran.stat.ucla.edu/">download and install R</a> before moving on. It's free, open-source, and a one-click install. Go on, I'll wait for you.</p>
<h2>Step 1. Install package</h2>
<p>Once you've opened up R, the first thing we need to do is install the <a href="http://cran.r-project.org/web/packages/aplpack/index.html">aplpack</a> (Another Plot Package) package by Peter Wolf. Go to the "Packages & Data" menu in R, and select the "Package Installer." Select "CRAN (binaries)" in the dropdown menu if it's not already on that, and then click on "Get List." Scroll down to "aplpack" and click on the "Install Selected" button and installation should begin.</p>
<p><span class="tip">The Another Plot Package will do most of the grunt work.</span><img src="http://flowingdata.com/wp-content/uploads/2010/08/Install-R-package-550x600.png" alt="" title="Install R package" width="550" height="600" class="alignnone size-medium wp-image-11241" /></p>
<p>Alternatively, you can also just type this in the R console:</p>
<p><code>install.packages("aplpack")</code></p>
<h2>Step 2. Load the data</h2>
<p>Next we need to load the data into the R environment. Like I said, we'll be looking at crime rates by state. I got the data from <a href="http://infochimps.org/datasets/crime-rates-by-state-2004-and-2005-and-by-type-2005-cleaned-up-v--2">Infochimps</a>, which is actually from Table 301 of the 2008 US Statistical Abstract, but it's typically a headache going through dot gov navigation, so I avoid it when I can. </p>
<p>I cleaned the datafile I got from Infochimps a little bit more so it only includes the numbers we're interested in. You can find it <a href="http://datasets.flowingdata.com/crimeRatesByState-formatted.csv">here</a>, but you don't need to download it. We'll load it directly into R via the URL using the <code>read.csv()</code> command.</p>
<p><code>crime <- read.csv("http://datasets.flowingdata.com/crimeRatesByState-formatted.csv")</code></p>
<p>To view the data, type the following:</p>
<p><code>crime[1:6,]</code></p>
<p>This shows you the first six lines of our dataset. Note that there are eight columns. The first column is state name, with the exception of the row for US average and District of Columbia later on. The rest of the columns are seven categories of crime.</p>
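<p>There are a few other quick ways to get a feel for a data frame. Here's a sketch on a tiny made-up data frame (the numbers are placeholders, not the real crime rates).</p>
<pre class="brush: r; title: ; notranslate"># A tiny made-up data frame standing in for the crime data
toy <- data.frame(state=c("Alabama", "Alaska", "Arizona"),
                  murder=c(8.2, 4.8, 7.5),
                  robbery=c(141.4, 80.9, 144.4))

head(toy, 2)   # first two rows, like toy[1:2,]
dim(toy)       # number of rows, then number of columns
names(toy)     # column names
</pre>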
<h2>Step 3. Make some faces</h2>
<p>Once the data is in, it's actually really easy to make some faces using the <code>faces()</code> function from the <code>aplpack</code> package. So far we've only installed the package, so now we'll load it:</p>
<p><code>library(aplpack)</code></p>
<p>If you get errors when you try to load, you might want to check to see if you installed the package correctly.</p>
<p>Okay, let's make some faces:</p>
<p><code>faces(crime[,2:8])</code></p>
<p>Here we're telling R to use the <code>faces()</code> function, using columns 2 through 8 of our crime data. Remember, the first column is state name. You get something that <a href="http://flowingdata.com/?attachment_id=11249">looks like this</a>:</p>
<p><span class="tip">Default Chernoff Faces using <code>faces()</code></span><img src="http://flowingdata.com/wp-content/uploads/2010/08/Chernoff-faces-showing-crime-part-1-550x605.png" alt="" title="Chernoff faces showing crime - part 1" width="550" height="605" class="alignnone size-medium wp-image-11249" /></p>
<h2>Step 4. Change Features</h2>
<p>This is pretty much what we want except for two things. The first is that the faces are labeled with numbers. That isn't of much use without a key. The second is that some of the faces are smiling. For more positive datasets like quality of life or baseball stats, that would make sense. The higher the value, the better. This is crime data though. The higher the value, the worse. Smiles for rate of larceny theft don't seem quite right.</p>
<p>Unfortunately, the <code>faces()</code> function doesn't let us choose what face parts to associate with each metric, so we need to find a workaround. According to the documentation (view by typing <code>?faces</code>), the curve of the smile is applied to the sixth column in the input matrix, which is columns 2 through 8 of <code>crime</code> in this case.</p>
<p>Ah. Here's what we'll do. We make the sixth column in our data all the same value. That way all smile curves will be neutral. Here's how we can do that:</p>
<p><code>crime_filled <- cbind(crime[,1:6], rep(0, length(crime$state)), crime[,7:8])</code></p>
<p>The <code>cbind()</code> function combines multiple columns side by side. In the above, we combine the first six columns of <code>crime</code>, stick in a column of zeros whose length matches the number of rows in our crime data, and then we end with the last two columns in <code>crime</code>. We save the result into a variable called <code>crime_filled</code>. As in Step 2, you can type the following to see the first rows of <code>crime_filled</code>.</p>
<p><code>crime_filled[1:6,]</code></p>
<p>Notice the new column of zeros?</p>
<p>Now use <code>faces()</code> with <code>crime_filled</code>: </p>
<p><code>faces(crime_filled[,2:8])</code></p>
<p>We get similar faces, but with no more smiles:</p>
<p><span class="tip">Using different features to indicate variables</span><img src="http://flowingdata.com/wp-content/uploads/2010/08/Chernoff-faces-with-no-smiles-550x613.png" alt="" title="Chernoff faces with no smiles" width="550" height="613" class="alignnone size-medium wp-image-11255" /></p>
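<p>If you want to convince yourself that the zeros landed where the smile gets read, here's a quick check on a made-up data frame with the same layout (one label column plus seven numeric columns). The names are hypothetical. It also suggests that the last original column falls outside of <code>2:8</code> once the zeros are inserted, which you may or may not want.</p>
<pre class="brush: r; title: ; notranslate"># Made-up data frame with a label column plus seven numeric columns,
# mirroring the layout of the crime data
toy <- data.frame(state=c("A", "B", "C"),
                  v1=1:3, v2=4:6, v3=7:9, v4=10:12,
                  v5=13:15, v6=16:18, v7=19:21)

# Same construction as above: first six columns, zeros, last two columns
toy_filled <- cbind(toy[,1:6], rep(0, length(toy$state)), toy[,7:8])

# The input to faces() would be columns 2 through 8
input <- toy_filled[,2:8]

is.data.frame(toy_filled)   # cbind() keeps it a data frame here
all(input[,6] == 0)         # sixth input column is constant
ncol(toy_filled)            # nine columns, so the last one is left out of 2:8
</pre>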
<h2>Step 5. Add labels</h2>
<p>Instead of numbers, it'd be much more useful to include state names. Easy.</p>
<p><code>faces(crime_filled[,2:8], labels=crime_filled$state)</code></p>
<p>It's the same as before, but we use the <code>labels</code> argument with the <code>state</code> column in <code>crime_filled</code> to label the faces with state names.</p>
<p><span class="tip">Add state name labels so it's not so ambiguous.</span><img src="http://flowingdata.com/wp-content/uploads/2010/08/Chernoff-faces-with-state-names-550x571.png" alt="" title="Chernoff faces with state names" width="550" height="571" class="alignnone size-medium wp-image-11258" /></p>
<p>Much more useful now. We can easily associate the faces with a state. It's a little cluttered, but we can fix that up easy in Illustrator.</p>
<h2>Step 6. Fix up in Illustrator (optional)</h2>
<p>You can pretty much stop here if you like, but as most of you know, I like to save the image as a PDF, bring it into <a href="http://www.amazon.com/gp/product/B003B32AOW?ie=UTF8&tag=flowingdata-20&linkCode=as2&camp=1789&creative=390957&creativeASIN=B003B32AOW">Adobe Illustrator</a> (aff), and clean things up to make it more readable. You can also try <a href="http://www.inkscape.org/">Inkscape</a>, the open-source alternative, although I've never tried it.</p>
<p>After some label cleanup and some annotation, <a href="http://flowingdata.com/?attachment_id=11265">here's our final result</a>. What's going on there Washington, D.C.?</p>
<p><span class="tip">Uncluttered labels, unambiguous features, and cited data source</span><a href="http://flowingdata.com/2010/08/31/how-to-visualize-data-with-cartoonish-faces/crime-chernoff-faces-by-state-edited-2/" rel="attachment wp-att-20488"><img src="http://flowingdata.com/wp-content/uploads/2010/08/Crime-Chernoff-Faces-by-state-edited1-625x871.gif" alt="" title="Crime Chernoff Faces by state edited" width="625" height="871" class="alignnone size-medium wp-image-20488" /></a></p>
<p>Not too bad, right?</p>
<p>Read the <a href="http://bm2.genes.nig.ac.jp/RGM2/R_current/library/aplpack/man/faces.html">R documentation</a> on <code>faces()</code> for more details on what else you can do with the function. Remember, documentation is your friend when it comes to making full use of R.</p>
<p>Now go on. Have some fun with your new Chernoff toy.</p>
<p>For more examples, guidance, and all-around data goodness like this, <a href="http://book.flowingdata.com/">order Visualize This</a>, the FlowingData book on visualization, statistics, and design.</p>
]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2010/08/31/how-to-visualize-data-with-cartoonish-faces/feed/</wfw:commentRss>
		<slash:comments>21</slash:comments>
		</item>
		<item>
		<title>How to: make a scatterplot with a smooth fitted line</title>
		<link>http://flowingdata.com/2010/03/29/how-to-make-a-scatterplot-with-a-smooth-fitted-line/</link>
		<comments>http://flowingdata.com/2010/03/29/how-to-make-a-scatterplot-with-a-smooth-fitted-line/#comments</comments>
		<pubDate>Mon, 29 Mar 2010 08:20:49 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[LOESS]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=6479</guid>
		<description><![CDATA[<p><a href="http://flowingdata.com/2010/03/29/how-to-make-a-scatterplot-with-a-smooth-fitted-line/"><img width="625" height="425" src="http://flowingdata.com/wp-content/uploads/2012/01/loessplot-revised.png" class="attachment-medium wp-post-image" alt="Loess Plot" title="Loess Plot" /></a></p>Oftentimes, you'll want to fit a line to a bunch of data points. This tutorial will show you how to do that quickly and easily using open-source software, R.]]></description>
			<content:encoded><![CDATA[<p><a href="http://flowingdata.com/2010/03/29/how-to-make-a-scatterplot-with-a-smooth-fitted-line/"><img width="625" height="425" src="http://flowingdata.com/wp-content/uploads/2012/01/loessplot-revised.png" class="attachment-medium wp-post-image" alt="Loess Plot" title="Loess Plot" /></a></p><p>Maybe you have observations over time or it might be two variables that are possibly related. In either case, a scatter plot just might not be enough to see something useful. A fitted line can let you see a trend or relationship more easily.</p>
<p>As an example, we'll take a look at monthly unemployment data, from 1948 to February this year, according to the <a href="http://www.bls.gov/bls/unemployment.htm">Bureau of Labor Statistics</a>.</p>
<h2>What LOESS is</h2>
<p>First, let's briefly go over what we're actually doing with this loess thing. LOESS is short for local regression, and you'll also see it described as locally weighted scatterplot smoothing (LOWESS). William Cleveland introduced the approach in 1979 and <a href="http://www.econ.pdx.edu/faculty/KPL/readings/cleveland88.pdf">developed</a> [pdf] it further in 1988 with Susan Devlin. It's a way to fit a curve to a dataset.</p>
<p>If we plot unemployment without any lines or anything fancy, it looks like this:</p>
<p><span class="tip">Dot plot showing unemployment over time</span><img src="http://flowingdata.com/wp-content/uploads/2010/03/plain-plot.png" alt="" title="plain-plot" width="545" height="322" class="alignnone size-full wp-image-6592" /></p>
<p>Most of us are familiar with fitting just a plain old straight line. The end result is a slope and an intercept. You know the whole <em>y=mx + b</em> equation back from middle school?</p>
<p><span class="tip">Scatterplot with a linear fit, <em>y = mx + b</em></span><img src="http://flowingdata.com/wp-content/uploads/2010/03/linear-fit.png" alt="" title="linear-fit" width="545" height="322" class="alignnone size-full wp-image-6594" /></p>
<p>So without going into the nitty-gritty, the above fit looks at all the data and then fits a line. Loess, however, moves along the dataset and looks at chunks at a time, fitting a bunch of smaller lines that connect to make one smooth line.</p>
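<p>To make the difference concrete, here's a rough sketch on simulated data (not the unemployment numbers). It fits one straight line with <code>lm()</code> and a local fit with <code>loess()</code>. The <code>span</code> value is an arbitrary choice for illustration.</p>
<pre class="brush: r; title: ; notranslate"># Simulated data for illustration: a noisy curve
set.seed(1)
x <- 1:200
y <- sin(x / 30) + rnorm(200, sd=0.3)

# One straight line fit to all of the data
linear_fit <- lm(y ~ x)

# Local fits stitched into a smooth curve; span sets how much
# of the data each local fit uses
loess_fit <- loess(y ~ x, span=0.5)

plot(x, y, col="#CCCCCC")
abline(linear_fit, lty=2)
lines(x, predict(loess_fit))
</pre>
<p>The dashed line is the single linear fit, and the solid curve is the loess fit that bends with the data.</p>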
<p>Alright, enough background. On to the how-to.</p>
<h3>Step 0. Download R</h3>
<p>You've already done this, right? If not, you can download it for <a href="http://cran.stat.ucla.edu/bin/windows/base/">Windows</a>, <a href="http://cran.stat.ucla.edu/bin/macosx/">Mac</a>, or <a href="http://cran.stat.ucla.edu/bin/linux/">Linux</a>. Don't let the outdated site fool you. You can get a lot done with the free software, and it'll be a simple one-click install for most.</p>
<h3>Step 1. Load the data</h3>
<p>Like I said, I got the data from the Bureau of Labor Statistics. You can download it <a href="http://datasets.flowingdata.com/unemployment-rate-1948-2010.csv">here</a> in CSV format if you like, but we'll load it directly into R with the following:</p>
<p><code>unemployment &lt;- read.csv("http://datasets.flowingdata.com/unemployment-rate-1948-2010.csv", sep=",")</code></p>
<p>You're basically telling R to load data from the given URL into the <code>unemployment</code> variable, with columns separated by commas.</p>
<p>Once it's loaded, take a brief look by typing <code>unemployment[1:10,]</code>. Your screen will look something like this:</p>
<p><span class="tip">As usual, you load your data in R before you start anything else</span><img src="http://flowingdata.com/wp-content/uploads/2010/03/datashot-545x508.png" alt="" title="datashot" width="545" height="508" class="alignnone size-medium wp-image-6601" /></p>
<p>There are four columns, but we're actually just going to use that last one: <code>Value</code>.</p>
<h3>Step 2. Time to plot</h3>
<p>Yup, it's already time to make the scatterplot with fitted curve:</p>
<p><code>scatter.smooth(x=1:length(unemployment$Value), y=unemployment$Value)</code></p>
<p>Since we're only looking at unemployment, the x-axis is just a sequence from 1 to the total number of observations. Here's what the above line will give you.</p>
<p><span class="tip">Fit a LOESS curve to the dots</span><img src="http://flowingdata.com/wp-content/uploads/2010/03/loessplot.png" alt="" title="loessplot" width="542" height="327" class="alignnone size-full wp-image-6604" /></p>
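<p>How closely the curve follows the dots depends on the <code>span</code> argument. Here's a sketch on a simulated series (a stand-in for the unemployment values) with two arbitrary span values. <code>loess.smooth()</code> returns the curve's coordinates if you want them without plotting.</p>
<pre class="brush: r; title: ; notranslate"># Simulated series standing in for the unemployment values
set.seed(2)
value <- 5 + cumsum(rnorm(300, sd=0.2))
index <- 1:length(value)

# Smaller span follows the data more closely; larger span is smoother
scatter.smooth(x=index, y=value, span=0.2)
scatter.smooth(x=index, y=value, span=0.9)

# Same fit as the first plot, returned as x and y coordinates
curve_points <- loess.smooth(index, value, span=0.2)
</pre>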
<p>Not bad, right? Two lines of code, and you've already got your plot. We can do a little better though. Let's fix it up a bit.</p>
<h3>Step 3. Modify axis limits</h3>
<p>It's usually a good idea to start your values axis at zero if you can. The above graph doesn't start at zero, so let's fix that using the <code>ylim</code> argument to make it go from 0 to 11.</p>
<p><code>scatter.smooth(x=1:length(unemployment$Value), y=unemployment$Value, ylim=c(0,11))</code></p>
<p><span class="tip">Update the axes to start at zero</span><img src="http://flowingdata.com/wp-content/uploads/2010/03/loess2.png" alt="" title="loess2" width="543" height="326" class="alignnone size-full wp-image-6607" /></p>
<p>That's a little better. Now let's do something about the color.</p>
<h3>Step 4. Modify colors</h3>
<p>I want the curve to stand out some more. Everything blends together as it is now. We'll use the <code>col</code> argument to change the dots to light gray:</p>
<p><code>scatter.smooth(x=1:length(unemployment$Value), y=unemployment$Value, ylim=c(0,11), col="#CCCCCC")</code></p>
<p><span class="tip">Make the fitted line the point of interest and put dots in the background</span><img src="http://flowingdata.com/wp-content/uploads/2010/03/loess-light.png" alt="" title="loess-light" width="543" height="325" class="alignnone size-full wp-image-6608" /></p>
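<p>You can also style the curve itself. Assuming a reasonably recent version of R, <code>scatter.smooth()</code> accepts an <code>lpars</code> list that gets passed along to the line. The data and colors below are made up for the sketch.</p>
<pre class="brush: r; title: ; notranslate"># Simulated series for illustration only
set.seed(3)
value <- 5 + cumsum(rnorm(300, sd=0.2))

# col applies to the dots; lpars carries settings for the fitted line
scatter.smooth(x=1:length(value), y=value, col="#CCCCCC",
               lpars=list(col="#821122", lwd=2))
</pre>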
<h3>Step 5. Save as PDF and do whatever</h3>
<p>So at this point, you can fuss around with arguments to tweak. Just type <code>?scatter.smooth</code> to read documentation on the function. As many of you know though, I like to take it into Adobe Illustrator at this point. This just happens to be what works for me. There are lots of ways to edit PDF files.</p>
<p>Anyways, after some color changes, and label cleanup, we're done.</p>
<p><span class="tip">Title, color, cite, and fonts</span><img src="http://flowingdata.com/wp-content/uploads/2010/03/loessplot-final.png" alt="" title="US Unemployment" width="545" height="377" class="alignnone size-full wp-image-6612" /></p>
<p>Tada. And it only took two lines of code. How about that? Give it a try for yourself, and happy graphing.</p>
<p>For more examples, guidance, and all-around data goodness like this, <a href="http://flowingdata.com/book/">pre-order Visualize This</a>, the upcoming FlowingData book.</p>
]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2010/03/29/how-to-make-a-scatterplot-with-a-smooth-fitted-line/feed/</wfw:commentRss>
		<slash:comments>38</slash:comments>
		</item>
		<item>
		<title>An Easy Way to Make a Treemap</title>
		<link>http://flowingdata.com/2010/02/11/an-easy-way-to-make-a-treemap/</link>
		<comments>http://flowingdata.com/2010/02/11/an-easy-way-to-make-a-treemap/#comments</comments>
		<pubDate>Thu, 11 Feb 2010 08:35:53 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[treemap]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=5299</guid>
		<description><![CDATA[<p><a href="http://flowingdata.com/2010/02/11/an-easy-way-to-make-a-treemap/"><img width="425" height="269" src="http://flowingdata.com/wp-content/uploads/2012/02/Treemap1.png" class="attachment-medium wp-post-image" alt="Treemap" title="Treemap" /></a></p>If your data is a hierarchy, a treemap is a good way to show all the values at once and keep the structure in the visual. This is a quick way to make a treemap in R.]]></description>
			<content:encoded><![CDATA[<p><a href="http://flowingdata.com/2010/02/11/an-easy-way-to-make-a-treemap/"><img width="425" height="269" src="http://flowingdata.com/wp-content/uploads/2012/02/Treemap1.png" class="attachment-medium wp-post-image" alt="Treemap" title="Treemap" /></a></p><p>Back in 1990, <a href="http://www.cs.umd.edu/~ben/">Ben Shneiderman</a>, of the University of Maryland, <a href="http://www.cs.umd.edu/hcil/treemap-history/index.shtml">wanted to visualize</a> what was going on in his always-full hard drive. He wanted to know what was taking up so much space. Given the hierarchical structure of directories and files, he first tried a <a href="http://en.wikipedia.org/wiki/Tree_%28data_structure%29">tree diagram</a>. It got too big too fast to be useful though. Too many nodes. Too many branches.</p>
<p>The <a href="http://en.wikipedia.org/wiki/Treemapping">treemap</a> was his solution. It's an area-based visualization where the size of each rectangle represents a metric. It has since been made popular by Martin Wattenberg's <a href="http://www.smartmoney.com/map-of-the-market/">Map of the Market</a> and Marcos Weskamp's <a href="http://newsmap.jp/">newsmap</a>.</p>
<p>Here's a really easy way to make your own treemap in just a couple lines of code. We're looking to make something like the above.</p>
<h3>Step 0. Download R</h3>
<p>Like before, we're going to use R, so you'll want to get it before going any further. Download it for <a href="http://cran.stat.ucla.edu/bin/windows/base/">Windows</a>, <a href="http://cran.stat.ucla.edu/bin/macosx/">Mac</a>, or <a href="http://cran.stat.ucla.edu/bin/linux/">Linux</a>. Don't let the outdated site fool you. You can get a lot done with the free software.</p>
<h3>Step 1. Load the Data</h3>
<p>We'll use data covering a hundred popular posts on FlowingData. Here it is in <a href="http://datasets.flowingdata.com/post-data.txt">CSV format</a>. You don't have to download it though. We'll just load it directly into R. The main thing to take note of is what is there. There's post id, number of views, number of comments, and category.</p>
<p>Okay, let's load it into R using <code>read.csv()</code>:</p>
<p><code>data &lt;- read.csv("http://datasets.flowingdata.com/post-data.txt")</code></p>
<p><span class="tip">Loading data in CSV format into R.</span><img class="alignnone size-medium wp-image-5333" title="step1" src="http://flowingdata.com/wp-content/uploads/2010/02/step1-545x477.png" alt="" width="545" height="477" /></p>
<p>Easy enough. We just used the <code>read.csv()</code> function to load data from a URL. If your data is on your computer, you could also do something like <code>data &lt;- read.csv("post-data.txt")</code>. Just make sure the data file is in your current working directory, which you can change via the "Miscellaneous" menu.</p>
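<p>You can also check and change the working directory from the console. In this sketch, <code>tempdir()</code> is just a placeholder for whatever folder holds your data file.</p>
<pre class="brush: r; title: ; notranslate">getwd()                   # print the current working directory
old <- setwd(tempdir())   # switch directories; setwd() hands back the previous one
list.files()              # see which files R can find by name from here
setwd(old)                # switch back
</pre>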
<h3>Step 2. Load the Portfolio Library</h3>
<p>Only a few more lines of code, and you've got a treemap. It's so easy, because we're going to use the <code>portfolio</code> library in R. First, you have to install it. You can either install the library via the "Package Installer" or you can do it through the command line. Let's do the latter. Type this in the console to install <code>portfolio</code>:</p>
<p><code>install.packages("portfolio")</code></p>
<p>Once installed, load it into R:</p>
<p><code>library(portfolio)</code></p>
<h3>Step 3. Make the Treemap</h3>
<p>It's time to make the treemap with <code>map.market()</code>. Type this in the console:</p>
<p><code>map.market(id=data$id, area=data$views, group=data$category, color=data$comments, main="FlowingData Map")</code></p>
<p>Tada. You should get something like this:</p>
<p><span class="tip">The default treemap uses a red-green color scale.</span><img class="alignnone size-medium wp-image-5314" title="original" src="http://flowingdata.com/wp-content/uploads/2010/02/original-545x471.png" alt="" width="545" height="471" /></p>
<p>To sum up, we did this with four lines of code:</p>
<p><code>data &lt;- read.csv("http://datasets.flowingdata.com/post-data.txt")<br />
install.packages("portfolio")<br />
library(portfolio)<br />
map.market(id=data$id, area=data$views, group=data$category, color=data$comments, main="FlowingData Map")</code></p>
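<p>If you want a rough idea of how big each group will be before you draw anything, you can total the area variable by group. Here's a sketch on made-up rows in the same shape as the post data. The values and category names are placeholders.</p>
<pre class="brush: r; title: ; notranslate"># Made-up rows in the same shape as the post data
posts <- data.frame(id=1:4,
                    views=c(100, 300, 50, 150),
                    comments=c(5, 20, 1, 8),
                    category=c("Maps", "Maps", "Design", "Design"))

# Total views per category, roughly what each group's rectangles add up to
group_views <- tapply(posts$views, posts$category, sum)
group_views
</pre>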
<h3>Step 4. Customize</h3>
<p>Now maybe you want to modify something like color. The cool thing about R is that you can see the code for all the functions, edit it, and then use your customized version. If the green and red scheme isn't for you or you don't care about the positive/negative cutoff, then you can change the code to do that. I won't go into detail, but if you type <code>map.market</code> in the console, you'll see the function. You can change color or cutoff around lines 36-46.</p>
<p>For example, you can do a black and white color scheme:</p>
<p><span class="tip">You don't have to stick to the default color scale though.</span><img class="alignnone size-medium wp-image-5340" title="black-white" src="http://flowingdata.com/wp-content/uploads/2010/02/black-white-545x453.png" alt="" width="545" height="453" /></p>
<p>I was alright with the green for this, so I saved it as a PDF and then loaded it into Illustrator as usual. I numbed the green some, cleaned up the labels with a new font and layout, and updated the legend.</p>
<p><span class="tip">Touched up version of treemap with black-green color scale.</span><img class="alignnone size-full wp-image-5343" title="treemap-revised" src="http://flowingdata.com/wp-content/uploads/2010/02/treemap-revised1.gif" alt="" width="545" height="446" /></p>
<p>And there you go - a treemap with just a few lines of code in our all-trusty R. Rinse and repeat with your own data.</p>
<p>For more examples, guidance, and all-around data goodness like this, <a href="http://book.flowingdata.com/">order Visualize This</a>, the FlowingData book on visualization, design, and statistics.</p>
]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2010/02/11/an-easy-way-to-make-a-treemap/feed/</wfw:commentRss>
		<slash:comments>49</slash:comments>
		</item>
		<item>
		<title>How to Make a Heatmap &#8211; a Quick and Easy Solution</title>
		<link>http://flowingdata.com/2010/01/21/how-to-make-a-heatmap-a-quick-and-easy-solution/</link>
		<comments>http://flowingdata.com/2010/01/21/how-to-make-a-heatmap-a-quick-and-easy-solution/#comments</comments>
		<pubDate>Thu, 21 Jan 2010 11:50:44 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[heatmap]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=4884</guid>
		<description><![CDATA[<p><a href="http://flowingdata.com/2010/01/21/how-to-make-a-heatmap-a-quick-and-easy-solution/"><img width="625" height="327" src="http://flowingdata.com/wp-content/uploads/2012/01/heatmap-625x327.png" class="attachment-medium wp-post-image" alt="heatmap" title="heatmap" /></a></p>A heatmap is a literal way of visualizing a table of numbers, where you substitute the numbers with colored cells. This is a quick way to make one in R.]]></description>
			<content:encoded><![CDATA[<p><a href="http://flowingdata.com/2010/01/21/how-to-make-a-heatmap-a-quick-and-easy-solution/"><img width="625" height="327" src="http://flowingdata.com/wp-content/uploads/2012/01/heatmap-625x327.png" class="attachment-medium wp-post-image" alt="heatmap" title="heatmap" /></a></p><h3>The Heatmap</h3>
<p>In case you don't know what a heatmap is, it's basically a table that has colors in place of numbers. Colors correspond to the level of the measurement. Each column can be a different metric like above, or it can be all the same like <a href="http://online.wsj.com/article/SB125993225142676615.html#articleTabs%3Dinteractive">this one</a>. It's useful for finding highs and lows and sometimes, patterns.</p>
<p>On to the tutorial.</p>
<h3>Step 0. Download R</h3>
<p>We're going to use <a href="http://r-project.org">R</a> for this. It's a statistical computing language and environment, and it's free. Get it for <a href="http://cran.stat.ucla.edu/bin/windows/base/">Windows</a>, <a href="http://cran.stat.ucla.edu/bin/macosx/">Mac</a>, or <a href="http://cran.stat.ucla.edu/bin/linux/">Linux</a>. It's a simple one-click install for Windows and Mac. I've never tried Linux.</p>
<p>Did you download and install R? Okay, let's move on. </p>
<h3>Step 1. Load the data</h3>
<p>Like all visualization, you should start with the data. No data? No visualization for you.</p>
<p>For this tutorial, we'll use NBA basketball statistics from last season that I downloaded from <a href="http://databasebasketball.com">databaseBasketball</a>. I've made it available <a href="http://datasets.flowingdata.com/ppg2008.csv">here</a> as a CSV file. You don't have to download it though. R can do it for you.</p>
<p>I'm assuming you started R already. You should see a blank window.</p>
<p><span class="tip">Initial R window when you open it. Exciting, I know.</span><img src="http://flowingdata.com/wp-content/uploads/2010/01/1Rconsole-545x473.png" alt="" title="1Rconsole" width="545" height="473" class="alignnone size-medium wp-image-4895" /></p>
<p>Now we'll load the data using <code>read.csv()</code>.</p>
<p><code>nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv", sep=",")<br />
</code></p>
<p>We've read a CSV file from a URL and specified the field separator as a comma. The data is stored in <code>nba</code>.</p>
<p>Type <code>nba</code> in the window, and you can see the data.</p>
<p><span class="tip">What the data looks like when you load it into R</span><img src="http://flowingdata.com/wp-content/uploads/2010/01/2load-545x473.png" alt="" title="2load" width="545" height="473" class="alignnone size-medium wp-image-4897" /></p>
<h3>Step 2. Sort data</h3>
<p>The data is sorted by points per game, greatest to least. Let's make it the other way around so that it's least to greatest.</p>
<p><code>nba <- nba[order(nba$PTS),]<br />
</code></p>
<p>We could just as easily have chosen to order by assists, blocks, etc. </p>
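<p>If <code>order()</code> is new to you: it returns the row positions that would put a column in ascending order, and indexing with those positions rearranges the whole table. Here is a minimal sketch of the same idea in Python rather than R. The three players and their numbers are made up for illustration and are not from the dataset, and the <code>order</code> helper is a stand-in written for this example.</p>

```python
# Hypothetical rows standing in for the nba data frame (values are made up).
rows = [
    {"Name": "Player A", "PTS": 30.2, "AST": 7.5},
    {"Name": "Player B", "PTS": 22.6, "AST": 11.0},
    {"Name": "Player C", "PTS": 25.9, "AST": 2.4},
]

def order(values):
    """Return the positions that sort `values` ascending (0-based, unlike R)."""
    return sorted(range(len(values)), key=lambda i: values[i])

# Same spirit as nba[order(nba$PTS), ]: compute positions, then index with them.
positions = order([row["PTS"] for row in rows])
by_points = [rows[i] for i in positions]

# Ordering by another column only changes which values are passed in.
by_assists = [rows[i] for i in order([row["AST"] for row in rows])]

print([row["Name"] for row in by_points])
```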
<h3>Step 3. Prepare data</h3>
<p>As is, the column names match the CSV file's header. That's what we want.</p>
<p>But we also want to name the rows by player name instead of row number, so type this in the window:</p>
<p><code>row.names(nba) <- nba$Name<br />
</code></p>
<p>Now the rows are named by player, and we don't need the first column anymore so we'll get rid of it:</p>
<p><code>nba <- nba[,2:20]<br />
</code></p>
<h3>Step 4. Prepare data, again</h3>
<p>Are you noticing something here? It's important to note that a lot of visualization involves gathering and preparing data. Rarely do you get data exactly how you need it, so you should expect to do some data munging before the visuals. Anyway, moving on.</p>
<p>The data was loaded into a data frame, but it has to be a data matrix to make your heatmap. The difference between a frame and a matrix is not important for this tutorial. You just need to know how to change it.</p>
<p><code>nba_matrix <- data.matrix(nba)<br />
</code></p>
<h3>Step 5. Make a heatmap</h3>
<p>It's time for the finale. In just one line of code, build the heatmap (remove the line break):</p>
<p><code>nba_heatmap <- heatmap(nba_matrix, Rowv=NA, Colv=NA, <br />
        col = cm.colors(256), scale="column", margins=c(5,10))<br />
</code></p>
<p>You should get a heatmap that looks something like this:</p>
<p><span class="tip">Default cyan to purple heatmap</span><img src="http://flowingdata.com/wp-content/uploads/2010/01/3heatmap-545x565.png" alt="" title="3heatmap" width="545" height="565" class="alignnone size-medium wp-image-4902" /></p>
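<p>A quick note on <code>scale="column"</code>: points per game and shooting percentages live on very different ranges, so the values are standardized within each column before colors are assigned. As far as I know, that means subtracting the column mean and dividing by the column's standard deviation. The sketch below shows that calculation in Python rather than R, on made-up numbers. It illustrates the idea and is not R's actual code.</p>

```python
from statistics import mean, stdev

def scale_columns(matrix):
    """Center each column on 0 and divide by its sample standard deviation."""
    columns = list(zip(*matrix))
    scaled_columns = []
    for column in columns:
        center = mean(column)
        spread = stdev(column)  # sample standard deviation (divides by n - 1)
        scaled_columns.append([(value - center) / spread for value in column])
    # Transpose back so each inner list is a row again.
    return [list(row) for row in zip(*scaled_columns)]

# Made-up values: first column like points per game, second like a percentage.
matrix = [
    [30.0, 0.40],
    [20.0, 0.50],
    [10.0, 0.45],
]
scaled = scale_columns(matrix)
print(scaled)
```

<p>After scaling, the highest value in each column gets a similar color no matter what units the column started in. A column where every value is identical would divide by zero here, so real code would need to handle that case.</p>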
<h3>Step 6. Color selection</h3>
<p>Maybe you want a different color scheme. Just change the argument to <code>col</code>, which is <code>cm.colors(256)</code> in the line of code we just executed. Type <code>?cm.colors</code> for help on what colors R offers. For example, you could use more heat-looking colors:</p>
<p><code>nba_heatmap <- heatmap(nba_matrix, Rowv=NA, Colv=NA, <br />
        col = heat.colors(256), scale="column", margins=c(5,10))<br />
</code></p>
<p><span class="tip">Changing to heat colors with the <code>col</code> argument</span><img src="http://flowingdata.com/wp-content/uploads/2010/01/4heat-545x564.png" alt="" title="4heat" width="545" height="564" class="alignnone size-medium wp-image-4913" /></p>
<p>For the heatmap at the beginning of this post, I used the <a href="http://cran.r-project.org/web/packages/RColorBrewer/index.html">RColorBrewer library</a>. Really, you can choose any color scheme you want. The <code>col</code> argument accepts any vector of hexadecimal-coded colors.</p>
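<p>To make "a vector of hexadecimal-coded colors" concrete, here is a small sketch that builds one by blending evenly between two endpoint colors. It's in Python rather than R, and the <code>hex_ramp</code> name and the white-to-blue endpoints are arbitrary choices for illustration. In R itself, <code>colorRampPalette()</code> does this kind of job.</p>

```python
def hex_ramp(start, end, steps):
    """Return `steps` hex colors evenly spaced from `start` to `end` (RGB tuples)."""
    if steps == 1:
        return ["#%02X%02X%02X" % start]
    colors = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the first color, 1.0 at the last
        channels = tuple(round(a + (b - a) * t) for a, b in zip(start, end))
        colors.append("#%02X%02X%02X" % channels)
    return colors

# White to a dark blue in five steps (endpoint colors are arbitrary).
ramp = hex_ramp((255, 255, 255), (8, 48, 107), 5)
print(ramp)
```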
<h3>Step 7. Clean it up - optional</h3>
<p>If you're using the heatmap to simply see what your data looks like, you can probably stop. But if it's for a report or presentation, you'll probably want to clean it up. You can fuss around with the options in R or you can save the graphic as a PDF and then import it into your favorite illustration software.</p>
<p>I personally use <a href="http://www.amazon.com/gp/product/B001EUDJWQ?ie=UTF8&tag=flowingdata-20&linkCode=as2&camp=1789&creative=390957&creativeASIN=B001EUDJWQ">Adobe Illustrator</a>, but you might prefer <a href="http://www.inkscape.org/">Inkscape</a>, the open source (free) solution. Illustrator is kind of expensive, but you can probably find an old version on the cheap. I still use CS2. Adobe's up to CS4 already.</p>
<p>For the final basketball graphic, I used a blue color scheme from RColorBrewer and then lightened the blue shades, added a white border, changed the font, and organized the labels in Illustrator. Voila.</p>
<p><span class="tip">Updated heatmap in Illustrator with clearer labels and a blue-white color scale</span><img src="http://flowingdata.com/wp-content/uploads/2010/01/nba_heatmap_revised.png" alt="" title="Finished heatmap" width="545" height="861" class="alignnone size-full wp-image-4915" /></p>
<p>Rinse and repeat to use with your own data. Have fun heatmapping.</p>
<p>For more examples, guidance, and all-around data goodness like this, <a href="http://flowingdata.com/book/">buy Visualize This</a>, the new FlowingData book.</p>
]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2010/01/21/how-to-make-a-heatmap-a-quick-and-easy-solution/feed/</wfw:commentRss>
		<slash:comments>60</slash:comments>
		</item>
		<item>
		<title>How to Make an Interactive Area Graph with Flare</title>
		<link>http://flowingdata.com/2009/12/09/how-to-make-an-interactive-area-graph/</link>
		<comments>http://flowingdata.com/2009/12/09/how-to-make-an-interactive-area-graph/#comments</comments>
		<pubDate>Wed, 09 Dec 2009 08:55:33 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Statistical Visualization]]></category>
		<category><![CDATA[Tutorials]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=4165</guid>
		<description><![CDATA[<a href="http://flowingdata.com/2009/12/09/how-to-make-an-interactive-area-graph/" title="How to Make an Interactive Area Graph with Flare"><img src="http://flowingdata.com/wp-content/uploads/yapb_cache/orange_area.bp1nwgwg0sg00wc8gs08w0ow8.ei3320h1mlkos0g4gc0scg40c.th.png" width="550" height="401" alt="How to Make an Interactive Area Graph with Flare" ></a><p><a href="http://flowingdata.com/2009/12/09/how-to-make-an-interactive-area-graph/"><img width="625" height="455" src="http://flowingdata.com/wp-content/uploads/2012/02/flare-graph.png" class="attachment-medium wp-post-image" alt="flare graph" title="flare graph" /></a></p>You've seen the NameExplorer from the Baby Name Wizard by Martin Wattenberg. It's an interactive area chart that lets you &#8230;]]></description>
			<content:encoded><![CDATA[<a href="http://flowingdata.com/2009/12/09/how-to-make-an-interactive-area-graph/" title="How to Make an Interactive Area Graph with Flare"><img src="http://flowingdata.com/wp-content/uploads/yapb_cache/orange_area.bp1nwgwg0sg00wc8gs08w0ow8.ei3320h1mlkos0g4gc0scg40c.th.png" width="550" height="401" alt="How to Make an Interactive Area Graph with Flare" ></a><p><a href="http://flowingdata.com/2009/12/09/how-to-make-an-interactive-area-graph/"><img width="625" height="455" src="http://flowingdata.com/wp-content/uploads/2012/02/flare-graph.png" class="attachment-medium wp-post-image" alt="flare graph" title="flare graph" /></a></p><p>You've seen the <a href="http://www.babynamewizard.com/voyager">NameExplorer</a> from the Baby Name Wizard by Martin Wattenberg. It's an interactive area chart that lets you explore the popularity of names over time. Search by clicking on names or typing in a name in the prompt. It's simple. It's sexy. Everybody loves it.</p>
<p>This is a step-by-step guide on how to make a similar visualization in Actionscript/Flash with your own data and how to customize the design for whatever you need. We're after last week's <a href="http://projects.flowingdata.com/america/spending/">graphic</a> on consumer spending (above).</p>
<h3>Audience</h3>
<p>This tutorial is for people with at least a little bit of programming experience. I'll try to make it as straightforward as possible, but the concepts might be a little hard to grasp if you've never written a line of code. Just a heads up. Of course it never hurts to try.</p>
<p>If you don't care about customization or integration into an application and don't mind putting your data in the public domain, you could also just dump your data into Many Eyes, and use the <a href="http://manyeyes.alphaworks.ibm.com/manyeyes/page/Stack_Graph.html">Stack Graph</a>.</p>
<h3>Get Adobe Flex Builder</h3>
<p>Like I said, this is all in Actionscript, so before we start anything, I strongly recommend you get <a href="http://www.amazon.com/gp/product/B0014A4G5U?ie=UTF8&amp;tag=flowingdata-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=B0014A4G5U">Adobe Flex Builder</a> if you don't already have it. You can buy it, get a trial version from the Adobe site, or if you're in education, you can <a href="https://freeriatools.adobe.com/flex/">get it for free</a>.</p>
<p>There are <a href="http://opensource.adobe.com/wiki/display/flexsdk/Downloads">ways</a> to compile Actionscript without Flex Builder, but they are more complicated.</p>
<h3>Working With Flare</h3>
<p>Luckily you don't have to start from scratch. In fact, most of the work has already been done for you using Jeffrey Heer's <a href="http://flare.prefuse.org/">Flare visualization toolkit</a>. It's an Actionscript library. We're going to work off one of the sample applications: <a href="http://flare.prefuse.org/apps/job_voyager">JobVoyager</a>. Once you get your development environment set up, it's just a matter of switching in your data, and then customizing the look and feel. </p>
<p>Okay, let's get started (finally).</p>
<h3>Step 0. Download and import Flare</h3>
<p><a href="http://flare.prefuse.org/download">Download</a> the most recent version of Flare, and then unpack the contents into your working directory.</p>
<p>Open Flex Builder. You should see something like this:</p>
<p><img class="alignnone size-medium wp-image-4205" title="Flex Builder Window" src="http://flowingdata.com/wp-content/uploads/2009/12/Picture-1-545x375.png" alt="Flex Builder Window" width="545" height="375" /></p>
<p>Right click on the Flex Navigator (left-hand side) and click on "Import..." You'll get a popup that looks like this:</p>
<p><img class="alignnone size-medium wp-image-4206" title="Picture 2" src="http://flowingdata.com/wp-content/uploads/2009/12/Picture-2-545x567.png" alt="Picture 2" width="545" height="567" /></p>
<p>Select "Existing Projects into Workspace" and click "Next." Browse to where you put the Flare files. Select the "flare" directory, and then make sure "flare" is checked in the project window.</p>
<p><img class="alignnone size-medium wp-image-4208" title="Picture 3" src="http://flowingdata.com/wp-content/uploads/2009/12/Picture-3-545x567.png" alt="Picture 3" width="545" height="567" /></p>
<p>Do the same thing with the "flare.apps" folder. Your Flex Builder window should look like this once you've expanded flare.apps/flare/apps/ and clicked on JobVoyager.as.</p>
<p><img class="alignnone size-medium wp-image-4213" title="JobVoyager Code" src="http://flowingdata.com/wp-content/uploads/2009/12/Picture-5-545x375.png" alt="JobVoyager Code" width="545" height="375" /></p>
<p>If you clicked the run button right now (the green button with the white play triangle top left), you should see the working <a href="http://flare.prefuse.org/apps/job_voyager">JobVoyager</a>. Get that working, and you're done with the hardest part.</p>
<h3>Step 1. Update the data source</h3>
<p>Let's dive into the code now so you can adapt the visualization to your own data and customize the aesthetics. Again, we're going to adapt this code to <a href="http://projects.flowingdata.com/america/spending/expenditures.txt">consumer spending data</a> to make last week's graphic.</p>
<p><img class="alignnone size-full wp-image-4183" title="consumer spending" src="http://flowingdata.com/wp-content/uploads/2009/12/spending.png" alt="consumer spending" width="545" height="578" /></p>
<p>First we need to change the data source. This is specified on <code>line 57</code>.</p>
<pre class="actionscript"><span style="color: #0066CC;">private</span> <span style="color: #000000; font-weight: bold;">var</span> <span style="color: #0066CC;">_url</span>:<span style="color: #0066CC;">String</span> = <span style="color: #ff0000;">&quot;http://flare.prefuse.org/data/jobs.txt&quot;</span>;
&nbsp;</pre>
<p>Change the <code>_url</code> to point at the spending data, which like <code>jobs.txt</code>, is also a tab-delimited file. The first column is <code>year</code>, the second <code>category</code>, and the last column is <code>expenditure</code>:</p>
<pre class="actionscript"><span style="color: #0066CC;">private</span> <span style="color: #000000; font-weight: bold;">var</span> <span style="color: #0066CC;">_url</span>:<span style="color: #0066CC;">String</span> = <span style="color: #ff0000;">&quot;http://datasets.flowingdata.com/expenditures.txt&quot;</span>;
&nbsp;</pre>
<p>Now the file will read in our spending data instead of the jobs data. Easy stuff so far.</p>
<h3>Step 2. Change the years</h3>
<p>The next two lines, <code>line 58</code> and <code>59</code>, are the column names, or in this case, the distinct years for which job data was available. It's by decade from 1850 to 2000. We could make things more robust by finding the years in the loaded data, but since the data isn't changing, we can save some time and explicitly specify the years.</p>
<p>Our expenditures data is annual from 1984 to 2008. We'll change lines 58-59 accordingly.</p>
<pre class="actionscript"><span style="color: #0066CC;">private</span> <span style="color: #000000; font-weight: bold;">var</span> _cols:<span style="color: #0066CC;">Array</span> = <span style="color: #66cc66;">&#91;</span><span style="color: #cc66cc;">1984</span>,<span style="color: #cc66cc;">1985</span>,<span style="color: #cc66cc;">1986</span>,<span style="color: #cc66cc;">1987</span>,<span style="color: #cc66cc;">1988</span>,<span style="color: #cc66cc;">1989</span>,<span style="color: #cc66cc;">1990</span>,<span style="color: #cc66cc;">1991</span>,
		<span style="color: #cc66cc;">1992</span>,<span style="color: #cc66cc;">1993</span>,<span style="color: #cc66cc;">1994</span>,<span style="color: #cc66cc;">1995</span>,<span style="color: #cc66cc;">1996</span>,<span style="color: #cc66cc;">1997</span>,<span style="color: #cc66cc;">1998</span>,<span style="color: #cc66cc;">1999</span>,
		<span style="color: #cc66cc;">2000</span>,<span style="color: #cc66cc;">2001</span>,<span style="color: #cc66cc;">2002</span>,<span style="color: #cc66cc;">2003</span>,<span style="color: #cc66cc;">2004</span>,<span style="color: #cc66cc;">2005</span>,<span style="color: #cc66cc;">2006</span>,<span style="color: #cc66cc;">2007</span>,<span style="color: #cc66cc;">2008</span><span style="color: #66cc66;">&#93;</span>;
&nbsp;</pre>
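<p>If you'd rather not hard-code the years, the more robust route mentioned above is to pull them out of the data. Here's a rough sketch of that idea in Python rather than Actionscript. It assumes a tab-delimited file with a header row that includes a <code>year</code> column, and the sample rows are made up.</p>

```python
import csv
import io

# Made-up sample in the three-column, tab-delimited layout described earlier.
sample = (
    "year\tcategory\texpenditure\n"
    "1984\tFood\t100\n"
    "1984\tHousing\t200\n"
    "1985\tFood\t110\n"
    "1985\tHousing\t210\n"
)

def distinct_years(text):
    """Collect the distinct values of the `year` column, sorted ascending."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return sorted({int(row["year"]) for row in reader})

years = distinct_years(sample)

# When the range is known and has no gaps, it can also be generated directly.
hard_coded = list(range(1984, 2008 + 1))

print(years)
```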
<h3>Step 3. Data headers</h3>
<p>Next we need to change references to the data headers. The original data file (<code>jobs.txt</code>) has four columns: <code>year</code>, <code>occupation</code>, <code>people</code> and <code>sex</code>. Our spending data only has three columns: <code>year</code>, <code>category</code>, and <code>expenditure</code>. We have to adapt the code to this new data structure.</p>
<p>Luckily, it's pretty easy. The <code>year</code> column is the same, so we just need to change any <code>people</code> references to <code>expenditure</code> (vertical axis) and any <code>occupation</code> references to <code>category</code> (the layers). Finally, we'll remove all uses of gender.</p>
<p>At <code>line 74</code> the data is reshaped and prepared for the stacked area chart. It specifies <code>occupation</code> and <code>sex</code> as the categories (i.e. layers), uses <code>year</code> on the x-axis, and <code>people</code> on the y-axis.</p>
<pre class="actionscript"><span style="color: #000000; font-weight: bold;">var</span> dr:<span style="color: #0066CC;">Array</span> = reshape<span style="color: #66cc66;">&#40;</span>ds.<span style="color: #006600;">nodes</span>.<span style="color: #0066CC;">data</span>, <span style="color: #66cc66;">&#91;</span><span style="color: #ff0000;">&quot;occupation&quot;</span>,<span style="color: #ff0000;">&quot;sex&quot;</span><span style="color: #66cc66;">&#93;</span>,
     <span style="color: #ff0000;">&quot;year&quot;</span>, <span style="color: #ff0000;">&quot;people&quot;</span>, _cols<span style="color: #66cc66;">&#41;</span>;
&nbsp;</pre>
<p>Change it to this:</p>
<pre class="actionscript"><span style="color: #000000; font-weight: bold;">var</span> dr:<span style="color: #0066CC;">Array</span> = reshape<span style="color: #66cc66;">&#40;</span>ds.<span style="color: #006600;">nodes</span>.<span style="color: #0066CC;">data</span>, <span style="color: #66cc66;">&#91;</span><span style="color: #ff0000;">&quot;category&quot;</span><span style="color: #66cc66;">&#93;</span>,
     <span style="color: #ff0000;">&quot;year&quot;</span>, <span style="color: #ff0000;">&quot;expenditure&quot;</span>, _cols<span style="color: #66cc66;">&#41;</span>;
&nbsp;</pre>
<p>We only have one category (sans <code>sex</code>), and that's uh, <code>category</code>. The x-axis is still <code>year</code>, and the y-axis is <code>expenditure</code>.</p>
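<p>To make the reshaping step less abstract: the file has one row per year and category, while a stacked area chart wants one record per category with a value for every year. The sketch below shows that kind of pivot in Python rather than Actionscript. It mimics the idea only and is not Flare's <code>reshape()</code> code. The sample values are made up.</p>

```python
def reshape(rows, category_key, dimension_key, measure_key, columns):
    """Pivot long rows into one record per category, with one field per column value."""
    records = {}
    for row in rows:
        name = row[category_key]
        if name not in records:
            # Start every category at 0 for every year so no year is left undefined.
            records[name] = {category_key: name, **{c: 0 for c in columns}}
        if row[dimension_key] in columns:
            records[name][row[dimension_key]] += row[measure_key]
    return list(records.values())

# Made-up rows in the year / category / expenditure layout.
rows = [
    {"year": 1984, "category": "Food", "expenditure": 100},
    {"year": 1985, "category": "Food", "expenditure": 110},
    {"year": 1984, "category": "Housing", "expenditure": 200},
    {"year": 1985, "category": "Housing", "expenditure": 210},
]
wide = reshape(rows, "category", "year", "expenditure", [1984, 1985])
print(wide)
```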
<h3>Step 4. Sorting</h3>
<p><code>Line 84</code> sorts the data by <code>occupation</code> (alphabetically) and then <code>sex</code> (numerically). We'll just sort by <code>category</code>:</p>
<pre class="actionscript"><span style="color: #0066CC;">data</span>.<span style="color: #006600;">nodes</span>.<span style="color: #006600;">sortBy</span><span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;data.category&quot;</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;</pre>
<p>Are you starting to get the idea here?</p>
<h3>Step 5. Proper categories</h3>
<p><code>Line 92</code> colors layers by <code>sex</code>, but we don't have that split in our data, so we don't need to do that. Remove the entire line:</p>
<pre class="actionscript"><span style="color: #0066CC;">data</span>.<span style="color: #006600;">nodes</span>.<span style="color: #0066CC;">setProperty</span><span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;fillHue&quot;</span>, iff<span style="color: #66cc66;">&#40;</span><span style="color: #0066CC;">eq</span><span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;data.sex&quot;</span>,<span style="color: #cc66cc;">1</span><span style="color: #66cc66;">&#41;</span>, <span style="color: #cc66cc;">0.7</span>, <span style="color: #cc66cc;">0</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;</pre>
<p>We'll come back to customizing the colors of the stacks a little later.</p>
<h3>Step 6. Labeling areas</h3>
<p><code>Line 103</code> adds labels based on <code>occupation</code>:</p>
<pre class="actionscript">_vis.<span style="color: #006600;">operators</span>.<span style="color: #0066CC;">add</span><span style="color: #66cc66;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> StackedAreaLabeler<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;data.occupation&quot;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;</pre>
<p>We want to label based on spending <code>category</code>, so we'll change the line accordingly:</p>
<pre class="actionscript">_vis.<span style="color: #006600;">operators</span>.<span style="color: #0066CC;">add</span><span style="color: #66cc66;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> StackedAreaLabeler<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;data.category&quot;</span><span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;</pre>
<h3>Step 7. Interactive filters</h3>
<p><code>Lines 213-231</code> handle filtering in JobVoyager. First there's the male/female filter and then there's the filter by occupation. We don't need the former, so we can get rid of <code>lines 215-218</code> and then make <code>line 219</code> a plain <code>if</code> statement.</p>
<p>Similarly, <code>lines 264-293</code> add buttons to trigger the male/female filter. We can get rid of that too.</p>
<h3>Step 8. Search categories</h3>
<p>We're getting really close to fully customizing the voyager to our spending data. Go back to the <code>filter()</code> function at <code>line 213</code>. Again, we need to update the function so that we can filter by spending <code>category</code> instead of <code>occupation</code>.</p>
<p>Here's <code>line 222</code> as-is:</p>
<pre class="actionscript"><span style="color: #000000; font-weight: bold;">var</span> s:<span style="color: #0066CC;">String</span> = <span style="color: #0066CC;">String</span><span style="color: #66cc66;">&#40;</span>d.<span style="color: #0066CC;">data</span><span style="color: #66cc66;">&#91;</span><span style="color: #ff0000;">&quot;occupation&quot;</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#41;</span>.<span style="color: #0066CC;">toLowerCase</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;</pre>
<p>Change <code>occupation</code> to <code>category</code>:</p>
<pre class="actionscript"><span style="color: #000000; font-weight: bold;">var</span> s:<span style="color: #0066CC;">String</span> = <span style="color: #0066CC;">String</span><span style="color: #66cc66;">&#40;</span>d.<span style="color: #0066CC;">data</span><span style="color: #66cc66;">&#91;</span><span style="color: #ff0000;">&quot;category&quot;</span><span style="color: #66cc66;">&#93;</span><span style="color: #66cc66;">&#41;</span>.<span style="color: #0066CC;">toLowerCase</span><span style="color: #66cc66;">&#40;</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;</pre>
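<p>The <code>toLowerCase()</code> call is there so a search matches no matter how it's typed. Here's a small sketch of a case-insensitive filter in Python rather than Actionscript. Matching on the start of the name is an assumption made for this example, not something confirmed from the JobVoyager source, and the category names are made up.</p>

```python
def filter_categories(categories, query):
    """Keep categories whose lower-cased name starts with the lower-cased query."""
    needle = query.strip().lower()
    if not needle:
        return list(categories)  # an empty search shows everything
    return [name for name in categories if name.lower().startswith(needle)]

# Hypothetical spending categories.
categories = ["Food", "Housing", "Healthcare", "Transportation"]
matches = filter_categories(categories, "h")
print(matches)
```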
<h3>Step 9. Color scheme</h3>
<p>If you ran the code right now, everything should compile correctly, and it'll look something like this:</p>
<p><img src="http://flowingdata.com/wp-content/uploads/2009/12/red-area-545x393.png" alt="red-area" title="red-area" width="545" height="393" class="alignnone size-medium wp-image-4235" /></p>
<p>Color is specified in two places. First <code>lines 86-89</code> specify stroke color and color everything red:</p>
<pre class="actionscript">shape: Shapes.<span style="color: #006600;">POLYGON</span>,
lineColor: <span style="color: #cc66cc;">0</span>,
fillValue: <span style="color: #cc66cc;">1</span>,
fillSaturation: <span style="color: #cc66cc;">0.5</span>
&nbsp;</pre>
<p>Then <code>line 105</code> updates saturation (the level of red), by count. The code for the <code>SaturationEncoder()</code> is in <code>lines 360-383</code>. We're not going to use saturation though. Instead, we'll explicitly specify the color scheme.</p>
<p>First update <code>lines 86-89</code> to this:</p>
<pre class="actionscript">shape: Shapes.<span style="color: #006600;">POLYGON</span>,
lineColor: 0xFFFFFFFF
&nbsp;</pre>
<p>We're going to make stroke color white with <code>lineColor</code>. If there were more spending categories, we probably wouldn't do this because it'd be cluttered. We don't have that many though, so it'll make reading a little easier.</p>
<p>Next, make an array of the colors we want to use ordered by levels. Put it towards the top around line 50:</p>
<pre class="actionscript"><span style="color: #0066CC;">private</span> <span style="color: #000000; font-weight: bold;">var</span> _reds:<span style="color: #0066CC;">Array</span> = <span style="color: #66cc66;">&#91;</span>0xFFFEF0D9, 0xFFFDD49E, 0xFFFDBB84,
    0xFFFC8D59, 0xFFE34A33, 0xFFB30000<span style="color: #66cc66;">&#93;</span>;
&nbsp;</pre>
<p>I used <a href="http://colorbrewer2.com">ColorBrewer</a> for these colors.</p>
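<p>You might notice the colors in the array have eight hex digits instead of the six you'd see in CSS. Assuming the leading <code>FF</code> is an alpha (opacity) byte followed by red, green, and blue, here's a quick Python sketch of unpacking one of those values. The helper names are made up for this example.</p>

```python
def split_argb(color):
    """Split a 32-bit 0xAARRGGBB integer into (alpha, red, green, blue)."""
    alpha = (color >> 24) & 0xFF
    red = (color >> 16) & 0xFF
    green = (color >> 8) & 0xFF
    blue = color & 0xFF
    return alpha, red, green, blue

def to_css_hex(color):
    """Drop the alpha byte and format the rest as a #RRGGBB string."""
    return "#%06X" % (color & 0xFFFFFF)

lightest = 0xFFFEF0D9  # first entry of the _reds array
print(split_argb(lightest))
print(to_css_hex(lightest))
```

<p>Going the other way, a six-digit color picked from ColorBrewer needs <code>FF</code> in front to be fully opaque under that assumption.</p>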
<p>Then we'll add a new <code>ColorEncoder</code> around <code>line 110</code>:</p>
<pre class="actionscript"><span style="color: #000000; font-weight: bold;">var</span> colorPalette:ColorPalette = <span style="color: #000000; font-weight: bold;">new</span> ColorPalette<span style="color: #66cc66;">&#40;</span>_reds<span style="color: #66cc66;">&#41;</span>;
_vis.<span style="color: #006600;">operators</span>.<span style="color: #0066CC;">add</span><span style="color: #66cc66;">&#40;</span><span style="color: #000000; font-weight: bold;">new</span> ColorEncoder<span style="color: #66cc66;">&#40;</span><span style="color: #ff0000;">&quot;data.max&quot;</span>, <span style="color: #ff0000;">&quot;nodes&quot;</span>, <span style="color: #ff0000;">&quot;fillColor&quot;</span>, <span style="color: #000000; font-weight: bold;">null</span>, colorPalette<span style="color: #66cc66;">&#41;</span><span style="color: #66cc66;">&#41;</span>;
&nbsp;</pre>
<p>Tada, you now have something that looks like what we're after:</p>
<p><img src="http://flowingdata.com/wp-content/uploads/2009/12/orange-area-545x397.png" alt="orange-area" title="orange-area" width="545" height="397" class="alignnone size-medium wp-image-4238" /></p>
<p>Here's the <a href="http://projects.flowingdata.com/america/spending/">finished product</a>.</p>
<h3>Step 10. Download the code and see for yourself</h3>
<p>To play with this yourself, just complete <em>step 0</em> and then replace JobVoyager.as with <a href="http://projects.flowingdata.com/america/spending/JobVoyager.as">this file</a>. It has all the updates we've just covered.</p>
<h3>Step 11. Where to Go From Here</h3>
<p>There are a lot of things you can do with this. You can apply this to your own data, use a different color scheme, and further customize to fit your needs. Maybe change the font or the tooltip format. Then you can get fancier and integrate it with other tools or add more Actionscript, so on and so forth. If anything, you should at least check out what else you can do with Jeffrey's <a href="http://flare.prefuse.org/">Flare visualization toolkit</a>. </p>
<p>Have fun!</p>
<p>For more examples, guidance, and all-around data goodness like this, <a href="http://flowingdata.com/book/">pre-order Visualize This</a>, the upcoming FlowingData book.</p>
]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2009/12/09/how-to-make-an-interactive-area-graph/feed/</wfw:commentRss>
		<slash:comments>22</slash:comments>
		</item>
	</channel>
</rss>

