<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>FlowingData &#187; Tutorials</title>
	<atom:link href="http://flowingdata.com/category/tutorials/feed/" rel="self" type="application/rss+xml" />
	<link>http://flowingdata.com</link>
	<description>Strength in Numbers</description>
	<lastBuildDate>Thu, 24 May 2012 07:48:21 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
	<atom:link rel="next" href="http://flowingdata.com/category/tutorials/feed/?page=2" />

		<item>
		<title>How to Visualize and Compare Distributions</title>
		<link>http://flowingdata.com/2012/05/15/how-to-visualize-and-compare-distributions/</link>
		<comments>http://flowingdata.com/2012/05/15/how-to-visualize-and-compare-distributions/#comments</comments>
		<pubDate>Wed, 16 May 2012 05:47:38 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[distributions]]></category>
		<category><![CDATA[featured]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=24220</guid>
		<description><![CDATA[<p><a href="http://flowingdata.com/2012/05/15/how-to-visualize-and-compare-distributions/"><img width="625" height="397" src="http://flowingdata.com/wp-content/uploads/2012/05/distribution-plots1.png" class="attachment-medium wp-post-image" alt="How to visualize distributions" title="How to visualize distributions" /></a></p>Single data points from a large dataset can make it more relatable, but those individual numbers don't mean much without something to compare to. That's where distributions come in.]]></description>
			<content:encoded><![CDATA[<p><a href="http://flowingdata.com/2012/05/15/how-to-visualize-and-compare-distributions/"><img width="625" height="397" src="http://flowingdata.com/wp-content/uploads/2012/05/distribution-plots1.png" class="attachment-medium wp-post-image" alt="How to visualize distributions" title="How to visualize distributions" /></a></p><p>There are a lot of ways to show distributions, but for the purposes of this tutorial, I'm only going to cover the more traditional plot types like histograms and box plots. Otherwise, we could be here all night. Plus the basic distribution plots aren't exactly well-used as it is.</p>
<p>Before you get into plotting in R though, you should know what I mean by distribution. It's basically the spread of a dataset. For example, the median of a dataset is the halfway point. Half of the values are less than the median, and the other half are greater. That's only part of the picture. </p>
<p>What happens in between the maximum value and median? Do the values cluster towards the median and quickly increase? Are there a lot of values clustered towards the maximums and minimums with nothing in between? Sometimes the variation in a dataset is a lot more interesting than just mean or median. Distribution plots help you see what's going on.</p>
<p><span class="tip">Want more? Google and Wikipedia are your friends.</span>Anyways, that's enough talking. Let's make some charts.</p>
<p>If you don't have R installed yet, <a href="http://www.r-project.org/">do that now</a>.</p>
<h2>Box-and-Whisker Plot</h2>
<p>This old standby was created by statistician John Tukey in the age of graphing with pencil and paper. I wrote <a href="http://flowingdata.com/2008/02/15/how-to-read-and-use-a-box-and-whisker-plot/">a short guide</a> on how to read them a while back, but you basically have the median in the middle, upper and lower quartiles, and upper and lower fences. If there are outliers more than 1.5 times the interquartile range (the distance between the two quartiles) above the upper quartile or below the lower quartile, they are shown with dots. </p>
<p>The method might be old, but it still works for showing basic distribution. Obviously, because only a handful of values are shown to represent a dataset, you do lose the variation in between the points.</p>
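<p>To make the outlier rule concrete, here is a minimal sketch with made-up numbers (the vector <code>x</code> is hypothetical, not the crime data). It uses <code>boxplot.stats()</code>, which returns the values a box plot draws. Points farther than 1.5 times the box length from either end of the box are reported as outliers.</p>
<pre class="brush: r; title: ; notranslate">
# Hypothetical values, for illustration only
x &lt;- c(2, 4, 5, 7, 8, 9, 11, 12, 14, 40)

# The five numbers a box plot draws:
# lower whisker, lower hinge, median, upper hinge, upper whisker
s &lt;- boxplot.stats(x)
s$stats

# Values beyond the whiskers, which get drawn as dots
s$out
</pre>
<p>Here the value 40 sits far above the rest, so it is the only point flagged.</p>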
<p>To get started, load the data in R. You'll use state-level crime data from the <a href="http://flowingdata.com/2010/08/31/how-to-visualize-data-with-cartoonish-faces/">Chernoff faces tutorial</a>.</p>
<pre class="brush: r; title: ; notranslate">
# Load crime data
crime &lt;- read.csv(&quot;http://datasets.flowingdata.com/crimeRatesByState-formatted.csv&quot;)
</pre>
<p></p>
<p>Remove the District of Columbia from the loaded data. Its city-like makeup tends to throw everything off.</p>
<pre class="brush: r; title: ; notranslate">
# Remove Washington, D.C.
crime.new &lt;- crime[crime$state != &quot;District of Columbia&quot;,]
</pre>
<p></p>
<p>Oh, and you don't need the national averages for this tutorial either.</p>
<pre class="brush: r; title: ; notranslate">
# Remove national averages
crime.new &lt;- crime.new[crime.new$state != &quot;United States &quot;,]
</pre>
<p></p>
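<p>If you prefer, both removals can be done in one step. This is only a sketch on a tiny made-up data frame (<code>toy</code> and its values are hypothetical). It uses <code>trimws()</code> to strip stray spaces, like the trailing one in "United States ", so the match doesn't depend on them.</p>
<pre class="brush: r; title: ; notranslate">
# Hypothetical stand-in for the crime data
toy &lt;- data.frame(
	state=c(&quot;United States &quot;, &quot;Alabama&quot;, &quot;District of Columbia&quot;, &quot;Wyoming&quot;),
	robbery=c(140, 141, 672, 15),
	stringsAsFactors=FALSE
)

# Rows to leave out, written without stray spaces
unwanted &lt;- c(&quot;United States&quot;, &quot;District of Columbia&quot;)

# Keep every row whose trimmed state name is not in the unwanted list
toy.new &lt;- toy[!(trimws(toy$state) %in% unwanted),]
toy.new$state
</pre>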
<p>Now all you have to do to make a box plot for say, robbery rates, is plug the data into <code>boxplot()</code>.</p>
<pre class="brush: r; title: ; notranslate">
# Box plot
boxplot(crime.new$robbery, horizontal=TRUE, main=&quot;Robbery Rates in US&quot;)
</pre>
<p></p>
<p><img src="http://flowingdata.com/wp-content/uploads/2012/05/01box-plot-625x389.png" alt="" title="box plot" width="625" height="389" class="alignnone size-medium wp-image-24241" /></p>
<p>Want to make box plots for every column, excluding the first (since it's non-numeric state names)? That's easy, too. Same function, different argument.</p>
<pre class="brush: r; title: ; notranslate">
# Box plots for all crime rates
boxplot(crime.new[,-1], horizontal=TRUE, main=&quot;Crime Rates in US&quot;)
</pre>
<p></p>
<p><span class="tip">Multiple box plots for comparison.</span><img src="http://flowingdata.com/wp-content/uploads/2012/05/02multiple-box-plots-625x518.png" alt="" title="multiple box plots" width="625" height="518" class="alignnone size-medium wp-image-24242" /></p>
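<p>Dropping the first column by position works here because you know the layout of the file. A slightly more defensive variant, sketched on a hypothetical data frame, picks the numeric columns by type instead.</p>
<pre class="brush: r; title: ; notranslate">
# Hypothetical data frame with one text column and two numeric ones
toy &lt;- data.frame(
	state=c(&quot;A&quot;, &quot;B&quot;, &quot;C&quot;, &quot;D&quot;),
	murder=c(5, 3, 8, 2),
	robbery=c(140, 90, 210, 60),
	stringsAsFactors=FALSE
)

# TRUE for each numeric column, FALSE otherwise
is.num &lt;- sapply(toy, is.numeric)

# Box plots for the numeric columns only
boxplot(toy[,is.num], horizontal=TRUE, main=&quot;Toy rates&quot;)
</pre>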
<h2>Histogram</h2>
<p>Like I said though, the box plot hides variation in between the values that it does show. A histogram can provide more details. Histograms look like bar charts, but they are not the same. The horizontal axis on a histogram is continuous, whereas bar charts can have space in between categories.</p>
<p>Just like <code>boxplot()</code>, you can plug the data right into the <code>hist()</code> function. The <code>breaks</code> argument suggests roughly how many breaks to use on the horizontal axis. R treats a single number as a hint and may pick a slightly different count. </p>
<pre class="brush: r; title: ; notranslate">
# Histogram
hist(crime.new$robbery, breaks=10)
</pre>
<p></p>
<p><span class="tip">Look, ma! It's not a bar chart.</span><img src="http://flowingdata.com/wp-content/uploads/2012/05/03histogram-625x475.png" alt="" title="histogram" width="625" height="475" class="alignnone size-medium wp-image-24250" /></p>
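<p>When <code>breaks</code> is a single number, R rounds to convenient cut points near it. If you need exact bins, pass a vector of cut points instead. A small sketch with simulated values (not the crime data):</p>
<pre class="brush: r; title: ; notranslate">
# Simulated values between 0 and 300, for illustration only
set.seed(42)
x &lt;- runif(50, min=0, max=300)

# A single number is only a suggestion
h1 &lt;- hist(x, breaks=10, plot=FALSE)
length(h1$breaks) - 1   # number of bins R actually used

# A vector gives exact cut points: 12 bins of width 25
h2 &lt;- hist(x, breaks=seq(0, 300, by=25), plot=FALSE)
length(h2$breaks) - 1
</pre>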
<p>Using the <code>hist()</code> function, you have to do a tiny bit more if you want to make multiple histograms in one view. Iterate through each column of the dataframe with a for loop. Call <code>hist()</code> on each iteration.</p>
<pre class="brush: r; title: ; notranslate">
# Multiple histograms
par(mfrow=c(3, 3))
colnames &lt;- dimnames(crime.new)[[2]]
for (i in 2:8) {
	hist(crime.new[,i], xlim=c(0, 3500), breaks=seq(0, 3500, 100), main=colnames[i], probability=TRUE, col=&quot;gray&quot;, border=&quot;white&quot;)
}
</pre>
<p></p>
<p><span class="tip">Using the same scale for each makes it easy to compare distributions.</span><img src="http://flowingdata.com/wp-content/uploads/2012/05/04multiple-histograms-625x500.png" alt="" title="multiple histograms" width="625" height="500" class="alignnone size-medium wp-image-24244" /></p>
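<p>The hard-coded upper limit of 3,500 has to cover the largest value in any column. Otherwise <code>hist()</code> stops with an error about the breaks not spanning the data. One way to avoid guessing, sketched here on a hypothetical data frame, is to compute the limit from the data and round it up to the next bin edge.</p>
<pre class="brush: r; title: ; notranslate">
# Hypothetical data frame, for illustration only
toy &lt;- data.frame(
	state=c(&quot;A&quot;, &quot;B&quot;, &quot;C&quot;, &quot;D&quot;),
	burglary=c(950, 610, 1080, 420),
	larceny=c(2730, 1890, 3120, 1540)
)

# Largest value across the numeric columns, rounded up to a multiple of 100
x.max &lt;- ceiling(max(toy[,-1]) / 100) * 100

# Same scale and same bins for every column
par(mfrow=c(1, 2))
for (i in 2:ncol(toy)) {
	hist(toy[,i], xlim=c(0, x.max), breaks=seq(0, x.max, 100), main=names(toy)[i], col=&quot;gray&quot;, border=&quot;white&quot;)
}
</pre>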
<h2>Density Plot</h2>
<p>For smoother distributions, you can use the density plot. You should have a healthy amount of data to use these or you could end up with a lot of unwanted noise.</p>
<p>To use them in R, it's basically the same as using the <code>hist()</code> function. Iterate through each column, but instead of a histogram, calculate density, create a blank plot, and then draw the shape.</p>
<pre class="brush: r; title: ; notranslate">
# Density plot
par(mfrow=c(3, 3))
colnames &lt;- dimnames(crime.new)[[2]]
for (i in 2:8) {
	d &lt;- density(crime.new[,i])
	plot(d, type=&quot;n&quot;, main=colnames[i])
	polygon(d, col=&quot;red&quot;, border=&quot;gray&quot;)
}
</pre>
<p></p>
<p><span class="tip">Multiple filled density plots.</span><img src="http://flowingdata.com/wp-content/uploads/2012/05/05density-plots-625x648.png" alt="" title="density plots" width="625" height="648" class="alignnone size-medium wp-image-24245" /></p>
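<p>How smooth the curve looks depends on the bandwidth that <code>density()</code> picks. The <code>adjust</code> argument scales that bandwidth, so values above 1 smooth more and values below 1 smooth less. A minimal sketch with simulated values (not the crime data):</p>
<pre class="brush: r; title: ; notranslate">
# Simulated values, for illustration only
set.seed(1)
x &lt;- rnorm(50)

# Default bandwidth, then twice the default
d1 &lt;- density(x)
d2 &lt;- density(x, adjust=2)

# Compare the two curves
plot(d1, main=&quot;Default vs. wider bandwidth&quot;)
lines(d2, col=&quot;red&quot;)
</pre>
<p>With a small dataset, a wider bandwidth can hide some of the noise, at the cost of flattening real bumps.</p>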
<p>You can also use histograms and density lines together. Instead of <code>plot()</code>, use <code>hist()</code>, and instead of drawing a filled <code>polygon()</code>, just draw a line.</p>
<pre class="brush: r; title: ; notranslate">
# Histograms and density lines
par(mfrow=c(3, 3))
colnames &lt;- dimnames(crime.new)[[2]]
for (i in 2:8) {
	hist(crime.new[,i], xlim=c(0, 3500), breaks=seq(0, 3500, 100), main=colnames[i], probability=TRUE, col=&quot;gray&quot;, border=&quot;white&quot;)
	d &lt;- density(crime.new[,i])
	lines(d, col=&quot;red&quot;)
}
</pre>
<p></p>
<p><span class="tip">Histogram and density, reunited, and it feels so good.</span><img src="http://flowingdata.com/wp-content/uploads/2012/05/06histogram-and-density-625x648.png" alt="" title="histogram and density" width="625" height="648" class="alignnone size-medium wp-image-24246" /></p>
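<p>The <code>probability=TRUE</code> argument matters when you overlay the two. A density curve encloses an area of one, so the histogram has to be drawn on that same scale rather than as raw counts for the line to match the bars. A quick check on simulated values:</p>
<pre class="brush: r; title: ; notranslate">
# Simulated values, for illustration only
set.seed(1)
x &lt;- rnorm(200)

# Compute the histogram without drawing it
h &lt;- hist(x, plot=FALSE)

# Bar heights on the density scale times bar widths add up to one
sum(h$density * diff(h$breaks))
</pre>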
<h2>Rug</h2>
<p>The rug, which simply draws ticks for each value, is another way to show distributions. It usually accompanies another plot though, rather than serve as a standalone. Simply make a plot like you usually would, and then use <code>rug()</code> to draw said rug.</p>
<pre class="brush: r; title: ; notranslate">
# Density and rug
d &lt;- density(crime.new$robbery)
plot(d, type=&quot;n&quot;, main=&quot;robbery&quot;)
polygon(d, col=&quot;lightgray&quot;, border=&quot;gray&quot;)
rug(crime.new$robbery, col=&quot;red&quot;)
</pre>
<p></p>
<p><span class="tip">Using a rug under a density plot.</span><img src="http://flowingdata.com/wp-content/uploads/2012/05/07rug-625x538.png" alt="" title="rug" width="625" height="538" class="alignnone size-medium wp-image-24249" /></p>
<h2>Violin Plot</h2>
<p>The violin plot is like the lovechild between a density plot and a box-and-whisker plot. There's a box-and-whisker in the center, and it's surrounded by a centered density, which lets you see some of the variation.</p>
<pre class="brush: r; title: ; notranslate">
# Violin plot
library(vioplot)
vioplot(crime.new$robbery, horizontal=TRUE, col=&quot;gray&quot;)
</pre>
<p></p>
<p><span class="tip">I bet this violin sounds horrible.</span><img src="http://flowingdata.com/wp-content/uploads/2012/05/07violin-plot-625x539.png" alt="" title="violin plot" width="625" height="539" class="alignnone size-medium wp-image-24247" /></p>
<h2>Bean Plot</h2>
<p>The bean plot takes it a bit further than the violin plot. It's something of a combination of a box plot, density plot, and a rug in the middle. I've never actually used this one, and I probably never will, but there you go.</p>
<pre class="brush: r; title: ; notranslate">
# Bean plot
library(beanplot)
beanplot(crime.new[,-1])
</pre>
<p></p>
<p><span class="tip">A little too busy for me, but here you go.</span><img src="http://flowingdata.com/wp-content/uploads/2012/05/08bean-plot-625x538.png" alt="" title="bean plot" width="625" height="538" class="alignnone size-medium wp-image-24248" /></p>
<h2>Wrapping Up</h2>
<p>If you take away anything from this, it should be that variance within a dataset is worth investigating. Picking out single datapoints or only using medians is the easy thing to do, but it's usually not the most interesting.</p>
]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2012/05/15/how-to-visualize-and-compare-distributions/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Members Only: How to Make a Sankey Diagram to Show Flow</title>
		<link>http://flowingdata.com/2012/04/26/how-to-make-a-sankey-diagram-to-show-flow/</link>
		<comments>http://flowingdata.com/2012/04/26/how-to-make-a-sankey-diagram-to-show-flow/#comments</comments>
		<pubDate>Thu, 26 Apr 2012 15:10:35 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Sankey]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=23358</guid>
		<description><![CDATA[<p><a href="http://flowingdata.com/2012/04/26/how-to-make-a-sankey-diagram-to-show-flow/"><img width="625" height="412" src="http://flowingdata.com/wp-content/uploads/2012/04/sankey-leader-625x412.png" class="attachment-medium wp-post-image" alt="How to Make a Sankey Diagram" title="How to Make a Sankey Diagram" /></a></p>These tend to be made ad hoc and are usually pieced together manually, which takes a lot of time. Here's a way to lay the framework in R, so you don't have to do all the work yourself.]]></description>
			<content:encoded><![CDATA[These tend to be made ad hoc and are usually pieced together manually, which takes a lot of time. Here's a way to lay the framework in R, so you don't have to do all the work yourself.]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2012/04/26/how-to-make-a-sankey-diagram-to-show-flow/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Members Only: Interactive Time Series Chart with Filters</title>
		<link>http://flowingdata.com/2012/04/04/interactive-time-series-chart-with-filters/</link>
		<comments>http://flowingdata.com/2012/04/04/interactive-time-series-chart-with-filters/#comments</comments>
		<pubDate>Wed, 04 Apr 2012 16:20:08 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[D3]]></category>
		<category><![CDATA[interactive]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[time series]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=21507</guid>
		<description><![CDATA[<p><a href="http://flowingdata.com/2012/04/04/interactive-time-series-chart-with-filters/"><img width="625" height="355" src="http://flowingdata.com/wp-content/uploads/2012/02/Time-series-with-highlight-625x355.png" class="attachment-medium wp-post-image" alt="Time series with highlight" title="Time series with highlight" /></a></p>Time series charts can easily turn to spaghetti when you have multiple categories. By highlighting the ones of interest, you can direct focus and allow comparisons.]]></description>
			<content:encoded><![CDATA[Time series charts can easily turn to spaghetti when you have multiple categories. By highlighting the ones of interest, you can direct focus and allow comparisons.]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2012/04/04/interactive-time-series-chart-with-filters/feed/</wfw:commentRss>
		<slash:comments>10</slash:comments>
		</item>
		<item>
		<title>Members Only: Calendar Heatmaps to Visualize Time Series Data</title>
		<link>http://flowingdata.com/2012/03/15/calendar-heatmaps-to-visualize-time-series-data/</link>
		<comments>http://flowingdata.com/2012/03/15/calendar-heatmaps-to-visualize-time-series-data/#comments</comments>
		<pubDate>Thu, 15 Mar 2012 17:18:10 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[calendar]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=22383</guid>
		<description><![CDATA[<p><a href="http://flowingdata.com/2012/03/15/calendar-heatmaps-to-visualize-time-series-data/"><img width="625" height="339" src="http://flowingdata.com/wp-content/uploads/2012/03/calendar-sample1.png" class="attachment-medium wp-post-image" alt="Calendar heatmaps made easy" title="Calendar heatmaps made easy" /></a></p>The familiar but underused layout is a good way to look at patterns over time. This tutorial gives you an easy way to make them and guides you through the code so you can adapt it to your needs.]]></description>
			<content:encoded><![CDATA[The familiar but underused layout is a good way to look at patterns over time. This tutorial gives you an easy way to make them and guides you through the code so you can adapt it to your needs.]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2012/03/15/calendar-heatmaps-to-visualize-time-series-data/feed/</wfw:commentRss>
		<slash:comments>16</slash:comments>
		</item>
		<item>
		<title>Members Only: How to Hand Edit R Plots in Inkscape</title>
		<link>http://flowingdata.com/2012/02/28/how-to-edit-r-plots-by-hand-in-inkscape/</link>
		<comments>http://flowingdata.com/2012/02/28/how-to-edit-r-plots-by-hand-in-inkscape/#comments</comments>
		<pubDate>Tue, 28 Feb 2012 17:00:53 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[Inkscape]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=21911</guid>
		<description><![CDATA[<p><a href="http://flowingdata.com/2012/02/28/how-to-edit-r-plots-by-hand-in-inkscape/"><img width="625" height="511" src="http://flowingdata.com/wp-content/uploads/2012/02/plot-final-625x511.png" class="attachment-medium wp-post-image" alt="R plot edited in Inkscape" title="R plot edited in Inkscape" /></a></p>You can control graph elements with code as you output things from R, but sometimes it is easier to do it manually. Inkscape, an Open Source alternative to Adobe Illustrator, might be what you are looking for.]]></description>
			<content:encoded><![CDATA[You can control graph elements with code as you output things from R, but sometimes it is easier to do it manually. Inkscape, an Open Source alternative to Adobe Illustrator, might be what you are looking for.]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2012/02/28/how-to-edit-r-plots-by-hand-in-inkscape/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Members Only: How to Make a Contour Map</title>
		<link>http://flowingdata.com/2012/02/07/how-to-make-a-contour-map/</link>
		<comments>http://flowingdata.com/2012/02/07/how-to-make-a-contour-map/#comments</comments>
		<pubDate>Tue, 07 Feb 2012 09:32:24 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[contour map]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=21149</guid>
		<description><![CDATA[<p><a href="http://flowingdata.com/2012/02/07/how-to-make-a-contour-map/"><img width="625" height="383" src="http://flowingdata.com/wp-content/uploads/2012/02/10-filled-contour-colors.png" class="attachment-medium wp-post-image" alt="Contour plots" title="Contour plots" /></a></p>Filled contour plots are useful for looking at density across two dimensions and are often used to visualize geographic data. It's straightforward to make them in R &#8212; once you get your data in the right format, that is.]]></description>
			<content:encoded><![CDATA[Filled contour plots are useful for looking at density across two dimensions and are often used to visualize geographic data. It's straightforward to make them in R &mdash; once you get your data in the right format, that is.]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2012/02/07/how-to-make-a-contour-map/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Members Only: Using Color Scales and Palettes in R</title>
		<link>http://flowingdata.com/2012/01/19/using-color-scales-and-palettes-in-r/</link>
		<comments>http://flowingdata.com/2012/01/19/using-color-scales-and-palettes-in-r/#comments</comments>
		<pubDate>Thu, 19 Jan 2012 18:55:06 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[color]]></category>
		<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=20827</guid>
		<description><![CDATA[<p><a href="http://flowingdata.com/2012/01/19/using-color-scales-and-palettes-in-r/"><img width="625" height="252" src="http://flowingdata.com/wp-content/uploads/2012/01/colormatrix-stripped.png" class="attachment-medium wp-post-image" alt="Colors in R" title="Colors in R" /></a></p>Color can drastically change how a chart reads and what you see in your data, so don't leave it up to chance with defaults.]]></description>
			<content:encoded><![CDATA[Color can drastically change how a chart reads and what you see in your data, so don't leave it up to chance with defaults.]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2012/01/19/using-color-scales-and-palettes-in-r/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Members Only: Build Interactive Time Series Charts with Filters</title>
		<link>http://flowingdata.com/2012/01/05/build-interactive-time-series-charts-with-filters/</link>
		<comments>http://flowingdata.com/2012/01/05/build-interactive-time-series-charts-with-filters/#comments</comments>
		<pubDate>Thu, 05 Jan 2012 17:45:39 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[D3]]></category>
		<category><![CDATA[interactive]]></category>
		<category><![CDATA[javascript]]></category>
		<category><![CDATA[time series]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=20553</guid>
		<description><![CDATA[<p><a href="http://flowingdata.com/2012/01/05/build-interactive-time-series-charts-with-filters/"><img width="625" height="360" src="http://flowingdata.com/wp-content/uploads/2012/01/Area-charts-625x360.png" class="attachment-medium wp-post-image" alt="Area charts" title="Area charts" /></a></p>When you have several time series over many categories, it can be useful to show them separately rather than put it all in one graph. This is one way to do it interactively with categorical filters.]]></description>
			<content:encoded><![CDATA[When you have several time series over many categories, it can be useful to show them separately rather than put it all in one graph. This is one way to do it interactively with categorical filters.]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2012/01/05/build-interactive-time-series-charts-with-filters/feed/</wfw:commentRss>
		<slash:comments>19</slash:comments>
		</item>
		<item>
		<title>How to map connections with great circles</title>
		<link>http://flowingdata.com/2011/05/11/how-to-map-connections-with-great-circles/</link>
		<comments>http://flowingdata.com/2011/05/11/how-to-map-connections-with-great-circles/#comments</comments>
		<pubDate>Wed, 11 May 2011 09:27:17 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[airlines]]></category>
		<category><![CDATA[arcs]]></category>
		<category><![CDATA[featured]]></category>
		<category><![CDATA[great circle]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=16555</guid>
		<description><![CDATA[<p><a href="http://flowingdata.com/2011/05/11/how-to-map-connections-with-great-circles/"><img width="575" height="390" src="http://flowingdata.com/wp-content/uploads/2011/05/4-airline-color.jpg" class="attachment-medium wp-post-image" alt="Mapping connections with great circles" title="Mapping connections with great circles" /></a></p>There are various ways to visualize connections, but one of the most intuitive and straightforward ways is to actually connect entities or objects with lines. And when it comes to <em>geographic</em> connections, great circles are a nice way to do this.]]></description>
			<content:encoded><![CDATA[<p><a href="http://flowingdata.com/2011/05/11/how-to-map-connections-with-great-circles/"><img width="575" height="390" src="http://flowingdata.com/wp-content/uploads/2011/05/4-airline-color.jpg" class="attachment-medium wp-post-image" alt="Mapping connections with great circles" title="Mapping connections with great circles" /></a></p><p>Here's the technical definition of great circles on Wikipedia:</p>
<blockquote><p>A great circle, also known as a Riemannian circle, of a sphere is the intersection of the sphere and a plane which passes through the center point of the sphere, as distinct from a small circle. Any diameter of any great circle coincides with a diameter of the sphere, and therefore all great circles have the same circumference as each other, and have the same center as the sphere. A great circle is the largest circle that can be drawn on any given sphere. Every circle in Euclidean space is a great circle of exactly one sphere.</p></blockquote>
<p><img src="http://flowingdata.com/wp-content/uploads/2011/05/220px-Great_circle-210x192.png" alt="" title="Great circle" width="210" height="192" class="alignnone size-thumbnail wp-image-16563 img-right" />The important bit is that the shortest distance between two points on a sphere is the minor arc of a great circle. When currents and wind don't interfere, ships and aircraft use great circle routes, which makes them perfect for showing air carrier coverage. This is what you'll do in this example.</p>
<p>It turns out these maps are relatively easy to make in <a href="http://r-project.org">R</a> once you know how to put the pieces together. The maps that I posted on <a href="http://flowingdata.com/2011/05/05/where-do-major-airlines-fly-in-the-united-states/">flight connections for each airline</a> are what we're after. With only about 30 lines of code, you can produce a series of maps that show flights for every major airline, so you get a lot of bang for the amount of effort.</p>
<h2>Step 0. Setup</h2>
<p><span class="tip">1. I wish they'd update the site; it totally looks like something out of the 1990s, but never mind that. It's useful software.</span>You're going to use R in this example, so <a href="http://www.r-project.org/">download the free and open-source software</a> if you haven't already. It's a straightforward one-click install<sup>1</sup>.</p>
<h2>Step 1. Load packages</h2>
<p>Open R. You need two packages to do the heavy lifting: <code>maps</code> and <code>geosphere</code>. If they're not installed, you can install them via the main menu <strong>Packages &amp; Data &gt; Package Installer</strong>. Once installed, load the two packages as follows:</p>
<pre class="brush: r; title: ; notranslate">library(maps)
library(geosphere)</pre>
<p>The first package, <code>maps</code>, is used to draw the base maps, and the second, <code>geosphere</code>, is used to draw the great circle arcs.</p>
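<p>If your R setup doesn't have that menu, you can install from the console instead. This is a small sketch that installs only what is missing. The CRAN mirror URL is one common choice, not something this tutorial requires.</p>
<pre class="brush: r; title: ; notranslate">
# Packages this tutorial uses
needed &lt;- c(&quot;maps&quot;, &quot;geosphere&quot;)

# Keep the names of packages that cannot be found
missing.pkgs &lt;- needed[!sapply(needed, requireNamespace, quietly=TRUE)]

# Install only those
if (length(missing.pkgs) &gt; 0) {
	install.packages(missing.pkgs, repos=&quot;https://cloud.r-project.org&quot;)
}
</pre>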
<h2>Step 2. Draw base maps</h2>
<p>The <code>maps</code> package makes it easy to draw geographic areas in R with the <code>map()</code> function. Pass it a database name, and you get a map in one line of code. For example, to map the United States, type the following in the R console.</p>
<pre class="brush: r; title: ; notranslate">map(&quot;state&quot;)</pre>
<p>Here's the map that you get:</p>
<p><span class="tip">Blank state map created in R. Nothing else.</span><img src="http://flowingdata.com/wp-content/uploads/2011/05/Screen-shot-2011-05-08-at-6.28.19-PM.png" alt="Contiguous USA map" title="Screen shot 2011-05-08 at 6.28.19 PM.png" border="0" width="575" height="301" /></p>
<p><span class="tip">2. Map projections are finicky in R and can be a pain sometimes, but if you want to map with a different projection like, say, Albers, look into <code>mapproject()</code> in the <code>mapproj</code> package.</span>The projection isn't the prettiest thing in the world, but it'll do for now<sup>2</sup>. The bigger problem is that the <code>state</code> database doesn't include Alaska or Hawaii. To include the two often-left-out states, you use the <code>world</code> database.</p>
<pre class="brush: r; title: ; notranslate">map(&quot;world&quot;)</pre>
<p>This gives you a full black and white map of the world.</p>
<p><span class="tip">A blank world map is just as easy to make in R.</span><img src="http://flowingdata.com/wp-content/uploads/2011/05/Screen-shot-2011-05-08-at-6.28.37-PM1.png" alt="Screen shot 2011 05 08 at 6 28 37 PM" title="Screen shot 2011-05-08 at 6.28.37 PM.png" border="0" width="575" height="275" /></p>
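<p>If you want to see what a different projection looks like, here is a rough sketch. The <code>map()</code> function accepts <code>projection</code> and <code>parameters</code> arguments, which rely on the separate <code>mapproj</code> package being installed. The two standard parallels below are a common choice for the contiguous United States and are an assumption on my part, not something the rest of this tutorial uses.</p>
<pre class="brush: r; title: ; notranslate">
library(maps)

# Albers projection; needs the mapproj package installed
# The parameters are the two standard parallels (assumed values)
m &lt;- map(&quot;state&quot;, projection=&quot;albers&quot;, parameters=c(29.5, 45.5))
</pre>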
<h2>Step 3. Limiting boundaries</h2>
<p>The data at hand, which you'll get to soon, is only domestic flights for the United States, so you should focus on that area. Use the <code>xlim</code> and <code>ylim</code> arguments to limit the map to a rectangle that covers only a range of longitude and latitude, respectively. </p>
<p>You can also play with the color of the base map at this point. By default, <code>map()</code> doesn't fill regions, but it will if you set <code>fill</code> to <code>TRUE</code>. For some reason though, if you set the fill color, you can't change the border color. So instead (if you don't want to edit in Illustrator later) you can set the line width (<code>lwd</code>) to something really skinny. For the purpose of this example, we want the border lines to get out of the way.</p>
<pre class="brush: r; title: ; notranslate">xlim &lt;- c(-171.738281, -56.601563)
ylim &lt;- c(12.039321, 71.856229)
map(&quot;world&quot;, col=&quot;#f2f2f2&quot;, fill=TRUE, bg=&quot;white&quot;, lwd=0.05, xlim=xlim, ylim=ylim)</pre>
<p>And here's what you get with the above code. A simple and clean map of the United States that includes Alaska and Hawaii.</p>
<p><span class="tip">Focusing on all states. That includes Hawaii and Alaska.</span><img src="http://flowingdata.com/wp-content/uploads/2011/05/Screen-shot-2011-05-08-at-6.29.11-PM.png" alt="Screen shot 2011 05 08 at 6 29 11 PM" title="Screen shot 2011-05-08 at 6.29.11 PM.png" border="0" width="575" height="386" /></p>
<h2>Step 4. Draw connecting lines</h2>
<p>Now that you have a map, you can draw connecting lines. This is really easy with <code>gcIntermediate()</code> from the <code>geosphere</code> package. Pass it the longitude and latitude of the two connecting points, in that order, and <code>gcIntermediate()</code> spits out the coordinates of points along the great circle arc. </p>
<p>The <code>n</code> argument indicates how many points you want the function to return. The more points you indicate, the smoother the resulting line will be, but past a certain point, you won't see much difference. The <code>addStartEnd</code> argument indicates that you want to include the start and end points in the great circle coordinates. Lastly, use <code>lines()</code> to actually draw the line. </p>
<pre class="brush: r; title: ; notranslate">lat_ca &lt;- 39.164141
lon_ca &lt;- -121.640625
lat_me &lt;- 45.213004
lon_me &lt;- -68.906250
inter &lt;- gcIntermediate(c(lon_ca, lat_ca), c(lon_me, lat_me), n=50, addStartEnd=TRUE)
lines(inter)</pre>
<p>This draws a great circle arc from California to Maine.</p>
<p><span class="tip">A single connection.</span><img src="http://flowingdata.com/wp-content/uploads/2011/05/1-single-connection1.jpg" alt="1 single connection" title="1-single-connection.jpg" border="0" width="575" height="391" /></p>
<p>Similarly, you can add another line simply by doing the same as above with different latitude and longitude coordinates. For example, you can draw a line from California to Texas, and you can use the <code>col</code> argument to set the line color to red.</p>
<pre class="brush: r; title: ; notranslate">lat_tx &lt;- 29.954935
lon_tx &lt;- -98.701172
inter2 &lt;- gcIntermediate(c(lon_ca, lat_ca), c(lon_tx, lat_tx), n=50, addStartEnd=TRUE)
lines(inter2, col=&quot;red&quot;)</pre>
<p><span class="tip">Okay, now let's do two connections.</span><img src="http://flowingdata.com/wp-content/uploads/2011/05/2-addtl-connection1.jpg" alt="2 addtl connection" title="2-addtl-connection.jpg" border="0" width="575" height="393" /></p>
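<p>If you want to see what <code>gcIntermediate()</code> hands to <code>lines()</code>, here is a minimal sketch that reuses the California and Maine coordinates from this step. It assumes the <code>geosphere</code> package is installed. The names <code>ca</code>, <code>me</code>, and <code>arc</code> are only for illustration.</p>
<pre class="brush: r; title: ; notranslate">library(geosphere)

# Same California and Maine points as in this step, as c(longitude, latitude)
ca &lt;- c(-121.640625, 39.164141)
me &lt;- c(-68.906250, 45.213004)

# 50 intermediate points, plus the start and end points
arc &lt;- gcIntermediate(ca, me, n=50, addStartEnd=TRUE)

# Rows are points along the arc; columns are longitude and latitude
dim(arc)
head(arc, 3)</pre>
<p>With <code>addStartEnd=TRUE</code>, the result should have <code>n + 2</code> rows, which is why a bigger <code>n</code> gives <code>lines()</code> more segments to draw and a smoother curve.</p>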
<h2>Step 5. Load flight data</h2>
<p>So now you know how to do the hard part. You just have to iterate over latitude and longitude pairs to get the full map for a specific carrier. Start by loading the data with <code>read.csv()</code>, as shown below:</p>
<pre class="brush: r; title: ; notranslate">airports &lt;- read.csv(&quot;http://datasets.flowingdata.com/tuts/maparcs/airports.csv&quot;, header=TRUE)
flights &lt;- read.csv(&quot;http://datasets.flowingdata.com/tuts/maparcs/flights.csv&quot;, header=TRUE, as.is=TRUE)</pre>
<p><span class="tip">3. I used a five-line Python script to aggregate data that I downloaded from the <a href="http://transtats.bts.gov/Tables.asp?DB_ID=120&DB_Name=Airline%20On-Time%20Performance%20Data&DB_Short_Name=On-Time">Bureau of Transportation Statistics</a>. The original file was about 50mb. You can aggregate in R, but it's usually a better idea to not load large-ish files in R. Luckily airport latitude and longitude coordinates were available on the page for <a href="http://stat-computing.org/dataexpo/2009/">Data Expo 2009</a>. Otherwise, I would've geocoded them myself, which I started to do and then got stalled by API limits.</span>This is processed data that I cleaned up for this tutorial. It's flight counts between each airport, categorized by airline<sup>3</sup>.</p>
<h2>Step 6. Draw multiple connections</h2>
<p>Data in. To map all connections for, say, American Airlines, you filter as shown in line 3. Then you loop over each row of data, which names two airports and gives the number of flights between them. For each row, you look up the latitude and longitude of both airports in <code>airports</code>.</p>
<pre class="brush: r; title: ; notranslate">map(&quot;world&quot;, col=&quot;#f2f2f2&quot;, fill=TRUE, bg=&quot;white&quot;, lwd=0.05, xlim=xlim, ylim=ylim)

fsub &lt;- flights[flights$airline == &quot;AA&quot;,]
for (j in 1:length(fsub$airline)) {
	air1 &lt;- airports[airports$iata == fsub[j,]$airport1,]
	air2 &lt;- airports[airports$iata == fsub[j,]$airport2,]

	inter &lt;- gcIntermediate(c(air1[1,]$long, air1[1,]$lat), c(air2[1,]$long, air2[1,]$lat), n=100, addStartEnd=TRUE)

	lines(inter, col=&quot;black&quot;, lwd=0.8)
}
</pre>
<p>Here's the mess of black lines that you get from the above code.</p>
<p><span class="tip">Rough image of American Airlines flights.</span><img src="http://flowingdata.com/wp-content/uploads/2011/05/3b-solid.jpg" alt="3b solid" title="3b-solid.jpg" border="0" width="575" height="390" /></p>
<p>Not bad, but you can do better than that with some rearranging and coloring appropriately. </p>
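<p>As a side note, the loop in this step filters <code>airports</code> twice on every pass. The same lookup can be done for all rows at once with <code>match()</code>. The sketch below uses two tiny made-up data frames that only mimic the column names used in this tutorial (<code>iata</code>, <code>lat</code>, <code>long</code>, <code>airport1</code>, <code>airport2</code>, <code>cnt</code>). The coordinates and counts are rough placeholders, not values from the real files, and <code>routes</code> is a hypothetical name.</p>
<pre class="brush: r; title: ; notranslate"># Made-up stand-ins for the airports and flights data
airports_toy &lt;- data.frame(iata=c(&quot;SFO&quot;, &quot;JFK&quot;, &quot;DFW&quot;),
	lat=c(37.6, 40.6, 32.9), long=c(-122.4, -73.8, -97.0),
	stringsAsFactors=FALSE)
flights_toy &lt;- data.frame(airline=c(&quot;AA&quot;, &quot;AA&quot;),
	airport1=c(&quot;SFO&quot;, &quot;DFW&quot;), airport2=c(&quot;JFK&quot;, &quot;SFO&quot;),
	cnt=c(120, 45), stringsAsFactors=FALSE)

# Row positions of each end point in the airports table
i1 &lt;- match(flights_toy$airport1, airports_toy$iata)
i2 &lt;- match(flights_toy$airport2, airports_toy$iata)

# One row per connection, with coordinates for both ends
routes &lt;- data.frame(
	lon1=airports_toy$long[i1], lat1=airports_toy$lat[i1],
	lon2=airports_toy$long[i2], lat2=airports_toy$lat[i2],
	cnt=flights_toy$cnt)
routes</pre>
<p>Keep in mind that <code>match()</code> returns <code>NA</code> for an airport code that isn't in the lookup table, so it's worth checking for missing values before you draw anything.</p>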
<h2>Step 7. Color for clarity</h2>
<p>You learned how to change the color of lines back in step 4. You change the <code>col</code> argument in <code>lines()</code>. You hard-coded the color though. You could instead create a vector of colors that scaled from, say, light gray to black, and then pick a shade from farther down the vector for connections with more flights. That way connections with more flights would be more prominent.</p>
<p>If only there were a way to create a color scale automagically. Oh wait, there is. It's called <code>colorRampPalette()</code>. Pass it the base colors you want to use, and it'll fill in everything in between. More specifically, it creates a function that you can pass a number to, indicating how many shades you want to use. In the map code for this step, we use 100 (lines 1 and 2).</p>
<p>Then you do the same as you did in the previous step, but instead of setting all lines to black, you calculate the color from the ratio of the current connection's flight count to the maximum flight count.</p>
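<p>To see the two pieces on their own before they go into the loop, here is a small sketch with made-up flight counts. The names <code>pal_toy</code>, <code>shades</code>, <code>cnt_toy</code>, <code>idx</code>, and <code>idx_safe</code> are only for illustration, and the <code>pmax()</code> guard at the end is my own assumption, not part of the tutorial code.</p>
<pre class="brush: r; title: ; notranslate">pal_toy &lt;- colorRampPalette(c(&quot;#f2f2f2&quot;, &quot;black&quot;))
shades &lt;- pal_toy(100)
shades[c(1, 100)]   # lightest and darkest shade

# Made-up flight counts for three connections
cnt_toy &lt;- c(1, 250, 1000)

# Same index calculation as in the map code for this step
idx &lt;- round( (cnt_toy / max(cnt_toy)) * length(shades) )
idx                 # 0 25 100

# An index of 0 selects nothing from the vector
shades[idx[1]]      # character(0)

# One possible guard: push any 0 up to the first shade
idx_safe &lt;- pmax(1, idx)
idx_safe            # 1 25 100</pre>
<p>The connection with the most flights always lands on the last shade. A connection with a count below half a percent of the maximum rounds to index 0, which picks no color at all, so you might want a guard like the one above if every connection needs to show up.</p>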
<pre class="brush: r; title: ; notranslate">pal &lt;- colorRampPalette(c(&quot;#f2f2f2&quot;, &quot;black&quot;))
colors &lt;- pal(100)

map(&quot;world&quot;, col=&quot;#f2f2f2&quot;, fill=TRUE, bg=&quot;white&quot;, lwd=0.05, xlim=xlim, ylim=ylim)

fsub &lt;- flights[flights$airline == &quot;AA&quot;,]
maxcnt &lt;- max(fsub$cnt)
for (j in 1:length(fsub$airline)) {
	air1 &lt;- airports[airports$iata == fsub[j,]$airport1,]
	air2 &lt;- airports[airports$iata == fsub[j,]$airport2,]

	inter &lt;- gcIntermediate(c(air1[1,]$long, air1[1,]$lat), c(air2[1,]$long, air2[1,]$lat), n=100, addStartEnd=TRUE)
	colindex &lt;- round( (fsub[j,]$cnt / maxcnt) * length(colors) )

	lines(inter, col=colors[colindex], lwd=0.8)
}</pre>
<p>Below is the map that you get. The problem is the longer, less prominent flights are obscuring the more popular connections, because they're being drawn on top. The above code just draws lines in the order that the data comes.</p>
<p><span class="tip">Use color to emphasize more prominent flights.</span><img src="http://flowingdata.com/wp-content/uploads/2011/05/3a-connection.jpg" alt="3a connection" title="3a-connection.jpg" border="0" width="575" height="391" /></p>
<p>To fix this we use the <a href="http://paulbutler.org/archives/visualizing-facebook-friends/">same method</a> that Paul Butler used for his Facebook map. We just order connections from least to greatest flight counts. That way less popular connections are drawn first and will therefore be on the bottom, while the darker connections will be drawn on top.</p>
<pre class="brush: r; title: ; notranslate">pal &lt;- colorRampPalette(c(&quot;#f2f2f2&quot;, &quot;black&quot;))
colors &lt;- pal(100)

map(&quot;world&quot;, col=&quot;#f2f2f2&quot;, fill=TRUE, bg=&quot;white&quot;, lwd=0.05, xlim=xlim, ylim=ylim)

fsub &lt;- flights[flights$airline == &quot;AA&quot;,]
fsub &lt;- fsub[order(fsub$cnt),]
maxcnt &lt;- max(fsub$cnt)
for (j in 1:length(fsub$airline)) {
	air1 &lt;- airports[airports$iata == fsub[j,]$airport1,]
	air2 &lt;- airports[airports$iata == fsub[j,]$airport2,]

	inter &lt;- gcIntermediate(c(air1[1,]$long, air1[1,]$lat), c(air2[1,]$long, air2[1,]$lat), n=100, addStartEnd=TRUE)
	colindex &lt;- round( (fsub[j,]$cnt / maxcnt) * length(colors) )

	lines(inter, col=colors[colindex], lwd=0.8)
}</pre>
<p><span class="tip">Layer dark on top of light so that it's easier to read.</span><img src="http://flowingdata.com/wp-content/uploads/2011/05/3-airline1.jpg" alt="3 airline" title="3-airline.jpg" border="0" width="575" height="393" /></p>
<p>That's much better. At this point, you can play around with color by changing the shades in <code>colorRampPalette()</code>. The line below goes from a light gray (#f2f2f2) to red, but you can do whatever you like. You can even use more than two colors.</p>
<pre class="brush: r; title: ; notranslate">pal &lt;- colorRampPalette(c(&quot;#f2f2f2&quot;, &quot;red&quot;))</pre>
<p><span class="tip">Red? Sure, you can do that, too.</span><img src="http://flowingdata.com/wp-content/uploads/2011/05/4-airline-color.jpg" alt="4 airline color" title="4-airline-color.jpg" border="0" width="575" height="390" /></p>
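<p>If the reordering trick from this step feels opaque, here it is on a made-up data frame. <code>order()</code> returns the row positions that would sort the counts, and indexing with those positions rearranges the rows. The name <code>fsub_toy</code> and its values are hypothetical.</p>
<pre class="brush: r; title: ; notranslate"># Made-up connections, deliberately out of order
fsub_toy &lt;- data.frame(airport1=c(&quot;SFO&quot;, &quot;DFW&quot;, &quot;ORD&quot;),
	airport2=c(&quot;JFK&quot;, &quot;SFO&quot;, &quot;DFW&quot;),
	cnt=c(120, 45, 300), stringsAsFactors=FALSE)

# Row positions from fewest to most flights
order(fsub_toy$cnt)      # 2 1 3

# Rearranged rows: the busiest connection is now last, so it's drawn last
fsub_toy &lt;- fsub_toy[order(fsub_toy$cnt),]
fsub_toy$cnt             # 45 120 300</pre>
<p>Because each call to <code>lines()</code> paints over whatever is already on the map, the last rows end up on top.</p>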
<h2>Step 8. Map every carrier</h2>
<p>The only thing left to do is make a map for each carrier. You can do it manually by changing the airline code and then rerunning the script, but there's an easier way. Find all unique carriers with the <code>unique()</code> function, and iterate over each one. Wherever you had "AA" before, use <code>carriers[i]</code> instead to indicate the current carrier in the loop. The code below will create a PDF for each carrier and save it to your current working directory.</p>
<pre class="brush: r; title: ; notranslate"># Unique carriers
carriers &lt;- unique(flights$airline)

# Color
pal &lt;- colorRampPalette(c(&quot;#333333&quot;, &quot;white&quot;, &quot;#1292db&quot;))
colors &lt;- pal(100)

for (i in 1:length(carriers)) {

	pdf(paste(&quot;carrier&quot;, carriers[i], &quot;.pdf&quot;, sep=&quot;&quot;), width=11, height=7)
	map(&quot;world&quot;, col=&quot;#191919&quot;, fill=TRUE, bg=&quot;#000000&quot;, lwd=0.05, xlim=xlim, ylim=ylim)
	fsub &lt;- flights[flights$airline == carriers[i],]
	fsub &lt;- fsub[order(fsub$cnt),]
	maxcnt &lt;- max(fsub$cnt)
	for (j in 1:length(fsub$airline)) {
		air1 &lt;- airports[airports$iata == fsub[j,]$airport1,]
		air2 &lt;- airports[airports$iata == fsub[j,]$airport2,]

		inter &lt;- gcIntermediate(c(air1[1,]$long, air1[1,]$lat), c(air2[1,]$long, air2[1,]$lat), n=100, addStartEnd=TRUE)
		colindex &lt;- round( (fsub[j,]$cnt / maxcnt) * length(colors) )

		lines(inter, col=colors[colindex], lwd=0.6)
	}

	dev.off()
}</pre>
<p>Here is the map for American Airlines again, produced by the code above. I fiddled with color some to match the maps I created for the <a href="http://flowingdata.com/2011/05/05/where-do-major-airlines-fly-in-the-united-states/">original flight post</a>.</p>
<p><span class="tip">A dark map background with gray to white to blue paths.</span><img src="http://flowingdata.com/wp-content/uploads/2011/05/5-black-theme1.jpg" alt="5 black theme" title="5-black-theme.jpg" border="0" width="575" height="389" /></p>
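<p>If you want to check which files the loop will write before you run it, you can build the file names on their own. This is a small sketch with a made-up vector of airline codes standing in for <code>flights$airline</code>. The names <code>airline_toy</code>, <code>carriers_toy</code>, and <code>files</code> are hypothetical.</p>
<pre class="brush: r; title: ; notranslate"># Made-up airline column with repeats
airline_toy &lt;- c(&quot;AA&quot;, &quot;DL&quot;, &quot;AA&quot;, &quot;UA&quot;, &quot;DL&quot;)

# unique() keeps each code once, in order of first appearance
carriers_toy &lt;- unique(airline_toy)
carriers_toy   # &quot;AA&quot; &quot;DL&quot; &quot;UA&quot;

# Same paste() call as in the loop, applied to all carriers at once
files &lt;- paste(&quot;carrier&quot;, carriers_toy, &quot;.pdf&quot;, sep=&quot;&quot;)
files          # &quot;carrierAA.pdf&quot; &quot;carrierDL.pdf&quot; &quot;carrierUA.pdf&quot;</pre>
<p>Each call to <code>pdf()</code> in the loop opens one of these files, and the matching <code>dev.off()</code> closes it, so every carrier ends up in its own PDF.</p>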
<p>That's all there is to it. Can you think of other datasets this method could be applied to? Give this tutorial a whirl and post your results in the comments. </p>
<p>For more examples, guidance, and all-around data goodness like this, <a href="http://flowingdata.com/membership/">sign up for FlowingData membership</a>.</p>
<h4>Related</h4><p><ul>
<li><a href='http://flowingdata.com/2008/10/10/great-data-visualization-tells-a-great-story/' rel='bookmark' title='Great Data Visualization Tells a Great Story'>Great Data Visualization Tells a Great Story</a></li>
<li><a href='http://flowingdata.com/2011/05/05/where-do-major-airlines-fly-in-the-united-states/' rel='bookmark' title='Geographic breakdown: Where do major airlines fly?'>Geographic breakdown: Where do major airlines fly?</a></li>
<li><a href='http://flowingdata.com/2011/07/07/where-the-aliens-are-flying-their-ufos/' rel='bookmark' title='Where the aliens are flying their UFOs'>Where the aliens are flying their UFOs</a></li>
</ul></p>]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2011/05/11/how-to-map-connections-with-great-circles/feed/</wfw:commentRss>
		<slash:comments>48</slash:comments>
		</item>
		<item>
		<title>How to Make Bubble Charts</title>
		<link>http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/</link>
		<comments>http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/#comments</comments>
		<pubDate>Tue, 23 Nov 2010 08:25:48 +0000</pubDate>
		<dc:creator>Nathan Yau</dc:creator>
				<category><![CDATA[Tutorials]]></category>
		<category><![CDATA[bubbles]]></category>

		<guid isPermaLink="false">http://flowingdata.com/?p=12845</guid>
		<description><![CDATA[<p><a href="http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/"><img width="625" height="419" src="http://flowingdata.com/wp-content/uploads/2010/11/5-edited-version1-625x419.png" class="attachment-medium wp-post-image" alt="Crime Rates by State" title="Crime Rates by State" /></a></p>Ever since Hans Rosling presented a motion chart to tell his story of the wealth and health of nations, there has been an affinity for proportional bubbles on an x-y axis. This tutorial is for the static version of the motion chart: the bubble chart.]]></description>
			<content:encoded><![CDATA[<p><a href="http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/"><img width="625" height="419" src="http://flowingdata.com/wp-content/uploads/2010/11/5-edited-version1-625x419.png" class="attachment-medium wp-post-image" alt="Crime Rates by State" title="Crime Rates by State" /></a></p><p>A bubble chart can also just be straight up proportionally sized bubbles, but here we're going to cover how to create the variety that is like a scatterplot with a third, bubbly dimension.</p>
<p>The advantage of this chart type is that it lets you compare three variables at once. One is on the x-axis, one is on the y-axis, and the third is represented by area size of bubbles. Have a look at <a href="http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/5-edited-version-2/">the final chart</a> to see what we're making.</p>
<h2>Step 0. Download R</h2>
<p>We're going to use R to do this, so <a href="http://www.r-project.org/">download that</a> before moving on. It's free and open-source, so you have nothing to lose. Plus it's a <a href="http://flowingdata.com/2010/11/17/r-is-the-need-to-know-stat-software/">need-to-know-name of 2011</a>, so you might as well get to know it now. You can thank me later.</p>
<h2>Step 1. Load the data</h2>
<p>Assuming you already have R open, the first thing we'll do is load the data. We're examining the same crime data that we used for our last tutorial. I've added state population this time around. One note about the data. The crime numbers are actually for 2005, while the populations are for 2008. This isn't a huge deal since we're more interested in relative populations than we are the raw values, but keep that in mind. </p>
<p>Okay, moving on. You can download the tab-delimited file <a href="http://datasets.flowingdata.com/crimeRatesByState2005.tsv">here</a> and keep it local, but the easiest way is to load it directly into R with the below line of code:</p>
<p>
<pre class="brush: r; title: ; notranslate">crime &lt;- read.csv(&quot;http://datasets.flowingdata.com/crimeRatesByState2005.tsv&quot;, header=TRUE, sep=&quot;\t&quot;)
</pre>
</p>
<p>You're telling R to download the data and read it as a tab-delimited file with a header (that's what <code>sep=&quot;\t&quot;</code> is for). This loads it as a data frame in the <code>crime</code> variable.</p>
<h2>Step 2. Draw some circles</h2>
<p>Now we can get right to drawing circles with the <code>symbols()</code> command. Pass it values for the x-axis, y-axis, and circles, and it'll spit out a bubble chart for you.</p>
<p>
<pre class="brush: r; title: ; notranslate">symbols(crime$murder, crime$burglary, circles=crime$population)
</pre>
</p>
<p>Run the line of code above, and you'll get this:</p>
<p><span class="tip">Circles incorrectly sized by radius instead of area. Large values appear much bigger.</span><img class="alignnone size-medium wp-image-12846" title="1-wrong-sized-circles" src="http://flowingdata.com/wp-content/uploads/2010/11/1-wrong-sized-circles-575x519.png" alt="" width="575" height="519" /></p>
<p>All done, right? Wrong. That was a test. The above sizes the radius of the circles by population. We want to size them by <em>area</em>. The relative proportions are all out of whack if you size by radius. </p>
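<p>To put a number on how far off it gets, here is a quick sketch with two made-up populations, one four times the other. The second half uses the area formula that the next step walks through. All of the names and values are hypothetical.</p>
<pre class="brush: r; title: ; notranslate"># Two made-up populations (in millions); the second is 4 times the first
pop &lt;- c(1, 4)

# Population used directly as the radius
area_wrong &lt;- pi * pop^2
area_wrong[2] / area_wrong[1]   # 16

# Radius chosen so that the area matches the population
r &lt;- sqrt(pop / pi)
area_right &lt;- pi * r^2
area_right[2] / area_right[1]   # 4, up to floating-point rounding</pre>
<p>A state with four times the people gets a circle with sixteen times the ink when you size by radius. Sized by area, it gets four times the ink, which is what you want.</p>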
<h2>Step 3. Size the circles correctly</h2>
<p>To size radiuses correctly, we look to the equation for area of a circle:</p>
<p>Area of circle = &#960;r<sup>2</sup></p>
<p>In this case area of the circle is population. We want to know <em>r</em>. Move some things around and we get this:</p>
<p>r = &#8730;(Area of circle / &#960;)</p>
<p>Substitute population for the area of the circle, and translate to R, and we get this:</p>
<p>
<pre class="brush: r; title: ; notranslate">radius &lt;- sqrt( crime$population/ pi )
symbols(crime$murder, crime$burglary, circles=radius)
</pre>
</p>
<p><span class="tip">Circles correctly sized by area, but the range of sizes is too much. The chart is cluttered and unreadable.</span><img src="http://flowingdata.com/wp-content/uploads/2010/11/2-correctsize-too-big-575x530.png" alt="" title="2-correctsize-too-big" width="575" height="530" class="alignnone size-medium wp-image-12847" /></p>
<p>Yay. Properly scaled circles. They're way too big though for this chart to be useful. By default, <code>symbols()</code> sizes the largest bubble to one inch, and then scales the rest accordingly. We can change that by using the <code>inches</code> argument. Whatever value you put will take the place of the one-inch default. While we're at it, let's add color and change the x- and y-axis labels.</p>
<p>
<pre class="brush: r; title: ; notranslate">symbols(crime$murder, crime$burglary, circles=radius, inches=0.35, fg=&quot;white&quot;, bg=&quot;red&quot;, xlab=&quot;Murder Rate&quot;, ylab=&quot;Burglary Rate&quot;)
</pre>
</p>
<p>Notice we use <code>fg</code> to change border color, <code>bg</code> to change fill color. Here's what we get:</p>
<p><span class="tip">Scale the circles to make the chart more readable, and use the <code>fg</code> and <code>bg</code> arguments to change colors.</span><img src="http://flowingdata.com/wp-content/uploads/2010/11/3-sized-circles-by-area-575x530.png" alt="" title="3-sized-circles-by-area" width="575" height="530" class="alignnone size-medium wp-image-12848" /></p>
<p>Now we're getting somewhere.</p>
<p>By the way, you can make a chart with other shapes too with <code>symbols()</code>. You can make squares, rectangles, thermometers, boxplots, and stars. They take different arguments than the circle. The squares, for example, are sized by the length of a side. Again, make sure you size them appropriately.</p>
<p>Here's what squares look like, using the below line of code.</p>
<p>
<pre class="brush: r; title: ; notranslate">symbols(crime$murder, crime$burglary, squares=sqrt(crime$population), inches=0.5)</pre>
</p>
<p><span class="tip">You can use squares sized by area instead of circles, too.</span><img src="http://flowingdata.com/wp-content/uploads/2010/11/crime-squares-no-labels-575x457.png" alt="" title="crime-squares-no-labels" width="575" height="457" class="alignnone size-medium wp-image-12863" /></p>
<p>Let's stick with circles for now.</p>
<h2>Step 4. Add labels</h2>
<p>As it is, the chart shows some sense of distribution, but we don't know which circle represents each state. So let's add labels. We do this with <code>text()</code>, whose arguments are x-coordinates, y-coordinates, and the actual text to print. We have all of these. Like the bubbles, the <em>x</em> is murders and the <em>y</em> is burglaries. The actual labels are state names, which is the first column in our data frame.</p>
<p>With that in mind, we do this:</p>
<p>
<pre class="brush: r; title: ; notranslate">text(crime$murder, crime$burglary, crime$state, cex=0.5)
</pre>
</p>
<p>The <code>cex</code> argument controls text size. It is 1 by default. Values greater than one make the labels bigger, and values less than one make them smaller. The labels will center on the x- and y-coordinates.</p>
<p>Here's what it looks like.</p>
<p><span class="tip">Add labels so you know what each circle represents.</span><img src="http://flowingdata.com/wp-content/uploads/2010/11/4-added-labels-575x521.png" alt="" title="4-added-labels" width="575" height="521" class="alignnone size-medium wp-image-12849" /></p>
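<p>If you'd rather try the whole thing offline first, here is a self-contained sketch that runs the same <code>symbols()</code> and <code>text()</code> calls on a tiny made-up data frame. The column names mirror the ones used in this tutorial, but the three rows are invented, and <code>toy</code> and <code>radius_toy</code> are hypothetical names.</p>
<pre class="brush: r; title: ; notranslate"># Three invented rows with the same column names as the crime data
toy &lt;- data.frame(state=c(&quot;A&quot;, &quot;B&quot;, &quot;C&quot;),
	murder=c(2, 5, 8), burglary=c(400, 700, 1000),
	population=c(1, 4, 9), stringsAsFactors=FALSE)

# Radius from area, as in step 3
radius_toy &lt;- sqrt(toy$population / pi)

# Bubbles first, then labels centered on the same coordinates
symbols(toy$murder, toy$burglary, circles=radius_toy, inches=0.35,
	fg=&quot;white&quot;, bg=&quot;red&quot;, xlab=&quot;Murder Rate&quot;, ylab=&quot;Burglary Rate&quot;)
text(toy$murder, toy$burglary, toy$state, cex=0.8)</pre>
<p>Order matters here. <code>symbols()</code> starts a new plot, and <code>text()</code> adds to whatever plot is already open, so the labels have to come second.</p>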
<h2>Step 5. Clean up</h2>
<p>Finally, as per usual, I clean up in Adobe Illustrator. You can mess around with this in R, if you like, but I've found it's way easier to save my file as a PDF and do what I want with Illustrator. I uncluttered the state labels to make them more readable, rotated the y-axis labels, so that they're horizontal, added a legend for population, and removed the outside border. I also brought Georgia to the front, because most of it was hidden by Texas.</p>
<p>Here's the <a href="http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/5-edited-version-2/">final version</a>. Click the image to see it in full.</p>
<p><span class="tip">Cleanup and a key make the chart more informative.</span><a href="http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/5-edited-version-2/" rel="attachment wp-att-12941"><img src="http://flowingdata.com/wp-content/uploads/2010/11/5-edited-version1-575x385.png" alt="" title="5-edited-version" width="575" height="385" class="alignnone size-medium wp-image-12941" /></a></p>
<p>And there you go. Type in <code>?symbols</code> in R for more plotting options. Go wild.</p>
<p>For more examples, guidance, and all-around data goodness like this, <a href="http://book.flowingdata.com/">buy Visualize This</a>, the new FlowingData book.</p>
<h4>Related</h4><p><ul>
<li><a href='http://flowingdata.com/2007/10/22/bars-as-an-alternative-to-bubble-charts/' rel='bookmark' title='Bars as an Alternative to Bubble Charts'>Bars as an Alternative to Bubble Charts</a></li>
<li><a href='http://flowingdata.com/2009/02/16/fail-area-circles-on-wall-street/' rel='bookmark' title='Fail: Area Circles on Wall Street'>Fail: Area Circles on Wall Street</a></li>
<li><a href='http://flowingdata.com/2010/07/22/7-basic-rules-for-making-charts-and-graphs/' rel='bookmark' title='7 Basic Rules for Making Charts and Graphs'>7 Basic Rules for Making Charts and Graphs</a></li>
</ul></p>]]></content:encoded>
			<wfw:commentRss>http://flowingdata.com/2010/11/23/how-to-make-bubble-charts/feed/</wfw:commentRss>
		<slash:comments>63</slash:comments>
		</item>
	</channel>
</rss>

