Grabbing Weather Underground Data with BeautifulSoup

Posted to Tutorials  |  Nathan Yau

Weather Underground is a useful site and a fun place for weather enthusiasts. WU has a bunch of weather data (current and historical) from established weather stations, like those at airports, and from home stations set up by hobbyists. One problem: most of the data is in HTML tables instead of the CSV format that we like. I say most because you can download hourly data for a single day in CSV, but if you want, say, temperature data over the past five years, you’re kind of at a loss.

But wait, there’s a solution. That solution is BeautifulSoup, an XML/HTML parser written in Python. Um, parse… what does that mean? Basically, the Python script goes through a document and extracts certain pieces of information from it.
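To make "parse" a little more concrete, here's a minimal sketch. It uses html.parser from the Python 3 standard library instead of BeautifulSoup so that it runs without installing anything, and the HTML snippet and the BoldTextParser class are made up for illustration. They are not WU's actual markup.

```python
from html.parser import HTMLParser


class BoldTextParser(HTMLParser):
    """Collect the text found inside every <b> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_bold = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.in_bold = True

    def handle_endtag(self, tag):
        if tag == "b":
            self.in_bold = False

    def handle_data(self, data):
        # Only keep text that appears while we are inside a <b> tag
        if self.in_bold:
            self.values.append(data.strip())


# Made-up table row, loosely shaped like a weather summary (an assumption)
html = ("<table><tr><td>Mean Temperature</td>"
        "<td><nobr><b>41</b>&nbsp;&deg;F</nobr></td></tr></table>")

parser = BoldTextParser()
parser.feed(html)
bold_values = parser.values
print(bold_values)
```

The parser walks the markup tag by tag and keeps only the bit we care about, which is the same idea BeautifulSoup wraps in a friendlier interface.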

Back to WU. Like I said, there’s historical data in HTML tables like this. I just want the actual mean temperature in Fahrenheit for the past five years or so. I could go to every single page manually and record the temperature in Excel, but why do that when I can make the computer do it for me?

I’m not going to get into all of the details, but here’s the Python script I used to grab mean temperature from WU using BeautifulSoup.

import urllib2
from BeautifulSoup import BeautifulSoup

# Create/open a file called wunder-data.txt (which will be a comma-delimited file)
f = open('wunder-data.txt', 'w')

# Iterate through year, month, and day
for y in range(1980, 2007):
  for m in range(1, 13):
    for d in range(1, 32):

      # Check if leap year
      if y%400 == 0:
        leap = True
      elif y%100 == 0:
        leap = False
      elif y%4 == 0:
        leap = True
      else:
        leap = False

      # Check if already gone through month; if so, move on to the next month
      if (m == 2 and leap and d > 29):
        break
      elif (m == 2 and not leap and d > 28):
        break
      elif (m in [4, 6, 9, 11] and d > 30):
        break

      # Open url (the station's base URL goes in the empty string)
      url = "" + str(y) + "/" + str(m) + "/" + str(d) + "/DailyHistory.html"
      page = urllib2.urlopen(url)

      # Get temperature from page
      soup = BeautifulSoup(page)
      dayTemp = soup.body.nobr.b.string

      # Format month for timestamp
      if len(str(m)) < 2:
        mStamp = '0' + str(m)
      else:
        mStamp = str(m)

      # Format day for timestamp
      if len(str(d)) < 2:
        dStamp = '0' + str(d)
      else:
        dStamp = str(d)

      # Build timestamp
      timestamp = str(y) + mStamp + dStamp

      # Write timestamp and temperature to file
      f.write(timestamp + ',' + dayTemp + '\n')

# Done getting data! Close file.
f.close()

The script goes through each day from 1980 through 2006, parses the corresponding WU page, and stores the temperature data in wunder-data.txt. Keep in mind, this was really just a proof of concept, and the script can be modified quite a bit to fit your needs.
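One possible modification, sketched here in Python 3: let the datetime module handle leap years and month lengths instead of checking them by hand. The function name daily_paths is hypothetical, and the path format simply mirrors the year/month/day/DailyHistory.html pattern from the script. The base URL is left out, so you would still need to put your station's address in front of each path.

```python
from datetime import date, timedelta


def daily_paths(start, end):
    """Yield (timestamp, url_path) for every calendar day from start to end, inclusive."""
    day = start
    while day <= end:
        # Zero-padded timestamp such as 19800229
        timestamp = day.strftime("%Y%m%d")
        # Unpadded year/month/day path, as in the script above
        path = "%d/%d/%d/DailyHistory.html" % (day.year, day.month, day.day)
        yield timestamp, path
        day += timedelta(days=1)


# A few days around the end of February in a leap year
pairs = list(daily_paths(date(1980, 2, 27), date(1980, 3, 1)))
for timestamp, path in pairs:
    print(timestamp, path)
```

Because date only ever produces real calendar days, there is no need for the leap-year and end-of-month checks, and February 29 shows up on its own in leap years.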

The Main Point

Just because data isn’t in CSV format doesn’t mean it’s unavailable. If it’s on the Web, it’s up for grabs.

