• I don’t want my credit card numbers floating around, because then I’d be screwed. That kind of data needs to be locked up tight behind a billion firewalls, a locked safe, five armed guards, another locked safe, and then one more guard plus another safe. However, there are lots of other kinds of data that should be online and publicly available or at least accessible via a phone call.

  • A huge 8.0 earthquake shook Peru a few days ago killing at least 510 people. Homes and buildings were destroyed and many people’s lives were changed forever. I’m ashamed to admit that if it weren’t for my internship, I probably would have never even known about the quake. I hope a lot of help is headed towards Peru.

    This map graphic was a bit tricky because it was made for color in the paper. That means the color layer and text layer had to be split and sent separately to the printers. It’s an odd process that I’m afraid I don’t quite understand, but the color printers are in a different place than the black and whites. The color part gets printed, and since the text and color are separated, there’s still time to make any last-minute changes to the black and white. Uh, scratch that. That’s probably wrong.

    One of the map people provided me with the base map and then I filled in the blanks i.e. everything that isn’t land and water, and after about one billion back and forths I finally set it and was able to leave a couple hours later than usual. To top things off, some of the text was different in the paper today than I had put in.

  • The well-known college rankings are now available for your viewing pleasure. Whether the ranking system is legit or not, I’ll let you be the judge, but I think everyone should take note that UC Berkeley was again the number one ranked public national university and UCLA was ranked number three. Go Calee-forn-ee-ah! In a nutshell, here’s how U.S. News weights each category when ranking the universities:

    • Peer Assessment – 25%
    • Retention – 20% in national universities and liberal arts colleges and 25% in master’s and baccalaureate colleges
    • Faculty Resources – 20%
    • Student Selectivity – 15%
    • Financial Resources – 10%
    • Graduation Rate Performance – 5%; only in national universities and liberal arts colleges
    • Alumni Giving Rate – 5%
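
    To see how those weights combine, here is a minimal sketch of a weighted composite score for a national university. The weights come straight from the list above; the function name, the category keys, and the example scores are all made up for illustration, and the sketch assumes every category has already been put on a common 0 to 100 scale, which may not match how U.S. News actually prepares its numbers.

```python
# Weights for national universities, taken from the list above (they sum to 100%).
NATIONAL_WEIGHTS = {
    "peer_assessment": 0.25,
    "retention": 0.20,
    "faculty_resources": 0.20,
    "student_selectivity": 0.15,
    "financial_resources": 0.10,
    "graduation_rate_performance": 0.05,
    "alumni_giving_rate": 0.05,
}


def composite_score(scores, weights=NATIONAL_WEIGHTS):
    """Return the weighted sum of per-category scores.

    Assumption: each score is already on a common 0-100 scale.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


# Hypothetical school: these numbers are invented, not real ranking data.
example_scores = {
    "peer_assessment": 90,
    "retention": 80,
    "faculty_resources": 70,
    "student_selectivity": 60,
    "financial_resources": 50,
    "graduation_rate_performance": 40,
    "alumni_giving_rate": 30,
}

# Same school with peer assessment rated 10 points higher.
bumped_scores = dict(example_scores, peer_assessment=100)

print(round(composite_score(example_scores), 2))
print(round(composite_score(bumped_scores), 2))
```

    Because peer assessment carries the largest single weight, a 10-point swing in that one category moves the composite by 2.5 points under these assumptions, more than the same swing in any other category.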

    I wonder how much bias is in peer assessment.

  • Terrorist Attacks in Iraq

    Two hundred and fifty people died a couple of days ago in the deadliest attack of the war. We compiled a list of the most deadly attacks since February and then mapped them out. It was a team (and by team, I mean two) effort: I collected the data and a co-worker mapped it out.

    In this case, I went through old Times stories and took note of attacks that killed 20 or more people. It was really depressing reading all that stuff, but I’m definitely better for it. Without a doubt, I know more now than I ever have about what’s going on in the world.

    As you can see, my co-worker went with the old bubble map standby. I wish we could show the data differently than the usual map, but what type of visualization would that be?

  • Big Mac meal from McDonald’s

    Every now and then I indulge in a Big Mac meal from McDonald’s. I feel satisfied while I eat the burger and fries and suck down my diet soda, but afterwards I feel sleepy, sluggish, and fat. Today was one of those days.

    As I ate my satisfying-not-so-satisfying meal, I wondered what the Big Mac price differences are from state to state or even city to city. I know that there’s data going around about Big Mac prices in different countries, but I’m pretty sure it varies quite a bit in the U.S. alone. I don’t remember paying over $6 for the number 1 in California. What a rip-off (and yet I’ve been to the golden arches at least three times in the past month).

  • Pedometer

    I began my path of higher education at Berkeley as an Electrical Engineering and Computer Science student. As a stat graduate student, it’s hard to remember sitting in all of those (boring) engineering classes.

    If I learned anything though, it was from the painful computer science projects. No matter how big the project, I would start by breaking it up into lots of mini-tasks and work my way up to the final solution. I think this has helped me a lot, not only in grad school but also in solving problems in my life. Hence, my first attempt at continuous data collection has started at a very basic level: my pedometer.


  • Five Romney Brothers

    As you might know (or don’t know), Mitt Romney is vying for the Republican presidential nomination. His five sons have all lent a helping hand to the campaign.

    This graphic is really basic, but sadly, it took me quite a while to finish. I thought I had finished it efficiently, but there were a bunch of style things I had to change, e.g., how I cropped the mugshots. On my first pass, I had cropped the pictures so that there was white space between each brother. Of course, as I know now, that was a waste of precious space, and it looks a whole lot better this way.

  • While doing research on the process of rebuilding New Orleans after Hurricane Katrina and the U.S. Army Corps of Engineers, I’ve run across a close and knowledgeable watcher of the New Orleans rebuild: Robert Bea. I don’t know much about him except that he seems like a very nice man. I found this on his Berkeley homepage:

    The world needs engineers who….

    • whose truth cannot be bought,
    • whose word is their bond,
    • who put character and honesty above wealth,
    • who do not hesitate to take chances,
    • who will not lose their identity in a crowd,
    • who will be as honest in small things as in great things,
    • who will make no compromise with wrong,
    • whose ambitions are not confined to their own selfish desires,
    • who will not say they do it “because everybody else does it,”
    • who are true to their friends through good report and evil report, in adversity as well as in prosperity,
    • who do not believe that shrewdness and cunning are the best qualities for winning success,
    • who are not ashamed to stand for the truth when it is unpopular, and
    • who have integrity and wisdom in addition to knowledge.

    Please help me to be this kind of engineer.

    Bob Bea

    This can certainly be applied to statisticians as well. Please help me be that kind of statistician.

    UPDATE: Just did some back and forth email with Professor Bea. He IS a nice man.

  • If I’ve learned anything in my first month at The Times, it’s that ArcGIS and Microsoft Excel are not worthless.

    For a while now, since I started grad school, I’ve had this beef against Microsoft Excel. I hated how everyone used it and how I didn’t have the money to buy the Office suite, or even care enough to want to buy it. It seemed so limited in what it could do compared to a quickly set up MySQL database.

    Then last year, I took this crash course on ArcGIS. It was four days, eight hours a day of mapping. I hated ArcGIS after that workshop. The whole software suite seemed sluggish, bloated, and so not worth my time.

    Today I saw some ArcGIS and Excel proficiency I had never seen before. My co-worker flew through giant spreadsheets, punched in formulas, and joined columns left and right. It was quite the scene. Once the data were prepared in Excel, she shot them over to ArcGIS. She quickly loaded a shapefile for all counties in the Tri-state region, changed some limits, and voila, a few seconds later we had the map we needed. Put in some labeling, some numbers, and the graphic was complete.

    Yes, ArcGIS and Excel are worthwhile.

    I have so much to learn.

    Growing Minority Populations

  • If you subscribe to Freakonomics, you probably already know that it’s moved to the NYTimes domain. Stephen J. Dubner and Steven D. Levitt are the blog authors, who co-wrote the book that goes along with the blog. I read the book, which dug into data and revealed a lot of interesting things like sumo wrestlers cheating and race/career correlations. Admittedly though, I totally forgot that there was a blog until I saw the ad on the NYTimes site.

    I think this’ll be great to promote data awareness just as the book has. Of course, now on NYTimes, a lot more people are going to be reading the blog.

    One downside though, being on the NYTimes site, it’s a limited feed, and that’s just kind of annoying. Wah, wah, wah. Yes, I like to complain sometimes.

  • UPDATE: I found the essay! Programmers Need To Learn Statistics Or I Will Kill Them All by Mr. Zed Shaw

    There was this online essay that I read by a guy in the computer science/electrical engineering field who totally loves statistics. He read textbooks, and truly spoke like someone who respects data. I thought I bookmarked it, but now have no clue where the heck it is. Argh :(. If anyone knows who I’m talking about, please tell me!

    He worked with a company where everyone thought they “knew” statistics. Automated reports would give them numbers, and they’d fully trust them. That was statistics to the computer engineers. Crunch some numbers and see what the software gives me. As a result, these engineer-types really pissed off the author of the article.

  • Senator Hillary Rodham Clinton’s Changing Views on the Iraq War

    I recently put together a timeline for Senator Hillary Rodham Clinton’s changing views on the Iraq war. In 2002, she voted in support of the war. In 2006, her language was a bit non-committal, as far as setting a deadline to get troops out of Iraq. Now, in 2007, she’s firmly set on getting troops out of Iraq by some deadline. The goal of the timeline is to show this change.

    Here’s the important lesson I learned during this task — even though it’s easy to put a timeline together, it still has to tell a story. Think about the purpose of the timeline. Usually, you want to show some change or progression over time. The tinting on the above timeline is for events during which Senator Clinton shows a definite change in her stance. The hope is that the reader keeps going left to right.

    If you don’t keep the story in mind, the timeline is no longer as useful. It’s just a bunch of text arranged in time order, which is sort of what the above timeline looked like after my first jab at it. I put tinting on the events i.e. the things that weren’t quotes from Mrs. Clinton. In retrospect, such tinting plainly defeats the purpose of this particular timeline, which went with a story that discussed the change. Duh.

  • I made this graphic early last week, or actually, maybe it was during my first week. In any case, they finally ran the story, and my graphic is on the front page of The Times Online (as of 1:39am Eastern time).

    Housing Prisoners Out of State

    You can read the article. It’s pretty interesting. In a nutshell, prisons are getting crowded, so states are shipping inmates to out-of-state private prisons. For example, California is sending prisoners all the way to Wheelwright, Kentucky.

  • Jorge Luis Borges wrote this really good fictional short story in 1944 called Funes, the Memorious. It’s about a boy, Funes, who isn’t incredibly bright until one day he falls off his horse and hits his head. After the accident Funes finds that he suddenly has an amazing memory with which he remembers every single detail of every moment in his life.

    His memory is so vivid that at one point he sees a dog, and a moment later the dog seems different. Funes remembers the way each hair stood on the dog’s back, the direction of the breeze, what direction the dog’s tail was pointed, the perspiration on his own body, where everyone else was, etc. That dog could not possibly be the same dog that he saw a moment ago.

    Funes not only remembered every leaf on every tree of every wood, but even every one of the times he had perceived or imagined it. He determined to reduce all of his past experience to some seventy thousand recollections, which he would later define numerically. Two considerations dissuaded him: the thought that the task was interminable and the thought that it was useless.

    Trying to Remember Too Much

    In this day and age, when so much of everything is stored in databases and everything is logged, is it possible to remember too much? Technology has enabled us to surveil others, videotape every moment of our lives, store every email, take a seemingly endless river of pictures, record conversations, and log data out the wazoo.

    Sure, it’s great to have it, but what use can you make of a year’s worth of data? What about ten years? Or dare I say, a century’s worth of data?

    This is when visualization becomes important. It’s our duty to make the ocean of data available without letting the ocean’s never-ending vastness overwhelm the data explorers. Otherwise, our technological memory becomes like that of Funes, and all is lost. OK, cue the dramatic music… now.

  • Admittedly, ever since the Spring quarter ended, I’ve either been preparing for my internship at The Times or have been occupied by the internship. I haven’t given much thought to my dissertation topic, which in the vaguest of terms will somehow encompass three things:

    • Social Data Visualization
    • Eco-Visualization
    • Visualization of my Life

    I have yet to figure out how to tie the three together in a worthwhile way or even whether I will include all three. Wrapped around the three will be data sharing. I got to thinking a little bit about visualizing my life in data today.

    My adviser forwarded me this info design piece, by Gregory Dizzia (which was apparently also featured on infosthetics):

    Greg’s Relationships

    First off, this is a cool piece. If you haven’t seen it, go to the site and download the pdf. It’s a simple idea. Document past relationships — how they began, how they ended, what happened in between. The information is organized very well. At a glance, you can see how many relationships Greg has had in his life and all the one night stands he had after his mid-life, long-term relationship. The design is attractive and I could relate to the information, so I was drawn in to look more.

    Dig a little deeper, and you’ll see that there’s not just one engagement ring during that long-term relationship with Sarah. There’s a second one during his time with his very first girlfriend, Megan. Although, I’m a little wary of calling Megan a girlfriend since it was during Greg’s tender years at age 9 to 11. Stuff like that makes me want to know more.

    Was he really engaged? Was it an arranged marriage or something? What do those breakup symbols really mean?

    Life Visualization Appeal

    Right off, Greg’s piece drew me in, because (1) it was pretty, (2) I could relate to the data, and (3) there was a very human factor. This could probably be generalized to all types of successful visualization, but (2) and (3) are, I think, synonymous with life viz. That’s two out of three things that are automatic. Plus, as the visualizer I have a very strong emotional attachment to the data.

    NOW, what happens when we have 100 people’s relationships to visualize? 1000? That’s when it gets really interesting and social data visualization makes its way into the picture. Well, something to think about.

  • Tired of looking at my New York Times graphics yet? Too bad. Here’s another one for your viewing pleasure.

    CUNY SAT Math Graph

    CUNY schools are planning to raise their minimum SAT math scores to 510 for their top-tier schools and to 500 for the rest. Believe it or not, the current cutoff for all schools is 480. Some say the increase in standards is good for the schools’ reputation. Others argue that the new cutoffs single out a lot of minorities since the high school education system is uneven.

    Currently, lots of students are coming into CUNY schools unprepared to take college-level math courses, and the college ends up teaching remedial courses like pre-algebra. That’s just SAD. It’s probably more important to focus on improving the high school education system than it is to try to get unqualified students into college.

  • I had a chance to browse through some of my subscribed feeds today, and I saw a post called Noisy Subways by Kaiser over at Junk Charts blog. So I clicked, since it isn’t one of those full feeds, and then I saw The New York subway report card. I smiled, because, well, I made that chart just a few days ago!

    Just a disclaimer: The Times chart was just The New York Times version of the original Straphangers report:

    Straphanger Subway Report Card

    Anyways, there was a bit of a discussion, which again, I found very amusing. I felt kind of special in a way.

    There were two main points to the post – 1. Noisy data; and 2. Chart is hard to read. I’m very tired right now, so I’ll just say a few things.

    Yes, the data is really noisy, but why shouldn’t it be? We shouldn’t assume that all six variables are positively correlated. It’s very possible for a line to be very reliable, but have no seats. One could argue that the lines with more people HAVE to be more reliable, because if something goes wrong, more people are going to get screwed.

    Secondly – sure, the chart is a bit hard to read at a glance, but who’s the audience? New Yorkers are the audience, and the first thing that they’re going to do is look for their subway line. That’s what I did. With the audience in mind, I think the chart serves its purpose.

    Most of the commenters provided decent ideas for alternative graphics. My opinion is that with this kind of data, it’s up for grabs. Audience is key though for charts, graphs, plots, maps, etc. in a newspaper. Spiders and whiskers won’t make sense to many people. You’d be amazed at how many people don’t know how to read a scatter plot. The public is getting better though. They’ll get there.

    As for the person who left the comment about the gaps in the chart, I’m going to assume that was in haste. Some lines are tied, hence some blank spaces.

    Welp, that was fun. Yawwwwn. Time for bed.

  • My second graphic was in The Times Metro section today (Tuesday, July 24, pg B2). It’s an annual report card compiled by the Straphangers Campaign for every New York subway line. The No. 1 line was coincidentally ranked best while the C and the W (one of the lines I take) were near the bottom.

  • Google Reader Trends

    This is just really amusing to me. Above is a bar plot, from Google Reader, of the number of items I’ve read in the past 30 days, with each bar representing a day. Quite easy to see when I had a little bit too much time on my hands. Right when the internship starts, the number of items read plummets. I miss my subscribed feeds =(.

  • It’s six days in, and I’m starting to get used to Adobe Illustrator. It’s one honker of a program, so I’m picking up things as I go along, but on the upside, I’m really glad I went through some Illustrator lessons to at least familiarize myself with layers, etc.

    I think I’m getting closer to the point where it’s less about “How do I do this?” and more about “What am I going to show?” Don’t get me wrong. There’s A LOT I still don’t know how to do, but at least I know enough to figure out a good amount on my own. Just a lot to figure out about The Times graphics style — font, sizes, color, etc.

    The administrative stuff is the hardest part of all though. While I’m working on a graphic I have to keep all the necessary people updated, i.e., the reporter whose story I am making the graphic for. I got scolded today, because a reporter didn’t know that her story was put on hold. I didn’t know that I was her only contact link. Lesson learned. I’m just going to contact everyone from now on. Better to provide too much information than too little (in this case, at least).

    Once a graphic is completed, I have to print out five copies or so and hand them out to all of the necessary people. Next, update the graphic schedule, and then place it in the active list. It’s strange that even though we’re all equipped with these super awesome computers that I still have to walk upstairs and hand-deliver copies of a graphic. I guess nothing can replace human contact.