• The Padma River in Bangladesh is constantly shifting its 75-mile path. Joshua Stevens for the NASA Earth Observatory shows what the shifting looked like through satellite imagery, over a 30-year span.

    Kasha Patel:

    The upper section of the Padma (the Harirampur region) has experienced the most erosion and shows the most notable changes. The river has become wider at this section by eroding along both banks, although most activity occurred on the left bank. Using topographic, aerial, and satellite imagery, scientists found that the left bank shifted 12 kilometers towards the north from 1860 to 2009 and developed a meandering bend. The river left a scar where the water once flowed, as you can see in the 2018 image.

    See also the dramatic shifts of the Ucayali River in Peru.

  • A camera on a slightly creepy robotic arm takes a picture of the book's pages, software uses OpenCV to extract the faces, and the faces are passed to Google's AutoML Vision, which compares them against a model trained to recognize Waldo. The result: There’s Waldo.

  • Wherever more attention or the appearance of it equates to more money, there are those who try to game the system. Michael H. Keller for The New York Times examines the business of fake YouTube views:

    YouTube’s engineers, statisticians and data scientists are constantly improving in their ability to fight what Ms. O’Connor calls a “very hard problem,” but the attacks have “continually gotten stronger and more sophisticated,” she said.

    After the Times reporter presented YouTube with the videos for which he had bought views, the company said sellers had exploited two vulnerabilities that had already been fixed. Later that day, the reporter bought more views from six of the same vendors. The view count rose again, though more slowly. A week later, all but two of the vendors had delivered the full amount.

  • Sometimes the visualization takes care of itself. Photographer Tim Whittaker filmed sheepdogs herding thousands of sheep, and the flows from one place to another look like organized randomness.

  • Popular summer songs have had a bubbly, generic feel to them over the past several years, but it wasn’t always like that. Styles used to be more diverse, and things might be headed back in that direction. Sahil Chinoy and Jessia Ma charted song fingerprints over the years for a musical comparison.

    Turn up your speakers or put on your headphones for the full experience. The song and music video snippets provide a much better idea of what the charts represent.

  • Members Only

    The New York Times published an election map. A lot of people did not like the map, arguing that it was an inaccurate representation. Those who did like the map argued that one must consider intent before throwing a map into the flames.

    What happens when intended use and actual use do not match up?

  • We usually visualize data on computers, because it’s where the data exists and it’s a more efficient process. But as long as you can make shapes and use colors, you can use just about any material. Amy Cesal, as part of a 100-day creative project called Day Doh Viz, is using Play-Doh.

    Ever since my son shifted his art station to my office, I’ve been drawn to his crayons, markers, and masking tape. The manual labor of it forces a shifted thought process that’s less technical and more about what you want to show. It also feels more like playing. Recommended. [via Visualising Data]

  • There are many mistakes you can make when you first get into visualization. Yan Holtz and Conor Healy catalog the common pitfalls as part of their project From Data to Viz. There are a lot of them, and you’ll learn most of them as you go, but it’s good to at least be aware of them from the start.

  • Levees are intended to prevent flooding in the areas where they are built, but they change the direction and speed of flowing water, which can cause unintended flooding in areas upstream. ProPublica and Reveal collaborated with the St. Anthony Falls Laboratory to build a scale model to show how this can happen.

    An interactive graphic lets you raise and lower the flow rate to see the changes for yourself. The video coupled with the illustration makes the effects super clear.

  • Genetic algorithms are inspired by natural selection: a population of candidate solutions is scored, the fittest are kept and recombined with small random mutations, and the cycle repeats until the population converges on a good solution. Joel Simon applied this process to floor plan design.

    The creative goal is to approach floor plan design solely from the perspective of optimization and without regard for convention, constructability, etc. The research goal is to see how a combination of explicit, implicit and emergent methods allow floor plans of high complexity to evolve. The floorplan is ‘grown’ from its genetic encoding using indirect methods such as graph contraction and emergent ones such as growing hallways using an ant-colony inspired algorithm.

    The results were biological in appearance, intriguing in character and wildly irrational in practice. It was a fun learning experience and I plan to re-use methods in other projects.

    [via kottke]
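    To make that select-recombine-mutate loop concrete, here is a minimal, generic genetic algorithm in Python. It is not Joel Simon's floor plan method: the bit-string genome, the toy "count the 1s" fitness function (often called OneMax), and every parameter value are assumptions chosen only to illustrate selection, crossover, and mutation.

```python
import random

def fitness(genome):
    # Toy objective (assumption): count the 1 bits. A floor plan system would
    # instead decode the genome into a layout and score that layout here.
    return sum(genome)

def evolve(length=20, pop_size=30, generations=200, mutation_rate=0.02, seed=0):
    """Evolve bit strings toward higher fitness; returns (best genome, generations used)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for generation in range(generations):
        # Evaluate and rank the current population, fittest first.
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) == length:
            return population[0], generation  # converged on the optimum
        # Selection: the fitter half survives unchanged and forms the parent pool.
        parents = population[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness), generations

best, used = evolve()
print(fitness(best), used)
```

    Keeping the fitter half unchanged (elitism) means the best score can never get worse from one generation to the next; the trade-off is that diversity can collapse, which is why the small mutation rate matters.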

  • Apple’s value passed $1 trillion on Thursday, and as tradition requires, we must consider the scale of such a large number by comparing Apple’s value against the combined value of a surprising number of small and medium-sized companies. The New York Times has you covered with a bucket-of-blobs metaphor.

    So blobby. So bucket-y.

  • NPR used video from a thermographic camera to explain why cities tend to be hotter than their surrounding areas. Straightforward and a good complement to the video.

  • Members Only

    Welcome to the new members-only newsletter: The Process. In this first update, a certain data graphics expert seems to really dislike R, which prompts a look into the visualization tools we use and what one might get out of a bigger toolbox.

  • I’m happy to introduce an in-depth, process-focused newsletter for FlowingData members. It’s called The Process. If you’re already a member, you should receive the first issue soon. It’ll be weekly on Thursdays. It’ll be about the how and why of visualization. It’s a benefit in addition to tutorials, courses, etc. Basically, nothing changes, except that you get more now.

    If you’re not a member yet, I’d love for you to join. Find out all the benefits of membership here.

    A big part of FlowingData is the what of data and visualization. I highlight the interesting work of others and publish my own projects. This all continues too.

    However, with The Process I’ll talk more about the practical side of designing data graphics. That means tools, design choices, limitations, analysis, and everything else involved in visualizing data. My hope is that it gives me a chance to broaden my scope past the tutorials and courses, which tend to stick to R and JavaScript, while still providing a practical point of view.

    I’m sure the newsletter will evolve over time, and I’m excited to see how it develops. I think it will force me to learn new things about visualization and think more critically about the field. I want more depth. I hope you’ll join me.

  • As the field grows and needs develop throughout companies, specialization in data science is a natural next step. Elena Grewal, head of data science at Airbnb, describes their three main tracks and further specialization within each.

    We decided to restructure data science along three tracks. These described what we were looking for and are areas where we want to attract talent.

    The Analytics track is ideal for those who are skilled at asking a great question, exploring cuts of the data in a revealing way, automating analysis through dashboards and visualizations, and driving changes in the business as a result of recommendations. The Algorithms track would be the home for those with expertise in machine learning, passionate about creating business value by infusing data in our product and processes. And the Inference track would be perfect for our statisticians, economists, and social scientists using statistics to improve our decision making and measure the impact of our work.

    Sounds about right.

    I’m curious what statistics and data science look like ten or twenty years from now. As each becomes more like the other, do the two fields converge into one, or do they collide and differentiate themselves as much as possible? I guess the former.

  • Oliver Roeder for FiveThirtyEight:

    FiveThirtyEight has obtained nearly 3 million tweets from accounts associated with the Internet Research Agency. To our knowledge, it’s the fullest empirical record to date of Russian trolls’ actions on social media, showing a relentless and systematic onslaught. In concert with the researchers who first pulled the tweets, FiveThirtyEight is uploading them to GitHub so that others can explore the data for themselves.

    The data set is the work of two professors at Clemson University: Darren Linvill and Patrick Warren. Using advanced social media tracking software, they pulled the tweets from thousands of accounts that Twitter has acknowledged as being associated with the IRA. The professors shared their data with FiveThirtyEight in the hope that other researchers, and the broader public, will explore it and share what they find. “So far it’s only had two brains looking at it,” Linvill said of their trove of tweets. “More brains might find God-knows-what.”

    How amazing would it be if this weren’t a thing and Twitter released this data themselves? Or better yet, what if Twitter released their own report based on research conducted by their own data scientists? No one knows how trolls use Twitter better than Twitter.

    Of course, that’s not happening any time soon. So until then, download the data here.
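    Once the files are downloaded, a natural first pass is counting tweets per account type. This sketch uses only Python's standard library. It assumes CSV files with a header row that includes an `account_category` column; that column name is an assumption about the published files, so check the actual headers in the repository before relying on it. The function name `tweets_per_category` and the file name in the usage line are hypothetical.

```python
import csv
from collections import Counter

def tweets_per_category(paths, column="account_category"):
    """Count rows (one per tweet) for each value of `column` across several CSV files.

    The default column name is an assumption; rows where the column is
    missing or empty are counted under "Unknown".
    """
    counts = Counter()
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                counts[row.get(column) or "Unknown"] += 1
    return counts

# Usage (hypothetical local file name):
# print(tweets_per_category(["IRAhandle_tweets_1.csv"]).most_common(5))
```

    Streaming rows through `csv.DictReader` keeps memory use flat, which matters with nearly 3 million tweets split across several files.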

  • Dave Merrill and Lauren Leatherby for Bloomberg visualized land use for the conterminous United States using a pixel-like grid map:

    The 48 contiguous states alone are a 1.9 billion-acre jigsaw puzzle of cities, farms, forests and pastures that Americans use to feed themselves, power their economy and extract value for business and pleasure.

    Using surveys, satellite images and categorizations from various government agencies, the U.S. Department of Agriculture divides the U.S. into six major types of land. The data can’t be pinpointed to a city block—each square on the map represents 250,000 acres of land. But piecing the data together state-by-state can give a general sense of how U.S. land is used.

    The above map is the full aggregate, but be sure to click through to see the comparisons across categories. Using a scrollytelling format, the graphics are a hybrid of grid maps and square pie charts. States serve as a point of reference. They’re the banana for scale. I like it.
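    The map's unit makes the arithmetic easy to check: 1.9 billion acres divided by 250,000 acres per square comes to about 7,600 squares for the 48 states. Turning acreage into whole squares for a grid map or square pie chart also needs a rounding rule so the categories still add up to the total. The sketch below uses largest-remainder rounding; that rule, the function name `squares_for`, and the example figures are assumptions for illustration, not Bloomberg's or the USDA's method or data.

```python
import math

ACRES_PER_SQUARE = 250_000  # each grid square stands for 250,000 acres

def squares_for(acres_by_category):
    """Convert acreage per category into whole grid squares.

    Largest-remainder rounding (an assumed choice): floor every share, then hand
    the leftover squares to the categories with the largest fractional parts,
    so the counts sum to the rounded total.
    """
    exact = {k: v / ACRES_PER_SQUARE for k, v in acres_by_category.items()}
    total = round(sum(exact.values()))
    counts = {k: math.floor(x) for k, x in exact.items()}
    leftover = total - sum(counts.values())
    # Categories with the biggest fractional remainders get the extra squares.
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts

print(squares_for({"all land": 1_900_000_000}))  # {'all land': 7600}
print(squares_for({"a": 375_000, "b": 375_000, "c": 250_000}))  # {'a': 2, 'b': 1, 'c': 1}
```

    Plain rounding of each category separately can leave the grid a square or two short of (or over) the total; the largest-remainder rule avoids that at the cost of occasionally rounding a category away from its nearest whole number.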

  • There’s been some disagreement about who wrote “In My Life” by The Beatles, so researchers did what any normal person does and tried to model the songs of Paul McCartney and John Lennon:

    Mark Glickman, senior lecturer in statistics at Harvard University, and Jason Brown, Professor of Mathematics at Dalhousie University, created a computer model which broke down Lennon and McCartney songs into 149 different components to determine the musical fingerprints of each songwriter.

    McCartney says he wrote the music for the song, but Glickman and Brown give that claim less than a 1 in 50 chance.

  • Maximilian Noichl visualized the relationships between philosophers from 600 B.C. to 160 B.C.:

    The Sociology of Philosophies is a fascinating book by Randall Collins, in which he attempts to lay out a global history of philosophy in terms of interpersonal relations between philosophers. These connections are laid out in the form of graphs. This is my attempt to visualize a bit of that data, depicting 440 years of western ancient philosophy.

    Green represents a relationship between master and pupil, and the reddish color represents a more contentious one. The rest are less concrete. Dang, Sokrates.