  • The Bureau of Labor Statistics announced a reduction in the data collection behind the Consumer Price Index, which is used to estimate inflation in the United States.

    BLS is reducing sample in areas across the country. In April, BLS suspended CPI data collection entirely in Lincoln, NE, and Provo, UT. In June, BLS suspended collection entirely in Buffalo, NY.

    Sample reduction and collection suspension affect both the commodity and services survey and the housing survey. These actions have minimal impact on the overall all-items CPI-U index, but they may increase the volatility of subnational or item-specific indexes. The number of imputed items and the nonresponse rates increased in April due to these actions. BLS makes reductions when current resources can no longer support the collection effort. BLS will continue to evaluate survey operations.

    I am sorry, Buffalo. You no longer directly feed into the national estimates.

    Budget cuts continue to force agencies to reduce their data coverage, which inevitably shifts the estimates. This is an ongoing challenge across agencies, and it is growing worse.

  • Chess grandmaster Magnus Carlsen played against 143,000 people in a single game. The crowd voted on each move, and it eventually ended in a draw.

    Carlsen, 34, became the world’s top-ranked player in 2010 at 19 and has won five World Championships. He achieved the highest-ever chess rating of 2882 in 2014 and has remained the undisputed world No. 1 for more than a decade.

    “Overall, ‘the world’ has played very, very sound chess from the start. Maybe not going for most enterprising options, but kind of keeping it more in vein with normal chess — which isn’t always the best strategy, but it worked out well this time,” Carlsen said in a statement Friday as Monday’s draw seemed imminent.

    I hadn’t thought about the wisdom of crowds in a while. Over the years, it’s felt like the crowds have gotten a little less wisdom-y, but maybe it’s a good time to revisit. Use our powers for good.
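
    As a refresher on how that's supposed to work, here's a toy simulation in Python, with purely illustrative numbers that have nothing to do with the Carlsen match:

        import random

        random.seed(1)

        # Wisdom-of-crowds toy demo: many noisy, independent guesses at a
        # true value. The crowd's median lands closer to the truth than the
        # typical individual does. (Galton's 1907 ox-weighing crowd is the
        # classic example; his ox weighed 1,198 pounds.)
        truth = 1_198
        guesses = [truth + random.gauss(0, 150) for _ in range(143_000)]

        crowd_median = sorted(guesses)[len(guesses) // 2]
        typical_error = sum(abs(g - truth) for g in guesses) / len(guesses)

        print(f"crowd median is off by {abs(crowd_median - truth):.1f}")
        print(f"typical individual is off by {typical_error:.1f}")

    The catch is independence. When guesses share the same bias, the median inherits it, which might be why crowds feel less wisdom-y these days.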

  • A couple years ago, Harvard professor Francesca Gino was accused of faking data, ironically for research on honesty. Gino recently lost tenure:

    A Harvard professor who has written extensively about honesty was stripped of her tenure this month, a university spokesman said on Tuesday, after allegations that she had falsified data.

    The scholar, Francesca Gino, a professor of business administration at Harvard Business School and a prominent behavioral scientist, has studied how small changes can influence behavior and been published in a number of peer-reviewed journals. Among the studies in which Dr. Gino has been a co-author is, for example, one showing that counting to 10 before deciding what to eat can lead to choosing healthier food.

  • Members Only

    Hi folks. It’s Nathan. Welcome to the Process, the newsletter for FlowingData members…

  • There is always ample discussion about progressive tax rates in the United States. For those unfamiliar, income earned within certain ranges is taxed at different rates, with higher income taxed at higher rates. For Datawrapper, Luc Guillemot charted the rates for countries across Europe.

    The x-axes represent income levels as a percentage of average income in each country. The y-axes represent the tax rate for the income level. The black bars show averages for the European Union. Belgium, with the steepest climb, increases taxes the most, whereas Hungary and Bulgaria use a flat rate across income levels.
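
    To make the bracket mechanics concrete, here's a minimal sketch in Python. The brackets and rates are invented for illustration and don't correspond to any country's actual schedule:

        def marginal_tax(income, brackets):
            """Tax income under marginal brackets: each slice of income is
            taxed at its own rate, not the whole amount at a single rate."""
            tax = 0.0
            for lower, upper, rate in brackets:
                if income > lower:
                    tax += (min(income, upper) - lower) * rate
            return tax

        # Hypothetical brackets: (lower bound, upper bound, rate)
        brackets = [
            (0, 10_000, 0.00),
            (10_000, 40_000, 0.20),
            (40_000, float("inf"), 0.40),
        ]

        print(marginal_tax(50_000, brackets))  # 10000.0, an effective rate of 20%
        print(0.10 * 50_000)                   # 5000.0 under a 10% flat rate

    Note that the effective rate (20%) sits below the top marginal rate (40%); the climb from one to the other as income grows is what the charts trace.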

  • For MIT Technology Review, James O’Donnell and Casey Crownhart ran the numbers and interviewed experts to piece together a projection for how much energy AI will use. The takeaway is that it’s impossible to know with any certainty, because companies don’t disclose what they’re building.

    The Lawrence Berkeley researchers offered a blunt critique of where things stand, saying that the information disclosed by tech companies, data center operators, utility companies, and hardware manufacturers is simply not enough to make reasonable projections about the unprecedented energy demands of this future or estimate the emissions it will create. They offered ways that companies could disclose more information without violating trade secrets, such as anonymized data-sharing arrangements, but their report acknowledged that the architects of this massive surge in AI data centers have thus far not been transparent, leaving them without the tools to make a plan.

    “Along with limiting the scope of this report, this lack of transparency highlights that data center growth is occurring with little consideration for how best to integrate these emergent loads with the expansion of electricity generation/transmission or for broader community development,” they wrote. The authors also noted that only two other reports of this kind have been released in the last 20 years.

  • Hardware for AI uses a whole lot of energy while training on data from the internets, processing queries, and hallucinating surprising solutions. Alex de Vries-Gao, from the Institute for Environmental Studies in the Netherlands, published estimates for how much energy AI hardware consumes and compared the figures to the energy demand of entire countries.

    Over the full year of 2025, a power demand of 5.3–9.4 GW could result in 46–82 TWh of electricity consumption (again, without further production output in 2025). This is comparable to the annual electricity consumption of countries such as Switzerland, Austria, and Finland (see Figure 2; Data S1, sheet 6). As the International Energy Agency estimated that all data centers combined (excluding crypto mining) consumed 415 TWh of electricity in 2024, specialized AI hardware could already be representing 11%–20% of these figures.
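
    The power-to-energy conversion is easy to sanity-check: a constant draw in gigawatts times the hours in a year gives gigawatt-hours. A back-of-the-envelope version in Python, assuming the hardware runs continuously all year:

        HOURS_PER_YEAR = 365 * 24  # 8,760 hours

        for gw in (5.3, 9.4):
            twh = gw * HOURS_PER_YEAR / 1_000  # GW x hours = GWh; / 1,000 = TWh
            print(f"{gw} GW -> {twh:.0f} TWh")  # 5.3 GW -> 46 TWh; 9.4 GW -> 82 TWh

        # Share of the IEA's 415 TWh estimate for all data centers in 2024:
        print(46 / 415, 82 / 415)  # roughly 0.11 and 0.20, i.e., the 11%-20% range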

    There are many assumptions behind the estimates, and they could be lower or higher depending on the unknowns, but most signs appear to point to steep increases.

    We should probably plan for that. It doesn’t seem like this AI train is going to slow down any time soon. (via Wired)

  • Downloading survey microdata from public resources can be tricky. Sometimes the documentation is sparse, the tools are outdated, or the datasets are tucked away in obscure FTP subdirectories. This is annoying when you just want to work with the data.

    Analyze Survey Data for Free, maintained by Anthony Damico, aims to streamline the download process via R. From a decade ago:

    Governments spend billions of dollars each year surveying their populations. If you have a computer and some energy, you should be able to unlock it for free, with transparent, open-source software, using reproducible techniques. We’re in a golden era of public government data, but almost nobody knows how to mine it with technology designed for this millennium. I can change that, so I’m gonna. Help. Use it.

    The site received an update to make downloading easier across 49 public datasets. Given the data takedowns these days, now seems like a good time to make quick use of the resource.

  • Roger Peng and Hilary Parker started the statistics and data science podcast Not So Standard Deviations almost a decade ago. It was one of the few podcasts I kept up with while I drove my kids to school. They posted their last episode last month.

    Pouring one out for NSSD.

  • As we enter a time when people question the usefulness of vaccines, despite their clear benefits, Neil Halloran revisits a time when vaccines did not exist. With a mix of charts, information graphics, and photographs, Halloran tells the story of the smallpox vaccine and how a disease with a high mortality rate and hundreds of millions of deaths was eventually brought down to zero.

  • The Congressional Budget Office published a report estimating the effects on household income if the GOP’s budget reconciliation bill goes through. The poor and middle class will effectively net less income, and the upper class, especially the top 0.1%, will net more. G. Elliott Morris describes it as paying more for less, which is a raw deal.

    In other words, by decreasing the amount of tax revenue it gathers from the ultra-rich, decreasing the transfers it makes to the poor, and increasing its overall spending, Republicans are asking middle-income and poor families to shoulder a much larger share of the federal deficit — all while they get less from the government. They are asking you, in summary, to pay more for less.

    The chart above is from CBO. It compares the change in income for the lowest decile against the highest, as a percentage of income. Check out Morris’s chart that shows the shift in dollars. The bars grow much taller on the high end with an absolute scale.
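
    The relative-versus-absolute distinction is easy to gloss over, so here's a toy calculation in Python, with invented incomes rather than CBO's figures:

        # Invented numbers: the same policy reads very differently on a
        # relative (percent) scale than on an absolute (dollar) scale.
        incomes = {"lowest decile": 20_000, "highest decile": 600_000}
        pct_change = {"lowest decile": -0.04, "highest decile": 0.02}

        for group, income in incomes.items():
            dollars = income * pct_change[group]
            print(f"{group}: {pct_change[group]:+.0%} = {dollars:+,.0f} dollars")

        # lowest decile: -4% = -800 dollars
        # highest decile: +2% = +12,000 dollars

    A small percentage of a large income dwarfs a larger percentage of a small one, which is why the bars in the dollar version grow so much taller at the top.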

  • Members Only

    The good stuff from May: tools you can use, data to play with, and resources to learn from.

  • As the administration tries to block international students from attending Harvard University, NYT’s the Upshot charted the schools with the highest percentage of international students.

    I don’t know anything about Illinois Tech, but whoa, over half of undergraduates and graduate students are from outside the U.S.

  • The administration is making it more difficult, if not impossible, for foreign students to attend colleges and universities in the United States. Catherine Rampell, for Washington Post Opinion, argues that doing so increases the trade deficit, since education counts as an export.

    We also run a huge trade surplus in this sector, meaning that foreigners buy much more education from the United States than Americans buy from other countries. In the 2022-2023 school year, more than three times as many international students were enrolled in the United States as there were American students studying abroad. Translated to cash: Our education-services trade surplus is larger than the trade surplus in the entire completed civilian aircraft sector.

    On top of that, the people who are able to study abroad are often hard-working and among the brightest in their class. They provide American students with fresh perspectives.

  • For the Intercept, Sam Biddle reports on government plans for a one-stop shop to buy all the data.

    Rather than each agency purchasing CAI individually, as has been the case until now, the “Intelligence Community Data Consortium” will provide a single convenient web-based storefront for searching and accessing this data, along with a “data marketplace” for purchasing “the best data at the best price,” faster than ever before, according to the documents. It will be designed for the 18 different federal agencies and offices that make up the U.S. intelligence community, including the National Security Agency, CIA, FBI Intelligence Branch, and Homeland Security’s Office of Intelligence and Analysis — though one document suggests the portal will also be used by agencies not directly related to intelligence or defense.

    “In practice, the Data Consortium would provide a one-stop shop for agencies to cheaply purchase access to vast amounts of Americans’ sensitive information from commercial entities, sidestepping constitutional and statutory privacy protections,” said Emile Ayoub, a lawyer with the Brennan Center’s liberty and national security program.

    Data privacy issues still get shoulder shrugs from most people. But it’s getting easier to see why access to such data can grow problematic when certain individuals are out to get others who have done nothing wrong. (Right??)

  • For The Washington Post, Douglas MacMillan and Aaron Schaffer report on a facial recognition system in New Orleans that was in use for two years before it came under scrutiny:

    Police across the country rely on facial recognition software, which uses artificial intelligence to quickly map the physical features of a face in one image and compare it to the faces in huge databases of images — usually drawn from mug shots, driver’s licenses or photos on social media — looking for possible matches. New Orleans’s use of automated facial recognition has not been previously reported and is the first known widespread effort by police in a major U.S. city to use AI to identify people in live camera feeds for the purpose of making immediate arrests, Wessler said.

    It seems clear that facial recognition can be helpful in some cases. Problems arise when the systems go unchecked and everyone has to argue their innocence when out for a walk in the park.

  • Ukraine has suffered ongoing damage to its power infrastructure since the invasion began. Bloomberg mapped the damage through the lens of lights from above.

    A Bloomberg News analysis of satellite imagery collected by NASA found that Kharkiv City experienced a 94% drop in the intensity of nighttime lights in the autumn of 2024 when compared to three years before Russia’s invasion. The northeast Ukrainian city’s dramatic change in satellite-detected lighting ranks third of all cities, urban areas and other communities, after Nikopol and Avdiyivka.

    Bloomberg averaged pixel brightness before and after the invasion; a rough sketch of that computation is below. Be sure to click through to see the lights fade in and out as an animation.

    You can download the satellite imagery data through NASA, updated daily since 2012. It amazes me every time I’m reminded that such detailed data is openly available and easy to access.
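
    Here's a minimal sketch of that before-and-after averaging, using synthetic data in place of real rasters. The real inputs would be nighttime-radiance imagery such as NASA's Black Marble products; Bloomberg's exact pipeline isn't public, so treat this as an assumption-laden outline:

        import numpy as np

        rng = np.random.default_rng(0)

        # Stand-in rasters: stacks of monthly nighttime-radiance composites
        # (months x height x width) over the same city footprint. These are
        # synthetic; real data would come from NASA's Black Marble archive.
        before = rng.gamma(shape=2.0, scale=30.0, size=(3, 100, 100))
        after = 0.05 * before + rng.gamma(shape=2.0, scale=0.5, size=(3, 100, 100))

        # Average every pixel across months, then compare overall brightness.
        drop = 1 - after.mean() / before.mean()
        print(f"drop in nighttime light intensity: {drop:.0%}")

        # The number above is synthetic; Kharkiv's measured drop was 94%.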

  • With a circular Voronoi diagram, NYT’s the Upshot shows a much slower rate of funding from the National Science Foundation, through May 21.

    The full cells show the average funding over the past decade up to the same date, and the darker cells show the current funding. It looks like there are four categories with more funding than usual. Everything else is a big cut.

    In case this view looks familiar, the Upshot used a similar view to show infrastructure proposals a few years ago, although this time the circles look like petri dishes, given the topic.

    Rewind further, to 2008, for the O.G. consumer spending graphic.

  • Last year, the Consumer Financial Protection Bureau proposed a new rule that would better protect individuals’ privacy from the companies that collect and collate digital traces from wherever they can. Seemed like a good idea. But the current CFPB director Russell Vought has different ideas.

    For Wired, Dell Cameron and Dhruv Mehrotra report on the potential harm:

    Data brokers operate within a multibillion-dollar industry built on the collection and sale of detailed personal information—often without individuals’ knowledge or consent. These companies create extensive profiles on nearly every American, including highly sensitive data such as precise location history, political affiliations, and religious beliefs. This information is frequently resold for purposes ranging from marketing to law enforcement surveillance.

    Many people are unaware that data brokers even exist, let alone that their personal information is being traded. In January, the Texas Attorney General’s Office, led by attorney general Ken Paxton, accused Arity—a data broker owned by Allstate—of unlawfully collecting, using, and selling driving data from over 45 million Americans to insurance companies without their consent.

    I’m sure money had nothing to do with these choices.

  • Members Only

    This week we look at how the same data can easily lead to different conclusions that can all be correct, even when they conflict.