Statistics
More than mean, median, and mode.
-
Dataset as worldview
Hannah Davis works with machine learning, which relies on an input dataset to…
-
Testing Gmail’s tab choices on presidential candidates’ emails
For many, Gmail automatically categorizes incoming emails to the primary inbox, promotions, and…
-
Federal budget scaled to per person dollars
For The Upshot, Alicia Parlapiano and Quoctrung Bui scaled down the federal budget…
-
Billionaire’s spending scaled to your net worth
We hear about billionaires spending millions of dollars on ads, acquisitions, etc. It…
-
Data problems in Iowa caucus results
It wasn’t just issues with an app. There appear to be many more…
-
Privacy algorithm could lead to Census undercount of small towns
To increase anonymity in the Census records, the bureau is testing an algorithm…
-
Nationwide database of credibly accused Catholic clergy
For ProPublica, Ellis Simani and Ken Schwencke compiled an interactive database that you…
-
Dataset for rejected license plate applications
Noah Veltman just posted a dataset of 23,463 personalized license plate applications that…
-
Google Dataset Search moves out of beta
Over a year ago, Google released Dataset Search in public beta. The goal…
-
To get your personal data, provide more personal data
File another one under the sounds-good-on-paper-but-really-challenging-in-practice. Kashmir Hill, for The New York Times,…
-
How police use facial recognition
For The New York Times, Jennifer Valentino-DeVries looked at the current state of…
-
Squirrel census count in Central Park
In 2018, there was a squirrel census count at Central Park in New…
-
Misinterpreted or misleading fire maps
With all of the maps of fire in Australia, be sure to check…
-
Scripts from The Office, the dataset
The decade is almost done. You’re sitting there and you’re thinking: “I wish…
-
Analysis of online sermons
Pew Research Center analyzed online sermons in U.S. churches, taking a closer look…
-
Deaths from child abuse, a starting dataset
By way of the Child Abuse Prevention and Treatment Act, ProPublica and The…
-
AI-generated pies
Janelle Shane applied her know-how with artificial intelligence to generate new types of…
-
Looking for similar NBA games, based on win probability time series
Inpredictable, a sports analytics site by Michael Beuoy, tracks win probabilities of NBA…
-
Sephora dataset is a collection of makeup reviews that mention crying
Interested in reviews on the Sephora website for waterproof makeup, Connie Ye figured…
-
Data shelf life
Stephen M. Stigler argues that data have a limited shelf life. The abstract:…