scraping

Bot traffic surpasses human traffic

Artificial Intelligence / bot, Cloudflare, Internet, scraping

Traffic has been rising extra quickly these past couple of years. Unfortunately (or…

LinkedIn sues company for fake bots

Data Sharing / fake, lawsuit, LinkedIn, ProAPIs, scraping

Suzanne Smalley reporting for The Record:
Social media giant LinkedIn on Thursday filed…

Decline in data for AI bots to scrape

Data Sharing / AI, Data Provenance Initiative, ethics, scraping

The Data Provenance Initiative audited 14,000 web domains to see how sites currently…

News organizations blocking OpenAI

Data Sharing / news, OpenAI, scraping

Ben Welsh has a running list of the news organizations blocking OpenAI crawlers:…

Scraping data without programming

Software / Google Sheets, Samantha Sunne, scraping

Maybe you’ve wished you could quickly grab the data on a webpage and…

Scraping public data ruled legal

Data Sources / public, scraping

For TechCrunch, Zack Whittaker reporting:
In its second ruling on Monday, the Ninth…

Spatula, a Python library for maintainable web scraping

Software / Python, scraping

This looks promising:
While it is often easy, and tempting, to write a…

Mining Parler data

Data Sources / Capitol, metadata, Motherboard, Parler, scraping

Just before the social network Parler went down, a researcher who goes by…

Practical tips for scraping data

Coding / NPR, scraping

It’s an unpleasant feeling when you have an idea for a project and…


Purifying the Sea of PDF Data, Automatically →

Software / link, PDF, scraping

Jeremy B. Merrill is working on the problem of too much data in PDF files. “My pattern solves this problem using tabula-extractor, the Ruby library (and command-line tool) that powers Tabula. It’s built to output data to CSVs or to a MySQL database.”


Mining the Social Web

Software / link, scraping

This is the example repository for Mining the Social Web, if you’re interested in getting started. The Twitter examples rely on a soon-to-be-defunct API because the book was written in 2011, but the rest is still valid.

A guide for scraping data

Data Sources / scraping

Data is rarely in the format you want it. Dan Nguyen, for ProPublica,…
