Simon Willison asked a straightforward question about the tools people use:
“If someone gives you a CSV file with 100,000 rows in it, what tools do you use to start exploring and understanding that data?”
He then expanded the question, asking what people use for files with 1 million rows, 10 million rows, and 1 billion rows.
Browse the thousands of replies, and you quickly see that (1) there are many options for exploring a dataset and (2) many people feel that what they’re using is the best option. There are click-and-play programs, web-based products, programming languages, and command-line options. Some people use a combination of whatever works for them at a given time for a given dataset.
This is why, when people ask me what the “best” tool is, I usually have to follow up by asking what they already know and what they want to do with the tool. It’s also why best-of lists for data exploration are usually not worth your time, unless you account for the assumptions they make about usage.