I've been reading papers on how people learn statistics (and thoughts on teaching the subject) and came across the frequently-cited work of mathematical psychologists Amos Tversky and Daniel Kahneman. In 1972, they studied statistical misconceptions. It doesn't seem much has changed. Joan Garfield (1995) summarizes them in How Students Learn Statistics [pdf].
People estimate the likelihood of a sample based on how closely it resembles the population.
You can't always judge how likely a sample is by how closely it resembles a known population. For example, let's say you flip a fair coin four times and get four tails in a row (TTTT). Then you flip four more times and get HTHT. In the long run, heads and tails are going to be split roughly 50/50, but that doesn't mean the second sequence is more likely. Each exact sequence of four flips has the same probability: 1 in 16.
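Similarly, getting ten heads in a row isn't the same as getting a million heads in a row. The first is unlikely but plausible for a fair coin (about 1 in 1,024), while the second is practically impossible.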
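To make the coin example concrete, here is a minimal sketch that computes the probability of an exact sequence of flips. It assumes a fair coin and independent flips; the function name sequence_probability is just an illustration, not something from Garfield's paper.

```python
from fractions import Fraction
from itertools import product

def sequence_probability(sequence, p_heads=Fraction(1, 2)):
    """Probability of one exact sequence of independent coin flips."""
    prob = Fraction(1)
    for flip in sequence:
        prob *= p_heads if flip == "H" else 1 - p_heads
    return prob

# Every possible sequence of four flips (16 in total).
all_sequences = ["".join(flips) for flips in product("HT", repeat=4)]

p_tttt = sequence_probability("TTTT")
p_htht = sequence_probability("HTHT")
p_ten_heads = sequence_probability("H" * 10)

print(p_tttt, p_htht)  # 1/16 1/16
print(p_ten_heads)     # 1/1024
```

HTHT only looks more likely because it resembles the 50/50 long-run split. As an exact sequence, it is exactly as rare as TTTT.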
Use of the representative heuristic leads to the view that chance is a self-correcting process.
The history boards at roulette tables mean nothing. They're just for show. Just because a red hasn't come up in a while doesn't mean the roulette wheel is due for a red soon. Each spin is independent of the spins that came before it.
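If you'd rather see it than take it on faith, here is a rough simulation sketch. It assumes a European wheel (18 red pockets out of 37), and the drought length of 5 and the number of spins are arbitrary choices for illustration.

```python
import random

def simulate_red_after_drought(n_spins=200_000, drought=5, seed=0):
    """Compare P(red) overall with P(red) right after a run of non-red spins."""
    rng = random.Random(seed)
    p_red = 18 / 37  # assumption: European wheel, 18 red pockets of 37

    total_red = 0
    streak = 0  # consecutive non-red spins seen so far
    after_drought_spins = 0
    after_drought_red = 0

    for _ in range(n_spins):
        is_red = rng.random() < p_red
        if streak >= drought:
            # This spin follows at least `drought` non-red spins in a row.
            after_drought_spins += 1
            after_drought_red += is_red
        total_red += is_red
        streak = 0 if is_red else streak + 1

    return total_red / n_spins, after_drought_red / after_drought_spins

overall, after_drought = simulate_red_after_drought()
print(f"P(red) overall:               {overall:.3f}")
print(f"P(red) after 5 non-red spins: {after_drought:.3f}")
```

Both numbers should land close to 18/37, or about 0.486. A long stretch without red doesn't make the next red any more likely.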
People ignore the relative sizes of population subgroups when judging the likelihood of contingent events involving the subgroups.
You have to consider the right base population for comparison. Maybe a company's workforce is 80 percent men and 20 percent women. If your base is the US population, you might consider that inequality, but what if the applicant breakdown was 90 percent men and 10 percent women? In the latter case, a higher percentage of the women who applied were hired than of the men who applied.
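Here is the arithmetic as a quick sketch. The counts are hypothetical: I'm assuming 1,000 applicants and 100 hires, split the same way as the percentages above, and treating the company's makeup as if it were the makeup of its hires.

```python
# Hypothetical counts: 1,000 applicants (90/10 split), 100 hires (80/20 split).
applicants = {"men": 900, "women": 100}
hired = {"men": 80, "women": 20}

# Hire rate within each group = hires from that group / applicants from that group.
hire_rate = {group: hired[group] / applicants[group] for group in applicants}

for group, rate in hire_rate.items():
    print(f"{group}: {rate:.1%}")
# men: 8.9%
# women: 20.0%
```

Against the applicant pool, women were hired at more than twice the rate of men, even though they make up only a fifth of the hires.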
Strength of association is used as a basis for judging how likely an event will occur.
Just because some percentage of your friends are designers doesn't mean that the same percentage of people are designers elsewhere (obviously). Or the example that Garfield uses: a ten percent divorce rate among people you know isn't necessarily the same nationwide or globally.
The conjunction of two correlated events is judged to be more likely than either of the events themselves.
The common example from Tversky and Kahneman:
"Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations." A group of people were asked whether it was more probable that Linda was a bank teller or a bank teller active in the feminist movement (a sign of the times in which this poll was taken).
Eighty-five percent of respondents chose the latter, but the probability of two things happening together is always less than or equal to the probability of either one happening on its own. Every bank teller who is active in the feminist movement is still a bank teller.
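A small sketch of why the conjunction can never win. The probabilities are made up for illustration (nobody knows the real numbers for Linda). Even with a deliberately high chance that she is a feminist given that she is a bank teller, the joint probability stays below the probability of being a bank teller at all.

```python
def joint_probability(p_a, p_b_given_a):
    """P(A and B) = P(A) * P(B | A)."""
    return p_a * p_b_given_a

# Hypothetical values, chosen only to illustrate the rule.
p_teller = 0.05
p_feminist_given_teller = 0.90  # deliberately high

p_both = joint_probability(p_teller, p_feminist_given_teller)

print(f"P(bank teller)              = {p_teller:.3f}")
print(f"P(bank teller and feminist) = {p_both:.3f}")
```

Because P(B | A) can never be greater than 1, multiplying by it can only shrink P(A) or leave it unchanged.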
Notice that there's still not much math involved in these examples. It's logic, grounded in statistical foundations, that lets you think like a statistician without the math. You can get a lot done just by thinking critically about your data.