You would think that something so concrete, carefully recorded by authorities, wouldn’t be too tough to tabulate, even at a large scale. Not so.
Homicide is a “serious crime that many people are concerned with, it is well-measured, and it is to a large degree well-reported and -recorded,” says Alfred Blumstein, a criminologist at Carnegie Mellon University. “That is not to say that there aren’t a variety of ways for fudging the measurement.”
Among the factors that cloud homicide numbers: gaps between police-reported numbers and counts by public-health organizations. The discrepancy is wide in many African countries and some Caribbean ones. The United Nations attributes the disparity to several factors, including definitional differences—whether honor killings should count—a lack of public-health infrastructure in some countries, and undercounting—possibly deliberate—by police.
I think this is something the general public often doesn’t understand about data. The numbers are entered and analyzed on a computer, so it’s easy to mistake data for mechanical output. It must be accurate, right? That’s often not the case, though, especially when it comes to data collection outside a controlled lab setting.
The game always changes when humans are involved. Not everyone responds to surveys, definitions of events vary across organizations, estimation methods change every year, and the list goes on.
If you work with data, you have to deal with that uncertainty, and if you consume data, you have to remember that numbers don’t automatically mean fact.
It is worth pointing out that not only do you have to avoid miscalculations with your data, but you also have to check carefully where the data comes from. I agree that people (including me) tend to feel comfortable as soon as the data is on the PC (“It must be true”). This article shows once again how wrong we are.
This post reminds me to go back and watch The Wire. They spend almost entire episodes showing how cops ‘juke the numbers’ in crime reporting to make themselves look better. As soon as people’s paychecks and promotions depend on self-reported statistics (or even just something they have any influence over), the numbers are questionable. Just look at the recent test cheating scandal in Atlanta with No Child Left Behind.
Even in the US, counting homicides can be complicated. I work on a site called Homicide Watch DC, where we cover every murder in the District of Columbia from crime to conviction. One thing we had to decide on early was how to count.
The short version is that homicides are counted on the day they’re declared a homicide, not the day the original crime occurred.
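To make that counting rule concrete, here is a minimal sketch (using made-up dates, not Homicide Watch data) showing how the same cases land in different yearly tallies depending on whether you key on the crime date or the declaration date:

```python
from collections import Counter
from datetime import date

# Hypothetical cases: each has the date of the original crime and the
# (sometimes later) date it was officially declared a homicide.
cases = [
    {"crime": date(2011, 12, 28), "declared": date(2012, 1, 3)},
    {"crime": date(2012, 2, 14),  "declared": date(2012, 2, 14)},
    {"crime": date(2012, 11, 30), "declared": date(2013, 1, 9)},
]

# Tally by the year the crime occurred vs. the year of declaration.
by_crime_year = Counter(c["crime"].year for c in cases)
by_declared_year = Counter(c["declared"].year for c in cases)

print(dict(by_crime_year))     # {2011: 1, 2012: 2}
print(dict(by_declared_year))  # {2012: 2, 2013: 1}
```

Same three deaths, but the annual totals differ under the two rules, which is exactly why a site and a police department can report different numbers for the same year without either being “wrong.”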
I wrote a post with more explanation here.
Then there are closure rates, which, well, read this and see if it makes sense: http://homicidewatch.org/2011/12/30/understanding-mpds-94-homicide-closure-rate/
As someone who lived in, and worked as a reporter in, a high-crime city, I can tell you there is a huge discrepancy between reported and actual crimes – but the homicide stats are by far the most accurate, which is why I wrote “Lies, damn lies and crime statistics: You can only trust the homicide stats.” http://kevinjmireles.wordpress.com/2010/06/15/lies-damn-lies-and-crime-statistics-you-can-only-trust-the-homicide-stats/
Now that I work in the BI/analytics business, there are a lot of good lessons to be learned from the vast gulf between reported crime stats and actual crime stats.