Garbage in, garbage out, as the old adage goes. Nigel Hawkes, Director of Straight Statistics, describes a letter to the British Medical Journal that amounts to a kind of statistical whistleblowing.
A team from Imperial College found that in 2009-10, nearly 20,000 adults were coded as having attended paediatric outpatient services, and 3,000 patients under 19 were apparently treated in geriatric clinics. Even more striking, between 15,000 and 20,000 men have been admitted to obstetric wards each year since 2003, and almost 10,000 to gynaecology wards.
With reports like these, it’s hard to put your faith in analysis, visualization, policy, or anything else that comes out of data. Human error is a known issue, so we have to find better ways of entering and double-checking data. Mistakes made at the outset only lead to bigger problems down the line.
Reading the comments on the post you link to, apparently a very large percentage of the “pregnant” men are actually newborn male babies, registered in obstetric wards. So perhaps the problem is not as grave as it would first appear…
Yeah, this sounds like sloppy analysis rather than dirty data: according to the comments on the article, 96% of the 20,000 seemingly pregnant men were “relating to [male] babies less than one week old” receiving treatment from midwives in obstetric wards, and “almost all” were young babies. These 20,000 represent around 2% of the sample, so the ‘almost none’ that seem to have been actual errors suggest an error rate below 0.1%. Sounds legit.
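The arithmetic in that comment can be checked in a few lines. This is only a sketch that restates the commenter's figures; it assumes the “2% of the sample” refers to the whole data set and treats every case not explained by newborns as a genuine error, so the result is an upper bound rather than a measured rate.

```python
def worst_case_error_rate(flagged, share_of_sample, explained_fraction):
    """Bound the error rate implied by a flagged subset of records.

    flagged            -- number of suspicious records (e.g. men in obstetric wards)
    share_of_sample    -- fraction of the whole sample those records represent
    explained_fraction -- fraction of flagged records with a legitimate explanation
    Returns (implied sample size, worst-case error rate), assuming every
    unexplained flagged record is a real coding error.
    """
    sample_size = flagged / share_of_sample
    unexplained = flagged * (1 - explained_fraction)
    return sample_size, unexplained / sample_size


# Figures quoted in the comment: 20,000 men, ~2% of the sample, 96% newborn boys.
size, rate = worst_case_error_rate(20_000, 0.02, 0.96)
print(f"implied sample size: {size:,.0f}")    # implied sample size: 1,000,000
print(f"worst-case error rate: {rate:.2%}")   # worst-case error rate: 0.08%
```

Even if all 800 unexplained records were mistakes, the rate stays under the 0.1% the commenter mentions.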
That leaves the “3,000 patients under 19 [who] were apparently treated in geriatric clinics”. Now, I’m no expert, but I would imagine that when a person under 19 has a condition that is most common among the elderly, and the best available treatment is therefore in a geriatric clinic, it would be perfectly rational to check them into that clinic, if that’s where the experts and the equipment are. Since this is a nationalised health service covering a population of over 60 million, 3,000 cases like this in a year seems plausible.
An adult friend of mine has a condition that is most common in children, so she’s often checked into pediatric wards and seen by specialists in her condition whose original training was in pediatrics and who normally see children. It’s no mistake; it’s someone getting the best treatment available. But her case would probably look like a mistake in a shallow analysis like this.
I find the biggest problem with health data is that it is captured by medical staff who are often under pressure to concentrate on giving care rather than doing admin (for which we should be grateful, I guess). Gaps and errors are the result. Key facts are recorded, but secondary data such as ethnicity, which may shed useful light on some diseases, is often left unrecorded.
However, the designers of medical data systems need to make these applications much more robust in terms of validation, and much more like the apps people use every day outside work, where web-style icons and contextual help speed up data entry.
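One way to make entry systems more robust without blocking the legitimate cases described elsewhere in this thread (newborn boys cared for by midwives, adults seen by paediatric specialists) is to flag unusual combinations as soft warnings at entry time rather than rejecting them. The sketch below is hypothetical: the record fields, specialty names, and age thresholds are illustrative assumptions, not the coding scheme the NHS or HES actually uses.

```python
from dataclasses import dataclass

# Hypothetical record layout; real NHS/HES extracts use different fields and codes.
@dataclass
class Admission:
    sex: str        # "M" or "F"
    age_days: int   # age at admission, in days
    specialty: str  # e.g. "obstetrics", "gynaecology", "paediatrics", "geriatrics"

ADULT_DAYS = 18 * 365     # assumed adult threshold (approximate, ignores leap days)
UNDER_19_DAYS = 19 * 365  # "under 19", as in the Imperial College figures

def plausibility_warnings(rec: Admission) -> list[str]:
    """Return soft warnings for unusual combinations; never reject the record."""
    warnings = []
    if rec.sex == "M" and rec.specialty in ("obstetrics", "gynaecology"):
        # Newborn boys under a week old in obstetrics are a known, legitimate case.
        if not (rec.specialty == "obstetrics" and rec.age_days < 7):
            warnings.append(f"male patient in {rec.specialty}: confirm ward or sex")
    if rec.specialty == "paediatrics" and rec.age_days >= ADULT_DAYS:
        warnings.append("adult in paediatrics: confirm specialty or date of birth")
    if rec.specialty == "geriatrics" and rec.age_days < UNDER_19_DAYS:
        warnings.append("patient under 19 in geriatrics: confirm specialty or date of birth")
    return warnings


print(plausibility_warnings(Admission("M", 3, "obstetrics")))        # []
print(plausibility_warnings(Admission("M", 40 * 365, "obstetrics")))
```

Returning warnings instead of raising errors keeps the clinician in control: a prompt at entry time can catch a mistyped sex or date of birth, while a genuine exception is simply confirmed and saved.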
In a post about sloppy statistics, you quote numbers without indicating the size of the total sample? Classy. I wonder if there are 20 million or 200,000 people in these data sets…
@Will – precisely
My instinct is that these are probably an acceptable level of noise in a data set such as ‘all NHS patients’, but of course we don’t know unless we’re given the figures.
The NHS is now split by country in the UK so this is probably looking at England (the largest population) or the whole UK combined. The NHS in England caters for ~52m people while the whole of the UK is ~62m.
However, we just don’t know because the total is not stated.
The same is true of vital registration statistics, in high- as well as low- and middle-income countries. There will be numerous cases of men coded as dying from cervical cancer, newborns coded as dying of maternal causes, and so on. There are also cases where the combination of age at death and cause of death is implausible: for example, four-year-olds dying of conditions that often take many years to manifest themselves, like lung cancer.
My husband actually was one of those men admitted to an obstetric ward! He had surgery to have his tonsils and adenoids out and to widen his nasal passages. Because of other medical issues he had to stay overnight after surgery for recovery. There were no other rooms available in the hospital… so they admitted him to the obstetrics floor.
So some of those “pregnant men” actually may represent issues of crowding rather than errors in documentation.
“The government are very keen on amassing statistics. They collect them, add them, raise them to the nth power, take the cube root and prepare wonderful diagrams. But you must never forget that every one of these figures comes in the first instance from the village watchman, who just puts down what he damn pleases.” – Sir Josiah Stamp
So men being admitted to obstetric wards is part of a dark government conspiracy? :-/
Every dataset has some error. The difficulty comes with accurately estimating the error and attributing it to different sources.
Thus, rather than being cause to criticize this dataset, clear “errors” like (adult) men being admitted to obstetric wards could offer a way to estimate the overall error in the dataset.
True, but also, between the comments on the original article and AKDB’s comment here, I think we’ve pretty much established that a man being admitted to an obstetric ward can’t just be assumed to be a coding error.
The stats are from HES (http://www.hesonline.nhs.uk) and relate in the first instance to the specialty of the clinician rather than the function they are performing. The latter set, I’m guessing, mainly relates to male babies.
The byline on this article about faulty data reads: Nigel Hawkes :: Fri, 06/04/2012 – 08:24
For the record, today is Friday, 18 May, 2012.
Now if you all will excuse me, I’m going to go find out who wins this weekend’s Preakness Stakes.
Are you referring to the date? If so, I believe they mean 6-APR-2012.