Data plural versus data singular

Posted to Statistics  |  Nathan Yau

Kevin Drum on “data is” versus “data are”:

Now, I know that lots of people continue to foolishly disagree with me about this, but I’m curious how far they’re willing to push things. If you had, say, five bits of information, would you say I only have five data? If you really, truly believe that data is a plural noun, you’d have no problem with this. But does anyone actually do it?

This was in response to the Wall Street Journal’s style guy saying that they can go either way, as the word has evolved to also mean a singular collection of numbers.

Here’s what the New York Times style guide has to say about it:

[D]ata is acceptable as a singular term for information: The data was persuasive. In its traditional sense, meaning a collection of facts and figures, the noun can still be plural: They tabulate the data, which arrive from bookstores nationwide. (In this sense, the singular is datum, a word both stilted and deservedly obscure.)

I say data is. The plural version sounds weird to me.


  • This is an important topic. After we settle this, we need to tackle the pronunciation: ‘day-ta’ vs. ‘dah-ta’.

    • ImSpartacus July 12, 2012 at 9:41 am

      Long a (day-tuh) sounds less stuffy.

      I naturally associate a short a with a British accent. There’s a reason British butlers get paid more. They sound stuffier.

      • And I associate Day-ta with Brent Spiner’s character and Dah-ta with information… and as such can only say the latter.

  • Then we should say “five dataS” and not “five data”. I think this is where it gets even more complicated and weird to hear or read. It reminds me of infographics vs. data visualisation.

  • Benjamin: historically, “data” is the plural form of “datum,” which means a single data point, so “five data” would be the correct inflection. Even if we treat “data” as plural, though, it doesn’t necessarily follow that it’s correct to say “five data.” There are other grammatically plural nouns that behave in weird ways, like “pants.”

    Personally, both “this data” and “these data” sound wrong to me, so I contort my sentences to avoid the distinction.

    • The thing is, data itself has changed over the years. Data used to always be clear-cut singular observations. It used to be unambiguous where one datum ends and the next datum begins. Data today is more varied and less clear-cut, so it’s appropriate that it has changed from being a count noun, like “these birds are” and “those barrels are”, to a mass noun, like “this rice is” and “that water is”.

      If you say “I have data”, and someone says “How many?”, could you answer without first clarifying what one unit of data is in this context? Usually not – only in rare, exceptionally simple cases where you both already know how this data is structured. For any real plural noun, however, asking “How many?” is unambiguous and perfectly normal. You might not be in a position to count the things, but you’d know how to count them if you were, and you’d know what standard defined one unit.

      Another word that went through a similar change is “physics”. It used to be the plural of “physic”, where “physic” means “a branch of natural science”. There used to be a universally defined, agreed set of physics. Someone could ask “How many physics do you teach?” and you could count them and give a definite answer (“Three. I teach the physic of motion, the physic of light, and the physic of thermal dynamics” – I’m guessing with the example physics but you get the idea). Then physics developed as a field and now, there are an almost unlimited number of legitimate ways of defining branches, sub-branches and areas of multi-disciplinary study within the natural sciences. Using “physics” as a plural today would be wrong because it would imply a clear universal standard for counting different physics that simply no longer exists.

      It’s the same with data. What is one datum in the context of the machine code that makes up a digital photograph? Is a datum one bit, one byte or the data defining one pixel in this context? What about the metadata attached to most photos containing things like timestamps and information about the camera? If asked “How many data are in this photo?”, do you include them or exclude them? There is no universal definition, so we talk about the data in a photo like the water in a bottle, and we measure it in mass units (this is 86kb of data, that is 2 litres of water).

      What is one datum in the context of the data stored in a map? Is a road a datum, or is each angle in the vector of a road a datum? Is a landmark one datum, or 5 linked data (latitude, longitude, altitude, label, category…)? We don’t pretend that the data in the map can be counted using a standard criterion; instead, we communicate the amount of data by talking about standards and mass units (“This is a map of X type, at Y scale and Z size”).

  • It depends on whether you’re talking about individual items of data (e.g. “the data show that…”) or all the items collectively (“the data shows that…”). It’s like “people” vs “persons.”

  • But there is a singular form of data — datum. This is the joy of English (language in general but English in particular offers such a wealth of examples) — there is no absolute rule.

    I actually don’t care either way but I admit, I don’t think of data in terms of 0s and 1s or as individual units of information but more like a flock of birds — yes it is made of individual birds but what is interesting is the group (i.e., data:flock not data:geese).

  • I pretty much treat it as a collective noun, like flock or crowd. Since the term “datum” is fairly obsolete at this point, I don’t think there is much danger of running across “data” as an actual plural. Any use of “data” as a plural just sounds very wrong to me. “Five data” sounds completely nonsensical.

    As to the pronunciation, I’m really hoping that we can agree that both are correct. Because I find myself using both, sometimes within the same conversation. Possibly within the same sentence.

  • In my world, the use of data as a plural noun serves primarily as a way to distinguish quantitative researchers from our more qualitative-focused colleagues or those with less training or education. It’s a bit of an obscure relic and not something that anyone will ever really care about, although using data as a singular noun might raise a few eyebrows or cause some of us (e.g. me) to more closely scrutinize one’s quantitative methods and conclusions.

  • Data Schlepper July 12, 2012 at 11:31 am

    I predict that the word “datum” will make a comeback. Politicians and pundits will start to use it because it sounds cooler than “data”. Look for it in op-ed pieces and sound bites.

  • Data, like “Mathematics”, or “Physics”, but unlike “pants” is a collective noun, and therefore perfectly correct in “The data is convincing”. While “datum” is obsolete, it really refers to the true singular, i.e. “a (single) data point”.

    Data, being a collective noun, is both a singular and a plural simultaneously. This is not a problem.

    Just please do not start saying “datas”, as that would be truly annoying…

  • I’m a very precise guy who likes to point out the real definition of words like “plethora” to people (whom I secretly regard as idiots). Nevertheless, I don’t give a f**k what the dictionary says; saying “the data are” sounds weird, so I won’t use it that way. But I know that I’m technically wrong, so like the guy above I try to contort my sentences when writing to avoid the issue, but I can’t bring myself to say it any other way than the natural way when speaking. I try to avoid considering the hypocrisy of judging other people by their standards of language whilst refusing to obey the laws myself.

  • For individual “datums”, don’t we have records, observations, measurements, values? Data is like water, there may be a lot of H2O molecules out there, but I think of the water in that river.

  • Matteo Cerri July 13, 2012 at 2:23 am

    Datum is a Latin word meaning “given”. It is singular and the plural is data. Usually Latin words should be used following Latin grammar, so datum for one point, data for more than one. But often in English the Latin origin is forgotten and the Latin word starts to intermingle with English grammar. Similar to datum/data is, for instance, the word media. Media (as in mass media, social media…) is the plural form of medium, a Latin word meaning “instrument”, but I have never read about a social medium or mass medium. Much more common is to read sentences like “Facebook is a social media…”

  • You just ruined my day.

  • Richard Hackathorn July 13, 2012 at 7:22 am

    LOL… You brought a smile to me this morning! We can now laugh about this glitch in the English language. However, several decades ago authors of early textbooks on information systems had a big fight with technical editors at major publishers about this issue. I know! Been there; done that. The data is in… Go either way, but just be consistent.

  • “The data suggest” makes sense if you’re thinking about a collection of observations (or think meta-studies).

  • My opinion: Day-ta is. Dah-ta are. Information is. The day-ta suggests… The dah-ta suggest… Water is. Water droplets are. Day-ta is fluid. Dah-ta are many individual units.

  • [D]ata is acceptable as a singular term for information: The data was persuasive.

    In this example, the data here means more than one piece of data as it implies more than one piece of information. Data is ALWAYS a COLLECTION of information.

    “The data IS persuasive,” not “the data ARE persuasive,” is correct, and it is one of many oddities about the English language: the people (more than one) are, but the group (still more than one) is; everyone (all the people) is, but bloggers (more than one) are…

    No consistency at all, but it is beginning to make sense. People and bloggers are many individuals while group, everyone, and data are words that are ONE entity that encompasses many. So I guess that IS consistent after all.

    So if you can put “many” in front (many people, many bloggers), use “are”; but if “many” doesn’t make sense (many group, many everyone, many data), use “is”.

  • This – the singular/plural “data” dilemma – refers more broadly to the issue of collective nouns. Research, learn, and then understand that “correct” only matters to prescriptivists anyhow. If you’re a descriptivist, then this is a pointless debate (which it is).

  • I always thought that the singular of “data” was “anecdote”?

  • Clearly, the people has spoken…

  • Pronunciation of data: surely the most common is datta, not dayta or dahta (both of which have a “long” a).
    In Latin the “a” is short, so datta would seem to be preferable. But the Brits tend to Anglicize all foreign words—think Paris or sine die (signee die-ee)—so their dayta/dahta is suspect.
    As for data/media is/are, it might be worth recalling the Greek practice, which is sometimes forgotten. All Greek neuter plural nouns—like criteria or phenomena—take a singular verb: for them the criteria is, never are. And while that’s not true of Latin, Latin neuter plurals like data are closer to their Greek cousins (they both end in -a, for instance) than to their English counterparts—perhaps we should accord all such Classical borrowings the dignity of a Classical construction.


