Reduced privacy risk in exchange for accuracy in the Census count

December 6, 2018


Statistics  /  , , ,

Mark Hansen for The Upshot describes the search for balance between individual privacy and an accurate 2020 Census count. It turns out to not be that difficult to reconstruct person-level data from publicly available aggregates:

On the face of it, finding a reconstruction that satisfies all of the constraints from all the tables the bureau produces seems impossible. But Mr. Abowd says the problem gets easier when you notice that these tables are full of zeros. Each zero indicates a combination of variables — values for one or more of block, sex, age, race and ethnicity — for which no one exists in the census. We might find, for example, that there is no one below voting age living on a particular block. We can then ignore any reconstructions that include people under 18 living there. This greatly reduces the set of viable reconstructions and makes the problem solvable with off-the-shelf software.

To combat this, the Census is looking into injecting more uncertainty into their published data. The challenge is figuring out how much uncertainty is too much and what level of privacy is enough.