For the past few days, I've been doing more research on recent data breaches, particularly the types of breaches and the number of people affected. One number keeps coming up in the media: 88 million. In many cases, "88 million" is described as the number of compromised records. In other cases, it is described as the number of "Americans" or "people":
http://www.first.org/newsroom/globalsecurity/32460.html (Americans)
http://biz.yahoo.com/bizwk/060623/b3991041.html (Americans)
http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9001282 (records)
http://www.internetnews.com/security/article.php/3615461 (people)
We know that the figure of roughly 88,000,000 was calculated by adding up the number of people affected in every listed breach since Choicepoint in February 2005. Looking at this total, though, it seems to me that the number is inflated when it is presented as a count of unique individuals. The VA breach really caused me to take a closer look at the situation and rework some of the numbers.
In this situation, all numbers are estimates and examples are hypothetical. Let's use 26.5 million as the estimated number of people affected in the VA breach. Because the total U.S. population is approaching 300 million, 26.5 million would represent one out of every eleven U.S. citizens, or roughly nine percent. For rounding purposes, let's say about ten percent of U.S. citizens were affected by the VA breach.
88,000,000 total
minus
26,500,000 VA
----------------
61,500,000 non-VA breached
If we assume, based on the VA numbers, that about ten percent of the U.S. population has served in the military, then about ten percent of the 61.5 million people in the non-VA breaches, or roughly 6.15 million, would be veterans. Those 6.15 million would already be counted in the VA total, so they should be subtracted from the overall total, which would then come to about 81.85 million.
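The back-of-the-envelope estimate above can be written out as a short Python sketch. The function name and constants are hypothetical labels introduced here for illustration; the figures are the rough estimates used in this post, not measured data, and the ten percent veteran share is the rounded assumption made above.

```python
# Rough estimates taken from the post; none of these are measured values.
TOTAL_REPORTED = 88_000_000   # media total across all listed breaches
VA_BREACH = 26_500_000        # estimated people affected by the VA breach
VETERAN_SHARE = 0.10          # assumption: rounded share of the U.S. population in the VA data


def adjust_for_va_overlap(total, va, veteran_share):
    """Subtract veterans who likely appear in both the VA and non-VA breaches."""
    non_va = total - va                      # people counted in all other breaches
    overlap = round(non_va * veteran_share)  # veterans assumed to be among them
    adjusted = total - overlap               # total with those duplicates removed
    return non_va, overlap, adjusted


non_va, overlap, adjusted = adjust_for_va_overlap(
    TOTAL_REPORTED, VA_BREACH, VETERAN_SHARE
)
print(f"non-VA breached:  {non_va:,}")
print(f"assumed overlap:  {overlap:,}")
print(f"adjusted total:   {adjusted:,}")
```

This only removes one kind of duplicate (veterans counted twice), so the adjusted figure is still an upper bound on unique individuals under these assumptions.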
But what about other duplicates? I'm sure many people were affected by more than one breach. Those with records in the Choicepoint incident may well have been affected by the LexisNexis breach too. Someone with an Ameriprise account may have been cared for by Providence Home Services. It probably goes on and on, to the point that the *unique* number of people affected will probably never be accurately determined. I can understand saying 88 million "records" have been breached, but if we're judging by records and not individuals, then Acxiom would have been the worst breach of all time:
http://attrition.org/errata/dataloss/2003/12/acxiom05.html
More than a billion records... but how many individuals? Did each individual have ten records in Acxiom's database? Fifty? A hundred? Did Acxiom really have the records of one-sixth of the world's population in a database? Did the media bother to make this distinction, or did they just use the number "one billion" for shock value without digging to find the facts?
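To make the records-versus-individuals question concrete, here is a small Python sketch of how the implied head count shrinks as the number of records per person grows. The records-per-person values are the hypothetical ones asked about above (plus one record per person for comparison), and "one billion" is used as a round stand-in for "more than a billion"; nothing here reflects actual knowledge of Acxiom's database.

```python
RECORDS = 1_000_000_000  # round stand-in for "more than a billion records"


def implied_individuals(records, records_per_person):
    """Number of distinct people implied if everyone had the same record count."""
    return records // records_per_person


# Hypothetical records-per-person values, not known facts about the database.
for per_person in (1, 10, 50, 100):
    people = implied_individuals(RECORDS, per_person)
    print(f"{per_person:>3} records per person -> {people:,} individuals")
```

Even this simplified model, which assumes every person has the same number of records, shows that the same "one billion" could describe anywhere from a billion people down to ten million.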
I honestly believe that the media either is using the wrong terminology when referring to the "number affected" or does not understand how complex it is to quantify how many people are truly affected by data breaches. This may be a point for us all to consider when using overall "totals" as a statistic in the media. While the number of individual records, Americans, or people *per incident* may be relatively accurate, 88 million "people" or "Americans" seems high, and it should be the media's responsibility to make this distinction.