[Dataloss] 88 million... is it really an accurate number?

lyger lyger at attrition.org
Tue Jun 27 17:30:48 EDT 2006



For the past few days, I've been doing more research on recent data breaches,
particularly the types of breaches and the numbers affected.  One number keeps
coming up in the media: 88 million.  In many cases, "88 million" is described
as the number of compromised records.  In other cases, it is described as
the number of "Americans" or "people":

http://www.first.org/newsroom/globalsecurity/32460.html (Americans)

http://biz.yahoo.com/bizwk/060623/b3991041.html (Americans)

http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9001282 
(records)

http://www.internetnews.com/security/article.php/3615461 (people)

We know that the number 88,000,000 or so has been calculated by adding up the
total number of people affected in all listed breaches since Choicepoint in
February 2005.  Looking at this total, though, it seems to me that the number
is inflated if it is taken to represent unique individuals.  The VA breach
really caused me to take a closer look at the situation and rework some of the
numbers.

In this situation, all numbers are estimates and examples are hypothetical. 
Let's use 26.5 million as the estimated number of people affected in the VA 
breach.  Because the total U.S. population is approaching 300 million, 26.5
million would represent one out of every eleven U.S. citizens, or roughly nine 
percent.  For rounding purposes, let's say about ten percent of U.S. citizens 
were affected by the VA breach.

88,000,000 total
minus
26,500,000 VA
----------------
61,500,000 non-VA breached

Assuming, based on the VA numbers, that ten percent of the U.S. population has
been in the military, it would be safe to estimate that about 6.15 million
veterans (ten percent of the 61.5 million) were involved in the other breaches.
Those 6.15 million would already be counted in the VA total, so they should be
subtracted from the overall total, which would then equal about 81.85 million.
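
For anyone who wants to rework the numbers themselves, the arithmetic above can
be sketched in a few lines of Python.  This is only a rough illustration: the
figures are the same estimates used in this post, the function and variable
names are made up for the example, and it assumes veterans are spread evenly
across the non-VA breaches, which is a simplification.

```python
# All figures are the rough estimates from this post, not authoritative counts.
US_POPULATION = 300_000_000   # approximate U.S. population
TOTAL_REPORTED = 88_000_000   # media total for breaches since February 2005
VA_BREACH = 26_500_000        # estimated people affected by the VA breach
ROUNDED_SHARE_PCT = 10        # the post rounds the VA share up to ten percent


def adjust_for_va_overlap(total, va, share_pct):
    """Subtract the estimated veteran overlap from the reported total.

    Hypothetical helper: assumes `share_pct` percent of the people in
    the non-VA breaches are also counted in the VA breach.
    """
    non_va = total - va
    overlap = non_va * share_pct // 100
    return non_va, overlap, total - overlap


raw_share_pct = 100 * VA_BREACH / US_POPULATION
non_va, overlap, adjusted = adjust_for_va_overlap(
    TOTAL_REPORTED, VA_BREACH, ROUNDED_SHARE_PCT
)

print(f"VA share of population: {raw_share_pct:.1f}%")
print(f"non-VA breached:        {non_va:,}")
print(f"estimated overlap:      {overlap:,}")
print(f"adjusted total:         {adjusted:,}")
```

Changing ROUNDED_SHARE_PCT shows how sensitive the adjusted total is to the
ten percent assumption.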

But what about other duplicates?  I'm sure many people were affected by more
than one breach.  Those with records in the Choicepoint incident may well
have been affected by the LexisNexis breach too.  Someone with an Ameriprise
account may have been cared for by Providence Home Services.  It probably goes
on and on, to the point that the *unique* number of people affected will
probably never be accurately determined.  I can understand saying 88 million
"records" have been breached, but if we're judging by records and not
individuals, then Acxiom would have been the worst breach of all time:

http://attrition.org/errata/dataloss/2003/12/acxiom05.html

More than a billion records... but how many individuals?  Did each individual
have ten records per listing in Acxiom's database?  Fifty?  A hundred?  Did 
Acxiom really have the records of one-sixth of the world's population in a 
database?  Did the media bother to make this distinction, or just use the 
number "one billion" for shock value without digging to find the facts?
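
To show how much the answer swings, here is a small Python sketch that divides
one billion records by the hypothetical records-per-person figures asked about
above.  None of these ratios are known facts about Acxiom's database; they are
only the guesses from the questions, and the function name is invented for the
example.

```python
TOTAL_RECORDS = 1_000_000_000  # "more than a billion", rounded down


def individuals_for(records, records_per_person):
    """Estimate unique individuals given an assumed records-per-person ratio."""
    return records // records_per_person


# Hypothetical ratios only: ten, fifty, or a hundred records per individual.
estimates = {n: individuals_for(TOTAL_RECORDS, n) for n in (10, 50, 100)}

for per_person, people in estimates.items():
    print(f"{per_person:>3} records each -> {people:,} individuals")
```

Depending on the assumed ratio, the same billion records could mean anywhere
from 10 million to 100 million individuals.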

I honestly believe that the media either is using the wrong terminology when 
referring to "number affected" or doesn't understand the complexity of 
quantitatively analyzing how many people are truly affected by data breaches.
This may be a point for us all to consider when using overall "totals" as a 
statistic in the media.  While the number of individual records, Americans, or 
people *per incident* may be relatively accurate, 88 million "people" or 
"Americans" seems high, and it should be the media's responsibility to make 
this distinction.

Lyger

