[Dataloss] 88 million... is it really an accurate number?
lyger
lyger at attrition.org
Tue Jun 27 17:30:48 EDT 2006
For the past few days, I've been doing more research on recent data breaches,
especially including types of breaches and numbers affected. One number keeps
coming up in the media: 88 million. In many cases, "88 million" is described
as the number of compromised records. In other cases, it is described as
"Americans" or "people":
http://www.first.org/newsroom/globalsecurity/32460.html (Americans)
http://biz.yahoo.com/bizwk/060623/b3991041.html (Americans)
http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9001282
(records)
http://www.internetnews.com/security/article.php/3615461 (people)
We know that the figure of roughly 88,000,000 was calculated by adding the
number of people affected in every listed breach since Choicepoint in
February 2005. Looking at this total, though, it seems to me that the number
is inflated when it is presented as a count of unique individuals. The VA
breach really caused me to take a better look at the situation and rework
some of the numbers.
In this situation, all numbers are estimates and examples are hypothetical.
Let's use 26.5 million as the estimated number of people affected in the VA
breach. Because the total U.S. population is approaching 300 million, 26.5
million would represent one out of every eleven U.S. citizens, or roughly nine
percent. For rounding purposes, let's say about ten percent of U.S. citizens
were affected by the VA breach.
88,000,000 total
minus
26,500,000 VA
----------------
61,500,000 non-VA breached
Assuming, based on the VA numbers, that ten percent of the U.S. population
has been in the military, a reasonable estimate is that about 6.15 million
veterans (ten percent of 61.5 million) were involved in all other breaches.
Those 6.15 million are already counted in the VA total, so they should be
subtracted from the overall total, which would then equal about 81.85 million.
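For anyone who wants to rerun the estimate above with different inputs, here
is a minimal sketch in Python. The function and variable names are made up for
illustration, and it assumes that veterans appear in the non-VA breaches at
the same rate as in the general population, which is a guess rather than
anything measured.

```python
# Rough sketch of the overlap estimate; every figure is an estimate.
TOTAL_REPORTED = 88_000_000  # media total since Choicepoint, February 2005
VA_AFFECTED = 26_500_000     # estimated number affected by the VA breach
VETERAN_PERCENT = 10         # rounded share of the population, per the VA numbers


def adjust_for_va_overlap(total, va_affected, veteran_percent):
    """Return (non_va, overlap, adjusted_total).

    Assumption (hypothetical): veterans show up in the non-VA breaches
    at the same rate as in the population as a whole.
    """
    non_va = total - va_affected
    # Integer arithmetic keeps the result exact.
    overlap = non_va * veteran_percent // 100
    adjusted_total = total - overlap
    return non_va, overlap, adjusted_total


non_va, overlap, adjusted_total = adjust_for_va_overlap(
    TOTAL_REPORTED, VA_AFFECTED, VETERAN_PERCENT
)
print(f"non-VA breached:  {non_va:,}")
print(f"likely VA overlap: {overlap:,}")
print(f"adjusted total:   {adjusted_total:,}")
```

Changing VETERAN_PERCENT to nine instead of ten moves the adjusted total only
slightly, which suggests the rounding matters less than the duplicates between
the other breaches.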
But what about other duplicates? I'm sure many people were affected by more
than one breach. Those with records in the Choicepoint incident may likely
have been affected by the LexisNexis breach. Someone with an Ameriprise
account may have been cared for by Providence Home Services. It probably goes
on and on to the point that the *unique* number of people affected will
probably never be accurately determined. I can understand saying 88 million
"records" have been breached, but if we're judging by records and not
individuals, then Acxiom would have been the worst breach of all time:
http://attrition.org/errata/dataloss/2003/12/acxiom05.html
More than a billion records... but how many individuals? Did each individual
have ten records per listing in Acxiom's database? Fifty? A hundred? Did
Acxiom really have the records of one-sixth of the world's population in a
database? Did the media bother to make this distinction, or just use the
number "one billion" for shock value without digging to find the facts?
I honestly believe that the media either is using the wrong terminology when
referring to "number affected" or doesn't understand the complexity of
quantitatively analyzing how many people are truly affected by data breaches.
This may be a point for us all to consider when using overall "totals" as a
statistic in the media. While the number of individual records, Americans, or
people *per incident* may be relatively accurate, 88 million "people" or
"Americans" seems high, and it should be the media's responsibility to make
this distinction.
Lyger