Damage figures for telephone fraud and software piracy are shady, to say the least. What these companies refuse to tell you is that their figures are rough estimates. Not only that, they are rough estimates based on how many copies of a program were made or how many phone calls were placed. The problem is that if the person had been required to pay for the software or phone call, there is a good chance they would not have made the copy or the call at all. So if they do it because it is free, and ONLY because it is free, is it a real loss? The person usually would never have paid in the first place.
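To make the arithmetic concrete, here is a minimal sketch comparing the "every copy is a lost sale" figure with one that counts only people who would actually have paid. Every number in it (the copy count, the price, the 5% share) is made up purely for illustration, and the function names are hypothetical; nothing here comes from a real report.

```python
def naive_loss(copies: int, price: float) -> float:
    """Loss as typically reported: every copy is counted as a lost sale."""
    return copies * price


def adjusted_loss(copies: int, price: float, would_have_paid: float) -> float:
    """Loss counting only the share of copiers who would have bought it."""
    if not 0.0 <= would_have_paid <= 1.0:
        raise ValueError("would_have_paid must be between 0 and 1")
    return copies * price * would_have_paid


# Hypothetical inputs, chosen only to show the gap between the two figures.
COPIES = 10_000          # assumed number of pirated copies
PRICE = 50.0             # assumed retail price in dollars
WOULD_HAVE_PAID = 0.05   # assumed share who would have paid anyway

if __name__ == "__main__":
    print(f"naive estimate:    ${naive_loss(COPIES, PRICE):,.0f}")
    print(f"adjusted estimate: ${adjusted_loss(COPIES, PRICE, WOULD_HAVE_PAID):,.0f}")
```

The point is not the specific numbers but that the reported figure depends almost entirely on an assumption the reports rarely state.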
For a great read on computer crime damage figures, M.E. Kabay, PhD, has written a paper titled "Understanding Studies and Surveys of Computer Crime" that explores the many facets of determining such damage figures.
Dan Barrett does a nice job questioning stats created by CERT and regurgitated in an IEEE Computer article: http://catless.ncl.ac.uk/Risks/18.04.html#subj9
Julie Ryan and Theresa Jefferson have published a great paper titled "The Use, Misuse, and Abuse of Statistics in Information Security Research" which examines several reports related to security and statistics. This is a must read.
While not specifically related to the InfoSec community, Stephen Jay Gould was a gifted paleontologist and researcher who had a knack for explaining things in a way most folks could understand. Anyone looking for a no-bullshit view of interpreting statistics should check out Full House: The Spread of Excellence from Plato to Darwin by Stephen J. Gould.
v2.0a: This is an update of the original statistics section maintained by Jericho. It's a reorganization and unification of style, and hopefully we'll start regular updates of the content.
Some important notes about context
For anyone who hasn't taken a course in statistics or hasn't given them much thought, here's an example.
"75% of respondents indicated that they felt spyware wasn't an issue for them or anyone they knew."
75%: that's a nice, round, believable number, well above a majority. The type of statistics most often used in news reporting rests on this idea: because surveying an entire population (Republicans, the French, etc.) is infeasible, we survey a much smaller sample instead, and with some fairly straightforward math we can say with a certain amount of confidence that if we had surveyed the larger population, the results would match those of the smaller sample.
This is the type of statistics used when election polls come out or opinion surveys are done. If you hear some news anchor prattling off a number like the President's approval rating, they didn't ask everyone in the country (you didn't get a call, did you?) what their opinion of the President was. Statistics can be misleading, though.
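The polling idea described above can be sketched as a small simulation. It assumes a population in which exactly 75% of people hold some opinion and that respondents are picked completely at random, which real surveys rarely manage. The `poll` function, the seed, and the sample sizes are all illustrative choices, not anything from an actual survey.

```python
import random


def poll(true_share: float, sample_size: int, rng: random.Random) -> float:
    """Ask `sample_size` randomly chosen people; return the share who say yes."""
    yes = sum(1 for _ in range(sample_size) if rng.random() < true_share)
    return yes / sample_size


if __name__ == "__main__":
    rng = random.Random(42)   # fixed seed so repeated runs print the same thing
    TRUE_SHARE = 0.75         # assumed "real" opinion of the whole population

    # Run five independent polls at each sample size and compare the spread.
    for n in (10, 100, 1_000, 10_000):
        results = [poll(TRUE_SHARE, n, rng) for _ in range(5)]
        print(f"n = {n:>6}: " + ", ".join(f"{r:.1%}" for r in results))
```

Small polls should bounce around a lot from run to run, while the large ones cluster near the true 75%. That is the whole trick behind not having to call everyone in the country.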
Using our example, "75% of the respondents" seems like it's a statement that gives you a lot of information:
We took a survey
75% of people thought this way
The crux of this issue, though, is sample size. This statement would be equally true if I had a sample size of 4, 400 or 40,000. Further, how did the survey come to the conclusion that spyware wasn't a big threat? Did the survey ask point blank, "Do you feel spyware is a threat?" What answers were allowed? Were a series of questions asked to come to this conclusion, and were those questions weighted equally? Who was in the subject pool? Did they all use computers? Did they all have a reasonably similar idea of what spyware is? What is reasonable in this case? How was the survey conducted? What was the incentive for the participant?
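To see how much the sample size matters, here is a rough sketch that computes the 95% margin of error for a reported 75% at the three sample sizes mentioned above. It uses the textbook normal-approximation formula, z * sqrt(p(1 - p) / n), and assumes a simple random sample; that approximation is known to be poor for tiny samples like n = 4, so treat that row as an illustration rather than a proper interval. The function name is hypothetical.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion p from a sample of size n.

    z = 1.96 corresponds to roughly 95% confidence under the normal
    approximation; a simple random sample is assumed.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return z * math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    for n in (4, 400, 40_000):
        moe = margin_of_error(0.75, n)
        print(f"n = {n:>6}: 75% plus or minus {moe * 100:.1f} points")
```

Under these assumptions the same "75%" carries an uncertainty of roughly 42 points with 4 respondents, about 4 points with 400, and under half a point with 40,000. The headline number is identical in all three cases, which is why a statistic quoted without its sample size tells you very little.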
I think statistics can be an excellent way to give a summary of a large amount of data. However, they can also be an excellent tool to obfuscate the truth if the reader doesn't ask the proper questions. Keep this in mind when you see a statistic that purports to report the answer to a complex question for a large number of people. Unless that question is quantitative ("How many children do you have?"), it probably deserves a closer look.