
Introduction:
Around 7/7/05, while checking some of my daily news websites, I saw a few articles that mentioned some interesting statistics pertaining to online behavior. Since I maintain the Errata > Statistics subsection of Attrition.org, this piqued my interest; furthermore, the articles all seemed to be referencing the same report. Given that multiple articles cited the same source, I decided to take a deeper look at what the referenced report actually said. This page is an analysis of the Pew Internet and American Life Project's report on spyware, which is based on a survey the project sponsored and Princeton Survey Research Associates International conducted.

1. PIP Spyware Report (article) SHA Remote Link
2. PIP Spyware Report (Questionnaire/Data) SHA Remote Link

Structure:
Since there are two documents involved, I'll look at each separately, although because the Pew report is based on the spyware survey, some remarks may apply to both. "Report" refers to the Pew document and "Data" or "Data Report" refers to the Princeton Survey document.

Full Disclosure:
During my time as an undergraduate student at the University of Mary Washington, I took classes ranging from introductory to advanced statistics as a requirement for completing the psychology major. All told, I took three semesters of courses dealing explicitly with statistics and five semesters dealing with the application of statistics to psychology research. I authored two questionnaires dealing with online behavior that were edited and approved by faculty and an Institutional Review Board for research on human subjects. I presented the results of this research seven times at various undergraduate research symposiums. I've also worked part time as a systems administrator and help desk technician for six years, and as a network security administrator for two. Recently, after graduating from university, I began full-time work in this field. It is with these qualifications in mind that I offer my analysis. Comments, questions, and criticisms can be mailed to Zodiac@attrition; flames piped to /dev/null.

Notes: The page references refer to the pages as numbered in the .PDF, not the printed page numbers. Since most people will be reading a digital copy, this seemed a reasonable concession.



Pew Internet and American Life Project's report on Spyware

Overview:
The statistics section in errata was created in response to media reports citing computer security or cyber crime statistics that did not reference a credible source, or any source at all. The Pew report, by contrast, does reference its conclusions, and provides its source, its questions, descriptive statistics about the data, and some verbatim user responses to certain questions. While there are some parts of the report I take issue with, it is ultimately well documented and ethically sound.

Problems:
(p. 3) ... "Although most do not know the source of their woes, tens of millions have experienced computer problems in the past year that are consistent with problems caused by spyware or viruses."

These problems are consistent with spyware, but not exclusive to it. For example: (p. 3) "52% of home internet users say their computer has slowed down or is not running as fast as it used to." (These results reference Q.26 in the data report.) The statistic is a bit misleading because slowdowns can be attributed to a variety of sources: if I download a resource-intensive program that runs in the background and continue computing as normal, my experience will seem slower than it did before. If I were presented with this problem as a sys-admin, the first thing I would check would be the spyware status of the affected computer, and generally speaking that would be the cause; it wouldn't always be the cause, though. So how much of that 52% can actually be attributed to spyware? Earlier in the section (p. 2) the author references the Online Safety Study by AOL and the National Cyber Security Alliance, which scanned users' computers for spyware and adware and reported that 80% of machines were affected. Combining results from the two studies, 80% of 52% is approximately 42%, so I would contend that not all of the problems reported by that 52% of users in the Pew report are the result of spyware. Then another problem emerges: are the definitions of spyware and adware consistent across the two reports, and since the AOL study was done in October of 2004, has their 80% statistic changed as technology has improved? The answer just isn't clear.
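
For clarity, here is a minimal sketch of that back-of-the-envelope combination. The two percentages come from the reports themselves; treating them as directly combinable is my own (shaky) assumption, for exactly the definitional and timing reasons noted above.

    # Rough upper bound on how many of Pew's reported slowdowns could be
    # spyware/adware related, naively combining two separately gathered figures.
    pew_slowdown_rate = 0.52    # Pew: home users reporting slowdowns (Q.26)
    aol_infection_rate = 0.80   # AOL/NCSA: scanned machines with spyware/adware

    attributable_estimate = pew_slowdown_rate * aol_infection_rate
    print("At most ~%.0f%% of home users' slowdowns plausibly spyware-related"
          % (attributable_estimate * 100))   # prints ~42%

In other words, even taking the 80% figure at face value, the 52% slowdown figure overstates spyware's share of the blame.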

(p. 8) ... "Either way, adware is used to serve up targeted advertising based on the user's online behavior, much like a personal assistant who accompanies you in your online travels, making suggestions about what you might like or where you might find a bargain elsewhere."

This is a fairly tame metaphor for describing adware. When dealing with spyware and adware infestations in a corporate environment, all such programs, cookies, and registry entries must go. The simple reason is that any program classified as spyware is insidious by its very nature, and adware is still enough of a privacy risk that it has no place in a corporate environment. It is arguable that end users may not need this level of privacy: if the sole purpose of a home computer is to facilitate online purchases, wouldn't a program fitting the report's description of adware be welcome? Yes and no. The report goes on to point out that many users unknowingly install spyware or adware programs, or will allow the installation without being fully cognizant of the implications. I offer that most people are at least somewhat concerned about their privacy, and that the current methods adware companies use to alert users to the privacy risks are insufficient to address that concern. Even if better methods are developed, it is an incredible amount of trust to place in a product. Given all of this, I would argue that the risks leave adware with no place in a user's computing environment.

(p. 14) ... "88% of users say they have a good idea of what 'spam' means."

I can say I have a good idea of what quantum theory is all about, but that doesn't make it true. The report references a Wikipedia definition of spam (see Points of Interest for more on this), but cross-referencing this statistic with the data report indicates the definition was not used when participants were asked the question. They were asked to rate their level of understanding on a fairly unspecific scale, but their idea of what constituted spam was never compared against any commonly held definition. This isn't a misrepresentation of the data, because the report doesn't state that these users do in fact have a good idea of what the term means, only that they believe they do. I think it would be more appropriate to put this data into a more definite context by explaining its limitations.

(p. 14) ... "78% of internet users say they have a good idea of what 'spyware' means."

I am unclear on what order the questions were asked in; the report gives them in an order that is out of sequence with the question numbers. If they were read in the order given in the report, then the validity of questions 51 and 35 is called into question. These questions rely on a common understanding of what spyware is in order to be valid, so the confidence rating of question 51 becomes suspect because the confidence may increase or decrease depending on the user's definition of spyware. Question 35 has similar problems because, given a definition of spyware, participants may have had a clearer idea of what the source of their problem was. If anyone can provide clarification on this point, shoot me an e-mail; I checked the methodology section of the data report and didn't find anything.

(p. 21) ... "Half of computer fixes are quick and easy, but one in five problems is never solved."

This is somewhat counterintuitive to me; most spyware problems I've dealt with are not quick and easy fixes. Usually the fix involves deep-scanning the computer with multiple spyware and adware removal tools, running a virus scanner, and probably applying some operating system patches. Given this, are users really fixing the problem, or are they only delaying its reappearance?

Points of Interest:
(p. 7, footnote 2) ... "A 2004 Ponemon Institute survey found that 66% of consumers said they would welcome personalize banner ads, but do not want Web sites to collect personally identifiable information. ..."

This raises a privacy issue that Google dealt with when it debuted GMail and wanted to integrate search features that would require what some felt was a breach of privacy. Methods were discussed whereby user privacy could reasonably be maintained, but they rested on the enormous technical and funding resources Google possessed. I would argue that most firms providing targeted marketing, lacking those resources, could not make a similar guarantee. Therefore I don't think it is reasonable to expect that such firms could deliver viable personalized advertising without sacrificing an unacceptable amount of user privacy. I think, then, that adware still represents a significant security and privacy risk, similar to that of spyware.

(p. 11, charts) ...

I think it's interesting to note that users who experienced adware changed their behavior more than users who experienced spyware. I would say this supports my argument that adware has no place in a user's computing environment.

(p. 14, footnotes) ...

While looking through the document I noticed that the report referenced the Wikipedia project for definitions. I'm a big fan of Wikipedia, so I thought this was pretty cool.

(p. 6) ... Acknowledgements

Two executives from targeted advertising companies were consulted and are listed in the acknowledgements. It is unclear how much their involvement may have influenced the report in favor of their companies and market; I found nothing explicit, but as noted above I felt the description of adware was a bit "softer" than perhaps it should have been. As another point of interest, Microsoft is currently preparing to buy Claria. Keep in mind, this is not an accusation of fraud, data manipulation, or anything of the sort. It is, however, a point of interest, and I leave the reader to draw their own conclusions based on the information presented in this report.

Conclusions:
I found this report to be generally well documented and clearly based on the data found by the survey. While I took issue with some of the questions and methods of the survey document and with some conclusions drawn by the report, it is very competently done. Having reviewed similar reports in my academic life, the problems and questions raised here are not uncommon.


Princeton Survey Research Associates International data report and questionnaire for the Pew Internet Spyware Survey

Overview:
Upon review of the data report I found the questions to be appropriately worded and the data provided to be believable as representative of their sample.

Full Disclosure:
The statistical methods listed in the methodology section are outside my range of experience. I cannot speak to their appropriateness or to how accurately the data provided meshes with their methodology. I leave it to the statistically canny reader to draw their own conclusions; please share them with me.

Problems:
(p. 15, Q61a and Q61b) ... "How often is the virus protection on your main home computer usually updated?"

Given my experience in the corporate world, the reported numbers don't seem representative. However, upon installation most virus protection software prompts for how often the user would like the product to update. If the computer is not on, or not connected to the internet, at the time an update is scheduled, does the process re-initiate when the computer comes back online? I've certainly seen instances where it doesn't, but I'm unsure how prevalent that problem is. Further, if the product does not update automatically, I have found that most users simply forget to do it, which calls into question the results of Q61b. Even under perfect circumstances it is difficult to get an accurate response to this question, because survey participants have a vested interest in maintaining a positive self-image and may not be totally honest when answering.

Points of Interest:
(p. 13) ... Results

I found it interesting that respondents rated typical adware behavior closely to some of the more troubling aspects of spyware.

(p. 41) ... Collection methods

This is an interesting section on how telephone survey companies find valid telephone numbers; not something I was familiar with, but cool nonetheless.


Final thoughts:
In my experience I've found that there's very little difference between spyware and adware. Most spyware that I remove from users' computers comes bundled with ads, and I will often find evidence of password theft, key logging, and transmission of data back to a central location. I think a worthy project would be to establish something similar to a spam blacklist, but for spyware: a spyware address blacklist. If the programs report back to a central source, that traffic can be controlled by egress filtering, and syslogging of blocked connections to those addresses can greatly simplify the process of finding and eliminating spyware in an enterprise environment. Maybe some day we'll see ASAB, the Attrition Spyware Address Blacklist.
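
To make the idea a bit more concrete, here is a rough sketch in Python of the log-watching half of that scheme. The blacklist file, the log path, and the log line format are all hypothetical placeholders; the point is simply that once egress filtering is denying and logging outbound connections, matching the denied destinations against a blacklist is trivial.

    # Sketch: report internal hosts whose blocked outbound connections hit a
    # spyware address blacklist. File names and log format are made up for
    # illustration; adapt the regex to whatever your firewall actually emits.
    import re
    from collections import defaultdict

    BLACKLIST_FILE = "spyware_blacklist.txt"     # one IP/hostname per line (hypothetical)
    FIREWALL_LOG = "/var/log/egress_denied.log"  # blocked outbound connections (hypothetical)

    # Assumed log line format:
    #   Jul  7 12:34:56 fw kernel: DENY OUT src=10.1.2.3 dst=203.0.113.7 dport=80
    LINE_RE = re.compile(r"DENY OUT src=(?P<src>\S+) dst=(?P<dst>\S+)")

    blacklist = set()
    with open(BLACKLIST_FILE) as f:
        for line in f:
            entry = line.strip()
            if entry and not entry.startswith("#"):
                blacklist.add(entry)

    hits = defaultdict(set)   # internal host -> blacklisted destinations contacted
    with open(FIREWALL_LOG) as log:
        for line in log:
            match = LINE_RE.search(line)
            if match and match.group("dst") in blacklist:
                hits[match.group("src")].add(match.group("dst"))

    for host in sorted(hits):
        print("%s attempted connections to: %s" % (host, ", ".join(sorted(hits[host]))))

Anything that shows up in that output is a machine worth a closer look, regardless of what the desktop antivirus thinks of it.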
