[Infowarrior] - Websites could be required to retain visitor info

Wed Aug 8 13:13:32 UTC 2007

Original URL: http://www.theregister.co.uk/2007/08/08/litigation_data_retention/
Websites could be required to retain visitor info
By Mark Rasch, SecurityFocus
Published Wednesday 8th August 2007 10:26 GMT

A series of legal events means that companies that have no business reason
to retain documents or records may be compelled to create and retain such
records just so they can become available for discovery.

Companies routinely create, maintain and store electronic records. Some
records are consciously created, like memoranda, letters, spreadsheets, and
even e-mails and chat or instant message communications. Other records are
created inadvertently, like metadata, log records, IP history records and
the like. Some information is useful to the company, which wants to retain
it; other information is of little use, merely takes up space, creates
potential liability, and represents an unwarranted risk of attack or
violation of privacy. The problem for most companies in developing or
maintaining a document retention/destruction policy is identifying the
documents and records they want to keep and effectively purging the ones they
don't want. Some recent legal events have made the problem of document
retention and destruction even more complicated.

A recent case involving file-sharing site TorrentSpy illustrates the point.
TorrentSpy's privacy policy (http://www.torrentspy.com/privacy.asp) is clear
and concise. It states:

    TorrentSpy.com is committed to protecting your privacy. TorrentSpy.com
does not sell, trade or rent your personal information to other companies.
TorrentSpy.com will not collect any personal information about you except
when you specifically and knowingly provide such information.

Pretty straightforward, and not too dissimilar from thousands of other
website privacy policies. Such privacy policies are considered to be legally
binding contracts, and the United States Federal Trade Commission and
Privacy Commissioners in Europe, Asia and elsewhere routinely hold
companies to their promises, under threat of civil and criminal prosecution
or fines.

The first problem with this privacy policy (like most privacy policies) is
that it's not true. Whenever you visit a website, you "involuntarily"
provide "personal" information to the site operator: things like the type
of browser you are using, your IP address, the physical location of that IP
address, your configuration settings, and what website you may have been
referred from or to, among other things.

If you are engaging in malicious, unlawful, or otherwise "actionable"
conduct, the website operator may certainly attempt to use this information
to identify you and discern what you are doing, which is the essence of
"personal information". Indeed, much of what we do as forensic investigators
is to use this kind of information to find people.

While net-savvy individuals know that this information is being collected
and utilized, the vast majority of individuals would not say that they
"specifically and knowingly" provided that information to the website. This
information frequently has economic value to the website operator as well.
Knowing what site referred the user may result in payments from or to the
referring site under "pay per click" agreements.

Aggregated personal information is useful for advertisers, and valuable to
those who collect it. So it's not accurate to say that a website ONLY
collects information that visitors voluntarily give it. A better approach to a
privacy policy would include language similar to that used by, for example,
Google, which specifically states
(http://www.google.com/privacypolicy.html):

    Log information - When you use Google services, our servers
automatically record information that your browser sends whenever you visit
a website. These server logs may include information such as your web
request, Internet Protocol address, browser type, browser language, the date
and time of your request and one or more cookies that may uniquely identify
your browser.
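To make the Google description concrete, here is a minimal Python sketch of reading one line of a web server log. It assumes the Apache "combined" log format, which many servers use but which Google's policy does not claim to use. The sample line, the documentation-range IP address 203.0.113.7, and the parse_log_line helper are hypothetical. Every field it extracts (IP address, timestamp, request, referring page, browser) arrives with the request whether or not the visitor "knowingly" provides it.

```python
import re

# Apache "combined" log format (an assumption; servers differ):
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"
COMBINED_LOG = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_log_line(line: str) -> dict:
    """Split one combined-format log line into named fields (hypothetical helper)."""
    match = COMBINED_LOG.match(line)
    if match is None:
        raise ValueError(f"not a combined-format log line: {line!r}")
    return match.groupdict()


# A made-up entry; 203.0.113.7 and example.com are reserved for documentation.
SAMPLE = (
    '203.0.113.7 - - [08/Aug/2007:10:26:32 +0000] '
    '"GET /index.html HTTP/1.1" 200 5120 '
    '"http://www.example.com/links.html" "Mozilla/5.0 (Windows; U)"'
)

if __name__ == "__main__":
    # Print each field the visitor's browser supplied with a single request.
    for name, value in parse_log_line(SAMPLE).items():
        print(f"{name:>8}: {value}")
```

Even this one line ties an IP address to a time, a page, the page the visitor came from, and the browser used, which is exactly the kind of data the TorrentSpy dispute turned on.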

Some of this information is collected automatically as a consequence of
delivering web content to the requestor. You would think that, in pursuance
of its privacy policy, a company could choose not to collect (or, more
accurately, not to store or retain) such information. After all, that's what
it promised its customers, no?

There has long been an adage in the law that essentially states that "if it
exists, it is discoverable". Now, as a result of a lawsuit involving
TorrentSpy, the United States District Court for the Central District of
California has essentially extended this logic to state that, "if it doesn't
exist, we will require that it be created and stored so that it can become
discoverable".

The case, Columbia Pictures v. Bunnell
(http://www.eff.org/legal/cases/torrentspy/columbia_v_bunnell_magistrate_order.pdf)
(pdf), arose when the movie studios wanted to find out the identities of
people using TorrentSpy to download copyrighted works; in other words,
personal information about TorrentSpy's users. TorrentSpy promised its users
that it wouldn't collect such information, and had no legal obligation to do
so. As the court noted:

    In general, when a user clicks on a link to a page or a file on a
website, the website's web server program receives from the user a request
for the page or the file. The request includes the IP address of the user's
computer, and the name of the requested page or file, among other things.
Such information is copied into and stored in RAM.

    RAM is a form of temporary storage that every computer uses to process
data. Every user request for a page or file is stored by the web server
program in RAM in this fashion. The web server interprets and processes that
data, while it is stored in RAM, in order to respond to user requests.

    The web server then satisfies the request by sending the requested file
to the user. If the website's logging function is enabled, the web server
copies the request into a log file, as well as the fact that the requested
file was delivered. If the logging function is not enabled, the request is
not retained.
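The distinction the court drew can be illustrated with a toy request handler. This is a hypothetical Python sketch, not TorrentSpy's software or any real web server: the ToyServer and Request names, the logging_enabled flag, and the in-memory list standing in for a log file are all assumptions made for illustration. The request data sits in memory while the handler runs either way; the flag only decides whether a copy outlives the request.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Request:
    """What the browser sends: a hypothetical subset of a real HTTP request."""
    ip: str
    path: str
    user_agent: str
    referrer: Optional[str] = None


@dataclass
class ToyServer:
    """Illustrative server; the log list stands in for a log file on disk."""
    files: Dict[str, bytes]
    logging_enabled: bool = False
    log: List[str] = field(default_factory=list)

    def handle(self, req: Request) -> bytes:
        # While this runs, the user's IP address and requested path are held
        # in memory (RAM) so the server can work out what to send back.
        delivered = req.path in self.files
        body = self.files.get(req.path, b"404 Not Found")
        if self.logging_enabled:
            # Only in this branch is a copy of the request retained after
            # the response is sent.
            stamp = datetime.now(timezone.utc).isoformat()
            self.log.append(f"{stamp} {req.ip} {req.path} delivered={delivered}")
        # With logging disabled, the server keeps no copy of the request
        # once this call returns.
        return body


if __name__ == "__main__":
    server = ToyServer(files={"/index.html": b"<html>hello</html>"})
    visit = Request(ip="203.0.113.7", path="/index.html", user_agent="ExampleBrowser/1.0")

    server.handle(visit)
    print(server.log)  # logging off: prints []

    server.logging_enabled = True
    server.handle(visit)
    print(len(server.log))  # logging on: prints 1
```

In this model, the court's order amounts to flipping logging_enabled to True: no new information is gathered, but information that previously vanished after each request is now kept.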

In keeping with its stated contractual privacy policy, TorrentSpy did not
enable the logging function, did not capture the information held in RAM (or,
more accurately, did not store it) and therefore contended that it could not
produce it in litigation.

After TorrentSpy was sued, the question arose whether the information NOT
regularly collected by TorrentSpy (the information in RAM) constituted
Electronically Stored Information subject to both discovery and what is
called a litigation hold. Under a litigation hold, once you become aware that
information you may possess is relevant to ongoing or threatened litigation,
you must suspend your document destruction policy and stop deleting that
relevant information.

Electronically Stored Information is defined under the Federal Rules of
Civil Procedure (http://www.law.cornell.edu/rules/frcp/Rule26.htm) as
applying to "information that is fixed in a tangible form and to information
that is stored in a medium from which it can be retrieved and examined".

The court rejected TorrentSpy's claims that the information in RAM was never
"stored" since logging was never enabled, and that requiring TorrentSpy to
enable logging amounted to requiring it to "create" records that didn't
exist. Certainly, the information in RAM was stored, if only transitorily,
just as streaming media (like a VoIP call or videoconference) is stored on
your computer for the brief interval it is being displayed.

Thus, the information is (1) electronic; (2) stored; and (3) relevant. The
consequence is that not only is the information subject to discovery under
the TorrentSpy precedent, but the entity must then suspend its document
deletion policy, which in TorrentSpy's case meant discarding information in
RAM that it never stored.

The potential consequences of this ruling (which is currently on appeal) are
frightening. Whenever a company or other entity learns that information that
it doesn't collect (or more accurately collects but doesn't store more than
briefly) might be relevant to some litigation, it has to undertake
affirmative efforts to start collecting and storing this information, in
violation of its express privacy policy (creating potential FTC or privacy
commission liability) for no purpose other than to create liability.

Thus, when you learn of the possibility of litigation, you may have to START
storing streaming media, contents of VOIP calls, contents of
videoconferences, webinars, chats, instant messages, logs, scans, or other
electronic records that you never stored before.

The court also noted that companies "cannot insulate themselves from
complying with their legal obligations to preserve and produce relevant
information within their possession, custody or control and responsive to
proper discovery requests, by reliance on a privacy policy -- the terms of
which are entirely within [their] control". Thus, even if you SAY that the
information won't be collected (stored) and you have no reason to collect
(store) it, a court could mandate that you do so at your own expense.

ISPs, Portals and Telcos

A similar issue arises with respect to information held by Internet Service
Providers (ISPs), web portals like Google, Yahoo! and Microsoft, and
telephone companies. These entities routinely collect massive volumes of
data about their clients and customers, including things like search
requests and results, IP history information, logon information, services
used, and the date, time, source, destination, and duration of calls.

VoIP providers or ISPs may also store the contents of voice or video
communications temporarily as a consequence of transmission over the packet
network. Remember the adage: if it exists, it is discoverable.

Now, there are legitimate reasons for companies to want to collect, store and
use at least some of this information. There are business models based on
the analysis of this information. Load balancing, billing, and even selling
this information are all legitimate uses (provided that the consumer has
some awareness that this is going on). What is important is that the
provider (the telco, the ISP or the portal) decides what information is
going to be collected, how it is going to be used, whether it is going to be
stored (and for how long), and then communicates these facts to the consumer.

There has long been a debate over how long these entities will retain the
records, and what they will do with them. The Department of Justice and the
FBI have long been seeking authority to require ISPs, Telcos and others to
retain log data and other data at their own expense, "just in case" the
information might later become relevant
(http://www.securityfocus.com/columnists/406) to some investigation.

European countries have also been engaged in the same dialogue. If the
records are retained (even when there is no business reason for keeping
them), the records become discoverable: by grand jury subpoena, FISA or
Title III wiretap orders, National Security Letters, or voluntary
cooperation by the ISP or subject. They also become available in any other
litigation, such as copyright infringement, defamation, or routine divorce
cases.

Since the ISP or portal would generally be a third party with respect to the
underlying litigation, they might not be mandated to create or permanently
store log or other transitory information, but that is not entirely clear.
What is clear is that the government wants companies that create electronic
data to keep it "just in case".

Indeed, ABC News reported that the FBI, in a Department of Defense
authorization bill 
(http://blogs.abcnews.com/theblotter/2007/07/fbi-would-skirt.html) requested
a grant of $5m to pay telephone companies to store information such as call
records, and to develop a method of retrieving such information at the
request of law enforcement. As reported by ABC News:

    The $5m project would apparently pay private firms to store at least two
years' worth of telephone and Internet activity by millions of Americans,
few of whom would ever be considered a suspect in any terrorism,
intelligence or criminal matter. The project would involve "the development
of data storage and retrieval systems...for at least two years' worth of
network calling records," according to an unclassified budget document
posted to the FBI's Web site.

So instead of warehousing the records themselves (and with no legal
authority to subpoena ALL records), the government is essentially issuing a
document preservation request to the telephone companies, requesting that
the records be kept by the telcos for two years, and agreeing to pay all or
some of the cost of doing so.

Effectively, this makes the telephone companies into the warehouses for the
government and for anybody with a subpoena. Note that there is nothing wrong
with the phone companies keeping these records for their own business
purposes, but now they will presumably be keeping them just in case.

The issue is not unique to telephone companies. Financial services
companies, credit card companies, ISPs, web portals, VoIP providers, social
networking sites, chat and IM providers all could be either compelled to
retain records, or paid off to retain them just in case - even when their
own privacy policy expressly forbids it.

Web portals like Google, Yahoo! and Microsoft learned the lesson of the
adage that if records exist they will be subpoenaed when, in the context of
defending Congress' anti-smut statute, the government subpoenaed (in a civil
lawsuit) massive volumes of data about how people used these portals, what
they searched for, and what was ultimately delivered.

As a result of this, and of the document retention requests by law
enforcement and regulators, all of the major portals have voluntarily agreed
to anonymize their records after a period of time: Yahoo! after 13 months,
Google and Microsoft after 18 to 24 months.
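The portals have not said here exactly how their anonymization works. A common technique, assumed purely for illustration, is to truncate IP addresses once records pass a retention cutoff. The Python sketch below zeroes the last octet of IPv4 addresses (keeping only a /48 prefix for IPv6) in records older than a hypothetical 13-month window. The LogRecord type, the 395-day cutoff, and the anonymize_old helper are all assumptions. Note that it leaves the search query itself untouched, and query text can identify a person on its own.

```python
import ipaddress
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Iterable, List

# Hypothetical cutoff approximating 13 months.
RETENTION = timedelta(days=395)


@dataclass(frozen=True)
class LogRecord:
    when: datetime
    ip: str
    query: str


def truncate_ip(ip: str) -> str:
    """Zero the host bits: keep a /24 for IPv4, a /48 for IPv6."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


def anonymize_old(records: Iterable[LogRecord], now: datetime,
                  retention: timedelta = RETENTION) -> List[LogRecord]:
    """Return copies of the records, with IPs truncated once older than retention."""
    result = []
    for record in records:
        if now - record.when > retention:
            # Build a modified copy; the input records are left unchanged.
            record = replace(record, ip=truncate_ip(record.ip))
        result.append(record)
    return result


if __name__ == "__main__":
    now = datetime(2007, 8, 8, tzinfo=timezone.utc)
    logs = [
        LogRecord(datetime(2006, 3, 1, tzinfo=timezone.utc), "203.0.113.7", "example query"),
        LogRecord(datetime(2007, 8, 1, tzinfo=timezone.utc), "198.51.100.23", "another query"),
    ]
    for record in anonymize_old(logs, now):
        print(record.when.date(), record.ip, record.query)
```

Until the cutoff passes, the full record still exists, so under the reasoning in the TorrentSpy order it remains subject to discovery and to any litigation hold.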

Ask.com went further, offering a service called AskEraser
(http://www.securityfocus.com/columnists/450/), which it says will allow
for anonymous web surfing, and for which "the company claims it will not
retain the search histories of customers who opt in for the AskEraser".

Which brings us back to where we started. Just because you promise NOT to
collect or retain records doesn't mean that you won't be required to
collect and maintain them. Even if you don't have technology readily
available to capture data streaming through your network, if the information
is stored there briefly, you may be required to capture it.

Sure, you can try anonymizing technologies, but these usually work by NOT
LOGGING data, which, as we learned with TorrentSpy, doesn't always work. What
we need is a commonsense approach to what really is a record that is stored
by a company, as opposed to log data which COULD be stored by a company.

This article originally appeared in Security Focus
(http://www.securityfocus.com/columnists/450/).

Copyright © 2007, SecurityFocus (http://www.securityfocus.com/)