[Infowarrior] - Tax Takers Send in the Spiders

Richard Forno rforno at infowarrior.org
Thu Jan 25 12:07:03 EST 2007


Tax Takers Send in the Spiders

http://www.wired.com/news/technology/security/0,72564-0.html?tw=wn_index_1

By Quinn Norton
02:00 AM Jan. 25, 2007

Websites around the world are getting a new computerized visitor among the
Googlebots and Yahoo web spiders: The taxman. A five-nation tax enforcement
cartel has been quietly cracking down on suspected internet tax cheats,
using a sophisticated web crawling program to monitor transactions on
auction sites, and track operators of online shops, poker and porn sites.

The "Xenon" program -- a reference to the super-bright auto headlights that
light up dark places -- was started in The Netherlands in 2004 by the Dutch
equivalent of the IRS, Belastingdienst. It has since been expanded and
enhanced by an international group of tax authorities in Austria, Denmark,
Britain and Canada, with the assistance of Amsterdam-based data mining firm
Sentient Machine Research.

Xenon is primarily a spider: a program that downloads a web page, then
traverses its links and downloads those as well, ad infinitum. In this
manner spiders can create huge datasets of web material, while preserving
the relationships between pages at the moment they were spidered --
something that can reveal a lot about the people that made the pages.
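
As a rough illustration of the general spidering technique described above
(not Xenon's actual code, which has not been published), here is a minimal
Python sketch. It crawls breadth-first, keeps a copy of each page, and
records every link as a (source, target) pair so the relationships between
pages survive. The crawl() function, the LinkParser class and the FAKE_WEB
dictionary are hypothetical names invented for this example; the dictionary
stands in for the live web so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl starting at start_url.

    fetch is any callable that maps a URL to HTML text, or returns None
    when the page is unavailable. Returns (pages, edges): the downloaded
    pages keyed by URL, and the list of (source, target) links seen.
    """
    pages = {}
    edges = []
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative links
            edges.append((url, target))  # preserve the page relationship
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return pages, edges


# Hypothetical three-page site standing in for the live web.
FAKE_WEB = {
    "http://shop.example/": '<a href="/about">About</a> <a href="/items">Items</a>',
    "http://shop.example/about": '<a href="/">Home</a>',
    "http://shop.example/items": '<a href="/about">About</a>',
}

pages, edges = crawl("http://shop.example/", FAKE_WEB.get)
```

The max_pages cap is there because, as the article notes, a spider left to
itself keeps following links indefinitely.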

It's unclear how effective Xenon has been in generating investigative leads.
Contacted by Wired News, the tax departments of Canada and the United
Kingdom confirmed participation in the program, but declined further
comment.

Dag Hardyson, the national project leader for e-commerce for Skatteverket,
the Swedish tax authority, was more forthcoming. Skatteverket is scheduled
to join the Xenon project this year, and Hardyson said web crawling is well
suited to tax enforcement.

"The internet is wide open for tools," said Hardyson. "It's much easier to
handle than the real world."

Xenon, explained Marten den Uyl of Sentient, is in some ways the opposite of
something like Google's web crawler, which traverses a tree of links and
grabs a copy of everything it sees. Xenon is smart about link selection and
context, and uses a "slow search paradigm," he said.

Whereas a spider like the Googlebot might hit thousands of websites in a
second, "With Xenon it may take minutes, hours or even days to do a slow
search."

The slow search prevents the crawler from creating excessive traffic on a
website, or drawing attention in the site's server logs. Den Uyl declined to
say what user-agent the Xenon software reports itself as, but it's likely to
be variable or configurable on the tax investigator's part.
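
One common way to keep a crawler from generating excessive traffic is to
enforce a minimum interval between requests to the same host. The Python
sketch below shows that idea only; how Xenon actually paces its "slow
search" is not described in the article. HostThrottle and its parameters
are hypothetical names, and the clock and sleep functions are injectable
so the behavior can be checked without really waiting.

```python
import time
from urllib.parse import urlparse


class HostThrottle:
    """Enforces a minimum interval between requests to the same host."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between hits per host
        self.clock = clock
        self.sleep = sleep
        self.last_hit = {}  # host -> time of the most recent request

    def wait(self, url):
        """Pause until the host of url may be contacted again.

        Returns the number of seconds spent waiting.
        """
        host = urlparse(url).netloc
        now = self.clock()
        last = self.last_hit.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self.sleep(delay)
        self.last_hit[host] = now + delay
        return delay
```

A crawler would call wait(url) immediately before each download. With
min_interval set to minutes or hours, a crawl of one site stretches out
over days, while requests to other hosts are not held up.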

The spider can also be configured and trained to look at particular economic
niches -- a useful feature for compiling lists of businesses in industries
that traditionally have high rates of non-filing. "For instance, weight
control (yields) 85,000 hits, some for products ... also services," says
Sweden's Hardyson.

Once the web pages are screen-scraped, Xenon's Identity Information
Extraction Module interfaces with national databases containing information
like street and city names. It uses that data to automatically identify
mailing addresses and other identity information present on the websites it
has crawled, which it puts into a database that can be matched in bulk with
national tax records.
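
The general idea of using known street and city names to spot mailing
addresses in scraped text can be sketched as follows. This is an
illustration under stated assumptions, not the Identity Information
Extraction Module itself: the KNOWN_STREETS and KNOWN_CITIES lists are tiny
hypothetical stand-ins for a national database, and the pattern assumes a
"street, house number, optional postal code, city" layout.

```python
import re

# Hypothetical gazetteer entries; a real system would load these from a
# national address database.
KNOWN_STREETS = ["Damrak", "Kalverstraat", "Drottninggatan"]
KNOWN_CITIES = ["Amsterdam", "Stockholm"]


def build_address_pattern(streets, cities):
    """Compile a regex matching '<street> <number> ... <city>'."""
    # Longest names first so a longer name wins over a shorter prefix.
    street_alt = "|".join(
        re.escape(s) for s in sorted(streets, key=len, reverse=True)
    )
    city_alt = "|".join(
        re.escape(c) for c in sorted(cities, key=len, reverse=True)
    )
    return re.compile(
        rf"\b(?P<street>{street_alt})\s+(?P<number>\d+[A-Za-z]?)\b"
        rf"[^\n]{{0,20}}?"  # short gap, e.g. a comma and a postal code
        rf"(?P<city>{city_alt})\b"
    )


def extract_addresses(text, pattern):
    """Return (street, number, city) tuples found in text."""
    return [
        (m.group("street"), m.group("number"), m.group("city"))
        for m in pattern.finditer(text)
    ]


PATTERN = build_address_pattern(KNOWN_STREETS, KNOWN_CITIES)
sample = (
    "Contact: Weight Loss BV, Damrak 12, 1012 LP Amsterdam.\n"
    "Or visit Drottninggatan 5 Stockholm."
)
found = extract_addresses(sample, PATTERN)
```

The resulting tuples could then be written to a table and joined against
other records in bulk, which is the matching step the article describes.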

As illuminating as Xenon is for the tax man, the data-mining effort poses
dangers to citizen privacy, said Par Strom, a noted privacy advocate in the
world of Swedish IT.

"Of course it's not illegal," said Strom. "I don't feel quite comfortable
having a tax office sending out those kind of spiders."

One issue has to do with how the information Xenon captures is protected.

Sentient has created access controls for its law-enforcement data-mining
tool, called Data Detective, but its Xenon software lacks many of those
protections, said den Uyl, commenting on the theory that investigators will
quickly delete the compiled data.

"Data Detective (handles) long-term data warehousing," he said, "(Xenon is)
short-term project data warehousing. Different type of data, different type
of analysis."

But Hardyson said the Swedish government -- which already has its own
internally developed tax crawlers -- is currently keeping a copy of
everything it spiders. That means that someone's long-expired actions have
the potential to come back and haunt them. "We can scan and store all
actions for every e-marketplace in Sweden, it's about 55,000 per day," said
Hardyson. He said his agency hasn't decided if it will change its policies
with the new, more sophisticated Xenon software. "Is this what we should do?
Our lawyers must look at it."

Canada's tax authorities declined to state what their Xenon data retention
policies are, as did Simon Bird, head of the "Web Robot Team" at the British
HM Revenue and Customs office.

In the United States, the IRS is not a part of the Xenon project, but would
neither confirm nor deny that it uses spidering software in its
investigations.

Strom said now that the cat is out of the bag, there's no way to get
governments or corporations to forgo technologies like spiders and data
mining.

"The information is public of course, because it's posted on the internet,"
Strom says. "It wasn't meant to be used this way ... (this is) using the
naivete of people. It's on the limit of what is ethical."



