[Infowarrior] - The dark side of the web
Richard Forno
rforno at infowarrior.org
Sat Mar 13 16:53:33 UTC 2010
The dark side of the web
Posted on 9 Mar 2010 at 15:47
http://www.pcpro.co.uk/features/356254/the-dark-side-of-the-web
Google sees only a fraction of the content that appears on the internet. Stuart Andrews finds out what's lurking in the deep web
When Google indexes so many billions of web pages that it doesn’t even bother listing the number any more, it’s hard to imagine that much lies beyond its far-reaching tentacles.
Beneath, however, lies an online world that few know exists. It’s a realm of huge, untapped reserves of valuable information containing sprawling databases, hidden websites and murky forums. It’s a world where academics and researchers might find the data required to solve some of mankind’s biggest problems, but also where criminal syndicates operate, and terrorist handbooks and child pornography are freely distributed.
At the same time, the underground web is the best hope for those who want to escape the bonds of totalitarian state censorship, and share their ideas or experiences with the outside world.
Interested? You’re not alone. The deep web and its “darknets” are a new battleground for those who want to uphold the right to privacy online, and those who feel that rights need to be sacrificed for the safety of society. The deep web is also the new frontier for those who want to rival Google in the field of search. Take a journey with us to the other side of the internet.
Deep webs, the dark web and darknets
The first thing to grasp is that, while the elements that make up this other web have aspects in common, we’re not talking about a single, unified entity. Those in the know will often talk in terms of the deep or invisible web, darknets and the dark web, and you might think these are all the same thing. In fact, they’re separate phenomena, albeit linked by common themes, properties or interests.
The deep web isn’t half as strange or sinister as it sounds. In computer-science speak, it refers to those portions of the web that, for whatever reason, have been invisible to conventional search engines such as Google.
The majority of this deep web is made up of dynamically created pages and database entries that are accessible only through manual completion of an HTML form. A smaller proportion has been accidentally or purposefully made inaccessible to Google’s crawlers, while other areas sit behind password-protected or subscription-only sites.
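A rough sketch of why form-driven pages stay hidden: the toy crawler below follows hyperlinks only, so it never reaches the records that exist solely as responses to a submitted query. The miniature site, the `fetch` stand-in and the `/search` form are invented for illustration; a real crawler would fetch pages over HTTP.

```python
from html.parser import HTMLParser

# Hypothetical miniature site: a homepage, one linked page, and a search form
# whose result pages exist only when somebody submits a query.
SITE = {
    "/": '<a href="/about">About</a><form action="/search"><input name="q"></form>',
    "/about": "<p>About us</p>",
}


def fetch(path):
    """Stand-in for an HTTP GET; search results are generated on demand."""
    if path.startswith("/search?q="):
        return "<p>Record for " + path.split("=", 1)[1] + "</p>"
    return SITE.get(path, "")


class LinkAndFormParser(HTMLParser):
    """Collects hyperlink targets and form actions from one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "form" and "action" in attrs:
            self.forms.append(attrs["action"])


def crawl(start="/"):
    """Follow hyperlinks only, noting any forms seen but never filling them in."""
    seen, queue, forms = set(), [start], set()
    while queue:
        path = queue.pop()
        if path in seen:
            continue
        seen.add(path)
        parser = LinkAndFormParser()
        parser.feed(fetch(path))
        queue.extend(parser.links)
        forms.update(parser.forms)
    return seen, forms


indexed, forms_seen = crawl()
print(sorted(indexed))      # pages the crawler reached by following links
print(sorted(forms_seen))   # forms it noticed but could not see behind
```

The crawler notices the form but has no query to type into it, so everything behind `/search` stays unindexed until something submits one.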
Make no mistake, the deep web is huge. Michael Bergman’s pioneering 2001 study, The Deep Web: Surfacing Hidden Value, estimated that it accounted for 7,500TB of data at a time when search engines could index only 19TB.
Even the more conservative estimates in a 2007 paper written by Google’s Jayant Madhavan, Alon Halevy and colleagues suggest that there are more than 25 million different sources of deep web content, many of which are huge repositories.
“There is a prevailing sense in the database community that we missed the boat with the WWW,” the Google paper concluded. “The over-arching message of this paper is that a second boat is here, with staggering volumes of structured data, and that boat should be ours.”
Treasures of the deep
“There’s a lot of legitimate and valuable content in the deep web,” said Dr Juliana Freire, the leader of a University of Utah project, DeepPeep, which aims to make deep web content more accessible.
“For example, there are several scientific data sets (such as the Sloan Digital Sky Survey and the Center for Coastal Margin Observation & Prediction), documents and databases, and these are useful to society and have many important applications.”
For Freire, exposing this data and giving researchers the tools to share and analyse it could be a key step for the evolution of science. DeepPeep is far from alone. Next-generation search engines such as Kosmix and info-driven harvesters such as BrightPlanet are working hard to pull data from the deep, while Google now has its own automated deep web search program in place.
There’s nothing necessarily secretive about the majority of this hidden content. When asked if the deep web harbours criminal or illicit activities, Dr Freire explains that “underworld” content is just as likely to be found on the “surface web”, and describes the deep web as “a more benign place” than some imagine. There are, however, areas that are more intentionally secretive, and this is where the deep becomes the dark.
Liam O Murchu, a security expert at Symantec’s Security Technology and Response team (STAR), believes there are three tiers of criminal operating online. The least serious, and most common, will operate in plain sight, on forums that can be found with a conventional search engine.
Beyond this, there are more serious – and paranoid – cybercriminals who “may only work in environments that they consider secure, for example, invite-only forums or secure private chat channels”. These forums will be “harder to find, often by word of mouth in other forums, or by invitation only or via ‘vetting’ and will not be indexed in search engines”. For a higher level of secrecy, however, there’s the third option: the darknet.
Exploring the anonymous web
Often associated with small file-sharing networks, the term darknet refers to any closed, private network that operates on top of the more conventional internet protocols. To join these hidden internets, all you need to do is install a program, such as Freenet or I2P, and browse away, secure in the knowledge that you’re almost impossible to trace.
Freenet is effectively a shadow of the web, with its own sites, forums and email services. A related service, TOR (The Onion Router), provides tools to set up hidden services, including websites, which will be anonymous within TOR and inaccessible from the outside.
Technically, these applications are ingenious. Freenet operates as a network of decentralised nodes, with each system on the network contributing bandwidth.
Since Freenet sites don’t sit on servers, but on data stores spread throughout the network, they can’t be taken down. And because each communication between one computer and another is routed through other nodes, each of which “knows” only the address of the next node and that of the last, Freenet’s users can maintain high levels of anonymity.
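The hop-by-hop idea can be pictured with a toy model. This is a simplified sketch rather than Freenet's actual routing algorithm: the `Node` class, the fixed four-node chain and the key name are all assumptions made for illustration.

```python
class Node:
    """Toy relay that knows only its immediate neighbours (not real Freenet code)."""

    def __init__(self, name):
        self.name = name
        self.next_hop = None    # the only onward address this node knows
        self.heard_from = []    # neighbours that have handed it a request
        self.store = {}         # this node's share of the network's data

    def handle(self, key, sender=None):
        if sender is not None:
            self.heard_from.append(sender.name)
        if key in self.store:
            return self.store[key]      # found: the reply travels back up the chain
        if self.next_hop is None:
            return None                 # end of the chain, nothing found
        return self.next_hop.handle(key, self)


# Hypothetical chain A -> B -> C -> D, with the data held on D.
a, b, c, d = (Node(name) for name in "ABCD")
a.next_hop, b.next_hop, c.next_hop = b, c, d
d.store["some-key"] = "page contents"

reply = a.handle("some-key")
print(reply)           # the content reaches A
print(d.heard_from)    # D only ever dealt with its neighbour C
```

In this model the node holding the data never sees the original requester, only the neighbour that passed the request along.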
On Freenet, nobody knows who you are, or what you’re looking at. Each system also contributes hard disk space, which is occupied by a data cache containing chunks of heavily encrypted data that the program can reassemble into Freenet forums and sites.
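A minimal sketch of that kind of data cache, under loud assumptions: the XOR keystream below is a toy stand-in for real encryption and is not secure, and the `publish` and `retrieve` helpers, chunk size and three in-memory stores are hypothetical. Freenet's own key scheme and storage format are different and far more involved.

```python
import hashlib


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR with a SHA-256-derived keystream. Symmetric toy cipher, NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(x ^ y for x, y in zip(data, stream))


def publish(content: bytes, key: bytes, stores, chunk_size=16):
    """Encrypt fixed-size chunks, scatter them over the stores, return the chunk IDs in order."""
    manifest = []
    for offset in range(0, len(content), chunk_size):
        index = offset // chunk_size
        sealed = toy_cipher(content[offset:offset + chunk_size],
                            key + index.to_bytes(4, "big"))
        chunk_id = hashlib.sha256(sealed).hexdigest()
        stores[index % len(stores)][chunk_id] = sealed   # round-robin placement
        manifest.append(chunk_id)
    return manifest


def retrieve(manifest, key: bytes, stores) -> bytes:
    """Find each chunk in whichever store holds it, decrypt it and stitch the page together."""
    parts = []
    for index, chunk_id in enumerate(manifest):
        sealed = next(s[chunk_id] for s in stores if chunk_id in s)
        parts.append(toy_cipher(sealed, key + index.to_bytes(4, "big")))
    return b"".join(parts)


stores = [{}, {}, {}]   # stand-ins for three users' disk caches
page = b"Welcome to a toy freesite hosted by nobody in particular."
manifest = publish(page, b"secret", stores)
restored = retrieve(manifest, b"secret", stores)
```

Each store ends up holding opaque fragments that mean nothing without the key and the ordered list of chunk IDs, which is the property the article describes.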
A trip through Freenet can be unsettling. It isn’t hard to find sites offering hard-core porn or such charming tomes as The Terrorist’s Handbook, Arson Around with Auntie ALF and the Mujahideen Poisons Handbook, along with copyrighted software, video and music to download.
And while we didn’t come across any child pornography during our time on Freenet (for obvious reasons, we didn’t look), it’s widely acknowledged that it can be found.
Freenet was the brainchild of a young Irish computer scientist, Ian Clarke, who came up with the idea during his studies at the University of Edinburgh in the mid-1990s. He wanted to “build a communication tool that would realise the things that a lot of people thought the internet was – a place where you could communicate without being watched, and where people could be anonymous if they wanted to be”.
Freenet is built by a global team of developers and has been downloaded by more than two million people, with up to 10,000 concurrent users on the network at peak times. Clarke has evidence that Freenet has been distributed in heavily censored regions such as China, and that it’s used as a vehicle for free speech and safe communication.
But does this justify its use as a vehicle for child porn or inflammatory material? “The post is used more widely by paedophiles than Freenet is, yet nobody would talk seriously about shutting down the Royal Mail,” Clarke retorts. “While there will be content, such as child pornography, that we wish didn’t exist, we feel that the benefits, such as the freedom to communicate, that are provided by Freenet greatly outweigh the risks.”
Steven J Murdoch, a security specialist at the University of Cambridge and a member of the TOR project, would doubtless agree. By bouncing communications through a distributed network of relays, TOR hides both the source of your internet traffic – your IP address – and the destination: the site you’re visiting.
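One way to see how relays can hide both ends is layered ("onion") wrapping, where each relay can open only its own layer. The sketch below is a toy under stated assumptions: the XOR keystream is not real cryptography, the relay names and keys are made up, and the real TOR protocol negotiates keys per hop and uses fixed-size cells.

```python
import hashlib
import json


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR with a SHA-256-derived keystream. Symmetric toy cipher, NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(x ^ y for x, y in zip(data, stream))


def wrap(message, destination, relays):
    """Build the layers from the exit relay inwards; relays is [(name, key), ...] in path order."""
    inner = {"next": destination, "payload": message}
    for name, key in reversed(relays):
        sealed = toy_cipher(json.dumps(inner).encode(), key)
        inner = {"next": name, "payload": sealed.hex()}
    return inner   # addressed to the entry relay


def peel(key, payload_hex):
    """What one relay does: remove its own layer to learn only the next hop."""
    layer = json.loads(toy_cipher(bytes.fromhex(payload_hex), key))
    return layer["next"], layer["payload"]


# Hypothetical three-relay path with made-up keys.
relays = [("entry", b"k1"), ("middle", b"k2"), ("exit", b"k3")]
keys = dict(relays)

packet = wrap("GET /support-group", "example.org", relays)
hop, payload = packet["next"], packet["payload"]
path = []
while hop in keys:
    path.append(hop)
    hop, payload = peel(keys[hop], payload)

print(path)           # relays visited, in order
print(hop, payload)   # only after the last layer is the destination revealed
```

In this model the entry relay sees who is sending but learns only the name of the middle relay, while the exit relay sees the destination but learns only that the traffic came from the middle relay.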
Like Freenet, TOR is used by dissidents living under oppressive regimes to counteract IP-based censorship and to preserve their anonymity. It’s also used by law-enforcement agencies, journalists and those – such as corporate whistleblowers or abused wives talking to a support group – who need to cover their tracks.
The application is easy to download, and can be switched on with nothing more than a browser plugin.
Like Ian Clarke, Murdoch doesn’t shy away from the accusation that TOR can be used for illicit purposes. As with any technology, “bad people will use it, and TOR and other anonymous communication networks are really no exception in this regard”.
For Murdoch, the overall benefit to society is greater, however, “not only because the bad users are a small proportion, but also because the people who are willing to break the law already have the ability to get reasonable anonymous communications”.
It’s a view echoed by Symantec’s Liam O Murchu. “One property that all cybercriminals desire is anonymity online. Then, even if their activity is monitored, their identity still remains hidden.” However, he adds that, “this doesn’t mean that closed networks should be banned, of course, because there are perfectly legitimate reasons for legal groups to use them”.
There’s another issue with services such as Freenet, I2P and TOR that might make some users uncomfortable: as the whole technology relies on routing traffic through the various nodes on the network, your system and your internet connection will inevitably be used to transmit content – albeit in an unreadable and encrypted form – that you might find objectionable.
Worse still, Freenet will use the cache on your hard disk to store and serve it. “There is potential that, on your computer, there would be a hold of material like that sitting on your hard disk,” Ian Clarke explains, “but it would be in a form that you couldn’t access, even if you wanted to.
“Certainly, for some people, they view that as a reason not to use Freenet, but a higher percentage realise that they’re providing a service to people, and that while, yes, some material like that will be on it, they can’t be held responsible.”
Policing the darknet
Do these applications and services make things more difficult for those investigating, say, child abuse? “To a degree,” a spokesperson for the UK’s Child Exploitation and Online Protection Centre (CEOP) told us. “We are aware of darknets, closed networks and closed forums, and how offenders are using them to communicate, but we can and we do use everything within our power to track down these people.”
It’s also worth pointing out that services such as TOR are in active use by law-enforcement and intelligence agencies. After all, it’s hard to investigate criminal networks if your IP address marks you as a cop.
Of course, it isn’t only ordinary criminals who have adopted the dark web. Terrorist organisations, too, are looking at it as an alternative to more easily monitored forms of internet communication. In 2007, Mark Burgess, director of the World Security Institute in Brussels, warned that “too much focus on closing down websites could also be counter-productive, since it likely forces terrorist websites to go underground to the so-called ‘deep’ or hidden web”.
It looks like this warning was justified. In an article written for the Combating Terrorism Center at the US Military Academy, West Point, Dr Manuel R Torres Soriano, professor of political science at the University of Seville, explains how Islamic terrorists have responded to the constant closure of propaganda websites by going underground.
They’ve adopted the practices of internet pirates by using file-hosting websites and forum software to maintain a web presence. Online terrorists have also been known to use TOR (its use is covered in some Jihadist FAQs), and have even created their own secrecy tools, such as the Mujahideen Secrets encryption tool.
However, the same techniques being used to mine the deep web for information can also make life harder for the terrorists. In 2007, a team led by Hsinchun Chen of the University of Arizona unveiled a project, DarkWeb, which now tracks terrorist activity across the surface and deep webs.
Where previously various counter-terrorist and law-enforcement agencies worked piecemeal on infiltrating and extracting information from websites and forums, DarkWeb is designed to root out terrorist groups and, in Chen’s words, “exhaustively collect their content”.
Over the past eight years, DarkWeb has collected close to two million files, documents, videos and messages, logged them and made them accessible to intelligence agencies and research bodies across the world. If these organisations want to investigate a threat or try out new theories, they no longer have to trawl the deep web themselves. Instead, “they can take a look at that collection and study it in a more systematic and data-driven manner,” said Chen.
As far as Chen is concerned, however, darknets and closed forums aren’t a major concern. “In general, 95 to 99% [of terrorist content], is really in the open area,” he explains. For terrorists, moving to “somewhere more secretive, like a darknet, isn’t so interesting because they won’t be able to recruit or touch or influence a large number of their target audience”.
The Internet Watch Foundation – the UK industry body charged with removing paedophile content from the web – makes a similar point about child abuse. “The majority of content still comes from big, commercial enterprises,” a spokesperson told us, “and they need to be out there on the open web.”
In fact, Dr Chen argues that terrorists are more likely to make use of familiar forms of communication. “We’ve done a lot of work in websites, forums and even on YouTube, and now we’re doing a lot of exploration in Second Life, because we need to monitor the more fluid and more dynamic web environments that are more difficult to look at.”
In short, there’s some dark stuff going on in the deep, dark portions of the web, but don’t get too hung up on it. After all, there’s plenty of equally dark stuff still floating on the surface.
Author: Stuart Andrews