Google shock for Los Rios

March 7, 2007

Eric Stern and Dorothy Korber

http://www.sacbee.com/101/story/133870.html



A community college student who was "Googling" himself last month found some disconcerting information when he typed his name into the popular Internet search engine.

A Los Rios Community College District database popped up that included his name, birth date and Social Security number. The file also contained data on about 2,000 other students.

"We didn't think the information was open to Google," said Susie Williams, a spokeswoman for the Los Rios schools. "It was a shock to learn they were able to do it."

The file has since been removed from the Internet, and Los Rios sent letters to students alerting them to the breach. But the incident shows how Internet search engines are aggressively digging into every nook and cranny of cyberspace.

Computer security experts say there are privacy lessons both for those who operate Web sites and for those who search them.

Google, the No. 1 search company, isn't packing a room full of Web geeks to surf for new sites to archive.

Like other search engines, Google uses automated programs -- called spiders -- to creep through computer networks seeking publicly available information.

"Google's spiders regularly crawl the Web to rebuild our index," according to a statement by Google. "Computer programs determine which sites to crawl, how often, and how many pages to fetch from each site."

Matt Bishop, a computer science professor at the University of California, Davis, said it's aggressive -- but not criminal.

"It's kind of crawling all over and looking under rocks," he said. "Occasionally things you don't want to turn up, do turn up."

Despite advances in Internet security, it can still be remarkably simple to find personal -- or embarrassing -- information over the Web. The information doesn't even need to have a published Web address to be found.

By deleting a few characters at the end of a Web address on Gov. Arnold Schwarzenegger's Web site last year, Democratic opponents found audio recordings of the Republican governor making racially charged comments.

The governor's staff accused Democrats of hacking the site, but a California Highway Patrol investigation concluded last month that no laws were broken by "backward browsing."

"Google hacking," the use of advanced search queries to ferret out information that Web operators probably don't realize is out there, has become a sport -- if not a profession.

A Web site run by Johnny Long, johnny.ihackstuff.com, includes a database of hundreds of sneaky Google-search tips, such as adding "not for distribution" or "confidential" to search queries. Typing "filetype:xls" will spit out Microsoft Excel spreadsheets.

In the case of Los Rios, staff members were testing a new online application system and "just grabbed some files" to upload, said Williams, the college spokeswoman.

"Google had come along and indexed this little test batch," Williams said. "The data was on what we thought was a secure part of our Web server."

The data involved 2,000 of the school's 78,000 students.

That was in October. More than three months had passed before a Los Rios student contacted the school about finding the personal data file on Google.

"In Web time, that's a long time," said Pam Dixon, executive director of the World Privacy Forum. "If you leave it up for longer than a week, you're going to have some issues."

Bishop, the UC Davis professor, said Los Rios erred.

"Why were they testing with real data?" he asked.

The Los Rios district, which includes American River, Cosumnes River, Folsom Lake and Sacramento City colleges, is the second-largest community college system in the state. Los Rios is now in the process of hiring an outside security firm to help beef up its computer network.

In a March 1 letter to the 2,000 students, Marie Smith, Los Rios vice chancellor of education and technology, said:

"After searching all the other major search engines and finding nothing, we believe the student information was only in the Google database and on no other search engine."

Williams said a check of the Los Rios Web logs, which track the computer addresses of people accessing the school's site, showed that only the student who spotted the information -- and his wife -- had clicked on the file.

Dixon of the World Privacy Forum said that once information gets out, it can be hard to track it all down.

More than 100 search engines beyond the big players -- Google, Yahoo and MSN -- are hunting for data, she said.

Dixon did a quick check Tuesday of Gigablast, a growing presence in the Internet search world, and found regularly archived Los Rios pages.

"You just don't know who picked up the data," she said. "If you think it's hidden, it's probably not."

She also said other users may have accessed the student data by going to a cached, or stored, Web page, instead of clicking directly to the file on the school's Web server.

"Smart people use caches and not the actual Web," she said.

And here's a warning to those searching for themselves online, out of concern or conceit: Don't type in your full Social Security number and name, Dixon said.

Web search engines store everything.

Last year, AOL released the search terms of 650,000 users, which included Social Security numbers and medical data.

The information, meant for researchers, was released to a public site and discovered by a blogger.

