September 06, 2003

Full text search of history

… well, at least the portion of history that has been online since 1996.

Recall is a search engine at the Internet Archive that indexes the text of over 11 billion pages. The archive has pages dating from 1996 through the present day.

The search engine's index is 2 terabytes, and it covers over a petabyte of data. I’m pretty sure those numbers mean really big.

I like the little graphs that chart how often your search term appeared over the years. For example, a search on “napster” gives you this cool “rise and fall” picture:

[Image: 470382805177-total.jpg, the frequency graph for “napster”]

There’s also some nifty on-the-fly categorization: it suggests “being misused” as a subcategory of pages about the ICANN board.

This will be fun.

Posted by james at September 6, 2003 01:01 AM