Internet Archaeology Article
Monday, 30 July 2007
Welcome interested people, 2600 Readers, and everyone else! I couldn't wait, so here is the article!
Published in 2600 - The Hacker Quarterly, Summer 2007
By ilikenwf (A.K.A. Matt Parnell)
Archaeology is a term that describes unearthing an artifact that is
old, long lost, or forgotten. The internet is no different from the
real world in the sense that it too has artifacts of media from days
gone by. You just have to know where to look. The best place to start
is the Internet Archive "Wayback Machine," which houses over
8 Petabytes of old information gleaned from the earliest days of the
Internet up to now. Just put in an address, and you can view a site,
provided it was indexed, all the way back to 1996.
Beginning Methodology
I had wanted to find as much "lost" TechTV and ZDTV media as possible,
for nostalgia's sake. Starting out, I was just viewing the sites by
individual archive dates. This was way too tedious and time-consuming
to be worthwhile, and it didn't really give me much to work with.
Digging around on the archive's information pages, I discovered that
searching sites with wildcards (*) is supported. To give it a shot, I
typed in "http://www.techtv.com/*", as well as "http://www.zdtv.com/*".
These searches yielded long lists (45,000+) of pages from the two
domains. At first it was really slow to sift through the information,
until I found a way to speed it up: go to the bottom of the search
page, and set the number of results displayed to 30. Then, when the
page reloads, the URL will look like this: "http://web.archive.org/web/*sr_1nr_30/http://url.com/*".
Just change the 30 to a larger number that won't cause your browser
to crash, and load the page from your edited URL. The list will be much
longer, so you don't have to click "next" over and over again.
Then, scroll or page down through the content looking for interestingly
named files, and files with uncommon extensions, like pdf, psd, zip,
etc. Find one, click the link, and if there is only one copy of that
file in the archive, it will pop right up unless it was indexed
incorrectly. Otherwise, you will get a choice of dates the file was
archived on. Choose the first one, and keep working through the dates
until you find a good uncorrupted copy of the file (see the tips and
tricks section for an explanation).
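The steps above can be sketched in Python. This is only an illustration: the URL pattern is copied from the example quoted above and may not match how the archive's search works today, the function names are made up for this sketch, and the extension list is just the file types mentioned in this article.

```python
import posixpath
from urllib.parse import urlsplit

# File types mentioned in this article; extend to taste (assumption).
INTERESTING = {".pdf", ".psd", ".zip", ".eps", ".exe"}


def listing_url(site, per_page=30):
    """Build a wildcard listing URL in the form quoted in the article.

    `site` is something like "http://www.techtv.com"; `per_page` is the
    number that replaces the 30 in the quoted URL.
    """
    return "http://web.archive.org/web/*sr_1nr_%d/%s/*" % (per_page, site.rstrip("/"))


def interesting_files(urls):
    """Keep only the URLs whose path ends in one of the INTERESTING extensions."""
    hits = []
    for url in urls:
        # Look at the path only, so query strings don't hide the extension.
        extension = posixpath.splitext(urlsplit(url).path.lower())[1]
        if extension in INTERESTING:
            hits.append(url)
    return hits
```

For example, `listing_url("http://www.techtv.com", 500)` gives the edited URL for a 500-result page, and `interesting_files` can be run over whatever list of links you copy out of that page.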
Subdomains
The problem with this method is that it doesn't search all of the
subdomains of a domain. To find them, either use a whois search, or
look at the web pages' (html, php, xml, etc.) sources and check the
paths. Using a combination of these methods, as well as my
memory of the sites, I stumbled across subdomains like cache.techtv.com, chat.techtv.com, and more. You can see a list of the domains I found by clicking here.
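Digging through page sources by hand gets old fast, so here is a rough Python sketch of the same idea: scan a page's source for absolute URLs and collect the hostnames that sit under the domain you care about. The function name and the regular expression are assumptions made for this example, and it only sees subdomains that are written out in full in the source.

```python
import re
from urllib.parse import urlsplit

# Loose match for absolute http/https URLs inside HTML, PHP output, XML, etc.
URL_PATTERN = re.compile(r"""https?://[^\s"'<>)]+""", re.IGNORECASE)


def find_subdomains(page_source, base_domain):
    """Return a sorted list of hostnames under base_domain seen in page_source."""
    suffix = "." + base_domain.lower()
    found = set()
    for match in URL_PATTERN.finditer(page_source):
        try:
            host = urlsplit(match.group(0)).hostname  # already lowercased
        except ValueError:
            continue  # skip malformed URLs rather than stopping the scan
        if host and host.endswith(suffix):
            found.add(host)
    return sorted(found)
```

Feed it the saved source of an archived page along with "techtv.com", and each hostname it returns is a candidate for another wildcard search.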
See The Findings
Using the above methods, I searched other domains and found all sorts
of stuff - a font of Cat's handwriting, psd and eps source images for
many of the show's logos, lots of wallpapers, avatars from the old ZDTV
chat palace, among other things. I also found many video and sound
clips from the old "Fox Kids" television network on the archived copies
of "foxkids.com". All in all, I was very successful, and very
pleased. You can grab a copy of my discoveries from the links at the bottom of the page.
Practical Uses
These methods can all be used for good or evil - you can see the inner
workings of sites that have, since archiving, locked down areas that
were once publicly open. Sometimes, you can even find media that was
free, but is now charged for, thus saving you money. In truth, the
sky's the limit! Have fun!
Some Tips and Tricks:
1. These methods WILL give you files other than "web only" files, such as executables, zip files, and video files.
2. One problem is that some of the zip files and exe files get garbled and corrupted during transfer to the archive (especially on older pages) and don't always work. You can sometimes repair the zip files, but many times it doesn't work. Try finding another archive date with the same file. If you can't, it is best to move on.
3. Take note that you aren't really supposed to download from the archive. People do it anyway, but you really should make sure that you don't sell the material you find, and use it for "educational" and "archival" purposes only.
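Tip 2 can be partly automated. The sketch below assumes you have already downloaded the same zip file from several archive dates, and uses Python's standard zipfile module to find the first copy that opens and passes its CRC checks. The function names are made up for this example, and it does not attempt any repair.

```python
import io
import zipfile
import zlib


def zip_is_intact(data):
    """Return True if `data` opens as a zip and every member passes its CRC check."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # testzip() returns the name of the first bad member, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, zlib.error, EOFError):
        # Truncated or garbled downloads usually end up here.
        return False


def first_good_copy(copies):
    """Given (date, data) pairs in the order you want to try them,
    return the date of the first intact zip, or None if every copy is bad."""
    for date, data in copies:
        if zip_is_intact(data):
            return date
    return None
```

If `first_good_copy` returns None, every copy you grabbed is damaged and it is probably time to move on.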
Findings:
These are hosted on Megaupload so that my site doesn't crash from bandwidth overuse.
Links are now working again. These links will be updated as needed, both here and on my Downloads page.
TechTV Archaeological Findings (RAR)
Fox Kids.com (and other related domains) Archaeological Findings (RAR)
Shoutz:
For what it's worth, shoutz to Adrian Lamo at 2600, as well as Greg, Hevnsnt, CodedChaos, Surbo, and all the other guys at I-Hacked and the Edge. Have a good time at Def Con, you lucky jerks!