Robots.txt - Get Indexed Fast, Keep Google Out of Private Areas
Tuesday, 17 April 2007
There are two things webmasters should understand about the robots.txt file. The first is that it is good for telling friendly robots which directories they don't have permission to index. The second is quite new, and it allows you to get your site indexed much more quickly. To learn about such things, Grasshopper, read on.

Protect Your Directories:
Disallowing access to directories is a good idea when you have administrative files or private directories on your server. Since the debut of Google, many sites have been exploited because their webmasters didn't realize that their sensitive "hidden" folders were, in fact, being indexed by Google. Since then, people have started to catch on, and robots.txt is now a standard honored by search robots. This is a benefit. To disallow access to a folder by all search engines, use the following:

User-agent: *
Disallow: /superSecret/

This tells search engines not to look in the folder superSecret when crawling your webspace. "User-agent: *" means that the rule applies to all robots, not any one in particular. Should you want to prevent a certain bot from seeing your site while keeping others able to scan it, simply replace * with that robot's user agent (e.g. "googlebot").
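For instance, a rule aimed only at Google's crawler (the folder name here is just an example) would look like this; other robots would ignore it and remain free to crawl the folder:

User-agent: googlebot
Disallow: /superSecret/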

Get Indexed Quickly:
Sitemaps (a sitemap contains a list of every link on your website) are a very useful tool for getting your site indexed quickly. Until recently, you had to build a sitemap in a specific format for both Google and Yahoo, prove you owned your site, and then submit the map to each engine. Now, thanks to a new standard agreed to by the big search engines (Google, Yahoo!, Microsoft, and Ask.com), you can insert the location of your sitemap in your robots.txt file, and these engines only need to have your main URL submitted to them. Provided you have a valid sitemap XML file, the engine will automatically look first for the sitemap, and then proceed to index your site using the URLs provided within it. There are plenty of free sitemap makers available; see Google for that. An example of a sitemap location in robots.txt can be found below. I personally prefer to just stick it at the top of the file.
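The sitemap declaration is a single line; substitute the placeholder URL below with the full address of your own sitemap file:

Sitemap: http://www.example.com/sitemap.xml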


That's all there really is to it! Go protect your sensitive directories now, before Google hackers find your personal files or break into your site. (Of course, a determined attacker can read your robots.txt and try browsing to the listed directories directly, so those folders should be restricted to owner read/write permissions anyway.)

© Matt Parnell's Brain: Plugged In!