Even More Stuff You Shouldn't Know
Thursday, 19 April 2007
Since the first and second editions of "Stuff You Shouldn't Know," I have gleaned even more interesting files from the internet. Since people enjoy this kind of stuff, I thought I would go ahead and keep posting these tidbits as I find more. Before we get started with the fun, I have to write a boring disclaimer...

Disclaimer: Neither Matt Parnell, nor his host, nor anyone else is responsible for my stupidity. If I am incarcerated, killed, or injured, lose money, or have any other detrimental thing happen as a result of reading or using the information below, I do not have the right to blame Matt Parnell, nor do I have the right to sue him for any reason in relation to said information. By reading below this disclaimer and downloading files, I agree.


Whew... now that the boring part is over, let's get some files (moved here to the downloads section).

First, we have an ebook entitled "Googlecash: Make Money Using Google," a PDF file.

Next, we have the "FM 21-76 US Army Survival Manual," a DOC file great for camping trips!

Following that, we have a PDF ebook entitled "How to Develop A Super-Power Memory."

Then we have a useful PDF entitled "How to Read Body Language."

Finally, we have an ebook entitled "How to Get the Truth Out of Anyone," which is also a PDF.

Note: If you are the copyright holder and wish me to either take down an item of yours or post a web address or other information linking to you, contact me and I will promptly do so without any trouble whatsoever.
 
Wife Finds Husband's Lost Disc
Wednesday, 18 April 2007
From cs.helsinki.fi .... no words needed, see the translation below.


 
Hosting - A Big Problem I Had, Now Fixed
Wednesday, 18 April 2007
When I first started my site, I had a decent host. After a few months, when I really started getting a good amount of traffic and got Dugg a couple of times, I ran into a serious problem. I was fine when it came to storage space and bandwidth, but my SQL queries were super high, around 250% above what they were supposed to be. I had three choices:

    -Find another host that doesn't limit SQL queries, that has equal or greater storage and bandwidth

    -Host it myself using a dynamic DNS provider

    -Use my ISP's hosting

The problem with the first one is having to pay a fee. That is just about the only downside. The upside is that if I did this, I would have a ton of features for later use.

The problem with the second one is the wait time between IP address changes. Every time your IP changes, it takes around 5 minutes for the dynamic DNS provider to start sending your visitors to the new address. Besides that, uptime on the cable modem hasn't been very good in recent weeks. The upside is that you have total control and a dedicated server.

The last one is out of the question. My ISP only provides 100 MB of storage space and gives no other details. Besides that, they haven't been very reliable lately.

So, what did I do? I ended up finding another host. I picked Dreamhost, because their low-end "Crazy Domain Insane" package gives you unlimited SQL databases, unlimited queries, unlimited utility domains, 1 free domain registration, 160 GB of storage that increases by 1 GB every week, and 2500 GB of starting bandwidth that increases by 16 GB every week. There are dozens of other features, including one-click installs, developer tools, support for various web languages, and more! All of this comes at what I consider to be a very fair price, as it actually costs less than my previous provider, which gave me fewer features for more money. Where's the logic in staying with them?

Needless to say, I am really impressed. I have been with Dreamhost for around a month now, and my site can now stand up to gigantic amounts of traffic with no trouble (or overage fees) whatsoever. If you plan on getting more traffic on your site soon, or if you need a host, I highly suggest you give Dreamhost a try. Besides, they give you a 97-day money-back guarantee (which I am still in if they somehow screw up really badly)!
 
Robots.txt - Get Indexed Fast, Keep Google Out of Private Areas
Tuesday, 17 April 2007
There are two things webmasters should understand about the robots.txt file. The first is that it tells friendly robots which directories they shouldn't index. The other is quite new: it lets you get your site indexed much more quickly. To learn about such things, Grasshopper, read on.

Protect Your Directories:
Disallowing access to directories is a good idea when you have administrative files or private directories on your server. Since the debut of Google, many sites have been exploited because webmasters didn't realize that their sensitive "hidden" folders were, in fact, being indexed by Google. Since then, people have started to catch on, and robots.txt is now a standard honored by well-behaved search robots. To disallow access to a folder by all search engines, use the following:

User-agent: *
Disallow: /superSecret/

This tells the search engine not to look in the folder superSecret when crawling your webspace. User-agent: * means that this rule applies to all robots, and not any one in particular. Should you want to prevent a certain bot from seeing your site while keeping others able to scan it, simply replace * with the robot's user agent (e.g. "googlebot").
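If you'd like to check how a polite robot would read rules like these before you upload them, here is a minimal sketch using Python's standard urllib.robotparser module. The "BadBot" name and the file paths are made up for illustration, and this only models robots that actually obey robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The rule from the example above, plus a hypothetical rule that
# shuts one made-up robot ("BadBot") out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /superSecret/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent, path):
    """Return True if a well-behaved robot named `agent` may fetch `path`."""
    return parser.can_fetch(agent, path)


# Hypothetical paths, just to exercise the rules.
for agent, path in [
    ("googlebot", "/index.html"),
    ("googlebot", "/superSecret/passwords.txt"),
    ("BadBot", "/index.html"),
]:
    print(agent, path, allowed(agent, path))
```

The first request comes back allowed and the other two come back blocked, which is what the rules are meant to say.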

Get Indexed Quickly:
Sitemaps (a sitemap is a list of the URLs on your website) are a very useful tool for getting your site indexed quickly. Until now, you had to have a specific sitemap format for both Google and Yahoo, and you had to prove you owned your site, then submit the map to the engine. Now, thanks to a new standard agreed to by the "big 4" search engines (Google, Yahoo, Microsoft, Ask.com), you can insert the location of your sitemap in your robots.txt file, and these engines simply need to have your main URL submitted to them. Provided you have a valid sitemap XML file, the engine will automatically look first for the sitemap, and then will proceed to index your site using the URLs provided within the sitemap. There are plenty of free sitemap makers available; see Google for that. The line takes the form Sitemap: followed by the full URL of your sitemap file. I personally prefer to just stick it at the top of robots.txt.
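Here's a rough sketch of a robots.txt with a sitemap line at the top, read back with Python's standard urllib.robotparser module. The example.com domain and the sitemap path are placeholders I made up, and site_maps() needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; example.com and the sitemap path are placeholders.
ROBOTS_TXT = """\
Sitemap: http://www.example.com/sitemap.xml

User-agent: *
Disallow: /superSecret/
"""


def sitemap_urls(robots_txt):
    """Return the sitemap URLs a robots.txt advertises (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(sitemap_urls(ROBOTS_TXT))  # prints ['http://www.example.com/sitemap.xml']
```

Note that the Sitemap line stands on its own and is not tied to any User-agent block.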


That's all there really is to it! Go protect your sensitive directories now, before Google Hackers can find your personal files or break into your site. Just remember that robots.txt itself is public, so anyone can read it and try browsing to the directories you listed, which should be owner read/write only anyway...
 
One .htaccess File to Rule Them All
Monday, 16 April 2007
For those of you who don't know what .htaccess is, it is a file used by servers running Apache to configure individual directories. In my case, I use it to block a large volume of spammers, scrapers, bad and rude robots, and other Internet evils. When dealing with .htaccess files, you need to make sure your host allows you to use them, and that your server runs Apache. Besides that, you just need to be careful, because a single mistake can easily get you a "500 Internal Server Error." Note: at the bottom of this article is a link to a copy-and-paste example that will secure your site very well.

That said, let's get on to the good stuff! To block an IP address with your .htaccess file, simply add the following:

Example: deny from 000.000.000.000

You can even block using partial IP addresses, if you so desire:
 
Example: deny from 000.000.000

The above methods are good for blocking bots, domains, and unruly users. I know that many people use them to block governments, as well as the RIAA and MPAA from their sites.
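To make the partial-address idea concrete, here is a small Python imitation of the matching. It is only a sketch of the behavior as I understand it (whole leading octets are compared), not Apache's actual code, and the addresses are documentation-only placeholders rather than real offenders.

```python
def is_denied(remote_ip, deny_rules):
    """Return True if remote_ip matches a full or partial dotted-quad rule."""
    ip_octets = remote_ip.split(".")
    for rule in deny_rules:
        rule_octets = rule.rstrip(".").split(".")
        # A partial rule matches when its octets equal the leading
        # octets of the address (whole octets, not a text prefix).
        if ip_octets[:len(rule_octets)] == rule_octets:
            return True
    return False


# Placeholder rules: one full address and one partial address.
DENY = ["203.0.113.7", "198.51.100"]

print(is_denied("198.51.100.42", DENY))  # covered by the partial rule
print(is_denied("203.0.113.70", DENY))   # not the same as 203.0.113.7
```

The second check is the reason to compare octets rather than raw text: 203.0.113.70 starts with the characters "203.0.113.7" but is a different address.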

Another trick is to use mod_rewrite to block common exploits. These exploits, though common, can often be used to take over your site if it isn't secured. (duh)...

Example (# denotes a comment):
# Block out any script trying to modify a _REQUEST variable via URL
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})

Some rules look complicated, but they do their job in the end. Filters such as [0-9] and [A-Za-z] are character classes: each one matches a single digit from 0 to 9, or a single letter from A to Z or a to z. In the example below, a series of these is applied to the remote address to determine whether a certain bad bot is attempting to crawl your site. The pattern matches every address from 63.148.99.224 through 63.148.99.255, and any request coming from that range gets blocked.

Example:
# Cyveillance spybot
RewriteCond %{REMOTE_ADDR} "^63\.148\.99\.2(2[4-9]|[3-4][0-9]|5[0-5])$" [OR]
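If you don't trust a pattern like that by eye, you can try it out before it goes anywhere near your server. This is a quick sketch using Python's re module, which reads this particular pattern the same way; it simply tries every possible last octet.

```python
import re

# The pattern from the RewriteCond above, unchanged.
CYVEILLANCE = re.compile(r"^63\.148\.99\.2(2[4-9]|[3-4][0-9]|5[0-5])$")


def last_octets_matched():
    """Return every last octet 0-255 for which 63.148.99.<octet> matches."""
    return [n for n in range(256) if CYVEILLANCE.search(f"63.148.99.{n}")]


matched = last_octets_matched()
print(matched[0], matched[-1], len(matched))  # prints: 224 255 32
```

So the three alternatives cover 224-229, 230-249, and 250-255, for 32 addresses in all.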

These methods are useful for blocking certain user-agents too!

Example:
# Turnitin spybot
RewriteCond %{HTTP_USER_AGENT} TurnitinBot [OR]


You can even block single words coming from referers, such as spammy domains.

Example:
RewriteCond %{HTTP_REFERER} viagra [NC,OR]

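Finally, for the last nail in these baddies' coffins, just put this at the end of your blocklist and create a noindex.html page. This will really tick them off. RewriteCond lines do nothing on their own; they only take effect through the RewriteRule that follows them, and the last condition before the rule should not carry the [OR] flag. If you really want to hurt them, you could write an infinite PHP loop in there to confuse whoever or whatever bot it may be, but keep in mind that PHP runs on your own server, so the loop eats your CPU time too.

Example:
RewriteRule ^.*$ /noindex.html  [L]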
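To see how the pieces fit together, here is a Python imitation of the chain: several conditions joined by [OR], followed by one rule. It is only a sketch of the logic, not how Apache implements mod_rewrite, and the dictionary keys (remote_addr, user_agent, referer) are names I picked for illustration.

```python
import re

# Each entry mirrors one RewriteCond from the examples: (field, pattern, flags).
# [NC] becomes re.IGNORECASE; conditions without it stay case-sensitive.
CONDITIONS = [
    ("remote_addr", r"^63\.148\.99\.2(2[4-9]|[3-4][0-9]|5[0-5])$", 0),
    ("user_agent", r"TurnitinBot", 0),
    ("referer", r"viagra", re.IGNORECASE),
]


def rewrite_target(request):
    """Return "/noindex.html" if any condition matches, else None (no rewrite)."""
    for field, pattern, flags in CONDITIONS:
        # Conditions chained with [OR]: the first match is enough.
        if re.search(pattern, request.get(field, ""), flags):
            return "/noindex.html"
    return None


# Hypothetical requests, just to exercise the chain.
print(rewrite_target({"user_agent": "TurnitinBot/2.1"}))
print(rewrite_target({"user_agent": "Mozilla/5.0", "referer": "http://example.com/"}))
```

The first request is sent to /noindex.html, and the second passes through untouched.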

An example file of everything discussed here, with blocks for hundreds of baddies can be found by clicking this sentence.
Don't forget to rename it .htaccess!

For an official manual to the .htaccess file, see this Apache tutorial.

Next time, I will write a brief article about the robots.txt file.






 

© Matt Parnell's Brain: Plugged In!