There are many reasons you will see 404 errors in your log files. As
a reminder, a 404 error means a request was made for a file (or an
object) that does not exist on the server.
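As a starting point, here is a minimal sketch of how the 404 entries
could be pulled out of a log. It assumes the server writes the Common
Log Format; the sample lines, addresses and the name count_404s are
made up for illustration, so adjust the pattern to your own log layout.

```python
import re
from collections import Counter

# Assumed layout (Common Log Format), for example:
# 203.0.113.9 - - [10/Oct/2023:13:55:36 +0000] "GET /old.htm HTTP/1.1" 404 512
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})\b')

def count_404s(lines):
    """Return a Counter mapping each requested path to its number of 404s."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            hits[match.group("path")] += 1
    return hits

# Made-up sample lines, not taken from a real log.
sample = [
    '203.0.113.9 - - [10/Oct/2023:13:55:36 +0000] "GET /old.htm HTTP/1.1" 404 512',
    '203.0.113.9 - - [10/Oct/2023:13:55:40 +0000] "GET /index.htm HTTP/1.1" 200 2048',
    '198.51.100.7 - - [10/Oct/2023:13:56:02 +0000] "GET /old.htm HTTP/1.1" 404 512',
]
print(count_404s(sample).most_common())  # [('/old.htm', 2)]
```

Sorting by count shows which missing pages are requested most often,
which helps when deciding which of the causes below to look at first.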
Some of the more common reasons are listed below.
-Renaming Pages-
Sometimes the name of a page is changed. For example, you might have
originally created "associatedcontent.htm" and later renamed it to
"ac.htm". Whenever you do this, be sure to create a 301 redirect from
the old name to the new one.
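The idea can be sketched as a simple lookup table. This is only an
illustration: on a real site the redirect is normally configured in
the web server itself, and the names RENAMED and respond are
hypothetical.

```python
# Hypothetical table of renamed pages: old path -> new path.
RENAMED = {"/associatedcontent.htm": "/ac.htm"}

def respond(path, existing_pages):
    """Return (status, location) for a requested path."""
    if path in RENAMED:
        # The page was renamed: answer 301 and point to the new name.
        return 301, RENAMED[path]
    if path in existing_pages:
        return 200, None
    # Neither renamed nor present: this is the 404 seen in the logs.
    return 404, None

pages = {"/ac.htm", "/index.htm"}
print(respond("/associatedcontent.htm", pages))  # (301, '/ac.htm')
print(respond("/ac.htm", pages))                 # (200, None)
print(respond("/missing.htm", pages))            # (404, None)
```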
-Moving a Page-
At times, webmasters decide to reorganize their websites. In such
cases, references to the old pages may still exist, even after the
site has been transferred to a new hosting service or domain. This is
especially true of off-site references: search engines, link exchanges
and others. When you reorganize a site, you must remember to set up
301 redirects to the new pages. This informs the search engines that
your pages have moved.
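When whole folders move, one rule per folder is usually easier to
maintain than one rule per page. The sketch below assumes two made-up
moves (the prefixes and the store.example.com address are
placeholders) and shows how an old path could be translated into its
new location.

```python
# Hypothetical moves: old path prefix -> new URL prefix.
MOVED_PREFIXES = {
    "/articles/": "/blog/",
    "/shop/": "https://store.example.com/",
}

def new_location(path):
    """Return the new URL for a moved path, or None if it did not move."""
    # Check longer prefixes first so the most specific rule wins.
    for old in sorted(MOVED_PREFIXES, key=len, reverse=True):
        if path.startswith(old):
            return MOVED_PREFIXES[old] + path[len(old):]
    return None

print(new_location("/articles/2004/tips.htm"))  # /blog/2004/tips.htm
print(new_location("/shop/cart.htm"))           # https://store.example.com/cart.htm
print(new_location("/about.htm"))               # None
```

A path that returns a new location would be answered with a 301 to
that address; anything else is handled as before.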
-Case Sensitive Operating Systems-
Some operating systems are case sensitive. On Windows (IIS), the case
(upper or lower) of file and folder names does not matter: Windows
treats "ThisIsAPage" and "thisisapage" as identical. On Unix and
Linux, however, these names are not identical. In fact, you can have
both at the same time. A link that uses the wrong case will therefore
cause a 404 error. This is very common when webmasters move their
site from Windows to another operating system. In general, it's best
to stick to lower-case characters for file and folder names,
regardless of the operating system.
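Before moving a site to a case-sensitive system, it can help to check
the file names. The following sketch works on a plain list of names
(the list and both function names are made up for illustration): one
function reports names that differ only by case, the other finds the
stored spelling for a request typed in the wrong case.

```python
from collections import defaultdict

def case_collisions(names):
    """Group names that differ only in letter case."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}

def canonical_name(requested, names):
    """Return the stored name matching `requested` when case is ignored.

    Returns None if nothing matches or if the match is ambiguous.
    """
    matches = [name for name in names if name.lower() == requested.lower()]
    return matches[0] if len(matches) == 1 else None

files = ["ThisIsAPage", "thisisapage", "About.htm"]
print(case_collisions(files))              # {'thisisapage': ['ThisIsAPage', 'thisisapage']}
print(canonical_name("about.htm", files))  # About.htm
print(canonical_name("THISISAPAGE", files))  # None
```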
-Misspelling-
A very common reason for 404 errors is misspelling. Users do
sometimes type URLs by hand, especially when a URL has been printed on
a business card, in a magazine or in an advertisement. These URLs are
often misspelled, and this causes 404 errors. To reduce this, you can
create 301 redirects for some common misspellings, and keep any URLs
that users are expected to type short.
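One way to find candidate redirects is to compare each misspelled path
from the logs with the pages that really exist. The sketch below uses
get_close_matches from Python's standard difflib module; the page
list, the name suggest and the 0.8 cutoff are assumptions chosen for
the example, not recommended values.

```python
import difflib

# Hypothetical list of pages that exist on the site.
KNOWN_PATHS = ["/ac.htm", "/contact.htm", "/products.htm"]

def suggest(path, known=KNOWN_PATHS, cutoff=0.8):
    """Return the known path most similar to `path`, or None if none is close."""
    matches = difflib.get_close_matches(path, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(suggest("/contcat.htm"))  # /contact.htm
print(suggest("/xyz.htm"))      # None
```

A suggestion is only a hint. It is worth reviewing the matches by hand
before turning them into permanent 301 redirects.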
-Nimda and other Worms-
Many worms attempt to gain access to web servers by exploiting known
vulnerabilities. These attempts often consist of requests for strange
URLs with bizarre combinations of characters. The result is lots of
404 errors (assuming your system is secure and has not been
compromised). There is really no way to prevent these errors.
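Although these errors cannot be prevented, they can be set aside so
that they do not hide the 404 errors you can actually fix. The sketch
below is an assumption-heavy illustration: the two signatures are
examples only (probes of this kind often asked for Windows programs
such as cmd.exe or root.exe), and the sample paths are made up.

```python
import re

# Illustrative signatures only; extend the pattern for your own logs.
PROBE_PATTERN = re.compile(r"(cmd\.exe|root\.exe)", re.IGNORECASE)

def split_404s(paths):
    """Separate 404 paths into (probes, genuine) lists."""
    probes, genuine = [], []
    for path in paths:
        if PROBE_PATTERN.search(path):
            probes.append(path)
        else:
            genuine.append(path)
    return probes, genuine

paths = [
    "/scripts/root.exe?/c+dir",
    "/scripts/..%255c../winnt/system32/cmd.exe?/c+dir",
    "/old-page.htm",
]
probes, genuine = split_404s(paths)
print(len(probes), genuine)  # 2 ['/old-page.htm']
```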
-Used Domains-
If you purchase a domain name which has been previously owned, then
you might find yourself getting strange 404 errors for files and
directories which have no relation to your site. These requests are
left over from the previous owner. There probably isn't much you can
do about them, although you might set up 301 redirects for some of
the more frequently requested pages.
-Used TCP/IP Addresses-
If someone else used the same TCP/IP address before you, you might be
catching some of their old traffic. It won't last long, but it will
create a lot of 404 errors.
-Robots.txt file-
The robots exclusion standard specifies that web sites should include
a file called robots.txt in their root directory. This file indicates
which parts of the web site should NOT be spidered. Thus, spiders
will attempt to open this file, and 404 errors will result if the file
does not exist. You should always create a robots.txt file, even if
it's empty.
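To see how a spider reads the file, here is a small sketch using
Python's standard urllib.robotparser module. The rules, the
www.example.com addresses and the helper name allowed are made up for
the example.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt, url, agent="*"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# Hypothetical rules that keep spiders out of one folder.
rules = "User-agent: *\nDisallow: /private/\n"

print(allowed(rules, "http://www.example.com/private/notes.htm"))  # False
print(allowed(rules, "http://www.example.com/index.htm"))          # True
# An empty robots.txt blocks nothing, but it does stop the 404 errors.
print(allowed("", "http://www.example.com/private/notes.htm"))     # True
```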
Thanks a lot, I hope this was useful for you. If you know of any more
reasons, please let me know by e-mail.