
From: Daniel O'Leary <Daniel_O'Leary@
To: cibt <cibt@headgap.com>
Subject: Searching Personal Web Pages
Date: Sat, January 06, 2001 04:09 AM


There are several things that could be wrong here:

1. The submitted URL contains the user's name or another path fragment with a space in it. Many spiders choke on this, and you can see it in the web server logs: the robot requests a path that contains only part of the user's name (everything from the space onward is missing), and the server returns a "file not found" error. Spaces should be replaced with %20 (a URL-encoded space). If the fragment is a user name, the space may be replaced with an underscore instead, but this only works for the user's name.

2. The page has a robot exclusion tag in its header. This could be directly in the home page file, or perhaps in a common header pulled in via Server Side Include. I intentionally have these on many pages of my site to direct robots away from them, but the robots do not always obey the "NOINDEX" and "NOFOLLOW" directives. I believe WebCrawler has online docs that describe the correct syntax and usage for these.

3. The user "stuffs" the page with meta tags. Some robots now discard pages with excessive meta tags that were placed there to increase the odds that the page looks relevant to a given search when the content itself has nothing relevant. Porn sites and some MLM sites are really bad about this.
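
The first two checks above can be roughed out in a few lines of Python. This is only a sketch using the standard library: the function names, the example path, and the sample HTML are made up for illustration, and it only looks for a meta tag named "robots" in the page itself (it does not follow Server Side Includes).

```python
from html.parser import HTMLParser
from urllib.parse import quote


def encode_path(path):
    """Percent-encode a URL path: spaces become %20, slashes are kept."""
    return quote(path, safe="/~")


class RobotsMetaFinder(HTMLParser):
    """Collects the directives found in any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def robots_directives(html):
    """Return the set of lowercase robots directives declared in the page."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.directives.discard("")
    return finder.directives


if __name__ == "__main__":
    # Hypothetical path with a space in the user's name.
    print(encode_path("/users/John Smith/index.html"))
    # Hypothetical page header containing an exclusion tag.
    page = '<head><meta name="ROBOTS" content="NOINDEX, NOFOLLOW"></head>'
    print(robots_directives(page))
```

If robots_directives() reports "noindex" for a user's page, a well-behaved spider is being told to skip it, whatever the submission form says.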

A look at the logs would tell me more. If you give me a sample URL, I can look at it and make some suggestions.
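
As a rough idea of what to look for in the logs, here is a Python sketch that flags "file not found" requests matching a real path cut off at its first space. It assumes the log lines are in the Common Log Format used by many web servers; I have not checked what format your server writes, so the regular expression may need adjusting. The function name and the sample data are hypothetical.

```python
import re

# Matches the request and status fields of a Common Log Format line,
# e.g. ... "GET /users/John HTTP/1.0" 404 210   (format is an assumption)
REQUEST = re.compile(r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" (?P<status>\d{3})')


def truncated_requests(log_lines, real_paths):
    """Return (requested, intended) pairs for 404s that look like a
    real path truncated at its first space."""
    # Map each truncated form to the full path it came from.
    cut = {p.split(" ", 1)[0]: p for p in real_paths if " " in p}
    hits = []
    for line in log_lines:
        match = REQUEST.search(line)
        if not match or match.group("status") != "404":
            continue
        path = match.group("path")
        if path in cut:
            hits.append((path, cut[path]))
    return hits


if __name__ == "__main__":
    # Hypothetical log lines and page path, for illustration only.
    sample_log = [
        '10.0.0.1 - - [05/Jan/2001:12:53:00 -0600] "GET /users/John HTTP/1.0" 404 210',
        '10.0.0.2 - - [05/Jan/2001:12:54:00 -0600] "GET /index.html HTTP/1.0" 200 1024',
    ]
    pages = ["/users/John Smith/index.html"]
    for requested, intended in truncated_requests(sample_log, pages):
        print(requested, "was probably meant to be", intended)
```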


On 01/05/2001 12:53 PM, cibt wrote:

>I have users who would like to list their personal web sites on search
>engines. Despite registering multiple times, their pages still don't
>seem to be included in the indices. I've tried listing them myself on
>both Lycos (which tells me the pages were successfully spidered), and
>Yahoo. Several weeks later, test searches still don't bring up the
>users' pages.
>
>Is there something in the server settings preventing spiders from
>accessing personal pages?

---
Daniel O'Leary, Admin/WebMaster KloneZone - A TeleFinder 5.7 BBS   
Voice=> 817-367-2558 Dial-In=> 817-367-2712 Fidonet=> 1130/1015
TFNet=> klonezone.tfnet.org Internet=> kz.eaze.net WWW=> http://kz.eaze.net

