Indexing pages which aren't linked from other pages
By default, FDSE will only "discover" web pages on your site if they are linked from other pages. Pages which are not linked from any other page will not be indexed by a normal web-crawler realm.
There are several ways to work around this behavior. They include:
- Use a website realm with file-system discovery instead of web-crawler discovery. File-system discovery directly explores all folders and subfolders on your web server instead of parsing links in HTML pages, so this type of realm is useful if your web pages are not interlinked. (A rough sketch of the difference between the two discovery modes appears after this list.)
  See Administration: Creating a "file system" realm for more information on how to create one of these realms.
- Or, add individual URLs, one by one, to an existing web-crawler website realm by using the "Add New URL" form on the main Admin Page.
  You only need to do this once for each URL that is not indexed. Once a URL has been entered into the index, it is remembered when re-indexing.
- Or, create a file in the top-level folder of your web server. Within this file, include three items:
  - A robots exclusion tag that tells FDSE to follow links, but not index the file itself:
    <meta http-equiv="robots" content="noindex,follow" />
  - A link to your main page:
    <a href="/"> main </a>
  - Standard A HREF links to all pages that are not in the index. Examples would be:
    <a href="/foo/bar/1.html"> orphan </a>
    <a href="/x/y/z/w/"> orphan </a>
  Name this file "seed.html" and upload it to your root folder. Then, when creating your web-crawler website realm, use the seed file as the Base URL, i.e. http://xav.com/seed.html instead of http://xav.com/.
  There will now be links to all of your pages. As new URLs are added, just update the seed.html file with links to them and rebuild the index. (A small script for regenerating seed.html automatically is sketched below.)
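If you have many orphan pages, the seed file can also be regenerated automatically. The following sketch is not part of FDSE; the document root path and file extensions are assumptions for illustration. It walks the web server's folders and writes a seed.html containing the robots tag, the link to the main page, and a link to every HTML file it finds:

# Sketch only: regenerate seed.html by walking the web server's document root.
# This is not part of FDSE. DOC_ROOT is a hypothetical path; adjust it and the
# file extensions for your own site.
import os

DOC_ROOT = "/var/www/html"                      # hypothetical document root
OUTPUT = os.path.join(DOC_ROOT, "seed.html")

links = []
for dirpath, _dirnames, filenames in os.walk(DOC_ROOT):
    for name in filenames:
        if not name.endswith((".html", ".htm")) or name == "seed.html":
            continue
        # Turn the file-system path into a root-relative URL.
        rel = os.path.relpath(os.path.join(dirpath, name), DOC_ROOT)
        links.append('<a href="/%s"> orphan </a>' % rel.replace(os.sep, "/"))

with open(OUTPUT, "w") as f:
    # The same three items described above: robots tag, main-page link,
    # and a plain A HREF link to every page found on disk.
    f.write('<meta http-equiv="robots" content="noindex,follow" />\n')
    f.write('<a href="/"> main </a>\n')
    f.write("\n".join(links) + "\n")

After regenerating seed.html, rebuild the index as usual.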
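Finally, for the first option above, the difference between web-crawler discovery and file-system discovery can be sketched roughly as follows. This is only an illustration, not FDSE's actual crawler code; the document root, starting page, and link handling are simplified assumptions:

import os
import re

DOC_ROOT = "/var/www/html"   # hypothetical document root

def filesystem_discovery(doc_root):
    """Find every page by walking folders and subfolders directly."""
    pages = []
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            if name.endswith((".html", ".htm")):
                pages.append(os.path.relpath(os.path.join(dirpath, name), doc_root))
    return sorted(pages)

def webcrawler_discovery(doc_root, start="index.html"):
    """Find pages only by following links, starting from one page."""
    seen, queue = set(), [start]
    while queue:
        rel = queue.pop()
        if rel in seen:
            continue
        seen.add(rel)
        try:
            with open(os.path.join(doc_root, rel), encoding="utf-8", errors="ignore") as f:
                html = f.read()
        except OSError:
            continue
        # Only root-relative links to .html/.htm files are followed in this sketch.
        for href in re.findall(r'href="(/[^"#?]*\.html?)"', html):
            queue.append(href.lstrip("/"))
    return sorted(seen)

# Pages found by the folder walk but never reached by the link crawl are the
# "orphan" pages this help topic is about.
orphans = set(filesystem_discovery(DOC_ROOT)) - set(webcrawler_discovery(DOC_ROOT))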