
Format of "search.pending.txt" file

When the crawler scans a web page, it extracts all of the links on that page for searching later. These links are stored in the "search.pending.txt" file in the data files folder (typically "searchdata").

The format of the file is one record per line. Each record looks like:

http://Address/File/ RealmName Number
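
As an illustration only, a record in this layout could be split into its three fields roughly as follows. This sketch assumes the fields are separated by whitespace, that the address contains no spaces, and that Number is always the last token; the names PendingRecord and parse_pending_line are hypothetical and are not part of the product.

```python
from typing import NamedTuple


class PendingRecord(NamedTuple):
    """One line of search.pending.txt (hypothetical helper type)."""
    url: str
    realm: str
    number: int


def parse_pending_line(line: str) -> PendingRecord:
    """Split a record into address, realm name, and number.

    Assumption: whitespace separates the fields. The first token is
    taken as the address and the last as Number; anything in between
    is treated as the realm name, so a realm name containing spaces
    is still kept together.
    """
    parts = line.split()
    if len(parts) < 3:
        raise ValueError(f"expected 3 fields, got {len(parts)}: {line!r}")
    url = parts[0]
    number = int(parts[-1])
    realm = " ".join(parts[1:-1])
    return PendingRecord(url, realm, number)


if __name__ == "__main__":
    # Example address and realm name are made up for illustration.
    print(parse_pending_line("http://www.example.com/docs/ MyRealm 0"))
```

Taking the first and last tokens as fixed and joining the rest is a defensive choice; if realm names never contain spaces, a plain three-way split would do the same job.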

This file is alphabetically sorted, and duplicates are removed.

Number takes one of three kinds of value.

If Number is 0, the page has not yet been indexed but should be (for Website Realms, File-Fed Realms, or Open Realms where "Index Entire Site" is used). The crawler will index this page when you click "Maintain Realm" on the admin page.

If Number is 2, the page has been forbidden: at one time the crawler tried to index it and encountered an error. The robot will never attempt this page again during an automatic crawl session. However, you can still index the page by entering it directly in the Add URL form.

If Number is greater than 2, it is the numeric date on which the page was last indexed. During an automatic crawl session, the crawler looks for addresses that have not been updated in a while and re-indexes them. The age at which a page is re-crawled is controlled by the "Crawler: Days Til Refresh" General Setting, which defaults to 30.
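
The meaning of Number can be summarized in a small sketch. It is not the crawler's own code: the function name classify_number and the returned labels are hypothetical, and the sketch assumes the "numeric date" is a Unix timestamp in seconds, which this page does not confirm. The value 1 is not described here, so it is reported as unknown.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def classify_number(number: int,
                    now: Optional[float] = None,
                    days_til_refresh: int = 30) -> str:
    """Return a label describing what a record's Number means.

    Assumption: values greater than 2 are Unix timestamps in seconds.
    days_til_refresh mirrors the "Crawler: Days Til Refresh" setting.
    """
    if number == 0:
        return "pending"      # not yet indexed, but should be
    if number == 2:
        return "forbidden"    # an earlier attempt failed; skipped by auto-crawl
    if number > 2:
        if now is None:
            now = time.time()
        age_days = (now - number) / SECONDS_PER_DAY
        # Older than the refresh window: due for re-indexing.
        return "stale" if age_days > days_til_refresh else "fresh"
    return "unknown"          # e.g. 1 or negative values, not covered here


if __name__ == "__main__":
    for value in (0, 2, 1000000000):
        print(value, classify_number(value))
```

Passing now explicitly makes the age check repeatable, which is convenient when inspecting an old copy of the file.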
