Log of pages that fail to be indexed
When FDSE indexes a large batch of documents, it prints a status header showing how many pages were indexed, how many failed, and how many are still waiting to be indexed, like so:
|Running in automatic mode on batch W. Status: XX documents indexed; YY failed; ZZ waiting to be indexed.|
Users often wonder which documents have failed, and why.
FDSE does not have a clean web-based view of this, but it does log the information to:
Each line of this file contains a URL, followed by the realm name, followed by a status number. All failed URLs have status number "2".
To see which pages failed during your index operation, download that file, open it in a text editor, and look for the lines with status number "2".
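For a long log, stepping through by hand gets tedious, so the filtering can be scripted. The following is a minimal Python sketch, not part of FDSE. It assumes the fields on each line are separated by whitespace, with the URL first and the status number last, and it treats everything in between as the realm name. The text above gives only the field order, not the separator, so check a few lines of your own log before relying on it. The function names are made up for this example.

```python
from collections import defaultdict

FAILED_STATUS = "2"  # per the text above, failed URLs carry status number "2"


def parse_log_line(line):
    """Split one log line into (url, realm, status); return None if it does not fit.

    Assumption (not confirmed by the FDSE help text): fields are separated by
    whitespace, the URL is the first field and the status number is the last.
    Everything in between is taken as the realm name, so realm names that
    contain spaces still work.
    """
    parts = line.split()
    if len(parts) < 3:
        return None
    url, status = parts[0], parts[-1]
    realm = " ".join(parts[1:-1])
    return url, realm, status


def failed_urls(lines):
    """Return {realm: [url, ...]} for every entry whose status is FAILED_STATUS."""
    failures = defaultdict(list)
    for line in lines:
        entry = parse_log_line(line)
        if entry is None:
            continue  # skip blank or malformed lines
        url, realm, status = entry
        if status == FAILED_STATUS:
            failures[realm].append(url)
    return dict(failures)


def print_report(path):
    """Print the failed URLs found in the downloaded log file at `path`."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for realm, urls in failed_urls(handle).items():
            print(f"Realm {realm}: {len(urls)} failed")
            for url in urls:
                print(f"  {url}")
```

To use it, save your downloaded copy of the log under any name and call `print_report` with that path. Grouping by realm makes it easier to retry the failed URLs one realm at a time.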
A page can fail for many reasons: the robots.txt file may exclude it, the FDSE Filter Rules or the FDSE size limitations may exclude it, or the server may return a 404 Not Found error, to name a few. The log file does not store the reason. To find it, return to the FDSE admin page and use the "Add New URL" form to try adding the URL to the realm. That attempt should fail, and the reason for the failure will be written to the status screen.
A more user-friendly web-based error log is on the feature request list.
"Log of pages that fail to be indexed" http://www.xav.com/scripts/search/help/1018.html