How to bulk load URLs
Bulk loading is the practice of programmatically adding many URLs to the index, for cases where adding them by hand through the web-based admin interface would be impractical.
This practice has become easier in more recent versions.
-
For version 2.0.0.0030 and earlier:
You can use the pending pages file "search.pending.txt" to bulk-load a long list of URLs to index. Enter one entry per line: the URL, then the URL-encoded name of the realm, then a Number equal to 0. Using the "Rebuild" command from the admin page will then result in all of the pages being indexed.
The entries added to "search.pending.txt" should look like this:
http://xav.com/index/me My%20Realm%201 0
http://xav.com/index/me/2 My%20Realm%201 0
http://xav.com/index/me/too My%20Realm%201 0
Note that "My%20Realm%201" is just "My Realm 1" which has been URL-encoded. All realm names are URL-encoded in this data file.
Once the "search.pending.txt" file has been updated, use the "Rebuild" or "Maintain" command on the given realm. All of the URLs will be indexed.
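For a long list, the entries can be generated by a script rather than typed. The following is a minimal sketch in Python, not part of the search engine itself. The function name pending_lines is made up for illustration, and the single-space separator between fields is an assumption taken from the sample entries above. It also assumes that Python's standard URL-encoding matches what the data file expects; that holds for the "My Realm 1" example, but has not been checked for realm names with other special characters.

```python
from urllib.parse import quote


def pending_lines(urls, realm_name):
    """Build one pending-file entry per URL (hypothetical helper).

    Each entry is: URL, URL-encoded realm name, and the number 0.
    The single-space separator is an assumption based on the sample entries.
    """
    # safe="" makes quote() encode every reserved character, so the
    # space in "My Realm 1" becomes %20.
    encoded_realm = quote(realm_name, safe="")
    return [f"{url} {encoded_realm} 0" for url in urls]


urls = [
    "http://xav.com/index/me",
    "http://xav.com/index/me/2",
    "http://xav.com/index/me/too",
]

# Print the entries; they could be appended to search.pending.txt instead.
for line in pending_lines(urls, "My Realm 1"):
    print(line)
```

Back up "search.pending.txt" before appending generated entries to it, so that the original can be restored if the format turns out to differ.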
-
For version 2.0.0.0031 and newer:
If you have a long list of URLs that you'd like to bulk load, place them all in a text file, one per line. Use your text editor's Find and Replace function to replace "http://" with "><a href=http://" and add a single ">" at the end of the file. This makes the entries look sufficiently like HTML links that the parser will extract the URLs from them. Before the find-and-replace, the file should look like this:
http://xav.com/index/me
http://xav.com/index/me/2
http://xav.com/index/me/too
After the find-and-replace, it should look like this:
><a href=http://xav.com/index/me
><a href=http://xav.com/index/me/2
><a href=http://xav.com/index/me/too
>
Then create a File-Fed realm with this text file as the Base URL. Use the "Rebuild" command on that newly created File-Fed realm to load all of those URLs.
If your list of URLs is already an HTML file of <a href> links, you can create the File-Fed realm immediately and point its Base URL at that HTML list.
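The find-and-replace step for a plain list of URLs can also be scripted. The following is a minimal sketch in Python that performs the same literal replacement described above; the function name wrap_urls is made up for illustration and is not part of the search engine. Like the manual method, it only handles URLs that begin with "http://".

```python
def wrap_urls(text):
    """Apply the find-and-replace described above (hypothetical helper).

    Replaces every "http://" with "><a href=http://" and adds a single
    ">" at the end, so that each URL resembles an HTML link.
    """
    wrapped = text.replace("http://", "><a href=http://")
    # Make sure the closing ">" lands on its own final line.
    if not wrapped.endswith("\n"):
        wrapped += "\n"
    return wrapped + ">\n"


url_list = (
    "http://xav.com/index/me\n"
    "http://xav.com/index/me/2\n"
    "http://xav.com/index/me/too\n"
)

# Print the converted list; it could be saved to the text file that the
# File-Fed realm uses as its Base URL.
print(wrap_urls(url_list), end="")
```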
Updated 2002-05-15
http://www.xav.com/scripts/search/help/1058.html