Filter Rules: Allowing Only Top-Level Documents
To limit the number of documents in the index and prevent spamming, some FDSE administrators choose to restrict additions to the top-level document of each web site. Thus, http://www.xav.com/ could be added, but neither http://www.xav.com/index.html nor http://www.xav.com/scripts/.
This restriction is created using a custom Filter Rule. Follow these instructions to set it up:
Go to "Admin Page" - "Filter Rules" - "Create New Rule"
Create the new rule with these settings:
Name: "Allow Top-Level"
Enabled: [x] checked
Action: Deny
Analyze: URL
Minimum Occurrences: 1
(_) Apply rule only if...
(*) Always apply rule, unless
Strings: (blank)
Patterns: ^http://([^/]*)/$
Scope: (*) Apply only to these types of realms / [x] Open Realms
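To check which URLs the rule lets through, its logic can be modeled in a few lines of Python. This is a minimal sketch, not FDSE's own implementation. It assumes that "Always apply rule, unless" means "deny every URL that the pattern does not match", and that one match satisfies Minimum Occurrences = 1. The function name `is_denied` is hypothetical.

```python
import re

# The regex from the "Patterns" field above. It uses only syntax that
# behaves the same in Perl-style and Python regular expressions.
TOP_LEVEL = re.compile(r"^http://([^/]*)/$")


def is_denied(url: str) -> bool:
    """Hypothetical model of the rule: Action = Deny, "Always apply rule, unless".

    Assumption: a single match of the pattern (Minimum Occurrences = 1)
    exempts the URL from the Deny action; otherwise it is denied.
    """
    return TOP_LEVEL.search(url) is None


for url in ["http://www.xav.com/",
            "http://www.xav.com/index.html",
            "http://www.xav.com/scripts/"]:
    print(url, "->", "denied" if is_denied(url) else "allowed")
```

Running it reports the first URL as allowed and the other two as denied, which matches the examples at the top of this page. The `[^/]*` group cannot consume a slash, so the only way to match is a host name followed by exactly one "/" at the very end of the URL.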
Once this rule is active, every new addition must have the form "http://host.tld/". If any characters follow that trailing "/", the entire URL is denied. To apply the rule to existing entries, rebuild the realm.
Note: some web sites have starting pages below the web server's top-level document. For example, sites hosted at Geocities start at "http://www.geocities.com/username/". This rule would block such sites.
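The Geocities case can be checked directly against the pattern. The sketch below also tries a hypothetical relaxed pattern, `ONE_LEVEL`, which is not part of the instructions above; it is only an illustration of one way an administrator might admit a single directory level if those sites matter.

```python
import re

# Pattern from the rule above: a host name followed by a single "/".
TOP_LEVEL = re.compile(r"^http://([^/]*)/$")

# Hypothetical relaxed pattern (not from the original instructions): also
# accepts exactly one directory level, such as "/username/".
ONE_LEVEL = re.compile(r"^http://[^/]*/([^/]+/)?$")

geocities = "http://www.geocities.com/username/"
print(bool(TOP_LEVEL.search(geocities)))  # False: the original rule denies it
print(bool(ONE_LEVEL.search(geocities)))  # True: the relaxed pattern allows it
print(bool(ONE_LEVEL.search("http://www.geocities.com/username/page.html")))  # False
```

The trade-off is that the relaxed pattern admits one directory level on every host, so a URL such as http://www.xav.com/scripts/ would also be accepted, weakening the anti-spam restriction.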
http://www.xav.com/scripts/search/help/1099.html