How to prevent your pages from being indexed

FDSE respects the Robots Exclusion Standard. This standard lets site owners prevent whole sites, individual pages, or folders from being indexed by any standards-compliant indexing process.

Website exclusion file

To use the Standard, create a "robots.txt" file in the top-level folder of your site. The format of the file is one or more "User-Agent" headers followed by paths which are forbidden to that agent. Here is an example:

User-Agent: *
Disallow: /secret/

User-Agent: BadGuy
Disallow: /

User-Agent: FDSE
Disallow: /logs/
Disallow: /cgi-bin/

See my robots.txt file for another example, or visit the Standard home page linked below.
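Before publishing a robots.txt file, it can help to check how a standards-compliant parser reads it. The following is a minimal sketch using Python's standard urllib.robotparser module; the helper name load_rules and the agent name "SomeOtherBot" are illustrative only. Note that Python's parser applies the first named section that matches an agent and falls back to the '*' section only when none matches, which may differ from how the FDSE parser combines sections.

```python
from urllib.robotparser import RobotFileParser

# The example file from this page, with a blank line between records.
EXAMPLE = """\
User-Agent: *
Disallow: /secret/

User-Agent: BadGuy
Disallow: /

User-Agent: FDSE
Disallow: /logs/
Disallow: /cgi-bin/
"""


def load_rules(robots_txt):
    """Parse robots.txt text (hypothetical helper, not part of FDSE)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(EXAMPLE)

# Ask whether a given agent may fetch a given path.
for agent, path in [
    ("FDSE", "/logs/access.log"),
    ("BadGuy", "/index.html"),
    ("SomeOtherBot", "/secret/plan.html"),
    ("SomeOtherBot", "/index.html"),
]:
    print(agent, path, rules.can_fetch(agent, path))
```

Only the last of the four checks is allowed: every other path falls under a Disallow line that applies to the agent asking for it.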

The robots.txt file marks sections of a site off-limits based on the User-Agent string. The FDSE crawler's User-Agent string can be customized at "Admin Page" => "General Settings" => "Crawler: User Agent". When parsing a robots.txt file, the parser respects any section whose User-Agent label matches the "Crawler: User Agent" setting, the string 'FDSE', or the string '*'. The parser uses a case-insensitive substring match.
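The matching rule above can be sketched roughly as follows. This is an illustration in Python, not FDSE's actual parser. It assumes that a section label matches when it appears inside the configured "Crawler: User Agent" string or inside 'FDSE' (the direction of the substring test is an assumption), and that the paths from every matching section are combined. The function names and the agent name "MyCrawler/1.0" are hypothetical.

```python
def parse_sections(robots_txt):
    """Split robots.txt text into (labels, disallowed_paths) records."""
    sections = []
    labels, paths, seen_rule = [], [], False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a rule was already read, so a new record starts
                sections.append((labels, paths))
                labels, paths, seen_rule = [], [], False
            labels.append(value)
        elif field == "disallow" and labels:
            seen_rule = True
            if value:  # an empty Disallow forbids nothing
                paths.append(value)
    if labels:
        sections.append((labels, paths))
    return sections


def label_matches(label, crawler_agent):
    """Case-insensitive substring match (direction is an assumption)."""
    label = label.lower()
    if not label:
        return False
    if label == "*":
        return True
    return any(label in name.lower() for name in (crawler_agent, "FDSE"))


def forbidden_paths(robots_txt, crawler_agent):
    """Collect Disallow paths from every section that applies."""
    result = []
    for labels, paths in parse_sections(robots_txt):
        if any(label_matches(label, crawler_agent) for label in labels):
            result.extend(p for p in paths if p not in result)
    return result


EXAMPLE = """\
User-Agent: *
Disallow: /secret/

User-Agent: BadGuy
Disallow: /

User-Agent: FDSE
Disallow: /logs/
Disallow: /cgi-bin/
"""

print(forbidden_paths(EXAMPLE, "MyCrawler/1.0"))
```

With the example file, the '*' and 'FDSE' sections both apply to the crawler, while the 'BadGuy' section is ignored.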

Page-level exclusion tags

Alternatively, you can forbid access to a single document using a meta tag. Include the following in the head section of your HTML source:

<head>
	<title>Search Engine Help</title>
	<meta name="robots" content="none" />
</head>
<body> ...

The value "none" means to not index the file and not follow any links. The value "noindex" means to not index the file, but still extract links. Conversely, the value "nofollow" allows the file to be indexed, but does not allow links to be extracted.
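As a rough illustration of how a crawler might interpret these values, the sketch below reads the robots meta tag with Python's standard html.parser module and turns it into two flags. The names RobotsMetaParser, robots_directives, and page_permissions are hypothetical, and accepting comma-separated values such as "noindex,nofollow" is an assumption based on common usage rather than on FDSE's documented behavior.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Remember the content value of a <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.content = ""

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if tag == "meta" and name == "robots":
            self.content = attributes.get("content") or ""


def robots_directives(content):
    """Return (may_index, may_follow) for a robots meta content value."""
    tokens = {token.strip().lower() for token in content.split(",")}
    may_index = not (tokens & {"none", "noindex"})
    may_follow = not (tokens & {"none", "nofollow"})
    return may_index, may_follow


def page_permissions(html):
    """Parse a page and report what its robots meta tag permits."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return robots_directives(parser.content)


PAGE = """<head>
<title>Search Engine Help</title>
<meta name="robots" content="none" />
</head>
<body> ..."""

print(page_permissions(PAGE))
```

A page without a robots meta tag yields (True, True) in this sketch, meaning it may be indexed and its links may be followed.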


Support for the Robots Exclusion Standard may be disabled by going to "Admin Page" => "General Settings" and setting "Crawler: Rogue" to 1 (checked).

For an alternate way of excluding documents from the search, see How to forbid pages that you don't control.
