Overview of Guardian Filter Rules

The Guardian software handles errors on your site. Different errors are handled in different ways. Guardian decides how to handle each error by running the error information through a set of filter rules. Each rule has a criteria portion (the action), which tells Guardian which types of errors it applies to, and a response or reaction portion, which tells Guardian how to handle that type of error.

Filter rules can be customized by going to the Guardian Admin Page and choosing the "Manage Filter Rules" link.

How filter rules are handled

Guardian processes Filter Rules in the order they appear. It will stop processing rules once it encounters a matching rule that involves displaying a response to the visitor. The "ignore" and "blacklist" reaction types are the only ones that don't involve displaying a response. Those should always be entered at the beginning of the filter rules.

You may list as many filter rules as you wish. Rules are separated by two equals signs "==" appearing on a line by themselves. The format of a Filter Rule is:

==
# comment (optional)
# explanation (optional)
action_type: action_string
reaction_type: reaction_string
==

Lines beginning with the pound sign "#" are treated as comments as of version 2.0.0.0005.
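
As an illustration of the format above, here is a minimal Python sketch of how a rule file could be split into rules. It is not Guardian's own parser; the function name parse_filter_rules and the sample rules are made up for this example, and it assumes each rule holds exactly one action line followed by one reaction line.

```python
def parse_filter_rules(text):
    """Split rule text into a list of (action, reaction) pairs.

    Each element of a pair is a (type, string) tuple. Assumes one action
    line followed by one reaction line per rule (an assumption of this sketch).
    """
    rules, fields = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line == "==":
            # A separator closes the current rule, if it is complete.
            if len(fields) == 2:
                rules.append((fields[0], fields[1]))
            fields = []
        elif line and not line.startswith("#"):
            # Split on the first colon only, so URLs in the string survive.
            kind, _, value = line.partition(":")
            fields.append((kind.strip(), value.strip()))
    if len(fields) == 2:
        # Tolerate a final rule that is not followed by "==".
        rules.append((fields[0], fields[1]))
    return rules


sample = """\
==
# hypothetical rule: no email for one browser family
ua-substring: MSIE
ignore: 1
==
any: *
error-template: file.txt
==
"""
rules = parse_filter_rules(sample)
```

The rules come back in file order, which matters because Guardian processes them in the order they appear.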

Help: Action Types

The "action types" are the part of the rule that is used to identify which errors this rule applies to.

url-string: http://host.tld/path/file.ext

The "action_string" must contain a fully-qualified URL. If the URL involved in the error matches this exactly, then this rule will take effect.

url-substring: substring

The "action_string" must contain a string. If the full URL of the error document contains this string, then this rule will take effect.

Example: "url-substring: scripts" will be valid for error URL "http://xav.com/scripts/no-such-doc.txt".

url-pattern: perl-regex

The "action_string" should contain a valid Perl regular expression. The rule will take effect if the full URL of the error document matches this pattern.

Example: "url-pattern: \d\d-\d\d.txt" will be valid for error URL "http://xav.com/scripts/55-66.txt".

refer-string: http://host.tld/path/file.ext

The "action_string" must contain a fully-qualified URL. If the HTTP_REFERER matches this string exactly, then this rule will take effect.

refer-substring: substring

The "action_string" must contain a string. If the full HTTP_REFERER URL contains this string, then this rule will take effect.

Example: "refer-substring: scripts" will be valid for referring URL "http://xav.com/scripts/no-such-doc.txt".

refer-pattern: perl-regex

The "action_string" should contain a valid Perl regular expression. The rule will take effect if the full HTTP_REFERER URL matches this pattern.

Example: "refer-pattern: \d\d-\d\d.txt" will be valid for referring URL "http://xav.com/scripts/55-66.txt".

ua-string: Netscape 3.0

The "action_string" must contain a full browser name. If the HTTP_USER_AGENT matches this string exactly, then this rule will take effect.

ua-substring: substring

The "action_string" must contain a string. If the full HTTP_USER_AGENT contains this string, then this rule will take effect.

Example: "ua-substring: MSIE" will be valid for visiting browser "Mozilla/4.0 (compatible; MSIE 6.0b)".

ua-pattern: perl-regex

The "action_string" should contain a valid Perl regular expression. The rule will take effect if the full HTTP_USER_AGENT matches this pattern.

Example: "ua-pattern: Mozilla/\d" will be valid for visiting browser "Mozilla/2.0".

error-code: http-response-code

The "action_string" must be an HTTP 3-digit response code, such as "404" for Not Found errors, or "401" for Authorization Required.

Example: "error-code: 500" will capture all 500 Internal Server Error responses, which typically come from script errors.

any: *

This will apply to all requests. It should only be used as the final catch-all Filter Rule.
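
The action types above can be summarized in a short Python sketch. This is only an illustration, not Guardian's code: the function action_matches, the FIELDS table, and the request dictionary keys are all hypothetical, and Python's re module stands in for Perl regular expressions (the two agree on the simple patterns used in the examples).

```python
import re

# Maps the rule prefix to the request field it inspects (key names are illustrative).
FIELDS = {"url": "url", "refer": "referer", "ua": "user_agent"}


def action_matches(action_type, action_string, request):
    """Return True if the rule's action applies to the request dict."""
    if action_type == "any":
        return True
    if action_type == "error-code":
        return str(request.get("status")) == action_string
    prefix, _, mode = action_type.partition("-")
    if prefix not in FIELDS:
        raise ValueError("unknown action type: " + action_type)
    value = request.get(FIELDS[prefix], "")
    if mode == "string":
        return value == action_string          # exact match
    if mode == "substring":
        return action_string in value          # contained anywhere
    if mode == "pattern":
        # Unanchored search, as in the url-pattern example above.
        return re.search(action_string, value) is not None
    raise ValueError("unknown action type: " + action_type)


request = {
    "url": "http://xav.com/scripts/55-66.txt",
    "referer": "",
    "user_agent": "Mozilla/4.0 (compatible; MSIE 6.0b)",
    "status": 404,
}
matched = action_matches("url-pattern", r"\d\d-\d\d.txt", request)
```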

Help: Reaction Types

The "reaction type" is the part of the rule that tells how an error should be handled.

ignore: ignore-request-value

This simply switches the ignore_request parameter to the value of the reaction string. For ignore_request values of 1 or 3, no email is sent. For values of 2 or 3, the request is not logged to a file. The remaining Filter Rules will still be processed in order to find the first applicable Filter Rule that involves a response for the visitor. If multiple rules apply to a single request, the largest value will be used.

Ignore-Request Value   Logged to file?   Email sent?
0                      Yes               Yes
1                      Yes               No
2                      No                Yes
3                      No                No
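
The effect of the ignore_request value can be sketched in a few lines of Python. This is an illustration under the reading given above (the largest value among matching rules wins); the function name resolve_ignore is hypothetical.

```python
def resolve_ignore(values):
    """Combine ignore_request values from all matching "ignore" rules.

    Returns (logged_to_file, email_sent) for the largest value seen,
    or the defaults for value 0 when no ignore rule matched.
    """
    level = max(values, default=0)
    email_sent = level not in (1, 3)       # 1 or 3: no email is sent
    logged_to_file = level not in (2, 3)   # 2 or 3: not logged to a file
    return logged_to_file, email_sent


outcome = resolve_ignore([1, 2])
```

Note that under this reading, rules with values 1 and 2 matching the same request give the result for 2 (email sent, not logged), not the result for 3. To suppress both, use a rule with value 3.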


error-template: file.txt

Displays the error template file "file.txt". The contents of "file.txt" will be returned inside the larger "template.txt" template, inserted as the %specific_message% variable. All templates can be reviewed and edited by going to Admin Page => Manage Templates.

redirect: http://www.example.com/

The user is redirected to the URL string. An HTML META-refresh is used, rather than an HTTP 300-series response. The visitor will see a brief "Moved" message. The reaction string can be an absolute or relative URL. An absolute URL is recommended.

This reaction type is not available in Freeware mode.

http-redirect: http://www.example.com/

Same as "redirect", but uses a low-level HTTP 301 "Moved Permanently" response instead of the HTML meta refresh. The visitor does not see any "moved" message, and instead just sees the final document. The reaction string can be an absolute or relative URL. An absolute URL is recommended.

The 301 response is cacheable by browsers. Search engine crawlers may interpret this response as meaning that the requested URL no longer exists. As such, the 301 response is ideal for 404 Not Found errors. It is not ideal for 500 Internal Server Errors, for which the error condition may change from request to request.

This reaction type is not supported on the Zeus web server. The "redirect" method will be used in its place.

This reaction type is not available in Freeware mode.

http-redirect-temp: http://www.example.com/

Same as "redirect", but uses a low-level HTTP 302 "Found" response instead of the HTML meta refresh. The visitor does not see any "moved" message, and instead just sees the final document. The reaction string can be an absolute or relative URL. An absolute URL is recommended.

The 302 response is not cacheable. This response is ideal for responding to temporary error conditions, such as a 500 server error.

This reaction type is not supported on the Zeus web server. The "redirect" method will be used in its place.

This reaction type is not available in Freeware mode.
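
To make the difference between the three redirect reactions concrete, here is a hedged Python sketch of the kind of CGI output each one implies. It is not Guardian's actual output: the function name build_redirect, the one-second refresh delay, and the wording of the "Moved" page are assumptions made for this example.

```python
from html import escape


def build_redirect(reaction_type, url):
    """Return CGI-style response text for one of the redirect reactions."""
    if reaction_type == "http-redirect":
        # Permanent: browsers may cache it.
        return "Status: 301 Moved Permanently\nLocation: %s\n\n" % url
    if reaction_type == "http-redirect-temp":
        # Temporary: suited to conditions that may clear up, such as a 500.
        return "Status: 302 Found\nLocation: %s\n\n" % url
    if reaction_type == "redirect":
        # HTML meta refresh: the visitor briefly sees a "Moved" page.
        safe = escape(url, quote=True)
        body = (
            '<html><head><meta http-equiv="refresh" content="1; url=%s"></head>'
            '<body>Moved: <a href="%s">%s</a></body></html>\n' % (safe, safe, safe)
        )
        return "Content-Type: text/html\n\n" + body
    raise ValueError("not a redirect reaction: " + reaction_type)


response = build_redirect("http-redirect-temp", "http://www.example.com/")
```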

replace: /path/to/file.ext

The replace action will cancel the normal response code (like "404 Not Found") and replace it with a valid response (like "200 OK"). It will then send the contents of the file given in the reaction string. It will return a default Content-Type header of "text/html", or a more specific type based on the value of the %MimeType hash, defined in the source of this script. This is useful for handling requests for "favicon.ico" files, in which the web browser doesn't follow redirects. This can also be used to respond with a specific error message file without being forced to use the "template.txt" wrapper (compare with the "error-template" reaction type).

This reaction type is not supported on the Zeus web server.

This reaction type is not available in Freeware mode.

dos: time-value

The DOS response is a very limited denial-of-service response. It should only be used in reaction to requests that are likely to be hostile and automated (for example, probing of the /vti_ folders to check for unprotected Front Page extensions). The script will return an artificially high Content-Length, and will then spoon-feed content bytes back to the client at a rate of one byte per second, for whatever time value is listed. For many simple HTTP clients based on a single-threaded or fixed-threadpool model, this will hang all of their requests and render the attack/probe inoperable. For example, assume this script is configured with the "dos: 3600" response to "url-substring: /vti_". If an aspiring hacker opens the Front Page authoring tool and tries to connect to the site, the Front Page program will hang completely for one hour.

This response type does no damage to any software, computer or network, other than to hang the remote thread making the seemingly-hostile request. The response uses minimal bandwidth and processing power on both the client and server.

This reaction type will not be triggered if the HTTP_VIA environment variable is defined. In practice, this variable is defined only for requests that arrive via a proxy server. Guardian does not wish to return invalid data to a proxy server since the proxy is generally blameless.

Some web hosting providers and legal departments discourage the use of this reaction type.

This reaction type is not supported on the Zeus web server.

This reaction type is not available in Freeware mode.

blacklist: /path/to/.htaccess/file

Guardian will add a "deny" entry for the visitor IP address in the .htaccess file. For example:

# Added by Guardian Mon Oct  1 23:04:17 2001 (url-pattern:(cmd.exe|root.exe))
deny from 209.208.162.142

It includes a timestamp of when it added the rule, and a copy of the "action:action-string" rule that triggered it. This can be very helpful in preventing hostile probes from scanning your site.

This reaction type will not be triggered if the HTTP_VIA environment variable is defined. In practice, this variable is defined only for requests that arrive via a proxy server. Guardian does not wish to deny the IP address of a proxy server since that would deny everyone who routes through it, not just the offending visitor.
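
The entry format and the proxy check can be sketched as follows. This is an illustration only, not Guardian's code: the function name blacklist_entry and its parameters are hypothetical, and the IP validation step is an addition of this sketch.

```python
import ipaddress
import time


def blacklist_entry(ip, action_type, action_string, env, when=None):
    """Return the text to append to .htaccess, or None for proxied requests."""
    if "HTTP_VIA" in env:
        # The request came through a proxy; do not deny the proxy's address.
        return None
    ipaddress.ip_address(ip)  # raises ValueError if ip is not a valid address
    stamp = time.asctime(when if when is not None else time.localtime())
    comment = "# Added by Guardian %s (%s:%s)" % (stamp, action_type, action_string)
    return comment + "\ndeny from " + ip + "\n"


# Fixed time tuple so the output matches the example above.
when = (2001, 10, 1, 23, 4, 17, 0, 274, 0)
entry = blacklist_entry("209.208.162.142", "url-pattern", "(cmd.exe|root.exe)", {}, when)
```

A real implementation would also need to lock the .htaccess file while appending, since several error requests can arrive at once.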

The syntax of the .htaccess "deny from" command is supported on Apache 1.2 and later, which should cover all Apache releases since 1998. Do not use the blacklist reaction with earlier versions of Apache, because then the added lines in the file will not be recognized and Apache may stop serving all content on your site.

Once a client has been blacklisted, all further requests by the client will return the "403 Access denied" response. If you have an "ErrorDocument 403" handler set up, it will be forced to deal with all of these denied requests. Since blacklisting is often done to get rid of probes that cause hundreds of server errors in a very short time (and a resulting email flood from Guardian), it is best to not configure a 403 handler when using the blacklist response. Allow 403 errors to be handled quickly by Apache itself, and only offload 401, 404, and 500 errors to Guardian. Here is an example .htaccess file that is set up this way:

ErrorDocument 403 "Error 403 / Access Denied.
ErrorDocument 404 /guardian/ag.pl
ErrorDocument 500 /guardian/ag.pl

This reaction type is not available in Freeware mode.

Special Rules

These rules are used to automatically resolve certain types of common errors. In order to work, you must have accurately configured the "base URL" and "base folder" settings on the "Manage System Settings" page.

Syntax

==
case-match: *
http-redirect: %new_url%?fix
==

Description

This rule applies to any 404 Not Found error in which the URL that is not found contains some uppercase characters. If the URL as it stands generates the Not Found error, but would generate a normal working request if it were entirely lowercase, then this rule will apply. A copy of the new URL in lowercase is stored in the %new_url% variable. Typically this rule is used in conjunction with the "http-redirect: %new_url%?fix" response. The query string "?fix" is appended because this rule only works on URLs without a query string. By forcing a query string on the redirect, you help protect against looping.

This rule cannot be applied until you first go to "Admin Page" => "Manage System Settings" and enter valid values for "Base URL" and "Base Folder". Those base variables are needed so that Guardian can test for physical file existence based on URL path information.

This rule is particularly helpful when people migrate from a Windows server (which has a case-insensitive file system) to a Unix server. All sorts of mixed-case local URLs can build up on the Windows server and it can take a long time to fix them all.
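
The case-match test described above can be sketched in Python. This is a guess at the logic, not Guardian's implementation: the function name case_match, the exists parameter, and the assumption that the Base URL ends with a slash are all specific to this example, and only the part of the URL after the Base URL is lowercased.

```python
import os
from urllib.parse import urlsplit


def case_match(url, base_url, base_folder, exists=os.path.exists):
    """Return the lowercase URL if it maps to an existing file, else None."""
    if urlsplit(url).query or not url.startswith(base_url):
        return None  # the rule only works on URLs without a query string
    rel = url[len(base_url):].lstrip("/")
    candidate = rel.lower()
    if candidate == rel:
        return None  # no uppercase characters, nothing to fix
    # Map the URL path onto the file system using the Base Folder setting.
    path = os.path.join(base_folder, *candidate.split("/"))
    return base_url + candidate if exists(path) else None


# Stand-in for the file system: one known file under a hypothetical base folder.
known = {os.path.join("/var/www", "docs", "index.html")}
new_url = case_match(
    "http://www.example.com/Docs/Index.HTML",
    "http://www.example.com/",
    "/var/www",
    exists=known.__contains__,
)
```

A redirect to new_url plus "?fix" will not trigger the rule again, because the query-string check on the first line returns None.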

Syntax

==
trailer: *
http-redirect: %new_url%?fix
==

Description

This rule applies to any 404 Not Found error in which the URL that is not found contains trailing characters, like whitespace, newlines, quote marks, or periods. If the URL as it stands generates the Not Found error, but would generate a normal working request without that trailing information, then this rule will apply. A copy of the new URL without the trailing information is stored in the %new_url% variable. Typically this rule is used in conjunction with the "http-redirect: %new_url%?fix" response. The query string "?fix" is appended because this rule only works on URLs without a query string. By forcing a query string on the redirect, you help protect against looping.

This rule cannot be applied until you first go to "Admin Page" => "Manage System Settings" and enter valid values for "Base URL" and "Base Folder". Those base variables are needed so that Guardian can test for physical file existence based on URL path information.
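
A hedged Python sketch of the trailer test follows. It is not Guardian's code: the function name trailer_fix and the file_exists callable are hypothetical, and the exact set of characters Guardian strips is not documented beyond the list above, so the set used here is an assumption.

```python
def trailer_fix(url, file_exists):
    """Return the URL minus trailing junk if that URL resolves, else None.

    file_exists is a caller-supplied check (hypothetical) that maps a URL
    to a file via the Base URL and Base Folder settings.
    """
    if "?" in url:
        return None  # the rule only works on URLs without a query string
    # Assumed trailer characters: whitespace, newlines, quote marks, periods.
    trimmed = url.rstrip(" \t\r\n\"'.")
    if trimmed == url:
        return None  # nothing was trailing
    return trimmed if file_exists(trimmed) else None


known = {"http://www.example.com/page.html"}
new_url = trailer_fix('http://www.example.com/page.html."', known.__contains__)
```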


Regular Expression Substitutions

Three types of filter rules involve pattern matching: url-pattern, refer-pattern, and ua-pattern.

If the pattern contains capture groups (sub-expressions enclosed in parentheses), then a Perl-style substitution will be performed on the reaction-string of the rule, replacing "$1" with the text matched by the first group, "$2" with the text matched by the second group, and so on.

For example, consider the rule:

==
url-pattern: /foo/(.*)/bar/
http-redirect: http://$1/
==

This rule will look for any error URL of the pattern /foo/SOMETHING/bar/ and will then redirect to http://SOMETHING/.

Note: the $1 through $n substitutions are done only to the extent that $n is defined by the regular expression in the pattern filter rule. If there is a regular expression with three parentheses-enclosed blocks, then only $1, $2, and $3 will be defined. If you have $4 present in your reaction string, then it will remain as literal "$4", without being processed.
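
The substitution behavior can be illustrated with a short Python sketch. It is not Guardian's Perl code: the function name expand_reaction is hypothetical, and Python's re module is used in place of Perl regular expressions.

```python
import re


def expand_reaction(pattern, url, reaction_string):
    """Replace $1..$n in the reaction string with groups captured from url."""
    match = re.search(pattern, url)
    if match is None:
        return None  # the rule does not apply to this URL

    def fill(ref):
        index = int(ref.group(1))
        # Only groups the pattern defines are substituted; others stay literal.
        if 1 <= index <= match.re.groups:
            return match.group(index) or ""
        return ref.group(0)

    return re.sub(r"\$(\d+)", fill, reaction_string)


target = expand_reaction(
    r"/foo/(.*)/bar/",
    "http://xav.com/foo/www.example.com/bar/",
    "http://$1/",
)
```

Here target is the redirect destination built from the captured text. A reference such as "$4" in a rule whose pattern has only three groups is passed through unchanged.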

Thanks to Brian Renken for this cool feature.



    "Overview of Guardian Filter Rules"
    http://www.xav.com/scripts/guardian/help/1006.html