Skipping pages which use META refresh
FDSE attempts to skip HTML documents which quickly redirect to other files using a META refresh. An example of such code is:
<meta http-equiv="refresh" content="5;url=http://www.yahoo.com/" />
The above code tells the browser to automatically navigate to Yahoo! after five seconds.
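To make the two parts of the content attribute concrete, here is a minimal Python sketch that splits such a value into a delay and a target URL. The function name parse_refresh_content is hypothetical and this is not FDSE's own parsing code; it only illustrates the "seconds;url=target" layout shown above.

```python
import re

def parse_refresh_content(content):
    """Split a META refresh content value into (delay_seconds, url_or_None).

    Hypothetical helper for illustration only; FDSE's real parser may differ.
    Returns None when the value does not start with a whole number of seconds.
    """
    match = re.match(
        r'\s*(\d+)\s*(?:[;,]\s*(?:url\s*=\s*)?(.*))?$',
        content,
        re.IGNORECASE | re.DOTALL,
    )
    if not match:
        return None
    delay = int(match.group(1))
    # Drop surrounding whitespace and any quotes around the URL.
    url = (match.group(2) or "").strip().strip("'\"")
    return delay, (url or None)

print(parse_refresh_content("5;url=http://www.yahoo.com/"))
```

A value with no URL part, such as "10", simply reloads the same page after the delay; the sketch reports that case with a URL of None.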
The exact behavior of FDSE depends on its crawling mode:
- When using the web crawler mode to index a web site, FDSE will treat the META refresh as a redirect. If the file redirects to another URL which also lies on the same site, then FDSE will try to index that file. If the URL lies on a different site, then it will be skipped.
- When using the web crawler in a way that does not limit the URL namespace -- such as an open realm or a file-fed realm -- FDSE will always follow the redirect to whatever URL is returned.
- When using the file system crawler, the document will simply be skipped.
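The three cases above can be sketched as a small decision function. This is an illustrative Python sketch, not FDSE's implementation: the mode names "site", "open" and "filesystem", the function name refresh_action, and the use of a host-name comparison to decide whether a URL "lies on the same site" are all assumptions made for the example.

```python
from urllib.parse import urljoin, urlsplit

def refresh_action(mode, page_url, target):
    """Decide what to do with a quickly refreshing page.

    Returns ("index", url) when the redirect target should be indexed,
    or ("skip", None) when nothing should be indexed.
    Mode names are hypothetical labels for the three cases described above.
    """
    if mode == "filesystem":
        # File system crawler: the document is simply skipped.
        return ("skip", None)

    # Resolve relative targets against the page that contained the tag.
    resolved = urljoin(page_url, target)

    if mode == "open":
        # Open or file-fed realm: always follow the redirect.
        return ("index", resolved)

    if mode == "site":
        # Assumption: "same site" is approximated by comparing host names.
        same_site = (
            urlsplit(resolved).netloc.lower()
            == urlsplit(page_url).netloc.lower()
        )
        return ("index", resolved) if same_site else ("skip", None)

    raise ValueError("unknown mode: %r" % (mode,))

print(refresh_action("site", "http://example.com/a/index.html", "b.html"))
print(refresh_action("site", "http://example.com/", "http://www.yahoo.com/"))
```

The example.com addresses are placeholders. A real site realm may define its boundary by a base URL prefix rather than by host name, in which case the comparison in the "site" branch would need to change.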
The special refresh behavior applies only to pages which quickly refresh. FDSE applies its special refresh logic to any document with a refresh tag whose time value is less than the General Setting "Refresh Time Delay". The default value for this setting is 10 seconds. With the default, documents which refresh in 9 seconds or less will have their refresh followed; documents which refresh in 10 or more seconds will be treated as normal content. Setting the "Refresh Time Delay" to zero will disable FDSE's special refresh behavior.
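The threshold rule can be written as a short Python sketch. The function name is_quick_refresh and its parameter names are hypothetical; only the comparison itself (strictly less than the setting, with zero meaning disabled) comes from the description above.

```python
def is_quick_refresh(delay_seconds, refresh_time_delay=10):
    """Return True when the special refresh logic would apply.

    refresh_time_delay stands in for the "Refresh Time Delay" setting;
    a value of zero disables the special behavior entirely.
    """
    if refresh_time_delay <= 0:
        return False
    return delay_seconds < refresh_time_delay

# With the default setting of 10 seconds:
print(is_quick_refresh(9))    # refresh is followed
print(is_quick_refresh(10))   # page is treated as normal content
```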
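Bugs: FDSE scans the first 4096 bytes of the file for any string which approximates a refresh META tag. The scan does not take the surrounding markup into account, so it will also match a tag which most browsers would never act on. In particular, it will match constructs such as: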
<noscript>
<meta http-equiv="refresh" content="0;url=need_javascript.html" />
</noscript>
If this causes a problem for you, you should disable the special refresh behavior by setting "Refresh Time Delay" to zero.
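The following Python sketch shows why a plain pattern scan behaves this way. It is not FDSE's code: the pattern, the constant SCAN_LIMIT and the function find_refresh_delay are assumptions chosen to mimic the behavior described above, namely a loose string match limited to the first 4096 bytes.

```python
import re

SCAN_LIMIT = 4096  # only this many leading bytes are examined

# A loose pattern that "approximates" a refresh META tag. It knows nothing
# about HTML structure, so it cannot tell that a tag sits inside <noscript>.
REFRESH_PATTERN = re.compile(
    rb'<meta\s+http-equiv\s*=\s*["\']?refresh["\']?\s+content\s*=\s*["\']?\s*(\d+)',
    re.IGNORECASE,
)

def find_refresh_delay(document_bytes):
    """Return the refresh delay found in the scanned prefix, or None."""
    match = REFRESH_PATTERN.search(document_bytes[:SCAN_LIMIT])
    return int(match.group(1)) if match else None

page = (
    b"<html><head><noscript>\n"
    b'<meta http-equiv="refresh" content="0;url=need_javascript.html" />\n'
    b"</noscript></head><body>Real content</body></html>"
)
print(find_refresh_delay(page))
```

Because the match is purely textual, the tag inside the noscript element is reported with a delay of zero, and the page would be handled as a quick refresh even though browsers with JavaScript enabled would stay on it. A tag that first appears after the 4096-byte prefix is not seen at all.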
History: the General Setting "Refresh Time Delay" was introduced in FDSE version 2.0.0.0071 with a default value of 10 seconds. Prior to that version, the script logic was the same but the time delay was hardcoded at 10 seconds and could not be changed.
http://www.xav.com/scripts/search/help/1196.html