Support for META headers
The Fluid Dynamics Search Engine will recognize META headers of the following formats:
<meta name="description" content="Description of document." />
<meta name="description" content="Description of document.">
<meta http-equiv="description" content="Description of document." />
<meta http-equiv="description" content="Description of document.">
FDSE accepts both "http-equiv" and "name" as the primary attribute name.
If you are adding META tags to your pages, use the following standard: the "http-equiv" attribute name is used only for META tags associated with an HTTP-level response header, including "last-modified", "expires", and "content-type". All other META tags - including "description", "keywords", "robots", and "fdse-index-as" - should use "name".
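As a rough illustration of this convention, the following Python sketch builds a META tag and picks the attribute name from the header's name. The helper `meta_tag` and the set `HTTP_EQUIV_NAMES` are hypothetical and not part of FDSE. The set holds only the three HTTP-level headers named above, and the real list may be longer.

```python
from html import escape

# Assumption: only the three HTTP-level headers named in the text.
HTTP_EQUIV_NAMES = {"last-modified", "expires", "content-type"}

def meta_tag(name, content):
    """Build a META tag, using http-equiv only for HTTP-level headers."""
    key = name.strip().lower()
    attr = "http-equiv" if key in HTTP_EQUIV_NAMES else "name"
    # Double-quote both values; escape() turns any embedded " into &quot;.
    return '<meta {}="{}" content="{}" />'.format(attr, escape(key), escape(content))

print(meta_tag("description", "Description of document."))
print(meta_tag("expires", "0"))
```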
Attribute values must be single- or double-quoted, with one exception: a value with no internal whitespace may be left unquoted. Double quotes are recommended.
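The accepted formats can be approximated with a single pattern match. The sketch below is written in Python for illustration only; it is not FDSE's own Perl code, and the names `META_RE` and `parse_meta` are hypothetical. It assumes the primary attribute comes first and is followed directly by "content", as in the four formats listed above.

```python
import re

# One attribute value: double-quoted, single-quoted, or unquoted
# (unquoted values cannot contain whitespace).
_VALUE = r'''(?:"([^"]*)"|'([^']*)'|([^\s"'>]+?))'''

# Hypothetical pattern, not taken from FDSE: "name" or "http-equiv",
# then "content", with an optional trailing slash before ">".
META_RE = re.compile(
    r'<meta\s+(?:name|http-equiv)\s*=\s*' + _VALUE
    + r'\s+content\s*=\s*' + _VALUE + r'\s*/?>',
    re.IGNORECASE,
)

def parse_meta(html):
    """Return (name, content) pairs in document order."""
    pairs = []
    for m in META_RE.finditer(html):
        # Exactly one of the three alternatives matched for each value.
        name = next(g for g in m.group(1, 2, 3) if g is not None)
        content = next(g for g in m.group(4, 5, 6) if g is not None)
        pairs.append((name.lower(), content))
    return pairs

sample = """<meta name="description" content="Description of document." />
<meta http-equiv='last-modified' content='Sat, 01 Jan 2000 00:00:00 GMT'>
<meta name=robots content=noindex>"""
for pair in parse_meta(sample):
    print(pair)
```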
META tags recognized
FDSE uses the "description" and "keywords" META tags as sources of additional keywords and phrases. It uses the "description" and "last-modified" values when displaying the search result listings. Other META tags are used mostly for low-level content filtering.
- description
The document description, for use in the search results display. If missing, FDSE will extract the first few words from the document.
Words present in this META header can be given extra weight (see the General Setting "Multiplier: Description").
- keywords
The keywords for the document. If missing, FDSE will leave the keywords attribute blank.
Words present in this META header can be given extra weight (see the General Setting "Multiplier: Keywords").
- last-modified
The last modified time of the file. FDSE will use this value if present, overriding the value from the "Last-Modified" HTTP header or the value from the stat() function. See Calculating last-modified time for more information and for supported time formats.
- robots
The robots exclusion tag. FDSE obeys "none", "noindex", and "nofollow".
See How to prevent your pages from being indexed for a full description of the robots exclusion standard.
- pics
The header for the Platform for Internet Content Selection. If PICS-based filtering is enabled (see "Admin Page" => "Filter Rules" => "PICS"), then the RSACi and SafeSurf implementations will be followed. See Filter Rules: Filtering based on PICS header.
- refresh
The META refresh header for redirecting to another file. If this header is present, and the time for the redirect is less than 10 seconds, then FDSE will follow the redirect and skip the current document.
- fdse-index-as
Proprietary tag. See Support for FDSE-Index-As META header.
FDSE ignores the "content-type" header and its associated charset value. This is a limitation of FDSE.
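The robots and refresh rules in the list above can be sketched as two small decision functions. This is an illustrative Python sketch, not FDSE's implementation, and `robots_flags` and `refresh_target` are hypothetical names. It assumes that "none" carries its conventional meaning of "noindex" plus "nofollow", and that the refresh value uses the common "seconds; url=target" syntax. How FDSE treats a refresh header with no URL is not stated, so the sketch simply returns nothing in that case.

```python
import re

def robots_flags(content):
    """Map a robots META value to (index_allowed, follow_allowed)."""
    tokens = {t.strip().lower() for t in content.split(",")}
    # Assumption: "none" is shorthand for "noindex, nofollow".
    noindex = "noindex" in tokens or "none" in tokens
    nofollow = "nofollow" in tokens or "none" in tokens
    return (not noindex, not nofollow)

# Assumed refresh syntax: "<seconds>; url=<target>".
REFRESH_RE = re.compile(r'^\s*(\d+)\s*(?:[;,]\s*url\s*=\s*(\S+))?', re.IGNORECASE)

def refresh_target(content, max_delay=10):
    """Return the redirect URL if the delay is under max_delay seconds, else None."""
    m = REFRESH_RE.match(content)
    if not m or m.group(2) is None:
        return None
    delay = int(m.group(1))
    url = m.group(2).strip("'\"")
    # The text says redirects of less than 10 seconds are followed.
    return url if delay < max_delay else None

print(robots_flags("noindex, follow"))
print(refresh_target("0; url=http://example.com/new.html"))
print(refresh_target("30; url=http://example.com/new.html"))
```

When `refresh_target` returns a URL, the crawler in this sketch would fetch that URL and skip the current document; a return of None means the current document is indexed as usual.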
Limitations in recognizing tags
- FDSE will only search the first 4096 bytes of the file for META tags. The entire tag must be completed within those first bytes. (In FDSE version 2.0.0.0032 and earlier, it only looked in the first 1024 bytes.)
- META tags are parsed based on a raw pattern match within the first 4096 bytes. There is not a complete parse of the HTML tree. Because of this, META tags which are commented out will still be matched. META tags which contain odd attributes may not be parsed correctly.
- A META tag with an "fdse-" prefix will override the META tag of the same name without the prefix. In the following example, FDSE will recognize "abc abc" as the document description.
<meta name="description" content="123 123" />
<meta name="fdse-description" content="abc abc" /> <-- override wins
- When there are multiple META tags with the same name, only the first tag will be used. In the following example, FDSE will recognize only "marshall mathers" as the document keywords.
<meta name="keywords" content="marshall mathers" /> <-- first valid wins
<meta name="keywords" content="slim shady" />
<meta name="keywords" content="eminem" />
- There is no support for extended numbered META tags or for language-specific META tags. In the following example, FDSE will still only recognize "marshall mathers" as the document keywords.
<meta name="keywords" content="marshall mathers" /> <-- first valid wins
<meta name="keywords2" content="slim shady" />
<meta name="keywords" lang="de" content="eminem" />
The HTML 4.0 specification suggests that authors use multiple META headers with the same name but separate lang="XX" attributes for language-specific keywords and descriptions. FDSE will ignore the lang="XX" attributes and simply extract the first matching tag.
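The limitations above (a 4096-byte window, a raw pattern match, first tag wins, and the "fdse-" override) can be modeled in a few lines. The following Python sketch is only an approximation for illustration; it is not the parse_meta_header subroutine, and `SCAN_LIMIT`, `effective_meta`, and `lookup` are hypothetical names. The pattern is an assumption as well, and it requires a double-quoted content value to stay short.

```python
import re

SCAN_LIMIT = 4096  # bytes searched for META tags, per the text above

# Assumed pattern: raw match only, so tags inside HTML comments also match.
META_RE = re.compile(
    r'<meta\s+(?:name|http-equiv)\s*=\s*["\']?([\w-]+)["\']?'
    r'[^>]*?\bcontent\s*=\s*"([^"]*)"',
    re.IGNORECASE,
)

def effective_meta(raw):
    """Collect META values from the first SCAN_LIMIT bytes; first tag wins."""
    head = raw[:SCAN_LIMIT].decode("latin-1")
    found = {}
    for name, content in META_RE.findall(head):
        # setdefault keeps the first value seen and ignores later repeats.
        found.setdefault(name.lower(), content)
    return found

def lookup(found, name):
    """Prefer the "fdse-" prefixed tag over the plain one."""
    return found.get("fdse-" + name, found.get(name))

page = b"""<meta name="keywords" content="marshall mathers" />
<meta name="keywords2" content="slim shady" />
<meta name="keywords" lang="de" content="eminem" />
<meta name="description" content="123 123" />
<meta name="fdse-description" content="abc abc" />"""

found = effective_meta(page)
print(lookup(found, "keywords"))
print(lookup(found, "description"))
```

A tag that starts near the end of the window and finishes after byte 4096 is never seen by this sketch, which mirrors the requirement that the entire tag be completed within the scanned bytes.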
Advanced: Custom coding
All extraction of META headers is done using the parse_meta_header subroutine in library file search/searchmods/common_parse_page.pl. If you wish to change FDSE's support for META headers, you can edit that subroutine.
"Support for META headers"
http://www.xav.com/scripts/search/help/1151.html