Sorting Results: How relevance is calculated
Search results are sorted by their relevance. Relevance is calculated using a three-step process:
1. Every time a search term is found in the web page text, one point is added to the web page's relevance. Every time a search term is found in the title, the value of the "Multiplier: Title" General Setting is added to the relevance. Similar additions are made for search terms in the META Keywords, META Description, and URL, as per the "Multiplier" General Settings for those attributes.
2. All the Filter Rules with action "Promote" are checked. For each rule that applies to this web page, the current relevance is multiplied by the "promote multiplier" associated with that rule. The combined multiplier for a page is limited to a maximum of 99x.
3. The search results are sorted according to this final relevance value.
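The calculation can be sketched in Python as follows. This is only an illustration of the process described above, not FDSE's actual code: the function names, parameter names, and the sample pages "page-a" and "page-b" are assumptions made for this example.

```python
def relevance(text_hits, field_hits, field_multipliers, promote_multipliers,
              max_multiplier=99):
    """Hypothetical sketch of the relevance calculation for one page."""
    # One point per occurrence of a search term in the page text.
    score = text_hits
    # Add the configured "Multiplier" value for each occurrence in a
    # special field (title, keywords, description, url).
    for field, hits in field_hits.items():
        score += hits * field_multipliers.get(field, 0)
    # Combine the multipliers of all "Promote" rules that apply,
    # then cap the combined multiplier at 99.
    combined = 1
    for multiplier in promote_multipliers:
        combined *= multiplier
    combined = min(combined, max_multiplier)
    return score * combined


def rank(scores):
    """Return page names ordered by final relevance, highest first."""
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Made-up sample pages, for illustration only.
scores = {
    "page-a": relevance(6, {}, {}, []),
    "page-b": relevance(4, {"title": 1}, {"title": 10}, [2]),
}
ranking = rank(scores)
```

Here "page-b" has fewer matches in its text, but the title match and the promote rule lift it above "page-a".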
Example: a visitor searches for "perl scripts".
- Web page http://www.xav.com/contact.html contains "scripts" 6 times.
- None of the occurrences are in the title, description, or keywords
- No "Promote"-type Filter Rules apply to this file
- Final relevance: 6
Example: a visitor searches for "perl scripts".
- Web page http://www.xav.com/ contains "perl" twice and "scripts" eight times. Thus, the page begins with a base relevance value of 10.
- "perl" occurs once in the META description, and "scripts" appears there twice. The administrator has configured "Multiplier: Description = 25". The search engine adds 25 to the relevance value for each instance of each term, so relevance becomes 10 + 25 (perl) + 25 (scripts) + 25 (scripts) = 85.
- The Filter Rules are parsed. The rule "Promote Sites" has been configured to give a 20x multiplier to any site with "xav.com" in the hostname.
- Final relevance: 85 * 20 = 1700. A normal web page would have to include more than 1700 instances of "perl" or "scripts" before it could appear higher in the search results than http://www.xav.com/.
Example: a visitor searches for "president".
- Web page http://www.whitehouse.gov/ contains this term 12 times, so the base relevance is 12.
- No multipliers are defined.
- Promote rules are parsed: the administrator has configured a 20x multiplier on all ".gov" hostnames, and a 20x multiplier for any document with "george w. bush" in it. The total multiplier becomes 20 * 20 = 400. The search engine has a limit of 99 as the maximum multiplier for any document, so the 400 multiplier is reduced to 99.
- Final relevance: 12 * 99 = 1188
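The arithmetic in the three examples above can be checked with a few lines of Python. The helper name capped_multiplier and the constant MAX_MULTIPLIER are made up for this sketch; only the numbers come from the examples.

```python
MAX_MULTIPLIER = 99  # limit on the combined promote multiplier


def capped_multiplier(promote_multipliers):
    """Multiply all applicable promote multipliers, then apply the cap."""
    total = 1
    for multiplier in promote_multipliers:
        total *= multiplier
    return min(total, MAX_MULTIPLIER)


# contact.html: 6 matches in the text, no multipliers, no promote rules.
example_1 = 6 * capped_multiplier([])

# http://www.xav.com/: base 10, three description matches at 25 points
# each, and one 20x promote rule.
example_2 = (10 + 25 * 3) * capped_multiplier([20])

# http://www.whitehouse.gov/: base 12, two 20x promote rules
# (20 * 20 = 400, capped to 99).
example_3 = 12 * capped_multiplier([20, 20])

print(example_1, example_2, example_3)  # prints: 6 1700 1188
```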
Final relevance values can range from 1 to 999999. After they reach one million, they will wrap around, since they are stored in a fixed-width field. This should never happen in practice.
This article applies to searches that are sorted by relevance, which is the default. When searches are sorted by last modified time or last indexed time, the relevance values are not calculated or used.
Advanced: Description Matches
When "Multiplier: Description" is set to a non-zero value, FDSE will add extra points for each search keyword found in the description. The "description" consists of whatever FDSE has extracted for the document. By default, this will be the META tag "description", but if that tag is not present, then it will be the first few words of the file. FDSE will treat the descriptions equally for purposes of calculating relevance, whether the text comes from the META description or the first few words of the file.
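The fallback described above might look like the following sketch. It is not FDSE's code: the function names are hypothetical, the 25-word cutoff is an assumed stand-in for "the first few words", and matching against a lowercased copy of the description is an assumption.

```python
def extract_description(meta_description, body_text, max_words=25):
    """Prefer the META description; otherwise use the first few words.

    max_words=25 is an assumed value, not a documented FDSE setting.
    """
    if meta_description and meta_description.strip():
        return meta_description.strip()
    return " ".join(body_text.split()[:max_words])


def description_points(description, terms, multiplier):
    """Extra points for search terms found in the description.

    The description is scored the same way whether it came from the
    META tag or from the start of the file.
    """
    lowered = description.lower()
    return multiplier * sum(lowered.count(term.lower()) for term in terms)


# A page with no META description: the first words of the file are used.
fallback = extract_description(
    None, "Perl scripts and more scripts for the web", max_words=5)
points = description_points(fallback, ["perl", "scripts"], 25)
```

With "Multiplier: Description" set to 25, the fallback description "Perl scripts and more scripts" contributes 75 points, exactly as a META description with the same text would.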
In a similar fashion, a web page without a TITLE tag will be given its filename as a title, or the string "No title available". In either case, the filename or the string "No title available" will then be searchable.
FDSE performs a case-sensitive substring match against the index record to calculate how many times the search term appears in the document. This match may result in a slight overcount, as follows. The index record contains some duplicated fields, namely the title and description. For the title and description, FDSE stores an original, presentable version like "Foo: Title of main Web Page" as well as a stripped-down searchable version, like "foo title main web page". If a user searches for, say, "Web", then the keyword will be converted to "web" and then compared against the index file, where it will match exactly one time, against the searchable title field (where everything is lowercase). However, if the user searches for "main", then this keyword will match against both the literal title and the searchable title fields, generating a raw relevance value of 2 instead of 1. There are no plans to change this, since searches still return the correct results in mostly the same order, but it is something to be aware of.
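The overcount can be reproduced with a short Python sketch. The record layout (the two title versions joined by a newline) and the function name count_hits are assumptions made for illustration; the real index format is not described here.

```python
def count_hits(keyword, record):
    """Case-sensitive substring count of the lowercased keyword."""
    return record.count(keyword.lower())


display_title = "Foo: Title of main Web Page"   # presentable version
searchable_title = "foo title main web page"    # stripped-down version

# Assumed layout: both versions are stored in the same index record.
record = display_title + "\n" + searchable_title

web_hits = count_hits("Web", record)    # "web" occurs only in the searchable title
main_hits = count_hits("main", record)  # "main" occurs in both versions

print(web_hits, main_hits)  # prints: 1 2
```

Because "main" is already lowercase in the presentable title, it is counted twice, while "Web" is counted once.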
The Multiplier settings have performance implications; see:
"Sorting Results: How relevance is calculated" http://www.xav.com/scripts/search/help/1074.html