Searching for something does not return expected pages
This is a very general class of problem. Here are some good things to check:
-
Confirm that your search term is not being ignored. If it is, the top of the search results will say so, for example "Ignored: what". Many common words are ignored; you can customize the list with the "Ignore Words" General Setting.
-
Confirm that the appropriate realm is selected. If in doubt, try "All".
-
Confirm that pages are being searched. The top of the search results will say something like "of 1000 documents searched". If it says "of 0 documents searched" then there is some lower-level problem. Perhaps your index files aren't loading or your database connection is down.
-
Confirm that the text is not being truncated. As an optimization, the engine does not always extract all the text on a page; it often reads only the first 64,000 characters. For extremely long documents, words near the end of the document will therefore be ignored.
You can control this using the "Max Characters: File" and "Max Characters: Text" General Settings. There are also "Max Characters" General Settings that can truncate the title, description, and keywords of the document. If you change any of those settings, you will need to rebuild your index files in order for the changes to take effect.
-
FDSE searches text, not images. Confirm that the "text" that should be found is true file text and not an image masquerading as text. In the example fragment below, a search for "contact" will not result in a hit:
<img src="1062a.gif" />
For this to work, you must 1) include ALT text that exactly matches the image text, 2) set the General Setting "Index ALT Text" to 1, and 3) rebuild your index file:
<img src="1062a.gif" alt="Contact Us" />
-
Make sure that the URL that should have been returned is in the search index. If you expect "http://xav.com/notify/" to appear when you search for the word "notify", try a search for "url:http://xav.com/notify/" to confirm that the URL is in the index in the first place.
-
Check the HTML source of the document you expected to come up. The search engine will usually replace HTML tags with spaces. So, if your HTML page contains:
<big>T</big>he <big>E</big>nd
then it may be stored in the index as "t he e nd" and a search for "end" won't bring it up.
This problem often arises when people use FrontPage or other WYSIWYG editors to create web pages. FrontPage sometimes starts and stops HTML tags in the middle of words for no apparent reason. Here is a common fragment from a FrontPage-authored document:
<font color="black">meani</font><font color="black">ngless</font>
FrontPage does this if, for instance, you highlight the string "meani" while editing and set its color to black.
History: starting with the 2.0.0.0045 release (2001-07-10), FDSE will no longer replace certain "zero-width" HTML tags with spaces. It will replace them with empty strings instead, which should work around this problem in most cases. The following HTML tags are considered zero-width: B; TT; I; EM; STRONG; BIG; SMALL; FONT; SPAN. This list can be customized by tweaking subroutine parse_html_ex in the common_parse_page.pl library.
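To see why the zero-width rule matters, the two behaviors can be compared side by side. The following is a minimal Python sketch, not FDSE's actual code (FDSE does this in parse_html_ex). The function name strip_tags, the zero_width_aware flag, and the simplified tag regex are assumptions made for illustration; only the list of zero-width tags comes from the text above.

```python
import re

# Tags the help text lists as "zero-width" (replaced with an empty string).
ZERO_WIDTH_TAGS = {"b", "tt", "i", "em", "strong", "big", "small", "font", "span"}

# Simplified tag matcher (an assumption): captures the tag name of an
# opening or closing tag. It does not handle comments or ">" inside attributes.
TAG_PATTERN = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>")


def strip_tags(html, zero_width_aware=True):
    """Return lowercase text with tags removed, roughly as an indexer might store it."""

    def replace(match):
        name = match.group(1).lower()
        if zero_width_aware and name in ZERO_WIDTH_TAGS:
            return ""   # zero-width tag: join the surrounding text
        return " "      # any other tag: treat as a word break

    text = TAG_PATTERN.sub(replace, html)
    # Collapse runs of whitespace and normalize case.
    return " ".join(text.split()).lower()


if __name__ == "__main__":
    sample = "<big>T</big>he <big>E</big>nd"
    print(strip_tags(sample, zero_width_aware=False))  # every tag becomes a space
    print(strip_tags(sample, zero_width_aware=True))   # zero-width tags vanish
```

With zero_width_aware=False the sample is stored as "t he e nd", so a search for "end" misses it; with the zero-width rule it is stored as "the end". If a page still fails to match, running its HTML through a check like this can show whether a tag outside the zero-width list is splitting the word.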
If you have a case where the word appears on the page, and the page is indexed, and the search terms do not bring up the expected page, then please post a bug report to the discussion forum and include all of the relevant data.
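Before posting, it can help to rule out truncation as the cause and to include the result in your report. The following is a minimal Python sketch, not part of FDSE; the function name find_term_offset is hypothetical, and the default limit is simply the 64,000-character figure mentioned above. It assumes you have already saved the page's text to a string, and it counts characters in that string, which may differ from how the engine counts them under your "Max Characters" settings.

```python
DEFAULT_MAX_CHARS = 64_000  # figure quoted above; your configured limit may differ


def find_term_offset(text, term, max_chars=DEFAULT_MAX_CHARS):
    """Return (offset, within_limit) for the first case-insensitive match, or None."""
    offset = text.lower().find(term.lower())
    if offset == -1:
        return None
    # The term survives truncation only if it ends inside the first max_chars characters.
    within_limit = offset + len(term) <= max_chars
    return offset, within_limit


if __name__ == "__main__":
    # Synthetic document: "appendix" sits beyond the 64,000-character mark.
    document = "intro " + "filler " * 10_000 + "appendix"
    print(find_term_offset(document, "intro"))     # found near the start
    print(find_term_offset(document, "appendix"))  # found, but past the limit
    print(find_term_offset(document, "missing"))   # not present at all
```

If the second value is False, the word's first occurrence lies past the limit, and raising the relevant "Max Characters" setting and rebuilding the index is worth trying before filing a report.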
"Searching for something does not return expected pages"
http://www.xav.com/scripts/search/help/1062.html