What is the Fluid Dynamics Search Engine?
FDSE is a search engine that you install on your own site. Visitors to your site use it to find files on your site or on a small cluster of sites. The search box at the top of this page is an example of how FDSE is typically used.
FDSE is different from Google or AltaVista, which search the entire Internet. FDSE searches only the sites that you tell it to. It can handle about 10,000 documents in all, which is plenty for one site but far fewer than the total number of documents on the Internet. (more info on size limits)
FDSE is smaller than Google or AltaVista, but it works on the same principles. It has its own built-in web robot for retrieving files, which means it is not limited to searching documents on its own server. It builds its own index files and returns results from them, unlike some "meta-search" scripts which make behind-the-scenes requests to major search engines to gather results.
FDSE runs entirely on your server, so visitors aren't redirected to a separate centralized server to get their results (as with Atomz and Freefind). If your web server doesn't support Perl CGI at all, then you might be better off with one of those remotely-hosted solutions.
FDSE is a flat search engine - it accepts keywords and shows a ranked list of search results. It does not organize pages into browsable categories and subcategories like Yahoo does.
Features and Benefits:
Unrestricted full version download - you can try before you buy.
Code executes 100% locally on your own server - no dependencies on other sites or companies.
Code is 100% pure Perl - no dependencies on external modules or system calls.
No forced banner advertisements to distract your visitors.
Extras are optional. For example, you can configure your own keyword-triggered banner ads, but that's your choice. They aren't forced on you.
Platform independence - runs well on Unix, Linux, Windows NT, Windows 200X, and Win95/98/ME.
Completely template-based: you control the entire look-and-feel of the site by editing text/html template files. No need to edit the source code... though you can do that too. You can always preserve your existing templates and data when upgrading or re-installing the product.
Dependable user support, featuring many in-depth help files and an active discussion forum.
Code is modular and heavily commented for the benefit of those who want to be hardcore. Can be called as an API from another Perl script. Format of all data files is documented in the help file.
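For example, another Perl script might load the engine and run a query directly. The entry point and arguments below are hypothetical placeholders; the real calling convention is documented in the help file:

    # Hypothetical API call - "search_api" and its arguments are
    # placeholders, not FDSE's documented interface.
    require './search.pl';
    my @hits = search_api(Terms => 'oatmeal cookies', MaxHits => 10);
    print "$_\n" for @hits;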
Highly customizable filter rules allow you to programmatically control which web pages are included in the index. Filtering can be done based on patterns in the hostname, URL, or document text, or based on RSACi and SafeSurf PICS headers.
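As a sketch of the idea (a standalone illustration; the actual rule syntax is described in the help file), a filter might look like:

    # Illustrative filter: return 1 to index a page, 0 to skip it.
    # The pattern choices here are examples, not defaults.
    sub want_page {
        my ($url, $text) = @_;
        return 0 if $url  =~ m!^http://intranet\.example\.com/!i;  # skip one host
        return 0 if $url  =~ /\?/;                                 # skip query URLs
        return 0 if $text =~ /\bDRAFT\b/;                          # skip marked drafts
        return 1;
    }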
Resource-intensive actions, like indexing entire web sites, are spread across multiple CGI executions, using META refreshes. This prevents web server timeouts due to excessive resource usage, and allows the action to recover if some individual CGI executions fail.
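The continuation mechanism is plain HTML: at the end of each batch, the script emits a refresh tag that re-invokes it for the next batch. A Perl sketch, with illustrative parameter names:

    # Emit a refresh tag so the browser re-invokes the script
    # for the next batch after a one-second pause.
    my $next_batch = 2;    # illustrative parameter
    print qq{<META HTTP-EQUIV="Refresh" },
          qq{CONTENT="1;URL=search.pl?Mode=Index&Batch=$next_batch">\n};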
Searches text and HTML files. Can also search PDF, MP3, and MS Word files with helper applications (see the help file).
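Helper applications are external converters that turn a binary format into plain text the indexer can read. Assuming the freely available pdftotext and catdoc tools are installed, the conversion step amounts to something like this (how FDSE wires it in is covered in the help file):

    # Illustrative conversion step using external tools:
    # pdftotext writes extracted text to stdout when given "-".
    my $pdf_text  = `pdftotext manual.pdf -`;
    my $word_text = `catdoc report.doc`;      # MS Word to plain text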
Add Your URL - visitors can add their own websites to the index. The script owner can turn this feature on or off. (more info)
Attribute Indexing - a document's text, keywords, description, title, and address are all extracted and used for searching.
Rich Display - the title, description, size, last modified time, and address of each document are shown to the user in the list of hits. The admin can configure the number of hits to show per page.
Relevance Listing - documents are sorted by the number of keyword hits, so that the most relevant document appears first. Search terms found in the title, keywords, or description are given additional weight.
Smart HTML Parsing - the search engine does not index text appearing inside of HTML tags, nor inside <SCRIPT> or <STYLE> blocks.
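A minimal sketch of that kind of tag stripping (FDSE's actual parser is more involved):

    # Read a page, drop SCRIPT and STYLE blocks entirely, then
    # strip the remaining tags so only visible text survives.
    my $html = join '', <STDIN>;
    $html =~ s{<script\b.*?</script>}{ }gis;
    $html =~ s{<style\b.*?</style>}{ }gis;
    $html =~ s{<[^>]*>}{ }gs;
    print $html;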
Attribute Searching - by default, searches find words in the body, title, keywords, URL, links, or text of a document. By using attribute:value searches, each portion of a document can be searched separately. The supported attributes are:
  url:value (also host:value or domain:value) - finds "value" in the web address of the document. For example, host:whitehouse.gov will only find matches on that website. The prefixes "url", "host", and "domain" all act the same.
  title:value - finds "value" between the <TITLE> and </TITLE> tags of the target document.
  text:value - searches only the actual text of the document, not the links or the URL. Due to the data structure of the index file, this attribute also matches the title, keywords, and description of the file.
  link:value - searches only the text extracted from hyperlinks in the document. Useful for seeing which documents link to a particular page, such as "link:http://my.host.com/". Relative links are extracted as-is, and are not expanded.
Phrase Searching - enclosing words in quotation marks causes them to be evaluated as a phrase. That is, all terms must occur next to each other and in order. "My bad self", when quoted, will not match "my self is bad".
Intended Phrase Optimization - a set of unquoted search terms will be treated as a phrase first, and as individual terms second. Thus, users who don't quote their phrases will still see phrase matches near the top of the results list.
Punctuation for Phrase Binding - words joined by punctuation will be treated as a phrase. Searching for Bill.Clinton (unquoted) is the same as searching for "Bill Clinton" (quoted).
Punctuation Insensitivity - only alphanumeric characters can be used for search terms. The characters "+", "|", "-", ":", and "*" have special meanings (require term, prefer term, forbid term, bind attribute, and wildcard match, respectively). All other punctuation characters are treated as whitespace.
Case and Accent Insensitivity - all searches are case insensitive and accent insensitive. Searching for "Fur" will match the lowercase "fur", the uppercase "FUR", and the German "für".
Granular Any/All Control - users may configure each search to find "any" keyword or "all" keywords in the set. Beyond that default, users can require specific keywords with a leading "+", mark them as merely preferred with a leading "|", and forbid them with a leading "-".
For example, the query +oatmeal +cookies |raisins -store-bought requires "oatmeal" and "cookies", prefers "raisins", and forbids "store-bought", so a page like cookies.html that satisfies those rules will be returned.
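A sketch of how those prefixes partition a query (illustrative Perl, not FDSE's actual parser):

    # Classify each search term by its leading operator:
    # "+" = required, "-" = forbidden, "|" (or nothing) = optional.
    my (@required, @optional, @forbidden);
    for my $term (split ' ', '+oatmeal +cookies |raisins -store-bought') {
        if    ($term =~ s/^\+//) { push @required,  $term }
        elsif ($term =~ s/^-//)  { push @forbidden, $term }
        else  { $term =~ s/^\|//; push @optional,  $term }
    }
    # A matching document must contain every @required term and no
    # @forbidden term; @optional terms only improve its ranking.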
Remote files are indexed with a built-in web crawler. The crawler operates on fixed batch sizes of documents, which prevents infinite loops on robot traps or error conditions. It speaks the HTTP/1.0 protocol, and also supports Host headers and dynamic cookies.
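The essence of such a request - an HTTP/1.0 GET carrying a Host header so that name-based virtual hosts respond correctly - can be sketched with Perl's standard socket module (FDSE's own crawler is self-contained and more robust):

    use IO::Socket::INET;
    # Fetch one page as an HTTP/1.0 client.
    my $sock = IO::Socket::INET->new(
        PeerAddr => 'www.example.com',   # illustrative host
        PeerPort => 80,
        Proto    => 'tcp',
    ) or die "connect failed: $!";
    print $sock "GET / HTTP/1.0\r\n",
                "Host: www.example.com\r\n",   # virtual-host support
                "\r\n";
    my $response = join '', <$sock>;           # headers plus body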
Author Control - those who don't want their pages indexed can protect their site with a robots.txt file or the Robots META tag, described in the robots exclusion standard. If their pages have already been indexed, the author can resubmit the pages once robot exclusion is in place, and the pages will automatically be removed.
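For reference, exclusion can be declared site-wide in robots.txt or per page with the META tag:

    # robots.txt - keep all robots out of /private/
    User-agent: *
    Disallow: /private/

    <!-- per-page alternative, placed in the document's HEAD -->
    <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">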
When optimal performance is required, the web crawler can run on a computer separate from the one providing search services. All indexing writes to a single, self-contained file that can be transferred from the indexing workstation up to the search server.
International Support - all Latin extended characters are reduced to their unaccented equivalents. For example, the German "für" becomes "fur". Because this translation is applied both to the web documents and to the search terms that users enter, the net effect is transparent support for all non-English languages that use the Latin character set. It also enables non-English searching for users with English-only keyboards.
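The reduction is a character-for-character transliteration. A Perl sketch covering a handful of Latin-1 characters (FDSE's real table covers the full Latin extended range):

    use utf8;
    # Fold some common Latin-1 accented characters to ASCII.
    sub fold_latin {
        my ($s) = @_;
        $s =~ tr/àáâãäåçèéêëìíîïòóôõöùúûü/aaaaaaceeeeiiiiooooouuuu/;
        return $s;
    }
    print fold_latin("für"), "\n";   # prints "fur"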
Auditing - all searches are logged, with the user host, time, search terms, and number of results returned. The script owner can learn about visitor interests by viewing this log.
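An entry in that log might look something like this (the exact field layout here is illustrative):

    remote.example.com [12/Oct/2001:14:31:07] "oatmeal cookies" 12 results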
Site Promotion - the script owner can force certain "preferred" web pages or sites to appear higher in the search results.
Site Blacklisting - the script owner can remove certain "blacklisted" web sites from the index. This is useful when the "Add Your URL" feature is turned on for all visitors.
Bad Features (Known Limitations and Problems)
Latin text only - this script will index and search any text document written in a European language (Latin character set). Double-byte languages such as Japanese and Chinese are not supported.
Web only - this search engine runs on a Perl-CGI-enabled web server. It is not suitable for non-CGI web servers. It is not suitable for searching files when a web server is not available, such as a CD-ROM of technical support information.
Web only - the robot does not support protocols other than HTTP. For example, FTP, Gopher, and Secure Sockets Layer (HTTPS) documents cannot be indexed.
Memory and CPU Needs - this search engine was designed to provide a rich feature set, and that richness costs memory and processor power. If this becomes an issue for you, look for a leaner engine on www.cgi-resources.com.