
Total number of documents which can be searched

There is no fixed "hard" limit on the number of documents which FDSE can search. However, in practice, there is a "soft" limit which tends to vary with server power and configuration options.

Approximate Limit - Ten Thousand Documents

An unofficial figure of "10,000 documents" is often cited as an approximate upper bound on the number of documents which FDSE can search at one time.

The upper bound for your system might vary by a factor of 10 -- so, from 1,000 to 100,000 documents -- depending on your server power and configuration options. I have heard of no cases where FDSE handled more than 100,000 documents (with reasonable speed), or where it had trouble handling fewer than 1,000 documents.

(Note that the limit actually applies to the number of documents searched, not the number indexed. If you have ten realms, each with 10,000 documents, then the limit will be exceeded if a visitor chooses to search "All" realms. However, if you remove the "All" option and force the visitor to select a specific realm, then the searches will complete within the limit.)
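The realm arithmetic above can be sketched in a few lines of Python. This is only an illustration: the realm names, the `documents_searched` and `exceeds_soft_limit` helpers, and the fixed 10,000 figure are assumptions made for the example, not part of FDSE itself.

```python
# Illustrative sketch only; FDSE does not expose these names.
SOFT_LIMIT = 10_000  # the article's approximate figure, not a hard limit


def documents_searched(realm_sizes, selected):
    """Count the documents a single query touches for the selected realms."""
    return sum(realm_sizes[name] for name in selected)


def exceeds_soft_limit(realm_sizes, selected, limit=SOFT_LIMIT):
    """True when the selected realms together exceed the approximate limit."""
    return documents_searched(realm_sizes, selected) > limit


# Ten hypothetical realms of 10,000 documents each.
realms = {f"realm{i}": 10_000 for i in range(1, 11)}
all_realms = list(realms)

print(documents_searched(realms, all_realms))  # 100000
print(exceeds_soft_limit(realms, all_realms))  # True
print(exceeds_soft_limit(realms, ["realm3"]))  # False
```

Searching "All" touches 100,000 documents, while a single realm stays at 10,000, which is why removing the "All" option keeps each search within the approximate limit.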

"What happens when I exceed the approximate 10,000-document limit?"

At first, nothing will change. The search speed will be about the same. As the number of documents continues to grow, however, the searches will take longer to complete. Eventually the search process will use more resources than are allowed by the web server policy, and the process will be killed before it can complete. The exact limit at which this will happen depends on your server power, your server policy, and your configuration options.

"Which configuration options affect capacity?"

The capacity of FDSE is directly related to its speed. The faster searches execute, the more documents can be searched before a server limit is reached.

To maximize capacity, follow all of the suggestions in the help article How to make searches faster.

Resource Usage in Detail

FDSE consumes system resources and time for each document searched. Therefore, as more documents are searched, the engine uses more memory, more CPU power, and takes longer to return results.

There are server-side and client-side limits on just how many system resources and how much time can be used. Normally the search engine finishes long before the limits are reached. However, if there are too many documents to be searched, one of the following will happen:

  1. the web server will kill the search process for using too much memory
  2. the web server will kill the search process for using too much CPU power
  3. the web server will kill the search process for taking too much time
  4. the visitor's browser will time out because the search process is taking too much time

Also, as the number of documents approaches the upper physical limit, a search may take a very long time, such as 15 to 30 seconds, so visitors may give up even though there is no software-level error.
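The four failure modes listed above can be pictured as a simple check of one search's resource usage against the host's limits. The sketch below is a hedged illustration: the `Limits` class, the `failed_limits` function, and every number in it are hypothetical and are not measurements of FDSE or of any particular web server.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Hypothetical per-request limits; real values depend on the host."""
    memory_mb: float        # web server memory cap
    cpu_seconds: float      # web server CPU cap
    server_seconds: float   # web server wall-clock cap
    browser_seconds: float  # visitor's browser timeout


def failed_limits(memory_mb, cpu_seconds, elapsed_seconds, limits):
    """Return the failure modes (numbered as in the list) a search would hit."""
    failures = []
    if memory_mb > limits.memory_mb:
        failures.append("1: killed for memory")
    if cpu_seconds > limits.cpu_seconds:
        failures.append("2: killed for CPU")
    if elapsed_seconds > limits.server_seconds:
        failures.append("3: killed for time")
    if elapsed_seconds > limits.browser_seconds:
        failures.append("4: browser timeout")
    return failures


# Illustrative numbers only.
shared_host = Limits(memory_mb=16, cpu_seconds=10,
                     server_seconds=30, browser_seconds=60)

print(failed_limits(4, 2, 5, shared_host))     # []
print(failed_limits(20, 12, 25, shared_host))  # ['1: killed for memory', '2: killed for CPU']
```

A server with no restriction on a given resource could be modelled by setting that limit to `float("inf")`, so the corresponding check never fires.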

Not all web servers have memory or CPU limits. Unix servers used by web hosting companies often have strict memory, CPU, and time restrictions. Windows / IIS servers tend to have time restrictions only. Some dedicated web servers have no restrictions at all.

Searching each document uses CPU power and time, but does not absorb memory by itself. However, for each document which matches the keywords, memory is allocated to store the result, and some extra CPU power is used for sorting. Thus there is a higher chance of exceeding the memory limit when the keywords return many results.
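To make the distinction concrete, here is a rough linear model in Python. It assumes, purely for illustration, a fixed CPU cost per document searched and a fixed memory cost per matching result; the `estimate_usage` function and both per-unit costs are invented placeholders. The extra sorting cost is left out because the article does not say how it scales.

```python
def estimate_usage(docs_searched, docs_matched,
                   cpu_ms_per_doc=0.5, bytes_per_result=512):
    """Rough linear model; both per-unit costs are made-up placeholders."""
    if docs_matched > docs_searched:
        raise ValueError("cannot match more documents than were searched")
    return {
        # CPU time is paid for every document searched.
        "scan_cpu_ms": docs_searched * cpu_ms_per_doc,
        # Memory is allocated only for documents that match.
        "result_bytes": docs_matched * bytes_per_result,
    }


# Two hypothetical queries over the same 10,000 documents.
rare = estimate_usage(10_000, 50)       # keyword found in few documents
common = estimate_usage(10_000, 8_000)  # keyword found in most documents

print(rare["scan_cpu_ms"] == common["scan_cpu_ms"])    # True
print(common["result_bytes"] // rare["result_bytes"])  # 160
```

Under these assumptions both queries cost the same to scan, but the common keyword needs 160 times the result memory, which is why broad keywords are more likely to hit a memory limit.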

See also:

How to be an Internet-wide search engine

Updated 2002-07-05


    "Total number of documents which can be searched"
    http://www.xav.com/scripts/search/help/1132.html