
Search for keywords only in URL and filename of binary files

This technique is supported only for backward compatibility with the earliest versions of FDSE. The other techniques described at Searching binary files give better results when searching binary files.

This technique can only be used with the file system crawler (used with Runtime Realms and Website Realms - File System Discovery). The web crawler is not able to index only the URL and filename of binary files.

Steps to enable:

  1. Create a realm using the file system crawler.

  2. Go to Admin Page => General Settings => Allow Binary Files and confirm that it is enabled.

  3. Go to Admin Page => General Settings => Ext and list all of the file extensions that you would like to index.

    In the Ext setting, the value "null" matches files with no extension, and the value "*" matches all files.
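The matching rule for the Ext setting can be illustrated with a minimal Python sketch. This is not FDSE's own code (FDSE is written in Perl); the function name `extension_allowed`, the list representation of the setting, and the case-insensitive comparison are all assumptions made for illustration.

```python
import os


def extension_allowed(filename, ext_list):
    """Return True if filename matches the configured extension list.

    ext_list is a hypothetical stand-in for the Ext setting, for example
    ["pdf", "doc", "null"]. Case-insensitive matching is an assumption.
    """
    # "*" matches every file, whatever its extension.
    if "*" in ext_list:
        return True

    # splitext("report.PDF") -> ("report", ".PDF"); strip the dot and lowercase.
    _, ext = os.path.splitext(filename)
    ext = ext.lstrip(".").lower()

    # "null" stands for files that have no extension at all.
    if ext == "":
        return "null" in ext_list

    return ext in ext_list


if __name__ == "__main__":
    print(extension_allowed("report.PDF", ["pdf", "doc"]))  # True
    print(extension_allowed("README", ["pdf", "null"]))     # True
    print(extension_allowed("README", ["pdf"]))             # False
    print(extension_allowed("movie.avi", ["*"]))            # True
```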

FDSE will automatically determine which files are binary and which are text, using the Perl -T test. The text files will be parsed as HTML. The binary files will have only their URL and filename included in the index.
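To show roughly how such a text-versus-binary test behaves, here is a Python approximation of the heuristic documented for Perl's -T operator: an empty file counts as text, any zero byte means binary, valid UTF-8 with non-ASCII characters counts as text, and otherwise the file is binary when more than a third of the examined bytes are "odd". The function names, the 512-byte block size, and the exact set of control characters treated as normal are assumptions of this sketch, not a statement of what FDSE or Perl does internally.

```python
# Control bytes treated as ordinary text in this sketch:
# backspace, tab, newline, form feed, carriage return, escape.
ALLOWED_CONTROLS = {0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B}


def looks_like_text(block: bytes) -> bool:
    """Approximate an is-text test on the first block of a file."""
    if not block:
        return True  # an empty file is reported as text
    if b"\x00" in block:
        return False  # any zero byte marks the block as binary

    # Valid UTF-8 that contains non-ASCII characters counts as text.
    if any(byte >= 0x80 for byte in block):
        try:
            block.decode("utf-8")
            return True
        except UnicodeDecodeError:
            pass

    # Otherwise count "odd" bytes: high-bit bytes and unusual control codes.
    odd = sum(
        1
        for byte in block
        if byte >= 0x80 or (byte < 0x20 and byte not in ALLOWED_CONTROLS)
    )
    # Binary when more than one third of the bytes are odd.
    return odd * 3 <= len(block)


def file_looks_like_text(path: str, block_size: int = 512) -> bool:
    """Apply the heuristic to the first block_size bytes of a file."""
    with open(path, "rb") as handle:
        return looks_like_text(handle.read(block_size))


if __name__ == "__main__":
    html = b"<html><body>hello</body></html>\n"
    png = b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR"
    pdf = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    print(looks_like_text(html))  # True
    print(looks_like_text(png))   # False
    print(looks_like_text(pdf))   # True
```

The last sample shows why a heuristic of this kind is imperfect: the beginning of a PDF file is often plain ASCII, so the first block can look like text even though the file as a whole is binary.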

If any binary converters are loaded, such as XPDF or Antiword, they are allowed to convert their binaries to text format first. A binary file has only its URL and filename indexed when all three conditions hold: its extension is listed in "Ext", it fails the -T "is-text" test, and no converter is loaded for it.
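The three conditions can be summarized as a small decision function. This Python sketch is only an illustration of the rules described above: the function name, the boolean parameters, and the returned labels are hypothetical, and checking for a converter before the is-text result is an assumption based on converters being allowed to run "first".

```python
def indexing_mode(ext_listed: bool, is_text: bool, has_converter: bool) -> str:
    """Pick how a file is handled, following the rules described above.

    ext_listed    -- the file's extension matches the Ext setting
    is_text       -- the file passes the is-text test
    has_converter -- a converter (for example XPDF or Antiword) is loaded
                     for this file type
    """
    if not ext_listed:
        return "skip"                  # not selected for indexing at all
    if has_converter:
        return "convert_to_text"       # converter output is indexed
    if is_text:
        return "parse_as_html"         # text files are parsed as HTML
    return "url_and_filename_only"     # binary, listed, no converter


if __name__ == "__main__":
    print(indexing_mode(True, False, False))  # url_and_filename_only
    print(indexing_mode(True, False, True))   # convert_to_text
    print(indexing_mode(True, True, False))   # parse_as_html
    print(indexing_mode(False, False, False)) # skip
```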

For best results, use highly descriptive file and folder names.


Note that under this approach, certain binary files such as PDF tend to pass Perl's imperfect "is-text" test, so their binary content is indexed. If this causes problems, the best solution is not to use URL-and-filename-only indexing for those file types. Better options are listed at Searching binary files.
