Searching binary files
The Fluid Dynamics Search Engine (FDSE) is designed to find keywords within text files. Binary files do not store their content as plain, searchable text, so special techniques must be used to search them. Several techniques are available, and most of them involve creating an HTML file that corresponds to the binary. FDSE extracts keywords from this HTML file, but points the user to the binary in the search results.
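The idea of indexing an HTML companion while linking to the binary can be sketched in a few lines of Python. This is only an illustration of the concept, not FDSE's own code: the names `keywords_from_html`, `build_index`, and `search` are hypothetical, and the sketch assumes you already know which HTML text belongs to which binary URL.

```python
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for every run of text between tags, including the title.
        self.chunks.append(data)


def keywords_from_html(html_text):
    """Return the set of lowercase words found in the HTML text."""
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()
    return set(re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower()))


def build_index(companions):
    """Map each keyword to the binary URLs whose companion page contains it.

    `companions` maps a binary URL to the HTML text of its companion page.
    """
    index = {}
    for binary_url, html_text in companions.items():
        for word in keywords_from_html(html_text):
            index.setdefault(word, set()).add(binary_url)
    return index


def search(index, word):
    """Return the binary URLs that match a single keyword, sorted."""
    return sorted(index.get(word.lower(), set()))
```

The keywords come from the HTML text, but the value stored in the index is the URL of the binary, so a search hit leads the visitor to the binary file itself.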
Techniques for binary searching
- Maintain HTML copies of each written binary document
  Recommended solution for written binary documents, including Microsoft Word, Microsoft Excel, and PDF
- Maintain HTML files about each non-written binary document
  Recommended solution for non-written binaries, such as audio, video, and images
- Perform automatic extraction of text (MS Word, PDF, MP3 only)
- Search for keywords only in URL and filename of binary files
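The last technique, searching only the URL and filename, amounts to breaking the address into words. The following Python sketch shows one way this could work; the function name `keywords_from_url` and the exact splitting rule are assumptions for illustration, not a description of how FDSE tokenizes URLs.

```python
import re
from urllib.parse import urlsplit, unquote


def keywords_from_url(url):
    """Split the host and path of a URL into lowercase keyword tokens."""
    parts = urlsplit(url)
    # Decode escapes such as %20 so "My%20Song.mp3" yields "my" and "song".
    text = unquote(parts.netloc + " " + parts.path)
    # Treat every run of non-alphanumeric characters as a separator.
    return [token.lower() for token in re.split(r"[^A-Za-z0-9]+", text) if token]
```

Because the only searchable words are those in the address, this technique works best when binaries have descriptive filenames, such as `Annual_Report-2003.pdf` rather than `doc17.pdf`.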
Recommended solutions
Maintaining an HTML page for each binary has these advantages:
- Works with all search engines, not just FDSE
- People who cannot or will not view the binary can still read the file
- You can configure FDSE to link to both HTML and binary content in the search results
- You have maximum freedom in customizing the searchable title, description, keywords and text
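As a rough illustration of that freedom, the Python sketch below generates an HTML page for a binary file with a title, description, keywords, and body text of your choosing. The function name `companion_page` and the page layout are hypothetical, and whether a particular search engine reads the description and keywords meta tags is an assumption you should verify for your own setup.

```python
from html import escape


def companion_page(binary_url, title, description, keywords, body_text):
    """Return HTML for a page that describes a binary file and links to it."""
    keyword_list = ", ".join(keywords)
    return (
        "<html>\n<head>\n"
        f"<title>{escape(title)}</title>\n"
        f'<meta name="description" content="{escape(description)}">\n'
        f'<meta name="keywords" content="{escape(keyword_list)}">\n'
        "</head>\n<body>\n"
        f"<p>{escape(body_text)}</p>\n"
        # Link back to the binary so readers of the page can still reach it.
        f'<p><a href="{escape(binary_url)}">Download the original file</a></p>\n'
        "</body>\n</html>\n"
    )
```

For example, `companion_page("song.mp3", "Live & Loud", "A live recording", ["live", "audio"], "Recorded in 2003.")` returns a page whose title, description, and keywords are entirely under your control, regardless of what the MP3 file itself contains.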
See also "Displaying file-type icons in search results".
http://www.xav.com/scripts/search/help/1053.html