Maintain HTML copies of each written binary document

Overview

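It is recommended that you maintain an HTML copy of every written binary document, such as a PDF or DOC file. Store each copy alongside its binary, and link to both formats. Visitors who cannot open the binary format can then still read the document, and FDSE (Fluid Dynamics Search Engine) can index the text of the HTML copy.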

This help file describes just one way to search binary files; see "Searching binary files" for others.

Steps to enable:

  1. Begin with a consistent naming convention whereby every binary document has an HTML equivalent whose name is exactly the same, but with an additional ".html" extension.

    A directory listing would look like this:

    Board_for_Social_Responsibility_2001.pdf
    Board_for_Social_Responsibility_2001.pdf.html
    Tips_for_Exercise_and_Motivation.doc
    Tips_for_Exercise_and_Motivation.doc.html
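
    The naming convention in step 1 can be checked mechanically. The following is a minimal sketch, not part of FDSE: the helper name missing_html_copies and the default extension list (pdf, doc) are assumptions made for illustration. It reports every binary in a directory that has no ".html" equivalent.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of FDSE): given the file names found in one
# directory, return the binaries that lack a "<name>.html" equivalent.
# The default extension list is an assumption; extend it to match your site.
sub missing_html_copies {
    my ($names, @binary_exts) = @_;
    @binary_exts = qw(pdf doc) unless @binary_exts;
    my $ext_re  = join '|', map { quotemeta($_) } @binary_exts;
    my %present = map { $_ => 1 } @$names;
    my @missing = sort grep { /\.(?:$ext_re)$/i && !$present{"$_.html"} } @$names;
    return @missing;
}

# Check the directory named on the command line (default: current directory).
my $dir = @ARGV ? $ARGV[0] : '.';
opendir(my $dh, $dir) or die "Cannot open $dir: $!\n";
my @names = grep { -f "$dir/$_" } readdir $dh;
closedir $dh;
print "Missing HTML copy: $_\n" for missing_html_copies(\@names);
```

    Run against the directory listing above, the script would print nothing, because both binaries have an HTML equivalent.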

  2. Next, use a consistent linking policy. Every time you link to a binary document, include a link to the HTML equivalent. For example:

    <a href="Board_for_Social_Responsibility_2001.pdf">
    	Board_for_Social_Responsibility_2001.pdf
    </a>
    (<a href="Board_for_Social_Responsibility_2001.pdf.html">HTML format</a>)

    An example of such links can be seen here.
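
    If the links in step 2 are generated by a script rather than written by hand, the pair can be built from the binary file name alone. This is only a sketch under stated assumptions: link_pair and escape_html are hypothetical helper names, and file names are assumed to be URL-safe, as they are under the naming convention shown in step 1.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text and attribute values.
sub escape_html {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Hypothetical helper: build the binary link followed by the HTML-format link.
# Assumes the HTML copy is named "<binary name>.html" and sits beside it.
sub link_pair {
    my ($binary) = @_;
    my $name = escape_html($binary);
    return qq{<a href="$name">$name</a> (<a href="$name.html">HTML format</a>)};
}

print link_pair('Board_for_Social_Responsibility_2001.pdf'), "\n";
```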

  3. At the top of every HTML copy, insert the following items:

    • A link to the original binary file

    • Short instructions on how to view that kind of binary file

    • A descriptive title, META description, and META keywords (recommended)

    • If you want FDSE to index these HTML copies while all other search engines skip them, add the following META headers:

      <meta name="robots" content="noindex" />
      <meta name="fdse-robots" content="index" />

      The first header tells all search engines to ignore the file; the second overrides that instruction for FDSE only.

    • If you plan to complete steps 5-8 below, add META headers to set the file size and last-modified date of the binary:

      <meta name="fdse-content-length" content="21841" />
      <meta name="fdse-last-modified" content="Mon, 30 Jun 2003 20:55:00 GMT" />

      The content-length value is the size of the binary file in bytes, written as an integer. See "Calculating last-modified time" for the date formats accepted by the last-modified header.

    An example of an HTML copy with these insertions can be seen here.
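
    The two fdse-content-length and fdse-last-modified values in step 3 can be read from the binary file itself. The sketch below is an illustration, not part of FDSE: http_date and fdse_meta_headers are hypothetical helper names, and the date is written in the same form as the example above, which is assumed to be one of the accepted formats.

```perl
use strict;
use warnings;

my @DAYS   = qw(Sun Mon Tue Wed Thu Fri Sat);
my @MONTHS = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);

# Format an epoch time in GMT, e.g. "Mon, 30 Jun 2003 20:55:00 GMT".
sub http_date {
    my ($epoch) = @_;
    my ($sec, $min, $hour, $mday, $mon, $year, $wday) = gmtime($epoch);
    return sprintf('%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900, $hour, $min, $sec);
}

# Hypothetical helper: build both META headers from a byte size and an
# epoch modification time.
sub fdse_meta_headers {
    my ($size, $mtime) = @_;
    return qq{<meta name="fdse-content-length" content="$size" />\n}
         . qq{<meta name="fdse-last-modified" content="} . http_date($mtime) . qq{" />\n};
}

# Usage: pass the path of the binary file as the first argument.
if (@ARGV) {
    my ($size, $mtime) = (stat $ARGV[0])[7, 9];   # 7 = size in bytes, 9 = mtime
    die "Cannot stat $ARGV[0]: $!\n" unless defined $size;
    print fdse_meta_headers($size, $mtime);
}
```

    The printed lines can then be pasted into the head of the corresponding HTML copy.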

  4. With the first three steps completed, your site is already completely accessible.

    • Visitors who are not able to view binaries will always have an option to view the document in HTML format, because each link to a binary is accompanied by a link to an HTML equivalent (step #2).

    • Visitors who stumble upon the HTML version of the file -- such as through a search result, or by someone sending them the link -- will always have the option of requesting the original binary, because the HTML version always starts with a link to the original (step #3).

    The next steps are optional. They allow you to link to both the binary and HTML formats from the FDSE search result listings.

  5. Make sure you are running FDSE version 2.0.0.0064 or newer.

  6. Edit the source code file:

    /search/searchmods/common.pl

    Find the following lines of code:

    $pagedata{'file_type_icon'} = &get_file_type_icon_by_url( $pagedata{'url'} );
    
    return &PrintTemplate( 1, 'line_listing.txt', $::Rules{'language'}, \%pagedata, 0, \%::const);

  7. Insert the following custom code, so that those lines read:

    $pagedata{'html_format_link'} = '';
    
    # handle .doc.html files:
    if ($pagedata{'url'} =~ m!\.doc\.html$!i) {
    	$pagedata{'html_format_link'} = qq! - <a href="$pagedata{'url'}">View as HTML</a>!;
    	$pagedata{'url'} =~ s!(\.doc)\.html$!$1!i;
    	}
    
    # handle .pdf.html files:
    if ($pagedata{'url'} =~ m!\.pdf\.html$!i) {
    	$pagedata{'html_format_link'} = qq! - <a href="$pagedata{'url'}">View as HTML</a>!;
    	$pagedata{'url'} =~ s!(\.pdf)\.html$!$1!i;
    	}
    
    # ... repeat as needed for other .binary.html types ...
    
    $pagedata{'file_type_icon'} = &get_file_type_icon_by_url( $pagedata{'url'} );
    
    return &PrintTemplate( 1, 'line_listing.txt', $::Rules{'language'}, \%pagedata, 0, \%::const);

    (Note that in older versions, $::Rules and %::const are written as $Rules and %const. Keep using whatever names are used by the code you are editing.)
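
    When many binary types are involved, the repeated blocks in step 7 can be folded into one helper. The following is a sketch only, not code shipped with FDSE: split_html_copy is a hypothetical name, and the default extension list (doc, pdf) is an assumption. It can be run on its own to confirm the URL rewriting before anything is copied into common.pl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of FDSE): if $url names an HTML copy such as
# "report.pdf.html", return the binary URL plus a "View as HTML" link.
# Any other URL is returned unchanged with an empty link.
sub split_html_copy {
    my ($url, @binary_exts) = @_;
    @binary_exts = qw(doc pdf) unless @binary_exts;
    my $ext_re = join '|', map { quotemeta($_) } @binary_exts;

    my $link = '';
    if ($url =~ m{\.(?:$ext_re)\.html$}i) {
        $link = qq{ - <a href="$url">View as HTML</a>};
        $url =~ s{\.html$}{}i;    # strip only the trailing ".html"
    }
    return ($url, $link);
}

for my $url ('http://example.com/Tips.doc.html', 'http://example.com/index.html') {
    my ($binary_url, $html_link) = split_html_copy($url);
    print "$binary_url$html_link\n";
}
```

    Inside common.pl, such a helper might be called as ($pagedata{'url'}, $pagedata{'html_format_link'}) = split_html_copy($pagedata{'url'}); placed before the file_type_icon line, so that the icon is chosen from the binary URL as in the code above.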

  8. After making this change, the main search results will link directly to the binary PDF or DOC file, even though the HTML file was searched. A secondary link to the HTML-formatted file is included in the template variable %html_format_link%.

    To expose that secondary link, edit the template file:

    /search/searchdata/templates/line_listing.txt

    This is the template that creates each search result line listing. By default, the HTML looks like this:

    <dl>
    	<dt><b>%Rank%. <a href="%Redirector%%URL%">%Title%</a></b> %admin_options%</dt>
    	<dd class="sr">
    		%Description%<br />
    		<b>URL:</b> %url% - %Size% - %Day% %Month% %Year%
    		%context_line%
    	</dd>
    </dl>

    You can just insert the variable %html_format_link% wherever you like. One solution is:

    <dl>
    	<dt><b>%Rank%. <a href="%Redirector%%URL%">%Title%</a></b>
    		%admin_options% %html_format_link%</dt>
    	<dd class="sr">
    		%Description%<br />
    		<b>URL:</b> %url% - %Size% - %Day% %Month% %Year%
    		%context_line%
    	</dd>
    </dl>

Here is an example of search result listings after steps 5-8 have been completed.

[Image: Example of search results with HTML-equivalent links added.]

Here is a similar example, with the additional customizations from "Displaying file-type icons in search results".

[Image: Example of search results with HTML-equivalent links and file type icons added.]


    "Maintain HTML copies of each written binary document"
    http://www.xav.com/scripts/search/help/1179.html