Common problems when customizing HTML

Use this checklist if you are having trouble customizing your FDSE templates.

Validating your templates

Whenever you edit the templates, you should use an HTML validator service like http://www.doctor-html.com/RxHTML/ to programmatically validate your HTML.

You can submit your base URL, http://www.mysite.com/search/search.pl, to the validator to make sure your header.htm, searchform.htm, tips.htm, and footer.htm templates are all valid and work together properly. Then submit a URL like http://www.mysite.com/search/search.pl?q=test to make sure that header.htm, line_listing.txt, searchform.htm, and footer.htm work together properly.

Always validate against the output of search.pl, rather than validating the individual template files themselves.

Validation is a great way to find problems with mismatched, overlapped, and unclosed tags, which are a major source of trouble when editing.
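
As a rough illustration of the kind of problem a validator catches, here is a minimal Python sketch that reports mismatched, overlapped, and unclosed tags in a chunk of HTML. It is not a substitute for a real validator: the class and function names are hypothetical, the list of void elements is abbreviated, and it assumes XHTML-style markup in which every non-void tag is explicitly closed.

```python
from html.parser import HTMLParser

# Elements that never take a separate closing tag (abbreviated list).
VOID_TAGS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta"}


class TagBalanceChecker(HTMLParser):
    """Tracks open tags and records closing tags that do not line up."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of tags opened but not yet closed
        self.problems = []    # human-readable findings

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if tag not in self.open_tags:
            self.problems.append(f"</{tag}> has no matching open tag")
            return
        if self.open_tags[-1] != tag:
            self.problems.append(
                f"</{tag}> overlaps with unclosed <{self.open_tags[-1]}>"
            )
        # Pop back to (and including) the matching open tag.
        while self.open_tags.pop() != tag:
            pass


def find_tag_problems(html):
    """Return a list of tag problems found in the given HTML text."""
    checker = TagBalanceChecker()
    checker.feed(html)
    checker.close()
    unclosed = [f"<{tag}> is never closed" for tag in checker.open_tags]
    return checker.problems + unclosed


if __name__ == "__main__":
    for problem in find_tag_problems("<b><i>overlapped</b></i>"):
        print(problem)
```

To follow the advice above, feed the function the complete HTML returned by your own search.pl URL, not the contents of a single template file. A template on its own is only a fragment and will always look unbalanced.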

Please provide feedback

If you encounter a problem that isn't described here, please share it. We can add it to this document, or perhaps improve the templating system itself so that others can benefit from your experience.


The troublesome <!DOCTYPE ... > tag

Symptoms:

Styles, font sizes, and other layout details render differently in the search.pl output compared with all other pages on the site, even though all pages use almost identical HTML. The difference is most apparent when using Internet Explorer 6.0.

Background:

The current W3C standard is XHTML. This standard requires a DOCTYPE header which specifies the version of the markup in the file. We respect Internet standards, and so the FDSE output has been standardized on XHTML and the appropriate DOCTYPE is included at the top of the header.htm template.

At the same time, however, Microsoft has chosen to address standards-compatibility bugs in its Internet Explorer 6.0+ browser by giving it two rendering modes: standards-compliant and compatibility (often called "quirks mode"). The mode is chosen based on the presence and version of the DOCTYPE header, which makes the MSIE behavior controllable by the web author. There are significant behavioral differences between these modes, particularly with sites that use stylesheets.

Does your website contain HTML files without a DOCTYPE tag, or with an old one? If so, when users navigate between your files and the FDSE script, the fundamental behavioral mode of their browser may be switching back and forth. This will cause major differences in the way HTML is rendered. For those who've been raised to believe that DOCTYPE is irrelevant, this can be very confusing and hard to track down. If you see these problems, the DOCTYPE tag is the first thing to check.

You can read more about Microsoft's approach to DOCTYPE here:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnie60/html/cssenhancements.asp

For more information on XHTML, see:

http://www.w3.org/TR/xhtml1/

Solution:

Edit template file "search/searchdata/templates/header.htm" and remove the leading <!DOCTYPE ... >, or edit it so that it exactly matches the DOCTYPE used elsewhere on your site.

Note that the FDSE script output is set to XHTML 1.0. Your only choice is whether you label it as that or not. The actual output will always be XHTML 1.0 and cannot be changed. As a result, placing an HTML 4.01 Strict DOCTYPE header above the search.pl output will cause validation errors on your site.


Using the <base href="..." /> tag

Symptoms:

Page renders correctly on first visit, but all self-referencing links and forms lead to a 404 Not Found error.

Background:

FDSE uses the shortest possible self-referencing string when creating forms or links back to itself. For example, assume your search engine is installed to:

	http://www.mysite.tld/cgi-bin/search/search.pl

In this case, there are three different but equally valid ways for the script to refer to itself. They are:

	http://www.mysite.tld/cgi-bin/search/search.pl

	/cgi-bin/search/search.pl

	search.pl

By default, FDSE will use the final, shortest form. This means that links and forms will look like this:

<a href="search.pl?Mode=Tips">Tips</a>

<form method="get" action="search.pl">

Unfortunately, when the "header.htm" template is customized to include a base tag, those links and forms will render as:

<base href="http://www.mysite.tld/" />

<a href="search.pl?Mode=Tips">Tips</a>

<form method="get" action="search.pl">

The browser interprets the URL as:

	http://www.mysite.tld/search.pl

which returns a 404 Not Found error.
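
The effect of the base tag can be reproduced outside the browser. The sketch below uses Python's urllib.parse.urljoin, which applies the standard relative-reference rules, to resolve the same link with and without the base href. The variable names are illustrative only; the URLs are the ones from the example above.

```python
from urllib.parse import urljoin

PAGE_URL = "http://www.mysite.tld/cgi-bin/search/search.pl"  # where FDSE lives
BASE_HREF = "http://www.mysite.tld/"                         # from the <base> tag
LINK = "search.pl?Mode=Tips"                                 # FDSE's short self-reference

# Without a base tag, the link is resolved against the page's own URL.
without_base = urljoin(PAGE_URL, LINK)

# With a base tag, the base href replaces the page URL as the starting point.
with_base = urljoin(BASE_HREF, LINK)

print(without_base)  # http://www.mysite.tld/cgi-bin/search/search.pl?Mode=Tips
print(with_base)     # http://www.mysite.tld/search.pl?Mode=Tips
```

The second result is the URL that produces the 404: the short name "search.pl" is only correct when it is resolved relative to the directory the script actually lives in.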

Deeper Background:

FDSE always uses the shortest possible path for self-reference because the environment variables SCRIPT_NAME, SERVER_NAME and HTTP_HOST, which are used to auto-detect the full URL, are not reliable across all web server software. In particular, some CGI wrappers will corrupt the path portion found in SCRIPT_NAME so that a path like "/cgi-bin/search/search.pl" will be returned as "/cgi-sys/cgiwrap/search/search.pl". There are so many minor variants of this path corruption that it is not feasible to auto-correct it.

There are many ways for us to deal with this programming problem. The tough-love approach is to always use SCRIPT_NAME anyway, and let users face failures on mis-configured hosts, in hopes that this forces the host to return to the standard. (This approach was tried for a long time but it only resulted in user frustration, which was directed at us. There were no cases of fixes to an underlying server problem as a result of the pain we inflicted.) A very common but labor-intensive approach is to require every FDSE user to hardcode the full URL of their script into the source code as a variable. The approach we have chosen is to extract the shortest possible name, which is least likely to be corrupt, and to use that. Since that is not always the best solution, we make it easy to override that value as shown below.

Solution:

One solution is to not use a base tag in your templates.

Another solution is to edit the file search.pl (or search.cgi) and find the code that looks like this:

my $sn = &query_env('SCRIPT_NAME');
$sn =~ s!^.*/(.+?)$!$1!;

The $sn variable is a string which contains the self-referencing URL. Simply hardcode this value to your full URL:

# my $sn = &query_env('SCRIPT_NAME');
# $sn =~ s!^.*/(.+?)$!$1!;
my $sn = 'http://www.mysite.tld/cgi-bin/search/search.pl';

By doing this, FDSE will use full URLs for all self-references, and you can add any base tag you wish and everything will work properly. If you do this, you will need to re-edit your source code if your domain name or path ever changes.
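
To see why the default short name survives the SCRIPT_NAME corruption described under "Deeper Background", the Perl substitution quoted above can be mirrored in Python. This is only an illustrative equivalent of that one line, not code from FDSE; the function name is hypothetical.

```python
import re


def shortest_self_reference(script_name):
    """Mirror of the Perl substitution  s!^.*/(.+?)$!$1!  from search.pl.

    Everything up to and including the last slash is discarded, leaving
    only the file name of the script.
    """
    return re.sub(r"^.*/(.+?)$", r"\1", script_name)


# The true path and the path as corrupted by a CGI wrapper both reduce
# to the same short name, which is why the short form is the safe default.
print(shortest_self_reference("/cgi-bin/search/search.pl"))          # search.pl
print(shortest_self_reference("/cgi-sys/cgiwrap/search/search.pl"))  # search.pl
```

Hardcoding $sn to the full URL trades this robustness for compatibility with a base tag, which is why it is left as a manual override.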


Using relative paths to images, stylesheets, and other embedded objects

Symptoms:

Images, styles, and other objects do not appear on the search page, even though the same header and footer HTML code works correctly on all other pages.

Background:

This problem frequently comes up when all HTML pages on a web site reside in the same folder. Consider the following directory structure:

.
..
about.html
index.html
foo.html
images/
	butterfly.gif
cgi-bin/
	search/
		search.pl

On this site, the files about.html, index.html, and foo.html all contain a butterfly image at the top, coded as:

<img src="images/butterfly.gif" />

The webmaster does a cut-and-paste of all the header HTML text into the FDSE header template. Unfortunately, because FDSE executes at a depth of two subdirectories, the above IMG tag results in a request for:

http://www.mysite.tld/cgi-bin/search/images/butterfly.gif

This path is not found and so a broken image appears. The same problem occurs for embedded stylesheets, sounds, embedded Javascript files, A HREF links, and all other inlined objects.

Solution:

The easiest solution is to add a <base href="..."> tag to the top of the header template, forcing all relative paths to be evaluated using the base path with which the HTML author is most familiar. Note, of course, that when using the base tag you must make an additional source code change as described earlier in this help article.

Another solution is to rewrite all HTML to reference inlined objects using their full URL paths, for example:

<img src="http://www.mysite.tld/images/butterfly.gif" />
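
Both solutions can be checked with a short Python sketch that resolves the image reference the way a browser would. It assumes the directory layout shown above; the variable names are illustrative.

```python
from urllib.parse import urljoin

SEARCH_URL = "http://www.mysite.tld/cgi-bin/search/search.pl"
IMG_SRC = "images/butterfly.gif"

# The problem: the relative path is resolved against the search.pl URL.
broken = urljoin(SEARCH_URL, IMG_SRC)

# Solution 1: a <base href> at the site root becomes the starting point.
fixed_by_base = urljoin("http://www.mysite.tld/", IMG_SRC)

# Solution 2: a full URL is used as-is, whatever page it is embedded in.
fixed_by_full_url = urljoin(SEARCH_URL, "http://www.mysite.tld/images/butterfly.gif")

print(broken)             # http://www.mysite.tld/cgi-bin/search/images/butterfly.gif
print(fixed_by_base)      # http://www.mysite.tld/images/butterfly.gif
print(fixed_by_full_url)  # http://www.mysite.tld/images/butterfly.gif
```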

Note that this relative path problem arises in a particularly difficult way with embedded Javascript tags, like:

<script type="text/javascript" src="myscript.js"></script>

In this case, the code executed at myscript.js may embed additional scripts, images, and other objects. Thus, simply replacing the call to "myscript.js" with "/myscript.js" may not solve the problem for those objects embedded later on. In these cases, using a <base href> tag is your best solution. Also, when designing your scripts, you may want to use absolute URLs in all places where objects are embedded.


Using relative paths: a second problem

Symptoms:

Images, styles, and other objects do not appear on the search page, even though the HTML has been customized to take paths into account.

Background:

This problem comes up when editors do not fully understand the context in which FDSE templates are executed. This problem can also come up when the FDSE templates are edited with an HTML editor program.

Consider the directory structure:

index.html
butterfly.gif
cgi-bin/
	search/
		search.pl
		searchdata/
			templates/
				header.htm
				footer.htm
				english/
					line_listing.txt

Note that "search.pl" exists at a depth of two subdirectories below the root. The template file "header.htm" exists at a depth of four subdirectories, while "line_listing.txt" exists at a depth of five.

An editor wishes to include the file "butterfly.gif" on the FDSE header. This editor might incorrectly code the image as:

<img src="../../../../butterfly.gif" /> # four '..' for four levels deep

This image tag would actually work if the browser requested the header.htm file directly. However, what we want is for the search.pl file to render correctly. The browser will request search.pl, at a depth of two subdirectories, and the contents of the "header.htm" file will be streamed to the user's browser at that depth. Neither the end user nor his browser is even aware that a header.htm file exists, much less at which subdirectory depth it exists. All the browser sees is a search.pl file at a given depth which is outputting HTML. Thus, the proper way to write the relatively-linked image is in relation to how the browser sees search.pl:

<img src="../../butterfly.gif" /> # two '..' for two levels deep (search.pl depth)

If anyone has any trouble with this concept, remember that it is always safe to embed the image using the absolute URL:

<img src="http://www.mysite.tld/butterfly.gif" />

Since many HTML editor programs are designed to use relative paths (it is the safest approach for them), this may lead to consistent problems of this type. For best results, do not use WYSIWYG editors on the FDSE template files, but if you must do that, just manually edit the templates in a text editor later to correct for the paths.

Incidentally, to include the butterfly image in the line_listing.txt template, you would also code it for a subdirectory depth of two, the same as for header.htm, since in both cases the user's browser will see the HTML streaming from the search.pl file at that depth.
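
The rule "count the depth of the page the browser requested, not the depth of the template" can be written down as a small Python sketch. The helper name is hypothetical, and the template URL is included only for comparison, since the browser never actually requests it.

```python
from urllib.parse import urljoin, urlsplit

SEARCH_URL = "http://www.mysite.tld/cgi-bin/search/search.pl"
TEMPLATE_URL = ("http://www.mysite.tld/cgi-bin/search/"
                "searchdata/templates/header.htm")


def relative_to_root(page_url, filename):
    """Build a '../' path from the requested page up to the site root."""
    # Each slash after the leading one marks one subdirectory level.
    depth = urlsplit(page_url).path.count("/") - 1
    return "../" * depth + filename


# What the browser requests is search.pl, so two levels is the right answer.
src = relative_to_root(SEARCH_URL, "butterfly.gif")
print(src)                       # ../../butterfly.gif
print(urljoin(SEARCH_URL, src))  # http://www.mysite.tld/butterfly.gif

# Counting from the template's own location gives the four-level path,
# which is only meaningful if header.htm itself were the requested page.
print(relative_to_root(TEMPLATE_URL, "butterfly.gif"))  # ../../../../butterfly.gif
```

The same two-level result applies to line_listing.txt and every other template, because the page URL passed in is always that of search.pl.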

Related problems

Similarly, it is not permissible to add an image without path information, as with:

<img src="butterfly.gif" />

and to then add the actual image file to the templates subfolder:

index.html
cgi-bin/
	search/
		search.pl
		searchdata/
			templates/
				header.htm    <- HTML file
				footer.htm
				butterfly.gif <- local image
				english/
					line_listing.txt

This approach just won't work because the browser can't "see" into the searchdata/templates area and wouldn't know to look there anyway.

Using the above image tag would also fail with the image placed at the search.pl level:

index.html
cgi-bin/
	search/
		search.pl
		butterfly.gif <- local image
		searchdata/
			templates/
				header.htm    <- HTML file
				footer.htm
				english/
					line_listing.txt

This second approach is correct with regard to relative paths, but it would fail anyway because most web servers are configured not to serve non-CGI content from within the /cgi-bin/. In this case the browser request for /cgi-bin/search/butterfly.gif would typically return "403 Forbidden" instead of "404 Not Found", but the end user would still see the same broken image on the results page.


Using WYSIWYG Editors

Symptoms:

Images do not render correctly. Tables are offset. Output may appear blank.

Background:

There are several problems with using WYSIWYG editors with dynamic content, but the main problem is that the FDSE templates are HTML fragments, not HTML files. The WYSIWYG editors do not realize this. They treat each fragment as a self-contained file and they close all tags.

Here is a very brief example of how FDSE could be customized to constrain all of its output to a 600-pixel-width table. The next three blocks of quoted HTML are examples of the correct approach. It is done by setting "header.htm" to:

<html>
<head><title>FDSE</title></head>
<body>
<table width="600" align="center"><tr><td>

The template "footer.htm" is correspondingly set to:

</td></tr></table></body></html>

That is, the tag sequence table, table-row, table-data is initiated in header.htm and is closed in footer.htm, and thus all script output which appears between those two templates is constrained to the table, of width 600 pixels. The concept of tags which span the two files is central to this approach.

This is a clean, effective solution. The resulting output of search.pl is valid HTML as in:

<html>
<head><title>FDSE</title></head>
<body>
<table width="600" align="center"><tr><td>
Search results:
Your search for <i>foo</i> found 12 documents of 1200 searched.
1. results
2. results
...
</td></tr></table></body></html>

Now consider the case where a WYSIWYG editor is used on the templates. The next three quoted blocks contain examples of the incorrect HTML that is generated by the editor. Assume that, despite your best intentions, the editor refuses to allow a hanging open TABLE tag in "header.htm", and so it sets it to:

<html>
<head><title>FDSE</title></head>
<body>
<table width="600" align="center"><tr><td><br /></td></tr></table>
</body></html>

In "footer.htm", the stand-alone closing tags are either stripped, or matching open tags are created for them:

<html><head></head><body><table><tr><td></td></tr></table></body></html>

Now, when a search is performed, the output of search.pl is this complete mess:

<html>
<head><title>FDSE</title></head>
<body>
<table width="600" align="center"><tr><td><br /></td></tr></table>
</body></html>
Search results:
Your search for <i>foo</i> found 12 documents of 1200 searched.
1. results
2. results
...
<html><head></head><body><table><tr><td></td></tr></table></body></html>

The above is a completely invalid chunk of HTML. There are duplicate HTML, HEAD and BODY tags, and the centered table that was intended will not work. To make matters worse, web browsers are excellent at doing the right thing, and so the above output will seem to work about 85% of the time. There will be some layout and browser-compatibility problems, but for the most part it will work. The webmaster may think therefore that he is 15% away from being perfect, but in reality the entire template system will need to be re-written to get it right in all browsers.
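
The difference between the two cases can be made concrete with a small Python sketch. It assumes, for illustration only, that the final page is simply the header fragment, the script output, and the footer fragment joined end to end; the function names are hypothetical and this is not FDSE's actual code.

```python
import re


def assemble(header, body, footer):
    """Join the fragments in the order the browser receives them."""
    return header + body + footer


def count_open_tags(html, tag):
    """Count opening tags of the given name (closing tags are not counted)."""
    return len(re.findall(rf"<{tag}\b", html, flags=re.IGNORECASE))


BODY = "Search results:\n1. results\n2. results\n"

# Hand-written fragments: tags opened in the header are closed in the footer.
good = assemble(
    '<html><head><title>FDSE</title></head><body>'
    '<table width="600" align="center"><tr><td>\n',
    BODY,
    '</td></tr></table></body></html>',
)

# WYSIWYG output: each fragment was "completed" into a whole document.
bad = assemble(
    '<html><head><title>FDSE</title></head><body>'
    '<table width="600" align="center"><tr><td><br /></td></tr></table>'
    '</body></html>\n',
    BODY,
    '<html><head></head><body><table><tr><td></td></tr></table></body></html>',
)

for name in ("html", "body", "table"):
    print(name, count_open_tags(good, name), count_open_tags(bad, name))
# html 1 2
# body 1 2
# table 1 2
```

A count above one for html or body in the assembled output is a quick sign that an editor has closed the fragments on its own.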

Solution:

FDSE ships with default templates which just display all results on a neutral white background. You may want to just keep the defaults; they will match fairly well with any site.

If you need to make minor changes -- like red text on a black background -- you should just risk a single, text-only edit to the style.inc template file. If at all possible, avoid using a WYSIWYG editor on the template files.

If you must make major changes, try this:

  1. Use your WYSIWYG editor to create a blank page at the root of your site, like http://www.mysite.tld/blank.html. Have this page be complete with your title, styles, header, navigational sidebar, and footer. Have a big center table cell, which is empty, where your content normally goes. Type "script-output" in this table cell but otherwise leave it empty.

  2. Close the WYSIWYG editor and open the resulting HTML file in a text-only editor. Find the string "script-output". Cut and paste everything from the start of the file down to the beginning of the string, and insert it into your "header.htm" file. Cut and paste everything after "script-output", and through to the end of the file, into your "footer.htm" template.

  3. Return to the "header.htm" file. If your editor hasn't automatically added one, then manually add a header tag of <base href="http://www.mysite.tld/" />. This will help ensure that all inlined images and styles and links work properly later on. See the section "Using the <base href="..." /> tag" above, though, for a small source code edit you'll need to make to search.pl when using a BASE HREF tag.

  4. Save changes and view search.pl. If you're lucky, it will give you the proper results.
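
Step 2 above can also be done mechanically. The following Python sketch splits a page at the "script-output" marker into the two template fragments. The function name is hypothetical, and reading blank.html and writing header.htm and footer.htm are left to you, since file locations vary by installation.

```python
MARKER = "script-output"


def split_page(page_html, marker=MARKER):
    """Split a complete page into (header, footer) around the marker text.

    The marker itself is dropped; it only shows where search.pl's own
    output will be streamed between the two fragments.
    """
    header, found, footer = page_html.partition(marker)
    if not found:
        raise ValueError(f"marker {marker!r} not found in page")
    return header, footer


page = ("<html><body><table><tr><td>"
        "script-output"
        "</td></tr></table></body></html>")

header, footer = split_page(page)
print(header)  # <html><body><table><tr><td>
print(footer)  # </td></tr></table></body></html>
```

Because the split is done on the raw text, the tags left open in the header are exactly the ones the footer closes, which is the property the WYSIWYG editor tends to destroy.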

Equal rights for WYSIWYG users

The official position of the author is that you don't need to know HTML to use this script, but you need to know HTML if you want to customize the HTML.

And, since it is a free country, you don't even need to know HTML in order to try to customize it. But, if you're not proficient, then you're going to have problems and you should budget for that emotionally. If you need outside assistance with your new HTML skills, then you should seek it elsewhere.

People who are completely dependent on WYSIWYG are treated the same as those who are masters of hand-coding HTML: neither group is given help on the basics.


    "Common problems when customizing HTML"
    http://www.xav.com/scripts/search/help/1156.html