Customizing HTML: Parsing Server-Side Includes (SSI)
Include files are excellent tools for managing web content. FDSE provides limited support for them.
As an example of include support, the "header.htm" template for each language uses an SSI call to inline the text of the "style.inc" file.
SSI parsing is done with the PrintTemplate function, which will attempt to properly handle include files. Because PrintTemplate is ignorant of the web server's mappings between URL paths and file system paths, this code is somewhat error prone. However, I believe it will function for the majority of users and those users will find it very helpful.
Below are acceptable server-side include formats. These must match exactly - no freedom in whitespace is allowed:
<p>The script was modified on: <!--#echo var="LAST_MODIFIED" -->.</p>
<p>Hello world: <!--#include virtual="file.txt" -->.</p>
<p>Hello world: <!--#include file="file.txt" -->.</p>
Handling of "echo var" SSI calls is modeled after the Apache mod_include specification; see http://httpd.apache.org/docs/mod/mod_include.html. Vars supported are DATE_GMT, DATE_LOCAL, LAST_MODIFIED, DOCUMENT_URI, DOCUMENT_NAME, and all standard environment variables (SERVER_SOFTWARE, HTTP_USER_AGENT, HTTP_REFERER, REMOTE_ADDR, etc.)
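To illustrate how strict the matching is, here is a minimal sketch in Python. It is not FDSE's actual Perl code. The pattern and the name find_ssi_calls are assumptions made for illustration; they accept only the three forms shown above, with exactly one space after the element name and one space before "-->".

```python
import re

# Hypothetical pattern mirroring the three accepted forms exactly:
# no optional whitespace anywhere.
SSI_PATTERN = re.compile(
    r'<!--#(echo var|include virtual|include file)="([^"]*)" -->'
)

def find_ssi_calls(html):
    """Return (directive, argument) pairs for every strictly formatted SSI call."""
    return [(m.group(1), m.group(2)) for m in SSI_PATTERN.finditer(html)]

page = (
    '<p>Modified: <!--#echo var="LAST_MODIFIED" -->.</p>\n'
    '<p>Hello: <!--#include virtual="file.txt" -->.</p>\n'
    '<p>Skipped: <!--#include file="file.txt"-->.</p>\n'  # no space before "-->"
)
print(find_ssi_calls(page))
# [('echo var', 'LAST_MODIFIED'), ('include virtual', 'file.txt')]
```

The third call is skipped only because the space before "-->" is missing, which is the kind of whitespace variation the rule above rejects.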
The two "include" examples above tell PrintTemplate to replace the SSI call with the literal contents of "file.txt". Suppose the current working directory is "./searchdata" (or whatever $DataFilesDir is), and the language is set to "english". PrintTemplate searches for the file "file.txt" in this order:
"./searchdata/templates/english/file.txt"
"./searchdata/templates/english/../file.txt"
"./searchdata/templates/english/../../file.txt"
"./searchdata/templates/english/../../../file.txt"
"./searchdata/templates/english/../../../../file.txt"
...
Up to 12 parent paths will be searched. Note that this is equivalent to searching:
"./searchdata/templates/english/file.txt"
"./searchdata/templates/file.txt"
"./searchdata/file.txt"
"./file.txt"
"../file.txt"
...
If "file.txt" is found, its contents will be read and inserted into the document where the SSI call was, and the search process will stop. The text contents will also be recursively searched for any SSI calls, and PrintTemplate will attempt to resolve those as well. Any %replace_values will also be handled.
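The search order described above can be sketched as follows. This is a Python illustration under stated assumptions, not the script's own Perl code. The names candidate_paths and find_include are hypothetical, and "up to 12 parent paths" is read here as the template folder plus 12 parent levels.

```python
import os.path
import posixpath

def candidate_paths(data_dir, language, name, max_parents=12):
    """List the places to look for an include file, nearest folder first."""
    base = posixpath.join(data_dir, "templates", language)
    paths = []
    for level in range(max_parents + 1):
        # level 0 is the template folder itself; each further level adds one ".."
        raw = posixpath.join(base, *([".."] * level), name)
        paths.append(posixpath.normpath(raw))
    return paths

def find_include(data_dir, language, name, exists=os.path.isfile):
    """Return the first candidate that exists, or None. The search stops at the first hit."""
    for path in candidate_paths(data_dir, language, name):
        if exists(path):
            return path
    return None

for path in candidate_paths("./searchdata", "english", "file.txt")[:5]:
    print(path)
# searchdata/templates/english/file.txt
# searchdata/templates/file.txt
# searchdata/file.txt
# file.txt
# ../file.txt
```

Normalizing each path makes the equivalence between the two listings visible: every added ".." simply climbs one folder closer to, and then above, the current working directory.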
There is a risk of infinite looping: for example, "foo.txt" includes "bar.txt", which includes "foo.txt", which includes "bar.txt", and so on. To avoid this, once PrintTemplate has included a file named "file.txt", it will not include another file with that name, even if the second one is in a different directory. PrintTemplate is generally ignorant of its own absolute path, so it cannot resolve the absolute path of the target file; since it never wants to risk including the same file twice, it errs on the side of caution.
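The loop guard can be pictured with a small Python sketch. It is illustrative only and not FDSE's implementation. The function expand and the in-memory "files" dictionary (standing in for the file system, keyed by bare file name) are assumptions, as is the choice to leave a refused SSI call untouched in the output.

```python
import posixpath
import re

INCLUDE = re.compile(r'<!--#include (?:virtual|file)="([^"]*)" -->')

def expand(text, files, seen=None):
    """Inline includes recursively, refusing any file name already included."""
    if seen is None:
        seen = set()

    def replace(match):
        # Only the bare file name is compared, so "a/file.txt" and
        # "b/file.txt" count as the same file.
        name = posixpath.basename(match.group(1))
        if name in seen or name not in files:
            return match.group(0)  # assumption: leave the call as-is
        seen.add(name)
        return expand(files[name], files, seen)

    return INCLUDE.sub(replace, text)

files = {
    "foo.txt": 'foo[<!--#include file="bar.txt" -->]',
    "bar.txt": 'bar[<!--#include file="foo.txt" -->]',
}
print(expand('<!--#include file="foo.txt" -->', files))
# foo[bar[<!--#include file="foo.txt" -->]]
```

The second request for "foo.txt" is refused, so the expansion ends after two levels instead of looping. The cost of comparing by name alone is that two distinct files with the same name in different folders cannot both be included.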
SSI parsing will fail under these scenarios:
- As a security precaution, any included file must have an extension, and the extension must be one of: (txt|htm|html|shtml|stm|inc). Attempting to include any other file like "/etc/passwd" or "passwords.asp" will fail with an error. This is an unfortunate divergence between PrintTemplate's behavior and the mod_include specification.
- SSI elements "config", "exec", "fsize", "flastmod", "printenv", "set" are ignored (passed through to output - other filters may capture and execute them). Only "include" and "echo" are handled.
- Includes will cause the physical file contents to be included, not the HTTP output of the file. This is an unfortunate divergence between PrintTemplate's behavior and the mod_include specification. Thus, some users might expect <!--#include virtual="/ads.stm" --> to include the HTML of an advertisement, but instead it will include the source code of the "ads.stm" program. Thanks to the text/html file extension restrictions above, this should result in at most an annoyance rather than a security breach.
- When $DataFilesDir is somewhere other than your main web directory, attempts to include files in your main web directory will probably fail, since searching is done only in the folder $DataFilesDir and directly above it.
- On a related note, includes will fail when the URL path has a fundamentally different name than its underlying file system path. For example, <!--#include virtual="/cgi-bin/foo.cgi" --> will fail if the web server maps "/cgi-bin/" to "e:\webroot\bin". The difference between literal folder names "cgi-bin" and "bin" prevents any form of "../../cgi-bin/foo.cgi" from mapping to "~/bin/foo.cgi". Web authors who understand this can easily work around it by writing their include statements appropriately.
- Unlike the web server software, PrintTemplate will always resolve paths relative to its own current working directory using the above algorithm, rather than relative to the include file being parsed. This is an unfortunate divergence between PrintTemplate's behavior and the mod_include specification. Thus, if a user includes "/foo/bar.txt", and then bar.txt contains an include of "abc.txt", PrintTemplate will begin the search for "abc.txt" back in "./searchdata/templates/english/", rather than in the same subfolder where "bar.txt" was found. It is likely that PrintTemplate will never locate the actual file "abc.txt" in this scenario, because it is now ignorant of the fact that it needs to look in subfolder "/foo/".
- On systems where $0 returns a basename rather than an absolute path, "echo var" of LAST_MODIFIED and DOCUMENT_URI will probably fail.
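The extension restriction from the list above can be sketched in Python as follows. This is an illustration rather than the script's own check. The name is_includable is hypothetical, and treating the match as case-sensitive is an assumption, since the text does not say how upper-case extensions are handled.

```python
import re

# The allowed extensions are taken from the list above; the case-sensitive
# comparison is an assumption of this sketch.
ALLOWED_EXTENSION = re.compile(r'\.(txt|htm|html|shtml|stm|inc)\Z')

def is_includable(name):
    """True only if the file name ends in one of the allowed extensions."""
    return ALLOWED_EXTENSION.search(name) is not None

for name in ("style.inc", "ads.stm", "/etc/passwd", "passwords.asp"):
    print(name, is_includable(name))
# style.inc True
# ads.stm True
# /etc/passwd False
# passwords.asp False
```

A file with no extension at all, such as "/etc/passwd", is rejected by the same test as one with a disallowed extension.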
The failures listed above arise because:
- this Perl script lacks information about its own absolute path
- this Perl script is ignorant of the virtual path -to- physical path mappings of the web server
- this script is ignorant of filename-executable mappings of the web server, and may not have sufficient privileges/power to invoke them in any case
These reasons are fundamental and cannot be (generally) overcome with current software. The only way to overcome this is to have CGI execution operate beneath the include-processing filter. Currently web servers seem to send content either to the include parser or to the Perl parser, not both.
"Customizing HTML: Parsing Server-Side Includes (SSI)"
http://www.xav.com/scripts/search/help/1024.html