Automatic submissions to the visitor-added URL form

FDSE allows visitors to add their own URLs to the search index, as described at How to allow visitor URL submissions.

Allowing automatic submissions

By default, the AnonAdd form for visitor-added submissions requires the following four parameters:

    Mode (always "AnonAdd")
    Realm
    URL
    EMAIL

FDSE creates an HTML form for manual submissions at /search.pl?Mode=AnonAdd. That form prompts for the fields shown above.

The submission process can be automated by creating the appropriate final URL, and requesting it directly:

/search.pl?Mode=AnonAdd&Realm=MyRealm&URL=http://mysite.tld/&EMAIL=me@mysite.tld
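
As a rough illustration of how such a final URL can be assembled, here is a minimal Python sketch. The function name build_anonadd_url is hypothetical and not part of FDSE; only the four parameter names and the /search.pl path come from the example above. Note that urlencode percent-encodes reserved characters, so the result looks slightly different from the literal URL above but carries the same values.

```python
from urllib.parse import urlencode


def build_anonadd_url(base, realm, url, email):
    """Return a direct-submission URL for the default AnonAdd form.

    Hypothetical helper: the parameter names mirror the example URL in
    the text; nothing here is taken from the FDSE source code.
    """
    query = urlencode({
        "Mode": "AnonAdd",
        "Realm": realm,
        "URL": url,
        "EMAIL": email,
    })
    return f"{base}?{query}"


submission = build_anonadd_url(
    "/search.pl", "MyRealm", "http://mysite.tld/", "me@mysite.tld"
)
print(submission)
```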

There are many software tools, called auto-submitters, which can easily submit thousands of URLs to such standardized forms using final URLs like the one above. Sometimes this is desirable; often it's not.


Preventing automatic submissions

Automatic submissions can quickly get out of hand. If there are too many, the server hosting FDSE may slow down.

To prevent automatic submissions, while still allowing manual submissions, follow these steps:

  1. Make sure you're running FDSE version 2.0.0.0068 or newer.

  2. Go to Admin Page => Filter Rules and select these options:

    (_) Accept all submissions.
    (*) Accept no more than [_10_] submissions during any 5-minute interval.
    [x] Use form-signature algorithm to help prevent automated submissions.

The second radio button places a hard limit on the number of submissions that can be made during a short time.

The final checkbox enables the form-signature algorithm, which causes the standard submission URL:

/search.pl?Mode=AnonAdd&Realm=MyRealm&URL=http://mysite.tld/&EMAIL=me@mysite.tld

to no longer work. Instead, FDSE will require a different parameter set with each submission. Only visitors who manually fill out the form will have the correct parameter set.


Implementing the "limited submission rate"

When an administrator limits FDSE to no more than X submissions during any five-minute interval, the "AnonAdd" interface checks the submission history as soon as it is launched. If the Xth most recent URL submission occurred less than five minutes ago, FDSE prints an error and advises the visitor to return later.

The code for this is in block LimitSubmitRate within sub anonadd_main in common_admin.pl.
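
The check described above can be sketched in Python as follows. This is not the LimitSubmitRate code itself; the class name, method names, and the in-memory storage are all assumptions made for illustration. The sketch only shows the idea of comparing the current time against the Xth most recent submission.

```python
from collections import deque

WINDOW_SECONDS = 5 * 60  # the five-minute interval from the Filter Rules page


class SubmitRateLimiter:
    """Hypothetical sketch: allow at most X submissions per five minutes."""

    def __init__(self, max_submissions):
        if max_submissions < 1:
            raise ValueError("max_submissions must be at least 1")
        self.max_submissions = max_submissions
        # Keeps only the X most recent submission times, oldest first.
        self.recent = deque(maxlen=max_submissions)

    def is_limited(self, now):
        """True if the Xth most recent submission is under 5 minutes old."""
        if len(self.recent) < self.max_submissions:
            return False  # fewer than X submissions recorded so far
        return now - self.recent[0] < WINDOW_SECONDS

    def record(self, now):
        """Remember the time of an accepted submission."""
        self.recent.append(now)
```

In this sketch, is_limited would be called when the AnonAdd page is launched, and record when a submission is accepted. Times are plain seconds, for example from time.time().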


Implementing the form-signature algorithm

Form-signature: creating the form

When an administrator selects "[x] Use form-signature algorithm to help prevent automated submissions", FDSE generates a random 30-character alphanumeric string. This is the server "private key". It is stored securely in the settings.pl file.

When users first arrive at the visitor-added URL page, FDSE generates the submit form. When using the form-signature algorithm, FDSE makes the form slightly different with each request.

To build the form, FDSE starts with the current 10-digit numeric timestamp. The timestamp and the 30-character server private key are combined to form five 8-character strings, each containing 2 timestamp digits followed by a 6-character slice of the private key. Each of the five strings is hashed using Perl crypt with a random salt, and the results are joined to form a 65-character "signature" (traditional crypt output is 13 characters, so 5 x 13 = 65). The signature is tied to its timestamp and can only be generated or verified using the server private key.
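
The layout of the signature can be illustrated with a short Python sketch. Several things here are assumptions rather than FDSE behavior: the pairing order (the i-th pair of timestamp digits with the i-th key slice), the function names, and above all the hash. SHA-256 is used as a stand-in for Perl crypt, truncated so that each piece keeps the 2-character salt plus 11 hash characters shape; the resulting values will not match anything FDSE produces.

```python
import hashlib
import hmac
import secrets
import string

SALT_CHARS = string.ascii_letters + string.digits


def split_into_chunks(timestamp, private_key):
    """Pair 2 timestamp digits with a 6-character key slice, five times."""
    if len(timestamp) != 10 or not timestamp.isdigit():
        raise ValueError("timestamp must be 10 digits")
    if len(private_key) != 30:
        raise ValueError("private key must be 30 characters")
    return [
        timestamp[2 * i:2 * i + 2] + private_key[6 * i:6 * i + 6]
        for i in range(5)
    ]


def stand_in_crypt(chunk, salt):
    """Stand-in for Perl crypt (NOT the real algorithm): salt + 11 hex chars."""
    digest = hashlib.sha256((salt + chunk).encode("utf-8")).hexdigest()
    return salt + digest[:11]


def make_signature(timestamp, private_key, salts=None):
    """Build the 65-character signature from five 13-character pieces."""
    chunks = split_into_chunks(timestamp, private_key)
    if salts is None:
        salts = [
            "".join(secrets.choice(SALT_CHARS) for _ in range(2))
            for _ in chunks
        ]
    return "".join(stand_in_crypt(c, s) for c, s in zip(chunks, salts))


def verify_signature(timestamp, private_key, submitted):
    """Recompute the signature using the salts embedded in the submission."""
    if len(submitted) != 65:
        return False
    salts = [submitted[13 * i:13 * i + 2] for i in range(5)]
    expected = make_signature(timestamp, private_key, salts)
    return hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))
```

Because each piece starts with its own salt, a verifier that holds the private key can recompute the same signature from the submitted timestamp, which is what makes the random salt workable.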

As a result, two hidden input fields are generated:

<input type="hidden" name="timestamp" value="1022334455" />
<input type="hidden" name="signature" value="65-character-string" />

Auto-submit tools cannot generate valid signatures for their own timestamps, because they lack the server private key. This prevents them from making direct submissions into the database. They need to use a more cumbersome, two-step approach. First they visit the AnonAdd page to learn the current timestamp and signature, and next they submit the appropriate parameters.

To make it more difficult for automated programs to scan these hidden fields, FDSE generates eight random, unique three-character strings. It uses four of these strings to create four hidden form fields designed to confuse automated HTML parsers, as follows. Two form fields are invalid; any submission containing either of them is automatically rejected. The other two form an "either-or" decoy, which tests the parser's ability to handle JavaScript-based form elements.

# full decoys:
<script>//<input type="hidden" name="$names[5]" value="$names[6]" /></script>
<!-- <input type="hidden" name="$names[6]" value="$names[5]" /> -->

# either-or decoy:
<script>document.write(
	'<input type="hidden" name="$names[7]" value="$names[8]" />');</script>
<noscript><input type="hidden" name="$names[8]" value="$names[7]" /></noscript>

In addition to these four decoy fields, the standard field names "URL", "EMAIL", "timestamp", and "signature" are replaced with the remaining four random three-character names. A single fixed field is used as a key to the eight random names:

<input type="hidden" name="keynames" value="24-character-string" />

To complicate parsing, all hidden form elements, including the decoy fields, are positioned randomly in the form.

The code for this is in block CreateFormSignature within sub anonadd_main in common_admin.pl.
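
The following Python sketch shows one way the random names, the "keynames" field, and the shuffled hidden elements could fit together. It is not the CreateFormSignature code. The lowercase-letter alphabet, the order of names inside "keynames" (URL, EMAIL, timestamp, signature, then the four decoy names), and both function names are assumptions.

```python
import html
import random
import string


def make_field_names(rng):
    """Return eight unique random three-character names (alphabet assumed)."""
    names = []
    while len(names) < 8:
        candidate = "".join(rng.choice(string.ascii_lowercase) for _ in range(3))
        if candidate not in names:
            names.append(candidate)
    return names


def build_hidden_fields(names, timestamp, signature, rng):
    """Return the hidden form elements, decoys included, in random order."""
    # Assumed order; url_n and email_n would name the visible inputs.
    url_n, email_n, ts_n, sig_n, d5, d6, d7, d8 = names
    keynames = "".join(names)  # 8 x 3 = 24 characters
    sig_value = html.escape(signature, quote=True)
    fields = [
        f'<input type="hidden" name="keynames" value="{keynames}" />',
        f'<input type="hidden" name="{ts_n}" value="{timestamp}" />',
        f'<input type="hidden" name="{sig_n}" value="{sig_value}" />',
        # Full decoys: never submitted by a real browser.
        f'<script>//<input type="hidden" name="{d5}" value="{d6}" /></script>',
        f'<!-- <input type="hidden" name="{d6}" value="{d5}" /> -->',
        # Either-or decoy: a browser submits exactly one of the two.
        "<script>document.write("
        f"'<input type=\"hidden\" name=\"{d7}\" value=\"{d8}\" />');</script>"
        f'<noscript><input type="hidden" name="{d8}" value="{d7}" /></noscript>',
    ]
    rng.shuffle(fields)
    return fields
```

Passing in a random.Random instance keeps the sketch easy to test with a fixed seed; a real form generator would more likely draw from the system's default random source.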

Form-signature: processing the form

When the form is submitted, FDSE first checks for the "keynames" field (failpoint 1). It uses that field's value to map the randomly named form elements back to their known roles.

FDSE then compares the timestamp to the current time. If the timestamp is more than 20 minutes old, the submission is rejected (failpoint 2).

FDSE then re-calculates the 65-character form signature based on the submitted timestamp and the FDSE private server key. If the signatures fail to match, FDSE will return an error (failpoint 3).

FDSE then checks whether either of the invalid decoy fields is populated, and fails if so (failpoints 4, 5). Next it checks the "either-or" decoy fields (failpoints 6, 7).

Once all tests are complete, FDSE then reassigns the URL and EMAIL form variable names, changing them back from their random field names. The submission is then handled normally.

The code for this is in block ValidateFormSignature within sub anonadd_main in common_admin.pl.
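
The sequence of checks can be sketched in Python as below. This is an illustration of the failpoint order only, not the ValidateFormSignature code. The name order inside "keynames", the exact meaning of failpoints 6 and 7 (here: exactly one either-or field must be present, and its value must equal the other field's name), and the sign callback are all assumptions. The callback stands for whatever recomputes the expected signature from the timestamp, the private key, and the salts carried in the submitted signature; demo_sign is a toy placeholder.

```python
import hmac

MAX_AGE_SECONDS = 20 * 60  # submissions older than 20 minutes are rejected


def validate_submission(form, private_key, now, sign):
    """Return (failpoint, fields); failpoint is 0 when every check passes."""
    keynames = form.get("keynames", "")
    if len(keynames) != 24:
        return 1, None
    names = [keynames[i:i + 3] for i in range(0, 24, 3)]
    # Assumed order of the eight names inside "keynames".
    url_n, email_n, ts_n, sig_n, bad_a, bad_b, either_a, either_b = names

    timestamp = form.get(ts_n, "")
    if not (timestamp.isascii() and timestamp.isdigit()):
        return 2, None
    if now - int(timestamp) > MAX_AGE_SECONDS:
        return 2, None

    submitted = form.get(sig_n, "")
    expected = sign(timestamp, private_key, submitted)
    if not hmac.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8")):
        return 3, None

    if form.get(bad_a):
        return 4, None
    if form.get(bad_b):
        return 5, None

    has_a, has_b = either_a in form, either_b in form
    if has_a == has_b:  # neither or both present (assumed meaning)
        return 6, None
    name, wanted = (either_a, either_b) if has_a else (either_b, either_a)
    if form[name] != wanted:
        return 7, None

    # All checks passed: restore the standard field names.
    return 0, {"URL": form.get(url_n, ""), "EMAIL": form.get(email_n, "")}


def demo_sign(timestamp, private_key, submitted):
    """Toy placeholder signer for the example; not a real signature."""
    return "sig-" + timestamp + "-" + private_key


sample_form = {
    "keynames": "aaabbbcccdddeeefffggghhh",
    "aaa": "http://mysite.tld/",
    "bbb": "me@mysite.tld",
    "ccc": "1022334455",
    "ddd": "sig-1022334455-secret",
    "ggg": "hhh",  # the JavaScript branch of the either-or decoy
}
print(validate_submission(sample_form, "secret", 1022334455 + 60, demo_sign))
```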

Form-signature: hacking this system

In order for an automated submission tool to work with this form, it must request the AnonAdd page periodically to extract a "form signature" which is valid for that 20-minute time window. It cannot generate valid form signatures on its own, since they depend in part on the secret 30-character server private key.

Because the auto-submitter must request the timestamp and signature, whose fieldnames are random, the auto-submitter is forced to correctly parse the entire HTML form. It must correctly handle all problematic invalid fields, and must correctly associate the randomly-named visible fields with the URL and EMAIL values. The hidden form fields are positioned at random places within the form to make it more difficult to parse.

Any auto-submit tool could parse the FDSE form correctly, since the FDSE source code and algorithms are published, but it would probably require a few hours of specialized development and testing. Because FDSE has insignificant market share in the search engine world, the authors of auto-submit tools are unlikely to invest time in creating special code for it. In short, it is assumed that auto-submitters will target FDSE only when it is in its default, computer-friendly mode with the form-signature algorithm disabled.

There is still a chance that an auto-submit program will choose to automate FDSE submissions even when the form-signature option is enabled. This is a known risk. The FDSE form-signature algorithm will be updated as necessary if unwanted auto-submit traffic continues to be a problem. The most popular anti-automation technique involves requiring human visitors to type out a string embedded in an image. While this approach is effective, it was not chosen (for now) because it is labor-intensive for end users, and because the addition of images would make the FDSE install more complex.



    "Automatic submissions to the visitor-added URL form"
    http://www.xav.com/scripts/search/help/1194.html