Indexing MP3 files using built-in runtime conversion
This document describes how to index MP3 files by automatically extracting the text from them. FDSE contains internal Perl routines for extracting text from MP3 files, so no separate applications need to be installed.
Steps to enable:
-
Make sure you're running FDSE version 2.0.0.0064 or newer.
-
Go to the FDSE Admin Page in your browser. Choose General Settings, then Binary Converters - Setup and Test.
-
The binary converters page will list all known converters and their enabled/disabled status. Confirm that MP3-Internal is listed as enabled.
-
Next, configure FDSE to discover MP3 files. Go to Admin Page => General Settings => Crawler: Ignore Links To and remove "mp3" from the extension list, if it is present there. Then go to General Settings => Ext and add "mp3" to that extension list, if it is not already there. Finally, return to "Binary Converters - Setup and Test" and click the "cross-reference" link. That will verify that your general settings match the binary converters that are loaded.
-
Next, create a realm named "Binary Conversion Test" which will include the MP3 files that you want to test. Don't rebuild the realm - just create it.
-
Return to the "Binary Converters - Setup and Test" page and reload it. There will now be a link labeled "index all files". Click it to index. When you start a rebuild from this page, the converter will include debug output, so you can see what is going on. Confirm that text is properly extracted from your MP3 files.
The MP3-Internal converter extracts MP3 ID3v1 tags from the end of the file, including title, artist, album, year, and comment. It will fail if no ID3v1 tags are present. The tags are arranged as an HTML block:
<head>
<title>$artist - $title.mp3</title>
<meta name="description" content="From "$album", $year. "$title" by $artist. $comment." />
</head>
$album $year $title $artist $comment
-
Finally, once testing is done, you may delete the "Binary Conversion Test" realm and begin to index MP3 files normally.
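The ID3v1 extraction described in the testing step above can be illustrated with a short sketch. It is written in Python rather than FDSE's internal Perl, and it is not FDSE's actual code: the function names read_id3v1 and to_html are hypothetical. It assumes the standard ID3v1 layout (a 128-byte block at the end of the file that begins with "TAG", followed by a 30-byte title, 30-byte artist, 30-byte album, 4-byte year, 30-byte comment, and 1-byte genre) and arranges the fields in the HTML layout shown above. Escaping the values for HTML is an addition of this sketch, not something the text above confirms.

```python
import html

ID3V1_SIZE = 128  # an ID3v1 tag occupies the last 128 bytes of the file


def read_id3v1(data: bytes):
    """Return a dict of ID3v1 fields, or None if no ID3v1 tag is present."""
    if len(data) < ID3V1_SIZE:
        return None
    block = data[-ID3V1_SIZE:]
    if block[:3] != b"TAG":
        return None

    def field(start: int, length: int) -> str:
        raw = block[start:start + length]
        # Fields are fixed-width and padded with NUL bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": field(3, 30),
        "artist": field(33, 30),
        "album": field(63, 30),
        "year": field(93, 4),
        "comment": field(97, 30),
    }


def to_html(tag: dict) -> str:
    """Arrange the tag fields as an HTML block (hypothetical helper)."""
    description = 'From "{album}", {year}. "{title}" by {artist}. {comment}.'.format(**tag)
    safe = {key: html.escape(value) for key, value in tag.items()}
    return (
        "<head>\n"
        "<title>{artist} - {title}.mp3</title>\n"
        '<meta name="description" content="{description}" />\n'
        "</head>\n"
        "{album} {year} {title} {artist} {comment}\n"
    ).format(description=html.escape(description, quote=True), **safe)


def make_sample_mp3() -> bytes:
    """Build fake audio bytes followed by a well-formed ID3v1 tag (test data only)."""
    tag = (
        b"TAG"
        + b"Song".ljust(30, b"\x00")
        + b"Artist".ljust(30, b"\x00")
        + b"Album".ljust(30, b"\x00")
        + b"1999"
        + b"Live take".ljust(30, b"\x00")
        + b"\x00"  # genre byte, unused here
    )
    return b"\xff" * 1000 + tag


if __name__ == "__main__":
    fields = read_id3v1(make_sample_mp3())
    print(to_html(fields))
```

A file without the trailing "TAG" marker makes read_id3v1 return None, which corresponds to the failure case described above for files with no ID3v1 tags.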
Known problems to keep in mind if you have trouble:
-
The "Max Characters: File" setting causes most documents to be read only through the first 64,000 characters. This is smaller than many MP3 files, and because the ID3v1 tags sit at the end of the file, sending a truncated MP3 file to the converter will cause it to fail. FDSE works around this problem in most cases by ignoring the "Max Characters: File" setting for files that have the ".mp3" extension. However, if you are retrieving MP3 files from the web and the document URL does not end in ".mp3", you may still experience this problem. You can work around it by setting "Max Characters: File" to 0 to bypass truncation, or by setting it to a sufficiently large value.
-
The web crawler attempts MP3-to-text conversion only on documents that return the Content-Type "audio/mpeg-3" or "audio/mpeg". If your MP3 files do not return an accurate Content-Type header, they will not be processed properly.
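Both problems above can be seen in a small sketch. It is Python, not FDSE's Perl, and the names truncate, has_id3v1, and is_mp3_content_type are hypothetical. The 64,000-character limit, the "0 bypasses truncation" behavior, and the two accepted Content-Type values come from the text above; ignoring header parameters and letter case when comparing the Content-Type is an assumption of this sketch, not documented FDSE behavior.

```python
MP3_CONTENT_TYPES = {"audio/mpeg-3", "audio/mpeg"}


def is_mp3_content_type(header_value: str) -> bool:
    """Check a Content-Type header value against the two accepted types."""
    # Assumption: drop any parameters after ';' and compare case-insensitively.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in MP3_CONTENT_TYPES


def truncate(data: bytes, max_chars: int) -> bytes:
    """Keep only the first max_chars bytes; 0 means no truncation."""
    if max_chars == 0:
        return data
    return data[:max_chars]


def has_id3v1(data: bytes) -> bool:
    """An ID3v1 tag is a 128-byte block at the end of the file starting with 'TAG'."""
    return len(data) >= 128 and data[-128:-125] == b"TAG"


if __name__ == "__main__":
    # Fake MP3: 200,000 bytes of audio followed by a minimal ID3v1 block.
    audio = b"\xff" * 200_000 + b"TAG" + b"\x00" * 125

    print(has_id3v1(audio))                    # full file
    print(has_id3v1(truncate(audio, 64_000)))  # truncated file loses the tag
    print(has_id3v1(truncate(audio, 0)))       # 0 bypasses truncation
    print(is_mp3_content_type("audio/mpeg"))
    print(is_mp3_content_type("application/octet-stream"))
```

Because the tag lives in the final 128 bytes, any limit shorter than the file cuts it off entirely, which is why raising the limit or setting it to 0 resolves the failure.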
See "Maintain HTML files about each non-written binary document" for another way to index MP3 files. This other method allows you to associate more text with each file.
"Indexing MP3 files using built-in runtime conversion"
http://www.xav.com/scripts/search/help/1183.html