How to use FDSE to search Russian text

This help article describes how to configure FDSE to best search Russian text encoded with the iso-8859-5 character set.

Note that FDSE officially supports only Latin character sets. I am not familiar with Russian; the information presented here is based on the experiences of those who have translated the product into Russian. Additional comments and corrections are welcome, but I may not be able to answer questions because I don't understand the language. - Zoltan

Follow these steps to search in Russian:

  1. Make sure you are using FDSE version 2.0.0.0058 or newer. That was the first version to ship with a partial Russian translation.

    Note that only 12% of strings are currently translated from English to Russian (most of them in the public interface). For more information on how to translate additional strings, see "Translating the user interface".

  2. From the Admin Page, choose "User Interface" from the navigation bar. From there, scroll down to "Language and Locale Settings" and select the language labeled Русский / ru.

  3. Edit the library file "search/searchmods/common.pl". There you will find a block of code that begins like this:

    my %extended_charset = (
    	138 => [  's',  'S', chr(154),    0, 'Scaron'],
    	140 => [ 'oe', 'OE', chr(156),    0, 'OE ligature'],
    	142 => [  'z',  'Z', chr(158),    0, ''],
    	154 => [  's',  's',    0,    0, 'scaron'],
    	156 => [ 'oe', 'oe',    0,    0, 'oe ligature'],
    	158 => [  'z',  'z',    0,    0, ''],
    	159 => [  'y',  'Y', chr(255),    0, ''],
    
    	...
    
    	253 => [  'y',  'y',    0,    0, 'Small y, acute accent'],
    	254 => [  'p',  'p',    0,    0, 'Small thorn, Icelandic'],
    	255 => [  'y',  'y',    0,    0, 'Small y, diaeresis / umlaut'],
    	);

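    This is the mapping hash that decides which characters to keep and which to discard. The default hash is configured for Latin text. The replacement below pairs bytes 192-223 with 224-255 and 168 with 184, which is the windows-1251 layout (iso-8859-5 places its capital letters at 176-207), so confirm which encoding your documents actually use. To support Cyrillic text, replace this hash with: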

    my %extended_charset = (
    	138 => [  's',  'S', chr(154),    0, 'Scaron'],
    	140 => [ 'oe', 'OE', chr(156),    0, 'OE ligature'],
    	142 => [  'z',  'Z', chr(158),    0, ''],
    	154 => [  's',  's',    0,    0, 'scaron'],
    	156 => [ 'oe', 'oe',    0,    0, 'oe ligature'],
    	158 => [  'z',  'z',    0,    0, ''],
    	159 => [  'y',  'Y', chr(255),    0, ''],
    	160 => [   -1,   -1,   -1,   -1, 'Nonbreaking space'],
    	161 => [   -1,   -1,   -1,   -1, 'Inverted exclamation'],
    	162 => [   -1,   -1,   -1,   -1, 'Cent sign'],
    	163 => [   -1,   -1,   -1,   -1, 'Pound sterling'],
    	164 => [   -1,   -1,   -1,   -1, 'General currency sign'],
    	165 => [  chr(180), 0, chr(180), 0, ''],
    	166 => [   -1,   -1,   -1,   -1, 'Broken vertical bar'],
    	167 => [   -1,   -1,   -1,   -1, 'Section sign'],
    	168 => [  chr(184), 0, chr(184), 0, ''],
    	169 => [   -1,   -1,   -1,   -1, 'Copyright'],
    	170 => [  chr(186), 0, chr(186), 0, ''],
    	171 => [   -1,   -1,   -1,   -1, 'Left angle quote, guillemet left'],
    	172 => [   -1,   -1,   -1,   -1, 'Not sign'],
    	173 => [   -1,   -1,   -1,   -1, 'Soft hyphen'],
    	174 => [   -1,   -1,   -1,   -1, 'Registered trademark'],
    	175 => [  chr(191), 0, chr(191), 0, ''],
    	176 => [   -1,   -1,   -1,   -1, 'Degree sign'],
    	177 => [   -1,   -1,   -1,   -1, 'Plus or minus'],
    	178 => [  chr(179), 0, chr(179), 0, ''],
    	179 => [  0, 0, 0, 0, ''],
    	180 => [  0, 0, 0, 0, ''],
    	181 => [   -1,   -1,   -1,   -1, 'Micro sign'],
    	182 => [   -1,   -1,   -1,   -1, 'Paragraph sign'],
    	183 => [   -1,   -1,   -1,   -1, 'Middle dot'],
    	184 => [   0,   0,   0,   0, ''],
    	185 => [   -1,   -1,   -1,   -1, 'Superscript 1'],
    	186 => [   0,   0,   0,   0, ''],
    	187 => [   -1,   -1,   -1,   -1, 'Right angle quote, guillemet right'],
    	188 => [   -1,   -1,   -1,   -1, 'Fraction one-fourth'],
    	189 => [   -1,   -1,   -1,   -1, 'Fraction one-half'],
    	190 => [   -1,   -1,   -1,   -1, 'Fraction three-fourths'],
    	191 => [   0,   0,   0,   0, ''],
    	192 => [  chr(224), 0, chr(224), 0, ''],
    	193 => [  chr(225), 0, chr(225), 0, ''],
    	194 => [  chr(226), 0, chr(226), 0, ''],
    	195 => [  chr(227), 0, chr(227), 0, ''],
    	196 => [  chr(228), 0, chr(228), 0, ''],
    	197 => [  chr(229), 0, chr(229), 0, ''],
    	198 => [  chr(230), 0, chr(230), 0, ''],
    	199 => [  chr(231), 0, chr(231), 0, ''],
    	200 => [  chr(232), 0, chr(232), 0, ''],
    	201 => [  chr(233), 0, chr(233), 0, ''],
    	202 => [  chr(234), 0, chr(234), 0, ''],
    	203 => [  chr(235), 0, chr(235), 0, ''],
    	204 => [  chr(236), 0, chr(236), 0, ''],
    	205 => [  chr(237), 0, chr(237), 0, ''],
    	206 => [  chr(238), 0, chr(238), 0, ''],
    	207 => [  chr(239), 0, chr(239), 0, ''],
    	208 => [  chr(240), 0, chr(240), 0, ''],
    	209 => [  chr(241), 0, chr(241), 0, ''],
    	210 => [  chr(242), 0, chr(242), 0, ''],
    	211 => [  chr(243), 0, chr(243), 0, ''],
    	212 => [  chr(244), 0, chr(244), 0, ''],
    	213 => [  chr(245), 0, chr(245), 0, ''],
    	214 => [  chr(246), 0, chr(246), 0, ''],
    	215 => [  chr(247), 0, chr(247), 0, ''],
    	216 => [  chr(248), 0, chr(248), 0, ''],
    	217 => [  chr(249), 0, chr(249), 0, ''],
    	218 => [  chr(250), 0, chr(250), 0, ''],
    	219 => [  chr(251), 0, chr(251), 0, ''],
    	220 => [  chr(252), 0, chr(252), 0, ''],
    	221 => [  chr(253), 0, chr(253), 0, ''],
    	222 => [  chr(254), 0, chr(254), 0, ''],
    	223 => [  chr(255), 0, chr(255), 0, ''],
    	224 => [  0, 0, 0, 0, ''],
    	225 => [  0, 0, 0, 0, ''],
    	226 => [  0, 0, 0, 0, ''],
    	227 => [  0, 0, 0, 0, ''],
    	228 => [  0, 0, 0, 0, ''],
    	229 => [  0, 0, 0, 0, ''],
    	230 => [  0, 0, 0, 0, ''],
    	231 => [  0, 0, 0, 0, ''],
    	232 => [  0, 0, 0, 0, ''],
    	233 => [  0, 0, 0, 0, ''],
    	234 => [  0, 0, 0, 0, ''],
    	235 => [  0, 0, 0, 0, ''],
    	236 => [  0, 0, 0, 0, ''],
    	237 => [  0, 0, 0, 0, ''],
    	238 => [  0, 0, 0, 0, ''],
    	239 => [  0, 0, 0, 0, ''],
    	240 => [  0, 0, 0, 0, ''],
    	241 => [  0, 0, 0, 0, ''],
    	242 => [  0, 0, 0, 0, ''],
    	243 => [  0, 0, 0, 0, ''],
    	244 => [  0, 0, 0, 0, ''],
    	245 => [  0, 0, 0, 0, ''],
    	246 => [  0, 0, 0, 0, ''],
    	247 => [  0, 0, 0, 0, ''],
    	248 => [  0, 0, 0, 0, ''],
    	249 => [  0, 0, 0, 0, ''],
    	250 => [  0, 0, 0, 0, ''],
    	251 => [  0, 0, 0, 0, ''],
    	252 => [  0, 0, 0, 0, ''],
    	253 => [  0, 0, 0, 0, ''],
    	254 => [  0, 0, 0, 0, ''],
    	255 => [  0, 0, 0, 0, ''],
    	);

  4. After updating the extended charset map, you should rebuild all of your index files. You will then be able to search Russian text from a Russian interface.
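
Because the replacement table in step 3 is only a list of byte values, whether it is correct depends on which 8-bit encoding your documents use. The following is a minimal sketch, not part of FDSE, that uses Perl's core Encode module to count how many of the table's pairs are real uppercase/lowercase pairs under a given encoding. It assumes that the first column of each row holds the lowercase form of the byte, which is an inference from the table and not documented here; the helper name count_valid_pairs is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Byte pairs copied from the Cyrillic table in step 3: byte => first column.
# Assumption: the first column is the lowercase form of the byte.
my %pairs = (165 => 180, 168 => 184, 170 => 186, 175 => 191, 178 => 179);
$pairs{$_} = $_ + 32 for 192 .. 223;    # 192 => 224 ... 223 => 255

# Count the pairs that are genuine upper/lower case pairs once both
# bytes are decoded from the named encoding into Unicode characters.
sub count_valid_pairs {
    my ($encoding) = @_;
    my $valid = 0;
    for my $upper (keys %pairs) {
        my $u = decode($encoding, chr($upper));
        my $l = decode($encoding, chr($pairs{$upper}));
        $valid++ if $u ne $l && lc($u) eq $l;
    }
    return $valid;
}

for my $encoding ('cp1251', 'iso-8859-5') {
    printf "%-10s %d of %d pairs are valid case pairs\n",
        $encoding, count_valid_pairs($encoding), scalar keys %pairs;
}
```

An encoding for which every pair is valid matches the layout the table assumes. To check a different encoding, add its Encode name to the list in the final loop.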


    "How to use FDSE to search Russian text"
    http://www.xav.com/scripts/search/help/1168.html