20130315-repair-fulltext-index-dictonary-of-postgresql


Summary

  • Repair the full-text search dictionaries of PostgreSQL on the production server

Commits

odba
ch.oddb.org

Index


Tokens

We have to skip the stop words "auf" and "den" as well as the blank (" ") and punctuation ("!") tokens.
(Goal of the update: these tokens should no longer be indexed.)

oddb.org.ruby193=# select * from ts_debug('default_german', 'En Güte auf den Berg!');
   alias   |    description    | token |        dictionaries         | dictionary  | lexemes 
-----------+-------------------+-------+-----------------------------+-------------+---------
 asciiword | Word, all ASCII   | En    | {simple}                    | simple      | {en}
 blank     | Space symbols     |       | {simple}                    | simple      | {" "}
 word      | Word, all letters | Güte  | {german_ispell,german_stem} | german_stem | {gut}
 blank     | Space symbols     |       | {simple}                    | simple      | {" "}
 asciiword | Word, all ASCII   | auf   | {simple}                    | simple      | {auf}
 blank     | Space symbols     |       | {simple}                    | simple      | {" "}
 asciiword | Word, all ASCII   | den   | {simple}                    | simple      | {den}
 blank     | Space symbols     |       | {simple}                    | simple      | {" "}
 asciiword | Word, all ASCII   | Berg  | {simple}                    | simple      | {berg}
 blank     | Space symbols     | !     | {simple}                    | simple      | {!}
(10 rows)
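The `dictionaries` column above comes from the token-type mapping of `default_german`. To list that mapping directly instead of running ts_debug on sample text, the system catalogs can be queried. A minimal sketch, assuming the configuration uses the built-in `default` parser; in psql, `\dF+ default_german` shows the same information.

```sql
-- Token type -> dictionary chain for the default_german configuration.
-- Token types with no mapping at all do not appear in the result.
SELECT t.alias                                         AS token_type,
       array_agg(d.dictname::text ORDER BY m.mapseqno) AS dictionaries
FROM   pg_ts_config_map m
JOIN   pg_ts_config c              ON c.oid   = m.mapcfg
JOIN   pg_ts_dict d                ON d.oid   = m.mapdict
JOIN   ts_token_type('default') t  ON t.tokid = m.maptokentype
WHERE  c.cfgname = 'default_german'
GROUP  BY t.alias
ORDER  BY t.alias;
```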

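Improved: asciiword tokens now go through {german_ispell,german_stem} instead of {simple}, and blank tokens are no longer mapped to any dictionary. Stop words such as "zu", "in" and "die", as well as spaces and "!", now produce no lexemes.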

oddb.org.ruby193=# select * from ts_debug('default_german', 'En Güte zu Hause! Guten Appetit in die Schule');
   alias   |    description    |  token  |        dictionaries         |  dictionary   |     lexemes      
-----------+-------------------+---------+-----------------------------+---------------+------------------
 asciiword | Word, all ASCII   | En      | {german_ispell,german_stem} | german_stem   | {en}
 blank     | Space symbols     |         | {}                          |               | 
 word      | Word, all letters | Güte    | {german_ispell,german_stem} | german_stem   | {gut}
 blank     | Space symbols     |         | {}                          |               | 
 asciiword | Word, all ASCII   | zu      | {german_ispell,german_stem} | german_ispell | {}
 blank     | Space symbols     |         | {}                          |               | 
 asciiword | Word, all ASCII   | Hause   | {german_ispell,german_stem} | german_ispell | {hausen,hau,sen}
 blank     | Space symbols     | !       | {}                          |               | 
 asciiword | Word, all ASCII   | Guten   | {german_ispell,german_stem} | german_ispell | {gut}
 blank     | Space symbols     |         | {}                          |               | 
 asciiword | Word, all ASCII   | Appetit | {german_ispell,german_stem} | german_ispell | {appetit}
 blank     | Space symbols     |         | {}                          |               | 
 asciiword | Word, all ASCII   | in      | {german_ispell,german_stem} | german_ispell | {}
 blank     | Space symbols     |         | {}                          |               | 
 asciiword | Word, all ASCII   | die     | {german_ispell,german_stem} | german_ispell | {}
 blank     | Space symbols     |         | {}                          |               | 
 asciiword | Word, all ASCII   | Schule  | {german_ispell,german_stem} | german_ispell | {schule,schulen}
(17 rows)
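The difference between the two outputs is consistent with two mapping changes on `default_german`: `asciiword` moves from `simple` to `german_ispell` then `german_stem`, and `blank` loses its mapping. A minimal sketch of statements that would produce that state, assuming both dictionaries already exist (the output suggests they do); the statements actually run on the server are not recorded on this page.

```sql
-- Try the ispell dictionary first: it returns an empty result for its
-- stop words (token dropped) and NULL for unknown words, which then fall
-- through to the Snowball stemmer german_stem.
ALTER TEXT SEARCH CONFIGURATION default_german
  ALTER MAPPING FOR asciiword WITH german_ispell, german_stem;

-- Remove the mapping for whitespace/punctuation tokens so they yield no lexemes.
ALTER TEXT SEARCH CONFIGURATION default_german
  DROP MAPPING IF EXISTS FOR blank;
```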

Dictionary Configuration for French

Got an error while parsing the affix file:

ch.oddb> generate_dictionaries
-> ERROR:  syntax error
CONTEXT:  line 130 of configuration file "/usr/share/postgresql-9.1/tsearch_data/french_fulltext.affix": "      [^BCDFGLMNPQRST]       >       D'AD"
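PostgreSQL initialises a new ispell dictionary when it is created, so the same affix error should also appear on a plain CREATE TEXT SEARCH DICTIONARY in psql, without going through generate_dictionaries. A sketch using a hypothetical throw-away dictionary name `french_ispell_test`; only `french_fulltext.affix` is confirmed by the error message, the `.dict` base name is an assumption.

```sql
-- Hypothetical test dictionary, only used to check that the files parse.
-- Base names are resolved in $SHAREDIR/tsearch_data/ with fixed suffixes:
--   DictFile -> .dict, AffFile -> .affix, StopWords -> .stop
CREATE TEXT SEARCH DICTIONARY french_ispell_test (
  TEMPLATE  = ispell,
  DictFile  = french_fulltext,  -- assumption: french_fulltext.dict
  AffFile   = french_fulltext,  -- french_fulltext.affix (from the error above)
  StopWords = french            -- french.stop as shipped with PostgreSQL
);

-- If the CREATE succeeded, remove the test dictionary again.
DROP TEXT SEARCH DICTIONARY french_ispell_test;
```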

Related Issue

As a temporary solution, I commented out the affix block containing apostrophes (such as the D'AD rule reported on line 130).
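After editing the affix file, `ts_lexize` can check single words against the dictionary alone, independently of the configuration's token mapping. A sketch, assuming the dictionary is named `french_ispell` as in the ts_debug output below; the ALTER line is the documented way to make existing sessions re-read changed dictionary files and only matters if the dictionary already existed.

```sql
-- Force existing sessions to re-read the dictionary files
-- (removing a non-existent option is a no-op that triggers the reload).
ALTER TEXT SEARCH DICTIONARY french_ispell ( dummy );

-- Each call returns an array of lexemes, an empty array for a stop word,
-- or NULL if the word is unknown to this dictionary.
SELECT ts_lexize('french_ispell', 'fromage');
SELECT ts_lexize('french_ispell', 'je');
SELECT ts_lexize('french_ispell', 'délicieux');
```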

oddb.org.ruby193=# select * from ts_debug('default_french', 'lógico-matemática');
   alias    |            description            |       token       |        dictionaries         | dictionary  |      lexemes       
------------+-----------------------------------+-------------------+-----------------------------+-------------+--------------------
 hword      | Hyphenated word, all letters      | lógico-matemática | {french_ispell,french_stem} | french_stem | {lógico-matemátic}
 hword_part | Hyphenated word part, all letters | lógico            | {french_ispell,french_stem} | french_stem | {lógico}
 blank      | Space symbols                     | -                 | {}                          |             | 
 hword_part | Hyphenated word part, all letters | matemática        | {french_ispell,french_stem} | french_stem | {matemátic}
(4 rows)

oddb.org.ruby193=# select * from ts_debug('default_french', 'Je veux manger du fromage délicieux');
   alias   |    description    |   token   |        dictionaries         |  dictionary   |  lexemes  
-----------+-------------------+-----------+-----------------------------+---------------+-----------
 asciiword | Word, all ASCII   | Je        | {french_ispell,french_stem} | french_ispell | {}
 blank     | Space symbols     |           | {}                          |               | 
 asciiword | Word, all ASCII   | veux      | {french_ispell,french_stem} | french_ispell | {veux}
 blank     | Space symbols     |           | {}                          |               | 
 asciiword | Word, all ASCII   | manger    | {french_ispell,french_stem} | french_ispell | {manger}
 blank     | Space symbols     |           | {}                          |               | 
 asciiword | Word, all ASCII   | du        | {french_ispell,french_stem} | french_ispell | {}
 blank     | Space symbols     |           | {}                          |               | 
 asciiword | Word, all ASCII   | fromage   | {french_ispell,french_stem} | french_ispell | {fromage}
 blank     | Space symbols     |           | {}                          |               | 
 word      | Word, all letters | délicieux | {french_ispell,french_stem} | french_stem   | {délici}
(11 rows)
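ts_debug shows the per-token view; what is actually stored in the index and matched at query time comes from to_tsvector and the tsquery functions with the same configuration. A quick end-to-end check for the French configuration, reusing the sample sentence above:

```sql
-- Lexemes that would be stored for the sample sentence
SELECT to_tsvector('default_french', 'Je veux manger du fromage délicieux');

-- Does a search for "fromage" match that sentence? (returns true or false)
SELECT to_tsvector('default_french', 'Je veux manger du fromage délicieux')
       @@ plainto_tsquery('default_french', 'fromage');
```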

Commit

odba
ch.oddb.org

Page last modified on March 15, 2013, at 08:37 AM