
20110901-fulltext-search-migel



  1. Make a fulltext index table in odba
  2. Check drug search algorithm in ch.oddb.org
  3. Mix search function migel
  4. Product article_name fulltext search, and pharmacode and EAN code search

Goal/Estimate/Evaluation:

  • Update migel search function / 90% / 90%
Milestones
  • migel search
  1. create fulltext index table
Summary
Commits
ToDo
  • 'could not sort' warning
  • Migel::Model::Migelid.all problem
    • This method takes a long time and results in a memory leak
  • Refactor Group, Subgroup and Migelid, because some of the code in these classes is identical
  • Test cases for oddb.org and migel

Make a fulltext index table in odba

Refer to

Idea

  • Add a new property to the Migelid class (e.g. @full_description) that combines Group.name, Subgroup.name, Migelid.name and the description
  • Register it as an odba_index

Experiment

  • lib/migel/model/migelid.rb
      def full_description(lang = 'de')
        [subgroup.group.name.send(lang)||'', subgroup.name.send(lang)||'', name.send(lang), (migelid_text and migelid_text.send(lang) or '')].join(' ')
      end
  • lib/migel/persistence/odba/model/migelid.rb
      odba_index :full_description_de, 'full_description(:de)'
      odba_index :full_description_fr, 'full_description(:fr)'
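To check what full_description produces without a running migeld, the method can be exercised against simple stand-ins. In this sketch Text, Group, Subgroup and the plain Migelid class are hypothetical objects that only mimic the attributes the method reads; they are not the real ODBA models.

```ruby
# Hypothetical stand-ins for the ODBA model objects (illustration only).
# Each multilingual field responds to #de and #fr, like the real text objects.
Text     = Struct.new(:de, :fr)
Group    = Struct.new(:name)
Subgroup = Struct.new(:name, :group)

class Migelid
  attr_reader :name, :subgroup, :migelid_text

  def initialize(name, subgroup, migelid_text = nil)
    @name, @subgroup, @migelid_text = name, subgroup, migelid_text
  end

  # Same logic as full_description in lib/migel/model/migelid.rb:
  # group name, subgroup name, migelid name and optional description, space-joined.
  def full_description(lang = 'de')
    [subgroup.group.name.send(lang) || '',
     subgroup.name.send(lang) || '',
     name.send(lang),
     (migelid_text and migelid_text.send(lang) or '')].join(' ')
  end
end

group    = Group.new(Text.new('INKONTINENZHILFEN', nil))
subgroup = Subgroup.new(Text.new('Pessare', nil), group)
migelid  = Migelid.new(Text.new('Wegwerf-Scheidenpessar', nil), subgroup, Text.new('masa', nil))
puts migelid.full_description(:de)
```

Without a migelid_text the joined string ends with a trailing space, because the empty description is still joined in.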

Restore

masa@masa ~/ywesee/migel $ sudo -u postgres dropdb migel; sudo -u postgres createdb -E UTF8 -T template0 migel
masa@masa ~/ywesee/migel $ bin/migeld
migel> Migel::Importer.new.update('data/csv/migel_de_test.csv', 'de')
-> Array

Run

  • bin/migeld

Setting

migel> Migel::Model::Migelid.find_by_migel_code('15.30.50.00.1').migelid_text.de = 'masa'
-> masa
migel> Migel::Model::Migelid.find_by_migel_code('15.30.50.00.1').migelid_text.de
-> masa
migel> Migel::Model::Migelid.find_by_migel_code('15.30.50.00.1').save

Result

migel> Migel::Model::Migelid.find_by_migel_code('15.30.50.00.1').full_description('de')
-> INKONTINENZHILFEN Pessare Wegwerf-Scheidenpessar masa
migel> Migel::Model::Migelid.search_by_full_description_de('masa').length
-> 0
migel> Migel::Model::Migelid.search_by_full_description_de('INKONTINEN').length
-> 37
migel> Migel::Model::Migelid.search_by_full_description_de('NKONTINEN').length
-> 0

Note

  • It seems that the 'search_by_' methods perform a prefix search, not a full-text search: only 'INKONTINEN', the beginning of the string, matches
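A minimal sketch of that difference, assuming the index compares the query against the beginning of the whole full_description string (the ODBA internals are not checked here). prefix_match? and word_match? are hypothetical helpers: the first reproduces the three results above, while a per-word match would also find 'masa'.

```ruby
# Illustration only: prefix match on the whole indexed string versus a per-word match.
def prefix_match?(indexed, query)
  indexed.downcase.start_with?(query.downcase)
end

def word_match?(indexed, query)
  # Split on whitespace and commas, then compare whole words.
  indexed.downcase.split(/[\s,]+/).include?(query.downcase)
end

desc = 'INKONTINENZHILFEN Pessare Wegwerf-Scheidenpessar masa'
%w[INKONTINEN NKONTINEN masa].each do |q|
  puts "#{q}: prefix=#{prefix_match?(desc, q)} word=#{word_match?(desc, q)}"
end
```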

Experiment

  • lib/migel/util/server.rb
      def create_fulltext_index
index_def = YAML.load <<-EOD
--- !ruby/object:ODBA::IndexDefinition 
index_name: 'migel_fulltext_index_de'
origin_klass: 'Migel::Model::Migelid'
target_klass: 'Migel::Model::Migelid'
resolve_search_term: 'full_description(:de)'
resolve_target: ''
resolve_origin: ''
fulltext: true
init_source: 'Migel::Model::Migelid.all'
EOD
        # dictionary: default  (outside the heredoc, so the definition sets no dictionary)
        ODBA.cache.drop_index('migel_fulltext_index_de')
        ODBA.cache.create_index(index_def, Migel)

        puts "filling: #{index_def.index_name}"
        puts index_def.init_source
        source = instance_eval(index_def.init_source)
        puts "source.size: #{source.size}"
        ODBA.cache.fill_index(index_def.index_name, source)

      end

Restore

masa@masa ~/ywesee/migel $ sudo -u postgres dropdb migel; sudo -u postgres createdb -E UTF8 -T template0 migel
masa@masa ~/ywesee/migel $ bin/migeld
migel> Migel::Importer.new.update('data/csv/migel_de_test.csv', 'de')

Create fulltext index table

migel> create_fulltext_index
-> Array

Log

filling: migel_fulltext_index_de
Migel::Model::Migelid.all
source.size: 37

Result

migel> Migel::Model::Migelid.find_by_migel_code('15.30.50.00.1').full_description(:de)
-> INKONTINENZHILFEN,Pessare,Wegwerf-Scheidenpessar,
migel> ODBA.cache.retrieve_from_index('migel_model_migelid_migel_code', 'INKONTINENZHILFEN').length
-> 0

Note

  • Failed: the search returns no results

Check table

masa@masa ~/ywesee/migel $ psql migel -c 'SELECT * FROM migel_fulltext_index_de;' -A -F, > test.csv
origin_id,search_term,target_id
82,,82
74,,74
28,,28
94,,94
42,,42
30,,30
84,,84
36,,36
98,,98
26,,26
48,,48
108,,108
104,,104
90,,90
70,,70
112,,112
56,,56
92,,92
34,,34
78,,78
46,,46
120,,120
102,,102
80,,80
62,,62
118,,118
40,,40
86,,86
76,,76
60,,60
106,,106
54,,54
58,,58
114,,114
64,,64
96,,96
68,,68
(37 rows)

Note

  • The 'search_term' column is empty in every row

Reference

In oddb.org (e.g. migel_code 15.30.50.00.1):

origin_id,search_term,target_id
...
796127,'153050001':1 'inkontinenzhilf':2 'pessar':3 'scheidenpessar':7 'wegwerf':4,6 'wegwerfscheidenpessar':5,796127
...
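The search_term value in oddb.org is the text form of a PostgreSQL tsvector: each stemmed lexeme is quoted and followed by the word positions where it occurs ('wegwerf':4,6 appears at positions 4 and 6). A small sketch that reads this form into a Hash can help when checking such dumps. parse_tsvector is a hypothetical helper; it expects only the search_term value (not the whole CSV row) and assumes simple lexemes without escaped quotes or weight labels.

```ruby
# Parse the text form of a PostgreSQL tsvector, e.g.
#   'pessar':3 'wegwerf':4,6
# into a Hash of lexeme => [positions].
# Simplified: no escaped quotes inside lexemes, no weight labels (A-D).
def parse_tsvector(text)
  result = {}
  text.scan(/'([^']*)'(?::([\d,]+))?/) do |lexeme, positions|
    result[lexeme] = positions ? positions.split(',').map { |pos| pos.to_i } : []
  end
  result
end

tv = parse_tsvector("'153050001':1 'inkontinenzhilf':2 'pessar':3 " \
                    "'scheidenpessar':7 'wegwerf':4,6 'wegwerfscheidenpessar':5")
p tv['wegwerf']
```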

Question

  • How is the 'search_term' column filled?

Experiment

  • lib/migel/util/server.rb
      def create_fulltext_index
        index_def = YAML.load <<-EOD
--- !ruby/object:ODBA::IndexDefinition
index_name: 'migel_fulltext_index_de'
origin_klass: 'Migel::Model::Migelid'
target_klass: 'Migel::Model::Migelid'
resolve_search_term: 'full_description(:de)'
resolve_target: ''
resolve_origin: ''
fulltext: true
init_source: 'Migel::Model::Migelid.all'
dictionary: 'german'
EOD
        # ... (rest of the method as in the previous experiment)

IMPORTANT

  • An appropriate text search 'dictionary' installed in PostgreSQL (here 'german') must be set in the index table definition
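Since a missing dictionary silently produced rows with an empty search_term, a small check before creating the index could catch it early. This sketch assumes the index definition is available as a plain Hash (the real code loads an ODBA::IndexDefinition object), and missing_fulltext_keys is a hypothetical helper, not part of ODBA.

```ruby
require 'yaml'

# Keys a fulltext index definition is expected to carry (assumption based on the
# experiments above; 'dictionary' is the one that was missing in the first attempt).
REQUIRED_FULLTEXT_KEYS = %w[index_name origin_klass target_klass resolve_search_term dictionary]

# Returns the required keys that are absent or blank; non-fulltext definitions pass.
def missing_fulltext_keys(index_def)
  return [] unless index_def['fulltext']
  REQUIRED_FULLTEXT_KEYS.select { |key| index_def[key].to_s.strip.empty? }
end

index_def = YAML.load(<<-EOD)
index_name: 'migel_fulltext_index_de'
origin_klass: 'Migel::Model::Migelid'
target_klass: 'Migel::Model::Migelid'
resolve_search_term: 'full_description(:de)'
fulltext: true
init_source: 'Migel::Model::Migelid.all'
EOD

p missing_fulltext_keys(index_def)
```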

Restore

masa@masa ~/ywesee/migel $ sudo -u postgres dropdb migel; sudo -u postgres createdb -E UTF8 -T template0 migel
masa@masa ~/ywesee/migel $ bin/migeld
migel> Migel::Importer.new.update('data/csv/migel_de_test.csv', 'de')
migel> create_fulltext_index

Result

masa@masa ~/ywesee/oddb.org $ psql migel -c 'SELECT * FROM migel_fulltext_index_de;' -A -F,
...
origin_id,search_term,target_id
74,'ablauf':9 'beinbeutel':4,7 'inkontinenzhilf':1 'unsteril':10 'urin':3,6 'urin-beinbeutel':2,5,74
82,'ablauf':9 'anatom':10 'beinbeutel':4,7 'form':11 'inkontinenzhilf':1 'steril':12 'urin':3,6 'urin-beinbeutel':2,5,82
84,'beinbeutel':4 'beinbeuteltasch':5 'gurt':6 'inkontinenzhilf':1 'urin':3 'urin-beinbeutel':2,84
...
migel> ODBA.cache.retrieve_from_index('migel_fulltext_index_de', 'INKONTINENZHILFEN').length
-> 37
migel> ODBA.cache.retrieve_from_index('migel_fulltext_index_de', 'Wegwerf').length
-> 1
migel> ODBA.cache.retrieve_from_index('migel_fulltext_index_de', 'Wegwer').length
-> 0

Note

  • Good: the fulltext index works
  • But it is a whole-word search ('Wegwer' finds nothing)
  • Drug search in ch.oddb.org can find drugs by part of a word, e.g.:
ch.oddb> search_oddb('inder', 'de').package_count
-> 21

Next

  • Check the algorithm of drug search in ch.oddb.org

Check drug search algorithm in ch.oddb.org

Refer to

  def search_oddb(query, lang)
    # current search_order:
    # 1. atcless
    # 2. iksnr or ean13
    # 3. atc-code
    # 4. exact word in sequence name
    # 5. company-name
    # 6. substance
    # 7. indication
    # 8. sequence
...

Note

  • Basically, the method calls the search methods one after another in the order listed above
  • The search result is an instance of the ODDB::SearchResult class
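The ordered lookup can be sketched as a chain of strategies that stops at the first non-empty result. This assumes search_oddb returns as soon as one step yields results (search_migel_migelid further down follows the same pattern with `or`). search_in_order, the strategies and the sample names are made up for illustration.

```ruby
# Try each strategy in order and return [strategy_name, result] for the
# first non-empty result, or [nil, []] if nothing matches.
def search_in_order(query, strategies)
  strategies.each do |name, strategy|
    result = strategy.call(query)
    return [name, result] unless result.nil? || result.empty?
  end
  [nil, []]
end

# Hypothetical strategies with made-up sample data.
strategies = [
  [:iksnr,    lambda { |q| q =~ /\A\d{5}\z/ ? ['iksnr hit'] : [] }],
  [:atc_code, lambda { |q| q =~ /\A[A-Z]\d\d/ ? ['atc hit'] : [] }],
  [:sequence, lambda { |q| ['Inderal 10', 'Inderal 40'].select { |n| n.downcase.start_with?(q.downcase) } }]
]

p search_in_order('inder', strategies)
```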

Experiment

ch.oddb> ODBA.cache.retrieve_from_index('sequence_index_atc','inder').length
-> 3
ch.oddb> ODBA.cache.retrieve_from_index('sequence_index_atc','inder')[0]
-> Erythromycin
  • index table definition
 --- !ruby/object:ODBA::IndexDefinition 
 index_name: 'sequence_index_atc'
 origin_klass: 'ODDB::Sequence'
 target_klass: 'ODDB::AtcClass'
 resolve_search_term: 'search_terms'
 resolve_target: 'atc_class'
 resolve_origin: 'sequences'
 init_source: '@atc_classes.values'
 fulltext: false

Note

  • This index is not a fulltext index; it performs a prefix search
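A sketch of what a per-term prefix index does, under the assumption (drawn from the results here) that each object is stored under several search terms and a query matches any term that starts with it. INDEX, prefix_lookup and the term/target pairs are invented; only the behaviour for 'inder' versus 'nderal' mirrors the log.

```ruby
# Hypothetical search_term => target mapping (made-up data).
INDEX = {
  'inderal'      => 'Sequence A',
  'propranolol'  => 'Sequence A',
  'erythromycin' => 'Sequence B',
}

# Return the distinct targets whose search term starts with the query.
def prefix_lookup(index, query)
  q = query.downcase
  index.select { |term, _| term.start_with?(q) }.map { |_, target| target }.uniq
end

p prefix_lookup(INDEX, 'inder')
p prefix_lookup(INDEX, 'nderal')
```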

Confirm

ch.oddb> ODBA.cache.retrieve_from_index('sequence_index_atc','nderal').length
-> 0
ch.oddb> search_oddb('nderal', 'de').package_count
-> 0

Note

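  • Bingo: neither the index nor search_oddb finds 'nderal', so the drug search in ch.oddb.org is a prefix search, not a substring search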

Mix search function migel

Design

  1. (Product name prefix search)
  2. (migel_code search)
  3. Group, Subgroup, Migelid name and migel_text fulltext search
  4. Group, Subgroup, Migelid name prefix search

Note

  • The first two search functions are already implemented
  • src/util/oddbapp.rb
    def search_migel_products(query, lang)
      migel_code = if query =~ /(\d){9}/
                     query.split(/(\d\d)/).select{|x| !x.empty?}.join('.')
                   elsif query =~ /(\d\d\.){4}\d/
                     query
                   end
      if migel_code
        MIGEL_SERVER.migelid.search_by_migel_code(migel_code)
      else
        MIGEL_SERVER.search_migel_migelid(query, lang)
      end
    end
  • lib/migel/util/server.rb
      def search_migel_migelid(query, lang)
        # search order
        # 1. Group, Subgroup, Migelid name fulltext search
        # 2. Group, Subgroup, Migelid name prefix search
        search_migelid_fulltext(query, lang) or search_migelid_by_name(query, lang)
      end
      def search_migelid_fulltext(query, lang)
        index_table_name = 'migel_fulltext_index_' + lang
        result = ODBA.cache.retrieve_from_index(index_table_name, query)
        ODBA::DRbWrapper.new(result) unless result.empty?
      end
      def search_migelid_by_name(query, lang)
        search_method = 'search_by_name_' + lang
        result = []
        if groups = Migel::Model::Group.send(search_method, query) and !groups.empty?
          groups.each do |group|
            result.concat group.subgroups.collect{|sg| sg.migelids}.flatten
          end
        end
        if subgroups = Migel::Model::Subgroup.send(search_method, query) and !subgroups.empty?
          result.concat subgroups.collect{|sg| sg.migelids}.flatten
        end
        result.concat Migel::Model::Migelid.send(search_method, query)
        ODBA::DRbWrapper.new(result.uniq)
      end
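The patterns in search_migel_products are unanchored, so /(\d){9}/ also matches inside a longer digit string such as a 13-digit EAN code. A minimal sketch of the migel_code detection with anchored patterns, assuming that only a bare code should be treated as a migel_code; normalize_migel_code is a hypothetical name, and scan(/\d\d?/) replaces the original split/select for the same 2-2-2-2-1 grouping.

```ruby
# Return a dotted migel_code for a bare 9-digit or dotted query, else nil.
def normalize_migel_code(query)
  q = query.strip
  if q =~ /\A\d{9}\z/
    # '153050001' -> ['15', '30', '50', '00', '1'] -> '15.30.50.00.1'
    q.scan(/\d\d?/).join('.')
  elsif q =~ /\A(\d\d\.){4}\d\z/
    q
  end
end

p normalize_migel_code('153050001')
p normalize_migel_code('7680123456789')
```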

Restore

masa@masa ~/ywesee/migel $ sudo -u postgres dropdb migel; sudo -u postgres createdb -E UTF8 -T template0 migel
masa@masa ~/ywesee/migel $ bin/migeld
migel> Migel::Importer.new.update('data/csv/migel_de_test.csv', 'de')
-> Array
migel> Migel::Model::Migelid.find_by_migel_code('15.30.50.00.1').migelid_text.de = 'masa'
-> masa
migel> Migel::Model::Migelid.find_by_migel_code('15.30.50.00.1').save
-> Stück
migel> Migel::Model::Migelid.find_by_migel_code('15.30.50.00.1').migelid_text.de
-> masa
migel> create_fulltext_index
-> Array
migel> Migel::Importer.new.update_products_by_migel_code('15.30.50.00.1')
-> Array

Note

  • Migel::Model::Migelid.all in the create_fulltext_index method should be replaced with other code later, since it is slow and leaks memory (see ToDo)

Run

  • bin/oddbd
  • bin/currencyd
  • bin/migeld

Search


Note

  • 'inkontinenz' is part of the Migelid names, which is why the results differ

Product article_name fulltext search, and pharmacode and EAN code search

  • src/util/oddbapp.rb
    def search_migel_items(query, lang)
      if query =~ /^\d{13}$/
        MIGEL_SERVER.product.search_by_ean_code(query)
      elsif query =~ /^\d{6,}$/
        MIGEL_SERVER.product.search_by_pharmacode(query)
      else
        search_method_article_name = 'search_by_article_name_' + lang.to_s.downcase
        search_method_company_name = 'search_by_company_name_' + lang.to_s.downcase
        #MIGEL_SERVER.product.search_by_article_name(query) + MIGEL_SERVER.product.search_by_companyname(query)
        ret = MIGEL_SERVER.product.send(search_method_article_name, query) + MIGEL_SERVER.product.send(search_method_company_name, query)
        ret.each do |x|
          p x
        end
        ret
      end
    end
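The dispatch in search_migel_items can be isolated into a small function that is easy to test. This sketch keeps the original order (13 digits first, then 6 or more digits, otherwise a name search) but anchors the patterns with \A and \z instead of ^ and $, which in Ruby match at line boundaries. migel_item_search_type and its return symbols are hypothetical names.

```ruby
# Decide which search a query goes to; the order matters because a
# 13-digit EAN code would also satisfy the 6-or-more-digits rule.
def migel_item_search_type(query)
  case query
  when /\A\d{13}\z/ then :ean_code
  when /\A\d{6,}\z/ then :pharmacode
  else :name
  end
end

p migel_item_search_type('7680123456789')
p migel_item_search_type('Pessar')
```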

Result

Note

  • After a product name search, the following message appears:
 could not sort: undefined method `<=>' for nil:NilClass
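The cause is not tracked down in this log. The message itself says that `<=>` was called on nil, which typically happens when a sort key is missing for some records. A minimal sketch of a nil-tolerant sort, assuming the results are sorted by an attribute such as article_name; Product and sort_nil_last are hypothetical stand-ins.

```ruby
Product = Struct.new(:article_name)

# Sort by the key returned from the block, placing nil keys last so that
# nil is never compared with a real value.
def sort_nil_last(items)
  items.sort_by do |item|
    key = yield(item)
    key.nil? ? [1, ''] : [0, key]
  end
end

products = [Product.new('Pessar B'), Product.new(nil), Product.new('Pessar A')]
sorted = sort_nil_last(products) { |prod| prod.article_name }
p sorted.map { |prod| prod.article_name }
```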

Next

  • Create a fulltext index table for product names
Page last modified on March 12, 2013, at 02:07 PM