
20101018-update-xls2odat



  1. Check a sample update xls2odat
  2. Update xls2odat (analyzer method)
  3. Debug ODDB::Import::Gkv#import
  4. Study updating process of import_gkv (suspended)
  5. Set up yus password on production server

Goal
  • Update xls2odat / 100%
  • Update SL-Entries / 80% not yet
  • Debug ODDB::Import::Gkv#import / 30 %
Milestones
  1. Check a sample update xls2odat 8:30
  2. Update xls2odat analyzer method 9:30
  3. Look at ODDB::Import::Gkv#import method in de.oddb.org 10:00
  4. Test run import_gkv 12:00
  5. Trace back ODDB::Import::Gkv#import method 12:00
  6. Check data without importing (parsing)
  7. Print the error report 15:45
  8. Check updating process in import_gkv
Summary
Commits
  1. xls2odat: updated the if condition (analyzer method); a string condition can now be set.
ToDo Tomorrow
Keep in Mind
Attached Files

Check a sample update xls2odat

Email: xls2odat should recognize a term like "in Auslistung" and set a value

in README

  * if -*- then -#-   : If the condition * is met, output #
                        -*- must be the following format
                        <column alphabet> <comparison symbol> <digit>
                        (z.B. A==12 B>3 C<=0.1)
  • At the moment, the if-then condition accepts only a number as the comparison value

Solutions

  1. update the if-then statement for the string data
  2. create a new statement for the string data

Solution 1 is probably the easier one and makes sense

Update xls2odat (analyzer method)

Update tests

$ vim test/test_xls2odat.rb

assert_equal "H", @parser.send(:analyzer, 'if F=="in Auslistung" then "H"', row, @data)

Check failure

masa@masa ~/work/xls2odat $ ruby test/test_xls2odat.rb 
Loaded suite test/test_xls2odat
Started
F.........
Finished in 0.251387 seconds.

  1) Failure:
test_analyzer(Xls2odatTest) [test/test_xls2odat.rb:125]:
<"H"> expected but was
<"">.

10 tests, 40 assertions, 1 failures, 0 errors

Update analyzer method

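        # Accept a numeric condition (A==12) or a quoted string condition (F=="in Auslistung").
        if ifthen[0] =~ /([A-Z]+)(\W+)([0-9.-]+)/ || ifthen[0] =~ /([A-Z]+)(\W+)\"(.+)\"/
          # $& is the whole match ($0 would be the script name); $1..$3 are column, operator, value.
          condition = [$&.strip, $1.strip, $2.strip, $3.strip]
        end
        value     = ifthen[1]
        # Both sides are wrapped in quotes, so the comparison is between strings.
        s = 'if "' + cell(condition[1], row) + '"' + condition[2] + '"' + condition[3] + '" then ' + value + ' end'
        result = eval(s)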

Notes

  • All data is compared as String data, so a numeric condition such as B>3 is evaluated lexicographically
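
Because the analyzer compares both sides as strings, a condition like B>3 would not hold for a cell containing 12. The following is only a sketch of one alternative, not the code in xls2odat: it parses the condition and compares numerically when the condition value is a number, without building a string for eval. The names CONDITION, to_number and condition_met?, and the cells hash (column letter => cell text), are assumptions made for this example.

```ruby
# Hypothetical helper, not part of xls2odat: evaluates conditions such as
# A==12, B>3, C<=0.1 or F=="in Auslistung" without eval.
CONDITION = /\A\s*([A-Z]+)\s*(==|!=|<=|>=|<|>)\s*(?:"(.*)"|(-?\d+(?:\.\d+)?))\s*\z/

# Returns the Float value of text, or nil if text is not a number.
def to_number(text)
  Float(text)
rescue ArgumentError, TypeError
  nil
end

# cells is assumed to map a column letter to the cell text of the current row.
def condition_met?(condition, cells)
  match = CONDITION.match(condition)
  return false unless match              # unparsable condition: treat as not met

  column, operator, string, number = match.captures
  actual = cells[column].to_s
  if number
    # Numeric condition value: compare as floats so that 12 > 3 holds.
    left  = to_number(actual)
    right = number.to_f
    return false if left.nil?            # non-numeric cell never meets a numeric condition
  else
    # Quoted condition value: compare as strings.
    left  = actual
    right = string
  end
  left.public_send(operator, right)
end
```

For example, with cells = { 'B' => '12', 'F' => 'in Auslistung' }, both condition_met?('B>3', cells) and condition_met?('F=="in Auslistung"', cells) return true.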

Check test passes

masa@masa ~/work/xls2odat $ ruby test/test_xls2odat.rb 
Loaded suite test/test_xls2odat
Started
..........
Finished in 0.248878 seconds.

10 tests, 47 assertions, 0 failures, 0 errors

Check result diff

$ diff H01 ../../xls2odat/H01
278c278
< 01|20101018|1|408018|1||||||||||||408018||||||||||||2|||
---
> 01|20101018|1|408018|1||||||||||||408018||||||||||||2|||H
282c282
< 01|20101018|1|408010|1||||||||||||408010||||||||||||2|||
---
> 01|20101018|1|408010|1||||||||||||408010||||||||||||2|||H
298c298
< 01|20101018|1|3469|1||||||||||||3469||||||||||||2|||
---
> 01|20101018|1|3469|1||||||||||||3469||||||||||||2|||H
414,415c414,415
< 01|20101018|1|2979|1||||||||||||2979||||||||||||2|||
< 01|20101018|1|2976|1||||||||||||2976||||||||||||2|||
---
> 01|20101018|1|2979|1||||||||||||2979||||||||||||2|||H
> 01|20101018|1|2976|1||||||||||||2976||||||||||||2|||H
417c417
< 01|20101018|1|2975|1||||||||||||2975||||||||||||2|||
---
> 01|20101018|1|2975|1||||||||||||2975||||||||||||2|||H
491c491
< 01|20101018|1|3730|1||||||||||||3730||||||||||||2|||
---
> 01|20101018|1|3730|1||||||||||||3730||||||||||||2|||H
1081c1081
< 01|20101018|1|473578|1||||||||||||473578||||||||||||1|||
---
> 01|20101018|1|473578|1||||||||||||473578||||||||||||1|||H
1319c1319
< 01|20101018|1|448102|1||||||||||||448102||||||||||||2|||
---
> 01|20101018|1|448102|1||||||||||||448102||||||||||||2|||H
1324,1330c1324,1330

Update README

  * if -*- then -#-   : If the condition * is met, output #
                        -*- must be the following format
                        <column alphabet> <comparison symbol> <digit|string>
                        (z.B. A==12 B>3 C<=0.1 F=="ABC")

                        -#- can be digit or string
                        (z.B. if A==12 then 0 if A=="XYZ" then "OK")

Update ruby gem

Failed

masa@masa ~/work/xls2odat $ gem build xls2odat.gemspec 
WARNING:  no rubyforge_project specified
ERROR:  While executing gem ... (NoMethodError)
    undefined method `>' for nil:NilClass

Commit

Notes

  • I have not updated the gem file yet

Debug ODDB::Import::Gkv#import

Email Tue Oct 5 09:28:43 2010: de.oddb.org Zubef (PDF)

Error message

Tue Oct  5 08:35:47 2010: de.oddb.org  ODDB::Import::Gkv#import
NoMethodError
private method `split' called for nil:NilClass
/var/www/de.oddb.org/lib/oddb/import/gkv.rb:363:in `postprocess'
/usr/lib64/ruby/site_ruby/1.8/odba/persistable.rb:147:in `all'
/usr/lib64/ruby/site_ruby/1.8/odba/persistable.rb:147:in `each'
/usr/lib64/ruby/site_ruby/1.8/odba/persistable.rb:147:in `all'
/var/www/de.oddb.org/lib/oddb/import/gkv.rb:361:in `postprocess'
/var/www/de.oddb.org/lib/oddb/import/gkv.rb:96:in `import'
/var/www/de.oddb.org/lib/oddb/util/updater.rb:110:in `reported_import'
/var/www/de.oddb.org/lib/oddb/util/updater.rb:117:in `call'
/var/www/de.oddb.org/lib/oddb/util/updater.rb:117:in `_reported_import'
/var/www/de.oddb.org/lib/oddb/util/updater.rb:110:in `reported_import'
/var/www/de.oddb.org/lib/oddb/util/updater.rb:58:in `import_gkv'
/usr/lib64/ruby/1.8/open-uri.rb:32:in `open_uri_original_open'
/usr/lib64/ruby/1.8/open-uri.rb:32:in `open'
/var/www/de.oddb.org/lib/oddb/import/gkv.rb:76:in `download_latest'
/var/www/de.oddb.org/lib/oddb/util/updater.rb:57:in `import_gkv'
jobs/import_gkv:17
/var/www/de.oddb.org/lib/oddb/util/job.rb:16:in `call'
/var/www/de.oddb.org/lib/oddb/util/job.rb:16:in `run'
jobs/import_gkv:16

Notes

  • de.oddb.org
  • ODDB::Import::Gkv#import
  • private method `split' called for nil:NilClass
  • /var/www/de.oddb.org/lib/oddb/import/gkv.rb:363:in `postprocess'

/var/www/de.oddb.org/lib/oddb/import/gkv.rb:363:in `postprocess'

     Drugs::Product.all { |product|
      unless(product.company)
        keys = product.name.de.split

Notes

  • product.name.de must be nil
  • I have to check how it becomes nil

Solutions

  • Reproduce the same error locally
  • Check Drugs::Product.all

Test import_gkv locally

Result Email Mon Oct 18 11:03:56 2010: de.oddb.org Zubef (PDF)

Mon Oct 18 10:32:46 2010: de.oddb.org  ODDB::Import::Gkv#import
NoMethodError
private method `split' called for nil:NilClass
/home/masa/ywesee/de.oddb.org/lib/oddb/import/gkv.rb:363:in `postprocess'
/usr/lib64/ruby/site_ruby/1.8/odba/persistable.rb:147:in `all'
/usr/lib64/ruby/site_ruby/1.8/odba/persistable.rb:147:in `each'
/usr/lib64/ruby/site_ruby/1.8/odba/persistable.rb:147:in `all'
/home/masa/ywesee/de.oddb.org/lib/oddb/import/gkv.rb:361:in `postprocess'
/home/masa/ywesee/de.oddb.org/lib/oddb/import/gkv.rb:96:in `import'
/home/masa/ywesee/de.oddb.org/lib/oddb/util/updater.rb:111:in `reported_import'
/home/masa/ywesee/de.oddb.org/lib/oddb/util/updater.rb:118:in `call'
/home/masa/ywesee/de.oddb.org/lib/oddb/util/updater.rb:118:in `_reported_import'
/home/masa/ywesee/de.oddb.org/lib/oddb/util/updater.rb:111:in `reported_import'
/home/masa/ywesee/de.oddb.org/lib/oddb/util/updater.rb:59:in `import_gkv'
/usr/lib64/ruby/1.8/open-uri.rb:32:in `open_uri_original_open'
/usr/lib64/ruby/1.8/open-uri.rb:32:in `open'
/home/masa/ywesee/de.oddb.org/lib/oddb/import/gkv.rb:76:in `download_latest'
/home/masa/ywesee/de.oddb.org/lib/oddb/util/updater.rb:58:in `import_gkv'
jobs/import_gkv:17
/home/masa/ywesee/de.oddb.org/lib/oddb/util/job.rb:16:in `call'
/home/masa/ywesee/de.oddb.org/lib/oddb/util/job.rb:16:in `run'
jobs/import_gkv:16

Notes

  • I got the same error
  • It takes 30 minutes

Trace back the error message

/var/www/de.oddb.org/lib/oddb/import/gkv.rb:363:in `postprocess'
/usr/lib64/ruby/site_ruby/1.8/odba/persistable.rb:147:in `all'
/usr/lib64/ruby/site_ruby/1.8/odba/persistable.rb:147:in `each'
/usr/lib64/ruby/site_ruby/1.8/odba/persistable.rb:147:in `all'
/var/www/de.oddb.org/lib/oddb/import/gkv.rb:361:in `postprocess'
/var/www/de.oddb.org/lib/oddb/import/gkv.rb:96:in `import'
/var/www/de.oddb.org/lib/oddb/util/updater.rb:110:in `reported_import'
/var/www/de.oddb.org/lib/oddb/util/updater.rb:117:in `call'
/var/www/de.oddb.org/lib/oddb/util/updater.rb:117:in `_reported_import'
/var/www/de.oddb.org/lib/oddb/util/updater.rb:110:in `reported_import'
/var/www/de.oddb.org/lib/oddb/util/updater.rb:58:in `import_gkv'
/usr/lib64/ruby/1.8/open-uri.rb:32:in `open_uri_original_open'
/usr/lib64/ruby/1.8/open-uri.rb:32:in `open'
/var/www/de.oddb.org/lib/oddb/import/gkv.rb:76:in `download_latest'
/var/www/de.oddb.org/lib/oddb/util/updater.rb:57:in `import_gkv'
jobs/import_gkv:17
/var/www/de.oddb.org/lib/oddb/util/job.rb:16:in `call'
/var/www/de.oddb.org/lib/oddb/util/job.rb:16:in `run'
jobs/import_gkv:16

lib/oddb/import/gkv.rb

  def import fh, opts={}
p "getin lib/oddb/import/gkv.rb#import"
    parser = Rpdf2txt::Parser.new(fh.read, 'utf8')
p "1"
    handler = GkvHandler.new method(:process_page)
p "2"
    parser.extract_text handler
p "3"
    postprocess
p "4"
    report
  end

Notes

  • "parser.extract_text handler" takes most of the time
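
The p "1" ... p "4" markers show which step is running but not how long each one takes. As a sketch only, the standard Benchmark module could report the duration of each step; the helper name timed is an assumption for this example and does not exist in de.oddb.org.

```ruby
require 'benchmark'

# Hypothetical helper: runs the block, prints the elapsed time to stderr,
# and returns the block's own result so it can wrap an existing call.
def timed(label)
  result = nil
  seconds = Benchmark.realtime { result = yield }
  warn format('%-16s %.2fs', label, seconds)
  result
end
```

Inside import, a call could then be wrapped as timed('extract_text') { parser.extract_text handler }, and likewise for postprocess.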

Experiment lib/oddb/import/gkv.rb

  def import fh, opts={}
p "getin lib/oddb/import/gkv.rb#import"
    parser = Rpdf2txt::Parser.new(fh.read, 'utf8')
p "1"
    handler = GkvHandler.new method(:process_page)
p "2"
#    parser.extract_text handler
p "3"
    postprocess
p "4"
    report
  end

Result Mail Mon Oct 18 11:57:55 2010: de.oddb.org Zubef (PDF)

Mon Oct 18 11:57:46 2010: de.oddb.org  ODDB::Import::Gkv#import
NoMethodError
private method `split' called for nil:NilClass
/home/masa/ywesee/de.oddb.org/lib/oddb/import/gkv.rb:373:in `postprocess'
/usr/lib64/ruby/site_ruby/1.8/odba/persistable.rb:147:in `all'
/usr/lib64/ruby/site_ruby/1.8/odba/persistable.rb:147:in `each'
/usr/lib64/ruby/site_ruby/1.8/odba/persistable.rb:147:in `all'
/home/masa/ywesee/de.oddb.org/lib/oddb/import/gkv.rb:371:in `postprocess'
/home/masa/ywesee/de.oddb.org/lib/oddb/import/gkv.rb:101:in `import'
/home/masa/ywesee/de.oddb.org/lib/oddb/util/updater.rb:112:in `reported_import'
/home/masa/ywesee/de.oddb.org/lib/oddb/util/updater.rb:119:in `call'
/home/masa/ywesee/de.oddb.org/lib/oddb/util/updater.rb:119:in `_reported_import'
/home/masa/ywesee/de.oddb.org/lib/oddb/util/updater.rb:112:in `reported_import'
/home/masa/ywesee/de.oddb.org/lib/oddb/util/updater.rb:60:in `import_gkv'
/usr/lib64/ruby/1.8/open-uri.rb:32:in `open_uri_original_open'
/usr/lib64/ruby/1.8/open-uri.rb:32:in `open'
/home/masa/ywesee/de.oddb.org/lib/oddb/import/gkv.rb:76:in `download_latest'
/home/masa/ywesee/de.oddb.org/lib/oddb/util/updater.rb:59:in `import_gkv'
jobs/import_gkv:17
/home/masa/ywesee/de.oddb.org/lib/oddb/util/job.rb:16:in `call'
/home/masa/ywesee/de.oddb.org/lib/oddb/util/job.rb:16:in `run'
jobs/import_gkv:16
Imported     0 Zubef-Entries on 18.10.2010:
Visited      0 existing Zubef-Entries
Visited      0 existing Companies
Visited      0 existing Substances
Created      0 new Zubef-Entries
Created      0 new Products
Created      0 new Sequences
Created      0 new Companies
Created      0 new Substances
Assigned     0 Chemical Equivalences
Assigned     0 Companies
Created      0 Incomplete Packages:

Check data

Experiment

  def postprocess
p "getin lib/oddb/import/gkv.rb#postprocess"
    Drugs::Package.search_by_code(:type => 'zuzahlungsbefreit',
                                  :value => 'true',
                                  :country => 'DE').each { |package|
      pzn = package.code(:cid).value
      unless(@confirmed_pzns.include?(pzn))
        @deleted += 1
        package.code(:zuzahlungsbefreit).value = false
        save package
      end
    } unless(@confirmed_pzns.empty?)
p Drugs::Product.all.length # 7990 on 20101018, 7913 on 20100908

oddb.20101012.sql.gz

  • Same error as well
  • Drugs::Product.all.length == 7990

oddb.20100908.sql.gz

Experiment (importing the latest pdf)

  • import_gkv 20101018 pdf on 20100908 data

lib/oddb/import/gkv.rb

  def import fh, opts={}
    parser = Rpdf2txt::Parser.new(fh.read, 'utf8')
    handler = GkvHandler.new method(:process_page)
    parser.extract_text handler
    postprocess
    report
  end

Result

Experiment

  def postprocess
p "getin lib/oddb/import/gkv.rb#postprocess"
    Drugs::Package.search_by_code(:type => 'zuzahlungsbefreit',
                                  :value => 'true',
                                  :country => 'DE').each { |package|
      pzn = package.code(:cid).value
      unless(@confirmed_pzns.include?(pzn))
        @deleted += 1
        package.code(:zuzahlungsbefreit).value = false
        save package
      end
    } unless(@confirmed_pzns.empty?)
p Drugs::Product.all.length # 7990 on 20101018, 7913 on 20100908
    Drugs::Product.all { |product|
      unless(product.company)
if product.name.de == nil
  pp product
end

Result

#<ODDB::Drugs::Product:0x7f07a05fd8f0
 @name=
  #<ODDB::Util::Multilingual:0x7f07a05fcf90
   @canonical={:de=>nil},
   @synonyms=[]>,
 @odba_id=3480899,
 @odba_observers=[],
 @odba_persistent=true,
 @sequences=
  #<ODBA::Stub:69835366067020#3480900 @odba_class=Array @odba_container=69835366067320#3480899>>
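
The dump above shows a product whose German name is nil, which is what makes split fail in postprocess. The following sketch shows one possible guard: skip such products and collect them for the report instead of raising. Name and Product are Struct stand-ins defined only for this example, not the real ODDB classes, and name_keys is a hypothetical method.

```ruby
# Stand-ins for illustration only; the real classes live in de.oddb.org.
Name    = Struct.new(:de)
Product = Struct.new(:odba_id, :name, :company)

# Splits the German name of every company-less product into keys.
# Products without a German name are skipped and returned separately
# so that they can be listed in the report.
def name_keys(products)
  keys = {}
  nameless = []
  products.each do |product|
    next if product.company
    name = product.name && product.name.de
    if name.nil?
      nameless << product
    else
      keys[product.odba_id] = name.split
    end
  end
  [keys, nameless]
end
```

Whether such a product should be skipped, repaired or deleted is a separate decision; the sketch only keeps the loop running and makes the nameless records visible.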

Study updating process of import_gkv

Suspended

Set up yus password on production server

  • Created all the files and put them on the production server
Page last modified on July 13, 2011, at 11:58 AM