
20110526-update-zubef-flag-de_oddb



  1. Check the result of import_gkv yesterday
  2. Summary of bugs
  3. Testcases for import_gkv, rpdf2txt, bin/admin
  4. Add a report function when the PDF writer for Zubef is different

Goal/Estimate/Evaluation
  • Debug zubef flag / 90% / 90%
Milestones
  • Check result 8:00
  • execute import_gkv, import_dimdi on server
Summary
Commits

Check the result of import_gkv yesterday

Note

  • The zuzahlungsbefreit flag of the ACC 200 package (50 Brausetabletten) becomes true
  • Package 1887732 is not found in the Zubef-PDF.

Email

Wed May 25 18:54:13 2011: de.oddb.org ODDB::Import::Gkv#import
Imported  6850 Zubef-Entries on 25.05.2011:
Visited   6510 existing Zubef-Entries
Visited   6850 existing Companies
Visited   1304 existing Substances
Created    340 new Zubef-Entries
Created      0 new Products
Created      0 new Sequences
Created      0 new Companies
Created      0 new Substances
Assigned     0 Chemical Equivalences
Assigned     0 Companies
Created      0 Incomplete Packages:
Created      1 Product(s) without a name (missing product name):
http://de.oddb.org/de/drugs/product/uid/3480899

Console log

/usr/lib64/ruby/site_ruby/1.8/odba/marshal.rb:15:in `load': undefined class/module ODDB::Business::GrantDownload (ArgumentError)
        from /usr/lib64/ruby/site_ruby/1.8/odba/marshal.rb:15:in `load'
        from /usr/lib64/ruby/site_ruby/1.8/odba/cache.rb:616:in `restore'
        from /usr/lib64/ruby/site_ruby/1.8/odba/cache.rb:336:in `fetch_or_restore'
        from /usr/lib64/ruby/site_ruby/1.8/odba/cache.rb:330:in `call'
        from /usr/lib64/ruby/site_ruby/1.8/odba/cache.rb:330:in `fetch_or_do'
        from /usr/lib64/ruby/site_ruby/1.8/odba/cache.rb:335:in `fetch_or_restore'
        from /usr/lib64/ruby/site_ruby/1.8/odba/cache.rb:64:in `bulk_restore'
        from /usr/lib64/ruby/site_ruby/1.8/odba/cache.rb:61:in `each'
         ... 20 levels...
        from /usr/lib64/ruby/site_ruby/1.8/oddb/util/job.rb:4
        from /usr/lib64/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `gem_original_require'
        from /usr/lib64/ruby/site_ruby/1.8/rubygems/custom_require.rb:31:in `require'
        from /home/masa/ywesee/de.oddb.org/jobs/import_gkv:7

Note

  • Looks fine
  • The zuzahlungsbefreit flags are also updated correctly
  • The Festbetragsgruppen are the same as before
  • But one error occurs

Commit

Run on server (on the screen 'import_dimdi' (Detached))

  1. git pull
  2. restart de.oddb (svc -h /service/de.oddb)
  3. save_all_package (bin/admin)
  4. remove the latest Zubef-PDF
  5. import_gkv (ruby -I lib jobs/import_gkv)
  6. import_dimdi (ruby -I lib jobs/import_dimdi)

Email

Thu May 26 10:57:15 2011: de.oddb.org ODDB::Import::Gkv#import
Imported  6850 Zubef-Entries on 26.05.2011:
Visited   6510 existing Zubef-Entries
Visited   6850 existing Companies
Visited   1304 existing Substances
Created    340 new Zubef-Entries
Created      0 new Products
Created      0 new Sequences
Created      0 new Companies
Created      0 new Substances
Assigned     0 Chemical Equivalences
Assigned     0 Companies
Created      0 Incomplete Packages:
Created      1 Product(s) without a name (missing product name):
http://de.oddb.org/de/drugs/product/uid/3480899
Thu May 26 15:15:50 2011: de.oddb.org ODDB::Import::Dimdi::Substance#import
NoMethodError
private method `split' called for nil:NilClass
./lib/oddb/import/importer.rb:16:in `capitalize_all'
./lib/oddb/import/dimdi.rb:515:in `import_row'
./lib/oddb/import/excel.rb:31:in `import_worksheet'
/usr/lib64/ruby/gems/1.8/gems/spreadsheet-0.6.3.1/lib/spreadsheet/worksheet.rb:112:in `call'
/usr/lib64/ruby/gems/1.8/gems/spreadsheet-0.6.3.1/lib/spreadsheet/worksheet.rb:112:in `each'
/usr/lib64/ruby/gems/1.8/gems/spreadsheet-0.6.3.1/lib/spreadsheet/worksheet.rb:111:in `upto'
/usr/lib64/ruby/gems/1.8/gems/spreadsheet-0.6.3.1/lib/spreadsheet/worksheet.rb:111:in `each'
/usr/lib64/ruby/gems/1.8/gems/spreadsheet-0.6.3.1/lib/spreadsheet/excel/worksheet.rb:34:in `each'
./lib/oddb/import/excel.rb:30:in `import_worksheet'
./lib/oddb/import/excel.rb:25:in `import'
./lib/oddb/util/updater.rb:106:in `reported_import'
./lib/oddb/util/updater.rb:113:in `call'
./lib/oddb/util/updater.rb:113:in `_reported_import'
./lib/oddb/util/updater.rb:106:in `reported_import'
./lib/oddb/util/updater.rb:42:in `import_dimdi_substances'
./lib/oddb/import/dimdi.rb:45:in `call'
./lib/oddb/import/dimdi.rb:45:in `download'
/usr/lib64/ruby/1.8/open-uri.rb:135:in `open_uri'
/usr/lib64/ruby/1.8/open-uri.rb:519:in `open'
/usr/lib64/ruby/1.8/open-uri.rb:30:in `open'
./lib/oddb/import/dimdi.rb:44:in `download'
./lib/oddb/util/updater.rb:41:in `import_dimdi_substances'
./lib/oddb/util/updater.rb:19:in `import_dimdi'
jobs/import_dimdi:12
./lib/oddb/util/job.rb:16:in `call'
./lib/oddb/util/job.rb:16:in `run'
jobs/import_dimdi:11
Imported 310 Substances per 01.04.2011:
Visited  268 existing
Visited   86 existing in Combinations
Created    0 new
Created    0 new from Combinations
Thu May 26 15:16:20 2011: de.oddb.org ODDB::Import::Dimdi::GalenicForm#import
Imported 113 Galenic Forms per 01.04.2011:
Visited  113 existing
Created    0 new
Thu May 26 15:16:34 2011: de.oddb.org ODDB::Import::Dimdi::Product#import
Imported   29830 Products per 01.04.2011:
Visited        0 existing Products
Visited        0 existing Sequences
Ignored        0 unknown Products
Created        0 new Sequences
Created        0 new Substances from Combinations
Renamed        0 Products
Reassigned     0 PZNs
Deleted        0 Products
Deleted        0 Sequences
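
The NoMethodError in the Dimdi::Substance import above is raised because capitalize_all receives nil (on Ruby 1.8, calling split on nil is reported as a private method call). The real method in lib/oddb/import/importer.rb is not shown in this log, so the following is only a minimal sketch of a nil-safe variant. The body (splitting on whitespace and capitalizing each word) is an assumption made for illustration.

```ruby
# Hypothetical nil-safe variant of capitalize_all.
# The real implementation in lib/oddb/import/importer.rb may differ.
def capitalize_all(str)
  # An empty spreadsheet cell arrives as nil: pass it through
  # instead of calling split on it.
  return nil if str.nil?
  str.split(/\s+/).map { |word| word.capitalize }.join(' ')
end
```

With this guard an empty cell no longer aborts the row. Whether import_row should then skip such a row or report it is a separate decision.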

Summary of bugs

Blogs (12.05 - 25.05)

  1. 20110512-debug-update-function-deOddb
  2. 20110513-debug-import-process-deOddb-debug-rpdf2txt
  3. 20110516-trace-rpdf2txt
  4. 20110517-update-rpdf2txt
  5. 20110518-update-active_agent-de_oddb
  6. 20110523-swissindex-csv-update-zubef-flag-de_oddb
  7. 20110524-update-zubef-flag-de_oddb

Problems (bugs)

  • Company names and active agent names are mangled in de.oddb
  • Some zuzahlungsbefreit flags are not set even though the packages are listed in the Zubef-PDF file.
  • import_gkv does not work correctly: all the zuzahlungsbefreit flags of the packages become false.

Causes

  1. The Zubef-PDF file has been different since 2011 (it is written by a different PDF writer: pdfFactory 3.25 instead of Acrobat Distiller 9.0)
  2. Because of that, rpdf2txt could not read the PDF data.
  3. The last line (data) of a page is not output by rpdf2txt
  4. rpdf2txt reverses the lines in the new PDF format, so the first data line of a page cannot be output
  5. The Package#save method does not work correctly. The method deletes most of the package data.

Solutions

  1. Update rpdf2txt (to read a new PDF format)
  2. Update GkvHandler class (to read the first line data of a pdf page)
  3. Update the package saving process to call the Package#save method twice

Note

  • I do not know yet why the Package#save method must be called twice.

Testcases for import_gkv, rpdf2txt, bin/admin

Check the current testcases for lib/oddb/import/gkv.rb

masa@masa ~/ywesee/de.oddb.org $ rcov -I lib test/import/test_gkv.rb  -t
./lib/oddb.rb:4: warning: already initialized constant VERSION
Loaded suite /usr/bin/rcov
Started
.........
Finished in 0.157044 seconds.

9 tests, 44 assertions, 0 failures, 0 errors
75.3%   30 file(s)   2991 Lines   2603 LOC

Note

  • The changed parts are covered

Check the current testcases for rpdf2txt/object.rb

masa@masa ~/ywesee/rpdf2txt $ rcov -I lib test/test_pdf_object.rb -t
Loaded suite /usr/bin/rcov
Started
......................'invalid literal/lengths set' when filtering with /FlateDecode
.............................unknown encoding 370 0 R
...
Finished in 5.690296 seconds.

54 tests, 95 assertions, 0 failures, 0 errors
55.5%   36 file(s)   7945 Lines   6649 LOC

Note

  • A part of the merge_snippets method is not covered

rpdf2txt/test/test_object.rb

#!/usr/bin/env ruby
# encoding: utf-8
# TestObject -- rpdf2txt -- 26.05.2011 -- mhatakeyama@ywesee.com

$: << File.expand_path('../lib', File.dirname(__FILE__))

require 'test/unit'
require 'flexmock'
require 'rpdf2txt/object'

module Rpdf2txt
  class TestPageLeaf < Test::Unit::TestCase
    include FlexMock::TestCase
    def test_merge_snippets
      pageleaf = Rpdf2txt::PageLeaf.new
      snippet1 = flexmock('snippet1',
                          :txt => 'txt1',
                          :txt= => nil
                         )
      snippet2 = flexmock('snippet2',
                          :txt => 'txt2',
                          :txt= => nil
                         )

      text_snippets = [snippet1, snippet2, snippet2]
      result = pageleaf.merge_snippets(text_snippets)
      assert_equal(2, result.length)
      assert_kind_of(snippet1.class, result[0])
      assert_kind_of(snippet2.class, result[1])
    end
  end
end

Result

masa@masa ~/ywesee/rpdf2txt $ rcov -I lib test/test_object.rb 
Loaded suite /usr/bin/rcov
Started
.
Finished in 0.000629 seconds.

1 tests, 3 assertions, 0 failures, 0 errors

Commit

Check the testcases for lib/oddb/util/server.rb

masa@masa ~/ywesee/de.oddb.org $ rcov -I lib test/util/test_server.rb -t
./lib/oddb.rb:4: warning: already initialized constant VERSION
./lib/oddb/html/view/drugs/package.rb:373: warning: parenthesize argument(s) for future version
Loaded suite /usr/bin/rcov
Started
......
Finished in 0.026684 seconds.

6 tests, 12 assertions, 0 failures, 0 errors
39.3%   167 file(s)   16883 Lines   15025 LOC

test/util/test_server.rb

      def test_delete_all_active_agent_but_first
        active_agent = flexmock('active_agent')
        package = flexmock('package',
                           :code => 'code',
                           :active_agents => [active_agent, active_agent]
                          )
        flexmock(active_agent, :package => package)
        def active_agent.delete
          self.package.active_agents.delete_at(1)
        end
        flexmock(ODDB::Drugs::Package, :all => [package])
        assert_equal([package], @server.delete_all_active_agent_but_first)
      end
      def test_save_all_package
        package = flexmock('package', :save => nil)
        flexmock(ODDB::Drugs::Package, :all => [package])
        assert_equal('Done', @server.save_all_package)
      end

Result

masa@masa ~/ywesee/de.oddb.org $ ruby -I lib test/util/test_server.rb
./lib/oddb.rb:4: warning: already initialized constant VERSION
./lib/oddb/html/view/drugs/package.rb:373: warning: parenthesize argument(s) for future version
Loaded suite test/util/test_server
Started
........
Finished in 0.03162 seconds.

8 tests, 14 assertions, 0 failures, 0 errors
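
For orientation, test_save_all_package only requires that every package returned by Package.all can be saved and that the method returns 'Done'. A minimal sketch that satisfies this contract is given here. SketchPackage is a stand-in for ODDB::Drugs::Package, and the passes parameter is hypothetical (it only illustrates how a second Package#save call could be requested). None of this is taken from lib/oddb/util/server.rb.

```ruby
# Stand-in for ODDB::Drugs::Package (assumption: the real class is
# persisted elsewhere and exposes .all and #save).
class SketchPackage
  class << self
    attr_accessor :all
  end
  self.all = []

  attr_reader :code, :save_count

  def initialize(code)
    @code = code
    @save_count = 0
  end

  # Only counts calls; the real method writes the package to the database.
  def save
    @save_count += 1
  end
end

# Hypothetical helper: saves every package `passes` times and returns
# 'Done', the value the test case expects.
def save_all_package(passes = 1, packages = SketchPackage.all)
  packages.each do |package|
    passes.times { package.save }
  end
  'Done'
end
```

Calling save_all_package(2) would save each package twice in one run.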

Commit

Add a report function when the PDF writer for Zubef is different

Reference

pdf-reader

masa@masa ~/work/examples $ gem search pdf-reader

*** LOCAL GEMS ***

pdf-reader (0.8.6)

masa@masa ~/work/examples $ ruby metadata.rb testfile.pdf 
{:Producer=>"Mac OS X 10.2.2 Quartz PDFContext", :CreationDate=>"D:20021114130430Z00'00'", :ModDate=>"D:20021114130430Z00'00'", :Creator=>"BBEdit"}
nil

masa@masa ~/work/examples $ ruby metadata.rb zubef.pdf 
/usr/lib64/ruby/gems/1.8/gems/pdf-reader-0.8.6/lib/pdf/reader.rb:132:in `parse': PDF::Reader cannot read encrypted PDF files (PDF::Reader::UnsupportedFeatureError)
        from /usr/lib64/ruby/gems/1.8/gems/pdf-reader-0.8.6/lib/pdf/reader.rb:76:in `file'
        from /usr/lib64/ruby/gems/1.8/gems/pdf-reader-0.8.6/lib/pdf/reader.rb:75:in `open'
        from /usr/lib64/ruby/gems/1.8/gems/pdf-reader-0.8.6/lib/pdf/reader.rb:75:in `file'
        from metadata.rb:23

Note

  • It seems that the PDF writer (producer) information is stored as metadata in the PDF file.
  • pdf-reader cannot be used for the Zubef PDF, because the file is encrypted (PDF::Reader::UnsupportedFeatureError).

Next

  • Find out how to get the metadata from an encrypted PDF file

origami-pdf

Reference

Install

masa@masa ~/work $ sudo gem install origami
masa@masa ~/work $ gem search origami

*** LOCAL GEMS ***

origami (1.0.2)

Test

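require 'origami'
require 'pp'
include Origami

# Dump the objects whose serialized form mentions pdfFactory
pdf = PDF.read('test2.pdf')
count = 0
pdf.objects.each do |ob|
  if ob.to_s =~ /pdfFactory/
    pp ob
    count += 1
  end
  exit if count > 10   # stop once more than 10 matches were printed
end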

Result

{#<Origami::Name:0x7f5e50c0ad68
  @file_offset=136,
  @generation=0,
  @indirect=false,
  @no=0,
  @parent={...},
  @value="Producer">=>
  "pdfFactory 3.25 (Windows Server 2003 R2 Standard Edition German)",
 #<Origami::Name:0x7f5e50c078c0
  @file_offset=213,
  @generation=0,
  @indirect=false,
  @no=0,
  @parent={...},
  @value="CreationDate">=>"D:20110516102057+02'00'",
 #<Origami::Name:0x7f5e50c0c7a8
  @file_offset=93,
  @generation=0,
  @indirect=false,
  @no=0,
  @parent={...},
  @value="Creator">=>"pdfFactory www.context-gmbh.de",
"pdfFactory 3.25 (Windows Server 2003 R2 Standard Edition German)"
"pdfFactory www.context-gmbh.de"

Note

  • It seems that the entry keyed by the Origami::Name with @value='Producer' holds the information
  • In any case, it looks possible to get the PDF writer information with origami-pdf.
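
Once the Producer string can be read, the report function itself could be as small as the sketch below. It assumes the document info has already been extracted into a plain Hash with String keys (how that Hash is built from origami is left open here). The names EXPECTED_PRODUCER and producer_report are hypothetical.

```ruby
# Producer the current rpdf2txt fixes were made for (value from the result above).
EXPECTED_PRODUCER = 'pdfFactory 3.25'

# info: Hash of PDF document-info entries, e.g. {'Producer' => '...'}.
# Returns nil when the producer starts with the expected string,
# otherwise a one-line warning that can be appended to the import report.
def producer_report(info, expected = EXPECTED_PRODUCER)
  producer = info['Producer'].to_s
  return nil if producer.index(expected) == 0
  found = producer.empty? ? 'unknown' : producer
  "Warning: Zubef-PDF producer changed: expected '#{expected}', found '#{found}'"
end
```

A prefix match is used so that the platform suffix in parentheses does not trigger a warning. A change of the version number would.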
Page last modified on May 27, 2011, at 07:35 AM