
20110215-testcases-ext-oddb_org



  1. Update test-cases test_fachinfo_doc_parser.rb
  2. Update the other test-cases of fiparse
  3. Make suite.rb for ext modules

Goal/Estimate
  • test_fachinfo_doc_parser test-cases / 90%
Milestones
  1. test_fachinfo_doc_parser.rb 11:20
    • test_*11 all tests passed
    • test_*10
    • test_*12
    • test_*13 10:10
    • test_*9 10:40
  2. test_fachinfo_hpricot.rb
  3. test_fachinfo_pdf.rb
  4. test_fiparse.rb
  5. test_indications.rb
  6. test_patinfo_hpricot.rb
  7. suite.rb for ext modules
Summary
Commits
ToDo Tomorrow
Keep in Mind
  1. no more test_minifi.rb
  2. backup wiki
  3. On Ice

Update test-cases test_fachinfo_doc_parser.rb

Confirm the current status

masa@masa ~/ywesee/oddb.org/ext/fiparse/test $ ruby test_fachinfo_doc_parser.rb 
Loaded suite test_fachinfo_doc_parser
Started
...........FFFFFF......FFFFFF.E..F..
>------------------------------------------------------------------------
                   Disktest (10 &#65533;g<
>------------------------------------------------------------------------
                   Disktest (10 µ<
.................................................
Finished in 14.920982 seconds.

  1) Failure:
test_composition10(TestFachinfoDocParser10) [test_fachinfo_doc_parser.rb:1038]:
<nil> expected to be an instance of
<ODDB::Text::Chapter> but was
<NilClass>.

  2) Failure:
test_galenic_form10(TestFachinfoDocParser10) [test_fachinfo_doc_parser.rb:1051]:
<"Galenische Form und Wirkstoffmengen pro Einheit"> expected but was
<"Tropfen. 1 ml enth\303\244lt: 25 mg Hamameliswasser, 5 mg Augentrosttinktur, 0,9 mg Dexpanthenol.">.

  3) Failure:
test_iksnrs10(TestFachinfoDocParser10) [test_fachinfo_doc_parser.rb:1069]:
<nil> expected to be an instance of
<ODDB::Text::Chapter> but was
<NilClass>.

  4) Failure:
test_indications10(TestFachinfoDocParser10) [test_fachinfo_doc_parser.rb:1062]:
<nil> expected to be an instance of
<ODDB::Text::Chapter> but was
<NilClass>.

  5) Failure:
test_name10(TestFachinfoDocParser10) [test_fachinfo_doc_parser.rb:1033]:
<"Tendro, Augentropfen\n"> expected but was
<"Tendro, Augentropfen\n\t\t\t\t\t\t\t\t\t\tTentan AG\n">.

  6) Failure:
test_registration_owner10(TestFachinfoDocParser10) [test_fachinfo_doc_parser.rb:1079]:
<nil> expected to be an instance of
<ODDB::Text::Chapter> but was
<NilClass>.

  7) Failure:
test_composition12(TestFachinfoDocParser12) [test_fachinfo_doc_parser.rb:1182]:
<nil> expected to be an instance of
<ODDB::Text::Chapter> but was
<NilClass>.

  8) Failure:
test_galenic_form12(TestFachinfoDocParser12) [test_fachinfo_doc_parser.rb:1195]:
<"Forme gal\351nique et quantit\351 de principe actif par unit\351"> expected but was
<"a       Principe actif : extrait de millepertuis (Hyperici herba extractum).">.

  9) Failure:
test_iksnrs12(TestFachinfoDocParser12) [test_fachinfo_doc_parser.rb:1213]:
<nil> expected to be an instance of
<ODDB::Text::Chapter> but was
<NilClass>.

 10) Failure:
test_indications12(TestFachinfoDocParser12) [test_fachinfo_doc_parser.rb:1206]:
<nil> expected to be an instance of
<ODDB::Text::Chapter> but was
<NilClass>.

 11) Failure:
test_name12(TestFachinfoDocParser12) [test_fachinfo_doc_parser.rb:1177]:
<"Yakona-Hypericum\n"> expected but was
<"Yakona-Hypericum\n\t\t\t\t\t\t\t\t\t\tTentan AG\n">.

 12) Failure:
test_registration_owner12(TestFachinfoDocParser12) [test_fachinfo_doc_parser.rb:1223]:
<nil> expected to be an instance of
<ODDB::Text::Chapter> but was
<NilClass>.

 13) Error:
test_date13(TestFachinfoDocParser13):
NoMethodError: undefined method `text' for nil:NilClass
    test_fachinfo_doc_parser.rb:1299:in `test_date13'

 14) Failure:
test_name13(TestFachinfoDocParser13) [test_fachinfo_doc_parser.rb:1247]:
<1> expected but was
<2>.

85 tests, 360 assertions, 13 failures, 1 errors

Note

  • The 'test_*9' tests are commented out because they abort with a segmentation fault
  • The tests from 'test_*9' to 'test_*13' cover a new format of the fachinfo doc

Next

  • Focus on 'test_*10'
masa@masa ~/ywesee/oddb.org/ext/fiparse/test $ ruby test_fachinfo_doc_parser.rb 
Loaded suite test_fachinfo_doc_parser
Started
FFFFFF
Finished in 0.060513 seconds.

  1) Failure:
test_composition10(TestFachinfoDocParser10) [test_fachinfo_doc_parser.rb:1042]:
<nil> expected to be an instance of
<ODDB::Text::Chapter> but was
<NilClass>.

  2) Failure:
test_galenic_form10(TestFachinfoDocParser10) [test_fachinfo_doc_parser.rb:1055]:
<"Galenische Form und Wirkstoffmengen pro Einheit"> expected but was
<"Tropfen. 1 ml enth\303\244lt: 25 mg Hamameliswasser, 5 mg Augentrosttinktur, 0,9 mg Dexpanthenol.">.

  3) Failure:
test_iksnrs10(TestFachinfoDocParser10) [test_fachinfo_doc_parser.rb:1073]:
<nil> expected to be an instance of
<ODDB::Text::Chapter> but was
<NilClass>.

  4) Failure:
test_indications10(TestFachinfoDocParser10) [test_fachinfo_doc_parser.rb:1066]:
<nil> expected to be an instance of
<ODDB::Text::Chapter> but was
<NilClass>.

  5) Failure:
test_name10(TestFachinfoDocParser10) [test_fachinfo_doc_parser.rb:1037]:
<"Tendro, Augentropfen\n"> expected but was
<"Tendro, Augentropfen\n\t\t\t\t\t\t\t\t\t\tTentan AG\n">.

  6) Failure:
test_registration_owner10(TestFachinfoDocParser10) [test_fachinfo_doc_parser.rb:1083]:
<nil> expected to be an instance of
<ODDB::Text::Chapter> but was
<NilClass>.

6 tests, 8 assertions, 6 failures, 0 errors

Memo

  • I have confirmed that the test data for test_*10 and test_*12 (fachinfo docs from 2006 and 2007) is not recognized at all
  • If I modify the first line of the doc, it is recognized
  • I should update the source code so that it recognizes the new format

Confirm the class (data) structure of rwv2

  • Keywords
    • chapter
    • section
    • paragraph

test.rb

require 'rwv2'
require 'rwv2/handlers'

class TextHandler < Rwv2::TextHandler
  def run_of_text(text, character_properties)
#    puts text
  end
  def section_start(section_properties)
    p "section_start"
  end
  def section_end
    p "section_end"
  end
  def page_break
    p "page_break"
  end
  def paragraph_start(paragraph_properties)
    p "paragraph_start"
  end
  def paragraph_end
    p "paragraph_end"
  end
end

parser = Rwv2.create_parser('test.doc')
parser.set_text_handler(TextHandler.new)
parser.parse

Result

masa@masa ~/work $ ruby test.rb 
"section_start"
"paragraph_start"
"paragraph_end"
"paragraph_start"
"paragraph_end"
...
"paragraph_start"
"paragraph_end"
"section_end"

Note

  • 'page_break' is never called
  • The document hierarchy (structure) looks as follows:
    • section - paragraph (=line) - (character)
  • I have to check the C++ code to understand 'character_properties', 'section_properties', and 'paragraph_properties'

Experiment

test.rb

require 'rwv2'
require 'rwv2/handlers'

class TextHandler < Rwv2::TextHandler
  def run_of_text(text, character_properties)
    puts text
  end
  def section_start(section_properties)
    p "section_start"
  end
  def section_end
    p "section_end"
  end
  def page_break
    p "page_break"
  end
  def paragraph_start(paragraph_properties)
    p "paragraph_start"
  end
  def paragraph_end
    p "paragraph_end"
  end
end

parser = Rwv2.create_parser('test.doc')
parser.set_text_handler(TextHandler.new)
parser.parse

Result

masa@masa ~/work $ ruby test.rb 
"section_start"
"paragraph_start"
Fachinformation 
Tendro, Augentropfen
"paragraph_end"
"paragraph_start"
1
7.September 2007                Seite 
 PAGE 
2
 von 
 NUMPAGES \*Arabic 
2
"paragraph_end"
"paragraph_start"
Tendro, Augentropfen
"paragraph_end"
"paragraph_start"
"paragraph_end"
"paragraph_start"
Tentan AG
"paragraph_end"
"paragraph_start"
"paragraph_end"
"paragraph_start"
"paragraph_end"
"paragraph_start"
Zusammensetzung
"paragraph_end"
"paragraph_start"
a       
Wirkstoffe:      
Hamameliswasser, Augentrosttinktur, Dexpanthenol
"paragraph_end"
"paragraph_start"
b.       Hilfsstoff:      
Konservierungsmittel: Phenylquecksilberborat
, 
sowie weitere 
"paragraph_end"
"paragraph_start"
Hilfsstoffe.
"paragraph_end"
"paragraph_start"
"paragraph_end"
"paragraph_start"
Galenische Form und Wirkstoffmengen pro Einheit
"paragraph_end"
...

Note

  • From this result, as far as I understand,
    • 'run_of_text' is called for every line of text
    • a line feed seems to be recognized as the start of a new 'paragraph'

Consideration

  • The '@chapter' property in fachinfo_doc.rb corresponds to one block of the document that starts with a bold line, for example
 Indikationen/Anwendungsmöglichkeiten
 Augentropfen bei leichten Reizungen der Augen (Brennen), 
 zur Befeuchtung der Augen bei Müdigkeits- und Trockenheitsgefühl.
  • @chapter is an instance of ODDB::Text::Chapter
  • '@chapter.heading' becomes the string of the bold line
  • In some cases, 'FachinfoTextHandler.writers[0].xxxx' is the corresponding chapter property
    • for example, @writer.composition becomes the @chapter of 'Composition' or 'Zusammensetzung'
    • The ODDB::Text hierarchy is as follows
      • Chapter - Section - Paragraph
  • The point is how to build the document structure by using 'run_of_text', 'paragraph_start', and 'paragraph_end'
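
A minimal sketch of how the three callbacks could be combined into the Chapter - Section - Paragraph hierarchy. It does not use rwv2 or the real ODDB::Text classes: 'Chapter', 'Section', 'Paragraph', 'SketchHandler', and the 'bold' flag on the character properties are all stand-ins invented for illustration, and the rule "a fully bold paragraph starts a chapter" is an assumption taken from the consideration above.

```ruby
# Stand-in structures; the real classes live in ODDB::Text.
Paragraph = Struct.new(:text)
Section   = Struct.new(:paragraphs)
Chapter   = Struct.new(:heading, :sections)
# Hypothetical character properties: only a bold flag.
CharProps = Struct.new(:bold)

class SketchHandler
  attr_reader :chapters

  def initialize
    @chapters = []
    @runs = []
  end

  def paragraph_start(_paragraph_properties = nil)
    @runs = []
  end

  # Collect every run of text together with its bold flag.
  def run_of_text(text, character_properties)
    @runs << [text, character_properties.bold]
  end

  def paragraph_end
    text = @runs.map { |run, _| run }.join.strip
    return if text.empty? # skip blank paragraphs
    if @runs.all? { |_, bold| bold }
      # A fully bold paragraph opens a new chapter with one empty section.
      @chapters << Chapter.new(text, [Section.new([])])
    elsif (chapter = @chapters.last)
      chapter.sections.last.paragraphs << Paragraph.new(text)
    end
  end
end

handler = SketchHandler.new
bold = CharProps.new(true)
plain = CharProps.new(false)
[
  [['Zusammensetzung', bold]],
  [['a       ', plain], ['Wirkstoffe:      ', plain],
   ['Hamameliswasser, Augentrosttinktur, Dexpanthenol', plain]],
  [],
  [['Galenische Form und Wirkstoffmengen pro Einheit', bold]]
].each do |runs|
  handler.paragraph_start
  runs.each { |text, props| handler.run_of_text(text, props) }
  handler.paragraph_end
end

handler.chapters.each do |c|
  puts "#{c.heading}: #{c.sections.first.paragraphs.size} paragraph(s)"
end
```

In this sketch, text that appears before the first bold line is dropped; the real handler presumably treats the product name and the header lines separately.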

ext/fiparse/fiparse.rb#parse_fachinfo_doc

    def parse_fachinfo_doc(src)
      parser = Rwv2.create_parser_from_content(src)
      handler = FachinfoTextHandler.new
      parser.set_text_handler(handler)
      parser.set_table_handler(handler.table_handler)
      parser.parse
      if(handler.writers.empty?)
        ## Product-Name was not written large enough - retry with whatever was 
        #  the largest fontsize
        handler.cutoff_fontsize = handler.max_fontsize
        parser.parse
      end
      handler.writers.collect { |wt| wt.to_fachinfo }.compact.first
    end

Note

  • I do not understand what the comment means, but it seems that the document is usually parsed twice
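
One possible reading of the comment in parse_fachinfo_doc, as a self-contained sketch: the handler only treats text at or above a cutoff font size as a product name, and if the first pass finds no product name, the parse is repeated with the cutoff lowered to the largest size seen. 'FakeHandler', 'parse', and the numeric sizes are hypothetical stand-ins, not the rwv2 or FachinfoTextHandler API.

```ruby
# Hypothetical stand-in for FachinfoTextHandler.
class FakeHandler
  attr_accessor :cutoff_fontsize
  attr_reader :writers, :max_fontsize

  def initialize(cutoff_fontsize)
    @cutoff_fontsize = cutoff_fontsize
    @max_fontsize = 0
    @writers = []
  end

  # Called once per run of text with its font size.
  def run_of_text(text, fontsize)
    @max_fontsize = [@max_fontsize, fontsize].max
    # Only text written large enough starts a new writer (document).
    @writers << text if fontsize >= @cutoff_fontsize
  end
end

# Hypothetical parse step: feeds every run to the handler.
def parse(runs, handler)
  runs.each { |text, size| handler.run_of_text(text, size) }
end

runs = [['Tendro, Augentropfen', 14], ['Zusammensetzung', 11]]
handler = FakeHandler.new(16) # the product name is smaller than the cutoff
parse(runs, handler)
if handler.writers.empty?
  # Second pass: accept the largest size seen in the first pass.
  handler.cutoff_fontsize = handler.max_fontsize
  parse(runs, handler)
end
p handler.writers
```

Under this reading the second pass only happens when the first pass produced no writers.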

Memo

  • One difficulty of the code is that it has too many branches (if conditions)
  • That makes the purpose of each method unclear and the process hard to trace

test_*10, test_*11, test_*12, test_*13 all passed

masa@masa ~/ywesee/oddb.org/ext/fiparse/test $ ruby test_fachinfo_doc_parser.rb 
Loaded suite test_fachinfo_doc_parser
Started
.....................................................................................
Finished in 15.136881 seconds.

85 tests, 394 assertions, 0 failures, 0 errors

Next (The last error of test_fachinfo_doc.rb)

  • test_*9 segmentation fault
masa@masa ~/ywesee/oddb.org/ext/fiparse/test $ ruby test_fachinfo_doc_parser.rb 
Loaded suite test_fachinfo_doc_parser
Started
test_fachinfo_doc_parser.rb:982: [BUG] Segmentation fault
ruby 1.8.6 (2009-06-08) [x86_64-linux]

Abgebrochen

Check

  • the test data, Calcitriol_f.doc

Experiment

  • Remove the italic format of the product name

Result

  • It works

Next

  • Swissmedic number recognition problem
  • If there is a ' symbol inside the swissmedic number, it is not recognized as a swissmedic number

Experiment

ext/fiparse/src/fachinfo_doc.rb#new_font

        elsif(!@in_table)
          # remove ' symbol of swissmedic number
          if text =~ /\(Swissmedic\)/
            text.gsub!(/(\d+).*?(\d+)/,'\1\2')
          end

Note

  • This is not a smart solution, only a temporary one
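
A small standalone check of what the substitution does. The sample strings are made up for illustration (only the single-number form matches the test expectation), and they assume the symbol is a plain ASCII apostrophe. 'strip_committed' and 'strip_apostrophe' are hypothetical helper names; the narrower pattern is only a suggested alternative, not the committed code.

```ruby
# The pattern from new_font: drop whatever sits between two digit groups.
def strip_committed(text)
  text = text.dup
  text.gsub!(/(\d+).*?(\d+)/, '\1\2') if text =~ /\(Swissmedic\)/
  text
end

# Suggested alternative: remove only an apostrophe that sits between two digits.
def strip_apostrophe(text)
  text.gsub(/(\d)'(?=\d)/, '\1')
end

puts strip_committed("47'831  (Swissmedic)")         # 47831  (Swissmedic)
puts strip_committed("47831, 47832 (Swissmedic)")    # 4783147832 (Swissmedic)
puts strip_apostrophe("47'831, 47'832 (Swissmedic)") # 47831, 47832 (Swissmedic)
```

The second call shows why the pattern is only temporary: it also swallows a separator between two numbers.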

ext/fiparse/test/test_fachinfo_doc.rb#test_iksnrs10

  def test_iksnrs10
    writer = @text_handler.writers.first
    chapter = writer.iksnrs
    assert_instance_of(ODDB::Text::Chapter, chapter)
    assert_equal('Zulassungsnummer', chapter.heading)
    assert_equal(1, chapter.sections.size)
    assert_equal(1, chapter.sections.first.paragraphs.size)
    paragraph = chapter.sections.first.paragraphs.first
    assert_equal("47831  (Swissmedic)", paragraph.text)
  end

Result

masa@masa ~/ywesee/oddb.org/ext/fiparse/test $ ruby test_fachinfo_doc_parser.rb 
Loaded suite test_fachinfo_doc_parser
Started
.
Finished in 0.011241 seconds.

1 tests, 5 assertions, 0 failures, 0 errors

Check the actual loading on the local oddb system

  • I have confirmed that the ' symbol is removed from the swissmedic number

Check final test (test_fachinfo_doc_parser.rb)

masa@masa ~/ywesee/oddb.org/ext/fiparse/test $ ruby test_fachinfo_doc_parser.rb 
Loaded suite test_fachinfo_doc_parser
Started
........................................................................................
Finished in 15.34317 seconds.

88 tests, 400 assertions, 0 failures, 0 errors

Commit

Update the other test-cases of fiparse

Check the current status

masa@masa ~/ywesee/oddb.org/ext/fiparse/test $ ruby test_fachinfo_hpricot.rb 
Loaded suite test_fachinfo_hpricot
Started
F...F....................................
Finished in 1.078624 seconds.

  1) Failure:
test_chapter(ODDB::FiParse::TestFachinfoHpricot) [test_fachinfo_hpricot.rb:52]:
<"1 Brausetablette enth\303\244lt: Carbasalatum calcicum 528\302\240mg corresp. Acidum Acetylsalicylicum 415\302\240mg, Acidum ascorbicum 250\302\240mg."> expected but was
<"1 Brausetablette enth\303\244lt: Carbasalatum calcicum 528 mg corresp. Acidum Acetylsalicylicum 415 mg, Acidum ascorbicum 250 mg.">.

  2) Failure:
test_composition1(ODDB::FiParse::TestFachinfoHpricotAlcaCDe) [test_fachinfo_hpricot.rb:113]:
<"1 Brausetablette enth\303\244lt: Carbasalatum calcicum 528\302\240mg corresp. Acidum Acetylsalicylicum 415\302\240mg, Acidum ascorbicum 250\302\240mg."> expected but was
<"1 Brausetablette enth\303\244lt: Carbasalatum calcicum 528 mg corresp. Acidum Acetylsalicylicum 415 mg, Acidum ascorbicum 250 mg.">.

41 tests, 100 assertions, 2 failures, 0 errors

Note

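  • These failures are caused by encoding: the expected strings contain \302\240 (the UTF-8 bytes of a non-breaking space) before 'mg', where the actual output appears to have a plain space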
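
A small check of how a string with a non-breaking space (U+00A0, written \302\240 in the octal escapes above) differs from one with a plain space, and one way a test could normalize both sides before comparing. The helper name 'normalize_spaces' is hypothetical, and the syntax assumes a Ruby newer than the 1.8.6 used in this log.

```ruby
# "\u00A0" is the non-breaking space; in UTF-8 it is the bytes 0xC2 0xA0.
expected = "528\u00A0mg"
actual   = "528 mg"

p "\u00A0".bytes.map { |b| format('%o', b) } # ["302", "240"]
p expected == actual                          # false

# Hypothetical helper: treat non-breaking and plain spaces alike.
def normalize_spaces(text)
  text.gsub("\u00A0", ' ')
end

p normalize_spaces(expected) == normalize_spaces(actual) # true
```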

Commit

Check coverage of fiparse

 masa@masa ~/ywesee/oddb.org/ext/fiparse/test $ rcov suite.rb -t
 Loaded suite /usr/bin/rcov
 Started
 ..............................................................................................................................................................
 Finished in 20.883724 seconds.

 158 tests, 680 assertions, 0 failures, 0 errors
 65.6%   46 file(s)   11072 Lines   9630 LOC

Make suite.rb for ext modules

  • All the tests in the ext modules passed

Memo

  • Some test files in the ext directory are renamed from 'test_abc.rb' to 'abc_test.rb'
  • The renamed files are the ones that are not tested for the moment
  • ext/suite.rb lists each test file explicitly
  • If we add more test files, we also have to update ext/suite.rb
  • The dependency between test-cases in the 'oddb.org/ext' directory is removed, but
  • the dependency between test-cases in the 'oddb.org/test' directory is NOT removed
  • 'oddb.org/test/suite.rb' is modified to execute each test file independently
  • At the moment, 'oddb.org/test/suite.rb' and 'oddb.org/ext/suite.rb' cannot be executed at the same time by the rcov command
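
A minimal sketch of a suite that lists its test files explicitly and runs each one in its own Ruby process, so that one file cannot affect another. The file list and the helper 'run_independently' are hypothetical, chosen only to mirror the memo; the real suite.rb files may be organized differently.

```ruby
require 'rbconfig'

# Hypothetical explicit list; a file renamed to 'abc_test.rb' is simply not listed.
TEST_FILES = %w[
  fiparse/test/test_fachinfo_doc_parser.rb
  fiparse/test/test_fachinfo_hpricot.rb
].freeze

# Run every listed file in a separate process and collect the results.
# Files that do not exist under base_dir are reported as :missing.
def run_independently(files, base_dir = Dir.pwd)
  files.each_with_object({}) do |file, results|
    path = File.join(base_dir, file)
    results[file] = File.exist?(path) ? system(RbConfig.ruby, path) : :missing
  end
end

results = run_independently(TEST_FILES)
results.each { |file, status| puts "#{file}: #{status.inspect}" }
```

Because the list is explicit, adding a test file means adding one line to the list, which matches the update rule noted in the memo.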

Commit

Page last modified on February 15, 2011, at 05:04 PM