
20111107-remove-rockit-oddb_org



  1. Replace the Rockit library with plain regular expressions

Goal/Estimate/Evaluation
  • Replace the Rockit library in oddb.org / 50% / 50%

Replace the Rockit library with plain regular expressions

Task

  • Replace the Rockit-based size parsing in the ODDB::Part#size= method with a plain regular expression
  • src/model/part.rb
module ODDB
  module SizeParser
    unit_pattern = '(([kmµucMG]?([glLJm]|mol|Bq)\b)(\/([mµu]?[glL])\b)?)|((Mio\s)?U\.?I\.?)|(%( [mV]\/[mV])?)|(I\.E\.)|(Fl\.)'
    numeric_pattern = '\d+(\'\d+)*([.,]\d+)?'
    iso_pattern = "[[:alpha:]()\-]+"
    @@parser = Parse.generate_parser <<-EOG
Grammar OddbSize
  Tokens
    DESCRIPTION = /(?!#{unit_pattern}\s)#{iso_pattern}(\s+#{iso_pattern})*/u
    NUMERIC     = /#{numeric_pattern}/u
    SPACE       = /\s+/u [:Skip]
    UNIT        = /#{unit_pattern}/u
  Productions
    Size      ->  Multiple* Addition? Count? Measure? Scale? Dose? DESCRIPTION?
    Count     ->  'je'? NUMERIC
    Multiple  ->  NUMERIC UNIT? /[xXà]|Set/u
    Measure   ->  NUMERIC UNIT UNIT?
    Addition  ->  NUMERIC UNIT? '+'
    Scale     ->  '/' NUMERIC? UNIT
    Dose      ->  '(' NUMERIC UNIT ')'
    EOG
    def size=(size)
      unless size.to_s.strip.empty?
        @addition, @multi, @count, @measure, @scale, @comform = parse_size(size)
...
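A minimal sketch of what a regexp-only replacement could look like, assuming the Rockit grammar above is the specification. This is not the oddb.org implementation: the module name SizeRegexpSketch, the parse method and its Hash result are hypothetical. Each production becomes one regexp fragment, SPACE becomes \s*, and the fragments are concatenated in the order of the Size production. One assumption worth flagging: Rockit tokenizes before parsing, whereas a backtracking regexp may split "200" into "20" + "0"; the atomic group (?>...) around NUMERIC prevents that.

```ruby
require 'strscan'

# Hypothetical regexp-only sketch of the OddbSize grammar (not oddb.org code).
module SizeRegexpSketch
  # Same token patterns as in src/model/part.rb.
  UNIT_PATTERN    = '(([kmµucMG]?([glLJm]|mol|Bq)\b)(\/([mµu]?[glL])\b)?)|((Mio\s)?U\.?I\.?)|(%( [mV]\/[mV])?)|(I\.E\.)|(Fl\.)'
  NUMERIC_PATTERN = '\d+(\'\d+)*([.,]\d+)?'
  ISO_PATTERN     = '[[:alpha:]()\-]+'

  N   = "(?>#{NUMERIC_PATTERN})" # atomic group: "200" is never split into "20" + "0"
  U   = "(?:#{UNIT_PATTERN})"
  ISO = "(?:#{ISO_PATTERN})"
  S   = '\s*'                    # SPACE is [:Skip] in the grammar

  # One fragment per production; each fragment also consumes trailing spaces.
  MULTIPLE = "#{N}#{S}#{U}?#{S}(?:[xXà]|Set)#{S}"
  ADDITION = "#{N}#{S}#{U}?#{S}\\+#{S}"
  COUNT    = "(?:je#{S})?#{N}#{S}"
  MEASURE  = "#{N}#{S}#{U}#{S}(?:#{U}#{S})?"
  SCALE    = "\\/#{S}(?:#{N}#{S})?#{U}#{S}"
  DOSE     = "\\(#{S}#{N}#{S}#{U}#{S}\\)#{S}"
  # UNIT_PATTERN is interpolated without a wrapping group, as in part.rb,
  # so the trailing \s binds only to its last alternative (Fl\.).
  DESCRIPTION = "(?!#{UNIT_PATTERN}\\s)#{ISO}(?:\\s+#{ISO})*"

  MULTIPLE_RE = Regexp.new(MULTIPLE)
  SIZE_RE = Regexp.new(
    "\\A(?<multi>(?:#{MULTIPLE})*)(?<addition>#{ADDITION})?(?<count>#{COUNT})?" \
    "(?<measure>#{MEASURE})?(?<scale>#{SCALE})?(?<dose>#{DOSE})?" \
    "(?<description>#{DESCRIPTION})?\\s*\\z"
  )

  # Returns the matched text per production as a Hash, or nil if nothing fits.
  def self.parse(size)
    m = SIZE_RE.match(size.to_s.strip) or return nil
    # Multiple* can repeat, so split the captured run into single Multiples.
    scanner = StringScanner.new(m[:multi])
    multi = []
    while (piece = scanner.scan(MULTIPLE_RE))
      multi << piece.strip
    end
    rest = %i[addition count measure scale dose description].map { |k| [k, m[k]&.strip] }.to_h
    { multi: multi }.merge(rest)
  end
end
```

For the four strings listed under Result, this sketch splits the sizes the same way (for example "10x200 ml" gives multi ["10x"] and measure "200 ml"), but it returns matched text instead of Rockit's token arrays and has not been run against the import data.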

Note

  • So far, I know of these places where ODDB::Part#size= is called:
    1. ODDB::BsvPlugin::PreparationsListener#tag_end (when 'Preparation') (src/plugin/bsv_xml.rb)
    2. ODDB::SwissmedicPlugin#update_package (src/plugin/swissmedic.rb)
  • Both are called from the importer (ODDB::Updater).

Note (Pattern samples)

  • unit_pattern:
 mg kg ml mol Bq  'Mio U.I.'
  • numeric_pattern:
 123  123'456 123.4 
  • iso_pattern:
 abc abc(def) abc-def
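The samples can be checked directly against the three pattern strings from part.rb. This small sketch wraps each pattern in \A(?:...)\z so the whole sample must be one token; the wrapper matters for unit_pattern, whose top-level alternatives would otherwise escape the anchors. The helper name whole_match? is hypothetical, and 'mg/ml', '%' and 'I.E.' are extra unit samples added only to exercise the other alternatives.

```ruby
UNIT_PATTERN    = '(([kmµucMG]?([glLJm]|mol|Bq)\b)(\/([mµu]?[glL])\b)?)|((Mio\s)?U\.?I\.?)|(%( [mV]\/[mV])?)|(I\.E\.)|(Fl\.)'
NUMERIC_PATTERN = '\d+(\'\d+)*([.,]\d+)?'
ISO_PATTERN     = '[[:alpha:]()\-]+'

# True if the whole string is exactly one token of the given pattern.
# (?:...) keeps UNIT_PATTERN's top-level alternatives inside the anchors.
def whole_match?(pattern, str)
  Regexp.new("\\A(?:#{pattern})\\z").match?(str)
end

SAMPLES = {
  # 'Bq' is the becquerel alternative; the pattern has no 'Bg'.
  UNIT_PATTERN    => ['mg', 'kg', 'ml', 'mol', 'Bq', 'Mio U.I.', 'mg/ml', '%', 'I.E.'],
  NUMERIC_PATTERN => ['123', "123'456", '123.4'],
  ISO_PATTERN     => ['abc', 'abc(def)', 'abc-def'],
}.freeze

SAMPLES.each do |pattern, strings|
  strings.each { |s| puts format('%-10s %s', s, whole_match?(pattern, s)) }
end
```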

Note

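  • Ruby regular expressions (backtracking engine): the left-most match wins; at that position quantifiers are greedy, but alternatives are tried in order, so the first alternative that succeeds wins even if a later one would match more
  • example:
 /bc|ab/ =~ "abc"  => $& == 'ab'  (not 'bc')  left-most match
 /\w+/ =~ "abc"    => $& == 'abc' (not 'a')   greedy quantifier
 /a|\w+/ =~ "abc"  => $& == 'a'   (not 'abc') both start at 0; the first alternative wins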
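The three examples can be run as-is; a fourth case with the same alternatives swapped shows that alternative order, not match length, decides the result. This matters when ordering the alternatives of a pattern such as unit_pattern.

```ruby
# Ruby's backtracking engine: the left-most match wins, and at that position
# alternatives are tried in order while quantifiers stay greedy.
EXAMPLES = [
  [/bc|ab/, 'abc'], # 'ab' starts at 0, 'bc' only at 1 -> left-most wins
  [/\w+/,   'abc'], # greedy quantifier -> 'abc'
  [/a|\w+/, 'abc'], # both start at 0 -> first alternative 'a' wins
  [/\w+|a/, 'abc']  # same alternatives swapped -> 'abc'
].freeze

EXAMPLES.each do |re, str|
  puts "#{re.inspect} =~ #{str.inspect}  => #{re.match(str)[0].inspect}"
end
```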

Note

Reference

Experiment

Result

"9 Suppositorien"        : [[], nil, [nil, "9"], nil, nil, nil, "Suppositorien"]
"10 "                    : [[], nil, [nil, "10"], nil, nil, nil, nil]
"200 ml"                 : [[], nil, nil, ["200", "ml", nil], nil, nil, nil]
"10x200 ml"              : [["10", nil, "x"], nil, nil, ["200", "ml", nil], nil, nil, nil]

Result

"9 Suppositorien"        : [[], nil, [nil, "9"], nil, nil, nil, "Suppositorien"]
"10 "                    : [[], nil, [nil, "10"], nil, nil, nil, nil]
"200 ml"                 : [[], nil, nil, ["200", "ml", nil], nil, nil, nil]
"10x200 ml"              : [["10", nil, "x"], nil, nil, ["200", "ml", nil], nil, nil, nil]

Note

  • A plain regular expression is much faster than the Rockit library (more than 100 times faster)

Experiment

Run

$ ruby test_rockit2.rb > test_rockit.dat
$ ruby test_parser2.rb > test_parser.dat

Diff (diff -y --suppress-common-lines -W 200 test_rockit.dat test_parser.dat)

Note

  • The following size strings are mis-parsed:
    1. "4000 in 5000 ml"
    2. "2000-3000 l"
    3. "10 ampoules 10 ml"
    4. "10x 250 in 500 ml"
    5. "50 Bolus/Boli"
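One possible contributing factor, stated as an assumption rather than a diagnosis from the diff: each of these strings contains a fragment the grammar cannot place. "in" and "ampoules" fit only the DESCRIPTION token, which the Size production allows only at the very end, while "2000-3000" and "Bolus/Boli" fit no single token. The sketch below checks hand-picked fragments against the three patterns from part.rb; the names TOKENS and tokens_for are hypothetical.

```ruby
UNIT_PATTERN    = '(([kmµucMG]?([glLJm]|mol|Bq)\b)(\/([mµu]?[glL])\b)?)|((Mio\s)?U\.?I\.?)|(%( [mV]\/[mV])?)|(I\.E\.)|(Fl\.)'
NUMERIC_PATTERN = '\d+(\'\d+)*([.,]\d+)?'
ISO_PATTERN     = '[[:alpha:]()\-]+'

TOKENS = {
  'UNIT'              => UNIT_PATTERN,
  'NUMERIC'           => NUMERIC_PATTERN,
  'ISO (DESCRIPTION)' => ISO_PATTERN
}.freeze

# Names of the tokens that match the whole fragment.
def tokens_for(fragment)
  TOKENS.select { |_, pattern| Regexp.new("\\A(?:#{pattern})\\z").match?(fragment) }.keys
end

%w[2000-3000 in ampoules Bolus/Boli].each do |fragment|
  kinds = tokens_for(fragment)
  puts "#{fragment}: #{kinds.empty? ? 'no single token' : kinds.join(', ')}"
end
```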

Experiment

Run

$ ruby test_rockit3.rb > test_rockit.dat
$ ruby test_parser3.rb > test_parser.dat

Diff (diff -y --suppress-common-lines -W 200 test_rockit.dat test_parser.dat)

Page last modified on November 08, 2011, at 07:29 AM