20150414-oddb2xml-with-parslet

Summary

  • Switch parsing composition for --calc in oddb2xml to use parslet

Commits

Keep in Mind for work to do
  • Fix dojo error http://www.sitepen.com/blog/2012/10/31/debugging-dojo-common-error-messages/#forgot-dom-ready
  • On May 27 I removed the tests for fix_registrations, fix_sequences, fix_compositions and fix_packages from test/test_plugin/swissmedic.rb, as I could not find any references to them in the source code. Did I erroneously remove code when cleaning up the swissmedic import earlier?
  • The whole test for older/newer Packages must be adapted to xlsx. One must compare the rows (e.g. by creating csv files) and do the same stuff in xlsx!
  • Create a gem. Task: the input is a file with EAN codes; standard output shows each EAN code plus its ATC code. The source is Swissmedic Packungen.xlsx or XML.
  • Import via data/medreg_companies.yaml
  • Fix problem with radioactivatum 99m-technetio when parsing Wirkstoffe

Switch parsing composition for --calc in oddb2xml to use parslet

Must bring down the number of errors. With the commits

I cleaned up the code, and parsing all compositions reports: Parsed 8937 lines with 1187 errors in 70 seconds (roughly 13% of the lines still fail).
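To watch this number go down it helps to have a small loop that parses every line, collects the failures and prints a summary in the same shape. The following is only a sketch in plain Ruby: parse_composition and CompositionParseError are hypothetical stand-ins (the stand-in merely checks for a quantity followed by a unit), not the real parser in oddb2xml.

```ruby
# Hypothetical error class and stand-in parser, NOT the oddb2xml code.
# The stand-in accepts a line only if it contains a quantity and a unit.
class CompositionParseError < StandardError; end

def parse_composition(line)
  unless line =~ /\d+(?:[.,]\d+)?\s*(?:mg|µg|g|ml|U\.I\.)/
    raise CompositionParseError, "no dose found in #{line.inspect}"
  end
  line
end

# Parses every line, collects [line number, message] pairs for the
# failures and builds a one-line summary.
def parse_all(lines)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  failures = []
  lines.each_with_index do |line, idx|
    begin
      parse_composition(line)
    rescue CompositionParseError => e
      failures << [idx + 1, e.message]
    end
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  rate = lines.empty? ? 0.0 : 100.0 * failures.size / lines.size
  summary = format('Parsed %d lines with %d errors in %d seconds (%.1f%% failed)',
                   lines.size, failures.size, elapsed.round, rate)
  [summary, failures]
end

summary, failures = parse_all(['hepar sulfuris D6 2,2 mg', 'excipiens pro compresso'])
puts summary
failures.each { |line_no, msg| puts "  line #{line_no}: #{msg}" }
```

Keeping the failing line numbers next to the count makes it easy to see which inputs a parser change fixed or broke.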

I must also handle errors like the one in http://ch.oddb.org/de/gcc/drug/reg/62432/seq/01, where we find the string "hepar sulfuris D6 2,2 mg hypericum perforatum D2 0,66 mg". In my opinion it simply lacks a comma and should read "hepar sulfuris D6 2,2 mg, hypericum perforatum D2 0,66 mg".
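One possible way to cope with such input would be a repair step that runs before the real parser. The sketch below is a plain-Ruby heuristic, not what oddb2xml does: the unit list, the connector list and the name insert_missing_commas are assumptions made for illustration. It inserts a comma wherever a dose is followed directly by another word that is not a known connector.

```ruby
# Assumed list of units; the real data certainly contains more.
UNITS = %w[mg µg g ml U.I.].freeze
# A dose is a quantity (decimal comma or point) followed by a unit.
DOSE = /\d+(?:[.,]\d+)?\s*(?:#{UNITS.map { |u| Regexp.escape(u) }.join('|')})/
# Assumed list of words that may legitimately follow a dose without a comma.
CONNECTORS = %w[et ad pro corresp ut].freeze

# Inserts ", " after a dose that is followed by whitespace and a new word,
# unless that word is one of the connectors.
def insert_missing_commas(line)
  pattern = /(#{DOSE})\s+(?!(?:#{CONNECTORS.join('|')})\b)(?=[[:alpha:]])/
  line.gsub(pattern) { "#{$1}, " }
end

broken = 'hepar sulfuris D6 2,2 mg hypericum perforatum D2 0,66 mg'
repaired = insert_missing_commas(broken)
puts repaired
```

Lines that already contain the comma, or where the dose is followed by 'et', are left unchanged. A heuristic like this can produce false positives, so it would be safer to log every repaired line than to apply it silently.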

Stumbled over the method parse_with_debug (available after require 'parslet/convenience'), which prints a nice tree showing where the parser was when an error occurred.

Reworking some lower-level parts of the parslet parser so that they correctly handle things like substances joined with 'et'. One of the currently failing lines is "I) DTPa-IPV-Komponente (Suspension): toxoidum diphtheriae 30 U.I., toxoidum pertussis 25 µg et haemagglutininum filamentosum 25 µg", which is a nice, but not too complicated, composition.
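To make clear what result is wanted for this line, here is a plain-Ruby sketch that uses regular expressions instead of parslet rules. It is an illustration of the target structure only: split_composition, SUBSTANCE and the hash keys are names made up here, and treating every ' et ' as a separator is an assumption that will not hold for all compositions.

```ruby
# name, then a quantity (decimal comma or point), then a unit.
SUBSTANCE = /\A(?<name>.+?)\s+(?<qty>\d+(?:[.,]\d+)?)\s*(?<unit>\S+)\z/

# Splits off an optional "label:" prefix, then splits the rest on
# ", " and " et " and extracts name, quantity and unit of each part.
def split_composition(line)
  label, body = line.include?(':') ? line.split(/:\s*/, 2) : [nil, line]
  substances = body.split(/,\s+|\s+et\s+/).map do |part|
    m = SUBSTANCE.match(part.strip)
    if m
      { name: m[:name], qty: m[:qty], unit: m[:unit] }
    else
      { unparsed: part.strip } # keep what could not be understood
    end
  end
  { label: label, substances: substances }
end

line = 'I) DTPa-IPV-Komponente (Suspension): toxoidum diphtheriae 30 U.I., ' \
       'toxoidum pertussis 25 µg et haemagglutininum filamentosum 25 µg'
result = split_composition(line)
result[:substances].each { |s| puts s.inspect }
```

The separator requires whitespace after the comma, so a decimal comma as in "2,2 mg" does not split a substance in two.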

Page last modified on April 14, 2015, at 08:37 PM