On February 9 the swissmedic import did not run, even though a new packages.xlsx was available. Looking in the debug log for explanations, grep -i packungen log/oddb/debug/2015/02.log finds
2015-02-08 07:25:06 +0100: /var/www/oddb.org/src/plugin/swissmedic.rb: 298 skip writing /var/www/oddb.org/data/xls/Packungen-2015.02.08.xlsx as /var/www/oddb.org/data/xls/Packungen-latest.xlsx is 2570511 bytes. Returning latest
2015-02-09 07:26:51 +0100: /var/www/oddb.org/src/plugin/swissmedic.rb: 298 skip writing /var/www/oddb.org/data/xls/Packungen-2015.02.09.xlsx as /var/www/oddb.org/data/xls/Packungen-latest.xlsx is 2570511 bytes. Returning latest
2015-02-09 16:58:06 +0100: /var/www/oddb.org/src/plugin/swissmedic.rb: 293 updated download.size is 2561217 -> /var/www/oddb.org/data/xls/Packungen-2015.02.09.xlsx 2561217
    /var/www/oddb.org/data/xls/Packungen-2015.02.09.xlsx now 2561217 bytes != /var/www/oddb.org/data/xls/Packungen-latest.xlsx 2570511
2015-02-09 16:58:06 +0100: /var/www/oddb.org/src/plugin/swissmedic.rb: 63 update target "/var/www/oddb.org/data/xls/Packungen-2015.02.09.xlsx" 2561217 bytes. Latest /var/www/oddb.org/data/xls/Packungen-latest.xlsx 2570511 bytes
2015-02-09 17:04:30 +0100: /var/www/oddb.org/src/plugin/swissmedic.rb: 75 Compared /var/www/oddb.org/data/xls/Packungen-2015.02.09.xlsx 2561217 bytes with /var/www/oddb.org/data/xls/Packungen-latest.xlsx 2570511 bytes
2015-02-09 19:12:14 +0100: /var/www/oddb.org/src/plugin/swissmedic.rb: 100 cp /var/www/oddb.org/data/xls/Packungen-2015.02.09.xlsx /var/www/oddb.org/data/xls/Packungen-latest.xlsx after 134 minutes - Packungen
2015-02-10 07:26:49 +0100: /var/www/oddb.org/src/plugin/swissmedic.rb: 298 skip writing /var/www/oddb.org/data/xls/Packungen-2015.02.10.xlsx as /var/www/oddb.org/data/xls/Packungen-latest.xlsx is 2561217 bytes. Returning latest
It seems to me that the Packungen.xlsx got uploaded on February 9 between 07:26 and 16:58, assuming Zeno did not remove data/xls/Packungen-latest.xlsx. During my code analysis I did not see any reason why running jobs/import_swissmedic should give a different result than running jobs/import_daily, as update_swissmedic is always called after update_textinfo_swissmedicinfo in the daily run (see src/util/updater.rb, line 198). Also, on my oddb-ci2 the swissmedic import was run automatically via import_daily. If this problem reoccurs, one must first note the size of a manually downloaded xlsx file, wait 24 hours to ensure that import_daily has completed, and then check again.
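The log lines above suggest that the plugin decides by file size whether a download replaces Packungen-latest.xlsx. A minimal sketch of such a check follows, for use when comparing a manually downloaded file. The helper name size_differs? is hypothetical and not taken from src/plugin/swissmedic.rb.

```ruby
# Hypothetical helper, not part of oddb.org: report whether a downloaded
# file should be treated as new, judging only by its size in bytes.
def size_differs?(download_path, latest_path)
  # Without a latest file there is nothing to compare against.
  return true unless File.exist?(latest_path)
  File.size(download_path) != File.size(latest_path)
end

# Possible usage (paths as they appear in the log):
#   size_differs?('data/xls/Packungen-2015.02.09.xlsx',
#                 'data/xls/Packungen-latest.xlsx')
```

A size-only comparison would miss an update that happens to have exactly the same number of bytes, so comparing a checksum as well might be safer.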
We should try to have the same quality of data after running oddb2xml as we have on ch.oddb.org. Zeno also thinks that Hannes had improved the matching of galenic groups/forms for de.oddb.org. Looking at the code of de.oddb.org.
Some interesting snippets are in lib/model/drugs/dose.rb, where a string for quantities/units gets analysed. There are quite a few tests for it under test/drugs/test_dose.rb. The galenic forms/groups are much more difficult to trace. E.g. they seem to be taken from an xls file at ./test/import/data/xls/darform_010706.xls. I did not find this xls under dimdi.de. http://scm.ywesee.com/?p=de.oddb.org/.git;a=commitdiff_plain;h=056cc4f3ecab88b2553cce1104acbbe7be168252 suggests that some xls files were downloaded from
But in some PDF files from dimdi, like http://www.dimdi.de/dynamic/de/amg/packungsgroessen/packungsgroessen-anlage-1-20150115.pdf you find many substances and their Darreichungsform.
In lib/importer/dimdi.rb we find the method postprocess with this interesting snippet
    def postprocess
      {
        'Injektion/Infusion' \
                          => [ 'P', 'Fertigspritzen' ],
        'Tabletten'       => [ 'O', 'Tabletten', 'Filmtabletten', 'Kapseln',
                               'Dragees', 'Lacktabletten' ],
        'Transdermale Systeme' \
                          => [ 'TD', 'Pflaster, transdermal' ],
        'Tropfen'         => [ 'P', 'Tropfen' ],
        'Retard-Tabletten'=> [ 'O', 'Retardtabletten', 'Retardfilmtabletten',
                               'Retardkapseln', 'Retarddragees' ],
        'Salben'          => [ 'T', 'Creme', 'Gel', 'Lotion', 'Salbe' ],
        'Suppositorien'   => [ 'R', 'Suppositorien' ],
        'Vaginal-Produkte'=> [ 'V', 'Vaginalcreme', 'Vaginalovula',
                               'Vaginaltabletten', 'Vaginalsuppositorien']
      }.each { |groupname, formnames|
        group = Drugs::GalenicGroup.find_by_name(groupname) \
          || Drugs::GalenicGroup.new(groupname)
        group.administration = formnames.shift
        group.save
        formnames.each { |name|
          if((form = Drugs::GalenicForm.find_by_description(name)) \
             && !form.group)
            form.group = group
            form.save
          end
        }
      }
    end
But it looks like the last dimdi import ran in 2010 or earlier, as the link is no longer working.
In log/import_pharma24 one also finds only traces of failed logins. If I interpret log/import_pharmnet (1.4GB!!!) correctly, some results were imported in 2011, but for 2015 I only see results like
D, [2015-02-05T06:12:14.469388 #6290] DEBUG -- PharmNet: Searching for omeprazol Ct-
E, [2015-02-05T06:12:14.796111 #6290] ERROR -- PharmNet: Mechanize::ResponseCodeError: 503 => Net::HTTPServiceUnavailable
In lib/import/pharmnet we find a snippet showing how the galenic form is determined
    def import_galenic_form(description)
      galform = Drugs::GalenicForm.find_by_description(description)
      unless galform
        galform = Drugs::GalenicForm.search_by_description(description).find do |gf|
          sim = ngram_similarity description, gf.description.de
          sim > 0.75
        end
        if galform
          galform.description.add_synonym description
          galform.save
        end
      end
      unless galform
        galform = Drugs::GalenicForm.new
        galform.description.de = description
        galform.save
      end
      galform
    end
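The snippet relies on ngram_similarity and a threshold of 0.75, but its implementation is not shown here. To illustrate the idea only, here is a sketch that assumes a Dice coefficient over character trigrams. The names ngrams and ngram_similarity_sketch are hypothetical, and the real method may well compute something different.

```ruby
require 'set'

# Hypothetical stand-in for ngram_similarity (the real one is not shown in
# this log). Splits a padded, downcased string into a set of character n-grams.
def ngrams(text, n = 3)
  padded = " #{text.downcase} "
  # The range is empty when the padded string is shorter than n.
  Set.new((0..padded.length - n).map { |i| padded[i, n] })
end

# Dice coefficient: twice the number of shared n-grams divided by the total
# number of n-grams in both strings. Yields a value between 0.0 and 1.0.
def ngram_similarity_sketch(a, b, n = 3)
  grams_a = ngrams(a, n)
  grams_b = ngrams(b, n)
  return 0.0 if grams_a.empty? || grams_b.empty?
  2.0 * (grams_a & grams_b).size / (grams_a.size + grams_b.size)
end

# Example: a singular/plural pair shares most trigrams, unrelated forms do not.
puts ngram_similarity_sketch('Filmtablette', 'Filmtabletten') > 0.75
puts ngram_similarity_sketch('Tropfen', 'Filmtabletten') > 0.75
```

With such a measure, spelling variants of a known galenic form would be stored as synonyms instead of creating a new form, which is what the if galform branch above does.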
I will now attack the problem of how to determine the number of minimal selling units for a given drug. Zeno compiled a list of the most used variants for liquid and non-liquid drugs (column C). For column M he wants to use the following classification:
This simplifies the recognition, and I will use an array of pre-compiled values to search for the corresponding entry.
We will also temporarily remove the following fields from the generated xml file
The goal is to add a SELLING_UNITS field which should pass common sense reasoning, e.g.
Selling units must be integer (as requested by Marco Descher)
Another requirement is that we must be able to tell how many cases are matched by each rule.
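The requirements above (a list of pre-compiled patterns, integer results, a count per rule) could be sketched as follows. The rule names, the regular expressions and the method selling_units are purely illustrative assumptions. They are not the rules from Zeno's list and not the oddb2xml code.

```ruby
# Hypothetical rules for illustration only. Each rule has a name for the
# report and a pre-compiled pattern whose first capture is the unit count.
Rule = Struct.new(:name, :pattern)

RULES = [
  Rule.new('count_times_volume', /\A(\d+)\s*x\s*\d+(?:[.,]\d+)?\s*(?:ml|g|mg)\b/i),
  Rule.new('plain_count',        /\A(\d+)\s*(?:Tabletten|Kapseln|Stk)\b/i)
].freeze

# Returns the selling units as an Integer, or 'unbekannt' if no rule matches.
# The stats hash counts how often each rule (or 'unbekannt') was hit.
def selling_units(size_text, stats = Hash.new(0))
  RULES.each do |rule|
    match = rule.pattern.match(size_text)
    next unless match
    stats[rule.name] += 1
    return Integer(match[1], 10)
  end
  stats['unbekannt'] += 1
  'unbekannt'
end

stats = Hash.new(0)
['10 x 5 ml', '30 Tabletten', 'something else'].each do |text|
  puts "#{text} -> #{selling_units(text, stats)}"
end
stats.each { |name, hits| puts "#{name}: #{hits}" }
```

Since the first matching rule wins, the order of the array matters, and the stats hash gives the per-rule report directly.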
Adding unit tests for selling units and a good report for each rule. Seems to work fine for some parts. Running it for the first time, I still have 11145 occurrences of SELLING_UNITS>unbekannt in oddb_calc.xml. Looking at the CSV file to search for stuff to improve.
After some minor changes the number of unknown selling_units dropped to 59. The measure is still completely false!
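Counting the unknown entries can be repeated after every change to the rules. A small sketch, assuming that the search string SELLING_UNITS>unbekannt appears literally in the generated file. The method name count_unknown_selling_units is hypothetical.

```ruby
# Hypothetical helper: count how often the literal text
# 'SELLING_UNITS>unbekannt' occurs in the given XML text.
def count_unknown_selling_units(xml_text)
  xml_text.scan('SELLING_UNITS>unbekannt').size
end

# Possible usage against the generated file:
#   puts count_unknown_selling_units(File.read('oddb_calc.xml'))
```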
Pushed commits Calculate selling_units and Remove pry