20150211-swissmedic-import-did-not-run

Summary

  • Swissmedic import did not run
  • "selling-units" for oddb2xml

Commits

Index

Keep in Mind for work to do
  • Fix dojo error http://www.sitepen.com/blog/2012/10/31/debugging-dojo-common-error-messages/#forgot-dom-ready
  • On May 27 I removed the tests for fix_registrations, fix_sequences, fix_compositions and fix_packages from test/test_plugin/swissmedic.rb, as I could not find any references to them in the source code. Did I erroneously remove stuff when cleaning up the swissmedic import earlier?
  • The whole test for older/newer Packages must be adapted to xlsx. One must compare the rows (e.g. by creating csv files) and do the same stuff in xlsx!
  • Create a gem. Task: input is a file with EAN codes; standard output shows the EAN codes plus ATC code. Source is Swissmedic Packungen.xlsx or XML.
  • Display 10 recalls, not only those from this month
  • Import via data/medreg_companies.yaml

Swissmedic import did not run

On February 9 the Swissmedic import did not run, even though a new packages file (Packungen.xlsx) was available. I looked in the debug log for explanations. Running grep -i packungen log/oddb/debug/2015/02.log finds

2015-02-08 07:25:06 +0100: /var/www/oddb.org/src/plugin/swissmedic.rb: 298 skip writing /var/www/oddb.org/data/xls/Packungen-2015.02.08.xlsx as /var/www/oddb.org/data/xls/Packungen-latest.xlsx is 2570511 bytes. Returning latest
2015-02-09 07:26:51 +0100: /var/www/oddb.org/src/plugin/swissmedic.rb: 298 skip writing /var/www/oddb.org/data/xls/Packungen-2015.02.09.xlsx as /var/www/oddb.org/data/xls/Packungen-latest.xlsx is 2570511 bytes. Returning latest
2015-02-09 16:58:06 +0100: /var/www/oddb.org/src/plugin/swissmedic.rb: 293 updated download.size is 2561217 -> /var/www/oddb.org/data/xls/Packungen-2015.02.09.xlsx 2561217/var/www/oddb.org/data/xls/Packungen-2015.02.09.xlsx now 2561217 bytes != /var/www/oddb.org/data/xls/Packungen-latest.xlsx 2570511
2015-02-09 16:58:06 +0100: /var/www/oddb.org/src/plugin/swissmedic.rb: 63 update target "/var/www/oddb.org/data/xls/Packungen-2015.02.09.xlsx" 2561217 bytes. Latest /var/www/oddb.org/data/xls/Packungen-latest.xlsx 2570511 bytes
2015-02-09 17:04:30 +0100: /var/www/oddb.org/src/plugin/swissmedic.rb: 75 Compared /var/www/oddb.org/data/xls/Packungen-2015.02.09.xlsx 2561217 bytes with /var/www/oddb.org/data/xls/Packungen-latest.xlsx 2570511 bytes
2015-02-09 19:12:14 +0100: /var/www/oddb.org/src/plugin/swissmedic.rb: 100 cp /var/www/oddb.org/data/xls/Packungen-2015.02.09.xlsx /var/www/oddb.org/data/xls/Packungen-latest.xlsx after 134 minutes
- Packungen
2015-02-10 07:26:49 +0100: /var/www/oddb.org/src/plugin/swissmedic.rb: 298 skip writing /var/www/oddb.org/data/xls/Packungen-2015.02.10.xlsx as /var/www/oddb.org/data/xls/Packungen-latest.xlsx is 2561217 bytes. Returning latest

It seems to me that Packungen.xlsx was uploaded on February 9 between 07:26 and 16:58, provided Zeno did not remove data/xls/Packungen-latest.xlsx. During my code analysis I did not see any reason why running jobs/import_swissmedic should give a different result than running jobs/import_daily, as update_swissmedic is always called after update_textinfo_swissmedicinfo in the daily run (see src/util/updater.rb, line 198). Also, on my oddb-ci2 the swissmedic import was run automatically via import_daily. If this problem recurs, one must first note the size of a manually downloaded xlsx file, wait 24 hours to ensure that import_daily has completed, and then check again.
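The log lines above suggest that the plugin decides by file size whether a download is new. A minimal sketch of such a check follows. The method name store_download and its return value are assumptions made for illustration; this is not the code in src/plugin/swissmedic.rb.

```ruby
# Hypothetical helper, NOT the code from src/plugin/swissmedic.rb.
# content: the downloaded bytes
# target:  dated file name, e.g. Packungen-2015.02.09.xlsx
# latest:  the Packungen-latest.xlsx copy from the previous import
# Returns [path_to_use, changed?].
def store_download(content, target, latest)
  if File.exist?(latest) && File.size(latest) == content.bytesize
    # Same byte size as the latest copy: treat as unchanged and skip writing.
    return [latest, false]
  end
  # Size differs (or there is no latest copy yet): keep the new download.
  File.binwrite(target, content)
  [target, true]
end
```

A size-only comparison cannot detect a changed file that happens to have the same byte length. Comparing a digest of both files would be stricter.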

oddb2xml must determine the number of smallest possible selling-units

We should try to get the same data quality from oddb2xml as we have on ch.oddb.org. Zeno also thinks that Hannes had improved the matching of galenic groups/forms for de.oddb.org, so I looked at the code of de.oddb.org.

Some interesting snippets are in lib/model/drugs/dose.rb, where a string for quantities/units gets analysed. There are quite a few tests for it under test/drugs/test_dose.rb. The galenic forms/groups are much more difficult to trace. E.g. they seem to be taken from an xls file at ./test/import/data/xls/darform_010706.xls. I did not find this xls under dimdi.de. http://scm.ywesee.com/?p=de.oddb.org/.git;a=commitdiff_plain;h=056cc4f3ecab88b2553cce1104acbbe7be168252 suggests that some xls files were downloaded from

But in some PDF files from dimdi, like http://www.dimdi.de/dynamic/de/amg/packungsgroessen/packungsgroessen-anlage-1-20150115.pdf you find many substances and their Darreichungsform.

In lib/importer/dimdi.rb we find the method postprocess with this interesting snippet

    def postprocess
      {
        'Injektion/Infusion' \
                          => [ 'P', 'Fertigspritzen' ],
        'Tabletten'       => [ 'O', 'Tabletten', 'Filmtabletten', 
                               'Kapseln', 'Dragees', 'Lacktabletten' ],
        'Transdermale Systeme' \
                          => [ 'TD', 'Pflaster, transdermal' ],
        'Tropfen'         => [ 'P', 'Tropfen' ],
        'Retard-Tabletten'=> [ 'O', 'Retardtabletten', 
                               'Retardfilmtabletten', 'Retardkapseln',
                               'Retarddragees' ],
        'Salben'          => [ 'T', 'Creme', 'Gel', 'Lotion', 'Salbe' ],
        'Suppositorien'   => [ 'R', 'Suppositorien' ], 
        'Vaginal-Produkte'=> [ 'V', 'Vaginalcreme', 'Vaginalovula', 
                               'Vaginaltabletten', 
                               'Vaginalsuppositorien']
      }.each { |groupname, formnames|
        group = Drugs::GalenicGroup.find_by_name(groupname) \
          || Drugs::GalenicGroup.new(groupname)
        group.administration = formnames.shift
        group.save
        formnames.each { |name|
          if((form = Drugs::GalenicForm.find_by_description(name)) \
             && !form.group)
            form.group = group
            form.save
          end
        }
      }
    end

But it looks like the last dimdi import ran in 2010 or earlier, as the link is no longer working.

In log/import_pharma24 one also finds only traces of failed logins. If I interpret log/import_pharmnet (1.4GB!!!) correctly, some results were imported in 2011, but for 2015 I only see results like

D, [2015-02-05T06:12:14.469388 #6290] DEBUG -- PharmNet: Searching for omeprazol Ct-
E, [2015-02-05T06:12:14.796111 #6290] ERROR -- PharmNet: Mechanize::ResponseCodeError: 503 => Net::HTTPServiceUnavailable

In lib/import/pharmnet we find a snippet showing how the galenic form is determined

  def import_galenic_form(description)
    galform = Drugs::GalenicForm.find_by_description(description)
    unless galform
      galform = Drugs::GalenicForm.search_by_description(description).find do |gf|
        sim = ngram_similarity description, gf.description.de
        sim > 0.75
      end
      if galform
        galform.description.add_synonym description
        galform.save
      end
    end
    unless galform
      galform = Drugs::GalenicForm.new
      galform.description.de = description
      galform.save
    end
    galform
  end
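The helper ngram_similarity is called above but not shown. As a rough illustration of what such a measure could look like, here is a sketch using the Dice coefficient over character trigrams. This definition is an assumption; the real helper in de.oddb.org may compute similarity differently.

```ruby
# Hypothetical stand-in for ngram_similarity; the real helper is not shown here.

# Split a string into overlapping character n-grams (padded with one blank
# on each side so that word starts and ends count as well).
def ngrams(text, n = 3)
  padded = " #{text.downcase} "
  return [padded] if padded.length < n
  (0..padded.length - n).map { |i| padded[i, n] }
end

# Dice coefficient on the sets of n-grams: 2 * shared / (size_a + size_b).
def ngram_similarity(a, b, n = 3)
  grams_a = ngrams(a, n).uniq
  grams_b = ngrams(b, n).uniq
  return 0.0 if grams_a.empty? || grams_b.empty?
  shared = (grams_a & grams_b).size
  2.0 * shared / (grams_a.size + grams_b.size)
end
```

With this definition identical strings score 1.0 and strings without a common trigram score 0.0, so a threshold such as 0.75 only accepts descriptions that share most of their trigrams.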

I will now tackle the problem of how to determine the number of minimal selling-units for a given drug. Zeno compiled a list of the most used variants for liquid and non-liquid drugs (column C). For column M he wants to use the following classification:

  • fest
  • flüssig
  • Masseinheit
  • other (not to be considered at the moment)

This simplifies the recognition. I will use an array of pre-compiled values to look up the corresponding classification.
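Such a lookup against pre-compiled values could be sketched as follows. The entries below are made-up samples, not Zeno's list, and the names CLASSIFICATION and classify are assumptions chosen for this example.

```ruby
# Hypothetical sample entries; the real list compiled by Zeno is not reproduced here.
CLASSIFICATION = {
  'fest'        => %w[Tabletten Kapseln Dragees],
  'flüssig'     => %w[Sirup Tropfen Lösung],
  'Masseinheit' => %w[g mg ml]
}.freeze

# Return the class whose list contains the term (case-insensitive),
# or 'other' when no list contains it.
def classify(term)
  wanted = term.strip.downcase
  CLASSIFICATION.each do |klass, terms|
    return klass if terms.any? { |t| t.downcase == wanted }
  end
  'other'
end
```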

We will also temporarily remove the following fields from the generated XML file

  • PKG_SIZE
  • COUNT
  • MULTI
  • ADDITION
  • SIZE

The goal is to add a SELLING_UNITS field which should pass common-sense reasoning, e.g.

  • 85 g poudre -> 85 selling units (1 g)
  • 5 x 20 tablets => 100 (selling units tablets)
  • 5 x 10 Ampullen => 50 (Ampullen)

Selling units must be integers (as requested by Marco Descher).

Another requirement is that we must be able to tell how many cases are matched by each rule.
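The requirements above (integer results, a count of matches per rule) can be sketched as follows. The rule names and regular expressions are illustrative assumptions and not the rules implemented in oddb2xml.

```ruby
# Hypothetical rules; names and patterns are illustrative assumptions only.
# Each rule is [name, pattern, block computing the integer selling units].
RULES = [
  # "5 x 20 tablets" -> 5 * 20
  [:multiplied_count, /\A(\d+)\s*x\s*(\d+)\s+\S+/i, ->(m) { m[1].to_i * m[2].to_i }],
  # "85 g poudre" -> 85 units of 1 g
  [:measure, /\A(\d+)\s*(g|ml)\b/i, ->(m) { m[1].to_i }],
  # "30 Tabletten" -> 30
  [:plain_count, /\A(\d+)\s+\S+/, ->(m) { m[1].to_i }]
].freeze

# Returns the selling units as an Integer, or nil when no rule matches.
# stats counts how often each rule (or :unbekannt) was hit.
def selling_units(text, stats = Hash.new(0))
  RULES.each do |name, pattern, compute|
    match = pattern.match(text.strip)
    next unless match
    stats[name] += 1
    return compute.call(match)
  end
  stats[:unbekannt] += 1
  nil
end
```

Passing the same stats hash to every call yields the per-rule report. The first matching rule wins, so the order of RULES matters.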

Adding unit tests for selling units and a good report for each rule. Seems to work fine for some parts. Running it the first time and I still have 11145 occurrences of SELLING_UNITS>unbekannt in oddb_calc.xml. Looking at the CSV file to search for stuff to improve.

After some minor changes the number of unknown selling_units dropped to 59. The measure is still completely wrong!

Pushed commits Calculate selling_units and Remove pry

Page last modified on February 11, 2015, at 07:29 PM