20160201-oddb2xml-hangs

Summary

  • Oddb2xml -e hangs for a long time
  • Missing or wrong DDD values
  • Adapt FI-parser to use refdata names for packages

Commits

Keep in Mind for work to do
  • Fix dojo error http://www.sitepen.com/blog/2012/10/31/debugging-dojo-common-error-messages/#forgot-dom-ready
  • On May 27 I removed the tests for fix_registrations, fix_sequences, fix_compositions and fix_packages from test/test_plugin/swissmedic.rb, as I could not find any references to them in the source code. Did I erroneously remove stuff when cleaning up the swissmedic import earlier?
  • The whole test for older/newer Packages must be adapted to xlsx. One must compare the rows (e.g. by creating csv files) and do the same stuff in xlsx!
  • Create a gem. Task: input is a file with EAN codes; standard output shows each EAN code plus its ATC code. Source is Swissmedic Packungen.xlsx or XML.
  • Import via data/medreg_companies.yaml
  • Fix problem with radioactivatum 99m-technetio when parsing Wirkstoffe
  • Fix galenic_forms when parsing swissmedic.xlsx
  • Cleanup generic_type. Replace it everywhere by sl_generic_type and adapt code accordingly.
  • Get updated ATC-codes from EPha for oddb.org, too.
  • Use refdatabase for oddb.org, too.
  • Check whether we should revert the part of commit 17af82ba4d76a5838683411b260de265531f9e74 which touches src/plugin/text_info.rb. We should improve test/stub/oddbapp.rb to work for update/pointer similarly to the real oddbapp. Then we would have a good stub for plugins. Maybe we need a different stub when working with plugins (which create/modify/destroy ODDB objects), while in most other cases a very simple stub is sufficient.
  • When a logged in admin user changes an atc_code of a product, the corresponding atc_class must update its sequences, too.
  • Order of entering search type and value should not matter. Both should show long URL with search
  • Remove parser for minifi (but keep the minifi)

Oddb2xml -e hangs for a long time without any progress

oddb2xml pauses for about 30 minutes after emitting 2016-01-28 13:10:43 +0100: At row 18750 iksnr 65855 key "7680653150012" WITHOUT nil.

Why? If you add --log to the options, one sees many lines of the form 2016-02-01 09:06:00: build_article 31500 of 172536 articles 2016-02-01 09:06:08: build_article 32000 of 172536 articles. It just takes many minutes, as outputting 500 articles takes (with my CPU power) about 8 seconds. Only one core is used. Parallelizing the output process (for one XML file) is not so simple and would surely require 4 to 16 hours of work (or even more).
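As a rough cross-check of the pause length, the two log lines quoted above can be turned into a throughput estimate. This is only a sketch: the helper name estimate_build_time and the regular expression are my own, and it assumes the rate stays constant over the whole run.

```ruby
require "time"

# Two consecutive progress lines copied from the --log output quoted above.
LOG_LINES = [
  "2016-02-01 09:06:00: build_article 31500 of 172536 articles",
  "2016-02-01 09:06:08: build_article 32000 of 172536 articles",
].freeze

PROGRESS = /\A(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d): build_article (\d+) of (\d+) articles\z/

# Hypothetical helper: returns [articles_per_second, estimated_total_minutes].
# ASSUMPTION: the rate between the two samples holds for the whole run.
def estimate_build_time(lines)
  samples = lines.map do |line|
    m = PROGRESS.match(line) or raise ArgumentError, "unexpected log line: #{line}"
    [Time.strptime(m[1], "%Y-%m-%d %H:%M:%S"), m[2].to_i, m[3].to_i]
  end
  (t0, n0, total), (t1, n1, _) = samples
  rate = (n1 - n0) / (t1 - t0) # Time difference is a Float in seconds
  [rate, total / rate / 60.0]
end

rate, minutes = estimate_build_time(LOG_LINES)
puts format("%.1f articles/s, about %.0f minutes for all articles", rate, minutes)
```

At 500 articles per 8 seconds this comes to roughly three quarters of an hour for all 172536 articles, which is in the same range as the observed pause.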

  • Search result for deponit has wrong DDD link

When visiting http://ch.oddb.org/de/gcc/search/zone/drugs/search_query/deponit/search_type/st_sequence?#best_result, the correct ATC code Glyceroltrinitrat (C01DA02) is shown, but the link "TK" (Tageskosten, i.e. daily cost, aka DDD) is http://ch.oddb.org/de/gcc/ddd_price/reg/45986/seq/03/pack/071/search_query/deponit/search_type/st_sequence. Zeno says that this is not the correct link, but to me it looks fine, as the product also links to http://ch.oddb.org/de/gcc/compare/ean13/7680459860719. All links look okay to me, as they use IKSNR 45986, sequence 03 and pack 071.

I had looked at the wrong link. The link named "WHO-DDD" links to http://ch.oddb.org/de/gcc/ddd/atc_code/C01DA instead of http://ch.oddb.org/de/gcc/ddd/atc_code/C01DA02.

Fixed the problem in AtcFacades, where it did not look via packages.has_ddd?. Also fixed adding best_results only for price_comparision and combined search. Running watir tests before pushing commits.

Pushed commits

A user remarked that quite a few DDD values are missing, e.g. for

  • 7680459861365 C01DA02 Deponit 10, Matrixpfl 10 mg/24h, 30 Stk
  • 7680499010891 C01DA02 Nitro Dur 5, Matrixpfl 5 mg/24h, 100 Stk
  • 7680405580203 C01DA02 Nitrolingual, Vapo, 200 Dos
  • 7680527920109 G03CA03 Divigel, Gel, 28x 0.500 g

Other values are not correct, e.g.

GTIN           ATC_WHO  Description                                        FAP    PP     DDD     FAP/DDD  PP/DDD  PP/DDD_oddb
7680459860719  C01DA02  Deponit 5, Matrixpfl 5 mg/24h, 10 Stk              6.97   16.20  10.000  0.70     1.62    0.43
7680459861280  C01DA02  Deponit 10, Matrixpfl 10 mg/24h, 10 Stk            9.65   19.25  20.000  0.48     0.96    0.26
7680499010389  C01DA02  Nitro Dur 5, Matrixpfl 5 mg/24h, 30 Stk            18.09  37.15  30.000  0.60     1.24    0.15
7680364420183  C03CA01  Lasix, Inf Lös 250 mg/25ml i.v., 5 Amp 25 ml       11.70  25.70  31.250  0.37     0.82    0.21
7680559760179  G03CA03  Estradot, Matrixpfl 25 mcg/24h, 8 Stk              9.10   18.65  14.000  0.65     1.33    11.96
7680496000116  G03CA03  Vagifem, Vag Tabl 25 mcg, 15 Stk                   15.96  34.70  15.000  1.06     2.31    185.07
7680476930471  N03AG01  Depakine Chrono, Filmtabl 300 mg teilbar, 100 Stk  13.09  27.35  20.000  0.65     1.37    1.43
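The rows listed above can be checked mechanically. In the sketch below I assume that the DDD column holds the number of daily doses per package, because FAP/DDD and PP/DDD then equal the price divided by that number. The method names and the tolerance are my own choices.

```ruby
# Rows copied from the list above:
# [GTIN, FAP, PP, DDD, FAP/DDD, PP/DDD, PP/DDD_oddb]
ROWS = [
  ["7680459860719",  6.97, 16.20, 10.0,  0.70, 1.62,   0.43],
  ["7680459861280",  9.65, 19.25, 20.0,  0.48, 0.96,   0.26],
  ["7680499010389", 18.09, 37.15, 30.0,  0.60, 1.24,   0.15],
  ["7680364420183", 11.70, 25.70, 31.25, 0.37, 0.82,   0.21],
  ["7680559760179",  9.10, 18.65, 14.0,  0.65, 1.33,  11.96],
  ["7680496000116", 15.96, 34.70, 15.0,  1.06, 2.31, 185.07],
  ["7680476930471", 13.09, 27.35, 20.0,  0.65, 1.37,   1.43],
].freeze

TOLERANCE = 0.006 # the listed values are rounded to two decimals

# True when FAP/DDD and PP/DDD are simply price divided by the DDD column.
def columns_consistent?(rows)
  rows.all? do |_gtin, fap, public_price, ddd, fap_ddd, pp_ddd, _oddb|
    (fap / ddd - fap_ddd).abs < TOLERANCE &&
      (public_price / ddd - pp_ddd).abs < TOLERANCE
  end
end

# Rows where the value shown by oddb.org deviates from PP / DDD.
def oddb_mismatches(rows)
  rows.reject do |_gtin, _fap, public_price, ddd, _fap_ddd, _pp_ddd, oddb|
    (public_price / ddd - oddb).abs < TOLERANCE
  end
end

puts "user columns consistent: #{columns_consistent?(ROWS)}"
oddb_mismatches(ROWS).each do |gtin, _, public_price, ddd, _, _, oddb|
  puts format("%s expected %.2f, oddb.org shows %.2f", gtin, public_price / ddd, oddb)
end
```

Under that assumption the user's own columns are internally consistent, and every listed oddb.org value deviates from them, some too low and some far too high.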

I am having a closer look at C01DA02. WHO has published the following values (see http://www.whocc.no/atc_ddd_index/)

C CARDIOVASCULAR SYSTEM
C01 CARDIAC THERAPY
C01D VASODILATORS USED IN CARDIAC DISEASES
C01DA Organic nitrates

ATC code  Name                 DDD  U   Adm.R         Note
C01DA02   glyceryl trinitrate  5    mg  O
                               2.5  mg  oral aerosol
                               2.5  mg  SL
                               5    mg  TD

Last updated: 2015-12-16

Poking around in bin/admin

ch.oddb> registration('45986').package('071').name
-> Deponit 5, emplâtre
ch.oddb> registration('45986').package('071').ddd_dose
-> 
ch.oddb> registration('45986').package('071').galenic_forms.first
-> emplâtre
ch.oddb> registration('45986').package('071').barcode
-> 7680459860719
ch.oddb> registration('45986').package('071').ddd_price
-> 0.43
ch.oddb> registration('45986').package('071').ddd.dose
-> 5 mg
ch.oddb> registration('45986').package('071').price_public
-> 16.20
ch.oddb> registration('45986').package('071').longevity.class
-> NilClass
ch.oddb> registration('45986').package('071').galenic_group
-> unbekannt
ch.oddb> registration('45986').package('071').route_of_administration.class
-> NilClass

When looking at how the DDD-price is calculated in src/util/model/package.rb we find the code

def ddd_price
  if(!@disable_ddd_price && (ddd = self.ddd) \
    && (price = price_public) && (ddose = ddd.dose) && (mdose = dose) \
    && size = comparable_size)

    _ddd_price = 0.00
    factor = (longevity || 1).to_f
    if (grp = galenic_group) && grp.match(@@ddd_galforms) 
      if(mdose > (ddose * factor))
        _ddd_price = (price / size.to_f) / factor
      else
        _ddd_price = (price / size.to_f) \
          * (ddose.to_f * factor / mdose.want(ddose.unit).to_f) / factor
      end
    else
      # This is valid only for the following case, for example, mdose unit: mg/ml, size unit: ml
      # ddd.dose  (ddose): the amount of active_agent required for one day
      # self.dose (mdose): (usually) the amount of active_agent included in one unit of package
      # but in the case of mg/ml, mdose means not 'amount' but 'concentration'
      # size: total amount of package
      begin
        if size.to_s.match(@@ddd_grmforms)
          unless mdose.to_g == 0
            _ddd_price = (price / ((size / mdose.to_g).to_f / ddose.to_f)) / factor
          end
        else
          _ddd_price = (price / ((size * mdose).to_f / ddose.to_f)) / factor
        end
      rescue StandardError
      end
    end
    unless _ddd_price.to_s.match(/^0.*0$/u)
      _ddd_price
    end
  end
rescue RuntimeError
end
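To see how the branch selection changes the result, here is a plain-Float re-enactment of the two main branches for Deponit 5 (price 16.20, size 10, DDD 5 mg). Units and the gram branch are ignored. The stored dose of 18.7 mg per patch is an assumption of mine and was not read from the database; the method names are hypothetical.

```ruby
# Plain-Float sketch of two branches of ddd_price; not the real implementation.

# Branch taken when galenic_group matches @@ddd_galforms.
def ddd_price_galform(price:, size:, ddose:, mdose:, factor: 1.0)
  per_unit = price / size
  if mdose > ddose * factor
    per_unit / factor
  else
    per_unit * (ddose * factor / mdose) / factor
  end
end

# Fallback branch: galenic_group does not match and size is not in grams.
def ddd_price_fallback(price:, size:, ddose:, mdose:, factor: 1.0)
  price / ((size * mdose) / ddose) / factor
end

# price, size and ddose come from the session above;
# mdose = 18.7 is an ASSUMED stored dose per patch.
deponit5 = { price: 16.20, size: 10.0, ddose: 5.0, mdose: 18.7 }

puts format("galenic-form branch: %.2f", ddd_price_galform(**deponit5))
puts format("fallback branch:     %.2f", ddd_price_fallback(**deponit5))
```

With these inputs the galenic-form branch yields the 1.62 the user expects, while the fallback branch yields about 0.43, the value bin/admin reports. If the assumption about the stored dose holds, the wrong price would follow directly from the galenic_group being "unbekannt".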

Here I think the handling of an unknown galenic_group is not correct. Why did we not get a correct galenic_group?

Looking at the atc-code in bin/admin

ch.oddb> atc_class('C01DA02')
-> Glyceroltrinitrat
ch.oddb> atc_class('C01DA02').has_ddd?
-> true
ch.oddb> atc_class('C01DA02').ddds.size
-> 4
ch.oddb> atc_class('C01DA02').ddds.keys
-> ["O", "SL", "TD", "oral aerosol"]
ch.oddb> atc_class('C01DA02').ddds['TD'].dose
-> 5 mg
ch.oddb> atc_class('C01DA02').ddds['TD'].note
-> 
ch.oddb> atc_class('C01DA02').ddds['TD'].administration_route
-> TD
ch.oddb> atc_classes.values.find_all{|x|x.ddds.size > 1}.size
-> 540
ch.oddb> atc_classes.values.find_all{|x|x.ddds.size == 1}.size
-> 1255
ch.oddb> atc_classes.values.find{|x|x.ddds.size > 1 && x.active_packages.size > 0}.code
-> A01AA01
ch.oddb> atc_classes.values.find_all{|x|x.ddds.size > 1 && x.active_packages.size > 0}.size
-> 282
ch.oddb> atc_classes.values.find_all{|x|x.ddds.size == 1 && x.active_packages.size > 0}.size
-> 462
ch.oddb> $x=0; atc_classes.values.each{|x| $x+= x.active_packages.size if x.ddds.size == 1 }; $x
-> 4515
ch.oddb> $y=0; atc_classes.values.each{|x| $y+= x.active_packages.size if x.ddds.size > 1 }; $y
-> 4421
ch.oddb> $z=0; atc_classes.values.each{|x| $z+= x.active_packages.size if x.ddds.size == 0 }; $z
-> 8266
ch.oddb> $xx = {}; active_sequences.each{|x| $xx[x.route_of_administration] ||= 0; $xx[x.route_of_administration] += 1; }
-> Array
ch.oddb> $xx.keys
-> ["roa_O", "roa_P", "roa_SL", nil, "roa_TD", "roa_R", "roa_V"]
ch.oddb> $xx.values
-> [4755, 2134, 251, 2513, 596, 121, 29]

Do we support "TD" transdermal? Not really. It is not taken into account correctly when we evaluate the DDD-price. Also I don't see where we set the ROA (route_of_administration) for the ATC-codes that have several ROAs.

I think we should first investigate and solve the problem that many packages/sequences don't have a route of administration.
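To put numbers on the problem, the counts from the bin/admin session can be converted into shares. The sketch only reuses the figures shown above; the constant and method names are mine.

```ruby
# Sequence counts per route of administration, copied from $xx.keys / $xx.values.
SEQUENCES_BY_ROA = {
  "roa_O" => 4755, "roa_P" => 2134, "roa_SL" => 251, nil => 2513,
  "roa_TD" => 596, "roa_R" => 121, "roa_V" => 29,
}.freeze

# Active package counts by number of DDDs of their ATC class ($x, $y, $z).
PACKAGES_BY_DDD_COUNT = {
  "exactly one DDD" => 4515,
  "several DDDs"    => 4421,
  "no DDD"          => 8266,
}.freeze

# Percentage of the total that falls under the given key.
def share(counts, key)
  100.0 * counts.fetch(key) / counts.values.sum
end

puts format("sequences without ROA: %.1f%%", share(SEQUENCES_BY_ROA, nil))
puts format("packages with several DDDs: %.1f%%", share(PACKAGES_BY_DDD_COUNT, "several DDDs"))
puts format("packages without DDD: %.1f%%", share(PACKAGES_BY_DDD_COUNT, "no DDD"))
```

Roughly a quarter of the active sequences have no route of administration, and about a quarter of the active packages belong to ATC classes with several DDDs, where the route is needed to pick the right one.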

Created a list of all packages which have no route_of_administration defined, using bin/admin:

File.open('ausgabe3.txt', 'w+') do |f|
  active_sequences.each{|seq|
    seq.active_packages.each{|p|
      f.puts "#{p.iksnr}/#{p.seqnr}/#{p.ikscd} #{p.name} #{p.galenic_forms.first} #{p.galenic_forms.first.route_of_administration}" if p.galenic_forms.size > 0 && p.galenic_forms.first.route_of_administration == nil
    }
  }
end

See Attach:packages_without_roa.txt.

When looking at 45986 Deponit we see that the name contains "emplâtre", but the unit (from column M in Packungen.xlsx) is 'Pflaster'. But I see no way to extract the ROA from Packungen.xlsx and/or AipsDownloads.xml.

Also passing the option -fix_galenic_form to jobs/import_swissmedic_only creates new galenic_forms, but their group-id is always 1 (aka unknown).

Here is a list of all galenic_groups which have a roa of nil

Kaugummi roa nil
Dialyse roa nil
Tropfen roa nil
Klebstoff roa nil
Tests roa nil
Lösungsmittel roa nil
Tupfer/Gaze roa nil
Tinkturen/Desinfektion roa nil
Seifen und Shampoos roa nil
unbekannt roa nil
Augenmittel roa nil
Inhalation roa nil
Badezusatz roa nil
Essbare roa nil
Zubehör roa nil
Nasenmittel roa nil

I think the most honest solution would be not to publish DDDs in all cases where we don't have a ROA for the package, and to explain in the legend that the data furnished by the federal authorities don't provide computer-readable information about it. It might be time to propose a new version of AipsDownloads.xml with a much better XML format. Even if it does not get accepted immediately, it might provide a good basis for discussion if we made a list of shortcomings and our proposals to overcome them, and pointed to some examples to make the problem easier to grasp.
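A minimal sketch of that rule, using stand-in structs instead of the real ODDB model classes. The struct names, the field dose_mg and the mapping from "roa_TD" to the WHO key "TD" by stripping the prefix are assumptions of mine. The DDD values are the WHO values for C01DA02 quoted earlier.

```ruby
# Stand-ins for the real model classes; names and fields are hypothetical.
Ddd     = Struct.new(:administration_route, :dose_mg)
Package = Struct.new(:name, :route_of_administration, :ddds)

# WHO DDDs for C01DA02, keyed by administration route as in atc_class.ddds.
GTN_DDDS = {
  "O"            => Ddd.new("O", 5.0),
  "SL"           => Ddd.new("SL", 2.5),
  "TD"           => Ddd.new("TD", 5.0),
  "oral aerosol" => Ddd.new("oral aerosol", 2.5),
}.freeze

# Returns the DDD matching the package's route, or nil when the route is
# unknown or the ATC class has no DDD for it, so that nothing is published.
# ASSUMPTION: "roa_TD" corresponds to the WHO key "TD" (prefix stripped).
def ddd_for(package)
  roa = package.route_of_administration
  return nil if roa.nil?

  package.ddds[roa.sub(/\Aroa_/, "")]
end

without_roa = Package.new("Deponit 5", nil, GTN_DDDS)
with_roa    = Package.new("Deponit 5", "roa_TD", GTN_DDDS)

p ddd_for(without_roa) # no ROA, so no DDD is returned
p ddd_for(with_roa)    # the TD entry
```

The point of the design is that a missing route leads to no DDD at all instead of an arbitrary first entry.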

Adapt FI-parser to use refdata names for packages

We must adapt the FI-Parser to get the concerned sequences via the names from refdata. E.g. Tramal (IKSNR 43788) has different patinfos for "Tramal® Tropfen, Lösung zum Einnehmen" and "Tramal® Tropfen, Lösung zum Einnehmen mit Dosierpumpe" in its packages 078 (50 ml) and 086 (30 ml). In Packungen.xlsx we find "Tramal, Tropfen" both times.
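The idea can be sketched as a lookup from patinfo title to package codes. Everything here is illustrative: the struct, the method name, and in particular which refdata name belongs to package 078 and which to 086 are assumptions, and exact string comparison is only the simplest possible matching rule.

```ruby
# Hypothetical refdata entries for IKSNR 43788. Which name belongs to which
# package code is an ASSUMPTION made for this illustration.
RefdataItem = Struct.new(:iksnr, :ikscd, :name)

ITEMS = [
  RefdataItem.new("43788", "078", "Tramal® Tropfen, Lösung zum Einnehmen"),
  RefdataItem.new("43788", "086", "Tramal® Tropfen, Lösung zum Einnehmen mit Dosierpumpe"),
].freeze

# Maps each patinfo title to the package codes whose refdata name equals it.
def packages_by_title(items, titles)
  titles.each_with_object({}) do |title, result|
    result[title] = items.select { |item| item.name == title }.map(&:ikscd)
  end
end

titles = ITEMS.map(&:name)
packages_by_title(ITEMS, titles).each do |title, codes|
  puts "#{title} -> #{codes.join(', ')}"
end
```

With the name from Packungen.xlsx ("Tramal, Tropfen") both packages would end up under the same title, which is why the refdata names are needed.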

Added some preliminary code and debugging info. Running sudo -u apache jobs/update_textinfo_swissmedicinfo --no-download --reparse --target=pi 43788

Collecting the information about all GTINs (packages) referenced by an IKSNR via refdata works. Will continue tomorrow.

Page last modified on February 01, 2016, at 05:54 PM