oddb2xml pauses for about 30 minutes after emitting
2016-01-28 13:10:43 +0100: At row 18750 iksnr 65855 key "7680653150012" WITHOUT nil
Why? If you add --log to the options, you see many lines of the form
2016-02-01 09:06:00: build_article 31500 of 172536 articles
2016-02-01 09:06:08: build_article 32000 of 172536 articles
It simply takes many minutes, as outputting 500 articles takes (with my CPU power) about 8 seconds. Only one core is used. Parallelizing the output process (for a single XML file) is not simple and would surely require 4 to 16 hours of work (or even more).
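As a rough cross-check of the pause, the two log lines quoted above can be turned into a throughput estimate. This is only a sketch: it assumes the rate stays constant over the whole run, and the helper name estimated_minutes is made up for illustration (it is not part of oddb2xml).

```ruby
require 'time'

# Estimate the total build time in minutes from two progress log lines.
# Assumes a constant rate across the run (hypothetical helper, not part of oddb2xml).
def estimated_minutes(t1, n1, t2, n2, total)
  seconds_per_article = (Time.parse(t2) - Time.parse(t1)) / (n2 - n1)
  (seconds_per_article * total / 60.0).round
end

minutes = estimated_minutes('2016-02-01 09:06:00', 31_500,
                            '2016-02-01 09:06:08', 32_000, 172_536)
puts "about #{minutes} minutes for all articles"
```

With the timestamps from the log this comes out at roughly three quarters of an hour, which is the same order of magnitude as the observed pause.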
When visiting http://ch.oddb.org/de/gcc/search/zone/drugs/search_query/deponit/search_type/st_sequence?#best_result, the correct ATC code Glyceroltrinitrat (C01DA02) is shown, but the link "TK" (Tageskosten, i.e. daily costs, aka DDD) points to http://ch.oddb.org/de/gcc/ddd_price/reg/45986/seq/03/pack/071/search_query/deponit/search_type/st_sequence. Zeno says that this is not the correct link, but to me it looks right: the product also links to http://ch.oddb.org/de/gcc/compare/ean13/7680459860719. All links look okay to me, as they use IKSNR 45986, sequence 03 and pack 071.
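The match between the compare link and the ddd_price link can be checked mechanically. The sketch below assumes the usual layout of Swiss medicinal-product GTINs (prefix 7680, five-digit IKSNR, three-digit pack code, one EAN-13 check digit); the sequence number is not part of the GTIN. The helper names are made up for illustration.

```ruby
# Split a Swiss medicinal-product GTIN (EAN-13) into its parts.
# Layout assumed here: "7680" prefix, 5-digit IKSNR, 3-digit pack code, 1 check digit.
def split_gtin(gtin)
  m = /\A7680(\d{5})(\d{3})(\d)\z/.match(gtin.to_s)
  return nil unless m
  { iksnr: m[1], pack: m[2], check_digit: m[3].to_i }
end

# Standard EAN-13 check digit over the first 12 digits (weights 1, 3, 1, 3, ...).
def ean13_check_digit(gtin)
  sum = gtin.to_s[0, 12].chars.each_with_index.sum do |c, i|
    c.to_i * (i.even? ? 1 : 3)
  end
  (10 - sum % 10) % 10
end

parts = split_gtin('7680459860719')
puts parts.inspect
puts ean13_check_digit('7680459860719') == parts[:check_digit]
```

For 7680459860719 this yields IKSNR 45986 and pack 071, the same values that appear in the ddd_price URL.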
I looked at the wrong link. The link named "WHO-DDD" points to http://ch.oddb.org/de/gcc/ddd/atc_code/C01DA instead of http://ch.oddb.org/de/gcc/ddd/atc_code/C01DA02.
Fixed the problem in AtcFacades, where it did not look via packages.has_ddd?. Also fixed adding best_results only for price_comparision and combined search. Running watir tests before pushing commits.
Pushed commits
A user remarked that quite a few DDD values are missing. Eg. are
Other values are not correct. E.g. are
I am having a closer look at C01DA02. WHO has published the following values (see http://www.whocc.no/atc_ddd_index/):
C CARDIOVASCULAR SYSTEM
C01 CARDIAC THERAPY
C01D VASODILATORS USED IN CARDIAC DISEASES
C01DA Organic nitrates

ATC code  Name                 DDD  U   Adm.R         Note
C01DA02   glyceryl trinitrate  5    mg  O
                               2.5  mg  oral aerosol
                               2.5  mg  SL
                               5    mg  TD

Last updated: 2015-12-16
Poking around in bin/admin
ch.oddb> registration('45986').package('071').name
-> Deponit 5, emplâtre
ch.oddb> registration('45986').package('071').ddd_dose
->
ch.oddb> registration('45986').package('071').galenic_forms.first
-> emplâtre
ch.oddb> registration('45986').package('071').barcode
-> 7680459860719
ch.oddb> registration('45986').package('071').ddd_price
-> 0.43
ch.oddb> registration('45986').package('071').ddd.dose
-> 5 mg
ch.oddb> registration('45986').package('071').price_public
-> 16.20
ch.oddb> registration('45986').package('071').longevity.class
-> NilClass
ch.oddb> registration('45986').package('071').galenic_group
-> unbekannt
ch.oddb> registration('45986').package('071').route_of_administration.class
-> NilClass
When looking at how the DDD-price is calculated in src/util/model/package.rb we find the code
def ddd_price
  if(!@disable_ddd_price && (ddd = self.ddd) \
    && (price = price_public) && (ddose = ddd.dose) && (mdose = dose) \
    && size = comparable_size)
    _ddd_price = 0.00
    factor = (longevity || 1).to_f
    if (grp = galenic_group) && grp.match(@@ddd_galforms)
      if(mdose > (ddose * factor))
        _ddd_price = (price / size.to_f) / factor
      else
        _ddd_price = (price / size.to_f) \
          * (ddose.to_f * factor / mdose.want(ddose.unit).to_f) / factor
      end
    else
      # This is valid only for the following case, for example, mdose unit: mg/ml, size unit: ml
      # ddd.dose (ddose): the amount of active_agent required for one day
      # self.dose (mdose): (usually) the amount of active_agent included in one unit of package
      # but in the case of mg/ml, mdose means not 'amount' but 'concentration'
      # size: total amount of package
      begin
        if size.to_s.match(@@ddd_grmforms)
          unless mdose.to_g == 0
            _ddd_price = (price / ((size / mdose.to_g).to_f / ddose.to_f)) / factor
          end
        else
          _ddd_price = (price / ((size * mdose).to_f / ddose.to_f)) / factor
        end
      rescue StandardError
      end
    end
    unless _ddd_price.to_s.match(/^0.*0$/u)
      _ddd_price
    end
  end
rescue RuntimeError
end
Here the galenic_group is unknown ("unbekannt"), so the else branch is taken, and I think that branch is not correct for this package. Why did we not get a correct galenic_group?
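To see how much the choice of branch matters, here is a simplified restatement of ddd_price with plain Floats. It is only a sketch: unit handling (want, to_g) and the @@ddd_grmforms sub-branch are left out, and whether a patch would actually match @@ddd_galforms has not been verified. The price (16.20) and the DDD dose (5 mg) come from the bin/admin session; the package size (10 patches) and the dose per patch (18.7 mg) are assumptions made for illustration.

```ruby
# Simplified restatement of the two main branches of ddd_price using plain Floats.
# All doses are in mg; unit conversion from the real code is left out.
def ddd_price_sketch(price:, size:, mdose:, ddose:, factor: 1.0, known_galenic_group:)
  if known_galenic_group
    if mdose > ddose * factor
      (price / size) / factor
    else
      (price / size) * (ddose * factor / mdose) / factor
    end
  else
    (price / ((size * mdose) / ddose)) / factor
  end
end

# price and ddose are from the bin/admin session; size and mdose are ASSUMED values.
args = { price: 16.20, size: 10.0, mdose: 18.7, ddose: 5.0 }
puts ddd_price_sketch(**args, known_galenic_group: false).round(2)
puts ddd_price_sketch(**args, known_galenic_group: true).round(2)
```

Under these assumptions the fallback branch gives a much lower daily price than the per-unit branch, because it treats the total amount of active agent in the package as if it were consumed at the rate of one DDD per day.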
Looking at the atc-code in bin/admin
ch.oddb> atc_class('C01DA02')
-> Glyceroltrinitrat
ch.oddb> atc_class('C01DA02').has_ddd?
-> true
ch.oddb> atc_class('C01DA02').ddds.size
-> 4
ch.oddb> atc_class('C01DA02').ddds.keys
-> ["O", "SL", "TD", "oral aerosol"]
ch.oddb> atc_class('C01DA02').ddds['TD'].dose
-> 5 mg
ch.oddb> atc_class('C01DA02').ddds['TD'].note
->
ch.oddb> atc_class('C01DA02').ddds['TD'].administration_route
-> TD
ch.oddb> atc_classes.values.find_all{|x|x.ddds.size > 1}.size
-> 540
ch.oddb> atc_classes.values.find_all{|x|x.ddds.size == 1}.size
-> 1255
ch.oddb> atc_classes.values.find{|x|x.ddds.size > 1 && x.active_packages.size > 0}.code
-> A01AA01
ch.oddb> atc_classes.values.find_all{|x|x.ddds.size > 1 && x.active_packages.size > 0}.size
-> 282
ch.oddb> atc_classes.values.find_all{|x|x.ddds.size == 1 && x.active_packages.size > 0}.size
-> 462
ch.oddb> $x=0; atc_classes.values.each{|x| $x+= x.active_packages.size if x.ddds.size == 1 }; $x
-> 4515
ch.oddb> $y=0; atc_classes.values.each{|x| $y+= x.active_packages.size if x.ddds.size > 1 }; $y
-> 4421
ch.oddb> $z=0; atc_classes.values.each{|x| $z+= x.active_packages.size if x.ddds.size == 0 }; $z
-> 8266
ch.oddb> $xx = {}; active_sequences.each{|x| $xx[x.route_of_administration] ||= 0; $xx[x.route_of_administration] += 1; }
-> Array
ch.oddb> $xx.keys
-> ["roa_O", "roa_P", "roa_SL", nil, "roa_TD", "roa_R", "roa_V"]
ch.oddb> $xx.values
-> [4755, 2134, 251, 2513, 596, 121, 29]
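The counts from this session are easier to judge as shares. The following sketch just re-enters the numbers printed above (pairing $xx.keys with $xx.values in order, which relies on Ruby hashes keeping insertion order); the helper name share is made up for illustration.

```ruby
# Counts copied from the bin/admin session above.
roa_counts = {
  'roa_O' => 4755, 'roa_P' => 2134, 'roa_SL' => 251, nil => 2513,
  'roa_TD' => 596, 'roa_R' => 121, 'roa_V' => 29
}
packages_by_ddd_count = { 'one DDD' => 4515, 'several DDDs' => 4421, 'no DDD' => 8266 }

# Percentage share of one key within a tally hash, rounded to one decimal.
def share(tally, key)
  (100.0 * tally[key] / tally.values.sum).round(1)
end

puts "sequences without ROA: #{share(roa_counts, nil)} %"
puts "packages without DDD:  #{share(packages_by_ddd_count, 'no DDD')} %"
```

Roughly a quarter of the active sequences have no route of administration, and almost half of the active packages belong to an ATC class without any DDD.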
Do we support "TD" (transdermal)? Not really. It is not taken into account correctly when we evaluate the DDD price. Also, I don't see where we set the ROA (route_of_administration) for the ATC codes that have several ROAs.
I think we should first investigate and solve the problem that many packages/sequences don't have a route of administration.
Created a list of all packages which have no route_of_administration defined, using bin/admin:
File.open('ausgabe3.txt', 'w+') do |f|
  active_sequences.each{|seq|
    seq.active_packages.each{|p|
      f.puts "#{p.iksnr}/#{p.seqnr}/#{p.ikscd} #{p.name} #{p.galenic_forms.first} #{p.galenic_forms.first.route_of_administration}" if p.galenic_forms.size > 0 && p.galenic_forms.first.route_of_administration == nil
    }
  }
end
See Attach:packages_without_roa.txt.
When looking at 45986 Deponit we see that the name contains "emplâtre", but the unit (from column M in Packungen.xlsx) is 'Pflaster'. I see no way to extract the ROA from Packungen.xlsx and/or AipsDownloads.xml.
Also, passing the option -fix_galenic_form to jobs/import_swissmedic_only creates new galenic_forms, but their group-id is always 1 (aka unknown).
Here is a list of all galenic_groups which have a ROA of nil:
Kaugummi roa nil
Dialyse roa nil
Tropfen roa nil
Klebstoff roa nil
Tests roa nil
Lösungsmittel roa nil
Tupfer/Gaze roa nil
Tinkturen/Desinfektion roa nil
Seifen und Shampoos roa nil
unbekannt roa nil
Augenmittel roa nil
Inhalation roa nil
Badezusatz roa nil
Essbare roa nil
Zubehör roa nil
Nasenmittel roa nil
I think the most honest solution would be not to publish DDDs in all cases where we don't have a ROA for the package, and to explain in the legend that the data furnished by the federal authorities don't provide computer-readable information about it. It might be time to propose a new version of AipsDownloads.xml with a much better XML format. Even if it does not get accepted immediately, it might provide a good basis for discussion if we made a list of the shortcomings and our proposals to overcome them, and pointed to some examples to make the problem easier to grasp.
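The rule proposed here (no DDD when the ROA is unknown) could look roughly like the sketch below. It is not existing ODDB code: the helper name ddd_for_roa is hypothetical, the DDDs are represented as plain strings taken from the WHO table for C01DA02, and mapping "roa_TD" to the WHO key "TD" by stripping the "roa_" prefix is an assumption.

```ruby
# Pick the DDD that matches a sequence's route of administration.
# ddds: Hash keyed by WHO route code, as in atc_class('C01DA02').ddds
# roa:  String such as "roa_TD", or nil when no route is known
# Returns nil (publish nothing) when the route is unknown or has no DDD.
def ddd_for_roa(ddds, roa)
  return nil if roa.nil?
  ddds[roa.sub(/\Aroa_/, '')]
end

# DDD values for C01DA02 as listed in the WHO table (plain strings for illustration).
ddds = { 'O' => '5 mg', 'SL' => '2.5 mg', 'TD' => '5 mg', 'oral aerosol' => '2.5 mg' }
puts ddd_for_roa(ddds, 'roa_TD').inspect
puts ddd_for_roa(ddds, nil).inspect
```

Returning nil instead of falling back to an arbitrary DDD keeps a missing ROA visible, which is the point of the proposal.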
We must adapt the FI-Parser to get the concerned sequences via the names from refdata. E.g. Tramal (IKSNR 43788) has different patinfos for "Tramal® Tropfen, Lösung zum Einnehmen" and "Tramal® Tropfen, Lösung zum Einnehmen mit Dosierpumpe" in its packages 078 (50 ml) and 086 (30 ml). In Packungen.xlsx we find "Tramal, Tropfen" both times.
Added some preliminary code and debugging info. Running sudo -u apache jobs/update_textinfo_swissmedicinfo --no-download --reparse --target=pi 43788
Collecting the information about all GTINs (packages) referenced by an IKSNR via refdata works. Will continue tomorrow.