---
We must finish the work, e.g. parsing the input for reading the list of GTINs. Parsing Swissmedic_package.xlsx takes several minutes.
After discussing with Zeno we decided to offer a switch '--compare' which downloads and compares the three files (BAG, SwissIndex and RefData) and outputs a list of all differences. This switch overrides all others. To fetch the ATC codes for one or several GTINs you may specify them as parameters, and it will just get the data from SwissIndex. This takes only about 8 seconds for the first run and 3 seconds for subsequent runs (within the next 24 hours).
Okay. Looks good to me now. E.g.
  bundle exec bin/gtin2atc --compare 2>&1 | tee compare.log
  rm -f /opt/src/gtin2atc/XMLPublications.zip
  Opened /opt/src/gtin2atc/log.log
  2015-01-26 14:47:08 +0100: Resumen:
  Found infos about 21577 entries
  BAG 9212 entries. 8 entries had not GTIN field.. Fetched from http://bag.e-mediat.net/SL2007.Web.External/File.axd?file=XMLPublications.zip
  SwissIndex 15775 entries. Fetched from https://index.ws.e-mediat.net/Swissindex/Pharma/ws_Pharma_V101.asmx?WSDL
  SwissMedic 18870 entries. Fetched from http://www.swissmedic.ch/arzneimittel/00156/00221/00222/00230/index.html?lang=de
  Matching 8964 items.
  Not in BAG 4590
  Not in SwissIndex 5677
  Not in Packungen 2346
  ATC-Codes differ 0
  2015-01-26 14:47:08 +0100: 32 done. Took 46 seconds
or looking for the ATC code of two GTINs
  bundle exec bin/gtin2atc 7680147690482 7680353660163; cat gtin2atc.csv
  gtin,ATC,pharmacode
  7680147690482,N07BC02,41803
  7680353660163,B03AE10,20273
Pushed commit "Fixed remaining problems".
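The "Not in ..." counts in the --compare summary can be read as set differences over GTINs. A minimal sketch in Ruby, assuming each source has already been loaded into a hash keyed by GTIN. The method name `compare_sources` and the assignment of the sample GTINs to sources are made up for illustration and are not taken from gtin2atc:

```ruby
require 'set'

# Hypothetical helper: sources maps a source name to a hash of GTIN => ATC code.
def compare_sources(sources)
  # Union of all GTINs seen in any source
  all_gtins = sources.values.map(&:keys).flatten.to_set
  report = { total: all_gtins.size, missing: {} }
  sources.each do |name, entries|
    # GTINs known to some source but absent from this one
    report[:missing][name] = (all_gtins - entries.keys.to_set).size
  end
  # GTINs present in every source
  report[:matching] = sources.values.map { |e| e.keys.to_set }.reduce(:&).size
  report
end

# Illustrative data only: which source holds which GTIN is an assumption.
report = compare_sources({
  'BAG'        => { '7680147690482' => 'N07BC02' },
  'SwissIndex' => { '7680147690482' => 'N07BC02', '7680353660163' => 'B03AE10' },
  'SwissMedic' => { '7680353660163' => 'B03AE10' }
})
puts report.inspect
```

Keying everything by GTIN keeps the comparison independent of the order in which the sources were downloaded.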
Zeno wishes the following improvements:
See commits
Running gtin2atc --compare
now produces
  Result of verifing data from BAG (SL):
    BAG-data fetched from http://bag.e-mediat.net/SL2007.Web.External/File.axd?file=XMLPublications.zip.
    BAG had 9212 entries
    8 entries had no GTIN field
    Not in SwissMedic 486
    Not in SwissIndex 248
    Comparing ATC-Codes between BAG and Swissmedic
      8123 items had the same ATC code in BAG, SwissIndex and SwissMedic
      830 are the same in SwissMedic and BAG
      204 are different in SwissMedic and BAG
      265 are shorter in SwissMedic than in BAG
      11 are longer in SwissMedic than in BAG
    Comparing ATC-Codes between BAG and Swissindex
      8123 items had the same ATC code in BAG, SwissIndex and SwissMedic
      830 are the same in SwissIndex and BAG
      0 are different in SwissMedic and BAG
      0 are shorter in SwissIndex than in BAG
      11 are longer in SwissIndex than in BAG
  Result of verifing data from swissmedic:
    SwissMedic had 18870 entries. Fetched from http://www.swissmedic.ch/arzneimittel/00156/00221/00222/00230/index.html?lang=de
    SwissIndex 15775 entries. Fetched from https://index.ws.e-mediat.net/Swissindex/Pharma/ws_Pharma_V101.asmx?WSDL
    BAG 9212 entries. 8 entries had no GTIN field. Fetched from http://bag.e-mediat.net/SL2007.Web.External/File.axd?file=XMLPublications.zip
    Matching 8603 items.
    Not in BAG 4590
    Not in SwissIndex 5677
    Comparing ATC-Codes between Swissmedic and Swissindex
      3334 match
      158 are different
      3334 are the same in SwissIndex and SwissMedic
      66 are shorter in SwissIndex
      1032 are longer in SwissIndex
  Comparing all GTIN-codes:
    Found infos about 21577 entries
    BAG 9212 entries. 8 entries had no GTIN field. Fetched from http://bag.e-mediat.net/SL2007.Web.External/File.axd?file=XMLPublications.zip
    SwissIndex 15775 entries. Fetched from https://index.ws.e-mediat.net/Swissindex/Pharma/ws_Pharma_V101.asmx?WSDL
    SwissMedic 18870 entries. Fetched from http://www.swissmedic.ch/arzneimittel/00156/00221/00222/00230/index.html?lang=de
    8592 items had the same ATC code in BAG, SwissIndex and SwissMedic
    4590 not in BAG
    5677 not in SwissIndex
    2707 not in SwissMedic
    11 ATC-Codes differed
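The categories in this report (same, different, shorter, longer) could be produced by a pairwise classification of ATC codes. The following Ruby sketch assumes that "shorter" and "longer" are decided purely by string length, which may not be what gtin2atc actually does. The names `classify_atc` and `tally_atc` are hypothetical:

```ruby
# Hypothetical classification of one source's ATC code against a reference code.
# Assumption: "shorter"/"longer" compare only the string lengths.
def classify_atc(reference, other)
  return :same    if other == reference
  return :shorter if other.length < reference.length
  return :longer  if other.length > reference.length
  :different
end

# Count how often each category occurs in a list of [reference, other] pairs.
def tally_atc(pairs)
  counts = Hash.new(0)
  pairs.each { |reference, other| counts[classify_atc(reference, other)] += 1 }
  counts
end

# Illustrative pairs only, not real comparison data.
counts = tally_atc([
  %w[N07BC02 N07BC02],
  %w[N07BC02 N07BC],
  %w[B03AE B03AE10],
  %w[B03AE10 N07BC02]
])
puts counts.inspect
```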
Zeno remarked that in https://srv.elexis.info/jenkins/view/Artikelstamm/job/Artikelstamm%20Full%20Build/1/console we found entries like
7680559950068:Removing VetMed Article with ATC QP53AC11
This should not be necessary.
The relevant parts of the Jenkins CI build are
  rvm use ruby-1.9.3-p448
  gem update oddb2xml
  /usr/local/bin/oddb2xml -e
  JAVA="/usr/bin/java"
  $JAVA -jar ConvertOddb2XmlToArtikelstamm.jar --oddb2xmlArticleFile oddb_article.xml --oddb2xmlLimitationFile oddb_limitation.xml --oddb2xmlProductFile oddb_product.xml
  /usr/bin/xmllint --noout --schema /opt/artikelstamm/Elexis_Artikelstamm_v002.xsd artikelstamm_*.xml
Creating a unit-test for 7680559950068. The relevant part in SwissIndex_Pharma_DE.xml is
  <ITEM DT="2014-10-17T00:00:00">
    <GTIN>7680559950068</GTIN>
    <PHAR>2930393</PHAR>
    <STATUS>A</STATUS>
    <STDATE>2005-02-02T00:00:00</STDATE>
    <LANG>DE</LANG>
    <DSCR>SCALIBOR Protectorband 65 cm gross f Hunde</DSCR>
    <ADDSCR>1 Stk</ADDSCR>
    <COMP>
      <NAME>MSD Animal Health GmbH</NAME>
      <GLN>7601001053854</GLN>
    </COMP>
  </ITEM>
I don't know how we could determine that this is for veterinary use. Or should the absence of 7680559950068 in Publications.xls trigger this action? When building the article.xml I have the following info at hand (from pry).
  Skipping vet ?? 7680559950068
  {:refdata=>true, :_type=>:pharma, :ean=>"7680559950068", :pharmacode=>"2930393",
   :stat_date=>"", :lang=>"DE", :desc=>"SCALIBOR Protectorband 65 cm gross f Hunde",
   :atc_code=>"", :additional_desc=>"1 Stk", :company_name=>"MSD Animal Health GmbH",
   :company_ean=>"7601001053854"}
Or should I exclude all articles from the company 'MSD Animal Health GmbH'?
Discussed the problem with Marco Descher. He marked all ATC codes from the group 'Q'.
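A check on the ATC group 'Q' could look like the Ruby sketch below. It uses the hash keys seen in the pry output. The method name `vet_by_atc?` is hypothetical and this is not the oddb2xml implementation. Note that the SwissIndex entry for 7680559950068 carries an empty `:atc_code`, so a prefix test alone would not flag it at that stage:

```ruby
# Hypothetical check: treat an item as veterinary when its ATC code starts with 'Q'.
def vet_by_atc?(item)
  item[:atc_code].to_s.upcase.start_with?('Q')
end

# Reduced copy of the item shown in the pry output (empty ATC code).
scalibor = { ean: '7680559950068', atc_code: '', company_name: 'MSD Animal Health GmbH' }
# Same item with the ATC code reported in the Jenkins console log.
with_atc = scalibor.merge(atc_code: 'QP53AC11')

puts vet_by_atc?(scalibor)   # false: no ATC code available
puts vet_by_atc?(with_atc)   # true
```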
Side note:
His biggest problem with oddb2xml is the 2316 "Invalid number" entries, e.g.
  7680628610022: Invalid number string 6 x 2.5 ml in Product/PackGrSwissmedic
where he would like to have a correct description of the package size/unit. Created a small Ruby helper Attach:oddb2xml_invalid_number.rb and created a unique list of the patterns used. See Attach:oddb2xml_invalid_number_log.txt. Most of them would be easy to parse, but a few will probably be impossible to convert, as they don't fit the quantity/size pattern, e.g. 84 (4 x 21), 84+88
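To illustrate what "easy to parse" could mean, here is a minimal Ruby sketch that accepts only the shape "count x size unit" and returns nil for everything else. The constant `PACK_PATTERN` and the method `parse_pack_size` are hypothetical and unrelated to the attached helper script:

```ruby
# Hypothetical pattern: "<count> x <size> <unit>", e.g. "6 x 2.5 ml".
PACK_PATTERN = /\A\s*(\d+)\s*x\s*(\d+(?:\.\d+)?)\s*([[:alpha:]]+)\s*\z/i

# Returns a hash with count, size and unit, or nil when the text does not fit.
def parse_pack_size(text)
  m = PACK_PATTERN.match(text)
  return nil unless m
  { count: m[1].to_i, size: m[2].to_f, unit: m[3] }
end

['6 x 2.5 ml', '84 (4 x 21)', '84+88'].each do |sample|
  puts "#{sample.inspect} => #{parse_pack_size(sample).inspect}"
end
```

With this pattern only the first sample is parsed. The other two return nil and would still need a manual rule or a separate pattern.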