We must replace the old plugin src/plugin/medreg_pharmacy.rb. I will create a new plugin named src/plugin/refdata_company.rb to reflect the new origin of the data. Refdata, however, calls them Partner (http://www.refdata.ch/content/..%5Ccontent%5Cpartner_d.aspx?Nid=6&Aid=908&ID=412), and we can search them using http://www.refdata.ch/content/partner_d.aspx?Nid=6&Aid=908&ID=412
Alternatively, we can simply download the XLSX file from http://refdatabase.refdata.ch/Download/Partners.xlsx. We must match the 10 ODDB::BaTypes to the types from RefData.
But parsing the file with rubyXL takes a long time. A much faster approach is to install gnumeric and call ssconvert Partners.xlsx Partners.csv, which takes about 11 seconds, and then load the file via
irb(main):001:0> require 'csv'
=> true
irb(main):003:0> csv_text = File.read('Partners.csv'); csv_text.size
=> 23757504
irb(main):004:0> csv = CSV.parse(csv_text, :headers => true); csv.size
=> 269155
irb(main):005:0> csv.first
=> #<CSV::Row "PTYPE":"JUR" "GLN":"7601001028333" "STATUS":"I" "STDATE":"2007/10/09" "LANG":"DE" "DESCR1":"Fux Christine & Marcel" "DESCR2":"St Martini Apotheke" "ROLE_TYPE":"Pharm" "ROLE_STREET":nil "ROLE_STRNO":nil "ROLE_POBOX":nil "ROLE_ZIP":nil "ROLE_CITY":nil "ROLE_CTN":nil "ROLE_CNTRY":nil "DT":"2016/01/06">
irb(main):006:0> csv[1]
=> #<CSV::Row "PTYPE":"JUR" "GLN":"7601001367753" "STATUS":"A" "STDATE":"2010/09/06 11:25:10.387" "LANG":"DE" "DESCR1":"Amavita Apotheke Vorstadt" "DESCR2":"GaleniCare AG" "ROLE_TYPE":"Pharm" "ROLE_STREET":"Vorstadt" "ROLE_STRNO":"30/32" "ROLE_POBOX":nil "ROLE_ZIP":"8200" "ROLE_CITY":"Schaffhausen" "ROLE_CTN":"SH" "ROLE_CNTRY":"CH" "DT":"2016/01/06">
irb(main):008:0> csv.collect{|x| x['ROLE_TYPE']}.uniq
=> ["Pharm", "Indus", "Hosp", "DruSto", "SerFirm", "DoctMed", "PubHea", "Whole", "Pharmst", "Inst", "HeaIns", "IntOrg", "HeaEmpl", "NursHom", "ONursOrg", "SWFirm", "EmergServ", "Assoc", "NonHealthCare", "HeaTec", "AccIns", "HeaProd", "SpecPra", "Drugg", "GrpPra", "Dent", "Veter", "Nurse", "Lab", "Chiro", "HeaProv", "Physio", "LabLeader", "Midw", "Psycho", "Naturopath", "NutrAdv", "SocSec", "Spitex", "DentGrpPra", "CompTherapist", "VetGrpPra", "PrivPra", "Ergo", "MedPracAss", "DiabAdv", "SpeeTher", "PharmAss", "MedSecr", "EmergCent"]
=> #<CSV::Row "PTYPE":"JUR" "GLN":"7601002017688" "STATUS":"A" "STDATE":"2017/05/01 09:25:43.527" "LANG":"DE" "DESCR1":"Kantonsapotheke Zürich (KAZ)" "DESCR2":"Spitalapotheke" "ROLE_TYPE":"PubHea" "ROLE_STREET":"Südstrasse" "ROLE_STRNO":"3" "ROLE_POBOX":nil "ROLE_ZIP":"8952" "ROLE_CITY":"Schlieren" "ROLE_CTN":"ZH" "ROLE_CNTRY":"CH" "DT":"2017/05/15">
irb(main):017:0> csv.find_all{|x| x['ROLE_TYPE'].eql?('PubHea')}.last
=> #<CSV::Row "PTYPE":"JUR" "GLN":"7601001404656" "STATUS":"A" "STDATE":"2017/05/19 15:43:16" "LANG":"DE" "DESCR1":"Gesundheitsdirektion Kanton Zürich" "DESCR2":"eFaktura Listenspital" "ROLE_TYPE":"PubHea" "ROLE_STREET":"Stampfenbachstrasse" "ROLE_STRNO":"30" "ROLE_POBOX":nil "ROLE_ZIP":"8090" "ROLE_CITY":"Zürich" "ROLE_CTN":"ZH" "ROLE_CNTRY":"CH" "DT":"2017/05/21">
Also, the downloaded XLSX contains 269155 entries, whereas we are interested in far fewer.
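Because only a small part of the 269155 rows is needed, the rows could be filtered by ROLE_TYPE while streaming the CSV, instead of reading the whole 23 MB file into memory first. The following is a minimal sketch, assuming the column headers shown in the irb session above. The helper name partners_of_type and the list in WANTED_ROLE_TYPES are hypothetical; the real selection depends on how the ODDB::BaTypes are matched to the RefData types.

```ruby
require 'csv'

# Hypothetical selection of RefData role types, for illustration only.
WANTED_ROLE_TYPES = %w[Pharm Hosp].freeze

# Reads the CSV row by row and keeps only the rows whose ROLE_TYPE
# is in the wanted list. Each kept row is returned as a plain Hash.
def partners_of_type(csv_path, wanted = WANTED_ROLE_TYPES)
  selected = []
  CSV.foreach(csv_path, headers: true, encoding: 'UTF-8') do |row|
    selected << row.to_h if wanted.include?(row['ROLE_TYPE'])
  end
  selected
end
```

A call such as partners_of_type('Partners.csv', ['PubHea']) would then return only the PubHea rows, without building the complete CSV::Table.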
To distinguish between the last two entries we still have to get the details from https://www.medregbm.admin.ch/Betrieb/Search.
The page https://www.medregbm.admin.ch/Publikation/Liste offers a download of the file Betriebe_20170809.xlsx. The file utility does not recognize it as XLSX, as seen when running
> file ~/Downloads/Betriebe_20170809.xlsx
/home/niklaus/Downloads/Betriebe_20170809.xlsx: Zip archive data, at least v2.0 to extract
# but ssconvert can produce a valid csv
> ssconvert ~/Downloads/Betriebe_20170809.xlsx Betriebe.csv
> head -n3 Betriebe.csv
"GLN Betrieb","Betriebsname 1","Betriebsname 2",Strasse,Nummer,PLZ,Ort,Bewilligungskanton,Land,Betriebstyp,"BTM Berechtigung"
7601001402034," Schloss Apotheke Parfumerie AG",,Rathausplatz,3,8500,Frauenfeld,Thurgau,Schweiz,"öffentliche Apotheke","6011 Verzeichnis a/b/c BetmVV-EDI"
7601001029323,"Aadorf Apotheke",,Bahnhofstrasse,8,8355,Aadorf,Thurgau,Schweiz,"öffentliche Apotheke","6011 Verzeichnis a/b/c BetmVV-EDI"
This CSV file contains 3290 entries.
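Since both files carry a GLN, the Betriebstyp from Betriebe.csv could be looked up for a RefData partner by its GLN, for example to tell the two PubHea entries apart. This is only a sketch, assuming the headers shown in the head -n3 output; the helper name betriebstyp_by_gln is hypothetical and not part of the plugin.

```ruby
require 'csv'

# Builds a Hash that maps each "GLN Betrieb" to its "Betriebstyp",
# using the column names from the converted Betriebe.csv.
def betriebstyp_by_gln(csv_path)
  lookup = {}
  CSV.foreach(csv_path, headers: true, encoding: 'UTF-8') do |row|
    lookup[row['GLN Betrieb']] = row['Betriebstyp']
  end
  lookup
end
```

With types = betriebstyp_by_gln('Betriebe.csv'), types[gln] gives the Betriebstyp for a GLN taken from Partners.csv, or nil if the GLN is not listed in the medreg file.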
Decided to use Ox (an XML parser, http://www.ohler.com/ox), as it is self-contained and much smaller and faster than nokogiri.
Parsing the partners.xml with it was easy. Updated the unit tests and the updater. Renamed jobs/import_medreg_betriebe => jobs/import_refdata_partners. Now testing the importer with sudo -u apache bundle-240 exec /usr/local/bin/ruby-240 jobs/import_refdata_partners
Will fix the following error next week:
Error: NoMethodError
Message: undefined method `partners' for #<OddbPrevalence:0x00564771a03638>
Backtrace:
/var/www/oddb.org/src/util/oddbapp.rb:1580:in `block in method_missing'
/var/www/oddb.org/src/util/oddbapp.rb:1579:in `synchronize'
/var/www/oddb.org/src/util/oddbapp.rb:1579:in `method_missing'
/var/www/oddb.org/src/plugin/refdata_partner.rb:167:in `get_detail_to_gln'
/var/www/oddb.org/src/plugin/refdata_partner.rb:123:in `block in update'
/var/www/oddb.org/src/plugin/refdata_partner.rb:123:in `each'
/var/www/oddb.org/src/plugin/refdata_partner.rb:123:in `update'
/var/www/oddb.org/src/util/updater.rb:272:in `block in update_regdata_partners'
/var/www/oddb.org/src/util/updater.rb:543:in `wrap_update'
/var/www/oddb.org/src/util/updater.rb:270:in `update_regdata_partners'
jobs/import_refdata_partners:14:in `block in <module:Util>'
/var/www/oddb.org/src/util/job.rb:40:in `run'
jobs/import_refdata_partners:12:in `<module:Util>'
jobs/import_refdata_partners:11:in `<module:ODDB>'
jobs/import_refdata_partners:10:in `<main>'
Current state is Attach:import_refdata_partners_diff.text
As I want to keep checking regularly for possible errors seen in oddb.org, I decided to add the script created yesterday (yesterday_errors) as bin/check_log_errors and add an option to analyse different days. Done with commit Added bin/check_log_errors