dev.ywesee.com - Niklaus/20170809-medwin-refdata

20170809-medwin-refdata

Summary

Medwin Company Updater must use data from Refdata
Analyse errors found for oddb.org
Keep in Mind

Commits

https://github.com/ngiger/oddb.org/commit/2c839425c969f0e28f3fb246dd96d83afb4e2840 Added bin/check_log_errors

Index

Medwin Company Updater must use data from Refdata
Analyse errors found for oddb.org

Medwin Company Updater must use data from Refdata

We must replace the old plugin src/plugin/medreg_pharmacy.rb. I will create a new plugin named src/plugin/refdata_company.rb to reflect the new origin of the data. Refdata, however, calls these entries Partner (http://www.refdata.ch/content/..%5Ccontent%5Cpartner_d.aspx?Nid=6&Aid=908&ID=412), and we can search them using http://www.refdata.ch/content/partner_d.aspx?Nid=6&Aid=908&ID=412

But we can simply download the XLSX file from http://refdatabase.refdata.ch/Download/Partners.xlsx. We must match the 10 ODDB::BaTypes to the types from RefData.

But parsing the file with rubyXL takes a long time. A much faster approach is to install gnumeric and call ssconvert Partners.xlsx Partners.csv, which takes about 11 seconds, and then load the file via

irb(main):001:0> require 'csv'
=> true
irb(main):003:0> csv_text = File.read('Partners.csv'); csv_text.size
=> 23757504
irb(main):004:0> csv = CSV.parse(csv_text, :headers => true); csv.size
=> 269155
irb(main):005:0> csv.first
=> #<CSV::Row "PTYPE":"JUR" "GLN":"7601001028333" "STATUS":"I" "STDATE":"2007/10/09" "LANG":"DE" "DESCR1":"Fux Christine & Marcel" "DESCR2":"St Martini Apotheke" "ROLE_TYPE":"Pharm" "ROLE_STREET":nil "ROLE_STRNO":nil "ROLE_POBOX":nil "ROLE_ZIP":nil "ROLE_CITY":nil "ROLE_CTN":nil "ROLE_CNTRY":nil "DT":"2016/01/06">
irb(main):006:0> csv[1]
=> #<CSV::Row "PTYPE":"JUR" "GLN":"7601001367753" "STATUS":"A" "STDATE":"2010/09/06 11:25:10.387" "LANG":"DE" "DESCR1":"Amavita Apotheke Vorstadt" "DESCR2":"GaleniCare AG" "ROLE_TYPE":"Pharm" "ROLE_STREET":"Vorstadt" "ROLE_STRNO":"30/32" "ROLE_POBOX":nil "ROLE_ZIP":"8200" "ROLE_CITY":"Schaffhausen" "ROLE_CTN":"SH" "ROLE_CNTRY":"CH" "DT":"2016/01/06">
irb(main):008:0> csv.collect{|x| x['ROLE_TYPE']}.uniq
=> ["Pharm", "Indus", "Hosp", "DruSto", "SerFirm", "DoctMed", "PubHea", "Whole", "Pharmst", "Inst", "HeaIns", "IntOrg", "HeaEmpl", "NursHom", "ONursOrg", "SWFirm", "EmergServ", "Assoc", "NonHealthCare", "HeaTec", "AccIns", "HeaProd", "SpecPra", "Drugg", "GrpPra", "Dent", "Veter", "Nurse", "Lab", "Chiro", "HeaProv", "Physio", "LabLeader", "Midw", "Psycho", "Naturopath", "NutrAdv", "SocSec", "Spitex", "DentGrpPra", "CompTherapist", "VetGrpPra", "PrivPra", "Ergo", "MedPracAss", "DiabAdv", "SpeeTher", "PharmAss", "MedSecr", "EmergCent"]
=> #<CSV::Row "PTYPE":"JUR" "GLN":"7601002017688" "STATUS":"A" "STDATE":"2017/05/01 09:25:43.527" "LANG":"DE" "DESCR1":"Kantonsapotheke Zürich (KAZ)" "DESCR2":"Spitalapotheke" "ROLE_TYPE":"PubHea" "ROLE_STREET":"Südstrasse" "ROLE_STRNO":"3" "ROLE_POBOX":nil "ROLE_ZIP":"8952" "ROLE_CITY":"Schlieren" "ROLE_CTN":"ZH" "ROLE_CNTRY":"CH" "DT":"2017/05/15">
irb(main):017:0> csv.find_all{|x| x['ROLE_TYPE'].eql?('PubHea')}.last
=> #<CSV::Row "PTYPE":"JUR" "GLN":"7601001404656" "STATUS":"A" "STDATE":"2017/05/19 15:43:16" "LANG":"DE" "DESCR1":"Gesundheitsdirektion Kanton Zürich" "DESCR2":"eFaktura Listenspital" "ROLE_TYPE":"PubHea" "ROLE_STREET":"Stampfenbachstrasse" "ROLE_STRNO":"30" "ROLE_POBOX":nil "ROLE_ZIP":"8090" "ROLE_CITY":"Zürich" "ROLE_CTN":"ZH" "ROLE_CNTRY":"CH" "DT":"2017/05/21">

The downloaded XLSX contains 269155 entries, whereas we are interested in far fewer.
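A minimal sketch of how the interesting rows could be picked out while reading the CSV row by row, instead of loading all rows with File.read and CSV.parse first. The constant WANTED_ROLE_TYPES, the method select_partners and the reading of STATUS "A" as active are assumptions made for illustration; the real selection depends on the mapping to ODDB::BaTypes.

```ruby
require 'csv'

# Hypothetical subset of ROLE_TYPE values; the real list has to come from
# the mapping between ODDB::BaTypes and the RefData types.
WANTED_ROLE_TYPES = %w[Pharm Hosp PubHea].freeze

# Reads the CSV row by row and keeps only the rows we care about.
# io may be an open File or a String containing CSV data.
def select_partners(io, wanted = WANTED_ROLE_TYPES)
  selected = []
  CSV.new(io, headers: true).each do |row|
    next unless row['STATUS'] == 'A' # assumption: 'A' marks an active partner
    next unless wanted.include?(row['ROLE_TYPE'])
    selected << row.to_h
  end
  selected
end

# Usage sketch (Partners.csv as produced by ssconvert):
# File.open('Partners.csv', 'r:UTF-8') { |f| puts select_partners(f).size }
```

Keeping only plain hashes for the selected rows avoids holding all CSV::Row objects in memory at once.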

To distinguish between the last two entries we still have to get the details from https://www.medregbm.admin.ch/Betrieb/Search.

There is https://www.medregbm.admin.ch/Publikation/Liste which enables one to download a file Betriebe_20170809.xlsx, which is not recognized as an XLSX when running

> file  ~/Downloads/Betriebe_20170809.xlsx 
/home/niklaus/Downloads/Betriebe_20170809.xlsx: Zip archive data, at least v2.0 to extract
# but ssconvert can produce a valid csv
> ssconvert ~/Downloads/Betriebe_20170809.xlsx Betriebe.csv
>  head -n3 Betriebe.csv 
"GLN Betrieb","Betriebsname 1","Betriebsname 2",Strasse,Nummer,PLZ,Ort,Bewilligungskanton,Land,Betriebstyp,"BTM Berechtigung"
7601001402034," Schloss Apotheke Parfumerie AG",,Rathausplatz,3,8500,Frauenfeld,Thurgau,Schweiz,"Öffentliche Apotheke","6011 Verzeichnis a/b/c BetmVV-EDI"
7601001029323,"Aadorf Apotheke",,Bahnhofstrasse,8,8355,Aadorf,Thurgau,Schweiz,"Öffentliche Apotheke","6011 Verzeichnis a/b/c BetmVV-EDI"

This CSV file contains 3290 entries.
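As a rough sketch, the Betriebe CSV could be indexed by GLN so that details such as the Betriebstyp can be looked up for a GLN found in the Refdata partner list. The method name index_betriebe_by_gln and the choice of fields kept are assumptions; only the column names are taken from the header line shown above.

```ruby
require 'csv'

# Builds a Hash from GLN to a few fields of the corresponding row.
# io may be an open File or a String containing CSV data.
def index_betriebe_by_gln(io)
  index = {}
  CSV.new(io, headers: true).each do |row|
    gln = row['GLN Betrieb']
    next if gln.nil? || gln.empty?
    index[gln] = {
      name: row['Betriebsname 1'].to_s.strip, # some names carry a leading blank
      type: row['Betriebstyp'],
      canton: row['Bewilligungskanton']
    }
  end
  index
end

# Usage sketch:
# betriebe = File.open('Betriebe.csv', 'r:UTF-8') { |f| index_betriebe_by_gln(f) }
# betriebe['7601001402034'] # details for this GLN, or nil if it is not listed
```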

Decided to use Ox (an XML parser, http://www.ohler.com/ox) as it is self-contained and much smaller and faster than nokogiri.

Parsing the partners.xml with it was easy. Updated the unit tests and the updater. Renamed jobs/import_medreg_betriebe to jobs/import_refdata_partners. Now testing the importer with sudo -u apache bundle-240 exec /usr/local/bin/ruby-240 jobs/import_refdata_partners

I will fix the following error next week:

Error: NoMethodError
Message: undefined method `partners' for #<OddbPrevalence:0x00564771a03638>
Backtrace:
/var/www/oddb.org/src/util/oddbapp.rb:1580:in `block in method_missing'
/var/www/oddb.org/src/util/oddbapp.rb:1579:in `synchronize'
/var/www/oddb.org/src/util/oddbapp.rb:1579:in `method_missing'
/var/www/oddb.org/src/plugin/refdata_partner.rb:167:in `get_detail_to_gln'
/var/www/oddb.org/src/plugin/refdata_partner.rb:123:in `block in update'
/var/www/oddb.org/src/plugin/refdata_partner.rb:123:in `each'
/var/www/oddb.org/src/plugin/refdata_partner.rb:123:in `update'
/var/www/oddb.org/src/util/updater.rb:272:in `block in update_regdata_partners'
/var/www/oddb.org/src/util/updater.rb:543:in `wrap_update'
/var/www/oddb.org/src/util/updater.rb:270:in `update_regdata_partners'
jobs/import_refdata_partners:14:in `block in <module:Util>'
/var/www/oddb.org/src/util/job.rb:40:in `run'
jobs/import_refdata_partners:12:in `<module:Util>'
jobs/import_refdata_partners:11:in `<module:ODDB>'
jobs/import_refdata_partners:10:in `<main>'

The current state is in Attach:import_refdata_partners_diff.text

Analyse errors found for oddb.org

As I want to keep checking regularly for errors seen in oddb.org, I decided to add the script created yesterday (yesterday_errors) as bin/check_log_errors and to add an option to analyse different days. Done with the commit "Added bin/check_log_errors".
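The idea of analysing a given day can be sketched roughly as follows. This is not the content of bin/check_log_errors; the method errors_for_day, the log path and the assumption that every log line starts with an ISO date such as 2017-08-09 are all hypothetical.

```ruby
require 'date'

# Returns the lines of the given day that mention an error.
# Assumption: each log line starts with a date formatted as YYYY-MM-DD.
def errors_for_day(lines, day)
  prefix = day.strftime('%Y-%m-%d')
  lines.select { |line| line.start_with?(prefix) && line.match?(/error/i) }
end

# Usage sketch: errors seen N days ago (N = 1 is yesterday).
# days_ago = Integer(ARGV.fetch(0, '1'))
# puts errors_for_day(File.foreach('/path/to/oddb.log'), Date.today - days_ago)
```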

ywesee Developer-Wiki
This wiki is intended for all ywesee developers
