view · edit · sidebar · attach · print · history

Index>

20140106-swissmedic-xlsx-format

Summary

  • Use new swissmedic-xslx format for oddb2xml

Commits

Index

---

Use new swissmedic-xslx format for oddb2xml

New URL is: https://www.swissmedic.ch/arzneimittel/00156/00221/00222/00230/index.html?lang=de where we find Packungen.xls and "erweiterte Präparateliste" File (for oddb.org)

Tasks are:

  1. first adapt oddb2xml to work with the new "Zugelassene Packungen.xlsx" file.
  2. adapt oddb.org to read "Zugelassene Packungen.xlsx" and "erweiterete Präparateliste.xlsx"

I manually downloaded the to files and called unzip ../Zugelassene_Packungen_011213.xlsx and unzip ../Praeparate_mit_WS_011213.xlsx in a a temporary folder.

Looked at the content of xl/worksheets/sheet1.xml and xl/sharedStrings.xml. Consulted

I looked for the row 5 of Präparate where I find a line like (each cell transformed into a line)

00274
1
Cardio-Pulmo-Rénal Sérocytol, suppositoire
Sérolab, société anonyme
Hauptzulassung
08.07.
J06AA
Blutprodukte
B
B
26.04.10
26.04.10
25.04.15
globulina equina (immunisé avec coeur, tissu pulmonaire, reins de porcins)
globulina equina (immunisé avec coeur, tissu pulmonaire, reins de porcins) 8 mg, propylenglycolum, conserv.: E 216, E 218, excipiens pro suppositorio.
Traitement immunomodulant  selon le Dr Thomas Possibilités d'emploi voir information professionnelle

Which in the sheet.xml looks like

    <row r="5" spans="1:17" ht="12.75" customHeight="1">
      <c r="A5" s="14">
        <v>274</v>
      </c>
      <c r="B5" s="6">
        <v>1</v>
      </c>
      <c r="C5" s="7" t="s">
        <v>1880</v>
      </c>
      <c r="D5" s="11" t="s">
        <v>12390</v>
      </c>
      <c r="E5" s="12" t="s">
        <v>20151</v>
      </c>
      <c r="F5" s="11" t="s">
        <v>20152</v>
      </c>
      <c r="G5" s="11" t="s">
        <v>12816</v>
      </c>
      <c r="H5" s="7" t="s">
        <v>14546</v>
      </c>
      <c r="I5" s="7" t="s">
        <v>14556</v>
      </c>
      <c r="J5" s="7" t="s">
        <v>14556</v>
      </c>
      <c r="K5" s="28">
        <v>40294</v>
      </c>
      <c r="L5" s="28">
        <v>40294</v>
      </c>
      <c r="M5" s="28">
        <v>42119</v>
      </c>
      <c r="N5" s="9" t="s">
        <v>20426</v>
      </c>
      <c r="O5" s="9" t="s">
        <v>22407</v>
      </c>
      <c r="P5" s="9" t="s">
        <v>17602</v>
      </c>
      <c r="Q5" s="9"/>
    </row>

The string Cardio-Pulmo-Rénal Sérocytol, suppositoire I can find at line 5644 (or element nr 1880) of the file xl/sharedStrings.xml. For the cell K5 I have the integer 40294 which corresponds in an XSL to the date of 26.04.10.

Therefore I think it should not be too difficult to create a relatively simple library to read this specific xslx file. However if we want to create and/or use a generice library much more work would be involved. I will look around if there is generic free SW (ruby) library for reading/writing xslx files around. Looking now at:

I will try rubyXL. Will create a new unit tests for reading an xslx file with the same content as the the old xls file. Okay I have now a failing can parse Zugelassene_Packungen.xls test.

After the unit-tests passed, running bin/oddb2xml did not yet pass, as rubyXL returned nil when a spreadsheet cell was empty and not '' as the spreadsheet-gem. This was quite time consuming to find.

I am getting stuck, as the changes in extractor.rb make 24 unit-tests fail.

Pushed commit Adapted to swissmedic xlsx files but the unit-tests still fail.

view · edit · sidebar · attach · print · history
Page last modified on January 06, 2014, at 05:30 PM