20140507-oddb2xml-with-FI-2

Summary

  • Fix oddb2xml under Windows
  • Pretty FI in oddb2xml

Commits

Index

Keep in Mind

---

Searching by indication (Anwendung), e.g. for Konjunktivitis, does not report all occurrences in the "Anwendung" section of the Fachinfo.

Probably the index is corrupted or not set up correctly.

We will compare this with the output of Amiko for Android, using Kopfschmerzen and Konjunktivitis. There, Kopfschmerzen returns 95 drugs and Konjunktivitis 72. Via the web I get 238 for Kopfschmerzen (bin/admin reports 239) and 18 for Konjunktivitis. I could reproduce these results via bin/admin:

ch.oddb> @res2 = search_exact_indication('Konjunktivitis')
-> #<ODDB::SearchResult:0x007f89e87f57f0>
ch.oddb> @res2.package_count
-> 18
ch.oddb> @res = search_exact_indication('Kopfschmerzen')
-> #<ODDB::SearchResult:0x007f89b5b83eb8>
ch.oddb> @res.package_count
-> 239

For Konjunktivitis, Amiko offers Amavita Cetirizin (IKSNR 61240), which does not show up in our search at http://oddb-ci2.dyndns.org/de/gcc/search/zone/drugs/search_query/Konjunktivitis/search_type/st_indication#best_result. We also see /var/www/oddb.org/src/util/oddbapp.rb:1139:in `search_sequences': sequence_index Kopfschmerzen

Checking whether rebuilding the indices helps:

sudo -u apache jobs/rebuild_indices sequence_index_substance sequence_index sequence_index_exact sequence_index_atc

Activated debug option for showing index searches (src/util/oddbapp.rb:105) on oddb-ci2. Installed sqlitebrowser to search amiko.db.

Looking in detail at registration 61240

ch.oddb> registration('61240').fachinfo.search_text('de').index('Konjunktivitis')
-> 261
ch.oddb> registration('61240').fachinfo.search_text('de')[250..300]
-> lergischer Konjunktivitis Chronischer idiopathische
ch.oddb> 
ch.oddb> registration('61240').sequences.first[1].search_terms
-> ["Amavita", "Cetirizin", "Filmtabletten", "Amavita Cetirizin Filmtabletten"]
ch.oddb> registration('61240').sequences.first[1].indication
-> Antiallergikum

Running jobs/update_textinfo_swissmedicinfo --no-download --target=both --reparse 49232 61240 to see whether we get more or less hits after importing the fachinfo again. Still 18 hits.

For Amavita Cetirizin we find the following indication in aips2sqlite: Erwachsenen;Kindern;Allergischer;Rhinitis;saisonal;Heuschnupfen;Pollinosis;Behandlungsdauer;Kindern;saisonaler;Rhinitis;beträgt;maximal;Wochen;perennial;Allergischer;Konjunktivitis;Chronischer;idiopathischer;Urtikaria; and in bin/admin:

-> IndikationenAnwendungsmoeglichkeiten Bei Erwachsenen und Kindern ab 6 Jahren zur Behandlung von Allergischer Rhinitis saisonal Heuschnupfen Pollinosis die Behandlungsdauer bei Kindern mit saisonaler R
ch.oddb> registration('61240').fachinfo.search_text(:de)[200..399]
-> hinitis betraegt maximal 4 Wochen und perennial Allergischer Konjunktivitis Chronischer idiopathischer Urtikaria

In the Fachinfo I found

Indikationen/Anwendungsmöglichkeiten

Bei Erwachsenen und Kindern ab 6 Jahren zur Behandlung von:
Allergischer Rhinitis, saisonal (Heuschnupfen, Pollinosis; die Behandlungsdauer bei Kindern mit saisonaler Rhinitis beträgt maximal 4 Wochen) und perennial.
Allergischer Konjunktivitis.
Chronischer idiopathischer Urtikaria.

The search by indication is complicated. It matches the key against the search text of each indication and collects the ATC codes of the matching indications.

  def search_by_indication(key, lang=:de)
    pattern = key.gsub(/[^A-z0-9]/, '.')
    atcs = []
    indications.map do |indication|
      if indication.search_text.match(/#{key}/i)
        atcs.concat indication.atc_classes
      end
    end
    atcs.uniq
  end
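The lookup above can be sketched in isolation. This is only a minimal illustration, not the oddb.org implementation: `Indication` is a hypothetical Struct that models just the two methods the quoted code calls, the sample texts and ATC codes are made up, and the list of indications is passed in explicitly. Note that the quoted method builds `pattern` but then matches against the raw `key`; the sketch escapes the key instead.

```ruby
# Hypothetical stand-in for the indication model; only search_text and
# atc_classes are modelled, because those are the calls in the quoted code.
Indication = Struct.new(:search_text, :atc_classes)

# Collect the ATC codes of all indications whose text contains the key.
# The key is escaped so that characters like '.' or '(' match literally.
def atc_classes_for_indication(indications, key)
  pattern = Regexp.new(Regexp.escape(key), Regexp::IGNORECASE)
  indications
    .select { |indication| indication.search_text.match?(pattern) }
    .flat_map(&:atc_classes)
    .uniq
end

# Made-up sample data, for illustration only.
def sample_indications
  [
    Indication.new('Antiallergikum', ['R06AE07']),
    Indication.new('Allergische Konjunktivitis', ['S01GX']),
    Indication.new('Bakterielle Konjunktivitis', ['S01AA', 'S01GX'])
  ]
end

p atc_classes_for_indication(sample_indications, 'konjunktivitis')
```

With this sample data, an indication record that reads only "Antiallergikum" contributes nothing to a search for Konjunktivitis, even if the Fachinfo text of the same registration mentions it. Whether this explains the missing hit for 61240 still has to be confirmed.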

Fix oddb2xml under Windows

Yesterday I forgot to check my build under Windows and there we still have the following bug.

bash>oddb2xml -e
Added 53224 via pharmacodes of 137236 items when extracting the transfer.dat from "Zur Rose"
  found 775 lines with duplicated ean13
C:/Ruby200/lib/ruby/gems/2.0.0/gems/oddb2xml-1.8.2/lib/oddb2xml/downloader.rb:99:in `basename': no implicit conversion of Regexp into String (TypeError)
       from C:/Ruby200/lib/ruby/gems/2.0.0/gems/oddb2xml-1.8.2/lib/oddb2xml/downloader.rb:99:in `block (2 levels) in read_xml_from_zip'
       from C:/Ruby200/lib/ruby/gems/2.0.0/gems/rubyzip-1.1.3/lib/zip/entry_set.rb:42:in `call'
       from C:/Ruby200/lib/ruby/gems/2.0.0/gems/rubyzip-1.1.3/lib/zip/entry_set.rb:42:in `block in each'
       from C:/Ruby200/lib/ruby/gems/2.0.0/gems/rubyzip-1.1.3/lib/zip/entry_set.rb:41:in `each'
       from C:/Ruby200/lib/ruby/gems/2.0.0/gems/rubyzip-1.1.3/lib/zip/entry_set.rb:41:in `each'
       from C:/Ruby200/lib/ruby/gems/2.0.0/gems/rubyzip-1.1.3/lib/zip/central_directory.rb:182:in `each'
       from C:/Ruby200/lib/ruby/gems/2.0.0/gems/oddb2xml-1.8.2/lib/oddb2xml/downloader.rb:89:in `block in read_xml_from_zip'
       from C:/Ruby200/lib/ruby/gems/2.0.0/gems/rubyzip-1.1.3/lib/zip/file.rb:99:in `open'
       from C:/Ruby200/lib/ruby/gems/2.0.0/gems/oddb2xml-1.8.2/lib/oddb2xml/downloader.rb:88:in `read_xml_from_zip'
       from C:/Ruby200/lib/ruby/gems/2.0.0/gems/oddb2xml-1.8.2/lib/oddb2xml/downloader.rb:201:in `download'
       from C:/Ruby200/lib/ruby/gems/2.0.0/gems/oddb2xml-1.8.2/lib/oddb2xml/cli.rb:238:in `block in download'

The reason was that a different loop is used under Windows to extract a file from the zip. Fixed with commit "Fix build under windows".
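The TypeError itself is easy to reproduce: `File.basename` accepts only a String as its optional second argument, so handing it a Regexp fails exactly as in the trace. The helper below is a hypothetical sketch, not the code from the commit; `entry_matches?` and its arguments are names made up for illustration. It shows one way to compare a zip entry name against a target that may be either a String or a Regexp.

```ruby
# Hypothetical helper (not taken from oddb2xml): does the name of a zip
# entry match the target, which may be a String or a Regexp?
def entry_matches?(entry_name, target)
  base = File.basename(entry_name)
  case target
  when Regexp then base.match?(target)
  when String then base == File.basename(target)
  else raise ArgumentError, "unsupported target: #{target.inspect}"
  end
end

# Reproduce the error from the stack trace: a Regexp is not a valid suffix.
begin
  File.basename('transfer.dat', /\.dat\z/)
rescue TypeError => e
  puts "TypeError: #{e.message}"
end

puts entry_matches?('download/transfer.dat', /\.dat\z/)
puts entry_matches?('download/transfer.dat', 'transfer.dat')
```

Dispatching on the class of the target keeps a single code path for all platforms, which would avoid having a separate extraction loop that is only exercised under Windows.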

Pretty FI in oddb2xml

The following classes must be extracted from oddb.org

  • src/model/text
  • src/model/fachinfo
  • ext/fiparse/textinfo_hpricot
  • ext/fiparse/fachinfo_hpricot
  • ext/fiparse/fiparse
  • ext/fiparse/writer

In a first step we will create a subdir fi in lib/oddb2xml (and spec/) to ease a later extraction to a separate gem.

We will add an option --extract-fi [iksrn1,iksrn2,..] to extract all fachinfo.
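A minimal sketch of how such an option with an optional, comma-separated list could be parsed. It uses Ruby's standard OptionParser, which is not necessarily how oddb2xml handles its options; the method name `parse_extract_fi`, the option keys, and the reading that an empty list means "all Fachinfo" are assumptions.

```ruby
require 'optparse'

# Hypothetical parser for the planned option. An omitted list is taken to
# mean "extract all Fachinfo"; this interpretation is an assumption.
def parse_extract_fi(argv)
  options = { extract_fi: false, iksnrs: [] }
  parser = OptionParser.new do |opts|
    opts.on('--extract-fi [IKSNRS]', Array,
            'Extract Fachinfo, optionally only for the given registrations') do |list|
      options[:extract_fi] = true
      options[:iksnrs] = list || []
    end
  end
  parser.parse(argv) # non-destructive: argv itself is left untouched
  options
end

p parse_extract_fi(['--extract-fi'])
p parse_extract_fi(['--extract-fi', '45928,62184,58267'])
```

Keeping the registration numbers as strings preserves any leading zeros.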

We will use some of the more difficult FIs to verify our work. They must contain

  1. tables
  2. images
  3. illegal image content
  4. spaces
  5. bold
  6. italic
  7. special characters

Candidates are 45928 (Künzle), 62184 (Cipralex® Filmtabletten) and 58267 (Isentress).

Created a new branch fi to work in. Updating spec/data/swissmedic_info.zip to include an AipsDownload_20140507.xml with exactly the 3 mentioned articles in German, French and Italian. Creating a helper script for later use, too. This script takes many minutes to complete. Therefore, as a first test, I added only the German version of 51704 (Erbiumcitrat CIS bio international), which is quite complicated.

Question: Should I generate, for the different chapters, elements like those from aips2sqlite? (Advantage: one could easily compare the two outputs.)

<paragraph>
    <paragraphtitle>
          Packungen
    </paragraphtitle>
    <p>OCTREOSCAN, Kit, a.H. [A]</p>
</paragraph>

or a more content-oriented one

<packages id="section18">
  <title>Packungen</title>
  <html><p>OCTREOSCAN, Kit, a.H. [A]</p></html>
</packages>

Also I think the swissmedic-XML is just brain dead, e.g. section18 only defines the title but not the content!

      <section id="section18">
        <title>Packungen</title>
      </section>

or a mixture of both, e.g.

      <section id="section18">
        <title>Packungen</title>
        <html><p>OCTREOSCAN, Kit, a.H. [A]</p></html>
      </section>
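To make the mixed layout concrete, here is a rough sketch that builds such a section with REXML from Ruby's standard distribution. It is only an illustration for the discussion: `build_section` is a hypothetical name, the paragraphs are passed in as plain strings, and nothing here is taken from oddb2xml or aips2sqlite.

```ruby
require 'rexml/document'

# Hypothetical builder for the mixed layout: a <section> that carries both
# the title and the content as <p> elements inside <html>.
def build_section(id, title, paragraphs)
  section = REXML::Element.new('section')
  section.add_attribute('id', id)
  section.add_element('title').text = title
  html = section.add_element('html')
  paragraphs.each { |text| html.add_element('p').text = text }
  section
end

section = build_section('section18', 'Packungen', ['OCTREOSCAN, Kit, a.H. [A]'])
puts section.to_s
```

Building the elements through an XML library rather than by string concatenation ensures that every opened tag is closed and that special characters in the Fachinfo text are escaped.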

To be discussed with Zeno.

  • Other question

Shouldn't we skip all work that is not needed when only the creation of the Fachinfo is requested?

Attacking other tasks. Saved branch fi with new commits

Page last modified on May 07, 2014, at 05:44 PM