20140423-correct-oddb2xml-2

Summary

  • oddb2xml creates 20'000 (almost duplicate) entries.

Commits

Index

Keep in Mind
  • Fix dojo error http://www.sitepen.com/blog/2012/10/31/debugging-dojo-common-error-messages/#forgot-dom-ready
  • Search by indication (Anwendung), e.g. Konjunktivitis (conjunctivitis), does not report all occurrences in the "Anwendung" section of the Fachinfo. The index is probably corrupted or not set up correctly.
  • Error: Patents with could not connect to www.swissreg.ch: #<Net::HTTPInternalServerError:0x007f8a7d69bb58>
  • vagrant-oddb.org: cleanup installation for yus with ruby 1.8, logrotate.conf and local vhosts for tests

---

Personal remarks: since yesterday I have had to reboot my Cablecom modem four times because my internet connection was lost. Reported it by phone today (waited almost 10 minutes to reach a real person).

oddb2xml creates 20'000 (almost duplicate) entries

The following errors must be fixed:

  1. oddb_substances.xml and oddb_interactions.xml are empty when running oddb2xml -e
  2. oddb2xml -e creates 20'000 duplicates
  3. oddb2xml -e takes very long (about an hour or more)

Looking into why we have duplicates, I found the following for Zyvoxid Filmtabl in data/download:

data/download/oddb2xml_zurrose_transfer.dat:1122465312ZYVOXID Filmtabl 600 mg 10 Stk                    096114108275100A080190076805555800542
data/download/swissindex_Pharma_DE.xml-      <ITEM DT="2013-06-22T00:00:00">
data/download/swissindex_Pharma_DE.xml-        <GTIN>7680555580054</GTIN>
data/download/swissindex_Pharma_DE.xml-        <PHAR>2465312</PHAR>
data/download/swissindex_Pharma_DE.xml-        <STATUS>A</STATUS>
data/download/swissindex_Pharma_DE.xml-        <STDATE>2002-01-16T00:00:00</STDATE>
data/download/swissindex_Pharma_DE.xml-        <LANG>DE</LANG>
data/download/swissindex_Pharma_DE.xml:        <DSCR>ZYVOXID Filmtabl 600 mg</DSCR>
data/download/swissindex_Pharma_DE.xml-        <ADDSCR>10 Stk</ADDSCR>
data/download/swissindex_Pharma_DE.xml-        <ATC>J01XX08</ATC>
data/download/swissindex_Pharma_DE.xml-        <COMP>
data/download/swissindex_Pharma_DE.xml-          <NAME>Pfizer AG</NAME>
data/download/swissindex_Pharma_DE.xml-          <GLN>7601001010604</GLN>
data/download/swissindex_Pharma_DE.xml-        </COMP>
data/download/swissindex_Pharma_DE.xml-      </ITEM>
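
To cross-check the two sources, the pharmacode and GTIN can be pulled out of the transfer.dat record and compared with the PHAR and GTIN elements of swissindex. The following is only a sketch: parse_transfer_line is a hypothetical helper, and the offsets (pharmacode in characters 3 to 9, GTIN as the 13 digits before the final digit) are inferred from this single sample line, not from a format specification.

```ruby
# Hypothetical helper, not part of oddb2xml.
# Assumption: the pharmacode occupies characters 3..9 and the GTIN
# (starting with 7680) is followed by exactly one trailing digit.
def parse_transfer_line(line)
  line = line.chomp
  {
    pharmacode: line[3, 7],
    gtin: line[/(7680\d{9})\d\z/, 1]
  }
end

sample = "1122465312ZYVOXID Filmtabl 600 mg 10 Stk                    " \
         "096114108275100A080190076805555800542"
record = parse_transfer_line(sample)
puts "pharmacode=#{record[:pharmacode]} gtin=#{record[:gtin]}"
```

For the sample line both values agree with the swissindex entry (PHAR 2465312, GTIN 7680555580054), so the two files describe the same article.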

So I am asking myself: why do we get a duplicate with STATUS I?

Added new rake tasks clean and test to ease testing, with commits "Added clean target to rake" and "Added test target to rake".

The reason is that I checked for entries in @index instead of searching in @articles. But I think prepare_articles will be the most time-consuming operation. Adding an Oddb2xml.log function to log important options.
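
The idea of the fix can be sketched as follows. This is not the actual oddb2xml code: merge_articles is a hypothetical helper, the articles are assumed to be hashes with a :pharmacode key, and keeping the first entry seen is my assumption. Looking up already collected articles in a Hash instead of scanning an array also keeps the check cheap for 160'000 entries.

```ruby
# Hypothetical sketch, not the oddb2xml implementation.
# Assumption: every article is a Hash carrying a :pharmacode key.
def merge_articles(articles, candidates)
  # Build the lookup from the articles already collected,
  # not from a separate index that may be out of sync.
  by_pharmacode = {}
  articles.each { |article| by_pharmacode[article[:pharmacode]] ||= article }

  candidates.each do |candidate|
    key = candidate[:pharmacode]
    next if by_pharmacode.key?(key) # already present: skip the duplicate
    by_pharmacode[key] = candidate
  end
  by_pharmacode.values
end

known = [{ pharmacode: "2465312", status: "A" }]
extra = [{ pharmacode: "2465312", status: "I" }, { pharmacode: "0000001", status: "A" }]
puts merge_articles(known, extra).size
```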

Added a spec test to check for duplicate entries of ZYVOXID. Made the spec pass. Running the full import now and waiting for completion.
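
The kind of check such a spec performs can be sketched in plain Ruby. duplicate_gtins is a hypothetical helper, and the GTIN element name is borrowed from the swissindex excerpt above; the element names in the generated oddb2xml output may differ.

```ruby
# Hypothetical helper: returns every GTIN that occurs more than once.
# Assumption: GTINs appear as <GTIN>digits</GTIN> in the given XML text.
def duplicate_gtins(xml)
  gtins = xml.scan(%r{<GTIN>(\d+)</GTIN>}).flatten
  gtins.group_by { |gtin| gtin }
       .select { |_gtin, hits| hits.size > 1 }
       .keys
end

xml = <<~XML
  <ITEM><GTIN>7680555580054</GTIN><STATUS>A</STATUS></ITEM>
  <ITEM><GTIN>7680555580054</GTIN><STATUS>I</STATUS></ITEM>
XML
puts duplicate_gtins(xml).inspect
```

A spec would then simply expect this list to be empty for the generated file.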

Reworked prepare_articles and made it a lot faster. It is no longer the bottleneck. Now the bottleneck is build_article.

  • When calling oddb2xml -a nonpharma we need about 62 seconds to emit the first 10'000 (of 43689) articles.
  • When calling oddb2xml -e we need about 15 minutes to emit the first 10'000 (of 161200) articles. Observing the logging we see that it often gets stuck. Maybe we do not have enough memory or have garbage collection (GC) problems. The process is using 3071M Virt and 2424M Res memory. The estimated time of arrival is therefore 15 x 15 = 225 minutes, or just under 4 hours.
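
The estimate above is a simple linear extrapolation from the first 10'000 articles. A minimal sketch of the arithmetic (remaining_time is a hypothetical helper, and it assumes the emit rate stays constant, which the stalls seen in the log make doubtful):

```ruby
# Hypothetical helper: linear extrapolation of the remaining run time.
# The result is in the same unit as `elapsed`.
def remaining_time(total, done, elapsed)
  rate = done.to_f / elapsed # articles per time unit so far
  (total - done) / rate
end

# oddb2xml -e: 10'000 of 161200 articles after 15 minutes
puts remaining_time(161_200, 10_000, 15).round(1)
# oddb2xml -a nonpharma: 10'000 of 43689 articles after 62 seconds
puts remaining_time(43_689, 10_000, 62).round(1)
```

For the extended run this gives roughly 227 minutes still to go, in line with the 15 x 15 estimate.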

Pushed commits "Added logging, accelerated prepare_articles, fixed recognition of duplicates" and "Bumped version to 1.8.0".

Limitations and substances are still empty when running oddb2xml -e. Fixed with commits "Fix problem with empty substances and limitation when running -e" and "More logging and exit 2 unless substances and limitations okay for extended".
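
A guard of this kind could look roughly like the sketch below. It is not the code from the commit: empty_outputs and extended_exit_status are hypothetical names, the file names are passed in by the caller, and "empty" is taken to mean missing or zero bytes, whereas a real check would probably count the entries inside the XML.

```ruby
# Hypothetical helper: names of output files that are missing or empty.
# Assumption: "empty" means the file does not exist or has zero bytes.
def empty_outputs(dir, names)
  names.reject do |name|
    path = File.join(dir, name)
    File.file?(path) && File.size(path) > 0
  end
end

# Hypothetical wrapper: 0 when every file has content, 2 otherwise.
def extended_exit_status(dir, names)
  missing = empty_outputs(dir, names)
  missing.each { |name| warn "#{name} is missing or empty" }
  missing.empty? ? 0 : 2
end

# Usage at the end of an extended run (the directory is an assumption):
#   exit extended_exit_status(".", %w[oddb_substances.xml oddb_interactions.xml])
```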

The last two commits also brought down the time for running oddb2xml -e to about 35 minutes.

Page last modified on April 24, 2014, at 09:43 AM