could not connect to www.swissreg.ch: #<Net::HTTPInternalServerError:0x007f8a7d69bb58>
---
Personal remarks: since yesterday I had to reboot my cablecom modem four times because my internet connection kept getting lost. Reported it by phone today (waited almost 10 minutes to reach a real person).
The following errors must be fixed:
Looking into why we have duplicates, I found the following for ZYVOXID Filmtabl in data/download:
data/download/oddb2xml_zurrose_transfer.dat:1122465312ZYVOXID Filmtabl 600 mg 10 Stk 096114108275100A080190076805555800542
data/download/swissindex_Pharma_DE.xml-  <ITEM DT="2013-06-22T00:00:00">
data/download/swissindex_Pharma_DE.xml-    <GTIN>7680555580054</GTIN>
data/download/swissindex_Pharma_DE.xml-    <PHAR>2465312</PHAR>
data/download/swissindex_Pharma_DE.xml-    <STATUS>A</STATUS>
data/download/swissindex_Pharma_DE.xml-    <STDATE>2002-01-16T00:00:00</STDATE>
data/download/swissindex_Pharma_DE.xml-    <LANG>DE</LANG>
data/download/swissindex_Pharma_DE.xml:    <DSCR>ZYVOXID Filmtabl 600 mg</DSCR>
data/download/swissindex_Pharma_DE.xml-    <ADDSCR>10 Stk</ADDSCR>
data/download/swissindex_Pharma_DE.xml-    <ATC>J01XX08</ATC>
data/download/swissindex_Pharma_DE.xml-    <COMP>
data/download/swissindex_Pharma_DE.xml-      <NAME>Pfizer AG</NAME>
data/download/swissindex_Pharma_DE.xml-      <GLN>7601001010604</GLN>
data/download/swissindex_Pharma_DE.xml-    </COMP>
data/download/swissindex_Pharma_DE.xml-  </ITEM>
Therefore I am asking myself why we get a duplicate with STATUS I.
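A duplicate like this can be hunted down mechanically. A minimal sketch (the helper name find_duplicate_gtins is illustrative, not oddb2xml's actual code) that groups swissindex ITEM entries by GTIN and reports any GTIN that occurs more than once, together with the STATUS values seen:

```ruby
require 'rexml/document'

# Collect the STATUS of every ITEM per GTIN and keep only GTINs
# that appear in more than one ITEM (e.g. once with A, once with I).
def find_duplicate_gtins(xml_string)
  doc = REXML::Document.new(xml_string)
  by_gtin = Hash.new { |h, k| h[k] = [] }
  doc.elements.each('//ITEM') do |item|
    gtin   = item.elements['GTIN']&.text
    status = item.elements['STATUS']&.text
    by_gtin[gtin] << status if gtin
  end
  by_gtin.select { |_gtin, statuses| statuses.size > 1 }
end
```

Running this over data/download/swissindex_Pharma_DE.xml should show whether 7680555580054 really appears twice with different STATUS values.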
Added new tasks clean and test to ease testing, with the commits "Added clean target to rake" and "Added test target to rake".
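For reference, a sketch of what such Rakefile targets could look like (the task bodies here are assumptions, not the actual oddb2xml Rakefile; the extend line is only needed when the file is run as a plain script instead of via rake):

```ruby
require 'rake'
require 'fileutils'

extend Rake::DSL  # makes task/desc available outside a Rakefile context

desc 'Remove generated output files (assumed glob patterns)'
task :clean do
  FileUtils.rm_f(Dir.glob('oddb_*.xml') + Dir.glob('oddb_*.dat'))
end

desc 'Run the spec suite'
task :test do
  sh 'bundle exec rspec spec'
end
```

With these in place, `rake clean test` gives a reproducible cycle before each commit.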
The reason is that I checked for entries in @index instead of searching in @articles. But I think prepare_articles will be the most time-consuming operation. Adding an Oddb2xml.log function to log important options.
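A minimal sketch of such a log helper (the file name and format are assumptions; the real Oddb2xml.log in the gem may differ):

```ruby
module Oddb2xml
  LOG_FILE = 'oddb2xml.log'  # assumed location

  # Append a timestamped message to the log file and return the line,
  # so callers (and tests) can inspect what was written.
  def self.log(msg)
    line = "#{Time.now.strftime('%Y-%m-%d %H:%M:%S')}: #{msg}"
    File.open(LOG_FILE, 'a') { |f| f.puts(line) }
    line
  end
end
```

Logging the important options (e.g. -a nonpharma, -e) at startup makes long runs much easier to diagnose after the fact.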
Added a spec test to check for duplicate entries of ZYVOXID. Made the spec pass. Running a full import now and waiting for completion.
Reworked prepare_articles and made it a lot faster. It is no longer the bottleneck; now the bottleneck is build_article.
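Bottlenecks like this are easy to pin down with the stdlib Benchmark module. A sketch with stand-in method bodies (the names mirror the ones mentioned above, the workloads are fake):

```ruby
require 'benchmark'

def prepare_articles
  10_000.times.map { |i| { pharmacode: i } }   # stand-in workload
end

def build_article(art)
  "<ART>#{art[:pharmacode]}</ART>"             # stand-in workload
end

articles  = nil
t_prepare = Benchmark.realtime { articles = prepare_articles }
t_build   = Benchmark.realtime { articles.each { |a| build_article(a) } }
puts format('prepare_articles: %.3fs  build_article: %.3fs', t_prepare, t_build)
```

Wrapping the real phases this way shows immediately which one dominates the run time.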
oddb2xml -a nonpharma
we need about 62 seconds to emit the first 10'000 (of 43689) articles.
oddb2xml -e
we need about 15 minutes to emit the first 10'000 (of 161'200) articles. Observing the logging we see that it often gets stuck. Maybe we do not have enough memory, or we have GC (garbage collection) problems. The process is using 3071M virtual and 2424M resident memory. The estimated time of arrival is therefore roughly 161200/10000 × 15 ≈ 242 minutes, or about 4 hours.
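Recomputing the extrapolation (assuming throughput stays constant at 15 minutes per 10'000 articles):

```ruby
# Back-of-the-envelope ETA: extrapolate the whole run from the first chunk.
total_articles    = 161_200
chunk_size        = 10_000
minutes_per_chunk = 15.0

eta_minutes = total_articles / chunk_size.to_f * minutes_per_chunk
puts format('estimated total: %.1f minutes (~%.1f hours)', eta_minutes, eta_minutes / 60)
# roughly 242 minutes, i.e. about 4 hours
```

In practice the ETA is a lower bound: if the slowdowns are caused by memory pressure or GC, later chunks will likely be slower than the first one.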
Pushed the commits "Added logging, accelerated prepare_articles, fixed recognition of duplicates" and "Bumped version to 1.8.0".
Limitations and substances are still empty when running oddb2xml -e. Fixed with the commits "Fix problem with empty substances and limitation when running -e" and "More logging and exit 2 unless substances and limitations okay for extended".
The last two commits also brought down the time for running oddb2xml -e to about 35 minutes.