

20140604-oddb2xml-add-non-refdata-flat

Summary

  • add a flag non-refdata when running oddb2xml -e
  • fix problem "Out of memory" for import_daily

Commits

Index

Keep in Mind
  • Fix dojo error http://www.sitepen.com/blog/2012/10/31/debugging-dojo-common-error-messages/#forgot-dom-ready
  • On May 27 I removed the tests for fix_registrations, fix_sequences, fix_compositions and fix_packages from test/test_plugin/swissmedic.rb, as I could not find any references to them in the source code. Did I erroneously remove stuff when cleaning up the swissmedic import earlier?
  • The whole test for older/newer Packages must be adapted to xlsx. One must compare the rows (e.g. by creating csv files) and do the same stuff in xlsx!
  • One unit-test for searchbar fails and might be a clue why searching does not work correctly.
  • Added two skips in test/test_plugin/rss.rb. Why does the mocking not work there anymore?

---

fix problem "Out of memory" for import_daily

Suddenly we got errors like this:

Plugin: ODDB::TextInfoPlugin
Error: NoMemoryError
Message: failed to allocate memory
Backtrace:
/var/www/oddb.org/src/model/text.rb:395:in `block in wrap'

Observed that the import consumed over 13 GB of RAM before import_daily failed on oddb-ci2. The Aips*.xml file grew from 595 MB in November 2013 to 733 MB in May 2014, and using Nokogiri's XPath is very memory-consuming. Trying to switch to sax-machine. Made a local patch and restarted import_daily.
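To illustrate why a streaming parser helps here: it hands over one element at a time instead of building the whole document tree in memory. The following is only a minimal sketch, not the actual patch. It uses REXML's stream parser from the Ruby distribution instead of sax-machine so that it runs without extra gems, and the element names (medicalInformation, title), the TitleListener class and the stream_titles helper are assumptions made for the example.

```ruby
require 'rexml/parsers/streamparser'
require 'rexml/streamlistener'
require 'stringio'

# Hypothetical listener: collects the text of every <title> element
# while the parser streams through the input. No DOM tree is built.
class TitleListener
  include REXML::StreamListener
  attr_reader :titles

  def initialize
    @titles = []
    @in_title = false
    @buffer = +''
  end

  # Called for every opening tag
  def tag_start(name, _attrs)
    return unless name == 'title'
    @in_title = true
    @buffer = +''
  end

  # Called for every text node; only kept while inside <title>
  def text(data)
    @buffer << data if @in_title
  end

  # Called for every closing tag
  def tag_end(name)
    return unless name == 'title'
    @titles << @buffer.strip
    @in_title = false
  end
end

# Hypothetical helper: streams an IO (e.g. an opened Aips*.xml file)
def stream_titles(io)
  listener = TitleListener.new
  REXML::Parsers::StreamParser.new(io, listener).parse
  listener.titles
end

# Tiny stand-in for the real file (structure is assumed)
sample_xml = <<~XML
  <medicalInformations>
    <medicalInformation type="fi" lang="de"><title>Aspirin</title></medicalInformation>
    <medicalInformation type="fi" lang="fr"><title>Ponstan</title></medicalInformation>
  </medicalInformations>
XML

titles = stream_titles(StringIO.new(sample_xml))
puts titles.inspect
```

With a real file one would pass File.open(path) instead of the StringIO, so that only the current element is held in memory at any time.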

Called sudo gem install sax-machine --version 0.1.0 on oddb-ci2. Fixed some nil accesses in the new code. After 9 minutes memory seems to stay between 6700 and 7100 MB. After 50 minutes memory had gone up to about 9000 MB.

Adapt the unit test for swissmedic.rb to the new xlsx format

Will export the old xls files as csv and create a corresponding xlsx file. (test/data/xls/Packungen.xls and test/data/xls/Packungen.older.xls)

add a flag non-refdata when running oddb2xml -e

The idea for the implementation is to mark all items coming from refdata by adding a refdata flag in the extractors (BagXmlExtractor, SwissIndexExtractor, MigelExtractor, SwissmedicInfoExtractor). When emitting (builder.rb) we test this flag.

Added some unit tests and made the changes. Migel products required a separate lookup. Now running oddb2xml -e to check the result. Adapted oddb2xml.xsd. Bumped version to 1.8.5. The new element REF_DATA will always be created when emitting oddb_article.xml.
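The marking scheme described above can be sketched roughly as follows. This is not the code from the extractors or from builder.rb: the item hashes, the EAN keys and the helpers mark_refdata and ref_data_element are hypothetical stand-ins. Only the element name REF_DATA and its values 0 and 1 are taken from the output shown on this page.

```ruby
# Hypothetical stand-in for what an extractor would do:
# tag every item it returns as coming from refdata.
def mark_refdata(items)
  items.each_value { |item| item[:refdata] = true }
  items
end

# Hypothetical stand-in for the check while emitting:
# REF_DATA is always written, 1 for flagged items, 0 otherwise.
def ref_data_element(item)
  "<REF_DATA>#{item[:refdata] ? 1 : 0}</REF_DATA>"
end

# Made-up example data, keyed by EAN
refdata_items = mark_refdata({ '7680123450011' => { desc: 'from refdata' } })
other_items   = { '7611600441013' => { desc: 'from transfer.dat only' } }

all_items = refdata_items.merge(other_items)
all_items.each { |ean, item| puts "#{ean} #{ref_data_element(item)}" }
```

Items that never pass through one of the refdata extractors simply lack the flag, so the builder needs no separate list of non-refdata sources.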

Checking the results

2014-06-04 10:41:49: build_article. Done 159363 of 159363 articles
DE
        Pharma products: 15590
        NonPharma products: 28992
FR
        Pharma products: 15590
        NonPharma products: 28992
        Prices zur Rose: 136962
2014-06-04 10:41:54 +0200: 103 done. Took 2187 seconds
Added 52997 via pharmacodes of 136962 items when extracting the transfer.dat from "Zur Rose"
  found 775 lines with duplicated ean13
niklaus@ng-tr /o/s/oddb2xml> grep -c "REF_DATA>0" oddb_article.xml 
115512
niklaus@ng-tr /o/s/oddb2xml> grep -c "REF_DATA>1" oddb_article.xml 
44602

I have a discrepancy: 15590 Pharma and 28992 NonPharma products give only 44582 refdata products, but I emitted 44602 articles with REF_DATA 1. Why? How can I find the 20 items that cause this problem? Probably some brute-force debugging is needed.
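One possible way to do the brute-force search, as a sketch only: collect the EANs of all emitted articles with REF_DATA 1 and compare them against the EANs known from refdata, looking both for duplicates and for EANs that refdata does not know. The helper unexpected_eans and the short EAN lists are made up for the example; only the three counts are taken from the output above.

```ruby
# Counts taken from the run above
pharma     = 15_590
non_pharma = 28_992
emitted    = 44_602

expected = pharma + non_pharma # 44582
surplus  = emitted - expected  # the 20 items to explain
puts "expected #{expected}, emitted #{emitted}, surplus #{surplus}"

# Hypothetical helper: reports EANs emitted more than once and
# EANs emitted with REF_DATA 1 that are not in the refdata list.
def unexpected_eans(emitted_eans, refdata_eans)
  counts = emitted_eans.each_with_object(Hash.new(0)) { |ean, h| h[ean] += 1 }
  {
    duplicates: counts.select { |_ean, n| n > 1 }.keys,
    unknown: counts.keys - refdata_eans
  }
end

# Made-up data: '222' is emitted twice, '333' is not in refdata
result = unexpected_eans(%w[111 222 222 333], %w[111 222])
puts result.inspect
```

If both lists come back empty on the real data, the surplus must originate elsewhere, for example from items that are flagged but counted in neither the Pharma nor the NonPharma total.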

Pushed the commits "Bumped version to 1.8.5" and "Added refdata field".

Running rake test to ensure that everything is okay.

Page last modified on June 04, 2014, at 05:55 PM