20140610-sax-parser-for-fi-pi-import

Summary

  • Use a SAX parser to fix the "out of memory" problem in import_daily

Commits

Index

Keep in Mind
  • Fix the Dojo error http://www.sitepen.com/blog/2012/10/31/debugging-dojo-common-error-messages/#forgot-dom-ready
  • On May 27 I removed the tests for fix_registrations, fix_sequences, fix_compositions and fix_packages from test/test_plugin/swissmedic.rb, as I could not find any references to them in the source code. Did I erroneously remove stuff when cleaning up the swissmedic import earlier?
  • The whole test for older/newer Packages must be adapted to xlsx. One must compare the rows (e.g. by creating CSV files) and do the same in xlsx.
  • One unit test for the searchbar fails and might be a clue as to why searching does not work correctly.
  • Added two skips in test/test_plugin/rss.rb. Why does the mocking not work there anymore?

---

Porting ODBA to ruby 2.1.x

In both Ruby 1.9.3 and Ruby 2.1.2 we have one failing unit test (test/test_stub.rb:156: loading from yaml should return a Stub) which passes in Ruby 1.8.7. See https://travis-ci.org/ngiger/odba/builds/24273672.
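
For reference, a minimal self-contained sketch of the pattern that test exercises (Stub below is a stand-in for ODBA::Stub, not the real class). Ruby 1.9.3+ switched the default YAML engine from Syck to Psych, which is a plausible source of the behaviour change:

  require 'yaml'
  require 'test/unit'

  class TestStubYamlSketch < Test::Unit::TestCase
    class Stub # stand-in for ODBA::Stub
      attr_accessor :odba_id
    end
    def test_loading_from_yaml_returns_a_stub
      stub = Stub.new
      stub.odba_id = 1
      # dump and reload; Psych tags this as !ruby/object:...::Stub
      loaded = YAML.load(YAML.dump(stub))
      assert_instance_of(Stub, loaded)
    end
  end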

Porting sbsm to ruby 2.1.x

sbsm has 9 unit tests which are skipped under Ruby 1.9 (see https://travis-ci.org/ngiger/sbsm/builds/26042086) but which passed fine with 1.8.7.

Running with 2.1.2 we get a failure not seen in 1.9.3:

Failure:
test_event_url(TestLookandfeel)
/opt/src/sbsm/test/test_lookandfeel.rb:168:in `test_event_url'
     165:       end
     166:       def test_event_url
     167:               # state_id is 4, because @session.state = nil
  => 168:               assert_equal("http://test.com/de/gcc/foo/state_id/4/bar/baz", 
     169:                       @lookandfeel.event_url(:foo, {:bar => 'baz'}))
     170:       end
     171:       def test_event_url__crawler
<"http://test.com/de/gcc/foo/state_id/4/bar/baz"> expected but was
<"http://test.com/de/gcc/foo/state_id/8/bar/baz">

Use a SAX parser to fix the "out of memory" problem in import_daily

A first try at using the SAX parser did not work, because some parsing is done only in a second phase. I must rethink the algorithm.

import_swissmedicinfo_by_index calls import_info directly and a second time via import_swissmedicinfo_by_iksnrs(@new_iksnrs).

The problem is that when calling textinfo_swissmedicinfo_index we get the FI/PI HTML directly from http://www.swissmedicinfo.ch/ by following the links after having selected "Neue Texte" (new texts) or "Geänderte Texte" (changed texts).

The other usage is the downloaded AipsDownload_latest.xml file, which contains the (same?) HTML content for each FI/PI.
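
A minimal sketch of streaming that file with Nokogiri's SAX parser instead of building the whole DOM. The element names (medicalInformation, title), the attribute layout and the CDATA content block are my assumptions about the AIPS format; the handler is an illustration, not the plugin's code:

  require 'nokogiri'

  # Sketch only: stream AipsDownload_latest.xml and yield one FI/PI entry
  # at a time, so memory stays flat instead of holding the whole DOM.
  class AipsHandler < Nokogiri::XML::SAX::Document
    def initialize(&block)
      @on_entry = block
    end
    def start_element(name, attrs = [])
      @current = name
      @entry = Hash[attrs] if name == 'medicalInformation' # e.g. 'type' => 'fi'
    end
    def characters(text)
      (@entry[@current] ||= '') << text if @entry && @current == 'title'
    end
    def cdata_block(text) # the FI/PI HTML is assumed to arrive as CDATA
      (@entry['content'] ||= '') << text if @entry
    end
    def end_element(name)
      return unless name == 'medicalInformation' && @entry
      @on_entry.call(@entry) # hand over and forget: nothing accumulates
      @entry = nil
    end
  end

  handler = AipsHandler.new { |entry| puts "#{entry['type']}: #{entry['title']}" }
  File.open('AipsDownload_latest.xml') do |io|
    Nokogiri::XML::SAX::Parser.new(handler).parse(io)
  end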

The (abbreviated) call chain when running import_daily is:

/var/www/oddb.org/src/util/updater.rb:308:in `update_textinfo_swissmedicinfo'
/var/www/oddb.org/src/plugin/text_info.rb:1340:in `import_swissmedicinfo_by_index'
/var/www/oddb.org/src/plugin/text_info.rb:1341:in `block in import_swissmedicinfo_by_index'
/var/www/oddb.org/src/plugin/text_info.rb:1395:in `import_swissmedicinfo'
/var/www/oddb.org/src/plugin/text_info.rb:1133:in `import_info'
/var/www/oddb.org/src/plugin/text_info.rb:107:in `parse_fachinfo'

The SAX parser is already used when calling get_pis_and_fis via swissmedicinfo_xml, which is called at the beginning of both import_swissmedicinfo_by_iksnrs and import_swissmedicinfo_by_index. We should avoid calling it twice in the same run.
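
One way to avoid the double run would be simple memoization; a sketch only, where parse_aips_with_sax and download_path are hypothetical names, not existing code:

  # Sketch: cache the result of the expensive parse so that
  # import_swissmedicinfo_by_iksnrs and import_swissmedicinfo_by_index
  # share one parse per run. All names here are assumptions.
  def swissmedicinfo_xml
    @swissmedicinfo_xml ||= parse_aips_with_sax(download_path)
  end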

To test, we run the following jobs (in this order, to speed things up). But first we delete a fachinfo via bin/admin:

  reg = @system.registration('49232'); delete reg.fachinfo.pointer; update reg.pointer, :fachinfo => nil

  • jobs/update_textinfo_swissmedicinfo --no-download --target=both --reparse 49232 (has 5 entries)
  • jobs/update_textinfo_swissmedicinfo --no-download --target=both --reparse CBI (has 20 entries)
  • jobs/update_textinfo_swissmedicinfo # should call import_swissmedicinfo_by_index; we have no unit test for it!!!
  • jobs/import_daily

The first import of 49232 seems to work. But http://oddb-ci2.dyndns.org/de/gcc/search/zone/drugs/search_query/49232/search_type/st_registration#best_result now displays the French description and galenic form. (This is also the case on ch.oddb.org; probably the first imported language simply wins.)

Running the CBI import has been hanging for 25 minutes after emitting:

  2014-06-10 14:05:29 +0200: parse_and_update: calls parse_fachinfo dist /var/www/oddb.org/data/html/fachinfo/fr/Elumatic III, g_n_rateur de techn_tium-99m_swissmedicinfo.html name Elumatic III, générateur de technétium-99m fr title Elumatic III, générateur de technétium-99m

Killing it and trying import_daily instead. After 35 minutes import_daily uses about 5 GB. After 2.5 hours it went up to 5.3 GB. At 16:18 it went up to 5.6 GB, at 16:58 to 5.8 GB after emitting the medwin data. At 16:57, just when the export of patents.xls finished, memory went up to about 6.1 GB, and then import_daily finished without any problem. Fine.

But I must fix the problem when parsing companies before being able to push the commits.

jobs/update_textinfo_swissmedicinfo does not correctly update all new iksnrs, as it returns names.
