
201311125-oddb2xml-skip-download-option



Summary

  • oddb2xml: add a --skip-download option

Commits

Index

---

integrate a simple interface to display the interactions from EPHA

We want to display the interactions from epha textually on our website.

First we must download the data and load it into our database. A CSV dump can be found here. It seems that the CSV is only available in German, as I could fetch neither interactions_fr_utf8.csv nor interactions_en_utf8.csv.

They are based on the ATC-Code. A good example of an interaction is between Marcoumar and Aspirin.

Will use the branch interactions to be able to respond to other problems in between.

The first step is creating an importer, jobs/update_interactions, and adding an appropriate test to test/test_util/updater.rb. However, test/test_util/updater.rb takes over 3 minutes to run and still has 2 failures and 4 errors. I fixed some failures and errors, but 3 skips remain.

I propose the following translation for the CSV to add them into a table:

  1. ATC1 -> atc_code_self
  2. Name1 -> name_self
  3. ATC2 -> atc_code_other
  4. Name2 -> name_other
  5. Info -> info
  6. Mechanismus -> action
  7. Effekt -> effect
  8. Massnahmen -> measure
  9. Grad -> severity

An additional field, :language, will always be 'de' for the moment. To avoid a name collision with the existing interactions, I will use the name epha_interactions.
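
The proposed mapping could be sketched as follows. This is only an illustration, not the code in src/plugin/epha_interactions.rb: the constant CSV_TO_FIELD and the method parse_epha_interactions are hypothetical names, and the sketch assumes the dump is comma-separated with a header row that uses exactly the column names listed above.

```ruby
require 'csv'

# Hypothetical mapping from the German CSV headers to the proposed field names.
CSV_TO_FIELD = {
  'ATC1'        => :atc_code_self,
  'Name1'       => :name_self,
  'ATC2'        => :atc_code_other,
  'Name2'       => :name_other,
  'Info'        => :info,
  'Mechanismus' => :action,
  'Effekt'      => :effect,
  'Massnahmen'  => :measure,
  'Grad'        => :severity
}.freeze

# Converts CSV text into an array of hashes keyed by the proposed field names.
# Assumes a header row and comma separators; adjust col_sep if the dump differs.
def parse_epha_interactions(csv_text, language = 'de')
  CSV.parse(csv_text, headers: true).map do |row|
    record = { language: language }
    CSV_TO_FIELD.each { |header, field| record[field] = row[header] }
    record
  end
end
```

Each resulting hash carries the nine translated fields plus :language, so a small CSV fixture with a header and a single row would be enough for the missing unit test.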

Added src/model/epha_interactions.rb, src/plugin/epha_interactions.rb. Changed src/util/oddbapp.rb, src/util/updater.rb and unit-tests for each of the touched files. Missing from unit-tests is loading a small csv-file. Will continue adding this test tomorrow.

crawler and google-crawler must be restarted if they consume more than 1.5 GB memory

We can see the memory consumption at the following URL.

Setting a ulimit of 1.5 GB did not work correctly. Therefore we decided to use the same Ruby mechanism that we already use to kill processes with more than 150 threads. For details, see the method log_size in src/util/oddbapp.rb.

There we find the following snippets.

    MEMORY_LIMIT = 20480
<..>
            # Shut down if more than 200 threads are created, probably because of spiders
            if threads > 200
              exit
            end
<..>
            bytes = File.read("/proc/#{$$}/stat").split(' ').at(22).to_i
            mbytes = bytes / (2**20)
            if mbytes > MEMORY_LIMIT
              puts "Footprint exceeds #{MEMORY_LIMIT}MB. Exiting."
              Thread.main.raise SystemExit
            end
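
The memory check can be isolated into a small helper that is testable without a running process. This is a sketch under assumptions, not the code in log_size: the names vsize_mbytes and footprint_exceeded? are hypothetical. Unlike the snippet above, it splits the line only after the closing parenthesis of the process name, so a name containing spaces cannot shift the field index. Field 23 of /proc/<pid>/stat is the virtual memory size in bytes, which is usually larger than the resident size.

```ruby
# Field numbers follow proc(5): field 23 of /proc/<pid>/stat is vsize in bytes.
VSIZE_FIELD = 23

# Returns the virtual memory size in MB parsed from one /proc/<pid>/stat line.
def vsize_mbytes(stat_line)
  # Everything after the last ')' starts at field 3 (state), so the process
  # name in field 2 may contain spaces without shifting the index.
  rest = stat_line[(stat_line.rindex(')') + 1)..-1].split(' ')
  rest.at(VSIZE_FIELD - 3).to_i / (2**20)
end

# True if the footprint in MB is above the given limit (hypothetical helper).
def footprint_exceeded?(stat_line, limit_mb)
  vsize_mbytes(stat_line) > limit_mb
end
```

On Linux it could be called as footprint_exceeded?(File.read("/proc/#{$$}/stat"), MEMORY_LIMIT), with the limit chosen per process type.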

Pushed commit Limit memory useage to 10GB, resp. 1.4 GB for crawler.

I am unsure whether this approach will work: I sometimes need kill -9 <pid_of_google_crawler> to stop the Google Crawler, which suggests that a regular exit may not always get through. However, I never tried to pinpoint why a normal kill did not work.

oddb2xml: add a --skip-download option

I also want to add a unit test for last week's error, where swissmedic_packages.xls was not parsed correctly.

Fixed some local errors which prevented the specs from completing.

Have problems generating a correct test.

Page last modified on November 25, 2013, at 08:22 PM