We want to display the interactions from epha textually on our website.
First we must download the CSV dump and load it into our database. A CSV dump can be found here. It seems that the CSV is only available in German, as I could fetch neither interactions_fr_utf8.csv nor interactions_en_utf8.csv.
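By analogy with the French and English names, the German file is presumably called interactions_de_utf8.csv. A minimal sketch of the download step could look as follows (the base URL below is a placeholder, not the real location):

  require 'open-uri'
  require 'fileutils'

  # Placeholder location; the real one is the link mentioned above.
  base_url = 'http://example.com/epha'
  # Assumed name, by analogy with interactions_fr_utf8.csv / interactions_en_utf8.csv.
  name   = 'interactions_de_utf8.csv'
  target = File.join('data', 'csv', name)

  FileUtils.mkdir_p(File.dirname(target))
  File.open(target, 'wb') do |file|
    # Kernel#open is extended by open-uri to fetch HTTP URLs
    open("#{base_url}/#{name}") { |io| file.write(io.read) }
  end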
The interactions are based on the ATC code. A good example is the interaction between Marcoumar and Aspirin.
Will work on the branch interactions to be able to respond to other problems in between.
The first step is creating an importer jobs/update_interactions and inserting an appropriate test into test/test_util/updater.rb.
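A minimal sketch of what jobs/update_interactions could look like, assuming it follows the Job.run pattern of the other jobs and that the updater gains an update_epha_interactions method (the method name is my assumption):

  #!/usr/bin/env ruby
  $: << File.expand_path('../src', File.dirname(__FILE__))

  require 'util/job'
  require 'util/updater'

  module ODDB
    module Util
      Job.run do |system|
        # update_epha_interactions is the new updater method still to be written
        Updater.new(system).update_epha_interactions
      end
    end
  end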
Running test/test_util/updater.rb however takes over 3 minutes, and it still had 2 failures and 4 errors. Fixed some of the failures and errors, but 3 skips remain.
I propose the following translation of the CSV into a table: an additional field :language, which will always be 'de' at the moment, is added to the CSV columns. To avoid a name collision with the existing interactions I will use the name epha_interactions.
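A sketch of how this could be modelled in src/model/epha_interactions.rb, assuming it follows the other models in including ODDB::Persistence; apart from :language, the field names are my assumption of what an ATC-based interaction record needs, not the actual CSV headers:

  require 'util/persistence'

  module ODDB
    class EphaInteraction
      include Persistence
      # :language comes from the proposal above; the other fields are assumed
      attr_accessor :language,
                    :atc_code_self, :atc_code_other,
                    :effect, :severity, :measures
      def initialize
        super
        @language = 'de' # only the German dump is available at the moment
      end
    end
  end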
Added src/model/epha_interactions.rb and src/plugin/epha_interactions.rb. Changed src/util/oddbapp.rb, src/util/updater.rb and the unit tests for each of the touched files. Still missing from the unit tests is loading a small CSV file. Will continue adding this test tomorrow.
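The missing test could look roughly like this, assuming a small fixture CSV under test/data and that the plugin class is called EphaInteractionsPlugin with an update method taking a path (both names are my assumptions):

  $: << File.expand_path('../../src', File.dirname(__FILE__))

  require 'test/unit'
  require 'flexmock/test_unit'
  require 'plugin/epha_interactions'

  class TestEphaInteractionsPlugin < Test::Unit::TestCase
    def test_update_from_small_csv
      app = flexmock('app')
      app.should_receive(:create_epha_interaction) # assumed callback per parsed row
      plugin = ODDB::EphaInteractionsPlugin.new(app)
      path = File.expand_path('data/epha_interactions_small.csv',
                              File.dirname(__FILE__))
      plugin.update(path) # assumption: update takes a local CSV path
    end
  end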
We can see the output of the memory consumption at the following URL.
Setting a ulimit of 1.5 GB did not work correctly. Therefore we decided to use the same Ruby mechanism as when we kill processes having more than 150 threads. For details see the method log_size in src/util/oddbapp.rb.
There we find the following snippets:
  MEMORY_LIMIT = 20480
  <..>
  # Shutdown if more than 100 threads are created, probably because of spiders
  if threads > 200
    exit
  end
  <..>
  bytes = File.read("/proc/#{$$}/stat").split(' ').at(22).to_i
  mbytes = bytes / (2**20)
  if mbytes > MEMORY_LIMIT
    puts "Footprint exceeds #{MEMORY_LIMIT}MB. Exiting."
    Thread.main.raise SystemExit
  end
Pushed the commit "Limit memory useage to 10GB, resp. 1.4 GB for crawler".
I am unsure whether this approach will work: the fact that I sometimes need kill -9 <pid_of_google_crawler> to kill the Google crawler might be a consequence of the same problem. But I never tried to pinpoint why a normal kill did not work.
I also want to add a unit test for last week's error, where swissmedic_packages.xls was not parsed correctly.
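A sketch of such a regression test, simply checking that the file can be opened by the spreadsheet gem; the fixture path is my assumption, and the real test would of course go through the swissmedic plugin:

  require 'test/unit'
  require 'spreadsheet'

  class TestSwissmedicPackagesXls < Test::Unit::TestCase
    def test_packages_xls_is_parsable
      path = File.expand_path('data/xls/swissmedic_packages.xls',
                              File.dirname(__FILE__))
      book  = Spreadsheet.open(path)
      sheet = book.worksheet(0)
      assert_not_nil sheet.row(0), 'expected at least a header row'
    end
  end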
Fixed some local errors which prevented the specs from completing.
Have problems generating a correct test.