

20170821-save-only-latest-evidentia

Summary

  • Saved only latest file for evidentia_fi_link and interactions
  • Changelog history for PI/FI
  • Keep in Mind

Commits

Index

Saved only latest file for evidentia_fi_link and interactions

Until now we kept one copy per day, with names like interactions_de_utf8-2017.08.15.csv or evidentia_fi_link-2017.08.16.csv. This is not desired.

Done with commit Remove old epha_interactions and evidentia_fi_link files

But Zeno clarified the requirement: he wants the old files to be removed only if they have the same content as today's. This would align the behaviour of src/util/latest.rb with that of swissmedic. We have various implementations of get_latest, as the requirements differ (e.g. XMLPublications.xml comes from a zip file, or we want to ignore the daily timestamp). We find them in the following files:

  • src/plugin/analysis.rb: def get_latest_file
  • src/plugin/refdata_partner.rb: def get_latest_file
  • src/plugin/shortage.rb: def get_latest_nomarketing_file
  • src/plugin/swissmedic.rb: def get_latest_file(agent, keyword='Packungen', extension = '.xlsx')
  • src/plugin/vaccines.rb: def get_latest_file
  • src/plugin/medreg_doctor.rb: def get_latest_file
  • src/util/latest.rb: def self.get_latest_file(latest, download_url, agent = Mechanize.new, must_unzip = false)

A previous attempt to unify them failed. Found an error in src/util/latest.rb which was not covered by a unit test. Fixed with commit Remove only yesterday files if equal latest
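The clarified requirement (delete a dated copy only when its content equals the latest file) can be sketched roughly as follows. This is a minimal illustration, not the code in src/util/latest.rb: the helper name remove_if_same_as_latest and the "-latest.csv" file name are assumptions made for the example. It relies only on FileUtils.compare_file from the Ruby standard library.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper, NOT the actual implementation in src/util/latest.rb.
# Deletes dated_path only when its content is byte-identical to latest_path.
# Returns true if the dated copy was removed, false otherwise.
def remove_if_same_as_latest(latest_path, dated_path)
  return false unless File.file?(latest_path) && File.file?(dated_path)
  return false unless FileUtils.compare_file(latest_path, dated_path)

  FileUtils.rm(dated_path)
  true
end

# Small demonstration in a scratch directory (file names are assumptions).
Dir.mktmpdir do |dir|
  latest    = File.join(dir, 'evidentia_fi_link-latest.csv')
  same      = File.join(dir, 'evidentia_fi_link-2017.08.16.csv')
  different = File.join(dir, 'evidentia_fi_link-2017.08.15.csv')
  File.write(latest, "a;b\n")
  File.write(same, "a;b\n")
  File.write(different, "a;c\n")

  # The identical copy is removed, the differing copy is kept.
  puts "removed identical copy: #{remove_if_same_as_latest(latest, same)}"
  puts "removed differing copy: #{remove_if_same_as_latest(latest, different)}"
end
```

Comparing content rather than names or modification times means a dated file that really differs from today's download is never lost, which is the point of Zeno's clarification.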

Changelog history for PI/FI

Must first enable saving more than one changelog entry for the FI. Storing in ODBA failed because I had not added the line ODBA_SERIALIZABLE = ['@time', '@diff'] to the class FachinfoDocument::ChangeLogItem. Reworking src/plugin/textinfo to correctly save the existing changelog entries. Using sudo -u apache bundle-240 exec /usr/local/bin/ruby-240 jobs/update_textinfo_swissmedicinfo --target=fi --reparse 60384 55945 to test some examples after reloading the db dump of August 14.

Saving and restoring new change_log_items has problems. After saving in the plugin I restart ch.oddb and in a bin/admin session I get

ch.oddb> $x = ODBA.cache.fetch(35161723)
-> Array
ch.oddb> $x.size
-> 2
ch.oddb> $x.first.time
-> 2016-08-30
ch.oddb> $x.last.time
-> undefined method `time' for nil:NilClass
Did you mean?  timeout

Why was the time not saved correctly, even though the ChangeLogItem class has the following definition?

    class ChangeLogItem
      include Persistence
      attr_accessor :time, :diff
      ODBA_SERIALIZABLE = ['@time', '@diff']

      def <=>(anOther)
        # [diff.to_s, time] <=> [anOther.diff.to_s, anOther.time]
        diff.to_s <=> anOther.diff.to_s
      end
      def pointer_descr
        time.strftime('%d.%m.%Y')
      end
    end

I was also surprised that in TextInfoPlugin::store_fachinfo fachinfo.fr.text returned the German text, whereas fachinfo.description(fr).text correctly returned the French text.

Now I am getting the following error when trying to save the change_log

[1] pry(#<ODDB::FachinfoDocument2001>)> odba_store
IOError: closed stream
from /var/www/oddb.org/vendor/ruby/2.4.0/gems/odba-1.1.2/lib/odba/marshal.rb:10:in `internal_encoding'

[20] pry(ODDB::TextInfoPlugin)> reg.fachinfo.fr.change_log.last.diff.class
=> Diffy::Diff

Also, looking at the ODBA object in the pry session while trying to store the fachinfo:

[29] pry(ODDB::TextInfoPlugin)> reg.fachinfo.fr.change_log.last.time
=> 2017-08-21
[30] pry(ODDB::TextInfoPlugin)> reg.fachinfo.fr.change_log.last.odba_id
=> 36113256

bin/admin reports

ch.oddb> registration('55945').fachinfo.fr
-> #<ODDB::FachinfoDocument2001:0x007feac6536688>
ch.oddb> registration('55945').fachinfo.fr.change_log.size
-> 1

Reverting src/model/fachinfo.rb to its state of last year, as I suspect the error was in src/plugin/fachinfo.rb, where we did not preserve the old change_log.

This did not work and I got

[13] pry(ODDB::TextInfoPlugin)> reg.iksnr
=> "60384"
[14] pry(ODDB::TextInfoPlugin)> lang
=> "fr"
[15] pry(ODDB::TextInfoPlugin)> app.registration('60384').fachinfo.fr.change_log.first.odba_id
=> 35827855
[16] pry(ODDB::TextInfoPlugin)> app.registration('60384').fachinfo.fr.change_log.last.odba_id
=> 36113198
[17] pry(ODDB::TextInfoPlugin)> x = ODBA.cache.fetch(36113198)
ODBA::OdbaError: Unknown odba_id 36113198
from /var/www/oddb.org/vendor/ruby/2.4.0/gems/odba-1.1.2/lib/odba/cache.rb:639:in `restore_object'
[18] pry(ODDB::TextInfoPlugin)> app.registration('60384').fachinfo.fr.change_log.last.time
=> 2017-08-21

I suspect that I must add a few lines to the Diffy::Diff class to persist it correctly, e.g. I will try to add

class Diffy::Diff
  include Persistence
  ODBA_SERIALIZABLE = ['@default_format', '@default_options', '@string1', '@string2', '@options']
end

Finally I found the culprit. The Diff class of the Diffy gem holds an instance variable @tempfiles, which contains closed files. ODBA is unable to dump it. This can be fixed by the following monkey patch

class Diffy::Diff
  def close_tempfiles
    # Guard against @tempfiles being nil (assumed to happen if no diff was computed yet)
    (@tempfiles || []).each { |x| x.close unless x.closed? }
    @tempfiles = []
  end
end

and adding item.diff.close_tempfiles before calling item.odba_store.
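To illustrate why the tempfile references block persistence, here is a small self-contained sketch using only the standard library. FakeDiff is a hypothetical stand-in for Diffy::Diff (it merely mimics keeping Tempfile references in @tempfiles), and plain Marshal is used in place of ODBA, on the assumption that ODBA's dump ends up in Marshal as the backtrace above (odba/marshal.rb) suggests. The exact exception class may differ between Ruby versions, so the sketch only checks whether dumping succeeds.

```ruby
require 'tempfile'

# Hypothetical stand-in for Diffy::Diff, NOT the real class: it keeps
# references to its (already closed) tempfiles in @tempfiles.
class FakeDiff
  attr_reader :string1, :string2

  def initialize(string1, string2)
    @string1 = string1
    @string2 = string2
    @tempfiles = [string1, string2].map do |content|
      file = Tempfile.new('fakediff')
      file.write(content)
      file.close
      file
    end
  end

  # Same idea as the monkey patch above: close and drop the file references.
  def close_tempfiles
    (@tempfiles || []).each { |file| file.close unless file.closed? }
    @tempfiles = []
  end
end

# Returns true if Marshal can serialize the object, false if it raises.
def dumpable?(obj)
  Marshal.dump(obj)
  true
rescue StandardError
  false
end

diff = FakeDiff.new("old text\n", "new text\n")
puts "dumpable with tempfiles:     #{dumpable?(diff)}"
diff.close_tempfiles
puts "dumpable after closing them: #{dumpable?(diff)}"

# The strings needed to recompute the diff survive the round trip.
restored = Marshal.load(Marshal.dump(diff))
puts restored.string1 == diff.string1
```

Dropping the references is enough because the diff can be recomputed from the two strings; the tempfiles are only scratch data.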

But I think the best way is to fork the Diffy gem, patch it and submit a pull request.

This works nicely, as seen with the attached screenshot:

Pushed the following commits

Reimporting thinpower database and running import_daily before pushing the changes for oddb.org.

Page last modified on August 21, 2017, at 07:11 PM