
20101029-debug-bsvFollower-autorun update-update_swissmedic



  1. Check the log last night
  2. Check update_swissmedic and followers
  3. Debug the test cases of bsv_xml.rb
  4. Check SwissmedicPlugin test cases
  5. Design a new method

Goal
  • Debug bsv_followers (oddb.org) autorun / 90% (waiting until Monday, when the log arrives)
  • Simplify update_swissmedic and update_bsv (and followers) / 50%
Milestones
  1. Check log last night 8:00
  2. Check update_swissmedic (and followers) process 11:30
    1. Compare update_bsv and update_swissmedic
  3. Debug the test cases of bsv_xml.rb 14:15
  4. Check SwissmedicPlugin test cases 14:15
  5. Make a test case of a new method
Summary
Commits
ToDo Tomorrow
Keep in Mind
Attached Files

Check the log last night

Local

masa@masa ~/ywesee/oddb.org $ cat log/oddb/debug/2010/10.log 

2010-10-28 16:57:52 CEST getin update_bsv
2010-10-28 16:57:56 CEST getin BsvXmlPlugin.update
2010-10-28 16:57:59 CEST path = "/home/masa/ywesee/oddb.org/data/xml/XMLPublications-2010.10.28.zip"
2010-10-28 16:57:59 CEST @latest = nil
2010-10-28 16:57:59 CEST File.exist?(@latest) = true
2010-10-28 16:57:59 CEST FileUtils.cmp(@latest, path) = false
2010-10-28 17:08:43 CEST FileUtils.cp /home/masa/ywesee/oddb.org/data/xml/XMLPublications-2010.10.28.zip, /home/masa/ywesee/oddb.org/data/xml/XMLPublications-latest.zip
2010-10-28 17:08:43 CEST return_value_BsvXmlPlugin.update = "/home/masa/ywesee/oddb.org/data/xml/XMLPublications-2010.10.28.zip"
2010-10-28 17:08:43 CEST getin log_notify_bsv
2010-10-28 17:09:05 CEST getin Log.notify (SL-Update)
2010-10-28 17:09:10 CEST return_value_log_notify = ["mhatakeyama@ywesee.com"]
2010-10-28 17:09:10 CEST getin Log.notify (SL-Update)
2010-10-28 17:09:15 CEST return_value_log2_notify = ["mhatakeyama@ywesee.com"]
2010-10-28 17:09:15 CEST return_value_update_bsv=["mhatakeyama@ywesee.com"]
2010-10-28 17:09:15 CEST getin update_bsv_followers

Emails

  1. ch.ODDB.org Report - SL-Update (XML) - 11/2010
  2. ch.ODDB.org Report - SL-Update (XML) - 11/2010

Notes

  • Success as usual

Production server

$ ls /var/www/oddb.org/log/oddb/
2006.tar.bz2  2007.tar.bz2  2008.tar.bz2  2009.tar.bz2  2010  access_log  access_log.bak.bz2  error_log

$ ls /var/www/oddb.org/log -al
total 258836
drwxr-xr-x 32 apache apache      4096 2009-09-08 17:27 .
drwxr-xr-x 15 ywesee users       4096 2010-10-27 14:35 ..
drwxr-xr-x  3 root   root       16384 2010-10-29 00:01 oddb

$ cat /etc/crontab
# run ch.oddb.org updates
1 3 * * *       apache  /var/www/oddb.org/jobs/import_daily
1 5 * * *       apache  /var/www/oddb.org/jobs/export_daily

Notes

  • There was no log on the production server, because the 'apache' user does not have the privilege to write a file in the oddb directory (it is owned by root with mode drwxr-xr-x)
  • We will wait until Monday, when the log comes

Check update_swissmedic and followers

Compare update_bsv and update_swissmedic

src/util/updater.rb#run

    def run
...

      if(update_swissmedic)
        update_swissmedic_followers
      end

...

      if(update_bsv)
        update_bsv_followers
      end

    end

Notes

  • Same structure.

src/util/updater.rb#update_bsv

    def update_bsv
      logs_pointer = Persistence::Pointer.new([:log_group, :bsv_sl])
      logs = @app.create(logs_pointer)
      this_month = Date.new(@@today.year, @@today.month)
      if (latest = logs.newest_date) && latest > this_month
        this_month = latest
      end
      klass = BsvXmlPlugin
      plug = klass.new(@app)
      subj = 'SL-Update (XML)'

      wrap_update(klass, subj) {
        if plug.update
          log_notify_bsv(plug, this_month, subj)
        end
      }

    end

src/util/updater.rb#update_swissmedic

    def update_swissmedic(*args)

      logs_pointer = Persistence::Pointer.new([:log_group, :swissmedic])
      logs = @app.create(logs_pointer)
      klass = SwissmedicPlugin
      plug = klass.new(@app)

      wrap_update(klass, "swissmedic") {
        if(plug.update(*args))
          month = @@today << 1
          pointer = logs.pointer + [:log, Date.new(month.year, month.month)]
          log = @app.update(pointer.creator, log_info(plug))
          log.notify('Swissmedic XLS')
        end
      }

    end

Notes

  • Similar structure
  • The main flow in both is as follows:
    1. Create a plugin instance (BsvXmlPlugin, SwissmedicPlugin)
    2. Call the update method of that instance
    3. If the update runs correctly (returns a non-nil value), send an email log
  • The important part is the 'update' method
  • We want to know clearly what cases can occur in the updating process, which are successes and which are failures, and the corresponding return values
  • If an error (Exception, RuntimeError, etc.) occurs while updating or sending the email, an error report comes by email (this is done by the wrap_update function)
  • update_swissmedic can take arguments, but we cannot tell what kind of arguments it accepts just by reading this code
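
The flow described in these notes can be tried offline with a small sketch. The real wrap_update is not shown on this page, so wrap_update_sketch, REPORTS and the three plugin classes below are hypothetical stand-ins, and rescuing StandardError is an assumption about what the real method does.

```ruby
# Hypothetical sketch, NOT the real Updater#wrap_update from oddb.org.
# It only illustrates: run the block, report any error, return nil on failure.
REPORTS = []  # stands in for the error report email

def wrap_update_sketch(klass, subject)
  yield
rescue StandardError => e   # assumption: the real method may rescue more or less
  REPORTS << "Error in #{klass} (#{subject}): #{e.class}: #{e.message}"
  nil
end

# Hypothetical plugins covering the three cases of plug.update
class OkPlugin;     def update; "/tmp/new-file.zip"; end; end   # new data
class NoNewsPlugin; def update; nil; end; end                   # nothing new
class BrokenPlugin; def update; raise "download failed"; end; end

def run_update(klass)
  plug = klass.new
  wrap_update_sketch(klass, 'demo') {
    if plug.update
      :notified   # stands in for log.notify
    end
  }
end

p run_update(OkPlugin)      # => :notified
p run_update(NoNewsPlugin)  # => nil
p run_update(BrokenPlugin)  # => nil, and one entry is added to REPORTS
p REPORTS
```

With this shape, the caller in Updater#run cannot tell "nothing new" from "error" by the return value alone; both are nil, and only the error report tells them apart.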

Check update method in each class

src/plugin/bsv_xml.rb#update

    def update
      path = download_to ARCHIVE_PATH
      if File.exist?(@latest) && FileUtils.cmp(@latest, path)
        FileUtils.rm path
        return
      end
      _update path
      FileUtils.cp path, @latest
      path
    end

src/plugin/swissmedic.rb#update

    def update(agent=Mechanize.new, target=get_latest_file(agent))
      if(target)
        initialize_export_registrations agent
        diff target, @latest, [:atc_class, :sequence_date]
        update_registrations @diff.news + @diff.updates, @diff.replacements
        update_export_registrations @export_registrations
        update_export_sequences @export_sequences
        sanity_check_deletions(@diff)
        delete @diff.package_deletions
        deactivate @diff.sequence_deletions
        deactivate @diff.registration_deletions
        FileUtils.cp target, @latest
        @change_flags = @diff.changes.inject({}) { |memo, (iksnr, flags)|
          memo.store Persistence::Pointer.new([:registration, iksnr]), flags
          memo
        }
      end
    end

Notes

  • The main flow in both is as follows:
    1. Download the latest file from online
    2. Run the update process
    3. Copy the downloaded file to the local latest file
  • But they look quite different
  • And it is not clear what return values come back in each case
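
As far as the two methods shown above go, the return values can be read off from the control flow alone. The sketch below keeps only that control flow; bsv_style_update and swissmedic_style_update are hypothetical names, and the paths and the registration number are made up.

```ruby
# Hypothetical stand-ins that keep only the control flow of the two update methods.

# Like BsvXmlPlugin#update: a bare `return` (nil) when nothing changed,
# otherwise the path of the downloaded file.
def bsv_style_update(path, unchanged)
  return if unchanged
  path
end

# Like SwissmedicPlugin#update: the method's value is the value of the `if`
# expression, i.e. the Hash assigned last inside it, or nil when target is nil.
def swissmedic_style_update(target, changes)
  if(target)
    @change_flags = changes.inject({}) { |memo, (iksnr, flags)|
      memo.store iksnr, flags
      memo
    }
  end
end

p bsv_style_update('/data/xml/XMLPublications-2010.10.29.zip', false)  # the path
p bsv_style_update('/data/xml/XMLPublications-2010.10.29.zip', true)   # nil
p swissmedic_style_update('/data/xls/Packungen-2010.10.29.xls', { '12345' => [:name] })  # Hash with one entry
p swissmedic_style_update('/data/xls/Packungen-2010.10.29.xls', {})    # empty Hash
p swissmedic_style_update(nil, {})                                     # nil
```

An empty Hash is truthy in Ruby, so in this sketch the Swissmedic-style method counts as "updated" whenever a target file exists, even when there are no changes.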

Analyze in detail both download processes

1. src/plugin/bsv_xml.rb#update

    def update
      path = download_to ARCHIVE_PATH
      if File.exist?(@latest) && FileUtils.cmp(@latest, path)
        FileUtils.rm path
        return
      end
      _update path
      FileUtils.cp path, @latest
      path
    end

Notes

  • path = "/home/masa/ywesee/oddb.org/data/xml/XMLPublications-2010.10.29.zip"
  • update_bsv downloads a zip archive from online
  • In update_bsv, 'download_to ARCHIVE_PATH' is in charge of the download

src/plugin/bsv_xml.rb#download_to

    def download_to archive_path=ARCHIVE_PATH
      archive = File.join archive_path, 'xml'
      FileUtils.mkdir_p archive
      agent = Mechanize.new
      zip = agent.get ODDB.config.url_bag_sl_zip
      target = File.join archive,
               Date.today.strftime("XMLPublications-%Y.%m.%d.zip")
      zip.save_as target
      target
    rescue EOFError
      retries ||= 10
      if retries > 0
        retries -= 1
        sleep 10 - retries
        retry
      else
        raise
      end
    end

Notes

  • target = "/home/masa/ywesee/oddb.org/data/xml/XMLPublications-2010.10.29.zip"
  • ODDB.config.url_bag_sl_zip = "http://bag.e-mediat.net/SL2007.Web.External/File.axd?file=XMLPublications.zip"
  • The Mechanize library is used to download the file
  • EOFError is caught; the rescue code means the following:
    • If the download fails with EOFError, it retries up to 10 times, sleeping 1 to 10 seconds before each retry
    • After the 10th retry fails, the bare 'raise' re-raises the EOFError (which will be caught by the wrap_update method in update_bsv)
  • This method is specialized for downloading the XML zip file
  • If update runs twice on the same day, the dated download (XMLPublications-20xx.xx.xx.zip) is deleted because it is identical to XMLPublications-latest.zip, which remains
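
The retry behaviour can be checked offline with a sketch that keeps the rescue clause of download_to and replaces the download with a block. The name with_retries and its two parameters are hypothetical; the pause defaults to 0 here so the sketch runs quickly.

```ruby
# Sketch of the retry pattern in download_to. The download itself is replaced
# by a block; with_retries, max_retries and pause are hypothetical names.
def with_retries(max_retries = 10, pause = 0)
  yield
rescue EOFError
  retries ||= max_retries   # set once; the local survives each `retry`
  if retries > 0
    retries -= 1
    sleep pause
    retry                   # runs the method body (and the block) again
  else
    raise                   # re-raises the EOFError currently being handled
  end
end

# A block that always fails: 1 first attempt + 10 retries, then EOFError
attempts = 0
begin
  with_retries(10) { attempts += 1; raise EOFError, 'connection closed' }
rescue EOFError => e
  puts "gave up after #{attempts} attempts: #{e.class}"
end

# A block that fails twice and then succeeds
calls = 0
p with_retries(10) { calls += 1; raise EOFError if calls < 3; :downloaded }
```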

2. src/plugin/swissmedic.rb#update

    def update(agent=Mechanize.new, target=get_latest_file(agent))
      if(target)
        initialize_export_registrations agent
        diff target, @latest, [:atc_class, :sequence_date]
        update_registrations @diff.news + @diff.updates, @diff.replacements
        update_export_registrations @export_registrations
        update_export_sequences @export_sequences
        sanity_check_deletions(@diff)
        delete @diff.package_deletions
        deactivate @diff.sequence_deletions
        deactivate @diff.registration_deletions
        FileUtils.cp target, @latest
        @change_flags = @diff.changes.inject({}) { |memo, (iksnr, flags)|
          memo.store Persistence::Pointer.new([:registration, iksnr]), flags
          memo
        }
      end
    end

Notes

  • target = "/home/masa/ywesee/oddb.org/data/xls/Packungen-2010.10.29.xls"
  • update_swissmedic downloads an xls file from online
  • In update_swissmedic, 'get_latest_file(agent)' is in charge of the download
  • The download runs as the default value of the second argument of the update method
  • This means that the SwissmedicPlugin.update method can take two arguments
  • But I am not sure whether any code calls SwissmedicPlugin.update with the target argument set
  • The 'get_latest_file' method also uses the Mechanize library
  • I guess a Mechanize object does not have to be passed as the first argument
  • 'get_latest_file' is also used to download Präparateliste-latest.xls.
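
How the two default arguments behave can be seen in a small offline sketch. Ruby evaluates default expressions at call time, left to right, so the second default can use the first argument. FakeAgent, get_latest_file_stub, update_sketch and CALLS are hypothetical stand-ins for Mechanize, get_latest_file and update.

```ruby
# Hypothetical stand-ins so the sketch runs without network access.
class FakeAgent; end

CALLS = []  # records every simulated download

def get_latest_file_stub(agent)
  CALLS << agent.class
  '/data/xls/Packungen-2010.10.29.xls'
end

# Same signature shape as SwissmedicPlugin#update
def update_sketch(agent=FakeAgent.new, target=get_latest_file_stub(agent))
  target
end

update_sketch                                   # both defaults are evaluated
update_sketch(FakeAgent.new)                    # second default uses the given agent
update_sketch(FakeAgent.new, '/tmp/local.xls')  # target given: no download at all
p CALLS.size   # => 2
```

So passing a target skips the download completely, which would be convenient in a test case.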

src/plugin/swissmedic.rb#get_latest_file

    def get_latest_file(agent, keyword='Packungen')
      page = agent.get @index_url
      links = page.links.select do |link|
        ptrn = keyword.gsub /[^A-Za-z]/u, '.'
        /#{ptrn}/iu.match link.attributes['title']
      end
      link = links.first or raise "could not identify url to #{keyword}.xls"
      file = agent.get(link.href)
      download = file.body
      latest_name = File.join @archive, "#{keyword}-latest.xls"
      latest = ''
      if(File.exist? latest_name)
        latest = File.read latest_name
      end
      if(download[-1] != ?\n)
        download << "\n"
      end
      target = File.join @archive, @@today.strftime("#{keyword}-%Y.%m.%d.xls")
      if(download != latest)
        File.open(target, 'w') { |fh| fh.puts(download) }
        target
      end
    end

Notes

  • This method is specialized for xls file downloading
  • In every case in oddb.org, the agent is a Mechanize object. There is no other case.
  • 'get_latest_file' also searches a web site to find the target file
  • 'get_latest_file' does not use FileUtils.cp for the copy but File.open, why?
  • There is no retry on access failure as in the download_to method
  • This method does not use FileUtils.cmp to compare the downloaded file with the latest file
  • This method does not use Mechanize's save_as method to download a file but uses File.open and Mechanize#body
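
One possible reading of the File.open question: in get_latest_file the download is a String in memory (file.body), not a file on disk, so there is nothing for FileUtils.cp or FileUtils.cmp to work on. The sketch below mirrors only the tail of the method; write_if_changed is a hypothetical name and the file content is made up.

```ruby
require 'tmpdir'

# Hypothetical helper mirroring the tail of get_latest_file. `download` is a
# String already in memory, so it is compared with File.read and written with
# File.open instead of FileUtils.cmp / FileUtils.cp.
def write_if_changed(download, latest_name, target)
  latest = File.exist?(latest_name) ? File.read(latest_name) : ''
  download += "\n" unless download.end_with?("\n")  # same normalization as the original
  if download != latest
    File.open(target, 'w') { |fh| fh.puts(download) }
    target
  end
end

Dir.mktmpdir do |dir|
  latest_name = File.join(dir, 'Packungen-latest.xls')
  target      = File.join(dir, 'Packungen-2010.10.29.xls')
  p write_if_changed("a,b,c", latest_name, target)  # no latest file yet: the target path
  File.write(latest_name, "a,b,c\n")
  p write_if_changed("a,b,c", latest_name, target)  # same content as latest: nil
end
```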

Interim Summary (download_to (XMLPublications.zip), get_latest_file (Packungen.xls))

  • Both use the Mechanize library to get a file from online
  • 'download_to' is simple; it just gets a file from online
    • It is only used for XMLPublications.zip
  • 'get_latest_file' is a little more complicated.
    • It does not only download but also searches for the file, compares it with the latest file and saves it
    • It is used for both Packungen.xls and Präparateliste.xls

Let's think

  • Why are these two functions defined separately, even though they are so similar?

Brain storming for designing a new method

  • Test first
    • Check the current tests and debug tests if necessary
  • I should list the functions the two methods need in common
  • Design test cases (referring to the current test cases)
  • Design the interface of functions
  • Test them locally
  • Commit and test them online (on production server)
  • 'get_latest_file' should be simplified

Debug the test cases of bsv_xml.rb

Check

masa@masa ~/ywesee/oddb.org $ ruby test/test_plugin/bsv_xml.rb 
test/test_plugin/bsv_xml.rb:905: warning: parenthesize argument(s) for future version
test/test_plugin/bsv_xml.rb:15: warning: already initialized constant MEDDATA_SERVER
Loaded suite test/test_plugin/bsv_xml
Started
!!!!! DEPRECATION NOTICE !!!!!
The WWW constant is deprecated, please switch to the new top-level Mechanize
constant.  WWW will be removed in Mechanize version 2.0

You've referenced the WWW constant from test/test_plugin/bsv_xml.rb:684:in `test_download', please
switch the "WWW" to "Mechanize".  Thanks!

Sincerely,

  Pew Pew Pew

FE.......
Finished in 0.076687 seconds.

  1) Failure:
test_download(ODDB::TestBsvXmlPlugin) [test/test_plugin/bsv_xml.rb:688]:
Exception raised:
Class: <Errno::ENOENT>
Message: <"No such file or directory - /home/masa/ywesee/oddb.org/test/data/xml/XMLPublications.zip">
---Backtrace---
/usr/lib64/ruby/1.8/fileutils.rb:1200:in `stat'
/usr/lib64/ruby/1.8/fileutils.rb:1200:in `lstat'
/usr/lib64/ruby/1.8/fileutils.rb:1178:in `stat'
/usr/lib64/ruby/1.8/fileutils.rb:1260:in `copy_file'
/usr/lib64/ruby/1.8/fileutils.rb:463:in `copy_file'
/usr/lib64/ruby/1.8/fileutils.rb:383:in `cp'
/usr/lib64/ruby/1.8/fileutils.rb:1395:in `fu_each_src_dest'
/usr/lib64/ruby/1.8/fileutils.rb:1411:in `fu_each_src_dest0'
/usr/lib64/ruby/1.8/fileutils.rb:1393:in `fu_each_src_dest'
/usr/lib64/ruby/1.8/fileutils.rb:382:in `cp'
test/test_plugin/bsv_xml.rb:680:in `test_download'
/usr/lib64/ruby/gems/1.8/gems/flexmock-0.8.6/lib/flexmock/expectation.rb:78:in `call'
/usr/lib64/ruby/gems/1.8/gems/flexmock-0.8.6/lib/flexmock/expectation.rb:78:in `return_value'
/usr/lib64/ruby/gems/1.8/gems/flexmock-0.8.6/lib/flexmock/expectation.rb:59:in `verify_call'
/usr/lib64/ruby/gems/1.8/gems/flexmock-0.8.6/lib/flexmock/expectation_director.rb:42:in `call'
/usr/lib64/ruby/gems/1.8/gems/flexmock-0.8.6/lib/flexmock/core.rb:101:in `method_missing'
/usr/lib64/ruby/gems/1.8/gems/flexmock-0.8.6/lib/flexmock/core.rb:191:in `flexmock_wrap'
/usr/lib64/ruby/gems/1.8/gems/flexmock-0.8.6/lib/flexmock/core.rb:98:in `method_missing'
/home/masa/ywesee/oddb.org/src/plugin/bsv_xml.rb:682:in `download_to'
test/test_plugin/bsv_xml.rb:689:in `test_download'
test/test_plugin/bsv_xml.rb:688:in `test_download'
---------------

  2) Error:
test_update_it_codes(ODDB::TestBsvXmlPlugin):
Zip::ZipError: File /home/masa/ywesee/oddb.org/test/data/xml/XMLPublications.zip not found
    /usr/lib64/ruby/gems/1.8/gems/rubyzip-0.9.4/lib/zip/zip.rb:1396:in `initialize'
    /usr/lib64/ruby/gems/1.8/gems/rubyzip-0.9.4/lib/zip/zip.rb:1410:in `new'
    /usr/lib64/ruby/gems/1.8/gems/rubyzip-0.9.4/lib/zip/zip.rb:1410:in `open'
    test/test_plugin/bsv_xml.rb:701:in `test_update_it_codes'

9 tests, 64 assertions, 1 failures, 1 errors

Notes

Commit Updated BsvXmlPlugin test cases

Test result

masa@masa ~/ywesee/oddb.org $ ruby test/test_plugin/bsv_xml.rb 
Loaded suite test/test_plugin/bsv_xml
Started
.........
Finished in 0.123518 seconds.

9 tests, 68 assertions, 0 failures, 0 errors

masa@masa ~/ywesee/oddb.org $ ls test/data/xml/
XMLPublications-2010.10.29.zip  XMLPublications.zip

Notes

  • After the test run, XMLPublications-20xx.xx.xx.zip will be created

Check SwissmedicPlugin test cases

masa@masa ~/ywesee/oddb.org $ ruby test/test_plugin/swissmedic.rb 
Loaded suite test/test_plugin/swissmedic
Started
................................
Finished in 1.159839 seconds.

32 tests, 110 assertions, 0 failures, 0 errors

Notes

  • No problem

Design a new method to download a file online

Functions

  • Retry accessing the file URL; if it still fails, raise an error
  • The arguments are 1. URL of the file, 2. save directory, 3. save path
  • Check for a latest file
  • Compare the latest file with the downloaded file
  • Save the downloaded file
  • Method name: def download_file(target_url, save_dir, save_path)
  • Return value: nil -> fail, downloaded latest file path -> success

Make a test case

test/test_plugin/bsv_xml.rb

    def test_download_file
      #return_value = @plugin.download_file("","","")
      #assert(return_value)
      assert(false)
    end

Check it fails

masa@masa ~/ywesee/oddb.org $ ruby test/test_plugin/bsv_xml.rb 
Loaded suite test/test_plugin/bsv_xml
Started
.F........
Finished in 0.119129 seconds.

  1) Failure:
test_download_file(ODDB::TestBsvXmlPlugin) [test/test_plugin/bsv_xml.rb:698]:
<false> is not true.

10 tests, 69 assertions, 1 failures, 0 errors

Make a new method only with interface

src/plugin/bsv_xml.rb

    def download_file(target_url, save_dir, file_name)
      false
    end

Update the test case

    def test_download_file
      return_value = @plugin.download_file("","","")
      assert return_value
    end

Check the failure

$ ruby test/test_plugin/bsv_xml.rb 
Loaded suite test/test_plugin/bsv_xml
Started
.F........
Finished in 0.117675 seconds.

  1) Failure:
test_download_file(ODDB::TestBsvXmlPlugin) [test/test_plugin/bsv_xml.rb:697]:
<false> is not true.

10 tests, 69 assertions, 1 failures, 0 errors

A new method

masa@masa ~/work $ cat test.rb

require 'date'
require 'fileutils'
require 'mechanize'
require 'tempfile'

def download_file(target_url, save_dir, file_name)
  FileUtils.mkdir_p save_dir   # does nothing if the directory already exists
  target_file = Mechanize.new.get(target_url)
  # XMLPublications.zip -> XMLPublications-2010.10.29.zip
  save_file = File.join save_dir,
           Date.today.strftime(file_name.gsub(/\./,"-%Y.%m.%d."))
  # XMLPublications.zip -> XMLPublications-latest.zip
  latest_file = File.join save_dir, file_name.gsub(/\./,"-latest.")

  p save_file
  p latest_file

  # download target_file temporarily
  temp = Tempfile.new('foo')
  temp_file = temp.path
  p temp_file
  target_file.save_as temp_file

  # compare the download with the latest file, if there is one
  if(File.exist?(latest_file) && FileUtils.compare_file(temp_file, latest_file))
    return nil
  else
    target_file.save_as save_file
    FileUtils.cp(save_file, latest_file)
    return latest_file
  end
rescue EOFError
  retries ||= 10
  if retries > 0
    retries -= 1
    sleep 10 - retries
    retry
  else
    raise
  end
ensure
  p "ensure"
  # temp is still nil if the download failed before the Tempfile was created
  if temp
    temp.close
    temp.unlink
  end
end

target_url = 'http://bag.e-mediat.net/SL2007.Web.External/File.axd?file=XMLPublications.zip'
save_dir   = '/home/masa/work'
file_name  = 'XMLPublications.zip'
p download_file(target_url, save_dir, file_name)

Test1

masa@masa ~/work $ ls -al
-rw-r--r--  1 masa masa   1199 29. Okt 16:31 test.rb

masa@masa ~/work $ ruby test.rb 
"/home/masa/work/XMLPublications-2010.10.29.zip"
"/home/masa/work/XMLPublications-latest.zip"
"/tmp/foo.1641.0"
"ensure"
"/home/masa/work/XMLPublications-latest.zip"

$ ls -al
-rw-r--r--  1 masa masa 3398226 29. Okt 16:32 XMLPublications-2010.10.29.zip
-rw-r--r--  1 masa masa 3398226 29. Okt 16:32 XMLPublications-latest.zip

Test2

masa@masa ~/work $ ls -al
-rw-r--r--  1 masa masa   1199 29. Okt 16:33 XMLPublications-latest.zip

masa@masa ~/work $ ruby test.rb 
"/home/masa/work/XMLPublications-2010.10.29.zip"
"/home/masa/work/XMLPublications-latest.zip"
"/tmp/foo.1669.0"
"ensure"
"/home/masa/work/XMLPublications-latest.zip"

masa@masa ~/work $ ls -al
-rw-r--r--  1 masa masa 3398226 29. Okt 16:33 XMLPublications-2010.10.29.zip
-rw-r--r--  1 masa masa 3398226 29. Okt 16:33 XMLPublications-latest.zip

Test3

masa@masa ~/work $ ls -al
-rw-r--r--  1 masa masa 3398226 29. Okt 16:33 XMLPublications-latest.zip

masa@masa ~/work $ ruby test.rb 
"/home/masa/work/XMLPublications-2010.10.29.zip"
"/home/masa/work/XMLPublications-latest.zip"
"/tmp/foo.1681.0"
"ensure"
nil

masa@masa ~/work $ ls -al
-rw-r--r--  1 masa masa 3398226 29. Okt 16:33 XMLPublications-latest.zip

Test4

masa@masa ~/work $ ls -al
-rw-r--r--  1 masa masa 3398226 29. Okt 16:36 XMLPublications-2010.10.29.zip
-rw-r--r--  1 masa masa 3398226 29. Okt 16:36 XMLPublications-latest.zip

masa@masa ~/work $ ruby test.rb 
"/home/masa/work/XMLPublications-2010.10.29.zip"
"/home/masa/work/XMLPublications-latest.zip"
"/tmp/foo.1710.0"
"ensure"
nil

masa@masa ~/work $ ls -al
-rw-r--r--  1 masa masa 3398226 29. Okt 16:36 XMLPublications-2010.10.29.zip
-rw-r--r--  1 masa masa 3398226 29. Okt 16:36 XMLPublications-latest.zip

Notes

  • Looks good
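
The four manual tests above need network access. As a sketch only, the compare-and-store half could be separated from the download so the same cases run offline. store_if_changed and its today parameter are hypothetical and not part of bsv_xml.rb; the file contents are made up.

```ruby
require 'date'
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: the compare-and-store half of download_file. It takes an
# already downloaded file, so it can be exercised without network access.
def store_if_changed(downloaded, save_dir, file_name, today = Date.today)
  save_file   = File.join(save_dir, today.strftime(file_name.sub(/\./, '-%Y.%m.%d.')))
  latest_file = File.join(save_dir, file_name.sub(/\./, '-latest.'))
  if File.exist?(latest_file) && FileUtils.compare_file(downloaded, latest_file)
    return nil                          # nothing new
  end
  FileUtils.cp(downloaded, save_file)   # dated copy
  FileUtils.cp(save_file, latest_file)  # becomes the new latest file
  latest_file
end

Dir.mktmpdir do |dir|
  downloaded = File.join(dir, 'download.tmp')
  latest     = File.join(dir, 'XMLPublications-latest.zip')
  File.write(downloaded, 'new content')

  p store_if_changed(downloaded, dir, 'XMLPublications.zip')  # like Test1: no latest file yet
  File.write(latest, 'old content')
  p store_if_changed(downloaded, dir, 'XMLPublications.zip')  # like Test2: latest file differs
  p store_if_changed(downloaded, dir, 'XMLPublications.zip')  # like Test3/Test4: identical, nil
end
```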
Page last modified on July 13, 2011, at 11:57 AM