
20101216-update-import_de_oddb_org_Festbe_Zubef



  1. Confirm Task
  2. Local test
  3. Implement dimdi_path
  4. Update xls2odat
  5. Debug import_gkv (de.oddb.org)
  6. Update bbmb.ch separating email config and change smtp server

Goal
  • Update import_dimdi / 70%
Milestones
  1. Confirm what to do
  2. Update import_dimdi 10:00
  3. remove xls2odat warning
    1. commit and push
    2. rake release 10:30
  4. debug import_gkv (de.oddb.org) 15:30
  5. Update bbmb.ch separate email config and change smtp server
  6. Debug swissmedic_followers
Summary
Commits
ToDo Tomorrow
  • check smtp_tls
  • bbmb.ch updating
Keep in Mind
  1. export_fachinfo test locally (weekend)
  2. swissmedic_followers debug
  3. change email method bbmb.ch
  4. rpdf2txt announcement 20101214
  5. On Ice
  6. emerge --sync

Confirm Task

At the moment, the following URL must be changed manually every quarter (3 months):

DIMDI_PATH = "http://www.dimdi.de/dynamic/de/amg/fbag/downloadcenter/2010/4-quartal/"

Task

  • To change the URL automatically

Reference

Local test

Design

  • http://.../(year)/([1-4])-quartal/ changes depending on the month
  • 1:Jan.-Mar., 2:Apr.-Jun., 3:Jul.-Sep., 4:Oct.-Dec.
  • DIMDI_PATH as a constant is not good, since the URL must change every quarter without restarting the system
    • It should become a normal variable
  • or be replaced by a method, dimdi_path or download_path, which also makes it easy to write a test case

test.rb

quarter = ((Time.now.month-1)/3+1).to_s
year  = Time.now.year.to_s
dimdi_path = "http://www.dimdi.de/dynamic/de/amg/fbag/downloadcenter/" + year + "/" + quarter + "-quartal/"
p Time.now
p dimdi_path
puts
1.upto(12) do |month|
  quarter = ((month-1)/3+1).to_s
  year  = Time.now.year.to_s
  dimdi_path = "http://www.dimdi.de/dynamic/de/amg/fbag/downloadcenter/" + year + "/" + quarter + "-quartal/"
  print month, ": ", dimdi_path, "\n"
end

Result

masa@masa ~/work $ ruby test.rb 
Thu Dec 16 08:24:21 +0100 2010
"http://www.dimdi.de/dynamic/de/amg/fbag/downloadcenter/2010/4-quartal/"

1: http://www.dimdi.de/dynamic/de/amg/fbag/downloadcenter/2010/1-quartal/
2: http://www.dimdi.de/dynamic/de/amg/fbag/downloadcenter/2010/1-quartal/
3: http://www.dimdi.de/dynamic/de/amg/fbag/downloadcenter/2010/1-quartal/
4: http://www.dimdi.de/dynamic/de/amg/fbag/downloadcenter/2010/2-quartal/
5: http://www.dimdi.de/dynamic/de/amg/fbag/downloadcenter/2010/2-quartal/
6: http://www.dimdi.de/dynamic/de/amg/fbag/downloadcenter/2010/2-quartal/
7: http://www.dimdi.de/dynamic/de/amg/fbag/downloadcenter/2010/3-quartal/
8: http://www.dimdi.de/dynamic/de/amg/fbag/downloadcenter/2010/3-quartal/
9: http://www.dimdi.de/dynamic/de/amg/fbag/downloadcenter/2010/3-quartal/
10: http://www.dimdi.de/dynamic/de/amg/fbag/downloadcenter/2010/4-quartal/
11: http://www.dimdi.de/dynamic/de/amg/fbag/downloadcenter/2010/4-quartal/
12: http://www.dimdi.de/dynamic/de/amg/fbag/downloadcenter/2010/4-quartal/

Note

  • Good

Implement dimdi_path

grep DIMDI_PATH

masa@masa ~/ywesee/de.oddb.org $ grep -r DIMDI_PATH *
lib/oddb/import/dimdi.rb:  DIMDI_PATH = "http://www.dimdi.de/dynamic/de/amg/fbag/downloadcenter/2010/4-quartal/"
lib/oddb/import/dimdi.rb:    url = File.join(DIMDI_PATH, file)

Note

  • There is only one place where DIMDI_PATH is used
  • So, it is easy to replace the constant with a variable

Check test case

masa@masa ~/ywesee/de.oddb.org $ ruby test/import/test_dimdi.rb 
Loaded suite test/import/test_dimdi
Started
..E...
Finished in 0.054318 seconds.

  1) Error:
test_import_base_data(ODDB::Import::Dimdi::TestProduct):
Errno::ENOENT: No such file or directory - /home/masa/ywesee/de.oddb.org/test/import/data/xls/fb010708.xls
    /usr/lib64/ruby/1.8/open-uri.rb:32:in `initialize'
    /usr/lib64/ruby/1.8/open-uri.rb:32:in `open_uri_original_open'
    /usr/lib64/ruby/1.8/open-uri.rb:32:in `open'
    test/import/test_dimdi.rb:138:in `test_import_base_data'

6 tests, 37 assertions, 0 failures, 1 errors

Note

  • There is one error

Then

  • Comment out the failing test temporarily

Make a test case for the method, dimdi_path

test/import/test_dimdi.rb

module ODDB
  module Import
    class TestDimdi < Test::Unit::TestCase
      include FlexMock::TestCase
...
      def test_dimdi_path
        assert(true)
      end
...

Result

masa@masa ~/ywesee/de.oddb.org $ ruby test/import/test_dimdi.rb 
Loaded suite test/import/test_dimdi
Started
......
Finished in 0.040949 seconds.

6 tests, 37 assertions, 0 failures, 0 errors

Implement test_download_path

      def test_download_path
        flexstub(Time) do |timeclass|
          timeclass.should_receive(:now).and_return do
            flexmock do |nowobj|
              nowobj.should_receive(:month).and_return(12)
              nowobj.should_receive(:year).and_return(2010)
            end
          end
        end
        download_path = "http://www.dimdi.de/dynamic/de/amg/fbag/downloadcenter/2010/4-quartal/"
        assert_equal(download_path, Dimdi.download_path)
      end

Confirm it fails

masa@masa ~/ywesee/de.oddb.org $ ruby test/import/test_dimdi.rb 
Loaded suite test/import/test_dimdi
Started
.....F
Finished in 0.0425 seconds.

  1) Failure:
test_download_path(ODDB::Import::TestDimdi) [test/import/test_dimdi.rb:28]:
<"http://www.dimdi.de/dynamic/de/amg/fbag/downloadcenter/2010/4-quartal/"> expected but was
<nil>.

6 tests, 37 assertions, 1 failures, 0 errors

Implement Dimdi.download_path

lib/oddb/import/dimdi.rb

  DIMDI_PATH = "http://www.dimdi.de/dynamic/de/amg/fbag/downloadcenter/"
  def Dimdi.download_path
    quarter = ((Time.now.month-1)/3+1).to_s
    year  = Time.now.year.to_s
    return DIMDI_PATH + year + "/" + quarter + "-quartal/"
  end
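
A possible variant, sketched under the assumption that the method may take an optional time argument: passing the time in lets a test check any quarter without stubbing Time.now. The module name DimdiSketch and the parameter name time are illustrative only and do not exist in de.oddb.org.

```ruby
# Sketch only: DimdiSketch and the `time` parameter are hypothetical names.
module DimdiSketch
  DIMDI_PATH = "http://www.dimdi.de/dynamic/de/amg/fbag/downloadcenter/"

  # Months 1-3 map to quarter 1, 4-6 to 2, 7-9 to 3, 10-12 to 4.
  def self.download_path(time = Time.now)
    quarter = (time.month - 1) / 3 + 1
    "#{DIMDI_PATH}#{time.year}/#{quarter}-quartal/"
  end
end

puts DimdiSketch.download_path(Time.local(2010, 12, 16))
puts DimdiSketch.download_path(Time.local(2011, 1, 1))
```

Called without an argument it behaves like the Time.now version above, so the existing caller would not need to change.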

Confirm test passes

masa@masa ~/ywesee/de.oddb.org $ ruby test/import/test_dimdi.rb 
Loaded suite test/import/test_dimdi
Started
......
Finished in 0.040592 seconds.

6 tests, 37 assertions, 0 failures, 0 errors

Update the other part where DIMDI_PATH is used

lib/oddb/import/dimdi.rb

module ODDB
  module Import
module Dimdi
...
  def Dimdi.download(file, &block)
    #url = File.join(DIMDI_PATH, file)
    url = File.join(download_path, file)

Check test case again

masa@masa ~/ywesee/de.oddb.org $ ruby test/import/test_dimdi.rb 
Loaded suite test/import/test_dimdi
Started
......
Finished in 0.040888 seconds.

6 tests, 37 assertions, 0 failures, 0 errors

Commit

Update xls2odat

xls2odat prints a warning

masa@masa ~/work $ sudo gem install xls2odat
Passwort: 
Successfully installed xls2odat-1.1.5
1 gem installed
/usr/lib64/ruby/gems/1.8/gems/rdoc-2.5.11/lib/rdoc/ruby_lex.rb:67: warning: parenthesize argument(s) for future version
Installing ri documentation for xls2odat-1.1.5...
Installing RDoc documentation for xls2odat-1.1.5...

masa@masa ~/work $ xls2odat
/usr/lib64/ruby/gems/1.8/gems/xls2odat-1.1.5/lib/xls2odat.rb:21: warning: already initialized constant VERSION
/usr/bin/xls2odat ver.1.1.7
Usage: /usr/bin/xls2odat <config file (.xls)> <data files (.xls)>...

Task

  • Remove the warning

Cause

  • The constant VERSION is defined twice
class Xls2odat
  VERSION = '1.1.5'
  VERSION = File.readlines(__FILE__).grep(/Version/)[0].match(/Version::\s+([0-9.]+)/)[1]

ToDo

  • Delete the second definition and change the version from 1.1.5 to 1.1.7

Commit

rake release

masa@masa ~/ywesee/xls2odat $ rake release VERSION=1.1.7 --trace
...
Pushing gem to RubyGems.org...
Successfully registered gem: xls2odat (1.1.7)

Note

  • Done!!

Reference

Debug import_gkv (de.oddb.org)

Masa's warning appears when running jobs/import_gkv

/var/www/de.oddb.org $ sudo -u apache jobs/import_gkv
....
WARNING: Updater.import_gkv did nothing. It looks failing in grabbing PDF link.
Check HTML source code at https://www.gkv-spitzenverband.de/Befreiungsliste_Arzneimittel_Versicherte.gkvnet
Probably you have to modify Gkv#latest_url method, in particular, this part: link = (page/'a[@class=pdf]')
20100910 masa

Confirm the same warning locally

  • Run de.oddb.org/bin/oddbd
  • Run jobs/import_gkv

Result

  • I got the same warning

Check the PDF URL

lib/oddb/import/gkv.rb#latest_url

  def latest_url agent, opts={}
    host = 'https://www.gkv-spitzenverband.de'
    url = '/Befreiungsliste_Arzneimittel_Versicherte.gkvnet'
    page = agent.get host + url
    if link = (page/'a[@class=pdf]').first
      host + link.attributes["href"]
    end
  end

Check the URL on browser

Check the PDF URL in HTML source

  • The HTML source contains a line like
<span class="pdf">Liste der zuzahlungsbefreiten Arzneimittel (Stand: 15. Dezember 2010)</span>
  • It looks fine at first sight

Trace the source code

Experiment

lib/oddb/import/gkv.rb#latest_url

  def latest_url agent, opts={}
    host = 'https://www.gkv-spitzenverband.de'
    url = '/Befreiungsliste_Arzneimittel_Versicherte.gkvnet'
    page = agent.get host + url
print "page/'a[@class=pdf]'="
p page/'a[@class=pdf]'
    if link = (page/'a[@class=pdf]').first
      host + link.attributes["href"]
    end
  end

Result

page/'a[@class=pdf]'=[]

Note

  • Actually, the link is not grabbed

Got it!!

<a href="/upload/Zuzahlungsbefreit_sort_Name_101215_15212.pdf" target="_blank"><span class="pdf">Liste der zuzahlungsbefreiten Arzneimittel (Stand: 15. Dezember 2010)</span> </a>

Note

  • In the HTML, 'class=pdf' is no longer on the 'a' tag but on the 'span' tag
  • So the selector should change to page/'span[@class=pdf]', but the link information is still in the 'a' tag.
  • What should I do?
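
One possibility, which is not the route taken in the experiments that follow, is to select the span and then step up to its parent 'a' element. The sketch below uses REXML from the Ruby standard library on the fragment quoted above as a stand-in for the fetched page; that swap is an assumption made to keep the example self-contained. Nokogiri nodes also offer a parent method, so the same idea should carry over.

```ruby
require 'rexml/document'

# Fragment copied from the page source shown above.
html = '<a href="/upload/Zuzahlungsbefreit_sort_Name_101215_15212.pdf" ' \
       'target="_blank"><span class="pdf">Liste der zuzahlungsbefreiten ' \
       'Arzneimittel (Stand: 15. Dezember 2010)</span> </a>'

doc  = REXML::Document.new(html)
span = REXML::XPath.first(doc, "//span[@class='pdf']")

# The span carries the class, its parent <a> carries the href.
anchor = span && span.parent
href = anchor && anchor.name == 'a' ? anchor.attributes['href'] : nil
p href
```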

Experiment lib/oddb/import/gkv.rb#latest_url

  def latest_url agent, opts={}
    host = 'https://www.gkv-spitzenverband.de'
    url = '/Befreiungsliste_Arzneimittel_Versicherte.gkvnet'
    page = agent.get host + url
print "page/'span[@class=pdf]'="
p page/'span[@class=pdf]'
link = (page/'span[@class=pdf]').first.attributes['href']
print  "link="
p link
    if link = (page/'a[@class=pdf]').first
      host + link.attributes["href"]
    end
  end

Run jobs/import_gkv

Result

page/'span[@class=pdf]'=[#<Nokogiri::XML::Element:0x3fcc4c2ff884 name="span" attributes=[#<Nokogiri::XML::Attr:0x3fcc4c2ff49c name="class" value="pdf">] children=[#<Nokogiri::XML::Text:0x3fcc4c2ff064 "Liste der zuzahlungsbefreiten Arzneimittel (Stand: 15. Dezember 2010)">]>]
link=nil

Note

  • The span tag is grabbed, but it does not carry the link information I expected

Design

  • Select the link by its file name rather than by the 'class' attribute

Experiment

lib/oddb/import/gkv.rb#latest_url

  def latest_url agent, opts={}
    host = 'https://www.gkv-spitzenverband.de'
    url = '/Befreiungsliste_Arzneimittel_Versicherte.gkvnet'
    page = agent.get host + url
file_base_name = "Zuzahlungsbefreit"
link = (page/'a').map{|tag| tag['href']}.grep(/#{file_base_name}/)
p link.length
p link.to_s
    if link = (page/'a[@class=pdf]').first
      host + link.attributes["href"]
    end
  end

Result

1
"/upload/Zuzahlungsbefreit_sort_Name_101215_15212.pdf"

Note

  • Good!
  • link.length should be checked, in my opinion

Reference

Experiment

lib/oddb/import/gkv.rb#latest_url

  def latest_url agent, opts={}
    host = 'https://www.gkv-spitzenverband.de'
    url = '/Befreiungsliste_Arzneimittel_Versicherte.gkvnet'
    page = agent.get host + url
file_base_name = "Zuzahlungsbefreit"
link = (page/'a').map{|tag| tag['href']}.grep(/#{file_base_name}/)
    if link.length == 1 and link.to_s.match(/\.pdf/)
      return host + link.to_s
    else
      return nil
    end

  end
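
The selection rule can also be sketched as a small pure function, which makes the "exactly one match, ending in .pdf" check easy to test without fetching the page. The helper name pick_pdf_href is hypothetical. Note that Array#to_s joins the elements in Ruby 1.8 but returns the inspect form from Ruby 1.9 on, so the sketch takes the single element with first instead of calling to_s on the array.

```ruby
# Sketch only: pick_pdf_href is a hypothetical helper name.
# Returns the href when exactly one link matches the base name and ends in .pdf.
def pick_pdf_href(hrefs, base_name = "Zuzahlungsbefreit")
  matches = hrefs.compact.grep(/#{Regexp.escape(base_name)}/)
  return nil unless matches.length == 1
  href = matches.first
  href =~ /\.pdf\z/ ? href : nil
end

hrefs = ["/index.html", nil, "/upload/Zuzahlungsbefreit_sort_Name_101215_15212.pdf"]
p pick_pdf_href(hrefs)
p pick_pdf_href(hrefs + ["/upload/Zuzahlungsbefreit_alt.pdf"])
```

The first call prints the PDF path; the second prints nil because two links match and the choice would be ambiguous.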

Check test case

masa@masa ~/ywesee/de.oddb.org $ ruby test/import/test_gkv.rb 
Loaded suite test/import/test_gkv
Started
.........
Finished in 0.092772 seconds.

9 tests, 44 assertions, 0 failures, 0 errors

Run jobs/import_gkv

Result

Thu Dec 16 13:46:06 2010: de.oddb.org ODDB::Import::Gkv#import
Imported  6618 Zubef-Entries on 16.12.2010:
Visited   6576 existing Zubef-Entries
Visited   6616 existing Companies
Visited   1033 existing Substances
Created     42 new Zubef-Entries
Created      2 new Products
Created      4 new Sequences
Created      2 new Companies
Created      1 new Substances
Assigned     7 Chemical Equivalences
Assigned     5 Companies
Created      1 Incomplete Packages:
http://de.oddb.org/de/drugs/package/pzn/6811219
Created      1 Product(s) without a name (missing product name):
http://de.oddb.org/de/drugs/product/uid/3480899

Note

  • Good

Note

  • I do not have to restart de.oddb.org, even after updating the code

Commit

Run jobs/import_gkv on de.oddb.org server

/var/www/de.oddb.org $ sudo -u apache jobs/import_gkv 

Result

Thu Dec 16 14:37:44 2010: de.oddb.org ODDB::Import::Gkv#import
Imported  6618 Zubef-Entries on 16.12.2010:
Visited   6576 existing Zubef-Entries
Visited   6616 existing Companies
Visited   1033 existing Substances
Created     42 new Zubef-Entries
Created      2 new Products
Created      4 new Sequences
Created      2 new Companies
Created      1 new Substances
Assigned     7 Chemical Equivalences
Assigned     5 Companies
Created      1 Incomplete Packages:
http://de.oddb.org/de/drugs/package/pzn/6811219
Created      1 Product(s) without a name (missing product name):
http://de.oddb.org/de/drugs/product/uid/3480899

Update bbmb.ch separating email config and change smtp server

We need to replace the email method of bbmb.ch so that we can send email through the Google SMTP server instead of through localhost.

Design

  • I have to separate the account data from the code that sends the mail
  • The structures of de.oddb.org, bbmb.ch and oddb.org are helpful references
  • But for now, we leave most of the code unchanged and
  • just take the account information from bbmb.yaml
  • This is the simplest solution

Check mail sending process bbmb.ch and de.oddb.org

bbmb.ch

Note

de.oddb.org

Notes

  • Application default parameters are stored in lib/oddb/config.rb
    • The same applies to oddb.org and bbmb (ver.2)
    • The default smtp server and account are also stored in lib/oddb/config.rb
    • The actual parameters are in de.oddb.org/etc/oddb.yml or /etc/oddb/oddb.yml
    • Email addresses are not stored in lib/oddb/config.rb
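
The pattern in these notes, built-in defaults overridden by a YAML file, can be sketched roughly as follows. The key names, the default values and the helper load_config are illustrative assumptions; they are not taken from lib/oddb/config.rb.

```ruby
require 'yaml'

# Sketch only: the keys and default values below are illustrative assumptions.
DEFAULTS = {
  'smtp_server' => 'localhost',
  'smtp_port'   => 25,
  'smtp_user'   => nil,
  'smtp_pass'   => nil,
}

# Values from the YAML text override the built-in defaults key by key.
def load_config(yaml_text)
  overrides = YAML.load(yaml_text) || {}
  DEFAULTS.merge(overrides)
end

config = load_config("smtp_server: smtp.gmail.com\nsmtp_user: user@example.com\n")
p config['smtp_server']
p config['smtp_port']
```

In a real application the text would come from a file, for example via YAML.load_file, so that the account data stays out of the source code.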

Reference

Test net/smtp locally

test.rb


require 'net/smtp'
require 'smtp_tls'
require 'rmail'


def make_header(subject, from, recipients, charset="utf8")
  mpart = RMail::Message.new
  header = mpart.header
  header.to = recipients
  header.from = from
  header.subject = subject
  header.date = Time.now
  header.add('Content-Type', 'text/plain', nil,
             'charset' => charset)
  return mpart
end

def sendmail(subject, message, from, recipients, charset="utf8")
  mpart = make_header(subject, from, recipients, charset)
  mpart.body = message.join("\n")

  smtp_server = "smtp.gmail.com"
  smtp_port = 25
  smtp_domain = "ywesee.com"
  smtp_user = from
  smtp_pass = "ppp"
  smtp_authtype = :plain

  Net::SMTP.start(smtp_server, smtp_port,
                  smtp_domain, smtp_user, smtp_pass,
                  smtp_authtype) do |smtp|
    recipients.each do |recipient|
      smtp.sendmail(mpart.to_s, from, recipient)
    end
  end
end


recipients = ["mhatakeyama@ywesee.com"]
from = "mhatakeyama@ywesee.com"
message = ["hello, world", "by masa"]
subject = "test mail"
charset = "utf8"


sendmail(subject, message, from, recipients, charset)

Result

hello, world
by masa

Note

  • Good

Summary

Information needed by net/smtp

  • smtp_server
  • smtp_port
  • smtp_domain
  • smtp_user = mail_from
  • smtp_pass
  • smtp_authtype

For mail header and body

  • subject
  • message (array)
  • mail_from
  • recipients (array)
  • charset

Libraries

  • net/smtp
  • rmail
  • smtp_tls
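
To show how the header and body inputs listed above fit together, here is a minimal sketch that assembles a plain-text message by hand. The helper build_message is hypothetical and is not the rmail API used in test.rb; it also assumes ASCII-only header values, since non-ASCII subjects would need additional encoding.

```ruby
require 'time'

# Sketch only: build_message is a hypothetical helper, not the rmail API.
def build_message(subject, message, from, recipients, charset = "utf-8", date = Time.now)
  headers = [
    "To: #{recipients.join(', ')}",
    "From: #{from}",
    "Subject: #{subject}",
    "Date: #{date.rfc2822}",
    "Content-Type: text/plain; charset=#{charset}",
  ]
  # A blank line separates the headers from the body.
  headers.join("\r\n") + "\r\n\r\n" + message.join("\r\n")
end

puts build_message("test mail", ["hello, world", "by masa"],
                   "mhatakeyama@ywesee.com", ["mhatakeyama@ywesee.com"])
```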
Page last modified on July 13, 2011, at 12:05 PM