
20100909 jobs/import_gkv



  1. Review setting de.oddb.org
  2. Confirm import_gkv error
  3. Check code one by one
  4. Trace ODDB::Import::Gkv#latest_url
  5. Search html source
  6. Correct test case
  7. Confirm import_gkv
  8. Error report in Gkv suspend

Goal
  • jobs/import_gkv / 80%
Milestones
  1. Review setting de.oddb.org
  2. Confirm de.oddb.org running (by web browser)
  3. Confirm import_gkv error No error output 9:00
  4. Check code one by one 10:30
  5. Correct test case
  6. Confirm import_gkv running
  7. Error-report method in gkv
    1. Read code import_dimdi
Summary
Commits
  1. Updated Gkv#latest_url and its test case.
ToDo Tomorrow
  1. 20100907 Ticket 242, Comment 9 - notification Mail
  2. Dimdi-Import-Error.
Keep in Mind
Attached Files
  1. Attach:gem_list_productionserver.txt
  2. Attach:gem_list_myfuntoo.txt

Review setting de.oddb.org

link

  1. http://dev.ywesee.com/wiki.php/Masa/20100903-setting-deoddborg
$ sh setup.de.oddb.org
$ sh setup.de.postgresql.org

Confirm import_gkv error

masa@masa ~/ywesee/de.oddb.org/jobs $ ruby import_gkv 
***
/home/masa/ywesee/de.oddb.org/lib/oddb/html/view/drugs/package.rb:373: warning: parenthesize argument(s) for future version

No error output

Check code one by one

It looks like this line does not work:

 /lib/oddb/util/updater.rb
 if url = importer.latest_url(Mechanize.new, opts)

http://scm.ywesee.com/?p=de.oddb.org/.git;a=blob;f=lib/oddb/util/updater.rb;h=9681513c931ba4c23527a6ac369fe841223a0e08;hb=HEAD#l56

Notes

  1. opts == {}
  2. importer.latest_url(Mechanize.new, opts) becomes nil
  3. importer.class == ODDB::Import::Gkv

Trace ODDB::Import::Gkv#latest_url

lib/oddb/import/gkv.rb

  def latest_url agent, opts={}
    host = 'https://www.gkv-spitzenverband.de'
    url = '/Befreiungsliste_Arzneimittel_Versicherte.gkvnet'
    page = agent.get host + url
    if link = (page/'span[\@class=pdf]/a').first
      host + link.attributes["href"]
    end
  end

The line

  if link = (page/'span[\@class=pdf]/a').first

evaluates to nil. That is why import_gkv does not work.

Note

  1. page.class == Mechanize::Page
  2. (page/'span[\@class=pdf]/a').first becomes nil

Explain

  1. page corresponds to https://www.gkv-spitzenverband.de/Befreiungsliste_Arzneimittel_Versicherte.gkvnet
  2. Mechanize parses the page HTML and searches for the link inside a span tag with class=pdf
  3. For example, given <span class=pdf><a href="hogehoge">abc</a></span>, the href of link is 'hogehoge'

Test

require 'mechanize'

def setup_page(url, html, mech=nil)
  response = {'content-type' => 'text/html'}
  Mechanize::Page.new(URI.parse(url), response, html, 200, mech)
end


agent = Mechanize.new
url = "http://www.hogehoge.com"
html = "<html><body><span class=pdf><a href='hogehoge'>abc</a></span></body></html>"
page = setup_page url, html, agent


p page.search("span").inner_text
(page/'span[@class=pdf]/a').each do |item|
    p item.inner_text
end
link = (page/'span[@class=pdf]/a').first
p link.attributes["href"]

Result

masa@masa ~/work $ ruby test.rb 
"abc"
"abc"
#<Nokogiri::XML::Attr:0x3fb089168068 name="href" value="hogehoge">

Search html source

https://www.gkv-spitzenverband.de/Befreiungsliste_Arzneimittel_Versicherte.gkvnet

There is no <span class="pdf"> tag in the page above, but there is an <a ... class="pdf"> tag.

Test

require 'mechanize'

agent = Mechanize.new

url = "https://www.gkv-spitzenverband.de/Befreiungsliste_Arzneimittel_Versicherte.gkvnet"
page = agent.get url
link = (page/'a[@class=pdf]').first
p link.attributes["href"]

Result

masa@masa ~/work $ ruby test.rb 
#<Nokogiri::XML::Attr:0x3fa8944058cc name="href" value="/upload/Zuzahlungsbefreit_sort_Name_100901_14383.pdf">

Summary

  1. The HTML source at https://www.gkv-spitzenverband.de/Befreiungsliste_Arzneimittel_Versicherte.gkvnet has changed
  2. That is why import_gkv (in particular, the latest_url method) does not run
    1. More specifically, (page/'span[\@class=pdf]/a') matches nothing, so .first returns nil
  3. The selector should now be (page/'a[\@class=pdf]')
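
A minimal sketch of how the corrected lookup could be written, assuming the link is matched with 'a[@class=pdf]' as in the test above. The fallback to the old span selector is an illustration only, not what was committed. FakePage and FakeAgent are hypothetical stand-ins so that the sketch runs without network access; with a real agent, Mechanize::Page would answer page.search.

```ruby
require 'uri'

GKV_HOST = 'https://www.gkv-spitzenverband.de'
GKV_PATH = '/Befreiungsliste_Arzneimittel_Versicherte.gkvnet'

# Selectors tried in order: the new page layout first, then the old one.
# (The fallback list is an assumption made for this illustration.)
GKV_SELECTORS = ['a[@class=pdf]', 'span[@class=pdf]/a']

# Returns the absolute URL of the first PDF link, or nil if nothing matches.
def latest_url(agent, opts = {})
  page = agent.get(GKV_HOST + GKV_PATH)
  GKV_SELECTORS.each do |selector|
    link = page.search(selector).first
    return URI.join(GKV_HOST, link['href']).to_s if link
  end
  nil
end

# Hypothetical stand-in for Mechanize::Page (offline demo only).
class FakePage
  def initialize(links)
    @links = links
  end

  def search(selector)
    @links.fetch(selector, [])
  end
end

# Hypothetical stand-in for a Mechanize agent (offline demo only).
class FakeAgent
  def initialize(page)
    @page = page
  end

  def get(_url)
    @page
  end
end

href = '/upload/Zuzahlungsbefreit_sort_Name_100901_14383.pdf'
new_layout = FakeAgent.new(FakePage.new({ 'a[@class=pdf]' => [{ 'href' => href }] }))
no_link    = FakeAgent.new(FakePage.new({}))

puts latest_url(new_layout)
p latest_url(no_link)
```

Returning nil when no selector matches keeps the behaviour that updater.rb relies on in its `if url = importer.latest_url(...)` check.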

Correct test case

Confirm

masa@masa ~/ywesee/de.oddb.org $ ruby test/import/test_gkv.rb 
Loaded suite test/import/test_gkv
Started
........
Finished in 0.0906060000000001 seconds.

8 tests, 43 assertions, 0 failures, 0 errors

Commit Updated Gkv#latest_url and its test case.

Confirm import_gkv

It appears to run, but I got an error

WARNING:  nonstandard use of \' in a string literal
ZEILE 2:           VALUES (214062, 'pfizer ltd. \'', 66064)
                                   ^
TIP:  Use '' to write quotes in strings, or use the escape string syntax (E'...').
WARNING:  nonstandard use of \' in a string literal
ZEILE 2:           VALUES (214062, 'pfizer ltd. \'', 133820)
                                   ^
TIP:  Use '' to write quotes in strings, or use the escape string syntax (E'...').
WARNING:  nonstandard use of \' in a string literal
ZEILE 2:           VALUES (214062, 'pfizer ltd. \'', 163939)
                                   ^
TIP:  Use '' to write quotes in strings, or use the escape string syntax (E'...').
WARNING:  nonstandard use of \' in a string literal
ZEILE 2:           VALUES (214062, 'pfizer ltd. \'', 168431)
                                   ^
TIP:  Use '' to write quotes in strings, or use the escape string syntax (E'...').
WARNING:  nonstandard use of \' in a string literal
ZEILE 2:           VALUES (214062, 'pfizer ltd. \'', 150549)
                                   ^
TIP:  Use '' to write quotes in strings, or use the escape string syntax (E'...').
WARNING:  nonstandard use of \' in a string literal
ZEILE 2:           VALUES (214062, 'pfizer ltd. \'', 153327)
                                   ^
TIP:  Use '' to write quotes in strings, or use the escape string syntax (E'...').
WARNING:  nonstandard use of \' in a string literal
ZEILE 2:           VALUES (214062, 'pfizer ltd. \'', 150536)
                                   ^
TIP:  Use '' to write quotes in strings, or use the escape string syntax (E'...').
WARNING:  nonstandard use of \' in a string literal
ZEILE 2:           VALUES (214062, 'pfizer ltd. \'', 150539)
                                   ^
TIP:  Use '' to write quotes in strings, or use the escape string syntax (E'...').
E, [2010-09-09T12:37:58.026660 #6373] ERROR -- Gkv: both user and secret are required
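
The PostgreSQL warnings above concern a backslash-escaped quote (\') inside a string literal; the TIP suggests doubling the quote instead. A minimal sketch of that quoting rule follows. The helper name sql_quote is hypothetical, and this log does not show where de.oddb.org builds the statement. In real code the database driver's own escaping or bound parameters would be preferable.

```ruby
# Hypothetical helper: wrap a value in single quotes and double every
# embedded single quote, which is the standard SQL way to escape it.
def sql_quote(value)
  "'" + value.to_s.gsub("'", "''") + "'"
end

puts sql_quote("pfizer ltd. '")
puts sql_quote('plain')
```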

Search the error message 'both user and secret are required'

masa@masa ~/ywesee/de.oddb.org $ grep -r "both user and secret are required" *
README: /usr/lib/ruby/1.8/net/smtp.rb:562:in `check_auth_args': both user and secret are required (ArgumentError)

This is Davatz-san's log in the README.

 There must be a configuration somewhere that sets the Mail-sending method.

Find where the error is raised

masa@masa /usr/lib/ruby/1.8 $ grep -r "both user and secret are required" *
net/smtp.rb:      raise ArgumentError, 'both user and secret are required'\

Look at /usr/lib/ruby/1.8/net/smtp.rb

Add debug output with p

    def check_auth_args( user, secret, authtype )
# masa
p "get-in check_auth_args"
print caller(0).pretty_inspect.join("\n").to_s,"\n"
      raise ArgumentError, 'both user and secret are required'\
                      unless user and secret
      auth_method = "auth_#{authtype || 'cram_md5'}"
      raise ArgumentError, "wrong auth type #{authtype}"\
                      unless respond_to?(auth_method, true)
    end

import_gkv takes about one hour.

It looks like Gkv#import takes a long time.

  def import fh, opts={}
    parser = Rpdf2txt::Parser.new(fh.read, 'utf8')
    handler = GkvHandler.new method(:process_page)
    parser.extract_text handler

In particular, parser.extract_text handler takes over 30 minutes.
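
To see which step is slow without scattering numbered p calls, each step could be wrapped in a small timing helper. This is only a sketch: the helper name timed is hypothetical, and the sleep calls stand in for the real work (creating the parser and calling parser.extract_text).

```ruby
# Hypothetical helper: run the block, print how long it took, return its result.
def timed(label)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  puts format('%-20s %.3f s', label, elapsed)
  result
end

# Stand-ins for the real steps in Gkv#import.
parser = timed('Parser.new') { sleep 0.01; :parser }
text   = timed('extract_text') { sleep 0.05; 'extracted text' }
```

Because timed returns the block's value, it can be wrapped around an existing call without changing the surrounding code.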

Result

"get-in Updater.import_gkv"
"a"
"b"
"get-in Gkv#download_latest"
"11"
"12"
"13"
"14"
"15"
"16"
"17"
"18"
"19"
"20"
"21"
"c"
"get-in Updater._reported_import"
"A"
"get-in Gkv#import"
"1"
"2"
WARNING:  nonstandard use of \' in a string literal
ZEILE 2:           VALUES (214062, 'pfizer ltd. \'', 66064)
                                   ^
TIP:  Use '' to write quotes in strings, or use the escape string syntax (E'...').
WARNING:  nonstandard use of \' in a string literal
ZEILE 2:           VALUES (214062, 'pfizer ltd. \'', 133820)
                                   ^
TIP:  Use '' to write quotes in strings, or use the escape string syntax (E'...').
WARNING:  nonstandard use of \' in a string literal
ZEILE 2:           VALUES (214062, 'pfizer ltd. \'', 163939)
                                   ^
TIP:  Use '' to write quotes in strings, or use the escape string syntax (E'...').
WARNING:  nonstandard use of \' in a string literal
ZEILE 2:           VALUES (214062, 'pfizer ltd. \'', 168431)
                                   ^
TIP:  Use '' to write quotes in strings, or use the escape string syntax (E'...').
WARNING:  nonstandard use of \' in a string literal
ZEILE 2:           VALUES (214062, 'pfizer ltd. \'', 150549)
                                   ^
TIP:  Use '' to write quotes in strings, or use the escape string syntax (E'...').
WARNING:  nonstandard use of \' in a string literal
ZEILE 2:           VALUES (214062, 'pfizer ltd. \'', 153327)
                                   ^
TIP:  Use '' to write quotes in strings, or use the escape string syntax (E'...').
WARNING:  nonstandard use of \' in a string literal
ZEILE 2:           VALUES (214062, 'pfizer ltd. \'', 150536)
                                   ^
TIP:  Use '' to write quotes in strings, or use the escape string syntax (E'...').
WARNING:  nonstandard use of \' in a string literal
ZEILE 2:           VALUES (214062, 'pfizer ltd. \'', 150539)
                                   ^
TIP:  Use '' to write quotes in strings, or use the escape string syntax (E'...').
"3"
"4"
"B"
"get-in check_auth_args"
E, [2010-09-09T16:23:02.892603 #9785] ERROR -- Gkv: undefined method `join' for #<String:0x7f472212eb78>
"d"
"e"

Consideration

  1. Oh, I made a mistake!
  2. I should have written
    1. print caller(0).join("\n"), "\n"
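
The failure comes from pretty_inspect returning a single String, which has no join method, whereas caller(0) itself returns an Array of strings. A small check, using a hypothetical method name, shows the difference:

```ruby
require 'pp'

# Hypothetical method that returns its own call stack.
def current_stack
  caller(0)
end

stack = current_stack
p stack.class                              # Array
p stack.pretty_inspect.class               # String
p stack.pretty_inspect.respond_to?(:join)  # false

# Print one stack frame per line.
print stack.join("\n"), "\n"
```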

Run again

I will check the result tomorrow.

Page last modified on July 13, 2011, at 12:03 PM