
20110131-debug-fach_news-oddb_org



  1. Check Fach-Patinfo error (oddb.org)
  2. Update Fach-Patinfo news jobs

Goal
  • Debug fach patinfo news / 100%
Milestones
  1. Check error 8:30
  2. Trace and understand the code
Summary

Memo

  • The data-source URLs used by oddb.org should be kept in one file or database
  • They should also be summarized in a document with pictures

Words

  • Wirkstoff: Active ingredient
  • Fachinformation: Information for doctors (professional information)
  • Patienteninformation: Patient information
Commits
ToDo Tomorrow
Keep in Mind
  1. Check drug data structure (class diagram) in ch.oddb.org, de.oddb.org, and ramaze oddb
  2. Check rdbi instead of dbi for ODBA
  3. Encoding woes (from Davatz-san)
  4. Feedback: This option indicates that the regular expression is parsed as 'UTF8' (from Davatz-san)
  5. pg on Ubuntu - see http://dev.ywesee.com/wiki.php/Gem/Pg (from Davatz-san)
  6. On Ice
  7. emerge --sync

Check Fach-Patinfo error (oddb.org)

Email ch.ODDB.org Report - Error: Fach- und Patienteninfo News - 01/2011

Check message

src/plugin/text_info.rb#fachinfo_news

    def fachinfo_news agent=init_agent
      url = ODDB.config.text_info_newssource \
        or raise 'please configure ODDB.config.text_info_newssource to proceed'
      ids = []
      page = agent.get url
...

Check oddb.yml

 ...
 text_info_searchform: same
 text_info_newssource: different


Next

  • Find the updated URL

Trace and understand the code (process)

src/plugin/text_info.rb#fachinfo_news

    def fachinfo_news agent=init_agent
      url = ODDB.config.text_info_newssource \
        or raise 'please configure ODDB.config.text_info_newssource to proceed'
      ids = []
      page = agent.get url
      page.links.each do |link|
        if id = extract_fachinfo_id(link.href)
          ids.push [id, link.text.gsub(/;$/, '')]
        end
      end
      ids
    end

Notes

  • This method only collects the Fachinformation IDs from the links on the news page, stores each ID with its link text in the 'ids' Array, and returns that Array

Next

  • Check 'extract_fachinfo_id' method

src/plugin/text_info.rb#extract_fachinfo_id

    def extract_fachinfo_id href
      fi_ptrn = /Monographie.aspx\?Id=([0-9A-Fa-f\-]{36}).*MonType=fi/u
      if match = fi_ptrn.match(href.to_s)
        match[1]
      end
    end

Notes

  • This method extracts the ID from the link target (href); it returns nil if the href is not a Fachinformation link
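
To make the two methods above concrete, here is a minimal sketch that runs the same pattern against a few made-up links. The 'Link' struct only stands in for a Mechanize link object, 'collect_fachinfo_ids' is a hypothetical name, and the hrefs are invented for illustration (the real link format may differ).

```ruby
# Stand-in for a Mechanize link; only href and text are needed here.
Link = Struct.new(:href, :text)

# Return the 36-character ID if the href points to a Fachinfo, else nil.
def extract_fachinfo_id(href)
  pattern = /Monographie.aspx\?Id=([0-9A-Fa-f\-]{36}).*MonType=fi/
  match = pattern.match(href.to_s)
  match && match[1]
end

# Collect [id, name] pairs, dropping a trailing semicolon from the link text.
def collect_fachinfo_ids(links)
  links.each_with_object([]) do |link, ids|
    id = extract_fachinfo_id(link.href)
    ids.push([id, link.text.sub(/;$/, '')]) if id
  end
end

# Hypothetical links for illustration only.
links = [
  Link.new('Monographie.aspx?Id=0f23f0bd-7762-40e9-ac8e-39b2e04ca3dc&lang=de&MonType=fi', 'Abilify;'),
  Link.new('Monographie.aspx?Id=6a80a9c3-d194-405f-a8c4-0fdc21a72062&lang=de&MonType=pi', 'Amavita ASS 500'),
  Link.new('index.aspx', 'Home'),
]

p collect_fachinfo_ids(links)
```

Only the first link is kept, because the pattern requires 'MonType=fi' after the ID.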

Memo

  • text_info_searchform still seems to work
  • text_info_newssource no longer works

Memo

  • The 'Kompendium' is published once a year, but the 'Supplementum' is published every three months (four times a year)

Experiment (test fachinfo_news method test.rb)

Result

masa@masa ~/work $ ruby test.rb
[["0f23f0bd-7762-40e9-ac8e-39b2e04ca3dc",
  "Abilify\302\256 Injektionsl\303\266sung"],
 ["6a80a9c3-d194-405f-a8c4-0fdc21a72062", "Amavita ASS 500"],
 ["ab817525-5754-4118-8bc4-67ff30be4881", "Amavita Bisacodyl"],
 ["edcf285c-68f3-4113-9c37-717d3a70838a", "Amavita Cetirizin"],
 ["093cdd8b-593e-473a-8005-69481e56ba87",
  "Amavita Sport- und Rheumabalsam w\303\244rmend"],
...

Next

  • Where is the fachinfo_news method called from?

Check the flow (Fach- und Patienteninfo News job)

8. src/plugin/text_info.rb#fachinfo_news
7. src/plugin/text_info.rb#import_news
6. src/util/updater.rb#update_textinfo_news
5. src/util/updater.rb#update_fachinfo
4. src/util/updater.rb#run_random
3. src/util/oddbapp.rb#run_random_updater
2. src/util/oddbapp.rb#reset
1. src/util/oddbapp.rb#initialize
Notes

Update Fach-Patinfo news jobs

Algorithm

  1. Get the updated list of 'Neue und veränderte Fachinformationen' (once a week)
  2. Look up each Fach- and Patienteninformation on the search page
  3. Compare the registration date with the database
  4. Then update the data if it is new
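
Steps 3 and 4 of this algorithm could be sketched as follows. This is only an illustration of the idea: the 'needs_update?' helper, the 'stored' and 'online' hashes, and all dates are hypothetical and not part of oddb.org.

```ruby
require 'date'

# Hypothetical: 'stored' maps a product name to the registration date in the database.
# A product needs an update if it is unknown or its online date is newer.
def needs_update?(stored, name, online_date)
  known = stored[name]
  known.nil? || online_date > known
end

stored = {
  'Abilify'         => Date.new(2010, 6, 1),
  'Amavita ASS 500' => Date.new(2011, 1, 10),
}
online = {
  'Abilify'           => Date.new(2011, 1, 15), # newer than stored
  'Amavita ASS 500'   => Date.new(2011, 1, 10), # unchanged
  'Amavita Bisacodyl' => Date.new(2011, 1, 20), # not yet stored
}

to_update = online.select { |name, date| needs_update?(stored, name, date) }.keys
p to_update
```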

Check the current update process (fachinfo_news, etc.)

Experiment (get new Fachinformation list (get_fach.rb))

Check the source HTML

<div id="blockContentInner"><!--CONTENT:START--><p>Abilify® <br/>Abilify® Injektionslösung ...

Result

masa@masa ~/work $ ruby get_fachinfo.rb
"Abilify\302\256"
"Abilify\302\256 Injektionsl\303\266sung"
"Abseamed\302\256"
"Aceril\302\256- mite"
"AcetaPhos\302\256 750 mg"
"Acimethin\302\256"
...

References

Read the code after getting the update list

src/plugin/text_info.rb#import_news

    def import_news agent=init_agent
      old_news = old_fachinfo_news
      updates = true_news fachinfo_news(agent), old_news
      updates.reverse!
      indices, names = updates.transpose
      if names
        import_name names, agent
        log_news updates + old_news
        postprocess
      end
      !updates.empty?
    end

Next

  • 'import_name': this is the main part for the updating
  • 'log_news': log
  • 'postprocess': ???
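
The 'if names' guard in 'import_news' relies on how Array#transpose and multiple assignment behave when there are no updates. A small sketch with made-up data:

```ruby
# With at least one [id, name] pair, transpose yields two parallel arrays.
updates = [['id-1', 'Abilify'], ['id-2', 'Amavita ASS 500']]
indices, names = updates.transpose
p indices
p names

# With no updates, transpose returns [], so both variables become nil
# and a guard like `if names` is skipped.
empty_indices, empty_names = [].transpose
p empty_names
```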

src/plugin/text_info.rb#import_name

    def import_name terms, agent=init_agent
      @search_term = terms.to_a.join ', '
      terms.to_a.each do |term|
        @current_search = [:search_product, term]
        page = search_product term, agent
        import_products page, agent
      end
    end

Note

  • The argument is the 'names' Array of products to be updated
  • @search_term is used later for reporting

Next

  • 'search_product': probably searches for the product using the search function on the website
    • the return value is the page returned by the search (a Mechanize object)
  • 'import_products':

src/plugin/text_info.rb#search_product

    def search_product name, agent
      search 'rbPraeparat', name, agent
    end

Note

  • 'name' is one element of the 'names' Array

Next

  • 'search' method

src/plugin/text_info.rb#search

    def search type, term, agent
      page = init_searchform agent
      form = page.form_with :name => 'frmSearchForm'
      unless type == 'rbPraeparat' ## default value, clicking leads to an error
        form.radiobutton_with(:value => type).click
      end
      form['txtSearch'] = term
      agent.submit form
    end

Note

  • This is the important part: it drives the search function of the website

Next

  • 'init_searchform'

Memo

  • Online data is never compared directly with the data in the database
  • The only comparison happens in the first two lines of the 'import_news' method
      old_news = old_fachinfo_news
      updates = true_news fachinfo_news(agent), old_news
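
The body of 'true_news' is not shown on this page. One plausible way to do such a comparison is a plain Array difference; the sketch below is an assumption for illustration only, not the actual implementation, and 'new_entries' is a hypothetical name.

```ruby
# Hypothetical helper: keep only the [id, name] pairs not seen in the previous run.
def new_entries(current, previous)
  current - previous
end

# Made-up data for illustration.
previous = [['id-1', 'Abilify']]
current  = [['id-2', 'Amavita ASS 500'], ['id-1', 'Abilify']]

p new_entries(current, previous)
```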

src/plugin/text_info.rb#init_searchform

    def init_searchform agent
      url = ODDB.config.text_info_searchform \
        or raise 'please configure ODDB.config.text_info_searchform to proceed'
      page = agent.get(url)
      form, = page.form_with :name => 'frmNutzungsbedingungen'
      if form
        if btn = form.button_with(:name => 'btnAkzeptieren')
          page = agent.submit form, btn
        end
      end
      page
    end

Note

  • 'ODDB.config.text_info_searchform' is stored in etc/oddb.yml file
  • This URL has not changed
  • I cannot find a form tag named 'frmNutzungsbedingungen', but the code still seems to work (if the form is missing, the 'if form' guard simply returns the page unchanged)
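
The 'or raise' idiom used for both config keys can be tried out in isolation. In this sketch a plain Hash loaded from an inline YAML string stands in for 'ODDB.config'; the 'fetch_url' helper and the example URL are hypothetical.

```ruby
require 'yaml'

# Inline stand-in for etc/oddb.yml; the URL is a placeholder.
config = YAML.safe_load(<<~YML)
  text_info_searchform: http://example.com/search
YML

# Hypothetical helper mirroring the `value or raise` idiom of the plugin.
def fetch_url(config, key)
  config[key] or raise "please configure ODDB.config.#{key} to proceed"
end

puts fetch_url(config, 'text_info_searchform')

# A missing key raises a RuntimeError with a hint about what to configure.
begin
  fetch_url(config, 'text_info_newssource')
rescue RuntimeError => e
  puts e.message
end
```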

Next

  • 'import_products'

src/plugin/text_info.rb#import_products

    def import_products page, agent
      fi_sources = identify_eventtargets page, /dtgFachinfo/
      pi_sources = identify_eventtargets page, /dtgPatienteninfo/
      form = page.form_with :name => /frmResult(Produkte|hForm)/
      fi_sources.sort.each do |name, eventtarget|
        import_product name, agent, form, eventtarget, pi_sources[name]
      end
    end

Note

  • Some products have no 'pi_sources[name]' value (no Patienteninformation); nil is passed in that case

Next

  • 'identify_eventtargets'

src/plugin/text_info.rb#identify_eventtargets

    def identify_eventtargets page, ptrn
      eventtargets = {}
      page.links_with(:href => ptrn).each do |link|
        eventtargets.store link.text, eventtarget(link.href)
      end
      eventtargets
    end

Note

  • The return value is a Hash that maps each link text to the value returned by 'eventtarget'

Next

  • 'eventtarget'

src/plugin/text_info.rb#eventtarget

    def eventtarget string
      if match = /doPostBack\('([^']+)'.*\)/.match(string.to_s)
        match[1]
      end
    end

Note

  • This method captures part of the link's href with the regular expression
  • What does it capture?

Check the source html

  • actual html code
  • The captured value is the first argument of the JavaScript function that links to the Fach- and Patienteninformation
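
To see what 'eventtarget' returns, the pattern can be run against a typical ASP.NET postback link. The href below is made up for illustration; the real control names on the page may differ.

```ruby
# Same pattern as in eventtarget: capture the first quoted argument of doPostBack.
def eventtarget(string)
  match = /doPostBack\('([^']+)'.*\)/.match(string.to_s)
  match && match[1]
end

# Hypothetical href of a result link.
href = "javascript:__doPostBack('dtgFachinfo$_ctl2$btnFachinfo','')"

p eventtarget(href)
p eventtarget('index.aspx') # no doPostBack call, so nil
```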

Note

  • fi_sources (Hash) maps the name of each Fachinformation to the first argument of its JavaScript function
  • pi_sources (Hash) does the same for each Patienteninformation

Next

  • 'import_product'

src/plugin/text_info.rb#import_product

    def import_product name, agent, form, fi_target, pi_target
      fi_paths, fi_flags = download_info :fachinfo, name, agent, form, fi_target
      if pi_target
        pi_paths, pi_flags = download_info :patinfo, name, agent, form, pi_target
      end
      update_product name, fi_paths, pi_paths || {}, fi_flags, pi_flags || {}
    end

Note

  • 'fi_target' and 'pi_target' are the first arguments of the JavaScript function, as returned by 'eventtarget'
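
When a product has no Patienteninformation, 'pi_target' is nil and 'download_info' is never called for it, so 'pi_paths' and 'pi_flags' stay nil and the '|| {}' fallbacks apply. A small sketch of that control flow, where 'fake_download' and 'collect_info' are hypothetical stand-ins and the paths are invented:

```ruby
# Hypothetical stand-in for download_info: returns [paths, flags].
def fake_download(type, name)
  [{ de: "#{type}/de/#{name}.html" }, { de: :up_to_date }]
end

# Mirrors the control flow of import_product without any network access.
def collect_info(name, fi_target, pi_target)
  fi_paths, fi_flags = fake_download(:fachinfo, name)
  if pi_target
    pi_paths, pi_flags = fake_download(:patinfo, name)
  end
  # pi_paths and pi_flags are nil when the branch above was skipped.
  [fi_paths, pi_paths || {}, fi_flags, pi_flags || {}]
end

p collect_info('Abilify', 'fi-target', nil)
p collect_info('Abilify', 'fi-target', 'pi-target')
```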

Next

  • 'download_info'
  • 'update_product'

src/plugin/text_info.rb#download_info

Experiment

src/plugin/text_info.rb#import_product

    def import_product name, agent, form, fi_target, pi_target
p "getin import_product"
      fi_paths, fi_flags = download_info :fachinfo, name, agent, form, fi_target
print "fi_paths="
p fi_paths
print "fi_flags="
p fi_flags
exit
      if pi_target
        pi_paths, pi_flags = download_info :patinfo, name, agent, form, pi_target
      end
      update_product name, fi_paths, pi_paths || {}, fi_flags, pi_flags || {}
    end

Run

  • bin/oddbd
  • ext/currency/bin/currencyd

Run update_textinfo_news

masa@masa ~/ywesee/oddb.org $ bin/admin
ch.oddb> Updater.new(self).update_textinfo_news
-> connection closed

Result

fi_paths={:de=>"/home/masa/ywesee/oddb.org/data/html/fachinfo/de/Abilify\302\256.html", :fr=>"/home/masa/ywesee/oddb.org/data/html/fachinfo/fr/Abilify\302\256.html"}
fi_flags={:de=>:up_to_date, :fr=>:up_to_date}

Note

  • This means that the HTML of the updated Fach- and Patienteninformation is saved in the data/html directory

Next

  • 'update_product'

src/plugin/text_info.rb#update_product

Note

  • This is the most important part of the update
  • By the time this method is called, the name, HTML file paths, and up-to-date flags of both Fach- and Patienteninformation have been prepared

Consideration

  • The only part that needs to change is the one that grabs the list of new Fachinformation
  • The searching and updating parts still work

Experiment

src/plugin/text_info.rb#import_news

    def import_news agent=init_agent
      old_news = old_fachinfo_news
      updates = true_news fachinfo_news(agent), old_news
      updates.reverse!
      indices, names = updates.transpose
 names = ['Abilify']
      if names
        import_name names, agent
        log_news updates + old_news
        postprocess
      end
      !updates.empty?
    end

Run

  • bin/oddbd
  • ext/currency/bin/currencyd

Run update_textinfo_news

masa@masa ~/ywesee/oddb.org $ bin/admin
ch.oddb> Updater.new(self).update_textinfo_news
-> connection closed

Result

Searched for Abilify
Stored 0 Fachinfos
Ignored 0 Pseudo-Fachinfos
Ignored 0 up-to-date Fachinfo-Texts
Stored 0 Patinfos
Ignored 0 up-to-date Patinfo-Texts

Checked 0 companies


Unknown Iks-Numbers: 0


Fachinfos without iksnrs: 0


Session failures: 0

Download errors: 0


Parse Errors: 2
druby://localhost:10002 - #<Errno::ECONNREFUSED: Connection refused - connect(2)>
druby://localhost:10002 - #<Errno::ECONNREFUSED: Connection refused - connect(2)>

Note

  • '10002' is FIPARSE_URI
 masa@masa ~/ywesee/oddb.org $ grep -r 10002 src
 src/util/oddbconfig.rb: FIPARSE_URI = "druby://localhost:10002"
  • So, I should run ext/piparse/bin/fiparsed
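
Before starting the job, one could check that something is listening on the parser port. The sketch below only tests the TCP port taken from the URI; it does not speak DRb, and 'port_open?' is a hypothetical helper, not part of oddb.org.

```ruby
require 'socket'
require 'uri'

# Hypothetical helper: true if a TCP connection to the URI's host and port succeeds.
def port_open?(uri_string, timeout = 1)
  uri = URI.parse(uri_string)
  Socket.tcp(uri.host, uri.port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError
  false
end

# Prints true if a service is listening on the FIPARSE_URI port, otherwise false.
puts port_open?('druby://localhost:10002')
```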
Page last modified on January 31, 2011, at 05:48 PM