
20121204-update-text-info-updater



Summary

  • Update TextInfoPlugin
    • get the fachinfo list from the search results by company name.
    • get fachinfo and patinfo in DE and FR.
  • next
    • parse HTML and save.

Commits

not yet

Index


Update text info plugin

  • import_news
  • import_company
  • import_companies

Job interface (currently, all new methods have "2" appended as a suffix)

new
#  * import_company2(name(s))
#  * import_companies2(company_names, target=:both, agent=nil)
#  * import_news2(agent=init_agent2)
old
#  * import_name terms, agent=init_agent
#  * import_fulltext terms, agent=init_agent
#  * import_company names, agent=nil, target=:both
#  * import_news agent=init_agent
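
While both generations of job methods coexist, a caller could prefer the "2" variant when it exists and fall back to the old one otherwise. The following is only a sketch of that idea: `call_job` and `FakePlugin` are hypothetical names made up for illustration and are not part of TextInfoPlugin.

```ruby
# Hypothetical dispatcher (an assumption, not plugin code): prefers the
# "2"-suffixed job method when the object defines it, otherwise falls back
# to the old method name.
def call_job(plugin, name, *args)
  new_name = "#{name}2"
  target = plugin.respond_to?(new_name) ? new_name : name.to_s
  plugin.public_send(target, *args)
end

# Stand-in class for illustration only; the real plugin methods do real work.
class FakePlugin
  def import_news(agent = nil)
    :old_news
  end

  def import_news2(agent = nil)
    :new_news
  end

  def import_name(terms, agent = nil)
    :old_name
  end
end

plugin = FakePlugin.new
p call_job(plugin, :import_news)          # import_news2 exists, so it is used
p call_job(plugin, :import_name, 'term')  # no import_name2, falls back to import_name
```

Once the old methods are removed and the suffix is dropped, a dispatcher like this would no longer be needed.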

Search option

Create the cookies and update them manually,
like this (setup before search):

    def init_agent2
      unless @agent
        @agent = Mechanize.new
        #@agent.user_agent = "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_4_11; de-de) AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.22"
        @agent.user_agent = 'Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20100101 Firefox/16.0'
        @agent.redirect_ok         = true
        @agent.redirection_limit   = 5 
        @agent.follow_meta_refresh = true
        # entry point
        home = @agent.get("http://#{SOURCE_HOST}/default/Desktop/de"). \
          link_with(:href => /\/home\/prof\/de/).click
        form = home.form_with(:name => 'aspnetForm')
        button = form.button_with(:name => 'ctl00$MainContent$ibOptions')
        # imitate a real click by sending random x/y click coordinates
        prng = Random.new(Time.new.to_i)
        button.x = prng.rand(5..55).to_s
        button.y = prng.rand(5..55).to_s
        @agent.pre_connect_hooks << lambda do |agent, request|
          agent.request_headers['Referer']    = "http://#{SOURCE_HOST}/home/prof/de"
          agent.request_headers['Connection'] = 'keep-alive'
          agent.request_headers['Host']       = SOURCE_HOST
          agent.request_headers['Cookie']     = @agent.cookies.join(';')
        end 
        # discard this response
        form.click_button(button)
        # imitate manually setting the monographie search mode in "options"
        option = @agent.get("http://#{SOURCE_HOST}/options/de")
        form = option.form_with(:name => 'aspnetForm')
        form.radiobutton_with(:id => 'ctl00_MainContent_rblMonographie_1').check # Fachinfo
        form.radiobutton_with(:id => 'ctl00_MainContent_rblCurrentLang_0').check # DE
        form.radiobutton_with(:id => 'ctl00_MainContent_rblContentLang_0').check # DE
        form.submit(form.button_with(:name => 'ctl00$MainContent$btnSave'))
        # overwrite the cookie manually for subsequent requests
        @agent.cookie_jar.each do |cookie|
          if cookie.name =~ /^dm\.kompendium/
            cookie.value.gsub!(/isTypeResultMonographieTitle=0/, 'isTypeResultMonographieTitle=1')
            cookie.value.gsub!(/language=EN/, 'language=DE')
          end 
        end 
      end 
      @agent
    end
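
The cookie rewrite at the end of init_agent2 can be tried in isolation, without Mechanize. This is a minimal sketch: `force_fachinfo_de` is a hypothetical helper name, and the sample cookie value is made up, so the real `dm.kompendium` cookie layout may differ.

```ruby
# Hypothetical helper (not part of TextInfoPlugin): applies the same two
# substitutions that init_agent2 performs on the cookie value with gsub!,
# but returns a new string instead of mutating the argument.
def force_fachinfo_de(cookie_value)
  cookie_value
    .gsub(/isTypeResultMonographieTitle=0/, 'isTypeResultMonographieTitle=1')
    .gsub(/language=EN/, 'language=DE')
end

# Made-up sample value for illustration only.
sample = 'isTypeResultMonographieTitle=0&language=EN'
puts force_fachinfo_de(sample)
```

Values that contain neither flag pass through unchanged, so the helper is safe to apply to every matching cookie.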

Parse problem.

continue tomorrow

Page last modified on December 04, 2012, at 10:26 AM