
20120627-debug-textinfo-updater



Summary

  • Debugged the update_company_textinfos updater job.
    • Fixed encoding error on production server.
  • Updated link style of ATC-class in result list.

Commits

Index


update_company_textinfos problem

Timeout error

The update_company_textinfos job failed with a timeout error.
I could not reproduce the encoding error.

Plugin: ODDB::TextInfoPlugin
Error: Net::HTTP::Persistent::Error
Message: too many connection resets (due to Connection reset by peer - Errno::ECONNRESET) after 105 requests on 77491120

Added timeout error handling in src/plugin/text_info.rb.
The problem does not seem to occur for requests sent from Switzerland.

    def init_agent
      agent = Mechanize.new
      agent.user_agent = "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_4_11; de-de) AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.22"
->    agent.keep_alive = false
      agent
    end
    def get_with_retry(agent, url)
      page = nil 
      tried = 0 
      begin
        tried += 1
        page = agent.get(url)
      rescue Net::HTTP::Persistent::Error => e
        if e.message =~ /Timeout/i and tried < 3 
          sleep 5
          retry
        else
          raise e.message
        end 
      end 
      page
    end 

This did not solve it.
The problem occurs only with POST requests.

agent.submit

Then I tried the following.

    def init_agent
      agent = Mechanize.new
      agent.user_agent = "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_4_11; de-de) AppleWebKit/525.18 (KHTML, like Gecko) Version/3.1.2 Safari/525.22"
->    agent.keep_alive = false
->    agent.idle_timeout = 20
->    agent.read_timeout = 60
      agent
    end

The updater could handle more requests, but the same problem still occurs.
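
Note that get_with_retry above retries only when the message matches /Timeout/i, while the reported message is "too many connection resets", and that "raise e.message" turns the error into a plain RuntimeError. A minimal sketch of a more general retry, assuming the error classes are passed in by the caller. with_retry is a hypothetical helper, not part of ODDB or Mechanize. With the gem loaded, Net::HTTP::Persistent::Error could be added to the list.

```ruby
require 'timeout'

# Hypothetical helper: retries the block on the given error classes and
# re-raises the original exception once the attempts are used up.
def with_retry(errors:, attempts: 3, wait: 5)
  tried = 0
  begin
    tried += 1
    yield
  rescue *errors
    raise if tried >= attempts # bare raise keeps the class and backtrace
    sleep wait
    retry
  end
end

# Stand-in for agent.get(url) or agent.submit(form): fails twice, then succeeds.
calls = 0
result = with_retry(errors: [Errno::ECONNRESET, Timeout::Error], wait: 0) do
  calls += 1
  raise Errno::ECONNRESET if calls < 3
  :page
end
```

Whether retrying helps at all depends on why the remote side resets the connection. The sketch only shows the control flow.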

Refs

I tried to debug with a static HTML file.

Encoding problem

* jobs/update_company_textinfos
  * update_company_textinfos (src/util/updater.rb)
    * import_company (src/plugin/text_info.rb)
      * search_company
      * search
        * init_searchform
      * import_companies
        * submit_event
      * import_products
        * identify_eventtargets
        * import_product
          * update_product
->          * parse_fachinfo
->          * parse_patinfo
            * update_fachinfo
            * update_patinfo
              * store_orphaned

Debugged fiparsed via TextInfoPlugin#parse_fachinfo.

nil Element

ch.oddb> TextInfoPlugin.new(self).parse_fachinfo('/home/yasuhiro/Downloads/olanzapin.htm').name
-> Olanzapin Sandoz® Filmtabletten/Schmelztabletten
ch.oddb> TextInfoPlugin.new(self).parse_fachinfo('/home/yasuhiro/Downloads/Olanzapin-Mepha®_-_oro.html').name
-> undefined method `inner_text' for nil:NilClass

encoding

in ext/fiparse/src/textinfo_hpricot.rb

def text(elem)
    return '' unless elem
    p elem.to_s.encoding
end
#=>
  #<Encoding:UTF-8>
  #<Encoding:US-ASCII>
  #<Encoding:UTF-8>
  ..

But this does not cause an error (no exception is raised).
I tried the following versions.

  • hpricot 0.8.4
  • hpricot 0.8.6

On the local machine, everything works.
But the production server shows the following error.

ywesee@thinpower /var/www/oddb.org $ RUBYOPT="" bin/admin 
ch.oddb> TextInfoPlugin.new(self).parse_fachinfo('/home/ywesee/test.html')                        
-> invalid byte sequence in US-ASCII

I noticed that the production server has no locale set.
This is a problem for the hpricot gem.
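
A minimal sketch of how the error can arise, assuming the cause is that Ruby 1.9 derives Encoding.default_external from the locale and falls back to US-ASCII when none is set. The sample string is made up for illustration and is not taken from the real HTML file.

```ruby
# "Zadorin®" as raw UTF-8 bytes, tagged US-ASCII the way file contents
# would be if Encoding.default_external were US-ASCII (assumed cause).
raw = "Zadorin\xC2\xAE".b.force_encoding(Encoding::US_ASCII)

puts raw.encoding
puts raw.valid_encoding? # the bytes C2 AE are not valid US-ASCII

# A regexp operation on the mis-tagged string is expected to raise.
error = begin
  raw.gsub(/&(\w+);/) { '?' }
  nil
rescue ArgumentError => e
  e.message
end
puts error

# Re-tagging the same bytes as UTF-8 makes the string valid again.
fixed = raw.dup.force_encoding(Encoding::UTF_8)
puts fixed.valid_encoding?
puts fixed.gsub(/&(\w+);/) { '?' }
```

The bytes are never changed here, only the encoding label, which is also what force_encoding does in the patch.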

Then I tried the following code via bin/admin on the production server.

module Hpricot
  def self.uxs(str)
    str.to_s.force_encoding('utf-8').
        gsub(/\&(\w+);/) { [NamedCharacters[$1] || 63].pack("U*") }. # 63 = ?? (query char)
        gsub(/\&\#(\d+);/) { [$1.to_i].pack("U*") }
  end 
  class Text
    def to_s
      str = content.force_encoding('utf-8')
      Hpricot.uxs(str)
    end 
  end 
end

/path/to/ruby/lib/ruby/gems/1.9.1/gems/hpricot-0.8.4(or 6)/lib/hpricot/builder.rb
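
To see what the patched Hpricot.uxs does without loading hpricot, here is a standalone sketch of the same two substitutions. NAMED is a hypothetical two-entry stand-in for Hpricot::NamedCharacters, and decode_entities is a made-up name.

```ruby
# Hypothetical stand-in for Hpricot::NamedCharacters (entity name => code point)
NAMED = { 'reg' => 174, 'auml' => 228 }.freeze

def decode_entities(str)
  str.to_s.dup.force_encoding('utf-8').                # tag as UTF-8 before matching
    gsub(/&(\w+);/) { [NAMED[$1] || 63].pack('U*') }.  # named entity, 63 = "?" if unknown
    gsub(/&#(\d+);/) { [$1.to_i].pack('U*') }          # decimal entity
end

puts decode_entities('Zadorin&reg;')
puts decode_entities('M&#228;rz &unknown;')
```

The dup is there only so the sketch does not change the caller's string. The patch above calls force_encoding on the string directly.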

With this patch I got the expected object.

ywesee@thinpower /var/www/oddb.org $ RUBYOPT="" bin/admin 
ch.oddb> TextInfoPlugin.new(self).parse_fachinfo('/home/ywesee/test.html')
-> #<ODDB::FachinfoDocument2001:0x0000000cf88138>
ch.oddb> TextInfoPlugin.new(self).parse_fachinfo('/home/ywesee/test.html').name
-> Zadorin®
ch.oddb
commit

Update style of ATC links

Updated the style of ATC links in the result list.

commit


Page last modified on June 27, 2012, at 01:38 PM