suspend
suspend
Email Sat Jan 15 02:03:17 2011: de.oddb.org ODDB::Import::Pharma24
Sat Jan 15 02:03:17 2011: de.oddb.org ODDB::Import::Pharma24#import
Mechanize::RedirectLimitReachedError
Maximum redirect limit (20) reached
/usr/lib64/ruby/gems/1.8/gems/mechanize-1.0.0/lib/mechanize.rb:605:in `fetch_page'
/usr/lib64/ruby/gems/1.8/gems/mechanize-1.0.0/lib/mechanize.rb:611:in `fetch_page'
/usr/lib64/ruby/gems/1.8/gems/mechanize-1.0.0/lib/mechanize.rb:259:in `get'
/var/www/de.oddb.org/lib/oddb/import/pharma24.rb:132:in `search'
/var/www/de.oddb.org/lib/oddb/import/pharma24.rb:160:in `update_package'
/var/www/de.oddb.org/lib/oddb/import/pharma24.rb:20:in `import'
/var/www/de.oddb.org/lib/oddb/util/updater.rb:154:in `update_prices'
/var/www/de.oddb.org/lib/oddb/util/updater.rb:113:in `call'
/var/www/de.oddb.org/lib/oddb/util/updater.rb:113:in `_reported_import'
/var/www/de.oddb.org/lib/oddb/util/updater.rb:153:in `update_prices'
/var/www/de.oddb.org/jobs/import_pharma24:12
/var/www/de.oddb.org/lib/oddb/util/job.rb:16:in `call'
/var/www/de.oddb.org/lib/oddb/util/job.rb:16:in `run'
/var/www/de.oddb.org/jobs/import_pharma24:11
Checked 1 Packages
Updated 0 Packages
Created 0 Companies
suspend
We got a patch from Moo-san.
Read the fixed points
Commit
References
Check test-cases with Ruby1.8
masa@masa ~/ywesee/rpdf2txt $ ruby test/suite.rb
Loaded suite test/suite
Started
......................'invalid literal/lengths set' when filtering with /FlateDecode
...................................................................unknown encoding 370 0 R
.............................................
Finished in 12.683721 seconds.
134 tests, 295 assertions, 0 failures, 0 errors
Note
Check test-cases with Ruby1.9
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 test/suite.rb
test/suite.rb:26: warning: variable $KCODE is no longer effective; ignored
test/suite.rb:29:in `require': /home/masa/ywesee/rpdf2txt/test/test_pdf_object.rb:177: invalid multibyte char (US-ASCII) (SyntaxError)
/home/masa/ywesee/rpdf2txt/test/test_pdf_object.rb:174: Invalid char `\x0F' in expression
/home/masa/ywesee/rpdf2txt/test/test_pdf_object.rb:174: invalid multibyte char (US-ASCII)
/home/masa/ywesee/rpdf2txt/test/test_pdf_object.rb:174: syntax error, unexpected $end, expecting keyword_end
/Title (���)��\\���#/�-&��;S��A)
^
from test/suite.rb:29:in `block in <main>'
from test/suite.rb:28:in `foreach'
from test/suite.rb:28:in `<main>'
Note
My guess
test/test_pdf_object.rb#test_tree_node4
def test_tree_node4
  src = '
400 0 obj
<<
/Title (Î^O¿ê\)ÃìÂÞ\\<9e>åPÕT#/ûØ-&<9f>®;Sü<93>O®A)   #<= here
/Parent 399 0 R
/A 436 0 R
/Next 433 0 R
>>
endobj
'
  node = Rpdf2txt::TreeNode.new(src)
  assert_equal(400, node.oid)
  assert_equal('433 0 R', node.attributes[:next])
end
BraSt
Experiment
masa@masa ~/ywesee/rpdf2txt $ ruby -I lib bin/rpdf2txt test/data/test.pdf
untitled text Page 1 of 1 Printed: Donnerstag, 14. November 2002 14:04:29 Uhr testpdf
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib bin/rpdf2txt test/data/test.pdf
/home/masa/ywesee/rpdf2txt/lib/rpdf2txt-rockit/grammar.rb:1:in `require': /home/masa/ywesee/rpdf2txt/lib/rpdf2txt-rockit/token.rb:138: invalid multibyte char (US-ASCII) (SyntaxError)
/home/masa/ywesee/rpdf2txt/lib/rpdf2txt-rockit/token.rb:138: syntax error, unexpected '~', expecting ')'
super("EOF", "�~~��~^^~" + rand(1e10).inspect)
^
/home/masa/ywesee/rpdf2txt/lib/rpdf2txt-rockit/token.rb:138: invalid multibyte char (US-ASCII)
from /home/masa/ywesee/rpdf2txt/lib/rpdf2txt-rockit/grammar.rb:1:in `<top (required)>'
from /home/masa/ywesee/rpdf2txt/lib/rpdf2txt-rockit/lalr_parsetable_generator.rb:1:in `require'
from /home/masa/ywesee/rpdf2txt/lib/rpdf2txt-rockit/lalr_parsetable_generator.rb:1:in `<top (required)>'
from /home/masa/ywesee/rpdf2txt/lib/rpdf2txt-rockit/rockit.rb:2:in `require'
from /home/masa/ywesee/rpdf2txt/lib/rpdf2txt-rockit/rockit.rb:2:in `<top (required)>'
from /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/textparser.rb:25:in `require'
from /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/textparser.rb:25:in `<top (required)>'
from /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/text.rb:26:in `require'
from /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/text.rb:26:in `<top (required)>'
from /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/object.rb:26:in `require'
from /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/object.rb:26:in `<top (required)>'
from /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/parser.rb:26:in `require'
from /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/parser.rb:26:in `<top (required)>'
from bin/rpdf2txt:25:in `require'
from bin/rpdf2txt:25:in `<main>'
Experiment
class EofToken < Token
  def initialize(*args)
    # Shouldn't match anything but since I'm not sure how to do a regexp
    # with that characteristic we use a highly unlikely string in the mean
    # time.
    # super("EOF", "¤~~¤¤~^^~" + rand(1e10).inspect)   (delete)
  end
...
class EpsilonToken < Token
  def initialize
    # Shouldn't match anything but since I'm not sure how to do a regexp
    # with that characteristic we use a highly unlikely string in the mean
    # time.
    # super("epsilon", "¤~~¤¤~^^~" + rand(1e10).inspect)   (delete)
  end
lib/rpdf2txt-rockit/rockit_grammars_parser.rb
require 'rpdf2txt-rockit/rockit'
module Parse
  # Parser for RockitGrammar
  # created by Rockit version 0.3.8 on Mon Dec 02 20:05:20 CET 2002
  # Rockit is copyright (c) 2001 Robert Feldt, feldt@ce.chalmers.se
  # and licensed under GPL
  # but this parser is under LGPL
  tokens = [
    # t1 = EofToken.new("EOF",/^(¤~~¤¤~^^~5348086680)/n),   (delete)
    t2 = Token.new("Blank",/^(\s+)/n,:Skip),
#require 'md5'
require 'digest/md5'
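The standalone md5 library no longer ships with Ruby 1.9; the digest classes live under Digest instead. A minimal sketch of the replacement call, assuming the callers only need a hex digest. The input string is made up, and how rpdf2txt actually uses MD5 is not shown in this log.

```ruby
require 'digest/md5'

# Hypothetical input; the real callers in rpdf2txt are not shown in this log.
input = "hello"

# Digest::MD5.hexdigest returns the 32-character hex digest of the string.
digest = Digest::MD5.hexdigest(input)
puts digest
```

Code written against the old top-level MD5 constant would need to be pointed at Digest::MD5 as well as changing the require.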
Result
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib bin/rpdf2txt test/data/test.pdf
/home/masa/ywesee/rpdf2txt/lib/rpdf2txt/parser.rb:131:in `scan': invalid byte sequence in UTF-8 (ArgumentError)
from /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/parser.rb:131:in `build_object_catalogue'
from /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/parser.rb:48:in `object_catalogue'
from /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/parser.rb:163:in `page_tree_root'
from /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/parser.rb:145:in `build_page_tree'
from /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/parser.rb:51:in `page_tree'
from /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/parser.rb:41:in `extract_text'
from bin/rpdf2txt:58:in `<main>'
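The ArgumentError above appears when a regexp is run over a string that is tagged UTF-8 but holds bytes that are not valid UTF-8, which is normal for PDF data. A small sketch of that situation follows. The data and the helper scan_objects are made up for illustration and are not the code in parser.rb.

```ruby
# Hypothetical stand-in for PDF data: an object header followed by two bytes
# that can never appear in valid UTF-8.
raw = "12 0 obj\n\xFF\xFE\nendobj\n".b

# Hypothetical helper: scan for object headers and return the error message
# as a string instead of aborting when the scan raises.
def scan_objects(data)
  data.scan(/(\d+) (\d+) obj/)
rescue ArgumentError => e
  "failed: #{e.message}"
end

# Same bytes, two different encoding tags.
as_utf8   = raw.dup.force_encoding(Encoding::UTF_8)
as_binary = raw.dup.force_encoding(Encoding::ASCII_8BIT)

utf8_valid    = as_utf8.valid_encoding?   # the bytes are not valid UTF-8
utf8_result   = scan_objects(as_utf8)     # expected to report the failure
binary_result = scan_objects(as_binary)   # binary strings have no invalid bytes

puts utf8_valid
p utf8_result
p binary_result
```

Checking valid_encoding? before scanning is a cheap way to tell whether a string carries the wrong tag.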
References
Experiment
test1.rb
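open( "test.dat", "wb" ) do |f|
  a = 999
  # "U" writes 999 as the UTF-8 bytes of code point U+03E7, which is what
  # unpack("U") in test2.rb reads back
  f.write( [a].pack("U") )
end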
test2.rb
open( "test.dat", "rb" ) do |f|
  a = f.read
  p a.unpack("U")
end
Result
masa@masa ~/work $ ruby test1.rb
masa@masa ~/work $ ruby test2.rb
[999]
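The round trip through test.dat can also be checked in a single script. This is only a sketch: it writes into a temporary directory instead of ~/work, and the pack("l") line is added purely for comparison with the "U" format.

```ruby
require 'tmpdir'

packed_u = [999].pack("U")   # UTF-8 encoding of code point 999 (U+03E7)
packed_l = [999].pack("l")   # 32-bit native-endian integer, for comparison

read_back = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "test.dat")
  # Write and read in binary mode so that no bytes are translated.
  File.open(path, "wb") { |f| f.write(packed_u) }
  read_back = File.open(path, "rb") { |f| f.read }
end

p packed_u.bytes         # [207, 167], i.e. 0xCF 0xA7
p packed_l.bytesize      # 4
p read_back.unpack("U")  # [999]
```

Only the "U" format produces bytes that unpack("U") can decode back to 999.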
test3.rb
a="ϧ"
p a.unpack("U")
How to make test3.rb
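masa@masa ~/work $ mv test.dat test3.rb
masa@masa ~/work $ vim test3.rb
In vim, insert a=" in front of the raw bytes, press ESC, jump to the end of the line with $, then append (a) the closing " followed by a new line containing p a.unpack("U").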
Result (Ruby 1.8)
masa@masa ~/work $ ruby test3.rb
[999]
Result (Ruby 1.9)
masa@masa ~/work $ ruby1.9 test3.rb
test3.rb:1: invalid multibyte char (US-ASCII)
test3.rb:1: invalid multibyte char (US-ASCII)
Experiment
masa@masa ~/work $ cat test4.rb
# encoding: utf-8
masa@masa ~/work $ cat test3.rb >> test4.rb
masa@masa ~/work $ cat test4.rb
# encoding: utf-8
a="ϧ"
p a.unpack("U")
masa@masa ~/work $ ruby1.9 test4.rb
[999]
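Why the magic comment matters can be seen without touching the source encoding: the same two bytes are valid or invalid depending on the encoding they are read as. A sketch, using force_encoding to stand in for the magic comment (an assumption made so that everything fits in one script):

```ruby
# The two bytes stored in test.dat, as an untagged (binary) string.
bytes = "\xCF\xA7".b

# The same bytes under the two source encodings seen in the experiments.
as_utf8     = bytes.dup.force_encoding(Encoding::UTF_8)
as_us_ascii = bytes.dup.force_encoding(Encoding::US_ASCII)

p as_utf8.valid_encoding?      # true: one well-formed UTF-8 character
p as_utf8.length               # 1
p as_utf8 == "\u03E7"          # true: code point 999
p as_us_ascii.valid_encoding?  # false: US-ASCII has no bytes above 0x7F
p bytes.length                 # 2
```

Ruby 1.9 reads a source file without a magic comment as US-ASCII, so a literal containing these bytes is rejected by the parser before the script runs.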
Notes
Reference
Experiment
test4.rb
# encoding: ascii-8bit
a="ϧ"
p a.unpack("U")
Result
masa@masa ~/work $ ruby1.9 test4.rb
[999, 999]
Notes
Experiment (add magic comments 'encoding: ascii-8bit' and force_encoding)
Result
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib bin/rpdf2txt test/data/test.pdf
masa@masa ~/ywesee/rpdf2txt $
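For data that comes from a file rather than from a literal, the binary tag can be applied at read time. A sketch with a made-up sample file follows. Where rpdf2txt actually reads its input is not shown in this log, so the file name and contents here are assumptions.

```ruby
require 'tmpdir'

# Hypothetical stand-in for a PDF file: a header, one object, and some bytes
# that are not valid UTF-8.
sample = "%PDF-1.3\n7 0 obj\n\xE2\xE3\xCF\xD3\nendobj\n".b

from_rb = nil
forced  = nil
Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.pdf")
  File.open(path, "wb") { |f| f.write(sample) }

  # Reading with "rb" already yields an ASCII-8BIT string.
  from_rb = File.open(path, "rb") { |f| f.read }

  # A string read in text mode carries the default external encoding;
  # force_encoding retags it without changing any bytes.
  forced = File.read(path).force_encoding(Encoding::ASCII_8BIT)
end

p from_rb.encoding
p forced.encoding
p forced.scan(/(\d+) (\d+) obj/)
```

Either way the string reaches the parser tagged as binary, so regexp scans no longer check it for valid UTF-8.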
Note
Important