Check the current state
Ruby 1.8 with the latest libraries
masa@masa ~/ywesee/rpdf2txt $ ruby test/suite.rb
Loaded suite test/suite
Started
......................'invalid literal/lengths set' when filtering with /FlateDecode
...................................................................unknown encoding 370 0 R
.........#<Rpdf2txt::CMap:0x7f9d3ecb0b78 @target_encoding="utf8", @decoded_stream="", @decrypted_stream="", @src="<< >>", @raw_stream="", @map={}, @attributes={}>
.......................
Finished in 9.064189 seconds.
121 tests, 277 assertions, 0 failures, 0 errors
Ruby 1.8 with the updated libraries
masa@masa ~/ywesee/rpdf2txt $ ruby18 -I lib test/suite.rb
Loaded suite test/suite
Started
......................'invalid literal/lengths set' when filtering with /FlateDecode
...................................................................unknown encoding 370 0 R
.........E......................
Finished in 9.428983 seconds.
1) Error:
test_join_snippets__hex_chars(TestParser):
NoMethodError: undefined method `[]' for nil:NilClass
  ./lib/rpdf2txt/object.rb:787:in `raw_stream'
  ./lib/rpdf2txt/object.rb:790:in `decode_raw_stream'
  ./lib/rpdf2txt/object.rb:682:in `decoded_stream'
  ./lib/rpdf2txt/object.rb:1050:in `extract_bfchar'
  ./lib/rpdf2txt/object.rb:1069:in `parse_cmap'
  ./lib/rpdf2txt/object.rb:1008:in `initialize'
  /home/masa/ywesee/rpdf2txt/test/test_pdf_parser.rb:313:in `new'
  /home/masa/ywesee/rpdf2txt/test/test_pdf_parser.rb:313:in `test_join_snippets__hex_chars'
121 tests, 277 assertions, 0 failures, 1 errors
Ruby 1.9
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/suite.rb
test/suite.rb:26: warning: variable $KCODE is no longer effective; ignored
/home/masa/ywesee/rpdf2txt/test/test_pdf_object.rb:26: warning: variable $KCODE is no longer effective; ignored
/home/masa/ywesee/rpdf2txt/test/test_pdf_parser.rb:28: warning: variable $KCODE is no longer effective; ignored
Loaded suite test/suite
Started
......................'invalid literal/lengths set' when filtering with /FlateDecode
......................................................... ..........unknown encoding 370 0 R
.........E........F.............
1) Error:
test_join_snippets__hex_chars(TestParser):
NoMethodError: undefined method `each' for nil:NilClass
  /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/object.rb:107:in `extract_attributes'
  /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/object.rb:88:in `parse_attributes'
  /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/object.rb:44:in `initialize'
  /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/object.rb:1007:in `initialize'
  /home/masa/ywesee/rpdf2txt/test/test_pdf_parser.rb:313:in `new'
  /home/masa/ywesee/rpdf2txt/test/test_pdf_parser.rb:313:in `test_join_snippets__hex_chars'
2) Failure:
test_char_width(TestTextState) [/home/masa/ywesee/rpdf2txt/test/test_text_state.rb:303]:
<0.313> expected but was <0.301>
diff: ? 0.3013
Finished in 7.396021768 seconds.
121 tests, 277 assertions, 1 failures, 1 errors, 0 pendings, 0 omissions, 0 notifications
98.3471% passed
Focus on the test_join_snippets__hex_chars error
test/test_pdf_parser.rb#test_join_snippets__hex_chars
1) Error:
test_join_snippets__hex_chars(TestParser):
NoMethodError: undefined method `each' for nil:NilClass
  /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/object.rb:107:in `extract_attributes'
  /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/object.rb:88:in `parse_attributes'
  /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/object.rb:44:in `initialize'
  /home/masa/ywesee/rpdf2txt/lib/rpdf2txt/object.rb:1007:in `initialize'
  /home/masa/ywesee/rpdf2txt/test/test_pdf_parser.rb:313:in `new'
  /home/masa/ywesee/rpdf2txt/test/test_pdf_parser.rb:313:in `test_join_snippets__hex_chars'
Experiment
lib/rpdf2txt/object.rb#extract_attributes
def extract_attributes(ast)
  if(ast.children_names.include?('value'))
    pdf_unescape(ast.value)
  elsif(ast.children_names.include?('text'))
    pdf_unescape(ast.text.value[1...-1])
  elsif(ast.children_names.include?('values'))
    ast.values.collect { |child|
      extract_attributes(child)
    }
  elsif(ast.children_names.include?('pairs'))
    result = {}
    print "ast="
    p ast
    print "ast.pairs="
    p ast.pairs
    print "ast.pairs.class="
    p ast.pairs.class
    puts
    ast.pairs.each { |pair|
      k, v = pair
      keystr = k.value.strip.tr('/','')
      unless(keystr.empty?)
        result.store(keystr.downcase.intern, extract_attributes(v))
      end
    }
    result
  else
    value = ast
  end
end
Result
Ruby 1.8 with the latest libraries
masa@masa ~/ywesee/rpdf2txt $ ruby18 test/test_pdf_parser.rb
...
ast=Hash:[_ArrayNode]
ast.pairs=_ArrayNode
ast.pairs.class=ArrayNode
Ruby 1.9 with the updated libraries
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/test_pdf_parser.rb
...
ast=Hash:[nil]
ast.pairs=nil
ast.pairs.class=NilClass
Note
Experiment
lib/rpdf2txt-rockit/glr_parser.rb#actor
def actor(stack)
  print "stack="
  p stack
  #puts "actor(#{stack.state}) @stacks_to_act_on = #{@stacks_to_act_on.map{|s| s.state}.inspect}, @active_stacks = #{@active_stacks.map{|s| s.state}.inspect}"
  tokens = stack.lexer.peek
  #print "tokens = #{tokens.inspect}, "
  print "tokens="
  p tokens
  tokens.each do |token|
    #print "state = #{stack.state.inspect}, "
    print "@parse_table="
    p @parse_table
    exit
Result
Ruby 1.8 with the latest libraries
Actions: 0: ,s13 ,s1 ,s9 ,s11 , , ,s5 ,s10 , ,s14 , ,s3 , , ,s2 ,| ,8,6, , , ,7, ,12,4, 1: r5 ,r5 , , , , , , , , , ,r5 , , , , ,| , , , , , , , , , , 2: r28 ,r28 ,r28 ,r28 ,r28 , , ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,| , , , , , , , , , , 3: , , , , , , , , , , , , ,s15 , , ,| , , , , , , , , , , 4: r6 ,r6 , , , , , , , , , ,r6 , , , , ,| , , , , , , , , , , 5: r9 ,r9 , , , , , , , , , ,r9 , , , , ,| , , , , , , , , , , 6: r3 ,r3 , , , , , , , , , ,r3 , , , , ,| , , , , , , , , , , 7: r2 ,r2 , , , , , , , , , ,r2 , , , , ,| , , , , , , , , , , 8: a , , , , , , , , , , , , , , , ,| , , , , , , , , , , 9: r4 ,r4 , , , , , , , , , ,r4 , , , , ,| , , , , , , , , , , 10: ,s23 ,s16 ,s21 ,s22 , , ,s18 ,s10 ,s26 ,s14 , , , , ,s2 ,| , ,20,27,25,24,19, , ,17, 11: r8 ,r8 , , , , , , , , , ,r8 , , , , ,| , , , , , , , , , , 12: r7 ,r7 , , , , , , , , , ,r7 , , , , ,| , , , , , , , , , , 13: r1 ,r1 , , , , , , , , , ,r1 , , , , ,| , , , , , , , , , , 14: ,s29 , , , , , , , , , ,s28 , , , , ,| , , , , , , ,30, , , 15: , , , , , , , , , , , , , ,s31 , ,| , , , , , , , , , , 16: ,r17 ,r17 ,r17 ,r17 , , ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,| , , , , , , , , , , 17: ,r22 ,r22 ,r22 ,r22 , , ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,| , , , , , , , , , , 18: ,r21 ,r21 ,r21 ,r21 , , ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,| , , , , , , , , , , 19: ,r16 ,r16 ,r16 ,r16 , , ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,| , , , , , , , , , , 20: ,r15 ,r15 ,r15 ,r15 , , ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,| , , , , , , , , , , 21: ,r19 ,r19 ,r19 ,r19 , , ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,| , , , , , , , , , , 22: ,r20 ,r20 ,r20 ,r20 , , ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,| , , , , , , , , , , 23: ,r18 ,r18 ,r18 ,r18 , , ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,| , , , , , , , , , , 24: ,r14 ,r14 ,r14 ,r14 , , ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,| , , , , , , , , , , 25: ,s23 ,s16 ,s21 ,s22 , , ,s18 ,s10 ,r12 ,s14 
, , , , ,s2 ,| , ,20, , ,32,19, , ,17, 26: r11 ,r11 ,r11 ,r11 ,r11 , , ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,| , , , , , , , , , , 27: , , , , , , , , ,s33 , , , , , , ,| , , , , , , , , , , 28: r24 ,r24 ,r24 ,r24 ,r24 , , ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,| , , , , , , , , , , 29: ,s13 ,s1 ,s9 ,s11 , , ,s5 ,s10 , ,s14 , ,s3 , , ,s2 ,| ,34,6, , , ,7, ,12,4, 30: ,s36 , , , , , , , , , ,s35 , , , , ,| , , , , , , , , , , 31: r27 ,r27 , , , , , , , , , ,r27 , , , , ,| , , , , , , , , , , 32: ,r13 ,r13 ,r13 ,r13 , , ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,| , , , , , , , , , , 33: r10 ,r10 ,r10 ,r10 ,r10 , , ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,| , , , , , , , , , , 34: ,r26 , , , , , , , , , ,r26 , , , , ,| , , , , , , , , , , 35: r23 ,r23 ,r23 ,r23 ,r23 , , ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,| , , , , , , , , , , 36: ,s13 ,s1 ,s9 ,s11 , , ,s5 ,s10 , ,s14 , ,s3 , , ,s2 ,| ,37,6, , , ,7, ,12,4, 37: ,r25 , , , , , , , , , ,r25 , , , , ,| , , , , , , , , , ,
Ruby 1.8 with the updated libraries
Actions: 0: ,s14 ,s5 ,s12 ,s1 , , ,s2 ,s13 , ,s3 , ,s10 , , ,s9 ,| ,8,6, , , ,7, ,11,4, 1: r8 ,r8 , , , , , , , , , ,r8 , , , , ,| , , , , , , , , , , 2: r9 ,r9 , , , , , , , , , ,r9 , , , , ,| , , , , , , , , , , 3: ,s17 , , , , , , , , , ,s16 , , , , ,| , , , , , , ,15, , , 4: r6 ,r6 , , , , , , , , , ,r6 , , , , ,| , , , , , , , , , , 5: r5 ,r5 , , , , , , , , , ,r5 , , , , ,| , , , , , , , , , , 6: r3 ,r3 , , , , , , , , , ,r3 , , , , ,| , , , , , , , , , , 7: r2 ,r2 , , , , , , , , , ,r2 , , , , ,| , , , , , , , , , , 8: a , , , , , , , , , , , , , , , ,| , , , , , , , , , , 9: r28 ,r28 ,r28 ,r28 ,r28 , , ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,| , , , , , , , , , , 10: , , , , , , , , , , , , ,s18 , , ,| , , , , , , , , , , 11: r7 ,r7 , , , , , , , , , ,r7 , , , , ,| , , , , , , , , , , 12: r4 ,r4 , , , , , , , , , ,r4 , , , , ,| , , , , , , , , , , 13: ,s30 ,s24 ,s28 ,s19 , , ,s21 ,s13 ,s23 ,s3 , , , , ,s9 ,| , ,26,29,20,27,25, , ,22, 14: r1 ,r1 , , , , , , , , , ,r1 , , , , ,| , , , , , , , , , , 15: ,s32 , , , , , , , , , ,s31 , , , , ,| , , , , , , , , , , 16: r24 ,r24 ,r24 ,r24 ,r24 , , ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,| , , , , , , , , , , 17: ,s14 ,s5 ,s12 ,s1 , , ,s2 ,s13 , ,s3 , ,s10 , , ,s9 ,| ,33,6, , , ,7, ,11,4, 18: , , , , , , , , , , , , , ,s34 , ,| , , , , , , , , , , 19: ,r20 ,r20 ,r20 ,r20 , , ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,| , , , , , , , , , , 20: ,s30 ,s24 ,s28 ,s19 , , ,s21 ,s13 ,r12 ,s3 , , , , ,s9 ,| , ,26, , ,35,25, , ,22, 21: ,r21 ,r21 ,r21 ,r21 , , ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,| , , , , , , , , , , 22: ,r22 ,r22 ,r22 ,r22 , , ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,| , , , , , , , , , , 23: r11 ,r11 ,r11 ,r11 ,r11 , , ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,| , , , , , , , , , , 24: ,r17 ,r17 ,r17 ,r17 , , ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,| , , , , , , , , , , 25: ,r16 ,r16 ,r16 ,r16 , , ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,| , , , , , , , , , , 26: 
,r15 ,r15 ,r15 ,r15 , , ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,| , , , , , , , , , , 27: ,r14 ,r14 ,r14 ,r14 , , ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,| , , , , , , , , , , 28: ,r19 ,r19 ,r19 ,r19 , , ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,| , , , , , , , , , , 29: , , , , , , , , ,s36 , , , , , , ,| , , , , , , , , , , 30: ,r18 ,r18 ,r18 ,r18 , , ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,| , , , , , , , , , , 31: r23 ,r23 ,r23 ,r23 ,r23 , , ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,| , , , , , , , , , , 32: ,s14 ,s5 ,s12 ,s1 , , ,s2 ,s13 , ,s3 , ,s10 , , ,s9 ,| ,37,6, , , ,7, ,11,4, 33: ,r26 , , , , , , , , , ,r26 , , , , ,| , , , , , , , , , , 34: r27 ,r27 , , , , , , , , , ,r27 , , , , ,| , , , , , , , , , , 35: ,r13 ,r13 ,r13 ,r13 , , ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,| , , , , , , , , , , 36: r10 ,r10 ,r10 ,r10 ,r10 , , ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,| , , , , , , , , , , 37: ,r25 , , , , , , , , , ,r25 , , , , ,| , , , , , , , , , ,
Ruby 1.9 with the updated libraries
Actions: 0: ,s14 ,s5 ,s12 ,s1 , , ,s2 ,s13 , ,s3 , ,s10 , , ,s9 ,| ,8,6, , , ,7, ,11,4, 1: r8 ,r8 , , , , , , , , , ,r8 , , , , ,| , , , , , , , , , , 2: r9 ,r9 , , , , , , , , , ,r9 , , , , ,| , , , , , , , , , , 3: ,s17 , , , , , , , , , ,s16 , , , , ,| , , , , , , ,15, , , 4: r6 ,r6 , , , , , , , , , ,r6 , , , , ,| , , , , , , , , , , 5: r5 ,r5 , , , , , , , , , ,r5 , , , , ,| , , , , , , , , , , 6: r3 ,r3 , , , , , , , , , ,r3 , , , , ,| , , , , , , , , , , 7: r2 ,r2 , , , , , , , , , ,r2 , , , , ,| , , , , , , , , , , 8: a , , , , , , , , , , , , , , , ,| , , , , , , , , , , 9: r28 ,r28 ,r28 ,r28 ,r28 , , ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,| , , , , , , , , , , 10: , , , , , , , , , , , , ,s18 , , ,| , , , , , , , , , , 11: r7 ,r7 , , , , , , , , , ,r7 , , , , ,| , , , , , , , , , , 12: r4 ,r4 , , , , , , , , , ,r4 , , , , ,| , , , , , , , , , , 13: ,s30 ,s24 ,s28 ,s19 , , ,s21 ,s13 ,s23 ,s3 , , , , ,s9 ,| , ,26,29,20,27,25, , ,22, 14: r1 ,r1 , , , , , , , , , ,r1 , , , , ,| , , , , , , , , , , 15: ,s32 , , , , , , , , , ,s31 , , , , ,| , , , , , , , , , , 16: r24 ,r24 ,r24 ,r24 ,r24 , , ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,| , , , , , , , , , , 17: ,s14 ,s5 ,s12 ,s1 , , ,s2 ,s13 , ,s3 , ,s10 , , ,s9 ,| ,33,6, , , ,7, ,11,4, 18: , , , , , , , , , , , , , ,s34 , ,| , , , , , , , , , , 19: ,r20 ,r20 ,r20 ,r20 , , ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,| , , , , , , , , , , 20: ,s30 ,s24 ,s28 ,s19 , , ,s21 ,s13 ,r12 ,s3 , , , , ,s9 ,| , ,26, , ,35,25, , ,22, 21: ,r21 ,r21 ,r21 ,r21 , , ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,| , , , , , , , , , , 22: ,r22 ,r22 ,r22 ,r22 , , ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,| , , , , , , , , , , 23: r11 ,r11 ,r11 ,r11 ,r11 , , ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,| , , , , , , , , , , 24: ,r17 ,r17 ,r17 ,r17 , , ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,| , , , , , , , , , , 25: ,r16 ,r16 ,r16 ,r16 , , ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,| , , , , , , , , , , 26: 
,r15 ,r15 ,r15 ,r15 , , ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,| , , , , , , , , , , 27: ,r14 ,r14 ,r14 ,r14 , , ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,| , , , , , , , , , , 28: ,r19 ,r19 ,r19 ,r19 , , ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,| , , , , , , , , , , 29: , , , , , , , , ,s36 , , , , , , ,| , , , , , , , , , , 30: ,r18 ,r18 ,r18 ,r18 , , ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,| , , , , , , , , , , 31: r23 ,r23 ,r23 ,r23 ,r23 , , ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,| , , , , , , , , , , 32: ,s14 ,s5 ,s12 ,s1 , , ,s2 ,s13 , ,s3 , ,s10 , , ,s9 ,| ,37,6, , , ,7, ,11,4, 33: ,r26 , , , , , , , , , ,r26 , , , , ,| , , , , , , , , , , 34: r27 ,r27 , , , , , , , , , ,r27 , , , , ,| , , , , , , , , , , 35: ,r13 ,r13 ,r13 ,r13 , , ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,| , , , , , , , , , , 36: r10 ,r10 ,r10 ,r10 ,r10 , , ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,| , , , , , , , , , , 37: ,r25 , , , , , , , , , ,r25 , , , , ,| , , , , , , , , , ,
Note
Next
Experiment
lib/rpdf2txt/data/pdfattributes.rb#_attr_parser
def Rpdf2txt._attr_parser
  print "@@parse_table70010113197280="
  p @@parse_table70010113197280
  exit
Result
Ruby 1.8 with the latest libraries
Actions: 0: ,s13 ,s1 ,s9 ,s11 , , ,s5 ,s10 , ,s14 , ,s3 , , ,s2 ,| ,8,6, , , ,7, ,12,4, 1: r5 ,r5 , , , , , , , , , ,r5 , , , , ,| , , , , , , , , , , 2: r28 ,r28 ,r28 ,r28 ,r28 , , ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,| , , , , , , , , , , 3: , , , , , , , , , , , , ,s15 , , ,| , , , , , , , , , , 4: r6 ,r6 , , , , , , , , , ,r6 , , , , ,| , , , , , , , , , , 5: r9 ,r9 , , , , , , , , , ,r9 , , , , ,| , , , , , , , , , , 6: r3 ,r3 , , , , , , , , , ,r3 , , , , ,| , , , , , , , , , , 7: r2 ,r2 , , , , , , , , , ,r2 , , , , ,| , , , , , , , , , , 8: a , , , , , , , , , , , , , , , ,| , , , , , , , , , , 9: r4 ,r4 , , , , , , , , , ,r4 , , , , ,| , , , , , , , , , , 10: ,s23 ,s16 ,s21 ,s22 , , ,s18 ,s10 ,s26 ,s14 , , , , ,s2 ,| , ,20,27,25,24,19, , ,17, 11: r8 ,r8 , , , , , , , , , ,r8 , , , , ,| , , , , , , , , , , 12: r7 ,r7 , , , , , , , , , ,r7 , , , , ,| , , , , , , , , , , 13: r1 ,r1 , , , , , , , , , ,r1 , , , , ,| , , , , , , , , , , 14: ,s29 , , , , , , , , , ,s28 , , , , ,| , , , , , , ,30, , , 15: , , , , , , , , , , , , , ,s31 , ,| , , , , , , , , , , 16: ,r17 ,r17 ,r17 ,r17 , , ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,| , , , , , , , , , , 17: ,r22 ,r22 ,r22 ,r22 , , ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,| , , , , , , , , , , 18: ,r21 ,r21 ,r21 ,r21 , , ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,| , , , , , , , , , , 19: ,r16 ,r16 ,r16 ,r16 , , ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,| , , , , , , , , , , 20: ,r15 ,r15 ,r15 ,r15 , , ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,| , , , , , , , , , , 21: ,r19 ,r19 ,r19 ,r19 , , ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,| , , , , , , , , , , 22: ,r20 ,r20 ,r20 ,r20 , , ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,| , , , , , , , , , , 23: ,r18 ,r18 ,r18 ,r18 , , ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,| , , , , , , , , , , 24: ,r14 ,r14 ,r14 ,r14 , , ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,| , , , , , , , , , , 25: ,s23 ,s16 ,s21 ,s22 , , ,s18 ,s10 ,r12 ,s14 
, , , , ,s2 ,| , ,20, , ,32,19, , ,17, 26: r11 ,r11 ,r11 ,r11 ,r11 , , ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,| , , , , , , , , , , 27: , , , , , , , , ,s33 , , , , , , ,| , , , , , , , , , , 28: r24 ,r24 ,r24 ,r24 ,r24 , , ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,| , , , , , , , , , , 29: ,s13 ,s1 ,s9 ,s11 , , ,s5 ,s10 , ,s14 , ,s3 , , ,s2 ,| ,34,6, , , ,7, ,12,4, 30: ,s36 , , , , , , , , , ,s35 , , , , ,| , , , , , , , , , , 31: r27 ,r27 , , , , , , , , , ,r27 , , , , ,| , , , , , , , , , , 32: ,r13 ,r13 ,r13 ,r13 , , ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,| , , , , , , , , , , 33: r10 ,r10 ,r10 ,r10 ,r10 , , ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,| , , , , , , , , , , 34: ,r26 , , , , , , , , , ,r26 , , , , ,| , , , , , , , , , , 35: r23 ,r23 ,r23 ,r23 ,r23 , , ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,| , , , , , , , , , , 36: ,s13 ,s1 ,s9 ,s11 , , ,s5 ,s10 , ,s14 , ,s3 , , ,s2 ,| ,37,6, , , ,7, ,12,4, 37: ,r25 , , , , , , , , , ,r25 , , , , ,| , , , , , , , , , ,
Ruby 1.9 with the updated libraries
Actions: 0: ,s14 ,s5 ,s12 ,s1 , , ,s2 ,s13 , ,s3 , ,s10 , , ,s9 ,| ,8,6, , , ,7, ,11,4, 1: r8 ,r8 , , , , , , , , , ,r8 , , , , ,| , , , , , , , , , , 2: r9 ,r9 , , , , , , , , , ,r9 , , , , ,| , , , , , , , , , , 3: ,s17 , , , , , , , , , ,s16 , , , , ,| , , , , , , ,15, , , 4: r6 ,r6 , , , , , , , , , ,r6 , , , , ,| , , , , , , , , , , 5: r5 ,r5 , , , , , , , , , ,r5 , , , , ,| , , , , , , , , , , 6: r3 ,r3 , , , , , , , , , ,r3 , , , , ,| , , , , , , , , , , 7: r2 ,r2 , , , , , , , , , ,r2 , , , , ,| , , , , , , , , , , 8: a , , , , , , , , , , , , , , , ,| , , , , , , , , , , 9: r28 ,r28 ,r28 ,r28 ,r28 , , ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,| , , , , , , , , , , 10: , , , , , , , , , , , , ,s18 , , ,| , , , , , , , , , , 11: r7 ,r7 , , , , , , , , , ,r7 , , , , ,| , , , , , , , , , , 12: r4 ,r4 , , , , , , , , , ,r4 , , , , ,| , , , , , , , , , , 13: ,s30 ,s24 ,s28 ,s19 , , ,s21 ,s13 ,s23 ,s3 , , , , ,s9 ,| , ,26,29,20,27,25, , ,22, 14: r1 ,r1 , , , , , , , , , ,r1 , , , , ,| , , , , , , , , , , 15: ,s32 , , , , , , , , , ,s31 , , , , ,| , , , , , , , , , , 16: r24 ,r24 ,r24 ,r24 ,r24 , , ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,| , , , , , , , , , , 17: ,s14 ,s5 ,s12 ,s1 , , ,s2 ,s13 , ,s3 , ,s10 , , ,s9 ,| ,33,6, , , ,7, ,11,4, 18: , , , , , , , , , , , , , ,s34 , ,| , , , , , , , , , , 19: ,r20 ,r20 ,r20 ,r20 , , ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,| , , , , , , , , , , 20: ,s30 ,s24 ,s28 ,s19 , , ,s21 ,s13 ,r12 ,s3 , , , , ,s9 ,| , ,26, , ,35,25, , ,22, 21: ,r21 ,r21 ,r21 ,r21 , , ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,| , , , , , , , , , , 22: ,r22 ,r22 ,r22 ,r22 , , ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,| , , , , , , , , , , 23: r11 ,r11 ,r11 ,r11 ,r11 , , ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,| , , , , , , , , , , 24: ,r17 ,r17 ,r17 ,r17 , , ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,| , , , , , , , , , , 25: ,r16 ,r16 ,r16 ,r16 , , ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,| , , , , , , , , , , 26: 
,r15 ,r15 ,r15 ,r15 , , ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,| , , , , , , , , , , 27: ,r14 ,r14 ,r14 ,r14 , , ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,| , , , , , , , , , , 28: ,r19 ,r19 ,r19 ,r19 , , ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,| , , , , , , , , , , 29: , , , , , , , , ,s36 , , , , , , ,| , , , , , , , , , , 30: ,r18 ,r18 ,r18 ,r18 , , ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,| , , , , , , , , , , 31: r23 ,r23 ,r23 ,r23 ,r23 , , ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,| , , , , , , , , , , 32: ,s14 ,s5 ,s12 ,s1 , , ,s2 ,s13 , ,s3 , ,s10 , , ,s9 ,| ,37,6, , , ,7, ,11,4, 33: ,r26 , , , , , , , , , ,r26 , , , , ,| , , , , , , , , , , 34: r27 ,r27 , , , , , , , , , ,r27 , , , , ,| , , , , , , , , , , 35: ,r13 ,r13 ,r13 ,r13 , , ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,| , , , , , , , , , , 36: r10 ,r10 ,r10 ,r10 ,r10 , , ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,| , , , , , , , , , , 37: ,r25 , , , , , , , , , ,r25 , , , , ,| , , , , , , , , , ,
Note
IMPORTANT
The latest libraries
# Parser for PdfAttributes
# created by Rockit version 0.3.8 on Fri Nov 26 11:17:14 +0100 2010
# Rockit is copyright (c) 2001 Robert Feldt, feldt@ce.chalmers.se
# and licensed under GPL
# but this parser is under LGPL
The updated libraries
# Parser for PdfAttributes
# created by Rockit version 0.3.8 on Tue Jan 18 07:42:18 +0100 2011
# Rockit is copyright (c) 2001 Robert Feldt, feldt@ce.chalmers.se
# and licensed under GPL
# but this parser is under LGPL
Reference
Next
Check lib/rpdf2txt/object.rb again
lib/rpdf2txt/object.rb#extract_attributes
def extract_attributes(ast)
  if(ast.children_names.include?('value'))
    pdf_unescape(ast.value)
  elsif(ast.children_names.include?('text'))
    pdf_unescape(ast.text.value[1...-1])
  elsif(ast.children_names.include?('values'))
    ast.values.collect { |child|
      extract_attributes(child)
    }
  elsif(ast.children_names.include?('pairs'))
    result = {}
    print "ast="
    p ast
    print "ast.pairs="
    p ast.pairs
    print "ast.pairs.class="
    p ast.pairs.class
    puts
    ast.pairs.each { |pair|
      k, v = pair
      keystr = k.value.strip.tr('/','')
      unless(keystr.empty?)
        result.store(keystr.downcase.intern, extract_attributes(v))
      end
    }
    result
  else
    value = ast
  end
end
Result
The latest libraries
masa@masa ~/ywesee/rpdf2txt $ ruby18 -I ~/work/rpdf2txt/lib test/test_pdf_parser.rb
...
ast=Hash:[_ArrayNode]
ast.pairs=_ArrayNode
ast.pairs.class=ArrayNode
The updated libraries
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/test_pdf_parser.rb
...
ast=Hash:[nil]
ast.pairs=nil
ast.pairs.class=NilClass
Note
Experiment
def extract_attributes(ast)
  if(ast.children_names.include?('value'))
    pdf_unescape(ast.value)
  elsif(ast.children_names.include?('text'))
    pdf_unescape(ast.text.value[1...-1])
  elsif(ast.children_names.include?('values'))
    ast.values.collect { |child|
      extract_attributes(child)
    }
  elsif(ast.children_names.include?('pairs'))
    result = {}
    if(ast_pairs = ast.pairs)
      #ast.pairs.each { |pair|
      ast_pairs.each { |pair|
        k, v = pair
        keystr = k.value.strip.tr('/','')
        unless(keystr.empty?)
          result.store(keystr.downcase.intern, extract_attributes(v))
        end
      }
    end
    result
  else
    value = ast
  end
end
...
def raw_stream
  #@raw_stream ||= @src.scan(/stream[\r\n]{1,2}(.*)endstream/mn).to_s
  #@raw_stream ||= @src.scan(/stream[\r\n]{1,2}(.*)endstream/mn)[0][0]
  unless(@raw_stream)
    if(src_scan = @src.scan(/stream[\r\n]{1,2}(.*)endstream/mn) and !src_scan.empty?)
      @raw_stream = src_scan[0][0]
    else
      @raw_stream = src_scan.to_s
    end
  end
  return @raw_stream
end
Result
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/test_pdf_parser.rb
test/test_pdf_parser.rb:28: warning: variable $KCODE is no longer effective; ignored
Loaded suite test/test_pdf_parser
Started
.
Finished in 0.084548164 seconds.
1 tests, 0 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
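The raw_stream change above depends on how String#scan and Array#to_s behave. The following is a minimal sketch, not the rpdf2txt code itself: extract_raw_stream is a hypothetical helper, and returning an empty string when nothing matches is an assumption that differs from the experiment, which still calls to_s on the empty result. On Ruby 1.8, Array#to_s joins the elements, so the old scan(...).to_s returned the captured text; from Ruby 1.9 on, it returns the inspect form.

```ruby
# Hypothetical helper: pull the body between "stream" and "endstream".
# With one capture group, String#scan returns an array of one-element arrays.
def extract_raw_stream(src)
  matches = src.scan(/stream[\r\n]{1,2}(.*)endstream/mn)
  # Assumption: fall back to an empty string when there is no stream.
  matches.empty? ? "" : matches[0][0]
end

with_stream    = "<< >>\nstream\nBT /F1 12 Tf ET\nendstream"
without_stream = "<< >>"

p extract_raw_stream(with_stream)
p extract_raw_stream(without_stream)

# On Ruby 1.9 and later this prints "[]" (Ruby 1.8 printed "").
p [].to_s
```

If the else branch keeps src_scan.to_s, as in the experiment, an object without a stream ends up with the string "[]" on Ruby 1.9 rather than an empty string.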
Note
Check all the tests (test_pdf_parser.rb)
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/test_pdf_parser.rb
test/test_pdf_parser.rb:28: warning: variable $KCODE is no longer effective; ignored
Loaded suite test/test_pdf_parser
Started
....F........
1) Failure:
test_join_snippets__hex_chars(TestParser) [test/test_pdf_parser.rb:334]:
<"Paroxetin besitzt eine selektive Wirkung; in-vitro Studien haben gezeigt, dass es, im Gegensatz zu\ntrizyklischen Antidepressiva, eine geringe Affinit\xE4t f\xFCr a1-, a2- und b-Adrenozeptoren sowie f\xFCr\nDopamin (D2)-, 5-HT1-artige, 5-HT2 und Histamin (H1)-Rezeptoren aufweist. Das Fehlen einer\n"> expected but was
<"Paroxetin besitzt eine selektive Wirkung; in-vitro Studien haben gezeigt, dass es, im Gegensatz zu\ntrizyklischen Antidepressiva, eine geringe Affinit\xE4t f\xFCr a1-, a2- und b-Adrenozeptoren sowie f\xFCr\nDopamin (D2)-, 5-HT1-artige, 5-HT2 und Histamin (H1)-Rezeptoren aufweist. Das Fehlen einer\n">
diff:
  Paroxetin besitzt eine selektive Wirkung; in-vitro Studien haben gezeigt, dass es, im Gegensatz zu
? trizyklischen Antidepressiva, eine geringe Affinit� f� a1-, a2- und b-Adrenozeptoren sowie f�
  Dopamin (D2)-, 5-HT1-artige, 5-HT2 und Histamin (H1)-Rezeptoren aufweist. Das Fehlen einer
Finished in 0.421212833 seconds.
13 tests, 56 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
92.3077% passed
Result
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/test_pdf_parser.rb
test/test_pdf_parser.rb:28: warning: variable $KCODE is no longer effective; ignored
Loaded suite test/test_pdf_parser
Started
.............
Finished in 0.411339522 seconds.
13 tests, 56 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
Next
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/test_pdf_parser.rb
test/test_pdf_parser.rb:28: warning: variable $KCODE is no longer effective; ignored
Loaded suite test/test_pdf_parser
Started
........F..............
1) Failure:
test_join_snippets6(TestParser) [test/test_pdf_parser.rb:481]:
<"In Studie 1 evaluierte man 271 Patienten mit einer m\xE4ssigen bis schweren aktiven rheumatoiden \nArthritis, die \xB318 Jahre alt waren, bei denen die Therapie mit mindestens einem, aber mit nicht mehr \n"> expected but was
<"In Studie 1 evaluierte man 271 Patienten mit einer m\xE4ssigen bis schweren aktiven rheumatoiden \nArthritis, die $18 Jahre alt waren, bei denen die Therapie mit mindestens einem, aber mit nicht mehr \n">
diff:
  In Studie 1 evaluierte man 271 Patienten mit einer m�sigen bis schweren aktiven rheumatoiden
? Arthritis, die �18 Jahre alt waren, bei denen die Therapie mit mindestens einem, aber mit nicht mehr
? $
Finished in 2.576134358 seconds.
23 tests, 69 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
95.6522% passed
Note
Suspend the test_join_snippets6 failure for now.
First, focus on the test_char_width failure
1) Failure:
test_char_width(TestTextState) [test/test_text_state.rb:303]:
<0.313> expected but was <0.301>
Experiment
lib/rpdf2txt/text_state.rb#char_width
def char_width(char)
  if(char.is_a? String)
    char = char[0]
  end
  w = 0.0
  if(@font && (width = @font.width(char)))
    w = width
  elsif(@font && (avg = @font.attributes[:avgwidth]))
    w = avg
  end
  print "w="
  p w
  print "char="
  p char
  w = 300.0 if w == 0
  w += @char_spacing
  if(char==32)
    w += @word_spacing
  end
  w * @font_size / USER_SPACE
end
Result
Ruby 1.8 with the latest libraries
w=278
char=32
w=278
char=32
w=278
char=32
Ruby 1.9 with the updated libraries
w=278
char=" "
w=278
char=" "
w=278
char=" "
Note
Experiment
lib/rpdf2txt/text_state.rb#char_width
def char_width(char)
  if(char.is_a? String)
    #char = char[0]
    char = char.unpack('C*')[0]
  end
Reference
Result
masa@masa ~/ywesee/rpdf2txt $ ruby18 -I lib test/test_text_state.rb
Loaded suite test/test_text_state
Started
.
Finished in 0.113862 seconds.
1 tests, 3 assertions, 0 failures, 0 errors
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/test_text_state.rb
Loaded suite test/test_text_state
Started
.
Finished in 0.077112895 seconds.
1 tests, 3 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
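The char_width experiment relies on a known difference between the two interpreters: on Ruby 1.8, indexing a String with an integer returns the byte value, while on Ruby 1.9 it returns a one-character String. The sketch below illustrates why the comparison with 32 stops matching and why unpack('C*')[0] gives the same integer on both. space_code? is a hypothetical helper, not part of rpdf2txt; that the skipped word spacing accounts for the 0.301 versus 0.313 difference is an inference from the output above, not something the log states.

```ruby
# Hypothetical helper standing in for the `char==32` check in char_width.
# Strings are converted to their first byte value before the comparison.
def space_code?(char)
  code = char.is_a?(String) ? char.unpack('C*')[0] : char
  code == 32
end

# On Ruby 1.9 and later, " "[0] is the String " ", so this prints false.
p(" "[0] == 32)

# unpack('C*') returns the bytes as integers on both 1.8 and 1.9.
p " ".unpack('C*')[0]

p space_code?(" ")
p space_code?(32)
p space_code?("a")
```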
Note
The last failure
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/test_pdf_parser.rb
test/test_pdf_parser.rb:28: warning: variable $KCODE is no longer effective; ignored
Loaded suite test/test_pdf_parser
Started
........F..............
1) Failure:
test_join_snippets6(TestParser) [test/test_pdf_parser.rb:482]:
<"In Studie 1 evaluierte man 271 Patienten mit einer m\xE4ssigen bis schweren aktiven rheumatoiden \nArthritis, die \xB318 Jahre alt waren, bei denen die Therapie mit mindestens einem, aber mit nicht mehr \n"> expected but was
<"In Studie 1 evaluierte man 271 Patienten mit einer m\xE4ssigen bis schweren aktiven rheumatoiden \nArthritis, die $18 Jahre alt waren, bei denen die Therapie mit mindestens einem, aber mit nicht mehr \n">
diff:
  In Studie 1 evaluierte man 271 Patienten mit einer m�sigen bis schweren aktiven rheumatoiden
? Arthritis, die �18 Jahre alt waren, bei denen die Therapie mit mindestens einem, aber mit nicht mehr
? $
Finished in 2.577773336 seconds.
23 tests, 69 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
95.6522% passed
Experiment
lib/rpdf2txt/text.rb#mapped_ascii
def mapped_ascii(ascii)
  if(@current_font)
    print "@current_font.cmap=", @current_font.cmap.inspect, "\n" if @current_font.cmap
    print "cmap.map=", @current_font.cmap.map.inspect, "\n" if @current_font.cmap
    print "ascii=", ascii, "\n" if @current_font.cmap
    if((cmap = @current_font.cmap) && (map = cmap.map) \
       && (unicode_bytes = map[ascii]) \
       && (ascii = SymbolMap::SYMBOL_ENTITIES[unicode_bytes]))
      print "ascii.chr="
      p ascii.chr
Result
Ruby 1.8
masa@masa ~/ywesee/rpdf2txt $ ruby18 -I lib test/test_pdf_parser.rb
Loaded suite test/test_pdf_parser
Started
@current_font.cmap=#<Rpdf2txt::CMap:0x7fdfca1b7298 @map={36=>8805}, @target_encoding="utf8", @decrypted_stream="", @decoded_stream="", @attributes={}, @src="<< >>", @raw_stream="">
cmap.map={36=>8805}
ascii=36
ascii.chr="\263"
Ruby 1.9
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/test_pdf_parser.rb
test/test_pdf_parser.rb:28: warning: variable $KCODE is no longer effective; ignored
Loaded suite test/test_pdf_parser
Started
@current_font.cmap=#<Rpdf2txt::CMap:0x00000000c6c780 @map={36=>8805}, @attributes={}, @src="<< >>", @target_encoding="utf8", @raw_stream="[]", @decrypted_stream="[]", @decoded_stream="[]">
cmap.map={36=>8805}
ascii=$
Note
Experiment
def snip(snippet)
  snippet_text = snippet[1..-2].gsub(/\\[nrt]/n, " ")
  snippet_text.gsub!(/\\([()])/n, '\1')
  snippet_text.gsub!(/./n) { |char|
    #self.mapped_ascii(char[0]) || char
    self.mapped_ascii(char.unpack('C*')[0]) || char
Result
Ruby 1.8
masa@masa ~/ywesee/rpdf2txt $ ruby18 -I lib test/test_pdf_parser.rb
Loaded suite test/test_pdf_parser
Started
.
Finished in 0.286884 seconds.
1 tests, 1 assertions, 0 failures, 0 errors
Ruby 1.9
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/test_pdf_parser.rb
test/test_pdf_parser.rb:28: warning: variable $KCODE is no longer effective; ignored
Loaded suite test/test_pdf_parser
Started
.
Finished in 0.198423306 seconds.
1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
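The snip change can be illustrated with a simplified stand-in for the CMap lookup. In the sketch below, BYTE_MAP and map_bytes are hypothetical names; the real mapped_ascii goes through the font's CMap and SymbolMap::SYMBOL_ENTITIES, which is collapsed here into a single hash keyed by byte value. It only shows that a map keyed by integers must be queried with an integer, which char[0] no longer provides on Ruby 1.9.

```ruby
# Hypothetical, simplified map: byte 36 ("$") is replaced by the single
# byte 0xB3, mirroring the {36=>8805} CMap entry and "\263" result above.
BYTE_MAP = { 36 => [0xB3].pack('C') }.freeze

# Replace every byte that has an entry in BYTE_MAP, keep the others.
def map_bytes(text)
  binary = text.dup.force_encoding('ASCII-8BIT')
  binary.gsub(/./mn) { |char| BYTE_MAP[char.unpack('C*')[0]] || char }
end

# A lookup with the one-character String finds nothing, so "$" would stay.
p BYTE_MAP['$']

# A lookup with the byte value finds the replacement.
p map_bytes('die $18 Jahre').unpack('C*')
```

The first lookup corresponds to the failing Ruby 1.9 run, where "$18" was left unchanged; the second corresponds to the run after switching to unpack.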
Note
Final check
Ruby 1.8 with the updated libraries
masa@masa ~/ywesee/rpdf2txt $ ruby18 -I lib test/suite.rb
Loaded suite test/suite
Started
......................'invalid literal/lengths set' when filtering with /FlateDecode
...................................................................unknown encoding 370 0 R
.............................................
Finished in 12.640062 seconds.
134 tests, 295 assertions, 0 failures, 0 errors
Ruby 1.9 with the updated libraries
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/suite.rb
test/suite.rb:26: warning: variable $KCODE is no longer effective; ignored
/home/masa/ywesee/rpdf2txt/test/test_pdf_object.rb:26: warning: variable $KCODE is no longer effective; ignored
/home/masa/ywesee/rpdf2txt/test/test_pdf_parser.rb:28: warning: variable $KCODE is no longer effective; ignored
Loaded suite test/suite
Started
......................'invalid literal/lengths set' when filtering with /FlateDecode
......................................................... ..........unknown encoding 370 0 R
.............................................
Finished in 9.930899752 seconds.
134 tests, 295 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
Note
Ruby 1.8 with the latest libraries
masa@masa ~/ywesee/rpdf2txt $ ruby18 test/suite.rb
Loaded suite test/suite
Started
......................'invalid literal/lengths set' when filtering with /FlateDecode
...................................................................unknown encoding 370 0 R
....................F........................
Finished in 12.225387 seconds.
1) Failure:
test_join_snippets__hex_chars(TestParser) [/home/masa/ywesee/rpdf2txt/test/test_pdf_parser.rb:335]:
<"Paroxetin besitzt eine selektive Wirkung; in-vitro Studien haben gezeigt, dass es, im Gegensatz zu\ntrizyklischen Antidepressiva, eine geringe Affinit� f� a1-, a2- und b-Adrenozeptoren sowie f� Dopamin (D2)-, 5-HT1-artige, 5-HT2 und Histamin (H1)-Rezeptoren aufweist. Das Fehlen einer\n"> expected but was
<"Paroxetin besitzt eine selektive Wirkung; in-vitro Studien haben gezeigt, dass es, im Gegensatz zu\ntrizyklischen Antidepressiva, eine geringe Affinit� f� a1-, a2- und b-Adrenozeptoren sowie f� Dopamin (D2)-, 5-HT1-artige, 5-HT2 und Histamin (H1)-Rezeptoren aufweist. Das Fehlen einer\n">.
Note
Experiment
lib/rpdf2txt/text.rb#snip
def snip(snippet)
  snippet_text = snippet[1..-2].gsub(/\\[nrt]/n, " ")
  print "snippet_text1="
  p snippet_text
  snippet_text.gsub!(/\\([()])/n, '\1')
  print "snippet_text2="
  p snippet_text
  snippet_text.gsub!(/./n) { |char|
    #self.mapped_ascii(char[0]) || char
    self.mapped_ascii(char.unpack('C*')[0]) || char
  }
  print "snippet_text3="
  p snippet_text
  _snip(snippet_text)
end
Result
Ruby 1.8 with the latest libraries
snippet_text1="trizyklischen Antidepressiva, eine geringe Affinit� f� �"
snippet_text2="trizyklischen Antidepressiva, eine geringe Affinit� f� �"
snippet_text3="trizyklischen Antidepressiva, eine geringe Affinit� f� "
Ruby 1.9 with the updated libraries
snippet_text1="trizyklischen Antidepressiva, eine geringe Affinit\xE4t f\xFCr "
snippet_text2="trizyklischen Antidepressiva, eine geringe Affinit\xE4t f\xFCr "
snippet_text3="trizyklischen Antidepressiva, eine geringe Affinit\xE4t f\xFCr "
Note
snippet_text.gsub!(/./n) { |char|
  #self.mapped_ascii(char[0]) || char
  self.mapped_ascii(char.unpack('C*')[0]) || char
}
Experiment
def snip(snippet)
  snippet_text = snippet[1..-2].gsub(/\\[nrt]/n, " ")
  print "snippet_text1="
  p snippet_text
  snippet_text.gsub!(/\\([()])/n, '\1')
  print "snippet_text2="
  p snippet_text
  print "snippet_text.scan(/./n)="
  p snippet_text.scan(/./n)
  snippet_text.gsub!(/./n) { |char|
    #self.mapped_ascii(char[0]) || char
    self.mapped_ascii(char.unpack('C*')[0]) || char
  }
  print "snippet_text3="
  p snippet_text
  _snip(snippet_text)
end
Result
Ruby 1.8 with the latest libraries
snippet_text2="trizyklischen Antidepressiva, eine geringe Affinit� f� "
snippet_text.scan(/./n)=["t", "r", "i", "z", "y", "k", "l", "i", "s", "c", "h", "e", "n", " ", "A", "n", "t", "i", "d", "e", "p", "r", "e", "s", "s", "i", "v", "a", ",", " ", "e", "i", "n", "e", " ", "g", "e", "r", "i", "n", "g", "e", " ", "A", "f", "f", "i", "n", "i", "t", "\344", "t", " ", "f", "\374", "r", " "]
Ruby 1.9 with the updated libraries
snippet_text2="trizyklischen Antidepressiva, eine geringe Affinit\xE4t f\xFCr "
snippet_text.scan(/./n)=["t", "r", "i", "z", "y", "k", "l", "i", "s", "c", "h", "e", "n", " ", "A", "n", "t", "i", "d", "e", "p", "r", "e", "s", "s", "i", "v", "a", ",", " ", "e", "i", "n", "e", " ", "g", "e", "r", "i", "n", "g", "e", " ", "A", "f", "f", "i", "n", "i", "t", "\xE4", "t", " ", "f", "\xFC", "r", " "]
Note
Since scan(/./n) yields the same characters on both versions, the step to examine is the following call
self.mapped_ascii(char.unpack('C*')[0]) || char
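To see what that call receives, the following sketch splits a binary string the same way the scan output above does and converts each one-byte string to its integer code. The sample text and the variable names are illustrative assumptions; only scan(/./n) and unpack('C*')[0] are taken from the code under test.

```ruby
# Build a binary (ASCII-8BIT) sample containing the Latin-1 byte 0xE4.
text = "Affinit\xE4t".dup.force_encoding('ASCII-8BIT')

# /./n splits a binary string into one-byte strings, as in the log above.
chars = text.scan(/./n)

# unpack('C*')[0] turns each one-byte string into an integer code,
# which is the kind of value passed on to mapped_ascii.
codes = chars.map { |char| char.unpack('C*')[0] }

p chars.length
p codes
```

The byte 0xE4 arrives as the integer 228, so any lookup table keyed by integers sees the same value on Ruby 1.8 and Ruby 1.9.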