Check the current state
Ruby 1.8 with the latest libraries
masa@masa ~/ywesee/rpdf2txt $ ruby test/suite.rb
Loaded suite test/suite
Started
......................'invalid literal/lengths set' when filtering with /FlateDecode
...................................................................unknown encoding 370 0 R
.........#<Rpdf2txt::CMap:0x7f9d3ecb0b78 @target_encoding="utf8", @decoded_stream="", @decrypted_stream="", @src="<< >>", @raw_stream="", @map={}, @attributes={}>
.......................
Finished in 9.064189 seconds.
121 tests, 277 assertions, 0 failures, 0 errors
Ruby 1.8 with the updated libraries
masa@masa ~/ywesee/rpdf2txt $ ruby18 -I lib test/suite.rb
Loaded suite test/suite
Started
......................'invalid literal/lengths set' when filtering with /FlateDecode
...................................................................unknown encoding 370 0 R
.........E......................
Finished in 9.428983 seconds.
1) Error:
test_join_snippets__hex_chars(TestParser):
NoMethodError: undefined method `[]' for nil:NilClass
./lib/rpdf2txt/object.rb:787:in `raw_stream'
./lib/rpdf2txt/object.rb:790:in `decode_raw_stream'
./lib/rpdf2txt/object.rb:682:in `decoded_stream'
./lib/rpdf2txt/object.rb:1050:in `extract_bfchar'
./lib/rpdf2txt/object.rb:1069:in `parse_cmap'
./lib/rpdf2txt/object.rb:1008:in `initialize'
/home/masa/ywesee/rpdf2txt/test/test_pdf_parser.rb:313:in `new'
/home/masa/ywesee/rpdf2txt/test/test_pdf_parser.rb:313:in `test_join_snippets__hex_chars'
121 tests, 277 assertions, 0 failures, 1 errors
Ruby 1.9
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/suite.rb
test/suite.rb:26: warning: variable $KCODE is no longer effective; ignored
/home/masa/ywesee/rpdf2txt/test/test_pdf_object.rb:26: warning: variable $KCODE is no longer effective; ignored
/home/masa/ywesee/rpdf2txt/test/test_pdf_parser.rb:28: warning: variable $KCODE is no longer effective; ignored
Loaded suite test/suite
Started
......................'invalid literal/lengths set' when filtering with /FlateDecode
.........................................................
..........unknown encoding 370 0 R
.........E........F.............
1) Error:
test_join_snippets__hex_chars(TestParser):
NoMethodError: undefined method `each' for nil:NilClass
/home/masa/ywesee/rpdf2txt/lib/rpdf2txt/object.rb:107:in `extract_attributes'
/home/masa/ywesee/rpdf2txt/lib/rpdf2txt/object.rb:88:in `parse_attributes'
/home/masa/ywesee/rpdf2txt/lib/rpdf2txt/object.rb:44:in `initialize'
/home/masa/ywesee/rpdf2txt/lib/rpdf2txt/object.rb:1007:in `initialize'
/home/masa/ywesee/rpdf2txt/test/test_pdf_parser.rb:313:in `new'
/home/masa/ywesee/rpdf2txt/test/test_pdf_parser.rb:313:in `test_join_snippets__hex_chars'
2) Failure:
test_char_width(TestTextState) [/home/masa/ywesee/rpdf2txt/test/test_text_state.rb:303]:
<0.313> expected but was
<0.301>
diff:
? 0.3013
Finished in 7.396021768 seconds.
121 tests, 277 assertions, 1 failures, 1 errors, 0 pendings, 0 omissions, 0 notifications
98.3471% passed
Focus on the test_join_snippets__hex_chars error
test/test_pdf_parser.rb#test_join_snippets__hex_chars
1) Error:
test_join_snippets__hex_chars(TestParser):
NoMethodError: undefined method `each' for nil:NilClass
/home/masa/ywesee/rpdf2txt/lib/rpdf2txt/object.rb:107:in `extract_attributes'
/home/masa/ywesee/rpdf2txt/lib/rpdf2txt/object.rb:88:in `parse_attributes'
/home/masa/ywesee/rpdf2txt/lib/rpdf2txt/object.rb:44:in `initialize'
/home/masa/ywesee/rpdf2txt/lib/rpdf2txt/object.rb:1007:in `initialize'
/home/masa/ywesee/rpdf2txt/test/test_pdf_parser.rb:313:in `new'
/home/masa/ywesee/rpdf2txt/test/test_pdf_parser.rb:313:in `test_join_snippets__hex_chars'
Experiment
lib/rpdf2txt/object.rb#extract_attributes
def extract_attributes(ast)
  if(ast.children_names.include?('value'))
    pdf_unescape(ast.value)
  elsif(ast.children_names.include?('text'))
    pdf_unescape(ast.text.value[1...-1])
  elsif(ast.children_names.include?('values'))
    ast.values.collect { |child| extract_attributes(child) }
  elsif(ast.children_names.include?('pairs'))
    result = {}
    print "ast="
    p ast
    print "ast.pairs="
    p ast.pairs
    print "ast.pairs.class="
    p ast.pairs.class
    puts
    ast.pairs.each { |pair|
      k, v = pair
      keystr = k.value.strip.tr('/','')
      unless(keystr.empty?)
        result.store(keystr.downcase.intern, extract_attributes(v))
      end
    }
    result
  else
    value = ast
  end
end
Result
Ruby 1.8 with the latest libraries
masa@masa ~/ywesee/rpdf2txt $ ruby18 test/test_pdf_parser.rb
...
ast=Hash:[_ArrayNode]
ast.pairs=_ArrayNode
ast.pairs.class=ArrayNode
Ruby 1.9 with the updated libraries
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/test_pdf_parser.rb
...
ast=Hash:[nil]
ast.pairs=nil
ast.pairs.class=NilClass
Note
Experiment
lib/rpdf2txt-rockit/glr_parser.rb#actor
def actor(stack)
  print "stack="
  p stack
  #puts "actor(#{stack.state}) @stacks_to_act_on = #{@stacks_to_act_on.map{|s| s.state}.inspect}, @active_stacks = #{@active_stacks.map{|s| s.state}.inspect}"
  tokens = stack.lexer.peek
  #print "tokens = #{tokens.inspect}, "
  print "tokens="
  p tokens
  tokens.each do |token|
    #print "state = #{stack.state.inspect}, "
    print "@parse_table="
    p @parse_table
    exit
Result
Ruby 1.8 with the latest libraries
Actions: 0: ,s13 ,s1 ,s9 ,s11 , , ,s5 ,s10 , ,s14 , ,s3 , , ,s2 ,| ,8,6, , , ,7, ,12,4, 1: r5 ,r5 , , , , , , , , , ,r5 , , , , ,| , , , , , , , , , , 2: r28 ,r28 ,r28 ,r28 ,r28 , , ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,| , , , , , , , , , , 3: , , , , , , , , , , , , ,s15 , , ,| , , , , , , , , , , 4: r6 ,r6 , , , , , , , , , ,r6 , , , , ,| , , , , , , , , , , 5: r9 ,r9 , , , , , , , , , ,r9 , , , , ,| , , , , , , , , , , 6: r3 ,r3 , , , , , , , , , ,r3 , , , , ,| , , , , , , , , , , 7: r2 ,r2 , , , , , , , , , ,r2 , , , , ,| , , , , , , , , , , 8: a , , , , , , , , , , , , , , , ,| , , , , , , , , , , 9: r4 ,r4 , , , , , , , , , ,r4 , , , , ,| , , , , , , , , , , 10: ,s23 ,s16 ,s21 ,s22 , , ,s18 ,s10 ,s26 ,s14 , , , , ,s2 ,| , ,20,27,25,24,19, , ,17, 11: r8 ,r8 , , , , , , , , , ,r8 , , , , ,| , , , , , , , , , , 12: r7 ,r7 , , , , , , , , , ,r7 , , , , ,| , , , , , , , , , , 13: r1 ,r1 , , , , , , , , , ,r1 , , , , ,| , , , , , , , , , , 14: ,s29 , , , , , , , , , ,s28 , , , , ,| , , , , , , ,30, , , 15: , , , , , , , , , , , , , ,s31 , ,| , , , , , , , , , , 16: ,r17 ,r17 ,r17 ,r17 , , ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,| , , , , , , , , , , 17: ,r22 ,r22 ,r22 ,r22 , , ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,| , , , , , , , , , , 18: ,r21 ,r21 ,r21 ,r21 , , ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,| , , , , , , , , , , 19: ,r16 ,r16 ,r16 ,r16 , , ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,| , , , , , , , , , , 20: ,r15 ,r15 ,r15 ,r15 , , ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,| , , , , , , , , , , 21: ,r19 ,r19 ,r19 ,r19 , , ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,| , , , , , , , , , , 22: ,r20 ,r20 ,r20 ,r20 , , ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,| , , , , , , , , , , 23: ,r18 ,r18 ,r18 ,r18 , , ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,| , , , , , , , , , , 24: ,r14 ,r14 ,r14 ,r14 , , ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,| , , , , , , , , , , 25: ,s23 ,s16 ,s21 ,s22 , , ,s18 ,s10 ,r12 ,s14 
, , , , ,s2 ,| , ,20, , ,32,19, , ,17, 26: r11 ,r11 ,r11 ,r11 ,r11 , , ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,| , , , , , , , , , , 27: , , , , , , , , ,s33 , , , , , , ,| , , , , , , , , , , 28: r24 ,r24 ,r24 ,r24 ,r24 , , ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,| , , , , , , , , , , 29: ,s13 ,s1 ,s9 ,s11 , , ,s5 ,s10 , ,s14 , ,s3 , , ,s2 ,| ,34,6, , , ,7, ,12,4, 30: ,s36 , , , , , , , , , ,s35 , , , , ,| , , , , , , , , , , 31: r27 ,r27 , , , , , , , , , ,r27 , , , , ,| , , , , , , , , , , 32: ,r13 ,r13 ,r13 ,r13 , , ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,| , , , , , , , , , , 33: r10 ,r10 ,r10 ,r10 ,r10 , , ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,| , , , , , , , , , , 34: ,r26 , , , , , , , , , ,r26 , , , , ,| , , , , , , , , , , 35: r23 ,r23 ,r23 ,r23 ,r23 , , ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,| , , , , , , , , , , 36: ,s13 ,s1 ,s9 ,s11 , , ,s5 ,s10 , ,s14 , ,s3 , , ,s2 ,| ,37,6, , , ,7, ,12,4, 37: ,r25 , , , , , , , , , ,r25 , , , , ,| , , , , , , , , , ,
Ruby 1.8 with the updated libraries
Actions: 0: ,s14 ,s5 ,s12 ,s1 , , ,s2 ,s13 , ,s3 , ,s10 , , ,s9 ,| ,8,6, , , ,7, ,11,4, 1: r8 ,r8 , , , , , , , , , ,r8 , , , , ,| , , , , , , , , , , 2: r9 ,r9 , , , , , , , , , ,r9 , , , , ,| , , , , , , , , , , 3: ,s17 , , , , , , , , , ,s16 , , , , ,| , , , , , , ,15, , , 4: r6 ,r6 , , , , , , , , , ,r6 , , , , ,| , , , , , , , , , , 5: r5 ,r5 , , , , , , , , , ,r5 , , , , ,| , , , , , , , , , , 6: r3 ,r3 , , , , , , , , , ,r3 , , , , ,| , , , , , , , , , , 7: r2 ,r2 , , , , , , , , , ,r2 , , , , ,| , , , , , , , , , , 8: a , , , , , , , , , , , , , , , ,| , , , , , , , , , , 9: r28 ,r28 ,r28 ,r28 ,r28 , , ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,| , , , , , , , , , , 10: , , , , , , , , , , , , ,s18 , , ,| , , , , , , , , , , 11: r7 ,r7 , , , , , , , , , ,r7 , , , , ,| , , , , , , , , , , 12: r4 ,r4 , , , , , , , , , ,r4 , , , , ,| , , , , , , , , , , 13: ,s30 ,s24 ,s28 ,s19 , , ,s21 ,s13 ,s23 ,s3 , , , , ,s9 ,| , ,26,29,20,27,25, , ,22, 14: r1 ,r1 , , , , , , , , , ,r1 , , , , ,| , , , , , , , , , , 15: ,s32 , , , , , , , , , ,s31 , , , , ,| , , , , , , , , , , 16: r24 ,r24 ,r24 ,r24 ,r24 , , ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,| , , , , , , , , , , 17: ,s14 ,s5 ,s12 ,s1 , , ,s2 ,s13 , ,s3 , ,s10 , , ,s9 ,| ,33,6, , , ,7, ,11,4, 18: , , , , , , , , , , , , , ,s34 , ,| , , , , , , , , , , 19: ,r20 ,r20 ,r20 ,r20 , , ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,| , , , , , , , , , , 20: ,s30 ,s24 ,s28 ,s19 , , ,s21 ,s13 ,r12 ,s3 , , , , ,s9 ,| , ,26, , ,35,25, , ,22, 21: ,r21 ,r21 ,r21 ,r21 , , ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,| , , , , , , , , , , 22: ,r22 ,r22 ,r22 ,r22 , , ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,| , , , , , , , , , , 23: r11 ,r11 ,r11 ,r11 ,r11 , , ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,| , , , , , , , , , , 24: ,r17 ,r17 ,r17 ,r17 , , ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,| , , , , , , , , , , 25: ,r16 ,r16 ,r16 ,r16 , , ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,| , , , , , , , , , , 26: 
,r15 ,r15 ,r15 ,r15 , , ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,| , , , , , , , , , , 27: ,r14 ,r14 ,r14 ,r14 , , ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,| , , , , , , , , , , 28: ,r19 ,r19 ,r19 ,r19 , , ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,| , , , , , , , , , , 29: , , , , , , , , ,s36 , , , , , , ,| , , , , , , , , , , 30: ,r18 ,r18 ,r18 ,r18 , , ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,| , , , , , , , , , , 31: r23 ,r23 ,r23 ,r23 ,r23 , , ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,| , , , , , , , , , , 32: ,s14 ,s5 ,s12 ,s1 , , ,s2 ,s13 , ,s3 , ,s10 , , ,s9 ,| ,37,6, , , ,7, ,11,4, 33: ,r26 , , , , , , , , , ,r26 , , , , ,| , , , , , , , , , , 34: r27 ,r27 , , , , , , , , , ,r27 , , , , ,| , , , , , , , , , , 35: ,r13 ,r13 ,r13 ,r13 , , ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,| , , , , , , , , , , 36: r10 ,r10 ,r10 ,r10 ,r10 , , ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,| , , , , , , , , , , 37: ,r25 , , , , , , , , , ,r25 , , , , ,| , , , , , , , , , ,
Ruby 1.9 with the updated libraries
Actions: 0: ,s14 ,s5 ,s12 ,s1 , , ,s2 ,s13 , ,s3 , ,s10 , , ,s9 ,| ,8,6, , , ,7, ,11,4, 1: r8 ,r8 , , , , , , , , , ,r8 , , , , ,| , , , , , , , , , , 2: r9 ,r9 , , , , , , , , , ,r9 , , , , ,| , , , , , , , , , , 3: ,s17 , , , , , , , , , ,s16 , , , , ,| , , , , , , ,15, , , 4: r6 ,r6 , , , , , , , , , ,r6 , , , , ,| , , , , , , , , , , 5: r5 ,r5 , , , , , , , , , ,r5 , , , , ,| , , , , , , , , , , 6: r3 ,r3 , , , , , , , , , ,r3 , , , , ,| , , , , , , , , , , 7: r2 ,r2 , , , , , , , , , ,r2 , , , , ,| , , , , , , , , , , 8: a , , , , , , , , , , , , , , , ,| , , , , , , , , , , 9: r28 ,r28 ,r28 ,r28 ,r28 , , ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,| , , , , , , , , , , 10: , , , , , , , , , , , , ,s18 , , ,| , , , , , , , , , , 11: r7 ,r7 , , , , , , , , , ,r7 , , , , ,| , , , , , , , , , , 12: r4 ,r4 , , , , , , , , , ,r4 , , , , ,| , , , , , , , , , , 13: ,s30 ,s24 ,s28 ,s19 , , ,s21 ,s13 ,s23 ,s3 , , , , ,s9 ,| , ,26,29,20,27,25, , ,22, 14: r1 ,r1 , , , , , , , , , ,r1 , , , , ,| , , , , , , , , , , 15: ,s32 , , , , , , , , , ,s31 , , , , ,| , , , , , , , , , , 16: r24 ,r24 ,r24 ,r24 ,r24 , , ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,| , , , , , , , , , , 17: ,s14 ,s5 ,s12 ,s1 , , ,s2 ,s13 , ,s3 , ,s10 , , ,s9 ,| ,33,6, , , ,7, ,11,4, 18: , , , , , , , , , , , , , ,s34 , ,| , , , , , , , , , , 19: ,r20 ,r20 ,r20 ,r20 , , ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,| , , , , , , , , , , 20: ,s30 ,s24 ,s28 ,s19 , , ,s21 ,s13 ,r12 ,s3 , , , , ,s9 ,| , ,26, , ,35,25, , ,22, 21: ,r21 ,r21 ,r21 ,r21 , , ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,| , , , , , , , , , , 22: ,r22 ,r22 ,r22 ,r22 , , ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,| , , , , , , , , , , 23: r11 ,r11 ,r11 ,r11 ,r11 , , ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,| , , , , , , , , , , 24: ,r17 ,r17 ,r17 ,r17 , , ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,| , , , , , , , , , , 25: ,r16 ,r16 ,r16 ,r16 , , ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,| , , , , , , , , , , 26: 
,r15 ,r15 ,r15 ,r15 , , ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,| , , , , , , , , , , 27: ,r14 ,r14 ,r14 ,r14 , , ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,| , , , , , , , , , , 28: ,r19 ,r19 ,r19 ,r19 , , ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,| , , , , , , , , , , 29: , , , , , , , , ,s36 , , , , , , ,| , , , , , , , , , , 30: ,r18 ,r18 ,r18 ,r18 , , ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,| , , , , , , , , , , 31: r23 ,r23 ,r23 ,r23 ,r23 , , ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,| , , , , , , , , , , 32: ,s14 ,s5 ,s12 ,s1 , , ,s2 ,s13 , ,s3 , ,s10 , , ,s9 ,| ,37,6, , , ,7, ,11,4, 33: ,r26 , , , , , , , , , ,r26 , , , , ,| , , , , , , , , , , 34: r27 ,r27 , , , , , , , , , ,r27 , , , , ,| , , , , , , , , , , 35: ,r13 ,r13 ,r13 ,r13 , , ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,| , , , , , , , , , , 36: r10 ,r10 ,r10 ,r10 ,r10 , , ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,| , , , , , , , , , , 37: ,r25 , , , , , , , , , ,r25 , , , , ,| , , , , , , , , , ,
Note
Next
Experiment
lib/rpdf2txt/data/pdfattributes.rb#_attr_parser
def Rpdf2txt._attr_parser
  print "@@parse_table70010113197280="
  p @@parse_table70010113197280
  exit
Result
Ruby 1.8 with the latest libraries
Actions: 0: ,s13 ,s1 ,s9 ,s11 , , ,s5 ,s10 , ,s14 , ,s3 , , ,s2 ,| ,8,6, , , ,7, ,12,4, 1: r5 ,r5 , , , , , , , , , ,r5 , , , , ,| , , , , , , , , , , 2: r28 ,r28 ,r28 ,r28 ,r28 , , ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,| , , , , , , , , , , 3: , , , , , , , , , , , , ,s15 , , ,| , , , , , , , , , , 4: r6 ,r6 , , , , , , , , , ,r6 , , , , ,| , , , , , , , , , , 5: r9 ,r9 , , , , , , , , , ,r9 , , , , ,| , , , , , , , , , , 6: r3 ,r3 , , , , , , , , , ,r3 , , , , ,| , , , , , , , , , , 7: r2 ,r2 , , , , , , , , , ,r2 , , , , ,| , , , , , , , , , , 8: a , , , , , , , , , , , , , , , ,| , , , , , , , , , , 9: r4 ,r4 , , , , , , , , , ,r4 , , , , ,| , , , , , , , , , , 10: ,s23 ,s16 ,s21 ,s22 , , ,s18 ,s10 ,s26 ,s14 , , , , ,s2 ,| , ,20,27,25,24,19, , ,17, 11: r8 ,r8 , , , , , , , , , ,r8 , , , , ,| , , , , , , , , , , 12: r7 ,r7 , , , , , , , , , ,r7 , , , , ,| , , , , , , , , , , 13: r1 ,r1 , , , , , , , , , ,r1 , , , , ,| , , , , , , , , , , 14: ,s29 , , , , , , , , , ,s28 , , , , ,| , , , , , , ,30, , , 15: , , , , , , , , , , , , , ,s31 , ,| , , , , , , , , , , 16: ,r17 ,r17 ,r17 ,r17 , , ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,| , , , , , , , , , , 17: ,r22 ,r22 ,r22 ,r22 , , ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,| , , , , , , , , , , 18: ,r21 ,r21 ,r21 ,r21 , , ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,| , , , , , , , , , , 19: ,r16 ,r16 ,r16 ,r16 , , ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,| , , , , , , , , , , 20: ,r15 ,r15 ,r15 ,r15 , , ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,| , , , , , , , , , , 21: ,r19 ,r19 ,r19 ,r19 , , ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,| , , , , , , , , , , 22: ,r20 ,r20 ,r20 ,r20 , , ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,| , , , , , , , , , , 23: ,r18 ,r18 ,r18 ,r18 , , ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,| , , , , , , , , , , 24: ,r14 ,r14 ,r14 ,r14 , , ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,| , , , , , , , , , , 25: ,s23 ,s16 ,s21 ,s22 , , ,s18 ,s10 ,r12 ,s14 
, , , , ,s2 ,| , ,20, , ,32,19, , ,17, 26: r11 ,r11 ,r11 ,r11 ,r11 , , ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,| , , , , , , , , , , 27: , , , , , , , , ,s33 , , , , , , ,| , , , , , , , , , , 28: r24 ,r24 ,r24 ,r24 ,r24 , , ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,| , , , , , , , , , , 29: ,s13 ,s1 ,s9 ,s11 , , ,s5 ,s10 , ,s14 , ,s3 , , ,s2 ,| ,34,6, , , ,7, ,12,4, 30: ,s36 , , , , , , , , , ,s35 , , , , ,| , , , , , , , , , , 31: r27 ,r27 , , , , , , , , , ,r27 , , , , ,| , , , , , , , , , , 32: ,r13 ,r13 ,r13 ,r13 , , ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,| , , , , , , , , , , 33: r10 ,r10 ,r10 ,r10 ,r10 , , ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,| , , , , , , , , , , 34: ,r26 , , , , , , , , , ,r26 , , , , ,| , , , , , , , , , , 35: r23 ,r23 ,r23 ,r23 ,r23 , , ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,| , , , , , , , , , , 36: ,s13 ,s1 ,s9 ,s11 , , ,s5 ,s10 , ,s14 , ,s3 , , ,s2 ,| ,37,6, , , ,7, ,12,4, 37: ,r25 , , , , , , , , , ,r25 , , , , ,| , , , , , , , , , ,
Ruby 1.9 with the updated libraries
Actions: 0: ,s14 ,s5 ,s12 ,s1 , , ,s2 ,s13 , ,s3 , ,s10 , , ,s9 ,| ,8,6, , , ,7, ,11,4, 1: r8 ,r8 , , , , , , , , , ,r8 , , , , ,| , , , , , , , , , , 2: r9 ,r9 , , , , , , , , , ,r9 , , , , ,| , , , , , , , , , , 3: ,s17 , , , , , , , , , ,s16 , , , , ,| , , , , , , ,15, , , 4: r6 ,r6 , , , , , , , , , ,r6 , , , , ,| , , , , , , , , , , 5: r5 ,r5 , , , , , , , , , ,r5 , , , , ,| , , , , , , , , , , 6: r3 ,r3 , , , , , , , , , ,r3 , , , , ,| , , , , , , , , , , 7: r2 ,r2 , , , , , , , , , ,r2 , , , , ,| , , , , , , , , , , 8: a , , , , , , , , , , , , , , , ,| , , , , , , , , , , 9: r28 ,r28 ,r28 ,r28 ,r28 , , ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,r28 ,| , , , , , , , , , , 10: , , , , , , , , , , , , ,s18 , , ,| , , , , , , , , , , 11: r7 ,r7 , , , , , , , , , ,r7 , , , , ,| , , , , , , , , , , 12: r4 ,r4 , , , , , , , , , ,r4 , , , , ,| , , , , , , , , , , 13: ,s30 ,s24 ,s28 ,s19 , , ,s21 ,s13 ,s23 ,s3 , , , , ,s9 ,| , ,26,29,20,27,25, , ,22, 14: r1 ,r1 , , , , , , , , , ,r1 , , , , ,| , , , , , , , , , , 15: ,s32 , , , , , , , , , ,s31 , , , , ,| , , , , , , , , , , 16: r24 ,r24 ,r24 ,r24 ,r24 , , ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,r24 ,| , , , , , , , , , , 17: ,s14 ,s5 ,s12 ,s1 , , ,s2 ,s13 , ,s3 , ,s10 , , ,s9 ,| ,33,6, , , ,7, ,11,4, 18: , , , , , , , , , , , , , ,s34 , ,| , , , , , , , , , , 19: ,r20 ,r20 ,r20 ,r20 , , ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,r20 ,| , , , , , , , , , , 20: ,s30 ,s24 ,s28 ,s19 , , ,s21 ,s13 ,r12 ,s3 , , , , ,s9 ,| , ,26, , ,35,25, , ,22, 21: ,r21 ,r21 ,r21 ,r21 , , ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,r21 ,| , , , , , , , , , , 22: ,r22 ,r22 ,r22 ,r22 , , ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,r22 ,| , , , , , , , , , , 23: r11 ,r11 ,r11 ,r11 ,r11 , , ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,r11 ,| , , , , , , , , , , 24: ,r17 ,r17 ,r17 ,r17 , , ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,r17 ,| , , , , , , , , , , 25: ,r16 ,r16 ,r16 ,r16 , , ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,r16 ,| , , , , , , , , , , 26: 
,r15 ,r15 ,r15 ,r15 , , ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,r15 ,| , , , , , , , , , , 27: ,r14 ,r14 ,r14 ,r14 , , ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,r14 ,| , , , , , , , , , , 28: ,r19 ,r19 ,r19 ,r19 , , ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,r19 ,| , , , , , , , , , , 29: , , , , , , , , ,s36 , , , , , , ,| , , , , , , , , , , 30: ,r18 ,r18 ,r18 ,r18 , , ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,r18 ,| , , , , , , , , , , 31: r23 ,r23 ,r23 ,r23 ,r23 , , ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,r23 ,| , , , , , , , , , , 32: ,s14 ,s5 ,s12 ,s1 , , ,s2 ,s13 , ,s3 , ,s10 , , ,s9 ,| ,37,6, , , ,7, ,11,4, 33: ,r26 , , , , , , , , , ,r26 , , , , ,| , , , , , , , , , , 34: r27 ,r27 , , , , , , , , , ,r27 , , , , ,| , , , , , , , , , , 35: ,r13 ,r13 ,r13 ,r13 , , ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,r13 ,| , , , , , , , , , , 36: r10 ,r10 ,r10 ,r10 ,r10 , , ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,r10 ,| , , , , , , , , , , 37: ,r25 , , , , , , , , , ,r25 , , , , ,| , , , , , , , , , ,
Note
IMPORTANT
The latest libraries
# Parser for PdfAttributes
# created by Rockit version 0.3.8 on Fri Nov 26 11:17:14 +0100 2010
# Rockit is copyright (c) 2001 Robert Feldt, feldt@ce.chalmers.se
# and licensed under GPL
# but this parser is under LGPL
The updated libraries
# Parser for PdfAttributes
# created by Rockit version 0.3.8 on Tue Jan 18 07:42:18 +0100 2011
# Rockit is copyright (c) 2001 Robert Feldt, feldt@ce.chalmers.se
# and licensed under GPL
# but this parser is under LGPL
Reference
Next
Check lib/rpdf2txt/object.rb again
lib/rpdf2txt/object.rb#extract_attributes
def extract_attributes(ast)
  if(ast.children_names.include?('value'))
    pdf_unescape(ast.value)
  elsif(ast.children_names.include?('text'))
    pdf_unescape(ast.text.value[1...-1])
  elsif(ast.children_names.include?('values'))
    ast.values.collect { |child| extract_attributes(child) }
  elsif(ast.children_names.include?('pairs'))
    result = {}
    print "ast="
    p ast
    print "ast.pairs="
    p ast.pairs
    print "ast.pairs.class="
    p ast.pairs.class
    puts
    ast.pairs.each { |pair|
      k, v = pair
      keystr = k.value.strip.tr('/','')
      unless(keystr.empty?)
        result.store(keystr.downcase.intern, extract_attributes(v))
      end
    }
    result
  else
    value = ast
  end
end
Result
The latest libraries
masa@masa ~/ywesee/rpdf2txt $ ruby18 -I ~/work/rpdf2txt/lib test/test_pdf_parser.rb
...
ast=Hash:[_ArrayNode]
ast.pairs=_ArrayNode
ast.pairs.class=ArrayNode
The updated libraries
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/test_pdf_parser.rb
...
ast=Hash:[nil]
ast.pairs=nil
ast.pairs.class=NilClass
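Under Ruby 1.9 with the regenerated parser, ast.pairs is simply nil, so any iteration over it needs a guard. A minimal sketch of such a guard follows; DictNode and each_attribute_pair are hypothetical names, not part of rpdf2txt or Rockit, and plain arrays stand in for Rockit's ArrayNode. "|| []" is used rather than Array(), because Array() on an ArrayNode would depend on whether that class defines to_ary or to_a.

```ruby
# Hypothetical stand-in for a parsed dictionary node; only #pairs is modelled.
DictNode = Struct.new(:pairs)

# Walk the key/value pairs of a node, treating a nil pair list as empty
# instead of raising NoMethodError on nil.
def each_attribute_pair(node)
  (node.pairs || []).each { |key, value| yield key, value }
end

seen = []
each_attribute_pair(DictNode.new([["/Type", "Font"], ["/Subtype", "Type1"]])) { |k, v| seen << [k, v] }
each_attribute_pair(DictNode.new(nil)) { |k, v| seen << [k, v] } # empty: no iterations, no error
p seen
```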
Note
Experiment
def extract_attributes(ast)
  if(ast.children_names.include?('value'))
    pdf_unescape(ast.value)
  elsif(ast.children_names.include?('text'))
    pdf_unescape(ast.text.value[1...-1])
  elsif(ast.children_names.include?('values'))
    ast.values.collect { |child| extract_attributes(child) }
  elsif(ast.children_names.include?('pairs'))
    result = {}
    # ast.pairs is nil under Ruby 1.9 with the regenerated parser, so guard it
    if(ast_pairs = ast.pairs)
      #ast.pairs.each { |pair|
      ast_pairs.each { |pair|
        k, v = pair
        keystr = k.value.strip.tr('/','')
        unless(keystr.empty?)
          result.store(keystr.downcase.intern, extract_attributes(v))
        end
      }
    end
    result
  else
    value = ast
  end
end
...
def raw_stream
  #@raw_stream ||= @src.scan(/stream[\r\n]{1,2}(.*)endstream/mn).to_s
  #@raw_stream ||= @src.scan(/stream[\r\n]{1,2}(.*)endstream/mn)[0][0]
  unless(@raw_stream)
    if(src_scan = @src.scan(/stream[\r\n]{1,2}(.*)endstream/mn) and !src_scan.empty?)
      @raw_stream = src_scan[0][0]
    else
      @raw_stream = src_scan.to_s
    end
  end
  return @raw_stream
end
Result
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/test_pdf_parser.rb
test/test_pdf_parser.rb:28: warning: variable $KCODE is no longer effective; ignored
Loaded suite test/test_pdf_parser
Started
.
Finished in 0.084548164 seconds.
1 tests, 0 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
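The else branch of raw_stream relies on Array#to_s. On Ruby 1.8 that concatenated the elements, so [].to_s was an empty string; on Ruby 1.9 it became an alias of inspect, so an object without a stream ends up with the literal string "[]" (visible as @raw_stream="[]" in the CMap dump further down). A sketch that returns an empty string explicitly instead; extract_raw_stream is a hypothetical standalone helper, not part of rpdf2txt:

```ruby
# Return the text between "stream" and "endstream", or "" when the object
# has no stream, without going through Array#to_s.
def extract_raw_stream(src)
  match = src.match(/stream[\r\n]{1,2}(.*)endstream/mn)
  match ? match[1] : ""
end

p([].to_s)  # Ruby 1.8: "" ; Ruby 1.9 and later: "[]"
p extract_raw_stream("<< /Length 6 >>\nstream\nhello\nendstream")
p extract_raw_stream("<< >>")
```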
Note
Check all the tests (test_pdf_parser.rb)
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/test_pdf_parser.rb
test/test_pdf_parser.rb:28: warning: variable $KCODE is no longer effective; ignored
Loaded suite test/test_pdf_parser
Started
....F........
1) Failure:
test_join_snippets__hex_chars(TestParser) [test/test_pdf_parser.rb:334]:
<"Paroxetin besitzt eine selektive Wirkung; in-vitro Studien haben gezeigt, dass es, im Gegensatz zu\ntrizyklischen Antidepressiva, eine geringe Affinit\xE4t f\xFCr a1-, a2- und b-Adrenozeptoren sowie f\xFCr\nDopamin (D2)-, 5-HT1-artige, 5-HT2 und Histamin (H1)-Rezeptoren aufweist. Das Fehlen einer\n"> expected but was
<"Paroxetin besitzt eine selektive Wirkung; in-vitro Studien haben gezeigt, dass es, im Gegensatz zu\ntrizyklischen Antidepressiva, eine geringe Affinit\xE4t f\xFCr a1-, a2- und b-Adrenozeptoren sowie f\xFCr\nDopamin (D2)-, 5-HT1-artige, 5-HT2 und Histamin (H1)-Rezeptoren aufweist. Das Fehlen einer\n">
diff:
Paroxetin besitzt eine selektive Wirkung; in-vitro Studien haben gezeigt, dass es, im Gegensatz zu
? trizyklischen Antidepressiva, eine geringe Affinit� f� a1-, a2- und b-Adrenozeptoren sowie f�
Dopamin (D2)-, 5-HT1-artige, 5-HT2 und Histamin (H1)-Rezeptoren aufweist. Das Fehlen einer
Finished in 0.421212833 seconds.
13 tests, 56 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
92.3077% passed
Result
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/test_pdf_parser.rb
test/test_pdf_parser.rb:28: warning: variable $KCODE is no longer effective; ignored
Loaded suite test/test_pdf_parser
Started
.............
Finished in 0.411339522 seconds.
13 tests, 56 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
Next
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/test_pdf_parser.rb
test/test_pdf_parser.rb:28: warning: variable $KCODE is no longer effective; ignored
Loaded suite test/test_pdf_parser
Started
........F..............
1) Failure:
test_join_snippets6(TestParser) [test/test_pdf_parser.rb:481]:
<"In Studie 1 evaluierte man 271 Patienten mit einer m\xE4ssigen bis schweren aktiven rheumatoiden \nArthritis, die \xB318 Jahre alt waren, bei denen die Therapie mit mindestens einem, aber mit nicht mehr \n"> expected but was
<"In Studie 1 evaluierte man 271 Patienten mit einer m\xE4ssigen bis schweren aktiven rheumatoiden \nArthritis, die $18 Jahre alt waren, bei denen die Therapie mit mindestens einem, aber mit nicht mehr \n">
diff:
In Studie 1 evaluierte man 271 Patienten mit einer m�sigen bis schweren aktiven rheumatoiden
? Arthritis, die �18 Jahre alt waren, bei denen die Therapie mit mindestens einem, aber mit nicht mehr
? $
Finished in 2.576134358 seconds.
23 tests, 69 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
95.6522% passed
Note
Suspended for now.
First, focus on this failure:
1) Failure: test_char_width(TestTextState) [test/test_text_state.rb:303]: <0.313> expected but was <0.301>
Experiment
lib/rpdf2txt/text_state.rb#char_width
def char_width(char)
  if(char.is_a? String)
    char = char[0]
  end
  w = 0.0
  if(@font && (width = @font.width(char)))
    w = width
  elsif(@font && (avg = @font.attributes[:avgwidth]))
    w = avg
  end
  print "w="
  p w
  print "char="
  p char
  w = 300.0 if w == 0
  w += @char_spacing
  if(char==32)
    w += @word_spacing
  end
  w * @font_size / USER_SPACE
end
Result
Ruby 1.8 with the latest libraries
w=278
char=32
w=278
char=32
w=278
char=32
Ruby 1.9 with the updated libraries
w=278
char=" "
w=278
char=" "
w=278
char=" "
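Under Ruby 1.8, String#[] with an Integer index returns the byte value, so char[0] turns " " into 32; under Ruby 1.9 it returns a one-character String, which is why char stays " " above. Since char_width adds @word_spacing only when char==32, the word spacing is presumably never added on 1.9, which would explain the smaller width (0.301 instead of 0.313). A minimal sketch of a version-independent conversion; first_byte is a hypothetical helper name:

```ruby
# Return the first byte of a String as an Integer and pass Integers through
# unchanged. unpack('C*') yields unsigned 8-bit values on Ruby 1.8 and 1.9
# alike, unlike String#[] with an Integer index.
def first_byte(char)
  char.is_a?(String) ? char.unpack('C*')[0] : char
end

p " "[0]            # Ruby 1.8: 32, Ruby 1.9 and later: " "
p first_byte(" ")
p first_byte(32)
p(" "[0] == 32)     # false on Ruby 1.9 and later, so a check like char == 32 never matches
```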
Note
Experiment
lib/rpdf2txt/text_state.rb#char_width
def char_width(char)
  if(char.is_a? String)
    #char = char[0]
    char = char.unpack('C*')[0]
  end
Reference
Result
masa@masa ~/ywesee/rpdf2txt $ ruby18 -I lib test/test_text_state.rb
Loaded suite test/test_text_state
Started
.
Finished in 0.113862 seconds.
1 tests, 3 assertions, 0 failures, 0 errors

masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/test_text_state.rb
Loaded suite test/test_text_state
Started
.
Finished in 0.077112895 seconds.
1 tests, 3 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
Note
The last failure
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/test_pdf_parser.rb
test/test_pdf_parser.rb:28: warning: variable $KCODE is no longer effective; ignored
Loaded suite test/test_pdf_parser
Started
........F..............
1) Failure:
test_join_snippets6(TestParser) [test/test_pdf_parser.rb:482]:
<"In Studie 1 evaluierte man 271 Patienten mit einer m\xE4ssigen bis schweren aktiven rheumatoiden \nArthritis, die \xB318 Jahre alt waren, bei denen die Therapie mit mindestens einem, aber mit nicht mehr \n"> expected but was
<"In Studie 1 evaluierte man 271 Patienten mit einer m\xE4ssigen bis schweren aktiven rheumatoiden \nArthritis, die $18 Jahre alt waren, bei denen die Therapie mit mindestens einem, aber mit nicht mehr \n">
diff:
In Studie 1 evaluierte man 271 Patienten mit einer m�sigen bis schweren aktiven rheumatoiden
? Arthritis, die �18 Jahre alt waren, bei denen die Therapie mit mindestens einem, aber mit nicht mehr
? $
Finished in 2.577773336 seconds.
23 tests, 69 assertions, 1 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
95.6522% passed
Experiment
lib/rpdf2txt/text.rb#mapped_ascii
def mapped_ascii(ascii)
  if(@current_font)
    print "@current_font.cmap=", @current_font.cmap.inspect, "\n" if @current_font.cmap
    print "cmap.map=", @current_font.cmap.map.inspect, "\n" if @current_font.cmap
    print "ascii=", ascii, "\n" if @current_font.cmap
    if((cmap = @current_font.cmap) && (map = cmap.map) \
        && (unicode_bytes = map[ascii]) \
        && (ascii = SymbolMap::SYMBOL_ENTITIES[unicode_bytes]))
      print "ascii.chr="
      p ascii.chr
Result
Ruby 1.8
masa@masa ~/ywesee/rpdf2txt $ ruby18 -I lib test/test_pdf_parser.rb
Loaded suite test/test_pdf_parser
Started
@current_font.cmap=#<Rpdf2txt::CMap:0x7fdfca1b7298 @map={36=>8805}, @target_encoding="utf8", @decrypted_stream="", @decoded_stream="", @attributes={}, @src="<< >>", @raw_stream="">
cmap.map={36=>8805}
ascii=36
ascii.chr="\263"
Ruby 1.9
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/test_pdf_parser.rb
test/test_pdf_parser.rb:28: warning: variable $KCODE is no longer effective; ignored
Loaded suite test/test_pdf_parser
Started
@current_font.cmap=#<Rpdf2txt::CMap:0x00000000c6c780 @map={36=>8805}, @attributes={}, @src="<< >>", @target_encoding="utf8", @raw_stream="[]", @decrypted_stream="[]", @decoded_stream="[]">
cmap.map={36=>8805}
ascii=$
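The Ruby 1.9 output shows why the symbol substitution is skipped: mapped_ascii receives the String "$" where Ruby 1.8 passed the Integer 36, and cmap.map is keyed by Integers, so map[ascii] returns nil and the "$" survives into the text (hence "$18" instead of "\xB318"). A two-line reproduction using only the map printed above (8805 is U+2265, GREATER-THAN OR EQUAL TO):

```ruby
# cmap.map as printed by the debug output: byte 36 ("$") => 8805 (U+2265).
CMAP_MAP = { 36 => 8805 }

p CMAP_MAP["$"[0]]               # Ruby 1.8: 8805, since "$"[0] is 36; Ruby 1.9 and later: nil
p CMAP_MAP["$".unpack('C*')[0]]  # the lookup key is the byte value 36 on both versions
```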
Note
Experiment
def snip(snippet)
  snippet_text = snippet[1..-2].gsub(/\\[nrt]/n, " ")
  snippet_text.gsub!(/\\([()])/n, '\1')
  snippet_text.gsub!(/./n) { |char|
    #self.mapped_ascii(char[0]) || char
    self.mapped_ascii(char.unpack('C*')[0]) || char
Result
Ruby 1.8
masa@masa ~/ywesee/rpdf2txt $ ruby18 -I lib test/test_pdf_parser.rb
Loaded suite test/test_pdf_parser
Started
.
Finished in 0.286884 seconds.
1 tests, 1 assertions, 0 failures, 0 errors
Ruby 1.9
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/test_pdf_parser.rb
test/test_pdf_parser.rb:28: warning: variable $KCODE is no longer effective; ignored
Loaded suite test/test_pdf_parser
Started
.
Finished in 0.198423306 seconds.
1 tests, 1 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
Note
Final check
Ruby 1.8 with the updated libraries
masa@masa ~/ywesee/rpdf2txt $ ruby18 -I lib test/suite.rb
Loaded suite test/suite
Started
......................'invalid literal/lengths set' when filtering with /FlateDecode
...................................................................unknown encoding 370 0 R
.............................................
Finished in 12.640062 seconds.
134 tests, 295 assertions, 0 failures, 0 errors
Ruby 1.9 with the updated libraries
masa@masa ~/ywesee/rpdf2txt $ ruby1.9 -I lib test/suite.rb
test/suite.rb:26: warning: variable $KCODE is no longer effective; ignored
/home/masa/ywesee/rpdf2txt/test/test_pdf_object.rb:26: warning: variable $KCODE is no longer effective; ignored
/home/masa/ywesee/rpdf2txt/test/test_pdf_parser.rb:28: warning: variable $KCODE is no longer effective; ignored
Loaded suite test/suite
Started
......................'invalid literal/lengths set' when filtering with /FlateDecode
.........................................................
..........unknown encoding 370 0 R
.............................................
Finished in 9.930899752 seconds.
134 tests, 295 assertions, 0 failures, 0 errors, 0 pendings, 0 omissions, 0 notifications
100% passed
Note
Ruby 1.8 with the latest libraries
masa@masa ~/ywesee/rpdf2txt $ ruby18 test/suite.rb
Loaded suite test/suite
Started
......................'invalid literal/lengths set' when filtering with /FlateDecode
...................................................................unknown encoding 370 0 R
....................F........................
Finished in 12.225387 seconds.
1) Failure:
test_join_snippets__hex_chars(TestParser) [/home/masa/ywesee/rpdf2txt/test/test_pdf_parser.rb:335]:
<"Paroxetin besitzt eine selektive Wirkung; in-vitro Studien haben gezeigt, dass es, im Gegensatz zu\ntrizyklischen Antidepressiva, eine geringe Affinit� f� a1-, a2- und b-Adrenozeptoren sowie f� Dopamin (D2)-, 5-HT1-artige, 5-HT2 und Histamin (H1)-Rezeptoren aufweist. Das Fehlen einer\n"> expected but was
<"Paroxetin besitzt eine selektive Wirkung; in-vitro Studien haben gezeigt, dass es, im Gegensatz zu\ntrizyklischen Antidepressiva, eine geringe Affinit� f� a1-, a2- und b-Adrenozeptoren sowie f� Dopamin (D2)-, 5-HT1-artige, 5-HT2 und Histamin (H1)-Rezeptoren aufweist. Das Fehlen einer\n">.
Note
Experiment
lib/rpdf2txt/text.rb#snip
def snip(snippet)
  snippet_text = snippet[1..-2].gsub(/\\[nrt]/n, " ")
  print "snippet_text1="
  p snippet_text
  snippet_text.gsub!(/\\([()])/n, '\1')
  print "snippet_text2="
  p snippet_text
  snippet_text.gsub!(/./n) { |char|
    #self.mapped_ascii(char[0]) || char
    self.mapped_ascii(char.unpack('C*')[0]) || char
  }
  print "snippet_text3="
  p snippet_text
  _snip(snippet_text)
end
Result
Ruby 1.8 with the latest libraries
snippet_text1="trizyklischen Antidepressiva, eine geringe Affinit� f� �"
snippet_text2="trizyklischen Antidepressiva, eine geringe Affinit� f� �"
snippet_text3="trizyklischen Antidepressiva, eine geringe Affinit� f� "
Ruby 1.9 with the updated libraries
snippet_text1="trizyklischen Antidepressiva, eine geringe Affinit\xE4t f\xFCr "
snippet_text2="trizyklischen Antidepressiva, eine geringe Affinit\xE4t f\xFCr "
snippet_text3="trizyklischen Antidepressiva, eine geringe Affinit\xE4t f\xFCr "
Note
snippet_text.gsub!(/./n) { |char|
  #self.mapped_ascii(char[0]) || char
  self.mapped_ascii(char.unpack('C*')[0]) || char
}
Experiment
def snip(snippet)
  snippet_text = snippet[1..-2].gsub(/\\[nrt]/n, " ")
  print "snippet_text1="
  p snippet_text
  snippet_text.gsub!(/\\([()])/n, '\1')
  print "snippet_text2="
  p snippet_text
  print "snippet_text.scan(/./n)="
  p snippet_text.scan(/./n)
  snippet_text.gsub!(/./n) { |char|
    #self.mapped_ascii(char[0]) || char
    self.mapped_ascii(char.unpack('C*')[0]) || char
  }
  print "snippet_text3="
  p snippet_text
  _snip(snippet_text)
end
Result
Ruby 1.8 with the latest libraries
snippet_text2="trizyklischen Antidepressiva, eine geringe Affinit� f� "
snippet_text.scan(/./n)=["t", "r", "i", "z", "y", "k", "l", "i", "s", "c", "h", "e", "n", " ", "A", "n", "t", "i", "d", "e", "p", "r", "e", "s", "s", "i", "v", "a", ",", " ", "e", "i", "n", "e", " ", "g", "e", "r", "i", "n", "g", "e", " ", "A", "f", "f", "i", "n", "i", "t", "\344", "t", " ", "f", "\374", "r", " "]
Ruby 1.9 with the updated libraries
snippet_text2="trizyklischen Antidepressiva, eine geringe Affinit\xE4t f\xFCr "
snippet_text.scan(/./n)=["t", "r", "i", "z", "y", "k", "l", "i", "s", "c", "h", "e", "n", " ", "A", "n", "t", "i", "d", "e", "p", "r", "e", "s", "s", "i", "v", "a", ",", " ", "e", "i", "n", "e", " ", "g", "e", "r", "i", "n", "g", "e", " ", "A", "f", "f", "i", "n", "i", "t", "\xE4", "t", " ", "f", "\xFC", "r", " "]
Note
So the key step is the following:
self.mapped_ascii(char.unpack('C*')[0]) || char
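That step can be tried in isolation. map_bytes and SYMBOL_BYTES below are hypothetical (the real code goes through mapped_ascii and SymbolMap::SYMBOL_ENTITIES), and working on a binary (ASCII-8BIT) copy of the snippet is an assumption, not something the log establishes. The sketch shows that on a binary string /./n hands over one byte at a time, that unpack('C*') turns the umlaut bytes from the scan output above ("\344" and "\374") into 228 and 252, and that unmapped bytes pass through unchanged:

```ruby
# Replace every byte that has an entry in byte_map by the mapped single byte
# and keep all other bytes, mirroring the "|| char" fallback in snip.
def map_bytes(text, byte_map)
  binary = text.dup.force_encoding('ASCII-8BIT')
  binary.gsub(/./mn) do |char|
    mapped = byte_map[char.unpack('C*')[0]]
    mapped ? mapped.chr : char
  end
end

# Hypothetical one-entry table: "$" (36) => 0xB3, the byte the test expects in "\xB318".
SYMBOL_BYTES = { 36 => 0xB3 }

latin1 = "Affinit\xE4t f\xFCr"           # Latin-1 bytes, as in the snippet above
p latin1.unpack('C*').values_at(7, 11)   # the byte values of the two umlauts
p map_bytes("die $18", SYMBOL_BYTES)
p map_bytes(latin1, SYMBOL_BYTES) == latin1.dup.force_encoding('ASCII-8BIT')
```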