
20120323-patinfo2csv



summary

  • updated patinfo2csv
    • added 3-byte character codes
    • updated patinfo2csv to use the EAN code instead of the pharmacode
    • refactored code (a run finally takes 2-3 minutes)

commit

index


Added 3-byte character codes

...
   /\\xE2\\x80\\x99/ => "’",
   /\\xE2\\x80\\x9A/ => "‚",
   /\\xE2\\x80\\x9B/ => "‛",
   /\\xE2\\x80\\x9C/ => "“",
...
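
A minimal sketch of how such a table might be applied. It assumes the patterns match escape sequences written out literally in the input text, as the doubled backslashes suggest. The names REPLACEMENTS and replace_escapes are hypothetical, not taken from patinfo2csv.

```ruby
# Hypothetical mapping table: escaped UTF-8 byte sequences, present as
# literal text in the input, mapped to the character they encode.
REPLACEMENTS = {
  /\\xE2\\x80\\x99/ => "\u2019", # right single quotation mark
  /\\xE2\\x80\\x9C/ => "\u201C"  # left double quotation mark
}

# Apply every pattern in the table to one line of text.
def replace_escapes(line, table = REPLACEMENTS)
  table.inject(line) { |text, (pattern, char)| text.gsub(pattern, char) }
end

# Single quotes keep the backslashes literal, like the raw input text.
puts replace_escapes('It\xE2\x80\x99s')
```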

refs


Trying file loading (character replacement) with multiple threads

When I run the Regexp replacement in multiple threads, the script sometimes stops during YAML loading.
This happens because my script is not thread-safe.

$ bin/patinfo2csv ../patinfo.yaml.sample ../patinfo.csv.sample ../eancode.txt.sample 
patinfo[de] converted: 2 rows
$ bin/patinfo2csv ../patinfo.yaml.sample ../patinfo.csv.sample ../eancode.txt.sample 
/usr/local/lib/ruby/1.9.1/psych.rb:154:in `parse': (<unknown>): couldn't parse YAML at line 2984 column 15 (Psych::SyntaxError)
        from /usr/local/lib/ruby/1.9.1/psych.rb:154:in `parse_stream'
        from /usr/local/lib/ruby/1.9.1/psych.rb:222:in `load_stream'
        from /home/yasu/Documents/workspace/ywesee/patinfo2csv/lib/patinfo2csv/loader.rb:35:in `load_yaml'
        from /home/yasu/Documents/workspace/ywesee/patinfo2csv/lib/patinfo2csv/cli.rb:18:in `run'
        from bin/patinfo2csv:45:in `<main>'

I made the script thread-safe.
Then I ran it with 5 to 10 threads.

For loading and parsing, multiple threads are not an effective approach.
They could not speed things up because I have to lock with a mutex.
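
A minimal sketch of why the lock removes the benefit. It assumes every thread must hold one shared Mutex while it does the replacement. The method replace_all_locked and the single pattern are hypothetical, not taken from patinfo2csv. The work inside synchronize runs one thread at a time, so extra threads only add locking and switching overhead.

```ruby
# Replace escapes in every line, splitting the lines across threads.
# Assumes thread_count >= 1.
def replace_all_locked(lines, thread_count)
  lock = Mutex.new
  results = Array.new(lines.size)
  slice_size = [(lines.size / thread_count.to_f).ceil, 1].max

  threads = lines.each_with_index.each_slice(slice_size).map do |pairs|
    Thread.new do
      pairs.each do |line, index|
        # Only one thread at a time can be inside this block, so the
        # replacement work is effectively serial.
        lock.synchronize do
          results[index] = line.gsub(/\\xE2\\x80\\x99/, "\u2019")
        end
      end
    end
  end

  threads.each(&:join)
  results
end
```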

Main thread only

107.776218639 sec.

Main thread + 5 threads

over 20 min.

Refactored character replacement and chapter selection

Refactored the character replacement that runs before YAML loading.

  • skipped unused lines
  • applied character replacement by Regexp only to text and subheading lines
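
One possible reading of the two points above, as a sketch: lines that are not text or subheading entries are passed through without running any Regexp on them. The key layout matched by TARGET_KEYS, and the names ESCAPES and clean_line, are assumptions for illustration, not the actual patinfo2csv code.

```ruby
# Hypothetical layout: the lines of interest start with "text:" or
# "subheading:", optionally preceded by indentation and a list dash.
TARGET_KEYS = /\A\s*(?:-\s*)?(?:text|subheading):/
ESCAPES = { /\\xE2\\x80\\x99/ => "\u2019" }

def clean_line(line)
  # Skip the replacement work entirely for every other line.
  return line unless line =~ TARGET_KEYS
  ESCAPES.inject(line) { |text, (pattern, char)| text.gsub(pattern, char) }
end
```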

Refactored chapter choosing

  • made row (chapters) building multi-threaded (only 2 threads)
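
A sketch of building rows in two threads without a shared lock, assuming each record can be turned into a row independently. The record shape (a hash with :ean and :chapters) and the names build_row and build_rows are hypothetical.

```ruby
# Hypothetical row layout: the EAN code followed by the chapter texts.
def build_row(record)
  [record[:ean]] + record[:chapters]
end

# Split the records into one slice per thread. Each thread builds its
# own array of rows, so no mutex is needed. Assumes thread_count >= 1.
def build_rows(records, thread_count = 2)
  return [] if records.empty?
  slice_size = (records.size / thread_count.to_f).ceil
  threads = records.each_slice(slice_size).map do |slice|
    Thread.new { slice.map { |record| build_row(record) } }
  end
  # Thread#value waits for the thread and returns its block's result.
  threads.flat_map(&:value)
end
```

Because the slices are joined in order, the rows come back in the same order as the input records.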

Total times

1 thread (only main thread)

120.169318657 sec.

2 threads

96.08502092 sec.

3 threads

144.50163247 sec.
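
Wall-clock totals like these can be collected with Benchmark.realtime from the standard library. The sketch below is only an assumption about how one might measure a run. The helper time_run and the dummy workload are hypothetical and are not how the figures above were produced.

```ruby
require 'benchmark'

# Run the given block, print the elapsed wall-clock time, and return it.
def time_run(label)
  seconds = Benchmark.realtime { yield }
  puts "#{label}: #{seconds} sec."
  seconds
end

# Dummy workload standing in for a conversion run with 2 threads.
time_run('2 threads') do
  2.times.map { Thread.new { (1..10_000).reduce(:+) } }.each(&:join)
end
```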

Page last modified on March 24, 2012, at 06:47 PM