
20120621-debug-yaml-exporter-exportd



Summary

  • Updated exportd
    • Currently, only oddb.yaml contains unescaped Unicode characters.
    • Fixed a bug that left the records of about 100 companies out of the export.

Commits

Index


Debug-yaml-exporter

Problem

oddb.yaml does not contain all company records.

ch.oddb> companies.values.length
-> 728
ch.oddb> a = 0; companies.values.each {|c| a += c.packages.length }; p a;
-> 24850
in ext/export/src/odba_export.rb
    def OdbaExporter.export_yaml(odba_ids, dir, name, opts={})
      opts.each do |key, val| Thread.current[key] = val end 
      safe_export(dir, name) { |fh|
        p "odba_ids #=> #{odba_ids.length}"
        non_data = 0 
        exp_data = 0 
        odba_ids.each { |odba_id|
          begin
            yaml = YAML.dump(ODBA.cache.fetch(odba_id, nil))
            non_data += 1 unless yaml
            exp_data += 1 if yaml
            fh.puts Syck.unescape(yaml)
            fh.puts
          rescue
          end 
        }   
        p "non_data #=> #{non_data}"
        p "exp_data #=> #{exp_data}"
        nil 
      }   
    end

Some records are missing from the export.

"odba_ids #=> 728"
"non_data #=> 0"
"exp_data #=> 656"

In some cases, YAML.dump fails.
It seems that exceptions are raised because some Companies have missing Package references.
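
The debug loop above rescues every exception silently, so the failing records can only be inferred from the counters. A minimal sketch of recording which records fail and why is shown here. `CompanySketch` and `dump_with_report` are hypothetical names used only for illustration; they are not part of ODBA or oddb.org, and the failure is simulated with a nil reference.

```ruby
require 'yaml'

# Hypothetical stand-in for a model whose YAML serialization can fail.
class CompanySketch
  def initialize(name, packages)
    @name = name
    @packages = packages
  end

  # Psych calls encode_with when it dumps an object that defines it.
  def encode_with(coder)
    coder['name'] = @name
    # Raises NoMethodError when @packages is nil (simulated missing reference).
    coder['package_count'] = @packages.length
  end
end

# Dumps every record and collects failures instead of discarding them.
def dump_with_report(records)
  dumped = []
  failures = []
  records.each_with_index do |record, index|
    begin
      dumped << YAML.dump(record)
    rescue StandardError => e
      failures << [index, e.class.name, e.message]
    end
  end
  [dumped, failures]
end

records = [
  CompanySketch.new('A', []),
  CompanySketch.new('B', nil), # this one fails to dump
  CompanySketch.new('C', [1])
]
dumped, failures = dump_with_report(records)
failures.each { |index, klass, msg| warn "record #{index}: #{klass}: #{msg}" }
```

Keeping the exception class and message next to the record index makes it possible to see the cause without re-running the export.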

  • Some Packages have no usable @narcotics value.
# Package class
  map.add('pharmacode', self.pharmacode)
  map.add('narcotics', @narcotics.collect { |narc| narc.casrn})
  # the line above causes the error
  map.add('deductible', {'deductible_g' => 10, 'deductible_o' => 20 }[self.deductible.to_s])

After I skipped nil objects, oddb.yaml roughly doubled in size.
(The roughly 100 previously missing companies, and their packages, are now in oddb.yaml.)
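
A minimal sketch of skipping nil objects when collecting the narcotics values follows. It assumes the error comes from @narcotics being nil or containing nil entries; `PackageSketch`, `NarcoticSketch`, and `narcotic_casrns` are hypothetical names, not the actual Package code.

```ruby
# Hypothetical stand-ins for the Package and Narcotic models.
NarcoticSketch = Struct.new(:casrn)

class PackageSketch
  def initialize(narcotics)
    @narcotics = narcotics
  end

  # Treats a missing @narcotics as an empty list and drops nil entries
  # before calling casrn on each remaining object.
  def narcotic_casrns
    (@narcotics || []).compact.collect { |narc| narc.casrn }
  end
end

with_refs    = PackageSketch.new([NarcoticSketch.new('casrn-1'), nil])
without_refs = PackageSketch.new(nil)

casrns_with    = with_refs.narcotic_casrns
casrns_without = without_refs.narcotic_casrns
```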

Currently, only oddb.yaml contains unescaped Unicode characters.

commit

Debug tools for readable exported yaml

If fachinfo.yaml and patinfo.yaml are switched to unescaped characters, we also have to update the converters
(such as patinfo2csv).

previous yaml file encoding
/var/www/ch.oddb.org 500(master) $ file ~/Downloads/oddb.yaml
/home/yasuhiro/Downloads/oddb.yaml: ASCII text, with very long lines
new encoding
/var/www/ch.oddb.org 609(master) $ file data/downloads/fachinfo.yaml
data/downloads/fachinfo.yaml: UTF-8 Unicode text, with very long lines
/var/www/ch.oddb.org 610(master) $ file data/downloads/patinfo.yaml
data/downloads/patinfo.yaml: UTF-8 Unicode text, with very long lines
/var/www/ch.oddb.org 611(master) $ file data/downloads/oddb.yaml
data/downloads/oddb.yaml: UTF-8 Unicode text, with very long lines

patinfo.yaml

Test the new yaml file.

patinfo2csv
$ bin/patinfo2csv ../patinfo.yaml ../patinfo.csv ../ean.txt 
/path/to/patinfo2csv/lib/patinfo2csv/loader.rb:12:in `gets': "\xC2" on US-ASCII (Encoding::InvalidByteSequenceError)

$ bin/patinfo2csv ../patinfo.yaml ../patinfo.csv ../ean.txt 
/path/to/ruby/1.9.1/psych.rb:206:in `parse': (<unknown>): control characters are not allowed at line 1 column 1 (Psych::SyntaxError)
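
The first error above suggests that the loader reads the file under a US-ASCII default encoding. A minimal sketch of reading a file with an explicit UTF-8 external encoding is shown here. `read_utf8_lines` is a hypothetical helper and the temporary file is made up for the example; this is not the patinfo2csv loader, and it does not address the second (Psych::SyntaxError) failure.

```ruby
require 'tempfile'

# Reads all lines with an explicit UTF-8 external encoding, so the result
# does not depend on the process-wide default (which may be US-ASCII).
def read_utf8_lines(path)
  File.open(path, 'r:UTF-8') { |fh| fh.each_line.map(&:chomp) }
end

# Build a small sample file containing a non-ASCII character.
sample = Tempfile.new(['patinfo_sketch', '.yaml'])
sample.write("name: Pr\u00E4parat\n")
sample.close

lines = read_utf8_lines(sample.path)
sample.unlink
```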

Suspended.
(The update was applied to oddb.yaml only.)


Update flag values of oddb.yaml

The following flag values are missing in some cases in oddb.yaml.

  • Package -> lppv
  • Parts -> addition

These model classes do not initialize these flag values.

src/model/packages.rb
check_accessor_list = {
...
:lppv => ["TrueClass","NilClass","FalseClass"]
...
src/model/part.rb
def multiplier
   count = @count || 1
   addition = @addition || 0
...

I added default values for the exported yaml in exportd. (This changed the order of the instance variables in these objects.)
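
A minimal sketch of filling in defaults before dumping is shown here. `PartSketch` and `apply_export_defaults` are hypothetical names and the default of 0 is an assumption taken from the `@addition || 0` fallback above; the actual change in exportd may look different.

```ruby
require 'yaml'

# Hypothetical stand-in for a model that never initializes @addition.
class PartSketch
  def initialize(count)
    @count = count
  end
end

# Sets each listed instance variable only when it is currently nil,
# so an explicit false or 0 is never overwritten.
def apply_export_defaults(obj, defaults)
  defaults.each do |ivar, value|
    obj.instance_variable_set(ivar, value) if obj.instance_variable_get(ivar).nil?
  end
  obj
end

part = apply_export_defaults(PartSketch.new(3), { :@addition => 0 })
yaml = YAML.dump(part)
```

A variable that is set this way is defined after the ones assigned in initialize, which may be one reason the order of the dumped values changes.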

result
$ grep -r lppv: data/downloads/oddb.yaml | sort | uniq 
          lppv: false
          lppv: true
$ grep -r addition: data/downloads/oddb.yaml | sort | uniq 
            addition: 0
            addition: 1
            addition: 10
            addition: 11
            addition: 2
            addition: 20
            addition: 3
            addition: 4
            addition: 490
            addition: 5
            addition: 50
            addition: 6
            addition: 84
commit
Page last modified on June 21, 2012, at 11:44 AM