oddb.yaml does not contain all company records.
ch.oddb> companies.values.length
-> 728
ch.oddb> a = 0; companies.values.each {|c| a += c.packages.length }; p a;
-> 24850
def OdbaExporter.export_yaml(odba_ids, dir, name, opts={})
  opts.each do |key, val|
    Thread.current[key] = val
  end
  safe_export(dir, name) { |fh|
    p "odba_ids #=> #{odba_ids.length}"
    non_data = 0
    exp_data = 0
    odba_ids.each { |odba_id|
      begin
        yaml = YAML.dump(ODBA.cache.fetch(odba_id, nil))
        non_data += 1 unless yaml
        exp_data += 1 if yaml
        fh.puts Syck.unescape(yaml)
        fh.puts
      rescue
      end
    }
    p "non_data #=> #{non_data}"
    p "exp_data #=> #{exp_data}"
    nil
  }
end
Some records are missing from the export.
"odba_ids #=> 728"
"non_data #=> 0"
"exp_data #=> 656"
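The counters above account for only 656 of the 728 ids. The remaining 72 never reach either counter, because the bare rescue swallows whatever was raised. A minimal sketch of counting the failures explicitly follows. The helper `export_with_counts` and the `store` hash are hypothetical stand-ins for the real export loop and for `ODBA.cache.fetch`, not part of the actual exporter.

```ruby
require 'yaml'

# Hypothetical helper: dumps each record to YAML and keeps track of
# which ids failed and why, instead of dropping them silently.
def export_with_counts(ids)
  exported = []
  failed = {}
  ids.each do |id|
    begin
      exported << YAML.dump(yield(id))
    rescue StandardError => e
      failed[id] = e.class
    end
  end
  [exported, failed]
end

# Hypothetical data: id 2 stands in for a record with a broken reference.
store = { 1 => { 'name' => 'A' }, 2 => nil }

exported, failed = export_with_counts(store.keys) do |id|
  record = store.fetch(id)
  record.fetch('name') # raises NoMethodError when the record is nil
  record
end

puts "exp_data #=> #{exported.length}"
puts "failed   #=> #{failed.inspect}"
```

With a failure list like this, the ids of the dropped records are available for inspection after the export.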
In some cases, YAML.dump fails.
The exceptions appear to occur because some companies have missing package references.
# Package class
map.add('pharmacode', self.pharmacode)
map.add('narcotics', @narcotics.collect { |narc| narc.casrn}) # this line raises the error
map.add('deductible', {'deductible_g' => 10, 'deductible_o' => 20 }[self.deductible.to_s])
After I skipped nil objects, oddb.yaml roughly doubled in size.
(The roughly 100 companies that were missing, and their packages, are now also in oddb.yaml.)
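One way to skip nil objects in a collect like the one in the Package class is to drop them with `compact` first. This is only a sketch: `SampleNarcotic` and the placeholder values are hypothetical, and the actual fix in the exporter may look different.

```ruby
# Hypothetical stand-in for the narcotic objects referenced by a package.
SampleNarcotic = Struct.new(:casrn)

# nil stands in for a reference that can no longer be resolved.
narcotics = [SampleNarcotic.new('casrn-1'), nil, SampleNarcotic.new('casrn-2')]

# Without compact, narc.casrn raises NoMethodError on the nil entry.
casrns = narcotics.compact.collect { |narc| narc.casrn }

p casrns
```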
Currently, only oddb.yaml contains unescaped Unicode characters.
If fachinfo.yaml and patinfo.yaml are also changed to contain unescaped characters, we will have to update the converters as well (for example patinfo2csv).
/var/www/ch.oddb.org 500(master) $ file ~/Downloads/oddb.yaml
/home/yasuhiro/Downloads/oddb.yaml: ASCII text, with very long lines
/var/www/ch.oddb.org 609(master) $ file data/downloads/fachinfo.yaml
data/downloads/fachinfo.yaml: UTF-8 Unicode text, with very long lines
/var/www/ch.oddb.org 610(master) $ file data/downloads/patinfo.yaml
data/downloads/patinfo.yaml: UTF-8 Unicode text, with very long lines
/var/www/ch.oddb.org 611(master) $ file data/downloads/oddb.yaml
data/downloads/oddb.yaml: UTF-8 Unicode text, with very long lines
Testing the new YAML file with patinfo2csv:
$ bin/patinfo2csv ../patinfo.yaml ../patinfo.csv ../ean.txt
/path/to/patinfo2csv/lib/patinfo2csv/loader.rb:12:in `gets': "\xC2" on US-ASCII (Encoding::InvalidByteSequenceError)
$ bin/patinfo2csv ../patinfo.yaml ../patinfo.csv ../ean.txt
/path/to/ruby/1.9.1/psych.rb:206:in `parse': (<unknown>): control characters are not allowed at line 1 column 1 (Psych::SyntaxError)
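The first error indicates that the file was read as US-ASCII although it contains UTF-8 bytes. A converter can avoid depending on the process default by naming the encoding when it opens the file. The following is a minimal sketch that uses a temporary sample file instead of the real patinfo.yaml. It does not reproduce the patinfo2csv loader, and it makes no claim about the cause of the second (Psych) error.

```ruby
require 'tempfile'
require 'yaml'

# Write a small UTF-8 sample (hypothetical content) to a temporary file.
tmp = Tempfile.new(['sample', '.yaml'])
tmp.write("---\nname: \"Z\u00FCrich\"\n")
tmp.close

# Open with an explicit external encoding rather than the locale default.
data = File.open(tmp.path, 'r:UTF-8') { |fh| YAML.load(fh.read) }

puts data['name'].encoding
tmp.unlink
```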
Suspended.
(The update was applied to oddb.yaml only.)
The following flags (boolean) are missing from oddb.yaml in some cases.
The model classes do not initialize these flag values.
check_accessor_list = {
  ...
  :lppv => ["TrueClass","NilClass","FalseClass"]
  ...
def multiplier
  count = @count || 1
  addition = @addition || 0
  ...
I added default values for these fields to the exported YAML in exportd. (This changed the order of the instance variables in these objects.)
$ grep -r lppv: data/downloads/oddb.yaml | sort | uniq
lppv: false
lppv: true
$ grep -r addition: data/downloads/oddb.yaml | sort | uniq
addition: 0
addition: 1
addition: 10
addition: 11
addition: 2
addition: 20
addition: 3
addition: 4
addition: 490
addition: 5
addition: 50
addition: 6
addition: 84
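Filling in defaults just before dumping could look roughly like the sketch below. `with_defaults` and `SampleRecord` are hypothetical names, and the real change in exportd may be implemented differently. Because an instance variable that is set late is appended after the existing ones, the sketch also shows why the order of the fields in the YAML output changes.

```ruby
require 'yaml'

# Hypothetical helper: sets each default only when the instance
# variable is still unset (nil), then returns the object.
def with_defaults(obj, defaults)
  defaults.each do |ivar, value|
    obj.instance_variable_set(ivar, value) if obj.instance_variable_get(ivar).nil?
  end
  obj
end

# Hypothetical model class that never initializes @lppv or @addition.
class SampleRecord
  def initialize(name)
    @name = name
  end
end

record = with_defaults(SampleRecord.new('x'), { '@lppv' => false, '@addition' => 0 })
yaml = YAML.dump(record)

puts yaml
```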