oddb.yaml does not contain all company records.
ch.oddb> companies.values.length
-> 728
ch.oddb> a = 0; companies.values.each {|c| a += c.packages.length }; p a;
-> 24850
def OdbaExporter.export_yaml(odba_ids, dir, name, opts={})
  opts.each do |key, val| Thread.current[key] = val end
  safe_export(dir, name) { |fh|
    p "odba_ids #=> #{odba_ids.length}"
    non_data = 0
    exp_data = 0
    odba_ids.each { |odba_id|
      begin
        yaml = YAML.dump(ODBA.cache.fetch(odba_id, nil))
        non_data += 1 unless yaml
        exp_data += 1 if yaml
        fh.puts Syck.unescape(yaml)
        fh.puts
      rescue
      end
    }
    p "non_data #=> #{non_data}"
    p "exp_data #=> #{exp_data}"
    nil
  }
end
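Some records are lost during the export: 728 ids go in, but only 656 are written.
"odba_ids #=> 728"
"non_data #=> 0"
"exp_data #=> 656"
In some cases, YAML.dump fails. non_data stays at 0 because a failing dump raises an exception, which the bare rescue swallows before either counter is updated.
The exceptions seem to occur because some companies have missing package references.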
# Package class
map.add('pharmacode', self.pharmacode)
map.add('narcotics', @narcotics.collect { |narc| narc.casrn})
# this line causes the error.
map.add('deductible', {'deductible_g' => 10, 'deductible_o' => 20 }[self.deductible.to_s])
I skipped nil objects, and oddb.yaml then roughly doubled in size.
(The roughly 100 companies that were missing, together with their packages, are now in oddb.yaml as well.)
Currently, only oddb.yaml contains unescaped Unicode characters.
If fachinfo.yaml and patinfo.yaml are also changed to use unescaped characters, we have to update the converters too
(such as patinfo2csv).
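The skip-and-count idea can be sketched as follows. This is only a minimal illustration, not the actual exporter code: `FAKE_CACHE`, `BrokenRecord` and `export_with_report` are hypothetical names, a plain Hash stands in for `ODBA.cache`, and the sketch uses the standard Psych-based `YAML.dump` rather than Syck. Instead of a bare `rescue`, it records which ids were missing and which ones raised, so lost records become visible.

```ruby
require 'yaml'
require 'stringio'

# Hypothetical record whose dump fails, standing in for a company
# with a dangling package reference.
class BrokenRecord
  def encode_with(coder)
    raise 'missing package reference'
  end
end

# Hypothetical stand-in for ODBA.cache: odba_id => object (nil = not found).
FAKE_CACHE = {
  1 => { 'name' => 'Company A' },
  2 => nil,
  3 => { 'name' => 'Company C' },
  4 => BrokenRecord.new,
}

# Dump every fetchable object to io and report what was skipped and why.
def export_with_report(odba_ids, cache, io)
  report = { exported: 0, missing: [], failed: {} }
  odba_ids.each do |odba_id|
    obj = cache[odba_id]
    if obj.nil?
      report[:missing] << odba_id   # skip nil objects, but remember them
      next
    end
    begin
      yaml = YAML.dump(obj)         # may raise for broken records
      io.puts yaml
      io.puts
      report[:exported] += 1
    rescue StandardError => e
      report[:failed][odba_id] = "#{e.class}: #{e.message}"
    end
  end
  report
end

out = StringIO.new
report = export_with_report([1, 2, 3, 4], FAKE_CACHE, out)
p report
```

With a report like this, the difference between the number of ids and the number of exported records can be traced back to individual ids.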
/var/www/ch.oddb.org 500(master) $ file ~/Downloads/oddb.yaml
/home/yasuhiro/Downloads/oddb.yaml: ASCII text, with very long lines
/var/www/ch.oddb.org 609(master) $ file data/downloads/fachinfo.yaml
data/downloads/fachinfo.yaml: UTF-8 Unicode text, with very long lines
/var/www/ch.oddb.org 610(master) $ file data/downloads/patinfo.yaml
data/downloads/patinfo.yaml: UTF-8 Unicode text, with very long lines
/var/www/ch.oddb.org 611(master) $ file data/downloads/oddb.yaml
data/downloads/oddb.yaml: UTF-8 Unicode text, with very long lines
Test the new YAML file with the converter.
$ bin/patinfo2csv ../patinfo.yaml ../patinfo.csv ../ean.txt
/path/to/patinfo2csv/lib/patinfo2csv/loader.rb:12:in `gets': "\xC2" on US-ASCII (Encoding::InvalidByteSequenceError)
$ bin/patinfo2csv ../patinfo.yaml ../patinfo.csv ../ean.txt
/path/to/ruby/1.9.1/psych.rb:206:in `parse': (<unknown>): control characters are not allowed at line 1 column 1 (Psych::SyntaxError)
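The first error above is the kind Ruby raises when a file containing UTF-8 bytes is read under a US-ASCII default encoding. One possible adjustment on the converter side, shown here only as a sketch, is to declare the external encoding when opening the file. The sample file and its content are made up for illustration, and this is not the actual patinfo2csv loader; the second error (Psych::SyntaxError) is not addressed here.

```ruby
require 'tmpdir'

# Write a small UTF-8 sample file. "\u00A0" (no-break space) is stored as
# the bytes C2 A0, i.e. it starts with the same "\xC2" byte as in the error.
path = File.join(Dir.mktmpdir, 'sample.yaml')
File.binwrite(path, "name: Z\u00FCrich\u00A0AG\n")

# 'r:UTF-8' sets the external encoding explicitly, so gets returns
# UTF-8 strings regardless of the locale the process runs under.
line = File.open(path, 'r:UTF-8') { |fh| fh.gets }

puts line.encoding
puts line.valid_encoding?
```

Setting the encoding per file keeps the change local to the loader instead of relying on the environment (for example LANG) of whoever runs the converter.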
Suspended.
(The update is applied to oddb.yaml only.)
The following boolean flags are missing in some cases in oddb.yaml.
The model classes do not initialize these flag values.
check_accessor_list = {
...
:lppv => ["TrueClass","NilClass","FalseClass"]
...
def multiplier
  count = @count || 1
  addition = @addition || 0
  ...
I added default values to the exported YAML in exportd. (This changed the order of the instance variables in these objects.)
$ grep -r lppv: data/downloads/oddb.yaml | sort | uniq
lppv: false
lppv: true
$ grep -r addition: data/downloads/oddb.yaml | sort | uniq
addition: 0
addition: 1
addition: 10
addition: 11
addition: 2
addition: 20
addition: 3
addition: 4
addition: 490
addition: 5
addition: 50
addition: 6
addition: 84
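The idea of filling in defaults at export time can be sketched like this. It is only an illustration under assumptions: `SamplePackage` is a hypothetical, heavily simplified class, and the sketch uses Psych's `encode_with` hook, which is not necessarily how exportd implements it.

```ruby
require 'yaml'

# Hypothetical, simplified model; the real classes have many more fields.
class SamplePackage
  def initialize(lppv = nil, addition = nil)
    @lppv = lppv          # may be nil on older records
    @addition = addition  # may be nil on older records
  end

  # Psych calls encode_with when dumping the object. Writing the keys
  # explicitly guarantees that every record carries both values.
  def encode_with(coder)
    coder['lppv'] = @lppv ? true : false
    coder['addition'] = @addition || 0
  end
end

yaml_default = YAML.dump(SamplePackage.new)
yaml_set     = YAML.dump(SamplePackage.new(true, 10))
puts yaml_default
puts yaml_set
```

Because the keys are written in a fixed order inside `encode_with`, the order of the fields in the output can differ from the order in which the instance variables were originally set.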