20120619-debug-oddb-yaml


Summary

  • Debugged the YAML exporter with oddb.yaml
    • oddb.yaml also contains escaped Unicode characters.


Check yaml gem

NOTE

Export oddb.yaml only (without running the export job):

ch.oddb> YamlExporter.new(self).export

oddb.yaml should contain about 24000 products:

ch.oddb> a = 0; companies.values.each {|c| a += c.packages.length }; p a;
-> 24850

read YAML code

The YAML Syck engine does not support UTF-8 (apart from ASCII characters).
In the Psych engine, if the YAML data contains a string with Unicode content whose encoding is set to ASCII, it is emitted as binary.

deprecated methods
  • to_yaml_properties
  • quick_emit
/path/to/ruby/1.9.1/psych/yaml_tree.rb
...
 # FIXME: remove this method once "to_yaml_properties" is removed
...
 warn "#{loc}: to_yaml_properties is deprecated, please implement \"encode_with(coder)\""
...

I could reproduce and confirm the binary replacement problem using encode_with in this sample script.
Attach:yaml_dump_test-20120619.txt

---
- Grüße
- Öl
- Käse
- !ruby/Test:Foo
  foo: föö
  bar: !ruby/Test:Bar
    bar: !binary |-
      YsOkcg==
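
The binary replacement can also be shown in a few lines. This is only a minimal sketch, not the attached test script: the sample string is hypothetical, and it assumes a reasonably current Psych (the exact binary check differs between Psych versions).

```ruby
require 'yaml'

# Hypothetical sample value: the UTF-8 bytes of "bär", but labelled as
# binary (ASCII-8BIT) instead of UTF-8.
raw = "b\xC3\xA4r".b

# Psych sees non-ASCII bytes in a binary-labelled string and emits base64.
binary_yaml = YAML.dump(raw)

# Relabelling a copy as UTF-8 (the bytes stay the same) gives plain text.
utf8_yaml = YAML.dump(raw.dup.force_encoding(Encoding::UTF_8))

puts binary_yaml
puts utf8_yaml
```

The base64 payload in the first dump is the same YsOkcg== as in the output above; it decodes to the UTF-8 bytes of "bär".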

Refs.


Debug oddb_yaml_exporter

Strategy
  • use the YAML Psych engine
  • replace the deprecated methods in oddb_yaml.rb (to_yaml_properties and quick_emit) for all objects.
  • set the string encoding with force_encoding at export time, before the YAML is emitted.
    def encode_with(coder)
      self.class::EXPORT_PROPERTIES.each do |a|
        value = instance_variable_get(a)
        # Relabel strings that are not tagged as UTF-8 before they are emitted.
        if value.is_a?(String) && value.encoding != Encoding::UTF_8
          value.force_encoding('utf-8')
        end
        # Drop the leading "@" of the instance variable name to get the YAML key.
        coder[a[1..-1]] = value
      end
      coder.tag = self.to_yaml_type
    end
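
A self-contained sketch of this strategy, to show how encode_with and force_encoding work together. The class SamplePackage, its property names and the tag !sample/package are made up for illustration and are not taken from oddb.org. Unlike the method above, the sketch relabels a copy of the string instead of the stored value.

```ruby
require 'yaml'

# Hypothetical class, only for illustration (not part of oddb.org).
class SamplePackage
  # Instance variables that should appear in the exported YAML.
  EXPORT_PROPERTIES = ['@name', '@size']

  def initialize(name, size, cache)
    @name  = name
    @size  = size
    @cache = cache # not listed in EXPORT_PROPERTIES, so never exported
  end

  # Called by Psych instead of the deprecated to_yaml_properties.
  def encode_with(coder)
    EXPORT_PROPERTIES.each do |ivar|
      value = instance_variable_get(ivar)
      if value.is_a?(String) && value.encoding != Encoding::UTF_8
        # Relabel a copy as UTF-8; the bytes themselves are not changed.
        value = value.dup.force_encoding(Encoding::UTF_8)
      end
      coder[ivar[1..-1]] = value # "@name" becomes the key "name"
    end
    coder.tag = '!sample/package' # made-up tag
  end
end

# "K\xC3\xA4se".b holds the UTF-8 bytes of "Käse" with a binary label.
yaml = YAML.dump(SamplePackage.new("K\xC3\xA4se".b, '100 g', 'not exported'))
puts yaml
```

Only the listed properties reach the coder, so the relabelling has to happen inside encode_with for every exported string.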
Problem
  • Some objects in oddb.org hold string data that is tagged as ASCII but actually contains UTF-8 (e.g. umlauts).
  • The YAML Psych engine then also handles such objects as binary, even if those values are not part of the exported YAML file.

Try hacking psych

psych
module Psych
  module Visitors
    class YAMLTree < Psych::Visitors::Visitor
      # Override the check so that no string is ever treated as binary.
      def binary? string
        false
      end
      private :binary?
    end
  end
end

Attach:yaml_dump_test-20120619-2.txt

This approach works with the sample script.
But it does not work for ODBA::Stub.
An object that holds binary data disappears from the output.

Page last modified on June 19, 2012, at 01:03 PM