

20140428-oddb2xml-with-sax-parser

Summary

  • Use SAX-Parser for oddb2xml

Commits

Index

Keep in Mind

---

Use SAX-Parser for oddb2xml

For the motivation, see http://tunein.yap.tv/ruby/2012/06/08/elegant-xml-parsing/.

Switching to a SAX parser, see http://nokogiri.org/Nokogiri/XML/SAX/Document.html. We hope this solves the problem under Windows, where running oddb2xml -e fails with the following error:

All read methods of File(IO) class can not read whole data on Windows.
(got "failed to allocate memory").

I could not reproduce this behaviour on my Windows 7 virtual machine.

I had patches lying around to use the builder gem, see Attach:use_builder_patch.txt. I tried them and noticed:

  • builder does not emit empty fields. (I think we could live with this problem.)
  • Some output was wrong, e.g. it emitted <send>DSCRDKendural Depottabl </send> instead of <DSCRD>Kendural Depottabl </DSCRD> for oddb_product.xml.
  • Running time increased significantly, e.g. oddb2xml -e went from 1441 to 1917 seconds.
  • htop reported VIRT around 2600 MB.

Therefore I did not look any closer at this alternative.

Running oddb2xml -e --log only on an ARM Linux machine with 256 MB RAM @ 1 GHz to see whether we need more memory. Also patching test_options.rb to prepend time -v to each command to get an exact value for the consumed memory.
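The measurement idea can be sketched as follows. This is only an illustration, not the actual patch to test_options.rb: the helper names max_rss_kb and measure_max_rss are made up here, and the sketch assumes GNU time is installed as /usr/bin/time and writes its report to stderr.

```ruby
require 'open3'

# Extracts the peak resident set size (in kB) from the report printed
# by GNU `time -v`. Returns nil if the line is not present.
def max_rss_kb(time_output)
  match = time_output.match(/Maximum resident set size \(kbytes\): (\d+)/)
  match && match[1].to_i
end

# Hypothetical helper: runs a command under `/usr/bin/time -v` and
# returns its peak RSS in kB. Assumes GNU time at /usr/bin/time.
def measure_max_rss(*command)
  _stdout, stderr, _status = Open3.capture3('/usr/bin/time', '-v', *command)
  max_rss_kb(stderr)
end

# Possible usage (not executed here):
#   measure_max_rss('bundle', 'exec', 'bin/oddb2xml', '--skip-download', '--log', '-f', 'xml')
```

Parsing the report in a separate function keeps the extraction testable without actually running a long build.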

Using a SAX parser (and sax-machine) to reduce the memory usage was much more invasive (and time-consuming to add) than expected. For the first part, see Attach:use_sax_machine_patch.txt.
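To illustrate why event-driven parsing needs less memory, here is a minimal sketch. Note that it does not use Nokogiri or sax-machine as the patch does: it uses the stream listener from Ruby's bundled REXML so that it runs without extra gems. The idea is the same (callbacks per element instead of a full document tree). The class name, the helper extract_descriptions and the sample XML are made up for this example; only the DSCRD element name is taken from oddb_product.xml.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Collects the text of every <DSCRD> element without building a DOM tree.
class DescriptionListener
  include REXML::StreamListener
  attr_reader :descriptions

  def initialize
    @descriptions = []
    @buffer = nil # non-nil only while we are inside a <DSCRD> element
  end

  def tag_start(name, _attrs)
    @buffer = +'' if name == 'DSCRD'
  end

  def text(data)
    @buffer << data if @buffer
  end

  def tag_end(name)
    return unless name == 'DSCRD' && @buffer

    @descriptions << @buffer
    @buffer = nil
  end
end

# Made-up sample input, much smaller than the real file.
SAMPLE_XML = <<~XML
  <PRODUCT>
    <PRD><DSCRD>Kendural Depottabl </DSCRD></PRD>
    <PRD><DSCRD>Example Tabl 10 mg</DSCRD></PRD>
  </PRODUCT>
XML

# Accepts a String or an IO, so a large file can be streamed from disk.
def extract_descriptions(source)
  listener = DescriptionListener.new
  REXML::Document.parse_stream(source, listener)
  listener.descriptions
end

puts extract_descriptions(SAMPLE_XML).inspect
```

With Nokogiri the handler would instead subclass Nokogiri::XML::SAX::Document and implement start_element, characters and end_element; the bookkeeping stays the same.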

Adapting the other 2 occurrences of the XML parser, too. Now rake spec works again. Running rake test fails with

/opt/src/oddb2xml/lib/oddb2xml/extractor.rb:264:in `block (2 levels) in to_hash': undefined method `to_a' for #<Oddb2xml::LimitationElement:0x000000034e1be8> (NoMethodError)
        from /opt/src/oddb2xml/lib/oddb2xml/extractor.rb:233:in `each'
        from /opt/src/oddb2xml/lib/oddb2xml/extractor.rb:233:in `block in to_hash'
        from /opt/src/oddb2xml/lib/oddb2xml/extractor.rb:202:in `each'
        from /opt/src/oddb2xml/lib/oddb2xml/extractor.rb:202:in `to_hash'
        from /opt/src/oddb2xml/lib/oddb2xml/cli.rb:231:in `block (2 levels) in download'
        from /opt/src/oddb2xml/lib/oddb2xml/cli.rb:230:in `synchronize'
        from /opt/src/oddb2xml/lib/oddb2xml/cli.rb:230:in `block in download'

There still seem to be differences in the emitted limitations, but the maximum memory usage went down. E.g. for bundle exec bin/oddb2xml --skip-download --log -f xml, version 1.8.0 used Maximum resident set size (kbytes): 1832776, whereas with my local changes we now need Maximum resident set size (kbytes): 1277148, a reduction of about 30%. But as we hold a lot of information in memory, memory consumption is still very high. If we wanted to change this, we would be forced to use a database, e.g. sqlite3, and limit its memory usage (see e.g. this link). The build also seems to complete in less time (also approximately 30% less).
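The 30% figure follows directly from the two measurements above. A small sketch of the arithmetic (reduction_percent is a helper name made up for this example):

```ruby
# Relative reduction in percent, rounded to one decimal place.
def reduction_percent(before, after)
  ((before - after) * 100.0 / before).round(1)
end

# Maximum resident set size in kB: version 1.8.0 vs. the local changes.
puts reduction_percent(1_832_776, 1_277_148) # prints 30.3
```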

Adapt to new location of epha interaction csv-file

Solved with commit Adapt to new location of epha interaction csv-file

Page last modified on April 28, 2014, at 09:42 PM