On VCR branch
Merging changes from oddb2xml -o. Discovered after quite a few trials that VCR is not thread-safe. Therefore many spec tests failed. Disabled threading (running takes a little bit longer).
Pushed commits "Avoid using threads to fix problems when running rspec", "Merge branch 'master' into vcr" and "Made all spec-tests (except for builder) pass again".
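A minimal sketch of the kind of switch involved, assuming the work can be expressed as a list of callables. The names run_tasks and use_threads are hypothetical and are not taken from the oddb2xml code; the point is only that the sequential branch avoids sharing global state (such as the current VCR cassette) between threads.

```ruby
# Hypothetical helper: run tasks in threads or one after another.
# Neither `run_tasks` nor `use_threads` comes from oddb2xml itself.
def run_tasks(tasks, use_threads: false)
  if use_threads
    # One thread per task; Thread#value joins the thread and returns
    # its result, so the results keep the order of the task list.
    tasks.map { |task| Thread.new { task.call } }.map(&:value)
  else
    # Sequential fallback: a little slower, but no two tasks ever
    # touch shared global state at the same time.
    tasks.map(&:call)
  end
end

# Placeholder tasks standing in for the real downloads.
tasks = [-> { :first }, -> { :second }, -> { :third }]

sequential = run_tasks(tasks)
threaded   = run_tasks(tasks, use_threads: true)
```

Both calls return the results in task order, so the spec expectations do not need to change when threading is switched off.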
Must fix builder_spec.rb for the changed items.
Banging my head on how to create the same output as before with the new nokogiri version. Our output should look like this:
<KMP MONTYPE="fi" LANG="DE" DT="">
  <name>
    <p>3TC®</p>
  </name>
  <owner>
    <p>ViiV Healthcare GmbH</p>
  </owner>
  <monid>53663</monid>
  <paragraph><![CDATA[<title><p>3TC®</p></title><div class="paragraph" id="Section7000">
  <div class="absTitle">Zusammensetzung</div>
Tried
pry(#<Oddb2xml::Builder>)> puts /.+zusammensetzung/im.match(info[:paragraph].root)
(169Er) Erbiumcitrat CIS bio international Kolloidale Suspension zu lokalen InjektionERMM-1 Zusammensetzung
pry(#<Oddb2xml::Builder>)> puts /.+zusammensetzung/im.match(info[:paragraph].root.to_s)
<html><body><div xmlns="http://www.w3.org/1999/xhtml">
<p class="s2"> </p>
<p class="s6" id="section1"><span class="s3"><span>(</span></span><sup class="s4"><span>169</span></sup><span class="s3"><span>Er)</span></span><span class="s5"><span> </span></span><span class="s5"><span>Erbiumcitrat CIS bio international</span></span><span class="s3"><span> </span></span></p>
<p class="s6"><span class="s7"><span>Kolloidale Suspension zu lokalen Injektion</span></span></p>
<p class="s8"><span class="s7"><span>ERMM-1</span></span></p>
<p class="s2"> </p>
<p class="s9" id="section2"><span class="s5"><span>Zusammensetzung
=> nil
pry(#<Oddb2xml::Builder>)> puts /.+zusammensetzung/im.match(info[:paragraph].to_html)
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<?xml version="1.0" encoding="utf-8"?><html><body><div xmlns="http://www.w3.org/1999/xhtml">
<p class="s2"> </p>
<p class="s6" id="section1"><span class="s3"><span>(</span></span><sup class="s4"><span>169</span></sup><span class="s3"><span>Er)</span></span><span class="s5"><span> </span></span><span class="s5"><span>Erbiumcitrat CIS bio international</span></span><span class="s3"><span> </span></span></p>
<p class="s6"><span class="s7"><span>Kolloidale Suspension zu lokalen Injektion</span></span></p>
<p class="s8"><span class="s7"><span>ERMM-1</span></span></p>
<p class="s2"> </p>
<p class="s9" id="section2"><span class="s5"><span>Zusammensetzung
=> nil
Same output if using to_html instead of to_s
Now using nokogiri 1.5.11. I could not find out which version we used in 2013, as the nokogiri version did not show up in the gemspec or Gemfile.lock. Finally found the answer on how to save it without the enclosing declaration via http://stackoverflow.com/questions/8218711/print-an-xml-document-without-the-xml-header-line-at-the-top.
doc = Nokogiri.XML('<hello world="true" />')
puts doc.to_html :save_with => Nokogiri::XML::Node::SaveOptions::NO_DECLARATION
<hello world="true"></hello>
Using Nokogiri::HTML.fragment(pac.content.force_encoding('UTF-8')) to extract the paragraph resolves part of this problem. Also added the style information for each fachinfo to the generated XML. All spec tests pass again. Running test_options.rb.
Pushed commits "Re-enable some checks for extractor" and "Fix running with -o option for fachinfo".