
20130624-Fachinfo-italic



Summary

  • Fachinfo is italic

Commits

Index

Problem fachinfo is italic

  • Why does parsing the XML-file open the file test/var/html/patinfo/de/K_nzle_Passionsblume_Kapseln_swissmedicinfo.html.tmp?
  • Ran the unit test ext/fiparse/test/test_fachinfo_hpricot.rb. It reports quite a few errors because a space is missing after "Wirkstoff:"; see Ponstan, where we find Wirkstoff:Mefenaminsäure. instead of Wirkstoff: Mefenaminsäure.
    • this error must be corrected (but is low priority)
  • Other errors disappear if one replaces @fachinfo.name with @fachinfo.name.to_s
  • The problem with 32917 is that the embedded HTML does not contain the correct HTML for Zyloric® (which should be just Zyloric®), but something very ugly like <span class="s2"><span>Zyloric</span></span><sup class="s3"><span class="s4">â</span>/sup>
  • Would it be better to always use the correct name from the downloaded XML-File?
    • This is not as simple as it seems, because the XML-info is not present when parsing the embedded HTML-fachinfo.
    • Took me some time to see the big picture. Created a Graphviz diagram. SVG see Attach:update_swissmedicinfo.pdf, Graphviz source (newer version) Attach:swissmedicinfo.gv.txt
    • I will continue tomorrow to see how I will attack the problem.
    • Committed "Added unit test for Zyloric". This shows that parsing the (malformed) HTML exhibits the error.
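
The missing space after the label could be repaired with a small normalization step before comparing strings in the unit test. The following is only a sketch: normalize_label_spacing is a hypothetical helper, not part of ext/fiparse, and it assumes the label is always spelled Wirkstoff: or Wirkstoffe:.

```ruby
# Hypothetical helper (not part of ext/fiparse): insert the missing space
# after a "Wirkstoff:" or "Wirkstoffe:" label when text follows directly.
def normalize_label_spacing(text)
  # The lookahead (?=\S) only matches when a non-space character follows
  # the colon, so text that is already correct is left unchanged.
  text.gsub(/(Wirkstoffe?:)(?=\S)/, '\1 ')
end

puts normalize_label_spacing('Wirkstoff:Mefenaminsäure.')
puts normalize_label_spacing('Wirkstoffe: Finasteridum.')
```

Because the substitution does nothing when a space is already present, it can be applied to every parsed chapter without touching the correct ones.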

I don't think that the problem stems from a bad parsing of the HTML content. I think it is caused when creating the page to display.

If one compares it with correct-looking pages, it seems that italic and normal styles are reversed. For example, in "Wirkstoffe: Finasteridum." the label Wirkstoffe is normal in 58106 but italic in 58107. Also, the two problematic cases have a singular "Wirkstoff" where the good ones seem to have the plural "Wirkstoffe".

  • Found the following unwanted tag </ p> in fachinfo.unwanted_effects and unwanted_effects for 58106 "Finasterid Streuli® 5".
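
To look for such stray tags in other chapters, one could scan the chapter text with a regular expression. This is a minimal sketch assuming the chapter text is available as a plain String; MALFORMED_CLOSE_TAG and malformed_close_tags are hypothetical names, not existing code.

```ruby
# Hypothetical pattern: a closing tag with whitespace between "</" and the
# tag name, such as "</ p>". A well-formed "</p>" does not match.
MALFORMED_CLOSE_TAG = %r{</\s+\w+\s*>}

# Return every malformed closing tag found in the given text.
def malformed_close_tags(text)
  text.scan(MALFORMED_CLOSE_TAG)
end

p malformed_close_tags('Selten: Hautausschlag</ p> weitere Angaben')
p malformed_close_tags('<p>Selten: Hautausschlag</p>')
```

An empty result means no tag of this shape was found; it does not prove that the HTML is otherwise well formed.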
Page last modified on June 28, 2013, at 12:14 PM