We must fix the following two problems:
Pushed the commit Do not import e-mail addresses.
We cannot remove the files right after running a single class such as CustomerImporter, because they are still referenced when the next Importer calls run via polling load_sources.
I therefore added an at_exit handler which also checks a newly introduced module variable, success. Done with commit Remove CSV at_exit if import successfull
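A minimal sketch of the at_exit idea, not the actual bbmb code: the module name ImportCleanup and the methods register, success= and cleanup are hypothetical and only illustrate removing the CSV files once, at process exit, and only when the success flag is set.

```ruby
require 'fileutils'

# Hypothetical stand-in for the importer module; all names are assumptions.
module ImportCleanup
  @@success = false
  @@csv_files = []

  def self.success=(value)
    @@success = value
  end

  def self.success?
    @@success
  end

  # Remember a CSV file that should be deleted once all importers are done.
  def self.register(path)
    @@csv_files << path
  end

  # Delete the registered files, but only after a successful import.
  # Returns the list of files that were removed.
  def self.cleanup
    return [] unless @@success
    removed = @@csv_files.select { |f| File.exist?(f) }
    FileUtils.rm_f(removed)
    removed
  end
end

# Runs once when the process ends, i.e. after every importer had its turn.
at_exit do
  warn 'Import did not finish correctly, keeping CSV files' unless ImportCleanup.success?
  ImportCleanup.cleanup
end
```

Deferring the removal to at_exit keeps the files available to every importer that is polled later in the same process.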
Now I must find out why we cannot recreate the User Backmann with ID 3219. Changed the creation of the customer by setting the organisation correctly. Now each import at least deletes the previous Customer. But Model::Customer.find_by_customer_id(customer_id) never finds the new customer. Why?
If the import did not finish correctly, the at_exit handler must report that @@success is false.
Somehow errors get caught by a bbmb procedure, e.g.
Rescue invalid date in import
Rescue undefined method `tr' for nil:NilClass in import
How can I avoid that they mark the import as failed? Done with commit Don't mark import as failed when parsing a date fails
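The idea can be sketched as follows. This is an assumption about the approach, not the bbmb implementation: parse_import_date and the warnings array are hypothetical names. A date that cannot be parsed is logged and replaced by nil, and the caller does not touch the success flag.

```ruby
require 'date'

# Hypothetical helper: returns a Date, or nil when the value cannot be parsed.
# The problem is recorded in +warnings+ instead of being raised, so the
# caller has no reason to mark the whole import as failed.
def parse_import_date(value, warnings)
  Date.parse(value.to_s)
rescue ArgumentError => e
  warnings << "Rescue #{e.message} in import"
  nil
end
```

Calling value.to_s first also covers nil fields, which would otherwise raise a different error before the date parser is reached.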
Pushed commit Ignore customer 3219 (bachmann)
We must debug why we get the parsing error when running inside a crontab job but not via a shell call.
Could reproduce the error here after changing the crontab to
5 14 * * * apache cd /var/www/oddb.org && /usr/local/bin/bundle-240 exec /usr/local/bin/ruby-240 jobs/update_drugshortage > /var/www/oddb.org/log/drugshortage.log 2>&1
Somehow we cannot read the file. Adding more debug info. I suspected that under the apache user environment variables like LANG and LANGUAGE are not set to de_CH.UTF-8. This is not the case. On failure, dumping the following additional information
Hoped that this would help to pinpoint the problem. But this is not the case. It returns
Page has 0 TD elements found via css Dumping TD is unable to parse https://www.drugshortage.ch/UebersichtaktuelleLieferengpaesse2.aspx via /var/www/oddb.org/data/html/drugshortage-latest.html 367877 page has 1 elements
Trying to reread the same file again. Alternatively I will use an older version of nokogiri. No. It looks like under crontab we read the file as US-ASCII and not as UTF-8, as seen in the debug output
nr_tries 1 content is 367877 long and US-ASCII. Using Nokogiri::VERSION 1.7.1
Forcing the encoding to UTF-8. This is weird. When I call file on latest (inside the plugin) I get the following output
/var/www/oddb.org/data/html/drugshortage-latest.html: HTML document, UTF-8 Unicode text, with very long lines, with CRLF line terminators
but the content after reading the file is still US-ASCII. Must I pass the UTF-8 encoding to the File.read command?
Fixed also the problem that I did not correctly read the changed file in the unit test. Pushed commit Read shortage HTML file always as UTF-8
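A short sketch of why the explicit encoding matters; read_shortage_html is a hypothetical helper name, not the one used in the plugin. File.read without an encoding option tags the string with Encoding.default_external, which Ruby derives from the locale and which may be US-ASCII when no locale is set. Passing encoding: makes the result independent of the environment.

```ruby
# Hypothetical helper: always read the HTML file as UTF-8,
# regardless of the locale the process was started with.
def read_shortage_html(path)
  File.read(path, encoding: 'UTF-8')
end

# Without the option, the encoding depends on the environment:
#   File.read(path).encoding   # same as Encoding.default_external
```

The unit test can simulate the crontab situation by setting Encoding.default_external to US-ASCII before reading.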
Must remove daily file if nothing changed. Done with commits: