20170628-oddb-org-rack

Port oddb.org to use rack

The log files of the Rack app are placed below the directory where the app is installed and are named log/<year>/<month>/<day>/<app_name>_log, e.g.

  • -rw-r--r-- 1 apache apache 340 28. Jun 07:31 /var/www/oddb.org.rack/log/2017/06/28/crawler_log
  • -rw-r--r-- 1 apache apache 270619 28. Jun 08:03 /var/www/oddb.org.rack/log/2017/06/28/google_crawler_log
  • -rw-r--r-- 1 apache apache 31308069 28. Jun 08:03 /var/www/oddb.org.rack/log/2017/06/28/user_log
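The naming scheme can be sketched in Ruby as follows. The helper name rack_log_path is hypothetical and not taken from the oddb.org sources; it only shows how the path is composed from the app root, the date and the app name.

```ruby
require 'date'

# Hypothetical helper (not part of oddb.org): builds
# <root>/log/<year>/<month>/<day>/<app_name>_log for a given date.
def rack_log_path(root, app_name, date = Date.today)
  File.join(root, 'log', date.strftime('%Y/%m/%d'), "#{app_name}_log")
end

puts rack_log_path('/var/www/oddb.org.rack', 'user', Date.new(2017, 6, 28))
```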

The status page worked for me without problems, but after a reload I suddenly got an error:

Service Temporarily Unavailable
The server is temporarily unable to service your request due to maintenance downtime or capacity problems. Please try again later.

A minute or so later it worked again.

We have far too many restarts in the logs, as the following grep shows:

hinpower oddb.org.rack # grep -c Starting /var/www/oddb.org.rack/log/2017/06/28/*log
/var/www/oddb.org.rack/log/2017/06/28/crawler_log:2
/var/www/oddb.org.rack/log/2017/06/28/google_crawler_log:1553
/var/www/oddb.org.rack/log/2017/06/28/user_log:19
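To repeat this check from a script, the same count can be done in Ruby. This is only a sketch: count_starts is a hypothetical helper, and it assumes (as the grep does) that every start of an app writes one line containing "Starting".

```ruby
# Hypothetical helper: counts the lines containing "Starting" in every
# file matching <log_dir>/*log, similar to `grep -c Starting <log_dir>/*log`.
def count_starts(log_dir, needle = 'Starting')
  Dir.glob(File.join(log_dir, '*log')).sort.map do |path|
    # Read in binary mode so that odd bytes in a log line cannot raise.
    [path, File.foreach(path, mode: 'rb').count { |line| line.include?(needle) }]
  end.to_h
end

count_starts('/var/www/oddb.org.rack/log/2017/06/28').each do |path, count|
  puts "#{path}:#{count}"
end
```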

I had used the same port 8112 for both the crawler and the google_crawler in the service run files. Corrected this. Now the google_crawler no longer restarts permanently.

But 19 restarts in the user_log are still too many. Looking for explanations.

In the pre-Rack version of oddb.org the google_crawler was handled inside the index.rbx file with

    if request.cgi.user_agent =~ /google/i
      request = SBSM::Request.new(ODDB::SERVER_URI_FOR_GOOGLE_CRAWLER)
    else
      request = SBSM::Request.new(ODDB::SERVER_URI_FOR_CRAWLER)
    end

In the current apache.conf, however, the line was RewriteCond %{HTTP_USER_AGENT} "GoogleBot". Corrected it to google and case insensitive. The lines now look like this:

  # ports must be kept in sync between apache.conf and /service/ch.oddb-*crawler/run  
  RewriteMap  lc int:tolower
  RewriteCond %{HTTP_USER_AGENT} "google"
  RewriteRule ^/(.*)$ http://localhost:8112/$1 [P,L]
  RewriteCond %{HTTP_USER_AGENT} "archiver|slurp|bot|crawler|jeeves|spider|\.{6}"
  RewriteRule ^/(.*)$ http://localhost:8212/$1 [P,L]
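To check by hand which backend a given user agent would reach, the two conditions can be modelled in Ruby. This is a sketch of the rule order only, not of how Apache evaluates rewrite rules; crawler_port is a hypothetical helper. The patterns are kept case sensitive because the rules as shown carry no [NC] flag; adding the i option to both regular expressions would model a case-insensitive match.

```ruby
# Ports as used in the rewrite rules above.
GOOGLE_CRAWLER_PORT = 8112
CRAWLER_PORT        = 8212

# Same patterns as in the two RewriteCond lines (case sensitive).
GOOGLE_PATTERN  = /google/
CRAWLER_PATTERN = /archiver|slurp|bot|crawler|jeeves|spider|\.{6}/

# Hypothetical helper: returns the backend port for a crawler, or nil when
# neither condition matches and the request is left to the other rules.
def crawler_port(user_agent)
  ua = user_agent.to_s
  return GOOGLE_CRAWLER_PORT if ua.match?(GOOGLE_PATTERN)
  return CRAWLER_PORT if ua.match?(CRAWLER_PATTERN)
  nil
end

puts crawler_port('Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)')
```

The google condition comes first, so a Googlebot request never reaches the generic crawler rule although its user agent also contains "bot".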

This works, and I now have the following entries in the log files:

tail -f  /var/www/oddb.org.rack/log/2017/06/28/google_crawler_log
66.249.64.213 - - [28/Jun/2017:09:08:29 +0200] "GET /de/gcc/fachinfo/reg/61297/chapter/effects/currency/USD HTTP/1.1" 200 42410 0.0959
66.249.64.213 - - [28/Jun/2017:09:08:31 +0200] "GET /en/gcc/show/reg/55962/seq/01/pack/001 HTTP/1.1" 200 21505 0.7721
66.249.64.213 - - [28/Jun/2017:09:08:49 +0200] "GET /fr/gcc/print/fachinfo/00677 HTTP/1.1" 200 18048 0.1626
66.249.64.213 - - [28/Jun/2017:09:08:58 +0200] "GET /de/gcc/limitation_text/reg/65337/seq/01/pack/001/currency/USD HTTP/1.1" 200 11772 0.0348
66.249.64.213 - - [28/Jun/2017:09:09:02 +0200] "GET /fr/gcc/fachinfo/reg/61432/chapter/unwanted_effects HTTP/1.1" 200 29576 0.0925
66.249.64.213 - - [28/Jun/2017:09:09:19 +0200] "GET /en/gcc/patinfo/reg/53232/seq/01/currency/USD HTTP/1.1" 200 23378 0.0472
66.249.64.213 - - [28/Jun/2017:09:09:21 +0200] "GET /de/gcc/patinfo/reg/41570/seq/01/currency/EUR HTTP/1.1" 200 24676 0.0513
66.249.64.213 - - [28/Jun/2017:09:09:33 +0200] "GET /en/gcc/patinfo/reg/59410/seq/01 HTTP/1.1" 200 37447 0.0474
66.249.64.213 - - [28/Jun/2017:09:09:40 +0200] "GET /fr/gcc/patinfo/reg/36566/seq/02/currency/EUR HTTP/1.1" 200 26605 0.557

grep -iw google /var/www/oddb.org.rack/log/oddb/access_log | tail
66.249.64.213 - - [28/Jun/2017:09:09:40 +0200] "GET /fr/gcc/patinfo/reg/36566/seq/02/currency/EUR HTTP/1.1" 200 26605 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
62.202.191.7 - - [28/Jun/2017:09:09:47 +0200] "GET /de/gcc/fachinfo/reg/52703 HTTP/1.1" 200 48702 "https://www.google.ch/" "Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_2 like Mac OS X) AppleWebKit/603.2.4 (KHTML, like Gecko) Version/10.0 Mobile/14F89 Safari/602.1"
66.249.64.213 - - [28/Jun/2017:09:09:49 +0200] "GET /fr/gcc/patinfo/reg/55663/seq/01 HTTP/1.1" 200 28057 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.215 - - [28/Jun/2017:09:10:24 +0200] "GET /de/gcc/price_history/reg/53634/seq/01/pack/036/search_type//search_query/Viscotears+Tropfgel HTTP/1.1" 200 15252 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.213 - - [28/Jun/2017:09:10:27 +0200] "GET /fr/gcc/fachinfo/reg/30972/chapter/indications/currency/CHF HTTP/1.1" 200 17548 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.215 - - [28/Jun/2017:09:10:40 +0200] "GET /fr/gcc/patinfo/reg/45351/seq/02/chapter/contra_indications/currency/EUR HTTP/1.1" 200 25915 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.213 - - [28/Jun/2017:09:10:43 +0200] "GET /en/gcc/ddd_price/reg/37561/seq/02/pack/045/search_query/Mucomed+600/search_type/st_registration/currency/USD HTTP/1.1" 200 14467 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.213 - - [28/Jun/2017:09:10:46 +0200] "GET /fr/gcc/patinfo/reg/58935/seq/02/currency/USD HTTP/1.1" 200 51188 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.64.213 - - [28/Jun/2017:09:10:52 +0200] "GET /en/gcc/patinfo/reg/30037/seq/01 HTTP/1.1" 200 22158 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
213.16.45.2 - - [28/Jun/2017:09:10:57 +0200] "GET /resources/images/patinfo/fr/Hemangiol_3_75_mg_ml_files/2.png HTTP/1.1" 200 20627 "https://www.google.bg/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"

Pushed the commit "Generated clear info when exiting because of too many sessions/used memory/threads".

Must fix the problem that the up/down page does not appear. Added a RewriteRule /var/www/oddb.org.rack/doc/resources/errors/appdown.html /var/www/oddb.org.rack/doc/resources/errors/appdown.html [L]. Now the text is displayed, but not the logo.

This is fixed by changing the Apache rewrite rules like this:

  RewriteEngine On
  RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_URI}  -f
  RewriteRule ^/(.*)$ %{DOCUMENT_ROOT}/%{REQUEST_URI} [L,NC]
  RewriteRule  /var/www/oddb.org.rack/doc/resources/errors/appdown.html  /var/www/oddb.org.rack/doc/resources/errors/appdown.html [L]
  RewriteMap  lc int:tolower
  RewriteCond %{HTTP_USER_AGENT} "google"
  RewriteRule ^/(.*)$ http://localhost:8112/$1 [P,L]
  RewriteCond %{HTTP_USER_AGENT} "archiver|slurp|bot|crawler|jeeves|spider|\.{6}"
  RewriteRule ^/(.*)$ http://localhost:8212/$1 [P,L]

Must fix the error when trying to download a file like oddb2.csv. The path is not calculated correctly in src/view/user/download.rb.

But reverting to the pre-Rack version for now. The open problems must be fixed later.

Fix rapflag problems

The reported problems are

  1. Delimiter should be ","
  2. BTC_exchange.csv (and USD_exchange.csv) does *not* take the last value of the day for the ending balance (should be 0). The INCOME value should not be repeated if there is a missing date, i.e., a day on which there were no transactions. Only the end-of-day balance should be repeated
  3. No more converting to USD
  4. XRP_exchange_summary.csv: do not repeat the income values for missing dates. Only do that for balance.
  5. If timestamp is the same, can we differentiate based on UNIX timestamp? ZEC_exchange.csv
  6. Where does this value come from? ZEC_total.csv: ZEC;2017.03.24;0.0;285.39429918999997

For ZEC it should display 0 for May-22, as there was nothing in the wallet on that day.

Sometimes there are two identical timestamps at the end of the day. Which one is the ending value for that day?
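The requested behaviour for missing dates and for identical timestamps can be sketched as follows. This is not the rapflag implementation. The Entry struct and daily_summary are hypothetical, days are taken in UTC, "income" is assumed to be the sum of the positive amounts of a day, and entries with the same timestamp are assumed to arrive in chronological order, so that the last one carries the ending balance.

```ruby
require 'date'

# Hypothetical record: UNIX timestamp, amount and balance after the transaction.
Entry = Struct.new(:timestamp, :amount, :balance)

# Returns one [date, income, balance] row per calendar day (UTC) between the
# first and the last entry. A day without transactions repeats the previous
# end-of-day balance but gets an income of 0.0.
def daily_summary(entries)
  return [] if entries.empty?

  # Sort by timestamp; the index keeps the input order for identical timestamps.
  sorted = entries.each_with_index.sort_by { |entry, idx| [entry.timestamp, idx] }.map(&:first)
  by_day = sorted.group_by { |entry| Time.at(entry.timestamp).utc.to_date }

  balance = 0.0
  (by_day.keys.min..by_day.keys.max).map do |day|
    todays = by_day.fetch(day, [])
    # Assumption: income is the sum of the positive amounts of the day.
    income = todays.map(&:amount).select(&:positive?).sum(0.0)
    # The last entry of the day defines the ending balance.
    balance = todays.last.balance unless todays.empty?
    [day, income, balance]
  end
end
```

With the two ZEC entries from the debug output further down (same timestamp, balances 0.53627466 and 0.0, in that order) this sketch would report an ending balance of 0.0 for that day.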

Pushed the commits "Use ',' as column separator. Push version to 1.0.0" and "Do not emit rate and balance_in_usd for bitfinex".

Debugging shows that the two timestamp values are identical:

{"currency"=>"ZEC", "amount"=>"-1.46372534", "balance"=>"0.53627466", "description"=>"Exchange 1.46372534 ZEC for USD @ 260.0 on wallet Exchange", "timestamp"=>"1495556988.0"}
{"currency"=>"ZEC", "amount"=>"-0.53627466", "balance"=>"0.0", "description"=>"Exchange 0.53627466 ZEC for USD @ 260.0 on wallet Exchange", "timestamp"=>"1495556988.0"}

I think it might simply be that two other parties were involved, each of which exchanged a part of the total sum into USD.

Pushed the commits

Pending is the problem where we find ZEC,2017.03.24,0.0,285.39429918999997 in output/bitfinex/ZEC_total.csv

Page last modified on June 28, 2017, at 05:26 PM