Hi Amit, we can sync up on Monday regarding this.
These are the current open issues/points we need to consider related to background processing and document processing.
1) https://tracker.exphosted.com/view.php?id=4719
This looks like a Ruby garbage collection issue. God repeatedly sees that the delayed job worker is using too much memory during document conversion and restarts it. This needs to be fixed. Since the delayed job worker runs as a long-lived Ruby process, we may need to trigger garbage collection explicitly in the code that delayed job runs. It has also been observed that a delayed job worker is more memory-heavy than alternatives such as Resque or Sidekiq.
2) To run multiple delayed job workers, we need to upgrade the delayed job plugin and make sure the new version supports our REE/Rails versions (DJ depends on Active Record). We also need to make sure that our code is thread safe.
3) https://tracker.exphosted.com/view.php?id=5965
We observed that as soon as a record is queued in the delayed jobs table, it is removed and a mail is sent saying the files have been processed, but the conversion did not actually happen.
4) We have issues with Office 2007+ document conversions due to the current OpenOffice version. To get better results, we need to use LibreOffice on Ubuntu servers. So not only background processing but also synchronous document conversion needs to be moved to a separate machine.
5) As of now, we use both workling and delayed job in our code base for background tasks. We need to standardize on one of them and decide which one will be used across the system.
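For point 1 above, explicit garbage collection in the job code could look roughly like the sketch below. This is only an illustration, not our actual conversion code: `GcAfterJob` and `FakeConversionJob` are hypothetical names, and the only assumption about delayed job is that it runs any object that responds to `perform`.

```ruby
# Hypothetical wrapper: runs any job object that responds to #perform,
# then forces a garbage collection pass so objects allocated during the
# job can be reclaimed before the worker picks up the next one.
class GcAfterJob
  def initialize(job)
    @job = job
  end

  def perform
    @job.perform
  ensure
    # Runs even if the job raises, so failed conversions are cleaned up too.
    GC.start
  end
end

# Hypothetical stand-in for the real document conversion job.
class FakeConversionJob
  attr_reader :done

  def perform
    # Allocate many short-lived strings, as a conversion might.
    100_000.times.map { |i| "page #{i}" }
    @done = true
  end
end

job = FakeConversionJob.new
GcAfterJob.new(job).perform
```

The wrapper would be queued in place of the original job, e.g. with `Delayed::Job.enqueue(GcAfterJob.new(real_job))`. Note that `GC.start` frees Ruby objects but does not guarantee that the process returns memory to the OS, so we would still need to measure whether God stops restarting the worker.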
1) Fix delayed job: run multiple delayed job processes on the same machine and also run delayed job on multiple machines.
http://stackoverflow.com/questions/4621817/rails-can-i-run-backgrounds-jobs-in-a-different-server
2) Delayed job alternatives:
- http://www.celeryproject.org/
Initial evaluation of delayed job alternatives:
https://wiki.exphosted.com/doku.php/background_jobs_crossbow
Below are a few changes made to how the background jobs are performed:
1. Stop god on both app servers. (Ignore unable to stop god messages during deployment as we are stopping god here.)
2. Changes to /deploy/systasks/god.sh
3. Changes to /deploy/crossbow/shared/config/generic_monitoring.god
4. Remove workling reference from /deploy/crossbow/shared/config/<environment>.rb
(remove the line - Workling::Remote.dispatcher = Workling::Remote::Runners::StarlingRunner.new)
1) By default, delayed job attempts a failed job up to 25 times. This count can be reduced to 10 or even 5 so that delayed job does not waste time reprocessing jobs that keep failing.
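To get a feel for what the retry count costs, here is a small back-of-the-envelope sketch. It assumes the default backoff described in the Delayed Job README (a retry is scheduled n**4 + 5 seconds after the n-th failed attempt); the version in our code base may behave differently, and the helper names are made up for illustration.

```ruby
# Assumed backoff: after the n-th failed attempt, delayed job waits
# n**4 + 5 seconds before retrying (default documented in its README).
def retry_delay(failed_attempts)
  failed_attempts**4 + 5
end

# Total seconds spent waiting between attempts before a job that always
# fails is given up on. With max_attempts runs there are max_attempts - 1 waits.
def total_retry_wait(max_attempts)
  (1...max_attempts).inject(0) { |sum, n| sum + retry_delay(n) }
end

[25, 10, 5].each do |max|
  seconds = total_retry_wait(max)
  puts "max_attempts=#{max}: #{seconds} s (~#{(seconds / 3600.0).round(1)} h)"
end
```

Under that assumption, a permanently failing job lingers for about 20 days with 25 attempts, about 4 hours with 10, and about 6 minutes with 5. In recent Delayed Job versions the limit is set with `Delayed::Worker.max_attempts`; whether our plugin version exposes the same setting needs to be checked.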