====== Background jobs in crossbow ======

Background jobs are executed using the delayed_job gem as of YAREN. \\
This has been running fine, except that God (gem) very often restarts the delayed_job process. \\
The restart is reproducible when uploading large (> 50 MB) files in the "Add Content" section of crossbow. \\
Further analysis shows that the delayed_job process does get bloated over time. \\
The following workflow was used when the tests were conducted: \\
1) Log in to crossbow. \\
2) Log in to the server. \\
3) Start monitoring the ruby process related to delayed_job (shell). \\
4) Upload a 50 MB ppt file. \\
5) Click the Save button. \\
6) Continue reviewing the CPU and memory consumption in the shell. \\
Steps 4-6 were repeated to capture usage consistently. \\
The behaviour (tracker link: https://tracker.exphosted.com/view.php?id=4719) was consistent with respect to the size and type of the uploaded file. \\

**As a short-term solution, we should increase the :memory parameter in God to 2G for delayed_job. \\
Increasing the limit to this threshold removed the God restart trigger for files of ~200 MB on the test machine.**

Executing a simple CPU-intensive operation coupled with large object storage (calculate a sha256 checksum and store it in an array) reveals that delayed_job may itself be adding to the memory leaks, and is definitely not good at cleaning up leaks or releasing memory used by the subprocesses created under it. \\
Furthermore, the OSS software and libraries used in crossbow are out of our control; hence optimizing or replacing delayed_job so that memory is better managed is the only solution. \\

===== Alternatives: =====

Cloud based WaaS - SaaS:
  - Heroku Workers
  - Amazon SNS
  - IronWorker

OSS:
  - Delayed_job (currently used)
  - [[https://github.com/woahdae/delayed_job_spawner|Delayed_job_spawner]]
  - [[https://github.com/resque/resque|Resque]]
  - [[https://github.com/mperham/sidekiq|Sidekiq]]

Cloud-based alternatives do have a free plan (or negligible cost).
However, they are listed here mainly for future reference. \\
{{:bj_comparison.png?800|}} \\
1) Resque and Sidekiq do not officially support REE 1.8.7. However, that is mainly because they do not actively test against REE 1.8.7. The official stance is "It should work", which is a pity, as they offer an awesome feature set. \\
2) This might not be required at this stage, but we will (should) need queueing ability in the near future (yay!) to shard and scale the queue-processing backend via queue-based routing. \\
3) Self-explanatory. Redis is an extra layer. \\
4 & 5) Self-explanatory. However, it is important to note that having a "monitored" queue is essential. We should be able to find out why a job failed and how to fix it, both for the current job and so that the failure does not recur. Having a web UI, or any UI for that matter, is very important and useful. \\
6) Sidekiq requires any gems or code that it schedules to be thread-safe. This breaks support for many non-thread-safe gems, RMagick for example. \\
7 & 8) DJS and Resque fork a new process. This does add some overhead in CPU (context switching, scheduler) and RAM (each process has its own address space), but it gives more resilient and memory-efficient handling of jobs. It also gives us concurrency (using multiple CPU cores by default). Sidekiq's threads will also handle memory and respawning efficiently, but they will not use multiple cores by default; we can always run more than one Sidekiq process to spread the work across cores. \\
9) DJS and Resque will take longer to start a job (from ms to seconds) compared to Sidekiq. The core reason is the fundamental cost of creating a new process vs. a thread (see the hints in #8 above). \\
Note: GitHub used DJ for many months to process almost 200 million jobs. Resque and Sidekiq have great potential for scaling, given the tiered architecture. \\
Resque has been around for a while; Sidekiq is (comparatively) the new kid on the block.

===== Conclusion =====
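As an illustration of points 7 & 8 above, the fork-per-job model can be sketched in plain Ruby. This is a minimal sketch and not how DJS or Resque are actually implemented: the helper names ''run_in_fork'' and ''checksum_job'' are hypothetical, and the workload simply mirrors the sha256-into-an-array test described in the background section. It assumes a platform where ''Process.fork'' is available (Linux/Unix).

```ruby
require 'digest/sha2'

# Hypothetical helper (not part of delayed_job, DJS or Resque):
# runs the given block in a forked child and returns its result.
# Memory allocated by the block lives in the child's address space
# and is returned to the OS when the child exits.
def run_in_fork
  reader, writer = IO.pipe
  pid = Process.fork do
    reader.close
    result = yield
    writer.write(Marshal.dump(result))
    writer.close
    exit!(0) # leave immediately, skipping the parent's at_exit hooks
  end
  writer.close
  payload = reader.read # read before waiting so a large payload cannot block the child
  reader.close
  Process.wait(pid)
  Marshal.load(payload)
end

# Hypothetical workload: compute sha256 checksums and keep them all in an array.
def checksum_job(count)
  digests = Array.new(count) { |i| Digest::SHA256.hexdigest("chunk-#{i}") }
  { :count => digests.size, :last => digests.last }
end

summary = run_in_fork { checksum_job(1_000) }
puts summary[:count]
```

The parent process only ever holds the small summary hash; the large array exists solely in the child. That is the trade-off noted in points 7-9: each job pays the cost of creating a process, but the long-running worker does not accumulate the job's memory.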