<?xml version="1.0" encoding="UTF-8"?>
<article>
  <content>&lt;p&gt;Introduction to background processing&lt;br /&gt;If you&amp;rsquo;ve written an application before, chances are you have wanted to run a task that could take a while without making the user wait for it. It might be generating a PDF, sending out bulk emails, grabbing information from an API, or working with a large amount of data that is slow to load. In these cases you can take advantage of background processing: split the operation into a job (or task), hand it to a background worker that processes it outside the web request, and return the result to the user later, either through AJAX or simply in the data your site shows. Let&amp;rsquo;s take a real-world example to explore this further.&lt;br /&gt;I have an application which takes a user&amp;rsquo;s API key and user ID for the game EVE Online. The application makes around 10 calls to the EVE Online API with these credentials and stores around 3,000 rows of data, and the resulting report is made available to another user of the site who requested it. Because this API-fetching stage can take tens of seconds, or longer if the API servers are slow or down, it makes sense to pull data retrieval out into a separate task which the user triggers. The user enters their details, which are checked with a single quick API call (with a short timeout in case the API servers are down), and they are either shown an error or told that their report is being generated. Once started, the job is stored in the database along with their credentials; when it runs, it writes a fresh report into the database, which then shows up in the user&amp;rsquo;s interface.&lt;br /&gt;Notably, not all of the work is done by the background job: a quick check is done up front so that the user gets immediate feedback. 
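&lt;br /&gt;The request-side half of this split can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not code from the application described: ReportJob, JOB_QUEUE, request_report and the api_call parameter are hypothetical stand-ins for the real job class, the delayed_jobs table, the controller action and the quick credential check, and the short timeout uses the Timeout module from the standard library.

```ruby
require "timeout"

# Hypothetical job class: any object that responds to #perform can be queued.
ReportJob = Struct.new(:user_id, :api_key) do
  def perform
    # The slow work (many API calls, thousands of rows) would run here,
    # inside a background worker instead of the web request.
    "report for user #{user_id}"
  end
end

# In-memory stand-in for the job queue table.
JOB_QUEUE = []

# Runs inside the web request. api_call is a hypothetical callable that
# makes the single quick credential check; Timeout caps how long we wait.
def request_report(user_id, api_key, api_call, timeout: 2)
  valid = begin
    Timeout.timeout(timeout) { api_call.call(api_key) }
  rescue Timeout::Error
    return "The API is not responding, please try again later."
  end
  return "Those credentials were rejected." unless valid

  # Credentials look good: queue the slow part and answer immediately.
  JOB_QUEUE.push(ReportJob.new(user_id, api_key))
  "Your report is being generated."
end
```

In a Rails application the push onto JOB_QUEUE would be replaced by a Delayed::Job.enqueue call, so the slow API work only ever runs inside perform on a worker.&lt;br /&gt;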
Background jobs are typically only used where a task will take several seconds: long enough to annoy the user or tie up application servers.&lt;br /&gt;Delayed_job and other plugins&lt;br /&gt;There are several plugins to choose from for background processing. BackgrounDRb is one of the most mature, but it doesn&amp;rsquo;t work at all on Windows, which makes it awkward to develop with if your development machine runs Windows. It does, however, support cron-style scheduled tasks, which saves you from tinkering with cron directly to run jobs on a daily or hourly basis.&lt;br /&gt;Nanite is another option; it is more independent of Rails and thus a little more complex to get working than the others, so it&amp;rsquo;s not a great choice for beginners. If you need a great deal of flexibility, can set up RabbitMQ (the Erlang-based message broker it requires), and can run your code in an EventMachine-compatible environment (which at the time of writing rules out Passenger-hosted sites), then this fantastically fast and flexible tool is worth a look. Both BackgrounDRb and Nanite are great tools and worth considering when choosing a worker, but because they are more complex to get started with, this guide will focus on delayed_job. Many of the concepts in delayed_job are replicated in other plugins, so hopefully you can still use some of the advice and information in this guide when working with these tools.&lt;br /&gt;Delayed_job is a small, very flexible plugin with several advantages: it&amp;rsquo;s pure Ruby, it uses an ActiveRecord-backed job queue, and it is easy to hack on and modify for your own needs if it doesn&amp;rsquo;t support what you need out of the box. These features make it ideal for most cases and great for beginners. 
It&amp;rsquo;s distributed as a Rails plugin, making installation a cinch, though at the time of writing it requires you to add a migration to your database manually.&lt;br /&gt;The plugin is made up of several classes, the two most important being the Delayed::Job class, which represents a job, and the Delayed::Worker class, which is responsible for fetching jobs and working through them. We&amp;rsquo;ll take a close look at the concepts in delayed_job and other background processing tools, and then look under the hood to see how DJ implements those concepts.&lt;br /&gt;Delayed_job Concepts&lt;br /&gt;The underlying concepts of delayed_job are simple to understand and should look familiar to anyone who has used other queue-based background workers. This is one of the plugin&amp;rsquo;s key strengths: it is very simple, so it is easily extended and adapted.&lt;br /&gt;Delayed::Job is the class that represents a single job. It subclasses ActiveRecord::Base and is backed by the delayed_jobs table. This table contains locked_at and locked_by columns, which workers use to lock the jobs they are working on; a run_at and failed_at pair, which define a job&amp;rsquo;s status; and the usual created_at/updated_at columns. In addition, a handler column stores the serialized job struct, which tells delayed_job which object to call perform on.&lt;br /&gt;Apart from Delayed::Worker, the other classes play only a minor role in understanding the concepts. Delayed::Worker wraps a simple loop which calls Job#work_off, which we&amp;rsquo;ll look at in more detail next.&lt;br /&gt;Under the hood and failure conditions&lt;br /&gt;The Job#work_off method is fairly simple: it works through up to a given number of jobs (its only parameter, 100 by default). It reserves each job, calls perform on it, and counts successes and failures. More of the logic behind the scenes lives in Job#reserve, which handles actually locking the job in question. 
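&lt;br /&gt;To make the reserve and work_off flow concrete, here is a simplified in-memory model in plain Ruby. It is a hypothetical sketch, not the delayed_job source: Row, MiniQueue and claim are invented names, each Row stands in for a record in delayed_jobs, and a Mutex stands in for the database-level lock that workers take through locked_at and locked_by. As described above, reserve claims a job and hands it to a block, and work_off counts successes and failures until it has handled num jobs or runs out of work.

```ruby
# Hypothetical, simplified model of a delayed_job-style queue (not the
# plugin source). Each Row stands in for a record in delayed_jobs.
Row = Struct.new(:id, :payload, :locked_by, :state)

class MiniQueue
  def initialize(payloads)
    @rows = payloads.each_with_index.map { |payload, i| Row.new(i + 1, payload, nil, :pending) }
    @mutex = Mutex.new
  end

  def states
    @rows.map { |row| row.state }
  end

  # Claim one job, yield it to the block and report the outcome:
  # true on success, false if the block raised, nil if nothing was free.
  def reserve(worker_name)
    row = claim(worker_name)
    return nil if row.nil?

    begin
      yield row
      row.state = :done
      true
    rescue StandardError
      # An error notification (such as an email) could be sent from here.
      row.state = :failed
      false
    ensure
      row.locked_by = nil
    end
  end

  # Work off up to num jobs, counting successes and failures.
  def work_off(worker_name, num = 100)
    success = 0
    failure = 0
    num.times do
      outcome = reserve(worker_name) { |row| row.payload.perform }
      case outcome
      when true then success += 1
      when false then failure += 1
      else break # nil: no job was available
      end
    end
    [success, failure]
  end

  private

  # Atomically claim the first pending, unlocked row. The Mutex stands in
  # for the database-level lock on locked_at/locked_by, so two workers
  # can never claim the same job.
  def claim(worker_name)
    @mutex.synchronize do
      row = @rows.find { |r| r.state == :pending and r.locked_by.nil? }
      row.locked_by = worker_name if row
      row
    end
  end
end
```

Running work_off from several threads against one MiniQueue exercises the lock: each job is claimed by exactly one worker. In delayed_job itself, this locking is handled by Job#reserve.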
Job#reserve accepts a block and passes the job it has reserved to that block. This handles locking neatly, so if you want to write or adapt your own worker methods to perform specific jobs, you don&amp;rsquo;t have to deal with the messier parts of that particular problem yourself.&lt;br /&gt;Failures are to be expected in development and shouldn&amp;rsquo;t be overlooked in production. delayed_job handles them with a last_error column, which stores the traceback from any error that occurs while a job is processed. However, a job that fails is retried. If that job sends 10,000 emails, you probably don&amp;rsquo;t want it to start again when it gets halfway through and hits a dodgy address; you&amp;rsquo;d rather it sent out half and complained to you quietly than sent the first 5,000 emails to the same people 20 times. That is not a recommended path to happy customers, even if you ignore the server load!&lt;br /&gt;Fortunately, Job has a constant, MAX_ATTEMPTS; set it to 1 and a job will be tried only once before delayed_job gives up. It also has a class attribute, destroy_failed_jobs. This defaults to true, so failed jobs are deleted rather than left in the queue. However, that can make debugging tricky, and I&amp;rsquo;d recommend changing it in development. You may want to use something like the following instead of the default of true:&lt;br /&gt;self.destroy_failed_jobs = (ENV['RAILS_ENV'] == 'production')&lt;br /&gt;This sets the attribute based on your environment: failed jobs are destroyed in production and kept everywhere else. If you use a plugin such as exception_notifier or a service like Exceptional, you won&amp;rsquo;t be notified about errors in your worker methods, because delayed_job rescues these exceptions instead of letting them propagate. You can handle exceptions yourself by adding your own code (such as sending an email) to the rescue section of the reserve begin/rescue block used by Job#work_off. 
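&lt;br /&gt;The retry and give-up behavior described above can be modelled as follows. This is a hypothetical sketch rather than delayed_job code: FailingRow, FailureHandler and the notifier hook are invented names, max_attempts and destroy_failed_jobs play the roles of MAX_ATTEMPTS and the destroy_failed_jobs attribute, and the backoff formula is illustrative only. The notifier marks where an email or exception-reporting call could be added.

```ruby
# Hypothetical row carrying the failure-related columns.
FailingRow = Struct.new(:payload, :attempts, :run_at, :failed_at, :last_error)

class FailureHandler
  attr_reader :destroyed

  # max_attempts and destroy_failed_jobs mirror MAX_ATTEMPTS and
  # destroy_failed_jobs; notifier is a hypothetical hook (for example,
  # something that sends an email).
  def initialize(max_attempts:, destroy_failed_jobs:, notifier: nil)
    @max_attempts = max_attempts
    @destroy_failed_jobs = destroy_failed_jobs
    @notifier = notifier
    @destroyed = []
  end

  # Run one job. On error, store the message and backtrace in last_error,
  # notify, then either reschedule the job or give up on it.
  def run(row, now)
    row.payload.perform
    :succeeded
  rescue StandardError => e
    row.attempts += 1
    row.last_error = ([e.message] + Array(e.backtrace)).join("\n")
    @notifier.call(row, e) if @notifier
    return give_up(row, now) if row.attempts >= @max_attempts

    # Illustrative backoff: wait longer after each failed attempt.
    row.run_at = now + row.attempts**4 + 5
    :rescheduled
  end

  private

  def give_up(row, now)
    if @destroy_failed_jobs
      @destroyed.push(row) # stands in for deleting the record
      :destroyed
    else
      row.failed_at = now # kept in the table for inspection
      :failed
    end
  end
end
```

Passing destroy_failed_jobs: ENV["RAILS_ENV"] == "production" reproduces the environment-dependent setting suggested earlier, so failed rows are kept for inspection outside production.&lt;br /&gt;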
Out of the box, delayed_job does not send emails on errors or provide any similar error reporting.&lt;br /&gt;An example worker&lt;br /&gt;Let&amp;rsquo;s look at an example worker. Say we want to receive a YAML-serialized object from another site using a web hook:&lt;br /&gt;class ProcessPushData &amp;lt; Struct.new(:raw_object)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; def perform&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; object = YAML::load(raw_object)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;nbsp;&amp;nbsp;&amp;nbsp; # Do stuff with object&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; end&lt;br /&gt;end&lt;br /&gt;That&amp;rsquo;s as simple as they get! All we need is a class whose instances respond to perform, and we&amp;rsquo;re good. Inside perform you have access to your Rails environment, including your other models, so creating new records from the deserialized object is not a problem.&lt;br /&gt;Working with Delayed::Job&lt;br /&gt;Jobs are all well and good, but how do we add new ones? Here&amp;rsquo;s a simple example from an application I&amp;rsquo;m using delayed_job on:&lt;br /&gt;Delayed::Job.enqueue MarketOrderUploadJob.new(params[:log], @user.id, @key.id)&lt;br /&gt;We simply call the enqueue method, passing it a new instance of the job class initialized with the values it expects. enqueue then serializes this object and stores it in the queue to be picked up by a worker. We can get a little more sophisticated by passing a priority and a run time after the job object:&lt;br /&gt;Delayed::Job.enqueue(MarketOrderUploadJob.new(params[:log], @user.id, @key.id), 1, Time.now + 1.day)&lt;br /&gt;This shows two features: priorities and delayed jobs. The former is self-explanatory: the higher the priority, the sooner a worker will pick the job up (jobs are fetched with an SQL query that orders them by priority, highest first). 
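&lt;br /&gt;The selection rule can be sketched as a plain Ruby function. QueuedJob and next_job are hypothetical names, and delayed_job performs this selection in SQL rather than in Ruby; ordering equal-priority jobs by run_at is an assumption of the sketch. The function skips locked jobs and jobs whose run_at is still in the future, which is how a delayed job waits for its scheduled time, and its optional limits correspond to the MIN_PRIORITY and MAX_PRIORITY worker settings covered under deployment.

```ruby
# Hypothetical in-memory stand-in for a row in the delayed_jobs table.
QueuedJob = Struct.new(:name, :priority, :run_at, :locked_by)

# Pick the next job a worker should run: only unlocked jobs that are due,
# within the worker's optional priority limits, highest priority first
# and, among equal priorities, the one that became due earliest.
def next_job(jobs, now, min_priority: nil, max_priority: nil)
  candidates = jobs.select do |job|
    job.locked_by.nil? and now >= job.run_at and
      (min_priority.nil? or job.priority >= min_priority) and
      (max_priority.nil? or max_priority >= job.priority)
  end
  # Negating the priority makes min_by return the highest priority first.
  candidates.min_by { |job| [-job.priority, job.run_at] }
end
```
&lt;br /&gt;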
The delayed job feature lets you schedule jobs to run at a set time in the future, which can be useful in some cases but is typically not needed.&lt;br /&gt;Of course, we&amp;rsquo;ll usually want some sort of interface for viewing the status of our job queue. Easy!&lt;br /&gt;@jobs = Delayed::Job.find(:all, :order =&amp;gt; 'id DESC')&lt;br /&gt;Delayed::Job is just another ActiveRecord model, so to some extent we can treat it like any other. There are a few tricks you might find useful when dealing with jobs in your views, though (here j is one of the jobs in @jobs):&lt;br /&gt;j.deserialize(j.handler).class #=&amp;gt; The class of the object this job will run&lt;br /&gt;j.last_error #=&amp;gt; The last error this job raised during processing&lt;br /&gt;You can also show a job&amp;rsquo;s status: if locked_at is set, the job is being worked on, and if last_error is set, the job has hit an error and could be flagged for investigation.&lt;br /&gt;There&amp;rsquo;s another common pattern you may wish to use when a job produces data for the user. If you are generating a report via a delayed job, you can poll that job or the report using AJAX on the page, displaying a loading indicator while the report is generated so the page itself still responds quickly. The easiest way to implement this is to add a Boolean flag to your report&amp;rsquo;s model, but you can also record the job ID and check the job&amp;rsquo;s status that way.&lt;br /&gt;Testing and deployment strategies&lt;br /&gt;Testing your jobs is easy enough: simply call the perform method in your tests and check that it does what you&amp;rsquo;d expect. It&amp;rsquo;s best to keep tests of your own workers separate from tests of delayed_job itself, of course, so add them to your test suite separately from the plugin&amp;rsquo;s RSpec tests.&lt;br /&gt;Keeping the workers organized in your source can be a huge help. 
Personally, I like to keep mine in app/jobs, and use:&lt;br /&gt;config.load_paths += %W( #{RAILS_ROOT}/app/jobs )&lt;br /&gt;to get Rails to load the classes in that folder into the environment so delayed_job can use them.&lt;br /&gt;Deploying the workers can be more complex. On the server, all you need to do is run a rake task, jobs:work, to work off jobs. I use God, a Ruby-based process monitor and manager, to manage my workers. God&amp;rsquo;s typical recipes can be used for this, and a full recipe can be found in the resources section at the end of this article. The only slightly tricky bit is the start command:&lt;br /&gt;w.start = "rake -f #{RAILS_APPLICATION_ROOT}/Rakefile jobs:work RAILS_ENV=production"&lt;br /&gt;You can of course run multiple workers, which by default name themselves by hostname and process ID. There&amp;rsquo;s one other neat trick you can do with delayed_job. Say you have a very time-critical background job that needs doing as soon as possible, as well as a bunch of less important, slow-running jobs. You can use priorities, which will help to some extent, but even better is to have a worker dedicated to the urgent jobs. Pick a priority, and specify MIN_PRIORITY on the command line when you start up the worker:&lt;br /&gt;w.start = "rake -f #{RAILS_APPLICATION_ROOT}/Rakefile jobs:work RAILS_ENV=production MIN_PRIORITY=2"&lt;br /&gt;This worker will now ignore all jobs except those with a priority of 2 or above. You can also use the MAX_PRIORITY variable for even tighter control. With these simple flags you can build complex worker setups that are easily managed from within God or another service manager of your choice.&lt;br /&gt;Now that you&amp;rsquo;ve got your jobs written and tested and your workers set up on your server, all that remains is to deploy, restart your workers, and enjoy the extra flexibility of a background processing queue.&lt;/p&gt;</content>
  <created-at type="datetime">2009-08-30T03:26:34Z</created-at>
  <discuss-url>http://railsmagazine.com/forums/2/topics/67</discuss-url>
  <id type="integer">38</id>
  <issue-id type="integer">4</issue-id>
  <number type="integer">2</number>
  <title>Background Processing with Delayed_Job</title>
  <updated-at type="datetime">2009-12-24T20:49:48Z</updated-at>
</article>
