Are "jobs" just disguised RPC in a RESTful application?

In a RESTful web application design, you typically first identify the resources in your application: the nouns. For example, imagine you're writing a library app, and you're working on adding items to a catalogue. So you've got catalogues and items as your nouns.

You then decide to implement the operations on items, which live in the catalogue. In REST, HTTP verbs map onto the operations you want to perform: POST = create a new resource when you don't know what its identifier should be; PUT = update; DELETE = delete; GET = query resources. So you might end up with:

| HTTP request | Operation performed on resource | Returns |
|---|---|---|
| GET /catalogue/items?term=potter | Retrieve items containing the term "potter" | Representation of items, with a 200 OK status code |
| POST /catalogue/items (request body contains representation of new item) | Add a new item to the catalogue | 201 Created status code, with Location header set to URI of new resource, and representation of resource in response body |
| PUT /catalogue/items/<control number> (request body contains representation of updated item) | Replace existing representation with an updated one | 200 OK status code |
| DELETE /catalogue/items/<control number> | Remove item at the specified location | 204 No Content status code |

Fairly typical REST.
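
As a rough illustration of that mapping, here is a minimal in-memory sketch in Python. It isn't tied to any web framework: the Catalogue class, its method names, and the sequential control numbers are all made up for this example, and the 404 responses for unknown control numbers are my own assumption rather than anything the table specifies. Each method returns a (status code, headers, body) tuple.

```python
import itertools


class Catalogue:
    """Hypothetical in-memory stand-in for the /catalogue/items resource."""

    def __init__(self):
        self._items = {}                 # control number -> item representation
        self._ids = itertools.count(1)   # assumed: sequential control numbers

    def get(self, term):
        # GET /catalogue/items?term=<term>
        matches = {
            cn: item
            for cn, item in self._items.items()
            if term.lower() in item["title"].lower()
        }
        return 200, {}, matches

    def post(self, item):
        # POST /catalogue/items: the server picks the identifier
        control_number = str(next(self._ids))
        self._items[control_number] = item
        location = f"/catalogue/items/{control_number}"
        return 201, {"Location": location}, item

    def put(self, control_number, item):
        # PUT /catalogue/items/<control number>
        if control_number not in self._items:
            return 404, {}, None         # assumption: unknown items are a 404
        self._items[control_number] = item
        return 200, {}, item

    def delete(self, control_number):
        # DELETE /catalogue/items/<control number>
        if self._items.pop(control_number, None) is None:
            return 404, {}, None         # assumption: unknown items are a 404
        return 204, {}, None
```

The point is only that every operation hangs off a noun (the catalogue and its items); nothing here is named after a verb.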

Then you realise you want to upload a whole pile of items at once, embedded in a single request for efficiency, but you don't want the client to have to wait while the items are inserted into the catalogue, properly indexed, and so on. Maybe it will take five minutes, and you don't want to leave a web client hanging. Or perhaps you want to upload only a single item, but uploaded items are put into a queue for processing by another system, so there's a wait.

What are your choices? Here are some ideas, partly gleaned from the RESTful Web Services book:

  • Don't allow bulk uploads. You can only upload one item at a time, and you just have to wait until the operation completes and you get your proper status code back. Not really a solution, though.
  • Allow bulk uploads, process the items, and return a 207 Multi-Status code with the response. The response body contains a list of response codes and status reports, one for each uploaded item. The catch is the same as before: you have to wait for all the processing to finish before you can return anything.
  • Allow a bulk upload, but return a 202 Accepted status, spawning asynchronous jobs to do the processing in the background. The response body can contain the URIs for each uploaded item. Each item then has a status which can be queried by asking for the resource again. For example, when an item is first uploaded, you get its URI back as a Location; when you GET that, the object has its status set to Inactive or Under processing or something similar. When processing is complete, you get a status like Complete on the item instead. The downside is that your resource representations now carry status information alongside the item data, which may or may not suit your application.
  • As above, but your request to /catalogue/items returns a 202 along with a handle to a "transaction" or "job" resource which wraps the resources you uploaded. Effectively, you treat the upload itself as a resource with a status you can query; the resources attached to that resource don't have to be polluted with status information, but you perhaps lose the granularity of individual status codes on resources. Or maybe you produce one transaction resource per uploaded resource?
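
To make the last two options a bit more concrete, here is a hedged Python sketch of both at once. Everything in it is hypothetical: the AsyncCatalogue class, the /catalogue/uploads/<id> path for the "transaction" resource, and the process_next method, which stands in for whatever background worker would really do the indexing (it is called by hand here so the example stays deterministic). Items carry their own status, as in the third option, and the upload resource derives its status from the items it wraps, as in the fourth.

```python
import itertools
from collections import deque


class AsyncCatalogue:
    """Hypothetical sketch: bulk upload answered with 202 Accepted."""

    def __init__(self):
        self.items = {}        # control number -> {"status": ..., "data": ...}
        self.uploads = {}      # upload id -> control numbers it wraps
        self._queue = deque()  # items waiting for background processing
        self._item_ids = itertools.count(1)
        self._upload_ids = itertools.count(1)

    def bulk_post(self, new_items):
        # POST /catalogue/items with several items: queue them, return at once
        numbers = []
        for data in new_items:
            cn = str(next(self._item_ids))
            self.items[cn] = {"status": "Under processing", "data": data}
            self._queue.append(cn)
            numbers.append(cn)
        upload_id = str(next(self._upload_ids))
        self.uploads[upload_id] = numbers
        body = {
            "items": [f"/catalogue/items/{cn}" for cn in numbers],
            "upload": f"/catalogue/uploads/{upload_id}",  # assumed path
        }
        return 202, body

    def process_next(self):
        # Stand-in for the background worker: finish one queued item
        if self._queue:
            self.items[self._queue.popleft()]["status"] = "Complete"

    def get_item(self, cn):
        # GET /catalogue/items/<control number>: status lives on the item
        item = self.items.get(cn)
        return (200, item) if item else (404, None)

    def get_upload(self, upload_id):
        # GET /catalogue/uploads/<id>: status of the upload as a whole
        numbers = self.uploads.get(upload_id)
        if numbers is None:
            return 404, None
        done = sum(self.items[cn]["status"] == "Complete" for cn in numbers)
        status = "Complete" if done == len(numbers) else "Under processing"
        return 200, {"status": status, "done": done, "total": len(numbers)}
```

A client would POST the batch, receive the 202 straight away, and then poll either the individual item URIs or the single upload URI until the status reads Complete. Note that the upload resource in this sketch still lives under the catalogue rather than in a free-floating jobs collection.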

However, what I'm not so keen on is the idea of a job or service being a resource. Why? Well, if I want to create items in my catalogue, I don't want to wrap them in a job and post them to /jobs; if I want to query my items, I don't want to have to go to a query service at /services/query or similar.

What these paths suggest to me is that the path represents an operation rather than a resource. Calling them is effectively RPC: you pass the resources you want to act on as arguments to the procedure you're calling. Often, there's also some implicit resource hidden away behind the job or service. Compare:

  • GET /catalogue/items?term=potter: the catalogue is visible, and we know we're querying items within it
  • GET /services/query?term=potter: here there's an implicit catalogue and its items behind the service; effectively, these objects are passed invisibly to the procedure we're calling; also what we're querying is not explicit

Or:

  • POST /catalogue/items: we're appending a new item to the catalogue; we can infer that our new item will then be available at /catalogue/items/<some identifier>
  • POST /jobs: job is an amorphous category, and we could post pretty much any type of resource into it; and there aren't any hints from the API about how to get at those resources once we've posted them

It's kind of like the difference between object-oriented design (REST) and procedural design (RPC). While a job might look like a resource, my opinion is that it's really an amorphous wrapper around the real resource you should be representing. Typically, jobs get introduced to cope with asynchronous updates; I'd prefer to see asynchronous operations occurring on proper resources, but exposed using the batch processing approaches outlined above. Otherwise I fear you might lose your resources inside some vague blob of a "job" or "service".