
Are "jobs" just disguised RPC in a RESTful application?

In a RESTful web application design, you typically first identify the resources in your application, the nouns. For example, imagine you're writing a library app, and you're working on adding items to a catalogue. So you've got catalogues and items as your nouns.

You then decide to implement the operations on items, which live in the catalogue. In REST, HTTP verbs map onto the operations you want to perform: POST = create a new resource when you don't know what its identifier should be; PUT = update; DELETE = delete; GET = query resources. So you might end up with:

| HTTP request | Operation performed on resource | Returns |
|---|---|---|
| GET /catalogue/items?term=potter | Retrieve items containing the term "potter" | Representation of items, with a 200 OK status code |
| POST /catalogue/items (request body contains representation of new item) | Add a new item to the catalogue | 201 Created status code, with Location header set to URI of new resource, and representation of resource in response body |
| PUT /catalogue/items/<control number> (request body contains representation of updated item) | Replace existing representation with an updated one | 200 OK status code |
| DELETE /catalogue/items/<control number> | Remove item at the specified location | 204 No Content status code |

Fairly typical REST.
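
To make the mapping concrete, here is a rough in-memory sketch in Ruby. The Catalogue class, its method names and the :title field are all made up for illustration, and the 404 for an unknown control number is my own assumption rather than something from the table. Each method returns a [status, headers, body] triple.

```ruby
# Hypothetical in-memory catalogue; each method returns [status, headers, body].
class Catalogue
  def initialize
    @items = {}   # control number => item (a Hash with a :title, by assumption)
    @next_id = 1
  end

  # GET /catalogue/items?term=...
  def search(term)
    matches = @items.values.select { |item| item[:title].downcase.include?(term.downcase) }
    [200, {}, matches]
  end

  # POST /catalogue/items: the server picks the identifier
  def create(item)
    id = @next_id
    @next_id += 1
    @items[id] = item.merge(id: id)
    [201, { 'Location' => "/catalogue/items/#{id}" }, @items[id]]
  end

  # PUT /catalogue/items/<control number>
  def replace(id, item)
    return [404, {}, nil] unless @items.key?(id) # assumption: unknown ID gives 404
    @items[id] = item.merge(id: id)
    [200, {}, @items[id]]
  end

  # DELETE /catalogue/items/<control number>
  def delete(id)
    return [404, {}, nil] unless @items.key?(id) # assumption: unknown ID gives 404
    @items.delete(id)
    [204, {}, nil]
  end
end

catalogue = Catalogue.new
status, headers, _body = catalogue.create(title: 'Harry Potter and the Goblet of Fire')
puts "#{status} #{headers['Location']}"
```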

Then you realise you want to upload a whole pile of items at once, embedded in a single request for efficiency; but you don't want the client to have to wait while the items are inserted into the catalogue, properly indexed and so on. Maybe it will take 5 minutes or something, and you don't want to leave a web client hanging. Or perhaps you want to upload only a single item, but once items are uploaded they are put into a queue for processing by another system, so there's a wait.

What are your choices? Here are some ideas, partly gleaned from the RESTful Web Services book:

  • Don't allow bulk uploads. You can only upload one item at a time, and you just have to wait until the operation completes and you get your proper status code back. Not really a solution, though.
  • Allow bulk uploads, process the items, and return a multi-status 207 code with the response. The response body contains a list of response codes and status reports, one for each uploaded item. Again, though, you have to wait for all the processing to finish before you can return anything.
  • Allow a bulk upload, but return a 202 Accepted status, spawning asynchronous jobs to do the processing in the background. The response body can contain the URIs for each uploaded item. Each item then has a status which can be queried by asking for the resource again. For example, when an item is first uploaded, you get its URI back as a Location; when you GET that, the object has its status set to Inactive or Under processing or something. When processing is complete, you get a status like Complete on the item instead. The down-side is that your resource representations are polluted with status information, which could be good or bad.
  • As above, but your request to /catalogue/items returns a 202 along with a handle to a "transaction" or "job" resource which wraps the resources you uploaded. Effectively, you treat the upload itself as a resource with a status you can query; the resources attached to that resource don't have to be polluted with status information, but you perhaps lose the granularity of individual status codes on resources. Or maybe you produce one transaction resource per uploaded resource?
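
As a rough illustration of the last option, here is a Ruby sketch where the bulk upload returns 202 Accepted plus a handle to an "upload" resource that can be polled. Everything here is hypothetical: the BulkUploads class, the /catalogue/uploads/<id> path and the process_next method, which stands in for whatever background worker does the real indexing.

```ruby
# Hypothetical sketch: the upload itself is a resource with a status.
class BulkUploads
  Upload = Struct.new(:id, :items, :status)

  def initialize
    @uploads = {}
    @queue = []
  end

  # POST /catalogue/items with many items: queue them and return straight away.
  def submit(items)
    upload = Upload.new(@uploads.size + 1, items, 'Under processing')
    @uploads[upload.id] = upload
    @queue << upload
    [202, { 'Location' => "/catalogue/uploads/#{upload.id}" }, nil]
  end

  # GET /catalogue/uploads/<id>: report progress; the items carry no status.
  def status(id)
    upload = @uploads[id]
    return [404, {}, nil] unless upload # assumption: unknown upload gives 404
    [200, {}, { status: upload.status, item_count: upload.items.size }]
  end

  # Stand-in for the background worker; a real one would index each item.
  def process_next
    upload = @queue.shift
    upload.status = 'Complete' if upload
  end
end

uploads = BulkUploads.new
status, headers, = uploads.submit([{ title: 'Item one' }, { title: 'Item two' }])
puts "#{status} #{headers['Location']}"
uploads.process_next
puts uploads.status(1).last[:status]
```

Note that this trades away per-item status codes: the client only learns when the whole upload is done.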

However, what I'm not so keen on is the idea of a job or service being a resource. Why? Well, if I want to create items in my catalogue, I don't want to wrap them in a job and post them to /jobs; if I want to query my items, I don't want to have to go to a query service at /services/query or similar.

What these paths hint at to me is that an operation is being represented by that path, rather than a resource: effectively, calling them is like doing RPC: you pass the resources you want to act on as arguments to the procedure you're calling. Often, there's also some implicit resource hidden away behind the job or service. Compare:

  • GET /catalogue/items?term=potter: the catalogue is visible, and we know we're querying items within it
  • GET /services/query?term=potter: here there's an implicit catalogue and its items behind the service; effectively, these objects are passed invisibly to the procedure we're calling; also what we're querying is not explicit

Or:

  • POST /catalogue/items: we're appending a new item to the catalogue; we can infer that our new item will then be available at /catalogue/items/<some identifier>
  • POST /jobs: job is an amorphous category, and we could post pretty much any type of resource into it; and there aren't any hints from the API about how to get at those resources once we've posted them

It's kind of like the difference between object-oriented design (REST) and procedural design (RPC). While a job might look like a resource, my opinion is that it's really an amorphous wrapper around the real resource you should be representing. Typically, jobs get introduced to cope with asynchronous updates; I'd prefer to see asynchronous operations occurring on proper resources, but exposed using the batch processing approaches outlined above. Otherwise I fear you might lose your resources inside some vague blob of a "job" or "service".

Neats vs. scruffies

I did my Ph.D. in artificial intelligence, so was interested to read a few Wikipedia articles about it. One distinction I'd never heard of was neats vs. scruffies in the field.

I put myself in the scruffies camp, probably, though I always had a yen for predicate logic and formal grammars. To my mind, some of the AI scruffies weren't scruffy enough, and tried to model human intelligence without any reference to psychological data. I tried to redress the balance a bit, and compared my program's output with psychological data on human inference during story comprehension. You can read all about it here.

At the time I did my Ph.D., I was pretty unfashionable, as I was researching symbolic AI approaches, while everyone around me seemed to be doing neural networks. However, I thought that while sub-symbolic approaches might produce intelligent output, I struggled to see how that would lead to a description of the solution, or anything that might be built on or added to by humans. If you're trying to program a reasoning system, for example, is it enough to train a neural network to create associations, or do you need to write something which can reflect on the process by which it reached its solutions? Neural nets are great for recognition tasks, but I was never convinced they were suitable for reflecting on how they completed the task. I'm sure there are plenty of counter-arguments to my limited opinion, so feel free to enlighten me.

The exciting things I've done

What I've been up to recently, tech first:

  • I've been coordinating (in the loosest sense of the word) a project at Talis to build a library-specific layer (written in Java) over the Talis Platform. It's the first project I've coordinated which involves several other people, and it's been challenging but worthwhile to do. We try to follow agile practices, and are currently doing mini-sprints, a week at a time with planning on Monday, using a traditional story board plus Jira for issue tracking through the week. We do some pair programming, which has been really good fun (initially I was a bit daunted by it, but the team is very supportive of each other and includes lots of talented individuals with different strengths - I'm learning a lot).
    What we're doing: basically we store lots of RDF in the Platform, then our OPAC (Prism) fetches it through our library-specific layer, which talks to the Platform, which returns stuff to the OPAC. So we've been putting functionality into our library-specific layer to support Prism. I've been keen to make sure everything is solid, so we've invested a lot of effort in a thorough suite of unit and functional tests. I've probably spent about two of the last three weeks doing testing.
    To be honest, the whole testing terminology confuses me; but I have come across a useful continuum to describe different types of testing:
    • White box testing covers unit testing, testing individual components in isolation with an awareness of the application's internal structure. We do a lot of this in development, mainly using JUnit and EasyMock, to test pretty much every class and its methods.
    • Black box testing treats the application as something you put stuff into and get stuff out of; it doesn't require any knowledge of the internals, just the public API. For this, we've been using Canoo WebTest. I half like WebTest: it's marginally better than writing Java code to do tests, and does mean that you can write the tests without having to be a Java programmer (providing you understand Ant a bit). I'm still not sure it's quite right, but it just about does the job. We use this to send HTTP requests to a running instance of the application, and verify aspects of the responses (e.g. status codes, XML tags, text). Because we're testing a RESTful app, we don't really need something like Selenium, which pretends to be a human user: we just want to send one-off requests and check responses.
    • Grey box testing is in between the two above. Basically, I think of it as testing against real objects which the application has to integrate with (e.g. databases, network sockets, threads, start/stop scripts, filesystem). While you can mock a lot of this in unit tests, this isn't a great replacement for testing against the real thing. Our tests in this region do things like checking whether files get moved around properly by certain processes and checking whether the web interface starts and responds correctly. We've been doing this with JUnit, the Apache HttpClient, Jetty, custom Java code, etc. This, to my mind, is the messiest sort of testing. We've also explored interesting ways to isolate these tests from the real platform, currently by creating our own mock HttpClient and HttpMethod extensions, which do the trick.
  • I've had a bit of time to think about Jangle, a more generic library API/HTTP proxy tool written in Ruby, as discussions about it have got more interesting recently. I've been using it as a generic HTTP proxy to sit between a client and a REST service, and also to try out some ideas about how to structure this type of application in a simple, flexible way. I've put together some playground code which is in no way tested, runnable or even intelligible, but is fun. I'm hopeful I'll get more time to work on it before too long.
  • Moving house. Sorting things out for that has taken up quite a bit of time, e.g. cleaning the house, sorting out the garden a bit, ringing people up, sending email.
  • Child care. Nicola has had a nasty stomach bug, so I spent most of the weekend and Monday looking after Madeleine. These days we end up playing Uno a lot (she is very good at it), pretending to be on treasure hunts, dancing (Hot Chip, Cabaret Voltaire and "The King of the Swingers" have been recent faves), and going to the park. Madeleine is quite argumentative at the moment; Nicola says it's because she's a Smith, and that we're all like it (Chloe, are you reading this?).
    For instance, we spent a good 10 minutes a couple of days ago arguing about her Fifi and the Flowertots plate, with Madeleine claiming that Poppy is holding a melon, while I pointed out it must be a gooseberry as the nearby strawberry would otherwise be the same size as a melon. Madeleine claimed it must be a small melon and/or giant strawberry. (Here is a picture of the plate.) Then I stopped myself, realising that I'm a full-grown adult and she's only 4. And that I was simply being petty. That's what Nicola's talking about.

Presentation at Coventry University

I did a presentation on Rails to some students at Coventry University tonight, as part of their e-commerce M.Sc. course. Here are the materials (introductory presentation on Rails and a script for a demo of Rails functionality).

QDOS

Quite interesting. Find out about your online presence.

Here's my QDOS.

Poem attributed to Han-shan (c. 9th Century)

My mind is like the autumn moon
Shining clean and clear in the green pool.
No, that is not a good comparison.
Tell me, how shall I explain?

Talis Library Platform News

I'm the featured team member in this month's Talis Library Platform News.

Doing all sorts of burning, ripping and encoding of video and DVDs and audio (on Linux)

Using Linux every day means that I often grapple with how to re-encode proprietary formats so that I can watch them on the computer of my choice. I also do some DVD ripping and creating new DVDs of home movies etc., for which the Linux command line tools work very nicely (more quickly, more consistently and in a more stable fashion than some of the GUIs).

So I've gathered a whole load of tips on encoding, ripping and burning, culled from dozens of forums, websites, manpages etc. This, then, is the current state of my understanding on this topic, and hopefully distills many hours of pain into an easily-digestible format. It's not very well organised, but hopefully useful. I should mention that this stuff works on Ubuntu, but your mileage may vary. Here goes.

A note on tools

All of the tools I use are easily installable on Ubuntu, either from official repositories or universe/multiverse. You will also need to install the proprietary codecs if you want to work with them. Here's what I tend to use:

  • vlc media player
  • mplayer media player
  • ffmpeg format transformer
  • mencoder
  • dvgrab
  • dvdauthor
  • dvdbackup
  • growisofs
  • mkisofs
  • sox
  • lame
  • cdrecord

Burning an ISO onto a DVD/CD

To work out where your CD/DVD device is:

$ cdrecord -scanbus
scsibus1:
        1,0,0   100) 'HL-DT-ST' 'DVD+-RW GSA-T11N' 'A103' Removable CD-ROM
        1,1,0   101) *
        1,2,0   102) *
        1,3,0   103) *
        1,4,0   104) *
        1,5,0   105) *
        1,6,0   106) *
        1,7,0   107) *

What you're looking for is an entry with something other than a * in it; then get the content of the first column, which is the device ID (here, it's 1,0,0).

Then:

cdrecord -v -dao dev=1,0,0 image.iso

setting dev= to the ID you found above, and where image.iso is the path to your ISO image file.
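
If you'd rather not eyeball the output, here is a small Ruby sketch that picks the device IDs out of it. The cd_devices function name is made up, and the regular expression assumes the output looks like the sample above.

```ruby
# Return the device IDs from `cdrecord -scanbus` output, i.e. the first
# column of every entry whose description is something other than "*".
def cd_devices(scanbus_output)
  scanbus_output.each_line.filter_map do |line|
    match = line.match(/^\s*(\d+,\d+,\d+)\s+\d+\)\s+(.+)$/)
    next unless match

    id = match[1]
    description = match[2].strip
    id unless description == '*'
  end
end

sample = <<~OUTPUT
  scsibus1:
          1,0,0   100) 'HL-DT-ST' 'DVD+-RW GSA-T11N' 'A103' Removable CD-ROM
          1,1,0   101) *
          1,2,0   102) *
OUTPUT

puts cd_devices(sample).inspect
```

To run it against your own machine, pass in the real output instead of the sample, e.g. cd_devices(`cdrecord -scanbus`).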

Getting the audio out of a YouTube video into an mp3 file

First off, download the FLV version of the video. This is the tricky bit, but here's how to find it:

  1. Go to the YouTube page for the video
  2. Copy the content of the v parameter in the querystring (the video ID)
  3. View the source of the web page
  4. Search for "t": in the source; following this will be a long text string (the t parameter), e.g. "OEgsToPDskLJFcLZay_H2RO2hijVCxkP". This is some kind of secret token you'll need to do the download.
  5. Construct the URL as follows:
http://www.youtube.com/get_video?video_id=<video ID>&t=<t parameter>

Use your favourite browser or downloader to fetch the file (FLV format).
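
The URL construction in steps 2 and 5 can be sketched in Ruby like this. The flv_url function name and the example video ID are made up; the token still has to be dug out of the page source by hand as in step 4.

```ruby
require 'uri'

# Build the download URL from the watch-page URL (which carries the v
# parameter) and the t token found in the page source.
def flv_url(watch_url, token)
  params = URI.decode_www_form(URI.parse(watch_url).query.to_s).to_h
  video_id = params.fetch('v') { raise ArgumentError, "no v parameter in #{watch_url}" }
  query = URI.encode_www_form(video_id: video_id, t: token)
  "http://www.youtube.com/get_video?#{query}"
end

# 'abc123XYZ' is a placeholder video ID, not a real one.
puts flv_url('http://www.youtube.com/watch?v=abc123XYZ', 'OEgsToPDskLJFcLZay_H2RO2hijVCxkP')
```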

My new technique for getting the mp3:

ffmpeg -i youtubevideo.flv -vn youtubevideo.mp3

You'll need a recent ffmpeg for some YouTube videos, and will also need mp3 support compiled in to do this conversion.

My old technique for getting the mp3:

Once you've downloaded it, play it through mplayer, resampling the audio at 44.1 kHz:

mplayer -vo null -vc null -ao pcm:file=out.wav -af resample=44100:0:0 youtubevideo.flv

Then, use lame to encode the wav file to mp3:

lame -h -V0 out.wav out.mp3

This produces a reasonable-quality, variable bit-rate mp3.

There are a couple of services which do this (e.g. YouTubeHack) and a command line script, but I couldn't get the services to work, and couldn't be bothered with the command line script.

mp4 or mov to mpg

Use mencoder for this. mp4 version:

mencoder in.mp4 -ovc lavc -oac lavc -o out.mpg

mov version is practically the same:

mencoder in.mov -ovc lavc -oac lavc -o out.mpg

Ripping from a dv camera with dvgrab

dvgrab --autosplit --timestamp --format jpeg

(The camera should be detected automatically, and I think this waits until it detects some input before the capture starts. I use a firewire cable to connect from the camera to the computer.) The files get named after timestamps coming from the film.

Encoding from .avi to .mpg

Use:

ffmpeg -i infile.avi outfile.mpg

(though this produces quite low quality output)

This produces better quality (video bitrate = 800 kbit/s):

ffmpeg -i infile.avi -vcodec mpeg2video -acodec mp2 -b:v 800k outfile.mpeg

Encoding from .avi to .mpg suitable for DVD burning

ffmpeg -i finalmovie.avi -y -target pal-dvd -sameq -aspect 16:9 finalmovie.mpg

If you're in the US, change pal-dvd to ntsc-dvd.

Ripping DVDs to hard disk

(See http://www.bunkus.org/dvdripping4linux/single/index.html for lots of good tips)

You may need the dvdcss decoder to be installed:

sudo /usr/share/doc/libdvdread3/examples/install-css.sh

Use dvdbackup to rip a DVD to hard disk.
See http://dvd-create.sourceforge.net/dvdbackup-readme.html for full instructions

# Get info. about DVD
dvdbackup -i /dev/cdrom -I

# Rip whole DVD
dvdbackup -M -i /dev/cdrom -o /media/usbdisk/dvdripping

# Rip main feature
dvdbackup -F -i /dev/cdrom -o /media/usbdisk/dvdripping

# Rip title set (in this case, title set 2)
dvdbackup -T 2 -i /dev/cdrom -o /media/usbdisk/dvdripping

# Rip title (here, rip title 1)
dvdbackup -t 1 -i /dev/cdrom -o /media/usbdisk/dvdripping

Direct DVD copying

This effectively copies the DVD's iso image to hard disk:

dd if=/dev/cdrom of=file.iso bs=2048

If the disk is encrypted, this might fail. In this case, it might be worth running this first:

dvdbackup -I -i /dev/cdrom

Then try dd again.

See http://gentoo-wiki.com/HOWTO_Backup_a_DVD#dd for more details

Note that this produces a mountable DVD image. However, it does not remove encryption, so you would still need to rip to individual VOBs using dvdbackup to get rid of that. However, it is possible to mount the iso and play it as if it were a DVD (see later): vlc is good for this.

Playing a partially ripped DVD

If you've ripped some of the content of a DVD (e.g. using dvdbackup -i /dev/cdrom -F), you can play the partial rip with:

mplayer dvd:// -dvd-device <ripped_dvd_directory>

(vlc might also be able to do this)

Encoding audio out of a VOB file (note that this drops the video altogether)

mplayer -vo null -vc null -ao pcm:file=outfile.wav infile.VOB

You could probably use ffmpeg for this, too.

Creating a DVD using dvdauthor when you don't want the whole disc

Use this if you just have the main feature (a set of .VOB files) and you want to create a playable DVD from it.

Create a dvd.xml file in the top level directory of the ripped DVD (e.g. in VIDEO_TS):

<dvdauthor>
  <vmgm />
  <titleset>
    <titles>
      <pgc>
        <vob file="VTS_01_1.VOB" />
        <vob file="VTS_01_2.VOB" />
        <vob file="VTS_01_3.VOB" />
        <vob file="VTS_01_4.VOB" />
        <vob file="VTS_01_5.VOB" />
      </pgc>
    </titles>
  </titleset>
</dvdauthor>

Add another <vob> element for each VOB which you want included in your "movie"; these can equally well be your own mpg files. You can also add titles etc.: see the man page for dvdauthor for more details. The minimal file above works fine for me, though.

Then run it with:

dvdauthor -o <output directory for DVD structure> -x dvd.xml
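
If there are a lot of VOBs, typing out dvd.xml by hand gets tedious. Here is a Ruby sketch that generates the same minimal structure from a list of filenames. The dvdauthor_xml function name is made up, and it only covers the single-titleset layout shown above.

```ruby
# Build a minimal dvd.xml (one titleset, one pgc) from a list of VOB or
# mpg filenames, in the order given.
def dvdauthor_xml(vob_files)
  # encode(xml: :attr) escapes &, <, > and " and wraps the value in quotes
  vobs = vob_files.map { |f| %(        <vob file=#{f.encode(xml: :attr)} />) }.join("\n")
  <<~XML
    <dvdauthor>
      <vmgm />
      <titleset>
        <titles>
          <pgc>
    #{vobs}
          </pgc>
        </titles>
      </titleset>
    </dvdauthor>
  XML
end

puts dvdauthor_xml(%w[VTS_01_1.VOB VTS_01_2.VOB])
```

To write the file for every VOB in the current directory, something like File.write('dvd.xml', dvdauthor_xml(Dir['VTS_01_*.VOB'].sort)) should do; note that a plain sort puts VTS_01_10.VOB before VTS_01_2.VOB, so check the order if you have ten or more.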

Burning complete DVD structure to a new DVD

See http://www.linux.com/articles/53702

(I think you might need to have used dvdbackup -M (complete rip) for this to work, or have a DVD structure created using dvdauthor (see above))

growisofs -dvd-compat -Z /dev/cdrom -dvd-video <ripped dvd structure>

Creating an iso from a ripped DVD structure

If you've ripped the structure of a DVD (e.g. using dvdbackup) or created your own DVD structure (e.g. using dvdauthor), you can turn it into a single iso file with:

mkisofs -dvd-video <ripped dvd directory> | dd of=file.iso obs=32k seek=0

Mounting an iso filesystem so you can read it

mkdir mountpoint
sudo mount -o loop file.iso mountpoint

Once mounted, you can play the mounted iso (including its menu system) using vlc. For example, if we mounted it on the directory "mountpoint" we could do:

vlc dvd:///path/to/mountpoint

Encode VOB to mpg

If you ripped a VOB off a DVD but you want a smaller mpg:

ffmpeg -i VTS_01_1.VOB -vcodec mpeg2video -acodec mp2 -b:v 1000k sleeper1.mpg

Ripping RealPlayer streams

See https://help.ubuntu.com/community/HowToRipRealaudioStreamsToMp3 for full instructions

Short version:

vsound --timing --dspout --file=myfilename.wav realplay http://url.to.rip

Ruby script for converting from m4a (iTunes format?) to mp3

This is a Ruby script for converting files from m4a to mp3. It uses mplayer and lame behind the scenes. sox is supposed to do this, but I can never work out how to install proprietary codecs for it. This doesn't retain tags, unfortunately. Sneetchalizer will probably also work.

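#!/usr/bin/env ruby
# Run this from the directory containing the m4a files.
prefix = "new_file_prefix"
Dir['*.m4a'].each do |f|
  # Underscore the spaces, lowercase, and drop the extension
  base = File.basename(f, '.m4a').gsub(' ', '_').downcase
  # Drop the first 3 characters (e.g. a "01 " track number) and add the prefix
  new_filename = prefix + base[3..-1].to_s
  wav = new_filename + ".wav"
  mp3 = new_filename + ".mp3"
  # Pass arguments as a list so odd characters in filenames bypass the shell
  system('mplayer', '-ao', "pcm:file=#{wav}", f)
  system('lame', '-h', '-b', '192', wav, mp3)
  File.delete(wav) if File.exist?(wav)
end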

Ruby script for converting from ogg to mp3

Uses ogg123. This script loses all the tags, though. Sneetchalizer would do this, too.

#!/usr/bin/env ruby
require 'shellwords'

filename = ARGV[0]

if filename.nil? || filename !~ /\.ogg$/
  puts "I don't think that's an ogg file, mister"
  exit 1
end

mp3_filename = filename.chomp(".ogg") + ".mp3"

# Decode to stdout with ogg123 and pipe straight into lame;
# escape the filenames so spaces and quotes survive the shell
`ogg123 -d au -f - #{filename.shellescape} | lame - #{mp3_filename.shellescape}`

mp3 to ogg

Bit basic, this, but you get the idea:

mpg321 -s input.mp3 | oggenc -r -a "artist" -t "title" -b 100 -M 140 -o output.ogg -

ogg to wav

sox is OK for this, as neither codec is proprietary:

sox input.ogg output.wav

BBC iPlayer Ruby code

I threw together some code for querying and parsing the BBC iPlayer search pages and emailing the results to you. You configure it by putting the names of the programmes you want to look out for into config.yaml, along with your email details, e.g.

search_terms:
  - mighty boosh
  - lead balloon
  - never better

email:
  email_to: email@example.com
  email_from: email@example.com
  server: mail.example.com
  user: email@example.com
  pass: password
  auth_type: login

Copy the sample config.yaml.dist to config.yaml in the same directory and edit.

I run the command line script via cron once a day by calling the cli.rb script with an --email switch, e.g. with the following line in crontab:

0 21 * * * /usr/bin/ruby /home/ell/dev/iplayer/cli.rb --email

You could as easily run it from a Windows scheduled task.

Dependencies are:

  • hpricot
  • rack

What it does is request the iPlayer search page with each search term, one after the other. If there are multiple pages of results, it fetches each of those too, aggregating the results. It will then email you a list of links to the programmes on the iPlayer site. One thing it does which the iPlayer search page doesn't do is sort the matching results by how long is left for you to watch them: the ones with the least amount of time left are at the top.
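
The aggregate-and-sort step is simple enough to sketch in a few lines of Ruby. This is not lifted from the script itself: the Programme struct, its fields and the aggregate function are made up to show the idea.

```ruby
# Hypothetical result record; hours_left is how long the programme
# remains available to watch.
Programme = Struct.new(:title, :url, :hours_left)

# Merge the result lists from every search term and page, then put the
# programmes that expire soonest at the top.
def aggregate(result_pages)
  result_pages.flatten.sort_by(&:hours_left)
end

pages = [
  [Programme.new('Lead Balloon', 'http://example.com/lead-balloon', 120)],
  [Programme.new('The Mighty Boosh', 'http://example.com/boosh', 6),
   Programme.new('Never Better', 'http://example.com/never-better', 48)]
]

aggregate(pages).each { |p| puts "#{p.hours_left}h left: #{p.title}" }
```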

You can also run it as a local web server on port 3334 with:

ruby server.rb

Which then becomes accessible at:

http://localhost:3334/

Nothing fancy: just an HTML page with the search results in it, using the same config as the command-line client. You can also call the page with extra search parameters to perform custom one-off searches, e.g.

http://localhost:3334/?search=doctor+who

It's not a serious project, just a convenience for me. GPL licence.
