A call to arms for Ruby HTTP client programmers

Can anyone recommend an HTTP client library for Ruby? I know about Net::HTTP, but that's pretty low-level: I'm looking for something like Apache Commons HTTPClient. I've written my own, but it's primitive. Basically, I need things like:

  • Following redirections automatically (to some pre-specified depth)
  • Automatically attaching cookies to requests (my current code does this, but not on a per-host basis, and without respecting cookie expirations, paths, the secure flag, etc.)
  • Parsing HTML, tidying it, and extracting parts of it
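
To make the first item concrete, here is a minimal sketch of redirect following with a depth limit, built on Net::HTTP. The names fetch_following_redirects, TooManyRedirects, max_redirects and fetcher are invented for this example; they are not part of Net::HTTP or of my own code. The fetcher is injectable only so that the logic can be tried without a network connection.

```ruby
require 'net/http'
require 'uri'

# Hypothetical error class for this sketch; not part of Net::HTTP.
class TooManyRedirects < StandardError; end

# Default fetcher: a plain GET through Net::HTTP.
DEFAULT_FETCHER = ->(uri) { Net::HTTP.get_response(uri) }

# Follows up to max_redirects redirects and returns the final response.
# Raises TooManyRedirects if the chain is longer than that.
def fetch_following_redirects(url, max_redirects: 5, fetcher: DEFAULT_FETCHER)
  uri = URI(url)
  (max_redirects + 1).times do
    response = fetcher.call(uri)
    location = response['location']
    # 304 Not Modified is also a Net::HTTPRedirection but carries no Location.
    return response unless response.is_a?(Net::HTTPRedirection) && location
    # Location may be relative, so resolve it against the current URI.
    uri = uri.merge(location)
  end
  raise TooManyRedirects, "gave up after #{max_redirects} redirects"
end
```

Something like fetch_following_redirects('http://example.com/', max_redirects: 3) would then return whatever response ends the chain. This sketch only issues GET requests and ignores cookies entirely.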

If there's nothing out there, I'll carry on building my own (codenamed: WAML [Web Automation Macro Language]). But it would be interesting to find out if anyone else is working on this already before I go too far.
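
For the cookie item in the list above, this is roughly the behaviour I have in mind: a jar that remembers which host set each cookie and checks path, expiry and the secure flag before sending it back. It is a simplified sketch, not a full implementation of the cookie rules. Cookie, CookieJar, store and header_for are names made up for the example, the default path is assumed to be "/", path matching is a plain prefix test, and Domain attributes are ignored.

```ruby
require 'time'
require 'uri'

# Hypothetical value object for this sketch.
Cookie = Struct.new(:name, :value, :host, :path, :expires, :secure)

class CookieJar
  def initialize
    @cookies = []
  end

  # Records one Set-Cookie header value received in a response from request_uri.
  def store(set_cookie, request_uri)
    pair, *attributes = set_cookie.split(';').map(&:strip)
    name, value = pair.split('=', 2)
    # Assumption: the default path is "/" (the real default depends on the request path).
    cookie = Cookie.new(name, value, request_uri.host, '/', nil, false)
    attributes.each do |attribute|
      key, val = attribute.split('=', 2)
      case key.to_s.downcase
      when 'path'    then cookie.path = val
      when 'expires' then cookie.expires = parse_expiry(val)
      when 'secure'  then cookie.secure = true
      end
    end
    # A cookie with the same name, host and path replaces the old one.
    @cookies.reject! { |c| c.name == name && c.host == cookie.host && c.path == cookie.path }
    @cookies << cookie
  end

  # Builds the Cookie request header value for uri ("" if nothing applies).
  def header_for(uri, now = Time.now)
    path = uri.path.empty? ? '/' : uri.path
    matching = @cookies.select do |c|
      c.host == uri.host &&
        path.start_with?(c.path) && # simplified prefix match
        (c.expires.nil? || c.expires > now) &&
        (!c.secure || uri.scheme == 'https')
    end
    matching.map { |c| "#{c.name}=#{c.value}" }.join('; ')
  end

  private

  # Unparseable dates are treated as "no expiry" in this sketch.
  def parse_expiry(text)
    Time.httpdate(text.to_s)
  rescue ArgumentError
    nil
  end
end
```

With Net::HTTP, the values to pass to store would come from response.get_fields('Set-Cookie'), and the result of header_for would be assigned to request['Cookie'] before sending.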

Comments

A good REST client is a good HTTP client

Take a look at http://github.com/kaiwren/wrest/examples

I started this to replace Rails' ActiveResource, which is atrocious. Along the way I realised that before I can have a good REST client, I need a good HTTP client and Wrest is my attempt to do this. Do try it - I'm looking for feedback on how to improve it :)

Here's a recent useful

Here's a recent useful library: http://raa.ruby-lang.org/project/httpclient/

Thanks

Thanks for the reference in the comment above, as I am new to writing HTTP client code. Wow, this is an old post, and it took me forever to find this info too. Glad I did. By the way, I love the blog post title, "no tech please, we're British" lmao!

Thomas

You're welcome. You might

You're welcome. You might want to look at HTTParty as well. I still think there's room for a more complete Ruby client implementation.

Thanks for the reference.

Thanks for the reference. I'll check it out next time I'm writing HTTP client code in Ruby.

Thank you!

I need that, as I am pretty new to writing HTTP client code. Thanks!

Probably too late, but - simplehttp

If this comment does nothing else but lead googlers to simplehttp, then it has served its purpose:
http://rubyforge.org/projects/simplehttp/

Thanks for the reference,

Thanks for the reference, looks well worth investigating.

RFuzz

I keep coming back to this thread, because the subject line is so compelling in the Google results...

Zed Shaw has a Ruby HTTP client library called RFuzz. It just solved my woes. For usage examples try this link: http://rfuzz.rubyforge.org/sample.html

Glad you like the title

Glad you like the title :)

Thanks for the RFuzz reference and pointer to usage examples. It looks quite similar to what I started with my own client library, though RFuzz is still alive while my library has died a death. Definitely a good one to keep in mind.

Me Too!

I've written something called Redback, which is more focussed on spidering, which I think is what you are after. It doesn't do X/HTML parsing, though it was my intention to incorporate this; at present it uses a regex for link discovery. It was designed to be lean and mean: from memory, it is about 50 or so LOC. It has some cute features, like recognizing duplicate URLs, optional depth-first or breadth-first traversal, and forcing https. I want to implement depth restrictions and total page-get restrictions, à la wget and curl, but with the flexibility of Ruby. Let me know if it is of any use...

Also, note that I've used Mechanize a few times (today, even). While it is decidedly slower (because of the overhead of all those objects, I suspect), it is very well put together. I wrote Redback in spite of knowing about Mechanize because I (mistakenly?) thought that it was focussed not on spidering but on automating more complex actions. It certainly is possible to spider with Mechanize, but I believe that most of the code needed to manage the process still has to be written, and that is what Redback already does. I had considered incorporating Mechanize (especially now that it incorporates Hpricot) for structural link detection and for structured extraction, and may yet do so.
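
To give an idea of the approach (this is not Redback's actual code, just a minimal sketch of the same ideas), here is a spider with regex link discovery, duplicate URL detection, a choice of depth-first or breadth-first order, and depth and page-count limits. The names spider, extract_links, LINK_PATTERN and the keyword parameters are all invented for the example, and the fetcher is injectable so it can be tried against canned pages.

```ruby
require 'net/http'
require 'set'
require 'uri'

# Crude href matcher; a regex will miss or mangle some real-world markup.
LINK_PATTERN = /<a\s[^>]*href\s*=\s*["']([^"'#]+)/i

# Default fetcher returns the response body as a string.
DEFAULT_PAGE_FETCHER = ->(uri) { Net::HTTP.get(uri) }

# Returns absolute http/https URIs found in html, resolved against base_uri.
def extract_links(html, base_uri)
  links = html.scan(LINK_PATTERN).flatten.map do |href|
    begin
      link = base_uri.merge(href)
      link.is_a?(URI::HTTP) ? link : nil # drops mailto:, ftp:, etc.
    rescue URI::Error
      nil # skip hrefs that are not valid URIs
    end
  end
  links.compact
end

# Visits pages starting at start_url and returns the URLs fetched, in order.
def spider(start_url, max_depth: 2, max_pages: 50, strategy: :breadth_first,
           fetcher: DEFAULT_PAGE_FETCHER)
  start = URI(start_url)
  queue = [[start, 0]]
  seen = Set.new([start.to_s]) # every URL ever queued, to skip duplicates
  visited = []
  until queue.empty? || visited.size >= max_pages
    # A stack gives depth-first order, a FIFO queue gives breadth-first.
    uri, depth = strategy == :depth_first ? queue.pop : queue.shift
    html = fetcher.call(uri)
    next if html.nil?
    visited << uri.to_s
    next if depth >= max_depth
    extract_links(html, uri).each do |link|
      queue << [link, depth + 1] if seen.add?(link.to_s)
    end
  end
  visited
end
```

Swapping the regex for a real parser would only mean replacing extract_links; the queue handling stays the same.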

Thanks, Thoran. I haven't

Thanks, Thoran. I haven't done much with WAML recently, as I ended up not using it very much. It's a topic I keep returning to every now and then. I'm not so much interested in spidering as in screen scraping: my original impetus was converting HTML pages to RSS feeds so I didn't have to keep visiting RubyForge to track project statistics. I also wrote some parsing code, but I relied on HTML Tidy to make the pages tractable, then used XPath to do my querying. I also did some regular expression matching, but this is more brittle than XPath. I keep meaning to look at Hpricot but I haven't got round to it yet.
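
For anyone curious what the XPath step looks like, here is a minimal sketch using REXML. It assumes the page has already been run through HTML Tidy and is therefore well-formed XML. The markup, the project names and numbers, and the extract_stats helper are all made up for the example; this is not the real RubyForge page layout or my actual code.

```ruby
require 'rexml/document'

# Invented sample standing in for a tidied statistics page.
SAMPLE_XHTML = <<~XHTML
  <html>
    <body>
      <table id="stats">
        <tr><td class="project">waml</td><td class="downloads">120</td></tr>
        <tr><td class="project">other</td><td class="downloads">45</td></tr>
      </table>
    </body>
  </html>
XHTML

# Pulls one hash per table row out of the tidied markup.
def extract_stats(xhtml)
  doc = REXML::Document.new(xhtml)
  REXML::XPath.match(doc, "//table[@id='stats']/tr").map do |row|
    {
      project: row.elements["td[@class='project']"].text,
      downloads: row.elements["td[@class='downloads']"].text.to_i
    }
  end
end
```

Each hash could then be turned into one item of the RSS feed. The XPath expressions break if the page layout changes, but they tend to survive small markup changes better than regular expressions do.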

By the way, you didn't

By the way, you didn't provide a link to your library. Is it publicly available?

Mechanize

I was quite recently in exactly your shoes. I searched and found what fits precisely your needs. It is called WWW::Mechanize:

http://www.ntecs.de/blog/Blog/WWW-Mechanize.rdoc

Cheers

Thanks for that, Greg. Interesting, as it sort of emulates a person clicking through a site. The stuff I've written works similarly, but is more focused on scraping and building new representations from the scrape. It's also pretty simple compared to Mechanize. But I will investigate further and may well switch.

Hmmm

This comment thread still running? I had some questions about it.