<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xml:base="http://townx.org" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
 <title>townx - How I worked out that curl is doing bad things with MARC - Comments</title>
 <link>http://townx.org/blog/elliot/how-i-worked-out-curl-doing-bad-things-marc</link>
 <description>Comments for &quot;How I worked out that curl is doing bad things with MARC&quot;</description>
 <language>en</language>
<item>
 <title>How I worked out that curl is doing bad things with MARC</title>
 <link>http://townx.org/blog/elliot/how-i-worked-out-curl-doing-bad-things-marc</link>
 <description>&lt;p&gt;I work on a system at &lt;a href=&quot;http://talis.com/&quot;&gt;Talis&lt;/a&gt; which posts &lt;span class=&quot;caps&quot;&gt;MARC &lt;/span&gt;records from customer library databases into a &lt;span class=&quot;caps&quot;&gt;MARC &lt;/span&gt;to &lt;span class=&quot;caps&quot;&gt;RDF &lt;/span&gt;transformer. The resulting &lt;span class=&quot;caps&quot;&gt;RDF &lt;/span&gt;generated from the &lt;span class=&quot;caps&quot;&gt;MARC &lt;/span&gt;is sent into the &lt;a href=&quot;http://www.talis.com/platform/&quot;&gt;Talis Platform&lt;/a&gt;, where it&#039;s used to power &lt;a href=&quot;http://talis.com/prism&quot;&gt;Prism&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Over the last day or so I&#039;ve been working on a bug which has prevented some records from going through this process correctly. Along the way, I noticed another bug occurring somewhere between the post from the customer site and our &lt;span class=&quot;caps&quot;&gt;MARC &lt;/span&gt;to &lt;span class=&quot;caps&quot;&gt;RDF &lt;/span&gt;transformer. It looked as if line break characters in the original &lt;span class=&quot;caps&quot;&gt;MARC &lt;/span&gt;record were being lost along the way. Consequently, when the &lt;span class=&quot;caps&quot;&gt;MARC &lt;/span&gt;was pushed into the transformer, the record got spat out as invalid, because the length specified in the &lt;span class=&quot;caps&quot;&gt;MARC &lt;/span&gt;leader no longer matched the length of the record (which had lost its line break characters). (By the way, working directly with byte streams is the only way to work with &lt;span class=&quot;caps&quot;&gt;MARC, &lt;/span&gt;for precisely this reason.)&lt;/p&gt;
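&lt;p&gt;To make the failure concrete, here is a minimal sketch of that length check, assuming only that the first five bytes of a MARC 21 leader hold the total record length as ASCII digits. The helper name and the sample bytes are made up for illustration; this is not the code the transformer actually runs.&lt;/p&gt;

```ruby
# Hypothetical helper: true when the length declared in the leader
# (bytes 0 to 4, ASCII digits) equals the real byte length of the record.
def leader_length_matches?(record)
  record.byteslice(0, 5).to_i == record.bytesize
end

# Synthetic stand-in for a record, NOT real MARC data: a 5-digit length
# prefix followed by a payload that contains line breaks.
payload = "field one\nfield two\n\x1D"
record  = format("%05d", payload.bytesize + 5) + payload

# Simulate losing a single line break somewhere in transit.
damaged = record.sub("\n", "")

puts leader_length_matches?(record)   # true
puts leader_length_matches?(damaged)  # false
```

&lt;p&gt;Losing even one byte is enough to make the declared and actual lengths disagree, which is why a record can look fine to the eye and still be rejected.&lt;/p&gt;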

&lt;p&gt;I had a sudden insight on the way home, triggered by remembering issues I&#039;d had with &lt;a href=&quot;http://curl.haxx.se/&quot;&gt;curl&lt;/a&gt; (the command line &lt;span class=&quot;caps&quot;&gt;HTTP &lt;/span&gt;client) while working on &lt;a href=&quot;http://code.google.com/p/hardfidget/&quot;&gt;a personal project&lt;/a&gt;. On that project, I&#039;d been trying to post &lt;span class=&quot;caps&quot;&gt;RDF &lt;/span&gt;triples in N-Triples format into my application using curl. However, the application only seemed to recognise the first &lt;span class=&quot;caps&quot;&gt;RDF &lt;/span&gt;triple in the posted file. I couldn&#039;t understand why.&lt;/p&gt;

&lt;p&gt;Then, when I echoed the body of the &lt;span class=&quot;caps&quot;&gt;HTTP &lt;/span&gt;request, as received by my app from curl, I realised the issue: curl was sending the body of the request &lt;span class=&quot;caps&quot;&gt;WITHOUT LINE BREAKS.&lt;/span&gt; As line break characters act as the delimiter between triples in N-Triples format, my app was only seeing a single &lt;span class=&quot;caps&quot;&gt;RDF &lt;/span&gt;triple. When I tried an alternative tool to send the posts (the extremely useful &lt;a href=&quot;https://addons.mozilla.org/en-US/firefox/addon/2691&quot;&gt;Poster add-on for Firefox&lt;/a&gt;), the triples were received correctly.&lt;/p&gt;

&lt;p&gt;Once I remembered this, I decided to do some debugging of the kind of requests curl would send if it were posting &lt;span class=&quot;caps&quot;&gt;MARC &lt;/span&gt;records. My hypothesis was that curl was stripping line break characters from the &lt;span class=&quot;caps&quot;&gt;MARC &lt;/span&gt;record (which is bad, as they are valid characters in &lt;span class=&quot;caps&quot;&gt;MARC&lt;/span&gt;), and hence causing the record to be shorter than the leader said it should be.&lt;/p&gt;

&lt;p&gt;The first step was to put together something to echo and save &lt;span class=&quot;caps&quot;&gt;HTTP &lt;/span&gt;request bodies. &lt;a href=&quot;http://rack.rubyforge.org/&quot;&gt;Rack&lt;/a&gt; is ideal for this sort of thing, so I used this little Rack web server program:&lt;/p&gt;



&lt;pre&gt;
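require &#039;rubygems&#039;
require &#039;rack&#039;

# Write the raw request body to disk in binary mode, then hand it back.
def save_body(body)
  File.open(&#039;last_raw_request&#039;, &#039;wb&#039;) {|f| f.write(body)}
  body
end

# Echo each request body back to the client after saving it.
# The body is wrapped in an array because Rack expects it to respond to each.
Rack::Handler::WEBrick.run(lambda {|e| [200, {}, [save_body(e[&#039;rack.input&#039;].read)]]}, :Port=&amp;gt;7777)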
&lt;/pre&gt;



&lt;p&gt;This saves the raw request body to a file called &quot;last_raw_request&quot;.&lt;/p&gt;
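&lt;p&gt;One quick way to compare a saved body with the original file is to look at its size and count its line break bytes. A minimal sketch, assuming a hypothetical &lt;code&gt;byte_summary&lt;/code&gt; helper and a throwaway temp file standing in for real data:&lt;/p&gt;

```ruby
require 'tempfile'

# Hypothetical helper: reads a file as raw bytes and reports its size
# plus how many carriage return and newline bytes it contains.
def byte_summary(path)
  bytes = File.binread(path)
  { size: bytes.bytesize, line_breaks: bytes.count("\r\n") }
end

# Demonstration on a synthetic temp file (made-up content, not MARC).
Tempfile.create('sample') do |f|
  f.binmode
  f.write("abc\ndef\r\n")
  f.flush
  p byte_summary(f.path)
end
```

&lt;p&gt;Running the helper against both the original file and last_raw_request should give identical numbers if nothing was lost on the way.&lt;/p&gt;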

&lt;p&gt;I first posted a &lt;span class=&quot;caps&quot;&gt;MARC &lt;/span&gt;file with line breaks in it (attached) using Poster (with Content-Type set to application/marc21) through Firefox. The &lt;span class=&quot;caps&quot;&gt;MARC &lt;/span&gt;file came through intact and still valid.&lt;/p&gt;

&lt;p&gt;I then posted a &lt;span class=&quot;caps&quot;&gt;MARC &lt;/span&gt;file with line breaks in it using curl:&lt;/p&gt;



&lt;pre&gt;
curl -d @marcfile.mrc -H &amp;quot;Content-Type:application/marc21&amp;quot; &lt;a href=&quot;http://localhost:7777/&quot; title=&quot;http://localhost:7777/&quot;&gt;http://localhost:7777/&lt;/a&gt;
&lt;/pre&gt;



&lt;p&gt;This produced an invalid &lt;span class=&quot;caps&quot;&gt;MARC &lt;/span&gt;file with the line breaks missing.&lt;/p&gt;
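&lt;p&gt;This matches what the curl manual says about &lt;code&gt;-d&lt;/code&gt;: when the data is read from a file, carriage returns and newlines are stripped out. Here is a rough imitation of that effect in Ruby, using a made-up helper name and made-up sample text rather than anything taken from curl itself:&lt;/p&gt;

```ruby
# Hypothetical stand-in for the effect of curl -d @file on file contents:
# every carriage return and newline byte is removed before sending.
def strip_line_breaks(bytes)
  bytes.delete("\r\n")
end

# Made-up sample text with one CRLF and one bare LF.
original = "first line\r\nsecond line\nthird line"
sent     = strip_line_breaks(original)

puts original.bytesize  # 34
puts sent.bytesize      # 31
puts sent               # first linesecond linethird line
```

&lt;p&gt;Three bytes go missing here, and any format that counts bytes or relies on line breaks as delimiters will notice.&lt;/p&gt;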

&lt;p&gt;The solution is to use the &lt;code&gt;--data-binary&lt;/code&gt; switch whenever curl is sending binary data, so that the file is posted exactly as it is on disk. We&#039;re not doing that when sending &lt;span class=&quot;caps&quot;&gt;MARC &lt;/span&gt;from the customer site. Mostly this doesn&#039;t matter, but it does when the &lt;span class=&quot;caps&quot;&gt;MARC &lt;/span&gt;record contains line break characters.&lt;/p&gt;

&lt;p&gt;Namely:&lt;/p&gt;



&lt;pre&gt;
curl --data-binary @marcfile.mrc -H &amp;quot;Content-Type:application/marc21&amp;quot; &lt;a href=&quot;http://localhost:7777/&quot; title=&quot;http://localhost:7777/&quot;&gt;http://localhost:7777/&lt;/a&gt;
&lt;/pre&gt;</description>
 <comments>http://townx.org/blog/elliot/how-i-worked-out-curl-doing-bad-things-marc#comments</comments>
 <category domain="http://townx.org/tech">tech</category>
 <pubDate>Wed, 10 Jun 2009 04:21:31 -0500</pubDate>
 <dc:creator>elliot</dc:creator>
 <guid isPermaLink="false">777 at http://townx.org</guid>
</item>
</channel>
</rss>
