Archive for January, 2012

CouchDB/BigCouch Bulk Insert/Update

Friday, January 27th, 2012

While writing a bulk importer for Crossbar, I took a look at squeezing some performance out of BigCouch for the actual inserting of documents into the database. My first attempt, pushing all the documents into BigCouch in a single request, performed poorly, so I went digging for ideas on how to improve the insertions. After reading up on the High Performance Guide for CouchDB (which BigCouch is API-compliant with), I started playing with chunking my inserts to get better overall execution time.
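For context, CouchDB (and therefore BigCouch) exposes bulk writes through the _bulk_docs endpoint: a single POST carries many documents at once. A minimal sketch of such a request (the host, database name, and documents here are illustrative, not from the actual importer):

$> curl http://localhost:5984/rates/_bulk_docs -H "Content-Type: application/json" -X POST -d '{"docs": [{"_id": "rate-1", "prefix": "1415"}, {"_id": "rate-2", "prefix": "1416"}]}'

Chunking the inserts just means varying how many documents go into each such request.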

Note: the following are very unscientific results, but I think they're fairly instructive for what one might expect.

Docs Per Insertion    Elapsed Time (ms)
26618                 107176
1000                  8325
1500                  5679
2000                  3087
2500                  1644

Based on the CouchDB guide, I decided not to pursue this further; dropping insertion time by nearly two orders of magnitude was good enough for me! I may have to bake this into the platform natively.

For those interested in the Erlang code, it is pretty simple. Given a list of documents to save, use lists:split/2 to try to split the list at the chunk threshold. lists:split/2 throws a badarg error when the list is shorter than the threshold, so by catching that error we know we can save the remaining list to BigCouch as-is. Otherwise, lists:split/2 chunks the list into one part for saving and one part for recursing back into the function. Since we don't really care about the results of couch_mgr:save_docs/2, we could wrap the calls in the second clause of the case in a spawn to speed this up relative to the calling process (see the variant sketched after the code).

-spec save_bulk_rates/1 :: (wh_json:json_objects()) -> no_return().
save_bulk_rates(Rates) ->
    case catch(lists:split(?MAX_BULK_INSERT, Rates)) of
        {'EXIT', _} ->
            %% fewer than ?MAX_BULK_INSERT docs remain; save them all
            couch_mgr:save_docs(?WH_RATES_DB, Rates);
        {Save, Cont} ->
            %% save the first chunk, then recurse on the rest
            couch_mgr:save_docs(?WH_RATES_DB, Save),
            save_bulk_rates(Cont)
    end.
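A minimal sketch of that spawned variant, assuming we truly don't need the save results (illustrative, not code from the importer):

-spec save_bulk_rates/1 :: (wh_json:json_objects()) -> no_return().
save_bulk_rates(Rates) ->
    case catch(lists:split(?MAX_BULK_INSERT, Rates)) of
        {'EXIT', _} ->
            couch_mgr:save_docs(?WH_RATES_DB, Rates);
        {Save, Cont} ->
            %% fire-and-forget: a separate process performs the save,
            %% so the caller moves on to the next chunk immediately
            spawn(fun() -> couch_mgr:save_docs(?WH_RATES_DB, Save) end),
            save_bulk_rates(Cont)
    end.

The trade-off is that any errors from couch_mgr:save_docs/2 are silently dropped in the spawned process.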

Life Update

Thursday, January 26th, 2012

Updated the blog to run 3.3.1 – a lot of cobwebs around these parts. Hopefully I can be more proactive in blogging about things going on at work, and perhaps start writing about what I'm up to personally (not that I have much of that right now). Maybe my Google stats will jump over the 0.3 hits I average! Dare to dream!

cURL stripping newlines from your CSV or other file?

Thursday, January 26th, 2012

I’m in the process of writing a REST endpoint for uploading CSVs to Crossbar as part of our communications platform at 2600hz. Not wanting to invoke the full REST client interface, I generally use cURL to send the HTTP requests. Today, however, I had quite the time figuring out why my CSV files were being stripped of their newline characters.

The initial invocation:

$> curl http://localhost:8000/v1/path/to/upload -H "Content-Type: text/csv" -X POST -d @file.csv

Walking through the code, from where I was processing the CSV down to the webserver handling the connection itself, looking for whatever was stripping the newlines, I determined the file was arriving sans newlines and decided to check cURL's man page for what might be amiss. I quickly found that the -d option treats the file as ASCII text, and although the docs don't explicitly say so, this option appears to strip the newlines.

The resolution is to use the --data-binary flag so cURL doesn't touch the file before sending it to the server.
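The working invocation is the original command with the flag swapped in:

$> curl http://localhost:8000/v1/path/to/upload -H "Content-Type: text/csv" -X POST --data-binary @file.csv

With --data-binary, the CSV is posted exactly as it is on disk, newlines intact.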