Archive for the ‘Geekdom’ Category

CouchDB/BigCouch Bulk Insert/Update

Friday, January 27th, 2012

While writing a bulk importer for Crossbar, I took a look at squeezing some performance out of BigCouch for the actual inserting of documents into the database. My first time running all the documents into BigCouch at the same time resulted in some poor performance, so I went digging around for some ideas on how to improve the insertions. Reading up on the High Performance Guide for CouchDB (which BigCouch is API-compliant with), I started to play with chunking my inserts up to get better overall execution time.

Note: the following are very unscientific results, but I think are fairly instructive for what one might expect.

Docs Per Insertion    Elapsed Time (ms)
26618                 107176
1000                  8325
1500                  5679
2000                  3087
2500                  1644

Based on the CouchDB guide, I decided not to pursue this further, as dropping insertion time from 107,176 ms to 1,644 ms (roughly 65x, or nearly two orders of magnitude) was fine enough for me! I may have to bake this into the platform natively.

For those interested in the Erlang code, it is pretty simple. Taking a list of documents to save, use lists:split/2 to try to split the list at the threshold. If the list is shorter than the threshold, lists:split/2 raises an error; by catching it, we know we are on the last chunk and can save the remaining list to BigCouch. Otherwise, lists:split/2 chunks our list into one part for saving and one for recursing back into the function. Since we don’t really care about the results of couch_mgr:save_docs/2, we could put the calls in the second clause of the case in a spawn to speed this up (relative to the calling process).

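-spec save_bulk_rates(wh_json:json_objects()) -> any().
save_bulk_rates(Rates) ->
    %% lists:split/2 raises badarg when Rates has fewer than ?MAX_BULK_INSERT elements
    case catch(lists:split(?MAX_BULK_INSERT, Rates)) of
        {'EXIT', _} ->
            %% last (short) chunk: save whatever is left
            couch_mgr:save_docs(?WH_RATES_DB, Rates);
        {Save, Cont} ->
            %% save one full chunk, then recurse on the remainder
            couch_mgr:save_docs(?WH_RATES_DB, Save),
            save_bulk_rates(Cont)
    end.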

cURL stripping newlines from your CSV or other file?

Thursday, January 26th, 2012

I’m in the process of writing a REST endpoint for uploading CSVs to Crossbar as part of our communications platform at 2600hz. Not wanting to invoke the full REST client interface, I generally use cURL to send the HTTP requests. Today, however, I had quite the time figuring out why my CSV files were being stripped of their newline characters.

The initial invocation:

$> curl http://localhost:8000/v1/path/to/upload -H "Content-Type: text/csv" -X POST -d @file.csv

I walked through the code, from where I was processing the CSV down to the webserver handling the connection itself, looking for whatever was stripping the newlines. Once I determined the body was arriving sans newlines, I checked cURL’s man page for what might be amiss. I quickly found that the -d option treats the file as ASCII, and although the docs don’t explicitly say so, it appears this option strips the newlines.

The resolution is to use the --data-binary flag so cURL doesn’t touch the file before sending it to the server.
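
For reference, a sketch of the corrected invocation wrapped in a small shell function, reusing the placeholder URL from above. The function names upload_csv and strip_newlines are mine, and strip_newlines only imitates locally what -d appeared to be doing to the file; it is not how cURL is implemented.

```shell
#!/bin/sh

# Upload a file byte-for-byte. The URL is the placeholder from the post.
upload_csv() {
    curl "http://localhost:8000/v1/path/to/upload" \
        -H "Content-Type: text/csv" \
        -X POST \
        --data-binary "@$1"
}

# Local imitation of the symptom: the same file with CR and LF removed.
# Handy for seeing what the server was receiving, no server required.
strip_newlines() {
    tr -d '\r\n' < "$1"
}
```

Running strip_newlines file.csv shows every row run together on one line, which is what the endpoint was being handed; upload_csv file.csv sends the rows intact.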

Cron and infinite loops do not mix

Wednesday, March 9th, 2011

More “expert” code time! From the “expert”:

Please put this script in a cron to run every minute

while true; do
  rsync -a server:remote_dir local_dir
  sleep $freq
done

local_dir is going to be really, really, really up to date after a few minutes…the server crash will be epic. Perhaps we should write a script to find and kill these rogue processes and run it every minute too, but stagger it with the other cron…
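
For contrast, a sketch of what a cron-friendly version might look like: do one sync per run, and skip the run if the previous one hasn’t finished. The lock path, the sync_once name, and the script path in the crontab comment are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical crontab entry (path is an assumption):
#   * * * * * /path/to/sync_once.sh

LOCK_DIR="${LOCK_DIR:-/tmp/rsync_local_dir.lock}"

# Run the given command once, unless a previous run still holds the lock.
sync_once() {
    # mkdir is atomic: it fails if the lock directory already exists
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "previous sync still running, skipping"
        return 0
    fi
    "$@"
    status=$?
    rmdir "$LOCK_DIR"
    return $status
}

# In the real script the last line would be something like:
#   sync_once rsync -a server:remote_dir local_dir
```

No loop and no sleep: cron supplies the schedule. One caveat with this sketch is that a crash between mkdir and rmdir leaves a stale lock behind, which has to be cleared by hand.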

Resolving Dialyzer “Function foo/n has no local return” errors

Tuesday, November 23rd, 2010

Dialyzer is a great static analysis tool for Erlang and has helped me catch many bugs related to what types I thought I was passing to a function versus what actually gets passed. Some of the errors Dialyzer emits are rather cryptic at first (as seems commonplace in the Erlang language/environment in general) but after you understand the causes of the errors, the fix is easily recognized.

My most common error is Dialyzer inferring a different return type than what I put in my -spec, followed by Dialyzer telling me the same function has no local return. An example:

foo.erl:125: The specification for foo:init/1 states that the function might also return {'ok',tuple()} but the inferred return is none()
foo.erl:126: Function init/1 has no local return

The init/1 function (for a gen_server, btw), at lines 124-126 of foo.erl:

-spec(init/1 :: (Args :: list()) -> tuple(ok, tuple())).
init(_) ->
  {ok, #state{}}.

And the state record definition:

-record(state, {
  var_1 = {} :: tuple(string(), tuple())
  ,var_2 = [] :: list(tuple(string(), tuple()))
}).

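Spot the error? In the record definition, var_1 defaults to an empty tuple, {}, which is not a tuple(string(), tuple()), so the declared type has to allow {} as well. I added | [] to var_2 in the same way, although list(...) already includes the empty list, so that addition is harmless rather than required. The corrected version: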

-record(state, {
  var_1 = {} :: tuple(string(), tuple()) | {}
  ,var_2 = [] :: list(tuple(string(), tuple())) | []
}).

And now Dialyzer stops emitting the spec error and the no local return error.

Erlang and Webmachine

Friday, April 23rd, 2010

I’m currently working on a small startup project, partly to meet a need of some acquaintances, but more importantly to learn me some Erlang with regard to the web.

While I’m further along than I actually expected to be, I thought I’d begin documenting the steps I’ve taken towards building this app.

The current nerdities I’m using:

Installation of all of these on a GNU/Linux system is pretty straightforward, so I won’t cover that here. Defaults were used for Erlang. I installed the other libraries/applications in ~/dev/erlang/lib and pointed $ERL_LIBS there in my .bashrc.

I did follow this guide for setting up Tsung. The BeeBole site has several other pages worth reading for developing web applications in Erlang.

Once installed, build the webmachine project:

$WEBMACHINE_HOME/scripts/new_webmachine.erl wm_app /path/to/root
cd /path/to/root/wm_app
make
./start.sh

You now have a working project! Of course, I like to have my Erlang shell inside of emacs while I’m developing, so I added a comment to the start.sh script that contained the shell parameters. My start.sh looks like this:

#!/bin/sh
 
# for emacs C-c C-z flags:
# -pa ./ebin -pa ./priv/templates/ebin -boot start_sasl -s wm_app
 
cd `dirname $0`
exec erl -pa $PWD/ebin $PWD/deps/*/ebin $PWD/deps/*/deps/*/ebin $PWD/priv/templates/ebin -boot start_sasl -s wm_app

I currently have all of my dependencies in $ERL_LIBS; when I deploy this to production, I’ll add the libs to the wm_app/deps as either a symlink or copied into the directory.

To get the custom shell in emacs, you need the .emacs code that starts an Erlang shell with custom flags (see the “More Erlang+Emacs” post below).

Important note: If you need to specify multiple code paths in the -pa arg, you have to use a -pa for each path, unlike in the shell command version where any path after the -pa (or -pz) is added.
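
As an illustration of that rule, here is a tiny helper that turns a list of directories into the one-flag-per-path form the emacs prompt wants. The name pa_flags is hypothetical, not part of Erlang or Distel.

```shell
#!/bin/sh

# Print one "-pa DIR" pair per argument, separated by spaces.
pa_flags() {
    flags=""
    for dir in "$@"; do
        flags="$flags -pa $dir"
    done
    # drop the leading space left by the first append
    echo "${flags# }"
}
```

For example, pa_flags ./ebin ./priv/templates/ebin prints "-pa ./ebin -pa ./priv/templates/ebin", which matches the comment in my start.sh once -boot start_sasl -s wm_app is appended.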

Another caveat: when starting the Erlang shell within emacs, if you’re currently in an Erlang-related buffer (.erl, .hrl, etc), the default shell is started without the option to set flags. I typically have the start.sh open anyway to copy the flags so I don’t run into this much anymore; I’m documenting it here just in case anyone stumbles on it.

Now you have a shell within which to execute commands against your webmachine app, load updated modules, etc.

Coming up, I’ll talk about how I’m using ErlyDTL to create templates and using CouchDB/Couchbeam for the document store.

Purely Functional Data Structures and Me

Tuesday, March 23rd, 2010

Quick post to say that I’ve put my dabblings into Chris Okasaki’s Purely Functional Data Structures book up as a github repo. I am picking and choosing what exercises and examples to code, with the goal being to slowly cover the whole book. Some concessions are made to fit the ideas into Erlang (like recursively calling an anonymous function), but overall I think the ideas fit nicely.

There is a streams lib from Richard Carlsson that I found on the Erlang mailing list in the repo as well that I used for reference for a couple things in chapter 4. I stuck with my streams being represented either as an empty list [] or [term() | fun()] with term() being the calculated value and fun() being the suspension function, instead of the tuple that Richard chose. After reading further in the thread, I understand why (don’t confuse people that might use proper list functions on streams) but for my small examples, it was enough to use lists.

Problem Solved

Monday, February 22nd, 2010

In the comments section of a recent Atwood post, commenter Paul Jungwirth (search for the name as I can’t find comment permalinks) posted about a problem from a Perl mailing list that he would give potential hires. This post is not about the blog post but about the problem from the comments section.

The Problem (from the mailing list):

Consider the following string:

1 2 3 4 5 6 7 8 9 = 2002

The problem is to add any number of addition & multiplication operations wherever you’d like on the left such that in the end you have a valid equation. So for example if it gets you to a solution you can have:

12 * 345 + 6 …

if that works as part of your solution [it's much too big: 4146].
Bearing in mind that multiplication takes higher precedence than addition, what is the solution?

My answer generator can be found here in Erlang. I liked the problem because it is in the vein of Project Euler. The eval/1 is a slightly modified version of this one on TrapExit.
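
For anyone who wants to poke at the problem without Erlang, here is a brute-force sketch in plain shell. It is not my Erlang generator, just the obvious approach: between each pair of adjacent digits put nothing, a +, or a * (3^8 = 6561 candidates) and let shell arithmetic apply the usual precedence. The function names gen and solve are made up for this sketch.

```shell
#!/bin/sh

# gen EXPR NEXT: extend EXPR with digit NEXT in all three ways;
# once digit 9 has been placed, evaluate EXPR and print it on a match.
gen() {
    if [ "$2" -gt 9 ]; then
        if [ "$(( $1 ))" -eq "$target" ]; then
            echo "$1"
        fi
        return 0
    fi
    gen "$1$2"  "$(( $2 + 1 ))"   # concatenate the next digit
    gen "$1+$2" "$(( $2 + 1 ))"   # add it
    gen "$1*$2" "$(( $2 + 1 ))"   # multiply by it
}

# solve TARGET: print every expression over 1..9 that evaluates to TARGET.
solve() {
    target=$1
    gen 1 2
}
```

Calling solve 2002 prints each matching expression on its own line; 1*2+34*56+7+89 is among them (2 + 1904 + 7 + 89 = 2002).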

PHP, cURL, and POST

Friday, February 12th, 2010

While working on a script today that had been working, I couldn’t for the life of me figure out why it was failing. It uses the PHP curl_* functions to make various requests and processes the results. It turns out that when you send a POST body with the CURLOPT_POSTFIELDS option and a field’s value begins with an at symbol (@), you have to escape it (\@). The reason is that the at symbol is used by curl to denote a file upload path (“@/path/to/upload.file”). So escape the at symbol and you should be back to good with the curling.

Adding Files To Subversion

Wednesday, February 10th, 2010

Working with symfony, especially when adding to the schema and generating the model, form, and filter classes, it becomes tedious to add each of the new files to your subversion repository. Here’s a succinct line to add all un-versioned files to your repo:

#!/bin/sh
 
# quote and anchor the pattern: a bare ? is a shell glob and would also match mid-line
svn add `svn st | grep '^?' | head | awk '{print $2}'`

The key is in the backtick-quoted section. It takes the output of svn st(atus) and pipes it to grep, selecting only the un-versioned files (denoted by a ? at the start of the line), pipes that to head, which outputs the first 10 lines by default, and pipes that to awk, which prints the second column containing the file path to be added.

But what if you have more than 10 files to add? You can easily pass a -n NUM switch to the head command to increase the number of lines it reads in at a time. I’ll leave it as an exercise to the reader to modify the script to allow a user to pass in what NUM should be.

So when your “svn st” output is filled with un-versioned files, all of which you need, run this little guy and have it done speedily.
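
If you would rather not think about head at all, here is a sketch of a variant that adds every un-versioned path in one go and copes with spaces in file names. The function names unversioned_paths and add_unversioned are my own, and the sed pattern assumes the usual svn st layout of a ? in the first column followed by whitespace and the path.

```shell
#!/bin/sh

# Read `svn st` output on stdin; print the path of each un-versioned entry.
unversioned_paths() {
    sed -n 's/^?[[:space:]]*//p'
}

# Add every un-versioned path, one at a time so spaces in names survive.
add_unversioned() {
    svn st | unversioned_paths | while IFS= read -r path; do
        svn add "$path" < /dev/null
    done
}
```

Because the filtering lives in its own function, you can pipe svn st through unversioned_paths first to preview what would be added before running add_unversioned.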

More Erlang+Emacs

Wednesday, February 3rd, 2010

I found that Distel’s built-in shell launcher wasn’t cutting the mustard, as I needed to start shells with various flags and didn’t see an easy way to accomplish this using what Distel provided. Digging around the Erlang mailing list, I found an elisp function that allowed me to pass flags to the shell. Place this snippet in your .emacs file after you’ve required erlang and distel:

(defun erl-shell-with-flags (flags)
  "Start an erlang shell with flags"
  (interactive (list (read-string "Flags: ")))
  (setq inferior-erlang-machine-options (split-string flags))
  (erlang-shell))
 
;; map Ctrl-c Ctrl-z to the new function
(global-set-key "\C-c\C-z" 'erl-shell-with-flags)

Now when you start the erlang shell, a “Flags: ” prompt will be presented. Simply add flags as you would on the command line and the shell will start up. Great for when you need multiple shells with different snames, names, cookies, etc…