Org-Babel and Erlang Escripts

I recently wrote an escript for work that used the wonderful getopt application for processing command-line arguments. Part of that script includes printing the script usage if passed the `-h` or `--help` flag.

As part of my initiative to use org-mode's literate programming capabilities, I wrote a `README.org` for the script that described how to install prerequisites, what the script does, etc. I also wanted to include the usage text as part of the doc, since much of my recent work has been adding flags to tune the script's operation.

The tricky thing is that the real usage text and the doc's copy of it drift apart as development occurs. Wouldn't it be great to have the README.org stay in sync with the script's current flags?

Org-babel SRC blocks to the rescue!

With the appropriate SRC block, you can capture the results as part of the doc when exporting to a new format (or just capture the results in place).

For my use case, I used a simple shell SRC block to run the script with the `-h` flag:

./my_escript -h
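
With headers, the full block looked something like this (`:results output` being the important one - it tells org-babel to capture stdout rather than the value of the last expression):

#+BEGIN_SRC shell :results output
./my_escript -h
#+END_SRC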

Evaluating the SRC block with `C-cC-c` creates the attending `#+RESULTS:` section below. Except…it was empty. I couldn't figure out why the script's output wasn't showing up. Simple shell commands like `pwd` showed up just fine. What do?

STDERR vs STDOUT

Ah yes; it turns out that, by default, getopt prints usage to stderr instead of stdout. A quick change to call getopt:usage/3 with `standard_io` as the third argument, and the results showed up as expected.
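
The change is a one-liner (a sketch, assuming the stock getopt API; `?OPT_SPECS` and the program name stand in for my script's actual values):

usage() ->
    %% getopt:usage/2 defaults to standard_error, which the shell SRC
    %% block doesn't capture; standard_io sends the text to stdout.
    getopt:usage(?OPT_SPECS, "my_escript", standard_io).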

I'm sure there are other tricks for getting org-babel to pull stderr output into the results, but from my cursory searching they looked a bit more convoluted and not worth the time when I could fix it with such a small change.
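
(The least convoluted of those tricks is probably plain shell redirection inside the block itself - untested in my setup, but merging stderr into stdout should let org-babel capture it:)

#+BEGIN_SRC shell :results output
./my_escript -h 2>&1
#+END_SRC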

Formatting tangled output in org-mode

I have been creating training materials to help people get comfortable with Kazoo. Part of those materials includes Erlang source code, JSON schemas, CouchDB design documents, and more. I have been using a literate programming style to create these materials, with source blocks for the Erlang or JSON pieces. However, when "tangling" (extracting) the source blocks into their own files, the formatting (especially for the JSON) was terrible.

I immediately saw that ob-tangle.el has a hook, org-babel-post-tangle-hook, that executes after the tangling has occurred. What I didn't really understand, and what took many false starts to figure out, was how this hook executed and in what context.

What I learned about org-babel-post-tangle-hook

After many false starts and brute-force attempts to make it work (so many lambdas), I took a day away from the issue and came back with both fresh eyes and a beginner's heart.

Here's the process of tangling, as far as I've been able to glean:

  1. Run `org-babel-tangle` (`C-cC-vt` for most)
  2. The source blocks are tangled into the configured file(s) and saved
  3. When there are functions added to `org-babel-post-tangle-hook`, a buffer is opened with the contents of each tangled file and the hook's functions are run in it.
  4. Once the functions have finished, the buffer is killed.

Read the source, Luke

Looking at the source:

(when org-babel-post-tangle-hook
  (mapc
   (lambda (file)
     (org-babel-with-temp-filebuffer file
       (run-hooks 'org-babel-post-tangle-hook)))
   (mapcar #'car path-collector)))

First, we see that nothing will be done if org-babel-post-tangle-hook is nil. `(mapcar #'car path-collector)` takes the `path-collector` list and applies `#'car` to each element (`#'car` is just shorthand for `(function car)` - a reference to the `car` function). The resulting list is then handed to `mapc`, which we can read as Erlang's `lists:foreach/2` - applying a function to each element of a list for its side effects. The anonymous function (lambda) takes each element of the list (a tangled file's path) and calls `org-babel-with-temp-filebuffer` with it and an expression for running the hooks.

Summarizing, if there are hooks to be run, call a function for each file that was tangled.

So what does `org-babel-with-temp-filebuffer` do? From the lovely Help for the function: "Open FILE into a temporary buffer execute BODY there like ‘progn’, then kill the FILE buffer returning the result of evaluating BODY."

(defmacro org-babel-with-temp-filebuffer (file &rest body)
  "Open FILE into a temporary buffer execute BODY there like
`progn', then kill the FILE buffer returning the result of
evaluating BODY."
  (declare (indent 1))
  (let ((temp-path (make-symbol "temp-path"))
	(temp-result (make-symbol "temp-result"))
	(temp-file (make-symbol "temp-file"))
	(visited-p (make-symbol "visited-p")))
    `(let* ((,temp-path ,file)
	    (,visited-p (get-file-buffer ,temp-path))
	    ,temp-result ,temp-file)
       (org-babel-find-file-noselect-refresh ,temp-path)
       (setf ,temp-file (get-file-buffer ,temp-path))
       (with-current-buffer ,temp-file
	 (setf ,temp-result (progn ,@body)))
       (unless ,visited-p (kill-buffer ,temp-file))
       ,temp-result)))

Here we get deeper into Emacs Lisp than I really know, so I'll shoot a bit in the dark (dusk perhaps) about the functionality. `file` is our tangled file from the lambda and `body` is the `run-hooks` expression (not yet evaluated!). Basically, this code loads the contents of `file` into a buffer, evaluates `body` in the context of that buffer, and then kills the buffer - unless the file was already being visited, per the `visited-p` check.

Killed, you say?

Yes! So any formatting you applied in that temporary buffer - but didn't save - is lost.

What to do?

We need a way to format the buffer, in a mode-aware way, and have that persist to the tangled file before the buffer is killed at the end of processing the hook.

The blessing and curse of Emacs is that all is available if you have the time and inclination! :)

The current implementation

A couple pieces of the puzzle, arrived at independently, put me on the right path.

First, with all of my Kazoo work, I wanted to ensure that the source files were all properly indented according to our formatting tool (which, of course, uses erlang-mode's formatting!). A couple of hooks accomplished this:

(add-hook 'erlang-mode-hook
	  (lambda ()
	    (add-hook 'before-save-hook 'erlang-indent-current-buffer nil 'make-it-local)))

So now I have an erlang-mode-specific `before-save-hook` action to indent the buffer prior to saving it to the file (the non-nil fourth argument to `add-hook` makes the hook addition buffer-local).

I can, in turn, apply a similar hook in js-mode (or js2-mode or json-mode) and use `json-pretty-print-buffer-ordered` to format the buffer. As long as I have a function that formats the current buffer properly, I can create the `before-save-hook` to ensure the buffer has formatting applied prior to saving.
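
For example, the json-mode version might look like this (a sketch, assuming json-mode is installed; `json-pretty-print-buffer-ordered` comes with Emacs' json.el):

(add-hook 'json-mode-hook
	  (lambda ()
	    ;; pretty-print the whole buffer before each save,
	    ;; buffer-locally so other modes are unaffected
	    (add-hook 'before-save-hook 'json-pretty-print-buffer-ordered nil 'make-it-local)))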

The final piece was to figure out how to tie all this into `org-babel-post-tangle-hook`:

(defun my/run-before-save-hooks ()
  (run-hooks 'before-save-hook)
  (save-buffer))

(add-hook 'org-babel-post-tangle-hook 'my/run-before-save-hooks)

What I finally realized was this: I now had `before-save-hook` entries for each major mode I was interested in, and the buffer opened after tangling has the associated major mode applied. All I needed was a way to run the `before-save-hook` functions and save the buffer before ceding control back to `org-babel-post-tangle-hook` and the `org-babel-with-temp-filebuffer` progn.

I am pretty happy with the result; so far I haven't encountered any glaring issues or performance problems. I hope others will find this useful and will provide feedback on better, more idiomatic ways to accomplish the task.

Unexpected wins

For whatever reason, when creating my .emacs and attending customizations, I was mostly copy/pasting snippets I found (chalk it up to those dark PHP days). I didn't really take the time to grok what was happening, and most were `setq` or `define-key` anyway, so pretty simplistic. I haven't worked with Lisp in any real capacity since college (over a decade ago now - yikes), but with the last 8 or so years immersed in Erlang, I was familiar and comfortable with functional programming.

Having to dig into these functions because no one had written an easy guide to copy/paste was just the kick I needed to realize that, holy cow, I really did kind of understand what's going on here! There was a specific moment, while ruminating on the code running the post-tangle hook, when the `mapc` description - "Apply FUNCTION to each element of SEQUENCE for side effects only." - clicked in my head as "Hey, that's lists:foreach/2!"

Navigating Emacs' help system, specifically `describe-function` (`C-hf`), has also been a boon, and I am more than grateful to the developers who not only wrote the docs but had the foresight to make them so easily accessible within Emacs. It has given me the genesis of an idea for a non-trivial Elisp project I want to attempt.

Finally

I haven't had a challenge like this in quite some time. Granted, on the surface it is a pretty silly, simple thing to have accomplished. On its own merit, probably not worth the time I spent. But the confidence gained in at least reading and comprehending Elisp code a little more fully, and the spark of an idea to try out, should more than compensate for that time. Now I just need to follow through and hopefully contribute a little back to the Emacs community.

The Great Migration of 2016

It has been a long time since I've blogged for fun. A lot has changed and a lot has remained.

It is my goal to start writing more about what I'm up to, as much for an archive for my kids, family, and friends to read as it is to just flex the writing muscles. Most posts will continue on the nerdy theme as relates to computers and programming. However, I do plan to write up summaries about activities that involve the family and friends, for when nostalgia or curiosity about a time in life comes up.

Initially, though, I will start to write up a series of posts about how I'm consolidating as much as possible of my digital life into servers and services that I run, and using Emacs to interact with those services as much as possible.

Blog migration

I was a happy Wordpress user and developer when blogging first became a "thing". Cranking out plugins helped pay the bills out of college and blogging about technical things is ostensibly why I got a job at 2600Hz in 2010.

There are certainly ways to interact with Wordpress installations via Emacs but, in the end, I wasn't happy with them and wanted something more streamlined. Through some series of events, I came across Nikola and appreciated the minimalist nature of the default installation, the ease with which I migrated existing posts from Wordpress, and the ability to manage posts using Emacs' org-mode, which I've also made a conscious effort to learn this year.

Git migration

Part of the appeal is that I can now put my posts, as static files, into version control. I've set up Gogs on my server and am transitioning my personal repos to it (and off of GitHub). I also have the full power of the command line (grep, awk, sed, etc.) to work with my blog's corpus.

Going forward

I'm excited by the prospects of these (and other) changes. My goal has been to reduce the applications I use with regularity to two: Emacs and a browser. The more I can accomplish in Emacs, the less friction there is to me getting things done, which is part of why I'm so excited. Emacs is a tool that has gotten out of my way to the point that I don't even think about most keybindings I use. Emacs has become a natural extension of my thought process, and as long as my fingers can keep up with my mind, there's no impedance from my editor.

Hopefully this is the restart of my blog; no excuses aside from laziness now!

Better Blogging via Emacs

Finally trying to figure out how to get back into writing a little more consistently. Since I spend so much time in Emacs, blogging from Emacs might help in that endeavor. To that end, I've installed [org2blog](https://github.com/punchagan/org2blog) via [this blog post](http://blog.gabrielsaldana.org/post-to-wordpress-blogs-with-emacs-org-mode/). One caveat: you'll need to download xml-rpc.el from http://launchpad.net/xml-rpc-el and add (require 'xml-rpc) to your .emacs file.

Using ibrowse to POST form data

It is not immediately obvious how to use ibrowse to send an HTTP POST request with form data (perhaps to simulate a web form post). Turns out it's pretty simple:

ibrowse:send_req(URI, [{"Content-Type", "application/x-www-form-urlencoded"}], post, FormData)

Where URI is where you want to send the request ("http://some.server.com/path/to/somewhere.php") and FormData is an iolist() of URL-encoded values ("foo=bar&fizz=buzz"). There's obviously a lot more that can be done, but for a quick snippet, this is pretty sweet.
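
A slightly fuller sketch (the URL and payload are the same hypothetical values; this assumes the ibrowse application is already started):

post_form() ->
    URI = "http://some.server.com/path/to/somewhere.php",
    FormData = "foo=bar&fizz=buzz",
    Headers = [{"Content-Type", "application/x-www-form-urlencoded"}],
    %% ibrowse returns the status code as a string, e.g. "200"
    {ok, Status, _RespHeaders, RespBody} = ibrowse:send_req(URI, Headers, post, FormData),
    {Status, RespBody}.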

Emulating Webmachine's {halt, StatusCode} in Cowboy

At 2600Hz, we recently converted our REST webserver from Mochiweb/Webmachine to Cowboy, with cowboy_http_rest giving us a comparable API to process our REST requests with. One feature that was missing, however, was an equivalent to Webmachine's {halt, StatusCode} return. While there has been chatter about adding this to cowboy_http_rest, we've got a function that emulates the behaviour pretty well (this is cleaned up a bit from our actual function, removing project-specific details).

-spec halt/4 :: (#http_req{}, integer(), iolist(), #state{}) -> {'halt', #http_req{}, #state{}}.
halt(Req0, StatusCode, RespContent, State) ->
    {ok, Req1} = cowboy_http_req:set_resp_body(RespContent, Req0),
    {ok, Req2} = cowboy_http_req:reply(StatusCode, Req1),
    {halt, Req2, State}.

Obviously you can omit setting the response body if you don't plan to return one.
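
Calling it from a cowboy_http_rest callback is then straightforward - a sketch, where the quota check is hypothetical:

resource_exists(Req, State) ->
    case check_quota(State) of
        ok -> {true, Req, State};
        {error, quota_exceeded} ->
            %% short-circuit the REST flow with a 402 and a body
            halt(Req, 402, "{\"error\":\"quota exceeded\"}", State)
    end.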

CouchDB/BigCouch Bulk Insert/Update

While writing a bulk importer for Crossbar, I took a look at squeezing some performance out of BigCouch for the actual inserting of documents into the database. My first attempt ran all the documents into BigCouch at once, with poor performance, so I went digging for ideas on how to improve the insertions. Reading up on the High Performance Guide for CouchDB (which BigCouch is API-compliant with), I started to play with chunking my inserts to get better overall execution time. Note: the following are very unscientific results, but I think they are fairly instructive for what one might expect.

| Docs Per Insertion | Elapsed Time (ms) |
|              26618 |            107176 |
|               1000 |              8325 |
|               1500 |              5679 |
|               2000 |              3087 |
|               2500 |              1644 |

Based on the CouchDB guide, I decided not to pursue this further, as dropping insertion time two orders of magnitude was fine enough for me! I may have to bake this into the platform natively. For those interested in the Erlang code, it is pretty simple. Taking a list of documents to save, use lists:split/2 to try to split the list. By catching the error, we know the list is shorter than our threshold and can save the remaining documents to BigCouch. Otherwise, lists:split/2 chunks our list into one part for saving and one for recursing back into the function. Since we don't really care about the results of couch_mgr:save_docs/2, we could wrap the calls in the second clause of the case in a spawn to speed this up relative to the calling process, as sketched after the code below.

-spec save_bulk_rates/1 :: (wh_json:json_objects()) -> no_return().
save_bulk_rates(Rates) ->
    case catch(lists:split(?MAX_BULK_INSERT, Rates)) of
        {'EXIT', _} ->
            couch_mgr:save_docs(?WH_RATES_DB, Rates);
        {Save, Cont} ->
            couch_mgr:save_docs(?WH_RATES_DB, Save),
            save_bulk_rates(Cont)
    end.
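
That spawn variant might look something like this (same code, with the save of each full chunk made asynchronous; untested under production load):

save_bulk_rates_async(Rates) ->
    case catch(lists:split(?MAX_BULK_INSERT, Rates)) of
        {'EXIT', _} ->
            couch_mgr:save_docs(?WH_RATES_DB, Rates);
        {Save, Cont} ->
            %% fire-and-forget: don't block the caller while the chunk saves
            spawn(fun() -> couch_mgr:save_docs(?WH_RATES_DB, Save) end),
            save_bulk_rates_async(Cont)
    end.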

Life Update

Updated the blog to run 3.3.1 - a lot of cobwebs around these parts. Hopefully I can be more proactive in blogging about things going on at work, and perhaps start writing about what I'm up to personally (not that I have much of that right now). Maybe my Google stats will jump over the 0.3 hits I average! Dare to dream!

cURL stripping newlines from your CSV or other file?

I'm in the process of writing a REST endpoint for uploading CSVs to Crossbar as part of our communications platform at 2600Hz. Not wanting to invoke the full REST client interface, I generally use cURL to send the HTTP requests. Today, however, I had quite the time figuring out why my CSV files were being stripped of their newline characters. The initial invocation:

$> curl http://localhost:8000/v1/path/to/upload -H "Content-Type: text/csv" -X POST -d @file.csv

Walking through the code, from where I was processing the CSV down to the webserver handling the connection itself, looking for who was stripping the newlines, I determined the data was coming in sans newlines, so I checked cURL's man page for what might be amiss. I quickly found that the -d option treats the file as ASCII and, although the docs don't explicitly say so, it appears this option strips the newlines. The resolution is to use the --data-binary flag so cURL doesn't touch the file before sending it to the server.
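
The fixed invocation (same endpoint as above):

$> curl http://localhost:8000/v1/path/to/upload -H "Content-Type: text/csv" -X POST --data-binary @file.csv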

Cron and infinite loops do not mix

More "expert" code time! From the "expert":

Please put this script in a cron to run every minute

while true; do
  rsync -a server:remote_dir local_dir
  sleep $freq
done

local_dir is going to be really, really, really up to date after a few minutes…the server crash will be epic. Perhaps we should write a script to find and kill these rogue processes and run it every minute too, but stagger it with the other cron…
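
(For the record, the sane version drops the loop and lets cron provide the "every minute" - something like this crontab entry, where flock -n skips a run if the previous rsync is still going:)

# every minute; flock -n bails out if the lock is already held
* * * * * flock -n /tmp/sync_local_dir.lock rsync -a server:remote_dir local_dir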