Formatting tangled output in org-mode

I have been creating training materials to help people get comfortable with Kazoo. Part of those training materials include Erlang source code, JSON schemas, CouchDB design documents, and more. I have been using a literate programming style to create these materials, which include source blocks for the Erlang or JSON pieces. However, when "tangling" (extracting) the source blocks into their own files, the formatting (especially for the JSON) was terrible.

I immediately saw that ob-tangle.el has a hook, org-babel-post-tangle-hook, that executes after the tangling has occurred. What I didn't really understand, and took many false starts to figure out, was how this hook executed and in what context.

What I learned about org-babel-post-tangle-hook

After many false starts and brute-force attempts to make it work (so many lambdas), I took a day away from the issue and came back with both fresh eyes and a beginner's heart.

Here's the process of tangling, as far as I have gleaned thus far:

  1. run `org-babel-tangle` (`C-cC-vt` for most)
  2. The source blocks are tangled into the configured file(s) and saved
  3. When there are functions added to `org-babel-post-tangle-hook`, a buffer is opened with the contents of the file and the functions are mapped over.
  4. Once the functions have finished, the buffer is killed.

Read the source, Luke

Looking at the source:

(when org-babel-post-tangle-hook
  (mapc
   (lambda (file)
     (org-babel-with-temp-filebuffer file
       (run-hooks 'org-babel-post-tangle-hook)))
   (mapcar #'car path-collector)))

First, we see that nothing will be done if org-babel-post-tangle-hook is nil. `(mapcar #'car path-collector)` takes the `path-collector` list and applies `#'car` to each element (I'm not sure what `#'` before `car` is for). The resulting list will be used by `mapc` which we can read is like `lists:foreach/2` in Erlang - applying a function to each element in the list for its side-effects. That anonymous function (lambda) takes the element from the list (the filename I believe) and calls `org-babel-with-temp-filebuffer` with that and an expression for running the hooks.

Summarizing, if there are hooks to be run, call a function for each file that was tangled.

So what does 'org-babel-with-temp-filebuffer` do? From the lovely Help for the function: "Open FILE into a temporary buffer execute BODY there like ‘progn’, then kill the FILE buffer returning the result of evaluating BODY."

(defmacro org-babel-with-temp-filebuffer (file &rest body)
  "Open FILE into a temporary buffer execute BODY there like
`progn', then kill the FILE buffer returning the result of
evaluating BODY."
  (declare (indent 1))
  (let ((temp-path (make-symbol "temp-path"))
	(temp-result (make-symbol "temp-result"))
	(temp-file (make-symbol "temp-file"))
	(visited-p (make-symbol "visited-p")))
    `(let* ((,temp-path ,file)
	    (,visited-p (get-file-buffer ,temp-path))
	    ,temp-result ,temp-file)
       (org-babel-find-file-noselect-refresh ,temp-path)
       (setf ,temp-file (get-file-buffer ,temp-path))
       (with-current-buffer ,temp-file
	 (setf ,temp-result (progn ,@body)))
       (unless ,visited-p (kill-buffer ,temp-file))
       ,temp-result)))

Here we get deeper into Emacs Lisp than I really know, so I'll shoot a bit in the dark (dusk perhaps) about the functionality. `file` is our tangled file from the lambda and `body` is the `run-hooks` expression (not yet evaluated!). Basically, this code loads the contents of `file` into a buffer and folds `body` over that buffer. When finished, it kills the buffer.

Killed, you say?

Yes! So any formatting you may have applied to that temporary buffer is lost.

What to do?

We need a way to format the buffer, in a mode-aware way, and have that persist to the tangled file before the buffer is killed at the end of processing the hook.

The blessing and curse of Emacs is that all is available if you have the time and inclination! :)

The current implementation

A couple pieces of the puzzle, arrived at independently, put me on the right path.

First, with all of my Kazoo work, I wanted to ensure that the source files were all properly indented according to our formatting tool (which, of course, uses Erlang-mode's formatting!). Using a couple hooks to accomplish this gave me:

(add-hook 'Erlang-mode-hook
	  (lambda ()
	    (add-hook 'before-save-hook 'Erlang-indent-current-buffer nil 'make-it-local)))

So now I have an Erlang-mode specific `before-save-hook` action to indent the buffer prior to saving the buffer to the file.

I can, in turn, apply a similar hook into the js-mode (or js2-mode or JSON-mode) and use `JSON-pretty-print-buffer-ordered` to format the buffer. As long as I have a function that formats the current buffer properly, I can create the `before-save-hook` to ensure the buffer has formatting applied prior to saving.

The final piece was to figure out how to tie all this into `org-babel-post-tangle-hook`:

(defun my/run-before-save-hooks ()
  (run-hooks 'before-save-hook)
  (save-buffer)
  )

(add-hook 'org-babel-post-tangle-hook 'my/run-before-save-hooks)

What I finally came to was that, now that I had hooks available before saving for each major-mode I was interested in, and the buffer opened after tangling had the associated major-mode applied, all I needed was a way to run the `before-save-hook` hook and save the buffer before ceding control back to the `org-babel-post-tangle-hook` and `org-babel-with-temp-filebuffer` progn.

I am pretty happy with the result as so far I haven't encountered any glaring issues or performance problems. I hope others will find this useful, provide feedback on better, more idiomatic ways to accomplish the task, but overall I'm happy with the solution.

Unexpected wins

For whatever reason, when creating my .Emacs and attending customizations, I was mostly copy/pasting snippets I found (chalk it up to those dark PHP days). I didn't really take the time to grok what was happening, and most were `setq` or `define-key` anyway, so pretty simplistic. I haven't worked with Lisp in any real capacity since college (over a decade ago now - yikes) but with the last 8 or so years immersed in Erlang, I was familiar and comfortable with functional programming.

Having to dig into these functions because no one had written an easy guide to copy/paste was just the kick I needed to realize that, holy cow, I really did kind of understand what's going on here! There was a specific moment where I was ruminating on the code running the post-tangle hook, and reading the `mapc` description "Apply FUNCTION to each element of SEQUENCE for side effects only." clicked in my head as "Hey, that's lists:foreach/2"!

Navigating Emacs' help system, specifically 'describe function' (`C-hf`), has also been a boon, and I am more than grateful to the developers who've not only written the docs but the foresight to have it so easily accessible in Emacs. It has given me the genesis of an idea for a non-trivial Elisp project I want to attempt.

Finally

I haven't had a challenge like this in quite some time. Granted, on the surface it is a pretty silly, simple thing to have accomplished. On its own merit, probably not worth the time I spent. But the confidence gained in at least reading and comprehending Elisp code a little more fully, and the spark of an idea to try out, should more than compensate for that time. Now I just need to follow through and hopefully contribute a little back to the Emacs community.