Reverse binaries in Erlang

James Aimonetti

2025-03-26 17:06

Reversing binaries in Erlang

In KAZOO, we have a kz_binary module that performs various functions over binary() data (typically string-ish stuff). Most of this code came before the string module was introduced in OTP 20 (for those that may not be aware, KAZOO started on R15B03 in 2010).

Recently we encountered an issue in a production cluster where a large blob of JSON was being processed and kz_binary:strip_right/2 was called to remove trailing endlines (why is a longer story related to changes in how we are receiving data from an external service that haven't been reflected yet in KAZOO's reception of the data) and getting stuck.

strip_right reverses the binary() and removes the supplied character, then reverses the result back. The existing reverse/1 implementation was cribbed from a Stack Overflow answer:

Existing implementation

reverse(Binary) ->
    Size = byte_size(Binary) * 8,
    <<X:Size/little-integer>> = Binary,
    <<X:Size/big-integer>>.

This works great until Binary gets large. In testing, if the binary() is large (2²² (4194304) bits or 2¹⁹ (524288) bytes), this binary matching will fail. Coincidentally, the JSON blob was 562868 bytes and triggered the crash.

After updating the tests to include a large enough binary to stress the code, I tried a couple different approaches before settling on chunking the binary input up, reversing the chunks into an accumulator, and converting that list of chunks back into a binary.

New implementation - reverse the chunks

%% for large binaries, we need to chunk them up and reverse them
%% REV_CHUNK was chosen because #reasons (through experiments)
-define(REV_CHUNK, 8192).

-spec reverse(binary()) -> binary().
reverse(<<Front:?REV_CHUNK/integer-little, Rest/binary>>) ->
    list_to_binary(reverse(Rest, [<<Front:?REV_CHUNK/integer-big>>]));
reverse(Binary) ->
    %% this fails for large binaries
    Size = bit_size(Binary),
    <<X:Size/little-integer>> = Binary,
    <<X:Size/big-integer>>.

%% side note: it was observed that creating a binary as the
%% accumulator caused execution time to range 30-60ms while using the
%% list as accumulator and converting to binary at the end caused
%% execution time to range 1-6ms typically, hence the list of binaries here
reverse(<<Front:?REV_CHUNK/integer-little, Rest/binary>>, Acc) ->
    reverse(Rest, [<<Front:?REV_CHUNK/integer-big>> | Acc]);
reverse(Rest, Acc) ->
    [reverse(Rest) | Acc].

REV_CHUNK was chosen through some experiments with timer:tc/3 running but there wasn't that much variability. A better perf test might settle on a better power of 2 but at the time, no meaningful reduction in time was observed on the dev machine used.

One other interesting note was using a list for the accumulator vs constructing the resulting binary (eg <<Front/binary, Acc/binary>> in reverse/2) was much faster.

Do note that this approach doesn't take into account Unicode stuff like grapheme clusters (see the Erlang string description and Using Unicode in Erlang for more). Caveat emptor!

Why I enjoy Erlang

I want to highlight two bits of the code above that I appreciate about Erlang that may not be apparent to some.

Clause ordering
First, the order of clauses in the reverse/1 provide some assurances:
- We know that any binary() input that has ?REV_CHUNK bytes (or more) will be handled in the first clause
- Because of that, we know that Binary in the second clause must be smaller than ?REV_CHUNK bytes.
This is a powerful approach I use a lot in Erlang - using clauses to match arguments as specifically as possible. The code is self-documenting on how to handle large vs small binaries (and what "large" is defined as). Recursive functions are easily defined by stating the base case in the first clause (or clauses) and the recursive part as the last clause.

For instance, also from kz_binary:
```
pos(Char, Bin) ->
    pos(Char, Bin, 0).

pos(_Char, <<>>, _Pos) -> -1;
pos(Char, <<Char, _/binary>>, N) -> N;
pos(Char, <<_, Bin/binary>>, N) ->
    pos(Char, Bin, N + 1).
```
1. First clause of pos/3 says we reached the end of Bin and didn't find Char
2. Second clause says we found Char and return the position it was found
3. Third clause does the recursion, taking the first byte off Bin and incrementing the position

List construction via prepend

Prepending elements onto a list is fast and gives Erlang a natural stack data structure. If we look at a trace of how reverse/2 works we can see how elegantly the list construction helps create the solution:

With REV_CHUNK=32 (4 8-byte chunks) when calling kz_binary:reverse(<<"abcdefghijkl">>):

Function	Front	Rest	Acc
rev/1	1684234849	`<<"efghijkl">>`
rev/2	1751606885	`<<"ijkl">>`	`[<<"dcba">>]`
rev/2	1818978921	`<<>>`	`[<<"hgfe">>,<<"dcba">>]`
rev/2		`<<>>`	`[<<"lkji">>,<<"hgfe">>,<<"dcba">>]`

list_to_binary([<<"lkji">>,<<"hgfe">>,<<"dcba">>]) => <<"lkjihgfedcba">>

Of course, this is a manually-written fold. Conceptually the code works the same as:

reverse(Binary) ->
    lists:foldl(fun(<<Nib:?REV_CHUNK/big>>, Acc) -> [Nib | Acc] end
	       ,[]
	       ,[Bin || <<Bin:?REV_CHUNK/little>> <= Binary] % obviously assumes binary has even number of REV_CHUNK chunks
	       ).

Alternative approaches

I looked at some other approaches and found them to be lacking.

Simple reverse using binary accumulator:

%% this one took 9007807 (or 9 seconds) on the big input
reverse(Binary) ->
    reverse(Binary, <<>>).

reverse(<<>>, Acc) -> Acc;
reverse(<<H:1/binary, Rest/binary>>, Acc) ->
    reverse(Rest, <<H/binary, Acc/binary>>).

Using lists:reverse/1:

%% this took a range of 15..64 ms on the big input
reverse(Binary) ->
    list_to_binary(lists:reverse(binary_to_list(Binary))).

2025-03-28: Newer version of `reverse/1`

Of course, while writing this post, I was doing some more reading in the Erlang docs and found that, apparently since R14B, the binary module contains encode_unsigned/2 and decode_unsigned/2 functions!

reverse(Binary) ->
    binary:encode_unsigned(binary:decode_unsigned(Bin, 'big'), 'little').

How lovely! I always like when we can leverage core Erlang code over home-rolled solutions. We have more than enough in KAZOO and I hope we can continue to weed it out where approriate.