Erlang, Primes, and the Sieve of Eratosthenes

Working through the Project Euler again and using Erlang to do so, there are quite a few problems that deal with primes. It is important, then, to have a library of functions that make generating and validating primes easy. A typical method for generating primes is using the Sieve of Eratosthenes. As discussed in the wiki article, Melissa O'Neill has shown the given Haskell implementation is not a true implementation, and shows a couple versions that are more true to the algorithm and more performant. I took the implementation she described using a priority queue on page 7 of the pdf. I used a skew heap implementation I found as the priority queue, modified it slightly to handle a Key and a Value parameter, and away I went. Here's the implementation I came up with:

%% primes.erl
-module(primes).

-export([queue/1]).

queue(N) ->
    sieve_queue(lists:seq(2, N)).

sieve_queue([]) ->
    [];
sieve_queue([X|XS]) ->
    Table = insert_prime(X, skew_kv:empty()),
    [X | sieve_queue(XS, Table)].

insert_prime(P, Table) ->
    skew_kv:insert(P*P, from(P*P, P), Table).

sieve_queue([], _Table) ->
    [];
sieve_queue([X|XS], Table) ->
    {NextComposite, _Value} = skew_kv:min(Table),
    case  NextComposite =< X of
        true -> sieve_queue(XS, adjust(Table, X));
        _Else -> [X | sieve_queue(XS, insert_prime(X, Table))]
    end.

adjust(Table, X) ->
    {N, [Nprime | NS]} = skew_kv:min(Table),
    case N =< X of
        true ->
            T = skew_kv:delete_min(Table),
            T2 = skew_kv:insert(Nprime, NS(), T),
            adjust(T2, X);
        _Else -> Table
    end.

%% from http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:177:khdaceipfabbicmifdhf
%% a lazy list that starts at K and increments by Inc
from(K, Inc) ->
    [K|fun()-> from(K+Inc, Inc) end].

I tried to keep variable names similar to the O'Neill implementation, but I have modified it a bit to use lazy lists, Erlang-style. Not being familiar with Haskell, I think lists there are defined lazily by default, but Erlang needs an explicit construct to do lazy lists. Fortunately, it isn't terribly hard. So, the code! I named the function queue because I was implementing prime number generators using other methods from the paper (the unfaithful sieve and a sieve with a map instead of a queue), hence why you need to call the function with:

1> primes:queue(10).

I don't think I'll expound on the why, as O'Neill does a nice job explaining it in her paper, so I'll focus on the how and leave it to the reader to learn the why. My insert\_prime function differs from the O'Neill insertprime because mine has to account for Erlang-style lazy lists. My assumption is that the map call in O'Neill's creates a lazy list and so all the mappings are not computed, since XS is a potentially huge list of numbers. Instead, I observed that those lists are created by the map call are arithmetic progressions that start at p*p and increase by p. My simple lazy list that accomplishes this is called from/2, which I got from a Joe Armstrong suggestion and adapted to my needs. Let's take a quick look at from/2. We pass in a number K and an increment Inc and return a list with K at the head and a fun as the tail. The fun wraps a call to from/2, which acts as the iterator for the lazy list. So evaluating

1> primes:from(2, 2).
[2|#Fun]

Here's an example showing the iterator in action:

57> [K|F1] = primes:from(2, 2).
[2|#Fun]
58> [K2|F2] = F1().
[4|#Fun]
59> [K3|F3] = F2().
[6|#Fun]
60>

This lazy list is stored in the queue as the value (line 16) for the key P*P. The lazy list is then used in the adjust/2 function when we want to compare the smallest key of the queue to the current number we're evaluating. Line 28 extracts the K into Nprime, and the fun into NS, from the from/2 call earlier. Line 32 shows how to get the next number in the lazy list, NS(), which returns [K+Inc | fun()], and inserts it as the value into the queue. Again, for the why, consult the O'Neill paper. Otherwise, the implementation is roughly the same. There are some improvements and optimizations that could be made, I'm sure, but for a rough draft, it is fairly speedy, minimal memory usage, and for Project Euler, it is more than adequate in producing the primes I need to solve the problems.

Chewing the Cud

Erlang, Primes, and the Sieve of Eratosthenes