Purely Functional Data Structures and Me

March 23rd, 2010

Quick post to say that I’ve put my dabblings into Chris Okasaki’s Purely Functional Data Structures book up as a github repo. I am picking and choosing what exercises and examples to code, with the goal being to slowly cover the whole book. Some concessions are made to fit the ideas into Erlang (like recursively calling an anonymous function), but overall I think the ideas fit nicely.

There is a streams lib from Richard Carlsson that I found on the Erlang mailing list in the repo as well that I used for reference for a couple things in chapter 4. I stuck with my streams being represented either as an empty list [] or [term() | fun()] with term() being the calculated value and fun() being the suspension function, instead of the tuple that Richard chose. After reading further in the thread, I understand why (don’t confuse people that might use proper list functions on streams) but for my small examples, it was enough to use lists.

Problem Solved

February 22nd, 2010

In the comments section of a recent Atwood post, commentor Paul Jungwirth (search for the name as I can’t find comment permalinks) posted about a problem from a perl mailing list that he would give potential hires. This post is not about the blog post but about the problem from the comments section.

The Problem (from the mailing list):

Consider the following string:

1 2 3 4 5 6 7 8 9 = 2002

The problem is to add any number of addition & multiplication operations wherever you’d like on the left such that in the end you have a valid equation. So for example if it gets you to a solution you can have:

12 * 345 + 6 …

if that works as part of your solution [it's much too big: 4146].
Bearing in mind that multiplication takes higher precedence than addition, what is the solution?

My answer generator can be found here in Erlang. I liked the problem because it is in the vein of Project Euler. The eval/1 is a slightly modified version of this one on TrapExit.

PHP, cURL, and POST

February 12th, 2010

While working on a script today that had been working, I couldn’t for the life of me figure out why it was failing. It uses the PHP curl_* functions to make various requests and processes the results. Turns out when you send a POST body with the CURLOPT_POSTFIELDS and a value field begins with an at symbol(@), you have to escape it (\@). The reason is the at symbol is used by curl to denote a file upload path (“@/path/to/upload.file”). So escape the at symbol and you should be back to good with the curling.

Adding Files To Subversion

February 10th, 2010

Working with symfony, especially when adding to the schema and generating the model, form, and filter classes, it becomes tedious to add each of the new files to your subversion repository. Here’s a succinct line to add all un-versioned files to your repo:

#!/bin/sh
 
svn add `svn st | grep ? | head | awk '{print $2}'`

The key is in the tick-marked section. It takes the output of svn st(atus) and pipes it to grep, selecting only the un-versioned files (denoted by the ?), pipes that to head which outputs the first 10 (by default) lines, and pipes that to awk which prints the second column containing the file path to be added.

But what if you have more than 10 files to add? You can easily pass a -n NUM switch to the head command to increase the number of lines it reads in at a time. I’ll leave it as an exercise to the reader to modify the script to allow a user to pass in what NUM should be.

So when your “svn st” output is filled with un-versioned files, all of which you need, run this little guy and have it done speedily.

More Erlang+Emacs

February 3rd, 2010

I have found that Distel’s built-in shell launcher wasn’t cutting mustard as I needed to start shells with various flags and didn’t see an easy way to accomplish this using what Distel provided. Digging around the Erlang mailing list, I found an elisp function that allowed me to pass flags to the shell. Place this snippet in your .emacs file after you’ve required erlang and distel:

(defun erl-shell-with-flags (flags)
  "Start an erlang shell with flags"
  (interactive (list (read-string "Flags: ")))
  (set 'inferior-erlang-machine-options (split-string flags))
  (erlang-shell))
 
;; map Ctrl-c Ctrl-z to the new function
(global-set-key "\C-c\C-z" 'erl-shell-with-flags)

Now when you start the erlang shell, a “Flags: ” prompt will be presented. Simply add flags as you would on the command line and the shell will start up. Great for when you need multiple shells with different snames, names, cookies, etc…

Connect to remote erlang shell while inside emacs

January 7th, 2010

While developing my top secret project, I have been getting into the fun stuff in Erlang and Emacs. Connecting to a running instance of my app from a remote shell wasn’t straightforward to me at first, so below is my documented way of connecting, as well as dropping into the Erlang JCL from within an Emacs erlang shell.

  1. Start yaws: yaws –daemon -sname appname –conf /path/to/yaws.conf
  2. Start emacs, and from within emacs start an Erlang shell with C-c C-z (assuming you have distel configured).
  3. From the Emacs erlang shell, get into Erlang’s JCL by typing C-q C-g and pressing enter. A ^G will be printed at the prompt, but won’t be evaluated until you press enter. You should see the familiar JCL prompt “User switch command –>”.
  4. Type ‘j’ to see current jobs you have running locally, which is probably just the current shell (1 {shell,start,[init]}).
  5. Type ‘r appname@compy’ to connect to the remote node identified by appname ( from the -sname parameter ) on the computer compy (usually whatever hostname returns)
  6. Type ‘j’ to see current jobs, which should list your current shell as “1 {shell,start,[init]}”, and a second shell “2* {appname@compy,shell,start,[]}”.
  7. Type ‘c 2′ to connect to the remote shell. You can now run commands in the node’s shell. You may have to press enter again to bring up a shell prompt.
james@compy 14:33:34 ~/dev/erlang/app
> yaws --daemon -sname app --conf config/yaws.conf
 
james@compy 14:34:00 ~/dev/erlang/app
> emacs
Eshell V5.7.4  (abort with ^G)
1> ^G
 
User switch command
 --> j
   1* {shell,start,[init]}
 --> r app@compy
 --> j
   1  {shell,start,[init]}
   2* {app@compy,shell,start,[]}
 --> c 2
 
1>

PHP’s json_last_error

December 28th, 2009

A quick note that I hope Google picks up concerning php’s json_last_error function. I was trying to debug a json string I was decoding with json_decode, but was getting NULL. When I tried to use the json_last_error(), a fatal undefined function error was returned. The reason: json_last_error doesn’t exist in php versions < 5.3. Ah, version numbers! So, check your php version if the function is undefined.

Simple, yet a detail easily overlooked.

Posting from Emacs

December 24th, 2009

I am posting this short message from emacs using the weblogger.el package.

Cool!

Erlang, Primes, and Sieves Again (Lazy edition)

December 17th, 2009

I was working on the Project Euler problems, specifically problem 7, and decided to implement a prime number generator based of my previous prime number implementation based off the Melissa O’Neill paper.

The problem says that, given that the 6th prime number is 13, find the 10,001st prime. Since we don’t have a known upper bound to use the primes:queue/1 function, I created a lazy sieve that would return the next prime, its position, and an iterator. The code:

43
44
45
46
47
48
49
50
51
52
53
54
-export([lazy_sieve/0]).
 
lazy_sieve() ->
    Table = insert_prime(2, skew_kv:empty()),
    [ { 2, 1 } | fun() -> lazy_sieve({ 3, 2 }, Table) end ].
 
lazy_sieve({ X, Pos }, Table) ->
    {NextComposite, _Value} = skew_kv:min(Table),
    case  NextComposite =< X of
	true -> lazy_sieve({X+1, Pos}, adjust(Table, X));
	_Else -> [ {X, Pos} | fun() -> lazy_sieve({X+1, Pos+1}, insert_prime(X,Table)) end]
    end.

This can be used thusly:

1>[{Prime1, Pos1} | Next1] = primes:lazy_sieve().
% Prime1 = 2, Pos1 = 1, Next1 = fun()
2>[{Prime2, Pos2} | Next2] = Next1().
% Prime2 = 3, Pos2 = 2, Next2 = fun()
3>[{Prime3, Pos3} | Next3] = Next2().
% Prime3 = 5, Pos3 = 3, Next3 = fun()

This makes defining a function to find the nth prime trivial:

58
59
60
61
62
63
64
65
-export([nth/1]).
 
nth(N) ->
    nth(lazy_sieve(), N).
 
nth([ {Prime, N} | _Next], N) -> Prime;
nth([ {_Prime, _Pos} | Next], N) ->
    nth(Next(), N).

Lazy evaluation twice, in the main functionality and in the value of a key in the queue. And the performance of nth is pretty good too (time is the left number, in mirco-seconds, and the 10,000st prime is the right number):

9> timer:tc(primes, nth, [10001]).
{1635839,104743} % 1.6 seconds
10> timer:tc(primes, queue, [104743]).
{1788997, % 1.7 seconds to generate the full list of primes
 [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,
  79,83,89,97,101,103|...]}

So it takes 1.6 seconds to find the 10,001st prime and 1.7 seconds to generate the list of primes up to the 10,001st prime.

Erlang, Primes, and the Sieve of Eratosthenes

December 16th, 2009

Working through the Project Euler again and using Erlang to do so, there are quite a few problems that deal with primes. It is important, then, to have a library of functions that make generating and validating primes easy. A typical method for generating primes is using the Sieve of Eratosthenes. As discussed in the wiki article, Melissa O’Neill has shown the given Haskell implementation is not a true implementation, and shows a couple versions that are more true to the algorithm and more performant. I took the implementation she described using a priority queue on page 7 of the pdf. I used a skew heap implementation I found as the priority queue, modified it slightly to handle a Key and a Value parameter, and away I went.

Here’s the implementation I came up with:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
%% primes.erl
-module(primes).
 
-export([queue/1]).
 
queue(N) ->
    sieve_queue(lists:seq(2, N)).
 
sieve_queue([]) ->
    [];
sieve_queue([X|XS]) ->
    Table = insert_prime(X, skew_kv:empty()),
    [X | sieve_queue(XS, Table)].
 
insert_prime(P, Table) ->
    skew_kv:insert(P*P, from(P*P, P), Table).
 
sieve_queue([], _Table) ->
    [];
sieve_queue([X|XS], Table) ->
    {NextComposite, _Value} = skew_kv:min(Table),
    case  NextComposite =< X of
        true -> sieve_queue(XS, adjust(Table, X));
        _Else -> [X | sieve_queue(XS, insert_prime(X, Table))]
    end.
 
adjust(Table, X) ->
    {N, [Nprime | NS]} = skew_kv:min(Table),
    case N =< X of
        true ->
            T = skew_kv:delete_min(Table),
            T2 = skew_kv:insert(Nprime, NS(), T),
            adjust(T2, X);
        _Else -> Table
    end.
 
%% from http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:177:khdaceipfabbicmifdhf
%% a lazy list that starts at K and increments by Inc
from(K, Inc) ->
    [K|fun()-> from(K+Inc, Inc) end].

I tried to keep variable names similar to the O’Neill implementation, but I have modified it a bit to use lazy lists, Erlang-style. Not being familiar with Haskell, I think lists there are defined lazily by default, but Erlang needs an explicit construct to do lazy lists. Fortunately, it isn’t terribly hard. So, the code!

I named the function queue because I was implementing prime number generators using other methods from the paper (the unfaithful sieve and a sieve with a map instead of a queue), hence why you need to call the function with:

1> primes:queue(10).

I don’t think I’ll expound on the why, as O’Neill does a nice job explaining it in her paper, so I’ll focus on the how and leave it to the reader to learn the why.

My insert_prime function differs from the O’Neill insertprime because mine has to account for Erlang-style lazy lists. My assumption is that the map call in O’Neill’s creates a lazy list and so all the mappings are not computed, since XS is a potentially huge list of numbers. Instead, I observed that those lists are created by the map call are arithmetic progressions that start at p*p and increase by p. My simple lazy list that accomplishes this is called from/2, which I got from a Joe Armstrong suggestion and adapted to my needs.

Let’s take a quick look at from/2. We pass in a number K and an increment Inc and return a list with K at the head and a fun as the tail. The fun wraps a call to from/2, which acts as the iterator for the lazy list. So evaluating

1> primes:from(2, 2).
[2|#Fun<primes .1.47942561>]
</primes>

Here’s an example showing the iterator in action:

57> [K|F1] = primes:from(2, 2).
[2|#Fun<primes .1.47942561>]
58> [K2|F2] = F1().
[4|#Fun</primes><primes .1.47942561>]
59> [K3|F3] = F2().
[6|#Fun</primes><primes .1.47942561>]
60> 
</primes>

This lazy list is stored in the queue as the value (line 16) for the key P*P. The lazy list is then used in the adjust/2 function when we want to compare the smallest key of the queue to the current number we’re evaluating. Line 28 extracts the K into Nprime, and the fun into NS, from the from/2 call earlier. Line 32 shows how to get the next number in the lazy list, NS(), which returns [K+Inc | fun()], and inserts it as the value into the queue. Again, for the why, consult the O’Neill paper.

Otherwise, the implementation is roughly the same. There are some improvements and optimizations that could be made, I’m sure, but for a rough draft, it is fairly speedy, minimal memory usage, and for Project Euler, it is more than adequate in producing the primes I need to solve the problems.