While working on a script today that had been working, I couldn't for the life of me figure out why it was failing. It uses the PHP curl\* functions to make various requests and processes the results. Turns out when you send a POST body with the CURLOPT\POSTFIELDS and a value field begins with an at symbol(@), you have to escape it (\@). The reason is the at symbol is used by curl to denote a file upload path ("@/path/to/upload.file"). So escape the at symbol and you should be back to good with the curling.

Adding Files To Subversion

Working with symfony, especially when adding to the schema and generating the model, form, and filter classes, it becomes tedious to add each of the new files to your subversion repository. Here's a succinct line to add all un-versioned files to your repo:


svn add `svn st | grep ? | head | awk '{print $2}'`

The key is in the tick-marked section. It takes the output of svn st(atus) and pipes it to grep, selecting only the un-versioned files (denoted by the ?), pipes that to head which outputs the first 10 (by default) lines, and pipes that to awk which prints the second column containing the file path to be added. But what if you have more than 10 files to add? You can easily pass a -n NUM switch to the head command to increase the number of lines it reads in at a time. I'll leave it as an exercise to the reader to modify the script to allow a user to pass in what NUM should be. So when your "svn st" output is filled with un-versioned files, all of which you need, run this little guy and have it done speedily.

More Erlang+Emacs

I have found that Distel's built-in shell launcher wasn't cutting mustard as I needed to start shells with various flags and didn't see an easy way to accomplish this using what Distel provided. Digging around the Erlang mailing list, I found an elisp function that allowed me to pass flags to the shell. Place this snippet in your .emacs file after you've required erlang and distel:

(defun erl-shell-with-flags (flags)
  "Start an erlang shell with flags"
  (interactive (list (read-string "Flags: ")))
  (set 'inferior-erlang-machine-options (split-string flags))

;; map Ctrl-c Ctrl-z to the new function
(global-set-key "\C-c\C-z" 'erl-shell-with-flags)

Now when you start the erlang shell, a "Flags: " prompt will be presented. Simply add flags as you would on the command line and the shell will start up. Great for when you need multiple shells with different snames, names, cookies, etc…

Connect to remote erlang shell while inside emacs

While developing my top secret project, I have been getting into the fun stuff in Erlang and Emacs. Connecting to a running instance of my app from a remote shell wasn't straightforward to me at first, so below is my documented way of connecting, as well as dropping into the Erlang JCL from within an Emacs erlang shell.

  1. Start yaws: yaws –daemon -sname appname –conf /path/to/yaws.conf
  2. Start emacs, and from within emacs start an Erlang shell with C-c C-z (assuming you have distel configured).
  3. From the Emacs erlang shell, get into Erlang's JCL by typing C-q C-g and pressing enter. A \G will be printed at the prompt, but won't be evaluated until you press enter. You should see the familiar JCL prompt "User switch command –>".
  4. Type 'j' to see current jobs you have running locally, which is probably just the current shell (1 {shell,start,[init]}).
  5. Type 'r appname@compy' to connect to the remote node identified by appname ( from the -sname parameter ) on the computer compy (usually whatever hostname returns)
  6. Type 'j' to see current jobs, which should list your current shell as "1 {shell,start,[init]}", and a second shell "2* {appname@compy,shell,start,[]}".
  7. Type 'c 2' to connect to the remote shell. You can now run commands in the node's shell. You may have to press enter again to bring up a shell prompt.
james@compy 14:33:34 ~/dev/erlang/app
> yaws --daemon -sname app --conf config/yaws.conf

james@compy 14:34:00 ~/dev/erlang/app
> emacs
Eshell V5.7.4  (abort with ^G)
1> ^G

User switch command
 --> j
   1* {shell,start,[init]}
 --> r app@compy
 --> j
   1  {shell,start,[init]}
   2* {app@compy,shell,start,[]}
 --> c 2


PHP's json_last_error

A quick note that I hope Google picks up concerning php's json\last\error function. I was trying to debug a json string I was decoding with json\decode, but was getting NULL. When I tried to use the json\last\error(), a fatal undefined function error was returned. The reason: json\last\error doesn't exist in php versions < 5.3. Ah, version numbers! So, check your php version if the function is undefined. Simple, yet a detail easily overlooked.

Erlang, Primes, and Sieves Again (Lazy edition)

I was working on the Project Euler problems, specifically problem 7, and decided to implement a prime number generator based of my previous prime number implementation based off the Melissa O'Neill paper. The problem says that, given that the 6th prime number is 13, find the 10,001st prime. Since we don't have a known upper bound to use the primes:queue/1 function, I created a lazy sieve that would return the next prime, its position, and an iterator. The code:


lazy_sieve() ->
    Table = insert_prime(2, skew_kv:empty()),
    [ { 2, 1 } | fun() -> lazy_sieve({ 3, 2 }, Table) end ].

lazy_sieve({ X, Pos }, Table) ->
    {NextComposite, _Value} = skew_kv:min(Table),
    case  NextComposite =< X of
    true -> lazy_sieve({X+1, Pos}, adjust(Table, X));
    _Else -> [ {X, Pos} | fun() -> lazy_sieve({X+1, Pos+1}, insert_prime(X,Table)) end]

This can be used thusly:

1>[{Prime1, Pos1} | Next1] = primes:lazy_sieve().
% Prime1 = 2, Pos1 = 1, Next1 = fun()
2>[{Prime2, Pos2} | Next2] = Next1().
% Prime2 = 3, Pos2 = 2, Next2 = fun()
3>[{Prime3, Pos3} | Next3] = Next2().
% Prime3 = 5, Pos3 = 3, Next3 = fun()

This makes defining a function to find the nth prime trivial:


nth(N) ->
    nth(lazy_sieve(), N).

nth([ {Prime, N} | _Next], N) -> Prime;
nth([ {_Prime, _Pos} | Next], N) ->
    nth(Next(), N).

Lazy evaluation twice, in the main functionality and in the value of a key in the queue. And the performance of nth is pretty good too (time is the left number, in mirco-seconds, and the 10,000st prime is the right number):

9> timer:tc(primes, nth, [10001]).
{1635839,104743} % 1.6 seconds
10> timer:tc(primes, queue, [104743]).
{1788997, % 1.7 seconds to generate the full list of primes

So it takes 1.6 seconds to find the 10,001st prime and 1.7 seconds to generate the list of primes up to the 10,001st prime.

Erlang, Primes, and the Sieve of Eratosthenes

Working through the Project Euler again and using Erlang to do so, there are quite a few problems that deal with primes. It is important, then, to have a library of functions that make generating and validating primes easy. A typical method for generating primes is using the Sieve of Eratosthenes. As discussed in the wiki article, Melissa O'Neill has shown the given Haskell implementation is not a true implementation, and shows a couple versions that are more true to the algorithm and more performant. I took the implementation she described using a priority queue on page 7 of the pdf. I used a skew heap implementation I found as the priority queue, modified it slightly to handle a Key and a Value parameter, and away I went. Here's the implementation I came up with:

%% primes.erl


queue(N) ->
    sieve_queue(lists:seq(2, N)).

sieve_queue([]) ->
sieve_queue([X|XS]) ->
    Table = insert_prime(X, skew_kv:empty()),
    [X | sieve_queue(XS, Table)].

insert_prime(P, Table) ->
    skew_kv:insert(P*P, from(P*P, P), Table).

sieve_queue([], _Table) ->
sieve_queue([X|XS], Table) ->
    {NextComposite, _Value} = skew_kv:min(Table),
    case  NextComposite =< X of
        true -> sieve_queue(XS, adjust(Table, X));
        _Else -> [X | sieve_queue(XS, insert_prime(X, Table))]

adjust(Table, X) ->
    {N, [Nprime | NS]} = skew_kv:min(Table),
    case N =< X of
        true ->
            T = skew_kv:delete_min(Table),
            T2 = skew_kv:insert(Nprime, NS(), T),
            adjust(T2, X);
        _Else -> Table

%% from
%% a lazy list that starts at K and increments by Inc
from(K, Inc) ->
    [K|fun()-> from(K+Inc, Inc) end].

I tried to keep variable names similar to the O'Neill implementation, but I have modified it a bit to use lazy lists, Erlang-style. Not being familiar with Haskell, I think lists there are defined lazily by default, but Erlang needs an explicit construct to do lazy lists. Fortunately, it isn't terribly hard. So, the code! I named the function queue because I was implementing prime number generators using other methods from the paper (the unfaithful sieve and a sieve with a map instead of a queue), hence why you need to call the function with:

1> primes:queue(10).

I don't think I'll expound on the why, as O'Neill does a nice job explaining it in her paper, so I'll focus on the how and leave it to the reader to learn the why. My insert\prime function differs from the O'Neill insertprime because mine has to account for Erlang-style lazy lists. My assumption is that the map call in O'Neill's creates a lazy list and so all the mappings are not computed, since XS is a potentially huge list of numbers. Instead, I observed that those lists are created by the map call are arithmetic progressions that start at p*p and increase by p. My simple lazy list that accomplishes this is called from/2, which I got from a Joe Armstrong suggestion and adapted to my needs. Let's take a quick look at from/2. We pass in a number K and an increment Inc and return a list with K at the head and a fun as the tail. The fun wraps a call to from/2, which acts as the iterator for the lazy list. So evaluating

1> primes:from(2, 2).

Here's an example showing the iterator in action:

57> [K|F1] = primes:from(2, 2).
58> [K2|F2] = F1().
59> [K3|F3] = F2().

This lazy list is stored in the queue as the value (line 16) for the key P*P. The lazy list is then used in the adjust/2 function when we want to compare the smallest key of the queue to the current number we're evaluating. Line 28 extracts the K into Nprime, and the fun into NS, from the from/2 call earlier. Line 32 shows how to get the next number in the lazy list, NS(), which returns [K+Inc | fun()], and inserts it as the value into the queue. Again, for the why, consult the O'Neill paper. Otherwise, the implementation is roughly the same. There are some improvements and optimizations that could be made, I'm sure, but for a rough draft, it is fairly speedy, minimal memory usage, and for Project Euler, it is more than adequate in producing the primes I need to solve the problems.

Red-black Trees in Erlang

Working through Chris Okasaki's "Purely Functional Data Structures", I found that I couldn't find an Erlang version of the red-black tree implementation Chris shows on page 28. The code, also available as a pastie and download (with more comments of mine) .erl and .hrl:

%% redblack.hrl
-record(node, {color, data, left, right}).

%% redblack.erl
%% Adapted from Okasaki "Purely Functional Data Structures" p. 28
member(_X, undefined) ->
member(X, #node{left=A, data=Y}) when X < Y ->
    member(X, A);
member(X, #node{data=Y, right=B}) when X > Y ->
    member(X, B);
member(_X, _S) ->

insert(X, undefined) ->
    #node{color=black, data=X};
insert(X, S) ->
    % to do recursive anonymous functions, pass the created fun in as 2nd parameter
    Ins = fun(undefined, _F) ->
                  #node{color=red, data=Data}; % insert the new data as a red node
             (#node{color=Color, left=A, data=Y, right=B}, F) when X < Y->
                  balance(Color, F(A, F), Y, B);
             (#node{color=Color, left=A, data=Y, right=B}, F) when X > Y->
                  balance(Color, A, Y, F(B, F));
             (Node, _F) ->
    #node{left=A, data=Y, right=B} = Ins(S, Ins),
    #node{color=black, left=A, data=Y, right=B}.

%% detect Black->Red->Red Patterns and balance the situation
balance(black, #node{color=red, left=#node{color=red, left=A, data=X, right=B}, data=Y, right=C}, Z, D) ->
    #node{color=red, left=#node{color=black, left=A, data=X, right=B}, data=Y, right=#node{color=black, left=C, data=Z, right=D}};
balance(black, #node{color=red, left=A, data=X, right=#node{color=red, left=B, data=Y, right=C}}, Z, D) ->
    #node{color=red, left=#node{color=black, left=A, data=X, right=B }, data=Y, right=#node{color=black, left=C, data=Z, right=D}};
balance(black, A, X, #node{color=red, left=#node{color=red, left=B, data=Y, right=C}, data=Z, right=D}) ->
    #node{color=red, left=#node{color=black, left=A, data=X, right=B }, data=Y, right=#node{color=black, left=C, data=Z, right=D}};
balance(black, A, X, #node{color=red, left=B, data=Y, right=#node{color=red, left=C, data=Z, right=D}}) ->
    #node{color=red, left=#node{color=black, left=A, data=X, right=B }, data=Y, right=#node{color=black, left=C, data=Z, right=D}};
balance(Color, Left, Data, Right) ->
    #node{color=Color, left=Left, data=Data, right=Right}.

So, I'm going to go through this, mainly to prove to myself that I kinda get it, and because Ben will complain I didn't dumb it down enough for him. Line 2 defines the node structure in an Erlang header file. Comparable to a struct in C. Member Lines 6-13 define the member function. Each member() -> … is a function clause, essentially a case statement on steroids. The first clause says if the tree is undefined, then whatever X is, it is not a member of undefined, so return false. The second clause unpacks the current node ( #node{left=A, data=Y} ) and assigns the values of left and data to A and Y, respectively (I have tried to maintain Chris's variable names as much as possible, though I think Erlang's record syntax complements the Standard ML syntax found in the book). The clause then compares Y, the node's data, to the query X, and if X < Y, checks the left branch (the variable A) of the node. The third clause is nearly identical, except it handles when X > Y, and checks the right branch (B) for X's existence. The fourth and final clause handles a match (because when X is not greater than or less than Y, it must be equal), returning true. Simple enough to follow and grok. Insertion When I read several different articles online concerning red-black trees, the insertion routine sounded quite complex to me. I was quite pleased when I read the algorithm in the book, as it crystallized my thoughts and allowed me to grok the action. The first clause of the insert function handles the empty (or undefined) tree, creating and returning a black node (since the root of the tree must be black). The second clause takes the new data X and a tree S and creates a new function, stored in the var Ins. Now, calling an anonymous function recursively is tricky, and I will explain why this part varies slightly from the text. Since there's no way to refer to the function (as it isn't named), we have to pass the anonymous function as a second parameter to the anonymous function, which is what happens on line 28. So when you see F(A, F), mentally think Ins(A). The first parameter to Ins() is our current node in the tree. If it is undefined, we insert our new data as a red node. Why? That is discussed at length elsewhere, but it boils down to being easier to fix a red violation than a black violation (look up the constraints a red-black tree has on it, and the violations will make more sense). The second clause unpacks the node and makes our less than/greater than comparison like a normal BST. What's different is that we call the appropriate recursive insert call (remember F(A, F)? This inserts X somewhere into the left side of Node, and F(B, F) inserts X somewhere into the right side), and pass the result to the balance function. The final clause handles a match between X and Y and returns the Node and its branches unmodified. Balance Why do we need a balance function? In case our data being inserted into the tree is more ordered than we may have expected. A vanilla BST with inputs [3,2,1] will be a glorified list. We force the situation by inserting a red node in the tree and causing a situation where the parent of the newly inserted red node is also a red node. This violates the constraint that a red node's children must be black nodes. There are four possible combinations of this violation:

  1. (B)-Left->(Red)-Left->(Red)
  2. (B)-Left->(Red)-Right->(Red)
  3. (B)-Right->(Red)-Left->(Red)
  4. (B)-Right->(Red)-Right->(Red)

Each of those possibilities maps to the first four clauses of the balance function. The fifth clause passes a node back unchanged. So let's look at the first clause. This clause is activated when a black node's left child is red, and that red node's left child is also red.

     /   \
   R2     *
  /   \
R1     *

So we unpack the three nodes and return the three nodes rotated to remove the red violation.

Variable Meaning Value
A R1's Left node undefined
X R1's Data 1
B R1's Right node undefined
Y R2's Data R2
C R2's Right node undefined
Z B3's Data 3
D B3's Right node undefined

Unpacking from the pattern match, we then pack up a new tree structure according to the clause's definition:

#node{color=red, left=#node{color=black, left=A, data=X, right=B}, data=Y, right=#node{color=black, left=C, data=Z, right=D}};

This essentially replaces the black node having a red child and red grandchild with a red node that has two black nodes for children. The next illustration shows the result of the balance call:

  /   \
B1    B3

Now, the bubbling up of the red node may cause a red violation with the parent and grandparent of R2, which will be handled as the recursion of Ins() unwinds. When the initial Ins() ends (line 28), the resulting node is the root of the new tree, which we then unpack into the data, left, and right fields. We then force the root node to be black by creating the root node with the unpacked parts, forcing color=black. So there you have it. Simplified, with a lot of things glossed over/ignored, but this is my port of the red-black code listing on page 28. Looking forward to progressing through the book more and increasing my data structure- and Erlang-fu. Further fun: Read Chris's reflections 10 years after publishing his book.

Quiero decir español mejor

I want to learn Spanish, and while I wish I could just move somewhere and be immersed, that is not feasible at this juncture. So I am reaching out to the open source world to help me along. The first step was to get a flashcard-like system that would provide me with a way to test my vocabulary and keep track of my progress. To that end, I have installed Mnemosyne. Read all about it because it is a pretty groovy program for flashcard learning. I then downloaded and installed the four Spanish card packages available through the site. As you can see on the left, the site has many languages supported, as well as a variety of other topics to study (including Swedish road signs). This gives me my base of vocabulary and expressions. Second on the list of tools is the Firefox plugin ImTranslator. This is a great plugin for doing on the fly translations. So when I browse a site in Spanish, or come across something I want to know how to say in Spanish, this plugin gets me going in the right direction. Using the default media player in Ubuntu, Rhythmbox and its ability to play online radio stations, I've searched for and subscribed to several Spanish music and news sites. So now I get to hear music, news, and commercials in several different Spanish cities (Madrid and Barcelona stations currently, though I plan to get some Central and South American stations since that's where I'll likely travel to first). And, for some extra vocab, I signed up with the Spanish Word-a-Day mailing list. I've really enjoyed the emails as they have a word, pronunciation guide, synonyms, the word in a sentence, and usually some Spanish trivia, like a joke, expression, or conjugation table for a verb/tense. The most important piece, however, is actually conversing with native speakers, and I am lucky to have a Mexican restaurant across the street with several native speakers who I've gotten to know. Any other tools you suggest? I think my next one will be finding folks who speak both English and Spanish and chatting with them via GChat or Skype.