More Erlang+Emacs

February 3rd, 2010

I have found that Distel’s built-in shell launcher wasn’t cutting mustard as I needed to start shells with various flags and didn’t see an easy way to accomplish this using what Distel provided. Digging around the Erlang mailing list, I found an elisp function that allowed me to pass flags to the shell. Place this snippet in your .emacs file after you’ve required erlang and distel:

(defun erl-shell-with-flags (flags)
  "Start an erlang shell with flags"
  (interactive (list (read-string "Flags: ")))
  (set 'inferior-erlang-machine-options (split-string flags))
  (erlang-shell))
 
;; map Ctrl-c Ctrl-z to the new function
(global-set-key "\C-c\C-z" 'erl-shell-with-flags)

Now when you start the erlang shell, a “Flags: ” prompt will be presented. Simply add flags as you would on the command line and the shell will start up. Great for when you need multiple shells with different snames, names, cookies, etc…

Connect to remote erlang shell while inside emacs

January 7th, 2010

While developing my top secret project, I have been getting into the fun stuff in Erlang and Emacs. Connecting to a running instance of my app from a remote shell wasn’t straightforward to me at first, so below is my documented way of connecting, as well as dropping into the Erlang JCL from within an Emacs erlang shell.

  1. Start yaws: yaws –daemon -sname appname –conf /path/to/yaws.conf
  2. Start emacs, and from within emacs start an Erlang shell with C-c C-z (assuming you have distel configured).
  3. From the Emacs erlang shell, get into Erlang’s JCL by typing C-q C-g and pressing enter. A ^G will be printed at the prompt, but won’t be evaluated until you press enter. You should see the familiar JCL prompt “User switch command –>”.
  4. Type ‘j’ to see current jobs you have running locally, which is probably just the current shell (1 {shell,start,[init]}).
  5. Type ‘r appname@compy’ to connect to the remote node identified by appname ( from the -sname parameter ) on the computer compy (usually whatever hostname returns)
  6. Type ‘j’ to see current jobs, which should list your current shell as “1 {shell,start,[init]}”, and a second shell “2* {appname@compy,shell,start,[]}”.
  7. Type ‘c 2′ to connect to the remote shell. You can now run commands in the node’s shell. You may have to press enter again to bring up a shell prompt.
james@compy 14:33:34 ~/dev/erlang/app
> yaws --daemon -sname app --conf config/yaws.conf
 
james@compy 14:34:00 ~/dev/erlang/app
> emacs
Eshell V5.7.4  (abort with ^G)
1> ^G
 
User switch command
 --> j
   1* {shell,start,[init]}
 --> r app@compy
 --> j
   1  {shell,start,[init]}
   2* {app@compy,shell,start,[]}
 --> c 2
 
1>

PHP’s json_last_error

December 28th, 2009

A quick note that I hope Google picks up concerning php’s json_last_error function. I was trying to debug a json string I was decoding with json_decode, but was getting NULL. When I tried to use the json_last_error(), a fatal undefined function error was returned. The reason: json_last_error doesn’t exist in php versions < 5.3. Ah, version numbers! So, check your php version if the function is undefined.

Simple, yet a detail easily overlooked.

Posting from Emacs

December 24th, 2009

I am posting this short message from emacs using the weblogger.el package.

Cool!

Erlang, Primes, and Sieves Again (Lazy edition)

December 17th, 2009

I was working on the Project Euler problems, specifically problem 7, and decided to implement a prime number generator based of my previous prime number implementation based off the Melissa O’Neill paper.

The problem says that, given that the 6th prime number is 13, find the 10,001st prime. Since we don’t have a known upper bound to use the primes:queue/1 function, I created a lazy sieve that would return the next prime, its position, and an iterator. The code:

43
44
45
46
47
48
49
50
51
52
53
54
-export([lazy_sieve/0]).
 
lazy_sieve() ->
    Table = insert_prime(2, skew_kv:empty()),
    [ { 2, 1 } | fun() -> lazy_sieve({ 3, 2 }, Table) end ].
 
lazy_sieve({ X, Pos }, Table) ->
    {NextComposite, _Value} = skew_kv:min(Table),
    case  NextComposite =< X of
	true -> lazy_sieve({X+1, Pos}, adjust(Table, X));
	_Else -> [ {X, Pos} | fun() -> lazy_sieve({X+1, Pos+1}, insert_prime(X,Table)) end]
    end.

This can be used thusly:

1>[{Prime1, Pos1} | Next1] = primes:lazy_sieve().
% Prime1 = 2, Pos1 = 1, Next1 = fun()
2>[{Prime2, Pos2} | Next2] = Next1().
% Prime2 = 3, Pos2 = 2, Next2 = fun()
3>[{Prime3, Pos3} | Next3] = Next2().
% Prime3 = 5, Pos3 = 3, Next3 = fun()

This makes defining a function to find the nth prime trivial:

58
59
60
61
62
63
64
65
-export([nth/1]).
 
nth(N) ->
    nth(lazy_sieve(), N).
 
nth([ {Prime, N} | _Next], N) -> Prime;
nth([ {_Prime, _Pos} | Next], N) ->
    nth(Next(), N).

Lazy evaluation twice, in the main functionality and in the value of a key in the queue. And the performance of nth is pretty good too (time is the left number, in mirco-seconds, and the 10,000st prime is the right number):

9> timer:tc(primes, nth, [10001]).
{1635839,104743} % 1.6 seconds
10> timer:tc(primes, queue, [104743]).
{1788997, % 1.7 seconds to generate the full list of primes
 [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,
  79,83,89,97,101,103|...]}

So it takes 1.6 seconds to find the 10,001st prime and 1.7 seconds to generate the list of primes up to the 10,001st prime.

Erlang, Primes, and the Sieve of Eratosthenes

December 16th, 2009

Working through the Project Euler again and using Erlang to do so, there are quite a few problems that deal with primes. It is important, then, to have a library of functions that make generating and validating primes easy. A typical method for generating primes is using the Sieve of Eratosthenes. As discussed in the wiki article, Melissa O’Neill has shown the given Haskell implementation is not a true implementation, and shows a couple versions that are more true to the algorithm and more performant. I took the implementation she described using a priority queue on page 7 of the pdf. I used a skew heap implementation I found as the priority queue, modified it slightly to handle a Key and a Value parameter, and away I went.

Here’s the implementation I came up with:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
%% primes.erl
-module(primes).
 
-export([queue/1]).
 
queue(N) ->
    sieve_queue(lists:seq(2, N)).
 
sieve_queue([]) ->
    [];
sieve_queue([X|XS]) ->
    Table = insert_prime(X, skew_kv:empty()),
    [X | sieve_queue(XS, Table)].
 
insert_prime(P, Table) ->
    skew_kv:insert(P*P, from(P*P, P), Table).
 
sieve_queue([], _Table) ->
    [];
sieve_queue([X|XS], Table) ->
    {NextComposite, _Value} = skew_kv:min(Table),
    case  NextComposite =< X of
        true -> sieve_queue(XS, adjust(Table, X));
        _Else -> [X | sieve_queue(XS, insert_prime(X, Table))]
    end.
 
adjust(Table, X) ->
    {N, [Nprime | NS]} = skew_kv:min(Table),
    case N =< X of
        true ->
            T = skew_kv:delete_min(Table),
            T2 = skew_kv:insert(Nprime, NS(), T),
            adjust(T2, X);
        _Else -> Table
    end.
 
%% from http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:177:khdaceipfabbicmifdhf
%% a lazy list that starts at K and increments by Inc
from(K, Inc) ->
    [K|fun()-> from(K+Inc, Inc) end].

I tried to keep variable names similar to the O’Neill implementation, but I have modified it a bit to use lazy lists, Erlang-style. Not being familiar with Haskell, I think lists there are defined lazily by default, but Erlang needs an explicit construct to do lazy lists. Fortunately, it isn’t terribly hard. So, the code!

I named the function queue because I was implementing prime number generators using other methods from the paper (the unfaithful sieve and a sieve with a map instead of a queue), hence why you need to call the function with:

1> primes:queue(10).

I don’t think I’ll expound on the why, as O’Neill does a nice job explaining it in her paper, so I’ll focus on the how and leave it to the reader to learn the why.

My insert_prime function differs from the O’Neill insertprime because mine has to account for Erlang-style lazy lists. My assumption is that the map call in O’Neill’s creates a lazy list and so all the mappings are not computed, since XS is a potentially huge list of numbers. Instead, I observed that those lists are created by the map call are arithmetic progressions that start at p*p and increase by p. My simple lazy list that accomplishes this is called from/2, which I got from a Joe Armstrong suggestion and adapted to my needs.

Let’s take a quick look at from/2. We pass in a number K and an increment Inc and return a list with K at the head and a fun as the tail. The fun wraps a call to from/2, which acts as the iterator for the lazy list. So evaluating

1> primes:from(2, 2).
[2|#Fun<primes .1.47942561>]
</primes>

Here’s an example showing the iterator in action:

57> [K|F1] = primes:from(2, 2).
[2|#Fun<primes .1.47942561>]
58> [K2|F2] = F1().
[4|#Fun</primes><primes .1.47942561>]
59> [K3|F3] = F2().
[6|#Fun</primes><primes .1.47942561>]
60> 
</primes>

This lazy list is stored in the queue as the value (line 16) for the key P*P. The lazy list is then used in the adjust/2 function when we want to compare the smallest key of the queue to the current number we’re evaluating. Line 28 extracts the K into Nprime, and the fun into NS, from the from/2 call earlier. Line 32 shows how to get the next number in the lazy list, NS(), which returns [K+Inc | fun()], and inserts it as the value into the queue. Again, for the why, consult the O’Neill paper.

Otherwise, the implementation is roughly the same. There are some improvements and optimizations that could be made, I’m sure, but for a rough draft, it is fairly speedy, minimal memory usage, and for Project Euler, it is more than adequate in producing the primes I need to solve the problems.

Red-black Trees in Erlang

December 1st, 2009

Working through Chris Okasaki’s “Purely Functional Data Structures“, I found that I couldn’t find an Erlang version of the red-black tree implementation Chris shows on page 28.

The code, also available as a pastie and download (with more comments of mine) .erl and .hrl:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
%% redblack.hrl
-record(node, {color, data, left, right}).
 
%% redblack.erl
%% Adapted from Okasaki "Purely Functional Data Structures" p. 28
member(_X, undefined) ->
    false;
member(X, #node{left=A, data=Y}) when X < Y ->
    member(X, A);
member(X, #node{data=Y, right=B}) when X > Y ->
    member(X, B);
member(_X, _S) ->
    true.
 
insert(X, undefined) ->
    #node{color=black, data=X};
insert(X, S) ->
    % to do recursive anonymous functions, pass the created fun in as 2nd parameter
    Ins = fun(undefined, _F) ->
                  #node{color=red, data=Data}; % insert the new data as a red node
             (#node{color=Color, left=A, data=Y, right=B}, F) when X < Y->
                  balance(Color, F(A, F), Y, B);
             (#node{color=Color, left=A, data=Y, right=B}, F) when X > Y->
                  balance(Color, A, Y, F(B, F));
             (Node, _F) ->
                  Node
          end,
    #node{left=A, data=Y, right=B} = Ins(S, Ins),
    #node{color=black, left=A, data=Y, right=B}.
 
%% detect Black->Red->Red Patterns and balance the situation
balance(black, #node{color=red, left=#node{color=red, left=A, data=X, right=B}, data=Y, right=C}, Z, D) ->
    #node{color=red, left=#node{color=black, left=A, data=X, right=B}, data=Y, right=#node{color=black, left=C, data=Z, right=D}};
balance(black, #node{color=red, left=A, data=X, right=#node{color=red, left=B, data=Y, right=C}}, Z, D) ->
    #node{color=red, left=#node{color=black, left=A, data=X, right=B }, data=Y, right=#node{color=black, left=C, data=Z, right=D}};
balance(black, A, X, #node{color=red, left=#node{color=red, left=B, data=Y, right=C}, data=Z, right=D}) ->
    #node{color=red, left=#node{color=black, left=A, data=X, right=B }, data=Y, right=#node{color=black, left=C, data=Z, right=D}};
balance(black, A, X, #node{color=red, left=B, data=Y, right=#node{color=red, left=C, data=Z, right=D}}) ->
    #node{color=red, left=#node{color=black, left=A, data=X, right=B }, data=Y, right=#node{color=black, left=C, data=Z, right=D}};
balance(Color, Left, Data, Right) ->
    #node{color=Color, left=Left, data=Data, right=Right}.

So, I’m going to go through this, mainly to prove to myself that I kinda get it, and because Ben will complain I didn’t dumb it down enough for him.

Line 2 defines the node structure in an Erlang header file. Comparable to a struct in C.

Member

Lines 6-13 define the member function. Each member() -> … is a function clause, essentially a case statement on steroids. The first clause says if the tree is undefined, then whatever X is, it is not a member of undefined, so return false.
The second clause unpacks the current node ( #node{left=A, data=Y} ) and assigns the values of left and data to A and Y, respectively (I have tried to maintain Chris’s variable names as much as possible, though I think Erlang’s record syntax complements the Standard ML syntax found in the book). The clause then compares Y, the node’s data, to the query X, and if X < Y, checks the left branch (the variable A) of the node.
The third clause is nearly identical, except it handles when X > Y, and checks the right branch (B) for X’s existence.
The fourth and final clause handles a match (because when X is not greater than or less than Y, it must be equal), returning true. Simple enough to follow and grok.

Insertion

When I read several different articles online concerning red-black trees, the insertion routine sounded quite complex to me. I was quite pleased when I read the algorithm in the book, as it crystallized my thoughts and allowed me to grok the action.

The first clause of the insert function handles the empty (or undefined) tree, creating and returning a black node (since the root of the tree must be black).
The second clause takes the new data X and a tree S and creates a new function, stored in the var Ins. Now, calling an anonymous function recursively is tricky, and I will explain why this part varies slightly from the text. Since there’s no way to refer to the function (as it isn’t named), we have to pass the anonymous function as a second parameter to the anonymous function, which is what happens on line 28. So when you see F(A, F), mentally think Ins(A).
The first parameter to Ins() is our current node in the tree. If it is undefined, we insert our new data as a red node. Why? That is discussed at length elsewhere, but it boils down to being easier to fix a red violation than a black violation (look up the constraints a red-black tree has on it, and the violations will make more sense).
The second clause unpacks the node and makes our less than/greater than comparison like a normal BST. What’s different is that we call the appropriate recursive insert call (remember F(A, F)? This inserts X somewhere into the left side of Node, and F(B, F) inserts X somewhere into the right side), and pass the result to the balance function.
The final clause handles a match between X and Y and returns the Node and its branches unmodified.

Balance

Why do we need a balance function? In case our data being inserted into the tree is more ordered than we may have expected. A vanilla BST with inputs [3,2,1] will be a glorified list. We force the situation by inserting a red node in the tree and causing a situation where the parent of the newly inserted red node is also a red node. This violates the constraint that a red node’s children must be black nodes. There are four possible combinations of this violation:

  1. (B)-Left->(Red)-Left->(Red)
  2. (B)-Left->(Red)-Right->(Red)
  3. (B)-Right->(Red)-Left->(Red)
  4. (B)-Right->(Red)-Right->(Red)

Each of those possibilities maps to the first four clauses of the balance function. The fifth clause passes a node back unchanged. So let’s look at the first clause. This clause is activated when a black node’s left child is red, and that red node’s left child is also red.

      B3
     /   \
   R2     *
  /   \
R1     *

So we unpack the three nodes and return the three nodes rotated to remove the red violation.

Variable Meaning Value
A R1’s Left node undefined
X R1’s Data 1
B R1’s Right node undefined
Y R2’s Data R2
C R2’s Right node undefined
Z B3’s Data 3
D B3’s Right node undefined

Unpacking from the pattern match, we then pack up a new tree structure according to the clause’s definition:

33
#node{color=red, left=#node{color=black, left=A, data=X, right=B}, data=Y, right=#node{color=black, left=C, data=Z, right=D}};

This essentially replaces the black node having a red child and red grandchild with a red node that has two black nodes for children. The next illustration shows the result of the balance call:

      R2
     /   \
   B1    B3

Now, the bubbling up of the red node may cause a red violation with the parent and grandparent of R2, which will be handled as the recursion of Ins() unwinds.

When the initial Ins() ends (line 28), the resulting node is the root of the new tree, which we then unpack into the data, left, and right fields. We then force the root node to be black by creating the root node with the unpacked parts, forcing color=black.

So there you have it. Simplified, with a lot of things glossed over/ignored, but this is my port of the red-black code listing on page 28. Looking forward to progressing through the book more and increasing my data structure- and Erlang-fu.

Further fun: Read Chris’s reflections 10 years after publishing his book.

Quiero decir español mejor

November 3rd, 2009

I want to learn Spanish, and while I wish I could just move somewhere and be immersed, that is not feasible at this juncture. So I am reaching out to the open source world to help me along.

The first step was to get a flashcard-like system that would provide me with a way to test my vocabulary and keep track of my progress. To that end, I have installed Mnemosyne. Read all about it because it is a pretty groovy program for flashcard learning. I then downloaded and installed the four Spanish card packages available through the site. As you can see on the left, the site has many languages supported, as well as a variety of other topics to study (including Swedish road signs).

This gives me my base of vocabulary and expressions. Second on the list of tools is the Firefox plugin ImTranslator. This is a great plugin for doing on the fly translations. So when I browse a site in Spanish, or come across something I want to know how to say in Spanish, this plugin gets me going in the right direction.

Using the default media player in Ubuntu, Rhythmbox and its ability to play online radio stations, I’ve searched for and subscribed to several Spanish music and news sites. So now I get to hear music, news, and commercials in several different Spanish cities (Madrid and Barcelona stations currently, though I plan to get some Central and South American stations since that’s where I’ll likely travel to first).

And, for some extra vocab, I signed up with the Spanish Word-a-Day mailing list. I’ve really enjoyed the emails as they have a word, pronunciation guide, synonyms, the word in a sentence, and usually some Spanish trivia, like a joke, expression, or conjugation table for a verb/tense.

The most important piece, however, is actually conversing with native speakers, and I am lucky to have a Mexican restaurant across the street with several native speakers who I’ve gotten to know.

Any other tools you suggest? I think my next one will be finding folks who speak both English and Spanish and chatting with them via GChat or Skype.

Symfony/Propel Memory Issues

October 6th, 2009

I have a bulk import process that needs running nightly. Currently there are around 4,200 “rows” to process, which actually can encompass many tables so row is not entirely appropriate. The problem is that the script poops out after ~200 “rows” with memory limit errors. While increasing the memory limit is do-able, I am not interested in that as a solution currently.

First, I recorded some numbers to benchmark where the script was in memory consumption. The first number is the “row”, the peak is how much memory the script has allocated, and the mem is the delta in memory allocation per row.

0 peak: 41,103,408 mem: 29,884,416
1 peak: 42,440,264 mem: 1,310,720
2 peak: 43,613,848 mem: 1,310,720
3 peak: 43,893,960 mem: 262,144
4 peak: 44,223,040 mem: 262,144
5 peak: 44,896,296 mem: 786,432
6 peak: 45,671,560 mem: 786,432
7 peak: 45,865,952 mem: 0
8 peak: 46,418,272 mem: 786,432
9 peak: 47,917,888 mem: 1,310,720
10 peak: 48,566,312 mem: 786,432

After the initialization phase, memory allocation increases fairly steadily and it is not long until the 128M memory limit is reached. This is unacceptable as I know some “rows” should be much closer to 0 as nothing is imported on the majority of rows.

My first solution was to disable logging:

sfConfig::set('sf_logging_enabled', FALSE);

The initial memory allocation was decreased, but the running deltas remained higher than expected.

Second, I inserted a ton of unset() calls in the various functions. This dropped my deltas a little:

0 peak: 30,611,472 mem: 20,971,520
1 peak: 31,969,760 mem: 1,310,720
2 peak: 32,180,112 mem: 262,144
3 peak: 32,333,216 mem: 262,144
4 peak: 32,525,144 mem: 262,144
5 peak: 32,656,616 mem: 0
6 peak: 32,863,736 mem: 262,144
7 peak: 33,103,264 mem: 262,144
8 peak: 33,455,544 mem: 262,144
9 peak: 33,754,288 mem: 262,144
10 peak: 33,984,976 mem: 262,144

But allocation still killed the import before it could complete. Browsing through various sites, I discovered Propel had a hard time cleaning up circular references, which meant PHP couldn’t garbage-collect that memory. However, to combat this, Propel 1.3 offers a static method disableInstancePooling that allowed me to override Propel’s desire to keep instances around.

Adding:

Propel::disableInstancePooling();

to the beginning of the import gave me these results:

0 peak: 26,569,632 mem: 17,301,504
1 peak: 28,582,152 mem: 2,097,152
2 peak: 30,455,352 mem: 1,835,008
3 peak: 30,536,176 mem: 0
4 peak: 30,536,176 mem: 0
5 peak: 31,517,088 mem: 1,048,576
6 peak: 31,534,152 mem: 0
7 peak: 31,552,120 mem: 0
8 peak: 31,589,632 mem: 0
9 peak: 31,695,504 mem: 0
10 peak: 31,695,504 mem: 0

Now new memory was allocating only when the import was actually doing something of significance. In fact, watching the import proceed with the deltas displayed, I could observe the memory decreasing at times, prolonging the life of the script by orders of magnitude. Whereas before the script was processing ~200 “rows”, it currently processes the whole batch (4,237 “rows” currently) in one go.

As the importable “rows” increase, I know I won’t be butting up against memory limits for some time.

Letter to Representative Goodlatte Concerning the Commerce Clause

September 17th, 2009

A letter to Congressman Goodlatte, concerning the invasive way the Commerce Clause is invoked to pry into the private lives of Americans.

Congressman Goodlatte,

I would like to bring to your attention the cases of Wickard v Filburn (http://en.wikipedia.org/wiki/Wickard_v._Filburn ) and Gonzales v Raich (http://en.wikipedia.org/wiki/Gonzales_v._Raich), wherein the Supreme Court found “that Congress can regulate purely INTRASTATE activity that is not itself “commercial,” in that it is not produced for sale, if it concludes that failure to regulate that class of activity would undercut the regulation of the interstate market in that commodity” (emphasis mine).

Congress has seemingly confused economics with commerce. Setting prices is not Congress’ job. Setting quotas, to be met or not exceeded, is not Congress’ job. Telling an American what they can and cannot produce, and in what quantities, for personal use, is not the job of Congress!

Taking the Filburn case further, one can see it being applied to any number of self-sufficiency initiatives one may follow, like home gardening or personal solar/wind/hydro power.

As we can see with HR 875 and S 425, regulating home gardening is already being proposed. I am glad to see that no Republicans are supporting it, but 41 Democrats are (HR 875, that is).

As home power generation technology becomes more affordable and effective at displacing power drawn off the grid, how long until power companies join together to have Congress regulate how much energy a person can generate locally?

Congressman, this trend worries me as I’m hoping it worries you. We need strongly worded legislation that reigns in Congress’ ability to use the Commerce Clause as carte blanche to stick it’s nose in the economy. Congress has the power to regulate commerce between the Several States, NOT in the private lives of Americans, no matter how much Congress “thinks” it may affect the national economy.

Please respond by letting me know how you are personally going to reign in Congress’ power (something any true Republican would actually do, even though it reigns in their own “power”). I would like to work with you on drafting this legislation so that it protects the individual’s ability to take care of themselves and their families without Congress influencing those decisions.

Sincerely,

James Aimonetti