Archive for the ‘News’ Category

Erlang, Euler, Primes, and gen_server

Friday, March 26th, 2010

I have been working on Project Euler problems for a while now and many of them have centered around prime numbers. I’ve referenced my work with the sieve in other posts but found with a particular problem that some of my functions could benefit from some state being saved (namely the sieve being saved and not re-computed each time).

The problem called for counting the prime factors of numbers and finding consecutive numbers that had the same count of prime factors. My primes module had a prime_factors/1 function that would compute the prime factors and the exponents of those factors (so 644 = 2^2 * 7 * 23, and primes:prime_factors(644) would return [{2,2},{7,1},{23,1}]). The prime_factors/1 function looked something like this:

prime_factors(N) ->
    CandidatePrimes = primes:queue(N div 2),
    PrimeFactors = [ X || X <- CandidatePrimes, N rem X =:= 0 ],
    find_factors(PrimeFactors, N, 0, []).

The call to find_factors/4 takes the factors and finds the exponents and can be ignored for now. The time sink, then, is in generating the CandidatePrimes list. I think my primes:queue/1 function is pretty fast at generating the sieve, and dividing N by two eliminates a lot of unnecessary computation, but when you’re calling prime_factors/1 thousands of times, the calls to queue/1 begin to add up. This is where I needed to save some state (the sieve) between calls. Erlang, fortunately enough, has a module behavior called gen_server that abstracts away a lot of the server internals and lets you focus on the business bits of the server. I won’t discuss it much here as I’m not an authority on it, but Joe Armstrong’s book and the Erlang docs have been a great help in understanding what’s happening behind the scenes. You can view the prime_server module to see what its current state is code-wise.

To speed up prime_factors/1, I split it into two functions prime_factors/1 and prime_factors/2. The functions look like this:

prime_factors(N, CandidatePrimes) ->
    PrimeFactors = [ X || X <- CandidatePrimes, N rem X =:= 0 ],
    find_factors(PrimeFactors, N, 0, []).
 
prime_factors(N) ->
    prime_factors(N, primes:queue(N div 2)).

Now, if you don't need to save the queue between calls, you can still call prime_factors/1 as usual. The prime_server module uses the prime_factors/2 function because it initializes its state to contain a primes sieve (either all primes under 1,000,000 if no arg is passed to start_link, or all primes =< N when using start_link/1) and the current upper bound. When the server handles a call for the factors of a number, it passes a pared-down list of primes to prime_factors/2 and gets a nice speed boost.

Well, the heavy lifting is front-loaded in the initialization of the server (generating the sieve) and in calls that increase the sieve's size. One improvement there might be to save the Table generated during the initial sieve creation and start the loop back up from where it left off (when N > UpTo) but that is for another time. If you choose your initial value for start_link right, regenerating the sieve should be unnecessary.

The last speed boost was noticing that calculating the exponents was an unnecessary step so I wrote a count_factors/1 and count_factors/2 that skips the call to find_factors/4 and returns the length of the list comprehension.

With these changes complete, problem 47 went from taking well over 5 minutes to just under 20 seconds to solve brute force.

Purely Functional Data Structures and Me

Tuesday, March 23rd, 2010

Quick post to say that I’ve put my dabblings into Chris Okasaki’s Purely Functional Data Structures book up as a github repo. I am picking and choosing what exercises and examples to code, with the goal being to slowly cover the whole book. Some concessions are made to fit the ideas into Erlang (like recursively calling an anonymous function), but overall I think the ideas fit nicely.

There is a streams lib from Richard Carlsson that I found on the Erlang mailing list in the repo as well that I used for reference for a couple things in chapter 4. I stuck with my streams being represented either as an empty list [] or [term() | fun()] with term() being the calculated value and fun() being the suspension function, instead of the tuple that Richard chose. After reading further in the thread, I understand why (don’t confuse people that might use proper list functions on streams) but for my small examples, it was enough to use lists.

Connect to remote erlang shell while inside emacs

Thursday, January 7th, 2010

While developing my top secret project, I have been getting into the fun stuff in Erlang and Emacs. Connecting to a running instance of my app from a remote shell wasn’t straightforward to me at first, so below is my documented way of connecting, as well as dropping into the Erlang JCL from within an Emacs erlang shell.

  1. Start yaws: yaws --daemon -sname appname --conf /path/to/yaws.conf
  2. Start emacs, and from within emacs start an Erlang shell with C-c C-z (assuming you have distel configured).
  3. From the Emacs erlang shell, get into Erlang’s JCL by typing C-q C-g and pressing enter. A ^G will be printed at the prompt, but won’t be evaluated until you press enter. You should see the familiar JCL prompt “User switch command -->”.
  4. Type ‘j’ to see current jobs you have running locally, which is probably just the current shell (1 {shell,start,[init]}).
  5. Type ‘r appname@compy’ to connect to the remote node identified by appname ( from the -sname parameter ) on the computer compy (usually whatever hostname returns)
  6. Type ‘j’ to see current jobs, which should list your current shell as “1 {shell,start,[init]}”, and a second shell “2* {appname@compy,shell,start,[]}”.
  7. Type ‘c 2’ to connect to the remote shell. You can now run commands in the node’s shell. You may have to press enter again to bring up a shell prompt.
james@compy 14:33:34 ~/dev/erlang/app
> yaws --daemon -sname app --conf config/yaws.conf
 
james@compy 14:34:00 ~/dev/erlang/app
> emacs
Eshell V5.7.4  (abort with ^G)
1> ^G
 
User switch command
 --> j
   1* {shell,start,[init]}
 --> r app@compy
 --> j
   1  {shell,start,[init]}
   2* {app@compy,shell,start,[]}
 --> c 2
 
1>

Posting from Emacs

Thursday, December 24th, 2009

I am posting this short message from emacs using the weblogger.el package.

Cool!

Erlang, Primes, and the Sieve of Eratosthenes

Wednesday, December 16th, 2009

Working through Project Euler again, and using Erlang to do so, I have run into quite a few problems that deal with primes. It is important, then, to have a library of functions that makes generating and validating primes easy. A typical method for generating primes is the Sieve of Eratosthenes. As discussed in the wiki article, Melissa O’Neill has shown that the given Haskell implementation is not a true implementation of the sieve, and she presents a couple of versions that are more faithful to the algorithm and more performant. I took the implementation she described using a priority queue on page 7 of the pdf. I used a skew heap implementation I found as the priority queue, modified it slightly to handle a Key and a Value parameter, and away I went.

Here’s the implementation I came up with:

%% primes.erl
-module(primes).
 
-export([queue/1, from/2]).
 
queue(N) ->
    sieve_queue(lists:seq(2, N)).
 
sieve_queue([]) ->
    [];
sieve_queue([X|XS]) ->
    Table = insert_prime(X, skew_kv:empty()),
    [X | sieve_queue(XS, Table)].
 
insert_prime(P, Table) ->
    skew_kv:insert(P*P, from(P*P, P), Table).
 
sieve_queue([], _Table) ->
    [];
sieve_queue([X|XS], Table) ->
    {NextComposite, _Value} = skew_kv:min(Table),
    case  NextComposite =< X of
        true -> sieve_queue(XS, adjust(Table, X));
        _Else -> [X | sieve_queue(XS, insert_prime(X, Table))]
    end.
 
adjust(Table, X) ->
    {N, [Nprime | NS]} = skew_kv:min(Table),
    case N =< X of
        true ->
            T = skew_kv:delete_min(Table),
            T2 = skew_kv:insert(Nprime, NS(), T),
            adjust(T2, X);
        _Else -> Table
    end.
 
%% from http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:177:khdaceipfabbicmifdhf
%% a lazy list that starts at K and increments by Inc
from(K, Inc) ->
    [K|fun()-> from(K+Inc, Inc) end].

I tried to keep variable names similar to the O’Neill implementation, but I have modified it a bit to use lazy lists, Erlang-style. I’m not very familiar with Haskell, but I believe lists there are lazy by default, whereas Erlang needs an explicit construct for lazy lists. Fortunately, it isn’t terribly hard. So, on to the code!

I named the function queue because I was implementing prime number generators using other methods from the paper (the unfaithful sieve and a sieve with a map instead of a queue), hence why you need to call the function with:

1> primes:queue(10).

I don’t think I’ll expound on the why, as O’Neill does a nice job explaining it in her paper, so I’ll focus on the how and leave it to the reader to learn the why.

My insert_prime function differs from O’Neill’s insertprime because mine has to account for Erlang-style lazy lists. My assumption is that the map call in O’Neill’s version creates a lazy list, so the mappings are not all computed up front, since XS is a potentially huge list of numbers. Instead, I observed that the lists created by the map call are arithmetic progressions that start at p*p and increase by p. My simple lazy list that accomplishes this is called from/2, which I got from a Joe Armstrong suggestion and adapted to my needs.

Let’s take a quick look at from/2. We pass in a number K and an increment Inc and get back a list with K at the head and a fun as the tail. The fun wraps a call to from/2, which acts as the iterator for the lazy list. So evaluating from(2, 2) returns:

1> primes:from(2, 2).
[2|#Fun<primes.1.47942561>]

Here’s an example showing the iterator in action:

57> [K|F1] = primes:from(2, 2).
[2|#Fun<primes.1.47942561>]
58> [K2|F2] = F1().
[4|#Fun<primes.1.47942561>]
59> [K3|F3] = F2().
[6|#Fun<primes.1.47942561>]
60> 

This lazy list is stored in the queue as the value for the key P*P (line 16 of the listing, the skew_kv:insert call in insert_prime/2). The lazy list is then used in the adjust/2 function when we want to compare the smallest key of the queue to the current number we’re evaluating. Line 28, the skew_kv:min call in adjust/2, extracts the K into Nprime and the fun into NS from the earlier from/2 call. Line 32, the skew_kv:insert call in adjust/2, shows how to get the next element of the lazy list: NS() returns [K+Inc | fun()], which is inserted into the queue as the new value. Again, for the why, consult the O’Neill paper.

Otherwise, the implementation is roughly the same. There are some improvements and optimizations that could be made, I’m sure, but for a rough draft it is fairly speedy, uses minimal memory, and is more than adequate for producing the primes I need to solve Project Euler problems.

Quiero decir español mejor

Tuesday, November 3rd, 2009

I want to learn Spanish, and while I wish I could just move somewhere and be immersed, that is not feasible at this juncture. So I am reaching out to the open source world to help me along.

The first step was to get a flashcard-like system that would provide me with a way to test my vocabulary and keep track of my progress. To that end, I have installed Mnemosyne. Read all about it because it is a pretty groovy program for flashcard learning. I then downloaded and installed the four Spanish card packages available through the site. As you can see on the left, the site has many languages supported, as well as a variety of other topics to study (including Swedish road signs).

This gives me my base of vocabulary and expressions. Second on the list of tools is the Firefox plugin ImTranslator. This is a great plugin for doing on the fly translations. So when I browse a site in Spanish, or come across something I want to know how to say in Spanish, this plugin gets me going in the right direction.

Using Rhythmbox, the default media player in Ubuntu, and its ability to play online radio stations, I’ve searched for and subscribed to several Spanish music and news sites. So now I get to hear music, news, and commercials from several different Spanish cities (Madrid and Barcelona stations currently, though I plan to get some Central and South American stations since that’s where I’ll likely travel to first).

And, for some extra vocab, I signed up with the Spanish Word-a-Day mailing list. I’ve really enjoyed the emails as they have a word, pronunciation guide, synonyms, the word in a sentence, and usually some Spanish trivia, like a joke, expression, or conjugation table for a verb/tense.

The most important piece, however, is actually conversing with native speakers, and I am lucky to have a Mexican restaurant across the street with several native speakers who I’ve gotten to know.

Any other tools you suggest? I think my next one will be finding folks who speak both English and Spanish and chatting with them via GChat or Skype.

Converting a site to use Cachefly for static content

Tuesday, February 24th, 2009

I recently needed to move static content from a live site to a cachefly account. Rather than go through the directories, looking for the resources (js/css/images) I needed to ftp, I thought, “Man, this sure sounds like it could be automated”.

The first step was to collect a list of items that needed ftping to cachefly. I know what you’re saying, “Use find!” In case Ben is reading this, find will “search for files in a directory hierarchy” (that’s from find’s man page, Ben). I wanted to separate the resources out, so I ran three different invocations.

For javascripts and css, the invocation was nearly identical:

find . -name '*.js' > js.libs
find . -name '*.css' > css.libs

Images were a little trickier. Most of the images are static content, but some are user-generated, likely to change or be removed. These do not go up to the CDN (at least for now). The user-generated content is located under one directory (call it /images/usergen), so we simply need to exclude it from find’s search.

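find . -path '*images/usergen*' -prune -o \( -iname '*.gif' -o -iname '*.jpg' -o -iname '*.png' \) -print > image.files

The important parts:

  • .

    Start the search in the current directory (the root of the project).

  • -path '*images/usergen*' -prune -o

    Skip anything with images/usergen in its path, without descending into that directory; everything else falls through to the tests after -o.

  • \( -iname '*.gif' -o -iname '*.jpg' -o -iname '*.png' \) -print

    Match, case-insensitively (-iname instead of -name), any files ending in gif, jpg, or png, and print only those. The parentheses group the three name tests, and the explicit -print keeps the pruned directory itself out of the output.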
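
One way to sanity-check a prune expression like this is to run it against a throwaway directory. The sketch below assumes GNU find and uses made-up file names; it groups the three -iname tests in parentheses and adds an explicit -print, so the pruned directory itself stays out of the results.

```shell
#!/bin/sh
# Scratch tree with hypothetical file names, purely for illustration.
tmp=$(mktemp -d)
mkdir -p "$tmp/images/header" "$tmp/images/usergen" "$tmp/js"
touch "$tmp/images/header/header_left.png" \
      "$tmp/images/header/logo.GIF" \
      "$tmp/images/usergen/avatar.jpg" \
      "$tmp/js/site.js"

# Prune the user-generated directory, then print only image files.
# -print is attached to the grouped name tests, so nothing from the
# pruned branch reaches the output.
found=$(cd "$tmp" && find . -path '*images/usergen*' -prune -o \
    \( -iname '*.gif' -o -iname '*.jpg' -o -iname '*.png' \) -print | LC_ALL=C sort)

echo "$found"
rm -rf "$tmp"
```

Only the two files under images/header should be listed; avatar.jpg is skipped because its directory is pruned, and site.js fails the name tests.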

We are then left with three files, each line of which contains the path, relative to the project root, of each resource I want to upload. I created a simple php script to upload the images, maintaining the pathing, to cachefly. So an image with relative path /images/header/header_left.png would now be accessible at instance.cachefly.com/images/header/header_left.png.

So the images are now up on the CDN. Now we need our code to point there as well. Fortunately, most of the resources were prepended with a domain (stored in the global $live_site). So the src attribute of an image, for instance, would be src="<?= $live_site ?>/images/header/header_left.png". Creating a $cachefly_site global, we now only need to find lines in our code that have a basic layout of “stuff…$live_site…stuff….png” where stuff is (.*) in regex land. So we utilize two commands, find and egrep. Find locates the files we want and egrep searches the found files for a regex that would locate the resources in the code.

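So first, we build the regex. We know a couple of elements that need to be present, and one that should not be present. Needed are live_site and a resource extension (js/css/jpg/png/gif), and not wanted is the “images/usergen” path, as this points to user-generated content. Strictly speaking, [^images/usergen] is a bracket expression that matches one character outside the listed set rather than testing that the path is absent, so it does not reliably exclude that path. The regex I used was: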

'live_site([^images/usergen])+.+(png|gif|jpg|css|js)'

This is the arg for egrep (the -l switch means print the file names that have a match, rather than the lines of a file that match):

egrep -lr 'live_site([^images/usergen])+.+(png|gif|jpg|css|js)'

Now we need to tell egrep what files to search using find:

find . -name "*.php" -exec egrep -lr 'live_site([^images/usergen])+.+(png|gif|jpg|css|js)' {} \;

We then store this list of files into a shell variable:

export FILES=`find . -name "*.php" -exec egrep -lr 'live_site([^images/usergen])+.+(png|gif|jpg|css|js)' {} \;`
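
A bracket expression such as [^images/usergen] matches a single character outside the listed set; it does not test that the string images/usergen is absent. The sketch below, which assumes GNU grep and two hypothetical PHP files, shows the pattern above still matching a user-generated image, along with one possible alternative: match resource lines first, drop the usergen lines with grep -v, then reduce the result to unique file names.

```shell
#!/bin/sh
# Two hypothetical PHP files: one static image, one user-generated image.
tmp=$(mktemp -d)
printf '%s\n' '<img src="<?= $live_site ?>/images/header/header_left.png">' > "$tmp/static.php"
printf '%s\n' '<img src="<?= $live_site ?>/images/usergen/avatar.png">' > "$tmp/usergen.php"

# The bracket-expression pattern still reports the usergen file.
orig_hits=$(cd "$tmp" && grep -lE 'live_site([^images/usergen])+.+(png|gif|jpg|css|js)' usergen.php)

# Alternative: find resource lines, discard usergen lines, keep file names.
files=$(cd "$tmp" && grep -rE --include='*.php' 'live_site.+\.(png|gif|jpg|css|js)' . \
    | grep -v 'images/usergen' | cut -d: -f1 | LC_ALL=C sort -u)

echo "original pattern matched: $orig_hits"
echo "filtered file list: $files"
rm -rf "$tmp"
```

A file that mixes static and user-generated references is still listed by the alternative, which is what we want here, since its static references need rewriting.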

Now that we have the files we need, we can search and replace $live_site with $cachefly_site for resources. The go-to command for search and replace is sed. The sed command will look generically like this:

sed -i 's/search/replace/g' FILE

We actually have two issues though. Due to the nature of the code, we have to account for the $live_site variable being passed in via the global keyword. So not only are we searching for resource files, but we also have to add $cachefly_site to the global lines to make sure $cachefly_site is defined within the function where output is generated.

Searching and replacing resource files is pretty easy:

sed -i -E '/live_site.+\.(js|css|gif|png|jpg)/s/live_site/cachefly_site/g' $FILES

$FILES, of course, came from our find/egrep call earlier. There is one catch to the regex used here. It is actually of a different generic form than mentioned above:

sed -i '/contains/s/search/replace/g' FILE

With this format, we put a condition on whether to replace text, meaning the regex in the “contains” portion must be matched before the search and replace is performed on that line.
So our sed above says: if the line contains live_site, followed by anything, followed by a dot and one of the listed extensions (with -E, | means OR and the parentheses group the alternatives), then replace live_site with cachefly_site. I left off the $ since it’s common to both variables.
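
To see the address-then-substitute form in action, here is a small sketch on a hypothetical PHP file. It assumes GNU sed (for -i without a suffix and for -E) and writes the condition as an extended regex, so that live_site must be followed by a dot and a resource extension before the line is touched.

```shell
#!/bin/sh
# Hypothetical page: one static image reference, one ordinary link.
tmp=$(mktemp -d)
cat > "$tmp/page.php" <<'EOF'
<img src="<?= $live_site ?>/images/header/header_left.png">
<a href="<?= $live_site ?>/about.php">About</a>
EOF

# The address (/.../) selects lines; the s command runs only on those lines.
sed -i -E '/live_site.+\.(js|css|gif|png|jpg)/s/live_site/cachefly_site/g' "$tmp/page.php"

result=$(cat "$tmp/page.php")
echo "$result"
rm -rf "$tmp"
```

The image line should now use $cachefly_site, while the about.php link keeps $live_site because .php is not one of the listed extensions.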

Running the sed command replaces everything nicely, but when we reload the page, we see notices about $live_site being undefined and resources being pulled from the host and not cachefly. So we need to handle the global importing.

This one is a little trickier because we are not really replacing live_site with cachefly_site, but appending cachefly_site to the list of imported globals. So a line like

global $foo, $bar, $live_site, $baz;

becomes

global $foo, $bar, $live_site, $cachefly_site, $baz;

The other trick is that the global line should not already contain $cachefly_site. We don’t need that redundancy. So, without further ado, the sed:

sed -i '/global.*live_site/{/cachefly_site/!s/live_site/live_site, $cachefly_site/;}' $FILES

The “contains” portion matches the keyword global, followed by stuff, followed by live_site. Inside the braces, /cachefly_site/! restricts the substitution to lines that do not already mention cachefly_site; a repetition count such as \{0\} cannot express this, because matching something zero times always succeeds. This ensures we only replace live_site when cachefly_site is not in the line already.
The “search” portion is easy; search for live_site. The replace portion replaces live_site with live_site, $cachefly_site. Whatever followed live_site, whether a comma or a semicolon, is left in place, so we don’t get syntax errors.
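
The “only when cachefly_site is absent” condition can be checked with a negated address inside a command block. A minimal sketch, assuming GNU sed and a hypothetical funcs.php with one global line that needs the extra variable and one that already has it:

```shell
#!/bin/sh
# Hypothetical file: the first line needs $cachefly_site, the second has it.
tmp=$(mktemp -d)
cat > "$tmp/funcs.php" <<'EOF'
global $foo, $bar, $live_site, $baz;
global $live_site, $cachefly_site;
EOF

# Outer address: lines with "global ... live_site".
# Inner address with !: skip lines that already mention cachefly_site.
sed -i '/global.*live_site/{/cachefly_site/!s/live_site/live_site, $cachefly_site/;}' "$tmp/funcs.php"

result=$(cat "$tmp/funcs.php")
echo "$result"
rm -rf "$tmp"
```

Running the command a second time on the same file would change nothing, since every global line then already contains cachefly_site.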

And that is basically how I converted a site to use cachefly for static content.

Re-assert The Federal Government’s Role As An Agent Of the Several States

Thursday, February 5th, 2009

A template for a resolution for you to send to your state legislature requiring the Federal Government to rein itself back into its Constitutional constraints and cease imposing its will on the States. Remember, the Federal Government is an agent of the States, not the other way around.

  • WHEREAS, the Tenth Amendment to the Constitution of the United States reads as follows: “The powers not delegated to the United States by the Constitution, nor prohibited by it to the States, are reserved to the States respectively, or to the people”; and
  • WHEREAS, the Tenth Amendment defines the total scope of federal power as being that specifically granted by the Constitution of the United States and no more; and
  • WHEREAS, the scope of power defined by the Tenth Amendment means that the federal government was created by the states specifically to be an agent of the states; and
  • WHEREAS, today, in 2009, the states are demonstrably treated as agents of the federal government; and
  • WHEREAS, many federal laws are directly in violation of the Tenth Amendment to the Constitution of the United States; and
  • WHEREAS, the Tenth Amendment assures that we, the people of the United States of America and each sovereign state in the Union of States, now have, and have always had, rights the federal government may not usurp; and
  • WHEREAS, Article IV, section 4, United States Constitution, says in part, “The United States shall guarantee to every State in this Union a Republican Form of Government”, and the Ninth Amendment states that “The enumeration in the Constitution, of certain rights, shall not be construed to deny or disparage others retained by the people”; and
  • WHEREAS, the United States Supreme Court has ruled in New York v. United States, 112 S. Ct. 2408 (1992), that Congress may not simply commandeer the legislative and regulatory processes of the states; and
  • WHEREAS, a number of proposals from previous administrations and some now pending from the present administration and from Congress may further violate the Constitution of the United States.

THEREFORE – Be it resolved by the House of Representatives of the State of <STATE>, the Senate concurring, that:

  1. That the State of <STATE> hereby claims sovereignty under the Tenth Amendment to the Constitution of the United States over all powers not otherwise enumerated and granted to the federal government by the Constitution of the United States.
  2. That this Resolution serves as notice and demand to the federal government, as our agent, to cease and desist, effective immediately, mandates that are beyond the scope of these constitutionally delegated powers.
  3. That all compulsory federal legislation that directs states to comply under threat of civil or criminal penalties or sanctions or requires states to pass legislation or lose federal funding be prohibited or repealed.
  4. That the Secretary of State of the State of <STATE> transmit copies of this resolution to the President of the United States, the President of the United States Senate, the Speaker of the United States House of Representatives, the Speaker of the House and the President of the Senate of each state’s legislature and each Member of Congress from the State of <STATE>.

Replace “State of <STATE>” with your state or commonwealth and send it away. Or create this as a petition, gather signatures, then present it to your legislators. Take back your state from the Federal bureaucrats.

Uncle-fied

Friday, October 31st, 2008

I’m proud to announce the newest addition to the Aimonetti family. Brosef Jeffrey and Steffi have a new son, Leon Jeffrey Michael! He was born this morning, weighed 8 lbs 8 oz, and was 21 inches in length. He, Mom, and Dad are all doing wonderful! Very exciting to have the first nephew aboard.

See the first pictures at my flickr page.

Obama’s Wealth Redistribution In Action!

Monday, October 27th, 2008

From a mailing list:

Today on my way to lunch, I passed a homeless guy with a sign that read, “Vote Obama; I need the money.” I laughed.

Once in the restaurant, I noticed my server had on an “Obama 08” tie. Again I laughed, as he had given away his political preference. Just imagine the coincidence!

When the bill came, I decided not to tip the server and explained to him that I was exploring the Obama redistribution of wealth concept.

He stood there in disbelief while I told him that I was going to redistribute his tip to someone whom I deemed more in need — the homeless guy outside.

The server angrily stormed from my sight.

I went outside, gave the homeless guy $10 and told him to thank the server inside, who, I decided, did not need the money as much as the homeless guy. The homeless guy was most grateful!

At the end of my rather unscientific redistribution experiment, I realized the homeless guy was very grateful for the money although he did not earn it. And the waiter was pretty angry that I gave away the money he did earn, even though the actual recipient deserved the money more.

I guess redistribution of wealth is an easier thing to swallow in concept than in practical application. Or is it? Redistribution of someone else’s wealth is a great idea — or just a fool’s game?

It seems like there should be a market for all those Obama supporters to band together into a group fund through which they can redistribute their wealth by choice, without forcing others who don’t agree with the method of redistribution. Oh wait, there are…they’re called charities and non-profits. No need for the government there! If only…