Archive for the ‘News’ Category

Life Update

Thursday, January 26th, 2012

Updated the blog to run 3.3.1 – lot of cobwebs around these parts. Hopefully I can be more proactive in blogging about things going on at work, and perhaps starting to write about what I’m up to personally (not that I have much of that right now). Maybe my Google stats will jump over the 0.3 hits I average! Dare to dream!

IT Expo

Thursday, October 7th, 2010

Just returned from IT Expo West last night. Three days of learning, hob-nobbing, and talking myself hoarse about the awesomeness that is 2600hz. We got a decent writeup posted on TMC’s site, met quite a few people, collected beaucoup business cards, and generally had a fun time hanging with the team. Super tired but ready to keep building the best hosted PBX software platform!

Bonus: See Darren’s awesome (yet mildly awkward) video interview!

Also, VoIP service providers looking to offset calling costs for their business clients can look at PromoCalling as a way to compete with Google and Skype’s free calling plans.

Still Kicking

Friday, September 17th, 2010

I am still alive and well; just busy. I did write a blog entry for my company, 2600hz. More to come…eventually.

Erlang, Euler, Primes, and gen_server

Friday, March 26th, 2010

I have been working on Project Euler problems for a while now and many of them have centered around prime numbers. I’ve referenced my work with the sieve in other posts but found with a particular problem that some of my functions could benefit from some state being saved (namely the sieve being saved and not re-computed each time).

The problem called for counting the prime factors of numbers and finding consecutive numbers that had the same count of prime factors. My primes module had a prime_factors/1 function that would compute the prime factors and the exponents of those factors (so 644 = 2^2 * 7 * 23, and primes:prime_factors(644) would return [{2,2},{7,1},{23,1}]). The prime_factors/1 function looked something like this:

prime_factors(N) ->
    CandidatePrimes = primes:queue(N div 2),
    PrimeFactors = [ X || X <- CandidatePrimes, N rem X =:= 0 ],
    find_factors(PrimeFactors, N, 0, []).

The call to find_factors/4 takes the factors and finds the exponents and can be ignored for now. The time sink, then, is in generating the CandidatePrimes list. I think my primes:queue/1 function is pretty fast at generating the sieve, and dividing N by two eliminates a lot of unnecessary computation, but when you’re calling prime_factors/1 thousands of times, the calls to queue/1 begin to add up. This is where I needed to save some state (the sieve) in between calls. Erlang, fortunately enough, has a module behavior called gen_server that abstracts away a lot of the server internals and lets you focus on the business bits of the server. I won’t discuss it much here as I’m not an authority on it, but Joe Armstrong’s book and the Erlang docs have been a great help in understanding what’s happening behind the scenes. You can view the prime_server module to see what its current state is code-wise.

To speed up prime_factors/1, I split it into two functions, prime_factors/1 and prime_factors/2. The functions look like this:

prime_factors(N, CandidatePrimes) ->
    PrimeFactors = [ X || X <- CandidatePrimes, N rem X =:= 0 ],
    find_factors(PrimeFactors, N, 0, []).

prime_factors(N) ->
    prime_factors(N, primes:queue(N div 2)).

Now, if you don't need to save the queue between calls, you can still call prime_factors/1 as usual. The prime_server module uses prime_factors/2 because it initializes its state to contain a primes sieve (either all primes under 1,000,000 if no arg is passed to start_link, or all primes =< N when using start_link/1) and the current upper bound. When the server handles the call for getting the factors, it passes a pared-down list of primes to prime_factors/2 and gets a nice speed boost.

The heavy lifting is now front-loaded in the initialization of the server (generating the sieve) and in calls that increase the sieve's size. One improvement there might be to save the Table generated during the initial sieve creation and start the loop back up from where it left off (when N > UpTo), but that is for another time. If you choose your initial value for start_link right, regenerating the sieve should be unnecessary.

The last speed boost came from noticing that calculating the exponents was an unnecessary step, so I wrote count_factors/1 and count_factors/2, which skip the call to find_factors/4 and return the length of the list comprehension.

With these changes complete, problem 47 went from taking well over 5 minutes to just under 20 seconds to solve brute force.

Purely Functional Data Structures and Me

Tuesday, March 23rd, 2010

Quick post to say that I’ve put my dabblings into Chris Okasaki’s Purely Functional Data Structures book up as a GitHub repo. I am picking and choosing which exercises and examples to code, with the goal being to slowly cover the whole book. Some concessions are made to fit the ideas into Erlang (like recursively calling an anonymous function), but overall I think the ideas fit nicely.

The repo also includes a streams lib from Richard Carlsson that I found on the Erlang mailing list and used as a reference for a couple of things in chapter 4. I stuck with representing my streams either as an empty list [] or as [term() | fun()], with term() being the calculated value and fun() being the suspension function, instead of the tuple that Richard chose. After reading further in the thread, I understand why he chose it (so people aren’t tempted to use proper list functions on streams), but for my small examples, lists were enough.

Connect to remote erlang shell while inside emacs

Thursday, January 7th, 2010

While developing my top secret project, I have been getting into the fun stuff in Erlang and Emacs. Connecting to a running instance of my app from a remote shell wasn’t straightforward to me at first, so below is my documented way of connecting, as well as dropping into the Erlang JCL from within an Emacs erlang shell.

  1. Start yaws: yaws --daemon -sname appname --conf /path/to/yaws.conf
  2. Start emacs, and from within emacs start an Erlang shell with C-c C-z (assuming you have distel configured).
  3. From the Emacs erlang shell, get into Erlang’s JCL by typing C-q C-g and pressing enter. A ^G will be printed at the prompt, but won’t be evaluated until you press enter. You should see the familiar JCL prompt “User switch command -->”.
  4. Type ‘j’ to see current jobs you have running locally, which is probably just the current shell (1 {shell,start,[init]}).
  5. Type ‘r appname@compy’ to connect to the remote node identified by appname (from the -sname parameter) on the computer compy (usually whatever the hostname command returns).
  6. Type ‘j’ to see current jobs, which should list your current shell as “1 {shell,start,[init]}”, and a second shell “2* {appname@compy,shell,start,[]}”.
  7. Type ‘c 2’ to connect to the remote shell. You can now run commands in the node’s shell. You may have to press enter again to bring up a shell prompt.
james@compy 14:33:34 ~/dev/erlang/app
> yaws --daemon -sname app --conf config/yaws.conf
 
james@compy 14:34:00 ~/dev/erlang/app
> emacs
Eshell V5.7.4  (abort with ^G)
1> ^G
 
User switch command
 --> j
   1* {shell,start,[init]}
 --> r app@compy
 --> j
   1  {shell,start,[init]}
   2* {app@compy,shell,start,[]}
 --> c 2
 
1>

Posting from Emacs

Thursday, December 24th, 2009

I am posting this short message from emacs using the weblogger.el package.

Cool!

Erlang, Primes, and the Sieve of Eratosthenes

Wednesday, December 16th, 2009

Working through Project Euler again and using Erlang to do so, I keep running into problems that deal with primes. It is important, then, to have a library of functions that make generating and validating primes easy. A typical method for generating primes is the Sieve of Eratosthenes. As discussed in the wiki article, Melissa O’Neill has shown the given Haskell implementation is not a true implementation, and shows a couple of versions that are more true to the algorithm and more performant. I took the implementation she described using a priority queue on page 7 of the PDF. I used a skew heap implementation I found as the priority queue, modified it slightly to handle a Key and a Value parameter, and away I went.

Here’s the implementation I came up with:

%% primes.erl
-module(primes).
 
-export([queue/1]).
 
queue(N) ->
    sieve_queue(lists:seq(2, N)).
 
sieve_queue([]) ->
    [];
sieve_queue([X|XS]) ->
    Table = insert_prime(X, skew_kv:empty()),
    [X | sieve_queue(XS, Table)].
 
insert_prime(P, Table) ->
    skew_kv:insert(P*P, from(P*P, P), Table).
 
sieve_queue([], _Table) ->
    [];
sieve_queue([X|XS], Table) ->
    {NextComposite, _Value} = skew_kv:min(Table),
    case  NextComposite =< X of
        true -> sieve_queue(XS, adjust(Table, X));
        _Else -> [X | sieve_queue(XS, insert_prime(X, Table))]
    end.
 
adjust(Table, X) ->
    {N, [Nprime | NS]} = skew_kv:min(Table),
    case N =< X of
        true ->
            T = skew_kv:delete_min(Table),
            T2 = skew_kv:insert(Nprime, NS(), T),
            adjust(T2, X);
        _Else -> Table
    end.
 
%% from http://www.erlang.org/cgi-bin/ezmlm-cgi?4:mss:177:khdaceipfabbicmifdhf
%% a lazy list that starts at K and increments by Inc
from(K, Inc) ->
    [K|fun()-> from(K+Inc, Inc) end].

I tried to keep variable names similar to the O’Neill implementation, but I have modified it a bit to use lazy lists, Erlang-style. Not being familiar with Haskell, I think lists there are defined lazily by default, but Erlang needs an explicit construct to do lazy lists. Fortunately, it isn’t terribly hard. So, the code!

I named the function queue because I was implementing prime number generators using other methods from the paper (the unfaithful sieve and a sieve with a map instead of a queue), hence why you need to call the function with:

1> primes:queue(10).

I don’t think I’ll expound on the why, as O’Neill does a nice job explaining it in her paper, so I’ll focus on the how and leave it to the reader to learn the why.

My insert_prime function differs from the O’Neill insertprime because mine has to account for Erlang-style lazy lists. My assumption is that the map call in O’Neill’s version creates a lazy list, so not all of the mappings are computed, since XS is a potentially huge list of numbers. Instead, I observed that the lists created by the map call are arithmetic progressions that start at p*p and increase by p. My simple lazy list that accomplishes this is called from/2, which I got from a Joe Armstrong suggestion and adapted to my needs.

Let’s take a quick look at from/2. We pass in a number K and an increment Inc and return a list with K at the head and a fun as the tail. The fun wraps a call to from/2, which acts as the iterator for the lazy list. So evaluating from/2 in the shell returns the head value consed onto a fun:

1> primes:from(2, 2).
[2|#Fun<primes.1.47942561>]

Here’s an example showing the iterator in action:

57> [K|F1] = primes:from(2, 2).
[2|#Fun<primes.1.47942561>]
58> [K2|F2] = F1().
[4|#Fun<primes.1.47942561>]
59> [K3|F3] = F2().
[6|#Fun<primes.1.47942561>]
60>

This lazy list is stored in the queue as the value for the key P*P (line 16 of the listing, in insert_prime/2). The lazy list is then used in the adjust/2 function when we want to compare the smallest key of the queue to the current number we’re evaluating. Line 28 (the skew_kv:min/1 match in adjust/2) extracts the K into Nprime, and the fun into NS, from the from/2 call earlier. Line 32 shows how to get the next number in the lazy list: NS() returns [K+Inc | fun()], which is inserted as the value into the queue. Again, for the why, consult the O’Neill paper.

Otherwise, the implementation is roughly the same. There are some improvements and optimizations that could be made, I’m sure, but for a rough draft it is fairly speedy, uses minimal memory, and for Project Euler it is more than adequate in producing the primes I need to solve the problems.

Quiero decir español mejor

Tuesday, November 3rd, 2009

I want to learn Spanish, and while I wish I could just move somewhere and be immersed, that is not feasible at this juncture. So I am reaching out to the open source world to help me along.

The first step was to get a flashcard-like system that would provide me with a way to test my vocabulary and keep track of my progress. To that end, I have installed Mnemosyne. Read all about it because it is a pretty groovy program for flashcard learning. I then downloaded and installed the four Spanish card packages available through the site. The site has many languages supported, as well as a variety of other topics to study (including Swedish road signs).

This gives me my base of vocabulary and expressions. Second on the list of tools is the Firefox plugin ImTranslator. This is a great plugin for doing on the fly translations. So when I browse a site in Spanish, or come across something I want to know how to say in Spanish, this plugin gets me going in the right direction.

Using Rhythmbox, the default media player in Ubuntu, and its ability to play online radio stations, I’ve searched for and subscribed to several Spanish music and news stations. So now I get to hear music, news, and commercials from several different Spanish cities (Madrid and Barcelona stations currently, though I plan to get some Central and South American stations since that’s where I’ll likely travel to first).

And, for some extra vocab, I signed up with the Spanish Word-a-Day mailing list. I’ve really enjoyed the emails as they have a word, pronunciation guide, synonyms, the word in a sentence, and usually some Spanish trivia, like a joke, expression, or conjugation table for a verb/tense.

The most important piece, however, is actually conversing with native speakers, and I am lucky to have a Mexican restaurant across the street with several native speakers who I’ve gotten to know.

Any other tools you suggest? I think my next one will be finding folks who speak both English and Spanish and chatting with them via GChat or Skype.

Converting a site to use Cachefly for static content

Tuesday, February 24th, 2009

I recently needed to move static content from a live site to a cachefly account. Rather than go through the directories, looking for the resources (js/css/images) I needed to ftp, I thought, “Man, this sure sounds like it could be automated”.

The first step was to collect a list of items that needed ftping to cachefly. I know what you’re saying, “Use find!” In case Ben is reading this, find “searches for files in a directory hierarchy” (that’s from find’s man page, Ben). I wanted to separate the resources out, so I ran three different invocations.

For javascripts and css, the invocation was nearly identical:

find . -name '*.js' > js.libs
find . -name '*.css' > css.libs

Images were a little trickier. Most of the images are static content, but some are user-generated, likely to change or be removed. These do not go up to the CDN (at least for now). The user-generated content is located under one directory (call it /images/usergen), so we simply need to exclude it from find’s search.

find . -path '*images/usergen*' -prune -o \( -iname '*.gif' -o -iname '*.jpg' -o -iname '*.png' \) -print > image.files

The important parts:

  • -path '*images/usergen*' -prune

    Skip any directory whose path contains images/usergen, so find never descends into it.

  • -o \( ... \) -print

    For everything that was not pruned, group the name tests in escaped parentheses and print only the paths that match one of them. Without the grouping and the explicit -print, find would also print the pruned directory itself.

  • -iname '*.gif' -o -iname '*.jpg' -o -iname '*.png'

    Match, case-insensitive (-iname instead of -name), any files ending in gif, jpg, or png.
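
A minimal sketch of the prune-and-match idea, run against a throwaway directory tree. All of the directory and file names here are made up for illustration, and the result is captured in a variable instead of image.files so the demo cleans up after itself:

```shell
#!/bin/sh
# Build a temporary tree that mimics the layout described above
# (hypothetical names, not the real site).
demo=$(mktemp -d)
mkdir -p "$demo/images/header" "$demo/images/usergen"
touch "$demo/images/header/header_left.png" \
      "$demo/images/header/LOGO.GIF" \
      "$demo/images/usergen/avatar.jpg" \
      "$demo/notes.txt"

# Prune the user-generated directory, then print only image files.
# The escaped parentheses group the -iname tests so that -print
# applies to all three of them.
found=$(cd "$demo" && find . -path '*images/usergen*' -prune -o \
    \( -iname '*.gif' -o -iname '*.jpg' -o -iname '*.png' \) -print |
    LC_ALL=C sort)

echo "$found"
rm -rf "$demo"
```

This should list only the two files under images/header (including the upper-case LOGO.GIF, thanks to -iname) and nothing under images/usergen.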

We are then left with three files, each line of which contains the path, relative to the project root, of a resource I want to upload. I created a simple PHP script to upload the files to cachefly, preserving their paths. So an image with relative path /images/header/header_left.png would now be accessible at instance.cachefly.com/images/header/header_left.png.

So the images are now up on the CDN. Now we need our code to point there as well. Fortunately, most of the resources were prepended with a domain (stored in the global $live_site). So the src attribute of an image, for instance, would be src="<?= $live_site ?>/images/header/header_left.png". Creating a $cachefly_site global, we now only need to find lines in our code that have a basic layout of “stuff……$live_site…stuff…..png” where stuff is (.*) in regex land. So we utilize two commands, find and egrep. Find locates files we want and egrep searches the found files for a regex that would locate the resources in the code.

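So first, we build the regex. We know a couple of elements that need to be present, and one that should not be present. Needed are live_site and a resource extension (js/css/jpg/png/gif), and not needed is the “images/usergen” path, as this points to user-generated content. One caveat: [^images/usergen] is a bracket expression, so it excludes individual characters rather than the whole path, and files that only reference user-generated content can still show up in the results. With that in mind, the regex becomes: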

'live_site([^images/usergen])+.+(png|gif|jpg|css|js)'

This is the arg for egrep (the -l switch means print the file names that have a match, rather than the lines of a file that match):

egrep -lr 'live_site([^images/usergen])+.+(png|gif|jpg|css|js)'

Now we need to tell egrep what files to search using find:

find . -name "*.php" -exec egrep -lr 'live_site([^images/usergen])+.+(png|gif|jpg|css|js)' {} \;

We then store this list of files into a shell variable:

export FILES=`find . -name "*.php" -exec egrep -lr 'live_site([^images/usergen])+.+(png|gif|jpg|css|js)' {} \;`
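
As a rough sketch of what the find/egrep pair produces, here is the same pattern run against two made-up PHP files in a temporary directory. The file names and contents are assumptions for the demo, and grep -E is used as the equivalent modern spelling of egrep:

```shell
#!/bin/sh
demo=$(mktemp -d)

# Two hypothetical PHP files: only the first references a static
# resource through $live_site.
printf '%s\n' '<img src="<?= $live_site ?>/images/header/header_left.png">' \
    > "$demo/header.php"
printf '%s\n' '<a href="<?= $live_site ?>/about">About</a>' \
    > "$demo/nav.php"

# -l prints the names of files with at least one matching line.
FILES=$(cd "$demo" && find . -name '*.php' \
    -exec grep -El 'live_site([^images/usergen])+.+(png|gif|jpg|css|js)' {} +)

echo "$FILES"
rm -rf "$demo"
```

Only ./header.php is listed, since nav.php uses $live_site for a plain link rather than a resource.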

Now that we have the files we need, we can search and replace $live_site with $cachefly_site for resources. The go-to command for search and replace is sed. The sed command will look generically like this:

sed -i 's/search/replace/g' FILE

We actually have two issues though. Due to the nature of the code, we have to account for the $live_site variable being passed in via the global keyword. So not only are we searching for resource files, but we also have to add $cachefly_site to the global lines to make sure $cachefly_site is defined within the function where output is generated.

Searching and replacing resource files is pretty easy:

sed -i '/live_site.\+\(js\|css\|gif\|png\|jpg\)/s/live_site/cachefly_site/g' $FILES

$FILES, of course, came from our find/egrep call earlier. There is one catch to the regex used here. It is actually of a different generic form than mentioned above:

sed -i '/contains/s/search/replace/g' FILE

With this format, we put a condition on whether to replace text, meaning the regex in the “contains” portion must be matched before the search and replace is performed on that line.
So our sed above says if the line contains live_site, followed by anything, ending in one of the listed resources (\| means OR), then replace live_site with cachefly_site. I left off the $ since it’s common to both variables.
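
To see the “contains” form in action without touching any files, here is a small sketch that pipes three made-up lines through sed. It swaps in sed -E (extended regexes) so that + and | can be written without backslashes; that choice and the sample lines are assumptions for the demo, not the command used on the site:

```shell
#!/bin/sh
# Three hypothetical lines of template code.
input='<img src="<?= $live_site ?>/images/header/header_left.png">
<a href="<?= $live_site ?>/about">About</a>
<link href="<?= $live_site ?>/style/main.css">'

# The address between the first pair of slashes selects lines;
# the s command then runs only on the selected lines.
result=$(printf '%s\n' "$input" |
    sed -E '/live_site.+(js|css|gif|png|jpg)/s/live_site/cachefly_site/g')

echo "$result"
```

The image and stylesheet lines come out pointing at $cachefly_site, while the plain link in the middle is left alone because its line never matches the address.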

Running the sed command replaces everything nicely, but when we reload the page, we see notices about $cachefly_site being undefined and resources being pulled from the host and not cachefly. So we need to handle the global importing.

This one is a little trickier because we are not really replacing live_site with cachefly_site, but appending it to the list of imported globals. So a line like

global $foo, $bar, $live_site, $baz;

becomes

global $foo, $bar, $live_site, $cachefly_site, $baz;

The other trick is that the global line should not already contain $cachefly_site. We don’t need that redundancy. So, without further ado, the sed:

sed -i '/global.*live_site/{/cachefly_site/!s/live_site/live_site,\$cachefly_site/g;}' $FILES

The “contains” portion matches the keyword global, followed by stuff, followed by live_site. Inside the braces, /cachefly_site/! restricts the substitution to lines that do not already contain cachefly_site. (A count such as \(cachefly_site\)\{0\} does not work for this, because zero occurrences matches the empty string and therefore matches every line.) This ensures we only replace live_site when cachefly_site is not in the line already.
The “search” portion is easy; search for live_site. The replace portion replaces live_site with live_site,$cachefly_site. This takes into account when live_site is followed by a comma or semi-colon so we don’t get syntax errors.
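
One way to express “only when cachefly_site is not already on the line” in sed is a nested address negated with !. The sketch below tries that on three made-up lines, reading from a pipe rather than editing files in place; the sample lines are assumptions for illustration:

```shell
#!/bin/sh
# Hypothetical PHP lines: one global needing the new variable, one that
# already has it, and one ordinary use of $live_site.
input='global $foo, $bar, $live_site, $baz;
global $live_site, $cachefly_site;
$url = $live_site . "/about";'

# Outer address: lines containing "global ... live_site".
# Inner address with !: skip lines that already mention cachefly_site.
result=$(printf '%s\n' "$input" |
    sed '/global.*live_site/{
/cachefly_site/!s/live_site/live_site, $cachefly_site/
}')

echo "$result"
```

Only the first line changes. Running the same command a second time over its own output leaves everything as it is, which makes the edit safe to repeat.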

And that is basically how I converted a site to use cachefly for static content.