Archive for the ‘Geekdom’ Category

Writing Excel Spreadsheets Using PHP

Thursday, July 24th, 2008

When using the Spreadsheet_Excel_Writer library from the PEAR repository, I came across an issue I didn’t see handled in the docs (as of this writing, I am using Spreadsheet_Excel_Writer 0.9.1 beta).

My application creates spreadsheets that contain order information. Part of each row is a list of up to 20 ISBNs and the quantities desired of each. The issue was how to handle ISBNs with a leading zero. When I first looked through the PEAR docs for the library, a Worksheet method, writeString, looked to be the solution. However, while the leading zero was maintained, the cell’s format was still numeric. As a result, the application receiving the generated xls dropped the zero, producing an invalid ISBN.

Looking over the internals of the Worksheet::writeString method didn’t reveal an undocumented feature that would ensure a cell was read as text, regardless of its contents. I next looked at the Format::setNumFormat method as I knew it contained ways to format the number as currency, timestamp, fractions, etc. You could then pass this Format object as the optional fourth parameter to the Worksheet::write method.

Contained in the Format::setNumFormat docs was a link to the OpenOffice.org documentation of the Excel File Format (found here, pdf). Interested in how exactly the file was structured, I read on. What I learned that was directly applicable is that each cell contains a pointer to a format definition, or XF record, and it was this XF record where formatting was stored. From the doc, section 4.6:

All cell formatting attributes are stored in XF records…The cell records themselves contain an index into the XF record list. This way of storing cell formatting saves memory and decreases the file size.

So if two cells use the same formatting, like the ISBN columns would, each cell would contain a pointer to the XF record that would tell Excel the cell was text. Section 4.6.1 lists the 6 groups of formatting attributes, the first of which is number format, which is then an index to a FORMAT record. Okay, we’re on to something here. Further in the pdf, in section 5.49, we see the definition of the FORMAT record. Lo and behold, the table of formats from the setNumFormat page is listed in the pdf, but we see that the PEAR listing is incomplete. Scanning the complete table in the pdf, we see index 49, type Text, format string ‘@’. Bingo.
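To make the indirection concrete, here is a toy model in Javascript. It is not the real binary layout of an xls file, just a sketch of the idea: a shared list of format records, plus cells that hold only an index into that list. The names xfRecords, cells, and displayValue are made up for illustration, and the way displayValue treats non-text cells is my assumption about how a consuming application might behave.

```javascript
// Toy model only: a shared table of format records (standing in for XF records).
const xfRecords = [
  { numFormat: 'General' }, // index 0: contents may be read as a number
  { numFormat: '@' },       // index 1: the text format string from the spec
];

// Each cell stores its value and an index into xfRecords, not the format itself.
const cells = [
  { row: 0, col: 1, value: '0123', xf: 0 },
  { row: 1, col: 1, value: '0123', xf: 1 },
];

// Assumed reader behavior: text cells are kept verbatim, anything else that
// parses as a number is treated as one (which drops the leading zero).
function displayValue(cell) {
  const format = xfRecords[cell.xf].numFormat;
  if (format === '@') {
    return cell.value;
  }
  const asNumber = Number(cell.value);
  return Number.isNaN(asNumber) ? cell.value : String(asNumber);
}

const withoutFormat = displayValue(cells[0]); // '123'
const withTextFormat = displayValue(cells[1]); // '0123'
```

Every ISBN cell would point at the same text record, which is why the format only has to be defined once.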

Our code for formatting numeric data as text in a string goes a little something like this (modified from the PEAR example code):

require_once 'Spreadsheet/Excel/Writer.php';

$workbook = new Spreadsheet_Excel_Writer();
$worksheet =& $workbook->addWorksheet();
 
// The '@' format string (built-in format index 49) marks a cell as text
$text_format =& $workbook->addFormat();
$text_format->setNumFormat('@');
 
$worksheet->write(0, 0, "Without formatting");
$worksheet->write(0, 1, '0123'); // cell contains 123
 
$worksheet->write(1, 0, "With formatting");
$worksheet->write(1, 1, '0123', $text_format); // cell contains 0123

// Send the workbook to the browser as a download (the file name is arbitrary)
$workbook->send('isbn.xls');
$workbook->close();

To verify, generate the xls and open it. Right click the cells to modify the format of the cell, and see that the first cell is formatted as a general number, and the second cell is formatted as text.

The meta-moral is to read the docs and follow references to get at the source material. Had I not opened the pdf, it may have been a few more time units finding the information on Google. Plus, I learned a lot more about an important file format. I can sleep easy knowing I’m that much more knowledgeable.

Announcements

Monday, April 28th, 2008

Thought I’d go ahead and announce, mainly to myself, that I will be working through SICP. The rub…doing it in Javascript. Seems as though most other languages are covered (I know Erlang is taken) and since I am doing an increasingly large amount of Javascript, coupled with the eventual prevalence of server-side Javascript, I figured it best to start getting intimate. What I like about this task is that since SICP has been so widely covered on the web, I have many resources to aid in better understanding the material (and it is some thick material). Anyway, I’ve begun chapter one and will post the chapters, as well as excerpts I find interesting, in no pre-defined timeframe.

Oh yeah, and I’m engaged.

More Wget

Thursday, April 10th, 2008

It’s hard to overstate the usefulness and robust feature set that most of the GNU tools have in their arsenal. Today, I’ll make mention of one such tool, wget, and a novel use of the command.

As I go through my work, I find that sites we agree to take over have little structure. They generally were slapped together a long time ago, with little thought to organization, made with Dreamweaver or, Stallman forbid, FrontPage. I’m not judging; as long as something looks okay in the browser, a company can proclaim, “We’re on the intarwebs!” However, tracking down all of their pages to be converted into a CMS, for instance, can be time consuming. Not wanting to waste a client’s money by searching through the source for links and images, then manually reconstructing the layout of the files, I fell on my trusty GNU tool wget. (I also did not have FTP access, but I knew there were dead pages that I didn’t want to resurrect. Using wget in this case helped me retrieve only the pages that were still linked to from the main page).

Here’s a variation of the incantation of wget I used:

wget -r -A '*.htm*,*.jpg,*.png,*.gif' -l 3 http://www.example-site.com

What’s it all mean?
-r: wget should retrieve recursively
-A: takes a comma-separated list of patterns to match files to accept (use -R to reject). In this case, we want all htm, html, and most picture format files.
-l: denotes how far down the rabbit hole to venture. I started with 1, so only links from the first page were parsed and followed. I then tried 2, following links that were a level below the parent and compared the resulting structure. Trying 3, I found no difference between 3’s results and 2’s results, meaning all links had been followed and accounted for.

The result:
A directory called www.example-site.com that contains the files in their layout on the server. Now I knew which pages needed converting and which images to add to the new site.
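The depth probing I did with -l can be sketched as a depth-limited crawl over a link graph. This is only an illustration in Javascript, not how wget is implemented; the links table and the crawl function are hypothetical stand-ins for a real site.

```javascript
// Hypothetical site: each page maps to the pages it links to.
const links = {
  'index.html': ['about.html', 'news.html'],
  'about.html': ['contact.html', 'index.html'],
  'news.html': [],
  'contact.html': [],
  'orphan.html': ['index.html'], // nothing links here, so a crawl never finds it
};

// Follow links breadth-first, going at most maxDepth levels below the start page.
function crawl(start, maxDepth) {
  const seen = new Set([start]);
  let frontier = [start];
  for (let depth = 0; depth < maxDepth; depth++) {
    const next = [];
    for (const page of frontier) {
      for (const target of links[page] || []) {
        if (!seen.has(target)) {
          seen.add(target);
          next.push(target);
        }
      }
    }
    frontier = next;
  }
  return [...seen].sort();
}

const depth1 = crawl('index.html', 1); // the start page plus the pages it links to
const depth2 = crawl('index.html', 2);
const depth3 = crawl('index.html', 3);
```

When two consecutive depths return the same set of pages, as depth2 and depth3 do here, every reachable page has been found. The dead page (orphan.html) never shows up, which is exactly what I wanted.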

A side note: A handy way to see the layout of your newly downloaded directory is to use the tree command.

tree www.example-site.com/

will display something like this:

www.example-site.com/
|-- about.html
|-- calendar.html
|-- committees.html
|-- contact.html
|-- otherdir
|   `-- index.html
|-- images
|   |-- header.gif
|   |-- logo.gif
|   `-- spacer.gif
|-- index.html
|-- join.html
|-- news.html
|-- partnerships.html
`-- scoopholiday.html

Fibonacci stream

Monday, February 4th, 2008

The Fibonacci sequence: teacher of recursion for so many Computer Science students. But can it also teach us about streams? Yes! The function to build a stream (in Javascript):

var fib_stream_maker = function() {
  return (function() {
    var a = 0;
    var b = 0;
    var c = 1;
 
    return function() {
      a = b;
      b = c;
      c = a + b;
      return c;
    }
  })();
}

Let’s break this down so Ben can follow along. First, the innermost function:

return function() {
  a = b;
  b = c;
  c = a + b;
  return c;
}

This anonymous function returns the next number in the sequence. Looking one level of scope higher, we see the variables a, b, and c declared:

function() {
  var a = 0;
  var b = 0;
  var c = 1;
 
  return function() {
    a = b;
    b = c;
    c = a + b;
    return c;
  }
}

So we initialize the variables and then return a function that updates them. The variables live in the enclosing function’s scope and are captured by the returned function, so each stream gets its own copies and multiple streams can run without their variables colliding. To build the stream factory, we wrap the initialization function in parentheses and immediately execute it, which creates a closure: an initialized fib stream function. This is then returned when fib_stream_maker is called as a function.

var fib_stream = fib_stream_maker();
 
fib_stream(); // returns 1
fib_stream(); // returns 2
fib_stream(); // returns 3
fib_stream(); // returns 5
fib_stream(); // returns 8, etc...

This is only touching the surface of streams but I thought it was pretty cool that the Fibonacci sequence can be utilized to teach such important concepts as closures and streams. Code on!
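To check the claim that streams do not step on each other, here is a small sketch that creates two of them. I have written the maker without the extra wrapped function, on the assumption that calling fib_stream_maker already creates a fresh scope for a, b, and c; the variable names first and second are just for the example.

```javascript
// Simplified maker: each call creates its own a, b, c for the returned function.
var fib_stream_maker = function() {
  var a = 0;
  var b = 0;
  var c = 1;

  return function() {
    a = b;
    b = c;
    c = a + b;
    return c;
  };
};

var first = fib_stream_maker();
var second = fib_stream_maker();

// Advance the first stream four times, then the second one twice.
var firstValues = [first(), first(), first(), first()]; // 1, 2, 3, 5
var secondValues = [second(), second()];                // 1, 2
```

The second stream starts from the beginning even though the first has already moved on, because each one closes over its own variables.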

*Update*

While explaining the inner workings of the above code to CD, she asked a very astute question, one that Ben probably would not have, “What about going backwards in the stream?” A just question. Let’s look at a table of how the values of a, b, and c act as the stream goes forward first (this will help us design our algorithm later).

                      a  b  c  Returns
Creating fib_stream   0  0  1
fib_stream();         0  1  1  1
fib_stream();         1  1  2  2
fib_stream();         1  2  3  3
fib_stream();         2  3  5  5

So to go backwards, what do we need to do to the values of a, b, and c? We need to assign b’s value to c, a’s value to b, and the difference of the new c and the new b to a, and still return c. In code, this would look like:

c = b;
b = a;
a = c - b;
return c;

To integrate this into the above example, we need to pass a boolean parameter to the innermost lambda function and test whether to move the stream forward or backward. Let’s see the inner lambda now:

return function(go_forward) {
  if ( go_forward ) {
    a = b;
    b = c;
    c = a + b;
  } else {
    c = b;
    b = a;
    a = c - b;
  }
 
  return c;
}

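We may want to add a test to see if we are at the beginning of the stream again. Testing b alone is enough: b is 0 only at the start, while a may be 0 (a fresh stream) or 1 (after stepping back to the start):

return function(go_forward) {
  if ( go_forward ) {
    a = b;
    b = c;
    c = a + b;
  } else if ( b == 0 ) {
    ; // at the start of the stream: do nothing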
  } else {
    c = b;
    b = a;
    a = c - b;
  }
 
  return c;
}

Let’s see it all as one big fun function:

var fib_stream_maker = function() {
  return (function() {
    var a = 0;
    var b = 0;
    var c = 1;
 
    return function(go_forward) {
      if ( go_forward ) {
        a = b;
        b = c;
        c = a + b;
      } else if ( b == 0 ) {
        ; // at the start of the stream (a may be 0 or 1 here): do nothing
      } else {
        c = b;
        b = a;
        a = c - b;
      }
      return c;
    }
  })();
}

And there you have it. Thanks CD for bringing up this interesting idea! What a nerd!
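Here is a short walk through a two-way stream, forward three steps and then back past the start. It is a self-contained sketch; the name two_way_fib_maker and the walk array are mine. Note that this version tests only b to detect the start of the stream, because after stepping back to the start a holds 1 rather than 0.

```javascript
var two_way_fib_maker = function() {
  var a = 0;
  var b = 0;
  var c = 1;

  return function(go_forward) {
    if ( go_forward ) {
      a = b;
      b = c;
      c = a + b;
    } else if ( b == 0 ) {
      ; // already at the start of the stream: do nothing
    } else {
      c = b;
      b = a;
      a = c - b;
    }
    return c;
  };
};

var stream = two_way_fib_maker();
var walk = [
  stream(true),  // 1
  stream(true),  // 2
  stream(true),  // 3
  stream(false), // 2
  stream(false), // 1
  stream(false), // 1 (back at the start)
  stream(false), // 1 (stays at the start)
  stream(true)   // 1 (moving forward again)
];
```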

MySQL -> CSV

Wednesday, January 30th, 2008

Always on the lookout for increases in efficiency, I love when I find a slick snippet of command line goodness that makes a hard sounding task simple and quick. I was tasked with creating an email list from a database and putting it into a csv format, and had only the command line to interface with the DB. My first attempt revolved around using the SELECT … INTO OUTFILE syntax. Unfortunately, I was unable to write out to a file with the DB user I had access to. What’s a fella to do?

Unix pipes to the rescue.

First, the whole command:

echo "select * from example;" | mysql -u user -p dbname | tr '^V^I' ',' > filename.csv

Let’s break this down, in case Ben is reading and can’t follow along. The echo statement contains your query. It is sent to the mysql command, which connects you to the database and executes the query, returning the data in tab-delimited format to the console. The tr command reads from STDIN, and replaces tabs (Ctrl-V Ctrl-I) with whatever delimiter you want (in this case the comma). The final touch is sending it to a file of your choosing.

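*Note* - You actually have to type Ctrl-V Ctrl-I when entering this command; copy/paste won’t cut it in the example above. Alternatively, tr '\t' ',' does the same job and survives copy/paste, since tr understands the \t escape.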

*Note* - You typically do not want to enter your mysql password on the command line, as the commands you run are typically logged. Omit the password to force mysql to ask for it (it won’t interfere with the query). And if you don’t have your mysql access password protected, WTF? You’re asking for trouble.
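One caveat with the tr approach: it swaps every tab for a comma but never quotes anything, so a value that itself contains a comma will split into two columns. Below is a minimal sketch, in Javascript, of a converter that quotes such fields. The function name tsvToCsv is made up, and the quoting rule (wrap in double quotes, double any embedded quotes) is the common CSV convention rather than anything mysql-specific.

```javascript
// Convert tab-delimited text to CSV, quoting fields that need it.
function tsvToCsv(tsv) {
  return tsv
    .split('\n')
    .map(function (line) {
      return line
        .split('\t')
        .map(function (field) {
          // Quote fields containing a comma, a double quote, or a newline.
          if (/[",\n]/.test(field)) {
            return '"' + field.replace(/"/g, '""') + '"';
          }
          return field;
        })
        .join(',');
    })
    .join('\n');
}

var csv = tsvToCsv('name\temail\nSmith, Jo\tjo@example.com');
// csv is: name,email
//         "Smith, Jo",jo@example.com
```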

So there you have it. Simple, easy to follow, effective. As always, this example can be extended in a variety of ways. It’s up to you to figure it out (you can, of course, pay me to figure it out).

Portland is fun

Saturday, December 8th, 2007

While the weather has not cooperated and given me many sunny days yet, I have been enjoying exploring the cyber-presence Portland has, in particular the Craigslist offerings. Today, however, brings a new entry in the NW Nerdery: a movement to rename 42nd Ave to Douglas Adams Blvd. The site, rename42nd.org, is making a serious effort to have 42nd Ave renamed in honor of Douglas Adams, most notable for authoring the Hitchhiker’s Guide to the Galaxy series. My inner geek smiles wide for this effort and hopes they succeed. If you are a Portlander, support the movement. I know I am…

Airplanes!

Tuesday, June 19th, 2007

It has been a while since a good time wasting game came across my browser. This one is fun, but once you learn the secret, it’s less challenging and more luck to get the super high distances.

Throw a paper airplane and see where you stand. Me, I’m currently 5608 globally with a distance of 114.452m.

[Update 6/19/2007 5:31 CST] 114.717m for a global ranking of 2766.

cribbage.erl

Monday, May 28th, 2007

Writing a Cribbage scorer in Erlang sounds like a fun little project: it helps me learn Erlang while bringing a game I like in real life to the virtual world. Is it the most efficient? Probably not, but you gotta start somewhere. To the code!

The first thing I wanted to do was create a method to calculate points. An ace is a 1, 2-10 are face value, and Jack, Queen, King are 11, 12, 13, respectively. Easy adjustments could be made to allow characters (A, J, Q, K) but for now, I like keeping it simple.

-module(cribbage).
 
-export([points/1]).
 
points([]) -> 0;
points(L) ->
    Hand = lists:sort(L),
    fifteens(Hand, 0) + runs(Hand, 0, 0) + pairs(Hand, 1, 0).

The above creates a module called cribbage and exports a function called points/1 which takes one parameter, a list of cards. There are three kinds of scoring in Cribbage: combinations of cards that equal 15, runs of three or more, and pairs (or sets or four of a kind). There is one other kind, but it’s not part of this portion of the game.

cardval(C) when C > 9 -> 10;
cardval(C) -> C.
 
fifteens(_L, Total) when Total > 15 -> 0;
fifteens(_Hand, Total) when Total =:= 15 -> 2;
fifteens([], _Total) -> 0;
fifteens([H | T], Total) when Total < 15 ->
    fifteens(T, Total) + fifteens(T, Total + cardval(H)).

cardval is a function that converts the value of face cards (11-J,12-Q,13-K) to 10 and leaves other cards unchanged in value. This is useful in finding all combinations of 15 in the hand. When a combo equals 15, two points are added to the score.
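For readers who do not speak Erlang, the same include-or-skip recursion can be sketched in Javascript. This is a translation for illustration, not part of the module, and the sample hands are made up.

```javascript
// Face cards (11, 12, 13) count as 10; everything else counts as itself.
function cardval(c) {
  return c > 9 ? 10 : c;
}

// For each card, try both skipping it and adding it to the running total.
// Every combination that lands exactly on 15 is worth two points.
function fifteens(hand, total = 0) {
  if (total > 15) return 0;
  if (total === 15) return 2;
  if (hand.length === 0) return 0;
  const [h, ...t] = hand;
  return fifteens(t, total) + fifteens(t, total + cardval(h));
}

const sevenEight = fifteens([7, 8]);           // one combination: 2 points
const fivesAndJack = fifteens([5, 5, 5, 11]);  // 5+5+5 and three 5+J: 8 points
const nothing = fifteens([1, 2, 3]);           // no combination reaches 15: 0 points
```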

Runs were the trickiest of the three to get right. First, I defined a simple function to determine the points for a run of given length.

run(3) -> 3;
run(4) -> 6;
run(5) -> 12;
run(_Length) -> 0.

Some people may play with different values for runs of different lengths, so this allows for easy editing.

Runs come in two flavors: 1) A normal run, and 2) A run where one or two of the cards are doubled. To account for this, I have runs/3 and runs/4. runs/3 handles the first case, and passes control to runs/4 when a run of the second case is encountered. Another special case is when a run has two different cards doubled (e.g. 3,4,4,5,5) where the run of three is doubled and doubled again.

%% two cases for runs
%%   1. A straight run - 4,5,6,7
%%   2. A run with a double in the sequence - 4,4,5,6 or 4,5,5,6
runs([], _Curr, Len) -> run(Len);
runs([H | T], Curr, Len) when H =:= (Curr+1) -> runs(T, H, Len + 1);
runs([H | T], Curr, Len) when H =:= Curr -> runs(T, Curr, Len, {H, 2});
runs([H | T], _Curr, Len) -> run(Len) + runs(T, H, 1).
 
runs([], _Curr, Len, {_Card, Mult}) -> Mult * run(Len);
runs([H | T], Curr, Len, {Card, Mult}) when H =:= (Curr+1) ->
    runs(T, H, (Len+1), {Card, Mult});
%% needed for special cases where multiple cards are doubled up
%% like 3,4,4,5,5
runs([H | T], Curr, Len, {Card, Mult}) when H =:= Curr, H > Card ->
    runs(T, Curr, Len, {H, (Mult*2)});
%% handles a triple carding, like 2,2,2,3,4
runs([H | T], Curr, Len, {Card, Mult}) when H =:= Curr ->
    runs(T, Curr, Len, {Card, (Mult+1)});
runs([H | T], _Curr, Len, {_Card, Mult}) ->
    (Mult * run(Len)) + runs(T, H, 1).

For pairs, I do a similar thing: define a pair(Length) function that returns the point value given a number of similar cards. But it’s all pretty straightforward.

pairs([], Pairs, _Curr) -> pair(Pairs);
pairs([H | T], Pairs, Curr) when H =:= Curr -> pairs(T, Pairs+1, Curr);
pairs([H | T], Pairs, _Curr) -> pair(Pairs) + pairs(T, 1, H).
 
pair(2) -> 2;
pair(3) -> 6;
pair(4) -> 12;
pair(_Length) -> 0.
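The pair counting can also be sketched in Javascript as a single pass over the sorted hand, using the same point values as pair/1. The names pairPoints and countPairs are mine, and the loop is a restatement of the recursion above rather than a line-by-line port.

```javascript
// Points for a group of identical cards: pair, three of a kind, four of a kind.
function pairPoints(groupSize) {
  const table = { 2: 2, 3: 6, 4: 12 };
  return table[groupSize] || 0;
}

// Walk a sorted hand, counting how many identical cards sit next to each other.
function countPairs(sortedHand) {
  let total = 0;
  let groupSize = 0;
  let current = null;
  for (const card of sortedHand) {
    if (card === current) {
      groupSize += 1;
    } else {
      total += pairPoints(groupSize); // close out the previous group
      current = card;
      groupSize = 1;
    }
  }
  return total + pairPoints(groupSize); // close out the last group
}

const pairAndTriple = countPairs([4, 4, 5, 5, 5]); // 2 + 6 = 8
const fourOfAKind = countPairs([7, 7, 7, 7]);      // 12
const noPairs = countPairs([2, 3, 4]);             // 0
```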

That’s it for now. Actual game play to come. You can get the code here.

Remove nested arrays in javascript using the prototype library

Tuesday, May 22nd, 2007

I have been playing with some drawing code in javascript, storing coordinates and using them later on in the application. My list of coordinates is of the form [ [id1, x1, y1, width1, height1], [id2, x2, y2, width2, height2],…]. A requirement of the application is that a user can delete a set of coordinates from the list. Using prototype.js, I created a simple function to remove the nested array based on the id.

// remove an array from the list based on the id
function remove(id, list) {
    return $A(list).map(
        function(arr) {
            if ( $A(arr).first() == id ) { return ; }
            else { return arr; }
        }).compact();
}

In your favorite editor, this function can be a one-liner, but spacing helps here for clarity and formatting on the page. Onward!

So what’s happening? The first thing we do is wrap list with the $A() call to ensure we have access to the extensions prototype gives us for arrays (I’m calling the parameter a list because I’m on an Erlang kick and it has infiltrated my core!). Once extended, we call the map function to iterate through the list and apply the supplied function to each element in the list (in this case it is a list of arrays, so each element passed to the supplied function will be an array as well).

Within the supplied function, we are dealing with a single array of the form [id, x, y, width, height], so $A(arr).first() returns the id of the array. This value is compared to the value of the id parameter and if it matches, returns nothing, or ‘undefined’ in Javascript. If the ids don’t match, it returns the array unaltered. As the map function iterates through the list, a new list is created containing the results of the supplied function. So the return value of the map function call is an array. We then call the compact function on the resulting array, which removes any undefined values from the array, essentially leaving only those arrays that did not have the id passed in.
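For comparison, here is a sketch of the same idea without prototype.js, using the built-in Array filter method. This swaps map plus compact for a single filter call; the name removeById and the sample coordinates are invented for the example.

```javascript
// Keep every nested array whose first element (the id) does not match.
function removeById(id, list) {
  return list.filter(function (arr) {
    return arr[0] != id;
  });
}

var coords = [
  ['a', 0, 0, 10, 10],
  ['b', 5, 5, 20, 20],
  ['c', 9, 9, 30, 30]
];

var remaining = removeById('b', coords); // the 'a' and 'c' entries
```

Like the map and compact version, this returns a new list and leaves the original untouched.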

This function is fairly specialized, since the requirements for it are quite specific. A more general function could be written, but that is an exercise left to the reader.

Recursive FTP

Wednesday, May 9th, 2007

So you want to download some files from an ftp server, but they are contained in more than one subdirectory. With a straight ftp client, you would have to recurse through all of the directories and mget each directory’s contents manually. Never fear, though, there is a little utility that can help - wget.

> wget -r ftp://user:pass@ftpsite.com/directory

If the ftp site allows anonymous logins, you can omit the user:pass portion. This will get everything down to wget’s default recursion depth of 5 levels (add -l inf to lift the limit)…it is left as an exercise to the reader to customize the command.