Archive for the ‘Work’ Category

Converting a site to use Cachefly for static content

Tuesday, February 24th, 2009

I recently needed to move static content from a live site to a Cachefly account. Rather than comb through the directories looking for the resources (JS/CSS/images) I needed to FTP, I thought, “Man, this sure sounds like it could be automated.”

The first step was to collect a list of items that needed FTPing to Cachefly. I know what you’re saying: “Use find!” In case Ben is reading this, find will “search for files in a directory hierarchy” (that’s from find’s man page, Ben). I wanted to keep the resource types separate, so I ran three different invocations.

For JavaScript and CSS files, the invocations were nearly identical:

find . -name '*.js' > js.libs
find . -name '*.css' > css.libs

Images were a little trickier. Most of the images are static content, but some are user-generated, likely to change or be removed. These do not go up to the CDN (at least for now). The user-generated content is located under one directory (call it /images/usergen), so we simply need to exclude it from find’s search.

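find . -path '*images/usergen*' -prune -o -type f \( -iname '*.gif' -o -iname '*.jpg' -o -iname '*.png' \) -print > image.files

The important parts:

  • .

    Start the search in the current directory (the root of the project).

  • -path '*images/usergen*' -prune

    Don’t descend into anything whose path contains images/usergen, so none of the user-generated files end up in the list.

  • -o -type f \( ... \)

    Otherwise (-o), look at regular files. The escaped parentheses group the three name tests; without them, find’s precedence rules (an implied -a binds tighter than -o) split the tests apart and the expression no longer means what it reads like.

  • -iname '*.gif' -o -iname '*.jpg' -o -iname '*.png'

    Match, case-insensitively (-iname instead of -name), any files ending in gif, jpg, or png.

  • -print

    Print only the matches. Without an explicit action, find prints everything the whole expression is true for, which would include the pruned images/usergen directory itself.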
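For comparison, here’s a rough Node.js sketch of the same prune-then-match walk, using only the built-in fs and path modules. listImages and its default arguments are hypothetical names for illustration; the find command is what was actually used.

```javascript
const fs = require("fs");
const path = require("path");

// Sketch of: find . -path '*images/usergen*' -prune -o -type f \( -iname ... \) -print
// Walks `root`, skips anything whose relative path contains `prunePart`
// (without descending into it), and collects files with a matching extension.
function listImages(root, prunePart = "images/usergen", exts = [".gif", ".jpg", ".png"]) {
  const found = [];
  const walk = (dir) => {
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      const full = path.join(dir, entry.name);
      // Relative path with forward slashes, like find's output minus the leading "./"
      const rel = path.relative(root, full).split(path.sep).join("/");
      if (rel.includes(prunePart)) continue; // the -prune branch
      if (entry.isDirectory()) {
        walk(full);
      } else if (entry.isFile() && exts.includes(path.extname(entry.name).toLowerCase())) {
        found.push(rel); // extension compared case-insensitively, like -iname
      }
    }
  };
  walk(root);
  return found.sort();
}
```

Calling listImages(".") from the project root returns paths relative to that root, ready to be written out one per line.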

We are then left with three files, each line of which contains the path, relative to the project root, of a resource I want to upload. I wrote a simple PHP script to upload the images to Cachefly, preserving their paths. So an image with the relative path /images/header/header_left.png is now accessible at instance.cachefly.com/images/header/header_left.png.
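The “preserving their paths” part boils down to two small jobs: turn each listed path into its CDN location, and make sure every parent directory exists remotely before uploading into it. Here’s a sketch of just that bookkeeping in Node.js; the upload itself is left out. cdnUrl, remoteDirs, normalize, and the http:// base URL are assumptions for illustration, with the host taken from the example above.

```javascript
// Assumed base URL, built from the example host in the text
const CDN_BASE = "http://instance.cachefly.com";

// Strip find's leading "./" (or a leading "/") so paths are relative to the project root
function normalize(relPath) {
  return relPath.trim().replace(/^\.?\//, "");
}

// "./images/header/header_left.png" -> CDN_BASE + "/images/header/header_left.png"
function cdnUrl(relPath, base = CDN_BASE) {
  return `${base}/${normalize(relPath)}`;
}

// Every directory that must exist on the CDN side, parents before children
function remoteDirs(relPaths) {
  const dirs = new Set();
  for (const p of relPaths) {
    const parts = normalize(p).split("/").slice(0, -1); // drop the file name
    for (let i = 1; i <= parts.length; i++) {
      dirs.add(parts.slice(0, i).join("/"));
    }
  }
  // Sort by depth so a parent is always created before its children
  return [...dirs].sort((a, b) => a.split("/").length - b.split("/").length || a.localeCompare(b));
}
```

Feeding it the lines of js.libs, css.libs, and image.files gives the directories to create first and the URL each file should end up at.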

So the images are now up on the CDN. Now we need our code to point there as well. Fortunately, most of the resources were prefixed with a domain (stored in the global $live_site). So the src attribute of an image, for instance, would be src="<?= $live_site ?>/images/header/header_left.png". After creating a $cachefly_site global, we only need to find lines in our code with the basic layout “stuff...$live_site...stuff...png”, where stuff is (.*) in regex land. So we use two commands, find and egrep: find locates the files we want, and egrep searches those files for a regex that identifies the resource references in the code.

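So first, we build the regex. We know a couple of elements that need to be present, and one that should not be. Needed are live_site and a resource extension (js/css/jpg/png/gif); not wanted is the images/usergen path, as it points to user-generated content. The tempting regex is:

'live_site([^images/usergen])+.+(png|gif|jpg|css|js)'

but [^images/usergen] doesn’t do what it looks like. It’s a bracket expression, matching a single character that is not one of i, m, a, g, e, s, /, u, r, or n, so a line referencing a file under images/usergen still matches. egrep has no “not followed by this string” operator, and excluding whole files wouldn’t be right anyway, since one file can reference both kinds of images. So the pattern only looks for the required pieces, and usergen references have to be skipped line by line when the replacement is made:

'live_site.*\.(png|gif|jpg|css|js)'

This is the argument for egrep (the -l switch prints the names of files that have a match, rather than the matching lines):

egrep -l 'live_site.*\.(png|gif|jpg|css|js)'

Now we need to tell egrep which files to search, using find:

find . -name "*.php" -exec egrep -l 'live_site.*\.(png|gif|jpg|css|js)' {} \;

We then store this list of files in a shell variable:

export FILES=`find . -name "*.php" -exec egrep -l 'live_site.*\.(png|gif|jpg|css|js)' {} \;`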
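A quick Node.js check makes the bracket-expression problem concrete (JavaScript regexes treat [^...] as a one-character class too), along with a line-level test that does exclude usergen references. belongsOnCdn and the sample lines are hypothetical.

```javascript
// [^images/usergen] is a character class: ONE character that isn't i, m, a, g, e, s, /, u, r or n
const naive = /live_site([^images\/usergen])+.+(png|gif|jpg|css|js)/;

// Only the required pieces: live_site, then later a resource extension
const resource = /live_site.*\.(png|gif|jpg|css|js)/;

// Hypothetical per-line rule: move the reference to the CDN only if it is a
// resource and does not point into the user-generated images directory
function belongsOnCdn(line) {
  return resource.test(line) && !line.includes("images/usergen");
}

const staticLine = 'src="<?= $live_site ?>/images/header/header_left.png"';
const usergenLine = 'src="<?= $live_site ?>/images/usergen/photo.png"';

console.log(naive.test(usergenLine)); // true: the naive pattern does not exclude usergen
console.log(belongsOnCdn(staticLine), belongsOnCdn(usergenLine)); // true false
```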

Now that we have the files we need, we can replace $live_site with $cachefly_site for resources. The go-to command for search and replace is sed. Generically, the sed command looks like this:

sed -i 's/search/replace/g' FILE

We actually have two issues, though. Due to the nature of the code, we have to account for $live_site being imported into functions via the global keyword. So not only are we replacing resource references, but we also have to add $cachefly_site to the global lines to make sure $cachefly_site is defined within each function that generates output.

Searching and replacing resource references is pretty easy:

sed -i '/images\/usergen/!{/live_site.*\(js\|css\|gif\|png\|jpg\)/s/live_site/cachefly_site/g;}' $FILES

$FILES, of course, came from our find/egrep call earlier. There is one catch to the regex used here. It is actually of a different generic form than mentioned above:

sed -i '/contains/s/search/replace/g' FILE

With this format, we put a condition on whether to replace text: the regex in the “contains” portion must match a line before the search and replace is performed on that line.
So our sed above says: if the line contains live_site, followed by anything, followed by one of the listed resource extensions, then replace live_site with cachefly_site. (\| means OR in GNU sed; the \( \) group keeps the alternatives together, since an ungrouped \| would split the whole pattern into separate alternatives such as live_site.* OR js OR css.) The outer /images\/usergen/! condition, where ! means “not”, leaves alone any line that points at user-generated content. I left off the $ since it’s common to both variables.
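The “contains” form is easy to model outside sed. Here’s a Node.js sketch of the same rule applied line by line: lines matching skip are left alone, lines matching contains get every match of search replaced, and everything else passes through. conditionalReplace is a hypothetical name, and the contains and skip regexes should not use the g flag, since .test() on a global regex keeps state between calls.

```javascript
// Line-by-line equivalent of: sed '/skip/!{/contains/s/search/replace/g;}'
function conditionalReplace(text, contains, search, replacement, skip = null) {
  // Global copy of `search` so every occurrence on a line is replaced (sed's g flag)
  const everywhere = new RegExp(search.source, search.flags.includes("g") ? search.flags : search.flags + "g");
  return text
    .split("\n")
    .map((line) => {
      if (skip && skip.test(line)) return line; // the /skip/! part
      return contains.test(line) ? line.replace(everywhere, replacement) : line;
    })
    .join("\n");
}

const page = [
  "global $foo, $live_site;",
  '<img src="<?= $live_site ?>/images/header/header_left.png">',
  '<img src="<?= $live_site ?>/images/usergen/photo.png">',
].join("\n");

// Only the header_left.png line changes
console.log(
  conditionalReplace(page, /live_site.*(js|css|gif|png|jpg)/, /live_site/, "cachefly_site", /images\/usergen/)
);
```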

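Running the sed command replaces everything nicely, but when we reload the page, we see notices about $cachefly_site being undefined, and resources are still pulled from the host rather than Cachefly (an undefined variable prints as an empty string, so the URLs become relative). The functions only import $live_site via global, so we need to handle the global importing.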

This one is a little trickier because we are not really replacing live_site with cachefly_site, but appending it to the list of imported globals. So a line like

global $foo, $bar, $live_site, $baz;

becomes

global $foo, $bar, $live_site, $cachefly_site, $baz;

The other trick is that the global line should not already contain $cachefly_site. We don’t need that redundancy. So, without further ado, the sed:

sed -i '/global.*live_site/{/cachefly_site/!s/live_site/live_site, $cachefly_site/g;}' $FILES

The “contains” portion matches the keyword global, followed by stuff, followed by live_site. Inside the braces, /cachefly_site/! restricts the substitution to lines that don’t already mention cachefly_site, so running the command twice won’t add a duplicate. (Writing the exclusion as \(cachefly_site\)\{0\} inside a single pattern doesn’t work: matching something exactly zero times always succeeds, so it excludes nothing.)
The “search” portion is easy; search for live_site. The replace portion replaces live_site with live_site, $cachefly_site. Since the new variable is inserted right after live_site, whatever followed it (a comma or a semicolon) still follows it, so we don’t get syntax errors.
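The same two conditions (the line is a global import of $live_site, and it doesn’t already mention $cachefly_site) look like this as a Node.js sketch; addCacheflyGlobal is a hypothetical name. Running it twice on a line gives the same result as running it once, which is the point of the second condition.

```javascript
// Append $cachefly_site after $live_site on `global` lines that don't import it yet
function addCacheflyGlobal(line) {
  const importsLiveSite = /global.*live_site/.test(line);
  const alreadyHasIt = /cachefly_site/.test(line);
  if (!importsLiveSite || alreadyHasIt) return line;
  // In a JS replacement string "$$" produces a literal "$"
  return line.replace(/live_site/g, "live_site, $$cachefly_site");
}

const before = "global $foo, $bar, $live_site, $baz;";
const once = addCacheflyGlobal(before);
console.log(once); // global $foo, $bar, $live_site, $cachefly_site, $baz;
console.log(addCacheflyGlobal(once) === once); // true
```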

And that is basically how I converted a site to use cachefly for static content.

Writing Excel Spreadsheets Using PHP

Thursday, July 24th, 2008

When using the Spreadsheet_Excel_Writer library from the PEAR repository, I came across an issue I didn’t see handled in the docs (as of this writing, I am using Spreadsheet_Excel_Writer 0.9.1 beta).

My application creates spreadsheets that contain order information. Part of each row is a list of up to 20 ISBNs and the quantity desired of each. The issue was how to handle ISBNs with a leading zero. When I first looked through the PEAR docs for the library, the Worksheet method writeString looked like the solution. However, while the leading zero was kept, the cell’s format was still numeric. As a result, the application receiving the generated xls dropped the zero, leaving an invalid ISBN.

Looking over the internals of the Worksheet::writeString method didn’t reveal an undocumented feature that would ensure a cell was read as text, regardless of its contents. I next looked at the Format::setNumFormat method as I knew it contained ways to format the number as currency, timestamp, fractions, etc. You could then pass this Format object as the optional fourth parameter to the Worksheet::write method.

The Format::setNumFormat docs link to the OpenOffice.org documentation of the Excel file format (a PDF). Interested in how exactly the file was structured, I read on. What I learned that was directly applicable is that each cell contains a pointer to a format definition, or XF record, and it is this XF record where formatting is stored. From the doc, section 4.6:

All cell formatting attributes are stored in XF records…The cell records themselves contain an index into the XF record list. This way of storing cell formatting saves memory and decreases the file size.

So if two cells use the same formatting, as the ISBN columns would, each cell contains a pointer to the same XF record, which tells Excel the cell is text. Section 4.6.1 lists the 6 groups of formatting attributes, the first of which is the number format, stored as an index to a FORMAT record. Okay, we’re on to something here. Further into the PDF, section 5.49 defines the FORMAT record. Lo and behold, the table of formats from the setNumFormat page is listed in the PDF, but the PEAR listing turns out to be incomplete. Scanning the complete table in the PDF, we find index 49, type Text, format string ‘@’. Bingo.

Our code for forcing numeric-looking data to be stored as text goes a little something like this (modified from the PEAR example code):

require_once 'Spreadsheet/Excel/Writer.php';

// Passing a file name writes the workbook to disk ('isbn_test.xls' is just an example name)
$workbook = new Spreadsheet_Excel_Writer('isbn_test.xls');
$worksheet =& $workbook->addWorksheet();

// '@' is Excel's number format string for Text
$text_format =& $workbook->addFormat();
$text_format->setNumFormat('@');

$worksheet->write(0, 0, "Without formatting");
$worksheet->write(0, 1, '0123'); // cell contains 123

$worksheet->write(1, 0, "With formatting");
$worksheet->write(1, 1, '0123', $text_format); // cell contains 0123

// Finish the workbook and write it out
$workbook->close();

To verify, generate the xls and open it. Right-click each cell and choose Format Cells: the first cell is formatted as a general number, and the second as text.

The meta-moral is to read the docs and follow references to get at the source material. Had I not opened the pdf, it may have been a few more time units finding the information on Google. Plus, I learned a lot more about an important file format. I can sleep easy knowing I’m that much more knowledgeable.

Announcements

Monday, April 28th, 2008

Thought I’d go ahead and announce, mainly to myself, that I will be working through SICP. The rub…doing it in Javascript. Seems as though most other languages are covered (I know Erlang is taken) and since I am doing an increasingly large amount of Javascript, coupled with the eventual prevalence of server-side Javascript, I figured it best to start getting intimate. What I like about this task is that since SICP has been so widely covered on the web, I have many resources to aid in better understanding the material (and it is some thick material). Anyway, I’ve begun chapter one and will post the chapters, as well as excerpts I find interesting, in no pre-defined timeframe.

Oh yeah, and I’m engaged.

More Wget

Thursday, April 10th, 2008

It’s hard to overstate the usefulness and robust feature sets of most GNU tools. Today, I’ll mention one such tool, wget, and a novel use of it.

As I go through my work, I find that sites we agree to take over have little structure. They were generally slapped together a long time ago, with little thought to organization, using Dreamweaver or, Stallman forbid, FrontPage. I’m not judging; as long as something looks okay in the browser, a company can proclaim, “We’re on the intarwebs!” However, tracking down all of their pages to be converted into a CMS, for instance, can be time consuming. Not wanting to waste a client’s money searching through the source for links and images and then manually reconstructing the layout of the files, I fell back on my trusty GNU tool, wget. (I also didn’t have FTP access, and I knew there were dead pages I didn’t want to resurrect. Using wget in this case let me retrieve only the pages still linked from the main page.)

Here’s a variation of the incantation of wget I used:

wget -r -A '*.htm*, *.jpg, *.png, *.gif' -l 3 http://www.example-site.com

What’s it all mean?
-r: retrieve recursively.
-A: takes a comma-separated list of patterns for files to accept (use -R to reject). In this case, we want all htm and html files, plus the common image formats.
-l: how far down the rabbit hole to venture. I started with 1, so only links from the first page were followed. I then tried 2, also following links one level below those pages, and compared the resulting structure. Trying 3, I found no difference between the results for 3 and 2, meaning all links had been followed and accounted for.
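Why does identical output at depths 2 and 3 mean you’re done? Each extra level only adds pages linked from the pages first found at the previous level; if a level adds nothing, no deeper level can either. Here’s a toy Node.js model of a depth-limited crawl over a made-up link map. It is not wget’s actual implementation, and reachable and the link map are hypothetical.

```javascript
// Toy model of a recursion depth limit: pages are nodes, links are edges,
// and the start page sits at depth 0. Returns every page within maxDepth hops.
function reachable(links, start, maxDepth) {
  const seen = new Set([start]);
  let frontier = [start];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next = [];
    for (const page of frontier) {
      for (const target of links[page] || []) {
        if (!seen.has(target)) {
          seen.add(target);
          next.push(target);
        }
      }
    }
    frontier = next; // pages first reached at this depth
  }
  return [...seen].sort();
}

// Made-up link map
const links = {
  "index.html": ["about.html", "news.html"],
  "news.html": ["otherdir/index.html"],
  "about.html": ["index.html"],
};
for (const depth of [1, 2, 3]) {
  console.log(depth, reachable(links, "index.html", depth).length);
}
```

Depths 2 and 3 both report 4 pages here, which is the signal to stop raising the limit.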

The result:
A directory called www.example-site.com that contains the files in their layout on the server. Now I knew which pages needed converting and which images to add to the new site.

A side note: A handy way to see the layout of your newly downloaded directory is to use the tree command.

tree www.example-site.com/

will display something like this:

www.example-site.com/
|-- about.html
|-- calendar.html
|-- committees.html
|-- contact.html
|-- otherdir
|   `-- index.html
|-- images
|   |-- header.gif
|   |-- logo.gif
|   `-- spacer.gif
|-- index.html
|-- join.html
|-- news.html
|-- partnerships.html
`-- scoopholiday.html

Walk the Walk

Wednesday, April 2nd, 2008

I’ve had the pleasure of working on the technology that powers iliveinspired.com and know that the founders, Rob and Chris, are working their butts off to make this a great service. Armed with not much more than inspiration and determination, they are taking their service to the world on foot. Their first foray into marching marketing was to sign the Dalai Lama on as a content provider. Not only did the Dalai Lama receive them (read about their trip, starting here), but he embraced them and their message and agreed to work with the service.

Now these two are on a mission to sign Oprah and are generating some press about it. Pick up the story from their blog, and then read some of the press coverage they’ve gotten. This is a great service: there are lots of different themes to choose from, and it’s a cheap service that can provide a lot of value. Why else would they offer the first 45 days free of charge?

What’s keeping you from living inspired today? Visit I Live Inspired and start receiving daily inspiration on your phone.

MySQL -> CSV

Wednesday, January 30th, 2008

Always on the lookout for increases in efficiency, I love when I find a slick snippet of command-line goodness that makes a hard-sounding task simple and quick. I was tasked with creating an email list from a database and putting it into CSV format, and had only the command line to interface with the DB. My first attempt revolved around using the SELECT … INTO OUTFILE syntax. Unfortunately, I was unable to write out to a file with the DB user I had access to. What’s a fella to do?

Unix pipes to the rescue.

First, the whole command:

echo "select * from example;" | mysql -u user -p dbname | tr '\t' ',' > filename.csv

Let’s break this down, in case Ben is reading and can’t follow along. The echo statement contains your query. It is sent to the mysql command, which connects you to the database, executes the query, and writes the results out in tab-delimited format. The tr command reads from STDIN and replaces tabs ('\t') with whatever delimiter you want (in this case the comma). The final touch is sending the output to a file of your choosing.

*Note* – If your tr doesn’t understand the \t escape, type a literal tab between the quotes instead by pressing Ctrl-V, then Ctrl-I. Copy/paste won’t cut it.

*Note* – You generally don’t want to type your mysql password on the command line, since shell commands are typically logged in your history. Giving -p without a password makes mysql prompt for it (the prompt won’t interfere with the query). And if you don’t have your mysql access password protected, WTF? You’re asking for trouble.
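One caveat with tr: it only swaps the delimiter, so a field that itself contains a comma (a name like “Smith, Bob”) would spill into an extra column. If that matters, the conversion needs to quote such fields, per the usual CSV convention (RFC 4180). Here’s a Node.js sketch of that conversion; tsvToCsv is a hypothetical name, and it assumes fields contain no literal tabs or newlines.

```javascript
// Convert tab-separated rows to CSV, quoting fields that contain a comma,
// a double quote, or a line break, and doubling any embedded quotes.
function tsvToCsv(tsv) {
  const quote = (field) =>
    /[",\r\n]/.test(field) ? `"${field.replace(/"/g, '""')}"` : field;
  return tsv
    .split(/\r?\n/)
    .filter((line) => line.length > 0) // skip blank lines, e.g. after the final newline
    .map((line) => line.split("\t").map(quote).join(","))
    .join("\n");
}

console.log(tsvToCsv('email\tname\nbob@example.com\tSmith, Bob\nann@example.com\tsay "hi"\n'));
```

Wrapping it in a small script that reads STDIN and writes STDOUT would let it take tr’s place at the end of the pipe.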

So there you have it. Simple, easy to follow, effective. As always, this example can be extended in a variety of ways. It’s up to you to figure it out (you can, of course, pay me to figure it out).

7 Days, 7 Nights, 7 Minutes

Friday, October 19th, 2007

One of the projects I’ve been helping get off the ground is I Live Inspired, an inspirational text-messaging service. The site is good, and the concept is great. One can never have too much inspiration.

The founders, Rob and Chris, are on a mission. They are seeking an audience with the Dalai Lama in Indiana and are walking 7 days in hopes of getting a 7-minute audience. They are also keeping a blog of their adventures. These guys are not much older than me and are trying to get a positive service off the ground. Whether or not they get their audience with the Dalai Lama, the experience of the walk, the people they’ve met thus far, and the people they’ve yet to meet will change their lives. And we get to share in that through their writing.

So take a minute, read up on what the future of America is up to, and if you want some extra inspiration delivered to your phone daily, consider signing up for one of the many great communities at I Live Inspired.

* Disclaimer: While I have helped create the site, I do not receive any compensation for spreading the word. I think it’s a great service and deserves notice.

Remove nested arrays in javascript using the prototype library

Tuesday, May 22nd, 2007

I have been playing with some drawing code in javascript, storing coordinates and using them later on in the application. My list of coordinates is of the form [ [id1, x1, y1, width1, height1], [id2, x2, y2, width2, height2],…]. A requirement of the application is that a user can delete a set of coordinates from the list. Using prototype.js, I created a simple function to remove the nested array based on the id.

// remove an array from the list based on the id
function remove(id, list) {
    return $A(list).map(
        function(arr) {
            if ( $A(arr).first() == id ) { return ; }
            else { return arr; }
        }).compact();
}

In your favorite editor, this function can be a one-liner, but spacing helps here for clarity and formatting on the page. Onward!

So what’s happening? The first thing we do is wrap list with the $A() call to ensure we have access to the extensions prototype gives us for arrays (I’m calling the parameter a list because I’m on an Erlang kick and it has infiltrated my core!). Once extended, we call the map function to iterate through the list and apply the supplied function to each element in the list (in this case it is a list of arrays, so each element passed to the supplied function will be an array as well).

Within the supplied function, we are dealing with a single array of the form [id, x, y, width, height], so $A(arr).first() returns the id of the array. This value is compared to the value of the id parameter; if it matches, the function returns nothing, which is ‘undefined’ in JavaScript. If the ids don’t match, it returns the array unaltered. As the map function iterates through the list, it builds a new array containing the results of the supplied function, and that array is map’s return value. We then call the compact function on it, which removes any null or undefined values, leaving only those arrays whose id doesn’t match the one passed in.

This function is fairly specialized, since its requirements are specific. A more general function could be written, but that is an exercise left to the reader.
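For comparison, here’s the same removal in plain JavaScript without prototype.js: Array.prototype.filter does the map-then-compact in a single pass. remove mirrors the original function; removeWhere is a hypothetical take on the more general version.

```javascript
// Keep every [id, x, y, width, height] entry whose id doesn't match.
// Uses != to mirror the loose == comparison in the prototype.js version.
function remove(id, list) {
  return list.filter((arr) => arr[0] != id);
}

// General version: drop every element for which predicate(element) is true
function removeWhere(list, predicate) {
  return list.filter((item) => !predicate(item));
}

const boxes = [
  [1, 0, 0, 10, 10],
  [2, 5, 5, 20, 20],
  [3, 1, 1, 2, 2],
];
console.log(remove(2, boxes)); // ids 1 and 3 remain
console.log(removeWhere(boxes, ([, , , width]) => width > 5)); // only id 3 remains
```

One difference: unlike the $A()/compact() version, this one assumes every entry is an array, so a null entry in the list would throw instead of being dropped. Like the original, it returns a new array and leaves the input untouched.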

Recursive FTP

Wednesday, May 9th, 2007

So you want to download some files from an ftp server, but they are contained in more than one subdirectory. With a straight ftp client, you would have to recurse through all of the directories and mget each directory’s contents manually. Never fear, though, there is a little utility that can help – wget.

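wget -r ftp://user:pass@ftpsite.com/directory

If the FTP site allows anonymous logins, you can omit the user:pass@ portion. The files are saved in the current directory, under a directory named after the host (ftpsite.com/directory). This gets everything down to wget’s default recursion depth of five levels (add -l inf to remove the limit)…it is left as an exercise to the reader to customize the command further.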

Drop down menus

Thursday, April 26th, 2007

As we all know (actually, many probably don’t), Internet Explorer has claimed many hours of developer time spent getting features to work around its quirks. One quirk I dealt with recently is the :hover pseudo-class and its implementation across browsers. The most notable problem is that IE (version 6, at least) only supports :hover on anchor tags (<a>). What’s a fella to do when he wants a drop-down menu that displays the sub-menu when the mouse hovers over an li element? Write some JavaScript to help IE render the drop-down effect properly.

The first draft of our menu is here. If you are unfortunate enough to be using IE, you probably won’t see the sub-menu items. So how do we negotiate this? With a little extra class, and some Javascript.

The second draft of our menu can be found here. The differences to note:

  • The li:hover rule is now accompanied by a li.over as well
  • The function fixHover()
  • The function init()

So we added a rule that says any ul whose parent li has a class of over gets the same styling as a ul whose parent li is in the :hover state; in this case, display the nested ul. Next, we added a function (fixHover) that takes an element and retrieves all of its immediate child nodes. We then iterate through the children, observing two events, “mouseover” and “mouseout”, on each one. On “mouseover”, add the class name “over” to the element; on “mouseout”, remove it. The essence here is that :hover is the CSS equivalent of observing the “mouseover” and “mouseout” events. The drawback to our solution is that if JavaScript is turned off, the sub-menus remain hidden from IE users.

NOTE: I am not a designer, so the purpose of this article is merely to illustrate how to apply hover-type functionality to any element on the page in IE, not to showcase my ability to make things look nice.

Another feature to mention is the init function and the Event.observe() call, which calls the init function after the window has finished loading the page. This is a must because we cannot apply the “mouseover” and “mouseout” event observations until the nodes have been created in the DOM. Best to leave this until the window has loaded. Both of the functions rely on the prototype.js library to retrieve the child nodes, iterate through the nodes, and attach events to the nodes. It is possible to do this without prototype or with another library, but I leave it up to the reader to translate this code to their library of choice.
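For anyone not using prototype.js, here’s a rough plain-DOM sketch of the same idea. It isn’t the code from the drafts linked above; listen, addClass, removeClass, and the "menu" id are hypothetical. It sticks to old-style syntax and falls back to attachEvent, since older IE versions (the whole reason for this exercise) lack addEventListener.

```javascript
// Plain-DOM sketch of the hover fix. Pairs with CSS along the lines of:
//   li:hover ul, li.over ul { display: block; }
function listen(el, type, handler) {
  if (el.addEventListener) el.addEventListener(type, handler, false);
  else if (el.attachEvent) el.attachEvent("on" + type, handler); // older IE
}

function addClass(el, name) {
  if ((" " + el.className + " ").indexOf(" " + name + " ") === -1) {
    el.className = el.className ? el.className + " " + name : name;
  }
}

function removeClass(el, name) {
  el.className = (" " + el.className + " ")
    .replace(" " + name + " ", " ")
    .replace(/^\s+|\s+$/g, ""); // trim without String.prototype.trim (missing in old IE)
}

// Give each element child of `parent` the "over" class while the mouse is over it
function fixHover(parent) {
  var nodes = parent.childNodes;
  for (var i = 0; i < nodes.length; i++) {
    if (nodes[i].nodeType !== 1) continue; // skip whitespace text nodes
    (function (el) {
      listen(el, "mouseover", function () { addClass(el, "over"); });
      listen(el, "mouseout", function () { removeClass(el, "over"); });
    })(nodes[i]);
  }
}

function init() {
  var menu = document.getElementById("menu"); // hypothetical id of the top-level ul
  if (menu) fixHover(menu);
}

// Wait for the page to load so the menu's nodes exist in the DOM
if (typeof window !== "undefined") listen(window, "load", init);
```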