Converting a site to use Cachefly for static content

I recently needed to move static content from a live site to a Cachefly account. Rather than go through the directories looking for the resources (js/css/images) I needed to FTP, I thought, "Man, this sure sounds like it could be automated." The first step was to collect a list of items that needed uploading to Cachefly. I know what you're saying: "Use find!" In case Ben is reading this, find will "search for files in a directory hierarchy" (that's from find's man page, Ben). I wanted to separate the resources out, so I ran three different invocations. For JavaScript and CSS, the invocation was nearly identical:

    find . -name '*.js' > js.libs
    find . -name '*.css' > css.libs

Images were a little trickier. Most of the images are static content, but some are user-generated, likely to change or be removed. These do not go up to the CDN (at least for now). The user-generated content is located under one directory (call it /images/usergen), so we simply need to exclude it from find's search.

    find . -path '*/images/usergen' -prune -o -type f \( -iname '*.gif' -o -iname '*.jpg' -o -iname '*.png' \) -print > image.files

The important parts:

  • -path '*/images/usergen' -prune

    When find reaches a directory whose path ends in images/usergen, do not descend into it.

  • -o -type f \( -iname '*.gif' -o -iname '*.jpg' -o -iname '*.png' \)

    Otherwise, match, case-insensitive (-iname instead of -name), any regular file ending in gif, jpg, or png. The escaped parentheses group the three alternatives so they are tested together.

  • -print

    Print only what matched the image test. Without an explicit -print, find applies an implicit one to the whole expression, and the pruned directory itself would end up in the list.
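A quick way to check a pruning expression like this is to run it against a throwaway tree first. The following is only a sketch: the directory and file names are invented for illustration, and it assumes mktemp is available.

```shell
# Build a scratch tree (every name below is hypothetical).
demo=$(mktemp -d)
mkdir -p "$demo/images/header" "$demo/images/usergen" "$demo/js"
touch "$demo/images/header/header_left.png" \
      "$demo/images/header/LOGO.GIF" \
      "$demo/images/usergen/avatar.jpg" \
      "$demo/js/site.js"

# Prune images/usergen, then print only image files.
# LC_ALL=C makes the sort order independent of the local locale.
found=$(cd "$demo" && find . -path '*/images/usergen' -prune -o -type f \
    \( -iname '*.gif' -o -iname '*.jpg' -o -iname '*.png' \) -print | LC_ALL=C sort)
printf '%s\n' "$found"

rm -rf "$demo"
```

Only the two files under images/header should be listed; the avatar under images/usergen and the JavaScript file are left out.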

We are then left with three files, each line of which contains the path, relative to the project root, of a resource I want to upload. I created a simple PHP script to upload the images to Cachefly, preserving their paths. So an image with relative path /images/header/header\_left.png would now be accessible at instance.cachefly.com/images/header/header\_left.png.

So the images are now up on the CDN. Now we need our code to point there as well. Fortunately, most of the resources were prepended with a domain (stored in the global $live\_site). So the src attribute of an image, for instance, would be src="<?= $live\_site ?>/images/header/header\_left.png". Creating a $cachefly\_site global, we now only need to find lines in our code that have a basic layout of "stuff…$live\_site…stuff….png", where stuff is (.*) in regex land.

So we utilize two commands, find and grep. find locates the files we want, and grep searches the found files for a regex that locates the resources in the code. So first, we build the regex. We know a couple of elements that need to be present, and one that should not be present. Needed are live\_site and a resource extension (js/css/jpg/png/gif); not wanted is the "images/usergen" path, as this points to user-generated content. So the regex becomes:

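    'live_site(?!.*images/usergen).+\.(png|gif|jpg|css|js)'

The (?!.*images/usergen) part is a negative lookahead: it rejects the match if images/usergen appears anywhere after live\_site on the line. A bracket expression such as [^images/usergen] would not do this, since it only excludes single characters from that set. Lookaheads are a Perl-style feature, so this is the arg for GNU grep's -P switch rather than egrep (the -l switch means print the names of the files that have a match, rather than the lines that match):

    grep -lP 'live_site(?!.*images/usergen).+\.(png|gif|jpg|css|js)'

Now we need to tell grep which files to search, using find:

    find . -name '*.php' -exec grep -lP 'live_site(?!.*images/usergen).+\.(png|gif|jpg|css|js)' {} +

We then store this list of files in a shell variable:

    FILES=$(find . -name '*.php' -exec grep -lP 'live_site(?!.*images/usergen).+\.(png|gif|jpg|css|js)' {} +)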
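Plain extended regexes (the egrep flavor) have no way to say "and this string does not appear", so a portable alternative is to split the test into two greps. A minimal sketch, assuming a POSIX shell; list\_cdn\_candidates is a name made up for illustration:

```shell
# list_cdn_candidates DIR  (hypothetical helper, not part of any tool)
# Prints each PHP file under DIR that has at least one line which mentions
# live_site followed by a static-resource extension and does NOT mention
# images/usergen.
list_cdn_candidates() {
    find "$1" -type f -name '*.php' | while IFS= read -r file; do
        # Stage 1 keeps the lines that look like resource references.
        # Stage 2 (-v inverts, -q is quiet) succeeds only if at least one
        # of those lines does not contain images/usergen.
        if grep -E 'live_site.+\.(png|gif|jpg|css|js)' "$file" | grep -qv 'images/usergen'; then
            printf '%s\n' "$file"
        fi
    done
}
```

It could then be used as FILES=$(list\_cdn\_candidates .). Like the variable above, this assumes file names without spaces or newlines.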

Now that we have the files we need, we can search and replace $live\_site with $cachefly\_site for resources. The go-to command for search and replace is sed. The sed command will look generically like this:

    sed -i 's/search/replace/g' FILE
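Before pointing sed -i at real files, it is worth trying an expression on a single line of input. A small sketch; the sample line is invented:

```shell
# Without -i, sed reads stdin and writes the result to stdout,
# so nothing on disk is modified.
line='src="$live_site/images/header/header_left.png"'
result=$(printf '%s\n' "$line" | sed 's/live_site/cachefly_site/g')
printf '%s\n' "$result"
# prints: src="$cachefly_site/images/header/header_left.png"
```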

We actually have two issues though. Due to the nature of the code, we have to account for the $live\_site variable being passed in via the global keyword. So not only are we searching for resource files, but we also have to add $cachefly\_site to the global lines to make sure $cachefly\_site is defined within the function where output is generated. Searching and replacing resource files is pretty easy:

    sed -i '/live_site.\+\.\(js\|css\|gif\|png\|jpg\)/s/live_site/cachefly_site/g' $FILES

$FILES, of course, came from our find/grep call earlier. There is one catch to the sed command used here. It is actually of a different generic form than the one mentioned above:

    sed -i '/contains/s/search/replace/g' FILE

With this format, we put a condition on whether to replace text, meaning the regex in the "contains" portion must match a line before the search and replace is performed on that line. So our sed above says: if the line contains live\_site, followed by anything, ending in one of the listed resources (\| means OR), then replace live\_site with cachefly\_site. I left off the $ since it's common to both variables.

Running the sed command replaces everything nicely, but when we reload the page, we see notices about $cachefly\_site being undefined and resources being pulled from the host and not Cachefly. So we need to handle the global importing. This one is a little trickier because we are not really replacing live\_site with cachefly\_site, but appending cachefly\_site to the list of imported globals. So a line like

    global $foo, $bar, $live_site, $baz;

becomes

    global $foo, $bar, $live_site, $cachefly_site, $baz;

The other trick is that the global line should not already contain $cachefly\_site; we don't need that redundancy. So, without further ado, the sed:

    sed -i '/global.*live_site/{/cachefly_site/!s/live_site/live_site, \$cachefly_site/;}' $FILES

The outer "contains" portion matches the keyword global, followed by stuff, followed by live\_site. Inside the braces, /cachefly\_site/! restricts the substitution to lines that do not contain cachefly\_site (the ! negates the match). An interval such as \(cachefly\_site\)\{0\} cannot express this, because zero repetitions of anything match the empty string, so that condition would always hold. This ensures we only replace live\_site when cachefly\_site is not in the line already, which also makes the command safe to run twice. The "search" portion is easy: search for live\_site. The "replace" portion replaces live\_site with live\_site, $cachefly\_site. Because the new variable is inserted directly after live\_site, whatever followed it (a comma or a semicolon) still follows, so we don't get syntax errors.

And that is basically how I converted a site to use Cachefly for static content.
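As a sanity check, both substitutions can be tried on a scratch file before touching the real code. This is a sketch assuming GNU sed (the \+ and \| operators are GNU extensions in basic regexes). The sample PHP lines and the rewrite function name are invented, and skipping lines that mention images/usergen is an extra guard I am assuming is wanted at this stage.

```shell
# Scratch input: one global line, one static image, one user-generated
# image, and one line that is not a resource at all (all hypothetical).
sample=$(mktemp)
cat > "$sample" <<'EOF'
global $foo, $bar, $live_site, $baz;
echo $live_site . "/images/header/header_left.png";
echo $live_site . "/images/usergen/avatar.png";
echo $live_site . "/about";
EOF

# rewrite: filter stdin to stdout (no -i, so files are never modified).
#  1st expression: on lines without images/usergen, swap the variable when
#                  live_site is followed by a resource extension.
#  2nd expression: on global lines that import live_site but not yet
#                  cachefly_site, append the new global.
rewrite() {
    sed -e '/images\/usergen/!{/live_site.\+\.\(js\|css\|gif\|png\|jpg\)/s/live_site/cachefly_site/g;}' \
        -e '/global.*live_site/{/cachefly_site/!s/live_site/live_site, $cachefly_site/;}'
}

once=$(rewrite < "$sample")
twice=$(printf '%s\n' "$once" | rewrite)   # a second pass should change nothing
printf '%s\n' "$once"
rm -f "$sample"
```

The global line gains $cachefly\_site, the header image switches variables, and the usergen and /about lines are left alone. Feeding the output through a second time leaves it unchanged.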