I recently needed to move static content from a live site to a cachefly account. Rather than go through the directories, looking for the resources (js/css/images) I needed to ftp, I thought, “Man, this sure sounds like it could be automated”.
The first step was to collect a list of items that needed ftping to cachefly. I know what you’re saying, “Use find!” In case Ben is reading this, find “searchs for files in a directory hierarchy” (that’s from find’s man page Ben). I wanted to separate the resources out so I ran three different invocations.
For javascripts and css, the invocation was nearly identical:
find . -name '*.js' > js.libs find . -name '*.css' > css.libs
Images were a little trickier. Most of the images are static content, but some are user-generated, likely to change or be removed. These do not go up to the CDN (at least for now). The user-generated content is located under one directory (call it /images/usergen), so we simply need to exclude it from find’s search.
find -path '*images/usergen*' -prune -o -path . -iname '*.gif' -o -iname '*.jpg' -o -iname '*.png' > image.files
The important parts:
-
-path '*images/usergen*' -prune
Remove any found items that contain images/usergen in the path name.
-
-o -path .
Search within the current directory (the root of the project).
-
-iname '*.gif' -o -iname '*.jpg' -o -iname '*.png'
Match, case-insensitive (-iname instead of -name), any files ending in gif, jpg, or png.
We are then left with three files, each line of which contains the path, relative to the project root, of each resource I want to upload. I created a simple php script to upload the images, maintaining the pathing, to cachefly. So an image with relative path /images/header/header_left.png would now be accessible at instance.cachefly.com/images/header/header_left.png.
So the images are now up on the CDN. Now we need our code to point there as well. Fortunately, most of the resources were prepended with a domain (stored in the global $live_site). So the src attribute of an image, for instance, would be src=”< ?= $live_site ?>/images/header/header_left.png”. Creating a $cachefly_site global, we now only need to find lines in our code that have a basic layout of “stuff……$live_site…stuff…..png” where stuff is (.*) in regex land. So we utilize two commands, find and egrep. Find locates files we want and egrep searches the found files for a regex that would locate the resources in the code.
So first, we build the regex. We know a couple elements that need to be present, and one that should not be present. Needed are live_site and a resource extension (js/css/jpg/png/gif), and not needed is the “images/usergen” path, as this points to user generated content. So the regex becomes:
'live_site([^images/usergen])+.+(png|gif|jpg|css|js)'This is the arg for egrep (the -l switch means print the file names that have a match, rather than the lines of a file that match):
egrep -lr 'live_site([^images/usergen])+.+(png|gif|jpg|css|js)'
Now we need to tell egrep what files to search using find:
find . -name "*.php" -exec egrep -lr 'live_site([^images/usergen])+.+(png|gif|jpg|css|js)' {} \;
We then store this list of files into a shell variable:
export FILES=`find . -name "*.php" -exec egrep -lr 'live_site([^images/usergen])+.+(png|gif|jpg|css|js)' {} \;`
Now that we have the files we need, we can search and replace $live_site with $cachefly_site for resources. The goto command for search and replace is sed. The sed command will look generically like this:
sed -i 's/search/replace/g' FILE
We actually have two issues though. Due to the nature of the code, we have to account for the $live_site variable being passed in via the global keyword. So not only are we searching for resource files, but we also have to add $cachefly_site to the global lines to make sure $cachefly_site is defined within the function where output is generated.
Searching and replacing resource files is pretty easy:
sed -i '/live_site.+\|js\|css\|gif\|png\|jpg/s/live_site/cachefly_site/g' $FILES
$FILES, of course, came from our find/egrep call earlier. There is one catch to the regex used here. It is actually of a different generic form than mentioned above:
sed -i '/contains/s/search/replace/g' FILE
With this format, we put a condition on whether to replace text, meaning the regex in the “contains” portion must be matched before the search and replace is performed on that line.
So our sed above says if the line contains live_site, followed by anything, ending in one of the listed resources (\| means OR), then replace live_site with cachefly_stite. I left of the $ since its common to both variables.
Running the sed command replaces everything nicely, but when we reload the page, we see notices about $live_site being undefined and resources being pulled from the host and not cachefly. So we need to handle the global importing.
This one is a little tricker because we are not really replacing live_site with cachefly_site, but appending it to the list of imported globals. So a line like
global $foo, $bar, $live_site, $baz;
becomes
global $foo, $bar, $live_site, $cachefly_site, $baz;
The other trick is that the global line should not already contain $cachefly_site. We don’t need that redundancy. So, without further ado, the sed:
sed -i '/global.*live_site.*\(cachefly_site\)\{0\}/s/live_site/live_site,\$cachefly_site/g' $FILES
The “contains” portion matches the keyword global, followed by stuff, followed by live_site followed by stuff, with cachefly_site appearing exactly 0 times (denoted by \{0\}). This ensures we only replace live_site when cachefly_site is not in the line already.
The “search” portion is easy; search for live_site. The replace portion replaces live_site with live_site,$cachefly_site. This takes into account when live_site is followed by a comma or semi-colon so we don’t get syntax errors.
And that is basically how I converted a site to use cachefly for static content.