Writing Excel Spreadsheets Using PHP
When using the Spreadsheet\Excel\Writer library from the PEAR repository, I came across an issue I didn't see handled in the docs (as of this writing, I am using Spreadsheet\Excel\Writer 0.9.1 beta) My application creates spreadsheets that contain order information. Part of each row is a list of up to 20 ISBNs and the quantities desired of each. The issue came in how to handle ISBNs that had a leading zero. When I first looked through the PEAR docs for the library, a Worksheet method, writeString, looked to be the solution. However, the end result was that while the leading zero was maintained, the cell's format was still numeric. This resulted in the application receiving the generated xls to then drop the zero, resulting in an invalid ISBN. Looking over the internals of the Worksheet::writeString method didn't reveal an undocumented feature that would ensure a cell was read as text, regardless of its contents. I next looked at the Format::setNumFormat method as I knew it contained ways to format the number as currency, timestamp, fractions, etc. You could then pass this Format object as the optional fourth parameter to the Worksheet::write method. Contained in the Format::setNumFormat docs was a link to the OpenOffice.org documentation of the Excel File Format (found here, pdf). Interested in how exactly the file was structured, I read on. What I learned that was directly applicable is that each cell contains a pointer to a format definition, or XF record, and it was this XF record where formatting was stored. From the doc, section 4.6:
All cell formatting attributes are stored in XF records…The cell records themselves contain an index into the XF record list. This way of string cell formatting saves memory and decreases the file size.
So if two cells use the same formatting, like the ISBN columns would, each cell would contain a pointer to the XF record that would tell Excel the cell was text. Seciton 4.6.1 lists the 6 groups of formatting attributes, the first of which is number format, which is then an index to a FORMAT record. Okay, we're on to something here. Further in the pdf, in section 5.49, we see the definition of the FORMAT record. Lo and behold, the table of formats from the setNumFormat page is listed in the pdf, but we see that the PEAR listing is incomplete. Scanning the complete table in the pdf, we see index 49, type Text, format string '@'. Bingo. Our code for formatting numeric data as text in a string goes a little something like this (modified from the PEAR example code):
$workbook = new Spreadsheet_Excel_Writer(); $worksheet =& $workbook->addWorksheet(); // We'll show dates with a three letter month and four digit year format $text_format =& $workbook->addFormat(); $text_format->setNumFormat('@'); $worksheet->write(0, 0, "Without formatting"); $worksheet->write(0, 1, '0123'); // cell contains 123 $worksheet->write(1, 0, "With formatting"); $worksheet->write(1, 1, '0123', $text_format); // cell contains 0123
To verify, generate the xls and open it. Right click the cells to modify the format of the cell, and see that the first cell is formatted as a general number, and the second cell is formatted as text. The meta-moral is to read the docs and follow references to get at the source material. Had I not opened the pdf, it may have been a few more time units finding the information on Google. Plus, I learned a lot more about an important file format. I can sleep easy knowing I'm that much more knowledgeable.