| Sign In/My Account | View Cart |
In our previous article, we discussed what the POI project is all about, showed how to read and write OLE 2 Compound Document files, and gave a brief history of the POI project. Probably half of the folks who read that article are scratching their heads now, thinking "How do I write out a spreadsheet?" Good news! In this article we'll show you how to read and write Excel files and how to read any Microsoft file's document properties.
The POI project is nearing a 2.0 release and is in a stage of rapid development, with new features and changes integrating nightly. In order to keep this article relevant, we'll refer to the recent 1.9-development release. While compiling the sources with later releases should work, there may be minor changes between now and the 2.0 release.
The Microsoft Excel 97 file format is also known as "BIFF8." Recent versions of Excel have changed very little about this file format, and writing out the new intricacies would serve no purpose other than to make everyone upgrade. So when we say Excel 97 format, we mean Excel 97-to-XP format.
|
Related Reading
Head First Java |
Our implementation of the Excel 97 file format is called "HSSF," which stands for, you guessed it, Horrible SpreadSheet Format. (We admire their method of making simple things complicated and oversimplifying things that should have been done with more flexibility.) HSSF may have a comical name, but is a very serious API. HSSF lets you read, write, and modify Excel files using nothing but Java.
What does HSSF have to do with POIFS, which was covered in the previous article? Like all of the other POI APIs, it is built on top of POIFS. So there is code contained within HSSF that is very similar to the examples included with the previous article. Generally, however, you don't need to know about POIFS APIs when writing to HSSF APIs.
HSSF has two APIs for reading: usermodel and eventusermodel. The former is
most familiar, and the latter is more cryptic but far more efficient. The
usermodel consists primarily of the classes in the
org.apache.poi.hssf.usermodel package, as well as
org.apache.poi.hssf.eventusermodel. (In earlier versions of HSSF,
this was in the eventmodel package.) The usermodel package maps
the file into familiar structures like Workbook,
Sheet, Row, and Cell. It stores the
entire structure in memory as a set of objects. The eventusermodel package
requires you to become more familiar with the actual low-level structures of
the file format. It operates in a manner similar to XML's SAX APIs or the AWT
event model (the origin of the name)--and can be trickier to use. It is also
read-only, so you cannot modify files using the eventusermodel.
Reading files using the HSSF usermodel is simple. Create a new
inputstream and construct an instance of
HSSFWorkbook.
InputStream myxls = new FileInputStream("workbook.xls"));
HSSFWorkbook wb = new HSSFWorkbook(myxls);
With the HSSFWorkbook instance, you can now retrieve a sheet,
its rows, and its cells:
HSSFSheet sheet = wb.getSheetAt(0); // first sheet HSSFRow row = sheet.getRow(2); // third row HSSFCell cell = row.getCell((short)3); // fourth cell
This fetches the fourth cell of the third row from the first sheet in the workbook. You can retrieve a value from the cell object. Be sure to note the cell type before retrieving its value.
if (cell.getCellType() == HSSFCell.CELL_TYPE_STRING) {
("The Cell was a String with value " + cell.getStringCellValue());
} else if (cell.getCellType() == HSSFCell.CELL_TYPE_NUMERIC) {
("The cell was a number " + cell.getCellValue());
} else () {
("The cell was nothing we're interested in");
}
Note that you may get an exception if you try and ask for an unrelated datatype.
Handle dates carefully while using HSSF. Excel stores all dates as numbers, internally. The only way to distinguish a date is by the formatting of the cell. (If you have ever formatted a cell containing a date in Excel, you will know what I mean.)
Therefore, for a cell containing a date, cell.getCellType()
will return HSSFCell.CELL_TYPE_NUMERIC. However, you can use a
utility function, HSSFDateUtil.isCellDateFormatted(cell), to check
if the cell can be a date. This function checks the format against a few
internal formats to decide the issue, but by its very nature it is prone to false
negatives.
Appendix 1 contains a complete example of using HSSF to create and return a workbook from a servlet.
Example 1 reads in an Excel spreadsheet and converts it to comma-separated values.
Writing XLS files is even simpler. Create a new instance of
HSSFWorkbook. At some point, you'll need to create an
outputstream to write out the file to disk; however,
this can be done at the end.
HSSFWorkbook wb = new HSSFWorkbook();
FileOutputStream fileOut = new FileOutputStream("workbook.xls");
wb.write(fileOut);
fileOut.close();
To create sheets and rows, you must do so from the parent object. For instance:
HSSFSheet sheet = wb.createSheet();
HSSFRow row = sheet.createRow((short)0);
HSSFCell cell = row.createCell((short)0);
cell.setCellValue(1);
row.createCell((short)1).setCellValue(1.2);
row.createCell((short)2).setCellValue("This is a string");
row.createCell((short)3).setCellValue(true);
To style a cell, create a style and assign it to the cell. Assign that
style to as many cells as should be styled in that fashion. This confuses a
number of new users of HSSF, as it is currently possible to create an invalidly
large number of styles. Ideally, you want to create one style for the same
rules that you assign it. If you have a summary row and want to make its cells
bold and underlined, create a summaryRowStyle and assign it to any
cell that is on a summary row.
It is important to realize that the CellFormat and
CellStyle objects are members of the workbook that are referenced
by the cell.
...
HSSFCellStyle style = workbook.createCellStyle();
style.setDataFormat(HSSFDataFormat.getBuiltinFormat("($#,##0_);[Red]($#,##0)"));
style.setFillBackgroundColor(HSSFColor.AQUA.index);
style.setFillPattern(HSSFCellStyle.BIG_SPOTS);
...
someCell.setCellStyle(style);
someOtherCell.setCellStyle(style);
Newer editions of HSSF allow you use a limited set of formulas. Note that this is a beta-quality feature. You should do appropriate testing before using it.
A formula assignment looks like this:
someCell.setCellFormula(SUM(A1:A2:);
At present, you can use any built-in function or operator in the formula,
except logical operators and functions (such as the IF function).
This feature is being worked on even as we write this.
Files are modified by reading and rewriting. Example 3 takes in an Excel
file and changes the values based on rules specified in a
.property file.
Reading files via the eventusermodel is much harder but is much more memory-efficient, since it expects the application to process the data as it is read. Reading in a file in this way is analogous to reading an XML document using SAX, as opposed to DOM. You have to register your interest in the structures you want, and the library will call back when such structures are encountered. Before using this model, however, you must familiarize yourself with some basic structures in an Excel workbook.
In HSSF, the low-level binary structures are called records. Records come
in different types, each of which is modelled as a Java class in the
org.apache.poi.hssf.record package. For example, the
BOFRecord denotes the start of the Workbook or
Sheet section. The RowRecord denotes that a row
exists and stores its style information. Any record exposing the
CellValueRecordInterface is a cell value. These include
NumericRecord, LabelSSTRecord, and
FormulaRecord. (There are a few more that are obsolete or for
optimization, but generally speaking, HSSF converts them.)
See The Structure of an HSSF Spreadsheet for more details.
private EventRecordFactory factory = new EventRecordFactory();
factory.registerListener(new ERFListener() {
public boolean processRecord(Record rec) {
(got BOF Record);
return true;
}
}, new short[] {BOFRecord.sid});
factory.processRecords(someInputStream);
HSSF, as we mentioned, is built on top of POIFS. More to the point, an
Excel 97+ file is an OLE 2 Compound Document. The underlying OLE 2 Compound
Document stores a stream, or File. This "stream" is always named
Workbook (except in Excel 95, which HSSF does not handle).
Currently, HSSF deals strictly with this stream and delegates all functionality
to POIFS. In recent builds, HSSF is aware enough to preserve these other
nodes. Macros and Images are stored in separate
streams and sometimes even separate directories within the OLE 2 CDF file.
Macros should be preserved; however, we have no API as of yet to
handle them.
Within each stream is a set of records. A record is just an array of bytes, with a header and a body. The header contains the record type (AKA the "id") and the length of the following data. The body is broken up into fields. Fields contain numeric data (including references to other records), character data, or flags.
The following is the top-level structure of an Excel Workbook:
Bla.xls {
OLE2CDF headers
"Workbook" stream {
Workbook {
Static String Table Record..
Sheet names... and pointers
}
Sheet {
ROW
ROW
...
NUMBER RECORD (cell)
LABELSST Record (cell)
...
}
Sheet
}
}
... images, macros, etc.
Document Summary
Summary
Pages: 1, 2 |