On a recent logistics project, a customer asked our team to build a web site that would allow users to query a legacy system for shipment information. The customer defined three main requirements:
Our team had plenty of experience with J2EE web applications, but we had little experience with PDF documents. We needed to find a pure Java class library that could produce sophisticated PDF documents in a server-side web application. We found a solution that completely met our needs: iText.
iText is an open source pure Java class library for creating and manipulating PDF documents. Bruno Lowagie and Paulo Soares lead the project. The iText API enables a Java developer to programmatically create PDF documents. iText delivers a rich set of features:
iText is an open source library. At the time of this writing, the iText software is available under a dual license: the Mozilla Public License (MPL) and the LGPL. Consult the iText web site for details. In this article, you'll see the iText API in action. We will demonstrate how to use iText and servlets to dynamically generate PDF documents in a server-side application.
First, you will need to obtain the iText JAR file. Visit the iText web site and download the current release. At the time of this writing, the current iText release is version 0.99. The iText web site provides API documentation and a comprehensive tutorial.
In addition to iText, we'll be using servlets, too. If you aren't familiar with servlets, you can learn about them in Jason Hunter's book, Java Servlet Programming. You will need to obtain a J2EE application server or a standalone servlet engine. Some good open source options are Tomcat, Jetty, and JBoss. The rest of this article assumes that you are using Jakarta Tomcat 4.1.
The iText API is intuitive and easy to use. Using iText, you will be able to programmatically create customized PDF documents. The iText library consists of the following packages:
com.lowagie.servlets
com.lowagie.text
com.lowagie.text.html
com.lowagie.text.markup
com.lowagie.text.pdf
com.lowagie.text.pdf.codec
com.lowagie.text.pdf.hyphenation
com.lowagie.text.pdf.wmf
com.lowagie.text.rtf
com.lowagie.text.xml
com.lowagie.tools
For generating PDF files, you'll need only com.lowagie.text and
com.lowagie.text.pdf.
Our example application uses these iText classes:
com.lowagie.text.pdf.PdfWriter
com.lowagie.text.Document
com.lowagie.text.HeaderFooter
com.lowagie.text.Paragraph
com.lowagie.text.Phrase
com.lowagie.text.Table
com.lowagie.text.Cell
The key classes are Document and PdfWriter. You
will always use both of these classes when creating PDF documents.
Document is an object-oriented representation of a PDF document.
You can add content to the document by invoking methods provided by the
Document class. A PdfWriter object associates a
Document with a java.io.OutputStream object.
Coordinate System for iText Documents
When I wrote my first iText program, I stumbled over the coordinate system. I naively assumed that iText's coordinate system was identical to Swing's coordinate system. This is not the case. In Swing, the origin (0, 0) is located in the upper left-hand corner of a component. In iText, the origin is located in the bottom left-hand corner of a page.
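To make the difference concrete, here is a minimal sketch that converts a Swing-style y coordinate (measured down from the top edge) into an iText-style y coordinate (measured up from the bottom edge). The class and method names are hypothetical and are not part of the iText API, and the page height of 842 points assumes an A4 page.

```java
// Hypothetical helper, not part of iText: converts a y coordinate from a
// top-left origin (Swing) to a bottom-left origin (iText page coordinates).
class PageCoordinates {

    // Assumed page height in points for an A4 page (1 point = 1/72 inch).
    static final float A4_HEIGHT = 842f;

    // yFromTop is the distance down from the top edge of the page.
    // The result is the distance up from the bottom edge of the page.
    static float toBottomOrigin(float yFromTop, float pageHeight) {
        return pageHeight - yFromTop;
    }

    public static void main(String[] args) {
        // A point 100 units below the top of an A4 page...
        float y = toBottomOrigin(100f, A4_HEIGHT);
        // ...sits 742 units above the bottom edge.
        System.out.println(y); // prints 742.0
    }
}
```

The x coordinate needs no conversion, because both systems measure it from the left edge.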
During your design phase, you must decide how you plan to use iText. I've built web applications using both of the following techniques.
A. Create the PDF file on the server's filesystem. The application
uses java.io.FileOutputStream to write the file to
the server's filesystem. The user will download the file via HTTP
GET.
B. Create the PDF file in memory using
java.io.ByteArrayOutputStream. The application sends the PDF
bytes to the client via the servlet's output stream.
I prefer technique B to technique A because the application does not write to the server's filesystem, and the application is guaranteed to work in a clustered server environment. Technique A can fail if your application runs in a clustered environment, and the server cluster does not provide session affinity.
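The two techniques differ only in the OutputStream that receives the bytes. The following sketch illustrates that point with plain java.io classes. The class name is hypothetical, and writeDocument is a stand-in for the PDF generation step: it writes placeholder bytes, not a real PDF.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class OutputTargets {

    // Hypothetical stand-in for PDF generation: writes placeholder bytes.
    static void writeDocument(OutputStream out) throws IOException {
        out.write("placeholder document bytes".getBytes(StandardCharsets.US_ASCII));
    }

    // Technique A: the bytes end up in a file on the server's filesystem.
    static File writeToFile() throws IOException {
        File file = File.createTempFile("report", ".pdf");
        try (OutputStream out = new FileOutputStream(file)) {
            writeDocument(out);
        }
        return file;
    }

    // Technique B: the bytes stay in memory, ready to be sent to the client.
    static byte[] writeToMemory() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            writeDocument(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        File file = writeToFile();
        byte[] bytes = writeToMemory();
        // Both techniques produce the same number of bytes.
        System.out.println(file.length() == bytes.length); // prints true
        file.delete();
    }
}
```

With technique A, a later request for the file must reach the same server that wrote it (or a shared filesystem). With technique B, the bytes never leave the request that produced them.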
Our example application consists of a single class: PDFServlet.
This servlet uses technique B from the previous section. The
OutputStream is a java.io.ByteArrayOutputStream. With
ByteArrayOutputStream, the PDF document bytes will be in memory.
When PDFServlet receives an HTTP request, it will dynamically
generate a PDF document and send the document to the client.
The PDFServlet class extends
javax.servlet.http.HttpServlet and imports two of the iText
packages, com.lowagie.text and com.lowagie.text.pdf.
doGet Method
Most servlets override either the doPost method or the
doGet method. Our servlet is no different. The
PDFServlet class overrides the doGet method. The
servlet will generate a PDF file any time it receives an incoming HTTP GET
request.
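In a nutshell, the servlet's doGet method does the
following:
1. Obtains a ByteArrayOutputStream object that contains the PDF
document bytes.
2. Sets the HTTP response headers.
3. Gets the servlet's output stream.
4. Writes the bytes from the ByteArrayOutputStream to the servlet's output stream.
5. Flushes the servlet's output stream.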
Figure 1. Editing doGet in Eclipse
generatePDFDocumentBytes Method
The generatePDFDocumentBytes method is responsible for creating
the PDF document. The three most important objects in this method are the
Document object, the ByteArrayOutputStream object,
and the PdfWriter object. The PdfWriter associates
the Document with the ByteArrayOutputStream.
Document doc = new Document();
ByteArrayOutputStream baosPDF = new ByteArrayOutputStream();
PdfWriter docWriter = null;
docWriter = PdfWriter.getInstance(doc, baosPDF);
// ...
Adding content to a Document is done with the add method.
doc.add(new Paragraph(
"This document was created by a class named: "
+ this.getClass().getName()));
doc.add(new Paragraph(
"This document was created on "
+ new java.util.Date()));
When you are done adding content, close the Document and
PdfWriter objects.
doc.close();
docWriter.close();
After closing the document, the ByteArrayOutputStream object is returned to the caller.
return baosPDF;
The ByteArrayOutputStream contains all bytes for the PDF
document.
In this application, we care only about four HTTP response headers:
Content-type, Content-disposition,
Content-length, and Cache-control. If you've never
worked with HTTP headers before, consult the HTTP 1.1 specification.
Examine the doGet method in the PDFServlet.
You'll notice that the HTTP response headers are set before any data is
written to the servlet output stream. This is an important, yet subtle, point.
Let's look at each response header in more detail.
Content-type
In servlets, HttpServletResponse has a content type that
indicates the type of content that the response contains. For PDF files, the
content type is application/pdf. If the servlet does not set a
content type, the web browser may have a difficult time determining how to
handle the file.
PDFServlet sets the content type with the following line:
resp.setContentType("application/pdf");
Content-disposition
The Content-disposition header provides information that helps
a web browser identify the content of the HTTP response. When a web browser
reads this header, it can determine the name of the file and whether to display the file inline or treat it as an attachment.
RFC 2183 provides a full explanation of the Content-disposition header.
By setting the Content-disposition header
appropriately, the servlet can instruct the browser to display the
file "inline," or to treat it like an attachment.
Example 1. Displaying a file inline
Content-disposition: inline; filename=foobar.pdf
Example 2. Attaching a file to the response
Content-disposition: attachment; filename=foobar.pdf
The following code fragment demonstrates how to set the header:
public void doGet(HttpServletRequest req, HttpServletResponse resp)
{
// ...
resp.setHeader(
"Content-disposition",
"inline; filename=foobar.pdf" );
// ...
}
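If the filename is not a fixed string, it can help to build the header value in one place. The sketch below is a hypothetical helper, not code from PDFServlet. It goes one step beyond the examples above by wrapping the filename in double quotes, on the assumption that some filenames will contain spaces.

```java
// Hypothetical helper for building a Content-disposition header value.
class ContentDisposition {

    // Builds a value such as: inline; filename="foobar.pdf"
    // The filename is quoted so that names containing spaces stay in one piece.
    static String headerValue(boolean inline, String filename) {
        String type = inline ? "inline" : "attachment";
        // Escape backslashes and double quotes inside the quoted string.
        String escaped = filename.replace("\\", "\\\\").replace("\"", "\\\"");
        return type + "; filename=\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        System.out.println(headerValue(true, "foobar.pdf"));
        // prints: inline; filename="foobar.pdf"
        System.out.println(headerValue(false, "shipment report.pdf"));
        // prints: attachment; filename="shipment report.pdf"
    }
}
```

In a servlet, the result would be passed to resp.setHeader("Content-disposition", ...) in place of the literal string.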
Cache-Control Headers
Depending upon the nature of your application, you may or may not want web browsers to cache the PDF files that you are generating. There are a variety of HTTP headers that a server-side web application can use to control caching of content. Some examples are:
Cache-Control: no-cache
Cache-Control: no-store
Cache-Control: must-revalidate
Cache-Control: max-age=30
Pragma: no-cache
Expires: 0
A full explanation of Cache-Control headers is found
in the HTTP 1.1 specification.
The PDFServlet sets Cache-Control to
max-age=30. This header tells the web browser to cache the file
for a maximum of 30 seconds.
Content-length
The Content-length header must be set to the number of bytes in
the PDF file. If the Content-length header is not set correctly,
the web browser may not be able to display the file. Example code might
be:
ByteArrayOutputStream baos = getByteArrayOutputStream();
resp.setContentLength(baos.size());
PDFServlet sends the PDF document to the client by writing
bytes to the servlet's output stream. It obtains the output stream by calling
getOutputStream() on the HttpServletResponse object.
getOutputStream returns an object of type
javax.servlet.ServletOutputStream.
ServletOutputStream sos;
sos = resp.getOutputStream();
baos.writeTo(sos);
sos.flush();
After writing all data to the stream, call the flush()
method to send all bytes to the client.
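The relationship between the Content-length value and the bytes actually written can be checked without a servlet container. In this sketch, CountingOutputStream is a hypothetical stand-in for the servlet's output stream. It only counts the bytes it receives.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class StreamCopy {

    // Hypothetical stand-in for ServletOutputStream: counts bytes received.
    static class CountingOutputStream extends OutputStream {
        long count = 0;

        @Override
        public void write(int b) {
            count++;
        }
    }

    // Copies the buffered bytes to the target and returns how many arrived.
    static long sendTo(ByteArrayOutputStream baos, CountingOutputStream target) {
        try {
            baos.writeTo(target);
            target.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target.count;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        baos.write(new byte[] {1, 2, 3, 4, 5}, 0, 5);
        // size() is the value that belongs in the Content-length header.
        int contentLength = baos.size();
        long sent = sendTo(baos, new CountingOutputStream());
        System.out.println(contentLength + " " + sent); // prints "5 5"
    }
}
```

Because writeTo copies exactly size() bytes, a Content-length taken from baos.size() matches what the client receives, provided nothing else is written to the stream.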
To run the PDFServlet in Tomcat, you'll need to package the application in a
WAR file. The iText JAR file (itext-0.99.jar) must be placed in
the WAR file's WEB-INF/lib directory. If you forget to include the
iText JAR file, the servlet will fail with a
java.lang.NoClassDefFoundError.
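The WAR file also needs a deployment descriptor (WEB-INF/web.xml) that maps a URL to the servlet. The fragment below is a sketch under two assumptions that the article does not confirm: that PDFServlet lives in the default package, and that the WAR file is deployed under the context path /pdfservlet. The servlet name is arbitrary.

```xml
<web-app>
  <servlet>
    <!-- Arbitrary name; it only has to match the mapping below. -->
    <servlet-name>PDFServlet</servlet-name>
    <!-- Assumption: replace with the fully qualified class name. -->
    <servlet-class>PDFServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>PDFServlet</servlet-name>
    <!-- Resolved relative to the web application's context path. -->
    <url-pattern>/createpdf</url-pattern>
  </servlet-mapping>
</web-app>
```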
After the WAR file has been deployed, you are ready to test the servlet. By default, Jakarta Tomcat listens for requests on port 8080.
Point your web browser to http://hostname:8080/pdfservlet/createpdf.
When you visit the URL, the servlet executes and sends a PDF document back to your browser.
iText provides a great low-level API for producing PDF documents. However, it may not be the best tool for every application.
At my day job, we used iText in combination with Microsoft Word and Adobe Acrobat. First, our team designed a shipment form using Microsoft Word. Next, we converted the Word document to PDF using Adobe Acrobat. Then, using iText's template capability, we loaded the PDF file into our application. From there, it was quite easy to fill in data values on the form and output the final PDF document.
For report-oriented web applications, tools like JasperReports provide a higher level of abstraction than iText.
When your Java application needs to dynamically create PDF documents, the iText class library is a great solution. You can experiment with iText's capabilities by enhancing and extending the code in this article. In a short time, you'll be able to impress your co-workers and customers with sophisticated PDF documents.
If you are exploring Microsoft's .NET platform, be sure to check out iTextdotNet and iTextSharp. Both projects are derived from the Java-based iText library. iTextSharp is written in Microsoft's C# language.
Sean C. Sullivan has been developing Internet applications with Java since 1996. His recent work includes B2B web applications, various open source projects, and the development of an Internet e-commerce payment system at Intel.
Return to ONJava.com.
Copyright © 2009 O'Reilly Media, Inc.