Parsing and Processing Large XML Documents with Digester Rules
Download the complete source code for DBUnitRuleSet and DBUnitFlatRuleSet, with an accompanying Maven project. Below is the implementation
of the following rules from DBUnitRuleSet: TableRule,
TableColumnRule, TableRowRule, and TableRowValueRule.
For convenience, the concrete rules could be coded as static inner
classes within RuleSet.
Each rule may handle any combination of the begin(), body(), and end() events.
In this example, TableRule creates a child copy of the parent context
for each new table, initializes the TABLE_NAME attribute, and creates
a new TABLE_COLUMNS List for column names when handling the opening
<table> element. It also drops the current child context from
the Digester stack at the closing </table> element.
private static class TableRule extends Rule {
  public void begin( String ns, String name,
      Attributes att) {
    // Child context: a shallow copy of the parent plus table-level state.
    Map parentCtx = (Map) getDigester().peek();
    Map ctx = new HashMap(parentCtx);
    ctx.put("TABLE_NAME", att.getValue("name"));
    ctx.put("TABLE_COLUMNS", new ArrayList());
    ctx.put("TABLE_ROWS", new ArrayList());
    getDigester().push( ctx);
  }
  public void end( String ns, String name) {
    // Drop the child context; the parent context becomes current again.
    getDigester().pop();
  }
}
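The child-context idea can be tried outside Digester. The following is a minimal sketch, assuming a plain ArrayDeque stands in for the Digester object stack; the class ContextStackSketch and its methods are hypothetical and not part of DBUnitRuleSet.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Digester object stack; not part of DBUnitRuleSet.
class ContextStackSketch {
    private final Deque<Map<String, Object>> stack = new ArrayDeque<>();

    ContextStackSketch(Map<String, Object> rootCtx) {
        stack.push(rootCtx);
    }

    // Mirrors TableRule.begin(): copy the parent context, add table state, push.
    Map<String, Object> beginTable(String tableName) {
        Map<String, Object> ctx = new HashMap<>(stack.peek());
        ctx.put("TABLE_NAME", tableName);
        ctx.put("TABLE_COLUMNS", new ArrayList<String>());
        stack.push(ctx);
        return ctx;
    }

    // Mirrors TableRule.end(): drop the child context and expose the parent again.
    Map<String, Object> endTable() {
        stack.pop();
        return stack.peek();
    }

    // Returns "<child CONNECTION>|<child TABLE_NAME>|<parent has TABLE_NAME?>".
    static String demo() {
        Map<String, Object> root = new HashMap<>();
        root.put("CONNECTION", "shared-connection-placeholder");
        ContextStackSketch sketch = new ContextStackSketch(root);
        Map<String, Object> child = sketch.beginTable("TABLE1");
        Map<String, Object> parent = sketch.endTable();
        return child.get("CONNECTION") + "|" + child.get("TABLE_NAME")
            + "|" + parent.containsKey("TABLE_NAME");
    }
}
```

Because the copy is shallow, the child sees the same CONNECTION object as its parent, while keys added for one table never leak into the parent context or into the next table.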
TableColumnRule adds a single column
name into the TABLE_COLUMNS List
in the current context.
private static class TableColumnRule
    extends Rule {
  public void body( String ns, String name,
      String text) {
    Map ctx = ( Map) getDigester().peek();
    ((List) ctx.get("TABLE_COLUMNS")).add(text);
  }
}
TableRowRule initializes a TABLE_ROW List
that will be used to store values for the current table row at the opening
<row> element.
This rule also executes SQL to insert data from the current row when the
closing </row> element is handled. This way, the entire XML document is
never loaded into memory. The actual SQL is constructed in the
getStatement() method.
private static class TableRowRule extends Rule {
  public void begin( String ns, String name,
      Attributes att) {
    Map ctx = (Map) getDigester().peek();
    ctx.put("TABLE_ROW", new ArrayList());
  }
  public void end( String ns, String name)
      throws SQLException {
    Map ctx = (Map) getDigester().peek();
    execute(ctx, getStatement(ctx));
    ctx.remove("TABLE_ROW");
  }
  private int execute( Map ctx,
      PreparedStatement st) throws SQLException {
    List values = (List) ctx.get("TABLE_ROW");
    // getStatement() returns null when the table declares no columns.
    if( st==null || values.size()==0) return 0;
    for( int i = 0; i<values.size(); i++) {
      // JDBC parameter indexes are 1-based.
      st.setObject(i+1, values.get(i));
    }
    return st.executeUpdate();
  }
  private PreparedStatement getStatement( Map ctx)
      throws SQLException {
    List columns = (List) ctx.get("TABLE_COLUMNS");
    if( columns.size()==0) return null;
    String tableName = getTableName(ctx);
    // Builds INSERT INTO table(c1,c2,...) VALUES (?,?,...)
    StringBuffer sql = new StringBuffer()
        .append("INSERT INTO ")
        .append(tableName).append("(");
    StringBuffer values = new StringBuffer("?");
    sql.append(columns.get(0));
    for( int i = 1; i<columns.size(); i++) {
      sql.append(",").append(columns.get(i));
      values.append(",?");
    }
    sql.append(") VALUES (")
        .append(values).append(")");
    Connection conn = getConnection(ctx);
    return conn.prepareStatement(sql.toString());
  }
  private Connection getConnection( Map ctx) {
    return (Connection) ctx.get("CONNECTION");
  }
  private String getTableName(Map ctx) {
    return (String) ctx.get("TABLE_NAME");
  }
}
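To see what SQL text getStatement() produces, the string-building step can be isolated from JDBC. This is a minimal sketch under that assumption; InsertSqlSketch and buildInsertSql() are hypothetical names, not part of the downloadable source.

```java
import java.util.List;

// Hypothetical helper that isolates the SQL text built in getStatement().
class InsertSqlSketch {
    // Builds "INSERT INTO <table>(c1,c2,...) VALUES (?,?,...)",
    // with exactly one placeholder per column.
    static String buildInsertSql(String tableName, List<String> columns) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("no columns for table " + tableName);
        }
        StringBuilder sql = new StringBuilder("INSERT INTO ")
            .append(tableName).append("(");
        StringBuilder placeholders = new StringBuilder();
        for (int i = 0; i < columns.size(); i++) {
            if (i > 0) {
                sql.append(",");
                placeholders.append(",");
            }
            sql.append(columns.get(i));
            placeholders.append("?");
        }
        return sql.append(") VALUES (").append(placeholders).append(")").toString();
    }
}
```

For table TABLE1 with columns col1 and col2, this returns INSERT INTO TABLE1(col1,col2) VALUES (?,?). Only the row values are bound as parameters; table and column names are concatenated into the SQL text, so the XML that supplies them should come from a trusted source.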
TableRowValueRule collects column values
for the current row from each <value> element
and stores them in the TABLE_ROW List of the current context.
private static class TableRowValueRule
    extends Rule {
  public void body( String ns, String name,
      String text) {
    Map ctx = (Map) getDigester().peek();
    ((List) ctx.get("TABLE_ROW")).add(text);
  }
}
The code above does not cache the created
PreparedStatement instances; it recreates one for every row. This
can hurt performance. However, if the code runs inside a J2EE container
and the connection comes from a container-managed DataSource,
prepared statements are most likely cached automatically. If not,
the getStatement() method can be extended to save
created PreparedStatement instances within the processing
context. Note that cached statements must then be explicitly closed at the
end of processing, for example in the end() method of
TableRule.
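One possible shape for such a cache is sketched here, assuming statements are keyed by their SQL text. StatementCacheSketch is a hypothetical helper, not part of DBUnitRuleSet, and its demo() method uses dynamic proxies as stand-in JDBC objects so that it runs without a database.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of DBUnitRuleSet: caches statements by SQL text.
class StatementCacheSketch {
    private final Map<String, PreparedStatement> cache = new HashMap<>();

    // Returns the cached statement for this SQL, preparing it on first use.
    PreparedStatement get(Connection conn, String sql) throws SQLException {
        PreparedStatement st = cache.get(sql);
        if (st == null) {
            st = conn.prepareStatement(sql);
            cache.put(sql, st);
        }
        return st;
    }

    // Closes every cached statement; meant to be called once at the end of processing.
    void closeAll() throws SQLException {
        SQLException first = null;
        for (PreparedStatement st : cache.values()) {
            try {
                st.close();
            } catch (SQLException e) {
                if (first == null) first = e;
            }
        }
        cache.clear();
        if (first != null) throw first;
    }

    // Self-check with stand-in JDBC objects; returns
    // {number of prepareStatement() calls, number of close() calls}.
    static int[] demo() {
        final int[] counts = new int[2];
        final PreparedStatement fakeSt = (PreparedStatement) Proxy.newProxyInstance(
            StatementCacheSketch.class.getClassLoader(),
            new Class<?>[] { PreparedStatement.class },
            (proxy, method, args) -> {
                if (method.getName().equals("close")) counts[1]++;
                return null; // only void methods are called on this stand-in
            });
        Connection fakeConn = (Connection) Proxy.newProxyInstance(
            StatementCacheSketch.class.getClassLoader(),
            new Class<?>[] { Connection.class },
            (proxy, method, args) -> {
                if (method.getName().equals("prepareStatement")) {
                    counts[0]++;
                    return fakeSt;
                }
                return null;
            });
        StatementCacheSketch statements = new StatementCacheSketch();
        try {
            String sql = "INSERT INTO TABLE1(col1,col2) VALUES (?,?)";
            statements.get(fakeConn, sql);
            statements.get(fakeConn, sql); // second row: served from the cache
            statements.closeAll();
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return counts;
    }
}
```

In the rules above, an instance of such a cache could be stored in the context Map under its own key (an assumption, since the original code defines no such key), with closeAll() invoked when the table or the whole document has been processed.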
For event-driven code, testing is twice as important as it is for any other application.
It is not always possible to clearly observe which events will be fired by the
event generator. In our case, events are generated by the SAX XML parser, so
we build test data for this. It does not make much sense to test each rule
independently, because they are related. On the other hand, for a first shot
at an execution sequence test for DBLoader, we don't
really need a database connection and can use a mocked environment. It is easy to implement such a test using the
jMock dynamic mock
testing framework. A mocked Connection and PreparedStatement can verify that
rules are executed in an appropriate order and that they convert all data from XML.
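When it is unclear which events the parser fires for a piece of test data, a small trace handler can list them. This is a sketch that uses only the JDK's SAX API, not Digester; SaxEventTraceSketch is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: records SAX element events in the order they are fired.
class SaxEventTraceSketch {
    static List<String> trace(String xml) {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                StringBuilder sb = new StringBuilder("start ").append(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    sb.append(' ').append(atts.getQName(i))
                      .append('=').append(atts.getValue(i));
                }
                events.add(sb.toString());
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse test XML", e);
        }
        return events;
    }
}
```

Calling SaxEventTraceSketch.trace() on a flat dataset shows each table element producing a start event immediately followed by its end event, which is the sequence the rules have to turn into one INSERT per row.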
Here is a simple test suite.
public class DBLoaderTest extends TestCase {
  ...
  private static final String DBUNIT_FDATA =
      "<dataset>\n"+
      " <TABLE1 col1=\"1\" col2=\"11\"/>\n"+
      " <TABLE1 col1=\"2\" col2=\"22\"/>\n"+
      "</dataset>";
  public static Test suite() {
    String name = DBLoaderTest.class.getName();
    TestSuite suite = new TestSuite(name);
    suite.addTest( new DBLoaderTest(
        new DBUnitRuleSet(), DBUNIT_DATA));
    suite.addTest( new DBLoaderTest(
        new DBUnitFlatRuleSet(), DBUNIT_FDATA));
    return suite;
  }
  private final RuleSet ruleSet;
  private final String xml;
  private DBLoaderTest( RuleSet ruleSet,
      String xml) {
    super("testDBLoader");
    this.ruleSet = ruleSet;
    this.xml = xml;
  }
  public void testDBLoader() throws Exception {
    Mock ps = new Mock(PreparedStatement.class);
    // Expected (parameter index, value) pairs, two rows of two columns.
    Object[][] params = new Object[][] {
        { new Integer(1), "1"},
        { new Integer(2), "11"},
        { new Integer(1), "2"},
        { new Integer(2), "22"}};
    for( int i = 0; i<params.length; i++) {
      ps.expects(new InvokeOnceMatcher())
          .method(new IsSetter())
          .with(new IsEqual(params[i][0]),
                new IsEqual(params[i][1]))
          .isVoid();
    }
    // One executeUpdate() call per row.
    ps.expects(new InvokeCountMatcher(2))
        .method("executeUpdate")
        .will(new ReturnStub(new Integer(1)));
    Mock conn = new Mock(Connection.class);
    conn.expects(new InvokedRecorder())
        .method("prepareStatement")
        .will(new ReturnStub(ps.proxy()));
    Reader r = new StringReader( xml);
    DBLoader loader = new DBLoader(ruleSet);
    loader.load((Connection) conn.proxy(), r);
    ps.verify();
    conn.verify();
  }
  public String getName() {
    String name = ruleSet.getClass().getName();
    return super.getName()+" "+name;
  }
  public class IsSetter implements Constraint {
    public boolean eval( Object o) {
      return ((String) o).startsWith("set");
    }
  }
}
The same test case can be used to test both layouts, because the sequence of
JDBC calls will be the same in both cases for the same data. The method
testDBLoader() creates a Mock for PreparedStatement
and sets its expectations based on the source XML structure. Expected methods are
setObject()/setString() and executeUpdate().
The test method also calls verify() for all mocks after DBLoader
execution to ensure that expectations are met.
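The same record-and-verify idea can also be sketched without jMock, using the JDK's java.lang.reflect.Proxy as a hand-rolled recorder. This is a substitute technique for illustration only, not the approach used in DBLoaderTest; RecordingStatementSketch is a hypothetical name, and its demo() method simply replays the calls expected for the first row of the test data.

```java
import java.lang.reflect.Proxy;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical recorder: logs every call made on a stand-in PreparedStatement.
class RecordingStatementSketch {
    final List<String> calls = new ArrayList<>();

    // Each call is logged as the method name followed by Arrays.toString(args),
    // or the bare method name when the method takes no arguments.
    PreparedStatement proxy() {
        return (PreparedStatement) Proxy.newProxyInstance(
            RecordingStatementSketch.class.getClassLoader(),
            new Class<?>[] { PreparedStatement.class },
            (p, method, args) -> {
                calls.add(method.getName()
                    + (args == null ? "" : Arrays.toString(args)));
                // Only setters (void) and executeUpdate() are supported here.
                return method.getName().equals("executeUpdate")
                    ? Integer.valueOf(1) : null;
            });
    }

    // Replays the calls expected for the row col1="1" col2="11".
    static List<String> demo() {
        RecordingStatementSketch rec = new RecordingStatementSketch();
        PreparedStatement st = rec.proxy();
        try {
            st.setObject(1, "1");
            st.setObject(2, "11");
            st.executeUpdate();
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
        return rec.calls;
    }
}
```

A test would pass the proxy to the loader through a stand-in Connection and then compare the recorded list with the expected sequence. jMock adds call-count matching and readable failure messages on top of this basic mechanism.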
As shown above, Digester can help to isolate XML processing logic in maintainable
rules and maintain the advantages of the stream-based XML processing. The code is
easy to understand and test.
Eugene Kuleshov is an independent consultant with over 15 years of experience in software design and development.
Return to ONJava.com.