Digesting XML documents
by Keld H. Hansen
Introduction
When it comes to parsing XML documents, there are several ways to
proceed. One is to use SAX, an event-driven API that lets you
catch precisely the data you need. SAX is fairly low-level,
however, so you'll often prefer a tool like JDOM, which builds an in-memory tree of the XML
document that you can then manipulate through various method calls.
Another approach is to let Castor or XMLBeans convert the XML document into a linked
structure of Java beans, which are easy to access from your
program.
If you're only interested in parts of an XML document, or you
don't care about fancy tree structures, Digester from Jakarta
Commons could be an option. It lets you extract just the
parts of the document you need, and it puts few restrictions on
how you store the data in your program. In general it's simpler
to use than the other tools I mentioned.
It resembles SAX in that it links XML elements to
methods in your program, but it is much simpler to use than SAX.
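To make the comparison concrete, here is roughly what plain SAX asks of you. This is a minimal sketch that uses only the JDK's javax.xml.parsers and org.xml.sax classes. It does not use Digester, and the class name PlainSaxExample and its textAt method are hypothetical. The handler has to track the current element path and buffer character data by hand. Digester handles this kind of bookkeeping for you.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: plain SAX, no Digester involved.
class PlainSaxExample {

    /** Returns the trimmed text of every element whose slash-separated path equals targetPath. */
    static List<String> textAt(String xml, String targetPath) {
        List<String> results = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // Names of the currently open elements, outermost first.
            private final Deque<String> path = new ArrayDeque<>();
            // One text buffer per open element, innermost on top.
            private final Deque<StringBuilder> text = new ArrayDeque<>();

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                path.addLast(qName);
                text.push(new StringBuilder());
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver one element's text in several chunks.
                text.peek().append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String body = text.pop().toString().trim();
                if (String.join("/", path).equals(targetPath)) {
                    results.add(body);
                }
                path.removeLast();
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return results;
    }

    public static void main(String[] args) {
        String xml = "<A><B>data for B</B><C><D>data for D</D></C></A>";
        System.out.println(textAt(xml, "A/B"));   // prints [data for B]
        System.out.println(textAt(xml, "A/C/D")); // prints [data for D]
    }
}
```

Even this tiny task needs two stacks and three callbacks. The code grows quickly once you also have to create objects and connect them to each other.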
Digester has a programming API, but you can also describe how
processing should be done in an XML configuration file.
Furthermore, its architecture is very open: you can define your
own processing rules by coding them as separate plug-ins.
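The idea of pluggable rules can be illustrated with a toy version. The sketch below is not Digester's API. MiniDigester, Rule and CollectTextRule are hypothetical names, and the begin/body/end callbacks are only loosely modeled on the way Digester notifies its rules. Rules are registered against element patterns, and a custom rule is simply another class implementing the interface. That is the essence of the plug-in approach.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical rule interface: callbacks fired when an element matches a pattern.
interface Rule {
    default void begin(String pattern, Attributes attrs) {}
    default void body(String pattern, String text) {}
    default void end(String pattern) {}
}

// Hypothetical "plug-in" rule: collects the text of every matched element.
class CollectTextRule implements Rule {
    final List<String> collected = new ArrayList<>();

    @Override
    public void body(String pattern, String text) {
        collected.add(text);
    }
}

// Hypothetical toy engine that dispatches SAX events to rules by element pattern.
class MiniDigester {
    private final Map<String, List<Rule>> rules = new HashMap<>();

    void addRule(String pattern, Rule rule) {
        rules.computeIfAbsent(pattern, k -> new ArrayList<>()).add(rule);
    }

    private List<Rule> rulesFor(String pattern) {
        return rules.getOrDefault(pattern, Collections.emptyList());
    }

    void parse(String xml) {
        DefaultHandler handler = new DefaultHandler() {
            private final Deque<String> path = new ArrayDeque<>();        // outermost first
            private final Deque<StringBuilder> text = new ArrayDeque<>(); // innermost on top

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                path.addLast(qName);
                text.push(new StringBuilder());
                String pattern = String.join("/", path);
                for (Rule r : rulesFor(pattern)) r.begin(pattern, attrs);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.peek().append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String pattern = String.join("/", path);
                String body = text.pop().toString().trim();
                for (Rule r : rulesFor(pattern)) r.body(pattern, body);
                for (Rule r : rulesFor(pattern)) r.end(pattern);
                path.removeLast();
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }
}
```

Suppose you register a CollectTextRule for "A/C/D" and parse `<A><B>data for B</B><C><D>data for D</D></C></A>`. The rule's collected list then holds the single entry "data for D". The engine itself never changes when you add new kinds of rules.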
If you're interested in another XML tool for your toolbox,
read on. First I'll show you how to install
Digester, then we'll look at a few basic examples, and finally
we'll build a Struts web application that processes and displays
data from an RSS
(Really Simple Syndication, or Rich Site Summary) feed.
Installing Digester
To run Digester you'll need the Digester jar file plus three
jar files from other Jakarta Commons projects: Beanutils,
Collections and Logging.
All of them can be downloaded from the same Jakarta
download page. After downloading, place the jar
files on your classpath and you're ready to run.
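If a class cannot be found at runtime, a missing jar is the likely cause. One quick way to check your setup is a small program like the hypothetical ClasspathCheck below. The four class names are assumptions based on the package layout of the Digester 1.x era jars (org.apache.commons.digester and its sibling packages). Later major versions, such as Digester 3, moved to different packages, so adjust the list to match the versions you downloaded.

```java
// Hypothetical helper: reports whether the classes Digester depends on can be loaded.
class ClasspathCheck {
    // Assumed class names from the 1.x-era Jakarta Commons jars; adjust for other versions.
    static final String[] REQUIRED = {
        "org.apache.commons.digester.Digester",        // Digester
        "org.apache.commons.beanutils.PropertyUtils",  // Beanutils
        "org.apache.commons.collections.ArrayStack",   // Collections
        "org.apache.commons.logging.LogFactory"        // Logging
    };

    /** True if the named class can be loaded (without running its static initializers). */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Either the class is missing, or it was found but a class it needs was not.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : REQUIRED) {
            System.out.println((isOnClasspath(name) ? "found:   " : "MISSING: ") + name);
        }
    }
}
```

Run it with the same classpath you intend to use for your application. Any line marked MISSING points to a jar that still needs to be added.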
The Digester design
When using Digester there are some simple rules you must know and follow:
- Determine what data you'll need from the XML document
  You may extract anything from a single data value to all of the data.
- Create Java classes (if you don't have them already) to hold the data you extract
  These classes must also have methods that can be used to store the
  data. The standard bean setter methods are fine for non-Collection data.
  Collections may be handled by "add-methods", each of which adds one element to a
  Collection. The examples that follow show how this works.
- Digester identifies each XML element by a simple string pattern
  Take a simple XML document like this:
      <A>
        <B>data for B</B>
        <C>
          <D>data for D</D>
        </C>
      </A>
  To identify the B element, Digester uses the string "A/B".
  The data in D is referred to by "A/C/D", because D is nested inside C, not inside B.
  These strings are called "element patterns". More on this in the examples below.
- Rules for element matching
  When an XML element is matched by a pattern, you must specify what should
  happen in your program, that is, which Java objects should be
  created or which Java methods should be called.
- If possible, use the same names for XML elements and attributes as for
  bean properties
  This simply makes coding easier.
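The rules above can be tried out without Digester itself. The following sketch uses only the JDK's SAX parser to wire the sample document to two hypothetical beans. BeanA has a standard setter for the single-valued B element and an add-method, addC, for the repeatable C element. BeanC has a setter for D. The PatternLoader class and its hard-coded patterns are also hypothetical. They stand in for the object-creation and method-call rules you would register with Digester, and the element names match the bean property names as recommended.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical bean for <C>: a standard setter for the single-valued D element.
class BeanC {
    private String d;
    public String getD() { return d; }
    public void setD(String d) { this.d = d; }
}

// Hypothetical bean for <A>: a setter for B and an add-method for repeatable C elements.
class BeanA {
    private String b;
    private final List<BeanC> cs = new ArrayList<>();
    public String getB() { return b; }
    public void setB(String b) { this.b = b; }
    public void addC(BeanC c) { cs.add(c); }
    public List<BeanC> getCs() { return cs; }
}

// Hypothetical loader: maps element patterns to bean creation and method calls.
class PatternLoader {
    /** Returns the populated root bean, or null if the document has no <A> root. */
    static BeanA load(String xml) {
        Deque<Object> beans = new ArrayDeque<>();       // beans under construction, innermost on top
        Deque<String> path = new ArrayDeque<>();        // open elements, outermost first
        Deque<StringBuilder> text = new ArrayDeque<>(); // one text buffer per open element
        List<BeanA> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                path.addLast(qName);
                text.push(new StringBuilder());
                String pattern = String.join("/", path);
                if (pattern.equals("A")) beans.push(new BeanA());        // create the root bean
                else if (pattern.equals("A/C")) beans.push(new BeanC()); // create a child bean
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.peek().append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String pattern = String.join("/", path);
                String body = text.pop().toString().trim();
                switch (pattern) {
                    case "A/B":   ((BeanA) beans.peek()).setB(body); break; // setter
                    case "A/C/D": ((BeanC) beans.peek()).setD(body); break; // setter
                    case "A/C": {                                           // add child to parent
                        BeanC c = (BeanC) beans.pop();
                        ((BeanA) beans.peek()).addC(c);
                        break;
                    }
                    case "A":     result.add((BeanA) beans.pop()); break;  // root finished
                    default:      break;                                    // unmatched: ignore
                }
                path.removeLast();
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return result.isEmpty() ? null : result.get(0);
    }
}
```

With the sample document above, load returns a BeanA whose getB() is "data for B" and whose getCs() holds one BeanC with getD() equal to "data for D". A second <C> element would simply lead to a second addC call. This is why an add-method suits repeatable elements better than a setter.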