Java board (digest area)
Posted by: rhine (有雨无风), Board: Java
Subject: Programming XML in Java[1]
Posting site: HIT Lilac BBS (Sunday, December 17, 2000, 15:10:07), internal mail
Programming XML in Java, Part 1
Create Java apps with SAX appeal
Summary
In Web-authoring systems and information channel definitions, in
middleware and in the core of enterprise databases, organizations and
individuals are embracing XML as a powerful tool to help solve their
data-management problems. But as powerful as it may be for
representing data, XML is useless without an application to process it.
In this article, you'll learn to use the Simple API for XML (SAX)
interface to process XML documents in Java. (5,000 words)
By Mark Johnson
So, you understand (more or less) how you would represent your data in
XML, and you're interested in using XML to solve many of your
data-management problems. Yet you're not sure how to use XML with your
Java programs.
This article is a follow-up to my introductory article, "XML for the
Absolute Beginner," in the April 1999 issue of JavaWorld (see the
Resources section below for the URL). That article described XML; I will
now build on that description and show in detail how to create an
application that uses the Simple API for XML (SAX), a lightweight and
powerful standard Java API for processing XML.
The example code in this article uses the SAX API to read an XML file and
create a useful structure of objects. By the time you've finished this
article, you'll be ready to create your own XML-based applications.
The virtue of laziness
Larry Wall, mad genius creator of Perl (the second-greatest
programming language in existence), has stated that laziness is one of
the "three great virtues" of a programmer (the other two being
impatience and hubris). Laziness is a virtue because a lazy programmer
will go to almost any length to avoid work, even going so far as
creating general, reusable programming frameworks that can be used
repeatedly. Creating such frameworks entails a great deal of work, but
the time saved on future assignments more than makes up for the
initial effort invested. The best frameworks let programmers do
amazing things with little or no work -- and that's why laziness is
virtuous.
XML is an enabling technology for the virtuous (lazy) programmer. A
basic XML parser does a great deal of work for the programmer,
recognizing tokens, translating encoded characters, enforcing rules on
XML file structure, checking the validity of some data values, and
making calls to application-specific code, where appropriate. In fact,
early standardization, combined with a fiercely competitive marketplace,
has produced scores of freely available implementations of standard XML
parsers in many languages, including C, C++, Tcl, Perl, Python, and, of
course, Java.
The SAX API is one of the simplest and most lightweight interfaces for
handling XML. In this article, I'll use IBM's XML4J implementation of
SAX, but since the API is standardized, your application could
substitute any package that implements SAX.
SAX is an event-based API, operating on the callback principle. An
application programmer will typically create a SAX Parser object, and
pass it both input XML and a document handler, which receives
callbacks for SAX events. The SAX Parser converts its input into a
stream of events corresponding to structural features of the input, such
as XML tags or blocks of text. As each event occurs, it is passed to
the appropriate method of a programmer-defined document handler, which
implements the callback interface org.xml.sax.DocumentHandler. The
methods in this handler class perform the application-specific
functionality during the parse.
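A minimal sketch of this callback arrangement follows. It assumes the SAX parser bundled with the JDK (obtained through javax.xml.parsers.SAXParserFactory) rather than XML4J, and it extends org.xml.sax.HandlerBase, the SAX 1 convenience class that implements DocumentHandler with empty methods. Both are deprecated in favor of SAX 2 but remain available. The class name SaxEventTrace and its trace() method are invented for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class: records every SAX callback it receives.
// HandlerBase (SAX 1, deprecated) implements DocumentHandler.
@SuppressWarnings("deprecation")
public class SaxEventTrace extends HandlerBase {
    private final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument()");
    }

    @Override
    public void endDocument() {
        events.add("endDocument()");
    }

    @Override
    public void startElement(String name, AttributeList attrs) {
        events.add("startElement(" + name + ")");
    }

    @Override
    public void endElement(String name) {
        events.add("endElement(" + name + ")");
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only the slice [start, start + length) belongs to this event.
        events.add("characters(" + new String(ch, start, length) + ")");
    }

    // Parses the XML text and returns the callbacks in the order received.
    public static List<String> trace(String xml) {
        SaxEventTrace handler = new SaxEventTrace();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler.events;
    }

    public static void main(String[] args) {
        for (String event : trace("<TITLE>Fleas</TITLE>")) {
            System.out.println(event);
        }
    }
}
```

Calling SaxEventTrace.trace() on a small document yields one entry per callback, in the order the parser made them. A parser is free to split a run of text across several characters() calls, so the number of characters entries can vary between implementations.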
For example, imagine that a SAX parser receives the tiny XML document
shown in Listing 1 below. (See Resources for the XML file.)
<POEM>
<AUTHOR>Ogden Nash</AUTHOR>
<TITLE>Fleas</TITLE>
<LINE>Adam</LINE>
<LINE>Had 'em.</LINE>
</POEM>
Listing 1. XML representing a short poem
When the SAX parser encounters the <POEM> tag, it calls the user-defined
DocumentHandler.startElement() with the string POEM as an argument. You
implement the startElement() method to do whatever the application is
meant to do when a POEM begins. The stream of events and resulting calls
for the piece of XML above appear in Table 1 below.
Table 1. The sequence of callbacks SAX produces while parsing Listing 1
Item encountered         Parser callback
-----------------------  -------------------------------------
{Beginning of document}  startDocument()
<POEM>                   startElement("POEM", {AttributeList})
"\n"                     characters("<POEM>\n...", 6, 1)
<AUTHOR>                 startElement("AUTHOR", {AttributeList})
"Ogden Nash"             characters("<POEM>\n...", 15, 10)
</AUTHOR>                endElement("AUTHOR")
"\n"                     characters("<POEM>\n...", 34, 1)
<TITLE>                  startElement("TITLE", {AttributeList})
"Fleas"                  characters("<POEM>\n...", 42, 5)
</TITLE>                 endElement("TITLE")
"\n"                     characters("<POEM>\n...", 55, 1)
<LINE>                   startElement("LINE", {AttributeList})
"Adam"                   characters("<POEM>\n...", 62, 4)
</LINE>                  endElement("LINE")
<LINE>                   startElement("LINE", {AttributeList})
"Had 'em."               characters("<POEM>\n...", 67, 8)
</LINE>                  endElement("LINE")
"\n"                     characters("<POEM>\n...", 82, 1)
</POEM>                  endElement("POEM")
{End of document}        endDocument()
You create a class that implements DocumentHandler to respond to
events that occur in the SAX parser. These events aren't Java events
as you may know them from the Abstract Window Toolkit (AWT). They are
conditions the SAX parser detects as it parses, such as the start of
a document or the occurrence of a closing tag in the input stream. As
each of these conditions (or events) occurs, SAX calls the method
corresponding to the condition in its DocumentHandler.
So, the key to writing programs that process XML with SAX is to figure
out what the DocumentHandler should do in response to a stream of method
callbacks from SAX. The SAX parser takes care of all the mechanics of
identifying tags, substituting entity values, and so on, leaving you
free to concentrate on the application-specific functionality that
uses the data encoded in the XML.
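As one illustration of application-specific handling, the sketch below turns the callbacks for the poem in Listing 1 into plain Java fields. It is not the article's own example code: the class PoemHandler, its fields, and its parse() method are hypothetical, and it assumes the JDK's bundled parser together with the deprecated SAX 1 HandlerBase class, which implements DocumentHandler.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical handler: collects the author, title, and lines of a POEM.
@SuppressWarnings("deprecation")
public class PoemHandler extends HandlerBase {
    public String author;
    public String title;
    public final List<String> lines = new ArrayList<>();

    // Text seen since the most recent start or end tag.
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String name, AttributeList attrs) {
        text.setLength(0); // discard whitespace found between tags
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // text may arrive in several chunks
    }

    @Override
    public void endElement(String name) {
        String value = text.toString().trim();
        switch (name) {
            case "AUTHOR": author = value; break;
            case "TITLE":  title = value;  break;
            case "LINE":   lines.add(value); break;
            default: break; // POEM itself carries no text of its own
        }
        text.setLength(0);
    }

    // Parses the XML text and returns the populated handler.
    public static PoemHandler parse(String xml) {
        PoemHandler handler = new PoemHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        PoemHandler poem = parse("<POEM>\n<AUTHOR>Ogden Nash</AUTHOR>\n"
                + "<TITLE>Fleas</TITLE>\n<LINE>Adam</LINE>\n"
                + "<LINE>Had 'em.</LINE>\n</POEM>");
        System.out.println(poem.title + " by " + poem.author);
        for (String line : poem.lines) {
            System.out.println(line);
        }
    }
}
```

The handler buffers text in characters() and acts only in endElement(), because the parser may report one run of text in several pieces and also reports the newlines between tags.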
Table 1 shows only events associated with elements and characters. SAX
also includes facilities for handling other structural features of XML
files, such as entities and processing instructions, but these are
beyond the scope of this article.
The astute reader will notice that an XML document can be represented as
a tree of typed objects, and that the order of the stream of events
presented to the DocumentHandler corresponds to a depth-first traversal
of the document tree: each element's start event arrives before the
events for its children, and its end event arrives after them. (It isn't
essential to understand this point, but the concept of an XML document
as a tree data structure is useful in more sophisticated types of
document processing, which will be covered in later articles in this
series.)
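The tree view can be sketched with a stack of open elements: each startElement() pushes a new node under the element currently on top, and each endElement() pops it. The names ElementTreeBuilder, Node, build(), and preOrder() are hypothetical, text content is ignored to keep the sketch short, and the JDK's bundled parser and the deprecated SAX 1 HandlerBase class are assumed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.AttributeList;
import org.xml.sax.HandlerBase;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical handler: rebuilds the element tree from the SAX event stream.
@SuppressWarnings("deprecation")
public class ElementTreeBuilder extends HandlerBase {
    // One element of the document tree (text content is not stored).
    public static class Node {
        public final String name;
        public final List<Node> children = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    private final Deque<Node> open = new ArrayDeque<>(); // elements not yet closed
    private Node root;

    @Override
    public void startElement(String name, AttributeList attrs) {
        Node node = new Node(name);
        if (open.isEmpty()) {
            root = node;                    // first element is the document root
        } else {
            open.peek().children.add(node); // attach to the enclosing element
        }
        open.push(node);
    }

    @Override
    public void endElement(String name) {
        open.pop(); // the element on top of the stack is now complete
    }

    // Parses the XML text and returns the root of the element tree.
    public static Node build(String xml) {
        ElementTreeBuilder handler = new ElementTreeBuilder();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler.root;
    }

    // Visits each node before its children and appends its name to out.
    public static void preOrder(Node node, List<String> out) {
        out.add(node.name);
        for (Node child : node.children) {
            preOrder(child, out);
        }
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        preOrder(build("<POEM><AUTHOR>Ogden Nash</AUTHOR><LINE>Adam</LINE></POEM>"), names);
        System.out.println(names);
    }
}
```

Walking the finished tree parent-first visits the elements in the same order in which the parser reported their start tags, which is the correspondence the paragraph above describes.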
The key to understanding how to use SAX is understanding the
DocumentHandler interface, which I will discuss next.
Resources
"XML for the Absolute Beginner," Mark Johnson (JavaWorld, April 1999):
http://www.javaworld.com/javaworld/jw-04-1999/jw-04-xml.html
David Megginson, creator of SAX, has an excellent SAX site:
http://www.megginson.com/SAX/index.html
"Portable Data/Portable Code: XML & Java Technologies," JP Morgenthal --
Sun whitepaper on the combination of XML and Java:
http://java.sun.com/xml/ncfocus.html
"XML and Java: A Potent Partnership, Part 1," Todd Sundsted (JavaWorld,
June 1999) gives an example of how XML and SAX can be useful for
enterprise application integration:
http://www.javaworld.com/javaworld/jw-06-1999/jw-06-howto.html
"Why XML is Meant for Java," Matt Fuchs (WebTechniques, June 1999) is an
excellent article on XML and Java:
http://www.webtechniques.com/archives/1999/06/fuchs/
Download the source files for this article in one of the following
formats:
In jar format (with class and java files):
http://www.javaworld.com/javaworld/jw-03-2000/xmlsax/SAXMar2000.jar
In tgz format (gzipped tar):
http://www.javaworld.com/javaworld/jw-03-2000/xmlsax/SAXMar2000.tgz
In zip format:
http://www.javaworld.com/javaworld/jw-03-2000/xmlsax/SAXMar2000.zip
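--
The sea takes in a hundred rivers;
its capacity is what makes it great.
The cliff stands a thousand feet tall;
free of desire, it is unyielding.
※ Source: HIT Lilac BBS bbs.hit.edu.cn [FROM: dip.hit.edu.cn]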