精华区文章阅读

发信人: rhine (有雨无风), 信区: Java
标  题: XML APIs for Databases
发信站: 哈工大紫丁香 (2000年12月17日19:36:37 星期天), 站内信件

XML APIs for Databases
By Ramnivas Laddad ; Reprinted from JavaWorld
For processing XML documents, most XML tools work with the SAX or DOM
API. In this article, we'll look at a way to implement the same APIs
directly over a database, enabling XML tools to treat databases as if
they were XML documents. That way, we can obviate the need of converting
a database.

We'll see an implementation of the SAX API for Databases that should
work with any database with a JDBCTM driver. Next, we'll examine an
implementation of the DOM API for Databases that uses the SAX API
internally. To demonstrate the SAX API for Databases, we'll look at
its integration with XT (an XSLT processor). We'll also see an example
of how such integration can create HTML pages that incorporate an XSLT
stylesheet directly from a database and how it can convert a database
into an XML document. Finally, we'll look at how the DOM API for
Databases integrates with an XQL processor.

In this article, I make use of existing tools rather than create new
tools to illustrate the applications of the SAX and DOM APIs for
Databases. I show how to leverage a number of available XML tools to
work with a database. All the XML tools I mention are either available
for free or free for noncommercial use (though you should, of course,
check licensing agreements).

Overview of the SAX and DOM APIs
SAX is an event-based API for XML. With it, the SAX parser reports
events such as the start and end of elements to the application as it
walks over the document. Because the parser reports events as it
visits different parts of the document, it does not have to build any
internal structure. That reduces the strain on system resources, which
makes the parser attractive for large documents. For XML documents
received as continuous streams, an event-based API is the only choice.

The DOM API, on the other hand, follows a treelike construct. Elements
have parent-child relations with other elements. With this API, the
parser builds an internal structure such that an application can
navigate it in a treelike fashion. DOM allows an application to have
random access to the tree-structured document at the cost of increased
memory usage.

XML APIs for databases: The basics
Because of a database's highly regular data-storage structure, we can
map it into data-centric XML documents. For example, we can transform
a database table into an XML document with a DTD of the following form:

<!ELEMENT table rows*>
<!ELEMENT rows (column1, column2, ...)>
<!ELEMENT column1 #PCDATA>
<!ELEMENT column2 #PCDATA>
....

In other words, with an XML API for databases, we can make the
database look like an XML document; these APIs present the database as a
virtual XML document. We are at the most basic concept of
object-oriented design: it is the interface -- not the implementation --
that matters. In our situation, the tools using such an XML API need
not care whether they are operating on a database table or an XML file.

A SAX or DOM parser can enable XML tools to work directly with databases

Back to Top

Implementing the SAX API for Databases
To implement the SAX API for Databases, we need to implement a parser
that operates on a JDBC data source, iterates over each row and column,
and generates appropriate SAX events while iterating. The SAX
specification provides the org.xml.sax.InputSource class that models a
data source representing a URL or a byte stream. To represent a
database, we need a specialized form of it that can represent a table in
a database. We therefore implement JDBCInputSource, which extends the
org.xml.sax.InputSource class. Let's look at JDBCInputSource in more
detail:
// JDBCInputSource.java
package dbxml.sax;
import java.sql.*;
import org.xml.sax.InputSource;
public class JDBCInputSource extends InputSource {
private String _connectionURL;
private String _userName;
private String _passwd;
private String _tableName;
public JDBCInputSource(String connectionURL, String userName,
String passwd, String tableName) {
super(connectionURL);
_connectionURL = connectionURL;
_userName = userName;
_passwd = passwd;
_tableName = tableName;
}
public String getTableName() {
return _tableName;
}
public Connection getConnection() throws SQLException {
return DriverManager.getConnection(_connectionURL, _userName, _passwd);

}
}

In the code above, the constructor takes the information needed to
connect to a database and the name of the table to be parsed. The method
getConnection() connects to the database and returns a Connection
object.

Next, we need to implement the SAX parser that uses JDBCInputSource to
iterate over database table rows and columns and generates SAX events
along the way. To simplify the code, we create an abstract ParserBase
class, which implements the org.xml.sax.Parser and has responsibility
only for managing various handlers. We then create our SAX parser for
the JDBC source JDBCSAXParser that extends the ParserBase class.

(To view the code for ParserBase.java, click here.)

// JDBCSAXParser.java
package dbxml.sax;
import java.io.IOException;
import java.sql.*;
import org.xml.sax.*;
import org.xml.sax.helpers.AttributeListImpl;
public class JDBCSAXParser extends ParserBase {
private static final AttributeList _stockEmptyAttributeList
= new AttributeListImpl();
//-------------------------------------
// Methods from the Parser interface
//-------------------------------------
public void parse (InputSource source) throws SAXException,
IOException {
if (! (source instanceof JDBCInputSource)) {
throw new SAXException("JDBCSAXParser can work only with source "
+ "of JDBCInputSource type");
}
parse((JDBCInputSource)source);
}

public void parse (String systemId) throws SAXException, IOException {
throw new SAXException("JDBCSAXParser needs more information to "
+ "connect to database");
}

//-------------------------
// Additional methods
//--------------------------
public void parse(JDBCInputSource source)
throws SAXException, IOException {
try {
Connection connection = source.getConnection();
if (connection == null) {
throw new SAXException("Could not establish connection with "
+ "database");
}

String sqlQuery = getSelectorSQLStatement(source.getTableName());
PreparedStatement pstmt = connection.prepareStatement(sqlQuery);

ResultSet rs = pstmt.executeQuery();
parse(rs, source.getTableName());
rs.close();

connection.close();
} catch (SQLException ex) {
throw new SAXException(ex);
}
}

public void parse(ResultSet rs, String tableName)
throws SAXException, SQLException, IOException {
if (_documentHandler == null) {
return; // nobody is interested in me, no need to sweat!
}

ResultSetMetaData rsmd = rs.getMetaData();
int numCols = rsmd.getColumnCount();

String tableMarker = getTableMarker(tableName);
String rowMarker = getRowMarker();

_documentHandler.startDocument();
_documentHandler.startElement(tableMarker, _stockEmptyAttributeList);
while(rs.next()) {
_documentHandler.startElement(rowMarker,
_stockEmptyAttributeList);
for (int i = 1; i <= numCols; i++) {
generateSAXEventForColumn(rsmd, rs, i);
}
_documentHandler.endElement(rowMarker);
}
_documentHandler.endElement(tableMarker);
_documentHandler.endDocument();
}

public void parse(String connectionURL, String userName, String passwd,

String tableName) throws SAXException, IOException {
parse(new JDBCInputSource(connectionURL, userName, passwd, tableName));

}

//--------------------------------------
// Protected methods that derived classes could override to
// customize the parsing.
//--------------------------------------
protected void generateSAXEventForColumn(
ResultSetMetaData rsmd,
ResultSet rs,
int columnIndex)
throws SAXException, SQLException {
String columnValue = rs.getString(columnIndex);
if (columnValue == null) {
return;
}
String columnMarker
= getColumnMarker(
rsmd.getColumnLabel( columnIndex));
char[] columnValueChars =
columnValue.toCharArray();
_documentHandler.startElement(
columnMarker,
_stockEmptyAttributeList);
_documentHandler.characters(
columnValueChars,
0, columnValueChars.length);
_documentHandler.endElement(
columnMarker);
}

protected String getTableMarker(
String tableName) {
return tableName;
}
protected String getRowMarker() {
return "row";
}
protected String getColumnMarker(
String columnName) {
return columnName;
}
protected String
getSelectorSQLStatement(
String tableName) {
return "select * from " +
tableName;
}
}

Let's examine the code in more detail. JDBCSAXParser includes several
overloaded parse() methods. In the list below, the org.xml.sax.Parser
interface requires implementing the parse(InputSource) and parse(String)
methods. The other parse() methods simplify the code and allow
derived classes to override them to modify the parser behavior.

The parse(InputSource) method calls the parse(JDBCInputSource) method if
the argument is of type JDBCInputSource; otherwise, it throws a
SAXException as it cannot deal with it.

The parse(String) method throws a SAXException as the information
supplied is not sufficient to access the database.

The parse(JDBCInputSource) method gets a Connection object from the
input source and executes a query to obtain a ResultSet object. It
then calls parse(ResultSet) with this object.

The parse(ResultSet, String) method performs the core parsing logic.
It iterates over each row in the result set and each column in the rows.
The row iteration loop is surrounded by calls to startElement() and
endElement() with a table marker as the element-name argument.
Similarly, each column iteration loop is surrounded by calls to
startElement() and endElement() with a row marker as the element-name
argument. In both cases an empty attribute list passes as the second
argument to the startElement() methods. On each visit to a column, the
generateSAXEventForColumn() method is called with column-name and
column-value arguments. The value of a column is accessed by the
getString() method on the result-set object, as we need a string
representation of the column data to be notified in the characters() SAX
event.

The convenience method parse(String, String, String, String) simply
creates a JDBCInputSource object with the arguments passed to it and
then calls the parse(JDBCInputSource) method with it.
The protected methods of JDBCSAXParser offer some customization
possibilities through overriding:

The generateSAXEventForColumn() method generates events for column data.
A null value for a column in a database has a different meaning from
a column with an empty string. We capture the difference by not firing
any events for a column that has a null value. Another choice for
representing a null value in a database is to use a binary attribute
like isNull. With this option, a true value will be set for null data;
otherwise it will be false.

The getTableMarker(), getRowMarker(), and getColumnMarker() methods
return reasonable defaults for table, row, and column markers. Derived
classes may override these to provide custom markups.

The getSelectorSQLStatement() method returns a "select * from
<tableName>" string. Derived classes can override it to provide a
different select query string to offer database-level filtering.
The JDBCSAXUtils convenience class provides two methods for creating a
JDBCInputSource: it can be done from either a property file or a
Property object. There's no need to supply a long list of parameters
that describe a database to an application that uses either the SAX or
DOM APIs for database. The class expects the user to supply a property
file that contains entries for a database URL, a user name and
password to connect to the database, a JDBC driver to establish a
connection, and a table name. The code below demonstrates a typical
property file:

URL=jdbc:odbc:db1
user=jw
password=jw-passwd
table=portfolio
driver=sun.jdbc.odbc.JdbcOdbcDriver

Back to Top

The story so far ...
We now have a simple parser that can generate appropriate SAX events for
the information in a database table. It takes care of null data and
offers some marker customization. While such functionality may be
sufficient for some applications, the complete solution will consider
additional functionality because:

The parser does not incorporate relational information. That can be
solved by using a XPointer/XLink to set the reference to a foreign key
in a table.

A text column in a database may contain marked-up data. A SAX parser for
databases should parse those data as well and generate appropriate
SAX events. If such functionality is important for an application, it
could override generateSAXEventForColumn() and parse the content of
the column and generate additional SAX events.

In databases, a table contains an unordered list of columns; the order
in which columns are stored is not important. An XML DTD, on the other
hand, does not have a way to describe an unordered collection of child
elements.
We can deal with this problem in a few ways. If the task is to convert a
database into another XML document, say an HTML page, the XSLT
stylesheet written for that purpose can create output in the correct
order. We could also override the getSelectorSQLStatement() method to
supply an explicit list of columns in the correct order.

It is desirable to present only a selected part of a table as a document
based on some query. While XML tools can do the filtering, databases
are better at it. The getSelectorSQLStatement() method can be overridden
to return the appropriate select query string.

The parser uses the getString() method on the result-set object to
obtain the string representation of the value in a column. This works
fine for columns with text, numbers, and so on, but it does not work
well with binary data. While binary data can be represented as text,
that may not be suitable for certain tasks. The parser also does not
deal with user-defined types available with SQL3/JDBC 2.0.
We can solve both problems by overriding the generateSAXEventForColumn()
method and providing a suitable implementation.

Back to Top

Implementing the DOM API for Databases
To build a DOM tree for a database table, we could iterate over rows and
columns and build nodes for a tree as we visit them. Or we could employ
another library, like Sun's JAXP implementation, that builds a DOM tree
from a SAX event stream. The latter approach is simpler, and it
requires less coding because it reuses an existing facility. To
implement the DOM API using such an approach we need just a clever reuse
of the SAX parser for databases we implemented earlier.
Class JDBCDOMParser implements the DOM API for Databases:

// JDBCDOMParser.java
package dbxml.dom;
import java.io.IOException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import com.sun.xml.tree.XmlDocumentBuilder;
import dbxml.sax.*;
public class JDBCDOMParser {
public static Document
createDocument(
JDBCInputSource inputSource)
throws SAXException, IOException {
XmlDocumentBuilder documentBuilder =
new XmlDocumentBuilder();
JDBCSAXParser saxParser =
new JDBCSAXParser();
documentBuilder.setParser(saxParser);
saxParser.parse(inputSource);
return documentBuilder.getDocument();
}
}

The implementation for class JDBCDOMParser is simple. It uses the
XmlDocumentBuilder class provided by JAXP to build a DOM document from a
SAX event stream. The JDBCDOMParser has only one method:
createDocument(), which takes a JDBC data source as the argument. The
method creates a JDBCSAXParser and sets it as a parser for an
XmlDocumentBuilder object. It then fires the parsing and returns the
resulting document from the XmlDocumentBuilder object. Internally, the
XmlDocumentBuilder object responds to the SAX events generated by the
JDBCSAXParser object by building a DOM document.

Using the SAX API for Databases
We have already looked at one example of using the SAX API for Databases
to implement a DOM API for Databases. Here we look at another example
of using the SAX API for Databases. In this section, we look at using
the SAX API for Databases to integrate with XT -- an XSLT processor
written in Java. With such integration, we can apply an XSLT
stylesheet directly to the virtual XML documents stored in a database.

We wrap the logic of creating a SAX source for a given database source
and processing it with the given XSLT stylesheet to produce an output
file in class JDBCXSLProcessor, which is based on com.jclark.xsl.sax.
Driver from XT. The main method takes three arguments: a database
property file, an XSLT stylesheet file, and an output file.

As we'll see below, we can use this approach to generate HTML pages
directly without incurring the penalty of creating an intermediate XML
file. Moreover, we'll look at how we can use the integration of the
SAX API for databases and XT to convert a database into a nonvirtual XML
document.

(To view the source code for JDBCXSLProcessor.java, click here.)

Generating HTML pages directly from a database using an XSLT
stylesheet
Here we look at a simple stylesheet that formats a highly regular XML
document represented by a database table. The database table is
formatted as an HTML table. The generic stylesheet createTable.xsl can
be used to transform any XML document with a tablelike structure. The
stylesheet uses names of markers for columns as table headers.

(To view the stylesheet createTable.xsl, click here.)

Converting a database to XML with an XSLT stylesheet
While most XML applications work with the SAX or DOM API, we may need to
obtain an XML document file in some situations. For example, we need an
XML document to use a tool that does not work with either API. Here,
I suggest a way to convert a database into an XML file. With this
approach, we write an XSLT stylesheet to do the identity transformation.
Using such a stylesheet with the SAX API for Databases will yield an
XML document that represents a table in the database. I have provided
a stylesheet -- identity.xsl -- for the identity transformation that
works with the current implementation of XT.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|*">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>
</xsl:stylesheet>

Note that while an XSLT stylesheet easily performs the identity
transformation, it is not very efficient as it goes through the
complex general-purpose logic of applying the full stylesheet. This
inefficiency can be a particular problem for database tables with
large numbers of records. An alternative approach is to write an
application to perform the identity transformation. Such an
application would listen to SAX events and create XML elements and
data in response to them, resulting in an XML document that represents
the table in the database.

Using the DOM API for Databases
For most situations, the SAX API for Databases is more memory
efficient than the DOM API for Databases. However, some applications
need random access to the XML documents and therefore require the
treelike structure that the DOM API for Databases offers.
Integrate the DOM API for Databases with the XQL processor
XML Query Language (XQL), a query language for XML documents, has a
syntax similar that of to XPath patterns. Here, we integrate our DOM
parser for database with GMD-IPSI's XQL Engine. With such integration,
we can perform SQL-like queries on the XML document representing a
database table.

As an example of integration, I provide a simple shell for querying
the XML document obtained from a database table. The class
JDBCXQLProcessor creates a shell-like environment, which takes the
queries from the user and prints the resulting document. The method
processQueries() can work with any Document object -- not just objects
created by the JDBCDOMParser. It reads the System.in for the query
string, performs the query, and prints the result on System.out. The
main() method creates a JDBCInputSource object from its argument and
uses it with JDBCDOMParser to obtain a Document object corresponding
to the given database table.

(To view the code for JDBCXQLProcessor.java, click here.)

As a side note, writing a database-to-XML converter is a snap with
XMLWriter and JDBCDOMParser. Just get a Document object from
JDBCDOMParser and write it to the desired file with XMLWriter.
write(Document).

Conclusion
In this article, we discussed using the XML APIs for database to tap the
information residing in databases. With such APIs, we can avoid the
costs associated with converting a database into XML documents and the
cost of keeping them synchronized. We presented a Java implementation of
SAX and DOM APIs for databases, which can work with any database with a
JDBC driver. We then presented an integration of SAX API for
Databases with XT. We illustrated the uses of such integration to create
HTML pages directly from a database and to convert databases into XML
documents. Finally, we integrated the DOM API for Databases with an
XQL processor.
Complete Source Code: zip file

Reprinted with permission from the January 2000 edition of JavaWorld
magazine. Copyright Web Publishing Inc., an IDG Communications company.
Register for editorial e-mailalerts

About the Author
Ramnivas Laddad is a Sun-certified architect of Java technology (Java
2). He has a master's in electrical engineering with a specialization in
communication engineering. He has six years of experience with
several software projects involving GUI, networking, and distributed
systems. He has developed object-oriented software systems in Java for
the last two years and in C++ for the last five years.

Ramnivas works at Real-Time Innovations, where he is working to design
and develop ControlShell, a component-based programming framework for
complex real-time systems.

Back to Top

------------------------------------------------------------------------
--------

--
           海纳百川，
                   有容乃大，
                           壁立千尺，
                                   无欲则刚。

※ 来源:·哈工大紫丁香 bbs.hit.edu.cn·[FROM: dip.hit.edu.cn]

Java 版 (精华区)