This document discusses three different approaches to parsing XML documents: SAX, DOM, and JDOM. SAX is an event-based parser that works by invoking methods as it reads the XML document and encounters different components. DOM parses the entire XML document into a tree structure that can be manipulated in memory. JDOM is a Java library that provides an API for parsing XML into Java objects similar to DOM. The document then provides code examples demonstrating how to use each of these XML parsing approaches to search for a book by ISBN in a sample XML book catalog.
SAX, DOM & JDOM parsers for beginners
Parsing XML with SAX, DOM & JDOM
Hicham Qaissi
hicham.qaissi@gmail.com
Contents
0. What is an XML parser?
1. Describing the example to develop
2. SAX
3. DOM
4. JDOM
5. Conclusion
0. What is an XML parser?
XML parsers let us analyze and compose XML documents. By analyzing the data and
structure of an XML document, we can build objects in a programming language (Java in
our case). We can also do the inverse, that is, build an XML document from data objects
(see Fig. 1). In this manual, I look at three parsing APIs with examples: SAX, DOM and
JDOM.
1. Describing the example to develop
The example I develop is an entertaining one, and it is the same for all three APIs (SAX,
DOM and JDOM). It analyzes an XML document that contains information about some books
(ISBN code, stored in the isbn attribute, plus name, author name, price and editorial).
The program expects a book code (ISBN) and searches for that book in the XML. If the
book exists, all its information is printed to standard output; otherwise, we print a
message reporting that the book doesn't exist in the XML. Are you finding it as amusing as
I do? Let's go!
The XML example (books.xml) is the following:
<books>
<book isbn="0000000001">
<name>Book 1</name>
<author>Author name 1</author>
<price>12.54</price>
<editorial>Editorial 1</editorial>
</book>
<book isbn="0000000002">
<name>Book 2</name>
<author>Author name 2</author>
<price>58.25</price>
<editorial>Editorial 2</editorial>
</book>
<book isbn="0000000003">
<name>Book 3</name>
<author>Author name 3</author>
<price>29.45</price>
<editorial>Editorial 3</editorial>
</book>
<book isbn="0000000004">
<name>Book 4</name>
<author>Author name 4</author>
<price>78.95</price>
<editorial>Editorial 4</editorial>
</book>
<book isbn="0000000005">
<name>Book 5</name>
<author>Author name 5</author>
<price>61.25</price>
<editorial>Editorial 5</editorial>
</book>
</books>
For all parsers (SAX, DOM & JDOM), I use this DTO (Data Transfer Object):
public class MyBook {
private String isbn;
private String name;
private String author;
private String price;
private String editorial;
public String getIsbn() {
return isbn;
}
public void setIsbn(String isbn) {
this.isbn = isbn;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getAuthor() {
return author;
}
public void setAuthor(String author) {
this.author = author;
}
public String getPrice() {
return price;
}
public void setPrice(String price) {
this.price = price;
}
public String getEditorial() {
return editorial;
}
public void setEditorial(String editorial) {
this.editorial = editorial;
}
}
2. SAX
SAX (Simple API for XML) works with events and associated methods. As the parser reads
the XML document and finds its components (elements, attributes, values, etc.) or
detects errors, it invokes the methods that the programmer has associated with those
events. You can find more information about SAX on www.saxproject.org.
First, be sure that you've included the SAX jar in the classpath (the jar file can be
downloaded from http://sourceforge.net/projects/sax/files/). We must instantiate the
reader. This reader implements the XMLReader interface, and we can obtain it from the
abstract class SAXParser. I obtain the SAXParser from the SAXParserFactory. The parse
method of XMLReader analyzes the XML document:
import java.io.IOException;
import org.xml.sax.SAXException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;
public class MySAXSearcher {
public static void main(String[] args) {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware( true );
factory.setValidating( true );
SAXParser saxParser = factory.newSAXParser();
XMLReader xr = saxParser.getXMLReader();
xr.parse( args[0] );
} catch ( IOException ioe ) {
System.out.println( "Error: " + ioe.getMessage() );
} catch ( SAXException saxe ){
System.out.println( "Error: " + saxe.getMessage() );
} catch ( ParserConfigurationException pce ){
System.out.println( "Error: " + pce.getMessage() );
}
}
}
If the program compiles, Java and the jar file are set up correctly. Nevertheless, the
program doesn't do anything yet because we haven't registered interest in any event.
It's important to catch the exceptions java.io.IOException,
org.xml.sax.SAXException and
javax.xml.parsers.ParserConfigurationException.
To handle the events, our main class can extend
org.xml.sax.helpers.DefaultHandler. DefaultHandler implements the following
interfaces:
org.xml.sax.ContentHandler: events about data (the most widely used)
org.xml.sax.ErrorHandler: events about errors
org.xml.sax.DTDHandler: DTD treatment
org.xml.sax.EntityResolver: external entities
We can also write our own classes implementing ContentHandler and ErrorHandler to handle
the events we are interested in:
Data: implement ContentHandler and associate it with the reader (parser) using the
method setContentHandler().
Errors: implement ErrorHandler and associate it with the reader (parser) using the
method setErrorHandler().
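To see setErrorHandler() in action before the full listing, here is a minimal sketch. The class names ErrorHandlerSketch and CollectingErrorHandler are hypothetical (they are not part of the example developed in this manual), and the XML is passed in as a string instead of a file, only to keep the sketch self-contained. It records each problem reported by the parser, so a malformed document should leave one fatal entry in the list.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class ErrorHandlerSketch {

    // Hypothetical handler: records every reported problem instead of printing it.
    static class CollectingErrorHandler implements ErrorHandler {
        final List<String> problems = new ArrayList<>();

        @Override
        public void warning(SAXParseException ex) {
            problems.add("warning at line " + ex.getLineNumber());
        }

        @Override
        public void error(SAXParseException ex) {
            problems.add("error at line " + ex.getLineNumber());
        }

        @Override
        public void fatalError(SAXParseException ex) throws SAXException {
            problems.add("fatal at line " + ex.getLineNumber());
            throw ex; // a fatal error means the document is not well-formed: stop here
        }
    }

    // Parses the given XML text and returns the problems the handler received.
    static List<String> parse(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        CollectingErrorHandler handler = new CollectingErrorHandler();
        reader.setContentHandler(new DefaultHandler()); // ignore all data events
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Already recorded by the handler; nothing more to do in this sketch.
        }
        return handler.problems;
    }

    public static void main(String[] args) throws Exception {
        // <book> is never closed, so this document is not well-formed.
        System.out.println(parse("<books><book isbn=\"1\"></books>"));
        // Well-formed document: no problems expected.
        System.out.println(parse("<books><book isbn=\"1\"/></books>"));
    }
}
```

Rethrowing inside fatalError() is a design choice of this sketch: it makes the stop explicit instead of relying on what the parser does after the callback returns.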
The most important methods in the interface ContentHandler (implemented by
DefaultHandler which is extended by our class MySAXSearcher) are:
• startDocument(): Receive notification of the beginning of a document.
• endDocument(): Receive notification of the end of a document.
• startElement(): Receive notification of the beginning of an element.
• endElement(): Receive notification of the end of an element.
• characters(): Receive notification of character data.
See more about ContentHandler on
http://download.oracle.com/javase/1.4.2/docs/api/org/xml/sax/ContentHandler.html.
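The order in which these methods are called can be observed with a small sketch. EventLogSketch is a hypothetical class (not part of the example of this manual) that extends DefaultHandler, overrides only the five methods listed above and stores each event in a list. The XML is given as a string so that the sketch needs no file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EventLogSketch extends DefaultHandler {
    // Events in the order the parser reported them.
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
        events.add("startDocument");
    }

    @Override
    public void endDocument() {
        events.add("endDocument");
    }

    @Override
    public void startElement(String uri, String local, String raw, Attributes attrs) {
        events.add("start:" + raw);
    }

    @Override
    public void endElement(String uri, String local, String raw) {
        events.add("end:" + raw);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Only the slice [start, start + length) belongs to this event.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("text:" + text);
        }
    }

    // Parses the XML text and returns the recorded events.
    static List<String> log(String xml) throws Exception {
        EventLogSketch handler = new EventLogSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : log("<book><name>Book 1</name></book>")) {
            System.out.println(event);
        }
    }
}
```

The list should begin with startDocument and end with endDocument, with each start event preceding the matching end event.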
Now, MySAXSearcher is the following (I've written my own ContentHandler and
ErrorHandler; this is much cleaner than overriding the relevant ContentHandler and
ErrorHandler methods in our class that extends DefaultHandler):
MySAXSearcher.java:
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
public class MySAXSearcher extends DefaultHandler{
public static void main(String[] args) {
MySAXSearcher searcher = new MySAXSearcher();
searcher.searchBook(args[0], args[1]);
}
private void searchBook(String xml, String isbn){
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setNamespaceAware( true );
factory.setValidating( true );
SAXParser saxParser = factory.newSAXParser();
XMLReader xr = saxParser.getXMLReader();
// Assign my own ContentHandler to the XMLReader.
MyContentHandler ch = new MyContentHandler();
ch.isbnSearched = isbn;
xr.setContentHandler( ch );
// Assign my own ErrorHandler (the MyErrorHandler class) to the XMLReader.
xr.setErrorHandler( new MyErrorHandler() );
xr.setFeature( "http://xml.org/sax/features/validation", false);
xr.setFeature( "http://xml.org/sax/features/namespaces", true);
long before = System.currentTimeMillis();
xr.parse( xml );
long after = System.currentTimeMillis();
printResult (xml, ch, after - before);
} catch ( IOException ioe ) {
System.out.println( "Error: " + ioe.getMessage() );
} catch ( SAXException saxe ){
System.out.println( "Error: " + saxe.getMessage() );
} catch ( ParserConfigurationException pce ){
System.out.println( "Error: " + pce.getMessage() );
}
}
public void printResult(String xml, MyContentHandler ch, long time){
System.out.println("Document " + xml + ". Parsed in : " + time + " ms");
if (ch.book != null){
System.out.println("Book found:");
System.out.println(" Isbn: " + ch.book.getIsbn());
System.out.println(" Name: " + ch.book.getName());
System.out.println(" Author: " + ch.book.getAuthor());
System.out.println(" Price: " + ch.book.getPrice());
System.out.println(" Editorial: " + ch.book.getEditorial());
} else {
System.out.println("Book not found");
}
}
}
MyContentHandler.java:
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
public class MyContentHandler implements ContentHandler {
boolean isBookFound = false;
String isbnSearched = "";
String currentNode = "";
MyBook book = null;
@Override
public void startDocument() throws SAXException {
System.out.println("***Start document***");
}
@Override
public void endDocument() throws SAXException {
System.out.println("***End document***");
}
@Override
public void startElement(String uri, String local, String raw,
Attributes attrs) {
// Remember the element we are in so that characters() knows which field to fill
currentNode = local;
if ("book".equals(local) && !isBookFound){
// The book node has only one attribute (isbn)
if ("isbn".equals(attrs.getLocalName(0)) &&
isbnSearched.equals(attrs.getValue(0))){
isBookFound = true;
book = new MyBook();
book.setIsbn(isbnSearched);
}
}
}
@Override
public void characters(char ch[], int start, int length) {
// Get the text value from the reported slice of the buffer
String value = new String(ch, start, length);
if (!"".equals(value.trim()) && isBookFound){
if("name".equals(currentNode)){
book.setName(value.trim());
} else if ("author".equals(currentNode)){
book.setAuthor(value.trim());
} else if ("price".equals(currentNode)){
book.setPrice(value.trim());
} else if ("editorial".equals(currentNode)){
book.setEditorial(value.trim());
isBookFound = false;
}
}
}
// The remaining ContentHandler methods are not needed in this example
@Override
public void endElement(String arg0, String arg1, String arg2)
throws SAXException {
}
@Override
public void endPrefixMapping(String arg0) throws SAXException {
}
@Override
public void ignorableWhitespace(char[] arg0, int arg1, int arg2)
throws SAXException {
}
@Override
public void processingInstruction(String arg0, String arg1)
throws SAXException {
}
@Override
public void setDocumentLocator(Locator arg0) {
}
@Override
public void skippedEntity(String arg0) throws SAXException {
}
@Override
public void startPrefixMapping(String arg0, String arg1)
throws SAXException {
}
}
MyErrorHandler.java:
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public class MyErrorHandler implements ErrorHandler {
@Override
public void warning(SAXParseException ex) {
System.err.println("[Warning] : "+ ex.getMessage());
}
@Override
public void error(SAXParseException ex) {
System.err.println("[Error] : "+ex.getMessage());
}
@Override
public void fatalError(SAXParseException ex) throws SAXException {
System.err.println("[Error!] : "+ex.getMessage());
}
}
With our XML (books.xml) and 0000000003 as the book code to search for, we can execute
our program with:
java MySAXSearcher "books.xml" "0000000003"
The result should be the following:
***Start document***
***End document***
Document books.xml. Parsed in : 141 ms
Book found:
Isbn: 0000000003
Name: Book 3
Author: Author name 3
Price: 29.45
Editorial: Editorial 3
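One detail of the SAX contract is worth a sketch: a parser may deliver the text of a single element in several characters() calls. A handler that stores the value on every call would then keep only the last chunk. The hypothetical class BufferedTextSketch below (not part of the example of this manual; the result is a Map only to avoid depending on MyBook) collects the chunks in a StringBuilder and stores the value when the element ends.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BufferedTextSketch extends DefaultHandler {
    private final String isbnSearched;
    private final StringBuilder text = new StringBuilder();
    private boolean insideMatch = false;
    // Field name -> value for the matching book; stays empty if no book matches.
    final Map<String, String> found = new LinkedHashMap<>();

    BufferedTextSketch(String isbnSearched) {
        this.isbnSearched = isbnSearched;
    }

    @Override
    public void startElement(String uri, String local, String raw, Attributes attrs) {
        text.setLength(0); // a new element starts: forget earlier text
        if ("book".equals(raw) && isbnSearched.equals(attrs.getValue("isbn"))) {
            insideMatch = true;
            found.put("isbn", isbnSearched);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called more than once per element
    }

    @Override
    public void endElement(String uri, String local, String raw) {
        if ("book".equals(raw)) {
            insideMatch = false; // leaving the book: stop collecting fields
        } else if (insideMatch) {
            found.put(raw, text.toString().trim()); // the element text is complete now
        }
        text.setLength(0);
    }

    // Searches the XML text for a book and returns its fields (empty map if absent).
    static Map<String, String> search(String xml, String isbn) throws Exception {
        BufferedTextSketch handler = new BufferedTextSketch(isbn);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.found;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books>"
                + "<book isbn=\"0000000001\"><name>Book 1</name><price>12.54</price></book>"
                + "<book isbn=\"0000000002\"><name>Book 2</name><price>58.25</price></book>"
                + "</books>";
        System.out.println(search(xml, "0000000002"));
    }
}
```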
3. DOM
DOM (Document Object Model): while SAX reports the components of the document one by
one as it reads them, DOM delivers the parsed document as a tree that can be traversed
and transformed. DOM has some disadvantages and advantages with regard to SAX:
Disadvantages:
• The data can be accessed only once the entire document has been parsed.
• The tree is an object loaded in memory; this is problematic for big and
complex documents.
Advantages:
• With DOM we can manipulate the XML document (update, delete and add elements).
We can also create a new XML document.
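The manipulation advantage can be sketched in a few lines. DomEditSketch is a hypothetical class (not part of the example of this manual); the values "9.99", "0000000006" and "Book 6" are made up for illustration. It deletes the first book, updates a price, adds a new book and writes the tree back to text with the identity Transformer of javax.xml.transform.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class DomEditSketch {

    // Parses the XML text, edits the tree in memory and returns the new XML text.
    static String edit(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Delete: remove the first <book> element, if there is one.
        Node first = doc.getElementsByTagName("book").item(0);
        if (first != null) {
            first.getParentNode().removeChild(first);
        }

        // Update: change the text of the first remaining <price>, if any.
        Node price = doc.getElementsByTagName("price").item(0);
        if (price != null) {
            price.setTextContent("9.99"); // made-up value
        }

        // Add: append a new <book> with an isbn attribute and a <name> child.
        Element book = doc.createElement("book");
        book.setAttribute("isbn", "0000000006"); // made-up value
        Element name = doc.createElement("name");
        name.setTextContent("Book 6");
        book.appendChild(name);
        doc.getDocumentElement().appendChild(book);

        // Serialize the tree back to text.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books>"
                + "<book isbn=\"0000000001\"><name>Book 1</name><price>12.54</price></book>"
                + "<book isbn=\"0000000002\"><name>Book 2</name><price>58.25</price></book>"
                + "</books>";
        System.out.println(edit(xml));
    }
}
```

None of this is possible with SAX alone, because SAX never holds the document in memory.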
To manipulate an XML document, we must obtain an object that implements the Document
interface (which extends the interface Node). We use the classes
javax.xml.parsers.DocumentBuilder and javax.xml.parsers.DocumentBuilderFactory, and we
invoke the method parse() to obtain a Document object.
To manipulate XML with DOM, there are some important interfaces:
org.w3c.dom.Document (represents the entire XML document),
org.w3c.dom.Element (the elements in the XML document), org.w3c.dom.Node (the base type
of everything in the tree: elements, attributes, text) and org.w3c.dom.Attr (the
attributes of every element).
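A short sketch may help to see how these four interfaces fit together. DomTypesSketch and its method nameOf are hypothetical names (not part of the example of this manual); the XML is given as a string so that no file is needed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomTypesSketch {

    // Returns the <name> text of the book with the given isbn, or null if absent.
    static String nameOf(String xml, String isbn) throws Exception {
        // Document: the whole tree.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            // Element: getElementsByTagName only returns element nodes.
            Element book = (Element) books.item(i);
            // Attr: the attribute as an object; null when the element has no isbn.
            Attr attr = book.getAttributeNode("isbn");
            if (attr == null || !isbn.equals(attr.getValue())) {
                continue;
            }
            // Node: children can be elements, text, comments... so check the type.
            for (Node child = book.getFirstChild(); child != null;
                    child = child.getNextSibling()) {
                if (child.getNodeType() == Node.ELEMENT_NODE
                        && "name".equals(child.getNodeName())) {
                    return child.getTextContent();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books>"
                + "<book isbn=\"0000000001\"><name>Book 1</name></book>"
                + "<book isbn=\"0000000002\"><name>Book 2</name></book>"
                + "</books>";
        System.out.println(nameOf(xml, "0000000002"));
    }
}
```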
OK, now let's talk in Java code. As DTO (Data Transfer Object), I use the same
MyBook object.
MyDOMSearcher.java:
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class MyDOMSearcher {
public static void main(String[] args) {
MyDOMSearcher searcher = new MyDOMSearcher();
searcher.searchBook(args[0], args[1]);
}
private void searchBook(String xml, String isbn) {
long before = System.currentTimeMillis();
MyBook book = null;
try{
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
factory.setValidating(true);
DocumentBuilder parser = factory.newDocumentBuilder();
// I assign my own ErrorHandler to my Parser
parser.setErrorHandler(new MyErrorHandler());
File file = new File(xml);
Document doc = parser.parse (file);
// I obtain all the <book> elements.
// NodeList is an interface that has 2 methods:
// 1. item(int): returns the Node at the given index.
// 2. getLength(): returns the length of the list.
NodeList booksNodes = doc.getElementsByTagName("book");
NodeList bookChildsNodes = null;
String isbnAttribute = "";
for(int i = 0; i < booksNodes.getLength(); i++) {
Node node = booksNodes.item(i);
if(node != null && node.hasAttributes()) {
isbnAttribute =
node.getAttributes().getNamedItem("isbn").getNodeValue();
if(isbnAttribute.equals(isbn)){
//I've caught the isbn searched
if(book == null){
book = new MyBook();
book.setIsbn(isbn);
}
if(node.hasChildNodes()){
bookChildsNodes = node.getChildNodes();
for (int j = 0; j < bookChildsNodes.getLength(); j++) {
if("name".equals(bookChildsNodes.item(j).getNodeName())){
book.setName(bookChildsNodes.item(j).getTextContent());
}else if("author".equals(bookChildsNodes.item(j).getNodeName())){
book.setAuthor(bookChildsNodes.item(j).getTextContent());
}else if("price".equals(bookChildsNodes.item(j).getNodeName())){
book.setPrice(bookChildsNodes.item(j).getTextContent());
}else if("editorial".equals(bookChildsNodes.item(j).getNodeName())){
book.setEditorial(bookChildsNodes.item(j).getTextContent());
// I've found my book. Ending the inner for iteration
break;
}
}
}
}
}
}
}catch(IOException ioe){
System.err.println("[Error] : "+ioe.getMessage());
}catch(ParserConfigurationException pce){
System.err.println("[Error] : "+pce.getMessage());
}catch(SAXException se){
System.err.println("[Error] : "+se.getMessage());
}
long after = System.currentTimeMillis();
printResults(xml, book, after - before);
}
public void printResults(String xml, MyBook book, long time) {
System.out.println("Document " + xml + ". Parsed in : " + time + " ms");
if (book != null){
System.out.println("Book found:");
System.out.println(" Isbn: " + book.getIsbn());
System.out.println(" Name: " + book.getName());
System.out.println(" Author: " + book.getAuthor());
System.out.println(" Price: " + book.getPrice());
System.out.println(" Editorial: " + book.getEditorial());
}else{
System.out.println("Book not found");
}
}
}
4. JDOM
All the preceding APIs are available for many programming languages, but their use
is laborious in Java. JDOM is an API made specifically for Java; it takes advantage of
Java's own capabilities and features and therefore makes XML parsing easier. We can find
more information on www.jdom.org.
Now, let's build the same example (searching for a book in our XML) with JDOM (be sure
that the jar is in your classpath; you can download it from
http://www.jdom.org/dist/binary/).
MyJDOMSearcher.java:
import java.io.IOException;
import java.util.Iterator;
import java.util.List;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;
public class MyJDOMSearcher {
private String isbn;
private MyBook book;
private boolean noSearchMore = false;
public static void main(String[] args) {
try {
long before = System.currentTimeMillis();
MyJDOMSearcher searcher = new MyJDOMSearcher();
// The second parameter is the isbn to search
searcher.isbn = args[1];
SAXBuilder saxBuilder = new SAXBuilder();
Document document = saxBuilder.build(args[0]);
searcher.searchBook(document.getRootElement());
long after = System.currentTimeMillis();
searcher.printResults(args[0], after-before);
} catch (JDOMException jde){
System.err.println("[Error] JDOMException: "+jde.getMessage());
} catch (IOException ioe){
System.err.println("[Error] IOException: "+ioe.getMessage());
}
}
private void searchBook(Element element){
inspect(element);
List content = element.getContent();
Iterator iterator = content.iterator();
Element child = null;
Object object = null;
while(iterator.hasNext()){ // Visit every child of the current element
object = iterator.next();
if(object instanceof Element){
child = ((Element)object); //Casting from Object to Element
searchBook(child);
}
}
}
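// Inspect a single element: match the <book> by isbn, then copy its fields
public void inspect(Element element) {
if (!noSearchMore){ // Once the book has been read completely, do nothing more
if("book".equals(element.getQualifiedName()) && book == null){
// getAttributeValue returns null (instead of failing) when isbn is missing
if(isbn.equals(element.getAttributeValue("isbn"))){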
book = new MyBook();
book.setIsbn(isbn);
}
}
if(book != null){
if("name".equals(element.getQualifiedName())){
book.setName(element.getValue());
}
if("author".equals(element.getQualifiedName())){
book.setAuthor(element.getValue());
}
if("price".equals(element.getQualifiedName())){
book.setPrice(element.getValue());
}
if("editorial".equals(element.getQualifiedName())){
book.setEditorial(element.getValue());
noSearchMore = true;
}
}
}
}
private void printResults(String xml, long time) {
System.out.println("Document " + xml + ". Parsed in : " + time + " ms");
if (book != null){
System.out.println("Book found:");
System.out.println(" Isbn: " + book.getIsbn());
System.out.println(" Name: " + book.getName());
System.out.println(" Author: " + book.getAuthor());
System.out.println(" Price: " + book.getPrice());
System.out.println(" Editorial: " + book.getEditorial());
} else {
System.out.println("Book not found");
}
}
}
5. Conclusion
Running the same example with the three APIs (MySAXSearcher, MyDOMSearcher
and MyJDOMSearcher), passing as parameters the same XML file and the ISBN to search for
("0000000003"), gives the following timings:
MySAXSearcher    MyDOMSearcher    MyJDOMSearcher
93 ms            750 ms           609 ms
In this run, the SAX API is faster than DOM and JDOM (but it is more laborious to use).
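A single run like the one above also measures class loading and JVM warm-up, so the numbers move a lot between executions. A somewhat steadier comparison can be sketched as follows. TimingSketch is a hypothetical class (not part of the example of this manual); it builds a catalog in memory, warms up both parsers and averages several runs. It covers only SAX and DOM, which ship with the JDK, and its output depends on the machine, so no result is claimed here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TimingSketch {

    // Builds a catalog with n books in memory (made-up data).
    static String catalog(int n) {
        StringBuilder sb = new StringBuilder("<books>");
        for (int i = 1; i <= n; i++) {
            sb.append("<book isbn=\"").append(i).append("\"><name>Book ")
              .append(i).append("</name></book>");
        }
        return sb.append("</books>").toString();
    }

    // Counts <book> elements with SAX: nothing is kept in memory.
    static int countWithSax(String xml) throws Exception {
        final int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String local, String raw,
                            Attributes attrs) {
                        if ("book".equals(raw)) {
                            count[0]++;
                        }
                    }
                });
        return count[0];
    }

    // Counts <book> elements with DOM: the whole tree is built first.
    static int countWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("book").getLength();
    }

    public static void main(String[] args) throws Exception {
        String xml = catalog(5000);
        int runs = 20;
        // Warm-up: let the JVM load classes and compile the hot paths.
        for (int i = 0; i < 5; i++) {
            countWithSax(xml);
            countWithDom(xml);
        }
        long t0 = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            countWithSax(xml);
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            countWithDom(xml);
        }
        long t2 = System.nanoTime();
        System.out.println("SAX average ms: " + (t1 - t0) / runs / 1_000_000.0);
        System.out.println("DOM average ms: " + (t2 - t1) / runs / 1_000_000.0);
    }
}
```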