You saw earlier that if you are writing text out as XML, you need to know if you are in a CDATA section. If you are, then angle brackets (<) and ampersands (&) should be output unchanged. But if you're not in a CDATA section, they should be replaced by the predefined entities &lt; and &amp;. But how do you know if you're processing a CDATA section?
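The escaping rule described above can be sketched in a few lines of Java. This is only an illustration: the class name XmlTextEscaper and the inCdata flag are hypothetical and are not part of the Echo program or of any parser API.

```java
/** Hypothetical helper (not part of the tutorial's Echo class). */
final class XmlTextEscaper {
    private XmlTextEscaper() {}

    /**
     * Returns text ready to be written out as XML character data.
     * Inside a CDATA section the text is returned unchanged; elsewhere
     * '<' and '&' are replaced by the predefined entities.
     */
    static String escape(String text, boolean inCdata) {
        if (inCdata) {
            return text;
        }
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<': out.append("&lt;");  break;
                case '&': out.append("&amp;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }
}
```

The open question, which the rest of this section addresses, is where the value of inCdata comes from.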
Then again, if you are filtering XML in some way, you would want to pass comments along. Normally the parser ignores comments. How can you get comments so that you can echo them?
Finally, there are the parsed entity definitions. If an XML-filtering app sees &myEntity; it needs to echo the same string -- not the text that is inserted in its place. How do you go about doing that?
This section of the tutorial answers those questions. It shows you how to use com.sun.xml.parser.LexicalEventListener to identify comments, CDATA sections, and references to parsed entities.
Note:
This material is specific to Project X, Sun's reference implementation for the JAXP standard. The material in this section is not part of the standard. Instead, it represents helpful functionality that you may need to take advantage of until some equivalent mechanism is standardized. Because it is not part of the JAXP standard, the functionality described here may very well not exist in other JAXP-standard parsers. In fact, as standards evolve, future versions of the JAXP reference implementation could employ different mechanisms to achieve the same goals.
Comments, CDATA tags, and references to parsed entities constitute lexical information -- that is, information that concerns the text of the XML itself, rather than the XML's information content. Most applications, of course, are concerned only with the content of an XML document. Such apps will not use the LexicalEventListener API. But apps that output XML text will find it invaluable.
Note:
The LexicalEventListener API is likely to be part of the SAX 2.0 specification.
To be informed when the SAX parser sees lexical information, you configure the parser with a LexicalEventListener rather than a DocumentHandler. (For an overview of these two APIs, see An Overview of the Java XML APIs.) The LexicalEventListener interface extends the DocumentHandler interface to add:
comment(String comment)
    Passes comments to the application.
startCDATA(), endCDATA()
    Tell when a CDATA section is starting and ending, which tells your application what kind of characters to expect the next time characters() is called.
startParsedEntity(String name), endParsedEntity(String name, boolean included)
    Give the name of a parsed entity as it starts and ends. The boolean value included tells if the entity value was passed to the app. Although validating parsers are required to process external entities, the specification allows nonvalidating parsers to skip them. This value tells you whether or not the nonvalidating parser did so. (The Java XML nonvalidating parser always includes external entities, so this value is always true.)
Note:
The com.sun.xml.parsers.XmlDocumentBuilder class (which constructs a DOM) implements LexicalEventListener, in fact. That is how it is able to construct an object tree that has all the appropriate syntax elements.
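The Project X classes are not available in current JDKs, so the following sketch uses a substitute: the SAX2 extension interface org.xml.sax.ext.LexicalHandler (through its convenience base class DefaultHandler2), which reports a similar set of events. It is not the LexicalEventListener API described here. Note the differences: comment() receives a character array, the entity callbacks are named startEntity and endEntity, and endEntity has no included flag. The class name LexicalEventRecorder and its record method are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical recorder of lexical events, built on the SAX2 extension API. */
final class LexicalEventRecorder extends DefaultHandler2 {
    final List<String> events = new ArrayList<>();

    @Override public void comment(char[] ch, int start, int length) {
        events.add("COMMENT: " + new String(ch, start, length));
    }
    @Override public void startCDATA() { events.add("START CDATA SECTION"); }
    @Override public void endCDATA()   { events.add("END CDATA SECTION"); }
    @Override public void startEntity(String name) {
        events.add("START PARSED ENTITY: " + name);
    }
    @Override public void endEntity(String name) {
        events.add("END PARSED ENTITY: " + name);
    }
    @Override public void characters(char[] ch, int start, int length) {
        // A parser may deliver one run of text in several calls.
        events.add("CHARS: " + new String(ch, start, length));
    }

    /** Parses the given XML text and returns the events in document order. */
    static List<String> record(String xml) {
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            LexicalEventRecorder recorder = new LexicalEventRecorder();
            reader.setContentHandler(recorder);
            // Lexical events are delivered only if the handler is registered
            // under this standard SAX2 property name.
            reader.setProperty(
                "http://xml.org/sax/properties/lexical-handler", recorder);
            reader.parse(new InputSource(new StringReader(xml)));
            return recorder.events;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

For example, LexicalEventRecorder.record("<r><!--note--><![CDATA[a < b]]></r>") returns a list that includes a COMMENT event followed by the CDATA start and end markers, with the character data reported in between.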
In the remainder of this section, you'll convert the Echo app into a lexical event listener and play with its features.
Note:
The code shown in this section is in Echo11.java. The output is shown in Echo11-09.log.
To start, add the code highlighted below to implement the LexicalEventListener interface and add the appropriate methods.

    import com.sun.xml.parser.LexicalEventListener;

    public class Echo extends HandlerBase
        implements LexicalEventListener
    {
        ...
        public void processingInstruction (String target, String data)
        ...
        }

        public void comment(String text)
        throws SAXException
        {
        }

        public void startCDATA()
        throws SAXException
        {
        }

        public void endCDATA()
        throws SAXException
        {
        }

        public void startParsedEntity(String name)
        throws SAXException
        {
        }

        public void endParsedEntity(String name, boolean included)
        throws SAXException
        {
        }

        private void emit (String s)
        ...
Those are the only changes you need to make to turn the Echo class into a lexical event listener. The parser checks the class type, and knows that the "document handler" you specify with:

    parser.setDocumentHandler ( new Echo() );

is really an instance of the extended interface, LexicalEventListener.
The next step is to do something with one of the new methods. Add the code highlighted below to echo comments in the XML file:

    public void comment(String text)
    throws SAXException
    {
        nl(); emit ("COMMENT: "+text);
    }

When you compile the Echo program and run it on your XML file, the result looks something like this:

    COMMENT: A SAMPLE set of slides
    COMMENT: DTD for a simple "slide show".
    COMMENT: Defines the %inline; declaration
    COMMENT: ...

The line endings in the comments are passed as part of the comment string, once again normalized to newlines (\n).
You can also see that comments in the DTD are echoed along with comments from the file. (That can pose problems when you want to echo only comments that are in the data file. To get around that problem, you use the startDTD and endDTD methods in the DtdEventListener interface.)
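The idea of bracketing DTD comments with start and end events can be sketched as follows. Because DtdEventListener is specific to Project X, this sketch substitutes the SAX2 LexicalHandler callbacks startDTD and endDTD (again via DefaultHandler2), which serve the same purpose. The class name DocumentCommentCollector and its collect method are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical handler that keeps only comments found outside the DTD. */
final class DocumentCommentCollector extends DefaultHandler2 {
    final List<String> comments = new ArrayList<>();
    private boolean inDtd = false;

    @Override public void startDTD(String name, String publicId, String systemId) {
        inDtd = true;   // comments reported from here on belong to the DTD
    }
    @Override public void endDTD() {
        inDtd = false;  // back in the document proper
    }
    @Override public void comment(char[] ch, int start, int length) {
        if (!inDtd) {
            comments.add(new String(ch, start, length));
        }
    }

    /** Parses the given XML text and returns the comments outside the DTD. */
    static List<String> collect(String xml) {
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            DocumentCommentCollector collector = new DocumentCommentCollector();
            reader.setContentHandler(collector);
            reader.setProperty(
                "http://xml.org/sax/properties/lexical-handler", collector);
            reader.parse(new InputSource(new StringReader(xml)));
            return collector.comments;
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

A single boolean flag is enough here because a document has at most one DTD, so the start and end events cannot nest.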
To finish up this section, you'll exercise the remaining LexicalEventListener methods.
Note:
The code shown in this section is in Echo12.java. The file it operates on is slideSample10.xml. The results of processing are in Echo12-10.log.
Make the changes highlighted below to remove the comment echo (you don't need that any more) and echo the other events:

    public void comment(String text)
    throws SAXException
    {
    }

    public void startCDATA()
    throws SAXException
    {
        nl(); emit ("START CDATA SECTION");
    }

    public void endCDATA()
    throws SAXException
    {
        nl(); emit ("END CDATA SECTION");
    }

    public void startParsedEntity(String name)
    throws SAXException
    {
        nl(); emit ("START PARSED ENTITY: "+name);
    }

    public void endParsedEntity(String name, boolean included)
    throws SAXException
    {
        nl(); emit ("END PARSED ENTITY: "+name);
        emit (", INCLUDED="+included);
    }

Here is what happens when the internally defined products entity is processed with the latest version of the program:

    ELEMENT: <slide-title>
    CHARS: Wake up to
    START PARSED ENTITY: products
    CHARS: WonderWidgets
    END PARSED ENTITY: products, INCLUDED=true
    CHARS: !
    END_ELM: </slide-title>

And here is the result of processing the external copyright entity:

    START PARSED ENTITY: copyright
    CHARS: This is the standard copyright message ...
    END PARSED ENTITY: copyright, INCLUDED=true

Finally, you get output like this for the CDATA section:

    START CDATA SECTION
    CHARS: Diagram:
    frobmorten <------------ fuznaten
        |            <3>        ^
        | <1>                   |   <1> = fozzle
        V                       |   <2> = framboze
      staten --------------------+   <3> = frenzle
                   <2>
    END CDATA SECTION

In summary, the LexicalEventListener gives you the event-notifications you need to produce an accurate reflection of the original XML text.
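To show how such notifications let a program reproduce the original text, here is a minimal sketch of an echo filter that writes XML back out instead of logging events. As an assumption, it relies on the SAX2 LexicalHandler callbacks (via DefaultHandler2) rather than Project X's LexicalEventListener, and it ignores attributes, entity references, and the DTD for brevity. The class name LexicalEcho and its echo method are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

/** Hypothetical echo filter; attributes and entity references are ignored. */
final class LexicalEcho extends DefaultHandler2 {
    private final StringBuilder out = new StringBuilder();
    private boolean inCdata = false;

    @Override public void startElement(String uri, String localName,
                                       String qName, Attributes atts) {
        out.append('<').append(qName).append('>');
    }
    @Override public void endElement(String uri, String localName, String qName) {
        out.append("</").append(qName).append('>');
    }
    @Override public void startCDATA() {
        inCdata = true;
        out.append("<![CDATA[");
    }
    @Override public void endCDATA() {
        inCdata = false;
        out.append("]]>");
    }
    @Override public void comment(char[] ch, int start, int length) {
        out.append("<!--").append(ch, start, length).append("-->");
    }
    @Override public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length);
        // Inside CDATA the text passes through untouched; elsewhere the
        // ampersand must be replaced first so "&lt;" is not escaped twice.
        out.append(inCdata ? text
                           : text.replace("&", "&amp;").replace("<", "&lt;"));
    }

    /** Parses the given XML text and returns the echoed markup. */
    static String echo(String xml) {
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            LexicalEcho handler = new LexicalEcho();
            reader.setContentHandler(handler);
            reader.setProperty(
                "http://xml.org/sax/properties/lexical-handler", handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

Without the startCDATA and endCDATA notifications, the handler could not tell whether a given characters() call should be escaped or copied verbatim, and comments would be lost altogether.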