libjava/gnu/xml/dom/package.html

<html>
<body>

<p>
This is a Free Software DOM Level 3 implementation, supporting these features:
<ul>
<li>"XML"</li>
<li>"Events"</li>
<li>"MutationEvents"</li>
<li>"HTMLEvents" (won't generate them though)</li>
<li>"UIEvents" (also won't generate them)</li>
<li>"USER-Events" (a conformant extension)</li>
<li>"Traversal" (optional)</li>
<li>"XPath"</li>
<li>"LS" and "LS-Async"</li>
</ul>
It is intended to be a reasonable base both for
experimentation and for supporting additional DOM modules as clean layers.
</p>
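<p> The feature names listed above are the strings you would pass to
<code>DOMImplementation.hasFeature</code>.  What follows is only a sketch:
the class name <code>FeatureCheck</code> is made up, and it assumes the DOM
is obtained through JAXP's <code>DocumentBuilderFactory</code>, so it reports
on whichever implementation JAXP selects, which may or may not be this
package. </p>

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;

public class FeatureCheck {
    /** Returns true if the JAXP-selected DOM claims the feature, at any version. */
    public static boolean supports(String feature) throws Exception {
        // Assumption: the DOM is reached through JAXP rather than instantiated directly.
        DOMImplementation impl = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .getDOMImplementation();
        // A null version means "any version of this feature".
        return impl.hasFeature(feature, null);
    }

    public static void main(String[] args) throws Exception {
        String[] features = { "XML", "Events", "MutationEvents", "HTMLEvents",
                "UIEvents", "USER-Events", "Traversal", "XPath", "LS", "LS-Async" };
        for (String feature : features) {
            System.out.println(feature + ": " + supports(feature));
        }
    }
}
```

<p> Other implementations will usually answer false for "USER-Events",
since that name is an extension of this package. </p>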

<p>
Note that while DOM does not specify its behavior in the
face of concurrent access, this implementation does.
Specifically:
<ul>
<li>If only one thread at a time accesses a Document,
or if several threads cooperate for read-only access,
then no concurrency conflicts will occur.</li>
<li>If several threads mutate a given document
(or send events using it) at the same time,
there is currently no guarantee that
they won't interfere with each other.</li>
</ul>
</p>
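<p> Given the rules above, callers that mutate one document from several
threads have to serialize those mutations themselves.  Here is one possible
way to do that, as a sketch only: locking on the <code>Document</code> object
is a convention chosen for this example, not something the package does or
requires, and the class name <code>SerializedMutation</code> is made up. </p>

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SerializedMutation {
    /** Appends children from several threads, one mutation at a time. */
    public static int appendFromThreads(int threads, int perThread) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    // Assumed convention: every mutator locks the document first.
                    synchronized (doc) {
                        root.appendChild(doc.createElement("item"));
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join();  // all mutations are finished and visible after join()
        }
        return root.getChildNodes().getLength();
    }
}
```

<p> The lock only helps if every thread that mutates the document (or sends
events through it) uses the same one. </p>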

<h3>Design Goals</h3>

<p>
A number of DOM implementations are available in Java, including
commercial ones from Sun, IBM, Oracle, and DataChannel as well as
noncommercial ones from Docuverse, OpenXML, and Silfide.  Why have
another?  Some of the goals of this version:
</p>

<ul>
<li>Advanced DOM support. This was the first generally available
implementation of DOM Level 2 in Java, and one of the first Level 3
and XPath implementations.</li>

<li> Free Software.  This one is distributed under the GPL (with
"library exception") so it can be used with a different class of
application.</li>

<li>Second implementation syndrome.  I can do it simpler this time
around ... and heck, writing it only takes a bit over a day once you
know your way around.</li>

<li>Sanity check the then-current Last Call DOM draft.  Best to find
bugs early, when they're relatively fixable.  Yes, bugs were found.</li>

<li>Modularity.  Most of the implementations mentioned above are part
of huge packages; take all (including bugs, of which some have far
too many), or take nothing.  I prefer a menu approach, when possible.
This code is standalone, not beholden to any particular parser or XSL
or XPath code.</li>

<li>OK, I'm a hacker, I like to write code.</li>
</ul>

<p>
This also works with the GNU Compiler for Java (GCJ).  GCJ promises
to be quite the environment for programming Java, both directly and from
C++ using the new CNI interfaces (which really use C++, unlike JNI). </p>


<h3>Open Issues</h3>

<p>At this writing:</p>
<ul>
<li>See below for some restrictions on the mutation event
support ... some events aren't reported (and likely won't be).</li>

<li>More testing and conformance work is needed.</li>

<li>We need an XML Schema validator (actually we need validation in the DOM
full stop).</li>
</ul>

<p>
I ran a profiler a few times and removed some of the performance hotspots,
but it's not tuned.  Reporting mutation events, in particular, is
rather costly -- it started at about a 40% penalty for appendChild calls;
I've got it down to around 12%, but it'll be hard to shrink it much further.
The overall code size is relatively small, though you may want to be rid of
many of the unused DOM interface classes (HTML, CSS, and so on).
</p>


<h2><a name="features">Features of this Package</a></h2>

<p> Starting with DOM Level 2, you can really see that DOM is constructed
as a bunch of optional modules around a core of either XML or HTML
functionality.  Different implementations will support different optional
modules.  This implementation provides a set of features that should be
useful if you're not depending on the HTML functionality (lots of convenience
functions that mostly don't buy much except API surface area) and user
interface support.  That is, browsers will want more -- but what they
need should be cleanly layered over what's already here. </p>

<h3> Core Feature Set:  "XML" </h3>

<p> This DOM implementation supports the "XML" feature set, which basically
gets you four things over the bare core (which you're officially not supposed
to implement except in conjunction with the "XML" or "HTML" feature).  In
order of decreasing utility, those four things are: </p> <ol>

    <li> ProcessingInstruction nodes.  These are probably the most
    valuable thing. Handy little buggers, in part because all the APIs
    you need to use them are provided, and they're designed to let you
    escape XML document structure rules in controlled ways.</li>

    <li> CDATASection nodes.  These are of limited utility since CDATA
    is just text that prints funny. These are of use to some sorts of
    applications, though I encourage folk to not use them. </li>

    <li> DocumentType nodes, and associated Notation and Entity nodes.
    These appear to be useless.  Briefly, these "Type" nodes expose no
    typing information.  They're only really usable to expose some lexical
    structure that almost every application needs to ignore.  (XML editors
    might like to see them, but they need true typing information much more.)
    I strongly encourage people not to use these.  </li>

    <li> EntityReference nodes can show up.  These are actively annoying,
    since they add an extra level of hierarchy, are the cause of most of
    the complexity in attribute values, and their contents are immutable.
    Avoid these.</li>

    </ol>
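<p> Since ProcessingInstruction nodes head the list above, here is a small
sketch of creating and reading one with nothing but the standard
<code>org.w3c.dom</code> calls.  The target <code>render</code>, its data,
and the class name <code>PiExample</code> are invented for the example, and
the document is assumed to come from JAXP. </p>

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;

public class PiExample {
    /** Builds a document holding one PI and returns "target data" as read back. */
    public static String piTargetAndData() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Node root = doc.appendChild(doc.createElement("doc"));

        // "render" and its data are hypothetical values, meaningful only to this example.
        ProcessingInstruction pi =
                doc.createProcessingInstruction("render", "mode=\"draft\"");
        root.appendChild(pi);

        // Read it back the way a consumer of the tree would.
        Node first = root.getFirstChild();
        if (first.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
            ProcessingInstruction found = (ProcessingInstruction) first;
            return found.getTarget() + " " + found.getData();
        }
        return null;
    }
}
```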

<h3> Optional Feature Sets:  "Events", and friends </h3>

<p> Events may be one of the more interesting new features in Level 2.
This package provides the core feature set and exposes mutation events.
No gooey events though; if you want that, write a layered implementation! </p>

<p> Three mutation events aren't currently generated:</p> <ul>

    <li> <em>DOMSubtreeModified</em> is poorly specified.  Think of this
    as generating one such event around the time of finalization, which
    is a fully conformant implementation.  This implementation is exactly
    as useful as that one. </li>

    <li> <em>DOMNodeRemovedFromDocument</em> and
    <em>DOMNodeInsertedIntoDocument</em> are supposed to get sent to
    every node in a subtree that gets removed or inserted (respectively).
    This can be <em>extremely costly</em>, and the removal and insertion
    processing is already significantly slower due to event reporting.
    It's much easier, and more efficient, to have a listener higher in the
    tree watch removal and insertion events through the bubbling or capture
    mechanisms, than it is to watch for these two events.</li>

    </ul>
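<p> The "listener higher in the tree" approach recommended above might look
roughly like the following.  This is a sketch under assumptions: the DOM in
use must implement the "MutationEvents" module (so that nodes can be cast to
<code>EventTarget</code>), the document is obtained through JAXP, and the
class name <code>BubblingWatcher</code> is made up. </p>

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.events.EventTarget;

public class BubblingWatcher {
    /** Counts DOMNodeInserted events seen at the root while a grandchild is added. */
    public static int countInsertions() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        Element section = doc.createElement("section");
        doc.appendChild(root);
        root.appendChild(section);

        int[] seen = { 0 };
        // Assumption: this DOM supports mutation events, so the cast succeeds.
        EventTarget watcher = (EventTarget) root;
        // useCapture = false, so the listener runs as the event bubbles up.
        watcher.addEventListener("DOMNodeInserted", evt -> seen[0]++, false);

        // The insertion happens two levels below the listener.
        section.appendChild(doc.createElement("para"));
        return seen[0];
    }
}
```

<p> One listener on the root sees insertions anywhere below it, without any
per-node <em>DOMNodeInsertedIntoDocument</em> traffic. </p>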

<p> In addition, certain kinds of attribute modification aren't reported.
A fix is known, but it couldn't report the previous value of the attribute.
More work could fix all of this (as well as reduce the generally high cost
of childful attributes), but that's not been done yet. </p>

<p> Also, note that it is a <em>Bad Thing&trade;</em> to have the listener
for a mutation event change the ancestry for the target of that event.
Or to prevent mutation events from bubbling to where they're needed.
Just don't do those, OK? </p>

<p> As an experimental feature (named "USER-Events"), you can provide
your own "user" events.  Just name them anything starting with "USER-"
and you're set.  Dispatch them through bubbling, capturing, or whatever
takes your fancy.  One important thing you can't currently do is
pass any data (like an object) with those events.  Maybe later there
will be a "UserEvent" interface letting you get some substantial use
out of this mechanism even if you're not "inside" of a DOM package.</p>
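<p> Sending such a user event could look something like this.  Treat it as
a sketch: the event name <code>USER-refresh</code> and the class name
<code>UserEventExample</code> are invented, the document is assumed to come
from JAXP, and passing <code>"Events"</code> to <code>createEvent</code> is
an assumption about what the DOM in use accepts for a plain event. </p>

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.events.DocumentEvent;
import org.w3c.dom.events.Event;
import org.w3c.dom.events.EventTarget;

public class UserEventExample {
    /** Dispatches a "USER-" event on a child; returns the type seen at the root. */
    public static String dispatchUserEvent() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        Element child = doc.createElement("child");
        doc.appendChild(root);
        root.appendChild(child);

        String[] received = { null };
        // Listen on the root during the bubbling phase.
        ((EventTarget) root).addEventListener("USER-refresh",
                e -> received[0] = e.getType(), false);

        // Assumption: "Events" selects a plain Event in this DOM.
        Event event = ((DocumentEvent) doc).createEvent("Events");
        event.initEvent("USER-refresh", true, false);  // bubbles, not cancelable
        ((EventTarget) child).dispatchEvent(event);
        return received[0];
    }
}
```

<p> The event carries nothing but its type, which is exactly the limitation
described above. </p>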

<p> You can create and send HTML events.  Ditto UIEvents.  Since DOM
doesn't require a UI, it's the UI's job to send them; perhaps that's
part of your application.  </p>

<p><em>This package may be built without the ability to report mutation
events, gaining a significant speedup in DOM construction time.  However,
if that is done then certain other features -- notably node iterators
and getElementsByTagName -- will not be available.</em></p>


<h3> Optional Feature:  "Traversal" </h3>

<p> Each DOM node has all you need to walk to everything connected
to that node.  Lightweight, efficient utilities are easily layered on
top of just the core APIs. </p>
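<p> As an illustration of such a utility, here is a depth-first element
counter that uses only <code>getFirstChild</code> and
<code>getNextSibling</code>.  It is a sketch, not code from this package;
the class name <code>SimpleWalker</code> and the sample tree are made up. </p>

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class SimpleWalker {
    /** Counts the element nodes in the subtree rooted at node, depth-first. */
    public static int countElements(Node node) {
        int count = (node.getNodeType() == Node.ELEMENT_NODE) ? 1 : 0;
        for (Node kid = node.getFirstChild(); kid != null; kid = kid.getNextSibling()) {
            count += countElements(kid);
        }
        return count;
    }

    /** Builds a small sample tree: root containing a and b; b containing c and text. */
    public static Document sample() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        root.appendChild(doc.createElement("a"));
        Element b = doc.createElement("b");
        root.appendChild(b);
        b.appendChild(doc.createElement("c"));
        b.appendChild(doc.createTextNode("some text"));
        return doc;
    }
}
```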

<p> Traversal APIs are an optional part of DOM Level 2, providing
a not-so-lightweight way to walk over DOM trees, if your application
didn't already have such utilities for use with data represented via
DOM.  Implementing this helped debug the (optional) event and mutation
event subsystems, so it's provided here.  </p>
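<p> For comparison, the Traversal route goes through a
<code>NodeIterator</code>.  The sketch below assumes the DOM in use
implements the "Traversal" module (so the document can be cast to
<code>DocumentTraversal</code>) and that the document comes from JAXP; the
class name <code>IteratorExample</code> is made up. </p>

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;

public class IteratorExample {
    /** Counts elements under (and including) the root using a NodeIterator. */
    public static int countElements() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        root.appendChild(doc.createElement("a"));
        root.appendChild(doc.createTextNode("text"));
        root.appendChild(doc.createElement("b"));

        // Assumption: this DOM supports "Traversal", so the cast succeeds.
        NodeIterator it = ((DocumentTraversal) doc).createNodeIterator(
                root, NodeFilter.SHOW_ELEMENT, null, true);
        int count = 0;
        while (it.nextNode() != null) {
            count++;  // the root itself is returned first, then a and b
        }
        it.detach();  // release the iterator once done
        return count;
    }
}
```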

<p> At this writing, the "TreeWalker" interface isn't implemented. </p>



<h2><a name='avoid'>DOM Functionality to Avoid</a></h2>

<p> For what appear to be a combination of historical and "committee
logic" reasons, DOM has a number of <em>features which I strongly advise
you to avoid using</em> in your library and application code.  These
include the following types of DOM nodes; see the documentation for the
implementation class for more information: <ul>

    <li> CDATASection
    (<a href='DomCDATA.html'>DomCDATA</a> class)
    ... use normal Text nodes instead, so you don't have to make
    every algorithm recognize multiple types of character data

    <li> DocumentType
    (<a href='DomDoctype.html'>DomDoctype</a> class)
    ... if this held actual typing information, it might be useful

    <li> Entity
    (<a href='DomEntity.html'>DomEntity</a> class)
    ... neither parsed nor unparsed entities work well in DOM; it
    won't even tell you which attributes identify unparsed entities

    <li> EntityReference
    (<a href='DomEntityReference.html'>DomEntityReference</a> class)
    ... permitted implementation variances are extreme, all children
    are readonly, and these can interact poorly with namespaces

    <li> Notation
    (<a href='DomNotation.html'>DomNotation</a> class)
    ... only really usable with unparsed entities (which aren't well
    supported; see above) or perhaps with PIs after the DTD, not with
    NOTATION attributes

    </ul>
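<p> If your documents arrive through a JAXP parser (an assumption; nothing
in this package requires JAXP), one way to keep CDATASection and
EntityReference nodes out of the tree is to ask the factory for it up front.
The class name <code>CoalesceExample</code> is made up, and this is a sketch
rather than a recommendation specific to this implementation. </p>

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class CoalesceExample {
    private static Element parseRoot(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Turn CDATA sections into Text and merge them with adjacent text.
        factory.setCoalescing(true);
        // Expand entity references instead of leaving EntityReference nodes.
        factory.setExpandEntityReferences(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    /** Number of child nodes under the document element. */
    public static int childCount(String xml) throws Exception {
        return parseRoot(xml).getChildNodes().getLength();
    }

    /** Value of the first child of the document element. */
    public static String firstChildValue(String xml) throws Exception {
        return parseRoot(xml).getFirstChild().getNodeValue();
    }
}
```

<p> With coalescing on, algorithms only ever meet one kind of character
data node. </p>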

<p> If you really need to use unparsed entities or notations, use SAX;
it offers better support for all DTD-related functionality.
It also exposes actual
document typing information (such as element content models).</p>
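<p> On the SAX side, notation and unparsed entity declarations are reported
through the <code>DTDHandler</code> callbacks, which
<code>DefaultHandler</code> already implements.  A minimal sketch, with a
made-up class name (<code>DtdDeclarations</code>) and a made-up sample
document: </p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DtdDeclarations {
    /** Hypothetical sample: one notation and one unparsed entity in the internal subset. */
    public static final String SAMPLE =
            "<!DOCTYPE doc [\n"
          + "  <!NOTATION gif SYSTEM \"image/gif\">\n"
          + "  <!ENTITY logo SYSTEM \"logo.gif\" NDATA gif>\n"
          + "  <!ELEMENT doc EMPTY>\n"
          + "]>\n"
          + "<doc/>";

    /** Parses xml with SAX and lists the notation and unparsed entity declarations. */
    public static List<String> declarations(String xml) throws Exception {
        List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void notationDecl(String name, String publicId, String systemId) {
                found.add("notation " + name);
            }

            @Override
            public void unparsedEntityDecl(String name, String publicId,
                                           String systemId, String notationName) {
                found.add("entity " + name + " (" + notationName + ")");
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return found;
    }
}
```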

<p> Also, when accessing attribute values, use methods that provide their
values as single strings, rather than those which expose value substructure
(Text and EntityReference nodes).  (See the <a href='DomAttr.html'>DomAttr</a>
documentation for more information.) </p>
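<p> In practice that means calls like <code>Element.getAttribute</code> or
<code>Attr.getValue</code>, instead of walking the children of the attribute
node.  A small sketch, with an invented element, attribute, and class name
(<code>AttrValueExample</code>), assuming a JAXP-created document: </p>

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class AttrValueExample {
    private static Element sampleBook() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element book = doc.createElement("book");
        doc.appendChild(book);
        book.setAttribute("title", "DOM & SAX");
        return book;
    }

    /** Preferred: one call, one string, no attribute children to walk. */
    public static String titleFromElement() throws Exception {
        return sampleBook().getAttribute("title");
    }

    /** Also fine: the Attr node's value, still as a single string. */
    public static String titleFromAttrNode() throws Exception {
        return sampleBook().getAttributeNode("title").getValue();
    }
}
```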

<p> Note that many of these features were provided as partial support for
editor functionality (including the incomplete DTD access).  Full editor
functionality requires access to potentially malformed lexical structure,
at the level of unparsed tokens and below.  Access at such levels is so
complex that using it in non-editor applications sacrifices all the
benefits of XML; editor applications need extremely specialized APIs. </p>

<p> (This isn't a slam against DTDs, note; only against the broken support
for them in DOM.  Even despite inclusion of some dubious SGML legacy features
such as notations and unparsed entities,
and the ongoing proliferation of alternative schema and validation tools,
DTDs are still the most widely adopted tool
to constrain XML document structure.
Alternative schemes generally focus on data transfer style
applications; open document architectures comparable to
DocBook 4.0 don't yet exist in the schema world.
Feel free to use DTDs; just don't expect DOM to help you.) </p>

</body>
</html>
