XML Enhancements for Java

A set of language extensions that facilitate XML processing in Java.

Date Posted: March 31, 2005

Update (November 22, 2006): The new release includes support for updates and better XPath compilation and type checking.

1. Why is programming in XJ better than DOM, SAX, etc.?

We will look at this issue with respect to the three major extensions in XJ: the ability to refer to element declarations in XML Schema as if they were Java classes, the ability to write XPath expressions, and the ability to construct XML data by using inline XML.

Referring to XML Types: Consider two fragments of pure Java code. The first is written in a reflective style, where the only type used is java.lang.Object; the second is written as Java programs normally are, that is, by referring to classes and interfaces directly. The second style is less tedious to write and easier to maintain. In the reflective style, an error such as accessing a field that the class does not contain is not reported until run time, and if a class definition (or, in the XML case, an XML Schema) changes, the programmer must search the entire program to find the affected portions. In the second style, where type references are explicit in the program, the compiler reports improper accesses at compile time. In the same way, the ability to specify the types of XML data in an XJ program makes XML processing programs easier to write, more readable, and more maintainable.
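The contrast between the two styles can be sketched in plain Java. This is only an illustration: the Book class, its title field, and the method names are hypothetical and are not part of XJ or of any XML API.

```java
import java.lang.reflect.Field;

// Hypothetical class used only for this illustration.
class Book {
    String title;

    Book(String title) {
        this.title = title;
    }
}

class ReflectiveVsTyped {
    // Reflective style: the only static type is Object and the field name
    // is a string, so the compiler cannot check that the field exists.
    static Object readReflectively(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return field.get(target);
        } catch (ReflectiveOperationException e) {
            // A misspelled field name only surfaces here, at run time.
            throw new IllegalArgumentException("Cannot read field: " + fieldName, e);
        }
    }

    // Direct style: the type and the field are named in the program, so a
    // misspelling such as book.titel is rejected by the compiler.
    static String readDirectly(Book book) {
        return book.title;
    }

    public static void main(String[] args) {
        Book book = new Book("XML in Java");
        System.out.println(readDirectly(book));
        System.out.println(readReflectively(book, "title"));
        try {
            readReflectively(book, "titel"); // compiles, fails when executed
        } catch (IllegalArgumentException e) {
            System.out.println("Detected only at run time: " + e.getMessage());
        }
    }
}
```

Renaming the title field would break readDirectly at compile time, whereas every string passed to readReflectively would have to be found by searching the program.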

XPath Expressions: XPath is a declarative language that allows a programmer to write concise expressions that retrieve information from XML data. Run-time APIs, such as DOM, are moving towards allowing programmers to write XPath expressions rather than explicit tree traversal code, which is tedious to write. There are two problems with the run-time approach. First, an XPath expression is passed to the run-time library as a string. Simple errors, such as syntax errors, are not caught until run time. More complex errors, such as misspelling the name of an element in the document, are not detected by the run-time API at all; the XPath evaluation simply returns no results. The second problem with run-time APIs is performance. The run-time library must parse and evaluate an XPath expression at run time, so it has no opportunity to analyze the program and structure the query evaluations in an optimal manner.
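Both kinds of error can be seen in a minimal sketch that uses the standard javax.xml.xpath API shipped with the JDK. The sample document, its element names, and the class and method names are assumptions made up for this illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RuntimeXPathPitfalls {
    // Hypothetical sample document with two item elements.
    static final String ORDER =
        "<order><item><price>12</price></item><item><price>30</price></item></order>";

    // Evaluates an XPath given as a plain string and returns the number
    // of nodes it selects in the sample document.
    static int countMatches(String xpathText) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                xpathText,
                new InputSource(new StringReader(ORDER)),
                XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            // Syntax errors in the string are only discovered here.
            throw new IllegalArgumentException("Invalid XPath: " + xpathText, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countMatches("/order/item")); // prints 2
        // Misspelled element name: no error is raised, the result is just empty.
        System.out.println(countMatches("/order/itme")); // prints 0
        try {
            countMatches("/order/item["); // unterminated predicate
        } catch (IllegalArgumentException e) {
            System.out.println("Syntax error found only at run time");
        }
    }
}
```

Note that the expression is also parsed again on every call to countMatches, which reflects the performance concern: the library sees each query only as a string at the moment of evaluation.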

XJ solves these problems by introducing knowledge of XPath expressions into the language. The compiler can verify that the evaluation of an XPath on an element is appropriate with respect to the XML Schema type of the element (detecting many errors at compilation time). Furthermore, it can generate optimized code for evaluation of XPath expressions (this ability is not in the current prototype, but is coming soon).

Construction of XML: With run-time libraries, such as DOM, a programmer must construct XML step by step, creating each node and inserting it into the tree at the appropriate place. To construct elements and attributes, the programmer passes strings to the run-time library; misspellings and other errors are not caught until run time, and then only if the constructed document is validated. XJ, on the other hand, allows programmers to write XML inline. Programmers can copy the XML they wish to generate and paste it into the program. The compiler verifies that the constructed XML is valid with respect to the declared types, and it can also optimize construction.
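The step-by-step DOM style described above can be sketched with the standard org.w3c.dom API. The element and attribute names (order, item, sku) are hypothetical, and the misspelling is deliberate to show that nothing rejects it.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomConstruction {
    // Builds <order><itme sku="A-100">Widget</itme></order> node by node.
    static Document buildOrder() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();

            Element order = doc.createElement("order");
            doc.appendChild(order);

            // The intended name was "item". The misspelled string compiles and
            // runs; only a later validation step could reveal the mistake.
            Element item = doc.createElement("itme");
            item.setAttribute("sku", "A-100");
            item.setTextContent("Widget");
            order.appendChild(item);

            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        Document doc = buildOrder();
        System.out.println(doc.getElementsByTagName("item").getLength()); // prints 0
        System.out.println(doc.getElementsByTagName("itme").getLength()); // prints 1
    }
}
```

Four statements are needed for a single child element, and none of the names is checked against a schema unless the finished document is validated separately.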

2. How is XJ different from other languages that integrate XML support?

Many languages integrate XML as a first-class construct, the granddaddy of them all being XDuce. What sets XJ apart from these is its consistency with the open standards XML 1.0, XPath, Namespaces, and XML Schema. Many of the details of our design were driven by the goal of ensuring that XML processing in XJ behaves "as expected"; for example, an XPath evaluation on an XML document in XJ gives the same results as an XSLT or XQuery processor would.

Another unique aspect of XJ (not available in the current prototype, but coming soon) is the ability to update XML values in place.

3. Where can I find more documentation?

More information on XJ can be found at the XJ web site. The XJ manual is available for download in PDF or HTML.

The XJ manual can be found in the doc subdirectory of the distribution. An HTML version of the documentation is also available.

4. How do I report bugs?

Please either use the alphaWorks forum or send e-mail. We are committed to fixing bugs as quickly as possible.

5. What are the limitations of XJ?

Certain features of XML Schema, such as substitution groups, are not yet supported. See the manual (XJmanual.pdf) in the doc subdirectory for a list of limitations.

6. What are some of the features under development?

We are working on better compilation of XPath expressions and a more efficient run-time environment. We are also developing an Eclipse plug-in for easier development of XJ applications. Finally, we will be adding WSDL support in order to facilitate development of Web services.

7. Are there other FAQs available? Is it possible to contribute content such as creative solutions or new ways to use this technology?

The ETTK Wiki contains a link to user-contributed FAQs and we welcome you to add your own FAQ entries. You can contribute your own ETTK-related content by joining the ETTK Community and posting to the appropriate section of the Wiki.

8. This technology is listed as a member of the ETTK. What does that mean?

The Emerging Technologies Toolkit (ETTK) is a collection of emerging technologies that are relevant to IBM's emerging software strategies. The ETTK team works with external users to incubate and further develop these technologies so they can be used to create innovative customer solutions. ETTK packages are focused on just a few select technology areas. In a way, you can think of "ETTK"-labeled technologies as being close to "alphaWorks Featured Technologies." ETTK technologies explore new types of applications or address emerging application-, Internet-, or standard-oriented domains. We value your input and want to hear how you would make use of this technology in your environment; please visit the ETTK Blog or ETTK Wiki for additional information.
