
Virtual XML Garden

An implementation of XPath and XQuery for processing (and combining) many kinds of structured and formatted data as if it were all XML. (This is an ETTK technology.)


Date Posted: November 3, 2005

Update: December 19, 2006

Version 4: Rewritten XQuery optimizer based on higher-order rewriting; DFDL adapter updated to reflect the DFDL Combined Specification Working Document (core, revision 09); many fixes.

What is the Virtual XML Garden?

More and more structured data is being converted into XML documents, either for transmission and processing under standards such as the Web services standards, or for combination with semi-structured data such as HTML documents. Sometimes the converted data replaces the original structured data; sometimes the data is converted "on the fly." Both approaches pose problems. If the original data is replaced, legacy applications that depend on the old format must be rewritten. Converting data on the fly, on the other hand, imposes a significant performance penalty, because generating and parsing XML character sequences carries considerable overhead.

Virtual XML solves these problems by

  • keeping everything in the native format most natural for the data
  • providing thin "on-demand" adapters that expose each format through a generic abstract XML interface corresponding to the XML Infoset as well as the forthcoming XPath and XQuery Data Model
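
The on-demand adapter idea can be illustrated with a minimal sketch. It is not the Virtual XML Garden API: the class FileSystemNode, the element names "directory" and "file", and the helper descendants are all hypothetical names chosen for this example. The sketch presents a directory tree through a small XML-like interface (element name, attributes, children) and only lists a directory when a caller asks for its children, so no XML text is generated or parsed.

```python
import os
import tempfile
from typing import Dict, Iterator


class FileSystemNode:
    """Hypothetical adapter: presents a path as an XML-like element."""

    def __init__(self, path: str) -> None:
        self.path = path

    @property
    def name(self) -> str:
        # Directories appear as <directory>, everything else as <file>.
        return "directory" if os.path.isdir(self.path) else "file"

    def attributes(self) -> Dict[str, str]:
        return {"name": os.path.basename(self.path)}

    def children(self) -> Iterator["FileSystemNode"]:
        # The directory is listed only when a caller iterates the children;
        # the data stays in its native form the whole time.
        if os.path.isdir(self.path):
            for entry in sorted(os.listdir(self.path)):
                yield FileSystemNode(os.path.join(self.path, entry))


def descendants(node, element_name: str) -> Iterator:
    """Rough analogue of the XPath step //element_name over any adapter."""
    for child in node.children():
        if child.name == element_name:
            yield child
        yield from descendants(child, element_name)


def demo():
    """Build a small temporary tree and list the names of all <file> nodes."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "docs"))
        for rel in ("a.txt", os.path.join("docs", "b.txt")):
            with open(os.path.join(root, rel), "w") as handle:
                handle.write("x")
        view = FileSystemNode(root)
        return sorted(n.attributes()["name"] for n in descendants(view, "file"))


if __name__ == "__main__":
    print(demo())
```

Because descendants relies only on name and children(), the same traversal would work over any other adapter that offers those two members, which is the point of putting a generic interface in front of each native format.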

This solution is combined with implementations of the XPath and XQuery languages and sophisticated analysis technology based on a generic, higher-order rewriting engine. Through the use of data model profiles, this combination allows Virtual XML to exploit the specific data access patterns that each structured data source underneath the adaptation layer supports. (For a technical overview of this idea, please see Phantom XML.) Each adapter is then exposed to the user as a simple XPath or XQuery function, which allows for scripts that merge and join data from all the available data sources. Several examples of such scripts are included in the updated version of Virtual XML Garden.

Although Extensible Markup Language (XML) has gained popularity and has resulted in the creation of powerful software for creating, transforming, and querying XML-based business data, much information remains in non-XML form. Virtual XML Garden demonstrates how it is possible to "virtualize" data resources to appear as if they were all in XML and thus enables applications to access both XML and non-XML sources in a uniform manner. (For an in-depth treatment of the business case, see the IBM Systems Journal paper.)

For further details, discussion, and applications, please see Kristoffer Rose's Blog.

Virtual XML Garden is part of the Emerging Technologies Toolkit (ETTK), a special collection of emerging technologies from IBM's software development and research labs.

How does it work?

With the new Virtual XML Garden release, users can write scripts in XPath and XQuery that mix and match data retrieved from virtual XML views in order to create derived views or "XML mash-ups" with data from a wide variety of data sources:
  • formatted binary and text files (for example, formatted by COBOL copybooks) that can be described using (a subset of) the forthcoming Data Format Description Language (DFDL) standard, which retains the structure of the formatted data in the virtual XML view
  • ZIP archives (including OpenDocument files), where the ZIP members are seen as virtual XML siblings (which can, in turn, be used as data sources and "inlined" as XML)
  • entire file systems, mapping the directory structure to a virtual XML document structure (which can, in turn, be used as data sources and "inlined" as XML)
  • relational databases (through JDBC) using either the SQL/XML standard format to view the entire database as a single virtual XML document or the JSR114 WebRowSet format to view the result of a specific SQL query as virtual XML (furthermore, embedded XML information can be seamlessly "inlined" into the view)
  • IBM IMS (Information Management System) hierarchical databases exposing the IMS hierarchy as a single XML document
  • Exif/JPEG images, which can be viewed in virtual XML form with all the metadata tags encoded in the XML structure
  • legacy HTML, which can be virtualized as modern XHTML
  • XML resources in files and on the Internet
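
As a rough illustration of such a mash-up, the following sketch joins two of the source kinds listed above: the members of a ZIP archive and the result of an SQL query. It is a simplification under stated assumptions, not how Virtual XML Garden works internally. It uses Python's standard library rather than XQuery, it builds the XML views eagerly instead of virtualizing them, and the element names (zip, member, rows, row, report, entry), the owners table, and the functions zip_view, query_view, and mash_up are all invented for the example. In particular, the row layout here is not the JSR114 WebRowSet format.

```python
import io
import sqlite3
import zipfile
import xml.etree.ElementTree as ET


def zip_view(data: bytes) -> ET.Element:
    """Expose ZIP members as sibling <member> elements (eagerly, for brevity)."""
    root = ET.Element("zip")
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            ET.SubElement(root, "member", name=info.filename,
                          size=str(info.file_size))
    return root


def query_view(conn: sqlite3.Connection, sql: str) -> ET.Element:
    """Expose one SQL result set as <rows><row><column>value</column></row></rows>."""
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    root = ET.Element("rows")
    for record in cursor:
        row = ET.SubElement(root, "row")
        for column, value in zip(columns, record):
            ET.SubElement(row, column).text = str(value)
    return root


def mash_up(members: ET.Element, rows: ET.Element) -> ET.Element:
    """Join the two views on file name, much as an XQuery join would."""
    owners = {row.findtext("filename"): row.findtext("owner")
              for row in rows.findall("row")}
    report = ET.Element("report")
    for member in members.findall("member"):
        owner = owners.get(member.get("name"))
        if owner is not None:  # keep only members that have a matching row
            ET.SubElement(report, "entry", name=member.get("name"),
                          size=member.get("size"), owner=owner)
    return report


def demo() -> str:
    """Create an in-memory ZIP and database, then serialize the joined view."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("report.txt", "hello")
        archive.writestr("notes.txt", "hi")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE owners (filename TEXT, owner TEXT)")
    conn.execute("INSERT INTO owners VALUES ('report.txt', 'alice')")
    rows = query_view(conn, "SELECT filename, owner FROM owners")
    conn.close()
    return ET.tostring(mash_up(zip_view(buffer.getvalue()), rows),
                       encoding="unicode")


if __name__ == "__main__":
    print(demo())
```

Only report.txt appears in the result, because notes.txt has no matching row in the hypothetical owners table. Once both sources look like XML, the join itself needs no knowledge of ZIP or SQL.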

Furthermore, data generated by such scripts are themselves derived views that can be accessed as XML data or through standard APIs such as SDO (Service Data Objects).


About the technology author(s):
Virtual XML Garden was developed by a worldwide team of IBM's XML technology experts as part of the Virtual XML project. They can be contacted through e-mail.

  • Kristoffer Rose, Ph.D., whose research in XML technology is focused on the manner in which XML and the XML processing languages (XPath, XSLT, XQuery, etc.) can be implemented such that they can be used efficiently over diverse and distributed data structures, including large data collections and data not in XML
  • Lionel Villard, Ph.D., whose research interests include multimedia documents, contextual adaptation, authoring tools, document transformations, incremental transformations, and high performance
  • Achille Fokoué, who was the primary architect and the developer of the IBM XML Schema Quality Checker. His work on XML technologies involves static analyses of XPath, XQuery, and XSLT for both program understanding and optimization purposes.
  • Paul Castro, Ph.D., who currently works on enhancing browser-based and rich-client platforms. His past work includes mobile computing systems and applications, network sensor environments, and wireless location systems.
  • Christopher Holtz, who works in IMS development, predominantly with Java, XML, SOAP, and emerging, SOA-enabling technologies
  • William Li, a former member of IMS Fast Path Test and IMS DB Adapter Development who currently works as a member of the IMS XML database team
  • Geoff Judd, an advisory software engineer at IBM Hursley Labs in Winchester, England, who works on the WBI-Message Broker, focusing on run-time message parsing
  • Suman Kalia, an advisory software developer at IBM Toronto Laboratory in Canada, who is the team lead for Message Set Development tools for WebSphere Business Integration Message Broker
  • Anthony Beardsmore, a software engineer at IBM Hursley Labs, Winchester, England, who works with the ATHENA EU Integrated Project on technologies for business integration
  • Rajeshwari Rajendra, an IBM software engineer in Bangalore, India, whose interests include parser development and static analysis for programming languages, currently with a specific emphasis on XML-processing languages such as XQuery
  • Anke Diderich, an intern from the University of Rostock in Germany who is pursuing a diploma in computer science and works on integration of XML and databases


Related technologies

For platform(s):
Java

For topics:
Data Analysis, DFDL (Data Format Description Language), Emerging Technologies Toolkit (ETTK), Java technology, virtualization, XML, XPath, XQuery


Related resources

Virtual XML project

Kristoffer Rose's blog

 
