Text Analysis Perspective for DB2 Warehouse
A set of Eclipse plug-ins that allows you to configure and test text analysis engines and use them in warehouse and mining flows created by DB2 Warehouse Edition 9.5 and its subsequent release, InfoSphere Warehouse 9.5.1.
Date Posted: December 18, 2007
Update: July 15, 2008
Version 9.5.1 works with InfoSphere Warehouse 9.5.1 and supports analysis engines that are based on UIMA 2.2.1.
What is Text Analysis Perspective for DB2 Warehouse?
Text Analysis Perspective for DB2® Warehouse is an Eclipse perspective that can be integrated into DB2 Warehouse Design Studio and InfoSphere™ Warehouse Design Studio 9.5.1. It allows you to quickly and easily configure the analysis engines that are used in text operators.
Since the release of DB2 Warehouse 9.5, the Design Studio has provided text operators that can be included in data flows. These operators use UIMA analysis engines to extract concepts and relations from unstructured text. As a result, unstructured information is transformed into a structure that can be analyzed in the DB2 warehouse together with existing structured information by using business-intelligence tools such as reporting tools, tools for multidimensional analysis, or data mining tools. However, to deliver meaningful results, the analysis engines must be configured for a particular business problem. Text Analysis Perspective for DB2 Warehouse simplifies this task.
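To make the idea of turning unstructured text into analyzable structure more concrete, here is a minimal sketch in plain Java. It does not use the UIMA SDK or any Design Studio API; the class name, the word list, the part-number pattern, and the row layout are all hypothetical and chosen only for illustration. It shows how a dictionary and a regular-expression rule could each produce rows (document ID, concept type, matched text, offset) that might then sit next to structured data in a table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: plain Java, not the UIMA or Design Studio API.
class ConceptExtractionSketch {

    // One extracted concept, shaped like a row that could be loaded into a table.
    static final class Concept {
        final int docId;
        final String type;
        final String text;
        final int begin;

        Concept(int docId, String type, String text, int begin) {
            this.docId = docId;
            this.type = type;
            this.text = text;
            this.begin = begin;
        }

        @Override
        public String toString() {
            return docId + " | " + type + " | " + text + " | " + begin;
        }
    }

    // Hypothetical dictionary (word list) of product terms.
    static final List<String> PRODUCT_TERMS = Arrays.asList("brake", "airbag", "engine");

    // Hypothetical regular-expression rule for part numbers such as "PN-12345".
    static final Pattern PART_NUMBER = Pattern.compile("\\bPN-\\d{5}\\b");

    static List<Concept> extract(int docId, String text) {
        List<Concept> rows = new ArrayList<>();
        // Dictionary lookup: whole-word, case-insensitive match of each term.
        for (String term : PRODUCT_TERMS) {
            Pattern p = Pattern.compile("\\b" + Pattern.quote(term) + "\\b",
                    Pattern.CASE_INSENSITIVE);
            Matcher m = p.matcher(text);
            while (m.find()) {
                rows.add(new Concept(docId, "Product", m.group(), m.start()));
            }
        }
        // Regular-expression rule: every part number found in the text.
        Matcher m = PART_NUMBER.matcher(text);
        while (m.find()) {
            rows.add(new Concept(docId, "PartNumber", m.group(), m.start()));
        }
        return rows;
    }

    public static void main(String[] args) {
        String note = "Customer reports the brake squeals; replaced PN-48213.";
        for (Concept c : extract(1, note)) {
            System.out.println(c);
        }
    }
}
```

In the actual product, this extraction is performed by UIMA analysis engines inside text operators; the sketch only illustrates the kind of structure that results.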
Text Analysis Perspective for DB2 Warehouse provides the following main benefits:
- the ability to test analysis engines on a custom collection of test documents, either to evaluate the quality of the dictionaries and regular-expression rules built with the analysis engines or to evaluate the resources of a third-party UIMA analysis engine; the test documents can be extracted from database text columns or taken from text files in the file system
- the ability to compare analysis results across test runs to determine the impact of changes in your analysis engine
- the ability to use text search on test documents to identify suitable terms to include in a dictionary or to find suitable context terms to use in regular-expression rules
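The second benefit, comparing results across test runs, can be pictured as a set difference. The following sketch is an assumption-laden illustration in plain Java, not the comparison viewer's implementation: the class name and the "docId:begin-end:type" string encoding of an annotation are hypothetical. It reports which annotations disappeared and which appeared after a configuration change.

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: annotations are encoded as "docId:begin-end:type" strings,
// which is an assumption made for this sketch.
class RunComparisonSketch {

    // Returns the annotations that are in 'a' but not in 'b', sorted for stable output.
    static Set<String> onlyIn(Set<String> a, Set<String> b) {
        Set<String> diff = new TreeSet<>(a);
        diff.removeAll(b);
        return diff;
    }

    public static void main(String[] args) {
        // Results of a run before and after a hypothetical rule change.
        Set<String> run1 = Set.of("1:21-26:Product", "2:0-6:Product");
        Set<String> run2 = Set.of("1:21-26:Product", "1:45-53:PartNumber");

        System.out.println("Lost in run 2:   " + onlyIn(run1, run2));
        System.out.println("Gained in run 2: " + onlyIn(run2, run1));
    }
}
```

Annotations that appear in both runs are unaffected by the change; the two difference sets are what you would typically inspect to judge whether a changed dictionary or rule improved the results.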
How does it work?
Text Analysis Perspective for DB2 Warehouse is a set of Eclipse plug-ins that allows users of the DB2/InfoSphere Warehouse Design Studio to configure and test UIMA analysis engines before they are used in a data flow. These plug-ins build on the UIMA (Unstructured Information Management Architecture) Java™ SDK but do not require knowledge of UIMA itself. The Text Analysis Perspective supports users in all steps involved when configuring an annotator to use unstructured information for a business problem:
- Create a "Text Analysis Project," which contains the structure and the actions tailored to the text analysis configuration task.
- Import collections of sample text documents or database columns for testing your annotator configuration.
- Explore these documents using Lucene-based text search and an Eclipse plug-in for frequent terms analysis in order to understand the information present in the documents.
- Choose the right UIMA analysis engine for the extraction task. Text Analysis Perspective for DB2 Warehouse includes two built-in analysis engines that extract information based on regular expressions and word lists. These annotators are packaged as "Text Analysis Plug-ins," which also contain all editors and viewers necessary for working with these annotators without UIMA skills. You can also use UIMA processing engine archive (PEAR) files that contain UIMA annotators. DB2 Warehouse 9.5 supports IBM UIMA 1.4.5-compliant annotators, whereas InfoSphere Warehouse 9.5.1 supports Apache UIMA 2.2.1.
- Run the analysis engine on the document collections in order to analyze the documents and extract information. The results are stored in an embedded Derby database for the result evaluation.
- Understand and compare the results. Text Analysis Perspective for DB2 Warehouse contains Eclipse viewers for viewing the results on the document collection and for comparing results across different runs in order to understand the impact of configuration changes (such as a change to a regular expression rule).
- Use the configured analysis engine within a warehouse project. By referencing a text analysis project in a warehouse project, all analysis engines and resources of the text analysis project are directly accessible within the warehouse project and can be used in text operators.
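The document exploration step in the list above mentions frequent terms analysis. As a rough illustration of what such an analysis computes, the sketch below counts terms across a small document collection in plain Java. It is not the plug-in's implementation: the class name, the tokenization rule, and the minimum token length are assumptions made for this example, and the sample documents are invented.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative only: a simple term counter, not the frequent terms plug-in.
class FrequentTermsSketch {

    // Returns the k most frequent terms; ties are broken alphabetically.
    static List<Map.Entry<String, Integer>> topTerms(List<String> docs, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            // Split on anything that is not a letter or digit (an assumed tokenization).
            for (String token : doc.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
                if (token.length() > 2) { // skip empty and very short tokens
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Brake noise when braking",
                "Brake pedal soft",
                "Noise from engine");
        for (Map.Entry<String, Integer> e : topTerms(docs, 3)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Terms that rank high in such a list are natural candidates for a dictionary or for context terms in a regular-expression rule.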
What's new in Version 9.5.1?
Text Analysis Perspective for DB2 Warehouse has been updated to work with InfoSphere Warehouse 9.5.1. It now supports analysis engines that are based on UIMA 2.2.1, which is an open-source Apache Incubator project. You can now use the open-source annotators that are part of this project within InfoSphere Warehouse 9.5.1 and Text Analysis Perspective for DB2 Warehouse.
About the technology author(s):
Text Analysis Perspective for DB2 Warehouse is a joint project between the Business Intelligence (BI) Development team and the Content Discovery team in the IBM® Development Laboratory Boeblingen (Germany). The main contributors are:
- Alexander Lang, team lead, Content Discovery Solutions
- Dennis Nienhueser, intern
- Mathias Rueck, intern
- Mathias Zapke, user-centered design
- Andrea Elias, software engineer, Content Discovery
- Sebastian Nelke, software engineer, Content Discovery
- Silvia Mesturino, Ph.D., software test engineer, information management
- Simone Daum, software engineer, BI Development
- Stefan Abraham, software engineer, BI Development
- Tong-Haing Fin, software engineer, IBM Research
- Peter Bendel, architect, BI Development
IBM, DB2, and InfoSphere are trademarks of IBM Corporation in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.
For platform(s):
Win32, Windows, Windows XP
For topics:
analysis, Data Analysis, data mining, Eclipse, Java technology, Natural Language, semantics, UIMA, utilities