IBM Tool for Interactive Text Classification and Labeling
An interactive interface that enables you to create, validate, train, and refine a text classification system.
Date Posted: September 11, 2007
What is IBM Tool for Interactive Text Classification and Labeling?
Whether your application supports customer satisfaction analysis, root cause determination, or churn prediction, it needs to classify a large body of text data into the correct set of groups. An incorrect classification can lead to wrong conclusions, misallocation of resources, and delays in resolving problems.
Manual approaches to classifying data can be expensive, error prone, and unsuitable for large volumes of data. Automatic classification systems, for their part, can also be error prone and produce poor-quality results; for example, classification can be affected by uncertain labeling semantics and ambiguity in text.
IBM® Tool for Interactive Text Classification and Labeling bridges the gap between manual and automatic classification by providing a comprehensive interactive interface for:
- correcting and refining automatic text classification models through human feedback
- continuously inspecting and validating pre-classified data in order to ensure consistency
- building "good" training data from scratch.
How does it work? This tool has three components:
- The Classifier is an automatic text classification module that must first be trained on a few pre-labeled documents. This training produces a model, which can then be applied to predict labels for new documents, assigning a confidence score to each classification.
- Confidence-based selection lets users select an active set of documents, either by applying thresholds on confidence or at random, and then validate or alter the predicted labels.
- Model improvement uses this feedback to re-train and improve the classification model, incorporating the documents that are most informative to the current model.
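The description above does not say which classifier or selection strategy the tool uses, so the following Python sketch only illustrates how the three components could fit together, under explicit assumptions: a multinomial naive Bayes classifier stands in for the Classifier, with the posterior probability of the top label used as the confidence score; confidence-based selection is either a simple threshold or a random sample; and model improvement uses least-confidence (uncertainty) sampling, one common way of picking "most informative" documents. All names (`NaiveBayesClassifier`, `select_by_confidence`, `select_random`, `improve`) and the toy data are hypothetical, not the tool's API.

```python
import math
import random
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one smoothing (a hypothetical stand-in
    for the tool's Classifier, whose actual algorithm is not documented here)."""

    def fit(self, docs, labels):
        # Count documents per label and token occurrences per label.
        self.labels = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.labels}
        return self

    def predict(self, doc):
        """Return (label, confidence); confidence is the top label's posterior."""
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for c in self.labels:
            score = math.log(self.doc_counts[c] / n_docs)
            for tok in doc.lower().split():
                score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            log_scores[c] = score
        # Convert log scores to normalized probabilities (shift by max for stability).
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        best = max(weights, key=weights.get)
        return best, weights[best] / sum(weights.values())


def select_by_confidence(clf, docs, threshold):
    """Active set = documents whose prediction confidence falls below `threshold`."""
    preds = [(d, *clf.predict(d)) for d in docs]
    return [p for p in preds if p[2] < threshold]


def select_random(clf, docs, k, seed=0):
    """Active set = k documents drawn at random, for spot-checking predictions."""
    preds = [(d, *clf.predict(d)) for d in docs]
    return random.Random(seed).sample(preds, min(k, len(preds)))


def improve(clf, train_docs, train_labels, pool, oracle, batch_size=1):
    """One round of least-confidence sampling: the oracle (standing in for a human
    reviewer) labels the pool documents the model is least sure of; then retrain."""
    order = sorted(range(len(pool)), key=lambda i: clf.predict(pool[i])[1])
    picked = sorted(order[:batch_size])
    new_docs = train_docs + [pool[i] for i in picked]
    new_labels = train_labels + [oracle(pool[i]) for i in picked]
    remaining = [d for i, d in enumerate(pool) if i not in picked]
    return NaiveBayesClassifier().fit(new_docs, new_labels), new_docs, new_labels, remaining


# Toy data (hypothetical): a few pre-labeled contact-center snippets.
train = ["refund not received", "billing charge wrong", "phone signal drops", "no network coverage"]
labels = ["billing", "billing", "network", "network"]
clf = NaiveBayesClassifier().fit(train, labels)

label, conf = clf.predict("wrong charge on my bill")
print(label, round(conf, 2))  # prints: billing 0.8

# Simulated human feedback: a dict lookup plays the reviewer.
pool = ["signal drops again", "hello there", "refund please"]
truth = {"signal drops again": "network", "hello there": "network", "refund please": "billing"}
clf, train, labels, pool = improve(clf, train, labels, pool, truth.get)
print(len(train), pool)
```

Least-confidence sampling is used here only because it is simple; other informativeness measures, such as the margin between the top two labels or the entropy of the label distribution, would slot into `improve` by changing the sort key.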

About the technology author(s): Shourya Roy has worked as a technical staff member with IBM Research since 2002. His expertise lies in the area of unstructured information management (UIM), and he is proficient in designing as well as developing UIM applications for domains such as contact center and telecom. Mr. Roy's research interests include knowledge management, data mining, machine learning, and databases. He has published papers in leading conferences, including VLDB, WWW, ACL, ICDM, COLING, Hypertext, and EMNLP, as well as in journals such as the VLDB journal and the Journal of Autonomic and Trusted Computing on Autonomic and Trusted Computing Systems and Applications. Mr. Roy earned his B.E. in computer science and engineering from Jadavpur University, Kolkata, in 2000 and his M.Tech. in computer science and engineering from IIT Bombay in 2002.
Shantanu Godbole, Ph.D., earned his B.E. in computer engineering from Pune University in 1998, his M.Tech. in IT from IIT Bombay in 2001, and his Ph.D. from IIT Bombay in 2006. His doctoral dissertation was titled Inter-class relationships in text classification. Dr. Godbole has been working as a research staff member with IBM India Research Lab since early 2006. He works with the Information Management group and is interested in data mining, machine learning, information visualization, and building operational information analytics systems. Dr. Godbole has been published in leading conferences such as KDD, PKDD, ICDM, PAKDD, and Hypertext, and he is an active member of the data-mining community.
Sumeet Agarwal and Diwakar Punjani, no longer with IBM, also contributed to this technology.
This software was developed at IBM India Research Lab as a by-product of a research project. If you have any queries, comments, criticisms, bug reports, or other feedback, please e-mail the engineers.
IBM is a trademark of IBM Corporation in the United States, other countries, or both.
Other company, product, or service names may be trademarks or service marks of others.