Limitations of Other Approaches

The Vector Method

The Vector Method is concerned with partitioning data, that is, with categorization. Each document is represented as a point (a vector of word weights) in a multidimensional space, and that space is then divided into categories. Categories must be taught to the system through training, and the more training it receives, the more accurate the categorization can be. Many of today's search engines use a combination of Vector and Boolean methods.
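
The partitioning described above can be illustrated with a minimal nearest-centroid sketch. It assumes a simple bag-of-words representation and cosine similarity; the function names and the toy training data are hypothetical, and this is one textbook form of vector categorization rather than any particular engine's implementation.

```python
import math
from collections import Counter


def to_vector(text: str) -> Counter:
    """Represent a document as a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def train(examples: dict[str, list[str]]) -> dict[str, Counter]:
    """'Teach' each category by summing its training vectors.

    The sum points in the same direction as the centroid, and cosine
    similarity ignores length, so the sum can stand in for the centroid.
    """
    centroids = {}
    for category, docs in examples.items():
        centroid = Counter()
        for doc in docs:
            centroid.update(to_vector(doc))
        centroids[category] = centroid
    return centroids


def categorize(text: str, centroids: dict[str, Counter]) -> str:
    """Place the document in the single category whose centroid is closest."""
    vec = to_vector(text)
    return max(centroids, key=lambda c: cosine(vec, centroids[c]))


# Hypothetical toy training set: two categories, two documents each.
centroids = train({
    "sport": ["the match ended in a goal", "the team won the league"],
    "finance": ["shares fell on the stock market", "the bank raised interest rates"],
})
print(categorize("a late goal won the match", centroids))  # sport
```

Adding more training documents to a category moves its centroid, which is why accuracy depends on how much training the system has received.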

Language Dependent

The system must be trained in its target language and will recognize only the words it has been taught. There is no inherent understanding of synonyms or related words. For example, it would be unable to deduce that "Creutzfeldt-Jakob" and "mad cow" are related terms.
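
A minimal sketch of this limitation, assuming a term-frequency vector restricted to a taught vocabulary and cosine similarity as the relatedness measure. The vocabulary and sample phrases are hypothetical. Because "Creutzfeldt-Jakob" and "mad cow" share no words, their similarity is zero, and text in an untaught language produces an empty vector.

```python
import math
from collections import Counter


def to_vector(text: str, vocabulary: set[str]) -> Counter:
    """Term-frequency vector restricted to the taught vocabulary.

    Words outside the vocabulary are silently dropped, so the system
    "sees" nothing it was not trained on.
    """
    return Counter(w for w in text.lower().split() if w in vocabulary)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity; 0.0 when the vectors share no terms or one is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical vocabulary taught to an English-language system.
VOCABULARY = {"creutzfeldt-jakob", "disease", "mad", "cow", "outbreak"}

cjd = to_vector("Creutzfeldt-Jakob disease", VOCABULARY)
bse = to_vector("mad cow outbreak", VOCABULARY)
french = to_vector("la maladie de la vache folle", VOCABULARY)

print(cosine(cjd, bse))  # 0.0: no shared terms, so no relation is detected
print(french)            # Counter(): none of the French words were taught
```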

Inaccurate

The Vector Method is inaccurate because it cannot cleanly separate categories, and it has particular trouble with documents that fit into more than one category: it will classify such a document under one category or another, but not both. There is also no notion of threshold or relevance, so once a document has been placed in a category, there is no indication of how relevant it is within that category. Does it mention the topic only a couple of times, or is it entirely focused on it? The Vector Method is unable to tell.
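
A minimal sketch of a hard, single-label decision, assuming a nearest-centroid classifier with hand-picked toy centroids (hypothetical). The similarity scores are computed and then discarded, so a document that is equally about two topics still receives one label, and a passing mention is indistinguishable from a fully on-topic document.

```python
import math
from collections import Counter


def to_vector(text: str) -> Counter:
    """Bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy centroids standing in for two trained categories.
CENTROIDS = {
    "sport": to_vector("football team match goal league"),
    "finance": to_vector("bank shares market interest profit"),
}


def hard_assign(text: str, centroids: dict[str, Counter]) -> str:
    """Return exactly one label; the similarity scores are thrown away."""
    vec = to_vector(text)
    scores = {label: cosine(vec, c) for label, c in centroids.items()}
    return max(scores, key=scores.get)


# Equally about sport and finance, yet only one label comes back
# (on a tie, max() keeps the first category in dictionary order).
print(hard_assign("bank profit from football team sponsorship", CENTROIDS))  # sport

# A passing mention and a fully on-topic document get the same answer,
# with nothing to say how strongly each one matches.
print(hard_assign("football", CENTROIDS))                         # sport
print(hard_assign("football team match goal league", CENTROIDS))  # sport
```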

Manual

All categories must be defined manually by administrators, and the system requires constant monitoring and maintenance to keep it functioning. Any time the categorization changes, the whole training process must start again from scratch, because there is no way to update just one area of the system.

Ranking Discrimination

The system does not understand how important or relevant one word is compared to another. To compensate, common words can be ignored and the focus placed on rare words, on the assumption that they give more insight into the theme of a document. However, this assumption does not always hold: weight can end up on inappropriate words, resulting in categorization errors.
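
A minimal sketch of the rare-word heuristic, using inverse document frequency (IDF), log(N / df), as one common way to implement it; this is assumed for illustration rather than taken from any particular engine, and the four-document collection is hypothetical. In this collection the on-topic word "rates" appears in every document and is weighted zero, while the incidental word "shakespeare" receives the highest weight.

```python
import math


def idf(corpus: list[str]) -> dict[str, float]:
    """Inverse document frequency, log(N / df), for every word in the corpus.

    Words that occur in every document get weight 0; words that occur in
    only one document get the maximum weight, log(N).
    """
    n = len(corpus)
    df: dict[str, int] = {}
    for doc in corpus:
        for term in set(doc.lower().split()):
            df[term] = df.get(term, 0) + 1
    return {term: math.log(n / count) for term, count in df.items()}


# Hypothetical collection of short finance documents.
DOCS = [
    "the central bank raised interest rates",
    "the bank cut interest rates again",
    "the stock market fell as rates rose",
    "the bank chairman quoted shakespeare in the rates speech",
]

weights = idf(DOCS)
for word in ("the", "rates", "bank", "shakespeare"):
    print(f"{word:12s} {weights[word]:.3f}")
```

Here "the" and "rates" both score 0.000, "bank" scores 0.288, and "shakespeare" scores 1.386: the heuristic correctly discards "the" but also discards the word that actually carries the theme.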

Autonomy's Approach

Autonomy's technology can understand the content of a document probabilistically, without depending on knowledge of a particular language, and create categories accordingly. Where necessary, a document can be classified in more than one category. Autonomy's automatic categorization functionality ensures that taxonomies are created and maintained with as much or as little human intervention as desired.

"We were attracted to Autonomy because it can process information from a wide range of differently structured data sources, which similar products cannot do. It can also combine internal and external information. The users like it and so far we have been impressed."
Duncan Fyfe, AstraZeneca