Limitations of Other Approaches
 


Parsing and Natural Language Analysis

For the last 20 years, much effort has gone into approaches for dealing with unstructured information. These approaches, known as parsing or semantic analysis, use rules of grammar and lexicons to try to understand textual information explicitly.

The Inherent Complexity of Language

In spite of more than two decades of research, semantic analysis is rarely used in real applications because its results and performance have yet to live up to expectations on real-world problems. The following cases illustrate the main limitation of this approach: the inability of parsing to handle ambiguity.

Example 1: 'The dog came into the room; it was white.'

It is unclear from the sentence whether it is the dog or the room that is white. By contrast, a human being would have little trouble interpreting the following examples, because of his or her familiarity with both rooms and dogs:

'The dog came into the room; it was furry.'
'The dog came into the room; it was full of furniture.'

In these cases the computer would be stumped, because it lacks the understanding needed to resolve such ambiguities. Some advanced systems allow a set of rules to be constructed for the machine to follow in resolving these uncertainties. However, such a rule set would be extremely cumbersome and difficult to maintain, and would significantly degrade the system's performance.
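The rule-based workaround can be sketched in a few lines. This is only an illustration: the lexicon and the `resolve_pronoun` helper are hypothetical and are not taken from any real parsing system.

```python
# Hypothetical hand-built lexicon: the properties each noun may plausibly have.
LEXICON = {
    "dog": {"furry", "white"},
    "room": {"full of furniture", "white"},
}

def resolve_pronoun(candidates, description):
    """Return every candidate noun that 'it' could refer to under the rules."""
    return [noun for noun in candidates if description in LEXICON.get(noun, set())]

candidates = ["dog", "room"]
print(resolve_pronoun(candidates, "furry"))              # ['dog']
print(resolve_pronoun(candidates, "full of furniture"))  # ['room']
print(resolve_pronoun(candidates, "white"))              # ['dog', 'room']
print(resolve_pronoun(candidates, "noisy"))              # []
```

The rules settle 'furry' and 'full of furniture', but 'white' remains ambiguous, and any property missing from the lexicon yields no answer at all. Every new noun or property means another hand-written entry, which is where the maintenance burden comes from.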

Example 2: 'The fly, it's clear to me, can fly faster than the bee.'

The computer may be confused by the word 'fly', which is used in this sentence as both a noun (the subject) and a verb. But that is an easy problem to solve. What about the word 'it'? How does one parse a word that refers to an abstract thought rather than to anything named in the sentence?

These problems are exacerbated when a computer attempts to extract meaning by parsing full paragraphs.

Example 3: 'The president arrived by car to meet the Chinese premier.'

Like keyword-based approaches, semantic analysis cannot determine the relative importance of ideas. In other words, the computer will assign an equal level of importance to the president, his mode of transportation and the leader he is meeting. In addition, parsing is designed to handle a few sentences at a time, so a strict parsing mechanism has great difficulty extracting meaning from a full paragraph. By contrast, Autonomy is able to understand the concepts underlying a body of text of any size, from a paragraph to a whole document, so that each theme within the document receives emphasis in proportion to its relevance.
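To make the idea of unequal emphasis concrete, the sketch below weights the words of Example 3 by inverse document frequency, a standard statistical measure. It is not a description of Autonomy's technology; the three-sentence corpus and the `idf` helper are invented for illustration.

```python
import math

# Hypothetical corpus, invented for illustration only.
CORPUS = [
    "the president arrived by car to meet the chinese premier",
    "the president flew to paris to give a speech",
    "the minister arrived by car to open the factory",
]

def idf(term, corpus):
    """Inverse document frequency: terms found in fewer documents weigh more."""
    containing = sum(1 for doc in corpus if term in doc.split())
    return math.log(len(corpus) / containing) if containing else 0.0

# Weight every distinct word of the first sentence and list the heaviest first.
weights = {term: idf(term, CORPUS) for term in set(CORPUS[0].split())}
for term, weight in sorted(weights.items(), key=lambda item: (-item[1], item[0])):
    print(f"{term:10s} {weight:.2f}")
```

In this toy corpus, words that occur in every sentence ('the', 'to') receive a weight of zero, while 'premier' outweighs 'president' and 'car'. A parse of the sentence on its own produces no such ranking.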

Reliability

Because semantic analysis is based on a true/false decision tree and rules structure, one incorrect decision or the occurrence of an unknown construct can derail the entire analysis.
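A minimal sketch of this brittleness, assuming a toy lexicon and a single sentence pattern (both hypothetical, and far smaller than anything a real parser would use):

```python
# Hypothetical lexicon and sentence rule, for illustration only.
PART_OF_SPEECH = {"the": "DET", "dog": "NOUN", "room": "NOUN", "entered": "VERB"}
SENTENCE_PATTERN = ["DET", "NOUN", "VERB", "DET", "NOUN"]

def parse(sentence):
    """Tag each word, accepting the sentence only if it fits the rule exactly."""
    words = sentence.lower().rstrip(".").split()
    tags = []
    for word in words:
        if word not in PART_OF_SPEECH:
            # A single unknown construct stops the whole analysis.
            raise ValueError(f"unknown word: {word!r}")
        tags.append(PART_OF_SPEECH[word])
    if tags != SENTENCE_PATTERN:
        raise ValueError(f"no rule matches tag sequence: {tags}")
    return list(zip(words, tags))

for text in ["The dog entered the room.", "The doggo entered the room."]:
    try:
        print(parse(text))
    except ValueError as error:
        print(f"analysis failed: {error}")
```

The first sentence is tagged in full. The second differs by one slang word and yields no partial result at all, only a failure.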

Language Dependent

The semantic approach is language specific, and its reliance on the grammar of a given language makes it vulnerable to slang and grammatically incorrect constructions. Because the system needs to be taught every new word or change in meaning, it cannot scale easily. More generally, such a system will support only a very limited set of languages, for example English, German and Dutch, and adding a new and very different language, such as Chinese, can be problematic. Autonomy is uniquely able to handle any language.

Question and Answer Systems

An increasing number of search vendors now offer users the ability to retrieve information through natural language questions. While this approach may work well for one-sentence questions or for queries concerning a known universe of information, the language model simply breaks down when employed on large documents with many concepts. This occurs because question and answer systems rely on a simple combination of manually defined "question forms" and a corresponding structured dataset that holds the relevant answers. As a result, these systems can recognize only precise questions and the matching answers that have been stored in the database. They cannot find concepts outside this manually defined structure that might supply relevant answers to users' questions. Equally, question and answer systems cannot understand questions that are phrased using slang or worded slightly differently, even if these queries would make perfect sense to a human.
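The combination of question forms and a structured answer set can be sketched as a simple lookup. The questions, answers and the `answer` helper below are hypothetical and stand in for whatever a given vendor stores.

```python
# Hypothetical question forms mapped to manually stored answers.
ANSWERS = {
    "what are your opening hours?": "We are open 9am to 5pm, Monday to Friday.",
    "how do i reset my password?": "Use the 'Forgot password' link on the login page.",
}

def answer(question):
    """Return a stored answer only when the question matches a defined form."""
    return ANSWERS.get(question.strip().lower())

print(answer("What are your opening hours?"))  # matches a stored form
print(answer("When do you guys open?"))        # None: same intent, different wording
```

The second question makes perfect sense to a human but falls outside the manually defined forms, so nothing is returned.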

Autonomy's Approach

Autonomy's pattern matching technology represents concepts by predictable statistical word patterns, and it operates independently of any given language.
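As a rough illustration of what matching on statistical word patterns can mean, the sketch below compares texts by word counts alone, using cosine similarity. This is a generic textbook technique chosen for illustration, not Autonomy's actual algorithm, and the helper names are assumptions. It also assumes words are separated by spaces, so a language written without spaces would need a different way of splitting text.

```python
import math
from collections import Counter

def word_counts(text):
    """Count whitespace-separated tokens; no grammar or dictionary is involved."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# The same code scores an English and a German example without any changes.
english = cosine_similarity(word_counts("dog room"),
                            word_counts("the dog came into the room"))
german = cosine_similarity(word_counts("hund zimmer"),
                           word_counts("der hund kam in das zimmer"))
unrelated = cosine_similarity(word_counts("dog room"),
                              word_counts("the president arrived by car"))
print(english, german, unrelated)
```

Both related pairs score well above the unrelated pair, which scores zero, and no rule of English or German grammar was needed to get there.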

Further Reference: The Evolution of Search
Further Reference: Autonomy's Unique Combination of Technologies