| Technology |
|
Keyword & Boolean | | | Parsing | | | Manual Tagging & XML |
|
Parsing and Natural Language Analysis
For the last 20 years, considerable effort has gone into approaches for handling unstructured information. These approaches, called parsing or semantic analysis, use rules of grammar and lexicons to try to understand textual information explicitly.
The Inherent Complexity of Language
In spite of more than two decades of research, semantic analysis is rarely used in real applications because its results and performance have yet to live up to expectations on real-world problems. The following cases illustrate the main limitation of this approach: the inability of parsing to handle ambiguity.
It is unclear from the sentence whether it is the dog or the room that is white. On the other hand, a human being would have little problem deciphering the following examples because of his or her familiarity with both rooms and dogs:
In this case the computer would be stumped. It lacks the understanding to solve such ambiguities. Some advanced systems will allow the construction of a set of rules for the machine to follow to resolve these uncertainties. However, the instruction set would be incredibly cumbersome and difficult to maintain, and would significantly degrade the system's performance.
The computer may be confused by the word 'fly', which is used in this sentence as both a subject and a verb. But that is an easy problem to solve. What about the word 'it'? How does one parse a word that refers to abstract thought?
These problems are exacerbated when a computer attempts to extract meaning by parsing full paragraphs.
Like keyword-based approaches, semantic analysis cannot determine the relative importance of ideas. In other words, the computer will assign an equal level of importance to the President, his mode of transportation and the leader he is meeting with. In addition, parsing is designed to handle a few sentences. A strict parsing mechanism has great difficulty in extracting meaning from a full paragraph. On the other hand, Autonomy is able to understand the concepts underlying a large corpus of information, from a paragraph to a whole document, meaning that relevant emphasis is placed on each theme within the document.
Reliability
Because semantic analysis is based on a true/false decision tree and rules structure, one incorrect decision or the occurrence of an unknown construct can derail the entire analysis.
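The brittleness described above can be illustrated with a minimal sketch. The lexicon, the single grammar rule, and the names `LEXICON`, `SENTENCE_PATTERN`, `ParseError` and `strict_parse` are all hypothetical and chosen only for illustration; they are not taken from any real parsing product.

```python
# Hypothetical toy lexicon and grammar, for illustration only.
LEXICON = {
    "the": "DET",
    "dog": "NOUN",
    "room": "NOUN",
    "barks": "VERB",
}

# The only sentence shape this toy grammar accepts.
SENTENCE_PATTERN = ["DET", "NOUN", "VERB"]


class ParseError(Exception):
    """Raised as soon as any single rule fails."""


def strict_parse(sentence):
    """Tag each word, then require an exact match against the grammar."""
    words = sentence.split()
    tags = []
    for word in words:
        key = word.lower()
        if key not in LEXICON:
            # One unknown word aborts the whole analysis.
            raise ParseError(f"unknown word: {word!r}")
        tags.append(LEXICON[key])
    if tags != SENTENCE_PATTERN:
        # One unexpected construct aborts the whole analysis.
        raise ParseError(f"no rule matches tag sequence {tags}")
    return list(zip(words, tags))
```

In this sketch, `strict_parse("The dog barks")` succeeds, while a sentence containing a single slang word or an unexpected word order raises `ParseError` and yields no partial result at all.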
Language Dependent
The semantic approach is language specific and its reliance on the grammar of a given language means it is vulnerable to slang or grammatically incorrect constructions. As the system needs to be taught every new word or change in meaning, it cannot scale easily. More generally, the system will only support a very limited subset of languages, for example English, German and Dutch, and adding a new and very different language, such as Chinese, can be problematic. Autonomy is uniquely able to handle any language.
Question and Answer Systems
An increasing number of search vendors now offer users the ability to retrieve information through natural language questions. While this approach may work well for one-sentence questions or queries concerning a known universe of information, the language model simply breaks down when employed on large documents with many concepts. This occurs because question and answer systems rely on the simple combination of manually defined "question forms" and a corresponding structured dataset that holds the relevant answers. As a result, these systems can only recognize precise questions and the matching answers that have been stored in the database. They cannot find concepts outside this manually defined structure that might supply relevant answers to users' questions. Equally, question and answer systems cannot understand questions that are phrased using slang or worded slightly differently, even if these queries would make perfect sense to a human.
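A minimal sketch of the "question form" mechanism described above follows. The single pattern, the answer table, and the names `QUESTION_FORMS`, `ANSWERS` and `answer` are hypothetical, made up for illustration rather than drawn from any vendor's system.

```python
import re

# Hypothetical, manually defined question forms. Each maps a rigid
# pattern to a key used to look up a stored answer.
QUESTION_FORMS = [
    (re.compile(r"^what is the capital of (\w+)\?$", re.IGNORECASE), "capital"),
]

# Hypothetical structured dataset holding the stored answers.
ANSWERS = {
    ("capital", "france"): "Paris",
}


def answer(question):
    """Return a stored answer only if the question fits a known form."""
    for pattern, kind in QUESTION_FORMS:
        match = pattern.match(question.strip())
        if match:
            # The form matched, but the answer must also be in the table.
            return ANSWERS.get((kind, match.group(1).lower()))
    # No form matched: the system has nothing to offer.
    return None
```

Under these assumptions, `answer("What is the capital of France?")` finds the stored answer, but the rephrased `answer("Which city is France's capital?")` returns `None`, as does a correctly phrased question whose answer was never entered.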
Autonomy's Approach
Autonomy's pattern matching technology uses predictable statistical word patterns to represent concepts and functions independently of any given language.
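To give a rough feel for what comparing texts by statistical word patterns can look like, here is a generic sketch using word-frequency vectors and cosine similarity. This is a standard textbook technique swapped in for illustration and is not a description of Autonomy's actual algorithm; the function names are hypothetical, and the sketch assumes words are separated by whitespace.

```python
import math
from collections import Counter


def word_counts(text):
    """Count word occurrences; no grammar or lexicon is consulted."""
    return Counter(text.lower().split())


def cosine_similarity(text_a, text_b):
    """Score two texts by the overlap of their word-frequency patterns."""
    counts_a = word_counts(text_a)
    counts_b = word_counts(text_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty text shares no pattern with anything.
        return 0.0
    return dot / (norm_a * norm_b)
```

Because the score depends only on which words co-occur and how often, the same code runs unchanged on text in any language that meets the whitespace assumption, and an unknown or slang word simply becomes one more counted term instead of causing a failure.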