

Limitations of Other Approaches

Autonomy supports all legacy methods described below. However, we recognize the limitations of the following approaches in enterprise scenarios and uniquely offer conceptual retrieval to provide users with the most accurate and complete search results with minimal manual intervention.

Keyword and Boolean Searches

Keyword and Boolean searches return only those documents that contain the terms queried by the user. Because of this limitation, the success rate of the searches relies heavily on the skill of the user: choosing precisely the right terms, adeptness with Boolean operators, and so on. Keyword searches presuppose that the user already knows exactly what they are looking for (hence the precise terms required for a successful query). However, in an enterprise search scenario, the user often has only a general idea of the file they are looking for, and many valuable documents are found serendipitously, not in the initial query but by browsing through related files.

In addition to this heavy dependence on the user, keyword search has other major flaws:

It ignores the context in which the keywords were found, and therefore cannot accurately gauge whether the keywords found in the file also represent the main concepts of the file. Weighting of keywords (e.g. found in the title vs. buried in the middle of the file; frequency of keywords) only mitigates this issue and does not remove this critical defect.
It cannot find files that are conceptually relevant to the queried terms but do not contain the keywords used in the query.
In order to maximize accuracy, it requires human intervention to manage and update keyword associations or categories.
It is unable to learn and adapt through use; it cannot retrieve files by being shown an example.
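
The second flaw above can be illustrated with a minimal sketch of a Boolean AND keyword search. This is not Autonomy's implementation or any particular product's; the function names and the two-document corpus are hypothetical, invented for illustration.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def boolean_and_search(query_terms, documents):
    """Return the ids of documents that contain every query term."""
    wanted = {term.lower() for term in query_terms}
    return [doc_id for doc_id, text in documents.items()
            if wanted <= tokenize(text)]

# Hypothetical corpus: both memos discuss the same concept.
documents = {
    "memo_a": "Quarterly car sales rose sharply in Europe.",
    "memo_b": "Automobile revenue grew strongly across European markets.",
}

hits = boolean_and_search(["car", "sales"], documents)
print(hits)
```

Only memo_a is returned. memo_b covers the same subject but shares no literal term with the query, so a purely term-matching search never surfaces it.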

Parsing and Natural Language Analysis

Parsing, or semantic analysis, uses rules of grammar and lexicons in order to explicitly understand textual information. In spite of more than two decades of research into semantic approaches, parsing is rarely used in real applications because its results and performance have yet to live up to expectations on real-world problems:

Due to the inherent complexity of language, it is unable to handle ambiguity (e.g. "The dog came into the room; it was furry." What does "it" refer to?). Improving the algorithm requires the construction of a set of rules that are cumbersome and difficult to maintain.
It cannot determine the relative importance of ideas.
It is designed to handle a few sentences and has great difficulty extracting meaning from full paragraphs.
Since semantic analysis is based on a true/false decision tree and rules structure, one incorrect decision or the occurrence of an unknown construct can derail the entire analysis.
It is language-specific and its reliance on grammar makes it unable to understand slang or grammatically incorrect constructions.
It cannot scale easily since the system needs to be taught every new word or change in meaning.

An increasing number of search vendors now offer users the ability to retrieve information through natural language questions. While this approach may work well for one-sentence questions or queries concerning a known universe of information, the language model simply breaks down when employed on large documents with many concepts. This occurs because question and answer systems rely on the simple combination of manually defined 'question forms' and a corresponding structured dataset that holds the relevant answers. As a result, these systems can only recognize precise questions and the matching answers that have been stored in the database.
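
A minimal sketch of the 'question form' approach described above, assuming a single hand-written pattern and a tiny structured dataset. The pattern, the dataset and the function name are all hypothetical and are not taken from any vendor's product.

```python
import re

# Hypothetical question forms: each pattern is paired with the relation it asks about.
QUESTION_FORMS = [
    (re.compile(r"^what is the capital of (?P<country>[a-z ]+)\?$"), "capital"),
]

# Hypothetical structured dataset holding the stored answers.
ANSWERS = {
    ("capital", "france"): "Paris",
    ("capital", "denmark"): "Copenhagen",
}

def answer(question):
    """Return a stored answer if the question matches a known form, else None."""
    normalized = question.strip().lower()
    for pattern, relation in QUESTION_FORMS:
        match = pattern.match(normalized)
        if match:
            return ANSWERS.get((relation, match.group("country")))
    return None

print(answer("What is the capital of France?"))
print(answer("Which city is France's capital?"))
```

The first question matches the stored form and is answered. The second asks for the same fact in different words, matches no form, and returns nothing. A question about a country missing from the dataset fails in the same way.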

Manual Tagging, Weighting and XML

Manual tagging schemes are becoming an increasingly popular method of labeling and categorizing digital material. However, they suffer from the following flaws:

It is descriptively inconsistent due to its reliance on human contribution. Each person may tag a given document differently (especially when the content deals with multiple themes), and/or people can get lax in their tagging and categorize most content under "general."
XML is not a set of standard tag definitions; it is a set of definitions that allow for tags to be defined. This poses difficulty when organizations or departments with different practices interoperate.
Taxonomy creation and tagging involve costly manual labor, requiring input from librarians, users and IT staff.
Tags fail to highlight the relationships between subjects because they lack a conceptual understanding to form correlations. There are often vital relationships between seemingly separately tagged subjects such as wing design/low drag and aerofoil/efficiency, but this concept of "idea distancing" is not leveraged.
As the number of tags increases, so too does the likelihood of misclassification and the effort to maintain consistency. This approach is not scalable.

PageRank and Popularity-Based Internet Methods

PageRank determines a web page's "importance" depending on the number of pages that link to it, and on how "important" these pages are considered. This is then used in conjunction with a keyword entered by the user to retrieve the most relevant results. It suffers from the following flaws:

Since oft-linked pages usually consist of general overviews of topics, search results list the most general pages first, and finding specific information requires users to enter very specific keywords.
PageRank relies on manually added hyperlinks, which are rare in an enterprise context and not always a good indicator of a file's importance.
"When applied to enterprise search, the effectiveness of PageRank ranges from limited to useless."
CMS Watch, The Enterprise Search Report 2008
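
The link-based scoring described above can be sketched with a simplified power iteration. This is a textbook-style approximation, not the production algorithm of any search engine; the damping factor of 0.85, the iteration count and the four-page link graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank scores by power iteration.

    links maps each page to the list of pages it links to;
    every link target must also appear as a key.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its score on in equal shares to the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: three specific pages all link to one general overview.
links = {
    "overview": ["spec_a"],
    "spec_a": ["overview"],
    "spec_b": ["overview"],
    "spec_c": ["overview"],
}

ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this graph the general overview page receives the highest score, while spec_b and spec_c, which nothing links to, keep only the baseline score. Enterprise files with few or no inbound hyperlinks end up in the same position regardless of how useful their content is.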

Explicit Thesauri

A thesaurus maintains a list of industry-specific terms and their synonyms. This can be useful in environments with a large corpus of industry-specific terms, abbreviations and jargon, such as the medical and scientific fields. However, thesauri are costly and time-consuming to create, and definitions can be inaccurate because the meanings of words vary according to context. In addition:

The lists are static, and the system cannot automatically reflect changes in meaning or the addition of new words. The administrator must maintain the thesaurus manually, which is costly and labor-intensive.
The creation of the thesauri involves the painstaking and time-consuming work of an expert.
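
A minimal sketch of thesaurus-based query expansion, assuming a single hand-maintained entry. The thesaurus contents, the three sample notes and the function names are hypothetical, invented for illustration.

```python
import re

# Hypothetical hand-maintained thesaurus: term -> set of listed synonyms.
THESAURUS = {
    "mi": {"myocardial infarction", "heart attack"},
}

def expand_query(term, thesaurus):
    """Return the lowercased term plus any synonyms the thesaurus lists for it."""
    key = term.lower()
    return {key} | thesaurus.get(key, set())

def search(term, documents, thesaurus):
    """Return ids of documents containing the term or one of its synonyms as whole words."""
    phrases = expand_query(term, thesaurus)
    results = []
    for doc_id, text in documents.items():
        lowered = text.lower()
        if any(re.search(r"\b" + re.escape(p) + r"\b", lowered) for p in phrases):
            results.append(doc_id)
    return results

documents = {
    "note_1": "Patient admitted after a heart attack.",
    "note_2": "History of myocardial infarction in 2005.",
    "note_3": "Shipment routed through MI (Michigan) warehouse.",
}

print(search("MI", documents, THESAURUS))
print(search("MI", documents, {}))
```

With the thesaurus, a query for "MI" reaches both medical notes, but it also returns the shipping note, where "MI" means Michigan: the static list has no notion of context. With an empty thesaurus only the shipping note matches, which shows that every useful synonym has to be entered and maintained by hand.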

Social Methods

Enterprises that have embraced "socialware" have benefited from the enthusiastic wave of information creation as well as the connection it engenders between disparate members of the enterprise community. Unfortunately, social methods also have critical flaws in an enterprise context:

User-generated content and manual tags are subjective and shaped by personal habits, which can be exploited at the expense of accurate and reliable information. Tag spamming, in which people label their information inaccurately in order to generate interest, can also become an issue.
Their form of classification is too wild and unpredictable to deliver a stable and accessible taxonomy.
Not enough distinction is made between contributions from experts and thoughts from amateur enthusiasts who volunteer information beyond their expertise.
Granting editing access to a large number of employees on internal wikis invites greater potential for security breaches.
Social methods inherently require manual effort for creation and maintenance.

Ultimately, the enterprise has much to gain when social methods can be automated. Autonomy's IDOL platform incorporates a range of unique functions to automate and enhance social networking tools, automatically generating comprehensive user profiles, recommending appropriate tags and generating hyperlinks to related material. Autonomy can also monitor the evolution of content to alert management to vandalism and automatically repair damaged articles. At every stage, authorized administrators are able to modify entries and settings by a comprehensive range of parameters, delivering full control. In short, Autonomy's holistic approach provides all the benefits of social methods, but also negates the pitfalls of intensive maintenance and user bias.

"The biggest challenge in the information society is the fact that we are drowning in information. With Autonomy we can save time and costs that we used to spend on maintenance and information retrieval. Additionally we can support our users with personalized interfaces. When we trialled Autonomy we had already chosen a more traditional keywordbased technology, but Autonomy changed our mindset."
Peter Rasmussen, Danske Bank
