Technical Benefits
 

Technology Overview

Autonomy is founded on a unique combination of technologies born out of research carried out at Cambridge University. Autonomy's strength lies in advanced pattern-matching techniques (non-linear adaptive digital signal processing), rooted in the theories of Bayesian Inference and Claude Shannon's Principles of Information, that enable identification of the patterns that naturally occur in text, based on the usage and frequency of words or terms that correspond to specific concepts.

Based on the preponderance of one pattern over another in a piece of unstructured information, Autonomy enables computers to understand that there is a particular probability that a document in question is about a specific subject. In this way, Autonomy is able to extract a document's digital essence, encode the unique "signature" of the key concepts, then enable a host of operations to be performed on that text automatically. These operations include automatic clustering of related documents, automatic information delivery and hyperlinking of content, as well as more traditional short-query or keyword searching.
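The idea of assigning a probability that a document is about a given subject can be illustrated with a generic multinomial naive Bayes model. This is only a minimal sketch of the general technique, not Autonomy's algorithm; the function name `topic_posteriors`, its parameters and the add-one smoothing are all assumptions made for illustration.

```python
import math

def topic_posteriors(doc_tokens, topic_term_counts, priors, vocab_size):
    """Return P(topic | document) under a multinomial naive Bayes model.

    Hypothetical helper for illustration only.
    topic_term_counts: {topic: {term: count}} from example documents.
    priors: {topic: prior probability}.
    vocab_size: number of distinct terms, used for add-one smoothing.
    """
    log_scores = {}
    for topic, counts in topic_term_counts.items():
        total = sum(counts.values())
        score = math.log(priors[topic])
        for tok in doc_tokens:
            # Add-one smoothing keeps unseen terms from zeroing the score.
            score += math.log((counts.get(tok, 0) + 1) / (total + vocab_size))
        log_scores[topic] = score

    # Normalize in log space so the posteriors sum to 1.
    top = max(log_scores.values())
    norm = sum(math.exp(s - top) for s in log_scores.values())
    return {t: math.exp(s - top) / norm for t, s in log_scores.items()}
```

Given term counts for two subjects, the function returns a probability for each subject, and the subject whose characteristic terms dominate the document receives the larger share.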

IDOL™ Server

At the heart of Autonomy's software infrastructure lies the IDOL™ Server. It serves as a platform for understanding the meaning and significance of information; additional functionality can be seamlessly integrated in order to perform advanced operations on that data. Using this off-the-shelf solution, organizations can quickly process digital information automatically and communicate with multiple applications without the need for manual processing or metadata.

Intellectual Foundations of Autonomy

The theoretical underpinnings for Autonomy's unique functionality can be traced to Bayesian Inference and Claude Shannon's Principles of Information.

Bayesian Inference

Bayesian inference is a method of statistical inference named after Thomas Bayes, an 18th-century English cleric whose work on mathematical probability was not published until after his death. Bayes' efforts centered on calculating the probabilistic relationship between multiple variables and determining the extent to which one variable affects another.

A typical problem is to evaluate how relevant a document is to a given query or agent profile. Bayesian theory aids in this calculation by relating the evaluation to details that are already known, such as the model of an agent. Extensions of the theory go further than determining the relevance of a text to a given query: Adaptive Probabilistic Concept Modeling (APCM) algorithms are also used to analyze, sort and cross-reference unstructured information. A traditional statistical argument is that if a coin is tossed 100 times and comes up heads every time, it still has an even chance of coming up tails on the next throw. An alternative, using the Bayesian approach, is to say that 100 consecutive heads are evidence that the coin is biased; for example, it may have heads on both sides. In a similar manner, knowledge about the documents that a user has deemed relevant to an agent's profile can be used in judging the relevance of future documents.
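The coin example can be worked through with Bayes' Theorem. The sketch below compares just two hypotheses, a fair coin and a two-headed coin; the function names and the prior probability passed in are illustrative assumptions, not values taken from Autonomy's software.

```python
def posterior_two_headed(num_heads, prior_two_headed):
    """P(coin is two-headed | num_heads consecutive heads observed).

    Only two hypotheses are considered (an assumption for illustration):
    a fair coin with P(heads) = 0.5 and a two-headed coin with P(heads) = 1.
    """
    likelihood_fair = 0.5 ** num_heads
    likelihood_two_headed = 1.0
    evidence = (prior_two_headed * likelihood_two_headed
                + (1.0 - prior_two_headed) * likelihood_fair)
    return prior_two_headed * likelihood_two_headed / evidence


def prob_tails_next(num_heads, prior_two_headed):
    """Predictive probability that the next toss comes up tails."""
    p_two_headed = posterior_two_headed(num_heads, prior_two_headed)
    # Tails is only possible if the coin is the fair one.
    return (1.0 - p_two_headed) * 0.5
```

Even if a two-headed coin is assumed to be a one-in-a-million possibility beforehand, `posterior_two_headed(100, 1e-6)` is very close to 1, so the chance of tails on the next throw is far below the even chance given by the traditional argument.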

Although no one knows for certain what Bayes' original goal was, Bayes' Theorem has become a central tenet of modern statistical probability modeling. By applying contemporary computational power to the concepts pioneered by Bayes, it is now feasible to calculate the relationships between many variables quickly and efficiently, allowing software to manipulate concepts and extract the meaning of information.

Shannon's Information Theory

Information Theory is the mathematical foundation for all digital communications systems. Claude Shannon's innovation was to discover that 'information' could be treated as a quantifiable value in communications.

Natural languages contain a high degree of redundancy, or unessential content. For example, a conversation in a noisy room can be understood even when some of the words cannot be heard and the essence of a news article can be grasped simply by skimming over the text. Information Theory provides a framework for extracting the concepts from this redundancy.

Autonomy's approach to concept modeling relies on Shannon's theory that the less frequently a unit of communication occurs, the more information it conveys. Ideas that are relatively rare within the context of a communication therefore tend to be more indicative of its meaning. It is this theory that enables Autonomy's software to determine the most important (or informative) concepts within a document.
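Shannon's measure of the information carried by an event with probability p is -log2(p) bits, so rarer terms score higher. The following is a minimal sketch of ranking a document's terms by that measure, assuming term probabilities are estimated from simple corpus counts; the function names and the counting scheme are hypothetical and are not a description of IDOL's internals.

```python
import math

def self_information(term_count, total_count):
    """Shannon self-information, in bits, of a term with the given frequency."""
    return -math.log2(term_count / total_count)


def most_informative_terms(doc_tokens, corpus_counts, top_n=3):
    """Rank the distinct terms of a document from rarest to most common.

    corpus_counts: {term: occurrences in a reference corpus} (assumed input).
    Terms absent from corpus_counts are skipped in this sketch.
    """
    total = sum(corpus_counts.values())
    scored = {
        term: self_information(corpus_counts[term], total)
        for term in set(doc_tokens)
        if term in corpus_counts
    }
    # Highest information first; ties are broken alphabetically.
    return sorted(scored, key=lambda term: (-scored[term], term))[:top_n]
```

A term that accounts for half of all occurrences in the corpus carries 1 bit, while a term that accounts for one occurrence in a hundred carries about 6.6 bits, which is why common words such as "the" fall to the bottom of the ranking.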

Core IDOL™ Features

Language Independence

Autonomy is based on advanced pattern-matching technology (non-linear adaptive digital signal processing) that exploits high-performance probabilistic modeling techniques to extract a document's digital essence and determine the characteristics that give the text meaning. As this technology is based on probabilistic modeling, it does not use any form of language-dependent parsing or dictionaries. Words are treated as abstract symbols of meaning, and the software derives its understanding from the context in which they occur rather than from a rigid definition of the language's grammar.

Learning Ability

Autonomy software is able to continuously develop and learn, thanks to its unique combination of Bayesian Inference and Shannon's Information Theory. This learning ability significantly reduces the manual input required by other solutions and translates into large savings in time and money for the company.

Where other solutions need to be taught new words, phrases or concepts and shown how to categorize them, Autonomy can automatically deduce the significance of these new units of meaning, add them to relevant categories, and create new categories where necessary.

Autonomy's technology can also learn about its users by dynamically monitoring the content they view, and then deliver new and relevant content as it is added to the environment.

Format Agnosticism

Autonomy handles all types of data, providing a range of highly scalable components that automatically aggregate more than 300 different content formats, including voice and video content, from the most comprehensive range of repositories. Autonomy allows enterprises to exploit their knowledge resources as effectively as possible, offering them immediate access to a wide range of data sources, including:

Unstructured data such as HTML pages, word processing documents, spreadsheets, e-mail and rich media such as voice and video content
Semi-structured data (XML)
Structured data such as Oracle, Lotus Notes and ODBC-compliant material

Unstructured Query Language (UQL)

UQL is the syntax used by IDOL when running queries and is unique to Autonomy. Unlike rigid Boolean search protocols, UQL is entirely flexible and can support even the most complex syntax, including natural language queries. Using UQL, IDOL can run conceptual searches of all forms of data in the enterprise in any repository, including unstructured data such as emails, web pages, and audio and visual files.

Security

Building secure applications in the enterprise software domain is a multi-faceted problem. Competing standards, varied and numerous sub-systems and differing policies all vie with each other in an environment marked by heterogeneous networks and underlying hardware. Autonomy's Intellectual Asset Protection System (IAS) meets this challenge by approaching the task of securing enterprise applications from an architectural perspective. IAS prescribes security at every required stage, with each separate security sub-system aware of its role within the wider context. IAS thus provides organizations that deploy Autonomy's technology with the confidence of a system that is secure throughout, and not just at selected points in the dataflow.

Manual or Automatic - It is not an either/or choice

Autonomy enables an entire range of information processing options, both manual and automatic. It is not an "either/or" choice. If users want to apply traditional manual techniques to an information processing operation, Autonomy technology supports this choice. For example, Autonomy provides application administrators with a full workbench to control and tune the relevancy of search results. In addition, Autonomy's legacy application handling enables the manual investment in such applications to be captured, and the results to be seamlessly integrated into Autonomy's automatic solution.

Architecture

IDOL™ Server has an open architecture and is entirely data-agnostic and scalable, thereby allowing large organizations to manage vast quantities of information regardless of format or storage location.

Connectors

Using Autonomy connectors, IDOL integrates information from over 300 different repositories through an understanding of content and access rights, delivering a real-time environment in which operations across applications and content are automated.