Metadata Handling Operations
Built on a unique pattern-recognition technology, Autonomy's core engine provides a precise means, either manual or fully automated, of matching pieces of information and identifying how similar they are.
Full Metadata Handling
Metadata-based approaches to the problem of unstructured information are fundamentally compromised by the flawed assumption that any document or piece of information can be summarized in a small number of keywords. Autonomy's technology understands the information itself and is therefore able to make decisions and perform operations based on the entire content -- not just a subset of words decided upon during an unaccountable and expensive manual process.
However, where Autonomy displaces a legacy technology, considerable investment may already have been made in such processes, and it is important to capture the business value contained in existing metadata. Autonomy's Intelligent Data Operating Layer is able to extract all such data and offers complete support for legacy operations over it. This enables the enterprise to migrate to Autonomy while automatically supporting any metadata-dependent applications currently in use.
Full Metadata Handling includes:
Connector Extraction of Metadata
All of Autonomy's Connectors are capable of extracting all information contained in the repositories, including, for example, the metadata stored in database records, file records in document management systems and meta-information in internet and intranet pages. Once stored within Autonomy's Intelligent Data Operating Layer, this metadata and the business rules it embodies are available to all applications built on the layer.
Legacy Search Support
IDOL Server uses Autonomy's Unstructured Query Language (UQL) to support all known legacy search methods, including:
- Keyword search, including support for operators such as AND, OR, NOT, NEAR, DNEAR, SOUNDEX, FUZZY, RANGE, etc.
- Boolean restrictions across multiple metadata fields
- Parametric search to narrow a search space down through successive metadata selections
- Support of sorting by relevance and/or arbitrary numbers of metadata fields
- Full weighting of metadata - weight individual keywords, special metadata fields or entire documents more or less than others
- Support of user-feedback systems to allow users to increase or decrease the influence of a given document within a given set of results
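As a rough illustration of the parametric search item in the list above, the following Python sketch narrows an in-memory result set through successive metadata selections and then sorts what remains by metadata fields. The documents, field names and helper functions are invented for this example and are not part of IDOL.

```python
from collections import Counter

# Hypothetical result set: each hit carries flat metadata fields.
HITS = [
    {"title": "Thai Cookery Basics", "format": "paperback", "country": "US", "price": 12},
    {"title": "Thai Cookery Deluxe", "format": "hardback", "country": "US", "price": 30},
    {"title": "Quick Thai Suppers", "format": "paperback", "country": "UK", "price": 9},
    {"title": "Thai Curries", "format": "paperback", "country": "US", "price": 14},
]


def facet_counts(hits, field):
    """Count how many hits carry each value of a metadata field."""
    return Counter(hit[field] for hit in hits if field in hit)


def refine(hits, field, value):
    """Keep only the hits whose metadata field equals the selected value."""
    return [hit for hit in hits if hit.get(field) == value]


# First selection: the user picks a format.
step1 = refine(HITS, "format", "paperback")
# Second selection: the user picks a country within the narrowed set.
step2 = refine(step1, "country", "US")
# Sort the remaining hits by metadata fields (price, then title).
ordered = sorted(step2, key=lambda hit: (hit["price"], hit["title"]))

print(dict(facet_counts(HITS, "format")))
print(dict(facet_counts(step1, "country")))
print([hit["title"] for hit in ordered])
```

Each selection only ever shrinks the candidate set, which is what lets a user steer towards a result without composing a full query up front.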
Legacy Search Examples
1) Keyword query using range restrictions of metadata & bracketed boolean expressions
Question: "I'm interested in books on thai cookery, published after January 1st 2001, that cost between $10 and $15. I'll also consider UK books between £8 and £12 on the same subject, as long as they are paperback, to keep the postage costs down."
Database Query: action=query&text=thai cookery&fieldtext=RANGE{01/01/2001,.}:*/PUBLISH_DATE+AND+(NRANGE{10,15}:book/us/price)
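The field restriction in the query above follows a regular shape: an operator, a brace-delimited value list, a colon and a field path, with clauses joined by +AND+. The Python sketch below assembles the same string from its parts. The helper names are hypothetical, the output is deliberately left unencoded to match the printed form, and only the clauses that appear in the printed query are reproduced.

```python
def field_op(operator, values, field):
    """Render OPERATOR{v1,v2,...}:field, the clause shape used in the example."""
    joined = ",".join(str(value) for value in values)
    return operator + "{" + joined + "}:" + field


def group(expression):
    """Bracket a clause or sub-expression."""
    return "(" + expression + ")"


def all_of(*clauses):
    """Join clauses so that every one of them must hold."""
    return "+AND+".join(clauses)


def build_query(text, fieldtext):
    """Assemble the action, text and fieldtext parameters seen in the example."""
    return "action=query&text=" + text + "&fieldtext=" + fieldtext


# Published on or after 1 January 2001; "." leaves the upper bound open.
published = field_op("RANGE", ["01/01/2001", "."], "*/PUBLISH_DATE")
# US price between 10 and 15.
us_price = group(field_op("NRANGE", [10, 15], "book/us/price"))

query = build_query("thai cookery", all_of(published, us_price))
print(query)
```

Building the clauses separately keeps each restriction readable and makes it harder to misplace a bracket when expressions are nested.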
2) Conceptual query with full weighting of keywords & metadata restrictions
Question: "I'm interested in reports we have on financial services firms in Asia. I know I'm most interested in reports to do with forex and those originating in Indonesia, so I'd like to weight that part of the query."
Database Query: action=query&text=financial services industry and individual firms in asia, particularly those that deal with forex[20] in China, Malaysia, Indonesia[30] Japan and Hong Kong and how they've been affected by recent market uneasiness in the US&database=Archive&fieldtext=MATCH{china,malaysia,indonesia,japan,hong kong}:*/country+OR+MATCH{asia}:*/metadata/region
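In the query above, a weight is attached to a term by following it with a number in square brackets, and the metadata restriction is two MATCH clauses joined by +OR+. The short Python sketch below reads the weights back out of the query text and rebuilds the restriction. It only illustrates the notation; the function names are hypothetical and it makes no claim about how IDOL itself interprets the weights.

```python
import re

# Excerpt of the query text from the example above.
QUERY_TEXT = (
    "financial services industry and individual firms in asia, "
    "particularly those that deal with forex[20] in China, Malaysia, "
    "Indonesia[30] Japan and Hong Kong"
)


def explicit_weights(text):
    """Return {term: weight} for every term[weight] marker in the text."""
    return {term: int(weight) for term, weight in re.findall(r"(\w+)\[(\d+)\]", text)}


def match(values, field):
    """Render MATCH{v1,v2,...}:field."""
    return "MATCH{" + ",".join(values) + "}:" + field


weights = explicit_weights(QUERY_TEXT)
fieldtext = "+OR+".join([
    match(["china", "malaysia", "indonesia", "japan", "hong kong"], "*/country"),
    match(["asia"], "*/metadata/region"),
])

print(weights)
print(fieldtext)
```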
Autonomy's Legacy Compatibility Module - LCM
Autonomy provides legacy compatibility that enables organizations to deploy IDOL in place of existing legacy systems while maintaining existing system workflow. Once this minimally disruptive process has been completed, organizations can progressively activate the advanced functionality that IDOL offers, enabling a controlled and managed migration of technology with no loss of service to current applications. LCM options include:
- The Legacy Import Module: a special Autonomy Connector that can connect directly to popular legacy indexes and automatically extract all data contained within. Currently supported formats include BIF, legacy Topics and all XML-based indexes.
- Query Translation: It is possible to express all known legacy operators within Autonomy's Unstructured Query Language (UQL).
- Result Templating and XML support: All IDOL output starts as XML and can be repurposed extensively through the use of templates and stylesheets. Therefore, once Autonomy is in place, data can be delivered in whatever format an existing application expects.
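To make the result templating option in the list above more concrete, here is a minimal Python sketch that reshapes an XML result set into the CSV layout a legacy application might expect. It uses plain Python in place of a template or stylesheet, and the XML element names are invented for the example. They do not represent IDOL's actual response format.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML result set; the element names are assumptions for this sketch.
RESPONSE = """<results>
  <hit><title>Asia Forex Review</title><country>indonesia</country><score>91</score></hit>
  <hit><title>Regional Banking Outlook</title><country>japan</country><score>78</score></hit>
</results>"""


def to_csv(xml_text, columns):
    """Flatten each <hit> element into one CSV row with the requested columns."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for hit in root.iter("hit"):
        # A missing child element becomes an empty cell rather than an error.
        writer.writerow([hit.findtext(column, default="") for column in columns])
    return buffer.getvalue()


csv_text = to_csv(RESPONSE, ["title", "country", "score"])
print(csv_text)
```

Because the source stays as XML, the same result set could be rendered into other layouts by swapping only the rendering step.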
Multidimensional Metadata
As Autonomy uses XML as its internal storage format, it is possible to encode hierarchical metadata that cannot be expressed in the flat-file formats popular with legacy systems. In such legacy systems, non-trivial metadata semantics, for example those derived from relational databases, are irretrievably lost.
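The difference can be sketched in a few lines of Python. The record below nests a price under a region, so a path such as book/us/price stays unambiguous. Flattening the same record into one key per tag name loses that distinction. The record itself is invented for illustration (only the book/us/price path appears elsewhere in this document).

```python
import xml.etree.ElementTree as ET

# Hypothetical hierarchical record: the same field name occurs under two parents.
RECORD = """<book>
  <title>Thai Cookery</title>
  <us><price>12.50</price></us>
  <uk><price>9.99</price></uk>
</book>"""

book = ET.fromstring(RECORD)

# Hierarchical access: the path says which price is meant.
us_price = float(book.findtext("us/price"))
uk_price = float(book.findtext("uk/price"))

# Naive flat-file view: one value per tag name, so later values overwrite earlier ones.
flat = {}
for element in book.iter():
    if element.text and element.text.strip():
        flat[element.tag] = element.text.strip()

print(us_price, uk_price)
print(flat)
```

In the flat view only one price survives, and nothing records which region it belonged to.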
Multi-Type Metadata Support
Autonomy's internal storage architecture is highly configurable and when metadata is captured using the processes detailed above, it can be stored in different formats to assist its future processing. Typical information captured includes:
- Arbitrary Numbers / Length fields
- Metadata per document
- Price, Color, Images
- Summaries, Types, Security, Meta-tags
- Strings, Numbers, Dates, Bits
Educe
With the advent of Autonomy's Eduction module, organizations will be able to automatically educe metadata, contextual meaning and relationships between data, intelligently leveraging any form of information, irrespective of file format or location, to drive the business decision process.
Configuration of the Eduction process is done by simple examples and the detection of the metadata does not rely on strict formatting or wording of the documents. Eduction performs two types of metadata extraction: Plain Tagging and Concept-Value Tagging.
Plain Tagging
Plain tagging combines words, numbers and other symbols into a single item and then associates this item with a Tag-Name. All items with the same Tag-Name share some common properties. Tag-Names are either predefined or user defined.
The extracted fields EDUCE_PHONE, EDUCE_EMAIL, EDUCE_DATE, EDUCE_STREET, EDUCE_TIME, EDUCE_TOWN and EDUCE_POSTCODE in the CV example are examples of plain tagging.
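As a simplified stand-in for plain tagging, the Python sketch below groups characters into single items and attaches a Tag-Name to each. It uses fixed regular expressions, which are much stricter than the example-driven detection described for Eduction. The patterns are assumptions made for illustration and only the Tag-Names are taken from the fields listed above.

```python
import re

# Illustrative patterns only; the Tag-Names follow the EDUCE_* fields named above.
PATTERNS = {
    "EDUCE_EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "EDUCE_DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "EDUCE_TIME": re.compile(r"\b\d{1,2}:\d{2}\b"),
}


def plain_tag(text):
    """Return (tag_name, matched_text) pairs in order of appearance."""
    found = []
    for tag_name, pattern in PATTERNS.items():
        for hit in pattern.finditer(text):
            found.append((hit.start(), tag_name, hit.group()))
    return [(tag_name, value) for _, tag_name, value in sorted(found)]


sample = "Contact jane.doe@example.com before 14:30 on 01/02/2003."
tags = plain_tag(sample)
print(tags)
```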
- User Defined Tags: User-defined phrase tags are simple to configure. All that is required is a list of sample phrases to detect and a suitable Tag-Name.
- Predefined Tags: Most commonly used tags are predefined in the system. The table below shows the set of predefined tags used by the plain tagging process.
Autonomy delivers a wide range of off-the-shelf preset tag types
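The user-defined tag configuration described above needs only a Tag-Name and a list of sample phrases. A minimal Python sketch of that idea follows. It matches the sample phrases literally and case-insensitively, which is narrower than the detection the Eduction module is described as performing. The Tag-Name EDUCE_SKILL and the phrases are hypothetical.

```python
import re


def make_phrase_tagger(tag_name, sample_phrases):
    """Build a tagger from a Tag-Name and a list of sample phrases."""
    # Try longer phrases first so they win over shorter ones they contain.
    ordered = sorted(sample_phrases, key=len, reverse=True)
    alternatives = "|".join(re.escape(phrase) for phrase in ordered)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

    def tagger(text):
        return [(tag_name, hit.group()) for hit in pattern.finditer(text)]

    return tagger


# Hypothetical user-defined tag for a CV-style document.
tag_skills = make_phrase_tagger("EDUCE_SKILL", ["project management", "Python", "SQL"])
skills = tag_skills("Five years of Project Management, plus python and sql.")
print(skills)
```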
Concept-Value Tagging
Concept-Value tagging is the detection of a concept or phrase and matching this with suitable values. Suitable values are inferred from examples set by the user.
- Training by Example: Training is done simply by supplying a sequence of example phrases, each of which is given an example Tag-Value.
This example shows example training for the concepts: rates of interest, series number, issue dates and amount. These might typically be used for financial reports:
The sample rules shown above were used to infer the eduction tags.
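One way to picture training by example is sketched below in Python. Each rule is learned from a single example phrase and its example value: the surrounding words are kept as context and the digits in the value are generalized so that other numbers of the same shape match. This is a deliberately naive stand-in, not Autonomy's inference method, and the tag names and example phrases are hypothetical.

```python
import re


def learn_rule(example_phrase, example_value):
    """Infer a pattern from one example: literal context plus a generalized value slot."""
    before, found, after = example_phrase.partition(example_value)
    if not found:
        raise ValueError("example value must occur in the example phrase")
    # Generalize every run of digits in the value; keep other characters literal.
    value_pattern = re.sub(r"\d+", lambda _: r"\d+", re.escape(example_value))
    return re.compile(
        re.escape(before) + "(" + value_pattern + ")" + re.escape(after),
        re.IGNORECASE,
    )


def educe(text, rules):
    """Apply every learned rule and collect the values found for each tag."""
    return {tag: [hit.group(1) for hit in rule.finditer(text)] for tag, rule in rules.items()}


# Hypothetical training examples for a financial report.
RULES = {
    "RATE_OF_INTEREST": learn_rule("interest rate of 5.25%", "5.25%"),
    "SERIES_NUMBER": learn_rule("Series 12 notes", "12"),
    "AMOUNT": learn_rule("aggregate amount of $500,000", "$500,000"),
}

report = "The Series 7 notes carry an interest rate of 6.75% on an aggregate amount of $750,000."
values = educe(report, RULES)
print(values)
```

A rule learned this way only recognizes values with the same layout as its example, so a real system would need several examples per concept or a more tolerant generalization step.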
Compound Concept-Value Tagging
Autonomy enables you to combine both methods of tagging (Plain Tagging and Concept-Value Tagging) into one method. This enables association of two or more plain tagged elements. For example, association of Names and E










