
Security, Scalability and Performance

The world's largest and most secure intelligence organizations have deployed Autonomy's Intellectual Asset Protection System (IAS) to safeguard their most sensitive information assets. Autonomy provides all aspects of security management, including front-end user authentication, back-end entitlement checking, and secure encrypted communication between the IDOL Server and its client applications using the 128-bit Block Tiny Encryption Algorithm (BTEA). IDOL's mapped security model is the only empirically proven index security model that scales in the enterprise.
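The text names 128-bit Block TEA but does not say which variant IDOL uses, how keys are exchanged or how messages are padded. Purely as an illustration, the sketch below implements XXTEA ("Corrected Block TEA"), the published revision of Block TEA, which encrypts an array of 32-bit words under a 128-bit key made of four 32-bit words. The function names are hypothetical, and XXTEA has known cryptanalytic weaknesses, so treat this as a picture of the cipher's structure rather than something to deploy.

```python
# Illustrative sketch of XXTEA ("Corrected Block TEA"). Assumption: IDOL's exact
# Block TEA variant, key management and padding are not described in the source.
DELTA = 0x9E3779B9
MASK = 0xFFFFFFFF  # keep every value within 32 bits, as C's uint32_t would


def _mx(y: int, z: int, total: int, key: list, p: int, e: int) -> int:
    """XXTEA mixing function; masking once at the end matches 32-bit C arithmetic."""
    return ((((z >> 5) ^ (y << 2)) + ((y >> 3) ^ (z << 4)))
            ^ ((total ^ y) + (key[(p & 3) ^ e] ^ z))) & MASK


def encrypt(words: list, key: list) -> list:
    """Encrypt two or more 32-bit words under a 128-bit key (four 32-bit words)."""
    v, n = list(words), len(words)
    if n < 2:
        raise ValueError("XXTEA needs at least two 32-bit words")
    total, z = 0, v[n - 1]
    for _ in range(6 + 52 // n):
        total = (total + DELTA) & MASK
        e = (total >> 2) & 3
        for p in range(n):
            y = v[(p + 1) % n]  # for the last word this wraps to the updated v[0]
            v[p] = (v[p] + _mx(y, z, total, key, p, e)) & MASK
            z = v[p]
    return v


def decrypt(words: list, key: list) -> list:
    """Invert encrypt() by running the same rounds backwards."""
    v, n = list(words), len(words)
    if n < 2:
        raise ValueError("XXTEA needs at least two 32-bit words")
    rounds = 6 + 52 // n
    total, y = (rounds * DELTA) & MASK, v[0]
    for _ in range(rounds):
        e = (total >> 2) & 3
        for p in range(n - 1, -1, -1):
            z = v[p - 1]  # for p == 0 this is v[-1], the already-decrypted last word
            v[p] = (v[p] - _mx(y, z, total, key, p, e)) & MASK
            y = v[p]
        total = (total - DELTA) & MASK
    return v


key = [0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210]  # 4 x 32 bits = 128-bit key
plain = [0xDEADBEEF, 0x00C0FFEE, 42]
assert decrypt(encrypt(plain, key), key) == plain
```

Operating on whole word arrays, rather than fixed 64-bit blocks, is what distinguishes Block TEA and XXTEA from the original TEA.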

There are three general security models currently available:

1. Unmapped Security

Unmapped security is the traditional method used by source repositories and search engines. For every potential match to a query, a call is made through the native repository's API (e.g. Documentum) to check the access privileges for that particular document. A single query therefore bombards the native repository with privilege requests as the retrieval system tries to assemble a relevant results list from thousands of candidate hits. This method presents significant performance and scalability problems.
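To make the cost concrete, the sketch below models unmapped security with a stand-in repository that counts access-check calls. All names are hypothetical, and in a real deployment each `check_access` call would be a network request to the repository's API rather than a dictionary lookup.

```python
# Hypothetical sketch of unmapped security: one repository round trip per candidate.
from dataclasses import dataclass


@dataclass
class Repository:
    """Stand-in for a native repository API that answers one access check per call."""
    acl: dict       # doc_id -> set of users allowed to read it
    calls: int = 0  # access-check requests received so far

    def check_access(self, user: str, doc_id: str) -> bool:
        self.calls += 1  # in a real deployment, each call is a network request
        return user in self.acl.get(doc_id, set())


def unmapped_search(candidates: list, user: str, repo: Repository, limit: int) -> list:
    """Walk ranked candidate hits, asking the repository about each one in turn."""
    results = []
    for doc_id in candidates:
        if repo.check_access(user, doc_id):
            results.append(doc_id)
            if len(results) == limit:
                break
    return results


# 10,000 candidates; the user may read only every 100th document.
repo = Repository(acl={f"doc{i}": {"alice"} if i % 100 == 0 else set() for i in range(10_000)})
hits = unmapped_search([f"doc{i}" for i in range(10_000)], "alice", repo, limit=10)
print(len(hits), repo.calls)  # 10 901
```

Assembling just ten permitted results here costs 901 round trips, and the count grows with every candidate the user is not entitled to see.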

[Diagrams: Unmapped Security and Mapped Security]
"Security is a key differentiator for IDOL. IDOL offers 'mapped security' and near real-time synchronization of security entitlements with source content repositories - making it a great fit for highly secure search scenarios."
The Forrester Wave™: Enterprise Search Platforms, Matthew Brown

Autonomy recommends mapped security but also offers a choice of mapped, unmapped or a hybrid of the two. Autonomy also supplies sample plug-in code so that customers, OEMs and partners can develop and implement their own security plug-ins.

2. Cached Security

Cached security is the method of choice for legacy systems. It marginally relieves the scalability problem of unmapped security by storing the results of queries it has already seen, so that when a user repeats a query, the result set can be retrieved from the cache rather than through a network request. However, this approach still calls out across the network to the repository for every new query. It also misses potential results, because the cached result sets are not updated as new information arrives.
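A minimal sketch of the cached approach, assuming results are cached per (user, query) pair: a repeated query is served from memory, but content added after the first run never appears in the cached answer. The class and function names are hypothetical, and `repository_search` merely stands in for the expensive, network-bound secure search.

```python
# Hypothetical sketch of cached security: results cached per (user, query) pair.
from typing import Callable, Dict, List, Tuple


class CachedSecureSearch:
    def __init__(self, search_fn: Callable[[str, str], List[str]]) -> None:
        self._search_fn = search_fn  # the expensive, network-bound secure search
        self._cache: Dict[Tuple[str, str], List[str]] = {}
        self.misses = 0              # searches that had to go to the repository

    def query(self, user: str, text: str) -> List[str]:
        key = (user, text)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._search_fn(user, text)
        return self._cache[key]      # repeats are served from memory, even if stale


docs = {"d1": ("merger plan", {"alice"})}  # doc_id -> (text, readers)


def repository_search(user: str, text: str) -> List[str]:
    """Stand-in for a search that checks each match against the repository."""
    return sorted(d for d, (body, readers) in docs.items() if text in body and user in readers)


search = CachedSecureSearch(repository_search)
first = search.query("alice", "merger")      # cache miss: ['d1']
docs["d2"] = ("merger timeline", {"alice"})  # new content arrives
second = search.query("alice", "merger")     # cache hit: still ['d1'], so d2 is missed
print(first, second, search.misses)
```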

3. Autonomy's Unique IAS Mapped Security

Only Autonomy offers mapped security: a highly configurable, secure, accurate and fast method for respecting third-party security entitlements. IDOL maps the security model of every underlying repository (ACLs, groups, roles, protective markings, etc.) directly into the kernel of the IDOL engine itself and stores it in an encrypted field. As a result, IDOL does not need to send any requests across the network to the data stores when building a results list. What the user is allowed to see is assessed "inline" within the IDOL kernel, at speeds that exceed the response times of the native repository. Unlike other techniques, the security model is never out of date, because a transitional signaling mechanism in the connector layer informs IDOL in real time of any updates or changes to permissions in the underlying content.
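The idea behind mapped security can be sketched as an index that stores each document's permitted principals next to its text and filters results inline. This is a minimal illustration under stated assumptions: the class and method names are hypothetical, `on_acl_change` stands in for the connector's change signal, and unlike IDOL as described above, the sketch does not encrypt the stored ACL field.

```python
# Hypothetical sketch of mapped security: source ACLs are copied into the index
# at ingest time and checked inline at query time. This sketch does not
# encrypt the stored ACL field.
from typing import Dict, FrozenSet, Iterable, List


class MappedIndex:
    def __init__(self) -> None:
        self._text: Dict[str, str] = {}
        self._acl: Dict[str, FrozenSet[str]] = {}  # doc_id -> users/groups/roles allowed

    def ingest(self, doc_id: str, text: str, allowed: Iterable[str]) -> None:
        self._text[doc_id] = text
        self._acl[doc_id] = frozenset(allowed)

    def on_acl_change(self, doc_id: str, allowed: Iterable[str]) -> None:
        """Called by a connector when permissions change in the source repository."""
        self._acl[doc_id] = frozenset(allowed)

    def search(self, text: str, principals: Iterable[str]) -> List[str]:
        """Filter inline against the mapped ACLs; no calls back to the repository."""
        mine = frozenset(principals)
        return sorted(d for d, body in self._text.items()
                      if text in body and self._acl[d] & mine)


index = MappedIndex()
index.ingest("d1", "merger plan", {"group:legal"})
index.ingest("d2", "merger timeline", {"user:bob"})
alice = {"user:alice", "group:legal"}
before = index.search("merger", alice)                  # ['d1']
index.on_acl_change("d2", {"user:bob", "group:legal"})  # connector signals a change
after = index.search("merger", alice)                   # ['d1', 'd2']
```

The query path never leaves the index, and the permission change becomes visible as soon as the connector reports it.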

Because IDOL's architecture is modular by design, its subsystems must communicate with each other, often across insecure networks. All communication between these processes can be encrypted with SSL (Secure Sockets Layer), so that an attacker who gets past a firewall and sniffs packets cannot read the traffic between IDOL modules. Every module can operate in a secure communications mode that provides 128-bit encryption at minimal processing overhead. IDOL can also use SSL both when aggregating and when querying content, including accessing SSL-encrypted sites.
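As a generic illustration of securing a link between two modules (not IDOL's own configuration), the sketch below builds a client-side context with Python's standard `ssl` module. Current deployments use TLS, the successor to SSL. The helper name, CA file and peer address are assumptions.

```python
# Hypothetical helper for module-to-module links. Modern deployments use TLS,
# the successor to SSL; the CA file and host/port below are assumptions.
import ssl
from typing import Optional


def make_module_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Client-side context that verifies the peer module's certificate."""
    # create_default_context turns on certificate verification and hostname checks.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse SSLv3 and early TLS
    return ctx


# Usage, assuming a peer module listening at host:port:
#   import socket
#   with socket.create_connection((host, port)) as raw:
#       with make_module_context().wrap_socket(raw, server_hostname=host) as tls:
#           tls.sendall(request_bytes)
```

Verifying the peer's certificate is what stops an attacker inside the firewall from impersonating a module. Encryption alone would only stop passive sniffing.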

"One factor that has set the Autonomy search apart from the crowd is security. Whatever security exists on the application layer," she says, "Autonomy acknowledges it."
Carol Fineagan, CIO of EnergySolutions, CIO Magazine, July 2008

Scalability and Performance

Managing structured and unstructured content requires a platform that can meet the most rigorous performance requirements and be easily resized as business needs change. IDOL scales to support the largest enterprise-wide and portal deployments in the world, with a presence in virtually every vertical market. Because IDOL's scalability rests on its modular, distributed architecture, it can handle massive amounts of data on commodity dual-CPU servers. For instance, only a few hundred entry-level enterprise machines are required to support ChoicePoint's 10-billion-record footprint; by comparison, a competitor uses 150,000 machines to handle the same amount of data.

A single IDOL engine can:

Support an estimated 30 million documents on 32-bit architectures and over 250 million on 64-bit platforms
Accurately index more than 60 GB/hour, with guaranteed index commit times (how quickly an asset can be queried after it is indexed) of under 5 ms
Execute over 2,000 queries per second against the entire index, with subsecond response times, on a single two-CPU machine holding 30 million pieces of content
Support hundreds of thousands of enterprise users, or millions of web users, accessing hundreds of terabytes of data
Save storage space, with an overall footprint of less than 30% of the original file size
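The per-engine figures above lend themselves to a back-of-the-envelope sizing check. The sketch below plugs the quoted 64-bit capacity, indexing rate and footprint into a hypothetical corpus of one billion documents totalling 12,000 GB. Both the corpus and the assumption that indexing divides evenly across engines are illustrative and do not come from the source.

```python
# Back-of-the-envelope sizing from the per-engine figures quoted above.
# The corpus numbers in the example call are hypothetical.
import math

DOCS_PER_ENGINE = 250_000_000  # 64-bit platform, per the list above
INDEX_GB_PER_HOUR = 60         # per-engine indexing rate
FOOTPRINT_RATIO = 0.30         # index is "less than 30%" of the source size


def estimate(num_docs: int, corpus_gb: float) -> dict:
    engines = math.ceil(num_docs / DOCS_PER_ENGINE)
    hours_one_engine = corpus_gb / INDEX_GB_PER_HOUR
    return {
        "engines": engines,
        "hours_on_one_engine": hours_one_engine,
        # Idealized: assumes indexing splits evenly across all engines.
        "hours_across_engines": hours_one_engine / engines,
        "index_gb_upper_bound": corpus_gb * FOOTPRINT_RATIO,
    }


print(estimate(num_docs=1_000_000_000, corpus_gb=12_000))
# {'engines': 4, 'hours_on_one_engine': 200.0, 'hours_across_engines': 50.0, 'index_gb_upper_bound': 3600.0}
```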

This enhanced scalability yields hardware cost savings as well as the ability to address larger volumes of content. Although IDOL scales extremely well on commodity servers, its flexible architecture can take full advantage of massive parallelism, SMP processing, 64-bit environments (such as the Intel Itanium 64-bit architecture), software platforms (such as Solaris 10, 64-bit Linux and Win64), distributed server farms and all common forms of external disk arrays (e.g. NAS and SAN) to further improve performance. IDOL can use any one of these environments or a combination of them.

How It Works

Connectors aggregate content from the various repositories, and the content is then indexed either into a single IDOL Server or, through the Distributed Index Handler (DIH), across multiple IDOL Servers. The DIH can efficiently split and index very large quantities of data into multiple IDOL Server instances, optimizing performance by batching data, replicating all index commands and applying dynamic load distribution. The DIH can also perform data-dependent operations, such as distributing content by date, which allows more efficient querying. Performance is further augmented by the Distributed Action Handler (DAH), a distribution server that lets the user distribute action commands, such as queries, to IDOL Servers. Multiple copies of IDOL Servers, to which the DAH propagates actions, ensure uninterrupted service in the event of a server failure. For flexibility, both the DAH and the DIH can run in either mirroring mode (the IDOL Servers are exact copies of each other) or non-mirroring mode (each IDOL Server is configured differently and contains different data). In addition, the Distributed Service Handler (DiSH) component provides auditing, monitoring and alerting for all other Autonomy components.
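The two distribution modes can be sketched as follows. This is a toy model, not IDOL's configuration or API: the class names and the year-modulo routing rule are assumptions. In mirroring mode the indexing side copies every document to every server and the query side asks any one copy. In non-mirroring mode each document goes to one date-based shard, and the query side fans out to all shards, or to a single shard when the query is restricted to a date.

```python
# Hypothetical sketch of the DIH (indexing) and DAH (query) roles in mirroring
# and non-mirroring mode. Class names and the year-based routing rule are
# illustrative assumptions, not IDOL's configuration.
import itertools
from typing import List, Optional, Tuple


class Server:
    """Stand-in for one IDOL Server instance."""
    def __init__(self) -> None:
        self.docs: List[Tuple[str, int, str]] = []  # (doc_id, year, text)

    def query(self, text: str, year: Optional[int] = None) -> List[str]:
        return [d for d, y, body in self.docs
                if text in body and (year is None or y == year)]


class Cluster:
    def __init__(self, servers: List[Server], mirroring: bool) -> None:
        self.servers = servers
        self.mirroring = mirroring
        self._next = itertools.cycle(servers)  # rotates queries across mirrors

    def _shard_for(self, year: int) -> Server:
        return self.servers[year % len(self.servers)]  # date-based distribution

    def index(self, doc_id: str, year: int, text: str) -> None:  # DIH role
        targets = self.servers if self.mirroring else [self._shard_for(year)]
        for server in targets:
            server.docs.append((doc_id, year, text))

    def query(self, text: str, year: Optional[int] = None) -> List[str]:  # DAH role
        if self.mirroring:
            return sorted(next(self._next).query(text, year))  # any copy can answer
        # Non-mirroring: a date filter lets the DAH skip shards; otherwise fan out.
        targets = [self._shard_for(year)] if year is not None else self.servers
        return sorted(hit for s in targets for hit in s.query(text, year))


shards = Cluster([Server(), Server()], mirroring=False)
shards.index("d1", 2007, "merger plan")
shards.index("d2", 2008, "merger timeline")
print(shards.query("merger"))             # ['d1', 'd2'], gathered from both shards
print(shards.query("merger", year=2008))  # ['d2'], from one shard only
```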

Linear Scalability

Performance and capacity can be doubled simply by replicating the existing machine, so scaling can be predicted without worrying about bottlenecks.

Load Balancing

Data is automatically replicated across multiple servers, and user requests are load-balanced across these replicas, guaranteeing performance, reducing latency and improving the user experience.
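One simple way to spread requests across replicas is round-robin with a health check. The sketch below uses that policy, which is a swapped-in illustration since the source does not say which balancing algorithm IDOL uses. The replica names are made up, and the `healthy` set is assumed to be maintained by health checks elsewhere.

```python
# Hypothetical round-robin balancer over replicas; the replica names are made up.
import itertools
from typing import Iterable


class RoundRobinBalancer:
    def __init__(self, replicas: Iterable[str]) -> None:
        self.replicas = list(replicas)
        self.healthy = set(self.replicas)  # updated by health checks elsewhere
        self._cycle = itertools.cycle(self.replicas)

    def pick(self) -> str:
        """Return the next healthy replica, skipping any marked unhealthy."""
        for _ in range(len(self.replicas)):
            replica = next(self._cycle)
            if replica in self.healthy:
                return replica
        raise RuntimeError("no healthy replica available")


lb = RoundRobinBalancer(["idol-a", "idol-b", "idol-c"])
print([lb.pick() for _ in range(4)])  # ['idol-a', 'idol-b', 'idol-c', 'idol-a']
```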

Mirroring / Failover

Automatically generated replicas form a pool of servers. The primary resource is selected automatically, and if it fails the system switches to a secondary so that service continues uninterrupted.
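A minimal sketch of primary/secondary failover, assuming an ordered list of replicas and a caller-supplied `send` function that raises `ConnectionError` when a replica is unreachable. Both the class and the transport are hypothetical. When the primary fails, the next replica is promoted, so later requests go straight to it.

```python
# Hypothetical primary/secondary failover; `send` is a caller-supplied
# transport that raises ConnectionError when a replica is unreachable.
from typing import Callable, List


class FailoverPool:
    def __init__(self, replicas: List[str]) -> None:
        self.replicas = replicas  # ordered by preference; index 0 starts as primary
        self.primary = 0

    def call(self, request: str, send: Callable[[str, str], str]) -> str:
        for _ in range(len(self.replicas)):
            replica = self.replicas[self.primary]
            try:
                return send(replica, request)
            except ConnectionError:
                # Promote the next replica so later requests skip the failed one.
                self.primary = (self.primary + 1) % len(self.replicas)
        raise ConnectionError("all replicas failed")


down = {"idol-primary"}


def send(replica: str, request: str) -> str:
    if replica in down:
        raise ConnectionError(replica)
    return f"{replica} answered {request}"


pool = FailoverPool(["idol-primary", "idol-secondary"])
print(pool.call("query=merger", send))  # idol-secondary answered query=merger
```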

Distribution

For geographically distributed organizations, local replicas are automatically created and used where possible. Remote copies are used only when a local system fails. This builds in fault tolerance while keeping the performance benefits and reduced resource overhead of local access, all within a single, seamless service.
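The local-first rule can be sketched as a replica chooser that tries healthy replicas at the caller's site before any remote one. The `Replica` type, the site names and the `is_up` health probe are hypothetical.

```python
# Hypothetical locality-aware replica choice; sites and names are illustrative.
from typing import Callable, List, NamedTuple


class Replica(NamedTuple):
    name: str
    site: str


def choose_replica(replicas: List[Replica], site: str,
                   is_up: Callable[[str], bool]) -> str:
    """Prefer a healthy replica at the caller's site; use remote ones only as a fallback."""
    local = [r for r in replicas if r.site == site]
    remote = [r for r in replicas if r.site != site]
    for replica in local + remote:
        if is_up(replica.name):
            return replica.name
    raise RuntimeError("no replica available")


replicas = [Replica("ny-1", "New York"), Replica("ldn-1", "London"), Replica("ldn-2", "London")]
print(choose_replica(replicas, "London", lambda name: True))                   # ldn-1
print(choose_replica(replicas, "London", lambda name: name.startswith("ny")))  # ny-1
```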

Adaptive Probabilistic Concept Caching

Frequently used concepts are kept in memory so that query results can be returned as quickly and efficiently as possible.
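The source does not describe how the adaptive probabilistic cache decides what to keep. As a stand-in, the sketch below uses a plain frequency-based (LFU-style) policy: it counts requests per concept and admits a new concept to a full cache only if it has been requested more often than the coldest cached one. The class name and `compute` callback are hypothetical.

```python
# Frequency-based concept cache. Assumption: a plain LFU-style policy stands in
# for IDOL's adaptive probabilistic scheme, which the source does not describe.
from collections import Counter
from typing import Any, Callable, Dict


class ConceptCache:
    def __init__(self, capacity: int, compute: Callable[[str], Any]) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.compute = compute          # the slow path, e.g. a full index lookup
        self.freq: Counter = Counter()  # how often each concept has been requested
        self.store: Dict[str, Any] = {}

    def get(self, concept: str) -> Any:
        self.freq[concept] += 1
        if concept in self.store:
            return self.store[concept]  # hot concept: served from memory
        value = self.compute(concept)
        if len(self.store) < self.capacity:
            self.store[concept] = value
        else:
            coldest = min(self.store, key=lambda c: self.freq[c])
            if self.freq[coldest] < self.freq[concept]:  # admit only hotter concepts
                del self.store[coldest]
                self.store[concept] = value
        return value
```

Gating admission on frequency stops a single one-off query from evicting a concept that many users keep asking for.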

Multi-dimensional Index & Query Throttling

IDOL uses a multi-dimensional index to supply the distribution components with the information they need, which prevents bottlenecks and unbalanced peak loads during indexing and querying.

Autonomy provides prioritized throttling based on:

Time: maximize index/query performance according to the time of day (e.g. working hours)
Location: prioritize activity according to the server landscape
Status: assign an arbitrary priority status to processing
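The three throttling dimensions can be combined into one policy function. In the sketch below, only the dimensions (time, location, status) come from the list above. The working hours, the server roles used as a proxy for "location", and every multiplier are made-up values for illustration, as is the function name.

```python
# Hypothetical throttling policy. The three dimensions come from the list above;
# the hours, roles and multipliers are made-up values for illustration.
WORK_HOURS = range(8, 18)  # 08:00 to 17:59 local time
ROLE_FACTOR = {"query-front": 0.5, "back-end": 1.0}
PRIORITY_FACTOR = {"low": 0.5, "normal": 1.0, "high": 1.5}


def indexing_share(hour: int, role: str, priority: str) -> float:
    """Fraction of a server's capacity given to indexing; the rest serves queries."""
    share = 0.2 if hour in WORK_HOURS else 0.8  # Time: favour queries during work hours
    share *= ROLE_FACTOR[role]                  # Location: user-facing servers index less
    share *= PRIORITY_FACTOR[priority]          # Status: per-job priority
    return min(share, 1.0)
```

With these values, a normal-priority job on a back-end server gets 20% of capacity for indexing at 10:00 and 80% at 02:00.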
"We have worked with Autonomy for a number of years due to their ability to offer a next-generation enterprise search platform that doesn't necessitate a trade-off between performance, security and scalability."
Mr. K. Sriram, Senior Vice President, Satyam Consulting and Enterprise Solutions Practice, 2007

Instruction-Level Parallelism

IDOL programmatically expresses itself as an expanding collection of operations. These operations can be, and are, executed in serial pipeline form, yet the inherent logic of simultaneously processing disparate forms of unstructured, semi-structured and structured data requires a high degree of parallelism. Not only must IDOL ingest multiple streams and types of data, it must also provide a real-time answer or decision against that data as it is indexed, rather than forcing the user to wait an arbitrary period until serially accessed resources become available.

IDOL has therefore been designed with instruction-level parallelism (ILP) at the core of its process and operation model. Because ILP is limited by the serial instruction model of scalar processors, Autonomy has been a deliberate early adopter of every form of parallel architecture, from multi-CPU systems and hyper-threading to today's single-die multi-core processors.

The engine's default process model is multi-threaded, with a configurable number of threads. IDOL operations can either be grouped by class, with indexing and querying performed by separate threads, or, on n-core machines, a single operation can be "atomized" into multiple threads. Concurrent querying and indexing is the default, and no part of the index needs to be locked while queries run. All major multi-core manufacturers are supported, including Intel, AMD and Sun Microsystems with its latest Niagara processors.
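"Atomizing" a single operation can be pictured as splitting one query into one task per index partition, running the tasks on a thread pool and merging the partial hit lists. The sketch below shows only that split-and-merge pattern. The function name and the partition layout are hypothetical, it says nothing about how IDOL avoids index locking, and in CPython the global interpreter lock limits CPU-bound threads, so real multi-core speed-ups need native code or processes.

```python
# Hypothetical sketch of "atomizing" one query: each thread scans one index
# partition and the partial hit lists are merged. In CPython the GIL limits
# CPU-bound threads, so real multi-core speed-ups need native code or processes.
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Partition = Sequence[Tuple[str, str]]  # (doc_id, text) pairs


def atomized_query(partitions: List[Partition], term: str, threads: int = 4) -> List[str]:
    def scan(part: Partition) -> List[str]:
        return [doc_id for doc_id, text in part if term in text]

    with ThreadPoolExecutor(max_workers=threads) as pool:
        partial_hits = pool.map(scan, partitions)  # one task per partition
        return sorted(hit for hits in partial_hits for hit in hits)


parts = [[("d1", "oil price"), ("d2", "gas")], [("d3", "oil rig")], [("d4", "wind")]]
print(atomized_query(parts, "oil"))  # ['d1', 'd3']
```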

Classic scalar designs that rely on Moore's predicted doubling of transistor density every 18 months have already run into wire-delay and memory-access latency limits, as well as heat ceilings. As a result, hardware manufacturers such as Intel have declared multi-core strategies key to crossing the consumer "teraflop" threshold, and they aim to produce n-core dies with 32 billion transistors within the next 10 years. Autonomy is actively pursuing a Tera computing R&D simulation program in anticipation of increasing transistor and core density and of these manufacturers' declared aims. Autonomy is currently running "coalition" simulations of split-thread IDOL operations against n-core "battalion" processor units that blend general-purpose cores with more specialized cores, such as those dedicated to signal processing. These blended-core units are predicted to be the first consumer teraflop chips. Autonomy is developing process thread models that dynamically co-opt different core types to act in "coalition", performing simultaneous deconstruction and analysis of unstructured sources, such as video, that combine visual and auditory attributes.
