Evolution of Search

Autonomy Search Solutions

Ever since the first computers were built, their inability to process human-friendly, "unstructured" information has posed a considerable challenge. The modern IT industry was founded on the principle that the position of a piece of information tells the computer what to do with it: if the number in "Column 3" falls to zero, for example, the computer automatically orders more stock for the warehouse. A tremendous amount of effort has therefore been poured into sorting and distilling unstructured information into tidy rows and columns.

Increasingly, structuring information in this way is not a viable solution, not only because of the enormous manual effort required but also because the richness and subtlety of the information are lost in the process. Consequently, attention has turned to finding alternative, more intelligent solutions to the problem of unstructured information, and the journey towards integrated Meaning Based Computing (MBC) began...

Keyword Search

Because computers were unable to understand the meaning of information, the seemingly obvious alternative was to simply search it in order to locate any keywords relevant to the desired subject. The problem with this approach is that the computer has no way of identifying what a given keyword means, and therefore cannot process the information afterwards. For example, if a user types in the letters "D-O-G" the computer has no concept of what that word means; it will simply identify all of the documents which contain that combination of letters, which might produce a list of results thousands of pages long.
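
As a minimal sketch of this limitation, the following Python function returns every document that contains a given string of letters. The document names and texts are invented for illustration; the point is only that the match is made on characters, not on meaning.

```python
def keyword_search(documents, keyword):
    """Return the names of documents whose text contains the keyword.

    The match is purely on characters: nothing here knows what the word means.
    """
    needle = keyword.lower()
    return [name for name, text in documents.items() if needle in text.lower()]


# Hypothetical documents, invented for this example.
documents = {
    "pets.txt": "Walking the dog twice a day keeps it healthy.",
    "food.txt": "The hot dog stand opens at noon.",
    "law.txt": "The court rejected the dogmatic reading of the statute.",
    "hounds.txt": "A hound needs regular exercise.",
}

matches = keyword_search(documents, "dog")
```

Here the search returns the hot dog stand and the "dogmatic" legal note alongside the document about pets, while the document about hounds is missed because it never contains the letters "d-o-g".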

Keyword Search +

To improve the results of straightforward keyword searches, the technique was enhanced with a series of arbitrary rules so that the most relevant results would appear at the top of the list. For example, if the search term appears in the title of a document, five points are added to that result, and if it appears three times within the document, one point is added. This works to a certain extent, but there is still no understanding of what a "D-O-G" is or does. In addition, the rules have to be modified manually every time a subject develops, which makes them very costly to maintain.
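
The two example rules can be sketched in a few lines of Python. This is an illustration only: the function names and sample documents are hypothetical, and "appears three times" is assumed here to mean "at least three times".

```python
import re


def words(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def rule_score(title, body, term):
    """Apply the two hand-written rules from the text to one document."""
    term = term.lower()
    score = 0
    if term in words(title):
        score += 5  # rule 1: the term appears in the title
    if words(body).count(term) >= 3:
        score += 1  # rule 2: the term appears at least three times in the body
    return score


# Hypothetical (title, body) pairs, invented for this example.
docs = [
    ("Dog training basics", "A dog learns quickly. Reward the dog. Walk the dog daily."),
    ("Street food guide", "The hot dog is a classic."),
    ("Kennel notes", "Each dog is fed at eight. The dog run is cleaned, then every dog is walked."),
]

scores = [rule_score(title, body, "dog") for title, body in docs]
ranked = sorted(docs, key=lambda d: rule_score(d[0], d[1], "dog"), reverse=True)
```

Every number in the scoring function is a manual choice, which is why such rule sets need constant upkeep as subjects change.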

PageRank

On the Internet there is a simple trick to get around this problem because, in many cases, the most popular information is also the most relevant. The importance or popularity of a Web page is approximated by counting the other pages that link to it, with links from important pages carrying more weight. This works quite well on the Internet but in the enterprise it is doomed to failure. Firstly, there are no native links between information in the enterprise. Secondly, if a user happens to be an expert, perhaps in the field of gallium arsenide laser diodes, there may be no one else interested in the subject, but it is still imperative that they find relevant information.
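
The link-based idea can be sketched as a simplified PageRank-style iteration. This is a textbook simplification, not the production algorithm of any search engine; the damping factor of 0.85, the page names and the link graph are assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict mapping each page to the pages it links to.

    Every link target must also appear as a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            # A page with no outgoing links spreads its rank over all pages.
            targets = outgoing if outgoing else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: "orphan_report" stands in for an enterprise document
# that nothing links to, however relevant its content may be.
links = {
    "home": ["products", "news"],
    "products": ["home"],
    "news": ["home", "products"],
    "orphan_report": [],
}

rank = pagerank(links)
```

The unlinked report receives the lowest rank purely because nothing points to it, which is the situation most enterprise documents are in.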

Federated Search

As a result of new regulatory drivers such as the US Federal Rules of Civil Procedure (FRCP), enterprises need to be able to guarantee that a search has covered absolutely every piece of relevant information across potentially hundreds of different repositories throughout the enterprise. Most search engines cannot do this themselves, so they ask the original repositories to perform the search, a process known as federated search.

Federated search is often advertised as an asset. However, it creates significant problems. Firstly, it generates vast increases in network traffic: every time a user enters a query, each and every repository has to run a search, so a repository that previously ran perhaps 0.01 searches per user per day starts to glow white-hot. More importantly, each repository ranks its results with a different algorithm, so the relevance rankings are incompatible with one another when a single results list is compiled. In addition, most of the underlying search algorithms used in the repositories are not compliant with the new FRCP. Consequently, federated search is not compatible with a pan-enterprise platform.
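
The ranking problem can be illustrated with a small Python sketch. The two repositories, their documents and their scoring scales are entirely hypothetical: one is assumed to score results between 0 and 1, the other between 0 and 100.

```python
def federated_search(repositories, query):
    """Send the query to every repository and merge the raw scores.

    Returns the merged results (highest raw score first) and the number of
    repository searches that this single user query triggered.
    """
    merged = []
    searches_run = 0
    for repo_name, search_fn in repositories.items():
        searches_run += 1
        for doc, score in search_fn(query):
            merged.append((score, repo_name, doc))
    merged.sort(reverse=True)
    return merged, searches_run


def mail_archive_search(query):
    # Hypothetical repository that scores relevance on a 0..1 scale.
    return [("merger-email", 0.92), ("lunch-invite", 0.10)]


def file_share_search(query):
    # Hypothetical repository that scores relevance on a 0..100 scale.
    return [("old-template", 35.0), ("merger-contract", 88.0)]


repositories = {"mail_archive": mail_archive_search, "file_share": file_share_search}
results, searches_run = federated_search(repositories, "merger")
order = [doc for _, _, doc in results]
```

In the merged list a weak file-share match (35 out of 100) outranks a strong mail-archive match (0.92 out of 1), because the two scores were never meant to be compared. The search count also grows with every repository added.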

All of the approaches described up to this point fit squarely into the mid-enterprise search market. A technology which is limited to these capabilities is not suitable for a true pan-enterprise deployment, for reasons that will now become clear.

Conceptual Search

A critical leap forward came with the ability to actually "understand" the idea behind a given phrase, and retrieve information which is conceptually related, even when a particular keyword is not used. So for example, if the user types in the letters "D-O-G", a conceptual search engine will retrieve all the information conceptually related to but not confined to the word "D-O-G", perhaps information about a "hound" as well as "walks" and different breeds of dog, because it understands the idea represented by the word. This is incredibly powerful, since critical information is often missed when users and authors do not use the same terms.
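
The effect, though not the mechanism, can be imitated with a toy Python example. The table of related terms below is written by hand purely for illustration; a genuine conceptual engine would derive such relationships from the content itself, and nothing here should be read as how any particular product does so.

```python
import re

# Hand-written stand-in for "understanding": an assumption made for this example.
RELATED_TERMS = {"dog": {"dog", "hound", "puppy", "terrier", "walks"}}


def concept_search(documents, query):
    """Return documents that mention the query or any term related to it."""
    concept = RELATED_TERMS.get(query.lower(), {query.lower()})
    hits = []
    for name, text in documents.items():
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        if tokens & concept:
            hits.append(name)
    return hits


# Hypothetical documents, invented for this example.
documents = {
    "pets.txt": "Walking the dog twice a day keeps it healthy.",
    "hounds.txt": "A hound needs regular exercise.",
    "breeds.txt": "The terrier is a lively breed.",
    "law.txt": "The court rejected the dogmatic reading of the statute.",
}

hits = concept_search(documents, "dog")
```

The documents about hounds and terriers are now found even though they never contain the word "dog", and the "dogmatic" legal note is no longer returned.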

Secure Search

Security is absolutely paramount to the enterprise and the challenge this poses is staggeringly complex, from protecting the enterprise's intellectual property from unauthorized access, to ensuring internal compliance with an ever-growing list of regulatory requirements. Most users are not permitted to view most documents or even be aware that they exist. Typically, only around one document in every 1,000 should be available to a given user, and access privileges must be specific to each of the many underlying repositories in the enterprise. Achieving air-tight security without significant performance degradation is a considerable challenge.
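
A minimal sketch of per-repository access filtering is shown below. The repositories, group names and documents are hypothetical, and a real deployment would have to mirror each repository's own security model rather than a simple set of group names.

```python
def visible_results(results, user_groups):
    """Keep only the results the user may see.

    `results` is a list of dicts with "repo", "name" and "allowed_groups".
    `user_groups` maps a repository name to the user's groups in that repository.
    Documents the user cannot access are dropped entirely, so the user never
    learns that they exist.
    """
    visible = []
    for doc in results:
        groups = user_groups.get(doc["repo"], set())
        if groups & doc["allowed_groups"]:
            visible.append(doc["name"])
    return visible


# Hypothetical search results and user, invented for this example.
results = [
    {"repo": "hr", "name": "salaries.xls", "allowed_groups": {"hr-managers"}},
    {"repo": "wiki", "name": "holiday-policy", "allowed_groups": {"all-staff"}},
    {"repo": "legal", "name": "merger-draft", "allowed_groups": {"counsel"}},
]
user_groups = {"wiki": {"all-staff"}, "hr": {"all-staff"}}

visible = visible_results(results, user_groups)
```

Even this simple check has to run for every candidate result of every query, which hints at why security and performance pull against each other.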

Legal Search

In order to scale without impeding performance, some technologies fail to search each document in its entirety. This prevents users from retrieving valuable information and exposes the enterprise to significant compliance risk. Such technologies begin to calculate the relevance of each document at indexing time; however, if at the beginning of the calculation a particular result appears to be irrelevant, the engine stops calculating, effectively assuming the result is not relevant without reading all the way through. Consequently, a relevant snippet of information on the last page of a hundred-page report could be overlooked and the legal consequences could be absolutely catastrophic. In fact, the company CEO could go to jail because the search failed to retrieve all of the information required by the court.
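
The risk of giving up early can be shown with a deliberately simplified Python sketch. The cut-off of ten pages and the sample report are assumptions made for the example; real engines that stop early use more elaborate relevance calculations than a plain page limit.

```python
def truncated_match(pages, term, pages_to_read):
    """Look for the term only in the first `pages_to_read` pages, then give up."""
    needle = term.lower()
    return any(needle in page.lower() for page in pages[:pages_to_read])


def full_match(pages, term):
    """Look for the term in every page of the document."""
    needle = term.lower()
    return any(needle in page.lower() for page in pages)


# Hypothetical hundred-page report with the critical sentence on the last page.
report = ["Quarterly summary."] * 99 + ["Note: the indemnity clause was waived."]

found_when_truncated = truncated_match(report, "indemnity", pages_to_read=10)
found_when_complete = full_match(report, "indemnity")
```

The truncated search reports no match, while the complete search finds the sentence on page 100.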

Audio and Video Search

The full potential of multimedia content is often not realized because it has traditionally taken considerable manual involvement to process. Consequently, intelligence lies dormant in resources such as recorded meetings, training videos and broadcast content. True Pan-Enterprise Search technology automatically captures, encodes and indexes television, video and audio content from any source, allowing users to search it with pinpoint accuracy and to treat rich media content in the same way as more traditional forms of information.

Categorize, Alert and Profile

When computers "understand" information, they can start to automatically process it and begin to bring information to the user rather than the other way round. For example, through forming an understanding, computers can automatically create taxonomies, alert users to new and relevant information in real-time or automatically profile an individual's interests based on what they read and write, offering them interesting information without the need to search or connect with similar people.

Clustering, Scene Detection, Speaker Identification and Sentiment Analysis

Understanding information allows computers to cluster information, identifying inherent themes or clusters of conceptually similar information. In addition, using this approach it is possible to detect irregularities in everyday scenes for security purposes, identify well-known speakers in broadcast media and analyze conversations to detect positive or negative sentiment.

Integrated Meaning Based Computing

In examining the different approaches to the challenge of unstructured information, it becomes clear that the solution does not boil down to plain search. It is only through understanding the meaning of ALL information that computers are able to automatically process it and provide users with the ability to handle and maximize the value of this rich resource. MBC addresses the full range of information challenges and consequently forms the central requirement of major enterprise deployments all over the world.
