From the time of the very first computers, their inability to process human-friendly, "unstructured" information has posed a considerable challenge. The modern IT industry was founded on the principle that, for example, if the number in "Column 3" goes to zero, the computer will automatically order more stock for the warehouse - in other words, the position of a piece of information tells the computer what to do with it - and a tremendous amount of effort has been poured into sorting and distilling unstructured information into tidy rows and columns.
Increasingly, structuring information in this way is not a viable solution, not only because of the enormous manual effort required but also because organizing information into rows and columns strips out its richness and subtleties. Consequently, attention has turned to finding alternative, more intelligent solutions to the problem of unstructured information, and the journey towards integrated Meaning Based Computing (MBC) began...
Because computers were unable to understand the meaning of information, the seemingly obvious alternative was to simply search it in order to locate any keywords relevant to the desired subject. The problem with this approach is that the computer has no way of identifying what a given keyword means, and therefore cannot process the information afterwards. For example, if a user types in the letters "D-O-G" the computer has no concept of what that word means; it will simply identify all of the documents which contain that combination of letters, which might produce a list of results thousands of pages long.
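The limitation can be seen in a minimal sketch of plain keyword matching. The function name and the sample documents are invented for illustration and are not taken from any product.

```python
def keyword_search(documents, term):
    """Return the ids of every document that contains `term` as a run of letters."""
    needle = term.lower()
    return [doc_id for doc_id, text in documents.items() if needle in text.lower()]


# Hypothetical sample documents.
docs = {
    "a": "The dog chased the ball.",
    "b": "Dogma and doctrine in medieval law.",
    "c": "A hound followed the scent.",
}

matches = keyword_search(docs, "dog")
print(matches)
```

The search returns documents "a" and "b": it matches "Dogma", which has nothing to do with dogs, and it misses the document about a hound, because it only compares letters.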
To improve the results of straightforward keyword searches, the technique was enhanced by adding a series of arbitrary rules so that the most relevant results would appear at the top of the list. For example, if the search term appears in the title of a document, five points are added to that result, and if it appears three times within the document, one point is added. This works to a certain extent, but the important issue is that there is still no understanding of what a "D-O-G" is, or does. In addition, the rules have to be modified manually every time a subject develops, which makes them very costly to maintain.
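The two example rules can be written down in a few lines. This is only a sketch: the constant names and sample documents are invented, and "appears three times" is assumed here to mean "at least three times".

```python
TITLE_BONUS = 5        # points when the term appears in the title
REPEAT_BONUS = 1       # points when the term is repeated in the body
REPEAT_THRESHOLD = 3   # assumption: "three times" means three or more


def rule_score(title, body, term):
    """Score one document with the two hand-written rules."""
    needle = term.lower()
    score = 0
    if needle in title.lower():
        score += TITLE_BONUS
    if body.lower().count(needle) >= REPEAT_THRESHOLD:
        score += REPEAT_BONUS
    return score


def rank(documents, term):
    """Return (score, title) pairs, highest score first."""
    scored = [(rule_score(d["title"], d["body"], term), d["title"]) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical sample documents.
docs = [
    {"title": "Kennel rules",
     "body": "Each dog must be registered. A dog needs a lead. No dog may be left alone."},
    {"title": "Dog training basics", "body": "Start with short sessions."},
    {"title": "Dog breeds",
     "body": "A dog can be large. A dog can be small. Every dog needs exercise."},
]

ranking = rank(docs, "dog")
print(ranking)
```

Every weight and threshold is a constant that somebody has to choose and keep up to date by hand, which is where the maintenance cost comes from.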
On the Internet there is a simple trick to get around this problem because, in many cases, the most popular information is also the most relevant. The importance or popularity of a Web page is approximated by counting the number of other pages that link to it, and by how frequently those pages are viewed by other users. This works quite well on the Internet, but in the enterprise it is doomed to failure. Firstly, there are no native links between pieces of information in the enterprise. Secondly, if a user happens to be an expert, perhaps in the field of gallium arsenide laser diodes, there may be no one else interested in the subject, yet it is still imperative that the expert finds relevant information.
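A rough sketch of the link-counting half of this idea follows (view frequency is left out). The page names and link data are invented, and real Web ranking schemes are considerably more elaborate than a simple count.

```python
from collections import Counter


def inbound_link_counts(links):
    """Count, for each page, how many other pages link to it."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):      # count each linking page once
            if target != source:         # ignore self-links
                counts[target] += 1
    return counts


# Hypothetical Web pages: plenty of links to count.
web = {
    "home": ["news", "faq"],
    "news": ["faq"],
    "blog": ["faq", "news"],
    "faq": [],
}
web_counts = inbound_link_counts(web)

# Hypothetical enterprise documents: no links at all.
enterprise = {"report.doc": [], "memo.doc": []}
enterprise_counts = inbound_link_counts(enterprise)

print(web_counts.most_common())
print(enterprise_counts["report.doc"], enterprise_counts["memo.doc"])
```

On the Web sample the counts separate the pages; on the enterprise sample every document scores zero, so the measure gives nothing to rank by.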
As a result of new regulatory drivers such as the US Federal Rules of Civil Procedure (FRCP), enterprises need to be able to guarantee that a search has covered absolutely every piece of relevant information across potentially hundreds of different repositories throughout the enterprise. Most search engines are not actually capable of doing this, so they ask the original repositories to perform the search, a process known as federated search.
Federated search is often advertised as an asset. However, it creates significant problems because it generates vast increases in network traffic. Every time the user enters a query, each and every repository has to run a search, so a repository that previously ran a search perhaps 0.01 times per user per day starts to glow white-hot. More importantly, each repository ranks its results using a different algorithm, which means that the relevance rankings are different and incompatible when compiled into a single results list. In addition, most of the underlying search algorithms used in the repositories are not compliant with the new FRCP. Consequently, federated search is not compatible with a pan-enterprise platform.
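The ranking problem can be illustrated with a small sketch. The repository names, document names and score scales are all invented: one repository is assumed to score between 0 and 1 and the other between 0 and 100.

```python
def merge_federated(result_sets):
    """Naively merge per-repository results by their raw relevance scores."""
    merged = []
    for repository, results in result_sets.items():
        for doc_id, score in results:
            merged.append((score, repository, doc_id))
    return sorted(merged, key=lambda item: item[0], reverse=True)


# Hypothetical results for the same query from two repositories.
results = {
    "mail_archive": [("msg-17", 0.92), ("msg-03", 0.40)],     # scale 0..1
    "file_share": [("spec.doc", 71.0), ("notes.txt", 12.0)],  # scale 0..100
}

merged = merge_federated(results)
order = [doc_id for _, _, doc_id in merged]
print(order)
```

The weakest file-share hit ("notes.txt", 12 out of 100) is listed above the strongest mail hit ("msg-17", 0.92 out of 1), purely because the two engines score on different scales. Rescaling would not fully repair this, since the algorithms behind the numbers also differ.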
All of the approaches described up to this point fit squarely into the mid-enterprise search market. A technology which is limited to these capabilities is not suitable for a true pan-enterprise deployment, for reasons that will now become clear.
A critical leap forward came with the ability to actually "understand" the idea behind a given phrase, and retrieve information which is conceptually related, even when a particular keyword is not used. So, for example, if the user types in the letters "D-O-G", a conceptual search engine will retrieve all the information conceptually related to but not confined to the word "D-O-G", perhaps information about a "hound" as well as "walks" and different breeds of dog, because it understands the idea represented by the word. This is incredibly powerful because users do not always use the same search terms, and critical information is often missed as a result.
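The effect, though not the mechanism, can be imitated with a toy example. The sketch below uses a hand-written table of related terms as a stand-in for understanding; the table, function name and documents are invented, and a real conceptual engine would not rely on a manually maintained list like this.

```python
# Hypothetical, hand-written stand-in for "understanding" the concept of a dog.
RELATED_TERMS = {
    "dog": {"dog", "hound", "labrador", "walks"},
}


def conceptual_search(documents, term):
    """Return ids of documents that mention the term or any related term."""
    wanted = RELATED_TERMS.get(term.lower(), {term.lower()})
    hits = []
    for doc_id, text in documents.items():
        words = set(text.lower().replace(".", " ").split())
        if words & wanted:
            hits.append(doc_id)
    return hits


# Hypothetical sample documents.
docs = {
    "a": "The hound followed the scent.",
    "b": "Dogma and doctrine.",
    "c": "Long walks in the park.",
}

related = conceptual_search(docs, "dog")
print(related)
```

Documents "a" and "c" are retrieved even though neither contains the letters "D-O-G", while "Dogma" is no longer a false match.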
Security is absolutely paramount to the enterprise and the challenge this poses is staggeringly complex, from protecting the enterprise's intellectual property from unauthorized access to ensuring internal compliance with an ever-growing list of regulatory requirements. Most users are not permitted to view most documents or even be aware that they exist. Typically, only around one in every 1,000 documents should be available to each user, and access privileges must be specific to each of the many underlying repositories in the enterprise. Achieving air-tight security without significant performance degradation is a considerable challenge.
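A minimal sketch of this kind of result filtering is shown below. The repository names, document names, users and the shape of the access control list are all invented, and the sketch ignores the performance question entirely.

```python
def visible_results(results, acl, user):
    """Keep only the hits the user may see; anything not listed in the ACL is denied."""
    return [hit for hit in results if user in acl.get(hit, frozenset())]


# Hypothetical access control list, keyed by (repository, document).
acl = {
    ("hr_system", "salaries.xls"): frozenset({"alice"}),
    ("file_share", "handbook.doc"): frozenset({"alice", "bob"}),
}

# Hypothetical raw hits for one query.
results = [
    ("hr_system", "salaries.xls"),
    ("file_share", "handbook.doc"),
    ("legal_vault", "merger.doc"),  # no ACL entry, so nobody sees it
]

bob_view = visible_results(results, acl, "bob")
alice_view = visible_results(results, acl, "alice")
print(bob_view)
print(alice_view)
```

Hits are dropped before the list is returned, so a user who lacks access is not even told that the document exists.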
In order to scale without impeding performance, some technologies fail to search each document in its entirety. This prevents users from retrieving valuable information and exposes the enterprise to significant compliance risk. Such technologies begin to calculate the relevance of each document at indexing time; however, if at the beginning of the calculation a particular result appears to be irrelevant, the engine will stop calculating, effectively assuming the result is not relevant without reading all the way through. Consequently, a relevant snippet of information on the last page of a hundred-page report could be overlooked, and the legal consequences could be absolutely catastrophic. In fact, the company CEO could go to jail because the search failed to retrieve all of the information required by the court.
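The shortcut can be sketched as follows. The function, its `give_up_after` parameter and the sample report are invented for illustration; relevance is reduced here to a simple count of matching words.

```python
def relevance(words, term, give_up_after=None):
    """Count occurrences of `term`.

    If `give_up_after` is set and no occurrence has been seen within that
    many words, stop reading and treat the document as irrelevant.
    """
    hits = 0
    for position, word in enumerate(words):
        if give_up_after is not None and position >= give_up_after and hits == 0:
            return 0
        if word == term:
            hits += 1
    return hits


# Hypothetical long report whose only relevant word is the very last one.
report = ["filler"] * 999 + ["recall"]

shortcut_score = relevance(report, "recall", give_up_after=100)
full_score = relevance(report, "recall")
print(shortcut_score, full_score)
```

With the shortcut the report scores 0 and would never be returned; reading it to the end finds the single occurrence on the last "page".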
The full potential of multimedia content is often not realized because processing it has traditionally required considerable manual involvement. Consequently, intelligence lies dormant in resources such as recorded meetings, training videos and broadcast content. True Pan-Enterprise Search technology automatically captures, encodes and indexes television, video and audio content from any source, allowing users to search it with pinpoint accuracy and to treat rich media content in the same way as more traditional forms of information.
When computers "understand" information, they can start to automatically process it and begin to bring information to the user rather than the other way round. For example, through forming an understanding, computers can automatically create taxonomies, alert users to new and relevant information in real-time or automatically profile an individual's interests based on what they read and write, offering them interesting information without the need to search or connect with similar people.
Clustering, Scene Detection, Speaker Identification and Sentiment Analysis
Understanding information allows computers to cluster it, identifying inherent themes or groups of conceptually similar information. In addition, using this approach it is possible to detect irregularities in everyday scenes for security purposes, identify well-known speakers in broadcast media and analyze conversations to detect positive or negative sentiment.
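As a rough illustration of clustering only, the sketch below groups documents by how many words they share (Jaccard similarity against the first member of each group). Shared words are a crude stand-in for conceptual similarity, and the threshold, function names and sample documents are invented.

```python
def jaccard(a, b):
    """Share of words two sets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(documents, threshold=0.3):
    """Greedily place each document in the first cluster whose seed it resembles."""
    clusters = []  # each entry: (word set of the first member, list of doc ids)
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        for seed_words, members in clusters:
            if jaccard(words, seed_words) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append((words, [doc_id]))
    return [members for _, members in clusters]


# Hypothetical sample documents.
docs = {
    "d1": "quarterly sales figures rose",
    "d2": "quarterly sales figures fell",
    "d3": "server backup failed overnight",
    "d4": "overnight server backup completed",
}

groups = cluster(docs)
print(groups)
```

Two themes emerge without anyone defining them in advance: the sales documents fall into one group and the backup documents into another.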
In examining the different approaches to the challenge of unstructured information, it becomes clear that the solution does not boil down to plain search. It is only through understanding the meaning of ALL information that computers are able to automatically process it and provide users with the ability to handle and maximize the value of this rich resource. MBC addresses the full range of information challenges and consequently forms the central requirement of major enterprise deployments all over the world.