From the time of the very first computers, their inability to process human-friendly, "unstructured" information has posed a considerable challenge. The modern IT industry was founded on the principle that, for example, if the number in "Column 3" goes to zero, the computer will automatically order more stock for the warehouse - in other words, the position of a piece of information tells the computer what to do with it - and a tremendous amount of effort has been poured into sorting and distilling unstructured information into tidy rows and columns.
Increasingly, structuring information in this way is no longer a viable solution, not only because of the enormous manual effort required but also because the richness and subtlety of the information are lost in the process. Consequently, attention has turned to finding alternative, more intelligent solutions to the problem of unstructured information, and the journey towards integrated Meaning Based Computing (MBC) began...
Because computers were unable to understand the meaning of information, the seemingly obvious alternative was to simply search it in order to locate any keywords relevant to the desired subject. The problem with this approach is that the computer has no way of identifying what a given keyword means, and therefore cannot process the information afterwards. For example, if a user types in the letters "D-O-G" the computer has no concept of what that word means; it will simply identify all of the documents which contain that combination of letters, which might produce a list of results thousands of pages long.
In order to improve the results from straightforward keyword searches, the technique was enhanced by adding a series of arbitrary rules so that the most relevant results would appear at the top of the list. For example, if the search term appears in the title of a document, five points are added to that result, and if it appears three times within the document, one point is added. This works to a certain extent, but the important issue is that there is still no understanding of what a "D-O-G" is, or does. In addition, the rules have to be modified manually every time a subject develops, which makes them very costly to maintain.
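The rule-based ranking described above can be sketched in a few lines of Python. This is only an illustration: the point values come from the example in the text, while the function names, the sample documents and the case-insensitive matching are assumptions made for the sketch.

```python
import re

# Rule weights from the example in the text; the names are hypothetical.
TITLE_POINTS = 5       # term appears in the title
REPEAT_POINTS = 1      # term appears at least three times in the body
REPEAT_THRESHOLD = 3


def score(term, title, body):
    """Score one document with hand-written rules; no meaning is involved."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    points = 0
    if pattern.search(title):
        points += TITLE_POINTS
    if len(pattern.findall(body)) >= REPEAT_THRESHOLD:
        points += REPEAT_POINTS
    return points


def rank(term, documents):
    """Return (score, title) pairs for documents containing the letters, best first."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    scored = []
    for title, body in documents:
        if pattern.search(title + " " + body):
            scored.append((score(term, title, body), title))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Illustrative documents (invented for this sketch).
docs = [
    ("Dog training basics", "A dog learns fast. Reward the dog. Walk the dog daily."),
    ("Hot dog recipes", "Grill the sausage and serve in a bun."),
    ("Pet insurance", "Covers your dog and cat."),
    ("Hound care", "Hounds need long walks."),
]
ranked = rank("dog", docs)
for points, title in ranked:
    print(points, title)
```

In this sample the recipe page ranks second because the letters appear in its title, and the page about hounds is not returned at all, which is the limitation the paragraph above describes.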
On the Internet there is a simple trick to get around this problem because, in many cases, the most popular information is also the most relevant. The importance or popularity of a Web page is approximated by counting the number of other pages that link to it, and by how frequently those pages are viewed by other users. This works quite well on the Internet but in the enterprise it is doomed to failure. Firstly, there are no native links between pieces of information in the enterprise. Secondly, if a user happens to be an expert, perhaps in the field of gallium arsenide laser diodes, there may be no one else interested in the subject, so popularity provides no signal, but it is still imperative that they find relevant information.
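As a rough illustration of the popularity idea, the sketch below counts inbound links in a tiny, invented link graph. It deliberately omits the page-view signal mentioned above, and the page names and the `inbound_counts` helper are hypothetical.

```python
from collections import Counter

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "home": ["products", "news"],
    "news": ["products"],
    "blog": ["products", "news"],
    "products": [],
    "laser-diodes": [],  # specialist page that nobody links to
}


def inbound_counts(link_graph):
    """Count how many distinct other pages link to each page."""
    counts = Counter({page: 0 for page in link_graph})
    for source, targets in link_graph.items():
        for target in set(targets):
            if target != source:  # ignore self-links
                counts[target] += 1
    return counts


popularity = inbound_counts(links)
for page, count in popularity.most_common():
    print(page, count)
```

The specialist page receives a count of zero, so a popularity-based ranking would push it to the bottom regardless of how relevant it is to the expert who needs it.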
As a result of new regulatory drivers such as the US Federal Rules of Civil Procedure (FRCP), enterprises need to be able to guarantee that a search has covered absolutely every piece of relevant information across potentially hundreds of different repositories throughout the enterprise. Most search engines are not actually capable of doing this, so they ask the original repositories to perform the search, a process known as federated search.
Federated search is often advertised as an asset. However, it creates significant problems because it generates vast increases in network traffic. Every time the user enters a query, each and every repository has to run a search, so a repository that previously ran a search perhaps 0.01 times per user per day starts to glow white-hot. More importantly, each repository searches with a different algorithm, which means that the relevance rankings are different and cannot be meaningfully combined when compiling a single results list. In addition, most of the underlying search algorithms used in the repositories are not compliant with the new FRCP requirements. Consequently, federated search is not compatible with a pan-enterprise platform.
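The ranking problem can be seen in a small sketch. The two repository search functions below are invented stand-ins: one is assumed to score results between 0 and 1, the other between 0 and 100. Nothing here reflects a real product API.

```python
def email_archive_search(query):
    """Hypothetical repository whose engine scores results from 0 to 1."""
    return [("Re: contract renewal", 0.92), ("Lunch on Friday", 0.11)]


def file_share_search(query):
    """Hypothetical repository whose engine scores results from 0 to 100."""
    return [("contract_draft.doc", 35), ("canteen_menu.doc", 12)]


def federated_search(query, repositories):
    """Send the query to every repository and merge on the raw scores."""
    merged = []
    for name, search in repositories.items():  # one search per repository, per query
        for doc, raw_score in search(query):
            merged.append((raw_score, name, doc))
    return sorted(merged, reverse=True)


repositories = {"email": email_archive_search, "files": file_share_search}
results = federated_search("contract", repositories)
for raw_score, name, doc in results:
    print(raw_score, name, doc)
```

Merged on raw scores, the canteen menu outranks the strongest email match simply because its repository uses a larger scale. Normalizing the scales would not fully solve this, since the underlying algorithms still measure different things.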
All of the approaches described up to this point fit squarely into the mid-enterprise search market. A technology which is limited to these capabilities is not suitable for a true pan-enterprise deployment, for reasons that will now become clear.
A critical leap forward came with the ability to actually "understand" the idea behind a given phrase, and retrieve information which is conceptually related, even when a particular keyword is not used. So, for example, if the user types in the letters "D-O-G", a conceptual search engine will retrieve all the information conceptually related to but not confined to the word "D-O-G", perhaps information about a "hound" as well as "walks" and different breeds of dog, because it understands the idea represented by the word. This is incredibly powerful, since critical information is often missed when users do not use the same terms as the documents they need.
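The difference in what gets retrieved can be illustrated with a toy comparison. Note the strong assumption: the sketch below fakes "understanding" with a small hand-written table of related terms, which is a stand-in chosen for brevity and not how a conceptual engine actually derives meaning. All names and sample documents are hypothetical.

```python
import re

# Hand-written stand-in for a learned concept; an assumption of this sketch.
RELATED_TERMS = {
    "dog": {"dog", "hound", "walks", "labrador", "terrier"},
}


def tokens(text):
    """Lower-case word set for a piece of text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(query, documents):
    """Return only documents containing the exact query word."""
    return [doc for doc in documents if query.lower() in tokens(doc)]


def concept_search(query, documents):
    """Return documents sharing any term with the query's concept."""
    concept = RELATED_TERMS.get(query.lower(), {query.lower()})
    return [doc for doc in documents if tokens(doc) & concept]


documents = [
    "The hound slept by the fire.",
    "Daily walks keep a labrador healthy.",
    "My dog loves the park.",
    "Quarterly sales figures.",
]
keyword_hits = keyword_search("dog", documents)
concept_hits = concept_search("dog", documents)
print(keyword_hits)
print(concept_hits)
```

The keyword search finds one document, while the concept-based lookup also returns the two documents that never mention the word itself.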
Security is absolutely paramount to the enterprise and the challenge this poses is staggeringly complex, from protecting the enterprise's intellectual property from unauthorized access, to ensuring internal compliance with an ever-growing list of regulatory requirements. Most users are not permitted to view most documents or even be aware that they exist. Typically, only around one document in every 1,000 should be available to each user, and access privileges must be specific to each of the many underlying repositories in the enterprise. Achieving air-tight security without significant performance degradation is a considerable challenge.
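A minimal sketch of repository-specific filtering is shown below, assuming a simple group-based permission model. The `Document` type, the repository names and the group names are all hypothetical, and a production system would also have to address the performance concern raised above, which this sketch ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    repository: str
    title: str
    allowed_groups: frozenset  # groups permitted to see this document


def visible_results(results, user_groups_by_repo):
    """Drop every hit the user may not see, so even its title is never shown."""
    visible = []
    for doc in results:
        # Group membership is looked up per repository, not globally.
        groups = user_groups_by_repo.get(doc.repository, frozenset())
        if groups & doc.allowed_groups:
            visible.append(doc)
    return visible


results = [
    Document("hr-system", "Salary review 2009", frozenset({"hr"})),
    Document("file-share", "Canteen menu", frozenset({"all-staff"})),
    Document("legal-dms", "Pending litigation memo", frozenset({"legal"})),
]
# Hypothetical user: has groups in two repositories and no account in the third.
engineer = {
    "file-share": {"all-staff", "engineering"},
    "hr-system": {"all-staff"},
}
titles = [doc.title for doc in visible_results(results, engineer)]
print(titles)
```

Filtering before anything is displayed matters because, as noted above, users should not even be aware that restricted documents exist.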
In order to scale without impeding performance, some technologies fail to search each document in its entirety. This prevents users from retrieving valuable information and it exposes the enterprise to significant compliance risk. Such technologies begin to calculate the relevance of each document at indexing time; however, if at the beginning of the calculation a particular result appears to be irrelevant, the engine will stop calculating, effectively assuming the result is not relevant without reading all the way through. Consequently, a relevant snippet of information on the last page of a hundred-page report could be overlooked and the legal consequences could be absolutely catastrophic. In fact, the company CEO could go to jail because the search failed to retrieve all of the information required by the court.
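The risk can be illustrated with a simplified sketch. Here a fixed page cutoff stands in for the early-stopping behavior described above; real engines may decide when to stop in other ways, so the cutoff, the function names and the sample report are assumptions of the example.

```python
def truncated_match(term, pages, max_pages):
    """Look only at the first max_pages pages, then give up."""
    return any(term in page.lower() for page in pages[:max_pages])


def full_match(term, pages):
    """Read every page of the document."""
    return any(term in page.lower() for page in pages)


# Invented hundred-page report with the key sentence on the final page.
report = ["quarterly overview"] * 99 + ["note: the valve defect was reported in march"]

found_when_truncated = truncated_match("valve defect", report, max_pages=10)
found_when_complete = full_match("valve defect", report)
print(found_when_truncated, found_when_complete)
```

The truncated check reports no match even though the document contains the phrase, which is exactly the kind of miss that creates compliance exposure.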
The full potential of multimedia content is often not realized because it has traditionally required considerable manual effort to process. Consequently, intelligence lies dormant in resources such as recorded meetings, training videos and broadcast content. True Pan-Enterprise Search technology automatically captures, encodes and indexes television, video and audio content from any source, and provides users with the ability to search this content with pinpoint accuracy and treat rich media in the same way as more traditional forms of information.
When computers "understand" information, they can start to automatically process it and begin to bring information to the user rather than the other way round. For example, through forming an understanding, computers can automatically create taxonomies, alert users to new and relevant information in real-time or automatically profile an individual's interests based on what they read and write, offering them interesting information without the need to search or connect with similar people.
Clustering, Scene Detection, Speaker Identification and Sentiment Analysis
Understanding information allows computers to cluster information, identifying inherent themes or clusters of conceptually similar information. In addition, using this approach it is possible to detect irregularities in everyday scenes for security purposes, identify well-known speakers in broadcast media and analyze conversations to detect positive or negative sentiment.
In examining the different approaches to the challenge of unstructured information, it becomes clear that the solution does not boil down to plain search. It is only through understanding the meaning of ALL information that computers are able to automatically process it and provide users with the ability to handle and maximize the value of this rich resource. MBC addresses the full range of information challenges and consequently forms the central requirement of major enterprise deployments all over the world.