When we consider human information and its dominance in today's enterprise, it is natural to first wonder how we can effectively search and find the information we need. Typically, people search or analyze data using an attribute such as the date a video was taken, who is in a photo, or whether a blog gives a positive view of a product. Since computers have historically used databases to increase search efficiency, finding human information raises a number of new questions regarding how we can organize, process, and search it.
Human information comes in two categories:
Unstructured Text Data. Unstructured text data includes content posted to blogs, news feeds, documents, and social media interactions such as those occurring via Twitter and Facebook.
Unstructured Rich Media. Unstructured rich media includes photos, videos, sound files, and other forms of information that by default do not have any text information on their subject beyond simple metadata.
Using simple search methods, such as keywords, computers can send back every instance of a particular word or combination of words. Because these methods do not understand the meaning of the word "DOG," you get results that contain the word, but you still have to sift through the results to find what you want. More recent methods, such as rules, popularity ranking, and federation, improve the search process, but they still have limitations.
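The limitation of keyword matching can be illustrated with a minimal sketch. The function name `keyword_search` and the three sample documents are hypothetical and chosen only for illustration; they do not represent any particular search product.

```python
def keyword_search(documents, keyword):
    """Return every document containing the keyword as a whole word (case-insensitive)."""
    needle = keyword.lower()
    hits = []
    for doc in documents:
        # Split on whitespace and strip common punctuation from each word.
        words = [w.strip('.,!?"').lower() for w in doc.split()]
        if needle in words:
            hits.append(doc)
    return hits


# Hypothetical sample documents.
documents = [
    "My dog loves long walks in the park.",
    "The hot dog stand opens at noon.",
    "Our Labrador is man's best friend.",
]

results = keyword_search(documents, "dog")
for doc in results:
    print(doc)
```

The search returns the hot dog stand, which contains the word but not the idea, and misses the Labrador document, which contains the idea but not the word.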
To see why understanding the concepts contained in information matters, first consider the unique challenges posed by human information.
Information is diverse. Human information is not limited to one file type or source. It represents all types of information and does not fit neatly into a structured database. It includes text in the form of emails, documents, IMs, social media posts, and SMS messages; audio in the form of speech and sounds; video; XML; and images.
Ideas do not match; they have a distance. No two ideas are exactly the same, but they have degrees of similarity based on how conceptually close they are to each other. Consider the description "low-drag wing design expert" versus "high-efficiency aerofoil designer." These words do not match, but the ideas are conceptually "close." In turn, a very different idea, such as safari animals, would be conceptually "distant."
Context is important. Distances between ideas change with the context around them. When the story "Clinton Arrives by Car to Meet the Chinese Premier, Drives Up in Black Lincoln" appears, the main point changes based on who reads it. For most people, the news is that Clinton has met with the Chinese Premier. For the subscribers to Limousine, Charter & Tour Magazine, the real news is that Clinton arrived in a black Lincoln. When analyzing human information, the context must be understood to grasp the meaning of the information.
Information does not match exactly. There is a definitional problem when dealing with human information. When a user poses a query, information never matches exactly the way structured information would. The question "Is Snoopy a dog?" does not have a simple answer, as there are many ways to define Snoopy. You must take into account why he would or would not be considered a dog. The answer to a question is dependent on other pieces of information. For instance, if the answer is "No, he is a cartoon character," then Snoopy is not a dog. This demonstrates the relative nature of information.
Meaning is dynamic. In the age of social media, new slang terms are continually emerging. Even within the same phrase, a single word can have multiple meanings based on its intent. Take the tweet "Saw Red Riding Hood, the wicked wolf got boiled - it was really wicked." The word "wicked" means both bad and good in the same post, depending on where it appears. The ever-changing nature of a word's meaning makes it especially difficult to understand and process human information without the ability to understand context.
Meaning is multi-layered. Within the same set of phrases or words, there can be multiple levels or layers of meaning. We can see this principle best in poetry, where complex metaphors can run through a set of text, building on each other and adding depth.
Meaning is relative. What something means is closely related to your own perspective, such as your social or cultural viewpoint. Two opposing cultural groups will view a set of results very differently. Meaning also changes over time and is subject to historical perspective.
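The point that ideas have a distance rather than a match can be sketched with a simple similarity score. The example below is an illustration only: it uses cosine similarity over word counts, and the `CONCEPT_OF` table is a tiny hand-built, hypothetical mapping, not a description of how any real system derives concepts.

```python
import math
from collections import Counter


def bag(text, concept_of=None):
    """Count the words in a phrase, optionally replacing each word with a concept label."""
    words = text.lower().replace("-", " ").split()
    if concept_of:
        words = [concept_of.get(w, w) for w in words]
    return Counter(words)


def cosine_similarity(a, b, concept_of=None):
    """Cosine similarity of two phrases: 0.0 means nothing shared, 1.0 means identical counts."""
    va, vb = bag(a, concept_of), bag(b, concept_of)
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(n * n for n in va.values())) * math.sqrt(
        sum(n * n for n in vb.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical, hand-built mapping from words to shared concept labels.
CONCEPT_OF = {
    "wing": "aerofoil",
    "aerofoil": "aerofoil",
    "design": "design",
    "designer": "design",
}

a = "low-drag wing design expert"
b = "high-efficiency aerofoil designer"
c = "safari animals"

word_close = cosine_similarity(a, b)                 # word matching only
word_distant = cosine_similarity(a, c)               # word matching only
concept_close = cosine_similarity(a, b, CONCEPT_OF)  # with concept labels
concept_distant = cosine_similarity(a, c, CONCEPT_OF)
```

With word matching alone, both comparisons score zero, so the "close" pair and the "distant" pair look identical. Once words are mapped to shared concepts, the two aerofoil descriptions score above zero while the safari phrase stays at zero.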
The leap forward in the ability to understand human information comes with conceptual search. When a computer can understand that the letters "D-O-G" mean a dog, man's best friend, a Labrador, or an animal that likes to go for walks, the process becomes more human and the computer can do more of the work for us.
Yet the lack of structure in human information still makes the search process challenging, for the simple reason that people search or analyze data using an attribute of the data, such as the date a recording was captured, the people pictured in a photo, or whether a website gives a positive review of a product. This requires some form of metadata (data about data) to be tagged to the item or generated on the fly as the item is saved. If no such metadata exists, you will have difficulty finding the item, or may not be able to find it at all.
This issue is not an easy one to solve without human involvement. For example, how can software tell if a picture is of a yellow rose, a yellow Labrador named Rose, or a girl named Rose in a yellow dress? Compounding these challenges, human information is often more difficult to manage than structured or semi-structured information in terms of size, organization, and availability.
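The dependence of attribute-based search on metadata can be shown in a short sketch. The `MediaItem` class, the `find_by_attribute` function, the file names, and the tags are all hypothetical and exist only to illustrate the point.

```python
from dataclasses import dataclass, field


@dataclass
class MediaItem:
    """A stored file plus whatever metadata (data about data) was tagged to it."""
    filename: str
    metadata: dict = field(default_factory=dict)


def find_by_attribute(items, key, value):
    """Return the items whose metadata has the given key set to the given value."""
    return [item for item in items if item.metadata.get(key) == value]


# Hypothetical photo library; the third photo was saved without any tags.
library = [
    MediaItem("IMG_0001.jpg", {"subject": "yellow rose"}),
    MediaItem("IMG_0002.jpg", {"subject": "yellow Labrador named Rose"}),
    MediaItem("IMG_0003.jpg"),
]

matches = find_by_attribute(library, "subject", "yellow rose")
untagged = [item.filename for item in library if not item.metadata]
```

The untagged photo can never be returned by an attribute search, whatever it actually shows, which is why metadata has to be supplied by a person or generated by software that understands the content.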
Autonomy takes a unique approach to leveraging the power of weak information. We use a theory that says the less frequently a unit of communication occurs, the more information it conveys. By using a larger amount of conceptually related weak information to drive a search, you can obtain more relevant results than with a smaller number of seemingly strong keywords. For example, when you search for the word "penguin," there is about an 85 percent chance of bringing back a document about the flightless bird. But your search may also return information on the Batman villain, the publishing house, and the hockey team, along with the bird. On the other hand, a group of weak terms like "a black and white flash jumped into the sea and appeared with a fish in its beak" paints a much more accurate picture. In this case, the probability that this document is about the flightless bird is about 98 percent. Although each word is much weaker, and the phrase does not even include the word "penguin," together the words offer much clearer information. But to understand what these words are describing, you have to understand their context.
"A black and white flash jumped into the sea and appeared with a fish in its beak" Is it a penguin?
This approach is similar to understanding a conversation in a noisy room, where you can grasp the context of the discussion even when some of the words cannot be heard—or grasping the essence of a news article simply by skimming over the text. Autonomy creates a framework for extracting the concepts from content to determine the meaning of information.
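The idea that rarer units carry more information, and that many weak clues can outweigh one strong keyword, can be sketched numerically. This is a minimal illustration and not Autonomy's actual method: `self_information` is the standard information-theoretic measure, `combine_evidence` is a naive Bayes-style odds update swapped in for illustration, and every probability and likelihood ratio below is an assumed value, not a measured one.

```python
import math


def self_information(probability):
    """Bits of information conveyed by an event that occurs with this probability."""
    return -math.log2(probability)


def combine_evidence(prior, likelihood_ratios):
    """Update a prior probability with independent pieces of evidence.

    Each likelihood ratio says how much more likely a clue is when the
    document is about the topic than when it is not (naive independence
    assumption).
    """
    odds = prior / (1 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    return odds / (1 + odds)


# A rare term conveys more bits than a common one (assumed frequencies).
bits_rare = self_information(0.001)
bits_common = self_information(0.5)

# Assumed likelihood ratios: five weak clues versus one stronger keyword.
weak_clues = [2.0, 2.0, 2.0, 2.0, 2.0]  # e.g. black, white, sea, fish, beak
strong_keyword = [4.0]

posterior_weak = combine_evidence(0.5, weak_clues)
posterior_strong = combine_evidence(0.5, strong_keyword)
```

Under these assumed values, the five weak clues together yield a higher posterior probability than the single stronger keyword. The numbers are not intended to reproduce the 85 and 98 percent figures quoted above.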
Using the technology developed by Autonomy and the power of Meaning-Based Computing (MBC), it is now possible to derive meaning, spot patterns, 'connect the dots,' and automate business processes. Autonomy, an HP Company and a pioneer in the area of MBC, provides technology that allows you to derive insight, sentiment, and concepts from structured and unstructured human information to drive better enterprise decisions.
Autonomy's core technology, the Intelligent Data Operating Layer (IDOL 10), understands any type of unstructured information, including text, voice, audio, and video, as well as structured application data. It gives you the power to perform automatic operations such as hyperlinking, agents, summarization, taxonomy generation, clustering, eduction, profiling, alerting, and retrieval. For instance, it allows text to be searched and processed from databases, audio, video, text files, or click streams. Autonomy IDOL, combined with the latest advances in hardware and software, enables massive amounts of constantly created and updated unstructured, structured, and rich media data to be analyzed in real time.