| Technology |
|
Pan-Enterprise Search | | | Keyword & Boolean | | | Parsing |
|
Keyword and Boolean Searches
The most common information retrieval techniques, keyword and Boolean search, require users to input the exact words they are looking for into a text field. Upon submission, a search will return a list of documents that contain the search terms.
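As a rough illustration of the behaviour described above, the following minimal sketch matches exact words against a small set of documents using Boolean AND and OR. The function names (`tokenize`, `search_all`, `search_any`) and the sample documents are hypothetical, invented for this example; they are not part of any Autonomy product.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_all(documents, terms):
    """Boolean AND: ids of documents that contain every search term."""
    wanted = {t.lower() for t in terms}
    return [doc_id for doc_id, text in documents.items()
            if wanted <= tokenize(text)]


def search_any(documents, terms):
    """Boolean OR: ids of documents that contain at least one search term."""
    wanted = {t.lower() for t in terms}
    return [doc_id for doc_id, text in documents.items()
            if wanted & tokenize(text)]


# Hypothetical sample documents, invented for illustration only.
documents = {
    "doc1": "Quarterly results for the football club",
    "doc2": "Club members vote on new stadium",
    "doc3": "Quarterly weather summary",
}

print(search_all(documents, ["quarterly", "club"]))  # ['doc1']
print(search_any(documents, ["quarterly", "club"]))  # ['doc1', 'doc2', 'doc3']
```

Note that the match is purely lexical: a document is returned because the words are present, not because it is about the subject the user has in mind.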
Accuracy and Context
Keyword and Boolean search are accurate tools when used against a large corpus of data and when the user has very specific knowledge of the unique information they are seeking. Autonomy fully supports this approach. Nonetheless, while keyword search can match words and phrases within documents, it cannot tell how relevant the entire document is to the subject being researched.
Consider the following phrase:
Although the word "street" is mentioned several times, the phrase is really about a crime. If a keyword and Boolean search for the word "street" returned this phrase, the technique would prove to be very inaccurate.
To improve on this, keyword search techniques often rely on weighting to rank search results. If a keyword appears in a prominent place in the document (e.g., in the title), the document is given more importance, or a higher weighting, than one that contains the keyword in a less obvious place, such as the middle of the last paragraph. Higher weighting can also be given to a document that contains multiple occurrences of a keyword.
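The weighting idea can be sketched as follows. This is a simplified illustration under stated assumptions, not a description of how any particular engine scores results: the weights `TITLE_WEIGHT` and `BODY_WEIGHT`, the function names, and the sample documents are all hypothetical.

```python
import re

# Assumed weights: an occurrence in the title counts three times
# as much as an occurrence in the body. The values are arbitrary.
TITLE_WEIGHT = 3.0
BODY_WEIGHT = 1.0


def count_term(text, term):
    """Count exact, case-insensitive occurrences of a word in the text."""
    return re.findall(r"[a-z0-9]+", text.lower()).count(term.lower())


def score(document, term):
    """Weighted score: title occurrences are boosted, body occurrences add up."""
    return (TITLE_WEIGHT * count_term(document["title"], term)
            + BODY_WEIGHT * count_term(document["body"], term))


def rank(documents, term):
    """Return titles of matching documents, highest score first."""
    scored = [(score(doc, term), doc["title"]) for doc in documents]
    scored.sort(key=lambda pair: -pair[0])
    return [title for value, title in scored if value > 0]


# Hypothetical sample documents, invented for illustration only.
documents = [
    {"title": "Council minutes",
     "body": "Residents asked about the street lights. The street was closed twice."},
    {"title": "Street maintenance schedule",
     "body": "Crews will resurface the road in May."},
    {"title": "Budget report",
     "body": "No changes were proposed."},
]

print(rank(documents, "street"))
# ['Street maintenance schedule', 'Council minutes']
```

Here a single title match (score 3.0) outranks two body matches (score 2.0). The score reflects only position and frequency; it says nothing about what either document is actually about.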
In well-formatted and consistent data such as medical journals, weighting will improve results, and again Autonomy offers full support for this. However, weighting still does not take into account the context in which the word appears or the aspect of the topic discussed. Instead, the approach works on the assumption that if a word is in the title or mentioned often, the document as a whole must be relevant. Autonomy is able to overcome this problem by using sophisticated pattern matching techniques to form a contextual understanding of any document, and to suggest other documents that have matching concepts without relying on keywords.
Manual Refinement
Keyword search engines provide manual techniques to refine results including complex Boolean expressions, keyword tagging, librarian-maintained keyword associations and/or categories. Again, Autonomy supports these techniques out of the box, and they are frequently used by skilled knowledge workers.
However, keyword search engines do nothing more complex than look for a few words, so refining their results is labor-intensive and requires people to manage and update keyword associations and categories.
For example:
Keyword methodologies rely on the end user being able to author queries in a complex and specific language (also known as Boolean form). This requires an ability to construct unwieldy search 'rules'.
One initial rule may be: <Israel AND Palestine> OR <Israel OR Palestine>
This may seem sufficient. However, if the user is particularly interested in the escalation of tensions between Israel and Palestine, the rule above would return documents that do not relate to their focus:
And since many articles are written assuming the reader knows some of the background, some relevant documents may not include the keywords "Israel" or "Palestine":
So the rule must be modified to include such documents:
However, this may bring back results about other suicide attacks and would potentially miss valuable documents on a directly related topic such as the reaction of other interested parties (for example, the US Government). So, the rule must be modified again and again in order to return relevant information only:
This is only an initial rule, without even taking ongoing maintenance of the categories involved into account.
The example rule requires a match from each of the categories of name, violence phrase and location, but not all stories will contain all three, so those that do not will be passed over by the query. This approach therefore requires extensive, detailed manual effort in order to bring back results that still may not be the most accurate available. In contrast, Autonomy can automatically deduce the main topics in a document, and channel related material to the user without requiring any manual input from them.
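The effect of a category-based rule can be sketched as below. The term lists in `CATEGORIES` are hypothetical placeholders chosen for this example and do not reproduce the rule discussed above; only the structure (OR within a category, AND across categories) follows the description.

```python
import re

# Hypothetical category term lists, invented for illustration only.
# A real rule would be far longer and would need ongoing maintenance.
CATEGORIES = {
    "name": {"israel", "palestine"},
    "violence": {"attack", "clash"},
    "location": {"jerusalem", "gaza"},
}


def words_in(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def matches_rule(text, categories=CATEGORIES):
    """True only if the text contains at least one term from every category."""
    words = words_in(text)
    return all(words & terms for terms in categories.values())


def missing_categories(text, categories=CATEGORIES):
    """List the categories for which the text contains no term."""
    words = words_in(text)
    return [name for name, terms in categories.items() if not (words & terms)]


story_a = "Israel reports attack near Gaza"
story_b = "Talks resume after the attack in Gaza"

print(matches_rule(story_a))        # True
print(matches_rule(story_b))        # False
print(missing_categories(story_b))  # ['name']
```

The second story is passed over because it lacks a term from one category, even though a reader might consider it relevant. Widening the rule to catch it means editing the term lists by hand.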
Ability to Learn
Keyword search engines cannot "learn" through use. It is also very difficult for keyword search systems to find things by being shown an example. Typically a "more like this..." function will increase the number of keywords in the query based on what terms appear most frequently in the example document. Documents are matched based on keywords and therefore on the categories into which they fall. While Autonomy can support this method, its technology is also able to match documents based on the concepts they contain, which can be much more useful.
For example, someone interested in the financial dealings of Manchester United Football Club will be offered other articles on sport such as golf, tennis, maybe even football, by a traditional "more like this" function, while Autonomy could see that the focus of interest is financial in relation to a particular club and provide related documents accordingly.
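A keyword-based "more like this" function of the kind described above might look roughly like the following sketch. It assumes the simplest possible approach: take the most frequent non-trivial words in the example document and return any document sharing them. The stopword list, function names, and sample texts are hypothetical.

```python
import re
from collections import Counter

# Tiny assumed stopword list; real systems use much larger ones.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on", "for", "at"}


def content_words(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def top_terms(text, n=3):
    """The n most frequent content words in the example document."""
    return [word for word, _ in Counter(content_words(text)).most_common(n)]


def more_like_this(example, documents, n=3):
    """Rank documents by how many of the example's top terms they contain."""
    terms = set(top_terms(example, n))
    results = []
    for doc_id, text in documents.items():
        overlap = terms & set(content_words(text))
        if overlap:
            results.append((len(overlap), doc_id))
    results.sort(key=lambda pair: -pair[0])
    return [doc_id for _, doc_id in results]


# Hypothetical texts, invented for illustration only.
example = "The club reported club finances. Football club finances and football shares."
documents = {
    "golf": "Golf club announces new tournament",
    "match": "Football club wins the cup final",
    "accounts": "Annual accounts show rising debt at the listed company",
}

print(more_like_this(example, documents))  # ['match', 'golf']
```

In this toy data set the sports stories are returned because they share the words "club" and "football", while the one financial document is missed because it shares no keywords with the example.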
Autonomy's Approach
Autonomy has the unique ability to match concepts instead of simple keywords, although it does have the ability to perform standard Boolean text queries as well. The software takes into account the context in which terms appear, eliminating many false hits while also catching documents that may not contain the specific term, but do include the concept.