Limitations of Other Approaches

Keyword and Boolean Searches

The most common information retrieval techniques, keyword and Boolean search, require users to input the exact words they are looking for into a text field. Upon submission, a search will return a list of documents that contain the search terms.
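The retrieval model described above can be sketched with a toy inverted index. This is a minimal Python illustration under our own assumptions (lower-cased word tokens, every query term required), not Autonomy's implementation; the function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Build an inverted index mapping each term to the IDs of documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def keyword_search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every query term (implicit AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical sample documents.
docs = {
    "d1": "A long, dark street",
    "d2": "Street lighting budget",
    "d3": "Mugger arrested after attack",
}
index = build_index(docs)
print(sorted(keyword_search(index, "street")))  # ['d1', 'd2']
```

The index only records which words occur in which documents; nothing in it captures what a document is actually about.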

Accuracy and Context

Keyword and Boolean search are accurate tools when used against a large corpus of data and when the user has very specific knowledge of the unique information they are seeking. Autonomy fully supports this approach. Nonetheless, while keyword search can match words and phrases within documents, it cannot tell how relevant the entire document is to the subject being researched.

Consider the following passage:

"I was walking down the street the other night. It was a long street, a dark street and at the end of the street I was attacked by a mugger."

Although the word "street" appears four times, the passage is really about a crime. If a keyword and Boolean search for "street" returned this passage, the result would be irrelevant to someone researching streets, and the technique would prove very inaccurate.
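Counting terms makes the problem concrete. The sketch below assumes simple word tokenization and a small hand-picked stop-word list (both our own choices, not part of any particular product). It shows that "street" is by far the most frequent content word in the passage, while "mugger" and "attacked" appear only once each.

```python
import re
from collections import Counter

PASSAGE = (
    "I was walking down the street the other night. It was a long street, "
    "a dark street and at the end of the street I was attacked by a mugger."
)

# Hypothetical stop-word list chosen for this passage; real systems use larger lists.
STOP_WORDS = {"i", "was", "the", "it", "a", "and", "at", "of", "by", "down", "other", "end"}


def term_counts(text: str) -> Counter:
    """Count lower-cased word tokens, ignoring stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


counts = term_counts(PASSAGE)
print(counts["street"], counts["mugger"], counts["attacked"])  # 4 1 1
```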

To improve this, keyword search techniques often rely on weighting to rank search results. If a keyword appears in a prominent place in the document (e.g. in the title), the document is given more importance, or a higher weighting, than one that contains the keyword in a less obvious place, such as the middle of the last paragraph. A document that contains multiple occurrences of a keyword can also be given a higher weighting.
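A weighting scheme of this kind might look like the following sketch. The weights (3 for a title occurrence, 1 for each body occurrence), the `Doc` type and the sample documents are hypothetical; real engines choose and tune such values in their own ways. Note that the mugging passage from earlier still ranks first for "street", because repetition outweighs everything else.

```python
import re
from dataclasses import dataclass

TITLE_WEIGHT = 3.0  # hypothetical: a title occurrence counts three times as much
BODY_WEIGHT = 1.0   # hypothetical: each body occurrence counts once


@dataclass
class Doc:
    doc_id: str
    title: str
    body: str


def occurrences(term: str, text: str) -> int:
    """Number of times the single-word term appears as a whole word in text."""
    return re.findall(r"[a-z0-9]+", text.lower()).count(term.lower())


def score(doc: Doc, term: str) -> float:
    """Weighted score: title hits and body hits, each multiplied by its weight."""
    return TITLE_WEIGHT * occurrences(term, doc.title) + BODY_WEIGHT * occurrences(term, doc.body)


def rank(docs: list[Doc], term: str) -> list[str]:
    """Return IDs of matching documents, highest score first."""
    scored = [(score(d, term), d.doc_id) for d in docs]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]


docs = [
    Doc("a", "Street safety at night", "Advice for walking home."),
    Doc("b", "Local news", "A street party and a street fair."),
    Doc("m", "A night walk",
        "I was walking down the street the other night. It was a long street, "
        "a dark street and at the end of the street I was attacked by a mugger."),
    Doc("c", "Weather", "Rain is expected."),
]
print(rank(docs, "street"))  # ['m', 'a', 'b']
```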

In well-formatted and consistent data such as medical journals, weighting will improve results, and again Autonomy offers full support for this. However, weighting still does not take into account the context in which the word appears or the aspect of the topic being discussed. Instead, it assumes that if a word is in the title or mentioned often, the document as a whole must be relevant. Autonomy is able to overcome this problem by using sophisticated pattern matching techniques to form a contextual understanding of any document, and to suggest other documents that have matching concepts without relying on keywords.

Manual Refinement

Keyword search engines provide manual techniques to refine results including complex Boolean expressions, keyword tagging, librarian-maintained keyword associations and/or categories. Again, Autonomy supports these techniques out of the box, and they are frequently used by skilled knowledge workers.

However, keyword search engines do nothing more complex than look for a few words, so refining their results demands a great deal of manual effort: humans must manage and update the keyword associations or categories.

For example:

Keyword methodologies rely on the end user being able to author queries in a complex and specific language (also known as Boolean form), which means constructing unwieldy search 'rules'.

One initial rule may be: <Israel AND Palestine> OR <Israel OR Palestine>

This may seem sufficient. However, if the user is particularly interested in the escalation of tensions between Israel and Palestine, the rule above would return documents that do not relate to their focus:

'Israel: The number of Jews in Palestine was small in the early 20th century; it increased from 12,000 in 1845 to nearly 85,000 in 1914.'

And since many articles are written assuming the reader knows some of the background, some relevant documents may not include the keywords "Israel" or "Palestine" at all:

'Mr. Arafat responded to the suicide attacks by declaring a state of emergency in the West Bank and Gaza and arresting 75 militants.'
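Writing the first rule as ordinary Boolean logic shows both failures at once. In this sketch (our own encoding, matching on lower-cased whole words; `words` and `rule_1` are hypothetical helpers), the background sentence about early 20th-century Palestine matches, while the on-topic sentence about Mr. Arafat does not.

```python
import re

HISTORY = ("Israel: The number of Jews in Palestine was small in the early 20th century; "
           "it increased from 12,000 in 1845 to nearly 85,000 in 1914.")
ARAFAT = ("Mr. Arafat responded to the suicide attacks by declaring a state of emergency "
          "in the West Bank and Gaza and arresting 75 militants.")


def words(text: str) -> set[str]:
    """Set of lower-cased word tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def rule_1(text: str) -> bool:
    """<Israel AND Palestine> OR <Israel OR Palestine>, expressed in Python."""
    w = words(text)
    return ("israel" in w and "palestine" in w) or ("israel" in w or "palestine" in w)


print(rule_1(HISTORY), rule_1(ARAFAT))  # True False
```

Because the second clause already covers the first, the rule is equivalent to plain Israel OR Palestine.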

So the rule must be modified to include such documents:

<Arafat OR "suicide attacks" OR "state of emergency" OR "West Bank">

However, this may bring back results about other suicide attacks and would potentially miss valuable documents on a directly related topic such as the reaction of other interested parties (for example, the US Government). So, the rule must be modified again and again in order to return relevant information only:

<("Arafat" OR "Sharon" OR "Bush") AND ("suicide attacks" OR "state of emergency" OR "bombings") AND ("Israel" OR "Palestine" OR "West Bank" OR "Jerusalem")>

This is only an initial rule, without even taking ongoing maintenance of the categories involved into account.

The example rule requires a match from each of three categories (name, violent event and location), but not every relevant story will mention all three, and those that do not will be passed over by the query. This approach therefore requires extensive, detailed manual effort to bring back results that may still not be the most accurate available. In contrast, Autonomy can automatically deduce the main topics in a document and channel related material to the user without requiring any manual input from them.
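The final rule can be sketched as a small expression tree with phrase matching. The tuple encoding and the `normalize` and `matches` functions are hypothetical illustrations, not the query syntax of any particular engine, and the third test sentence is an invented headline. Because it names no person, it is passed over even though it mentions bombings in Jerusalem.

```python
import re


def normalize(text: str) -> str:
    """Lower-case text, keep word tokens only, and pad with spaces for phrase lookup."""
    return " " + " ".join(re.findall(r"[a-z0-9]+", text.lower())) + " "


def matches(rule, text: str) -> bool:
    """Evaluate a rule: a string is a word or phrase; a tuple is (operator, *operands)."""
    if isinstance(rule, str):
        # Padding both sides with spaces ensures whole-word, whole-phrase matches.
        return normalize(rule) in normalize(text)
    op, *operands = rule
    if op == "AND":
        return all(matches(r, text) for r in operands)
    if op == "OR":
        return any(matches(r, text) for r in operands)
    raise ValueError(f"unknown operator: {op}")


FINAL_RULE = (
    "AND",
    ("OR", "Arafat", "Sharon", "Bush"),
    ("OR", "suicide attacks", "state of emergency", "bombings"),
    ("OR", "Israel", "Palestine", "West Bank", "Jerusalem"),
)

ARAFAT = ("Mr. Arafat responded to the suicide attacks by declaring a state of emergency "
          "in the West Bank and Gaza and arresting 75 militants.")
HISTORY = ("Israel: The number of Jews in Palestine was small in the early 20th century; "
           "it increased from 12,000 in 1845 to nearly 85,000 in 1914.")
HEADLINE = "Two bombings in Jerusalem overnight raised fears of further escalation."  # hypothetical

print([matches(FINAL_RULE, t) for t in (ARAFAT, HISTORY, HEADLINE)])  # [True, False, False]
```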

Ability to Learn

Keyword search engines cannot "learn" through use. It is also very difficult for keyword search systems to find things by being shown an example. Typically a "more like this..." function will increase the number of keywords in the query based on what terms appear most frequently in the example document. Documents are matched based on keywords and therefore on the categories into which they fall. While Autonomy can support this method, its technology is also able to match documents based on the concepts they contain, which can be much more useful.
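A traditional "more like this" expansion of the kind described can be sketched as follows: take the example document's most frequent content words and append them to the query. The stop-word list, the value of `k`, the `more_like_this` function and the example text are all hypothetical. Note how the expansion adds "football", a term that would pull in football articles with no financial angle, which is the weakness the Manchester United example describes.

```python
import re
from collections import Counter

# Hypothetical stop-word list for this example.
STOP_WORDS = {"the", "and", "from", "said", "also", "a", "of", "in"}


def more_like_this(query_terms: list[str], example_text: str, k: int = 2) -> list[str]:
    """Expand a query with the k most frequent non-stop-word terms of an example document."""
    tokens = [
        t for t in re.findall(r"[a-z]+", example_text.lower())
        if t not in STOP_WORDS and t not in query_terms
    ]
    extra = [term for term, _ in Counter(tokens).most_common(k)]
    return list(query_terms) + extra


# Hypothetical example document about a club's finances.
EXAMPLE = ("Manchester United reported record revenue. The club said revenue from "
           "football sponsorship rose, and football broadcasting revenue also grew.")

expanded = more_like_this(["manchester", "united"], EXAMPLE)
print(expanded)  # ['manchester', 'united', 'revenue', 'football']
```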

For example, a traditional "more like this" function will offer someone interested in the financial dealings of Manchester United Football Club other sports articles, on golf, tennis or perhaps football, whereas Autonomy could see that the focus of interest is financial and relates to a particular club, and provide related documents accordingly.

Autonomy's Approach

Autonomy has the unique ability to match concepts instead of simple keywords, although it does have the ability to perform standard Boolean text queries as well. The software takes into account the context in which terms appear, eliminating many false hits while also catching documents that may not contain the specific term, but do include the concept.
