
Manual Tagging, Weighting and XML

With an upswing in enterprise portal use, it is imperative to create taxonomies that address various information types including documents, structured data, HTML, XML and multimedia. Manual tagging schemes are becoming an increasingly popular method of labeling digital material.

Descriptive Inconsistency

Each person will categorize or tag a given document differently. In order to retrieve it, a user must guess the category under which it was tagged. This often results in the correct document not being found.

Another problem inherent in this approach is that people can become lax in their tagging, so that the large majority of content ends up tagged under the category 'general'. This makes it difficult to find anything and renders the whole taxonomy system useless.

Further complications arise when subjects incorporate multiple themes. Should an article about "technology development in Russia within the context of changing foreign policy" be classified as (i) Russian technology, (ii) Russian foreign policy or (iii) Russian economics?

The decision process is both complex and time-consuming, and it introduces yet more inconsistency, particularly given the sheer number of options available to a user. For example, with over eight hundred tags for general newspaper subjects, choosing even a basic subject description within a reasonable time becomes a challenge.
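The retrieval problem described in this section can be illustrated with a minimal sketch. The article identifiers and tag assignments below are hypothetical; the point is only that an exact-match tag lookup returns nothing when the user's guess differs from each tagger's choice.

```python
from collections import defaultdict

# Hypothetical articles on the same subject, each tagged by a different person.
manual_tags = {
    "article-a": "Russian technology",
    "article-b": "Russian foreign policy",
    "article-c": "general",
}

def build_index(tags):
    """Map each tag to the set of documents filed under it."""
    index = defaultdict(set)
    for doc_id, tag in tags.items():
        index[tag].add(doc_id)
    return index

def find_by_tag(index, tag):
    """Exact-match lookup: the user must guess the tagger's choice."""
    return sorted(index.get(tag, set()))

index = build_index(manual_tags)

# The user guesses the third plausible category, which no tagger picked.
hits = find_by_tag(index, "Russian economics")
missed = sorted(set(manual_tags) - set(hits))

print(hits)    # documents found for the user's guess
print(missed)  # relevant documents the guess did not reach
```

Every article here is relevant to the query, yet none is retrieved, because the lookup compares tag strings and knows nothing about the content.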

Interoperability of Tagging

XML is not a set of standard tag definitions; it is a set of rules that allows you to define your own tags. This means that if two organizations are going to interoperate and apply the same meaning to the same tags, they have to explicitly agree upon their definitions in advance.

While this may prove possible for small groups of cooperating agents working over public networks, doubts remain as to whether this will scale to support an extended network of industry trading partners.
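A short sketch of the interoperability problem follows, using Python's standard xml.etree.ElementTree module. The two documents, the tag names and the mapping table are all hypothetical. Both documents are well-formed XML describing the same order, but code written against one organization's tags cannot read the other's until a mapping has been agreed.

```python
import xml.etree.ElementTree as ET

# Two hypothetical organizations describing the same order with different,
# privately chosen tag names. Both documents are well-formed XML.
ORG_A = "<order><customer>Acme</customer><total>100</total></order>"
ORG_B = "<purchase><buyer>Acme</buyer><amount>100</amount></purchase>"

def customer_name(xml_text):
    """Read the customer using organization A's tag vocabulary only."""
    root = ET.fromstring(xml_text)
    return root.findtext("customer")  # None when the tag is absent

# A mapping that both parties would have to agree upon in advance
# (hypothetical; one such table is needed per pair of vocabularies).
AGREED_MAPPING = {"purchase": "order", "buyer": "customer", "amount": "total"}

def translate(xml_text, mapping):
    """Rename tags according to the agreed mapping."""
    root = ET.fromstring(xml_text)
    for element in root.iter():  # iter() visits the root and all descendants
        element.tag = mapping.get(element.tag, element.tag)
    return ET.tostring(root, encoding="unicode")

print(customer_name(ORG_A))                             # A's own document
print(customer_name(ORG_B))                             # B's document, unmapped
print(customer_name(translate(ORG_B, AGREED_MAPPING)))  # B's document, mapped
```

The mapping table is the explicit agreement the text refers to. Each additional trading partner with its own vocabulary needs another such table, which is where the doubts about scaling come from.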

Idea Distancing

Tags also fail to highlight the relationships between subjects, a problem known as "idea distancing." There are often vital relationships between subjects that are tagged separately, such as wing design/low drag and aerofoil/efficiency. These categories overlap to a degree, so a user may be interested in the contents of both. However, unless the meanings of the category names are understood, nothing connects the two.
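The gap between category names and category contents can be sketched as follows. The two sample sentences are invented for illustration, and plain word overlap (Jaccard similarity) is used only as a crude stand-in for "understanding the meanings"; it is not a description of how any particular product relates subjects.

```python
import re

def words(text):
    """Lower-case a string and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Share of words common to both sets (0.0 means nothing in common)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# The two category names from the text.
tag_a = "wing design/low drag"
tag_b = "aerofoil/efficiency"

# Hypothetical one-sentence documents filed under each category.
doc_a = "A thinner wing section reduces drag and improves lift"
doc_b = "Aerofoil efficiency depends on lift and drag across the wing"

name_overlap = jaccard(words(tag_a), words(tag_b))
content_overlap = jaccard(words(doc_a), words(doc_b))

print(name_overlap)     # similarity of the category names
print(content_overlap)  # similarity of the documents filed under them
```

The category names share no words at all, while the documents filed under them clearly do, so a system that sees only the tags has no basis for relating the two.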

Not Scalable

Retrieving and processing tagged documents with high precision requires a very large number of tags. For example, the tags in use at a company such as Reuters run into the tens of thousands. However, as the number of tags increases, so too do the effort required and the likelihood of misclassification.

High Labor Costs

Taxonomy creation and tagging are still predominantly manual tasks that require input from librarians, users and IT staff. Making sense of information therefore carries large labor costs.

Autonomy's Approach

Autonomy adds a layer of intelligence to the management of XML: it understands the content and purpose of the tag itself, of the related information, or of both.

Further Reference: The Evolution of Search (HTML)
Further Reference: Autonomy XML White Paper (PDF)
