With an upswing in enterprise portal use, it is imperative to create taxonomies that address various information types including documents, structured data, HTML, XML and multimedia. Manual tagging schemes are becoming an increasingly popular method of labeling digital material.
Descriptive Inconsistency
Each person will categorize or tag a given document differently. In order to retrieve it, a user must guess the category under which it was tagged. This often results in the correct document not being found.
Another problem inherent in this approach is that people can become lax in their tagging, so that the large majority of content ends up tagged under the category 'general'. This makes it difficult to find anything and renders the whole taxonomy system useless.
Further complications arise when subjects incorporate multiple themes. Should an article about "technology development in Russia within the context of changing foreign policy" be classified as (i) Russian technology, (ii) Russian foreign policy, or (iii) Russian economics?
The decision process is both complex and time-consuming, and it introduces yet more inconsistency, particularly given the sheer number of options available to a user. For example, with over eight hundred tags for general newspaper subjects, choosing even a basic subject description within a reasonable time becomes a challenge.
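The retrieval failure described above can be illustrated with a minimal sketch. The document id, the two taggers, and the helper names are hypothetical; the only assumption is a simple index that maps each tag to the documents filed under it.

```python
from collections import defaultdict


def build_index(tagged_docs):
    """Map each tag to the set of document ids filed under it."""
    index = defaultdict(set)
    for doc_id, tags in tagged_docs.items():
        for tag in tags:
            index[tag].add(doc_id)
    return index


def search(index, guessed_tag):
    """Return the ids filed under exactly the tag the user guessed."""
    return sorted(index.get(guessed_tag, set()))


# Hypothetical: the same multi-theme article, filed by two different people
# who each had to pick a single category.
filed_by_first_tagger = {"article-1": ["Russian technology"]}
filed_by_second_tagger = {"article-1": ["Russian foreign policy"]}

# A user who guesses "Russian foreign policy" finds the article only if the
# tagger happened to make the same choice.
miss = search(build_index(filed_by_first_tagger), "Russian foreign policy")
hit = search(build_index(filed_by_second_tagger), "Russian foreign policy")
```

Here `miss` is empty and `hit` contains the article, even though the content is identical in both cases; only the tagger's choice differs.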
Interoperability of Tagging
XML is not a set of standard tag definitions; it is a set of definitions that allow you to define tags. This means that if two organizations are going to interoperate and apply the same meaning to the same tags, they have to explicitly agree upon their definitions in advance.
While this may prove possible for small groups of cooperating agents working over public networks, doubts remain as to whether this will scale to support an extended network of industry trading partners.
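A minimal sketch of what such an explicit agreement involves, using Python's standard `xml.etree.ElementTree`. The two sample documents, their tag names, and the `B_TO_A` mapping are all hypothetical; this illustrates the manual-agreement problem, not any vendor's method.

```python
import xml.etree.ElementTree as ET

# Two hypothetical organizations describe the same article with different tags.
ORG_A_XML = "<article><headline>Wing design</headline><topic>aerospace</topic></article>"
ORG_B_XML = "<doc><title>Wing design</title><subject>aerospace</subject></doc>"

# The "explicit agreement": a hand-maintained mapping from B's tags to A's.
B_TO_A = {"doc": "article", "title": "headline", "subject": "topic"}


def translate(element, mapping):
    """Return a copy of element with tag names rewritten via mapping."""
    new = ET.Element(mapping.get(element.tag, element.tag), element.attrib)
    new.text = element.text
    new.tail = element.tail
    for child in element:
        new.append(translate(child, mapping))
    return new


def field(xml_text, tag):
    """Return the text of the first child with the given tag, or None."""
    return ET.fromstring(xml_text).findtext(tag)


# Without the mapping, A's tag name finds nothing in B's document.
before = field(ORG_B_XML, "headline")

# With the mapping applied, the same lookup succeeds.
translated_b = ET.tostring(translate(ET.fromstring(ORG_B_XML), B_TO_A), encoding="unicode")
after = field(translated_b, "headline")
```

Every pair of partners with differing schemes needs a mapping like `B_TO_A`, and each mapping must be maintained as either scheme changes, which is why the approach becomes harder to sustain as the network grows.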
Idea Distancing
Tags also fail to highlight the relationships between subjects. There are often vital relationships between seemingly separately tagged subjects, such as wing design/low drag and aerofoil/efficiency; this concept is known as "idea distancing." These categories clearly overlap, so a user interested in one may well be interested in the contents of both. However, to a system that does not understand the meanings of the category names, there is no apparent correlation between the two.
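To make the point concrete, the sketch below compares categories by the vocabulary of the documents filed under them rather than by their names. The term sets are invented for illustration, and Jaccard similarity is a stand-in chosen for simplicity; it is an assumption of this example, not a description of how Autonomy's technology works.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical data: tag path -> words appearing in documents under that tag.
CATEGORY_TERMS = {
    "wing design/low drag": {"wing", "drag", "lift", "airflow", "profile"},
    "aerofoil/efficiency": {"aerofoil", "lift", "drag", "airflow", "efficiency"},
    "finance/bonds": {"yield", "coupon", "maturity"},
}


def related(tag, threshold=0.3):
    """Return other tags whose content overlaps with `tag` above the threshold."""
    base = CATEGORY_TERMS[tag]
    return [
        other
        for other, terms in CATEGORY_TERMS.items()
        if other != tag and jaccard(base, terms) >= threshold
    ]


# The two tag names are different strings, so exact tag matching links nothing,
# but the content under them overlaps.
linked = related("wing design/low drag")
```

With this sample data, `linked` contains only `"aerofoil/efficiency"`: the two categories share three of seven distinct terms, while the unrelated finance category shares none.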
Not Scalable
In order to be very specific in the retrieval and processing of tagged documents, the number of tags will need to be very high. For example, tag numbers in a company such as Reuters run into the tens of thousands. However, as the number of tags increases, so too do the effort required and the likelihood of misclassification.
High Labor Costs
Taxonomy creation and tagging are still predominantly manual tasks requiring input from librarians, users and IT staff. This means that large labor costs are involved in making sense of information.
Autonomy's Approach
Autonomy adds a layer of intelligence to the management of XML and understands the content and purpose of the tag itself, the related information, or both.
This is a small selection of the Autonomy case studies available. Please visit our publications site at http://publications.autonomy.com/ for more information.