Autonomy - Adding Intelligence to XML
The Extensible Markup Language (XML) is the universal format for structured documents and data on the Web.
Overview
XML is becoming an increasingly popular method of labeling and tagging digital material. However, there are significant barriers to ensuring that XML decreases the costs and increases the efficiency of managing information. Insufficient awareness of these barriers, and a lack of understanding of how to automate the otherwise burdensome administrative processes upon which XML depends, can lead to high labor costs and descriptive inconsistency.
Autonomy addresses these barriers by automating and managing XML tagging. Autonomy can therefore be viewed as the oil that enables the wheels of XML to turn in practice.
What is XML?
XML (eXtensible Markup Language) provides a standard syntax for labeling digital information and is designed to automate the rapid identification of material and the seamless exchange of data between computers.
XML is designed to overcome the constraints of HTML as a single, inflexible document type while avoiding the complexity of full SGML. It facilitates the mapping, transformation and routing of information based on a common set of "document type definitions" (DTDs). Its most commonly perceived applications are providing metadata structure for unstructured content and exchanging data between different applications and operating systems.
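To make the idea of metadata structuring concrete, the following minimal sketch wraps a piece of unstructured text in descriptive XML tags using Python's standard library. The element names (document, metadata, subject, author, body) and the sample values are illustrative assumptions and do not come from any particular DTD.

```python
import xml.etree.ElementTree as ET


def wrap_with_metadata(text, subject, author):
    """Wrap unstructured text in a small XML record with descriptive tags.

    The element names used here are hypothetical, chosen only for illustration.
    """
    doc = ET.Element("document")
    meta = ET.SubElement(doc, "metadata")
    ET.SubElement(meta, "subject").text = subject
    ET.SubElement(meta, "author").text = author
    ET.SubElement(doc, "body").text = text
    # Serialize the element tree to a Unicode string.
    return ET.tostring(doc, encoding="unicode")


# Hypothetical sample values.
record = wrap_with_metadata(
    "Wing profiles that reduce drag...", "wing design", "J. Smith"
)
print(record)
```

Any program that knows these element names can then locate the subject or author without having to read the body text.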
XML is therefore likely to feature prominently in the future development of online information sources. However, like all tagging schemes, it suffers from a number of limitations:
Limitations
-
Manual Processes
The limitations of XML begin with the manual process employed to choose and apply the tags. One example of the effect of human behaviour, and of the inherent limitations of manually describing information (albeit from existing descriptions), is the result of a US Department of Defense edict mandating that internal users responsible for authoring documents also create an appropriate description of each document's content. At first glance, this seemed a sensible and pragmatic decision. However, after many months of activity, it was discovered that the vast majority of documents had been loosely described and tagged as "general".
Whilst XML attempts to break away from such generalist terms, it remains dependent upon the same shortcomings of human behaviour, which manifest themselves as "inconsistency". An individual's ability to describe information is dependent upon their personal experience, knowledge and opinions. Such "intangibles" vary from person to person and are also dependent upon circumstance, dramatically reducing the effectiveness of the results.
Further complications arise when subjects incorporate multiple themes. Should an article about "technology development in Russia within the context of changing foreign policy" be classified as (i) Russian technology, (ii) Russian foreign policy, or (iii) Russian economics? The decision process is both complex and time-consuming and introduces yet more inconsistency, particularly when the sheer number of options available to a user is considered. For example, the more than 800 tags available for general newspaper subjects make choosing even a basic subject description within a reasonable time-scale a challenging task.
-
Idea Distancing
Tags also fail to highlight the relationships between subjects. This problem, termed "idea distancing", arises because there are often vital relationships between subjects that are tagged separately, for example /wing design/low drag/ and /aerofoil/efficiency/. The first category may contain information about the way wings are designed to achieve low air resistance. The second discusses ways in which efficient aerofoils are made. Obviously, there will be a degree of overlap between these categories, and a user may therefore be interested in the contents of both. However, without an understanding of the meanings of the category names, there is no clear correlation between the two.
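The gap between tag names and tag contents can be sketched in a few lines of Python. The example below compares the two category paths from the text, first by the words in their names and then by the words in a short description of their contents. The descriptions are hypothetical, and simple word overlap (Jaccard similarity) is used purely as a stand-in for illustration; it is not a description of how Autonomy's software works.

```python
def path_terms(path):
    """Split a category path such as '/wing design/low drag/' into lowercase words."""
    return {
        word
        for part in path.strip("/").split("/")
        for word in part.lower().split()
    }


def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


tag_a = "/wing design/low drag/"
tag_b = "/aerofoil/efficiency/"

# Hypothetical one-line summaries of what each category contains.
content_a = "designing wings and aerofoil sections to achieve low air resistance"
content_b = "making efficient aerofoil sections with low air resistance"

# The category names share no words at all.
name_similarity = jaccard(path_terms(tag_a), path_terms(tag_b))

# The described contents do overlap.
content_similarity = jaccard(set(content_a.split()), set(content_b.split()))

print(name_similarity, round(content_similarity, 2))
```

Under these assumptions the names score zero while the contents share several terms, which is the situation the paragraph above describes: the relationship exists in the material but is invisible in the tags.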
-
Specificity
For the retrieval and processing of XML-based documents to be very specific, the number of tags needs to be very high. For example, the number of tags used in a company such as Reuters runs into the tens of thousands. However, as the number of tags increases, so do both the effort required and the likelihood of misclassification.
-
Interoperability of Tagging
XML is not a set of standard tag definitions; it is a set of rules that allows you to define your own tags. This means that if two organizations are to interoperate and attach the same meaning to the same tags, they have to agree explicitly on their definitions in advance.
Whilst this may prove possible for small groups of co-operating agents working over public networks, doubts remain as to whether this will scale to support an extended network of industry trading partners.
For some sectors in particular, such as the automotive industry, interoperability has become critical. With the advent of just-in-time deliveries, vendor managed inventories, supply chain integration and a greater reliance on transportation and warehousing, the need to reconcile different industry vocabularies has increased. However, creating XML specifications that encourage transparent interoperability will require a focused approach based on an understanding of global business needs.
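The need to agree definitions in advance can be illustrated with a small sketch. Here two trading partners use different tag names for the same concepts, and a mapping table agreed beforehand is used to translate one vocabulary into the other. All tag names, the mapping and the sample document are hypothetical; the point is only that any tag missing from the agreed mapping cannot be interpreted automatically.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping agreed in advance between two trading partners:
# supplier tag name -> buyer tag name.
TAG_MAP = {"item": "orderLine", "partNo": "catalogNumber", "qty": "quantity"}


def translate(xml_text, tag_map):
    """Rename elements according to tag_map and report tags with no agreed meaning."""
    root = ET.fromstring(xml_text)
    unmapped = set()
    for elem in root.iter():
        if elem.tag in tag_map:
            elem.tag = tag_map[elem.tag]
        else:
            # No agreed definition exists, so the tag is left unchanged.
            unmapped.add(elem.tag)
    return ET.tostring(root, encoding="unicode"), unmapped


supplier_doc = "<item><partNo>A-113</partNo><qty>40</qty><colour>red</colour></item>"
translated, unmapped = translate(supplier_doc, TAG_MAP)
print(translated)
print(unmapped)
```

In this sketch the colour element falls outside the agreement and is reported as unmapped. With thousands of tags and many partners, maintaining such mappings by hand becomes the scaling problem described above.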
Autonomy and XML
Autonomy's software avoids traditional problems associated with XML because our core technology enables computers to understand a page of unstructured information and automatically insert appropriate XML tags. Autonomy is thus completely XML compliant and enables the efficient operation, creation and changing of XML meta information.
Based on the category inferred from the content of the document, Autonomy automatically marks data with an XML tag. These tags enable information to be reused and maintained, and allow the system to categorize this information automatically or deliver it to the right people. Employees or administrators no longer need to waste time manually inserting XML tags, which shortens development cycle times, lowers labor costs, and removes the inefficiencies of human error.
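The general flow of inferring a category from content and then emitting a tag can be sketched as follows. This is a deliberately simple keyword-counting stand-in, not Autonomy's technology: the categories, keyword lists, element name and attribute name are all hypothetical and exist only to show the infer-then-tag sequence.

```python
import xml.etree.ElementTree as ET

# Hypothetical categories, each with a hand-picked keyword set.
CATEGORY_KEYWORDS = {
    "technology": {"software", "engineering", "technology", "computing"},
    "foreign-policy": {"treaty", "diplomatic", "policy", "embassy"},
    "economics": {"inflation", "trade", "market", "budget"},
}


def infer_category(text):
    """Pick the category whose keywords occur most often; 'unclassified' if none match."""
    words = {w.strip(".,;:!?\"'()").lower() for w in text.split()}
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"


def tag_document(text):
    """Wrap the text in a <document> element carrying the inferred category."""
    doc = ET.Element("document", {"category": infer_category(text)})
    doc.text = text
    return ET.tostring(doc, encoding="unicode")


sample = "New software engineering centres reflect a shift in technology policy."
print(tag_document(sample))
```

A counting scheme like this handles the easy cases only. The sample sentence also touches on policy, which is the multi-theme difficulty discussed under Manual Processes, so a real system would need a far richer model of content than a keyword list.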
Autonomy: XML and specific applications
A more subtle application of Autonomy and XML combined lies in areas such as supply chain management. Here XML is well suited to recording precise product codes or catalog numbers accurately, but additional unstructured information may be required to relay qualitative or supplementary detail.
In such cases, in addition to automating the creation of the tag itself, Autonomy is able to analyze and process related peripheral information. For example, an aircraft manufacturer may have agreed to automate a number of component deliveries that, in practice, are influenced by additional pieces of information relating to changes in manufacturing techniques, support issues or installation instructions. Normally, this is the point at which the automation of supply chain management breaks down: human beings have to process such information manually or, worse, the information is discarded or goes unrecognized altogether.
The same issues exist in commerce applications. XML may enable e-commerce vendors to tag products and the information associated with them (price, size, color, features) in a common way, making it easy for customers to comparison shop across the Web. However, the automated component of the model again breaks down in cases such as a flowery summer dress that could equally be classified as a floral print dress. Ultimately, while human-readable XML tags provide a simple data format, it is the intelligent definition of these tags and common adherence to their usage that will determine their value. To benefit fully from XML, it will be necessary to deal with both exception processing and idea distancing, each of which is vital to making such a system work.
Conclusion
Autonomy addresses the inefficiencies introduced by the manual processes associated with creating XML tags and, in addition, adds a layer of intelligence to the management of XML by understanding the content and purpose of the tag itself, of related information, or of both.
Autonomy can therefore be viewed as the oil that enables the wheels of XML to turn in practice.













