Why is XML the future for the sharing of parliamentary information?

XML is the current hot topic in information systems. Suppliers, industry groups and standards makers are all rushing to announce support for it. Most people have now at least heard of it, but there is still widespread misunderstanding of what XML really is and, more importantly, its information system implications.

Accordingly, this paper first presents, with as little technical complexity as possible, a very brief summary of what XML is and how it differs from SGML and HTML and then discusses its information systems (IS) implications. Although concentrating on information sharing in the parliamentary context, the principles are applicable far more widely. Only when viewed from a “wide area” perspective can the real impact of XML in information sharing be appreciated.

This paper considers only the structural, information representation and interchange issues of XML. In particular, imaging, style and linking are not discussed. To aid clarity and succinct presentation, some generalisations and technical simplifications have been made in this document.

What is XML?

XML, the Extensible Markup Language, is not strictly a language. It is a meta-language, used to write other languages, specific to particular applications and markets. XML governs syntax only, not content, not meaning.

Its key features are:

  • Simple – the XML standard is 30 pages; SGML is 150 pages, not counting the recent annexes.
  • Human readable – like SGML and HTML.
  • Machine readable – like SGML and HTML but more regular, so easier to parse. A design goal of XML was that it should be possible for a competent programmer to write a parser in a week; compare that with the complexities of SGML and the present chaotic state of HTML.
  • Extensible – nothing is fixed except the way in which new things are defined.
  • Supports Unicode.
  • Supports Metadata.

An XML document has a physical structure, composed of entities, and a logical structure, comprising declarations, elements, comments, character references and processing instructions. The elements are the important part, the structured text, with start tags, end tags, empty element tags and name/value attribute pairs.
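
As a minimal sketch (every element name here is invented for illustration, not drawn from any real vocabulary), a complete XML document showing each of these constructs might be:

    <?xml version="1.0" encoding="UTF-8"?>
    <!-- a comment; "question" below is the root element -->
    <question id="Q42" status="pending">
      <member>A. Andr&#233;asson</member>
      <text>When will the report be published?</text>
      <?typeset new-page?>
      <answered/>
    </question>

Here question carries two name/value attribute pairs, <answered/> is an empty element tag, &#233; is a character reference (é) and the <?typeset new-page?> line is a processing instruction.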

A related standard, RDF, the Resource Description Framework, uses XML to represent metadata, “data about data” and more complex information. Like XML itself, it does not contain any predefined vocabularies, but describes a way of using XML to make statements about resources and their properties. It can be thought of as a “metadata meta-language” or a “meta-grammar”.

Both XML and RDF are fundamentally simple and, for most practical purposes, infinitely flexible. These factors, together with the now widespread knowledge of “tag” style languages created by HTML, have combined to produce an industry uptake far more rapid than for SGML and far more rapid than expected. Even the quick standardisation activities of W3C and IETF are having a hard time keeping up with the pace. There are currently 67 organisations known to be producing industry XML specifications.

XML and RDF have already both grown far beyond what they were intended for. Together, they are going to create a revolution in distributed information processing. HTML produced a revolution in information publication and access; XML’s revolution is about content, description and processing - the information structures, high quality questions, information interchange and distributed processing that structured information and metadata make possible.

Even within an organization, the ability to use a single, vendor and platform neutral file format for information creation, editing, publishing, archiving and retrieval is a revolution in information processing.

Where it came from

XML and HTML both derive from SGML. SGML, in turn, derives from IBM’s DCF GML (Document Composition Facility Generalized Markup Language), developed by Charles Goldfarb and colleagues at IBM in the 1970s; SGML itself was standardised as ISO 8879 in 1986. Markup had, of course, already been used in the electronic typesetting industry for a number of years and in the conventional printing industry for long before that.

Work began on XML in W3C in 1996 and it was recommended as an industry standard in February 1998 [XMLSPEC]. The declared goal was “to enable generic SGML to be served, received and processed on the Web in the way that is now possible with HTML”. Yet XML is now being used as the basis for things that SGML could never do, and describing HTML as processable is somewhat ridiculous, as any programmer who has attempted to process it will confirm.

It was also described by Jon Bosak in the 1997 article “XML, Java and the Future of the Web” [XMLAPPS] in the often-quoted phrase “giving Java something to do”. That may have been intended to help promote Java (Bosak does work for Sun) but the future position of Java on the client desktop is still far from certain and the entire client/server model has been undergoing radical rethinking in recent months. Thin clients are the new fashion. Sun’s latest JavaStation replacement contains no Java at all. The JavaStation is dead; the Network Computer is dead. Web terminals may turn out to be the new commodity “dumb terminal”, but actually a lot smarter. Web information appliances are one of several new futures.

The truth is that like all good ideas that enjoy widespread uptake, including the PC itself, XML is finding its own level and it is not the level that its creators intended. It might be better to look on XML as “giving something to keep Java for”, using it to write agent software for the new generation of application-specific information appliances but XML in no way depends on Java for its future.

Differences from HTML

  • No presentation elements – structural elements only (but there is nothing to stop the definition of elements that have a meaning to a presentation parser);
  • More rigorous syntax – no end tag minimisation, all element attributes quoted, empty elements must have /> terminator; makes it much easier to parse;
  • Case-sensitive – everywhere, markup and attributes;
  • Continuation after parsing errors not permitted – enforces well-formed code.
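
A contrived pair of fragments makes these rules concrete (tag and attribute names are illustrative only). The first line is tolerated by HTML browsers; the second is the only form an XML parser will accept:

    <P>An unclosed paragraph with an unquoted attribute: <IMG SRC=logo.gif>

    <p>A closed paragraph with a quoted attribute and an
       empty-element terminator: <img src="logo.gif"/></p>

Note also the consistent case: to XML, <P> and <p> would be two different element types.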

Differences from SGML

  • Nearly 50 rule differences between XML and SGML [SGMLXML].
  • XML is not even “pure” SGML; it depends on the changes made to SGML by the Web SGML Adaptations Annex.
  • When XML abandons DTDs in favour of XML Schema, will it really be SGML any more?

It is now time to stop thinking of XML as “a subset of SGML”, to forget about where it came from and, instead, to look to the future. SGML has not been a failure but it has not been a great success outside large scale publishing activities and related areas, which is what it was always intended for (“SGML can be used for publishing in its broadest definition”, to quote the introduction of ISO 8879). Its use as an interchange language is limited and its presence on the Web is negligible. Despite the Extended Naming Rules Corrigendum and the Web SGML Adaptations Annex, SGML is already a legacy system, overtaken and eclipsed by XML, which is moving into the Web mainstream in a way SGML never managed.

As with so many things, time and place were crucial factors; factors as important, if not more important, than the technical issues involved.

Time and place

The XML specification was published in February 1998. The programme and papers of the October 1998 ECPRD seminar, in Budapest, contained no references to XML. What a difference a year makes. In that time, XML has gone from an interesting new development to a fundamental strategic technology that nobody can ignore and that you will be using in one way or another within three years. Within five years you will be using little else.

Today, and for the foreseeable future, the Web governs information systems development. Like the PC architecture and MS Windows, that situation may frustrate the introduction of truly innovative new technologies, but it has given the customer unprecedented choice and value for money in a commodity marketplace. XML will become the common information representation standard and will provide benefits to the consumer that could be even greater – content as commodity. The power to enforce competition that XML and agent technology will bring to customers is something that industry is only just starting to appreciate.

The uptake of XML will far exceed the pace of uptake of the Web itself because Web uptake required both content and access infrastructure; XML is all content and so can be deployed very quickly across the existing, and still rapidly growing, infrastructure.

What can XML be used for?

  • XML as SGML replacement – what many of the MLs are – vocabulary definitions; “ontologies” as knowledge engineers like to call them.
  • XML as HTML replacement – clean code, better linking and with presentation issues separated from structure.
  • Client side selection – hardware capabilities, local language, sort order, etc. (at the expense of server side control).
  • Internal representation - XML as a database format.
  • External representation - XML as an interchange format, e.g. Microsoft Office 2000 and Corel Office 2000.
  • Middleware – perhaps the most exciting - XML as a transactional lingua franca, query/response, agent negotiation, everything as a transformation. Generically, it is messaging and this is the hottest topic within the hot topic of XML because this is what will drive e-Commerce. Another role not thought of when XML was first proposed.

Structural richness

XML, like SGML but unlike HTML, is structurally rich. Structural elements can be specified to whatever level desired. The better the structure is defined, the easier it is to ask high quality questions and to process the data automatically because each element can be identified easily and reliably. The earlier in the information creation process that the structural richness is included, the easier it is to process the information at subsequent stages.

Because the element identification scheme used by XML is intended to be equally suitable for human and machine interpretation, but is defined in such a way as to ensure that machine processing is easier than for SGML, XML is well suited to automatic information interchange and processing.
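
As a sketch of what such richness might look like for a parliamentary record (the vocabulary is invented for illustration, not a proposal):

    <sitting date="1999-10-14">
      <debate subject="Public sector information">
        <speech>
          <speaker constituency="Northtown">A. Member</speaker>
          <para>I should like to ask the Minister when the
            report will be laid before the House.</para>
        </speech>
      </debate>
    </sitting>

A question such as “find every speech by this member on this subject” then becomes a precise structural query rather than a free-text guess.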

An XML document can be checked automatically to see if it is “well-formed” without a DTD or any knowledge of the element vocabulary. If a DTD or XML Schema is available, the document can be checked to see if it is valid as well. It is important to bear in mind the distinction between “well-formed” and “valid”.

  • Well-formed – conforms to the XML syntax rules;
  • Valid – well-formed and conforms to a particular XML application, defined in a DTD or Schema.

Note that neither of these checks that the document is “meaningful”, “useful” or “correct” at some semantic content level; those are matters for application software or RDF Schema.
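
Continuing the invented sitting vocabulary above, a DTD against which such documents could be validated might look like this:

    <!DOCTYPE sitting [
      <!ELEMENT sitting (debate+)>
      <!ATTLIST sitting date CDATA #REQUIRED>
      <!ELEMENT debate (speech+)>
      <!ATTLIST debate subject CDATA #REQUIRED>
      <!ELEMENT speech (speaker, para+)>
      <!ELEMENT speaker (#PCDATA)>
      <!ATTLIST speaker constituency CDATA #IMPLIED>
      <!ELEMENT para (#PCDATA)>
    ]>

A document that placed a para directly inside a debate would still be well-formed but would no longer be valid against this DTD.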

Why is XML the future for the sharing of parliamentary information?

The simplistic answer is, of course, because it is the future for the sharing of all information. And not only sharing; it will provide the mechanisms for searching, retrieval and messaging.

However, there are some aspects of parliamentary information that make it particularly suited to XML which will, in turn, facilitate the sharing of that information.

What are the particular characteristics of parliamentary information?

  • Primarily textual - huge amounts of text.
  • Well-defined document repertoires.
  • Well-defined document contents (obvious logical and semantic structural elements, even if they may not be easily identifiable by a machine parser).
  • Relatively simple structures.
  • Lots of metadata – all that library information.
  • A single “industry-group”.

These characteristics make parliamentary information an ideal candidate for XML, not only for interchange but also as an internal working format, an external publishing format and an archiving and retrieval format. They also mean that it will be far easier for parliaments to move to XML than for most other public sector bodies and commercial organisations.

There are many ways in which the use of XML can assist in the processing of parliamentary information but three important XML capabilities are worthy of particular note:

  • Namespaces
  • Metadata
  • Fragment handling

Namespaces

The ability within XML documents to have either locally standardised element type names (e.g. via a DTD) or non-standard, ad hoc element names (a “DTD-less document”) is very flexible but can lead to interpretation problems when XML information is interchanged because of vocabulary mismatch. But, to quote Tim Berners-Lee and Dan Connolly, “the ability to combine resources that were developed independently is an essential survival property of technology in a distributed information system.” [WEBARCH]. Or, put another way, “it must scale to the Web”.

XML addresses this “vocabulary mix-in” requirement by the use of namespaces, which can be referenced using a URI, and used to map local identifiers to global identifiers. Simple in theory, but when the specification for namespaces in XML was published in January 1999 [XMLNAMES], it caused so much confusion that an explanatory note had to be published a month later in an attempt to clarify matters.

A very simple analogy for namespaces is telephone numbers. A national phone number area code is local, in the sense that it is unambiguous only within the context of that country, but when qualified by the country code it becomes unique in the context of the world – global, i.e. globally unambiguous. Essentially, XML namespaces work like that, except that, instead of assuming that all the country codes have been fixed previously in some implied standard, they also permit specifying where the country’s area code reference book can be found online.

Namespaces ensure that there is no vocabulary conflict but they do not address the problem of vocabulary reconciliation. Syntactic and semantic interoperability are very different things. The semantic level problem has to be addressed either by application software or by RDF metadata schemas specifying relationship rules.

SGML cannot support namespaces, because DTDs lack namespace-awareness; yet another of the many reasons for the move towards discarding DTDs in favour of XML Schemas, which can support them. It is also, of course, one of the reasons that SGML cannot scale to the Web, despite the originally stated goal of XML.

Another important concept introduced with namespaces is that of multiple interpretation contexts: it will be possible to have more than one namespace in scope at any point in a document, allowing compound expressions that combine elements from different namespace vocabularies. Programming languages have long allowed the combining of calls to functions in multiple modules; it is so fundamental as to have been forgotten, but it is not possible in SGML.
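
A sketch of such a mix-in, with two invented vocabularies and namespace URIs:

    <sitting xmlns="http://www.parliament-a.example/vocab"
             xmlns:ep="http://www.europarl.example/vocab">
      <speech>
        <ep:Speaker>A. Member</ep:Speaker>
        <para>Mr President, I welcome the report.</para>
      </speech>
    </sitting>

Both vocabularies are in scope within speech: a parser resolves para against the default namespace and ep:Speaker against the second, with no possibility of name collision.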

Each parliament could define its own vocabulary, referenced with its own namespace. Even individual departments within a parliament could define their own vocabularies. In traditional IS thinking, this is a recipe for chaos, yet it is how people work and it is precisely how the Web must come to work in the shift from unstructured to semi-structured content. An IS strategy cannot be enforced on the entire Web; this has implications for corporate Intranet strategies as well.

However, for reliable, automated information interchange (as opposed to browsing, searching and retrieval), an agreed, sector-specific vocabulary is required, hence the 67 industry groups busy producing XML specifications. The emerging model is defined vocabularies for the “B2B” (business to business) sector, where reliable, verifiable information interchange is important, but no defined vocabularies for the “customer facing” general Web information publishing sector. At least, not yet: this will need to change as public e-Commerce takes off (especially for agent technology to work), but it is still a good reminder that it is the business model that drives the Information Strategy and not the other way round.

A recent (1997) paper on interoperability for digital libraries [INTEROP] highlighted six criteria for evaluating the tradeoffs in different approaches to interoperability:

  • High degree of component autonomy (compliance with global rules)
  • Low cost of infrastructure (hardware, connectivity, software)
  • Ease of contributing components (hardware, connectivity, software, rules)
  • Ease of using components (system integration not user interface)
  • Breadth of task complexity supported by the approach
  • Scalability in the number of components

XML, not mentioned at all in the paper, has a positive impact on all six of these criteria.

So, should there be a PML, Parliamentary Markup Language? It would make information interchange between parliaments easier, but is it necessary? To answer this, go back to the business model. How much of the information interchange between parliaments is outward dissemination of documents (“push”) and how much of it is ad hoc queries (“pull”)? If it is primarily push, then a PML would be good, to assist with automated processing of the received documents; if it is primarily pull, then a PML is not needed but could still help in asking high-quality (structurally dependent) questions.

On balance, then, a PML would be a good thing but the language problem makes it more difficult. The 67 industry groups are mostly American and writing specifications only in English and for English content; the rest of the world will just have to either join in or not. Such an approach between European parliaments is unlikely to be popular.

Metadata

Metadata on the Web so far has meant a trick Web site managers perform in the HEAD of HTML documents to try to fool Web search engines into giving their site a higher relevance ranking. Real metadata, like Dublin Core and the Warwick Framework, is something most Web authors, programmers and business people have never heard of. XML is going to change that too, with metadata set to take centre stage in the new, semantic Web, because it will be RDF schemas that describe what XML content means and what to do with it.

The EU Green Paper “Public Sector Information – A Key Resource for Europe” (COM(1998)585) recognized the importance of metadata by asking the question “Could the establishment of European metadata […] help the European citizens and businesses in finding their way in the public sector information throughout Europe?”

The answer to that question may at first sight seem to be a blindingly obvious “yes”, but there is a more subtle issue to be considered. Although good librarians might disagree, why is metadata needed if there is the ability to ask high quality questions of well-structured data held on a defined list of relevant servers? One answer is that the only metadata that will be needed is the list of relevant servers. Another, more fundamental, answer is that the meaning and scope of metadata are changing. Conventional (“library”) metadata will no longer be needed; instead, a higher level of metadata that addresses the semantic issues will be required. This is what the “Semantic Web” envisioned by Tim Berners-Lee will need, and achieving it is going to require a very flexible system.

XML was designed with metadata in mind right from the start. Particular frustrations with HTML are that metadata can be placed only in the HEAD part of the document, that there is no way to add metadata locally to specific elements of the HTML document BODY and, of course, that there is no way to identify structures within the BODY to which metadata in the HEAD could refer. Not surprisingly, resorting to HTML comments to hide, within the BODY, structure and metadata meaningful to automated processes, or to proprietary server-side scripting, is widespread.

The mechanism for metadata representation in XML is called RDF – Resource Description Framework [RDFSYNTAX]. It was based on the experience gained when creating PICS and incorporates work from the Warwick Framework activity. It is, of course, written in XML itself and provides a model for representing named properties and property values, based on just three object types: resources, properties and statements. A resource can be anything from a single XML element to an entire Web site. Statements consist of a subject, a predicate and an object, i.e. the property value, which can be a literal or another resource. RDF Schemas provide information about the interpretation of the statements in a given RDF data model. This could include semantic-level meaning or processing information.
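
A minimal sketch of an RDF description (the document URI is invented; the dc: properties follow the Dublin Core style mentioned above, though the exact namespace reference here is illustrative):

    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.0/">
      <rdf:Description about="http://www.parliament-a.example/doc/Q42">
        <dc:Creator>A. Member</dc:Creator>
        <dc:Date>1999-10-14</dc:Date>
      </rdf:Description>
    </rdf:RDF>

Each property line is one statement: the subject is the resource named in the about attribute, the predicate is dc:Creator or dc:Date, and the object is the literal value.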

This is a simple but extremely powerful technique. At the logical level, the applications of RDF are limited only by the imagination. Things which once seemed to need a new language have become just a question of writing down the right RDF.

RDF statements can also be represented pictorially as Directed Labelled Graphs (DLGs), with nodes representing resources and arcs (“edges”) representing named properties. So, confusingly, the important parts of XML documents, the structured text blocks, are “elements” to XML, “resources” to RDF and “nodes” to DLGs.

The large amounts of high quality metadata which parliaments traditionally create and maintain are going to help greatly in the migration to the XML future. In many cases, the largest database in a parliament is not about content but about metadata. With the appropriate front-end (or “wrapper”), those metadata databases could operate in parallel with the XML content databases and even additional XML/RDF based metadata repositories. Eventually, it will all be XML, but this sort of approach will ease the migration path.

Fragment exchange

Programmers who have attempted to write Web search engines or gateways will understand the difficulty of extracting selected fragments in context. For example, how to present a single row of a table in a meaningful context? Almost impossible without structured information. Many parliamentary documents are very long and the need to extract fragments in context for a variety of reasons is an important requirement. For the future Web, search engines that return just a list of references to entire documents will not be good enough.

The XML Fragment Interchange specification [XMLFRAG] uses a fragment context specification (effectively listing the XML context of the fragment) to enable the interchange of portions of XML documents whilst retaining the ability to parse them correctly. These fragments can then be viewed (e.g. as part of a query result), edited, accumulated or otherwise processed. Returning modified fragments to the sender, e.g. after editing, is obviously possible but would have to be managed by an appropriate distributed authoring tool, effectively providing XML groupware. Fragment reuse has also been considered but again left to higher level applications.
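
Schematically (this illustrates the idea only; it is not the exact [XMLFRAG] syntax), a single paragraph extracted from the invented sitting vocabulary might travel with its context like this:

    <!-- context specification: records the ancestry of the fragment -->
    <fragment-context within="sitting/debate/speech">
      <!-- the fragment body itself -->
      <para>I should like to ask the Minister when the
        report will be laid before the House.</para>
    </fragment-context>

The receiving parser can then treat the para exactly as if it were still inside a speech within a debate.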

Work on Fragment Interchange is continuing and could lead to some exciting applications.

Other issues and developments

Data typing

Another shortcoming of SGML, one that renders it of very limited value for heterogeneous system information interchange or as an interchange format with conventional databases, is that data types cannot be specified in a DTD. All conventional database systems have the ability to specify data types, e.g. float, integer, date – so-called “strong typing”. The ability to merge untyped or weakly typed textual data with strongly typed database data is one of the big advantages of XML in information structure terms, and one not originally envisaged when XML was first proposed.

An XML mechanism has been proposed to allow data elements to be strongly typed by the addition of special attributes and in a way closely related to SQL, so facilitating XML data interchange with conventional database systems, “the glue for mapping between databases and other data models”. [TYPING]
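
Schematically, with an invented dt: namespace standing in for the proposed typing attributes (this shows the style of the approach, not a quotation from [TYPING]):

    <division xmlns:dt="http://www.example.org/datatypes">
      <date dt:type="date">1999-10-14</date>
      <ayes dt:type="int">310</ayes>
      <noes dt:type="int">262</noes>
      <majority dt:type="int">48</majority>
    </division>

A database gateway reading this knows to treat the division figures as integers rather than text, and can validate them on input.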

Although, in theory, this technique could be used with SGML, with appropriate declarations, it is unlikely to happen and the move away from DTDs towards XML Schema will leave SGML even further isolated.

XQL

It is not often appreciated that HTML as an information representation language is one-way. When you ask questions of a search engine or submit data to a server, that transmission does not actually involve HTML at all; the content is entirely proprietary, communicated directly to the server application via HTTP. Each application has a different vocabulary and syntax for the content of the query or submission.

XQL, the XML Query Language, will change all that. It provides a standard query language that will allow the high quality questions of well-structured databases that are the fundamental requirement for making the information on the Web more useful. It has been designed to provide, in a single query language, the ability to operate in four identified problem domains:

  • Queries within a single document (e.g. “find” in a browser)
  • Queries in collections of documents (equivalent level to Web search engines)
  • Addressing within or across documents (nodes and elements as selectors, with relationships, e.g. hierarchy, sequence, position)
  • XSL patterns (queries as general transformations - X transformed by Y gives Z)

An important motivation for the design of XQL was the realisation that the data model of XML is not that of a traditional RDBMS, an OO-database, an OR-database or plain free text. XQL questions are expressions (not in XML!) and the results are in whatever form the database returns, not necessarily XML; XQL does not specify this, just how to ask the questions.
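
Some illustrative XQL-style expressions against the invented sitting vocabulary used earlier (paraphrasing the proposal’s path-and-filter notation rather than quoting it):

    sitting/debate/speech/speaker
        every speaker element, in document order

    //speech[speaker='A. Member']
        every speech by that member, at any depth

    //debate[@subject='Public sector information']//para
        every paragraph of every debate on that subject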

SOX

The Schema for Object-Oriented XML [SOX] is another example of a Schema approach, prompted by the shortcomings of DTDs, especially with regard to content and semantics, to defining the structure, content and semantics of XML documents in an object-oriented way. It supports data typing, namespaces and inheritance, enabling object reuse at the document design and application programming levels. An interesting development.

SOAP

A very recent (September 1999) announcement from Microsoft was SOAP, the Simple Object Access Protocol [SOAP], that “bridges different object models over the Web and provides an open mechanism for Web services to communicate with one another”.

Effectively, this defines a way of performing Remote Procedure Calls (RPC) using XML content (an XML-RPC) over the HTTP protocol. This scales to the Internet, works through firewalls and is not tied to any one object model. It also demonstrates another advantage of XML when used to serialise objects – it is self-describing.
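
A sketch of the shape of such a call (the method name, document reference and endpoint are invented; the envelope follows the general form of the SOAP proposal):

    POST /status HTTP/1.1
    Host: www.parliament-a.example
    Content-Type: text/xml

    <SOAP:Envelope xmlns:SOAP="urn:schemas-xmlsoap-org:soap.v1">
      <SOAP:Body>
        <m:GetDocumentStatus
            xmlns:m="http://www.parliament-a.example/messages">
          <DocumentRef>1999/Q42</DocumentRef>
        </m:GetDocumentStatus>
      </SOAP:Body>
    </SOAP:Envelope>

Because the payload is XML, any receiver can at least parse it and the element names describe the call; a binary RPC format offers neither property.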

This is a very significant announcement, showing yet again the extent to which the Web now controls new technology development. One Microsoft insider described it to me as “the death of DCOM”. It is also interesting to speculate on its impact on COM and MSMQ. Perhaps, MSMQ will become just another piece of XML middleware. Definitely one to watch.

Emerging techniques that will facilitate information sharing

Metadata interchange

XMI, the XML Metadata Interchange initiative, started as a way of interchanging metadata between modelling tools based on UML and repositories based on MOF, i.e. a stream-based interchange format in the specific context of UML (primarily STEP product data), but it did acknowledge that XMI could be used for other kinds of structured data interchange.

An important one of these has recently emerged. The Object Management Group has published the CWMI (Common Warehouse Metadata Interchange) specification. This is intended to allow the interchange of enterprise data, primarily for e-business applications (another example of XML as an interchange format between heterogeneous systems). The principles are not just about business intelligence and data warehousing to support Customer Relationship Management (CRM), Value Chain Management (VCM), etc.; they are also about making metadata collections – metadata equivalents to, or helpers for, portal sites. With the increasing difficulty of creating full-text indexes of the entire Web, metadata servers will become important information finding tools, and the same principles will apply to corporate Intranets and Extranets as well.

Query mediators

Query mediators provide another “smart” alternative to Web search engines. The current approach of Web search engines does not scale: the more data there is on the Web, the more traffic search engines create, and the compromise is net congestion versus index timeliness. With Internet traffic doubling every 100 days, this is an important issue. The co-operative index exchange model used by Harvest and Netscape Compass was a good idea, reducing network traffic and providing more selective indexes, but it failed mainly because it required a separate protocol running on a non-standard port, so needing special arrangements to operate via firewalls, as well as special setup arrangements on Web servers.

Query mediators have several advantages:

  • Neither the entire data set nor the entire index has to cross the Web.
  • Results of the queries are always up to date.
  • Legacy systems can continue to operate behind “wrappers” translating between XML Web format and internal representation formats.
  • These wrappers could in the future contain natural language translators.
  • One query can be sent to many wrappers and the returned result fragments assembled into a single result set.
  • Ability to combine search and browse activities.

The MIX project [MIX] shows how this kind of system might operate and develop, mixing the browse and query paradigms.
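
Schematically (every element name and address here is invented), one query broadcast by a mediator and the merged reply might look like:

    <!-- query sent to each co-operating wrapper -->
    <query subject="public sector information" after="1998-01-01"/>

    <!-- result set assembled by the mediator -->
    <results>
      <fragment source="http://www.parliament-a.example/">
        <title>Green Paper debate, second reading</title>
      </fragment>
      <fragment source="http://www.parliament-b.example/">
        <title>Committee report on information policy</title>
      </fragment>
    </results>

Each wrapper translates the query into its own system’s terms and translates the hits back into XML fragments for the mediator to assemble.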

The provision of metadata repositories and query mediators would be well suited to providing closer information interchange between parliaments.

Important things you need to know

The presentation side of XML is currently in the “active development” phase, i.e. everyone is still arguing about how to do it. CSS is in a bit of a mess. Recently, there have been quarrels over the use of “formatting objects” by XSL in XML documents: a presentation element in a structural framework. It is all very reminiscent of the heated arguments that arose when Netscape introduced the CENTER tag to HTML. CSS is associated with HTML and DSSSL is associated with SGML. XSL transformations seem the best way forward.

It is better not to worry about the presentational aspects of XML now and instead concentrate on the information architectures and metadata. For now, XML can always be converted to HTML for screen presentation and imported into other applications, e.g. Word 2000, for creating PostScript. By the time XML system definition has been done and the first XML system implemented, the presentational issues will have been largely resolved.

Stylesheets, etc. have been moved out of the W3C XML groups into the W3C user interface groups; they are not about information structures. This separation reflects the distinction between structure and presentation that XML embodies.

The XML Schema activity has got itself into trouble. Despite two of the key declared requirements being “simple” and “prepared quickly”, it has turned out to be neither and a “Simple Syntax” taskforce has been assembled to sort the matter out and get things moving again.

RDF progress is being hampered partly by the Schema difficulties and also because many people are having difficulty grasping the power and implications of what it will be able to do. There are hardly any tools available to support practical use. It will come, but it is going to take longer than hoped and is going to require specialist personnel to make real use of it.

XHTML (HTML 4, recoded as an XML application) has run into problems recently over how many namespaces to have active at any one time. However, there is some good news: the work to modularise it [XHTMLMOD] will help in selecting “subsets” of XHTML for device-specific presentation, e.g. for PDAs, Web phones, Internet TVs, etc.

Where is it all leading?

One of the long-term goals of XML is to move the Web from its present unstructured form to a semi-structured form, variously described as “elevating the web from machine-readable to machine-understandable”, “the second-generation Web” (Jon Bosak and Tim Bray) or the “semantic Web” (Tim Berners-Lee). The crucial problem with the first-generation Web, what we have now, is that most of the information is designed to be viewed by humans, not to be understood by machines. It is the machine-to-machine interactions that will take the Web to the next level of usefulness, primarily by providing humans with better quality, more relevant information.

The evolution of the Web leads to another important new paradigm that must be understood, and one which runs contrary to traditional IS thinking: the incomplete data set. The entire data set can never be known; the part that is known will change during the time it takes to sample it, and parts of it may be inaccessible during the sampling. New types of large scale indexing algorithms will have to be developed to deal with this constantly changing situation.

Conclusions

A new convergence

In the same way that TCP/IP and the Internet created a new connectivity convergence and the Web created a new user access convergence, the gravitational pull of XML will bring about another new convergence; that of information content and interaction.

The scale of this convergence is extraordinary. PICS is being rewritten in XML/RDF. RDF incorporates the principles of the Warwick Framework. Dublin Core is being rewritten in XML. The DOM is being integrated with XML. The TEI is abandoning its coding work and moving to XML, as is the DocBook project. ANSI X12 EDI is to be rewritten in XML. RDF is doing information modelling things it was never intended for and being compared with UML [RDFUML]. XML middleware is going to become the new corporate IS glue. E-Commerce has taken to XML with great enthusiasm; the recent initiatives of Microsoft BizTalk (business messaging in XML) and IBM’s Business Rules for E-Commerce [IBMRULES] (rule-based content language and agent communication) demonstrate this.

The momentum is becoming unstoppable. XML must feature in your Information Strategies and as soon as possible.

Parliamentary convergence

Is it possible to apply this new convergence to construct a model on which future parliamentary information sharing might converge? The author believes so and that it will incorporate some or all of the following techniques:

  • Metadata interchange as a way of sharing information about collections and updates.
  • XML middleware as a way of integrating heterogeneous systems both within and between parliaments.
  • Parliamentary portal sites that provide views into the information stores of that parliament and other co-operating parliaments.
  • Query mediators that forward queries from one parliament to all other co-operating parliaments and assemble the results.
  • Browsers and/or mediators with language translation capabilities (e.g. Lernout & Hauspie).
  • Non-textual relation builders to assist in related information finding (e.g. Alexa).

Note that all of these can be used productively within a parliament but the standardisation on XML makes interoperation much easier.

To derive full benefit from these techniques, an agreed parliamentary metadata model and a parliamentary markup language will be needed. The idea of “Business Rules” being applied to parliaments may seem an alien concept but, again, the benefits could be very great. Using RDF Schema to model the entire parliamentary process is feasible and, it could be argued, desirable.

These tasks are significant undertakings but the benefits to parliaments, parliamentarians, citizens and the democratic process will also be very great.

Recommendations

1. Don’t worry about presentation aspects yet; just assume they will be solved in an XML-based way and use current systems and techniques to provide on-screen and printed output for now.

2. Don’t start a large heterogeneous system information interchange project based on XML until the XML Schema situation has stabilised, especially if the information interchange involves strongly typed data, needs process information, requires input validation or where the content specification is likely to change over time. DTDs are just not suited to this.

3. Start thinking about information structures; build element vocabularies. These vocabularies were always important when doing traditional database definition but got forgotten in the rush to the Web. They are back. They are going to be more complicated because of the great flexibility of XML and the fact that it will encompass far more than an RDBMS data definition ever could. XML namespaces allow these vocabularies to be built in a decentralised way but, for reliable automated information interchange, reconciliation of namespace conflicts will have to be done at some stage, both within your organisation and between organisations, either by vocabulary merging or by relationships specified in RDF schemas.

4. Look at your existing metadata repositories and see how they could be used to form the basis of an RDF metadata model and how the valuable information they contain could be integrated (directly or indirectly) with the full text information items to which they refer.

5. Start thinking about the information flows in your organisation, if you are not already doing so. The two keys to the successful deployment of future distributed information systems are content and process. It is no use exchanging content if you cannot describe how to process it to make it useful.

6. Don’t plan any new projects based on SGML. It is a legacy system; all new software development is going into XML projects. SGML software will soon stagnate and then become unsupported.

7. Work out how to migrate your existing SGML projects to XML in the medium term. Once XML abandons DTDs in favour of XML Schema, bi-directional interchange between SGML and XML will no longer be possible.

8. Concentrate first on the structural aspects of XML and then the metadata aspects.

9. Concentrate first on its use as a middleware and messaging lingua franca internally and then as an external interchange format.

10. Start rewriting your Information Strategies to include XML. The benefits are going to be very great.

References

[IBMRULES] Business Rules for E-Commerce, IBM Research.

[INTEROP] Interoperability for Digital Libraries: Problems and Directions, Paepcke, Chang, Garcia-Molina, Winograd.

[MIX] MIX: Mediation of Information using XML; Features and Requirements for an XML View Definition Language.

[RDFSYNTAX] Resource Description Framework (RDF) Model and Syntax Specification.

[RDFUML] A discussion of the relationship between RDF-Schema and UML.

[SGMLXML] Comparison of SGML and XML.

[SOAP] Simple Object Access Protocol (SOAP).

[SOX] Schema for Object-Oriented XML 2.0.

[TYPING] Adding Strong Data Typing to SGML and XML.

[WEBARCH] Web Architecture: Extensible Languages.

[XHTMLMOD] Modularization of XHTML.

[XMLAPPS] XML, Java and the Future of the Web, Jon Bosak.

[XMLFRAG] XML Fragment Interchange.

[XMLNAMES] Namespaces in XML.

[XMLSPEC] Extensible Markup Language (XML) 1.0.

(Presentation to ECPRD Conference, Stockholm, October 1999)