1
Introduction
1.1
The First Schema Based Compression Scheme
1.2
Application to XML
1.3
The Need for a Single Standard
2
Theory
2.1
Information Theory
2.2
Schema-based Encoding
2.3
Unifying the Models
3
Benefits
3.1
Reduced Storage Requirements
3.2
Reduced Bandwidth Requirements and Cost
3.3
Increased Performance
3.4
Extended Battery Life
3.5
Increased Interoperability with “Efficient” Systems
3.6
Reduced Development Time and Cost for “Efficient” Systems
3.7
Increased Market Size for XML Solutions
4
Requirements
4.1
Generality
4.2
Directly Readable and Writable
4.3
Streaming
4.4
Encoding Query Results
4.5
Low Memory and Processing Requirements
4.6
Resilient to Schema Variations
4.7
Extensible Encoder Architecture
5
Non-requirements
5.1
Random access
5.2
Dynamic Updates
5.3
Versioning
5.4
Mixing of text and binary encodings
6
Conclusions
7
References
Biography
In 1995, our CTO
invented a technology called Knowledge Based Compression (KBC)
designed to drastically reduce the bandwidth requirements of
structured text messages shared between automated systems [4].
To our knowledge, this was the first data compression technique
to leverage shared meta-data (i.e. a schema) to produce highly
efficient encodings of structured text messages
KBC was based on
the principles of information theory pioneered by Claude Shannon
[20] and informed by empirical data from one of the world’s
largest and most complicated data sharing environments (see [15]
for additional information). It was designed to enable the U.S. DoD and its Allies to share timely, accurate, complete
information between hundreds of fixed and mobile automated
systems developed independently by different organizations,
evolving on different schedules, funded by different budgets and
driven by different priorities.
With the advent
of XML, it was obvious this technology could be applied to
benefit a far broader range of users [8]. Our preliminary
findings showed that compression rates of 99% (50 times smaller)
could be achieved on XML data (depending on the schema). Unlike
conventional compression techniques that depend on character
redundancy, schema-aware encodings worked equally well on large
messages and high frequency streams of small messages typical of
web services and mobile devices. In addition, they could be read
and written directly from an appropriately modified XML
Information Set [6] implementation (e.g., DOM, SAX,
pull-parser), eliminating the need for a separate compression
step.
However, despite
a few early contributions with initial support for schema-aware
compression [11] [14] [21], it was relatively clear the
mainstream market was not yet ready to take this leap [9].
Recently, however, there has been a marked increase of activity
in this area with broader initiatives under development, like
the emerging MPEG-7 BiM standard [1] [13] and the XML work of
the ASN.1 community [18] [22].
With this
increased interest and activity comes the risk that several
non-interoperable XML encodings will evolve independently. Given
this risk, it is clear that now is the right time to establish a
single global standard for efficiently encoding XML data. As the
custodian of XML, the W3C is the right organization to take on
this task.
Given our
background in this area, we were selected by the DoD to devise
an efficient XML encoding that will satisfy their requirements.
While the initiatives cited above are a fantastic start, they
each lack some important architectural components and some
include features that add unnecessary overhead. We are very
interested in collaborating with the W3C member organizations to
share our knowledge and expertise toward the development of a
single, global standard for efficient XML encoding
AgileDelta has
devised uniquely efficient methods for deploying compelling web
service applications to almost any programmable device
(including Java and Microsoft devices). This technology enables
us to implement the full web services stack and an intuitive
data binding layer in as little as 7 Kbytes. One of our
principal requirements for an efficient XML encoding is that it
can be implemented on similarly constrained environments without
increasing this code footprint by a significant percentage
The remainder of
this position paper provides background and outlines some of our
additional rational and requirements for an efficient XML
encoding mechanism.
The field of
Information Theory pioneered by Claude Shannon provides a formal
mathematical foundation for quantifying the minimum amount of
data required to represent a piece of information [20].
Information Theory defines a bit as the amount of data
required to differentiate between two equally likely
alternatives and states that log2(n) bits
of data are required to differentiate between n equally
likely alternatives. However, when the alternatives are not
equally likely, fewer bits may be used for higher probability
alternatives resulting in a more efficient encoding of the
information. Specifically, log2(1/ pI)
bits are required to represent an alternative with probability
pI. It is this principle on which
compression schemes employed by popular programs, such as
WinZip, are based
From this
principle, Information Theory concludes the amount of data
required to represent a piece of information is a function of
the amount of knowledge the receiver has about the nature of the
data. Specifically, the more knowledge the receiver has about
the probable messages being sent, the less data is required to
transmit those messages. Paul Revere took advantage of this
principle when he interpreted the number of lights in the tower
of the Old North Church within the famous context: “one if by
land, two if by sea.” This knowledge enabled him to receive two
important messages, “the British are coming” and “they are
coming by land” encoded in only two bits.
In the context of
XML, document recipients may access a wealth of knowledge about
the nature of the data being sent. The XML Schema Language [2]
[17] is being widely deployed by industry leaders for validating
XML files, enabling content aware XML editors and describing web
services APIs. There are simple, inexpensive tools available
from several companies that make it easy to develop XML Schemas,
including tools that automatically deduce XML Schemas from class
definitions or a representative set of XML documents. XML
Schemas are appearing for all major XML domains of discourse,
including horizontal domains (e.g., XHTML [16] , SOAP [11] and
SVG [10]) and many vertical domains [23] [7].
XML documents may
identify one or more Schemas providing detailed information
about the elements, attributes and data types found in the
document. In most cases, the set of XML document types processed
by an application are known well in advance (e.g., at
application development time). Therefore, the appropriate XML
Schemas can be deployed with the application (i.e., pre-cached).
In cases where
the appropriate XML Schemas are not known in advance or cannot
be deployed with the application, they may be downloaded
dynamically, for example from a schema repository, and cached
for later use.
Most of the
mainstream schema-aware encoding proposals either do not address
compression of textual content [22] or separate the compression
process into two separate phases [1] [11] [13]. In one phase,
they encode the textual content of the XML data using
traditional techniques derived from information theory and in
the other phase they encode the structure of the XML data based
on the schema without applying the principles of information
theory to the resulting value space. We believe the principles
of information theory apply equally to both phases and can be
applied to unify them and achieve better results more
efficiently. Researchers have found that techniques that
combined these two phases generally produce better results [5].
In addition, we believe the schema should inform the encoding
process without unnecessarily constraining the XML data that can
be encoded or losing certain aspects of the original document
(see sections 4.1 and 4.6).
This section
provides a brief overview of some of the benefits we expect from
an efficient XML encoding.
A more efficient
XML encoding requires less primary and secondary storage. This
is particularly important for small mobile devices and embedded
systems where primary and secondary storage are at a premium. In
some cases, the size of the data may determine which devices may
effectively use that data.
Probably the most
obvious benefit of a more efficient XML encoding is reduced
bandwidth requirements. Reducing bandwidth requirements may or
may not increase the visible performance of a particular
network-bound application (e.g., due to network latency);
however, it will make more efficient use of available
communications resources allowing more users to share those
resource effectively. Consequently, those purchasing bandwidth
directly or indirectly will achieve a better cost / performance
ratio. Consumers and enterprises using the ever increasing
number of mobile devices connected to always-on, packet-switched
networks that charge for bandwidth by the kilo-byte will realize
this savings directly on their monthly bills.
Engineers using
efficient XML encodings have reported up to an order of
magnitude performance increase over processing the XML text
encoding [18]. There are several ways in which a properly
designed XML encoding can improve performance. First, the
encoding may be read and written directly from an information
set implementation (e.g., DOM, SAX, pull-parser) without a
separate compression step. Since the encoding must be produced
from an internal information set representation, the decoder can
depend on certain invariants that allow it to avoid steps such
as testing for well-formedness or schema validity. In addition,
the encoding can avoid costly string conversion functions such
as atoi() by avoiding storing non-string values as strings.
Second, smaller
files require fewer CPU cycles to read and process. For example,
consider reading data from a file. The computer must read and
process each byte of data. The processor repeats a set of
instructions for each byte of the data read. Reading twenty
times more data requires twenty times more repetitions of these
instructions
Third, space
efficiency has indirect impacts on performance that are more
subtle. Modern computing devices employ a multi-level memory
architecture typically consisting of an internal cache on the
CPU, a level two cache on the CPU board, a main memory on the
motherboard and a swap device on secondary storage (listed in
order of increasing size and decreasing speed). During the
execution of any piece of software, portions of its data will be
scattered across all of these memory devices. The most recently
accessed portions of the data are stored in the smallest,
fastest memory devices assuming they are most likely to be
reused. This assumption is more likely to be true for smaller
data representations since a larger percentage of the entire
dataset resides in the fastest memory devices at any given time.
Very small data representations can fit entirely in the CPU’s
internal cache, resulting in extremely fast processing times. On
the other hand, very large pieces of data must be stored on the
hard disk and repeatedly swapped in and out of main memory,
resulting in an extremely inefficient process called thrashing.
This problem is not limited to machines with small memories, but
is particularly noticeable on large servers, on which a large
number of processes and threads compete for the same limited
memory resources. Using a smaller data representation can allow
more processes and threads to effectively share the resources of
the same machine. This increases server scalability and
decreases the cost / performance ratio.
When using
devices that are continuously connected to a power source it is
easy to forget that work takes energy. However, each instruction
executed by the processor, each byte of data transmitted or
received over the radio and each byte of data read or written to
secondary storage requires energy. For mobile devices, efficient
use of resources translates directly to longer batter life.
Because a smaller data representation makes more efficient use
of available processing, storage and communication resources, it
requires less energy and extends battery life.
In environments
where efficiency is critical, such as in gaming, mobile,
embedded and real-time systems, support for the XML text syntax
is often not desirable or even possible. Therefore, these
systems generally implement a more efficient and often
proprietary data encoding. Consequently, it is not possible for
these systems to access the wealth of information available in
XML format or provide information to systems that speak XML
without an expensive series of proprietary gateways and the
associated risk that information fidelity and accuracy will be
lost
Researchers have
found that compressed or encoded XML messages are competitive in
size and often smaller than hand coded binary formats [5] making
them an attractive alternative to proprietary formats. Providing
a more efficient XML encoding will enable efficient systems to
access and share information seamlessly with a wide range of XML
systems. Per Metcalfe’s Law, the value of information exposed as
XML will increase exponentially with the number of systems able
to access it.
As discussed
above, providing a more efficient XML encoding will enable a
broader range of systems to use XML. In addition to
interoperability benefits, this enables those systems to
leverage the broad range of available XML technologies,
development tools, software libraries, and developer expertise,
which eliminates the time and cost associated with developing
technologies, tools, libraries and expertise for a proprietary
or less popular format from scratch. In addition, the quality,
cost and tempo of evolution for XML technologies is driven by a
competitive mass-market meaning high quality, rapidly evolving
technologies can be acquired inexpensively and are often free.
Providing a more
efficient XML encoding will enable a broader range of systems to
use XML, including gaming, mobile, embedded and real-time
systems. The size of the market for XML solutions will increase
as the number of systems able to efficiently use XML increases,
further stimulating the XML ecosystem.
This section
provides a brief overview of some of the requirements we’d like
to see satisfied by an efficient XML encoding. These
requirements are informed by our technical research and
empirical analysis of the characteristics of data exchanged
between a diverse array of military systems during several real
operational scenarios.
To maximize the
benefits to and synergy with the XML eco-system, the efficient
XML encoding must be general enough to apply to the broad range
of XML applications. For example,
Data and
Document Centric Applications: The encoding should work well
for both data and document centric applications of XML. In
particular, it should work well on documents with a large
percentage of text content, support mixed content models and be
able to preserve whitespace and XML artifacts like comments and
processing instructions. A particularly important use case is
XHTML, which will help the technology become a compelling
addition to popular web browsers. Popular web browsers will be
important for providing a ubiquitous platform for viewing
encoded data and building XML applications.
It should be
possible to turn support for various document-centric features
off on a schema or instance-document basis, such that
data-centric applications that do not need them will not be
required to incur the associated overhead
Lossless and
Lossy Encoding: The XML encoding should support a lossless
encoding of the data for applications that need a high-fidelity
representation of the data (e.g., preservation of whitespace,
comments and processing instructions). Applications that do not
need this level of fidelity should be able to avoid incurring
the associated overhead by selecting a lossy encoding mode
Applications
with and without Schemas: We believe the encoding should
leverage schema information when it is available, but should not
depend on it to operate. It should be possible to provide
efficient encodings of untyped data, including entire XML
documents or portions of XML documents associated with Schema’s
anyType [17
Large and
Small Documents: The XML data encoding should work equally
well on large XML documents, such as database dumps and aircraft
maintenance manuals, and small documents such as those typically
exchanged by web services or mobile devices. It should not
depend solely on character redundancy or shared meta-data (i.e.,
schemas) to achieve compression. Rather, these techniques should
be unified by relating them through their common roots in
information theory.
The efficient XML
encoding must be directly readable and writable from an
appropriately modified information set [22] implementation
(e.g., DOM, SAX, pull-parser). In particular, it should not
require a separate compression step after serializing an XML
document or a separate decompression step prior to parsing an
XML document. Information set implementations should provide a
unified information set abstraction to host applications
independent of which encoding was used. Applications should be
able to request the abstract information set be serialized using
a particular encoding where appropriate.
The efficient XML
encoding must support streaming. This is required by
applications that must generate results responsively before the
entire document has been received, read or processed. In
addition, it is critical for devices with limited memory
capacity, where it may not be possible to store a large portion
of the document in memory before processing it.
In particular,
the encoding must be formulated such that applications can begin
processing a particular information item before information
items occurring later in document order have been processed,
read or even sent from their source. In addition, applications
must be able to decode and examine each information item in the
document without remembering information items that occurred
earlier in document order
This requirement
is less restrictive than providing full random access to the
document.
When bandwidth,
processing power and storage capacity are at a premium,
applications must be able to send and receive specific portions
of an XML document on demand as they are needed instead of
transferring the entire document. If a client requires only a
small subset of the information items in a large document,
transmitting the entire document is highly inefficient and
obviates the benefits of the efficient encoding. Therefore the
encoding must be capable of representing arbitrary collections
of information items extracted from a document.
The efficient XML
encoding will likely provide the largest benefits to devices
with limited processing and memory resources. Therefore, it is
important that the algorithms required to support the encoding
have low space and time complexities. In addition, the
implementation of these algorithms should be very small
As mentioned
above, AgileDelta has devised uniquely efficient methods for
implementing the full web services stack and an intuitive data
binding layer in as little as 7K. Adding a standard, efficient
XML encoding to this framework should not increase this
footprint by a significant percentage.
The efficient XML
encoding should be informed by schema information when it is
available, but it should not prohibit applications from
intentionally encoding information sets that do not conform to
the given schema. In particular, the encoding should be capable
of representing instances of XML documents for which the Post
Schema Validated Information set (PSVI) contain information
items with the [Validity] property set to invalid
[17]. Such document instances should require more bits to encode
due to their decreased probability of occurrence, but should not
be prohibited. At the same time, the encoding should not require
documents known to be schema-valid to incur any overhead for
encoding information items that are schema-invalid (other than
possibly a Boolean value indicating whether the document is
known to be schema-valid or schema-invalid
There has been
much debate over the usefulness of transmitting or processing
documents that are schema-invalid. In general, machines are not
capable of inferring the meaning of information items they were
not explicitly programmed to understand (the semantic web [19]
is still a distant point on the horizon). However, although we
prefer documents to be schema-valid, we have found several
real-world scenarios where requiring strict schema-validity is
at best disruptive and at worst disabling.
For example, it
is sometimes necessary to store or transmit a document in an
incomplete state. This occurs frequently with documents that are
constructed collaboratively by several cooperating systems or
organizations. Before the document is completed, it must be
communicated between organizations in a schema-invalid form.
Systems should not have to fall-back to the text encoding of an
XML document to achieve this goal
As another
example, information exchange requirements often evolve faster
than the standards (and the associated schemas) that capture
them. One might argue the schema should be flexible enough to
account for any possible contingency; however, this is not
always possible and the complexity that results from the
associated layers of abstraction is often undesirable.
Therefore, in practice we’ve found that users will intentionally
override validation results when they need to send information
not captured in the schema. If the system does not allow them to
override the validation results, they will find another way to
send the needed information, for example by embedding it in a
comment. Unfortunately, this ad-hoc form of capturing deviations
from the schema is generally ignored by automated systems.
Therefore, we’ve found it is much better to formalize a
mechanism for articulating intentional deviations from the
schema that can be understood and handled appropriately by
receiving systems
We’ve also found
that there are several useful things an application can do with
schema-invalid information items. Each application receiving
schema-invalid information items may implement a different
policy according to their processing requirements. For example,
an application may
Generate an
error: Depending on the processing needs of the application,
it may have no other option than to generate an error. Higher
level application logic may be able to recover from the error
and continue processing or may indicate a failure to the user or
originating system via a user interface, log or message
Ask for human
intervention: The application may request assistance from a
human user. The user may handle the schema deviation manually or
establish a rule indicating how the application should treat
similar deviations in the future
Application
specific error correction: The application may be programmed
to “correct” certain classes of error automatically with no
human intervention
Ignore them:
The schema-invalid information items may occur in a section of
the document that does not effect the information the
application needs to complete its task. Therefore, the
application may safely ignore the schema-invalid items and
complete its task successfully
Pass them
through: The application may pass the schema-invalid
information items through to another application for handling
Provide them
to the end user: In some cases, the schema-invalid
information is ultimately presented to a human, who will likely
understand the deviation with no problems. Effectively, portions
of the document may be used for structured human-to-human
communications. The client application knows best which data
fall into this category.
Ultimately, it is
the receiving application that knows best how the information
items in the document will be processed and how to handle
deviations from the schema. XML applications today have both the
responsibility and the freedom to define and enforce their own
policies for handling schema-invalid documents. This ability
should not be taken away when using a more efficient XML
encoding. The same logic used to handle schema-invalid documents
encoded as characters, should be available for processing
schema-invalid documents encoded as bits
This is not to
say that client application must perform schema validation on
XML encoded documents. On the contrary the encoding can
communicate the results of validation performed by the encoder.
The client simply reads this information and applies the
appropriate error policies.
While the W3C is
determining the need for a second standard encoding for XML, it
may be worthwhile to invest some time into defining a standard
API for an extensible encoder architecture, whereby custom
CODECs could be added to XML information set implementations.
There are hundreds of legacy data formats that could benefit
from plugging into the XML ecosystem. A standard API that allows
developers to plug in custom CODECs for creating an XML
information set from a custom format or serializing an
information set in a custom format would enable more users to
take advantage of the rich set of XML technologies, tools,
libraries and development expertise. At the same time, it would
lower barriers to XML adoption, expanding the XML community and
associated market for XML solutions.
Some of the
proposed efficient XML encoding methods specify features we
consider out of scope for a W3C standard. Some of these are
features that could easily be added on top of an efficient XML
encoding.
The data format
should not incur any additional overhead to support random
access to the data over and above that provided by the ability
to encode query results (see section 4.4). This matches the
current level of random access available to XML application
today and should be sufficient.
Dynamic updates
should be out of scope for the efficient XML encoding. The
ability to incrementally receive, reconstruct and update
documents on a client can easily be added on top of an
appropriate XML encoding in environments where it is needed.
Applications that don’t need this capability should not incur
the additional overhead associated with it.
Versioning of XML
schemas is going to be critical for XML as the number of systems
exchanging XML data increases and those individual systems begin
to evolve independently. Currently, the XML standards do not
define a mechanism for expressing the version relationship
between two or more XML schemas. For example, there is no widely
accepted way to express that schema A is a later version of
schema B or that schema A is compatible with schema B (i.e.,
valid instances of schema A are also valid instances of schema B
and have the same meaning). Consequently, achieving continuous
information interoperability over time between a diverse set of
systems maintained by differing organizations according to
different schedules and different priorities will be at best
expensive and at worst impossible
That said, we do
not believe the XML encoding is the right place to begin
addressing this problem. We believe the W3C should take a
holistic view of the versioning problem and addresses it across
the spectrum of XML technologies so the solution is not narrowly
applicable to the efficient XML encoding.
In the interim,
applications must continue to address this problem outside the
scope of XML standards. The efficient XML encoding should not
incur additional overhead to address it at this time.
Mixing of text
and binary encodings is not a requirement for us and we do not
believe the efficient XML encoding should incur any additional
overhead to support this feature.
Given the recent
interest and activity around efficient XML encodings, we believe
now it is the right time to initiate a W3C working group to
investigate the development of a single, global standard for
efficiently encoding XML data. We believe this activity will
result in several benefits to the current and future XML
community. We are interested in sharing our history, expertise
and ongoing research and development in this area with W3C
members to expedite the development of this standard.
[1] O. Avaro and P. Salembier. MPEG-7
Systems: Overview. IEEE Transactions on Circuits and Systems
for Video Technology, Vol II, No. 6, June 2001.
[2] P. V. Biron and A. Malhotra. XML
Schma Part 2: Datatypes. W3C Recommendation, May 2001.
Available as
http://www.w3.org/TR/xmlschema-2/.
[3] J. Boyer. Canonical XML Version
1.0. W3C Recommendation, March 2001. Available as
http://www.w3.org/TR/xml-c14n
[4] R. Cherinka, J. Ricci and J.
Schneider. Applying Knowledge Based Compression to Reduce DoD
Bandwidth Requirements. MITRE WN 96B0000139, July 1996
[5] M. Cokus and D. Winkowski. XML
Sizing and Compression Study for Military Wireless Data.
Proceedings of XML 2002 Conference. December 2002. Available
as
http://www.idealliance.org/papers/xml02/dx_xml02/html/abstract/06-02-04.html.
[6] J. Cowan and R. Tobin. XML
Information Set. W3C Recommendation, October 2001.
Available as
http://www.w3.org/TR/xml-infoset.
[7] DoD Metadata Registry and Clearing
House. Available as
http://diides.ncr.disa.mil/xmlreg/user/index.cfm.
[8] L. Dodds. Good Things Come In Small
Packages. March 2000. Available as
http://www.xml.com/pub/a/2000/03/22/deviant/#xml.
[9] L. Dodds. Dictionaries and
Datagrams. January 2001. Available as
http://www.xml.com/pub/a/2001/01/24/deviant.html.
[10] J. Ferraiolo et. al. Scalable Vector
Graphics (SVG) 1.0 Specification. W3C Recommendation,
September 2001. Available as
http://www.w3.org/TR/SVG/.
[11] M. Girardot and N. Sundaresan.
Millau: An Encoding Format for Efficient Representation and
Exchange of XML over the Web. Proceedings of the 9th
International World Wide Web Conference, May 2000. Available
as
http://www9.org/w9cdrom/154/154.html
[12] M. Gudgin, M. Hadley, N.
Mendelsohn, J. Moreau, H. Nielsen. SOAP Version 1.2 part 1:
Messaging Framework. W3C Recommendation, June 2003.
Available as
http://www.w3.org/TR/soap12-part1/.
[13] ISO/IEC JTC1/SC29/WG11. Information
Technology – Multimedia Content Description Interface – Part 1:
Systems. Working draft of Text of ISO/IEC 15938-1/PDAM1.
December, 2002
[14] H. Liefke and D. Suciu. XMill: an
Efficient Compressor for XML. December 1999. Available as
http://www.research.att.com/sw/tools/xmill/.
[15] M. A. Malloy and J. C.
Schneider. Experiences Designing Query Languages for
Hierarchically Structured Text Documents. Proceedings of the
W3C Workshop on XML Query Languages, December 1998.
Available as
http://www.w3.org/TandS/QL/QL98/pp/mitre.html.
[16] S. Pemberton, D. Austin, J.
Axelsson, et al. XHTML 1.0 The Extensible HyperText Markup
Language (Second Edition). W3C Recommendation, August
2002. Available as
http://www.w3.org/TR/xhtml1
[17] H. S. Thompson, D Beech, M.
Maloney and N. Mendelsohn. XML Schema Part 1: Structures. W3C
Recommendation, May 2001. Available as
http://www.w3.org/TR/xmlschema-1/.
[18] P. Sandoz and S. Pericas-Geertsen.
Fast Web Services. Proceedings of JavaOne 2003 Conference,
June 2003. Available as
http://servlet.java.sun.com/javaone/sf2003/conf/sessions/display-3752.en.jsp.
[19] Semantic Web. Available as
http://www.w3.org/2001/sw/.
[20] C. E. Shannon. A mathematica
theory of communication. Bell System Technical Journal,
27:379-423 and 623-656, July and October 1948.
[21] N. Sundaresan and R. Moussa,
Algorithms and Programming Models for Efficient Representation
of XML for Internet Applications. Proceedings of 10th
International World Wide Web Conference, May 2001. Available
as
http://www10.org/cdrom/papers/542/.
[22] What ASN.1 Can Offer to XML.
Available as
http://asn1.elibel.tm.fr/xml/.
[23] xml.org Schema Registry.
Available as
http://www.xml.org/xml/registry.jsp.