Learn more about the RelaxNG schema definitions used in the content editor.


Overview

Defining the structure (schema) of a document and validating against it play an important role when working with XML documents. censhare 4 uses its own schema concept, which is based on the capabilities and syntax of the DTD (document type definition). This makes it more difficult to pass XML documents on outside of censhare. In addition, the DTD and its options for structuring documents are no longer state of the art. For censhare 5, the censhare-4-Schema would have had to be implemented anew. This prompted the search for an alternative solution.

RelaxNG (REgular LAnguage for XML Next Generation) and XML Schema (W3C), also known as XSD (XML Schema Definition), were the candidates for the replacement.

The reasons for RelaxNG in a nutshell

  • Good solution for document-based XML applications

  • Document structures (schemas) have high flexibility.

  • The schema definition is very easy to read.

  • Assistance for users by suggesting the elements allowed at each position in the document

  • Custom extensions are provided for by the standard

  • A standard proven in practice

XSD shows its strengths in the exchange of data between programs. In contrast, RelaxNG is a very powerful language for describing documents. In this area, RelaxNG is also the better solution because its schemas are much more readable than XSD schemas. Both standards are proven in practice.

The main difference between XSD and RelaxNG shows in validation. In contrast to XSD, a RelaxNG-based tool can suggest allowed elements at every position in a document. This was one of the main requirements in the evaluation, because it is mostly users, not programs, that work with the XML documents.

This is why censhare relies on the RelaxNG standard from version 5 onward. The censhare-RelaxNG-Schema still conforms to the standard even when it contains censhare's own extensions. Using this schema, customers can validate XML documents, distribute them easily and adapt them to their needs. As a well-accepted standard, RelaxNG contributes to the future viability of censhare.

RelaxNG - The better choice for documents

Introduction

First came the Internet, then mobile clients: as a result, companies can reach their customers less and less through a single channel. The goal is therefore to serve different output channels in parallel with the least effort. For this reason, documents are created in XML and then prepared for the different channels. An XML schema defines the required structure of the XML documents. Different schema languages are available to define this structure. The capabilities of the schema language determine how precisely the structure of the content can be described. For this reason, the choice of the XML schema language is a key decision for censhare.

The initial situation

So far, censhare has used the DTD (document type definition) to describe the structure of XML texts within the content editor. Because the content editor needs further information for styles, templates and localization, the DTD structure was extended. Unfortunately, the DTD standard cannot be extended, for instance, to localize element names. This is why a syntax was developed that diverges from the standard by using supplements. As a result, the censhare-4-Schema differs from the DTD standard, and censhare-4-Schemas can only be validated within censhare. This makes it more difficult to distribute XML documents that use the censhare-4-Schema.

The opportunity for reorientation

Version 5 of censhare would have required a new implementation of the censhare-4-Schema and the associated validation engine. A validator checks whether an XML document complies with an XML schema. In view of this new implementation, censhare AG decided to look for alternatives to the existing DTD-based solution. Three candidates were available: DTD, XSD (XML Schema (W3C)) and RelaxNG (Regular Language for XML Next Generation).

The requirements for a new solution

XML schemas have two major application areas: describing the structure of documents on the one hand, and defining data-exchange formats for programs and databases on the other. The requirements of the two application areas are very different. Consequently, an XML schema language suited to describing an exchange format is not automatically suited to describing documents, and vice versa. With censhare, the focus is on the description of documents.

A validator for a new XML schema language has to allow an in-depth inspection of the structure of an XML document. This includes tracing an error to the element in the document that causes it, so that a program can show the user exactly the place in the document that does not comply with the structure. It should also be possible to give users a list of the XML elements allowed in the current context while they are working with the document.

It is also important that the XML schema language can be extended with user-defined information such as styles, templates or localization. At the same time, external XML editors must still be able to read the XML schema and the documents that use it.

DTD as the oldest of the three XML schema languages

Like RelaxNG and XSD, the DTD separates the layout, the structure and the text itself. It originates from SGML and is comparatively old. This may be one reason why the DTD, unlike XSD and RelaxNG, no longer reflects the current state of XML schema languages. In contrast to the younger standards XSD and RelaxNG, the DTD syntax is not itself well-formed XML, so a DTD cannot be processed or validated with XML tools.

The DTD has no native support for namespaces. For element content, the DTD standard knows only one data type, the text string, and it is not possible to add data types of your own. Therefore, it is not possible to define or check the type of an individual element.

XSD for exchanging data

XSD is better known than RelaxNG. This is because it is used intensively, especially in web programming. XSD is very useful for describing formats for exchanging data between applications. Suitable tools generate schemas for exchange formats automatically, for example, from databases. Other tools can create XSD schemas from annotated code. The annotations themselves are not part of the actual code; they are evaluated, for example, by compilers or XSD tools.

Unlike the DTD and RelaxNG, XSD has an extensive set of built-in data types. However, the XSD data types can be used in RelaxNG. With version 1.1 (adopted in 2012), XSD gained assertions for formulating conditions and verifying compliance with them. Such a condition can look like this: an element A can have either an attribute B or a child element C, but not both. RelaxNG has offered this possibility from the beginning, and such a condition is much easier to write and to read in RelaxNG than in XSD.
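How XSD data types can be used from within RelaxNG may be easier to see in a small sketch. The element names "Invoice", "Date" and "Amount" are hypothetical and only serve as an illustration; the datatypeLibrary URI is the one the RelaxNG standard uses to refer to the XSD data types.

```xml
<!-- Sketch only: element names are assumptions for illustration -->
<element name="Invoice"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <!-- content must be a valid xs:date, for example 2015-03-31 -->
  <element name="Date">
    <data type="date"/>
  </element>
  <!-- content must be a valid xs:decimal, for example 19.90 -->
  <element name="Amount">
    <data type="decimal"/>
  </element>
</element>
```

With a schema along these lines, a validator would reject an "Amount" that contains plain words, which a DTD cannot express.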

This leads to a further disadvantage of XSD: schema descriptions are comparatively hard for humans to read. For the typical application range of XSD in programming, readability is not a particular requirement, because it is mainly programs, not people, that work with the schemas. For reliable data exchange, it is important to describe the structures as unambiguously as possible. Ambiguities, which RelaxNG expressly allows, are not helpful here, although they offer more flexibility.

XSD has another drawback: validators can identify the faulty position in an XML text, but under certain circumstances they cannot provide the list of elements allowed at this position. For a program, this is negligible: it rejects the document and requests it again. For a user, however, it is much easier to correct the error if a list of allowed elements is available at the faulty position.

RelaxNG for working with documents

While XSD is the natural tool for data-centric applications, RelaxNG has its strengths when working with documents. RelaxNG is significantly more powerful in describing structures. RelaxNG schemas are easy to read, which makes it easy for users to create or adapt schemas.

The power of RelaxNG lies in its flexibility in describing complex structures. XSD has difficulty describing certain constructs that RelaxNG supports, or cannot describe them at all. For instance, RelaxNG allows elements to appear in any order ("interleave"). For certain applications, this is not necessary or even desirable. XSD, in contrast, is built around an explicit order, which is important for data applications; its xs:all construct permits a free order only with restrictions. Another example is so-called mixed content. The schema defines a certain order of elements, between which any running text may appear. For example, in a brief description of a wine, certain properties such as the grape variety or the region should always appear in the same order. This is also possible with XSD, but it is more laborious and subject to certain restrictions.
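The two constructs named above, "interleave" and mixed content, could look roughly like this in a RelaxNG grammar. This is a minimal sketch for the wine example; all element names ("Wines", "Wine", "Vintage", "Price", "Description", "Grape", "Region") are assumptions made for illustration.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="Wines">
      <zeroOrMore>
        <ref name="Wine"/>
      </zeroOrMore>
    </element>
  </start>
  <define name="Wine">
    <element name="Wine">
      <!-- interleave: Vintage and Price may appear in either order -->
      <interleave>
        <element name="Vintage"><text/></element>
        <element name="Price"><text/></element>
      </interleave>
      <!-- mixed content: running text may surround Grape and Region,
           but Grape must come before Region -->
      <element name="Description">
        <mixed>
          <element name="Grape"><text/></element>
          <element name="Region"><text/></element>
        </mixed>
      </element>
    </element>
  </define>
</grammar>
```

In this sketch, a description that names the region before the grape would be rejected, while the surrounding prose can be worded freely.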

Another strength of RelaxNG is its support for modularization. RelaxNG schemas can be extended or adapted for various application scenarios. Modularization also brings more clarity to large RelaxNG structures. For example, a company can design schemas for different areas and then combine them into a single structure, which makes it easier to create an overall schema. Alternatively, a basic schema is always used, and definitions are added to it or overridden in it depending on the customer or the application.
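A customer-specific schema built on a basic schema might be sketched as follows. The file name "base.rng" and the definitions "Paragraph" and "Block" are hypothetical; the sketch assumes that the basic schema defines both names and provides the start pattern.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <!-- Pull in the basic schema and override its "Paragraph" definition -->
  <include href="base.rng">
    <define name="Paragraph">
      <element name="Paragraph">
        <optional>
          <attribute name="Style"/>
        </optional>
        <text/>
      </element>
    </define>
  </include>

  <!-- Add "Teaser" as a further alternative to the existing "Block" definition -->
  <define name="Block" combine="choice">
    <element name="Teaser"><text/></element>
  </define>
</grammar>
```

The basic schema stays untouched; each customer or application only maintains a small file of overrides and additions.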

Unlike XSD, RelaxNG is based on a formal mathematical specification. This makes it possible to determine formally whether a changed schema is backward compatible with the original schema. The combination of two RelaxNG schemas is again a RelaxNG schema, which is not always true of XSD. It is also possible to convert RelaxNG schemas into other schema languages.

Unlike a RelaxNG validator, a validator for XSD cannot always pinpoint incorrect elements or suggest allowed elements when the XML document is invalid.

Although RelaxNG does not have the name recognition of XSD, it has proven itself in practice. Evidence of this is its use in DocBook, the Adobe InDesign/InCopy markup languages IDML/ICML, DITA (Darwin Information Typing Architecture) and EPUB. oXygen from Syncro Soft, a widely used XML editor, supports the RelaxNG standard.

Because RelaxNG offers more options for defining schemas, validation is somewhat more complicated to implement than for XSD. But this is a challenge only for developers. Users do not experience this complexity when, for instance, they edit a text that is based on a RelaxNG schema.

Appendices

Appendix 1: FAQs

Why do you use the relatively unknown RelaxNG to describe XML schemas?

XSD is better known mainly because it is widely used in web programming. In document-oriented applications such as DocBook, by contrast, RelaxNG is common practice. It offers significantly more possibilities than XSD to describe structures clearly, flexibly and understandably. These are the applications censhare is made for. The strength of XSD lies in data applications. Another advantage of RelaxNG over XSD in document-oriented applications is that RelaxNG can validate at element level. It is therefore always possible to point directly to faulty elements and to propose permitted elements in context, which is difficult with XSD. In addition, RelaxNG schemas are easier to read than XSD schemas.

With RelaxNG, has censhare AG not chosen a language that is much more complex than XSD?

The opposite is true. RelaxNG schemas are much easier to read and understand. Only the technical implementation is more complicated.

Can I store my own RelaxNG schemas in censhare and use them?

Yes, this functionality is planned, analogous to the Content-Editor in censhare 4.

Can I convert standard DTDs and XSDs into RelaxNG? 

Suitable XML tools such as oXygen from Syncro Soft can do this or assist with it.

In addition to the XML syntax, a compact representation exists for a RelaxNG schema. What does the support look like?

The compact syntax is a more concise notation for a RelaxNG schema than the XML syntax. However, since the compact syntax is not XML, it cannot be processed by an XML parser. For this reason, censhare does not support the compact syntax. However, suitable tools such as oXygen from Syncro Soft can always convert the XML syntax into the compact syntax and back.

Appendix 2: Examples of the flexibility of RelaxNG

Flexibility in the order of attributes and elements

In RelaxNG you can easily define dependencies when using attributes and elements. Such a dependency can look like this: An element may have either an attribute or specific elements as children, but not both at the same time.

<element name="Addresses">
	<zeroOrMore>
		<element name="Person">
			<element name="Name">
				<text/>
			</element>
			<choice>
				<!-- the attribute value refers to another Person -->
				<attribute name="Contact"/>
				<group>
					<element name="Street">
						<text/>
					</element>
					<element name="Location">
						<text/>
					</element>
				</group>
			</choice>
		</element>
	</zeroOrMore>
</element>

Example 1: Definition of dependencies in RelaxNG

In Example 1, a "Person" may have either an attribute "Contact" with a reference to another "Person" or the two elements "Street" and "Location", but not both. The "choice" pattern ensures this. The "group" pattern combines the elements "Street" and "Location", so the two elements can only occur together.

The example also demonstrates that RelaxNG treats attributes and elements almost identically in its syntax. This simplifies the description, unlike XSD, which requires a clear separation.
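To make the rule concrete, the following instance document sketches what a schema like Example 1 is meant to accept. The names and address values are made up for illustration.

```xml
<Addresses>
  <!-- valid: the address is given directly through Street and Location -->
  <Person>
    <Name>Ada Example</Name>
    <Street>1 Sample Street</Street>
    <Location>Sampletown</Location>
  </Person>
  <!-- valid: no address of its own, only the Contact attribute -->
  <Person Contact="Ada Example">
    <Name>Ben Example</Name>
  </Person>
</Addresses>
```

A "Person" that carried both the "Contact" attribute and the address elements, or a "Street" without a "Location", would be rejected.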

Readability of schemas in RelaxNG and XSD

A schema description in RelaxNG is easier to read than one in XSD, as the following example illustrates. Each person in a list of persons has either an attribute "Name" or an attribute "Alias", but not both. For better readability, the schema defines a "Person_object" to which the list definition then refers.

<define name="Person_object">
	<choice>
		<attribute name="Name"/>
		<attribute name="Alias"/>
	</choice>
</define>

<element name="Persons">
	<zeroOrMore>
		<element name="Person">
			<ref name="Person_object"/>
		</element>
	</zeroOrMore>
</element>

Example 2: Schema description for a list of persons in RelaxNG


<xs:complexType name="Person_object">
	<xs:attribute name="Name" type="xs:string"/>
	<xs:attribute name="Alias" type="xs:string"/>
	<!-- XSD 1.1 assertion: exactly one of the two attributes is present -->
	<xs:assert test="count(@Name | @Alias) eq 1"/>
</xs:complexType>

<xs:element name="Persons">
	<xs:complexType>
		<xs:sequence>
			<xs:element name="Person" type="Person_object" minOccurs="0" maxOccurs="unbounded"/>
		</xs:sequence>
	</xs:complexType>
</xs:element>

Example 3: Schema description for a list of persons in XSD

With <xs:assert test="count(@Name | @Alias) eq 1"/>, XSD ensures in Example 3 that either "Name" or "Alias" is used, but not both. This is a so-called assertion in XSD. In Example 2, RelaxNG uses the element pair "<choice> ... </choice>" to guarantee the same constraint.

Assertions are also a problem in XSD when it comes to validation. The validator treats an assertion like a mathematical equation that it evaluates: the assertion is either true or false. Conversely, however, the validator cannot infer from the assertion which elements or attributes are allowed at an incorrect position in the document. Therefore, an application cannot offer a list of allowed elements or attributes.

Ambiguities in XSD and RelaxNG

As the goal of XSD is the description of data for exchange between programs, the unambiguity of the structure is an important principle in XSD. Therefore, a validator must be able to assign each element in a document to exactly one particle of the content model, without looking at the element's attributes, its content or the elements that follow it. The XSD standard calls this "Unique Particle Attribution". This leads to problems when describing structures for texts. The following simple example illustrates this with an XML schema for a book. The book consists of a sequence of alternating odd and even pages and may end with an odd page.


<xs:group name="pages">
	<xs:sequence>
		<xs:sequence minOccurs="0" maxOccurs="unbounded">
			<xs:element ref="odd-page"/>
			<xs:element ref="even-page"/>
		</xs:sequence>
		<xs:element ref="odd-page" minOccurs="0"/>
	</xs:sequence>
</xs:group>

Example 4: The XSD schema definition is not valid because it allows ambiguity.

The XSD schema definition in Example 4 leads to the error "non-deterministic content model"; it is not a valid schema. In many cases, a valid XSD schema can be obtained by rewriting the original one. However, this leads to a more complex expression, which also reduces readability. For the book task in Example 4, this approach is not possible. Another way to arrive at a valid definition is to insert an "anyElement".


<xs:group name="pages">
	<xs:sequence>
		<xs:sequence minOccurs="0" maxOccurs="unbounded">
			<xs:element ref="odd-page"/>
			<xs:element ref="even-page"/>
		</xs:sequence>
		<anyElement/>
	</xs:sequence>
</xs:group>

Example 5: Using an "anyElement" in order to receive a valid XML schema for the book task

Using "anyElement", the XSD schema becomes valid. However, this solution moves away from the goal of specifying the book structure as exactly as possible. The use of "anyElement" achieves the opposite: the code in Example 5 allows almost any structure.


<zeroOrMore>
	<ref name="odd-page"/>
	<ref name="even-page"/>
</zeroOrMore>
<optional>
	<ref name="odd-page"/>
</optional>

Example 6: Valid RelaxNG schema for the specified book structure

In contrast to XSD, RelaxNG can express the book structure concisely and elegantly, as shown in Example 6.
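Example 6 is a fragment. Embedded in a complete grammar, it might look like the following sketch. The root element name "book" and the plain-text content of the pages are assumptions made only to obtain a self-contained schema.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start>
    <element name="book">
      <!-- any number of odd/even page pairs ... -->
      <zeroOrMore>
        <ref name="odd-page"/>
        <ref name="even-page"/>
      </zeroOrMore>
      <!-- ... optionally followed by one final odd page -->
      <optional>
        <ref name="odd-page"/>
      </optional>
    </element>
  </start>
  <define name="odd-page">
    <element name="odd-page"><text/></element>
  </define>
  <define name="even-page">
    <element name="even-page"><text/></element>
  </define>
</grammar>
```

A book with the pages odd, even, odd is accepted by this grammar, while a book with two odd pages in a row is rejected. A schema like this can be tried out with a RelaxNG validator such as Jing or with xmllint and its --relaxng option.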
