Why XML?

By Sowmya Sannaiah

Technical Communication professionals have been talking about authoring in XML for a very long time. XML, a cross-platform markup language, was initially designed to meet the challenges of large-scale e-publishing. Were the challenges met? Did XML succeed in exchanging a wide variety of data on the web? Let’s discuss.

So, what is XML?

Extensible Markup Language (XML) is a cross-platform, software- and hardware-independent markup language derived from Standard Generalized Markup Language (SGML). It is a purely text-based, self-descriptive format. The data in XML is structured using meaningful tags that identify each piece of information. For example, a simple note document might contain sender data, receiver data, a heading, and a message body. You can add new tags at any time to extend the content of the document, which is what makes XML extensible.
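
As a minimal sketch of such a self-descriptive document, the Python snippet below parses a hypothetical note with receiver, sender, heading, and body tags, then extends it with one more tag. The tag names and the sample text are illustrative assumptions, not part of any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical note document; the tag names are chosen by the author.
NOTE_XML = """
<note>
  <to>Asha</to>
  <from>Ravi</from>
  <heading>Reminder</heading>
  <body>Review the release notes by Friday.</body>
</note>
"""

root = ET.fromstring(NOTE_XML.strip())

# Read the data back by tag name.
fields = {child.tag: child.text for child in root}

# Extend the document with a new tag. Readers that look up only the
# original tags are not affected by the addition.
priority = ET.SubElement(root, "priority")
priority.text = "high"

extended_tags = [child.tag for child in root]
print(fields["to"], "<-", fields["from"])
print(extended_tags)
```

The tags carry the meaning of each value, so a reader (human or program) does not need a separate description of the layout to find the sender or the message body.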

An XML document also allows data storage in a format that can be interpreted by any computer system and hence it is used to transfer structured data between heterogeneous systems. It plays a very significant role in the movement of a wide variety of data on the Web.

XML is an international document standard created by the World Wide Web Consortium (W3C), an organization that is responsible for maintaining web standards.

Defining document content

When you author in XML, you need to define the elements and the structure that can appear in a document. You can do this with a DTD, an XML Schema, or both:

  • Document Type Definition (DTD): It describes the order in which the data should appear, how the data can be nested, and other basic details of the XML document structure. The DTD is part of the XML specification and works similarly to an SGML DTD.
  • XML Schema: A schema can define all the document structures that a DTD can, and it also defines data types and other advanced rules that a DTD cannot express.

An XML document with a DTD or an XML Schema is designed to be self-descriptive.
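
To make the idea of a DTD concrete, here is a hedged sketch of a hypothetical note document that carries its own internal DTD. Python's standard `xml.etree.ElementTree` reads such a document but does not validate against the DTD, so the check below repeats the required element order by hand as a stand-in; a validating parser (for example, the third-party lxml library) would normally do this work. The document, `EXPECTED_ORDER`, and `follows_note_structure` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical note document with an internal DTD that fixes the order
# of the child elements.
NOTE_WITH_DTD = """
<!DOCTYPE note [
  <!ELEMENT note (to, from, heading, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Asha</to>
  <from>Ravi</from>
  <heading>Reminder</heading>
  <body>Review the release notes by Friday.</body>
</note>
"""

# Order required by the DTD's content model, repeated by hand because
# ElementTree does not validate against a DTD.
EXPECTED_ORDER = ["to", "from", "heading", "body"]


def follows_note_structure(xml_text: str) -> bool:
    """Return True if the root is <note> with children in the expected order."""
    root = ET.fromstring(xml_text.strip())
    return root.tag == "note" and [child.tag for child in root] == EXPECTED_ORDER


print(follows_note_structure(NOTE_WITH_DTD))
```

A document that swaps `<to>` and `<from>` is still well-formed XML, but it fails this structural check, which is the kind of error a DTD or schema is meant to catch.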

XML and HTML

We should note that XML is not a replacement for Hypertext Markup Language (HTML). XML and HTML were designed to achieve different goals.

XML was originally designed to describe, store, and transfer data, not to display it. Its tags are not predefined; the author of the XML document creates them. HTML, in contrast, was designed to display data in web browsers, and its tags are predefined. When HTML is used to display data, the data is embedded in the HTML markup.
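
The difference can be sketched in a few lines of Python: the XML below only describes the data, using author-defined tags, and a separate step turns it into HTML with predefined tags for display. The catalog content and the `catalog_to_html` helper are hypothetical and serve only as an illustration.

```python
import html
import xml.etree.ElementTree as ET

# Author-defined tags describe the data; nothing here says how to display it.
CATALOG_XML = """
<catalog>
  <book><title>XML Basics</title><price>12.50</price></book>
  <book><title>Tags &amp; Attributes</title><price>9.00</price></book>
</catalog>
"""


def catalog_to_html(xml_text: str) -> str:
    """Render the catalog as an HTML list using predefined HTML tags."""
    root = ET.fromstring(xml_text.strip())
    items = []
    for book in root.findall("book"):
        # Escape the text so that characters such as & stay valid in HTML.
        title = html.escape(book.findtext("title", default=""))
        price = html.escape(book.findtext("price", default=""))
        items.append(f"  <li><b>{title}</b> ({price})</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


page = catalog_to_html(CATALOG_XML)
print(page)
```

The same XML could feed a different renderer without any change, because the display decisions live entirely in the conversion step.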

XML editor

XML files can be created and edited using a simple text editor such as Notepad. However, professional XML editors help you write error-free XML documents, validate the XML against a DTD or a schema, and ensure that you adhere to a valid XML structure.

An XML editor should be able to:

  • Automatically add closing tags to the opening tags
  • Validate XML code
  • Verify XML against DTD and Schema
  • Color code the XML syntax to increase readability
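
The most basic of these checks, confirming that every opening tag has a matching closing tag, can be sketched with Python's standard library. The `check_well_formed` helper is a hypothetical name; it checks well-formedness only and does not validate against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text: str) -> str:
    """Report whether the text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # ParseError carries the (line, column) where parsing failed.
        line, column = err.position
        return f"error at line {line}, column {column}: {err}"
    return "well-formed"


print(check_well_formed("<note><to>Asha</to></note>"))
print(check_well_formed("<note><to>Asha</note>"))  # <to> is never closed
```

An editor runs this kind of check continuously as you type, which is why it can flag a missing closing tag before you save the file.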

Advantages of XML

Some of the advantages offered by XML:

  • Human readable content: The tags, elements, and attributes in XML files are not only computer readable but also can easily be interpreted by humans. This is the greatest advantage for writers who have limited knowledge of programming languages.
  • Domain-specific vocabulary: As XML does not have any predefined tags, it allows the user to create tags based on the requirement of an application. In other words, XML allows domain-specific vocabulary per the need of the application without any restriction on the number of tags that can be defined.
  • Ease of data interchange between computer systems: XML provides the structure for storing data in text format. It is used as a popular standard format for data interchange. Thus, differences in how systems exchange data become insignificant. It produces files that are unambiguous, easy to generate, and easy to read.
  • Better search engine performance: The XML file creators can inform a search engine that the search needs to be performed within certain tags. This allows a focused search. Therefore, using XML ensures the precision of the search result that matches the search query.
  • Separation of content and format: XML allows the user to implement conditional formatting for an XML document. A separate style sheet is maintained to format the XML document. XML uses two types of style sheets, Cascading Style Sheets (CSS) and Extensible Stylesheet Language (XSL), for formatting data. Because of this separation, it is easy to update and maintain the format of the document whenever required. Also, it is easy to maintain a consistent style sheet for all documents.
  • Granular updates: When data in an XML document needs to be updated, the entire page need not be reloaded from the server. An application can download only the changed content, which makes updates faster.
  • Flexibility: Writing an XML document is easy compared to other markup languages. There are no predefined tags to follow; as long as the document is well-formed, users can create their own tags and rules to serve their needs. So, in terms of developing a document, XML is very flexible.
  • Multiple data types: XML documents can contain many data types, including multimedia data such as images, sound, and video. Such multimedia data can be embedded directly in an XML document as encoded text (for example, Base64).
  • Ease of translation and publishing: When content is stored in XML tags, the cost of translation can be reduced through automation. It is much easier to translate an XML file because it separates content from format and follows a rigorous standard with a well-defined syntax. Publishing the document in several languages can be done with a single click because formatting can be applied automatically while publishing the source XML files.
  • Forward and backward compatibility: Forward or backward compatibility of XML files is relatively easy to implement: DTD and Schema allow tags to be defined as optional. As long as the newly added tags are descendants of the optional tag, the old and new versions are mutually compatible.
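
The compatibility point in the last item can be illustrated with a short, hedged Python sketch. It assumes a hypothetical note format in which a newer version adds an optional `<meta>` element; the reader looks up the new data under that optional element and falls back to a default when it is absent, so both versions are handled by the same code.

```python
import xml.etree.ElementTree as ET

# Older documents have no <meta> element; newer ones add it as optional.
OLD_VERSION = "<note><to>Asha</to><body>Build is ready.</body></note>"
NEW_VERSION = (
    "<note><to>Asha</to><body>Build is ready.</body>"
    "<meta><priority>high</priority></meta></note>"
)


def read_note(xml_text: str) -> dict:
    """Read a note written in either the old or the new format."""
    root = ET.fromstring(xml_text)
    return {
        "to": root.findtext("to"),
        "body": root.findtext("body"),
        # <meta> is optional; use a default when it is absent.
        "priority": root.findtext("meta/priority", default="normal"),
    }


print(read_note(OLD_VERSION))
print(read_note(NEW_VERSION))
```

An older reader that only looks up `<to>` and `<body>` would also keep working on the new document, because it never asks for the added element.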

Why should we use XML in technical documentation?

A finished document can be assessed along two dimensions: effectiveness and efficiency. Effectiveness is whether the content clearly explains the product and the procedure to the reader. Efficiency is how quickly and with how little effort the document was created.

We know that XML tags the elements of a document. For example, you use tags for a heading, a paragraph, and an item in a numbered list. Because these tags help the rendering engine format the content appropriately for a wide range of output media, the burden of formatting on content creators is reduced tremendously. Since the flexible tagging scheme helps to define the content and improve readability, software engineers have increasingly used XML as a means to document computer programs. We technical communicators adopt XML for the same reasons.

When is XML suitable?

XML is especially suitable for documents with complex structures. The Darwin Information Typing Architecture (DITA) standard is a specialized XML vocabulary that is widely adopted in technical communication. DITA standards are written for different industries and different document types. Although DITA allows you to customize the tags to suit your required style, customization is time-consuming and costly. For structured documents that need more flexibility, XML is a good alternative.

Wide adoption of XML

As per W3C, XML is one of the world’s most widely-used formats for representing and exchanging information.

XML helps to represent, process, and exchange information with robustness and efficiency. Hence XML is heavily used as a format for document storage and processing, both online and offline.

Today, XML works not only with documents but also with JSON, linked data, large databases (both SQL/relational and NoSQL), the Internet of Things (IoT), and music players, and it is used in the automobile and aircraft industries. It is found almost everywhere.

Future prospects of XML in documentation

XML is a simple and very flexible markup language that will be a great foundation for many standards yet to come. XML also provides a common language that different computer systems can use to exchange data with one another. Even when each industry group comes up with new standards for what it wants to communicate, computers can still exchange data with minimal barriers.

About the Author:

Sowmya Sannaiah

Sowmya is a Technical Communicator with experience in IT documentation.

LinkedIn: https://www.linkedin.com/in/sowmya-sannaiah/

Blog: https://sowmyasannaiah.wordpress.com/

