Thursday, 17 March 2011

XML Basics

In today's post I will be discussing the basics of XML and why it has become so important for the web.

What is XML?
XML stands for eXtensible Markup Language and was designed primarily as a means of carrying and storing data.  As its name implies, XML is a markup language, which means that tags are used to mark up the elements that give structure to the data within the document.  In contrast to HTML (another markup language), where elements are pre-defined, in XML you define your own elements.  As an example, suppose we would like to store the following information about student projects as an XML document:
  • Student Name
  • Student ID
  • Project Title
  • Project Category
  • Project Abstract
  • Date Submitted
One way of structuring this data in XML would be the following:
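
A minimal sketch of such a document is shown here.  The element names follow those mentioned in this post, but every value is made-up sample data, so treat it as an illustration only:

```xml
<?xml version="1.0"?>
<!-- Illustrative sketch: all values below are made-up sample data -->
<project>
  <student_name>Jane Doe</student_name>
  <student_id>12345</student_id>
  <title>Sample Project Title</title>
  <category>Software</category>
  <abstract>A short summary of the project goes here.</abstract>
  <!-- A single date string: is this 4 March or 3 April? -->
  <date_submitted>04/03/2011</date_submitted>
</project>
```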


In the example above, an element was defined for each data item that we needed to store, and the actual data is placed between each element's opening and closing tags.  You may also notice that the <project> element encapsulates all of the other elements.  Besides giving more meaning to the data, this element is also necessary because an XML document must always have exactly one root element (more on well-formedness and validation later).

Taking it Further
In the previous example we can see that "student_name" and "student_id" both relate to a student while the rest of the elements relate to the project.  Also, since an XML document can only have one root element, the current structure prevents us from storing information about multiple projects.  With this in mind, here's a better way of structuring the document:
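
The sketch below shows one such structure.  The element names are the ones used in this post, while the two projects and all of their values are invented for illustration:

```xml
<?xml version="1.0"?>
<!-- Illustrative sketch: all values below are made-up sample data -->
<projects>
  <project>
    <student>
      <id>12345</id>
      <name>Jane</name>
      <surname>Doe</surname>
    </student>
    <title>Sample Project Title</title>
    <category>Software</category>
    <abstract>A short summary of the project goes here.</abstract>
    <date_submitted>
      <day>4</day>
      <month>3</month>
      <year>2011</year>
    </date_submitted>
  </project>
  <project>
    <student>
      <id>67890</id>
      <name>John</name>
      <surname>Smith</surname>
    </student>
    <title>Another Sample Title</title>
    <category>Hardware</category>
    <abstract>A second made-up abstract.</abstract>
    <date_submitted>
      <day>11</day>
      <month>3</month>
      <year>2011</year>
    </date_submitted>
  </project>
</projects>
```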


By adding the <projects> root element we can now have multiple <project> elements within the document and student-related information has also been nested (grouped) within the <student> element.  Finally, the <date_submitted> element has been broken down further into the <day>, <month> and <year> elements, which at first glance seem superfluous.  Consider however a scenario where you wanted to share this data with another system over the web. This system may or may not have the same date format as your system.  Breaking down the date into its constituent parts makes sure that the date is interpreted correctly.  I'll be discussing the advantages and practical uses of XML later on in this post.

XML Attributes
We have already seen how to define and use XML elements.  Like HTML elements, XML elements can also have attributes.  Attributes are generally used to store information about the data within the element - I like to think of attributes as an element's metadata.  Attribute values must be enclosed in quotes and defined within the element's opening tag, just like in HTML.  Taking the previous example, the project category can be considered to be information describing the type of project and is a good candidate for an attribute:
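
A hedged sketch of the same kind of document with the category moved into an attribute follows.  Only one project is shown to keep it short, and the values are again made up:

```xml
<?xml version="1.0"?>
<!-- Illustrative sketch: all values below are made-up sample data -->
<projects>
  <!-- category is now an attribute on the opening tag of project -->
  <project category="Software">
    <student>
      <id>12345</id>
      <name>Jane</name>
      <surname>Doe</surname>
    </student>
    <title>Sample Project Title</title>
    <abstract>A short summary of the project goes here.</abstract>
    <date_submitted>
      <day>4</day>
      <month>3</month>
      <year>2011</year>
    </date_submitted>
  </project>
</projects>
```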


Here, the category elements have been replaced by category attributes on the project elements.  There is no real difference between the two examples as both XML documents contain the same data.  So when should we use attributes?  Opinions on the subject vary.  Some say that XML attributes are useful, while others say that they should be avoided.  My view is that it all depends on how extensible you want your XML document to be.  Keep in mind that XML elements can be nested, so an element that today holds a single value can be changed to store multiple values some time down the line.  Conversely, an XML attribute can only store a single value and would have to be changed to an element if the need to store multiple values arises, which may have an adverse effect on systems that consume your XML documents.  In a nutshell, some careful planning and thinking ahead needs to be done when deciding between elements and attributes.  My rule of thumb is to use elements whenever in doubt.
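
To make the extensibility point concrete, here is a small sketch of a project that belongs to two categories.  This is a hypothetical extension that is not part of the examples in this post, and the values are invented.  As an element, category can simply be repeated, whereas an attribute name may appear only once in an opening tag:

```xml
<!-- Hypothetical extension: a project with more than one category -->
<project>
  <title>Sample Project Title</title>
  <!-- Elements can repeat, so a second category is just another element -->
  <category>Software</category>
  <category>Robotics</category>
</project>
```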

Basic XML Validation using DTD
An XML document with correct syntax is said to be "well formed", but it is not necessarily valid.  To be valid, a well-formed XML document must also follow rules that define its structure.  One way of defining the structure of an XML document is by using a Document Type Definition (DTD).  XML documents can also be validated using XML Schemas, but for the sake of simplicity I will only be looking at DTDs in this post.

A DTD defines the structure of the XML document by providing a list of elements and attributes that are valid or "legal" within the XML document.  Using the projects XML document discussed above as an example, the following DTD can be used to validate the document:
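
The listing below is a reconstruction based on the description in this section, so treat it as a sketch.  It is laid out so that its line positions match the line numbers referred to in the discussion (the DOCTYPE on line 3, the element declarations on lines 5 to 16 and the attribute declaration on line 18).  The sample project after the DTD is made-up data:

```xml
<?xml version="1.0"?>
<!-- Reconstructed sketch: the sample data further down is made up -->
<!DOCTYPE projects [

  <!ELEMENT projects (project+)>
  <!ELEMENT project (student, title, abstract, date_submitted)>
  <!ELEMENT student (id, name, surname)>
  <!ELEMENT id (#PCDATA)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT surname (#PCDATA)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT abstract (#PCDATA)>
  <!ELEMENT date_submitted (day, month, year)>
  <!ELEMENT day (#PCDATA)>
  <!ELEMENT month (#PCDATA)>
  <!ELEMENT year (#PCDATA)>

  <!ATTLIST project category CDATA #REQUIRED>
]>
<projects>
  <project category="Software">
    <student>
      <id>12345</id>
      <name>Jane</name>
      <surname>Doe</surname>
    </student>
    <title>Sample Project Title</title>
    <abstract>A short summary of the project goes here.</abstract>
    <date_submitted>
      <day>4</day>
      <month>3</month>
      <year>2011</year>
    </date_submitted>
  </project>
</projects>
```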

Line 3 states that this DTD applies to the "projects" document type, which is the root element of our XML document.  Lines 5 to 16 list the legal elements that are expected within the XML document.  Looking closer, line 5 states that the projects element (our root element) must have one or more occurrences of the <project> element (the '+' sign indicates one or more).  Similarly, line 6 states that the <project> element must have exactly one occurrence (no '+' sign) of each of the <student>, <title>, <abstract> and <date_submitted> elements, in that order.  The DTD also specifies that the lowest-level elements such as id, name and surname contain PCDATA, or parsed character data.  In other words, the text of these elements is parsed by the XML parser, which resolves entity references and recognises markup.  CDATA (character data), by contrast, is not a content type for elements in a DTD; it is used as an attribute type and means that the attribute value is a plain string.  Line 18 specifies that the <project> element must have (#REQUIRED) a "category" attribute of type CDATA.

The DTD can be defined within the XML document (as in the example above) or in a separate file.  If a separate file is used, the <!DOCTYPE> declaration should be written as "<!DOCTYPE projects SYSTEM 'DTDfilename'>".
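
As a sketch of the separate-file form, the document below points at an external DTD.  The file name "projects.dtd" is a hypothetical name chosen for illustration, and the sample data is made up.  The external file would hold only the <!ELEMENT> and <!ATTLIST> declarations, without the surrounding "<!DOCTYPE projects [ ... ]>" wrapper:

```xml
<?xml version="1.0"?>
<!-- "projects.dtd" is a hypothetical file name used for illustration -->
<!DOCTYPE projects SYSTEM "projects.dtd">
<projects>
  <project category="Software">
    <student>
      <id>12345</id>
      <name>Jane</name>
      <surname>Doe</surname>
    </student>
    <title>Sample Project Title</title>
    <abstract>A short summary of the project goes here.</abstract>
    <date_submitted>
      <day>4</day>
      <month>3</month>
      <year>2011</year>
    </date_submitted>
  </project>
</projects>
```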

The Importance of XML
As hinted earlier in this post, the fact that XML is text based makes it a truly platform-independent way of storing and sharing data.  This is especially important for sharing data between applications over the web, where more often than not these applications are incompatible with one another.  XML also makes your data more available, in the sense that it can be consumed by any number of services and devices.  XML is also being used to create new languages.  XHTML, RSS, WSDL and, more recently, XAML are all based on XML.

Just the Beginning
In this post, we have only seen the tip of the proverbial "iceberg" with respect to XML.  There are many other topics to explore, including XML Transformations, XML Schema Validation, XML in web services and many more.  Hopefully, I will have the opportunity to cover XML in more detail in future posts.


