Thursday, 3 November 2011

CMT3315 Lab 03 - XML Syntax 1


Today's post introduces the basics of XML syntax and answers a number of questions related to this topic.

XML document structure & syntax

The basic idea behind XML is to produce documents whose structure can be understood by software applications. In an XML document, pieces of text that have special meaning are marked up using tags. A tag is simply a word between angle brackets such as "<name>". An XML document is made up of 3 parts:

  • A prolog (optional);
  • The document or root element;
  • Other miscellaneous content following the root element end tag (optional).

The prolog is an optional component of an XML document which, if included, must appear before the root element. It consists of an XML declaration, which states the XML version and character encoding being used, and a document type declaration (the DOCTYPE, which may reference or contain a Document Type Definition, or DTD). The prolog may also contain comments. Here's an example of an XML prolog:

   <?xml version="1.0" encoding="UTF-8"?>
   <!--This is a comment-->
   <!DOCTYPE musicCollection>

Lines 1 and 2 are the XML declaration and a comment respectively. Line 3 is the document type declaration, which provides the name of the document element, "musicCollection". The document or root element follows the prolog. An XML document must have exactly one root element, which in many cases contains a hierarchy of other elements. For instance, let's assume that our music collection is made up of a number of CDs where each CD has a title, an artist and a number of tracks. A suitable XML document to store this information would be:

   <?xml version="1.0" encoding="UTF-8"?>
   <!DOCTYPE musicCollection>
   <!--Prolog ends here. -->
   <musicCollection>
      <cd index="1">
         <title>Innuendo</title>
         <artist>Queen</artist>
         <tracks>
            <track id="1">Innuendo</track>
            <track id="2">I'm Going Slightly Mad</track>
            <track id="3">Headlong</track>
            <track id="4">I Can't Live With You</track>
            <track id="5">Don't Try So Hard</track>
            <track id="6">Ride The Wild Wind</track>
            <track id="7">All God's People</track>
            <track id="8">These Are The Days Of Our Lives</track>
            <track id="9">Delilah</track>
            <track id="10">The Hitman</track>
            <track id="11">Bijou</track>
            <track id="12">The Show Must Go On</track>
         </tracks>
      </cd>
      <cd index="2">
         <title>(What's The Story) Morning Glory?</title>
         <artist>Oasis</artist>
         <tracks>
            <track id="1">Hello</track>
            <track id="2">Roll With It</track>
            <track id="3">Wonderwall</track>
            <track id="4">Don't Look Back In Anger</track>
            <track id="5">Hey Now!</track>
            <track id="6">Untitled</track>
            <track id="7">Some Might Say</track>
            <track id="8">Cast No Shadow</track>
            <track id="9">She's Electric</track>
            <track id="10">Morning Glory</track>
            <track id="11">Untitled</track>
            <track id="12">Champagne Supernova</track>
         </tracks>
      </cd>
   </musicCollection>

Above we can see that our "musicCollection" root element contains two "cd" elements, which in turn contain further elements describing each CD. Every start tag has a matching end tag (e.g. "<title>" and "</title>"). An XML element can have zero, one or more child elements, and every element except the root must have a parent element. Furthermore, elements must be correctly nested for the document to be well formed: an element opened inside another must also be closed inside it. Elements can also carry attributes, which must be placed within the element's start tag. In our example, every "cd" element has an attribute called "index" and every "track" element has an attribute called "id". Attributes can be thought of as data describing data and are often (but not necessarily) used for storing IDs. An attribute is made up of a name, followed by an "=" sign and a value within quotes, which may be single or double as long as they match. We can use this example as a basis for discussing the next section.
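
Before moving on, the structure described above can also be explored programmatically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module and a shortened copy of the music collection (one CD, two tracks). The variable names are illustrative only and are not part of the lab material.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the music collection document (assumption: one CD and
# two tracks are enough to show the structure).
DOCUMENT = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE musicCollection>
<musicCollection>
   <cd index="1">
      <title>Innuendo</title>
      <artist>Queen</artist>
      <tracks>
         <track id="1">Innuendo</track>
         <track id="2">I'm Going Slightly Mad</track>
      </tracks>
   </cd>
</musicCollection>
"""

# Parse from bytes so the parser honours the declared encoding.
root = ET.fromstring(DOCUMENT.encode("utf-8"))

# The root element is the single top-level element of the document.
root_name = root.tag

# Child elements are reached through their parent; attributes live on the
# start tag and are exposed through get().
cd = root.find("cd")
cd_index = cd.get("index")
title = cd.findtext("title")
track_ids = [track.get("id") for track in cd.iter("track")]
track_names = [track.text for track in cd.iter("track")]

print(root_name, cd_index, title, track_ids, track_names)
```

Note that attribute values always come back as strings, so the "index" and "id" values would need converting before being used as numbers.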

Well Formedness

A well formed XML document is one that follows proper XML syntax. Unlike most HTML parsers, XML parsers expect the document to be well formed and will stop processing it as soon as a syntax error is found. An XML document must be structured as discussed in the previous section and all element and attribute names must be valid. The first character in a name must be a letter ([A-Z], [a-z]), a colon ":" or an underscore "_", and names beginning with the letters "xml" (in any combination of upper and lower case) are reserved. The remaining characters can also include digits ([0-9]), dashes "-" and full stops ".".
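
The naming rules above can be expressed as a small check. This is only a sketch under stated assumptions: it covers the simplified, ASCII-only rules given in this post (the full XML specification also permits many non-ASCII characters), it treats names starting with "xml" as invalid, and the function name is_valid_name is made up for illustration.

```python
import re

# Simplified, ASCII-only version of the naming rules described above.
# Assumption: the real XML specification also allows many non-ASCII
# characters, which this sketch ignores.
NAME_PATTERN = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")


def is_valid_name(name: str) -> bool:
    """Return True if `name` follows the simplified rules above."""
    if name.lower().startswith("xml"):
        # Names starting with "xml" (any letter case) are treated as invalid here.
        return False
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["track", "_private", "cd-index.1", "2ndCity", "xmlData", "my name"]:
    print(candidate, is_valid_name(candidate))
```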

XML is case-sensitive so, using our example above, the element "<track>" is not the same as the element "<Track>". Furthermore, end tags must match their start tags. In some cases, XML elements will not contain any content. In such cases the end tag may be omitted by placing a "/" at the end of the start tag, for example:

 <emptyElement index="0" />
 <!-- is equivalent to -->
 <emptyElement index="0"></emptyElement>

These are the basic rules to follow to create a well formed XML document.
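
These rules can be tried out against a real parser. The sketch below assumes Python's standard xml.etree.ElementTree module; the helper is_well_formed is a hypothetical name introduced here, and it only reports whether the parser accepted the text, not what was wrong with it.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` without a syntax error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Start and end tags must match exactly, including letter case.
print(is_well_formed('<track id="1">Innuendo</track>'))
print(is_well_formed('<track id="1">Innuendo</Track>'))

# Elements must be correctly nested.
print(is_well_formed("<cd><title>Innuendo</cd></title>"))

# The two forms of an empty element yield the same tag name and attributes.
short_form = ET.fromstring('<emptyElement index="0" />')
long_form = ET.fromstring('<emptyElement index="0"></emptyElement>')
same_element = (short_form.tag, short_form.attrib) == (long_form.tag, long_form.attrib)
print(same_element)
```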

Lab Questions
Q1. Write an XML document that contains the following information:
  • Your name;
  • Your email address;
  • Your student number;
  • Your home town;
  • Your date of birth.
Choose appropriate tags. Use attributes for the date of birth.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE student>
<student dateOfBirth="01/01/1990">
 <name>Wayne Zammit</name>
 <email>wayne@somedomain.com</email>
 <studentNo>NN123</studentNo>
 <homeTown>St. Julians</homeTown>
</student>


Q2. Identify all the syntax errors in the XML document below:

<?xml version= "1.0" ?>
<!DOCTYPE countryCollection SYSTEM "countryList.dtd">
<CountryList>
<Nations TotalNations ="3"/>
<!--Data from CIA --Year Book -->
 <Country CountryCode="1"> 
  <OfficialName>United States of America</officialName>
  <Label>Common Names:</label>  
  <CommonName>United States</commonName>
  <CommonName>U.S.</commonName>
  <Label>Capital:</capital>  
  <Capital cityNum="1">Washington, D.C. </label>
  <2ndCity cityNum="2">New York </2ndCity> 
  <Label>Major Cities:</label> 
  <MajorCity cityNum="3">Los Angeles </majorCity>
  <MajorCity cityNum="4">Chicago </majorCity>  
  <MajorCity cityNum="5’>Dallas </majorCity>  
  <Label>Bordering Bodies of Water:</label>    
  <BorderingBodyOfWater> Atlantic Ocean </borderingBodyOfWater>
  <BorderingBodyOfWater> Pacific Ocean </borderingBodyOfWater>  
  <BorderingBodyOfWater> Gulf of Mexico </borderingBodyOfWater> 
  <Label>Bordering Countries:</label>   
  <BorderingCountry CountryCode="1"> Canada </borderingCountry>    
  <BorderingCountry CountryCode ="52"> Mexico </borderingCountry>
</country>
 <Country CountryCode="81">
  <OfficialName> Japan </officialName>
  <Label>Common Names:</label>    
  <CommonName> Japan </commonName>
  <Label>Capital:</label>  
  <Capital>Tokyo</capital cityNum="1">
  <2ndCity cityNum="2">Osaka </2ndCity>
  <Label>Major Cities:</label>    
  <MajorCity cityNum="3">Nagoya </majorCity>
  <MajorCity cityNum="4">Osaka </majorCity>  
  <MajorCity cityNum="5’>Kobe </majorCity>  
  <Label>Bordering Bodies of Water:</label>
  <BorderingBodyOfWater>Sea of Japan </borderingBodyOfWater>
  <BorderingBodyOfWater>Pacific Ocean </borderingBodyOfWater>  
 </country>
 <Country CountryCode="254">
  <OfficialName> Republic of Kenya </officialName>
  <Label>Common Names:</label>    
  <CommonName> Kenya </commonName>
  <Label>Capital:</label>  
  <Capital cityNum=’1’>Nairobi </capital>
  <2ndCity cityNum=’2’>Mombasa</2ndCity>
  <Label>Major Cities:</label>    
  <MajorCity cityNum=’3’>Mombasa </majorCity>
  <MajorCity cityNum=’4’>Lamu </majorCity>
  <MajorCity cityNum=’5’>Malindi </majorCity>  
  <MajorCity cityNum=’6’ cityNum=’7’>Kisumu-Kericho </majorCity> 
  <Label>Bordering Bodies of Water:</label>
  <BorderingBodyOfWater <!--Also Lake Victoria --> > Indian Ocean </borderingBodyOfWater>
 </country> 
The XML document contains various syntax errors, including the following (line numbers count from the XML declaration as line 1):

  • The root element should be called "countryCollection" as specified in the DOCTYPE declaration (strictly speaking this mismatch is a validity problem rather than a syntax error). There is also no matching end tag for the root element;
  • The comment at line 5 contains an extra "--" between its start and end delimiters. This is not allowed;
  • End tags do not match their start tags because of different letter casing (e.g. "<OfficialName>" and "</officialName>"), and in some instances (e.g. lines 11 and 12) end tags have been swapped;
  • The "2ndCity" element name is invalid as names cannot start with a digit (lines 13, 32, 47);
  • Mismatched quotes at lines 17 and 36, where the attribute value opens with " and closes with ’;
  • Attribute placed in the end tag at line 31. This should be placed in the start tag;
  • Typographic (curly) quotes ’ used to enclose attribute values at lines 46 to 52. Only straight single (') or double (") quotes are allowed;
  • Duplicate attribute "cityNum" at line 52;
  • Comment placed within the start tag at line 54.
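
Several of the errors listed above can be reproduced in isolation. The following is a rough sketch, assuming Python's standard xml.etree.ElementTree module; the fragments are cut down from the lab document so that each contains a single problem, and the helper first_error is a hypothetical name. Because an XML parser stops at the first error it meets, running the full lab document would only ever report one problem at a time.

```python
import xml.etree.ElementTree as ET

# Small fragments, each reproducing one kind of error from the list above.
# Assumption: these are simplified stand-ins, not the lab document itself.
FRAGMENTS = {
    "double hyphen in comment": "<a><!--Data from CIA --Year Book --></a>",
    "mismatched letter case": "<OfficialName>Japan</officialName>",
    "name starts with a digit": '<2ndCity cityNum="2">Osaka</2ndCity>',
    "mismatched quotes": '<MajorCity cityNum="5\u2019>Kobe</MajorCity>',
    "attribute in end tag": '<Capital>Tokyo</Capital cityNum="1">',
    "curly quotes": "<Capital cityNum=\u20191\u2019>Nairobi</Capital>",
    "duplicate attribute": '<MajorCity cityNum="6" cityNum="7">Kisumu</MajorCity>',
    "comment inside start tag": "<Water <!-- Lake Victoria --> >Indian Ocean</Water>",
}


def first_error(text: str):
    """Return the parser's message for the first error, or None if there is none."""
    try:
        ET.fromstring(text)
    except ET.ParseError as error:
        return str(error)
    return None


results = {label: first_error(fragment) for label, fragment in FRAGMENTS.items()}
for label, message in results.items():
    print(f"{label}: {message}")
```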

