Tuesday, 8 November 2011

CMT3315 LAB04 - XML Syntax 2


Today's post introduces Document Type Definition (DTD) syntax and answers a number of questions relating to XML syntax and DTDs.

Document Type Definition

In last week's post we created the following XML document containing information about a small music collection:

   <?xml version="1.0" encoding="UTF-8"?>
   <!DOCTYPE musicCollection>
   <!--Prolog ends here. -->
   <musicCollection>
      <cd index="1">
         <title>Innuendo</title>
         <artist>Queen</artist>
         <tracks>
            <track id="1">Innuendo</track>
            <track id="2">I'm Going Slightly Mad</track>
            <track id="3">Headlong</track>
            <track id="4">I Can't Live With You</track>
            <track id="5">Don't Try So Hard</track>
            <track id="6">Ride The Wild Wind</track>
            <track id="7">All God's People</track>
            <track id="8">These Are The Days Of Our Lives</track>
            <track id="9">Delilah</track>
            <track id="10">The Hitman</track>
            <track id="11">Bijou</track>
            <track id="12">The Show Must Go On</track>
         </tracks>
      </cd>
      <cd index="2">
         <title>(What's The Story) Morning Glory?</title>
         <artist>Oasis</artist>
         <tracks>
            <track id="1">Hello</track>
            <track id="2">Roll With It</track>
            <track id="3">Wonderwall</track>
            <track id="4">Don't Look Back In Anger</track>
            <track id="5">Hey Now!</track>
            <track id="6">Untitled</track>
            <track id="7">Some Might Say</track>
            <track id="8">Cast No Shadow</track>
            <track id="9">She's Electric</track>
            <track id="10">Morning Glory</track>
            <track id="11">Untitled</track>
            <track id="12">Champagne Supernova</track>
         </tracks>
      </cd>
   </musicCollection>

A schema is a document that defines the structure that a particular class (type) of XML document should follow. In our example, line 2 declares that this document is of type musicCollection. We can use a Document Type Definition (DTD), one kind of schema, to define the legal structure of the musicCollection document type.

Looking at our example, we can see that every "cd" element is expected to have a "title", an "artist" and a "tracks" element. In turn, the "tracks" element contains multiple "track" elements. The number of times an element may occur is known as its cardinality. In a DTD, we can define which elements are expected, as well as their attributes, order (sequence) and cardinality.

Using this information we can write our DTD, starting from the root element:

<!--musicCollection.dtd-->
<!ELEMENT musicCollection (cd+)>

This is the first declaration in our DTD. It specifies that the "musicCollection" document element is made up of one or more "cd" elements. The "+" sign after "cd" gives the "cd" element its "one or more" cardinality. Other cardinality specifiers include:

  • ? - means 0 or 1
  • * - means 0 or more
  • no specifier means "exactly 1"
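As an illustration, the hypothetical declaration below uses every kind of specifier in one content model. The "subtitle" and "genre" elements are invented for this example and are not part of our music collection:

```xml
<!-- Hypothetical cd declaration showing each cardinality specifier -->
<!ELEMENT cd (title, subtitle?, artist+, genre*, tracks)>
<!-- title    : no specifier, exactly one          -->
<!-- subtitle : ?, zero or one (optional)          -->
<!-- artist   : +, one or more (a collaboration)   -->
<!-- genre    : *, zero or more                    -->
<!-- tracks   : no specifier, exactly one          -->
```

The comma-separated list also fixes the order, so a "subtitle" may only appear between "title" and the first "artist".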

Let's define the "cd" element next:

<!--musicCollection.dtd-->
<!ELEMENT musicCollection (cd+)>
<!ELEMENT cd (title, artist, tracks)>
<!ATTLIST cd index CDATA #REQUIRED>

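The "cd" element is made up of the "title", "artist" and "tracks" elements, in that order. Since each of these must appear exactly once in a "cd" element, no cardinality specifier is needed. Additionally, the "cd" element has an "index" attribute, which is declared on line 4 of the DTD. The declaration specifies that the "index" attribute is of type CDATA (character data) and is required (#REQUIRED). As an attribute type, CDATA simply means that the value can be any string of text. The parser still processes the value (for example, it expands entity references such as &amp;), so this is not the same thing as a CDATA section in element content, whose contents are not parsed as markup. Let's add the rest of our elements: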

<!--musicCollection.dtd-->
<!ELEMENT musicCollection (cd+)>
<!ELEMENT cd (title, artist, tracks)>
<!ATTLIST cd index CDATA #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT artist (#PCDATA)>
<!ELEMENT tracks (track+)>
<!ELEMENT track (#PCDATA)>
<!ATTLIST track id CDATA #REQUIRED>

Our DTD is now complete. Each of the "title", "artist" and "track" elements is defined as containing parsed character data (#PCDATA). This is text that the parser scans for markup, so entity references such as &amp; are expanded and a literal "<" or "&" is not allowed. We also specify that the "tracks" element must contain one or more "track" elements and that every "track" element must have an "id" attribute. All we need to do now is reference our DTD from the XML document. Assuming that our DTD file is called "musicCollection.dtd" and resides in the same directory as the XML file, the following change to the document type declaration in the XML file will reference the DTD:

<!DOCTYPE musicCollection SYSTEM "musicCollection.dtd">

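To make the parsed/unparsed distinction concrete, here is a short sketch using a hypothetical "artist" entry that contains an ampersand. In #PCDATA content the parser recognises markup, so a literal "&" must be written as the entity reference &amp;. Inside a CDATA section, the same character can appear as-is. Both elements below carry the same text, "Simon & Garfunkel":

```xml
<!-- Parsed character data: &amp; is an entity reference that the
     parser replaces with a single ampersand character -->
<artist>Simon &amp; Garfunkel</artist>

<!-- CDATA section: the parser does not look for markup inside it,
     so the ampersand can be written literally -->
<artist><![CDATA[Simon & Garfunkel]]></artist>

<!-- Not well-formed (shown commented out): a bare ampersand in
     parsed character data starts a reference that is never completed -->
<!-- <artist>Simon & Garfunkel</artist> -->
```
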
Summary

DTDs can be declared inside the XML document itself (as an internal subset) or in external files. Either way, they are a simple yet effective way of defining and validating the structure of XML documents.
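For comparison, here is a sketch of the same declarations placed in an internal subset, between square brackets in the DOCTYPE declaration. With this form, the document no longer needs the external musicCollection.dtd file. The collection is trimmed to one CD with two tracks to keep the example short:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Internal subset: the DTD lives inside the DOCTYPE declaration -->
<!DOCTYPE musicCollection [
  <!ELEMENT musicCollection (cd+)>
  <!ELEMENT cd (title, artist, tracks)>
  <!ATTLIST cd index CDATA #REQUIRED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT artist (#PCDATA)>
  <!ELEMENT tracks (track+)>
  <!ELEMENT track (#PCDATA)>
  <!ATTLIST track id CDATA #REQUIRED>
]>
<musicCollection>
   <cd index="1">
      <title>Innuendo</title>
      <artist>Queen</artist>
      <tracks>
         <track id="1">Innuendo</track>
         <track id="2">I'm Going Slightly Mad</track>
      </tracks>
   </cd>
</musicCollection>
```

An external file is usually the better choice when many documents share one structure. An internal subset keeps a one-off document self-contained.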

The following are the questions for this week's lab session.

Quick Questions
Q1. What does XML stand for?  And CSS?

XML stands for Extensible Markup Language and CSS stands for Cascading Style Sheets.

Q2. Is this XML line well-formed? Say why.

<b><i>This text is bold and italic </i></b>

Yes, this line is well-formed. Both start tags match their end tags, and the elements are properly nested: <i> is opened inside <b> and closed before </b>.
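For contrast, swapping the two end tags produces overlapping elements, which is not well-formed. The parser meets </b> while <i> is still the innermost open element:

```xml
<!-- Not well-formed: <i> is opened inside <b> but closed after </b> -->
<b><i>This text is bold and italic </b></i>
```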

Q3. Is this XML document well-formed? Say why.

<?xml version= "1.0" ?>
<greeting>
Hello, world!
</greeting>
<greeting>
Hello Mars too!
</greeting>
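No. This XML document is not well-formed because it has no single root element. It contains two top-level "greeting" elements, whereas a well-formed document must have exactly one root element that contains all the others.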
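One way to make it well-formed is to wrap both greetings in a single root element. The name "greetings" below is our own choice and is not part of the question:

```xml
<?xml version="1.0"?>
<!-- A single root element now encloses both greeting elements -->
<greetings>
  <greeting>Hello, world!</greeting>
  <greeting>Hello Mars too!</greeting>
</greetings>
```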

Longer Questions
Q1. Write an XML document that contains the following information:
  • The name of this course;
  • The name of this building;
  • The name of this room;
  • The start and end times of this session.
Choose appropriate tags.  Use attributes for the start and end times.
<?xml version= "1.0" ?>
<course>
 <name>CMT 3315 Advanced Web Technologies</name>
 <buildingName>STC Training</buildingName>
 <roomName>Room 5</roomName>
 <session startTime="18:00" endTime="21:00"/>
</course>
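Since this week's topic is DTDs, here is a sketch of how a DTD for this answer might look. The file name course.dtd is hypothetical. The EMPTY keyword declares that "session" has no content, which matches the self-closing <session/> tag, and both time attributes are required. This is one reasonable design, not the only one:

```xml
<!-- course.dtd (hypothetical file name) -->
<!ELEMENT course (name, buildingName, roomName, session)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT buildingName (#PCDATA)>
<!ELEMENT roomName (#PCDATA)>
<!-- session carries all of its information in attributes -->
<!ELEMENT session EMPTY>
<!ATTLIST session startTime CDATA #REQUIRED
                  endTime   CDATA #REQUIRED>
```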



Q2. Identify all the syntax errors in the following XML document:

<?xml version= "1.0" ?>
<!DOCTYPE bookStock SYSTEM "bookstock.dtd">
<bookstore>
  <book category="Cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <1stEdition>2005</1stEdition >
    <2ndEdition>2007</2ndEdition >
    <price>19.99</price currency="pounds sterling">
  </book>
  <book category="Children’>
    <title lang="en">Harry Potter and the enormous pile of money</title>
  <!—best selling children’s book of the year --2009 -->
    <author>J K. Rowling</author>
   <1stEdition>2005</1stEdition>
    <price>29.99</Price>
  </book>
  <book category="Web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
   <1stEdition>2003</1stEdition>
   <2ndEdition >2008</2ndEdition >
    <price>29.95</discount>
    <discount>15%</price>
  </book>
  <book category="Computing">
    <title lang=en>Insanely great – the life and times of Macintosh, the computer that changed everything </title>
    <author <!—other authors not listed -->>Steven Levy</author>
   <1stEdition>1994</1stEdition>
    <price>9.95</discount>
    <discount>15%</price>
  </book>

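The XML document contains various syntax errors, including:
  • The root element should be called "bookStock" to match the DOCTYPE declaration (strictly speaking, this is a validity error rather than a well-formedness error). The root element also has no closing tag;
  • The "1stEdition" and "2ndEdition" element names are invalid, as XML names cannot start with a digit (lines 7, 8, 15, 21, 22 and 29);
  • An attribute is placed in the end tag on line 9. Attributes belong in the start tag;
  • Mismatched quotes on line 11: the attribute value opens with a straight double quote but ends with a curly single quote;
  • Incorrect comment start on line 13 (a single dash-like character is used instead of the two hyphens in "<!--"), and an illegal "--" inside the comment;
  • Mismatched start and end tags on lines 16, 23, 24, 30 and 31 (XML names are case-sensitive, so </Price> on line 16 does not close <price>);
  • Missing quotes around the attribute value on line 27;
  • A comment is incorrectly placed within the start tag on line 28, and its comment start is also incorrect.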
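A corrected version might look like the sketch below. The replacement names firstEdition and secondEdition are our own choice; any name that does not start with a digit would do. We have also assumed that each "price" tag was meant to enclose the price and each "discount" tag the discount. Because bookstock.dtd is not shown, this version has only been checked for well-formedness, not validity:

```xml
<?xml version="1.0"?>
<!DOCTYPE bookStock SYSTEM "bookstock.dtd">
<!-- Root element renamed to match the DOCTYPE declaration -->
<bookStock>
  <book category="Cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <firstEdition>2005</firstEdition>
    <secondEdition>2007</secondEdition>
    <!-- Attribute moved from the end tag to the start tag -->
    <price currency="pounds sterling">19.99</price>
  </book>
  <book category="Children">
    <title lang="en">Harry Potter and the enormous pile of money</title>
    <!-- best selling children's book of the year (2009) -->
    <author>J K. Rowling</author>
    <firstEdition>2005</firstEdition>
    <price>29.99</price>
  </book>
  <book category="Web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <firstEdition>2003</firstEdition>
    <secondEdition>2008</secondEdition>
    <price>29.95</price>
    <discount>15%</discount>
  </book>
  <book category="Computing">
    <title lang="en">Insanely great – the life and times of Macintosh, the computer that changed everything</title>
    <!-- Comment moved out of the start tag: other authors not listed -->
    <author>Steven Levy</author>
    <firstEdition>1994</firstEdition>
    <price>9.95</price>
    <discount>15%</discount>
  </book>
</bookStock>
```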


Q3. You are asked to produce a Document Type Declaration for a class of XML documents called “memo”. You come up with this .dtd file:

<!DOCTYPE memo
[
<!ELEMENT memo (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>


Your client says “That’s all very well, but every memo has to have a date. And some of them have to have a security classification too (you might want to write “Secret” at the top). And a memo has a serial number – I think that’s what you’d call an attribute, isn’t it?” How would you amend this .dtd file so that it did what the client wanted?

A suitable DTD to fulfill these requirements would be as follows:

<!DOCTYPE memo
[
<!ELEMENT memo (date,to,from,heading,body,classification?)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT classification (#PCDATA)>
<!ATTLIST memo serialNo ID #REQUIRED>
]>

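Given that not all memos will have a security classification, the DTD uses the "?" cardinality specifier to indicate that the "memo" element can contain zero or one "classification" element. The serial number is declared as an attribute of type ID. This means each value must be unique within the document and must be a valid XML name. A value such as "M1024" is therefore allowed, but a purely numeric "1024" is not, so CDATA would be the simpler choice if serial numbers are plain numbers.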

Other specifiers include:

  • "*" - means zero or more occurrences of the element are allowed
  • "+" - means one or more occurrences of the element are allowed
Finally, an element name with no cardinality specifier must appear exactly once.
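As a quick check of the amended DTD, here is a sketch of a memo that should validate against it. This assumes the declarations above are supplied as the document's internal subset. The names, date, text and serial number are invented placeholders:

```xml
<!-- Children appear in the order the content model requires:
     date, to, from, heading, body, then the optional classification -->
<!-- serialNo is an ID, so its value must not start with a digit -->
<memo serialNo="M1024">
  <date>8 November 2011</date>
  <to>All lab staff</to>
  <from>Module leader</from>
  <heading>Lab 04</heading>
  <body>Please complete the DTD exercises before next week.</body>
  <classification>Secret</classification>
</memo>
```

Omitting the "classification" element would still be valid, but omitting "date" or the "serialNo" attribute would not.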
