Thursday 8 December 2011

CMT3315 Lab 07 - DTD 3

 
Quick Questions
Q1. People who prepare XML documents sometimes put part of the document in a CDATA section.

  • Why would they do that?
  • How is the CDATA section indicated?
  • If CDATA sections hadn't been invented, would there be any other way to achieve the same effect?

Sometimes the content of an XML document includes characters that have a special meaning in XML, such as "<", ">" and "&".  The text between XML tags is parsed too, so including "<" or "&" 'as is' will break the document: the parser interprets them as the start of markup when they are only meant as part of the text.  A CDATA section solves this problem by marking a block of text as literal character data.  The parser still reads the block, but it does not interpret anything inside it as markup.

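A CDATA section is indicated by placing the text between the opening delimiter "<![CDATA[" and the closing delimiter "]]>".  These are delimiters rather than tags, and the text inside the section cannot itself contain "]]>".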

Instead of using CDATA, one could replace the "<", ">" and "&" characters with the entity references "&lt;", "&gt;" and "&amp;" respectively to achieve the same effect, a technique known as "escaping".  This option, however, is more laborious and makes the XML harder for humans to read.
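
To check that the two approaches really are equivalent, here is a minimal sketch using Python's standard xml.etree.ElementTree module.  The "code" element and the text inside it are made up for illustration and are not part of the lab material.

```python
import xml.etree.ElementTree as ET

# The same text written twice: once in a CDATA section, once with escaping.
# The <code> element and its content are hypothetical examples.
with_cdata = "<code><![CDATA[if (a < b && b > c) { run(); }]]></code>"
with_escapes = "<code>if (a &lt; b &amp;&amp; b &gt; c) { run(); }</code>"

cdata_text = ET.fromstring(with_cdata).text
escaped_text = ET.fromstring(with_escapes).text

print(cdata_text)                  # the literal text, special characters intact
print(cdata_text == escaped_text)  # both forms carry the same character data
```

Once parsed, an application sees the same text either way; the difference is only in how readable the source document is.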


Q2. What is a parser and what does it have to do with validity?
A parser is a program that reads an XML document, checks it, and passes its content on to an application.  XML parsers can be classified as either "validating" or "non-validating" depending on the checks they perform on an XML document.  Non-validating parsers simply check the XML syntax to determine whether or not the document is well formed.  Validating parsers go one step further by also checking the document against a schema (such as a DTD).  A validating parser therefore ensures that the XML document is both well formed and valid.
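
As a rough illustration of the non-validating case, the sketch below uses Python's xml.etree.ElementTree, which checks well-formedness only and does not validate against a DTD.  The helper name is_well_formed and the sample documents are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if a non-validating parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Properly nested tags: well formed.
print(is_well_formed("<poem><verse>Rayless</verse></poem>"))
# Overlapping tags: not well formed.
print(is_well_formed("<poem><verse>Rayless</poem></verse>"))
# Well formed, but invalid against a DTD that only allows <verse> inside
# <poem>; only a validating parser would report that.
print(is_well_formed("<poem><stanza>Rayless</stanza></poem>"))
```

The third document is the interesting one: it passes the well-formedness check, so catching the unexpected "stanza" element needs a validating parser and a DTD.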


Q3. You write a .dtd file to accompany a class of XML documents.  You want one of the elements, with the tag <trinity>, to appear exactly three times within the document element of every document in this class.  Is it possible for the .dtd file to specify this?
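Yes, but only indirectly.  DTD occurrence indicators can only specify that an element appears
  • exactly once (no indicator)
  • zero or one times ("?")
  • zero or many times ("*")
  • one or many times ("+")
so there is no way to write "exactly n times" as a single count.  The content model can, however, list the element once for each required occurrence: declaring the document element (here hypothetically named "doc") as <!ELEMENT doc (trinity, trinity, trinity)> requires exactly three "trinity" children.  This becomes unwieldy for large n, and it cannot be used with mixed content, where only the "(#PCDATA|a|b)*" form is allowed.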
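
A content model that names "trinity" three times in a row, (trinity, trinity, trinity), accepts exactly three such children, and its behaviour can be mimicked outside a validator.  The sketch below is only a simulation in Python, treating that content model as a regular expression over the names of the root's child elements.  It is not a DTD validator (Python's standard library does not include one), and the root name "doc" and the helper matches_model are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model (trinity, trinity, trinity), written as a regular
# expression over the comma-terminated names of the root's child elements.
CONTENT_MODEL = re.compile(r"(trinity,){3}")

def matches_model(xml_text: str) -> bool:
    """Simulate the content model check; this is not real DTD validation."""
    root = ET.fromstring(xml_text)
    child_names = "".join(child.tag + "," for child in root)
    return CONTENT_MODEL.fullmatch(child_names) is not None

print(matches_model("<doc><trinity/><trinity/><trinity/></doc>"))  # three children
print(matches_model("<doc><trinity/><trinity/></doc>"))            # only two
```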

Longer Question
This question is a continuation of the previous post's "long" question number 2, in which we were given the contents of chapter 2 of the book "Toba: The worst volcanic eruption ever".  We were required to:
  • Write a suitable prolog for the document (chapter 2);
  • Modify the book's .dtd file to cater for the new tags introduced in this document;
  • Put suitable tags into the document, to identify special pieces of text such as a poem or words that would feature in the book's index. 
  • Add pictures to the document at appropriate places.

A suitable prolog for this document would be the following:
        <?xml version="1.0" encoding="UTF-8"?>
        <!-- Chapter 2: Volcanic Winter-->
        <!DOCTYPE text SYSTEM "Lab07_book.dtd">
    

The DOCTYPE declaration specifies that the document is validated against the file "Lab07_book.dtd" (the same one used by the entire book) and identifies "text" as the document (root) element.
The "text" element from the previous post's .dtd file is modified to include the new elements being introduced this week.  Furthermore, new entities pointing to the images that will be added to the document are also declared.  The resulting DTD - Lab07_book.dtd - is as follows:
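
To see what a parser picks up from this prolog, here is a small sketch using Python's xml.dom.minidom.  The empty "text" element is a stand-in for the real chapter, and minidom only records the DOCTYPE information; it does not load or apply the DTD.

```python
from xml.dom import minidom

# The prolog from above, followed by a stand-in (empty) document element.
document = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<!-- Chapter 2: Volcanic Winter-->\n'
    '<!DOCTYPE text SYSTEM "Lab07_book.dtd">\n'
    '<text/>'
)

dom = minidom.parseString(document)

print(dom.doctype.name)             # name the DOCTYPE gives to the root element
print(dom.doctype.systemId)         # system identifier of the external DTD
print(dom.documentElement.tagName)  # actual root element of the document
```

For the document to be valid, the name in the DOCTYPE declaration and the actual root element must match.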
        <?xml version="1.0" encoding="UTF-8"?>
        <!NOTATION jpg PUBLIC "image/jpeg">
        <!ENTITY Sumbawa SYSTEM "sumbawa.jpg" NDATA jpg>
        <!ENTITY LakeGeneva SYSTEM "Geneva1816.jpg" NDATA jpg>
        <!ENTITY MaryShelley SYSTEM "MaryShelley.jpg" NDATA jpg>
        <!ENTITY pub "STC Press, Malta">
        <!ENTITY chap1 SYSTEM "chap1.xml">
        <!ENTITY chap2 SYSTEM "chap2.xml">
        <!ENTITY chap3 SYSTEM "chap3.xml">
        <!ELEMENT book (titlePage, titlePageVerso, contents, chapter+)>
        <!ELEMENT titlePage (bookTitle, author+, publisher)>
        <!ELEMENT bookTitle (#PCDATA)>
        <!ELEMENT author (#PCDATA)>
        <!ELEMENT publisher (#PCDATA)>
        <!ELEMENT titlePageVerso (copyright, publishedBy, ISBN)>
        <!ELEMENT copyright (#PCDATA)>
        <!ELEMENT publishedBy (#PCDATA)>
        <!ELEMENT ISBN (#PCDATA)>
        <!ELEMENT contents (chapterName+)>
        <!ELEMENT chapterName (#PCDATA)>
        <!ATTLIST chapterName number CDATA #REQUIRED>
        <!ELEMENT chapter (text)>
        <!ATTLIST chapter number CDATA #REQUIRED name CDATA #REQUIRED>
        <!ELEMENT text (paragraph+)>
        <!ELEMENT paragraph (#PCDATA|image|indexEntry|poem)*>
        <!ELEMENT image EMPTY>
        <!ATTLIST image source ENTITY #REQUIRED caption CDATA #REQUIRED>
        <!ELEMENT indexEntry (#PCDATA)>
        <!ELEMENT poem (verse+)>
        <!ELEMENT verse (#PCDATA)>
    

Line 24 of the DTD specifies that the "text" element must contain one or more "paragraph" elements.  A "paragraph" element (line 25) has mixed content: it can contain parsed character data together with "image", "indexEntry" and "poem" elements, zero or more of each and in any order.
The "image" element is an empty element with two required attributes (lines 26 and 27): "source", of type ENTITY, which names the unparsed entity pointing to the corresponding picture, and "caption", which holds the picture's caption.

The "poem" element must contain one or more "verse" elements (line 29), each containing parsed character data (line 30).
The resultant XML file for chapter 2 of the book (chap2.xml) is the following:
<?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE text SYSTEM "Lab07_book.dtd">
        <text>
           <paragraph>
              A volcanic winter is very bad news.  The worst eruption in recorded history happened at <indexEntry>Mount Tambora</indexEntry> in 1815. It killed about 71 000 people locally, mainly because the <indexEntry>pyroclastic flows</indexEntry> killed everyone on the island of <indexEntry>Sumbawa</indexEntry> and the tsunamis drowned the neighbouring islands, but also because the ash blanketed many other islands and killed the vegetation.<image source="Sumbawa" caption="Sumbawa, after the volcanic eruption"/> It also put about 160 cubic kilometres of dust and ash, and about 150 million tons of sulphuric acid mist, into the sky, which started a volcanic winter throughout the northern hemisphere. The next year was <indexEntry>the year without a summer</indexEntry>. No spring, no summer – it stayed dark and cold all the year round. This had its upside. In due course, all that ash and mist in the upper atmosphere made for some lovely sunsets, and Turner was inspired to paint this.
           </paragraph>
           <paragraph>
              <image source="LakeGeneva" caption="Lake Geneva, during the summer of 1816"/>
              <indexEntry>The Lakeland poets took a holiday at Lake Geneva</indexEntry>, and the weather was so horrible that Lord Byron was inspired to write this.
              <poem>
                 <verse>The bright sun was extinguish'd, and the stars</verse>
                 <verse>Did wander darkling in the eternal space,</verse>
                 <verse>Rayless, and pathless, and the icy earth</verse>
                 <verse>Swung blind and blackening in the moonless air;</verse>
                 <verse>Morn came and went – and came, and brought no day.</verse>
              </poem>
           </paragraph>
           <paragraph>
              <image source="MaryShelley" caption="Mary Shelley, author of Frankenstein"/>
              Mary Shelley was inspired to write Frankenstein. The downside was that there were <indexEntry>famines</indexEntry> throughout Europe, India, China and North America, and perhaps 200 000 people died of starvation in Europe alone.
           </paragraph>
        </text>
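
One benefit of tagging the text this way is that the index terms and picture captions can be pulled out mechanically.  The sketch below shows the idea with Python's xml.etree.ElementTree on a shortened stand-in for chap2.xml.  The fragment is abbreviated for illustration, and the DOCTYPE is left out because ElementTree does not validate.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for chap2.xml (not the full chapter).
chapter = """<text>
  <paragraph>
    The worst eruption in recorded history happened at
    <indexEntry>Mount Tambora</indexEntry> in 1815. There were
    <indexEntry>famines</indexEntry> throughout Europe.
    <image source="Sumbawa" caption="Sumbawa, after the volcanic eruption"/>
  </paragraph>
</text>"""

root = ET.fromstring(chapter)

# Collect index terms and captions in document order, wherever they are nested.
index_terms = [entry.text for entry in root.iter("indexEntry")]
captions = [image.get("caption") for image in root.iter("image")]

print(index_terms)
print(captions)
```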
    
