Thursday, 8 December 2011

CMT3315 Lab 07 - DTD 3

Quick Questions
Q1. People who prepare XML documents sometimes put part of the document in a CDATA section.

  • Why would they do that?
  • How is the CDATA section indicated?
  • If CDATA sections hadn't been invented, would there be any other way to achieve the same effect?

Sometimes the content of an XML document includes characters that have a special meaning in XML, such as "<", ">" and "&".  When an XML document is parsed, the text between tags is parsed too, so including "<" or "&" 'as is' will break the document: the parser will interpret them as markup when in fact they are only part of the text.  A CDATA section solves this problem by marking a stretch of text as literal character data in which the parser does not look for markup.

A CDATA section is indicated by placing the text between the "<![CDATA[" opening delimiter and the "]]>" closing delimiter.

Instead of using CDATA, one could replace the "<", ">" and "&" characters with the entity references "&lt;", "&gt;" and "&amp;" respectively, a technique known as "escaping".  This achieves the same effect but is more laborious and makes the XML harder for humans to read.
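
As a small illustration (the "examples" and "code" element names and their content are made up for this sketch), the two "code" elements below should carry exactly the same character data once parsed:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<examples>
   <!-- CDATA section: the text between the delimiters is taken literally -->
   <code><![CDATA[if (a < b && b > 0) { swap(a, b); }]]></code>
   <!-- Escaping: each special character is replaced by an entity reference -->
   <code>if (a &lt; b &amp;&amp; b &gt; 0) { swap(a, b); }</code>
</examples>
```

The one sequence a CDATA section cannot contain is "]]>", since that is what closes it.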


Q2. What is a parser and what does it have to do with validity?
A parser is the software component that reads an XML document and passes its content and structure on to an application.  XML parsers can be classified as either "validating" or "non-validating" depending on the checks they perform on an XML document.  Non-validating parsers simply check the XML syntax to determine whether or not the document is well formed.  Validating parsers go one step further by also checking the document against a schema (such as a DTD), so they ensure that the XML document is both well formed and valid.


Q3. You write a .dtd file to accompany a class of XML documents.  You want one of the elements, with the tag <trinity>, to appear exactly three times within the document element of every document in this class.  Is it possible for the .dtd file to specify this?
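Not with a cardinality specifier.  The specifiers only let a DTD say that an element appears
  • exactly once
  • zero or one times
  • zero or many times
  • one or many times
and there is no specifier for "exactly n times".  The same effect can, however, be achieved by listing the element three times in the content model of the document element, e.g. (trinity, trinity, trinity), which forces exactly three occurrences.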
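
For illustration, a content model may name the same element more than once.  The sketch below assumes a document element called "creed" (a made-up name; only "trinity" comes from the question) and uses an internal DTD to keep the example in one file.  A validating parser should reject this document if a "trinity" element is removed or a fourth one is added:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE creed [
   <!-- "creed" is a hypothetical document element -->
   <!ELEMENT creed (trinity, trinity, trinity)>
   <!ELEMENT trinity (#PCDATA)>
]>
<creed>
   <trinity>one</trinity>
   <trinity>two</trinity>
   <trinity>three</trinity>
</creed>
```

This works well for small fixed counts; for large values of n the content model quickly becomes unwieldy.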

Longer Question
This question is a continuation on last post's "long" question number 2, where we are given the contents of chapter 2 of the book "Toba: The worst volcanic eruption ever".  We were required to:
  • Write a suitable prolog for the document (chapter 2);
  • Modify the book's .dtd file to cater for the new tags introduced in this document;
  • Put suitable tags into the document, to identify special pieces of text such as a poem or words that would feature in the book's index. 
  • Add pictures to the document at appropriate places.

A suitable prolog for this document would be the following:
        <?xml version="1.0" encoding="UTF-8"?>
        <!-- Chapter 2: Volcanic Winter-->
        <!DOCTYPE text SYSTEM "Lab07_book.dtd">
    

The DOCTYPE declaration specifies that the .dtd file used to validate this document is "Lab07_book.dtd" (the same used by the entire book), identifying the "text" element as the document (root) element.
The "text" element from last post's .dtd file is modified to include the new elements being introduced this week.  Furthermore, new entities pointing to the images that will be added to the document are also declared.  The resulting dtd - Lab07_book.dtd - is as follows:
        <?xml version="1.0" encoding="UTF-8"?>
        <!NOTATION jpg PUBLIC "image/jpeg">
        <!ENTITY Sumbawa SYSTEM "sumbawa.jpg" NDATA jpg>
        <!ENTITY LakeGeneva SYSTEM "Geneva1816.jpg" NDATA jpg>
        <!ENTITY MaryShelley SYSTEM "MaryShelley.jpg" NDATA jpg>
        <!ENTITY pub "STC Press, Malta">
        <!ENTITY chap1 SYSTEM "chap1.xml">
        <!ENTITY chap2 SYSTEM "chap2.xml">
        <!ENTITY chap3 SYSTEM "chap3.xml">
        <!ELEMENT book (titlePage, titlePageVerso, contents, chapter+)>
        <!ELEMENT titlePage (bookTitle, author+, publisher)>
        <!ELEMENT bookTitle (#PCDATA)>
        <!ELEMENT author (#PCDATA)>
        <!ELEMENT publisher (#PCDATA)>
        <!ELEMENT titlePageVerso (copyright, publishedBy, ISBN)>
        <!ELEMENT copyright (#PCDATA)>
        <!ELEMENT publishedBy (#PCDATA)>
        <!ELEMENT ISBN (#PCDATA)>
        <!ELEMENT contents (chapterName+)>
        <!ELEMENT chapterName (#PCDATA)>
        <!ATTLIST chapterName number CDATA #REQUIRED>
        <!ELEMENT chapter (text)>
        <!ATTLIST chapter number CDATA #REQUIRED name CDATA #REQUIRED>
        <!ELEMENT text (paragraph+)>
        <!ELEMENT paragraph (#PCDATA|image|indexEntry|poem)*>
        <!ELEMENT image EMPTY>
        <!ATTLIST image source ENTITY #REQUIRED caption CDATA #REQUIRED>
        <!ELEMENT indexEntry (#PCDATA)>
        <!ELEMENT poem (verse+)>
        <!ELEMENT verse (#PCDATA)>
    

Line 24 specifies that the "text" element must contain one or more "paragraph" elements.  Line 25 specifies that a "paragraph" element can contain any mix (zero or more of each, in any order) of parsed character data and "image", "indexEntry" and "poem" elements.
The "image" element is an empty element having two attributes, one for the name of the entity pointing to the corresponding picture and another for the picture's caption (Lines 26 & 27).

The "poem" element must contain one or more "verse" elements (Line 29) containing parsed character data.  
The resultant XML file for chapter 2 of the book (chap2.xml) is the following:
<?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE text SYSTEM "Lab07_book.dtd">
        <text>
           <paragraph>
              A volcanic winter is very bad news.  The worst eruption  in recorded history happened at <indexEntry>Mount Tambora</indexEntry> in 1815. It killed about 71 000 people locally, mainly because the <indexEntry>pyroclastic flows</indexEntry> killed everyone on the island of <indexEntry>Sumbawa</indexEntry> and the tsunamis drowned the neighbouring islands, but also because the ash blanketed many other islands and killed the vegetation.<image source="Sumbawa" caption="Sumbawa, after the volcanic eruption"/> It also put about 160 cubic kilometres of dust and ash, and about 150 million tons of sulphuric acid mist, into the sky, which started a volcanic winter throughout the northern hemisphere. The next year was <indexEntry>the year without a summer</indexEntry> . No spring, no summer – it stayed dark and cold all the year round. This had its upside. In due course, all that ash and mist in the upper atmosphere made for some lovely sunsets, and Turner was inspired to paint this. 
           </paragraph>
           <paragraph>
              <image source="LakeGeneva" caption="Lake Geneva, during the summer of 1816"/>
              <indexEntry>The Lakeland poets took a holiday at Lake Geneva</indexEntry>, and the weather was so horrible that Lord Byron was inspired to write this.
              <poem>
                 <verse>The bright sun was extinguish'd, and the stars</verse>
                 <verse>Did wander darkling in the eternal space,</verse>
                 <verse>Rayless, and pathless, and the icy earth</verse>
                 <verse>Swung blind and blackening in the moonless air; </verse>
                 <verse>Morn came and went – and came, and brought no day.</verse>
              </poem>
           </paragraph>
           <paragraph>
              <image source="MaryShelley" caption="Mary Shelley, author of Frankenstein"/>
              Mary Shelley was inspired to write Frankenstein. The downside was that there were <indexEntry>famines</indexEntry> throughout Europe, India, China and North America, and perhaps 200 000 people died of starvation in Europe alone.
           </paragraph>
        </text>
    

Saturday, 26 November 2011

CMT3315 Lab 06 - Character Encoding


Today's post deals with character encoding and how it can be specified in XML documents.

Character Encoding

Character encoding is the process of converting characters into another form that facilitates their storage or their transmission over a telecommunications network. Early examples of character encoding are Morse code, which converts characters into a series of long and short presses of a telegraph key, and Baudot code, a precursor to ASCII. The ASCII (American Standard Code for Information Interchange) character set was developed in the 1960s and uses 7 bits to represent each character, giving 128 characters: the unaccented English letters, digits, punctuation and control codes. Various 8-bit "extended ASCII" sets later added a further 128 characters, bringing the total to 256 and making room for accented letters from other European languages and some simple mathematical symbols. ASCII and its extensions remained the most widely used character encodings right through the late 2000s and hence much of the software in use today is designed to process ASCII documents.

Although popular, ASCII has its limitations. For starters, it cannot encode documents written in non-European alphabets and lacks many technical, scientific, cultural and artistic symbols. ISO 8859 provides one solution by defining a number of different character sets, allowing software to switch among them as needed. An even better solution is to have one much larger character set that includes as many characters and symbols from as many alphabets as possible. One such character set is Unicode, which currently includes more than 107,000 characters and symbols from over 90 alphabets.

Character Sets in XML

In XML, you can specify the character encoding in the XML declaration in the document prolog:


<?xml version="1.0" encoding="UTF-8"?>

Here, the XML declaration specifies that the XML document uses the Unicode UTF-8 character encoding. In fact, UTF-8 is the default: if the XML declaration specifies no encoding and the file does not start with a UTF-16 byte order mark, the parser will assume that UTF-8 is being used.

The following are this week's questions.

Quick Questions
Q1. What exactly does a DTD do in XML?
A DTD (Document Type Definition) defines the structure that a particular type of XML document should take.  It dictates which elements may be present in the XML document, their attributes, and the order in which they should appear.


Q2. You've written an XML document, with the XML declaration '<?xml version="1.0"?>' at the start.  You realise that the text contains some Arabic characters.  Which of the following should you do:
  • change the XML declaration to '<?xml version="1.0" encoding="ISO 8859-6"?>'
  • change the XML declaration to '<?xml version="1.0" encoding="UTF-8"?>'
  • do nothing: the declaration is fine as it is.

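The current declaration is fine as it is, provided the file is saved as UTF-8.  XML parsers assume that the document encoding is UTF-8 if none is specified, and UTF-8 can represent all the Arabic characters.  The second option simply states this default explicitly.  The first option would only work if the file were actually saved in that encoding, and the encoding name would have to be written with a hyphen, "ISO-8859-6".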
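
A minimal sketch of such a document follows.  The "greeting" element and its Arabic content (the word for "hello") are made up for this example, and the file is assumed to be saved as UTF-8:

```xml
<?xml version="1.0"?>
<!-- No encoding declaration: the parser falls back to UTF-8 -->
<greeting>مرحبا</greeting>
```

If the same file were saved in a different encoding without declaring it, the parser would most likely report an encoding error or misread the Arabic text.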


Q3. Can you use a binary graphics file in an XML document?
Yes.  To do so you must define the image as an external entity, mark it as non-parsable data and define the image format.  You can then assign this entity to an attribute of an empty element.  Here's an example:
<?xml version="1.0" encoding="utf-8"?>
        <!DOCTYPE koala [
        <!ENTITY koalaimage SYSTEM "koala.gif" NDATA gif>
        <!NOTATION gif PUBLIC "image/gif">
        <!ELEMENT koala (image)>
        <!ELEMENT image EMPTY>
        <!ATTLIST image source ENTITY #REQUIRED>
        ]>
        <koala>
        <image source="koalaimage"/>
        </koala>
    

The entity declaration at line 3 defines an entity called "koalaimage" that points to an external file "koala.gif".  The entity is marked as unparsed data by the keyword "NDATA", which is followed by the notation name "gif", itself declared at line 4 using the "NOTATION" keyword.  Finally, the "source" attribute of the empty "image" element is set to the name of our image entity, i.e. "koalaimage".

Longer Questions
Q1. For this question we were required to produce an XML document and accompanying DTD file for a book entitled "Toba: the worst volcanic eruption of all".  The first three chapters of the book are written as separate XML files, where the text of each is placed between "<text>" and "</text>" tags.  
From the specification given, the structure of this book's XML document can be described with the following diagram:

Given this information a suitable DTD for this specification would be as follows:

<?xml version="1.0" encoding="UTF-8"?>
        <!ENTITY pub "STC Press, Malta">
        <!ENTITY chap1 SYSTEM "chap1.xml">
        <!ENTITY chap2 SYSTEM "chap2.xml">
        <!ENTITY chap3 SYSTEM "chap3.xml">
        <!ELEMENT book (titlePage, titlePageVerso, contents, chapter+)>
        <!ELEMENT titlePage (bookTitle, author+, publisher)>
        <!ELEMENT bookTitle (#PCDATA)>
        <!ELEMENT author (#PCDATA)>
        <!ELEMENT publisher (#PCDATA)>
        <!ELEMENT titlePageVerso (copyright, publishedBy, ISBN)>
        <!ELEMENT copyright (#PCDATA)>
        <!ELEMENT publishedBy (#PCDATA)>
        <!ELEMENT ISBN (#PCDATA)>
        <!ELEMENT contents (chapterName+)>
        <!ELEMENT chapterName (#PCDATA)>
        <!ATTLIST chapterName number CDATA #REQUIRED>
        <!ELEMENT chapter (text)>
        <!ATTLIST chapter number CDATA #REQUIRED name CDATA #REQUIRED>
        <!ELEMENT text (#PCDATA)>
    

From the details given, one can see that the publisher name appears more than once in the XML document: on the title page and also on the title page verso.  This makes the publisher name an ideal candidate for an entity, as declared in line 2 of the DTD above.  Furthermore, since the chapters of the book are each stored as a separate XML file, an entity was declared for each chapter (lines 3 to 5), each pointing to the corresponding external file.  The rest of the DTD is quite straightforward, declaring the remaining elements and attributes.

Here's what the book's XML document looks like:
        <?xml version="1.0" encoding="UTF-8"?>
        <!DOCTYPE book SYSTEM "Lab06_book.dtd">
        <book>
            <titlePage>
                <bookTitle>Toba: the worst volcanic eruption of all</bookTitle>
                <author>John</author>
                <author>Jack</author>
                <author>Jill</author>
                <author>Joe</author>
                <publisher>&pub;</publisher>
            </titlePage>
            <titlePageVerso>
                <copyright>Copyright 2010 STC Press</copyright>
                <publishedBy>&pub;</publishedBy>
                <ISBN>978-0-596-52722-0</ISBN>
            </titlePageVerso>
            <contents>
                <chapterName number="1">The Mystery of Lake Toba's origins</chapterName>
                <chapterName number="2">Volcanic Winter</chapterName>
                <chapterName number="3">What Toba did to the human race</chapterName>
            </contents>
            <chapter number="1" name="The Mystery of Lake Toba's origins">&chap1;</chapter>
            <chapter number="2" name="Volcanic Winter">&chap2;</chapter>
            <chapter number="3" name="What Toba did to the human race">&chap3;</chapter>
        </book>
    

The XML document references the DTD file explained earlier and makes use of the entities declared within that DTD to refer to the publisher name (lines 10 and 14) and the three book chapters (lines 22 to 24).

Saturday, 12 November 2011

CMT3315 Lab 05 - XML Well-formedness & DTDs


The last post covered the basics of XML syntax and document type definitions. This week's post is a continuation, answering some more questions related to XML well-formedness and DTDs. Where possible, lab questions are reproduced before the answer.

Quick Questions
Q1. <:-/> This is a smiley.  Is it also a well-formed XML document?  Say why.

From a structural point of view, an XML document must contain exactly one top-level element, known as the root or document element, for it to be well formed.  In this case the smiley is our root element, so the document is structurally well formed.  It is a properly closed empty element, denoted by the "/>" at the end, and the name of the element is therefore ":-".  According to the W3C Recommendation "Extensible Markup Language (XML) 1.0 (Fifth Edition)", element names can start with a colon ":" and can contain hyphens "-", so technically the element name is also well formed.  However, it is generally considered good practice to avoid using the colon character as it is reserved for use with namespaces.


Q2. What is the difference between well-formed and valid XML?

A well formed XML document is one which is syntactically correct, i.e. it follows proper XML syntax as defined in the XML 1.0 Fifth Edition W3C Recommendation.  On the other hand, a well formed XML document is not necessarily valid.  In addition to being well formed, an XML document must also follow rules set out in a Document Type Definition (DTD) or XML Schema for it to be valid.
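
To make the distinction concrete, here is a small sketch; the "note", "to" and "body" elements are invented for this example.  The document is well formed, since every tag is closed and properly nested, yet a validating parser should reject it as invalid because the DTD requires "to" to come before "body":

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
   <!ELEMENT note (to, body)>
   <!ELEMENT to (#PCDATA)>
   <!ELEMENT body (#PCDATA)>
]>
<!-- Well formed, but not valid: the children are in the wrong order -->
<note>
   <body>See you at six.</body>
   <to>Jill</to>
</note>
```

Swapping the two child elements would make the document both well formed and valid.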

Longer Questions
Q1. For this question, we were required to write a Document Type Definition (DTD) for an XML specification to store information about college textbooks.  The given specification can be described with the following diagram:

Additionally, a chapter is identified by a chapter number and a chapter title and a section is identified by a section number and a section title.  Finally the publisher name will always be "Excellent Books Ltd" and their address will always be "21, Cemetery Lane, SE1 1AA, UK".

Given this information, a suitable DTD  for this specification would be as follows:

<?xml version="1.0" encoding="utf-8"?>
      <!ENTITY pubName "Excellent Books Ltd">
      <!ENTITY pubAddress "21, Cemetery Lane, SE1 1AA, UK">
      <!ELEMENT textbook (titlePage, titlePageVerso, chapter+)>
      <!ELEMENT titlePage (title, author, publisher, aphorism?)>
      <!ELEMENT titlePageVerso (publisherAddress, copyrightNotice, ISBN, dedication*)>
      <!ELEMENT chapter (section+)>
      <!ELEMENT section (bodyText+)>
      <!ATTLIST chapter chapterNo CDATA #REQUIRED chapterTitle CDATA #REQUIRED>
      <!ATTLIST section sectionNo CDATA #REQUIRED sectionTitle CDATA #REQUIRED>
      <!ELEMENT title (#PCDATA)>
      <!ELEMENT author (#PCDATA)>
      <!ELEMENT publisher (#PCDATA)>
      <!ELEMENT aphorism (#PCDATA)>
      <!ELEMENT publisherAddress (#PCDATA)>
      <!ELEMENT copyrightNotice (#PCDATA)>
      <!ELEMENT ISBN (#PCDATA)>
      <!ELEMENT dedication (#PCDATA)>
      <!ELEMENT bodyText (#PCDATA)>
   


Q2. Write an XML document that contains the following information: the name of a London tourist attraction. The name of the district it is in. The type of attraction it is (official building, art gallery, park etc). Whether it is in-doors or out-doors. The year it was built or founded [Feel free to make this up if you don’t know]. Choose appropriate tags. Use attributes for the type of attraction and in-doors or out-doors status.


<?xml version="1.0" encoding="utf-8"?>
<attraction type="Park" indoors="N">
  <name>Hyde Park</name>
  <district>West London</district>
  <yearFounded>1600</yearFounded>
</attraction>


Q3.  This multi-part question is based on an XML document which can be described with the following diagram:


Here's a snippet taken from this XML document:

<?xml version="1.0" encoding="utf-8"?>
<phraseBook targLang="Russian">
  <section>
    <sectionTitle>Greetings</sectionTitle>
    <phraseGroup>
      <engPhrase>Hi! </engPhrase>
      <translitPhrase>privEt </translitPhrase>
      <targLangPhrase>Привет!</targLangPhrase>
    </phraseGroup>
     <phraseGroup>
       <engPhrase>Good morning!</engPhrase>
       <translitPhrase>dObraye Utra</translitPhrase>
       <targLangPhrase>Доброе утро!</targLangPhrase>
       </phraseGroup>
      <phraseGroup>
...

a) It’s clear that the XML document is concerned with English phrases and their Russian translations. One of the start tags is <targLangPhrase> with </targLangPhrase> as its end tag. Why do you suppose this isn’t <russianPhrase> with </russianPhrase> ?

The structure of the document suggests that it could equally be used for translating English phrases into other languages, not just Russian; the target language is named in the "targLang" attribute.  It would make little sense to call the element "<russianPhrase>" rather than "<targLangPhrase>" if the document was in fact translating English phrases into, say, Italian.

b) Write a suitable prolog for this document
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE phraseBook SYSTEM "phraseBook.dtd">

c) Write a .dtd file to act as the Document Type Definition for this document

A suitable DTD would be as follows:
<?xml version="1.0" encoding="utf-8"?>
<!ELEMENT phraseBook (section+)>
<!ATTLIST phraseBook targLang CDATA #REQUIRED> 
<!ELEMENT section (sectionTitle, phraseGroup+)>
<!ELEMENT sectionTitle (#PCDATA)>
<!ELEMENT phraseGroup (engPhrase, translitPhrase, targLangPhrase)>
<!ELEMENT engPhrase (#PCDATA|gloss)*>
<!ELEMENT translitPhrase (#PCDATA|gloss)*>
<!ELEMENT targLangPhrase (#PCDATA)>
<!ELEMENT gloss (#PCDATA)>

d) The application that is to use this document runs on a Unix system, and was written some years ago.  Is that likely to make any difference to the XML declaration?

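Character encoding is the aspect most likely to need attention.  The document contains Cyrillic text, so plain ASCII is not an option.  UTF-8 is a sensible choice because its first 128 characters are encoded exactly as in ASCII, which keeps the markup and the English phrases readable to older tools; however, an older application might not handle the multi-byte Cyrillic characters.  In that case the XML declaration would have to name an encoding that the application does support (for instance "ISO-8859-5"), and the file would have to be saved in that encoding.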
Tuesday, 8 November 2011

CMT3315 LAB04 - XML Syntax 2


Today's post introduces Document Type Definition (DTD) syntax and answers a number of questions relating to XML syntax and DTDs.

Document Type Definition

In last week's post we created the following XML document containing information about a small music collection:

   <?xml version="1.0" encoding="UTF-8"?>
   <!DOCTYPE musicCollection>
   <!--Prolog ends here. -->
   <musicCollection>
      <cd index="1">
         <title>Innuendo</title>
         <artist>Queen</artist>
         <tracks>
            <track id="1">Innuendo</track>
            <track id="2">I'm Going Slightly Mad</track>
            <track id="3">Headlong</track>
            <track id="4">I Can't Live With You</track>
            <track id="5">Don't Try So Hard</track>
            <track id="6">Ride The Wild Wind</track>
            <track id="7">All God's People</track>
            <track id="8">These Are The Days Of Our Lives</track>
            <track id="9">Delilah</track>
            <track id="10">The Hitman</track>
            <track id="11">Bijou</track>
            <track id="12">The Show Must Go On</track>
          </tracks>
      </cd>
      <cd index="2">
         <title>(What's The Story) Morning Glory?</title>
         <artist>Oasis</artist>
         <tracks>
            <track id="1">Hello</track>
            <track id="2">Roll With It</track>
            <track id="3">Wonderwall</track>
            <track id="4">Don't Look Back In Anger</track>
            <track id="5">Hey Now!</track>
            <track id="6">Untitled</track>
            <track id="7">Some Might Say</track>
            <track id="8">Cast No Shadow</track>
            <track id="9">She's Electric</track>
            <track id="10">Morning Glory</track>
            <track id="11">Untitled</track>
            <track id="12">Champagne Supernova</track>
          </tracks>
      </cd>
   </musicCollection>

XML uses documents known as schemas to define the structure that a particular class (type) of XML document should follow. In our example, line 2 specifies that this document is of type musicCollection. We can use a Document Type Definition (DTD) to define the legal structure of the musicCollection document type. Looking at our example we can see that every "cd" element is expected to have a "title", an "artist" and a "tracks" element. In turn, the "tracks" element contains multiple "track" elements. The number of occurrences of an element is known as its cardinality. In a DTD, we can define which elements are expected as well as their attributes, sequence and cardinality.

Using this information we can write our DTD, starting from the root element:

<!--musicCollection.dtd-->
<!ELEMENT musicCollection (cd+)>

This is the first line in our DTD document. It specifies that the "musicCollection" document element is made up of one or more "cd" elements. The "+" sign after "cd" specifies the "one or more" cardinality of the "cd" element. Other Cardinality specifiers include:

  • ? - means 0 or 1
  • * - means 0 or more
  • no specifier means "exactly 1"
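
To see the specifiers side by side, here is a sketch of a content model that uses all four.  The "album" element and its "review" child are hypothetical and are not part of our music collection:

```xml
<!-- Hypothetical content model, used only to show each specifier -->
<!ELEMENT album (title, artist?, review*, track+)>
<!-- title   : exactly one (no specifier) -->
<!-- artist? : zero or one -->
<!-- review* : zero or more -->
<!-- track+  : one or more -->
<!ELEMENT title (#PCDATA)>
<!ELEMENT artist (#PCDATA)>
<!ELEMENT review (#PCDATA)>
<!ELEMENT track (#PCDATA)>
```

The commas in the content model also fix the order: "title" first, then the optional "artist", then any reviews, then the tracks.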

Let's define the "cd" element next:

<!--musicCollection.dtd-->
<!ELEMENT musicCollection (cd+)>
<!ELEMENT cd (title, artist, tracks)>
<!ATTLIST cd index CDATA #REQUIRED>

The "cd" element is made up of the "title", "artist" and "tracks" elements. Given that each of these must appear exactly once in a "cd" element, no cardinality specifier is necessary. Additionally, the "cd" element has an "index" attribute, which is declared at line 4. We also specify that the "index" attribute is of type character data (CDATA) and is required (#REQUIRED). An attribute of type CDATA can hold any string of text; its value is not checked against a list of allowed values or names. Let's add the rest of our elements:

<!--musicCollection.dtd-->
<!ELEMENT musicCollection (cd+)>
<!ELEMENT cd (title, artist, tracks)>
<!ATTLIST cd index CDATA #REQUIRED>
<!ELEMENT title (#PCDATA)>
<!ELEMENT artist (#PCDATA)>
<!ELEMENT tracks (track+)>
<!ELEMENT track (#PCDATA)>
<!ATTLIST track id CDATA #REQUIRED>

Our DTD is now complete. Each of the "title", "artist" and "track" elements is defined as containing parsed character data (#PCDATA), i.e. text that the parser still scans for markup such as entity references. We also specify that the "tracks" element should contain one or more "track" elements and that every "track" element must have an "id" attribute. All we need to do now is reference our DTD from the XML document. Assuming that our DTD file is called "musicCollection.dtd" and resides in the same directory as the XML file, the following modification to the XML document type declaration will reference the DTD:

<!DOCTYPE musicCollection SYSTEM "musicCollection.dtd">
Summary

DTDs can be declared inside of the XML document itself or as external files and are a simple yet effective way of defining and validating the structure of XML documents.
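
For comparison, the same declarations can be placed inside the document as an internal DTD.  The following is a sketch only, with the collection cut down to one CD and two tracks to keep it short:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE musicCollection [
   <!ELEMENT musicCollection (cd+)>
   <!ELEMENT cd (title, artist, tracks)>
   <!ATTLIST cd index CDATA #REQUIRED>
   <!ELEMENT title (#PCDATA)>
   <!ELEMENT artist (#PCDATA)>
   <!ELEMENT tracks (track+)>
   <!ELEMENT track (#PCDATA)>
   <!ATTLIST track id CDATA #REQUIRED>
]>
<musicCollection>
   <cd index="1">
      <title>Innuendo</title>
      <artist>Queen</artist>
      <tracks>
         <track id="1">Innuendo</track>
         <track id="2">I'm Going Slightly Mad</track>
      </tracks>
   </cd>
</musicCollection>
```

An internal DTD keeps everything in one file, whereas an external file is usually the better choice when several documents share the same structure.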

The following are the questions for this week's lab session.

Quick Questions
Q1. What does XML stand for?  And CSS?

XML stands for Extensible Markup Language and CSS stands for Cascading Style Sheets.

Q2. Is this XML line well-formed? Say Why.

<b><i>This text is bold and italic </i></b>

Yes this line is well formed.  Both start tags match their end tags and are properly nested.

Q3. Is this XML document well-formed? Say why.

<?xml version= "1.0" ?>
<greeting>
Hello, world!
</greeting>
<greeting>
Hello Mars too!
</greeting>
No. This XML document is not well formed: it has two top-level "greeting" elements, whereas an XML document must have exactly one root element.
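
One way to repair it, assuming a wrapper element called "greetings" (a name made up for this sketch), is to nest both elements inside a single root:

```xml
<?xml version="1.0"?>
<!-- "greetings" is a hypothetical wrapper that acts as the single root element -->
<greetings>
   <greeting>
   Hello, world!
   </greeting>
   <greeting>
   Hello Mars too!
   </greeting>
</greetings>
```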

Longer Questions
Q1. Write an XML document that contains the following information:
  • The name of this course;
  • The name of this building;
  • The name of this room;
  • The start and end times of this session.
Choose appropriate tags.  Use attributes for the start and end times.
<?xml version= "1.0" ?>
<course>
 <name>CMT 3315 Advanced Web Technologies</name>
 <buildingName>STC Training</buildingName>
 <roomName>Room 5</roomName>
 <session startTime="18:00" endTime="21:00"/>
</course>



Q2. Identify all the syntax errors in the following XML document:

<?xml version= "1.0" ?>
<!DOCTYPE bookStock SYSTEM "bookstock.dtd">
<bookstore>
  <book category="Cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <1stEdition>2005</1stEdition >
    <2ndEdition>2007</2ndEdition >
    <price>19.99</price currency="pounds sterling">
  </book>
  <book category="Children’>
    <title lang="en">Harry Potter and the enormous pile of money</title>
  <!—best selling children’s book of the year --2009 -->
    <author>J K. Rowling</author>
   <1stEdition>2005</1stEdition>
    <price>29.99</Price>
  </book>
  <book category="Web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
   <1stEdition>2003</1stEdition>
   <2ndEdition >2008</2ndEdition >
    <price>29.95</discount>
    <discount>15%</price>
  </book>
  <book category="Computing">
    <title lang=en>Insanely great – the life and times of Macintosh, the computer that changed everything </title>
    <author <!—other authors not listed -->>Steven Levy</author>
   <1stEdition>1994</1stEdition>
    <price>9.95</discount>
    <discount>15%</price>
  </book>

The XML document contains various syntax errors including:
  • The root element is called "bookstore" but the DOCTYPE declaration names "bookStock" (strictly a validity problem rather than a syntax error). The root element also has no matching end tag;
  • The "1stEdition" and "2ndEdition" element names are invalid as names cannot start with a number (lines 7, 8, 15, 21, 22, 29);
  • Attribute placed in the end tag at line 9. This should be placed in the start tag;
  • Mismatching quote at line 11;
  • Incorrect comment start tag and extra "--" in comment on line 13;
  • Mismatching start and end tags at lines 16, 23, 24, 30 and 31;
  • Missing quotes for attribute value at line 27;
  • Comment incorrectly placed within the start tag at line 28.  Comment start tag is also incorrect.
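
With these points addressed, the document might look as follows.  This is a sketch under a few assumptions: the names "firstEdition" and "secondEdition" are replacements chosen here for the illegal "1stEdition" and "2ndEdition", the root is renamed to match the DOCTYPE, and the mismatched "price" and "discount" tags are paired up in the most plausible way.  Whether the result is also valid depends on "bookstock.dtd", which we were not given:

```xml
<?xml version="1.0"?>
<!DOCTYPE bookStock SYSTEM "bookstock.dtd">
<bookStock>
  <book category="Cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <firstEdition>2005</firstEdition>
    <secondEdition>2007</secondEdition>
    <price currency="pounds sterling">19.99</price>
  </book>
  <book category="Children">
    <title lang="en">Harry Potter and the enormous pile of money</title>
    <!-- best selling children's book of the year 2009 -->
    <author>J K. Rowling</author>
    <firstEdition>2005</firstEdition>
    <price>29.99</price>
  </book>
  <book category="Web">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <firstEdition>2003</firstEdition>
    <secondEdition>2008</secondEdition>
    <price>29.95</price>
    <discount>15%</discount>
  </book>
  <book category="Computing">
    <title lang="en">Insanely great – the life and times of Macintosh, the computer that changed everything</title>
    <!-- other authors not listed -->
    <author>Steven Levy</author>
    <firstEdition>1994</firstEdition>
    <price>9.95</price>
    <discount>15%</discount>
  </book>
</bookStock>
```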


Q3. You are asked to produce a Document Type Declaration for a class of XML documents called “memo”. You come up with this .dtd file:

<!DOCTYPE memo
[
<!ELEMENT memo (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>


Your client says “That’s all very well, but every memo has to have a date. And some of them have to have a security classification too (you might want to write “Secret” at the top). And a memo has a serial number – I think that’s what you’d call an attribute, isn’t it?” How would you amend this .dtd file so that it did what the client wanted?

A suitable DTD to fulfill these requirements would be as follows:

<!DOCTYPE memo
[
<!ELEMENT memo (date,to,from,heading,body,classification?)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT classification (#PCDATA)>
<!ATTLIST memo serialNo ID #REQUIRED>
]>

Given that not all memos will have a security classification, the DTD uses the "?" cardinality specifier to indicate that the "memo" element can contain zero or one "classification" element.

Other specifiers include:

  • "*" - means that zero or more of the element is allowed
  • "+" - means that one or more of the element is allowed
Finally, attaching none of the cardinality specifiers to an element name means that the element must appear exactly once.
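
For completeness, here is a sketch of a memo that should validate against these declarations.  The memo content and the serial number are made up; note that an attribute of type ID must hold a valid XML name, so the serial number cannot start with a digit.  The declarations are kept as an internal DTD, as in the question; in a standalone .dtd file the surrounding "<!DOCTYPE memo [" and "]>" wrapper would be left out:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE memo [
<!ELEMENT memo (date,to,from,heading,body,classification?)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT classification (#PCDATA)>
<!ATTLIST memo serialNo ID #REQUIRED>
]>
<!-- All values below are invented for illustration -->
<memo serialNo="M2011-042">
   <date>8 November 2011</date>
   <to>All staff</to>
   <from>The Director</from>
   <heading>Fire drill</heading>
   <body>There will be a fire drill on Friday at 10:00.</body>
   <classification>Secret</classification>
</memo>
```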
Thursday, 3 November 2011

CMT3315 Lab 03 - XML Syntax 1


Today's post introduces the basics of XML syntax and answers a number of questions related to this topic.

XML document structure & syntax

The basic idea behind XML is to produce documents whose structure can be understood by software applications. In an XML document, pieces of text that have special meaning are marked up using tags. A tag is simply a word between angle brackets such as "<name>". An XML document is made up of 3 parts:

  • A prolog (optional);
  • The document or root element;
  • Other miscellaneous content following the root element end tag (optional).

The prolog is an optional component of an XML document which, if included, must appear before the root element. It consists of an XML declaration, which defines the XML version and character encoding being used, and a Document Type Declaration (the DOCTYPE line, not to be confused with the Document Type Definition, or DTD, that it can reference). The prolog may also contain comments. Here's an example of an XML prolog:

   <?xml version="1.0" encoding="UTF-8"?>
   <!--This is a comment-->
   <DOCTYPE musicCollection"<

Lines 1 and 2 are the XML declaration and a comment respectively. Line 3 is the Document Type Declaration which provides the name of the document element "musicCollection". The document or root element follows the prolog. An XML document must have only one root element which in many cases contains a heirarchy of other elements. For instance, let's assume that our music collection is made up of a number of CDs where each CD has a title, an artist and a number of tracks. A suitable XML document to store this information would be:

   <?xml version="1.0" encoding="UTF-8"?>
   <!DOCTYPE musicCollection>
   <!--Prolog ends here. -->
   <musicCollection>
      <cd index="1">
         <title>Innuendo</title>
         <artist>Queen</artist>
         <tracks>
            <track id="1">Innuendo</track>
            <track id="2">I'm Going Slightly Mad</track>
            <track id="3">Headlong</track>
            <track id="4">I Can't Live With You</track>
            <track id="5">Don't Try So Hard</track>
            <track id="6">Ride The Wild Wind</track>
            <track id="7">All God's People</track>
            <track id="8">These Are The Days Of Our Lives</track>
            <track id="9">Delilah</track>
            <track id="10">The Hitman</track>
            <track id="11">Bijou</track>
            <track id="12">The Show Must Go On</track>
          </tracks>
      </cd>
      <cd index="2">
         <title>(What's The Story) Morning Glory?</title>
         <artist>Oasis</artist>
         <tracks>
            <track id="1">Hello</track>
            <track id="2">Roll With It</track>
            <track id="3">Wonderwall</track>
            <track id="4">Don't Look Back In Anger</track>
            <track id="5">Hey Now!</track>
            <track id="6">Untitled</track>
            <track id="7">Some Might Say</track>
            <track id="8">Cast No Shadow</track>
            <track id="9">She's Electric</track>
            <track id="10">Morning Glory</track>
            <track id="11">Untitled</track>
            <track id="12">Champagne Supernova</track>
          </tracks>
      </cd>
   </musicCollection>

Above we can see that our "musicCollection" root element contains two "cd" elements which in turn contain further elements describing the CD. We can also see that every start tag has a matching end tag (e.g. "<cd>" and "</cd>"). XML elements can have zero, one or more child elements and all (except for the root element) must have a parent element. Furthermore, XML elements must be correctly nested for the document to be well formed. Elements in XML can also contain attributes and, when present, these must be placed within the element's start tag. In our example, every "cd" element has an attribute called "index" and every "track" element has an attribute called "id". XML attributes can be thought of as data describing data, generally (but not necessarily) used for storing IDs. An attribute is made up of a name, followed by an "=" sign and a value within quotes, which may be single or double as long as they match. We can use this example as a basis for discussing the next section:

Well Formedness

A well formed XML document is one that follows proper XML syntax. Unlike most HTML parsers, XML parsers expect the document to be well formed and will stop processing the document if any syntax errors are found. An XML document must be structured as discussed in the previous section and all element and attribute names must be valid. The first character in a name must be either a letter (A-Z or a-z), a colon ":" or an underscore "_", and names beginning with the letters "xml" (in any combination of upper and lower case) are reserved. The remaining characters can also include digits (0-9), dashes "-" and full stops ".".
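
The naming rules are easy to put into a small check. The function below is only a sketch: it covers the ASCII characters mentioned above and ignores the many non-ASCII letters that XML also accepts. The names NAME_PATTERN and isUsableXmlName are my own.

```javascript
// Simplified, ASCII-only version of the naming rules described above.
// First character: letter, underscore or colon.
// Remaining characters: the same, plus digits, dashes and full stops.
const NAME_PATTERN = /^[A-Za-z_:][A-Za-z0-9_:.\-]*$/;

function isUsableXmlName(name) {
  if (!NAME_PATTERN.test(name)) return false;
  // Names beginning with "xml" in any letter case are reserved.
  return !/^xml/i.test(name);
}

console.log(isUsableXmlName("track"));    // a plain name passes
console.log(isUsableXmlName("2ndCity"));  // starts with a digit, so it fails
console.log(isUsableXmlName("XmlData"));  // reserved prefix, so it fails
```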

XML is case-sensitive so, using our example above, the element "<track>" is not the same as the element "<Track>". Furthermore, end tags must match their start tags. In some cases, XML elements will not contain any information. In such cases the end tag may be omitted by placing a "/" at the end of the start tag, for example:

 <emptyElement index="0" />
 <!-- is equivalent to -->
 <emptyElement index="0"></emptyElement>

These are the basic rules to follow to create a well formed XML document.
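
To get a feel for what a parser does when it checks nesting, here is a minimal sketch that uses a stack of open element names. It is nowhere near a real XML parser: it ignores comments, CDATA sections and the prolog, and it assumes attribute values never contain a ">" character. The function name findNestingError is hypothetical.

```javascript
// Minimal sketch: checks only that start and end tags are properly nested.
// Returns a description of the first problem found, or null if there is none.
function findNestingError(xml) {
  const tagPattern = /<(\/?)([A-Za-z_:][A-Za-z0-9_:.\-]*)[^>]*?(\/?)>/g;
  const open = []; // stack of element names still waiting for an end tag
  let match;
  while ((match = tagPattern.exec(xml)) !== null) {
    const [, closing, name, selfClosing] = match;
    if (selfClosing) continue;      // <empty /> opens and closes in one go
    if (!closing) {                 // start tag: remember it
      open.push(name);
      continue;
    }
    const expected = open.pop();    // end tag: must match the latest start tag
    if (expected === undefined) return `unexpected </${name}>`;
    if (expected !== name) return `expected </${expected}> but found </${name}>`;
  }
  return open.length > 0 ? `missing </${open[open.length - 1]}>` : null;
}

console.log(findNestingError('<cd index="1"><title>Innuendo</title></cd>'));
console.log(findNestingError("<Label>Capital:</capital>"));
```

Because names are compared exactly, the second call also shows case-sensitivity at work: "Label" and "capital" are simply different names.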

Lab Questions
Q1. Write an XML document that contains the following information:
  • Your name;
  • Your email address;
  • Your student number;
  • Your home town;
  • Your date of birth.
Choose appropriate tags. Use attributes for the date of birth.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE student>
<student dateOfBirth="01/01/1990">
 <name>Wayne Zammit</name>
 <email>wayne@somedomain.com</email>
 <studentNo>NN123</studentNo>
 <homeTown>St. Julians</homeTown>
</student>


Q2. Identify all the syntax errors in the XML document below:

<?xml version= "1.0" ?>
<!DOCTYPE countryCollection SYSTEM "countryList.dtd">
<CountryList>
<Nations TotalNations ="3"/>
<!--Data from CIA --Year Book -->
 <Country CountryCode="1"> 
  <OfficialName>United States of America</officialName>
  <Label>Common Names:</label>  
  <CommonName>United States</commonName>
  <CommonName>U.S.</commonName>
  <Label>Capital:</capital>  
  <Capital cityNum="1">Washington, D.C. </label>
  <2ndCity cityNum="2">New York </2ndCity> 
  <Label>Major Cities:</label> 
  <MajorCity cityNum="3">Los Angeles </majorCity>
  <MajorCity cityNum="4">Chicago </majorCity>  
  <MajorCity cityNum="5’>Dallas </majorCity>  
  <Label>Bordering Bodies of Water:</label>    
  <BorderingBodyOfWater> Atlantic Ocean </borderingBodyOfWater>
  <BorderingBodyOfWater> Pacific Ocean </borderingBodyOfWater>  
  <BorderingBodyOfWater> Gulf of Mexico </borderingBodyOfWater> 
  <Label>Bordering Countries:</label>   
  <BorderingCountry CountryCode="1"> Canada </borderingCountry>    
  <BorderingCountry CountryCode ="52"> Mexico </borderingCountry>
</country>
 <Country CountryCode="81">
  <OfficialName> Japan </officialName>
  <Label>Common Names:</label>    
  <CommonName> Japan </commonName>
  <Label>Capital:</label>  
  <Capital>Tokyo</capital cityNum="1">
  <2ndCity cityNum="2">Osaka </2ndCity>
  <Label>Major Cities:</label>    
  <MajorCity cityNum="3">Nagoya </majorCity>
  <MajorCity cityNum="4">Osaka </majorCity>  
  <MajorCity cityNum="5’>Kobe </majorCity>  
  <Label>Bordering Bodies of Water:</label>
  <BorderingBodyOfWater>Sea of Japan </borderingBodyOfWater>
  <BorderingBodyOfWater>Pacific Ocean </borderingBodyOfWater>  
 </country>
 <Country CountryCode="254">
  <OfficialName> Republic of Kenya </officialName>
  <Label>Common Names:</label>    
  <CommonName> Kenya </commonName>
  <Label>Capital:</label>  
  <Capital cityNum=’1’>Nairobi </capital>
  <2ndCity cityNum=’2’>Mombasa</2ndCity>
  <Label>Major Cities:</label>    
  <MajorCity cityNum=’3’>Mombasa </majorCity>
  <MajorCity cityNum=’4’>Lamu </majorCity>
  <MajorCity cityNum=’5’>Malindi </majorCity>  
  <MajorCity cityNum=’6’ cityNum=’7’>Kisumu-Kericho </majorCity> 
  <Label>Bordering Bodies of Water:</label>
  <BorderingBodyOfWater <!--Also Lake Victoria --> > Indian Ocean </borderingBodyOfWater>
 </country> 
The XML document contains various syntax errors including:

  • The root node should be called "<countryCollection>" as specified in the DOCTYPE declaration (strictly speaking this is a validity error rather than a syntax error).  The "<CountryList>" start tag also has no matching end tag;
  • The comment at line 5 contains an extra "--" between its start and end delimiters.  This is not allowed;
  • Start tags do not match end tags (different letter-casing) and in some instances (e.g. lines 11 and 12) end tags have been swapped;
  • The "2ndCity" element name is invalid as names cannot start with a number (lines 13, 32, 47);
  • Mismatching quotes at lines 17 and 36;
  • Attribute placed in the end tag at line 31.  This should be placed in the start tag;
  • Typographic quotes (’) used to enclose attribute values at lines 46 to 52.  Only straight single (') or double (") quotes may be used;
  • Duplicate attribute "cityNum" at line 52;
  • Comment placed within the start tag at line 54.



CMT3315 LAB 02 - XML vs HTML


In this post we will be having a look at the similarities and differences between XML and HTML. As discussed in my previous post, XML is a subset of SGML designed to make the knowledge structure of a document known to a software package. In essence, it enables a software package to "understand" the structure of a document. HTML was also derived from SGML but appeared before XML.

HTML

HTML is made up of a standard set of tags specifically designed to create web pages meant to be understood and displayed by web browsers. Soon after its original release, HTML became very popular and web pages were being used to accomplish things that they weren't designed to do. Initially, the way an HTML document was interpreted and displayed was left entirely up to the browser. This started creating problems for web designers who wanted their pages to render the same across browsers, so HTML started being extended with other tags that defined presentation such as "<font>". These types of tags go directly against the original concept of SGML, which was to separate content from layout. The browser wars did little to help the situation, and in fact made it even worse. Fierce competition gave rise to proprietary tags, and browsers became tolerant of badly written HTML documents, which in the end badly hinders programmatic interpretation of web pages.

XML

A major advantage of HTML is its simplicity, but it is also one of its biggest weaknesses. XML was born out of the need for having something that was simpler than SGML but also stricter than HTML. XML tags are user-defined, which means that when creating an XML document you are defining your own standard for structuring that particular type of document. That is why XML is extensible. Every time you define a new set of tags you are effectively defining a new markup language! In fact, just like SGML, XML is itself a framework for defining markup languages. Its main objective is providing the ability to structure information in such a way as to enable any system on any platform to process that information.

XML gave rise to XHTML, a stricter version of HTML based on XML rules. Differences between HTML and XHTML include:

  • Unlike HTML, XHTML documents must be well-formed XML documents;
  • XHTML is case-sensitive for element and attribute names, HTML is not;
  • Attribute minimisation (omitting the "=" sign and value) in XHTML is not allowed.

There are many more differences, most of which would require a separate blog post to explain, but the main idea is that XHTML addresses the limits and weaknesses in the original HTML specification because it is based on XML.

Summary

HTML (1990) was derived from SGML (1986) as a simpler markup language for creating hyperlinked web documents. XML (1998) is a specification for defining markup languages which is simpler than SGML, from which it is itself derived. Its main objective is to facilitate portability of data across multi-platform systems. In 2000 XML gave rise to XHTML, a reformulation of HTML aimed at addressing limits in the latter's original specification.


CMT3315 Intro - XML, What it is and where it came from

Given the summer recess, it has been a while since my last post but it's time to pull the proverbial socks up and get back to work.  During this semester, the blog will be focusing mostly on the eXtensible Markup Language - XML for short - and there's a lot to cover, so let's get right to it:

First Things First
So what is XML? Before answering that question one must understand where XML comes from and more importantly its purpose.  To do that we need to take a look at the early days of computing, back in the 1960s, when the idea of storing documents in computers in such a way as to make them "understandable" to software was still fresh in the minds of researchers.

GML
Charles Goldfarb, Edward Mosher and Raymond Lorie, three researchers working for IBM, came up with GML, a set of macros that implemented mark-up tags to describe the logical structure of a document.  Interestingly, GML is known today as "Generalized Markup Language" but originally the acronym stood for the researchers' surnames!

SGML
GML was eventually extended by Goldfarb in the mid 80s into the Standard Generalized Markup Language (SGML).  SGML was designed to make it possible for large entities (such as government) to share machine-readable electronic documents.  The main idea was to embed a set of tags within a document which a software package could use to derive information about that document.  For this to work, the set of tags had to be standardised, made publicly available and most importantly be platform independent.  SGML is a very large and complex markup language, suitable for storing equally large and complex documents but rather cumbersome for use with smaller, simpler documents.  The beauty of SGML however is that it can be used to derive smaller, simpler mark-up languages better suited for smaller documents.  Such languages include HTML (Hyper-Text Markup Language) and of course XML.

XML
Going back to the original question: "What is XML?", XML is an extensible markup language used to encode documents in such a way as to make the knowledge structure of a document known to a software package.  It separates the actual content of a document from its structure and presentation.  Being a subset of SGML it is much simpler to use and is ideally suited to store, and more importantly transport, electronic documents.  Its most common use today is to transmit data between applications, irrespective of the platform.  The beauty of XML is that you define your own tags, which means that any document structure can be described in XML.

XML became a W3C Recommendation on February 10, 1998 (w3schools.com) and the first web browser to support XML was Microsoft's Internet Explorer 4.0.

Next time, I will be comparing XML to HTML.  In the meantime, have a look at this post from my blog for more on XML.

Tuesday, 19 July 2011

Up in the Cloud

Surely, one of the most popular buzz words since Web 2.0 has to be "Cloud Computing".  The trouble is that there are many different schools of thought on what the Cloud is or should be.  "Platform as a Service", "Software as a Service", "on-demand computing" and "Internet as a platform" are just some of the phrases used to define cloud computing.  Whatever you wish to call it, to me Cloud Computing is about history repeating itself, the next phase of a cycle.  We're shifting software and data back from our desktop PCs to remote systems known collectively as "The Cloud".

Think about it: back in the beginning of computing, software ran on mainframes and super (for the time) computers and was accessed through dumb terminals (thin clients) in a hub and spoke fashion.  Fast forward a couple of decades and the personal computer is introduced, full of promise that software could now be installed on client machines.  This eventually led to the client-server model, where fat clients running software locally accessed data residing on central servers. Today, the focus is shifting away from the client back to the server, which is now provided as a service and resides on the cloud.  This does not mean that our clients are becoming thin again, quite the opposite - today's smart phones for instance really pack a punch hardware wise - however the applications themselves are being offered over the internet with no local installation necessary.

What's Great about The Cloud 

For starters you don't need to worry about software installations, compatibility and updates. Cloud applications are offered as a service and accessed through the browser without the need for local installation.  Another perk is the ability to access your data and apps from anywhere, using any workstation. Since all your data and apps reside on the cloud, they can be accessed from anywhere, greatly increasing your mobility.

Scalability is another plus point.  Businesses can request more processing power or storage on demand as required but can also scale back down again during off-peak/slow business periods, freeing up resources and lowering costs because you generally only pay for what you use.  So the scalability offered by the cloud is also very flexible.

Cost considerations are somewhat of a double-edged sword but what's certain is that if you opt for the cloud, your initial set-up costs are dramatically reduced.  You will not be incurring costs for acquiring servers and no servers means no data centres with their high electricity bills.  Maintenance and operational costs are also considerably reduced.
Your business can also focus more on strategic objectives because there is much less time spent on deployment, maintenance and operations.

So, What's not to Love

The biggest issues with cloud computing relate to service reliability/accessibility, security and privacy, or at least how they are perceived with respect to the cloud.  Businesses that have their own data centres and have been sitting on their data since starting up will find it very hard to accept that moving that data to a third party's data centre might be the better option.  The perceived loss of control in most cases overshadows the benefits gained by moving to the cloud.

Unfortunately, some of these concerns are well founded.  How reliable is the service being provided?  What kind of impact would temporary service unavailability have on your business?  Can a business make sure that it can completely delete any of its data?  What degree of control does the service provide?  What happens to your data if you decide to switch providers?  These questions are generally hard to answer.  Although the idea behind cloud computing is fairly old, its implementation was not possible until very recently and legislation in respect of cloud computing is generally weak.  To complicate things further, your data could be stored in different countries and different countries have different laws.  There is still a way to go as far as legislation is concerned until we can start answering these questions properly.

Looking ahead

Legislation and perceptions notwithstanding, cloud computing is flourishing and seems to be the way forward.  Office applications such as Google Docs and Microsoft Office 365 are good examples of successful cloud services.  CRM systems, online storage systems (SkyDrive, Dropbox, etc.) and other applications such as photo editing packages are all being offered as cloud services.

The real challenge will be the culture change required for businesses to shift their operations to the cloud.

Thursday, 14 July 2011

Social What?

Social Networks have been around since before the internet itself.  Ever since computers could connect to each other, so did the users at each end.  The social networking heavyweights of today such as Facebook, Twitter and Google+ are just the latest products of a long (and vicious) evolution of the genre.  I’ve always had this sort of love-hate relationship with social media and as the products evolve so too are my feelings evolving into a love-to-hate attitude.

The Evolution of the Social Network

The roots of social media can be traced back to the late 70’s with the birth of the Usenet system which let users read and post messages/articles to one or more categories (newsgroups), similar to modern-day forums.  Usenet systems sparked the development of newsreader clients which are themselves the precursors of today’s RSS feed readers.

Shortly after Usenet, the first Bulletin Board Systems (BBSs) started coming online which were initially hosted on personal computers.  Users could dial-in on the computer’s modem to gain access to the BBS and could leave messages or upload files on the host computer.  A major drawback of these systems was the fact that the number of concurrent users accessing the BBS was limited by the number of phone lines available.  This generally meant that only one user at a time could access the BBS.

Next came online services and Instant Messaging.  The pioneer of instant messaging was the (initially) UNIX-based Internet Relay Chat (IRC), developed in 1988.  Users could send messages and share files through IRC in real time, something which was not possible on BBSs.  IRC also introduced the idea of chat ‘channels’ and private messaging and is still somewhat in use today.  Although IRC clients are now available for most platforms, the first instant messaging program for PCs was ICQ, which proved extremely popular when first released.  Spam and privacy issues all but killed it off in the US and Western Europe but, despite its shortcomings, it remains popular in Eastern Europe and Russia.

Some consider dating sites as the birth of social networks but the first real modern-day social network was Six Degrees, launched in 1997.  Unlike its contemporaries, Six Degrees allowed its users to create comprehensive profiles and add people as friends - essential features of a modern social network.   There were/are numerous others: LiveJournal, Hi 5, LinkedIn, MySpace and of course Facebook and Twitter.  Google is also running beta tests of Google+, which promises to be the next big thing in social networking.

Why so many? From the outset it was apparent that social media was a powerful thing, but social media projects are also some of the most volatile out there.  Six Degrees was launched in 1997 and  perished in 2001, but went out of fashion even before that.  In just 4 years, it went from launch to shut-down, such is the vicious nature of the social media phenomenon.  Sites like Hi 5 and MySpace also lost most of their popularity.  Why?  Users are quick to switch from one system to the next just to gain that extra feature or keep with the latest trends.  Loyalty means next to nothing in this environment, it's all about trends and the next big thing.  Failure to stay ahead of the game (and keep users interested) spells the loss of your user-base in a matter of months if not weeks.  

My Perspective


There's no denying Social Networks are great but I'm hardly what you could call a fan of the things.  I don't have a facebook or twitter account and I only have a Google+ account because Google created it for me!  Funny thing is it's hard for me to explain why I'm not inclined to have such accounts.  It's got nothing to do with age, profession or social standing - social networks are made up of people from all walks of life.  I guess privacy issues are a part of the problem.  Personally, I never feel the need to share a photograph of myself with the rest of the world - no matter how 'private' a system portrays itself to be - but that is still something I can control, it's my choice.  What gets me is the fact that you may feature in someone else's photograph, even in the background, someone recognizes you and suddenly you're tagged - which is something you have no control over.  It's also hard to be sure who's on the other side unless of course chatting live using a web cam.    

I'm also perplexed at how people can be so reserved in person but all inhibitions are out the window the minute they're on-line. This behavior begs another question.  Why do we find it so easy (and feel so uninhibited) in sharing personal information on the net but then find it so hard to open up to people (in person) and build relationships?

Another issue that bothers me is the fact that these systems tend to become addictive.  I have plenty of friends that cannot go an hour without checking their accounts, let alone a day.  I vividly remember an episode which at the time made me laugh:  I was giving a course on web design to a class of teenage students some time ago.  Come lunch time of the first day I simply said "You may go outside and have your lunch now" and the whole class instantly loaded up their browsers and logged onto Facebook.  Some of them took out their lunch to eat and, when I pointed out that no food or drink was allowed in the lab, most of them just put their lunches back in their bags!   This happened for the rest of the one-week course.  Most of the class never bothered to eat.

It's not all bad however.  Social networks are a great way of keeping in touch with distant loved ones for example or long lost friends.  There is no denying that they take communication to a whole other level.  But at what cost?  Maybe I'm being paranoid but aren't these companies in a position to build complete profiles of their user base, habits, likes, dislikes, personal histories... personalities even?  Given today's powerful data analysis methods it's not hard to imagine how these companies can exploit this data - which we are all too happy to supply - to their advantage.  Personally, I find that invasive and too high a price for the sake of keeping in touch.

Conclusion

I guess it's just me against the world :)
Sunday, 3 July 2011

Mobile Devices and Geolocation

The topic of mobile phone development is a minefield of myths, false truths and misconceptions. In this post I will try to clear things up a bit as much as I can and also take a look at how to exploit geolocation from our mobile devices.

The Mobile Web

One of the most common misconceptions I come across is that there is no need for a website to adapt to a mobile device. The reasoning behind this idea is the fact that the internet is platform independent and that the browser should do the dirty work. When faced with such an argument I tend to respond with another: “If the website will not adapt to a mobile device, can that mobile device adapt to your website?” The typical response I get is: “Sure, why not? You might need to scroll a bit more but it looks fine.” Let’s set the screen resolution topic aside for a moment and consider the following example: Let’s say your website uses cascading menus which open up when the user hovers over them with the mouse. Let’s also say that we are going to browse the website using something like an iPhone or similar multi-touch smartphone. Will that menu work? Can you “hover” over the menu using a touch screen? Not to my knowledge, so one of the more important features on your homepage is lost on a mobile device. And this is not some mobile phone from the late 90’s we’re talking about, but a latest-generation smartphone running a proper HTML browser. It’s not that the phone is not willing to adapt to the site, it simply can’t.

Screen resolution is another factor. I remember back at the beginning of my career as a developer (late 90’s) we had desktop machines that boasted an 800x600 screen resolution (1024x768 if you were lucky) while our end users only managed 640x480. It was company policy however that all the software we produced had to target 640x480, irrespective of what resolutions we were able to push. Today, the sheer variety of devices out there pushing all sorts of different resolutions is quite staggering and scary at the same time, but makes it even more important for developers to be aware of screen real estate and its effect on user experience.

Another myth I sometimes hear is that people are not using their mobile phones to browse the web. This might be true of people using ‘classic’ phones but it would be naïve (or rather stupid) to think that smartphone users (which are on the rise) aren’t. We as end-users still want to use the same websites we use on our desktop machines on our mobile devices, it’s the way we use them that’s different.

These are just a few considerations on why developing for mobile phones/devices should be treated differently. For a product to be successful, developers must be aware of the platform they are targeting, and let’s face it…

It’s a Jungle Out There

As mentioned earlier, the sheer variety of mobile devices available today is staggering and their capabilities are equally varied. The evolution of the mobile phone in particular is quite remarkable. The mobile phone started out as being just that, a phone which you could carry around and use anywhere you liked. We then had low-end mobile devices with very basic web support and limited memory. At this stage a mobile phone stopped being just a mobile phone and became a device. These were followed by devices that supported HTML and Java applications. Today’s smartphones are literally hand-held PCs running fully fledged OSs – my Android “phone” has a 1GHz processor, half a gigabyte of RAM and 8GB of internal storage. I use it to browse the internet, check my email, run office apps and sometimes, even phone people up. The number of people using such devices is constantly on the rise and so the landscape of web development is rapidly changing. In some cases, mobile device browsers are better at handling cutting-edge HTML5 and CSS 3 features than their desktop counterparts. Try http://www.html5test.com from your favorite desktop and mobile browsers, you might be surprised at the results!

If a web product is to be successful in this varied landscape, it is vital that the developers understand the targeted devices, not just at a software level, but even at the hardware level. What does the device support? What does it allow its users to do and how? I recently came across this article by Dan McKenzie that deals with how to go about designing for Android, which is currently the most popular smartphone platform out there. Quite an interesting read.

We’ve seen some of the “limitations” brought by mobile devices, such as smaller screen real estate and no “hover” but it would be wrong to think that it’s all bad news. Mobile devices also bring to the table interesting and powerful features which are missing from their desktop counterparts. One of these features is geolocation.


Geolocation

Geolocation is one of those features that brings with it a whole range of possibilities for our web applications. Location-based services, geo-marketing, geo-tagging, geo-targeting and web analytics are just a few examples. Geolocation adds meaning to global positioning: rather than just a simple set of co-ordinates, your position is described in terms of street name and how far you are from the closest restaurant!

Your device can determine its position in various ways including GPS, WiFi positioning and even through the cell information of your mobile phone network. One way of making use of this information is through the W3C Geolocation API, an effort by the W3C to standardize the way a mobile device retrieves positioning information.

So, how do we use this API? The first thing we need to determine is whether our browser supports it by querying the “navigator.geolocation” object:

if (navigator.geolocation === undefined) {
   alert("Your browser does not support the Geolocation API");
}

If the API is supported we can go ahead and get our current location by calling the “getCurrentPosition” function. This is an asynchronous call which receives two callbacks, one for handling the returned position and another to handle any errors. You can also (optionally) specify additional properties, to improve accuracy for instance. Here’s an example:

function pageLoaded(){ //this function is called from the body's onload event
   
   if(navigator.geolocation==undefined){
      document.getElementById("MSG").innerText = 'Geolocation is not supported';
   } else {
      document.getElementById("MSG").innerText = 'Geolocation is supported';            
      navigator.geolocation.getCurrentPosition(userLocated, locationError)
   }
} 

function userLocated(position)
{
   var lat = position.coords.latitude;
   var lon = position.coords.longitude;
   var alt = position.coords.altitude;
   var tim = position.timestamp;

   // display the returned values on the page     
   document.getElementById("LAT").innerText = lat;
   document.getElementById("LON").innerText = lon;
   document.getElementById("ALT").innerText = alt;
}
 
function locationError(error){
   //handle error;
}
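
The empty error handler and the optional third argument could be fleshed out along these lines. This is only a sketch: the helper name describeLocationError, the messages and the option values are illustrative assumptions, while the numeric error codes and the option names (enableHighAccuracy, timeout, maximumAge) are the ones defined by the W3C Geolocation API.

```javascript
// Hypothetical helper: turns a Geolocation error into a readable message.
// The numeric codes are defined by the W3C Geolocation API.
function describeLocationError(error) {
   switch (error.code) {
      case 1: // PERMISSION_DENIED
         return "Permission to use your location was denied";
      case 2: // POSITION_UNAVAILABLE
         return "Your position could not be determined";
      case 3: // TIMEOUT
         return "Timed out while waiting for a position";
      default:
         return "Unknown geolocation error";
   }
}

// Optional third argument to getCurrentPosition (values are illustrative).
var locationOptions = {
   enableHighAccuracy: true, // ask for the best available fix, e.g. GPS
   timeout: 10000,           // give up after 10 seconds (milliseconds)
   maximumAge: 60000         // accept a cached position up to 1 minute old
};

// Only runs where the Geolocation API is actually available.
if (typeof navigator !== "undefined" && navigator.geolocation) {
   navigator.geolocation.getCurrentPosition(
      function (position) {
         console.log(position.coords.latitude, position.coords.longitude);
      },
      function (error) {
         console.log(describeLocationError(error));
      },
      locationOptions
   );
}
```

In the page above, the body of "locationError" could simply write the message returned by the helper into the "MSG" element.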


We can also use the W3C Geolocation API to track the device’s location. By tracking the location we can also determine other factors such as speed, distance and direction of movement which can be used by our application. Here’s an example:

var watchID = null; // null means "not tracking"; a valid watch ID could be 0
function toggleTracking(){

   if (watchID === null){
      watchID = navigator.geolocation.watchPosition(userLocated, locationError);
      document.getElementById("btn_trk").value = "Stop Tracking";
   } else {
      navigator.geolocation.clearWatch(watchID);
      watchID = null;
      document.getElementById("btn_trk").value = "Start Tracking";
   }
}

To track the device's position, a call is made to the "watchPosition" method of the geolocation object, which accepts two callback functions just like the "getCurrentPosition" method. Unlike "getCurrentPosition", the success callback is called every time the position changes. "watchPosition" also returns a handle to the watch (stored in watchID), which is used later to stop tracking by calling the "clearWatch" method.

I must confess that I had trouble testing the tracking function, especially while on foot. I tried creating a function that calculated the distance travelled as well as the direction, but could not test it very well as the position updates were jerky at best. I admit that walking up and down the corridor of my house might be too small a distance for proper testing, but it was the best I could do since I was editing JavaScript on my laptop and copying files to and from my mobile phone. I did register some position changes at times, but they were sporadic and not enough to draw any conclusions. I have however installed a code editor on my mobile phone so that I can edit the code on the go. The plan is to test the code while travelling to work to see if I can get more encouraging results. I’ll also play around with the A-GPS settings to see what effect they have on accuracy. I’ll let you know how it goes.
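
For reference, a distance and direction calculation between two fixes could be sketched as follows. This is not the function I was testing, just a minimal illustration using the haversine formula and the standard initial-bearing formula; the function names and the Earth radius constant are assumptions made for the example.

```javascript
var EARTH_RADIUS_M = 6371000; // mean Earth radius in metres (approximation)

function toRadians(deg) {
   return deg * Math.PI / 180;
}

// Great-circle distance in metres between two latitude/longitude pairs
// (haversine formula).
function distanceBetween(lat1, lon1, lat2, lon2) {
   var dLat = toRadians(lat2 - lat1);
   var dLon = toRadians(lon2 - lon1);
   var a = Math.sin(dLat / 2) * Math.sin(dLat / 2) +
           Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) *
           Math.sin(dLon / 2) * Math.sin(dLon / 2);
   return 2 * EARTH_RADIUS_M * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
}

// Initial compass bearing in degrees from point 1 to point 2
// (0 = north, 90 = east, 180 = south, 270 = west).
function bearingBetween(lat1, lon1, lat2, lon2) {
   var phi1 = toRadians(lat1);
   var phi2 = toRadians(lat2);
   var dLon = toRadians(lon2 - lon1);
   var y = Math.sin(dLon) * Math.cos(phi2);
   var x = Math.cos(phi1) * Math.sin(phi2) -
           Math.sin(phi1) * Math.cos(phi2) * Math.cos(dLon);
   return (Math.atan2(y, x) * 180 / Math.PI + 360) % 360;
}
```

Inside a "watchPosition" callback, these could be fed the previous and current "position.coords" values; dividing the distance by the difference between the two "position.timestamp" values (in milliseconds) would give a rough speed.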

Wednesday, 22 June 2011

HTML 5 & CSS 3

This week's topics are the much anticipated HTML5 and CSS3 specifications, the next generation in web page markup and styling.

Introduction

HTML5 is the successor to HTML4 which came out way back in 1999. Back then the internet was a very different place, where notions such as web applications, e-commerce and social networking were yet unheard of. The web has changed a lot since then but the fundamental technology used to build it hasn't. Over the years its limitations became ever more apparent, with web designers stretching HTML to its absolute limits and hacking it into submission.  Thankfully, in 2006, the Web Hypertext Application Technology Working Group (WHATWG) and the World Wide Web Consortium (W3C) - both of which were working on separate specifications - decided to cooperate to create HTML5. The principles behind HTML5 (as stated on w3schools.com) are:
  • New features should be based on HTML, CSS, DOM and JavaScript
  • Reduce the need for external plugins such as Flash
  • Better error handling
  • More markup to replace scripting
  • HTML5 should be device independent
  • The development process should be visible to the public 

Implementing HTML5

HTML5 is still a work in progress but the W3C has announced that it will be complete by 2014. Since HTML5 is not yet an official standard, no browser has full HTML5 support. Most major browsers, however, continue to add support for HTML5 with every release. This means that we can (and are encouraged to) start using HTML5 features today.  HTML5 builds on the previous specification, so drastic changes to existing markup are not required to start using some of the new features. HTML5 markup also makes websites more search engine friendly and can help improve accessibility. Given that all the major browsers largely support the syntax, the business cost of adopting HTML5 is almost negligible.

So, What's new in HTML5?

HTML5 is loaded with new features aimed at improving user experience over the web.  It's a collection of various small improvements that collectively help web designers create something special.  Here are some of the most noticeable features in HTML5:
  • A <canvas> element that allows for dynamic rendering of 2D shapes and images on web pages
  • Content specific elements (such as header and article) to improve web page semantics
  • Support for audio and video playback
  • New form controls for better input validation
  • Improved support for local storage (based on databases rather than cookies)

One of my favorites has to be the set of new form controls.  In contrast to HTML4 where we had the generic textbox, HTML5 gives us input controls for: email, URLs, numbers, ranges, dates, search boxes, even colour!    These controls will drastically improve the way user input is currently validated.  Client-side input validation in HTML4 was always a headache and in most cases only cosmetic, since it was largely based on JavaScript which could be disabled at any time by the end user.  With these new controls however, these basic validations will be carried out by the browser itself, which means less JavaScript and more robustness.  This is not to say that we can do without server-side validation, far from it, but at least we are spared the cumbersome client-side equivalents.  Browser support for these new input types varies, however they can still be used since they will behave as normal text-boxes if they are not supported, which is brilliant.  That's the main thing about HTML5.  We do not need to wait for an official release date to start using HTML5 features, indeed there won't be such a date.  HTML5 is with us today and browser support is growing steadily, so failing to embrace the new specification today simply means being left behind.
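
The text-box fallback also gives us a simple way of checking from script whether a browser recognises one of the new input types. The helper below is a minimal sketch: the name supportsInputType is made up for this example, and the document is passed in as a parameter purely so the function is easy to try out.

```javascript
// Hypothetical helper: returns true if the browser keeps the requested
// input type, false if it falls back to a plain text box.
// `doc` is normally the global `document` object.
function supportsInputType(type, doc) {
   var input = doc.createElement("input");
   input.setAttribute("type", type);
   // Browsers that do not know the type report "text" instead.
   return input.type === type;
}
```

In a page this would be called as supportsInputType("date", document). Note that a browser keeping the type does not necessarily mean it offers a dedicated control for it, so treat the result as a hint rather than a guarantee.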

What about Visuals? - Enter CSS 3

Cascading Style Sheets (CSS) are an integral part of web development, adding layout and style to our HTML pages.  Way back in my very first post(s) I discussed CSS at length, highlighting how it can help us de-couple our web page content from its layout.  The current specification of CSS (2.1) is powerful but CSS 3 takes that power to a completely new level.  Just like HTML5, CSS 3 builds on its predecessor and is still under development by the W3C, however modern browsers already support most of the new properties introduced in CSS 3.  This means that we can start using CSS 3 today, just as we can HTML5.  There is one catch though.  Until the CSS 3 specification is finalised, browsers are allowed to interpret a property any way they see fit.  Such properties are usually given a vendor prefix (such as -moz- or -webkit-) to indicate that they are not yet standard.  To explain this better, let's take the new "border-radius" property as an example.

CSS 3 supports adding rounded corners to objects, something which was previously only (painstakingly) possible using images.  Suppose we wanted to add a border with rounded corners to every "<div>" element on our website.  Here's what the CSS 3 style rule would look like:

   div {
      border: 1px solid black;
      -moz-border-radius: 5px;
      -webkit-border-radius: 5px;
      border-radius: 5px;
   }

The first line in the rule simply sets a solid black border around our "<div>" element.  The next 3 lines all state that the border should have rounded corners, each 5 pixels in radius.  Why do we have 3 lines that seemingly state the same thing?  "border-radius" is the proper name of this new CSS 3 property and this is the name that will stick once the CSS 3 specification is finalised.  "-moz-border-radius" is the Mozilla (Firefox) team's interpretation of how the property should be implemented, while "-webkit-border-radius" is the Webkit (Safari, Chrome) team's interpretation.  It is important to note that the standard property should always be declared after its 'non-standard' counterparts: when several declarations apply, the last one wins, so the standard one takes precedence wherever it is understood.  This keeps your stylesheet forward-compatible, i.e. newer browsers that support the standard property will apply it without you having to constantly update your stylesheet.  Furthermore, older browsers can still apply the non-standard version, as they simply ignore the standard name they do not recognise!  As a matter of fact, at the time of writing, the latest versions of all the major browsers, including Internet Explorer, Chrome, Safari and Opera, now support the standard version of "border-radius".  However, if you want to test this behavior out, try the "border-image" property.

Here's a list of the most common prefixes for CSS 3 properties which are not yet standard:

Prefix      Browser
-ms-        Internet Explorer 9
-moz-       Firefox
-webkit-    Safari, Chrome
-o-         Opera
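
The same prefixes show up when styles are read or set from script, which gives us a way of finding out which variant of a property a browser understands. The sketch below assumes the usual DOM naming of prefixed properties (for example "MozBorderRadius" for "-moz-border-radius"); the helper name firstSupportedProperty is made up for this example.

```javascript
// Hypothetical helper: returns the first property name found on a style
// object, or null if none of the candidates is supported.
function firstSupportedProperty(style, candidates) {
   for (var i = 0; i < candidates.length; i++) {
      if (candidates[i] in style) {
         return candidates[i];
      }
   }
   return null;
}

// DOM style names for border-radius: standard first, then prefixed variants.
var borderRadiusNames = ["borderRadius", "MozBorderRadius", "WebkitBorderRadius"];
```

In a page, calling firstSupportedProperty(document.createElement("div").style, borderRadiusNames) would return the name to use, with the standard one preferred because it is listed first.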

Other Features

There's lots more to CSS 3 than fancy borders, much more in fact. Here are some of the more exciting features of CSS 3:

  • Fonts - you can now use any font you like on your webpage rather than sticking to web-safe fonts;
  • 2D and 3D Transformations;
  • Transitions; and
  • Animations

Fonts

How many times have you resorted to images just so that you could use a particular font for your website logo or headings, sacrificing flexibility for looks?  With CSS 3 this is no longer an issue: simply upload your chosen font to your website, declare it in your stylesheet with an "@font-face" rule, and it will be automatically downloaded as required.  Now you can have the looks, the flexibility and better accessibility on your website. Neat.


Transformations

In CSS 3 we can apply 2D and 3D transformations to any element on our web page including:

  • Translation (move)
  • Rotation
  • Scaling (re-sizing)
  • Skewing
  • Matrix (any combination of the above)
At the moment, 2D transformations enjoy more browser support than 3D transformations.  In fact, at the time of writing, all major browsers support 2D transformations while only Safari and Chrome have support for 3D transformations.  Here are some examples followed by the CSS 3 styles used:

Examples of CSS 3 2D Transformations (as viewed in Google Chrome)

div{
   background-color:#F5F5F5;
   border:solid 1px black;   
   width:100px;
   height:100px;
   margin:30px;
   float:left;
   text-align:center;
   font-family:arial;
   line-height:30px;
   -webkit-border-radius:5px;
   -moz-border-radius:5px;
   border-radius:5px;
   -webkit-box-shadow: 5px 5px 12px cyan;
   -moz-box-shadow: 5px 5px 12px cyan;
    box-shadow: 5px 5px 12px cyan;
}
 
.rotate{
   -ms-transform:rotate(45deg);
   -moz-transform:rotate(45deg);
   -webkit-transform:rotate(45deg);
   -o-transform:rotate(45deg);
   transform:rotate(45deg);
}
 
.scale {
   -ms-transform:scale(1.5,1.5);
   -moz-transform:scale(1.5,1.5);
   -webkit-transform:scale(1.5,1.5);
   -o-transform:scale(1.5,1.5);
   transform:scale(1.5,1.5);
}
 
.skew {
   -ms-transform:skew(20deg, 15deg); 
   -moz-transform:skew(20deg, 15deg);
   -webkit-transform:skew(20deg, 15deg);
   -o-transform:skew(20deg, 15deg);
   transform:skew(20deg, 15deg);
}

All three <div> elements in this example share the same basic style: they are 100 pixels square, with a light grey background and a thin black border. I also added the new CSS 3 properties "border-radius" and "box-shadow". "border-radius" we've already seen; "box-shadow", on the other hand, is another CSS 3 property that creates a drop-shadow around your elements by specifying X and Y offsets (how far you want the shadow to 'drop') and a blur radius (how soft or sharp the shadow is).

Each of the <div> elements however uses a different class according to the desired transformation. The rotate transformation rotates the element around its centre by the specified number of degrees. The scale transformation takes two parameters, one for the width and another for the height. In this case the div is enlarged by a factor of 1.5 along both axes. Similarly, the skew transformation takes two angles as parameters (for the x-axis and y-axis) and skews the object accordingly.
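
The "matrix" option in the list above is what ties these transformations together: rotate, scale and skew can each be written as the six numbers of a CSS "matrix(a, b, c, d, e, f)" value, and combining transformations amounts to multiplying their matrices. The following is a minimal sketch of that relationship; the function names are made up for this example and are not part of any browser API.

```javascript
// Each function returns [a, b, c, d, e, f], the arguments of CSS matrix().
function toRad(deg) {
   return deg * Math.PI / 180;
}

function rotate(deg) {
   var r = toRad(deg);
   return [Math.cos(r), Math.sin(r), -Math.sin(r), Math.cos(r), 0, 0];
}

function scale(sx, sy) {
   return [sx, 0, 0, sy, 0, 0];
}

function skew(xDeg, yDeg) {
   return [1, Math.tan(toRad(yDeg)), Math.tan(toRad(xDeg)), 1, 0, 0];
}

// Combines two transforms; the result corresponds to listing m1 and then m2
// in a single CSS transform value, e.g. "rotate(45deg) scale(1.5,1.5)".
function multiply(m1, m2) {
   return [
      m1[0] * m2[0] + m1[2] * m2[1],
      m1[1] * m2[0] + m1[3] * m2[1],
      m1[0] * m2[2] + m1[2] * m2[3],
      m1[1] * m2[2] + m1[3] * m2[3],
      m1[0] * m2[4] + m1[2] * m2[5] + m1[4],
      m1[1] * m2[4] + m1[3] * m2[5] + m1[5]
   ];
}

// Formats the six numbers as a CSS value.
function toCss(m) {
   return "matrix(" + m.join(", ") + ")";
}
```

For instance, toCss(multiply(rotate(45), scale(1.5, 1.5))) produces a single matrix value that could replace the two separate transformations.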

3D transformations work in a similar way, this time taking parameters across three dimensions, (X, Y and Z). You could also specify perspective properties and transformation origin in 3D transformations. However, I want to turn my attention to one of my personal favorites: Animation.

CSS 3 Animation

Until recently, the only way to add animations to your website was by using animated gifs or plugins such as Flash.  If you were really brave you could also animate using JavaScript - not for the faint-hearted.  This led to all sorts of inconveniences such as lack of flexibility, accessibility and compatibility. Now we can add animations to our website "natively" using CSS 3. Let's take a simple example, try hovering over any of the columns below using Chrome or Safari:

[Interactive demo: three columns that expand on hover]

Whenever the mouse hovers over any one of the columns, the column expands for a short period of time and goes back to its original size. This effect is created using CSS 3 keyframe animation which is currently only supported by webkit browsers. Here's how it's done:

a.anim{
    display:block;
    text-decoration:none;
    width:150px;
    height:120px;
    padding-top:80px;
    text-align:center;
    margin:1px;
    background:url(http://www.w3.org/html/logo/downloads/HTML5_Logo_64.png) no-repeat #F5F5F5 50% 10px;
    border:solid 1px black;
    float:left;
 }

 a:hover{
    /* Animate */
    -webkit-animation-name:grow;
    -webkit-animation-duration: .4s;
    -webkit-animation-iteration-count: 1;
    -webkit-animation-timing-function: ease-in-out;

    /* Forward Compatibility */
    animation-name:grow;
    animation-duration: .4s;
    animation-iteration-count: 1;
    animation-timing-function: ease-in-out;

 }
 
 /* Define the Animation */
 @-webkit-keyframes grow {
    0%   {-webkit-transform: scale(1,1);}
    50%  {-webkit-transform: scale(1.2, 1.2);}
    100% {-webkit-transform: scale(1, 1);}
 }

 @keyframes grow { /* forward compatibility */
    0%   {transform: scale(1,1);}
    50%  {transform: scale(1.2, 1.2);}
    100% {transform: scale(1, 1);}
 }



Ok, so our three columns are in fact three anchor elements styled to look like columns, nothing new here. The animation is triggered by the "a:hover" selector where we specify:

  • The name of the animation to trigger: "grow" in this case;
  • How long the animation should take;
  • How many times the animation should run: once in this case
  • The timing/easing function which adds a more organic feel to the animation

Easing adds smoothness to our animation. In this case the easing function is set to "ease-in-out", which means that the animation starts slowly, accelerates towards the middle and slows down again at the end. The final part is the animation definition itself, which is based on three keyframes. Each keyframe applies a scale transformation to change the size of the object being animated over time. Animations must have at least two keyframes, one for the beginning (0% or 'from') and one for the end (100% or 'to'), but you can have as many keyframes as you like between these two. In this case the animation is pretty simple, so one additional keyframe set at the middle of the animation (50%) is enough to get the desired effect.  Of course you could add all sorts of effects to your animation such as changing colours, borders, shadows... you name it.

Summing Up

Presenting all there is to know about HTML5 and CSS 3 in a single blog post is a ridiculous proposition: the subject is as vast as it is exciting - and it's still evolving. What strikes me the most is the fact that HTML5 and CSS 3 bring so much to web development and ask very little in return in terms of learning effort. If you know HTML you know HTML5, and the same goes for CSS 3. They are not new technologies but rather extensions to what we're already used to, and yet they bring so much more (power) to the table. I've never experienced anything like it with any other language I've used so far. The fact that all the major players in the software industry, including Apple, Google and more recently Microsoft, have committed themselves to HTML5 is further testament to its significance to web development and beyond.