Thursday 3 November 2011

CMT3315 LAB 02 - XML vs HTML


In this post we will look at the similarities and differences between XML and HTML. As discussed in my previous post, XML is a subset of SGML designed to make the structure of a document known to a software package. In essence, it enables a software package to "understand" how a document is organised. HTML was also derived from SGML, but it appeared before XML.

HTML

HTML is made up of a standard set of tags specifically designed to create web pages that web browsers can understand and display. HTML became popular very quickly after its original release, and web pages were soon being used for things they were never designed to do. Initially, the way an HTML document was interpreted and displayed was left entirely up to the browser. This created problems for web designers who wanted their pages to render the same across browsers, so HTML was extended with tags that define presentation, such as "<font>". These types of tags go directly against the original concept of SGML, which was to separate content from layout. The browser wars made the situation worse. Fierce competition gave rise to proprietary tags, and browsers became tolerant of badly written HTML documents, which in the end badly hinders programmatic interpretation of web pages.
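To make the point about tolerance concrete, here is a minimal sketch in Python using the standard library's html.parser module. The markup string and the TagCollector class are made up for illustration. The parser accepts overlapping and unclosed tags without raising any error, so a program reading the page cannot rely on the reported tags forming a proper tree.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record start and end tags in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


# Hypothetical sloppy markup: <p> and <font> are never closed,
# and <b> and <i> are closed in the wrong order.
sloppy = '<p><font color="red">Hello <b><i>world</b></i>'

collector = TagCollector()
collector.feed(sloppy)
collector.close()

opened = [tag for kind, tag in collector.events if kind == "start"]
closed = [tag for kind, tag in collector.events if kind == "end"]

print("opened:", opened)
print("closed:", closed)
```

No exception is raised at any point. It is left to the calling program to notice that two of the four opened elements are never closed.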

XML

A major advantage of HTML is its simplicity, but that is also one of its biggest weaknesses. XML was born out of the need for something simpler than SGML but also stricter than HTML. XML tags are user-defined, which means that when you create an XML document you are defining your own standard for structuring that particular type of document. That is why XML is extensible. Every time you define a new set of tags you are effectively defining a new markup language! In fact, just like SGML, XML is itself a framework for defining markup languages. Its main objective is to structure information in such a way that any system on any platform can process it.
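As a small illustration of user-defined tags, the sketch below invents a tiny book catalogue vocabulary and reads it with Python's standard xml.etree.ElementTree module. The tag names, attribute names and values are all made up for this example; they are not part of any standard.

```python
import xml.etree.ElementTree as ET

# A made-up vocabulary for a book catalogue. None of these tag or
# attribute names is defined by XML itself; the document author chose them.
catalogue_xml = """<catalogue>
  <book isbn="000-0">
    <title>First Example</title>
    <author>A. Writer</author>
  </book>
  <book isbn="000-1">
    <title>Second Example</title>
    <author>B. Writer</author>
  </book>
</catalogue>"""

root = ET.fromstring(catalogue_xml)

# A generic XML parser can walk the structure without knowing the
# vocabulary in advance; only the element names are specific to it.
titles = [book.findtext("title") for book in root.findall("book")]
isbns = [book.get("isbn") for book in root.findall("book")]

print(titles)
print(isbns)
```

The same document could be read by an XML parser in any other language on any other platform, which is the portability the paragraph above describes.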

XML gave rise to XHTML, a stricter version of HTML based on XML rules. Differences between HTML and XHTML include:

  • unlike HTML, XHTML documents must be well-formed XML documents;
  • XHTML is case-sensitive for element and attribute names (which must be written in lower case), whereas HTML is not;
  • attribute minimisation (omitting the "=" sign and value) is not allowed in XHTML.
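The differences listed above can be sketched with a simple well-formedness check in Python. The helper is_well_formed and the three sample fragments are hypothetical. Note that this only tests XML well-formedness with a generic XML parser; it does not validate a document against the XHTML DTD, and the parser only requires that tag names match in case, not that they are lower case.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as well-formed XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# HTML-style markup: unclosed <br> and <input>, minimised "checked" attribute.
html_style = '<p>Line one<br><input type="checkbox" checked></p>'

# The same content written to XHTML rules: every element is closed and
# every attribute has a quoted value.
xhtml_style = '<p>Line one<br /><input type="checkbox" checked="checked" /></p>'

# Opening and closing tags differ in case, so an XML parser treats
# them as two different element names.
mixed_case = "<P>text</p>"

for label, fragment in [
    ("html_style", html_style),
    ("xhtml_style", xhtml_style),
    ("mixed_case", mixed_case),
]:
    print(label, is_well_formed(fragment))
```

Only the XHTML-style fragment passes; a browser's HTML parser would typically accept all three.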

There are many more differences, most of which would require a separate blog post to explain, but the main idea is that XHTML addresses the limits and weaknesses in the original HTML specification because it is based on XML.

Summary

HTML (1990) was derived from SGML (1986) as a simpler markup language for creating hyperlinked web documents. XML (1998) is also derived from SGML; it is a simpler specification for defining markup languages. Its main objective is to facilitate portability of data across multi-platform systems. In 2000 XML gave rise to XHTML, a reformulation of HTML aimed at addressing limits in the latter's original specification.