A Beginner's Guide to XML
XML stands for eXtensible Mark-up Language. You might not realise it, but you are probably already using it:
- XHTML, a mark-up language in which many web pages are written, is a form of XML
- if you subscribe to any RSS feeds or use a feed reader, you are using XML
- if you use Google and have seen the drop-down "suggestion box" appear as you type, you have seen AJAX, a technique that often uses XML, at work
If you have heard of AJAX, which is probably one of the main features of Web 2.0, you may know that the "X" in AJAX (Asynchronous JavaScript and XML) stands for XML.
AJAX is a technique that sets up a dialogue between a web page and a server, allowing page content to be modified dynamically without the page having to be refreshed and reloaded. The data that the page retrieves has traditionally been sent as XML, although other formats such as JSON are now common too.
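To make the idea of "data sent as XML" concrete, here is a minimal sketch of the kind of reply a server might return for a suggestion box, and how a client could read it. In a browser this step would be done in JavaScript. Python's standard `xml.etree.ElementTree` module is used here only as a stand-in, and the `<suggestions>`/`<suggestion>` format is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical server reply to an AJAX request for search suggestions.
# The <suggestions>/<suggestion> element names are invented for this sketch.
RESPONSE = """<suggestions query="xml">
  <suggestion>xml tutorial</suggestion>
  <suggestion>xml schema</suggestion>
  <suggestion>xml parser</suggestion>
</suggestions>"""


def parse_suggestions(xml_text):
    """Return the original query and a list of suggestion strings."""
    root = ET.fromstring(xml_text)
    query = root.get("query")  # attribute on the root element
    items = [node.text for node in root.findall("suggestion")]
    return query, items


query, items = parse_suggestions(RESPONSE)
print(query, items)
```

A real page would then insert each suggestion into the drop-down list.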
Elements of an XML Document
At the most basic level an XML file is just a text file – in a similar manner to an HTML document, an XML file can be created in a wide variety of text editors including "Notepad".
An XML document is made up of tags which are almost entirely customisable, as long as the XML rules are followed. There are standard formats, though: the above-mentioned XHTML and RSS both follow standard formats defined by schemas or Document Type Definitions (DTDs).
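As a rough illustration of how freely tags can be named, the sketch below builds a small document from made-up tags (`recipe`, `title`, `ingredient` are hypothetical names, not part of any standard). It uses Python's `xml.etree.ElementTree`, which makes sure the output follows XML's rules, for example that every tag is closed.

```python
import xml.etree.ElementTree as ET

# The tag and attribute names below are made up: XML lets you choose them.
recipe = ET.Element("recipe", {"serves": "4"})
ET.SubElement(recipe, "title").text = "Pancakes"
ingredients = ET.SubElement(recipe, "ingredients")
for name in ("flour", "eggs", "milk"):
    ET.SubElement(ingredients, "ingredient").text = name

# Serialise to a string; ElementTree takes care of closing every tag.
xml_text = ET.tostring(recipe, encoding="unicode")
print(xml_text)

# Parsing it back shows the result is well-formed XML.
parsed = ET.fromstring(xml_text)
print(parsed.find("title").text, len(parsed.find("ingredients")))
```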
The Document Type Definition (DTD)
A DTD is a kind of schema, written in its own DTD syntax. In mark-up languages of the Standard Generalized Mark-up Language (SGML) family, the DTD contains a set of mark-up declarations that define a document type, such as:
- HTML 4.01 strict (contains all HTML elements and attributes, excluding presentational or deprecated elements [e.g. font] and framesets)
- XHTML 1.0 strict (a version of XHTML excluding elements marked as deprecated in the HTML 4.01 specification)
- XHTML 1.0 transitional (a version including some presentational elements excluded from the strict version)
- or something else
An XML document declares which DTD it uses via a reference within the mark-up. This is how a web browser knows what type of HTML to expect, and how a feed reader can tell what an RSS document should contain.
The DTD that an HTML or XHTML document refers to is defined at the beginning of the document's code using the "doctype" declaration - the declaration looking something like this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
The "doctype" declaration usually contains two pieces of information that identify the relevant DTD: a Formal Public Identifier (FPI) string (the first quoted segment) and a URL, known as the system identifier, where the browser can find the DTD itself.
The FPI is made up of two parts, known as the "owner identifier" and the "text identifier", and its individual fields are separated by "//" double-slash characters:
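To see the two identifiers as separate values, the sketch below parses a page with Python's `xml.dom.minidom` and reads its `DocumentType` node. An XHTML 1.0 Strict doctype is used instead of the HTML 4.01 one above, because an XML parser can only read XML, and HTML 4.01 is SGML rather than XML. The page content itself is a made-up example.

```python
from xml.dom import minidom

# An XHTML 1.0 Strict document is used here because it is XML, so an XML
# parser can read it (an HTML 4.01 page is SGML, not XML).
PAGE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Demo</title></head><body></body></html>"
)

doc = minidom.parseString(PAGE)
doctype = doc.doctype  # the DocumentType node created from <!DOCTYPE ...>

# minidom records the identifiers; it does not download or validate the DTD.
print("root element:", doctype.name)
print("FPI:         ", doctype.publicId)
print("DTD URL:     ", doctype.systemId)
```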
"[Registration Status]//[Owner]//[Class] [Description]//[Language]"
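Applied to the HTML 4.01 example, "-//W3C//DTD HTML 4.01//EN" breaks down as: "-" (an unregistered owner), "W3C" (the owner), "DTD" (the class), "HTML 4.01" (the description) and "EN" (the language). The small Python function below is a sketch of that breakdown. It assumes the four-field layout shown above, which W3C doctypes use; other FPIs (for example ISO-owned ones) can be laid out differently, and the function simply raises `ValueError` for those.

```python
def parse_fpi(fpi):
    """Split a Formal Public Identifier into its named fields.

    Assumes the layout  status//owner//class description//language
    used by W3C doctypes; other FPIs may not follow it.
    """
    status, owner, text, language = fpi.split("//")  # ValueError if not 4 parts
    doc_class, _, description = text.partition(" ")  # class is the first word
    return {
        "registration": "registered" if status == "+" else "unregistered",
        "owner": owner,
        "class": doc_class,
        "description": description,
        "language": language,
    }


print(parse_fpi("-//W3C//DTD HTML 4.01//EN"))
```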
Because the "doctype" declaration can point to any DTD, it is the crucial link that lets you set up your own DTD, or modify an existing DTD to add the tags you have always wanted.
If you use the "doctype" declaration in your XML code to refer to the location of your DTD, your customised tags become valid, and software reading the document can check that they are structured correctly.
It is this ability to create DTDs that XML documents can refer to which confers the power to create customised tags and even completely new XML formats.
XML and the Document Object Model (DOM)
Much of the functionality we take for granted in today's websites relies on something called the "DOM" - the Document Object Model.
The DOM represents a web page as a tree of "nodes", a node typically being an element or tag. If a node has children (i.e. more tags inside it), then that node is their parent, and nodes that share the same parent are siblings.
JavaScript manipulates the DOM, and therefore the page display, by finding a specified node and modifying it; it can also add new nodes or remove existing ones.
This JavaScript functionality is crucial to the ability of AJAX to modify and update page content depending on the data received from the server, since it is this manipulation of the DOM that makes it possible to update content without refreshing the page.
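The parent, child and sibling relationships can be explored outside a browser too. Python's `xml.dom.minidom` implements the same W3C DOM interfaces that browsers expose to JavaScript (`getElementsByTagName`, `parentNode`, `nextSibling` and so on), so the sketch below uses it on a made-up menu.

```python
from xml.dom import minidom

# A made-up menu: <ul> is the parent, the three <li> elements are siblings.
doc = minidom.parseString(
    "<ul id='menu'><li>Home</li><li>News</li><li>Contact</li></ul>")
menu = doc.documentElement
first, second, third = menu.getElementsByTagName("li")

print(second.parentNode.tagName)               # ul
print(second.previousSibling.firstChild.data)  # Home
print(second.nextSibling.firstChild.data)      # Contact
print(len(menu.childNodes))                    # 3
```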
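The update step that AJAX relies on, which removes old nodes and adds new ones built from the server's XML, can be sketched in the same way. As before, Python's `xml.dom.minidom` stands in for the browser's DOM. The page fragment, the `<stories>` reply and the `refresh_news` helper are all hypothetical.

```python
from xml.dom import minidom

# Hypothetical page fragment and server reply.
page = minidom.parseString(
    "<div><ul id='news'><li>Old story</li></ul></div>")
server_reply = minidom.parseString(
    "<stories><story>Fresh story A</story><story>Fresh story B</story></stories>")


def refresh_news(page, reply):
    """Replace the news list items with the stories in the reply."""
    news = page.getElementsByTagName("ul")[0]
    # Remove the existing <li> nodes.
    while news.firstChild is not None:
        news.removeChild(news.firstChild)
    # Add one new <li> per <story> in the server's XML.
    for story in reply.getElementsByTagName("story"):
        item = page.createElement("li")
        item.appendChild(page.createTextNode(story.firstChild.data))
        news.appendChild(item)


refresh_news(page, server_reply)
print(page.documentElement.toxml())
```

Only the list changes. The rest of the page is untouched, which is what lets an AJAX page update without a reload.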
XML Validation
To conform to a DTD, XML mark-up must be both 'well formed' and valid.
Being well formed means that it is correctly written as XML: every tag must be closed, elements must be properly nested, and so on. Being valid means that, in addition, it follows the rules set out in its schema or DTD.
Validation matters for HTML too: validating a page not only checks whether it is well formed, but also provides a way to debug any problems that might occur within the page.
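The difference between "well formed" and "valid" can be shown with a small sketch. Python's standard library has no DTD validator, so the `is_valid_recipe` check below is a hand-written, hypothetical stand-in for a DTD rule such as `<!ELEMENT recipe (title, ingredient+)>`. It only mimics the idea and is not a real validator.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the text parses as XML at all."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_valid_recipe(text):
    """Hypothetical check mimicking <!ELEMENT recipe (title, ingredient+)>."""
    if not is_well_formed(text):
        return False  # a document must be well formed before it can be valid
    root = ET.fromstring(text)
    tags = [child.tag for child in root]
    return (root.tag == "recipe" and len(tags) >= 2
            and tags[0] == "title"
            and all(tag == "ingredient" for tag in tags[1:]))


good = "<recipe><title>Toast</title><ingredient>bread</ingredient></recipe>"
unclosed = "<recipe><title>Toast</recipe>"  # <title> never closed
wrong_order = "<recipe><ingredient>bread</ingredient><title>Toast</title></recipe>"

for sample in (good, unclosed, wrong_order):
    print(is_well_formed(sample), is_valid_recipe(sample))
```

The third sample is well formed but not valid, because its elements appear in an order the rule does not allow.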
XML Formats
XML is used wherever a standard way to transfer data over the internet is required. RSS, the ubiquitous format that pervades the internet, has already been mentioned, but many other XML formats are emerging, including the following examples:
XMLTV
XMLTV is a format that enables TV listings to be distributed over the internet for use by applications such as media centres and personal video recorders. The Radio Times provides an XMLTV feed of all its listings for personal use.
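As an illustration of how an application might read such a feed, the sketch below parses a cut-down listing modelled on the XMLTV layout (a `<tv>` root with `<channel>` and `<programme>` elements, and start times written as `YYYYMMDDhhmmss +HHMM`). The channel id, names, titles and times are invented. A real feed contains many more elements, so check the XMLTV DTD before relying on this structure.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# A cut-down, invented listing modelled on the XMLTV layout.
LISTING = """<tv>
  <channel id="one.example"><display-name>Example TV</display-name></channel>
  <programme start="20080105190000 +0000" stop="20080105200000 +0000" channel="one.example">
    <title>Evening News</title>
  </programme>
  <programme start="20080105200000 +0000" stop="20080105210000 +0000" channel="one.example">
    <title>Nature Documentary</title>
  </programme>
</tv>"""


def programmes(xml_text):
    """Return (channel name, start time, title) for each programme."""
    root = ET.fromstring(xml_text)
    names = {c.get("id"): c.findtext("display-name") for c in root.findall("channel")}
    result = []
    for p in root.findall("programme"):
        start = datetime.strptime(p.get("start"), "%Y%m%d%H%M%S %z")
        result.append((names[p.get("channel")], start, p.findtext("title")))
    return result


for channel, start, title in programmes(LISTING):
    print(f"{start:%H:%M} {channel}: {title}")
```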
OpenDocument
OpenDocument is an XML-based format for office documents such as text documents, spreadsheets and presentations. It was produced by the open source community as a way to standardise documents and offer better interoperability between systems.
The format was originally created by OpenOffice.org, the open source office suite, and was adopted as a standard by the Organization for the Advancement of Structured Information Standards (OASIS), an organisation a bit like the W3C. It has since been adopted by Microsoft and is supported as one of the formats in its Office 2007 suite.
There was some controversy about this, as Microsoft was pushing its own format, Office Open XML, which is also an internationally recognised standard. Many open source advocates maintained that this format was biased in favour of Microsoft, but only time will tell which format is taken up more widely.
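An OpenDocument file (such as an `.odt`) is a ZIP package, and the document body is stored as XML in a file called `content.xml` inside it. The sketch below reads the paragraphs out of that XML with Python's `zipfile` and `xml.etree.ElementTree`. So that it runs without a real document, `make_sample_odt` builds a tiny, hypothetical package in memory. That package is not a complete ODF file: it has no manifest or styles.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by ODF text elements such as <text:p>.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def paragraphs(odf_file):
    """Return the text of each <text:p> paragraph in an ODF package."""
    with zipfile.ZipFile(odf_file) as package:
        root = ET.fromstring(package.read("content.xml"))
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]


def make_sample_odt():
    """Build a minimal, hypothetical ODF-like package in memory for the demo."""
    content = (
        '<office:document-content '
        'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
        f'xmlns:text="{TEXT_NS}">'
        "<office:body><office:text>"
        "<text:p>Hello from OpenDocument</text:p>"
        "<text:p>Second paragraph</text:p>"
        "</office:text></office:body></office:document-content>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
        package.writestr("content.xml", content)
    buffer.seek(0)
    return buffer


print(paragraphs(make_sample_odt()))
```

With a real document you would pass a file path instead, e.g. `paragraphs("report.odt")` (the file name is hypothetical).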
SVG
Scalable Vector Graphics (SVG) is a format that describes vector graphics in XML. At the time of writing, all major web browsers support this format apart from Internet Explorer, although Google does provide a plug-in that allows Internet Explorer to display SVG. It's not just web browsers that use the format, though: the main graphics applications support it as well. SVG can also be animated and made interactive, along similar lines to what you might expect from Adobe Flash.
There are many other kinds of XML, and the chances are you have used them without realising it. So next time you see a file extension or application name that includes an "X", there is a good chance it stands for XML.
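Because SVG is just XML, an image can be produced with an ordinary XML library. The sketch below uses Python's `xml.etree.ElementTree` to build a simple badge, a circle with a text label, in the SVG namespace `http://www.w3.org/2000/svg`. The `make_badge` helper, sizes and colours are illustrative choices, not part of the SVG standard.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # write SVG as the default namespace


def make_badge(label):
    """Build a small SVG image: a circle with a text label on top."""
    svg = ET.Element(f"{{{SVG_NS}}}svg",
                     {"width": "120", "height": "120", "viewBox": "0 0 120 120"})
    ET.SubElement(svg, f"{{{SVG_NS}}}circle",
                  {"cx": "60", "cy": "60", "r": "50", "fill": "steelblue"})
    text = ET.SubElement(svg, f"{{{SVG_NS}}}text",
                         {"x": "60", "y": "66", "text-anchor": "middle", "fill": "white"})
    text.text = label
    return ET.tostring(svg, encoding="unicode")


svg_markup = make_badge("XML")
print(svg_markup)
```

Saving the printed markup to a file ending in `.svg` should let an SVG-capable browser or graphics application display it.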

