Monday, July 14, 2008

XML Syntax Rules

XML Elements Must be Properly Nested

In HTML, you will often see improperly nested elements:

<b><i>This text is bold and italic</b></i>

In XML, all elements must be properly nested within each other:

<b><i>This text is bold and italic</i></b>

In the example above, "properly nested" simply means that since the <i> element is opened inside the <b> element, it must be closed inside the <b> element.
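
One way to see the nesting rule in action is to hand both versions to an XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed is made up for this illustration and is not part of any library.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Tags close in the wrong order: <b> is closed while <i> is still open.
improper = "<b><i>This text is bold and italic</b></i>"
# Tags close in the reverse of the order in which they were opened.
proper = "<b><i>This text is bold and italic</i></b>"

print(is_well_formed(improper))  # False
print(is_well_formed(proper))    # True
```

A browser will usually render the improperly nested HTML anyway, while an XML parser stops at the first mismatched closing tag.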

XML Documents Must Have a Root Element

XML documents must contain one element that is the parent of all other elements. This element is called the root element.



<root>
  <child>
    <subchild>.....</subchild>
  </child>
</root>

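As a rough check of the root element rule, the sketch below parses one string with two top-level elements and one with a single root, again using Python's xml.etree.ElementTree. The element names to and from are assumptions chosen for the illustration; only note is mentioned in the text.

```python
import xml.etree.ElementTree as ET

# Two sibling elements at the top level: there is no single root.
two_roots = "<to>Tove</to><from>Jani</from>"
# The same elements wrapped in one parent element.
one_root = "<note><to>Tove</to><from>Jani</from></note>"

try:
    ET.fromstring(two_roots)
    two_roots_ok = True
except ET.ParseError:
    two_roots_ok = False

root = ET.fromstring(one_root)
child_tags = [child.tag for child in root]

print(two_roots_ok)  # False
print(root.tag)      # note
print(child_tags)    # ['to', 'from']
```
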
XML Attribute Values Must be Quoted

XML elements can have attributes in name/value pairs just like in HTML.

In XML the attribute value must always be quoted. Study the two XML documents below. The first one is incorrect, the second is correct:


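<note date=12/11/2007>
<to>Tove</to>
<from>Jani</from>
</note>

<note date="12/11/2007">
<to>Tove</to>
<from>Jani</from>
</note>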


The error in the first document is that the date attribute in the note element is not quoted.
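
The same comparison can be run through a parser. This is only a sketch with Python's xml.etree.ElementTree; the date value and the to and from element names are placeholders assumed for the illustration.

```python
import xml.etree.ElementTree as ET

# Attribute value without quotes: not well-formed XML.
unquoted = "<note date=12/11/2007><to>Tove</to><from>Jani</from></note>"
# Attribute value in double quotes (single quotes would also be accepted).
quoted = '<note date="12/11/2007"><to>Tove</to><from>Jani</from></note>'

try:
    ET.fromstring(unquoted)
    unquoted_ok = True
except ET.ParseError:
    unquoted_ok = False

note = ET.fromstring(quoted)
date_value = note.get("date")

print(unquoted_ok)  # False
print(date_value)   # 12/11/2007
```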

Entity References

Some characters have a special meaning in XML.

If you place a character like "<" inside an XML element, it will generate an error because the parser interprets it as the start of a new element.

This will generate an XML error:

if salary < 1000 then

To avoid this error, replace the "<" character with an entity reference:

if salary &lt; 1000 then

There are 5 predefined entity references in XML:
&lt;    <    less than
&gt;    >    greater than
&amp;   &    ampersand
&apos;  '    apostrophe
&quot;  "    quotation mark

Note: Only the characters "<" and "&" are strictly illegal in XML. The greater than character is legal, but it is a good habit to replace it.
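
In practice the replacement is rarely done by hand. As a minimal sketch, Python's standard library provides xml.sax.saxutils.escape, which replaces "&", "<" and ">" with their entity references. The message element name below is an assumption made for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if salary < 1000 then"

# Placing the raw text inside an element produces a parse error.
try:
    ET.fromstring(f"<message>{raw}</message>")
    raw_ok = True
except ET.ParseError:
    raw_ok = False

# escape() swaps the special characters for entity references.
escaped = escape(raw)
parsed = ET.fromstring(f"<message>{escaped}</message>")

print(raw_ok)       # False
print(escaped)      # if salary &lt; 1000 then
print(parsed.text)  # if salary < 1000 then
```

The parser turns the entity reference back into the original character, so the application reading the document sees the text as it was written.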

Comments in XML

The syntax for writing comments in XML is similar to that of HTML:

<!-- This is a comment -->

With XML, White Space is Preserved

HTML reduces multiple white space characters to a single white space:
HTML: Hello           my name is Tove
Output: Hello my name is Tove.

With XML, the white space in your document is not truncated.
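
A small sketch of the difference, using Python's xml.etree.ElementTree: the parser returns the element text with every space intact, and the HTML-style collapsing has to be applied explicitly. The greeting element name is an assumption made for the illustration.

```python
import xml.etree.ElementTree as ET

# Five spaces between "Hello" and "my".
document = "<greeting>Hello     my name is Tove</greeting>"

text = ET.fromstring(document).text
# What an HTML renderer would show: runs of white space collapsed to one space.
collapsed = " ".join(text.split())

print(repr(text))       # 'Hello     my name is Tove'
print(repr(collapsed))  # 'Hello my name is Tove'
```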

XML Stores New Line as LF

In Windows applications, a new line is normally stored as a pair of characters: carriage return (CR) and line feed (LF). The pair mirrors the two typewriter actions for starting a new line: returning the carriage and feeding the paper. Unix applications normally store a new line as a single LF character, and older Macintosh applications use a single CR. An XML parser normalizes all of these to a single LF before passing the text to the application.
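
The XML specification asks parsers to normalize line breaks to LF, whichever convention the file was written with. The sketch below checks this with Python's xml.etree.ElementTree; the text element name is an assumption made for the illustration.

```python
import xml.etree.ElementTree as ET

# The same two lines written with three different line-ending conventions.
windows = "<text>line one\r\nline two</text>"  # CR LF
old_mac = "<text>line one\rline two</text>"    # CR only
unix = "<text>line one\nline two</text>"       # LF only

texts = [ET.fromstring(doc).text for doc in (windows, old_mac, unix)]

for value in texts:
    print(repr(value))  # 'line one\nline two' each time
```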
