Science With the Virtual Observatory
For this to work you need the software listed on the
software page.
Specifically, we will be looking at an example XML file in
$NVOSS/java/dev/XMLparse.
This lesson provides a brief introduction to XML. We will
explore the components that make up an XML document as well as how to
form correct XML documents. Finally, we will examine
an example XML document. The student exercise involves
construction of a correct XML document.
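We shall look at the xml directive, elements, attributes, comments, entity references, namespaces, document descriptions (DTD and XML Schema), and well formedness.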
XML documents should start with the xml directive, formally known as
the XML declaration. This line states which version of XML is used in
the remainder of the document. The first line in Listing 1 is the xml directive.
XML, like HTML, is constructed using tags which define the elements of the document. Each portion of the document is set apart by a beginning tag and an ending tag. Listing 1 shows a short XML document with 5 separate elements. An element can have element content (only child elements), mixed content (text and child elements), simple content (only text), or empty content.
<?xml version="1.0"?>
<FAMILYTREE>
  <FAMILY> Krughoff Family
    <MOTHER> Noell </MOTHER>
    <FATHER> Tom </FATHER>
    <CHILDREN progeny="yes"></CHILDREN>
  </FAMILY>
</FAMILYTREE>
Listing 1
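To see the five elements and the four kinds of content concretely, here is a minimal sketch that parses Listing 1 with Python's standard xml.etree.ElementTree module. The helper name content_kind is our own and is not part of any library; it simply checks whether an element holds child elements, non-blank text, both, or neither.

```python
import xml.etree.ElementTree as ET

# Listing 1, reproduced as a string.
LISTING_1 = """<?xml version="1.0"?>
<FAMILYTREE>
<FAMILY> Krughoff Family
<MOTHER> Noell </MOTHER>
<FATHER> Tom </FATHER>
<CHILDREN progeny="yes"></CHILDREN>
</FAMILY>
</FAMILYTREE>"""

root = ET.fromstring(LISTING_1)

# iter() visits the root and every descendant in document order.
tags = [element.tag for element in root.iter()]
print(tags)  # ['FAMILYTREE', 'FAMILY', 'MOTHER', 'FATHER', 'CHILDREN']


def content_kind(element):
    """Classify an element's content (hypothetical helper for this lesson)."""
    has_children = len(element) > 0
    # Text can sit before the first child (.text) or after a child (.tail).
    has_text = bool((element.text or "").strip()) or any(
        (child.tail or "").strip() for child in element
    )
    if has_children and has_text:
        return "mixed"
    if has_children:
        return "element"
    if has_text:
        return "simple"
    return "empty"


for element in root.iter():
    print(element.tag, content_kind(element))
```

Whitespace between tags is ignored by the helper, which is why FAMILYTREE counts as element content even though line breaks separate its children.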
XML elements can also have attributes associated with them. Attributes are name-value pairs written inside the element's beginning tag rather than in its content. In Listing 1 the <CHILDREN> element has an attribute called 'progeny' with the value 'yes'. Each element may have multiple attributes, but a given attribute name may appear only once per element.
Attributes are frequently used in HTML. In XML, however, it is a good idea to avoid overusing attributes. A rule of thumb is: If the attribute contains data, use a child element instead. Attributes in XML can be useful for storing identifier numbers in documents where many instances of the same element occur.
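The following sketch shows how attributes look to a parser, again using Python's xml.etree.ElementTree. The count and id attributes and the SURNAME element are hypothetical additions for illustration; only progeny comes from Listing 1.

```python
import xml.etree.ElementTree as ET

# An element with two attributes; 'count' is a made-up example attribute.
children = ET.fromstring('<CHILDREN progeny="yes" count="3"></CHILDREN>')

print(children.attrib)                 # all attributes as a dictionary
print(children.get("progeny"))         # value of a single attribute
print(children.get("missing", "n/a"))  # default when the attribute is absent

# Rule of thumb from the text: the identifier is an attribute,
# the data (here a hypothetical SURNAME) is a child element.
family = ET.fromstring('<FAMILY id="1"><SURNAME>Krughoff</SURNAME></FAMILY>')
print(family.get("id"), family.findtext("SURNAME"))
```

Note that attribute values always arrive as strings, so the 3 in count would need an explicit conversion before being used as a number.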
Comments in XML documents are written the same way as in HTML documents: <!-- begins the comment and --> ends the comment.
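As a small illustration, the sketch below parses a fragment containing a comment. With the default settings of Python's xml.etree.ElementTree the comment is dropped from the resulting tree, so only the real elements remain. The fragment is our own variation on Listing 1.

```python
import xml.etree.ElementTree as ET

doc = """<FAMILY>
<!-- The parents come first -->
<MOTHER> Noell </MOTHER>
<FATHER> Tom </FATHER>
</FAMILY>"""

root = ET.fromstring(doc)

# The comment is not part of the parsed tree under the default parser.
child_tags = [child.tag for child in root]
print(child_tags)  # ['MOTHER', 'FATHER']
```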
Similar to HTML, XML treats some characters as special. The characters <, >, &, ", and ' are some of the characters with special meaning; in particular, < and & cannot appear literally in the text of elements. These must be translated to the appropriate entity reference. For example: & must be written as &amp; and < must be written as &lt;.
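Translating special characters by hand is error prone, so here is a brief sketch of doing it with the escape function from Python's standard xml.sax.saxutils module, which by default replaces &, <, and >. The sample text is made up for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "Tom & Noell <Krughoff>"

# escape() replaces &, < and > with their entity references.
escaped = escape(raw)
print(escaped)  # Tom &amp; Noell &lt;Krughoff&gt;

# The parser translates the entity references back to plain characters.
root = ET.fromstring(f"<FAMILY>{escaped}</FAMILY>")
print(root.text)  # Tom & Noell <Krughoff>

# Leaving the ampersand unescaped makes the document not well formed.
try:
    ET.fromstring("<FAMILY>Tom & Noell</FAMILY>")
    unescaped_ok = True
except ET.ParseError:
    unescaped_ok = False
print(unescaped_ok)  # False
```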
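In order to modularize and categorize XML documents, XML documents may contain references to a namespace. A namespace is identified by a URI and qualifies the element and attribute names in the document, so that names drawn from different vocabularies do not collide. A technical description of namespaces can be found here.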
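The sketch below shows what a namespace does to element names in practice, using Python's xml.etree.ElementTree. The URI http://example.org/family and the prefix fam are hypothetical; any URI you control would serve the same purpose.

```python
import xml.etree.ElementTree as ET

FAMILY_NS = "http://example.org/family"  # hypothetical namespace URI

# xmlns="..." declares a default namespace for this element and its children.
doc = f"""<FAMILYTREE xmlns="{FAMILY_NS}">
<FAMILY>
<MOTHER> Noell </MOTHER>
</FAMILY>
</FAMILYTREE>"""

root = ET.fromstring(doc)

# ElementTree reports qualified names as {namespace-uri}localname.
print(root.tag)  # {http://example.org/family}FAMILYTREE

# Searches must use the namespace too; 'fam' is just a local prefix.
prefixes = {"fam": FAMILY_NS}
mother = root.find("fam:FAMILY/fam:MOTHER", prefixes)
print(mother.text.strip())  # Noell

# An unqualified search does not match the namespaced element.
print(root.find("FAMILY"))  # None
```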
Just as XML documents may be, but do not have to be, associated with a namespace, they may also have an associated document which describes the acceptable blocks within the document. There are two options for document description: the Document Type Definition (DTD) and XML Schema.
Both the DTD and XML Schema describe the form that a valid XML document should take. For example, they indicate which elements are valid children and which element is the root node, and XML Schema can even specify the datatype associated with an element. More discussion of Schema and DTDs will be carried out in the next session.
Unlike HTML, XML must be strictly well formed. As an example, most HTML parsers will tolerate ending tags that are left off of one-line elements. This is not true of XML. All beginning tags must have an associated ending tag. Several of the most important aspects of well formedness are: every beginning tag has a matching ending tag, tag names are case sensitive, elements must be nested without overlapping, and the document has a single root node.
Listing 2 shows a few examples of common gotchas associated with XML well formedness.
Case sensitive:
<TAG> This is incorrect </tag>
<TAG> This is correct </TAG>
Overlapping tags:
<TAG1>
  <TAG2>
    This is incorrect
  </TAG1>
</TAG2>
<TAG1>
  <TAG2>
    This is correct
  </TAG2>
</TAG1>
Single root node:
Incorrect:
<?xml version="1.0"?>
<FAMILY> Krughoff Family
  <MOTHER> Noell </MOTHER>
  <FATHER> Tom </FATHER>
  <CHILDREN progeny="yes"></CHILDREN>
</FAMILY>
<FAMILY> Worland Family
  <MOTHER> Wilhelmina </MOTHER>
  <FATHER> Vincent </FATHER>
  <CHILDREN progeny="yes"></CHILDREN>
</FAMILY>
Correct:
<?xml version="1.0"?>
<FAMILYTREE>
  <FAMILY> Krughoff Family
    <MOTHER> Noell </MOTHER>
    <FATHER> Tom </FATHER>
    <CHILDREN progeny="yes"></CHILDREN>
  </FAMILY>
  <FAMILY> Worland Family
    <MOTHER> Wilhelmina </MOTHER>
    <FATHER> Vincent </FATHER>
    <CHILDREN progeny="yes"></CHILDREN>
  </FAMILY>
</FAMILYTREE>
Listing 2
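The gotchas in Listing 2 can be checked mechanically. The sketch below wraps Python's xml.etree.ElementTree parser in a small function, is_well_formed (our own name), which reports whether a string parses. The test strings are condensed versions of the Listing 2 examples.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Condensed versions of the examples in Listing 2.
examples = {
    "case mismatch": "<TAG> This is incorrect </tag>",
    "case match": "<TAG> This is correct </TAG>",
    "overlapping": "<TAG1><TAG2> This is incorrect </TAG1></TAG2>",
    "nested": "<TAG1><TAG2> This is correct </TAG2></TAG1>",
    "two roots": "<FAMILY> Krughoff Family </FAMILY>"
                 "<FAMILY> Worland Family </FAMILY>",
    "single root": "<FAMILYTREE><FAMILY> Krughoff Family </FAMILY>"
                   "<FAMILY> Worland Family </FAMILY></FAMILYTREE>",
}

results = {name: is_well_formed(text) for name, text in examples.items()}
for name, ok in results.items():
    print(f"{name}: {'well formed' if ok else 'NOT well formed'}")
```

A check like this only tests well formedness. It says nothing about whether the document is valid against a DTD or XML Schema.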
Well formedness has the primary benefit of making XML easily human readable and easy to parse by machine.
There are no predefined elements in XML. This allows the user to define all the elements in use. It also makes it easy to extend XML documents to handle complex datatypes as they come into use.
XML is inherently hierarchical. Each element is either a parent or a child element or both. The only element which is only a parent element is the root element. The hierarchical nature of XML makes it directly applicable to hierarchical datatypes like trees or tables.
XML is a plain text format. Thus, by nature, XML is human readable, mailable, and easily editable with any text editor.
<TREE>
  <FAMILY>
    <MOTHER> Billy </MOTHER>
    <FATHER> Vincent </FATHER>
    <CHILDREN>
      <SON> Peter </SON>
      <DAUGHTER> Sue </DAUGHTER>
      <FAMILY>
        <MOTHER progeny="true"> Noell </MOTHER>
        <FATHER progeny="false"> Tom </FATHER>
        <CHILDREN>
          <SON> Simon </SON>
          <DAUGHTER> Laura </DAUGHTER>
          <SON> Stephen </SON>
          <FAMILY>
            <MOTHER progeny="true"> Emily </MOTHER>
            <FATHER progeny="false"> Jarrod </FATHER>
            <CHILDREN>
              <SON> Henry </SON>
            </CHILDREN>
          </FAMILY>
        </CHILDREN>
      </FAMILY>
    </CHILDREN>
  </FAMILY>
</TREE>
Listing 3
We will use the family tree to exemplify the hierarchy inherent in XML. Listing 3 shows an excerpt from an XML document describing my immediate family. Figure 1 shows a tree representation of the same XML document.

Figure 1
The family tree is a good metaphor for the XML hierarchy, partly because of similar terminology. Nodes closer to the root node are referred to as parent nodes to those directly below them. The lower nodes are known as child or progeny nodes. Nodes with the same parent node are siblings.
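The parent, child, and sibling relationships can be explored in code. The sketch below uses Python's xml.etree.ElementTree on an abbreviated version of Listing 3 (only the outermost family, with the nested families left out to keep it short). The names outline and parent_of are our own.

```python
import xml.etree.ElementTree as ET

# Abbreviated version of Listing 3: the outermost family only.
TREE = """<TREE>
<FAMILY>
<MOTHER> Billy </MOTHER>
<FATHER> Vincent </FATHER>
<CHILDREN>
<SON> Peter </SON>
<DAUGHTER> Sue </DAUGHTER>
</CHILDREN>
</FAMILY>
</TREE>"""

root = ET.fromstring(TREE)


def outline(element, depth=0):
    """Return one indented line per element, in document order."""
    label = (element.text or "").strip()
    line = "  " * depth + element.tag + (": " + label if label else "")
    lines = [line]
    for child in element:  # iterating an element yields its child nodes
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(root)))

# Elements do not store a link to their parent, so build a lookup table.
parent_of = {child: parent for parent in root.iter() for child in parent}

son = root.find(".//SON")
parent = parent_of[son]
siblings = [node.tag for node in parent if node is not son]
print(parent.tag, siblings)  # CHILDREN ['DAUGHTER']
```

The indentation printed by outline mirrors the depth of each node below the root, which is the same structure a tree diagram of the document would show.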
A more complete version of my family, with additional tags and attributes, is available here. You may use it as a starting point for completing the student exercise.
Write an XML description of your own family tree. You do not have
to use the same hierarchy that I use in the example. Feel free to use
the extensibility of XML to create a unique set of elements.
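If you would rather generate the document than type it, one possible starting point is sketched below. It builds a small tree with Python's xml.etree.ElementTree and then parses the result again as a well formedness check. The id attribute and the NOTE element are hypothetical examples of user-defined markup; replace the names and text with your own.

```python
import xml.etree.ElementTree as ET

# Build the hierarchy from the root down.
tree = ET.Element("FAMILYTREE")
family = ET.SubElement(tree, "FAMILY", {"id": "1"})  # 'id' is our own attribute
ET.SubElement(family, "MOTHER").text = "Noell"
ET.SubElement(family, "FATHER").text = "Tom"
children = ET.SubElement(family, "CHILDREN")
ET.SubElement(children, "SON").text = "Simon"
ET.SubElement(children, "DAUGHTER").text = "Laura"

# XML has no predefined elements, so any new element name is allowed.
ET.SubElement(family, "NOTE").text = "placeholder text"

# Serialize to a string and parse it again to confirm it is well formed.
xml_text = ET.tostring(tree, encoding="unicode")
print(xml_text)
reparsed = ET.fromstring(xml_text)
print([child.tag for child in reparsed.find("FAMILY")])
```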
The NVO Summer School is made possible through the support of the National Science Foundation and the National Aeronautics and Space Administration.
