Science With the Virtual Observatory
For this to work you need the software listed on the
software page.
Specifically, we will be looking at an example XML file in
$NVOSS/java/dev/XMLparse (%NVOSS%\java\dev\XMLparse).
We shall look at parsing XML with the DOM and SAX methods, and at reading data from a URL with Java's URL class.
Parsing XML involves interpreting the XML tags in the document in question. Since XML must be well formed (see notes from XML Introduction), the structure of any given XML document adheres to very specific rules. Because XML is not free-form, any well-formed XML document can be parsed without prior knowledge of the datatypes represented by its elements. Two main methods for parsing XML documents have become popular: the Document Object Model (DOM) and the Simple API for XML (SAX).
The DOM method of XML parsing builds a tree representation of the XML document in memory. This allows non-sequential access to the document nodes. Methods in the DOM API provide the means for traversing the tree and retrieving information about each node. When the document must be modified or accessed in a non-sequential way, this is the method to use. The downside is that the entire document must be held in memory, which can be prohibitive if the document is very large.
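The following is a minimal sketch of the two DOM strengths just described: jumping directly to a node without reading the document in order, and modifying the in-memory tree. It assumes the JAXP API that ships with the JDK rather than calling Xerces classes directly, so it runs without extra jars; the class name DomEditSketch and the sample <table>/<field> document are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class DomEditSketch {
    /** Parses xml into a tree, adds unit="deg" to the second <field>, and serializes it back. */
    public static String addUnitAttribute(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        // Non-sequential access: go straight to the second <field> element.
        Element second = (Element) doc.getElementsByTagName("field").item(1);
        // Modification: the whole tree is in memory, so nodes can be changed in place.
        second.setAttribute("unit", "deg");

        // Write the modified tree back out as text.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<table><field name=\"ra\"/><field name=\"dec\"/></table>";
        System.out.println(addUnitAttribute(xml));
    }
}
```

A SAX program could not make this edit as easily: by the time it reached the second field, the first would already be gone, and there is no tree to modify.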
Unlike DOM, SAX is event based. The parser reads the document sequentially and invokes callback methods as it encounters events such as the start and end of each element. Since the document need not be held in memory, this method is most appropriate for large documents and for applications where the document does not need to be modified.
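To make the event-based model concrete, here is a minimal sketch that simply logs each callback in the order the parser fires it. It assumes the JDK's built-in JAXP SAX parser; SaxEventLog is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxEventLog extends DefaultHandler {
    private final List<String> events = new ArrayList<>();

    // Called when the parser reaches an opening tag.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("start " + qName);
    }

    // Called when the parser reaches a closing tag.
    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end " + qName);
    }

    // Called for character data; whitespace-only runs are skipped here.
    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("text " + text);
        }
    }

    /** Parses xml once, front to back, and returns the callbacks in the order they fired. */
    public static List<String> log(String xml) throws Exception {
        SaxEventLog handler = new SaxEventLog();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        // prints [start a, start b, text hi, end b, end a]
        System.out.println(log("<a><b>hi</b></a>"));
    }
}
```

Note that a parser may deliver one run of text through several characters() calls; this tiny example does not hit that case, but real handlers usually accumulate text until endElement.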
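Use DOM when: the document must be modified, or its nodes must be accessed in a non-sequential order, and it is small enough to hold in memory.
Use SAX when: the document only needs to be read once, from start to finish, or is too large to hold comfortably in memory.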
The program PrintUsingDOM.java shows how to use the Xerces DOM
parser to digest an XML document. The program PrintUsingSAX.java shows
how to use the Xerces SAX parser to digest an XML document.
Listing 1 shows walk(), the method in PrintUsingDOM.java that prints the
parsed XML document. This method uses Java recursion to walk the tree
representation of the document in a depth-first, pre-order
traversal: each node is handled before any of its children.
The switch statement in Listing 1 has a case for each type of
node that may be encountered, but the real heavy lifting is done by the
recursive call to walk() in the loop near the bottom of the listing,
which visits each child of the current node in turn.
//needs: import org.w3c.dom.Attr; import org.w3c.dom.NamedNodeMap; import org.w3c.dom.Node;
//walk the DOM tree and print as you go
private void walk(Node node)
{
    int type = node.getNodeType();
    switch(type)
    {
        case Node.DOCUMENT_NODE:
        {
            System.out.println("<?xml version=\"1.0\" encoding=\""+
                               "UTF-8" + "\"?>");
            break;
        }//end of document
        case Node.ELEMENT_NODE:
        {
            System.out.print('<' + node.getNodeName() );
            NamedNodeMap nnm = node.getAttributes();
            if(nnm != null )
            {
                int len = nnm.getLength() ;
                Attr attr;
                for ( int i = 0; i < len; i++ )
                {
                    attr = (Attr)nnm.item(i);
                    System.out.print(' '
                                     + attr.getNodeName()
                                     + "=\""
                                     + attr.getNodeValue()
                                     + '"' );
                }
            }
            System.out.print('>');
            break;
        }//end of element
        case Node.ENTITY_REFERENCE_NODE:
        {
            System.out.print('&' + node.getNodeName() + ';' );
            //return rather than break: the children hold the entity's
            //expanded text, which would otherwise be printed a second time
            return;
        }//end of entity
        case Node.CDATA_SECTION_NODE:
        {
            System.out.print( "<![CDATA["
                              + node.getNodeValue()
                              + "]]>" );
            break;
        }
        case Node.TEXT_NODE:
        {
            System.out.print(node.getNodeValue());
            break;
        }
        case Node.PROCESSING_INSTRUCTION_NODE:
        {
            System.out.print("<?"
                             + node.getNodeName() ) ;
            String data = node.getNodeValue();
            if ( data != null && data.length() > 0 ) {
                System.out.print(' ');
                System.out.print(data);
            }
            System.out.println("?>");
            break;
        }
    }//end of switch
    //recurse: visit each child in document order
    for(Node child = node.getFirstChild(); child != null; child = child.getNextSibling())
    {
        walk(child);
    }
    //without this we miss the ending tags
    if ( type == Node.ELEMENT_NODE )
    {
        System.out.print("</" + node.getNodeName() + ">");
    }
}//end of walk
Listing 1
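To see the depth-first pre-order sequence on its own, the sketch below keeps only the recursion from Listing 1 and records element names instead of printing markup. It assumes the JDK's built-in DOM parser; PreOrderDemo and the sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class PreOrderDemo {
    // Same recursion shape as walk(): handle the node, then each child in turn.
    static void visit(Node node, List<String> order) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            order.add(node.getNodeName());  // pre-order: a parent is recorded before its children
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            visit(child, order);
        }
    }

    /** Parses xml and returns its element names in pre-order. */
    public static List<String> elementOrder(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        List<String> order = new ArrayList<>();
        visit(doc, order);
        return order;
    }

    public static void main(String[] args) throws Exception {
        // <a> contains <b> (which contains <c>) followed by <d>; prints [a, b, c, d]
        System.out.println(elementOrder("<a><b><c/></b><d/></a>"));
    }
}
```

Because <c> is reached before <d>, the walk goes as deep as possible down one branch before moving to the next sibling, which is what "depth-first" means here.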
Unlike DOM, which uses tree traversal to navigate the XML document,
SAX is event based and reads the document sequentially. This
means that the program must supply a content handler for the document. The
content handler contains callback methods which the parser calls as it
encounters parts of the document, such as the start and end of each
element and the character data between tags. In the fragment below, the
content handler is set to this, which simply states that the
callback methods reside in the same class as the constructor; the error
handler is set the same way. See the Java source code for examples of how
to define the callback methods.
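//needs: import org.xml.sax.XMLReader; import org.apache.xerces.parsers.SAXParser;
//PrintUsingSAX must implement ContentHandler and ErrorHandler for the two set calls below
public PrintUsingSAX(String fileName)
{
    try
    {
        XMLReader myParser = new SAXParser();  //the Xerces SAX parser
        myParser.setContentHandler(this);      //content callbacks live in this class
        myParser.setErrorHandler(this);        //so do the error callbacks
        myParser.parse(fileName);              //fileName is treated as a system ID (URI)
    }
    catch(Exception e)
    {
        System.out.println("Exc " + e);
    }
}//end of constructor
Listing 2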
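The sketch below reproduces the pattern of Listing 2, with the class acting as its own content handler and error handler, but it obtains the XMLReader through JAXP instead of instantiating the Xerces SAXParser class, and it parses a string rather than a file. Those substitutions, and the name SelfHandlerSketch, are assumptions for illustration. Extending DefaultHandler supplies empty versions of every callback, so only the ones of interest need overriding.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SelfHandlerSketch extends DefaultHandler {
    private int elements = 0;       // updated by the content callback
    private String problem = null;  // set by the error callback (or the catch below)

    public SelfHandlerSketch(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(this);  // callbacks live in this class
            reader.setErrorHandler(this);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            if (problem == null) {
                problem = e.toString();  // e.g. a configuration or I/O failure
            }
        }
    }

    // Content callback: count every opening tag.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        elements++;
    }

    // Error callback: record where the document stopped being well formed.
    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        problem = "line " + e.getLineNumber() + ": " + e.getMessage();
        throw e;  // rethrow so the parse stops at the first fatal error
    }

    public int elementCount() { return elements; }

    public String problem() { return problem; }

    public static void main(String[] args) {
        System.out.println(new SelfHandlerSketch("<a><b/><c/></a>").elementCount());
        System.out.println(new SelfHandlerSketch("<a><b></a>").problem());
    }
}
```

Registering this as the error handler is what lets fatalError() report the line number; the exception still reaches the constructor's catch block afterwards.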
To run the parsers, follow this recipe.
Windows:
>cd %NVOSS2005%
>bin\setup
>cd %NVOSS2005%\java\dev\XMLparse
>ant compile
>java PrintUsingDOM simongen.xml
>java PrintUsingSAX simongen.xml
Unix/Mac:
>cd $NVOSS2005
>source bin/setup.csh
>cd java/dev/XMLparse
>ant compile
>java PrintUsingDOM simongen.xml
>java PrintUsingSAX simongen.xml
Listing 3
The URL class (java.net.URL) is the main Java class used for accessing data available through URLs. This applies to REST-style web services as well. Let's examine how the URL class is used to access a resource. Listing 4 is the relevant code from the URLreader.java file.
Define the default URL string to access, and take RA and Dec from the command line (falling back to default values).
//needs: java.net.URL, java.io.File, java.io.FileOutputStream, javax.activation.DataHandler
String Url = "http://casjobs.sdss.org/ImgCutoutDR4/getjpeg.aspx/"; //default service
double ra,dec;
int ind=0;
if (args.length >= 2 ) {
    ra = Double.parseDouble(args[ind++]);
    dec = Double.parseDouble(args[ind++]);
}
else {
    ra = 323.414;
    dec = 10.5083;
}
Put the full URL together and instantiate the URL class.
Url = Url+"?ra="+ra+"&dec="+dec+"&height=1024&width=1024&scale=0.5";
URL imgurl = new URL(Url);
Open a file to write to and open a stream to the URL.
DataHandler dh = new DataHandler(imgurl);
FileOutputStream fw = new FileOutputStream("img.jpeg");
Save the content of the URL to a file on the local filesystem.
dh.writeTo(fw);
fw.close();
File f = new File(dh.getName());
f.delete();
Listing 4
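DataHandler belongs to the JavaBeans Activation Framework (javax.activation), which was removed from the JDK in Java 11, so Listing 4 needs an extra jar on recent JDKs. A minimal alternative, assuming only the standard library, opens the stream with URL.openStream() and copies it to disk with Files.copy. The class and method names are hypothetical; the base URL and query parameters come from Listing 4, and that service may no longer be online.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class UrlDownloadSketch {
    /** Builds the same cutout query string as Listing 4. */
    public static String cutoutUrl(String base, double ra, double dec) {
        return base + "?ra=" + ra + "&dec=" + dec + "&height=1024&width=1024&scale=0.5";
    }

    /** Streams whatever the URL returns into target, replacing any existing file; returns bytes copied. */
    public static long download(URL url, Path target) throws IOException {
        try (InputStream in = url.openStream()) {  // stream is closed even if the copy fails
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        double ra = args.length >= 2 ? Double.parseDouble(args[0]) : 323.414;
        double dec = args.length >= 2 ? Double.parseDouble(args[1]) : 10.5083;
        URL url = URI.create(cutoutUrl("http://casjobs.sdss.org/ImgCutoutDR4/getjpeg.aspx/", ra, dec)).toURL();
        long bytes = download(url, Paths.get("img.jpeg"));
        System.out.println("wrote " + bytes + " bytes to img.jpeg");
    }
}
```

Because download() accepts any URL, the same method also works for file: URLs, which is a convenient way to try it without a network connection.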
The URL class can point to any web resource with a valid URL. Thus,
cutout servers, cone search services, and even ordinary HTML web pages
can be accessed this way. To try this, redefine the string Url to point
to your favorite web page and change the output filename to test.html.
Compile and run URLreader, then open test.html in your browser.
OLD:
Url = Url+"?ra="+ra+"&dec="+dec+"&height=1024&width=1024&scale=0.5";
URL imgurl = new URL(Url);
NEW:
Url = Url+"?ra="+ra+"&dec="+dec+"&height=1024&width=1024&scale=0.5";
Url = "http://www.google.com";
URL imgurl = new URL(Url);
OLD:
FileOutputStream fw = new FileOutputStream("img.jpeg");
NEW:
FileOutputStream fw = new FileOutputStream("test.html");
>ant compile
>java URLreader
>mozilla test.html
Listing 5
Use one of the VO registries to find a cone search service, which returns a VOTable. Then use the URL class and either a DOM or a SAX parser to read the VOTable and write its contents to a file.
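One possible starting point for this exercise, sketched with DOM: the method below reads the first TABLE of a VOTable and writes its FIELD names and TABLEDATA rows as tab-separated text. It assumes unprefixed VOTable element names and the TABLEDATA serialization (not BINARY or FITS); the class name VOTableToTsv and the output file votable.tsv are hypothetical, and the cone search URL is whichever one you find in the registry.

```java
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class VOTableToTsv {
    /** First TABLE of a VOTable as tab-separated text: a header of FIELD names, then one line per TR. */
    public static String toTsv(InputStream votable) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(votable);
        Element table = (Element) doc.getElementsByTagName("TABLE").item(0);
        if (table == null) {
            // nothing to convert (an error response, for instance, may carry no TABLE)
            throw new IllegalArgumentException("no TABLE element in VOTable");
        }
        StringBuilder out = new StringBuilder();

        // Header line: the name attribute of each FIELD, in column order.
        NodeList fields = table.getElementsByTagName("FIELD");
        for (int i = 0; i < fields.getLength(); i++) {
            if (i > 0) out.append('\t');
            out.append(((Element) fields.item(i)).getAttribute("name"));
        }
        out.append('\n');

        // One output line per TR, one column per TD.
        NodeList rows = table.getElementsByTagName("TR");
        for (int r = 0; r < rows.getLength(); r++) {
            NodeList cells = ((Element) rows.item(r)).getElementsByTagName("TD");
            for (int c = 0; c < cells.getLength(); c++) {
                if (c > 0) out.append('\t');
                out.append(cells.item(c).getTextContent().trim());
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // args[0]: a cone search URL (including its query) that you found via a VO registry
        try (InputStream in = URI.create(args[0]).toURL().openStream()) {
            Files.write(Paths.get("votable.tsv"), toTsv(in).getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

A SAX version would avoid holding a large result in memory, at the cost of tracking which element the parser is currently inside.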
The NVO Summer School is made possible through the support of the National Science Foundation and the National Aeronautics and Space Administration.