Science With the Virtual Observatory
2005 Summer School

Introduction to Java Libraries for use with the VO

For this to work you need the software listed on the software page.
Specifically, we will be looking at an example XML file in $NVOSS/java/dev/XMLparse (%NVOSS%\java\dev\XMLparse).

We shall look at parsing XML and the URL class.

Parsing XML

Parsing XML involves interpreting the tags in the document in question. Since XML must be well formed (see notes from XML Introduction), the structure of any given XML document adheres to very specific rules. Because XML is not free-form, any well-formed XML document may be parsed without prior knowledge of the datatypes represented by its elements. Two main methods for parsing XML documents have become popular.

Document Object Model (DOM)

The DOM method of XML parsing builds a tree representation of the XML document in memory. This allows for non-sequential access to the document nodes. Methods in the XML parser provide the means for traversing the tree and retrieving information about each node. In situations where the document must be modified or accessed in a non-sequential way, this is the method to use. The downside is that the entire document must be held in memory, which can be prohibitive if the document is very large.
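
As a minimal sketch of the non-sequential access and in-place modification described above, the following parses a small document and jumps directly to its last element. It uses the JAXP DocumentBuilder API that ships with the JDK rather than calling Xerces classes directly, which is an assumption made to keep the example free of external dependencies. The class name DomSketch, the sample XML, and the element and attribute names are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of the summer school software.
class DomSketch {

    // Parse an XML string into an in-memory DOM tree.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // Go straight to the last <star> element and rename it in place.
    // Returns the name the element had before the change.
    static String renameLastStar(Document doc, String newName) {
        NodeList stars = doc.getElementsByTagName("star");
        Element last = (Element) stars.item(stars.getLength() - 1);
        String oldName = last.getAttribute("name");
        last.setAttribute("name", newName);
        return oldName;
    }

    public static void main(String[] args) {
        String xml =
            "<catalog><star name=\"Vega\"/><star name=\"Deneb\"/></catalog>";
        Document doc = parse(xml);
        String old = renameLastStar(doc, "Altair");
        System.out.println("renamed " + old + " to Altair");
    }
}
```

Because the whole tree is in memory, the last element can be reached and changed without reading through the elements before it.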

Simple API for XML (SAX)

Unlike the DOM method of parsing, SAX is event based. The document is read sequentially, and the parser invokes callback methods as tags are encountered. Since the document need not be held in memory, this method is most appropriate for large documents and for applications where the document does not need to be modified.
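
The event-based model can be sketched with a handler that only counts start tags as they stream past, so no tree is ever built. This sketch uses the JDK's bundled JAXP SAXParserFactory instead of the Xerces SAXParser class, which is an assumption made for portability. The class name SaxCountSketch and the sample element names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; not part of the summer school software.
class SaxCountSketch {

    // Count how many elements with the given tag name occur in the document.
    static int countElements(String xml, final String name) {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            // Called by the parser each time a start tag is read.
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if (qName.equals(name)) {
                    count[0]++;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String xml = "<t><row/><row/><hdr/><row/></t>";
        System.out.println(countElements(xml, "row")); // prints 3
    }
}
```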

Overview

Use DOM when:

  • The XML document needs to be modified on the fly.

Use SAX when:

  • The XML document is too large to fit in memory.
  • Sequential access of the document is not detrimental.

XML Parser Examples

    The program PrintUsingDOM.java shows how to use the Xerces DOM parser to digest an XML document. The program PrintUsingSAX.java shows how to use the Xerces SAX parser to digest an XML document.

    DOM Example

    Listing 1 shows the main method responsible for the parsing of the XML document.  This method uses recursion to walk the tree representation of the document in a depth-first pre-order traversal.  A Java applet which shows the different types of tree traversal is available here.  The method in Listing 1 has a switch case for each type of node that may be encountered, but the real heavy lifting is done by the recursive call near the bottom of the listing (marked with the //recurse comment).

    //walk the DOM tree and print as you go
    private void walk(Node node)
    {
        int type = node.getNodeType();
        switch(type)
        {
            case Node.DOCUMENT_NODE:
            {
                System.out.println("<?xml version=\"1.0\" encoding=\""+
                                   "UTF-8" + "\"?>");
                break;
            }//end of document
            case Node.ELEMENT_NODE:
            {
                System.out.print('<' + node.getNodeName() );
                NamedNodeMap nnm = node.getAttributes();
                if(nnm != null )
                {
                    int len = nnm.getLength() ;
                    Attr attr;
                    for ( int i = 0; i < len; i++ )
                    {
                        attr = (Attr)nnm.item(i);
                        System.out.print(' '
                                         + attr.getNodeName()
                                         + "=\""
                                         + attr.getNodeValue()
                                         + '"' );
                    }
                }
                System.out.print('>');
                break;
            }//end of element
            case Node.ENTITY_REFERENCE_NODE:
            {
                System.out.print('&' + node.getNodeName() + ';' );
                break;
            }//end of entity
            case Node.CDATA_SECTION_NODE:
            {
                System.out.print( "<![CDATA["
                                  + node.getNodeValue()
                                  + "]]>" );
                break;
            }
            case Node.TEXT_NODE:
            {
                System.out.print(node.getNodeValue());
                break;
            }
            case Node.PROCESSING_INSTRUCTION_NODE:
            {
                System.out.print("<?"
                                 + node.getNodeName() ) ;
                String data = node.getNodeValue();
                if ( data != null && data.length() > 0 ) {
                    System.out.print(' ');
                    System.out.print(data);
                }
                System.out.println("?>");
                break;
            }
        }//end of switch

        //recurse
        for(Node child = node.getFirstChild(); child != null; child = child.getNextSibling())
        {
            walk(child);
        }

        //without this we miss the ending tags
        if ( type == Node.ELEMENT_NODE )
        {
            System.out.print("</" + node.getNodeName() + ">");
        }
    }//end of walk
    Listing 1
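
To make the depth-first pre-order visit order concrete, the sketch below records element names in the order that a recursive walk like the one in Listing 1 reaches them. It is a simplified illustration, not the PrintUsingDOM.java source. It uses the JDK's JAXP DocumentBuilder to obtain the tree, and the class name PreOrderSketch and the sample document are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of the summer school software.
class PreOrderSketch {

    // Visit the node first, then each child in document order (pre-order).
    static void collect(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add(node.getNodeName());
        }
        for (Node child = node.getFirstChild(); child != null;
             child = child.getNextSibling()) {
            collect(child, out);
        }
    }

    // Parse the XML string and return element names in visit order.
    static List<String> elementOrder(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            List<String> names = new ArrayList<>();
            collect(doc, names);
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // <c> is nested inside <b>, so it is visited before the sibling <d>.
        System.out.println(elementOrder("<a><b><c/></b><d/></a>"));
        // prints [a, b, c, d]
    }
}
```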

    SAX Example

    Unlike DOM, which uses tree traversal to navigate the XML document, SAX is event based and reads the document sequentially.  This means that there must be a content handler for the document.  The content handler contains callback methods which are executed when specific tags are reached.  In the fragment below, the content handler is set to this, which simply means that the callback methods reside in the same class as the constructor.  See the Java source code for examples of how to define the callback methods.

    public PrintUsingSAX(String fileName)
    {
        try
        {
            XMLReader myParser = new SAXParser();
            myParser.setContentHandler(this);
            myParser.setErrorHandler(this);
            myParser.parse(fileName);
        }
        catch(Exception e)
        {
            System.out.println("Exc " + e);
        }
    }//end of constructor

    Listing 2
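
The callback methods themselves might look something like the following sketch, in which a class extends DefaultHandler, registers itself as the content handler in its constructor (as in Listing 2), and rebuilds the document text from the events it receives. This is an illustration under stated assumptions and not the PrintUsingSAX.java source. It obtains its XMLReader through the JDK's JAXP factory rather than the Xerces SAXParser class, it does no escaping of special characters, and the class name SaxEchoSketch is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical class; it mirrors the shape of Listing 2 only.
class SaxEchoSketch extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();

    SaxEchoSketch(InputSource source) {
        try {
            XMLReader reader =
                SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(this); // callbacks live in this class
            reader.setErrorHandler(this);
            reader.parse(source);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    // Called for every start tag; attributes arrive with the event.
    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attrs) {
        out.append('<').append(qName);
        for (int i = 0; i < attrs.getLength(); i++) {
            out.append(' ').append(attrs.getQName(i))
               .append("=\"").append(attrs.getValue(i)).append('"');
        }
        out.append('>');
    }

    // Called for every end tag.
    @Override
    public void endElement(String uri, String localName, String qName) {
        out.append("</").append(qName).append('>');
    }

    // Called for text between tags, possibly in several chunks.
    @Override
    public void characters(char[] ch, int start, int length) {
        out.append(ch, start, length);
    }

    String getOutput() {
        return out.toString();
    }

    public static void main(String[] args) {
        InputSource src =
            new InputSource(new StringReader("<a x=\"1\"><b>hi</b></a>"));
        System.out.println(new SaxEchoSketch(src).getOutput());
        // prints <a x="1"><b>hi</b></a>
    }
}
```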

     Running the examples

    You may try running the parsers by following this recipe.

    Windows:
    >cd %NVOSS2005%
    >bin\setup
    >cd %NVOSS2005%\java\dev\XMLparse
    >ant compile
    >java PrintUsingDOM simongen.xml
    >java PrintUsingSAX simongen.xml

    Unix/Mac
    >cd $NVOSS2005
    >source bin/setup.csh
    >cd java/dev/XMLparse
    >ant compile
    >java PrintUsingDOM simongen.xml
    >java PrintUsingSAX simongen.xml
    Listing 3

    The URL Class

    The URL class is the main Java class used for accessing data available through URLs. This applies to REST-type web services as well. Let's examine how the URL class is used to access a resource. Listing 4 is the relevant code from the URLreader.java file.

    Define default URL string to access.
    String Url = "http://casjobs.sdss.org/ImgCutoutDR4/getjpeg.aspx/"; //default service
    double ra,dec;
    int ind=0;

    if (args.length >= 2 ) {
    ra = Double.parseDouble(args[ind++]);
    dec = Double.parseDouble(args[ind++]);
    }
    else {
    ra = 323.414;
    dec = 10.5083;
    }


    Put full URL together and instantiate the URL class.
    Url = Url+"?ra="+ra+"&dec="+dec+"&height=1024&width=1024&scale=0.5";
    URL imgurl = new URL(Url);

    Open a file to write to and open a stream to the URL.
    DataHandler dh = new DataHandler(imgurl);
    FileOutputStream fw = new FileOutputStream("img.jpeg");

    Save the content of the URL to a file on the local filesystem.
    dh.writeTo(fw);
    fw.close();
    File f = new File(dh.getName());
    f.delete();
    Listing 4
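
Listing 4 relies on DataHandler from the javax.activation package, which is no longer bundled with the JDK as of Java 11. As an alternative sketch that needs only the standard library, the content behind a URL can be streamed to a file with URL.openStream and Files.copy. This is a swapped-in technique rather than the approach used in URLreader.java, and the class name UrlSaveSketch and its command-line arguments are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical example class; not part of the summer school software.
class UrlSaveSketch {

    // Copy whatever the URL returns into the target file.
    // Returns the number of bytes written.
    static long save(URL source, Path target) {
        try (InputStream in = source.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: java UrlSaveSketch <url> <output-file>");
            return;
        }
        long n = save(new URL(args[0]), Paths.get(args[1]));
        System.out.println("wrote " + n + " bytes to " + args[1]);
    }
}
```

The same method works for any URL scheme the JDK supports, so it can be tried on a local file: URL before pointing it at a remote service.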

    The URL class can point to any web resource with a valid URL definition. Thus, cutout servers, cone searches, and even regular HTML web pages can be accessed via this method.  To illustrate this, redefine the string Url to point to your favorite web page and change the output filename to test.html.  Compile and run URLreader, then open test.html in your browser.

    OLD:
    Url = Url+"?ra="+ra+"&dec="+dec+"&height=1024&width=1024&scale=0.5";
    URL imgurl = new URL(Url);
    NEW:
    Url = Url+"?ra="+ra+"&dec="+dec+"&height=1024&width=1024&scale=0.5";
    Url = "http://www.google.com";
    URL imgurl = new URL(Url);

    OLD:
    FileOutputStream fw = new FileOutputStream("img.jpeg");
    NEW:
    FileOutputStream fw = new FileOutputStream("test.html");

    >ant compile
    >java URLreader
    >mozilla test.html
    Listing 5

    Student Exercise:

    Use one of the VO registries to find a cone search which returns a VOTable. Use the URL class and either a DOM or SAX parser to read the VOTable out to a file.

    Useful Links:

    XML Parser Tutorial
    Xerces Documentation
    URL Class Documentation

    The NVO Summer School is made possible through the support of the National Science Foundation and the National Aeronautics and Space Administration.