VO Client side Integration: Lessons Learned (C. Miller 08/10/05)
In this lesson, we discuss how astronomers can integrate VO services into their day-to-day research activities.
Contents:
VOTable Readers
  Common Types of XML Parsers
  readvot.pro
  Effective Translation of VOTable Data to other Data types
  Other Techniques for Working with VOTables
Utilizing REST-based Services
  Cone Searches and Catalogs
  SIAP Queries and Images
Utilizing SOAP-based Services
  Registry Calls
  SkyPortal Calls
  WESIX (the SExtractor/SkyPortal service)
  GALMORPH (the GIM2D/IRAF service)
Reading VOTables
The most common type of data output you will find in VO services is the
VOTable format. It is required for Cone Search and SIAP query
returns. Similarly, WESIX returns only a VOTable and SkyPortal can
return x,y,z. Thus, from whatever research environment you use,
the ability to read VOTables is very important. In most cases,
XML/VOTables can be re-formatted into a more "comfortable" format
(e.g., ASCII, FITS, etc.). But the importance of having a near seamless
connection between VOTable outputs and variables within your research
environment cannot be over-emphasized. Continually exiting your research
environment (e.g., Python, IDL, SM) in order to convert VOTables is
inefficient. It is far better to work with a converter or parser
inside that environment.
Common Types of XML Parsers
See Simon Krughoff's Lesson.
SAX (Simple API for XML)
Reads XML sequentially, reporting events as it goes. Good for reading large data files.
DOM (Document Object Model)
Tree-based parser. Allows XML to be modified and written.
IDL supports both DOM and SAX parsers. We give an example of the use of SAX and DOM parsers on a simple XML file.
Using the IDL native SAX library: an event-based parser. See rsi/idl_6.1/examples/data_access
First, create your parser (xml_to_array__define.pro), then read the XML file:
num_array.xml
<?xml version="1.0"?>
<!--
This file is used in the example of the xmlnumber parser object
class, described in the "Using the XML Parser Object Class"
chapter of _Building IDL Applications_.
-->
<!DOCTYPE array [
<!ELEMENT array (number+)>
<!ELEMENT number (#PCDATA)>
]>
<array>
<number>0</number>
<number>1</number>
<number>2</number>
</array>
xml_to_array__define.pro
; Called when the xml_to_array object is created.
FUNCTION xml_to_array::Init
  self.pArray = PTR_NEW(/ALLOCATE_HEAP)
  RETURN, self->IDLffXMLSAX::Init()
END
; Called when the xml_to_array object is destroyed.
PRO xml_to_array::Cleanup
  ; Release pointer
  IF (PTR_VALID(self.pArray)) THEN PTR_FREE, self.pArray
END
; Called when parsing of the document data begins.
; If the array pointed at by pArray contains data, reinitialize it.
PRO xml_to_array::StartDocument
  IF (N_ELEMENTS(*self.pArray) GT 0) THEN $
    void = TEMPORARY(*self.pArray)
END
; Called when parsing character data within an element.
; Adds data to the charBuffer field.
PRO xml_to_array::characters, data
  self.charBuffer = self.charBuffer + data
END
; Called when the parser encounters the start of an element.
PRO xml_to_array::startElement, URI, local, strName, attr, value
  CASE strName OF
    ; If the array pointed at by pArray contains data,
    ; reinitialize it.
    "array": BEGIN
      IF (N_ELEMENTS(*self.pArray) GT 0) THEN $
        void = TEMPORARY(*self.pArray) ; clear out memory
    END
    ; Reinitialize the charBuffer field.
    "number" : BEGIN
      self.charBuffer = ''
    END
  ENDCASE
END
; Called when the parser encounters the end of an element.
PRO xml_to_array::EndElement, URI, Local, strName
  CASE strName OF
    "array": ; Do nothing.
    "number": BEGIN
      ; Convert string data to an integer.
      idata = FIX(self.charBuffer)
      ; If the array pointed at by pArray has no elements,
      ; set it equal to the current data.
      IF (N_ELEMENTS(*self.pArray) EQ 0) THEN $
        *self.pArray = iData $
      ; If the array pointed at by pArray contains data
      ; already, extend the array.
      ELSE $
        *self.pArray = [*self.pArray,iData]
    END
  ENDCASE
END
; Returns the current array stored internally. If
; no data is available, returns -1.
FUNCTION xml_to_array::GetArray
  IF (N_ELEMENTS(*self.pArray) GT 0) THEN $
    RETURN, *self.pArray $
  ELSE RETURN , -1
END
; Object class definition method.
PRO xml_to_array__define
  void = {xml_to_array, $
          INHERITS IDLffXMLSAX, $
          charBuffer : '', $
          pArray : PTR_NEW() }
END
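For readers working in Python (one of the environments mentioned earlier), the same event-driven pattern can be sketched with the standard-library xml.sax module. This is only a rough analogue of the IDL class, not a translation of it: the class name XmlToArray and the helper parse_numbers are made up for this illustration, and the DOCTYPE block of num_array.xml is left out of the embedded test string for brevity.

```python
import xml.sax

# A copy of num_array.xml without its DOCTYPE block (omitted for brevity).
NUM_ARRAY_XML = b"""<?xml version="1.0"?>
<array>
  <number>0</number>
  <number>1</number>
  <number>2</number>
</array>
"""


class XmlToArray(xml.sax.ContentHandler):
    """Collect the integer held in each <number> element."""

    def __init__(self):
        super().__init__()
        self.char_buffer = ""
        self.values = []

    def startDocument(self):
        # Start every parse with an empty result list.
        self.values = []

    def startElement(self, name, attrs):
        # A new <number> begins: clear the character buffer.
        if name == "number":
            self.char_buffer = ""

    def characters(self, content):
        # May be called several times per element, so accumulate.
        self.char_buffer += content

    def endElement(self, name):
        # The element is complete: convert the buffered text to an integer.
        if name == "number":
            self.values.append(int(self.char_buffer))


def parse_numbers(xml_bytes):
    """Run the handler over an XML byte string and return the integers."""
    handler = XmlToArray()
    xml.sax.parseString(xml_bytes, handler)
    return handler.values


print(parse_numbers(NUM_ARRAY_XML))
```

As in the IDL version, nothing is held in memory except the current character buffer and the growing result, which is why this style suits large files.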
Let's try the SAX parser out:
IDL> xmlObj = OBJ_NEW('xml_to_array')
IDL> xmlFile = 'num_array.xml'
IDL> xmlObj->ParseFile, xmlFile
IDL> myArray = xmlObj->GetArray()
IDL> PRINT, myArray
Using the IDL native DOM library: Consider this sample.xml file and the code sample:
sample.xml
<?xml version="1.0" encoding="UTF-8"?>
<plugin type="tab-iframe">
<name>Weather.com Radar Image [DEN]</name>
<description>600 mile Doppler radar image for DEN</description>
<version>1.0</version>
<tab>
<icon>weather.gif</icon>
<tooltip>DEN Doppler radar image</tooltip>
</tab>
</plugin>
sample.pro
PRO sample_recurse, oNode, indent
  ; "Visit" the node by printing its name and value
  PRINT, indent GT 0 ? STRJOIN(REPLICATE(' ', indent)) : '', $
    oNode->GetNodeName(), ':', oNode->GetNodeValue()
  ; Visit children
  oSibling = oNode->GetFirstChild()
  WHILE OBJ_VALID(oSibling) DO BEGIN
    SAMPLE_RECURSE, oSibling, indent+3
    oSibling = oSibling->GetNextSibling()
  ENDWHILE
END
PRO sample
  oDoc = OBJ_NEW('IDLffXMLDOMDocument')
  oDoc->Load, FILENAME="sample.xml"
  SAMPLE_RECURSE, oDoc, 0
  OBJ_DESTROY, oDoc
END
Let's try it out:
IDL> sample
% Compiled module: SAMPLE.
% Loaded DLM: XML.
#document:
plugin:
#text:
name:
#text:Weather.com Radar Image [DEN]
#text:
description:
#text:600 mile Doppler radar image for DEN
#text:
version:
#text:1.0
#text:
tab:
#text:
icon:
#text:weather.gif
#text:
tooltip:
#text:DEN Doppler radar image
#text:
#text:
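The recursive walk above has a close counterpart in Python's standard-library xml.dom.minidom. The sketch below is an illustration under my own assumptions: the function name walk is made up, the embedded XML is a shortened version of sample.xml, and (unlike the IDL output) white-space-only text values are stripped before printing.

```python
from xml.dom import minidom

# A shortened stand-in for sample.xml.
SAMPLE_XML = """<plugin type="tab-iframe">
<name>Weather.com Radar Image [DEN]</name>
<version>1.0</version>
</plugin>"""


def walk(node, indent=0, lines=None):
    """Depth-first visit: record 'nodeName:nodeValue' for every node."""
    if lines is None:
        lines = []
    value = node.nodeValue if node.nodeValue is not None else ""
    lines.append(" " * indent + node.nodeName + ":" + value.strip())
    # Visit children the same way the IDL code does: first child, then siblings.
    child = node.firstChild
    while child is not None:
        walk(child, indent + 3, lines)
        child = child.nextSibling
    return lines


doc = minidom.parseString(SAMPLE_XML)
for line in walk(doc):
    print(line)
```

The empty "#text:" entries are the line breaks between elements, which the DOM keeps as text nodes; a VOTable reader built this way has to skip them.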
VOlib_0.1 contains readvot.pro, a DOM-based VOTable reader: READVOT.PRO
How might you call it? See the "Give it a try" session below.
How does it work? It's a DOM-based recursive procedure that parses the XML tree and builds an IDL structure.
It keeps track of the number of ROWs, COLUMNs, and RESOURCEs (tables), and of which RESOURCE is placed in the structure.
Each call to SAMPLE_RECURSE2 builds a new row (a single-row structure). This new row is concatenated to the old structure.
SAMPLE_RECURSE2, oDoc, 0,str, J,K, resource_num, resource_read
concat_structs, old_str, str, new_str
str=new_str
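The row-by-row idea can be sketched in Python with the standard-library ElementTree. This is not READVOT's algorithm, just a minimal illustration of the same bookkeeping (count the tables, pick one, build one record per TR). The function names and the tiny embedded VOTable are invented for the example, and every value is kept as a string.

```python
import xml.etree.ElementTree as ET

# A tiny made-up VOTable with two tables.
VOTABLE = """<VOTABLE version="1.1">
 <RESOURCE>
  <TABLE>
   <FIELD name="ra" datatype="double"/>
   <FIELD name="name" datatype="char" arraysize="*"/>
   <DATA><TABLEDATA>
    <TR><TD>34.25</TD><TD>src A</TD></TR>
    <TR><TD>34.10</TD><TD>src B</TD></TR>
   </TABLEDATA></DATA>
  </TABLE>
 </RESOURCE>
 <RESOURCE>
  <TABLE>
   <FIELD name="pno" datatype="int"/>
   <DATA><TABLEDATA><TR><TD>11237</TD></TR></TABLEDATA></DATA>
  </TABLE>
 </RESOURCE>
</VOTABLE>"""


def local(tag):
    """Drop any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def read_votable(text, table=1):
    """Return (number of tables, rows of table number `table` as dicts of strings)."""
    root = ET.fromstring(text)
    tables = [el for el in root.iter() if local(el.tag) == "TABLE"]
    chosen = tables[table - 1]  # tables are numbered from 1
    names = [el.get("name") or el.get("ID")
             for el in chosen.iter() if local(el.tag) == "FIELD"]
    rows = []
    for tr in chosen.iter():
        if local(tr.tag) != "TR":
            continue
        cells = [(td.text or "") for td in tr if local(td.tag) == "TD"]
        rows.append(dict(zip(names, cells)))  # one record per row
    return len(tables), rows


count, rows = read_votable(VOTABLE)
print(count, "tables found;", len(rows), "rows read from table 1")
```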
Lessons Learned:
Effective Translation of VO Data to other data types
VOTables contain information (FIELDs) which can completely describe the data. The data type and the [array]size are vital. Names are nice but not necessary; likewise for UCDs and units. Challenges occur when arraysize="*" or when the data type is not specified (which is fairly often) in the VO. Likewise, incorrect FIELD information (i.e., the wrong data type or arraysize) causes similar difficulties. To account for either of these, I find it easiest to read everything in as a string (or something non-type specific) and then parse it into typed values. However, parsing strings can itself be a challenge.
First: build the structure (this needs the structure size, tag names, data types, and array sizes).
READVOT looks for the following FIELD attributes: NAME, ID, UCD, ARRAYSIZE, DATATYPE
READVOT recognizes the following types: unsignedByte, long, int, short, double, float, char. These are translated to IDL datatypes.
NAMEs are parsed for unallowable characters (e.g., $, ', ", ?, etc.).
The structure is defined (as a single row). The structure tags (columns) are built up as the tree is traversed.
As the metadata information is read in, one hopes to have the correct datatypes and array definitions. But this is rare.
Instead, traverse the tree and try to determine the datatype and the array size from the XML alone.
The initial structure is built from the XML table metadata. But the first time data is read, each TD element is parsed, typed, and sized to determine whether it comes close to what is in the structure already. The challenge with this is datatyping: STRING arrays must be treated differently from INT/FLOAT (numeric) arrays. I parse on "," in STRINGs and on "," and " " in numbers.
By the end, one hopes to have a correctly built structure.
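The "read as a string, then type it" strategy can be sketched in Python as follows. This is my own illustration, not READVOT's logic: the names CONVERTERS, clean_name, and parse_cell are hypothetical, the datatype list is the one given above, and character data is returned unsplit (splitting STRING arrays on "," is left out).

```python
import re

# VOTable datatype -> Python converter (READVOT maps to IDL types instead).
CONVERTERS = {
    "unsignedByte": int, "short": int, "int": int, "long": int,
    "float": float, "double": float,
}


def clean_name(name):
    """Replace characters that are not letters, digits, or '_' with '_'."""
    return re.sub(r"\W", "_", name.strip())


def parse_cell(text, datatype=None):
    """Convert one TD string, guessing the type when the metadata is missing or wrong."""
    text = text.strip()
    if datatype == "char":
        return text
    declared = CONVERTERS.get(datatype)
    # Numeric arrays may be separated by commas and/or blanks.
    parts = [p for p in re.split(r"[,\s]+", text) if p]
    # Try the declared type first, then fall back to int, then float.
    for convert in ([declared] if declared else []) + [int, float]:
        try:
            values = [convert(p) for p in parts]
        except ValueError:
            continue
        if not values:
            break  # empty cell: nothing to convert
        return values[0] if len(values) == 1 else values
    return text  # nothing numeric worked: keep the string


print(parse_cell("34.252984", "double"), parse_cell("512 512", "int"))
```

Note the case parse_cell("3.5", "int"): the declared type is wrong, so the value falls through to float rather than raising an error.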
Give it a try:
IDL> readvot, 'nvoss2005/idl/VOlib_0.1/data/xmm1_votable.xml',str
3 tables found. Table number 1 was read in. You can specify another with table = x.
IDL> help, /str,str
** Structure <8347804>, 15 tags, length=92, data length=92, refs=1:
CREATED STRING '2005-09-02'
UNIQUE_ID LONG 38426
NAME STRING Array[1]
RA DOUBLE 34.252984
DEC DOUBLE -5.0352980
TIME DOUBLE 51756.944
EP_FLUX FLOAT 1.54315e-14
EP_FLUX_ERROR FLOAT 4.09244e-15
PN_FLUX FLOAT 1.54315e-14
PN_FLUX_ERROR FLOAT 4.09244e-15
M1_FLUX FLOAT -999.900
M1_FLUX_ERROR FLOAT -999.900
M2_FLUX FLOAT -999.900
M2_FLUX_ERROR FLOAT -999.900
SEARCH_OFFSET DOUBLE 2.1590000
IDL> readvot, 'xmm1_votable.xml',str, table=2
IDL> help, /str,str
** Structure <82adc3c>, 12 tags, length=108, data length=108, refs=1:
CREATED STRING '2005-09-02'
UNIQUE_ID LONG 191
NAME STRING Array[1]
RA DOUBLE 34.100000
DEC DOUBLE -5.0000000
PROPOSAL_TYPE STRING Array[1]
PRIORITY STRING Array[1]
PNO LONG 11237
EXPOSURE LONG 50000
PI_LNAME STRING Array[1]
PI_FNAME STRING Array[1]
SEARCH_OFFSET DOUBLE 9.5630000
Other Techniques for Working with VOTables
Java Libraries (as opposed to native XML parsers)
XML-tag removal
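As a rough idea of what "XML-tag removal" can look like, here is a Python sketch that pulls the TD values out of a VOTable with regular expressions only, with no XML parser involved. The function name td_values is made up, and the approach is deliberately naive: it assumes un-prefixed TR/TD tags and does not handle CDATA sections or XML entities.

```python
import re

TR_PATTERN = re.compile(r"<TR[^>]*>(.*?)</TR>", re.S | re.I)
# Matches <TD>value</TD> as well as an empty, self-closing <TD/>.
TD_PATTERN = re.compile(r"<TD(?:\s[^>]*?)?(?:/>|>(.*?)</TD>)", re.S | re.I)


def td_values(votable_text):
    """Return a list of rows, each a list of raw TD strings."""
    rows = []
    for tr_body in TR_PATTERN.findall(votable_text):
        rows.append([cell.strip() for cell in TD_PATTERN.findall(tr_body)])
    return rows


example = ("<TABLEDATA><TR><TD>1</TD><TD/><TD> a b </TD></TR>\n"
           "<TR><TD>2</TD><TD>x</TD><TD>c</TD></TR></TABLEDATA>")
print(td_values(example))
```

This loses all of the FIELD metadata, so it is only useful when you already know what the columns are.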
Using REST-based Webservices
Cone Searches and SIAP queries are REST web services (i.e., an HTTP GET with the parameters in the URL) and their results are simple XML files. These are easily obtained by spawning a "wget" or via port streaming (?). For both of these web services, the required inputs are POSITION, SIZE, and the URL of the cone search or SIAP query.
Points to note: spawning a wget requires write privileges in some working directory to save the XML file, and then a VOTable parser to read it in. Connecting to a port requires:
Cone Searches: CONECALL.PRO
How might you call it? See the example call below.
How does it work? It "spawns" a WGET call and downloads the CONE SEARCH XML file. The XML VOTable is then read in by the IDL VOTable reader to create a structure. It requires a position on the sky, a search radius, and a URL (obtained from a Registry call--see below).
We parse the URL for all "?" marks in order to place the RA=&DEC=&SR= parameters.
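One way to handle the different URL endings is sketched below in Python. This is not how CONECALL does it internally (I only describe that above); it is a minimal illustration, and the function name cone_url is made up.

```python
def cone_url(base, ra, dec, sr):
    """Append RA, DEC, and SR to a Cone Search base URL, whatever its ending."""
    if "?" not in base:
        sep = "?"   # no query string yet: start one
    elif base.endswith(("?", "&")):
        sep = ""    # already ends in a separator
    else:
        sep = "&"   # has a query string but no trailing separator
    return f"{base}{sep}RA={ra}&DEC={dec}&SR={sr}"


print(cone_url("http://heasarc.gsfc.nasa.gov/cgi-bin/vo/cone/coneGet.pl?table=xmmssc&",
               202.8, -1.72, 0.1))
```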
Lessons Learned
Some don't work:
http://irsa.ipac.caltech.edu/cgi-bin/Oasis/CatSearch/nph-catsearch?server=@rmt_grit&CAT=fp_2mass:fp_psc
Some end in "?", some contain a "?" and end in "&". Some have neither.
http://simbad.u-strasbg.fr/simbad-conesearch.pl
http://www2.keck.hawaii.edu/software/vos/tkrsConeSearch.php?
http://heasarc.gsfc.nasa.gov/cgi-bin/vo/cone/coneGet.pl?table=ascalss&
How do you meaningfully name the (VOTable) file if you store it?
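I do not have a definitive answer to the naming question, but one possible convention is to combine the service host, its query string, and the search parameters. The Python sketch below is only a suggestion; the function name votable_filename and the name layout are my own invention.

```python
import re
from urllib.parse import urlparse


def votable_filename(url, ra, dec, sr):
    """Build a file name from the service URL and the search parameters."""
    parts = urlparse(url)
    # Keep letters and digits from the host and query; everything else becomes '_'.
    tag = re.sub(r"[^A-Za-z0-9]+", "_", parts.netloc + "_" + parts.query).strip("_")
    return f"{tag}_ra{ra:.4f}_dec{dec:+.4f}_sr{sr:.3f}.xml"


print(votable_filename(
    "http://heasarc.gsfc.nasa.gov/cgi-bin/vo/cone/coneGet.pl?table=xmmssc&",
    202.8, -1.72, 0.1))
```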
Fill a structure with the data returned from some Cone Search call:
IDL> conecall, 202.8, -1.72, 0.1, str=str, $
url = "http://heasarc.gsfc.nasa.gov/cgi-bin/vo/cone/coneGet.pl?table=xmmssc&"
IDL> help, /str,str
** Structure <82c00e4>, 15 tags, length=92, data length=92, refs=1:
CREATED STRING '2005-09-01'
UNIQUE_ID LONG 37423
NAME STRING Array[1]
RA DOUBLE 202.80856
DEC DOUBLE -1.7227970
TIME DOUBLE 52119.536
EP_FLUX FLOAT -999.900
EP_FLUX_ERROR FLOAT -999.900
PN_FLUX FLOAT 1.59372e-14
PN_FLUX_ERROR FLOAT 3.78526e-15
M1_FLUX FLOAT -999.900
M1_FLUX_ERROR FLOAT -999.900
M2_FLUX FLOAT -999.900
M2_FLUX_ERROR FLOAT -999.900
SEARCH_OFFSET DOUBLE 0.54000000
SIAP Queries
Fill a structure with the VOTable data returned from a SIAP query. Download the XML file from within your research environment.
How might you call it? See the example calls below.
How does it work? It "spawns" a WGET call and downloads the SIAP query XML file. The XML VOTable is then read in by the IDL VOTable reader to create a structure. It requires a position on the sky, a search radius, and a URL (obtained from a Registry call--see below).
We parse the URL for all "?" marks in order to place the POS=&SIZE= parameters.
Once the SIAP call is made, the image URL is provided. The procedure SIAPCALL will automatically download the image for you (whether a FITS, gzipped FITS, JPG, or GIF). WEBGET is part of the ASTROLIB libraries and works only on FITS images (or text files), but not XML.
NOTE: MacOSX comes standard with WGET v1.6. SIAPCALL needs WGET version 1.8+ to work properly.
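Finding the image URL inside the SIAP response can be sketched in Python as below. The sketch locates the columns by UCD rather than by name; the UCD strings (VOX:Image_Format, VOX:Image_AccessReference) follow my reading of the SIAP 1.0 document and should be checked against the service you use. The function name image_links, the embedded response, and the example.org URLs are made up.

```python
import xml.etree.ElementTree as ET

# A made-up, minimal SIAP response.
SIAP_RESPONSE = """<VOTABLE><RESOURCE type="results"><TABLE>
<FIELD name="Survey" datatype="char" arraysize="*" ucd="VOX:Image_Title"/>
<FIELD name="Format" datatype="char" arraysize="*" ucd="VOX:Image_Format"/>
<FIELD name="URL" datatype="char" arraysize="*" ucd="VOX:Image_AccessReference"/>
<DATA><TABLEDATA>
<TR><TD>rass-cnt</TD><TD>image/fits</TD><TD>http://example.org/img?id=1</TD></TR>
<TR><TD>rass-cnt</TD><TD>image/jpeg</TD><TD>http://example.org/img?id=2</TD></TR>
</TABLEDATA></DATA></TABLE></RESOURCE></VOTABLE>"""


def local(tag):
    """Drop any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def image_links(votable_text):
    """Return (format, url) pairs from a single-table SIAP response."""
    root = ET.fromstring(votable_text)
    ucds = [(el.get("ucd") or "").lower()
            for el in root.iter() if local(el.tag) == "FIELD"]
    fmt_col = ucds.index("vox:image_format")
    url_col = ucds.index("vox:image_accessreference")
    links = []
    for tr in root.iter():
        if local(tr.tag) == "TR":
            cells = [(td.text or "").strip() for td in tr]
            links.append((cells[fmt_col], cells[url_col]))
    return links


for fmt, url in image_links(SIAP_RESPONSE):
    print(fmt, url)
```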
Let's try it out:
IDL> siapcall,180,1, 0.1, /metadata,str=str, $
url="http://skyview.gsfc.nasa.gov/cgi-bin/vo/sia.pl?survey=rass-cnt&"
IDL> help, /str,str
** Structure <82b9e84>, 12 tags, length=128, data length=128, refs=1:
CREATED STRING '2005-09-01'
SURVEY STRING Array[1]
RA DOUBLE 180.00000
DEC DOUBLE 1.0000000
DIM LONG 2
SIZE LONG64 Array[2]
SCALE DOUBLE Array[2]
FORMAT STRING Array[1]
PIXFLAGS STRING Array[1]
URL STRING Array[1]
NBYTES LONG 362880
LOGICALNAME STRING Array[1]
IDL> a = webget(str[2].url)
IDL> print, a.imageheader
IDL> tv, a.image
or
IDL> siapcall,180,1, 0.1, url="http://skyview.gsfc.nasa.gov/cgi-bin/vo/sia.pl?survey=rass-cnt&"
IDL> image = readfits('image0.fits',0,hdr)
IDL> hprint, hdr
IDL> tv, image
Lessons Learned:
XMM: Non-compliant.
Image formats vary: JPGs, GIFs, FITS, gzipped FITS.
Same issues as Cone Searches.
Of course, since the image location (URL) is specified in the SIAP XML file, spawning a wget also allows for the immediate download of the image file to some local working directory. In this case, there should be some meaningful root name specified by the user. Be wary of non-compliant SIAP servers (e.g., XMM) in which the images do not really exist at the specified URL; instead, the location starts a script that retrieves the image. Gzipped images need GUNZIPping on the fly.
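Because the format advertised by a server cannot always be trusted, one defensive option is to look at the first bytes of whatever was downloaded and gunzip it on the fly if needed. The Python sketch below illustrates that idea with the standard-library gzip module; the function names maybe_gunzip and guess_kind are made up, and only the leading bytes are checked.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of any gzip stream


def maybe_gunzip(payload):
    """Return decompressed bytes if payload is gzip data, else the payload unchanged."""
    if payload[:2] == GZIP_MAGIC:
        return gzip.decompress(payload)
    return payload


def guess_kind(payload):
    """Guess the image type from its leading bytes, after gunzipping if needed."""
    data = maybe_gunzip(payload)
    if data.startswith(b"SIMPLE  ="):
        return "fits"
    if data.startswith(b"\xff\xd8"):
        return "jpg"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    return "unknown"  # e.g. an HTML error page from a script


fits_start = b"SIMPLE  =                    T"
print(guess_kind(fits_start), guess_kind(gzip.compress(fits_start)))
```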
Using SOAP-based Webservices
Examples of current SOAP-based VO services include the JHU NVO Registry, SkyPortal, and WESIX. For legacy software that has a Java bridge (i.e., the ability to make calls to external Java classes), the easiest way to go is to use the service WSDL to generate the Java stubs (wsdl2java), compile them into classes (javac) locally, and then call those classes from within your legacy software. Different legacy packages have different ways of "importing" external classes. It is extremely helpful to have a minimal working client provided by the service. Additionally, some thought is required as to how the results will be useful to the user. In most cases, I have found that filling a structure with all of the VOTable (the returned XML file) does the trick.
Registry Calls
So how do I find the Cone Search servers and SIAP servers (etc.) to run CONECALL and SIAPCALL? Use CALL_REGISTRY. SIAP queries, Cone Searches, and Open Sky Query calls all require service (or resource) discovery. This is typically done through a Registry call.
How might you call it? See the example calls below.
How does it work? Use the Java classes RegistryLocator.class and the regService methods (getRegistrySoap, queryRegistry, etc.).
The QUERY is made to be as general as possible.
CONEs, SIAPs, and SKYNODEs are treated as special cases.
regService = OBJ_NEW('IDLJavaObject$ORG_US_VO_WWW_REGISTRY_LOCATOR', 'org.us_vo.www.RegistryLocator')
regInterface = regService->getRegistrySoap()
query = "Title like '%" + keyword + "%' or " + $
"ShortName like '%" + keyword + "%' or " + $
"Subject like '%" + keyword + "%' or " + $
"Type like '%" + keyword + "%' or " + $
"Description like '%" + keyword + "%' or " + $
"ServiceType like '%" + keyword + "%' or " + $
"Identifier like '%" + keyword + "%'"
IF keyword_set(cone) THEN query = "ServiceType like '%CONE%' and (" + query + ")"
IF keyword_set(skynode) THEN query = "ServiceType like '%SKYNODE%' and (" + query + ")"
IF keyword_set(siap) THEN query = "(ServiceType like '%SIAP%' or ServiceType like '%SIAP;%ARCHIVE%') and (" + query + ")"
callit = regInterface->queryRegistry(query)
results = callit->getSimpleResource()
Data is in the object "results", which is an array of SimpleResource objects. Create a structure and fill it:
str = create_struct('Title', ' ', 'URL', ' ', 'Type', ' ', 'ShortName', ' ', $
  'ID', ' ', 'Desc', ' ', 'ServiceType', ' ', 'Coverage', ' ', $
  'Subjects', ' ')
str = replicate(str, n_elements(results))
FOR I = 0, n_elements(results) -1 DO BEGIN
  str[I].Title = results[I]->getTitle()
  str[I].URL = results[I]->getServiceURL()
  str[I].Type = results[I]->getType()
  str[I].ShortName = results[I]->getShortName()
  str[I].ID = results[I]->getIdentifier()
  str[I].Desc = results[I]->getDescription()
  str[I].ServiceType = results[I]->getServiceType()
  str[I].Coverage = results[I]->getCoverageSpatial()
ENDFOR
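The query-building step is plain string manipulation, so it is easy to check outside of IDL. Here is a Python sketch of the same logic; the function name registry_query is made up, it accepts only one service constraint at a time (the IDL keywords above could in principle be combined), and it does no escaping, so a keyword containing a single quote would break the query.

```python
def registry_query(keyword, service=None):
    """Build the general keyword query, optionally restricted to one service type."""
    columns = ["Title", "ShortName", "Subject", "Type",
               "Description", "ServiceType", "Identifier"]
    query = " or ".join(f"{col} like '%{keyword}%'" for col in columns)
    constraints = {
        "cone": "ServiceType like '%CONE%'",
        "skynode": "ServiceType like '%SKYNODE%'",
        "siap": "(ServiceType like '%SIAP%' or ServiceType like '%SIAP;%ARCHIVE%')",
    }
    if service is not None:
        query = f"{constraints[service]} and ({query})"
    return query


print(registry_query("ROSAT", service="siap"))
```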
Let's give it a try:
IDL> call_registry, 'ROSAT',/SIAP, str=str
IDL> help, /str,str
** Structure <14a5116c>, 9 tags, length=108, data length=108, refs=1:
TITLE STRING '
ROSAT All Sky Survey
'
URL STRING '
http://skyview.gsfc.nasa.gov/cgi-bin/vo/sia.pl?survey=rass-cnt&
'
TYPE STRING ' Archive '
SHORTNAME STRING '
ROSAT/RASS
'
ID STRING 'ivo://nasa.heasarc/skyview/rass'
DESC STRING '
The ROSAT All-Sky X-ray Survey was obtained during 1990/1991 using the ROSAT Position Sens'...
SERVICETYPE STRING 'SIAP'
COVERAGE STRING '<region xsi:type="AllSky" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.ivo'...
SUBJECTS STRING 'surveys'
IDL> call_registry, 'XMM-Newton',/CONE, str=str
IDL> help, /str, str
IDL> help, /str,str[2]
** Structure <14a3666c>, 9 tags, length=108, data length=108, refs=2:
TITLE STRING 'XMM-Newton Serendipitous Source Catalog'
URL STRING '
http://heasarc.gsfc.nasa.gov/cgi-bin/vo/cone/coneGet.pl?table=xmmssc&
'
TYPE STRING 'Catalog'
SHORTNAME STRING 'XMM/SSC'
ID STRING 'ivo://nasa.heasarc/xmmssc'
DESC STRING '
The XMM-Newton Serendipitous Source Catalog (1XMM) is the first comprehensive catalog of serendipito'...
SERVICETYPE STRING 'CONE'
COVERAGE STRING '<region xsi:type="CircleRegion" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://w'...
SUBJECTS STRING 'Serendipitous sources'
IDL> print, str[2].url
http://heasarc.gsfc.nasa.gov/cgi-bin/vo/cone/coneGet.pl?table=xmmssc&
Lessons Learned:
White space can be a nuisance (but it is manageable). Hidden characters (like line breaks) also cause headaches.
A Registry-calling Java client was extremely useful for a non-Java expert to utilize the Java-bridge in IDL.
Compare the Java client below to the IDL client-code above.
From nvoss2005/java/dev/coneclient/FindConeSearch.java
{
// get a registry service object
Registry regService = new RegistryLocator();
// get an interface object that can accept our query.
RegistrySoap regInterface = regService.getRegistrySoap();
// Combine our query with a constraint to return only Cone Searches.
// The resulting query will look something like this:
//
// ServiceType like '%CONE%' and (Title like '%parallax%')
//
query = "ResourceType like '%CONE%' and (" + query + ")";
// Now submit the query
ArrayOfSimpleResource results = regInterface.queryRegistry(query);
return results.getSimpleResource();
}
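Returning to the white-space lesson: the stray blanks and line breaks seen in the registry strings are easy to normalize once the values are in hand. A minimal Python sketch follows; the function names tidy and tidy_record are made up, and the record is typed in by hand from the ROSAT output shown earlier.

```python
def tidy(value):
    """Collapse runs of white space (including line breaks) and trim both ends."""
    return " ".join(value.split())


def tidy_record(record):
    """Apply tidy() to every string field of a dict, leaving other values alone."""
    return {key: tidy(val) if isinstance(val, str) else val
            for key, val in record.items()}


raw = {
    "Title": "\n  ROSAT All Sky Survey\n ",
    "Type": " Archive ",
    "URL": "\nhttp://skyview.gsfc.nasa.gov/cgi-bin/vo/sia.pl?survey=rass-cnt&\n",
}
print(tidy_record(raw))
```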
SkyPortal Calls
The Open Sky Query (OSQ--via the SkyPortal) allows for SQL-like queries of more than 20 astronomical catalogs. OSQ will also cross-match multiple catalogs for you.
How might you call it? See the example call below.
How does it work? Use the JAVA classes SkyPortalLocator, XSD_VOTable, among others.
SKYPORTAL client needs a position, a search radius (in arcminutes) and possibly a SQL query:
qry = " SELECT o.objId, o.ra,o.dec, o.type,t.objId,t.j_m,o.z " + $
" FROM SDSSDR2:PhotoPrimary o, " + $
" TWOMASS:PhotoPrimary t WHERE XMATCH(o,t)<" + strtrim(string(chisqArr),2) + " " + " AND o.type = 3 " + $
" AND Region('Circle J2000 " + strtrim(string(raArr,format='(f10.3)'),2) + " " + strtrim(string(decArr, format='(f10.3)'),2) + $
" " + strtrim(string(srArr, format='(i2)'),2) + "') "
The SKYCLIENT then parses the VOTable internally and places the data into a structure.
resource = OBJ_NEW('IDLJavaObject$Static$FR_U_STRASBG_VIZIER_XML_VOTABLE_1_1_XSD_RESOURCE', 'fr.u_strasbg.vizier.xml.VOTable_1_1_xsd/RESOURCE')
vot = OBJ_NEW('IDLJavaObject$Static$FR_U_STRASBG_VIZIER_XML_VOTABLE_1_1_XSD_VOTABLE', 'fr.u_strasbg.vizier.xml.VOTable_1_1_xsd/VOTABLE')
loc = OBJ_NEW('IDLJavaObject$NET_IVOA_SKYPORTAL_SKYPORTALLOCATOR','net.ivoa.SkyPortal.SkyPortalLocator')
loc ->setSkyPortalSoapEndpointAddress,'http://openskyquery.net/Sky/SkyPortal/SkyPortal.asmx'
addr =loc -> getSkyPortalSoapAddress()
stub = loc -> getSKyPortalSoap()
;Make the query call, return a VOTable
vod = stub -> submitDistributedQuery(qry,"VOTABLE")
vot = vod ->getVOTABLE()
IDL> skyclient, 180,0,1,str = str; Defaults to whatever is uncommented in the MAKE_ADQL.PRO
% Compiled module: MAKE_ADQL.
SELECT o.objId, o.ra,o.dec, o.type,t.objId,t.j_m,o.z FROM SDSSDR2:PhotoPrimary o,
TWOMASS:PhotoPrimary t WHERE XMATCH(o,t)<2.50000 AND o.type = 3 AND Region('Circle J2000 180.000 0.000 1')
IDL> help, /str, str
** Structure <520790>, 8 tags, length=48, data length=48, refs=1:
SDSSDR2_OBJID LONG64 588848899912695818
SDSSDR2_RA DOUBLE 179.98346
SDSSDR2_DEC DOUBLE -0.0016030098
SDSSDR2_TYPE LONG 3
TWOMASS_OBJID LONG 1016752918
TWOMASS_J_M FLOAT 16.7000
SDSSDR2_Z FLOAT 17.6322
CHISQ DOUBLE 0.088211060
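The query string that MAKE_ADQL printed above can be assembled the same way in Python. This sketch is an illustration only: the function name make_adql and its default chi-square cut are my own choices, the spacing is tidied relative to the IDL output, and the five-decimal chi-square format is chosen simply to match the value printed in this session.

```python
def make_adql(ra, dec, sr, chisq=2.5):
    """Build the SDSS/2MASS cross-match query for a circle of radius sr arcminutes."""
    parts = [
        "SELECT o.objId, o.ra, o.dec, o.type, t.objId, t.j_m, o.z",
        "FROM SDSSDR2:PhotoPrimary o, TWOMASS:PhotoPrimary t",
        f"WHERE XMATCH(o,t)<{chisq:.5f} AND o.type = 3",
        # The radius is written as a whole number, as in the IDL '(i2)' format.
        f"AND Region('Circle J2000 {ra:.3f} {dec:.3f} {round(sr):d}')",
    ]
    return " ".join(parts)


print(make_adql(180, 0, 1))
```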
Lessons Learned:
I needed some pre-built Java classes (from nvoss2005/java/dev/skyportalclient) to make this work. If the Java classes needed to call a service (from some client) do not already exist, building them is left to the user (wsdl2java, javac, etc.).
WESIX Calls
WESIX allows the user to upload any image (or specify an image URL) and receive back a table of all sources found (via SExtractor). It will also cross-match to any of the catalogs in OSQ. Of course, your legacy code could already have a source extraction algorithm embedded (or be able to call one externally with ease). In that case, the user would simply want to upload a table to SkyPortal and do the matching, giving the equivalent of a WESIX-like client.
IDL> wesix, 'http://frank.phyast.pitt.edu/~simon/fpC-002243-r4-0320.fit', str=str
IDL> help, /str, str
IDL> wesix, image='http://archive.noao.edu/nsa/search_output.php?action=fdownload&fname=tlgrid$g225.fits', str=str
