Perl Mini-tutorial

What is Perl?

Perl is an interpreted language available on essentially all architectures. It is widely used for scripting, building Web clients and services, and managing access to databases.

In this tutorial we assume you have at least Perl 5.6. Perl 5.8 is the current standard. If you have Perl installed you can check the version with

perl -v

This is perl, v5.6.1 built for i386-linux

Copyright 1987-2001, Larry Wall

Perl may be copied only under the terms of either the Artistic License or the
GNU General Public License, which may be found in the Perl 5 source kit.

Complete documentation for Perl, including FAQ lists, should be found on
this system using `man perl' or `perldoc perl'.  If you have access to the
Internet, point your browser at http://www.perl.com/, the Perl Home Page.

None of the examples we show here use any cutting-edge features of Perl. However, the most recent versions are preferable for compatibility with user-contributed libraries. Source and binary distributions of Perl, including ones for Windows and Mac, are freely available.

Some strengths and weaknesses.

Simple examples

Change directory to $NVOSS_HOME/perl/lib/examp to see the example files.

Hello World example

#!/usr/bin/perl

print "Hello world\n";

Note that there is virtually no boilerplate required (except perhaps the final semicolon). These scripts are written for Unix machines, so we can execute this script either as:

perl < helloworld.pl

or

helloworld.pl

Fixing the first line.

You might have installed Perl somewhere other than the system default even if you are using Unix. If so, the first line of each script needs to be changed to point to your Perl. The following script will do that...

Warning: This will alter the files, and if what you type in is incorrect for your system it can cause problems. Make sure that the first argument begins with a # so that Perl itself treats the line as a comment. You will probably need to quote or escape the argument so that the shell passes it through unchanged.

#!/usr/bin/perl
# Use the first argument as the first
# line in the files designated in subsequent arguments.
#
# Usage: firstline.pl newfirstline file1 file2 ...
#
my $line1 = shift(@ARGV);

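foreach my $file (@ARGV) {
    # Read the whole file into memory.
    open(INPUT, $file) or die "Cannot read $file: $!";
    my @content = <INPUT>;
    close(INPUT);
    # Replace the first line and write the file back.
    $content[0] = $line1."\n";
    open(OUTPUT, ">$file") or die "Cannot write $file: $!";
    print OUTPUT @content;
    close(OUTPUT);
}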

Suppose you installed Perl as /users/jones/bin/perl. Then we can fix the Hello World script with a single command:

firstline.pl  '#!/users/jones/bin/perl' helloworld.pl

Note the use of @ARGV for the command-line arguments: $ARGV[0] is the first argument, $ARGV[1] the next, and so forth. The call to shift removes the first argument from @ARGV, so the loop sees only the file names. When we open a file and just give a file name, the file is opened for input. When we prefix the name with a '>', we open it for writing.
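As a minimal sketch of the two open modes just described, the following writes a small file and reads it back. The file name demo.txt is only an assumption for illustration, and the three-argument form of open is used here, which states the mode separately from the name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical file name, used only for this illustration.
my $file = "demo.txt";

# Mode '>' opens the file for writing, replacing any existing content.
open(my $out, ">", $file) or die "Cannot write $file: $!";
print $out "first line\n", "second line\n";
close($out);

# Mode '<' opens the file for reading.
open(my $in, "<", $file) or die "Cannot read $file: $!";
my @content = <$in>;      # in list context this reads every line at once
close($in);

print "Read ", scalar(@content), " lines; the first is: $content[0]";

unlink($file);            # remove the demonstration file again
```

Checking the return value of open with "or die" means a missing or unwritable file stops the script at once rather than silently producing an empty result.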

Perl data types.

Perl has three basic data types that you will use constantly: scalars, lists and hashes. Perl doesn't (by default) distinguish between integers and reals, or between numbers and strings: all of these are Perl scalars, and Perl generally tries to do what makes sense in a given context.
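A short sketch of what this means in practice (the variable names are just for illustration): the same scalar is treated as a number or as a string depending on the operator applied to it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $count = "10";            # a scalar holding a string of digits
my $sum   = $count + 5;      # '+' is numeric, so "10" is treated as 10
my $label = $count . 5;      # '.' is string concatenation, giving "105"
my $ratio = 7 / 2;           # no integer/real distinction: this is 3.5

print "sum=$sum label=$label ratio=$ratio\n";
```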

You can tell which type a variable is by the symbol that prefixes it.

$  Scalar  A single string or numeric value (or a reference, Perl's form of pointer, in more advanced applications).
@  List    A 0-indexed array with no gaps between elements. An element of the list @myList is referenced as $myList[$index].
%  Hash    An object in which we can set and get an entry based upon any string. An element of the hash %myHash is referenced as $myHash{$key}. The elements in a hash have no specified order.

Note that we use the % or @ prefixes only when we are referring to the hash or list as a whole (e.g., pop(@list) pops the last element of the list), but we use a $ prefix when we are referring to a single element of a hash or list, since that is a scalar. Since Perl puts these punctuation characters before each variable, it's easy to write programs that look very dense. But it also allows us to intermingle text and variables in useful ways. E.g.,

   print "A=$a, b=$b, sum=$sum";
is clearer than
   printf("A=%10.3f, b=%6.2f, sum=%10.5f", a, b, sum);
and doesn't have problems where you might get the order or number of variables incorrect.
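The prefix rules above can be sketched as follows. The list and hash contents are made-up values chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @list  = ("alpha", "beta", "gamma");  # @ names the whole list
my $first = $list[0];                    # $ because one element is a scalar
my $last  = pop(@list);                  # removes and returns "gamma"

my %hash = (ra => 10.68, dec => 41.27);  # % names the whole hash
$hash{name} = "M31";                     # set one entry by its string key

print "first=$first last=$last remaining=", scalar(@list), "\n";
foreach my $key (sort keys %hash) {      # keys have no order, so sort them
    print "$key=$hash{$key}\n";
}
```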

Vocabulary example.

This simple example shows how these types and Perl's powerful text processing enable us to count how many times each word is used in a document.

#!/usr/bin/perl
#
# Find the usage of words in a file.
# Usage: vocab.pl < file

my %hash;                         # Going to use a hash.
while ($line = <STDIN>) {         # Read a line into a scalar
    chomp ($line);                # Get rid of the newline at the end
    @words = split(/[ .,]+/, $line);   # Split each line into words at spaces, periods and commas
    foreach $word (@words) {      # Loop over each word
        next if $word eq "";      # Skip the empty field left by a leading separator
        $hash{lc($word)} += 1;    # Increment the hash counter for each word
    }
}
my $total = 0;
foreach $word(sort(keys(%hash))) {  # Loop over each word that was found
    printf "%5d: %s\n", $hash{$word}, $word;  # Print the result.
    $total += $hash{$word};
}
print "\n Total: $total\n";
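
The script above lists words alphabetically. As a hedged variation on the same hash-counting idea, the sketch below ranks the words by how often they occur. The sample sentence and the variable names are assumptions for illustration, and the text is taken from a string rather than STDIN so the example is self-contained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = "The cat and the dog. The end";   # sample input for illustration
my %count;
foreach my $word (split(/[ .,]+/, $text)) {
    next if $word eq "";                     # ignore empty fields
    $count{lc($word)}++;                     # count words case-insensitively
}

# Most frequent first; ties fall back to alphabetical order.
my @ranked = sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count;
foreach my $word (@ranked) {
    printf "%5d: %s\n", $count{$word}, $word;
}
```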

Perl and the Web

Web sites are just streams of characters and Perl is well adapted to processing such streams. E.g., suppose we have a list of RA's and Dec's and we want to convert them to L's and B's. There are plenty of Web sites that will do this, but we'd like to run this as a script. Here's one way to do it.

#!/usr/bin/perl
#
# Convert J2000 RA/Dec to Galactic coordinates.
# Use Batch mode of HEASARC coordinate converter.
#
# Usage: coordconvert.pl < file


use strict;
use lib "/www/server/vo/inst/local/lib";
use VO::Service;

while (my $line = <STDIN>) {
    my ($ra, $dec) = split(" ", $line);

    my $url     = "http://heasarc.gsfc.nasa.gov/cgi-bin/Tools/convcoord/convcoord.pl?CoordVal=$ra,$dec&Output=Batch";
    my $results = VO::Service::GET($url);
    my @res = split(/[|\n]/, $results);
    print "$ra | $dec | $res[13] | $res[14]\n";
}

This assumes we have decimal RA's and Dec's, but with just a bit of tweaking it could handle sexagesimal coordinates or target names.
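
One possible form of that tweaking, as a sketch only: the helper sex2dec below is a hypothetical name, and it assumes the input fields are separated by colons or blanks, with RA given in hours and Dec in degrees.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: convert "dd:mm:ss.s" (or blank-separated) text
# to a decimal value in the same unit as the leading field.
sub sex2dec {
    my ($text) = @_;
    $text =~ s/^\s+//;                        # drop leading blanks
    my ($d, $m, $s) = split(/[:\s]+/, $text);
    $m = 0 unless defined $m;                 # allow "dd" or "dd:mm" alone
    $s = 0 unless defined $s;
    my $sign = ($d =~ /^-/) ? -1 : 1;         # test the text so "-00:30:00" stays negative
    return $sign * (abs($d) + $m / 60 + $s / 3600);
}

my $ra  = 15 * sex2dec("00:42:44.3");         # RA is in hours, so scale to degrees
my $dec = sex2dec("+41:16:09");
printf "%.5f %.5f\n", $ra, $dec;
```

The sign is taken from the text rather than from the numeric value because a declination such as -00:30:00 has a leading field of zero, which would otherwise lose its sign.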

Perl and SOAP

Here's something similar but using SOAP rather than a simple HTTP GET. In principle with SOAP we should have been able to get back a complex structure directly, but this particular service chooses to return a string that we parse.

SOAP is a pretty complex interface, but a basic SOAP call can be done in just a few simple statements in Perl. The SOAP::Lite module has gazillions of options for more complex queries. You can try "perldoc SOAP::Lite" to learn about some of them.

#!/www/server/vo/inst/bin/perl
#
# Simple example of SOAP interface and Perl parsing capabilities.
#
# Usage: sesameTest.pl target [debug]
#
# This program calls the CDS Sesame SOAP service to find
# the position of an object.
#
# See http://cdsweb.u-strasbg.fr/cdsws.gml
#
# Tom McGlynn, June 29, 2005


use strict;
use lib "/www/server/vo/inst/local/lib";
use VO::Util;
use VO::Service::SOAP;

# What are the SOAP parameters we want for the Sesame service.
my $proxy   = 'http://cdsws.u-strasbg.fr/axis/services/Sesame';
my $ns      = 'urn:Sesame';
my $method  = "sesame";

# Indicate the field in the returned XML that we are going to want
# to look for.  The '//return' element is a complex string that we
# will parse.  We could return a more complex object if we desired.

my $want    = "//return";

# Set the arguments for the call.
my %args    = ("name" => $ARGV[0]);

# Make the SOAP query
my $results = VO::Service::SOAP::query($proxy, $ns, $method, $want, %args);

# If there are not any results, then the resolver didn't work
if (@$results < 1) {
    print "Error in connection/Invalid inputs?  No data returned.\n";

} else {
    my $result = $$results[0];

    if ($ARGV[1]) {
       print "Result before parsing is: \n\n****\n$result****\n\n";
    }

    # This parses out the decimal J2000 coordinates
    # I.e., look for string beginning with '%J ' and then
    # find everything until we hit a left parenthesis (.
    # This isn't very XMLish but it's easy enough!
    # A better SOAP service would have separate methods
    # or elements for each type already parsed.
    if ($result =~ /%J ([^(]*)/) {
        my ($ra, $dec) = split(" ", $1);
        print "Resolved position: RA=$ra, Dec=$dec\n";
    } else {
        print "Unknown target. Target not resolved.\n";
    }

}
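
The parsing step can be tried on its own without calling the service. In the sketch below the sample text is invented to have the general shape the comments above describe. It is not an actual Sesame response.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented sample text, not real service output.
my $result = "# M31\n%T Galaxy\n%J 10.6847 +41.2688 (sample)\n";

my ($ra, $dec);
if ($result =~ /%J ([^(]*)/) {          # capture everything up to the first '('
    ($ra, $dec) = split(" ", $1);       # split(" ") also ignores surrounding blanks
    print "Resolved position: RA=$ra, Dec=$dec\n";
} else {
    print "Unknown target. Target not resolved.\n";
}
```

Testing the match directly in the if condition means $1 is only read when this particular match succeeded.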

Perl and CPAN

CPAN, the Comprehensive Perl Archive Network, is one of the main reasons to consider using Perl. Hundreds of user-contributed and user-maintained libraries are available, including ~30 libraries in the Astro package. These include FITS and WCS tools as well as tools for linking to astronomy data analysis environments. Each library comes with an automated installation script and most can be plugged into your installation in just a few seconds.

If you want to do something, it's always a good idea to check out CPAN first.

The Perl VO Library.

A library including a few general packages and several which support VO functionalities is included in the summer school library. It includes:

These tools can be used to build VO clients and services.

The VO library tools include a general VOTable generator and parser. This currently speaks VOTable 1.0 but will shortly be upgraded to 1.1.

Installing Perl.

Distributions of the latest version of Perl are available at the official Perl web site, www.perl.org. If there is an ActivePerl installation available for your machine architecture (Windows, Mac and some versions of Unix), it is probably the easiest option. ActivePerl provides self-installing downloads. There are other executable and source distributions for a myriad of different architectures.

The latest ActivePerl distributions come with almost all libraries you will need pre-installed. You will need to install the database libraries if you want to use database applications. If you have installed Perl in c:\Perl then you may wish to try the following:

c:\Perl> ppm
PPM - Programmer's Package Manager version 3.2.
Copyright (c) 2001 ActiveState Corp. All Rights Reserved.
ActiveState is a division of Sophos.

Entering interactive shell. Using Term::ReadLine::Perl as readline library.

Type 'help' to get started.

ppm> install DBI

... feedback on DBI installation

ppm> install DBD::mysql

... feedback on DBD::mysql installation

ppm> exit

This installs the overall database library (DBI) and the module for the MySQL database. DBDs are available for all major database systems.

If you start with another installation you may need to install other libraries, including the SOAP::Lite module. The CPAN module is probably the easiest way to install libraries. Try

   perldoc cpan
for documentation on how to use CPAN.