XML::Parser

Overview

XML::Parser was originally developed by Larry Wall and is now maintained by Clark Cooper. XML::Parser is an interface to James Clark's XML parser, Expat, a non-validating parser written in C.

XML::Parser is an event-based parser, which means that you must register event handlers (callbacks) with the parser to process an XML document. As the document is parsed, the event handlers are called when the corresponding events are detected.

There are several handlers, but most applications will only need the Start, End, and Char handlers. See the module documentation for a list of all the handlers.
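
As a minimal sketch of what registering handlers beyond Start, End, and Char can look like, the script below adds Proc (processing instructions) and Comment handlers, both of which are listed in the module documentation. The inline document and the @events array are assumptions made up for illustration; they do not come from this page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical inline document, used only for illustration.
my $xml = <<'XML';
<?xml version="1.0"?>
<?render mode="plain"?>
<!-- a sample comment -->
<note>hello</note>
XML

my @events;   # event descriptions, collected in document order

my $parser = XML::Parser->new(
    Handlers => {
        # Proc receives (expat, target, data) for each processing instruction
        Proc => sub {
            my ($expat, $target, $data) = @_;
            push @events, "PI: $target [$data]";
        },
        # Comment receives (expat, text) for each comment
        Comment => sub {
            my ($expat, $text) = @_;
            push @events, "COMMENT:$text";
        },
        # Start receives (expat, element name, attribute name/value pairs)
        Start => sub {
            my ($expat, $element) = @_;
            push @events, "START: $element";
        },
    },
);

# parse() takes the document as a string; parsefile() takes a file name.
$parser->parse($xml);
print "$_\n" for @events;
```

Handlers that are not registered are simply never called, so a script only pays for the events it cares about.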

Download

Requirements

Installation

As of version 2.30 of XML::Parser, Expat is no longer included with the package. If Expat is not already installed on your system, you can download it from the link above. If you are using the ActiveState port of Perl, XML::Parser should already be installed on your system. Otherwise, you must install the module yourself.

expat

% ./configure
% make
# make install

XML::Parser

% perl Makefile.PL
% make
% make test
# make install
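
To confirm that the installation worked, one possible check is to load the module at run time and print its version. This is only a sketch: the variable names $loaded and $error are made up for this example, and the script reports a load failure rather than trying to diagnose it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Load the module at run time so that a missing installation produces
# a readable message instead of a compile-time failure.
my $loaded = eval { require XML::Parser; 1 };
my $error  = $@;

if ($loaded) {
    print "XML::Parser version $XML::Parser::VERSION is installed\n";
} else {
    print "XML::Parser could not be loaded: $error\n";
}
```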

Parsing a Simple XML Document

This first example demonstrates how to parse a simple XML document using XML::Parser. The script registers Start, End, and Char handlers and prints each event as it occurs.

country.xml

<?xml version="1.0"?>
<country>
  <name abbrv="fr">France</name>
  <capital>Paris</capital>
  <language>French</language>
</country>

xmlparser1.pl

#!/usr/bin/perl -w
use strict;
use XML::Parser;

die "Usage: xmlparser1.pl <file>\n" unless @ARGV == 1;

my $file = shift;

die qq!Can't find file "$file"! unless -f $file;

my $parser = new XML::Parser(
        Handlers => {Start => \&start_handler,
                     End   => \&end_handler,
                     Char  => \&char_handler});

$parser->parsefile($file);

##########################
# Handlers               #
##########################
sub start_handler {
    my $expat = shift;
    my $element = shift;
 
    print "START: $element\n";

    # Handle the attributes
    while (@_) {
        my $attr = shift;
        my $val = shift;
        print qq!ATTR: $attr="$val"\n!;
    }

}

sub end_handler {
    my($expat, $element) = @_;
    print "END: $element\n";

}

sub char_handler {
    my($expat, $data) = @_;
    return if $data =~ /^\s+$/;
    print "CHAR: $data\n";

}

Running this script,

% perl xmlparser1.pl country.xml

will produce the following output:

START: country
START: name
ATTR: abbrv="fr"
CHAR: France
END: name
START: capital
CHAR: Paris
END: capital
START: language
CHAR: French
END: language
END: country
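
One caveat with printing directly from the Char handler: Expat may deliver the text of a single element in more than one call, for example around entity references. A common way around this, sketched below, is to collect text in a buffer and emit it from the End handler. The inline document (including the "&amp;" reference), $buffer, and @lines are assumptions for this sketch, and the approach is only suitable for elements that do not mix text with child elements.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Inline fragment modeled on country.xml; the "&amp;" entity reference
# is an assumption added for this sketch.
my $xml = '<country><name abbrv="fr">France &amp; territories</name></country>';

my $buffer = '';   # text collected for the element currently being read
my @lines;         # one output line per reported event

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $element) = @_;
            $buffer = '';                 # start collecting afresh
            push @lines, "START: $element";
        },
        Char => sub {
            my ($expat, $data) = @_;
            $buffer .= $data;             # may run several times per text run
        },
        End => sub {
            my ($expat, $element) = @_;
            # Report the collected text once, skipping whitespace-only runs.
            push @lines, "CHAR: $buffer" if $buffer =~ /\S/;
            $buffer = '';
            push @lines, "END: $element";
        },
    },
);

$parser->parse($xml);
print "$_\n" for @lines;
```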

Converting XML to Text

xmlparser2.pl

#!/usr/bin/perl -w
use strict;
use XML::Parser;

die "Usage: xmlparser2.pl <file>\n" unless @ARGV == 1;

my $file = shift;

die qq!Can't find file "$file"! unless -f $file;

my $parser = new XML::Parser(ErrorContext => 2);
$parser->setHandlers(Start => \&start_handler,
                     End   => \&end_handler,
                     Char  => \&char_handler);

$parser->parsefile($file);

##########################
# Handlers               #
##########################
sub start_handler {
    my $expat = shift;
    my $element = shift;
    my %attribs = @_;

    if ( $element =~ /\bfirst\b/i ) {
        print "First Name: ";
    } elsif ( $element =~ /\blast\b/i ) {
        print "Last Name: ";
    } elsif ( $element =~ /\bstreet\b/i ) {
        print "Street: ";
    } elsif ( $element =~ /\bcity\b/i ) {
        print "City: ";
    } elsif ( $element =~ /\bstate\b/i ) {
        print "State: ";
    } elsif ( $element =~ /\bzip\b/i ) {
        print "Zip: ";
    } elsif ( $element =~ /\btel\b/i ) {
        print "Phone ($attribs{'type'}): ";
    } elsif ( $element =~ /\bemail\b/i ) {
        print "Email: ";
    }

}

sub end_handler {
    my($expat, $element) = @_;

    if ( $element =~ /\bcontact\b/i ) {
        print "\n";
    }

}

sub char_handler {
    my($expat, $data) = @_;
    return if $data =~ /^\s+$/;
    print "$data\n";

}
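
The input file for xmlparser2.pl is not shown here. The sketch below assumes a contact list whose element names are inferred from the handlers above (contact, first, last, tel, email, and so on); the document and its values are made up for illustration. It also swaps the if/elsif chain for a lookup hash, %label, which is a design choice of this sketch rather than part of the original script, and it matches element names exactly (case-insensitively) instead of using the word-boundary patterns above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical input: element names inferred from xmlparser2.pl,
# values invented for this example.
my $xml = <<'XML';
<?xml version="1.0"?>
<contacts>
  <contact>
    <first>Jane</first>
    <last>Doe</last>
    <tel type="home">555-0100</tel>
    <email>jane@example.com</email>
  </contact>
</contacts>
XML

# Map element names to output labels.
my %label = (
    first => 'First Name', last  => 'Last Name', street => 'Street',
    city  => 'City',       state => 'State',     zip    => 'Zip',
    email => 'Email',
);

my $output = '';   # collected text output

my $parser = XML::Parser->new(ErrorContext => 2);
$parser->setHandlers(
    Start => sub {
        my ($expat, $element, %attribs) = @_;
        my $name = lc $element;
        if ($name eq 'tel') {
            # Fall back to a placeholder when the type attribute is absent.
            my $type = defined $attribs{type} ? $attribs{type} : 'unknown';
            $output .= "Phone ($type): ";
        } elsif (exists $label{$name}) {
            $output .= "$label{$name}: ";
        }
    },
    End => sub {
        my ($expat, $element) = @_;
        # Separate contacts with a blank line.
        $output .= "\n" if lc($element) eq 'contact';
    },
    Char => sub {
        my ($expat, $data) = @_;
        return if $data =~ /^\s+$/;   # skip whitespace between elements
        $output .= "$data\n";
    },
);

$parser->parse($xml);
print $output;
```

Adding a new field then only requires a new entry in %label, not another branch in the handler.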

Converting XML to HTML

Resources


doug@dougsparling.com
Last Modified: Monday, 09-May-2001