[XML4Lib] argg, namespaces
Erik Hetzner
erik.hetzner at ucop.edu
Sat Jul 14 13:55:55 EDT 2007
At Sat, 14 Jul 2007 13:39:55 -0400,
Eric Lease Morgan <emorgan at nd.edu> wrote:
>
>
> Argg! Namespaces, a necessary evil.
>
> Seriously, how do I write a set of XPath statements to parse an XML
> file that contains a namespace?
>
> I have the following MODS file, and it includes an un-prefixed
> namespace (http://www.loc.gov/mods/v3):
>
> […]
> Then, using Perl, I create a LibXML parser, parse the file ($input)
> creating an object, and try to loop through all of the mods elements:
>
> $parser = XML::LibXML->new;
> $collection = $parser->parse_file( $input );
> foreach my $mods ( $collection->findnodes( '//mods' )) {
>
> my $titles = '';
> foreach $node ( $mods->findnodes( './/titleInfo/title' )) {
> $titles .= $node->textContent . '|'
> }
>
> # do more cool stuff here
>
> }
>
> As written my Perl script never enters the foreach loop, but as soon
> as I remove the namespace declaration from the MODS file the script
> works just fine.
>
> How do I specify the namespace in my findnodes method?
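Writing the XPath is easy:
  .//mods:titleInfo/mods:title
The trick is binding the mods prefix to <http://www.loc.gov/mods/v3>
in the XPath ‘context’. An unprefixed name in XPath 1.0 always means
“no namespace”, so '//mods' never sees the default namespace declared
in your file. The prefix is only a label on the XPath side; it does
not have to appear in the document. It looks like with libxml you have
to create a context object, register the prefix on it, and run your
queries through it. I haven’t tried this code; for more info, see
<http://search.cpan.org/dist/XML-LibXML/lib/XML/LibXML/XPathContext.pod>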
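$parser = XML::LibXML->new;
$collection = $parser->parse_file( $input );
# bind the 'mods' prefix so XPath expressions can use it
my $xc = XML::LibXML::XPathContext->new($collection);
$xc->registerNs('mods', 'http://www.loc.gov/mods/v3');
foreach my $mods ( $xc->findnodes( '//mods:mods' )) {
  # relative queries also go through the context:
  #   $xc->findnodes( $xpath, $mods )
  […]
}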
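
For what it's worth, here is a fuller sketch of the same idea that runs
on its own. The two-record MODS collection is made up for illustration,
and parse_string stands in for parse_file so that no input file is
needed; the variable names follow the script quoted above. It assumes
an XML::LibXML recent enough to ship XPathContext.

```perl
use strict;
use warnings;
use XML::LibXML;

# Made-up MODS collection with an un-prefixed (default) namespace,
# standing in for the real $input file.
my $xml = <<'XML';
<modsCollection xmlns="http://www.loc.gov/mods/v3">
  <mods><titleInfo><title>First title</title></titleInfo></mods>
  <mods><titleInfo><title>Second title</title></titleInfo></mods>
</modsCollection>
XML

my $parser     = XML::LibXML->new;
my $collection = $parser->parse_string($xml);

# Without a prefix, the name test only matches elements in no
# namespace, so this list comes back empty.
my @unprefixed = $collection->findnodes('//mods');

# Bind a prefix of our choosing to the MODS namespace URI.
my $xc = XML::LibXML::XPathContext->new($collection);
$xc->registerNs('mods', 'http://www.loc.gov/mods/v3');

my @all_titles;
foreach my $mods ($xc->findnodes('//mods:mods')) {
    my $titles = '';
    # Pass $mods as the context node so './/' is relative to it.
    foreach my $node ($xc->findnodes('.//mods:titleInfo/mods:title', $mods)) {
        $titles .= $node->textContent . '|';
    }
    push @all_titles, $titles;
}

print scalar(@unprefixed), " unprefixed matches\n";
print "$_\n" for @all_titles;
```

The inner loop calls findnodes on $xc rather than on $mods, because
the prefix binding lives in the context object and not in the node.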
best,
Erik Hetzner
;; Erik Hetzner, California Digital Library
;; gnupg key id: 1024D/01DB07E3