[XML4Lib] Help with XPATH in a Schematron document
Schlosser, Melanie Brynn
mschloss at indiana.edu
Wed Jun 6 15:18:09 EDT 2007
I'm writing Schematron rules to perform automated quality control and
validation of TEI files. Most of my assertions are working correctly,
but I'm having trouble with a few XPath expressions. If any of you
XPath wizards want to help me out, I'd really appreciate it!
I've included the assertions that aren't working, and then the entire
document. If you would like to see the TEI file I'm working on, I can
email it (it's a bit long to paste into a listserv email).
Thanks!
Melanie Schlosser
Indiana University Digital Library Program
**********************************************************************
The assertions that aren't working are these four:
1. I want it to check for the type attribute on <list>, and make sure
its value matches one of the four values listed. Right now it doesn't
notice if I remove the attribute.
<sch:rule id="r12" context="list">
<sch:assert role="M" test="@type='simple' or
@type='ordered' or @type='footnotes' or @type='bibliography'"
>List elements must have a 'type' attribute with the
value simple|ordered|footnotes|bibliography.</sch:assert>
</sch:rule>
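As written, the test is already false when type is absent (an empty
node-set never equals a string), so the assertion should fire if the
attribute is really missing from the parsed tree. One thing worth
checking, as an assumption rather than a confirmed diagnosis, is whether
the TEI file is parsed against a DTD that declares a default value for
type on list; if so, the parser puts the attribute back after it is
deleted from the source. The sketch below separates presence from value
so the report shows which check fails. The sequence comparison assumes an
XPath 2.0 processor, which the matches() calls elsewhere in the schema
already require.

```xml
<sch:rule id="r12" context="list">
  <!-- Presence check: fails only when no type attribute reaches the
       validator, whether written in the file or supplied as a default. -->
  <sch:assert role="M" test="@type"
    >List elements must have a 'type' attribute.</sch:assert>
  <!-- Value check (XPath 2.0 general comparison against a sequence);
       passes when the attribute is absent so only one message appears. -->
  <sch:assert role="M"
    test="not(@type) or @type = ('simple', 'ordered', 'footnotes', 'bibliography')"
    >The 'type' attribute on list must have the value
    simple|ordered|footnotes|bibliography.</sch:assert>
</sch:rule>
```

If the first assertion never fires even with the attribute deleted,
that points toward a defaulted attribute rather than a problem in the
XPath.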
2. The rend attribute is optional, but if it is present, the only
allowed value is 'blockquote'. As it is, the rule doesn't notice if I
change 'blockquote' to something else.
<sch:rule id="r13" context="quote/@rend">
<sch:assert role="M" test="normalize-space(.) = 'blockquote'"
>'Blockquote' is the only acceptable value for the 'rend'
attribute in 'quote'.</sch:assert>
</sch:rule>
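A possible cause, assuming an XSLT-based Schematron implementation: many
of these walk only element nodes by default, so a rule whose context is
an attribute is never visited. A minimal sketch of a workaround is to
make the element the context and test the attribute from there; the
[@rend] predicate keeps the attribute optional.

```xml
<sch:rule id="r13" context="quote[@rend]">
  <!-- Context is the quote element, restricted to those that carry rend,
       so quote elements without the attribute are never tested. -->
  <sch:assert role="M" test="normalize-space(@rend) = 'blockquote'"
    >'blockquote' is the only acceptable value for the 'rend'
    attribute in 'quote'.</sch:assert>
</sch:rule>
```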
3. I want to make sure there are no elements inside <placeName> besides
country, region and settlement. I know this isn't the correct XPath for
this, but I can't figure out how to adapt the syntax I used for <front>
and <back> to this case (see below).
<sch:rule id="r14" context="placeName">
<sch:assert role="M" test="country|region|settlement"
>PlaceName must contain only country, region, or settlement
elements.</sch:assert>
</sch:rule>
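As written, the test succeeds whenever placeName has at least one
country, region, or settlement child, whatever else it contains. The
syntax used for front and back can be carried over, sketched here on the
assumption that the validator supports the XPath 2.0 except operator, as
those rules already do. The comment gives an XPath 1.0 equivalent.

```xml
<sch:rule id="r14" context="placeName">
  <!-- "* except (...)" selects every child element that is not one of the
       three allowed names; the assertion holds when that set is empty.
       XPath 1.0 equivalent:
       not(*[not(self::country or self::region or self::settlement)]) -->
  <sch:assert role="M" test="not(* except (country | region | settlement))"
    >PlaceName must contain only country, region, or settlement
    elements.</sch:assert>
</sch:rule>
```

This only restricts child elements; text directly inside placeName is
not checked.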
4. This is the same situation as 'blockquote'. It's commented out in
the document below because, as it stands, it actually breaks the
validator.
<sch:rule id="r16" context="placeName/region/@type">
<sch:assert role="M" test="@type='state' or @type='county'
or @type='province'"
>Region's optional 'type' attribute must have the value
"state," "county," or "province."</sch:assert>
</sch:rule>
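Two things stand out here. The context node is already the type
attribute, so @type inside the test asks for an attribute of an
attribute and selects nothing. And, as with rend, an attribute context
may never be visited at all, depending on the implementation (an
assumption about the validator in use). The sketch below moves the
context to the element. Within one pattern a node is handled only by the
first rule whose context matches it, so a rule on region[@type] placed
ahead of r17 would keep r17 from firing; the sketch therefore folds the
r17 check into the same rule.

```xml
<sch:rule id="r16" context="placeName/region[@type]">
  <!-- Only region elements that carry type are tested, so the attribute
       stays optional. -->
  <sch:assert role="M"
    test="@type = 'state' or @type = 'county' or @type = 'province'"
    >Region's optional 'type' attribute must have the value
    "state," "county," or "province."</sch:assert>
  <!-- Same check as r17, kept here because this rule would otherwise
       shadow r17 for region elements with type="state". -->
  <sch:assert role="M" test="not(@type = 'state') or @reg"
    >When a region is type="state", a 'reg' attribute must
    be present.</sch:assert>
</sch:rule>
```

With this in place r17 becomes redundant; the alternative is to leave
r17 as it is and put this rule in a pattern of its own.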
*********************************************************************
The whole document looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
<!-- The ISO RelaxNG schema against which to validate is located
here: http://www.schematron.com/iso/iso-schematron.rnc -->
<sch:title>Indiana Magazine of History</sch:title>
<sch:p>This schema tests TEI-encoded issues of the Indiana Magazine
of History.</sch:p>
<sch:pattern id="p1">
<sch:title>Root element</sch:title>
<sch:p>This pattern tests the root TEI.2 element and its ID
attribute.</sch:p>
<sch:rule id="r1" context="TEI.2">
<sch:assert role="M" test="(.)[matches(@id, 'VAA4025-\d\d\d-\d')]"
>The ID attribute in the TEI.2 element must match
"VAA4025-xxx-x".</sch:assert>
</sch:rule>
</sch:pattern>
<sch:pattern id="p2">
<sch:title>TEI Header/fileDesc</sch:title>
<sch:p>This pattern tests elements in the TEI Header.</sch:p>
<sch:rule id="r2" context="teiHeader/fileDesc/titleStmt/title">
<sch:assert role="M" test="normalize-space(.) = 'Indiana
Magazine of History'"
>The title in the fileDesc must be "Indiana Magazine of
History"</sch:assert>
</sch:rule>
<sch:rule id="r3" context="teiHeader/fileDesc/titleStmt/respStmt/resp">
<sch:assert role="M" test="normalize-space(.) = 'Encoded by'"
>The resp element in the RespStmt must contain "Encoded
by".</sch:assert>
</sch:rule>
<sch:rule id="r4" context="teiHeader/fileDesc/titleStmt/respStmt/name">
<sch:assert role="M" test="normalize-space(.) = 'Aptara Inc.'"
>The name element in the RespStmt must contain "Aptara
Inc.".</sch:assert>
</sch:rule>
<sch:rule id="r5" context="teiHeader/fileDesc/publicationStmt">
<sch:assert role="M" test="normalize-space(publisher) =
'Digital Library Program, Indiana University'"
>The publisher must be "Digital Library Program,
Indiana University".</sch:assert>
<sch:assert role="M" test="normalize-space(pubPlace) =
'Bloomington, IN'"
>The pubPlace must be "Bloomington, IN"</sch:assert>
<sch:assert role="M" test="normalize-space(date) = '2007'"
>The publication date must be "2007"</sch:assert>
<sch:assert role="M"
test="normalize-space(availability/p[1]) ='Copyright 2007 Trustees of
Indiana University'"
>The Copyright statement must be "Copyright 2007
Trustees of Indiana University".</sch:assert>
<sch:assert role="M"
test="normalize-space(availability/p[2]) = 'Indiana University provides
the information contained on this web site for non-commercial,
personal, or research use only. All other use, including but not
limited to commercial or scholarly reproductions, redistribution,
publication or transmission, whether by electronic means or otherwise,
without prior written permission of the copyright holder is strictly
prohibited.'"
>The use statement in the second paragraph under
availability is incorrect.</sch:assert>
</sch:rule>
<sch:rule id="r6" context="teiHeader/fileDesc/seriesStmt">
<sch:assert role="M" test="normalize-space(title) =
'Indiana Magazine of History'"
>The title in the seriesStmt must be "Indiana Magazine
of History".</sch:assert>
</sch:rule>
<sch:rule id="r7"
context="teiHeader/fileDesc/sourceDesc/biblStruct/monogr/title">
<sch:assert role="M" test="normalize-space(.) = 'Indiana
Magazine of History'"
>The title in the biblStruct in sourceDesc must be
"Indiana Magazine of History".</sch:assert>
</sch:rule>
<sch:rule id="r8"
context="teiHeader/fileDesc/sourceDesc/biblStruct/monogr">
<sch:assert role="M"
test="normalize-space(imprint/pubPlace) = 'Bloomington, IN'"
>The pubPlace in the biblStruct in sourceDesc must be
"Bloomington, IN".</sch:assert>
<sch:assert role="M"
test="normalize-space(imprint/publisher) = 'Indiana University
Department of History in cooperation with the Indiana Historical
Society'"
>The publisher in the biblStruct in sourceDesc must be
"Indiana University Department of History in cooperation with
the Indiana Historical Society".</sch:assert>
<sch:assert role="M" test="imprint[matches(date, '19\d\d')]"
>The date in the biblStruct in sourceDesc must match
"19xx".</sch:assert>
<sch:assert role="M" test="imprint/biblScope[1][@type='issue']"
>The first biblScope must be 'type="issue"'.</sch:assert>
<sch:assert role="M" test="imprint/biblScope[2][@type='volume']"
>The second biblScope must be 'type="volume"'.</sch:assert>
<sch:assert role="M" test="imprint/biblScope[3][@type='pages']"
>The third biblScope must be 'type="pages"'.</sch:assert>
<sch:assert role="M" test="imprint[matches(biblScope[3],
'[0123]\d\d-[0123]\d\d')]"
>The page numbers must match "xxx-xxx", where the first
"x" in each number is between 0 and 3.</sch:assert>
</sch:rule>
</sch:pattern>
<sch:pattern id="p3">
<sch:title>Front- and back-matter</sch:title>
<sch:p>This pattern checks to make sure there is no content
inside the front and back sections except page break tags</sch:p>
<sch:rule id="r9" context="text">
<sch:assert role="M" test="not(front[* except pb])"
>There are elements in the front matter besides page
breaks.</sch:assert>
<sch:assert role="M" test="not(back[* except pb])"
>There are elements in the back matter besides page
breaks.</sch:assert>
</sch:rule>
</sch:pattern>
<sch:pattern id="p4">
<sch:title>Global elements</sch:title>
<sch:p>This pattern checks the presence, content, and/or
attributes of certain elements that can appear in multiple
places.</sch:p>
<sch:rule id="r10" context="pb">
<sch:assert role="M" test="@n"
>Page break elements must have an "n" attribute.</sch:assert>
<sch:assert role="M" test="@id"
>Page break elements must have an "id" attribute.</sch:assert>
<sch:assert role="M" test="(.)[matches(@id,
'VAA4025-\d\d\d-\d-\d\d\d')]"
>"id" attributes in page breaks must match the pattern
'VAA4025-xxx-x-xxx'.</sch:assert>
</sch:rule>
<sch:rule id="r11" context="div">
<sch:assert role="M" test="@type='scholarlyArticle' or
@type='bookReview' or @type='editorialMaterial' or @type='letter' or
@type='diary'"
>Div elements must have a "type" attribute with the
value 'scholarlyArticle', 'bookReview', 'editorialMaterial', 'letter',
or 'diary'.</sch:assert>
<sch:assert role="M" test="(.)[matches(@id,
'VAA4025-\d\d\d-\d-a\d\d')]"
>When div type="scholarlyArticle|bookReview", an id
attribute must be present. Its content should match the pattern
'VAA4025-xxx-x-axx'.</sch:assert>
</sch:rule>
<sch:rule id="r12" context="list">
<sch:assert role="M" test="@type"
>List elements must have a 'type' attribute with the
value simple|ordered|footnotes|bibliography.</sch:assert>
</sch:rule>
<!--The list one isn't working. The report doesn't catch it when I
remove the attribute. The full test was: @type='simple' or
@type='ordered' or @type='footnotes' or @type='bibliography'-->
<sch:rule id="r13" context="quote/@rend">
<sch:assert role="M" test="normalize-space(.) = 'blockquote'"
>'Blockquote' is the only acceptable value for the 'rend'
attribute in 'quote'.</sch:assert>
</sch:rule>
<!--This one isn't working either. I can't get it to just check
the content of the 'rend' attribute when present.-->
</sch:pattern>
<sch:pattern id="p5">
<sch:title>Place names</sch:title>
<sch:p>This pattern checks the place name encoding.</sch:p>
<sch:rule id="r14" context="placeName">
<sch:assert role="M" test="country|region|settlement"
>PlaceName must contain only country, region, or settlement
elements.</sch:assert>
<!--This one isn't working either.-->
</sch:rule>
<sch:rule id="r15" context="placeName/country">
<sch:assert role="M" test="@reg"
>Country elements must have a 'reg' attribute.</sch:assert>
</sch:rule>
<!--<sch:rule id="r16" context="placeName/region/@type">
<sch:assert role="M" test="@type='state' or @type='county'
or @type='province'"
>Region's optional 'type' attribute must have the value
"state," "county," or "province."</sch:assert>-->
<!--Same problem as rend="blockquote"-->
<!--</sch:rule>-->
<sch:rule id="r17" context="placeName/region[@type='state']">
<sch:assert role="M" test="@reg"
>When a region is type="state", a 'reg' attribute must
be present.</sch:assert>
</sch:rule>
<sch:rule id="r18" context="placeName/settlement">
<sch:assert role="M" test="@type='city'"
>Settlement elements must have a 'type' attribute with
the value "city." </sch:assert>
</sch:rule>
</sch:pattern>
</sch:schema>