[XML4Lib] Help with XPATH in a Schematron document

Schlosser, Melanie Brynn mschloss at indiana.edu
Wed Jun 6 15:18:09 EDT 2007


I'm writing Schematron rules to perform automated quality control and 
validation of TEI files. Most of my assertions are working correctly, 
but I'm having trouble with a few XPath expressions. If any of you 
XPath wizards want to help me out, I'd really appreciate it!

Below are the assertions that aren't working, followed by the entire 
Schematron document. If you would like to see the TEI file I'm working 
on, I can email it (it's a bit long to paste into a listserv email).

Thanks!
Melanie Schlosser
Indiana University Digital Library Program

**********************************************************************
The assertions that aren't working are these four:

1. I want this rule to check that <list> has a type attribute and that 
its value matches one of the four values listed. Right now it doesn't 
report an error if I remove the attribute.

<sch:rule id="r12" context="list">
    <sch:assert role="M" test="@type='simple' or @type='ordered' or @type='footnotes' or @type='bibliography'"
        >List elements must have a 'type' attribute with the value simple|ordered|footnotes|bibliography.</sch:assert>
</sch:rule>

2. The rend attribute on <quote> is optional, but if it is present, the 
only allowed value is 'blockquote'. As it is, the rule doesn't report 
an error if I change 'blockquote' to something else.

<sch:rule id="r13" context="quote/@rend">
    <sch:assert role="M" test="normalize-space(.) = 'blockquote'"
        >'Blockquote' is the only acceptable value for the 'rend' attribute in 'quote'.</sch:assert>
</sch:rule>
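
One possible direction for this rule, as an untested sketch. It assumes the rule never fires because the Schematron implementation in use does not visit attribute nodes when looking for rule contexts. Some XSLT-based implementations behave that way, but it is not confirmed for this setup. The context moves to the quote element that carries a rend attribute, and the test reads the attribute value from there. The pattern id p4-sketch is hypothetical.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
    <sch:pattern id="p4-sketch">
        <!-- Assumption: attribute nodes are not reached as rule contexts,
             so the element that owns the attribute is the context instead. -->
        <!-- quote elements without a rend attribute do not match this rule. -->
        <sch:rule id="r13" context="quote[@rend]">
            <sch:assert role="M" test="normalize-space(@rend) = 'blockquote'"
                >'blockquote' is the only acceptable value for the 'rend' attribute in 'quote'.</sch:assert>
        </sch:rule>
    </sch:pattern>
</sch:schema>
```

Because the predicate [@rend] limits the context, a quote with no rend attribute is never tested, which keeps the attribute optional.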

3. I want to make sure there are no elements inside <placeName> besides 
country, region and settlement. I know this isn't the correct XPath for 
this, but I can't figure out how to apply the syntax I used for <front> 
and <back> to this one (see pattern p3 below).

<sch:rule id="r14" context="placeName">
    <sch:assert role="M" test="country|region|settlement"
        >PlaceName must contain only country, region, or settlement elements.</sch:assert>
</sch:rule>
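
As written, the test above is true as soon as at least one of the three elements is present, so other child elements go unnoticed. A minimal sketch of the stricter check follows. It asserts that no child element is anything other than country, region, or settlement, using only XPath 1.0. It assumes the TEI elements are in no namespace, as in the rest of the schema. The pattern id p5-sketch is hypothetical.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
    <sch:pattern id="p5-sketch">
        <sch:rule id="r14" context="placeName">
            <!-- Fails when any child element is not country, region, or settlement. -->
            <sch:assert role="M"
                test="not(*[not(self::country or self::region or self::settlement)])"
                >PlaceName must contain only country, region, or settlement elements.</sch:assert>
        </sch:rule>
    </sch:pattern>
</sch:schema>
```

With an XPath 2.0 processor, the same idea can follow the syntax used for front and back: not(* except (country | region | settlement)). Either form accepts a placeName that has no child elements at all, so a separate assertion would be needed if at least one child is required.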

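4. This is the same situation as 'blockquote' in item 2: the type 
attribute is optional, but if it is present, it must have one of three 
values. It's commented out in the document below because, as it is, it 
actually breaks the validator.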

<sch:rule id="r16" context="placeName/region/@type">
    <sch:assert role="M" test="@type='state' or @type='county' or @type='province'"
        >Region's optional 'type' attribute must have the value "state," "county," or "province."</sch:assert>
</sch:rule>
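
In the rule above the context is already the type attribute, so @type inside the test looks for an attribute of an attribute and selects nothing. A possible rewrite, as an untested sketch, moves the context to the region element that has a type attribute. Within a single pattern, a node is only tested against the first rule whose context matches it, so placing this rule ahead of r17 in the same pattern would keep r17 from ever firing on a region with type="state". The sketch therefore uses a separate pattern with the hypothetical id p6. It does not address why the original version breaks the validator.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
    <!-- Hypothetical separate pattern, so this rule does not shadow r17. -->
    <sch:pattern id="p6">
        <!-- region elements without a type attribute do not match this rule. -->
        <sch:rule id="r16" context="placeName/region[@type]">
            <sch:assert role="M" test="@type='state' or @type='county' or @type='province'"
                >Region's optional 'type' attribute must have the value "state," "county," or "province."</sch:assert>
        </sch:rule>
    </sch:pattern>
</sch:schema>
```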

*********************************************************************
The whole document looks like this:

<?xml version="1.0" encoding="UTF-8"?>

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron">
    <!-- The ISO RelaxNG schema against which to validate is located 
here: http://www.schematron.com/iso/iso-schematron.rnc -->
    <sch:title>Indiana Magazine of History</sch:title>

    <sch:p>This schema tests TEI-encoded issues of the Indiana Magazine 
of History.</sch:p>

   <sch:pattern id="p1">
        <sch:title>Root element</sch:title>
        <sch:p>This pattern tests the root TEI.2 element and its ID 
attribute.</sch:p>
        <sch:rule id="r1" context="TEI.2">
            <sch:assert role="M" test="(.)[matches(@id, 'VAA4025-\d\d\d-\d')]"
                >The ID attribute in the TEI.2 element must match 
"VAA4025-xxx-x".</sch:assert>
        </sch:rule>
    </sch:pattern>

    <sch:pattern id="p2">
        <sch:title>TEI Header/fileDesc</sch:title>
        <sch:p>This pattern tests elements in the TEI Header.</sch:p>
        <sch:rule id="r2" context="teiHeader/fileDesc/titleStmt/title">
            <sch:assert role="M" test="normalize-space(.) = 'Indiana Magazine of History'"
                >The title in the fileDesc must be "Indiana Magazine of History"</sch:assert>
        </sch:rule>
        <sch:rule id="r3" context="teiHeader/fileDesc/titleStmt/respStmt/resp">
            <sch:assert role="M" test="normalize-space(.) = 'Encoded by'"
                >The resp element in the RespStmt must contain "Encoded 
by".</sch:assert>
        </sch:rule>
        <sch:rule id="r4" context="teiHeader/fileDesc/titleStmt/respStmt/name">
            <sch:assert role="M" test="normalize-space(.) = 'Aptara Inc.'"
                >The name element in the RespStmt must contain "Aptara 
Inc.".</sch:assert>
        </sch:rule>
        <sch:rule id="r5" context="teiHeader/fileDesc/publicationStmt">
            <sch:assert role="M" test="normalize-space(publisher) = 'Digital Library Program, Indiana University'"
                >The publisher must be "Digital Library Program, Indiana University".</sch:assert>
            <sch:assert role="M" test="normalize-space(pubPlace) = 'Bloomington, IN'"
                >The pubPlace must be "Bloomington, IN"</sch:assert>
            <sch:assert role="M" test="normalize-space(date) = '2007'"
                >The publication date must be "2007"</sch:assert>
            <sch:assert role="M" test="normalize-space(availability/p[1]) = 'Copyright 2007 Trustees of Indiana University'"
                >The Copyright statement must be "Copyright 2007 Trustees of Indiana University".</sch:assert>
            <sch:assert role="M" test="normalize-space(availability/p[2]) = 'Indiana University provides the information contained on this web site for non-commercial, personal, or research use only. All other use, including but not limited to commercial or scholarly reproductions, redistribution, publication or transmission, whether by electronic means or otherwise, without prior written permission of the copyright holder is strictly prohibited.'"
                >The use statement in the second paragraph under availability is incorrect.</sch:assert>
        </sch:rule>
        <sch:rule id="r6" context="teiHeader/fileDesc/seriesStmt">
            <sch:assert role="M" test="normalize-space(title) = 'Indiana Magazine of History'"
                >The title in the seriesStmt must be "Indiana Magazine of History".</sch:assert>
        </sch:rule>
        <sch:rule id="r7" context="teiHeader/fileDesc/sourceDesc/biblStruct/monogr/title">
            <sch:assert role="M" test="normalize-space(.) = 'Indiana Magazine of History'"
                >The title in the biblStruct in sourceDesc must be "Indiana Magazine of History".</sch:assert>
        </sch:rule>
        <sch:rule id="r8" context="teiHeader/fileDesc/sourceDesc/biblStruct/monogr">
            <sch:assert role="M" test="normalize-space(imprint/pubPlace) = 'Bloomington, IN'"
                >The pubPlace in the biblStruct in sourceDesc must be "Bloomington, IN".</sch:assert>
            <sch:assert role="M" test="normalize-space(imprint/publisher) = 'Indiana University Department of History in cooperation with the Indiana Historical Society'"
                >The publisher in the biblStruct in sourceDesc must be "Indiana University Department of History in cooperation with the Indiana Historical Society".</sch:assert>
            <sch:assert role="M" test="imprint[matches(date, '19\d\d')]"
                >The date in the biblStruct in sourceDesc must match "19xx".</sch:assert>
            <sch:assert role="M" test="imprint/biblScope[1][@type='issue']"
                >The first biblScope must be 'type="issue"'.</sch:assert>
            <sch:assert role="M" test="imprint/biblScope[2][@type='volume']"
                >The second biblScope must be 'type="volume"'.</sch:assert>
            <sch:assert role="M" test="imprint/biblScope[3][@type='pages']"
                >The third biblScope must be 'type="pages"'.</sch:assert>
            <sch:assert role="M" test="imprint[matches(biblScope[3], '[0123]\d\d-[0123]\d\d')]"
                >The page numbers must match "xxx-xxx", where the first "x" in each number is between 0 and 3.</sch:assert>
        </sch:rule>
    </sch:pattern>

    <sch:pattern id="p3">
        <sch:title>Front- and back-matter</sch:title>
        <sch:p>This pattern checks to make sure there is no content 
inside the front and back sections except page break tags</sch:p>
        <sch:rule id="r9" context="text">
            <sch:assert role="M" test="not(front[* except pb])"
                >There are elements in the front matter besides page 
breaks.</sch:assert>
            <sch:assert role="M" test="not(back[* except pb])"
                >There are elements in the back matter besides page 
breaks.</sch:assert>
        </sch:rule>
    </sch:pattern>

    <sch:pattern id="p4">
        <sch:title>Global elements</sch:title>
        <sch:p>This pattern checks the presence, content, and/or 
attributes of certain elements that can appear in multiple 
places.</sch:p>
        <sch:rule id="r10" context="pb">
            <sch:assert role="M" test="@n"
                >Page break elements must have an "n" attribute.</sch:assert>
            <sch:assert role="M" test="@id"
            >Page break elements must have an "id" attribute.</sch:assert>
            <sch:assert role="M" test="(.)[matches(@id, 
'VAA4025-\d\d\d-\d-\d\d\d')]"
                >"id" attributes in page breaks must match the pattern 
'VAA4025-xxx-x-xxx'.</sch:assert>
        </sch:rule>
        <sch:rule id="r11" context="div">
            <sch:assert role="M" test="@type='scholarlyArticle' or 
@type='bookReview' or @type='editorialMaterial' or @type='letter' or 
@type='diary'"
                >Div elements must have a "type" attribute with the 
value 'scholarlyArticle', 'bookReview', 'editorialMaterial', 'letter', 
or 'diary'.</sch:assert>
            <sch:assert role="M" test="(.)[matches(@id, 
'VAA4025-\d\d\d-\d-a\d\d')]"
            >When div type="scholarlyArticle|bookReview", an id 
attribute must be present. Its content should match the pattern 
'VAA4025-xxx-x-axx'.</sch:assert>
        </sch:rule>
        <sch:rule id="r12" context="list">
            <sch:assert role="M" test="@type"
                >List elements must have a 'type' attribute with the 
value simple|ordered|footnotes|bibliography.</sch:assert>
        </sch:rule>

    <!--The list one isn't working. The report doesn't catch it when I 
remove the attribute. (='simple' or @type='ordered' or 
@type='footnotes' or @type='bibliography')-->
   <sch:rule id="r13" context="quote/@rend">
        <sch:assert role="M" test="normalize-space(.) = 'blockquote'"
            >'Blockquote' is the only acceptable value for the 'rend' 
attribute in 'quote'.</sch:assert>
    </sch:rule>
        <!--This one isn't working either. I can't get it to just check 
the content of the 'rend' attribute when present.-->
   </sch:pattern>

    <sch:pattern id="p5">
        <sch:title>Place names</sch:title>
        <sch:p>This pattern checks the place name encoding.</sch:p>
        <sch:rule id="r14" context="placeName">
            <sch:assert role="M" test="country|region|settlement"
            >PlaceName must contain only country, region, or settlement 
elements.</sch:assert>
            <!--This one isn't working either.-->
       </sch:rule>
        <sch:rule id="r15" context="placeName/country">
            <sch:assert role="M" test="@reg"
                >Country elements must have a 'reg' attribute.</sch:assert>
        </sch:rule>
       <!--<sch:rule id="r16" context="placeName/region/@type">
            <sch:assert role="M" test="@type='state' or @type='county' 
or @type='province'"
                >Region's optional 'type' attribute must have the value 
"state," "county," or "province."</sch:assert>-->
            <!--Same problem as rend="blockquote"-->
        <!--</sch:rule>-->
        <sch:rule id="r17" context="placeName/region[@type='state']">
            <sch:assert role="M" test="@reg"
                >When a region is type="state", a 'reg' attribute must 
be present.</sch:assert>
        </sch:rule>
        <sch:rule id="r18" context="placeName/settlement">
            <sch:assert role="M" test="@type='city'"
                >Settlement elements must have a 'type' attribute with 
the value "city." </sch:assert>
        </sch:rule>
    </sch:pattern>

</sch:schema>



