[XML4Lib] Problem with sorting and de-duplication in XSLT
Schlosser, Melanie Brynn
mschloss at indiana.edu
Wed Mar 21 15:01:01 EST 2007
Hi,
I'm using XSLT to generate lists of unique values for elements that can
occur more or less anywhere in the hierarchy of a TEI-encoded
encyclopedia. The lists will be used both for decision-making about
access points and
for generating browse lists. Using the Muenchian method, I've managed
to generate some lists, but there are two problems I haven't been able
to resolve:
1. I can't get sorting to work, so the lists come out in document
order. I've tried using the <xsl:sort> element as a child of
<xsl:for-each> and as a child of <xsl:apply-templates>, and I've tried
it with and without the 'select' attribute, with very little result.
2. The lists aren't completely de-duplicated because the processor sees
"John Adams" and "John [linebreak] Adams" as different values.
normalize-space() seems like the obvious solution, but I've tried it in
the key, I've put it in a variable, I've put it in select="", and I've
even put it in the predicate before the count function. At best I can
get it to strip the linebreaks from the result list after the initial
de-duplication, which makes the results prettier but doesn't solve the
problem.
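To make problem 2 concrete, here is a small made-up fragment. The <div> and <p> wrappers and the surrounding sentences are invented for illustration; only the orgName element and the organization name itself come from my data.

```xml
<!-- Hypothetical input: wrappers and sentences are invented. -->
<div>
  <!-- String value contains a line break: "Indiana State&#10;Library" -->
  <p>He worked for the <orgName>Indiana State
Library</orgName> for several years.</p>
  <!-- String value contains a single space: "Indiana State Library" -->
  <p>The <orgName>Indiana State Library</orgName> holds his papers.</p>
</div>
```

With a key declared as use=".", these two elements have different key values and so end up in different groups, even though normalize-space() returns "Indiana State Library" for both.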
One of my stylesheets (to generate organization names) looks like this:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>
  <!--The first two templates are just to make sure the initial matching
  is working-->
  <xsl:template match="/">
    <foundRoot>
      <xsl:apply-templates/>
    </foundRoot>
  </xsl:template>
  <xsl:template match="text">
    <foundText>
      <xsl:apply-templates/>
    </foundText>
  </xsl:template>
  <xsl:key name="orgName" match="orgName" use="." />
  <xsl:template match="//*">
    <xsl:for-each select="orgName[count(. | key('orgName', .)[1]) = 1]" >
      <xsl:sort />
      <Organization>
        <xsl:value-of select="." />
      </Organization>
    </xsl:for-each>
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="@*|*|text()">
    <xsl:apply-templates select="*"/>
  </xsl:template>
</xsl:stylesheet>
The output looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<foundRoot>
<Organization>Indiana State
Library</Organization>
<Organization>Wabash College Library</Organization>
<Organization>Albert Lea
College</Organization>
<Organization>Indiana University</Organization>
<Organization>Whitewater
Presbyterian Academy</Organization>
<Organization>Indianapolis Public
Library</Organization>
<Organization>Cornell</Organization>
<Organization>Oxford</Organization>
<Organization>Wabash College</Organization>
<Organization>Yale
University</Organization>
<Organization>Dartmouth</Organization>
<Organization>Harvard</Organization>
<Organization>Kansas</Organization>
<Organization>Michigan</Organization>
<Organization>Yale</Organization>
<Organization>Burke and
Howe</Organization>
<Organization>Liber College</Organization>
<Organization>Union Literary
Institute</Organization>
<Organization>Indiana State Library</Organization>
...[and so on]
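For reference, here is one possible shape for a fix, as a minimal sketch rather than a tested solution. It is not the stylesheet that produced the output above. It assumes the orgName elements are in no namespace (as the match patterns above already do) and that a single flat list under <foundRoot> is what is wanted. In the stylesheet above, the <xsl:for-each> runs once per element and selects only that element's orgName children, so each <xsl:sort> orders only the siblings under one parent. The sketch instead makes one pass from the root, and applies normalize-space() both in the key definition and in the key() lookup so that the two values are compared in the same form.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Index every orgName by its whitespace-normalized text, so that
       "Indiana State[linebreak]Library" and "Indiana State Library"
       land in the same group. -->
  <xsl:key name="orgName" match="orgName" use="normalize-space(.)"/>

  <xsl:template match="/">
    <foundRoot>
      <!-- One pass over all orgName elements in the document. The
           predicate keeps only the first member of each key group
           (Muenchian method); the lookup value is normalized the
           same way as the key's use expression. -->
      <xsl:for-each
          select="//orgName[count(. | key('orgName', normalize-space(.))[1]) = 1]">
        <!-- Sort the whole de-duplicated list at once. -->
        <xsl:sort select="normalize-space(.)"/>
        <Organization>
          <xsl:value-of select="normalize-space(.)"/>
        </Organization>
      </xsl:for-each>
    </foundRoot>
  </xsl:template>
</xsl:stylesheet>
```

Because the list is built in the root template, the catch-all templates that walk the tree are not needed in this variant.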
Any ideas?
Thanks!
Melanie Schlosser
Indiana University Digital Library Program