[XML4Lib] Problem with sorting and de-duplication in XSLT

Schlosser, Melanie Brynn mschloss at indiana.edu
Wed Mar 21 15:01:01 EST 2007


Hi,

I'm using XSLT to generate lists of unique values for elements that can 
occur more or less anywhere in the hierarchy of a TEI-encoded 
encyclopedia. The lists will be used both for decision-making about 
access points and for generating browse lists. Using the Muenchian 
method, I've managed to generate some lists, but there are two problems 
I haven't been able to resolve:

1. I can't get sorting to work, so the lists come out in document 
order. I've tried using the <xsl:sort> element as a child of 
<xsl:for-each> and as a child of <xsl:apply-templates>, and I've tried 
it with and without the 'select' attribute, with no visible effect.

2. The lists aren't completely de-duplicated because the processor treats 
"John Adams" and "John [linebreak] Adams" as different values. 
normalize-space() seems like the obvious solution, but I've tried it in 
the key, I've put it in a variable, I've put it in select="", and I've 
even put it in the predicate before the count() function. At best I can 
get it to strip the line breaks from the result list after the initial 
de-duplication, which makes the results prettier but doesn't solve the 
problem.
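For comparison, both requirements (a sorted list, and de-duplication that ignores line breaks) map directly onto XSLT 2.0 grouping. This is only a sketch, and it assumes an XSLT 2.0 processor such as Saxon is available; the stylesheet in this message declares version 1.0, so this is a different route rather than a fix to it. It also assumes orgName elements are in no namespace, as the pattern match="orgName" in that stylesheet implies.

```xml
<?xml version="1.0"?>
<!-- Sketch only: needs an XSLT 2.0 processor (for example Saxon).
     Assumes orgName elements are in no namespace. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <xsl:template match="/">
    <foundRoot>
      <!-- Group every orgName in the document by its whitespace-normalized
           value; each group is visited once, so duplicates collapse even
           when the source splits a name across lines. -->
      <xsl:for-each-group select="//orgName" group-by="normalize-space(.)">
        <!-- Sorts the groups, not the members of one parent element -->
        <xsl:sort select="current-grouping-key()"/>
        <Organization>
          <xsl:value-of select="current-grouping-key()"/>
        </Organization>
      </xsl:for-each-group>
    </foundRoot>
  </xsl:template>

</xsl:stylesheet>
```

Because the grouping key is already normalized, the output values carry no internal line breaks, and no key or count() trick is needed.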

One of my stylesheets (to generate organization names) looks like this:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
version="1.0">

   <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
   <xsl:strip-space elements="*"/>

<!--The first two templates are just to make sure the initial matching 
is working-->
   <xsl:template match="/">
       <foundRoot>
           <xsl:apply-templates/>
       </foundRoot>
   </xsl:template>

   <xsl:template match="text">
       <foundText>
           <xsl:apply-templates/>
       </foundText>
   </xsl:template>

   <xsl:key name="orgName" match="orgName" use="." />

   <xsl:template match="//*">
       <xsl:for-each select="orgName[count(. | key('orgName', .)[1]) = 1]" >
           <xsl:sort />
           <Organization>
               <xsl:value-of select="." />
           </Organization>
       </xsl:for-each>

       <xsl:apply-templates/>
   </xsl:template>

   <xsl:template match="@*|*|text()">
       <xsl:apply-templates select="*"/>
   </xsl:template>

</xsl:stylesheet>

The output looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<foundRoot>
  <Organization>Indiana State
                       Library</Organization>
  <Organization>Wabash College Library</Organization>
  <Organization>Albert Lea
                               College</Organization>
  <Organization>Indiana University</Organization>
  <Organization>Whitewater
                                   Presbyterian Academy</Organization>
  <Organization>Indianapolis Public
                               Library</Organization>
  <Organization>Cornell</Organization>
  <Organization>Oxford</Organization>
  <Organization>Wabash College</Organization>
  <Organization>Yale
                               University</Organization>
  <Organization>Dartmouth</Organization>
  <Organization>Harvard</Organization>
  <Organization>Kansas</Organization>
  <Organization>Michigan</Organization>
  <Organization>Yale</Organization>
  <Organization>Burke and
                                   Howe</Organization>
  <Organization>Liber College</Organization>
  <Organization>Union Literary
                               Institute</Organization>
  <Organization>Indiana State Library</Organization>...[and so on]

Any ideas?
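A likely explanation for both problems, offered as an untested sketch rather than a definitive answer. Because the template matches "//*", it fires once for every element, and each <xsl:for-each> selects only that element's own orgName children. <xsl:sort> therefore reorders one tiny node-set at a time (usually zero or one node), so the concatenated output stays in document order. For de-duplication, normalize-space() has to appear in both places where the key value is computed: in the use attribute of <xsl:key> and in the second argument of key(). If only one side is normalized, multi-line names can fail to match their own key entry and slip through as duplicates. The XSLT 1.0 stylesheet below assumes, like the stylesheet above, that orgName is in no namespace; a file using the TEI P5 namespace would need a prefixed pattern instead.

```xml
<?xml version="1.0"?>
<!-- Sketch: one document-wide Muenchian pass over orgName, de-duplicated
     on the whitespace-normalized value and sorted. Assumes orgName
     elements are in no namespace. -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Key on the normalized string, so a name split across lines and the
       same name on one line produce the same key value. -->
  <xsl:key name="orgName" match="orgName" use="normalize-space(.)"/>

  <xsl:template match="/">
    <foundRoot>
      <!-- Select every orgName in the document at once and keep only the
           first node key() returns for each normalized value. The lookup
           argument is normalized too, or multi-line names never match. -->
      <xsl:for-each
          select="//orgName[count(. | key('orgName', normalize-space(.))[1]) = 1]">
        <!-- Sorts the whole node-set selected above in a single pass -->
        <xsl:sort select="normalize-space(.)"/>
        <Organization>
          <xsl:value-of select="normalize-space(.)"/>
        </Organization>
      </xsl:for-each>
    </foundRoot>
  </xsl:template>

</xsl:stylesheet>
```

Since the root template never calls <xsl:apply-templates>, the per-element "//*" template and the catch-all template are no longer needed for this list.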

Thanks!
Melanie Schlosser
Indiana University Digital Library Program


