[XML4Lib] Problem with sorting and de-duplication in XSLT

Charles Yates ceyates at stanford.edu
Wed Mar 21 16:39:14 EST 2007


Have you looked here?
http://www.dpawson.co.uk/xsl/sect2/N6280.html#d9850e16
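
In case it helps, here is a minimal sketch of how the two problems might be addressed together. It is untested against your data, and it assumes an XSLT 1.0 processor and that a single flat list under foundRoot is what you want. The key and the lookup both use normalize-space(.), so "John Adams" and "John [linebreak] Adams" fall into the same group, and a single for-each over //orgName lets xsl:sort order the whole list instead of each parent's children separately:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Index every orgName by its whitespace-normalized text -->
  <xsl:key name="orgName" match="orgName" use="normalize-space(.)"/>

  <xsl:template match="/">
    <foundRoot>
      <!-- One pass over all orgName elements in the document,
           keeping only the first member of each key group -->
      <xsl:for-each select="//orgName[count(. | key('orgName', normalize-space(.))[1]) = 1]">
        <!-- Sort on the same normalized value used for grouping -->
        <xsl:sort select="normalize-space(.)"/>
        <Organization>
          <xsl:value-of select="normalize-space(.)"/>
        </Organization>
      </xsl:for-each>
    </foundRoot>
  </xsl:template>

</xsl:stylesheet>
```

The important part is that the value passed to key() matches the expression in the key's use attribute; normalizing only one side leaves the groups split.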

Schlosser, Melanie Brynn wrote:
> Hi,
>
> I'm using XSLT to generate lists of unique values for elements in a
> TEI-encoded encyclopedia that can occur more or less anywhere in the
> hierarchy.
> The lists will be used both for decision-making about access points, 
> and for generating browse lists. Using the Muenchian method, I've 
> managed to generate some lists, but there are two problems I haven't 
> been able to resolve:
>
> 1. I can't get the sort function to work, so the lists are in document 
> order. I've tried using the <xsl:sort> element as a child of 
> <xsl:for-each> and as a child of <xsl:apply-templates>, and I've tried 
> it with and without the 'select' attribute, with very little result.
>
> 2. The lists aren't completely de-duplicated because the processor 
> sees "John Adams" and "John [linebreak] Adams" as different values. 
> normalize-space() seems like the obvious solution, but I've tried it
> in the key, I've put it in a variable, I've put it in select="", I've
> even put it in the predicate before the count function. At best I can
> get it to strip the linebreaks from the result list after the initial 
> de-duplication, which makes the results prettier, but doesn't solve 
> the problem.
>
> One of my stylesheets (to generate organization names) looks like this:
>
> <?xml version="1.0"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
> version="1.0">
>
>   <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
>   <xsl:strip-space elements="*"/>
>
> <!--The first two templates are just to make sure the initial matching 
> is working-->
>   <xsl:template match="/">
>       <foundRoot>
>           <xsl:apply-templates/>
>       </foundRoot>
>   </xsl:template>
>
>   <xsl:template match="text">
>       <foundText>
>           <xsl:apply-templates/>
>       </foundText>
>   </xsl:template>
>
>
>   <xsl:key name="orgName" match="orgName" use="." />
>
>   <xsl:template match="//*">
>
>
>       <xsl:for-each select="orgName[count(. | key('orgName', .)[1]) = 1]">
>           <xsl:sort />
>           <Organization>
>               <xsl:value-of select="." />
>           </Organization>
>       </xsl:for-each>
>
>
>       <xsl:apply-templates/>
>
>
>   </xsl:template>
>
>   <xsl:template match="@*|*|text()">
>       <xsl:apply-templates select="*"/>
>   </xsl:template>
>
> </xsl:stylesheet>
>
> The output looks like this:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <foundRoot>
>  <Organization>Indiana State
>                       Library</Organization>
>  <Organization>Wabash College Library</Organization>
>  <Organization>Albert Lea
>                               College</Organization>
>  <Organization>Indiana University</Organization>
>  <Organization>Whitewater
>                                   Presbyterian Academy</Organization>
>  <Organization>Indianapolis Public
>                               Library</Organization>
>  <Organization>Cornell</Organization>
>  <Organization>Oxford</Organization>
>  <Organization>Wabash College</Organization>
>  <Organization>Yale
>                               University</Organization>
>  <Organization>Dartmouth</Organization>
>  <Organization>Harvard</Organization>
>  <Organization>Kansas</Organization>
>  <Organization>Michigan</Organization>
>  <Organization>Yale</Organization>
>  <Organization>Burke and
>                                   Howe</Organization>
>  <Organization>Liber College</Organization>
>  <Organization>Union Literary
>                               Institute</Organization>
>  <Organization>Indiana State Library</Organization> ... [and so on]
>
> Any ideas?
>
> Thanks!
> Melanie Schlosser
> Indiana University Digital Library Program