[XML4Lib] Problem with sorting and de-duplication in XSLT
Schlosser, Melanie Brynn
mschloss at indiana.edu
Wed Mar 21 16:02:12 EST 2007
Thanks for the response, Cary. Unfortunately, I've already tried
<xsl:sort select="orgName"/> and a few other variations, so apparently
it's something else that's going wrong.
Also, I'm not actually trying to use <xsl:strip-space> to de-dup; I've
been trying to use the normalize-space() function. It should work; in
fact, I've seen examples online of other people using it this way.
It's just mysteriously not working. :)
Melanie
Quoting Cary Gordon <listuser at chillco.com>:
> Try <xsl:sort select="orgName"/> for the sort issue. I believe that it is in
> the correct location.
>
> If you want to get rid of line breaks in nodes, you will have to do some
> form of search and replace. <xsl:strip-space elements="*"/> just removes
> nodes that contain only whitespace.
>
> Cary Gordon
> The Cherry Hill Company
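As Cary says, `xsl:strip-space` only affects text nodes that consist entirely of whitespace, so it never touches the line break inside a name such as "John [linebreak] Adams". In XSLT 1.0 a character-by-character search and replace can be done with `translate()`, while `normalize-space()` also trims leading/trailing whitespace and collapses internal runs. A minimal sketch comparing the two, assuming the `orgName` element from the stylesheet below; the `results`, `Translated` and `Normalized` output elements are hypothetical names chosen for illustration:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Only the root is matched, so built-in templates produce no extra text.
       The output element names below are hypothetical. -->
  <xsl:template match="/">
    <results>
      <xsl:for-each select="//orgName">
        <!-- translate() maps each LF, CR and tab to one space; runs of
             spaces and leading/trailing spaces are left in place -->
        <Translated>
          <xsl:value-of select="translate(., '&#10;&#13;&#9;', '   ')"/>
        </Translated>
        <!-- normalize-space() trims the ends and collapses every run of
             whitespace to a single space -->
        <Normalized>
          <xsl:value-of select="normalize-space(.)"/>
        </Normalized>
      </xsl:for-each>
    </results>
  </xsl:template>
</xsl:stylesheet>
```

Because a line break in the source is often followed by indentation, `translate()` can leave several consecutive spaces behind, so `normalize-space()` is usually the better fit when names have to compare equal.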
>
>
> -----Original Message-----
> From: xml4lib-bounces at webjunction.org
> [mailto:xml4lib-bounces at webjunction.org] On Behalf Of Schlosser, Melanie
> Brynn
> Sent: Wednesday, March 21, 2007 1:01 PM
> To: xml4lib at webjunction.org
> Subject: [XML4Lib] Problem with sorting and de-duplication in XSLT
>
> Hi,
>
> I'm using XSLT to generate lists of unique values for elements in a
> TEI-encoded encyclopedia that can occur more or less anywhere in the
> hierarchy. The lists will be used both for decision-making about access
> points, and for generating browse lists. Using the Muenchian method, I've
> managed to generate some lists, but there are two problems I haven't been
> able to resolve:
>
> 1. I can't get the sort function to work, so the lists are in document
> order. I've tried using the <xsl:sort> element as a child of <xsl:for-each>
> and as a child of <xsl:apply-templates>, and I've tried it with and without
> the 'select' attribute, to little effect.
>
> 2. The lists aren't completely de-duplicated because the processor sees
> "John Adams" and "John [linebreak] Adams" as different values.
> normalize-space() seems like the obvious solution, but I've tried it in the
> key, I've put it in a variable, I've put it in select="", I've even put it
> in the predicate before the count function. At best I can get it to strip
> the line breaks from the result list after the initial de-duplication, which
> makes the results prettier, but doesn't solve the problem.
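In the Muenchian method, normalizing only helps if it happens on both sides of the lookup: in the key's `use` expression and in the value passed to `key()`. If either side uses the raw `.`, the comparison is still between unnormalized strings, so "John Adams" and "John [linebreak] Adams" likely end up in different groups; normalizing only in `xsl:value-of` cleans up the output after de-duplication has already happened, which matches the behavior described above. A minimal sketch, keeping the `orgName` key and `Organization` output names from the stylesheet below and leaving sorting aside (output stays in document order):

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Index every orgName by its whitespace-normalized string value, so
       names that differ only in line breaks share one key value. -->
  <xsl:key name="orgName" match="orgName" use="normalize-space(.)"/>

  <xsl:template match="/">
    <foundRoot>
      <!-- Keep an orgName only if it is the first node the key returns for
           its normalized value. The lookup must normalize too: key('orgName', .)
           would look up the raw, unnormalized string. -->
      <xsl:for-each select="//orgName[count(. | key('orgName', normalize-space(.))[1]) = 1]">
        <Organization>
          <xsl:value-of select="normalize-space(.)"/>
        </Organization>
      </xsl:for-each>
    </foundRoot>
  </xsl:template>
</xsl:stylesheet>
```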
>
> One of my stylesheets (to generate organization names) looks like this:
>
> <?xml version="1.0"?>
> <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
>     version="1.0">
>
>   <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
>   <xsl:strip-space elements="*"/>
>
>   <!--The first two templates are just to make sure the initial matching is
>       working-->
>   <xsl:template match="/">
>     <foundRoot>
>       <xsl:apply-templates/>
>     </foundRoot>
>   </xsl:template>
>
>   <xsl:template match="text">
>     <foundText>
>       <xsl:apply-templates/>
>     </foundText>
>   </xsl:template>
>
>   <xsl:key name="orgName" match="orgName" use="." />
>
>   <xsl:template match="//*">
>     <xsl:for-each select="orgName[count(. | key('orgName', .)[1]) = 1]" >
>       <xsl:sort />
>       <Organization>
>         <xsl:value-of select="." />
>       </Organization>
>     </xsl:for-each>
>
>     <xsl:apply-templates/>
>   </xsl:template>
>
>   <xsl:template match="@*|*|text()">
>     <xsl:apply-templates select="*"/>
>   </xsl:template>
>
> </xsl:stylesheet>
>
> The output looks like this:
>
> <?xml version="1.0" encoding="UTF-8"?>
> <foundRoot>
> <Organization>Indiana State
> Library</Organization>
> <Organization>Wabash College Library</Organization>
> <Organization>Albert Lea
> College</Organization>
> <Organization>Indiana University</Organization>
> <Organization>Whitewater
> Presbyterian Academy</Organization>
> <Organization>Indianapolis Public
> Library</Organization>
> <Organization>Cornell</Organization>
> <Organization>Oxford</Organization>
> <Organization>Wabash College</Organization>
> <Organization>Yale
> University</Organization>
> <Organization>Dartmouth</Organization>
> <Organization>Harvard</Organization>
> <Organization>Kansas</Organization>
> <Organization>Michigan</Organization>
> <Organization>Yale</Organization>
> <Organization>Burke and
> Howe</Organization>
> <Organization>Liber College</Organization>
> <Organization>Union Literary
> Institute</Organization>
> <Organization>Indiana State Library</Organization>...[and so on ]
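The sort problem probably comes from the structure of the stylesheet rather than from `xsl:sort` itself. The `match="//*"` template fires once for every element, and each time its `xsl:for-each` selects only the `orgName` children of that one element, so `xsl:sort` orders each small (usually one-item) set separately and the results are written out as the recursion walks the tree, in document order. Inside that `xsl:for-each` the context node is already the `orgName`, so a sort key of `select="orgName"` would look for an `orgName` child of it; `.` (the default) or `normalize-space(.)` is what is wanted. The same template also explains the missing `<foundText>` wrapper in the output: in XSLT 1.0 a pattern containing `//` gets a default priority of 0.5, which outranks `match="text"` (default priority 0), so the `text` template never fires. Below is a sketch of one way to restructure it, not a tested drop-in: it assumes the TEI elements are in no namespace (as the original's unprefixed `orgName` matches suggest), groups and sorts every `orgName` in a single pass from the root, normalizes whitespace on both sides of the key, and uses the `generate-id()` form of the Muenchian test in place of the equivalent `count(. | ...)` form.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

  <!-- Group orgName elements by their whitespace-normalized text. -->
  <xsl:key name="orgName" match="orgName" use="normalize-space(.)"/>

  <!-- Runs once: every orgName in the document is selected into a single
       node-set, so xsl:sort orders the complete list. -->
  <xsl:template match="/">
    <foundRoot>
      <!-- Keep an orgName only if it is the first node in its key group
           (generate-id() form of the Muenchian test). -->
      <xsl:for-each
          select="//orgName[generate-id() =
                  generate-id(key('orgName', normalize-space(.))[1])]">
        <!-- The context node is the orgName itself, so sort on its own
             normalized value rather than on a child element. -->
        <xsl:sort select="normalize-space(.)"/>
        <Organization>
          <xsl:value-of select="normalize-space(.)"/>
        </Organization>
      </xsl:for-each>
    </foundRoot>
  </xsl:template>

</xsl:stylesheet>
```

Because the root template never calls `xsl:apply-templates`, the `text` check template and the catch-all `@*|*|text()` template from the original are not needed to produce this list.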
>
> Any ideas?
>
> Thanks!
> Melanie Schlosser
> Indiana University Digital Library Program
>
> _______________________________________________
> XML4Lib mailing list
> XML4Lib at webjunction.org
> http://lists.webjunction.org/mailman/listinfo/xml4lib