[XML4Lib] ignoring doctype

Eric Lease Morgan emorgan at nd.edu
Sun Oct 29 07:42:41 EST 2006


How do I get my XSLT program to ignore the DOCTYPE declaration in an  
HTML file?

I am trying to use xsltproc to transform my valid HTML files into a  
format that can be easily indexed by the Alvis filter of Zebra, but my  
transformations only work if I remove the DOCTYPE declaration from the  
HTML. I have the following HTML snippet:


   <?xml version="1.0" encoding="utf-8"?>
   <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
   <html>
     <head>
       <title>Communication is the key to our success / Eric Lease Morgan</title>
       <meta name="identifier" content="musings-91" />
       <meta name="author" content="Eric Lease Morgan" />
       <meta name="title" content="Communication is the key to our success" />
     </head>
     <body><h1>Hello, World!</h1></body>
   </html>


I have this XSLT file:


   <xsl:stylesheet
     xmlns:xsl = "http://www.w3.org/1999/XSL/Transform"
     xmlns:z   = "http://indexdata.dk/zebra/xslt/1"
     version   = "1.0">

   <xsl:template match="/">
     <z:record z:id="musings-91" z:type="update">
       <xsl:apply-templates />
     </z:record>
   </xsl:template>

   <xsl:template match="meta">
     <xsl:if test='./@name = "identifier"'>
       <identifier><xsl:value-of select='./@content'/></identifier>
     </xsl:if>
   </xsl:template>

   </xsl:stylesheet>


The problem is that the meta template never matches unless I remove  
the DOCTYPE declaration from the HTML file. What am I doing wrong? How  
can I match the meta template and retain my DOCTYPE declaration?
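
One possible cause, stated as an assumption rather than a confirmed
diagnosis: the XHTML 1.0 Transitional DTD declares a fixed xmlns
attribute on the html element, so when xsltproc loads the DTD every
element ends up in the http://www.w3.org/1999/xhtml namespace, and an
unprefixed match="meta" no longer applies. A minimal sketch of a
stylesheet that matches meta with or without the DOCTYPE follows; the
prefix h is an arbitrary name chosen for illustration:

   <xsl:stylesheet
     xmlns:xsl = "http://www.w3.org/1999/XSL/Transform"
     xmlns:z   = "http://indexdata.dk/zebra/xslt/1"
     xmlns:h   = "http://www.w3.org/1999/xhtml"
     exclude-result-prefixes = "h"
     version   = "1.0">

   <xsl:template match="/">
     <z:record z:id="musings-91" z:type="update">
       <xsl:apply-templates />
     </z:record>
   </xsl:template>

   <!-- h:meta applies when the DTD puts elements in the XHTML
        namespace; plain meta applies when no namespace is present -->
   <xsl:template match="h:meta | meta">
     <xsl:if test='./@name = "identifier"'>
       <identifier><xsl:value-of select='./@content'/></identifier>
     </xsl:if>
   </xsl:template>

   </xsl:stylesheet>

If that assumption holds, another option may be to run xsltproc with
its --novalid option, which skips the DTD loading phase, and keep the
original unprefixed match.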

-- 
Eric Lease Morgan
University Libraries of Notre Dame



