<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">

<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<meta name=Generator content="Microsoft Word 12 (filtered medium)">
<!--[if !mso]>
<style>
v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style>
<![endif]-->
<style>
<!--
 /* Font Definitions */
 @font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
 /* Style Definitions */
 p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p.MsoAutoSig, li.MsoAutoSig, div.MsoAutoSig
        {mso-style-priority:99;
        mso-style-link:"E-mail Signature Char";
        margin:0in;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";}
span.E-mailSignatureChar
        {mso-style-name:"E-mail Signature Char";
        mso-style-priority:99;
        mso-style-link:"E-mail Signature";
        font-family:"Calibri","sans-serif";}
span.EmailStyle19
        {mso-style-type:personal;
        font-family:"Arial","sans-serif";
        color:windowtext;}
span.EmailStyle21
        {mso-style-type:personal-reply;
        font-family:"Calibri","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page Section1
        {size:595.3pt 841.9pt;
        margin:1.0in 1.25in 1.0in 1.25in;}
div.Section1
        {page:Section1;}
-->
</style>
<!--[if gte mso 9]><xml>
 <o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
 <o:shapelayout v:ext="edit">
  <o:idmap v:ext="edit" data="1" />
 </o:shapelayout></xml><![endif]-->
</head>

<body lang=EN-US link=blue vlink=purple>

<div class=Section1>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>No you are not searching for the holy grail.  There are several
tools that do what you are asking for.  Tidy [1] and tagsoup [2] come to mind.<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>Andy.<o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>[1] <a href="http://tidy.sourceforge.net/">http://tidy.sourceforge.net/</a><o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'>[2] <a href="http://ccil.org/~cowan/XML/tagsoup/">http://ccil.org/~cowan/XML/tagsoup/</a><o:p></o:p></span></p>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";
color:#1F497D'><o:p>&nbsp;</o:p></span></p>

<div style='border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt'>

<div>

<div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'>

<p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span
style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>
xml4lib-bounces@webjunction.org [mailto:xml4lib-bounces@webjunction.org] <b>On
Behalf Of </b>John Fitzgibbon<br>
<b>Sent:</b> Tuesday, July 15, 2008 4:47 AM<br>
<b>To:</b> xml4lib<br>
<b>Subject:</b> [XML4Lib] batch conversion of HTML files to XML<o:p></o:p></span></p>

</div>

</div>

<p class=MsoNormal><o:p>&nbsp;</o:p></p>

<p class=MsoNormal><span lang=EN-GB style='font-size:10.0pt;font-family:"Arial","sans-serif"'>Hi,<o:p></o:p></span></p>

<p class=MsoNormal><span lang=EN-GB style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span lang=EN-GB style='font-size:10.0pt;font-family:"Arial","sans-serif"'>Is
it possible to convert a folder of HTML files to XML without having to edit
each file with a text editor that supports regular expressions? In the past
this is how I accomplished this task but I am hoping there is an easier way.<o:p></o:p></span></p>

<p class=MsoNormal><span lang=EN-GB style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span lang=EN-GB style='font-size:10.0pt;font-family:"Arial","sans-serif"'>The
process would have to change tags like &lt;br&gt; to &lt;br/&gt;. Input tags in
forms would also have to be closed.<o:p></o:p></span></p>

<p class=MsoNormal><span lang=EN-GB style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span lang=EN-GB style='font-size:10.0pt;font-family:"Arial","sans-serif"'>It
may have to close tags like &lt;p&gt; and &lt;li&gt;.<o:p></o:p></span></p>

<p class=MsoNormal><span lang=EN-GB style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span lang=EN-GB style='font-size:10.0pt;font-family:"Arial","sans-serif"'>Finally,
attribute values are not necessarily bounded by quotes. For example, width=200
will have to become width=”200”.<o:p></o:p></span></p>

<p class=MsoNormal><span lang=EN-GB style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span lang=EN-GB style='font-size:10.0pt;font-family:"Arial","sans-serif"'>Am
I searching for a holy grail?<o:p></o:p></span></p>

<p class=MsoNormal><span lang=EN-GB style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span lang=EN-GB style='font-size:10.0pt;font-family:"Arial","sans-serif"'>Any
advice would be much appreciated.<o:p></o:p></span></p>

<p class=MsoNormal><span lang=EN-GB style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p>&nbsp;</o:p></span></p>

<p class=MsoNormal><span lang=EN-GB style='font-size:10.0pt;font-family:"Arial","sans-serif"'>Regards<o:p></o:p></span></p>

<p class=MsoNormal><span lang=EN-GB style='font-size:10.0pt;font-family:"Arial","sans-serif"'>Jon<o:p></o:p></span></p>

<p class=MsoNormal><span lang=EN-GB style='font-size:10.0pt;font-family:"Arial","sans-serif"'><o:p>&nbsp;</o:p></span></p>

<p class=MsoAutoSig>w: www.galwaylibrary.ie<o:p></o:p></p>

<p class=MsoAutoSig>e: info@galwaylibrary.ie<o:p></o:p></p>

<p class=MsoAutoSig>p: 00 353 91 562471<o:p></o:p></p>

<p class=MsoAutoSig>f: 00 353 91 565039<o:p></o:p></p>

<p class=MsoNormal><span lang=EN-GB><o:p>&nbsp;</o:p></span></p>

<div class=MsoNormal align=center style='text-align:center'><span lang=EN-GB>

<hr size=2 width="100%" align=center>

</span></div>

<p class=MsoNormal><span lang=EN-GB>This e-mail message has been scanned for
Contentand cleared by <strong><span style='color:#400080'>MailMarshal Hosted at
Galway County Council</span></strong> <o:p></o:p></span></p>

<div class=MsoNormal align=center style='text-align:center'><span lang=EN-GB>

<hr size=2 width="100%" align=center>

</span></div>

</div>

</div>

</body>

</html>