[XML4Lib] batch conversion of HTML files to XML
Conal Tuohy
conal.tuohy at vuw.ac.nz
Tue Jul 15 18:24:18 EDT 2008
Chiming in with one more option: JTidy, a Java port of HTML Tidy, which
can clean up HTML and write it out as well-formed XML:
http://jtidy.sourceforge.net/
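To give a rough idea of the repairs such a tool performs, here is a minimal sketch in Python using only the standard library's html.parser. It is not JTidy and is nowhere near as thorough. The names html_to_xml and convert_folder are made up for illustration. The sketch assumes the input files are UTF-8 and have a single root element, and it simply drops comments and doctypes.

```python
from html.parser import HTMLParser
from pathlib import Path
from xml.sax.saxutils import escape, quoteattr

# Elements that never have content in HTML; written out as <br/> and so on.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta"}
# Elements whose end tag is often omitted; a new one closes the open one.
AUTO_CLOSE = {"p", "li"}


class HtmlToXml(HTMLParser):
    """Hypothetical helper: re-serialises tag soup as well-formed markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the output document
        self.stack = []  # names of the elements currently open

    def handle_starttag(self, tag, attrs):
        # Close an unclosed <p> or <li> when a sibling of the same kind starts.
        if tag in AUTO_CLOSE and self.stack and self.stack[-1] == tag:
            self.out.append(f"</{self.stack.pop()}>")
        # quoteattr adds the quotes; valueless attributes get their own name.
        attr_text = "".join(
            f" {name}={quoteattr(value if value is not None else name)}"
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID or tag not in self.stack:
            return  # redundant or stray end tag: ignore it
        # Close everything still open inside the element being closed.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))  # re-escape &, < and >


def html_to_xml(html):
    parser = HtmlToXml()
    parser.feed(html)
    parser.close()  # flush any buffered text
    while parser.stack:  # close whatever is still open at end of input
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)


def convert_folder(src, dest):
    """Convert every *.html file in src to a .xml file in dest (UTF-8 assumed)."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src).glob("*.html")):
        xml = html_to_xml(path.read_text(encoding="utf-8"))
        (dest_dir / (path.stem + ".xml")).write_text(xml, encoding="utf-8")
```

With this sketch, html_to_xml('<ul><li>a<li>b<br><img src=x.png width=200></ul>') gives '<ul><li>a</li><li>b<br/><img src="x.png" width="200"/></li></ul>'. Real-world pages have many more quirks than this handles, which is the reason to reach for a proper tool.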
On Tue, 2008-07-15 at 09:47 +0100, John Fitzgibbon wrote:
> Hi,
>
> Is it possible to convert a folder of HTML files to XML without having
> to edit each file with a text editor that supports regular
> expressions? In the past this is how I accomplished this task but I am
> hoping there is an easier way.
>
> The process would have to change tags like <br> to <br/>. Input tags
> in forms would also have to be closed.
>
> It may have to close tags like <p> and <li>.
>
> Finally, attribute values are not necessarily bounded by quotes. For
> example, width=200 will have to become width="200".
>
> Am I searching for a holy grail?
>
> Any advice would be much appreciated.
>
> Regards
>
> Jon
>
> w: www.galwaylibrary.ie
> e: info at galwaylibrary.ie
> p: 00 353 91 562471
> f: 00 353 91 565039
--
Conal Tuohy
New Zealand Electronic Text Centre
www.nzetc.org