[XML4Lib] batch conversion of HTML files to XML

David Kane dkane at wit.ie
Tue Jul 15 06:21:02 EDT 2008


Hi John,

I suggest HTML Tidy (htmltidy), a utility that does just what you want.  It
converts HTML to XHTML, closing tags and quoting attribute values as it goes.

Search for "htmltidy batch" and you should find what you need.
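
As a rough illustration of the batch idea, here is a minimal Python sketch
that runs the tidy command over a folder.  It assumes tidy is installed and
on the PATH.  The function names, the *.html pattern and the .xhtml output
extension are my own choices for the example, not anything from Tidy itself.
The flags used (-q, -asxhtml, -o) are standard Tidy command-line options.

```python
import subprocess
from pathlib import Path


def build_tidy_command(src, dst):
    """Return the argument list that converts one HTML file to XHTML."""
    return [
        "tidy",
        "-q",            # quiet: suppress non-essential output
        "-asxhtml",      # convert HTML to well-formed XHTML
        "-o", str(dst),  # write the result to this file
        str(src),
    ]


def convert_folder(in_dir, out_dir):
    """Run tidy on every *.html file in in_dir; return names that failed."""
    in_dir, out_dir = Path(in_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failed = []
    for src in sorted(in_dir.glob("*.html")):
        dst = out_dir / (src.stem + ".xhtml")
        result = subprocess.run(build_tidy_command(src, dst))
        # tidy exits with 0 (clean), 1 (warnings) or 2 (errors)
        if result.returncode >= 2:
            failed.append(src.name)
    return failed
```

Calling convert_folder("html_in", "xml_out") (both folder names are
placeholders) leaves the originals untouched and returns the list of files
that Tidy reported errors for, so those can be checked by hand.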

Best,

David.

2008/7/15 John Fitzgibbon <jfitzgibbon at galwaylibrary.ie>:

> Hi,
>
> Is it possible to convert a folder of HTML files to XML without having to
> edit each file with a text editor that supports regular expressions? In the
> past this is how I accomplished this task but I am hoping there is an easier
> way.
>
> The process would have to change tags like <br> to <br/>. Input tags in
> forms would also have to be closed.
>
> It may have to close tags like <p> and <li>.
>
> Finally, attribute values are not necessarily bounded by quotes. For
> example, width=200 will have to become width="200".
>
> Am I searching for a holy grail?
>
> Any advice would be much appreciated.
>
> Regards
>
> Jon
>
> w: www.galwaylibrary.ie
> e: info at galwaylibrary.ie
> p: 00 353 91 562471
> f: 00 353 91 565039
>
> _______________________________________________
> XML4Lib mailing list
> XML4Lib at webjunction.org
> http://lists.webjunction.org/mailman/listinfo/xml4lib
>
>
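To make the changes listed above concrete (closing <br> and <input>, turning
width=200 into width="200"), here is a small sketch using only Python's
standard html.parser module.  This is not how Tidy works and not something
either of us has used for this job.  The names VOID_TAGS, XmlishSerializer
and to_xmlish are made up for the example.  It does not add missing </p> or
</li> tags, and it drops comments and the doctype, so Tidy remains the better
tool for real files.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have content and must be written as <tag/> in XML.
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class XmlishSerializer(HTMLParser):
    """Re-emit parsed HTML with quoted attributes and closed void tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def _attrs(self, attrs):
        out = []
        for name, value in attrs:
            # A bare attribute such as "checked" becomes checked="checked".
            value = name if value is None else value
            out.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(out)

    def handle_starttag(self, tag, attrs):
        close = "/" if tag in VOID_TAGS else ""
        self.parts.append(f"<{tag}{self._attrs(attrs)}{close}>")

    def handle_startendtag(self, tag, attrs):
        self.parts.append(f"<{tag}{self._attrs(attrs)}/>")

    def handle_endtag(self, tag):
        # Void tags were already closed when they were opened.
        if tag not in VOID_TAGS:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(escape(data, quote=False))


def to_xmlish(html_text):
    """Return html_text with void tags closed and attribute values quoted."""
    parser = XmlishSerializer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts)
```

For example, to_xmlish('<td width=200>a<br>b</td>') gives
'<td width="200">a<br/>b</td>'.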


-- 
David Kane
Systems Librarian
Waterford Institute of Technology
http://library.wit.ie/
T: ++353.51302838
M: ++353.876693212