[XML4Lib] batch conversion of HTML files to XML
Houghton,Andrew
houghtoa at oclc.org
Tue Jul 15 09:25:19 EDT 2008
No, you are not searching for the holy grail. Several tools do what you are asking for; Tidy [1] and TagSoup [2] come to mind.
Andy.
[1] http://tidy.sourceforge.net/
[2] http://ccil.org/~cowan/XML/tagsoup/
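For a whole folder, a small script can call Tidy once per file. What follows is only a sketch, assuming the `tidy` executable is installed and on the PATH. The function names `tidy_command` and `convert_folder`, the `.xml` output extension, and the choice of options are illustrative assumptions, not anything prescribed by Tidy itself.

```python
import subprocess
from pathlib import Path


def tidy_command(src: Path, dst: Path) -> list[str]:
    """Build the HTML Tidy command line for one file (hypothetical helper)."""
    return [
        "tidy",
        "-asxml",    # convert HTML to well-formed XHTML
        "-quiet",    # suppress nonessential output
        "-numeric",  # write numeric rather than named entities
        "-o", str(dst),
        str(src),
    ]


def convert_folder(in_dir: str, out_dir: str) -> list[Path]:
    """Run Tidy on every .html/.htm file in in_dir.

    Returns the source files for which Tidy reported errors.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    failed = []
    for src in sorted(Path(in_dir).iterdir()):
        if src.suffix.lower() not in {".html", ".htm"}:
            continue
        dst = out / (src.stem + ".xml")
        result = subprocess.run(tidy_command(src, dst))
        # Tidy exits with 0 on success, 1 if there were warnings, 2 on errors.
        if result.returncode >= 2:
            failed.append(src)
    return failed
```

Calling `convert_folder("html", "xml")` would then write one converted file per input and return the list of inputs that need a closer look. Writing to a separate output folder leaves the originals untouched in case a conversion goes wrong.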
From: xml4lib-bounces at webjunction.org [mailto:xml4lib-bounces at webjunction.org] On Behalf Of John Fitzgibbon
Sent: Tuesday, July 15, 2008 4:47 AM
To: xml4lib
Subject: [XML4Lib] batch conversion of HTML files to XML
Hi,
Is it possible to convert a folder of HTML files to XML without having to edit each file with a text editor that supports regular expressions? In the past, this is how I accomplished the task, but I am hoping there is an easier way.
The process would have to change tags like <br> to <br/>. Input tags in forms would also have to be closed.
It may have to close tags like <p> and <li>.
Finally, attribute values are not necessarily bounded by quotes. For example, width=200 will have to become width="200".
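The three requirements above (self-closing empty tags, closing <p> and <li>, quoting attribute values) can be illustrated with a rough sketch built on Python's standard `html.parser` module. This is an assumption-laden illustration, not a replacement for a dedicated tool: the names `XmlWriter` and `html_to_xml` are hypothetical, comments and doctypes are simply dropped, and an open <p> or <li> is closed only when the next tag of the same name follows directly inside it.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have content and must be written as <tag/>.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# Opening one of these closes an already open element of the same name.
AUTO_CLOSE = {"p", "li"}


class XmlWriter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # output fragments
        self.stack = []  # currently open elements

    def _open(self, tag, attrs, empty):
        parts = [tag]
        for name, value in attrs:
            # A valueless attribute such as "checked" becomes checked="checked".
            value = name if value is None else value
            parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + ("/>" if empty else ">"))

    def handle_starttag(self, tag, attrs):
        if tag in AUTO_CLOSE and self.stack and self.stack[-1] == tag:
            self.handle_endtag(tag)
        if tag in VOID:
            self._open(tag, attrs, empty=True)
        else:
            self._open(tag, attrs, empty=False)
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        self._open(tag, attrs, empty=True)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag: ignore it
        # Close every element opened since the matching start tag.
        while self.stack:
            top = self.stack.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def close(self):
        super().close()
        # Close whatever is still open at end of input.
        while self.stack:
            self.out.append(f"</{self.stack.pop()}>")


def html_to_xml(html):
    writer = XmlWriter()
    writer.feed(html)
    writer.close()
    return "".join(writer.out)
```

For example, `html_to_xml('<p>one<br><p>two<input type=text width=200>')` returns `<p>one<br/></p><p>two<input type="text" width="200"/></p>`.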
Am I searching for a holy grail?
Any advice would be much appreciated.
Regards
Jon
w: www.galwaylibrary.ie
e: info at galwaylibrary.ie
p: 00 353 91 562471
f: 00 353 91 565039