[XML4Lib] batch conversion of HTML files to XML

John Fitzgibbon jfitzgibbon at Galwaylibrary.ie
Tue Jul 15 04:47:27 EDT 2008


Hi,

Is it possible to convert a folder of HTML files to XML in one batch, without having to edit each file in a text editor that supports regular expressions? That is how I have done it in the past, but I am hoping there is an easier way.

The process would have to change tags like <br> to <br/>. Input tags in forms would also have to be closed.

It may also have to close tags like <p> and <li>, whose end tags are often left out.

Finally, attribute values are not necessarily bounded by quotes. For example, width=200 will have to become width="200".
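
To make the changes listed above concrete, here is a minimal sketch of how they might be automated using only Python's standard library (html.parser and xml.sax.saxutils). The names html_to_xml and convert_folder, the "*.htm*" file pattern, and the UTF-8 encoding are all assumptions made for illustration. The sketch closes only <p> and <li> implicitly and drops comments and the DOCTYPE, so it is a starting point rather than a complete converter.

```python
from html.parser import HTMLParser
from pathlib import Path
from xml.sax.saxutils import escape, quoteattr

# Elements that never have content; they are written as <tag/>.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}

# Simplified rule: a new <p> closes an open <p>, a new <li> closes an open <li>.
IMPLIED_CLOSE = {"p": {"p"}, "li": {"li"}}


class XmlWriter(HTMLParser):
    """Re-emits parsed HTML as well-formed markup (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # output fragments
        self.stack = []  # currently open non-void elements

    def _attrs(self, attrs):
        # Quote every value; a bare attribute such as "checked"
        # becomes checked="checked".
        return "".join(
            f" {name}={quoteattr(value if value is not None else name)}"
            for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            self.out.append(f"<{tag}{self._attrs(attrs)}/>")
            return
        if self.stack and self.stack[-1] in IMPLIED_CLOSE.get(tag, ()):
            self.out.append(f"</{self.stack.pop()}>")
        self.stack.append(tag)
        self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_endtag(self, tag):
        if tag in VOID or tag not in self.stack:
            return  # ignore stray or redundant end tags
        # Close everything left open inside the element being closed.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data))

    def close(self):
        super().close()  # flush any buffered text first
        while self.stack:
            self.out.append(f"</{self.stack.pop()}>")


def html_to_xml(html):
    writer = XmlWriter()
    writer.feed(html)
    writer.close()
    return "".join(writer.out)


def convert_folder(src, dst):
    """Convert every *.htm / *.html file in src; UTF-8 input is assumed."""
    dst = Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src).glob("*.htm*")):
        text = path.read_text(encoding="utf-8", errors="replace")
        (dst / (path.stem + ".xml")).write_text(html_to_xml(text), encoding="utf-8")
```

Calling convert_folder("html_in", "xml_out") (both folder names are placeholders) would write one .xml file per input file. The output is only a single well-formed XML document if each input has one root element such as <html>.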

Am I searching for a holy grail?

Any advice would be much appreciated.

Regards
Jon


w: www.galwaylibrary.ie

e: info at galwaylibrary.ie

p: 00 353 91 562471

f: 00 353 91 565039



