[XML4Lib] batch conversion of HTML files to XML

Conal Tuohy conal.tuohy at vuw.ac.nz
Tue Jul 15 18:24:18 EDT 2008


Chiming in with one more option: JTidy, a Java port of HTML Tidy, which can write its cleaned-up output as XHTML or XML.

http://jtidy.sourceforge.net/
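
For anyone curious what this kind of clean-up involves, here is a minimal sketch in Python. To be clear, this is not JTidy and not how JTidy works internally; it is only an illustration, using the standard library's html.parser, of the fixes listed in the question below (self-closing <br> and <input>, closing <p> and <li>, quoting attribute values). The names XmlWriter, html_to_xml and convert_folder are made up for this example, and it assumes UTF-8 input files ending in .html. It drops comments and doctypes and will not cope with badly nested markup the way Tidy does.

```python
from html.parser import HTMLParser
from pathlib import Path
from xml.sax.saxutils import escape, quoteattr

# Elements that never have content; written as <br/>, <input .../>, etc.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}
# Elements whose end tag is often omitted; a new one closes the previous one.
IMPLIED_END = {"p", "li"}


class XmlWriter(HTMLParser):
    """Hypothetical helper: re-serialises parsed HTML as XML-style markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # output fragments
        self.stack = []  # currently open elements

    def handle_starttag(self, tag, attrs):
        # <p>one<p>two  ->  close the first <p> before opening the second.
        if tag in IMPLIED_END and self.stack and self.stack[-1] == tag:
            self.handle_endtag(tag)
        # width=200 -> width="200"; valueless "checked" -> checked="checked".
        attr_text = "".join(
            f" {name}={quoteattr(name if value is None else value)}"
            for name, value in attrs)
        if tag in VOID:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Ignore end tags for void elements and stray end tags.
        if tag in VOID or tag not in self.stack:
            return
        # Close everything still open inside the element being closed.
        while True:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Character references were already decoded; re-escape &, < and >.
        self.out.append(escape(data))


def html_to_xml(html):
    writer = XmlWriter()
    writer.feed(html)
    writer.close()
    # Close whatever the source left open at end of file.
    while writer.stack:
        writer.out.append(f"</{writer.stack.pop()}>")
    return "".join(writer.out)


def convert_folder(src, dst):
    """Convert every *.html file in src to a .xml file of the same name in dst."""
    dst = Path(dst)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src).glob("*.html")):
        html = path.read_text(encoding="utf-8", errors="replace")
        (dst / path.with_suffix(".xml").name).write_text(
            html_to_xml(html), encoding="utf-8")
```

Calling convert_folder("html_in", "xml_out") (both folder names are placeholders) would process a whole directory in one go. For real-world pages a dedicated tool such as JTidy is likely the safer choice, since it handles far more kinds of broken markup than this sketch.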

On Tue, 2008-07-15 at 09:47 +0100, John Fitzgibbon wrote:
> Hi,
> 
>  
> 
> Is it possible to convert a folder of HTML files to XML without having
> to edit each file with a text editor that supports regular
> expressions? In the past this is how I accomplished this task but I am
> hoping there is an easier way.
> 
>  
> 
> The process would have to change tags like <br> to <br/>. Input tags
> in forms would also have to be closed.
> 
>  
> 
> It may have to close tags like <p> and <li>.
> 
>  
> 
> Finally, attribute values are not necessarily bounded by quotes. For
> example, width=200 will have to become width="200".
> 
>  
> 
> Am I searching for a holy grail?
> 
>  
> 
> Any advice would be much appreciated.
> 
>  
> 
> Regards
> 
> Jon
> 
>  
> 
> w: www.galwaylibrary.ie
> 
> e: info at galwaylibrary.ie
> 
> p: 00 353 91 562471
> 
> f: 00 353 91 565039
> 
>  
> 
> 
> 
-- 
Conal Tuohy
New Zealand Electronic Text Centre
www.nzetc.org




