[XML4Lib] xml unicode difficulties

Andrew Cunningham andrewc at vicnet.net.au
Wed Jun 13 18:50:49 EDT 2007


A few different things are going on here.

jssauer at uwm.edu wrote:
> Greetings.  I am currently taking an introductory course in XML at the Univ. of
> Wisconsin/Milwaukee SOIS as part of my MLIS program.  I'm having a problem with
> code in a document meant to display foreign scripts as part of an assignment
> involving unicode and css (a very simple assignment - apologies to all the
> experts).
> The doc will display as intended from my desktop but not when I navigate to the
> URL.
> 
> Code looks like this:
> 
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/css" href="unicode.css"?>

1) You should declare the encoding in the XML declaration. If the
document is UTF-8 encoded: <?xml version="1.0" encoding="UTF-8"?>

Refer to the tutorials and articles on the W3C Internationalization 
activity pages: http://www.w3.org/International/

2) You have a UTF-16LE byte order mark (the bytes FF FE) at the
beginning of the file.

As far as I can tell you have a corrupted file. I suspect the problem 
relates to the tools you used to create the file. It looks like 
different encodings were inserted into the same file.
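
To check what a file actually starts with, something like the following
Python sketch may help. It is only an illustration: the function name
detect_bom is made up for this example, and it only looks at the first
few bytes, so it will not tell you whether the rest of the file mixes
encodings.

```python
import codecs

# Known byte order marks. The UTF-32 entries come first because the
# UTF-32LE BOM begins with the same two bytes as the UTF-16LE BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def detect_bom(data: bytes):
    """Return the name of the encoding whose BOM starts `data`, or None.

    detect_bom is a hypothetical helper written for this example.
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```

You would call it on the first bytes of the file read in binary mode,
for example detect_bom(open(path, "rb").read(4)), where path is
whatever your XML file is called.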

I'd start over again with a good Unicode text editor. A demo version of
SC Unipad (http://www.unipad.org/) would be useful; it will let you see
more clearly what's happening.

Make sure you save the XML document and CSS document as UTF-8 without
the byte order mark (BOM).
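
If your editor cannot do this, a rough Python sketch of the conversion
is below. It assumes the whole file is in one known encoding, which you
pass in yourself; the name to_utf8_no_bom is invented for this example.
It will not repair a file that has different encodings mixed together.

```python
def to_utf8_no_bom(data: bytes, source_encoding: str) -> bytes:
    """Re-encode `data` as UTF-8 and drop a leading byte order mark.

    Hypothetical helper: `source_encoding` must be the real encoding
    of the input (for example "utf-16" or "utf-8").
    """
    text = data.decode(source_encoding)
    # Some codecs leave the BOM in the decoded text as U+FEFF; remove it
    # so the UTF-8 output starts directly with the XML declaration.
    if text.startswith("\ufeff"):
        text = text[1:]
    return text.encode("utf-8")
```

Read the original file in binary mode, pass the bytes through this
function, and write the result back out in binary mode.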

Unicode is a coded character set. There are multiple character encodings
that can be used to represent Unicode data. The most common for the
web is UTF-8.
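
To make the difference concrete, here is a small Python sketch that
encodes the same text in several Unicode encodings. The function name
encode_all and the choice of encodings are just for illustration.

```python
def encode_all(text: str) -> dict:
    """Encode `text` in several Unicode encodings (illustrative helper)."""
    names = ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le")
    return {name: text.encode(name) for name in names}


# U+4E2D is a single Chinese character; the code point is the same in
# every case, only the bytes used to store it differ.
result = encode_all("\u4e2d")
for name, raw in result.items():
    print(name, raw.hex(" "))
```

The one character comes out as three bytes in UTF-8 (e4 b8 ad), two
bytes in UTF-16 (2d 4e or 4e 2d depending on byte order) and four bytes
in UTF-32, which is why a file must be read with the same encoding it
was saved in.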

Some editors have an option to save as "Unicode", which probably means
one of the UTF-16 varieties and not UTF-8.

> When I try the website an error message shows characters after the close of the
> first line - but there are none in the document. The instructor simply had us
> paste text from a website (I used google news in chinese, japanese and hindi as
> examples) and I wonder if pasting is the problem?

Andrew

