Character entities versus UTF-8 encoded special characters: are they illegal HTML characters?

Zolta
Greetings all,
I was playing in the sandbox, trying to extract data from a web page by running XQuery against it.
  1. The web page was encoded as ISO-8859-2. I need my national characters, so I decided to convert it to UTF-8.
  2. I thought W3C's Amaya was standard enough for this task, so I opened the page in Amaya and saved it as UTF-8.
  3. Well, not really. Orbeon is strict and says:
       Illegal HTML character: decimal 145
  4. I did some research, like:
    let $guessifok := (144, 145, 146)
    return
    <p>
    { codepoints-to-string($guessifok) }
    </p>

  5. OPS doesn't accept UTF-8 encoded control characters this way, am I right?
  6. Would it be a bad idea to make OPS silently ignore these characters, as (I guess) browsers do in some cases?

Can somebody please suggest a workaround besides
http://www.unicodetools.com/unicode/convert-to-html.php



I'm too lazy or too busy to write more than a short script for this.
Any good hints or links?
Thanks for your time and your effort
--Zolta



--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
OW2 mailing lists service home page: http://www.ow2.org/wws

Re: Character entities versus UTF-8 encoded special characters: are they illegal HTML characters?

Hank Ratzesberger
Hi Zolta,

I don't find printable characters at those code points in ISO-8859-2 or
in Unicode, where 128-159 are control codes. I do find them in
Windows-1252. So indeed, they are not valid characters for a UTF-8 page.

[ ]  144  09/00  220  90  (UNDEFINED)
[‘]  145  09/01  221  91  HIGH 6 SINGLE QUOTE
[’]  146  09/02  222  92  HIGH 9 SINGLE QUOTE
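
The table can be cross-checked with a few lines of Python. This is only a sketch: Python is not used elsewhere in this thread, and the helper name `describe` is made up for illustration. It reads each value once as a Windows-1252 byte and once directly as a Unicode code point.

```python
import unicodedata

def describe(value):
    """Return (Windows-1252 reading, Unicode general category) for 0-255.

    `describe` is a hypothetical helper, not part of OPS or any library.
    """
    try:
        ch = bytes([value]).decode("cp1252")
        as_cp1252 = f"U+{ord(ch):04X} {unicodedata.name(ch)}"
    except UnicodeDecodeError:
        # Python's cp1252 codec leaves a few byte values unassigned.
        as_cp1252 = "undefined"
    # Read directly as a code point, 128-159 are C1 controls ("Cc").
    as_unicode = unicodedata.category(chr(value))
    return as_cp1252, as_unicode

for value in (144, 145, 146):
    print(value, describe(value))
```

Read as Windows-1252 bytes, 145 and 146 are the curly single quotes. Read as Unicode code points, all three are control characters, which would explain why a strict HTML tool complains about them.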

OPS is using Java XML libraries which parse the text.  I can't
think of a workaround to that ... besides some program that will
produce correct utf-8.
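
A short script along those lines might look like the sketch below. It assumes the stray bytes in the ISO-8859-2 page really came from a Windows code page (Windows-1252 here, following the table above). The function names `repair_c1` and `latin2_to_utf8` are hypothetical, and nothing in it is specific to OPS.

```python
def repair_c1(text, codepage="cp1252"):
    """Replace C1 control characters (U+0080-U+009F) with the character
    that the same byte value has in a Windows code page.

    Assumption: the controls are mis-decoded Windows bytes. Values that
    the code page leaves undefined (e.g. 0x90 in cp1252) are dropped.
    """
    out = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0x9F:
            try:
                out.append(bytes([ord(ch)]).decode(codepage))
            except UnicodeDecodeError:
                pass  # no meaning in this code page: drop it
        else:
            out.append(ch)
    return "".join(out)

def latin2_to_utf8(raw):
    """Decode ISO-8859-2 bytes, repair stray C1 characters, encode UTF-8."""
    return repair_c1(raw.decode("iso-8859-2")).encode("utf-8")

if __name__ == "__main__":
    # 0x91 and 0x92 are the stray quotes, 0xF5 is Latin-2 "ő".
    print(latin2_to_utf8(b"\x91\xf5\x92").decode("utf-8"))
```

Running this over the original ISO-8859-2 file, rather than over Amaya's output, avoids having to guess what the editor did during its own conversion.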

--Hank


On Mar 2, 2009, at 4:39 PM, Baráti Zoltán wrote:

> [original message, see above]

Hank Ratzesberger
NEES@UCSB
Institute for Crustal Studies,
University of California, Santa Barbara
805-893-8042







Re: Character entities versus UTF-8 encoded special characters: are they illegal HTML characters?

Erik Bruchez
Administrator
In reply to this post by Zolta
Where does the error come from? The XML parser? Or JTidy? Something else?

-Erik

--
Orbeon Forms - Web Forms for the Enterprise Done the Right Way
http://www.orbeon.com/


