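Greetings All,
I was playing in the sandbox, trying to extract data from a webpage by running XQuery against it.
• The webpage was ISO-8859-2. I decided I needed my national characters, so let's convert it to UTF-8.
• I thought W3C's Amaya was standard enough for this task, so I opened the webpage in Amaya and saved it as UTF-8.
• Well, no, not really. Orbeon is strict; it says:
  Illegal HTML character: decimal 145
• I did some research like:
  let $guessifok := (144, 145, 146)
  return
    <p>
      { codepoints-to-string($guessifok) }
    </p>
• OPS doesn't accept UTF-8-encoded control characters this way, am I right?
• Is it a bad idea to make OPS silently ignore these characters, as (I guess) browsers do in some cases?
Could somebody suggest any workaround besides http://www.unicodetools.com/unicode/convert-to-html.php? I'm too lazy or too busy to write more than a short script for this. Any good hints or links?
Thanks for your time and your effort
--Zolta

-- 
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
OW2 mailing lists service home page: http://www.ow2.org/wws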
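A minimal sketch of the kind of short script asked for above, in Python. It assumes the stray code points 128-159 in the ISO-8859-2 page were really meant as Windows-1252 punctuation (0x91/0x92 are the curly single quotes there); bytes that Windows-1252 leaves undefined, such as 0x90, are simply dropped, i.e. the "silently ignore" behaviour. The names `repair_c1`, `decode_latin2_page` and `convert_file` are hypothetical, not part of OPS or any library.

```python
import re

# C1 control range U+0080..U+009F.
_C1_CONTROLS = re.compile("[\u0080-\u009f]")


def repair_c1(text: str) -> str:
    """Reinterpret each C1 control character as a Windows-1252 byte.

    Assumption: the stray characters came from Windows-1252 punctuation.
    Bytes Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D)
    are dropped silently.
    """
    def _reinterpret(match):
        raw = bytes([ord(match.group())])
        try:
            return raw.decode("cp1252")
        except UnicodeDecodeError:
            return ""  # undefined in Windows-1252: ignore it
    return _C1_CONTROLS.sub(_reinterpret, text)


def decode_latin2_page(raw: bytes) -> str:
    """Decode ISO-8859-2 bytes, then repair the C1 range."""
    # Python's iso-8859-2 codec maps bytes 0x80..0x9F to U+0080..U+009F.
    return repair_c1(raw.decode("iso-8859-2"))


def convert_file(src: str, dst: str) -> None:
    """Hypothetical helper: read an ISO-8859-2 page, write clean UTF-8."""
    with open(src, "rb") as f:
        text = decode_latin2_page(f.read())
    with open(dst, "w", encoding="utf-8") as f:
        f.write(text)
```

Hungarian letters such as ő (0xF5) and ű (0xFB) pass through unchanged, because only the 0x80-0x9F range is reinterpreted. For a file that Amaya already saved as UTF-8, reading it with `encoding="utf-8"` and passing the text to `repair_c1` should give the same result.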
Hi Zolta,
I don't find those code points in ISO-8859-2 or UTF-8. I do find them in Windows-1252. So indeed, they are not printable characters in UTF-8: as Unicode code points, 144 to 146 are C1 control characters.

Char  Dec  Col/Row  Oct  Hex  Windows-1252 name
[ ]   144  09/00    220  90   (UNDEFINED)
[‘]   145  09/01    221  91   HIGH 6 SINGLE QUOTE
[’]   146  09/02    222  92   HIGH 9 SINGLE QUOTE

OPS is using Java XML libraries which parse the text. I can't think of a workaround for that, besides some program that will produce correct UTF-8.
--Hank

On Mar 2, 2009, at 4:39 PM, Baráti Zoltán wrote:
> Greetings All,
> I was playing in the sandbox, trying to extract data from a webpage by running XQuery against it.
> The webpage was ISO-8859-2. I decided I needed my national characters, so let's convert it to UTF-8.
> I thought W3C's Amaya was standard enough for this task, so I opened the webpage in Amaya and saved it as UTF-8.
> Well, no, not really. Orbeon is strict; it says:
>   Illegal HTML character: decimal 145
> I did some research like:
>   let $guessifok := (144, 145, 146)
>   return
>     <p>
>       { codepoints-to-string($guessifok) }
>     </p>
>
> OPS doesn't accept UTF-8-encoded control characters this way, am I right?
> Is it a bad idea to make OPS silently ignore these characters, as (I guess) browsers do in some cases?
> Could somebody suggest any workaround besides http://www.unicodetools.com/unicode/convert-to-html.php?
> I'm too lazy or too busy to write more than a short script for this.
> Any good hints or links?
> Thanks for your time and your effort
> --Zolta

NEES@UCSB
Institute for Crustal Studies, University of California, Santa Barbara
805-893-8042
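The code chart in Hank's reply can be reproduced with Python's standard codecs, which also puts the two readings side by side: under ISO-8859-2 the bytes 144-146 decode to C1 control characters, while under Windows-1252 145 and 146 are the curly single quotes and 144 is unassigned. `describe_byte` is a hypothetical helper; its names come from the Unicode character database (LEFT/RIGHT SINGLE QUOTATION MARK) rather than the informal "HIGH 6/HIGH 9" labels.

```python
import unicodedata


def describe_byte(value: int) -> dict:
    """Describe one byte the way the code chart does."""
    raw = bytes([value])
    # Under ISO-8859-2 (as under Latin-1) the byte is a C1 control character.
    latin2 = raw.decode("iso-8859-2")
    try:
        cp1252_name = unicodedata.name(raw.decode("cp1252"))
    except UnicodeDecodeError:
        cp1252_name = "(UNDEFINED)"  # Windows-1252 leaves this byte unused
    return {
        "dec": value,
        "col/row": f"{value >> 4:02d}/{value & 0xF:02d}",
        "oct": f"{value:o}",
        "hex": f"{value:X}",
        "iso-8859-2": f"U+{ord(latin2):04X}",
        "windows-1252": cp1252_name,
    }


for value in (144, 145, 146):
    print(describe_byte(value))
```

Each printed row should match the chart above, plus the ISO-8859-2 reading (U+0090 to U+0092).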
In reply to this post by Zolta
Where does the error come from? The XML parser? Or JTidy? Something else?
-Erik

Orbeon Forms - Web Forms for the Enterprise Done the Right Way
http://www.orbeon.com/
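One way to narrow down Erik's question, sketched in Python rather than in the Java stack OPS uses: XML 1.0's Char production includes [#x20-#xD7FF], so a conforming XML parser should accept `&#145;` without complaint. To our understanding, it is the HTML output method of XSLT/XQuery serialization that refuses code points 0x7F-0x9F (serialization error SERE0014), which would fit the wording "Illegal HTML character". `find_html_illegal` is a hypothetical helper for locating such characters before output.

```python
import xml.etree.ElementTree as ET

# XML 1.0 allows C1 controls, so the parser accepts the character reference.
parsed = ET.fromstring("<p>&#145;</p>")


def find_html_illegal(text: str) -> list:
    """Return (offset, code point) pairs in the 0x7F-0x9F range.

    Assumption: these are the characters the HTML output method rejects.
    """
    return [(i, ord(ch)) for i, ch in enumerate(text) if 0x7F <= ord(ch) <= 0x9F]


print(repr(parsed.text))
print(find_html_illegal(parsed.text))
```

If the parse succeeds but the serialized output fails, the XML parser is probably not the culprit.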