Is it possible to re-encode a document received from oxf:request-generator? The web site I am accessing using this processor specifies @content="text/html; charset=windows-1250" for the pages it serves. Later in my XPL I am processing the tidied-up HTML, doing some string comparisons using the XSLT processor, and because the same string is encoded differently (windows-1250 versus UTF-8), my comparison logic does not work.

The documentation for the oxf:url-generator does not seem to suggest re-encoding is possible using this processor (likely I do not understand the instructions). If this is the case, what are my options?

Thanks
A.

--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
ObjectWeb mailing lists service home page: http://www.objectweb.org/wws
Alexander Zatko wrote:
> Is it possible to re-encode a document received from
> oxf:request-generator?

Do you really mean oxf:request, or oxf:url-generator?

-Erik

--
Orbeon Forms - Web Forms for the Enterprise Done the Right Way
http://www.orbeon.com/
Sorry - I meant to say url-generator
A.

On May 29, 2007, at 4:51 AM, Erik Bruchez wrote:
> Do you really mean oxf:request, or oxf:url-generator?
Alex,
You are retrieving a page from a server with the url-generator. That page is served in the windows-1250 encoding. Then you compare something in this page with something else you have in an XML file encoded in UTF-8. And even though the "something" and the "something else" are the same, the comparison fails. Is this understanding correct?

If it is, this is surprising, because once XML has been parsed, it is all in Unicode, and the original encoding should not matter.

Alex

On 5/29/07, Alexander Zatko <[hidden email]> wrote:
> Sorry - I meant to say url-generator
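To illustrate the point about parsed XML being Unicode, here is a minimal XSLT sketch. The `page` and `title` element names and the word "Košice" are hypothetical and only stand in for whatever is actually being compared. The literal is written with a character reference, so the stylesheet file's own encoding plays no role. If the fetched page was decoded correctly, the test should be true whether the source was served as windows-1250 or UTF-8. If it comes out false, the bytes were probably decoded with the wrong charset somewhere upstream.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input: <page><title>Košice</title></page>, in any encoding -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <result>
            <!-- &#x161; is U+0161 (s with caron); the XML parser resolves it
                 before XPath compares the two strings as Unicode -->
            <xsl:value-of select="string(/page/title) = 'Ko&#x161;ice'"/>
        </result>
    </xsl:template>
</xsl:stylesheet>
```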
Well, in this case the HTML is "tidied-up", and apparently with the wrong encoding. Tidying supports utf-8 and iso-8859-1 at least.

If you use this in the config, does it get better:

    <encoding>iso-8859-1</encoding>
    <force-encoding>true</force-encoding>

(I suspect that this may not help...)

-Erik
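For readers following along, Erik's two elements would go inside the config input of the url-generator. This is a minimal sketch, assuming a typical XPL pipeline: the URL, the output id, and the content-type element are placeholders and assumptions that do not come from this thread, so check them against the url-generator documentation for your version.

```xml
<p:processor name="oxf:url-generator"
             xmlns:p="http://www.orbeon.com/oxf/pipeline"
             xmlns:oxf="http://www.orbeon.com/oxf/processors">
    <p:input name="config">
        <config>
            <!-- Placeholder URL: substitute the page being fetched -->
            <url>http://www.example.com/page.html</url>
            <!-- Assumption: text/html makes the generator tidy the page into XML -->
            <content-type>text/html</content-type>
            <!-- Erik's suggestion from the message above -->
            <encoding>iso-8859-1</encoding>
            <force-encoding>true</force-encoding>
        </config>
    </p:input>
    <!-- Hypothetical output id, referenced by later pipeline steps -->
    <p:output name="data" id="page"/>
</p:processor>
```

Since the page in question is served as windows-1250, one might also try that value in the encoding element, though nothing in this thread confirms that the tidying step accepts it.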
Thank you (both) for your input. I will do some more investigation into this tomorrow. To answer Alex - yes, you described the case correctly, but I have to double check that what I wrote is indeed the case. I will try to do what Erik suggested and try to simplify my setup to figure out what's up.

A.

On May 29, 2007, at 6:59 PM, Erik Bruchez wrote:
> Well in this case the HTML is "tidied-up", and apparently with the
> wrong encoding. Tidying supports utf-8 and iso-8859-1 at least.
>
> If you use this in the config, does it get better:
>
> <encoding>iso-8859-1</encoding>
> <force-encoding>true</force-encoding>
>
> (I suspect that this may not help...)
I finally found time to dig into this issue and found out that the problem was in the MySQL JDBC driver. I guess it is not using UTF-8 by default. After I "instructed" it to do so (see below), the data is written correctly:

    <datasource>
        <driver-class-name>com.mysql.jdbc.Driver</driver-class-name>
        <uri>jdbc:mysql://localhost:3306/test?useUnicode=true&amp;characterEncoding=UTF-8</uri>
        <username>mysql</username>
        <password>JksY28*</password>
    </datasource>

Note that inside the XML config the ampersand in the URI has to be written as &amp;.

A.
Alex,
Thanks, that's useful information.

-Erik