encoding question

encoding question

Alexander Žaťko
Is it possible to re-encode a document received from
oxf:request-generator? The web site I am accessing with this processor
specifies @content="text/html; charset=windows-1250" for the pages it
serves. Later in my XPL I process the tidied-up HTML and do some
string comparisons with the XSLT processor. Because the same string is
encoded differently (windows-1250 versus UTF-8), my comparison logic
does not work.

The documentation for the oxf:url-generator does not seem to suggest  
re-encoding is possible using this processor (likely I do not  
understand the instructions). If this is the case, what are my options?

Thanks

A.



--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
ObjectWeb mailing lists service home page: http://www.objectweb.org/wws

Re: encoding question

Erik Bruchez
Administrator
Do you really mean oxf:request, or oxf:url-generator?

-Erik

--
Orbeon Forms - Web Forms for the Enterprise Done the Right Way
http://www.orbeon.com/



Re: encoding question

Alexander Žaťko
Sorry - I meant to say url-generator

A.

On May 29, 2007, at 4:51 AM, Erik Bruchez wrote:

> Do you really mean oxf:request, or oxf:url-generator?

Re: encoding question

Alessandro Vernet
Administrator
Alex,

You are using the url-generator to retrieve a page from a server. That
page is served in the windows-1250 encoding. Then you compare
something in this page with something else you have in an XML file
encoded in UTF-8. And even though the "something" and the "something
else" are the same, the comparison fails. Is this understanding
correct?

If it is, this is surprising, because once XML has been parsed, it is
all in Unicode, and the original encoding should not matter.
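
A minimal Python sketch of the point above, for illustration only. The
sample string "Žaťko" is an assumption and is not taken from the site in
question; "cp1250" is Python's codec name for windows-1250.

```python
# Sample text; an assumption chosen for illustration.
text = "Žaťko"

# The same characters serialized in two different encodings.
as_cp1250 = text.encode("cp1250")  # windows-1250
as_utf8 = text.encode("utf-8")

# The raw bytes differ between the two encodings...
bytes_differ = as_cp1250 != as_utf8

# ...but once each byte sequence is decoded with its own encoding,
# both yield the same Unicode string, so a comparison succeeds.
decoded_equal = as_cp1250.decode("cp1250") == as_utf8.decode("utf-8")

print(bytes_differ, decoded_equal)
```

So if a comparison still fails after parsing, it is likely that some step
decoded or wrote the bytes with the wrong encoding.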

Alex


--
Orbeon Forms - Web 2.0 Forms, open-source, for the Enterprise
http://www.orbeon.com/




Re: encoding question

Erik Bruchez
Administrator
Well, in this case the HTML is "tidied-up", apparently with the wrong
encoding. Tidying supports utf-8 and iso-8859-1 at least.

Does it get better if you use this in the config?

<encoding>iso-8859-1</encoding>
<force-encoding>true</force-encoding>

(I suspect that this may not help...)
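
A rough Python sketch of why forcing iso-8859-1 may not help when the
bytes are really windows-1250. It says nothing about how the Tidy step in
url-generator behaves; it only shows, in general, what reading
windows-1250 bytes as iso-8859-1 does. The sample string is an assumption.

```python
# Sample text; an assumption chosen for illustration.
text = "Žaťko"

# Bytes as a windows-1250 server would send them ("cp1250" in Python).
raw = text.encode("cp1250")

# iso-8859-1 maps every byte to some code point, so decoding never fails,
# but the bytes for the accented letters map to different characters.
misread = raw.decode("iso-8859-1")

# Decoding with the encoding the server actually used restores the text.
correct = raw.decode("cp1250")

print(misread == text, correct == text)
```

The plain ASCII letters survive the wrong decoding, which is why this
kind of problem tends to show up only on accented characters.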

-Erik


Re: encoding question

Alexander Žaťko
Thank you (both) for your input. I will do some more investigation  
into this tomorrow. To answer Alex - yes, you described the case  
correctly, but I have to double check that what I wrote is indeed the  
case. I will try to do what Erik suggested and try to simplify my  
setup to figure out what's up.

A.



Re: encoding question

Alexander Žaťko
In reply to this post by Erik Bruchez
I finally found time to dig into this issue and found out that the problem was in the MySQL JDBC driver. I guess it is not using UTF-8 by default. After I "instructed" it to do so (see below), the data is written correctly:

     <datasource>
       <driver-class-name>com.mysql.jdbc.Driver</driver-class-name>
       <uri>jdbc:mysql://localhost:3306/test?useUnicode=true&amp;characterEncoding=UTF-8</uri>
       <username>mysql</username>
       <password>JksY28*</password>
     </datasource>
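
A small Python sketch of how the URI above is put together, for
illustration only. The two parameters come from the config above; the
point is that the "&" between them has to be written as "&amp;" inside an
XML file. The variable names are hypothetical.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Connection parameters taken from the datasource config above.
params = {"useUnicode": "true", "characterEncoding": "UTF-8"}

# Plain JDBC URL, as the driver itself expects it.
jdbc_url = "jdbc:mysql://localhost:3306/test?" + urlencode(params)

# The same URL escaped for use as XML element content ("&" becomes "&amp;").
xml_value = escape(jdbc_url)

print(jdbc_url)
print(xml_value)
```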


A.


Re: encoding question

Erik Bruchez
Administrator
Alex,

Thanks, that's useful information.

-Erik
