UTF encoded request failing with "Invalid byte 2 of 3-byte UTF-8 sequence"

Rene Single
Hi everybody,

I'm currently faced with the issue that Orbeon seems to ignore the
content type of the posted request.

I'm HTTP POSTing the attached input.xml to the attached
transformation.xpl, with "application/xml;charset=UTF-8" as the
content type of the POST.
With this I always get the "Invalid byte 2 of 3-byte UTF-8 sequence"
error (also when using xs:anyURI instead of xs:base64Binary as the
stream-type in the Request processor).
If I change my input to be ISO-8859-1 encoded, it works (even when
specifying just "application/xml" as the content type of the request).
Unfortunately, switching to ISO-8859-1 is not an option for me.

Does anybody have any insight into what might be going wrong there?

By the way, I'm using the (rather outdated) Orbeon Forms 3.5.1.200703310056.

kind regards

René

--
----------------------------------------------------
TANNER AG
René Single
Kemptener Straße 99
D-88131 Lindau
Germany

tel +49 8382 272-199
fax +49 8382 272-900
mailto:[hidden email]
http://www.tanner.de

Chairman of the Supervisory Board: Helmut Tanner
Executive Board: Stefan Kuegel (Chairman), Georg-Friedrich Blocher
Stock corporation (Aktiengesellschaft), Lindau (B)
Register court: Kempten, HRB 7199
----------------------------------------------------

TANNER AG is a recipient of the Bavarian Quality Prize 2008!


<p:config xmlns:p="http://www.orbeon.com/oxf/pipeline"
          xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
          xmlns:oxf="http://www.orbeon.com/oxf/processors"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xmlns:xs="http://www.w3.org/2001/XMLSchema">
<!-- Base 64 Request -->
        <p:processor name="oxf:request">
                <p:input name="config">
                        <config stream-type="xs:base64Binary">
                                <include>/request/body</include>
                        </config>
                </p:input>
                <p:output name="data" id="received"/>
        </p:processor>
       
        <p:processor name="oxf:xslt">
                <p:input name="data" href="#received"/>
                <p:input name="config">
                        <xsl:stylesheet version="2.0">
                                <xsl:template match="/">
                                        <xsl:copy-of select="/request/body"/>
                                </xsl:template>
                        </xsl:stylesheet>
                </p:input>
                <p:output name="data" id="bin"/>
        </p:processor>

        <p:processor name="oxf:to-xml-converter">
                <p:input name="data" href="#bin"/>
                <p:input name="config">
                        <config>
                                <content-type>application/xml</content-type>
                                <encoding>UTF-8</encoding>
                                <version>1.0</version>
                        </config>
                </p:input>
                <p:output name="data" id="request"/>
        </p:processor>
       
        <p:processor name="oxf:xml-serializer">
                <p:input name="config">
                        <config/>
                </p:input>
                <p:input name="data" href="#request"/>
        </p:processor>
</p:config>

<?xml version="1.0" encoding="UTF-8"?>
<transformationinput>ä</transformationinput>
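The error message itself hints at one common cause. In ISO-8859-1, "ä" is the single byte 0xE4, whose bit pattern 1110xxxx announces a 3-byte UTF-8 sequence; the byte after it is "<" (0x3C), which is not a 10xxxxxx continuation byte, so a UTF-8 decoder rejects byte 2 of that 3-byte sequence. The minimal Java sketch below checks this byte by byte and with a strict JDK decoder. The class and method names are made up for illustration, and this is an assumption about where such a message typically comes from, not a diagnosis of this particular setup.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8SequenceCheck {
    // Same content as input.xml; U+00E4 (a-umlaut) is written as an escape
    // so the source file's own encoding does not matter.
    static final String DOC = "<transformationinput>\u00E4</transformationinput>";

    // How many bytes a UTF-8 sequence announces from its lead byte,
    // or -1 if the byte cannot start a sequence.
    static int sequenceLength(int lead) {
        if ((lead & 0x80) == 0x00) return 1; // 0xxxxxxx: plain ASCII
        if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
        if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
        if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx
        return -1;                           // 10xxxxxx continuation byte, or invalid
    }

    // Bytes 2..n of a multi-byte sequence must match 10xxxxxx.
    static boolean isContinuation(int b) {
        return (b & 0xC0) == 0x80;
    }

    // Strict decode: REPORT makes malformed input throw instead of becoming U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] latin1 = DOC.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = DOC.getBytes(StandardCharsets.UTF_8);
        int lead = latin1[21] & 0xFF; // the a-umlaut follows the 21-character start tag
        int next = latin1[22] & 0xFF; // first byte of "</transformationinput>"
        System.out.printf("lead 0x%02X announces %d bytes; next 0x%02X continuation? %b%n",
                lead, sequenceLength(lead), next, isContinuation(next));
        System.out.println("ISO-8859-1 bytes valid as UTF-8: " + isValidUtf8(latin1));
        System.out.println("UTF-8 bytes valid as UTF-8:      " + isValidUtf8(utf8));
    }
}
```

Run as is, it reports that 0xE4 announces 3 bytes, that 0x3C is not a continuation byte, and that only the UTF-8 encoding of the document passes the strict check.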

--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
OW2 mailing lists service home page: http://www.ow2.org/wws

Re: UTF encoded request failing with "Invalid byte 2 of 3-byte UTF-8 sequence"

Erik Bruchez
Administrator
Most likely your input.xml is not properly encoded. It's hard to tell, as
you didn't send it. Remember that specifying an encoding on a file doesn't
change its actual encoding; it is just an indication to the recipient
that the given encoding should be used for parsing.
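To illustrate that point, here is a small Java sketch (class and method names are hypothetical) that writes the test document with a declared encoding and an actual encoding chosen independently, then hands the raw bytes to the JDK's built-in DOM parser. Only the bytes decide what the parser sees: a UTF-8 label on ISO-8859-1 bytes fails to parse, while an ISO-8859-1 label on UTF-8 bytes parses without error but yields the wrong characters (U+00C3 U+00A4, i.e. "Ã¤").

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.helpers.DefaultHandler;

class DeclaredVsActualEncoding {
    // Same test document as input.xml: the declaration *claims* `declared`,
    // while the bytes are actually produced with `actual`.
    static byte[] document(String declared, Charset actual) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declared + "\"?>"
                + "<transformationinput>\u00E4</transformationinput>";
        return xml.getBytes(actual);
    }

    // Parses raw bytes, so the parser (not this code) interprets the declaration.
    static String rootText(byte[] bytes) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        Document doc = builder.parse(new ByteArrayInputStream(bytes));
        return doc.getDocumentElement().getTextContent();
    }

    // Shows text as code points so the console's own encoding can't confuse the output.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String[] declaredNames = {"UTF-8", "ISO-8859-1"};
        Charset[] actualCharsets = {StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1};
        for (String declared : declaredNames) {
            for (Charset actual : actualCharsets) {
                String result;
                try {
                    result = codePoints(rootText(document(declared, actual)));
                } catch (Exception e) {
                    result = "parse error: " + e.getMessage();
                }
                System.out.println("declared " + declared + ", written as "
                        + actual.name() + " -> " + result);
            }
        }
    }
}
```

The exact wording of the parse error depends on the parser in use; with the JDK's bundled Xerces-derived parser it is typically the same "Invalid byte 2 of 3-byte UTF-8 sequence" message, but treat that as an expectation rather than a guarantee.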

If you send your input.xml, please zip it first and attach it to the
email, to maximize the chance that the encoding is preserved.

-Erik

On Feb 18, 2009, at 4:36 AM, Rene Single wrote:

> Hi everybody,
>
> I'm currently faced with the issue that Orbeon seems to ignore the  
> contenttype of the posted request.
>
> I'm HTTP POSTing the attached input.xml to the attached  
> transformation.xpl and I'm specifying "application/
> xml;charset=UTF-8" as contenttype in the POST.
> For this I always get the "Invalid byte 2 of 3-byte UTF-8 sequence"  
> error. (also when using xs:anyURI in the Request processor).
> Now when I change my input to be ISO-8859-1 encoded, then it works  
> (even when specifying just "application/xml" for the contenttype of  
> the request).
> Unfortunately using this is not an option for me.
>
> Does anybody have any insight on what might go wrong there ?
>
> By the way I'm using the (rather outdated) Orbeon Forms  
> 3.5.1.200703310056.
>
> kind regards
>
> René
--
Orbeon Forms - Web Forms for the Enterprise Done the Right Way
http://www.orbeon.com/

Re: Re: UTF encoded request failing with "Invalid byte 2 of 3-byte UTF-8 sequence"

Rene Single
Hi Erik,

Erik Bruchez wrote:
> Most likely your input.xml is not properly encoded. Hard to tell as
> you didn't send it. Remember specifying an encoding on a file doesn't
> change it's actual encoding, it is just an indication to the recipient
> that the given encoding should be used for parsing.
>
> If you send your input.xml, please zip it first and attach it to the
> email in order to maximize the chance the encoding is properly preserved.
I did attach the input.xml ;)

Still, as you requested, here it is again, zipped together with
the XPL in question (which doesn't do much right now, but shows the issue).

kind regards

René

input.zip (1K)

Re: Re: Re: UTF encoded request failing with "Invalid byte 2 of 3-byte UTF-8 sequence"

Erik Bruchez
Administrator
> I did attach the input.xml ;)

I don't see an attachment in your previous message. Maybe you pasted  
it into the message, but that won't help!

-Erik

Re: Re: Re: Re: UTF encoded request failing with "Invalid byte 2 of 3-byte UTF-8 sequence"

Rene Single
Hi Erik,

Erik Bruchez wrote:
>> I did attach the input.xml ;)
>
> I don't see an attachment in your previous message. Maybe you pasted
> it into the message, but that won't help!
That's rather strange, because the copy of my own message that I get back
from the list does contain my attachments...
Here's another try...

René

input.zip (1K)

Re: Re: Re: Re: Re: UTF encoded request failing with "Invalid byte 2 of 3-byte UTF-8 sequence"

Erik Bruchez
Administrator
This file seems to be properly encoded, so I am not sure what's
wrong; but as you say, this is a fairly old version...

That said, you should be able to simply use the "instance" input of a
model or view pipeline to extract the XML POSTed to your page. Maybe you
can try that.

-Erik

On Feb 18, 2009, at 10:30 PM, Rene Single wrote:

> Hi Erik,
>
> Erik Bruchez wrote:
>>> I did attach the input.xml ;)
>>
>> I don't see an attachment in your previous message. Maybe you  
>> pasted it into the message, but that won't help!
> That's rather strange then, because for me my own message I get back  
> from the list does contain my attachments...
> Here's another stab at it...
>
> René
>
> <input.zip>

Re: Re: Re: Re: UTF encoded request failing with "Invalid byte 2 of 3-byte UTF-8 sequence"

Rene Single
In reply to this post by Erik Bruchez
Erik Bruchez wrote:
>> I did attach the input.xml ;)
>
> I don't see an attachment in your previous message. Maybe you pasted
> it into the message, but that won't help!
And in case the second try with the attachments didn't work either, it's
easy to explain. The input data contains just one element (it's only a
test input): "<transformationinput>ä</transformationinput>". And it
definitely is correctly encoded, because it parses fine using DOMPrint
(and a coworker has verified this as well).
The pipeline is very simple:

<p:config xmlns:p="http://www.orbeon.com/oxf/pipeline"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns:oxf="http://www.orbeon.com/oxf/processors"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns:xs="http://www.w3.org/2001/XMLSchema">
<!-- Base 64 Request -->
    <p:processor name="oxf:request">
        <p:input name="config">
            <config stream-type="xs:base64Binary">
                <include>/request/body</include>
            </config>
        </p:input>
        <p:output name="data" id="received"/>
    </p:processor>
   
    <p:processor name="oxf:xslt">
        <p:input name="data" href="#received"/>
        <p:input name="config">
            <xsl:stylesheet version="2.0">
                <xsl:template match="/">
                    <xsl:copy-of select="/request/body"/>
                </xsl:template>
            </xsl:stylesheet>
        </p:input>
        <p:output name="data" id="bin"/>
    </p:processor>

    <p:processor name="oxf:to-xml-converter">
        <p:input name="data" href="#bin"/>
        <p:input name="config">
            <config>
                <content-type>application/xml</content-type>
                <encoding>UTF-8</encoding>
                <version>1.0</version>
            </config>
        </p:input>
        <p:output name="data" id="request"/>
    </p:processor>
   
    <p:processor name="oxf:xml-serializer">
        <p:input name="config">
            <config/>
        </p:input>
        <p:input name="data" href="#request"/>
    </p:processor>
</p:config>
 


The request logged through tcpTrace is as follows:

POST /ops/transformation HTTP/1.1
Accept: application/xml
Content-Type: application/xml;charset=UTF-8
User-Agent: Jakarta Commons-HttpClient/3.1
Host: localhost:8086
Content-Length: 45

<transformationinput>
ä</transformationinput>
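The Content-Length: 45 in this trace is a useful clue on its own. Assuming the line break shown after the start tag is a display artifact of the trace (and that there is no trailing newline), the body is the 21-byte start tag, "ä", and the 22-byte end tag. That adds up to 45 bytes only if "ä" is sent as the two UTF-8 bytes 0xC3 0xA4; ISO-8859-1 would give 44. A tiny Java check of the arithmetic (class and method names are illustrative):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContentLengthCheck {
    // Request body from the trace, assuming the displayed line break is not part of it.
    static final String BODY = "<transformationinput>\u00E4</transformationinput>";

    // Content-Length counts bytes, so it depends on the charset used to encode the body.
    static int contentLength(String body, Charset charset) {
        return body.getBytes(charset).length;
    }

    public static void main(String[] args) {
        System.out.println("UTF-8:      " + contentLength(BODY, StandardCharsets.UTF_8));      // 21 + 2 + 22 = 45
        System.out.println("ISO-8859-1: " + contentLength(BODY, StandardCharsets.ISO_8859_1)); // 21 + 1 + 22 = 44
    }
}
```

If that assumption holds, the request itself went out correctly encoded as UTF-8, which is consistent with how the thread was eventually resolved.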

Re: Re: Re: Re: Re: UTF encoded request failing with "Invalid byte 2 of 3-byte UTF-8 sequence"

Alessandro Vernet
Administrator
On Feb 18, 2009, at 10:45 PM, Rene Single wrote:
> And if the second try with the attachments also didn't work, It's  
> easy to explain.

Could you check if this is also happening with a recent nightly build?

Alex
--
Orbeon Forms - Web 2.0 Forms, open-source, for the Enterprise
Orbeon's Blog: http://www.orbeon.com/blog/
Personal Blog: http://avernet.blogspot.com/
Twitter - http://twitter.com/avernet

Re: Re: Re: Re: Re: UTF encoded request failing with "Invalid byte 2 of 3-byte UTF-8 sequence"

Rene Single
In reply to this post by Rene Single
OK,

just for everybody's information: I feel stupid now :(

We had an encoding bug in our client code that handled the response from
the Orbeon pipeline. Now everything works, i.e., Orbeon has no issue
with UTF-8 encoded requests.
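The exact client bug isn't described here. As a hedged illustration only, this Java sketch shows a typical response-side mistake and one fix: decoding UTF-8 response bytes with a charset that doesn't match them (ISO-8859-1 here) turns "ä" into "Ã¤", whereas honouring the charset parameter of the response's Content-Type gives the right text. `charsetOf` is a hypothetical helper, not a Commons HttpClient API, and the ISO-8859-1 fallback is an arbitrary choice for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ResponseDecoding {
    // Hypothetical helper: read the charset parameter of a Content-Type value,
    // falling back to `fallback` when it is absent or names an unknown charset.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) return fallback;
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) { // illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // Bytes as a UTF-8 server would send them back.
        byte[] body = "<transformationinput>\u00E4</transformationinput>".getBytes(StandardCharsets.UTF_8);

        // Buggy: decoding with a fixed charset that doesn't match the bytes.
        String wrong = new String(body, StandardCharsets.ISO_8859_1);

        // Better: honour the charset the response declares.
        Charset cs = charsetOf("application/xml;charset=UTF-8", StandardCharsets.ISO_8859_1);
        String right = new String(body, cs);

        System.out.println("wrong contains U+00C3 U+00A4: " + wrong.contains("\u00C3\u00A4"));
        System.out.println("right contains U+00E4:        " + right.contains("\u00E4"));
    }
}
```

For XML responses it is often simpler still to pass the raw bytes (an InputStream) straight to the XML parser, so that it applies the document's own encoding declaration instead of a charset guessed by the client.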

kind regards

René

Re: Re: Re: Re: Re: Re: UTF encoded request failing with "Invalid byte 2 of 3-byte UTF-8 sequence"

Alessandro Vernet
Administrator
René,

On Feb 22, 2009, at 10:20 PM, Rene Single wrote:

> We had an encoding bug in our client code that handled the response  
> from the Orbeon pipeline. So now everything works, i.e. Orbeon  
> doesn't have an issue with UTF-8 encoded requests.

Good to know! Thank you for sending this update.

Alex