Accessing PDFs using URL Generator or Resource Server processors

Tom Grahame
I'm trying to open a PDF in a browser window via a pipeline that calls a Struts action. I could call the Struts action directly using a load element, but there is a requirement for a simple, clean URL, which going through the pipeline satisfies.

My first solution was to use the URL generator and push the data to the pipeline output. That works, but it makes memory usage grow by hundreds of megabytes for a 50MB PDF.

Can anyone tell me why this should be?

To get around this, I tried configuring the Resource Server processor in place of the URL generator. However, the Struts action requires a valid session and appears to refuse the Resource Server request.

Am I correct in saying that the URL Generator passes a session parameter but the Resource Server doesn't? Is there a way around this?

Is there a best-practice method for accessing large PDFs through a pipeline? Some of the PDFs in question are around 200MB. I feel as though I'm missing something here.

I attach my pipeline with both configurations, the Resource Server configuration currently commented out.

load-resource.xpl

Thanks,

Tom

Re: Accessing PDFs using URL Generator or Resource Server processors

Erik Bruchez
Administrator
Tom,

On Fri, Feb 12, 2010 at 2:31 AM, Tom Grahame
<[hidden email]> wrote:

>
> I'm trying to open a PDF in a browser window via a pipeline that calls a
> Struts action. I could call the Struts action directly using a load element,
> but there is a requirement for a simple, clean URL which using the pipeline
> satisfies.
>
> My first solution was to use the URL generator and push the data to the
> pipeline output. That works, but causes memory usage to increase in the
> order of hundreds of megabytes for a 50MB PDF.
>
> Can anyone tell me why this should be?
If your PDF goes through a pipeline, it is transformed to XML first (a
root element, and base64-encoded text for the contents). That will
already take more memory than the plain binary. If the pipeline has
tees, the XML will be temporarily stored in memory. That probably
explains why.
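A rough back-of-the-envelope estimate of why a 50MB PDF can cost hundreds of megabytes. Base64 always encodes 3 bytes as 4 characters. The second step assumes the buffered text is held as Java `char` data (UTF-16, 2 bytes per character) and treats 1 MB as 10^6 bytes. Real overhead depends on the processors' internals and on how many copies exist at once:

```latex
\begin{aligned}
50\ \text{MB} \times \tfrac{4}{3} &\approx 66.7 \times 10^{6}\ \text{base64 characters} \\
66.7 \times 10^{6}\ \text{chars} \times 2\ \tfrac{\text{bytes}}{\text{char}} &\approx 133\ \text{MB per buffered copy} \\
200\ \text{MB} \times \tfrac{4}{3} \times 2\ \tfrac{\text{bytes}}{\text{char}} &\approx 533\ \text{MB per buffered copy}
\end{aligned}
```

Under these assumptions, a single tee already buffers more than twice the size of the original file. A second copy (for example, an in-memory response buffer) would account for the "hundreds of megabytes" observed.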

To avoid tees, I would make your pipeline a model pipeline, and put
the oxf:http-serializer right there, directly connected to the output
of oxf:url-generator. That *should* allow streaming of the content.
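A minimal sketch of the suggestion above, assuming Orbeon 3.x-era processor configuration. The Struts URL `http://localhost:8080/app/viewPdf.do` and the page path `/pdf` are hypothetical placeholders. The `config` elements (`url`, `content-type`, `force-content-type`, `mode`) follow the Orbeon processor documentation as I understand it and should be checked against the version in use. The pipeline has no output and no tee: the URL generator's `data` output feeds the HTTP serializer directly.

```xml
<!-- load-resource.xpl: model pipeline with no output; it serializes the PDF itself -->
<p:config xmlns:p="http://www.orbeon.com/oxf/pipeline"
          xmlns:oxf="http://www.orbeon.com/oxf/processors">

    <!-- Fetch the PDF from the (hypothetical) Struts action as binary content -->
    <p:processor name="oxf:url-generator">
        <p:input name="config">
            <config>
                <url>http://localhost:8080/app/viewPdf.do</url>
                <content-type>application/pdf</content-type>
                <force-content-type>true</force-content-type>
                <mode>binary</mode>
            </config>
        </p:input>
        <p:output name="data" id="pdf"/>
    </p:processor>

    <!-- Connected directly to the generator's output: no tee, so nothing is buffered in between -->
    <p:processor name="oxf:http-serializer">
        <p:input name="config">
            <config>
                <content-type>application/pdf</content-type>
            </config>
        </p:input>
        <p:input name="data" href="#pdf"/>
    </p:processor>
</p:config>

<!-- page-flow.xml (assumed): register the pipeline as the page's model, with no view.
     Orbeon 3.x used the path-info attribute; newer versions may use a different name. -->
<page path-info="/pdf" model="load-resource.xpl"/>
```

Because the serializer writes the response itself, the page needs no view. Any additional processor inserted between the generator and the serializer that reads the `#pdf` output a second time would reintroduce a tee.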

> To get around this I tried configuring the Resource Server processor in
> place of the URL generator, however the Struts action has a requirement for
> a valid session and appears to refuse the Resource Server request.
>
> Am I correct in saying that the URL Generator passes a session parameter but
> the Resource Server doesn't? Is there a way around this?

That's right. oxf:url-generator uses the Connection object, which forwards
headers, while oxf:resource-server simply opens a plain connection.
Since resources are generally served from disk, there is less need to
forward headers. But that could be changed.

> Is there a best practice method for accessing large PDFs through a pipeline?
> Some of the PDFs in question are of the 200MB size. I feel as though I'm
> missing something here.
>
> I attach my pipeline with both configurations, the Resource Server
> configuration currently commented out.
>
> http://n4.nabble.com/file/n1478368/load-resource.xpl load-resource.xpl

Can you try the suggestion above and see if it helps?

-Erik


--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
OW2 mailing lists service home page: http://www.ow2.org/wws

Re: Accessing PDFs using URL Generator or Resource Server processors

Tom Grahame
Erik,

Your suggestion works. In fact, it is explained perfectly adequately in the Orbeon documentation, but at least I now understand more about how my pipeline works.
I attach the amended pipeline in case anyone is following this:

load-resource.xpl

I am still having a problem with high memory use, though. I may have to accept that, although it is possible to pass a PDF through a pipeline, this may not be appropriate for the large documents I'm dealing with. I'm now investigating serving these documents by an alternative method.

Many thanks,

Tom

Re: Re: Accessing PDFs using URL Generator or Resource Server processors

Erik Bruchez
Administrator
Tom,

Glad it's working. It would be interesting to know why this still
takes a lot of memory, as I am pretty sure that streaming could happen
in this case ;)

-Erik

On Mon, Feb 15, 2010 at 1:57 AM, Tom Grahame
<[hidden email]> wrote:

>
> Erik,
>
> Your suggestion works. And in fact is perfectly adequately explained in the
> Orbeon documentation, but at least I now understand more about how my
> pipeline is working.
> I attach the amended pipeline in case anyone is following this:
>
> http://n4.nabble.com/file/n1555932/load-resource.xpl load-resource.xpl
>
> I am still having a problem with high memory use though. I may have to come
> to terms with the fact that although it is possible to pass a pdf through a
> pipeline, it may not be appropriate for the large documents I'm dealing
> with. I'm now investigating serving these documents using an alternative
> method.
>
> Many thanks,
>
> Tom
> --
> View this message in context: http://n4.nabble.com/Accessing-PDFs-using-URL-Generator-or-Resource-Server-processors-tp1478368p1555932.html
> Sent from the ObjectWeb OPS - Users mailing list archive at Nabble.com.
>

