Getting pipeline outputs with org.orbeon.oxf.pipeline.api

Getting pipeline outputs with org.orbeon.oxf.pipeline.api

Eric van der Vlist
Hi,

Am I wrong, or is it impossible to retrieve the outputs of a pipeline
using the org.orbeon.oxf.pipeline.api package?

What I'd like to do is instantiate a pipeline processor, connect its
config input to a static URI (so far so good), send it a second input as
SAX events and read its single output as SAX events too. The second
input and the output would have predefined names (like the data
input/output of a page view, if you like).

Would you have pointers that could help me do so without reinventing
the wheel?

Thanks,

Eric
--
The first 100% XML directory of beekeepers!
                                                http://apiculteurs.info/
------------------------------------------------------------------------
Eric van der Vlist       http://xmlfr.org            http://dyomedea.com
(ISO) RELAX NG   ISBN:0-596-00421-4 http://oreilly.com/catalog/relax
(W3C) XML Schema ISBN:0-596-00252-1 http://oreilly.com/catalog/xmlschema
------------------------------------------------------------------------




--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
ObjectWeb mailing lists service home page: http://www.objectweb.org/wws

Re: Getting pipeline outputs with org.orbeon.oxf.pipeline.api

Erik Bruchez
Administrator
Eric,

An easy one. To achieve this, you connect the output of your processor
to the DOMSerializer processor:

   DOMSerializer domSerializer = new DOMSerializer();
   PipelineUtils.connect(myProcessor, myOutput.getName(),
                         domSerializer, "data");

Then you start the execution of your pipeline:

   domSerializer.start(pipelineContext);

When the execution has terminated, you can obtain the result as a dom4j
or W3C DOM document:

   domSerializer.getDocument(pipelineContext)
   domSerializer.getW3CDocument(pipelineContext)
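
For readers without the OPS classes at hand, the idea of capturing SAX output as a W3C DOM document can be sketched with the JDK alone. This is only an illustration under stated assumptions: `SaxToDomSketch`, `SaxSource`, `collect` and `sample` are hypothetical names, and the code neither uses nor reproduces DOMSerializer. It simply routes SAX events into a `DOMResult` through an identity `TransformerHandler`.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import org.w3c.dom.Document;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.AttributesImpl;

final class SaxToDomSketch {

    /** Hypothetical stand-in for anything that pushes SAX events to a handler. */
    interface SaxSource {
        void read(ContentHandler handler) throws Exception;
    }

    /** Runs the source and returns the events it emitted as a W3C DOM document. */
    static Document collect(SaxSource source) {
        try {
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            // Created without a stylesheet, the handler performs an identity transform
            TransformerHandler handler = factory.newTransformerHandler();
            DOMResult result = new DOMResult();
            handler.setResult(result);
            source.read(handler);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Sample source emitting a <result> element with one empty <item> child. */
    static Document sample() {
        return collect(handler -> {
            AttributesImpl none = new AttributesImpl();
            handler.startDocument();
            handler.startElement("", "result", "result", none);
            handler.startElement("", "item", "item", none);
            handler.endElement("", "item", "item");
            handler.endElement("", "result", "result");
            handler.endDocument();
        });
    }
}
```

Calling `SaxToDomSketch.sample()` yields a document whose root element is `result`; a real processor output would take the place of the lambda.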

-Erik


Re: Getting pipeline outputs with org.orbeon.oxf.pipeline.api

Eric van der Vlist
Hi Erik,

On Friday 18 November 2005 at 15:48 +0100, Erik Bruchez wrote:
> Eric,
>
> An easy one.

I'm not so sure; I think I haven't been clear enough :-) ...

My problem isn't connecting the external inputs/outputs of a custom
processor...

I am writing a processor that calls a pipeline processor in Java, and my
question is the other way round: how can I, in my custom processor (in
Java), instantiate a pipeline processor, connect its config input to a
static URI (so far I have found how to do so with the
org.orbeon.oxf.pipeline.api package), send it a second input as SAX
events and read its single output as SAX events too (this is what I
don't think you can do with this package)?

It looks like org.orbeon.oxf.pipeline.api has been designed with the
minimal set of features needed to implement the command line utility,
and I need more than that!

I can probably create a new instance of the PipelineProcessor directly
and call its createInput and createOutput methods, but then how do I
read its output and give it its inputs?

Eric

Re: Getting pipeline outputs with org.orbeon.oxf.pipeline.api

Eric van der Vlist
On Friday 18 November 2005 at 16:06 +0100, Eric van der Vlist wrote:

> It looks like the org.orbeon.oxf.pipeline.api has been designed with the
> minimal amount of features needed to implement the command line utility
> and I need more than that!

I have made some progress :-) ...

It's a quick hack, since it handles only one output, but adding
the following method to InitUtils appears to do the job for me:

    public static void readOutput(Processor processor, ExternalContext externalContext,
            PipelineContext pipelineContext, String outputName, ContentHandler contentHandler) throws Exception {

        // Record start time for this request
        long tsBegin = logger.isInfoEnabled() ? System.currentTimeMillis() : 0;
        String requestPath = null;
        // Guard against a null external context, which the check further down allows for
        if (externalContext != null) {
            try {
                ExternalContext.Request request = externalContext.getRequest();
                requestPath = request.getRequestPath();
            } catch (UnsupportedOperationException e) {
                // Not every external context exposes a request; leave requestPath null
            }
        }

        // Set ExternalContext
        if (externalContext != null) {
            if (logger.isInfoEnabled()) {
                String startLoggerString = externalContext.getStartLoggerString();
                if (startLoggerString != null && startLoggerString.length() > 0)
                    logger.info(startLoggerString);
            }
            pipelineContext.setAttribute(PipelineContext.EXTERNAL_CONTEXT, externalContext);
        }
        // Make the static context available
        StaticExternalContext.setStaticContext(new StaticExternalContext.StaticContext(externalContext, pipelineContext));

        try {
            // Set cache size
            Integer cacheMaxSize = OXFProperties.instance().getPropertySet().getInteger(CACHE_SIZE_PROPERTY);
            if (cacheMaxSize != null)
                ObjectCache.instance().setMaxSize(pipelineContext, cacheMaxSize.intValue());

            // Start execution
            processor.reset(pipelineContext);
            processor.createOutput(outputName);
            ProcessorOutput processorOutput = processor.getOutputByName(outputName);
            processorOutput.read(pipelineContext, contentHandler);
            if (!pipelineContext.isDestroyed())
                pipelineContext.destroy(true);
        } catch (Exception e) {
            try {
                if (!pipelineContext.isDestroyed())
                    pipelineContext.destroy(false);
            } catch (Exception f) {
                logger.error("Exception while destroying context after exception", OXFException.getRootThrowable(f));
            }
            LocationData locationData = ValidationException.getRootLocationData(e);
            Throwable throwable = OXFException.getRootThrowable(e);
            String message = locationData == null
                    ? "Exception with no location data"
                    : "Exception at " + locationData.toString();
            logger.error(message, throwable);
            // Make sure the caller can do something about it, like trying to run an error page
            throw e;
        } finally {
            // Free context
            StaticExternalContext.removeStaticContext();

            if (logger.isInfoEnabled()) {
                // Display cache statistics
                CacheStatistics statistics = ObjectCache.instance().getStatistics(pipelineContext);
                int hitCount = statistics.getHitCount();
                int missCount = statistics.getMissCount();
                String successRate = null;
                if (hitCount + missCount > 0)
                    successRate = hitCount * 100 / (hitCount + missCount) + "%";
                else
                    successRate = "N/A";
                long timing = System.currentTimeMillis() - tsBegin;
                logger.info((requestPath != null ? requestPath : "Done running processor") + " - Timing: " + timing
                        + " - Cache hits: " + hitCount
                        + ", fault: " + missCount
                        + ", adds: " + statistics.getAddCount()
                        + ", success rate: " + successRate);
            }
        }
    }


Most of the stuff is shamelessly copied from the run() method, the
difference being:

            processor.createOutput(outputName);
            ProcessorOutput processorOutput = processor.getOutputByName(outputName);
            processorOutput.read(pipelineContext, contentHandler);

The question is now: what can we do with this :-) ...

The code uses a lot of private static declarations that seem to be
common between processors, and I don't know if I could easily move it to
another class in my own package.

Also, I don't know if I would need all these lines if it were only a
"custom" method.

OTOH, I don't think that this method is generic enough to deserve to be
committed and made generally available...

Eric

Re: Getting pipeline outputs with org.orbeon.oxf.pipeline.api

Eric van der Vlist
On Friday 18 November 2005 at 21:10 +0100, Eric van der Vlist wrote:

>
> OTOH, I don't think that this method is generic enough to deserve to be
> committed and generally available...

Hmmm... I might have spoken too fast and been wrong.

It might not be optimal, but you should be able to read several
outputs by calling this method several times, since it doesn't destroy
the processor.

So, maybe you could include this new method in the source code?

Eric

Re: Getting pipeline outputs with org.orbeon.oxf.pipeline.api

Alessandro  Vernet
Administrator
In reply to this post by Eric van der Vlist
On 11/18/05, Eric van der Vlist <[hidden email]> wrote:
> That's a quick hack since it would handle only one output, but adding
> the following method to InitUtils appears to make the job for me:
> [...]

Hi Eric,

If you are already in the code of your own processor, you don't need
to do all those things that are done in InitUtils, like setting the
cache size and creating a new pipeline context.

In pseudo-code, you need to:

1) Create the pipeline processor (instantiating PipelineProcessor) =>
pipelineProcessor object
2) Create a URLGenerator and connect it to the "config" input of
pipelineProcessor with PipelineUtils.connect().
3) Create an input for "data" and an output for "data".
4) Connect to that input your own implementation of a ProcessorOutput
which, when its read(content handler) method is called, calls
readInputAsSAX (passing it the content handler) to read the "data"
input of the processor you are implementing.
5) Call read(content handler) on the processor output you created,
passing the content handler you received in the readImpl method you
are implementing.
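
The delegation described in steps 4 and 5 can be illustrated without the OPS classes. What follows is a rough sketch under assumptions, not the real API: `Output`, `InnerPipeline`, `readImpl` and `demo` are hypothetical stand-ins for ProcessorOutput, PipelineProcessor and the custom processor's read method, and the "config" wiring is left out entirely. The inner pipeline here merely wraps whatever its data input emits in a `wrapped` element.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

final class DelegationSketch {

    /** Hypothetical stand-in for a processor output: pushes SAX events on demand. */
    interface Output {
        void read(ContentHandler handler) throws SAXException;
    }

    /** Hypothetical stand-in for the inner pipeline processor. */
    static final class InnerPipeline {
        private Output dataInput;

        void connectDataInput(Output upstream) {
            this.dataInput = upstream;
        }

        Output dataOutput() {
            return handler -> {
                handler.startDocument();
                handler.startElement("", "wrapped", "wrapped", new AttributesImpl());
                // Forward the upstream events, minus their own document start/end
                XMLFilterImpl body = new XMLFilterImpl() {
                    @Override public void startDocument() { }
                    @Override public void endDocument() { }
                };
                body.setContentHandler(handler);
                dataInput.read(body);
                handler.endElement("", "wrapped", "wrapped");
                handler.endDocument();
            };
        }
    }

    /** Stand-in for the custom processor's read method. */
    static void readImpl(Output ownDataInput, ContentHandler handler) throws SAXException {
        InnerPipeline inner = new InnerPipeline();               // step 1
        // Step 4: an Output whose read() forwards to this processor's own "data" input
        inner.connectDataInput(h -> ownDataInput.read(h));
        // Step 5: read the inner output with the handler we were handed
        inner.dataOutput().read(handler);
    }

    /** Runs the sketch and returns the element names seen, in document order. */
    static List<String> demo() {
        List<String> names = new ArrayList<>();
        DefaultHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                names.add(qName);
            }
        };
        Output source = h -> {
            h.startDocument();
            h.startElement("", "doc", "doc", new AttributesImpl());
            h.endElement("", "doc", "doc");
            h.endDocument();
        };
        try {
            readImpl(source, recorder);
        } catch (SAXException e) {
            throw new RuntimeException(e);
        }
        return names;
    }
}
```

Nothing is buffered in this arrangement: the events are pulled through the chain only when the outermost `read` is called, which is the point of connecting an output rather than a document.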

I hope that this somehow makes sense :).

Alex
--
Blog (XML, Web apps, Open Source): http://www.orbeon.com/blog/




Re: Getting pipeline outputs with org.orbeon.oxf.pipeline.api

Eric van der Vlist
Hi Alex,

On Tuesday 22 November 2005 at 11:41 -0800, Alessandro Vernet wrote:
.../...

> In pseudo-code, you need to:
>
> 1) Create the pipeline processor (instantiating PipelineProcessor) =>
> pipelineProcessor object
> 2) Create URLGenerator, connect it to the "config" input of
> pipelineProcessor with PipelineUtils.connect().
> 3) Create input for "data" and create output for "data"
> 4) Connect to the input your own implementation of a ProcessorOutput
> which when the method read(content handler) is called calls
> readInputAsSAX(passing here the content handler) to read the "data"
> input of the processor you are implementing.
> 5) Call read(content handler) on the processor output you created
> passing the content handler you received from the method readImpl you
> are implementing.
>
> I hope that somehow this will make sense :).
Yes, it does, and that's much simpler than what I was doing!

I had tried to find examples of this kind of flow in the source
code, but the ones I found were buried in the complexity of the PFC
or pipeline controllers, and I hadn't been able to isolate the basic
calls like you did...

I'll try that on a simple example and let you know the outcome.

Thanks.

Eric

Re: Getting pipeline outputs with org.orbeon.oxf.pipeline.api

Eric van der Vlist
In reply to this post by Alessandro Vernet
Hi Alex,

On Tuesday 22 November 2005 at 11:41 -0800, Alessandro Vernet wrote:

> On 11/18/05, Eric van der Vlist <[hidden email]> wrote:
> > That's a quick hack since it would handle only one output, but adding
> > the following method to InitUtils appears to make the job for me:
> > [...]
>
> Hi Eric,
>
> If you are already in the code of your own processor, you don't need
> to do all those things that are done in InitUtils, like setting the
> cache size and creating a new pipeline context.
That's working fine. The only thing you didn't mention is:

> In pseudo-code, you need to:
>
> 1) Create the pipeline processor (instantiating PipelineProcessor) =>
> pipelineProcessor object

Here you need to reset the processor...

> 2) Create URLGenerator, connect it to the "config" input of
> pipelineProcessor with PipelineUtils.connect().
> 3) Create input for "data" and create output for "data"
> 4) Connect to the input your own implementation of a ProcessorOutput
> which when the method read(content handler) is called calls
> readInputAsSAX(passing here the content handler) to read the "data"
> input of the processor you are implementing.
> 5) Call read(content handler) on the processor output you created
> passing the content handler you received from the method readImpl you
> are implementing.

I'll need to have a look at caching and performance issues later on, but
I can now invoke XPL pipelines to index documents based on their media
types...

Thanks,

Eric

Reusing pipeline processors (was: Getting pipeline outputs with org.orbeon.oxf.pipeline.api)

Eric van der Vlist
Hi,

On Wednesday 23 November 2005 at 12:45 +0100, Eric van der Vlist wrote:

> I'll need to have a look on caching and performance issues later on, but
> I can now invoke XPL pipelines to index documents based on their media
> types...

Just some comments about a few things I have noticed and some related
questions...

1) The scheduler doesn't notice when a thread runs out of memory.

If you run a processor from the scheduler and that processor runs out of
memory, the scheduler doesn't seem to notice, and it won't restart a
processor with the same name (if you ask it to check that no two
processors with the same name run concurrently).

2) Pipeline contexts grow.

In my case, a long-running custom processor executes a large
number of pipeline processors (I have tried with collections of several
thousand documents); if you reuse a pipeline context, you rapidly run
out of memory (even if you don't reuse the same pipeline processor).

My interpretation is that the state added by each processor is never
removed, and I think it would be helpful to have a method that
asks a processor to clean up the pipeline context.

3) You can't safely reuse a pipeline processor if you don't delete its
inputs/outputs when you change them.

Despite the fact that a comment in the pipeline processor says:

 * <p>This processor is not only not thread safe, but it can't even be
 * reused: if there is one data output (with a 1 cardinality), one can't call
 * read multiple times and get the same result. Only the first call to read
 * on the data output will succeed.

I have tried to see how I could reuse these pipeline processors.

I have a few .xpl pipelines attached to MIME types, and I keep them
in a Hashtable whose key is the address of the .xpl file.

When I reuse one of these processors, I keep its config input unchanged,
I reset its data input with a new ProcessorOutput and read its data
output again.

If I do so without deleting the data input and output, I rapidly run out
of memory again.

4) My questions...

Is it safe to reuse a pipeline processor like I do, keeping its config
input unchanged between runs?

Is it enough to cache the config input and things such as the XSLT
transformations that it includes?

Is there an easy way to use OPS' cache system instead?

Wouldn't that be better than what I am doing right now (which is a kind of
poor man's cache)?

Thanks,

Eric


Re: Reusing pipeline processors (was: Getting pipeline outputs with org.orbeon.oxf.pipeline.api)

Alessandro  Vernet
Administrator
On 11/24/05, Eric van der Vlist <[hidden email]> wrote:
> [...]
> Is that safe to reuse a pipeline processor like I do, keeping its config
> input unchanged between runs?

Hi Eric,

If you reset the processor instance, you can reuse it and call read
multiple times on outputs. That comment in the code seems to be
inaccurate, as it does not seem to take the existence of reset() into
account. However, you might want to create a different instance of the
pipeline context every time you run your "pipeline"; I am using quotes
here as you are creating the equivalent of an XPL pipeline with Java
code.

> Is that enough to cache the config input and things such as the XSLT
> transformations that it includes?

Yes, if you keep reusing the same pipeline instance the cache should
work as expected.

> Is there an easy way to use OPS' cache system instead?
> Wouldn't it be better than what I am doing right now (which is a kind of
> poor man's cache)?

Yes, instead of the Hashtable, you could store the pipelines you are
creating in the PresentationServer cache. That said, if you are creating
only a limited number of those pipelines, for all practical purposes
using a Hashtable is just fine, as you might not care about getting rid
of those pipelines to save memory.
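
If the number of pipelines did grow enough for memory to matter, a middle ground between an unbounded Hashtable and the PresentationServer cache could be a small bounded map. The sketch below is only an assumption-laden illustration using the JDK: `BoundedPipelineCache` and its methods are hypothetical names, the value type is left generic rather than tied to PipelineProcessor, and it is not the OPS cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class BoundedPipelineCache<V> {

    private final Map<String, V> entries;

    BoundedPipelineCache(final int maxEntries) {
        // accessOrder = true keeps the least recently used entry first
        this.entries = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                // Evict the least recently used entry once the bound is exceeded
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached value for the .xpl path, building it on first use. */
    synchronized V get(String xplPath, Function<String, V> factory) {
        V value = entries.get(xplPath);
        if (value == null) {
            value = factory.apply(xplPath);
            entries.put(xplPath, value);
        }
        return value;
    }

    synchronized boolean contains(String xplPath) {
        return entries.containsKey(xplPath);
    }

    synchronized int size() {
        return entries.size();
    }
}
```

With a bound of two, looking up `a.xpl`, `b.xpl`, `a.xpl` again and then `c.xpl` evicts `b.xpl`, since it is the entry that has gone unused the longest.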

Alex




Re: Reusing pipeline processors (was: Getting pipeline outputs with org.orbeon.oxf.pipeline.api)

Eric van der Vlist
Hi Alex,

On Friday 25 November 2005 at 17:12 -0800, Alessandro Vernet wrote:

> On 11/24/05, Eric van der Vlist <[hidden email]> wrote:
> > [...]
> > Is that safe to reuse a pipeline processor like I do, keeping its config
> > input unchanged between runs?
>
> Hi Eric,
>
> If you reset the processor instance, you can reuse it and call read
> multiple times on outputs. That comment in the code seems to be
> inaccurate as it does not seem take the existence of a reset() into
> account. However, you might want to create a different instance of the
> pipeline context every time you run your "pipeline"; I am using quotes
> here as you are creating the equivalent of an XPL pipeline with Java
> code.
In that specific case you can remove the quotes: I am instantiating the
pipeline processor, thus creating real XPL pipelines with Java :-) . But
this would work with other processors too.

> > Is that enough to cache the config input and things such as the XSLT
> > transformations that it includes?
>
> Yes, if you keep reusing the same pipeline instance the cache should
> work as expected.

Great!

> > Is there an easy way to use OPS' cache system instead?
> > Wouldn't it be better than what I am doing right now (which is a kind of
> > poor man's cache)?
>
> Yes, instead of the Hashtable, you could store the pipelines you are
> creating in PresentationServer cache. Now if you are creating a
> limited number of those pipelines for all practical purpose using a
> Hashtable is just fine as you might not care about getting rid of
> those pipelines to save memory.

I am still not very clear on memory usage during the life cycle of
a pipeline processor...

In my indexer, I am creating each pipeline processor and connecting it
to its config input once:

    PipelineProcessor pipelineProcessor = new PipelineProcessor();
    URLGenerator config = new URLGenerator("file:"
            + resourceManager.getRealPath(processorKey));
    PipelineUtils.connect(config, "data", pipelineProcessor, "config");

And each time I use it, I create its data input and output, reset
the processor, connect its data input, read from its data output and
delete its data input and output.

    pipelineProcessor.createInput("data");
    pipelineProcessor.createOutput("data");

    pipelineProcessor.reset(context);
    pipelineProcessor.getInputByName("data").setOutput(
            new DocumentObjectOutput(doc));
    XmlSax2JavaObjectPipe pipe = new XmlSax2JavaObjectPipe();
    pipe.setParameter("namespace2package", "", this.getClass()
            .getPackage().getName());
    try {
        pipelineProcessor.getOutputByName("data").read(context,
                (XmlSaxSource) pipe.getSource());
        processedDoc = (Document) pipe.getObject();
        break;
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        // Delete the data input and output on both the success and failure paths
        pipelineProcessor.deleteInput(pipelineProcessor
                .getInputByName("data"));
        pipelineProcessor.deleteOutput(pipelineProcessor
                .getOutputByName("data"));
    }

Is that optimal?

In particular, I am expecting that since I am reusing the config input,
it will be read and "compiled" only once. Is that the case?

The data input is a small document, but during the execution of the
pipeline, large documents can be read and manipulated by the pipeline.

Is the memory used by these documents freed when I "leave" the pipeline
processor between two uses?

If not, how can I release this memory? Would resetting the processor
after having used it help? Would it be better to embed these large
documents in the data input?

Thanks for your help,

Eric


Re: Reusing pipeline processors (was: Getting pipeline outputs with org.orbeon.oxf.pipeline.api)

Alessandro  Vernet
Administrator
On 11/26/05, Eric van der Vlist <[hidden email]> wrote:
> In particular, I am expecting that since I am reusing the config input,
> it will be read and "compiled" only once. Is that the case?

Yes, the config should be cached in this case.

> Is the memory used by these documents freed when I "leave" the pipeline
> processor between two uses?

Processors should not store anything in instance properties. They
should only store information in the pipeline context or the cache. So
if you don't keep a reference to the pipeline context, the processors
are not going to use more memory after one run than they did before.
Of course, after one run, there might be more memory used by the
cache.
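
The rule above (no per-run data in instance fields, everything in the context) can be shown with a toy model. This is a sketch under assumptions only: `RunContext` and `CountingProcessor` are hypothetical stand-ins for the pipeline context and a processor, and none of this is the OPS API. Because the processor keeps nothing itself, dropping the context drops everything a run accumulated.

```java
import java.util.HashMap;
import java.util.Map;

final class ContextStateSketch {

    /** Hypothetical stand-in for a pipeline context: a per-run attribute map. */
    static final class RunContext {
        private final Map<Object, Object> attributes = new HashMap<>();

        Object get(Object key) {
            return attributes.get(key);
        }

        void put(Object key, Object value) {
            attributes.put(key, value);
        }
    }

    /** Stand-in processor with no mutable fields: per-run state lives in the context. */
    static final class CountingProcessor {
        /** Adds the document length to the running total kept in the context. */
        int process(RunContext context, String document) {
            // Keyed by this instance so two processors sharing a context don't collide
            Integer seen = (Integer) context.get(this);
            int updated = (seen == null ? 0 : seen) + document.length();
            context.put(this, updated);
            return updated;
        }
    }

    /** Reuses one processor across two runs, each with its own context. */
    static int[] demo() {
        CountingProcessor processor = new CountingProcessor();

        RunContext first = new RunContext();
        processor.process(first, "abc");
        int afterFirstRun = processor.process(first, "de");

        // A fresh context: nothing from the first run is carried over
        RunContext second = new RunContext();
        int afterSecondRun = processor.process(second, "xy");

        return new int[] { afterFirstRun, afterSecondRun };
    }
}
```

Here the first run ends with a total of 5 and the second starts again from zero, ending at 2, even though the same processor instance served both.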

> If not, how can I release this memory? Would reseting the processor
> after having used it help? Would it be better to embed these large
> documents in the data input?

The reset() method is used in general to initialize a "state" in the
context. It is used in conjunction with the setState() method, both
defined in ProcessorImpl. So resetting after use would just free some
memory in the pipeline context, but it is better and simpler to just
start over with a new pipeline context.

Alex

Re: Reusing pipeline processors (was: Getting pipeline outputs with org.orbeon.oxf.pipeline.api)

Eric van der Vlist
Hi Alex,

On Tuesday 29 November 2005 at 19:30 -0800, Alessandro Vernet wrote:

> On 11/26/05, Eric van der Vlist <[hidden email]> wrote:
> > In particular, I am expecting that since I am reusing the config input,
> > it will be read and "compiled" only once. Is that the case?
>
> Yes, the config should be cached in this case.
>
> > Is the memory used by these documents freed when I "leave" the pipeline
> > processor between two uses?
>
> Processors should not store anything in instance properties. They
> should only store information in the pipeline context or the cache.
Hmmm... I am afraid I had missed that very important point in some of my
own custom processors but I'll fix that ASAP!

> So
> if you don't keep a reference to the pipeline context, the processors
> are not going to use more memory after one run than they did before.
> Of course, after one run, there might be more memory used by the
> cache.
>
> > If not, how can I release this memory? Would reseting the processor
> > after having used it help? Would it be better to embed these large
> > documents in the data input?
>
> The reset() method is used in general to initialize a "state" in the
> context. It used in conjunction with the setState() method, both
> defined in ProcessorImpl. So reseting after use would just free some
> memory in the pipeline context, but it is better and simpler to just
> start over with a new pipeline context.
Yes, that explains why I had seen pipeline contexts grow in my long
running pipeline.

Thanks for the explanations.

Eric
