"Java heap space" error

"Java heap space" error

thao nguyen

Hello orbeon users,

I get a "Java heap space" error when calling my XPL file. I hope someone can
help me with the questions below.

The situation: I have several XSLT files. The XPL file loads an input and
transforms it through those XSLT files by calling oxf:xslt. I call the XPL
file with this command:

java -Xms768M -Xmx768M -jar "orbeon-cli.jar" "file:/D:/my-folder/my_pipeline.xpl"

The input file is currently 3.5 MB. There are around 20 XSLT files, and the
output of each transformation step is the input of the next one. The
intermediate results can reach 20 MB.

The flow looks like this:
input.xml (3.5 MB) ---[step1.xsl]---> out_1.xml (~20 MB) ---[step2.xsl]--->
out_2.xml (~20 MB) ---[...]---> ... ---[stepN.xsl]---> out_N.xml

It cannot get through all the steps: somewhere around step xx it raises the
"Java heap space" error. So, my questions are:
1. Can I remove the intermediate results from memory? (For the same input,
it can run through around 10 XSLT transformations, so I am afraid it stores
too much in memory.) Does the null-serializer work for this purpose?
2. I had a look at the GC, and it seems to me that Orbeon does not release
all the memory. Is there any option I can set to force the GC to clean up
the memory? If usage just keeps increasing, using 1 GB or more might also
end in a "Java heap space" error.
3. Did I make the right choice in using an Orbeon pipeline for these linear
transformation steps? I could not find Orbeon's limitations documented
anywhere. Does anyone have a clue?

I would very much appreciate any response!
Thao Nguyen


--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
OW2 mailing lists service home page: http://www.ow2.org/wws

Re: "Java heap space" error

Alessandro Vernet
Administrator
Thao,

On Sun, Dec 27, 2009 at 10:43 PM, Thao Nguyen
<[hidden email]> wrote:
> 1. Can I remove the intermediate results from memory? (For the same input,
> it can run through around 10 XSLT transformations, so I am afraid it stores
> too much in memory.) Does the null-serializer work for this purpose?

The data resulting from intermediary steps should not be kept in
memory (it is not stored in cache). That is, unless the output of a step
is read from multiple places, e.g. from a <p:choose> and from a
<p:input>. Is that the case in your pipeline, or is it strictly
linear? Could you also quote the top ~20 lines of the exception you
get, so we can see where the error happens?

> 2. I had a look at the GC, and it seems to me that Orbeon does not release
> all the memory. Is there any option I can set to force the GC to clean up
> the memory? If usage just keeps increasing, using 1 GB or more might also
> end in a "Java heap space" error.

That won't help. When the VM throws an OutOfMemoryError, it is because
it really can't allocate any more memory after doing a full GC.
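
To watch heap usage while investigating this, a minimal Java sketch along these lines may help. It is not part of Orbeon: the class name HeapReport and its methods are made up for this example, and it only uses the standard java.lang.Runtime API. Note that System.gc() is merely a request to the VM, and it can never free objects that are still referenced.

```java
final class HeapReport {

    private static final long MB = 1024L * 1024L;

    /** Heap currently in use, in bytes: allocated heap minus the free space within it. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Upper bound the heap may grow to, in bytes (roughly the -Xmx value). */
    static long maxBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("used before gc: " + usedBytes() / MB + " MB");
        // Only a hint to the VM; objects that are still referenced stay in memory.
        System.gc();
        System.out.println("used after gc:  " + usedBytes() / MB + " MB");
        System.out.println("max heap:       " + maxBytes() / MB + " MB");
    }
}
```

If the "used after gc" figure stays close to the maximum, the memory is genuinely still referenced, and asking for more collections will not change the outcome.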

> 3. Did I make the right choice in using an Orbeon pipeline for these linear
> transformation steps? I could not find Orbeon's limitations documented
> anywhere. Does anyone have a clue?

I'd say you should be fine processing XML files of ~20MB with 768 MB,
but apparently there is a snag along the way. Let's try to find what
the issue is!

Alex
--
Orbeon Forms - Web forms, open-source, for the Enterprise
Orbeon's Blog: http://www.orbeon.com/blog/
My Twitter: http://twitter.com/avernet


--
Follow Orbeon on Twitter: @orbeon
Follow me on Twitter: @avernet

Re: Re: "Java heap space" error

thao nguyen
Hello Alex,

Thank you very much for your response!

On Tue, Dec 29, 2009 at 5:01 AM, Alessandro Vernet <[hidden email]> wrote:
> The data resulting from intermediary steps should not be kept in
> memory (it is not stored in cache). That is, unless the output of a step
> is read from multiple places, e.g. from a <p:choose> and from a
> <p:input>. Is that the case in your pipeline, or is it strictly
> linear? Could you also quote the top ~20 lines of the exception you
> get, so we can see where the error happens?

At the top, the pipeline includes some identity processors for its reference files. Those files are around 5 MB. The pipeline is not strictly linear: the output of one step is not always the input of the following step. It does sometimes use <p:choose>, yes. The <p:input> of a step can be the output of the step directly before it or of a step further back. Sorry that my flow diagram was not good. I have tried to draw another one here; I hope it is clearer.

        input.xml (3.5 MB) ++++ ref1.xml +++ ref2.xml
            |                              |
        [step0.xsl]                        |
            |                              |
        out_0.xml (~20 MB)                 |
            |                              |
        [step1.xsl]                        |
            |                              |
        out_1.xml (~20 MB)                 |
            |                              |
        [step2.xsl]                        |
            |                              |
        out_2.xml (~20 MB)                 |
            |                              |
          choose___________________________|
    |-----------------------|
    |                       |
[step3.xsl]            [step4.xsl]
in: out_1.xml          in: out_2.xml
    |                       |
    |_______________________|
            |
        out_3.xml ---[...]---> ... ---[stepN.xsl]---> out_N.xml

In this case, out_1.xml and out_2.xml are still in memory so that they can be the input for the XSLT processors step3 and step4. And here I get confused. You said the results of intermediary steps won't be kept, so how does it decide that it should keep out_1 and out_2 for the later steps but not out_0?

The exception I got:
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////
2009-12-23 11:35:10,512 ERROR org.orbeon.oxf.main.OPS  - Exception with no location data
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Unknown Source)
    at java.io.ByteArrayOutputStream.write(Unknown Source)
    at org.orbeon.oxf.processor.serializer.store.ResultStoreOutputStream.write(ResultStoreOutputStream.java:48)
    at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source)
    at sun.nio.cs.StreamEncoder.implWrite(Unknown Source)
    at sun.nio.cs.StreamEncoder.write(Unknown Source)
    at java.io.OutputStreamWriter.write(Unknown Source)
    at org.orbeon.saxon.tinytree.CharSlice.write(CharSlice.java:170)
    at org.orbeon.saxon.event.XMLEmitter.writeCharSequence(XMLEmitter.java:595)
    at org.orbeon.saxon.event.XMLEmitter.characters(XMLEmitter.java:548)
    at org.orbeon.saxon.event.XMLIndenter.indent(XMLIndenter.java:196)
    at org.orbeon.saxon.event.XMLIndenter.startElement(XMLIndenter.java:73)
    at org.orbeon.saxon.event.ReceivingContentHandler.startElement(ReceivingContentHandler.java:217)
    at org.orbeon.oxf.xml.ForwardingContentHandler.startElement(ForwardingContentHandler.java:87)
    at org.orbeon.oxf.xml.ForwardingContentHandler.startElement(ForwardingContentHandler.java:87)
    at org.orbeon.oxf.xml.ForwardingContentHandler.startElement(ForwardingContentHandler.java:87)
    at org.orbeon.oxf.xml.SAXStore.startElement(SAXStore.java:401)
    at org.orbeon.oxf.xml.SimpleForwardingContentHandler.startElement(SimpleForwardingContentHandler.java:69)
    at org.orbeon.oxf.processor.transformer.xslt.XSLTTransformer$3.startElement(XSLTTransformer.java:267)
    at org.orbeon.saxon.event.ContentHandlerProxy.startContent(ContentHandlerProxy.java:349)
    at org.orbeon.saxon.event.ProxyReceiver.startContent(ProxyReceiver.java:162)
    at org.orbeon.saxon.event.NamespaceReducer.startContent(NamespaceReducer.java:168)
    at org.orbeon.saxon.event.ComplexContentOutputter.startContent(ComplexContentOutputter.java:502)
    at org.orbeon.saxon.event.ComplexContentOutputter.endElement(ComplexContentOutputter.java:383)
    at org.orbeon.saxon.instruct.ElementCreator.processLeavingTail(ElementCreator.java:253)
    at org.orbeon.saxon.instruct.Copy.processLeavingTail(Copy.java:152)
    at org.orbeon.saxon.instruct.Template.applyLeavingTail(Template.java:99)
    at org.orbeon.saxon.instruct.ApplyTemplates.applyTemplates(ApplyTemplates.java:319)
    at org.orbeon.saxon.instruct.ApplyTemplates.apply(ApplyTemplates.java:189)
    at org.orbeon.saxon.instruct.ApplyTemplates.processLeavingTail(ApplyTemplates.java:153)
    at org.orbeon.saxon.instruct.Block.processLeavingTail(Block.java:353)
    at org.orbeon.saxon.instruct.Instruction.process(Instruction.java:91)
2009-12-23 11:35:10,668 INFO  org.orbeon.oxf.main.OPS  - / - Timing: 2357557 - Cache hits for cache.main: 159, fault: 118, adds: 135, expirations: 0, success rate: 57%
//////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

I really hope you can help me figure out what's wrong with my pipeline.

Thank you!

Thao

Re: Re: Re: "Java heap space" error

Alessandro Vernet
Administrator
Thao,

On Mon, Dec 28, 2009 at 7:52 PM, Nguyen Thi Ngoc Thao
<[hidden email]> wrote:
> In this case, out_1.xml and out_2.xml are still in memory so that they can
> be the input for the XSLT processors step3 and step4. And here I get
> confused. You said the results of intermediary steps won't be kept, so how
> does it decide that it should keep out_1 and out_2 for the later steps but
> not out_0?

I am simplifying a bit, but in essence, if the output of a processor
is used in two places (by two inputs) A and B, when A reads the data,
a copy of the output has to be kept in memory for when B will do its
reading later. If you have several cases like that where the data is
large, that can lead to an out of memory error.
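
The idea can be illustrated with a small conceptual Java sketch. This is not Orbeon's actual code: the class SharedOutputSketch and its methods are hypothetical, and plain strings stand in for the stream of XML events a processor produces. With a single reader nothing needs to be retained, while with two readers a full copy must be held until the second reader has consumed it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

final class SharedOutputSketch {

    /** One reader: events are passed through as produced, nothing is retained. */
    static int streamToOne(List<String> events, Consumer<String> reader) {
        for (String event : events) {
            reader.accept(event);
        }
        return 0; // number of events retained in memory after the read
    }

    /** Two readers: A reads first, so a copy must be kept until B has read too. */
    static int streamToTwo(List<String> events,
                           Consumer<String> readerA,
                           Consumer<String> readerB) {
        List<String> retained = new ArrayList<>();
        for (String event : events) {
            readerA.accept(event);
            retained.add(event); // kept only because B has not read yet
        }
        for (String event : retained) {
            readerB.accept(event);
        }
        return retained.size(); // size of the copy that had to be held
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("<a>", "text", "</a>");
        System.out.println("retained with one reader:  " + streamToOne(events, e -> { }));
        System.out.println("retained with two readers: " + streamToTwo(events, e -> { }, e -> { }));
    }
}
```

In this sketch the retained copy grows with the size of the output, which is why several shared outputs of ~20 MB documents can add up quickly.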

> The exception I got:
> [...]

From the exception, it looks like the problem might happen in the
result store used by a file serializer. By default the File Serializer
caches its data input (the data it is writing to disk). If the amount
of data is large, you'll want to disable that by setting
cache-control/use-local-cache to false in the config. Do you indeed
have a File Serializer? Could you try this? You'll find more on the
config for the File Serializer at:

http://www.orbeon.com/orbeon/doc/processors-serializers

Alex

Re: Re: Re: Re: "Java heap space" error

thao nguyen
Hi Alex,

The hint about "cache-control/use-local-cache" works. I could save a lot of space, thank you very much.
To answer your question: yes, I use a file serializer, for debugging purposes. I now use it as little as possible. However, there is still not enough heap space. The intermediate results have grown and are now around 65 MB per file. Therefore, I split the one big pipeline into several smaller pipelines. Combining those small pipelines in one container pipeline by using oxf:pipeline unfortunately failed, but it works when I call the small pipelines from a DOS batch file. I guess the container pipeline uses the same heap for all the child pipelines it calls. So my questions are:
- Can I somehow force the container pipeline to free the memory used by a child pipeline that has already run?
- When I call orbeon-cli for one oxf:xslt step with an input of around 65 MB, the heap size immediately reaches around 700 MB, so the heap size is around 10 times the real input size. Do you know if that is the expected ratio? Does it also depend on the XSLT file (for example, on how complex the XSLT is)?
- And finally (but I guess it is hopeless), how can I prevent the "Java heap space" error?

Thanks very much again for your support so far,
Thao

Re: Re: Re: Re: Re: "Java heap space" error

Alessandro Vernet
Administrator
Thao,

1) You can't really force content out of memory. If a processor keeps a reference to something in memory, it is either because it needs it or, if it doesn't, because of a bug :).

2) Michael Kay, the author of Saxon, usually says that you should count on a factor of 5. But that is if only one copy of the XML is in memory. If you have two, then you get to 10.

3) Working on some optimization for a client during the last few weeks, I found that our XSLT processor was, by mistake (!), always storing its input in cache. This can have a very bad impact on memory usage, especially if you are manipulating large documents. I will get this fix into the codebase and post a follow-up message when done.
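
The rule of thumb in point 2 is simple arithmetic, sketched below in Java. The class and method names are hypothetical, and the factor of 5 is only the estimate quoted above, not a guarantee: actual usage also depends on the stylesheet and on what else the pipeline holds in memory.

```java
final class HeapEstimate {

    /**
     * Rough heap needed to hold XML documents as in-memory trees.
     *
     * @param documentMb size of the serialized XML document, in MB
     * @param copies     number of copies of the document held in memory at once
     * @param factor     in-memory expansion factor (5 is the rule of thumb quoted above)
     * @return estimated heap usage in MB
     */
    static long estimateHeapMb(long documentMb, int copies, int factor) {
        return documentMb * copies * factor;
    }

    public static void main(String[] args) {
        // One copy of a 65 MB document: 65 * 1 * 5 = 325 MB
        System.out.println(estimateHeapMb(65, 1, 5) + " MB");
        // Two copies: 65 * 2 * 5 = 650 MB, in line with the roughly 700 MB reported
        System.out.println(estimateHeapMb(65, 2, 5) + " MB");
    }
}
```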

Alex



Re: Re: Re: Re: Re: Re: "Java heap space" error

thao nguyen
Hi Alex,

Many thanks and I'm looking forward to hearing about your XSLT processor update.

Thao

Re: Re: Re: Re: Re: Re: Re: "Java heap space" error

Alessandro Vernet
Administrator
Thao,

I have the change done, but need to forward-port it to the latest code. I hope to find the time to do this in the next few days. Sorry to make you wait!

Alex



Re: Re: Re: Re: Re: Re: Re: Re: "Java heap space" error

thao nguyen
Alex,

That's already good news. Thank you very much!

Thao

On Wed, Jan 20, 2010 at 11:05 AM, Alessandro Vernet <[hidden email]> wrote:
Thao,

I have the change done, but need to forward-port it to the latest code. I hope to find the time to do this in the next few days. Sorry to make you wait!

Alex



On Jan 12, 2010, at 1:54 AM, Nguyen Thi Ngoc Thao <[hidden email]> wrote:

Hi Alex,

Many thanks and I'm looking forward to hearing about your XSLT processor update.

Thao

On Tue, Jan 12, 2010 at 9:02 AM, Alessandro Vernet <[hidden email][hidden email]> wrote:
Thao,

1) You can't really force content out of memory. If a processor keeps a reference to something in memory, it is either because it needs it, or it doesn't, it is a bug :).

2) Michael Kay, the author of Saxon, usually says that you should count on a factor of 5. But that is if only one copy of the XML is in memory. If you have two, then you get to 10.

3) Working on some optimization for a client during the last few weeks, I found that our XSLT processor was, by mistake (!), always storing its input in cache. This can have a very bad impact on memory usage, especially if you are manipulating large document. I will get this fix in the codebase and post a follow-up message when done.

Alex



On Jan 11, 2010, at 12:33 AM, Nguyen Thi Ngoc Thao <[hidden email][hidden email]> wrote:

Hi Alex,

The hint of using "cache-control/use-local-cache" works. I could save a lot of space, thank you very much.
To answer your question, I have to use file serializer for debug purpose. Now, I use it as less as possible. However, it still has not enough space. The intermediate results gets bigger now. It's now around 65MB a file. Therefore, I separate one big pipeline into several smaller pipelines. Combining those small pipelines in one pipeline (a container pipeline) by using oxf:pipeline, unfortunately, failed. But it works when I call those small pipelines by a DOS batch file. I guess the container pipeline use the same heap for all children pipelines it called. So my questions are:
- Can I, somehow, force the container pipeline to delete the memory used by child pipeline which is already called?
- When I call orbeon-cli for one oxf:xslt step with input is around 65MB, the heap size imediately reach around 700MB, so, the heap size is around 10 times the real input size. Do you know if it's the correct ratio? Does it also depend on the xslt file? (maybe how complexity the xslt is?)
- And finally (but I guess, hopeless) how can I prevent the java-heap-space error?

Thanks very much again for your support so far,
Thao

On Thu, Dec 31, 2009 at 8:08 AM, Alessandro Vernet <[hidden email]> wrote:
Thao,

On Mon, Dec 28, 2009 at 7:52 PM, Nguyen Thi Ngoc Thao <[hidden email]> wrote:
> In this case, out_1.xml and out_2.xml are still in memory and can be
> the input for XSLT processors step3 and step4. And here, I get confused. You
> said the results from intermediary steps won't be kept, so how can it
> decide that it should keep out_1 and out_2 for the later steps but not
> out_0?

I am simplifying a bit, but in essence, if the output of a processor
is used in two places (by two inputs) A and B, when A reads the data,
a copy of the output has to be kept in memory for when B will do its
reading later. If you have several cases like that where the data is
large, that can lead to an out of memory error.
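
A rough sketch of that situation in XPL, for illustration only. The stylesheet names (step1.xsl, step2.xsl, report.xsl) and the output ids are made up; only the shape matters. The output "out-1" is referenced by two inputs, so once the first reader has consumed it, a copy has to stay in memory until the second reader has run:

```xml
<p:config xmlns:p="http://www.orbeon.com/oxf/pipeline"
          xmlns:oxf="http://www.orbeon.com/oxf/processors">

    <!-- Step 1: its output "out-1" is referenced twice below -->
    <p:processor name="oxf:xslt">
        <p:input name="config" href="step1.xsl"/>
        <p:input name="data" href="input.xml"/>
        <p:output name="data" id="out-1"/>
    </p:processor>

    <!-- Reader A: the next transformation in the chain -->
    <p:processor name="oxf:xslt">
        <p:input name="config" href="step2.xsl"/>
        <p:input name="data" href="#out-1"/>
        <p:output name="data" id="out-2"/>
    </p:processor>

    <!-- Reader B: a second, hypothetical consumer of the same output -->
    <p:processor name="oxf:xslt">
        <p:input name="config" href="report.xsl"/>
        <p:input name="data" href="#out-1"/>
        <p:output name="data" id="report"/>
    </p:processor>

    <!-- Consume the final outputs so that the steps above actually run -->
    <p:processor name="oxf:null-serializer">
        <p:input name="data" href="#out-2"/>
    </p:processor>
    <p:processor name="oxf:null-serializer">
        <p:input name="data" href="#report"/>
    </p:processor>

</p:config>
```

The same applies if the second reader is a serializer used for debugging rather than another transformation. In a strictly linear chain, where each output is referenced exactly once, this particular kind of copy should not be needed.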

> The exception I got:
> [...]

From the exception, it looks like the problem might happen in the
result store used by a file serializer. By default the File Serializer
caches its data input (the data it is writing to disk). If the amount
of data is large, you'll want to disable that by setting
cache-control/use-local-cache to false in the config. Do you indeed
have a File Serializer? Could you try this? You'll find more on the
config for the File Serializer at:

http://www.orbeon.com/orbeon/doc/processors-serializers
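
As a minimal sketch, the setting might look like this on a File Serializer inside a pipeline. The directory, file name, and the "#out-1" reference are placeholders, and the element names other than cache-control/use-local-cache should be checked against the documentation page above for your version. The p: prefix is assumed to be declared on the enclosing p:config element:

```xml
<p:processor name="oxf:file-serializer">
    <p:input name="config">
        <config>
            <!-- Placeholder location for the debug output -->
            <directory>D:/my-folder/debug</directory>
            <file>out_1.xml</file>
            <!-- Do not keep a local copy of the data being written -->
            <cache-control>
                <use-local-cache>false</use-local-cache>
            </cache-control>
        </config>
    </p:input>
    <!-- Placeholder reference to the output of an earlier step -->
    <p:input name="data" href="#out-1"/>
</p:processor>
```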

Alex
--
Orbeon Forms - Web forms, open-source, for the Enterprise
Orbeon's Blog: http://www.orbeon.com/blog/
My Twitter: http://twitter.com/avernet




Re: Re: Re: Re: Re: Re: Re: Re: Re: "Java heap space" error

Alessandro Vernet
Administrator
Thao,

I just checked in the fix for this. So it will be included in the next
nightly build. And for reference, the bug is:

http://forge.ow2.org/tracker/index.php?func=detail&aid=314956&group_id=168&atid=350207

Alex

On Thu, Jan 28, 2010 at 8:59 PM, Nguyen Thi Ngoc Thao
<[hidden email]> wrote:

> Alex,
>
> That's already good news. Thank you very much!
>
> Thao
>
> On Wed, Jan 20, 2010 at 11:05 AM, Alessandro Vernet <[hidden email]>
> wrote:
>>
>> Thao,
>> I have the change done, but need to forward-port it to the latest code. I
>> hope to find the time to do this in the next few days. Sorry to make you
>> wait!
>> Alex
>>
>>
>> On Jan 12, 2010, at 1:54 AM, Nguyen Thi Ngoc Thao
>> <[hidden email]> wrote:
>>
>> Hi Alex,
>>
>> Many thanks and I'm looking forward to hearing about your XSLT processor
>> update.
>>
>> Thao
>>


--
Orbeon Forms - Web forms, open-source, for the Enterprise -
http://www.orbeon.com/
My Twitter: http://twitter.com/avernet


--
Follow Orbeon on Twitter: @orbeon
Follow me on Twitter: @avernet