Hello Orbeon users,

I got an error, "Java heap space", when running my XPL file. I hope someone can help me with the questions below.

The situation: I have several XSLT files. The XPL file loads an input and transforms it through those XSLT files by calling oxf:xslt. I run the XPL file with this command:

    java -Xms768M -Xmx768M -jar "orbeon-cli.jar" "file:/D:/my-folder/my_pipeline.xpl"

The input file is currently 3.5 MB. There are around 20 XSLT files, and the output of each transformation step is the input for the following step. An intermediate result can reach 20 MB. The flow looks like this:

    input.xml (3.5 MB) ---[step1.xsl]---> out_1.xml (~20 MB) ---[step2.xsl]---> out_2.xml (~20 MB) ---[...]---> ... ---[stepN.xsl]---> out_N.xml

It cannot get through all the steps: somewhere around step xx, it raises the "Java heap space" error. So, my questions are:

1. Can I remove the intermediate results from memory? (With the same input, it runs through around 10 XSLT transformations, so I'm afraid it stores too much in memory.) Does the null-serializer work for this purpose?
2. I had a look at the GC, and it seems to me that Orbeon does not release all the memory. Is there any option I can set to force the GC to clean up the memory? If usage just keeps increasing, using 1 GB or more might also end in a "Java heap space" error.
3. Did I make the right choice in using an Orbeon pipeline for these linear transformation steps? I could not find Orbeon's limitations written down anywhere. Does anyone have a clue?

I very much appreciate any response!
Thao Nguyen

--
You receive this message as a subscriber of the [hidden email] mailing list.
To unsubscribe: mailto:[hidden email]
For general help: mailto:[hidden email]?subject=help
OW2 mailing lists service home page: http://www.ow2.org/wws
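For readers less familiar with XPL, a strictly linear chain like the one described above might look roughly like this. It is a hypothetical sketch, not the actual pipeline from the question: the ids (out-1, out-2, out-3) and step file names are placeholders, and only three steps are shown. The point is that each intermediate output id is referenced by exactly one p:input.

```xml
<!-- Hypothetical sketch: a strictly linear chain of oxf:xslt steps.
     Each intermediate output id is read by exactly one p:input. -->
<p:config xmlns:p="http://www.orbeon.com/oxf/pipeline"
          xmlns:oxf="http://www.orbeon.com/oxf/processors">

    <!-- Step 1: input.xml (~3.5 MB) in, out-1 (~20 MB) out -->
    <p:processor name="oxf:xslt">
        <p:input name="config" href="step1.xsl"/>
        <p:input name="data" href="input.xml"/>
        <p:output name="data" id="out-1"/>
    </p:processor>

    <!-- Step 2: the only reader of out-1 -->
    <p:processor name="oxf:xslt">
        <p:input name="config" href="step2.xsl"/>
        <p:input name="data" href="#out-1"/>
        <p:output name="data" id="out-2"/>
    </p:processor>

    <!-- Step 3: the only reader of out-2; further steps chain the same way -->
    <p:processor name="oxf:xslt">
        <p:input name="config" href="step3.xsl"/>
        <p:input name="data" href="#out-2"/>
        <p:output name="data" id="out-3"/>
    </p:processor>

    <!-- A serializer reading #out-3 (e.g. to write out_N.xml) would follow; omitted here -->
</p:config>
```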
Thao,
On Sun, Dec 27, 2009 at 10:43 PM, Thao Nguyen <[hidden email]> wrote:
> 1. Can I remove the intermediate result out of the memory? (for the same input,
> it can run through around 10 xslt transformations, so I afraid it store too
> much in the memory) Does the null-serializer works for this purpose?

The data resulting from intermediary steps should not be kept in memory (it is not stored in cache), unless the output of a step is read from multiple places, e.g. from a <p:choose> and from a <p:input>. Is that the case in your pipeline, or is it strictly linear? Could you also quote the top ~20 lines of the exception you get, so we can see where the error happens?

> 2. I have a look at the GC, and it seems to me that Orbeon did not remove all
> the memory. Is there any option I can set to force GC to clean the memory? If
> it just increasing, using 1GB or more might also raise to "Java heap space"
> error.

That won't help. When the VM throws an OutOfMemoryError, it is because it really can't allocate any more memory, even after doing a full GC.

> 3. Do I make a right choice to use orbeon pipeline for this linear steps of
> transformation? I could not find the orbeon limitation written somewhere,
> anyone has a clue?

I'd say you should be fine processing XML files of ~20 MB with 768 MB of heap, but apparently there is a snag along the way. Let's try to find out what the issue is!

Alex
--
Orbeon Forms - Web forms, open-source, for the Enterprise
Orbeon's Blog: http://www.orbeon.com/blog/
My Twitter: http://twitter.com/avernet
--
Follow Orbeon on Twitter: @orbeon
Follow me on Twitter: @avernet
Hello Alex,
Thank you very much for your response!

On Tue, Dec 29, 2009 at 5:01 AM, Alessandro Vernet <[hidden email]> wrote:

At the top, the pipeline includes some identity processors for its reference files. Those files are around 5 MB. The pipeline is not strictly linear; I mean that the output of one step is not always the input of the next one. Yes, it sometimes has a <p:choose>. The <p:input> of a step can be the output of the step directly before it, or of a step further back. Sorry that my first flow was not accurate. Here is another one; I hope it's clearer:

      input.xml (3.5 MB)      ref1.xml    ref2.xml
              |                  |           |
         [step0.xsl]             |           |
              |                  |           |
      out_0.xml (~20 MB)         |           |
              |                  |           |
         [step1.xsl]             |           |
              |                  |           |
      out_1.xml (~20 MB)         |           |
              |                  |           |
         [step2.xsl]             |           |
              |                  |           |
      out_2.xml (~20 MB)         |           |
              |                  |           |
           choose ---------------+-----------+
              |
          +---+---------------------+
          |                         |
     [step3.xsl]               [step4.xsl]
    in: out_1.xml             in: out_2.xml
          |                         |
          +------------+------------+
                       |
                   out_3.xml ---[...]---> ... ---[stepN.xsl]---> out_N.xml

In this case, out_1.xml and out_2.xml are still in memory, since they can be the input for the XSLT processors in step3 and step4. And here I get confused. You said that results from intermediary steps won't be kept, so how does it decide that it should keep out_1 and out_2 for the later steps, but not out_0?
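The branching part of this flow could be expressed in XPL roughly as below. This is a hypothetical sketch: the ids, file names, and the p:when condition are placeholders, and the reference documents are left out. Counting readers per output id shows the difference: #out-0 is read only by step1, while #out-1 is read by step2 and step3, and #out-2 by the p:choose and step4. By the rule in the earlier reply (intermediate data is only kept when an output is read from multiple places), that is what would set out_1 and out_2 apart from out_0.

```xml
<!-- Hypothetical sketch of the branching flow; ref1.xml/ref2.xml omitted -->
<p:config xmlns:p="http://www.orbeon.com/oxf/pipeline"
          xmlns:oxf="http://www.orbeon.com/oxf/processors">

    <!-- step0: out-0 has a single reader (step1) -->
    <p:processor name="oxf:xslt">
        <p:input name="config" href="step0.xsl"/>
        <p:input name="data" href="input.xml"/>
        <p:output name="data" id="out-0"/>
    </p:processor>

    <p:processor name="oxf:xslt">
        <p:input name="config" href="step1.xsl"/>
        <p:input name="data" href="#out-0"/>
        <p:output name="data" id="out-1"/>
    </p:processor>

    <!-- step2: one of two readers of out-1 -->
    <p:processor name="oxf:xslt">
        <p:input name="config" href="step2.xsl"/>
        <p:input name="data" href="#out-1"/>
        <p:output name="data" id="out-2"/>
    </p:processor>

    <!-- The choose evaluates its conditions against out-2: one of two readers of out-2 -->
    <p:choose href="#out-2">
        <p:when test="/*/@variant = 'a'">
            <!-- step3: the second reader of out-1 -->
            <p:processor name="oxf:xslt">
                <p:input name="config" href="step3.xsl"/>
                <p:input name="data" href="#out-1"/>
                <p:output name="data" id="out-3"/>
            </p:processor>
        </p:when>
        <p:otherwise>
            <!-- step4: the second reader of out-2 -->
            <p:processor name="oxf:xslt">
                <p:input name="config" href="step4.xsl"/>
                <p:input name="data" href="#out-2"/>
                <p:output name="data" id="out-3"/>
            </p:processor>
        </p:otherwise>
    </p:choose>

    <!-- out-3 continues through the remaining steps up to stepN (omitted) -->
</p:config>
```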
The exception I got:

2009-12-23 11:35:10,512 ERROR org.orbeon.oxf.main.OPS - Exception with no location data
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Unknown Source)
    at java.io.ByteArrayOutputStream.write(Unknown Source)
    at org.orbeon.oxf.processor.serializer.store.ResultStoreOutputStream.write(ResultStoreOutputStream.java:48)
    at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source)
    at sun.nio.cs.StreamEncoder.implWrite(Unknown Source)
    at sun.nio.cs.StreamEncoder.write(Unknown Source)
    at java.io.OutputStreamWriter.write(Unknown Source)
    at org.orbeon.saxon.tinytree.CharSlice.write(CharSlice.java:170)
    at org.orbeon.saxon.event.XMLEmitter.writeCharSequence(XMLEmitter.java:595)
    at org.orbeon.saxon.event.XMLEmitter.characters(XMLEmitter.java:548)
    at org.orbeon.saxon.event.XMLIndenter.indent(XMLIndenter.java:196)
    at org.orbeon.saxon.event.XMLIndenter.startElement(XMLIndenter.java:73)
    at org.orbeon.saxon.event.ReceivingContentHandler.startElement(ReceivingContentHandler.java:217)
    at org.orbeon.oxf.xml.ForwardingContentHandler.startElement(ForwardingContentHandler.java:87)
    at org.orbeon.oxf.xml.ForwardingContentHandler.startElement(ForwardingContentHandler.java:87)
    at org.orbeon.oxf.xml.ForwardingContentHandler.startElement(ForwardingContentHandler.java:87)
    at org.orbeon.oxf.xml.SAXStore.startElement(SAXStore.java:401)
    at org.orbeon.oxf.xml.SimpleForwardingContentHandler.startElement(SimpleForwardingContentHandler.java:69)
    at org.orbeon.oxf.processor.transformer.xslt.XSLTTransformer$3.startElement(XSLTTransformer.java:267)
    at org.orbeon.saxon.event.ContentHandlerProxy.startContent(ContentHandlerProxy.java:349)
    at org.orbeon.saxon.event.ProxyReceiver.startContent(ProxyReceiver.java:162)
    at org.orbeon.saxon.event.NamespaceReducer.startContent(NamespaceReducer.java:168)
    at org.orbeon.saxon.event.ComplexContentOutputter.startContent(ComplexContentOutputter.java:502)
    at org.orbeon.saxon.event.ComplexContentOutputter.endElement(ComplexContentOutputter.java:383)
    at org.orbeon.saxon.instruct.ElementCreator.processLeavingTail(ElementCreator.java:253)
    at org.orbeon.saxon.instruct.Copy.processLeavingTail(Copy.java:152)
    at org.orbeon.saxon.instruct.Template.applyLeavingTail(Template.java:99)
    at org.orbeon.saxon.instruct.ApplyTemplates.applyTemplates(ApplyTemplates.java:319)
    at org.orbeon.saxon.instruct.ApplyTemplates.apply(ApplyTemplates.java:189)
    at org.orbeon.saxon.instruct.ApplyTemplates.processLeavingTail(ApplyTemplates.java:153)
    at org.orbeon.saxon.instruct.Block.processLeavingTail(Block.java:353)
    at org.orbeon.saxon.instruct.Instruction.process(Instruction.java:91)
2009-12-23 11:35:10,668 INFO org.orbeon.oxf.main.OPS - / - Timing: 2357557 - Cache hits for cache.main: 159, fault: 118, adds: 135, expirations: 0, success rate: 57%

I really hope you can help me figure out what's wrong with my pipeline. Thank you!
Thao
Thao,
On Mon, Dec 28, 2009 at 7:52 PM, Nguyen Thi Ngoc Thao <[hidden email]> wrote:
> In this case, out_1.xml and out_2.xml is still in the memory and it can be
> the input for xslt processors step3 and step4. And here, I get confuse. You
> said the resulting from intermediary steps won't be kept, so how can it
> decide that it should keep out_1 and out_2 for the later steps but not
> out_0?

I am simplifying a bit, but in essence, if the output of a processor is used in two places (by two inputs) A and B, then when A reads the data, a copy of the output has to be kept in memory for when B does its reading later. If you have several cases like that where the data is large, that can lead to an out-of-memory error.

> The exception I got:
> [...]

From the exception, it looks like the problem might happen in the result store used by a File Serializer. By default, the File Serializer caches its data input (the data it is writing to disk). If the amount of data is large, you'll want to disable that by setting cache-control/use-local-cache to false in the config. Do you indeed have a File Serializer? Could you try this? You'll find more on the File Serializer configuration at:

http://www.orbeon.com/orbeon/doc/processors-serializers

Alex
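A File Serializer with its local cache turned off might be configured roughly as follows. This is a sketch: only the cache-control/use-local-cache setting comes from the reply above; the other config elements (directory, file), the directory path, and the #out-n id are assumptions to check against the serializer documentation for the Orbeon version in use. The p: and oxf: prefixes are assumed to be declared on the enclosing p:config.

```xml
<!-- File Serializer used for debugging, with its local cache disabled.
     Element names other than cache-control/use-local-cache are assumptions. -->
<p:processor name="oxf:file-serializer">
    <p:input name="config">
        <config>
            <!-- Hypothetical output location -->
            <directory>D:/my-folder/debug</directory>
            <file>out_n.xml</file>
            <!-- Don't keep a copy of the data being written in memory -->
            <cache-control>
                <use-local-cache>false</use-local-cache>
            </cache-control>
        </config>
    </p:input>
    <!-- Hypothetical id of the step output being dumped to disk; depending on
         the version, a converter such as oxf:xml-converter may be needed first -->
    <p:input name="data" href="#out-n"/>
</p:processor>
```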
Hi Alex,
The hint about "cache-control/use-local-cache" works; it saved a lot of memory. Thank you very much. To answer your question: I have to use the File Serializer for debugging purposes. Now I use it as little as possible.

However, there is still not enough memory. The intermediate results have gotten bigger; they are now around 65 MB per file. Therefore, I split the big pipeline into several smaller pipelines. Combining those small pipelines into one pipeline (a container pipeline) using oxf:pipeline unfortunately failed, but it works when I call the small pipelines from a DOS batch file. I guess the container pipeline uses the same heap for all the child pipelines it calls. So my questions are:

- Can I somehow force the container pipeline to free the memory used by a child pipeline that has already been called?
- When I call orbeon-cli for a single oxf:xslt step with an input of around 65 MB, the heap immediately reaches around 700 MB, so the heap size is around 10 times the actual input size. Do you know if that is the expected ratio? Does it also depend on the XSLT file (maybe on how complex the XSLT is)?
- And finally (though I guess it's hopeless), how can I prevent the "Java heap space" error?

Thanks very much again for your support so far,
Thao

On Thu, Dec 31, 2009 at 8:08 AM, Alessandro Vernet <[hidden email]> wrote:
Thao,

1) You can't really force content out of memory. If a processor keeps a reference to something in memory, it is either because it needs it or, if it doesn't, because of a bug :).

2) Michael Kay, the author of Saxon, usually says that you should count on a factor of 5 between the size of an XML document and the memory it takes once loaded. But that is if only one copy of the XML is in memory. If you have two, then you get to 10.

3) While working on some optimizations for a client during the last few weeks, I found that our XSLT processor was, by mistake (!), always storing its input in cache. This can have a very bad impact on memory usage, especially if you are manipulating large documents. I will get this fix into the codebase and post a follow-up message when done.

Alex
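Applying the factor-of-5 rule of thumb from point 2 to the ~65 MB intermediate results mentioned earlier gives a rough, back-of-the-envelope estimate (not a measurement). With two copies in memory, it lands close to the ~700 MB heap reported for a single oxf:xslt step, and leaves little headroom under -Xmx768M before counting anything else the process needs:

```latex
\begin{aligned}
\text{one copy:}   &\quad 5 \times 65\ \text{MB} = 325\ \text{MB} \\
\text{two copies:} &\quad 2 \times 5 \times 65\ \text{MB} = 650\ \text{MB} \\
\text{headroom:}   &\quad 768\ \text{MB} - 650\ \text{MB} = 118\ \text{MB}
\end{aligned}
```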
Hi Alex,
Many thanks, and I'm looking forward to hearing about your XSLT processor update.
Thao

On Tue, Jan 12, 2010 at 9:02 AM, Alessandro Vernet <[hidden email]> wrote:
Thao, I have the change done, but need to forward-port it to the latest code. I hope to find the time to do this in the next few days. Sorry to make you wait! Alex
Alex,
That's already good news. Thank you very much!
Thao

On Wed, Jan 20, 2010 at 11:05 AM, Alessandro Vernet <[hidden email]> wrote:
Thao,
I just checked in the fix for this. So it will be included in the next nightly build. And for reference, the bug is:

http://forge.ow2.org/tracker/index.php?func=detail&aid=314956&group_id=168&atid=350207

Alex

On Thu, Jan 28, 2010 at 8:59 PM, Nguyen Thi Ngoc Thao <[hidden email]> wrote:
> [...]