Hi All,
We have released an application into production, and experiencing slowness in rendering of the forms sporadically. We are using the Orbeon-4.10-CE version, deployed on Apache Tomcat 7, with Open JDK 6. (open JDK 6 is because of a particlar version of cocoon library) We have written standard XFroms, and are using the orbeon xform engine, that are deployed under /orbeon/WEB-INF/resources/apps/core-app/form.xhtml The form rendering requests usually take 2-3 secs, but when load increases i.e. we have 4-5 parallel requests to render the forms within this /core-app, the requests become extremely slow and take upto 1 minute, and sometimes even more 2 mins. When this happens the Ajax requests for /orbeon/xforms-server also are slow to repond i.e. from 300 ms - 10 secs. We have increased the Limiting Filter numThreads to 32 and maxThreads to 48. The machines we have are AWS ec2 instance with 8 cores. I tried to take a Thread dump and anlayse the requests, there are no blocking threads, all threads are running. These XFORM have 5 retrieve-instance requests to an HTTP service which retrieve data from a database, so it could be that those requests are taking longer and hene the page construct takes longer. But monitoring the database load and requests does not seem to suggest so. Any pointers on how to debug this forward would be useful. Thanks, Balvinder |
Additional point, when the requests are taking longer to respond the CPU spikes (from 10 - 50%) and then returns to normal, the memory is constant and <10% always |
In reply to this post by bsdang
Hi All,
Can someone please explain what these logs mean? A URL connection is opened to a HTTP service, the "end opening connection {time (ms): "7556"}" means the time taken to get the response back from the service ? But the "opening URL" and "end opening connection" both appear in the log 0.001 second apart ? what does that mean ? How can we accurately track the time taken by orbeon to receive response ? 2015-10-22 10:24:29,340 DEBUG XFormsServer - opening URL connection {method: "GET", URL: "https://core.communities.gov. uk:443/DataCollection/templates/2015-CORE-Sales?owning-institution=b635ffe4f5ef754859384513c9d01551f6756c5c&managing-institution=b635ffe 4f5ef754859384513c9d01551f6756c5c", Cookie: "SessionID=5e3d1fda3f4a74c9", Orbeon-Token: "f7b9975bf580d9acab9d85f7d8c5db3de1809088"} 2015-10-22 10:24:29,340 DEBUG XFormsServer - response {status code: "200"} 2015-10-22 10:24:29,340 DEBUG XFormsServer - response headers {Server: "MarkLogic", Keep-Alive: "timeout=2", X-Frame-Op tions: "SAMEORIGIN", Connection: "Keep-Alive", Content-Length: "6259", Content-Type: "application/xml", Set-Cookie: "SessionID=5e3d1fda3 f4a74c9; path=/; secure"} 2015-10-22 10:24:29,341 DEBUG XFormsServer - response has content 2015-10-22 10:24:29,341 DEBUG XFormsServer - saved HTTP state {scope: "session", cookie names: "SessionID"} 2015-10-22 10:24:29,341 DEBUG XFormsServer - end opening connection {time (ms): "7556"} 2015-10-22 10:24:29,341 DEBUG XFormsServer - deserializing to mutable instance 2015-10-22 10:24:29,341 DEBUG XFormsServer - end connecting {time (ms): "7556"} This Service/Database log tells me that this particular request took 0.0093 seconds on the back end service /templates/2015-CORE-Sales?owning-institution=b635ffe4f5ef754859384513c9d01551f6756c5c 2015-10-22 10:24:29.339 Debug: dclg-datamart: request uri is templates:retrieve method GET took PT0.009394S So if back end is taking 0.0093 secs to respond, why do I see end connecting {time (ms): "7556"} ? Please explain this log statement "end connecting {time (ms): "7556"}" ? Thanks, Balvinder |
Administrator
|
Hi Balvinder,
Which ec2 instance did you pick? We recommend you go with an option that provides you with a decent amount of memory, and you can find here results we got from load testing on actual customer forms deployed on AWS: http://doc.orbeon.com/faq/form-builder-runner.html#how-much-load-can-orbeon-forms-handle The limiter filter should ensure that you don't have too many requests being handled at the same time, which would typically reduce throughput. This is enabled by default, so unless you removed it or changed the web.xml, you shouldn't have a problem there, but it's worth double checking. More on this at: http://doc.orbeon.com/configuration/advanced/limiter-filter.html When you say you "increased the Limiting Filter numThreads to 32 and maxThreads to 48", are you referring to the Orbeon Forms filter? Or the Tomcat configuration? If you're referring to the Orbeon Forms filter, those value seem to high if you have 8 cores. With 8 cores it usually isn't worth processing more than 8 requests concurrently. You might increase throughput with a slightly higher value if you have some contention at the database level, but I'd start with 8, i.e. x1, which is the default. If you're referring to the Tomcat config, you should keep those values high to avoid deadlocks, and the Tomcat defaults are usually fine. You're saying that a service call takes 7556 ms, but the service responds in 93 ms. Indeed, there's something fishy there. Could you capture a larger portion of the log around the part that quoted and attach it to your response. Let's try to figure this out! Alex
--
Follow Orbeon on Twitter: @orbeon Follow me on Twitter: @avernet |
Thanks Alex, these are very useful tips.
Meanwhile we have found something which could be the main reason for this problem AWS ELB have this concept "pre-open" optimization for ELBs, which means ELB maintains open connection with the back end HTTP Server, and utilizes those threads when load increases. Please read this article. In our case we have orbeon request traffic going to the REST Service directly, i.e. not through ELB, and the ELB was also sending external traffic to REST service, which means when external load increases all threads are utilized by external traffic and then kept open, whereas the internal requests are queued up. Which shows up as sporadic response times in loading the Xform/instances. We are going to talk to AWS about disabling this "pre-open" optimization, or re-configure to route all traffic via the ELB. Thanks, Balvinder |
Administrator
|
Hi Balvinder,
I'm not familiar with Amazon's Elastic Load Balancing, but you'll let us know if disabling that "pre-open" optimization solves the problem. Alex
--
Follow Orbeon on Twitter: @orbeon Follow me on Twitter: @avernet |
Free forum by Nabble | Edit this page |