How to diagnose stability issues

Aaron Spike
I had very few (if any) stability issues while developing and testing with Orbeon Forms. But now that I'm transitioning to the intended production environment, Orbeon becomes unresponsive every few hours and I need to restart Tomcat. I don't see any stack traces in the logs. (I did have one error relating to JNDIRealm's LDAP connections timing out, but this has been resolved.) I tried increasing the memory, and Tomcat's Java options are now " -Xms3072m -Xmx3072m -XX:MaxPermSize=256m ", but this didn't solve the problem. Surprisingly little memory appears to be consumed on the host and there is no swapping. I saw a note in the installation instructions about not using GCJ:

On Unix systems, we recommend you don't use GIJ / GCJ, as there are reports of issues with that runtime environment and Orbeon Forms. Instead, we recommend you use the Sun/Oracle runtime Java environment.
- https://github.com/orbeon/orbeon-forms/wiki/Installation-~-Tomcat
 
I don't believe I am. java -version returns:

java version "1.7.0_71"
OpenJDK Runtime Environment (IcedTea 2.5.3) (suse-6.2-x86_64)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)

Are there any known problems with OpenJDK? I'm using stock packages for Java and Tomcat but I do see the following APR warning on start:

SEVERE: An incompatible version 1.1.27 of the APR based Apache Tomcat Native library is installed, while Tomcat requires version 1.1.30

I don't know what this version mismatch affects, but I plan to try a different version of the distro to see if there's a difference.

Being relatively new to the world of Java and Tomcat, I'm not sure where to look to track down this issue. Are there any logging options I should increase that might be helpful? What would others suggest I do to get a better picture of the issue?

Aaron Spike

This electronic communication, including any attached documents, may contain confidential and/or legally privileged information that is intended only for use by the recipient(s) named above. If you have received this communication in error, please notify the sender immediately and delete the communication and any attachments. Views expressed by the author do not necessarily represent those of Martin Luther College.

--
You received this message because you are subscribed to the Google Groups "Orbeon Forms" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To post to this group, send email to [hidden email].

Re: How to diagnose stability issues

Alessandro  Vernet
Administrator
Hi Aaron,

2 quick questions to get started:

- Which version of Orbeon Forms are you using?
- Are you using eXist for persistence, or another database?

Alex
--
Follow Orbeon on Twitter: @orbeon
Follow me on Twitter: @avernet

Re: How to diagnose stability issues

Aaron Spike
On Wednesday, February 4, 2015 at 1:33:21 PM UTC-6, Alessandro Vernet wrote:
2 quick questions to get started:
- Which version of Orbeon Forms are you using?
 
4.9.0pre. I realize I'm treading on thin ice using unreleased software, but that's where the neat new stuff is. :-)

- Are you using eXist for persistence, or another database?

PostgreSQL. It occurred to me after posting this morning that the issue might be similar to the idle connection timeout issues I experienced with JNDIRealm contacting the LDAP server, but this time for PostgreSQL. With the LDAP problem I saw an exception in the log file; this time I don't see any indication of a problem. We've made some configuration adjustments and I haven't had to restart the server yet, except to push a configuration change. I'm interested to see if Orbeon will stay responsive overnight. It is a little troubling to me that there are no potentially related log messages, but I still haven't fully grokked Java/Tomcat logging configuration.

Aaron Spike



Re: How to diagnose stability issues

Aaron Spike
Orbeon was perfectly responsive when I arrived this morning. So the problem is quite likely overzealous firewalls dropping idle JDBC connections. I'm thinking this isn't primarily an Orbeon issue, though perhaps Orbeon could recover from the situation better. I would still like to know how to get Tomcat, the JDBC driver, and the logging system to tell me about the problem. I would also like to learn how to configure Tomcat and JDBC to recover properly. I'm not sure where to find answers to these questions, so if there are some experts lurking here, please let me know what you think.

Aaron Spike


Re: How to diagnose stability issues

Alessandro  Vernet
Administrator
Hi Aaron,

Ah, right, you're using PostgreSQL. So that shouldn't be an issue. (I was thinking of a problem that could happen when using the internal eXist under heavy load.) I'm glad that the system is now responsive, but do let us know if you see anything worrisome. Obviously, stability issues are of the utmost importance!

As for monitoring JDBC connections, in the past we've used https://code.google.com/p/log4jdbc/ and found it quite useful. Maybe something you can experiment with?

Alex

Re: How to diagnose stability issues

Aaron Spike
Log4jdbc looks super useful for development. I've wanted something like that a number of times.

Again, I realize this isn't really an Orbeon question. If I put a postgresql category in the log4j.xml inside the Orbeon war, does that work? Or not, because the JDBC connection is really at the container level?

    <category name="org.postgresql">
        <priority value="debug"/>
    </category>


Re: How to diagnose stability issues

Alessandro  Vernet
Administrator
Hi Aaron,

If I had to guess, I'd say that the driver is loaded using a different class loader, and thus if it uses Log4j, it won't be the Log4j inside the orbeon.war, and so "Orbeon's" log4j.xml won't be used in that case. But you'll let us know what you find.
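
The class-loader point can be illustrated outside Tomcat with a minimal sketch, using only the JDK. In Tomcat, a driver jar placed in the container's lib directory is typically loaded by a container-level class loader, while classes inside orbeon.war are loaded by the web application's own loader. The plain-JVM example below only shows that two classes in the same JVM can have different loaders; the class name ClassLoaderSketch is hypothetical and not part of Orbeon or Tomcat.

```java
// Illustrative only: shows that classes in one JVM can come from different
// class loaders. ClassLoaderSketch is a hypothetical name.
class ClassLoaderSketch {

    // Returns a printable name for the loader that defined the given class.
    static String loaderOf(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        // getClassLoader() returns null for classes defined by the bootstrap loader.
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        // A core JDK class versus a class found on the application class path.
        System.out.println("String loaded by:            " + loaderOf(String.class));
        System.out.println("ClassLoaderSketch loaded by: " + loaderOf(ClassLoaderSketch.class));
    }
}
```

Calling loaderOf on the driver class and on an Orbeon class from inside the running web application would be one way to check the guess directly.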

Alex

Re: How to diagnose stability issues

Alessandro  Vernet
Administrator
Hi Aaron,

Just wondering: did you get to find what was going on with the JDBC connections to PostgreSQL, and if there was a problem at that level?

Alex

Re: How to diagnose stability issues

Aaron Spike
Idle connections were being dropped by a stateful firewall's idle connection timeout. I was able to improve my configuration to avoid the issue. But I have to say I'm still a little disappointed in how all of the components involved failed to alert me to the problem.

After digging around in the PostgreSQL JDBC driver code, I figured out why there were no error messages. The driver has two internal log levels:

public static final int DEBUG = 2;
public static final int INFO = 1;

But the default log level is 0:

private int level = 0;

So the driver is absolutely silent by default. Thankfully loglevel can be set easily by adding a value to the connection string.
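
As a minimal sketch of what "adding a value to the connection string" looks like, the helper below appends parameters to a PostgreSQL JDBC URL. The class and method names are hypothetical, the host and database names are placeholders, and no URL-encoding is attempted, so it assumes simple parameter values.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the PostgreSQL driver.
class JdbcUrlSketch {

    // Builds jdbc:postgresql://host:port/database?key=value&key=value
    // Assumes keys and values need no URL-encoding.
    static String buildUrl(String host, int port, String database, Map<String, String> params) {
        String base = "jdbc:postgresql://" + host + ":" + port + "/" + database;
        if (params.isEmpty()) {
            return base;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            query.add(entry.getKey() + "=" + entry.getValue());
        }
        return base + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("loglevel", "2"); // 2 is the driver's DEBUG level quoted above
        System.out.println(buildUrl("server.example.com", 5432, "DBNAME", params));
    }
}
```

When the URL is written as an XML attribute value instead, each "&" between parameters has to be escaped as "&amp;".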

While reading the driver documentation (https://jdbc.postgresql.org/documentation/93/connect.html) I found another parameter, tcpKeepAlive, which I attempted to use to solve the problem in conjunction with lowering the value in /proc/sys/net/ipv4/tcp_keepalive_time (http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html). But I had more success properly configuring the connection pool (http://commons.apache.org/proper/commons-dbcp/configuration.html). The default value of timeBetweenEvictionRunsMillis disables eviction runs even if testWhileIdle is true. testOnReturn also seemed to help the situation, but to be honest I didn't do enough testing to see which of the few values I changed were most important.
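
The interaction between testWhileIdle and timeBetweenEvictionRunsMillis can be sketched as below. This is a simplified model of the rule described above, not DBCP code: the class and method names are hypothetical, the property names are the ones from the Commons DBCP configuration page, and the defaults assumed here (testWhileIdle false, eviction interval -1) are the ones that page lists for DBCP 1.x.

```java
import java.util.Properties;

// Hypothetical model of the pool settings; it does not call Commons DBCP.
class PoolSettingsSketch {

    // Settings that enable periodic validation of idle connections.
    static Properties idleValidationSettings() {
        Properties settings = new Properties();
        settings.setProperty("validationQuery", "select version();");
        settings.setProperty("testWhileIdle", "true");
        // 30 minutes between eviction runs; a non-positive value means no runs at all.
        settings.setProperty("timeBetweenEvictionRunsMillis", "1800000");
        settings.setProperty("minEvictableIdleTimeMillis", "1800000");
        return settings;
    }

    // Idle connections are only tested when testWhileIdle is true AND
    // eviction runs are scheduled (interval greater than zero).
    static boolean idleTestingActive(Properties settings) {
        boolean testWhileIdle = Boolean.parseBoolean(settings.getProperty("testWhileIdle", "false"));
        long interval = Long.parseLong(settings.getProperty("timeBetweenEvictionRunsMillis", "-1"));
        return testWhileIdle && interval > 0;
    }

    public static void main(String[] args) {
        Properties onlyFlag = new Properties();
        onlyFlag.setProperty("testWhileIdle", "true"); // eviction interval left at its default
        System.out.println("testWhileIdle alone:  " + idleTestingActive(onlyFlag));
        System.out.println("with eviction runs:   " + idleTestingActive(idleValidationSettings()));
    }
}
```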

It was a bit of a struggle, but I did eventually get past it. I'm learning that Java documentation tends to be sparse, scattered and terse. I blame the general competence of the Java community and the tendency toward modularity and reuse. :-)






Re: How to diagnose stability issues

Alessandro  Vernet
Administrator
Hi Aaron,

Thank you for the update, and I'm glad that it was "just" idle connections in the pool causing the problem. In our experience, the simplest and safest way to deal with this is to add a testOnBorrow="true" on the <Resource> , e.g. for Oracle:

https://github.com/orbeon/orbeon-forms/wiki/Installation-~-Relational-Database-Setup#tomcat

This way every connection will be tested just before being used, every single time. For some applications, that might cause an unacceptable performance overhead, but with Orbeon Forms we haven't noticed anything like that in our testing.

Alex

Re: How to diagnose stability issues

Aaron Spike
I had followed those recommendations, but testOnBorrow proved to be insufficient. I believe this is because it was, in fact, the borrow test that failed and stalled.


On Friday, February 13, 2015 at 2:37:20 AM UTC-5, Alessandro Vernet wrote:
Hi Aaron,

Thank you for the update, and I'm glad that it was "just" idle connections
in the pool causing the problem. In our experience, the simplest and safest
way to deal with this is to add a testOnBorrow="true" on the <Resource> ,
e.g. for Oracle:

https://github.com/orbeon/orbeon-forms/wiki/Installation-~-Relational-Database-Setup#tomcat

This way every connection will be tested just before being used, every
single time. For some applications, that might cause an unacceptable
performance overhead, but with Orbeon Forms we haven't noticed anything like
that in our testing.

Alex


Re: How to diagnose stability issues

Aaron Spike


On Sunday, February 15, 2015 at 1:52:01 PM UTC-5, Aaron Spike wrote:
I had followed those recommendations, but testOnBorrow proved to be insufficient. I believe this is because it was, in fact, the borrow test that failed and stalled.

Incidentally, I believe this was solved by setting socketTimeout (https://jdbc.postgresql.org/documentation/93/connect.html) in the PostgreSQL JDBC driver connection string.    


Re: How to diagnose stability issues

Alessandro  Vernet
Administrator
Hi Aaron,

You are just setting that property? Or is it in addition to testOnBorrow="true"? If just a socketTimeout, I am not really seeing how that would work; if anything it would just make the request fail faster (assuming the timeout is shorter), but wouldn't help re-establish another connection.

Alex

Re: How to diagnose stability issues

Aaron Spike
On Monday, February 16, 2015 at 12:11:59 PM UTC-6, Alessandro Vernet wrote:
Hi Aaron,

You are just setting that property? Or is it in addition to
testOnBorrow="true"?

In addition
 
If just a socketTimeout, I am not really seeing how
that would work; if anything it would just make the request fail faster
(assuming the timeout is shorter), but wouldn't help re-establish another
connection.

I believe the default is waiting on the read forever.
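
The difference a read timeout makes can be shown with plain java.net sockets standing in for the driver's connection (an assumption made for illustration; the PostgreSQL driver is not used here, and the class name is hypothetical). A local server accepts the connection but never answers, much like a peer behind a firewall that has silently dropped the connection. Without a timeout the read would block indefinitely; with one it gives up and the caller gets control back.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class; plain sockets stand in for a database connection.
class ReadTimeoutSketch {

    // Connects to a local server that never sends anything and tries to read
    // one byte. Returns true if the read was abandoned because of the timeout.
    // Note: a timeout of 0 means "wait forever", so this call would never return.
    static boolean readTimesOut(int timeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback); // any free port
             Socket client = new Socket(loopback, silentServer.getLocalPort());
             Socket accepted = silentServer.accept()) { // kept open, but never written to
            client.setSoTimeout(timeoutMillis);
            try {
                client.getInputStream().read();
                return false; // data or end-of-stream arrived, which this demo does not expect
            } catch (SocketTimeoutException timedOut) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read gave up after timeout: " + readTimesOut(200));
    }
}
```

Once the stalled read fails, the pool's validation can treat the connection as broken rather than waiting on it, which fits the observation that the borrow test itself was what stalled.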

Aaron Spike 


Re: How to diagnose stability issues

Alessandro  Vernet
Administrator
Hi Aaron,

Then this makes sense! What value did you use? I'm thinking we should change our documentation to also recommend setting a timeout.

Alex

Re: How to diagnose stability issues

Aaron Spike
My current "kitchen sink" config looks like this:

  <Resource name="jdbc/postgresql" auth="Container" type="javax.sql.DataSource"
    initialSize="3" maxActive="10" maxIdle="20" maxWait="30000"
    driverClassName="org.postgresql.Driver"
    poolPreparedStatements="true"
    validationQuery="select version();"
    testOnBorrow="true"
    testOnReturn="true"
    testWhileIdle="true"
    timeBetweenEvictionRunsMillis="1800000"
    minEvictableIdleTimeMillis="1800000"
    username="orbeon"
    password="orbeon"
    url="jdbc:postgresql://server.example.com:5432/DBNAME?useUnicode=true&amp;characterEncoding=UTF8&amp;socketTimeout=30&amp;tcpKeepAlive=true&amp;loglevel=1"
  />

This made it work. Caveat lector: I have not yet taken the time to validate which parameters are strictly necessary nor to optimize the values. 

Ideally you wouldn't have to duplicate all of the documentation for the db pool or postgresql driver in the Orbeon documentation. But as a noob, I can tell you how extremely helpful a little duplication and explanation would be. It takes a while to understand the boundaries between components and responsibilities. Anything you do to ease the learning curve would be much appreciated.

Aaron Spike


Re: How to diagnose stability issues

Alessandro  Vernet
Administrator
Hi Aaron,

Yes, indeed. So I put all the parameters you're using, except the one changing the log level, in our documentation:

https://github.com/orbeon/orbeon-forms/wiki/Installation-~-Relational-Database-Setup#postgresql-1

Thanks for sharing,

Alex