We encountered an “interesting” challenge recently where some, not all, OC4J containers in an Oracle Application Server 10.1.3.1.0 installation would “crash” (they would stop running). There was no apparent pattern to the “crazy” crashing containers. The system administrator was actively doing application (re)deployments at the rate of 3-4 per week. The containers seemed to be “crashing” randomly, sometimes throughout the day, sometimes just after a deployment.
We increased many timeouts for OPMN as we believed that OPMN was just incorrectly “seeing” the containers as down and restarting them. OPMN restarts them by shutting them down first and then starting them.
We filed cases with Oracle support to no avail–they didn’t come up with any useful suggestions in a week or more. They were trying, but didn’t come up with the solution.
The system administrator developed a theory based on what he believed was a pattern. Every time he did a deployment, he would notice a crash of all the non-Oracle default containers. That is, the home and OC4J_WebCenter containers didn’t crash.
The deployment process that he followed resulted in him connecting to the server using remote desktop. His remote desktop client was configured with the /console option which was required by some other servers he managed, more about that later.
Once he was able to demonstrate that he could make the containers crash each time he logged off, we started testing variations using the system console, the remote desktop client with and without the /console option and found a pattern. The remote desktop client without the /console option did not cause a crash, but all other combinations did. Through all of this, the home and OC4J_WebCenter containers remained up and running.
Bottom line: Read Metalink Note 245609.1 which documents the apparently, well-known fact that logging out from the Windows console causes JVM termination. The very simple fix is to start the containers with the “-Xrs” option which tells the JVM to ignore certain signals from the OS.
The really terrible thing about all this is that Oracle puts the -Xrs option on the containers deployed during the installation, but the OEM tool doesn’t add them to the container startup parameters for the custom containers. Easy to fix, even easy to find once you know what to look for.
This begs two questions:
- Why doesn’t Oracle add -Xrs to the startup options for the containers created after the initial installation? That would have avoided all the problems and there’s apparently no negative side effect–at least not that we’ve seen.
- How could an SR analyst not find this Metalink note and refer us to the simple solution? Granted, we didn’t find it easily in our searches either, but eventually it was one of us that found the article and solution. Now that we know the fix, a simple search for -Xrs on Metalink gets plenty of hits. As they say, hindsight is 20/20.
Hopefully, this information will help some of you that are lucky enough to work on OC4J deployments on Windows.