WebLogic Server Doesn't Restart After Failure

After a node failure, WebLogic Server fails to start.

The WebLogic service log contains an error message similar to the following:

java.io.IOException: Error from fcntl() for file locking, Resource temporarily unavailable, errno=11
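
To check whether a log file contains this error, a short script can scan for the message. The following is a minimal sketch, not an official tool: the pattern is derived only from the sample message above, the function name find_lock_errors is hypothetical, and real log lines might carry additional prefixes or wording.

```python
import re

# Pattern based on the sample message shown above (an assumption about the
# exact wording); it captures the numeric errno at the end of the message.
LOCK_ERROR = re.compile(r"Error from fcntl\(\) for file locking.*?errno=(\d+)")


def find_lock_errors(lines):
    """Return (line_number, errno) pairs for lines that match the locking error."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = LOCK_ERROR.search(line)
        if match:
            hits.append((number, int(match.group(1))))
    return hits
```

To use it, open your WebLogic service log and pass the file object to find_lock_errors. Each returned pair gives the line number and the errno value reported in the message.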

Cause 1: NFSv3 servers don't include a lock lease feature, so lock states aren't stored and locks can't be released after the node failure.

Solution 1: Request removal of file locks. For more information, see Removing File Locks from a Host that is No Longer Available.

Cause 2: The rpc-statd service, which NFSv3 locking requires, is sometimes left in an unhealthy state after the server failure. To verify this, run a sample lock test with the Python fcntl module. For example:

$ python3
>>> import fcntl
>>> f = open('/fss/path/testfile.txt', 'r')  # Open an existing file in read mode (do not use 'w')
>>> fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)  # Raises an error such as "No locks available" if locking is unhealthy
>>> exit()
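
The same lock test can be run as a script instead of interactively. The following is a minimal sketch that assumes a Linux client. The function name try_lock is hypothetical, and the path you pass must point to an existing file on the mounted file system. It reports the errno name so that a lock held by another process (typically EAGAIN) can be told apart from a locking service problem (typically ENOLCK, "No locks available"). Which error appears can depend on the client and server.

```python
import errno
import fcntl


def try_lock(path):
    """Try a non-blocking exclusive lock on an existing file.

    Returns (True, message) if the lock was acquired and released,
    or (False, "ERRNO_NAME: description") if locking failed.
    """
    # Read mode, as in the interactive test, so the file is never truncated.
    with open(path, "r") as handle:
        try:
            fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            name = errno.errorcode.get(exc.errno, "unknown")
            return False, f"{name}: {exc.strerror}"
        # Release explicitly so the test leaves no lock behind.
        fcntl.flock(handle, fcntl.LOCK_UN)
        return True, "lock acquired and released"
```

For example, print(try_lock('/fss/path/testfile.txt')) prints the result tuple for the same test file used in the interactive session.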

Solution 2: Restart the rpc-statd service.

  1. Open a terminal window on the instance and run the following commands with root privileges:

    $ sudo systemctl status rpc-statd
    $ sudo systemctl stop rpc-statd
    $ sudo systemctl start rpc-statd
    $ sudo systemctl status rpc-statd
  2. Verify that the fcntl sample lock test completes without error.
  3. Start the WebLogic server.

Cause 3: NFSv3 doesn't track lock owners, so NFS holds a lock indefinitely if its owner fails. After a node failure, a restarted WebLogic Server can't acquire the lock.

Solution 3: This is a general NFSv3 limitation. The WebLogic documentation describes immediate mitigations and long-term design considerations. For more information, see Verifying Server Restart Behavior.