Patching Failures on Bare Metal and Virtual Machine DB Systems

Patching operations can fail for various reasons. Typically, an operation fails because a database node is down, there is insufficient space on the file system, or the database host cannot access the object store.

This topic includes information to help you determine the cause of the failure and fix the problem. The information is organized into several sections, based on the error condition. If you already know the cause, you can skip to the section with the suggested solution. Otherwise, use the procedure in Determining the Problem to get started.

Determining the Problem

In the Console, you can identify a failed patching operation by viewing the patch history of a DB system or an individual database. A patch that was not successfully applied displays a status of Failed and includes a brief description of the error that caused the failure. If the error message does not contain enough information to point you to a solution, you can use the database CLI and log files to gather more data. Then, refer to the applicable section in this topic for a solution.

To identify the root cause of the patching operation failure
  1. Log on to the host as the root user and navigate to the /opt/oracle/dcs/bin/ directory.

  2. Determine the sequence of operations performed on the database.

    dbcli list-jobs

    Note the last job ID listed with a status other than Success.

  3. With the job ID you noted from the previous step, use the following command to check the details of that job:

    dbcli describe-job -i <job_ID> -j

    Typically, running this command is enough to reveal the root cause of the failure.

  4. If you require more information, review the /opt/oracle/dcs/log/dcs-agent.log file.

    You can find the job ID in this file by using the timestamp returned by the job report in step 2.

  5. If the patching failure is on a 2-node RAC database, perform steps 3 and 4 on both nodes.
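The log search in step 4 can be sketched in Python. This is only a minimal sketch: it assumes that log lines related to a job contain the job ID as plain text, which this topic does not confirm, and the function name find_job_lines is hypothetical.

```python
from typing import Iterable, List, Tuple


def find_job_lines(lines: Iterable[str], job_id: str) -> List[Tuple[int, str]]:
    """Return (line_number, text) pairs for every line that mentions job_id.

    Assumption: the job ID appears verbatim in the relevant log lines.
    """
    hits = []
    for number, line in enumerate(lines, start=1):
        if job_id in line:
            hits.append((number, line.rstrip("\n")))
    return hits
```

To use it on the host, open /opt/oracle/dcs/log/dcs-agent.log, pass the file object and the job ID from step 2 to find_job_lines, and print the returned pairs.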

Database Service Agent Issues

Oracle Cloud Infrastructure Database uses an agent framework that lets you manage your database through the cloud platform.

Resolving Patching Failures Caused By a Stopped Agent

To resolve a patching failure, you might occasionally need to restart the dcsagent program if its status is stop/waiting.

To restart the database service agent
  1. From a command prompt, check the status of the agent:

    initctl status initdcsagent

  2. If the agent is in the stop/waiting state, try to restart the agent:

    initctl start initdcsagent

  3. Check the status of the agent again to confirm that it has the start/running status:

    initctl status initdcsagent
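The status check in steps 1 and 3 can be automated along the following lines. This is a sketch, assuming the usual Upstart status line layout of "<job> <goal>/<state>" optionally followed by ", process <pid>". The names JobStatus, parse_initctl_status, and needs_restart are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class JobStatus(NamedTuple):
    name: str
    goal: str
    state: str
    pid: Optional[int]


# Assumed layout: "<job> <goal>/<state>[, process <pid>]"
_STATUS_RE = re.compile(
    r"^(?P<name>\S+) (?P<goal>\w+)/(?P<state>[\w-]+)(?:, process (?P<pid>\d+))?"
)


def parse_initctl_status(line: str) -> JobStatus:
    """Split one line of `initctl status <job>` output into its parts."""
    match = _STATUS_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized status line: {line!r}")
    pid = match.group("pid")
    return JobStatus(
        match.group("name"),
        match.group("goal"),
        match.group("state"),
        int(pid) if pid else None,
    )


def needs_restart(line: str) -> bool:
    """Return True when the job reports the stop/waiting status."""
    status = parse_initctl_status(line)
    return (status.goal, status.state) == ("stop", "waiting")
```

If needs_restart returns True for the output of initctl status initdcsagent, run initctl start initdcsagent and check the status again.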

Resolving Patching Failures Caused By an Agent That Needs to Be Updated

Patching can also fail if your agent needs to be updated. The system gives the following error message for this failure:

Current DcsAgent version is less than or equal to minimum required version.

To resolve this issue, perform the steps in To have Oracle Support update the Oracle Cloud Infrastructure Database service agent.

To have Oracle Support update the Oracle Cloud Infrastructure Database service agent
  1. Confirm that the agent (dcsagent) and DCS Admin program (dcsadmin) are running using the following commands:

    initctl status initdcsagent
    initctl status initdcsadmin

    If these programs are not running, use the following commands to restart them:

    initctl start initdcsagent
    initctl start initdcsadmin

  2. Follow the instructions in Obtaining Further Assistance to collect your DCS agent log files.

  3. Contact Oracle Support for assistance with updating the agent.

Object Store Connectivity Issues

Oracle Cloud Infrastructure DB system and database patches are stored in Oracle Cloud Infrastructure Object Storage. Therefore, successful patching operations require connectivity between the DB system host and the Object Storage location from which the patches are downloaded.

To ensure your database host can connect to Oracle Cloud Infrastructure Object Storage
  1. Use the following command to verify the host can access Oracle Cloud Infrastructure Object Storage:

    dbcli describe-latestpatch

    Example output indicating success:

    [root@<host> ~]# dbcli describe-latestpatch
    
    componentType availableVersion
    --------------- --------------------
    gi 12.2.0.1.180417
    gi 12.1.0.2.180417
    gi 18.2.0.0.180417
    db 11.2.0.4.180417
    db 12.2.0.1.180417
    db 12.1.0.2.180417
    db 18.2.0.0.180417
    oak 12.1.2.11.3
    oak 12.2.1.1.0

    Example output indicating failure:

    [root@<host> ~]# dbcli describe-latestpatch
    
    DCS-10032:Resource patch metadata is not found.Failed to download patchmetadata from objectstore
  2. If you cannot connect to the object store, refer to Prerequisites for how to configure object store connectivity.
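The two example outputs above can be told apart programmatically. The following sketch assumes that a failure is reported as a line beginning with a DCS- error code and that a success lists one component and version per line, as in the examples. The function names are hypothetical.

```python
from typing import List, Tuple


def parse_latest_patch(output: str) -> List[Tuple[str, str]]:
    """Return (componentType, availableVersion) pairs from the command output.

    Raises RuntimeError if the output contains a DCS- error line.
    """
    versions = []
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("DCS-"):
            raise RuntimeError(line)
        parts = line.split()
        if len(parts) != 2:
            continue  # blank lines, shell prompt, anything not a two-column row
        component, version = parts
        if component == "componentType" or set(component) == {"-"}:
            continue  # header row or dashed separator row
        versions.append((component, version))
    return versions


def can_reach_object_store(output: str) -> bool:
    """Treat a DCS- error or an empty version list as a connectivity failure."""
    try:
        return bool(parse_latest_patch(output))
    except RuntimeError:
        return False
```

Pass the captured output of dbcli describe-latestpatch to can_reach_object_store. A result of False corresponds to step 2.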

Host and Oracle Clusterware Issues

One or more of the following conditions on the database host can cause patching operations to fail:

Database Node Not Running During the Patching Operation

All nodes of the database must be active and running while a patching operation is in progress, whether you are patching the DB system or the database home. Use the Console to check that the status of each node is AVAILABLE, and start the node, if needed.

The File System Is Full

Patching operations require a minimum of 15 GB of free space in the /u01 directory on the host file system. Use the df -h command on the host to check the available space. If the file system has insufficient space, you can remove old log or trace files to free up space.
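As an alternative to reading the df -h output by eye, the free-space check can be scripted. This sketch uses the Python standard library and assumes that "15 GB" means 15 x 1024^3 bytes; the function names are hypothetical.

```python
import shutil

REQUIRED_FREE_GB = 15  # minimum free space stated in this topic


def has_enough_space(free_bytes: int, required_gb: int = REQUIRED_FREE_GB) -> bool:
    """Compare a free-byte count against the requirement.

    Assumption: 1 GB is counted as 1024**3 bytes.
    """
    return free_bytes >= required_gb * 1024 ** 3


def check_path(path: str = "/u01") -> bool:
    """Return True if the file system that holds path has enough free space."""
    return has_enough_space(shutil.disk_usage(path).free)
```

Calling check_path() on the host returns False when /u01 has less free space than the stated minimum.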

The Oracle Clusterware Is Not Running

Oracle Clusterware enables servers to communicate with each other so that they can function as a collective unit. The cluster software program must be up and running on the DB system for patching operations to complete. Occasionally you might need to restart the Oracle Clusterware to resolve a patching failure.

To restart the Oracle Clusterware
  1. From a command prompt, check the status of Oracle Clusterware:

    crsctl check crs

    Example output:

    [grid@<host> ~]$ crsctl check crs
    CRS-4638: Oracle High Availability Services is online
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online

    For more detailed status information, you can run crsctl stat res -t.

  2. If Oracle Clusterware is not online, try to restart the program:

    crsctl start crs

  3. Check the status of Oracle Clusterware to confirm that it is online:

    crsctl check crs
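The status check in steps 1 and 3 can be sketched as a small parser. It assumes that every status line starts with a CRS- code and that a healthy service ends with "is online", as in the example output; other message wordings are not covered by this topic. The function names are hypothetical.

```python
from typing import List


def crs_status_lines(output: str) -> List[str]:
    """Return the lines of `crsctl check crs` output that start with a CRS- code."""
    return [
        line.strip()
        for line in output.splitlines()
        if line.strip().startswith("CRS-")
    ]


def crs_is_online(output: str) -> bool:
    """Return True only if there is at least one status line and all report online."""
    lines = crs_status_lines(output)
    return bool(lines) and all(line.endswith("is online") for line in lines)
```

If crs_is_online returns False, restart Oracle Clusterware as described in step 2 and run the check again.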

The Oracle Grid Infrastructure (GI) Is Not Updated

This problem occurs when you try to patch a database before you patch the DB system of that database. The error description indicates that the Oracle Grid Infrastructure must be updated first. To resolve this issue, patch the DB system to the latest available version. After you patch the DB system, you can retry the database patching operation.

To get the current and latest-available GI versions for the DB system, use the following command:

dbcli describe-component

Database Issues

An improper database state can lead to patching failures.

Database Not Running During the Patching Operation

The database must be active and running for all of the patching tasks to complete. Otherwise, you must run the datapatch task manually.

To check that the database is active and running

Use the following command to check the state of your database, and ensure that any problems that might have put the database in an improper state are resolved:

srvctl status database -d <db_unique_name> -verbose

The system returns a message including the database instance status. The instance status must be Open for the patching operation to succeed.

If the database is not running, use the following command to start it:

srvctl start database -d <db_unique_name> -o open

If the database is mounted but does not have the Open status, use the following commands to access the SQL*Plus command prompt and set the status to Open:

sqlplus / as sysdba
alter database open;

To run the datapatch task

Before you run the datapatch command, ensure that all pluggable databases (PDBs) are open. To open a PDB, you can use SQL*Plus to execute ALTER PLUGGABLE DATABASE <pdb_name> OPEN READ WRITE; against the PDB.

$ORACLE_HOME/OPatch/datapatch

Run the datapatch command for each database home.

Obtaining Further Assistance

If you were unable to resolve the problem using the information in this topic, follow the procedures below to collect relevant database and diagnostic information. After you have collected this information, contact Oracle Support.

To collect diagnostic information regarding failed jobs
  1. Log on to the host as the root user and navigate to the /opt/oracle/dcs/bin/ directory.

  2. Run the following two commands to generate information about the failed job:

    dbcli list-jobs
    dbcli describe-job -i <job_ID> -j

    The <job_ID> in the second command should be the ID of the latest failed job reported from the first command.

  3. Run the diagnostics collector script to create a zip file with the diagnostic information for Oracle Support Services.

    diagcollector.py

    This command creates a file named diagLogs-<timestamp>.zip in the /tmp directory.

To collect DCS agent log files

To collect DCS agent log files, do the following:

  1. Log in as the opc user.

  2. Run the following command:

    sudo /opt/oracle/dcs/bin/diagcollector.py

  3. The system returns a message indicating that agent logs are available in a zip file at a specified directory. For example:

    
    [opc@prodpr ~]$ sudo /opt/oracle/dcs/bin/diagcollector.py
    
    Log files collected to :/tmp/dcsdiag/diagLogs-1234567890.zip
    
    Logs are being collected to:
    
    /tmp/dcsdiag/diagLogs-1234567890.zip
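To pick up the zip file location from the collector output without copying it by hand, a pattern match along these lines may help. This sketch assumes the message wording shown in the example above ("Log files collected to :" followed by the path). The function name collected_zip_path is hypothetical.

```python
import re
from typing import Optional

# Assumed wording, taken from the example output in this topic.
_ZIP_RE = re.compile(r"Log files collected to\s*:\s*(\S+\.zip)")


def collected_zip_path(output: str) -> Optional[str]:
    """Return the zip path reported by the collector, or None if not found."""
    match = _ZIP_RE.search(output)
    return match.group(1) if match else None
```

A return value of None means the expected message was not found, so check the command output directly.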

To collect Oracle Grid Infrastructure and Database log files

If an Oracle Grid Infrastructure or Oracle Database patch failed, you can find log files for these failures in the following locations:

Oracle Grid Infrastructure

$GI_HOME/cfgtoollogs/

Oracle Database

$ORACLE_HOME/cfgtoollogs/
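To find the most recently written files under either location, a sketch such as the following can help. It assumes only that the log files live somewhere below the cfgtoollogs directory of the given home; the function name recent_logs and the default limit are hypothetical.

```python
from pathlib import Path
from typing import List


def recent_logs(home: str, limit: int = 10) -> List[Path]:
    """Return up to `limit` files under <home>/cfgtoollogs, newest first.

    `home` is the value of $GI_HOME or $ORACLE_HOME.
    """
    root = Path(home) / "cfgtoollogs"
    files = [path for path in root.rglob("*") if path.is_file()]
    files.sort(key=lambda path: path.stat().st_mtime, reverse=True)
    return files[:limit]
```

For example, recent_logs(os.environ["ORACLE_HOME"]) lists the newest database patch logs, after importing os.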