Troubleshoot Update Failures

Update operations can fail for various reasons. Typically, an operation fails because a database node is down, there is insufficient space on the file system, or the database host cannot access the object store.

This article includes information to help you determine the cause of the failure and fix the problem. The information is organized into several sections, based on the error condition.

If you already know the cause, you can skip to the topic with the suggested solution. Otherwise, use the Identify the Cause of Failure topic to get started.

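This article covers the following topics: Identify the Cause of Failure, Database Service Agent Issues, Object Store Connectivity Issues, Host Issues, Oracle Clusterware Issues, Database Issues, and Get Additional Help.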

Tip:

You can also create serial console connections to troubleshoot your system in single-user mode. For information on creating a serial console connection in the OCI Console, see Manage Serial Console Connection to the DB System.

Identify the Cause of Failure

In the OCI Console, you can identify a failed update operation by viewing the update history of a DB system or an individual database. An update that was not successfully applied displays a status of Failed and includes a brief description of the error that caused the failure. If the error message does not contain enough information to point you to a solution, you can use the database CLI and log files to gather more data. Then, refer to the applicable section in this article for a solution.

Identify the Root Cause of the Update Operation Failure

  1. Log on to the host as the root user and navigate to the /opt/oracle/dcs/bin/ directory.

  2. Determine the sequence of operations performed on the database.

    dbcli list-jobs

    Note the last job ID listed with a status other than Success.

  3. With the job ID you noted from the previous step, use the following command to check the details of that job:

    dbcli describe-job -i <job_ID> -j

    Typically, running this command is enough to reveal the root cause of the failure.

  4. If you require more information, review the /opt/oracle/dcs/log/dcs-agent.log file.

    To locate the job's entries in this file, use the job ID or the timestamp returned in steps 2 and 3.

  5. If the update failure is on a 2-node RAC database, perform steps 3 and 4 on both nodes.
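
Step 4 can be sketched as a small shell function that prints the agent log lines mentioning a given job ID. This is a minimal sketch, not an Oracle-provided tool: the function name job_log_lines is hypothetical, the default path is the log file named above, and it assumes the job ID appears literally in the log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print agent log lines that mention a job ID.
# Usage: job_log_lines <job_ID> [log_file]
job_log_lines() {
    local job_id="$1"
    local log_file="${2:-/opt/oracle/dcs/log/dcs-agent.log}"
    if [ -z "$job_id" ]; then
        echo "usage: job_log_lines <job_ID> [log_file]" >&2
        return 2
    fi
    # -F matches the ID as a fixed string rather than as a regular expression.
    grep -F -- "$job_id" "$log_file"
}
```

On a 2-node RAC database, the same function would be run on each node, because each node keeps its own agent log.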

Database Service Agent Issues

Your database uses an agent framework that lets you manage it through the Oracle Cloud platform.

Resolve Update Failures Caused by a Stopped Agent

Occasionally, you might need to restart the dcsagent program to resolve an update failure if the agent has the stop/waiting status.

Restart the Database Service Agent

  1. From a command prompt, check the status of the agent:

    initctl status initdcsagent
  2. If the agent is in the stop/waiting state, try to restart the agent:

    initctl start initdcsagent
  3. Check the status of the agent again to confirm that it has the start/running status:

    initctl status initdcsagent
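
The check in steps 1 and 3 can be scripted. The following is a minimal sketch: the function name agent_is_running is hypothetical, and it assumes only that the status line contains the text start/running when the agent is up, as described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if a status line reports start/running.
# Usage: agent_is_running "$(initctl status initdcsagent)"
agent_is_running() {
    case "$1" in
        *start/running*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A caller could combine it with the commands above, for example by running initctl start initdcsagent only when agent_is_running fails, and then checking the status again.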

Resolve Update Failures Caused by an Agent That Needs to Be Updated

An update operation can also fail if your agent needs to be updated. The system returns the following error message for this failure:

Current DcsAgent version is less than or equal to minimum required version.

To resolve this issue, perform the steps in the following section.

Contact Oracle Support to Update the OCI Database Service Agent

  1. Confirm that the agent (dcsagent) and DCS Admin program (dcsadmin) are running using the following commands:

    initctl status initdcsagent
    initctl status initdcsadmin
  2. If these programs are not running, use the following commands to restart them:

    initctl start initdcsagent
    initctl start initdcsadmin
  3. Follow the instructions in Get Additional Help to collect your DCS agent log files.
  4. Contact Oracle Support for assistance with updating the agent.

Object Store Connectivity Issues

The DB system and database updates are stored in OCI Object Storage. Therefore, successful update operations require connectivity between the DB system host and the Object Storage location from which the updates are downloaded.

Ensure Your Database Host Can Connect to OCI Object Storage

  1. Use the following command to verify the host can access OCI Object Storage:

    dbcli describe-latestpatch

    Example output indicating success:

    componentType   availableVersion
    --------------  --------------
    gi              12.2.0.1.180417
    gi              12.1.0.2.180417
    db              11.2.0.4.180417
    db              12.2.0.1.180417
    db              12.1.0.2.180417
    oak             12.1.2.11.3
    oak             12.2.1.1.0

    Example output indicating failure:

    DCS-10032:Resource patch metadata is not found.Failed to download patchmetadata from objectstore
  2. If the host cannot connect to OCI Object Storage, refer to Back Up a Database Using the Console for information about configuring Object Storage connectivity.
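
To automate the check in step 1, a script can look for the DCS-10032 error in the command output. This is a minimal sketch under stated assumptions: the function name object_store_reachable is hypothetical, and treating empty output as a failure is a choice made for this example, not documented behavior.

```shell
#!/usr/bin/env bash
# Hypothetical helper: inspect captured `dbcli describe-latestpatch` output.
# Usage: object_store_reachable "$(dbcli describe-latestpatch 2>&1)"
object_store_reachable() {
    local output="$1"
    # The failure example above starts with the DCS-10032 error code.
    if printf '%s\n' "$output" | grep -q 'DCS-10032'; then
        return 1
    fi
    # Assumption: empty output means no versions were listed, so treat it as a failure.
    [ -n "$output" ]
}
```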

Host Issues

One or more of the following conditions on the database host can cause update operations to fail:

Database Node Not Running During the Update Operation

All nodes of the database must be active and running while an update operation is in progress, whether you are updating the DB system or the database home. Use the OCI Console to check that the status of each node is AVAILABLE, and start the node, if needed.

The File System Is Full

Update operations require a minimum of 15 GB of free space in the /u01 directory on the host file system. Use the df -h command on the host to check the available space. If the file system has insufficient space, you can remove old log or trace files to free up space.
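
The free-space check can also be scripted. The following minimal sketch compares the available space reported by df with a threshold in GB. The function name has_min_free_gb is hypothetical, and the sketch assumes a df that supports the POSIX -P and -k options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if <path> has at least <min_gb> GB available.
# Usage: has_min_free_gb /u01 15
has_min_free_gb() {
    local path="$1" min_gb="$2" avail_kb
    # df -Pk prints one line per file system in 1024-byte blocks;
    # the fourth column is the available space.
    avail_kb=$(df -Pk "$path" 2>/dev/null | awk 'NR==2 {print $4}')
    # Return 2 if df produced no usable value (for example, a missing path).
    [ -n "$avail_kb" ] || return 2
    [ "$avail_kb" -ge $(( min_gb * 1024 * 1024 )) ]
}
```

Calling has_min_free_gb /u01 15 before starting an update returns a nonzero status when the 15 GB minimum described above is not met.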

Oracle Clusterware Issues

The Oracle Clusterware Is Not Running

Oracle Clusterware enables servers to communicate with each other so that they can function as a collective unit. Oracle Clusterware must be up and running on the DB system for update operations to complete. Occasionally, you might need to restart Oracle Clusterware to resolve an update failure.

Restart the Oracle Clusterware

  1. From a command prompt, check the status of Oracle Clusterware:

    crsctl check crs

    Output:

    CRS-4638: Oracle High Availability Services is online
    CRS-4537: Cluster Ready Services is online
    CRS-4529: Cluster Synchronization Services is online
    CRS-4533: Event Manager is online

    For more detailed status information, you can run crsctl stat res -t.

  2. If Oracle Clusterware is not online, try to restart the program:

    crsctl start crs
  3. Check the status of Oracle Clusterware to confirm that it is online:

    crsctl check crs
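
The status check can be reduced to a single pass/fail result. The following minimal sketch counts the services reported as online. The function name crs_all_online is hypothetical, and the sketch assumes that a healthy stack reports exactly the four online lines shown in the example output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: inspect captured `crsctl check crs` output.
# Usage: crs_all_online "$(crsctl check crs)"
crs_all_online() {
    local output="$1" online
    # grep -c prints the count even when it is zero; "|| true" keeps the
    # nonzero grep exit status from aborting scripts that use "set -e".
    online=$(printf '%s\n' "$output" | grep -c 'is online' || true)
    # Assumption: four services are expected, as in the example output.
    [ "$online" -eq 4 ]
}
```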

The Oracle Grid Infrastructure (GI) Is Not Updated

This problem occurs when you try to update a database before you update its DB system. The error description indicates that the Oracle Grid Infrastructure must be updated first. To resolve this issue, update the DB system to the latest available version. After you update the DB system, you can retry the database update operation.

To get the current and latest-available GI versions for the DB system, use the following command:

dbcli describe-component

Database Issues

An improper database state can lead to update failures.

Database Not Running During the Update Operation

The database must be active and running for all of the update tasks to complete. Otherwise, you must run the datapatch task manually.

Check That the Database Is Active and Running

Use the following command to check the state of your database, and ensure that any problems that might have put the database in an improper state are resolved:

srvctl status database -d <db_unique_name> -verbose

The system returns a message including the database instance status. The instance status must be Open for the update operation to succeed.

If the database is not running, use the following command to start it:

srvctl start database -d <db_unique_name> -o open

If the database is mounted but does not have the Open status, use the following commands to access the SQL*Plus command prompt and set the status to Open:

sqlplus / as sysdba
alter database open;
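
A script can confirm that every instance is open before an update is retried. This is a minimal sketch under a loud assumption: it expects each instance line in the srvctl output to contain the text "Instance status: <state>", which is not confirmed by this article and may differ between versions. The function name all_instances_open is hypothetical, and the prefix match also accepts any status that merely begins with Open.

```shell
#!/usr/bin/env bash
# Hypothetical helper: inspect captured `srvctl status database ... -verbose` output.
# ASSUMPTION: each instance line contains "Instance status: <state>".
# Usage: all_instances_open "$(srvctl status database -d <db_unique_name> -verbose)"
all_instances_open() {
    local output="$1" total open_count
    total=$(printf '%s\n' "$output" | grep -c 'Instance status:' || true)
    open_count=$(printf '%s\n' "$output" | grep -c 'Instance status: Open' || true)
    # Succeed only if at least one instance is reported and all of them are Open.
    [ "$total" -gt 0 ] && [ "$total" -eq "$open_count" ]
}
```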

Run the datapatch Task

Before you run the datapatch command, ensure that all pluggable databases (PDBs) are open. To open a PDB, you can use SQL*Plus to execute ALTER PLUGGABLE DATABASE <pdb_name> OPEN READ WRITE; against the PDB.

$ORACLE_HOME/OPatch/datapatch

Run the datapatch command in each database home.

Get Additional Help

If you were unable to resolve the problem using the information in this article, follow the procedures below to collect relevant database and diagnostic information. After you have collected this information, contact Oracle Support.

Collect Diagnostic Information Regarding Failed Jobs

  1. Log on to the host as the root user and navigate to the /opt/oracle/dcs/bin/ directory.

  2. Run the following two commands to generate information about the failed job:

    dbcli list-jobs | grep -i <dbname>
    dbcli describe-job -i <job_ID> -j

    The <job_ID> in the second command should be the ID of the latest failed job reported from the first command.

  3. Run the diagnostics collector script to create a zip file with the diagnostic information for Oracle Support Services.

    diagcollector.py

    This command creates a file named diagLogs-<timestamp>.zip in the /tmp directory.

Collect DCS Agent Log Files

To collect DCS agent log files, do the following:

  1. Log in as the opc user.
  2. Run the following command:

    sudo /opt/oracle/dcs/bin/diagcollector.py
  3. The system returns a message indicating that agent logs are available in a zip file at a specified directory. For example:

    Log files collected to :/tmp/dcsdiag/diagLogs-1234567890.zip
    Logs are being collected to:
    /tmp/dcsdiag/diagLogs-1234567890.zip
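
When the collector is run from a script, the zip file path can be extracted from its output. The following minimal sketch does this with grep. The function name diag_zip_path is hypothetical, and the sketch assumes that the file name has the form diagLogs-<digits>.zip, as in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first diagLogs zip path found in collector output.
# Usage: diag_zip_path "$(sudo /opt/oracle/dcs/bin/diagcollector.py)"
diag_zip_path() {
    # Match an absolute path without spaces or colons that ends in diagLogs-<digits>.zip.
    printf '%s\n' "$1" | grep -o '/[^ :]*diagLogs-[0-9]*\.zip' | head -n 1
}
```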

Collect Oracle Grid Infrastructure and Database Log Files

If an Oracle Grid Infrastructure or Oracle Database update failed, you can find log files for these failures in the following locations:

Oracle Grid Infrastructure

$GI_HOME/cfgtoollogs/

Oracle Database

$ORACLE_HOME/cfgtoollogs/
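
These directories can hold many files, so it may help to list only the recently modified ones before you send logs to Oracle Support. The following is a minimal sketch: the function name recent_logs and the one-day default are choices made for this example, not Oracle recommendations.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list files under <dir> modified within the last <days> days.
# Usage: recent_logs "$ORACLE_HOME/cfgtoollogs" 1
recent_logs() {
    local dir="$1" days="${2:-1}"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    # -mtime -N selects files modified less than N*24 hours ago.
    find "$dir" -type f -mtime "-$days" | sort
}
```

The same call works for the Oracle Grid Infrastructure location by passing "$GI_HOME/cfgtoollogs" as the directory.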