Troubleshooting Exadata Cloud@Customer Systems

These topics cover some common issues you might run into and how to address them.

Patching Failures on Exadata Cloud@Customer Systems

You can patch Oracle Database and Oracle Grid Infrastructure using the dbaascli utility and update the Cloud Tooling on Exadata Cloud@Customer.

Patching operations can fail for various reasons. Typically, an operation fails because a database node is down, there is insufficient space on the file system, or the database host cannot access the object store.

Determining the Problem

In the Console, you can identify a failed patching operation by viewing the patch history of an Exadata Cloud@Customer system or an individual database.

A patch that was not successfully applied displays a status of Failed and includes a brief description of the error that caused the failure. If the error message does not contain enough information to point you to a solution, you can use the database CLI and log files to gather more data. Then, refer to the applicable section in this topic for a solution.

Troubleshooting and Diagnosis

Diagnose the most common issues that can occur during the patching process of any of the Exadata Cloud@Customer components.

Host Issues

One or more of the following conditions on the database host can cause patching operations to fail.

File System is Full

Patching operations require a minimum of 25 GB of free space for Oracle Grid Infrastructure patching or 15 GB for Oracle Database patching. If the required Oracle home locations do not meet these storage requirements, then an error message like the following is displayed during the patching pre-check operation:
[FATAL] [DBAAS-31009] - One or more Oracle patching pre-checks resulted in error conditions that needs to be addressed before proceeding: not enough space for s/w backups
ACTION: Verify the logs at /var/opt/oracle/log/exadbcpatch.

Use the df -h command on the host to check the available space. If the file system has insufficient space, you can remove old log or trace files to free up space.
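For example, a minimal space check and cleanup sketch is shown below; the mount points and the 30-day retention window are assumptions, so adjust them to your environment and review the file list before deleting anything:
df -h /u01 /u02
# list the largest diagnostic subdirectories to see where the space is going
du -sh $ORACLE_BASE/diag/* 2>/dev/null | sort -rh | head
# preview trace files older than 30 days; append -delete only after reviewing the output
find $ORACLE_BASE/diag -name "*.tr[cm]" -mtime +30 -print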

Node Connectivity Problems

Cloud tooling relies on proper networking and connectivity configuration between the nodes of a cluster. If the configuration is not set up properly, any operation that requires cross-node processing can fail. For example, the tooling might be unable to download the files required to apply a given patch. In that case, an error like the following is observed during a patch pre-check or apply request:
[FATAL] [DBAAS-31009] - One or more Oracle patching pre-checks resulted in error conditions that needs to be
        addressed before proceeding: % Total % Received % Xferd Average Speed Time Time Time Current
        Dload Upload Total Spent Left Speed0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0 0 0 0 0 0 0 0 0 --:--:-- 0:00:01 --:--:-- 0curl: (7) Failed connect to [host address]

In this case, you can perform the following actions:

  • Verify that the node or the URL is reachable by using the following commands (see the connectivity sketch after this list):
    ping hostname
    curl target url
  • Verify that your DNS configuration is correct so that the relevant node addresses are resolvable within the VM cluster.
  • Refer to the relevant Cloud Tooling logs as instructed in the Obtaining Further Assistance section and contact Oracle Support for further assistance.
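The following is a minimal connectivity-check sketch; the node name node2 and the object store endpoint shown are placeholder examples, so substitute your own cluster node name and the endpoint for your region:
ping -c 3 node2
nslookup node2
curl -v https://objectstorage.us-ashburn-1.oraclecloud.com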

Oracle Grid Infrastructure Issues

One or more of the following conditions on Oracle Grid Infrastructure can cause patching operations to fail.

Oracle Grid Infrastructure is Down

Oracle Clusterware enables servers to communicate with each other so that they can function as a collective unit. The clusterware stack must be up and running on the VM cluster for patching operations to complete. Occasionally, you might need to restart Oracle Clusterware to resolve a patching failure.

In such cases, verify the status of the Oracle Grid Infrastructure as follows:
[grid@host:$GRID_HOME/bin]$ ./crsctl check cluster
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
If Oracle Grid Infrastructure is down, then restart it by running the following commands:
crsctl start cluster -all
crsctl check cluster
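If the cluster stack reports as online but patching still fails, you can also review the state of the individual cluster resources. A minimal sketch using the standard resource status listing:
$GRID_HOME/bin/crsctl stat res -t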

Oracle Grid Infrastructure Upgrade Pre-check Failures

During the pre-check operation, failures can be reported when the target to be patched does not meet the minimum requirements for the patching operation. An example of the pre-check command follows:
[root@host:~][0]# dbaascli patch db prereq --patchid <patch id> --dbnames GRID
DBAAS CLI version 19.4.4.2.0
Executing command patch db prereq --patchid LATEST --dbnames grid
INFO: DBCS patching
...

Patch ID Not Being Recognized

If Cloud tooling fails to recognize the specified patch ID, then an error like the following is observed:
[FATAL] [DBAAS-10002] - The provided value for the parameter patchnum is invalid: Incorrect patchnum.
ACTION: Verify the corresponding application usage and/or logs at /var/opt/oracle/log/exadbcpatchmulti and try again.

To verify that the specified patch ID is correct, confirm that it is listed as an available patch in the Console.

If the specified patch ID is listed and if the prerequisite operation still fails to recognize the patch ID, then refer to the relevant Cloud Tooling logs as instructed in the Obtaining Further Assistance section and contact Oracle Support for further assistance.

Specific Pre-check Validation Failed

Once the pre-check validation starts, Cloud tooling performs a series of validations to determine whether the minimum requirements for the requested patching operation are met. If any of these minimum requirements are not met, then a failure like the following is observed:
[FATAL] [DBAAS-31009] - One or more Oracle patching pre-checks resulted in error conditions that needs to be addressed before proceeding: <Specific Pre-check Validation Failure>

Depending on the specific failed prerequisite validation, perform the corresponding corrections on the environment or the Oracle home as required. Once those corrections have been made, the operation can be reattempted.

If the failure persists, then refer to the relevant Cloud Tooling logs as instructed in the Obtaining Further Assistance section and contact Oracle Support for further assistance.

Oracle Grid Infrastructure Patch Apply Failures

During the installation of the requested patch on Oracle Grid Infrastructure, the procedure may run into errors or unexpected conditions, as in the following example:
[root@host:~][0]# dbaascli patch db apply --patchid <patch id> --dbnames GRID
...
ERROR: Grid upgrade failed. Please check corresponding log in /var/opt/oracle/log/exadbcpatch

If a failure is detected on a given node during the patch installation process, then do the following:

  • Address the issue that caused the failure, if it is evident, and then retry the same command so that the operation resumes from the failure point.
  • If the issue persists after retrying the command, or if it is not possible to identify the root cause of the failure, then refer to the relevant Cloud Tooling logs as instructed in the Obtaining Further Assistance section and contact Oracle Support for further assistance.

Oracle Databases Issues

An improper database state can lead to patching failures.

Oracle Database is Down

The database must be active and running on all the active nodes so that patching operations can complete successfully across the cluster.

Use the following command to check the state of your database, and ensure that any problems that might have put the database in an improper state are resolved:
srvctl status database -d db_unique_name -verbose

The system returns a message including the database instance status. The instance status must be Open for the patching operation to succeed.

If the database is not running, use the following command to start it:
srvctl start database -d db_unique_name -o open
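For example, a minimal check-and-start sequence, assuming a hypothetical database unique name of myexadb, looks like the following:
# each instance should report a status of Open
srvctl status database -d myexadb -verbose
# start the database if it is not running, then re-check before retrying the patch operation
srvctl start database -d myexadb -o open
srvctl status database -d myexadb -verbose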

Oracle Database Patching Pre-check Failures

During the pre-check operation, failures can be reported when the databases to be patched do not meet the minimum requirements for the patching operation. An example of the pre-check command follows:
[root@host:~][0]# dbaascli patch db prereq --patchid <patch id> --dbnames <database 1,...,database n>
DBAAS CLI version 19.4.4.2.0
Executing command patch db prereq --patchid LATEST --dbnames grid
INFO: DBCS patching
...

Patch ID Not Being Recognized

If Cloud tooling fails to recognize the specified patch ID, then an error like the following is observed:
[FATAL] [DBAAS-10002] - The provided value for the parameter patchnum is invalid: Incorrect patchnum.
ACTION: Verify the corresponding application usage and/or logs at /var/opt/oracle/log/exadbcpatchmulti and try again.

To verify that the specified patch ID is correct, confirm that it is listed as an available patch in the Console.

Alternatively, you can verify the patch level installed in a given Oracle home by using the following command:
dbaascli dbhome info

If the specified patch ID is listed and if the prerequisite operation still fails to recognize the patch ID, then refer to the relevant Cloud Tooling logs as instructed in the Obtaining Further Assistance section and contact Oracle Support for further assistance.

Specific Prereq Validation Failed

Once the prerequisite validation starts, Cloud tooling performs a series of validations to determine whether the minimum requirements for the requested patching operation are met. If any of these minimum requirements are not met, then a failure like the following is observed:
[FATAL] [DBAAS-31009] - One or more Oracle patching pre-checks resulted in error conditions that needs to be addressed before proceeding: <Specific Prereq Validation Failure>

Depending on the specific failed prerequisite validation, perform the corresponding corrections on the environment or the Oracle home as required. Once those corrections have been made, the operation can be reattempted.

If the failure persists, then refer to the relevant Cloud Tooling logs as instructed in the Obtaining Further Assistance section and contact Oracle Support for further assistance.

Oracle Database Patch Apply Failures

During the installation of the requested patch on the corresponding Oracle Database home, the procedure may run into errors or unexpected conditions, as in the following example:
[root@host:~][0]# dbaascli patch db apply --patchid <patch id> --dbnames <database 1,...,database n>
...
ERROR: Error during creation, empty dbhome patching failed. Check the corresponding logs

If it is not possible to identify the root cause of the failure and its corresponding solution, then refer to the relevant Cloud Tooling logs as instructed in the Obtaining Further Assistance section and contact Oracle Support for further assistance.

Oracle Cloud Tooling Issues

No Applicable Cloud Tooling Patches Available

One issue that can occur when you attempt a Cloud tooling patch installation is that the operation fails because there are no applicable RPMs to install. An example of this condition follows:
[root@host:~]# dbaascli patch tools apply --patchid LATEST
DBAAS CLI version 19.4.4.2.0
Executing command patch tools apply --patchid LATEST
...
[FATAL] [DBAAS-33032] - An error occurred while performing the installation of the Oracle DBAAS tools: No applicable dbaastools rpms found.
ACTION: Verify the logs at /var/opt/oracle/log/exadbcpatch.
To confirm that there are indeed no applicable patches to be installed for Cloud tooling, you can run the following command:
dbaascli patch tools list

If the Cloud tooling patch level is eligible for patching but Cloud tooling does not list any applicable patch ID, then refer to the relevant Cloud Tooling logs as instructed in the Obtaining Further Assistance section and contact Oracle Support for further assistance.

Obtaining Further Assistance

If you were unable to resolve the problem using the information in this topic, follow the procedures below to collect relevant database and diagnostic information. After you have collected this information, contact Oracle Support.

Collecting Cloud Tooling Logs

Collect the relevant log files that can assist Oracle Support in the investigation and resolution of a given issue.

DBAASAPI Logs

These logs are applicable for actions that are performed from the Console.

/var/opt/oracle/log/dbaasapi/db/db:
  • Job HASH.log, corresponding to the backend API request
Note

All the log files are timestamped, so issues can be traced back to a specific point in time during the DB system operation.

DBAASCLI Logs

/var/opt/oracle/log/dbaascli:
  • dbaascli.log

DBAAS ExaPatch Logs

/var/opt/oracle/log/exadbcpatchmulti:
  • exadbcpatchmulti.log
  • exadbcpatchmulti-cmd.log
/var/opt/oracle/log/exadbcpatchsm:
  • exadbcpatchsm.log
/var/opt/oracle/log/exadbcpatch:
  • exadbcpatch.log
  • exadbcpatch-cmd.log
  • exadbcpatch-dmp.log
  • exadbcpatch-sql.log
Note

All the log files are timestamped, so issues can be traced back to a specific point in time during the DB system operation.
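When opening a service request, it can be helpful to bundle these directories into a single archive. A minimal collection sketch follows; the archive name and the /tmp location are arbitrary choices:
tar czf /tmp/exacc_patch_logs_$(hostname -s)_$(date +%Y%m%d).tar.gz \
    /var/opt/oracle/log/dbaascli \
    /var/opt/oracle/log/exadbcpatch \
    /var/opt/oracle/log/exadbcpatchmulti \
    /var/opt/oracle/log/exadbcpatchsm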

Collecting Configuration Tools Logs

$GRID_BASE/cfgtoollogs
$ORACLE_BASE/cfgtoollogs

Collecting Oracle Diagnostics

To collect the relevant Oracle diagnostic information and logs, run the dbaas_diag_tool.pl script.
/var/opt/oracle/misc/dbaas_diag_tool.pl

For more information about the usage of this utility, see My Oracle Support note 2219712.1.
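A minimal invocation sketch is shown below; the script's command-line options are not documented here, so consult the MOS note above for the supported parameters before running it:
perl /var/opt/oracle/misc/dbaas_diag_tool.pl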

Failed Patch Modifies the Home Name in oraInventory with the Suffix "_PIP"

Description: The image-based patching process temporarily changes the name of the Oracle home being patched in the oraInventory by adding the suffix '_pip' (patching in progress). For example, OraDB19Home1 becomes OraDB19Home1_pip.

When a patch fails on node 2, the name is not reverted to the original. This causes a home subsequently installed on node 2 to use the home name OraDB19Home1.

Action: On the failing node, run the following command to clear the corresponding _pip entry from the inventory:
/var/opt/oracle/exapatch/exadbcpatchmulti -rollback_async <patch id> \
  -instance1=<hostname>:<ORACLE_HOME path> -dbname=<dbname1> -run_datasql=1

After performing the local rollback, resume applying the corresponding patch.

Database is Down While Performing Downgrade to Release 11.2 or 12.1

Description: An error like the following is thrown while running the database upgrade command with the --revert flag.
[FATAL] [DBAAS-54007] - An error occurred when open the a121db database with resetlog options: ORA-01034: ORACLE not available.

Action: For Oracle Database releases 11.2 and 12.1, apply the one-off patch for bug 31561819 before attempting the downgrade.
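To check whether the one-off is already installed in the 11.2 or 12.1 home, you can list the installed patches; the sketch below assumes the one-off patch number matches the bug number:
$ORACLE_HOME/OPatch/opatch lspatches | grep 31561819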

If the issue persists and impacts a given database, then, per the similar bug 31762303 filed by the MAA team, Oracle recommends that you run the following commands after the failure to complete the database downgrade:
/u02/app/oracle/product/19.0.0.0/dbhome_3/bin/srvctl downgrade database -d
<DB_UNIQUE_NAME> -o /u02/app/oracle/product/11.2.0/dbhome_2 -t 11.2.0.4
/u02/app/oracle/product/11.2.0/dbhome_2/bin/srvctl setenv database -d <DB_UNIQUE_NAME> -T
"TNS_ADMIN=/u02/app/oracle/product/11.2.0/dbhome_2/network/admin/<DB_NAME>"

After Database Upgrade, the Standby Database Remains in Mounted State in Oracle Data Guard Configurations

Description: After performing the upgrade as recommended in Oracle MOS note 2628228.1, the standby database is left in MOUNTED state.

Action: If it is required to bring the standby database back to read-only mode, then proceed with the following steps:

Run the following query on the primary database:
SELECT dest_id, thread#, sequence#, resetlogs_change#, standby_dest, archived, applied, status, to_char(completion_time,'DD-MM-YYYY:hh24:mi') FROM v$archived_log;

Ensure that all the logs have been replicated successfully to the standby database after the upgrade operation.
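For example, a minimal gap check run on the primary compares the highest archived and applied sequence for each thread; the DEST_ID value of 2 is an assumption, so substitute your standby destination ID:
SELECT thread#, MAX(sequence#) AS last_archived FROM v$archived_log WHERE dest_id = 2 AND archived = 'YES' GROUP BY thread#;
SELECT thread#, MAX(sequence#) AS last_applied FROM v$archived_log WHERE dest_id = 2 AND applied = 'YES' GROUP BY thread#;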

Then on the standby database, run the following commands:
# dbaascli database stop --dbname <standby dbname>
# dbaascli database start --dbname <standby dbname>

The database should then be open in read-only mode again.

Primary Database Fails to Downgrade to 18c in Oracle Data Guard Configurations

Description: The following failure is observed while downgrading a primary Data Guard database from 19c to 18c:
[FATAL] [DBAAS-54007]
- An error occurred when open the db163 database with resetlog options:
ORA-16649: possible failover to another database prevents this database from
being opened.

Action: Follow these steps to fix the issue:

  1. Open the initdbname.ora file:
    /var/opt/oracle/dbaas_acfs/upgrade_backup/dbname/initdbname.ora
  2. Set the *.dg_broker_start parameter to false and save the changes:
    *.dg_broker_start=FALSE
  3. Bring down the local instance and start it in mount mode with the saved pfile:
    shutdown immediate;
    startup mount pfile='/var/opt/oracle/dbaas_acfs/upgrade_backup/dbname/initdbname.ora';
  4. Then open it with the following command:
    alter database open resetlogs;
  5. Re-enable the Data Guard broker.
    alter system set dg_broker_start=true scope=BOTH;
  6. Restore the spfile.
    create spfile='DATA DISKGROUP/db_unique_name/spfiledbname.ora' from pfile='/var/opt/oracle/dbaas_acfs/upgrade_backup/dbname/initdbname.ora';
  7. Shut down the local instance.
    shutdown immediate;
  8. Manually downgrade the service.
    19c Oracle home/bin/srvctl downgrade database -d db_unique_name -oraclehome 18c Oracle home path -targetversion 18.0.0.0.0
  9. Restore the TNS_ADMIN variable.
    19c Oracle home/bin/srvctl setenv database -d db_unique_name -t "TNS_ADMIN=18c Oracle home/network/admin/dbname"
  10. Bounce the database across the cluster.
    18c Oracle home/bin/srvctl stop database -d db_unique_name
    18c Oracle home/bin/srvctl start database -d db_unique_name

Patching Primary and Standby Databases Configured with Oracle Data Guard Fails

Description: In OCI environments, patching primary or secondary nodes using the exadbcpatchmulti tool fails if there's no SSH connectivity between the primary and standby nodes.

Action: Depending on the node you're patching, add the -primary or -secondary flag. You can add flags to identify the nodes only if you're patching using the exadbcpatchmulti tool.

For example:

To patch standby nodes, use the -secondary flag:
/var/opt/oracle/exapatch/exadbcpatchmulti action [patchid] dbname|instance_num -secondary
To patch primary nodes, use the -primary flag:
/var/opt/oracle/exapatch/exadbcpatchmulti action [patchid] dbname|instance_num -primary
Note

Always patch standby nodes first and then proceed to primary nodes.