Using NVIDIA GPU Cloud with Oracle Cloud Infrastructure

NVIDIA GPU Cloud (NGC) is a GPU-accelerated cloud platform optimized for deep learning and scientific computing. This topic provides an overview of how to use NGC with Oracle Cloud Infrastructure.

NVIDIA provides a customized Compute image on Oracle Cloud Infrastructure that is optimized for NVIDIA Tesla Volta and Pascal GPUs. Running NGC containers on an instance launched from this image is intended to give the best performance for deep learning jobs.

Warning

Avoid entering confidential information when assigning descriptions, tags, or friendly names to your cloud resources through the Oracle Cloud Infrastructure Console, API, or CLI.

Prerequisites

Launching an Instance based on the NGC Image

Using the Console

  1. Open the Console. For steps, see Signing In to the Console.

  2. Open the navigation menu. Under Core Infrastructure, go to Compute and click Instances.

  3. Select a Compartment that you have permission to work in.
  4. Click Create Instance.
  5. Enter a name for the instance.

  6. To select the NGC image, do the following:

    1. Click Change Image.
    2. On the Oracle Images tab, select the check box next to NVIDIA GPU Cloud Machine Image.
    3. Review and accept the terms of use, and then click Select Image.
  7. Select the Availability Domain that you want to create the instance in.
  8. In the Instance type section, select Virtual Machine or Bare Metal Machine.

  9. In the Shape section, click Change Shape. Confirm the instance type (Virtual Machine or Bare Metal Machine), and then select a shape for the instance. For more information about shapes, see Compute Shapes.

  10. In the Configure networking section, select the virtual cloud network (VCN) compartment, VCN, subnet compartment, and subnet.

  11. In the Add SSH keys section, upload the public key portion (.pub) of the key pair that you want to use for SSH access to the instance. Browse to the key file that you want to upload, or drag and drop the file into the box.

  12. Click Create.

The NGC instance now appears with a status of Provisioning. After the status changes to Running, you can connect to the instance. For general information about launching Compute instances, see Creating an Instance.

See the following topics for steps to access and work with the instance:

When you connect to the instance using SSH, you are prompted for your NGC API key. If you supply the API key at the prompt, the instance automatically logs you in to the NGC container registry so that you can run containers from the registry. You can also skip the prompt, log in to the instance without the API key, and log in to the NGC container registry later. See Logging in to the NGC Container Registry for more information.

Using the CLI

Oracle Cloud Infrastructure provides a Command Line Interface (CLI) you can use to complete tasks. For more information, see Quickstart and Configuring the CLI.

Use the launch command to create an instance. In InstanceSourceDetails for LaunchInstanceDetails, specify image for sourceType and specify the following image OCID: ocid1.image.oc1..aaaaaaaaknl6phck7e3iuii4r4axpwhenw5qtnnsk3tqppajdjzb5nhoma3q
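The source settings described above can be sketched as follows. This is a minimal illustration in Python (standard library only), not Oracle's reference implementation. It builds a dictionary shaped like a LaunchInstanceDetails request body with sourceType set to image. The helper name build_launch_details, the display name, and every placeholder value in the usage section (compartment and subnet OCIDs, availability domain, shape, SSH key) are assumptions made for illustration. Only the image OCID comes from this topic.

```python
import json

# Image OCID for the NGC image, as given in this topic.
NGC_IMAGE_OCID = (
    "ocid1.image.oc1..aaaaaaaaknl6phck7e3iuii4r4axpwhenw5qtnnsk3tqppajdjzb5nhoma3q"
)


def build_launch_details(compartment_id, availability_domain, shape,
                         subnet_id, ssh_public_key,
                         display_name="ngc-instance"):
    """Return a dict shaped like a LaunchInstanceDetails request body.

    The function name and the default display name are hypothetical.
    """
    return {
        "compartmentId": compartment_id,
        "availabilityDomain": availability_domain,
        "shape": shape,
        "displayName": display_name,
        # InstanceSourceDetails: boot the instance from the NGC image.
        "sourceDetails": {
            "sourceType": "image",
            "imageId": NGC_IMAGE_OCID,
        },
        "createVnicDetails": {"subnetId": subnet_id},
        # Public key used for SSH access to the instance.
        "metadata": {"ssh_authorized_keys": ssh_public_key},
    }


if __name__ == "__main__":
    # All values below are placeholders; substitute your own.
    details = build_launch_details(
        compartment_id="ocid1.compartment.oc1..exampleuniqueID",
        availability_domain="Uocm:PHX-AD-1",
        shape="VM.GPU3.1",
        subnet_id="ocid1.subnet.oc1..exampleuniqueID",
        ssh_public_key="ssh-rsa AAAA... user@example",
    )
    print(json.dumps(details, indent=2))
```

The CLI launch command takes the same information as individual options, so treat the dictionary as a checklist of what the request needs rather than as a file the CLI consumes directly.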

Using the File Storage Service for Persistent Data Storage

You can use the File Storage service for data storage when working with NGC. For more information, see Overview of File Storage. See the following tasks for creating and working with the File Storage service:

Using the Block Volume Service for Persistent Data Storage

You can use the Block Volume service for data storage when working with NGC. For more information, see Overview of Block Volume. See the following tasks for creating and working with the Block Volume service:

You can also use the CLI to manage block volumes. For details, see the volume commands.
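As a rough sketch of scripting the CLI for block volumes, the following Python builds the argument lists for creating a volume and attaching it to an instance. The helper names, the iSCSI attachment type default, and any values you pass in are illustrative assumptions. Verify the options against the CLI reference for your installed version before relying on them.

```python
import subprocess


def volume_create_command(compartment_id, availability_domain,
                          size_in_gbs, display_name):
    """Build the argument list for creating a block volume (hypothetical helper)."""
    return [
        "oci", "bv", "volume", "create",
        "--compartment-id", compartment_id,
        "--availability-domain", availability_domain,
        "--size-in-gbs", str(size_in_gbs),
        "--display-name", display_name,
    ]


def volume_attach_command(instance_id, volume_id, attachment_type="iscsi"):
    """Build the argument list for attaching a volume to an instance."""
    return [
        "oci", "compute", "volume-attachment", "attach",
        "--instance-id", instance_id,
        "--volume-id", volume_id,
        "--type", attachment_type,
    ]


def run(command):
    """Run a CLI command and return its standard output.

    Requires an installed and configured oci CLI; raises
    subprocess.CalledProcessError if the command fails.
    """
    completed = subprocess.run(command, check=True,
                               capture_output=True, text=True)
    return completed.stdout
```

Building the argument list separately from running it makes it easy to print and review a command before it creates billable resources.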

Examples of Running Containers

Before you run containers, you must be logged in to the NGC container registry. If you provided your API key when you connected to the instance using SSH, you are already logged in and can skip this step. Otherwise, log in now.

To log in to the NGC container registry
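A minimal sketch of a scripted registry login follows, assuming Docker is installed on the instance and that the registry follows NVIDIA's documented convention: the host is nvcr.io, the user name is the literal string $oauthtoken, and the password is your NGC API key. The function names are hypothetical. The API key is passed on standard input so that it does not appear in the process list or shell history.

```python
import subprocess

NGC_REGISTRY = "nvcr.io"
# Literal user name expected by the registry; it is not a shell variable.
NGC_USERNAME = "$oauthtoken"


def build_login_command():
    """Return the docker login argument list for the NGC registry."""
    return [
        "docker", "login",
        "--username", NGC_USERNAME,
        "--password-stdin",
        NGC_REGISTRY,
    ]


def login_to_ngc(api_key):
    """Log in to the NGC registry; return True if docker reports success.

    Requires the docker client to be installed on the instance.
    """
    result = subprocess.run(
        build_login_command(),
        input=api_key,        # API key is read from stdin by docker
        text=True,
        capture_output=True,
    )
    return result.returncode == 0
```

Because the argument list is passed without a shell, the $ in the user name needs no quoting. If you type the equivalent command in a shell, quote the user name so that it is not expanded as a variable.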
Example: MNIST Training Run Using PyTorch Container
Example: MNIST Training Run Using TensorFlow Container