Managing Cluster Networks

A cluster network is a pool of high-performance computing (HPC) instances or GPU instances that are connected with a high-bandwidth, ultra-low-latency network. Each node in the cluster is a bare metal machine located in close physical proximity to the other nodes. A remote direct memory access (RDMA) network between nodes provides latency as low as single-digit microseconds, comparable to that of on-premises HPC clusters.

Cluster networks are designed for highly demanding parallel computing workloads. For example:

  • Computational fluid dynamics simulations for automotive or aerospace modeling
  • Financial modeling and risk analysis
  • Biomedical simulations
  • Trajectory analysis and design for space exploration
  • Artificial intelligence and big data workloads

Cluster networks are built on top of the instance pools feature. Most operations on the underlying instance pool are managed through the cluster network, but you can still monitor the instance pool and add tags to it directly.

For more information about how to access and store the data that you want to process in your cluster networks, see FastConnect Overview, Overview of File Storage, Overview of Object Storage, and Overview of Block Volume.

Caution

Avoid entering confidential information when assigning descriptions, tags, or friendly names to your cloud resources through the Oracle Cloud Infrastructure Console, API, or CLI.

Supported Shapes

The following shapes support cluster networks:

  • BM.HPC2.36
  • BM.GPU4.8

Because a cluster network contains multiple HPC or GPU instances, you typically must request a service limit increase before you can create one.

Supported Regions and Availability Domains

Cluster networks are supported in the following regions:

  • Regions in the Oracle Cloud Infrastructure commercial realm:

    • Australia East (Sydney)
    • Australia Southeast (Melbourne)
    • Germany Central (Frankfurt)
    • Japan Central (Osaka)
    • Japan East (Tokyo)
    • Netherlands Northwest (Amsterdam)
    • UK South (London)
    • US East (Ashburn)
    • US West (Phoenix)
  • Regions in the Government Cloud realms:

    • UK Gov South (London)
    • US Gov East (Ashburn)

The availability domain that you create the cluster network in must have cluster network-capable hardware.
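The shape and region requirements above can be expressed as a simple pre-flight check. The following Python sketch copies the lists from this page into hypothetical constants (`SUPPORTED_SHAPES`, `SUPPORTED_REGIONS`) and a hypothetical helper, `check_cluster_network_target`. It is not part of any OCI SDK, the lists must be kept in sync with this page by hand, and it cannot tell whether a particular availability domain has cluster network-capable hardware or whether your service limits are high enough.

```python
"""Pre-flight check for cluster network shape and region (hypothetical helper)."""
from typing import List

# Shapes listed on this page as supporting cluster networks.
SUPPORTED_SHAPES = frozenset({"BM.HPC2.36", "BM.GPU4.8"})

# Regions listed on this page, grouped by realm. These are the display names
# used in this document, not region identifiers such as "us-ashburn-1".
SUPPORTED_REGIONS = {
    "commercial": frozenset({
        "Australia East (Sydney)",
        "Australia Southeast (Melbourne)",
        "Germany Central (Frankfurt)",
        "Japan Central (Osaka)",
        "Japan East (Tokyo)",
        "Netherlands Northwest (Amsterdam)",
        "UK South (London)",
        "US East (Ashburn)",
        "US West (Phoenix)",
    }),
    "government": frozenset({
        "UK Gov South (London)",
        "US Gov East (Ashburn)",
    }),
}


def check_cluster_network_target(shape: str, region: str) -> List[str]:
    """Return a list of problems; an empty list means both are listed as supported."""
    problems = []
    if shape not in SUPPORTED_SHAPES:
        problems.append(f"shape {shape!r} does not support cluster networks")
    if not any(region in regions for regions in SUPPORTED_REGIONS.values()):
        problems.append(f"region {region!r} does not support cluster networks")
    return problems
```

For example, `check_cluster_network_target("BM.GPU4.8", "US East (Ashburn)")` returns an empty list, while passing a virtual machine shape such as `"VM.Standard2.1"` returns one problem message.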

Required IAM Policy

To use Oracle Cloud Infrastructure, you must be granted security access in a policy by an administrator. This access is required whether you're using the Console or the REST API with an SDK, CLI, or other tool. If you get a message that you don't have permission or are unauthorized, verify with your administrator what type of access you have and which compartment you should work in.

For administrators: For a typical policy that gives access to cluster networks, see Let users manage Compute instance configurations, instance pools, and cluster networks.

Important

See this known issue for information about the policy statements that are required if the instance configuration or load balancer associated with the cluster network includes defined tags.

Tagging Resources

You can apply tags to your resources to help you organize them according to your business needs. You can apply tags at the time you create a resource, or you can update the resource later with the desired tags. For general information about applying tags, see Resource Tags.

Prerequisites

Before you create a cluster network, create an instance configuration for the instance pool that is managed by the cluster network. To do this:

  1. Create an instance with the following settings:

    • Image or operating system: Click Change Image, and then click Oracle Images. Select the Oracle HPC cluster networking image.
    • Shape: Click Change Shape. Select Bare Metal Machine. Then, select either the BM.HPC2.36 shape or the BM.GPU4.8 shape.

      For more information about these shapes, see Compute Shapes.

  2. Create an instance configuration using the instance that you created in the previous step as a template.

    Optionally, you can delete the instance after you create the instance configuration.
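If you script this prerequisite instead of using the Console, step 2 corresponds to creating an instance configuration from an existing instance. The sketch below builds a JSON request body for that call as a plain Python dict; it does not send the request. The field names (`compartmentId`, `displayName`, `source`, `instanceId`) reflect our reading of the `CreateInstanceConfigurationFromInstanceDetails` type in the Core Services API reference and should be verified there. The helper name `instance_configuration_from_instance_body` and the example OCIDs are hypothetical.

```python
def instance_configuration_from_instance_body(
    compartment_id: str, instance_id: str, display_name: str
) -> dict:
    """Build a request body that creates an instance configuration using an
    existing instance (for example, the HPC or GPU instance from step 1) as
    the template. Field names are assumptions to check against the API reference."""
    if not compartment_id or not instance_id:
        raise ValueError("compartment_id and instance_id are required")
    return {
        "compartmentId": compartment_id,
        "displayName": display_name,
        # "INSTANCE" is assumed to select the create-from-instance variant.
        "source": "INSTANCE",
        "instanceId": instance_id,
    }
```

For example, `instance_configuration_from_instance_body("ocid1.compartment.oc1..example", "ocid1.instance.oc1..example", "hpc-config")` returns a dict that you could serialize with `json.dumps` and send with the tool of your choice.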

Using the Console

To create a cluster network
  1. Open the navigation menu. Under Core Infrastructure, go to Compute and click Cluster Networks.

  2. Click Create Cluster Network.
  3. Enter a name for the cluster network. It doesn't have to be unique, and you can change it later.
  4. Select the compartment to create the cluster network in.
  5. Select the availability domain to run the cluster network in. Only availability domains that have cluster network-capable hardware can be selected.
  6. In the Configure networking section, specify the network that you want to use to administer the cluster network. This network is separate from the closed RDMA network between nodes within the cluster. Enter the following information:

    • Virtual cloud network: The virtual cloud network (VCN) for the cluster network.
    • Subnet: The subnet for the cluster network.
  7. In the Configure instance pool section, enter the following:

    • Instance pool name: A name for the instance pool that is managed by the cluster network.
    • Number of instances: The number of instances in the pool.
    • Instance configuration: Select the instance configuration to use when creating the instances in the cluster network's instance pool, as described in the prerequisites.
  8. Show Tagging Options: Optionally, add tags. If you have permissions to create a resource, you also have permissions to add free-form tags to that resource. To add a defined tag, you must have permissions to use the tag namespace. For more information about tagging, see Resource Tags. If you're not sure whether to add tags, skip this option (you can add tags later) or ask your administrator.
  9. Click Create Cluster Network.

    To track the progress of the operation, you can monitor the associated work request. For more information, see Using the Console to View Work Requests.

    For cluster networks with 10 or more instances, the cluster network is created if the required number of instances is available and at least 95% of the instances in the pool launch successfully. For cluster networks with fewer than 10 instances, all instances in the pool must launch successfully. If the cluster network fails to launch, wait a few minutes, and then try creating it again.
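The settings collected in the steps above, and the launch rule described in step 9, can be sketched in Python as follows. `create_cluster_network_body` builds a JSON request body as a plain dict; its field names (`compartmentId`, `displayName`, `placementConfiguration`, `availabilityDomain`, `primarySubnetId`, `instancePools`, `size`, `instanceConfigurationId`) are our reading of the `CreateClusterNetworkDetails` type in the Core Services API reference and should be verified there. `cluster_network_launch_succeeded` encodes the 95% and all-instances thresholds so you can reason about partial launches; it does not model whether capacity is available. Both helper names are hypothetical, and the 95% comparison uses integer arithmetic to avoid floating-point rounding at the boundary.

```python
def create_cluster_network_body(
    compartment_id: str,
    display_name: str,
    availability_domain: str,
    subnet_id: str,
    pool_name: str,
    instance_count: int,
    instance_configuration_id: str,
) -> dict:
    """Build a hypothetical CreateClusterNetwork request body from the Console settings.

    Field names are assumptions to verify against the Core Services API reference.
    """
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    return {
        "compartmentId": compartment_id,  # step 4
        "displayName": display_name,      # step 3
        "placementConfiguration": {
            "availabilityDomain": availability_domain,  # step 5
            # Step 6: the administration subnet; its VCN is the one containing it.
            "primarySubnetId": subnet_id,
        },
        "instancePools": [
            {
                "displayName": pool_name,                              # step 7
                "size": instance_count,                                # step 7
                "instanceConfigurationId": instance_configuration_id,  # prerequisites
            }
        ],
    }


def cluster_network_launch_succeeded(requested: int, launched: int) -> bool:
    """Apply the launch rule from step 9 (capacity availability is not modeled).

    - 10 or more instances: at least 95% must launch successfully.
    - Fewer than 10 instances: every instance must launch successfully.
    """
    if requested < 1 or not 0 <= launched <= requested:
        raise ValueError("need requested >= 1 and 0 <= launched <= requested")
    if requested >= 10:
        # Integer form of launched / requested >= 0.95.
        return launched * 100 >= requested * 95
    return launched == requested
```

Under this reading of the rule, a 20-instance cluster network succeeds with 19 launched instances but not with 18. With exactly 10 instances, 95% is 9.5, so all 10 must launch.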

To edit the name of a cluster network
  1. Open the navigation menu. Under Core Infrastructure, go to Compute and click Cluster Networks.

  2. Click the cluster network that you're interested in.
  3. Click Edit Name.
  4. Enter a new name, and then click Save Changes.

To manage tags for a cluster network
  1. Open the navigation menu. Under Core Infrastructure, go to Compute and click Cluster Networks.

  2. Click the cluster network that you're interested in.
  3. Click the Tags tab to view or edit the existing tags, or click Add Tags to add new ones.

For more information, see Resource Tags.
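Free-form and defined tags are represented differently in API request bodies: free-form tags are a flat map of keys to values, while defined tags are nested under their tag namespace. The following Python sketch merges new tags into an existing pair of tag maps, for example to compute the full tag set before an update. The `freeformTags` and `definedTags` field names follow the Resource Tags documentation as we understand it; the helper `merge_tags` and the `Operations` namespace in the example are hypothetical.

```python
import copy
from typing import Dict, Optional

FreeformTags = Dict[str, str]
DefinedTags = Dict[str, Dict[str, object]]


def merge_tags(
    existing: dict,
    freeform: Optional[FreeformTags] = None,
    defined: Optional[DefinedTags] = None,
) -> dict:
    """Return a new {"freeformTags", "definedTags"} dict with the new tags merged in.

    `existing` is not modified. New values overwrite existing ones with the same
    key; defined tags are merged per namespace instead of replacing whole namespaces.
    """
    result = {
        "freeformTags": dict(existing.get("freeformTags", {})),
        "definedTags": copy.deepcopy(existing.get("definedTags", {})),
    }
    result["freeformTags"].update(freeform or {})
    for namespace, tags in (defined or {}).items():
        result["definedTags"].setdefault(namespace, {}).update(tags)
    return result
```

Merging per namespace means that adding one key to an existing namespace keeps the other keys in that namespace instead of dropping them.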

To delete a cluster network
Caution

When you delete a cluster network, all of its resources are permanently deleted, including associated instances, attached boot volumes, and block volumes.

  1. Open the navigation menu. Under Core Infrastructure, go to Compute and click Cluster Networks.

  2. Click the cluster network that you're interested in.
  3. Click Terminate, and then confirm when prompted.

    To track the progress of the operation, you can monitor the associated work request. For more information, see Using the Console to View Work Requests.