Quick Start: Create a Big Data Service Cluster

Many options are available for creating clusters, from small, flexible clusters operating in virtual environments to large, powerful clusters operating on the "bare metal" of dedicated hosts.

Note

This topic is part of the Oracle Big Data Service Quick Start.

These instructions tell you how to create a small cluster for development purposes. It's good for developing applications and testing functionality at a minimal cost. Its profile is:

  • The cluster is highly available (HA) and secure.
  • Master/utility nodes:
    • Number of nodes: 4 (HA clusters always have 4 master and utility nodes: 2 master nodes and 2 utility nodes. Non-HA clusters always have 2.)
    • Shape used for each node: VM.Standard2.4
    • Cores in each node: 8
    • Storage per node: 750 GB block storage
  • Worker nodes:
    • Number of nodes: 3
    • Shape used for each node: VM.Standard2.1 or VM.Standard2.4
    • Cores in each node: 2
    • Storage per node: 750 GB block storage

See Supported Node Shapes and Service Limits to understand the other available options.

Requirements

  • You must be a member of a group with permissions to create a cluster, for example the bda-admins group described in Quick Start: Set Up Your Environment.
  • You must know the name of your tenancy and your credentials for signing in to the Oracle Cloud Console.
  • You (or another administrator) must have completed the prerequisite tasks described in Preparing for Big Data Service.

Create the Cluster

To create the cluster:

  1. Sign in to your tenancy in the Oracle Cloud Console. See Signing into the Console in the Oracle Cloud Infrastructure documentation.

  2. In the Oracle Cloud Console, open the navigation menu. Under AI and Big Data, select Big Data.

  3. In the panel on the left, click Compartment and then select the name of the compartment where you want to create the cluster, for example mycompartment.

  4. Click Create Cluster, and then enter your configuration choices in the Create Cluster dialog box:

    • Cluster Name - Enter a name for the cluster, for example, mycluster.
    • Cluster Admin Password - Enter a password to be used to access the cluster. This password is also used as the administrator password for Cloudera Manager.
    • Secure & Highly Available (HA) - Select the check box if you want to make the cluster secure and highly available (HA). A secure cluster has the full Hadoop security stack, including HDFS Transparent Encryption, Kerberos, and Apache Sentry. You can't change this after the cluster has been created.
    • Hadoop Nodes: Master/Utility Nodes
      • Choose Instance Type - Select Virtual Machine to create master and utility nodes with Oracle Cloud Infrastructure virtual machine (VM) compute shapes.
      • Choose Master/Utility Node Shape - Select VM.Standard2.4 to create a small, flexible cluster.
      • Block Storage Size per Master/Utility Node (in GB) - Enter 250, which provides minimal storage for this small, flexible cluster. (The default initial block storage for master and utility nodes with standard shapes is 1 TB.)
    • Hadoop Nodes: Worker Nodes
      • Choose Instance Type - Select Virtual Machine to create worker nodes with Oracle Cloud Infrastructure virtual machine (VM) compute shapes.
      • Choose Worker Node Shape - Select VM.Standard2.1 or VM.Standard2.4.
      • Block Storage Size per Worker Node (in GB) - Enter 750. (The default initial block storage for worker nodes with standard shapes is 1 TB.)
      • Number of Worker Nodes - Enter 3.
    • Network Settings: Cluster Private Network

      By default, a cluster private network for your cluster is created in a private Oracle tenancy (that is, not in your customer tenancy). This network is used for private communication between the nodes of the cluster. It's inaccessible from outside hosts.

      • CIDR Block - Enter a CIDR block to specify the range of contiguous IP addresses available for the cluster private network, or accept the default 10.0.0.0/16. This CIDR block cannot overlap the CIDR block range of the subnet used for the cluster (which you'll select under Network Settings: Customer Network, below).
    • Network Settings: Customer Network

      Enter the information below to add the cluster to a Virtual Cloud Network (VCN) in your tenancy.

      • Choose VCN in <compartment> - Accept the current compartment (for example, mycompartment), or click Change Compartment to select a different one. Then select the name of an existing VCN in that compartment to use for the cluster. The VCN must contain a regional subnet.

      • Choose Regional Subnet in <compartment> - Choose a regional subnet to use for the cluster.

        Important: If you plan to expose IP addresses outside the subnet (for example for accessing a node via the internet), you must select a public subnet for the cluster.

    • Network Settings: Networking Options

      Select Deploy Oracle-managed Service gateway and NAT gateway (Quick Start) to deploy a service gateway and a Network Address Translation (NAT) gateway in the cluster private network. The service gateway enables nodes without public IP addresses to privately access Oracle services, without exposing the data to an internet gateway or a NAT gateway. The NAT gateway enables nodes without public IP addresses to initiate connections to and receive responses from the internet but not to receive inbound connections initiated from the internet. These gateways are managed by Oracle and can't be reconfigured after they are created.

    • Additional Options
      • Upload an SSH public key. You'll use the associated private key to make SSH connections to the cluster.
  5. Click Create Cluster.
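
The requirement that the cluster private network CIDR block not overlap the customer subnet can be checked before you fill in the dialog box. The following is a minimal sketch using Python's standard ipaddress module. The function name and the two customer subnet ranges are hypothetical and chosen only for illustration; only the default 10.0.0.0/16 comes from the instructions above.

```python
import ipaddress


def cidr_blocks_overlap(private_cidr, subnet_cidr):
    """Return True if the two CIDR blocks share at least one IP address."""
    private_net = ipaddress.ip_network(private_cidr)
    subnet_net = ipaddress.ip_network(subnet_cidr)
    return private_net.overlaps(subnet_net)


# Default cluster private network CIDR block offered in the dialog box.
DEFAULT_PRIVATE_CIDR = "10.0.0.0/16"

# Hypothetical customer subnet ranges (assumptions, not from the documentation).
for subnet in ("10.0.1.0/24", "172.16.0.0/24"):
    if cidr_blocks_overlap(DEFAULT_PRIVATE_CIDR, subnet):
        print(f"{subnet} overlaps {DEFAULT_PRIVATE_CIDR}: enter a different CIDR block")
    else:
        print(f"{subnet} does not overlap {DEFAULT_PRIVATE_CIDR}: the default is acceptable")
```

If the check reports an overlap, enter a different CIDR block for the cluster private network or choose a different subnet.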

Review the Cluster

The process of creating the cluster takes some time. You can monitor its progress:

  • Select Big Data from the Oracle Cloud Console navigation menu.
  • Select mycluster from the list of clusters.
  • Click Work Requests.

Once cluster creation is complete, you'll see 7 nodes:

  • 2 Master Nodes
  • 2 Utility Nodes
  • 3 Worker Nodes

About the Node Names

The node names follow the order of the nodes in the list. For example, "first utility node" refers to the first utility node in the list of nodes. Physical host names also follow a naming pattern:

[first 7 letters of cluster name][2 letters representing node type][the order in the node list, starting with 0]

In this example, the node names for the cluster mycluster are:

  • Master nodes: myclustmn0, myclustmn1
  • Utility nodes: myclustun0, myclustun1
  • Worker nodes: myclustwn0, myclustwn1, myclustwn2
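
The naming pattern can be sketched in a few lines of Python. This is an illustration only: the function and dictionary names are hypothetical, the prefix is taken by simple slicing of the first 7 characters, and the behavior for cluster names that are shorter than 7 characters or contain non-letter characters is not described in this topic.

```python
def node_host_names(cluster_name, node_type_code, count):
    """Build host names as [first 7 letters][2-letter node type][index from 0]."""
    prefix = cluster_name[:7]  # assumption: plain slicing of the cluster name
    return [f"{prefix}{node_type_code}{index}" for index in range(count)]


# Two-letter codes and node counts for the HA cluster created in this topic.
NODE_TYPE_CODES = {"master": "mn", "utility": "un", "worker": "wn"}
NODE_COUNTS = {"master": 2, "utility": 2, "worker": 3}

for node_type, code in NODE_TYPE_CODES.items():
    names = node_host_names("mycluster", code, NODE_COUNTS[node_type])
    print(f"{node_type}: {', '.join(names)}")
```

Run against the cluster name mycluster, the sketch produces the same seven host names as the list above.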

Locations of Services

Because this is a highly available cluster, the services are distributed as follows:

  • Master Node 1 - myclustmn0

    • HDFS Failover Controller
    • HDFS JournalNode
    • HDFS NameNode
    • Key Trustee KMS Proxy
    • Key Trustee Server Active
    • KTS Active Database
    • Spark History Server
    • YARN JobHistory Server
    • YARN ResourceManager
    • ZooKeeper Server
    • MIT KDC Primary
  • Master Node 2 - myclustmn1

    • HDFS Balancer
    • HDFS Failover Controller
    • HDFS HttpFS
    • HDFS JournalNode
    • HDFS NameNode
    • Hue Load Balancer
    • Hue Server
    • Hue Kerberos Ticket Renewer
    • Key Trustee KMS Proxy
    • KTS Passive Database
    • KTS Passive ResourceManager
    • ZooKeeper Server
    • MIT KDC Secondary
  • Utility Node 1 - myclustun0

    • HDFS JournalNode
    • CM Service Alert Publisher
    • CM Service Event Server
    • CM Service Host Monitor
    • CM Service Navigator Audit Server
    • CM Service Navigator Metadata Server
    • CM Service Reports Manager
    • CM Service Monitor
    • Sentry Server
    • ZooKeeper Server
  • Utility Node 2 - myclustun1

    • Hive Metastore Server
    • HiveServer2
    • Hive WebHCat Server
    • Hue Load Balancer
    • Hue Server
    • Hue Kerberos Ticket Renewer
    • Oozie Server
    • Sentry Server
  • Worker Nodes - myclustwn0…2

    • HDFS DataNode
    • YARN NodeManager
    • Hive Gateway
    • Spark Gateway
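
When you need to find the host for a particular service, the distribution above can be treated as a simple lookup table. The sketch below encodes only an excerpt of the lists (a few services per node, with ZooKeeper spelled consistently), and the names SERVICES_BY_HOST and hosts_running are hypothetical. It is not a complete inventory of the cluster.

```python
# Excerpt of the service layout for the HA cluster mycluster (not exhaustive).
SERVICES_BY_HOST = {
    "myclustmn0": ["HDFS JournalNode", "HDFS NameNode", "YARN ResourceManager", "ZooKeeper Server"],
    "myclustmn1": ["HDFS JournalNode", "HDFS NameNode", "Hue Server", "ZooKeeper Server"],
    "myclustun0": ["HDFS JournalNode", "Sentry Server", "ZooKeeper Server"],
    "myclustun1": ["Hive Metastore Server", "HiveServer2", "Hue Server", "Oozie Server", "Sentry Server"],
}


def hosts_running(service):
    """Return the sorted host names whose service list contains the given service."""
    return sorted(
        host for host, services in SERVICES_BY_HOST.items() if service in services
    )


print(hosts_running("HDFS JournalNode"))
print(hosts_running("Hue Server"))
```

In this excerpt, HDFS JournalNode and ZooKeeper Server each appear on three hosts (both master nodes and the first utility node), while Hue Server appears on the second master node and the second utility node.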