Manually Configuring a Data Science Tenancy

In this tutorial, you set up your tenancy for Data Science and test it by creating a notebook session.

This tutorial is intended for administrators, because the steps require administrator access permissions.

In this tutorial, you complete the following tasks:

1. Creating a Data Scientists User Group.

2. Creating a Compartment for Your Work.

3. (Optional) Creating a VCN and Subnet.

4. Creating Policies.

5. Creating a Dynamic Group with Policies.

6. Creating a Notebook Session.

Before You Begin

To perform this tutorial, you must have the following:

  • A paid Oracle Cloud Infrastructure (OCI) account, or a new account with Oracle Cloud promotions. See Request and Manage Free Oracle Cloud Promotions.

  • Administrator privilege for the OCI account.
  • At least one user in your tenancy who wants to access the Data Science service. This user must be created in IAM.

1. Creating a Data Scientists User Group

Create a user group for the data scientists to work in.

  1. Open a supported browser and enter the Console URL:
    https://cloud.oracle.com
  2. Enter your Cloud Account Name, also referred to as your tenancy name, and click Next.
  3. Sign in with your user name and password.
  4. Open the navigation menu and click Identity & Security. Under Identity, click Groups.

    A list of the groups in your tenancy displays.

  5. Click Create Group.
  6. Name the new group data-scientists, and enter a description.
  7. Click Create.
  8. Click Add User to Group.
  9. Select a user to add, and then click Add.
  10. Repeat the previous two steps until all your data scientist users are in the data-scientists group.

2. Creating a Compartment for Your Work

Create a compartment for your data science resources.

  1. Open the navigation menu and click Identity & Security. Under Identity, click Compartments.
  2. Click Create Compartment.
  3. Name the new compartment data-science-work, and enter a description.
  4. Click Create Compartment.
  5. Confirm that the compartment appears in the compartments list.

3. (Optional) Creating a VCN and Subnet

This step is optional. When you create a notebook session in Step 6. Creating a Notebook Session, you can choose to create a default network with the proper setup for notebook sessions.

Important

The default network that the notebook provides is for use by the notebook only. With a default network option, you can skip creating a network and setting up subnets and gateways. If you skip this step and use the default network, you can’t access, modify, or use the notebook's provided default network for other purposes.

This section is for users who need access to their own VCN. It shows how to create a VCN and, later, how to choose the recommended subnet for notebook sessions. For example, if you are performing the Scheduling Data Science Job Runs tutorial, you create this network and use it both for the notebook session in Data Science and for the workspace in the Data Integration service.

  1. Open the navigation menu and click Networking and then click Virtual Cloud Networks.
  2. Click Start VCN Wizard.
  3. Select Create VCN with Internet Connectivity and then click Start VCN Wizard.
  4. Enter datascience-vcn for the VCN Name.
  5. Select the data-science-work compartment. This compartment hosts the VCN that you create in this section. It takes time for this new compartment to appear in the compartment list, so refresh the page until it appears.
  6. For Configure VCN and Subnets, keep the defaults:
    • VCN CIDR Block: 10.0.0.0/16
    • Public Subnet CIDR Block: 10.0.0.0/24
    • Private Subnet CIDR Block: 10.0.1.0/24
    • Use DNS hostnames in this VCN: selected
  7. Click Next.
  8. Review the VCN configuration.
  9. Click Create to create the VCN and the related resources such as a public and a private subnet, an internet gateway, a NAT gateway, and a service gateway.

    You use this VCN and its private subnet, Private Subnet-datascience-vcn, when you create your notebook session.

  10. Click View Virtual Cloud Network to review your VCN and subnets.
Note

For egress access to the public internet, we recommend that you use a private subnet with a route to a NAT Gateway. A NAT gateway gives instances in a private subnet access to the internet. The VCN that you create in this step creates a private subnet with egress access to the internet through the VCN's NAT Gateway.
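
The default CIDR blocks in this step can be sanity-checked before you change any of them. The following is a minimal sketch that uses only the Python standard library. The function name check_subnets is an illustrative assumption and is not part of any OCI tool. It checks that each subnet lies inside the VCN block and that no two subnets overlap.

```python
import ipaddress

# Default CIDR blocks shown in the VCN wizard step of this tutorial.
VCN_CIDR = "10.0.0.0/16"
PUBLIC_SUBNET_CIDR = "10.0.0.0/24"
PRIVATE_SUBNET_CIDR = "10.0.1.0/24"


def check_subnets(vcn_cidr, subnet_cidrs):
    """Return a list of problems; an empty list means the layout is consistent.

    check_subnets is a hypothetical helper written for this example.
    """
    vcn = ipaddress.ip_network(vcn_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    problems = []

    # Every subnet must be contained in the VCN address range.
    for subnet in subnets:
        if not subnet.subnet_of(vcn):
            problems.append(f"{subnet} is not inside {vcn}")

    # No two subnets may share addresses.
    for index, first in enumerate(subnets):
        for second in subnets[index + 1:]:
            if first.overlaps(second):
                problems.append(f"{first} overlaps {second}")

    return problems


default_problems = check_subnets(VCN_CIDR, [PUBLIC_SUBNET_CIDR, PRIVATE_SUBNET_CIDR])
```

With the defaults above, default_problems is an empty list. If you enter custom blocks in the wizard, pass them to the same function before you click Create.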

4. Creating Policies

Before users start their notebook sessions, you have to configure the Data Science policies.

  1. Open the navigation menu and click Identity & Security. Under Identity, click Policies.
  2. Click Create Policy.
  3. Enter data-science-policy for the Name.
  4. Enter Policy for data science users and service as the Description.
  5. Select the data-science-work compartment.
  6. Click Show manual editor.
  7. Enter the following five policy statements into the Policy Builder field:
    allow service datascience to use virtual-network-family in compartment data-science-work
    allow group data-scientists to manage data-science-family in compartment data-science-work
    allow group data-scientists to use virtual-network-family in compartment data-science-work 
    allow group data-scientists to manage buckets in compartment data-science-work 
    allow group data-scientists to manage objects in compartment data-science-work
  8. Click Create to create your policy.

Explanation for the policies:

  • To allow the Data Science service to attach your VCN to your notebook session and route egress traffic from the notebook environment, add:

    allow service datascience to use virtual-network-family in compartment data-science-work
  • To allow the data-scientists group to perform operations on all Data Science resources in the data-science-work compartment (projects, notebook sessions, models, model deployments, work requests, jobs, and job runs), add:

    allow group data-scientists to manage data-science-family in compartment data-science-work
  • To allow those data scientists to use the VCN that you created and attach it to their notebook session, add:

    allow group data-scientists to use virtual-network-family in compartment data-science-work 
  • To allow those data scientists to create and manage buckets, such as adding artifacts and conda environments to buckets, add:

    allow group data-scientists to manage buckets in compartment data-science-work
    allow group data-scientists to manage objects in compartment data-science-work 
Tip

To give data scientists administrative rights to their compartment, so that they can manage all resources of OCI services in it, you don't need to list specific resources such as buckets, objects, or the virtual network family. Instead, replace the preceding five policies with the following two policies:
allow group data-scientists to manage all-resources in compartment data-science-work
allow service datascience to use virtual-network-family in compartment data-science-work 
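
If you use group or compartment names other than the ones in this tutorial, it can help to generate the five policy statements from those names instead of editing each line by hand. The following is a minimal sketch. The function name data_science_policy_statements is an illustrative assumption, and the sketch only builds text that you then paste into the manual editor.

```python
def data_science_policy_statements(group, compartment):
    """Build the five policy statements from this section as plain strings.

    Hypothetical helper for this example; it does not call any OCI API.
    """
    statements = [
        f"allow service datascience to use virtual-network-family in compartment {compartment}",
        f"allow group {group} to manage data-science-family in compartment {compartment}",
        f"allow group {group} to use virtual-network-family in compartment {compartment}",
        f"allow group {group} to manage buckets in compartment {compartment}",
        f"allow group {group} to manage objects in compartment {compartment}",
    ]
    # Guard against accidental duplicates when the list is edited later.
    if len(set(statements)) != len(statements):
        raise ValueError("duplicate policy statement")
    return statements


statements = data_science_policy_statements("data-scientists", "data-science-work")
policy_text = "\n".join(statements)
```

Here policy_text holds one statement per line, in the same order as the explanation in this section.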

5. Creating a Dynamic Group with Policies

Create a dynamic group for Data Science resources and allow this dynamic group to access other OCI resources, such as Object Storage and Logging.

To give OCI resources permission to access other OCI resources, first add the resources to a dynamic group instead of a user group. Then write policies that allow the dynamic group to access the specified resources. Here, your dynamic group has three Data Science resources: notebook sessions, model deployments, and job runs.

  1. Open the navigation menu and click Identity & Security. Under Identity, click Compartments.
  2. Click the data-science-work compartment.
  3. For the OCID attribute, click Copy to save the entire OCID to your notepad.
  4. In the trail that displays the current page, click Compartments to return to the list of compartments.
  5. Click Dynamic Groups.
  6. Click Create Dynamic Group.
  7. Enter the following:
    • Name: data-science-dynamic-group
    • Description: Data Science dynamic group
  8. In the Matching Rules section, click Match any rules defined below.
  9. Enter the following three matching rules. Replace <compartment-ocid> with the compartment OCID that you copied.
    Rule 1:
    ALL {resource.type='datasciencenotebooksession', resource.compartment.id='<compartment-ocid>'}

    The preceding matching rule means that all notebook sessions created in your compartment are members of the data-science-dynamic-group.

    Click Additional Rule and add the following rule:

    Rule 2:

    ALL {resource.type='datasciencemodeldeployment', resource.compartment.id='<compartment-ocid>'}

    The preceding matching rule means that all model deployments created in your compartment are members of the data-science-dynamic-group.

    Click Additional Rule and add the following rule:

    Rule 3:

    ALL {resource.type='datasciencejobrun', resource.compartment.id='<compartment-ocid>'}

    The preceding matching rule means that all job runs created in your compartment are members of the data-science-dynamic-group.

  10. Click Create.

    Next, write policies to allow resources of this dynamic group to access other OCI services.

  11. In the trail that displays the current page, click Identity.
  12. Click Policies.
  13. Click Create Policy.
  14. Enter the following:
    • Name: data-science-dynamic-group-policy
    • Description: Policy for the Data Science dynamic group
  15. Instead of the data-science-work compartment, select the top-most compartment, which is your tenancy.
    Important

    Policy creation fails if you don't select the tenancy.
  16. Click Show manual editor.
  17. Enter the following policy statements into the Policy Builder field:
    allow dynamic-group data-science-dynamic-group to manage data-science-family in compartment data-science-work
    allow dynamic-group data-science-dynamic-group to manage dataflow-family in compartment data-science-work
    allow dynamic-group data-science-dynamic-group to read compartments in tenancy
    allow dynamic-group data-science-dynamic-group to read users in tenancy
    allow dynamic-group data-science-dynamic-group to use log-content in compartment data-science-work
    allow dynamic-group data-science-dynamic-group to use log-groups in compartment data-science-work
    allow dynamic-group data-science-dynamic-group to manage object-family in compartment data-science-work
  18. Click Create to create the policy.

You can use this dynamic group to give notebook sessions, model deployments, and job runs that are in the data-science-work compartment access to other OCI resources in the tenancy.
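
The three matching rules in this section differ only in the resource type, so they can be generated from the compartment OCID. The following is a minimal sketch. The function name matching_rules is an illustrative assumption, and the OCID in the usage line is a made-up placeholder, not a real identifier.

```python
# Resource types used by the three matching rules in this section.
RESOURCE_TYPES = [
    "datasciencenotebooksession",
    "datasciencemodeldeployment",
    "datasciencejobrun",
]


def matching_rules(compartment_ocid):
    """Return one matching rule per Data Science resource type.

    Hypothetical helper for this example; it only builds strings.
    """
    # Catch the common mistake of leaving the <compartment-ocid> placeholder in place.
    if "<" in compartment_ocid or ">" in compartment_ocid:
        raise ValueError("replace the placeholder with the copied compartment OCID")
    return [
        f"ALL {{resource.type='{resource_type}', "
        f"resource.compartment.id='{compartment_ocid}'}}"
        for resource_type in RESOURCE_TYPES
    ]


# Placeholder OCID for illustration only.
rules = matching_rules("ocid1.compartment.oc1..exampleuniqueID")
```

Each entry in rules can be pasted into its own rule field in the Matching Rules section.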

Explanation for the policies:

  • To allow notebook sessions to perform CRUD operations on entries in the model catalog, projects, and notebook session resources, add:

    allow dynamic-group data-science-dynamic-group to manage data-science-family in compartment data-science-work
    
  • To allow notebook sessions to perform CRUD operations on Data Flow applications and runs, add:

    allow dynamic-group data-science-dynamic-group to manage dataflow-family in compartment data-science-work
  • To allow notebook sessions to list and read compartments and user names that are in the tenancy, add:

    allow dynamic-group data-science-dynamic-group to read compartments in tenancy
    allow dynamic-group data-science-dynamic-group to read users in tenancy
  • To allow model deployments to emit logs to the Logging service, add:

    allow dynamic-group data-science-dynamic-group to use log-content in compartment data-science-work
  • To allow job runs to create logs and record job run details in the Logging service, add:

    allow dynamic-group data-science-dynamic-group to use log-groups in compartment data-science-work
  • To allow notebook sessions and model deployments to read and write files in Object Storage buckets in the data-science-work compartment, add:

    allow dynamic-group data-science-dynamic-group to manage object-family in compartment data-science-work
Tip

  • The preceding policy allows model deployments to access any bucket in the data-science-work compartment.
  • To give model deployments read access to specific buckets outside the data-science-work compartment, specify the bucket names and their compartments in your policy.
  • Example: To allow model deployments to access published conda environments from the bucket published-conda-envs, and model artifacts from the bucket model-artifacts, add:
    allow dynamic-group data-science-dynamic-group to read objects in compartment <another-compartment> where ANY {target.bucket.name='published-conda-envs', target.bucket.name='model-artifacts'}
  • If your policy statements mention tenancy or include compartments outside the data-science-work compartment, then in the Create Policy dialog, for the Compartment option, select <your-tenancy> (root). This way, in addition to your compartment, the policy can include rules for other compartments in the tenancy.
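
The bucket-specific statement in the preceding tip follows a regular pattern: one target.bucket.name condition per bucket, combined with ANY. The following is a minimal sketch of that pattern. The function name read_objects_statement is an illustrative assumption, and the compartment value is kept as the placeholder from the tip.

```python
def read_objects_statement(dynamic_group, compartment, bucket_names):
    """Build a read-only objects policy limited to the named buckets.

    Hypothetical helper for this example; it only builds a string.
    """
    if not bucket_names:
        raise ValueError("at least one bucket name is required")
    conditions = ", ".join(f"target.bucket.name='{name}'" for name in bucket_names)
    return (
        f"allow dynamic-group {dynamic_group} to read objects "
        f"in compartment {compartment} where ANY {{{conditions}}}"
    )


statement = read_objects_statement(
    "data-science-dynamic-group",
    "<another-compartment>",  # placeholder, replace with the real compartment name
    ["published-conda-envs", "model-artifacts"],
)
```

The resulting statement matches the example in the tip. Because it names a compartment outside data-science-work, the policy that contains it belongs in the tenancy (root) compartment, as the last bullet explains.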

6. Creating a Notebook Session

Lastly, create a notebook session and test its access to the public internet.

  1. Open the navigation menu and click Analytics & AI. Under Machine Learning, click Data Science.
  2. Click Create Project.
  3. Select the data-science-work compartment.
  4. (Optional) Enter Initial Project for the Name.
  5. (Optional) Enter my first project for the Description.
  6. Click Create.
  7. Click Create notebook session.
  8. For Compartment, select data-science-work.
  9. (Optional) Enter my-first-notebook-session for the Name.
  10. For Compute shape, click Select.
  11. Choose the following options:
    • Instance Type: Virtual machine
    • Shape Series: Intel
    • Shape Name: VM.Standard3.Flex
  12. For VM.Standard3.Flex, keep the default allocations:
    • Number of OCPUs: 1
    • Amount of memory (GB): 16
  13. Click Select shape.
  14. For Block storage size, enter 100 GB to attach to your virtual machine.
  15. Click Custom networking, and select the datascience-vcn VCN and Private Subnet-datascience-vcn subnet to route egress traffic from your notebook session.
    Instead of Custom networking, you can choose the Default networking option, which creates the networking for you. With Default networking, you can skip Step 3. (Optional) Creating a VCN and Subnet. This tutorial uses custom networking to show the steps to users who need their own network settings.
  16. Select View detail page on clicking create.
  17. Click Create to create your first notebook session.

    Creating the notebook session takes a few minutes. When the notebook session status turns to Active, you can open the notebook session.

  18. Click Open.
  19. Enter your Oracle Cloud Infrastructure credentials to access the JupyterLab UI.
  20. If you don't have a tab called Launcher, click File, and then New Launcher.
  21. In the Launcher, under Other, click the Terminal icon to start a new terminal session.
  22. To perform a simple test, check that you can access the public internet from your notebook session by running this command:

    wget --spider https://www.oracle.com

    You should see a response similar to:

    (base) bash-4.2$ wget --spider https://www.oracle.com
    Spider mode enabled. Check if remote file exists.
    --<date>--  https://www.oracle.com/
    Resolving www.oracle.com (www.oracle.com)... 
    Connecting to www.oracle.com (www.oracle.com)... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: unspecified [text/html]
    Remote file exists and could contain further links,
    but recursion is disabled -- not retrieving.

    The line HTTP request sent, awaiting response... 200 OK indicates that the test succeeded and that your notebook session has public internet access.
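
If you want to run this check from a script instead of reading the terminal output, the status line can be extracted from the captured text. The following is a minimal sketch that assumes you have already captured the wget messages as a string (wget writes them to standard error). The function name http_status_from_wget and the sample text are illustrative assumptions.

```python
import re

# Sample text modeled on the response shown in this step.
SAMPLE_OUTPUT = """Spider mode enabled. Check if remote file exists.
Connecting to www.oracle.com (www.oracle.com)... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
"""


def http_status_from_wget(output):
    """Return the last HTTP status code found in wget output, or None.

    Hypothetical helper for this example. The last code is used because
    wget prints one status line per request when it follows redirects.
    """
    codes = re.findall(r"awaiting response\.\.\. (\d{3})", output)
    return int(codes[-1]) if codes else None


status = http_status_from_wget(SAMPLE_OUTPUT)
has_internet_access = status == 200
```

A result of None means that no response line was found, which usually points to a DNS or connection problem rather than an HTTP error.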