After you create a notebook session, you can write and run Python code in the JupyterLab interface, using the preinstalled machine learning libraries to build and train models.
Authenticating to the OCI APIs from a Notebook Session
When working within a notebook session, you're operating as the Linux user datascience. This user doesn't have an OCI Identity and Access Management (IAM) identity, so it has no access to the OCI APIs. OCI resources include Data Science projects and models, and the resources of other OCI services such as Object Storage, Functions, Vault, and Data Flow. To access these resources from the notebook environment, use one of the following two authentication approaches:
(Recommended) Authenticating Using a Notebook Session's Resource Principal 🔗
A resource principal is a feature of IAM that enables resources to be authorized principal actors that can perform actions on service resources. Each resource has its own identity, and it authenticates using the certificates that are added to it. These certificates are automatically created, assigned to resources, and rotated, avoiding the need for you to store credentials in a notebook session.
The Data Science service enables you to authenticate using a notebook session's resource principal to access other OCI resources. Compared to the OCI configuration file and API key approach, resource principals provide a more secure way to authenticate to resources.
A tenancy administrator must write policies that grant a resource principal permission to access other OCI resources; see Configuring Your Tenancy for Data Science.
You can authenticate with resource principals in a notebook session using the following interfaces:
Use the `--auth=resource_principal` flag with OCI CLI commands.
Note
The resource principal token is cached for 15 minutes. If you change the policy or the dynamic group, you must wait up to 15 minutes for the changes to take effect.
Important
If you don't explicitly use resource principals when invoking an SDK or CLI, then the configuration file and API key approach is used.
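For example, the following minimal sketch uses the OCI Python SDK in a notebook session; the Object Storage call at the end is only an illustration:

import oci

# Build a signer from the notebook session's resource principal
# certificates; no configuration file or API keys are needed.
signer = oci.auth.signers.get_resource_principals_signer()

# Pass an empty config and the signer to any OCI service client.
object_storage = oci.object_storage.ObjectStorageClient(config={}, signer=signer)

# Illustrative call: print the tenancy's Object Storage namespace.
print(object_storage.get_namespace().data)

With the ADS SDK, calling ads.set_auth(auth="resource_principal") has the same effect for ADS operations.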
(Default) Authenticating Using OCI Configuration File and API Keys 🔗
You can operate as your own IAM user by setting up an OCI configuration file and API keys to access OCI resources. This is the default authentication approach.
To authenticate using the configuration file and API key approach, you must upload an OCI configuration file into the notebook session's /home/datascience/.oci/ directory. For the relevant profile defined in the OCI configuration file, you also need to upload or create the required .pem files.
You can run sftp, scp, curl, wget, or rsync commands to pull files into your notebook session environment, subject to the networking limitations imposed by your VCN and subnet selection.
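For example, once the configuration file and .pem key files are in place, a sketch like the following (assuming the DEFAULT profile) authenticates as your IAM user with the OCI Python SDK:

import oci

# Load the DEFAULT profile from the uploaded configuration file.
config = oci.config.from_file("/home/datascience/.oci/config", "DEFAULT")
oci.config.validate_config(config)

# Any service client accepts the config; here, look up the current user.
identity = oci.identity.IdentityClient(config)
print(identity.get_user(config["user"]).data.name)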
Installing Extra Python Libraries 🔗
You can install a library that's not preinstalled in the notebook session image. You can install and modify a prebuilt conda environment, or create a conda environment from scratch.
Using the Provided Environment Variables in Notebook Sessions 🔗
When you start a notebook session, the service creates useful environment variables that you can use in your code:
| Variable Key Name | Description | Specified By |
|---|---|---|
| TENANCY_OCID | The OCID of the tenancy that the notebook session belongs to. | Automatically populated by Data Science. |
| PROJECT_OCID | The OCID of the project associated with the current notebook session. | Automatically populated by Data Science. |
| PROJECT_COMPARTMENT_OCID | The OCID of the compartment of the project that the notebook session is associated with. | Automatically populated by Data Science. |
| USER_OCID | The OCID of the user. | Automatically populated by Data Science. |
| NB_SESSION_OCID | The OCID of the current notebook session. | Automatically populated by Data Science. |
| NB_SESSION_COMPARTMENT_OCID | The OCID of the compartment of the current notebook session. | Automatically populated by Data Science. |
| OCI_RESOURCE_PRINCIPAL_RPT_PATH | The path to the OCI resource principal token. | Automatically populated by Data Science. |
| OCI_RESOURCE_PRINCIPAL_RPT_ID | The ID of the OCI resource principal token. | Automatically populated by Data Science. |
| NB_ONCREATE_SCRIPT_URL | URL of the lifecycle script to run when the notebook session is created. | User specified. |
| NB_ONACTIVATE_SCRIPT_URL | URL of the lifecycle script to run when the notebook session is activated. | User specified. |
| NB_ONDEACTIVATE_SCRIPT_URL | URL of the lifecycle script to run when the notebook session is deactivated. | User specified. |
| NB_ONDELETE_SCRIPT_URL | URL of the lifecycle script to run when the notebook session is deleted. | User specified. |
| NB_SCRIPT_OUTPUT_LOG_NAMESPACE | Object Storage namespace for notebook lifecycle script output logs. | User specified. |
| NB_SCRIPT_OUTPUT_LOG_BUCKET | Object Storage bucket for notebook lifecycle script output logs. | User specified. |
| SECURE_DATA_SESSION | Set to True to disable file downloads from the JupyterLab client and the JupyterLab download API. | User specified. |
To access these environment variables in your notebook session, use the Python os library. For example:
import os
project_ocid = os.environ['PROJECT_OCID']
print(project_ocid)
Note
The NB_SESSION_COMPARTMENT_OCID and PROJECT_COMPARTMENT_OCID values don't update in a running notebook session if the resources are moved to different compartments after the notebook session is created.
Using Custom Environment Variables 🔗
Use your own custom environment variables in notebook sessions.
After you define your custom environment variables, access them in a notebook session with the Python os library. For example, if you define a key-value pair with the key MY_CUSTOM_VAR1 and the value VALUE-1, running the following code prints VALUE-1.
import os
my_custom_var1 = os.environ['MY_CUSTOM_VAR1']
print(my_custom_var1)
Note
The system doesn't let you overwrite the system-provided environment variables with custom ones. For example, you can't name a custom variable USER_OCID.
Using the Oracle Accelerated Data Science SDK 🔗
The Oracle Accelerated Data Science (ADS) SDK speeds up common data science activities by providing tools that automate and simplify them. It gives data scientists a friendly Python interface to OCI services, including Data Science (and its jobs), Big Data, Data Flow, Object Storage, Streaming, and Vault, and to Oracle Database. ADS also provides an interface to manage the life cycle of machine learning models, from data acquisition to model evaluation, interpretation, and deployment.
With ADS you can:
Read datasets from Object Storage, Oracle Database (ATP, ADW, and on-premises), AWS S3, and other sources into Pandas data frames.
Tune models using hyperparameter optimization with the ADSTuner module.
Generate detailed evaluation reports of model candidates with the ADSEvaluator module.
Start distributed ETL, data processing, and model training jobs in Spark using Data Flow.
Connect to Big Data Service (BDS) from a notebook session; the cluster must have Kerberos enabled.
Use feature types to characterize data, create meaningful summary statistics, and plot. Use the warning and validation system to test the quality of your data.
Train machine learning models using Data Science jobs.
Manage the life cycle of conda environments using the ads conda CLI.
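For example, the following minimal sketch authenticates with the notebook session's resource principal and reads a CSV file from Object Storage into a Pandas data frame; the bucket, namespace, and file names are placeholders:

import ads
import pandas as pd
from ads.common.auth import default_signer

# Authenticate ADS operations with the notebook session's resource principal.
ads.set_auth(auth="resource_principal")

# ocifs (installed with ADS) lets Pandas read oci:// URIs directly;
# replace the bucket, namespace, and file names with your own.
df = pd.read_csv(
    "oci://<bucket_name>@<namespace>/train.csv",
    storage_options=default_signer(),
)
print(df.head())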