Overview of Data Science

Oracle Cloud Infrastructure Data Science is a fully managed and serverless platform for data science teams to build, train, and manage machine learning models using Oracle Cloud Infrastructure.

The Data Science Service:

  • Provides data scientists with a collaborative, project-driven workspace.

  • Enables self-service, serverless access to infrastructure for data science workloads.

  • Includes Python-centric tools, libraries, and packages developed by the open source community and the Oracle Accelerated Data Science Library, which supports the end-to-end lifecycle of predictive models:

    • Data acquisition, profiling, preparation, and visualization.

    • Feature engineering.

    • Model training (including Oracle AutoML).

    • Model evaluation, explanation, and interpretation (including Oracle MLX).

    • Model deployment.

  • Integrates with the rest of the Oracle Cloud Infrastructure stack, including Functions, Data Flow, Autonomous Data Warehouse, and Object Storage.

  • Includes policies and vaults to control access to compartments and resources.

  • Helps data scientists concentrate on methodology and domain expertise to deliver models to production.

Data Science Concepts

Review the following concepts and terms to help you get started with Data Science.

PROJECT

Projects are collaborative workspaces for organizing and documenting Data Science assets, such as notebook sessions and models.

NOTEBOOK SESSION

Data Science notebook sessions are interactive coding environments for building and training models. Notebook sessions come with many pre-installed open source and Oracle-developed machine learning and data science packages.

ACCELERATED DATA SCIENCE SDK

The Oracle Accelerated Data Science (ADS) SDK is a Python library that is included as part of the Oracle Cloud Infrastructure Data Science service. ADS has many functions and objects that automate or simplify the steps in the Data Science workflow, including connecting to data, exploring and visualizing data, training a model with AutoML, evaluating models, and explaining models. In addition, ADS provides a simple interface to the Data Science service model catalog and to other Oracle Cloud Infrastructure services, including Object Storage. To familiarize yourself with ADS, see the Accelerated Data Science Library.

MODEL

Models define a mathematical representation of your data and business process. The model catalog is a place to store, track, share, and manage models.

You should also be familiar with the Oracle Cloud Infrastructure Key Concepts.

Ways to Access Data Science

You access Data Science using the Console, REST API, SDKs, or CLI.

Use any of the following options, based on your preference and its suitability for the task you want to complete:

  • The Oracle Cloud Infrastructure Console is an easy-to-use, browser-based interface. To access the Console, you must use a supported browser.
  • The REST APIs provide the most functionality, but require programming expertise. API reference and endpoints provide endpoint details and links to the available API reference documents including the Data Science REST API.
  • Oracle Cloud Infrastructure provides SDKs that interact with Data Science without the need to create a framework.
  • The CLI provides both quick access and full functionality without the need for programming.
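
As an illustration of the SDK option, the following is a minimal sketch that lists the projects in a compartment with the Oracle Cloud Infrastructure Python SDK. It assumes the oci package is installed and that a valid configuration file exists at the default location (~/.oci/config). The helper names list_project_names and make_client are hypothetical, and only the first page of results is read.

```python
from typing import Any, List


def list_project_names(client: Any, compartment_id: str) -> List[str]:
    """Return the display names of the Data Science projects in a compartment.

    Only the first page of results is read; a real script would also
    follow pagination.
    """
    response = client.list_projects(compartment_id=compartment_id)
    return [project.display_name for project in response.data]


def make_client() -> Any:
    """Build a Data Science client from the default config file (assumed to exist)."""
    import oci  # imported lazily so list_project_names has no hard dependency

    config = oci.config.from_file()
    return oci.data_science.DataScienceClient(config)
```

A typical call would be list_project_names(make_client(), compartment_id), where compartment_id is the OCID of a compartment you are authorized to read.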

Creating Automation Using Events

You can create automation based on state changes for your Oracle Cloud Infrastructure resources by using the Event service types, rules, and actions.

These Data Science resources produce events:

  • Projects

  • Notebook Sessions

  • Models

For how to set up event notifications, with examples, see Data Science event types.
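
To show what an automation target might do with such an event, here is a minimal sketch that filters and summarizes an incoming event message. It assumes the JSON envelope carries an eventType field and a data object with resourceName and resourceId, and that Data Science event types share the prefix com.oraclecloud.datascience. Both the prefix and the sample event type used below are assumptions to check against the Data Science event types reference; summarize_event is a hypothetical helper.

```python
import json
from typing import Optional

# Assumed common prefix of Data Science event types; verify before relying on it.
DATA_SCIENCE_PREFIX = "com.oraclecloud.datascience."


def summarize_event(raw: str) -> Optional[str]:
    """Return a one-line summary of a Data Science event, or None for other events."""
    event = json.loads(raw)
    event_type = event.get("eventType", "")
    if not event_type.startswith(DATA_SCIENCE_PREFIX):
        return None
    data = event.get("data", {})
    action = event_type[len(DATA_SCIENCE_PREFIX):]
    name = data.get("resourceName", "<unnamed>")
    resource_id = data.get("resourceId", "<no id>")
    return f"{action}: {name} ({resource_id})"
```

A function or notification subscriber could call summarize_event on each message body and ignore any message for which it returns None.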

Regions and Availability Domains

Oracle Cloud Infrastructure services are hosted in regions and availability domains. A region is a localized geographic area, and an availability domain is one or more data centers located in that region.

Data Science is hosted in these regions:

  • Australia East (Sydney)

  • Australia Southeast (Melbourne)

  • Brazil East (Sao Paulo)

  • Canada Southeast (Montreal)

  • Canada Southeast (Toronto)

  • Germany Central (Frankfurt)

  • India South (Hyderabad)

  • India West (Mumbai)

  • Japan Central (Osaka)

  • Japan East (Tokyo)

  • Netherlands Northwest (Amsterdam)

  • Saudi Arabia West (Jeddah)

  • South Korea Central (Seoul)

  • South Korea North (Chuncheon)

  • Switzerland North (Zurich)

  • UK South (London)

  • US East (Ashburn)

  • US West (Phoenix)

  • US West (San Jose)

For more information, see Regions and Availability Domains.
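
When working with the REST API or SDKs, a region is usually referred to by its identifier rather than its display name. The sketch below maps the region names listed above to identifiers and builds a service endpoint from them. The identifiers and the endpoint pattern are assumptions to verify against the API reference and endpoints documentation, and data_science_endpoint is a hypothetical helper.

```python
# Region display names from the list above, mapped to assumed region identifiers.
REGION_IDS = {
    "Australia East (Sydney)": "ap-sydney-1",
    "Australia Southeast (Melbourne)": "ap-melbourne-1",
    "Brazil East (Sao Paulo)": "sa-saopaulo-1",
    "Canada Southeast (Montreal)": "ca-montreal-1",
    "Canada Southeast (Toronto)": "ca-toronto-1",
    "Germany Central (Frankfurt)": "eu-frankfurt-1",
    "India South (Hyderabad)": "ap-hyderabad-1",
    "India West (Mumbai)": "ap-mumbai-1",
    "Japan Central (Osaka)": "ap-osaka-1",
    "Japan East (Tokyo)": "ap-tokyo-1",
    "Netherlands Northwest (Amsterdam)": "eu-amsterdam-1",
    "Saudi Arabia West (Jeddah)": "me-jeddah-1",
    "South Korea Central (Seoul)": "ap-seoul-1",
    "South Korea North (Chuncheon)": "ap-chuncheon-1",
    "Switzerland North (Zurich)": "eu-zurich-1",
    "UK South (London)": "uk-london-1",
    "US East (Ashburn)": "us-ashburn-1",
    "US West (Phoenix)": "us-phoenix-1",
    "US West (San Jose)": "us-sanjose-1",
}


def data_science_endpoint(region_name: str) -> str:
    """Build the assumed Data Science endpoint URL for a region display name."""
    region_id = REGION_IDS[region_name]  # raises KeyError for an unlisted region
    return f"https://datascience.{region_id}.oci.oraclecloud.com"
```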

Limits on Data Science Resources

When you sign up for Oracle Cloud Infrastructure, a set of service limits is configured for your tenancy. A service limit is the quota or allowance set on a resource.

For limits on Data Science and other Oracle Cloud Infrastructure services, see Limits by Service.

Note

Failed and inactive notebook sessions and models count against your service limits. A notebook session or model stops counting towards your quota only after you fully terminate the notebook session or delete the model.

Resource Identifiers

Most types of Oracle Cloud Infrastructure resources have an Oracle-assigned unique ID called an Oracle Cloud Identifier (OCID).

The OCID is included as part of the resource's information in both the Console and API. For information about the OCID format and other ways to identify your resources, see Resource Identifiers.
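
As a rough illustration of the OCID format, the sketch below splits an OCID into its dot-separated parts, assuming the documented syntax ocid1.<resource type>.<realm>.[region][.future use].<unique ID>. The parse_ocid helper and the sample values are hypothetical; treat OCIDs as opaque strings in production code unless Resource Identifiers says otherwise.

```python
from typing import NamedTuple


class Ocid(NamedTuple):
    version: str        # for example "ocid1"
    resource_type: str  # kind of resource the OCID identifies
    realm: str          # for example "oc1"
    region: str         # may be empty for resources that are not regional
    unique_id: str      # the unique portion of the identifier


def parse_ocid(ocid: str) -> Ocid:
    """Split an OCID into its parts (a sketch of the assumed syntax, not a validator)."""
    parts = ocid.split(".")
    if len(parts) < 5 or not parts[0].startswith("ocid"):
        raise ValueError(f"not a recognizable OCID: {ocid!r}")
    version, resource_type, realm, region = parts[:4]
    # The unique ID is always the last part; an optional "future use"
    # part may sit between the region and the unique ID.
    return Ocid(version, resource_type, realm, region, parts[-1])
```

For example, parse_ocid applied to a hypothetical value such as ocid1.datascienceproject.oc1.iad.exampleuniqueid yields the resource type datascienceproject and the region iad.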

Authentication and Authorization

Each service in Oracle Cloud Infrastructure integrates with Identity and Access Management for access to cloud resources through all interfaces (the Oracle Cloud Infrastructure Console, SDKs, REST APIs, or the CLI).

An administrator in your organization must set up groups, compartments, and policies that control who can access which services and resources, and the type of access. For example, policies control who can create and manage Data Science projects or launch notebook sessions.

Your administrator can confirm which compartments you should be using.

For general information on policies, see Getting Started with Policies. For specific details about writing policies for each of the services, see Policy Reference. For common policies used to authorize Data Science users, see Common Policies. For in-depth information about policies on granting users permissions for Data Science resources, see Data Science Policies.
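
To make the shape of a policy concrete, the sketch below assembles policy statements of the general form "allow group <group> to <verb> <resource-type> in compartment <compartment>". The group and compartment names are hypothetical, and the resource-type name data-science-family is an assumption to confirm in Data Science Policies. The policy_statement helper is illustrative only; policies are normally written directly in the Console or supplied through the API.

```python
# The four policy verbs, from least to most access.
ALLOWED_VERBS = ("inspect", "read", "use", "manage")


def policy_statement(group: str, verb: str, resource_type: str, compartment: str) -> str:
    """Format a single policy statement, rejecting unknown verbs."""
    if verb not in ALLOWED_VERBS:
        raise ValueError(f"unknown verb: {verb!r}")
    return f"allow group {group} to {verb} {resource_type} in compartment {compartment}"


# Hypothetical group and compartment names; the resource type is an assumption.
example = policy_statement("data-scientists", "manage", "data-science-family", "ds-dev")
```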

Provisioning on the Oracle Cloud Infrastructure

The Data Science service offers a serverless experience for model development and deployment. When you create Data Science resources, such as notebook sessions and models, the underlying compute and storage infrastructure is provisioned and maintained for you.

You pay for the use of the underlying infrastructure (Block Storage, Compute, and Object Storage). You only pay for the infrastructure while you are using it with Data Science resources:

Notebook Sessions
  • Notebook sessions are serverless, and all underlying infrastructure is service-managed.
  • When creating a notebook session, you select the VM shape (the type of machine, which determines the number of OCPUs and GB of RAM) and the amount of block storage (minimum of 50 GB).
  • While a notebook session is active, you pay for Compute and Block Storage at the standard Oracle Cloud Infrastructure rates. See Activating Notebook Sessions.
  • You can deactivate your notebook session, which shuts down the Compute but retains the Block Storage. In this case, you are no longer charged for Compute, but you continue to pay for the Block Storage. You can activate your notebook session to reattach this Block Storage to new Compute. See Deactivating and Activating Notebook Sessions.
  • When you terminate a notebook session, you are no longer charged for Compute or Block Storage. See Terminating Notebook Sessions.
Models
  • When you save a model to the model catalog, you are charged for the storage of the model artifact at the standard Object Storage rates in terms of GB per month.
  • When you delete a model, you are no longer charged. See Deleting Models.
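
The notebook session billing rules above can be sketched as simple arithmetic: Compute accrues only while the session is active, while Block Storage accrues while the session is active or deactivated. The rates and the hourly storage unit below are hypothetical placeholders, not Oracle prices, and notebook_session_cost is an illustrative helper.

```python
def notebook_session_cost(
    active_hours: float,
    inactive_hours: float,
    ocpus: int,
    compute_rate_per_ocpu_hour: float,  # hypothetical rate
    storage_gb: float,
    storage_rate_per_gb_hour: float,    # hypothetical rate and unit
) -> float:
    """Estimate the cost of a notebook session before it is terminated."""
    # Compute is charged only while the session is active.
    compute = active_hours * ocpus * compute_rate_per_ocpu_hour
    # Block Storage is retained, and charged, while active or deactivated.
    storage = (active_hours + inactive_hours) * storage_gb * storage_rate_per_gb_hour
    return compute + storage
```

With the made-up inputs of 10 active hours, 20 deactivated hours, 2 OCPUs at 0.05 per OCPU-hour, and 50 GB at 0.0001 per GB-hour, Compute contributes about 1.00 and Block Storage about 0.15, for a total of roughly 1.15 in the assumed currency units.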

Tip

You can use Checking Your Balance and Usage to review the costs associated with your account. Also, you can use the Oracle Cloud Infrastructure Billing and Payment Tools to analyze your Data Science usage and manage your costs.