Getting Started with Data Integration

Before you create a Data Integration workspace, review the prerequisites and list of tasks that you, the customer, are responsible for.

Customer Responsibility Checklist

You must have the following resources and minimum policies in your tenancy. If you don't have the proper rights, have your administrator create them for you.

Before You Begin

Before you start setting up the Data Integration service for use, you must have:

  • An Oracle Cloud Infrastructure account with administrator privileges
  • Access to the Data Integration service

List of Customer Tasks

This section summarizes the responsibilities of Data Integration customers before setting up and using Data Integration for the first time.

Task Description

Create Oracle Cloud Infrastructure resources for your Data Integration activities

In Oracle Cloud Infrastructure Identity and Access Management (IAM), create your compartments, users, and groups of users.

Configure networking components for your data sources

You can set up virtual cloud networks (VCNs) and subnets in Oracle Cloud Infrastructure Networking for Data Integration. Only regional subnets are supported, and DNS hostnames must be used in the subnets. Depending on the location of your data sources, you might have to create other network objects such as service gateways, network security groups, and Network Address Translation (NAT) gateways.

For data sources in a private network, create a VCN with at least one regional subnet.

Create policies to access and use Data Integration

In Oracle Cloud Infrastructure Identity and Access Management (IAM), create the required policies that give groups of users proper access to Data Integration resources.

Data Integration must also have permission to manage the virtual networks and subnets that you set up for integration.

For reference and examples, see Data Integration Policies, and also ensure that you understand the relationship between permissions and verbs.

Create a workspace

When you create a workspace in Data Integration, you can enable the private network that you have set up.

After creating a workspace, you can refer to Typical Data Integration User Activities as a guide.

See also Data Security.

Shared Responsibilities Checklist

Learn how control plane and data plane management tasks for Data Integration are shared between Oracle and you, the customer.

Generally speaking, the control plane is responsible for provisioning OCI resources and managing metadata operations to get, create, update, and delete Data Integration workspaces. The data plane is responsible for design time and runtime operations related to data assets, data flows, pipelines, tasks, and applications in Data Integration.

Task Who Description

Workspace resources provisioning Oracle and Customer

Oracle is responsible for provisioning Oracle Cloud Infrastructure resources for Data Integration workspaces, including compute instances and their connectivity to your subnet (if provided) via a secondary VNIC.

You, the customer, are responsible for:

  • Setting up the infrastructure resources beforehand, such as creating a compartment and networking resources.
  • Creating the Data Integration workspaces that you need by specifying the appropriate configuration characteristics.

For the list of customer responsibilities to set up the Data Integration service before first use, see Customer Responsibility Checklist.

Backup and recovery of workspaces and applications Oracle and Customer

Oracle continuously backs up content, solely for disaster recovery of Data Integration service resource metadata and for the operation of the service. These backups include customer workspace backups, but they are not made available to customers.

You, the customer, are responsible for making backups of your application data, by copying your applications to the same workspace, another workspace, or another compartment. This is especially important for cross-region disaster recovery.

Service patching and upgrading Oracle

Oracle is responsible for patching and upgrading the Data Integration service and its agent components.

Scaling Oracle

Oracle is responsible for scaling the control and data planes.

You, the customer, can request scaling the OCI resources in the data plane for agent computation.

Health monitoring Oracle and Customer

Oracle is responsible for monitoring the health of workspace resources and for ensuring their availability.

You, the customer, are responsible for monitoring the health and performance of your tasks and applications at all levels, including the availability of dependent resources that are referenced in the data plane during task runs.

Application security Oracle and Customer

Oracle ensures that data stored in OCI is encrypted and ensures that connections to Data Integration require SSL encryption.

You, the customer, are responsible for the security of your applications at all levels. This responsibility includes access to workspace resources, network access to those resources, and access to dependent data.

Auditing Oracle and Customer

Oracle is responsible for logging REST API calls that are made to workspace resources and for making those logs available to you for auditing purposes.

You, the customer, are responsible for configuring your access to audit logs in the audit log service, and using the logs to audit usage and monitor activity within your tenancy.

Alerts and notifications Oracle and Customer

Oracle provides service events and notifications.

You, the customer, are responsible for configuring alerts and notifications for service events and for monitoring alerts that might be of interest.

Creating Resources

To create resources for Data Integration activities:

  1. Create a compartment in the tenancy for Data Integration activities.

    For more information, see Working with Compartments.

  2. If your data sources are in a private network, create a VCN with at least one subnet in the compartment.
    Note

    The VCN and subnet you create here are the ones you select when you create a workspace. The subnet must be regional, spanning all availability domains.

    If you don't see your subnet listed, go back and check that it was created as a regional subnet.

    For more information, see VCNs and Subnets.

  3. Create a group for users in charge of workspaces, and then add users to the group.

    Take note of the group name. You create policies for the group in the next section. For more information, see Managing Groups.

Creating Policies

To control non-administrator user access to Data Integration resources and functions, you create groups in Oracle Cloud Infrastructure Identity and Access Management (IAM). Then you write IAM policies that give the groups proper access.

You can use Data Integration policy templates in the IAM Policy Builder to create a policy, or you can manually enter the policy statements in the manual editor. See Writing Policy Statements with the Policy Builder for information about how to use the Policy Builder and policy templates.

To understand the syntax used in writing a policy statement, see Policy Syntax. Ensure that you understand the relationship between permissions and verbs.

You can create most of the Data Integration policies at the tenancy level or at the compartment level. The policies listed here are examples, which you can modify to suit your access needs.

For more examples and reference, see Data Integration Policies.

Note

After you add IAM components (for example, dynamic groups and policy statements), don't try to perform the associated tasks immediately. New IAM policies require about 5 to 10 minutes to take effect.

For Workspaces

To create and use workspaces
Create workspaces

This policy gives permission to a group to create Data Integration workspaces.

allow group <group-name> to manage dis-workspaces in compartment <compartment-name>

Users with the inspect permission can only list dis-workspaces. Users with the manage permission for dis-workspaces can create and delete workspaces. Users with the use permission can only perform integration activities within workspaces. View more examples to create a policy specific to your requirements.
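As a rough illustration of how the verb changes what the statement above grants, the following Python sketch renders the workspace policy for a chosen verb. The function `workspace_policy` and the `VERB_SUMMARY` table are hypothetical helpers that paraphrase the paragraph above; they are not part of any Oracle tool. The `read` verb is listed only because it is one of the four standard IAM verbs, and this page does not describe it for workspaces.

```python
# Hypothetical helper: renders the dis-workspaces statement shown above.
# Summaries paraphrase this page; "read" is not described here.
VERB_SUMMARY = {
    "inspect": "list workspaces only",
    "read": None,
    "use": "perform integration activities within workspaces",
    "manage": "create and delete workspaces",
}


def workspace_policy(group: str, compartment: str, verb: str = "manage") -> str:
    """Return an IAM statement granting `verb` on dis-workspaces."""
    if verb not in VERB_SUMMARY:
        raise ValueError(f"unknown IAM verb: {verb!r}")
    return f"allow group {group} to {verb} dis-workspaces in compartment {compartment}"


if __name__ == "__main__":
    # The group and compartment names below are placeholders.
    for verb, summary in VERB_SUMMARY.items():
        statement = workspace_policy("di-admins", "di-compartment", verb)
        print(statement, "#", summary or "not covered on this page")
```

Rejecting unknown verbs early keeps a typo from producing a statement that the policy editor would refuse later.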

Check workspace creation status

This policy gives permission to a group to check the status while creating a workspace.

allow group <group-name> to manage dis-work-requests in compartment <compartment-name>

View user names

This policy gives Data Integration access to list users' names in the Created by field when they create projects, data assets, and applications in the workspace.

allow service dataintegration to inspect users in tenancy

Restrict group to a single workspace

After creating workspaces, you can allow a specific group to manage a specific workspace and not any other workspace:

allow group <group-name> to manage dis-workspaces in compartment <compartment-name> where target.workspace.id = '<workspace-ocid>'

Move compartments

This policy gives Data Integration the access it needs to move a workspace from one compartment to a target compartment.

allow service dataintegration to inspect compartments in compartment <target-compartment-name>

Move workspaces

This policy gives permission to a group to move Data Integration workspaces.

allow group <group-name> to manage dis-workspaces in compartment <source-compartment-name>
allow group <group-name> to manage dis-workspaces in compartment <target-compartment-name>

Tags

This policy gives permission to a group to manage tag-namespaces and tags in Data Integration workspaces.

allow group <group-name> to manage tag-namespaces in compartment <compartment-name>

To add a defined tag, you must have permission to use the tag namespace. To learn more about tagging, see Resource Tags.

Search

These policies give Data Integration access to search within workspaces in your tenancy.

allow service dataintegration to {TENANCY_INSPECT} in tenancy
allow service dataintegration to {DIS_METADATA_INSPECT} in tenancy
Calculate subnet size

To check whether the subnet has enough IP addresses to allocate when you create a workspace with a private network enabled, add the following policy:

allow group <group-name> to inspect instance-family in compartment <compartment-name>

To restrict the permission to a specific API call, add the following policy:

allow group <group-name> to inspect instance-family in compartment <compartment-name> where ALL {request.operation = 'ListVnicAttachments'}
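To make the idea of a subnet size check concrete, here is a minimal Python sketch of the arithmetic involved. It is not the check that Data Integration performs. It assumes that OCI reserves three addresses in every subnet (the first two and the last one), that the number of addresses already in use is known (for example, from listing VNIC attachments), and that `required` is a hypothetical input, because this page does not state how many addresses a workspace needs.

```python
import ipaddress

# Assumption: OCI reserves the first two addresses and the last one per subnet.
OCI_RESERVED_PER_SUBNET = 3


def free_ip_count(subnet_cidr: str, addresses_in_use: int) -> int:
    """Return how many private IP addresses remain unallocated in the subnet."""
    network = ipaddress.ip_network(subnet_cidr, strict=True)
    usable = network.num_addresses - OCI_RESERVED_PER_SUBNET
    return max(usable - addresses_in_use, 0)


def has_capacity(subnet_cidr: str, addresses_in_use: int, required: int) -> bool:
    """`required` is a hypothetical value supplied by the caller."""
    return free_ip_count(subnet_cidr, addresses_in_use) >= required


if __name__ == "__main__":
    # A /28 subnet holds 16 addresses, of which 13 are usable.
    print(free_ip_count("10.0.0.0/28", addresses_in_use=10))
```

A small subnet such as a /28 can run out of addresses quickly, so it can be worth running this kind of estimate before selecting the subnet for a workspace.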
To enable private network

Data Integration can be in a different tenancy from your resources. To run a task, Data Integration sends a request to your tenancy. In return, you must give Data Integration permission to manage the virtual networks that you have set up for integration. Create Data Integration workspaces in the same region as your network and securely access your networks through private IP addresses. Without a policy to accept this request, data integration fails.

allow service dataintegration to use virtual-network-family in compartment <compartment-name>

The following policy gives permission to a group to manage networking resources in the compartment.

allow group <group-name> to manage virtual-network-family in compartment <compartment-name>

Or, for non-admin users:

allow group <group-name> to use virtual-network-family in compartment <compartment-name>
allow group <group-name> to inspect instance-family in compartment <compartment-name>

To limit user activities within the network, assign the inspect permission for VCNs and subnets within your compartment instead of manage. Users can then view existing VCNs and subnets and select them when creating a workspace. View more examples to create a policy specific to your requirements.
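The private network statements above fall into two sets: one for administrators and one for non-admin users, both alongside the service statement. The following Python sketch collects the right set for a group. The function name `private_network_policies` and its `admin` flag are hypothetical; the statements themselves are copied from this section.

```python
from typing import List


def private_network_policies(group: str, compartment: str, admin: bool = False) -> List[str]:
    """Return the private network statements for one group (illustrative only)."""
    # The service statement is needed in both cases.
    statements = [
        f"allow service dataintegration to use virtual-network-family in compartment {compartment}"
    ]
    if admin:
        statements.append(
            f"allow group {group} to manage virtual-network-family in compartment {compartment}"
        )
    else:
        statements.append(
            f"allow group {group} to use virtual-network-family in compartment {compartment}"
        )
        statements.append(
            f"allow group {group} to inspect instance-family in compartment {compartment}"
        )
    return statements


if __name__ == "__main__":
    # Placeholder names; substitute your own group and compartment.
    for line in private_network_policies("di-users", "di-compartment"):
        print(line)
```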

For Data Assets

Object Storage

Create these policies to allow Data Integration to access Object Storage resources, such as objects and buckets.

allow group <group-name> to use object-family in compartment <compartment-name>
allow any-user to use buckets in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<workspace-ocid>'}
allow any-user to manage objects in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<workspace-ocid>'}
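The any-user statements on this page share one shape: a verb, a resource type, a compartment, and a condition that limits the grant to a single workspace acting as the request principal. A minimal Python sketch of that shape follows. The function `workspace_principal_statement` and its parameters are hypothetical; the optional `permission` argument covers statements elsewhere on this page that add a `request.permission` condition.

```python
from typing import Optional


def workspace_principal_statement(
    verb: str,
    resource: str,
    compartment: str,
    workspace_ocid: str,
    permission: Optional[str] = None,
) -> str:
    """Build an any-user statement scoped to one workspace (illustrative only)."""
    conditions = [
        "request.principal.type = 'disworkspace'",
        f"request.principal.id = '{workspace_ocid}'",
    ]
    if permission is not None:
        conditions.append(f"request.permission = '{permission}'")
    clause = ", ".join(conditions)
    return (
        f"allow any-user to {verb} {resource} in compartment {compartment} "
        "where ALL {" + clause + "}"
    )


if __name__ == "__main__":
    print(workspace_principal_statement("use", "buckets", "<compartment-name>", "<workspace-ocid>"))
```

Without the `request.principal.id` condition, the statement would apply to every workspace in scope, so the sketch always includes it.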

If your Data Integration workspace and Object Storage data source are in different tenancies, then you must also create the following policies for compartments:

In the workspace tenancy:


Endorse any-user to inspect compartments in tenancy <tenancy-name> where ALL {request.principal.type = 'disworkspace'}

In the Object Storage tenancy:


Admit any-user of tenancy <tenancy-name> to inspect compartments in tenancy

Note

Different types of policies (resource principal and on behalf of) are required for using Object Storage. The policies required also depend on whether the Object Storage instance and Data Integration instance are in the same tenancy or different tenancies, and whether you create the policies at the compartment level or tenancy level. Review more examples and this blog to identify the right policies for your needs.

Fusion Applications

Create these policies to allow Data Integration to access buckets and objects in Oracle Cloud Infrastructure Object Storage. The policies are required for staging extracted data, which needs pre-authentication to complete the operations.

allow group <group-name> to use object-family in compartment <compartment-name>
allow any-user to use buckets in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<workspace-ocid>'}
allow any-user to manage objects in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<workspace-ocid>'}
allow any-user to manage buckets in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<workspace-ocid>', request.permission = 'PAR_MANAGE'}

Note

Different types of policies (resource principal and on behalf of) are required for Object Storage. Policies required also depend on whether the Object Storage instance and Data Integration instance are in the same tenancy or different tenancies, and whether you create the policies at the compartment level or tenancy level. Review more examples and this blog to identify the right policies for your needs.

OCI Vault

Create this policy if you want to use secrets in OCI Vault for sensitive information.

allow any-user to read secret-bundles in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<workspace-ocid>'}

The following policy enables a group of users who are not administrators to use secrets with Oracle Autonomous Data Warehouse and Oracle Autonomous Transaction Processing:

allow group <group-name> to read secret-bundles in compartment <compartment-name>

Autonomous Databases

Create this policy if you use an autonomous database as a target. Autonomous databases use Object Storage for staging data and need pre-authentication to complete operations.

allow any-user to manage buckets in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<workspace-ocid>', request.permission = 'PAR_MANAGE'}

Create this policy if you want the autonomous database credentials to be retrieved automatically while creating an autonomous database data asset.

allow group <group-name> to read autonomous-database-family in compartment <compartment-name>

For Publishes

To publish tasks to OCI Data Flow

Create these policies if you want to publish tasks from Data Integration to the OCI Data Flow service.

allow any-user to manage dataflow-application in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<workspace-ocid>'}
allow any-user to read dataflow-private-endpoint in compartment <compartment-name> where ALL {request.principal.type = 'disworkspace', request.principal.id = '<workspace-ocid>'}
allow group <group-name> to read dataflow-application in compartment <compartment-name>
allow group <group-name> to manage dataflow-run in compartment <compartment-name>

For non-administrator users to publish to OCI Data Flow using a private endpoint, this policy is required to show private endpoints:

allow group <group-name> to inspect dataflow-private-endpoint in compartment <compartment-name>

Creating a Workspace

Before you can get started with Data Integration, you or your administrator must first create a workspace for your data integration projects.

Create a workspace after the connectivity requirements for Data Integration are satisfied. See Creating Resources.

Ensure that you also have the required policies for creating workspaces, as described in Creating Policies. For example, if you're creating a workspace using virtual cloud network (VCN) resources, you must allow Data Integration access to your VCN in the compartment.

    1. Open the navigation menu and click Analytics & AI. Under Data Lake, click Data Integration.
    2. On the Data Integration service page, click Workspaces.
    3. On the Workspaces page, select the compartment to create the workspace in, and then click Create workspace.
    4. In the Create workspace panel, enter a name and an optional description for the workspace.
    5. In the Network selection section, select Enable private network to use a private network to connect to your data sources.
    6. If you choose to use a private network, provide the following values:
      • Choose a VCN in <Compartment_Name>: Select the VCN for your data integrations.
      • Choose a Subnet in <Compartment_Name>: Select the subnet in the selected VCN for your data integrations.
      • DNS server IP: (Optional) Enter the domain name system (DNS) server IP address of your server.
      • DNS server zone: (Optional) If you entered a DNS server IP address, enter the DNS zone of your server.

      After a workspace is created, you can't disable the private network connection, or change the compartment, VCN, or subnet selections.

    7. (Optional) In the Tags section, add tags to help you search for Data Integration resources within the tenancy.

      For information about tags, see Managing Tags and Tag Namespaces.

    8. Choose one of the following options:
      • To create the workspace, click Create.

        Note

        If you haven't added the required policies, workspace creation fails. In the Unauthorized access information box that appears, click Manage policies to view the details of the required policy statements. Specify the correct group name and compartment in the statements. If you're an administrator, you can add the policies by clicking Add policies. If you're not an administrator, click Copy policies and then send them to an administrator to add.

        You're returned to the Workspaces page. It might take a few minutes before the workspace is ready for you to access. When the status is Active, you can select the workspace from the list.

        For information about navigating and searching in a workspace, see Navigating a Workspace.

      • To create the workspace later using Resource Manager and Terraform, click Save as stack to save the resource definition as a Terraform configuration.

        For more information about saving stacks from resource definitions, see Creating a Stack from a Resource Creation Page.

    Use the workspace to create design-time artifacts such as data assets, data flows, and tasks in one or more projects or folders. For information about using projects in a workspace, see Using Projects and Folders.

  • Use the oci data-integration workspace create command and required parameters to create a workspace:

    oci data-integration workspace create [OPTIONS]

    For a complete list of flags and variable options for CLI commands, see the Command Line Reference.

  • Run the CreateWorkspace operation to create a workspace.
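The following Python sketch shows one way to prepare the inputs for either route. It validates the values described in the Console steps, builds a request body, and translates that body into an argument list for the CLI. The field names (assumed to match the CreateWorkspaceDetails model) and the flag names are assumptions that this page does not confirm; verify them against the API reference and the Command Line Reference. The sketch only builds data and does not call the service.

```python
import ipaddress
import shlex
from typing import Any, Dict, List, Optional


def create_workspace_body(
    display_name: str,
    compartment_id: str,
    vcn_id: Optional[str] = None,
    subnet_id: Optional[str] = None,
    dns_server_ip: Optional[str] = None,
    dns_server_zone: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate the workspace inputs and build a request body (assumed field names)."""
    private = vcn_id is not None or subnet_id is not None
    if private and not (vcn_id and subnet_id):
        raise ValueError("a private network needs both a VCN and a subnet")
    if (dns_server_ip or dns_server_zone) and not private:
        raise ValueError("DNS settings apply only to a private network")
    if dns_server_zone and not dns_server_ip:
        raise ValueError("a DNS zone requires a DNS server IP")
    if dns_server_ip:
        ipaddress.ip_address(dns_server_ip)  # raises ValueError if malformed

    body: Dict[str, Any] = {
        "displayName": display_name,
        "compartmentId": compartment_id,
        "isPrivateNetworkEnabled": private,
    }
    optional = {
        "vcnId": vcn_id,
        "subnetId": subnet_id,
        "dnsServerIp": dns_server_ip,
        "dnsServerZone": dns_server_zone,
    }
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


# Assumed mapping from body fields to CLI flags; confirm with --help.
CLI_FLAGS = {
    "displayName": "--display-name",
    "compartmentId": "--compartment-id",
    "isPrivateNetworkEnabled": "--is-private-network-enabled",
    "vcnId": "--vcn-id",
    "subnetId": "--subnet-id",
    "dnsServerIp": "--dns-server-ip",
    "dnsServerZone": "--dns-server-zone",
}


def create_workspace_command(body: Dict[str, Any]) -> List[str]:
    """Translate the request body into an argument list for the OCI CLI."""
    args = ["oci", "data-integration", "workspace", "create"]
    for field, value in body.items():
        text = str(value).lower() if isinstance(value, bool) else str(value)
        args += [CLI_FLAGS[field], text]
    return args


if __name__ == "__main__":
    # Placeholder values; substitute a real compartment OCID.
    body = create_workspace_body("my-workspace", "<compartment-ocid>")
    print(shlex.join(create_workspace_command(body)))
```

Because the compartment, VCN, and subnet can't be changed after the workspace is created, validating them before submitting the request avoids having to delete and re-create the workspace.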

Components in a Design

After creating data assets for the source and target data systems, you create the data integration processes for extracting, loading, and transforming data.

In Data Integration, to ingest and transform data, you create data loader tasks, data flows, integration tasks, and other tasks. To orchestrate a set of tasks in a sequence or in parallel, you create pipelines and pipeline tasks. You can use the following tasks as a guideline.

Task Description

Create a data loader task

Create a data loader task from the Tasks section of a project or folder details page. A data loader task takes data from a source, transforms the data, then loads the data into a target.

Create a data flow

Create a data flow from the Data Flows section of a project or folder details page.

Add operators

In the data flow designer, build the logical flow of data from your source data assets to your target data assets. Add data operators to specify the source and target data sources. Add shaping operators such as filter and join to cleanse, transform, and enrich data.

Add user-defined functions

Create and use custom functions.

Apply transformations

In the Data tab of an operator in the data flow designer, apply transformations to aggregate, cleanse, and shape data.

Assign parameters

In the Details tab of an operator in the data flow designer, assign parameters to externalize and override values. By using parameters, different configurations of your sources, targets, and transformations can be reused at design time and runtime.

Create an integration task

After completing a data flow design, from the Tasks section of a project or folder details page, create an integration task that uses the data flow. Wrapping the data flow in an integration task lets you run the data flow, and you can choose the parameter values you want to use at runtime.

Create other tasks

If needed, you can create other types of tasks from the Tasks section of a project or folder details page.

Create a pipeline

Create a pipeline from the Pipelines section of a project or folder details page. In the pipeline designer, use operators to add the tasks and activities you want to orchestrate as a set of processes in a sequence or in parallel. You can also use parameters to override values at design time and runtime.

Create a pipeline task

After completing a pipeline design, from the Tasks section of a project or folder details page, create a pipeline task that uses the pipeline. Wrapping the pipeline in a pipeline task lets you run the pipeline, and you can choose the parameter values you want to use at runtime.
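To illustrate the difference between orchestrating tasks in a sequence and in parallel, here is a small, generic Python sketch. It does not use Data Integration and is not how the service runs pipelines. The task names and helper functions are hypothetical stand-ins for tasks that you would add to a pipeline in the pipeline designer.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

Task = Callable[[], str]


def run_in_sequence(tasks: List[Task]) -> List[str]:
    """Run each task only after the previous one has finished."""
    return [task() for task in tasks]


def run_in_parallel(tasks: List[Task]) -> List[str]:
    """Start all tasks together and collect results in submission order."""
    with ThreadPoolExecutor(max_workers=max(len(tasks), 1)) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]


def make_task(name: str) -> Task:
    """Create a stand-in task that reports its own completion."""
    return lambda: f"{name} done"


if __name__ == "__main__":
    # Hypothetical pipeline: one load step, then two independent steps.
    results = run_in_sequence([make_task("load")])
    results += run_in_parallel([make_task("transform_a"), make_task("transform_b")])
    print(results)
```

Steps that depend on each other belong in the sequence, while independent steps can share the parallel stage to shorten the overall run.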