Oracle Cloud Infrastructure Documentation

Deterministic Substitution

Purpose

The Deterministic Substitution masking format uses the specified substitution column as the source of masked values. It performs hash-based substitution to replace the original data in a column with values from the substitution column.

Inputs

  • Schema Name: The name of the schema containing the substitution column.
  • Table Name: The name of the table containing the substitution column.
  • Column Name: The name of the substitution column that contains the values to use for masking. The substitution column and the column being masked must have the same data type. The substitution column must exist and be accessible on the target database before masking begins; you can use a pre-masking script to create it.
  • Seed Value: The seed that Deterministic Substitution uses to perform hash-based substitution. Provide the seed value when you submit a data masking job. It can be any string of alphanumeric characters. To get the same masked values across multiple masking runs, use the same seed value in every run.
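
The role of the seed can be illustrated with a minimal Python sketch. This is not the algorithm that Data Masking uses internally, which is not described here; it only shows the general idea of a keyed hash selecting a substitution value. The function name pick_substitute, the sample values, and the choice of HMAC-SHA256 are all assumptions made for illustration.

```python
import hashlib
import hmac
from typing import Sequence


def pick_substitute(value: str, seed: str, substitutes: Sequence[str]) -> str:
    """Map an original value to one substitution value, keyed by the seed.

    Hypothetical helper for illustration only; not the product's algorithm.
    """
    # Keyed hash: the seed is the key, the original value is the message.
    digest = hmac.new(seed.encode("utf-8"), value.encode("utf-8"), hashlib.sha256).digest()
    # Reduce the digest to an index into the substitution values.
    index = int.from_bytes(digest, "big") % len(substitutes)
    return substitutes[index]


# Hypothetical substitution values; their order must stay stable between runs.
substitutes = ["E9001", "E9002", "E9003", "E9004", "E9005"]

# Two runs with the same seed and the same substitution values agree.
first_run = pick_substitute("1001", "seedA1", substitutes)
second_run = pick_substitute("1001", "seedA1", substitutes)
print(first_run == second_run)
```

In this sketch, the result depends only on the original value, the seed, and the list of substitution values, so changing any of the three can change the output. A plain modulo reduction like this one can map two different originals to the same substitute, so the sketch does not reproduce the uniqueness behavior of the masking format.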

Supported Data Types

  • Character
  • Numeric
  • Date

Characteristics

  • Combinable: No
  • Deterministic: Yes, as long as the values in the substitution column do not change and you provide the same seed value
  • Reversible: No
  • Uniqueness: Yes. The number of distinct values in the substitution column must be greater than or equal to the number of distinct values in the column to be masked.
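
The distinct-value requirement can be checked before you submit a masking job. The following Python sketch uses an in-memory SQLite database as a stand-in for the target database so that it runs anywhere; against a real target you would run the equivalent COUNT(DISTINCT ...) queries through your own database connection. The helper name has_enough_substitutes, the EMPLOYEES table name, and the sample rows are assumptions made for illustration.

```python
import sqlite3


def has_enough_substitutes(conn, masked_table, masked_column, sub_table, sub_column):
    """Return True when the substitution column has at least as many
    distinct values as the column to be masked.

    Hypothetical helper; identifiers are interpolated directly into the
    SQL text, so pass trusted names only.
    """
    def distinct_count(table, column):
        query = f'SELECT COUNT(DISTINCT "{column}") FROM "{table}"'
        return conn.execute(query).fetchone()[0]

    return distinct_count(sub_table, sub_column) >= distinct_count(masked_table, masked_column)


# SQLite stands in for the target database in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (EMP_ID TEXT)")
conn.execute("CREATE TABLE SUB_EMPLOYEES (SUB_EMP_ID TEXT)")
conn.executemany("INSERT INTO EMPLOYEES VALUES (?)",
                 [("1001",), ("1002",), ("1002",), ("1003",)])
conn.executemany("INSERT INTO SUB_EMPLOYEES VALUES (?)",
                 [("E9001",), ("E9002",)])

# Three distinct values to mask, but only two distinct substitutes.
too_few = has_enough_substitutes(conn, "EMPLOYEES", "EMP_ID", "SUB_EMPLOYEES", "SUB_EMP_ID")

# After adding two more substitutes there are four, which is enough.
conn.executemany("INSERT INTO SUB_EMPLOYEES VALUES (?)", [("E9003",), ("E9004",)])
enough = has_enough_substitutes(conn, "EMPLOYEES", "EMP_ID", "SUB_EMPLOYEES", "SUB_EMP_ID")

print(too_few, enough)
```

COUNT(DISTINCT ...) ignores NULL values, so this check compares only the non-null distinct values in each column.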

Example

Suppose you discover a sensitive column named EMP_ID that contains employee IDs. Assume that you have fake employee ID values stored in another column named SUB_EMP_ID, which resides in the SUB_EMPLOYEES table in the SUB_HR schema. When configuring the masking policy in the Data Masking wizard, you choose the Deterministic Substitution masking format for the EMP_ID column and provide SUB_HR as the schema name, SUB_EMPLOYEES as the table name, and SUB_EMP_ID as the column name.

You also specify a seed value at job submission time. When the job runs, Data Masking replaces the values in the EMP_ID column with the fake values from the SUB_EMP_ID column. In the future, you can mask this column (or other similar columns) using the same substitution column and seed value to ensure that the employee IDs are masked the same way.