Oracle Cloud Infrastructure Documentation

Masking Formats

A masking format defines the logic to mask data in a database column. For example, the Shuffle masking format randomly shuffles values in a column, and the Email Address masking format replaces values with random email addresses. Oracle Data Safe provides a comprehensive set of masking formats that enables you to mask common sensitive and personal data, such as names, national identifiers, credit card numbers, phone numbers, and religion. To meet your specific requirements, you can easily create new masking formats by using basic masking formats, without requiring any technical skills. You can store these user-defined masking formats in the Oracle Data Safe Library for future use.

One of the key aspects of data masking is to replace the sensitive information with fictitious data, without breaking the semantics and structure of the data. The masked data must be realistic and pass specific checks, such as Luhn validation. For example, a masked credit card number must not only be a valid credit card number, but also a valid Visa, Mastercard, American Express, or Discover card number. Failing to maintain this data integrity may break the corresponding application. The predefined masking formats ensure that the generated data passes common validation checks.
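To make the Luhn requirement concrete, the following is a minimal sketch of the kind of check a masked credit card number has to pass, and of one way to generate a replacement that passes it. It is not Oracle Data Safe's implementation. The function names, the choice to preserve the first six digits (the issuer prefix), and the seeded random generator are all assumptions made for illustration.

```python
import random


def luhn_is_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def luhn_check_digit(payload: str) -> int:
    """Compute the check digit that makes `payload` + digit Luhn-valid."""
    total = 0
    for index, ch in enumerate(reversed(payload)):
        digit = int(ch)
        # The rightmost payload digit sits next to the check digit,
        # so it is the first one to be doubled.
        if index % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def mask_card_number(original: str, rng: random.Random) -> str:
    """Hypothetical masking routine: keep the issuer prefix, randomize
    the remaining digits, and recompute the Luhn check digit."""
    digits = "".join(ch for ch in original if ch.isdigit())
    prefix = digits[:6]                # assumption: preserve issuer prefix
    middle_length = len(digits) - 7    # everything except prefix and check digit
    middle = "".join(str(rng.randint(0, 9)) for _ in range(middle_length))
    payload = prefix + middle
    return payload + str(luhn_check_digit(payload))


masked = mask_card_number("4111 1111 1111 1111", random.Random(42))
print(masked, luhn_is_valid(masked))
```

Because the prefix and length are preserved and the check digit is recomputed, the masked value keeps the card type and still passes a Luhn check, which is the property an application's validation typically depends on.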

The following are common data masking terms:

  • Combinable

    You can combine multiple masking formats within conditions.

    For example, assume that you want to mask a column containing data in the format 999-999, where 9 signifies a digit. You want to replace the first three digits with a fixed three-digit number, preserve the hyphen, and replace the last three digits with random digits. To generate the expected data, you could combine three basic masking formats: Fixed Number, Fixed String, and Random Number. The outputs of these three masking formats are concatenated to generate the masked values, for example, 678-333, 678-110, 678-656, and 678-999.

  • Uniqueness

    Sometimes you may want to ensure that the generated masked data in a column is unique (no two rows have the same value). Some masking formats ensure uniqueness of the generated data. This is useful for masking columns that have a uniqueness constraint.

  • Reversible

    Reversible masking techniques let you retrieve the original column data from the masked data. Data masking usually means permanently replacing the data and ensuring that no one can retrieve the original data. Sometimes, however, you might need to recover the original data. Reversible masking is helpful when businesses need to mask their data and send it to a third party for analysis, reporting, or any other business processing purpose. After the processed data is received from the third party, the original data can be recovered.

    See Also: Deterministic Encryption, which supports reversible masking.

  • Deterministic

    One of the key requirements for masking data in large databases or multiple database environments is to mask some data consistently. That is, for a given input, the output should always be the same. At the same time, the masked output should not be predictable. Deterministic masking generates consistent output for a given input across databases and data masking jobs. It is helpful in maintaining data integrity across multiple applications and preserving system integrity in a single sign-on environment.

    For example, consider three applications: a human capital management application, a customer relationship management application, and a sales data warehouse. These three applications may have key common fields such as EMPLOYEE_ID that must be masked consistently across these applications. Deterministic masking techniques can be used here to ensure consistency.

    Consider another example. Suppose that two values, Joe and Tom, are masked to Henry and Peter by using a deterministic masking technique. When you repeat the technique on another database, Bob and Tom (if they exist) might be replaced with Louise and Peter. Notice that even though the two runs operate on different data, Tom is always replaced with Peter.

    The Deterministic Encryption, Deterministic Substitution, SQL Expression, and User Defined Function masking formats support deterministic masking.
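Two of the terms above, Combinable and Deterministic, can be illustrated with a short sketch. This is not how Oracle Data Safe implements its masking formats. The function names, the substitute list, and the use of a keyed HMAC to pick a substitute are assumptions chosen only to show the behavior described: concatenating the outputs of basic formats, and producing the same output for the same input across runs.

```python
import hashlib
import hmac
import random


def combine_formats(rng: random.Random) -> str:
    """Combinable: concatenate the outputs of three basic formats
    to mask a value of the form 999-999."""
    fixed_number = "678"                            # Fixed Number
    fixed_string = "-"                              # Fixed String (keeps the hyphen)
    random_number = f"{rng.randint(0, 999):03d}"    # Random Number, zero-padded
    return fixed_number + fixed_string + random_number


# Hypothetical substitution list; a real job would use a much larger one.
SUBSTITUTES = ["Henry", "Peter", "Louise", "Maria", "Omar"]


def deterministic_substitute(value: str, key: bytes) -> str:
    """Deterministic: the same input and key always select the same
    substitute, but the mapping is not predictable without the key."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(SUBSTITUTES)
    return SUBSTITUTES[index]


rng = random.Random()
print([combine_formats(rng) for _ in range(4)])

key = b"example-masking-key"  # assumption: one secret shared by all masking jobs
# Two separate "databases" that both contain Tom.
first_run = {name: deterministic_substitute(name, key) for name in ["Joe", "Tom"]}
second_run = {name: deterministic_substitute(name, key) for name in ["Bob", "Tom"]}
print(first_run["Tom"] == second_run["Tom"])
```

Note that this substitution sketch is deterministic but not unique: with a short substitute list, two different inputs can map to the same output. A column with a uniqueness constraint would need a format that also guarantees uniqueness.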