Oracle Cloud Infrastructure Documentation

User-Defined Sensitive Types

A user-defined sensitive type is a sensitive type created by your own organization. Although Oracle Data Safe provides an extensive set of predefined sensitive types, you might want to create sensitive types to meet your specific requirements. You can also create new sensitive categories and arrange your sensitive types under them. You can also place a user-defined sensitive type under a predefined sensitive category. For a user-defined sensitive type, you can assign a default masking format, which should be used to mask the columns discovered using this sensitive type. When creating a user-defined sensitive type, you must assign it to a resource group.

To create a sensitive type, you provide one or more column patterns (regular expressions) that should be used to discover sensitive columns. Data Discovery performs case-insensitive pattern matching. See Regular Expressions to learn how to write regular expressions.

Column Name Pattern

A column name pattern is a regular expression that is used to match column names during data discovery. For example, to search for columns containing Social Security numbers, you could define the following column name pattern:

(^|[_-])SSN($|[_-])|(SSN|SOC.*SEC.*).?(ID|NO|NUMBERS?|NUM|NBR|#)

The regular expression checks for specific keywords in column names. It matches column names, such as PATIENT_SSN, SSN#, SOCIAL_SECURITY_NUMBER, and EMPLOYEE_SOC_SEC_NO.
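The following Python sketch is one way to try the pattern against sample column names before adding it to a sensitive type. It assumes that Python's re module with re.IGNORECASE and an unanchored search approximates how Data Discovery applies the pattern; the helper name name_matches and the sample name FIRST_NAME are illustrative and not part of Oracle Data Safe.

```python
import re

# Column name pattern from the text, compiled case-insensitively to mirror
# Data Discovery's case-insensitive matching (Python's engine is an assumption).
SSN_NAME_PATTERN = re.compile(
    r"(^|[_-])SSN($|[_-])|(SSN|SOC.*SEC.*).?(ID|NO|NUMBERS?|NUM|NBR|#)",
    re.IGNORECASE,
)


def name_matches(column_name: str) -> bool:
    """Return True if the pattern is found anywhere in the column name."""
    return SSN_NAME_PATTERN.search(column_name) is not None


if __name__ == "__main__":
    samples = [
        "PATIENT_SSN",
        "SSN#",
        "SOCIAL_SECURITY_NUMBER",
        "EMPLOYEE_SOC_SEC_NO",
        "FIRST_NAME",  # hypothetical non-sensitive column for contrast
    ]
    for name in samples:
        print(f"{name}: {name_matches(name)}")
```

Running the pattern against a list of known column names from your own schema is a quick way to spot false positives before you run discovery.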

Tips for creating column name patterns:

  • Consider when to use .? and .*. Use .? if you want to allow zero or one character, and use .* to allow any number of characters. For example, you could use SOCIAL.?SECURITY.?NUMBER or SOC.*SEC.*NUMBER depending upon how strict you want the regular expression to be.
  • To match a word either on its own or as a delimited part of a column name, use (^|[_-])<WORD>($|[_-]). The pattern matches <WORD> only when it sits at the start or end of the column name or is set off from the rest of the name by an underscore (_) or a hyphen (-).
  • Whenever searching for columns containing numbers, you could use keywords like (ID|NO|NUMBERS?|NUM|NBR|#).
  • To match singular and plural words, if applicable, use S?. For example, use CODES? to match CODE and CODES.
  • To match dates, use (DT|DATE) and the reverse pattern. For example, you could use the following pattern to match BIRTH_DATE and DATE_OF_BIRTH:
    BIRTH.?(DT|DATE)|(DT|DATE).*BIRTH
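The tips above can be compared side by side with a short Python sketch. It assumes that Python's re module behaves closely enough to the Data Discovery engine for these simple constructs; the helper found and the sample column names (for example ASSN_CODE and BIRTHPLACE) are hypothetical.

```python
import re


def found(pattern: str, column_name: str) -> bool:
    """Case-insensitive, unanchored search (an assumption about the engine)."""
    return re.search(pattern, column_name, re.IGNORECASE) is not None


STRICT = r"SOCIAL.?SECURITY.?NUMBER"      # at most one character between keywords
LOOSE = r"SOC.*SEC.*NUMBER"               # any number of characters between keywords
WHOLE_WORD_SSN = r"(^|[_-])SSN($|[_-])"   # SSN delimited by start, end, _ or -
BIRTH_DATE = r"BIRTH.?(DT|DATE)|(DT|DATE).*BIRTH"

CHECKS = [
    (STRICT, "SOCIAL_SECURITY_NUMBER"),
    (STRICT, "SOC_SEC_NUMBER"),        # abbreviations are rejected by the strict form
    (LOOSE, "SOC_SEC_NUMBER"),
    (WHOLE_WORD_SSN, "EMP_SSN_ID"),
    (WHOLE_WORD_SSN, "ASSN_CODE"),     # SSN embedded in a longer word is rejected
    (BIRTH_DATE, "BIRTH_DATE"),
    (BIRTH_DATE, "DATE_OF_BIRTH"),
    (BIRTH_DATE, "BIRTHPLACE"),
]

if __name__ == "__main__":
    for pattern, name in CHECKS:
        print(f"{name:24} {pattern:40} {found(pattern, name)}")
```

The stricter forms trade recall for precision: they miss abbreviated names but are less likely to flag unrelated columns.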

Column Comment Pattern

A column comment pattern is a regular expression that is used to match column comments during data discovery. Sometimes column names are obscure and therefore, metadata is entered as a comment for a database column. Data Discovery can search these comments and potentially find more sensitive data. For example, to search for columns containing Social Security numbers, you could define the following column comment pattern:

\bSSN#?\b|SOCIAL SECURITY (ID|NUM|\bNO\b|NBR)

The regular expression checks for specific keywords in column comments. For example, it matches the column comment Contains social security numbers of employees.

Tips for creating column comment patterns:

  • Avoid using .* in column comment patterns. Comments are free-form text, so .* can span unrelated words and produce false positives.
  • Use \b<word>\b to search for a specific word. It avoids matching words that contain <word>. For example, the regular expression \bNO\b matches social security no but not social security notification. Similarly, the regular expression \bSECT\b does not match the word SECTOR, and \bCULT\b does not match the word CULTURE.
  • Whenever searching for columns containing numbers, you can use keywords like (ID|\bNO\b|NUM|NBR|#).
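The effect of the \b word boundary can be checked with a minimal Python sketch. It assumes that \b in Python's re module behaves like the word boundary in the Data Discovery engine; the helper names comment_matches and word_found are illustrative only.

```python
import re

# Column comment pattern from the text, compiled case-insensitively.
SSN_COMMENT_PATTERN = re.compile(
    r"\bSSN#?\b|SOCIAL SECURITY (ID|NUM|\bNO\b|NBR)",
    re.IGNORECASE,
)


def comment_matches(comment: str) -> bool:
    """Return True if the comment pattern is found anywhere in the comment."""
    return SSN_COMMENT_PATTERN.search(comment) is not None


def word_found(word: str, text: str) -> bool:
    """Search for `word` as a whole word by wrapping it in \\b...\\b."""
    return re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE) is not None


if __name__ == "__main__":
    print(comment_matches("Contains social security numbers of employees"))
    print(comment_matches("social security no"))
    print(comment_matches("social security notification"))
    print(word_found("SECT", "SECTOR"))
    print(word_found("CULT", "CULTURE"))
```

Only the first two comments match. The third fails because NO is followed by more letters, so the trailing \b finds no boundary.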

Column Data Pattern

A column data pattern is a regular expression that is used to match the actual column data during data discovery. For example, to search for columns containing Social Security numbers, you could define the following column data pattern:

^[0-9]{3}[ -]?[0-9]{2}[ -]?[0-9]{4}$

The regular expression checks for 9-digit numbers. The nine digits can be contiguous or split into groups of three, two, and four digits separated by hyphens or spaces. It matches numbers like 383368610 and 383-36-8610.
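As a quick check, the data pattern can be applied to sample values in Python. This is a sketch that assumes Python's re module is a reasonable stand-in for the Data Discovery engine; looks_like_ssn is a hypothetical helper, and the values other than the two from the text are made up for contrast.

```python
import re

# Column data pattern from the text. The ^ and $ anchors require the whole
# value to be a 9-digit number with optional space or hyphen separators.
SSN_DATA_PATTERN = re.compile(r"^[0-9]{3}[ -]?[0-9]{2}[ -]?[0-9]{4}$")


def looks_like_ssn(value: str) -> bool:
    """Return True if the value matches the Social Security number data pattern."""
    return SSN_DATA_PATTERN.search(value) is not None


if __name__ == "__main__":
    for value in ["383368610", "383-36-8610", "383 36 8610", "38-336-8610", "3833686100"]:
        print(f"{value!r}: {looks_like_ssn(value)}")
```

Because each separator is optional on its own, the pattern also accepts mixed forms such as 383-368610. Any 9-digit value matches, which is why a data pattern like this is usually combined with a name or comment pattern.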

Tips for creating column data patterns:

  • Ensure that the data pattern is as specific as possible to avoid false positives.
  • See whether it is logical to have a data pattern. If the data pattern is too broad, it can result in false positives. If it does not add any value, you could decide not to add the data pattern for a sensitive type.
  • If you want to use a broad data pattern, you could use the AND search option so that the name and comment patterns must also match, which reduces false positives.

And/Or Search Pattern

The search pattern indicates how the column name, comment and data patterns of a sensitive type should be used to discover sensitive columns. There are two search options: AND and OR.

The AND search option requires every pattern provided for a sensitive type to match before a column is identified as sensitive. For example, if a sensitive type has name, comment, and data patterns, they must match a column's name, comment, and data, respectively, for that column to be identified as sensitive. The following table covers the possible combinations of patterns provided for a sensitive type and the corresponding AND search behavior.

Patterns Present in a Sensitive Type | Search Behavior
Name, Comment, and Data              | Name AND Comment AND Data
Name and Data                        | Name AND Data
Name and Comment                     | Name AND Comment
Comment and Data                     | Comment AND Data
Name                                 | Name
Comment                              | Comment
Data                                 | Data
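The AND behavior in the table can be summarized in a few lines of Python. This is only a sketch of the logic, not how Data Discovery is implemented. Each argument is None when the sensitive type does not define that pattern, and True or False for the match result when it does; the function name and this encoding are assumptions made for illustration.

```python
from typing import Optional


def and_search(
    name: Optional[bool], comment: Optional[bool], data: Optional[bool]
) -> bool:
    """AND search: every pattern that is defined must match.

    None means the pattern is not defined for the sensitive type.
    """
    results = [r for r in (name, comment, data) if r is not None]
    # A sensitive type needs at least one pattern; all defined ones must match.
    return bool(results) and all(results)


if __name__ == "__main__":
    print(and_search(name=True, comment=True, data=True))    # all three match
    print(and_search(name=True, comment=None, data=False))   # data pattern fails
    print(and_search(name=None, comment=True, data=None))    # only a comment pattern
```

Patterns that are not defined are simply ignored, which is why a sensitive type with a single pattern behaves the same under either search option.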

The OR search option provides some flexibility to identify a column as sensitive even if only some of the patterns of a sensitive type match. For example, if a sensitive type has name and comment patterns, a column is identified as sensitive even if only the name pattern (or comment pattern) matches the column's name (or comment). If a sensitive type has all three patterns, the data pattern must match along with either the name pattern or the comment pattern (or both). The following table covers the possible combinations of patterns provided for a sensitive type and the corresponding OR search behavior.

Patterns Present in a Sensitive Type | Search Behavior
Name, Comment, and Data              | (Name AND Data) OR (Comment AND Data)
Name and Data                        | Name AND Data
Name and Comment                     | Name OR Comment
Comment and Data                     | Comment AND Data
Name                                 | Name
Comment                              | Comment
Data                                 | Data
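The OR behavior can be sketched in Python in the same way. The sketch follows the rule stated in the paragraph above: when a data pattern is defined, it must match, together with at least one of the name or comment patterns if any are defined. Applying that rule to the two-pattern cases is an assumption, as are the function name and the convention that None means a pattern is not defined.

```python
from typing import Optional


def or_search(
    name: Optional[bool], comment: Optional[bool], data: Optional[bool]
) -> bool:
    """OR search, assuming a defined data pattern must always match.

    None means the pattern is not defined for the sensitive type.
    """
    others = [r for r in (name, comment) if r is not None]
    if data is None:
        # No data pattern: either the name or the comment pattern is enough.
        return any(others)
    if not others:
        # Only a data pattern is defined.
        return data
    # Data must match along with at least one name or comment match.
    return data and any(others)


if __name__ == "__main__":
    print(or_search(name=True, comment=False, data=None))   # name alone is enough
    print(or_search(name=False, comment=True, data=True))   # comment and data match
    print(or_search(name=True, comment=True, data=False))   # data pattern fails
```

Under this reading, the OR option relaxes only the name and comment requirements. A data pattern, when present, still acts as a gate.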