Data Catalog Overview

Data Catalog serves a variety of data professionals, including data engineers, data scientists, data stewards, and chief data officers.

You might use Data Catalog in one or more of the following roles:

  • Data Engineer: Understand where the data is coming from, where it is going, and the impact of your changes to the data.
  • Data Scientist: Find and access trustworthy data for data science and analytics.
  • Data Steward: Capture business concepts in the form of glossaries, categories, and terms to establish common knowledge within your company.
  • Data Officer: Review an end-to-end report of where the data came from to support effective data governance.

Data Catalog Key Capabilities

  • Harvest technical metadata from a wide range of supported data sources that are accessible using public or private IPs.
  • Create and manage a common enterprise vocabulary with a business glossary. Build a hierarchy of categories, subcategories, and terms with detailed rich-text descriptions.
  • Enrich the harvested technical metadata with annotations by linking data entities and attributes to business terms or by adding free-form tags.
  • Find the information you need by exploring data assets, browsing the data catalog, or using the quick search bar.
  • Automate and manage harvesting jobs using schedules.
  • Integrate the enterprise-class capabilities of your data catalog with other applications using REST APIs and SDKs.
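To picture the glossary hierarchy described above, the following Python sketch models a glossary as a tree of categories, where each category can hold terms and nested subcategories. It is a hypothetical in-memory model, not the Data Catalog SDK: the classes `Glossary`, `Category`, and `Term`, the `term_paths` helper, and the sample names are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Term:
    """A business term with an agreed-upon definition (hypothetical model)."""
    name: str
    description: str = ""


@dataclass
class Category:
    """A category groups related terms and may contain subcategories."""
    name: str
    terms: list[Term] = field(default_factory=list)
    subcategories: list["Category"] = field(default_factory=list)


@dataclass
class Glossary:
    """A glossary is the root of a category/term hierarchy."""
    name: str
    categories: list[Category] = field(default_factory=list)

    def term_paths(self) -> Iterator[str]:
        """Yield every term as a 'Glossary/Category/.../Term' path, depth first."""
        def walk(category: Category, prefix: str) -> Iterator[str]:
            path = f"{prefix}/{category.name}"
            for term in category.terms:
                yield f"{path}/{term.name}"
            for sub in category.subcategories:
                yield from walk(sub, path)

        for category in self.categories:
            yield from walk(category, self.name)


# Illustrative glossary; all names are made up for this example.
glossary = Glossary(
    "Enterprise",
    categories=[
        Category(
            "Finance",
            terms=[Term("Revenue", "Income from sales before expenses.")],
            subcategories=[Category("Billing", terms=[Term("Invoice")])],
        )
    ],
)

for path in glossary.term_paths():
    print(path)
# Enterprise/Finance/Revenue
# Enterprise/Finance/Billing/Invoice
```

Walking the tree depth first yields each term together with its full category path, which is one simple way to present a nested hierarchy like this.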

Data Catalog Concepts

An understanding of the following concepts is essential for using Data Catalog.

Data Asset
Represents a data source, such as a database, an object store, a file or document store, a message queue, or an application.
Connection
Includes the details needed to establish a connection to a data source. A connection is always associated with exactly one data asset, but a data asset can have more than one connection.
Connection Type
Defines the set of properties available in a connection for connecting to a data asset.
Harvest
The process that extracts technical metadata from your connected data sources into your data catalog repository.
Object
Any item managed in your data catalog, such as data assets, data entities, attributes, glossaries, and terms.
Data Object
A data object in Data Catalog refers to data assets and data entities.
Data Entity
A data entity is a collection of data, such as a database table or view or a single logical file, and normally has many attributes that describe its data.
Attribute
An attribute describes a data item with a name and data type. For example, a column in a table or a field in a file.
Glossary
A glossary is a collection of business concepts in your company. A glossary consists of categories and business terms.
Category
A category is created in a glossary to group logically related business terms. You can create a category within a category to group your terms.
Term
Terms are the actual definitions of business concepts as agreed upon by different business stakeholders in your company. You use terms to organize your data entities and attributes.
Data Catalog Tag
Tags are free-form labels or keywords that you create to logically identify data objects. Tags help with metadata classification and discovery. You can tag data assets, data entities, and attributes, and then search for all data objects that carry a specific tag name.
Job
A task that runs the harvest process. A job can be created and run immediately, scheduled to run at a specified frequency, or created and run when needed.
Schedule
Runs a job automatically at an hourly, daily, weekly, or monthly frequency.
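The relationships among these concepts can be sketched in a few Python dataclasses: a data asset owns its connections, a harvest fills in data entities and their attributes, and terms and tags are attached to objects so they can be found later. This is a hypothetical in-memory model, not the Data Catalog API. Every class, field, and function name (`DataAsset`, `harvest`, `find_by_tag`, and so on) is an assumption, and the harvest step only records metadata that is passed in rather than reading a real data source.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """A named, typed data item, such as a table column (hypothetical model)."""
    name: str
    data_type: str
    terms: set[str] = field(default_factory=set)  # linked glossary term names
    tags: set[str] = field(default_factory=set)   # free-form tags


@dataclass
class DataEntity:
    """A collection of data, such as a table, view, or logical file."""
    name: str
    attributes: list[Attribute] = field(default_factory=list)
    terms: set[str] = field(default_factory=set)
    tags: set[str] = field(default_factory=set)


@dataclass
class Connection:
    """Details needed to reach a data source (hypothetical property names)."""
    name: str
    properties: dict[str, str] = field(default_factory=dict)


@dataclass
class DataAsset:
    """A registered data source. Each connection lives inside exactly one asset."""
    name: str
    connections: list[Connection] = field(default_factory=list)
    entities: list[DataEntity] = field(default_factory=list)
    tags: set[str] = field(default_factory=set)

    def harvest(self, discovered: dict[str, dict[str, str]]) -> None:
        """Stand-in for a harvest job: record entity and attribute metadata.

        `discovered` maps entity name -> {attribute name: data type}. A real
        harvest reads this metadata through a connection; this model only
        checks that a connection exists and stores what it is given.
        """
        if not self.connections:
            raise ValueError(f"{self.name} has no connection to harvest through")
        for entity_name, columns in discovered.items():
            attributes = [Attribute(col, dtype) for col, dtype in columns.items()]
            self.entities.append(DataEntity(entity_name, attributes))


def find_by_tag(assets: list[DataAsset], tag: str) -> list[str]:
    """Return the qualified name of every asset, entity, or attribute with `tag`."""
    hits: list[str] = []
    for asset in assets:
        if tag in asset.tags:
            hits.append(asset.name)
        for entity in asset.entities:
            if tag in entity.tags:
                hits.append(f"{asset.name}.{entity.name}")
            for attr in entity.attributes:
                if tag in attr.tags:
                    hits.append(f"{asset.name}.{entity.name}.{attr.name}")
    return hits


# Usage with made-up names: register, connect, harvest, annotate, then search.
sales_db = DataAsset("sales_db", [Connection("primary", {"host": "db.example.com"})])
sales_db.harvest({"ORDERS": {"ORDER_ID": "NUMBER", "CUSTOMER_EMAIL": "VARCHAR2"}})
orders = sales_db.entities[0]
orders.terms.add("Customer Order")         # link a business term to the entity
orders.attributes[1].tags.add("pii")       # tag the CUSTOMER_EMAIL attribute
sales_db.tags.add("pii")                   # tag the data asset itself
print(find_by_tag([sales_db], "pii"))      # ['sales_db', 'sales_db.ORDERS.CUSTOMER_EMAIL']
```

Keeping connections and entities inside their owning asset mirrors the rule that a connection belongs to exactly one data asset, while tags and term links are plain sets so any data object can carry several of each.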

Ways to Access Data Catalog

You access Data Catalog using the Console, REST API, SDKs, or CLI.

Use any of the following options, based on your preference and its suitability for the task you want to complete:

  • The Console is an easy-to-use, browser-based interface. To access the Console, you must use one of the following supported browsers:
    • Google Chrome 69 or later
    • Firefox 62 or later
    When you open the Console, you are prompted to enter your cloud tenant, your user name, and your password.
  • The REST APIs provide the most functionality but require programming expertise. The API Reference and Endpoints page provides endpoint details and links to the available API reference documents.
  • Oracle Cloud Infrastructure provides SDKs for interacting with Data Catalog without having to build your own framework.
  • The command line interface (CLI) provides both quick access and full functionality without the need for programming.

Resource Identifiers

Each Data Catalog resource has a unique, Oracle-assigned identifier called an Oracle Cloud ID (OCID).
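OCIDs are dot-separated strings. Based on the general OCID syntax documented for Oracle Cloud Infrastructure, `ocid1.<resource type>.<realm>.[region][.future use].<unique ID>`, the following Python sketch splits an OCID into its parts. The `parse_ocid` helper is hypothetical, and the sample OCIDs are made up for illustration.

```python
from typing import NamedTuple


class Ocid(NamedTuple):
    version: str
    resource_type: str
    realm: str
    region: str      # empty for resources that are not regional
    unique_id: str


def parse_ocid(ocid: str) -> Ocid:
    """Split an OCID of the form ocid1.<type>.<realm>.[region][.future].<unique id>."""
    parts = ocid.split(".")
    if len(parts) < 5 or not parts[0].startswith("ocid"):
        raise ValueError(f"not an OCID: {ocid!r}")
    # parts[4:-1], if present, is the reserved "future use" segment; ignore it.
    return Ocid(parts[0], parts[1], parts[2], parts[3], parts[-1])


# Made-up OCIDs: one regional, one with an empty region segment.
example = parse_ocid("ocid1.datacatalog.oc1.iad.aaaaexampleuniqueid")
print(example.resource_type, example.region)   # datacatalog iad

global_example = parse_ocid("ocid1.tenancy.oc1..aaaaexampletenancy")
print(repr(global_example.region))             # ''
```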

Regions and Availability Domains

Regions and availability domains indicate the physical and logical organization of your Data Catalog resources. A region is a localized geographic area, and an availability domain is one or more data centers located within a region.

Data Catalog is hosted in the following regions:

  • North America
    • US East (Ashburn)
    • US West (Phoenix)
    • Canada Southeast (Toronto)
    • Canada Southeast (Montreal)
  • APAC
    • Japan East (Tokyo)
    • Japan Central (Osaka)
    • South Korea Central (Seoul)
    • Australia East (Sydney)
    • India West (Mumbai)
    • Australia Southeast (Melbourne)
    • India South (Hyderabad)
    • South Korea North (Chuncheon)
  • EMEA
    • UK South (London)
    • Germany Central (Frankfurt)
    • Switzerland North (Zurich)
    • Netherlands Northwest (Amsterdam)
    • Saudi Arabia West (Jeddah)
  • LAD
    • Brazil East (Sao Paulo)

Limits and Quotas

Service Limits

Data Catalog limits you to two data catalog instances per region.

Compartment Quotas

You can limit the number of data catalog resources in a compartment by creating a quota limit. For example:

set data-catalog quota catalog-count to 1 in compartment <MyCompartment>
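The service limit and a compartment quota both constrain catalog creation, and a new catalog must satisfy both. The following Python sketch is a simplified, hypothetical model of that check: `can_create_catalog` and its parameters are assumptions, the limit of two per region comes from the service limit stated above, and child compartments are not modeled.

```python
from typing import Optional

SERVICE_LIMIT_PER_REGION = 2  # from the Data Catalog service limit above


def can_create_catalog(
    region_count: int,
    compartment_count: int,
    compartment_quota: Optional[int] = None,
) -> bool:
    """Return True if one more catalog fits under the limit and the quota.

    Simplified model: `region_count` is the number of existing catalogs in the
    region, `compartment_count` the number in the target compartment, and
    `compartment_quota` the catalog-count quota for that compartment (None if
    no quota statement applies). Compartment hierarchies are not modeled.
    """
    if region_count >= SERVICE_LIMIT_PER_REGION:
        return False
    if compartment_quota is not None and compartment_count >= compartment_quota:
        return False
    return True


# With "set data-catalog quota catalog-count to 1 in compartment <MyCompartment>":
print(can_create_catalog(region_count=0, compartment_count=0, compartment_quota=1))  # True
print(can_create_catalog(region_count=1, compartment_count=1, compartment_quota=1))  # False
print(can_create_catalog(region_count=2, compartment_count=0))                       # False
```

Because both checks must pass in this model, a quota can only tighten the limit: a quota higher than two still leaves the regional service limit in force.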

Integration

Data Catalog integrates with the following services and features:

  • IAM
  • Work Requests
  • Search
  • Compartment Explorer
  • Monitoring

Typical Workflow for Using Data Catalog

As a Data Catalog user, you typically perform the following tasks.

  • Accessing a Data Catalog instance: Open your Data Catalog instance.
  • Creating a data asset: Register your data sources in Data Catalog as data assets.
  • Adding a connection: Connect to the data assets.
  • Harvesting a data asset: Harvest technical metadata from data assets.
  • Creating a business glossary: Create a business glossary to define your company's business concepts and establish a common understanding.
  • Linking glossary terms to data objects: Link business terms to data objects.
  • Linking tags to data objects: Annotate data objects with free-form tags.
  • Finding relevant data: Search, browse, or explore Data Catalog to find useful and trusted data.
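Some of these tasks depend on others: a connection is added to an existing data asset, and linking terms to data entities requires harvested metadata and a glossary. The following Python sketch encodes one reading of those dependencies and uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) to produce a valid order. The `PREREQUISITES` mapping is an assumption inferred from the task list, not an ordering that Data Catalog enforces.

```python
from graphlib import TopologicalSorter

# Assumed prerequisites (hypothetical): each task maps to the tasks that must
# finish first. Inferred from the typical workflow, not enforced by the service.
PREREQUISITES: dict[str, set[str]] = {
    "Accessing a Data Catalog instance": set(),
    "Creating a data asset": {"Accessing a Data Catalog instance"},
    "Adding a connection": {"Creating a data asset"},
    "Harvesting a data asset": {"Adding a connection"},
    "Creating a business glossary": {"Accessing a Data Catalog instance"},
    "Linking glossary terms to data objects": {
        "Harvesting a data asset",
        "Creating a business glossary",
    },
    "Linking tags to data objects": {"Creating a data asset"},
    "Finding relevant data": {"Harvesting a data asset"},
}

# static_order() yields each task only after all of its prerequisites.
order = list(TopologicalSorter(PREREQUISITES).static_order())
for step, task in enumerate(order, start=1):
    print(f"{step}. {task}")
```

Tasks with no dependency between them, such as creating the glossary and harvesting a data asset, can happen in either order, so the printed sequence is one valid order among several.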