Establish foundational data context with Knowledge Catalog

When working with data, you've probably asked questions like "What does this column name mean?", "Who owns this broken dataset?", or "Is this table approved for use?" Metadata tags try to answer these questions, but they quickly become outdated or inconsistent. Knowledge Catalog (formerly Dataplex Universal Catalog) solves this by letting you attach structured metadata and clear business definitions directly to data assets. Providing clear data context grounds AI agents and builds a foundation of trust for every user who interacts with the data.

This tutorial shows you how to establish data context in Knowledge Catalog. Designed for users such as data stewards and business analysts, this tutorial walks you through UI-based steps to build standard business terms and context before you automate these workflows. The tutorial clarifies relationships between key Knowledge Catalog concepts. By the end, you'll know how to make your data discoverable and trustworthy.

Objectives

In this tutorial, you learn how to:

  • Create a single source of truth for business terms with a business glossary.
  • Structure and organize metadata with aspect types.
  • Attach metadata to data assets with aspects.
  • Find exactly what you need with Knowledge Catalog Search by querying this new structured metadata.

Before you begin

Before you begin, do the following:

Set up your environment

This tutorial uses Cloud Shell, a command-line environment that runs in the cloud.

  1. From the Google Cloud console, click Activate Cloud Shell in the top right toolbar. It takes a few moments to provision and connect to the environment.

  2. In Cloud Shell, set your PROJECT_ID and LOCATION variables so that all future commands target your specific Google Cloud project.

    # Read the active project from your gcloud configuration
    export PROJECT_ID=$(gcloud config get-value project)
    gcloud config set project "$PROJECT_ID"
    export LOCATION="us-central1"
    
  3. Enable the necessary Google Cloud services.

    gcloud services enable \
      dataplex.googleapis.com \
      bigquery.googleapis.com \
      datacatalog.googleapis.com
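
To confirm that the APIs are enabled before you continue, you can compare the list of enabled services against the three that this tutorial needs. The following is a minimal sketch: the missing_services helper is a hypothetical name introduced here for illustration, and it only reads service names from standard input, so it makes no API calls itself.

```shell
# Sketch: report which required APIs are missing from a list of enabled services.
REQUIRED_SERVICES="dataplex.googleapis.com bigquery.googleapis.com datacatalog.googleapis.com"

# Hypothetical helper: reads enabled service names on stdin (one per line)
# and prints each required service that does not appear in that list.
missing_services() {
  local enabled svc
  enabled="$(cat)"
  for svc in $REQUIRED_SERVICES; do
    if ! printf '%s\n' "$enabled" | grep -qxF "$svc"; then
      printf '%s\n' "$svc"
    fi
  done
}

# Usage in Cloud Shell (needs gcloud, so it is left commented out here):
#   gcloud services list --enabled --format="value(config.name)" | missing_services
```

If the helper prints nothing, all three services are already enabled.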
    

Create a BigQuery dataset and prepare sample data

Use the following code to create a BigQuery dataset and load some sample CSV transactions into a table. After you create the table, Knowledge Catalog discovers it and creates an entry for it in the catalog.

Think of an entry as Knowledge Catalog's representation of a data asset. It's like a record in the catalog that you can attach metadata to. Instead of adding context to (or enriching) the BigQuery table directly, you add it to its entry in Knowledge Catalog.

# Create the BigQuery Dataset in the us-central1 region
bq --location=$LOCATION mk --dataset \
    --description "Sample retail data for foundational data context tutorial" \
    $PROJECT_ID:retail_data

# Create a temporary CSV file with the sample data
echo "transaction_id,user_email,gmv,transaction_date
1001,test@example.com,150.50,2025-08-28
1002,user@example.com,75.00,2025-08-28" > /tmp/transactions.csv

# Load the data from the temporary CSV file into a BigQuery table
bq load \
    --source_format=CSV \
    --autodetect \
    retail_data.transactions \
    /tmp/transactions.csv

# (Optional) Clean up the temporary file
rm /tmp/transactions.csv

Run a SELECT query to verify your setup:

bq query --nouse_legacy_sql "SELECT * FROM retail_data.transactions"

Example output:

+----------------+------------------+-------+------------------+
| transaction_id |    user_email    |  gmv  | transaction_date |
+----------------+------------------+-------+------------------+
|           1001 | test@example.com | 150.5 |       2025-08-28 |
|           1002 | user@example.com |  75.0 |       2025-08-28 |
+----------------+------------------+-------+------------------+
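
Once the table exists, its catalog entry has a predictable resource name that scripts can use to refer to it. The following sketch builds that name. It assumes the naming convention for system-created BigQuery entries (the @bigquery entry group and an entry ID derived from the table path). The bigquery_entry_name helper is a hypothetical name used only for illustration, so verify the format against the Knowledge Catalog reference before relying on it.

```shell
# Hypothetical helper: prints the assumed resource name of the catalog
# entry that represents a BigQuery table.
bigquery_entry_name() {
  local project="$1" location="$2" dataset="$3" table="$4"
  printf 'projects/%s/locations/%s/entryGroups/@bigquery/entries/bigquery.googleapis.com/projects/%s/datasets/%s/tables/%s\n' \
    "$project" "$location" "$project" "$dataset" "$table"
}

# Example call; falls back to placeholder values if the variables are unset.
bigquery_entry_name "${PROJECT_ID:-my-project}" "${LOCATION:-us-central1}" retail_data transactions
```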

Establish common terms with a business glossary

Good data context relies on clear definitions. For example, a developer shouldn't have to guess whether a column named gmv means Gross Merchandise Value or whether it includes taxes and returns. A business glossary creates a single source of truth for these definitions across your organization. When teammates or AI agents analyze your data, they inherit this precise business context. Shared definitions align metrics across teams such as Finance, Sales, and Operations, and help AI agents avoid hallucinations.

Follow these steps to create a glossary and define your first term:

  1. In the Google Cloud console, go to the Knowledge Catalog Glossaries page.

    Go to Glossaries

  2. Click Create Business Glossary.

  3. Enter the following details:

    • Display name: Retail Business Glossary
    • Location: us-central1 (Iowa)

  4. Click Create.

  5. Click Create Category.

  6. Name the category Sales Metrics, and click Create.

  7. Select the Sales Metrics category and click Add term.

  8. Name the term Gross Merchandise Value and click Create.

  9. Click the Gross Merchandise Value term to open its details page.

  10. Click Add next to Overview. Enter the following definition: The total value of merchandise sold over a given period of time before the deduction of any fees or expenses. This is a key indicator of e-commerce business growth.

  11. Click Save.

You have now created a glossary term that you can link to data entries across your organization.

Define technical metadata with an aspect type

When you use unstructured metadata tags, you often end up with inconsistent catalog entries. For example, one table might be tagged owner:bob and another steward:alice@example.com. To keep your metadata organized at scale, you need a consistent schema.

That's where aspect types come in. An aspect type is a metadata blueprint that lets you set clear rules and required fields. Requiring standard fields, such as a contact address for the data steward, gives every entry the same predictable structure, so downstream scripts can validate your metadata automatically.
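
As an illustration of the kind of check a downstream script might run, the following sketch tests whether a steward value looks like an email address. The is_valid_steward function and its pattern are assumptions made for this example. The pattern is deliberately loose and is not full email validation.

```shell
# Sketch: a loose format check that a downstream script might run on steward values.
# The pattern is an illustrative assumption, not full RFC 5322 email validation.
is_valid_steward() {
  printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
}

# Try the two inconsistent tag values from the example above.
for value in "bob" "alice@example.com"; do
  if is_valid_steward "$value"; then
    echo "ok: $value"
  else
    echo "rejected: $value"
  fi
done
```

With these inputs, the loop rejects bob and accepts alice@example.com. A check like this only works when every entry stores the steward in the same field.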

Follow these steps to create an aspect type:

  1. In the Google Cloud console, go to the Knowledge Catalog Aspect types tab on the Metadata types page.

    Go to Aspect types

  2. On the Custom tab, click Create.

  3. Enter the following details:

    • Display name: Data Asset Context
    • Location: us-central1 (Iowa)

  4. In the Template section, click Add field to create the following three fields:

    • Field 1:

      • Display name: Data Steward
      • Type: Text
      • Is Required: Select the checkbox.
      • Text type: Plain text
    • Field 2 (click Add field):

      • Display name: Data Sensitivity
      • Type: Enum
      • Is Required: Leave optional.
      • Values: Add Public, Internal, and Confidential
    • Field 3 (click Add field):

      • Display name: Last Review Date
      • Type: Date and time
      • Is Required: Leave optional.

  5. Click Save.

You now have an aspect type for data governance-related metadata fields like data steward, sensitivity level, and review date. In the next section, you apply this schema to a table entry by attaching an aspect with specific values for these fields.
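
When you later automate this step, the same schema can be written as a metadata template file. The following sketch writes one to /tmp. It assumes the JSON template layout used by the catalog API (a record with recordFields, constraints, and enumValues). The field names such as data-steward and the aspect type ID data-asset-context are hypothetical identifiers chosen for this example, because the UI steps above only set display names.

```shell
# Sketch: write a metadata template that mirrors the three UI fields.
# Field names and the aspect type ID are hypothetical; the template layout is an assumption.
TEMPLATE_FILE="/tmp/data-asset-context-template.json"

cat > "$TEMPLATE_FILE" <<'EOF'
{
  "name": "data-asset-context",
  "type": "record",
  "recordFields": [
    {
      "name": "data-steward",
      "type": "string",
      "index": 1,
      "annotations": {"displayName": "Data Steward"},
      "constraints": {"required": true}
    },
    {
      "name": "data-sensitivity",
      "type": "enum",
      "index": 2,
      "annotations": {"displayName": "Data Sensitivity"},
      "enumValues": [
        {"name": "Public", "index": 1},
        {"name": "Internal", "index": 2},
        {"name": "Confidential", "index": 3}
      ]
    },
    {
      "name": "last-review-date",
      "type": "datetime",
      "index": 3,
      "annotations": {"displayName": "Last Review Date"}
    }
  ]
}
EOF
echo "Wrote $TEMPLATE_FILE"

# Assumed follow-up command (check the gcloud reference before running it):
#   gcloud dataplex aspect-types create data-asset-context \
#     --location="$LOCATION" --project="$PROJECT_ID" \
#     --metadata-template-file-name="$TEMPLATE_FILE"
```

Only the Data Steward field carries a required constraint, matching the checkbox selection in the UI steps.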

Enrich an entry with business and technical context

Column names are often abbreviated or ambiguous. Linking a column to a term in your business glossary provides a clear and consistent definition. In this step, you enrich the entry for the retail_data.transactions table by linking the Gross Merchandise Value term to a column named gmv and attaching an aspect to the table entry using your aspect type.

First, link the gmv column in retail_data.transactions to your Gross Merchandise Value term so that its meaning is unambiguous:

  1. In the Google Cloud console, go to the Knowledge Catalog Search page.

    Go to Search

  2. Click Filters to open the Filters panel.

  3. For Scope, select Current Project.

  4. Search for retail_data.transactions and click the returned transactions table.

  5. Click the Schema tab.

  6. Select the checkbox next to the gmv column, and click Add business term.

  7. Select Gross Merchandise Value.

Attach an aspect to the table entry

In addition to linking business terms to columns, you can attach an aspect to a table entry to capture table-level metadata, such as data ownership and sensitivity.

An aspect is an instance of an aspect type, with specific values for metadata fields. When you attach an aspect to an entry, Knowledge Catalog checks the information you provide against the schema defined in the aspect type to ensure consistency.

To define ownership and sensitivity for the retail_data.transactions table, attach the Data Asset Context aspect:

  1. On the Details tab of the retail_data.transactions entry page, click Add next to Optional aspects.
  2. Select Data Asset Context from the list.
  3. Enter values in the fields:

    • Data Steward: finance-team@example.com
    • Data Sensitivity: Select Internal.
    • Last Review Date: Select today's date.
  4. Click Save.
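
To make the relationship between an aspect and its aspect type concrete, the following sketch prints the values from the steps above as a JSON payload. It assumes that aspects are keyed by PROJECT.LOCATION.ASPECT_TYPE_ID with the field values under data. The build_aspect_json helper, the data-asset-context ID, and the field names are hypothetical, so treat this as an illustration of the shape and not as the exact payload your project needs.

```shell
# Hypothetical helper: prints an aspect payload keyed by PROJECT.LOCATION.ASPECT_TYPE_ID.
# Values are inserted as-is, so they must not contain quotes or backslashes.
build_aspect_json() {
  local project="$1" location="$2" steward="$3" sensitivity="$4" reviewed="$5"
  cat <<EOF
{
  "${project}.${location}.data-asset-context": {
    "data": {
      "data-steward": "${steward}",
      "data-sensitivity": "${sensitivity}",
      "last-review-date": "${reviewed}"
    }
  }
}
EOF
}

# Example call with the values entered in the UI steps and a sample review timestamp.
build_aspect_json "${PROJECT_ID:-my-project}" "${LOCATION:-us-central1}" \
  "finance-team@example.com" "Internal" "2025-08-28T00:00:00Z"
```

The key ties the values to one specific aspect type, which is what lets the catalog check them against that type's schema.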

By enriching your sample retail transaction data, you've set up a solid foundation of data context in Knowledge Catalog.

Search for entries using enriched metadata

You can now use Knowledge Catalog Search to find entries based on the business context that you set up. For example, you can find all assets with a specific sensitivity level, or search for your glossary term to discover the underlying tables.

  1. In the Google Cloud console, go to the Knowledge Catalog Search page.

    Go to Search

  2. Click Filters to open the Filters panel.

  3. For Scope, select Current Project.

  4. In the search bar, enter the following natural-language query: Find tables where the Data Asset Context aspect has Internal sensitivity

  5. Confirm that the retail_data.transactions table appears in the list of results.

  6. Clear the search bar and enter a second query: Find tables with the Gross Merchandise Value term attached

  7. Confirm that the retail_data.transactions table appears again. It matches because its gmv column is directly linked to this business term.

When you connect an AI agent to Knowledge Catalog, it can read this enriched metadata without extra configuration. For example, when you ask an agent to retrieve internal sales metrics, it reads the Data Sensitivity aspect (which you set to Internal) and the linked Gross Merchandise Value glossary term. This shared context helps the agent verify its data sources, handle data according to its sensitivity level, and avoid hallucinations.

Clean up

To avoid incurring charges, delete the resources that you created in this tutorial.

Delete the sample dataset

To delete the sample BigQuery dataset and all its tables, use the following command. This action is irreversible.

# Re-run this export if your Cloud Shell session timed out
export PROJECT_ID=$(gcloud config get-value project)

# Manually type this command to confirm you are deleting the correct dataset
bq rm -r -f --dataset "$PROJECT_ID:retail_data"

Delete Knowledge Catalog artifacts

  1. In the Google Cloud console, go to the Knowledge Catalog Aspect types tab on the Metadata types page.

    Go to Aspect types

  2. Select the Data Asset Context aspect type and click Delete.

  3. In the Google Cloud console, go to the Knowledge Catalog Glossaries page.

    Go to Glossaries

  4. Select the Gross Merchandise Value term and click Delete.

  5. Select the Sales Metrics category and click Delete.

  6. Select the Retail Business Glossary and click Delete.

What's next

To learn more about catalog curation and building agents with Knowledge Catalog, see the following resources: