Teradata Vantage and Google Cloud Data Catalog

Connecting Teradata Vantage to the Google Cloud Data Catalog

February 23, 2022 | 6 min read

Many Teradata customers are interested in integrating Teradata Vantage with Google Cloud First Party Services. This Getting Started Guide will help you connect Teradata Vantage with the Google Cloud Data Catalog service.

Although this approach has been implemented and tested internally, it is offered on an as-is basis. Neither Google Cloud nor Teradata provides support for using Teradata Vantage with Google Cloud Data Catalog services.

Disclaimer: This guide includes content from both Google Cloud and Teradata product documentation.

Overview
This article describes how to connect Teradata Vantage with Google Cloud Data Catalog using the Data Catalog Teradata Connector (available on GitHub), and then explore the metadata of the Vantage tables via Data Catalog. The connector works in three steps:

•    Scrape: Connect to Teradata Vantage and retrieve all the available metadata
•    Prepare: Transform metadata into Data Catalog entities and create Tags
•    Ingest: Send the Data Catalog entities to the Google Cloud project
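
The three steps above can be pictured as a small pipeline. The sketch below is illustrative only: the function names, the dictionary shapes, the in-memory catalog list, and the Orders table are hypothetical stand-ins, not the connector's actual classes or the Data Catalog API.

```python
from typing import Dict, List

# Hypothetical stand-in for metadata read from Vantage: database -> table names.
RawMetadata = Dict[str, List[str]]


def scrape(source: RawMetadata) -> RawMetadata:
    """Scrape: retrieve all available metadata (here, just copy a dict)."""
    return {db: list(tables) for db, tables in source.items()}


def prepare(metadata: RawMetadata) -> List[dict]:
    """Prepare: turn raw metadata into catalog-style entries with tags."""
    entries = []
    for db, tables in metadata.items():
        entries.append(
            {"id": db.lower(), "type": "database", "tags": {"num_tables": len(tables)}}
        )
        for table in tables:
            entries.append(
                {"id": f"{db.lower()}_{table}", "type": "table", "tags": {"database": db}}
            )
    return entries


def ingest(entries: List[dict], catalog: List[dict]) -> int:
    """Ingest: send the entries to the catalog (here, append to a list)."""
    catalog.extend(entries)
    return len(entries)


catalog: List[dict] = []
sent = ingest(prepare(scrape({"Gcpuser": ["Categories", "Orders"]})), catalog)
print(sent, [entry["id"] for entry in catalog])
```

One database with two tables yields three entries: one for the database and one per table.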

About Google Cloud Data Catalog
Google Cloud Data Catalog is a fully managed data discovery and metadata management service. Data Catalog can catalog the native metadata on data assets. It is serverless and provides a central catalog to capture both technical and business metadata in a structured format.


About Teradata Vantage
Vantage is the connected multi-cloud data platform for enterprise analytics. It simplifies ecosystems by unifying analytics, data lakes and data warehouses. With Vantage, enterprise-scale companies can eliminate silos and cost-effectively query all their data, all the time, regardless of where the data resides, to get a complete view of their business.

Vantage combines descriptive, predictive, prescriptive analytics, autonomous decision-making, ML functions, and visualization tools into a unified, integrated platform that uncovers real-time business intelligence at scale, no matter where the data resides.

Vantage enables companies to start small and elastically scale compute or storage, paying only for what they use, harnessing low-cost object stores and integrating their analytic workloads.

Vantage supports R, Python, Teradata Studio, and any other SQL-based tools. You can deploy Vantage across public clouds, on-premises, on optimized or commodity infrastructure, or as-a-service.

Prerequisites
You should be familiar with Data Catalog, Teradata Vantage, and the Python language.

You will need the following accounts and systems:
•    A Google Service Account with the Data Catalog Admin role
•    A Cloud Console Project created for your account (e.g., partner-integration-lab)
•    Billing enabled for the project
•    A Teradata Vantage instance and access to the instance
•    Google Cloud SDK installed and initialized
•    Python installed
•    Pip installed
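
As a quick check of the local tooling in the list above, a short script can look for each command on the PATH. This is a minimal sketch: the executable names gcloud, python3, and pip are assumptions (on Windows the interpreter is often just python), and the script cannot verify the service account, project, or billing items.

```python
import shutil
from typing import Dict, Iterable, List, Optional


def find_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each tool name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


def missing_tools(found: Dict[str, Optional[str]]) -> List[str]:
    """Return the names of tools that were not found, sorted alphabetically."""
    return sorted(name for name, path in found.items() if path is None)


# Assumed executable names; adjust them for your platform.
found = find_tools(["gcloud", "python3", "pip"])
for name in missing_tools(found):
    print(f"Not found on PATH: {name}")
```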

Procedure
Once you have met the prerequisites, follow these steps:
1.    Enable Data Catalog API
2.    Install Teradata Data Catalog Connector
3.    Run
4.    Explore Teradata Vantage metadata with Data Catalog


Step 1: Enable Data Catalog API
Log on to the Google Cloud console, choose APIs & Services from the Navigation menu, then click Library. Make sure your project is selected in the top menu bar.

Enter Data Catalog in the search box, click Google Cloud Data Catalog API, then click ENABLE.

Step 2: Install Teradata Data Catalog Connector
A Teradata Data Catalog connector is available on GitHub. This connector is written in Python. Before installing the connector, run the following command to authorize gcloud to access the Cloud Platform with your Google user credentials.

gcloud auth login

Choose your Google account when the Google login page opens and click Allow on the next page.

Next, set the default project if you haven't already done so.

gcloud config set project <project id>

Install virtualenv
We recommend installing the Teradata Data Catalog Connector in an isolated Python environment. To do so, install virtualenv first.

Mac/Linux

pip install virtualenv
virtualenv --python python3.6 <your-env>
source <your-env>/bin/activate
<your-env>/bin/pip install google-datacatalog-teradata-connector

Windows

pip3 install virtualenv
virtualenv --python python3.6 <your-env>
<your-env>\Scripts\activate
<your-env>\Scripts\pip.exe install google-datacatalog-teradata-connector

Install Data Catalog Teradata Connector

Mac/Linux

pip install google-datacatalog-teradata-connector

Windows

pip.exe install google-datacatalog-teradata-connector

Set Environment Variables

export GOOGLE_APPLICATION_CREDENTIALS=<google_credentials_file>
export TERADATA2DC_DATACATALOG_PROJECT_ID=<google_cloud_project_id>
export TERADATA2DC_DATACATALOG_LOCATION_ID=<google_cloud_location_id>
export TERADATA2DC_TERADATA_SERVER=<teradata_server>
export TERADATA2DC_TERADATA_USERNAME=<teradata_username>
export TERADATA2DC_TERADATA_PASSWORD=<teradata_password>

Where <google_credentials_file> is the key for your service account (JSON file).
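
Before running the connector, it can help to confirm that all six variables above are set. The following is a minimal sketch; the helper name missing_vars is hypothetical, and the variable names are copied from the export lines above.

```python
import os
from typing import List, Mapping

# Variable names taken from the export commands above.
REQUIRED_VARS = [
    "GOOGLE_APPLICATION_CREDENTIALS",
    "TERADATA2DC_DATACATALOG_PROJECT_ID",
    "TERADATA2DC_DATACATALOG_LOCATION_ID",
    "TERADATA2DC_TERADATA_SERVER",
    "TERADATA2DC_TERADATA_USERNAME",
    "TERADATA2DC_TERADATA_PASSWORD",
]


def missing_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or empty, in list order."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_vars(os.environ)
if missing:
    print("Not set:", ", ".join(missing))
else:
    print("All connector variables are set.")
```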

Step 3: Run
Execute the google-datacatalog-teradata-connector command to establish an entry point to the Vantage database.

google-datacatalog-teradata-connector \
--datacatalog-project-id=$TERADATA2DC_DATACATALOG_PROJECT_ID \
--datacatalog-location-id=$TERADATA2DC_DATACATALOG_LOCATION_ID \
--teradata-host=$TERADATA2DC_TERADATA_SERVER \
--teradata-user=$TERADATA2DC_TERADATA_USERNAME \
--teradata-pass=$TERADATA2DC_TERADATA_PASSWORD
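
If you prefer to launch the connector from a Python script, the same command line can be assembled from the environment. This is a sketch under assumptions: the helper names build_command and redacted are hypothetical, the example values are placeholders, and the flags are exactly those in the command above.

```python
from typing import List, Mapping

# Flag -> environment variable, paired exactly as in the command above.
FLAG_TO_VAR = {
    "--datacatalog-project-id": "TERADATA2DC_DATACATALOG_PROJECT_ID",
    "--datacatalog-location-id": "TERADATA2DC_DATACATALOG_LOCATION_ID",
    "--teradata-host": "TERADATA2DC_TERADATA_SERVER",
    "--teradata-user": "TERADATA2DC_TERADATA_USERNAME",
    "--teradata-pass": "TERADATA2DC_TERADATA_PASSWORD",
}


def build_command(env: Mapping[str, str]) -> List[str]:
    """Build the connector argv; raises KeyError if a variable is missing."""
    argv = ["google-datacatalog-teradata-connector"]
    for flag, var in FLAG_TO_VAR.items():
        argv.append(f"{flag}={env[var]}")
    return argv


def redacted(argv: List[str]) -> str:
    """Render the command for logging with the password hidden."""
    shown = [
        "--teradata-pass=***" if arg.startswith("--teradata-pass=") else arg
        for arg in argv
    ]
    return " ".join(shown)


# Placeholder values for illustration; pass os.environ in real use.
example_env = {
    "TERADATA2DC_DATACATALOG_PROJECT_ID": "partner-integration-lab",
    "TERADATA2DC_DATACATALOG_LOCATION_ID": "us-west1",
    "TERADATA2DC_TERADATA_SERVER": "td.example.com",
    "TERADATA2DC_TERADATA_USERNAME": "gcpuser",
    "TERADATA2DC_TERADATA_PASSWORD": "secret",
}
argv = build_command(example_env)
print(redacted(argv))
# To actually run it: import subprocess; subprocess.run(argv, check=True)
```

Passing the arguments as a list avoids shell quoting problems if the password contains special characters.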

Sample output from the google-datacatalog-teradata-connector command:

INFO:root:

==============Starting CLI===============
INFO:root:This SQL connector does not implement the user defined datacatalog-entry-resource-url-prefix
INFO:root:This SQL connector uses the default entry resoure URL
INFO:root:
============Start teradata-to-datacatalog===========
INFO:root:

==============Scrape metadata===============
INFO:root:Scrapping metadata from connection_args
INFO:root:
1 table containers ready to be ingested...
INFO:root:

==============Prepare metadata===============
INFO:root:
--> database: Gcpuser
INFO:root:
37 tables ready to be ingested...
INFO:root:
==============Ingest metadata===============
DEBUG:google.auth._default:Checking /Users/Teradata/Apps/Cloud/GCP/teradata2dc-credentials.json for explicit credentials as part of auth process...
INFO:root:
INFO:root:Starting to clean up the catalog...
DEBUG:google.auth.transport.requests:Making request: POST https://oauth2.googleapis.com/token
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): oauth2.googleapis.com:443
DEBUG:urllib3.connectionpool:https://oauth2.googleapis.com:443 "POST /token HTTP/1.1" 200 None
INFO:root:0 entries that match the search query exist in Data Catalog!
INFO:root:Looking for entries to be deleted...
INFO:root:0 entries will be deleted.
INFO:root:
Starting to ingest custom metadata...
DEBUG:google.auth._default:Checking /Users/Teradata/Apps/Cloud/GCP/teradata2dc-credentials.json for explicit credentials as part of auth process...
INFO:root:
INFO:root:Starting the ingestion flow...
DEBUG:google.auth.transport.requests:Making request: POST https://oauth2.googleapis.com/token
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): oauth2.googleapis.com:443
DEBUG:urllib3.connectionpool:https://oauth2.googleapis.com:443 "POST /token HTTP/1.1" 200 None
INFO:root:Tag Template created: projects/partner-integration-lab/locations/us-west1/tagTemplates/teradata_database_metadata

INFO:root:Tag Template created: projects/partner-integration-lab/locations/us-west1/tagTemplates/teradata_table_metadata
INFO:root:Tag Template created: projects/partner-integration-lab/locations/us-west1/tagTemplates/teradata_column_metadata
INFO:root:Entry Group created: projects/partner-integration-lab/locations/us-west1/entryGroups/teradata
INFO:root:
INFO:root:1/38
INFO:root:Entry does not exist: projects/partner-integration-lab/locations/us-west1/entryGroups/teradata/entries/gcpuser
INFO:root:Entry created: projects/partner-integration-lab/locations/us-west1/entryGroups/teradata/entries/gcpuser
INFO:root:               ^ [database] 34.105.107.155/gcpuser
INFO:root:
INFO:root:Starting the upsert tags step
INFO:root:Processing Tag from Template: projects/partner-integration-lab/locations/us-west1/tagTemplates/teradata_database_metadata ...
INFO:root:Tag created: projects/partner-integration-lab/locations/us-west1/entryGroups/teradata/entries/gcpuser/tags/CWHNiGQeQmPT
INFO:root:
INFO:root:2/38
INFO:root:Entry does not exist: projects/partner-integration-lab/locations/us-west1/entryGroups/teradata/entries/gcpuser_Categories
INFO:root:Entry created: projects/partner-integration-lab/locations/us-west1/entryGroups/teradata/entries/gcpuser_Categories
INFO:root:               ^ [table] .34.105.107.155/gcpuser/Categories
INFO:root:
INFO:root:Starting the upsert tags step
INFO:root:Processing Tag from Template: projects/partner-integration-lab/locations/us-west1/tagTemplates/teradata_table_metadata ...
INFO:root:Tag created: projects/partner-integration-lab/locations/us-west1/entryGroups/teradata/entries/gcpuser_Categories/tags/Ceij5G9t915o
.
.
.
.
.
.
.

INFO:root:
INFO:root:38/38
INFO:root:Entry does not exist: projects/partner-integration-lab/locations/us-west1/entryGroups/teradata/entries/gcpuser_tablesv_instantiated_latest
INFO:root:Entry created: projects/partner-integration-lab/locations/us-west1/entryGroups/teradata/entries/gcpuser_tablesv_instantiated_latest
INFO:root:               ^ [table] 34.105.107.155/gcpuser/tablesv_instantiated_latest
INFO:root:
INFO:root:Starting the upsert tags step
INFO:root:Processing Tag from Template: projects/partner-integration-lab/locations/us-west1/tagTemplates/teradata_table_metadata ...
INFO:root:Tag created: projects/partner-integration-lab/locations/us-west1/entryGroups/teradata/entries/gcpuser_tablesv_instantiated_latest/tags/Ceij5G9t915o
INFO:root:
============End teradata-to-datacatalog============
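
For a long run, it may be easier to summarize the log than to read it line by line. The sketch below counts the "created" messages by kind; it assumes the log format shown in the sample output above, and the summarize helper is hypothetical, not part of the connector.

```python
import re
from collections import Counter

# Matches lines such as "INFO:root:Entry created: projects/...".
# "Tag Template" and "Entry Group" are listed before "Tag" and "Entry"
# so the longer labels are tried first.
CREATED = re.compile(
    r"^INFO:root:(Tag Template|Entry Group|Entry|Tag) created: (\S+)"
)


def summarize(log_text: str) -> Counter:
    """Count created resources by kind in connector log output."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = CREATED.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# A few lines copied from the sample output above.
sample = """\
INFO:root:Tag Template created: projects/partner-integration-lab/locations/us-west1/tagTemplates/teradata_database_metadata
INFO:root:Entry Group created: projects/partner-integration-lab/locations/us-west1/entryGroups/teradata
INFO:root:Entry does not exist: projects/partner-integration-lab/locations/us-west1/entryGroups/teradata/entries/gcpuser
INFO:root:Entry created: projects/partner-integration-lab/locations/us-west1/entryGroups/teradata/entries/gcpuser
INFO:root:Tag created: projects/partner-integration-lab/locations/us-west1/entryGroups/teradata/entries/gcpuser/tags/CWHNiGQeQmPT
"""
counts = summarize(sample)
print(dict(counts))
```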

Step 4: Explore Teradata Vantage Metadata with Data Catalog

Go to the Data Catalog console and click the project (e.g., partner-integration-lab) under Projects. The Teradata tables are shown in the right panel.

Click the table of interest (e.g., CITY_LEVEL_TRANS) to see the metadata about this table.

Cleanup (optional)

Clean up metadata from Data Catalog:

Copy https://github.com/GoogleCloudPlatform/datacatalog-connectors-rdbms/blob/master/google-datacatalog-teradata-connector/tools/cleanup_datacatalog.py to a local directory.

Change directory to where the file is, for example:

cd datacatalog/lib/python3.8/site-packages/google/datacatalog_connectors/tools/

Then run the following command:

python cleanup_datacatalog.py --datacatalog-project-ids=$TERADATA2DC_DATACATALOG_PROJECT_ID

Deactivate virtualenv

deactivate

About Wenjie Tehan

Wenjie is a Technical Consulting Manager, currently working with the Teradata Global Alliances team. 
 
With over 20 years in the IT industry, Wenjie has worked as developer, tester, business analyst, solution designer and project manager. This breadth of roles makes her perfect for the current role, understanding how the business needs data and how this data can be managed to meet those business needs.  
 
Wenjie has a BS in computer science from the University of California at San Diego and an ME in computer engineering from Cornell University. Wenjie is also certified on both Teradata and AWS.

About Kevin Dougherty

Kevin Dougherty is a Product Management Director at Teradata and the Product Management Lead for Teradata Vantage on Google Cloud. Kevin is a proven and results-oriented entrepreneurial cloud leader, driving the product vision, strategy, and roadmap for Teradata's as-a-service offerings on Google Cloud. Since joining the company in 2016, Kevin has held senior staff product management responsibilities spanning both public and private cloud spaces. Kevin brings to Teradata a wealth of product management and SaaS experience from multiple successful startups since the early 2000s, with a particular love for solving complex business challenges at enterprise scale.
