Glue Connectors Deployment Guide - Google Cloud Storage

Google Cloud Storage


The Google Cloud Storage Connector for AWS Glue lets AWS Glue jobs connect to Google Cloud Storage and extract data from it. The connector gives Glue jobs access to Google Cloud Storage data for cloud ETL workloads such as operational reporting and data governance.

In this post, we walk you through the step-by-step configuration of the Google Cloud Storage connector for AWS Glue so that you can extract, transform, and load (ETL) data from Google Cloud Storage. You can deploy the connector from AWS Marketplace.

Let's start with the prerequisites and then go through the steps involved.

Versions Available: 1.0.0

Prerequisites


Have the following information at hand before you start the setup.

You can pass the following options to the connector:

  • secret (required): the name of the secret that contains the GCP service account credentials.

  • database (required): the database in the Glue catalog.

  • table (required): the name of the Glue catalog table you want to pull data from (case sensitive).

The recommended approach to setting the connector options:

Use AWS Secrets Manager to store the username, password, and other sensitive information related to the source connection.
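As a rough illustration of the three required options listed above, the following Python sketch collects them into a dictionary and rejects empty values before they reach a job. The function name build_connector_options and the sample values are hypothetical, and the lowercase key spelling is an assumption; check the exact keys against the connector listing.

```python
# Minimal sketch: gather the connector options and fail early on gaps.
# Key names mirror the options list above; their exact spelling/case
# is an assumption, not confirmed connector behavior.
REQUIRED_OPTIONS = ("secret", "database", "table")


def build_connector_options(secret, database, table):
    """Return the connector options as a dict, validating required keys."""
    options = {"secret": secret, "database": database, "table": table}
    missing = [key for key in REQUIRED_OPTIONS if not options[key]]
    if missing:
        raise ValueError("missing required connector options: " + ", ".join(missing))
    return options


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(build_connector_options("gcs-service-account", "my_database", "my_table"))
```

Because the table name is case sensitive, it is passed through unchanged rather than normalized.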


The IAM role used by the Glue job should include the following policies:

AWS Glue Service - To run the job.

AWS Glue Catalog - To access the database and table in the Glue catalog.

Amazon Elastic Container Registry (optional) - To access Amazon ECR.

AWS Secrets Manager (optional) - If you use AWS Secrets Manager for the connection options.
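One way to express the role described above is with AWS managed policies. The sketch below only builds the request data: a trust policy that lets the Glue service assume the role, plus a list of managed policy ARNs. Mapping the four entries to these particular managed policies is an assumption (AWSGlueServiceRole is assumed to cover both running the job and catalog access), and the helper name build_role_plan is hypothetical. A narrower custom policy is usually preferable to SecretsManagerReadWrite in production.

```python
import json

# Trust policy allowing the AWS Glue service to assume the role.
GLUE_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "glue.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS managed policies assumed to match the list above.
MANAGED_POLICY_ARNS = {
    "glue": "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole",
    "ecr": "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
    "secrets": "arn:aws:iam::aws:policy/SecretsManagerReadWrite",
}


def build_role_plan(role_name, use_ecr=True, use_secrets_manager=True):
    """Return create_role kwargs and the policy ARNs to attach afterwards."""
    arns = [MANAGED_POLICY_ARNS["glue"]]
    if use_ecr:
        arns.append(MANAGED_POLICY_ARNS["ecr"])
    if use_secrets_manager:
        arns.append(MANAGED_POLICY_ARNS["secrets"])
    return {
        "create_role": {
            "RoleName": role_name,
            "AssumeRolePolicyDocument": json.dumps(GLUE_TRUST_POLICY),
        },
        "policy_arns": arns,
    }


# Applying the plan with boto3 (requires AWS credentials), for example:
#   iam = boto3.client("iam")
#   iam.create_role(**plan["create_role"])
#   for arn in plan["policy_arns"]:
#       iam.attach_role_policy(RoleName=role_name, PolicyArn=arn)
```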


Using the Google Cloud Storage Connector for AWS Glue


Here are the setup steps for configuring the Google Cloud Storage Connector:

  • Set up the IAM role with the required policies and a secret in AWS Secrets Manager.

  • Set up the Google Cloud Storage connector and a related connection on the Glue Studio console.

  • Create a job.

  • Save and run the job.

Step 1: Set up the IAM role with the required policies and a secret in AWS Secrets Manager

Create a database and table in the AWS Glue catalog:

  • Create the table with a GCS location as its location, e.g. gs://bucket/folder

  • Add the schema, including partition columns if applicable. (Note: partition columns must be of type varchar or string.)

  • If the data is partitioned, edit the table and add partition.pattern as a table property, e.g. partition.pattern : year=${year_column_name}/month=${month_column_name}/
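The table definition described above could also be created programmatically. The following sketch builds the TableInput structure accepted by the Glue create_table API, with a gs:// location, string-typed partition keys, and partition.pattern as a table parameter. The helper name build_table_input and all column, table, and bucket names are hypothetical; the pattern value is only an illustration of the format shown above.

```python
def build_table_input(table_name, gcs_location, columns,
                      partition_columns=(), partition_pattern=None):
    """Build a Glue TableInput dict for a table that points at GCS.

    columns is a list of (name, type) pairs; partition columns are
    always typed as string, following the note above.
    """
    if not gcs_location.startswith("gs://"):
        raise ValueError("location must be a gs:// URI, got: " + gcs_location)
    table_input = {
        "Name": table_name,
        "TableType": "EXTERNAL_TABLE",
        "StorageDescriptor": {
            "Columns": [{"Name": name, "Type": col_type} for name, col_type in columns],
            "Location": gcs_location,
        },
        "PartitionKeys": [{"Name": name, "Type": "string"} for name in partition_columns],
    }
    if partition_pattern:
        # Table property read by the connector, per the step above.
        table_input["Parameters"] = {"partition.pattern": partition_pattern}
    return table_input


# Creating the table with boto3 (requires AWS credentials), for example:
#   glue = boto3.client("glue")
#   glue.create_table(DatabaseName="my_database", TableInput=table_input)
```

Keeping the builder separate from the API call makes it easy to review the definition before anything is written to the catalog.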

Create the role and secret:

  • Create an IAM role with the policies described above.

  • Optional: create a secret in AWS Secrets Manager for the connector options described above.
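For the secret, a hedged sketch follows. It assumes the secret value is the GCP service account key JSON stored as a string; the layout the connector actually expects is not stated in this guide, so treat that as an assumption to verify. The helper name build_secret_request is hypothetical, and the checked field names are the standard ones found in a service account key file.

```python
import json

# Fields normally present in a GCP service account key file.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def build_secret_request(secret_name, service_account_info):
    """Return kwargs for Secrets Manager create_secret.

    Assumption: the connector reads the service account key JSON
    from the secret value as-is.
    """
    missing = [f for f in REQUIRED_KEY_FIELDS if f not in service_account_info]
    if missing:
        raise ValueError("service account key is missing: " + ", ".join(missing))
    return {
        "Name": secret_name,
        "SecretString": json.dumps(service_account_info),
    }


# Storing the secret with boto3 (requires AWS credentials), for example:
#   sm = boto3.client("secretsmanager")
#   sm.create_secret(**build_secret_request("gcs-service-account", key_dict))
```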


Step 2: Set up the Google Cloud Storage connector and a related connection on the Glue Studio console

To set up the Google Cloud Storage connector and create a connection for your job:

  • Subscribe to the product on AWS Marketplace and activate the Glue connector from AWS Glue Studio.

  • Enter your connection name and choose "Create connection and activate connector". You can optionally add a description, "Connection access", and "Network options". If you have created an AWS secret with the connector options, you can choose it from the dropdown under "Connection access".



Step 3: Create a job

To create a job from the connection created in the previous step:

  • Choose the connection and then "Create job".

  • Select the node for your connection on the visual canvas.

  • Add the connection options and enter the necessary information.

    • Add the connection options listed under Prerequisites (secret, database, table).

    • Or create an AWS secret with the connector options and attach it to the connection, as described in Step 2.

  • In the "Job details" tab, enter the job name, choose the IAM role created in Step 1, set any other properties, and choose "Save".
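For jobs edited as a script rather than on the visual canvas, the source node can be sketched roughly as below. The helper builds the keyword arguments for glueContext.create_dynamic_frame.from_options, using the "marketplace.spark" connection type and the "connectionName" option that AWS Glue uses for Marketplace connectors. The helper name build_source_kwargs, the transformation context name, and passing secret, database, and table as plain connection options are assumptions; the script generated by Glue Studio for your job is the authoritative version.

```python
def build_source_kwargs(connection_name, database, table, secret=None):
    """Build kwargs for create_dynamic_frame.from_options (a sketch)."""
    connection_options = {
        "connectionName": connection_name,  # connection created in Step 2
        "database": database,
        "table": table,  # case sensitive
    }
    if secret:
        # Only needed when the secret is not attached to the connection.
        connection_options["secret"] = secret
    return {
        "connection_type": "marketplace.spark",
        "connection_options": connection_options,
        "transformation_ctx": "gcs_source",  # hypothetical name
    }


# Inside a Glue job script, where glueContext is the job's GlueContext:
#   kwargs = build_source_kwargs("my-gcs-connection", "my_database", "my_table")
#   source = glueContext.create_dynamic_frame.from_options(**kwargs)
```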



Step 4: Save and run the job

Run the job after filling in all parameters and creating the connector job.
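The job can also be started outside the console. The sketch below starts a run and polls until it reaches a terminal state, using the Glue start_job_run and get_job_run API calls. The function name run_job_and_wait and the job name in the usage comment are hypothetical; the client is passed in so the function can be exercised without AWS access.

```python
import time

# Job run states after which polling stops.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}


def run_job_and_wait(glue_client, job_name, poll_seconds=30, sleep=time.sleep):
    """Start a Glue job run and block until it finishes.

    Returns a (run_id, final_state) tuple.
    """
    run_id = glue_client.start_job_run(JobName=job_name)["JobRunId"]
    while True:
        response = glue_client.get_job_run(JobName=job_name, RunId=run_id)
        state = response["JobRun"]["JobRunState"]
        if state in TERMINAL_STATES:
            return run_id, state
        sleep(poll_seconds)


# With real credentials, for example:
#   import boto3
#   run_id, state = run_job_and_wait(boto3.client("glue"), "my-gcs-job")
```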

Conclusion


This guide showed how to configure and use the Glue connector for Google Cloud Storage. With the connector in place, your Glue ETL jobs can read data from Google Cloud Storage directly, without first copying that data elsewhere and maintaining the copy on a day-to-day basis.

For more information, reach out to [email protected]
