Tag: AWS Glue

Data Build Tool (dbt) for Effective Data Transformation on AWS – Part 2 Glue

The data build tool (dbt) is an effective data transformation tool that supports key AWS analytics services: Redshift, Glue, EMR and Athena. In part 2 of the dbt on AWS series, we discuss data transformation pipelines using dbt on AWS Glue. Subsets of IMDb data are used as the source, and data models are developed in multiple layers according to dbt best practices.

AWS Glue Local Development with Docker and Visual Studio Code

As described on the product page, AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. For development, a development endpoint is recommended, but it can be costly, inconvenient, or unavailable (for Glue 2.0). The AWS Glue team published a Docker image that packages the AWS Glue binaries together with all their dependencies. After inspecting it, I found that some modifications are necessary to build a development environment on it. In this post, I’ll demonstrate how to build development environments for AWS Glue 1.0 and 2.0 using the Docker image and the Visual Studio Code Remote – Containers extension.
