Simplify Streaming Ingestion on AWS – Part 1 MSK and Redshift

Streaming ingestion from Kafka (MSK) into Redshift and Athena is now much simpler because both services support direct integration. In part 1 of the Simplify Streaming Ingestion on AWS series, we discuss an end-to-end streaming ingestion solution using EventBridge, Lambda, MSK and Redshift. We also use AWS SAM integrated with Terraform to develop a Lambda function locally.

In AI We Trust – Part 1

This post explores the key ingredients of a successful machine learning use case and the importance of finding the right balance.

How to configure Kafka consumers to seek offsets by timestamp

We discuss how to configure a Kafka consumer to seek offsets by timestamp when topic partitions are dynamically assigned by subscription. Docker Compose is used to build a single-node Kafka cluster and run multiple consumer instances.
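
As a rough illustration of the idea, the sketch below seeks each newly assigned partition to the first offset at or after a target timestamp. It is not taken from the post itself. It assumes a consumer object exposing `offsets_for_times`, `seek` and `seek_to_end` in the style of the kafka-python `KafkaConsumer`, and the helper name `seek_to_timestamp` is hypothetical.

```python
def seek_to_timestamp(consumer, partitions, timestamp_ms):
    """Seek every partition to the first offset at or after timestamp_ms.

    Assumption: `consumer` behaves like kafka-python's KafkaConsumer, where
    offsets_for_times() maps each partition to an object with an `.offset`
    attribute, or to None when no message exists at or after the timestamp.
    """
    found = consumer.offsets_for_times({tp: timestamp_ms for tp in partitions})
    positions = {}
    for tp in partitions:
        hit = found.get(tp)
        if hit is None:
            # Nothing at or after the timestamp: wait for new messages only.
            consumer.seek_to_end(tp)
            positions[tp] = None
        else:
            consumer.seek(tp, hit.offset)
            positions[tp] = hit.offset
    return positions
```

Because partitions are assigned dynamically under subscription, a helper like this would typically be called from the rebalance listener's partitions-assigned callback rather than once at start-up, so that each consumer instance seeks only the partitions it currently owns.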

Data Build Tool (dbt) for Effective Data Transformation on AWS – Part 5 Athena

The data build tool (dbt) is an effective data transformation tool that supports key AWS analytics services – Redshift, Glue, EMR and Athena. In part 5 of the dbt on AWS series, we discuss data transformation pipelines using dbt on Amazon Athena. Subsets of IMDb data are used as the source, and data models are developed in multiple layers according to dbt best practices.

Data Build Tool (dbt) for Effective Data Transformation on AWS – Part 4 EMR on EKS

The data build tool (dbt) is an effective data transformation tool that supports key AWS analytics services – Redshift, Glue, EMR and Athena. In part 4 of the dbt on AWS series, we discuss data transformation pipelines using dbt on Amazon EMR on EKS. Subsets of IMDb data are used as the source, and data models are developed in multiple layers according to dbt best practices.

Data Build Tool (dbt) for Effective Data Transformation on AWS – Part 3 EMR on EC2

The data build tool (dbt) is an effective data transformation tool that supports key AWS analytics services – Redshift, Glue, EMR and Athena. In part 3 of the dbt on AWS series, we discuss data transformation pipelines using dbt on Amazon EMR on EC2. Subsets of IMDb data are used as the source, and data models are developed in multiple layers according to dbt best practices.