Data Engineering Archives

Snowflake vs Databricks: Which Platform is Best for Your Data Needs?

In this post, Roopa Venkatesh compares Snowflake and Databricks data analytics platforms to uncover the right fit for your business.

Streamlining Financial Crime Analytics with AWS SageMaker: A Comprehensive Guide

In this post, Roopa Venkatesh discusses how AWS SageMaker can be used to streamline financial crime analytics and machine learning at scale.

Simplify Streaming Ingestion on AWS – Part 1 MSK and Redshift

Streaming ingestion from Kafka (MSK) into Redshift and Athena is now much simpler because both services support direct integration. In part 1 of the Simplify Streaming Ingestion on AWS series, we discuss an end-to-end streaming ingestion solution using EventBridge, Lambda, MSK and Redshift. We also use AWS SAM integrated with Terraform to develop a Lambda function locally.

Accelerating Analytics with Apache Spark and Kubernetes

This blog explores Cevo’s Apache Spark on AWS EKS solution, designed to solve some of the challenges of big data analytics.

In AI We Trust – Part 1

This blog explores the key ingredients that make for a successful machine learning use case, and the importance of finding the right balance.

How to configure Kafka consumers to seek offsets by timestamp

This post discusses how to configure a Kafka consumer to seek offsets by timestamp when topic partitions are dynamically assigned by subscription. Docker Compose is used to build a single-node Kafka cluster and run multiple consumer instances.

Data Build Tool (dbt) for Effective Data Transformation on AWS – Part 5 Athena

The data build tool (dbt) is an effective data transformation tool that supports key AWS analytics services – Redshift, Glue, EMR and Athena. In part 5 of the dbt on AWS series, we discuss data transformation pipelines using dbt on Amazon Athena. Subsets of IMDb data are used as the source, and data models are developed in multiple layers according to dbt best practices.

Cevo’s Guide to AWS re:Invent 2022 – Data

Welcome to Cevo’s guide to AWS re:Invent for 2022, where we share session recommendations designed to enhance your cloud transformation journey.

Data Build Tool (dbt) for Effective Data Transformation on AWS – Part 4 EMR on EKS

The data build tool (dbt) is an effective data transformation tool that supports key AWS analytics services – Redshift, Glue, EMR and Athena. In part 4 of the dbt on AWS series, we discuss data transformation pipelines using dbt on Amazon EMR on EKS. Subsets of IMDb data are used as the source, and data models are developed in multiple layers according to dbt best practices.

Data Build Tool (dbt) for Effective Data Transformation on AWS – Part 3 EMR on EC2

The data build tool (dbt) is an effective data transformation tool that supports key AWS analytics services – Redshift, Glue, EMR and Athena. In part 3 of the dbt on AWS series, we discuss data transformation pipelines using dbt on Amazon EMR on EC2. Subsets of IMDb data are used as the source, and data models are developed in multiple layers according to dbt best practices.