Tag: Data Engineering

Transforming Data Engineering with DevOps on the Databricks Platform 

The role of the data engineer is rapidly changing from writing ETL scripts to engineering production-grade data products. On the Databricks Lakehouse Platform, this shift demands more than technical know-how; it requires a DevOps mindset. By embracing software engineering best practices, automated testing, and CI/CD pipelines, data teams can deliver scalable, reliable, and secure solutions. This blog explores how DevOps principles and tools such as Git Folders and Databricks Asset Bundles are transforming data engineering into a discipline of continuous innovation and delivery.
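As one flavour of the automated testing the post advocates, the sketch below unit-tests a small PySpark transformation with pytest, the kind of check a CI pipeline can run before a Databricks Asset Bundle deploys a job. The `add_revenue` function and its column names are hypothetical, not taken from the post.

```python
# test_transforms.py - a minimal sketch of unit-testing a pipeline
# transformation with pytest, assuming plain pyspark (or
# databricks-connect) is available in the CI environment.
import pytest
from pyspark.sql import DataFrame, SparkSession
import pyspark.sql.functions as F


def add_revenue(df: DataFrame) -> DataFrame:
    """Hypothetical transformation under test: revenue = price * quantity."""
    return df.withColumn("revenue", F.col("price") * F.col("quantity"))


@pytest.fixture(scope="session")
def spark():
    # A small local session is enough for logic tests in CI.
    return SparkSession.builder.master("local[1]").appName("tests").getOrCreate()


def test_add_revenue(spark):
    df = spark.createDataFrame([(2.0, 3), (5.0, 0)], ["price", "quantity"])
    result = add_revenue(df).collect()
    assert [row.revenue for row in result] == [6.0, 0.0]
```

Keeping transformations as pure functions like this is what makes them testable outside a workspace, which is the core of the DevOps mindset the post describes.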

Read more

Data Build Tool (dbt) for Effective Data Transformation on AWS – Part 2: Glue

The data build tool (dbt) is an effective data transformation tool that supports the key AWS analytics services: Redshift, Glue, EMR, and Athena. In part 2 of the dbt-on-AWS series, we discuss building data transformation pipelines with dbt on AWS Glue. Subsets of IMDb data are used as the source, and data models are developed in multiple layers following dbt best practices.
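As a rough illustration of running layered models, the sketch below drives dbt programmatically via the `dbtRunner` interface (available in dbt-core 1.5+). The staging/intermediate/marts names follow dbt's recommended project layout and are assumptions about the project structure, not the post's exact setup.

```python
# run_layers.py - a minimal sketch of running dbt models layer by layer,
# assuming dbt-core >= 1.5 and a project whose models live under
# models/staging, models/intermediate, and models/marts.
from dbt.cli.main import dbtRunner, dbtRunnerResult

dbt = dbtRunner()

for layer in ["staging", "intermediate", "marts"]:
    # --select <dir> matches every model under models/<dir>/
    res: dbtRunnerResult = dbt.invoke(["run", "--select", layer])
    if not res.success:
        raise RuntimeError(f"dbt run failed for the {layer} layer")
```

Running layers in order like this mirrors the staged dependency flow that dbt's best-practice layout encodes, though in practice a single `dbt run` resolves the DAG ordering on its own.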

Read more

Serverless Application Model (SAM) for Data Professionals

We discuss how to build a serverless data processing application using the Serverless Application Model (SAM). A Lambda function is developed that is triggered whenever an object is created in an S3 bucket, and the third-party packages needed for data processing are made available through Lambda layers.
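As a hedged sketch of what such a handler can look like, the snippet below processes each `s3:ObjectCreated` record with pandas. The function name, the CSV assumption, and the use of pandas/s3fs shipped in a Lambda layer are illustrative choices, not the post's exact implementation.

```python
# handler.py - a minimal sketch of an S3-triggered Lambda handler,
# assuming pandas and s3fs are supplied by a Lambda layer.
import urllib.parse

import pandas as pd  # provided via a Lambda layer (assumption)


def handler(event, context):
    """Process every object referenced in an s3:ObjectCreated notification."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        # Reading s3:// paths with pandas requires s3fs in the layer.
        df = pd.read_csv(f"s3://{bucket}/{key}")
        print(f"Processed {len(df)} rows from s3://{bucket}/{key}")
    return {"statusCode": 200}
```

In a SAM template, wiring this up is a matter of declaring the function with an `S3` event source and listing the layer ARN under `Layers`.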

Read more