Apache Spark Archives


Using Apache Spark and AWS Glue Jobs to Load Apache Iceberg Tables in Batch and Streaming Mode

In this article, Fabio Ramos explains how you can implement Apache Iceberg in your modern data architecture.

Snowflake vs Databricks: Which Platform is Best for Your Data Needs?

In this post, Roopa Venkatesh compares the Snowflake and Databricks data analytics platforms to help you find the right fit for your business.

Accelerating Analytics with Apache Spark and Kubernetes

This post explores Cevo’s Apache Spark on Amazon EKS solution, which is designed to address some of the challenges of big data analytics.

Develop and Test Apache Spark Apps for EMR Remotely Using Visual Studio Code

We’ll discuss how to set up a remote development environment on an EMR cluster deployed in a private subnet, using a VPN and the VS Code Remote - SSH extension. Typical Spark development examples are illustrated with the cluster shared among multiple users. Overall, this approach provides an effective way to develop Spark apps on EMR and significantly improves the developer experience.

Data Warehousing ETL Demo with Apache Iceberg on EMR Local Environment

We’ll discuss how to implement data warehousing ETL using Iceberg for data storage and management and Spark for data processing. A PySpark ETL app is used for the demonstration in an EMR local environment. Finally, the ETL results are queried with Athena for verification.

Develop and Test Apache Spark Apps for EMR Locally Using Docker

We’ll discuss how to create a local Spark development environment for EMR using Docker and/or VS Code. A range of Spark development examples is demonstrated, and Glue Catalog integration is illustrated as well.