Using Apache Spark and Glue Job to load Apache Iceberg tables on batch and streaming mode
In this article, Fabio Ramos explains how you can implement Apache Iceberg in your modern data architecture.
In this post, Roopa Venkatesh compares Snowflake and Databricks data analytics platforms to uncover the right fit for your business.
This blog explores Cevo’s Apache Spark on AWS EKS solution, designed to solve some of the challenges of big data analytics.
We will discuss how to set up a remote dev environment on an EMR cluster deployed in a private subnet, using a VPN and the VS Code remote SSH extension. Typical Spark development examples are illustrated while the cluster is shared among multiple users. Overall, this approach offers an effective way to develop Spark apps on EMR and significantly improves the developer experience.
We’ll discuss how to implement data warehousing ETL using Iceberg for data storage/management and Spark for data processing. A PySpark ETL app is used for demonstration in an EMR local environment. Finally, the ETL results are queried with Athena for verification.
We’ll discuss how to create a local Spark dev environment for EMR using Docker and/or VS Code. A range of Spark development examples is demonstrated, and Glue Catalog integration is illustrated as well.