Building Serverless PySpark Jobs with EMR-Serverless and MWAA
In this blog, Jayaananth Jayaram highlights how EMR Serverless PySpark jobs on MWAA can revolutionise big data processing and analysis.
We will discuss how to set up a remote dev environment on an EMR cluster deployed in a private subnet, using a VPN and the VS Code remote SSH extension. Typical Spark development examples are illustrated with the cluster shared among multiple users. Overall, this is an effective way of developing Spark apps on EMR, and it improves the developer experience significantly.
We’ll discuss how to create a local Spark dev environment for EMR using Docker and/or VS Code. A range of Spark development examples is demonstrated, and Glue Catalog integration is illustrated as well.
Yet another serverless solution for invoking AWS Lambda at a sub-minute frequency
As described on the product page, AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. For development, a development endpoint is recommended, but it can be costly, inconvenient or unavailable (for Glue 2.0). The AWS Glue team published a Docker image that includes the AWS Glue binaries and all the dependencies packaged together. After inspecting it, I found that some modifications are necessary in order to build a development environment on it. In this post, I’ll demonstrate how to build development environments for AWS Glue 1.0 and 2.0 using the Docker image and the Visual Studio Code Remote – Containers extension.
Cevo trades as Cevo (VIC) Pty Ltd and Cevo (NSW) Pty Ltd | © All Rights Reserved Cevo™
Cevo acknowledges the Traditional Owners of the land on which our offices are situated, and pays its respects to their Elders past, present and emerging.