Have you ever found yourself needing to take granular backups of RDS databases? This becomes particularly crucial when using a multi-tenanted approach for RDS, which is quite common in AWS architecture. While AWS Backup service offers a solution for backing up entire RDS instances, AWS lacks a native service for granular-level database backups. This absence poses a challenge for organisations aiming to maintain data integrity and resilience in their AWS environments.
In this blog, we will delve into a serverless solution that not only scales according to your needs but also executes backup processes for multiple databases concurrently, with minimal operational overhead, leveraging the following serverless services:
- AWS Step Functions for orchestrating workflows
- ECS for conducting backups
- AWS Lambda for Slack communications and file reading
- Amazon EventBridge rules for scheduling the backup process
- Amazon S3 for storing backup files
Need for RDS Backups
Granular RDS backups are crucial for efficient data management, enabling selective backup and restoration of specific databases or tables, reducing storage costs, and ensuring faster recovery times. They also support compliance requirements by providing detailed backup data and facilitate adherence to data retention policies by offering flexibility in backup durations. Granular backups play a vital role in maintaining data integrity and availability, optimising storage resources, and minimising downtime in various operational scenarios.
Absence of an AWS native solution
The absence of a streamlined, automated solution for granular backups not only introduces operational overhead but also raises concerns regarding data loss and recovery capabilities. Furthermore, the lack of a native service for such backups within the AWS ecosystem prompts organisations to seek external solutions, often leading to increased costs and integration complexities.
Addressing these challenges requires a comprehensive approach that not only enables granular backups of RDS databases but also ensures scalability, efficiency and minimal operational overhead.
Organisations seek a solution that seamlessly integrates with existing AWS services, automates backup processes, provides flexibility in managing backup schedules and configurations, and offers robust error handling and notification mechanisms. There is a pressing need for a serverless solution that empowers organisations to efficiently manage granular backups of RDS databases within the AWS ecosystem while minimising complexity and operational burden.
Solution Design Considerations
- The solution must be scalable to 400 databases initially, with the ability to add more databases in the future.
- It should be possible to run backups for a predetermined subset of databases, for instance when a user manually starts the Step Function for 10 databases out of a total of 400.
- The process must execute daily to back up all the databases.
- A few large databases should be backed up in parallel; handling them one after another would make the overall run take far too long.
- A list of large databases to be backed up should be provided as a static file in S3, which can be updated by the business user at any time.
- Completion notifications should be sent to the Slack channel.
- Error handling should be implemented for the failure of an individual database. Failure to take a backup of a single database must not impact others.
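- A backup may take more than 15 minutes, which is longer than the maximum run time of a single AWS Lambda invocation, so the backup itself cannot run in Lambda.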
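Several of these requirements come down to deciding which databases belong in which group for a given run. The following is a minimal Python sketch of that decision, not the actual implementation. It assumes the large-database file in S3 is plain text with one database name per line (the file format is not specified here), and the function names `parse_large_databases` and `plan_backups` are hypothetical.

```python
def parse_large_databases(file_body):
    """Parse the 'large databases' file (assumed format: one name per line).

    Blank lines, '#' comments and duplicate names are ignored.
    """
    names = []
    for line in file_body.splitlines():
        name = line.split("#", 1)[0].strip()
        if name and name not in names:
            names.append(name)
    return names


def plan_backups(all_databases, large_databases, requested=None):
    """Split the databases for one run into 'large' and 'regular' groups.

    `requested` is an optional subset for a manual run; when it is None,
    every database is included (the daily run).
    """
    large = set(large_databases)
    if requested is not None:
        wanted = set(requested)
        candidates = [db for db in all_databases if db in wanted]
    else:
        candidates = list(all_databases)
    return {
        "large": [db for db in candidates if db in large],
        "regular": [db for db in candidates if db not in large],
    }
```

For a manual run, passing `requested=["a", "b"]` restricts the plan to those two databases, while the daily run leaves `requested` unset and plans everything.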
Solution Overview
The main components of the backup solution are as follows:
- Elastic Container Registry (ECR) – As a first step, a Docker image containing a shell script is created and pushed to an ECR repository. This image facilitates the execution of granular backups by reading a “large database” file from S3 and excluding listed databases, ensuring efficient backup processes. Execution logs are stored in an S3 bucket for auditing and troubleshooting purposes.
- Elastic Container Service (ECS) – A task definition file is created within ECS, defining the execution environment for backup tasks. RDS credentials are securely retrieved from the Parameter Store as Secrets, while bucket details are passed as environment variables, ensuring secure and seamless integration with other AWS services.
- Lambda Functions – Three distinct Lambda functions are used within the solution architecture:
  - ListLargeDatabasesFromS3: This function reads the “large databases” file from S3, providing the necessary data for backup orchestration.
  - SendCompletionNotification: This function sends completion notifications to the Slack channel, keeping stakeholders informed of the backup status.
  - SendFailureNotification: This function sends a failure notification to Slack when an error occurs during the backup process, so that the failure can be investigated and resolved quickly.
- Amazon EventBridge – An EventBridge rule triggers the Step Function on a daily schedule, automating the execution of backup workflows.
- AWS Step Functions – Serving as the workflow manager, AWS Step Functions orchestrates the backup process, coordinating the execution of tasks across the AWS services listed above.
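To make the orchestration more concrete, here is a rough sketch of how such a state machine could be described in Amazon States Language, built as a Python dictionary. It is an illustration under stated assumptions rather than the actual definition: the state names, the container name `backup`, the `DATABASE_NAME` environment variable and the input shape `{"databases": [{"database": "..."}]}` are all hypothetical, and the step that calls ListLargeDatabasesFromS3 is left out for brevity.

```python
import json


def build_state_machine(cluster_arn, task_definition_arn, max_concurrency=5):
    """Return an Amazon States Language definition as a plain dict."""
    run_backup_task = {
        "Type": "Task",
        # The ".sync" suffix makes Step Functions wait for the ECS task to stop,
        # so a backup can run for longer than Lambda's 15-minute limit.
        "Resource": "arn:aws:states:::ecs:runTask.sync",
        "Parameters": {
            "Cluster": cluster_arn,
            "TaskDefinition": task_definition_arn,
            # Launch type and network settings are omitted; they depend on the cluster.
            "Overrides": {
                "ContainerOverrides": [{
                    "Name": "backup",  # hypothetical container name
                    "Environment": [
                        {"Name": "DATABASE_NAME", "Value.$": "$.database"},
                    ],
                }],
            },
        },
        # Catch every error inside the iteration so that one failed database
        # does not fail the whole Map state.
        "Catch": [{
            "ErrorEquals": ["States.ALL"],
            "ResultPath": "$.error",
            "Next": "NotifyFailure",
        }],
        "End": True,
    }
    notify_failure = {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {"FunctionName": "SendFailureNotification", "Payload.$": "$"},
        "End": True,
    }
    return {
        "StartAt": "BackupDatabases",
        "States": {
            "BackupDatabases": {
                "Type": "Map",
                "ItemsPath": "$.databases",
                "MaxConcurrency": max_concurrency,  # caps parallel ECS tasks
                "ItemProcessor": {
                    "ProcessorConfig": {"Mode": "INLINE"},
                    "StartAt": "RunBackupTask",
                    "States": {
                        "RunBackupTask": run_backup_task,
                        "NotifyFailure": notify_failure,
                    },
                },
                "ResultPath": None,  # discard per-task output, keep the input
                "Next": "NotifyCompletion",
            },
            "NotifyCompletion": {
                "Type": "Task",
                "Resource": "arn:aws:states:::lambda:invoke",
                "Parameters": {"FunctionName": "SendCompletionNotification", "Payload.$": "$"},
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    # Placeholder ARNs; replace with real values before deploying.
    definition = build_state_machine("CLUSTER_ARN", "TASK_DEFINITION_ARN")
    print(json.dumps(definition, indent=2))
```

The design point worth noting is the `Catch` inside the Map iteration: because the error is handled there and routed to the failure notification, the iteration still ends normally and the remaining databases continue to be backed up.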
Benefits
Implementing the proposed serverless solution for granular backups of RDS databases offers numerous benefits to organisations operating within AWS environments:
- Improved Data Resilience: By enabling granular backups at the database level, organisations enhance their ability to recover from data loss or corruption incidents. The automated backup processes ensure that critical data is consistently backed up, reducing the risk of data loss and minimising downtime when a single client database needs to be restored.
- Cost Efficiency: Leveraging serverless services such as AWS Lambda, ECS, and Step Functions optimises resource utilisation and reduces operational costs. You can benefit from a pay-as-you-go model, eliminating the need for upfront investments in infrastructure while only paying for the resources consumed during backup operations.
- Scalability: The solution is designed to scale seamlessly according to the organisation’s needs, accommodating a growing number of databases and adapting to fluctuating workloads. As you expand your AWS footprint or onboard new tenants, the backup solution can scale to meet evolving requirements without manual intervention.
- Operational Efficiency: Automation of backup processes minimises manual intervention, streamlines operations and reduces the risk of human error.
- Enhanced Security: Secure handling of credentials and sensitive data is ensured through integration with AWS Systems Manager Parameter Store for secret management. By leveraging AWS security best practices, organisations can maintain compliance with industry regulations and safeguard sensitive information against unauthorised access or data breaches.
- Real-time Monitoring and Reporting: Detailed execution logs stored in S3 enable real-time monitoring and auditing of backup processes. You gain visibility into backup status, execution times and any errors encountered, allowing for proactive troubleshooting and optimisation of backup workflows.
- Streamlined Collaboration: Integration with Slack for completion and failure notifications promotes collaboration and transparency within cross-functional teams. Stakeholders receive timely updates on backup status, facilitating communication and coordination across different departments or teams involved in data management processes.
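As an illustration of the Slack integration, a notification Lambda could look roughly like the Python sketch below. It assumes a Slack incoming webhook whose URL is stored in a hypothetical `SLACK_WEBHOOK_URL` environment variable, and an event carrying hypothetical `succeeded` and `failed` lists; the real functions may be structured differently.

```python
import json
import os
import urllib.request


def build_slack_message(succeeded, failed):
    """Build the JSON payload for a Slack incoming webhook."""
    total = len(succeeded) + len(failed)
    lines = [f"RDS granular backup finished: {len(succeeded)}/{total} databases succeeded."]
    if failed:
        lines.append("Failed: " + ", ".join(sorted(failed)))
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url, message):
    """POST the message to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def lambda_handler(event, context):
    # The event shape ("succeeded" / "failed" lists) is an assumption.
    message = build_slack_message(event.get("succeeded", []), event.get("failed", []))
    status = post_to_slack(os.environ["SLACK_WEBHOOK_URL"], message)
    return {"status": status}
```

Keeping the message construction separate from the HTTP call makes the wording easy to test locally without sending anything to Slack.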
Conclusion
In conclusion, the serverless solution presented in this blog offers a comprehensive and efficient approach to addressing the challenges associated with granular backups of RDS databases within AWS environments. By leveraging a combination of AWS services such as Elastic Container Registry, Elastic Container Service, Lambda Functions, EventBridge, and Step Functions, users can achieve scalable, automated, and cost-effective backup processes.
The solution’s architecture not only facilitates granular-level backups of RDS databases but also ensures flexibility, scalability, and security in managing backup operations. With features such as automated scheduling, error handling, notification mechanisms, and secure credential management, users can streamline backup processes, minimise operational overhead, and enhance data resilience.
Overall, the serverless solution outlined in this blog empowers users to achieve efficient and reliable granular backups of RDS databases within the AWS ecosystem. By implementing this solution, users can enhance their data management capabilities, mitigate risks associated with data loss or corruption, and drive business continuity in today’s dynamic and data-driven landscape.
In the next blog, we will walk through the CloudFormation code used to deploy this solution, offering practical insights and guidance for implementation.