Leveraging event-driven architecture to digitalise fuel dockets at Flybuys
The opportunity at a glance
Cevo helped Flybuys build and launch a new serverless event-routing platform on AWS as part of its Digital Fuel Dockets initiative.
Flybuys is an Australian customer loyalty program owned equally by Coles Group and Wesfarmers through the joint venture Loyalty Pacific. Members collect points by shopping at Coles Group and Wesfarmers brands and at some third-party partners. Points can be redeemed for money off purchases, as well as for vacations and household goods.
Coles fuel discount offers (fuel dockets) are only printed as a barcode on Coles supermarket receipts when a customer makes a qualifying spend. The Digital Fuel Dockets initiative aimed to deliver a digital fuel docket that can be viewed and activated in the Flybuys mobile app and on the Flybuys website. The digital docket can be redeemed at any Coles Express outlet, removing the need for customers to carry old receipts around to use the fuel discount or bonus Flybuys points offer.
The end goal of the initiative was to build and deliver an event-driven, serverless solution that enables partners to push offer-specific events to Flybuys’ systems. A serverless approach was chosen to minimise operational costs and to simplify managing and deploying new features to production.
Cevo was engaged to work with the Flybuys team and help bring the initiative to market, drawing on our extensive AWS experience in cloud-native engineering and DevOps best practices.
The solution, named the “Event Hub”, was built on AWS using serverless technologies, namely Amazon EventBridge, AWS Lambda and Amazon SQS.
The entry point to the system is an Amazon API Gateway, which validates request bodies and headers, applies rate limiting, and validates OAuth tokens and scopes. Flybuys’ partners push events through the API Gateway, which routes them to a Lambda function running in a private VPC. The Lambda function puts these events onto a common event bus, where they are evaluated against a set of rules; matching events are forwarded to a secondary event bus referred to as the ‘partner bus’. The partner bus contains granular, domain-specific rules that route events to SQS queues configured as targets. Downstream services poll these queues and process the events in a timely manner.
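The two-tier routing described above can be illustrated with a small in-memory simulation. This is a sketch, not the Event Hub code: it models only a subset of EventBridge event-pattern syntax (arrays of exact values, with nested objects for fields such as `detail`), and the `EventBus` class, the `partner.coles` source, the `FuelDocketIssued` detail-type and the `channel` field are all hypothetical names chosen for illustration.

```typescript
// In-memory sketch of two-tier content-based routing, loosely modelled on EventBridge.
// Assumption: only arrays of exact values and nested objects are supported as patterns.
type EventPattern = { [field: string]: Array<string | number> | EventPattern };

type BusEvent = {
  source: string;
  "detail-type": string;
  detail: Record<string, unknown>;
};

// Every field in the pattern must match (AND); any value in an array may match (OR).
function matches(pattern: EventPattern, event: Record<string, unknown>): boolean {
  return Object.entries(pattern).every(([field, expected]) => {
    const actual = event[field];
    if (Array.isArray(expected)) {
      return expected.some((value) => value === actual);
    }
    // Nested pattern: recurse into the corresponding object in the event.
    return (
      typeof actual === "object" &&
      actual !== null &&
      matches(expected, actual as Record<string, unknown>)
    );
  });
}

type Rule = { name: string; pattern: EventPattern; target: (event: BusEvent) => void };

// Hypothetical stand-in for an event bus: delivers to every matching rule's target.
class EventBus {
  private readonly rules: Rule[] = [];

  addRule(rule: Rule): void {
    this.rules.push(rule);
  }

  // Returns the number of rules that matched the event.
  put(event: BusEvent): number {
    const matched = this.rules.filter((rule) => matches(rule.pattern, event));
    matched.forEach((rule) => rule.target(event));
    return matched.length;
  }
}

const fuelDocketQueue: BusEvent[] = []; // stands in for an SQS queue target

// Partner bus: granular, domain-specific rule routing to a queue.
const partnerBus = new EventBus();
partnerBus.addRule({
  name: "fuel-dockets-to-queue",
  pattern: { "detail-type": ["FuelDocketIssued"], detail: { channel: ["app", "web"] } },
  target: (event) => {
    fuelDocketQueue.push(event);
  },
});

// Common bus: coarse rule forwarding a partner's events to the partner bus.
const commonBus = new EventBus();
commonBus.addRule({
  name: "coles-events-to-partner-bus",
  pattern: { source: ["partner.coles"] },
  target: (event) => {
    partnerBus.put(event);
  },
});

commonBus.put({ source: "partner.coles", "detail-type": "FuelDocketIssued", detail: { channel: "app" } });
commonBus.put({ source: "partner.coles", "detail-type": "BonusPointsOffer", detail: {} });
console.log(fuelDocketQueue.length); // 1: only the fuel docket event reaches the queue
```

Keeping the common-bus rules coarse (by source) and the partner-bus rules granular (by detail-type and payload fields) means a new downstream consumer only needs a new partner-bus rule and queue, without touching the entry point.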
Prevention of data loss
If a matching rule's target, such as an SQS queue, is unavailable due to an outage, EventBridge retries sending the event for up to 24 hours using exponential backoff. If the event still cannot be delivered after 24 hours, it is sent to the dead-letter queue (DLQ) associated with the rule target. This relatively simple safeguard was implemented to prevent data loss caused by service-level failures.
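The retry-then-dead-letter behaviour can be sketched as a simulated delivery loop. EventBridge's documented default retry policy is up to 24 hours and up to 185 attempts, with exponential backoff and jitter; the actual delay schedule is managed by the service. The base delay, delay cap and the `deliverWithRetry` function below are illustrative assumptions, and jitter is omitted so the simulation is deterministic.

```typescript
// Simulated delivery with exponential backoff and a dead-letter queue.
// Assumptions: baseDelaySeconds and maxDelaySeconds are illustrative; no jitter is applied.
type DeliveryResult = {
  status: "delivered" | "dead-lettered";
  attempts: number;
  elapsedSeconds: number; // simulated time spent waiting between attempts
};

function deliverWithRetry<T>(
  event: T,
  send: (event: T, attempt: number) => boolean, // true when the target accepts the event
  deadLetterQueue: T[],
  maxEventAgeSeconds = 24 * 60 * 60,
  maxAttempts = 185,
  baseDelaySeconds = 1,
  maxDelaySeconds = 15 * 60,
): DeliveryResult {
  let attempts = 0;
  let elapsed = 0;
  while (true) {
    attempts++;
    if (send(event, attempts)) {
      return { status: "delivered", attempts, elapsedSeconds: elapsed };
    }
    if (attempts >= maxAttempts) break;
    // Exponential backoff: 1s, 2s, 4s, ... capped at maxDelaySeconds.
    const delay = Math.min(baseDelaySeconds * 2 ** (attempts - 1), maxDelaySeconds);
    // Give up once the next retry would exceed the maximum event age.
    if (elapsed + delay > maxEventAgeSeconds) break;
    elapsed += delay;
  }
  // Retries exhausted: park the event in the DLQ so it is not lost.
  deadLetterQueue.push(event);
  return { status: "dead-lettered", attempts, elapsedSeconds: elapsed };
}

const dlq: string[] = [];
// A target that recovers on its third attempt.
const recovered = deliverWithRetry("evt-1", (_event, attempt) => attempt >= 3, dlq);
// A target that never recovers within 24 hours.
const failed = deliverWithRetry("evt-2", () => false, dlq);
console.log(recovered.status, failed.status); // delivered dead-lettered
```

Events that end up in the DLQ can later be inspected and redriven once the target is healthy again.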
In addition to dead-letter queues, the ‘event archive’ feature of Amazon EventBridge is used so that events can be replayed. The archive is attached to the common event bus, captures every event sent to that bus and retains the events indefinitely. This safeguards against code-level bugs that cause events to be incorrectly marked as successfully processed. Because all events are archived, once such a bug has been fixed EventBridge can be instructed to replay the affected events by sending them back to the event bus.
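The archive-and-replay safeguard can be sketched as an in-memory archive that records every event with its arrival time and re-sends a chosen time window. In EventBridge itself a replay is started through the `StartReplay` API with a start and end time; the `EventArchive` class, the numeric timestamps and the docket event shape below are hypothetical simplifications.

```typescript
// Simulated event archive with time-window replay.
// Assumption: times are plain numbers (e.g. epoch seconds); real replays use StartReplay.
type ArchivedEvent<T> = { time: number; event: T };

class EventArchive<T> {
  private readonly entries: ArchivedEvent<T>[] = [];

  // Called for every event put on the common bus (no filtering pattern).
  record(time: number, event: T): void {
    this.entries.push({ time, event });
  }

  // Re-sends archived events with start <= time < end; returns how many were replayed.
  replay(start: number, end: number, putEvent: (event: T) => void): number {
    const selected = this.entries.filter((e) => e.time >= start && e.time < end);
    selected.forEach((e) => putEvent(e.event));
    return selected.length;
  }
}

type DocketEvent = { docketId: string };
const archive = new EventArchive<DocketEvent>();
[{ docketId: "d1" }, { docketId: "d2" }, { docketId: "d3" }].forEach((event, i) =>
  archive.record(1000 + i, event),
);

// Hypothetical fixed consumer: a Set keyed by docket ID makes reprocessing idempotent.
const persistedDockets = new Set<string>();
const replayedCount = archive.replay(1000, 1002, (event) => {
  persistedDockets.add(event.docketId);
});
console.log(replayedCount); // 2 (d1 and d2 fall inside the window)
```

Because a replay re-delivers events that may already have been handled, consumers are easier to reason about when processing the same event twice has no extra effect, as the `Set` does here.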
Continuous Integration and Deployment
The infrastructure components were defined and deployed using the AWS Cloud Development Kit (CDK). The CDK application synthesises the stacks and deploys them into the relevant AWS accounts. This approach accelerated the deployment process and helped achieve a high level of repeatability, reliability and automation.
A goal of the project was zero downtime when deploying the Event Hub. This was achieved by using AWS CodeDeploy to deploy the Lambda function with a canary deployment strategy. When an update to the Lambda function is deployed, CodeDeploy shifts traffic from the old version of the function to the new version in 10% increments over a 10-minute period. An Amazon CloudWatch alarm monitors the error rate of the function during the deployment and, if triggered, automatically rolls the deployment back to the previous version of the function.
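The gradual traffic shift with alarm-based rollback can be sketched as a simple simulation. For reference, the 10%-per-step behaviour described above corresponds to CodeDeploy's predefined `LambdaLinear10PercentEvery1Minute` configuration (`LambdaDeploymentConfig.LINEAR_10PERCENT_EVERY_1MINUTE` in the CDK), whereas CodeDeploy's canary configurations, such as `LambdaCanary10Percent10Minutes`, shift one initial slice and then the remainder. The `shiftTraffic` function and the alarm callback below are hypothetical; CodeDeploy performs this orchestration itself.

```typescript
// Simulates gradual traffic shifting between two Lambda versions with rollback.
// Assumption: the CloudWatch alarm is modelled as a callback checked after each interval.
type ShiftStep = { minute: number; newVersionPercent: number };

function shiftTraffic(
  stepPercent: number,
  intervalMinutes: number,
  isInAlarm: (minute: number, newVersionPercent: number) => boolean,
): { outcome: "succeeded" | "rolled-back"; steps: ShiftStep[] } {
  if (stepPercent <= 0) throw new RangeError("stepPercent must be positive");
  const steps: ShiftStep[] = [];
  let percent = 0;
  let minute = 0;
  while (percent < 100) {
    percent = Math.min(100, percent + stepPercent);
    steps.push({ minute, newVersionPercent: percent });
    minute += intervalMinutes;
    if (isInAlarm(minute, percent)) {
      // Roll back: route all traffic to the previous version again.
      steps.push({ minute, newVersionPercent: 0 });
      return { outcome: "rolled-back", steps };
    }
  }
  return { outcome: "succeeded", steps };
}

// 10% every minute with a healthy function: ten steps over 10 minutes.
const healthy = shiftTraffic(10, 1, () => false);
// An error spike once 30% of traffic reaches the new version triggers a rollback.
const faulty = shiftTraffic(10, 1, (_minute, percent) => percent >= 30);
console.log(healthy.outcome, faulty.outcome); // succeeded rolled-back
```

A fault in the new version therefore affects only the slice of traffic already shifted before the alarm fires.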
Observability, monitoring and tracing
Because of the distributed nature of the system, we wanted good visibility of events as they traversed its various components. To support this, multiple dashboards and monitors were built using a popular cloud-scale observability service.
The dashboards provide a near-real-time view of the health of the system, showing metrics such as the number of events sent to and received from queues, and whether any events have been delivered to dead-letter queues because of outages.
A number of monitors were also created to trigger alerts when predefined thresholds are exceeded. This enables the IT operations team to triage and act on infrastructure-related issues in a timely manner.
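A threshold monitor of this kind can be sketched as a check over a series of metric datapoints that alerts when the threshold is breached for a number of consecutive periods. This is vendor-neutral and not the configuration of the observability service used; the monitor names and thresholds are hypothetical. Plausible inputs are SQS CloudWatch metrics such as `ApproximateNumberOfMessagesVisible` for a DLQ and `ApproximateAgeOfOldestMessage` (in seconds) for a work queue.

```typescript
// Minimal threshold monitor: alert when a metric exceeds its threshold for
// N consecutive evaluation periods. Names and thresholds are hypothetical.
type Monitor = { name: string; threshold: number; consecutivePeriods: number };

function shouldAlert(monitor: Monitor, datapoints: number[]): boolean {
  let breaches = 0;
  for (const value of datapoints) {
    // Reset the run whenever a datapoint is back within the threshold.
    breaches = value > monitor.threshold ? breaches + 1 : 0;
    if (breaches >= monitor.consecutivePeriods) return true;
  }
  return false;
}

// Any message landing in a dead-letter queue should alert immediately.
const dlqDepth: Monitor = { name: "partner-dlq-not-empty", threshold: 0, consecutivePeriods: 1 };
// An oldest message older than 5 minutes for 3 periods suggests a stalled consumer.
const queueAge: Monitor = { name: "fuel-docket-queue-backlog", threshold: 300, consecutivePeriods: 3 };

console.log(shouldAlert(dlqDepth, [0, 0, 1])); // true
console.log(shouldAlert(queueAge, [400, 400, 120, 400])); // false: the breach run is broken
```

Requiring consecutive breaches for backlog-style metrics reduces noise from short spikes, while a DLQ monitor alerts on the first message because any dead-lettered event needs attention.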
Great care was taken to design and build the Event Hub so that it is extensible and reusable across future initiatives.
The above solution enables Flybuys to:
- Improve DevOps practices and utilise the right tools, unlocking business agility and innovation
- Reduce operational costs through pay-per-use pricing
- Manage and deploy changes rapidly, with full automation significantly reducing deployment errors
- Reduce operational overhead by using AWS serverless technologies
- Use canary deployments of the Lambda function to limit service disruption
- Create new partner event buses with minimal effort by reusing custom AWS CDK constructs
- Decouple services to enable parts of the overall system to be deployed independently from each other
- Adopt automated testing frameworks and mocking services
- Implement secure, delegated access for client applications using the OAuth 2.0 authorization framework
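The OAuth token and scope validation performed at the entry point can be sketched as a check on already-decoded access-token claims. In practice API Gateway usually performs these checks through a configured authorizer (for example, a Cognito user pool authorizer with authorization scopes, or a JWT authorizer) rather than hand-written code. The `authorise` function and the `events/write` scope below are hypothetical, and the sketch assumes the token's signature has already been verified.

```typescript
// Sketch of the expiry and scope checks an authorizer performs on an access token.
// Assumption: the token signature has already been verified and its claims decoded.
type AccessTokenClaims = { sub: string; exp: number; scope?: string };
type AuthDecision = { allowed: boolean; reason?: string };

function authorise(
  claims: AccessTokenClaims,
  requiredScopes: string[],
  nowEpochSeconds: number,
): AuthDecision {
  // "exp" is a NumericDate: seconds since the Unix epoch (RFC 7519).
  if (claims.exp <= nowEpochSeconds) {
    return { allowed: false, reason: "token expired" };
  }
  // OAuth 2.0 scopes are space-delimited, case-sensitive strings (RFC 6749, section 3.3).
  const granted = new Set((claims.scope ?? "").split(" ").filter((s) => s.length > 0));
  const missing = requiredScopes.filter((scope) => !granted.has(scope));
  if (missing.length > 0) {
    return { allowed: false, reason: `missing scope(s): ${missing.join(", ")}` };
  }
  return { allowed: true };
}

// Hypothetical scope for a partner client allowed to push events.
const partnerClaims: AccessTokenClaims = { sub: "partner-client", exp: 2000000000, scope: "events/write" };
console.log(authorise(partnerClaims, ["events/write"], 1700000000).allowed); // true
console.log(authorise(partnerClaims, ["events/admin"], 1700000000).reason); // missing scope(s): events/admin
```

Scoping each partner's token to only the operations it needs limits what a leaked credential could do.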