3 Months Into an All-In AWS Migration: A Sailboat Retro

Trent Hornibrook

10 July, 2018

Use the wind, keep the anchor light, and mind the rocks – there’s land ahead!

I’m currently Tech Lead on an all-in Amazon Web Services (AWS) migration for the Australian arm of a multinational company. Three months in, I thought sharing what we have learnt so far might be of benefit and interest to the wider community.

To do this, I’m harking back to my Agile roots and using the Sailboat Retro: a great, simple retrospective tool for reflecting on the period just gone as well as articulating future risks and opportunities.

WHAT IS THE SAILBOAT RETRO?

The Sailboat (the team) is heading towards land (the end goal). It has the wind (helping forces) behind it, propelling it to the destination – in this case an amazing, continuously deliverable AWS environment for all of our products and applications.

However, the Sailboat also has an anchor (the items slowing it down) that hampers it from reaching this destination. Further, there might be rocks ahead (risks) that could destroy the Sailboat if it cannot navigate around them.

THE RETRO – LOOKING AT THE PIECES

The Wind

*It’s fun!* It’s not often you get to perform a large-scale AWS migration for part of a multi-billion dollar company! Celebrating the success of each milestone, even if the celebration is small, encourages you to step back and look at the wider landscape instead of the day-to-day rocky waves.

*Only paying for what we use in AWS* While it’s now second nature to anyone who has worked with AWS, it is still extremely powerful to be able to build a brand new environment and validate the workings of a system at minimal cost.

*Leveraging AWS services where possible* Databases are not just for Christmas: when you run a project-scale database you must consider its uptime and resilience, its operational metrics, and its recovery time and recovery point objectives. Having AWS take care of most of this lets us focus on the rest of the migration.

*Build in a learning culture* Mastery forums like technical brown bags and guilds encourage the cross-pollination of knowledge and the validation of tools and techniques. For a large multi-year AWS migration, making sure everyone on the team is learning from others is key to ensuring we’re taking the best approach.

The Anchor

*Get your VPC & networking straight!* Account and VPC network design, and the way those networks communicate, are absolutely critical to get right. Adjusting route tables and network access control lists can be very difficult after that first large deployment into a new account. Invest heavily in getting this right from the start, otherwise refactoring the foundations after the fact in a safe and repeatable way can be awfully time consuming.
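One way to make this concrete is to agree the address plan before the first account is built. The sketch below is a minimal illustration using Python's standard ipaddress module, not the layout we actually used: the supernet, account names, tier names and prefix lengths are all assumptions chosen for the example. It carves one non-overlapping CIDR block per account, then one subnet per tier per availability zone.

```python
from __future__ import annotations

import ipaddress
from itertools import islice, product


def plan_vpc_cidrs(supernet: str, accounts: list[str], vpc_prefix: int = 16) -> dict:
    """Give each account one non-overlapping VPC CIDR carved from the supernet."""
    candidates = ipaddress.ip_network(supernet).subnets(new_prefix=vpc_prefix)
    blocks = list(islice(candidates, len(accounts)))
    if len(blocks) < len(accounts):
        raise ValueError(f"{supernet} cannot hold {len(accounts)} /{vpc_prefix} VPCs")
    return dict(zip(accounts, blocks))


def plan_subnets(vpc, tiers: list[str], zones: list[str], subnet_prefix: int = 20) -> dict:
    """Give each (tier, zone) pair its own subnet inside the VPC block."""
    wanted = list(product(tiers, zones))
    blocks = list(islice(vpc.subnets(new_prefix=subnet_prefix), len(wanted)))
    if len(blocks) < len(wanted):
        raise ValueError(f"{vpc} cannot hold {len(wanted)} /{subnet_prefix} subnets")
    return dict(zip(wanted, blocks))


if __name__ == "__main__":
    # Hypothetical accounts and supernet, purely for illustration.
    vpcs = plan_vpc_cidrs("10.0.0.0/12", ["shared-services", "nonprod", "prod"])
    for account, cidr in vpcs.items():
        print(account, cidr)
    subnets = plan_subnets(vpcs["prod"], ["public", "private", "data"], ["a", "b", "c"])
    for (tier, zone), cidr in subnets.items():
        print(f"prod {tier}-{zone}: {cidr}")
```

Keeping a plan like this in version control means overlaps are caught in review rather than discovered when two VPCs need to be peered.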

*Least privilege from the start – but don’t block work from getting done* Building in an operating model of privilege escalation from the start might take a little more time to get going, but it will pay off in security posture and security culture later. It’s easy to grant wide-ranging access and hard to remove it after the fact. Building a culture of trust but verify (by enabling privilege escalation) may also be a better operating model than imposing draconian security controls that people will look to circumvent anyway.
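As an illustration of what time-boxed escalation could look like, the sketch below builds an IAM policy document that lets someone assume an elevated role only until an expiry time and only with MFA present. This is an assumption about one possible approach, not a description of our actual setup: the role ARN and the one-hour window are hypothetical. The condition keys aws:CurrentTime and aws:MultiFactorAuthPresent are standard IAM global condition keys.

```python
import json
from datetime import datetime, timedelta, timezone


def escalation_policy(role_arn: str, granted_at: datetime, duration: timedelta) -> dict:
    """Build a policy allowing sts:AssumeRole on role_arn until granted_at + duration."""
    if granted_at.tzinfo is None:
        raise ValueError("granted_at must be timezone-aware")
    expires = (granted_at + duration).astimezone(timezone.utc)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": role_arn,
                "Condition": {
                    # Access lapses on its own, so nobody has to remember to revoke it.
                    "DateLessThan": {
                        "aws:CurrentTime": expires.strftime("%Y-%m-%dT%H:%M:%SZ")
                    },
                    # Only sessions authenticated with MFA may escalate.
                    "Bool": {"aws:MultiFactorAuthPresent": "true"},
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical role ARN and grant window, for illustration only.
    policy = escalation_policy(
        "arn:aws:iam::111122223333:role/ElevatedOps",
        datetime.now(timezone.utc),
        timedelta(hours=1),
    )
    print(json.dumps(policy, indent=2))
```

Because the grant expires by itself, the "verify" half of trust but verify becomes a matter of reviewing who asked for escalation and why, rather than chasing down stale permissions.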

*Continuously integrate everything* Making it really easy to automatically deploy a CloudFormation stack via a CI system takes a lot of work to set up in a safe and secure way: IAM roles and trusts, and the governance around access control, all take effort. However, once that capability is in place you’re left with a system that can apply security controls so that everything put into an environment is audited and tracked. It is then really powerful to use that capability so that a commit to a branch triggers an automatic deployment to an environment. Making it easy for people to work within the security and governance framework reduces the likelihood of them circumventing it.
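To show the shape of a commit-to-deploy step, here is a minimal sketch of how a CI job might map a branch to an environment and assemble an `aws cloudformation deploy` command. The branch names, environment names, the `Environment` template parameter and the deploy role ARN are all hypothetical, and the stack naming assumes the application name starts with a letter. The code only builds the argument list; it does not run anything.

```python
import re

# Hypothetical branch-to-environment mapping; any other branch gets a sandbox stack.
BRANCH_ENVIRONMENTS = {"master": "staging", "develop": "dev"}
DEFAULT_ENVIRONMENT = "sandbox"


def environment_for(branch: str) -> str:
    """Return the environment a branch deploys to."""
    return BRANCH_ENVIRONMENTS.get(branch, DEFAULT_ENVIRONMENT)


def stack_name(app: str, branch: str) -> str:
    """Build a stack name using only letters, digits and hyphens (max 128 chars)."""
    env = environment_for(branch)
    if env != DEFAULT_ENVIRONMENT:
        return f"{app}-{env}"
    slug = re.sub(r"[^a-zA-Z0-9]+", "-", branch).strip("-").lower()
    return f"{app}-{env}-{slug}"[:128].rstrip("-")


def deploy_command(app: str, branch: str, template: str, deploy_role_arn: str) -> list:
    """Assemble the CLI arguments for deploying the branch's stack."""
    return [
        "aws", "cloudformation", "deploy",
        "--template-file", template,
        "--stack-name", stack_name(app, branch),
        # CloudFormation acts as this role, so the CI user itself needs few permissions.
        "--role-arn", deploy_role_arn,
        # Assumes the template declares a parameter called Environment.
        "--parameter-overrides", f"Environment={environment_for(branch)}",
        "--no-fail-on-empty-changeset",
    ]


if __name__ == "__main__":
    command = deploy_command(
        "orders",
        "feature/JIRA-12_new",
        "template.yml",
        "arn:aws:iam::111122223333:role/CfnDeploy",
    )
    print(" ".join(command))
```

A CI job could hand the resulting list to `subprocess.run(command, check=True)`. Routing every deployment through one function like this is what makes the audit trail consistent.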

The Rocks

*Momentum is key* A large all-in AWS migration can be long, high risk and expensive. Demonstrating value early on increases trust throughout the wider business and helps build confidence for handling the harder parts of the migration. Once lessons are learnt from the first few application migrations, set aside capacity to address the highest-risk, most difficult components: they likely have a long lead time before a migration can take place and are critical to its success. Invest heavily in those, but only after trust and momentum are built.

*Have a strategy for proprietary & legally bound systems* Some organisations are bound by long data retention laws, which are generally met using backup tape technologies. Once everything is running in AWS it may not be possible to perform a restore from four years ago if all the backups are stored on Digital Linear Tape (anyone running NetBackup with a PX720 and SDLT tapes?). Having a strategy for those situations is key as they too might have a long lead time to execute.

I hope you enjoyed reading this retro outcome – here’s to smooth sailing!