With our Heads in the Cloud

Let’s peel back the covers on the Cevo hive mind, and watch the team take on the tricky subject of cloud-agnostic solutions.

Steve Mactaggart

In this post we look at the future of cloud computing within the context of vendor lock-in and the increased demand for cloud-agnostic solutions. I’m going to play armchair futurist and, by reflecting on our past, take a look at one possible future for cloud computing.

There has been a lot of engaging discussion amongst the team at Cevo recently around the concept of multi-cloud or cloud-agnostic design.

There is a tendency among technologists to make the snap judgement that a multi-cloud strategy is overly complex and too expensive to maintain. I’m one of those sceptics. And I don’t just throw general scepticism at this issue; I look at the current ecosystem for examples of it being plausible - and I don’t find many. What I do find are many failed attempts at creating a common interface between cloud providers in the hope of achieving this goal.

In most of these failed cases, the rate of change of the different cloud providers and their divergent approaches to features and structure made it near impossible to develop and maintain an abstraction layer sophisticated enough to provide any value, yet generic enough that it didn’t end up tying a solution to one provider over another.

So WHY are we thinking multi-cloud or cloud-agnostic in the first place?

My colleague Mark Danaro put it succinctly as follows:

Every large enterprise’s dilemma about multi-cloud is a political risk-based concern. It is multi-faceted and not solely driven by vendor lock-in.

Firstly, APRA (the Australian Prudential Regulation Authority) - one of the governing bodies regulating the finance sector - has expressed concerns around ‘concentration risk’. Huh? Concentration risk is the risk stemming from all/most/lots of ‘stuff’ being provided by only one supplier. In 2015-16, this was something that APRA worried about because AWS had most of the material public cloud workloads.

Secondly, there are the actual business stakeholders who do not have the risk appetite to go ‘all in’ with just one vendor. Financial institutions traditionally like to hold the upper hand with suppliers, but have been burnt in the past by the likes of IBM and Oracle. This concern drives requirements for things like multi-cloud.

The good news is that most companies don’t want to go multi-cloud - they simply want to know how it can be done, and how they can ‘best’ orchestrate the development of their cloud infra and apps to minimise the pain if a ‘swap’ was required. Having this plan is also all that APRA are seeking.

So, one of the main concerns is not about being multi-cloud, but about being able to manage our risk if the basket in which we put all of our eggs suddenly has a change of heart, raises its prices or even goes out of business.

While most companies can wear the risk of the likes of Google, Microsoft or Amazon going out of business or significantly changing their business model, there is an increasing number of businesses adopting these cloud platforms that we as a society can’t afford to fail.

If all of our banks, financial institutions and utility providers are not prudent with their vendor and technology selection, it is not only them and their shareholders that are at risk if something goes wrong, but we as consumers and our economy in general.

So what do we propose? And what things do we need to consider?

Let loose on the topic, our team debated this across many aspects - starting with thinking about patterns that can be used to abstract away specific implementation details.

@Rhyno voiced:

This is already solved in the dev world. This is something I was working on with one of my open source projects. Just push all the logic into a library. The serverless functions are just proxies that offload to the library. This is called a facade pattern.
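As a rough illustration of @Rhyno’s suggestion (the function names and handler wiring here are hypothetical, not taken from his project): all business logic lives in a plain library with no cloud SDK imports, and each provider’s serverless entry point is a thin proxy into it.

```python
import json

# --- the "library": provider-agnostic business logic ---
def process_order(order: dict) -> dict:
    """Pure logic - no cloud SDK imports anywhere in here."""
    total = sum(item["price"] * item["qty"] for item in order["items"])
    return {"order_id": order["id"], "total": total}

# --- thin proxy for AWS Lambda (hypothetical wiring) ---
def lambda_handler(event, context):
    order = json.loads(event["body"])
    return {"statusCode": 200, "body": json.dumps(process_order(order))}

# --- thin proxy for another provider (hypothetical wiring) ---
def http_handler(request_body: str) -> str:
    return json.dumps(process_order(json.loads(request_body)))
```

The point of the pattern is that only the few-line proxies are provider-specific; swapping clouds means rewriting the proxies, not the logic.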

@PaulB chipped in with:

Can I please point out the obvious …? This started with a discussion around ‘Serverless’. There are many tools in place that can make you pretty agnostic already - dockerisation, portable codebases, Kubernetes/Helm - not to mention a bunch of services you can roll yourself (Apache has dozens).

Of course, I couldn’t be kept quiet, piping in with:

Compute is only one aspect of the cloud - the other two being storage and connectivity - these are much more difficult to deliver in an agnostic way unless they are rolled into the active running of the solution.

We all have experience of disaster recovery (DR) solutions that never work, even when designed around the same tech stack and similar hardware with controlled networking - imagine needing to fail over from AWS to Azure. Have all those NACLs and network connectivity elements been updated to support the different connection methods?

@Colin provided a wise point of view:

The main challenge of multi-cloud is not “how can I package my apps to work in different environments”, it’s “how do I port between different high-level services”?

The answer is, there’s no direct way to do it. As @Rhyno correctly pointed out, the facade pattern gets you there, but then you’re chasing N different APIs, one from each cloud provider, and you never get to leverage the richness of the underlying services. Instead, the least unreasonable approach appears to be “understand the cost of porting/migrating, and build that in to your projects/estimations”.

I piped back in:

Currently the best solutions we have are to limit the interfaces to vendor specific services and use tools and frameworks that are as portable as possible.

For example, Terraform: this is touted as a multi-cloud solution for infrastructure, but in reality it is just a common language and toolkit - the AWS/GCP/Azure services are each defined differently, as they each have different constraints.

There is a lot of value in having that one common language, but we sacrifice the power at the edges of each provider, settling for the lowest common denominator.

@Rhyno reflected on these points adding:

With agnostic serverless stuff, couldn’t we just create interfaces for each service we want to interact with and then just use dependency injection to inject the implementation for each vendor?

So, if I use a queue in my code, I wrap this in an interface and inject the implementation of the queue for each cloud provider.

Quickly followed by a @Colin rebuttal:

Sure, but you’re now either limiting the application code to whatever interface you provide – which means that you’re abstracting over many different implementations, losing the richness of each of them – or you’re constantly chasing multiple vendor’s APIs.

A previous employer tried to do this in order to be “cloud agnostic” - just against AWS, and just for provisioning EC2 instances - and it turned into a team who had to maintain the library constantly. Just for Ruby! And after a year, it was “get off this thing!”

@RobL ever wisely contributed:

I think there’s various levels of approaches to this from fully cloud agnostic (I can deploy my apps on any cloud at any time, and they can only use services that are ‘common’ to all clouds) to the more pragmatic end (the work required for me to move my code from one cloud to another is minimal).

All have their advantages and disadvantages, and it’s up to us to make those making the decisions about this aware of their options.

I continued on the abstraction thread:

The issue with managing an interface layer is that it takes capacity away from the organisation focusing on its actual value. We have a discrete number of engineering hours a week; we can spend them on managing an abstraction or on getting things done.

This is the reason why frameworks exist, to abstract the details - I’m sure if someone built a solid multi-cloud abstraction then people would use it, but I’ve not seen one of those yet.

@RobL reminded us of competitive advantage:

We need to keep in mind the trade-offs/impact of going cloud-agnostic. i.e. if you want to do full cloud agnostic you’re essentially limiting yourself to only using services that are common across all clouds, which means you don’t get to play with the ‘new-shinys’ that get released, and also what it could mean if your competitors are not limiting themselves etc.

@PaulB summarised our discussion with a sage list of considerations:

Maybe there are some considerations pointed out that could be suggested to keep you as agnostic as possible?

Some that come to mind are:

* Choice of codebase

* Choice of delivery (containers as artefacts)

* Choice of event interactions (e.g. queues over events)

* Choice of monitoring (Grafana, Nagios)

* Choice of metric collector (Datadog, New Relic, Prometheus)

* Choice of coding pattern

* Choice of storage

* Choice of database engine + hosting.

This is a great view into what we at Cevo call the hive mind - a group of diverse individuals, with different experiences and paths - all able to communicate in an open, fast flowing and ego-less environment.

This Stuff is Hard

What we were circling was the same dilemma as others before us - this stuff is hard. But I said I was going to go armchair futurist today, so here is where I start to trip off into said future.

I wonder if AWS Outposts is a play for AWS to mitigate this risk - you can run your own API-compatible cloud, but you have to do all the hardware and physical security, etc. If you can wear the risk of AWS going out of business, then let them run it.

The common thread in this discussion is that we are all proponents of good system design. We all want to use patterns like infrastructure as code, resilient design and service decoupling. Currently that is really only achievable on the cloud - but if things like on-premises Kubernetes, OpenStack or any of the cloud-in-a-box options mature, they’ll definitely be considered. I’m not bound to the cloud as a thing. I’m motivated to deliver value to customers as fast as possible, and at the moment it is hard to beat the cloud providers on that front.

If we start to see increased prevalence of cloud-in-a-box options then I’ll be happy to adopt them - but I personally wouldn’t risk running it (I wouldn’t trust myself).

What all this could do is open up a secondary market here for companies to host API compatible clouds - like the data centres do today, only the interaction layers we use are API compatible to current cloud providers.

A Cloud Cycle?

I’ve said for a long time that IT moves in cycles - we centralise, then decentralise. We started with mainframes centralising our compute, then went out to minicomputers. We delivered apps to clients via the desktop, then pulled all this back to the web server with MVC patterns. Then we started moving compute back to the edge with reactive web apps driven by APIs.

Even mobile devices have gone through this cycle starting with web frames, moving on to native apps; now we are starting to see the cycle back towards progressive web apps.

I wonder if the next in-and-out is going to happen in the cloud space. We have been drawing all of our compute and storage and networking in - it’s about time that it starts to go back out.

I think we are seeing the start of this with IoT - moving processing and reaction back out to the edge, while using the central location for aggregation and co-ordination.

Satisfying a risk appetite

But there could be an even more seismic shift if we can start to get cloud-in-a-box solutions, where organisations have the ability to turn the dial on the risk they are willing to accept. They can maintain their own cloud-like system under their control, have it extended from the SaaS cloud and manage which workloads run where.

I foresee organisations not wanting to wear the risk of having a fully SaaS cloud solution, but needing the agility that its API first design allows. They will use the on-demand flexibility from the first party players like AWS/GCP/Azure - but will deliver their production workloads on API compatible solutions managed and hosted by independent third parties.

This ability to seamlessly transition from one supplier to the next will satisfy their risk appetite, while still allowing the speed to market they need to remain competitive.

We might look at this and think that the stranglehold that AWS/GCP/Azure have is too great - and this lead will never be assailed. But remember, many people said that about Microsoft with Windows, Facebook for social or even Google for search - none of these pillars are as dominant as they were; technology and society just keep on moving.