Frameworks for Resilient Cloud Solutions

“Our greatest glory is not in never falling, but in rising every time we fall.”

― Confucius

Migrating to the cloud is now common practice, but how can you make your cloud applications resilient to failure? 

We’ve analysed a few frameworks that provide architecture best practices for cloud solutions and list their core principles here. Among the most popular, the Microsoft and AWS “Well-Architected Frameworks” set out in detail how to architect cloud solutions successfully.

Architecting a resilient service for the cloud means considering both Technical and Commercial dimensions, each discussed in more detail below.

Technical Dimensions

When migrating to the cloud, redundancy and fault-tolerance are the most important technical aspects. The common denominator in this section is to design a cloud architecture where individual components can fail without affecting the availability of the entire system. To do so, the following aspects of the cloud solution should always be addressed. 

  • Withstand Delay: applications deployed on the cloud should be designed to manage network delays, including the extreme case where a delay is so long that the instance must be considered lost. The application should be resilient to delays affecting a single instance, a service node or a full service, and should be able to recover from failure without needing to know the status of the failed instances. Idempotent functions are preferable because they make recovery easy: failed messages can simply be replayed. Some organisations have implemented automation that periodically simulates component failures and tests service resilience. It is advisable to be ready for failure, and such practice should be part of the normal routine.
  • Enforce Compliance: after the application is migrated to a cloud infrastructure, it is important to analyse what is happening in production. Instances can behave erratically and configurations are sometimes not applied correctly. This happens especially when manual changes are forced through and violate best-practice principles (e.g. concerning auto-scaling groups).
  • Maintain Health: applications must be monitored to detect signs of unhealthy instances and to be able to proactively intervene before instance failures lead to service degradation. 
  • Keep it Clean: in a cloud environment, the resources allocated to a process should always be matched to its computing load. When they are not, resources are wasted, leading to unjustified cost.
  • Be Secure: provisioned cloud instances must be proactively monitored to make sure their configuration is secure (e.g. they are part of a valid security group and their certificates are not expiring). 
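
To make the idempotency point under Withstand Delay concrete, here is a minimal sketch in Python. It assumes each message carries a unique identifier; the Ledger class, its method names and the sample messages are hypothetical and exist only for illustration, and a real service would keep the processed identifiers in durable storage rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class Ledger:
    """Hypothetical in-memory store standing in for a durable database."""
    balances: dict = field(default_factory=dict)
    processed_ids: set = field(default_factory=set)

    def apply_credit(self, message_id: str, account: str, amount: int) -> bool:
        """Apply a credit at most once per message_id.

        Returns True if the state changed, False for a duplicate delivery.
        """
        if message_id in self.processed_ids:
            return False  # already applied: replaying is harmless
        self.balances[account] = self.balances.get(account, 0) + amount
        self.processed_ids.add(message_id)
        return True


def replay(ledger: Ledger, messages: list) -> int:
    """Replay a whole batch and return how many messages changed state."""
    return sum(ledger.apply_credit(*message) for message in messages)


# Illustrative batch: (message_id, account, amount)
messages = [("m1", "alice", 10), ("m2", "bob", 5), ("m3", "alice", 7)]

ledger = Ledger()
# Simulate a failure after only the first message was applied.
ledger.apply_credit(*messages[0])

# Recovery: replay the full batch without knowing which messages succeeded.
applied = replay(ledger, messages)
```

Because the handler ignores identifiers it has already seen, the recovery step does not need to know where the failure happened; replaying the batch a second time leaves the balances unchanged.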

Commercial Dimensions

It is very easy to let the cost of operating a cloud solution get out of control. Furthermore, each major cloud provider has an ever-growing number of shiny product offerings that make both IT and Business users feel like kids in a candy shop. For these reasons, the solution should be built incrementally, focussing first on quick wins that have high business value and avoiding a capital-intensive solution. This approach helps to keep computing resources efficient, meeting and maintaining system requirements as demand changes and technologies evolve. To do so, the following aspects of the cloud solution should always be considered.
  • Control Cost: the cost of cloud solutions can spiral very quickly, so workloads should be monitored and optimised by using the right resources and sizes. Cost calculators can offer an initial estimate of the operational costs, but for optimal results you should follow a BUILD-MEASURE-OPTIMISE approach. BUILD your initial cloud solution using prepay and on-demand offerings before committing to reserved instances. This allows time to MEASURE and monitor the solution and then OPTIMISE its design to maximise cost reduction. Do not forget to set policies and controls to limit the cost of your solutions.
  • Be wary of Lock-In: many cloud providers make compelling upselling offers that lure you into their world, together with the promise of free training and world-class excellence. Quality varies from service to service, but on average the providers are comparable, so always consider the best-value offer even when it leads you away from your main cloud provider.
  • Look for Volume Deals: once there is a clear understanding of how many resources your cloud solution requires, it is time to think long term and reserve your cloud resources to save in the long run.
  • Leverage what you already have, go hybrid: migrating to the cloud does not prevent your organisation from using the full lifecycle of the resources you have already invested in. If a big infrastructure investment has already been made in a specific domain, it is wise to consider a hybrid approach (mixed on-premises and cloud) and to transition to a full cloud model only after you have made full use of the existing infrastructure.
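
The MEASURE step before committing to reserved capacity can be reduced to simple arithmetic. The sketch below, in Python, compares what the measured usage would have cost on demand with what a reservation covering the same months would have cost. The hourly rates, the usage figures and the function name are all assumptions made up for illustration, not real provider prices, and the model deliberately ignores upfront payments and partial reservations.

```python
HOURS_PER_MONTH = 730  # approximate average number of hours in a month


def should_reserve(measured_hours, on_demand_hourly, reserved_hourly):
    """Return True if reserving would have been cheaper than on-demand.

    measured_hours   -- hours the instance actually ran in each measured month
    on_demand_hourly -- hypothetical on-demand price per hour
    reserved_hourly  -- hypothetical effective reserved price per hour,
                        paid for every hour whether the instance runs or not
    """
    on_demand_total = sum(measured_hours) * on_demand_hourly
    reserved_total = len(measured_hours) * HOURS_PER_MONTH * reserved_hourly
    return reserved_total < on_demand_total


# Illustrative prices only.
ON_DEMAND = 0.10
RESERVED = 0.06

# A steady workload that runs almost all the time.
steady = should_reserve([700, 650, 720], ON_DEMAND, RESERVED)

# A bursty workload that runs a few hours a day.
bursty = should_reserve([200, 150, 300], ON_DEMAND, RESERVED)
```

With these made-up numbers the steady workload justifies a reservation and the bursty one does not, which is why a few months of measured usage are worth collecting before looking for volume deals.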

Are your Cloud Solutions Resilient enough? 

If you would like to discuss further then contact our partner Bruhati as they can help drive business value for your organisation by architecting your cloud solutions for resilience.

For more information please visit or contact us on

Feel free to comment and subscribe to be notified when a new article is published.

Follow the Author on LinkedIn

Manuel Di Toma
