Amazon S3 Outage: What About Cloud Resilience?
Outages happen. They’re inevitable. Today’s example? Amazon S3.
As one tweet declared, “Half the Internet” runs on S3, so when the public cloud is down, small and large businesses alike suffer downtime. While the actual impact of the disruption may not be measurable, we’ll continue to hear commentary and see estimates of this outage’s effects on businesses around the world for days to come.
We ourselves were somewhat affected, though thankfully to a much lesser degree than many others. Our own Zerto technical documentation is housed on Amazon S3, so our product download links were unavailable for a short time.
Bottom line: As the reaction memes roll through our Twitter feeds and discussion of the outage continues, the focus needs to be squarely on the fact that the cloud is not where it needs to be in terms of IT resilience. IT resilience is achieved when a company can respond to a disruption so quickly that end users and customers are not aware that a disruption occurred. We, the community of IT professionals and IT companies, need to build and adopt tools and platforms with redundancy, scalability, and simpler recovery processes.
How does resilience happen?
- Building more than one recovery site
- Using multiple clouds for recovery instead of replication within the same cloud or infrastructure
- Creating hybrid cloud environments where infrastructure “eggs” are not all in “one basket”, so that an outage on one platform is contained and does not take down the entire environment of a company relying on public cloud
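The idea behind these points can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source names and the two fetch functions are hypothetical stand-ins that simulate one platform being down and another being healthy. They are not Zerto or Amazon APIs.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: a named source plus a zero-argument function returning its content.
Source = Tuple[str, Callable[[], bytes]]


def fetch_with_failover(sources: Sequence[Source]) -> Tuple[str, bytes]:
    """Try each source in order and return (name, data) from the first that succeeds."""
    errors: List[str] = []
    for name, fetch in sources:
        try:
            return name, fetch()
        except Exception as exc:  # any failure means: move on to the next platform
            errors.append(f"{name}: {exc}")
    # Every platform failed, so report all the reasons together.
    raise RuntimeError("all sources failed: " + "; ".join(errors))


def primary_cloud() -> bytes:
    # Simulates the primary platform being down (an assumption for this demo).
    raise ConnectionError("service unavailable")


def second_cloud() -> bytes:
    # Simulates a healthy copy hosted on a different platform.
    return b"product download"


if __name__ == "__main__":
    name, data = fetch_with_failover([
        ("primary-cloud", primary_cloud),
        ("second-cloud", second_cloud),
    ])
    print(f"served from {name}: {data!r}")
```

The order of the list encodes preference. In a real deployment the fetch functions would wrap actual storage or HTTP calls, and the copies would need to be kept in sync across platforms, which this sketch does not attempt.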
Let’s all help to build a more resilient world.