Calls for cloud diversification after global AWS outage

Johannesburg, 21 Oct 2025

The global AWS outage showed that online services are heavily dependent on a handful of infrastructure and cloud service providers. (Image source: 123RF, created via GenAI)

Analysts and industry players are urging organisations to review their cloud computing environments following the global outage experienced yesterday by Amazon Web Services (AWS).

The outage disrupted internet services worldwide, affecting popular platforms such as Snapchat, Fortnite, Starbucks, Ring and Alexa.

The disruption, traced to a DNS and network load-balancer failure in AWS’s US-EAST-1 region, caused login errors, payment issues, and downtime across banks, airlines and government services.

Although Amazon restored most systems, the incident highlighted the global dependency on a few cloud providers and the risks of centralised digital infrastructure.

According to estimates from Synergy Research Group, Amazon’s market share in the worldwide cloud infrastructure market amounted to 30% in the second quarter of 2025, ahead of Microsoft's Azure platform at 20% and Google Cloud at 13%.

AWS says all its services have fully recovered, noting that the disruption originated in the US-EAST-1 region, where DNS resolution issues in DynamoDB triggered widespread failures across dependent services such as EC2, Lambda, CloudWatch and Network Load Balancers.

Dependency syndrome

Commenting on the incident, Ian Jansen van Rensburg, head of security engineering for Africa at Check Point Software Technologies, says: “Whether this AWS outage was the result of human error, a system failure, or a cyber attack, the issue remains the same – online services are heavily dependent on a handful of infrastructure and cloud service providers.”

According to Jansen van Rensburg, incidents such as last year’s CrowdStrike update, which caused a global IT outage, or the more recent shutdown of several European airports following a cyber attack on software, show how dependent we are on these services.

He notes that an outage or cyber attack on one link in the supply chain disrupts the entire chain.

“The AWS outage is another reminder that the digital world doesn’t stop at borders – a local fault can ripple worldwide in minutes. We’ve built convenience on shared systems, but resilience still depends on people and process.

“For individuals, that means keeping good backups, saving key information offline, and knowing alternative ways to connect or pay if systems fail. Stay alert for scams or phishing attempts – especially when banking sites are down – and never click links or share details you don’t recognise.”

For organisations, Jansen van Rensburg believes it’s time to diversify. “Don’t keep everything in one cloud. Test your failovers, train your teams, and plan for downtime before it arrives.

“When companies rush to restore access, systems and staff are stretched thin – and that’s when attackers strike. Expect a spike in fake ‘refund’ or ‘discount’ offers, phishing e-mails, and scam links claiming to fix the problem.

“It’s not just businesses at risk. Many of the affected platforms are games and apps used by children – a prime time for scammers to exploit trust. Because the internet may be global, but resilience starts local – with what each of us does next.”

For Rafe Pilling, director of threat intelligence at Sophos, when anything like this happens, the concern that it's a cyber incident is understandable.

“AWS has a far-reaching and intricate footprint, so any issue can cause a major upset. In this case, it looks like it is an IT issue on the database side and they will be working to remedy it as an absolute priority.”

Genie out of the bottle

Mark Walker, MD of IDC South Africa, says many enterprise users, including banks, government departments, insurance houses, retailers and educational institutions, use AWS and had interruptions to their online-based cloud services.

He notes that the degree of interruption varied depending on the applications and localities in use.

“Local software developers also experienced interruptions given the fundamental nature of AWS services underpinning tools and platforms they use. In most cases, alternative routing or loading options were put in place to minimise downtime.

“The cloud genie is already out of the bottle and highly reliable, flexible and secure so a hybrid or diversified approach that lessens operational risk by classifying those applications and data that are kept on site versus completely in the cloud is critical.”

According to Walker, most organisations already have risk mitigation policies in place; however, a fresh review of system architecture will very likely be a top agenda item at both AWS and enterprise users following this event.

“A hard day at the office for AWS. We forget that ultimately technology can (and will) fail so the importance of strong architecture reviews and risk management are critical, especially as we become more reliant on hyperscalers to deliver but even more so as AI becomes baked into these systems.”

Striking a balance

According to Christopher Geerdts, MD of BMIT, the outage was significant and impacted South Africans consuming a range of services across various categories – social media, gaming, content, video conferencing and collaboration, payment and commercial trading platforms.

“The direct impact seems to have been less than in the United States and United Kingdom, but Standard Bank was definitely impacted directly and many South African companies use the platforms that were impacted – for example, those who use Canva or Zoom within their operations,” says Geerdts.

He says resilience is always recommended in the face of such outages. He points out that AWS has been down three times in the last few years, and that BMIT expects cyber attacks on platforms in general to increase in frequency and intensity, making downtime more frequent.

“There is always a balance for companies between spending more on resilience versus relying on one service, especially for companies in start-up phase, and the best is to determine which services are mission-critical and find solutions for those.

“The AWS issues appear to have all been related to one region, and therefore replication at another region seems the most cost-effective solution, whilst using another cloud provider would enable additional redundancy, but would then double many of the costs.”

However, he says achieving resilience can be complex. “In cases such as the AWS failure, which involved a database service, having a redundant solution requires that data be synchronised in real-time.

“Furthermore, many companies consume a bouquet of interconnected cloud services from one cloud provider, and replication is complex. In addition, much of the impact was indirect – being on collaborative or payment platforms – which cannot simply be replicated.”
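
For readers who want to see what regional replication looks like from the application side, the failover Geerdts describes can be sketched in a few lines of Python. This is a minimal illustration, not a description of how AWS or any company quoted here implements it: the function names, the `demo_request` stand-in and the choice of `eu-west-1` as a second region are all assumptions made for the example. It also sidesteps the harder problem he raises, which is keeping data synchronised between regions in real time.

```python
from typing import Callable, Sequence


class AllRegionsFailed(Exception):
    """Raised when no configured region could serve the request."""


def call_with_region_failover(
    regions: Sequence[str],
    request: Callable[[str], str],
) -> tuple[str, str]:
    """Try each region in priority order.

    Returns (region, response) from the first region that answers.
    `request` is a caller-supplied function; it is a hypothetical hook,
    not part of any real cloud SDK.
    """
    errors: dict[str, Exception] = {}
    for region in regions:
        try:
            return region, request(region)
        except Exception as exc:  # a real client would catch narrower errors
            errors[region] = exc
    raise AllRegionsFailed(f"all regions failed: {errors}")


def demo_request(region: str) -> str:
    # Hypothetical stand-in for a real service call.
    # It simulates the primary region being unreachable.
    if region == "us-east-1":
        raise ConnectionError("DNS resolution failed")
    return f"ok from {region}"


if __name__ == "__main__":
    region, response = call_with_region_failover(
        ["us-east-1", "eu-west-1"], demo_request
    )
    print(region, response)
```

The priority list keeps the primary region first, so normal traffic is unaffected and the second region is only used when the first raises an error. Whether the second region can actually serve the request depends on the data already being replicated there.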

Geerdts says companies need to ensure they have a considered and documented resilience strategy, which prioritises mission-critical operations. They should also consider spreading workloads and replicating resources on a multi-cloud and multi-region basis.

“The additional wake-up call is to look beyond one’s own IT operations, to consider all the third-party platform services we now rely on, such as for customer interaction, collaboration and payments. We expect to see companies now interrogating the platform services they rely on and even selecting based on the redundancy strategies of those platforms (Canva, Slack or Zoom being examples).”