2012 was a year in which the cloud crashed several times, bringing down a host of hosted services, and the trend has continued into 2013. What are hosted services? What actually happens when the cloud crashes? And what does this mean for you?
What are cloud hosted services?
“A hosted service is a business that delivers a combination of traditional IT functions such as infrastructure, applications (software as a service), security, monitoring, storage, web development, website hosting and email, over the Internet.” The services are “hosted” in the cloud, and end users access them through a web browser or a desktop or mobile app.
Companies use public cloud hosted services in part because they scale easily, have lower upfront costs, can be more readily accessible to a mobile workforce, and are generally fully managed and supported. For example, a company can increase its bandwidth during its busy season so that its staff don’t experience degraded service, then decrease it when demand falls, paying only for what it uses. An advantage of software as a service is that the cloud provider manages the infrastructure and platforms that run the applications, which reduces IT operational costs: the business pays a subscription fee instead of buying and maintaining hardware. Hosted email and databases can be accessed from any device with a web browser.
What happens when the cloud crashes?
Amazon’s cloud came down in June, October and December of 2012. GoDaddy’s cloud came down in September 2012. Facebook, CloudFlare, Google Drive and Microsoft’s clouds all came down in the first quarter of 2013.
Each time one of the hosting platforms crashed, the websites and services they hosted went down too. What causes the cloud to crash and what does this mean for you?
What causes the cloud to crash?
Amazon’s outages had different causes. Amazon’s cloud services went down in June, when a lightning storm hit its data center in Virginia, knocking out power and taking down Netflix, Pinterest, Instagram and other sites. On October 22, 2012, the disruption came from its Elastic Compute Cloud (EC2), an “infrastructure as a service” offering that hosts websites and web applications of varying sizes, and it took a number of popular websites and services with it, including Reddit, Foursquare and Heroku. Amazon’s cloud crashed again on Christmas Eve because of a problem with its Elastic Load Balancing service, which spreads heavy traffic across multiple servers to prevent overload. This caused an outage on Netflix for most of the day and even affected Amazon’s own streaming service, Amazon Prime.
GoDaddy, the largest domain registrar and one of the biggest website hosts, was down for several hours on September 10, 2012, taking millions of websites with it and causing widespread internet problems. GoDaddy, based in Scottsdale, Arizona, manages 53 million domain names. Any domain that relied on GoDaddy’s nameservers and DNS records was inaccessible while GoDaddy was down, even if the site itself was hosted elsewhere. The outage took down both websites and email. While initial news reports blamed a hacker, GoDaddy reported that the outage was caused by a series of internal network events that corrupted router data tables.
On January 28, 2013, Facebook faced its worst outage in over four years, caused by the handling of an error condition. Facebook actually had to “turn off the site” to recover from the resulting feedback cycle, and the service was down for up to 2 ½ hours for some users.
On February 25, 2013, Microsoft’s hosted email platforms, Hotmail.com and Outlook.com, suffered an outage caused by an overheating datacenter.
CloudFlare, which sits between websites and their visitors to speed up traffic and guard against security threats, was down for close to an hour on March 3, 2013, due to a problem with its edge routers. The outage took down 785,000 websites, including 4chan, Wikileaks, and Metallica.com. It was CloudFlare’s third significant outage in four years.
Google Drive experienced three outages during the third week of March 2013: it was down for two hours on March 18, for over two hours on March 19, and for almost 12 ½ hours on March 21. The first outage was caused by a bug in the network control software; the causes of the other two were not announced. During these outages, users could neither open nor save documents in Google Drive, so anyone who did not also keep copies on their own computer could not view or edit those documents.
Most recently, on April 17, 2013, Gmail and Google Drive were out again for unexplained reasons.
How does it affect you if the cloud crashes?
When we talk generally of the cloud, we might picture some fluffy white thing in the sky that replaces physical servers, without thinking about the technology behind the moniker. But these outages are a reminder that “the cloud” is just racks of servers inside data centers. A website hosted in the cloud may run on several, or perhaps hundreds of, clustered, load-balanced virtual servers. The term “cloud” comes from the cloud-shaped symbol used in diagrams to represent this complex infrastructure. When the cloud crashes, there are no physical clouds bumping into each other; rather, the infrastructure, servers, or services at the core of the hosted service fail, and the service goes down, or “crashes.”
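To make “clustered, load-balanced servers” a little more concrete, here is a minimal sketch in Python of a round-robin load balancer that hands each request to the next healthy server in turn. It is purely illustrative: the class name, the server names, and the simple up/down health flag are hypothetical, and real services such as Amazon’s Elastic Load Balancing are far more sophisticated. What it does show is why losing a few servers may go unnoticed, while losing all of them (or the balancer itself) takes the site offline.

```python
from itertools import cycle


class NoHealthyServerError(Exception):
    """Raised when every server behind the (hypothetical) balancer is down."""


class RoundRobinBalancer:
    """Illustrative round-robin balancer: sends requests to servers in turn,
    skipping any server currently marked as down."""

    def __init__(self, servers):
        self.healthy = {name: True for name in servers}  # simple up/down flag per server
        self._order = cycle(servers)                     # endless round-robin sequence
        self._count = len(servers)

    def mark_down(self, name):
        self.healthy[name] = False

    def mark_up(self, name):
        self.healthy[name] = True

    def route(self):
        # Try each server at most once per request before giving up.
        for _ in range(self._count):
            server = next(self._order)
            if self.healthy[server]:
                return server
        # Every server failed: from the user's point of view, the site has "crashed".
        raise NoHealthyServerError("all servers are down; the site is unavailable")


lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
print([lb.route() for _ in range(4)])  # ['web-1', 'web-2', 'web-3', 'web-1']
lb.mark_down("web-2")
print([lb.route() for _ in range(2)])  # ['web-3', 'web-1'] (web-2 is skipped)
```

If `mark_down` were called for all three servers, the next call to `route` would raise `NoHealthyServerError`, which is the small-scale equivalent of the outages described above.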
The sites and data hosted in the cloud are only available if the cloud technology works, and many factors can stop it from working: severe weather, demand overload, technical malfunctions, viruses, and hackers can all knock out the cloud. When that happens, your hosted services, which might include your website or a key business application, become inaccessible.
Now that you understand what “the cloud crashed” means, you’re probably wondering what it means for your business. In most cases, there is nothing you can do but wait until the problem is resolved and cloud services resume. However, there are some steps you can take to minimize the disruption to your business when the cloud crashes. We’ll cover these in next week’s post.