- Problems hit thousands of websites including Slack, Medium and The Verge
- Users reported seeing some entire sites go down and broken links on others
- Problems continued for around 3.5 hours, with some users suffering for longer
- Amazon's AWS storage system in Northern Virginia was the root of the problem
- Apps such as Nest's smart thermostat have also suffered problems
Amazon
says an incorrectly typed command during a routine debugging of its billing
system caused the five-hour outage of some Amazon Web
Services servers on Tuesday.
The
Seattle company was forced to issue an embarrassing apology after a command
meant to remove a small number of servers for one of its S3 subsystems was
entered incorrectly, knocking a far larger set of servers offline.
A
full restart was required, Amazon said, which took longer than expected due to
how fast Amazon Web Services has grown over the past few years.
The failure at one of its major East Coast data centers in Northern Virginia
caused major problems for internet users across the globe, with an estimated
150,000 sites hit.
'An
authorized S3 team member using an established playbook executed a command
which was intended to remove a small number of servers for one of the S3
subsystems that is used by the S3 billing process,' Amazon said.
'Unfortunately,
one of the inputs to the command was entered incorrectly and a larger set of
servers was removed than intended.
'The
servers that were inadvertently removed supported two other S3 subsystems.'
Amazon
says it is making changes to its system to make sure incorrect commands won't
trigger an outage of its web services in the future.
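Amazon has not published the details of those changes here, so the following is only a rough sketch of the kind of safeguard it describes, not Amazon's actual tooling. It assumes a hypothetical `plan_removal` helper that refuses any request that would leave fewer than a minimum number of servers running, and that releases servers in small batches rather than all at once. Every name and number in it is invented for illustration.

```python
class CapacityError(Exception):
    """Raised when a removal request would cut capacity too far."""


def plan_removal(active_servers, requested, min_required, max_batch):
    """Return the servers that may safely be removed in this step.

    Hypothetical sketch: the parameters and limits are assumptions,
    not values taken from Amazon.
    """
    # Ignore any requested name that is not currently in service.
    requested = [s for s in requested if s in active_servers]

    # Refuse the whole request if it would drop below the safety floor.
    remaining = len(active_servers) - len(requested)
    if remaining < min_required:
        raise CapacityError(
            f"refusing to remove {len(requested)} servers: "
            f"{remaining} would remain, minimum is {min_required}"
        )

    # Remove slowly: release at most max_batch servers per call.
    return requested[:max_batch]
```

With a check like this, a mistyped input that selects most of a fleet is rejected outright instead of being carried out, and even a valid request is applied a few servers at a time.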
'We
want to apologize for the impact this event caused for our customers,' it said.
'While
we are proud of our long track record of availability with Amazon S3, we know
how critical this service is to our customers, their applications and end
users, and their businesses.
'We
will do everything we can to learn from this event and use it to improve our
availability even further.'
Amazon
is the world's largest provider of cloud services, which entails hosting
companies' computing functions on remote servers.
The problems caused thousands of sites and apps to become completely unavailable, while others showed broken links and images, leaving users and companies around the globe confused.
Amazon's
Simple Storage Service, or Amazon S3, had difficulty sending and receiving
clients' data for more than 3-1/2 hours, according to company status reports
online.
Amazon
did not disclose the cause, and some of its smaller cloud applications in North
America continued to have trouble.
Internet
users took to Twitter to complain about the outage, with some asking 'where has
the cloud gone?' after receiving error messages when trying to access sites.
Web
firm SimilarTech said almost
150,000 sites had been affected.
While
few services went down completely, thousands, if not tens of thousands of
companies had trouble with functions ranging from file sharing to webfeeds to
loading any type of data stored on Amazon's 'simple storage service,' known as
S3.
Amazon
services began returning around 4 p.m. EST, and an hour later the company noted
on its service site that S3 was fully recovered and 'operating normally.'
Users
from Apple to Slack were hit, with some internet users claiming 'half the
internet is down' due to the huge number of firms that rely on Amazon.
Amazon
confirmed its cloud service was affected by the partial failure of a hosting
platform, affecting a number of internet services and media outlets.
'We're
continuing to work to remediate the availability issues for Amazon S3 in
US-EAST-1,' Amazon said on its Amazon Web Services website.
Slack,
Trello, Splitwise, Soundcloud and Medium were among the popular internet
services that were impacted.
Sites
such as Mashable were left unable to publish, while others like the Verge could
not post images.
Users
of Nest smart thermostats were unable to connect via the firm's app.
Apple
on its website reported issues with its app store, music-streaming service and
other products.
The
iPhone-maker did not immediately comment on the cause, but it previously has
said it uses Amazon S3 for some storage.
Experts
said the problem was a major one for Amazon's reputation.
'Imagine
your business not being able to run for a day. That's a big problem,' said Gene
Munster, head of research for Loup Ventures.
Munster
called the disruption 'a temporary black eye' for Amazon.
Customers
would not go through the hassle of switching to a competing cloud service
because of a one-time event, he said.
Sites
such as Soundcloud, Business Insider and Imgur were also hit, with some disappearing
completely.
It
is believed one of its data centers in Northern Virginia was responsible for
the issue, which affected everyone from Expedia to the U.S. Securities and
Exchange Commission.
A
spokesperson for the U.S. Securities and Exchange Commission said in a
statement, 'Our cloud services provider has informed us that they are
experiencing issues that are affecting page loads on http://sec.gov and that
they are working to resolve the issues as quickly as possible.'
AWS
is a large, fast-growing source of revenue for Amazon. It has helped transform
the retailer, once known simply for selling books online, into a technology
platform.
Amazon shares closed
down less than 1 percent.
Amazon's
Simple Storage Service, or S3, stores files and data for companies on remote
servers.
It's
used for everything from building websites and apps to storing images, customer
data and customer transactions.
'Anything
you can think about storing in the most cost-effective way possible,' is how
Rich Mogull, CEO of data security firm Securosis, puts it.
The outage is believed to have begun at around 11:35, and this afternoon the
firm was still showing some services as unavailable, although by 2:08 it said
most were back in action.
Since
Amazon hasn't said exactly what is happening yet, it's hard to know just how
serious the outage is.
'We
do know it's bad,' Mogull said. 'We just don't know how bad.'
The
problem affected both 'front-end' operations - meaning the websites and apps
that users see - and back-end data processing that takes place out of sight.
Some smaller online services, such as Trello, Scribd and IFTTT, appeared to be
down for a while, although all have since recovered.
The
corporate message service Slack, by contrast, stayed up, although it reported
'degraded service' for some features.
Users
reported that file sharing in particular appeared to freeze up.
Many users took to Twitter to
vent their frustration at the massive outage, with some dramatic reactions.
Major
cloud-computing outages happen periodically.
In
2015, Amazon's DynamoDB service, a cloud-based database, had problems that
affected companies like Netflix and Medium.
But
usually providers have workarounds that can get things working again quickly.
'What's
really surprising to me is that there's no fallback - usually there is some
sort of backup plan to move data over, and it will be made available within a
few minutes,' said Patrick Moorhead, analyst at Moor Insights & Strategy.
A
certain article was 'published without an image because our image system runs on
AWS,' Nilay Patel, editor-in-chief of tech website The Verge, tweeted.
'AWS
services and customer applications depending on S3 will continue to experience
high error rates as we are actively working to remediate the errors in Amazon
S3.'
The site most people
use to monitor websites that were down, called downdetector.com, was also down
because of the outage.
'This
is a pretty big outage,' Dave Bartoletti, a cloud analyst with Forrester,
told USA Today.
'AWS
had not had a lot of outages and when they happen, they're famous.
'People
still talk about the one in September of 2015 that lasted five hours,' he said.
The
outage appeared to have begun around 12:45 pm ET.
It
was centered in AWS' S3 storage system on the east coast.
S3
is Amazon's largest service, used by more than half of its million plus
customers, Bartoletti said.
'It's
got north of 3 to 4 trillion pieces of data stored in it,' he said.
Amazon
is the leader in cloud computing, capturing more than 40 percent of the market,
according to a recent report.
AWS
topped $12 billion in sales for the year, up 55 percent from the same period
last year, blowing past a goal of reaching $10 billion in sales in 2016.
The effects of the
outage will vary depending on how a site uses the service, as many use multiple
databases to 'build' a single page.
The outage sparked a rash of
memes, with everything from the White House press briefings to the IT Crowd
comedy being used.
Many
users reported seeing broken links or images on pages as a result of the
issue.
The
firm was initially even unable to update its own system health dashboard.
At
11:35am, the firm added: 'We have now repaired the ability to update the service
health dashboard. We are working hard at repairing S3, believe we understand
root cause, and are working on implementing what we believe will remediate the
issue.'
Forrester's
Bartoletti said the problems on Tuesday could lead to some Amazon customers
storing their data on Amazon's servers in more than one location, or even
shifting to other providers.
"A
lot more large companies could look at their application architecture and ask
'how could we have insulated ourselves a little bit more,'" he said. But
he added, "I don't think it fundamentally changes how incredibly reliable
the S3 service has been."