Wednesday, March 1, 2017

Alexa to Insteon Service Outage

Today, February 28, 2017

This morning, while browsing Amazon, I noticed I couldn't look up the order details for a Kindle book I had bought some time ago. Then I discovered that I couldn't look up the details of any of my orders. All I got was this:


"There's a problem displaying some of your orders right now."

"If you don't see the order you're looking for, try refreshing this page, or click "View order details" for that order."

I didn't know what was going on. I tried several times, to no avail. Rats! Still, it was a minor annoyance, and I thought it would be fixed soon. It wasn't. I have never, ever experienced glitches on Amazon.com in 16 years of shopping, reading, and music & video streaming with them.

Then I caught wind of it on another site. It was a financial news site that covers stock news & analysis, not a tech or general news site. The story reported that S3, the storage service of Amazon Web Services (AWS), was down.

AWS isn't the first major provider to be hit by an outage. Remember the Dyn DNS DDoS fiasco? Dyn provides DNS (Domain Name System) servers, which translate a website address into the actual IP address, along with other critical functions. If DNS doesn't work, a website can't be reached or won't function properly. That attack was reported as one of the biggest Distributed Denial of Service (DDoS) attacks in history. It affected many major sites, including Twitter, The New York Times, and even Amazon itself.

This is what I experienced in the Dyn attack:

http://bit.ly/2maX2fq

My home automation system was still working at that point. To recap the configuration of my system: the foundation is an Insteon Hub by Smartlabs, with Insteon plug-in dimmer and on/off modules as well as dimmer wall switches. I control it primarily with the Echo and Dot, using Alexa voice commands. I also use Insteon's web-based cloud setup page and their Android app on my phone and tablet, although I prefer the Homeboy for Insteon app. I have both apps on my Kindle as well. (It's good to have redundancy.)

After a couple of hours, I said "Alexa, dim Ceiling" to dim my bedroom lights. There was a long pause, then it answered with:

"Hmm...Insteon isn't responding."

So the outage hit me personally, through my home automation system. I use an Insteon Hub and control modules for my lighting and for switching some devices on and off, and my main control is Alexa voice control through the Echo & Dot (Amazon devices, ironically).

It stopped working about noon EST. I tried resetting my Hub, without success. Then I tried unlinking and relinking Insteon in the Alexa app. Nogo, umgats nein, nada. But Alexa responded to all other requests: weather, news, even streaming music. Then I tried my Android apps, both the official one and Homeboy for Insteon. Both worked. That isolated the problem to the interface/middleware API between Insteon's cloud service and Alexa. Evidently part of that functionality depended on something critical hosted on AWS.
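The process of elimination I used above can be sketched in a few lines of Python. This is only an illustration: the component names (such as "alexa_insteon_link") and the way I split each path into components are my own assumptions, not anything published by Amazon or Insteon. The idea is that any component on a path that worked is cleared, and whatever is left on the failing path is the suspect.

```python
def isolate_fault(paths, results):
    """Return components found on failing paths but on no working path."""
    cleared = set()   # components proven OK by at least one working path
    suspects = set()  # components that appear on at least one failing path
    for name, components in paths.items():
        if results[name]:
            cleared |= components
        else:
            suspects |= components
    return suspects - cleared


# Hypothetical breakdown of each control path into the parts it relies on.
PATHS = {
    "alexa_weather": {"echo", "alexa_cloud"},
    "android_app_lights": {"insteon_cloud", "hub", "modules"},
    "alexa_lights": {
        "echo", "alexa_cloud", "alexa_insteon_link",
        "insteon_cloud", "hub", "modules",
    },
}

# What I observed during the outage: True = worked, False = failed.
RESULTS = {
    "alexa_weather": True,
    "android_app_lights": True,
    "alexa_lights": False,
}

print(isolate_fault(PATHS, RESULTS))
```

With those observations, the only component left standing as a suspect is the link between Alexa and Insteon's cloud, which matches what I concluded by hand.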

As of now, 7:30 EST, we don't know what caused it.

http://cbsn.ws/2mIks8f

http://cnet.co/2mb9s6R

Update, March 2, 2017: Here's what caused it:

http://www.recode.net/2017/3/2/14792636/amazon-aws-internet-outage-cause-human-error-incorrect-command

From Recode:

[Amazon today blamed human error for the big AWS outage that took down a bunch of large internet sites for several hours on Tuesday afternoon.

In a blog post, the company said that one of its employees was debugging an issue with the billing system and accidentally took more servers offline than intended. That error started a domino effect that took down two other server subsystems and so on and so on.

“Removing a significant portion of the capacity caused each of these systems to require a full restart,” the post read. “While these subsystems were being restarted, S3 was unable to service requests. Other AWS services in the US-EAST-1 Region that rely on S3 for storage, including the S3 console, Amazon Elastic Compute Cloud (EC2) new instance launches, Amazon Elastic Block Store (EBS) volumes (when data was needed from a S3 snapshot), and AWS Lambda were also impacted while the S3 APIs were unavailable.”

In response, the company said it is making some changes to ensure that a similar human error wouldn’t have as large an impact. One is that the tool employees use to remove server capacity will no longer allow them to remove as much as quickly as they previously could.

Amazon also said it is making changes to prevent the AWS Service Health Dashboard — the webpage that shows which AWS services are operating normally and not — from stopping working in the event of a similar occurrence.

AWS, which leases out computing power and data storage to companies big and small, is on pace to be a $14 billion business over the next year. It also drives a large portion of Amazon’s operating income.]

The takeaway from this major incident? Cloud services aren't 100% reliable, so always have some form of local control or redundancy for your devices and systems. I recommend that you keep some devices and lighting NOT connected or "smart", just in case. The Internet of Things can and will fail, and it can and will be hacked. There are multiple ways it can fail, and some of these devices have what is called a "single point of failure." That means one component that everything else depends on, with no backup: if it goes down, the whole system goes down with it. That is just what happened in the Amazon incident.
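For anyone who tinkers with their own setup, here is a minimal sketch of what "have some form of local control" means in code. Everything in it is hypothetical: send_via_cloud and send_via_local are stand-ins I made up for illustration, not real Insteon or Alexa functions, and the cloud_up flag only simulates an outage.

```python
class CloudUnavailable(Exception):
    """Raised when the (simulated) cloud service does not respond."""


def send_via_cloud(device, level, cloud_up):
    # Stand-in for a cloud API call; cloud_up simulates whether it is reachable.
    if not cloud_up:
        raise CloudUnavailable("cloud service is not responding")
    return f"cloud: {device} -> {level}%"


def send_via_local(device, level):
    # Stand-in for a command sent directly over the home network.
    return f"local: {device} -> {level}%"


def set_level(device, level, cloud_up=True):
    """Try the cloud path first, then fall back to local control."""
    try:
        return send_via_cloud(device, level, cloud_up)
    except CloudUnavailable:
        return send_via_local(device, level)


print(set_level("Ceiling", 40, cloud_up=True))
print(set_level("Ceiling", 40, cloud_up=False))
```

The point of the second path is that the lights still respond when the cloud doesn't, so the cloud is no longer a single point of failure.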




