RIM: Is Amateur Hour Over?

Last week the most severe outage in RIM’s history crippled BlackBerry users’ ability to use e-mail, BBM, and the Internet in general for over three days.

The outage highlighted two deeply concerning issues with RIM. First, it is almost beyond comprehension how a single point of failure could bring RIM’s global network down for so long. In the face of fierce competition from Apple and Google, RIM had been able to depend on real-time and reliable e-mail delivery as one of its key competitive differentiators. Not any more.

Worse, the company’s response has come across as arrogant, aloof, and out-of-touch. The company did nothing to alert users or notify them of the status of their systems, and offered few updates on the root cause or the estimated time to recovery. Perplexingly, the company doesn’t run a system status service similar to Amazon’s or Google’s, leaving customers to find their own status updates via news outlets and Twitter. As Queen’s University marketing professor John Pliniussen put it to the Globe and Mail last week, “It’s sad that a world-class company with state-of-the-art technology has state-of-the-ark public relations.”

The RIM outage highlights the need for companies delivering services that demand “dial-tone” levels of reliability to communicate openly, honestly and promptly during an outage and, importantly, to have a communication portal built in advance for use when such an outage occurs. As a customer of such services – whether they’re provided by a company such as RIM, a cloud computing provider, or a utility company – ask as part of your due diligence process how the company notifies customers of outages or issues with its service. Ideally the company should have a public-facing website with current service status, along with a historical list of outages, their durations and, hopefully, a post-mortem and root cause analysis for each. A great example of a company adopting such practices is Amazon, with its Amazon Web Services Status site and its post-mortem of the service’s high-profile April 2011 outage.

If the provider does not have such a site, it doesn’t mean they don’t have outages – everyone does – it just means they, like RIM, don’t like talking about them.

Comments

  1. David Collier-Brown

    The kind of outage they had isn’t that unusual. What is unusual is that it was in a customer-visible system.

    When working as a capacity planner, I saw lots of systems unexpectedly fail to survive the loss of a major component. That’s usually why they engaged a capacity planner in the first place. The difference is that the failures I saw were almost always in services that weren’t visible to the general public. When internal systems fail, the usual request is “don’t tell anyone, but we need you to help us avoid a more serious outage in the future”.

    Customer-visible systems are very different and, because they’re public, often have a status page and a proactive customer notification process, exactly as Mr. Newton says.

    Regrettably, it sometimes takes a public failure to shock a company into establishing customer notifications and capacity planning.

    –dave