Westhost and VPS.net outages causing client problems

Tue, 23rd February 2010, 11:09

On Saturday, February 20th, some routine testing of the Westhost / VPS.net datacenter's Inergen fire suppression system appears to have gone horribly wrong. According to the status updates, the Inergen system caused large amounts of data corruption and potential data loss. As of this time, the Westhost / VPS.net teams are still working to resolve the entire issue, but the ETA for resolution appears to be getting continually pushed back.

Initially the Westhost clients were told that the issue would be resolved on Saturday evening but after a few days of the ETA being jostled around, many Westhost clients are beginning to doubt the story behind the outage. There's a great deal going on within the Westhost forums which you can peruse here: http://forums.westhost.com/showthread.php?t=14067

There's also a brief outage page detailing the problems that Westhost is currently experiencing. As of this time, it appears that 100% of the VPS.net systems are functional and fully restored; however, an ETA for Westhost is still uncertain.

We're awaiting some additional information from Inergen and a few other parties involved to see what their take on things is. Interestingly enough, the only reference to Inergen being able to cause any problems with hard drives is a mention in the related Wikipedia article here: Inergen

Please note that the reference to it causing damage was added by an IP address owned by UK2group, the UK-based company that purchased Westhost in November of 2008.


One client on the Westhost forums contacted Tyco, one of the major suppliers of Inergen (and one of the only suppliers of refills for the systems), and got this response:

I just had a call from Tyco (http://www.tycofireandsecurity.com/), the local suppliers of Inergen. Very interesting. They were very surprised that their product was being blamed for an outage like this.

They said - and I was talking with their country manager, seems they got very panicky about the reputation of their product - that in all but one case, which is still undergoing review, outages due to discharge could be traced back to underlying causes, not the product itself. And in that one case that was still under investigation the HDDs still spun up, the servers just needed a reboot.

Basically, when Inergen is deployed, it must fill the room to a high percentage in a very short time. Thus the discharge of gas is at a quite high velocity. Due to this, in this country, all installations are discharge tested at installation & periodically thereafter.

He said they had seen incidents where the installation was not up to par, and during discharge other debris was blown into servers etc. He also felt that in any enterprise DC - and I agree on this - levels of dust & other particulate matter should not accumulate to the point of being at risk of disrupting server operation. He did say he'd seen installations where discharge had blown pieces of ceiling tile out...

So, the summary was that Inergen on its own is very unlikely to have caused the outage. Air conditioning? Naturally it would have been shut down during the discharge, but unless it was one dude doing the test late at night & he went out for a coffee when the discharge occurred, the aircons should have been brought back up in time to stop damage...even if the servers didn't shut themselves down when the temperature rose.