{"id":65,"date":"2011-12-03T09:37:00","date_gmt":"2011-12-03T17:37:00","guid":{"rendered":"https:\/\/thebeagle.itgroove.net\/?p=65"},"modified":"2023-02-24T21:47:07","modified_gmt":"2023-02-24T21:47:07","slug":"some-days-youre-the-bear-and-other-days-the-bear-gets-you","status":"publish","type":"post","link":"https:\/\/regroove.ca\/archive\/2011\/12\/03\/some-days-youre-the-bear-and-other-days-the-bear-gets-you\/","title":{"rendered":"Some days you\u2019re the bear and other days the bear gets you \u2026"},"content":{"rendered":"<p>There are times when those of us that work in IT Consulting have the same kinds of experiences that customers have when it all goes horribly pear shaped.&#160; So, this is a cautionary tale that proves even the \u201cgods of IT\u201d (hmmm, maybe I\u2019ll trademark that) are mere mortals.<\/p>\n<p>I shouldn\u2019t have to explain the need for one or more UPS\u2019s in a server environment but, for those who may not know, a UPS provides clean, filtered power to computers and peripherals and also provides backup power from a battery in case of a power failure.&#160; In most cases, the UPS will run for X amount of time on battery and then will signal servers and devices to shut down cleanly if the battery level runs down past a certain level.&#160; The idea is the UPS will provide \u201cbridge power\u201d for a short power outage, say 15 minutes or so, and then provide the mechanism to cleanly shut down servers and devices if the outage runs longer.&#160; There are some gotchas waiting out there in UPS land as not all UPS\u2019s are created equal; you may have super fast servers that are super sensitive to power fluctuations and they require fast-switching UPS\u2019s whereas other devices may be more tolerant of a \u201cslow switching\u201d UPS.&#160; Suffice it to say, you do need to do your homework when selecting and purchasing a UPS but you can save yourself a lot of hassle by following one simple rule &#8212; don\u2019t cut 
corners on the UPS!<\/p>\n<p>Anyway, we followed our own best practice when we built out our server rack and specified multiple UPS\u2019s to support our servers (currently 5 in the rack) and our SAN.&#160; We are lucky in that we are located in the Save-On-Foods Memorial Centre in Victoria and as we share infrastructure facilities with the building our server rack is located in an area of the building that is supplied with backup power by a diesel generator.&#160; We are supposed to have power supplied by the generator within about 15 seconds of a power failure therefore our UPS\u2019s should really only have to provide bridge power for that time.&#160; No big deal, right?&#160; Well, here\u2019s where the cautionary tale comes in \u2026<\/p>\n<p>An infrastructure is only as good as the planning that goes into it and the execution that puts it in place.&#160; If you miss something you are going to pay.&#160; We planned our UPS installation properly and we thought we had executed correctly but sometimes things just don\u2019t work as planned.&#160; The first inkling that there might be something not quite right came when we suffered a sudden failure of one of our big VMware servers; we lost all of the VM\u2019s on it all at once.&#160; There was no indication of any sort of problem prior to the outage so we were caught totally by surprise.&#160; When I went down to the server room to check things out I discovered our rack had been moved by one of the building engineers and the power supply plug to one of the UPS\u2019s had been pulled out of the power socket.&#160; Obviously the UPS had gone on battery, the batteries had flattened and the UPS shut down.&#160; The server in question had BOTH its power supplies plugged into the one UPS (NOT best practice) so there was the reason for the server crash.<\/p>\n<p>The flattened UPS would not power up which made sense as the unit was configured to NOT turn on until the batteries had recharged to at least 50% 
capacity.&#160; I plugged the server into the other UPS and started it up and made a mental note to check the UPS config once it restarted as I wanted to know why we weren\u2019t notified of the power failure.<\/p>\n<p>After the UPS came back online (a few hours later) I discovered that while we had configured email notification for alerts from the UPS the \u201cenable\u201d tick box hadn\u2019t been checked.&#160; DOH!&#160; I fixed that and thought we were good at that point.&#160; I went back down to the server room and moved the power connection from one of the power supplies on the server that had crashed back to the UPS in question.&#160; That should have left us in good shape, right?&#160; Well, no, and here\u2019s why.&#160; One of our UPS\u2019s (the one that had flattened) is a fast-switching type and the other is the more traditional slow-switching type.&#160; We have at least one server that needs the fast-switching type so we try and split the power supplies on the servers between the two UPS\u2019s. The idea being that the fast-switching UPS will keep the boxes running immediately following the power failure (the fast-switch) and then both UPS\u2019s will supply power for the prescribed period.&#160; This is an okay plan IF both UPS\u2019s work properly; if the fast-switch UPS has an issue then there is a better than even chance the servers that require fast-switch will crash during a power failure.&#160; I\u2019m sure you know where this is going \u2026<\/p>\n<p>A few days after the original UPS incident the building experienced a real power failure (I had already secured power cables so they couldn\u2019t be pulled out of the sockets again) and we lost servers again even though the generator had kicked in! 
No alerts, just crashed servers.&#160; This made no sense so down to the server room I went.&#160; Lo and behold the UPS that had been the problem earlier was deader than a doorknob even though the server room had power!&#160; No amount of fiddling with the UPS would wake it up; it appeared to have failed completely.&#160; I shuffled power cables around and even put some cables direct on to the mains and brought all of the crashed servers back online.&#160; Time to investigate what had happened.<\/p>\n<p>To make a long story short, it appeared that our fast-switch UPS was actually faulty.&#160; We discovered this once the UPS finally came back online and we ran a calibration test on the unit.&#160; As soon as we kicked off the calibration (a simulated power failure) the UPS crashed horribly and became as dead as a doorknob. This explained why the servers crashed as power would have been cut off immediately (no battery power) AND the UPS could not signal servers to shut down as the whole unit crashed at once.&#160; <\/p>\n<p>The UPS had become faulty sometime between the last time we ran tests and calibrations on the unit and the first power failure where the unit actually failed.&#160; All of our planning and design became compromised at that point because we had designed around both a fast-switch and a slow-switch UPS without taking into account the fact that a UPS *could* fail totally.&#160; To be fair, none of us had ever come across a situation like this one.&#160; Normally we see batteries start to fail (and the UPS tells you) or there is a wiring fault (and the UPS tells you) and that can be easily fixed.&#160; A total failure of the UPS just seemed out of the question until this happened.<\/p>\n<p>Moral of the story: plan for the worst and design accordingly.&#160; In our case we have already ordered an additional fast-switch UPS (bigger and better, too!) 
and our UPS vendor is replacing the failed unit under warranty.&#160; And, yes, we will test more frequently; you should, too!&#160; A UPS infrastructure is ONLY as good as the UPS\u2019s are; are they healthy?&#160; Do they alert properly?&#160; Do they shut your systems down cleanly?&#160; Don\u2019t wait for the inevitable power failure to find out as you may not be as lucky as we were.&#160; We didn\u2019t suffer any permanent damage to our systems (although egos may have been slightly bruised) but that was more fluke than anything else.&#160; Learn from this cautionary tale; we certainly have!!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There are times when those of us that work in IT Consulting have the same kinds of experiences that customers have when it all goes horribly pear shaped.&#160; So, this is a cautionary tale that proves even the \u201cgods of IT\u201d (hmmm, maybe I\u2019ll trademark that) are mere mortals. I shouldn\u2019t have to explain the &hellip; <a href=\"https:\/\/regroove.ca\/archive\/2011\/12\/03\/some-days-youre-the-bear-and-other-days-the-bear-gets-you\/\"><\/a><\/p>\n","protected":false},"author":10,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"inline_featured_image":false,"footnotes":""},"categories":[263,270],"tags":[314,341,505,618],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v23.0 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Some days you\u2019re the bear and other days the bear gets you \u2026 - Archive<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/regroove.ca\/archive\/2011\/12\/03\/some-days-youre-the-bear-and-other-days-the-bear-gets-you\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" 
content=\"article\" \/>\n<meta property=\"og:title\" content=\"Some days you\u2019re the bear and other days the bear gets you \u2026 - Archive\" \/>\n<meta property=\"og:description\" content=\"There are times when those of us that work in IT Consulting have the same kinds of experiences that customers have when it all goes horribly pear shaped.&#160; So, this is a cautionary tale that proves even the \u201cgods of IT\u201d (hmmm, maybe I\u2019ll trademark that) are mere mortals. I shouldn\u2019t have to explain the &hellip;\" \/>\n<meta property=\"og:url\" content=\"https:\/\/regroove.ca\/archive\/2011\/12\/03\/some-days-youre-the-bear-and-other-days-the-bear-gets-you\/\" \/>\n<meta property=\"og:site_name\" content=\"Archive\" \/>\n<meta property=\"article:published_time\" content=\"2011-12-03T17:37:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-02-24T21:47:07+00:00\" \/>\n<meta name=\"author\" content=\"Sean Wallbridge\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Sean Wallbridge\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/regroove.ca\/archive\/2011\/12\/03\/some-days-youre-the-bear-and-other-days-the-bear-gets-you\/\",\"url\":\"https:\/\/regroove.ca\/archive\/2011\/12\/03\/some-days-youre-the-bear-and-other-days-the-bear-gets-you\/\",\"name\":\"Some days you\u2019re the bear and other days the bear gets you \u2026 - Archive\",\"isPartOf\":{\"@id\":\"https:\/\/regroove.ca\/archive\/#website\"},\"datePublished\":\"2011-12-03T17:37:00+00:00\",\"dateModified\":\"2023-02-24T21:47:07+00:00\",\"author\":{\"@id\":\"https:\/\/regroove.ca\/archive\/#\/schema\/person\/74e1c0def190f181c1394c2b6d883e77\"},\"breadcrumb\":{\"@id\":\"https:\/\/regroove.ca\/archive\/2011\/12\/03\/some-days-youre-the-bear-and-other-days-the-bear-gets-you\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/regroove.ca\/archive\/2011\/12\/03\/some-days-youre-the-bear-and-other-days-the-bear-gets-you\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/regroove.ca\/archive\/2011\/12\/03\/some-days-youre-the-bear-and-other-days-the-bear-gets-you\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Blog Archive\",\"item\":\"https:\/\/regroove.ca\/archive\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Some days you\u2019re the bear and other days the bear gets you \u2026\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/regroove.ca\/archive\/#website\",\"url\":\"https:\/\/regroove.ca\/archive\/\",\"name\":\"Archive\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/regroove.ca\/archive\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/regroove.ca\/archive\/#\/schema\/person\/74e1c0def190f181c1394c2b6d883e77\",\"name\":\"Sean Wallbridge\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/regroove.ca\/archive\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/adf8cea6291c39d166616f2148d919a6?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/adf8cea6291c39d166616f2148d919a6?s=96&d=mm&r=g\",\"caption\":\"Sean Wallbridge\"},\"url\":\"https:\/\/regroove.ca\/archive\/author\/swallbridge\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Some days you\u2019re the bear and other days the bear gets you \u2026 - Archive","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/regroove.ca\/archive\/2011\/12\/03\/some-days-youre-the-bear-and-other-days-the-bear-gets-you\/","og_locale":"en_US","og_type":"article","og_title":"Some days you\u2019re the bear and other days the bear gets you \u2026 - Archive","og_description":"There are times when those of us that work in IT Consulting have the same kinds of experiences that customers have when it all goes horribly pear shaped.&#160; So, this is a cautionary tale that proves even the \u201cgods of IT\u201d (hmmm, maybe I\u2019ll trademark that) are mere mortals. I shouldn\u2019t have to explain the &hellip;","og_url":"https:\/\/regroove.ca\/archive\/2011\/12\/03\/some-days-youre-the-bear-and-other-days-the-bear-gets-you\/","og_site_name":"Archive","article_published_time":"2011-12-03T17:37:00+00:00","article_modified_time":"2023-02-24T21:47:07+00:00","author":"Sean Wallbridge","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Sean Wallbridge","Est. 
reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/regroove.ca\/archive\/2011\/12\/03\/some-days-youre-the-bear-and-other-days-the-bear-gets-you\/","url":"https:\/\/regroove.ca\/archive\/2011\/12\/03\/some-days-youre-the-bear-and-other-days-the-bear-gets-you\/","name":"Some days you\u2019re the bear and other days the bear gets you \u2026 - Archive","isPartOf":{"@id":"https:\/\/regroove.ca\/archive\/#website"},"datePublished":"2011-12-03T17:37:00+00:00","dateModified":"2023-02-24T21:47:07+00:00","author":{"@id":"https:\/\/regroove.ca\/archive\/#\/schema\/person\/74e1c0def190f181c1394c2b6d883e77"},"breadcrumb":{"@id":"https:\/\/regroove.ca\/archive\/2011\/12\/03\/some-days-youre-the-bear-and-other-days-the-bear-gets-you\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/regroove.ca\/archive\/2011\/12\/03\/some-days-youre-the-bear-and-other-days-the-bear-gets-you\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/regroove.ca\/archive\/2011\/12\/03\/some-days-youre-the-bear-and-other-days-the-bear-gets-you\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog Archive","item":"https:\/\/regroove.ca\/archive\/"},{"@type":"ListItem","position":2,"name":"Some days you\u2019re the bear and other days the bear gets you \u2026"}]},{"@type":"WebSite","@id":"https:\/\/regroove.ca\/archive\/#website","url":"https:\/\/regroove.ca\/archive\/","name":"Archive","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/regroove.ca\/archive\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/regroove.ca\/archive\/#\/schema\/person\/74e1c0def190f181c1394c2b6d883e77","name":"Sean 
Wallbridge","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/regroove.ca\/archive\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/adf8cea6291c39d166616f2148d919a6?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/adf8cea6291c39d166616f2148d919a6?s=96&d=mm&r=g","caption":"Sean Wallbridge"},"url":"https:\/\/regroove.ca\/archive\/author\/swallbridge\/"}]}},"_links":{"self":[{"href":"https:\/\/regroove.ca\/archive\/wp-json\/wp\/v2\/posts\/65"}],"collection":[{"href":"https:\/\/regroove.ca\/archive\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/regroove.ca\/archive\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/regroove.ca\/archive\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/regroove.ca\/archive\/wp-json\/wp\/v2\/comments?post=65"}],"version-history":[{"count":1,"href":"https:\/\/regroove.ca\/archive\/wp-json\/wp\/v2\/posts\/65\/revisions"}],"predecessor-version":[{"id":3082,"href":"https:\/\/regroove.ca\/archive\/wp-json\/wp\/v2\/posts\/65\/revisions\/3082"}],"wp:attachment":[{"href":"https:\/\/regroove.ca\/archive\/wp-json\/wp\/v2\/media?parent=65"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/regroove.ca\/archive\/wp-json\/wp\/v2\/categories?post=65"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/regroove.ca\/archive\/wp-json\/wp\/v2\/tags?post=65"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}