I was enjoying a nice cup of white tea, starting my day by going over the usual suspects and looking into errors, when I could no longer access some of my servers. It was interesting to note that I could ping the servers, name resolution was correct, and some I could even remote into!
After a few minutes of looking around, I found that the SQL Server services were stopped… but why? When I went to take a look at the log files I noticed… there weren’t any drives! Where did all of my data go?! (Insert puzzled look here)
After a few more minutes and some collaboration, the problem was found: the HBAs couldn’t talk to a specific disk pool on the SAN. This affected more than just the SQL Server instances, as some of my servers boot from the SAN and were offline entirely. While I would hate for anyone to be in this situation, overall I believe it is needed every once in a while.
Right now most people would look at me and think I’m a little crazy, and that might be accurate to a degree. After all, who in their right mind likes it when a disaster happens? I don’t specifically like it, but it does bring a few things to light.
1. How good your disaster management/recovery plan really is.
2. That a SAN device isn’t the end-all-be-all-never-has-issues that everyone thinks it is.
3. Deficiencies in your HA environment, exposed by real-world testing.
4. Your composure during an issue and your training on handling it as a DBA.
5. Whether you have VERIFIED GOOD backups that aren’t on the same SAN… you do, right?
The end result was that the problem was fixed and the systems came up without issue. No data was lost… but it could have been.
What happened was a nice mixture of different teams working together to fix the issue as quickly and accurately as possible, limiting downtime and restoring service. What I took from it was how immensely my training from mock disaster testing helped with a REAL disaster. I wasn’t shaken and I didn’t panic; I was confident we’d get through this with as little impact as I could possibly allow.
My tea had gone cold by the time we were finished, but it was a small price for the experience earned through the issue.
