When Using Availability Groups, Beware of Mirroring Behaviors

Many DBAs haven’t had the experience of using database mirroring, since it was deprecated in 2012 when availability groups were released, and so there’s a lot of history that ends up being lost. We’re going to dive into some of that history and how it could affect your availability group!

A Long, Long Time Ago…

Around 2005, mirroring was born. It was an evolution on log shipping, which is taking log backups, moving them around, and restoring them all in an automated fashion to different servers. Mirroring upped that game and created a dedicated network channel between servers (you could only have 1 principle and 1 mirror, so 2 total) so that there wasn’t this funny business of copying and restoring, additionally it allowed the mirror server to be a highly available copy with automatic failover. Since Microsoft marketing is terrible at naming things, it was originally called, “Real Time Log Shipping” which was then changed to “Mirroring” and in typical fashion you can find the unofficial “Real Time Log Shipping” name all over the place where it was never updated. (I can’t really blame them here, though, it’s hard to find all the little places you’re putting this moniker in and then having some other team tell you to change it all at some way later point)

Mirroring, however, did not have a single connection point, such as the listener acts for an availability group, and this was a problem. If mirroring offered high availability with automatic failover… but no clients could find the new server… that wouldn’t be very good. You could get around this with some fancy networking setups (at the time [2005-2007], now it’s standard stuff) but client drivers were still not great at handling certain situations. How do you connect to the mirror server, automagically, if the principle wasn’t available?

Ch-ch-ch-changes

It was time for some updates to the TDS specification and the client drivers. This allows the client driver to act like overprotective parents leaving their children in the hands of a teenage babysitter in a bad 80s/90s movie, “Here’s the numbers for every place we’re going to be, give them a call if anything happens. We’ll be gone exactly this long and will be back sharp. I want to know if my baby even burps funny.”. This worked, for the most part, pretty great for mirroring setups.

The changes made to the TDS specification (7.2) fall under a new environment change token (these tokens are used for all kinds of stuff, and have been added to recently, even for availability group items), called “Real Time Log Shipping” aka mirroring. This environment change token has the wonderful and magical job of sending the partner server name of the mirroring partner to the client driver, this then can cache that value and if anything occurs to the principle server, can automagically use this value to connect to the mirror – which *might* now be the new principle. Since mirroring could only ever have, at most, two servers involved (the write copy and the non-readable copy), this worked just fine! This odes require that the database context be a database involved in mirroring, that the principle has a mirror, and that the driver understands TDS version 7.2.

Notice that is says “Database Mirroring”, however, later in the same section it says, “Real Time Log Shipping”

Glad they could confuse you by calling it two different things!

There was additional enhancement made to the client drivers in the form of connection string changes, with the addition of the “Failover_Partner” option, which would do the work of the “Real Time Log Shipping” environment change option directly in the connection string, which is much more direct.

Still Need to Support DOS 1.1 In 2012

Microsoft is the reigning champion of backwards compatibility. This isn’t even a quip, most of the time the reason that something can’t be changed to “fix” something or have it work in an updated manner is because it *might* break code or scripts that someone in 1987 wrote. I’m not kidding. Imagine needing 50 years of backwards compatibility… then again, look at modern day processors which do have backwards compatibility from the 1980s (feel free to run that $1,000 32 core package in 16-bit mode). Some days I really wish there was a line drawn in the technology community where you just aren’t backwards compatible, sorry to bring you forward 50 years but let some of these past items die already. Stop beating a dead horse, its bones are just dust now.

Anyway… availability groups were introduced, and SQL Server was better for it… but what about all those people running mirroring? Don’t fret, you’re covered! There’s a not very known (and rarely hit) availability group configuration where you can get the same behavior as mirroring. You’ll need the following setup:

  • Two node availability group
  • Client driver(s) that understand TDS 7.2 (pretty much anything 2008+)
  • At least one database in the availability group
  • A client application (SSMS counts)
  • Connection string with the initial database as the user database in the AG
  • Connection to the current primary replica of the AG using ONLY the node name (not the listener)

Using the above configuration, in the current connection to the primary AG in the context of the user database, run a simple command such as `SELECT @@SERVERNAME`. You’ll get the result as the current primary, say “Node1”. Issue a failover of the availability group to the only secondary replica. Wait a few seconds for the failover to occur and the database on the secondary to become available. Execute the same query again, utilizing the same connection (don’t do a new connection in your code of the client app, or if using SSMS just hit execute again). Notice that the server name has changed to the new primary, say “Node2”.

The initial reaction of… well… everyone is, “ThIs Is A bUg! OmG!”. It’s not. It’s backwards compatibility. You’re welcome your VB3 application from 1994 still works.

Lies, Damned Lies, and Statistics

I like to show hard proof that I’m not selling snake oil… unless you’re into that kind of thing, then meet me around the corner in 5 minutes.

When the connection occurs to the current primary in the context of the user database that belongs to the AG (as in, meeting all the requirements), SQL Server will generate an environment change token in the response to the client driver. The client driver, if it understands the token, will then cache the returned data as the new failover partner. If the connection is broken or the server is otherwise unavailable, the partner will be tried. Since the partner is now the new primary and the database accessible, the connection will succeed and you’ll be *transparently* redirected to the partner, aka the new primary. You’re welcome.

Here’s an example from my environment:

Notice the new value is, “SQL19C1N2” for the partner. Below is the hex value if you’re into that sort of thing.

You Can’t Make Me, You’re Not My MOM

Let’s say you don’t want this behavior, what can you do? Luckily, it’s pretty trivial to remove:

  • Have more or less than 2 nodes in the AG
  • Don’t have the database context in the connection
  • Connect to a secondary
  • Connect to the primary using the listener

Most people and places don’t run into this, almost everyone universally uses the listener which stops the behavior. While two node AGs used to be popular, most are scaling out to 4 nodes or more, which also stops the behavior. Since many connections also use connection pooling, the initial database is generally master which can’t be in an availability group, thus you won’t get this behavior.

So… yeah. You just got app compat’d!

7 thoughts on “When Using Availability Groups, Beware of Mirroring Behaviors”

  1. Thanks for the article, Sean, it is clears some things for me. I think I experienced such thing within SQL 2016 Agent job step that checks for primary replica, and it really puzzled me (and Microsoft support as well btw). Like when certain job step with target database that was user database in AG under failover continued to start and fail, despite of fn_hadr_is_primary_replica check that should stop it. So that job step thought it was still near the primary replica.

    1. VN,

      Sounds like you did encounter this! A quick fix is to change the database to master and run the primary replica check as the first step in the agent job. It’s definitely a surprise when it happens.

      Normally this is pretty rare, I’ve only had it happen a handful of times to people, so thanks for adding in your experience!

      -Sean

  2. I personally wish it would just always work this way natively instead of having to use a listener, or the additional connection options like failover_partner or whatever.

    The thing that ruins it for me being able to take advantage of this is using connection pooling (sigh).

    Am I wrong? Why wouldn’t we want this behavior all the time?

    1. I believe it would depend on the application and need. I can see times it would be amazingly helpful… and other times not so much. Readable secondary replicas might not be as helpful, agent jobs could go awry (though I wish the contained feature made it to RTM).

      1. Oh true readable secondaries might be an issue…

        Thank you Sean! Your articles are insanely insightful, I have appreciated every one of them!

Comments are closed.