Access Violations (AV) are another common error that will cause SQL Server to take a memory dump. These can occur for a variety of reasons, and unlike the last one (Non-Yielding Scheduler) it’s quite the task, especially given public symbols don’t have structure data and offsets.
Citations and Fines
Access violations occur because a region of memory was attempted to be access but was otherwise not available to be accessed. This could be on purpose, such as protecting read-only memory and a scribbler comes along and attempts to write to it in which case the memory is protected as read only… or perhaps code which expects a value to always be there hits a code path where the value is not there but doesn’t do a sanity check before attempting to access it. There are many reasons, again, and the previous examples are just two. I will say, though, the most common reason is from attempting to access the bottom portions of memory (minimum the bottom 1 MB on modern versions of Windows), which, again, is generally due to bad pointer math or not checking before accessing.
Below is an example of this, it’s just notepad. Notepad is generally fairly light in the memory department (until you use dialogs) and won’t use much memory. If we look at the lowest address range, the descriptor tells us that it’s not in use and it’s set to a protection mode of no access (this isn’t to say all 339MB below can’t be allocated, however the lowest allocation for this notepad process is at `15340000 though that could change with usage). If a piece of virtual memory is attempted to be accessed with the no access protection, the behavior is to trigger an access violation. This is not true for other types of protection, such as guard pages.
Usage: Free Base Address: 00000000`00000000 End Address: 00000000`15340000 Region Size: 00000000`15340000 ( 339.250 MB) State: 00010000 MEM_FREE Protect: 00000001 PAGE_NOACCESS
This is the same information as above, just using VMMap instead of attaching a debugger.
When an access violation happens it’s already too late (in some scenarios, such as double frees, the initial problem could have occurred days, weeks, months ago). SQL Server takes the approach of a single thread minidump, only the offending thread stack will be dumped with a minimal amount of memory. This can be helpful, for example to see the state of certain objects and directly referenced memory. In most cases, though, the helpfulness is weak at best, outside of correlating where the AV is happening. In some cases it’s always in the same spot, time after time, which would lend credence to the assumption that there may be a problem with a specific call path scenario or bad input. Other times, when the AV is all over the place, it can lead to the assumption there is some type of bad actor doing something to memory (this could be a host of items, including bad hardware, drivers, etc.).
Much as I stated previously, these dumps aren’t generally helpful as the initial issue or state could have been set at any point in the past since the application had started coupled with the fact that it’s not easily possible to check data structures with public symbols. In some cases, though, the minidumps can be very use such as the one below. I invite you to take a look at the below AV minidump and take a guess at what the problem might be for this instance of SQL Server…
0e ntdll!RtlDispatchException 0f ntdll!KiUserExceptionDispatch 10 ntdll!RtlpWaitOnCriticalSection 11 ntdll!RtlpEnterCriticalSectionContended 12 msdtcprx!CIEnlistmentAsynch::StateMachineDriver 13 msdtcprx!CIEnlistmentAsynch::Release 14 cwbzzodb 15 0x0 16 0x0 17 cwbcore 18 0x0 19 0x0 1a cwbzzodb 1b 0x0 1c 0x0 1d 0x0 1e 0x0 1f 0x0 20 cwbzzodb
Initially it might be thought that there was some sort of stack issue, but really, it’s due to not having the unwind information for the modules (x86 and x64 Windows have different models for the stack and unwinding functions). Looking at the top of the stack before the exception filter invocations, we see all kinds of interesting modules that do not have any function calls associated with them (note that the above stack was generated with
That’s interesting, it’s also interesting that there is a call to an in-provider loaded MSDTC module. This means whatever was being accomplished with the distributed transaction was being called directly from whatever these unknown modules are… and it may or may not be passing in good data to MSDTC.
The purpose here isn’t to get hardcore into debugging and looking at the information, merely how to quickly look at the stack (and not having private symbols) to try and make a fast determination of what could be done to possibly stop the AVs from happening. In this case, I’d look at the unknown modules. Looking at one via the lm commands we see there are no symbols (public or private) for this module. Additionally, the module we are using here is from 2009 (it’s 2021 when this post was written) and lives in the system32 directory.
0:000> lmvm cwbzzodb Browse full module list start end module name 00000000`20180000 00000000`2034d000 cwbzzodb T (no symbols) Loaded symbol image file: cwbzzodb.dll Image path: C:\Windows\System32\cwbzzodb.dll Image name: cwbzzodb.dll Browse all global symbols functions data Timestamp: Thu Dec 10 10:55:06 2009 (4B2127EA) CheckSum: 001CDC8D ImageSize: 001CD000 File version: 220.127.116.11 Product version: 18.104.22.168
If I take a quick look at the system32 directory on a SQL Server with the same version installed, I do not see this file. Further, there is no manufacturer information associated with the file, leading me to believe it’s a 3rd party module that was installed.
This was a trivial fix, stop the module from being loaded into the SQL Server process space.
Not always will AVs be easy to solve, in fact almost all cases are annoyingly hard or impossible from a single thread minidump. However, this shouldn’t preclude you from opening them up and looking at the stacks. This helps you in a few ways:
- You’ll get more familiar with the debugger
- You’ll potentially learn something
- You now have a stack or more information to search the internet
- It might just lead you to a quick resolution
While AVs aren’t SQL Server specific, it’s a prolific exception that happens more often than one would expect. Learning just a little bit about them and why SQL Server does what it does will hopefully add to your arsenal of knowledge. Note that there is no guarantee that SQL Server can continue running after an access violation. While it may continue running without any perceivable issues, there are always things that could happen, such as orphaning or otherwise having a sick spinlock, which could lead to a failure at a future point, or if running in an FCI/AG lead to an automatic failover attempt when the instance learns it’s no longer healthy.