Demystifying Dumps: Access Violation

Access Violations (AV) are another common error that will cause SQL Server to take a memory dump. These can occur for a variety of reasons, and unlike the last one (Non-Yielding Scheduler) it’s quite the task, especially given public symbols don’t have structure data and offsets.

Citations and Fines

Access violations occur because a region of memory was attempted to be access but was otherwise not available to be accessed. This could be on purpose, such as protecting read-only memory and a scribbler comes along and attempts to write to it in which case the memory is protected as read only… or perhaps code which expects a value to always be there hits a code path where the value is not there but doesn’t do a sanity check before attempting to access it. There are many reasons, again, and the previous examples are just two. I will say, though, the most common reason is from attempting to access the bottom portions of memory (minimum the bottom 1 MB on modern versions of Windows), which, again, is generally due to bad pointer math or not checking before accessing.

Below is an example of this, it’s just notepad. Notepad is generally fairly light in the memory department (until you use dialogs) and won’t use much memory. If we look at the lowest address range, the descriptor tells us that it’s not in use and it’s set to a protection mode of no access (this isn’t to say all 339MB below can’t be allocated, however the lowest allocation for this notepad process is at `15340000 though that could change with usage). If a piece of virtual memory is attempted to be accessed with the no access protection, the behavior is to trigger an access violation. This is not true for other types of protection, such as guard pages.

Usage: Free
Base Address: 00000000`00000000
End Address: 00000000`15340000
Region Size: 00000000`15340000 ( 339.250 MB)
State: 00010000 MEM_FREE
Protect: 00000001 PAGE_NOACCESS

This is the same information as above, just using VMMap instead of attaching a debugger.

Summons

When an access violation happens it’s already too late (in some scenarios, such as double frees, the initial problem could have occurred days, weeks, months ago). SQL Server takes the approach of a single thread minidump, only the offending thread stack will be dumped with a minimal amount of memory. This can be helpful, for example to see the state of certain objects and directly referenced memory. In most cases, though, the helpfulness is weak at best, outside of correlating where the AV is happening. In some cases it’s always in the same spot, time after time, which would lend credence to the assumption that there may be a problem with a specific call path scenario or bad input. Other times, when the AV is all over the place, it can lead to the assumption there is some type of bad actor doing something to memory (this could be a host of items, including bad hardware, drivers, etc.).

Much as I stated previously, these dumps aren’t generally helpful as the initial issue or state could have been set at any point in the past since the application had started coupled with the fact that it’s not easily possible to check data structures with public symbols. In some cases, though, the minidumps can be very use such as the one below. I invite you to take a look at the below AV minidump and take a guess at what the problem might be for this instance of SQL Server…

0e ntdll!RtlDispatchException
0f ntdll!KiUserExceptionDispatch
10 ntdll!RtlpWaitOnCriticalSection
11 ntdll!RtlpEnterCriticalSectionContended
12 msdtcprx!CIEnlistmentAsynch::StateMachineDriver
13 msdtcprx!CIEnlistmentAsynch::Release
14 cwbzzodb
15 0x0
16 0x0
17 cwbcore
18 0x0
19 0x0
1a cwbzzodb
1b 0x0
1c 0x0
1d 0x0
1e 0x0
1f 0x0
20 cwbzzodb

Initially it might be thought that there was some sort of stack issue, but really, it’s due to not having the unwind information for the modules (x86 and x64 Windows have different models for the stack and unwinding functions). Looking at the top of the stack before the exception filter invocations, we see all kinds of interesting modules that do not have any function calls associated with them (note that the above stack was generated with kc).

That’s interesting, it’s also interesting that there is a call to an in-provider loaded MSDTC module. This means whatever was being accomplished with the distributed transaction was being called directly from whatever these unknown modules are… and it may or may not be passing in good data to MSDTC.

The purpose here isn’t to get hardcore into debugging and looking at the information, merely how to quickly look at the stack (and not having private symbols) to try and make a fast determination of what could be done to possibly stop the AVs from happening. In this case, I’d look at the unknown modules. Looking at one via the lm commands we see there are no symbols (public or private) for this module. Additionally, the module we are using here is from 2009 (it’s 2021 when this post was written) and lives in the system32 directory.

0:000> lmvm cwbzzodb
Browse full module list
start end module name
00000000`20180000 00000000`2034d000 cwbzzodb T (no symbols)
Loaded symbol image file: cwbzzodb.dll
Image path: C:\Windows\System32\cwbzzodb.dll
Image name: cwbzzodb.dll
Browse all global symbols functions data
Timestamp: Thu Dec 10 10:55:06 2009 (4B2127EA)
CheckSum: 001CDC8D
ImageSize: 001CD000
File version: 13.64.0.0
Product version: 7.1.0.0

If I take a quick look at the system32 directory on a SQL Server with the same version installed, I do not see this file. Further, there is no manufacturer information associated with the file, leading me to believe it’s a 3rd party module that was installed.

This was a trivial fix, stop the module from being loaded into the SQL Server process space.

Testimony

Not always will AVs be easy to solve, in fact almost all cases are annoyingly hard or impossible from a single thread minidump. However, this shouldn’t preclude you from opening them up and looking at the stacks. This helps you in a few ways:

  1. You’ll get more familiar with the debugger
  2. You’ll potentially learn something
  3. You now have a stack or more information to search the internet
  4. It might just lead you to a quick resolution

While AVs aren’t SQL Server specific, it’s a prolific exception that happens more often than one would expect. Learning just a little bit about them and why SQL Server does what it does will hopefully add to your arsenal of knowledge. Note that there is no guarantee that SQL Server can continue running after an access violation. While it may continue running without any perceivable issues, there are always things that could happen, such as orphaning or otherwise having a sick spinlock, which could lead to a failure at a future point, or if running in an FCI/AG lead to an automatic failover attempt when the instance learns it’s no longer healthy.

1 thought on “Demystifying Dumps: Access Violation”

Comments are closed.