Search Suggest

Posts

Your biggest customer can’t work! It is stuck due to a SQL Server randomly crash? And now?


Hi Guys,
Welcome back!

Today an exciting post like a detective story.

The breath on the neck of our large customer reminds you that it is the end of the month and the administration must proceed with the billing!
But SQL Server restarts randomly 

From where can we start our investigations?

First of all it must be remembered that tracing the cause of this type of error is never trivial.
Never!

Today I will tell you the procedure that I personally follow!

First take a look to the error log:
       

2019-11-29 10:31:04.12 Server Error: 17311, Severity: 16, State: 1.
2019-11-29 10:31:04.12 Server SQL Server is terminating because of fatal exception c0000005. This error may be caused by an unhandled Win32 or C++ exception, or by an access violation encountered during exception handling. Check the SQL error log for any related stack dumps or messages. This exception forces SQL Server to shutdown. To recover from this error, restart the server (unless SQLAgent is configured to auto restart).
2019-11-29 10:31:04.13 Server **Dump thread - spid = 0, EC = 0x0000000000000000
2019-11-29 10:31:04.13 Server ***Stack Dump being sent to C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\MSSQL\LOG\SQLDump0096.txt
2019-11-29 10:31:04.13 Server * *******************************************************************************
2019-11-29 10:31:04.13 Server *
2019-11-29 10:31:04.13 Server * BEGIN STACK DUMP:
2019-11-29 10:31:04.13 Server * 11/29/19 10:31:04 spid 34452
2019-11-29 10:31:04.13 Server *
2019-11-29 10:31:04.13 Server * ex_handle_except encountered exception C0000005 - Server terminating
2019-11-29 10:31:04.13 Server *
2019-11-29 10:31:04.13 Server *
2019-11-29 10:31:04.13 Server * MODULE BASE END SIZE
2019-11-29 10:31:04.13 Server * sqlservr 00007FF7F3020000 00007FF7F3087FFF 00068000
2019-11-29 10:31:04.13 Server * ntdll 00007FF82A110000 00007FF82A2DFFFF 001d0000
2019-11-29 10:31:04.13 Server * KERNEL32 00007FF82A060000 00007FF82A10BFFF 000ac000

[………]

2019-11-29 10:31:08.67 Server External dump process return code 0x20000001.
External dump process returned no errors.

2019-11-29 10:31:08.67 Server SQL Server shutdown has been initiated
2019-11-29 10:31:08.67 spid13s SQL Trace was stopped due to server shutdown. Trace ID = '1'. This is an informational message only; no user action is required.
2019-11-29 10:31:08.69 Logon Error: 17188, Severity: 16, State: 1.

[………]

2019-11-29 10:31:08.80 Logon SQL Server cannot accept new connections, because it is shutting down. The connection has been closed. [CLIENT: 172.16.1.35]
2019-11-29 10:31:08.91 spid15s The SQL Server Network Interface library successfully deregistered the Service Principal Name (SPN) [ MSSQLSvc/xxx.xxx.locale:1433 ] for the SQL Server service.


From the error log we can observe, first of all, when the occoured.
This help to identify what procedures was running in that moment.
We can see also the error number in this case C000005.


SQL Server is terminating because of fatal exception c0000005.

Actually not very descriptive.
And now? 

Take a memory dump


You must know that in case of crash SQL Server generates a memory dump.
So locate a file with the mdmp (memory dump) extension.

Now i copy the most recent file from the customer's server to my pc.
This is because we will inside into the memory dump and this will require a debugger software.

The debugger Windbg


On my personal computer i have installed a debugger called Windbg.
You do the same.
After the installation you need only to download all the SQL Server symbols file available.
Each bin file and each dll of SQL Server have it's psbd file downloadable.
That great!

Now execute Windbg.
Then locate the "open Crash Dump..." voice in the File Menù and choose your mdmp file.



Ok then start to analyze our dump.
I will not enter in details on this program.
I will say only that for starting analysis you must write this command:

!analyze -v



And again ...


Now we can observe in what point crash happened!
Look at the rows highlighted above.


Who is the guilty one?


Now look!
Note two names highlighted that are:  CWBCore.dll and Cwbodbc.dll

These are the culprits!
These two dll are part of the IBM iSeries Access for Windows ODBC driver


Unfortunately for these driver psdb symbols are not available so we can't indentify the name of the procedure that cause the crash.

But...

SQL Server uses this ODBC driver to read data from a linked server.
So now we know where to look.

By disabling, one at time each Query, developers found the guilty one.


So, at this point, the case is closed! 
The guilty was not the butler (ah ah ah!) but a Query executed through the linked server!
And the customer invoiced happy and contented 

That's all for today and have a good week end!
I hope you enjoy this post!
See you soon.

Luca Biondi @ SQLServerPerformance blog!




 

 

 

 

Next post: SQL Server, Why doesn't my update come to an end? a light post!

Previous post: SQL Server, Again info about statistics and the "Ascending Key Problem"

A song with the mood of the dayBillie Jean

Post a Comment