Search Suggest

SQL Server, the Parametrization and the Parameter Sniffing

Hi Guys,

What I want to do today is an experiment.
In fact this is my first article written in English.
I don't know at the moment if it will be the last.
But what is certain is that, as usually, i will try to tell you each topic in a simple and clear way!

Enjoy the reading!



In my previous article i showed you a little tip, today instead i will tell you a little about theory.
We will talk about parameterization and a phenomenon called parameter sniffing.
But now to introduce the topic lets start with...

The history of the execution of a Query

Today's question is?  What happens when i run a query?
Well, as you can imagine, solving a query is a process that requires many steps.
These steps, how can be see from the drawing below, are:


Query execution steps: Parsing, Binding, Opitimization, Execution, Result


The Parsing phase

The first of these steps is called parsing.
During the parsing (the parser) checks our T-SQL String for proper syntax (eg. you write the words "SLECT" or "FRON")
At the end a structure called parse tree is generated.

The Binding phase

After the parsing phase it's the turn of the binding phase.
During this phase also called the Algebrizing phase (…and previously also called normalization in the old SQL Server versions) the "parse tree" is read and after several operations the "Algebrized tree" (or "Query tree") is generated.
In this moment are performed these four steps:
  • Name resolution: During the Name resolution is verified that all objects present in the Query actually exists and that they can be accessed in the context of user security.
  • Type derivation: The next step is called Type derivation and during this step is determined the final data type for each node of the parse tree.
  • Aggregate binding: In the third step in performed the Aggregate binding: is verified if GROUP BY and aggregate columns are placed in right place or not
  • Group Binding: In the last step called Group Binding is bound any aggregations to the appropriate select list.
In this phase if a syntax error is found the optimization process halts and the error is returned to the user.
The algebrized tree will be sent to the Query Optimizer.

The Query Optimization


Well, now that we have the Algebrized tree is the turn of the Query Optimization block.
His goal is to find a good way to return data, remember not the best.

This is because in a non-trivial query there can be hundreds or thousands of different ways to return the same result. This process however is a very  time-consuming process and so finding the best way to optimize the query took too long.

For this reason the optimizer is based on a cost concept. The SQL Server engine has a certain amount of time available in which it will search for the least cost among the already calculated execution plans. But let's not go into more detail today..

The important concept is:

We have therefore seen how expensive it is to determine the best way to return data.
And therefore for this reason that SQL Server already caches the execution plans and tends to reuse them.

And how does he do it? ...through the parametrization! 

 

And now finally let's talk about the parametrization


Another Concept: Execution plans are cached and stored in a memory area called cache plan.
Sayd this, the parametrization is a process whereby SQL Server takes the T-SQL you entered and try to find and replace the values with a token.
In this way when another similar T-SQL is parametrized SQL Server can reuse the previous cached execution plan.

For example, assume now to execute the following queries:

SELECT * FROM PERSON WHERE SURNAME = 'Tom'


The String 'tom' will replaced with a parameter and T-SQL Query will be stored in the cache plan as:

SELECT * FROM PERSONE WHERE SURNAME = @1


If now execute this Query:

SELECT * FROM PERSON WHERE SURNAME = 'Jerry'

The parametrization again will replace the String "Jerry" with a parameter.

But this time a row of cache plan with this plan will already exist and so the execution plan will be reused!


A little more in detail:


If you have tried running the queries above, you will notice that a row has been inserted into the "plan cache" for the parameterized T-SQL string.
 
However, a row is added for each T-SQL not yet parameterized but in this way there is therefore a waste of memory. 

SQL Server Plan Cache

For this reason, when writing a Query (for example within a Stored Procedure) it is advisable to specify a parameter directly.
 
For example in this way:

CREATE procedure SP_EXAMPLE_DO_NOTHING AS
Begin
DECLARE @surname varchar(20)
SET @Surname = 'Tom'
SELECT * FROM PERSON WHERE SURNAME = @Surname
End

There are many other things that I would like to tell you but today I am only interested in explaining to you the logic.
But we'll talk about it soon.

The parameter sniffing


What is the parameter sniffing?

When the optimizer creates an execution plan it sniffs the parameter values
So this is not an issue because a parameter is needed to build the best plan. 
So sniffing is necessary!
 
But what happens? 
By doing sniffing we are bound to a specific value and his becomes a problem if the data is not distributed evenly.
 
In most cases the database workload is homogeneous so parameter sniffing won’t be a problem. 
However on a small number of cases this becomes a problem!
 
To better understand suppose you have a table containing orders placed by a customer.

We can have a customer who makes a single order and a customer who has made so many.

 If we execute our query by querying the order table for the customer who has only one order, the execution plan will be optimized for this specific case.
 
If we subsequently execute our query querying the order table for the customer a lot of orders the execution plan will in any case be the one already in the cache.

A pratical example:

We create a stored procedure that execute a select.

Then we run the stored procedure passing in the parameter @id a value for wich only one row is returned (@id = 13)
If now we look at the execution plan we see that an index seek on the table product is used.


 
Now, let's run the same procedure again with another value.
By executing the stored procedure with the value 12 we extract almost 100K rows.
Our execution plan is in the cache plan, so it is reused.



But where is the problem?

The problem is that in this case the exection plan is not optimal.

Since we have practically extracted all the rows from the table it would have been more convenient to use an index scan:



There would be many other things I would like to tell you but that's all for today!
I hope you enjoyed this article. So if you liked this article subscribe to my blog!

See you soon!
Luca

Luca Biondi @ The Cutting Edge SQL Server Performance blog!


 

 

 
 
 
 
Next post: SQL Server 2019 and the Accelerated Database Recovery

Previous post: SQL Server, come vedere tutte le proprietà di una tabella con un solo comando


Post a Comment