SQL Server, the "Group By" competition: from 30 seconds to 56 milliseconds

Hi Guys,

In the last posts we spoken of all the teory begin the new features of SQL 2019.

Today it's time to get to work!

I propose you an exercise on which you can also try your hand at home.

Given a Query it will be necessary to write it in order to make it as fast as possible.
All the techniques we have said up to now are valid.

Given a table containing movements what we want to obtain is the number of the various classifiers.
If you think about it it is a very common request.

I will propose some solutions but I expect your solutions!

Ready?

Let the "Group by" Race begin

As a data structure we have a table called Moviments:


CREATE TABLE [dbo].[Movements](
	[Id] [int] IDENTITY(1,1) NOT NULL,
	[Qty] [float] NULL,
	[Price] [float] NULL,
	[Classifiers] [varchar](10) NULL
) ON [PRIMARY]
GO

CREATE CLUSTERED INDEX CI_MOVEMENTS_ID ON Movements(ID)

A clustered index (CI) is defined on the column Id.

For this example we fill our table with 50,000,000 records using the command:


INSERT INTO Movements (Qty,Price,Classifiers)
SELECT 
  TOP 50000000 1,1,CHAR( 65 + row_number() OVER (ORDER BY s1.object_id) % 4)
FROM SYS.Columns s1 
  JOIN SYS.Columns s2 ON s1.object_id <> s2.object_id
  JOIN SYS.Columns s3 ON s1.object_id <> s3.object_id

The column Classifiers contain the value of a classifiers. For example: A, B, C, ...

You will have already understood, a solution we can get it by writing in this way:


SELECT Classifiers, COUNT(Classifiers)
FROM Movements
GROUP BY Classifiers

This is the result:

We have 12.500.000 rows with the classifier "A", and the same number with the others classifiers "B", "C" and "D".

But what about performances? ...Well we need a baseline!

This is our baseline!

We set the compatibility level to 140 (SQL Server 2017)


ALTER DATABASE orderbyrace SET COMPATIBILITY_LEVEL = 140

Then set the MAXDOP property (maximum degree of parallelism) to 1


SELECT Classifiers, COUNT(Classifiers)
FROM Movements
GROUP BY Classifiers
OPTION (MAXDOP 1)

On my VM running SQL2019, with 6 GB of memory assigned to SQL Server and 8 CPU the run time was... 32818 ms.


Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.
Table 'Movements'. Scan count 1, logical reads 223582, physical reads 0, page server reads 0, read-ahead reads 74680, page server read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob page server reads 0, lob read-ahead reads 0, lob page server read-ahead reads 0.

(1 row affected)

 SQL Server Execution Times:
   CPU time = 30547 ms,  elapsed time = 32818 ms.

Yes, there will also be 50 million movements in the table but with 32 seconds of time elapsed your customer is already calling you on the phone ... can't you hear him ringing?

Let's see where we can get!

How many CPUs do you have?

By removing the MAXDOP option, SQL Server is free to use the CPUs it has available.

Below we have our mr execution plan with MAXDOP = 1.

Now if the extimated Subtree Cost is great than the cost threshold for parallelism value (I have set a value of 35) the Query will be parallelized:

Note that a new operator called parallelism (Gather Stream) apparared. The task of this operator is to collect all multiple input streams into a single output stream.

I ran the above query with 2, 4 and 8 CPUs and these are the results:

From 32.818 to 7.329 s seconds! not bad but it's just the beginning!

The Power of the Batch mode!

Let's take a step forward now.
Let's set SQL Server 2019 native compatibility level and run the tests again!


ALTER DATABASE orderbyrace SET COMPATIBILITY_LEVEL = 150

In order to use these new features you need to install the developer version (as I did) or the enterprise version.

Go here if you don't know what a batch is: SQL Server 2019 and the Batch Mode on Rowstore

let's now run our query again without any MAXDOP OPTION


SELECT Classifiers, COUNT(Classifiers)
FROM Movements
GROUP BY Classifiers

Ready?

This time the execution plan shows that Batch Mode is used.

This time with all the 8 CPUs we have an elapsed time of 1881 ms.


SQL Server Execution Times:
   CPU time = 5984 ms,  elapsed time = 1881 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

mmmh... probably this time the customer will not call us!

Do we go even faster?

Yes I know you want it ... if you are on this blog!

Let's try adding an index on the classifier field..


CREATE INDEX IDX_MOVEMENTS_CLASSIFIERS ON MOVEMENTS(CLASSIFIERS)

Run again our Query.

The index added is used.


SQL Server Execution Times:
   CPU time = 19141 ms,  elapsed time = 5037 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

This time we are not lucky!
We are back to row mode

Then let's try to add a columnstore index:


CREATE COLUMNSTORE INDEX IDX_MOVEMENTS_CLASSIFIERS ON MOVEMENTS(CLASSIFIERS)

Let's see what happens by running our query again.

This is the execution plan:

Wow! We have a new record:


SQL Server Execution Times:
   CPU time = 156 ms,  elapsed time = 179 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

This time our client is calling us to compliment us

Remember that we are reading 50 millions of rows! While I often hear ... SQL Server is slow because there are so many data and maybe the table has 30,000 rows ..

Can we do better? We can use a feature of the columnstore index called..

And finally the aggregate Pushdown

Using the Aggregate Pushdown feature of the Batch Mode is possible to push down an aggregate operation (like min, max, sum and count) to the scan node.

Without going into further details, because a dedicated post will follow in the future, our columnstore index scan operator will return distinct values directly and thus avoiding reading our 50 million rows!

Actually this feature has some limitations and therefore to use it we will use a trick:

We will add to our Query the aggregate operation COUNT_BIG(*)

Our query will become:


SELECT Classifiers, COUNT(Classifiers)
FROM Movements
GROUP BY Classifiers
ORDER BY COUNT_BIG(*)

Try now to execute the Query!

From the extraction plan you can see that the 50 million lines are no longer read!

Sit down and watch the running time:


SQL Server Execution Times:
  CPU time = 0 ms, elapsed time = 56 ms.
SQL Server parse and compile time:  
  CPU time = 0 ms, elapsed time = 0 ms.

Wow only 56 milliseconds.. I remember we started with a running time of 30 seconds .. judge for yourself!

That's all for today mate!
But don't forget to follow me also on linked to be updated as soon as a new post comes out!

Luca Biondi @ SQLServerPerformance blog!

Help me to share knowledge on my blog

Previous post: SQL Server 2019, Row Mode Memory Grant Feedback

HOIT ASIA -Sever Free Download

Search Suggest