Search Suggest

SQL Server, columnstore indexes: A bit of theory and some examples (when to use columnstore indexes and when not)

Hi guys,
Luca Biondi for server.hoit.asia

 
Today we speak about columnstore index
I wrote a summary, a list of things to know about and some examples in order to learn how to use (or not) this types of indexes.

Are you Ready? Yes! 

 

What is a columstore index

Yes but what is a columnstore index?
 
Microsoft introduced, in SQL Server, Columnstore indexes that are the standard for storing and querying large tables, let's think to tables with million of records.
 
The difference is that data is stored inside page in a different way.
  • In a row-based index data is stored in one or more data pages.
  • In a column-based index data is stored in a separate pages for each column of the table.
Rowstore vs. ColumnStore storage
A columnstore is data that's logically organized as a table with rows and columns, and physically stored in a column-wise data format.
 
A rowstore is data that's logically organized as a table with rows and columns, and physically stored in a row-wise data format. 

Just as there are clustered and non-clustered indexes, we can have clustered and non-clustered columnstore indexes.

No other details in this post just let's see their history for a moment ..
 

The evolution of the columstore indexes

Columnstore indexes are not a new entry infact them will turn 10 years old next year. They were introduced with SQL Server 2012 and so SQL Server 2008 indeed have no clusterindexes at all.
 
With SQL Server 2008 you can only create "classics" rowstore indexes, one clustered Index (CI) and n nonclustered indexes (NCI)
 
As sayd before SQL Server 2012 was the first release that supports a Columnstore, albeit in a very limited way:  you can only create a read-only non clustered Columnstore Index (NCCI).

An NCCI can be created on most data types with some limitations: e.g. the data type DECIMAL is supported only up to a precision of 18 digits.
 
Starting from SQL Server 2014 microsoft introduced a writeable Clustered Columnstore Index (CCI). The CCI must be the only index on a table, so you can't add additional CI or NCI.
 
Finally SQL Server 2016 has come.With this release all Columnstore indexes are writeable.
Furthermore you can have both Rowstore and Columnstore on the same table.

Today, even through some examples, we will see when use columnstore index.

When to use them and when not to

Before show any example must be say that Columnstore indexes are the preferred data storage format for large data warehoue and analytics workloads.
 
However thet can be added also in OLTP queries. I think for example that queries with group by and order by can benefit from columnstore indexes. 
 
Important: Clustered indexes should be used on large table (>5 million of rows) where less than 10% of rows are ever modified.  Deletes can cause fragmentation  thus reducing the index efficency. Updates are expensive because are done by a deleting and inserting technique.

So, what is a tipical OLTP workload that use a large table and a query suitable for a clustered index?
 
I think to tables that contain account movements, inventory movements or lines of invoices, delivery notes or orders. All these logical entities are commonly used in many ERP.
 
 
Example 1 - Adding a non clustered columnstore index
 
Let's do an example taking our example table ORDRIG.

This table contain a list of row of orders and it is so defined:
SELECT [id]
,[idordtes]
,[idcustomer]
,[idproduct]
,[Qty]
FROM [OrdRig]

All field are integer while Qty is float.
On the id column there is a clustered index.
The table has 1.5 million of rows

Now i write a simple query that group by idproduct and perform the sum of the column Qty:
       
SELECT IDPRODUCT,SUM(QTY)
FROM OrdRig 
GROUP BY IDPRODUCT ORDER BY IDPRODUCT

We will analyze 3 cases:
 
1) Only the clustered index on the ID column
2) Adding a non clustered (rowstore) index on the columns IDPRODUCT , QTY
3) Adding a non clustered columnstore index on the columns IDPRODUCT , QTY

For each case we will read the execution plan and the numbers of pages read from the disk (set statistics io and statistics time to ON) 

In the first case we get this result
 
Query Result
 
The execution plan show me that the Clustered index on the ID column is used:

Execution plan


Statistics IO and Statistics TIME to ON show me that we have read 6445 pages of 8 KB Size and that we have an execution time of 620 ms.
       

Tempo di analisi e compilazione SQL Server:
tempo di CPU = 0 ms, tempo trascorso = 0 ms.

Tempo di esecuzione SQL Server:
tempo di CPU = 0 ms, tempo trascorso = 0 ms.
Tempo di analisi e compilazione SQL Server:
tempo di CPU = 0 ms, tempo trascorso = 2 ms.

(2 righe interessate)
Tabella 'Worktable'. Conteggio analisi 0, letture logiche 0, letture fisiche 0, letture server di pagine 0, letture read-ahead 0, letture read-ahead server di pagine 0, letture logiche LOB 0, letture fisiche LOB 0, letture LOB server di pagine 0, letture LOB read-ahead 0, letture read-ahead LOB server di pagine 0.
Tabella 'Workfile'. Conteggio analisi 0, letture logiche 0, letture fisiche 0, letture server di pagine 0, letture read-ahead 0, letture read-ahead server di pagine 0, letture logiche LOB 0, letture fisiche LOB 0, letture LOB server di pagine 0, letture LOB read-ahead 0, letture read-ahead LOB server di pagine 0.
Tabella 'OrdRig'. Conteggio analisi 1, letture logiche 6445, letture fisiche 0, letture server di pagine 0, letture read-ahead 0, letture read-ahead server di pagine 0, letture logiche LOB 0, letture fisiche LOB 0, letture LOB server di pagine 0, letture LOB read-ahead 0, letture read-ahead LOB server di pagine 0.

(1 riga interessata)

Tempo di esecuzione SQL Server:
tempo di CPU = 625 ms, tempo trascorso = 620 ms.
Tempo di analisi e compilazione SQL Server:
tempo di CPU = 0 ms, tempo trascorso = 0 ms.

Tempo di esecuzione SQL Server:
tempo di CPU = 0 ms, tempo trascorso = 0 ms.

Ora di completamento: 2021-08-14T23:49:31.2297857+02:00



For the second case let me add a non clustered index on the column IDPRODUCT , QTY.

Running the Query we obtain that the non clustered index just added is effectively used.

execution plan
Read pages are dropped, Execution time is dropped too.
       

Tempo di analisi e compilazione SQL Server:
tempo di CPU = 0 ms, tempo trascorso = 0 ms.

Tempo di esecuzione SQL Server:
tempo di CPU = 0 ms, tempo trascorso = 0 ms.
Tempo di analisi e compilazione SQL Server:
tempo di CPU = 0 ms, tempo trascorso = 2 ms.

(2 righe interessate)
Tabella 'OrdRig'. Conteggio analisi 1, letture logiche 4292, letture fisiche 0, letture server di pagine 0, letture read-ahead 0, letture read-ahead server di pagine 0, letture logiche LOB 0, letture fisiche LOB 0, letture LOB server di pagine 0, letture LOB read-ahead 0, letture read-ahead LOB server di pagine 0.

(1 riga interessata)

Tempo di esecuzione SQL Server:
tempo di CPU = 485 ms, tempo trascorso = 485 ms.
Tempo di analisi e compilazione SQL Server:
tempo di CPU = 0 ms, tempo trascorso = 0 ms.

Tempo di esecuzione SQL Server:
tempo di CPU = 0 ms, tempo trascorso = 0 ms.

Ora di completamento: 2021-08-15T00:04:01.1891322+02:00



Now the the third case i will add a non clustered columnstore index on the same columns.
 
To create a columnstore index is enough add the word "columnstore" in the index definition after the create word.

Create Columnstore Index IDX_CS_ORDRIG_IDPRODUCT_QTA ON Ordrig (Idproduct,Qty)

Running the query again we get that the columnstore index is used.

execution plan

The execution time is awesome. Only 53 ms versus 485 ms!

       

Tempo di analisi e compilazione SQL Server:
tempo di CPU = 0 ms, tempo trascorso = 0 ms.

Tempo di esecuzione SQL Server:
tempo di CPU = 0 ms, tempo trascorso = 0 ms.
Tempo di analisi e compilazione SQL Server:
tempo di CPU = 0 ms, tempo trascorso = 2 ms.

(2 righe interessate)
Tabella 'OrdRig'. Conteggio analisi 2, letture logiche 0, letture fisiche 0, letture server di pagine 0, letture read-ahead 0, letture read-ahead server di pagine 0, letture logiche LOB 1098, letture fisiche LOB 7, letture LOB server di pagine 0, letture LOB read-ahead 3529, letture read-ahead LOB server di pagine 0.
Tabella 'OrdRig'. Letture segmento 2, segmento ignorato 0.
Tabella 'Worktable'. Conteggio analisi 0, letture logiche 0, letture fisiche 0, letture server di pagine 0, letture read-ahead 0, letture read-ahead server di pagine 0, letture logiche LOB 0, letture fisiche LOB 0, letture LOB server di pagine 0, letture LOB read-ahead 0, letture read-ahead LOB server di pagine 0.

(1 riga interessata)

Tempo di esecuzione SQL Server:
tempo di CPU = 47 ms, tempo trascorso = 53 ms.
Tempo di analisi e compilazione SQL Server:
tempo di CPU = 0 ms, tempo trascorso = 0 ms.

Tempo di esecuzione SQL Server:
tempo di CPU = 0 ms, tempo trascorso = 0 ms.

Ora di completamento: 2021-08-15T00:13:16.0179180+02:00



We consider that this is in line with what was declared by microsoft or 10 times higher performance (10x)
In this case it is true!



Example 2 - Adding a clustered columnstore index

Adding a clustered columnstore index create your data will be physically stored as a columnstore.

So for this example we will drop the rowstore index and will add a columnstore index:


CREATE CLUSTERED COLUMNSTORE INDEX IDX_CS_ORDRIG_ID ON ORDRIG


Note that if your query performs analytics only on few columns of your table then the great part of the segment can de disregarded.
For example: if your table contains 10 columns and your query performs analytics only on 1 columns, then the result will be that 90% of the segments (for other columns) can be disregarded.
 
This is the our query:
       
SELECT MIN(QTY), MAX(QTY) FROM OrdRig

As mentioned above only the QTY column will be read.
 
The execution plan show that the cluster columnstore index is used:


execution plan

Furthermore you can see that the storage is Columnstore and that the execution mode is Batch.

execution plan

Finally we can see that the execution time is of only 62 ms.
      

Tempo di analisi e compilazione SQL Server:
tempo di CPU = 0 ms, tempo trascorso = 0 ms.

Tempo di esecuzione SQL Server:
tempo di CPU = 0 ms, tempo trascorso = 0 ms.
Tempo di analisi e compilazione SQL Server:
tempo di CPU = 6 ms, tempo trascorso = 6 ms.

(1 riga interessata)
Tabella 'OrdRig'. Conteggio analisi 1, letture logiche 0, letture fisiche 0, letture server di pagine 0, letture read-ahead 0, letture read-ahead server di pagine 0, letture logiche LOB 6, letture fisiche LOB 0, letture LOB server di pagine 0, letture LOB read-ahead 0, letture read-ahead LOB server di pagine 0.
Tabella 'OrdRig'. Letture segmento 2, segmento ignorato 0.

(1 riga interessata)

Tempo di esecuzione SQL Server:
tempo di CPU = 62 ms, tempo trascorso = 62 ms.
Tempo di analisi e compilazione SQL Server:
tempo di CPU = 0 ms, tempo trascorso = 0 ms.

Tempo di esecuzione SQL Server:
tempo di CPU = 0 ms, tempo trascorso = 0 ms.

Ora di completamento: 2021-08-15T12:51:48.2934744+02:00



Conclusions

Are you wondering if you can use columnstore indexes for your query that you are developing for your OLTP ERP? Yes you can!

Are you developing statistics, analytics, reports or pivot tables for your ERP and do you need realtime results? Maybe a columnstore index could be applied.

Just you need to be in a right scenario and knows what the PROs and CONs are.

In this post i have not told you the whole columnstore indexes theory but if it is of your interest i can be more deep in a future post
 
Many ERP like SAP (the version with SQL Server not with the hana in memory database) already use columnstore indexes in their query. 
And you? What are you waiting for?

 
 
Thank you for reading my posts.
Send me your comments if you like this format or if you have questions.

Luca Dr. Biondi

Luca Biondi @ SQLServerPerformance blog 2021!






 

  

 

Previous post: SQL Server 2019 CU12 is out and other updates




 
 
 
 


Post a Comment