Search Suggest

SQL Server, How to do a fast massive insert

Hi Guys!

Today a light post to read to start the week!
 
An easy and practical trick related to massive insertions.
Do we insert indexes before or after populating a table?

Enjoy the reading!

 

 

A massive insert

Suppose you have to insert a large number of rows into a table that doesn't exist yet. This table will have a clustered index let's say on the ID field.

Let's start with a question: When do we create our clustered index?

I have seen many times procedures that created the table and put the clustered index on it, then mass insertion took place.

Let's do our test!

Let's create our table with the command:


CREATE TABLE [dbo].[Movements](
[Id] [int]
[Qty] [float] NULL,
[Price] [float] NULL
) ON [PRIMARY]

Right now our table has no indexes so it's called a heap table.

Let's create our clustered index:


CREATE CLUSTERED INDEX CI_MOVEMENTS_ID ON Movements(ID)

Now we insert 10 million rows with this command:


INSERT INTO Movements (Id,Qty,Price)
SELECT
TOP 10000000 10000000 - row_number() OVER (ORDER BY s1.object_id),1,1
FROM sys.columns s1
JOIN sys.columns s2 ON s1.object_id <> s2.object_id
JOIN sys.columns s3 ON s1.object_id <> s3.object_id

Total execution time:


SQL Server parse and compile time:
CPU time = 62 ms, elapsed time = 149 ms.

SQL Server Execution Times:
CPU time = 67469 ms, elapsed time = 102457 ms.


It took us 102seconds to enter 10 million records.

 

However, there is a though

If you think about it, already having the index on the table before inserting the data we force SQL server to keep the table sorted every time a row is inserted!

We could therefore do it differently!

Let's create the table. We populate it and finally, after the massive insertion, we create the index.

Let's repeat the test and see the results .. so drop the table and recreate it!

This time we will not add the clustered index before insert data.

Running now the same insert command below:


INSERT INTO Movements (Id,Qty,Price)
SELECT
TOP 10000000 10000000 - row_number() OVER (ORDER BY s1.object_id),1,1
FROM sys.columns s1
JOIN sys.columns s2 ON s1.object_id <> s2.object_id
JOIN sys.columns s3 ON s1.object_id <> s3.object_id

we have:


SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 6 ms.

SQL Server Execution Times:
CPU time = 71234 ms, elapsed time = 81260 ms.

It took us 81 seconds instead of 102 seconds.About 20% less.

Not Bad!



 

That's all for today.
I wish you a great weekend! 
the next post will be about another trick ..so stay tuned!

...and don't forget to follow me on linked by clicking on follow me!



 

 

 

 

Help me to share knowledge on my blog  


Next post:

Previous post: SQL Server: Grouped Aggregate Pushdown. Make columnstore indexes go faster with aggregate pushdown functionality!

Post a Comment