SQL Server, How to do a fast massive insert

Hi Guys!

Today, a light post to start the week!

An easy and practical trick for massive inserts.

Should we create indexes before or after populating a table?

Enjoy the reading!

A massive insert

Suppose you have to insert a large number of rows into a table that doesn't exist yet. This table will have a clustered index, let's say on the Id column.

Let's start with a question: when should we create our clustered index?

Many times I have seen procedures that create the table, put the clustered index on it, and only then perform the massive insert.

Let's do our test!

Let's create our table with the command:


CREATE TABLE [dbo].[Movements](
[Id] [int] NULL,
[Qty] [float] NULL,
[Price] [float] NULL
) ON [PRIMARY]

Right now our table has no clustered index, so it is called a heap.
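A quick way to confirm this, assuming dbo.Movements has just been created as above, is to look it up in the sys.indexes catalog view: a heap shows up as a row with index_id 0 and type_desc HEAP, while a clustered index appears as index_id 1 with type_desc CLUSTERED.

```sql
-- List the index rows SQL Server keeps for dbo.Movements.
-- A heap appears as index_id = 0 / type_desc = 'HEAP';
-- once a clustered index exists it appears as index_id = 1 / 'CLUSTERED'.
SELECT i.index_id,
       i.name,
       i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.Movements');
```

Running the same query again after creating the clustered index should show the CLUSTERED row instead of the HEAP one.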

Let's create our clustered index:


CREATE CLUSTERED INDEX CI_MOVEMENTS_ID ON Movements(ID)

Now we insert 10 million rows with this command:


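-- Generate 10 million rows: Id = 10000000 - row number (descending values),
-- Qty and Price are the constant 1. The sys.columns self-joins only serve
-- as a row source large enough to reach 10 million rows.
INSERT INTO Movements (Id,Qty,Price)
SELECT 
  TOP 10000000 10000000 - row_number() OVER (ORDER BY s1.object_id),1,1
FROM sys.columns s1 
  JOIN sys.columns s2 ON s1.object_id <> s2.object_id
  JOIN sys.columns s3 ON s1.object_id <> s3.object_id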

Total execution time:


 SQL Server parse and compile time: 
   CPU time = 62 ms, elapsed time = 149 ms.

 SQL Server Execution Times:
   CPU time = 67469 ms,  elapsed time = 102457 ms.

It took 102 seconds to insert 10 million rows.
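The output above has the format produced by SET STATISTICS TIME. A sketch of how such a measurement can be reproduced, assuming it is run against a freshly created dbo.Movements with the clustered index already in place (the actual timings will depend on your hardware):

```sql
-- Report parse/compile and execution CPU and elapsed times
-- (shown in the Messages tab of SSMS)
SET STATISTICS TIME ON;

INSERT INTO dbo.Movements (Id, Qty, Price)
SELECT TOP 10000000
       10000000 - ROW_NUMBER() OVER (ORDER BY s1.object_id), 1, 1
FROM sys.columns AS s1
JOIN sys.columns AS s2 ON s1.object_id <> s2.object_id
JOIN sys.columns AS s3 ON s1.object_id <> s3.object_id;

-- Stop collecting timing information for the rest of the session
SET STATISTICS TIME OFF;
```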

However, there is a thought...

If you think about it, by creating the index before inserting the data we force SQL Server to keep the table sorted by Id every time a row is inserted!
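One way to look at the structure that this index maintenance leaves behind, assuming the load above has completed, is sys.dm_db_index_physical_stats: for each index of dbo.Movements it reports how many pages the leaf level occupies and how fragmented it is. This is only a diagnostic sketch; it does not by itself measure the cost of keeping the rows sorted.

```sql
-- Inspect the physical state of every index on dbo.Movements.
-- 'LIMITED' is the cheapest scan mode and reports leaf-level figures.
SELECT ps.index_id,
       ps.index_type_desc,
       ps.page_count,
       ps.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(),                     -- current database
         OBJECT_ID(N'dbo.Movements'), -- table used in the test
         NULL,                        -- all indexes
         NULL,                        -- all partitions
         N'LIMITED') AS ps;
```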

We could therefore do it differently!

So let's do it the other way around: create the table, populate it and only then, after the massive insert, create the index.

Let's repeat the test and look at the results: drop the table and recreate it!
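Dropping and recreating the table as a heap can be scripted like this; the OBJECT_ID check is only there so that the script also runs when the table does not exist yet.

```sql
-- Drop the test table if it is already there
IF OBJECT_ID(N'dbo.Movements', N'U') IS NOT NULL
    DROP TABLE dbo.Movements;

-- Recreate it without any index: the table is a heap again
CREATE TABLE [dbo].[Movements](
    [Id] [int] NULL,
    [Qty] [float] NULL,
    [Price] [float] NULL
) ON [PRIMARY];
```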

This time we will not add the clustered index before inserting the data.

Now let's run the same insert command:


INSERT INTO Movements (Id,Qty,Price)
SELECT 
  TOP 10000000 10000000 - row_number() OVER (ORDER BY s1.object_id),1,1
FROM sys.columns s1 
  JOIN sys.columns s2 ON s1.object_id <> s2.object_id
  JOIN sys.columns s3 ON s1.object_id <> s3.object_id

we get:


 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 6 ms.

 SQL Server Execution Times:
   CPU time = 71234 ms,  elapsed time = 81260 ms.

It took 81 seconds instead of 102 seconds: about 20% less.

Not Bad!
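Keep in mind that the 81 seconds measure only the INSERT into the heap: the clustered index still has to be built afterwards, and its cost belongs in the comparison with the 102 seconds of the first test. A sketch of how that step can be timed, assuming dbo.Movements is the heap just populated above:

```sql
-- Report CPU and elapsed time of the index build
SET STATISTICS TIME ON;

-- Build the clustered index on the already populated heap
CREATE CLUSTERED INDEX CI_MOVEMENTS_ID ON dbo.Movements(Id);

SET STATISTICS TIME OFF;
```

The elapsed time reported for the CREATE CLUSTERED INDEX statement, added to the time of the INSERT, gives the total cost of the "index after" approach.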

That's all for today.

I wish you a great week!

The next post will be about another trick, so stay tuned!

...and don't forget to follow me on LinkedIn by clicking on "follow me"!
