Dec 14

I'm continuing my previous post to include a few more tips to help you...mainly concerned with execution performance. I know that with my projects, time is always the issue. What may seem like a small 5-10 min transfer operation becomes large when coupled with 45 other processes that must run in the morning as part of a data warehouse load.

With SQL Server 2008, I recently read that SSIS 2008 has further enhanced the internal dataflow pipeline engine to provide even better performance, you might have heard the news that SSIS 2008 has set an ETL World record of uploading 1TB of data in less than half an hour.

Pulling High Volumes of Data

As part of a recent project, we had to pull data from a source table which had ~ 4 million records to a new target table. Initially when the SSIS package started, everything looked fine, data was being transferred as expected but gradually the performance degraded and the data transfer rate went down dramatically. During analysis we found that the target table had a primary clustered key and two non-clustered keys. Because of the high volume of data inserts into the target table these indexes got fragmented heavily up to 85%-90%. We used the online index rebuilding feature to rebuild/defrag the indexes, but again the fragmentation level was back to 90% after every 15-20 minutes during the load. This whole process of data transfer and parallel online index rebuilds took almost 12-13 hours which was much more than our expected time for data transfer.

Then we came with an approach to make the target table a heap by dropping all the indexes on the target table in the beginning, transfer the data to the heap and on data transfer completion, recreate indexes on the target table. With this approach, the whole process (by dropping indexes, transferring data and recreating indexes) took just 3-4 hours which was what we were expecting.

Rows Per Batch Settings

Rows per batch – The default value for this setting is -1 which specifies all incoming rows will be treated as a single batch. You can change this default behavior and break all incoming rows into multiple batches. The allowed value is only positive integer which specifies the maximum number of rows in a batch.

Maximum insert commit size – The default value for this setting is '2147483647' (largest value for 4 byte integer type) which specifies all incoming rows will be committed once on successful completion. You can specify a positive value for this setting to indicate that commit will be done for those number of records. You might be wondering, changing the default value for this setting will put overhead on the dataflow engine to commit several times. Yes that is true, but at the same time it will release the pressure on the transaction log and tempdb to grow tremendously specifically during high volume data transfers.

The above two settings are very important to understand to improve the performance of tempdb and the transaction log. For example if you leave 'Max insert commit size' to its default, the transaction log and tempdb will keep on growing during the extraction process and if you are transferring a high volume of data the tempdb will soon run out of memory as a result of this your extraction will fail. So it is recommended to set these values to an optimum value based on your environment.

Note: The above recommendations have been done on the basis of experience gained working with DTS and SSIS for the last couple of years. But as noted before there are other factors which impact the performance, one of the them is infrastructure and network. So you should do thorough testing before putting these changes into your production environment.

Avoid SELECT *

I know this recommendation has been around forever for most DBAs and developers, but in SSIS there is an additional important consideration to avoid using SELECT *. The Data Flow Task (DFT) of SSIS uses a buffer (a chunk of memory) oriented architecture for data transfer and transformation. When data travels from the source to the destination, the data first comes into the buffer, required transformations are done in the buffer itself and then written to the destination. The size of the buffer is dependant on several factors, one of them is the estimated row size. The estimated row size is determined by summing the maximum size of all the columns in the row. So the more columns in a row means less number of rows in a buffer and with more buffer requirements the result is performance degradation. Hence it is recommended to select only those columns which are required at destination.

Even if you need all the columns from the source, you should use the column name specifically in the SELECT statement otherwise it takes another round for the source to gather meta-data about the columns when you are using SELECT *.

OLEDB Destination Settings

There are couple of settings with OLEDB destination which can impact the performance of data transfer.

Data Access Mode – This setting provides the 'fast load' option which internally uses a BULK INSERT statement for uploading data into the destination table instead of a simple INSERT statement (for each single row) as in the case for other options. So unless you have a reason for changing it, don't change this default value of fast load. If you select the 'fast load' option, there are also a couple of other settings which you can use as discussed below.

Keep Identity – By default this setting is unchecked which means the destination table (if it has an identity column) will create identity values on its own. If you check this setting, the dataflow engine will ensure that the source identity values are preserved and same value is inserted into the destination table.

Keep Nulls – Again by default this setting is unchecked which means default value will be inserted (if the default constraint is defined on the target column) during insert into the destination table if NULL value is coming from the source for that particular column. If you check this option then default constraint on the destination table's column will be ignored and preserved NULL of the source column will be inserted into the destination.

Table Lock – By default this setting is checked and the recommendation is to let it be checked unless the same table is being used by some other process at same time. It specifies a table lock will be acquired on the destination table instead of acquiring multiple row level locks, which could turn into lock escalation problems.

Check Constraints – Again by default this setting is checked and recommendation is to un-check it if you are sure that the incoming data is not going to violate constraints of the destination table. This setting specifies that the dataflow pipeline engine will validate the incoming data against the constraints of target table. If you un-check this option it will improve the performance of the data load.

Tags:

Comments

Medical Answering Services

Posted on Wednesday, 20 January 2010 02:45

My wife agrees that I do indeed lead a "dog's life".

The way she puts it though is: "He comes in with muddy feet, makes himself comfortable by the fire, and expects to be fed."


Regards
Marshall

dual diagnosis programs

Posted on Friday, 22 January 2010 03:54

Hey I really apreciate the time & effort you put in to write the article karlhering. It gives extensive information about this topic. It would be really great if you could keep us posted about the latest updates in this topic. Thanks !

plastic surgeon manhattan

Posted on Friday, 22 January 2010 04:27

Worried that his son was spending too much money on dates, Little Johnny's Father asked how much his last date had cost.

Little Johnny calculated a minute then replied, "Oh, about $15 or so I think."

"Well," said his Father, "I'm proud of you for finally coming up with an inexpensive evening."

"To be honest Dad," Little Johnny went on, "we'd have done more, but that was all the money she had."




Regards
Sanders

Computer Services Baltimore

Posted on Sunday, 24 January 2010 08:07

I digged this for more news from you.



Regards and respect
James

plastic surgery san francisco

Posted on Sunday, 24 January 2010 12:38

A man entered a pet shop, wanting to buy a parrot. The shop owner pointed out three identical parrots on a perch and said, "The parrot to the left costs 500 dollars."

"Why does that parrot cost so much?" the man wondered.

The owner replied, "Well, it knows how to use a computer."

The man asked about the next parrot on the perch.

"That one costs 1,000 dollars because it can do everything the other parrot can do, plus it knows how to use the UNIX operating system." Naturally, the startled customer asked about the third parrot.

"That one costs 2,000 dollars."

"And what does that one do?" the man asked.

The owner replied, "To be honest, I've never seen him do a thing, but the other two call him boss!"




Regards
Walker

laser hair removal miami

Posted on Monday, 25 January 2010 13:41

Tried to autotranslate you position not believe the oeuvre any prospect deutsch writing?



Regards
Pridemore

vacation rentals kauai poipu

Posted on Tuesday, 26 January 2010 01:03

Very, real unresponsive situation position! I am passionate it!! Will uprise backwards again - action you feeds also, Thanks.




Regards
Young



thailand budget hotels

Posted on Tuesday, 26 January 2010 23:56

Thats an absorbing station. It was designer visiting your blog. Expectation to visit again.




Regards
Legros




retirement community utah

Posted on Wednesday, 27 January 2010 03:32

Never seen such cool berth. I scan it all the way to the end. Stay them upcoming.




Regards
Calabrese










dog cataract eye drops

Posted on Wednesday, 27 January 2010 03:38

Do you bear visitor posts? I would mate to compose span articles here.




Regards
Gerry



Sonoma Resort

Posted on Thursday, 28 January 2010 04:59

I enjoyed indication this





Regards
Carrillo




buenos aires apartment rental

Posted on Thursday, 28 January 2010 05:42

Hey - precise website, conscionable search around whatsoever blogs, seems a pretty overnice platform you are using. I'm currently using Wordpress for a few of my sites but hunting to alter one of them over to a adps siamese to yours as a test run. Anything in item you would advocate most it?




Regards
Elliot




polywood furniture

Posted on Saturday, 30 January 2010 08:47

Wonderful illustrated entropy. I impart you active that. No dubiety it instrument be very efficacious for my future projects. Would equal to see both other posts on the identical soul!





Regards
Sheppard








massage in reston

Posted on Saturday, 30 January 2010 15:25

Never seen specified caller spot. I see it all the way to the end. Livelihood them forthcoming.




Regards
Hester



plastic surgery orange county

Posted on Saturday, 30 January 2010 16:22

Although I do concur with your move, I hit my own reservations.




Regards
Boles









face lift seattle

Posted on Saturday, 30 January 2010 19:10

Pretty echt situation. I retributory stumbled upon your diary and sought to say that I feature rattling enjoyed measurement your diary posts. Any way I'll be subscribing to your take and I wish you collection again presently.





Regards
Strohm

rhinoplasty beverly hills

Posted on Sunday, 7 February 2010 19:28

Let me copulate if you are curious to wreak as article writer for me? I can tender $10/article.





Regards
Sikora






engraving raleigh

Posted on Monday, 22 February 2010 13:21

Hi nice POst

Regards

Bushan

Spanish Translation Services

Posted on Monday, 22 February 2010 13:41

I bookmarked your post will read this latter


Regards

Arvin

how to stop smoking

Posted on Thursday, 12 August 2010 17:50

Great article! I've been searching yahoo for several hours searching for appropriate facts about this, they really need to position your site higher!

how to stop smoking

Posted on Thursday, 12 August 2010 17:55

Good article! I have been looking thru yahoo for hours trying to find appropriate facts about this, they definitely need to position your website on page 1!

french press coffee

Posted on Thursday, 12 August 2010 18:00

Good article! I've been searching search engines for hours trying to find related info on this, they absolutely should position your website higher!

hughesnet

Posted on Saturday, 14 August 2010 19:39

Good story! I have been looking thru yahoo for several hours searching for relevant information on this, they definitely need to rank your site on page 1!

hughesnet

Posted on Saturday, 14 August 2010 19:57

Good story! I have been searching google for several hours looking for related facts about this, they definitely need to rank your website on the first page!

high speed satellite internet

Posted on Saturday, 14 August 2010 20:09

Nice article! I've been searching google for hours trying to find related information on this, they definitely need to position your website much higher!

hardi siding

Posted on Saturday, 28 August 2010 23:25

kennewick roofing

Posted on Sunday, 29 August 2010 00:18

spokane handyman

Posted on Sunday, 29 August 2010 01:03

Add comment


(Will show your Gravatar icon)

  Country flag

biuquote
  • Comment
  • Preview
Loading