Ways To Load Data In SQL Azure

This blog provides links to a presentation and a sample application that shows how to load data in SQL Azure using different techniques and tools. The presentation compares the following techniques: INSERT, BCP, INSERT BULK, SQLBulkCopy and a few tools like SSIS and Enzo Data Copy.

The presentation contains load tests performed from a local machine with 4 CPUs, using 8 threads (leveraging the Task Parallel Library), to a SQL Azure database using the code provided below. The test loads 40,000 records. Note that however the test was not conducted from a highly controlled environment, so results may vary. However enough differences were found to show a trend and demonstrate the speed of the SQLBulkCopy API versus other programmatic techniques.

You can download the entire presentation deck here: presentation (PDF)

image

Enzo Data Copy and SSIS

The presentation deck also shows that the Enzo Data Copy wizard loads data efficiently with large tables. However it performs slower for very small databases. The reason the Enzo Data Copy is fast with larger databases is due to its internal chunking algorithm and highly tuned parallel insert operations tailored for SQL Azure. In addition, Enzo Data Copy is designed to retry failed operations that could be the result of network connection issues or throttling; this resilience to connection issues ensures that large databases are more likely to be transferred successfully the first time. The Enzo Data Copy tool can be found here: Enzo Data Copy

In this test, with SSIS left with its default configuration, SSIS was 25% slower than the Enzo Data Copy Wizard with 2,000,000 records to transfer.  The SSIS Package created was very basic; the UseBulkInsertWhenPossible property was set to true which controls the use of the INSERT BULK command. Note that a more advanced SSIS developer will probably achieve better results by tuning the SSIS load; the comparison is not meant to conclude that SSIS is slower than the Enzo Data Copy utility; rather it is meant to show that the utility compares with SSIS in load times with larger data sets. Also note that the utility is designed to be a SQL Server Migration tool; not a full ETL product.

image

 

Note about the source code

Note that the source code is provided as-is for learning purposes. In order to use the source code you will need to change the connection string to a SQL Azure database, create a SQL Azure database if you don’t already have one, and create the necessary database objects (the T-SQL commands to run can be found in the code). 

The code is designed to give you control over how many threads you want to use and the technique to load the data. For example, this command loads 40,000 records (1000 x 40) using 8 threads in batches of 1000: ExecuteWithSqlBulkCopy(1000, 40, 8); While this command loads 40,000 records (500 x 80) using 4 threads in batches of 500: ExecuteWithPROC(500, 80, 4);

Here is the link to the source code: source code

Here is a sample output of the code:

image

Print | posted @ Tuesday, February 7, 2012 5:35 AM

Comments on this entry:

Comments are closed.

Comments have been closed on this topic.