Friday, January 30, 2009

New improvements / features in SSIS 2008

With the release of SQL SERVER 2008 comes improved SSIS 2008. I will try to list down the improved and new features in SSIS 2008

1) Improved Parallelism of Execution Trees:

The biggest performance improvement in the SSIS 2008 is incorporation of parallelism in the processing of execution tree. In SSIS 2005, each execution tree used a single thread whereas in SSIS 2008 , the Data flow engine is redesigned to utilize multiple threads and take advantage of dynamic scheduling to execute multiple components in parallel, including components within the same execution tree

2) Any .NET language for Scripting:

SSIS 2008 is incorporated with new Visual Studio Tool for Application(VSTA) scripting engine. Advantage of VSTA is it enables user to use any .NET language for scripting.

3) New ADO.NET Source and Destination Component:

SSIS 2008 gets a new Source and Destination Component for ADO.NET Record sets.

4) Improved Lookup Transformation:

In SSIS 2008, the Lookuo Transformation has faster cache loading and lookup operations. It has new caching options, including the ability for the reference dataset to use a cache file(.caw) accessed by the Cache Connectin Manager. In addition same cache can be shared between multiple Lookup Transformations.

5) New Data Profiling Task:

SSIS 2008 has a new debugging aid Data Profiling Task that can help user analyze the data flows occurring in the package.In many cases, execution errors are caused by unexpected variations in the data that is being transferred. The Data Profiling Task can help users to discover the cource of these errors by giving better visibility into the data flow.

6) New Connections Project Wizard:

One of the main usability enhancement to SSIS 2008 is the new Connections Project Wizard. The Connections Project Wizard guides user through the steps required to create source and destinations.

Thursday, January 29, 2009

Difference between Merge and Union all transformation

I have been asked by many new SSIS developer about difference between Merge and Union all transformation in SSIS.

Well both of them essentially takes outputs from more than one sources and combines them into a single result set but there are couple of differences between two:

a) Merge transformation can accept only two inputs whereas Union all can take more than two inputs

b) Data has to be sorted before Merge Transformation whereas Union all doesn't have any condition like that.

Tuesday, January 27, 2009

Using varibles in Data flow Script task in SSIS

How to use variables in Script component of Data flow task in SSIS

1) Declare a variable as MyVar in varible window.


2) Declare variable type in dataflow> Script Component >Script


3) Use variable in the script
Now here is the catch we use variable in data flow task by
Me.Variable.MyVar ="MyVariable"

where as in control flow we use variable using
DTS.Variable("MyVar").value = "MyVariable"

Monday, January 05, 2009

Creating directory using SSIS

Creating a directory/folder at a dynamic path using SSIS sometimes becomes complex job as I feel and can understand by reading questions posted on various SSIS forums, File System task is not good in handling dynamic path

Here as an example and answer to a question posted on MSDN forum I would demonstrate how we can create a directory of today's date.

1) Create a variable varDirPath of string datatype


2.a) Create a Script task and give varDirPath as ReadWriteVariable


2.b) Use below code to set varDirPath value as the path of directory to be created
Public Sub Main()
'set dir path
Dts.Variables("varDirPath").Value = "D:\" + Date.Today.Year.ToString _
+ Date.Today.Month.ToString + Date.Today.Day.ToString

Dts.TaskResult = Dts.Results.Success
End Sub

this will create path like D:\20090115

3) Use File System Task and configure like below pic:


4) Execute the package

Friday, January 02, 2009

Cracking Query Execution - II: Why Actual and Estimated Query Execution Plan may differ

Generally there is no difference between Actual Execution plan and Estimated plan and Query is executed the way Estimated Plan outlines.However, circumstances can arise that can cause Execution Plan to change:

Obsolete Statistics:
The main cause of difference between the plans is difference between the statistics and the actual data.This generally occurs over time as data is added and deleted. This causes the key values that define the index to change,or their distribution to change.

Estimated Plan becomes Invalid:
In some instances, SQL Server Engine doesn't even generate Estimated execution plan, like
CREATE TABLE tblAbc
(
Col1 INT,
Col2 INT);

INSERT INTO tblAnd VALUES ( 1,2);

This will through error as
Msg 208, Level 16, State 1, Line 6
Invalid object name 'tblAnd'.

As table tblAbc doesn't exists in database yet, so it cant create a plan for it.

When Parallelism is required:
When a plan meets a threshold for parallelism two plans are created. Which plan is actually executed is up to the query engine.

Google Dataset Search

Google's Vision statement  is “ to provide access to the world's information in one click. ” Google’s mission Statement is “ ...