Sunday, December 27, 2009

Top New Features in SSIS 2008

1) Create Script tasks by using Microsoft Visual C# and Microsoft Visual Basic .NET.

2) Use ADO.NET for tasks as well as for source and destination components.

3) Improve scalability with thread pooling and enhanced lookup transformations.

4) Perform more functional and scalable data transfers with the improved SQL Server Import and Export Wizard.

5) New SSIS connectors for SAP BW, Oracle and Teradata.

Tuesday, November 24, 2009

Dynamic query as sql command for ADO NET / OleDb Source

A question on MSDN forum drew my attention towards a very common requirement which is made very tricky to do in SSIS.

A user may need to pass a dynamic query to OleDb source/ADO source something like
SELECT * FROM TblNm WHERE Col1 = @Var
In OleDb source user can create query like this in a variable and pass it to the OleDb Source but ADO NET source does not have Variable as Data Access Mode, so what do we do for ADO Net Source. Okay dont scratch around there is way out which works for both OleDb and ADO Net Source.

1. Create a variable


2. Create ADO NET Source or OleDb Source and select SQL Command as DATA ACCESS MODE and give a simple query like SELECT * FROM TblNm


3. In Control Flow Select DataFlow Task which contains this Source and from it Property Tab select Expressions and open Property Expressions Editor


4. Choose [ADO NET Source].[SQL Commnad] and make a expression like

5. Now run the package and Sql will be dynamically created and will get the value of variable.

Sunday, November 01, 2009

Fail to save package file : class not registered

Problem:

Sometime a user may encounter an error like below while creating a SSIS package from BIDS.

Error creating package

------------------------------
ADDITIONAL INFORMATION:

Failed to save package file "C:\Documents and Settings\UserName\Local Settings\Temp\tmp87.tmp" with error 0x80040154 "Class not registered".

-------------------------------

Solution:

To solve this try to register below DLLs by typing below commands in CMD promopt:-

regsvr32 msxml3.dll
regsvr32 msxml4.dll
regsvr32 msxml6.dll


Tuesday, April 21, 2009

Copy/Rename a file using File System Task in SSIS

This is a reference answer to a post on MSDN forum but is very useful

Question: While copying and renaming a file to a dynamic location through File System Task using variable throws an error

Error: Failed to lock variable "c:\test\test_200904202009.txt" for read access with error 0xC0010001 "The variable cannot be found. This occurs when an attempt is made to retrieve a variable from the Variables collection on a container during execution of the package, and the variable is not there. The variable name may have changed or the variable is not being created.".
OR

How to copy and rename a file to a dynamic location using SSIS?

Solution:

I will move and rename a file "test.txt" from c:\ to c:\backup with new name appended with date.

1) Create three variables

Src_File = c:\test.txt
Dest_File = test
Dest_Dir = c:\backup


2) In connection Manager create a File Connection with name as DestinationConn

3) In Expression property of DestinationConn use ConnectionString and provide following expression

@[Dest_Dir] + "\\" + @[Dest_File] + "_" + (DT_WSTR,4)DatePart("yyyy", GetDate()) + RIGHT("0" + (DT_WSTR,2)DatePart("mm", GetDate()), 2) + RIGHT("0" + (DT_WSTR,2)DatePart("dd", GetDate()), 2) + RIGHT("0" + (DT_WSTR,2)DatePart("hh", GetDate()), 2) + RIGHT("0" + (DT_WSTR,2)DatePart("mi", GetDate()), 2) + ".txt"



this will set the destination path and new name for the file

4) Create a File System task and configure like :




5) Execute the package

To run the package create a file named "test.txt"  and a folder "backup"at C:\

Thanks!!
Please leave a comment.

Tuesday, March 24, 2009

Column length in Excel source

Here is one very common warning with using Excel in SSIS

Whenever we use Excel file in SSIS, it takes default length of each column as 255. This results in having a warning if we try to map this column with column having length less than 255. We can avoid this warning my setting column length of the Excel source.

Right click to Open Excel source in Advance editor and set lenght of the column as depicted in the figure.

Monday, March 02, 2009

Getting Daily Feed in SSIS

Many time there comes a situation when you need to use a daily feed kind of file which is copied in a fixed directory and we need to use the file for data import.

Below I will show an approach to automate the process so that SSIS package would check if the today's file is present or not and if present then do the data import.

For Example
File name format : FileNameDate (FileName03022009.xls)
Fixed Directory : D:\Abc

1.Create a Variable "FileName" This will hold the path and name of the daily feed file.

2. Use script task. specify readwrite variable as "FileName"

3. Coding
a) Configure FileName to have path and name of the daily feed file.
Dts.Variables("FileName").Value = "D:\Abc\FileName" _ & Date.Today.Month.ToString.PadLeft(2, "0"c) & Date.Today.Day.ToString.PadLeft(2, "0"c) _ & Date.Today.Year & ".xls"

b) check if file is present or not. (Fail the package if file is not present)
'check if file is present
Dim Fi As New FileInfo(CStr(Dts.Variables("FileName").Value))
If (CBool(Fi.Exists().ToString())) Then Dts.TaskResult = Dts.Results.Success Else MsgBox(CStr(Dts.Variables("FileName").Value) & " is not present") Dts.TaskResult = Dts.Results.Failure End If

4) Create a Excel Connection Manager and in Expression property specify


5) Use the Excel Connection Manager in Excel source of DATA FLOW
6) Execute

Wednesday, February 25, 2009

Passing variable to Child package from Parent package in SSIS

(APPLICABLE ONLY IN SSIS 2005, Use Parent Package Configuration for SSIS 2008)

Very frequent Question which looks so complicated to programmers.

Actually passing a variable value to a child package is very trivial task. We can pass on the value by configuring parent variable in package configuration but there is an easy way of achieve this and the fact lies beneath the fundamental principle of Variable Scope.

If you call a Child package then it is like a container itself and all the variables defined in above hierarchy will be accessible in the Child package.

Let me show this with an example in which I will declare a variable "ParentVar" in my parent package and call a Child package which will access "ParentVar" and display in a msgbox.

1) Parent: Create Parent Package and declare a variable "ParentVar"


2) Child: Create a Child package and use a script task and define readonly variable as ParentVar



3) Child: Now in Script you can use ParentVar like any other variable.eg I am using to display it in a msgbox. I would suggest to create another child package variable and assign Parent package variable value to it and use it in the child package variable through out the package.


4) Child: Whole Child package will look like


5) Parent: Now in parent package call the child package through Execute Package task.
The Parent package will look like


6) Result: Execute Parent package.it will in turn call child package and it will display msgbox


Here was a simple method to use parent package variable in Child package.

Monday, February 23, 2009

SSIS Interview Questions

Common search for new SSIS programmer looking for change is what questions to expect on SSIS. Based on the interviews I take on SSIS, I will list down my favorites and expected questions on SSIS.


Q1 Explain architecture of SSIS?

SSIS architecture consists of four key parts:
a) Integration Services service: monitors running Integration Services packages and manages the storage of packages.
b) Integration Services object model: includes managed API for accessing Integration Services tools, command-line utilities, and custom applications.
c) Integration Services runtime and run-time executables: it saves the layout of packages, runs packages, and provides support for logging, breakpoints, configuration, connections, and transactions. The Integration Services run-time executables are the package, containers, tasks, and event handlers that Integration Services includes, and custom tasks.
d) Data flow engine: provides the in-memory buffers that move data from source to destination.



Q2 How would you do Logging in SSIS?
Logging Configuration provides an inbuilt feature which can log the detail of various events like onError, onWarning etc to the various options say a flat file, SqlServer table, XML or SQL Profiler.

Q3 How would you do Error Handling?
A SSIS package could mainly have two types of errors
a) Procedure Error: Can be handled in Control flow through the precedence control and redirecting the execution flow.
b) Data Error: is handled in DATA FLOW TASK buy redirecting the data flow using Error Output of a component.


Q4 How to pass property value at Run time? How do you implement Package Configuration?
A property value like connection string for a Connection Manager can be passed to the pkg using package configurations.Package Configuration provides different options like XML File, Environment Variables, SQL Server Table, Registry Value or Parent package variable.

Q5 How would you deploy a SSIS Package on production?

 A) Through Manifest
1. Create deployment utility by setting its propery as true .
2. It will be created in the bin folder of the solution as soon as package is build.
3. Copy all the files in the utility and use manifest file to deply it on the Prod.

B) Using DtsExec.exe utility
C)Import Package directly in MSDB from SSMS by logging in Integration Services.

Q6 Difference between DTS and SSIS?
Every thing except both are product of Microsoft :-).

Q7 What are new features in SSIS 2008?
explained in other post
http://sqlserversolutions.blogspot.com/2009/01/new-improvementfeatures-in-ssis-2008.html

Q8 How would you pass a variable value to Child Package?
too big to fit here so had a write other post
http://sqlserversolutions.blogspot.com/2009/02/passing-variable-to-child-package-from.html


Q9 What is Execution Tree?
Execution trees demonstrate how package uses buffers and threads. At run time, the data flow engine breaks down Data Flow task operations into execution trees. These execution trees specify how buffers and threads are allocated in the package. Each tree creates a new buffer and may execute on a different thread. When a new buffer is created such as when a partially blocking or blocking transformation is added to the pipeline, additional memory is required to handle the data transformation and each new tree may also give you an additional worker thread.


Q10 What are the points to keep in mind for performance improvement of the package?
http://technet.microsoft.com/en-us/library/cc966529.aspx

Q11 You may get a question stating a scenario and then asking you how would you create a package for that e.g. How would you configure a data flow task so that it can transfer data to different table based on the city name in a source table column?


Q13 Difference between Unionall and Merge Join?
a) Merge transformation can accept only two inputs whereas Union all can take more than two inputs

b) Data has to be sorted before Merge Transformation whereas Union all doesn't have any condition like that.


Q14 May get question regarding what X transformation do?Lookup, fuzzy lookup, fuzzy grouping transformation are my favorites.
For you.

Q15 How would you restart package from previous failure point?What are Checkpoints and how can we implement in SSIS?
When a package is configured to use checkpoints, information about package execution is written to a checkpoint file. When the failed package is rerun, the checkpoint file is used to restart the package from the point of failure. If the package runs successfully, the checkpoint file is deleted, and then re-created the next time that the package is run.

Q16 Where are SSIS package stored in the SQL Server?
MSDB.sysdtspackages90 stores the actual content and ssydtscategories, sysdtslog90, sysdtspackagefolders90, sysdtspackagelog, sysdtssteplog, and sysdtstasklog do the supporting roles.


Q17 How would you schedule a SSIS packages?
Using SQL Server Agent. Read about Scheduling a job on Sql server Agent

Q18 Difference between asynchronous and synchronous transformations?
Asynchronous transformation have different Input and Output buffers and it is up to the component designer in an Async component to provide a column structure to the output buffer and hook up the data from the input.

Q19 How to achieve parallelism in SSIS?
Parallelism is achieved using MaxConcurrentExecutable property of the package. Its default is -1 and is calculated as number of processors + 2.

-More questions added-Sept 2011
Q20 How do you do incremental load?

Fastest way to do incremental load is by using Timestamp column in source table and then storing last ETL timestamp, In ETL process pick all the rows having Timestamp greater than the stored Timestamp so as to pick only new and updated records

Q21 How to handle Late Arriving Dimension or Early Arriving Facts.
 

Late arriving dimensions sometime get unavoidable 'coz delay or error in Dimension ETL or may be due to logic of ETL. To handle Late Arriving facts, we can create dummy Dimension with natural/business key and keep rest of the attributes as null or default.  And as soon as Actual dimension arrives, the dummy dimension is updated with Type 1 change. These are also known as Inferred Dimensions.


Thursday, February 19, 2009

Selecting Excel source range in SSIS

Many times I have been asked by SSIS programmers that they face difficulties when they need to select some specified columns and rows or they need to skip first 2 /3 rows in the excel sheet.

Below I will demonstrate how we can select a specified range in an Excel Sheet.

A) Let me take below sheet as an example and select data after first 2 rows



B) Now in Data Flow task, select an Excel source
Configure Data Access Mode as SQL Command
and in SQL Command Text give query as

SELECT * FROM [Sheet1$ range]
SELECT * FROM [Sheet1$A2:B10]



C) Preview



This way user can select any range or any specific cell value in an Excel File

Tuesday, February 03, 2009

Matching in Lookup Component in SSIS

Many SSIS programmers get confused with the comparison results of Lookup transformation and feels baffled ;-| . Lookup transformation in SSIS is used to lookup for a reference in the lookup table where it performs a equi-join and returns the referenced column.

Let me explain the comparison/matching behavior of lookup transformation: It matches data differently when in cached (default) and non-cached (Enable memory Restriction) mode. In cached mode comparison will be case sensitive where as in non-cached mode comparison will be based on collation level of column being matched.

Mode :: Behavior
Cached (default) :: Case sensitive
Non-cached (Enable memory restrictions) :: Collation level of column

Why?
It basically depends where the comparision is done. In cached mode SSIS reterives data from the source and comparision is done at the client on the byte level which is CASE SENSITIVE. Whereas in Non-Cache mode ( Enable memory Restriction) comparision is done at database and is based on the collation level of the column being matched.

That's the behavior of Lookup Transformation in SSIS.

Friday, January 30, 2009

New improvements / features in SSIS 2008

With the release of SQL SERVER 2008 comes improved SSIS 2008. I will try to list down the improved and new features in SSIS 2008

1) Improved Parallelism of Execution Trees:

The biggest performance improvement in the SSIS 2008 is incorporation of parallelism in the processing of execution tree. In SSIS 2005, each execution tree used a single thread whereas in SSIS 2008 , the Data flow engine is redesigned to utilize multiple threads and take advantage of dynamic scheduling to execute multiple components in parallel, including components within the same execution tree

2) Any .NET language for Scripting:

SSIS 2008 is incorporated with new Visual Studio Tool for Application(VSTA) scripting engine. Advantage of VSTA is it enables user to use any .NET language for scripting.

3) New ADO.NET Source and Destination Component:

SSIS 2008 gets a new Source and Destination Component for ADO.NET Record sets.

4) Improved Lookup Transformation:

In SSIS 2008, the Lookuo Transformation has faster cache loading and lookup operations. It has new caching options, including the ability for the reference dataset to use a cache file(.caw) accessed by the Cache Connectin Manager. In addition same cache can be shared between multiple Lookup Transformations.

5) New Data Profiling Task:

SSIS 2008 has a new debugging aid Data Profiling Task that can help user analyze the data flows occurring in the package.In many cases, execution errors are caused by unexpected variations in the data that is being transferred. The Data Profiling Task can help users to discover the cource of these errors by giving better visibility into the data flow.

6) New Connections Project Wizard:

One of the main usability enhancement to SSIS 2008 is the new Connections Project Wizard. The Connections Project Wizard guides user through the steps required to create source and destinations.

Thursday, January 29, 2009

Difference between Merge and Union all transformation

I have been asked by many new SSIS developer about difference between Merge and Union all transformation in SSIS.

Well both of them essentially takes outputs from more than one sources and combines them into a single result set but there are couple of differences between two:

a) Merge transformation can accept only two inputs whereas Union all can take more than two inputs

b) Data has to be sorted before Merge Transformation whereas Union all doesn't have any condition like that.

Tuesday, January 27, 2009

Using varibles in Data flow Script task in SSIS

How to use variables in Script component of Data flow task in SSIS

1) Declare a variable as MyVar in varible window.


2) Declare variable type in dataflow> Script Component >Script


3) Use variable in the script
Now here is the catch we use variable in data flow task by
Me.Variable.MyVar ="MyVariable"

where as in control flow we use variable using
DTS.Variable("MyVar").value = "MyVariable"

Monday, January 05, 2009

Creating directory using SSIS

Creating a directory/folder at a dynamic path using SSIS sometimes becomes complex job as I feel and can understand by reading questions posted on various SSIS forums, File System task is not good in handling dynamic path

Here as an example and answer to a question posted on MSDN forum I would demonstrate how we can create a directory of today's date.

1) Create a variable varDirPath of string datatype


2.a) Create a Script task and give varDirPath as ReadWriteVariable


2.b) Use below code to set varDirPath value as the path of directory to be created
Public Sub Main()
'set dir path
Dts.Variables("varDirPath").Value = "D:\" + Date.Today.Year.ToString _
+ Date.Today.Month.ToString + Date.Today.Day.ToString

Dts.TaskResult = Dts.Results.Success
End Sub

this will create path like D:\20090115

3) Use File System Task and configure like below pic:


4) Execute the package

Friday, January 02, 2009

Cracking Query Execution - II: Why Actual and Estimated Query Execution Plan may differ

Generally there is no difference between Actual Execution plan and Estimated plan and Query is executed the way Estimated Plan outlines.However, circumstances can arise that can cause Execution Plan to change:

Obsolete Statistics:
The main cause of difference between the plans is difference between the statistics and the actual data.This generally occurs over time as data is added and deleted. This causes the key values that define the index to change,or their distribution to change.

Estimated Plan becomes Invalid:
In some instances, SQL Server Engine doesn't even generate Estimated execution plan, like
CREATE TABLE tblAbc
(
Col1 INT,
Col2 INT);

INSERT INTO tblAnd VALUES ( 1,2);

This will through error as
Msg 208, Level 16, State 1, Line 6
Invalid object name 'tblAnd'.

As table tblAbc doesn't exists in database yet, so it cant create a plan for it.

When Parallelism is required:
When a plan meets a threshold for parallelism two plans are created. Which plan is actually executed is up to the query engine.

Google Dataset Search

Google's Vision statement  is “ to provide access to the world's information in one click. ” Google’s mission Statement is “ ...