Data Solutions: 2008

Wednesday, December 31, 2008

Cracking Query Execution - I: Query Cycle

The very basic idea of designing a database is to store information and Query effectively, I will discuss what a Query is and how it gets executed:

QUERY: A precise request for information retrieval with database and information systems

What happens when we submit a query:
As we submit a query, number of processes in the database engine go to work on it so as to manage the system in such way that retrieve or store a data in a timely a manner as possible, whilst maintaining the integrity of the data.
We can roughly break down these processes into two stages
A) Processes that occurs in the Relational engine
In relational engine the query is parsed and then processed by Query Optimizer, which generates an execution plan. This plan is sent (in binary format) to Storage engine.
B) Processes that occurs in the Storage engine
In storage engine process such as locking, index maintenance and transaction occur.

Query Parsing:
As a SQL Server Engine receives a query,it passes it through a process that checks that the T-SQL is written correctly and is well formed.The output of the parser process is a PARSE TREE.

If the T-SQL is a DML, the Parse tree is passed to a precess called as ALGEBRIZER, it resolves all the names of various objects like table and columns referred in the query, it also performs type checking and does aggreate binding (determines location of aggregates with in the query). Output of Algebrizer is called as QUERY PROCESSOR TREE or ALEGEBRISER TREE and is passed to Query optimiser
Non DML are not passed to Algebrizer of query optimiser as non-DML cant be optimized (SQL Server doesn't provide many ways of creating a table).

Query Optimizer:
The optimizer figures out how best to implement the request represented by the T-SQL query you submitted. It decides if the data can be accessed through indexes, what types of joins to use and much more. The decisions made by the optimizer are based on what it calculates to be the cost of a given execution plan, in terms of the required CPU processing and I/O, and how fast it will execute. Hece, this is known as cost-based plan.

The optimizer will generate and evaluate many plans (unless there is already a cached plan) and, generally speaking, will choose the lowest-cost plan i.e. the plan it thinks will execute the query as fast as possible and use the least amount of resources, CPU and I/O. The calulation of the execution speed is the most important calculation. Sometimes, the optimizer will select a less efficient plan if it thinks ith will take more time to evaluate many plans than to run a less efficient plan.

Once the optimizer arrives at an execution plan, the estimated execution plan is created and stored in a memory space known as the PLAN CACHE.
Sometimes actual executed plan differs from the estimated execution plan , WHY

Query Execution:
Once the execution plan is generated, the action switches to the storage engine, where the query is actually executed, according to the plan.

Wednesday, December 24, 2008

Numeric Checking in SSIS

A post in MSDN forum drew my attention toward missing functionality of IsNumeric() in SSIS expressions.

To solve this we can use script task and check of IsNumric()
forexample I will use following data and table

--create a test table
create table TEST
(
[id] varchar(20),
[name] varchar(20)
)
--Fill some data

Insert into TEST values ( 1,'Mike')
Insert into TEST values ( 2 ,'Rahul')
--error data with non numeric id
Insert into TEST values ( '3a' ,'Worngdata')

Now lets design our package
1) Get data extract in Oledb DataSoruce
2) Create a Script task and specify input column as ID

2.b) Add a column as IsNumeric of Boolean datatype

2.c) Write below code as Script

If IsNumeric(Row.id) Then
Row.IsNumeric = True
Else
Row.IsNumeric = False
End If

3) Use Conditional Split to separate out bad records and valid records

4) Now when we execute the package it will separate out records with non-numeric data

Full package

Tuesday, December 23, 2008

Identify Orphan Users on an instance

Many a time when we move our database to new instance, rebuilding server or setting up a backup we need to synchronize user logins with the server

We can use below script to find out the Orphan Users on the Server

IF EXISTS (SELECT * FROM sysobjects WHERE id = OBJECT_ID(N'[dbo].[usp_ShowOrphanUsers]') AND OBJECTPROPERTY(id, N'IsProcedure') = 1)
DROP PROCEDURE [dbo].[usp_ShowOrphanUsers]
GO

CREATE PROC dbo.usp_ShowOrphanUsers
AS
BEGIN
CREATE TABLE #Results
([Database Name] sysname ,
[Orphaned User] sysname ,
[Type User] sysname )

SET NOCOUNT ON

DECLARE @DBName sysname, @Qry NVARCHAR(4000)

SET @Qry = ''
SET @DBName = ''

WHILE @DBName IS NOT NULL
BEGIN
SET @DBName =
(
SELECT MIN(name)
FROM master..sysdatabases
WHERE name NOT IN
('master', 'model', 'tempdb', 'msdb', 'distribution', 'pubs', 'northwind', 'dba_database')
AND DATABASEPROPERTY(name, 'IsOffline') = 0
AND DATABASEPROPERTY(name, 'IsSuspect') = 0
AND name > @DBName
)

IF @DBName IS NULL BREAK

SET @Qry = ' SELECT ''' + @DBName + ''' AS [Database Name],
CAST(name AS sysname) COLLATE Latin1_General_CI_AS AS [Orphaned User],
[Type User] =
CASE isntuser
WHEN ''0'' THEN ''SQL User''
WHEN ''1'' THEN ''NT User''
ELSE ''Not Available''
END
FROM ' + QUOTENAME(@DBName) + '..sysusers su
WHERE su.islogin = 1
AND su.name NOT IN (''INFORMATION_SCHEMA'', ''sys'', ''guest'', ''dbo'', ''system_function_schema'')
AND NOT EXISTS (SELECT 1 FROM master..syslogins sl WHERE su.sid = sl.sid)'
INSERT INTO #Results
EXEC master..sp_executesql @Qry
END
SELECT *
FROM #Results
ORDER BY [Database Name], [Orphaned User]
IF @@ROWCOUNT = 0
PRINT 'No orphaned users exist in this server.'
END
GO

--Execute the proc
EXEC [dbo].[usp_ShowOrphanUsers]

"This script is taken from SQL Server Central"

Friday, October 31, 2008

New Datatypes in SQL Server 2008

SQL Server 2008 has introduced 4 new datatype:

Date and Time: Four new date and time data types have been added, making working with time much easier than it ever has in the past. They include: DATE, TIME, DATETIME2, and DATETIMEOFFSET.
Spatial: Two new spatial data types have been added--GEOMETRY and GEOGRAPHY--which you can use to natively store and manipulate location-based information, such as Global Positioning System (GPS) data.
HIERARCHYID: The HIERARCHYID data type is used to enable database applications to model hierarchical tree structures, such as the organization chart of a business.
FILESTREAM: FILESTREAM is not a data type as such, but is a variation of the VARBINARY(MAX) data type that allows unstructured data to be stored in the file system instead of inside the SQL Server database.

Date and Time:

In SQL Server 2005 and earlier, SQL Server only offered two date and time data types: DATETIME and SMALLDATETIME. While they were useful in many cases, they had a lot of limitations, including:

Both the date value and the time value are part of both of these data types, and you can’t choose to store one or the other. This often causes a lot of wasted storage (because you store data you don’t need or want); adds unwanted complexity to many queries because the data types often had to be converted to a different form to be useful; and often reduces performance because WHERE clauses with these data and time data types often had to include functions to convert them to a more useful form, preventing these queries from using indexes.
They are not time-zone aware, which often requires extra coding for time-aware applications.
Precision is only .333 seconds, which is often not granular enough for some applications.
The range of supported dates is not adequate for some applications, and the range does not match the range of .NET CLR DATETIME data type, which requires additional conversion code.

To overcome these problems, SQL Server 2008 introduces four new date and time data types, which include:

DATE: As you can imagine, the DATE data type only stores a date in the format of YYYY-MM-DD. It has a range of 0001-01-01 through 9999-12-32, which should be adequate for most business and scientific applications. The accuracy is 1 day, and it only takes 3 bytes to store the date.
TIME: TIME is stored in the format: hh:mm:ss.nnnnnnn, with a range of 00:00:00.0000000 through 23:59:59:9999999 and is accurate to 100 nanoseconds. Storage depends on the precision and scale selected, and runs from 3 to 5 bytes.
DATETIME2: DATETIME2 is very similar to the older DATETIME data type, but has a greater range and precision. The format is YYYY-MM-DD hh:mm:ss:nnnnnnnm with a range of 0001-01-01 00:00:00.0000000 through 9999-12-31 23:59:59.9999999, and an accuracy of 100 nanoseconds. Storage depends on the precision and scale selected, and runs from 6 to 8 bytes.
DATETIMEOFFSET: DATETIMEOFFSET is similar to DATETIME2, but includes additional information to track the time zone. The format is YYYY-MM-DD hh:mm:ss[.nnnnnnn] [+|-]hh:mm with a range of 0001-01-01 00:00:00.0000000 through 0001-01-01 00:00:00.0000000 through 9999-12-31 23:59:59.9999999 (in UTC), and an accuracy of 100 nanoseconds. Storage depends on the precision and scale selected, and runs from 8 to 10 bytes.

Monday, October 20, 2008

Prompting for parameter in SSIS

A question in MSDN forum drew my attention towards the missing functionality of

Prompting for parameters in SSIS

A question might struck in your mind that if we have it in SSRS than why not in SSIS. Well I am a great fan of Microsoft as they provide alternative for any thing under the sun (Sometimes SUN itself). In this post I would like to demonstrate how we can implement prompt for variable values in SSIS..

For the implemention, Let me consider a simple table having Name and Email Id.
Through SSIS package we will prompt for Name and accept Name from User.

CREATE TABLE [tblSSISPrompt]
(
[Name] varchar(15),
[Email] varchar(50)
)
GO
--Fill table with some data

INSERT INTO [tblSSISPrompt] VALUES ('Rahul', 'Rahul.Kumar@sqlsvrsol.com')
INSERT INTO [tblSSISPrompt] VALUES ('Prashant','Prashant.M@sqlsvrsol.com')
INSERT INTO [tblSSISPrompt] VALUES ('Mark','Mark.W@sqlsvrsol.com')
GO

SELECT * FROM [tblSSISPrompt]

Now we start with SSIS
A) Declare two variable
a) SqlStmt to store SQL query
b) Email to get Email from the resultset from Sql query

B) Use Script Task to prompt and prepare SQL query and store in SqlStmt variable
a) Specify SqlStmt as ReadWriteVariables

b) Code for prompt and prepare SQL query

Public Sub Main()
Dim Name As String

'Prompt
Name = InputBox("Enter Name", "Name Dialog")

'Prepare SQL query
Dts.Variables("SqlStmt").Value = _
"SELECT Email FROM tblSSISPrompt WHERE Name = '" + Name + "'"

'msgbox
MsgBox(Dts.Variables("SqlStmt").Value, , "Query")

End Sub

C) Use Execute SQL Task to execute the query

D) Execute the Package
a) As you execute the package, it will prompt for Name

b) Script will display query in Message box

c) Finally Execute SQL Task will execute the query

Congrats!!! thats Prompting for parameters in SSIS

If you want to do the same thing in C#, read first comment.

Wednesday, October 15, 2008

Zip and unzip files in a folder

Many a time we need to zip or unzip more then one file present in a folder. SSIS provides a easy way to handle this.

For my example I am using 7-zip though we have many other compress software like (pkzip or winzip)

Steps to zip/unzip files
A) Declare a variable FileName as string datatype

B) Use ForEach loop to loop through the files in the folder and map file names to FileName variable
a) Provide folder name inEnumerator configuration

b) map FileName Variable

C) Use Execute Process task to zip/unzip to individual files
a) Browse to Exe of 7z in Executable

b) Use expression to configure the argument

D) Execute the package

Use the files........

Returning numeric value in Execute SQL task

Yesterday night a question on MSDN fourm drew my attention towards a wired but ture fact that SSIS does not accept numeric value as double in Execute SQL task.

So how do we use our numeric value in Execute SQL Task

SSIS considers the numeric values as string when they are returned from Execute SQL task, so to use that value we have to follow two steps

1) Map the value to a String or Object datatype variable
2) Convert this value to double and store in other variable

Let me take an example to show how it works

I will consider below table for the example and extract maximum value of Num column
CREATE TABLE tblAbc
(
[Id] INT IDENTITY (1,1),
[Num] NUMERIC(10,4)
)
GO

INSERT INTO tblAbc VALUES (123.23)
INSERT INTO tblAbc VALUES (555.23)
GO
--query we will use to extract max of Num
SELECT Max(Num)as Num FROM tblAbc

Now its implementation
A) Declare two variables 1) objNum as object and 2) dblNum as double datatype

B) Use Execute SQL task to extract numeric value and map it to objNum variable

map result to objNum variable

C) Use script task to convert objNum value into double and assign it to dblNum variable
a) specify
Readonlyvariable as ObjNum
Readwritevariable as dblNum

b) Write below code as script

Dts.Variables("dblNum").Value = CDbl(Dts.Variables("objNum").Value
MsgBox(Dts.Variables("dblNum").Value, , "dblNum")

D) Excute the package

Now we have dblNum having our numeric value and can use it for any calculation purpose

Tuesday, October 14, 2008

Difference between Stored Proc and User-defined Functions

Today morning my close friend Prashanth (DotNet expert) asked difference between Stored Proc and Functions . So I am dedicating this post to him, also today is bithday of beautiful Buddu, his girlfriend.

Difference betweem Stored Proc and User-defined Functions

Stored Procedure have pre-compiled execuction plan where as functions do not.
Functions are used for computations where as procedures are mainly used for performing business logic.
Functions can only have 'in' parameter where as SP can have both 'in' and 'out' parameters.
Function does not allow DML (insert, update, delete) queries on an object where as SP allows.
Functions can be used inline where as SP can't be.
Stored Procedure can retun more than one value at a time while funtion returns only one value at a time.
Functions MUST return a value, procedures need not be.

Please leave your comments..

Functions

With my personal experience of SQL Server, I believe Function (User defined functions) are under utilized then their potential so I thought of writing this post in complete honor of UDFs.

Microsoft SQL Server provides three programmable objects
a) User defined Functions b) Stored Procedure c) Triggers

" Functions are pre-prepared pieces of code that may accept parameters, and always return a value"

UDFs support the creation of rich programming logic, including looping, flow control, decision making and branching.
UDFs can be divided into two types based on the value they return
a) Scaler functions b)Table-valued functions

A) Scaler functions: Scalar functions accept 0 or more input parameters and return a single scalar value.
You use the CREATE FUNCTION Transact-SQL statement to create a function. The general syntax of the statement is as follows:

CREATE FUNCTION [ schema_name. ] function_name
( [ { @parameter_name [ AS ][ type_schema_name. ] parameter_data_type
[ = default ] }
[ ,...n ]
]
)
RETURNS return_data_type
[ WITH [ ,...n ] ]
[ AS ]
BEGIN
function_body
RETURN scalar_expression
END [ ; ]
::=
{
[ ENCRYPTION ]
| [ SCHEMABINDING ]
| [ EXECUTE_AS_Clause ]
}

Example from AdventureWorks database
CREATE FUNCTION [dbo].[ufnGetStock](@ProductID [int])
RETURNS [int]
AS
-- Returns the stock level for the product. This function is used internally only
BEGIN
DECLARE @ret int;

SELECT @ret = SUM(p.[Quantity])
FROM [Production].[ProductInventory] p
WHERE p.[ProductID] = @ProductID
AND p.[LocationID] = '6'; -- Only look at inventory in the misc storage

IF (@ret IS NULL)
SET @ret = 0

RETURN @ret
END;
GO

--Execute
SELECT
p.[ProductID]
,p.[Name]
, dbo.ufnGetStock(p.ProductID) as [Stock]
FROM Production.Product p;

Result

B) Table-Valued Functions: The only difference in table-valued functions is that they return a table as output. Therefore, they are generally used in the FROM clause of a SELECT statement and possibly joined to other tables or views.

Syntax
CREATE FUNCTION [ schema_name. ] function_name
( [ { @parameter_name [ AS ] [ type_schema_name. ] parameter_data_type
[ = default ] }
[ ,...n ]
]
)
RETURNS @return_variable TABLE <>
[ WITH [ ,...n ] ]
[ AS ]
BEGIN
function_body
RETURN
END [ ; ]

Example from AdventureWorks :
CREATE FUNCTION [dbo].[ufnGetContactInfo] (@ContactID int)
RETURNS @retContactInformation TABLE
(
-- Columns returned by the function
[ContactID] int PRIMARY KEY NOT NULL,
[FirstName] [nvarchar](50) NULL,
[LastName] [nvarchar](50) NULL,
[JobTitle] [nvarchar](50) NULL,
[Email] [nvarchar] (150) Null
)
AS
BEGIN
IF @ContactID IS NOT NULL
BEGIN
INSERT @retContactInformation
SELECT
c.[ContactID]
,c.[FirstName]
,c.[LastName]
,e.[Title] AS [JobTitle]
,c.[EmailAddress]
FROM [HumanResources].[Employee] e
INNER JOIN [Person].[Contact] c
ON c.[ContactID] = e.[ContactID]
WHERE c.[ContactID]= @ContactID;
END;

RETURN;
END;
GO
select * from ufnGetContactInfo(1209)
Result

C) Restrictions on UDFs
Although we can execute virtually any valid batch of code with in function, but still they do have some restrictions:
Most significant is you cannot use a function to change the state of any object in a database or the database itself.Therefore,you cannot insert, update, or delete data in tables, nor can you create, alter, or drop objects in the database. However, you can create one or more table variables and issue INSERT, UPDATE, and DELETE statements against the table variable

Please leave comment...

Thursday, October 09, 2008

Views - the other face

To put it in a simple way : A view is a SELECT statement that has a name and is stored in Microsoft SQL Server.

Views act as virtual tables to provide several benefits.

a) Re-usability : A view gives developers a standardized way to execute queries, enabling them to write certain common queries once as views and then include the views in application code so that all applications use the same version of a query.

b) Security : A view provides a level of security by giving users access to just a subset of data contained in the base tables that the view is built over and can give users a more friendly, logical view of data in a database.

c) Performance : a view with indexes created on it can provide dramatic performance improvements, especially for certain types of complex queries.

We can categorize views as :
a) regular views, b) updateable views, c) indexed views.

A) Syntax of view creation
CREATE VIEW [ schema_name . ] view_name [ (column [ ,...n ] ) ]
[ WITH [ ,...n ] ]
AS select_statement [ ; ]
[ WITH CHECK OPTION ]
::=
{
[ ENCRYPTION ]
[ SCHEMABINDING ]
[ VIEW_METADATA ] }

ENCRYPTION : Should View be encrypted
SCHEMABINDING : will not allow you to drop a table, view or function reference by this view without dropping the view
VIEW_METADATA : returns metadata about a view to client-side data access libraries.

Select statement can be of any complexity as long as it is a valid query with following exceptions

should not have COMPUTE or COMPUTE BY clause
should not have INTO keyword
should not have OPTION clause
should not reference a temporary table or table variable
should not have ORDER BY clause unless it also specifies the TOP operator

Example of a view can be on AdventureWorks database
CREATE VIEW [vwEmployee]
AS
SELECT
e.[EmployeeID]
,c.[FirstName]
,c.[MiddleName]
,c.[LastName]
,e.[Title] AS [JobTitle]
,c.[Phone]
,c.[EmailAddress]
FROM [HumanResources].[Employee] e
INNER JOIN [Person].[Contact] c
ON c.[ContactID] = e.[ContactID]

B) Modifying Data Through Views (UPDATEBLE VIEW)
Although views allows you to update data with some exception but it will be best practice if we use INSTEAD triggers.

Exception for data modification through VIEWS

All changes must directly reference column, not derivations of a column
View definition cannot contain GROUPBY or DISTINCT clause
You cannot modify columns that are derived through aggregate functions (AVG,COUNT, SUM,MIN,SUBSTRING)
You cannot reference columns generated by using operators such as union, crossjoin, and intersect.
You cannot use TOP when you specify WITH CHECK OPTION

Example:
UPDATE [vwEmployee ]
SET [Phone] ='123-256-1685'
WHERE [employeeID] =1

C) INDEXED VIEW

An indexed view, also called a materialized view, causes SQL Server to execute the SELECT statement in the view definition. SQL Server then builds a clustered index on the view’s results, and stores the data and index within the database. As you change data in the base tables, SQL Server propagates these changes to the indexed view. If the result of the view could change from one execution to another or could change if different query options were set, the entire set of data SQL Server calculated and stored would be invalidated. Therefore, all the operators or functions that can cause varying results are disallowed.

Restrictions for index views

View should be Schema bound
The SELECT statement cannot reference other views.
All functions must be deterministic. For example, you cannot use getdate() because every time it is executed, it returns a different date result.
AVG, MIN, MAX, and STDEV are not allowed.

Example to create index on above view,
We will need to have it schema bound
ALTER VIEW [vwEmployee]
WITH
SCHEMABINDING
AS
SELECT
e.[EmployeeID]
,c.[FirstName]
,c.[MiddleName]
,c.[LastName]
,e.[Title] AS [JobTitle]
,c.[Phone]
,c.[EmailAddress]
FROM [HumanResources].[Employee] e
INNER JOIN [Person].[Contact] c
ON c.[ContactID] = e.[ContactID]

Now we can create index as
CREATE UNIQUE CLUSTERED INDEX [IX_vwEmployee_EmpID] ON [vwEmployee]
(
[EmployeeID] ASC
)

Welcome

Hi,

Welcome to my blog which is solely devoted to SQL Server.

Good Luck and Cheers
Rahul Kumar
MCTS - SQL Server 2005