Wednesday, December 31, 2008

Cracking Query Execution - I: Query Cycle

The very basic idea of designing a database is to store information and Query effectively, I will discuss what a Query is and how it gets executed:

QUERY: A precise request for information retrieval with database and information systems

What happens when we submit a query:
As we submit a query, number of processes in the database engine go to work on it so as to manage the system in such way that retrieve or store a data in a timely a manner as possible, whilst maintaining the integrity of the data.
We can roughly break down these processes into two stages
A) Processes that occurs in the Relational engine
In relational engine the query is parsed and then processed by Query Optimizer, which generates an execution plan. This plan is sent (in binary format) to Storage engine.
B) Processes that occurs in the Storage engine
In storage engine process such as locking, index maintenance and transaction occur.


Query Parsing:
As a SQL Server Engine receives a query,it passes it through a process that checks that the T-SQL is written correctly and is well formed.The output of the parser process is a PARSE TREE.

If the T-SQL is a DML, the Parse tree is passed to a precess called as ALGEBRIZER, it resolves all the names of various objects like table and columns referred in the query, it also performs type checking and does aggreate binding (determines location of aggregates with in the query). Output of Algebrizer is called as QUERY PROCESSOR TREE or ALEGEBRISER TREE and is passed to Query optimiser
Non DML are not passed to Algebrizer of query optimiser as non-DML cant be optimized (SQL Server doesn't provide many ways of creating a table).

Query Optimizer:
The optimizer figures out how best to implement the request represented by the T-SQL query you submitted. It decides if the data can be accessed through indexes, what types of joins to use and much more. The decisions made by the optimizer are based on what it calculates to be the cost of a given execution plan, in terms of the required CPU processing and I/O, and how fast it will execute. Hece, this is known as cost-based plan.

The optimizer will generate and evaluate many plans (unless there is already a cached plan) and, generally speaking, will choose the lowest-cost plan i.e. the plan it thinks will execute the query as fast as possible and use the least amount of resources, CPU and I/O. The calulation of the execution speed is the most important calculation. Sometimes, the optimizer will select a less efficient plan if it thinks ith will take more time to evaluate many plans than to run a less efficient plan.

Once the optimizer arrives at an execution plan, the estimated execution plan is created and stored in a memory space known as the PLAN CACHE.
Sometimes actual executed plan differs from the estimated execution plan , WHY

Query Execution:
Once the execution plan is generated, the action switches to the storage engine, where the query is actually executed, according to the plan.

Wednesday, December 24, 2008

Numeric Checking in SSIS

A post in MSDN forum drew my attention toward missing functionality of IsNumeric() in SSIS expressions.

To solve this we can use script task and check of IsNumric()
forexample I will use following data and table

--create a test table
create table TEST
(
[id] varchar(20),
[name] varchar(20)
)
--Fill some data

Insert into TEST values ( 1,'Mike')
Insert into TEST values ( 2 ,'Rahul')
--error data with non numeric id
Insert into TEST values ( '3a' ,'Worngdata')

Now lets design our package
1) Get data extract in Oledb DataSoruce
2) Create a Script task and specify input column as ID


2.b) Add a column as IsNumeric of Boolean datatype


2.c) Write below code as Script


If IsNumeric(Row.id) Then
Row.IsNumeric = True
Else
Row.IsNumeric = False
End If

3) Use Conditional Split to separate out bad records and valid records


4) Now when we execute the package it will separate out records with non-numeric data


Full package

Tuesday, December 23, 2008

Identify Orphan Users on an instance

Many a time when we move our database to new instance, rebuilding server or setting up a backup we need to synchronize user logins with the server

We can use below script to find out the Orphan Users on the Server

IF EXISTS (SELECT * FROM sysobjects WHERE id = OBJECT_ID(N'[dbo].[usp_ShowOrphanUsers]') AND OBJECTPROPERTY(id, N'IsProcedure') = 1)
DROP PROCEDURE [dbo].[usp_ShowOrphanUsers]
GO

CREATE PROC dbo.usp_ShowOrphanUsers
AS
BEGIN
CREATE TABLE #Results
([Database Name] sysname ,
[Orphaned User] sysname ,
[Type User] sysname )

SET NOCOUNT ON

DECLARE @DBName sysname, @Qry NVARCHAR(4000)

SET @Qry = ''
SET @DBName = ''

WHILE @DBName IS NOT NULL
BEGIN
SET @DBName =
(
SELECT MIN(name)
FROM master..sysdatabases
WHERE name NOT IN
('master', 'model', 'tempdb', 'msdb', 'distribution', 'pubs', 'northwind', 'dba_database')
AND DATABASEPROPERTY(name, 'IsOffline') = 0
AND DATABASEPROPERTY(name, 'IsSuspect') = 0
AND name > @DBName
)

IF @DBName IS NULL BREAK

SET @Qry = ' SELECT ''' + @DBName + ''' AS [Database Name],
CAST(name AS sysname) COLLATE Latin1_General_CI_AS AS [Orphaned User],
[Type User] =
CASE isntuser
WHEN ''0'' THEN ''SQL User''
WHEN ''1'' THEN ''NT User''
ELSE ''Not Available''
END
FROM ' + QUOTENAME(@DBName) + '..sysusers su
WHERE su.islogin = 1
AND su.name NOT IN (''INFORMATION_SCHEMA'', ''sys'', ''guest'', ''dbo'', ''system_function_schema'')
AND NOT EXISTS (SELECT 1 FROM master..syslogins sl WHERE su.sid = sl.sid)'
INSERT INTO #Results
EXEC master..sp_executesql @Qry
END
SELECT *
FROM #Results
ORDER BY [Database Name], [Orphaned User]
IF @@ROWCOUNT = 0
PRINT 'No orphaned users exist in this server.'
END
GO

--Execute the proc
EXEC [dbo].[usp_ShowOrphanUsers]

"This script is taken from SQL Server Central"

Google Dataset Search

Google's Vision statement  is “ to provide access to the world's information in one click. ” Google’s mission Statement is “ ...