Wednesday, December 31, 2008

Cracking Query Execution - I: Query Cycle

The very basic idea of designing a database is to store information and Query effectively, I will discuss what a Query is and how it gets executed:

QUERY: A precise request for information retrieval with database and information systems

What happens when we submit a query:
As we submit a query, number of processes in the database engine go to work on it so as to manage the system in such way that retrieve or store a data in a timely a manner as possible, whilst maintaining the integrity of the data.
We can roughly break down these processes into two stages
A) Processes that occurs in the Relational engine
In relational engine the query is parsed and then processed by Query Optimizer, which generates an execution plan. This plan is sent (in binary format) to Storage engine.
B) Processes that occurs in the Storage engine
In storage engine process such as locking, index maintenance and transaction occur.


Query Parsing:
As a SQL Server Engine receives a query,it passes it through a process that checks that the T-SQL is written correctly and is well formed.The output of the parser process is a PARSE TREE.

If the T-SQL is a DML, the Parse tree is passed to a precess called as ALGEBRIZER, it resolves all the names of various objects like table and columns referred in the query, it also performs type checking and does aggreate binding (determines location of aggregates with in the query). Output of Algebrizer is called as QUERY PROCESSOR TREE or ALEGEBRISER TREE and is passed to Query optimiser
Non DML are not passed to Algebrizer of query optimiser as non-DML cant be optimized (SQL Server doesn't provide many ways of creating a table).

Query Optimizer:
The optimizer figures out how best to implement the request represented by the T-SQL query you submitted. It decides if the data can be accessed through indexes, what types of joins to use and much more. The decisions made by the optimizer are based on what it calculates to be the cost of a given execution plan, in terms of the required CPU processing and I/O, and how fast it will execute. Hece, this is known as cost-based plan.

The optimizer will generate and evaluate many plans (unless there is already a cached plan) and, generally speaking, will choose the lowest-cost plan i.e. the plan it thinks will execute the query as fast as possible and use the least amount of resources, CPU and I/O. The calulation of the execution speed is the most important calculation. Sometimes, the optimizer will select a less efficient plan if it thinks ith will take more time to evaluate many plans than to run a less efficient plan.

Once the optimizer arrives at an execution plan, the estimated execution plan is created and stored in a memory space known as the PLAN CACHE.
Sometimes actual executed plan differs from the estimated execution plan , WHY

Query Execution:
Once the execution plan is generated, the action switches to the storage engine, where the query is actually executed, according to the plan.

1 comment:

  1. Very nice to see your article over here. Good job !
    Best regards,
    Prashant

    ReplyDelete

Google Dataset Search

Google's Vision statement  is “ to provide access to the world's information in one click. ” Google’s mission Statement is “ ...