Wednesday 30 January 2008

Tips for Writing High-performance SQL

These tips apply broadly when writing high-performance stored procedures. Unfortunately, you can't simply apply most of them blindly: each one requires you to first consider the nature and schema of the data you're querying.

1. Avoid using cursors (as well as other looping structures) as much as possible.

Cursors are inefficient, and database engines usually don't have the best loop implementations in terms of performance.

On the database side, you can usually replace code involving cursors with set-based (aggregate) SQL statements (SELECT, INSERT, and UPDATE) that operate on whole row sets at once. All database engines are heavily optimized for aggregate statements, so even if a loop is unavoidable, it is always better to execute a few aggregate statements in a loop with a small number of iterations than to create a cursor and execute simple statements over a large number of iterations.
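For illustration, here is a minimal sketch of the difference, assuming a hypothetical schema with Orders(OrderID, CustomerID, Amount) and CustomerTotals(CustomerID, Total):

-- Cursor version: one UPDATE per customer (slow at scale)
DECLARE @CustomerID int, @Total money
DECLARE cur CURSOR FOR
    SELECT CustomerID, SUM(Amount) FROM Orders GROUP BY CustomerID
OPEN cur
FETCH NEXT FROM cur INTO @CustomerID, @Total
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE CustomerTotals SET Total = @Total WHERE CustomerID = @CustomerID
    FETCH NEXT FROM cur INTO @CustomerID, @Total
END
CLOSE cur
DEALLOCATE cur

-- Set-based version: one aggregate UPDATE does the same work
UPDATE ct
SET    ct.Total = o.Total
FROM   CustomerTotals ct
JOIN   (SELECT CustomerID, SUM(Amount) AS Total
        FROM Orders
        GROUP BY CustomerID) o
       ON o.CustomerID = ct.CustomerID

The set-based version lets the engine pick one optimized plan instead of executing a tiny UPDATE for every fetched row.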

Even if initial performance tests, especially with a small amount of data, show cursors to be more efficient than a complex aggregate statement, it is worthwhile to try to optimize the operation by breaking it into smaller portions or using other approaches, unless you can guarantee that the data volume will stay small. Cursor-based approaches will not scale.

2. Filter data wisely.

One alternative to using cursors is a fall-through approach: filter and aggregate the data in multiple steps via a set of data storages, which can be physical tables, temporary tables, or table variables. It is usually best to include coarse filters in the aggregate statements so they remove the majority of the data in one simple shot, letting subsequent steps work on smaller amounts of data. Then you can proceed with joining and filtering, making sure to keep the number of join permutations under control at all times.
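A sketch of the fall-through idea, again using hypothetical Orders and Customers tables: a coarse filter fills a temp table first, and the finer joins and filters run against that much smaller set.

-- Step 1: one cheap filter removes most of the rows
SELECT OrderID, CustomerID, Amount
INTO   #RecentOrders
FROM   Orders
WHERE  OrderDate >= '20080101'

CREATE CLUSTERED INDEX IX_RecentOrders ON #RecentOrders (CustomerID)

-- Step 2: join and refine on the much smaller set
SELECT c.CustomerID, c.Region, SUM(r.Amount) AS Total
FROM   #RecentOrders r
JOIN   Customers c ON c.CustomerID = r.CustomerID
WHERE  c.Region = 'EMEA'
GROUP BY c.CustomerID, c.Region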



3. Prefer multiple statements with one condition each over a single statement with multiple OR conditions.

When executing UPDATE and DELETE statements against permanent database tables that multiple users can access simultaneously, several single-condition statements are usually more efficient than one statement with multiple OR conditions. This tip is especially important from the scalability point of view; from the performance point of view the difference is usually marginal. The major reason is the locking of database records and the lock escalations that occur behind the scenes.
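A small illustration, assuming a hypothetical Orders table:

-- Single statement with OR conditions: touches (and locks) rows
-- matching either predicate in one pass, inviting lock escalation.
UPDATE Orders
SET    Status = 'Archived'
WHERE  Status = 'Cancelled' OR OrderDate < '20070101'

-- The same work split into two narrower statements: each holds
-- fewer locks for less time, which scales better under concurrency.
UPDATE Orders SET Status = 'Archived' WHERE Status = 'Cancelled'
UPDATE Orders SET Status = 'Archived' WHERE OrderDate < '20070101'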

4. Make wise distinctions between temp tables and table variables.

Table variables are in-memory structures that may work from 2-100 times faster than temp tables. But keep in mind that access to table variables gets slower as the volume of data they contain grows. At some point, table variables will overflow the available memory, which kills performance. Therefore, use table variables only when their data content is guaranteed not to grow unpredictably; the breaking point is around several thousand records. For larger data volumes, I recommend temp tables with clustered indexes. Interestingly, I've found that a temp table with one clustered index is often faster than one with multiple simple indexes. In contrast, multiple simple indexes on physical tables are often faster than one clustered index.
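For example (with the hypothetical Orders table again), the rule of thumb might look like this:

-- Small, bounded row count: a table variable is fine.
DECLARE @TopCustomers TABLE (CustomerID int PRIMARY KEY, Total money)
INSERT INTO @TopCustomers
SELECT TOP 100 CustomerID, SUM(Amount)
FROM   Orders
GROUP BY CustomerID
ORDER BY SUM(Amount) DESC

-- Large or unpredictable row count: a temp table with a clustered index.
CREATE TABLE #AllTotals (CustomerID int, Total money)
CREATE CLUSTERED INDEX IX_AllTotals ON #AllTotals (CustomerID)
INSERT INTO #AllTotals
SELECT CustomerID, SUM(Amount) FROM Orders GROUP BY CustomerID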

5. Make careful distinctions between hard rules and assumptions.

This is more of a business design tip; it applies to code design in general rather than to performance and scalability design specifically. In real life, however, performance and scalability are usually the first things to suffer from improper design. When hard rules are implemented as assumptions, they usually cause unnecessary calculations to be performed, hurting performance. Conversely, when assumptions are implemented as hard rules, they tend to cause errors and algorithm failures, which usually require an urgent redesign. That redesign, in turn, is usually performed under business constraints and results in inefficient final algorithms, because bad design decisions are often corrected in a rush, without sufficient resources, and sometimes under pressure from customers whose businesses are in a critical stage when the problems are uncovered but must continue operating during the fix.

6. Pay attention to join order.

Using the proper join order sometimes helps the database engine choose an execution plan that performs the joins with the minimal number of intermediate records. Most database engines also support hard hints, but in most cases you should avoid them and let the database engine figure out the best way to do its job on its own.
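As a sketch, with hypothetical Customers, Orders, and OrderDetails tables, listing the most selective table first:

-- The filter on Customers leaves few rows, so the joins that follow
-- work on a small intermediate set.
SELECT o.OrderID, od.ProductID, od.Quantity
FROM   Customers c
JOIN   Orders o        ON o.CustomerID = c.CustomerID
JOIN   OrderDetails od ON od.OrderID = o.OrderID
WHERE  c.Region = 'EMEA'

-- A hard hint such as OPTION (FORCE ORDER) would pin the written join
-- order, but as noted above it is usually better left to the engine.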

7. Be careful when joining complex views to other views and database tables in complex SELECT statements.

When the database contains a significant amount of data, the SQL Server engine tends to recalculate the execution plan of the resulting statement, which often produces an inefficient plan and can kill performance. The most difficult part is that the SQL Server engine's behavior is inconsistent in this respect, and depends heavily on the database size, indexes, foreign keys, and other database structures and constraints. The consistent work-around is to pre-select data from the view into a temp table with reasonable pre-filters, and then use that temp table in place of the underlying view.
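A sketch of that work-around, assuming a hypothetical complex view vwCustomerSales and a Products table:

-- Materialize a pre-filtered slice of the complex view first...
SELECT CustomerID, ProductID, SalesTotal
INTO   #CustomerSales
FROM   vwCustomerSales
WHERE  SalesYear = 2007

-- ...then join the small temp table instead of the view itself.
SELECT p.ProductName, cs.SalesTotal
FROM   #CustomerSales cs
JOIN   Products p ON p.ProductID = cs.ProductID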

8. Create indexes on temp tables wisely.

As mentioned in Tip 4, clustered indexes are usually the best in terms of performance for temp tables; however, there is a difference between creating the index before or after inserting data into the temp table. Creating the index before the insert complicates the insert, because the database engine must order the selection. For complex selections such as those mentioned in Tip 7, the extra ordering may overcomplicate the overall statement and drastically degrade the performance. On the other hand, creating the index after the insert forces the database engine to recalculate the execution plan of the stored procedure every time it is called. Therefore, the decision is always a trade-off and you should make it based on the relative costs of the two possibilities.
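Both orderings, sketched against the hypothetical Orders table:

-- Option A: index first. The INSERT must deliver rows in index order,
-- which can overcomplicate an already complex SELECT.
CREATE TABLE #Totals (CustomerID int, Total money)
CREATE CLUSTERED INDEX IX_Totals ON #Totals (CustomerID)
INSERT INTO #Totals
SELECT CustomerID, SUM(Amount) FROM Orders GROUP BY CustomerID

-- Option B: index afterwards. The INSERT stays simple, but changing the
-- table's structure mid-procedure triggers a plan recompilation.
CREATE TABLE #Totals2 (CustomerID int, Total money)
INSERT INTO #Totals2
SELECT CustomerID, SUM(Amount) FROM Orders GROUP BY CustomerID
CREATE CLUSTERED INDEX IX_Totals2 ON #Totals2 (CustomerID)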

9. In general, try to avoid execution plan recalculation.

One common cause of recalculation is a stored procedure that contains several execution paths that depend on the values passed in as parameters. Whether avoiding recalculation is possible depends on the complexity of the stored procedure and on other circumstances, such as those described in Tip 8. When the engine does recalculate the execution plan, performance always suffers; however, recalculating the caller's execution plan does not force recalculation for the procedures (or views or functions) it calls. Therefore, the work-around is to divide one stored procedure into multiple procedures (depending on the passed-in parameters), and then call the children from the parent conditionally. You should perform this subdivision very carefully though, because it can become a maintenance nightmare; but sometimes it is the only way to achieve acceptable database performance and scalability.
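A minimal sketch of such a subdivision, with hypothetical procedures over the Orders table. Each child has a stable plan of its own, so the parameter-dependent branching no longer forces recompilation:

CREATE PROCEDURE up_GetOrders_ByCustomer @CustomerID int AS
    SELECT OrderID, Amount FROM Orders WHERE CustomerID = @CustomerID
GO
CREATE PROCEDURE up_GetOrders_ByDate @FromDate datetime AS
    SELECT OrderID, Amount FROM Orders WHERE OrderDate >= @FromDate
GO
-- The parent only dispatches; the heavy queries live in the children.
CREATE PROCEDURE up_GetOrders
    @CustomerID int = NULL, @FromDate datetime = NULL
AS
    IF @CustomerID IS NOT NULL
        EXEC up_GetOrders_ByCustomer @CustomerID
    ELSE
        EXEC up_GetOrders_ByDate @FromDate
GO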

Finally, although this isn't either a performance or a scalability tip, I urge you to format your stored procedure scripts legibly. It's best to agree on common practices such as clause order and formatting rules with your coworkers in advance. Not only does that help avoid errors, it also clearly shows the logical structure of the statements and often aids in figuring out faulty filters and joins.

This list of tips is certainly not exhaustive, but it probably covers the most important performance and scalability factors.

Tuesday 29 January 2008

Passing datetime parameter to SQL Stored procedure in Query Analyzer

Yesterday I faced a problem passing a datetime parameter in SQL Server Query Analyzer, where I was trying to run a procedure UP_ORDER_LOG that expects the parameters @co_id, @prog_id, @event, and @event_date.

So I tried executing the procedure like this:

exec UP_ORDER_LOG @co_id='XYZ', @prog_id='ABC', @event='Some event', @event_date=getdate()

exec UP_ORDER_LOG @co_id='XYZ', @prog_id='ABC', @event='Some event', @event_date=getdate

exec UP_ORDER_LOG @co_id='XYZ', @prog_id='ABC', @event='Some event', @event_date=CURRENT_TIMESTAMP

But every time I got the error:

Server: Msg 170, Level 15, State 1, Line 1
Line 1: Incorrect syntax near ')'.

Finally I cracked the problem with the solution given below. The cause: T-SQL doesn't allow function calls or expressions as stored procedure arguments; parameter values must be constants or variables, so you have to capture the current date in a variable first:

declare @eventdate datetime
select @eventdate = CURRENT_TIMESTAMP
exec UP_ORDER_LOG @co_id='XYZ', @prog_id='ABC', @event='Some event', @event_date=@eventdate

I hope this helps.
Please do add comments if you find another solution.

Thursday 24 January 2008

Difference Between .NET Framework 2.0 and .NET Framework 3.0

Kirk Allen Evans has the best explanation so far of what comprises the .NET Framework 3.0. It can be summarized as:

.NET Framework 3.0 = .NET Framework 2.0 + WCF + WPF + WF + WCS

He also has a diagram showing the different parts.

Brad Abrams also tries to clear up the confusion by giving some more information about .NET 3.0 and the "Orcas" release, which will be the vNextNext of the .NET Framework.
The "Orcas" release will include the LINQ related technologies, like the new C# 3.0 and VB 9.0 compilers. Brad anticipates that the "Orcas" release will be called the .NET Framework 3.5. If you look at the comments to his blog entry you can see that a lot of confusion about the details still remains.

Most people are happy to see the WinFX components being put under the .NET Framework umbrella. Nevertheless, a lot of people (including me) aren't happy with the version number 3.0. It suggests too much. You would think the step from 2.0 to 3.0 would mean an updated version of the CLR, the compilers, and framework classes like those in ASP.NET and ADO.NET.

Yes, the WinFX components are a major addition to the .NET Framework. But I think the major version number of the .NET Framework is the wrong place to put the marketing spin about this. I even have doubts about changing the minor version number of the .NET Framework. I think it should just remain:

.NET Framework 2.0 + WinFX = .NET Framework 2.0 + WCF + WPF + WF + WCS

No matter how big the added functionality is, WinFX is still only an addition to .NET 2.0. As such it should not change the version number. I think Microsoft should use the "Orcas" release, which does change .NET 2.0 by updating the compilers and framework classes, to fully embrace WinFX and change the version number of the .NET Framework.

http://bloggingabout.net/blogs/erwyn/archive/2006/06/12/12458.aspx

Microsoft ASP.NET Futures (July 2007): Documentation and Resources

The Microsoft ASP.NET Futures July 2007 ("Futures") release contains an early developer preview of features providing a wide range of new functionality for ASP.NET, dynamic languages, and Microsoft Silverlight™. The functionality in the Futures release includes early experimental versions of features currently being considered for future versions of ASP.NET and the .NET Framework. The Futures releases assume some prior knowledge of:

--The core features of ASP.NET.
--The previous ASP.NET AJAX Futures January CTP release.
--The previous ASP.NET Futures May 2007 release.

Visit: http://quickstarts.asp.net/futures/