Search

Thursday, March 31, 2011

HOW TO CALCULATE MEDIAN IN SQL SERVER

-- sample table:

create table Employee
(
    Name varchar(1) primary key,
    City varchar(10),
    Age int
)
go


-- with some sample data:


insert into Employee
select 'A','Calcutta',13 union all -- odd #
select 'B','Calcutta',33 union all
select 'C','Calcutta',19 union all
select 'D','Delhi',25 union all -- single #
select 'E','Mumbai',22 union all -- even #
select 'F','Mumbai',65 union all
select 'G','Mumbai',67 union all
select 'H','Mumbai',71
go


-- here's our query, showing median age per city:


select city,
    AVG(age) as MedianAge
from
(
    select City, Name, Age,
        ROW_NUMBER() over (partition by City order by Age ASC) as AgeRank,
        COUNT(*) over (partition by City) as CityCount
    from Employee
) x
where
    x.AgeRank in (x.CityCount/2+1, (x.CityCount+1)/2)
group by
    x.City
go


-- clean it all up

drop table Employee

And here's the result:


city       MedianAge
---------- -----------
Calcutta   19
Delhi      25
Mumbai     66

(3 row(s) affected)



Simply remove "City" from the SELECT clause and the GROUP BY clause to get the median age for all.
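For instance, the all-employees version would look like the sketch below. One caveat worth knowing: AVG on an int column performs integer division, so multiply by 1.0 (or cast) if you need a fractional median.

```sql
-- overall median age across all employees
select AVG(1.0 * age) as MedianAge
from
(
    select Age,
        ROW_NUMBER() over (order by Age ASC) as AgeRank,
        COUNT(*) over () as TotalCount
    from Employee
) x
where x.AgeRank in (x.TotalCount/2+1, (x.TotalCount+1)/2)
```

For the sample data above the middle two ages are 25 and 33, so this returns 29.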

There may be more efficient tricks out there, but this is certainly the shortest and simplest technique I am aware of.

Wednesday, March 30, 2011

Automatic Statistics Update Slows Down SQL Server 2005

Statistics are objects which contain information about the distribution of values in one or more columns of a table or indexed view. The query optimizer uses this statistical data to estimate the number of rows in the query result. If the statistics are invalid or outdated, the query optimizer is likely to choose a poor execution plan and query performance will suffer. Keeping statistics up to date is therefore an essential part of SQL Server maintenance.

If statistics updates are such a vital part of SQL Server then how can they cause performance issues? In contrast to the above mentioned benefits, updating the statistics also takes resources from the system. This can lead to performance issues when SQL Server is under heavy load especially when you are doing bulk data inserts or have continuous pressure from insert and update operations. In this case statistics are constantly updated keeping your server busy. You can use SQL Profiler to investigate whether the statistics updates are causing a slowdown or not. You can capture the "Auto Stats" event to see the overhead on your system.

If you experience problems due to statistics automatically being updated then the solution might be to disable the Auto Update Statistics option. However, we cannot live with outdated statistics on large databases, so if you turn this option off you should create a scheduled job which explicitly updates the statistics during off-peak periods. To do this, you can use the sp_updatestats procedure.

Following are the steps to disable the "Auto Update Statistics" and to put in place a scheduled job to maintain statistics.


Disable Auto Update Statistics

Open Microsoft SQL Server Management Studio and navigate to your database using Object Explorer. Right click the database and choose Properties. Select "Options" in the new window and set "Auto Update Statistics" to False.

Alternatively you can use the following script to disable this option:

ALTER DATABASE YourDBName  SET AUTO_UPDATE_STATISTICS OFF 


Create New SQL Job

Open Microsoft SQL Server Management Studio and navigate to SQL Server Agent -> Jobs using Object Explorer. Right click Jobs and choose New Job.

Enter the job name, owner, category (Database Maintenance) and description.

Click on Steps in the left side pane. Fill in the job name and database. The command type is T-SQL script. You should enter the following command: EXEC sp_updatestats

On the Advanced tab you can choose success/failure actions. I recommend you log the job output to a file. You should enter a "run as" user, but note that only the database owner and sysadmins can execute this procedure.

The last task is to create a schedule to run the job in off-peak periods. How often you should update the statistics (daily, twice a day, etc.) depends on your database usage, its size, the number of changing rows, and so on. I recommend you experiment to find the optimal schedule for your database. First schedule the job to run once a day at a convenient time and measure how long it runs. Then check the performance gain and consider whether more frequent updates are required.

Additional Options

There is also an option to update the statistics for only one table or a specified index. For this purpose you can use the UPDATE STATISTICS command. Please note that this command sets the Auto Update Statistics to ON if you do not use the NORECOMPUTE option.

UPDATE STATISTICS YourTableName(YourIndex) WITH NORECOMPUTE 

In addition, you can turn off Auto Update Statistics for a specific table using the sp_autostats system stored procedure.
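For example (a sketch; the table name is illustrative):

```sql
-- show the current auto-update setting for statistics on one table
EXEC sp_autostats 'dbo.YourTableName'

-- disable automatic statistics updates for that table only
EXEC sp_autostats 'dbo.YourTableName', 'OFF'
```

Called with just the table name, sp_autostats reports the current settings; passing 'ON' or 'OFF' as the second argument changes them.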

Ref: http://www.mssqltips.com

Tuesday, March 29, 2011

Cannot resolve the Collation Conflict.

Problem

We had created a new database in a SQL Server 2008 instance; we received the following error message while using SQL Server Management Studio:

Failed to retrieve data for this request. (Microsoft.SqlServer.Management.Sdk.Sfc)
Additional information:
An exception occurred while executing a Transact-SQL statement or batch.
(Microsoft.SqlServer.ConnectionInfo)
Cannot resolve the collation conflict between "Latin1_General_CI_AS" and "SQL_Latin1_General_CP1_CI_AS" in the UNION operation. (Microsoft SQL Server, Error:468)

Additionally, when we tried to run a query joining two tables — one belonging to the newly created database and the other a temporary table, joined on a common varchar(10) column — we received the following error from SSMS:

(1 row(s) affected)
Msg 468, Level 16, State 9, Line 4
Cannot resolve the collation conflict between "Latin1_General_CI_AS" and "SQL_Latin1_General_CP1_CI_AS" in the equal to operation.




Cause.

We configured the SQL Server instance with SQL_Latin1_General_CP1_CI_AS because it is the standard in our organization, which means all system databases on the server (including TempDB) have that collation. However, we had created the new database with a different collation, Latin1_General_CI_AS. The conflict originates from the difference in collation between the instance and the new database.

Solution.

If possible, change the database collation. The following link gives instructions on how to do so.

http://msdn.microsoft.com/en-us/library/ms175835.aspx

To change the server collation, you either have to reinstall SQL Server or rebuild system databases. For more information, please see the following article:

http://msdn.microsoft.com/en-us/library/ms179254.aspx

To allow specific queries to run despite the difference in collations, we need to modify those queries to include the COLLATE or COLLATE database_default clause when comparing string columns with different collations. Please see the following article for more information:

http://msdn.microsoft.com/en-us/library/ms184391.aspx
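As a sketch (the table and column names are illustrative), a join between columns with differing collations can be made to work like this:

```sql
SELECT p.Name
FROM dbo.Products p
JOIN #ImportedProducts i
    ON p.Name = i.Name COLLATE DATABASE_DEFAULT
```

COLLATE DATABASE_DEFAULT coerces the expression to the current database's default collation, which is often the safest choice when temporary tables (created with TempDB's collation) are involved.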

Monday, March 28, 2011

A severe error occurred on the current command. The results, if any, should be discarded.

Error Messages:
  • Msg 5242, Level 22, State 1, Line 2
  • An inconsistency was detected during an internal operation in database 'Data'(ID:13) on page (1:26292). Please contact technical support. Reference number 4.
  • Msg 3316, Level 21, State 2, Line 2
  • During undo of a logged operation in database 'Data', an error occurred at log record ID (68865:616:348). The row was not found. Restore the database from a full backup, or repair the database.
  • Msg 3314, Level 21, State 4, Line 2
  • The log for database 'Data' is not available. Check the event log for related error messages. Resolve any errors and restart the database.
  • Msg 3314, Level 21, State 5, Line 2
  • During undoing of a logged operation in database 'Data', an error occurred at log record ID (68847:438:1). Typically, the specific failure is logged previously as an error in the Windows Event Log service. Restore the database or file from a backup, or repair the database.
  • Msg 0, Level 20, State 0, Line 0
  • A severe error occurred on the current command. The results, if any, should be discarded.

    Solution:


    All of the above errors occur due to data corruption in the database.

    To recover from this error, use the following steps:

    1. Take a current backup of the database (strictly recommended)
    2. Go to Start > All Programs > Microsoft SQL Server 2005 > SQL Server Management Studio Express
    3. Click on the New Query button.
    4. Run the following queries one by one:

      • ALTER DATABASE your_database_name SET SINGLE_USER
      • Go
      • DBCC CHECKDB ('your_database_name', REPAIR_ALLOW_DATA_LOSS) WITH ALL_ERRORMSGS, NO_INFOMSGS
      • Go
      • ALTER DATABASE your_database_name SET MULTI_USER
      • Go

    Note: Make sure all connections to the database are closed at that time.
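Before resorting to REPAIR_ALLOW_DATA_LOSS (which, as its name says, can discard data — hence the backup in step 1), you may want to run CHECKDB without a repair option first to see the extent of the corruption and the minimum repair level it reports:

```sql
DBCC CHECKDB ('your_database_name') WITH ALL_ERRORMSGS, NO_INFOMSGS
```

If the output says REPAIR_REBUILD is sufficient, prefer that option, since it does not lose data.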

Friday, March 25, 2011

HOW TO CALCULATE THE NUMBER OF WEEK DAYS BETWEEN TWO DATES

If the start date and end date are both week days, then the total number of week days in between is simply:

(total difference in days) - (total difference in weeks) * 2

or

DateDiff(dd, @start, @end) - DateDiff(ww, @start, @end)*2

... since the DateDiff() function with weeks returns the number of week "boundaries" that are crossed; i.e., the number of weekends.

If you have a table of holidays, then you can simply subtract them out as well (assuming the table only lists holidays that fall on week days; a holiday on a weekend would otherwise be subtracted twice):

DateDiff(dd, @start, @end) -
    DateDiff(ww, @start, @end)*2 -
    (select count(*) from holidays where holiday_date between @start and @end)

Now, what if the start day or the end day is on a weekend? In that case, you need to define what to do in those situations in your requirements.

For example, if the start date is Sunday, Nov 20th, and the end day is Monday, Nov 21st -- how many week days are between those dates? There's no universal correct answer; it could be 0, or 1, or perhaps even "undefined" (null) depending on your needs.
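As a sketch, here is the Nov 20th/21st example run through the basic formula (dates assumed to be in 2011, when Nov 20th falls on a Sunday):

```sql
declare @start datetime, @end datetime
select @start = '20111120', -- Sunday
       @end   = '20111121'  -- Monday

-- DateDiff(ww) counts Saturday-to-Sunday boundaries crossed between the two
-- dates; none fall after this Sunday start, so the formula counts the Monday
select DateDiff(dd, @start, @end) - DateDiff(ww, @start, @end)*2 as WeekDays
```

The formula answers 1 here; whether that is the answer you want for a weekend start is exactly the requirements question discussed above.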

Thursday, March 24, 2011

How many types of data models are there?

There are no standards in this area. Authors and theorists make it up as they go. The entity-relationship (ER) model has hundreds of derivatives (Bachman, Chen, IBM, IDEF1X, etc.). The most popular of the OO models is the Unified Modeling Language (UML). UML and IDEF1X are actually the closest to becoming standards that can support software products: Rational already has UML products, and IDEF1X is the language of ERwin.
Don't be fooled by these variations. They all represent the same things, but you have to be very careful that you understand all of the non-standard symbols or you will surely make mistakes in interpreting what the pictures mean.

Wednesday, March 23, 2011

Change database compatibility level

Use the script below to change the database compatibility level:

ALTER DATABASE DatabaseName
SET SINGLE_USER
GO
EXEC sp_dbcmptlevel DatabaseName, 90;
GO
ALTER DATABASE DatabaseName
SET MULTI_USER
GO

SQL Server Version    Compatibility Level
SQL Server 6.5        65
SQL Server 7.0        70
SQL Server 2000       80
SQL Server 2005       90
SQL Server 2008       100
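Note that sp_dbcmptlevel is deprecated as of SQL Server 2008; on 2008 and later the equivalent ALTER DATABASE syntax is:

```sql
ALTER DATABASE DatabaseName
SET COMPATIBILITY_LEVEL = 100
GO
```

The same SINGLE_USER/MULTI_USER wrapping shown above applies.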

Tuesday, March 22, 2011

Problem detected with installed .Net Framework

While connecting to the database from another system, the following error message is shown. How do you resolve this? Message: "Problem detected with installed .Net Framework"


This error message appears because the database is in read-only mode. Use the following steps to resolve the problem:

  1. Go to Start > Programs > Microsoft SQL Server 2005 > SQL Server Management Studio
  2. Check that the database is showing "Read Only"
  3. Right Click on Database name
  4. Select Properties
  5. Select "Options"
  6. Scroll Down to bottom of the newly opened window
  7. Set "Database Read-Only" to False
  8. Click the "OK" button

Monday, March 21, 2011

File Growth / Log Space / Transaction Log is Full

Issue

While trying to run the program following message appears.

“File Growth / Log Space is full. Delete Transaction Log file”

Solution

Close all applications on all nodes (if running). Then run the following commands using Query Analyzer on the SQL Server.

BACKUP TRANSACTION Data WITH no_log
GO

CHECKPOINT
GO

/* IF YOU ARE USING SQL SERVER 2000 USE THE FOLLOWING QUERY */

DBCC SHRINKDATABASE('Data',10)

/* IF YOU ARE USING SQL SERVER 7.0 USE THE FOLLOWING QUERY */

DBCC SHRINKDB('Data',10)
GO

CHECKPOINT
GO

Friday, March 18, 2011

Error Attaching Database

Could not find row in sysindexes for database ID 7, object ID 1, index ID 1. Run DBCC CHECKTABLE on sysindexes.
Could not open new database 'FR6DemoDB'. CREATE DATABASE is aborted.

This error occurs due to a version mismatch of SQL Server. This particular error appears when you try to attach a SQL Server 2005 database to SQL Server 2000.

If the database was created in SQL Server 2005 and you try to restore/attach it to SQL Server 2000, the above error will occur.

Check your system and make sure SQL Server 2005 is installed.

Thursday, March 17, 2011

How to remove a corrupt SQL Server 2005 Express Instance?

  • Download msicuu2.exe from http://support.microsoft.com/kb/290301
  • Remove the SQL Server related entries listed by the utility.
  • Completely delete C:\Program Files\Microsoft SQL Server
  • Install SQL Server Express Edition again.
  • If instance error comes then remove the services.

How to Delete / Remove Services in Windows XP and Vista

  1. Go to Start > Run > services.msc to start Services applet.
  2. In the Services applet, find the service that you want to delete permanently, right click it and click on the Properties from the menu.
  3. In the properties window, find the name of the service and copy it to your clipboard.
  4. NOTE: this is the service name, NOT the display name.
  5. You may also want to stop the service, if it is running; this will kill all the running processes for it.
  6. Open a command prompt by going to Start > Run > CMD
  7. Once a command prompt has opened, type the command "sc delete <service name>" (without the quotes), replacing <service name> with the service name you copied in the step above.

Note: Once a service has been deleted you should see a message saying "[SC] DeleteService SUCCESS". This means the service has been deleted; to confirm, click the refresh button in the Services applet and verify that the service is gone.

Description of the Windows Installer CleanUp Utility

SUMMARY: When you are working on your computer and installing a new program, the installation suddenly fails. Now you are left with a partly installed program. You try to install the program again, but you are unsuccessful. Or maybe you have problems trying to remove an old program because the installation files are corrupted.

Do not worry. Windows Installer CleanUp Utility might be able to help. You can use the utility to remove installation information for programs that were installed by using Windows Installer. Be aware that Windows Installer CleanUp Utility will not remove the actual program from your computer. However, it will remove the installation files so that you can start the installation, upgrade, or uninstall over.

You can download Windows Installer CleanUp Utility from http://support.microsoft.com/kb/290301

Tuesday, March 15, 2011

How to rebuild system databases in SQL Server 2005

Rebuilding the master database installs all system databases to their initial location. You can also use the procedure below to rebuild the master database for a corrupted installation of SQL Server.

To rebuild the system databases, run the setup command from the command prompt using the following procedure:

1. Click Start, click Run, type cmd, and then click OK.
2. Run the following command to rebuild the system databases:

start /wait <media path>\setup.exe /qn INSTANCENAME=<instance name> REINSTALL=SQL_Engine REBUILDDATABASE=1 SAPWD=<strong password>

For example:

start /wait D:\setup.exe /qn INSTANCENAME=MSSQLSERVER REINSTALL=SQL_Engine REBUILDDATABASE=1 SAPWD=password

The /qn switch suppresses the Setup user interface and writes all Setup messages, including error messages, to the Setup log files. The log is located at %ProgramFiles%\Microsoft SQL Server\90\Setup Bootstrap\LOG\Summary.txt

The /qb switch can be used instead of /qn; /qb displays basic Setup dialog boxes and all error messages.

SAPWD specifies a new password for the system administrator (sa) account. Microsoft recommends the use of a strong password.

The INSTANCENAME parameter specifies the instance name. Use MSSQLSERVER for the default instance.


Monday, March 14, 2011

Creating a comma-separated list

This solution utilizes features introduced in SQL 2005. It starts off with a CTE (common table expression) of all of the distinct AccountNumbers in the table. For each AccountNumber, we get a comma separated list of the Value field, sorted by the Value field.
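The queries below assume a temp table #TestData with AccountNumber and Value columns; a minimal setup might look like this (the data is illustrative, not the article's original set):

```sql
create table #TestData
(
    AccountNumber int not null,
    Value varchar(10) not null
)

insert into #TestData
select 1, 'ABC' union all
select 1, 'DEF' union all
select 2, 'GHI' union all
select 2, 'JKL' union all
select 2, 'MNO'
```

With this data, account 1 would produce the list 'ABC,DEF' and account 2 would produce 'GHI,JKL,MNO'.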

WITH CTE AS
(
    SELECT DISTINCT AccountNumber
    FROM #TestData
)
SELECT AccountNumber,
    CommaList = STUFF((
        SELECT ',' + Value
        FROM #TestData
        WHERE AccountNumber = CTE.AccountNumber
        ORDER BY Value
        FOR XML PATH(''), TYPE).value('.','varchar(max)'),1,1,'')
FROM CTE
ORDER BY AccountNumber;

The key to creating the comma-separated list is the correlated subquery. Working from the inside out, we get each value prefixed with a comma, ordered by Value. The FOR XML PATH('') generates an XML structure with an empty string as the root node. Since the field is ',' + Value (an unnamed expression), there is no name for the individual elements. What is left is a list of values, each prefixed with a comma. The TYPE clause specifies that the data be returned as an XML type. The .value('.','varchar(max)') takes that value and converts it into a varchar(max) data type. The combination of TYPE and .value means that characters which are significant in XML (such as the ampersand (&), and the greater-than (>) and less-than (<) signs) will not be converted into their XML entity representations and will remain as is.

At this point, you will have a comma-separated list of all values, with a leading comma before each value. All that remains is to remove the very first leading comma from the entire string. To do this, we utilize the STUFF function. Using the string created by the FOR XML PATH(''), TYPE, and starting with the first character, we replace one character (the leading comma) with an empty string. (Note that if you wanted the string to be separated with a comma and a space, you would specify ', ' and replace 2 characters in the STUFF function with an empty string.)

The subquery is correlated, meaning that it references a value outside of itself to control what it is doing. In this case, it is referencing the current AccountNumber from the CTE.

The results will look like this (abbreviated due to length)

AccountNumber CommaList ------------- -----------------------------------------------------------------------------
1 @@H,@BE,@CE,@DA,@FA,@FH,@GB,@GD,@HC,@HG,A@E,A@G,AEC,AH@,AHB,AHI,AIG,B@@,B@A,B@B,BBG,B, ...
2 @CB,@CE,@CG,@DB,@EE,@GG,@GG,@HC,@IF,A@E,AAF,AAI,ACG,AEA,AFA,AFB,AFC,AFC,AFI,AGF,AIE,AIH,B ...
3 @@E,@@H,@BE,@CD,@DC,@DI,@EF,@EI,@FB,@GE,A@@,AAE,ACE,AEF,AFA,AGC,AH@,AIH,B@C,BAI,BC@,BDF, ...

Resources:

SQL Spackle - Creating a comma-separated list.docx

Saturday, March 12, 2011

How to Identify Memory related performance problem in Microsoft SQL Server

There are many possible causes of memory-related performance problems on a MS SQL Server instance: memory pressure from other applications, limits on physical or virtual memory, or pressure inside SQL Server itself. There are several built-in tools which can be used to find the root cause.

DBCC MEMORYSTATUS command

We can use the DBCC MEMORYSTATUS command to check for any abnormal memory problems in SQL Server. Run

DBCC MEMORYSTATUS

and scroll down to the Buffer Counts section and look for the Target value. It shows the number of 8-KB pages which can be committed without causing paging. A drop in the number of target pages might indicate a response to external physical memory pressure.

If the Committed amount is above Target, continue investigating the largest memory consumers inside SQL Server. When the server is not loaded, Target normally exceeds Committed and the value of the Process: Private Bytes performance counter.

If the Target value is low but the server's Process: Private Bytes is high, you might be facing internal SQL memory problems with components that use memory from outside of the buffer pool. Such components include linked servers, COM objects, extended stored procedures, SQL CLR, etc. If the Target value is low but close to the total memory used by SQL Server, you should check whether your SQL Server received a sufficient amount of memory from the system. You can also check the server memory configuration parameters.

You can compare the Target count against the max server memory value, if it is set. The latter limits the maximum memory consumption of the buffer pool, so the Target value cannot exceed it. A low Target count can also indicate problems: if it is less than the min server memory setting, you should suspect external virtual memory pressure.

Also check the Stolen Pages count in DBCC MEMORYSTATUS output. A high percentage (>70%) of Stolen Pages compared to Target can be a sign of internal memory pressure.


Resource Governor

The Resource Governor in SQL Server 2008 Enterprise edition allows you to fine tune SQL Server memory allocation strategies, but incorrect settings can be a cause for out-of-memory errors. The following DMVs can provide information about the Resource Governor feature of SQL Server 2008:

sys.dm_resource_governor_workload_groups, sys.dm_resource_governor_configuration and sys.dm_resource_governor_resource_pools

SQL Server Profiler

SQL Server Profiler is a graphical user interface to SQL Trace for monitoring an instance of the Database Engine or Analysis Services. It shows how SQL Server resolves queries internally. This allows administrators to see exactly what Transact-SQL statements or Multi-Dimensional Expressions are submitted to the server and how the server accesses the database or cube to return result sets.

Additional information can be found in the articles

Working with SQL Server Profiler Trace Files and Creating a Trace Template in SQL Server Profiler.

SQL Server ring buffers

Another source of diagnostic memory information is the sys.dm_os_ring_buffers DMV. Each ring buffer keeps a record of the most recent notifications of its type. You can query the ring buffer event counts using the following code:

SELECT ring_buffer_type, COUNT(*) AS [Events]
FROM sys.dm_os_ring_buffers
GROUP BY ring_buffer_type
ORDER BY ring_buffer_type

Here is a list of ring buffers of interest:

  • RING_BUFFER_BUFFER_POOL: This ring buffer contains records of buffer pool failures.
  • RING_BUFFER_RESOURCE_MONITOR: This ring buffer captures every memory state change by using resource monitor notifications.
  • RING_BUFFER_SCHEDULER_MONITOR: Stores information about the overall state of the server. The SystemHealth records are created at one-minute intervals.
  • RING_BUFFER_MEMORY_BROKER: This ring buffer contains memory notifications for the Resource Governor Resource pool.
  • RING_BUFFER_OOM: This ring buffer contains records indicating out-of-memory conditions.

Dynamic Management Views

You can use the sys.dm_os_memory_clerks dynamic management view (DMV) to get detailed information about memory allocation by the server components in SQL Server 2005 and 2008. You can get additional information about the caches by querying sys.dm_os_memory_cache_counters joined with sys.dm_os_memory_cache_clock_hands (note that the page counts are NULL for USERSTORE entries):

SELECT DISTINCT SDMC.cache_address, SDMC.name, SDMC.type,
    SDMC.single_pages_kb, SDMC.multi_pages_kb,
    SDMC.single_pages_in_use_kb, SDMC.multi_pages_in_use_kb,
    SDMC.entries_count, SDMC.entries_in_use_count,
    SDMCCH.removed_all_rounds_count, SDMCCH.removed_last_round_count
FROM sys.dm_os_memory_cache_counters AS SDMC
JOIN sys.dm_os_memory_cache_clock_hands SDMCCH
    ON (SDMC.cache_address = SDMCCH.cache_address)
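Similarly, here is a quick sketch of the top memory consumers by clerk type (column names as in SQL Server 2005/2008, where the sizes are reported in KB):

```sql
SELECT TOP 10 type,
    SUM(single_pages_kb + multi_pages_kb) AS total_kb
FROM sys.dm_os_memory_clerks
GROUP BY type
ORDER BY total_kb DESC
```

MEMORYCLERK_SQLBUFFERPOOL dominating is normal; an unusually large other clerk points at the component to investigate.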

You can also use the following DMVs for memory troubleshooting both in SQL Server 2005 and 2008:

  • sys.dm_exec_requests
  • sys.dm_exec_query_memory_grants
  • sys.dm_exec_query_resource_semaphores
  • sys.dm_exec_sessions
  • sys.dm_exec_cached_plans
  • sys.dm_os_memory_cache_entries

Performance Monitor

Performance Monitor is part of the Microsoft Management Console; you can find it by navigating to Start Menu -> Administrative Tools.

Friday, March 11, 2011

Important Properties of Database

Use the script below to list the important properties of your databases:

select
sysDB.database_id,
sysDB.Name as 'Database Name',
syslogin.Name as 'DB Owner',
sysDB.state_desc,
sysDB.recovery_model_desc,
sysDB.collation_name,
sysDB.compatibility_level,
sysDB.user_access_desc,
sysDB.is_read_only,
sysDB.is_auto_shrink_on,
sysDB.is_auto_close_on,
sysDB.is_auto_create_stats_on,
sysDB.is_auto_update_stats_on,
sysDB.is_fulltext_enabled,
sysDB.is_trustworthy_on
from sys.databases sysDB
INNER JOIN sys.syslogins syslogin ON sysDB.owner_sid = syslogin.sid

Thursday, March 10, 2011

SQL DISTINCT on Multiple Columns

The DISTINCT clause works in combination with SELECT and gives you the unique data from a database table or tables. The syntax for DISTINCT is shown below:

SELECT DISTINCT "column_name"
FROM "table_name"

If you want a DISTINCT combination of more than one column then the syntax is

SELECT DISTINCT column1, column2
FROM "table_name"

Let's look at some examples to understand the usage of the DISTINCT keyword. First, let's create a table for our illustration and insert some data.

CREATE TABLE DuplicateTest(
    Firstname nvarchar(30) NOT NULL,
    Lastname nvarchar(30) NOT NULL,
    PostalCode nvarchar(15) NOT NULL,
    City nvarchar(30) NOT NULL
)

insert into DuplicateTest
(Firstname,Lastname,PostalCode,City)
values
('Sarvesh', 'Singh', 'B263SP', 'Birmingham'),
('Steve', 'White', 'EC224HQ', 'London'),
('Mark', 'Smith', 'L324JK', 'Liverpool'),
('Claire', 'whitehood', 'M236DM', 'Manchester'),
('Param', 'Singh', 'B283SP', 'Birmingham')

select * from DuplicateTest

Firstname  Lastname   PostalCode  City
---------  ---------  ----------  ----------
Sarvesh    Singh      B263SP      Birmingham
Steve      White      EC224HQ     London
Mark       Smith      L324JK      Liverpool
Claire     whitehood  M236DM      Manchester
Param      Singh      B283SP      Birmingham

In the result set above there are repetitions in the City Column. Let's get a list of all cities without repeating them using DISTINCT.

select DISTINCT City from DuplicateTest

City
----------
Birmingham
Liverpool
London
Manchester


You can see 'Birmingham' is just returned once in this result, even though it appears more than once in the table. You can get the same result using GROUP BY as shown below.

select city from DuplicateTest
group by city

Let's now use DISTINCT with more than one column. We will add the Lastname column in as well.

select DISTINCT City,Lastname from DuplicateTest

City        Lastname
----------  ---------
Birmingham  Singh
Liverpool   Smith
London      White
Manchester  whitehood

We get a list of results that have multiple rows, none of which are duplicated.

Again, you can get the same result by using GROUP BY as shown below:

select city,lastname from DuplicateTest
group by city, lastname

If you look at the original data, there are two users with same Lastname (Singh) who live in the same city (Birmingham). With the DISTINCT keyword you get one unique row. Let's now add another column to our SELECT query.

select DISTINCT City,Lastname,Postalcode from DuplicateTest

This returns:

City        Lastname   PostalCode
----------  ---------  ----------
Birmingham  Singh      B263SP
Birmingham  Singh      B283SP
Liverpool   Smith      L324JK
London      White      EC224HQ
Manchester  whitehood  M236DM

You will notice now that you are seeing two rows with the same lastname of Singh. This is because their 'Postalcode' is different, and the addition of that column makes the rows unique.

Again you will get the same result using GROUP BY as shown below:


select city, lastname, postalcode
from DuplicateTest
group by city, lastname, postalcode

Let's look at another example where you can use DISTINCT on multiple columns to find duplicate addresses. I've taken this example from another post; please refer to it for more detail.

SELECT PostCode, COUNT(Postcode)
FROM
(
SELECT DISTINCT Address1, Address2, City, Postcode
FROM AddressTable
) AS Sub
GROUP BY Postcode
HAVING COUNT(Postcode) > 1

Or you can use GROUP BY as follows:

SELECT Address1,Address2,City,PostCode,Count(PostCode)
FROM AddressTable
GROUP BY Address1,Address2,City,PostCode
HAVING Count(PostCode) > 1

In both of these cases, we are using DISTINCT to find those rows that are duplicates, based on the columns we include in the queries.

DISTINCT can also be used to get unique column values with an aggregate function. In the example below, the query gets the unique values of reorderpoint and then does a SUM.

USE AdventureWorks
GO
SELECT SUM(DISTINCT ReorderPoint) as DistinctSum
FROM Production.Product
GO

Result: 1848

In the example below, the query does a SUM of ReorderPoint including the duplicates.

SELECT SUM(ReorderPoint) as WithoutDistinct
FROM Production.Product
GO

Result: 202287

The above two examples show the importance of DISTINCT with an aggregate function. Had the requirement been to get the SUM of the unique values of ReorderPoint, the user could unknowingly have ended up with a completely incorrect SUM by using the result from the second query.


Ref: http://www.sqlservercentral.com