Search

Monday, January 10, 2011

Find and Delete Duplicate Rows From Tables without Primary Key

Someone recently asked me how to find and delete duplicate rows from sample tables, which do not have a Primary Key. I have written one such sample on MSDN code Find and/or Delete Duplicate Rows which I will share here
SAMPLE Data
-- Suppress data loading messagesSET NOCOUNT ON-- Create TableCREATE TABLE #Customers (ID integer, CustName varchar(20), Pincode int)-- Load Sample Data in TableINSERT INTO  #Customers VALUES (1, 'Jack',45454 )INSERT INTO  #Customers VALUES (2, 'Jill', 43453)INSERT INTO  #Customers VALUES (3, 'Tom', 43453)INSERT INTO  #Customers VALUES (4, 'Kathy', 22343)INSERT INTO  #Customers VALUES (5, 'David', 65443)INSERT INTO  #Customers VALUES (6, 'Kathy', 22343)INSERT INTO  #Customers VALUES (7, 'Kim', 65443)INSERT INTO  #Customers VALUES (8, 'Hoggart', 33443)INSERT INTO  #Customers VALUES (9, 'Kate', 61143)INSERT INTO  #Customers VALUES (10, 'Kim', 65443)
To indentify duplicate rows, the trick is to Group the rows by CustName, Pincode. Rows having similar CustName and Pincode will have more than one rows in the grouping. So locate them using HAVING COUNT(*) > 1. If duplicate values are encountered, return the maximum ID for each duplicate row. Using the outer query, delete any ID returned by subquery.
-- Find Duplicate RowsSELECT    MAX(ID) as ID, CustName, Pincode FROM #CustomersGROUP BY CustName, PincodeHAVING COUNT(*) > 1-- Delete Duplicate RowsDELETE FROM #CustomersWHERE ID IN( SELECT    MAX(ID) FROM #CustomersGROUP BY CustName, PincodeHAVING COUNT(*) > 1)
SELECT * FROM #Customers

No comments:

Post a Comment