Search

Friday, May 27, 2011

Calculate Percentile


Calculating Percentiles with SQL Server CTE
Here percentiles calculations are based on this fictitious table of employee salaries:

CREATE TABLE Employee
(EmployeeID INT IDENTITY(1,1) PRIMARY KEY,
 GroupName VARCHAR(30),
 Emp_Name VARCHAR(80),
 Salary FLOAT)

INSERT INTO Employee(GroupName, Emp_Name, Salary) values('PRODUCTION', 'P1', 12000)
INSERT INTO Employee(GroupName, Emp_Name, Salary) values('PRODUCTION', 'P2', 13500)
INSERT INTO Employee(GroupName, Emp_Name, Salary) values('PRODUCTION', 'P3', 13500)
INSERT INTO Employee(GroupName, Emp_Name, Salary) values('PRODUCTION', 'P4', 11500)
INSERT INTO Employee(GroupName, Emp_Name, Salary) values('SALES', 'S1', 12500)
INSERT INTO Employee(GroupName, Emp_Name, Salary) values('SALES', 'S2', 15500)
INSERT INTO Employee(GroupName, Emp_Name, Salary) values('SALES', 'S3', 14500)
INSERT INTO Employee(GroupName, Emp_Name, Salary) values('SALES', 'S4', 12500)
INSERT INTO Employee(GroupName, Emp_Name, Salary) values('SALES', 'S5', 11000)
INSERT INTO Employee(GroupName, Emp_Name, Salary) values('SALES', 'S6', 14000)
INSERT INTO Employee(GroupName, Emp_Name, Salary) values('MARKETING', 'M1', 11500)
INSERT INTO Employee(GroupName, Emp_Name, Salary) values('MARKETING', 'M2', 11000)
INSERT INTO Employee(GroupName, Emp_Name, Salary) values('MARKETING', 'M3', 16500)
INSERT INTO Employee(GroupName, Emp_Name, Salary) values('MARKETING', 'M4', 15500)
INSERT INTO Employee(GroupName, Emp_Name, Salary) values('MARKETING', 'M5', 12250)
INSERT INTO Employee(GroupName, Emp_Name, Salary) values('MARKETING', 'M6', 11500)
INSERT INTO Employee(GroupName, Emp_Name, Salary) values('ACCOUNTING', 'A1', 10500)
INSERT INTO Employee(GroupName, Emp_Name, Salary) values('ACCOUNTING', 'A2', 13500)
INSERT INTO Employee(GroupName, Emp_Name, Salary) values('ACCOUNTING', 'A3', 15000)
GO

Here is the code to calculate the 75th percentile of employee salaries using the above table data:

DECLARE @Percentile FLOAT
SELECT @Percentile = .95;

WITH Emp_Sal(Salary, Prev_Rank, Curr_Rank, Next_Rank) AS
(
   SELECT Salary, 
         (ROW_NUMBER() OVER ( ORDER BY GroupName, Salary ) - 2.0) / ((SELECT COUNT(*) FROM Employee) - 1)  [Prev_Rank],
         (ROW_NUMBER() OVER ( ORDER BY GroupName, Salary ) - 1.0) / ((SELECT COUNT(*) FROM Employee) - 1)  [Curr_Rank],
         (ROW_NUMBER() OVER ( ORDER BY GroupName, Salary ) + 0.0) / ((SELECT COUNT(*) FROM Employee) - 1)  [Next_Rank]
   FROM Employee
)
SELECT 
   CASE 
   WHEN T1.Salary = T2.Salary THEN T1.Salary
   ELSE T1.Salary + (T2.Salary - T1.Salary) * ((@Percentile - T1.Curr_Rank) /  (T2.Curr_Rank - T1.Curr_Rank)) 
   END
FROM Emp_Sal T1, Emp_Sal T2
WHERE (T1.Curr_Rank = @Percentile OR (T1.Curr_Rank < @Percentile AND T1.Next_Rank > @Percentile))
  AND (T2.Curr_Rank = @Percentile OR (T2.Curr_Rank > @Percentile AND T2.Prev_Rank < @Percentile))

This query uses SQL Server CTE along with the new ROW_NUMBER() function. The Emp_Sal CTE at the top produces a table that ranks each row. The Prev_Rank and Next_Rank columns help locate a row for interpolation when the desired percentile does not fall onto one of the rows. 


Calculating Percentiles with Grouping


The query in the previous section evaluates only the overall percentile. If you want to calculate the 95th percentile for each Group than use the below query: It contains two CTEs:

DECLARE @Percentile FLOAT
SELECT @Percentile = .95;

WITH Group_Row_Count(GroupName, Row_Count) AS
(
   SELECT GroupName, COUNT(*)
   FROM Employee
   GROUP BY GroupName
   HAVING COUNT(*) > 1
),
Emp_Sal(GroupName, Salary, Prev_Rank, Curr_Rank, Next_Rank) AS
(
   SELECT E.GroupName, E.Salary, 
         (ROW_NUMBER() OVER ( PARTITION BY E.GroupName ORDER BY E.Salary) - 2.0) / (C.Row_Count - 1), 
         (ROW_NUMBER() OVER ( PARTITION BY E.GroupName ORDER BY E.Salary) - 1.0) / (C.Row_Count - 1), 
         (ROW_NUMBER() OVER ( PARTITION BY E.GroupName ORDER BY E.Salary) + 0.0) / (C.Row_Count - 1)
   FROM Employee E 
   JOIN Group_Row_Count C ON C.GroupName = E.GroupName
)
SELECT T1.GroupName, 
   CASE 
   WHEN T1.Salary = T2.Salary THEN T1.Salary
   ELSE T1.Salary + (T2.Salary - T1.Salary) * ((@Percentile - T1.Curr_Rank) / (T2.Curr_Rank - T1.Curr_Rank)) 
   END
FROM Emp_Sal T1 
JOIN Emp_Sal T2 ON T1.GroupName = T2.GroupName
WHERE (T1.Curr_Rank = @Percentile OR (T1.Curr_Rank < @Percentile AND T1.Next_Rank > @Percentile))
  AND (T2.Curr_Rank = @Percentile OR (T2.Curr_Rank > @Percentile AND T2.Prev_Rank < @Percentile))
Go

Result:







15 comments:

  1. Hi Arun,
    Thanks a million, you are a star, code works beautifully, many thanks again,I think I found my SQL Guru!!!

    Regards
    Matrika

    ReplyDelete
  2. Hi Arun,
    thaak you for this solution.
    Please may i know how you would go about calculating the 3rd quartile, like in excel by month?

    Kind regards,
    Dayalan

    ReplyDelete
  3. Hello Dayalan,
    Check below link for the solution
    http://sqlanddotnetdevelopment.blogspot.in/2012/02/how-to-generate-quartile-ranking.html

    Thanks
    Arun Ladha

    ReplyDelete
  4. HI Arun,
    i have one doubt ,
    could you please help me.
    question : In my table Personame is Rajesh and his age 50.
    could you please help me, how to calculate the single record percentile values?
    thanks in advance.

    Regards,
    vinay

    ReplyDelete
  5. Hi Arun,

    Thanks for your great article,it was really helped me.
    But i have one doubt.
    In the below "Employee" Table

    EmpID GroupName Emp_Name Salary
    1 PRODUCTION P1 12000
    2 PRODUCTION P2 13500
    3 PRODUCTION P3 13500
    4 PRODUCTION P4 11500
    5 SALES S1 12500
    11 MARKETING M1 11500
    12 MARKETING M2 11000
    13 MARKETING M3 16500
    14 MARKETING M4 15500
    15 MARKETING M5 12250
    16 MARKETING M6 11500
    17 ACCOUNTING A1 10500
    18 ACCOUNTING A2 13500
    19 ACCOUNTING A3 15000

    i am unable to find out "Sales" percentile ?

    Could you please help me?

    Thanks in advance for your help.

    Regards,
    vinay.

    ReplyDelete
    Replies
    1. Hi Vinay,

      Check below query for solution:

      DECLARE @Percentile FLOAT
      SELECT @Percentile = .95;

      WITH Group_Row_Count(GroupName, Row_Count) AS
      (
      SELECT GroupName, COUNT(*)
      FROM Employees
      GROUP BY GroupName
      HAVING COUNT(*) >= 1
      ),
      Emp_Sal(GroupName, Salary, Prev_Rank, Curr_Rank, Next_Rank) AS
      (
      SELECT E.GroupName, E.Salary,
      (ROW_NUMBER() OVER ( PARTITION BY E.GroupName ORDER BY E.Salary) - 2.0) / (C.Row_Count - (CASE WHEN C.Row_Count > 1 THEN 1 ELSE 0 END)),
      (ROW_NUMBER() OVER ( PARTITION BY E.GroupName ORDER BY E.Salary) - 1.0) / (C.Row_Count - (CASE WHEN C.Row_Count > 1 THEN 1 ELSE 0 END)),
      (ROW_NUMBER() OVER ( PARTITION BY E.GroupName ORDER BY E.Salary) + 0.0) / (C.Row_Count - (CASE WHEN C.Row_Count > 1 THEN 1 ELSE 0 END))
      FROM Employees E
      JOIN Group_Row_Count C ON C.GroupName = E.GroupName
      )
      SELECT T1.GroupName,
      CASE
      WHEN T1.Salary = T2.Salary THEN T1.Salary
      ELSE T1.Salary + (T2.Salary - T1.Salary) * ((@Percentile - T1.Curr_Rank) / (T2.Curr_Rank - T1.Curr_Rank))
      END
      FROM Emp_Sal T1
      JOIN Emp_Sal T2 ON T1.GroupName = T2.GroupName
      WHERE ((T1.Curr_Rank = @Percentile OR (T1.Curr_Rank < @Percentile AND T1.Next_Rank > @Percentile))
      AND (T2.Curr_Rank = @Percentile OR (T2.Curr_Rank > @Percentile AND T2.Prev_Rank < @Percentile)))
      OR (T1.Curr_Rank = 0 AND T1.Next_Rank = 1 AND T1.Prev_Rank = -1)
      Go

      Delete
    2. Thanks Arun,
      This query has helpmed me a lot.
      thanks a lot for your query.

      Regards,
      vinay

      Delete
  6. Hi Arun,

    can we calculate the percentile values by using the temp tables inseted of by using
    CTE?
    could you please suggest me with some example.

    thanks in advance..

    ReplyDelete
  7. Hi Arun,
    i have tried with example which you posted earlier,
    but i am getting two percentile values for below table.

    Name CSO_Age
    OT 73
    OT 3

    could you suggest me how can calculate these two percenterle values?

    Thanks in advance!!

    Regards,
    vinay

    ReplyDelete
  8. Hi Sir,
    Thanks for the excellent solutions.

    Regards,
    Hiren

    ReplyDelete
  9. Is this possible with TWO by groups? Would it require another CTE?

    ReplyDelete
    Replies
    1. Yes it is possible with two Group. You had to Group by two columns. Check below Example:

      Added one more column in table.

      DECLARE @Percentile FLOAT
      SELECT @Percentile = .95;

      WITH Group_Row_Count(GroupName, GroupName1, Row_Count) AS
      (
      SELECT GroupName, GroupName1, COUNT(*)
      FROM Employee
      GROUP BY GroupName, GroupName1
      HAVING COUNT(*) > 1
      ),
      Emp_Sal(GroupName, GroupName1, Salary, Prev_Rank, Curr_Rank, Next_Rank) AS
      (
      SELECT E.GroupName, E.GroupName1, E.Salary,
      (ROW_NUMBER() OVER ( PARTITION BY E.GroupName, E.GroupName1 ORDER BY E.Salary) - 2.0) / (C.Row_Count - 1),
      (ROW_NUMBER() OVER ( PARTITION BY E.GroupName, E.GroupName1 ORDER BY E.Salary) - 1.0) / (C.Row_Count - 1),
      (ROW_NUMBER() OVER ( PARTITION BY E.GroupName, E.GroupName1 ORDER BY E.Salary) + 0.0) / (C.Row_Count - 1)
      FROM Employee E
      JOIN Group_Row_Count C ON C.GroupName = E.GroupName AND C.GroupName1 = E.GroupName1
      )
      SELECT T1.GroupName, T1.GroupName1,
      CASE
      WHEN T1.Salary = T2.Salary THEN T1.Salary
      ELSE T1.Salary + (T2.Salary - T1.Salary) * ((@Percentile - T1.Curr_Rank) / (T2.Curr_Rank - T1.Curr_Rank))
      END
      FROM Emp_Sal T1
      JOIN Emp_Sal T2 ON T1.GroupName = T2.GroupName AND T1.GroupName1 = T2.GroupName1
      WHERE (T1.Curr_Rank = @Percentile OR (T1.Curr_Rank < @Percentile AND T1.Next_Rank > @Percentile))
      AND (T2.Curr_Rank = @Percentile OR (T2.Curr_Rank > @Percentile AND T2.Prev_Rank < @Percentile))
      Go

      Delete
  10. Fantastic, this gets the results i am after.. How can this be setup in a view to report from ?

    ReplyDelete