mysql percentile rank by group - mysql

I have a table containing date, id, and value, with about 1000 id rows per date. I need to calculate the percentile rank of each row, by date. I am using the following code for percentile rank for a single date, but with over 10 years of daily data this is very inefficient to run date-by-date. Seems that it should be able to be formulated in MySQL but I've not been able to make it work.
Date ID Value
date1 01 -7.2
date1 02 0.6
date2 01 1.2
date2 02 3.8
SELECT c.id, c.value, ROUND( (
(#rank - rank) / #rank ) *100, 2) AS rank
FROM (
SELECT * , #prev := #curr , #curr := a.value,
#nxtRnk := #nxtRnk + 1,
#rank := IF( #prev = #curr , #rank , #nxtRnk ) AS rank
FROM (
SELECT id, value
FROM temp
WHERE date = '2013-06-28'
) AS a, (
SELECT #curr := NULL , #prev := NULL , #rank :=0, #nxtRnk :=0
) AS b
ORDER BY value DESC
) AS c
So basically I want to SELECT DISTINCT(date), and then for each date perform the above SELECT, which is preceeded by INSERT INTO table2( ... ) to write the results to table2.
Thanks for any help,
Hugh

I finally developed an acceptable solution by using a temporary table. Maybe not the optimum solution, but it works in about 5 sec on a million + record table.
My temporary table (t1) contains date and the count of rows for date.
The third select above is changed to
SELECT t1.date, t1.cnt, id, value FROM t1 LEFT JOIN temp ON(t1.date = temp.date)
Also, the calculations in the first SELECT above were changed to use c.cnt rather than #rank, and an #prevDate variable was created to reset the rank count on date changes.
Thanks to anyone who looked at this and tried to work up a solution.

I was trying to solve this for quite some time and then I found the following answer. Honestly brilliant. Also quite fast even for big tables (the table where I used it contained approx 5 mil records and needed a couple of seconds).
SELECT
CAST(SUBSTRING_INDEX(SUBSTRING_INDEX( GROUP_CONCAT(field_name ORDER BY
field_name SEPARATOR ','), ',', 95/100 * COUNT(*) + 1), ',', -1) AS DECIMAL)
AS 95th Per
FROM table_name;
As you can imagine just replace table_name and field_name with your table's and column's names.
For further information check Roland Bouman's original post

Related

Grouping rows via two different columns in MYSQL

I just want to ask if grouping rows with the same value but came from different columns is possible.
I have a scenario that we should sum up the total minutes if the records are found "continuous" transactions by checking if the STARTDATETIME column matches the previous data of ENDDATETIME column if they are the same. See image link below for reference.
Thanks guys.
I modified Gordon Linoff's solution ( see my comment under the question):
SELECT
c.employee_id
,MIN(c.start_date) AS start_date
,MAX(c.end_date) AS end_date
,COUNT(*) AS numcontracts,
TIMESTAMPDIFF(minute,MIN(c.start_date),MAX(c.end_date)) AS timediff
FROM
(
SELECT
c0.*
,(#rn := #rn + COALESCE(startflag, 0)) AS cumestarts
FROM
(SELECT c1.*,
(NOT EXISTS (SELECT 1
FROM contracts c2
WHERE c1.employee_id = c2.employee_id AND
c1.start_date = c2.end_date
)
) AS startflag
FROM contracts c1
ORDER BY employee_id, start_date
) c0 CROSS JOIN (SELECT #rn := 0) params
) c
GROUP BY c.employee_id, c.cumestarts
http://rextester.com/VOGMU19779
timediff contains the minutes passed in the combined interval.

MySQL: Limiting result for WHERE IN list

Let's say there are millions of records in my_table.
Here is my query to extract rows with a specific name from list:
SELECT * FROM my_table WHERE Name IN ('name1','name2','name3','name4')
How do I limit the returned result per name1, name2, etc?
The following query would limit the whole result (to 100).
SELECT * FROM my_table WHERE Name IN ('name1','name2','name3','name4') LIMIT 100
I need to limit to 100 for each name.
This is a bit of a pain in MySQL, but the best method is probably variables:
select t.*
from (select t.*,
(#rn := if(#n = name, #rn + 1,
if(#n := name, 1, 1)
)
) as rn
from my_table t cross join
(select #n := '', #rn := 0) params
order by name
) t
where rn <= 100;
If you want to limit this to a subset of the names, then add the where clause to the subquery.
Note: If you want to pick certain rows -- such as the oldest or newest or biggest or tallest -- just add a second key to the order by in the subquery.
Try
SELECT * FROM my_table WHERE Name IN ('name1','name2','name3','name4') FETCH FIRST 100 ROWS ONLY

calculate the differences between two rows in SQL

I have a SQL table, one row is the revenue in the specific day, and I want to add a new column in the table, the value is the incremental (could be positive or negative) revenue between a specific day and the previous day, and wondering how to implement by SQL?
Here is an example,
original table,
...
Day1 100
Day2 200
Day3 150
...
new table (add incremental column at the end, and for first column, could assign zero),
Day1 100 0
Day2 200 100
Day3 150 -50
I am using MySQL/MySQL Workbench.
thanks in advance,
Lin
SELECT a.day, a.revenue , a.revenue-COALESCE(b.revenue,0) as previous_day_rev
FROM DailyRevenue a
LEFT JOIN DailyRevenue b on a.day=b.day-1
the query assume that each day has one record in the table. If there could be more than 1 row for each day you need to create a view that sums up all days grouping by day.
If you're okay with re-ordering the columns slightly, something like this is pretty simple to understand:
SET #prev := 0;
SELECT day, revenue - #prev AS diff, #prev := revenue AS revenue
FROM revenue ORDER BY day ASC;
The trick is that we calculate the difference to the previous first, then set the previous to the current and display it as the current in one step.
Note, this depends on the order being correct since the calculations are done during the returning of the rows, so you need to make sure you have an ORDER BY clause that returns the days in the correct order.
Try;
select
t.date_col, t.val_col,
case when t1.val_col is null then 0
else t.val_col - t1.val_col end diff
from (
select t.* , #r := #r + 1 lev
from tbl t,
(select #r := 0) r
order by t.date_col
) t
left join (
select t.* , #r1 := #r1 + 1 lev
from tbl t,
(select #r1 := 1) r
order by t.date_col
) t1
on t.lev = t1.lev
This will calculate value diff even if there is a missing date

I need to find any 5 rows that match where clause and they occur "in a row" (they are neighbors)

I have a MySQL table for fictional fitness app.
Let's say that app is monitoring user progress on doing pushups day by day.
TrainingDays
id | id_user | date | number_of_pushups
Now, I need to find if user have ever managed to do more than 100 pushups 5 days in a row.
I know this is probably doable by fetching all days and then making some php loops, but I wonder if there is possibility to do this in plain mysql...
In MySQL, the easiest way is to use variables. The following gets all sequences of days with 100 or more pushups:
select grp, count(*) as numdaysinarow
from (select (date - interval rn day) as grp, td.*
from (select td.*,
(#rn := if(#i = id_user, #rn + 1
if(#i := id_user, 1, 1)
) as rn
from trainingdays td cross join
(select #rn := 0, #i := NULL) vars
where number_of_pushups >= 100
order by id_user, date
) td
) td
group by grp;
This uses the observation that when you subtract a sequence of numbers from a series of dates that increment, then the resulting value is constant.
To determine if there are 5 or more days in a row, use max():
select max(numdaysinarow)
from (select grp, count(*) as numdaysinarow
from (select (date - interval rn day) as grp, td.*
from (select td.*,
(#rn := if(#i = id_user, #rn + 1
if(#i := id_user, 1, 1)
) as rn
from trainingdays td cross join
(select #rn := 0, #i := NULL) vars
where number_of_pushups >= 100
order by id_user, date
) td
) td
group by grp
) td;
Your app can then check the value against whatever minimum you like.
Note: this assumes that there is only one record per day. The above can easily be modified if you are looking for the sum of the number of pushups on each day.
Order of records shouldn't be relied on, e.g. with ORDER BY you can change the sequence.
However, you have many functions at hand in a database, which also enables you to use less PHP. What you want is SUM function. Combined with a WHERE clause, this should get you started:
SELECT SUM(number_of_pushups) AS sum_pushups
FROM TrainingDays
WHERE date >= :start_day
AND user_id = :user_id

MySQL increment user variable when value changes

I have a table consisting of groups of, for example, five rows each. Each row in each group possesses a date value unique to that group.
What I want to do in my query, is go through the table, and increment a user variable (#count) when this date value changes. That's to say, #count should equal the group count, rather than the row count.
My current query looks like this, in case you're wondering:
SELECT #row := #row +1 AS rownum, date
FROM ( SELECT #row := 0 ) r, stats
Thanks a lot.
What about something like this?
SELECT
(CASE WHEN #date <> date THEN #row := #row +1 ELSE #row END) AS rownum,
#date:= date,
date
FROM ( SELECT #row := 0, #date := NOW() ) r, stats
You don't need a user variable to answer the query that you are doing. Is there a reason you want to use the user variable (for example, to emulate a ranking function?)
If not:
-- how many groups are there?
select count(distinct date) distinct_groups from table;
-- each group and count of rows in the group
select date, count(*) from table group by date;
-- if you want the Nth row from each group, assuming you have an auto_increment called id:
select *
from table
join ( select date, max(id) id
from table
group by date
) sq
on table.id = sq.id