I am trying to calculate Spearman's rank correlation for some data in MySQL. For this I need to rank my data in descending order. I got this working, but when two or more rows have the same value, their rank should be the average of the ranks they occupy.
Here is some example data with the current ranks and the expected ranks:
| id | var | rank | expected rank |
|----|-----|------|---------------|
| 8  | 1   | 1    | 1             |
| 2  | 2   | 2    | 2.5           |
| 6  | 2   | 3    | 2.5           |
| 4  | 3   | 4    | 4             |
| 5  | 4   | 5    | 5             |
| 1  | 5   | 6    | 6             |
| 3  | 6   | 7    | 8             |
| 7  | 6   | 8    | 8             |
| 9  | 6   | 9    | 8             |
My query looks like this right now:
SET @rownum := 0;
SET @rownum2 := 0;
SELECT rank_x.id, rank_x.var1, rank_x.rk_x
FROM
(SELECT id, @rownum := @rownum + 1 AS rk_x, var1
FROM sampledata order by var1 asc) as rank_x;
You can do this by assigning the sequential number and then taking the average. This requires some nested subqueries, but is doable. The idea is:
First assign the sequential value.
Then find the max sequential value for each value of var1.
Then find the min sequential value for each value of var1.
Then take the average.
The query looks like:
SELECT id, var1, (minrn + maxrn) / 2
FROM (SELECT sd.*,
             (@maxrn := if(@v2 = var1, @maxrn,
                           if(@v2 := var1, rn, rn)
                          )
             ) as maxrn
      FROM (SELECT sd.*,
                   (@minrn := if(@v = var1, @minrn,
                                 if(@v := var1, rn, rn)
                                )
                   ) as minrn
            FROM (SELECT id, var1, (@rn := @rn + 1) as rn
                  FROM sampledata sd CROSS JOIN
                       (SELECT @rn := 0) vars
                  ORDER BY var1 asc
                 ) sd CROSS JOIN
                 (SELECT @minrn := 0, @v := -1) vars
            ORDER BY var1, rn
           ) sd CROSS JOIN
           (SELECT @maxrn := 0, @v2 := -1) vars
      ORDER BY var1, rn desc
     ) sd;
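If MySQL 8.0 or later is available (an assumption; the queries above target older versions), the average rank can be computed with window functions instead of user variables: the first position of a tie group is RANK(), the last is RANK() plus the group size minus one, and the average rank is the midpoint of the two. The sketch below runs that query through Python's built-in sqlite3 module (SQLite 3.25+ accepts the same window syntax) so it can be checked without a MySQL server, and cross-checks it with hypothetical pure-Python helpers `average_ranks` and `spearman`. With ties, Spearman's rho is computed here as the Pearson correlation of the average ranks.

```python
import sqlite3

# Sample data from the question: (id, var1).
ROWS = [(8, 1), (2, 2), (6, 2), (4, 3), (5, 4), (1, 5), (3, 6), (7, 6), (9, 6)]

# avg rank = (first position + last position) / 2
#          = (RANK() + RANK() + group size - 1) / 2
# 2.0 forces real division in SQLite; MySQL's "/" already returns a decimal.
AVG_RANK_SQL = """
SELECT id, var1,
       (2 * RANK() OVER (ORDER BY var1)
          + COUNT(*) OVER (PARTITION BY var1) - 1) / 2.0 AS avg_rank
FROM sampledata
ORDER BY var1, id
"""


def sql_average_ranks(rows):
    """Load (id, var1) rows into an in-memory table and run AVG_RANK_SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sampledata (id INTEGER PRIMARY KEY, var1 INTEGER)")
    conn.executemany("INSERT INTO sampledata VALUES (?, ?)", rows)
    result = conn.execute(AVG_RANK_SQL).fetchall()
    conn.close()
    return result


def average_ranks(values):
    """Return 1-based ranks; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to the last position holding the same value as position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # i..j are 0-based positions
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (vx * vy) ** 0.5
```

In MySQL the SELECT could be used unchanged; only the table setup is SQLite-specific.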
Related
I want to find a user's position in a leaderboard and return the 4 users above and 4 users below their position.
My table, 'predictions', looks something like this:
+----+---------+--------+-------+---------+
| id | userId | score | rank | gameId |
+----+---------+--------+-------+---------+
| 1 | 12 | 11 | 1 | 18 |
| 2 | 1 | 6 | 4 | 18 |
| 3 | 43 | 7 | 3 | 12 |
| 4 | 4 | 9 | 2 | 18 |
| 5 | 98 | 2 | 5 | 19 |
| 6 | 3 | 0 | 6 | 18 |
+----+---------+--------+-------+---------+
Obviously this isn't properly ordered, so I run this:
SELECT l.userId,
l.rank,
l.score,
l.createdAt,
@curRow := @curRow + 1 AS row_number
FROM (SELECT * FROM `predictions` WHERE gameId = 18) l
JOIN (SELECT @curRow := 0) r
ORDER BY rank ASC
which gets me a nice table with each entry numbered.
I then want to search this generated table, find the row_number where userId = X, and then return the values 'around' that result.
I think I have the logic of the query down, I just can't work out how to reference the table 'generated' by the above query.
It would be something like this:
SELECT *
FROM (
SELECT l.userId,
l.rank,
l.score,
l.createdAt,
@curRow := @curRow + 1 AS row_number
FROM (SELECT * FROM `predictions` WHERE gameId = 18) l
JOIN (SELECT @curRow := 0) r
ORDER BY rank ASC) generated_ordered_table
WHERE row_number < (SELECT row_number FROM generated_ordered_table WHERE userId = 1)
ORDER BY row_number DESC
LIMIT 0,5
This fails. What I'm trying to do is to generate my first table with the correct query, give it an alias of generated_ordered_table, and then reference this 'table' later on in this query.
How do I do this?
MySQL 8.0 and later support window functions and common table expressions (CTEs), which would simplify this query considerably.
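As a sketch of what that could look like on MySQL 8.0+ (assuming an upgrade is an option), a CTE can be referenced as many times as needed, so the numbered rows and the user's own position can be combined in one statement. The code runs the query in SQLite through Python's sqlite3 module so it can be checked without a MySQL server; the `neighbours` function and its `k` parameter are hypothetical names, and `rank` is back-quoted because it is a reserved word in MySQL 8.0 (SQLite accepts back-quoted identifiers too).

```python
import sqlite3

# Sample rows from the question's predictions table: (id, userId, score, rank, gameId).
PREDICTIONS = [
    (1, 12, 11, 1, 18),
    (2, 1, 6, 4, 18),
    (3, 43, 7, 3, 12),
    (4, 4, 9, 2, 18),
    (5, 98, 2, 5, 19),
    (6, 3, 0, 6, 18),
]

# Number the game's rows once (ordered), look up the user's row number,
# then keep every row within k positions of it on either side.
NEIGHBOURS_SQL = """
WITH ordered AS (
    SELECT userId, `rank`, score,
           ROW_NUMBER() OVER (ORDER BY `rank`) AS rn
    FROM predictions
    WHERE gameId = ?
),
me AS (
    SELECT rn FROM ordered WHERE userId = ?
)
SELECT o.userId, o.`rank`, o.score, o.rn
FROM ordered AS o CROSS JOIN me
WHERE o.rn BETWEEN me.rn - ? AND me.rn + ?
ORDER BY o.rn
"""


def neighbours(conn, game_id, user_id, k=4):
    """Rows from k places above to k places below user_id (empty if absent)."""
    return conn.execute(NEIGHBOURS_SQL, (game_id, user_id, k, k)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE predictions (id INTEGER PRIMARY KEY, userId INTEGER, "
    "score INTEGER, `rank` INTEGER, gameId INTEGER)"
)
conn.executemany("INSERT INTO predictions VALUES (?, ?, ?, ?, ?)", PREDICTIONS)
```

With the sample data, game 18 has only four rows, so `neighbours(conn, 18, 1)` returns all of them; `k=1` narrows it to the users directly above and below.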
In older versions (your case), a derived table such as generated_ordered_table cannot be referenced again in a subquery inside the WHERE clause. One option is to repeat the whole derived-table query inside that subquery, but that is relatively inefficient.
Another approach is to use a temporary table: first store the ranks in it, then query it to get the results:
CREATE TEMPORARY TABLE IF NOT EXISTS gen_rank_tbl AS
(SELECT l.userId,
l.rank,
l.score,
l.createdAt,
@curRow := @curRow + 1 AS row_number
FROM (SELECT * FROM `predictions` WHERE gameId = 18) l
JOIN (SELECT @curRow := 0) r
ORDER BY rank ASC);
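Now you can use this temp table to get the desired results. MySQL does not allow a TEMPORARY table to be referenced more than once in the same query (it fails with "Can't reopen table"), so look up the user's row number first:
SELECT row_number INTO @user_rn
FROM gen_rank_tbl
WHERE userId = 1;
SELECT *
FROM gen_rank_tbl
WHERE row_number < @user_rn
ORDER BY row_number DESC
LIMIT 0,5;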
You could use a bunch of unions
select userid,rank,'eq'
from t where gameid = 18 and userid = 1
union
(
select userid,rank,'lt'
from t
where gameid = 18 and rank < (select rank from t t1 where t1.userid = 1 and t1.gameid = t.gameid)
order by rank desc limit 4
)
union
(
select userid,rank,'gt'
from t
where gameid = 18 and rank > (select rank from t t1 where t1.userid = 1 and t1.gameid = t.gameid)
order by rank asc limit 4
);
+--------+------+----+
| userid | rank | eq |
+--------+------+----+
| 1 | 4 | eq |
| 4 | 2 | lt |
| 12 | 1 | lt |
| 3 | 6 | gt |
+--------+------+----+
4 rows in set (0.04 sec)
But it's not pretty
You can use two derived tables:
SELECT p.*,
       (@user_curRow := CASE WHEN userId = @x THEN rn ELSE @user_curRow END) as user_rn
FROM (SELECT p.*, @curRow := @curRow + 1 AS rn
      FROM (SELECT p.*
            FROM predictions p
            WHERE p.gameId = 18
            ORDER BY rank ASC
           ) p CROSS JOIN
           (SELECT @curRow := 0, @user_curRow := -1, @x := 1) params -- @x: the userId to look up
     ) p
HAVING rn BETWEEN @user_curRow - 4 AND @user_curRow + 4;
If I have a table with the following columns and values, ordered by parent_id:
id parent_id line_no
-- --------- -------
1 2
2 2
3 2
4 3
5 4
6 4
And I want to populate line_no with a sequential number that starts over at 1 every time the value of parent_id changes:
id parent_id line_no
-- --------- -------
1 2 1
2 2 2
3 2 3
4 3 1
5 4 1
6 4 2
What would the query or sproc look like?
NOTE: I should point out that I only need to do this once. There's a new function in my PHP code that automatically creates the line_no every time a new record is added. I just need to update the records that already exist.
MySQL versions before 8.0 do not support row_number(). So, you can do this using variables. But you have to be very careful: MySQL does not guarantee the order of evaluation of expressions in the select list, so a variable should not be assigned and referenced in different expressions.
So:
select t.*,
       (@rn := if(@p = parent_id, @rn + 1,
                  if(@p := parent_id, 1, 1)
                 )
       ) as line_no
from (select t.* from t order by parent_id, id) t cross join
     (select @p := 0, @rn := 0) params;
In older versions the subquery that sorts the table may not be necessary; somewhere around version 5.7 it became necessary when using variables.
EDIT:
Updating with variables is fun. In this case, I would just use subqueries with the above:
update t join
       (select t.*,
               (@rn := if(@p = parent_id, @rn + 1,
                          if(@p := parent_id, 1, 1)
                         )
               ) as new_line_no
        from (select t.* from t order by parent_id, id) t cross join
             (select @p := 0, @rn := 0) params
       ) tt
       on t.id = tt.id
set t.line_no = tt.new_line_no;
Or, a little more old school...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id SERIAL PRIMARY KEY
,parent_id INT NOT NULL
);
INSERT INTO my_table VALUES
(1, 2),
(2, 2),
(3, 2),
(4, 3),
(5, 4),
(6, 4);
SELECT x.*
, CASE WHEN @prev = parent_id THEN @i := @i+1 ELSE @i := 1 END i
, @prev := parent_id prev
FROM my_table x
, (SELECT @prev:=null,@i:=0) vars
ORDER
BY parent_id,id;
+----+-----------+------+------+
| id | parent_id | i | prev |
+----+-----------+------+------+
| 1 | 2 | 1 | 2 |
| 2 | 2 | 2 | 2 |
| 3 | 2 | 3 | 2 |
| 4 | 3 | 1 | 3 |
| 5 | 4 | 1 | 4 |
| 6 | 4 | 2 | 4 |
+----+-----------+------+------+
If row_number() isn't available, you can use a correlated subquery instead (`table` is a reserved word, so the table is called t here):
select t.*,
       (select count(*)
        from t t1
        where t1.parent_id = t.parent_id and t1.id <= t.id
       ) as line_no
from t;
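On MySQL 8.0 or later (an assumption; the asker's version is not stated), ROW_NUMBER() with PARTITION BY parent_id replaces the variable logic entirely. The sketch below runs that query in SQLite via Python's sqlite3 module (same window syntax) and then does the one-time backfill with one UPDATE per row from application code, a hypothetical stand-in for the asker's PHP. In MySQL 8.0 the windowed SELECT should also work as the derived table in the UPDATE ... JOIN form from the variable-based answer.

```python
import sqlite3

# Sample rows from the question: (id, parent_id); line_no starts out empty.
ROWS = [(1, 2), (2, 2), (3, 2), (4, 3), (5, 4), (6, 4)]

# Numbering restarts at 1 for each parent_id, ordered by id within the group.
LINE_NO_SQL = """
SELECT id, parent_id,
       ROW_NUMBER() OVER (PARTITION BY parent_id ORDER BY id) AS line_no
FROM my_table
ORDER BY parent_id, id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table "
    "(id INTEGER PRIMARY KEY, parent_id INTEGER NOT NULL, line_no INTEGER)"
)
conn.executemany("INSERT INTO my_table (id, parent_id) VALUES (?, ?)", ROWS)
numbered = conn.execute(LINE_NO_SQL).fetchall()

# One-time backfill: write each computed number back to its row.
conn.executemany(
    "UPDATE my_table SET line_no = ? WHERE id = ?",
    [(line_no, row_id) for row_id, _, line_no in numbered],
)
conn.commit()
```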
I have a table like this:
// requests
+----+----------+-------------+
| id | id_user | unix_time |
+----+----------+-------------+
| 1 | 2353 | 1339412843 |
| 2 | 2353 | 1339412864 |
| 3 | 5462 | 1339412894 |
| 4 | 3422 | 1339412899 |
| 5 | 3422 | 1339412906 |
| 6 | 2353 | 1339412906 |
| 7 | 7785 | 1339412951 |
| 8 | 2353 | 1339413640 |
| 9 | 5462 | 1339413621 |
| 10 | 5462 | 1339414490 |
| 11 | 2353 | 1339414923 |
| 12 | 2353 | 1339419901 |
| 13 | 8007 | 1339424860 |
| 14 | 7785 | 1339424822 |
| 15 | 2353 | 1339424902 |
+----+----------+-------------+
I want to group the unix_time column by day. I'm trying to do this for a specific user.
I need two numbers for a user:
the number of days on which the user has at least one row in the requests table
the length of the longest run of consecutive days
How can I do that?
I can use WHERE id_user = :id to select the user's rows, count the days with SUM(), and get the longest consecutive range with MAX(). I just need to group those Unix times by day.
Please give it a try:
SELECT
t.id_user,
COUNT(*) totalVisits,
MAX(t.max_cons) maxCons
FROM
(SELECT
id_user,
@lastUnixTime AS lastUnixTimeOfuser,
IF(@uid <> id_user, @currentMax := 1 , @currentMax),
IF(@uid <> id_user, @lastUnixTime := 0, @lastUnixTime := @lastUnixTimeOfLastRecord),
IF(@uid = id_user,
IF((@lastUnixTime + 86400) >= utime, @currentMax := @currentMax + 1, @currentMax := 1), @lastUnixTime := 0),
IF(@currentMax > @max, @max := @currentMax, @max ),
IF(@uid <> id_user , @max := 1 ,@max),
@uid := id_user,
@lastUnixTimeOfLastRecord := utime,
@max AS max_cons
FROM
(
SELECT
id_user,
(unix_time DIV 86400) * 86400 AS utime
FROM requests
GROUP BY id_user, utime ) dayWiseRequestTable ,
(
SELECT
@uid := 0,
@currentMax := 0,
@max := 0,
@lastUnixTime := 0,
@lastUnixTimeOfLastRecord := 0
) vars
ORDER BY id_user, utime) t
GROUP BY t.id_user;
Output:
id_user Total_Visits Maximum_Consecutive_Visits
2353 7 2
3422 2 2
5462 3 2
7785 2 1
8007 1 1
EDIT:
In order to get output for a specific user you need to add a WHERE clause in the inner query.
You can extract the day using from_unixtime(). Then you can count consecutive days using variables:
select id_user, d,
       (@rn := if(@di = concat_ws(':', d - interval 1 day, id_user),
                  if(@di := concat_ws(':', d, id_user), @rn + 1, @rn + 1),
                  if(@di := concat_ws(':', d, id_user), 1, 1)
                 )
       ) as rn
from (select id_user, date(from_unixtime(unix_time)) as d
      from requests
      group by id_user, d
     ) dd cross join
     (select @di := '', @rn := 0) params
order by id_user, d;
From here to the summary is just an aggregation:
select id_user, count(*) as numdays, max(rn) as maxconsecutive
from (select id_user, d,
             (@rn := if(@di = concat_ws(':', d - interval 1 day, id_user),
                        if(@di := concat_ws(':', d, id_user), @rn + 1, @rn + 1),
                        if(@di := concat_ws(':', d, id_user), 1, 1)
                       )
             ) as rn
      from (select id_user, date(from_unixtime(unix_time)) as d
            from requests
            group by id_user, d
           ) dd cross join
           (select @di := '', @rn := 0) params
      order by id_user, d
     ) ud
group by id_user;
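The core of both answers is a gaps-and-islands computation: reduce each timestamp to a day, deduplicate, and measure runs of days that differ by exactly one. The Python sketch below (the function names `day_stats` and `stats_per_user` are hypothetical) uses UTC days via integer division by 86400, like the first answer; FROM_UNIXTIME() in the second answer uses the session time zone, so day boundaries can differ. It can be handy for checking query output on a small dataset. Note that every timestamp in the sample requests table falls within a few hours of 2012-06-11 UTC, so each user there has one day and a longest run of 1; made-up timestamps are needed to exercise multi-day runs.

```python
SECONDS_PER_DAY = 86400


def day_stats(unix_times):
    """Return (distinct_days, longest_run_of_consecutive_days) for one user.

    Days are UTC calendar days (unix_time DIV 86400).
    """
    days = sorted({t // SECONDS_PER_DAY for t in unix_times})
    if not days:
        return 0, 0
    longest = run = 1
    for prev, cur in zip(days, days[1:]):
        # A run continues only if this day directly follows the previous one.
        run = run + 1 if cur == prev + 1 else 1
        longest = max(longest, run)
    return len(days), longest


def stats_per_user(requests):
    """requests: iterable of (id_user, unix_time) pairs -> {id_user: stats}."""
    by_user = {}
    for user, t in requests:
        by_user.setdefault(user, []).append(t)
    return {user: day_stats(times) for user, times in by_user.items()}
```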
I have a table with columns gr_no, year_dob, family_id, etc.
I am trying to rank birth years within each family_id, but I can't generate the siblings_rank result.
+-------+----------+-----------+---------------+
| gr_no | year_dob | family_id | siblings_rank |
+-------+----------+-----------+---------------+
| 1001  | 1992     | 95        | 1             |
| 10234 | 1995     | 95        | 2             |
| 10236 | 2004     | 96        | 1             |
| 15568 | 2006     | 96        | 2             |
| 1225  | 2004     | 92        | 1             |
+-------+----------+-----------+---------------+
This query works:
SET @prev := null;
SET @cnt := 1;
SELECT gr_no, gs_id, gf_id, year_dob, IF(@prev <> gf_id, @cnt := 1, @cnt := @cnt + 1) AS siblings_position, @prev := gf_id as previous_gf_id
FROM student_registered
ORDER BY gf_id, year_dob asc
This query also works:
SELECT gr_no, gs_id, gf_id, year_dob, IF(@prev <> gf_id, @cnt := 1, @cnt := @cnt + 1) AS siblings_position, @prev := gf_id as previous_gf_id
FROM student_registered
JOIN (SELECT @prev := null) p
JOIN (SELECT @cnt := 1) c
ORDER BY gf_id, year_dob asc
But I am unable to create a view with these queries.
Alternatively, can a procedure update the siblings_position column of student_registered based on these queries?
You can't use user-defined variables in a view.
Here is another way to get the same result, using a correlated subquery:
SELECT gr_no, family_id,year_dob,
( select count(*) from Table1 T1
where T1.family_id = T.family_id
and T1.year_dob <= T.year_dob) as siblings_position
FROM Table1 T
ORDER BY family_id, year_dob asc
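To illustrate the view-friendly approach, the sketch below stores the correlated subquery as a view named `sibling_ranks` (a hypothetical name) and queries it, using Python's sqlite3 module so it runs without a MySQL server; the CREATE VIEW statement itself uses only plain SQL that MySQL also accepts. Siblings who share a year_dob both receive the higher position, because the count includes every sibling born in the same year or earlier.

```python
import sqlite3

# Sample rows from the question: (gr_no, year_dob, family_id).
STUDENTS = [
    (1001, 1992, 95),
    (10234, 1995, 95),
    (10236, 2004, 96),
    (15568, 2006, 96),
    (1225, 2004, 92),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Table1 (gr_no INTEGER PRIMARY KEY, year_dob INTEGER, family_id INTEGER)"
)
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", STUDENTS)

# No user variables, so the ranking query is allowed inside a view.
conn.execute("""
CREATE VIEW sibling_ranks AS
SELECT gr_no, family_id, year_dob,
       (SELECT COUNT(*) FROM Table1 AS T1
        WHERE T1.family_id = T.family_id
          AND T1.year_dob <= T.year_dob) AS siblings_position
FROM Table1 AS T
""")

rows = conn.execute(
    "SELECT gr_no, siblings_position FROM sibling_ranks ORDER BY family_id, year_dob"
).fetchall()
```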
I have a user_points table with two columns, user_values and userid.
Based on the count of rows per userid, I want to assign a rank to each user. I have written this query:
SELECT count_temp.* , @curRank:=(@curRank + 1) AS rank
FROM (
SELECT userid, COUNT(*) AS totalcount FROM user_points t GROUP BY t.userid
) AS count_temp
, (SELECT @curRank := 0) r
ORDER BY totalcount DESC;
It gives this result:
userid | totalcount | rank
2 6 1
3 2 2
4 2 3
1 1 4
But I want userids 3 and 4 to both get rank 2, because their totalcount values are the same.
To emulate the RANK() function, which returns the rank of each row within a partition of a result set, you can do:
SELECT userid, totalcount, rank
FROM
(
SELECT userid, totalcount,
@n := @n + 1, @r := IF(@c = totalcount, @r, @n) rank, @c := totalcount
FROM
(
SELECT userid, COUNT(*) AS totalcount
FROM user_points t
GROUP BY t.userid
ORDER BY totalcount DESC
) t CROSS JOIN
(
SELECT @r := 0, @n := 0, @c := NULL
) i
) q;
Output:
| USERID | TOTALCOUNT | RANK |
|--------|------------|------|
| 2 | 6 | 1 |
| 3 | 2 | 2 |
| 4 | 2 | 2 |
| 1 | 1 | 4 |
To emulate the DENSE_RANK() function, which ranks rows within a partition of a result set without any gaps in the ranking, you can do:
SELECT userid, totalcount, rank
FROM
(
SELECT userid, totalcount,
@r := IF(@c = totalcount, @r, @r + 1) rank, @c := totalcount
FROM
(
SELECT userid, COUNT(*) AS totalcount
FROM user_points t
GROUP BY t.userid
ORDER BY totalcount DESC
) t CROSS JOIN
(
SELECT @r := 0, @c := NULL
) i
) q;
Output:
| USERID | TOTALCOUNT | RANK |
|--------|------------|------|
| 2 | 6 | 1 |
| 3 | 2 | 2 |
| 4 | 2 | 2 |
| 1 | 1 | 3 |
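On MySQL 8.0 or later (an assumption; the answers above target older versions), RANK() and DENSE_RANK() are built in, so no variables are needed. The sketch below checks both against the outputs above using Python's sqlite3 module (SQLite 3.25+ has the same functions). The user_points rows are made up only to reproduce the per-user counts from the question, and the aliases rnk and dense_rnk avoid `rank`, which is a reserved word in MySQL 8.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_points (user_values INTEGER, userid INTEGER)")

# Hypothetical rows chosen so that userid 2 has 6 rows, 3 and 4 have 2, 1 has 1.
counts = {2: 6, 3: 2, 4: 2, 1: 1}
conn.executemany(
    "INSERT INTO user_points VALUES (?, ?)",
    [(n, uid) for uid, c in counts.items() for n in range(c)],
)

# Aggregate first in a derived table, then rank the aggregated rows.
RANK_SQL = """
SELECT userid, totalcount,
       RANK()       OVER (ORDER BY totalcount DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY totalcount DESC) AS dense_rnk
FROM (SELECT userid, COUNT(*) AS totalcount
      FROM user_points
      GROUP BY userid) AS t
ORDER BY totalcount DESC, userid
"""
rows = conn.execute(RANK_SQL).fetchall()
```

RANK() leaves a gap after the tie (1, 2, 2, 4), while DENSE_RANK() does not (1, 2, 2, 3), matching the two variable-based emulations.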