getting the ranking of the rows in mysql ORDER BY statements

Suppose I have
SELECT * FROM t ORDER BY j
Is there a way to make the query also return an auto-incrementing column that gives the rank of each row in that ordering?
This column should also work with ranged LIMITs, e.g.
SELECT * FROM t ORDER BY j LIMIT 10,20
should return 11, 12, 13, 14, etc. in that column.

Oracle, MSSQL, etc. support ranking functions that do exactly what you want. Unfortunately, MySQL has some catching up to do in this regard.
The closest I've ever been able to get to approximating ROW_NUMBER() OVER() in MySQL is like this:
SELECT t.*,
       @rank := @rank + 1 AS `rank`
FROM t, (SELECT @rank := 0) r
ORDER BY j
I don't know how that would behave with a ranged LIMIT unless you wrapped it in a subquery (although performance may suffer with large datasets):
SELECT t2.*
FROM (
    SELECT t.*,
           @rank := @rank + 1 AS `rank`
    FROM t, (SELECT @rank := 0) r
    ORDER BY j
) t2
LIMIT 10,20
The other option would be to create a temporary table:
CREATE TEMPORARY TABLE myRank
(
    `rank` INT(11) NOT NULL AUTO_INCREMENT,
    `id` INT(11) NOT NULL,
    PRIMARY KEY (`rank`)
);
INSERT INTO myRank (id)
SELECT t.id
FROM t
ORDER BY j;
SELECT t.*, R.`rank`
FROM t
INNER JOIN myRank R
    ON t.id = R.id
ORDER BY R.`rank`
LIMIT 10,20;
Of course, the temporary table would need to be persisted between calls.
I wish there was a better way, but without ROW_NUMBER() you must resort to some hackery to get the behavior you want.
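For readers on MySQL 8.0 or later, which added window functions, the ranking can be written with ROW_NUMBER() directly. The following is only a minimal sketch, not the method from the answer above: it uses Python's sqlite3 module with an in-memory table so that it runs without a MySQL server, the 40 sample rows are invented, and it assumes an SQLite build with window-function support (3.25 or newer). The alias rn is used instead of rank because RANK is a reserved word in MySQL 8.0, and LIMIT 20 OFFSET 10 is the spelled-out form of LIMIT 10,20.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, j INTEGER)")
# Invented sample data: 40 rows with distinct j values, inserted out of order.
conn.executemany("INSERT INTO t (j) VALUES (?)", [((i * 7) % 40,) for i in range(40)])

# The row number is computed over the whole ordered result before LIMIT
# is applied, so the page starts at 11 rather than 1.
RANKED_PAGE = """
SELECT id, j, ROW_NUMBER() OVER (ORDER BY j) AS rn
FROM t
ORDER BY rn
LIMIT 20 OFFSET 10
"""
page = conn.execute(RANKED_PAGE).fetchall()
print(page[0], page[-1])
```

Because the window function is evaluated before LIMIT, no temporary table or wrapping subquery is needed for the paging case.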

Related

how to random select row in one table according to each row in other table?

There are two tables: crash and traffic_flow.
The crash table has the attributes crash_date, time, and the corresponding detector ID.
The traffic_flow table, recorded by detectors, has the attributes date, time, detector_ID, an auto-incrementing id, and traffic flow parameters.
I want to randomly select 10 rows in traffic_flow for each row in crash and insert them into a new table.
The following is my attempt:
select traffic_flow.id
from traffic_flow,crash
where traffic_flow.date=crash.date and traffic_flow.ID=crash.ID
order by rand()
limit 10;
But this SQL statement selects 10 rows in total across all crash records, not 10 for each row in crash, so it doesn't meet my requirement. Could you please modify the statement for me?

In MySQL, the simplest method is to use variables to enumerate the rows for each crash:
select *
from (select ct.*,
             (@rn := if(@cid = id, @rn + 1,
                        if(@cid := id, 1, 1)
                       )
             ) as rn
      from (select c.*, tf.id as tf_id
            from traffic_flow tf join
                 crash c
                 on tf.date = c.date and tf.ID = c.ID
            order by c.id, rand()
           ) ct cross join
           (select @cid := -1, @rn := 0) params
     ) ct
where rn <= 10;
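On MySQL 8.0 or later the same per-crash enumeration can be expressed with ROW_NUMBER() and PARTITION BY instead of variables. The sketch below is an illustration under assumptions, not the answer's method: it runs against an in-memory SQLite database via Python's sqlite3 (window functions need SQLite 3.25+), the column names date and detector_id and all sample rows are hypothetical, and SQLite's RANDOM() stands in for MySQL's RAND().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crash (id INTEGER PRIMARY KEY, date TEXT, detector_id INTEGER);
CREATE TABLE traffic_flow (id INTEGER PRIMARY KEY, date TEXT, detector_id INTEGER);
""")
conn.executemany("INSERT INTO crash VALUES (?, ?, ?)",
                 [(1, "2018-04-01", 100), (2, "2018-04-02", 200)])
# Hypothetical data: crash 1 matches 25 flow rows, crash 2 matches only 4.
flows = [("2018-04-01", 100)] * 25 + [("2018-04-02", 200)] * 4
conn.executemany("INSERT INTO traffic_flow (date, detector_id) VALUES (?, ?)", flows)

# Number the matching rows of each crash in random order, then keep the first 10.
SAMPLE = """
SELECT crash_id, tf_id
FROM (SELECT c.id AS crash_id, tf.id AS tf_id,
             ROW_NUMBER() OVER (PARTITION BY c.id ORDER BY RANDOM()) AS rn
      FROM crash c
      JOIN traffic_flow tf
        ON tf.date = c.date AND tf.detector_id = c.detector_id) ranked
WHERE rn <= 10
"""
per_crash = {}
for crash_id, tf_id in conn.execute(SAMPLE):
    per_crash.setdefault(crash_id, []).append(tf_id)
print({k: len(v) for k, v in per_crash.items()})
```

A crash with fewer than 10 matching rows simply contributes all of them.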

Mysql - Accumulatively count the total on a row by row basis

I'm trying in MySQL to count the number of users created each day and then get a cumulative figure on a row-by-row basis. I have followed other suggestions on here, but I cannot seem to get the accumulation to be correct.
The problem is that it keeps counting from the base number of 200 and does not take the previous rows into account, whereas I would expect a running total.
My SQL is as follows:
SELECT day(created_at), count(*), (@something := @something + count(*)) as value
FROM myTable
CROSS JOIN (SELECT @something := 200) r
GROUP BY day(created_at);
To create the table and populate it you can use:
CREATE TABLE myTable (
id INT AUTO_INCREMENT,
created_at DATETIME,
PRIMARY KEY (id)
);
INSERT INTO myTable (created_at)
VALUES ('2018-04-01'),
('2018-04-01'),
('2018-04-01'),
('2018-04-01'),
('2018-04-02'),
('2018-04-02'),
('2018-04-02'),
('2018-04-03'),
('2018-04-03');
You can view this on SqlFiddle.

Use a subquery:
SELECT day, cnt, (@s := @s + cnt)
FROM (SELECT day(created_at) as day, count(*) as cnt
      FROM myTable
      GROUP BY day(created_at)
     ) d CROSS JOIN
     (SELECT @s := 0) r;
GROUP BY and variables have not worked together for a long time. In more recent versions, ORDER BY also needs a subquery.
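Where window functions are available (MySQL 8.0 and later), a running total can be written as SUM(...) OVER (ORDER BY ...) over the grouped subquery, which avoids variables altogether. This is a hedged sketch rather than the answer's approach: it uses Python's sqlite3 with the sample rows from the question, assumes SQLite 3.25+, keeps the question's base of 200, and uses strftime('%d', ...) where MySQL would use day(created_at).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, created_at TEXT)")
# Same sample rows as in the question: 4, 3 and 2 users on three days.
dates = ["2018-04-01"] * 4 + ["2018-04-02"] * 3 + ["2018-04-03"] * 2
conn.executemany("INSERT INTO myTable (created_at) VALUES (?)", [(d,) for d in dates])

# Group first, then accumulate the per-day counts in day order.
RUNNING_TOTAL = """
SELECT day, cnt, 200 + SUM(cnt) OVER (ORDER BY day) AS running_total
FROM (SELECT CAST(strftime('%d', created_at) AS INTEGER) AS day,
             COUNT(*) AS cnt
      FROM myTable
      GROUP BY strftime('%d', created_at)) d
ORDER BY day
"""
running = conn.execute(RUNNING_TOTAL).fetchall()
print(running)
```

As in the answer, the grouping happens in a subquery and the accumulation is applied to its result.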

SQL find rows where value is not increasing

I have a table with columns like this:
id | timestamp | ...
and I am looking for rows where the timestamp decreased since the previous row.
I tried a statement like this:
SELECT count(a.id)
FROM tbl AS a INNER JOIN tbl AS b ON a.id+1=b.id
WHERE a.timestamp<b.timestamp;
but it appears not to have worked. I get zero results even though I expect some. Any suggestions what is wrong?
I would also appreciate any ideas on a better way to write this query.
I am using MySQL.

You can get the previous value using a correlated subquery, and then use that for the comparison:
select t.*
from (select t.*,
             (select t2.timestamp from tbl t2 where t2.id < t.id order by t2.id desc limit 1
             ) as prevts
      from tbl t
     ) t
where timestamp < prevts;
The problem with your query is probably that the ids have gaps in them.
EDIT:
You can do this with variables. The challenge is getting the variable comparison and assignment into a single expression. This is needed because MySQL does not guarantee the order of evaluation of expressions in a select statement.
The following computes IsDecreasing and updates @prev within the same expression:
select t.*
from (select t.*,
             if(@prev > timestamp, if(@prev := timestamp, 1, 1),
                if(@prev := timestamp, 0, 0)
               ) IsDecreasing
      from tbl t cross join
           (select @prev := -1) vars
      order by id
     ) t
where IsDecreasing = 1;
This should be faster than the previous method -- probably even when you have the right index.
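On MySQL 8.0 or later, the previous row's value can also be fetched with the LAG() window function, which copes with gaps in the ids in the same way as the correlated subquery. The snippet below is a sketch under assumptions: it runs in SQLite through Python's sqlite3 (3.25+ needed for window functions), and the five sample rows, including the id gaps, are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, timestamp INTEGER)")
# Invented rows with gaps in id; the timestamp drops at id 5 and at id 9.
conn.executemany("INSERT INTO tbl VALUES (?, ?)",
                 [(1, 10), (2, 20), (5, 15), (6, 30), (9, 25)])

# LAG() returns the timestamp of the preceding row in id order (NULL for the
# first row, which the comparison in WHERE then filters out).
DECREASES = """
SELECT id, timestamp, prev_ts
FROM (SELECT id, timestamp,
             LAG(timestamp) OVER (ORDER BY id) AS prev_ts
      FROM tbl) x
WHERE timestamp < prev_ts
ORDER BY id
"""
decreasing = conn.execute(DECREASES).fetchall()
print(decreasing)
```

The asker's a.id+1=b.id join would miss both of these rows, because neither has an immediate predecessor id.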

What are the subquery equivalents of SQL aggregate functions MAX/MIN/AVG/COUNT

Can someone show me how to represent the following SQL statements without the use of aggregate functions?
SELECT COUNT(column) FROM table;
SELECT AVG(column) FROM table;
SELECT MAX(column) FROM table;
SELECT MIN(column) FROM table;

MIN() and MAX() can be done with simple subqueries:
select (select column from table order by column is not null desc, column asc limit 1) as "MIN",
(select column from table order by column is not null desc, column desc limit 1) as "MAX"
COUNT() and AVG() require the use of variables, if you don't allow any aggregations:
select rn as "COUNT", sumcol / rna as "AVG"
from (select t.*
      from (select t.*,
                   (@rn := @rn + 1) as rn,
                   (@rna := @rna + if(column is not null, 1, 0)) as rna,
                   (@sum := @sum + coalesce(column, 0)) as sumcol
            from table t cross join
                 (select @rn := 0, @rna := 0, @sum := 0) const
            order by column
           ) t
      order by rn desc
      limit 1
     ) t
This latter formulation only works in MySQL.
EDIT:
The empty table is a challenge. Let's do this with a left outer join:
select cast(coalesce(rn, 0) as signed) as "COUNT",
       (case when rna > 0 then sumcol / rna end) as "AVG"
from (select 1 as n
     ) n left outer join
     (select t.*
      from (select t.*,
                   (@rn := @rn + 1) as rn,
                   (@rna := @rna + if(column is not null, 1, 0)) as rna,
                   (@sum := @sum + coalesce(column, 0)) as sumcol
            from table t cross join
                 (select @rn := 0, @rna := 0, @sum := 0) const
            order by column
           ) t
      order by rn desc
      limit 1
     ) t
     on n.n = 1;
Notes. This will return 0 for the count if the table is empty. That is correct. If the table is empty, it will return NULL for the average, and that is also correct.
If the table is not empty, but the values are all NULL, then it will also return NULL. The types for the count are always integers, so that should be ok. The type of the average is more problematic, but the variables will return some sort of generic numeric type, which seems compatible in spirit.
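The ORDER BY ... LIMIT 1 formulation for MIN() and MAX() at the top of this answer can be checked against the built-in aggregates. The following is a small sketch, assuming an in-memory SQLite database via Python's sqlite3 rather than MySQL; the names the_table and col are hypothetical stand-ins for the placeholders table and column, and the helper compare is introduced only for this illustration.

```python
import sqlite3

# Non-NULL values sort first ("col IS NOT NULL DESC"), then by value.
MIN_MAX = """
SELECT (SELECT col FROM the_table ORDER BY col IS NOT NULL DESC, col ASC LIMIT 1),
       (SELECT col FROM the_table ORDER BY col IS NOT NULL DESC, col DESC LIMIT 1)
"""
BUILT_IN = "SELECT MIN(col), MAX(col) FROM the_table"

def compare(values):
    """Return (emulated, built_in) MIN/MAX pairs for a table holding `values`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE the_table (col INTEGER)")
    conn.executemany("INSERT INTO the_table VALUES (?)", [(v,) for v in values])
    return conn.execute(MIN_MAX).fetchone(), conn.execute(BUILT_IN).fetchone()

# Mixed values, all-NULL values, and an empty table.
for sample in ([3, None, 7, 1], [None, None], []):
    emulated, built_in = compare(sample)
    print(sample, emulated, built_in)
```

In each of the three cases the scalar subqueries and the aggregates return the same pair, including NULL for the all-NULL and empty tables.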

MIN and MAX can be replaced with something like this:
select t1.pk_column,
       t1.some_column
from the_table t1
where t1.some_column < ALL (select t2.some_column
                            from the_table t2
                            where t2.pk_column <> t1.pk_column);
To get the max, replace < with >. pk_column is the primary key column of the table and is needed to avoid comparing each row to itself (it doesn't have to be a PK; it only needs to be unique).
I don't think there is an alternative for count() or avg() (at least I can't think of one).
I used some_column and the_table because column and table are reserved words.

SET @t1=0, @t2=0, @t3=0, @t4=0;
COUNT:
Select @t1:=@t1+1 as CNT from table
order by @t1:=@t1+1 DESC
LIMIT 1
Similar methods could be put together for Avg and max/min using limits...
Still thinking about Min/Max...

Not to supersede the excellent answer from Gordon Linoff, but there's a little more work involved to accurately emulate the AVG(), COUNT(), and SUM() functions. (The handling of the MIN and MAX functions in Gordon's answer is spot on.)
There's a corner case when the table is empty. In order to emulate the SQL aggregate functions, we need our query to return a single row. But at the same time, we need a test of whether or not the table contains at least one row.
Here's a query that is a more precise emulation:
-- create an empty table
CREATE TABLE `foo` (col INT);
-- TRUNCATE TABLE `foo`;
SELECT IF(s.ne IS NULL,0,s.rn) AS `COUNT(*)`
     , IF(s.cc>0,s.tc,NULL) AS `SUM(col)`
     , IF(s.cc>0,s.tc/s.cc,NULL) AS `AVG(col)`
FROM ( SELECT v.rn
            , v.cc
            , v.tc
            , e.ne
       FROM ( SELECT @rn := @rn + 1 AS rn
                   , @cc := @cc + (t.col IS NOT NULL) AS cc
                   , @tc := @tc + IFNULL(t.col,0) AS tc
              FROM (SELECT @rn := 0, @cc := 0, @tc := 0) c
              LEFT
              JOIN `foo` t
                ON 1=1
            ) v
       LEFT
       JOIN (SELECT 1 AS ne FROM `foo` z LIMIT 1) e
         ON 1=1
       ORDER BY v.rn DESC
       LIMIT 1
     ) s
NOTES:
The purpose of the inline view aliased as e is to give us a way to determine whether or not the table contains any rows. If the table contains at least one row, we'll get a value of 1 returned as column ne (not empty). If the table is empty, that query won't return a row, and e.ne will be NULL, which is something we can test in the outer query.
In order to return a row, so we can return a value, like a 0 for a COUNT, we need to ensure that we return at least one row from the inline view v. Since we are guaranteed exactly one row from the inline view aliased as c (which initializes our user-defined variables), we'll use that as the "driving" table for a LEFT [OUTER] JOIN operation.
But if the table is empty, our row counter (@rn) coming out of v is going to have a value of 1. We deal with that by checking e.ne to know whether the count should really be returned as 0.
In order to calculate the average, we can't divide by the row counter; we have to divide by the number of rows where col was not null. We use the @cc user-defined variable to keep track of the count of those rows.
Similarly, for the SUM (and the average) we need to accumulate only the non-NULL values. (If we were to add a NULL, it would turn the whole total to NULL, basically wiping out our accumulation.) So we do a conditional test to check whether t.col IS NULL, to avoid accidentally wiping out the accumulation. Our accumulator is going to be 0 if there aren't any rows that are not null, but that's not a problem, because we check @cc to see whether any rows were included. We need to check it anyway, to avoid a "divide by zero" issue.
To test, run against the empty table foo. It will return a count of 0, and NULL for SUM and AVG, equivalent to the result we get from:
SELECT COUNT(*), SUM(col), AVG(col) FROM foo;
We can also test the query against a table containing only NULL values for col:
INSERT INTO `foo` (col) VALUES (NULL);
As well as some non-NULL values:
INSERT INTO `foo` (col) VALUES (2),(3),(5),(7),(11),(13),(17),(19);
And compare the results of the two queries.
This is essentially the same as the answer from Gordon Linoff, with just a little more precision to work around the corner cases of NULL values and the empty table.
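The bookkeeping described in the notes (a row counter, a non-NULL counter, and a running total) can be sketched outside SQL as a single pass over the column values. The function below is a hypothetical Python illustration, with None standing in for NULL; it mirrors the roles of @rn, @cc and @tc but is not a translation of the query itself. In particular, the loop body never runs for an empty input, so the e.ne check has no counterpart here.

```python
def emulate_aggregates(values):
    """Return (COUNT(*), SUM(col), AVG(col)) for a list of column values."""
    rn = 0  # rows seen, like @rn
    cc = 0  # rows where the value is not NULL, like @cc
    tc = 0  # running total of the non-NULL values, like @tc
    for v in values:
        rn += 1
        if v is not None:
            cc += 1
            tc += v
    total = tc if cc > 0 else None         # SUM is NULL without non-NULL rows
    average = tc / cc if cc > 0 else None  # also avoids dividing by zero
    return rn, total, average

print(emulate_aggregates([]))
print(emulate_aggregates([None]))
print(emulate_aggregates([None, 2, 3, 5, 7, 11, 13, 17, 19]))
```

The three calls correspond to the three test states suggested above: the empty table, a table holding only a NULL, and a table with both NULL and non-NULL values.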

Getting latest rows in MySQL based on date (grouped by another column)

This type of question is asked every now and then. The queries provided work, but they hurt performance.
I have tried the JOIN method:
SELECT *
FROM nbk_tabl
INNER JOIN (
SELECT ITEM_NO, MAX(REF_DATE) as LDATE
FROM nbk_tabl
GROUP BY ITEM_NO) nbk2
ON nbk_tabl.REF_DATE = nbk2.LDATE
AND nbk_tabl.ITEM_NO = nbk2.ITEM_NO
And the tuple one (way slower):
SELECT *
FROM nbk_tabl
WHERE (ITEM_NO, REF_DATE) IN (
    SELECT ITEM_NO, MAX(REF_DATE)
    FROM nbk_tabl
    GROUP BY ITEM_NO
)
Is there any other performance friendly way of doing this?
EDIT: To be clear, I'm applying this to a table with thousands of rows.

Yes, there is a faster way.
select *
from nbk_tabl
order by ref_date desc
limit <n>
Where <n> is the number of rows that you want to return.
Hold on. I see you are trying to do this for a particular item. You might try this:
select *
from nbk_tabl n
where ref_date = (select max(ref_date) from nbk_tabl n2 where n.item_no = n2.item_no)
It might optimize better than the "in" version.

Also, in MySQL you can use user variables (assuming nbk_tabl.ITEM_NO <> 0):
select *
from (
    select nbk_tabl.*,
           @i := if(@ITEM_NO = ITEM_NO, @i + 1, 1) as row_num,
           @ITEM_NO := ITEM_NO as t_itemNo
    from nbk_tabl, (select @i := 0, @ITEM_NO := 0) t
    order by ITEM_NO, REF_DATE DESC
) as x where x.row_num = 1;
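With MySQL 8.0 or later, the row_num column above can be produced by ROW_NUMBER() with PARTITION BY, which does not depend on the evaluation order of variable assignments. The following is a minimal sketch, assuming an in-memory SQLite database via Python's sqlite3 (3.25+ for window functions); the note column and the six sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nbk_tabl (ITEM_NO INTEGER, REF_DATE TEXT, note TEXT)")
# Invented sample rows: three items with one to three dated entries each.
conn.executemany("INSERT INTO nbk_tabl VALUES (?, ?, ?)", [
    (1, "2018-01-05", "a"), (1, "2018-03-01", "b"), (1, "2018-02-10", "c"),
    (2, "2018-04-01", "d"), (2, "2018-01-01", "e"),
    (3, "2018-02-02", "f"),
])

# Number the rows of each item from newest to oldest and keep the newest.
LATEST = """
SELECT ITEM_NO, REF_DATE, note
FROM (SELECT n.*,
             ROW_NUMBER() OVER (PARTITION BY ITEM_NO
                                ORDER BY REF_DATE DESC) AS row_num
      FROM nbk_tabl n) x
WHERE row_num = 1
ORDER BY ITEM_NO
"""
latest = conn.execute(LATEST).fetchall()
print(latest)
```

Unlike the JOIN on MAX(REF_DATE), this returns exactly one row per item even when two rows share the latest date; which of the tied rows is kept is then arbitrary.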
