Auto incremental temporary column in conditional select statement - mysql

I am using the following select query to group and calculate a specific column:
SELECT `COUNTRY`, sum(`POINT`) as total_mark
FROM results
GROUP BY `COUNTRY`
ORDER by total_mark DESC
I would like to add a temporary column that numbers all rows incrementally. I used:
(@cnt := @cnt + 1) AS rowNumber
The new column gets created but returns a NULL value for all rows. Can anyone help me do the grouping and calculation first and then number the rows it returns? I tried this:
SELECT `COUNTRY`, sum(`POINT`) as total_mark, (@cnt := @cnt + 1) AS rowNumber
FROM results
GROUP BY `COUNTRY`
ORDER by total_mark DESC
I still only get NULL values.

You have to initialize @cnt to 0. An unset user variable is NULL, and NULL + 1 is still NULL.
You should also do the ordering in a subquery. Otherwise, the row numbers may be assigned before ordering.
SELECT country, total_mark, (@cnt := @cnt + 1) AS rowNumber
FROM (
SELECT `COUNTRY`, sum(`POINT`) as total_mark
FROM results
GROUP BY `COUNTRY`
ORDER by total_mark DESC
) AS x
CROSS JOIN (SELECT @cnt := 0) AS vars
Note that if you're using MySQL 8.x you can use the ROW_NUMBER() window function instead of a session variable.
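For reference, here is a minimal sketch of that window-function version, assuming MySQL 8.0 or later and the same results table from the question. It has not been run against the asker's data, so treat it as a starting point:

```sql
-- Number the grouped rows by descending total without a session variable.
-- ROW_NUMBER() is evaluated after GROUP BY, so it can order by the aggregate.
SELECT `COUNTRY`,
       SUM(`POINT`) AS total_mark,
       ROW_NUMBER() OVER (ORDER BY SUM(`POINT`) DESC) AS rowNumber
FROM results
GROUP BY `COUNTRY`
ORDER BY rowNumber;
```

Countries with equal totals get consecutive numbers in an unspecified order; RANK() or DENSE_RANK() could be swapped in if ties should share a number.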

Related

MySQL: user-variable definition within SQL statement to create counter column

Is it possible to create a counter in mysql/mariadb in one single SELECT statement? I've tried the following, but it returns only the value 1 in the first column:
SELECT @rownr := IF(ISNULL(@rownr),0,@rownr)+1 AS rowNumber, * FROM table_x LIMIT 0,10
If I run the statement more often in the same mysql-instance it starts counting from the last number. So the second time it starts at 2, the third time at 12. This means that the variable is created but seems to be only available for modification when it was instantiated before the SQL statement was issued.
It is possible, but a bit tricky. First, you need to declare the variable outside of the select clause (in a separate set assignment, or in a derived table). Also, it is safer to sort the rows in a subquery first, and then compute the variable.
I would recommend:
set @rn := 0;
select t.*, @rn := @rn + 1 rowNumber
from (select t.* from mytable t order by id limit 10) t
Note that I added an order by clause to the inner query, otherwise it is undefined in which sequence rows will be ordered (I assumed id).
Alternatively, you can declare the variable in a derived table:
select t.*, @rn := @rn + 1 rowNumber
from (select t.* from mytable t order by id limit 10) t
cross join (select @rn := 0) x
Finally: if you are running MySQL 8.0, just use row_number():
select t.*, row_number() over(order by id) rn
from mytable t
order by id
limit 10;
You don't have an order by, so the ordering is indeterminate. But you can initialize the parameter in the statement itself:
SELECT @rownr := (@rownr + 1) AS rowNumber, x.*
FROM table_x x CROSS JOIN
(SELECT @rownr := 0) params
LIMIT 0, 10;
If you want a particular ordering, you should use an order by in a subquery.
Also note that starting in MySQL 8, variable assignments in SELECT are deprecated. You should be using window functions (row_number()) in more recent versions.

Mysql - Accumulatively count the total on a row by row basis

I'm trying in MySql to count the number of users created each day and then get an accumulative figure on a row by row basis. I have followed other suggestions on here, but I cannot seem to get the accumulation to be correct.
The problem is that it keeps counting from the base number of 200 and not taking account of previous rows.
Whereas I would expect it to return
My Sql is as follows;
SELECT day(created_at), count(*), (@something := @something+count(*)) as value
FROM myTable
CROSS JOIN (SELECT @something := 200) r
GROUP BY day(created_at);
To create the table and populate it you can use;
CREATE TABLE myTable (
id INT AUTO_INCREMENT,
created_at DATETIME,
PRIMARY KEY (id)
);
INSERT INTO myTable (created_at)
VALUES ('2018-04-01'),
('2018-04-01'),
('2018-04-01'),
('2018-04-01'),
('2018-04-02'),
('2018-04-02'),
('2018-04-02'),
('2018-04-03'),
('2018-04-03');
You can view this on SqlFiddle.
Use a subquery:
SELECT day, cnt, (@s := @s + cnt)
FROM (SELECT day(created_at) as day, count(*) as cnt
FROM myTable
GROUP BY day(created_at)
) d CROSS JOIN
(SELECT @s := 0) r;
GROUP BY and variables have not worked together for a long time. In more recent versions, ORDER BY also needs a subquery.
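If MySQL 8.0 or later is available, a windowed SUM() avoids the variable entirely. This is a sketch under that assumption, using the myTable definition from the question; the alias running_total is only illustrative:

```sql
-- Aggregate per day first, then accumulate the daily counts in day order.
SELECT d.day,
       d.cnt,
       SUM(d.cnt) OVER (ORDER BY d.day) AS running_total
FROM (SELECT DAY(created_at) AS day, COUNT(*) AS cnt
      FROM myTable
      GROUP BY DAY(created_at)) AS d
ORDER BY d.day;
```

With the sample rows above, the daily counts are 4, 3 and 2, so running_total comes out as 4, 7 and 9. To start from the base of 200 used in the question, add 200 to the windowed sum.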

MySQL: Limiting result for WHERE IN list

Let's say there are millions of records in my_table.
Here is my query to extract rows with a specific name from list:
SELECT * FROM my_table WHERE Name IN ('name1','name2','name3','name4')
How do I limit the returned result per name1, name2, etc?
The following query would limit the whole result (to 100).
SELECT * FROM my_table WHERE Name IN ('name1','name2','name3','name4') LIMIT 100
I need to limit to 100 for each name.
This is a bit of a pain in MySQL, but the best method is probably variables:
select t.*
from (select t.*,
(@rn := if(@n = name, @rn + 1,
if(@n := name, 1, 1)
)
) as rn
from my_table t cross join
(select @n := '', @rn := 0) params
order by name
) t
where rn <= 100;
If you want to limit this to a subset of the names, then add the where clause to the subquery.
Note: If you want to pick certain rows -- such as the oldest or newest or biggest or tallest -- just add a second key to the order by in the subquery.
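On MySQL 8.0 or later, the same per-name limit can be sketched with ROW_NUMBER() and PARTITION BY instead of variables. This assumes the my_table and Name column from the question; the alias ranked is made up for the example:

```sql
-- Number the rows within each name, then keep the first 100 of each.
SELECT ranked.*
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY Name) AS rn
      FROM my_table t
      WHERE Name IN ('name1','name2','name3','name4')) AS ranked
WHERE ranked.rn <= 100;
```

Without an ORDER BY inside the OVER clause, which 100 rows survive for each name is arbitrary. Adding one, for example on a date column, picks the oldest or newest rows instead.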
Try
SELECT * FROM my_table WHERE Name IN ('name1','name2','name3','name4') FETCH FIRST 100 ROWS ONLY

What are the subquery equivalents of SQL aggregate functions MAX/MIN/AVG/COUNT

Can someone show me how to represent the following SQL statements without the use of aggregate functions?
SELECT COUNT(column) FROM table;
SELECT AVG(column) FROM table;
SELECT MAX(column) FROM table;
SELECT MIN(column) FROM table;
MIN() and MAX() can be done with simple subqueries:
select (select column from table order by column is not null desc, column asc limit 1) as "MIN",
(select column from table order by column is not null desc, column desc limit 1) as "MAX"
COUNT() and AVG() require the use of variables, if you don't allow any aggregations:
select rn as "COUNT", sumcol / rna as "AVG"
from (select t.*
from (select t.*,
(@rn := @rn + 1) as rn,
(@rna := @rna + if(column is not null, 1, 0)) as rna,
(@sum := @sum + coalesce(column, 0)) as sumcol
from table t cross join
(select @rn := 0, @rna := 0, @sum := 0) const
order by column
) t
order by rn desc
limit 1
) t
This latter formulation only works in MySQL.
EDIT:
The empty table is a challenge. Let's do this with a left outer join:
select cast(coalesce(rn, 0) as int) as "COUNT",
(case when rna > 0 then sumcol / rna end) as "AVG"
from (select 1 as n
) n left outer join
(select t.*
from (select t.*,
(@rn := @rn + 1) as rn,
(@rna := @rna + if(column is not null, 1, 0)) as rna,
(@sum := @sum + coalesce(column, 0)) as sumcol
from table t cross join
(select @rn := 0, @rna := 0, @sum := 0) const
order by column
) t
order by rn desc
limit 1
) t
on n.n = 1;
Notes. This will return 0 for the count if the table is empty. That is correct. If the table is empty, it will return NULL for the average, and that is also correct.
If the table is not empty, but the values are all NULL, then it will also return NULL. The types for the count are always integers, so that should be ok. The type of the average is more problematic, but the variables will return some sort of generic numeric type, which seems compatible in spirit.
min/max can be replaced with something like this:
select t1.pk_column,
t1.some_column
from the_table t1
where t1.some_column < ALL (select t2.some_column
from the_table t2
where t2.pk_column <> t1.pk_column);
To get the max, replace < with >. pk_column is the primary key column of the table and is needed to avoid comparing each row to itself (it doesn't have to be a PK; it only needs to be unique).
I don't think there is an alternative for count() or avg() (at least I can't think of one)
I used the_column and the_table because column and table are reserved words
SET @t1=0, @t2=0, @t3=0, @T4=0;
COUNT:
Select @t1:=@t1+1 as CNT from table
order by @t1:=@t1+1 DESC
LIMIT 1
Similar methods could be put together for Avg and max/min using limits...
Still thinking about Min/Max...
Not to supersede the excellent answer from Gordon Linoff, but there's a little more work involved to accurately emulate the AVG(), COUNT(), and SUM() functions. (The MIN and MAX parts of Gordon's answer are spot on.)
There's a corner case when the table is empty. In order to emulate the SQL aggregate functions, we need our query to return a single row. But at the same time, we need a test of whether or not the table contains at least one row.
Here's a query that is a more precise emulation:
-- create an empty table
CREATE TABLE `foo` (col INT);
-- TRUNCATE TABLE `foo`;
SELECT IF(s.ne IS NULL,0,s.rn) AS `COUNT(*)`
, IF(s.cc>0,s.tc,NULL) AS `SUM(col)`
, IF(s.cc>0,s.tc/s.cc,NULL) AS `AVG(col)`
FROM ( SELECT v.rn
, v.cc
, v.tc
, e.ne
FROM ( SELECT @rn := @rn + 1 AS rn
, @cc := @cc + (t.col IS NOT NULL) AS cc
, @tc := @tc + IFNULL(t.col,0) AS tc
FROM (SELECT @rn := 0, @cc := 0, @tc := 0) c
LEFT
JOIN `foo` t
ON 1=1
) v
LEFT
JOIN (SELECT 1 AS ne FROM `foo` z LIMIT 1) e
ON 1=1
ORDER BY v.rn DESC
LIMIT 1
) s
NOTES:
The purpose of the inline view aliased as e is to give us a way to determine whether or not the table contains any rows. If the table contains at least one row, we'll get a value of 1 returned as column ne (not empty). If the table is empty, that query won't return a row, and e.ne will be NULL, which is something we can test in the outer query.
In order to return a row, so we can return a value, like a 0 for a COUNT, we need to ensure that we return at least one row from the inline view v. Since we are guaranteed exactly one row from the inline view aliased as c (which initializes our user defined variables), we'll use that as the "driving" table for a LEFT [OUTER] JOIN operation.
But if the table is empty, our row counter (@rn) coming out of v is going to have a value of 1. We'll deal with that: we have e.ne, which we can check to know whether the count should really be returned as 0.
In order to calculate the average, we can't divide by the row counter, we have to divide by the number of rows where col was not null. We use the @cc user defined variable to keep track of the count of those rows.
Similarly, for the SUM (and the average) we need to accumulate only the non-NULL values. (If we were to add a NULL, it would turn the whole total to NULL, basically wiping out our accumulation.) So, we're going to do a conditional test to check if t.col IS NULL, to avoid accidentally wiping out the accumulation. And our accumulator is going to be a 0 if there aren't any rows that are not null. But that's not a problem, because we'll make sure we check our @cc to see if there were any rows that were included. We're going to need to check it anyway, to avoid a "divide by zero" issue.
To test, run against the empty table foo. It will return a count of 0, and NULL for SUM and AVG, equivalent to the result we get from:
SELECT COUNT(*), SUM(col), AVG(col) FROM foo;
We can also test the query against a table containing only NULL values for col:
INSERT INTO `foo` (col) VALUES (NULL);
As well as some non-NULL values:
INSERT INTO `foo` (col) VALUES (2),(3),(5),(7),(11),(13),(17),(19);
And compare the results of the two queries.
This is essentially the same as the answer from Gordon Linoff, with just a little more precision to work around the corner cases of NULL values and the empty table.

getting the ranking of the rows in mysql ORDER BY statements

suppose I have
SELECT * FROM t ORDER BY j
Is there a way to have the query also return an auto-incremented column that gives the rank of each row in that ordering?
This column should also work when using ranged LIMITs, e.g.
SELECT * FROM t ORDER BY j LIMIT 10,20
should have the autoincremented column return 11,12,13,14 etc....
Oracle, MSSQL etc. support ranking functions that do exactly what you want. Unfortunately, MySQL has some catching up to do in this regard.
The closest I've ever been able to get to approximating ROW_NUMBER() OVER() in MySQL is like this:
SELECT t.*,
@rank := @rank + 1 AS rank
FROM t, (SELECT @rank := 0) r
ORDER BY j
I don't know how that would rank using ranged LIMIT unless you used that in a subquery perhaps (although performance may suffer with large datasets)
SELECT T2.*, rank
FROM (
SELECT t.*,
@rank := @rank + 1 AS rank
FROM t, (SELECT @rank := 0) r
ORDER BY j
) t2
LIMIT 10,20
The other option would be to create a temporary table,
CREATE TEMPORARY TABLE myRank
(
`rank` INT(11) NOT NULL AUTO_INCREMENT,
`id` INT(11) NOT NULL,
-- the AUTO_INCREMENT column must lead a key
PRIMARY KEY (`rank`)
);
INSERT INTO myRank (id)
SELECT T.id
FROM T
ORDER BY j;
SELECT T.*, R.`rank`
FROM T
INNER JOIN myRank R
ON T.id = R.id
ORDER BY R.`rank`
LIMIT 10,20;
Of course, the temporary table would need to be persisted between calls.
I wish there was a better way, but without ROW_NUMBER() you must resort to some hackery to get the behavior you want.
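For readers on MySQL 8.0 or later, ROW_NUMBER() is available natively, so the workarounds above are no longer needed there. A minimal sketch, assuming the table t and column j from the question; row_rank is an illustrative alias, chosen because RANK is a reserved word in those versions:

```sql
-- Window functions are evaluated before LIMIT, so with an offset of 10
-- the first returned row is numbered 11.
SELECT t.*,
       ROW_NUMBER() OVER (ORDER BY j) AS row_rank
FROM t
ORDER BY row_rank
LIMIT 10, 20;
```

Ordering the outer query by row_rank rather than j keeps the output order consistent with the numbering when j contains ties.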