I want to get a random row for each group when using GROUP BY in MySQL 5.7. From my research, the cleanest way to do it is something like this:
SELECT ANY_VALUE(column_1), ANY_VALUE(column_2), ..., ANY_VALUE(column_n)
FROM table
GROUP BY column
Since there is no syntax like ANY_VALUE(*) or ANY_VALUE(column_1, column_2, ..., column_n), I am not sure whether, with the above query, each value can come from a different row or whether all ANY_VALUE fields will come from the same row.
If you want a random row, use row_number() (window functions are available from MySQL 8.0):
select t.*
from (select t.*,
row_number() over (partition by column order by rand()) as seqnum
from t
) t
where seqnum = 1;
I am guessing that this is also faster than group by, but you can check if that is the case.
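To try the pattern without a MySQL server, here is a minimal sketch that runs the same ROW_NUMBER() query against an in-memory SQLite database from Python. SQLite (3.25 or newer, for window functions) is only a stand-in for MySQL 8 here, it spells rand() as random(), and the table tbl(id, grp, val) and the helper random_row_per_group are hypothetical names used for illustration.

```python
import sqlite3


def random_row_per_group(conn):
    """Return one randomly chosen row per grp (hypothetical helper)."""
    sql = """
        SELECT id, grp, val
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY grp ORDER BY random()) AS seqnum
            FROM tbl AS t
        )
        WHERE seqnum = 1
        ORDER BY grp
    """
    return conn.execute(sql).fetchall()


# Assumed sample data: three rows in group 1, two rows in group 2.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, grp INTEGER NOT NULL, val INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO tbl (grp, val) VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)],
)

rows = random_row_per_group(conn)
print(rows)  # one (id, grp, val) tuple per group; the chosen row varies per run
```

Because seqnum is computed per row, all columns of a result row come from the same source row, which is the guarantee the ANY_VALUE() version does not give.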
In MySQL 5.7, you can use variables:
select t.*
from (select t.*,
(@rn := if(@c = column, @rn + 1,
if(@c := column, 1, 1)
)
) as rn
from (select t.* from t order by column, rand()) t cross join
(select @c := '', @rn := 0) params
) t
where rn = 1;
Assuming the following schema and sample data:
create table tbl(
id int auto_increment primary key,
grp int not null,
val int not null,
index (grp)
);
insert into tbl (grp, val) values (1, 1);
insert into tbl (grp, val) values (1, 2);
insert into tbl (grp, val) values (1, 3);
insert into tbl (grp, val) values (2, 1);
insert into tbl (grp, val) values (2, 2);
Get the distinct groups in a derived table (or use a base table of groups, if you have one). In a subquery in the SELECT clause, pick a random primary key for each group with ORDER BY rand() LIMIT 1. Then join that result, as a derived table, with the base table.
select t.*
from (
select (
select id
from tbl t
where t.grp = g.grp
order by rand()
limit 1
) as id
from (select distinct grp from tbl) g
) r
join tbl t using (id);
Result would be something like
| id | grp | val |
| --- | --- | --- |
| 2 | 1 | 2 |
| 4 | 2 | 1 |
The goal
I am trying to write a query to find duplicate rows. A row is duplicate when either Column A or Column B is the same.
Writing it so that both need to be the same is easy; just a simple GROUP BY A, B.
However, filtering by just one of the two is proving to be a bit more difficult. How would one go about doing this?
I've tried the following:
select distinct a as col_a,
b as col_b,
(
select count(*)
from table_name
where a = col_a
or b = col_b
) as duplicate_count
from table_name
having duplicate_count > 1;
but it does not feel like the right way to go about this and with 84.000 rows it is also very slow.
Example
With the following table:
+----+------------------------+---+---------+
| id | name | a | b |
+----+------------------------+---+---------+
| 1 | Lorem ipsum | 1 | Donec |
+----+------------------------+---+---------+
| 2 | dolor sit | 2 | rhoncus |
+----+------------------------+---+---------+
| 3 | amet | 3 | rhoncus |
+----+------------------------+---+---------+
| 4 | consectetur adipiscing | 1 | primis |
+----+------------------------+---+---------+
| 5 | vulputate cursus | 4 | Aliquam |
+----+------------------------+---+---------+
Either result 1 or 4 (same A) and either result 2 or 3 (same B) should be returned, both with a duplicate_count of 2.
Which one of the two "duplicates" is returned does not matter.
Versions
On my local machine I use MySQL 5.7.24.
I just checked the live server, it uses 10.1.43-MariaDB.
You already know that this query:
select a, b
from tablename
group by a, b
having count(*) > 1
returns duplicates with both a and b equal.
You can get the rest of the duplicates for your requirement with EXISTS:
select t.a, t.b
from tablename t
where exists (
select 1 from tablename
where (a = t.a and b <> t.b) or (a <> t.a and b = t.b)
)
Or if you want them all use UNION ALL:
select a, b
from tablename
group by a, b
having count(*) > 1
union all
select t.a, t.b
from tablename t
where exists (
select 1 from tablename
where (a = t.a and b <> t.b) or (a <> t.a and b = t.b)
)
Update:
If you have an ID column then use EXISTS like this:
select t.*
from tablename t
where exists (
select 1 from tablename
where id <> t.id and (a = t.a or b = t.b)
)
Or if you want just 1 of the duplicates use id > t.id instead of id <> t.id.
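As a small check of the two EXISTS variants, the sketch below loads the example table from the question into an in-memory SQLite database via Python. SQLite is used only as a convenient stand-in for MySQL/MariaDB, and the helper name duplicate_ids and the inner alias o are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, name TEXT, a INTEGER, b TEXT)")
conn.executemany(
    "INSERT INTO tablename (id, name, a, b) VALUES (?, ?, ?, ?)",
    [
        (1, "Lorem ipsum", 1, "Donec"),
        (2, "dolor sit", 2, "rhoncus"),
        (3, "amet", 3, "rhoncus"),
        (4, "consectetur adipiscing", 1, "primis"),
        (5, "vulputate cursus", 4, "Aliquam"),
    ],
)


def duplicate_ids(conn, comparison):
    """Return ids of rows that share a or b with another row.

    comparison is "<>" (every duplicate) or ">" (only the row with the
    lower id of each matching pair). Hypothetical helper for this sketch.
    """
    if comparison not in ("<>", ">"):
        raise ValueError("comparison must be '<>' or '>'")
    sql = f"""
        SELECT t.id
        FROM tablename AS t
        WHERE EXISTS (
            SELECT 1 FROM tablename AS o
            WHERE o.id {comparison} t.id AND (o.a = t.a OR o.b = t.b)
        )
        ORDER BY t.id
    """
    return [row[0] for row in conn.execute(sql)]


all_duplicates = duplicate_ids(conn, "<>")  # [1, 2, 3, 4]
one_per_pair = duplicate_ids(conn, ">")     # [1, 2]
print(all_duplicates, one_per_pair)
```

With id > t.id, a row is returned only when a matching row with a higher id exists, so rows 1 and 2 stand in for the pairs (1, 4) and (2, 3).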
Or with a self join:
select t.*
from tablename t inner join tablename tt
on (tt.a = t.a or tt.b = t.b) and tt.id <> t.id
The following solution works, including when a row is duplicated on both a and b:
CREATE TEMPORARY TABLE ab_duplicates (
a INTEGER
) AS
SELECT a, count(*) as cnt
FROM tablename
group by a, b
Having cnt > 1;
ALTER TABLE ab_duplicates ADD INDEX (a);
-- Select duplicates for a, but not for a and b
SELECT id, name, a, b
FROM (SELECT x.*, t.id, t.name, t.a, t.b,
@rn := IF(t.a = @a, @rn + 1, 1) rn,
@a := t.a,
ab.a as ab_exists
FROM (select @a := null, @rn := 0) x,
tablename t
LEFT JOIN ab_duplicates ab on ab.a = t.a
ORDER BY a
) a_duplicates
where rn = 2 and ab_exists is null
UNION
-- union duplicates for b, including duplicates for a and b
SELECT id, name, a, b
FROM (SELECT x.*, t.id, t.name, t.a, t.b,
@rn := IF(t.b = @b, @rn + 1, 1) rn,
@b := t.b
FROM (select @b := null, @rn := 0) x,
tablename t
ORDER BY b
) b_and_ab_duplicates
where rn = 2;
Previous solutions that only worked in some edge cases
Using group by and count() :
First finding ids with duplicates for a :
SELECT min(id) id, count(*) cnt from tablename t group by a having cnt > 1
-- this will work better if you have an index starting with a
Same with b :
SELECT min(id) id, count(*) cnt from tablename t group by b having cnt > 1
-- this will work better if you have an index starting with b
First solution :
UNION gives you the ids where there are duplicates for a or b (requires two indexes):
SELECT min(id) id, count(*) cnt from tablename t group by a having cnt > 1
UNION
SELECT min(id) id, count(*) cnt from tablename t group by b having cnt > 1
Use the ids to filter the table, if you need more data from the table :
SELECT tablename.*
FROM (
SELECT min(id) id, count(*) cnt from tablename t group by a having cnt > 1
UNION
SELECT min(id) id, count(*) cnt from tablename t group by b having cnt > 1
) as ids
JOIN tablename on tablename.id = ids.id
Now this might not use an index, but you can use a temporary table to have one :
First solution, using a temporary table (might be faster) :
-- using a temporary table to set an index
CREATE TEMPORARY TABLE ids (
-- adds an index on id, for the JOIN in the result query
`id` INTEGER PRIMARY KEY
) as
SELECT id
FROM (
-- duplicates on a, requires an index (a) on tablename
SELECT min(id) id, count(*) cnt from tablename t group by a having cnt > 1
-- removes duplicates between both part of the UNION : this might be slow
-- if there cannot be duplicates on a and b at the same time, consider using UNION ALL
UNION
-- duplicates on b, requires an index (b) on tablename
SELECT min(id) id, count(*) cnt from tablename t group by b having cnt > 1
) tempids;
SELECT tablename.*
FROM ids -- using the temporary table, MUST be in the same database connection, will filter duplicates
JOIN tablename on tablename.id = ids.id;
I do not know whether setting the index on the temporary table up front is better than adding it after populating the data:
-- you might want to postpone the index after the ids are set
-- using a temporary table to set an index
CREATE TEMPORARY TABLE ids2 (
`id` INTEGER
) as
SELECT id
FROM (
-- duplicates on a, requires an index (a) on tablename
SELECT min(id) id, count(*) cnt from tablename t group by a having cnt > 1
-- removes duplicates between both part of the UNION : this might be slow
-- if there cannot be duplicates on a and b at the same time, consider using UNION ALL
UNION
-- duplicates on b, requires an index (b) on tablename
SELECT min(id) id, count(*) cnt from tablename t group by b having cnt > 1
) tempids;
ALTER TABLE ids2 ADD INDEX (id);
SELECT tablename.*
FROM ids2 -- using the temporary table, MUST be in the same database connection, will filter duplicates
JOIN tablename on tablename.id = ids2.id;
With MariaDB 10.2 or MySQL 8 you could use window functions (I guess).
Another solution : using vars :
SELECT id, name, a, b, rn
FROM (SELECT *,
@rn := IF(a = @a, @rn + 1, 1) rn,
@a := a
FROM (select @a := null, @rn := 0) x,
tablename
ORDER BY a
) a_duplicates
where rn = 2
UNION
SELECT id, name, a, b, rn
FROM (SELECT *,
@rn := IF(b = @b, @rn + 1, 1) rn,
@b := b
FROM (select @b := null, @rn := 0) x,
tablename
ORDER BY b
) b_duplicates
where rn = 2
Edit: this only works if you don't have rows where both a and b are duplicated, which is the case in the example.
I need to make the following query:
I have 4 tables. The first is the main table, and its id is a foreign key in the other 3 tables. For each id_table1, I need to get the date and description from each of the other tables. Some tables have more records than others.
Is it possible to relate these tables?
Table 1 main
id_table1
Name
Table 2
id_table2
date
description
fk_table1
Table 3
id_table3
date
description
fk_table1
Table 4
id_table4
date
description
fk_table1
I want to get something like this:
This type of operation is a bit of a pain in MySQL. In fact, the result is not particularly "relational", because each column is a separate list. You can't do a join because there is no join key.
You can generate one in MySQL using variables and then use aggregation. Here is an example with two tables:
select id_table1,
max(t2_date) as t2_date,
max(t2_desc) as t2_desc,
max(t3_date) as t3_date,
max(t3_desc) as t3_desc
from ((select id_table1, NULL as t2_date, NULL as t2_desc, NULL as t3_date, NULL as t3_desc, 1 as rn
from table1 t1
) union all
-- table2 rows, numbered per fk_table1 in date order
(select fk_table1, date as t2_date, description as t2_desc, NULL as t3_date, NULL as t3_desc,
(@rn1 := if(@fk1 = fk_table1, @rn1 + 1,
if(@fk1 := fk_table1, 1, 1)
)
) as rn
from table2 t2 cross join
(select @rn1 := 0, @fk1 := 0) params
order by fk_table1, date
) union all
-- table3 rows, numbered the same way
(select fk_table1, NULL, NULL, date as t3_date, description as t3_desc,
(@rn2 := if(@fk2 = fk_table1, @rn2 + 1,
if(@fk2 := fk_table1, 1, 1)
)
) as rn
from table3 t3 cross join
(select @rn2 := 0, @fk2 := 0) params
order by fk_table1, date
)
) t
group by id_table1, rn;
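The idea behind the query, numbering each child table's rows per parent and then grouping on (parent id, row number), can also be sketched outside SQL. The Python below is only an illustration of that logic, not a replacement for the query; the function name align_child_rows and the sample rows are hypothetical.

```python
from collections import defaultdict


def align_child_rows(parent_ids, table2, table3):
    """Line up rows of two child tables side by side for each parent id.

    table2 and table3 are lists of (fk_table1, date, description) tuples.
    Returns (id_table1, rn, t2_date, t2_desc, t3_date, t3_desc) tuples.
    """
    merged = defaultdict(lambda: [None, None, None, None])
    for pid in parent_ids:
        merged[(pid, 1)]  # every parent gets at least one (possibly empty) row
    for offset, rows in ((0, table2), (2, table3)):
        counters = defaultdict(int)
        for fk, date, desc in sorted(rows, key=lambda r: (r[0], r[1])):
            counters[fk] += 1  # per-parent row number: the generated join key
            slot = merged[(fk, counters[fk])]
            slot[offset], slot[offset + 1] = date, desc
    return [(pid, rn, *values) for (pid, rn), values in sorted(merged.items())]


aligned = align_child_rows(
    parent_ids=[1, 2],
    table2=[(1, "2020-01-02", "b"), (1, "2020-01-01", "a")],
    table3=[(1, "2020-02-01", "x"), (2, "2020-03-01", "y")],
)
for row in aligned:
    print(row)
```

Parent 1 has two table2 rows but only one table3 row, so its second output row carries None in the table3 columns, which mirrors what max() over the NULL placeholders produces in the SQL version.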
I want to find all NULL values in the parameter_id column and set each of them to the lowest unused parameter_id.
I have query which will find lowest unused parameter_id, I also know how to get list of NULL values.
SELECT MIN(t1.parameter_id)+1 FROM table AS t1 WHERE NOT EXISTS (SELECT * FROM table AS t2 WHERE t2.parameter_id = t1.parameter_id+1)
I can get the list of all rows with parameter_id = NULL, then run a query to find the current lowest unused parameter_id, and then update parameter_id to that number. Since the table has 50.000 rows, this approach would create thousands of queries (2 per row).
Is there a way to run a "single query" which will find all rows with parameter_id = NULL and update each of them to the current lowest unused parameter_id?
Here is the table description (MySQL 5.5):
id (INT) primary key, auto_increment
parameter_id (INT) default NULL
Sample data:
# id, parameter_id
1, NULL
2, 1
3, NULL
4, 5
5, 3
Desired result:
# id, parameter_id
1, 2
2, 1
3, 4
4, 5
5, 3
EDIT:
I distilled what I want into a single query. I simply need to run this query until the UPDATE affects 0 rows.
UPDATE `table`
SET parameter_id=
(SELECT *
FROM
(SELECT MIN(t1.parameter_id)+1
FROM `table` AS t1
WHERE NOT EXISTS
(SELECT *
FROM `table` AS t2
WHERE t2.parameter_id = t1.parameter_id+1)) AS t4)
WHERE parameter_id IS NULL LIMIT 1
The following enumerates the unused parameter ids:
select t.*, (@rn := @rn + 1) as seqnum
from `table` t cross join
(select @rn := 0) params
where not exists (select 1 from `table` t2 where t2.parameter_id = t.id)
order by t.id;
(You might want to put this in a temporary table with an index on seqnum for the subsequent query.)
The problem is getting a join key for the update. Here is a bit of a kludge: I'm going to add a column, enumerate it, and then drop it:
alter table `table` add column null_seqnum int;
update `table` t cross join (select @rn1 := 0) params
set null_seqnum = (@rn1 := @rn1 + 1)
where parameter_id is null;
update `table` t join
(select t.*, (@rn := @rn + 1) as seqnum
from `table` t cross join
(select @rn := 0) params
where not exists (select 1 from `table` t2 where t2.parameter_id = t.id)
order by t.id
) tnull
on t.null_seqnum = tnull.seqnum
set t.parameter_id = tnull.id;
alter table `table` drop column null_seqnum;
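To make the pairing explicit, here is a minimal Python sketch of the same logic: the k-th row with a NULL parameter_id receives the k-th id value that no row uses as a parameter_id. Like the SQL above, it assumes the pool of candidate numbers is the table's own id values; the function name fill_null_parameter_ids is hypothetical.

```python
def fill_null_parameter_ids(rows):
    """rows: list of (id, parameter_id) tuples, with None standing for NULL.

    Returns a new list, ordered by id, in which every None has been
    replaced by the next id value that is not used as a parameter_id.
    """
    used = {pid for _, pid in rows if pid is not None}
    ordered = sorted(rows, key=lambda r: r[0])
    # Candidate values: ids that never appear as a parameter_id, ascending.
    free = iter([rid for rid, _ in ordered if rid not in used])
    return [(rid, pid if pid is not None else next(free)) for rid, pid in ordered]


sample = [(1, None), (2, 1), (3, None), (4, 5), (5, 3)]
filled = fill_null_parameter_ids(sample)
print(filled)  # [(1, 2), (2, 1), (3, 4), (4, 5), (5, 3)]
```

On the sample data from the question this reproduces the desired result. If unused numbers outside the range of existing ids should also be considered, the candidate pool would have to come from somewhere else, for example a numbers table.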
I have a table, to which I need to add an increment column, however the increment should happen based on the existing values in the other columns.
select * from mytable;
first_col second_col
A B
A C
A D
A E
A B
A D
Now, I want to add another column, say new_column whose value increments uniquely on the basis of the first_col and second_col.
The column should be populated like this:
first_col second_col new_col
A B 1
A C 1
A D 1
A E 1
A B 2
A D 2
A B 3
Is it possible to do this using some sort of built-in MySQL auto-increment strategy?
Using a temporary table with an auto_incremented id column you could do
create temporary table tt (
id int auto_increment primary key,
col1 varchar(32),col2 varchar(32));
insert into tt (col1, col2)
select col1, col2 from origtable;
select col1, col2,
(select count(*)+1 from tt s
where s.col1=m.col1 and s.col2=m.col2
and s.id<m.id) n
from tt m;
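The counting subquery can be tried on the sample rows from the question with the short sketch below. It uses an in-memory SQLite database from Python purely as a stand-in for MySQL, and loads the six rows directly into tt instead of copying them from origtable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The auto-incremented id records insertion order, as in the temporary table above.
conn.execute("CREATE TABLE tt (id INTEGER PRIMARY KEY AUTOINCREMENT, col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO tt (col1, col2) VALUES (?, ?)",
    [("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"), ("A", "B"), ("A", "D")],
)

# For each row, count the earlier rows with the same (col1, col2) and add 1.
numbered = conn.execute(
    """
    SELECT m.col1, m.col2,
           (SELECT COUNT(*) + 1 FROM tt AS s
            WHERE s.col1 = m.col1 AND s.col2 = m.col2 AND s.id < m.id) AS n
    FROM tt AS m
    ORDER BY m.id
    """
).fetchall()

for row in numbered:
    print(row)
```

The second ("A", "B") and ("A", "D") rows get n = 2 and every other row gets n = 1, matching the first six rows of the desired output.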
There is no built in increment method in MySQL, but you can do this with either correlated subqueries or variables:
select t.*,
(@rn := if(@c = concat(first_col, ':', second_col), @rn + 1,
if(@c := concat(first_col, ':', second_col), 1, 1)
)
) as new_col
from mytable t cross join
(select @rn := 0, @c := '') params
order by first_col, second_col;
Note: this re-orders the results. If you want the results in the original order, then you need a column that specifies that ordering.
Here's how you can do it; replace col1val and col2val with the values to be inserted.
INSERT INTO mytable (first_col, second_col, new_col)
SELECT col1val, col2val, COUNT(*) + 1
FROM mytable
WHERE first_col = col1val AND second_col = col2val;
Note that this is an insert query, and will affect only newly inserted values.
Is there any way I can get the actual row number from a query?
I want to be able to order a table called league_girl by a field called score; and return the username and the actual row position of that username.
I want to rank the users so I can tell where a particular user is, e.g. Joe is at position 100 out of 200:
User Score Row
Joe 100 1
Bob 50 2
Bill 10 3
I've seen a few solutions on here but I've tried most of them and none of them actually return the row number.
I have tried this:
SELECT position, username, score
FROM (SELECT @row := @row + 1 AS position, username, score
FROM league_girl GROUP BY username ORDER BY score DESC)
As derived
...but it doesn't seem to return the row position.
Any ideas?
You may want to try the following:
SELECT l.position,
l.username,
l.score,
@curRow := @curRow + 1 AS row_number
FROM league_girl l
JOIN (SELECT @curRow := 0) r;
The JOIN (SELECT @curRow := 0) part allows the variable initialization without requiring a separate SET command.
Test case:
CREATE TABLE league_girl (position int, username varchar(10), score int);
INSERT INTO league_girl VALUES (1, 'a', 10);
INSERT INTO league_girl VALUES (2, 'b', 25);
INSERT INTO league_girl VALUES (3, 'c', 75);
INSERT INTO league_girl VALUES (4, 'd', 25);
INSERT INTO league_girl VALUES (5, 'e', 55);
INSERT INTO league_girl VALUES (6, 'f', 80);
INSERT INTO league_girl VALUES (7, 'g', 15);
Test query:
SELECT l.position,
l.username,
l.score,
@curRow := @curRow + 1 AS row_number
FROM league_girl l
JOIN (SELECT @curRow := 0) r
WHERE l.score > 50;
Result:
+----------+----------+-------+------------+
| position | username | score | row_number |
+----------+----------+-------+------------+
| 3 | c | 75 | 1 |
| 5 | e | 55 | 2 |
| 6 | f | 80 | 3 |
+----------+----------+-------+------------+
3 rows in set (0.00 sec)
SELECT @i:=@i+1 AS iterator, t.*
FROM tablename t,(SELECT @i:=0) foo
Here is the structure of the template I used:
select
/*this is a row number counter*/
( select @rownum := @rownum + 1 from ( select @rownum := 0 ) d2 )
as rownumber,
d3.*
from
( select d1.* from table_name d1 ) d3
And here is my working code:
select
( select @rownum := @rownum + 1 from ( select @rownum := 0 ) d2 )
as rownumber,
d3.*
from
( select year( d1.date ), month( d1.date ), count( d1.id )
from maindatabase d1
where ( ( d1.date >= '2013-01-01' ) and ( d1.date <= '2014-12-31' ) )
group by YEAR( d1.date ), MONTH( d1.date ) ) d3
You can also use
SELECT @curRow := ifnull(@curRow,0) + 1 Row, ...
to initialise the counter variable.
Assuming MySQL supports it, you can easily do this with a standard SQL subquery:
select
(select count(*) from league_girl l1 where l2.score > l1.score and l1.id <> l2.id) as position,
username,
score
from league_girl l2
order by score;
For large amounts of displayed results, this will be a bit slow and you will want to switch to a self join instead.
If you just want to know the position of one specific user when ordering by score, you can simply count the rows whose score is higher than that user's score. That count + 1 is the user's position.
Assuming that your table is league_girl and your primary key is id, you can use this:
SELECT count(id) + 1 as rank from league_girl where score > <your_user_score>
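Here is a minimal sketch of that counting approach, run against an in-memory SQLite database from Python as a stand-in for MySQL. It reuses the usernames and scores from the test case earlier in this thread; the id column, the helper name rank_of, and looking the score up by username inside the query are assumptions added for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE league_girl (id INTEGER PRIMARY KEY, username TEXT, score INTEGER)")
conn.executemany(
    "INSERT INTO league_girl (username, score) VALUES (?, ?)",
    [("a", 10), ("b", 25), ("c", 75), ("d", 25), ("e", 55), ("f", 80), ("g", 15)],
)


def rank_of(conn, username):
    """Position of an existing user: number of strictly higher scores, plus 1."""
    row = conn.execute(
        """
        SELECT COUNT(id) + 1
        FROM league_girl
        WHERE score > (SELECT score FROM league_girl WHERE username = ?)
        """,
        (username,),
    ).fetchone()
    return row[0]


print(rank_of(conn, "f"))  # 1, the highest score
print(rank_of(conn, "e"))  # 3, since only 75 and 80 are higher than 55
print(rank_of(conn, "b"))  # 4, and "d" (same score) also gets 4
```

Users with equal scores share a position with this method, so a tie-breaking rule is needed if every user must get a distinct row number.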
I found the original answer incredibly helpful but I also wanted to grab a certain set of rows based on the row numbers I was inserting. As such, I wrapped the entire original answer in a subquery so that I could reference the row number I was inserting.
SELECT * FROM
(
SELECT *, @curRow := @curRow + 1 AS "row_number"
FROM db.tableName, (SELECT @curRow := 0) r
) as temp
WHERE temp.row_number BETWEEN 1 and 10;
Having a subquery in a subquery is not very efficient, so it would be worth testing whether you get a better result by having your SQL server handle this query, or fetching the entire table and having the application/web server manipulate the rows after the fact.
Personally my SQL server isn't overly busy, so having it handle the nested subqueries was preferable.
I know the OP is asking for a MySQL answer, but the other answers did not work for me: most of them fail with ORDER BY, or they are simply very inefficient and make the query very slow on a large table.
So, to save time for others like me: just number the rows after retrieving them from the database.
Example in PHP:
$users = UserRepository::loadAllUsersAndSortByScore();
foreach($users as $index=>&$user){
$user['rank'] = $index+1;
}
Example in PHP using offset and limit for paging:
$limit = 20; //page size
$offset = 3; //page number
$users = UserRepository::loadAllUsersAndSortByScore();
foreach($users as $index=>&$user){
$user['rank'] = $index+1+($limit*($offset-1));
}