Label groups in SQL - mysql

I have a table like this:
id name
0 Bob
1 Alice
2 Bob
3
4 Bob
5 Mary
6 Alice
I need to assign a group_id to each distinct name:
id name group_id
0 Bob 0 -- Bob's group
1 Alice 1 -- Alice's group
2 Bob 0 -- Bob's group
3 -- no group (NULL)
4 Bob 0 -- Bob's group
5 Mary 2 -- Mary's group
6 Alice 1 -- Alice's group
Can this be done in one line in MySQL?
I know I could find the unique names with an autoincrement column, then JOIN back with the original table based on the name -- but I was wondering if there exists a simpler/faster solution...

Yes, use a CASE expression. Add a computed column whose expression outputs the appropriate value for each value of the name column:
Select id, name,
case name
when 'Bob' then 0
when 'Alice' then 1
when 'Mary' then 2
-- etc.
end GroupId
From table
If you don't know the names in advance, or if there are too many, try this:
Select id, name,
(select count(distinct name)
from table
where name < t.Name) groupId
From table t
Unless you add an index on the name column, this will be very slow on a large table.
To output a null instead of a 0 for rows with name = null, use this:
Select id, name,
case when name is null then null
else (select count(distinct name)
from table
where name < t.Name) end groupId
From table t
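As a quick check of the NULL-aware correlated subquery above, here is a minimal sketch that runs the same logic against the question's sample data. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name `people` is a made-up placeholder (the answer's `table` is a reserved word).

```python
import sqlite3

# Hypothetical table name "people"; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO people (id, name) VALUES (?, ?)",
    [(0, "Bob"), (1, "Alice"), (2, "Bob"), (3, None),
     (4, "Bob"), (5, "Mary"), (6, "Alice")],
)

# group_id = number of distinct names that sort before this row's name.
group_rows = conn.execute(
    """
    SELECT id, name,
           CASE WHEN name IS NULL THEN NULL
                ELSE (SELECT COUNT(DISTINCT p.name)
                      FROM people AS p
                      WHERE p.name < t.name)
           END AS group_id
    FROM people AS t
    ORDER BY id
    """
).fetchall()

for row in group_rows:
    print(row)
```

Note that the ids follow the sort order of the names (Alice 0, Bob 1, Mary 2), not the order of first appearance shown in the question. The labelling is still consistent per name.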

One method is -- as you suggest -- group by and join. If the numbers do not have to be sequential:
select t.*, minid as group_id
from t join
(select name, min(id) as minid
from t
group by name
) tt
on t.name <=> tt.name; -- NULL-safe equality, to handle `NULL`
If they do, use variables:
select t.*, tt.group_id
from t join
(select name, min(id) as minid, (@grp := @grp + 1) as group_id
from t cross join
(select @grp := -1) params
group by name
) tt
on t.name <=> tt.name; -- NULL-safe equality, to handle `NULL`
You could also do the whole operation with two sorts and variables:
select t.*
from (select t.*,
(@grp := if(@n <=> name, @grp,
if(@n := name, @grp + 1, @grp + 1)
)
) as group_id
from t cross join
(select @grp := -1, @n := '') params
order by name
) t
order by id;
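The non-sequential variant of this answer (group id = smallest id per name) can be tried outside MySQL. The sketch below is an assumption-laden stand-in: it uses Python's sqlite3 module, where the `IS` operator plays the role of MySQL's NULL-safe `<=>`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO t (id, name) VALUES (?, ?)",
    [(0, "Bob"), (1, "Alice"), (2, "Bob"), (3, None),
     (4, "Bob"), (5, "Mary"), (6, "Alice")],
)

# Join every row to the smallest id that shares its name.
# "IS" is SQLite's NULL-safe comparison (MySQL spells it <=>).
labelled = conn.execute(
    """
    SELECT t.id, t.name, tt.minid AS group_id
    FROM t
    JOIN (SELECT name, MIN(id) AS minid
          FROM t
          GROUP BY name) AS tt
      ON t.name IS tt.name
    ORDER BY t.id
    """
).fetchall()

for row in labelled:
    print(row)
```

With a NULL-safe join the NULL row is labelled too (with its own id, 3). If that row should stay unlabelled, as in the question's expected output, a plain `=` with a LEFT JOIN would leave its group_id NULL instead.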

Related

Correlated subquery with row number count

I have the following table, and I want to get the initial row (the one with the least _id) of each uid group:
_id uid type
1 a a
2 b bbb #satisfied
3 b ccc
4 b aaa #satisfied
5 a aaa #satisfied
6 b eee
I can already get the initial row using the following correlated subquery
SELECT *
FROM table
WHERE _id IN (
SELECT MIN(_id)
FROM table
WHERE type IN ('aaa','bbb')
GROUP BY uid
);
However, I want a 4th column showing the count of rows that satisfy the condition (type IN ('aaa','bbb')), as in cnt below:
_id uid type cnt
5 a aaa 1
2 b bbb 2
I think I could compute this count with several joins and then join the result to my query, but that is ugly. Is there an elegant way to achieve this?
You can try this:
SELECT t1.*, t2.cnt
FROM table t1 INNER JOIN (
SELECT MIN(_id) AS id, COUNT(_id) AS cnt
FROM table
WHERE type IN ('aaa','bbb')
GROUP BY uid
) t2 ON t1._id = t2.id
ORDER BY t1.uid
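To see what the join against the aggregated subquery returns on the question's data, here is a small sketch using Python's sqlite3 module in place of MySQL. The table name `events` is hypothetical (the question's `table` is a reserved word).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (_id INTEGER PRIMARY KEY, uid TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO events (_id, uid, type) VALUES (?, ?, ?)",
    [(1, "a", "a"), (2, "b", "bbb"), (3, "b", "ccc"),
     (4, "b", "aaa"), (5, "a", "aaa"), (6, "b", "eee")],
)

# One aggregate pass yields both the first matching _id and the match count
# per uid; joining on that _id pulls in the rest of the row.
first_rows = conn.execute(
    """
    SELECT t1._id, t1.uid, t1.type, t2.cnt
    FROM events AS t1
    INNER JOIN (
        SELECT MIN(_id) AS id, COUNT(_id) AS cnt
        FROM events
        WHERE type IN ('aaa', 'bbb')
        GROUP BY uid
    ) AS t2 ON t1._id = t2.id
    ORDER BY t1.uid
    """
).fetchall()

for row in first_rows:
    print(row)
```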
If you are running MySQL 8.0, you can just use window functions for this:
select _id, uid, type, cnt
from (
select
t.*,
count(*) over(partition by uid) cnt,
row_number() over(partition by uid order by _id) rn
from mytable t
where type in ('aaa', 'bbb')
) t
where rn = 1
You can do this without a subquery. In MySQL 8+, you can use this logic:
SELECT DISTINCT MIN(_id) OVER (PARTITION BY uid) as _id,
uid,
FIRST_VALUE(type) OVER (PARTITION BY uid ORDER BY _id) as type,
COUNT(*) OVER (PARTITION BY uid) as cnt
FROM table
WHERE type IN ('aaa', 'bbb');
Unfortunately, MySQL doesn't have a "first" aggregation function, but there is a trick if you like:
SELECT MIN(_id) as _id, uid,
SUBSTRING_INDEX(GROUP_CONCAT(type ORDER BY _id), ',', 1) as type,
COUNT(*) as cnt
FROM table
WHERE type IN ('aaa', 'bbb')
GROUP BY uid;

SELECT without duplicates unless interrupted by other values

Given the following data:
name | temp
-----------
hoi | 15
hoi | 15
hoi | 16
hoi | 15
hej | 13
hoi | 13
I would like to select the data in the two given columns without duplicates. However, I do want to keep duplicates if they were interrupted by another value:
name | temp
-----------
hoi | 15 // selected
hoi | 15 // ignored duplicate
hoi | 15 // ignored duplicate
hoi | 16 // selected
hoi | 15 // selected because while not being unique it follows a different value
hoi | 15 // ignored duplicate
hej | 13 // selected
hoi | 13 // selected
hoi | 13 // ignored duplicate
hoi | 14 // selected
hoi | 13 // selected because while not being unique it follows a different value
This question was hard for me to formulate, since English is not my native tongue. Feel free to edit the question or ask for clarification.
Edit:
There is an id field and a datetime field.
Edit 2:
I use MySQL 5.7.
Since you are using MySQL 5.7, which doesn't support analytical functions, you will need to use variables to store the values of temp and name, from the previous row:
SELECT t.ID,
t.Name,
t.Temp
FROM ( SELECT t.*,
IF(@temp = t.temp AND @name = t.Name, 1, 0) AS IsDuplicate,
@temp:= t.temp,
@name:= t.Name
FROM YourTable AS t
CROSS JOIN (SELECT @temp := 0, @name := '') AS v
ORDER BY t.ID
) AS t
WHERE t.IsDuplicate = 0
ORDER BY ID;
The key parts are (not in the order in which they appear, but in the order in which it is logical to think about it).
(1) Initialise the variables, and order by ID (or whatever field(s) you like) to ensure variables are assigned in the correct order
CROSS JOIN (SELECT @temp := 0, @name := '') AS v
ORDER BY t.ID
(2) Check if the values stored in the variables match the current row, and flag with a 1 or a 0
IF(@temp = t.temp AND @name = t.Name, 1, 0) AS IsDuplicate
(3) Assign the values of temp and name in the current row to the variables, so they can be checked against the next row:
@temp:= t.temp,
@name:= t.Name
(4) Remove duplicates from the final data set:
WHERE t.IsDuplicate = 0;
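The four steps above amount to a single ordered pass that remembers the previous row. A minimal Python sketch of the same idea, using the question's sample values with assumed ids 1 to 11 (the function name is made up for illustration):

```python
def drop_consecutive_duplicates(rows):
    """Keep a row only if its (name, temp) differs from the previous row's."""
    previous = None  # plays the role of the @name / @temp variables
    kept = []
    # Step 1: process rows in id order.
    for row_id, name, temp in sorted(rows, key=lambda r: r[0]):
        current = (name, temp)
        # Steps 2 and 4: flag duplicates of the previous row and skip them.
        if current != previous:
            kept.append((row_id, name, temp))
        # Step 3: remember this row for the next comparison.
        previous = current
    return kept


sample = [
    (1, "hoi", 15), (2, "hoi", 15), (3, "hoi", 15), (4, "hoi", 16),
    (5, "hoi", 15), (6, "hoi", 15), (7, "hej", 13), (8, "hoi", 13),
    (9, "hoi", 13), (10, "hoi", 14), (11, "hoi", 13),
]
kept_rows = drop_consecutive_duplicates(sample)
print([row_id for row_id, _, _ in kept_rows])
```

The kept ids correspond to the rows marked "selected" in the question.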
To go one further, you could change the IsDuplicate flag to be a group marker, and use GROUP BY, so you can find out how many records there were in total, while still not displaying duplicates:
SELECT MIN(ID) AS FirstID,
t.Name,
t.Temp,
COUNT(*) AS Records,
MAX(ID) AS LastID
FROM ( SELECT t.*,
@group:= IF(@temp = t.temp AND @name = t.Name, @group, @group + 1) AS GroupID,
@temp:= t.temp,
@name:= t.Name
FROM YourTable AS t
CROSS JOIN (SELECT @temp := 0, @name := '', @group:= 0) AS v
ORDER BY t.ID
) AS t
GROUP BY t.GroupID, t.Name, t.Temp
ORDER BY t.GroupID;
This may be surplus to requirements, but it can be useful as you are able to extract a lot more information than when just identifying duplicate rows.
Finally if/when you upgrade to version 8.0 or newer, you will be able to use ROW_NUMBER(), or if you move to any other DBMS that supports ROW_NUMBER() (which is most nowadays), then you can use the following:
SELECT MIN(ID) AS FirstID,
t.Name,
t.Temp,
COUNT(*) AS Records,
MAX(ID) AS LastID
FROM ( SELECT t.*,
ROW_NUMBER() OVER(ORDER BY ID) -
ROW_NUMBER() OVER(PARTITION BY Temp, Name ORDER BY ID) AS GroupID
FROM YourTable AS t
ORDER BY t.ID
) AS t
GROUP BY t.GroupID, t.Name, t.Temp
ORDER BY t.GroupID;
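The difference-of-row-numbers trick can be tried without a MySQL 8 server. The sketch below runs it through Python's sqlite3 module (window functions need SQLite 3.25 or newer); the table name `readings` and the ids 1 to 11 are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, name TEXT, temp INTEGER)")
conn.executemany(
    "INSERT INTO readings (id, name, temp) VALUES (?, ?, ?)",
    [(1, "hoi", 15), (2, "hoi", 15), (3, "hoi", 15), (4, "hoi", 16),
     (5, "hoi", 15), (6, "hoi", 15), (7, "hej", 13), (8, "hoi", 13),
     (9, "hoi", 13), (10, "hoi", 14), (11, "hoi", 13)],
)

# Within one (name, temp) pair, consecutive rows share the same difference
# between the global row number and the per-pair row number.
islands = conn.execute(
    """
    SELECT MIN(id) AS first_id, name, temp,
           COUNT(*) AS records, MAX(id) AS last_id
    FROM (
        SELECT id, name, temp,
               ROW_NUMBER() OVER (ORDER BY id)
             - ROW_NUMBER() OVER (PARTITION BY temp, name ORDER BY id) AS group_id
        FROM readings
    ) AS g
    GROUP BY group_id, name, temp
    ORDER BY first_id
    """
).fetchall()

for row in islands:
    print(row)
```

This sketch orders by first_id rather than by the group marker: the difference is only meaningful within one (name, temp) pair, so sorting on it does not necessarily reproduce the original row order.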
Generic Solution
You can use the following query to do this on any DBMS:
select nd.*
from dedup nd
inner join (
-- find the previous id for each id
select id, (select max(id) from dedup where id < o.id) prev_id
from dedup o
) id_to_prev on id_to_prev.id = nd.id
-- join with the prev row to check for dups
left join dedup d on d.id = id_to_prev.prev_id
and d.name = nd.name
and d.temp = nd.temp
where d.id is null -- if no prev row found with same name+temp, include this row
order by nd.id
SQL Fiddle: http://sqlfiddle.com/#!9/0584ca3/9
If you are using Oracle:
select name, temp from (
select id,
name,
temp,
lag(temp,1,-99999) over (order by id) as temp_prev
from table
order by id) t
where t.temp != t.temp_prev
This might work for you (depending on your Oracle version). It uses the LAG analytic function to look at the previous row's value, builds a temporary result set, then filters it.
create table #temp (name varchar(3),temp int)
insert into #temp values ('hoi',15)
insert into #temp values ('hoi',15)
insert into #temp values ('hoi',15)
insert into #temp values ('hoi',16)
insert into #temp values ('hoi',15)
insert into #temp values ('hoi',15)
insert into #temp values ('hej',13)
insert into #temp values ('hoi',13)
insert into #temp values ('hoi',13)
insert into #temp values ('hoi',14)
insert into #temp values ('hoi',13)
;with FinalResult as (
select ROW_NUMBER()Over(partition by name,temp order by name) RowNumber,*
from #temp
)
select * from FinalResult where RowNumber =1
drop table #temp
You want to look at the previous row in order to decide whether to show a row or not. This would be easy with LAG, available as of MySQL 8. With MySQL 5.7 you need a correlated subquery with LIMIT instead to get the previous row.
select *
from mytable
where not (name, temp) <=>
(
select prev.name, prev.temp
from mytable prev
where prev.id < mytable.id
order by prev.id desc
limit 1
);
Demo: https://dbfiddle.uk/?rdbms=mysql_5.7&fiddle=4c775dbee12298cd93c5087d7085982f
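For comparison, here is a hedged sketch of the LAG variant mentioned above. It is run through Python's sqlite3 module rather than MySQL 8 (SQLite 3.25+ also supports LAG), `IS` stands in for MySQL's `<=>`, and the table name `readings` and the ids are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, name TEXT, temp INTEGER)")
conn.executemany(
    "INSERT INTO readings (id, name, temp) VALUES (?, ?, ?)",
    [(1, "hoi", 15), (2, "hoi", 15), (3, "hoi", 15), (4, "hoi", 16),
     (5, "hoi", 15), (6, "hoi", 15), (7, "hej", 13), (8, "hoi", 13),
     (9, "hoi", 13), (10, "hoi", 14), (11, "hoi", 13)],
)

# LAG exposes the previous row's values; a row is kept when it differs
# from them. The first row has no predecessor (NULLs), so it is kept.
changed = conn.execute(
    """
    SELECT id, name, temp
    FROM (
        SELECT id, name, temp,
               LAG(name) OVER (ORDER BY id) AS prev_name,
               LAG(temp) OVER (ORDER BY id) AS prev_temp
        FROM readings
    ) AS with_prev
    WHERE NOT (name IS prev_name AND temp IS prev_temp)
    ORDER BY id
    """
).fetchall()

for row in changed:
    print(row)
```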

SQL Create Unique Value Flag

There are lots of questions/answers about selecting unique values in a MySQL query but I haven't seen any on creating a unique value flag.
I have a customer_ID that can appear more than once in a query output. I want to create a new column that flags whether the customer_ID is unique or not (0 or 1).
The output should look something like this:
ID | Customer ID | Unique_Flag
1 | 1234 | 1
2 | 2345 | 1
3 | 2345 | 0
4 | 5678 | 1
Please let me know if anybody needs clarifications.
You seem to want to mark the first occurrence as unique, but not others. So, let's join in the comparison value:
select t.*,
(id = min_id) as is_first_occurrence
from t join
(select customer_id, min(id) as min_id
from t
group by customer_id
) tt
on t.customer_id = tt.customer_id;
For most people, a "unique" flag would mean that the overall count is "1", not that this is merely the first appearance. If that is what you want, then you can use similar logic:
select t.*,
(id = min_id) as is_first_occurrence,
(cnt = 1) as is_unique
from t join
(select customer_id, min(id) as min_id, count(*) as cnt
from t
group by customer_id
) tt
on t.customer_id = tt.customer_id;
And, in MySQL 8+, you would use window functions:
select t.*,
(row_number() over (partition by customer_id order by id) = 1) as is_first_occurrence,
(count(*) over (partition by customer_id) = 1) as is_unique
from t;
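The two window-function flags can be compared side by side on the question's data. This is a sketch only: it uses Python's sqlite3 module (SQLite 3.25+) instead of MySQL 8, and the table name `orders` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO orders (id, customer_id) VALUES (?, ?)",
    [(1, 1234), (2, 2345), (3, 2345), (4, 5678)],
)

# is_first_occurrence: 1 for the earliest row of each customer.
# is_unique: 1 only when the customer appears exactly once overall.
flags = conn.execute(
    """
    SELECT id, customer_id,
           (ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY id) = 1)
               AS is_first_occurrence,
           (COUNT(*) OVER (PARTITION BY customer_id) = 1) AS is_unique
    FROM orders
    ORDER BY id
    """
).fetchall()

for row in flags:
    print(row)
```

The is_first_occurrence column matches the Unique_Flag column in the question, while is_unique is 0 for both rows of customer 2345.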
You can try the query below:
select id,a.customerid, case when cnt=1 then 1 else 0 end as Unique_Flag
from tablename a
left join
(select customerid, count(*) as cnt from tablename
group by customerid
)b on a.customerid=b.customerid
You can use the LEAD function as shown below to get the required output:
SELECT ID, CUSTOMER_ID,
CASE
WHEN CUSTOMER_ID != CUSTOMER_ID_NEXT THEN 1
ELSE 0
END AS UNIQUE_FLAG FROM
(SELECT ID, CUSTOMER_ID,LEAD(CUSTOMER_ID, 1, 0) OVER (ORDER BY CUSTOMER_ID) AS CUSTOMER_ID_NEXT FROM TABLE)T

MYSQL count distinct datas depends on if condition

I have a rather unusual database query problem. The scenario is a little different from the usual one:
I have a table with three columns: Id, ItemId and TypeId. I need a count query that counts ItemId and TypeId together but ignores duplicate rows. For example:
Id ItemId TypeId
-- ------ ------
1 1 1 -> count +1
2 1 1 -> ignore
3 1 2 -> count -1
4 1 2 -> ignore
5 1 1 -> count +1
result count = 1
In short, if a row repeats the previous one, the count ignores it. But when TypeId changes for a specific item, the count should increase or decrease: TypeId equal to 1 means count += 1, TypeId equal to 2 means count -= 1.
In MySQL, you would seemingly use count(distinct):
select count(distinct itemId, typeId)
from t;
However, you really have a gaps-and-islands problem. You are looking at the ordering to see where things change.
If I trust that the id has no gaps, you can do:
select count(*)
from t left join
t tprev
on t.id = tprev.id + 1
where not ((t.itemId, t.typeId) <=> (tprev.itemId, tprev.typeId));
Try the following query. This employs User-defined session variables. It will work in all the cases (including gaps in Id):
SELECT
SUM(dt.factor) AS total_count
FROM
( SELECT
@factor := IF(@item = ItemId AND
@type = TypeId,
0,
IF(TypeID = 2, -1, 1)
) AS factor,
@item := ItemId,
@type := TypeId
FROM your_table
CROSS JOIN (SELECT @item := 0,
@type := 0,
@factor := 0) AS user_init_vars
ORDER BY Id
) AS dt
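The same +1 / -1 / 0 factor logic can be sketched without session variables by looking at the previous row with LAG. The example below is an illustration under assumptions: it runs on Python's sqlite3 module (SQLite 3.25+) rather than MySQL, and the table name `items` and the helper name `net_count` are made up.

```python
import sqlite3


def net_count(rows):
    """Sum +1 (TypeId 1) or -1 (TypeId 2) for each row that differs from
    the previous row in Id order; repeated rows contribute 0."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (Id INTEGER PRIMARY KEY, ItemId INTEGER, TypeId INTEGER)"
    )
    conn.executemany("INSERT INTO items (Id, ItemId, TypeId) VALUES (?, ?, ?)", rows)
    (total,) = conn.execute(
        """
        SELECT COALESCE(SUM(
                   CASE WHEN ItemId IS prev_item AND TypeId IS prev_type THEN 0
                        WHEN TypeId = 2 THEN -1
                        ELSE 1
                   END), 0)
        FROM (
            SELECT ItemId, TypeId,
                   LAG(ItemId) OVER (ORDER BY Id) AS prev_item,
                   LAG(TypeId) OVER (ORDER BY Id) AS prev_type
            FROM items
        ) AS with_prev
        """
    ).fetchone()
    conn.close()
    return total


# The question's sample data, then the same sequence with gaps in Id.
print(net_count([(1, 1, 1), (2, 1, 1), (3, 1, 2), (4, 1, 2), (5, 1, 1)]))
print(net_count([(10, 1, 1), (20, 1, 1), (35, 1, 2), (40, 1, 2), (90, 1, 1)]))
```

Because the comparison is against the previous row in Id order rather than against Id - 1, gaps in Id do not change the result.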

Getting the second max value of col2 while being grouped by col1

I'm facing a corner here...
The background:
TABLE myrecord (
id int # primary key
name varchar(32) # test name
d_when varchar(8) # date in yyyymmdd string format
)
Content:
id name d_when
100 Alan 20110201
101 Dave 20110304
102 Alan 20121123
103 Alan 20131001
104 Dave 20131002
105 Bob 20131004
106 Mike 20131101
In layman terms, I want to figure out who is a "returner" and when was his last (i.e., 'penultimate') visit.
something like the over enthusiastic:
SELECT SECOND_MAX(id), CORRESPONDING(d_when)
FROM myrecord
GROUP BY name
HAVING count(name)>1;
result expected:
101 Dave 20110304
102 Alan 20121123
I tried the following so far.
SELECT T1.id, t1.name, T1.d_when
FROM myrecord t1
WHERE id IN (SELECT MAX(id),
COUNT(id) cn
WHERE cn>1
ORDER BY d_when DESC)
but something is clearly not right here.
Here's one way...
SELECT x.*
FROM my_table x
JOIN my_table y
ON y.name=x.name
AND y.d_when >= x.d_when
GROUP
BY x.name
, x.d_when
HAVING COUNT(*) = 2;
Why is the second-to-last id necessary?
if it's not:
SELECT MAX(id), name, MAX(d_when)
FROM myrecord
GROUP BY name
HAVING count(name)>1
if it is:
SELECT name, max(id), max(d_when)
FROM myrecord
WHERE
-- get only the names that have more than one occurrence
name in (
SELECT name
FROM myrecord
GROUP BY name
HAVING COUNT(*) > 1
)
-- filter out the max id's
AND id NOT IN(
SELECT max(id)
FROM myrecord
GROUP BY name
)
GROUP BY name
or even better (thanks to @Andomar for the mention):
SELECT name, max(id), max(d_when)
FROM myrecord
WHERE
-- filter out the max id's, this will also filter out those with one record
id NOT IN(
SELECT max(id)
FROM myrecord
GROUP BY name
)
GROUP BY name
In MySQL, this retrieves everyone who has made a second visit, together with the date of that second visit.
Query:
SELECT *
FROM
(
SELECT
@ID:=CASE WHEN @Name <> Name THEN 1 ELSE @ID+1 END AS rn,
@Name:=Name AS Name,
ID, d_when
FROM
(SELECT ID, Name, d_when
FROM myrecord
ORDER BY Name asc, d_when asc
) rec1, (SELECT @ID:= 0) rec1_id, (SELECT @Name:= 0) rec1_nm
) rec
where rec.rn=2
Output:
rn Name ID d_when
2 Dave 104 20131002
2 Alan 102 20121123
Assuming the id column ascends over time, you can select the second-highest d_when per name like:
select name
, d_when
from YourTable yt1
where id in
(
select max(id)
from YourTable yt2
where id not in
(
select max(id)
from YourTable yt3
group by
name
)
group by
name
)
select * from
myrecord
where id in (
SELECT max(id)
FROM myrecord
WHERE id not in (SELECT MAX(id)
FROM myrecord
GROUP BY name
HAVING count(name)>1)
group by name )
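None of the answers above uses window functions, but where they are available (MySQL 8+, or SQLite 3.25+ as used here) the penultimate row per name can be picked with ROW_NUMBER. This is a swapped-in technique, not one from the answers. A minimal sketch on the question's data, via Python's sqlite3 module and assuming, as one answer does, that id ascends over time:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myrecord (id INTEGER PRIMARY KEY, name TEXT, d_when TEXT)")
conn.executemany(
    "INSERT INTO myrecord (id, name, d_when) VALUES (?, ?, ?)",
    [(100, "Alan", "20110201"), (101, "Dave", "20110304"),
     (102, "Alan", "20121123"), (103, "Alan", "20131001"),
     (104, "Dave", "20131002"), (105, "Bob", "20131004"),
     (106, "Mike", "20131101")],
)

# Number each person's visits from newest to oldest; rn = 2 is the
# penultimate visit, and people with a single visit never reach rn = 2.
penultimate = conn.execute(
    """
    SELECT id, name, d_when
    FROM (
        SELECT id, name, d_when,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY id DESC) AS rn
        FROM myrecord
    ) AS ranked
    WHERE rn = 2
    ORDER BY id
    """
).fetchall()

for row in penultimate:
    print(row)
```

The two rows returned are the ones listed as the expected result in the question.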