SQL Create Unique Value Flag - mysql

There are lots of questions/answers about selecting unique values in a MySQL query but I haven't seen any on creating a unique value flag.
I have a customer_ID that can appear more than once in a query output. I want to create a new column that flags whether the customer_ID is unique or not (0 or 1).
The output should look something like this:
ID | Customer ID | Unique_Flag
1 | 1234 | 1
2 | 2345 | 1
3 | 2345 | 0
4 | 5678 | 1
Please let me know if anybody needs clarifications.

You seem to want to mark the first occurrence as unique, but not others. So, let's join in the comparison value:
select t.*,
(id = min_id) as is_first_occurrence
from t join
(select customer_id, min(id) as min_id
from t
group by customer_id
) tt
on t.customer_id = tt.customer_id;
For most people, a "unique" flag would mean that the overall count is "1", not that this is merely the first appearance. If that is what you want, then you can use similar logic:
select t.*,
(id = min_id) as is_first_occurrence,
(cnt = 1) as is_unique
from t join
(select customer_id, min(id) as min_id, count(*) as cnt
from t
group by customer_id
) tt
on t.customer_id = tt.customer_id;
And, in MySQL 8+, you would use window functions:
select t.*,
(row_number() over (partition by customer_id order by id) = 1) as is_first_occurrence,
(count(*) over (partition by customer_id) = 1) as is_unique
from t;
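The difference between the two flags can be checked on the sample data from the question. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for MySQL and the table name t used in the answer above; the join-based query is used because it does not need window functions.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO t (id, customer_id) VALUES (?, ?)",
    [(1, 1234), (2, 2345), (3, 2345), (4, 5678)],
)

# Join each row to the per-customer minimum id and row count.
rows = conn.execute(
    """
    SELECT t.id,
           t.customer_id,
           (t.id = tt.min_id) AS is_first_occurrence,
           (tt.cnt = 1)       AS is_unique
    FROM t
    JOIN (SELECT customer_id, MIN(id) AS min_id, COUNT(*) AS cnt
          FROM t
          GROUP BY customer_id) tt
      ON t.customer_id = tt.customer_id
    ORDER BY t.id
    """
).fetchall()

for row in rows:
    print(row)
```

On this data is_first_occurrence reproduces the Unique_Flag column from the question, while is_unique is 0 for both rows of customer 2345.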

You can try the query below:
select id,a.customerid, case when cnt=1 then 1 else 0 end as Unique_Flag
from tablename a
left join
(select customerid, count(*) as cnt from tablename
group by customerid
)b on a.customerid=b.customerid

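You can use the LEAD function as shown below. Note that comparing each row with the next one flags the last row of each customer_id; to flag the first occurrence, as in the expected output, use LAG with the same ordering instead.
SELECT ID, CUSTOMER_ID,
CASE
WHEN CUSTOMER_ID != CUSTOMER_ID_NEXT THEN 1
ELSE 0
END AS UNIQUE_FLAG FROM
(SELECT ID, CUSTOMER_ID, LEAD(CUSTOMER_ID, 1, 0) OVER (ORDER BY CUSTOMER_ID, ID) AS CUSTOMER_ID_NEXT FROM tablename) T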
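Comparing a row with its neighbour gives different flags depending on the direction of the comparison. The sketch below runs both LEAD and LAG on the question's sample data, assuming an in-memory SQLite database (3.25 or later, for window functions) in place of MySQL 8 and a hypothetical table name tablename.

```python
import sqlite3

# SQLite 3.25+ is assumed as a stand-in for MySQL 8 window functions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany(
    "INSERT INTO tablename (id, customer_id) VALUES (?, ?)",
    [(1, 1234), (2, 2345), (3, 2345), (4, 5678)],
)

rows = conn.execute(
    """
    SELECT id,
           customer_id,
           CASE WHEN customer_id != next_id THEN 1 ELSE 0 END AS lead_flag,
           CASE WHEN customer_id != prev_id THEN 1 ELSE 0 END AS lag_flag
    FROM (SELECT id,
                 customer_id,
                 LEAD(customer_id, 1, 0) OVER (ORDER BY customer_id, id) AS next_id,
                 LAG(customer_id, 1, 0)  OVER (ORDER BY customer_id, id) AS prev_id
          FROM tablename) x
    ORDER BY id
    """
).fetchall()

lead_flags = [r[2] for r in rows]  # 1 on the last row of each customer_id
lag_flags = [r[3] for r in rows]   # 1 on the first row of each customer_id
print(lead_flags, lag_flags)
```

Ordering by customer_id and then id keeps the result deterministic when a customer_id repeats. The default value 0 passed to LEAD and LAG assumes that 0 is never a real customer_id.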

Related

SQL: select a count distinct for entries with higher ID than previous and conditions met

Say that I have the following data in a table:
ID ENTRY NAME ENTRY_ID
6 REMOVE ALICE 333
5 ADD JOHN 333
4 REMOVE JOHN 222
3 ADD ALICE 222
2 ADD AMANDA 111
1 ADD JOHN 111
I am trying to get a count of everyone whose latest entry is an "ADD", where the latest entry is the one with the highest "ENTRY_ID".
So in this case the count I am looking for is 2: "JOHN" has an "ADD" in 333 and "AMANDA" has an "ADD" in 111, and neither of them has a higher ENTRY_ID with "REMOVE". "ALICE" is not supposed to be counted, as her newest (highest) ENTRY_ID is a "REMOVE".
How can I most easily achieve this?
You can use window functions:
select count(*)
from (
select t.*, row_number() over(partition by name order by entry_id desc) rn
from mytable t
) t
where rn = 1 and entry = 'ADD'
Or using first_value():
select count(*) cnt
from (
select t.*, first_value(entry) over(partition by name order by entry_id desc) last_entry
from mytable t
) t
where last_entry = 'ADD'
This requires MySQL 8.0. In earlier versions, one option uses a correlated subquery for filtering:
select count(*)
from mytable t
where
t.entry = 'ADD'
and t.entry_id = (select max(t1.entry_id) from mytable t1 where t1.name = t.name)
You can get the list using aggregation:
select name
from t
group by name
having max(entry_id) = max(case when entry = 'ADD' then entry_id end);
This gets all names where the entry id of "ADD" matches the last entry id.
You can use a subquery and get the count:
select count(*)
from (select name
from t
group by name
having max(entry_id) = max(case when entry = 'ADD' then entry_id end)
) t;
Otherwise, I might suggest a correlated subquery:
select count(*)
from t
where t.entry = 'ADD' and
t.entry_id = (select max(t2.entry_id) from t t2 where t2.name = t.name);
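Both the aggregation and the correlated subquery can be checked against the sample rows from the question. This is a minimal sketch, assuming an in-memory SQLite database in place of MySQL and the table name t used in the answer above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, entry TEXT, name TEXT, entry_id INTEGER)"
)
conn.executemany(
    "INSERT INTO t (id, entry, name, entry_id) VALUES (?, ?, ?, ?)",
    [
        (6, "REMOVE", "ALICE", 333),
        (5, "ADD", "JOHN", 333),
        (4, "REMOVE", "JOHN", 222),
        (3, "ADD", "ALICE", 222),
        (2, "ADD", "AMANDA", 111),
        (1, "ADD", "JOHN", 111),
    ],
)

# Names whose highest entry_id belongs to an ADD row.
names = [
    r[0]
    for r in conn.execute(
        """
        SELECT name
        FROM t
        GROUP BY name
        HAVING MAX(entry_id) = MAX(CASE WHEN entry = 'ADD' THEN entry_id END)
        ORDER BY name
        """
    )
]

# The same count through a correlated subquery.
count_correlated = conn.execute(
    """
    SELECT COUNT(*)
    FROM t
    WHERE t.entry = 'ADD'
      AND t.entry_id = (SELECT MAX(t2.entry_id) FROM t t2 WHERE t2.name = t.name)
    """
).fetchone()[0]

print(names, count_correlated)
```

ALICE drops out because her highest entry_id (333) is a REMOVE, so the CASE expression only sees her ADD at 222.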

Count repeated value from a column for each row with same value in mysql

+------+-------+
| name | value |
+======+=======+
| 5 | 0 |
+------+-------+
| 4 | 0 |
+------+-------+
| 3 | 1 |
+------+-------+
| 4 | 1 |
+------+-------+
| 4 | 1 |
+------+-------+
| 5 | 0 |
+------+-------+
I want to obtain the most repeated value for each name.
Name 5 has the most repeated value 0
Name 4 has the most repeated value 1
Name 3 has the most repeated value 1
How can I do that in a single MySQL query?
Thanks
SOLVED
With the select statement from @nvidot and other posts on SO, I found that this is a common problem with this type of query.
Newer versions of MySQL come with ONLY_FULL_GROUP_BY enabled by default, and many of the solutions here will fail in testing with this condition.
So the working formula for me was:
SELECT DISTINCT t1.name, MAX(t1.occ), MAX(t2.value)
FROM (select name, value, count(*) as occ from `table` group by name, value order by occ desc) AS t1
JOIN (select name, value, count(*) as occ from `table` group by name, value order by occ desc) AS t2 ON t2.name = t1.name AND t2.occ = (
SELECT MAX(occ) FROM (select name, value, count(*) as occ from `table` group by name, value order by occ desc) t3 WHERE t3.name = t1.name
)
GROUP BY t1.name;
Oracle has a feature that handles this kind of request directly, called window functions, but MySQL has no such thing until MySQL 8.0.
SELECT `column`,
COUNT(`column`) AS `value_occurrence`
FROM `my_table`
GROUP BY `column`
ORDER BY `value_occurrence` DESC
select name, val
from (select name, val, max(occ)
from (select name, val, count(*) as occ
from `sample` group by name, val
order by occ desc) as counts
group by name) as maximums;
The outermost select is cosmetic: it only displays name and val.
The order by occ desc is what makes the correct val come out.
The following might be sufficient:
select name, val
from (select name, val, count(*) as occ
from `sample`
group by name, val
order by occ desc) as counts
group by name;
[edit]: The following should not trigger an error, as it does not select non-aggregated columns and does not rely on order by. Several rows may be returned for one name if several values tie for the maximum count.
select name,val
from (select name as maxname, max(occ) as maxocc
from (select name, val, count(*) as occ
from `sample`
group by name, val) as counts2
group by name) as maxs
join (select name, val, count(*) as numocc
from `sample`
group by name, val) as counts1
on name = maxname AND numocc = maxocc;
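The join of per-name maximum counts against per-pair counts can be run on the data from the question. The following is a sketch only, assuming an in-memory SQLite database in place of MySQL and the table and column names (sample, name, val) used in the answer above.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (name INTEGER, val INTEGER)")
conn.executemany(
    "INSERT INTO sample (name, val) VALUES (?, ?)",
    [(5, 0), (4, 0), (3, 1), (4, 1), (4, 1), (5, 0)],
)

# counts1/counts2: occurrences per (name, val); maxs: highest count per name.
rows = conn.execute(
    """
    SELECT name, val
    FROM (SELECT name AS maxname, MAX(occ) AS maxocc
          FROM (SELECT name, val, COUNT(*) AS occ
                FROM sample
                GROUP BY name, val) AS counts2
          GROUP BY name) AS maxs
    JOIN (SELECT name, val, COUNT(*) AS numocc
          FROM sample
          GROUP BY name, val) AS counts1
      ON name = maxname AND numocc = maxocc
    ORDER BY name
    """
).fetchall()

print(rows)
```

Every selected column is either grouped or aggregated, so the query does not depend on ONLY_FULL_GROUP_BY being disabled.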

SQL/MySQL DELETE all rows EXCEPT 2 of them

I have a database table setup like this:
id | code | group_id | status
---|-------|----------|----------
1 | abcd1 | group_1 | available
2 | abcd2 | group_1 | available
3 | adsd3 | group_1 | available
4 | dfgd4 | group_1 | available
5 | vfcd5 | group_1 | available
6 | bgcd6 | group_2 | available
7 | abcd7 | group_2 | available
8 | ahgf8 | group_2 | available
9 | dfgd9 | group_2 | available
10 | qwer6 | group_2 | available
In the example above, each group_id has 5 rows (arbitrary for the example; the number of rows will vary). I need to remove every row whose status is available except for 2 of them per group (which 2 does not matter, as long as 2 remain).
Basically every unique group_id should only have 2 total rows with status of available. I am able to do a simple SQL query to remove all of them, but struggling to come up with a SQL query to remove all except for 2 ... please helppppp :)
If code is unique, you can use subqueries to keep the "min" and "max"
DELETE FROM t
WHERE t.status = 'available'
AND (t.group_id, t.code) NOT IN (
SELECT group_id, MAX(code)
FROM t
WHERE status = 'available'
GROUP BY group_id
)
AND (t.group_id, t.code) NOT IN (
SELECT group_id, MIN(code)
FROM t
WHERE status = 'available'
GROUP BY group_id
)
Similarly, with an auto increment id:
DELETE FROM t
WHERE t.status = 'available'
AND t.id NOT IN (
SELECT MAX(id) FROM t WHERE status = 'available' GROUP BY group_id
UNION
SELECT MIN(id) FROM t WHERE status = 'available' GROUP BY group_id
)
I reworked the subquery into a UNION instead in this version, but the "AND" format would work just as well too. Also, if "code" was unique across the whole table, the NOT IN could be simplified down to excluding the group_id as well (though it would still be needed in the subqueries' GROUP BY clauses).
Edit: MySQL doesn't like subqueries referencing tables being UPDATEd/DELETEd in the WHERE of the query doing the UPDATE/DELETE; in those cases, you can usually double-wrap the subquery to give it an alias, causing MySQL to treat it as a temporary table (behind the scenes).
DELETE FROM t
WHERE t.status = 'available'
AND t.id NOT IN (
SELECT * FROM (
SELECT MAX(id) FROM t WHERE status = 'available' GROUP BY group_id
UNION
SELECT MIN(id) FROM t WHERE status = 'available' GROUP BY group_id
) AS a
)
Another alternative, I don't recall if MySQL complains as much about joins in DELETE/UPDATE....
DELETE t
FROM t
LEFT JOIN (
SELECT MIN(id) AS minId, MAX(id) AS maxId, 1 AS keep_flag
FROM t
WHERE status = 'available'
GROUP BY group_id
) AS tKeep ON t.id IN (tKeep.minId, tKeep.maxId)
WHERE t.status = 'available'
AND tKeep.keep_flag IS NULL
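The keep-the-MIN-and-MAX idea can be tried on the sample rows from the question. This is a minimal sketch, assuming an in-memory SQLite database in place of MySQL (so the DELETE ... JOIN form is not used) and one extra hypothetical row with status 'used' to show that non-available rows are left alone.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, code TEXT, group_id TEXT, status TEXT)"
)
codes = ["abcd1", "abcd2", "adsd3", "dfgd4", "vfcd5",
         "bgcd6", "abcd7", "ahgf8", "dfgd9", "qwer6"]
rows = [
    (i + 1, code, "group_1" if i < 5 else "group_2", "available")
    for i, code in enumerate(codes)
]
rows.append((11, "zzzz0", "group_1", "used"))  # hypothetical row, not in the question
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# Delete every available row except the lowest and highest id of each group.
conn.execute(
    """
    DELETE FROM t
    WHERE status = 'available'
      AND id NOT IN (
        SELECT * FROM (
          SELECT MAX(id) FROM t WHERE status = 'available' GROUP BY group_id
          UNION
          SELECT MIN(id) FROM t WHERE status = 'available' GROUP BY group_id
        ) AS a
      )
    """
)

remaining = [r[0] for r in conn.execute("SELECT id FROM t ORDER BY id")]
print(remaining)
```

A group with a single available row keeps that row, since its MIN(id) and MAX(id) are the same.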
To keep the min and max ids, I think a join is the simplest solution:
DELETE t
FROM t LEFT JOIN
(SELECT group_id, MIN(id) as min_id, MAX(id) as max_id
FROM t
WHERE t.status = 'available'
GROUP BY group_id
) tt
ON t.id IN (tt.min_id, tt.max_id)
WHERE t.status = 'available' AND
tt.group_id IS NULL;
If the column "id" is the PRIMARY KEY or a UNIQUE KEY, then we could use a correlated subquery to get the second lowest value for a particular group_id.
We could then use that to identify rows for group_id that have higher values of the "id" column.
A query something like this:
SELECT t.`id`
, t.`group_id`
FROM `setup_like_this` t
WHERE t.`status` = 'available'
AND t.`id`
> ( SELECT s.`id`
FROM `setup_like_this` s
WHERE s.`status` = 'available'
AND s.`group_id` = t.`group_id`
ORDER
BY s.`id`
LIMIT 1,1
)
We test that as a SELECT first, to examine the rows that are returned. When we are satisfied this query is returning the set of rows we want to delete, we can replace SELECT ... FROM with DELETE t.* FROM to convert it to a DELETE statement to remove the rows.
Converting the query directly to a DELETE statement raises Error 1093 (You can't specify target table for update in FROM clause).
One workaround is to make the query above into an inline view, and then join to the target table:
DELETE q.*
FROM `setup_like_this` q
JOIN ( -- inline view, query from above returns `id` of rows we want to delete
SELECT t.`id`
, t.`group_id`
FROM `setup_like_this` t
WHERE t.`status` = 'available'
AND t.`id`
> ( SELECT s.`id`
FROM `setup_like_this` s
WHERE s.`status` = 'available'
AND s.`group_id` = t.`group_id`
ORDER
BY s.`id`
LIMIT 1,1
)
) r
ON r.id = q.id
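The "test as a SELECT first, then delete" workflow can be sketched on the question's sample data. The example assumes an in-memory SQLite database in place of MySQL, writes LIMIT 1,1 in the equivalent LIMIT 1 OFFSET 1 form, and deletes by the fetched ids rather than with a multi-table DELETE.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE setup_like_this "
    "(id INTEGER PRIMARY KEY, code TEXT, group_id TEXT, status TEXT)"
)
codes = ["abcd1", "abcd2", "adsd3", "dfgd4", "vfcd5",
         "bgcd6", "abcd7", "ahgf8", "dfgd9", "qwer6"]
conn.executemany(
    "INSERT INTO setup_like_this VALUES (?, ?, ?, 'available')",
    [(i + 1, code, "group_1" if i < 5 else "group_2") for i, code in enumerate(codes)],
)

# Step 1: select the rows whose id is above the second-lowest id of their group.
doomed = [
    r[0]
    for r in conn.execute(
        """
        SELECT t.id
        FROM setup_like_this t
        WHERE t.status = 'available'
          AND t.id > (SELECT s.id
                      FROM setup_like_this s
                      WHERE s.status = 'available'
                        AND s.group_id = t.group_id
                      ORDER BY s.id
                      LIMIT 1 OFFSET 1)
        ORDER BY t.id
        """
    )
]

# Step 2: once the list looks right, delete exactly those rows.
conn.executemany("DELETE FROM setup_like_this WHERE id = ?", [(i,) for i in doomed])

remaining = [r[0] for r in conn.execute("SELECT id FROM setup_like_this ORDER BY id")]
print(doomed, remaining)
```

For a group with fewer than two available rows the subquery returns NULL, the comparison is never true, and nothing in that group is deleted.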
select id, code, group_id, status
from (
select id, code, group_id, status
, ROW_NUMBER() OVER (
PARTITION BY group_id
ORDER BY id DESC
) AS rownum
from a
) q
where rownum < 3

SQL DISTINCT EXISTS GROUP BY aggregate function

Does a relational database exist that has a GROUP BY aggregate function such as DISTINCT EXISTS that returns TRUE if there is more than one distinct value for the group and FALSE otherwise? I am looking for something that would iterate through the values in the group until the current value is not the same as the previous value, instead of counting ALL of the distinct values.
Example:
pv_name | time_stamp | value
A | 1 | 1
B | 2 | 1
C | 3 | 1
A | 4 | 2
C | 5 | 2
B | 6 | 3
SELECT pv_name
FROM example
WHERE time_stamp > 0 AND time_stamp < 6
GROUP BY pv_name
HAVING DISTINCT_EXISTS(value);
Result: A, C
SELECT pv_name
FROM example
WHERE time_stamp > 0 AND time_stamp < 6
GROUP BY pv_name
HAVING MIN(value)<>MAX(value);
Might get you there quicker depending on indexes. I don't think you'll do much better than this or COUNT(DISTINCT value) though.
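The MIN/MAX comparison can be checked against COUNT(DISTINCT value) on the example data. This is a minimal sketch, assuming an in-memory SQLite database as the engine and the table name example from the question.

```python
import sqlite3

# In-memory SQLite is used as the engine here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (pv_name TEXT, time_stamp INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO example (pv_name, time_stamp, value) VALUES (?, ?, ?)",
    [("A", 1, 1), ("B", 2, 1), ("C", 3, 1), ("A", 4, 2), ("C", 5, 2), ("B", 6, 3)],
)

# A group has more than one distinct value exactly when its MIN and MAX differ.
by_min_max = [
    r[0]
    for r in conn.execute(
        """
        SELECT pv_name
        FROM example
        WHERE time_stamp > 0 AND time_stamp < 6
        GROUP BY pv_name
        HAVING MIN(value) <> MAX(value)
        ORDER BY pv_name
        """
    )
]

# Reference result using COUNT(DISTINCT ...).
by_count_distinct = [
    r[0]
    for r in conn.execute(
        """
        SELECT pv_name
        FROM example
        WHERE time_stamp > 0 AND time_stamp < 6
        GROUP BY pv_name
        HAVING COUNT(DISTINCT value) > 1
        ORDER BY pv_name
        """
    )
]

print(by_min_max, by_count_distinct)
```

B is excluded because its second value sits at time_stamp 6, outside the filtered range. Note that the two forms can differ when value contains NULLs, since both MIN/MAX and COUNT(DISTINCT) ignore them.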
Have you tried joining to example twice?
Pseudo-code example:
with Q as
(
SELECT pv_name, value
FROM example
WHERE time_stamp > 0 AND time_stamp < 6
)
select distinct Q1.pv_name
from Q as Q1 inner join Q as Q2 on
Q1.pv_name=Q2.pv_name and
Q1.value<>Q2.value
You probably know about the COUNT(DISTINCT) function and you want to avoid it to prevent unnecessary computations.
It is hard to know why you are looking for this but I assume that it takes long time to find these groups using the most obvious query:
SELECT type, COUNT(DISTINCT product)
FROM aTable
GROUP BY type
HAVING COUNT(DISTINCT product) > 1
I can recommend you try the window functions. Try for example the new T-SQL's LAST_VALUE and FIRST_VALUE functions:
with c as (
SELECT type
,LAST_VALUE(product) OVER (PARTITION BY type ORDER BY product) lv
,FIRST_VALUE(product) OVER (PARTITION BY type ORDER BY product) pv
FROM aTable
)
SELECT * from c where lv <> pv
If the DB engine is smart enough it will find the first/last value for the group and will not try to count all the values, and therefore perform better.
For MySQL you can use helper variables to get the row_number per group based on the distinct values, something like this:
SELECT type, product
FROM (
SELECT @row_num := IF(@prev_type=type and @prev_prod=product,@row_num+1,1) AS RowNumber
,type
,product
,@prev_type := type
,@prev_prod := product
FROM Person,
(SELECT @row_num := 1) x,
(SELECT @prev_type := '') y,
(SELECT @prev_prod := '') z
ORDER BY type, product
) as a
WHERE RowNumber > 1
I think the having min (value) <> max (value) will be most efficient here. An alternative is:
Select distinct pv_name
From example e
Left join (
Select value
From example
Where ...
Group by value
Having count (*) = 1
) s on e.value = s.value
Where s.value is null
Or you could use NOT EXISTS against that subquery instead.
Include the relevant where clause in the sub query.

Fetch 2nd Highest value from MySql DB with GROUP BY

I have a table tbl_patient and I want to fetch the last 2 visits of each patient in order to compare whether the patient's condition is improving or degrading.
tbl_patient
id | patient_ID | visit_ID | patient_result
1 | 1 | 1 | 5
2 | 2 | 1 | 6
3 | 2 | 3 | 7
4 | 1 | 2 | 3
5 | 2 | 3 | 2
6 | 1 | 3 | 9
I tried the query below to fetch the last visit of each patient:
SELECT MAX(id), patient_result FROM `tbl_patient` GROUP BY `patient_ID`
Now I want to fetch the 2nd-to-last visit of each patient, but the query below gives me an error
(#1242 - Subquery returns more than 1 row)
SELECT id, patient_result FROM `tbl_patient` WHERE id <(SELECT MAX(id) FROM `tbl_patient` GROUP BY `patient_ID`) GROUP BY `patient_ID`
Where am I going wrong?
select p1.patient_id, p2.maxid id1, max(p1.id) id2
from tbl_patient p1
join (select patient_id, max(id) maxid
from tbl_patient
group by patient_id) p2
on p1.patient_id = p2.patient_id and p1.id < p2.maxid
group by p1.patient_id
id1 is the ID of the last visit, id2 is the ID of the 2nd to last visit.
Your first query doesn't get the last visits, since it gives results 5 and 6 instead of 2 and 9.
You can try this query:
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
union
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
where id not in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
GROUP BY patient_ID)
order by 1,2
SELECT id, patient_result FROM `tbl_patient` t1
JOIN (SELECT MAX(id) as max, patient_ID FROM `tbl_patient` GROUP BY `patient_ID`) t2
ON t1.patient_ID = t2.patient_ID
WHERE id <max GROUP BY t1.`patient_ID`
There are a couple of approaches to getting the specified resultset returned in a single SQL statement.
Unfortunately, most of those approaches yield rather unwieldy statements.
The more elegant looking statements tend to come with poor (or unbearable) performance when dealing with large sets. And the statements that tend to have better performance are less elegant looking.
Three of the most common approaches make use of:
correlated subquery
inequality join (nearly a Cartesian product)
two passes over the data
Here's an approach that uses two passes over the data, using MySQL user variables, which basically emulates the analytic RANK() OVER(PARTITION ...) function available in other DBMS:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM (
SELECT p.id
, p.patient_id
, p.visit_id
, p.patient_result
, @rn := if(@prev_patient_id = patient_id, @rn + 1, 1) AS rn
, @prev_patient_id := patient_id AS prev_patient_id
FROM tbl_patient p
JOIN (SELECT @rn := 0, @prev_patient_id := NULL) i
ORDER BY p.patient_id DESC, p.id DESC
) t
WHERE t.rn <= 2
Note that this involves an inline view, which means there's going to be a pass over all the data in the table to create a "derived table". Then, the outer query will run against the derived table. So, this is essentially two passes over the data.
This query can be tweaked a bit to improve performance, by eliminating the duplicated value of the patient_id column returned by the inline view. But I show it as above, so we can better understand what is happening.
This approach can be rather expensive on large sets, but is generally MUCH more efficient than some of the other approaches.
Note also that this query will return a row for a patient_id even if only one id value exists for that patient; it does not restrict the result to just those patients that have at least two rows.
It's also possible to get an equivalent resultset with a correlated subquery:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patient t
WHERE ( SELECT COUNT(1) AS cnt
FROM tbl_patient p
WHERE p.patient_id = t.patient_id
AND p.id >= t.id
) <= 2
ORDER BY t.patient_id ASC, t.id ASC
Note that this is making use of a "dependent subquery", which basically means that for each row returned from t, MySQL is effectively running another query against the database. So, this will tend to be very expensive (in terms of elapsed time) on large sets.
As another approach, if there are relatively few id values for each patient, you might be able to get by with an inequality join:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patient t
LEFT
JOIN tbl_patient p
ON p.patient_id = t.patient_id
AND t.id < p.id
GROUP
BY t.id
, t.patient_id
, t.visit_id
, t.patient_result
-- COUNT(p.id) counts only the matched later rows, so the two latest rows have 0 or 1
HAVING COUNT(p.id) < 2
Note that this will create a nearly Cartesian product for each patient. For a limited number of id values for each patient, this won't be too bad. But if a patient has hundreds of id values, the intermediate result can be huge, on the order of O(n**2).
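The correlated subquery and the inequality join can be compared on the sample rows from the question. The following is a sketch only, assuming an in-memory SQLite database in place of MySQL and the table name tbl_patient from the question; the join counts matched rows with COUNT(p.id) so that the unmatched latest row counts as 0.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_patient "
    "(id INTEGER PRIMARY KEY, patient_id INTEGER, visit_id INTEGER, patient_result INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl_patient VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7), (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9)],
)

# Correlated subquery: keep rows with at most 2 rows at or after them per patient.
correlated = conn.execute(
    """
    SELECT t.id, t.patient_id, t.visit_id, t.patient_result
    FROM tbl_patient t
    WHERE (SELECT COUNT(*)
           FROM tbl_patient p
           WHERE p.patient_id = t.patient_id
             AND p.id >= t.id) <= 2
    ORDER BY t.patient_id, t.id
    """
).fetchall()

# Inequality join: keep rows with fewer than 2 later rows for the same patient.
inequality_join = conn.execute(
    """
    SELECT t.id, t.patient_id, t.visit_id, t.patient_result
    FROM tbl_patient t
    LEFT JOIN tbl_patient p
      ON p.patient_id = t.patient_id
     AND t.id < p.id
    GROUP BY t.id, t.patient_id, t.visit_id, t.patient_result
    HAVING COUNT(p.id) < 2
    ORDER BY t.patient_id, t.id
    """
).fetchall()

print(correlated)
print(inequality_join)
```

Both queries return the two highest ids per patient, which assumes that a higher id means a later visit.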
Try this..
SELECT id, patient_result FROM tbl_patient AS tp WHERE id < ((SELECT MAX(id) FROM tbl_patient AS tp_max WHERE tp_max.patient_ID = tp.patient_ID) - 1) GROUP BY patient_ID
Why not use simply...
GROUP BY `patient_ID` DESC LIMIT 2
... and do the rest in the next step?