SQL DISTINCT EXISTS GROUP BY aggregate function - mysql

Does a relational database exist that has a GROUP BY aggregate function such as DISTINCT EXISTS that returns TRUE if there is more than one distinct value for the group and FALSE otherwise? I am looking for something that would iterate through the values in the group until the current value is not the same as the previous value, instead of counting ALL of the distinct values.
Example:
pv_name | time_stamp | value
A | 1 | 1
B | 2 | 1
C | 3 | 1
A | 4 | 2
C | 5 | 2
B | 6 | 3
SELECT pv_name
FROM example
WHERE time_stamp > 0 AND time_stamp < 6
GROUP BY pv_name
HAVING DISTINCT_EXISTS(value);
Result: A, C

SELECT pv_name
FROM example
WHERE time_stamp > 0 AND time_stamp < 6
GROUP BY pv_name
HAVING MIN(value)<>MAX(value);
This might get you there quicker, depending on your indexes. I don't think you'll do much better than this or COUNT(DISTINCT value), though.
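A quick way to check the MIN/MAX query against the sample data is a small script. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the query itself is plain SQL), and the function name `groups_with_several_values` is hypothetical.

```python
import sqlite3

# Sample rows from the question: (pv_name, time_stamp, value).
ROWS = [("A", 1, 1), ("B", 2, 1), ("C", 3, 1),
        ("A", 4, 2), ("C", 5, 2), ("B", 6, 3)]


def groups_with_several_values(rows):
    """Return the pv_names whose value changes inside 0 < time_stamp < 6."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE example (pv_name TEXT, time_stamp INTEGER, value INTEGER)")
        conn.executemany("INSERT INTO example VALUES (?, ?, ?)", rows)
        # MIN <> MAX holds exactly when the group has two or more distinct
        # non-NULL values, so no COUNT(DISTINCT ...) is needed.
        cur = conn.execute(
            """SELECT pv_name
               FROM example
               WHERE time_stamp > 0 AND time_stamp < 6
               GROUP BY pv_name
               HAVING MIN(value) <> MAX(value)
               ORDER BY pv_name""")
        return [name for (name,) in cur]
    finally:
        conn.close()


print(groups_with_several_values(ROWS))  # ['A', 'C']
```

Note that MIN and MAX ignore NULLs, so a group holding one value plus NULLs is treated as having a single value.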
Have you tried joining to example twice?
Example (needs common table expression support, e.g. MySQL 8.0+):
with Q as (
SELECT pv_name, value
FROM example
WHERE time_stamp > 0 AND time_stamp < 6
)
select distinct Q1.pv_name
from Q as Q1 inner join Q as Q2 on
Q1.pv_name = Q2.pv_name and
Q1.value <> Q2.value
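The self-join idea can be sanity-checked the same way. This is a sketch using Python's sqlite3 module in place of MySQL (an assumption), with a hypothetical helper name; the CTE has to carry the value column so the join can compare it.

```python
import sqlite3

# Sample rows from the question: (pv_name, time_stamp, value).
ROWS = [("A", 1, 1), ("B", 2, 1), ("C", 3, 1),
        ("A", 4, 2), ("C", 5, 2), ("B", 6, 3)]


def groups_via_self_join(rows):
    """Return pv_names that have two rows with different values in the window."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE example (pv_name TEXT, time_stamp INTEGER, value INTEGER)")
        conn.executemany("INSERT INTO example VALUES (?, ?, ?)", rows)
        # Q restricts the rows once; joining Q to itself pairs rows of the same
        # pv_name, and any pair with different values proves a second value exists.
        cur = conn.execute(
            """WITH Q AS (
                   SELECT pv_name, value
                   FROM example
                   WHERE time_stamp > 0 AND time_stamp < 6
               )
               SELECT DISTINCT Q1.pv_name
               FROM Q AS Q1
               INNER JOIN Q AS Q2
                 ON Q1.pv_name = Q2.pv_name AND Q1.value <> Q2.value
               ORDER BY Q1.pv_name""")
        return [name for (name,) in cur]
    finally:
        conn.close()


print(groups_via_self_join(ROWS))  # ['A', 'C']
```

The join can produce many pairs per group before DISTINCT collapses them, so on large groups this is usually slower than the MIN/MAX version.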

You probably know about the COUNT(DISTINCT) function and you want to avoid it to prevent unnecessary computations.
It is hard to know why you are looking for this, but I assume that it takes a long time to find these groups using the most obvious query:
SELECT type, COUNT(DISTINCT product)
FROM aTable
GROUP BY type
HAVING COUNT(DISTINCT product) > 1
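I recommend trying window functions, for example the LAST_VALUE and FIRST_VALUE functions (available in T-SQL since SQL Server 2012, and in MySQL 8.0):
with c as (
SELECT type
,LAST_VALUE(product) OVER (PARTITION BY type ORDER BY product
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) lv
,FIRST_VALUE(product) OVER (PARTITION BY type ORDER BY product) pv
FROM aTable
)
SELECT DISTINCT type from c where lv <> pv
LAST_VALUE needs the explicit ROWS BETWEEN ... UNBOUNDED FOLLOWING frame: with the default frame, which ends at the current row, it simply returns the current row's product.
If the DB engine is smart enough, it will find the first/last value for the group without counting all the values, and will therefore perform better.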
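A sketch of the FIRST_VALUE/LAST_VALUE comparison, run in SQLite through Python's sqlite3 module rather than SQL Server or MySQL (an assumption; it needs a SQLite build with window functions, 3.25 or later). The aTable rows are hypothetical, since the answer gives no data.

```python
import sqlite3

# Hypothetical (type, product) rows; the answer does not give aTable's data.
ROWS = [("fruit", "apple"), ("fruit", "pear"),
        ("veg", "leek"), ("veg", "leek"), ("nut", "pecan")]

QUERY = """
WITH c AS (
    SELECT type,
           -- The explicit full frame lets LAST_VALUE see the whole partition.
           LAST_VALUE(product) OVER (
               PARTITION BY type ORDER BY product
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS lv,
           FIRST_VALUE(product) OVER (
               PARTITION BY type ORDER BY product) AS fv
    FROM aTable
)
SELECT DISTINCT type FROM c WHERE lv <> fv ORDER BY type
"""


def types_with_several_products(rows):
    """Return the types whose smallest and largest product differ."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE aTable (type TEXT, product TEXT)")
        conn.executemany("INSERT INTO aTable VALUES (?, ?)", rows)
        return [t for (t,) in conn.execute(QUERY)]
    finally:
        conn.close()


print(types_with_several_products(ROWS))  # ['fruit']
```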
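For MySQL versions without window functions (before 8.0), you can use helper variables to number the distinct values within each group, something like this:
SELECT DISTINCT type
FROM (
SELECT @row_num := IF(@prev_type = type,
IF(@prev_prod = product, @row_num, @row_num + 1),
1) AS RowNumber
,type
,product
,@prev_type := type
,@prev_prod := product
FROM aTable,
(SELECT @row_num := 0) x,
(SELECT @prev_type := '') y,
(SELECT @prev_prod := '') z
ORDER BY type, product
) AS a
WHERE RowNumber > 1
This relies on MySQL evaluating the variable assignments in the SELECT list in order, which MySQL does not guarantee; setting user variables within expressions is also deprecated in MySQL 8.0.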
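The helper-variable query emulates a sequential scan that compares each row with the one before it, which is what the question asks for. The same idea outside the database, as a plain Python sketch over hypothetical (type, product) rows: each group is read only until a second distinct value shows up.

```python
from itertools import groupby
from operator import itemgetter


def distinct_exists(values):
    """Return True as soon as some value differs from the first one.

    Stops reading at the first difference instead of counting every
    distinct value in the group.
    """
    it = iter(values)
    missing = object()          # sentinel so None can be a legitimate value
    first = next(it, missing)
    if first is missing:
        return False            # an empty group has no second value
    return any(v != first for v in it)


def groups_with_distinct(rows):
    """rows: (group, value) pairs.

    Sorting makes each group's rows contiguous, like ORDER BY type, product.
    """
    ordered = sorted(rows, key=itemgetter(0, 1))
    return [key for key, grp in groupby(ordered, key=itemgetter(0))
            if distinct_exists(value for _, value in grp)]


rows = [("fruit", "apple"), ("fruit", "pear"),
        ("veg", "leek"), ("veg", "leek"), ("nut", "pecan")]
print(groups_with_distinct(rows))  # ['fruit']
```

any() short-circuits, and groupby skips whatever part of a group was left unread, so a group with many rows is abandoned as soon as a difference is found.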

I think HAVING MIN(value) <> MAX(value) will be the most efficient option here. An alternative is:
Select distinct pv_name
From example e
Left join (
Select value
From example
Where ...
Group by value
Having count(*) = 1
) s on e.value = s.value
Where s.value is null
Or you could use NOT EXISTS against that subquery instead.
Include the relevant WHERE clause in the subquery.

Related

SQL select maximum number of duplicates value in a column

Here I have this table:
Copies
nInv | Subject | LoanDate | BookCode |MemberCode|
1 |Storia |15/04/2019 00:00:00 |7844455544| 1 |
2 |Geografia |12/09/2020 00:00:00 |8004554785| 4 |
4 |Francese |17/05/2006 00:00:00 |8004894886| 3 |
5 |Matematica |17/06/2014 00:00:00 |8004575185| 3 |
I'm trying to find the value with the highest number of duplicates in the MemberCode column. In this case I should get 3 as the result, as that value appears twice in the table. Also, MemberCode is the PK of another table, so ideally I should select all rows of the second table whose MemberCode matches the one found in Copies. For the second part I guess I should write something like SELECT * FROM Table2, Copies WHERE Copies.MemberCode = Table2.MemberCode, but I'm stuck on the first part. Can you help me?
Use group by and limit:
select membercode, count(*) as num
from t
group by membercode
order by count(*) desc
limit 1;
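A sketch of the GROUP BY ... LIMIT 1 answer against the sample Copies rows, using Python's sqlite3 module as a stand-in for MySQL (an assumption). Only the columns needed for the count are loaded, and the helper name is hypothetical.

```python
import sqlite3

# Sample Copies rows from the question: (nInv, Subject, MemberCode).
COPIES = [(1, "Storia", 1), (2, "Geografia", 4),
          (4, "Francese", 3), (5, "Matematica", 3)]


def most_duplicated_member(rows):
    """Return (MemberCode, count) for the most frequent MemberCode."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Copies (nInv INTEGER, Subject TEXT, MemberCode INTEGER)")
        conn.executemany("INSERT INTO Copies VALUES (?, ?, ?)", rows)
        return conn.execute(
            """SELECT MemberCode, COUNT(*) AS num
               FROM Copies
               GROUP BY MemberCode
               ORDER BY COUNT(*) DESC
               LIMIT 1""").fetchone()
    finally:
        conn.close()


print(most_duplicated_member(COPIES))  # (3, 2)
```

If two member codes tie for the highest count, LIMIT 1 keeps only one of them, and which one is not defined unless the ORDER BY adds a tie-breaker.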
SELECT MAX(counted) FROM
(SELECT COUNT(MemberCode) AS counted
FROM table_name GROUP BY MemberCode) AS counts
This returns the highest count (2 here) rather than the MemberCode itself.
Using analytic functions, we can assign a rank to each member code based on its count. Then, we can figure out what its count is.
WITH cte AS (
SELECT t2.MemberCode, COUNT(*) AS cnt,
RANK() OVER (ORDER BY COUNT(*) DESC, t2.MemberCode) rnk
FROM Table2 t2
INNER JOIN Copies c ON c.MemberCode = t2.MemberCode
GROUP BY t2.MemberCode
)
SELECT cnt
FROM cte
WHERE rnk = 1;
Something like this (SQL Server syntax; in MySQL 8.0+ drop TOP(1) and add LIMIT 1 after the ORDER BY):
with top_dupe_member_cte as (
select top(1) MemberCode, Count(*) as cnt
from MemberTable
group by MemberCode
order by 2 desc)
select /* columns from your other table */
from OtherTable ot
join top_dupe_member_cte dmc on ot.MemberCode=dmc.MemberCode;

SQL Create Unique Value Flag

There are lots of questions/answers about selecting unique values in a MySQL query but I haven't seen any on creating a unique value flag.
I have a customer_ID that can appear more than once in a query output. I want to create a new column that flags whether the customer_ID is unique or not (0 or 1).
The output should look something like this:
ID | Customer ID | Unique_Flag
1 | 1234 | 1
2 | 2345 | 1
3 | 2345 | 0
4 | 5678 | 1
Please let me know if anybody needs clarifications.
You seem to want to mark the first occurrence as unique, but not the others. So, let's join in the comparison value:
select t.*,
(id = min_id) as is_first_occurrence
from t join
(select customer_id, min(id) as min_id
from t
group by customer_id
) tt
on t.customer_id = tt.customer_id;
For most people, a "unique" flag would mean that the overall count is "1", not that this is merely the first appearance. If that is what you want, then you can use similar logic:
select t.*,
(id = min_id) as is_first_occurrence,
(cnt = 1) as is_unique
from t join
(select customer_id, min(id) as min_id, count(*) as cnt
from t
group by customer_id
) tt
on t.customer_id = tt.customer_id;
And, in MySQL 8+, you would use window functions:
select t.*,
(row_number() over (partition by customer_id order by id) = 1) as is_first_occurrence,
(count(*) over (partition by customer_id) = 1) as is_unique
from t;
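The window-function version can be tried on the sample output from the question. This sketch runs it in SQLite via Python's sqlite3 module instead of MySQL 8 (an assumption; it needs SQLite 3.25+ for window functions). As in MySQL, a comparison yields 1 or 0.

```python
import sqlite3

# Sample (ID, Customer ID) pairs from the question.
ROWS = [(1, 1234), (2, 2345), (3, 2345), (4, 5678)]

QUERY = """
SELECT id, customer_id,
       -- 1 only on the lowest id of each customer
       (ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY id) = 1)
           AS is_first_occurrence,
       -- 1 only when the customer appears exactly once overall
       (COUNT(*) OVER (PARTITION BY customer_id) = 1) AS is_unique
FROM t
ORDER BY id
"""


def unique_flags(rows):
    """Return (id, customer_id, is_first_occurrence, is_unique) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id INTEGER, customer_id INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in unique_flags(ROWS):
    print(row)
```

The is_first_occurrence column reproduces the Unique_Flag the question shows; is_unique is the stricter "appears only once" reading.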
You can try the following:
select id,a.customerid, case when cnt=1 then 1 else 0 end as Unique_Flag
from tablename a
left join
(select customerid, count(*) as cnt from tablename
group by customerid
)b on a.customerid=b.customerid
You can use the LEAD function to get the required output. Ordering ties by ID DESC makes the flag land on the first occurrence of each customer, as in the expected output:
SELECT ID, CUSTOMER_ID,
CASE
WHEN CUSTOMER_ID != CUSTOMER_ID_NEXT THEN 1
ELSE 0
END AS UNIQUE_FLAG FROM
(SELECT ID, CUSTOMER_ID, LEAD(CUSTOMER_ID, 1, 0) OVER (ORDER BY CUSTOMER_ID, ID DESC) AS CUSTOMER_ID_NEXT FROM tablename) T
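A sketch of the LEAD idea checked against the question's expected flags, run in SQLite through Python's sqlite3 module (an assumption; SQLite 3.25+ is needed for LEAD). Within each customer the rows are visited from the highest ID down, so only the lowest ID sees a different customer next and gets flag 1. The default of 0 assumes 0 is never a real customer ID.

```python
import sqlite3

# Sample (ID, Customer ID) pairs from the question.
ROWS = [(1, 1234), (2, 2345), (3, 2345), (4, 5678)]

QUERY = """
SELECT id, customer_id,
       CASE WHEN customer_id <> LEAD(customer_id, 1, 0)
                                 OVER (ORDER BY customer_id, id DESC)
            THEN 1 ELSE 0 END AS unique_flag
FROM t
ORDER BY id
"""


def lead_flags(rows):
    """Return (id, customer_id, unique_flag) tuples ordered by id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id INTEGER, customer_id INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(lead_flags(ROWS))  # [(1, 1234, 1), (2, 2345, 1), (3, 2345, 0), (4, 5678, 1)]
```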

Can't get the right column value for a query containing max()

I have a set of data like:
Nm | item | type | value
21 | 19 | A | 15
22 | 40 | B | 10
21 | 20 | A | 80
32 | 40 | C | 40
I tried several queries and I always get this (for the record Nm = 21):
Nm | item | type | max(value)
21 | 19 | A | 80
which is not what I want, since the max value comes from item = 20.
select
* from tablename t1 where nm=21
order by value desc
limit 1
You need to find the row that has the maximum value for a particular nm. To do that, look up each nm, find its maximum value in a subquery, and then compare that maximum with the value in the main query.
Query:
select *
from item_table it_o
where it_o.value in
(select max(value)
from item_table it_i
where it_i.nm=it_o.nm)
Output:
nm item type value
22 40 B 10
21 20 A 80
32 40 C 40
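The correlated-subquery answer can be reproduced on the question's data. This is a sketch using Python's sqlite3 module in place of MySQL (an assumption), with `=` instead of `IN` since the subquery returns a single maximum per nm.

```python
import sqlite3

# Sample rows from the question: (nm, item, type, value).
ROWS = [(21, 19, "A", 15), (22, 40, "B", 10),
        (21, 20, "A", 80), (32, 40, "C", 40)]

QUERY = """
SELECT *
FROM item_table it_o
WHERE it_o.value = (SELECT MAX(value)        -- max for this row's nm only
                    FROM item_table it_i
                    WHERE it_i.nm = it_o.nm)
ORDER BY it_o.nm
"""


def rows_with_group_max(rows):
    """Return every row whose value is the maximum for its nm."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE item_table (nm INTEGER, item INTEGER, type TEXT, value INTEGER)")
        conn.executemany("INSERT INTO item_table VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in rows_with_group_max(ROWS):
    print(row)
```

If two rows of the same nm share the maximum value, both are returned.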
SELECT Nm, item, type, value
FROM ( SELECT Nm, MAX( value ) AS value
FROM YourTable
GROUP
BY Nm ) AS m
NATURAL JOIN YourTable
WHERE Nm = 21;
I've been asked to provide an explanation so here goes:
First, you need to find the maximum value (you haven't given a table name so I'm going to use YourTable):
SELECT MAX( value ) AS value
FROM YourTable
WHERE Nm = 21
Second, you want to project all attributes, which requires joining the table expression above back to YourTable, but we can't do that because we haven't projected the Nm attribute.
It's tempting to think we can simply project the attribute:
SELECT Nm, MAX( value ) AS value
FROM YourTable
WHERE Nm = 21
However, this makes SQL barf. To make SQL happy we must say which columns we are summarizing by (no matter how obvious it is!) using SQL's rather clunky GROUP BY syntax:
SELECT Nm, MAX( value ) AS value
FROM YourTable
WHERE Nm = 21
GROUP
BY Nm
Now we can join back to YourTable but again things aren't so simple:
SELECT Nm, item, type, value
FROM ( SELECT Nm, MAX( value ) AS value
FROM YourTable
WHERE Nm = 21
GROUP
BY Nm )
NATURAL JOIN YourTable;
Again, SQL barfs because we haven't given our derived table a name. Now you may be wondering: what is the point of giving it a name if we are using NATURAL JOIN, one of whose advantages over, say, INNER JOIN is that we don't need range variables? Well, there is no point; it is not needed. However, the SQL standard requires it. Therefore, we are forced to include a name, pointless though it is:
SELECT Nm, item, type, value
FROM ( SELECT Nm, MAX( value ) AS value
FROM YourTable
WHERE Nm = 21
GROUP
BY Nm ) AS pointless_name
NATURAL JOIN YourTable;
Note my SQL code above is different: one applies one's experience to change the structure of the query to make it generally more useful (sorry, I don't have an explanation beyond intuition for this!).
You have to ORDER BY value DESC so the max value comes first, then select only one row with LIMIT 1:
SELECT * FROM tablename ORDER BY value DESC LIMIT 1
Or you can select the max value in a subquery and match it in the main query (this query can return multiple rows if several rows share the maximum):
SELECT * FROM tablename WHERE value IN (SELECT MAX(value) FROM tablename)
You can try this:
select * from tablename where value = (SELECT MAX(value) FROM tablename)
OR (SQL Server syntax):
select top 1 * from tablename order by value desc
SELECT a.*
FROM YourTable a
JOIN
( SELECT nm
, MAX(value) value
FROM YourTable
GROUP
BY nm
) b
ON b.nm = a.nm
AND b.value = a.value

Fetch 2nd Highest value from MySql DB with GROUP BY

I have a table tbl_patient and I want to fetch the last 2 visits of each patient in order to compare whether the patient's condition is improving or degrading.
tbl_patient
id | patient_ID | visit_ID | patient_result
1 | 1 | 1 | 5
2 | 2 | 1 | 6
3 | 2 | 3 | 7
4 | 1 | 2 | 3
5 | 2 | 3 | 2
6 | 1 | 3 | 9
I tried the query below to fetch the last visit of each patient:
SELECT MAX(id), patient_result FROM `tbl_patient` GROUP BY `patient_ID`
Now I want to fetch the 2nd-to-last visit of each patient, but my query gives me this error:
(#1242 - Subquery returns more than 1 row)
SELECT id, patient_result FROM `tbl_patient` WHERE id <(SELECT MAX(id) FROM `tbl_patient` GROUP BY `patient_ID`) GROUP BY `patient_ID`
Where am I going wrong?
select p1.patient_id, p2.maxid id1, max(p1.id) id2
from tbl_patient p1
join (select patient_id, max(id) maxid
from tbl_patient
group by patient_id) p2
on p1.patient_id = p2.patient_id and p1.id < p2.maxid
group by p1.patient_id, p2.maxid
id1 is the ID of the last visit, id2 is the ID of the 2nd-to-last visit.
Your first query doesn't get the last visits, since it gives results 5 and 6 instead of 2 and 9.
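A sketch of the max-id self-join on the question's data, extended (as an illustration, not part of the answer) to join back to tbl_patient so both results appear side by side for the improving/degrading comparison. It uses Python's sqlite3 module as a stand-in for MySQL; the aliases cur and prev are hypothetical.

```python
import sqlite3

# Sample rows from the question: (id, patient_ID, visit_ID, patient_result).
ROWS = [(1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7),
        (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9)]

QUERY = """
SELECT x.patient_id, x.id1, x.id2,
       cur.patient_result  AS last_result,
       prev.patient_result AS prev_result
FROM (
    -- id1 = latest id per patient, id2 = highest id below it
    SELECT p1.patient_id, p2.maxid AS id1, MAX(p1.id) AS id2
    FROM tbl_patient p1
    JOIN (SELECT patient_id, MAX(id) AS maxid
          FROM tbl_patient
          GROUP BY patient_id) p2
      ON p1.patient_id = p2.patient_id AND p1.id < p2.maxid
    GROUP BY p1.patient_id, p2.maxid
) x
JOIN tbl_patient cur  ON cur.id  = x.id1
JOIN tbl_patient prev ON prev.id = x.id2
ORDER BY x.patient_id
"""


def last_two_visits(rows):
    """Return (patient_id, id1, id2, last_result, prev_result) tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("""CREATE TABLE tbl_patient (id INTEGER, patient_id INTEGER,
                        visit_id INTEGER, patient_result INTEGER)""")
        conn.executemany("INSERT INTO tbl_patient VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(last_two_visits(ROWS))  # [(1, 6, 4, 9, 3), (2, 5, 3, 2, 7)]
```

Patients with a single visit drop out here, because the inner join needs a row with a smaller id.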
You can try this query:
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
union
SELECT patient_ID,visit_ID,patient_result
FROM tbl_patient
where id in (
select max(id)
from tbl_patient
where id not in (
select max(id)
from tbl_patient
GROUP BY patient_ID)
GROUP BY patient_ID)
order by 1,2
SELECT id, patient_result FROM `tbl_patient` t1
JOIN (SELECT MAX(id) as max, patient_ID FROM `tbl_patient` GROUP BY `patient_ID`) t2
ON t1.patient_ID = t2.patient_ID
WHERE id <max GROUP BY t1.`patient_ID`
There are a couple of approaches to getting the specified resultset returned in a single SQL statement.
Unfortunately, most of those approaches yield rather unwieldy statements.
The more elegant-looking statements tend to come with poor (or unbearable) performance when dealing with large sets, and the statements that tend to perform better look less elegant.
Three of the most common approaches make use of:
correlated subquery
inequality join (nearly a Cartesian product)
two passes over the data
Here's an approach that uses two passes over the data, using MySQL user variables, which basically emulates the analytic ROW_NUMBER() OVER (PARTITION ...) function available in other DBMSs:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM (
SELECT p.id
, p.patient_id
, p.visit_id
, p.patient_result
, @rn := if(@prev_patient_id = patient_id, @rn + 1, 1) AS rn
, @prev_patient_id := patient_id AS prev_patient_id
FROM tbl_patient p
JOIN (SELECT @rn := 0, @prev_patient_id := NULL) i
ORDER BY p.patient_id DESC, p.id DESC
) t
WHERE t.rn <= 2
Note that this involves an inline view, which means there's going to be a pass over all the data in the table to create a "derived table". Then, the outer query will run against the derived table. So, this is essentially two passes over the data.
This query can be tweaked a bit to improve performance, by eliminating the duplicated value of the patient_id column returned by the inline view. But I show it as above, so we can better understand what is happening.
This approach can be rather expensive on large sets, but is generally MUCH more efficient than some of the other approaches.
Note also that this query will return a row for a patient_id even if only one id value exists for that patient; it does not restrict the result to patients that have at least two rows.
It's also possible to get an equivalent resultset with a correlated subquery:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patient t
WHERE ( SELECT COUNT(1) AS cnt
FROM tbl_patient p
WHERE p.patient_id = t.patient_id
AND p.id >= t.id
) <= 2
ORDER BY t.patient_id ASC, t.id ASC
Note that this is making use of a "dependent subquery", which basically means that for each row returned from t, MySQL is effectively running another query against the database. So, this will tend to be very expensive (in terms of elapsed time) on large sets.
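The correlated-subquery version can be checked on the sample rows. A sketch using Python's sqlite3 module instead of MySQL (an assumption), with a hypothetical helper name:

```python
import sqlite3

# Sample rows from the question: (id, patient_ID, visit_ID, patient_result).
ROWS = [(1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7),
        (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9)]

QUERY = """
SELECT t.id, t.patient_id, t.visit_id, t.patient_result
FROM tbl_patient t
-- keep a row when at most 2 rows of the same patient have id >= its id,
-- i.e. the row is one of that patient's two most recent
WHERE (SELECT COUNT(1)
       FROM tbl_patient p
       WHERE p.patient_id = t.patient_id
         AND p.id >= t.id) <= 2
ORDER BY t.patient_id ASC, t.id ASC
"""


def last_two_per_patient(rows):
    """Return each patient's two most recent rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("""CREATE TABLE tbl_patient (id INTEGER, patient_id INTEGER,
                        visit_id INTEGER, patient_result INTEGER)""")
        conn.executemany("INSERT INTO tbl_patient VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in last_two_per_patient(ROWS):
    print(row)
```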
As another approach, if there are relatively few id values for each patient, you might be able to get by with an inequality join:
SELECT t.id
, t.patient_id
, t.visit_id
, t.patient_result
FROM tbl_patient t
LEFT
JOIN tbl_patient p
ON p.patient_id = t.patient_id
AND t.id < p.id
GROUP
BY t.id
, t.patient_id
, t.visit_id
, t.patient_result
HAVING COUNT(p.id) < 2
Note that this will create a nearly Cartesian product for each patient. For a limited number of id values for each patient, this won't be too bad. But if a patient has hundreds of id values, the intermediate result can be huge, on the order of O(n^2).
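A sketch showing why the HAVING clause should count p.id rather than rows, run in SQLite through Python's sqlite3 module (an assumption). With a LEFT JOIN, a patient's newest row still produces one row of NULLs, so COUNT(1) counts it as 1 and COUNT(1) <= 2 keeps the three newest rows; COUNT(p.id) < 2 keeps rows with zero or one later row, i.e. the two newest.

```python
import sqlite3

# Sample rows from the question: (id, patient_ID, visit_ID, patient_result).
ROWS = [(1, 1, 1, 5), (2, 2, 1, 6), (3, 2, 3, 7),
        (4, 1, 2, 3), (5, 2, 3, 2), (6, 1, 3, 9)]

# Only these two fixed HAVING clauses are ever substituted into the query.
HAVING = {"count_rows": "COUNT(1) <= 2", "count_matches": "COUNT(p.id) < 2"}


def inequality_join(rows, variant):
    """Run the inequality-join query with the chosen HAVING clause."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("""CREATE TABLE tbl_patient (id INTEGER, patient_id INTEGER,
                        visit_id INTEGER, patient_result INTEGER)""")
        conn.executemany("INSERT INTO tbl_patient VALUES (?, ?, ?, ?)", rows)
        query = f"""
            SELECT t.id, t.patient_id, t.visit_id, t.patient_result
            FROM tbl_patient t
            LEFT JOIN tbl_patient p
              ON p.patient_id = t.patient_id AND t.id < p.id
            GROUP BY t.id, t.patient_id, t.visit_id, t.patient_result
            HAVING {HAVING[variant]}
            ORDER BY t.patient_id, t.id"""
        return conn.execute(query).fetchall()
    finally:
        conn.close()


print(len(inequality_join(ROWS, "count_rows")))     # 6: three rows per patient
print(inequality_join(ROWS, "count_matches"))       # two newest rows per patient
```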
Try this:
SELECT id, patient_result FROM tbl_patient AS tp WHERE id < ((SELECT MAX(id) FROM tbl_patient AS tp_max WHERE tp_max.patient_ID = tp.patient_ID) - 1) GROUP BY patient_ID
Why not simply use...
GROUP BY `patient_ID` DESC LIMIT 2
... and do the rest in the next step?

How do I pair rows together in MySQL?

I'm working on a simple time tracking app.
I've created a table that logs the IN and OUT times of employees.
Here is an example of how my data currently looks:
E_ID | In_Out | Date_Time
------------------------------------
3 | I | 2012-08-19 15:41:52
3 | O | 2012-08-19 17:30:22
1 | I | 2012-08-19 18:51:11
3 | I | 2012-08-19 18:55:52
1 | O | 2012-08-19 20:41:52
3 | O | 2012-08-19 21:50:30
I'm trying to create a query that will pair the IN and OUT times of an employee into one row like this:
E_ID | In_Time | Out_Time
------------------------------------------------
3 | 2012-08-19 15:41:52 | 2012-08-19 17:30:22
3 | 2012-08-19 18:55:52 | 2012-08-19 21:50:30
1 | 2012-08-19 18:51:11 | 2012-08-19 20:41:52
I hope I'm being clear in what I'm trying to achieve here.
Basically, I want to generate a report that has both the in and out time merged into one row.
Any help with this would be greatly appreciated.
Thanks in advance.
There are three basic approaches I can think of.
One approach makes use of MySQL user variables, one approach uses a theta JOIN, another uses a subquery in the SELECT list.
theta-JOIN
One approach is to use a theta-JOIN. This approach is a generic SQL approach (no MySQL specific syntax), which can work with multiple RDBMS.
N.B. With a large number of rows, this approach can create a significantly large intermediate result set, which can lead to problematic performance.
SELECT o.e_id, MAX(i.date_time) AS in_time, o.date_time AS out_time
FROM e `o`
LEFT
JOIN e `i` ON i.e_id = o.e_id AND i.date_time < o.date_time AND i.in_out = 'I'
WHERE o.in_out = 'O'
GROUP BY o.e_id, o.date_time
ORDER BY o.date_time
What this does is match every 'O' row for an employee with every 'I' row that is earlier, and then we use the MAX aggregate to pick out the 'I' record with the closest date time.
This works for perfectly paired data; could produce odd results for imperfect pairs... (two consecutive 'O' records with no intermediate 'I' row, will both get matched to the same 'I' row, etc.)
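The theta-JOIN query can be run as-is on the question's rows. This sketch uses Python's sqlite3 module as a stand-in for MySQL (an assumption); the timestamps are stored as ISO-formatted text, which sorts in time order, so MAX picks the latest earlier 'I' row.

```python
import sqlite3

# Sample rows from the question: (E_ID, In_Out, Date_Time).
ROWS = [(3, "I", "2012-08-19 15:41:52"), (3, "O", "2012-08-19 17:30:22"),
        (1, "I", "2012-08-19 18:51:11"), (3, "I", "2012-08-19 18:55:52"),
        (1, "O", "2012-08-19 20:41:52"), (3, "O", "2012-08-19 21:50:30")]

QUERY = """
SELECT o.e_id, MAX(i.date_time) AS in_time, o.date_time AS out_time
FROM e o
LEFT JOIN e i              -- every earlier 'I' row of the same employee
  ON i.e_id = o.e_id AND i.date_time < o.date_time AND i.in_out = 'I'
WHERE o.in_out = 'O'
GROUP BY o.e_id, o.date_time
ORDER BY o.date_time
"""


def pair_in_out(rows):
    """Return (e_id, in_time, out_time) for every 'O' row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE e (e_id INTEGER, in_out TEXT, date_time TEXT)")
        conn.executemany("INSERT INTO e VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in pair_in_out(ROWS):
    print(row)
```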
correlated subquery in SELECT list
Another approach is to use a correlated subquery in the SELECT list. This can have sub-optimal performance, but is sometimes workable (and is occasionally the fastest way to return the specified result set... this approach works best when we have a limited number of rows returned in the outer query.)
SELECT o.e_id
, (SELECT MAX(i.date_time)
FROM e `i`
WHERE i.in_out = 'I'
AND i.e_id = o.e_id
AND i.date_time < o.date_time
) AS in_time
, o.date_time AS out_time
FROM e `o`
WHERE o.in_out = 'O'
ORDER BY o.date_time
User variables
Another approach is to make use of MySQL user variables. (This is a MySQL-specific approach, and is a workaround to the "missing" analytic functions.)
What this query does is order all of the rows by e_id, then by date_time, so we can process them in order. Whenever we encounter an 'O' (out) row, we use the value of date_time from the immediately preceding 'I' row as the in_time.
N.B.: This usage of MySQL user variables is dependent on MySQL performing operations in a specific order, a predictable plan. The use of the inline views (or "derived tables", in MySQL parlance) gets us a predictable execution plan. But this behavior is subject to change in future releases of MySQL.
SELECT c.e_id
, CAST(c.in_time AS DATETIME) AS in_time
, c.out_time
FROM (
SELECT IF(@prev_e_id = d.e_id,@in_time,@in_time:=NULL) AS reset_in_time
, @in_time := IF(d.in_out = 'I',d.date_time,@in_time) AS in_time
, IF(d.in_out = 'O',d.date_time,NULL) AS out_time
, @prev_e_id := d.e_id AS e_id
FROM (
SELECT e_id, date_time, in_out
FROM e
JOIN (SELECT @prev_e_id := NULL, @in_time := NULL) f
ORDER BY e_id, date_time, in_out
) d
) c
WHERE c.out_time IS NOT NULL
ORDER BY c.out_time
This works for the set of data you have; it needs more thorough testing and tweaking to ensure you get the result set you want with quirky data, when the rows are not perfectly paired (e.g. two 'O' rows with no 'I' row between them, an 'I' row with no subsequent 'O' row, etc.).
Unfortunately, MySQL (before version 8.0) doesn't have a ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...) function like SQL Server's, or this would be incredibly easy.
But, there is a way to do this in MySQL:
set @in_num := 0, @in_e_id := NULL, @out_num := 0, @out_e_id := NULL;
select emp_in.e_id,
emp_in.in_time,
emp_out.out_time
from
(
select e_id, date_time in_time,
@in_num := if(@in_e_id = e_id, @in_num + 1, 1) as rn,
@in_e_id := e_id as dummy
from mytable
where in_out = 'I'
order by e_id, date_time
) emp_in
join
(
select e_id, date_time out_time,
@out_num := if(@out_e_id = e_id, @out_num + 1, 1) as rn,
@out_e_id := e_id as dummy
from mytable
where in_out = 'O'
order by e_id, date_time
) emp_out
on emp_in.e_id = emp_out.e_id
and emp_in.rn = emp_out.rn
order by emp_in.e_id, emp_in.in_time
Basically, this creates two subqueries, each of which generates a row number for its rows: one subquery handles the in times and the other the out times.
Then you JOIN the two subqueries on the employee ID and the row number.
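In MySQL 8.0 and later, which added ROW_NUMBER(), the pairing can be written directly: number each employee's 'I' rows and 'O' rows separately by time, then join the n-th 'I' to the n-th 'O'. A sketch run in SQLite through Python's sqlite3 module (an assumption; SQLite 3.25+ for window functions), with a hypothetical helper name:

```python
import sqlite3

# Sample rows from the question: (E_ID, In_Out, Date_Time).
ROWS = [(3, "I", "2012-08-19 15:41:52"), (3, "O", "2012-08-19 17:30:22"),
        (1, "I", "2012-08-19 18:51:11"), (3, "I", "2012-08-19 18:55:52"),
        (1, "O", "2012-08-19 20:41:52"), (3, "O", "2012-08-19 21:50:30")]

QUERY = """
WITH numbered AS (
    SELECT e_id, in_out, date_time,
           -- separate 1, 2, 3, ... sequences per employee and per direction
           ROW_NUMBER() OVER (PARTITION BY e_id, in_out ORDER BY date_time) AS rn
    FROM e
)
SELECT i.e_id, i.date_time AS in_time, o.date_time AS out_time
FROM numbered i
JOIN numbered o
  ON o.e_id = i.e_id AND o.rn = i.rn AND o.in_out = 'O'
WHERE i.in_out = 'I'
ORDER BY i.e_id, i.date_time
"""


def pair_by_row_number(rows):
    """Return (e_id, in_time, out_time), pairing the n-th IN with the n-th OUT."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE e (e_id INTEGER, in_out TEXT, date_time TEXT)")
        conn.executemany("INSERT INTO e VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for row in pair_by_row_number(ROWS):
    print(row)
```

Like the other approaches, this assumes perfectly alternating data: a missing 'I' or 'O' shifts every later pair for that employee.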