MySQL: strange behavior with GROUP BY in WHERE subselect - mysql

I hope you can help me with that topic.
I have one table, the relevant fields are VARCHAR id, VARCHAR name and date
3DF0001AB TESTING_1 2017-04-04
3DF0002ZG TESTING_2 2017-04-03
3DF0003ER TESTING_1 2017-04-01
3DF0004XY TESTING_1 2017-03-26
3DF0005UO TESTING_3 2017-03-25
The goal is to retrieve two entries for every name (>500), sorted by date. As I can just use database queries I tried following approach. Get one id for every name, UNION the result with the same query, but excluding the ids from the first set.
First step was to get one entry for every name. Result as expected, one id for every name.
SELECT id FROM table GROUP BY name;
Second step; using the above statement in the WHERE clause to receive results, that are not in the first result:
SELECT id FROM table WHERE id NOT IN (SELECT id FROM table GROUP BY name)
But here the result is empty, then I tried to invert the WHERE by using WHERE id IN instead of NOT IN. Expected result was that the same ids would show up when just using the subquery, result was all ids from the table. So I assume that the subquery delivers a wrong result, because when I copy the ids manually -> id IN ("3DF0001AB", ...) it works.
So maybe someone can explain the behavior and/or help to find a solution for the original problem.

This is a really bad practice:
SELECT id
FROM table
GROUP BY name;
Although MySQL allows this construct, the returned id is from an indeterminate row. You can even get different rows when you run the same query at different times.
A better approach is to use an aggregation function:
SELECT MAX(id)
FROM table
GROUP BY name;
Your real problem, though, is slightly different. When you use NOT IN, no rows are returned if any value in the IN list is NULL. That is how NOT IN is defined.
I would recommend using NOT EXISTS or LEFT JOIN instead, because their behavior is more intuitive:
SELECT t.id
FROM table t LEFT JOIN
(SELECT MAX(id) as id
FROM table t2
GROUP BY name
) tt
ON t.id = tt.id
WHERE tt.id IS NULL;

Related

Mysql DISTINCT with more than one column (remove duplicates)

My database is called: (training_session)
I try to print out some information from my data, but I do not want to have any duplicates. I do get it somehow, may someone tell me what I do wrong?
SELECT DISTINCT athlete_id AND duration FROM training_session
SELECT DISTINCT athlete_id, duration FROM training_session
It works perfectly if i use only one column, but when I add another. it does not work.
I think you misunderstood the use of DISTINCT.
There is big difference between using DISTINCT and GROUP BY.
Both have some sort of goal, but they have different purpose.
You use DISTINCT if you want to show a series of columns and never repeat. That means you dont care about calculations or group function aggregates. DISTINCT will show different RESULTS if you keep adding more columns in your SELECT (if the table has many columns)
You use GROUP BY if you want to show "distinctively" on a certain selected columns and you use group function to calculate the data related to it. Therefore you use GROUP BY if you want to use group functions.
Please check group functions you can use in this link.
https://dev.mysql.com/doc/refman/8.0/en/group-by-functions.html
EDIT 1:
It seems like you are trying to get the "latest" of a certain athlete, I'll assume the current scenario if there is no ID.
Here is my alternate solution:
SELECT a.athlete_id ,
( SELECT b.duration
FROM training_session as b
WHERE b.athlete_id = a.athlete_id -- connect
ORDER BY [latest column to sort] DESC
LIMIT 1
) last_duration
FROM training_session as a
GROUP BY a.athlete_id
ORDER BY a.athlete_id
This syntax is called IN-SELECT subquery. With the help of LIMIT 1, it shows the topmost record. In-select subquery must have 1 record to return or else it shows error.
MySQL's DISTINCT clause is used to filter out duplicate recordsets.
If your query was SELECT DISTINCT athlete_id FROM training_session then your output would be:
athlete_id
----------
1
2
3
4
5
6
As soon as you add another column to your query (in your example, the column called duration) then each record resulting from your query are unique, hence the results you're getting. In other words the query is working correctly.

Subquery returns more rows than straight same query in MySQL

I want to remove duplicates based on the combination of listings.product_id and listings.channel_listing_id
This simple query returns 400.000 rows (the id's of the rows I want to keep):
SELECT id
FROM `listings`
WHERE is_verified = 0
GROUP BY product_id, channel_listing_id
While this variation returns 1.600.000 rows, which are all records on the table, not only is_verified = 0:
SELECT *
FROM (
SELECT id
FROM `listings`
WHERE is_verified = 0
GROUP BY product_id, channel_listing_id
) AS keepem
I'd expect them to return the same amount of rows.
What's the reason for this? How can I avoid it (in order to use the subselect in the where condition of the DELETE statement)?
EDIT: I found that doing a SELECT DISTINCT in the outer SELECT "fixes" it (it returns 400.000 records as it should). I'm still not sure if I should trust this subquery, for there is no DISTINCT in the DELETE statement.
EDIT 2: Seems to be just a bug in the way phpMyAdmin reports the total count of the rows.
Your query as it stands is ambiguous. Suppose you have two listings with the same product_id and channel_id. Then what id is supposed to be returned? The first, the second? Or both, ignoring the GROUP request?
What if there is more than one id with different product and channel ids?
Try removing the ambiguity by selecting MAX(id) AS id and adding DISTINCT.
Are there any foreign keys to worry about? If not, you could pour the original table into a copy, empty the original and copy back in it the non-duplicates only. Messier, but you only do SELECTs or DELETEs guaranteed to succeed, and you also get to keep a backup.
Assign aliases in order to avoid field reference ambiguity:
SELECT
keepem.*
FROM
(
SELECT
innerStat.id
FROM
`listings` AS innerStat
WHERE
innerStat.is_verified = 0
GROUP BY
innerStat.product_id,
innerStat.channel_listing_id
) AS keepem

SQL: Group By is mismatching records

I'm trying to get the highest version within a group. My query:
SELECT
rubric_id,
max(version) as version,
group_id
FROM
rubrics
WHERE
client_id = 1
GROUP BY
group_id
The Data:
The Results:
The rubric of ID 2 does not have a version of 2, why is this being mismatched? What do I need to do to correct this?
Edit, not a duplicate:
This is not a duplicate of SQL Select only rows with Max Value on a Column , which is a post I have read and referenced before writing this. My question is not how to find the max, my question is why is the version not matched to the correct ID
MySQL is confusing you by letting you get away with having a column in your select that isn't in your group by. To resolve the issue, make sure you don't select any field that isn't in the group by.
Instead of trying to get everything in one statement, you will need to use a subquery to find the max_version_id and then join to it.
SELECT T.*
FROM rubrics T
JOIN
(
SELECT
group_id,
max(version) as max_version
FROM
rubrics
GROUP BY
group_id
) dedupe
on T.group_id = dedupe.group_id
and T.version_id = dedupe.max_version_id
WHERE
T.client_id = 1
Edit: So MySQL allows it, but I don't think it's a good practise to use it.
You are trying to query non-aggregated data from an aggregated query. You should not do that.
A GROUP BY takes the field it should make group of rows with (in your case, what you say with your GROUP BY is: give me a result per different group_id) and gives a result (the aggregated data) based on the grouping.
Here, you try to access non aggregated data (rubric_id in your case). For some reason, the query does not crash and picks a "random" id in your aggregated data.

match results generated by SQL WHERE IN CLAUSE

let's see I have a simple table like this:
name id
tom 1
jerry 2
... ...
And from the outside, I got a list contains the names (tom, jerry, kettie...)
I am trying to use WHERE IN clause to retrieve the id based on the name list.
I can do
SELECT id FROM mySimpleTable where name in ('tom','jerry','kettie');
So just iterate the name list and generate the contents in the parentheses.
This works, but the results is not in the input order, for example, the input is tom, jerry, kettie, the expected the result is 1,2,3, however, the output actually could be in any order.
Then how can I modify the SQL clause to make sure I get my input and output matched so that I can do the following process accrordingly. I heard JOIN may help in this situation.
SELECT id
FROM mySimpleTable
where name in ('tom','jerry','kettie')
order by field(name, 'tom','jerry','kettie')
I heard JOIN may help in this situation.
Yes it can help:
SELECT m.id
FROM mySimpleTable m
JOIN (
SELECT 'tom' AS name, 1 AS orderNum
UNION ALL
SELECT 'jerry' AS name, 2 AS orderNum
UNION ALL
SELECT 'kettie' AS name, 3 AS orderNum
) AS sub
ON m.name = sub.name
ORDER BY sub.orderNum ASC;
SqlFiddleDemo
This solution can be also used in different RDBMS. field is MySQL specific.
How it works:
Create derived table/subquery with values you need to check and ordering column
JOIN will return only rows that correspond each other based on name
ORDER BY column you've added in subquery
just select id,name from table_a where name in ('tom','jerry','happy') , you will have the combination of the input name and output id.
this entirely depends on where you're getting the list for your "in" clause.
if it's from somewhere on the outside, you probably should first turn the list into a temp table, adding an id column that indicates the order (see this answer for a start on how to do that) - and then do an inner join with it.
I did try to run your SQL query, and me for one did get the resultant output in the same order as that of the input. Well, but still it isn't necessary it would happen the same way every time, so the best way to arrange your output in a particular hierarchy is to use the ORDER BY clause. The syntax would be:
SELECT column_name
FROM table_name
WHERE conditions
ORDER BY column_name;
So in your case, the query would read as:
SELECT id
FROM mysimpletable
WHERE name
IN('tom','jerry','kettie'....)
ORDER BY id;
You can get more help with MySQL concepts here for further information.
Select
id
from mySimpleTable
where name in ('tom','jerry','kettie')
Order by id

MySQL group by issue

I'm having a strange problem with MySQL and would like to see if the community has any thoughts:
I have a table 'tbl' that contains
____________
| id | sdate |
And I'm trying to execute this query:
select id, max(sdate) as sd from tbl where id in(123) group by id;
This returns no results.
However, this query:
select id, sdate from tbl where id in(123);
Returns many results with id's and dates.
Why would the top query fail to produce results?
So IDs in this table aren't distinct, right? For example, it could be a list of questions here on StackOverflow with a viewed date, and each question ID could appear multiple times in the results. Otherwise, if the IDs are always unique then there's no point in doing a GROUP BY on them. When you're restricting the results to a single ID you don't technically need the GROUP BY clause since MAX() is an aggregate function that will return a single row.
What's the datatype of sdate? int/datetime?
It's perfectly fine to supply a single ID to an IN() clause; it just can't be blank: IN().
Is it possible to provide the output of "DESCRIBE tbl;" and a few example rows?
Turns out the index was corrupt. Running the following solved the issue:
REPAIR TABLE tbl;