Difference in queries results with group by and without it - mysql

I have two tables:
t1 with the following columns: name | key | length
t2 with the following columns: name | country.
I need to count all distinct keys with length>2000, grouped by country. So I wrote:
SELECT count(distinct key), country
from db.t1
inner join db.t2
on t1.name=t2.name
where length>2000
group by country;
But, when I make the query:
SELECT count(distinct key)
from db.t1
where Length>2000;
I expected the results to be equal, but they differ. For example, the first query gives me 125494 and the second gives 121653.
What is the reason for the difference? Note that some rows have an empty string ('') as the country. They don't seem to appear as a group; I counted them and found 134 such records, but I can't work out the reason.

Unless key is UNIQUE (in which case, why bother with the DISTINCT keywords?), there is no reason that your two queries should return the same results.
Suppose t1 contains:
+------+-----+--------+
| name | key | length |
+------+-----+--------+
| a | x | 5000 |
| b | x | 5000 |
| b | y | 5000 |
| c | z | 5000 |
+------+-----+--------+
And t2 contains:
+------+---------+
| name | country |
+------+---------+
| a | uk |
| b | fr |
| c | de |
+------+---------+
Then your queries will return:
First query:
SELECT count(distinct key), country
from db.t1
inner join db.t2
on t1.name=t2.name
where length>2000
group by country;
Will yield:
+---------------------+---------+
| count(distinct key) | country |
+---------------------+---------+
| 1 | uk |
| 2 | fr |
| 1 | de |
+---------------------+---------+
Second query:
SELECT count(distinct key)
from db.t1
where Length>2000;
Will yield:
+---------------------+
| count(distinct key) |
+---------------------+
| 3 |
+---------------------+
See it on sqlfiddle.

If you have multiple rows in t2 with the same name the join will be creating duplicates.
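The effect can be reproduced with the sample rows from the answer above. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the column is quoted as "key" because KEY is a keyword. A key shared by names in two countries is counted once per country, so the per-country counts add up to more than the overall distinct count.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (name TEXT, "key" TEXT, length INTEGER);
    CREATE TABLE t2 (name TEXT, country TEXT);
    INSERT INTO t1 VALUES ('a','x',5000), ('b','x',5000), ('b','y',5000), ('c','z',5000);
    INSERT INTO t2 VALUES ('a','uk'), ('b','fr'), ('c','de');
""")

# First query: distinct keys counted separately inside each country.
per_country = conn.execute("""
    SELECT country, COUNT(DISTINCT "key")
    FROM t1 INNER JOIN t2 ON t1.name = t2.name
    WHERE length > 2000
    GROUP BY country
    ORDER BY country
""").fetchall()

# Second query: distinct keys counted once over the whole table.
overall = conn.execute("""
    SELECT COUNT(DISTINCT "key") FROM t1 WHERE length > 2000
""").fetchone()[0]

# Key 'x' belongs to both 'uk' and 'fr', so it is counted twice in the sum.
sum_of_groups = sum(n for _, n in per_country)
print(per_country, sum_of_groups, overall)
```

Rows of t1 whose name has no match in t2 would push the difference the other way, since the inner join drops them.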

Related

MySQL: How to make a query between 2 tables that returns NULL to a row that isn't in the 2nd table?

I have these 2 tables
1st Table "Users"
+----+-----------+----------+
| ID | FirstName | LastName |
+----+-----------+----------+
| 1 | Jeff | Bezos |
| 2 | Bill | Gates |
| 3 | Elon | Musk |
+----+-----------+----------+
2nd Table "Records"
+----+--------+------------+
| ID | IDUser | RecordDate |
+----+--------+------------+
| 1 | 1 | 15/06/2021 |
| 2 | 2 | 05/06/2021 |
| 3 | 2 | 12/06/2021 |
| 4 | 2 | 02/06/2021 |
| 5 | 1 | 17/06/2021 |
+----+--------+------------+
These 2 tables are linked by a foreign key: Records.IDUser -> Users.ID
I want a query that returns this:
+-----------+----------+----------------+--------------------+
| FirstName | LastName | Lastest Record | Numbers of Records |
+-----------+----------+----------------+--------------------+
| Jeff | Bezos | 17/06/2021 | 2 |
| Bill | Gates | 12/06/2021 | 3 |
| Elon | Musk | NULL | NULL |
+-----------+----------+----------------+--------------------+
You need to use LEFT JOIN in order to get back users without records too; then the MAX and COUNT aggregate functions.
First version: This will return 0 for the number of records instead of NULL, when there are no records for a specific user. Latest record will be NULL as expected.
SELECT
FirstName,
LastName,
MAX(RecordDate) AS LatestRecord,
COUNT(Records.ID) AS NumberOfRecords
FROM Users LEFT JOIN Records on Users.ID = Records.IDUser
GROUP BY Users.ID;
If you want NULL instead of 0 (which normally you do not want), you can use the IF function like this:
SELECT
FirstName,
LastName,
MAX(RecordDate) AS LatestRecord,
IF(COUNT(Records.ID) > 0, COUNT(Records.ID), NULL) AS NumberOfRecords
FROM Users LEFT JOIN Records on Users.ID = Records.IDUser
GROUP BY Users.ID;
Second version: It might happen that running the above query will return an error, something like:
Error: ER_WRONG_FIELD_WITH_GROUP: ...; this is incompatible with sql_mode=only_full_group_by
This happens when/if the ONLY_FULL_GROUP_BY SQL mode is enabled (which it is by default since MySQL 5.7.5). In order to get around this error, you can use the ANY_VALUE function to select the nonaggregated fields:
SELECT
ANY_VALUE(FirstName) AS FirstName,
ANY_VALUE(LastName) AS LastName,
MAX(RecordDate) AS LatestRecord,
COUNT(Records.ID) AS NumberOfRecords
FROM Users LEFT JOIN Records on Users.ID = Records.IDUser
GROUP BY Users.ID;
A LEFT JOIN selects all users, even those that have no records:
select * from users left join records on records.IDUser = ID;
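A minimal sketch of the LEFT JOIN plus MAX/COUNT approach, run through Python's sqlite3 module as a stand-in for MySQL. Two things here are assumptions rather than part of the original answers: the dates are stored as ISO strings (YYYY-MM-DD) so that MAX compares them chronologically, and NULLIF(COUNT(...), 0) is used instead of IF because it behaves the same way in both engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE Records (ID INTEGER PRIMARY KEY, IDUser INTEGER, RecordDate TEXT);
    INSERT INTO Users VALUES (1,'Jeff','Bezos'), (2,'Bill','Gates'), (3,'Elon','Musk');
    -- Dates stored in ISO format (assumption) so string MAX matches date order.
    INSERT INTO Records VALUES
        (1, 1, '2021-06-15'), (2, 2, '2021-06-05'), (3, 2, '2021-06-12'),
        (4, 2, '2021-06-02'), (5, 1, '2021-06-17');
""")

rows = conn.execute("""
    SELECT Users.FirstName, Users.LastName,
           MAX(Records.RecordDate) AS LatestRecord,
           -- COUNT ignores NULLs, so users without records get 0;
           -- NULLIF turns that 0 into NULL.
           NULLIF(COUNT(Records.ID), 0) AS NumberOfRecords
    FROM Users
    LEFT JOIN Records ON Users.ID = Records.IDUser
    GROUP BY Users.ID, Users.FirstName, Users.LastName
    ORDER BY Users.ID
""").fetchall()

for row in rows:
    print(row)
```

Listing every selected non-aggregated column in the GROUP BY is another way to satisfy ONLY_FULL_GROUP_BY without ANY_VALUE.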

MySQL select unique rows in two columns with the highest value in one column

I have a basic table:
+-----+--------+------+------+
| id  | name   | cat  | time |
+-----+--------+------+------+
| 1 | jamie | 1 | 100 |
| 2 | jamie | 2 | 100 |
| 3 | jamie | 1 | 50 |
| 4 | jamie | 2 | 150 |
| 5 | bob | 1 | 100 |
| 6 | tim | 1 | 300 |
| 7 | alice | 4 | 100 |
+-----+--------+------+------+
I tried using the "Left Joining with self, tweaking join conditions and filters" part of this answer: SQL Select only rows with Max Value on a Column, but for some reason it breaks when there are records with a value of 0, and it also doesn't return every unique row.
When doing the query on this table I'd like to receive the following values:
+-----+--------+------+------+
| id  | name   | cat  | time |
+-----+--------+------+------+
| 1 | jamie | 1 | 100 |
| 4 | jamie | 2 | 150 |
| 5 | bob | 1 | 100 |
| 6 | tim | 1 | 300 |
| 7 | alice | 4 | 100 |
+-----+--------+------+------+
Because they are unique on name and cat and have the highest time value.
The query I adapted from the answer above is:
SELECT a.name, a.cat, a.id, a.time
FROM data A
INNER JOIN (
SELECT name, cat, id, MAX(time) as time
FROM data
WHERE extra_column = 1
GROUP BY name, cat
) b ON a.id = b.id AND a.time = b.time
The issue here is that id is unique per row, so you can't get the right id back from the subquery that computes the max; you have to join on the grouped values instead.
SELECT a.name, a.cat, a.id, a.time
FROM data a
INNER JOIN (
SELECT name, cat, MAX(time) as time
FROM data
WHERE extra_column = 1
GROUP BY name, cat
) b ON a.cat = b.cat AND a.name = b.name AND a.time = b.time
Think about it: which id is MySQL returning from the inline view? For jamie it could be 1 or 3 (cat 1) and 2 or 4 (cat 2). How would the engine know to pick the id of the row with the max time? It doesn't: it is "free to choose any value from each group, so unless they are the same, the values chosen are indeterminate." It could pick the wrong one, giving incorrect results, so you can't use it to join on.
https://dev.mysql.com/doc/refman/5.0/en/group-by-handling.html
If you want to use a self join, you could use this query:
SELECT
d1.*
FROM
data d1 LEFT JOIN data d2
ON d1.name=d2.name
AND d1.cat=d2.cat
AND d1.time<d2.time
WHERE
d2.time IS NULL
If you only need the highest time per name and cat, and not the id of the row, a plain GROUP BY is enough:
SELECT name, cat, MAX(time) AS time FROM data GROUP BY name, cat
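Both approaches (joining on the grouped values, and the self LEFT JOIN that keeps rows with no larger time in the same group) can be checked against the sample table. This is a sketch using Python's sqlite3 module in place of MySQL; the WHERE extra_column = 1 filter is left out because that column is not part of the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (id INTEGER PRIMARY KEY, name TEXT, cat INTEGER, time INTEGER);
    INSERT INTO data VALUES
        (1,'jamie',1,100), (2,'jamie',2,100), (3,'jamie',1,50), (4,'jamie',2,150),
        (5,'bob',1,100), (6,'tim',1,300), (7,'alice',4,100);
""")

# Approach 1: compute MAX(time) per (name, cat), then join back on those
# grouped columns plus the max value to recover the full row (including id).
join_ids = [r[0] for r in conn.execute("""
    SELECT a.id
    FROM data a
    INNER JOIN (
        SELECT name, cat, MAX(time) AS time
        FROM data
        GROUP BY name, cat
    ) b ON a.name = b.name AND a.cat = b.cat AND a.time = b.time
    ORDER BY a.id
""")]

# Approach 2: self LEFT JOIN; a row is kept only when no other row in the
# same (name, cat) group has a strictly larger time.
anti_join_ids = [r[0] for r in conn.execute("""
    SELECT d1.id
    FROM data d1
    LEFT JOIN data d2
      ON d1.name = d2.name AND d1.cat = d2.cat AND d1.time < d2.time
    WHERE d2.id IS NULL
    ORDER BY d1.id
""")]

print(join_ids, anti_join_ids)
```

Note that both queries return every tied row when two rows in a group share the same maximum time.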

MySQL: join grouped data to table - only first row joined

I have problem joining tables with following content:
Table RingOrderItem:
+----+-------------+--------+
| ID | ID_RingType | Amount |
+----+-------------+--------+
| 1 | A | 100 |
| 2 | B | 50 |
| 3 | A | 500 |
| 4 | C | 100 |
+----+-------------+--------+
Grouped table Rings - result of SELECT min(Rings.Number) AS Number, ID_RingType FROM Rings GROUP BY ID_RingType statement:
+--------+-------------+
| Number | ID_RingType |
+--------+-------------+
| 1 | A |
| 1 | B |
+--------+-------------+
I want to retrieve all records from RingOrderItem and join number from grouped table Rings to them, for which I used this query:
SELECT
roi.ID,
roi.ID_RingOrder,
roi.ID_RingType,
roi.Amount,
min(r.Number) AS `FromValue`,
min(r.Number) + roi.Amount - 1 AS `ToValue`
FROM
RingOrderItem AS roi
LEFT JOIN
(SELECT min(Rings.Number) AS Number, ID_RingType FROM Rings
GROUP BY ID_RingType)
AS r ON r.ID_RingType = roi.ID_RingType;
For some reason, I get only the first row from RingOrderItem table:
+----+--------------+-------------+--------+-----------+---------+
| ID | ID_RingOrder | ID_RingType | Amount | FromValue | ToValue |
+----+--------------+-------------+--------+-----------+---------+
| 1 | 1 | A | 100 | 1 | 100 |
+----+--------------+-------------+--------+-----------+---------+
I want all rows, and if the data cannot be joined (value C in ID_RingType), then simply return NULL.
Thanks,
Zbynek
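You don't need the two min() functions in the main query, since you are already getting the min values in the subquery. An aggregate function without a GROUP BY collapses the whole result set into a single row, which is why you only get the first row.
Also, it's not really a good idea to do math on a column that might be NULL.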
Try this:
SELECT
roi.ID,
roi.ID_RingOrder,
roi.ID_RingType,
roi.Amount,
r.Number AS FromValue,
COALESCE(r.Number, 0) + roi.Amount - 1 AS ToValue
FROM
RingOrderItem AS roi
LEFT JOIN
(
SELECT
MIN(Rings.Number) AS Number,
ID_RingType
FROM
Rings
GROUP BY
ID_RingType
) AS r ON roi.ID_RingType = r.ID_RingType;
Also, consider switching your LEFT JOIN ON clause to list the first table first. This is only for readability; it does not change the result.
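The single-row behaviour and the fix can be sketched as follows, using Python's sqlite3 module as a stand-in for MySQL. The contents of Rings and the ID_RingOrder values are hypothetical, since the question only shows the grouped result. This variant also leaves out COALESCE, so ToValue stays NULL for the unmatched type C, which is what the question asks for; with COALESCE(r.Number, 0) it would be 99 instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RingOrderItem (
        ID INTEGER PRIMARY KEY, ID_RingOrder INTEGER, ID_RingType TEXT, Amount INTEGER);
    CREATE TABLE Rings (Number INTEGER, ID_RingType TEXT);
    -- ID_RingOrder values and Rings rows are made up for this demo.
    INSERT INTO RingOrderItem VALUES (1,1,'A',100), (2,1,'B',50), (3,1,'A',500), (4,1,'C',100);
    INSERT INTO Rings VALUES (1,'A'), (2,'A'), (1,'B'), (5,'B');
""")

grouped_rings = """
    (SELECT MIN(Number) AS Number, ID_RingType FROM Rings GROUP BY ID_RingType)
"""

# MIN() in the outer query with no GROUP BY: the whole result is one group.
collapsed = conn.execute(f"""
    SELECT roi.ID, MIN(r.Number) AS FromValue
    FROM RingOrderItem AS roi
    LEFT JOIN {grouped_rings} AS r ON roi.ID_RingType = r.ID_RingType
""").fetchall()

# No aggregate in the outer query: one output row per RingOrderItem row.
fixed = conn.execute(f"""
    SELECT roi.ID, roi.ID_RingType, roi.Amount,
           r.Number AS FromValue,
           r.Number + roi.Amount - 1 AS ToValue
    FROM RingOrderItem AS roi
    LEFT JOIN {grouped_rings} AS r ON roi.ID_RingType = r.ID_RingType
    ORDER BY roi.ID
""").fetchall()

print(len(collapsed))
for row in fixed:
    print(row)
```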

MySQL Select row with lowest value in column

I have a table
+------+-----+-------+
| name | age | class |
+------+-----+-------+
| Ben | 4 | B |
| Alex | 7 | A |
| Jim | 3 | B |
| Ben | 5 | C |
| Ben | 2 | C |
| Alex | 9 | A |
+------+-----+-------+
I need a query so that I can select the person with the lowest age such that I get:
+------+-----+-------+
| name | age | class |
+------+-----+-------+
| Ben | 2 | C |
| Jim | 3 | B |
| Alex | 7 | A |
+------+-----+-------+
I've been messing with various combinations or GROUP BYs and ORDER BYs and can't seem to get it right.
Also, the table consists of about 8 million records so performance is important.
You first have to select the minimum age per class:
select min(age) as age, class as class from t group by class
(Note: I am assuming you want the minimum age per class. If you want the minimum age per name, then replace class with name in the queries ...)
Then you have to join the result with your table to get the respective rows.
The full SQL would be
select t.* from t
inner join
(
select min(age) as age, class as class from t group by class
) min_ages on t.age = min_ages.age and t.class = min_ages.class;
For optimal performance, make sure that age is indexed as well as class (or name, whichever you want in your group by expression).
Alternatively, to get the row with the minimum age per name:
SELECT t1.name, t1.age, t1.class FROM t t1
JOIN
(SELECT name, MIN(age) AS minage FROM t GROUP BY name) t2
ON t1.name = t2.name AND t1.age = t2.minage
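The per-class version can be verified against the sample table. This sketch uses Python's sqlite3 module rather than MySQL, and the composite index on (class, age) with the name idx_class_age is an assumption about what would help the grouped subquery and the join, not a measured result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (name TEXT, age INTEGER, class TEXT);
    -- Hypothetical composite index covering both the GROUP BY and the join.
    CREATE INDEX idx_class_age ON t (class, age);
    INSERT INTO t VALUES
        ('Ben',4,'B'), ('Alex',7,'A'), ('Jim',3,'B'),
        ('Ben',5,'C'), ('Ben',2,'C'), ('Alex',9,'A');
""")

# Step 1 (subquery): minimum age per class.
# Step 2 (join): pull the full rows that carry that minimum.
rows = conn.execute("""
    SELECT t.name, t.age, t.class
    FROM t
    INNER JOIN (
        SELECT class, MIN(age) AS age FROM t GROUP BY class
    ) min_ages ON t.class = min_ages.class AND t.age = min_ages.age
    ORDER BY t.age
""").fetchall()

for row in rows:
    print(row)
```

If two people in a class share the minimum age, both rows are returned.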

LIMIT results to n unique column values?

I have some MySQL results like this:
---------------------------
| name | something_random |
---------------------------
| john | ekjalsdjalfjkldd |
| alex | akjsldfjaekallee |
| alex | jkjlkjslakjfjflj |
| alex | kajslejajejjaddd |
| bob | ekakdie33kkd93ld |
| bob | 33kd993kakakl3ll |
| paul | 3k309dki595k3lkd |
| paul | 3k399kkfkg93lk3l |
etc...
This goes on for 1000's of rows of results. I need to limit the number of results to the first 50 unique names. I think there is a simple solution to this but I'm not sure.
I've tried using derived tables and variables but can't quite get there. If I could figure out how to increment a variable once every time a name is different I think I could say WHERE variable <= 50.
UPDATED
I've tried the Inner Join approach(es) suggested below. The problem is this:
The subselect SELECT DISTINCT name FROM testTable LIMIT 50 grabs the first 50 distinct names. Perhaps I wasn't clear enough in my original post, but this limits my query too much. In my query, not every name in the table is returned in the result. Let me modify my original example:
----------------------------------
| id | name | something_random |
----------------------------------
| 1 | john | ekjalsdjalfjkldd |
| 4 | alex | akjsldfjaekallee |
| 4 | alex | jkjlkjslakjfjflj |
| 4 | alex | kajslejajejjaddd |
| 6 | bob | ekakdie33kkd93ld |
| 6 | bob | 33kd993kakakl3ll |
| 12 | paul | 3k309dki595k3lkd |
| 12 | paul | 3k399kkfkg93lk3l |
etc...
So I added in some id numbers here. These ID numbers pertain to the people's names in the tables. So you can see in the results, not every single person/name in the table is necessarily in the result (due to some WHERE condition). So the 50th distinct name in the list will always have an ID number higher than 49. The 50th person could be id 79, 234, 4954 etc...
So back to the problem. The subselect SELECT DISTINCT name FROM testTable LIMIT 50 selects the first 50 names in the table. That means that my search results will be limited to names that have ID <=50, which is too constricting. If there are certain names that don't show up in the query (due to some WHERE condition), then they are still counted as one of the 50 distinct names. So you end up with too few results.
UPDATE 2
To #trapper: This is a basic simplification of what my query looks like:
SELECT
t1.id,
t1.name,
t2.details
FROM t1
LEFT JOIN t2 ON t1.id = t2.some_id
INNER JOIN
(SELECT DISTINCT name FROM t1 ORDER BY id LIMIT 0,50) s ON s.name = t1.name
WHERE
SOME CONDITIONS
ORDER BY
t1.id,
t1.name
And my results look like this:
----------------------------------
| id | name | details |
----------------------------------
| 1 | john | ekjalsdjalfjkldd |
| 3 | alex | akjsldfjaekallee |
| 3 | alex | jkjlkjslakjfjflj |
| 4 | alex | kajslejajejjaddd |
| 6 | bob | ekakdie33kkd93ld |
| 6 | bob | 33kd993kakakl3ll |
| 12 | paul | 3k309dki595k3lkd |
| 12 | paul | 3k399kkfkg93lk3l |
...
| 37 | bill | kajslejajejjaddd |
| 37 | bill | ekakdie33kkd93ld |
| 41 | matt | 33kd993kakakl3ll |
| 50 | jake | 3k309dki595k3lkd |
| 50 | jake | 3k399kkfkg93lk3l |
----------------------------------
The results stop at id=50. There are NOT 50 distinct names in the list. There are only roughly 23 distinct names.
My MySql syntax may be rusty, but the idea is to use a query to select the top 50 distinct names, then do a self-join on name and select the name and other information from the join.
select a.name, b.something_random
from Table b
inner join (select distinct name from Table order by RAND() limit 0,50) a
on a.name = b.name
SELECT DISTINCT name FROM table LIMIT 0,50
Edited: Ahh yes I misread question first time, this should do the trick though :)
SELECT a.name, b.something_random
FROM `table` b
INNER JOIN (SELECT DISTINCT name FROM `table` ORDER BY RAND() LIMIT 0,50) a
ON a.name = b.name ORDER BY a.name
How this works: the (SELECT DISTINCT name FROM `table` ORDER BY RAND() LIMIT 0,50) part is what pulls out the names to include in the join. So here I am taking 50 unique names at random, but you can change this to any other selection criteria if you want.
Then you join those results back into your table. This links each of those 50 selected names back to all of the rows with a matching name for your final results. Finally ORDER BY a.name just to be sure all the rows for each name end up grouped together.
This should do it:
SELECT tA.*
FROM
testTable tA
INNER JOIN
(SELECT distinct name FROM testTable LIMIT 50) tB ON tA.name = tB.name
;
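The problem described in the question's update (names filtered out by the WHERE clause still use up slots in the LIMIT) seems to come from the subquery not applying the same conditions as the outer query. A sketch of one way around it, using Python's sqlite3 module as a stand-in for MySQL: the visible column is a hypothetical placeholder for "SOME CONDITIONS", the limit is 2 instead of 50 to keep the data small, and GROUP BY name ORDER BY MIN(id) is used in place of DISTINCT so the "first" names are well defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'visible' is a made-up column standing in for the real WHERE conditions.
    CREATE TABLE testTable (id INTEGER, name TEXT, something_random TEXT, visible INTEGER);
    INSERT INTO testTable VALUES
        (1,'john','a',0), (4,'alex','b',1), (4,'alex','c',1),
        (6,'bob','d',1), (6,'bob','e',0), (12,'paul','f',1);
""")

def first_names_query(subquery_filter):
    # Build the query with or without the filter inside the name-picking subquery.
    return f"""
        SELECT tA.id, tA.name, tA.something_random
        FROM testTable tA
        INNER JOIN (
            SELECT name FROM testTable {subquery_filter}
            GROUP BY name ORDER BY MIN(id) LIMIT 2
        ) tB ON tA.name = tB.name
        WHERE tA.visible = 1
        ORDER BY tA.id, tA.something_random
    """

# Unfiltered subquery: 'john' takes a slot but is removed by the outer WHERE.
naive = conn.execute(first_names_query("")).fetchall()

# Same condition repeated inside the subquery: both slots go to visible names.
fixed = conn.execute(first_names_query("WHERE visible = 1")).fetchall()

print(sorted({name for _, name, _ in naive}))
print(sorted({name for _, name, _ in fixed}))
```

With the filter repeated in the subquery, the limit counts only names that actually appear in the final result.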