LIMIT results to n unique column values? - mysql

I have some MySQL results like this:
---------------------------
| name | something_random |
---------------------------
| john | ekjalsdjalfjkldd |
| alex | akjsldfjaekallee |
| alex | jkjlkjslakjfjflj |
| alex | kajslejajejjaddd |
| bob | ekakdie33kkd93ld |
| bob | 33kd993kakakl3ll |
| paul | 3k309dki595k3lkd |
| paul | 3k399kkfkg93lk3l |
etc...
This goes on for thousands of rows of results. I need to limit the results to the first 50 unique names. I think there is a simple solution to this, but I'm not sure.
I've tried using derived tables and variables but can't quite get there. If I could figure out how to increment a variable once every time a name is different I think I could say WHERE variable <= 50.
UPDATED
I've tried the Inner Join approach(es) suggested below. The problem is this:
The subselect SELECT DISTINCT name FROM testTable LIMIT 50 grabs the first 50 distinct names. Perhaps I wasn't clear enough in my original post, but this limits my query too much. In my query, not every name in the table is returned in the result. Let me modify my original example:
----------------------------------
| id | name | something_random |
----------------------------------
| 1 | john | ekjalsdjalfjkldd |
| 4 | alex | akjsldfjaekallee |
| 4 | alex | jkjlkjslakjfjflj |
| 4 | alex | kajslejajejjaddd |
| 6 | bob | ekakdie33kkd93ld |
| 6 | bob | 33kd993kakakl3ll |
| 12 | paul | 3k309dki595k3lkd |
| 12 | paul | 3k399kkfkg93lk3l |
etc...
So I added in some id numbers here. These ID numbers pertain to the people's names in the tables. So you can see in the results, not every single person/name in the table is necessarily in the result (due to some WHERE condition). So the 50th distinct name in the list will always have an ID number higher than 49. The 50th person could be id 79, 234, 4954 etc...
So back to the problem. The subselect SELECT DISTINCT name FROM testTable LIMIT 50 selects the first 50 names in the table. That means that my search results will be limited to names that have ID <=50, which is too constricting. If there are certain names that don't show up in the query (due to some WHERE condition), then they are still counted as one of the 50 distinct names. So you end up with too few results.
UPDATE 2
To #trapper: This is a basic simplification of what my query looks like:
SELECT
t1.id,
t1.name,
t2.details
FROM t1
LEFT JOIN t2 ON t1.id = t2.some_id
INNER JOIN
(SELECT DISTINCT name FROM t1 ORDER BY id LIMIT 0,50) s ON s.name = t1.name
WHERE
SOME CONDITIONS
ORDER BY
t1.id,
t1.name
And my results look like this:
----------------------------------
| id | name | details |
----------------------------------
| 1 | john | ekjalsdjalfjkldd |
| 3 | alex | akjsldfjaekallee |
| 3 | alex | jkjlkjslakjfjflj |
| 4 | alex | kajslejajejjaddd |
| 6 | bob | ekakdie33kkd93ld |
| 6 | bob | 33kd993kakakl3ll |
| 12 | paul | 3k309dki595k3lkd |
| 12 | paul | 3k399kkfkg93lk3l |
...
| 37 | bill | kajslejajejjaddd |
| 37 | bill | ekakdie33kkd93ld |
| 41 | matt | 33kd993kakakl3ll |
| 50 | jake | 3k309dki595k3lkd |
| 50 | jake | 3k399kkfkg93lk3l |
----------------------------------
The results stop at id=50. There are NOT 50 distinct names in the list. There are only roughly 23 distinct names.

My MySQL syntax may be rusty, but the idea is to use a subquery to select 50 distinct names, then join it back to the table on name and select the name and other information from the join.
select a.name, b.something_random
from `Table` b
inner join (select distinct name from `Table` order by RAND() limit 0,50) a
on a.name = b.name

SELECT DISTINCT name FROM `table` LIMIT 0,50
Edited: Ahh yes I misread question first time, this should do the trick though :)
SELECT a.name, b.something_random
FROM `table` b
INNER JOIN (SELECT DISTINCT name FROM `table` ORDER BY RAND() LIMIT 0,50) a
ON a.name = b.name ORDER BY a.name
How this works: the (SELECT DISTINCT name FROM `table` ORDER BY RAND() LIMIT 0,50) part is what pulls out the names to include in the join. Here I am taking 50 unique names at random, but you can change this to any other selection criteria if you want.
Then you join those results back into your table. This links each of those 50 selected names back to all of the rows with a matching name for your final results. Finally ORDER BY a.name just to be sure all the rows for each name end up grouped together.
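The asker's update suggests the derived table also has to apply the same filter as the outer query, otherwise names that are filtered out later still use up slots in the LIMIT. Below is a minimal sketch of that idea, using Python's built-in sqlite3 as a stand-in for MySQL. The table t1, the filter id >= :min_id, and the helper first_n_names are assumptions made for illustration, and GROUP BY name ORDER BY MIN(id) is swapped in for DISTINCT so that "first" has a defined meaning.

```python
import sqlite3

# SQLite stands in for MySQL; the table layout and filter are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, name TEXT, something_random TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        (1, "john", "ekjalsdjalfjkldd"),
        (4, "alex", "akjsldfjaekallee"),
        (4, "alex", "jkjlkjslakjfjflj"),
        (6, "bob", "ekakdie33kkd93ld"),
        (6, "bob", "33kd993kakakl3ll"),
        (12, "paul", "3k309dki595k3lkd"),
    ],
)


def first_n_names(conn, n, min_id):
    """Return every row belonging to the first n names that pass the filter."""
    sql = """
        SELECT b.id, b.name, b.something_random
        FROM t1 b
        INNER JOIN (
            -- The filter is repeated here so that excluded names
            -- do not count toward the limit.
            SELECT name
            FROM t1
            WHERE id >= :min_id
            GROUP BY name
            ORDER BY MIN(id)
            LIMIT :n
        ) a ON a.name = b.name
        WHERE b.id >= :min_id
        ORDER BY b.id, b.name, b.something_random
    """
    return conn.execute(sql, {"n": n, "min_id": min_id}).fetchall()


rows = first_n_names(conn, n=2, min_id=4)
names = sorted({name for _, name, _ in rows})
```

With the filter inside the derived table, john (id 1) no longer takes one of the two slots, so the result holds all rows for alex and bob.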

This should do it:
SELECT tA.*
FROM
testTable tA
INNER JOIN
(SELECT distinct name FROM testTable LIMIT 50) tB ON tA.name = tB.name
;

Related

MYSQL show all entries sorted by 2 columns random on one column

We are looking to return the rows of a query in groups, displaying all entries of each group in sort order: the groups in random order based on set_id, and the rows within each group ordered by sort_id.
So, randomly it will show:
Carl,
Phil,
Wendy,
Tina,
Rick,
Joe
or
Tina,
Rick,
Joe,
Carl,
Phil,
Wendy
This query is always showing Tina/Rick/Joe first
SELECT * FROM products ORDER BY set_id, rand()
Any help would be appreciated
+---------+--------+-------+----------+
| id | set_id | name | sort_id |
+---------+--------+-------+----------+
| 1 | AA |Rick | 2 |
| 2 | BB |Carl | 1 |
| 3 | AA |Joe | 3 |
| 4 | AA |Tina | 1 |
| 5 | BB |Phil | 2 |
| 6 | BB |Wendy | 3 |
+---------+--------+-------+----------+
If you need a random comma-separated list of names, this will do the trick.
It keeps the groups together and preserves the correct sorting within each group.
Query
SELECT
GROUP_CONCAT(Table_names_rand.names) as names
FROM (
SELECT
*
FROM (
SELECT
GROUP_CONCAT(name ORDER BY sort_id) as names
FROM
Table1
GROUP BY
set_id
)
AS Table1_names
ORDER BY
RAND()
)
AS Table_names_rand
Result
| names |
|-------------------------------|
| Carl,Phil,Wendy,Tina,Rick,Joe |
or
| names |
|-------------------------------|
| Tina,Rick,Joe,Carl,Phil,Wendy |
demo http://www.sqlfiddle.com/#!9/487ac9/9
If you need the random names output as individual records:
Query
SELECT
Table1.name
FROM
Table1
CROSS JOIN (
SELECT
GROUP_CONCAT(Table_names_rand.names) as names
FROM (
SELECT
*
FROM (
SELECT
GROUP_CONCAT(name ORDER BY sort_id) as names
FROM
Table1
GROUP BY
set_id
)
AS Table1_names
ORDER BY
RAND()
)
AS Table_names_rand
)
AS Table_names_rand
ORDER BY
FIND_IN_SET(name, Table_names_rand.names)
Result
| name |
|-------|
| Carl |
| Phil |
| Wendy |
| Tina |
| Rick |
| Joe |
or
| name |
|-------|
| Tina |
| Rick |
| Joe |
| Carl |
| Phil |
| Wendy |
demo http://www.sqlfiddle.com/#!9/487ac9/28
If we strip away the randomness of the group ordering, your query would look like this:
SELECT
*
FROM
products
ORDER BY
set_id,
sort_id;
The ordering by set_id is necessary to "group" the results without really grouping them. You do not want to use GROUP BY here, because then the rows of each group would be aggregated, meaning that only one row per group would be returned.
Since you only want to randomize the groups, you need to write another query that assigns a random number to each group, like the one below:
SELECT
set_id,
RAND() as 'rnd'
FROM
products
GROUP BY
set_id
The GROUP BY clause makes sure that each group is only selected once. The result set will look like this:
| set_id | rnd |
+--------+------+
| AA | 0.21 |
| BB | 0.1 |
With that result we can then randomize the output, by combining both queries with a JOIN on the set_id field. This will add the randomly generated number from the second query to the result set of the first query and therefore extend the static set_id with the randomized, but still for all group members equal, rnd:
SELECT
products.*
FROM
products
JOIN (
SELECT
set_id,
RAND() as 'rnd'
FROM
products
GROUP BY
set_id
) as rnd ON rnd.set_id = products.set_id
ORDER BY
rnd.rnd,
products.set_id,
products.sort_id;
Keep in mind that it is important to still order by products.set_id, because two groups may be assigned the same random number. If the result were not also ordered by products.set_id, the members of those groups could be interleaved.
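The join against one random number per group can be tried out with the sample rows from the question. This is only a sketch: it uses Python's built-in sqlite3 rather than MySQL, so SQLite's RANDOM() takes the place of RAND(), and the helper name shuffled_groups is an assumption made for illustration.

```python
import sqlite3

# SQLite stands in for MySQL; RANDOM() replaces MySQL's RAND().
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER, set_id TEXT, name TEXT, sort_id INTEGER)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        (1, "AA", "Rick", 2),
        (2, "BB", "Carl", 1),
        (3, "AA", "Joe", 3),
        (4, "AA", "Tina", 1),
        (5, "BB", "Phil", 2),
        (6, "BB", "Wendy", 3),
    ],
)


def shuffled_groups(conn):
    """Return names with groups in random order and rows sorted inside each group."""
    sql = """
        SELECT p.name
        FROM products p
        JOIN (
            -- One random number per set_id, shared by all rows of that group.
            SELECT set_id, RANDOM() AS rnd
            FROM products
            GROUP BY set_id
        ) r ON r.set_id = p.set_id
        ORDER BY r.rnd, p.set_id, p.sort_id
    """
    return [name for (name,) in conn.execute(sql)]


names = shuffled_groups(conn)
```

Each call returns either the AA group followed by BB or the other way round, with Tina, Rick, Joe and Carl, Phil, Wendy always kept in sort_id order.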

MySQL select unique rows in two columns with the highest value in one column

I have a basic table:
+-----+--------+------+------+
| id | name | cat | time |
+-----+--------+------+------+
| 1 | jamie | 1 | 100 |
| 2 | jamie | 2 | 100 |
| 3 | jamie | 1 | 50 |
| 4 | jamie | 2 | 150 |
| 5 | bob | 1 | 100 |
| 6 | tim | 1 | 300 |
| 7 | alice | 4 | 100 |
+-----+--------+------+------+
I tried using the "Left Joining with self, tweaking join conditions and filters" part of this answer: SQL Select only rows with Max Value on a Column, but for some reason it breaks when there are records with a value of 0, and it also doesn't return every unique answer.
When doing the query on this table I'd like to receive the following values:
+-----+--------+------+------+
| id | name | cat | time |
+-----+--------+------+------+
| 1 | jamie | 1 | 100 |
| 4 | jamie | 2 | 150 |
| 5 | bob | 1 | 100 |
| 6 | tim | 1 | 300 |
| 7 | alice | 4 | 100 |
+-----+--------+------+------+
Because they are unique on name and cat and have the highest time value.
The query I adapted from the answer above is:
SELECT a.name, a.cat, a.id, a.time
FROM data A
INNER JOIN (
SELECT name, cat, id, MAX(time) as time
FROM data
WHERE extra_column = 1
GROUP BY name, cat
) b ON a.id = b.id AND a.time = b.time
The issue here is that id is unique per row, so you can't get a meaningful id alongside the MAX; you have to join on the grouped values instead. (The aliases are also written in one case throughout, since table aliases are case-sensitive on some platforms.)
SELECT a.name, a.cat, a.id, a.time
FROM data a
INNER JOIN (
SELECT name, cat, MAX(time) as time
FROM data
WHERE extra_column = 1
GROUP BY name, cat
) b ON a.cat = b.cat AND a.name = b.name AND a.time = b.time
Think about it: which id is MySQL returning from the inline view? For jamie it could be 1 or 3 in cat 1, and 2 or 4 in cat 2. How does the engine know to pick the id of the row with the max time? It doesn't: it is "free to choose any value from each group, so unless they are the same, the values chosen are indeterminate." It could pick the wrong one, producing incorrect results, so you can't use it to join on.
https://dev.mysql.com/doc/refman/5.0/en/group-by-handling.html
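As a quick check of the join on grouped values, here is a small sketch that runs it against the sample table from the question, using Python's built-in sqlite3 in place of MySQL. The extra_column filter is left out because the sample table does not include that column.

```python
import sqlite3

# SQLite stands in for MySQL; sample rows are taken from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, name TEXT, cat INTEGER, time INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        (1, "jamie", 1, 100),
        (2, "jamie", 2, 100),
        (3, "jamie", 1, 50),
        (4, "jamie", 2, 150),
        (5, "bob", 1, 100),
        (6, "tim", 1, 300),
        (7, "alice", 4, 100),
    ],
)

# Join on the grouping columns plus the aggregated value, never on id.
max_rows = conn.execute(
    """
    SELECT a.id, a.name, a.cat, a.time
    FROM data a
    INNER JOIN (
        SELECT name, cat, MAX(time) AS time
        FROM data
        GROUP BY name, cat
    ) b ON a.name = b.name AND a.cat = b.cat AND a.time = b.time
    ORDER BY a.id
    """
).fetchall()
ids = [row[0] for row in max_rows]
```

The ids returned are 1, 4, 5, 6 and 7, which matches the result the question asks for.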
If you want to use a self join, you could use this query:
SELECT
d1.*
FROM
data d1 LEFT JOIN data d2
ON d1.name=d2.name
AND d1.cat=d2.cat
AND d1.time<d2.time
WHERE
d2.time IS NULL
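Since the question mentions trouble with time values of 0, the following sketch runs the self join on a few hypothetical rows that include zeros, using Python's built-in sqlite3 as a stand-in for MySQL. The filter tests for a missing match (IS NULL), not for a falsy value, so a zero time should not be dropped.

```python
import sqlite3

# SQLite stands in for MySQL; these rows are made up to include zero times.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (id INTEGER, name TEXT, cat INTEGER, time INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        (1, "jamie", 1, 0),
        (2, "jamie", 1, 50),
        (3, "bob", 1, 0),
    ],
)

# A row survives only if no other row in its (name, cat) group has a larger time.
top_rows = conn.execute(
    """
    SELECT d1.id, d1.name, d1.cat, d1.time
    FROM data d1
    LEFT JOIN data d2
        ON d1.name = d2.name
        AND d1.cat = d2.cat
        AND d1.time < d2.time
    WHERE d2.time IS NULL
    ORDER BY d1.id
    """
).fetchall()
```

Here bob's only row has time 0 and is still returned. Note that if two rows in a group tie for the highest time, this form returns both of them.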
It is very simple:
SELECT name, cat, MAX(time) AS time FROM data GROUP BY name, cat

MySQL Select row with lowest value in column

I have a table
+------+-----+-------+
| name | age | class |
+------+-----+-------+
| Ben | 4 | B |
| Alex | 7 | A |
| Jim | 3 | B |
| Ben | 5 | C |
| Ben | 2 | C |
| Alex | 9 | A |
+------+-----+-------+
I need a query so that I can select the person with the lowest age such that I get:
+------+-----+-------+
| name | age | class |
+------+-----+-------+
| Ben | 2 | C |
| Jim | 3 | B |
| Alex | 7 | A |
+------+-----+-------+
I've been messing with various combinations of GROUP BYs and ORDER BYs and can't seem to get it right.
Also, the table consists of about 8 million records so performance is important.
You first have to select the minimum age per class:
select min(age) as age, class as class from t group by class
(Note: I am assuming you want the minimum age per class. If you want the minimum age per name, then replace class with name in the queries.)
Then you have to join the result with your table to get the respective rows.
The full SQL would be
select t.* from t
inner join
(
select min(age) as age, class as class from t group by class
) min_ages on t.age = min_ages.age and t.class = min_ages.class;
For optimal performance, make sure that age is indexed as well as class (or name, whichever you want in your group by expression).
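The following sketch runs the per-class query against the sample rows, using Python's built-in sqlite3 as a stand-in for MySQL. The composite index on (class, age) and its name idx_class_age are assumptions made for illustration; whether it helps on the real 8-million-row table would need to be checked with EXPLAIN.

```python
import sqlite3

# SQLite stands in for MySQL; sample rows are taken from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, age INTEGER, class TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("Ben", 4, "B"),
        ("Alex", 7, "A"),
        ("Jim", 3, "B"),
        ("Ben", 5, "C"),
        ("Ben", 2, "C"),
        ("Alex", 9, "A"),
    ],
)
# Hypothetical composite index covering both the grouping and the join.
conn.execute("CREATE INDEX idx_class_age ON t (class, age)")

youngest = conn.execute(
    """
    SELECT t.name, t.age, t.class
    FROM t
    INNER JOIN (
        SELECT MIN(age) AS age, class
        FROM t
        GROUP BY class
    ) min_ages ON t.age = min_ages.age AND t.class = min_ages.class
    ORDER BY t.age
    """
).fetchall()
```

On the sample data this yields Ben (2, C), Jim (3, B) and Alex (7, A), the rows shown in the question.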
SELECT t1.name, t1.age, t1.class FROM `table` t1
JOIN
(SELECT name, MIN(age) AS minage FROM `table` GROUP BY name) t2
ON t1.name = t2.name AND t1.age = t2.minage

Mysql include column with no rows returned for specific dates

I would like to ask a quick question regarding a mysql query.
I have a table named trans :
+----+---------------------+------+-------+----------+----------+
| ID | Date | User | PCNum | Customer | trans_In |
+----+---------------------+------+-------+----------+----------+
| 8 | 2013-01-23 16:24:10 | test | PC2 | George | 10 |
| 9 | 2013-01-23 16:27:22 | test | PC2 | Nick | 0 |
| 10 | 2013-01-24 16:28:48 | test | PC2 | Ted | 10 |
| 11 | 2013-01-25 16:36:40 | test | PC2 | Danny | 10 |
+----+---------------------+------+-------+----------+----------+
and another named customers :
+----+---------+-----------+
| ID | Name | Surname |
+----+---------+-----------+
| 1 | George | |
| 2 | Nick | |
| 3 | Ted | |
| 4 | Danny | |
| 5 | Alex | |
| 6 | Mike | |
.
.
.
.
+----+---------+-----------+
I want to view the sum of trans_in column for specific customers in a date range BUT ALSO include in the result set, those customers that haven't got any records in the selected date range. Their sum of trans_in could appear as NULL or 0 it doesn't matter...
I have the following query :
SELECT
`Date`,
Customer,
SUM(trans_in) AS 'input'
FROM trans
WHERE Customer IN('George','Nick','Ted','Danny')
AND `Date` >= '2013-01-24'
GROUP BY Customer
ORDER BY input DESC;
But this will only return the sum for 'Ted' and 'Danny' because they only have transactions after the 24th of January...
How can I include all the customers listed in the WHERE ... IN (...) clause, even those who have no transactions in the selected date range?
I suppose I'll have to join them somehow with the customers table, but I cannot figure out how.
Thanks in advance!!
:)
In order to include all records from one table without matching records in another, you have to use a LEFT JOIN.
SELECT
t.`Date`,
c.name,
SUM(t.trans_in) AS 'input'
FROM customers c LEFT JOIN trans t ON (c.name = t.Customer AND t.`Date` >= '2013-01-24')
WHERE c.name IN('George','Nick','Ted','Danny')
GROUP BY c.name
ORDER BY input DESC;
Of course, I would mention that you should be referencing customer by ID, and not by name in your related table. Your current setup leads to information duplication. If the customer changes their name, you now have to update all related records in the trans table instead of just in the customer table.
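The placement of the date condition is what matters in the LEFT JOIN: in the ON clause it only limits which transactions are matched, whereas in the WHERE clause it would discard the customers without matches again. A minimal sketch of the difference, using Python's built-in sqlite3 as a stand-in for MySQL, follows. The COALESCE to 0 is an addition for readability, and the Date column is left out of the SELECT because it is not meaningful for customers with no rows.

```python
import sqlite3

# SQLite stands in for MySQL; sample rows are taken from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (ID INTEGER, Name TEXT)")
conn.execute(
    "CREATE TABLE trans (ID INTEGER, `Date` TEXT, Customer TEXT, trans_in INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "George"), (2, "Nick"), (3, "Ted"), (4, "Danny"), (5, "Alex"), (6, "Mike")],
)
conn.executemany(
    "INSERT INTO trans VALUES (?, ?, ?, ?)",
    [
        (8, "2013-01-23 16:24:10", "George", 10),
        (9, "2013-01-23 16:27:22", "Nick", 0),
        (10, "2013-01-24 16:28:48", "Ted", 10),
        (11, "2013-01-25 16:36:40", "Danny", 10),
    ],
)

# Date condition in ON: customers without matching rows are kept.
in_on = conn.execute(
    """
    SELECT c.Name, COALESCE(SUM(t.trans_in), 0) AS input
    FROM customers c
    LEFT JOIN trans t
        ON c.Name = t.Customer AND t.`Date` >= '2013-01-24'
    WHERE c.Name IN ('George', 'Nick', 'Ted', 'Danny')
    GROUP BY c.Name
    ORDER BY input DESC, c.Name
    """
).fetchall()

# Date condition in WHERE: the unmatched customers are filtered out again.
in_where = conn.execute(
    """
    SELECT c.Name, COALESCE(SUM(t.trans_in), 0) AS input
    FROM customers c
    LEFT JOIN trans t ON c.Name = t.Customer
    WHERE c.Name IN ('George', 'Nick', 'Ted', 'Danny')
        AND t.`Date` >= '2013-01-24'
    GROUP BY c.Name
    ORDER BY input DESC, c.Name
    """
).fetchall()
```

With the condition in ON, George and Nick appear with a sum of 0; with the condition in WHERE, only Danny and Ted remain.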
try this
SELECT
`Date`,
Customer,
SUM(trans_in) AS 'input'
FROM trans
inner join customers
on customers.Name = trans.Customer
WHERE Customer IN('George','Nick','Ted','Danny')
GROUP BY Customer
ORDER BY input DESC;

Difference in queries results with group by and without it

I have two tables:
t1 with the following columns: name | key | length
t2 with the following columns: name | country.
I need to select all distinct keys with length>2000 group by country. So, I made
SELECT count(distinct key), country
from db.t1
inner join db.t2
on t1.name=t2.name
where length>2000
group by country;
But, when I make the query:
SELECT count(distinct key)
from db.t1
where Length>2000;
I expected to get equal results, but they differ. For example, the counts from the first query add up to 125494, while the second query returns 121653.
What is the reason for these different results? Note that some rows have an empty string ('') as the country. It seems to me they don't appear as a group; I counted them and found 134 such records, but I can't work out the reason.
Unless key is UNIQUE (in which case, why bother with the DISTINCT keywords?), there is no reason that your two queries should return the same results.
Suppose t1 contains:
+------+-----+--------+
| name | key | length |
+------+-----+--------+
| a | x | 5000 |
| b | x | 5000 |
| b | y | 5000 |
| c | z | 5000 |
+------+-----+--------+
And t2 contains:
+------+---------+
| name | country |
+------+---------+
| a | uk |
| b | fr |
| c | de |
+------+---------+
Then your queries will return:
First query:
SELECT count(distinct key), country
from db.t1
inner join db.t2
on t1.name=t2.name
where length>2000
group by country;
Will yield:
+---------------------+---------+
| count(distinct key) | country |
+---------------------+---------+
| 1 | uk |
| 2 | fr |
| 1 | de |
+---------------------+---------+
Second query:
SELECT count(distinct key)
from db.t1
where Length>2000;
Will yield:
+---------------------+
| count(distinct key) |
+---------------------+
| 3 |
+---------------------+
See it on sqlfiddle.
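The example tables above can be reproduced in a few lines to confirm the numbers. This sketch uses Python's built-in sqlite3 as a stand-in for MySQL, and the column key is quoted with backticks because it is a keyword.

```python
import sqlite3

# SQLite stands in for MySQL; rows are the example tables from this answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (name TEXT, `key` TEXT, length INTEGER)")
conn.execute("CREATE TABLE t2 (name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [("a", "x", 5000), ("b", "x", 5000), ("b", "y", 5000), ("c", "z", 5000)],
)
conn.executemany(
    "INSERT INTO t2 VALUES (?, ?)",
    [("a", "uk"), ("b", "fr"), ("c", "de")],
)

# Distinct keys counted separately inside each country.
per_country = dict(
    conn.execute(
        """
        SELECT country, COUNT(DISTINCT `key`)
        FROM t1
        INNER JOIN t2 ON t1.name = t2.name
        WHERE length > 2000
        GROUP BY country
        """
    ).fetchall()
)

# Distinct keys counted once over the whole table.
overall = conn.execute(
    "SELECT COUNT(DISTINCT `key`) FROM t1 WHERE length > 2000"
).fetchone()[0]

total_of_groups = sum(per_country.values())
```

The per-country counts add up to 4 while the overall count is 3, because key x is counted once for uk and once more for fr.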
If you have multiple rows in t2 with the same name the join will be creating duplicates.