SQL alternative to double subquery - mysql

I have a table MyTable with values that look like this:
| id | name | type | category |
---------------------------------------------------
| 1 | Rob | Red | Rock |
| 2 | Rob | Blue | Rap |
| 2 | Rob | Blue | Rock |
| 3 | Jane | Green | Country |
| 3 | Jane | Green | Rap |
| 4 | Meg | Yellow | Rock |
| 5 | Jane | Blue | Rap |
| 5 | Jane | Blue | Rock |
| 6 | Jane | Red | Country |
| 6 | Jane | Red | Rock |
| 7 | Rob | Red | Rap |
| 7 | Rob | Red | Country |
| 8 | Meg | Green | Country |
| 9 | Meg | Blue | Rap |
Now, my issue is that (as the data is given to me) there are duplicate ids. Each id stands for a report, so id 1 is for report 1. Each report has a name, a type, and a category. A report can have only one type, but as many categories as it likes, so the duplicate ids come from the different categories, each of which adds a new row. The end result I wish to achieve is to list the names in one column, along with all the types + count (where count is the number of reports with that type, counting each report once) in the next column. It would look like this:
| Rob | Red(2), Blue(1) |
| Jane | Green(1), Blue(1), Red(1) |
| Meg | Yellow(1), Green(1), Blue(1)|
Now, I've developed a query that actually uses two subqueries and successfully achieves this result. It goes
select name as firstCol, group_concat(type_counts order by countName desc separator ', ') as secondCol
from (
    select name, concat(type, ' (', count(name), ')') as type_counts, count(name) as countName
    from (
        select name, type
        from some join stuff
        where (date_reported between '2014-11-01' and '2014-11-31')
        group by id
        order by type, name
    ) a
    group by name, type
    order by name, count(name) desc
) a
group by name;
This query essentially groups by id first, to remove the duplicate ids and disregard the split due to differing categories. The query wrapping it then groups by name and traffic type, concatenating the traffic type and the count of names together as "type (count)". The third groups solely by name and group concats all of the types to have a column for the name, and then a second column with all of the type+counts listed and separated by commas. I was just wondering...is there a query that could make it faster, by not having to use so many subqueries and such? Thanks.

The end result I wish to achieve is to list the names in one column, along with all the types + count (where count is the count of the distinct types, as in one type per report) in the next column.
Use a subquery as an expression in the SELECT clause and another in the WHERE clause. For example:
SELECT B.Name, B.UserId AS [User Link], (SELECT Count(B2.Name) FROM Badges B2 WHERE B2.Name = B.Name) as [Total Gold, Silver, and Bronze Badges Awarded for this Tag]
FROM Badges B
WHERE Class = 2
AND TagBased = 1
AND (SELECT Count(B2.Name)
FROM Badges B2
WHERE B2.Name = B.Name
AND B2.Class = 2
AND B2.TagBased = 1) = 1
GROUP BY B.Name, B.UserId
ORDER BY B.Name
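For the table in the question itself, one level of nesting can probably be dropped by letting COUNT(DISTINCT id) absorb the duplicate ids instead of running a separate GROUP BY id pass. Below is a minimal sketch, run against SQLite through Python's sqlite3 module so that it is self-contained. The aliases reports and per_type are made up for this example, and the question's joins and date filter are left out. In MySQL the outer expression would be GROUP_CONCAT(CONCAT(type, '(', reports, ')') ORDER BY reports DESC SEPARATOR ', '); SQLite's group_concat(value, separator) does not promise an order, so the sketch compares sets.

```python
import sqlite3

# Sample rows copied from the question's MyTable (id, name, type, category).
ROWS = [
    (1, "Rob", "Red", "Rock"), (2, "Rob", "Blue", "Rap"), (2, "Rob", "Blue", "Rock"),
    (3, "Jane", "Green", "Country"), (3, "Jane", "Green", "Rap"),
    (4, "Meg", "Yellow", "Rock"), (5, "Jane", "Blue", "Rap"), (5, "Jane", "Blue", "Rock"),
    (6, "Jane", "Red", "Country"), (6, "Jane", "Red", "Rock"),
    (7, "Rob", "Red", "Rap"), (7, "Rob", "Red", "Country"),
    (8, "Meg", "Green", "Country"), (9, "Meg", "Blue", "Rap"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER, name TEXT, type TEXT, category TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)", ROWS)

# One subquery instead of two: COUNT(DISTINCT id) counts each report once
# per (name, type), no matter how many category rows the report has.
QUERY = """
SELECT name, group_concat(type || '(' || reports || ')', ', ') AS type_counts
FROM (
    SELECT name, type, COUNT(DISTINCT id) AS reports
    FROM MyTable
    GROUP BY name, type
) AS per_type
GROUP BY name
"""

# Split the concatenated string back into a set, since SQLite does not
# guarantee the order of items inside group_concat.
type_counts = {name: set(joined.split(", ")) for name, joined in conn.execute(QUERY)}
for name, entries in sorted(type_counts.items()):
    print(name, sorted(entries))
```

Whether this is actually faster than the three-level version depends on the real joins and indexes, so it is worth comparing the two with EXPLAIN.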

Related

Is there a more idiomatic way to merge related rows from two tables?

I'm using a contrived example in order to illustrate the issue.
Imagine a simple table of books containing a title and subject/genre. In addition, there's an associated table of related subjects.
> SELECT * FROM books;
+----+--------+-----------+
| id | title | subject |
+----+--------+-----------+
| 1 | Book A | science |
| 2 | Book B | reference |
| 3 | Book C | fiction |
+----+--------+-----------+
> SELECT * FROM related_subjects;
+----+---------+---------+
| id | book_id | subject |
+----+---------+---------+
| 1 | 1 | physics |
| 2 | 1 | space |
| 3 | 3 | crime |
+----+---------+---------+
I'd like a query that could output all the title + subject combinations, so that it would look something like:
+----+--------+-----------+
| id | title | SUBJECT |
+----+--------+-----------+
| 1 | Book A | science |
| 1 | Book A | space |
| 1 | Book A | physics |
| 2 | Book B | reference |
| 3 | Book C | fiction |
| 3 | Book C | crime |
+----+--------+-----------+
The most obvious way is to use a UNION as follows:
SELECT books.id, books.title, SUBJECT FROM books
UNION
SELECT books.id, books.title, related_subjects.subject FROM books
INNER JOIN related_subjects ON related_subjects.book_id = books.id;
Which yields a good result:
+----+--------+-----------+
| id | title | SUBJECT |
+----+--------+-----------+
| 1 | Book A | science |
| 2 | Book B | reference |
| 3 | Book C | fiction |
| 1 | Book A | space |
| 1 | Book A | physics |
| 3 | Book C | crime |
+----+--------+-----------+
However, it would be preferable if the natural output ordering was similar to my desired output, where the books row comes out first, followed by its related rows from the related_subjects table, and so on.
I'm curious as to whether there's a better/more efficient way of doing this sort of task? Particularly one that would give me a more natural ordering without having to apply a sort on the end result first.
Note: of course, I know I can apply a DB sort to the union output by ordering on books.id, related_subjects.id, but the output in my real-world app consists of hundreds of thousands of rows, so I would rather avoid a relatively expensive sort if possible.
Introduce a computed column into the union query for ordering:
SELECT id, title, subject
FROM
(
SELECT id, title, subject, 1 AS src FROM books
UNION ALL
SELECT b.id, b.title, rs.subject, 2
FROM books b
INNER JOIN related_subjects rs ON rs.book_id = b.id
) t
ORDER BY id, src;
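To see the computed-column ordering on the sample data, here is a small sketch that runs the same query against SQLite via Python's sqlite3 module. The extra seq column (0 for the books row, related_subjects.id otherwise) is an addition made for this example, not part of the query above; it only serves as a tie-breaker so that the related rows of one book come out in a fixed order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, subject TEXT);
CREATE TABLE related_subjects (id INTEGER PRIMARY KEY, book_id INTEGER, subject TEXT);
INSERT INTO books VALUES
    (1, 'Book A', 'science'), (2, 'Book B', 'reference'), (3, 'Book C', 'fiction');
INSERT INTO related_subjects VALUES
    (1, 1, 'physics'), (2, 1, 'space'), (3, 3, 'crime');
""")

# src = 1 marks the row from books, src = 2 the rows from related_subjects.
# seq is an assumed tie-breaker so related rows keep a deterministic order.
QUERY = """
SELECT id, title, subject
FROM (
    SELECT id, title, subject, 1 AS src, 0 AS seq FROM books
    UNION ALL
    SELECT b.id, b.title, rs.subject, 2, rs.id
    FROM books b
    INNER JOIN related_subjects rs ON rs.book_id = b.id
) t
ORDER BY id, src, seq
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Each book's own subject comes first, followed by its related subjects. The sort is still there, so this fixes the ordering but does not avoid the cost the question was worried about.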
It seems like you have a one-to-many relationship between books and subjects. So you could drop the subject column from the books table, and just make sure all the subjects that apply to the book are in the related_subjects table. Then you don't have to use UNION to get both, you just do the join from book to related_subjects.
In theory SQL does not guarantee any order of query results unless you specify the ORDER BY. But in practice, InnoDB returns rows in the order it reads them in the index it uses to look them up.

How to select a column's value, only if all other entries with the same value exist and match? (MYSQL)

The title is a bit messy, but here's an example
suppose we have table:
| name | room |
=================
| John | 4 |
| John | 6 |
| John | 9 |
| Smith | 4 |
| Smith | 6 |
| Brian | 4 |
| Brian | 6 |
| Brian | 9 |
I want to select John and Brian because they both have exactly rooms 4, 6 and 9, but not Smith, since he doesn't have the room 9. (If we had another person who ONLY has room 4 and 6, then it'd select that other person as well as Smith).
I know I need to do some kind of correlated query, but I'm not sure how to actually get it to do something like
for a check for b
If you want groups of names that share the exact same rooms, I would recommend group_concat():
select rooms, group_concat(name) as names
from (select name, group_concat(room order by room) as rooms
from t
group by name
) n
group by rooms;
If you want only combinations with more than one name, then add having count(*) > 1 to the outer select.

MYSQL show all entries sorted by 2 columns random on one column

We are looking to return the rows of a query as groups, displaying all entries of each group in sort order: the groups ordered randomly by set_id, and the rows within each group ordered by sort_id.
So, randomly it will show:
Carl,
Phil,
Wendy,
Tina,
Rick,
Joe
or
Tina,
Rick,
Joe,
Carl,
Phil,
Wendy
This query is always showing Tina/Rick/Joe first
SELECT * FROM products ORDER BY set_id, rand()
Any help would be appreciated
+---------+--------+-------+----------+
| id | set_id | name | sort_id |
+---------+--------+-------+----------+
| 1 | AA |Rick | 2 |
| 2 | BB |Carl | 1 |
| 3 | AA |Joe | 3 |
| 4 | AA |Tina | 1 |
| 5 | BB |Phil | 2 |
| 6 | BB |Wendy | 3 |
+---------+--------+-------+----------+
If you need a random comma-separated name list, this will do the trick.
It keeps the groups together and preserves the correct sorting within each group.
Query
SELECT
GROUP_CONCAT(Table_names_rand.names) as names
FROM (
SELECT
*
FROM (
SELECT
GROUP_CONCAT(name ORDER BY sort_id) as names
FROM
Table1
GROUP BY
set_id
)
AS Table1_names
ORDER BY
RAND()
)
AS Table_names_rand
Result
| names |
|-------------------------------|
| Carl,Phil,Wendy,Tina,Rick,Joe |
or
| names |
|-------------------------------|
| Tina,Rick,Joe,Carl,Phil,Wendy |
demo http://www.sqlfiddle.com/#!9/487ac9/9
If you need the names output as separate records instead:
Query
SELECT
Table1.name
FROM
Table1
CROSS JOIN (
SELECT
GROUP_CONCAT(Table_names_rand.names) as names
FROM (
SELECT
*
FROM (
SELECT
GROUP_CONCAT(name ORDER BY sort_id) as names
FROM
Table1
GROUP BY
set_id
)
AS Table1_names
ORDER BY
RAND()
)
AS Table_names_rand
)
AS Table_names_rand
ORDER BY
FIND_IN_SET(name, Table_names_rand.names)
Result
| name |
|-------|
| Carl |
| Phil |
| Wendy |
| Tina |
| Rick |
| Joe |
or
| name |
|-------|
| Tina |
| Rick |
| Joe |
| Carl |
| Phil |
| Wendy |
demo http://www.sqlfiddle.com/#!9/487ac9/28
If we strip away the randomness of the group ordering, your query would look like this:
SELECT
*
FROM
products
ORDER BY
set_id,
sort_id;
The ordering by set_id is necessary to "group" the results, without really grouping them. You do not want to group them, because then the rows with the same group would be aggregated, meaning that only one row per group would be put out.
Since you only want to randomize the groups, you need to write another query that assigns a random number to each group, like the one below:
SELECT
set_id,
RAND() as 'rnd'
FROM
products
GROUP BY
set_id
The GROUP BY clause makes sure that each group is selected only once. The result set will look like this:
| set_id | rnd  |
+--------+------+
| AA     | 0.21 |
| BB     | 0.1  |
With that result we can randomize the output by combining both queries with a JOIN on the set_id field. This adds the randomly generated number from the second query to each row of the first query, so every row carries an rnd value that is random per group but identical for all members of the same group:
SELECT
products.*
FROM
products
JOIN (
SELECT
set_id,
RAND() as 'rnd'
FROM
products
GROUP BY
set_id
) as rnd ON rnd.set_id = products.set_id
ORDER BY
rnd.rnd,
products.set_id,
products.sort_id;
Keep in mind that it is still important to order by products.set_id as well, because two groups may be assigned the same random number. If the result were not also ordered by products.set_id, the members of those groups would be interleaved.
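The join-on-a-random-number idea can be tried out on the sample rows with the sketch below. It uses SQLite through Python's sqlite3 module, so RANDOM() stands in for MySQL's RAND(); the alias r and the Python variable names are made up for this example.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, set_id TEXT, name TEXT, sort_id INTEGER);
INSERT INTO products VALUES
    (1, 'AA', 'Rick', 2), (2, 'BB', 'Carl', 1), (3, 'AA', 'Joe', 3),
    (4, 'AA', 'Tina', 1), (5, 'BB', 'Phil', 2), (6, 'BB', 'Wendy', 3);
""")

# One random number per set_id, joined back so every row of a group shares it.
QUERY = """
SELECT products.set_id, products.name
FROM products
JOIN (
    SELECT set_id, RANDOM() AS rnd   -- RAND() in MySQL
    FROM products
    GROUP BY set_id
) AS r ON r.set_id = products.set_id
ORDER BY r.rnd, products.set_id, products.sort_id
"""

rows = conn.execute(QUERY).fetchall()

# Collapse consecutive rows with the same set_id into blocks; if the groups
# were interleaved, there would be more than two blocks.
blocks = [(set_id, [name for _, name in grp])
          for set_id, grp in groupby(rows, key=lambda row: row[0])]
print(blocks)
```

Running it several times should show the AA block and the BB block swapping places, while the names inside each block stay in sort_id order.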

MySQL Query to retrieve maximum value among certain criteria

I have products with different rankings. The products may be members of a supergroup (like Cream).
product_id | supergroup | rank | other_info
1 | Cream | 3 | Eric
2 | Zep | 1 | Jimmy
3 | Zep | 4 | Jon Paul
4 | Cream | 3 | Jack
5 | Cream | 4 | Ginger
6 | Who | 4 | Roger
7 | Who | 5 | John
8 | Who | 3 | Pete
I want to get the max product rank from each group, along with other info for that product id. Ranks are not meant for intragroup ranks. They are ranks that work across all products in the system. So more than one product may have the same rank, even in the same group.
EDIT: fixed "other_info". I had some gibberish there. Also added a row. Results should be from highest rank to lowest. But they also should only include the highest ranking product_id from each supergroup, along with matching other_info.
product_id | supergroup | rank | other_info
2 | Zep | 1 | Jimmy
8 | Who | 3 | Pete
1 | Cream | 3 | Eric
Can I do that with a simple query? The existing system's query already involves a GROUP BY statement on the supergroup, and no aggregators in the SELECT. That results in a random, but coherent row from within the group. What is the simplest way to modify the query to get a complete row, but always of the highest-ranked member of each super group.
If there is no way, what about this: Is this possible without GROUP BY?
SELECT t.*
FROM your_table t
JOIN (
SELECT MIN(product_id) as product_id #if there are multiple products with the same (min) rank in the same supergroup - get the one with lowest product_id
FROM your_table tt
JOIN (
SELECT supergroup, MIN(rank) as min_rank
FROM your_table
GROUP BY supergroup
) mr ON mr.supergroup = tt.supergroup AND mr.min_rank = tt.rank
GROUP BY tt.supergroup, tt.rank
) as mid ON mid.product_id = t.product_id
ORDER BY whatever_you_need_to
You need an index on (supergroup,rank) for this to run efficiently.
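Here is the same query run against the sample rows, as a sketch using SQLite through Python's sqlite3 module. Two things differ from the query above and are choices made for this example: rank is quoted with backticks (RANK is a reserved word from MySQL 8.0 on, and SQLite accepts backticks too), and the placeholder ORDER BY is filled in with rank, then product_id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE your_table (
    product_id INTEGER PRIMARY KEY,
    supergroup TEXT,
    `rank` INTEGER,
    other_info TEXT
);
INSERT INTO your_table VALUES
    (1, 'Cream', 3, 'Eric'), (2, 'Zep', 1, 'Jimmy'), (3, 'Zep', 4, 'Jon Paul'),
    (4, 'Cream', 3, 'Jack'), (5, 'Cream', 4, 'Ginger'), (6, 'Who', 4, 'Roger'),
    (7, 'Who', 5, 'John'), (8, 'Who', 3, 'Pete');
""")

# Innermost query: best (lowest) rank per supergroup.
# Middle query: among rows holding that rank, keep the lowest product_id.
# Outer query: fetch the complete row for each chosen product_id.
QUERY = """
SELECT t.*
FROM your_table t
JOIN (
    SELECT MIN(tt.product_id) AS product_id
    FROM your_table tt
    JOIN (
        SELECT supergroup, MIN(`rank`) AS min_rank
        FROM your_table
        GROUP BY supergroup
    ) mr ON mr.supergroup = tt.supergroup AND mr.min_rank = tt.`rank`
    GROUP BY tt.supergroup, tt.`rank`
) AS mid ON mid.product_id = t.product_id
ORDER BY t.`rank`, t.product_id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Cream has two products tied at rank 3 (Eric and Jack); the MIN(product_id) step is what picks Eric.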

LIMIT results to n unique column values?

I have some MySQL results like this:
---------------------------
| name | something_random |
---------------------------
| john | ekjalsdjalfjkldd |
| alex | akjsldfjaekallee |
| alex | jkjlkjslakjfjflj |
| alex | kajslejajejjaddd |
| bob | ekakdie33kkd93ld |
| bob | 33kd993kakakl3ll |
| paul | 3k309dki595k3lkd |
| paul | 3k399kkfkg93lk3l |
etc...
This goes on for 1000's of rows of results. I need to limit the number of results to the first 50 unique names. I think there is a simple solution to this but I'm not sure.
I've tried using derived tables and variables but can't quite get there. If I could figure out how to increment a variable once every time a name is different I think I could say WHERE variable <= 50.
UPDATED
I've tried the Inner Join approach(es) suggested below. The problem is this:
The subselect SELECT DISTINCT name FROM testTable LIMIT 50 grabs the first 50 distinct names. Perhaps I wasn't clear enough in my original post, but this limits my query too much. In my query, not every name in the table is returned in the result. Let me modify my original example:
----------------------------------
| id | name | something_random |
----------------------------------
| 1 | john | ekjalsdjalfjkldd |
| 4 | alex | akjsldfjaekallee |
| 4 | alex | jkjlkjslakjfjflj |
| 4 | alex | kajslejajejjaddd |
| 6 | bob | ekakdie33kkd93ld |
| 6 | bob | 33kd993kakakl3ll |
| 12 | paul | 3k309dki595k3lkd |
| 12 | paul | 3k399kkfkg93lk3l |
etc...
So I added in some id numbers here. These ID numbers pertain to the people's names in the tables. So you can see in the results, not every single person/name in the table is necessarily in the result (due to some WHERE condition). So the 50th distinct name in the list will always have an ID number higher than 49. The 50th person could be id 79, 234, 4954 etc...
So back to the problem. The subselect SELECT DISTINCT name FROM testTable LIMIT 50 selects the first 50 names in the table. That means that my search results will be limited to names that have ID <=50, which is too constricting. If there are certain names that don't show up in the query (due to some WHERE condition), then they are still counted as one of the 50 distinct names. So you end up with too few results.
UPDATE 2
To @trapper: This is a basic simplification of what my query looks like:
SELECT
t1.id,
t1.name,
t2.details
FROM t1
LEFT JOIN t2 ON t1.id = t2.some_id
INNER JOIN
(SELECT DISTINCT name FROM t1 ORDER BY id LIMIT 0,50) s ON s.name = t1.name
WHERE
SOME CONDITIONS
ORDER BY
t1.id,
t1.name
And my results look like this:
----------------------------------
| id | name | details |
----------------------------------
| 1 | john | ekjalsdjalfjkldd |
| 3 | alex | akjsldfjaekallee |
| 3 | alex | jkjlkjslakjfjflj |
| 4 | alex | kajslejajejjaddd |
| 6 | bob | ekakdie33kkd93ld |
| 6 | bob | 33kd993kakakl3ll |
| 12 | paul | 3k309dki595k3lkd |
| 12 | paul | 3k399kkfkg93lk3l |
...
| 37 | bill | kajslejajejjaddd |
| 37 | bill | ekakdie33kkd93ld |
| 41 | matt | 33kd993kakakl3ll |
| 50 | jake | 3k309dki595k3lkd |
| 50 | jake | 3k399kkfkg93lk3l |
----------------------------------
The results stop at id=50. There are NOT 50 distinct names in the list. There are only roughly 23 distinct names.
My MySQL syntax may be rusty, but the idea is to use a query to select the top 50 distinct names, then do a self-join on name and select the name and other information from the join.
select a.name, b.something_random
from `Table` b
inner join (select distinct name from `Table` order by RAND() limit 0,50) a
on a.name = b.name
SELECT DISTINCT name FROM table LIMIT 0,50
Edited: Ahh yes I misread question first time, this should do the trick though :)
SELECT a.name, b.something_random
FROM `table` b
INNER JOIN (SELECT DISTINCT name FROM `table` ORDER BY RAND() LIMIT 0,50) a
ON a.name = b.name ORDER BY a.name
How this works: the (SELECT DISTINCT name FROM `table` ORDER BY RAND() LIMIT 0,50) part is what pulls out the names to include in the join. So here I am taking 50 unique names at random, but you can change this to any other selection criteria if you want.
Then you join those results back into your table. This links each of those 50 selected names back to all of the rows with a matching name for your final results. Finally ORDER BY a.name just to be sure all the rows for each name end up grouped together.
This should do it:
SELECT tA.*
FROM
testTable tA
INNER JOIN
(SELECT distinct name FROM testTable LIMIT 50) tB ON tA.name = tB.name
;
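The question's update points out that names excluded by the outer WHERE still use up slots in the LIMIT. One possible way around that is to repeat the same condition inside the derived table, so only names that survive the filter are counted. A rough sketch, using SQLite through Python's sqlite3 module: the visible column is a hypothetical stand-in for the question's "SOME CONDITIONS", and the limit is 2 instead of 50 to keep the data small. GROUP BY name with ORDER BY MIN(id) is used in place of DISTINCT ... ORDER BY id, which is an assumption about the intended order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE testTable (id INTEGER, name TEXT, something_random TEXT, visible INTEGER);
INSERT INTO testTable VALUES
    (1,  'john', 'ekjalsdjalfjkldd', 1),
    (4,  'alex', 'akjsldfjaekallee', 0),
    (4,  'alex', 'jkjlkjslakjfjflj', 0),
    (6,  'bob',  'ekakdie33kkd93ld', 1),
    (6,  'bob',  '33kd993kakakl3ll', 1),
    (12, 'paul', '3k309dki595k3lkd', 1),
    (12, 'paul', '3k399kkfkg93lk3l', 1);
""")

N = 2  # stands in for the 50 in the question

# The filter appears twice: inside the derived table, so that LIMIT only
# counts names that pass it, and outside, so that only matching rows return.
QUERY = """
SELECT tA.id, tA.name, tA.something_random
FROM testTable tA
INNER JOIN (
    SELECT name
    FROM testTable
    WHERE visible = 1
    GROUP BY name
    ORDER BY MIN(id)
    LIMIT ?
) tB ON tA.name = tB.name
WHERE tA.visible = 1
ORDER BY tA.id, tA.something_random
"""

rows = conn.execute(QUERY, (N,)).fetchall()
for row in rows:
    print(row)
```

Here alex is filtered out before the limit is applied, so the two names returned are john and bob. Without the inner WHERE, alex would take one of the two slots and only john's row would come back.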