Join another table with multiple rows to another table's single result - mysql

I currently select a single row (a post):
SELECT s.id AS id,s.date,s.title,s.views,s.image,s.width,s.description,u.id AS userId,u.username,u.display_name,u.avatar,
(select count(*) from comments where item_id = s.id and type = 1) as numComments,
(select count(*) from likes where item_id = s.id and type = 1) as numLikes,
(select avg(value) from ratings where showcase_id = s.id) as average,
(select count(*) from ratings where showcase_id = s.id) as total
FROM showcase AS s
INNER JOIN users AS u ON s.user_id = u.id
WHERE s.id = :id
LIMIT 5
Then get comments for that post in a separate query:
SELECT c.id as c_id,c.text,c.date,u.id as u_id,u.username,u.display_name,u.avatar
FROM comments as c
INNER JOIN users as u ON c.user_id = u.id
WHERE item_id = :item_id AND type = :type
:id and :item_id are the same. However, the comments return multiple rows whereas the first query returns one row - is there a way to join the comments to the first query or is the current way fine?

It really depends on your application.
If we are talking about a few records returned from a small or medium table, and if the query is executed just a few times a day, then it wouldn't matter much if:
1. you work with two record sets (two different queries are executed and then their results are put together);
2. you join the two queries, copying the post information for each record from the comments query;
3. you build an XML document with the comments and join it to the record returned in the first query (the post record).
Another factor to take into consideration is whether the post and its comments are displayed at the same time. If this is NOT the case, and the comments are not visible at first but displayed only after some action such as the click of a button, then you should choose the 1st option above, for performance reasons.
But if both the post information and its comments must be displayed at the same time, then you can choose any of the 3 options above. Which one is mostly a matter of personal preference in how you model your application's data structures and its database access layer.
Now, if the volume of data may get huge, then you should dig a little deeper and run some simulations to find the query (or queries) that give you the best performance.
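As a rough sketch of the second option (one joined query), assuming the showcase, users and comments tables and the named placeholders from the question, the comments can be LEFT JOINed onto the post. The column aliases (post_id, comment_id, commenter and so on) are made up for illustration.

```sql
-- Option 2: a single query; the post columns repeat once per comment.
-- Assumes the schema and the :id / :type placeholders shown in the question.
SELECT s.id        AS post_id,
       s.title,
       pu.username AS author,
       c.id        AS comment_id,    -- NULL when the post has no comments
       c.text      AS comment_text,
       c.date      AS comment_date,
       cu.username AS commenter
FROM showcase AS s
INNER JOIN users AS pu ON pu.id = s.user_id
LEFT JOIN comments AS c ON c.item_id = s.id AND c.type = :type
LEFT JOIN users AS cu ON cu.id = c.user_id
WHERE s.id = :id
ORDER BY c.date;
```

The LEFT JOIN keeps the post row even when there are no comments, so the application has to skip rows where comment_id is NULL and read the post fields from the first row only.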

Related

How to rewrite UNION with LEFT JOIN more efficiently

I have two tables: one that registers users and one that checks in users. A user will always have a single entry in the register table, but a user may have 0 or multiple entries in the checkin table. For a raffle selector, I wrote a query that picks 1 entry from the register table and then 1 entry from the checkin table; each subquery picks a random entry so long as that userID does not exist in a 3rd table that stores the raffle winners. After the two entries are returned, it then randomly selects one of the two returned entries as the winner.
However, I believe there should be a more efficient way of writing this so it's ONLY picking an entry once, not picking two entries and then picking one of the two.
It took me quite a while to figure out how to correctly write the below query as I am not proficient in mysql at all. The query works and seems to work efficiently, but I believe there should be a better way of writing it that also consolidates the amount of query code.
Hoping someone here can help or advise.
Table note: clubusers/clubHistory have multiple overlapping columns but the tables are not the same:
register = clubUsers
checkins = clubHistory
winners = clubRaffleWinners
SELECT * FROM (
(SELECT ch.user_ID,ch.clID FROM clubHistory AS ch
LEFT OUTER JOIN clubRaffleWinners AS cr1 ON
ch.user_ID=cr1.user_ID
AND cr1.cID=1157
AND cr1.rafID=18
AND cr1.crID=1001
AND cr1.ceID=1167
AND cr1.chDate1='2022-06-04'
WHERE
ch.cID=1157
AND ch.crID=1001
AND ch.ceID=1167
AND ch.chDate='2022-06-04'
AND cr1.user_ID IS NULL
GROUP BY ch.user_ID ORDER BY RAND() LIMIT 1
)
UNION
(SELECT cu.user_ID,cu.clID FROM clubUsers AS cu
LEFT OUTER JOIN clubRaffleWinners AS cr2 ON
cu.user_ID=cr2.user_ID
AND cr2.cID=1157
AND cr2.rafID=18
AND cr2.crID=1001
AND cr2.ceID=1167
AND cr2.chDate1='2022-06-04'
WHERE
cu.cID=1157
AND cu.crID=1001
AND cu.ceID=1167
AND cu.calDate<='2022-06-04'
AND cr2.user_ID IS NULL
GROUP BY cu.user_ID ORDER BY RAND() LIMIT 1
)
) AS foo order by RAND() LIMIT 1 ;
UPDATE:
As @JettoMartinez points out below, my current query could in fact randomly return the same user from each table, so the final returned entry would just be the same user. I didn't realize this in my struggles just to get the above query to work. Thus my original question, asking for a more optimized query that simply selects a single random entry from both tables (where that user is not already in the winners table), is applicable for yet another reason.
There are two ways I can think of (Do note that since I don't fully understand the tables, I'm not using all the conditions you used in your JOIN statements, meaning it might need more work):
Using an exclusive subquery:
SELECT
cu.user_ID,
cu.clID,
ch.cID
FROM
clubUsers cu
LEFT JOIN clubHistory ch ON ch.user_ID = cu.user_ID
WHERE cu.user_ID NOT IN ( -- qualified, since user_ID exists in both cu and ch
SELECT
user_ID
FROM
clubRaffleWinners
WHERE
-- other conditions
)
ORDER BY RAND() LIMIT 1;
Using a LEFT "OUTER" JOIN, as you asked for:
SELECT
cu.user_ID,
cu.clID,
ch.cID -- Or any relevant field from clubHistory, really
FROM
clubUsers cu
LEFT JOIN clubHistory ch ON ch.user_ID = cu.user_ID
LEFT JOIN clubRaffleWinners cr ON cr.user_ID = cu.user_ID
AND ... -- other conditions to ensure uniqueness
AND ... -- that could also be in the WHERE part
WHERE
cr.user_ID IS NULL -- this will filter out the INNER part of the JOIN
ORDER BY RAND() LIMIT 1;
I don't have a dataset to properly test these queries, so please take them as a concept. I also didn't query clubHistory, since I honestly don't see the point of doing so. Joining clubRaffleWinners to clubUsers seems enough to me.
EDIT
Since the user_ID in clubHistory is relevant to the raffle, I added a LEFT JOIN to it and added a field from said table to the SELECT statement, so that the user_ID repeats once per matching entry in clubHistory (a user with no entries still appears once, from the clubUsers row alone), meaning that users with more check-ins have proportionally more chances to win.
This logic can be applied to the first query with a subquery too, and if the added field needs to be out, the query could be wrapped in a CTE or a subquery.
I want to make sure I understand what you are describing.
Every registered person qualifies for 1 entry.
However, each time they check in, they get 1 entry for that check-in. So someone who registered and has NEVER checked in gets 1 entry. But someone who registered and checked in 3 times gets a total of JUST the 3 entries for the check-ins, not 4 (3 plus 1 for being registered).
Regardless of who is eligible, you want to EXCLUDE all people who have already been a winner in the raffle.
You SHOULD be able to get results from the query below. Since the tables appear to be filtered on the same columns (cID, crID, ceID and the date), I base the primary FROM on the registered clubUsers.
From that, a LEFT JOIN to clubHistory returns that person's ID either once, if they are only registered, OR multiple times, once per check-in, as in the example above.
From the given user, I am also left-joining directly to the raffle winners on the same criteria as the clubHistory join, with the addition of rafID = 18, which appears to identify the specific raffle being drawn. Whether the person has a single entry or multiple entries, the IS NULL test in the final WHERE excludes them if they are found among the winners.
The query returns all entries, single or multiple, for people who have not already won, in ORDER BY RAND() order, and applies a single LIMIT 1 to get the final winner. I don't know why you needed what appeared to be the clubhouse ID when you only really care about WHO won, regardless of whether the entry came from the club history or not.
SELECT
cu.user_ID
FROM
clubUsers AS cu
LEFT JOIN clubHistory ch
on cu.user_ID = ch.user_ID
AND cu.cID = ch.cID
AND cu.crID = ch.crID
AND cu.ceID = ch.ceID
AND ch.chDate = '2022-06-04'
LEFT JOIN clubRaffleWinners AS crw
ON cu.user_ID = crw.user_ID
AND cu.cID = crw.cID
AND cu.crID = crw.crID
AND cu.ceID = crw.ceID
AND crw.chDate1 = '2022-06-04'
AND crw.rafID = 18
WHERE
cu.cID = 1157
AND cu.crID = 1001
AND cu.ceID = 1167
AND cu.calDate <= '2022-06-04'
AND crw.user_id IS NULL
order by
RAND()
LIMIT 1
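To see how the LEFT JOIN weights the draw, here is a small sketch with made-up, trimmed-down tables (demo_users and demo_history are hypothetical stand-ins for clubUsers and clubHistory, keeping only the columns needed to count rows).

```sql
-- Hypothetical stand-ins for clubUsers / clubHistory, reduced to two columns.
CREATE TEMPORARY TABLE demo_users   (user_ID INT PRIMARY KEY);
CREATE TEMPORARY TABLE demo_history (user_ID INT, chDate DATE);

INSERT INTO demo_users VALUES (1), (2);
INSERT INTO demo_history VALUES
  (2, '2022-06-04'), (2, '2022-06-04'), (2, '2022-06-04');

-- User 1 never checked in: the LEFT JOIN still yields one row for them.
-- User 2 checked in three times: three rows, i.e. three "tickets".
SELECT du.user_ID, COUNT(*) AS tickets
FROM demo_users AS du
LEFT JOIN demo_history AS dh
       ON dh.user_ID = du.user_ID
      AND dh.chDate = '2022-06-04'
GROUP BY du.user_ID
ORDER BY du.user_ID;
-- user_ID 1 -> tickets 1
-- user_ID 2 -> tickets 3
```

ORDER BY RAND() LIMIT 1 over the ungrouped rows then picks user 2 three times as often as user 1.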
For performance purposes, I would ensure the following indexes exist:
table              index
clubUsers          ( cID, crID, ceID, calDate, user_id )
clubHistory        ( user_id, cID, crID, ceID, chDate )
clubRaffleWinners  ( user_id, cID, crID, ceID, chDate1, rafID )
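If those indexes don't exist yet, they could be created along these lines. This is only a sketch: the index names (idx_...) are invented, and the column lists simply follow the recommendation above.

```sql
-- Index names are made up; the column order follows the list above.
CREATE INDEX idx_cu_raffle  ON clubUsers         (cID, crID, ceID, calDate, user_ID);
CREATE INDEX idx_ch_raffle  ON clubHistory       (user_ID, cID, crID, ceID, chDate);
CREATE INDEX idx_crw_raffle ON clubRaffleWinners (user_ID, cID, crID, ceID, chDate1, rafID);
```

Running EXPLAIN on the raffle query before and after should show whether the optimizer actually uses them.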
(Just a Comment, but need formatting.)
I would start by trying to put these 4 values in a single table, not repeated across 3 tables:
cu.cID=1157
AND cu.crID=1001
AND cu.ceID=1167
AND cu.calDate<='2022-06-04'
Please provide SHOW CREATE TABLE for each table; then I can assess whether the recommended indexes make sense.

Does JOIN or LEFT JOIN keep checking in a SELECT query?

I have a JOIN query but I need to optimize it for performance.
For example, in this query:
"SELECT id FROM users WHERE id = :id"
Since there is no LIMIT 1 at the end of the query, that select query will keep searching. If I add LIMIT 1 to the end of that query, it will select only one and stop searching for more.
Here is my question and query:
"SELECT messages.text, users.name
FROM messages
LEFT JOIN users
ON messages.from_id = users.id
WHERE messages.user_id = :user_id"
In the JOIN users ON messages.from_id = users.id part, since there is only 1 user with that ID, will it keep searching after it has found that row? If it does, how can I optimize it so that it only searches for 1 row?
SELECT id FROM users WHERE id = :id
If there is no index on id, the entire table is scanned.
If there is a UNIQUE or PRIMARY KEY on id, only one row will be checked.
If there is a plain INDEX, it will scan from the first match until it finds an id that does not match.
For this:
SELECT m.text, u.name
FROM messages AS m
LEFT JOIN users AS u ON m.from_id = u.id
WHERE m.user_id = :user_id
It will do a "Nested Loop Join":
Find the occurrence(s) in messages that satisfy m.user_id = :user_id (see above).
For each such row, reach into users based on the ON clause.
There may be multiple rows (again, depending on the index or lack thereof).
So, your question "how can I optimize it so that it only searches for 1 row" is answered:
If there can only be one row, declare it UNIQUE.
If there is sometimes more than one, then use INDEX. But don't worry about checking for an extra row; it is not that costly.
You say "only 1 user with that ID", but fail to specify which id in which table.
But that is not the end of the story...
LEFT JOIN may get turned into JOIN. In that case, users may be the first table to look at. Note also that the Optimizer is smart enough to deduce that you want u.id = :user_id. Anyway, the NLJ will start with users, then reach into messages. Again, the types of indexes are important.
Please provide SHOW CREATE TABLE for both tables. Then I can condense the answer to the relevant parts. Please provide EXPLAIN SELECT ... for confirmation of what I am saying.
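For illustration only, here is what that could look like in a scratch database. The table definitions are a guess at a minimal schema (the real tables surely have more columns), idx_messages_user is an invented index name, and 42 stands in for :user_id.

```sql
-- Hypothetical minimal schema; run this in a scratch database.
CREATE TABLE users (
  id   INT UNSIGNED NOT NULL,
  name VARCHAR(100) NOT NULL,
  PRIMARY KEY (id)                    -- unique: at most one row per id
);

CREATE TABLE messages (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
  user_id INT UNSIGNED NOT NULL,
  from_id INT UNSIGNED NOT NULL,
  text    TEXT,
  PRIMARY KEY (id),
  INDEX idx_messages_user (user_id)   -- plain index: many messages per user
);

-- Shows the full definitions, including which indexes exist.
SHOW CREATE TABLE users;
SHOW CREATE TABLE messages;

-- Shows the plan the optimizer picks for the join.
EXPLAIN
SELECT m.text, u.name
FROM messages AS m
LEFT JOIN users AS u ON m.from_id = u.id
WHERE m.user_id = 42;
```

On populated tables, a lookup through a PRIMARY KEY or UNIQUE index typically shows up in the EXPLAIN output with type eq_ref for users, which is the "only one row will be checked" case.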

Querying a large table using mysql

I manage a property website. I have a table with banned users (a small table) and a table called advert_views which keeps track of each listing that each user views (currently 1.3m rows and growing). The advert_views table also records the IP address for every advert viewed.
I want to get the IP addresses used by the banned users and check if any of these banned users have opened new accounts. I ran the following query:
SELECT adviews.user_id AS 'banned user_id',
adviews.client_ip AS 'IPs used by banned users',
adviews2.user_id AS 'banned users that opened a new account'
FROM banned_users
LEFT JOIN users on users.email_address = banned_users.email_address #since I don't store the user_id in banned_users
LEFT JOIN advert_views adviews ON adviews.user_id = users.id AND adviews.user_id IS NOT NULL # users may view listings when not logged in but they have restricted access to the information on the listing
LEFT JOIN (SELECT client_ip,
user_id
FROM advert_views
WHERE user_id IS NOT NULL
) adviews2
ON adviews2.client_ip = adviews.client_ip
WHERE banned_users.rec_status = 1 and adviews.user_id <> adviews2.user_id
GROUP BY adviews2.user_id
I applied an index on the advert_views table and the users table as per below:
[screenshot of the index definitions not available]
My query takes half an hour to execute. Is there a way to improve my query speed?
Thanks!
Chris
First of all: Why do you outer join the tables? Or better: Why do you try to outer join the tables? A left join is meant to get data from a table even when there is no match. But then your results could contain rows with all values null. (That doesn't happen though, because adviews.user_id <> adviews2.user_id in your where clause dismisses all outer-joined rows.) Don't give the DBMS more work to do than necessary. If you want inner joins, then don't outer join. (Though the difference in execution time won't be huge.)
Next: You select from banned_users, but you only use it to check existence. You shouldn't do this. Use an EXISTS or IN clause instead. (This is mainly for readability and in order not to produce duplicate results. This probably won't speed things up.)
SELECT av1.user_id AS 'banned user_id',
av2.client_ip AS 'IPs used by banned users',
av2.user_id AS 'banned users that opened a new account'
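FROM advert_views av1 -- the table is called advert_views in the question
JOIN advert_views av2 ON av2.client_ip = av1.client_ip AND av2.user_id <> av1.user_id
WHERE av1.user_id IN
(
SELECT id -- the question joins on users.id, so the key column is id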
FROM users
WHERE email_address IN (select email_address from banned_users where rec_status = 1)
)
GROUP BY av2.user_id;
You may replace the inner IN clause with a join. It's mostly a matter of personal preference, but it is also that in the past MySQL sometimes didn't perform well on IN clauses, so many people made it a habit to join instead.
WHERE av1.user_id IN
(
SELECT u.id
FROM users u
JOIN banned_users bu ON bu.email_address = u.email_address
WHERE bu.rec_status = 1
)
Lastly, consider removing the GROUP BY clause. It reduces your results to one row per reusing user_id, showing one of its related banned user_ids (arbitrarily chosen in case there is more than one). I don't know your tables. Are you getting many records per reusing user_id? If not, remove the clause.
As to indexes I suggest:
banned_users(rec_status, email_address)
users(email_address, id)
advert_views(user_id, client_ip)
advert_views(client_ip, user_id)
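As statements, and assuming the table is named advert_views and the users key column is id (as in the question), these suggestions might look like the sketch below. The index names are invented.

```sql
-- Index names are made up; tables and columns follow the question's schema.
CREATE INDEX idx_banned_status_email ON banned_users (rec_status, email_address);
CREATE INDEX idx_users_email_id      ON users        (email_address, id);
CREATE INDEX idx_av_user_ip          ON advert_views (user_id, client_ip);
CREATE INDEX idx_av_ip_user          ON advert_views (client_ip, user_id);
```

The two advert_views indexes serve the two sides of the self-join: one finds the IPs of a given user, the other finds the users of a given IP.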

MySQL - 3 tables, is this complex join even possible?

I have three tables: users, groups and relation.
Table users with fields: usrID, usrName, usrPass, usrPts
Table groups with fields: grpID, grpName, grpMinPts
Table relation with fields: uID, gID
A user can be placed in a group in two ways:
if he collects the group's minimum number of points (users.usrPts > groups.grpMinPts ORDER BY groups.grpMinPts DESC LIMIT 1)
if his relation to the group is manually added to the relation table (user ID provided as uID, and group ID provided as gID, in the table named relation)
Can I create one single query to determine, for every user (or one specific user), which group he belongs to, where a manual relation (from the relation table) has higher priority than usrPts compared to grpMinPts? Also, I do not want one user shown twice (once for his group by points and again for his related group)...
Thanks in advance! :) I tried:
SELECT * FROM users LEFT JOIN (relation LEFT JOIN groups ON relation.gID = groups.grpID) ON users.usrID = relation.uID
Using this I managed to extract specified relations (from relation table), but, I have no idea how to include user points, respecting above mentioned priority (specified first). I know how to do this in a few separated queries in php, that is simple, but I am curious, can it be done using one single query?
EDIT TO ADD:
Thanks to the really educational technique using COALESCE that @GordonLinoff provided, I managed to make this query work as I expected. So, here it goes:
SELECT o.usrID, o.usrName, o.usrPass, o.usrPts, t.grpID, t.grpName
FROM (
SELECT u.*, COALESCE(relationgroupid,groupid) AS thegroupid
FROM (
SELECT u.*, (
SELECT grpID
FROM groups g
WHERE u.usrPts > g.grpMinPts
ORDER BY g.grpMinPts DESC
LIMIT 1
) AS groupid, (
SELECT gID
FROM relation r
WHERE r.uID = u.usrID
) AS relationgroupid
FROM users u
)u
)o
JOIN groups t ON t.grpID = o.thegroupid
Also, if you are wondering, as I did, whether this approach is faster or slower than doing three queries and processing in PHP, the answer is that it is slightly faster. The average time to execute this query and show the results on a webpage is 14 ms. Three simple queries, processing in PHP and showing the results on a webpage took 21 ms. The averages are based on 10 runs, and the execution times were essentially constant.
Here is an approach that uses correlated subqueries to get each of the values. It then chooses the appropriate one using the precedence rule: if a relation exists, use that one; otherwise use the one from the groups table:
select u.*,
coalesce(relationgroupid, groupid) as thegroupid
from (select u.*,
(select grpid from groups g where u.usrPts > g.grpMinPts order by g.grpMinPts desc limit 1
) as groupid,
(select gID from relation r where r.uID = u.usrID
) as relationgroupid
from users u
) u
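A small worked example of the precedence rule, using made-up sample data and hypothetical demo_ tables that borrow the column names from the question:

```sql
-- Hypothetical sample tables; column names follow the question.
CREATE TEMPORARY TABLE demo_users    (usrID INT PRIMARY KEY, usrName VARCHAR(20), usrPts INT);
CREATE TEMPORARY TABLE demo_groups   (grpID INT PRIMARY KEY, grpName VARCHAR(20), grpMinPts INT);
CREATE TEMPORARY TABLE demo_relation (uID INT PRIMARY KEY, gID INT);

INSERT INTO demo_users    VALUES (1, 'ann', 150), (2, 'bob', 150);
INSERT INTO demo_groups   VALUES (10, 'bronze', 0), (20, 'silver', 100), (30, 'gold', 500);
INSERT INTO demo_relation VALUES (2, 30);   -- bob is manually placed in gold

SELECT u.usrName,
       COALESCE(
         -- manual relation first; NULL when the user has none
         (SELECT r.gID FROM demo_relation AS r WHERE r.uID = u.usrID),
         -- otherwise the highest group whose minimum the user exceeds
         (SELECT g.grpID FROM demo_groups AS g
           WHERE u.usrPts > g.grpMinPts
           ORDER BY g.grpMinPts DESC LIMIT 1)
       ) AS thegroupid
FROM demo_users AS u
ORDER BY u.usrID;
-- ann -> 20 (silver, by points)
-- bob -> 30 (gold, the manual relation wins over points)
```

The scalar subquery on the relation table must return at most one row per user, which the sketch guarantees by making uID the primary key; that constraint is an assumption about the real table.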
Try something like this
select user.name, group.name
from group
join relation on relation.gid = group.gid
join user on user.uid = relation.uid
union
select user.name, g1.name
from group g1
join group g2 on g2.minpts > g1.minpts
join user on user.pts between g1.minpts and g2.minpts

The least amount of code possible for this MySQL query?

I have a MySQL query that:
gets data from three tables linked by unique IDs.
counts the number of games played in each category, from each user
and counts the number of games each user has played that fall under the "fps" category
It seems to me that this code could be a lot smaller. How would I go about making this query smaller? http://sqlfiddle.com/#!2/6d211/1
Any help is appreciated even if you just give me links to check out.
Generally it's a good idea to have your join logic as part of the [Inner|Left] Join clause, rather than as part of the Where clause. In your case of simplifying the query, this cleans up your Where clause so that the query processor doesn't apply filter conditions too early, which restricts what you want to do in more complex parts of the query (and impacts the overall performance of the query).
By refactoring the join conditions, we can reduce the query to its core join across the three tables, and then add the join to the specialised subquery where the aggregation occurs. This results in only one nested query, which joins across the fewest tables needed.
Here's what I came up with:
SELECT
u.user_id
,pg.game_id
,u.user
,g.game
,g.game_cat
,ga.cat_count
,ga.fps_count
FROM users u
inner join played_games pg
on u.user_id = pg.user_id
inner join games g
on pg.game_id = g.id
inner join
(
select
ipg.user_id
,ig.game_cat
,count(ig.game) cat_count
,sum(case when ig.game_cat = 'fps' then 1 else 0 end) fps_count
from played_games ipg
inner join games ig
on ipg.game_id = ig.id
group by
ipg.user_id
,ig.game_cat
) ga
on g.game_cat = ga.game_cat
and pg.user_id = ga.user_id
order by
ga.fps_count desc
,u.user
,ga.cat_count desc;
One difference from the original query (apart from the slight rename) is that the fps_count field has a value of 0 instead of NULL for players who haven't played a single FPS game. Hopefully this isn't critical, and it arguably adds meaning to the result.
Lastly, I'm not sure about the context of how this is going to be used. In my opinion it's probably trying to do too much in both listing every game played by every user (one objective) and summarising the categories of games played by each user (a separate objective). This means that the summary details are being repeated multiple times, e.g. for users playing multiple games of a particular category, which may not be ideal. My recommendation would be to separate these out into two separate queries, though I don't know whether that would meet your specific needs.
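As a sketch of that split, the summary half on its own might look like this, assuming the same users, played_games and games tables and columns as in the query above:

```sql
-- Summary only: one row per user and category, not repeated per game.
-- Assumes the tables and columns used in the query above.
SELECT u.user_id,
       u.user,
       g.game_cat,
       COUNT(*) AS cat_count
FROM users AS u
INNER JOIN played_games AS pg ON pg.user_id = u.user_id
INNER JOIN games AS g ON g.id = pg.game_id
GROUP BY u.user_id, u.user, g.game_cat
ORDER BY u.user, cat_count DESC;
```

The per-game listing would then be a plain three-table join without the aggregate subquery.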
Hope this helps.
I was wondering whether to provide d_mcg's solution or this one, and decided to go for this one. I don't know which one would be faster; that's something you can try and tell us :)
select u.user_id, pg.game_id, u.user, g.game, g.game_cat,
(select count(*) from played_games pg2
join games g2 on pg2.game_id = g2.id
where pg2.user_id = pg.user_id and g2.game_cat = g.game_cat) cat_count,
(select count(*) from played_games pg3
join games g3 on pg3.game_id = g3.id
where pg3.user_id = pg.user_id and g3.game_cat = g.game_cat and
g.game_cat = 'fps') order_count
from users u
left join played_games pg on u.user_id = pg.user_id
join games g on pg.game_id = g.id
order by order_count desc, u.user, cat_count desc