The least amount of code possible for this MySQL query? - mysql

I have a MySQL query that:
gets data from three tables linked by unique IDs.
counts the number of games played in each category, from each user
and counts the number of games each user has played that fall under the "fps" category
It seems to me that this code could be a lot smaller. How would I go about making this query smaller? http://sqlfiddle.com/#!2/6d211/1
Any help is appreciated even if you just give me links to check out.

Generally it's a good idea to have your join logic as part of the [Inner|Left] Join clause, rather than as part of the Where clause. In your case of simplifying the query, this cleans up your Where clause so that the query processor doesn't apply filter conditions too early, which restricts what you want to do in more complex parts of the query (and impacts the overall performance of the query).
By refactoring the join conditions, we can reduce the query to its core join across the three tables, and then add the join to the specialised subquery where the aggregation occurs. This results in only one nested query, which joins across the fewest tables needed.
Here's what I came up with:
SELECT
u.user_id
,pg.game_id
,u.user
,g.game
,g.game_cat
,ga.cat_count
,ga.fps_count
FROM users u
inner join played_games pg
on u.user_id = pg.user_id
inner join games g
on pg.game_id = g.id
inner join
(
select
ipg.user_id
,ig.game_cat
,count(ig.game) cat_count
,sum(case when ig.game_cat = 'fps' then 1 else 0 end) fps_count
from played_games ipg
inner join games ig
on ipg.game_id = ig.id
group by
ipg.user_id
,ig.game_cat
) ga
on g.game_cat = ga.game_cat
and pg.user_id = ga.user_id
order by
ga.fps_count desc
,u.user
,ga.cat_count desc;
One difference from the original query (apart from the slight rename) is that the fps_count field has a value of 0 instead of NULL for players who haven't played a single FPS game. Hopefully this isn't critical, and it arguably makes the result easier to read.
Lastly, I'm not sure about the context of how this is going to be used. In my opinion it's probably trying to do too much in both listing every game played by every user (one objective) and summarising the categories of games played by each user (a separate objective). This means that the summary details are being repeated multiple times, e.g. for users playing multiple games of a particular category, which may not be ideal. My recommendation would be to separate these out into two separate queries, though I don't know whether that would meet your specific needs.
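The summary half of that split can be sketched on its own. This is a minimal sketch, not a drop-in replacement: it uses Python's built-in sqlite3 module instead of MySQL so it runs anywhere, the table and column names follow the query above, and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, user TEXT);
CREATE TABLE games (id INTEGER PRIMARY KEY, game TEXT, game_cat TEXT);
CREATE TABLE played_games (user_id INTEGER, game_id INTEGER);
-- Hypothetical sample data.
INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
INSERT INTO games VALUES (1, 'quake', 'fps'), (2, 'doom', 'fps'), (3, 'civ', 'strategy');
INSERT INTO played_games VALUES (1, 1), (1, 2), (1, 3), (2, 3);
""")

# One row per user and category, so the counts are not repeated per game.
summary_rows = conn.execute("""
SELECT u.user,
       g.game_cat,
       COUNT(*) AS cat_count,
       SUM(CASE WHEN g.game_cat = 'fps' THEN 1 ELSE 0 END) AS fps_count
FROM users u
INNER JOIN played_games pg ON u.user_id = pg.user_id
INNER JOIN games g ON pg.game_id = g.id
GROUP BY u.user_id, u.user, g.game_cat
ORDER BY fps_count DESC, u.user, cat_count DESC
""").fetchall()

for row in summary_rows:
    print(row)
```

The per-game listing would then be a plain three-table join with no aggregation at all.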
Hope this helps.

I was deciding whether to suggest d_mcg's solution or this one, and I went for this one. I was wondering which would be faster. That's something you can try and tell us :)
select u.user_id, pg.game_id, u.user, g.game, g.game_cat,
(select count(*) from played_games pg2
join games g2 on pg2.game_id = g2.id
where pg2.user_id = pg.user_id and g2.game_cat = g.game_cat) cat_count,
(select count(*) from played_games pg3
join games g3 on pg3.game_id = g3.id
where pg3.user_id = pg.user_id and g3.game_cat = g.game_cat and
g.game_cat = 'fps') order_count
from users u
left join played_games pg on u.user_id = pg.user_id
join games g on pg.game_id = g.id
order by order_count desc, u.user, cat_count desc

Related

MySQL LEFT JOIN working but slow, INNER works when the db is populated

I am building a statistics page for users of my site to see how much they have been interacting with the various features. On the site they can do four different types of activity, get points for them, and give feedback to other users on their answers. These are all recorded in the database.
In the points tables, a row is created when a user does something for the first time, for example watches a particular video. There is a new row for each video.
In the feedback tables, a row is created for each unique piece of feedback, with user id, feedback etc.
The problem I am having is that whilst two types of join, INNER JOIN and LEFT JOIN, both give the expected output, they have radically different execution times.
LEFT JOIN executes but takes forever (and subsequently crashes my low memory computer!).
INNER JOIN executes but only if all the tables have at least one matching row for the user.
I'm therefore sure I am missing some efficiency step here, or misunderstanding how the joins should be used in this context. I could use help looking at the SQL statement and at how it could be improved.
Fiddle Link: http://sqlfiddle.com/#!9/9dce51 (with limited data due to size!)
The problems are with the groups of eight below - the other queries execute swiftly and properly.
SELECT users.forename, users.surname, users.email,
users_peer_request.user_id, COUNT(DISTINCT users_peer_request.id) as peer_req_cnt,
COUNT(DISTINCT users_self_assessment.id) AS self_assess_cnt, AVG(users_self_assessment.assessment) AS self_assess_avg,
GROUP_CONCAT(DISTINCT classes.name, '-', classes.id SEPARATOR ',') AS classes,
COUNT(DISTINCT a_experiment_points.id) AS tried_exp,
COUNT(DISTINCT a_phetlabs_points.id) AS tried_phet,
COUNT(DISTINCT a_videonotes_points.id) AS tried_video,
COUNT(DISTINCT a_working_points.id) AS tried_work,
COUNT(DISTINCT a_experiment_feedback.id) AS fb_exp,
COUNT(DISTINCT a_phetlab_feedback.id) AS fb_phet,
COUNT(DISTINCT a_videonotes_feedback.id) AS fb_video,
COUNT(DISTINCT a_working_feedback.id) AS fb_work
FROM users_peer_request
LEFT JOIN classes ON classes.id IN (SELECT classes_students.class_id FROM classes_students WHERE classes_students.student_id = users_peer_request.user_id)
LEFT JOIN users ON users.gid = users_peer_request.user_id
LEFT JOIN users_self_assessment ON users_self_assessment.user_id = users_peer_request.user_id
INNER JOIN a_experiment_points ON a_experiment_points.user_id = users_peer_request.user_id
INNER JOIN a_phetlabs_points ON a_phetlabs_points.user_id = users_peer_request.user_id
INNER JOIN a_videonotes_points ON a_videonotes_points.user_id = users_peer_request.user_id
INNER JOIN a_working_points ON a_working_points.user_id = users_peer_request.user_id
INNER JOIN a_experiment_feedback ON a_experiment_feedback.user_fb_id = users_peer_request.user_id
INNER JOIN a_phetlab_feedback ON a_phetlab_feedback.user_fb_id = users_peer_request.user_id
INNER JOIN a_videonotes_feedback ON a_videonotes_feedback.user_id_fb = users_peer_request.user_id
INNER JOIN a_working_feedback ON a_working_feedback.user_fb_id = users_peer_request.user_id
WHERE users_peer_request.user_id = 123456789123456789123456
GROUP BY users_peer_request.user_id
I would hope that this query, run on the fiddle provided, returns NULL or 0 where the user does not appear in a table, and the count of matching rows otherwise.
Thanks so much, I hope you have enough detail here but if not please ask :)

Join another table with multiple rows to another table's single result

I currently select a single row (a post):
SELECT s.id AS id,s.date,s.title,s.views,s.image,s.width,s.description,u.id AS userId,u.username,u.display_name,u.avatar,
(select count(*) from comments where item_id = s.id and type = 1) as numComments,
(select count(*) from likes where item_id = s.id and type = 1) as numLikes,
(select avg(value) from ratings where showcase_id = s.id) as average,
(select count(*) from ratings where showcase_id = s.id) as total
FROM showcase AS s
INNER JOIN users AS u ON s.user_id = u.id
WHERE s.id = :id
LIMIT 5
Then get comments for that post in a separate query:
SELECT c.id as c_id,c.text,c.date,u.id as u_id,u.username,u.display_name,u.avatar
FROM comments as c
INNER JOIN users as u ON c.user_id = u.id
WHERE item_id = :item_id AND type = :type
:id and :item_id are the same. However, the comments return multiple rows whereas the first query returns one row - is there a way to join the comments to the first query or is the current way fine?
It really depends on your application.
If we are talking about a few records returned from a small or medium table, and if the query is executed just a few times a day, then it wouldn't matter much if:
you work with two record sets (two different queries are executed and then their results are put together);
you join the two queries, copying the post information for each record from the comments query;
you build an XML document with the comments and join it to the record returned in the first query (the post record).
Another factor to take into consideration is whether the post and its comments are displayed at the same time. If this is NOT the case, and the comments are hidden at first and displayed only after some action like the click of a button, then you should choose the 1st option above, for performance reasons.
But if both the post information and its comments must be displayed at the same time, then you should choose one of the 3 options above. Which one is more a matter of personal preference in how you model your application's data structures and its database access layer.
Now, if the volume of data may get huge, then you should dig a little deeper and run some simulations to find the query (or queries) that gives you the best performance.

Mysql query incredibly slow in LEFT JOIN if the given value 0 to the primary id

In this sql:
SELECT s.*,
u.id,
u.name
FROM shops s
LEFT JOIN users u ON u.id = s.user_id
OR u.id = s.owner_user_id
WHERE s.status = 1
For some reason this query takes an amazingly long time, although id is the primary key. It seems to have become slow especially after I added the OR u.id = s.owner_user_id part. owner_user_id is usually 0 and only a handful of rows have a real value. But why would it take so long, apparently scanning the whole table? The users table is very long and wide. I didn't design it; this is for a client whose subsequent programmers added too many fields. The table has 22k rows and dozens of fields.
*The field names are for demonstration only; the actual names are different, so don't ask me why I'm looking for owner_user_id (; I did solve the slowness by removing the "OR ..." part and instead looking up the id in the loop when it is not 0, but I would like to know why this is happening and how to speed up the query as is.
You may be able to speed it up by using IN instead of the OR but that is minor.
SELECT u.id,
u.name
FROM shops s
LEFT JOIN users u ON u.id IN ( s.user_id, s.owner_user_id )
WHERE s.status = 1
Firstly, are there any indexes on these tables? Mainly one on users.id, or on s.user_id or s.owner_user_id?
However, I must ask why you need a LEFT JOIN instead of a regular join. A LEFT JOIN keeps every shops row even when no users row matches, filling the users columns with NULL. I'm assuming the id should always be in either the user_id or the owner_user_id field and that there will always be a match; if that is the case, a plain JOIN may speed the query up a bit.
And as Mitch said, 22k rows is tiny.
How are you going to know which user record is which? Here's how I'd do it:
SELECT s.*,
u.name AS user_name,
o.name AS owner_name
FROM shops s
LEFT JOIN users u ON s.user_id = u.id
LEFT JOIN users o ON s.owner_user_id = o.id
WHERE s.status = 1
I've omitted the IDs from the user table in the SELECT as these will be part of s.* anyway.
I'm curious about the left joins too. If shops.user_id and shops.owner_user_id are required foreign keys, use inner joins instead.
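To see what the two-alias join returns when owner_user_id is 0, here is a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the shops columns other than status, user_id and owner_user_id, as well as all sample rows, are made up for the demo.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE shops (
    id INTEGER PRIMARY KEY,
    title TEXT,              -- hypothetical column
    status INTEGER,
    user_id INTEGER,
    owner_user_id INTEGER    -- 0 means "no owner" in the question
);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO shops VALUES
    (10, 'north', 1, 1, 0),
    (11, 'south', 1, 2, 1),
    (12, 'closed', 0, 1, 2);
""")

# Each join is a simple equality on the users primary key,
# and each role gets its own clearly named column.
shop_rows = conn.execute("""
SELECT s.id,
       u.name AS user_name,
       o.name AS owner_name
FROM shops s
LEFT JOIN users u ON s.user_id = u.id
LEFT JOIN users o ON s.owner_user_id = o.id
WHERE s.status = 1
ORDER BY s.id
""").fetchall()

for row in shop_rows:
    print(row)
```

Shop 10 has owner_user_id 0, so its owner_name comes back as NULL (None in Python) while the row itself is kept, which is the one case where the LEFT JOIN on the owner is actually needed.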

Counting results from multiple tables with same column

I have a system where, essentially, users are able to put in 3 different pieces of information: a tip, a comment, and a vote. These pieces of information are saved to 3 different tables. The linking column of each table is the user ID. I want to do a query to determine if the user has any pieces of information at all, of any of the three types. I'm trying to do it in a single query, but it's coming out totally wrong. Here's what I'm working with now:
SELECT DISTINCT
*
FROM tips T
LEFT JOIN comments C ON T.user_id = C.user_id
LEFT JOIN votes V ON T.user_id = V.user_id
WHERE T.user_id = 1
This seems to only be getting the tips, duplicated for as many votes or comments as there are, even if the votes or comments weren't made by the specified user_id.
I only need a single number in return, not individual counts of each type. I basically want a sum of the number of tips, comments, and votes saved under that user_id, but I don't want to do three queries.
Anyone have any ideas?
Edit: Actually, I don't even technically need an actual count, I just need to know if there are any rows in any of those three tables with that user_id.
Edit 2: I almost have it with this:
SELECT
COUNT(DISTINCT T.tip_id),
COUNT(DISTINCT C.tip_id),
COUNT(DISTINCT V.tip_id)
FROM tips T
LEFT JOIN comments C ON T.user_id = C.user_id
LEFT JOIN votes V ON T.user_id = V.user_id
WHERE T.user_id = 1
I'm testing with user_id 1 (me). I've made 11 tips, voted 4 times, and made no comments. My return is a row with 3 columns: 11, 0, 4. That's the proper count. However, when I tested it with a user that hasn't made any tips or comments but has voted 3 times, it returned 0 for all counts. It should have returned 0, 0, 3.
The problem that I'm having seems to be that if the table that I'm using for the WHERE clause doesn't have any rows from that user_id, then I get 0 across the board, even if the other tables DO have rows with that user_id. I could use this query:
SELECT
(SELECT COUNT(*) FROM tips WHERE user_id = 2) +
(SELECT COUNT(*) FROM comments WHERE user_id = 2) +
(SELECT COUNT(*) FROM votes WHERE user_id = 2) AS total
But I really wanted to avoid running multiple queries, even if they're subqueries like this.
UPDATE
Thanks to ace, I figured this out:
SELECT
(COUNT(DISTINCT T.tip_id) + COUNT(DISTINCT C.tip_id) + COUNT(DISTINCT V.tip_id)) AS total
FROM users U
LEFT JOIN tips T ON U.user_id = T.user_id
LEFT JOIN votes V ON U.user_id = V.user_id
LEFT JOIN comments C ON U.user_id = C.user_id
WHERE U.user_id = 4
The users table contains the actual information about the user including, obviously, the user id. I used the users table as the parent, since I could be 100% sure that the user would be present in that table, even if they weren't in the other tables. I got the proper count that I wanted with this query!
As I understand your question, you want to count the total comments + tips + votes for each user. It is not entirely clear to me, but take a look at the query below. My first version had extra columns for the details; it is a cross-tab query, as someone once taught me.
EDITED QUERY:
SELECT
COALESCE(t2.tips,0) + COALESCE(c2.comments,0) + COALESCE(v2.votes,0) AS `Totals`
FROM parent p
LEFT JOIN (SELECT t.user_id, COUNT(t.tip_id) AS tips FROM tips t GROUP BY t.user_id) t2
ON p.user_id = t2.user_id
LEFT JOIN (SELECT c.user_id, COUNT(c.tip_id) AS comments FROM comments c GROUP BY c.user_id) c2
ON p.user_id = c2.user_id
LEFT JOIN (SELECT v.user_id, COUNT(v.tip_id) AS votes FROM votes v GROUP BY v.user_id) v2
ON p.user_id = v2.user_id
WHERE p.user_id = 1;
Note: This uses a parent table so that a user still gets a result even when they have no rows in one or more of the other tables.
The reason I use a sub-query in each JOIN is to create a virtual table holding the per-user count of rows from each table. I also had problems with DISTINCT when I tried your version of the query, so I ended up with this one.
I know you prefer not using sub-queries, but I couldn't get it to work without them. For now this is all I can offer.
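Since the question's first edit says that only the existence of rows matters, not the count, one alternative (not the approach taken in the answer above) is an EXISTS check per table. The sketch below assumes Python's built-in sqlite3 module in place of MySQL; the primary key names comment_id and vote_id, the helper name has_activity, and the sample rows are all hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tips (tip_id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE comments (comment_id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE votes (vote_id INTEGER PRIMARY KEY, user_id INTEGER);
-- Hypothetical data: user 1 has tips and a vote, user 2 only votes, user 3 nothing.
INSERT INTO tips (user_id) VALUES (1), (1);
INSERT INTO votes (user_id) VALUES (1), (2), (2);
""")

def has_activity(conn, user_id):
    """Return True if the user has at least one tip, comment, or vote."""
    row = conn.execute("""
        SELECT EXISTS (SELECT 1 FROM tips WHERE user_id = :uid)
            OR EXISTS (SELECT 1 FROM comments WHERE user_id = :uid)
            OR EXISTS (SELECT 1 FROM votes WHERE user_id = :uid)
    """, {"uid": user_id}).fetchone()
    return bool(row[0])

print(has_activity(conn, 1))
print(has_activity(conn, 2))
print(has_activity(conn, 3))
```

This is still a single statement sent to the database, and it does not depend on a parent users table or on which of the three tables happens to contain rows for the user.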

Join single row from a table in MySQL

I have two tables players and scores.
I want to generate a report that looks something like this:
player  first score  points
foo     2010-05-20   19
bar     2010-04-15   29
baz     2010-02-04   13
Right now, my query looks something like this:
select p.name player,
min(s.date) first_score,
s.points points
from players p
join scores s on s.player_id = p.id
group by p.name, s.points
I need the s.points that is associated with the row that min(s.date) returns. Is that happening with this query? That is, how can I be certain I'm getting the correct s.points value for the joined row?
Side note: I imagine this is somehow related to MySQL's lack of dense ranking. What's the best workaround here?
This is the greatest-n-per-group problem that comes up frequently on Stack Overflow.
Here's my usual answer:
select
p.name player,
s.date first_score,
s.points points
from players p
join scores s
on s.player_id = p.id
left outer join scores s2
on s2.player_id = p.id
and s2.date < s.date
where
s2.player_id is null
;
In other words, given score s, try to find a score s2 for the same player, but with an earlier date. If no earlier score is found, then s is the earliest one.
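The "no earlier score exists" idea can be checked on a tiny data set. This is a sketch under assumptions: it runs the same join through Python's built-in sqlite3 module rather than MySQL, and the sample players and scores are invented (dates are stored as ISO text so that string comparison matches date order).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE scores (id INTEGER PRIMARY KEY, player_id INTEGER, date TEXT, points INTEGER);
INSERT INTO players VALUES (1, 'foo'), (2, 'bar');
INSERT INTO scores VALUES
    (1, 1, '2010-06-01', 40),
    (2, 1, '2010-05-20', 19),
    (3, 2, '2010-04-15', 29),
    (4, 2, '2010-04-20', 5);
""")

# s2 looks for an earlier score of the same player; rows where none
# was found (s2 columns are NULL) are the earliest scores.
first_scores = conn.execute("""
SELECT p.name, s.date, s.points
FROM players p
JOIN scores s ON s.player_id = p.id
LEFT OUTER JOIN scores s2
    ON s2.player_id = p.id AND s2.date < s.date
WHERE s2.player_id IS NULL
ORDER BY p.name
""").fetchall()

for row in first_scores:
    print(row)
```

Each player comes back exactly once here because no two scores of the same player share a date.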
Re your comment about ties: You have to have a policy for which one to use in case of a tie. One possibility is if you use auto-incrementing primary keys, the one with the least value is the earlier one. See the additional term in the outer join below:
select
p.name player,
s.date first_score,
s.points points
from players p
join scores s
on s.player_id = p.id
left outer join scores s2
on s2.player_id = p.id
and (s2.date < s.date or s2.date = s.date and s2.id < s.id)
where
s2.player_id is null
;
Basically you need to add tiebreaker terms until you get down to a column that's guaranteed to be unique, at least for the given player. The primary key of the table is often the best solution, but I've seen cases where another column was suitable.
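The effect of the tiebreaker term can be seen by running both join conditions against data that contains a tie. As before, this is only a sketch: Python's built-in sqlite3 module stands in for MySQL, the helper name earliest_scores is hypothetical, and the rows are made up so that player foo has two scores on the same earliest date.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE scores (id INTEGER PRIMARY KEY, player_id INTEGER, date TEXT, points INTEGER);
INSERT INTO players VALUES (1, 'foo');
-- Scores 1 and 2 tie on the earliest date.
INSERT INTO scores VALUES
    (1, 1, '2010-05-20', 19),
    (2, 1, '2010-05-20', 7),
    (3, 1, '2010-06-01', 40);
""")

def earliest_scores(conn, earlier_condition):
    """Run the anti-join with the given definition of 's2 is earlier than s'."""
    sql = f"""
        SELECT p.name, s.date, s.points
        FROM players p
        JOIN scores s ON s.player_id = p.id
        LEFT OUTER JOIN scores s2
            ON s2.player_id = p.id AND ({earlier_condition})
        WHERE s2.player_id IS NULL
        ORDER BY s.id
    """
    return conn.execute(sql).fetchall()

# Date only: both tied rows survive, so the player appears twice.
date_only = earliest_scores(conn, "s2.date < s.date")

# Date plus primary key: the row with the lower id wins the tie.
with_tiebreak = earliest_scores(
    conn, "s2.date < s.date OR (s2.date = s.date AND s2.id < s.id)"
)

print(date_only)
print(with_tiebreak)
```

The fixed condition strings are interpolated only to keep the demo short; values coming from users should always go through bound parameters instead.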
Regarding the comments I shared with @OMG Ponies, remember that this type of query benefits hugely from the right index.
Most RDBMSs won't even let you include non-aggregate columns in your SELECT clause when using GROUP BY. In MySQL (when the ONLY_FULL_GROUP_BY SQL mode is off, as it was by default before MySQL 5.7.5), you'll end up with values from arbitrary rows for your non-aggregate columns. This is useful if you actually have the same value in a particular column for all the rows in a group. So it's nice that MySQL doesn't restrict us, though it's an important thing to understand.
A whole chapter is devoted to this in SQL Antipatterns.