I have two tables:
user_score_post -> fields: id, post_id, user_id, score_date -> about 3M rows
post -> fields: id, user_id, body -> about 10k rows
This is my query for retrieving the monthly rank of a user with a given user_id and score, based on the number of likes his posts received:
SELECT COUNT(*) FROM
(SELECT COUNT(l.id) AS likes
FROM user_score_post l
JOIN post p ON p.id = l.post_id
AND l.score_date >= :time -- last month
GROUP BY p.user_id) AS score
WHERE score.likes > :score -- user's current score
But it takes 2.4 seconds to execute. Is that normal, given that I'm using proper indexes and a powerful dedicated server?
What is the best alternative for this query, and what is the best index composition?
It would seem the following returns the score for each user. Perhaps score_date should also be bounded with BETWEEN the month start and end.
select p.user_id, count(*) as likes
from user_score_post l
join post p on (p.id = l.post_id)
where l.score_date >= :time
group by p.user_id
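A minimal sketch of the per-user score and the rank count, assuming the schema from the question. It runs on an in-memory SQLite database through Python's standard sqlite3 module, so SQLite stands in for MySQL; the sample rows, the cutoff date, the index name idx_usp_post_date and the helper functions user_scores and users_ahead_of are all hypothetical. Likes are joined to posts on p.id = l.post_id, and a user's rank is one more than the number of users with a strictly higher score.

```python
import sqlite3

# Hypothetical schema mirroring the question (SQLite stands in for MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT);
CREATE TABLE user_score_post (
    id INTEGER PRIMARY KEY, post_id INTEGER, user_id INTEGER, score_date TEXT);
-- Hypothetical index: likes looked up by post, then by date.
CREATE INDEX idx_usp_post_date ON user_score_post (post_id, score_date);
""")
conn.executemany("INSERT INTO post VALUES (?, ?, ?)",
                 [(1, 10, "a"), (2, 10, "b"), (3, 20, "c"), (4, 30, "d")])
# (post_id, liker_user_id, score_date): made-up sample likes.
likes = [(1, 99, "2024-05-03"), (2, 98, "2024-05-04"), (2, 97, "2024-05-05"),
         (3, 99, "2024-05-06"), (3, 98, "2024-04-01"),  # April: outside window
         (4, 96, "2024-05-07"), (4, 95, "2024-05-08")]
conn.executemany(
    "INSERT INTO user_score_post (post_id, user_id, score_date) VALUES (?, ?, ?)",
    likes)


def user_scores(since):
    """Likes received per post author since `since` (joined on post id)."""
    sql = """SELECT p.user_id, COUNT(*) AS likes
             FROM user_score_post l
             JOIN post p ON p.id = l.post_id
             WHERE l.score_date >= ?
             GROUP BY p.user_id"""
    return dict(conn.execute(sql, (since,)).fetchall())


def users_ahead_of(score, since):
    """Number of authors with strictly more likes than `score`."""
    sql = """SELECT COUNT(*) FROM (
               SELECT COUNT(*) AS likes
               FROM user_score_post l
               JOIN post p ON p.id = l.post_id
               WHERE l.score_date >= ?
               GROUP BY p.user_id) AS score
             WHERE score.likes > ?"""
    return conn.execute(sql, (since, score)).fetchone()[0]


scores = user_scores("2024-05-01")
print(sorted(scores.items()))                        # [(10, 3), (20, 1), (30, 2)]
print(users_ahead_of(scores[30], "2024-05-01") + 1)  # rank of user 30: 2
```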
Sometimes in MySQL it is faster to use a correlated subquery rather than a join with aggregation. You might try:
SELECT COUNT(*)
FROM (SELECT (SELECT COUNT(*) AS likes
FROM user_score_post l
WHERE p.id = l.post_id AND score_date >= :time
) as likes
FROM post p
) score
WHERE score.likes > :score -- user's current score
Then, the index you want is user_score_post(post_id, score_date). This might help.
I have 3 tables:
I would like to select the difference between the total gain and the total spent per user. So my hypothetical result table could be:
I tried this:
SELECT g.total - s.total AS quantity, id FROM
(SELECT SUM(quantity) AS total FROM gain GROUP BY user) AS g,
(SELECT SUM(quantity) AS total FROM spent GROUP BY user) AS s, users
But it doesn't work...
You need to use the users table as the base table, so that all users are considered, and then LEFT JOIN to the subqueries computing the total spent and the total gain. This is because some users may not have any entry in the gain or spent table(s). The COALESCE() function then handles the resulting NULLs (when there is no matching row).
SELECT
u.id AS user,
COALESCE(tot_gain, 0) - COALESCE(tot_spent, 0) AS balance
FROM users AS u
LEFT JOIN (SELECT user, SUM(quantity) as tot_spent
FROM spent
GROUP BY user) AS s ON s.user = u.id
LEFT JOIN (SELECT user, SUM(quantity) as tot_gain
FROM gain
GROUP BY user) AS g ON g.user = u.id
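To check the LEFT JOIN + COALESCE behaviour, here is a small sketch on in-memory SQLite via Python's sqlite3 module (standing in for MySQL). The rows are hypothetical: user 2 has only gains and user 3 has no rows at all, yet both still appear with a defined balance. The ORDER BY is an addition, only there to make the output order predictable.

```python
import sqlite3

# Hypothetical data: user 2 only gains, user 3 has no rows in either table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY);
CREATE TABLE gain  (user INTEGER, quantity INTEGER);
CREATE TABLE spent (user INTEGER, quantity INTEGER);
INSERT INTO users VALUES (1), (2), (3);
INSERT INTO gain  VALUES (1, 10), (1, 5), (2, 7);
INSERT INTO spent VALUES (1, 4), (1, 1);
""")

# users is the base table; each subquery pre-aggregates one side, and
# COALESCE turns "no matching row" (NULL) into 0 before subtracting.
BALANCE_SQL = """
SELECT u.id AS user,
       COALESCE(tot_gain, 0) - COALESCE(tot_spent, 0) AS balance
FROM users AS u
LEFT JOIN (SELECT user, SUM(quantity) AS tot_spent
           FROM spent GROUP BY user) AS s ON s.user = u.id
LEFT JOIN (SELECT user, SUM(quantity) AS tot_gain
           FROM gain GROUP BY user) AS g ON g.user = u.id
ORDER BY u.id
"""

balances = conn.execute(BALANCE_SQL).fetchall()
print(balances)  # [(1, 10), (2, 7), (3, 0)]
```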
Madhur's solution is fine. An alternative is union all and group by:
select user, sum(gain) as gain, sum(spent) as spent
from ((select user, quantity as gain, 0 as spent
from gain
) union all
(select user, 0, quantity as spent
from spent
)
) u
group by user;
You can join to users if you want to include users that are in neither table, or if you need additional columns. However, that join may not be necessary.
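A sketch of the UNION ALL variant on in-memory SQLite through Python's sqlite3 module (standing in for MySQL; the sample rows are hypothetical), extended with a balance column that subtracts the two sums. SQLite does not accept parenthesized members in a compound SELECT, so the parentheses around each branch are dropped here. User 4, who only spent, still shows up, while a user with no rows in either table would not; that is the case the optional join to users covers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gain  (user INTEGER, quantity INTEGER);
CREATE TABLE spent (user INTEGER, quantity INTEGER);
INSERT INTO gain  VALUES (1, 10), (1, 5), (2, 7);
INSERT INTO spent VALUES (1, 4), (1, 1), (4, 3);
""")

# One pass over both tables: gains go in one column, spending in the other,
# then a single GROUP BY yields the per-user totals and their difference.
UNION_SQL = """
SELECT user,
       SUM(gain)  AS gain,
       SUM(spent) AS spent,
       SUM(gain) - SUM(spent) AS balance
FROM (SELECT user, quantity AS gain, 0 AS spent FROM gain
      UNION ALL
      SELECT user, 0, quantity FROM spent) AS u
GROUP BY user
ORDER BY user
"""

rows = conn.execute(UNION_SQL).fetchall()
print(rows)  # [(1, 15, 5, 10), (2, 7, 0, 7), (4, 0, 3, -3)]
```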
I'm having trouble structuring my MySQL query to return an accurate comment count, the sum of votes, and the active user's vote.
My tables are
wall_posts ( id, message, username, etc )
comments ( id, wall_id, username, text, etc )
votes ( id, wall_id, vote (+1 or -1), username )
My query looks like this:
SELECT
wall_posts.*,
COUNT( comments.wall_id ) AS comment_count,
COALESCE( SUM( v1.vote ), 0 ) AS vote_tally,
v2.vote
FROM
wall_posts
LEFT JOIN comments ON wall_posts.id = comments.wall_id
LEFT JOIN votes v1 ON wall_posts.id = v1.wall_id
LEFT JOIN votes v2 ON wall_posts.id = v2.wall_id AND v2.username=:username
WHERE
symbol = :symbol
GROUP BY
wall_posts.id
ORDER BY
date DESC
LIMIT 15
It always returns the correct value for the active user's own vote (+1 or -1), or NULL if they haven't voted. If there are no comments on an item, the total vote sum is correct. If there are any comments, the vote sum always equals the comment count, possibly with a negative sign if there are down votes, but always equal in size to the number of comments.
I think it's obviously the way I've joined my tables, but I just can't figure out why it's copying the comment count. 1000000 points to someone who can explain this to me :)
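You need to perform the aggregate operations in subqueries. Right now you're JOINing all of the tables together before aggregating, so every vote row is repeated once for each comment row on the same post, and both COUNT(comments.wall_id) and SUM(v1.vote) are computed over that multiplied set. If you remove the aggregates (and the GROUP BY) you'll see the large mass of data, which doesn't really mean anything.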
Instead, try this (note I'm using a VIEW):
CREATE VIEW walls_posts_stats AS
SELECT
wall_posts.id,
COALESCE( comments_stats.comment_count, 0 ) AS comment_count,
COALESCE( votes_stats.vote_tally, 0 ) AS vote_tally
FROM
wall_posts
LEFT OUTER JOIN
(
SELECT
wall_id,
COUNT(*) AS comment_count
FROM
comments
GROUP BY
wall_id
) AS comments_stats ON wall_posts.id = comments_stats.wall_id
LEFT OUTER JOIN
(
SELECT
wall_id,
SUM( vote ) AS vote_tally
FROM
votes
GROUP BY
wall_id
) AS votes_stats ON wall_posts.id = votes_stats.wall_id
Then you can query it JOINed with your original wall data:
SELECT
wall_posts.*, -- note: avoid the use of * in production queries
stats.comment_count,
stats.vote_tally,
user_votes.vote
FROM
wall_posts
INNER JOIN walls_posts_stats AS stats ON wall_posts.id = stats.id
LEFT OUTER JOIN
(
SELECT
wall_id,
vote
FROM
votes
WHERE
username = :username
) AS user_votes ON wall_posts.id = user_votes.wall_id
WHERE
symbol = :symbol
ORDER BY
date DESC
LIMIT 15
Hypothetically you could combine it into a single large query (basically copy+paste the VIEW body into the INNER JOIN walls_posts_stats clause) but I feel that would introduce maintainability issues.
While MySQL does support views, it does not support parameterized views (aka composable table-valued functions; stored procedures are not composable) so that's why the user_votes subquery isn't in the walls_posts_stats VIEW.
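The fan-out described in this answer is easy to reproduce. This sketch uses in-memory SQLite via Python's sqlite3 module as a stand-in for MySQL, with a hypothetical post carrying three comments and two +1 votes: joining first multiplies the rows (3 x 2 = 6), while aggregating each table in its own subquery, as the view above does, keeps both numbers right.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wall_posts (id INTEGER PRIMARY KEY, message TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, wall_id INTEGER, text TEXT);
CREATE TABLE votes (id INTEGER PRIMARY KEY, wall_id INTEGER, vote INTEGER,
                    username TEXT);
INSERT INTO wall_posts VALUES (1, 'hello');
-- hypothetical: three comments and two +1 votes on post 1
INSERT INTO comments (wall_id, text) VALUES (1, 'a'), (1, 'b'), (1, 'c');
INSERT INTO votes (wall_id, vote, username) VALUES (1, 1, 'ann'), (1, 1, 'bob');
""")

# Join first, aggregate later: each vote row is repeated once per comment row.
naive = conn.execute("""
SELECT COUNT(c.wall_id), COALESCE(SUM(v.vote), 0)
FROM wall_posts w
LEFT JOIN comments c ON c.wall_id = w.id
LEFT JOIN votes v    ON v.wall_id = w.id
GROUP BY w.id""").fetchone()

# Aggregate each table first, then join one summary row per post.
pre_agg = conn.execute("""
SELECT COALESCE(cs.comment_count, 0), COALESCE(vs.vote_tally, 0)
FROM wall_posts w
LEFT JOIN (SELECT wall_id, COUNT(*) AS comment_count
           FROM comments GROUP BY wall_id) cs ON cs.wall_id = w.id
LEFT JOIN (SELECT wall_id, SUM(vote) AS vote_tally
           FROM votes GROUP BY wall_id) vs ON vs.wall_id = w.id""").fetchone()

print(naive)    # (6, 6): 3 comments x 2 votes = 6 joined rows
print(pre_agg)  # (3, 2)
```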
I am saving the history of Facebook likes for a page, identified by user_id.
Now from this table, I need to get a set representing the user_ids and their latest number of likes, based on the most recent timestamp.
I started off with this:
SELECT *
FROM facebook_log
GROUP BY user_id
ORDER BY timestamp DESC;
But that does not do what I want, because it returns the first records, with the lowest timestamps.
I read something online about GROUP BY returning the very first records from the table.
I also read something about JOINing the table with itself, but that doesn't work either, or I did something wrong.
If you just need the user_id and the timestamp, you can just do:
select f.user_id, max(f.timestamp)
from facebook_log f
group by f.user_id;
If you need all the data from the table, you can do:
select f.*
from facebook_log f
inner join (select max(timestamp) mt, user_id
from facebook_log
group by user_id) m
on m.user_id = f.user_id and m.mt = f.timestamp
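A sketch of the join-back-on-MAX approach on in-memory SQLite (a stand-in for MySQL, via Python's sqlite3 module). The history rows and the composite index idx_user_ts are assumptions for illustration; numlikes is the column name used in the group_concat answer. If two rows for a user share the same newest timestamp, this query returns both of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE facebook_log (user_id INTEGER, numlikes INTEGER, timestamp INTEGER);
-- Hypothetical history: several snapshots per page owner.
INSERT INTO facebook_log VALUES
  (1, 100, 1000), (1, 120, 2000), (1, 150, 3000),
  (2,  10, 1500), (2,  12, 2500);
-- Hypothetical composite index serving both the GROUP BY and the join back.
CREATE INDEX idx_user_ts ON facebook_log (user_id, timestamp);
""")

# Groupwise maximum: find each user's newest timestamp, then join back to
# fetch the full row carrying that timestamp.
LATEST_SQL = """
SELECT f.user_id, f.numlikes, f.timestamp
FROM facebook_log f
INNER JOIN (SELECT user_id, MAX(timestamp) AS mt
            FROM facebook_log
            GROUP BY user_id) m
        ON m.user_id = f.user_id AND m.mt = f.timestamp
ORDER BY f.user_id
"""

latest = conn.execute(LATEST_SQL).fetchall()
print(latest)  # [(1, 150, 3000), (2, 12, 2500)]
```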
You can also get the latest number of likes by using this MySQL trick:
select f.user_id, max(f.timestamp),
substring_index(group_concat(f.numlikes order by f.timestamp desc), ',', 1) as LatestLikes
from facebook_log f
group by f.user_id;
I have these tables and queries as defined in sqlfiddle.
My first problem was grouping people while showing the LEFT JOINed visits row with the newest year. I solved that using a subquery.
Now my problem is that the subquery is not using the INDEX defined on the visits table, which makes my query run almost indefinitely on tables with approximately 15000 rows each.
Here's the query. The goal is to list every person once, with his newest (by year) record in the visits table.
Unfortunately, on large tables it gets really sloooow because it's not using the INDEX in the subquery.
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id
Does anyone know how to force MySQL to use the INDEX already defined on the visits table?
Your query:
SELECT *
FROM people
LEFT JOIN (
SELECT *
FROM visits
ORDER BY visits.year DESC
) AS visits
ON people.id = visits.id_people
GROUP BY people.id;
First, it uses non-standard SQL syntax: items appear in the SELECT list that are not part of the GROUP BY clause, are not aggregate functions, and do not depend on the grouping items. This can give indeterminate (semi-random) results.
Second, to avoid the indeterminate results, you have added an ORDER BY inside a subquery, which (non-standard or not) is not documented anywhere in the MySQL documentation as working the way you expect. So it may work now, but it may stop working in the not-so-distant future when you upgrade to MySQL version X, where the optimizer is clever enough to recognize that an ORDER BY inside a derived table is redundant and can be eliminated.
Try using this query:
SELECT
p.*, v.*
FROM
people AS p
LEFT JOIN
( SELECT
id_people
, MAX(year) AS year
FROM
visits
GROUP BY
id_people
) AS vm
JOIN
visits AS v
ON v.id_people = vm.id_people
AND v.year = vm.year
ON v.id_people = p.id;
A compound index on (id_people, year) would help efficiency.
A different approach, which works fine if you first limit the people to a sensible number (say 30) and then join to the visits table:
SELECT
p.*, v.*
FROM
( SELECT *
FROM people
ORDER BY name
LIMIT 30
) AS p
LEFT JOIN
visits AS v
ON v.id_people = p.id
AND v.year =
( SELECT
year
FROM
visits
WHERE
id_people = p.id
ORDER BY
year DESC
LIMIT 1
)
ORDER BY name ;
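A sketch of the limit-first approach with a correlated subquery, run on in-memory SQLite through Python's sqlite3 module as a stand-in for MySQL. The people, visits and the index name idx_visits_people_year are hypothetical, and the limit is passed as a query parameter. A person without visits (Cid) is kept by the LEFT JOIN with NULL visit columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE visits (id INTEGER PRIMARY KEY, id_people INTEGER,
                     year INTEGER, note TEXT);
-- Hypothetical compound index the correlated subquery can use.
CREATE INDEX idx_visits_people_year ON visits (id_people, year);
INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
INSERT INTO visits (id_people, year, note) VALUES
  (1, 2010, 'old'), (1, 2012, 'new'), (2, 2011, 'only');
""")

# Limit the people first, then pick each person's newest year with a
# correlated subquery evaluated only for those rows.
SQL = """
SELECT p.name, v.year, v.note
FROM (SELECT * FROM people ORDER BY name LIMIT ?) AS p
LEFT JOIN visits AS v
       ON v.id_people = p.id
      AND v.year = (SELECT year FROM visits
                    WHERE id_people = p.id
                    ORDER BY year DESC LIMIT 1)
ORDER BY p.name
"""

rows = conn.execute(SQL, (30,)).fetchall()
print(rows)  # [('Ann', 2012, 'new'), ('Bob', 2011, 'only'), ('Cid', None, None)]
```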
Why do you have a subquery when all you need is a table name for joining?
It is also not obvious to me why your query has a GROUP BY clause in it. GROUP BY is ordinarily used with aggregate functions like MAX or COUNT, but you don't have those.
How about this? It may solve your problem.
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
If you need to show the person, the most recent visit, and the note from the most recent visit, you're going to have to explicitly join the visits table again to the summary query (virtual table) like so.
SELECT a.id, a.name, a.year, v.note
FROM (
SELECT people.id, people.name, MAX(visits.year) year
FROM people
JOIN visits ON people.id = visits.id_people
GROUP BY people.id, people.name
)a
JOIN visits v ON (a.id = v.id_people and a.year = v.year)
Go fiddle: http://www.sqlfiddle.com/#!2/d67fc/20/0
If you need to show something for people who have never had a visit, try replacing JOIN with LEFT JOIN in my statement.
As someone else wrote, an ORDER BY clause in a subquery is not standard, and generates unpredictable results. In your case it baffled the optimizer.
Edit: GROUP BY is a big hammer. Don't use it unless you need it. And, don't use it unless you use an aggregate function in the query.
Notice that if you have more than one row in visits for a person and the most recent year, this query will generate multiple rows for that person, one for each visit in that year. If you want just one row per person, and you DON'T need the note for the visit, then the first query will do the trick. If you have more than one visit for a person in a year, and you only need the latest one, you have to identify which row IS the latest one. Usually it will be the one with the highest ID number, but only you know that for sure. I added another person to your fiddle with that situation. http://www.sqlfiddle.com/#!2/4f644/2/0
This is complicated. But: if your visits.id numbers are automatically assigned and they are always in time order, you can simply report the highest visit id, and be guaranteed that you'll have the latest year. This will be a very efficient query.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT id_people, max(id) id
FROM visits
GROUP BY id_people
)m
JOIN people p ON (p.id = m.id_people)
JOIN visits v ON (m.id = v.id)
http://www.sqlfiddle.com/#!2/4f644/1/0
But this is not how your example is set up, so you need another way to disambiguate the latest visit so that you get just one row per person. The only trick at our disposal is to use the largest id number.
So, we need to get a list of the visit.id numbers that are the latest ones, by this definition, from your tables. This query does that, with a MAX(year)...GROUP BY(id_people) nested inside a MAX(id)...GROUP BY(id_people) query.
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON (p.id_people = v.id_people AND p.year = v.year)
GROUP BY v.id_people
The overall query (http://www.sqlfiddle.com/#!2/c2da2/1/0) is this.
SELECT p.id, p.name, v.year, v.note
FROM (
SELECT v.id_people,
MAX(v.id) id
FROM (
SELECT id_people,
MAX(year) year
FROM visits
GROUP BY id_people
)p
JOIN visits v ON ( p.id_people = v.id_people
AND p.year = v.year)
GROUP BY v.id_people
)m
JOIN people p ON (m.id_people = p.id)
JOIN visits v ON (m.id = v.id)
Disambiguation in SQL is a tricky business to learn, because it takes some time to wrap your head around the idea that there's no inherent order to rows in a DBMS.
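The nested MAX(year), then MAX(id) disambiguation can be exercised on a tiny data set. This sketch uses in-memory SQLite via Python's sqlite3 module as a stand-in for MySQL, with hypothetical rows in which Ann has two visits in her latest year; like the answer, it assumes that a higher visit id means a later visit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE visits (id INTEGER PRIMARY KEY, id_people INTEGER,
                     year INTEGER, note TEXT);
INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob');
-- Ann has two visits in her latest year (2012); the ids break the tie.
INSERT INTO visits VALUES
  (1, 1, 2010, 'a'), (2, 1, 2012, 'b'), (3, 1, 2012, 'c'), (4, 2, 2011, 'd');
""")

# Innermost: latest year per person. Middle: highest visit id within that
# year. Outer joins: fetch the person and exactly one chosen visit row.
SQL = """
SELECT p.id, p.name, v.year, v.note
FROM (SELECT v.id_people, MAX(v.id) AS id
      FROM (SELECT id_people, MAX(year) AS year
            FROM visits GROUP BY id_people) p
      JOIN visits v ON p.id_people = v.id_people AND p.year = v.year
      GROUP BY v.id_people) m
JOIN people p ON m.id_people = p.id
JOIN visits v ON m.id = v.id
ORDER BY p.id
"""

rows = conn.execute(SQL).fetchall()
print(rows)  # [(1, 'Ann', 2012, 'c'), (2, 'Bob', 2011, 'd')]
```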
I have the following tables for a messaging system and I was wondering how I would go about querying the DB for how many conversations have new messages.
My tables are as follows
Conversation
------------
id
subject
Messages
--------
id
conversation_id
user_id (sender)
message
timestamp (time sent)
Participants
------------
conversation_id
user_id
last_read (timestamp of when the user last viewed the conversation)
I'm trying to do the following query but it returns no results:
SELECT COUNT(m.conversation_id) AS count
FROM (messages_message m)
INNER JOIN messages_participants p ON p.conversation_id = m.conversation_id
WHERE `m`.`timestamp` > 'p.last_read'
AND `p`.`user_id` = '5'
GROUP BY m.conversation_id
LIMIT 1
Also, I will probably have to run this on every page load; any tips on making it as fast as possible?
Cheers
EDIT
I've got another somewhat related question if anybody would be so kind as to help out.
I'm trying to retrieve the subject, the last message in the conversation, the timestamp of that last message, and the number of new messages. I believe I have a working query, but it looks a bit badly put together. What improvements can I make to it?
SELECT SQL_CALC_FOUND_ROWS c.*, last_msg.*, new_msgs.count as new_msgs_count
FROM ( messages_conversation c )
INNER JOIN messages_participants p ON p.user_id = '5'
INNER JOIN ( SELECT m.*
FROM (messages_message m)
ORDER BY m.timestamp DESC
LIMIT 1) last_msg
ON c.id = last_msg.conversation_id
LEFT JOIN ( SELECT COUNT(m.id) AS count, m.conversation_id, m.timestamp
FROM (messages_message m) ) new_msgs
ON c.id = new_msgs.conversation_id AND new_msgs.timestamp > p.last_read
LIMIT 0,10
Should I determine whether the conversation is unread with an IF statement in MySQL, or should I convert and compare the timestamps in PHP?
Thanks again,
RS7
'p.last_read' as quoted above is a string constant; remove the quotes from it and see whether that changes anything, RS7. If user_id is an integer, then remove the quotes from '5' as well.
As far as performance goes, make sure you have indexes on all the relevant columns; messages_participants.user_id and messages_message.timestamp are two important columns to index.
Yes, there are problems in your query.
Firstly, you count the column you are grouping by, so you get one row per conversation holding that conversation's number of new messages, and LIMIT 1 keeps only the first of those rows; you never get the number of conversations.
Secondly, you are comparing the timestamp to a string: m.timestamp > 'p.last_read'.
Finally, avoid using LIMIT when you know your query will return one row (be self-confident :p).
Try:
SELECT
COUNT(DISTINCT m.conversation_id) AS count -- each conversation counted once
FROM
messages_message m
INNER JOIN
messages_participants p ON p.conversation_id = m.conversation_id
WHERE
m.timestamp > p.last_read
AND p.user_id = 5
If you want to reduce the query's running time, you can create a new index on messages_participants (conversation_id, user_id) to index the conversations per user, and then change your query to:
SELECT
COUNT(DISTINCT m.conversation_id) AS count -- each conversation counted once
FROM
messages_message m
INNER JOIN
messages_participants p ON p.conversation_id = m.conversation_id AND p.user_id = 5
WHERE
m.timestamp > p.last_read
That way your DB engine can filter the JOIN by looking only at the index. You could take this further by indexing the timestamp too, e.g. (timestamp, conversation_id, user_id), and moving the WHERE condition into the join condition.
Whatever you choose, always put the most selective column first in the index.
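A sketch of the corrected count on in-memory SQLite through Python's sqlite3 module (a stand-in for MySQL; the rows, the index name and the helper unread_conversations are hypothetical). Since the question asks how many conversations have new messages, it counts DISTINCT conversation ids rather than messages. It also shows the effect of the quoted 'p.last_read': in SQLite an integer never compares greater than a text value, so the quoted version matches nothing. MySQL converts the string differently, so only the conclusion (it is a string constant, not a column) carries over.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages_message (id INTEGER PRIMARY KEY, conversation_id INTEGER,
                               user_id INTEGER, message TEXT, timestamp INTEGER);
CREATE TABLE messages_participants (conversation_id INTEGER, user_id INTEGER,
                                    last_read INTEGER);
CREATE INDEX idx_part_conv_user ON messages_participants (conversation_id, user_id);
-- User 5 is in conversations 1 and 2; only conversation 1 has messages
-- newer than their last_read.
INSERT INTO messages_participants VALUES (1, 5, 100), (2, 5, 300), (1, 6, 0);
INSERT INTO messages_message (conversation_id, user_id, message, timestamp) VALUES
  (1, 6, 'hi', 150), (1, 6, 'there', 200), (2, 6, 'old', 250);
""")


def unread_conversations(user_id):
    # DISTINCT: a conversation with several new messages still counts once.
    sql = """SELECT COUNT(DISTINCT m.conversation_id)
             FROM messages_message m
             JOIN messages_participants p
               ON p.conversation_id = m.conversation_id AND p.user_id = ?
             WHERE m.timestamp > p.last_read"""
    return conn.execute(sql, (user_id,)).fetchone()[0]


# Quoting the column name makes it a string literal; in SQLite an integer
# is always less than text, so this condition never matches.
quoted = conn.execute("""SELECT COUNT(*) FROM messages_message m
                         JOIN messages_participants p
                           ON p.conversation_id = m.conversation_id
                         WHERE m.timestamp > 'p.last_read'
                           AND p.user_id = 5""").fetchone()[0]

print(unread_conversations(5), quoted)  # 1 0
```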
EDIT
First, let's comment your query:
SELECT
SQL_CALC_FOUND_ROWS c.*,
last_msg.*,
new_msgs.count as new_msgs_count
FROM
messages_conversation c
INNER JOIN
messages_participants p ON p.user_id = 5 -- Join with every conversation of user 5; if user_id is an integer, avoid writing '5' (a string converted to an integer).
INNER JOIN
( -- Selects every message: you could already restrict this to the conversations of user 5 here
SELECT
*
FROM
messages_message m
ORDER BY -- this is not the purpose of ORDER BY. Use MAX to obtain the latest timestamp.
m.timestamp DESC
LIMIT 1
) last_msg ON c.id = last_msg.conversation_id -- this subquery returns a single row, but you want the latest message of each conversation.
LEFT JOIN
(
SELECT
COUNT(m.id) AS count,
m.conversation_id,
m.timestamp
FROM
messages_message m
) new_msgs ON c.id = new_msgs.conversation_id AND new_msgs.timestamp > p.last_read
LIMIT 0,10
Let's rephrase your query:
select, for user :id, the last message and timestamp of each conversation, and the number of new messages in it.
Do it step by step:
Selecting the last message and its timestamp for each user:
SELECT -- select the latest timestamp with its message
max(timestamp),
message
FROM
messages_message
GROUP BY
user_id
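Aggregate functions (MAX, MIN, SUM, ...) work on the current group. Read this as "for each group, calculate the aggregate functions, then select what I need where my conditions are true". So it results in one row per group.
So this last query selects the latest timestamp of every user in the messages_message table. Caveat: message is neither aggregated nor listed in the GROUP BY, so MySQL takes it from an indeterminate row of the group (versions running with ONLY_FULL_GROUP_BY reject the query outright); it is not guaranteed to be the message carrying MAX(timestamp). To get the matching message reliably, join back on the MAX value, as in the earlier answers on this page. As you can see, it is easy to select this value for a specific user by adding a WHERE clause: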
SELECT
MAX(timestamp),
message
FROM
messages_message
WHERE
user_id = :id
GROUP BY
user_id
Number of new messages per conversation: for each conversation, count the messages newer than the current user's last read time
SELECT
COUNT(m.id) -- assuming the id column is unique; otherwise count distinct values.
FROM
messages_conversation c
INNER JOIN -- the current user participated in the conversation
messages_participants p ON p.conversation_id = c.id AND p.user_id = :id
LEFT OUTER JOIN -- messages of the conversation newer than the current user's last read time
messages_message m ON m.conversation_id = c.id AND m.timestamp > p.last_read
GROUP BY
c.id -- for each conversation
The INNER JOIN won't return rows for conversations the current user did not participate in.
Then the LEFT OUTER JOIN fills in NULL columns when its condition is false, so COUNT(m.id) returns 0: there are no new messages.
Putting it all together.
Select the last message and timestamp of each conversation the current user participated in, and the number of new messages in each of those conversations.
This is a JOIN between the last two queries.
SELECT
last_msg.conversation_id,
last_msg.message,
last_msg.max_timestamp,
new_msgs.nb
FROM
(
SELECT
MAX(timestamp) AS max_timestamp,
message, -- not aggregated: MySQL may take it from any row of the group
conversation_id
FROM
messages_message
-- no user filter here: the join with new_msgs keeps only the user's conversations
GROUP BY
conversation_id -- one row per conversation, matching the join condition below
) last_msg
JOIN
(
SELECT
c.id AS conversation_id,
COUNT(m.id) AS nb
FROM
messages_conversation c
INNER JOIN
messages_participants p ON p.conversation_id = c.id AND p.user_id = :id
LEFT OUTER JOIN
messages_message m ON m.conversation_id = c.id AND m.timestamp > p.last_read
GROUP BY
c.id
) new_msgs ON new_msgs.conversation_id = last_msg.conversation_id
-- put an ORDER BY here, and only here, if necessary :)
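Because a non-aggregated message column next to MAX(timestamp) is not reliable, here is a sketch of the same result with a deterministic last message, on in-memory SQLite via Python's sqlite3 module (standing in for MySQL). It swaps in a correlated MAX(timestamp) subquery to pick each conversation's last row, joins the participant row for the current user, and LEFT JOINs the new messages so that conversations without any still report 0. The table contents are hypothetical, and it assumes no two messages in one conversation share a timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages_conversation (id INTEGER PRIMARY KEY, subject TEXT);
CREATE TABLE messages_message (id INTEGER PRIMARY KEY, conversation_id INTEGER,
                               user_id INTEGER, message TEXT, timestamp INTEGER);
CREATE TABLE messages_participants (conversation_id INTEGER, user_id INTEGER,
                                    last_read INTEGER);
INSERT INTO messages_conversation VALUES (1, 'lunch'), (2, 'report'), (3, 'other');
INSERT INTO messages_participants VALUES (1, 5, 100), (2, 5, 300), (3, 6, 0);
INSERT INTO messages_message (conversation_id, user_id, message, timestamp) VALUES
  (1, 6, 'hi', 150), (1, 5, 'sure', 200), (2, 6, 'draft', 250), (3, 6, 'x', 50);
""")

SQL = """
SELECT c.id, c.subject, last_msg.message, last_msg.timestamp,
       COUNT(n.id) AS new_msgs
FROM messages_conversation c
JOIN messages_participants p
  ON p.conversation_id = c.id AND p.user_id = :uid
-- groupwise maximum: the row whose timestamp equals the conversation's MAX
JOIN messages_message last_msg
  ON last_msg.conversation_id = c.id
 AND last_msg.timestamp = (SELECT MAX(timestamp) FROM messages_message
                           WHERE conversation_id = c.id)
-- new messages: newer than this participant's last_read (0 when none)
LEFT JOIN messages_message n
  ON n.conversation_id = c.id AND n.timestamp > p.last_read
GROUP BY c.id, c.subject, last_msg.id, last_msg.message, last_msg.timestamp
ORDER BY last_msg.timestamp DESC
"""

rows = conn.execute(SQL, {"uid": 5}).fetchall()
print(rows)  # [(2, 'report', 'draft', 250, 0), (1, 'lunch', 'sure', 200, 2)]
```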