MySQL JOIN queries - Messaging system - mysql

I have the following tables for a messaging system and I was wondering how I would go about querying the DB for how many conversations have new messages.
My tables are as follows
Conversation
------------
id
subject
Messages
--------
id
conversation_id
user_id (sender)
message
timestamp (time sent)
Participants
------------
conversation_id
user_id
last_read (timestamp of when the user last viewed the conversation)
I'm trying to do the following query but it returns no results:
SELECT COUNT(m.conversation_id) AS count
FROM (messages_message m)
INNER JOIN messages_participants p ON p.conversation_id = m.conversation_id
WHERE `m`.`timestamp` > 'p.last_read'
AND `p`.`user_id` = '5'
GROUP BY m.conversation_id
LIMIT 1
Also, I will probably have to run this on every page load - any tips on making it as fast as possible?
Cheers
EDIT
I've got another somewhat related question if anybody would be so kind as to help out.
I'm trying to retrieve the subject, the last message in the conversation, the timestamp of that last message and the number of new messages. I believe I have a working query, but it looks badly put together. What improvements can I make to it?
SELECT SQL_CALC_FOUND_ROWS c.*, last_msg.*, new_msgs.count as new_msgs_count
FROM ( messages_conversation c )
INNER JOIN messages_participants p ON p.user_id = '5'
INNER JOIN ( SELECT m.*
FROM (messages_message m)
ORDER BY m.timestamp DESC
LIMIT 1) last_msg
ON c.id = last_msg.conversation_id
LEFT JOIN ( SELECT COUNT(m.id) AS count, m.conversation_id, m.timestamp
FROM (messages_message m) ) new_msgs
ON c.id = new_msgs.conversation_id AND new_msgs.timestamp > p.last_read
LIMIT 0,10
Should I determine whether a conversation is unread with an IF statement in MySQL, or should I convert and compare the timestamps in PHP?
Thanks again,
RS7

'p.last_read' as quoted above is a string constant - remove the quotes from this and see whether that changes anything, RS7. If user_id is an integer then remove the quotes from '5' as well.
As far as performance goes, ensure you have indexes on all the relevant columns; messages_participants.user_id and messages_message.timestamp are two important columns to index.
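A minimal sketch of the quoting point, using Python's built-in sqlite3 as a stand-in for MySQL and assuming integer timestamps (neither is stated in the question). It only shows that the quoted form is a piece of text while the unquoted form reads the column:

```python
import sqlite3

# SQLite stands in for MySQL; the table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages_participants (conversation_id INTEGER, user_id INTEGER, last_read INTEGER);
INSERT INTO messages_participants VALUES (1, 5, 100);
""")

# In quotes, p.last_read is just a string literal; unquoted, it is the column's value.
quoted, unquoted = conn.execute(
    "SELECT 'p.last_read', p.last_read FROM messages_participants p WHERE p.user_id = 5"
).fetchone()
print(repr(quoted), unquoted)
```

How a timestamp is then compared against that text differs between database engines, but in no case is it compared against the participant's last_read value.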

Yes, there are problems in your query.
First, you count the column you are grouping by, so each row holds the number of new messages in one conversation rather than the number of conversations, and LIMIT 1 then keeps only the first of those rows.
Second, you are comparing the timestamp to a string: m.timestamp > 'p.last_read'.
Finally, avoid using LIMIT when you know your query will return one row (be self-confident :p).
Try:
SELECT
COUNT(m.conversation_id) AS count
FROM
messages_message m
INNER JOIN
messages_participants p ON p.conversation_id = m.conversation_id
WHERE
m.timestamp > p.last_read
AND p.user_id = 5
If you want to decrease the query running time, you can create a new index on messages_participants (conversation_id, user_id) to index the conversations per user, and then change your query to:
SELECT
COUNT(m.conversation_id) AS count
FROM
messages_message m
INNER JOIN
messages_participants p ON p.conversation_id = m.conversation_id AND p.user_id = 5
WHERE
m.timestamp > p.last_read
That way your DB engine can filter the JOIN by looking only at the index. You could take this further by indexing the timestamp too, (timestamp, conversation_id, user_id), and moving the WHERE condition into the join condition.
Whatever you choose, always put the most selective field first in the index.
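The original question asks how many conversations have new messages, which is not the same number as how many new messages there are. A small sketch of the difference, assuming integer timestamps and using sqlite3 in place of MySQL; the index name is made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages_message (id INTEGER PRIMARY KEY, conversation_id INTEGER,
                               user_id INTEGER, message TEXT, timestamp INTEGER);
CREATE TABLE messages_participants (conversation_id INTEGER, user_id INTEGER, last_read INTEGER);
-- Hypothetical index name; the columns are the ones suggested in the answer.
CREATE INDEX idx_participants_conv_user ON messages_participants (conversation_id, user_id);
INSERT INTO messages_participants VALUES (1, 5, 100), (2, 5, 100), (3, 5, 100);
INSERT INTO messages_message (conversation_id, user_id, message, timestamp) VALUES
  (1, 7, 'a', 150), (1, 7, 'b', 160),  -- two new messages in conversation 1
  (2, 7, 'c', 170),                    -- one new message in conversation 2
  (3, 7, 'd', 50);                     -- already read
""")

# COUNT(m.id) counts new messages; COUNT(DISTINCT ...) counts conversations that have any.
new_messages, unread_conversations = conn.execute("""
SELECT COUNT(m.id), COUNT(DISTINCT m.conversation_id)
FROM messages_message m
INNER JOIN messages_participants p
        ON p.conversation_id = m.conversation_id AND p.user_id = 5
WHERE m.timestamp > p.last_read
""").fetchone()
print(new_messages, unread_conversations)
```

With this sample data there are three new messages spread over two conversations, so COUNT(DISTINCT m.conversation_id) is the figure to show as "conversations with new messages".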
EDIT
First, let's comment on your query:
SELECT
SQL_CALC_FOUND_ROWS c.*,
last_msg.*,
new_msgs.count as new_msgs_count
FROM
messages_conversation c
INNER JOIN
messages_participants p ON p.user_id = 5 -- Joins with every conversation of user 5; if user_id is an integer, avoid writing '5' (a string that gets converted to an integer).
INNER JOIN
( -- Selects every message: you could already select only user 5's messages here
SELECT
*
FROM
messages_message m
ORDER BY -- this is not what ORDER BY is for. Use MAX to obtain the latest timestamp.
m.timestamp DESC
LIMIT 1
) last_msg ON c.id = last_msg.conversation_id -- this subquery returns one row, but you want the latest timestamp for each conversation.
LEFT JOIN
(
SELECT
COUNT(m.id) AS count,
m.conversation_id,
m.timestamp
FROM
messages_message m
) new_msgs ON c.id = new_msgs.conversation_id AND new_msgs.timestamp > p.last_read
LIMIT 0,10
Let's rephrase your query:
select the number of new messages of a conversation subject, its last message and timestamp for user #id.
Do it step by step:
Selecting last message, timestamp in conversation for each user:
SELECT -- select the latest timestamp with its message
max(timestamp),
message
FROM
messages_message
GROUP BY
user_id
Aggregate functions (MAX, MIN, SUM, ...) work on the current group. Read this as "for each group, calculate the aggregate functions, then select what I need where my conditions are true". So it results in one row per group.
So this last query selects the last timestamp of every user in the messages_message table, along with a message. Note that a non-aggregated column such as message is not guaranteed to come from the same row as MAX(timestamp). As you can see, it is easy to select these values for a specific user by adding a WHERE clause:
SELECT
MAX(timestamp),
message
FROM
messages_message
WHERE
user_id = #id
GROUP BY
user_id
Number of new messages per conversation: for each conversation, count the messages newer than the user's last read time
SELECT
COUNT(m.id) -- assuming the id column is unique; otherwise count distinct values.
FROM
messages_conversation c
INNER JOIN -- The current user participated in the conversation
messages_participants p ON p.conversation_id = c.id AND p.user_id = #id
LEFT OUTER JOIN -- Messages of the conversation the current user participated in that are newer than their last read time
messages_message m ON m.conversation_id = c.id AND m.timestamp > p.last_read
GROUP BY
c.id -- for each conversation
The INNER JOIN won't return rows for conversations in which the current user did not participate.
The LEFT OUTER JOIN then joins with NULL columns when the condition is false, so COUNT returns 0: there are no new messages.
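To see the zero-count behaviour concretely, here is a small sketch of the per-conversation count. It runs on sqlite3 rather than MySQL, uses integer timestamps, and the sample rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages_conversation (id INTEGER PRIMARY KEY, subject TEXT);
CREATE TABLE messages_message (id INTEGER PRIMARY KEY, conversation_id INTEGER,
                               user_id INTEGER, message TEXT, timestamp INTEGER);
CREATE TABLE messages_participants (conversation_id INTEGER, user_id INTEGER, last_read INTEGER);
INSERT INTO messages_conversation VALUES (1, 'lunch'), (2, 'release'), (3, 'other people');
INSERT INTO messages_participants VALUES (1, 5, 100), (2, 5, 100);  -- user 5 is not in conversation 3
INSERT INTO messages_message (conversation_id, user_id, message, timestamp) VALUES
  (1, 7, 'old', 90), (1, 7, 'new 1', 150), (1, 7, 'new 2', 160),
  (2, 7, 'old', 80),
  (3, 8, 'unrelated', 200);
""")

# The INNER JOIN keeps only user 5's conversations; the LEFT OUTER JOIN keeps
# conversations without newer messages, for which COUNT(m.id) sees only NULL.
rows = conn.execute("""
SELECT c.id, COUNT(m.id) AS new_msgs
FROM messages_conversation c
INNER JOIN messages_participants p
        ON p.conversation_id = c.id AND p.user_id = ?
LEFT OUTER JOIN messages_message m
        ON m.conversation_id = c.id AND m.timestamp > p.last_read
GROUP BY c.id
ORDER BY c.id
""", (5,)).fetchall()
print(rows)
```

Conversation 2 still appears, with a count of 0, and conversation 3 is absent because user 5 never participated in it.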
Putting it all together.
Select the last message and timestamp in conversation where the current user participated and the number of new messages in each conversation.
Which is a JOIN between the two last queries.
SELECT
last_msg.conversation_id,
last_msg.message,
last_msg.max_timestamp,
new_msgs.nb
FROM
(
SELECT
MAX(timestamp) AS max_timestamp,
message,
conversation_id
FROM
messages_message
WHERE
user_id = #id
GROUP BY
user_id
) last_msg
JOIN
(
SELECT
c.id AS conversation_id,
COUNT(m.id) AS nb
FROM
messages_conversation c
INNER JOIN
messages_participants p ON p.conversation_id = c.id AND p.user_id = #id
LEFT OUTER JOIN
messages_message m ON m.conversation_id = c.id AND m.timestamp > p.last_read
GROUP BY
c.id
) new_msgs ON new_msgs.conversation_id = last_msg.conversation_id
-- put an ORDER BY here, and only here, if necessary :)
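One possible way to get subject, last message, last timestamp and new-message count in a single statement is sketched below. It is not the query above: it swaps in a join back on the per-conversation MAX(timestamp) so that the message text comes from the same row as the latest timestamp. It assumes no two messages in a conversation share a timestamp, uses integer timestamps, and runs on sqlite3 rather than MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages_conversation (id INTEGER PRIMARY KEY, subject TEXT);
CREATE TABLE messages_message (id INTEGER PRIMARY KEY, conversation_id INTEGER,
                               user_id INTEGER, message TEXT, timestamp INTEGER);
CREATE TABLE messages_participants (conversation_id INTEGER, user_id INTEGER, last_read INTEGER);
INSERT INTO messages_conversation VALUES (1, 'lunch'), (2, 'release');
INSERT INTO messages_participants VALUES (1, 5, 100), (2, 5, 100);
INSERT INTO messages_message (conversation_id, user_id, message, timestamp) VALUES
  (1, 7, 'noon?', 90), (1, 7, 'ok, 12:30', 150),
  (2, 7, 'shipped', 80);
""")

rows = conn.execute("""
SELECT c.subject, last_msg.message, last_msg.timestamp, COUNT(new_msg.id) AS new_msgs
FROM messages_conversation c
INNER JOIN messages_participants p
        ON p.conversation_id = c.id AND p.user_id = ?
INNER JOIN (SELECT conversation_id, MAX(timestamp) AS max_timestamp
            FROM messages_message
            GROUP BY conversation_id) latest      -- latest timestamp per conversation
        ON latest.conversation_id = c.id
INNER JOIN messages_message last_msg              -- the row that owns that timestamp
        ON last_msg.conversation_id = c.id AND last_msg.timestamp = latest.max_timestamp
LEFT OUTER JOIN messages_message new_msg          -- messages newer than last_read
        ON new_msg.conversation_id = c.id AND new_msg.timestamp > p.last_read
GROUP BY c.id, c.subject, last_msg.id, last_msg.message, last_msg.timestamp
ORDER BY last_msg.timestamp DESC
""", (5,)).fetchall()
print(rows)
```

Whether a conversation is unread then falls out of new_msgs being greater than 0, so no timestamp comparison is needed on the application side.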

Related

SELECT the last message of conversation - MySQL

I have query which looks like:
SELECT * FROM
( SELECT DISTINCT CASE
WHEN user1_id = 1
THEN user2_id
ELSE user1_id
END userID,conversationId
FROM conversations
WHERE 1 IN (user2_id,user1_id))dt
INNER JOIN users on dt.userID = users.id
It returns conversationId and information about the user from the users table. I would also like to add the last message (the one with the biggest messageId) from the message table, based on conversationId. The last thing would be to sort all the results by messageId.
I tried to use another INNER JOIN which looked like :
INNER JOIN message on dt.conversationId = message.conversationId
It adds messages to the result, but I would like to get only the last one (the one with the highest messageId, as mentioned). I guess I would have to use MAX somehow, but I have no idea how. The same goes for sorting all the results by messageId, so that results with the biggest messageId come first.
Thanks for all suggestions.
You can get the highest messageId for the conversation in a correlated subquery and use it in your join condition:
INNER JOIN message m
on m.conversationId = dt.conversationId
and m.messageId = (
SELECT MAX(m1.messageId)
FROM message m1
WHERE m1.conversationId = dt.conversationId
)
So the solution for everything was the following query:
SELECT * FROM
( SELECT DISTINCT
CASE
WHEN user1_id = 1
THEN user2_id
ELSE user1_id
END userID,conversationId
FROM conversations
WHERE 1 IN (user2_id,user1_id))dt
INNER JOIN users on dt.userID = users.id
INNER JOIN message m on m.conversationId = dt.conversationId and m.messageId = (SELECT MAX(m1.messageId)
FROM message m1 WHERE m1.conversationId = dt.conversationId)
ORDER by m.messageId DESC
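A runnable sketch of that final query on a few sample rows, using sqlite3 instead of MySQL. The name and body columns and all the data are invented for illustration; the table and key names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE conversations (conversationId INTEGER PRIMARY KEY, user1_id INTEGER, user2_id INTEGER);
CREATE TABLE message (messageId INTEGER PRIMARY KEY, conversationId INTEGER, body TEXT);
INSERT INTO users VALUES (1, 'me'), (2, 'ann'), (3, 'bob');
INSERT INTO conversations VALUES (10, 1, 2), (11, 3, 1);
INSERT INTO message VALUES (1, 10, 'hi ann'), (2, 11, 'hi bob'), (3, 10, 'see you'), (4, 11, 'bye');
""")

rows = conn.execute("""
SELECT dt.conversationId, users.name, m.messageId, m.body
FROM (SELECT DISTINCT
             CASE WHEN user1_id = 1 THEN user2_id ELSE user1_id END AS userID,
             conversationId
      FROM conversations
      WHERE 1 IN (user2_id, user1_id)) dt
INNER JOIN users ON dt.userID = users.id
INNER JOIN message m
        ON m.conversationId = dt.conversationId
       AND m.messageId = (SELECT MAX(m1.messageId)   -- newest message of this conversation
                          FROM message m1
                          WHERE m1.conversationId = dt.conversationId)
ORDER BY m.messageId DESC
""").fetchall()
print(rows)
```

Each conversation contributes exactly one row, carrying the other participant and the newest message, and the conversation with the most recent message comes first.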

How to select single rows using MAX and GROUP BY on non-unique field in MySQL?

I came across this very simple case where I need to select a list of conversations from the Conversations table along with the latest message from the Messages table, which has a non-unique dateCreated field.
After long research I came up with this query:
SELECT
Conversations.id,
dateCreated,
`name`,
lastMessageId,
lastMessageDate,
lastMessagePayload
FROM Conversations
LEFT JOIN (
SELECT
id AS lastMessageId,
m1.conversationId,
payload AS lastMessagePayload,
m1.dateCreated AS lastMessageDate
FROM Messages AS m1
INNER JOIN (
SELECT conversationId, MAX(dateCreated) AS mdate FROM Messages GROUP BY conversationId
) AS m2
ON m1.conversationId = m2.conversationId AND m1.dateCreated = m2.mdate
) AS msg2
ON msg2.conversationId = Conversations.id
ORDER BY dateCreated DESC
The query works well, but if the two latest messages in the same conversation have exactly the same dateCreated value, it outputs two rows for the same conversation id, each with a different set of lastMessage... fields.
I couldn't find a way around this: when you GROUP BY one field and take MAX of another, non-unique field, you can't always get exactly one row back.
Any idea how to get a list of unique conversations with the latest message (either of the two messages if they have the same date)?
Use row_number()!
select c.*, m.* -- or whatever columns you want
from conversations c left join
(select m.*,
row_number() over (partition by m.conversationid order by m.dateCreated desc, m.id desc) as seqnum
from messages m
) m
on m.conversationId = c.id and
m.seqnum = 1;
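A small sketch of the row_number() approach with a deliberate tie on dateCreated. It runs on sqlite3, which needs SQLite 3.25 or later for window functions (MySQL needs 8.0 or later); the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Conversations (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Messages (id INTEGER PRIMARY KEY, conversationId INTEGER,
                       payload TEXT, dateCreated TEXT);
INSERT INTO Conversations VALUES (1, 'general'), (2, 'empty');
INSERT INTO Messages VALUES
  (1, 1, 'first', '2020-01-01 10:00:00'),
  (2, 1, 'tie a', '2020-01-01 11:00:00'),
  (3, 1, 'tie b', '2020-01-01 11:00:00');   -- same dateCreated as message 2
""")

# seqnum = 1 marks exactly one message per conversation; the id breaks date ties.
rows = conn.execute("""
SELECT c.id, m.id, m.payload
FROM Conversations c
LEFT JOIN (SELECT msg.*,
                  ROW_NUMBER() OVER (PARTITION BY msg.conversationId
                                     ORDER BY msg.dateCreated DESC, msg.id DESC) AS seqnum
           FROM Messages msg) m
       ON m.conversationId = c.id AND m.seqnum = 1
ORDER BY c.id
""").fetchall()
print(rows)
```

Conversation 1 yields a single row despite the tie, and the conversation without messages is kept with NULLs by the LEFT JOIN.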
MySQL 5.x version...
Use a correlated sub-query to get the latest message id (for a given conversation), using ORDER BY and LIMIT 1
SELECT
Conversations.id,
Conversations.dateCreated,
Conversations.`name`,
Messages.id AS lastMessageId,
Messages.payload AS lastMessagePayload,
Messages.dateCreated AS lastMessageDate
FROM
Conversations
LEFT JOIN
Messages
ON Messages.id = (
SELECT lookup.id
FROM Messages AS lookup
WHERE lookup.conversationId = Conversations.id
ORDER BY lookup.dateCreated DESC
LIMIT 1
)
ORDER BY
Conversations.dateCreated DESC
In the event of two messages having the same date, the message you get is non-deterministic / arbitrary.
You could, if you wanted, therefore change it to get the highest id from the most recent date...
ORDER BY lookup.dateCreated DESC, lookup.id DESC
LIMIT 1
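The same idea without window functions, sketched on sqlite3 with invented rows: the correlated subquery picks one message id per conversation, and the extra id DESC in its ORDER BY makes the pick deterministic when dates tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Conversations (id INTEGER PRIMARY KEY, name TEXT, dateCreated TEXT);
CREATE TABLE Messages (id INTEGER PRIMARY KEY, conversationId INTEGER,
                       payload TEXT, dateCreated TEXT);
INSERT INTO Conversations VALUES (1, 'support', '2020-01-01'), (2, 'sales', '2020-01-02');
INSERT INTO Messages VALUES
  (1, 1, 'ticket opened', '2020-02-01 09:00:00'),
  (2, 1, 'reply one',     '2020-02-01 09:30:00'),
  (3, 1, 'reply two',     '2020-02-01 09:30:00');  -- ties with message 2
""")

rows = conn.execute("""
SELECT c.id, m.id AS lastMessageId, m.payload AS lastMessagePayload
FROM Conversations c
LEFT JOIN Messages m
       ON m.id = (SELECT lookup.id
                  FROM Messages AS lookup
                  WHERE lookup.conversationId = c.id
                  ORDER BY lookup.dateCreated DESC, lookup.id DESC
                  LIMIT 1)
ORDER BY c.dateCreated DESC
""").fetchall()
print(rows)
```

A conversation with no messages makes the subquery return NULL, so the LEFT JOIN keeps it with empty message columns.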

Why doesn't my content field match my MAX(id) field in MySQL?

I'm trying to get a subset of data based on the latest id and dates. It seems that when selecting other fields in the table they are not in sync with the max id and dates returned.
Any idea how I can fix this?
MySQL:
SELECT MAX(m.id) as id, m.sender_id, m.receiver_id, MAX(m.date) as date, m.content, l.username, p.gender
FROM messages m
LEFT JOIN login_users l on l.user_id = m.sender_id
LEFT JOIN profiles p ON p.user_id = l.user_id
WHERE m.receiver_id=3
GROUP BY m.sender_id ORDER BY date DESC LIMIT 0, 7
The data for content isn't the correct one. It seems to be returning random content and not the content that is tied to the row for max id and max date.
Do I need to do some sort of sub select to fix this?
To answer the question in the title, "Why doesn't my content field match my MAX(id) field", that's because there is no guarantee that the values returned for the non-aggregate fields will be from the row where the MAX value is found. This is the documented behavior, and this is what we expect.
Other DBMSs would throw an error on the statement; MySQL is just more lax, and you are getting values from one row, but it's not guaranteed to be the row on which either of the MAX values (id or date) is found.
You have two separate aggregate expressions, MAX(m.id) and MAX(m.date). Note that there is no guarantee that those values will come from the same row.
The rule in other databases is that every non-aggregate expression in the SELECT list needs to appear in the GROUP BY. (MySQL is more lax about that, and doesn't make that a requirement.)
One way to "fix" the query so that it does return values from the row with the MAX value is to use an inline view (query) that gets the MAX(id) grouped by what you want to GROUP BY, and then a JOIN back to the original table to get other values on the row.
From your statement it's not clear what result set you want returned. If you want the row that has the maximum id and you also want the row with the maximum date, then you could do something like this:
SELECT m.id
, m.sender_id
, m.receiver_id
, m.date
, m.content
, l.username
, p.gender
FROM ( SELECT t.sender_id
, t.receiver_id
, MAX(t.id) AS max_id
, MAX(t.date) AS max_date
FROM messages t
WHERE t.receiver_id=3
GROUP
BY t.sender_id
, t.receiver_id
) s
JOIN messages m
ON m.sender_id = s.sender_id
AND m.receiver_id = s.receiver_id
AND ( m.id = s.max_id OR m.date = s.max_date)
LEFT
JOIN login_users l on l.user_id = m.sender_id
LEFT
JOIN profiles p ON p.user_id = l.user_id
ORDER BY m.date DESC LIMIT 0, 7
The inline view aliased as "s" returns the max values, and then that gets joined back to the messages table, aliased as "m".
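A reduced sketch of the inline-view pattern, keeping only MAX(id) rather than the id-or-date condition above, and leaving out the login_users and profiles joins. It runs on sqlite3 with invented rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (id INTEGER PRIMARY KEY, sender_id INTEGER, receiver_id INTEGER,
                       date TEXT, content TEXT);
INSERT INTO messages VALUES
  (1, 10, 3, '2020-01-01', 'old from 10'),
  (2, 11, 3, '2020-01-02', 'only from 11'),
  (3, 10, 3, '2020-01-03', 'new from 10'),
  (4, 10, 9, '2020-01-04', 'to someone else');
""")

# The inline view s finds the newest id per sender; joining back to messages
# guarantees that date and content come from that very row.
rows = conn.execute("""
SELECT m.id, m.sender_id, m.date, m.content
FROM (SELECT t.sender_id, MAX(t.id) AS max_id
      FROM messages t
      WHERE t.receiver_id = 3
      GROUP BY t.sender_id) s
JOIN messages m ON m.id = s.max_id
ORDER BY m.date DESC
LIMIT 7
""").fetchall()
print(rows)
```

Because every selected column is read from the joined row m, the content always matches the id, which the original single-level GROUP BY query could not guarantee.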
NOTE
In most cases, we find that a JOIN (query) will perform better than an IN (query), because of the different access plans. You can see the difference in plans with an EXPLAIN.
For performance, you'll want an index
... ON messages (`receiver_id`, `sender_id`, `id`, `date`)
There's an equality predicate on receiver_id, so that should be the leading column, to get a range scan (instead of a full scan). You want the sender_id column next, because that should allow MySQL to avoid a "Using filesort" operation to get the rows grouped. The id and date columns are included, so that the inline view query can be satisfied entirely from the index pages without a need to access the pages in the table. (The EXPLAIN should show "Using where; Using index".)
That same index should also be suitable for the outer query, though it does need to access the "content" column from the table pages, so the EXPLAIN will not show "Using index" for that step. (It's likely that the "content" column is much longer than we would want in the index.)
Using a join
SELECT LatestM.id, m.sender_id, m.receiver_id, m.date, m.content, l.username, p.gender
FROM (
SELECT sender_id, MAX(id) AS id
FROM messages
WHERE receiver_id=3
GROUP BY sender_id
) LatestM
INNER JOIN messages m
ON LatestM.sender_id = m.sender_id AND LatestM.id = m.id
LEFT JOIN login_users l on l.user_id = m.sender_id
LEFT JOIN profiles p ON p.user_id = l.user_id
WHERE m.receiver_id = 3
ORDER BY date DESC
LIMIT 0, 7
Problem with this is that if the latest id does not reflect the latest date then the date returned will not be the latest one.
Well, you could probably solve it without a subselect, but doing one is fairly straightforward. Something like this should work: just make the subselect return the ids of the interesting rows in messages, and get the data for only those rows.
SELECT m.id as id, m.sender_id, m.receiver_id, m.date as date,
m.content, l.username, p.gender
FROM messages m
LEFT JOIN login_users l on l.user_id = m.sender_id
LEFT JOIN profiles p ON p.user_id = l.user_id
WHERE m.id IN (
SELECT max(id) FROM messages
WHERE receiver_id=3
GROUP BY sender_id
)
ORDER BY date DESC
LIMIT 0, 7
The reason your original query does not match up the fields is that GROUP BY really requires aggregate functions (like MAX/MIN/SUM/...) applied to every field you select that is not grouped by. The reason the query even runs is that MySQL does not enforce that (unless ONLY_FULL_GROUP_BY is enabled in sql_mode), but instead returns indeterminate values from any matching row. AFAIK, most other SQL RDBMSs refuse to run the query.
EDIT: As for performance, a few indexes that are likely to help are;
CREATE INDEX ix_inner ON messages(receiver_id, sender_id, id);
CREATE INDEX ix_login_users ON login_users(user_id);
CREATE INDEX ix_profiles ON profiles(user_id);

sql nested inner join only returning 1 result

I'm building a chat application that's a bit like Facebook chat. I have users, conversations and messages. All three have their own tables. For now I'm trying to get all conversations containing a certain user, along with the latest message of each conversation.
I tried this query, but I only get 1 row back even though more rows match:
SELECT conversations.id as converid,
messages.from as messageauthor,
messages.message as message
FROM conversations INNER JOIN (SELECT * FROM messages
ORDER BY date DESC LIMIT 1) as messages
ON messages.conversationid=conversations.id
WHERE user1=3
OR user2=3
When I do i.e.
SELECT conversations.id as converid,
messages.from as messageauthor
FROM conversations INNER JOIN messages
ON messages.conversationid=conversations.id
WHERE user1=3
OR user2=3
I get all results, for sure, and when I check the converids I get 3 unique ids, so there are at least 3 conversations going on with user id 3. So the top query should also return 3. Now I don't understand why it only returns 1 row. Does the LIMIT 1 from the nested query affect the whole query?
Looking forward for some pointers...
No. The limit 1 affects the subquery, so it is only returning one row. So, there is only one match.
What is the issue with this query (your second query, but formatted differently):
SELECT c.id as converid, m.from as messageauthor
FROM conversations c INNER JOIN
messages m
ON m.conversationid=c.id
WHERE user1=3 OR user2=3;
I see, you want the latest message. Try calculating it and joining back in:
SELECT c.id as converid, m.from as messageauthor
FROM conversations c INNER JOIN
messages m
ON m.conversationid=c.id join
(select m.conversationid, max(date) as maxdate
from messages m
group by m.conversationid
) mmax
on mmax.conversationid = m.conversationid and m.date = mmax.maxdate
WHERE user1=3 OR user2=3;
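To make the difference visible, the sketch below runs both shapes on the same invented rows, using sqlite3 in place of MySQL. The reserved word from is quoted with double quotes here; MySQL would use backticks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE conversations (id INTEGER PRIMARY KEY, user1 INTEGER, user2 INTEGER);
CREATE TABLE messages (id INTEGER PRIMARY KEY, conversationid INTEGER,
                       "from" INTEGER, message TEXT, date TEXT);
INSERT INTO conversations VALUES (1, 3, 4), (2, 5, 3), (3, 3, 6);
INSERT INTO messages VALUES
  (1, 1, 3, 'a1', '2020-01-01'), (2, 1, 4, 'a2', '2020-01-02'),
  (3, 2, 5, 'b1', '2020-01-03'),
  (4, 3, 6, 'c1', '2020-01-04'), (5, 3, 3, 'c2', '2020-01-05');
""")

# LIMIT 1 shrinks the derived table to one message, so only one conversation can match.
limited = conn.execute("""
SELECT c.id, m.message
FROM conversations c
INNER JOIN (SELECT * FROM messages ORDER BY date DESC LIMIT 1) m
        ON m.conversationid = c.id
WHERE c.user1 = 3 OR c.user2 = 3
""").fetchall()

# Computing MAX(date) per conversation and joining back keeps one row per conversation.
latest = conn.execute("""
SELECT c.id, m.message
FROM conversations c
INNER JOIN messages m ON m.conversationid = c.id
INNER JOIN (SELECT conversationid, MAX(date) AS maxdate
            FROM messages
            GROUP BY conversationid) mmax
        ON mmax.conversationid = m.conversationid AND m.date = mmax.maxdate
WHERE c.user1 = 3 OR c.user2 = 3
ORDER BY c.id
""").fetchall()
print(limited, latest)
```

If two messages in a conversation share the maximum date, the second form returns both, so a tiebreaker such as the message id may still be needed.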

MySQL latest unique user chats not working

I'm trying to get latest created datetime for unique user_id. I've got below query but it does not seem to be working....It does not get the latest created time.
SELECT * FROM `chats`
WHERE receiver_id = 1
GROUP BY user_id
ORDER BY created DESC
Is there a reason why?
UPDATE:
Actually, I found the answer myself; see below. I had to INNER JOIN against a nested query that filters the chats and takes the latest created time per user, match its rows back to the chats table, and then LEFT JOIN users to get the data I needed!
SELECT c.*, users.username
FROM chats c
INNER JOIN(
SELECT MAX(created) AS Date, user_id, receiver_id, chat, type, id
FROM chats
WHERE receiver_id = 1
GROUP BY user_id ) cc ON cc.Date = c.created AND cc.user_id = c.user_id
LEFT JOIN users ON users.id = c.user_id
WHERE c.receiver_id = 1
You should use a MAX aggregate function -
SELECT user_id, MAX(created) latest_datetime FROM chats
GROUP BY user_id;
Try this query, it will show all users and latest created datetime.