The question "SQL select only rows with max value on a column" doesn't solve my problem, although mine has been marked as a duplicate of it.
It assumes my columns from_id and to_id are primary keys, when they don't have such a constraint (see the code provided below). If they were primary keys, I couldn't store my messages in the same table. As a result, the SQL query from that answer returns the same conversation multiple times, which is not what I want. Please see the expected behaviour below.
Expected behaviour: I need to select the latest message from all conversations, regardless of whether the user is only the sender, only the recipient, or both. Each conversation/thread should only be displayed once.
Example: when querying this table, my SQL statement should only output msg3 and msg4, ignoring all the previous messages John and Alice exchanged.
Here is the closest query I could write. The problem is that this query only selects conversations where the user received a message. I'm stuck on adding the conversations where the user is only the sender (he didn't get any reply).
SELECT * FROM messages where `to_id` = '1' GROUP BY `from_id` ORDER BY `send_date` ASC
Here are users and messages tables:
CREATE TABLE users (
id INT(11) AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(128) NOT NULL
);
CREATE TABLE messages (
id INT(11) AUTO_INCREMENT PRIMARY KEY,
to_id INT(11) NOT NULL, -- recipient id, matched against the current user's id
from_id INT(11) NOT NULL, -- sender id, matched against the current user's id
send_date DATETIME DEFAULT CURRENT_TIMESTAMP,
content TEXT
);
Question: How can I do this using a single SQL query? Or should I change my data structure and use three tables instead of one?
I would first get the ids of the latest message in each conversation. You can do this using least() and greatest():
select least(m.to_id, m.from_id) as id1,
greatest(m.to_id, m.from_id) as id2, max(m.id) as max_id
from messages m
group by id1, id2;
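To try the grouping step outside MySQL, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. It assumes SQLite's two-argument min() and max() are acceptable stand-ins for MySQL's LEAST() and GREATEST() (SQLite has no functions by those names), and the sample rows (John = 1, Alice = 2, Bob = 3) are invented to mirror the example. Like the answer, it treats the highest id as the latest message.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL messages table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    to_id INTEGER NOT NULL,
    from_id INTEGER NOT NULL,
    send_date TEXT DEFAULT CURRENT_TIMESTAMP,
    content TEXT
);
INSERT INTO messages (to_id, from_id, content) VALUES
    (2, 1, 'msg1'),  -- John (1) -> Alice (2)
    (1, 2, 'msg2'),  -- Alice -> John
    (2, 1, 'msg3'),  -- John -> Alice, latest in that conversation
    (3, 1, 'msg4');  -- John -> Bob (3), never answered
""")

# Two-argument min()/max() play the role of LEAST()/GREATEST(): they order each
# pair so (1, 2) and (2, 1) land in the same group. The one-argument max(id)
# is the usual aggregate and picks the newest id in each group.
rows = conn.execute("""
    SELECT min(to_id, from_id) AS id1,
           max(to_id, from_id) AS id2,
           max(id)             AS max_id
    FROM messages
    GROUP BY min(to_id, from_id), max(to_id, from_id)
    ORDER BY id1, id2
""").fetchall()
print(rows)  # [(1, 2, 3), (1, 3, 4)]
```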
You can then get the complete information about the message by joining back:
select m.*
from messages m
where m.id in (select max(m.id) as max_id
from messages m
group by least(m.to_id, m.from_id), greatest(m.to_id, m.from_id)
);
Note: In older versions of MySQL, putting the subquery in the from clause and using join is much more efficient.
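A sketch of the FROM-clause variant from the note, again using SQLite through Python's sqlite3 module as a stand-in for MySQL (two-argument min()/max() in place of LEAST()/GREATEST()). The extra "? IN (to_id, from_id)" filter, which keeps only conversations the given user takes part in as sender or recipient, is my addition to match the question; the sample data, including an Alice-to-Bob message, is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    to_id INTEGER NOT NULL,
    from_id INTEGER NOT NULL,
    send_date TEXT DEFAULT CURRENT_TIMESTAMP,
    content TEXT
);
INSERT INTO messages (to_id, from_id, content) VALUES
    (2, 1, 'msg1'),  -- John (1) -> Alice (2)
    (1, 2, 'msg2'),  -- Alice -> John
    (2, 1, 'msg3'),  -- John -> Alice, latest in that conversation
    (3, 1, 'msg4'),  -- John -> Bob (3), never answered
    (3, 2, 'msg5');  -- Alice -> Bob, does not involve John
""")

def latest_per_conversation(user_id):
    # The derived table "latest" holds the highest id per unordered
    # (to_id, from_id) pair; joining back fetches the full message rows.
    sql = """
        SELECT m.id, m.from_id, m.to_id, m.content
        FROM messages AS m
        JOIN (
            SELECT max(id) AS max_id
            FROM messages
            WHERE ? IN (to_id, from_id)  -- user is sender or recipient
            GROUP BY min(to_id, from_id), max(to_id, from_id)
        ) AS latest ON latest.max_id = m.id
        ORDER BY m.id
    """
    return conn.execute(sql, (user_id,)).fetchall()

print(latest_per_conversation(1))
# [(3, 1, 2, 'msg3'), (4, 1, 3, 'msg4')]
```

For John (user 1) this returns msg3 and msg4 only, which is the output the question asks for.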
Related
I have an application which reads email messages from a MySQL database (I download the emails from a Gmail inbox into my application's database). The database has the following structure:
Table1 (Contacts):
ContactID (int)
ContactName (varchar(100))
ContactEmailAddress (varchar(150))
Table2 (Subjects):
SubjectID (int)
ContactID (int)
Subject (varchar(200))
Table3 (Messages):
MessageID (int)
SubjectID (int)
MessageText (varchar(150))
IsRead (tinyint)
IsReceived (tinyint)
MessageDate (DateTime)
And here is my query to fetch the 40 most recent records:
SELECT * FROM(SELECT ROW_NUMBER()OVER(Order by isRead ASC,MessageDate DESC) RecID,
c.ContactName,s.subject,s.SubjectID,d.MessageDate,d.isRead
from Contacts c
INNER JOIN Subjects s on s.ContactID=c.ContactID
JOIN (
select MAX(MessageID) dtl_id,SubjectID from Messages where IsReceived=1
GROUP BY SubjectID)d_max on (d_max.subjectid=s.subjectid)
JOIN Messages d on (d.MessageID=d_max.dtl_id)
) AS RowConstrainedResult where RecID >=1 and RecID <=40 ORDER BY RecID
But this query takes almost 15 seconds to load. What should be done to improve its performance? All my primary key columns and referenced key columns are indexed, and the Messages table has almost 500k records.
The EXPLAIN in the link you provided does not seem to refer to the SELECT you provided, so I have to ignore it.
This index may be helpful:
Messages: (IsReceived, SubjectID, MessageID)
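To check whether the suggested index can serve the inner GROUP BY subquery on its own, here is a sketch that builds the Messages table in SQLite through Python's sqlite3 module and prints SQLite's EXPLAIN QUERY PLAN. This is only a stand-in: the index name is hypothetical and SQLite's planner is not MySQL's. In MySQL you would run EXPLAIN on the subquery and look for the index in the key column and "Using index" in Extra.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Messages (
    MessageID INTEGER PRIMARY KEY,
    SubjectID INTEGER,
    MessageText VARCHAR(150),
    IsRead TINYINT,
    IsReceived TINYINT,
    MessageDate DATETIME
);
-- Hypothetical index name; column order follows the suggestion above.
CREATE INDEX ix_recv_subj_msg ON Messages (IsReceived, SubjectID, MessageID);
""")

# The d_max subquery from the question: newest received message per subject.
inner = """
    SELECT MAX(MessageID) AS dtl_id, SubjectID
    FROM Messages
    WHERE IsReceived = 1
    GROUP BY SubjectID
"""
# The last field of each EXPLAIN QUERY PLAN row is the human-readable detail.
details = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + inner)]
print(details)
```

The equality on IsReceived uses the leftmost index column, and SubjectID comes next, so the rows arrive already grouped and all three referenced columns are in the index.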
I added the tag [groupwise-maximum]. It links to a lot of other Questions that are doing similar things. Very few actually succeed in optimizing the task. I have a comparison of the faster-than-most techniques here: http://mysql.rjweb.org/doc.php/groupwise_max
I am running a query on three tables messages, message_recipients and users.
Table structure of messages table:
id int pk
message_id int
message text
user_id int
...
Index for this table is on user_id, message_id and id.
Table structure of message_recipients table:
id int pk
message_id int
read_date datetime
user_id int
...
Index is on id, message_id and user_id.
Table structure of users table:
id int pk
display_name varchar
...
Index is on id.
I am running the following query against these tables:
SELECT
m.*,
if(m.user_id = 0, 'Campus Manager', u.display_name) AS name,
mr.read_date,
IF(m1.message_id > 0 and m1.user_id=1, true, false) as replied
FROM
messages m
JOIN
message_recipients mr
ON
mr.message_id = m.id
LEFT JOIN
users u
ON
u.UID = m.user_id
LEFT JOIN
messages m1
ON
m1.message_id = m.id
WHERE
mr.user_id = 1
AND
m.published = 1
GROUP BY
mr.message_id
ORDER BY
m.created DESC
EXPLAIN returns the following data for this query:
UPDATE
As suggested by @e4c5, I added a new composite index on (published, user_id, created), and EXPLAIN now shows this:
How can this query be optimized by adding the required indexes (if any)? It is taking a lot of time.
GROUP BY needs to list all the non-aggregated columns. I suspect that would be a mess. Why do you need GROUP BY at all?
Why are you linking messages.id to message_id? Is this a hierarchical table whose column names just aren't like 'parent_id'?
"Index is on id, message_id and user_id" -- is that one composite index or 3 single-column indexes? (It makes a big difference.) It would be better to show us SHOW CREATE TABLE instead of ambiguously paraphrasing.
Is user_id=1 prolific? That is, are you expecting thousands of rows? Is this query only a problem for him?
Using LEFT JOIN implies that m1.message_id could be NULL, yet the reference to it seems to ignore that possibility.
If this is a single table that contains a message thread (both the main info about the thread and the individual responses), then I suggest it is a bad design. (I made this mistake once upon a time.) I think it is better to have a table with one row per thread and another table with one row per comment: 1 thread : many comments. So there would be a thread_id in the comment table.
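A minimal sketch of that one-thread-to-many-comments layout, using SQLite through Python's sqlite3 module. The table and column names (threads, comments, thread_id) are hypothetical, and the replied check is only meant to show that with this layout the flag becomes a plain EXISTS test, with no self-join and no GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per thread (hypothetical names).
CREATE TABLE threads (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,   -- who started the thread
    subject TEXT NOT NULL
);
-- One row per comment; thread_id links each comment to its thread (1 : many).
CREATE TABLE comments (
    id INTEGER PRIMARY KEY,
    thread_id INTEGER NOT NULL REFERENCES threads(id),
    user_id INTEGER NOT NULL,
    body TEXT NOT NULL
);
CREATE INDEX ix_comments_thread_user ON comments (thread_id, user_id);

INSERT INTO threads (id, user_id, subject) VALUES (1, 7, 'Welcome'), (2, 8, 'Schedule');
INSERT INTO comments (thread_id, user_id, body) VALUES
    (1, 7, 'First post'),
    (1, 1, 'Reply from user 1'),
    (2, 8, 'Nobody has answered this yet');
""")

def threads_with_reply_flag(user_id):
    # One row per thread; replied is 1 if user_id has commented in it, else 0.
    sql = """
        SELECT t.id, t.subject,
               EXISTS (SELECT 1 FROM comments AS c
                       WHERE c.thread_id = t.id AND c.user_id = ?) AS replied
        FROM threads AS t
        ORDER BY t.id
    """
    return conn.execute(sql, (user_id,)).fetchall()

print(threads_with_reply_flag(1))  # [(1, 'Welcome', 1), (2, 'Schedule', 0)]
```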
I was able to bring the query time down from 3 seconds to 0.1 seconds by adding new indexes to the messages and message_recipients tables and changing the storage engine of the messages table from InnoDB to MyISAM.
Composite index named composite added to the messages table on these columns, in this order: published, user_id, created
Composite index named message_id_2 added to the message_recipients table on two columns: message_id, user_id
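A quick way to sanity-check that the two indexes match the access pattern is sketched below, using SQLite through Python's sqlite3 module as a stand-in for MySQL. Only the columns the indexes touch are declared, and the two probe queries are simplified pieces of the original one (the m.published filter and the mr lookup by message and recipient), so treat the plan text as SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Trimmed stand-ins: only the columns used by the new indexes.
CREATE TABLE messages (id INTEGER PRIMARY KEY, user_id INTEGER,
                       published INTEGER, created TEXT);
CREATE TABLE message_recipients (id INTEGER PRIMARY KEY, message_id INTEGER,
                                 user_id INTEGER, read_date TEXT);
-- The two composite indexes from the answer, in the stated column order.
CREATE INDEX composite ON messages (published, user_id, created);
CREATE INDEX message_id_2 ON message_recipients (message_id, user_id);
""")

def plan(sql, params=()):
    # The last field of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]

# published is the leftmost column of "composite", so this filter can use it.
messages_plan = plan("SELECT id FROM messages WHERE published = 1")
# Mirrors "mr.message_id = m.id AND mr.user_id = 1" for a single message.
recipients_plan = plan(
    "SELECT read_date FROM message_recipients WHERE message_id = ? AND user_id = ?",
    (5, 1),
)
print(messages_plan)
print(recipients_plan)
```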
EXPLAIN now shows:
There are a lot of questions like this, but each query is unique, hence this question. I have the following query:
Select * from tblretailerusers where (company_id=169 or company_id in (select id from tblretailercompany where multi_ret_id='169')) and ( id in (Select contact_person_1 from tbllocations where status=1 and (retailer_comp_id=169 or retailer_comp_id in (select id from tblretailercompany where multi_ret_id='169'))) OR id in(Select contact_person_2 from tbllocations where status=1 and (retailer_comp_id=169 or retailer_comp_id in (select id from tblretailercompany where multi_ret_id='169'))) ) and (last_login is not null )
Three tables are involved: Retailer, their Location and their User (standard user information). Each retailer can have child retailers, so the Retailer table has a parent retailer ID. Currently each table has about 6K records; every table has an auto-increment primary key, which as far as I know is indexed as well. In the User table, the Email field is indexed.
Now, this query takes less than 1 second, which is fine, but the client now wants to find users whose email addresses start with a specific letter, like A or B. As soon as I add that to the query, it takes about 50-60 seconds. I created a non-unique index on the Email field, and the new query looks like this:
Select * from tblretailerusers where (company_id=169 or company_id in (select id from tblretailercompany where multi_ret_id='169')) and ( id in (Select contact_person_1 from tbllocations where status=1 and (retailer_comp_id=169 or retailer_comp_id in (select id from tblretailercompany where multi_ret_id='169'))) OR id in(Select contact_person_2 from tbllocations where status=1 and (retailer_comp_id=169 or retailer_comp_id in (select id from tblretailercompany where multi_ret_id='169'))) ) and (last_login is not null ) and email REGEXP '^A|^B'
I tried running EXPLAIN on both versions of the query, and interestingly, the PRIMARY row does not show any value for possible_keys, even though we search by the primary key id in the User table and I also have an index on the Email field. Here is the EXPLAIN of the query:
I tried recreating the index; currently, apart from the primary key, the user table has one composite index on ID, CompanyID and Email. I also created indexes on the company table, but nothing speeds it up. The user table is MyISAM.
My other question is: how can I avoid repeating the subquery for the child company search? As you can see, it is used three times in the query above.
EDIT: The reason I am using REGEXP in my query is that when I tried LIKE 'A%', it was even slower.
EDIT: I just tested with last_login is null instead of last_login is not null, and the results take less than 5 seconds. Aren't IS NULL and IS NOT NULL similar?
Instead of using the RegEx:
and email REGEXP '^A|^B'
...try a simple LIKE:
and (email like 'A%' or email like 'B%')
Regular expressions are a bit heavy for such a small comparison; LIKE will probably be far faster. Also, I wouldn't expect the optimiser to try to decode what the regexp is trying to do. With LIKE, however, it knows, and it will probably use the index you've set up.
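The two filters can be compared on a toy table with the sketch below, which uses SQLite through Python's sqlite3 module. SQLite ships no REGEXP implementation, so one is registered with create_function; it is made case-insensitive on the assumption that the email column uses a case-insensitive collation, as MySQL's default collations are. The sketch only shows that both filters pick the same rows; the speed difference comes from MySQL being able to turn LIKE 'A%' (a constant prefix) into an index range scan, which it cannot do for REGEXP.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")

def regexp(pattern, value):
    # SQLite rewrites "X REGEXP Y" as regexp(Y, X): pattern first, value second.
    return value is not None and re.search(pattern, value, re.IGNORECASE) is not None

conn.create_function("regexp", 2, regexp)
conn.executescript("""
CREATE TABLE tblretailerusers (id INTEGER PRIMARY KEY, email TEXT);
INSERT INTO tblretailerusers (email) VALUES
    ('alice@example.com'), ('Bob@example.com'), ('carol@example.com'),
    ('abe@example.com'), ('zed@example.com');
""")

with_regexp = conn.execute(
    "SELECT id, email FROM tblretailerusers WHERE email REGEXP '^A|^B' ORDER BY id"
).fetchall()
# SQLite's LIKE is case-insensitive for ASCII letters by default.
with_like = conn.execute(
    "SELECT id, email FROM tblretailerusers"
    " WHERE (email LIKE 'A%' OR email LIKE 'B%') ORDER BY id"
).fetchall()
print(with_like)
```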
I'm quite new to databases, so I apologise in advance if this sounds silly. I'm creating a basic web application that simulates a micro-blogging website. I have three tables: authors, posts and comments.
The authors table is described as follows:
aId int(20) NO PRI NULL auto_increment
aUser varchar(30) NO UNI NULL
aPass varchar(40) NO NULL
aEmail varchar(30) NO UNI NULL
aBio mediumtext YES NULL
aReg datetime NO NULL
The posts table is described below:
pId int(20) NO PRI NULL auto_increment
pAuthor int(20) NO MUL NULL
pTitle tinytext NO NULL
pBody mediumtext NO NULL
pDate datetime NO NULL
I understand the basics of relationships, but could I ask: if my web application displays the posts along with who posted them, is there a way to make the result set show the actual username rather than the numeric ID? Each time a post is created I capture the user's ID, so every post is created by a valid user ID and the posts table records the ID of the person who created it. However, when I view the posts with a SELECT query, it shows the numbers and not the names associated with them in the authors table. Is there a query I could use for this, or a way of doing it so that when I use select * from authors, it shows the usernames rather than the user IDs?
Thank you guys.
I think you are looking for a SQL query:
SELECT pTitle, pDate, aUser
FROM posts
LEFT JOIN authors ON aId=pAuthor
ORDER BY pDate DESC
After SELECT you tell MySQL which columns you want to see; with LEFT JOIN you connect the tables together (on aId and pAuthor); and with ORDER BY pDate DESC you tell MySQL to return the rows ordered by date, newest (highest date) first.
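Here is a small sketch of that first query running against SQLite through Python's sqlite3 module, with the tables trimmed to the columns it uses and invented rows. The post with pAuthor = 99 has no matching author, which shows what the LEFT JOIN does: the post is kept and aUser comes back as NULL (None in Python). An INNER JOIN would drop that row instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (
    aId INTEGER PRIMARY KEY AUTOINCREMENT,
    aUser VARCHAR(30) NOT NULL UNIQUE
);
CREATE TABLE posts (
    pId INTEGER PRIMARY KEY AUTOINCREMENT,
    pAuthor INTEGER NOT NULL,   -- holds authors.aId
    pTitle TEXT NOT NULL,
    pDate DATETIME NOT NULL
);
INSERT INTO authors (aUser) VALUES ('Adam'), ('Zachariash');
INSERT INTO posts (pAuthor, pTitle, pDate) VALUES
    (1, 'Hello world', '2015-01-01 10:00:00'),
    (2, 'Local news',  '2015-01-02 09:00:00'),
    (1, 'More news',   '2015-01-03 08:00:00'),
    (99, 'Orphan post', '2014-12-31 00:00:00');  -- no author 99
""")

rows = conn.execute("""
    SELECT pTitle, pDate, aUser
    FROM posts
    LEFT JOIN authors ON aId = pAuthor
    ORDER BY pDate DESC
""").fetchall()
for title, date, user in rows:
    print(date, title, user)
```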
SELECT posts.*, authors.aUser
FROM posts
LEFT JOIN authors ON aId=pAuthor
WHERE pTitle LIKE '%news%'
ORDER BY pDate DESC, aUser ASC
to see the author's name when searching for posts whose title contains "news", sorted newest first; if two posts have the same timestamp, they are ordered by the user's name (Adam comes before Zachariash).
If you only need the title, date and user's name, use the first line (the SELECT list) of the first query above.
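To see the tie-break in the second query, the sketch below (SQLite through Python's sqlite3 module, invented rows) gives two "news" posts the same pDate; ORDER BY pDate DESC, aUser ASC then puts Adam's post before Zachariash's. It selects only pTitle and aUser instead of posts.* to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (aId INTEGER PRIMARY KEY AUTOINCREMENT,
                      aUser VARCHAR(30) NOT NULL UNIQUE);
CREATE TABLE posts (pId INTEGER PRIMARY KEY AUTOINCREMENT, pAuthor INTEGER NOT NULL,
                    pTitle TEXT NOT NULL, pDate DATETIME NOT NULL);
INSERT INTO authors (aUser) VALUES ('Adam'), ('Zachariash');
INSERT INTO posts (pAuthor, pTitle, pDate) VALUES
    (2, 'Breaking news', '2015-01-02 09:00:00'),
    (1, 'Sports news',   '2015-01-02 09:00:00'),  -- same timestamp as above
    (1, 'Old news',      '2015-01-01 08:00:00'),
    (2, 'Hello world',   '2015-01-03 00:00:00');  -- filtered out: no "news"
""")

rows = conn.execute("""
    SELECT pTitle, aUser
    FROM posts
    LEFT JOIN authors ON aId = pAuthor
    WHERE pTitle LIKE '%news%'
    ORDER BY pDate DESC, aUser ASC   -- newest first, then by name on ties
""").fetchall()
print(rows)
# [('Sports news', 'Adam'), ('Breaking news', 'Zachariash'), ('Old news', 'Adam')]
```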
I'm creating a site where users can create photo albums, create events, upload videos etc. What I want to do is make a list of a given user's recent activity. Here's a small outline of my tables:
**videos**
id
user_id
uploaded
**albums**
id
user_id
created
updated
**comments**
id
user_id
date
Of course there are more fields in the table, and also more tables, but these should be enough to help me construct a query.
What I want as output is a date and the id of a given activity, with these fields:
user_id, video_id, album_id, comment_id, date
Of course only one of the ID fields should be chosen, the rest should just be null, and the date should come from "uploaded" for videos, "updated" for albums and "date" for comments. The user_id should be selected in the query in a where statement so you get activity for a given user.
I've tried to construct this query but failed; I'm guessing that COALESCE should be used for choosing the different timestamps, but I just can't get it to work.
Something like this?
(select user_id, id as video_id, NULL as album_id, NULL as comment_id, uploaded as date from videos)
UNION
(select user_id, NULL, id, NULL, updated from albums)
UNION
(select user_id, NULL, NULL, id, date from comments)
You can apply an ORDER BY clause after the whole thing, but WHERE conditions must be put inside the parentheses of each separate SELECT.
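A runnable sketch of this approach, using SQLite through Python's sqlite3 module with invented rows. Two adaptations, both mine: SQLite does not accept parenthesised SELECTs inside a UNION, so the parentheses are dropped (MySQL accepts either form), and UNION ALL replaces UNION, because each branch fills a different id column, so rows from different tables can never be identical and the duplicate-removal step would be wasted work. The albums branch uses updated, as the question asks.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE videos (id INTEGER PRIMARY KEY, user_id INTEGER, uploaded TEXT);
CREATE TABLE albums (id INTEGER PRIMARY KEY, user_id INTEGER, created TEXT, updated TEXT);
CREATE TABLE comments (id INTEGER PRIMARY KEY, user_id INTEGER, date TEXT);
INSERT INTO videos VALUES (1, 5, '2015-03-01 12:00:00'), (2, 6, '2015-03-02 12:00:00');
INSERT INTO albums VALUES (1, 5, '2015-02-01 00:00:00', '2015-03-03 08:00:00');
INSERT INTO comments VALUES (1, 5, '2015-02-28 18:30:00');
""")

def recent_activity(user_id):
    # Each branch filters on the user in its own WHERE clause; the final
    # ORDER BY applies to the combined result and refers to the "date" alias
    # defined in the first SELECT.
    sql = """
        SELECT user_id, id AS video_id, NULL AS album_id, NULL AS comment_id,
               uploaded AS date
        FROM videos WHERE user_id = :uid
        UNION ALL
        SELECT user_id, NULL, id, NULL, updated FROM albums WHERE user_id = :uid
        UNION ALL
        SELECT user_id, NULL, NULL, id, date FROM comments WHERE user_id = :uid
        ORDER BY date DESC
    """
    return conn.execute(sql, {"uid": user_id}).fetchall()

for row in recent_activity(5):
    print(row)
```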