I have an application that reads email messages from a MySQL database (I download the emails from a Gmail inbox into the application's database). The database has the following structure:
Table1 (Contacts):
ContactID (int)
ContactName (varchar(100))
ContactEmailAddress (varchar(150))
Table2 (Subjects):
SubjectID (int)
ContactID (int)
Subject (varchar(200))
Table3 (Messages):
MessageID (int)
SubjectID (int)
MessageText (varchar(150))
IsRead (tinyint)
IsReceived (tinyint)
MessageDate (DateTime)
And here is my query to fetch the most recent 40 records:
SELECT *
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY isRead ASC, MessageDate DESC) RecID,
           c.ContactName, s.subject, s.SubjectID, d.MessageDate, d.isRead
    FROM Contacts c
    INNER JOIN Subjects s ON s.ContactID = c.ContactID
    JOIN (
        SELECT MAX(MessageID) dtl_id, SubjectID
        FROM Messages
        WHERE IsReceived = 1
        GROUP BY SubjectID
    ) d_max ON (d_max.subjectid = s.subjectid)
    JOIN Messages d ON (d.MessageID = d_max.dtl_id)
) AS RowConstrainedResult
WHERE RecID >= 1 AND RecID <= 40
ORDER BY RecID
But this query takes almost 15 seconds to run. What should be done to improve its performance? All of my primary key columns and referenced key columns are indexed, and the Messages table has almost 500k records in it.
The EXPLAIN in the link you provided does not seem to refer to the SELECT you provided, so I have to ignore it.
This index may be helpful:
Messages: (IsReceived, SubjectID, MessageID)
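As a sketch, that index could be created like this. The index name idx_received_subject_message is a placeholder chosen here, not something from the question. With IsReceived first, the derived table that computes MAX(MessageID) per SubjectID for IsReceived = 1 may be answerable from the index alone; running EXPLAIN is the way to check whether the optimizer actually picks it.

```sql
-- Hypothetical index name; the columns are the ones suggested above.
ALTER TABLE Messages
  ADD INDEX idx_received_subject_message (IsReceived, SubjectID, MessageID);

-- The subquery this index is meant to help, taken from the question:
EXPLAIN
SELECT MAX(MessageID) AS dtl_id, SubjectID
FROM Messages
WHERE IsReceived = 1
GROUP BY SubjectID;
```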
I added the tag [groupwise-maximum]. It links to a lot of other Questions that are doing similar things. Very few actually succeed in optimizing the task. I have a comparison of the faster-than-most techniques here: http://mysql.rjweb.org/doc.php/groupwise_max
This question, SQL select only rows with max value on a column, doesn't solve my problem, although it has been marked as a duplicate.
It assumes my columns from_id and to_id are primary keys, but they have no such constraint (see the code provided below). If they were primary keys, I couldn't store my messages in the same table. As a result, the SQL query from that answer prints all duplicates multiple times, which is not what I want. Please see the expected behaviour below.
Expected behaviour: I need to select the latest message from all conversations, regardless of whether the user is only the sender, only the recipient, or both. Each conversation/thread should only be displayed once.
Example: when querying this table, my SQL statement should only output msg3 and msg4, ignoring all the previous messages John and Alice exchanged.
Here is the closest query I could write. The problem is that this query only selects conversations where the user received a message. I'm stuck on adding conversations where the user is only the sender (he didn't get any reply) to the selection.
SELECT * FROM messages where `to_id` = '1' GROUP BY `from_id` ORDER BY `send_date` ASC
Here are users and messages tables:
CREATE TABLE users (
id INT(11) AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(128) NOT NULL
);
CREATE TABLE messages (
id INT(11) AUTO_INCREMENT PRIMARY KEY,
to_id INT(11) NOT NULL, -- recipient id to match to current user id
from_id INT(11) NOT NULL, -- sender id to match to current user id
send_date DATETIME DEFAULT CURRENT_TIMESTAMP,
content TEXT
);
Question: How can I do this using a single SQL query? Or should I change my data structure to use three tables instead of one?
I would first get the ids. You can do this using least() and greatest():
select least(m.to_id, m.from_id) as id1,
greatest(m.to_id, m.from_id) as id2, max(m.id) as max_id
from messages m
group by id1, id2;
You can then get the complete information about the message by joining back:
select m.*
from messages m
where m.id in (select max(m2.id) as max_id
               from messages m2
               group by least(m2.to_id, m2.from_id), greatest(m2.to_id, m2.from_id)
);
Note: In older versions of MySQL, putting the subquery in the from clause and using join is much more efficient.
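A sketch of the join form mentioned in the note, using only the messages columns from the question. The alias latest is just a name chosen here. The filter on to_id / from_id is an assumption about how the current user's id (1 in the question's example) would be supplied; drop it to get the latest message of every conversation.

```sql
select m.*
from messages m
join (select max(id) as max_id
      from messages
      where 1 in (to_id, from_id)   -- assumed: 1 is the current user's id
      group by least(to_id, from_id), greatest(to_id, from_id)
     ) latest on latest.max_id = m.id
order by m.send_date desc;
```

Because the filter checks both columns, conversations where the user only sent messages and never got a reply are included as well.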
I am running a query on three tables messages, message_recipients and users.
Table structure of messages table:
id int pk
message_id int
message text
user_id int
...
Index for this table is on user_id, message_id and id.
Table structure of message_recipients table:
id int pk
message_id int
read_date datetime
user_id int
...
Index is on id, message_id and user_id.
Table structure of users table:
id int pk
display_name varchar
...
Index is on id.
I am running the following query against these tables:
SELECT
m.*,
if(m.user_id = 0, 'Campus Manager', u.display_name) AS name,
mr.read_date,
IF(m1.message_id > 0 and m1.user_id=1, true, false) as replied
FROM
messages m
JOIN
message_recipients mr
ON
mr.message_id = m.id
LEFT JOIN
users u
ON
u.UID = m.user_id
LEFT JOIN
messages m1
ON
m1.message_id = m.id
WHERE
mr.user_id = 1
AND
m.published = 1
GROUP BY
mr.message_id
ORDER BY
m.created DESC
EXPLAIN returns the following data for this query:
UPDATE
As suggested by @e4c5, I added a new composite index on (published, user_id, created), and EXPLAIN now shows this:
How can this query be optimized by adding the required indexes (if any)? It is taking a lot of time.
GROUP BY needs to list all the non-aggregated columns. I suspect that would be a mess. Why do you need GROUP BY at all?
Why are you linking messages.id to messages.message_id? Is this a hierarchical table, but the column names aren't like 'parent_id'?
"Index is on id, message_id and user_id" -- is that one composite index or 3 single-column indexes? (It makes a big difference.) It would be better to show us SHOW CREATE TABLE instead of ambiguously paraphrasing.
Is user_id=1 prolific? That is, are you expecting thousands of rows? Is this query only a problem for him?
Using LEFT JOIN implies that m1.message_id could be NULL, yet the reference to it seems to ignore that possibility.
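One way to act on the GROUP BY and LEFT JOIN points, offered as a sketch rather than a tested rewrite: drop the GROUP BY and the second join to messages, and compute replied with EXISTS, which yields 0 or 1 and never NULL. This assumes message_recipients holds at most one row per (message_id, user_id); if it can hold more, duplicate rows would reappear. It also assumes replied is meant to say "user 1 has written at least one reply". Column names, including u.UID, are copied from the query in the question.

```sql
SELECT
    m.*,
    IF(m.user_id = 0, 'Campus Manager', u.display_name) AS name,
    mr.read_date,
    EXISTS (SELECT 1
            FROM messages m1
            WHERE m1.message_id = m.id
              AND m1.user_id = 1) AS replied  -- 1 if user 1 has replied, else 0
FROM messages m
JOIN message_recipients mr ON mr.message_id = m.id
LEFT JOIN users u ON u.UID = m.user_id
WHERE mr.user_id = 1
  AND m.published = 1
ORDER BY m.created DESC;
```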
If this is a single table that contains a message thread, with both the main info about the thread and the individual responses, then I suggest it is a bad design. (I made this mistake once upon a time.) I think it is better to have a table with one row per thread and another table with one row per comment. 1 thread : many comments. So there would be a thread_id in the comment table.
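A minimal sketch of that two-table shape. All table and column names here are hypothetical; only the "1 thread : many comments" relationship and the thread_id column in the comment table come from the text above.

```sql
-- Hypothetical names throughout.
CREATE TABLE threads (
    thread_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id   INT NOT NULL,            -- who started the thread
    subject   VARCHAR(200) NOT NULL,
    created   DATETIME NOT NULL
);

CREATE TABLE comments (
    comment_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    thread_id  INT UNSIGNED NOT NULL,  -- points back to threads.thread_id
    user_id    INT NOT NULL,           -- who wrote the comment
    body       TEXT,
    created    DATETIME NOT NULL,
    INDEX (thread_id, created)         -- fetch a thread's comments in date order
);
```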
I was able to bring the query time down from 3 seconds to 0.1 seconds by adding a new index to each of the messages and message_recipients tables and changing the storage engine of the messages table from InnoDB to MyISAM.
A composite index named composite was added to the messages table on these columns, in this order: published, user_id, created
A composite index named message_id_2 was added to the message_recipients table on two columns: message_id, user_id
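Written out as statements, the changes described above would look roughly like this. The index names and column order are the ones given in the text; the exact statements that were run are not shown in the post, so this is a reconstruction.

```sql
-- Index names and column order as described above.
ALTER TABLE messages
  ADD INDEX composite (published, user_id, created);

ALTER TABLE message_recipients
  ADD INDEX message_id_2 (message_id, user_id);

-- Engine change described above; note MyISAM has no transactions or foreign keys.
ALTER TABLE messages ENGINE = MyISAM;
```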
EXPLAIN now shows:
There are a lot of questions like this, but each query is unique, hence this question. I have the following query:
SELECT *
FROM tblretailerusers
WHERE (company_id = 169
       OR company_id IN (SELECT id FROM tblretailercompany WHERE multi_ret_id = '169'))
  AND (id IN (SELECT contact_person_1
              FROM tbllocations
              WHERE status = 1
                AND (retailer_comp_id = 169
                     OR retailer_comp_id IN (SELECT id FROM tblretailercompany WHERE multi_ret_id = '169')))
       OR id IN (SELECT contact_person_2
                 FROM tbllocations
                 WHERE status = 1
                   AND (retailer_comp_id = 169
                        OR retailer_comp_id IN (SELECT id FROM tblretailercompany WHERE multi_ret_id = '169'))))
  AND (last_login IS NOT NULL)
Three tables are involved: Retailer, their Locations and their Users (standard user information). Each retailer can have child retailers, so the Retailer table has a parent retailer ID. Currently each table has about 6K records. Every table has an auto-increment primary key, and as far as I know those are indexed as well. In the User table, the Email field is indexed.
This query takes less than 1 second, which is fine, but now the client wants to find users whose email addresses start with specific letters, such as A and B. As soon as I add that to the query, it starts taking about 50-60 seconds. I created an index on the Email field, which is not unique, and the new query looks like this:
SELECT *
FROM tblretailerusers
WHERE (company_id = 169
       OR company_id IN (SELECT id FROM tblretailercompany WHERE multi_ret_id = '169'))
  AND (id IN (SELECT contact_person_1
              FROM tbllocations
              WHERE status = 1
                AND (retailer_comp_id = 169
                     OR retailer_comp_id IN (SELECT id FROM tblretailercompany WHERE multi_ret_id = '169')))
       OR id IN (SELECT contact_person_2
                 FROM tbllocations
                 WHERE status = 1
                   AND (retailer_comp_id = 169
                        OR retailer_comp_id IN (SELECT id FROM tblretailercompany WHERE multi_ret_id = '169'))))
  AND (last_login IS NOT NULL)
  AND email REGEXP '^A|^B'
I ran EXPLAIN on both versions of the query. The interesting thing I noticed is that the row for the primary table shows no value for possible_keys, even though we search the User table by its primary key id and I also have an index on the Email field. Here is the EXPLAIN output for the query:
I tried recreating the index; currently, besides the primary key, the User table has one index that covers ID, CompanyID and Email. I also created indexes on the Company table, but nothing speeds it up. The User table is MyISAM.
My other question is: how can I avoid repeating the subquery for the child company search? As you can see, it is used three times in the query above.
EDIT: The reason I am using REGEXP in my query is that when I tried LIKE 'A%', it was even slower.
Edit: I just tested with last_login is null instead of last_login is not null, and the results take less than 5 seconds. Aren't NULL and NOT NULL checks similar?
Instead of using the RegEx:
and email REGEXP '^A|^B'
...try a simple LIKE:
and (email like 'A%' or email like 'B%')
Regular expressions are a bit heavy for this rather small comparison. LIKE will probably be far faster. Also, I wouldn't expect the optimiser to try to decode what the regexp is trying to do. With LIKE, however, it knows, and it will probably use the index you've set up.
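Whether the index is really used can be checked with EXPLAIN on a cut-down version of the query. This is only a sketch: idx_email is a placeholder name for the existing index on email, and the reduced SELECT is not the full query from the question. If the key column of the output names the email index and type is range, the prefix match is being served from the index.

```sql
-- idx_email is a placeholder name; skip this if an index on email already exists.
-- CREATE INDEX idx_email ON tblretailerusers (email);

EXPLAIN
SELECT id, email
FROM tblretailerusers
WHERE last_login IS NOT NULL
  AND (email LIKE 'A%' OR email LIKE 'B%');
```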
I have a table mvuser with attributes user_id (int) (PRIMARY_KEY), username (varchar) (KEY), email (varchar), ..., a table job with attributes job_id (int) (PRIMARY_KEY), job_name (varchar) (KEY), and a table user_job.
Is it better for table user_job to have attributes user_id (int), job_id (int), or to have attributes username (varchar), job_name (varchar) and why? Which one is faster?
Table user_job will always be queried by username or by job_name, for example:
SELECT job_name from user_job WHERE user_job.username='Username'
or
SELECT job.job_name FROM user_job JOIN mvuser ON mvuser.user_id = user_job.user_id
JOIN job ON job.job_id = user_job.job_id
WHERE mvuser.username='Username'
and similar when you query users by job_name.
I know that the solution with user_id (int), job_id (int) is better if username and job_name can change, but in this case they can't; they are permanent.
You should prefer joining on user_id and job_id, as these are your primary keys and are indexed by default (always try to join on indexed columns). Integer comparison is also faster than string comparison.
If your usernames are unique, you could also optimize your WHERE clause by using the user_id instead of the username (as the user_id is indexed and the username is not). Otherwise, it could speed up your query if you create an index on username.
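A sketch of the integer-key variant, assuming the mvuser and job tables from the question. The index name idx_job_user and the aliases are placeholders. The composite primary key serves lookups that start from a user, and the second index serves lookups that start from a job.

```sql
-- Sketch only; index name and aliases are placeholders.
CREATE TABLE user_job (
    user_id INT NOT NULL,
    job_id  INT NOT NULL,
    PRIMARY KEY (user_id, job_id),        -- serves lookups by user
    INDEX idx_job_user (job_id, user_id)  -- serves lookups by job
);

-- Jobs for a given username, joining only on integer keys.
SELECT j.job_name
FROM mvuser u
JOIN user_job uj ON uj.user_id = u.user_id
JOIN job j       ON j.job_id   = uj.job_id
WHERE u.username = 'Username';
```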
But the general practice is:
If you don't have a performance problem, don't over-optimize; keep it readable.
If you have a performance problem, measure it before and after the optimization. If you can't measure an improvement, a theoretical speedup will not help with your real problem.
I'm having a little trouble with this query. I have two tables:
Account -
ResourceID (int)
AccountID (int) (unique auto-inc)
Resource -
TextName (varchar)
ResourceID (int) (unique auto-inc)
CompanyID (int)
All I have is the AccountID, and I need to make one query that will tell me the TextName and ResourceID of all records in the Resource table that have the same CompanyID as the resource linked (by ResourceID) to the account with the AccountID that I provide.
Here is what I have so far. It already narrows the result down to only one entry, and I have not even begun to incorporate the CompanyID yet.
SELECT r.ResId, r.FirstName, r.LastName
FROM account a, resource r
WHERE a.AccId='7' AND a.ResId = r.ResId
Any help is much appreciated. Thanks
You need a self join to get the other resources that share the same company:
SELECT rSameCompany.ResId, rSameCompany.FirstName, rSameCompany.LastName
FROM resource r
INNER JOIN resource rSameCompany
ON r.CompanyID = rSameCompany.CompanyID
INNER JOIN account a
ON r.ResourceID = a.ResourceID
AND a.AccId='7'
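The queries in this thread use shortened column names (ResId, AccId, FirstName, LastName) that differ from the table listing in the question. Assuming the listing shows the real names (AccountID, ResourceID, TextName, CompanyID), the same self join could be sketched as:

```sql
-- Assumes the column names from the table listing in the question.
SELECT rSameCompany.ResourceID, rSameCompany.TextName
FROM account a
INNER JOIN resource r
        ON r.ResourceID = a.ResourceID             -- the account's own resource
INNER JOIN resource rSameCompany
        ON rSameCompany.CompanyID = r.CompanyID    -- every resource in that company
WHERE a.AccountID = 7;
```

The result includes the account's own resource as well, since it shares a CompanyID with itself.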
You want to LEFT JOIN on Account.ResourceID = Resource.ResourceID.