How to improve my friend list MySQL query? - mysql

I have a big MySQL query that returns the correct result set, but it is quite slow.
SELECT DISTINCT
username,
user.id as uid,
email,
ui.gender,
ui.country,
ui.birthday,
IF( last_activity_date >= now() - INTERVAL 1 HOUR, 1, 0) as is_online,
p.thumb_url,
friend_request.id as sr_id,
IF( ul.id IS NOT NULL, 1, 0) as st_love,
DATE_FORMAT( NOW(), '%Y') - DATE_FORMAT( birthday, '%Y') - (DATE_FORMAT( NOW(), '00-%m-%d') < DATE_FORMAT( birthday, '00-%m-%d')) AS age
FROM friend_request
JOIN user ON (user.`id` = friend_request.`to_user_id`
OR
user.`id` = friend_request.`from_user_id`)
AND user.`id` != '$user_id'
AND friend_request.`status` = '1'
JOIN user_info ui ON user.`id` = ui.`user_id`
JOIN photo p ON ui.`main_photo` = p.`id`
LEFT JOIN user_love ul ON ul.`to_user_id` = user.`id`
AND ul.`from_user_id` = $user_id
WHERE (friend_request.`to_user_id` = '$user_id'
OR friend_request.`from_user_id` = '$user_id')
ORDER BY friend_request.id DESC
LIMIT 30
"$user_id" is the id of the logged-in user.
Here is the table structure of "friend_request" :
CREATE TABLE `friend_request` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`from_user_id` int(11) DEFAULT NULL,
`to_user_id` int(11) DEFAULT NULL,
`date` datetime DEFAULT NULL,
`seen` int(11) DEFAULT '0',
`status` int(11) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `from_user_id` (`from_user_id`),
KEY `to_user_id` (`to_user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
Can you help me to improve this query?
I have not copied the table structure of the other tables because, after some tests, the "optimization issue" seems to come from the friend_request table.
Thanks!
EDIT :
Here is what "EXPLAIN" gives me :

You should look at the query plan using the EXPLAIN keyword (https://dev.mysql.com/doc/refman/5.5/en/using-explain.html) so that you can find which parts of your query take the longest and optimize them.
Things that jump out at me:
1) You may need to reduce your joins, or optimize them
2) You might need some indexes
3) You have an OR in your WHERE clause, which may affect the cached query plan. I have seen an OR cause issues with query caching in T-SQL; I'm not sure whether that affects MySQL. https://dev.mysql.com/doc/refman/5.1/en/query-cache.html
edit: formatting; found out the photo table join was necessary for the data that was being selected

I'd write the query like this. (I'd also qualify the references to all columns in the query: username, email, last_activity_date.) I'd also do this as a prepared statement with bind placeholders, rather than including the $user_id variable in the SQL text. We're assuming that the contents of $user_id are known to be "safe", or have been properly escaped, to avoid SQL injection.
SELECT username
, user.id AS uid
, email
, ui.gender
, ui.country
, ui.birthday
, IFNULL(( last_activity_date >= NOW() - INTERVAL 1 HOUR),0) AS is_online
, p.thumb_url
, sr.id AS sr_id
, ul.id IS NOT NULL AS st_love
, TIMESTAMPDIFF(YEAR,ui.birthday,DATE(NOW())) AS age
FROM (
( SELECT frf.id
, frf.to_user_id AS user_id
FROM friend_request frf
WHERE frf.to_user_id <> '$user_id'
AND frf.from_user_id = '$user_id'
AND frf.status = '1'
ORDER BY frf.id DESC
LIMIT 30
)
UNION ALL
( SELECT frt.id
, frt.from_user_id AS user_id
FROM friend_request frt
WHERE frt.from_user_id <> '$user_id'
AND frt.to_user_id = '$user_id'
AND frt.status = '1'
ORDER BY frt.id DESC
LIMIT 30
)
ORDER BY id DESC
LIMIT 30
) sr
JOIN user ON user.id = sr.user_id
JOIN user_info ui ON ui.user_id = user.id
JOIN photo p ON p.id = ui.main_photo
LEFT
JOIN user_love ul
ON ul.to_user_id = user.id
AND ul.from_user_id = '$user_id'
ORDER BY sr.id DESC
LIMIT 30
NOTES:
The inline view sr is intended to get at most 30 rows from friend_request, ordered by id descending (like the original query.)
It looks like the original query is intending to find rows where the specified $user_id is in either the from_ or to_ column.
Older versions of MySQL can generate some pretty obnoxious execution plans for queries involving OR predicates in JOIN operations. The usual workaround is a UNION ALL or UNION of two separate SELECT statements, so that each SELECT can be optimized to use an appropriate index.
Once we have that resultset from sr, the rest of the query is pretty straightforward.
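As a rough check of the OR-to-UNION ALL rewrite described in the notes, here is a minimal sketch that runs both forms against a tiny, made-up friend_request table and compares the results. It uses Python's built-in sqlite3 as a stand-in for MySQL, which is an assumption of this sketch: SQLite does not accept parenthesized SELECTs around UNION ALL, so each branch is wrapped in a derived table instead. The sample rows and the CASE expression in the OR form are illustrative only.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE friend_request ("
    " id INTEGER PRIMARY KEY, from_user_id INT, to_user_id INT, status INT)"
)
conn.executemany(
    "INSERT INTO friend_request VALUES (?, ?, ?, ?)",
    [
        (1, 7, 2, 1),  # user 7 sent the request, accepted
        (2, 3, 7, 1),  # user 7 received the request, accepted
        (3, 7, 4, 0),  # not accepted, must be ignored
        (4, 5, 6, 1),  # unrelated pair
        (5, 8, 7, 1),  # user 7 received the request, accepted
    ],
)

# Single SELECT with an OR predicate (hard to serve from one index).
or_form = """
SELECT id,
       CASE WHEN from_user_id = :uid THEN to_user_id ELSE from_user_id END AS user_id
FROM friend_request
WHERE (from_user_id = :uid OR to_user_id = :uid) AND status = 1
ORDER BY id DESC
LIMIT 30
"""

# Two SELECTs, each filtering on one column, combined with UNION ALL.
# Each branch keeps its own ORDER BY ... LIMIT 30, then the outer query
# re-sorts and trims to 30 rows again.
union_form = """
SELECT id, user_id FROM (
    SELECT id, to_user_id AS user_id FROM friend_request
    WHERE from_user_id = :uid AND to_user_id <> :uid AND status = 1
    ORDER BY id DESC LIMIT 30
)
UNION ALL
SELECT id, user_id FROM (
    SELECT id, from_user_id AS user_id FROM friend_request
    WHERE to_user_id = :uid AND from_user_id <> :uid AND status = 1
    ORDER BY id DESC LIMIT 30
)
ORDER BY id DESC
LIMIT 30
"""

params = {"uid": 7}
or_rows = conn.execute(or_form, params).fetchall()
union_rows = conn.execute(union_form, params).fetchall()
print(or_rows == union_rows)
```

Because each branch filters on a single column, an index on from_user_id can serve one branch and an index on to_user_id the other.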

Related

Trying to join 2 tables based on the most recent timestamp *before* a specific date

Here is my SQL statement, which seemed to function perfectly well before we created a new database. This approach seems to work just fine on another, similarly structured, pair of tables.
SELECT *
FROM tele2_details AS d
INNER JOIN
tele2_usage AS u
ON
d.iccid = u.iccid
AND
u.timestamp = (
SELECT MAX(u.timestamp)
FROM tele2_usage
WHERE u.timestamp <= DATE_FORMAT('2022-01-08 09:30:00','%Y-%m-%d %H:%i:%s')
)
WHERE
accountCustom1='Horizon'
What I'm attempting to do here is join the details table with the usage table. The usage rows just contain the iccid of a sim card, a timestamp, and the current usage in bytes. The query should find the most recent usage record before the specified date (2022-01-08 09:30:00). This should give me a set of sims, each joined with its most recent usage record before the specified time, however I usually get zero results on this particular combo of tables.
Specifically though, it does match any records where the date is the exact same as specified in the query, but not dates that are before or equal to the specified date. Can anybody help me with where I'm going wrong? This query worked fine in a previous database; we're rebuilding our systems and this has now appeared as an issue.
Thanks in advance for any help.
Edit
Here's some more information, that I hope will make the question make more sense. So here is an outline of the details table, I've removed some of the columns but this is at least illustrative of table.
CREATE TABLE `tele2_details` (
`iccid` VARCHAR(255) NOT NULL,
`msisdn` VARCHAR(255) NOT NULL,
`status` VARCHAR(255) NOT NULL,
`ratePlan` VARCHAR(255) NOT NULL,
`communicationPlan` VARCHAR(255) NOT NULL,
PRIMARY KEY (`iccid`)
)
COLLATE='utf8mb4_0900_ai_ci'
ENGINE=InnoDB
;
Then we also have a usage table, which stores samples of the data usage of sim cards, along with timestamps...
CREATE TABLE `tele2_usage` (
`id` INT NOT NULL AUTO_INCREMENT,
`iccid` VARCHAR(255) NOT NULL COLLATE 'latin1_swedish_ci',
`msisdn` VARCHAR(255) NOT NULL COLLATE 'latin1_swedish_ci',
`timestamp` DATETIME NOT NULL,
`ctd_data_usage` BIGINT NOT NULL,
`ctd_sms_usage` BIGINT NOT NULL,
`ctd_voice_usage` BIGINT NOT NULL,
`session_count` INT NOT NULL,
PRIMARY KEY (`id`)
)
COLLATE='utf8mb4_hr_0900_ai_ci'
ENGINE=InnoDB
AUTO_INCREMENT=10116319
;
The query I'm trying to create should return a set of sim details, joined with a usage record which is closest to, but not after, a particular time.
So if you look at the original query at the top, I'm trying to join the details onto the usage record which is closest to, but not after, 2022-01-08 09:30:00.
I hope that makes sense.
Let's have a look at a particular sim
SELECT iccid, msisdn, status, ratePlan, communicationPlan FROM tele2_details WHERE iccid='xxxx203605100034xxxx'
results in 1 match
"xxxx203605100034xxxx" "xxxx9120012xxxx" "ACTIVATED" "Pay as use - Existing Business" "Data LTE"
And if I look in the usage table for that same sim I can see many records that should satisfy my conditions
SELECT id, iccid, TIMESTAMP, ctd_data_usage FROM tele2_usage WHERE iccid='xxxx203605100034xxxx' AND TIMESTAMP <= '2022-01-08 09:30:00'
results in
"10096279" "xxxx203605100034xxxx" "2022-01-08 09:01:00" "77517560"
"10092271" "xxxx203605100034xxxx" "2022-01-08 08:01:03" "77002733"
"10088263" "xxxx203605100034xxxx" "2022-01-08 07:01:11" "76270445"
"10084255" "xxxx203605100034xxxx" "2022-01-08 06:01:05" "76270445"
of which I would like to select the first record (with the 09:01 timestamp) for joining on to the details record. I can get that timestamp with the following query
SELECT MAX(timestamp)
FROM tele2_usage
WHERE TIMESTAMP <= DATE_FORMAT('2022-01-08 09:30:00','%Y-%m-%d %H:%i:%s') AND iccid='xxxx203605100034xxxx'
which results in '2022-01-08 09:01:00', which is exactly what I want. So now I put it all together...
SELECT *
FROM tele2_details AS d
INNER JOIN
tele2_usage AS u
ON
d.iccid = u.iccid
AND
u.timestamp = (
SELECT MAX(timestamp)
FROM tele2_usage
WHERE timestamp <= DATE_FORMAT('2022-01-08 09:30:00','%Y-%m-%d %H:%i:%s')
)
WHERE
accountCustom1='Horizon' AND iccid='xxxx203605100034xxxx'
And I get nothing! I would expect to get back the details record joined with that particular sim, but actually I get zero results and I don't understand why.
Ideally I would remove the final AND for the iccid, and I would expect to receive a set of all the sims for that client with the usage closest to, but not after, the specified date.
So with that explanation, does anyone know why I'm not getting any records? I have a similar table for another sim provider that is structured exactly the same, with a details table and a usage table, and this query works just fine on that table. I simply can't understand why this doesn't work.
Edit 2
@Serg suggested aliasing the table in the subquery (I think that's what it's called), which resulted in the following code...
SELECT *
FROM tele2_details AS d
INNER JOIN
tele2_usage AS u
ON
d.iccid = u.iccid
AND
u.timestamp = (
SELECT MAX(u2.timestamp)
FROM tele2_usage AS u2
WHERE u2.timestamp <= DATE_FORMAT('2022-01-08 09:30:00','%Y-%m-%d %H:%i:%s')
)
WHERE
accountCustom1='Horizon'
Unfortunately this still resulted in zero results.
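You should correlate the subquery. As written, it returns the latest timestamp across all sims, so only a sim that has a usage row at exactly that timestamp can match: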
SELECT *
FROM tele2_details AS d INNER JOIN tele2_usage AS u
ON d.iccid = u.iccid
AND u.timestamp = (
SELECT MAX(u2.timestamp)
FROM tele2_usage u2
WHERE d.iccid = u2.iccid AND u2.timestamp <= DATE_FORMAT('2022-01-08 09:30:00','%Y-%m-%d %H:%i:%s')
)
WHERE d.accountCustom1='Horizon';
Or, with a join to an aggregation query:
SELECT *
FROM tele2_details AS d
INNER JOIN tele2_usage AS u ON d.iccid = u.iccid
INNER JOIN (
SELECT iccid, MAX(timestamp) timestamp
FROM tele2_usage
WHERE timestamp <= DATE_FORMAT('2022-01-08 09:30:00','%Y-%m-%d %H:%i:%s')
GROUP BY iccid
) m ON m.iccid = u.iccid AND m.timestamp = u.timestamp
WHERE d.accountCustom1='Horizon';
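To illustrate why correlating the subquery matters, the sketch below runs the uncorrelated and the correlated forms against a tiny made-up data set. It uses Python's built-in sqlite3 as a stand-in for MySQL (an assumption), plain string comparison on ISO-formatted timestamps, and trimmed-down tables. The sim ids and usage figures are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tele2_details (iccid TEXT PRIMARY KEY, accountCustom1 TEXT);
CREATE TABLE tele2_usage (
    id INTEGER PRIMARY KEY, iccid TEXT, timestamp TEXT, ctd_data_usage INT);
INSERT INTO tele2_details VALUES ('sim-a', 'Horizon'), ('sim-b', 'Horizon');
INSERT INTO tele2_usage (iccid, timestamp, ctd_data_usage) VALUES
    ('sim-a', '2022-01-08 08:01:03', 100),
    ('sim-a', '2022-01-08 09:01:00', 150),
    ('sim-a', '2022-01-08 10:01:00', 200),  -- after the cutoff
    ('sim-b', '2022-01-08 09:15:00', 300);
""")

# {extra} is filled with either nothing or the correlation predicate.
template = """
SELECT d.iccid, u.timestamp, u.ctd_data_usage
FROM tele2_details AS d
JOIN tele2_usage AS u
  ON d.iccid = u.iccid
 AND u.timestamp = (
       SELECT MAX(u2.timestamp)
       FROM tele2_usage AS u2
       WHERE {extra} u2.timestamp <= :cutoff
     )
WHERE d.accountCustom1 = 'Horizon'
ORDER BY d.iccid
"""
params = {"cutoff": "2022-01-08 09:30:00"}

# Uncorrelated: one global MAX(timestamp) shared by every sim.
uncorrelated = conn.execute(template.format(extra=""), params).fetchall()

# Correlated: MAX(timestamp) is computed per sim.
correlated = conn.execute(
    template.format(extra="u2.iccid = d.iccid AND"), params
).fetchall()

print(uncorrelated)
print(correlated)
```

With this data the uncorrelated form matches only the sim that happens to own the globally latest row before the cutoff, while the correlated form returns one row per sim.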

JOIN query taking long time and creating issue "converting HEAP to MyISAM"

My query is below. I use a join to fetch the data. Can you please suggest how I can solve the "converting HEAP to MyISAM" issue?
Can I use a subquery here instead? If so, please suggest how.
I joined the users table to check whether the user exists. Can I refine the query to work without the join, so that the "converting HEAP to MyISAM" step goes away?
One more thing: sometimes I will not filter on a specific user_id, as I do here with user_id = 16082.
SELECT `user_point_logs`.`id`,
`user_point_logs`.`user_id`,
`user_point_logs`.`point_get_id`,
`user_point_logs`.`point`,
`user_point_logs`.`expire_date`,
`user_point_logs`.`point` AS `sum_point`,
IF(sum(`user_point_used_logs`.`point`) IS NULL, 0, sum(`user_point_used_logs`.`point`)) AS `minus`
FROM `user_point_logs`
JOIN `users` ON ( `users`.`id` = `user_point_logs`.`user_id` )
LEFT JOIN (SELECT *
FROM user_point_used_logs
WHERE user_point_log_id NOT IN (
SELECT DISTINCT return_id
FROM user_point_logs
WHERE return_id IS NOT NULL
AND user_id = 16082
)
)
AS user_point_used_logs
ON ( `user_point_logs`.`id` = `user_point_used_logs`.`user_point_log_used_id` )
WHERE expire_date >= 1563980400
AND `user_point_logs`.`point` >= 0
AND `users`.`id` IS NOT NULL
AND ( `user_point_logs`.`return_id` = 0
OR `user_point_logs`.`return_id` IS NULL )
AND `user_point_logs`.`user_id` = '16082'
GROUP BY `user_point_logs`.`id`
ORDER BY `user_point_logs`.`expire_date` ASC
DB FIDDLE HERE WITH STRUCTURE
Kindly try this. If it works, we can optimize further by adding a composite index.
SELECT
upl.id,
upl.user_id,
upl.point_get_id,
upl.point,
upl.expire_date,
upl.point AS sum_point,
COALESCE(SUM(upul.point), 0) AS minus -- sum the used points; replaces the IF(... IS NULL ...) form
FROM user_point_logs upl
JOIN users u ON upl.user_id = u.id
LEFT JOIN (SELECT supul.user_point_log_used_id, supul.point
FROM user_point_used_logs supul
LEFT JOIN user_point_logs supl
ON supul.user_point_log_id = supl.return_id AND supl.user_id = 16082
WHERE supl.id IS NULL -- anti-join: drop used-log rows that match a return row
) AS upul
ON upl.id = upul.user_point_log_used_id
WHERE
upl.user_id = 16082 and coalesce(upl.return_id,0)= 0
and upl.expire_date >= 1563980400 -- tip: if its unix timestamp change the datatype and if possible use range between dates
#AND upl.point >= 0 -- since its NN by default removing this condition
#AND u.id IS NOT NULL -- removed since the inner join matches not null
GROUP BY upl.id
ORDER BY upl.expire_date ASC;
Edit:
Try adding an index on the return_id column of the user_point_logs table, since this column is used in the join inside the derived table.
Or use a composite index on user_id and return_id.
Indexes:
user_point_logs: (user_id, expire_date)
user_point_logs: (user_id, return_id)
OR is hard to optimize. Decide on only one way to say whatever is being said here, then get rid of the OR:
AND ( `user_point_logs`.`return_id` = 0
OR `user_point_logs`.`return_id` IS NULL )
DISTINCT is redundant:
NOT IN ( SELECT DISTINCT ... )
Change
IF(sum(`user_point_used_logs`.`point`) IS NULL, 0,
sum(`user_point_used_logs`.`point`)) AS `minus`
to
COALESCE( ( SELECT SUM(point) FROM user_point_used_logs ... ), 0) AS minus
and toss LEFT JOIN (SELECT * FROM user_point_used_logs ... )
Since a PRIMARY KEY is a key, the second of these is redundant and can be DROPped:
ADD PRIMARY KEY (`id`),
ADD KEY `id` (`id`) USING BTREE;
After all that, we may need another pass to further simplify and optimize it.
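The suggestion to replace the LEFT JOIN plus SUM with a COALESCE around a scalar subquery can be sketched as follows. This is only an illustration under stated assumptions: Python's built-in sqlite3 stands in for MySQL, the tables are reduced to a few columns, the sample rows are made up, and the correlation on user_point_log_used_id = upl.id is borrowed from the join condition in the question rather than taken from the elided part of the answer. The NOT IN filter on return_id is left out to keep the sketch short.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_point_logs (
    id INTEGER PRIMARY KEY, user_id INT, point INT, expire_date INT);
CREATE TABLE user_point_used_logs (
    id INTEGER PRIMARY KEY, user_point_log_used_id INT, point INT);
INSERT INTO user_point_logs VALUES
    (1, 16082, 100, 1563980500),
    (2, 16082,  50, 1563980600),
    (3,   999,  70, 1563980700);   -- another user, filtered out
INSERT INTO user_point_used_logs VALUES
    (1, 1, 30),
    (2, 1, 20);                    -- log 2 has no usage rows at all
""")

# The scalar subquery yields one SUM per outer row, so the outer query
# needs neither the LEFT JOIN nor the GROUP BY.
query = """
SELECT upl.id,
       upl.point,
       COALESCE(
           (SELECT SUM(used.point)
            FROM user_point_used_logs AS used
            WHERE used.user_point_log_used_id = upl.id),
           0) AS minus
FROM user_point_logs AS upl
WHERE upl.user_id = :uid
ORDER BY upl.expire_date ASC
"""
rows = conn.execute(query, {"uid": 16082}).fetchall()
print(rows)
```

A log with no usage rows gets SUM(...) = NULL from the subquery, which COALESCE turns into 0.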

Optimize a mysql query with left join, group by and order by

I have this query not written by me that I have to optimize:
SELECT DISTINCT r.itemid
, r.catid
, i.title
, i.owner
, i.image
, i.background
, i.icon
FROM jos_sobi2_cat_items_relations r
LEFT
JOIN jos_sobi2_item i
ON i.itemid = r.itemid
WHERE
( i.published = 1
AND r.catid > 1
AND ( i.publish_down > '2016-10-26 13:08:02'
OR i.publish_down = '0000-00-00 00:00:00'
)
AND i.itemid IN ( SELECT itemid
FROM jos_sobi2_item
WHERE ( published = 1
AND ( publish_down > '2016-10-26 13:08:02'
OR publish_down = '0000-00-00 00:00:00'
)
)
)
)
GROUP
BY i.itemid
ORDER
BY i.publish_up DESC
LIMIT 0,14
This is the explain mysql command:
The "items" table has only the primary key on the itemid field.
The "relation" table has these 3 indexes:
- catid,itemid PRIMARY BTREE
- itemid BTREE
- catid BTREE
I saw that if I remove the DISTINCT or the GROUP BY clauses the query is fast, otherwise it takes more than 1 minute to be executed.
The first thought I had was to remove the DISTINCT clause, since the GROUP BY clause already does the job. But I am not sure.
Any help on how to optimize it?
Thanks.
First, items.itemid IN (...) is redundant: you already have these conditions in the query. You do not need a LEFT JOIN, because the WHERE clause filters on columns of the items row, so it cannot be missing. You do not need DISTINCT or GROUP BY either: [relation.itemid, relation.catid] is the primary key, so it cannot contain duplicates. So the result is:
SELECT relation.itemid, relation.catid, title, owner, image, background, icon FROM
`jos_sobi2_cat_items_relations` AS relation
JOIN `jos_sobi2_item` AS items ON relation.itemid = items.itemid WHERE
`published` = '1' AND
relation.catid > 1 AND
(`publish_down` > '2016-10-26 13:08:02' OR `publish_down` = '0000-00-00 00:00:00' )
ORDER BY items.publish_up DESC, relation.itemid, relation.catid LIMIT 0, 14
You can compare the result to the original query. I added order on relation.itemid and relation.catid so that the result is deterministic. You can add index on items.publish_up to speed up the query if needed.

Optimize table to avoid using temporary and using filesort

I have a messages table
CREATE TABLE `messages` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`author` int(11) DEFAULT NULL,
`time` int(10) unsigned DEFAULT NULL,
`text` text CHARACTER SET latin1,
`dest` int(11) unsigned DEFAULT NULL,
`type` tinyint(4) unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `author` (`author`),
KEY `dest` (`dest`)
) ENGINE=InnoDB AUTO_INCREMENT=2758 DEFAULT CHARSET=utf8;
I need to get messages between two users
SELECT
...
FROM
`messages` m
LEFT JOIN `people` p ON m.author = p.id
WHERE
(author = 1 AND dest = 2)
OR (author = 2 AND dest = 1)
ORDER BY
m.id DESC
LIMIT 0, 25
When I EXPLAIN this query I get
Please excuse any ignorance, but is there a way I could optimize this table to avoid the temporary table and filesort for this query? For now it is not causing a problem, but I'm pretty sure it is going to be troublesome in the future.
First, I'm guessing the left join is not necessary. Second, consider using union all instead. Then one approach is:
(SELECT ...
FROM messages m JOIN
people p
ON m.author = p.id
WHERE author = 1 AND dest = 2
ORDER BY id DESC
LIMIT 25
)
UNION ALL
(SELECT ...
FROM messages m JOIN
people p
ON m.author = p.id
WHERE author = 2 AND dest = 1
ORDER BY id DESC
LIMIT 25
)
ORDER BY id DESC
LIMIT 0, 25
With this query, an index on messages(author, dest, id) should make it fast. (Note: you might need to include m.id in the SELECT list.)
To build on Gordon's answer:
SELECT m2..., p...
FROM
(
( SELECT id
FROM messages
WHERE author = 1
AND dest = 2
ORDER BY id DESC
LIMIT 75
)
UNION ALL
(
SELECT id
FROM messages
WHERE author = 2
AND dest = 1
ORDER BY id DESC
LIMIT 75
)
ORDER BY id DESC
LIMIT 50, 25 ) AS m1
JOIN messages AS m2 ON m2.id = m1.id
JOIN people p ON p.id = m2.author
ORDER BY m1.id DESC
Notes:
Gordon's index is now "covering". (This adds efficiency, thereby masking some of the other stuff I added.)
Lazy evaluation means that it does not need to shovel all the bulky fields of more than 25 rows around. Instead, only 25 need to be handled. Also, I avoid touching people to start with.
The code shows what "page 3" should look like. Note LIMIT 75 versus LIMIT 50,25.
"Pagination via OFFSET" has several problems. See my blog.
This formulation still will not avoid "filesort" and "using temp". But speed is the real goal, correct? ("Filesort" is a misnomer -- if you don't include that TEXT column, the sort will be done in RAM.)
When you add INDEX(author, dest, id), INDEX(author) becomes redundant; drop it.
The ALL after UNION is not the default for UNION, but it avoids an extra pass (and temp table) to de-duplicate the data.
There will still be 2 or 3 temp tables involved. See EXPLAIN FORMAT=JSON SELECT ... for details.
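The "find the ids first, then join for the bulky columns" idea from the notes can be sketched like this. It is a minimal illustration, not a benchmark: Python's built-in sqlite3 stands in for MySQL (an assumption), the data is generated, the people names are made up, and each UNION ALL branch is wrapped in a derived table because SQLite does not accept parenthesized SELECTs there. The sketch fetches page 3 (rows 51 to 75) and checks it against the plain OR query.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY, author INT, dest INT, text TEXT);
CREATE INDEX idx_author_dest_id ON messages (author, dest, id);
INSERT INTO people VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
""")

# 300 generated messages: 1->2, 2->1, and 3->1 as unrelated noise.
pairs = [(1, 2), (2, 1), (3, 1)]
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [(i, *pairs[i % 3], f"message {i}") for i in range(1, 301)],
)

# Step 1 (derived table m1): collect only ids, at most 75 per direction,
# then keep rows 51..75 of the merged list.
# Step 2: join back to messages and people for just those 25 ids.
deferred = """
SELECT m2.id, p.name, m2.text
FROM (
    SELECT id FROM (
        SELECT id FROM messages WHERE author = 1 AND dest = 2
        ORDER BY id DESC LIMIT 75
    )
    UNION ALL
    SELECT id FROM (
        SELECT id FROM messages WHERE author = 2 AND dest = 1
        ORDER BY id DESC LIMIT 75
    )
    ORDER BY id DESC
    LIMIT 25 OFFSET 50
) AS m1
JOIN messages AS m2 ON m2.id = m1.id
JOIN people AS p ON p.id = m2.author
ORDER BY m1.id DESC
"""

# Straightforward OR form, used here only as a reference result.
baseline = """
SELECT m.id, p.name, m.text
FROM messages AS m
JOIN people AS p ON p.id = m.author
WHERE (m.author = 1 AND m.dest = 2) OR (m.author = 2 AND m.dest = 1)
ORDER BY m.id DESC
LIMIT 25 OFFSET 50
"""

page_rows = conn.execute(deferred).fetchall()
baseline_rows = conn.execute(baseline).fetchall()
print(page_rows == baseline_rows, len(page_rows))
```

The per-branch limit (75) has to equal offset plus page size, otherwise rows that belong on the requested page could be cut off before the merge.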

How to return an ID for the row that has MIN/MAX value within a group?

SELECT
MAX(`client_id`) `client_id`
FROM
`phrases`
WHERE
`language_id` = 1 AND
`client_id` = 1 OR
`client_id` IS NULL
GROUP BY
`language_phrase_id`
How do I get the id for the row that holds MAX(`client_id`) value?
I need this in the context of derived table, e.g.
SELECT
`p2`.`phrase`
FROM
(SELECT `language_phrase_id`, MAX(`client_id`) `client_id` FROM `phrases` WHERE `language_id` = 1 AND `client_id` = 1 OR `client_id` IS NULL GROUP BY `language_phrase_id`) `p1`
INNER JOIN
`phrases` `p2`
ON
`p2`.`language_id` = 1 AND
`p1`.`language_phrase_id` = `p2`.`language_phrase_id` AND
`p1`.`client_id` = `p2`.`client_id`;
Use window functions to find the max for each group and then a where clause to select the row with the maximum:
select p.*
from (SELECT p.*, max(client_id) over (partition by language_phrase_id) as maxci
from phrases p
WHERE (`language_id` = 1 AND `client_id`= 1) or
`client_id` IS NULL
) p
where client_id = maxci
I also added parentheses to clarify your where statement. When mixing and and or, I always use parentheses to avoid possible confusion and mistakes.
Now that you've added the mysql tag to your statement, this won't work. So, here is a MySQL-specific solution:
select language_phrase_id,
substring_index(group_concat(id order by client_id desc), ',', 1) as max_id
from phrases p
group by language_phrase_id
Note that id will be converted to a character string in this process. If it has a different type, you can convert it back.
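For readers on MySQL 8.0 or later, where window functions are available, the window-function approach from this answer can be tried out with the sketch below. It uses Python's built-in sqlite3 (window functions need SQLite 3.25 or newer) as a stand-in for MySQL, with made-up rows. One caveat it makes visible: MAX ignores NULLs, so a group that contains only client_id IS NULL rows gets maxci = NULL, and a plain = comparison drops it. The second query uses a null-safe comparison, written IS in SQLite and <=> in MySQL. Whether such groups should be kept is an assumption about the intended result.

```python
import sqlite3

# In-memory SQLite stands in for MySQL 8.0+ here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE phrases (
    id INTEGER PRIMARY KEY, language_id INT, language_phrase_id INT,
    client_id INT, phrase TEXT);
INSERT INTO phrases VALUES
    (1, 1, 10, NULL, 'Hello'),  -- default phrase
    (2, 1, 10, 1,    'Howdy'),  -- client 1 override of phrase 10
    (3, 1, 20, NULL, 'Bye'),    -- default only, no override
    (4, 1, 20, 2,    'Ciao');   -- another client, filtered out
""")

# {cmp} is the comparison operator between client_id and the group maximum.
template = """
SELECT id, language_phrase_id, client_id, phrase
FROM (
    SELECT p.*,
           MAX(client_id) OVER (PARTITION BY language_phrase_id) AS maxci
    FROM phrases AS p
    WHERE (language_id = 1 AND client_id = 1) OR client_id IS NULL
) AS ranked
WHERE client_id {cmp} maxci
ORDER BY id
"""

# Plain equality: groups whose maximum is NULL disappear.
strict_rows = conn.execute(template.format(cmp="=")).fetchall()

# Null-safe equality (IS in SQLite, <=> in MySQL): those groups are kept.
null_safe_rows = conn.execute(template.format(cmp="IS")).fetchall()

print(strict_rows)
print(null_safe_rows)
```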
I had a bit of trouble understanding the requirements, but this seems to be what you're looking for.
Not the most beautiful SQL and I'm sure it can be simplified, but a starting point;
SELECT p1.id, p1.phrase
FROM phrases p1
LEFT JOIN `phrases` p2
ON p1.language_id=p2.language_id
AND p1.language_phrase_id=p2.language_phrase_id
AND p1.client_id IS NULL and p2.client_id = 1
WHERE p2.id IS NULL AND p1.language_id=1
AND (p1.client_id=1 or p1.client_id IS NULL)
GROUP BY p1.language_phrase_id
An SQLfiddle for testing.