ORDER BY makes query slow - MySQL

I have two tables:
video (ID, TITLE, ..., UPLOADED_DATE)
join_video_category (ID (not used), ID_VIDEO, ID_CATEGORY)
Rows in video: 4 500 000
Rows in join_video_category: 5 800 000
One video can have many categories.
I have a query that works perfectly, taking 20 ms at most to return a result:
SELECT * FROM video WHERE ID IN
(SELECT ID_VIDEO FROM join_video_category WHERE ID_CATEGORY=11)
LIMIT 1000;
This query returns 1000 videos; the order is not important.
But when I want to get the 10 latest videos from a category, my query takes around 30-40 seconds:
SELECT * FROM video WHERE ID IN
(SELECT ID_VIDEO FROM join_video_category WHERE ID_CATEGORY=11)
ORDER BY UPLOADED_DATE DESC LIMIT 10;
I have indexes on ID_CATEGORY, ID_VIDEO and UPLOADED_DATE, and a primary key on ID in both video and join_video_category.
I have also tested the query with a JOIN; the result is the same.

First, you are comparing two very different queries. The first can return videos as soon as it encounters them. The second has to read all the videos in the category and then sort them.
Try rewriting this as a JOIN:
SELECT v.*
FROM video v JOIN
join_video_category vc
ON v.id = vc.id_video
WHERE vc.ID_CATEGORY = 11
ORDER BY v.UPLOADED_DATE DESC
LIMIT 10;
That may or may not help. You have a lot of data and so you might have a lot of videos for a given category. If so, a where clause that gets more recent data might really help:
SELECT v.*
FROM video v JOIN
join_video_category vc
ON v.id = vc.id_video
WHERE vc.ID_CATEGORY = 11 AND v.UPLOADED_DATE >= '2015-01-01'
ORDER BY v.UPLOADED_DATE DESC
LIMIT 10;
Finally, if that doesn't work, consider adding something like UPLOADED_DATE into join_video_category. Then, this query should blaze:
select vc.ID_VIDEO
from join_video_category vc
where vc.ID_CATEGORY = 11
order by vc.UPLOADED_DATE desc
limit 10;
with an index on join_video_category(id_category, uploaded_date, id_video).
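A minimal sketch of that denormalized layout, using Python's built-in sqlite3 as a stand-in for MySQL. The sample rows and the index name idx_cat_date are made up for illustration, and the planner in MySQL will not behave identically, so treat this as a shape check rather than a benchmark:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE join_video_category (
    ID_VIDEO      INTEGER NOT NULL,
    ID_CATEGORY   INTEGER NOT NULL,
    UPLOADED_DATE TEXT    NOT NULL  -- copied from video.UPLOADED_DATE
);
-- Equality column first, then the sort column, then the payload column.
CREATE INDEX idx_cat_date
    ON join_video_category (ID_CATEGORY, UPLOADED_DATE, ID_VIDEO);
""")

# Hypothetical sample data: (ID_VIDEO, ID_CATEGORY, UPLOADED_DATE)
rows = [
    (1, 11, "2015-01-05"),
    (2, 11, "2015-03-01"),
    (3, 12, "2015-04-01"),
    (4, 11, "2015-02-10"),
]
conn.executemany("INSERT INTO join_video_category VALUES (?, ?, ?)", rows)

# Latest two videos of category 11, answered from the junction table alone.
latest = [r[0] for r in conn.execute("""
    SELECT ID_VIDEO
    FROM join_video_category
    WHERE ID_CATEGORY = 11
    ORDER BY UPLOADED_DATE DESC
    LIMIT 2
""")]
print(latest)
```

The cost of this design is that UPLOADED_DATE now lives in two places and has to be kept in sync if it can ever change.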

solution #1:
Replacing IN with EXISTS may improve performance; please try the query below.
SELECT * FROM video WHERE exists
(SELECT * FROM join_video_category WHERE ID_CATEGORY=11 AND join_video_category.ID_VIDEO = video.ID)
ORDER BY UPLOADED_DATE DESC LIMIT 10;
solution #2:
1) Create a temp table:
CREATE TABLE temp_table AS SELECT * FROM join_video_category WHERE ID_CATEGORY=11;
2) Use the temp table in the query from solution #1:
SELECT * FROM video WHERE exists
(SELECT * FROM temp_table WHERE temp_table.ID_VIDEO = video.ID)
ORDER BY UPLOADED_DATE DESC LIMIT 10;
Good Luck!!

If it is 1:Many, don't use an extra table between Video and Category. However, your row counts imply that it is Many:Many.
If it is 1:Many, simply have the category_id in the Video table, then simplify all the queries.
If it is Many:Many, then be sure to use this pattern for the junction table:
CREATE TABLE map_video_category (
video_id ...,
category_id ...,
PRIMARY KEY(video_id, category_id), -- both ids, one direction
INDEX (category_id, video_id) -- both ids, the other direction
) ENGINE=InnoDB; -- significantly better than MyISAM on INDEX handling here
The ID that you mentioned is a waste. The composite keys are optimal for all situations, and will improve performance in most situations.
Do not use IN ( SELECT ... ); the optimizer does a poor job of optimizing it, especially in MySQL versions before 5.6. Change to a JOIN, LEFT JOIN, EXISTS, or some other construct.
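Here is a small sketch of the junction-table pattern combined with a JOIN instead of IN ( SELECT ... ). It runs on Python's built-in sqlite3 as a stand-in for MySQL; the column types, the sample rows and the index name idx_category_video are assumptions for illustration only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE video (
    id            INTEGER PRIMARY KEY,
    title         TEXT,
    uploaded_date TEXT
);
CREATE TABLE map_video_category (
    video_id    INTEGER NOT NULL,
    category_id INTEGER NOT NULL,
    PRIMARY KEY (video_id, category_id)   -- both ids, one direction
);
-- Both ids, the other direction: serves "all videos in category X".
CREATE INDEX idx_category_video
    ON map_video_category (category_id, video_id);
""")
conn.executemany("INSERT INTO video VALUES (?, ?, ?)", [
    (1, "a", "2015-01-01"),
    (2, "b", "2015-02-01"),
    (3, "c", "2015-03-01"),
])
conn.executemany("INSERT INTO map_video_category VALUES (?, ?)", [
    (1, 11), (2, 11), (2, 12), (3, 12),
])

# JOIN form of "latest videos in category 11".
titles = [r[0] for r in conn.execute("""
    SELECT v.title
    FROM map_video_category AS m
    JOIN video AS v ON v.id = m.video_id
    WHERE m.category_id = 11
    ORDER BY v.uploaded_date DESC
    LIMIT 10
""")]
print(titles)
```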

Related

Slow query with HAVING on a calculated field

I have a slow query. I want to display the 12 newest members near me (near the logged-in user), and my dev database has 150k rows.
It takes over 1 second, and EXPLAIN tells me that 30k rows are filtered.
So 30k rows are filtered out of 150k in my development DB, and my production server is much bigger than this.
Here my query :
SELECT profils.*,
Users.username,
( SELECT count(*)
from profilsphotos pp
where pp.iduser=Profils.iduser
) as nbpics,
ATAN2(SQRT(POW(COS(RADIANS(50.78961000)) * SIN(RADIANS(Y(gm_coor) - 4.64956000)),
2) + POW(COS(RADIANS(X(gm_coor))) * SIN(RADIANS(50.78961000)) - SIN(RADIANS(X(gm_coor))) * COS(RADIANS(50.78961000)) * COS(RADIANS(Y(gm_coor) - 4.64956000)),
2)), (SIN(RADIANS(X(gm_coor))) * SIN(RADIANS(50.78961000)) + COS(RADIANS(X(gm_coor))) * COS(RADIANS(50.78961000)) * COS(RADIANS(Y(gm_coor) - 4.64956000)))
) * 6372.795 AS distance
from Users
inner join Profils ON Users.id=Profils.iduser
where Profils.Actif=1
and profils.idsexe=2
and profils.idlookingfor=1
and Profils.iduser<>1
HAVING distance<400
order by Users.id desc, distance asc
limit 12
Note that I added an index on these four fields: actif, idsexe, idlookingfor and iduser.
What is wrong with my query?
Thanks a lot !
Pascal
I would extract the subquery from the SELECT clause to a temporary table, index it and join to it, instead of executing it for every record in the select clause (30K times).
So the steps are: create a temp table, index it, run the optimized query.
First, create the relevant indexes for the query:
ALTER TABLE
`Profils`
ADD
INDEX `profils_idx_actif_iduser` (`Actif`, `iduser`);
ALTER TABLE
`Users`
ADD
INDEX `users_idx_id_username` (`id`, `username`);
ALTER TABLE
`profils`
ADD
INDEX `profils_idx_idsexe_idlookingfor` (`idsexe`, `idlookingfor`);
ALTER TABLE
`profilsphotos`
ADD
INDEX `profilsphotos_idx_iduser` (`iduser`);
Now, create the temp table and index it:
-- Transformed subquery to a temp table to improve performance
CREATE TEMPORARY TABLE IF NOT EXISTS temp1 AS SELECT
count(*) AS nbpics,
iduser
FROM
profilsphotos pp
WHERE
1 = 1
GROUP BY
iduser
ORDER BY
NULL;
ALTER TABLE
`temp1`
ADD
INDEX `temp1_idx_iduser_nbpics` (`iduser`, `nbpics`);
Now try to run this query instead of the original one and see if it runs faster:
SELECT
optimizedSub1.*,
temp1.nbpics
FROM
(SELECT
Profils.*, -- includes iduser, which the outer LEFT JOIN needs
Users.username,
ATAN2(SQRT(POW(COS(RADIANS(50.78961000)) * SIN(RADIANS(Y(Profils.gm_coor) - 4.64956000)),
2) + POW(COS(RADIANS(X(Profils.gm_coor))) * SIN(RADIANS(50.78961000)) - SIN(RADIANS(X(Profils.gm_coor))) * COS(RADIANS(50.78961000)) * COS(RADIANS(Y(Profils.gm_coor) - 4.64956000)),
2)),
(SIN(RADIANS(X(Profils.gm_coor))) * SIN(RADIANS(50.78961000)) + COS(RADIANS(X(Profils.gm_coor))) * COS(RADIANS(50.78961000)) * COS(RADIANS(Y(Profils.gm_coor) - 4.64956000)))) * 6372.795 AS distance
FROM
Users
INNER JOIN
Profils
ON Users.id = Profils.iduser
WHERE
Profils.Actif = 1
AND profils.idsexe = 2
AND profils.idlookingfor = 1
AND Profils.iduser <> 1
HAVING
distance < 400
ORDER BY
Users.id DESC,
distance ASC LIMIT 12) AS optimizedSub1
LEFT JOIN
temp1
ON temp1.iduser = optimizedSub1.iduser
Profils needs
INDEX(Actif, idsexe, idlookingfor) -- in any order
Perhaps distance should be first?..
order by Users.id desc, distance asc
What is Y(gm_coor)? If it is a stored function, we need to know more. Which table has gm_coor? After that, maybe we can discuss a "bounding box" as a partial speedup.
Make another nesting of SELECTs and move the computation of nbpics to it. Currently, the COUNT(*) is being performed 30K times. After the change, it will be only 12 times.
Reformulation
SELECT p2.*,
u.username,
( SELECT COUNT(*)
FROM profilsphotos pp
where pp.iduser = p2.iduser
) as nbpics,
x.distance
FROM
( SELECT p1.id, -- assuming this the PK of Profils
(...) AS distance
FROM Profils AS p1
WHERE p1.Actif=1
and p1.idsexe=2
and p1.idlookingfor=1
and p1.iduser<>1
HAVING distance < 400
ORDER BY distance
LIMIT 12
) AS x
JOIN profils AS p2 USING(id)
JOIN Users AS u ON u.id = p2.iduser;
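To illustrate the reformulation, here is a rough sketch on Python's built-in sqlite3 (a stand-in for MySQL). The stored distance column is a hypothetical placeholder for the elided distance formula, the sample rows are invented, and the limit is 2 instead of 12 to keep the data small. Because distance is a real column here, the filter goes in WHERE rather than HAVING:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- "distance" is a made-up stored column standing in for the formula.
CREATE TABLE profils (id INTEGER PRIMARY KEY, iduser INTEGER, distance REAL);
CREATE TABLE profilsphotos (id INTEGER PRIMARY KEY, iduser INTEGER);
CREATE INDEX profilsphotos_idx_iduser ON profilsphotos (iduser);
""")
conn.executemany("INSERT INTO profils VALUES (?, ?, ?)", [
    (1, 10, 50.0), (2, 20, 500.0), (3, 30, 10.0), (4, 40, 120.0),
])
conn.executemany("INSERT INTO profilsphotos (iduser) VALUES (?)",
                 [(10,), (10,), (30,), (20,), (20,), (20,)])

# The derived table x picks the few winning rows first; the correlated
# COUNT(*) then runs once per surviving row instead of once per candidate.
result = conn.execute("""
    SELECT p2.iduser,
           x.distance,
           (SELECT COUNT(*)
              FROM profilsphotos pp
             WHERE pp.iduser = p2.iduser) AS nbpics
    FROM (SELECT p1.id, p1.distance
            FROM profils AS p1
           WHERE p1.distance < 400
           ORDER BY p1.distance
           LIMIT 2) AS x
    JOIN profils AS p2 ON p2.id = x.id
    ORDER BY x.distance
""").fetchall()
print(result)
```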

How to make faster queries on my MySQL table?

I have the following table
As you can see, it has 1868155 rows. I am attempting to make a real-time graph, but it is impossible since almost any query takes 1 or 2 seconds.
For example, this query
SELECT sensor.nombre, temperatura.temperatura
FROM sensor, temperatura
WHERE sensor.id = temperatura.idsensor
ORDER BY temperatura.fecha DESC, idsensor ASC
LIMIT 4
Is supposed to show this
I've tried everything: using indexes (perhaps not correctly), selecting only the fields I need instead of *, etc., but the results are the same!
These are the indexes of the table.
Explain of the query
EDITED
This is the explain of the query after implementing
ALTER TABLE temperatura
ADD INDEX `sensor_temp` (`idsensor`,`fecha`,`temperatura`)
And using inner join syntax for the query
SELECT s.nombre, t.temperatura
FROM sensor s
INNER JOIN temperatura t
ON s.id = t.idsensor
ORDER BY t.fecha DESC, t.idsensor ASC
LIMIT 4
This is my whole sensor table
Try the following:
ALTER TABLE temperatura
ADD INDEX `sensor_temp` (`idsensor`,`fecha`,`temperatura`)
I also recommend using modern join syntax:
SELECT s.nombre, t.temperatura
FROM sensor s
INNER JOIN temperatura t
ON s.id = t.idsensor
ORDER BY t.fecha DESC, t.idsensor ASC
LIMIT 4
Report the EXPLAIN again after making the above changes, if performance is still not good enough.
Attempt #2
After looking closely at what it appears you are trying to do, I believe this next query may be more effective:
SELECT
s.nombre, t.temperatura
FROM temperatura t
LEFT OUTER JOIN temperatura later_t
ON later_t.idsensor = t.idsensor
AND later_t.fecha > t.fecha
INNER JOIN sensor s
ON s.id = t.idsensor
WHERE later_t.idsensor IS NULL
ORDER BY t.idsensor ASC
You can also try:
SELECT
s.nombre, t.temperatura
FROM temperatura t
INNER JOIN (
SELECT
t.idsensor,
MAX(t.fecha) AS fecha
FROM temperatura t
GROUP BY t.idsensor
) max_fecha
ON max_fecha.idsensor = t.idsensor
AND max_fecha.fecha = t.fecha
INNER JOIN sensor s
ON s.id = t.idsensor
ORDER BY t.idsensor ASC
In my experience, if you are trying to find the most recent record, one of the two queries above will work. Which works best depends on various factors, so try them both.
Let me know how those perform, and if they still get you the data you want. Also, any query you run, run at least 3 times, and report all 3 times. That will help get an accurate measure of how fast a given query is, since various external factors can affect the speed of a query.
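As a quick sanity check of the two "most recent record per sensor" patterns, the sketch below runs both on Python's built-in sqlite3 (standing in for MySQL) against a few invented readings. The sensor names and values are assumptions; note that the grouped variant joins on equality with MAX(fecha):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sensor (id INTEGER PRIMARY KEY, nombre TEXT);
CREATE TABLE temperatura (
    id INTEGER PRIMARY KEY, idsensor INTEGER, fecha TEXT, temperatura REAL
);
CREATE INDEX sensor_temp ON temperatura (idsensor, fecha, temperatura);
""")
conn.executemany("INSERT INTO sensor VALUES (?, ?)",
                 [(1, "norte"), (2, "sur")])
conn.executemany(
    "INSERT INTO temperatura (idsensor, fecha, temperatura) VALUES (?, ?, ?)",
    [
        (1, "2015-06-01 10:00:00", 20.5),
        (2, "2015-06-01 10:00:00", 18.0),
        (1, "2015-06-01 10:05:00", 21.0),
        (2, "2015-06-01 10:05:00", 18.5),
    ],
)

# Pattern 1: anti-join, keep rows for which no later reading exists.
anti_join = conn.execute("""
    SELECT s.nombre, t.temperatura
    FROM temperatura t
    LEFT OUTER JOIN temperatura later_t
           ON later_t.idsensor = t.idsensor
          AND later_t.fecha > t.fecha
    INNER JOIN sensor s ON s.id = t.idsensor
    WHERE later_t.idsensor IS NULL
    ORDER BY t.idsensor ASC
""").fetchall()

# Pattern 2: group-wise maximum, join back on (idsensor, MAX(fecha)).
group_max = conn.execute("""
    SELECT s.nombre, t.temperatura
    FROM temperatura t
    INNER JOIN (
        SELECT idsensor, MAX(fecha) AS fecha
        FROM temperatura
        GROUP BY idsensor
    ) max_fecha
            ON max_fecha.idsensor = t.idsensor
           AND max_fecha.fecha = t.fecha
    INNER JOIN sensor s ON s.id = t.idsensor
    ORDER BY t.idsensor ASC
""").fetchall()
print(anti_join, group_max)
```

Both patterns return one row per sensor here; which one is faster on a table with millions of rows is something only EXPLAIN and timing on the real data can answer.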
Before MySQL 8.0 (which added descending indexes), it is not possible to optimize a mixture of ASC and DESC, as in
ORDER BY t.fecha DESC, t.idsensor ASC
You tried a covering index:
INDEX `sensor_temp` (`idsensor`,`fecha`,`temperatura`)
However, this covering index may be better:
INDEX `sensor_temp` (`fecha`,`idsensor`,`temperatura`)
Then, if you are willing to get the sensors in a different order, use
ORDER BY t.fecha DESC, t.idsensor DESC
This will give you up to 4 sensors for the last fecha:
sensor: PRIMARY KEY(id)
temperatura: INDEX(fecha, idsensor, temperatura)
SELECT
( SELECT nombre FROM sensor WHERE id = t.idsensor ) AS nombre,
t.temperatura
FROM
( SELECT MAX(fecha) AS max_fecha FROM temperatura ) AS z
JOIN temperatura AS t ON t.fecha = z.max_fecha
ORDER BY t.idsensor ASC
LIMIT 4;

MySQL indexes for WHERE and ORDER BY clauses

I'm having problems with indexes on a big table.
I have the table items (id, external_item_id, time_stamp, status_id).
What's the best index for these 3 queries:
SELECT *
FROM items pi
WHERE 1=1 AND external_item_id IN (1154,1155,1163,3660,6801,98)
ORDER BY pi.time_stamp DESC, pi.id DESC
LIMIT 12
SELECT *
FROM items pi
WHERE 1=1 AND external_item_id IN (1154,1155,1163,3660,6801,98) AND status_id < 20
ORDER BY pi.time_stamp DESC, pi.id DESC
LIMIT 12
SELECT *
FROM items pi
WHERE 1=1 AND external_item_id IN (1154,1155,1163,3660,6801,98) AND pi.time_stamp <= 13434534452 AND id < 1600
ORDER BY pi.time_stamp DESC, pi.id DESC
LIMIT 12
Because of the list of items in the IN, it is hard to optimize these queries.
There are basically two approaches the engine can take for these queries. Use the index for the where clause. Then either do the sort or use the index for the order by. Because there are inequalities in the where (in is an "inequality"), the index cannot be directly used for the where.
The best indexes for the where are: items(external_item_id, status_id) and items(external_item_id, time_stamp).
An alternative execution plan is to use the index for the order by and then filter on the fly. This suggests trying: items(time_stamp, id, external_item_id, status_id). The last two columns are so the index can satisfy the where without going to the original data.
None of these are perfect solutions.
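For reference, the candidate indexes can be written out as below. This is only a sketch on Python's built-in sqlite3 as a stand-in for MySQL; the index names and sample rows are invented, and which index the optimizer actually picks depends on the engine and the data distribution, so check EXPLAIN on the real table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    id               INTEGER PRIMARY KEY,
    external_item_id INTEGER,
    time_stamp       INTEGER,
    status_id        INTEGER
);
-- Option A: filter through the index, then sort the matching rows.
CREATE INDEX idx_ext_status ON items (external_item_id, status_id);
CREATE INDEX idx_ext_time   ON items (external_item_id, time_stamp);
-- Option B: walk the index in ORDER BY order and filter on the fly;
-- the last two columns make the index covering for the WHERE.
CREATE INDEX idx_time_id_ext_status
    ON items (time_stamp, id, external_item_id, status_id);
""")
conn.executemany("INSERT INTO items VALUES (?, ?, ?, ?)", [
    (1, 98,   100, 10),
    (2, 1154, 300, 30),
    (3, 1155, 200, 10),
    (4, 7,    400, 10),
    (5, 98,   200, 15),
])

# Second query from the question, reduced to the id column.
ids = [r[0] for r in conn.execute("""
    SELECT id
    FROM items
    WHERE external_item_id IN (1154, 1155, 1163, 3660, 6801, 98)
      AND status_id < 20
    ORDER BY time_stamp DESC, id DESC
    LIMIT 12
""")]
print(ids)
```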

How to limit subquery requests to one?

I was looking for a way to use one query with a subquery instead of two separate queries.
But it turns out that the subquery is executed once for each row in the result set. Is there a way to make the count subquery run only once within the combined query?
SELECT `ad_general`.`id`,
( SELECT count(`ad_general`.`id`) AS count
FROM (`ad_general`)
WHERE `city` = 708 ) AS count
FROM (`ad_general`)
WHERE `ad_general`.`city` = '708'
ORDER BY `ad_general`.`id` DESC
LIMIT 15
Maybe using a join can solve the problem, but I don't know how.
SELECT ad_general.id, stats.cnt
FROM ad_general
JOIN (
SELECT count(*) as cnt
FROM ad_general
WHERE city = 708
) AS stats
WHERE ad_general.city = 708
ORDER BY ad_general.id DESC
LIMIT 15;
The explicit table names aren't required, but are used both for clarity and maintainability (the explicit table names will prevent any ambiguities should the schema for ad_general or the generated table ever change).
You can self-join (join the table to itself) and apply an aggregate function to the second copy.
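A small runnable sketch of the derived-table approach, using Python's built-in sqlite3 as a stand-in for MySQL (sample rows are invented, and the limit is 2 instead of 15). The count is computed once in the derived table and attached to every row by a cross join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ad_general (id INTEGER PRIMARY KEY, city INTEGER)")
conn.executemany("INSERT INTO ad_general VALUES (?, ?)",
                 [(1, 708), (2, 708), (3, 100), (4, 708)])

# stats has exactly one row, so the cross join adds a column, not rows.
rows = conn.execute("""
    SELECT ad_general.id, stats.cnt
    FROM ad_general
    CROSS JOIN (
        SELECT COUNT(*) AS cnt
        FROM ad_general
        WHERE city = 708
    ) AS stats
    WHERE ad_general.city = 708
    ORDER BY ad_general.id DESC
    LIMIT 2
""").fetchall()
print(rows)
```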
SELECT `adgen`.`id`, COUNT(`adgen_count`.`id`) AS `count`
FROM `ad_general` AS `adgen`
JOIN `ad_general` AS `adgen_count` ON `adgen_count`.city = 708
WHERE `adgen`.`city` = 708
GROUP BY `adgen`.`id`
ORDER BY `adgen`.`id` DESC
LIMIT 15
However, it's impossible to say what the appropriate grouping is without knowing the structure of the table.

How to select a maximum of 3 items per user in MySQL?

I run a website where users can post items (e.g. pictures). The items are stored in a MySQL database.
I want to query for the last ten posted items BUT with the constraint that a maximum of 3 items can come from any single user.
What is the best way of doing it? My preferred solution is a constraint that is put on the SQL query requesting the last ten items. But ideas on how to set up the database design are very welcome.
Thanks in advance!
BR
It's pretty easy with a correlated sub-query:
SELECT `img`.`id` , `img`.`userid`
FROM `img`
WHERE 3 > (
SELECT count( * )
FROM `img` AS `img1`
WHERE `img`.`userid` = `img1`.`userid`
AND `img`.`id` < `img1`.`id` )
ORDER BY `img`.`id` DESC
LIMIT 10
The query assumes that larger id means added later
Correlated sub-queries are a powerful tool! :-)
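A runnable sketch of the correlated sub-query idea, using Python's built-in sqlite3 as a stand-in for MySQL with invented rows. A row is kept when fewer than 3 rows of the same user have a larger id, which (assuming larger id means added later) leaves the 3 newest items per user:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE img (id INTEGER PRIMARY KEY, userid INTEGER)")
# User 1 posted five items, user 2 two items, user 3 one item.
conn.executemany("INSERT INTO img VALUES (?, ?)", [
    (1, 1), (2, 1), (3, 1), (4, 1), (5, 1),
    (6, 2), (7, 2), (8, 3),
])

rows = conn.execute("""
    SELECT img.id, img.userid
    FROM img
    WHERE 3 > (
        SELECT COUNT(*)
        FROM img AS img1
        WHERE img1.userid = img.userid
          AND img1.id > img.id      -- newer rows of the same user
    )
    ORDER BY img.id DESC
    LIMIT 10
""").fetchall()
print(rows)
```

User 1's two oldest items (ids 1 and 2) are dropped because three newer items from the same user exist.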
This is difficult because MySQL does not support the LIMIT clause in IN/ALL/ANY/SOME sub-queries. If it did, this would be rather trivial... But alas, here is my naïve approach:
SELECT
i.UserId,
i.ImageId
FROM
UserSuppliedImages i
WHERE
/* first valid ImageId */
ImageId = (
SELECT MAX(ImageId)
FROM UserSuppliedImages
WHERE UserId = i.UserId
)
OR
/* second valid ImageId */
ImageId = (
SELECT MAX(ImageId)
FROM UserSuppliedImages
WHERE UserId = i.UserId
AND ImageId < (
SELECT MAX(ImageId)
FROM UserSuppliedImages
WHERE UserId = i.UserId
)
)
/* you get the picture...
the more "per user" images you want, the more complex this will get */
LIMIT 10;
You did not comment on having a preferred result order, so this selects the latest images (assuming ImageId is an ascending auto-incrementing value).
For comparison, on SQL Server the same would look like this:
SELECT TOP 10
img.ImageId,
img.ImagePath,
img.UserId
FROM
UserSuppliedImages img
WHERE
ImageId IN (
SELECT TOP 3 ImageId
FROM UserSuppliedImages
WHERE UserId = img.UserId
ORDER BY ImageId DESC -- latest three per user
)
ORDER BY
img.ImageId DESC
I would first select 10 distinct users, then select images from each of those users with a LIMIT 3, possibly combining them with a UNION and limiting the result to 10.
That would at least narrow the data you need to process down to a manageable amount.
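One possible reading of this suggestion, sketched in Python with the built-in sqlite3 module standing in for MySQL. The img table and its rows are made up, "10 distinct users" is interpreted here as the 10 users with the most recent item, and the merge is done in application code rather than with a UNION:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE img (id INTEGER PRIMARY KEY, userid INTEGER)")
conn.executemany("INSERT INTO img VALUES (?, ?)", [
    (1, 1), (2, 1), (3, 1), (4, 1), (5, 1),
    (6, 2), (7, 2), (8, 3),
])

# Step 1: the 10 users who posted most recently (larger id = later).
users = [r[0] for r in conn.execute("""
    SELECT userid
    FROM img
    GROUP BY userid
    ORDER BY MAX(id) DESC
    LIMIT 10
""")]

# Step 2: at most 3 newest items for each of those users.
candidates = []
for userid in users:
    candidates += conn.execute(
        "SELECT id, userid FROM img WHERE userid = ? ORDER BY id DESC LIMIT 3",
        (userid,),
    ).fetchall()

# Step 3: merge and keep the 10 newest overall.
latest = sorted(candidates, reverse=True)[:10]
print(latest)
```

This trades one complex query for up to 11 simple ones, each of which can use an index on (userid, id).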