Optimize GROUP BY&ORDER BY query - mysql

I have a web page where users upload&watch videos. Last week I asked what is the best way to track video views so that I could display the most viewed videos this week (videos from all dates).
Now I need some help optimizing a query with which I get the videos from the database. The relevant tables are this:
video (~239371 rows)
VID(int), UID(int), title(varchar), status(enum), type(varchar), is_duplicate(enum), is_adult(enum), channel_id(tinyint)
signup (~115440 rows)
UID(int), username(varchar)
videos_views (~359202 rows after 6 days of collecting data, so this table will grow rapidly)
videos_id(int), views_date(date), num_of_views(int)
The table video holds the videos, signup hodls users and videos_views holds data about video views (each video can have one row per day in that table).
I have this query that does the trick, but takes ~10s to execute, and I imagine this will only get worse over time as the videos_views table grows in size.
SELECT
v.VID,
v.title,
v.vkey,
v.duration,
v.addtime,
v.UID,
v.viewnumber,
v.com_num,
v.rate,
v.THB,
s.username,
SUM(vvt.num_of_views) AS tmp_num
FROM
video v
LEFT JOIN videos_views vvt ON v.VID = vvt.videos_id
LEFT JOIN signup s on v.UID = s.UID
WHERE
v.status = 'Converted'
AND v.type = 'public'
AND v.is_duplicate = '0'
AND v.is_adult = '0'
AND v.channel_id <> 10
AND vvt.views_date >= '2001-05-11'
GROUP BY
vvt.videos_id
ORDER BY
tmp_num DESC
LIMIT
8
All the relevant fields are indexed.
And here is a screenshot of the EXPLAIN result:
So, how can I optimize this?
UPDATE
This is my query based on the answer by Quassnoi. It returns the correct videos, but it messes up the JOIN on the signup table. For some records the username field is NULL, for others it contains the wrong username.
SELECT
v.VID,
v.title,
v.vkey,
v.duration,
v.addtime,
v.UID,
v.viewnumber,
v.com_num,
v.rate,
v.THB,
s.username
FROM
(SELECT
videos_id,
SUM(num_of_views) AS tmp_num
FROM
videos_views
WHERE
views_date >= '2010-05-13'
GROUP BY
videos_id
) q
JOIN video v ON v.VID = q.videos_id
LEFT JOIN signup s ON s.UID = v.VID
WHERE
v.type = 'public'
AND v.channel_id <> 10
AND v.is_adult = '0'
AND is_duplicate = '0'
ORDER BY
tmp_num DESC
LIMIT
8
Here is the resultset:

Yeah, ORDER BY on a computed column is always going to be unindexable. Sorry.
If you're going to be doing this query a lot and you want to avoid the views for each video having to be counted and ordered each time, you'll have to denormalise. Add a views_in_last_week column, recalculate it from videos_views in the background each day, and index it (possibly in a compound index with other relevant WHERE conditions).

Create the following index:
video_views (views_date, videos_id)
, and get rid of the LEFT JOIN between videos and views (it does not work with your current query, anyway):
SELECT *
FROM (
SELECT videos_id, SUM(num_of_views) AS tmp_num
FROM video_views
GROUP BY
videos_id
) q
JOIN videos v
ON v.vid = q.videos_id
LEFT JOIN
signup s
ON s.UID = v.UID
ORDER BY
tmp_num DESC
LIMIT 8
If you want to return zero for videos that had never been viewed, change the order of fields in the index:
video_views (videos_id, views_date)
and rewrite the query:
SELECT *,
(
SELECT COALESCE(SUM(num_of_views), 0)
FROM video_views vw
WHERE vw.videos_id = v.vid
AND views_date >= '2001-05-11'
) AS tmp_num
FROM videos v
LEFT JOIN
signup s
ON s.UID = v.UID
ORDER BY
tmp_num DESC
LIMIT 8

Related

Searching and Sorting Using MySQL Inner Join

I guess I can't explain my problem properly. I want to explain this to you with a picture.
Picture 1
In the first picture you can see the hashtags in the trend section. These hashtags are searched for the highest total and it is checked whether the date has passed. If valid data is available, the first 5 hashtags are taken.
Picture 2
In the second picture, it is checked whether the posts in the hashtag are in the post, if any, the oldest date value is taken, LIMIT is set to 1 and the id value from the oyuncular table is matched with sid. Thus, the name of the person sharing can be accessed.
Picture 3
My English is a little bad, I hope I could explain it properly.
SELECT
social_trend.hashtag,
social_trend.total,
social_trend.tarih,
social_post.sid,
social_post.tarih,
social_post.post,
oyuncular.id,
oyuncular.isim
FROM
social_trend
INNER JOIN
social_post
ON
social_post.post LIKE '%social_trend.hashtag%' ORDER BY social_post.tarih LIMIT 1
INNER JOIN
oyuncular
ON
oyuncular.id = social_post.sid
WHERE
social_trend.tarih > UNIX_TIMESTAMP() ORDER BY social_trend.total DESC LIMIT 5
YOu should use a sibquery
and add a proper join between subqiery and social_trend
(i assumed sing both sid)
SELECT
social_trend.hashtag,
social_trend.total,
social_trend.tarih,
t.sid,
t.tarih,
t.post,
oyuncular.id,
oyuncular.isim
FROM (
select social_post.*
from social_post
INNER JOIN social_trend ON social_post.post LIKE concat('%',social_trend.hashtag,'%' )
ORDER BY social_post.tarih LIMIT 1
) t
INNER JOIN social_trend ON social_trend.hashtag= t.post
INNER JOIN oyuncular ON oyuncular.id = t.sid
WHERE
social_trend.tarih > UNIX_TIMESTAMP() ORDER BY social_trend.total DESC LIMIT 5
but looking to your new explanation and img seems you need
SELECT
t.hashtag,
t.total,
t.tarih_trend,
t.sid,
t.tarih,
t.post,
oyuncular.id,
oyuncular.isim
FROM (
select social_post.sid
, social_post.tarih
, social_post.post
, st.hashtag
, st.total
, st.tarih tarih_trend
from social_post
INNER JOIN (
select * from social_trend
WHERE social_trend.tarih > UNIX_TIMESTAMP()
order by total DESC LIMIT 5
) st ON social_post.post LIKE concat('%',st.hashtag,'%' )
ORDER BY social_post.tarih LIMIT 5
) t
INNER JOIN oyuncular ON oyuncular.id = t.sid

SQL: How to merge two complex queries into one, where the second one needs data from the first one

The goal is to load a list of chats where the user sending the request is a member in. Some of the chats are group chats (more than two members) and there I want to show the profile pictures from the users who wrote the last three messages.
The first query to load meta data like the title and the timestamp of the chat is:
SELECT Chat_Users.ID_Chat, Chats.title, Chats.lastMessageAt
FROM Chat_Users
JOIN Chats ON Chats.ID = Chat_Users.ID_Chat
GROUP BY Chat_Users.ID_Chat
HAVING COUNT(Chat_Users.ID_Chat) = 2
AND MAX(Chat_Users.ID_User = $userID) > 0
ORDER BY Chats.lastMessageAt DESC
LIMIT 20
The query to load the last three profile pictures from one of the chats loaded with the query above is:
SELECT GROUP_CONCAT(innerTable.profilePictures SEPARATOR ', ') AS 'ppUrls',
innerTable.ID_Chat
FROM
(
SELECT Chat_Users.ID_Chat, Users.profilePictureUrl AS profilePictures
FROM Users
JOIN Chat_Users ON Chat_Users.ID_User = Users.ID
JOIN Chat_Messages ON Chat_Messages.ID_Chat = Chat_Users.ID_Chat
WHERE Chat_Users.ID_Chat = $chatID
ORDER BY Chat_Messages.timestamp DESC
LIMIT 3
) innerTable
GROUP BY innerTable.ID_Chat
Both are working separately but I want to merge them together so I don't have to run the second query in a loop due to performance reasons. Unfortunately I have no idea how this can be achieved because the second query needs the $chatID, which it only gets from the first query.
So to clarify the desired result: The list with the profile picture urls (second query) should be just another column in the result of the first query.
I hope it is explained in a reasonably understandable way. Any help would be much appreciated.
Edit: Sample data from the affected tables:
Table "Chats":
Table "Chat_Users":
Table "Chat_Messages":
Table "Users":
This fufils the brief, however it requires a view because MySQL 5.x doesn't support the WITH clause.
It's long and cluncky and I've tried to shorten it but this is as good as I can get, hopefully someone will pop up in the comments with a way to make it shorter!
The view:
CREATE VIEW last_interaction AS
SELECT
id_chat,
id_user,
MAX(timestamp) AS timestamp
FROM chat_messages
GROUP BY id_user, id_chat
The query:
SELECT
Chat_Users.ID_Chat,
Chats.title,
Chats.lastMessageAt,
urls.pps AS profilePictureUrls
FROM Chat_Users
JOIN Chats ON Chats.ID = Chat_Users.ID_Chat
JOIN (
SELECT
lo.id_chat,
GROUP_CONCAT(users.profilePictureUrl) AS pps
FROM last_interaction lo
JOIN users ON users.id = lo.id_user
WHERE (
SELECT COUNT(*) -- the amount of more recent interactions
FROM last_interaction li
WHERE (li.timestamp = lo.timestamp AND li.id_user > lo.id_user)
) < 3
GROUP BY id_chat
) urls ON urls.id_chat = Chats.id
GROUP BY Chat_Users.ID_Chat
HAVING COUNT(Chat_Users.ID_Chat) > 2
AND MAX(Chat_Users.ID_User = $userID)
ORDER BY Chats.lastMessageAt DESC
LIMIT 20

How can I build inner query to find trend top 3

Question about SQL. The schema is given below:
User(userID:int, userName:varchar(30), email:varchar(30), password:varchar(30), status:varchar(15))
Video(videoID:int, userID:int, videoTitle:varchar(60), likeCount:int, dislikeCount:int, datePublished:date)
Comment(commentID:int, userID:int, videoID:int, commentText:varchar(1000), dateCommented:date)
Watch(userID:int, videoID:int, dateWatched: date)
Same variables will be using as foreign key. Because of this I do not need to write foreign keys.
A)
List the trending top three videos for a given time interval
(String dateStart, String dateEnd)
A trending video is defined to be the most viewed video in the given interval (i.e., video that is viewed the highest numberof times among all).
You should include dateStart and dateEnd in the result, it is a CLOSED interval.
Output: videoTitle, userName, number of times that the video watched.
I could not figured out how can i find top tree videos. While building inner query what should I do?
If you want the top 3 videos for a period of time, you can do:
select v.videotitle, u.username, count(*) no_watches
from watch w
inner join video v on v.videoid = w.videoid
inner join usr u on u.userid = v.userid
where w.date_watched between ? and ?
group by v.videoid, u.userid
order by no_watches desc
limit 3
The between predicate is how I understand the "closed interval". The ? relate to query parameters that contain the start and end date.
Pre-aggregation might make the query more efficient:
select v.videotitle, u.username, w.no_watches
from (
select videoid, count(*) as no_watches
from watch
where date_watched between ? and ?
group by videoid
order by no_watches desc limit 3
) w
inner join video v on v.videoid = w.videoid
inner join usr u on u.userid = v.userid
order by w.no_watches desc
limit 3

Doctrine2 ORM slow query optimization

I'm struggling to make a query efficient enough. I'm using Doctrine2 ORM (the query is build with QueryBuilder) and part of my query is running very slow - takes about 4s with table of 5000 rows.
This is the relevant part of db schema:
TABLE user
id (primary)
... (plenty of rows, not relevant to the query)
TABLE slot
id (primary)
user_id (foreign for user)
date (datetime)
And this is how my query looks like (it's the basic version, there's a lot of filters to be applied, but these work like fine for now)
SELECT
u.id AS uid,
COUNT(DISTINCT s_order.id) AS sclr_1,
COUNT(DISTINCT s_filter.id) AS sclr_2
FROM
user u
LEFT JOIN slot s_order ON (s_order.user_id = u.id)
LEFT JOIN slot s_filter ON (s_filter.user_id = u.id)
WHERE
(
(
(
s_order.date BETWEEN ?
AND ?
)
AND (
s_filter.date BETWEEN ?
AND ?
)
)
AND (u.deleted_at IS NULL)
)
AND u.userType IN ('2')
GROUP BY
u.id
HAVING
sclr_2 > 0
ORDER BY
sclr_1 DESC
LIMIT
12
Let me explain what I'm trying to achieve here:
I need to filter users who has any slots between 1 week ago and 1 week ahead, then order them by count of slots available between now and 1 week ahead. The part of query causing issues is LEFT JOIN of s_filter and I'm wondering whether perhaps there's a way to improve the performance of that query?
Any help appreciated really, even if it's only plain SQL I'll try to convert it to DQL myself!
#UPDATE
Just an additional info that I forgot, the LIMIT in query is for pagination purposes!
#UPDATE 2
After a while of tweaking the query I figured out that I can use JOIN for filtering instead of LEFT JOIN + COUNT, so my query does look like that now:
SELECT
u.id AS uid, COUNT(DISTINCT s_order.id) AS ordinal
FROM
langu_user u
LEFT JOIN
slot s_order ON (s_order.user_id = u.id) AND s_order.date BETWEEN '2017-02-03 14:03:22' AND '2017-02-10 14:03:22'
JOIN
slot s_filter ON (s_filter.user_id = u.id) AND s_filter.date BETWEEN '2017-01-27 14:03:22' AND '2017-02-10 14:03:22'
WHERE
u.deleted_at IS NULL
AND u.userType IN ('2')
GROUP BY u.id
ORDER BY ordinal DESC
LIMIT 12
And it went down from 4.1-4.3s to 3.6~

mysql Multiple left joins using count

I have been researching this for hours and the best code that I have come up with is this from an example i found on overstack. I have been through several derivations but the following is the only query that returns the correct data, the problem is it takes over 139s (more than 2 minutes) to return only 30 rows of data. Im stuck. (life_p is a 'likes'
SELECT
logos.id,
logos.in_gallery,
logos.active,
logos.pubpriv,
logos.logo_name,
logos.logo_image,
coalesce(cc.Count, 0) as CommentCount,
coalesce(lc.Count, 0) as LikeCount
FROM logos
left outer join(
select comments.logo_id, count( * ) as Count from comments group by comments.logo_id
) cc on cc.logo_id = logos.id
left outer join(
select life_p.logo_id, count( * ) as Count from life_p group by life_p.logo_id
) lc on lc.logo_id = logos.id
WHERE logos.active = '1'
AND logos.pubpriv = '0'
GROUP BY logos.id
ORDER BY logos.in_gallery desc
LIMIT 0, 30
I'm not sure whats wrong. If i do them singularly meaningremove the coalece and one of the joins:
SELECT
logos.id,
logos.in_gallery,
logos.active,
logos.pubpriv,
logos.logo_name,
logos.logo_image,
count( * ) as lc
FROM logos
left join life_p on life_p.logo_id = logos.id
WHERE logos.active = '1'
AND logos.pubpriv = '0'
GROUP BY logos.id
ORDER BY logos.in_gallery desc
LIMIT 0, 30
that runs in less than half a sec ( 2-300 ms )....
Here is a link to the explain: https://logopond.com/img/explain.png
MySQL has a peculiar quirk that allows a group by clause that does not list all non-aggregating columns. This is NOT a good thing and you should always specify ALL non-aggregating columns in the group by clause.
Note, when counting over joined tables it is useful to know that the COUNT() function ignores NULLs, so for a LEFT JOIN where NULLs can occur don't use COUNT(*), instead use a column from within the joined table and only rows from that table will be counted. From these points I would suggest the following query structure.
SELECT
logos.id
, logos.in_gallery
, logos.active
, logos.pubpriv
, logos.logo_name
, logos.logo_image
, COALESCE(COUNT(cc.logo_id), 0) AS CommentCount
, COALESCE(COUNT(lc.logo_id), 0) AS LikeCount
FROM logos
LEFT OUTER JOIN comments cc ON cc.logo_id = logos.id
LEFT OUTER JOIN life_p lc ON lc.logo_id = logos.id
WHERE logos.active = '1'
AND logos.pubpriv = '0'
GROUP BY
logos.id
, logos.in_gallery
, logos.active
, logos.pubpriv
, logos.logo_name
, logos.logo_image
ORDER BY logos.in_gallery DESC
LIMIT 0, 30
If you continue to have performance issues then use a execution plan and consider adding indexes to suit.
You can create some indexes on the joining fields:
ALTER TABLE table ADD INDEX idx__tableName__fieldName (field)
In your case will be something like:
ALTER TABLE cc ADD INDEX idx__cc__logo_id (logo_id);
I dont really like it because ive always read that sub queries are bad and that joins perform better under stress, but in this particular case subquery seems to be the only way to pull the correct data in under half a sec consistently. Thanks for the suggestions everyone.
SELECT
logos.id,
logos.in_gallery,
logos.active,
logos.pubpriv,
logos.logo_name,
logos.logo_image,
(Select COUNT(comments.logo_id) FROM comments
WHERE comments.logo_id = logos.id) AS coms,
(Select COUNT(life_p.logo_id) FROM life_p
WHERE life_p.logo_id = logos.id) AS floats
FROM logos
WHERE logos.active = '1' AND logos.pubpriv = '0'
ORDER BY logos.in_gallery desc
LIMIT ". $start .",". $pageSize ."
Also you can create a mapping tables to speed up your query try:
CREATE TABLE mapping_comments AS
SELECT
comments.logo_id,
count(*) AS Count
FROM
comments
GROUP BY
comments.logo_id
) cc ON cc.logo_id = logos.id
Then change your code
left outer join(
should become
inner join mapping_comments as mp on mp.logo_id =cc.id
Then each time a new comment are added to the cc table you need to update your mapping table OR you can create a stored procedure to do it automatically when your cc table changes