Question about SQL. The schema is given below:
User(userID:int, userName:varchar(30), email:varchar(30), password:varchar(30), status:varchar(15))
Video(videoID:int, userID:int, videoTitle:varchar(60), likeCount:int, dislikeCount:int, datePublished:date)
Comment(commentID:int, userID:int, videoID:int, commentText:varchar(1000), dateCommented:date)
Watch(userID:int, videoID:int, dateWatched: date)
Columns with the same name are used as foreign keys, so I have not listed the foreign keys separately.
A)
List the trending top three videos for a given time interval
(String dateStart, String dateEnd)
A trending video is defined to be the most viewed video in the given interval (i.e., the video that is viewed the highest number of times among all).
Both dateStart and dateEnd should be included; it is a CLOSED interval.
Output: videoTitle, userName, number of times the video was watched.
I could not figure out how to find the top three videos. What should I do when building the inner query?
If you want the top 3 videos for a period of time, you can do:
select v.videotitle, u.username, count(*) no_watches
from watch w
inner join video v on v.videoid = w.videoid
inner join usr u on u.userid = v.userid
where w.dateWatched between ? and ?
group by v.videoid, u.userid
order by no_watches desc
limit 3
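BETWEEN is inclusive at both ends, which is how I understand the "closed interval". The ? placeholders are query parameters that hold the start and end dates. usr stands for the User table; USER is a reserved word in standard SQL, so depending on the database the table name may need quoting.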
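To check the closed-interval behaviour, here is a small sketch that runs the first query against an in-memory SQLite database from Python. SQLite stands in for the real database, the sample rows are made up, and the table is named `"User"` (quoted) instead of the answer's `usr`.

```python
import sqlite3

# In-memory stand-in for the schema in the question (only the columns the query needs).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "User" (userID INTEGER PRIMARY KEY, userName TEXT);
CREATE TABLE Video (videoID INTEGER PRIMARY KEY, userID INTEGER, videoTitle TEXT);
CREATE TABLE Watch (userID INTEGER, videoID INTEGER, dateWatched TEXT);
""")
conn.executemany('INSERT INTO "User" VALUES (?, ?)', [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO Video VALUES (?, ?, ?)",
                 [(10, 1, "A"), (11, 1, "B"), (12, 2, "C"), (13, 2, "D")])
# Made-up views; ISO date strings compare in date order.
conn.executemany("INSERT INTO Watch VALUES (?, ?, ?)", [
    (2, 10, "2024-01-01"), (2, 10, "2024-01-05"), (1, 10, "2024-01-10"),  # A: 3 in range
    (1, 11, "2024-01-10"), (2, 11, "2024-01-10"),  # B: 2, both on dateEnd
    (1, 12, "2024-01-03"),                          # C: 1
    (1, 13, "2023-12-31"), (2, 13, "2024-01-11"),   # D: 0, just outside both ends
])

TOP3 = """
SELECT v.videoTitle, u.userName, COUNT(*) AS no_watches
FROM Watch w
JOIN Video v ON v.videoID = w.videoID
JOIN "User" u ON u.userID = v.userID
WHERE w.dateWatched BETWEEN ? AND ?   -- BETWEEN includes both ends
GROUP BY v.videoID, u.userID
ORDER BY no_watches DESC
LIMIT 3
"""

def top3(date_start, date_end):
    """Return (videoTitle, userName, no_watches) for the 3 most watched videos."""
    return conn.execute(TOP3, (date_start, date_end)).fetchall()

print(top3("2024-01-01", "2024-01-10"))
# [('A', 'alice', 3), ('B', 'alice', 2), ('C', 'bob', 1)]
```

If dateWatched were a DATETIME rather than a DATE, `BETWEEN '2024-01-01' AND '2024-01-10'` would miss views made after midnight on the last day; in that case `dateWatched >= dateStart AND dateWatched < dateEnd + 1 day` keeps the whole last day.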
Pre-aggregation might make the query more efficient:
select v.videotitle, u.username, w.no_watches
from (
select videoid, count(*) as no_watches
from watch
where dateWatched between ? and ?
group by videoid
order by no_watches desc limit 3
) w
inner join video v on v.videoid = w.videoid
inner join usr u on u.userid = v.userid
order by w.no_watches desc
limit 3
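A sketch of the pre-aggregated version under the same assumptions (SQLite via Python, made-up rows). The composite index `watch_date_video` is a hypothetical addition: it lets the inner query filter on the date and count per video from the index alone. The check at the end compares the result with the plain join-then-group query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE "User" (userID INTEGER PRIMARY KEY, userName TEXT);
CREATE TABLE Video (videoID INTEGER PRIMARY KEY, userID INTEGER, videoTitle TEXT);
CREATE TABLE Watch (userID INTEGER, videoID INTEGER, dateWatched TEXT);
-- Hypothetical covering index for the inner aggregate.
CREATE INDEX watch_date_video ON Watch (dateWatched, videoID);
""")
conn.executemany('INSERT INTO "User" VALUES (?, ?)', [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO Video VALUES (?, ?, ?)",
                 [(10, 1, "A"), (11, 1, "B"), (12, 2, "C"), (13, 2, "D")])
conn.executemany("INSERT INTO Watch VALUES (?, ?, ?)", [
    (2, 10, "2024-01-02"), (1, 10, "2024-01-03"),
    (2, 10, "2024-01-04"), (2, 10, "2024-01-05"),                         # A: 4
    (1, 11, "2024-01-02"), (2, 11, "2024-01-03"), (1, 11, "2024-01-09"),  # B: 3
    (1, 12, "2024-01-06"), (2, 12, "2024-01-07"),                         # C: 2
    (1, 13, "2024-01-08"),                                                # D: 1
])

# Aggregate Watch on its own first, then join only the 3 surviving rows.
PRE_AGG = """
SELECT v.videoTitle, u.userName, w.no_watches
FROM (
    SELECT videoID, COUNT(*) AS no_watches
    FROM Watch
    WHERE dateWatched BETWEEN ? AND ?
    GROUP BY videoID
    ORDER BY no_watches DESC
    LIMIT 3
) w
JOIN Video v ON v.videoID = w.videoID
JOIN "User" u ON u.userID = v.userID
ORDER BY w.no_watches DESC
"""

# Join first, aggregate afterwards (the first query in the answer).
JOIN_FIRST = """
SELECT v.videoTitle, u.userName, COUNT(*) AS no_watches
FROM Watch w
JOIN Video v ON v.videoID = w.videoID
JOIN "User" u ON u.userID = v.userID
WHERE w.dateWatched BETWEEN ? AND ?
GROUP BY v.videoID, u.userID
ORDER BY no_watches DESC
LIMIT 3
"""

params = ("2024-01-01", "2024-01-31")
pre_agg = conn.execute(PRE_AGG, params).fetchall()
join_first = conn.execute(JOIN_FIRST, params).fetchall()
print(pre_agg)  # [('A', 'alice', 4), ('B', 'alice', 3), ('C', 'bob', 2)]
```

The outer LIMIT 3 is left out here because the derived table already holds at most three rows and both joins are on primary keys. If several videos tie for third place, either query may return any one of them.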
Related
I don't think I explained my problem properly, so let me explain it with pictures.
Picture 1
In the first picture you can see the hashtags in the trend section. The hashtags are sorted by highest total, and those whose date has already passed are excluded. If valid rows remain, the first 5 hashtags are taken.
Picture 2
In the second picture, each hashtag is looked up in the post text. If matching posts exist, the one with the oldest date is taken (LIMIT 1), and its sid is matched with the id column of the oyuncular table, so that the name of the person who shared it can be retrieved.
Picture 3
My English is not great; I hope I have explained it properly.
SELECT
social_trend.hashtag,
social_trend.total,
social_trend.tarih,
social_post.sid,
social_post.tarih,
social_post.post,
oyuncular.id,
oyuncular.isim
FROM
social_trend
INNER JOIN
social_post
ON
social_post.post LIKE '%social_trend.hashtag%' ORDER BY social_post.tarih LIMIT 1
INNER JOIN
oyuncular
ON
oyuncular.id = social_post.sid
WHERE
social_trend.tarih > UNIX_TIMESTAMP() ORDER BY social_trend.total DESC LIMIT 5
You should use a subquery and add a proper join between the subquery and social_trend (I assumed both use sid):
SELECT
social_trend.hashtag,
social_trend.total,
social_trend.tarih,
t.sid,
t.tarih,
t.post,
oyuncular.id,
oyuncular.isim
FROM (
select social_post.*
from social_post
INNER JOIN social_trend ON social_post.post LIKE concat('%',social_trend.hashtag,'%' )
ORDER BY social_post.tarih LIMIT 1
) t
INNER JOIN social_trend ON social_trend.hashtag= t.post
INNER JOIN oyuncular ON oyuncular.id = t.sid
WHERE
social_trend.tarih > UNIX_TIMESTAMP() ORDER BY social_trend.total DESC LIMIT 5
But judging from your new explanation and the images, it seems you need:
SELECT
t.hashtag,
t.total,
t.tarih_trend,
t.sid,
t.tarih,
t.post,
oyuncular.id,
oyuncular.isim
FROM (
select social_post.sid
, social_post.tarih
, social_post.post
, st.hashtag
, st.total
, st.tarih tarih_trend
from social_post
INNER JOIN (
select * from social_trend
WHERE social_trend.tarih > UNIX_TIMESTAMP()
order by total DESC LIMIT 5
) st ON social_post.post LIKE concat('%',st.hashtag,'%' )
ORDER BY social_post.tarih LIMIT 5
) t
INNER JOIN oyuncular ON oyuncular.id = t.sid
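Note that in the query above, `ORDER BY social_post.tarih LIMIT 5` keeps the five oldest matching posts overall, so one hashtag can take several rows and another none. If the goal is one row per trending hashtag (its oldest post and the author's name), a correlated subquery can pick the post per hashtag. This is a sketch in SQLite via Python with made-up rows, assuming `social_post` has an `id` primary key (not shown in the question), that `tarih` is a Unix timestamp, and passing "now" as a parameter instead of calling UNIX_TIMESTAMP(). In MySQL the `||` concatenation would be written `CONCAT('%', st.hashtag, '%')`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE social_trend (hashtag TEXT, total INTEGER, tarih INTEGER);
CREATE TABLE social_post (id INTEGER PRIMARY KEY, sid INTEGER, tarih INTEGER, post TEXT);
CREATE TABLE oyuncular (id INTEGER PRIMARY KEY, isim TEXT);
""")
conn.executemany("INSERT INTO social_trend VALUES (?, ?, ?)", [
    ("#a", 50, 2000), ("#b", 40, 2000),
    ("#c", 30, 500),    # expired: tarih is in the past
    ("#d", 20, 2000),   # trending, but no post mentions it
])
conn.executemany("INSERT INTO social_post VALUES (?, ?, ?, ?)", [
    (1, 1, 300, "hello #a"), (2, 2, 100, "first #a post"),
    (3, 2, 200, "#b here"), (4, 1, 50, "#c old"),
])
conn.executemany("INSERT INTO oyuncular VALUES (?, ?)", [(1, "Ayse"), (2, "Mehmet")])

OLDEST_POST_PER_TREND = """
SELECT st.hashtag, st.total, p.sid, p.tarih, p.post, o.isim
FROM (
    -- top 5 hashtags whose trend date has not passed yet
    SELECT hashtag, total
    FROM social_trend
    WHERE tarih > ?
    ORDER BY total DESC
    LIMIT 5
) st
JOIN social_post p
  ON p.id = (
      -- oldest post mentioning this hashtag (id breaks ties)
      SELECT p2.id
      FROM social_post p2
      WHERE p2.post LIKE '%' || st.hashtag || '%'
      ORDER BY p2.tarih, p2.id
      LIMIT 1
  )
JOIN oyuncular o ON o.id = p.sid
ORDER BY st.total DESC
"""

rows = conn.execute(OLDEST_POST_PER_TREND, (1000,)).fetchall()
print(rows)
```

Hashtags without any matching post drop out because of the inner joins; LEFT JOINs would keep them with NULLs. Plain LIKE also matches longer tags that begin with the same text (`#a` inside `#abc`).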
I'm struggling to make a query efficient enough. I'm using Doctrine2 ORM (the query is built with QueryBuilder), and part of my query is running very slowly: it takes about 4 s on a table of 5,000 rows.
This is the relevant part of db schema:
TABLE user
id (primary)
... (plenty of columns, not relevant to the query)
TABLE slot
id (primary)
user_id (foreign for user)
date (datetime)
And this is what my query looks like (it's the basic version; there are a lot of filters to be applied, but those work fine for now):
SELECT
u.id AS uid,
COUNT(DISTINCT s_order.id) AS sclr_1,
COUNT(DISTINCT s_filter.id) AS sclr_2
FROM
user u
LEFT JOIN slot s_order ON (s_order.user_id = u.id)
LEFT JOIN slot s_filter ON (s_filter.user_id = u.id)
WHERE
(
(
(
s_order.date BETWEEN ?
AND ?
)
AND (
s_filter.date BETWEEN ?
AND ?
)
)
AND (u.deleted_at IS NULL)
)
AND u.userType IN ('2')
GROUP BY
u.id
HAVING
sclr_2 > 0
ORDER BY
sclr_1 DESC
LIMIT
12
Let me explain what I'm trying to achieve here:
I need to filter users who have any slots between 1 week ago and 1 week ahead, then order them by the number of slots available between now and 1 week ahead. The part of the query causing issues is the LEFT JOIN of s_filter, and I'm wondering whether there's a way to improve the performance of that query.
Any help is really appreciated; even plain SQL is fine, I'll try to convert it to DQL myself!
#UPDATE
Just an additional piece of information I forgot: the LIMIT in the query is for pagination purposes!
#UPDATE 2
After some tweaking I figured out that I can use a JOIN for filtering instead of LEFT JOIN + COUNT, so my query now looks like this:
SELECT
u.id AS uid, COUNT(DISTINCT s_order.id) AS ordinal
FROM
langu_user u
LEFT JOIN
slot s_order ON (s_order.user_id = u.id) AND s_order.date BETWEEN '2017-02-03 14:03:22' AND '2017-02-10 14:03:22'
JOIN
slot s_filter ON (s_filter.user_id = u.id) AND s_filter.date BETWEEN '2017-01-27 14:03:22' AND '2017-02-10 14:03:22'
WHERE
u.deleted_at IS NULL
AND u.userType IN ('2')
GROUP BY u.id
ORDER BY ordinal DESC
LIMIT 12
That brought it down from 4.1-4.3 s to about 3.6 s.
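One thing that may be worth measuring (a sketch, not benchmarked): the inner join on s_filter still multiplies each user's s_order rows by their s_filter rows, which is why COUNT(DISTINCT ...) is needed. Testing the filter with EXISTS keeps one row per counted slot, so a plain COUNT works. Shown in SQLite via Python with made-up rows; the index `slot_user_date` is a hypothetical addition, and the secondary sort on `u.id` keeps the paginated order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE langu_user (id INTEGER PRIMARY KEY, userType TEXT, deleted_at TEXT);
CREATE TABLE slot (id INTEGER PRIMARY KEY, user_id INTEGER, date TEXT);
-- Hypothetical index serving both the EXISTS probe and the LEFT JOIN.
CREATE INDEX slot_user_date ON slot (user_id, date);
""")
conn.executemany("INSERT INTO langu_user VALUES (?, ?, ?)", [
    (1, "2", None), (2, "2", None),
    (3, "2", "2017-01-01 00:00:00"),  # deleted
    (4, "1", None),                   # wrong userType
    (5, "2", None),                   # no slot in the window
])
conn.executemany("INSERT INTO slot (user_id, date) VALUES (?, ?)", [
    (1, "2017-02-04 10:00:00"), (1, "2017-02-05 10:00:00"),
    (2, "2017-01-30 10:00:00"),       # only in the past week
    (3, "2017-02-04 10:00:00"), (4, "2017-02-04 10:00:00"),
    (5, "2017-03-01 10:00:00"),
])

USERS_BY_UPCOMING_SLOTS = """
SELECT u.id AS uid, COUNT(s.id) AS ordinal
FROM langu_user u
LEFT JOIN slot s
       ON s.user_id = u.id
      AND s.date BETWEEN :now AND :week_ahead   -- slots to order by
WHERE u.deleted_at IS NULL
  AND u.userType IN ('2')
  AND EXISTS (SELECT 1 FROM slot f               -- filter: any slot within a week either side
              WHERE f.user_id = u.id
                AND f.date BETWEEN :week_ago AND :week_ahead)
GROUP BY u.id
ORDER BY ordinal DESC, u.id
LIMIT :page_size OFFSET :page_offset
"""

params = {
    "week_ago": "2017-01-27 14:03:22",
    "now": "2017-02-03 14:03:22",
    "week_ahead": "2017-02-10 14:03:22",
    "page_size": 12,
    "page_offset": 0,
}
rows = conn.execute(USERS_BY_UPCOMING_SLOTS, params).fetchall()
print(rows)  # [(1, 2), (2, 0)]
```

User 2 stays in the result with ordinal 0 because the LEFT JOIN keeps users whose only slots are in the past week.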
This is a common greatest-n-per-group question, but with an extra problem.
What I want is to find the latest 20 posts of a user and, for each post, load its latest 5 (or n) comments. Moreover, for pagination, I need to know how many other comments each post has. Is it possible to find that out with the same query? (Or at least with as few queries as possible?)
What I do now is find the latest posts and then all of their comments:
Finding the latest posts of a user:
SELECT up.primkey
, up.sender
, up.comment
, up.date
, up.admin_approved
, u.username
, u.avatar
FROM users_posts up
JOIN users u
ON up.sender=u.userkey
WHERE up.sender=?
AND up.admin_approved=0
ORDER
BY primkey DESC LIMIT $from,$to;
Then I store each primkey in an array in order to retrieve the comments of these posts.
Finding the comments of the posts:
SELECT c.primkey,c.post_id, c.sender, c.comment, c.date, u.username, u.avatar
FROM users_posts_comments c
LEFT JOIN users u ON c.sender=u.userkey
WHERE c.post_id IN('.implode(",", $posts_array).') AND c.`admin_approved`=0
ORDER BY c.primkey DESC;
After that, I store each comment in a new array ($comments[$post_id][] = comment info) and then output the result.
What I want is to modify the second query to limit the comments to the 5 most recent, and also to find out the total number of comments on each post so that I can show the pagination.
Here's a fiddle... sqlfiddle.com/#!2/e92a6/1
Expected result:
post8
post7
comment-7
comment-6
comment-5
{pages}
post6
...
I know it may be difficult, so what would you recommend (the most efficient way)?
Thanks.
A faster solution uses variables - but I'm old school...
This query gives you the latest 5 posts for each sender...
SELECT x.* ,COUNT(*)
FROM users_posts x
JOIN users_posts y
ON y.sender = x.sender
AND y.date >= x.date
GROUP
BY x.primkey
HAVING COUNT(*) <= 5
So now you can extend that idea to return the 3 most recent comments (if any) for each of those last 5 posts (for each sender)
SELECT a.*, upc.*
FROM
( SELECT x.*
FROM users_posts x
JOIN users_posts y
ON y.sender = x.sender
AND y.date >= x.date
GROUP
BY x.primkey
HAVING COUNT(*) <= 5
) a
LEFT
JOIN users_posts_comments upc
ON upc.post_id = a.primkey
LEFT
JOIN users_posts_comments z
ON z.post_id = upc.post_id
AND z.date >= upc.date
GROUP
BY a.sender
, a.primkey
, upc.primkey
HAVING COUNT(upc.post_id) <= 3;
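The query above returns the comments but not the per-post totals that the pagination needs. A sketch that gets both in one query, using the same old-school correlated-count idea instead of variables or window functions: a comment is kept when fewer than n newer approved comments exist on the same post, and a second correlated subquery counts all approved comments of that post. It assumes, as the question's own query does, that a higher `primkey` means a newer comment; it runs in SQLite via Python with made-up rows, and `latest_comments` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users_posts_comments (
    primkey INTEGER PRIMARY KEY, post_id INTEGER, sender INTEGER,
    comment TEXT, admin_approved INTEGER);
""")
conn.executemany(
    "INSERT INTO users_posts_comments VALUES (?, ?, ?, ?, ?)",
    [(k, 6, 1, f"comment-{k}", 0) for k in (1, 2)]
    + [(k, 7, 1, f"comment-{k}", 0) for k in (3, 4, 5, 6, 7)]
    + [(8, 7, 1, "comment-8", 1)],  # not approved: ignored everywhere
)

def latest_comments(post_ids, n):
    """Latest n approved comments per post, plus each post's approved total."""
    placeholders = ", ".join("?" for _ in post_ids)
    sql = f"""
    SELECT c.post_id, c.primkey, c.comment,
           (SELECT COUNT(*) FROM users_posts_comments t
             WHERE t.post_id = c.post_id AND t.admin_approved = 0) AS total_comments
    FROM users_posts_comments c
    WHERE c.admin_approved = 0
      AND c.post_id IN ({placeholders})
      AND (SELECT COUNT(*) FROM users_posts_comments newer
            WHERE newer.post_id = c.post_id
              AND newer.admin_approved = 0
              AND newer.primkey > c.primkey) < ?
    ORDER BY c.post_id DESC, c.primkey DESC
    """
    return conn.execute(sql, [*post_ids, n]).fetchall()

for row in latest_comments([8, 7, 6], 3):
    print(row)
```

Only the `?` placeholders are generated from the list length, so the post IDs are still bound as parameters instead of being spliced into the SQL with implode(). Posts without comments (post 8 here) simply produce no rows; total_comments minus the rows returned gives the number of older comments left for the pager.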
I have an assignment to create a Twitter-like database, and in this assignment I have to find the trending topics. My idea was to count the tweets with a specific tag between the date the tweet was made and 7 days later, and order them by that count.
I have the following 2 tables i am using for this query :
Table Tweet : id , message, users_id, date
Table Tweet_tags : id, tag, tweet_id
Since MySQL isn't my strong point at all, I'm having trouble getting any results from the query.
The query i tried is :
Select
Count(twitter.tweet_tags.id) As NumberofTweets,
twitter.tweet_tags.tag
From twitter.tweet
Inner Join twitter.tweet_tags On twitter.tweet_tags.tweet_id = twitter.tweet.id
WHERE twitter.tweet_tags.tag between twitter.tweet.date and ADDDATE(twitter.tweet.date, INTERVAL 7 day)
ORDER BY NumberofTweets
The query runs, but returns no results. I just can't get it to work. Could you guys please help me out with this, or, if you have a better way to get the trending topics, let me know!
Thanks a lot!
This is equivalent to your query, with table aliases to make it easier to read, with BETWEEN replaced by two inequality predicates, and with the ADDDATE function replaced by an equivalent operation:
SELECT COUNT(s.id) As NumberofTweets
, s.tag
FROM twitter.tweet t
JOIN twitter.tweet_tags s
ON s.tweet_id = t.id
WHERE s.tag >= t.date
AND s.tag <= t.date + INTERVAL 7 DAY
ORDER
BY NumberofTweets
Two things pop out at me here...
First, there is no GROUP BY. To get a count by "tag", you want a GROUP BY tag.
Second, you are comparing "tag" to "date". I don't know your tables, but that just doesn't look right. (I expect "date" is a DATETIME or TIMESTAMP, and "tag" is a character string, maybe what my daughter calls a "hash tag". Or is that Tumblr she's talking about?)
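To illustrate the two points: with a GROUP BY on the tag and the date compared with a date, a plain "most-used tags in one 7-day window" query looks like this. It is a sketch in SQLite via Python with made-up tweets; the `twitter.` schema prefix is dropped, SQLite's `datetime(x, '+7 days')` plays the role of `x + INTERVAL 7 DAY`, and the window start is an assumed parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tweet (id INTEGER PRIMARY KEY, message TEXT, users_id INTEGER, date TEXT);
CREATE TABLE tweet_tags (id INTEGER PRIMARY KEY, tag TEXT, tweet_id INTEGER);
""")
conn.executemany("INSERT INTO tweet VALUES (?, ?, ?, ?)", [
    (1, "m1", 1, "2024-03-01 09:00:00"),
    (2, "m2", 1, "2024-03-03 12:00:00"),
    (3, "m3", 2, "2024-03-07 23:59:59"),
    (4, "m4", 2, "2024-03-08 00:00:00"),  # exactly 7 days after the start: excluded
    (5, "m5", 2, "2024-02-28 10:00:00"),  # before the window
])
conn.executemany("INSERT INTO tweet_tags (tag, tweet_id) VALUES (?, ?)", [
    ("#sql", 1), ("#db", 1), ("#sql", 2), ("#sql", 3), ("#php", 3),
    ("#php", 4), ("#php", 5),
])

TRENDING = """
SELECT s.tag, COUNT(*) AS NumberofTweets
FROM tweet t
JOIN tweet_tags s ON s.tweet_id = t.id
WHERE t.date >= :start                          -- compare dates with dates
  AND t.date <  datetime(:start, '+7 days')     -- half-open 7-day window
GROUP BY s.tag                                  -- one count per tag
ORDER BY NumberofTweets DESC, s.tag
"""

rows = conn.execute(TRENDING, {"start": "2024-03-01 00:00:00"}).fetchall()
print(rows)  # [('#sql', 3), ('#db', 1), ('#php', 1)]
```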
If I understand your requirement:
For each tweet, and for each tag associated with that tweet, you want a count of the other tweets that have a matching tag and were made within 7 days after the datetime of that tweet.
One way to get this result would be to use a correlated subquery. (This is probably the easiest approach to understand, but is probably not the best approach from a performance standpoint).
SELECT t.id
, s.tag
, ( SELECT COUNT(1)
FROM twitter.tweet_tags r
JOIN twitter.tweet q
ON q.id = r.tweet_id
WHERE r.tag = s.tag
AND q.date >= t.date
AND q.date <= t.date + INTERVAL 7 DAY
) AS cnt
FROM twitter.tweet t
JOIN twitter.tweet_tags s
ON s.tweet_id = t.id
ORDER
BY cnt DESC
Another approach would be to use a join operation:
SELECT t.id
, s.tag
, COUNT(q.id) AS cnt
FROM twitter.tweet t
JOIN twitter.tweet_tags s
ON s.tweet_id = t.id
LEFT
JOIN twitter.tweet_tags r
ON r.tag = s.tag
LEFT
JOIN twitter.tweet q
ON q.id = r.tweet_id
AND q.date >= t.date
AND q.date <= t.date + INTERVAL 7 DAY
GROUP
BY t.id
, s.tag
ORDER
BY cnt DESC
The counts from both of these queries assume that tweet_tags (tweet_id, tag) is unique. If there are any "duplicates", then including the DISTINCT keyword, i.e. COUNT(DISTINCT q.id) (in place of COUNT(1) and COUNT(q.id) respectively) would get you the count of "related" tweets.
NOTE: the counts returned will include the original tweet itself.
NOTE: removing the LEFT keywords from the query above should return an equivalent result, since the tweet/tag (from t/s) is guaranteed to match itself (from r/q), as long as the tag is not null and the tweet date is not null.
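A small check of both queries, and of the note that each count includes the tweet itself. This runs in SQLite via Python with made-up rows; the `twitter.` prefix is dropped, `datetime(t.date, '+7 days')` stands in for `t.date + INTERVAL 7 DAY`, and `t.id` is added to ORDER BY only to make the output order stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tweet (id INTEGER PRIMARY KEY, message TEXT, users_id INTEGER, date TEXT);
CREATE TABLE tweet_tags (id INTEGER PRIMARY KEY, tag TEXT, tweet_id INTEGER);
""")
conn.executemany("INSERT INTO tweet (id, date) VALUES (?, ?)", [
    (1, "2024-03-01 09:00:00"), (2, "2024-03-03 12:00:00"),
    (3, "2024-03-09 10:00:00"), (4, "2024-03-02 08:00:00"),
])
conn.executemany("INSERT INTO tweet_tags (tag, tweet_id) VALUES (?, ?)",
                 [("#sql", 1), ("#sql", 2), ("#sql", 3), ("#php", 4)])

# Correlated subquery version.
CORRELATED = """
SELECT t.id, s.tag,
       (SELECT COUNT(1)
          FROM tweet_tags r
          JOIN tweet q ON q.id = r.tweet_id
         WHERE r.tag = s.tag
           AND q.date >= t.date
           AND q.date <= datetime(t.date, '+7 days')) AS cnt
FROM tweet t
JOIN tweet_tags s ON s.tweet_id = t.id
ORDER BY cnt DESC, t.id
"""

# Join + GROUP BY version.
JOINED = """
SELECT t.id, s.tag, COUNT(q.id) AS cnt
FROM tweet t
JOIN tweet_tags s ON s.tweet_id = t.id
LEFT JOIN tweet_tags r ON r.tag = s.tag
LEFT JOIN tweet q
       ON q.id = r.tweet_id
      AND q.date >= t.date
      AND q.date <= datetime(t.date, '+7 days')
GROUP BY t.id, s.tag
ORDER BY cnt DESC, t.id
"""

correlated = conn.execute(CORRELATED).fetchall()
joined = conn.execute(JOINED).fetchall()
print(correlated)
# [(1, '#sql', 2), (2, '#sql', 2), (3, '#sql', 1), (4, '#php', 1)]
```

Tweet 3 counts only itself: no later #sql tweet falls within its 7 days, and tweet 1 is excluded from tweet 1's count of 2 only for tweet 3, which is more than 7 days later.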
Those queries are going to have problematic performance on large sets. Appropriate covering indexes are going to be needed for acceptable performance:
... ON twitter.tweet_tags (tag, tweet_id)
... ON twitter.tweet (date)
I have a web page where users upload and watch videos. Last week I asked what the best way is to track video views so that I could display the most viewed videos this week (videos from all dates).
Now I need some help optimizing a query with which I get the videos from the database. The relevant tables are these:
video (~239371 rows)
VID(int), UID(int), title(varchar), status(enum), type(varchar), is_duplicate(enum), is_adult(enum), channel_id(tinyint)
signup (~115440 rows)
UID(int), username(varchar)
videos_views (~359202 rows after 6 days of collecting data, so this table will grow rapidly)
videos_id(int), views_date(date), num_of_views(int)
The video table holds the videos, signup holds users, and videos_views holds data about video views (each video can have one row per day in that table).
I have this query that does the trick, but takes ~10s to execute, and I imagine this will only get worse over time as the videos_views table grows in size.
SELECT
v.VID,
v.title,
v.vkey,
v.duration,
v.addtime,
v.UID,
v.viewnumber,
v.com_num,
v.rate,
v.THB,
s.username,
SUM(vvt.num_of_views) AS tmp_num
FROM
video v
LEFT JOIN videos_views vvt ON v.VID = vvt.videos_id
LEFT JOIN signup s on v.UID = s.UID
WHERE
v.status = 'Converted'
AND v.type = 'public'
AND v.is_duplicate = '0'
AND v.is_adult = '0'
AND v.channel_id <> 10
AND vvt.views_date >= '2001-05-11'
GROUP BY
vvt.videos_id
ORDER BY
tmp_num DESC
LIMIT
8
All the relevant fields are indexed.
And here is a screenshot of the EXPLAIN result:
So, how can I optimize this?
UPDATE
This is my query based on the answer by Quassnoi. It returns the correct videos, but it messes up the JOIN on the signup table. For some records the username field is NULL, for others it contains the wrong username.
SELECT
v.VID,
v.title,
v.vkey,
v.duration,
v.addtime,
v.UID,
v.viewnumber,
v.com_num,
v.rate,
v.THB,
s.username
FROM
(SELECT
videos_id,
SUM(num_of_views) AS tmp_num
FROM
videos_views
WHERE
views_date >= '2010-05-13'
GROUP BY
videos_id
) q
JOIN video v ON v.VID = q.videos_id
LEFT JOIN signup s ON s.UID = v.VID
WHERE
v.type = 'public'
AND v.channel_id <> 10
AND v.is_adult = '0'
AND is_duplicate = '0'
ORDER BY
tmp_num DESC
LIMIT
8
Here is the resultset:
Yeah, ORDER BY on a computed column is always going to be unindexable. Sorry.
If you're going to be doing this query a lot and you want to avoid the views for each video having to be counted and ordered each time, you'll have to denormalise. Add a views_in_last_week column, recalculate it from videos_views in the background each day, and index it (possibly in a compound index with other relevant WHERE conditions).
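A sketch of that denormalisation in SQLite via Python with made-up rows. The column name `views_in_last_week` comes from the answer; the refresh function, the index names, and the choice of `type` as the leading index column are assumptions. In production the refresh would run from a scheduled background job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE video (VID INTEGER PRIMARY KEY, title TEXT, type TEXT,
                    views_in_last_week INTEGER NOT NULL DEFAULT 0);
CREATE TABLE videos_views (videos_id INTEGER, views_date TEXT, num_of_views INTEGER);
CREATE INDEX videos_views_vid_date ON videos_views (videos_id, views_date);
-- Hypothetical compound index: filter on type, read rows already sorted by views.
CREATE INDEX video_type_week_views ON video (type, views_in_last_week);
""")
conn.executemany("INSERT INTO video (VID, title, type) VALUES (?, ?, ?)",
                 [(1, "v1", "public"), (2, "v2", "public"), (3, "v3", "private")])
conn.executemany("INSERT INTO videos_views VALUES (?, ?, ?)", [
    (1, "2010-05-14", 4), (1, "2010-05-19", 4),
    (2, "2010-05-12", 50),  # before today - 7 days: not counted
    (2, "2010-05-13", 1),
    (3, "2010-05-15", 100),
])

def refresh_weekly_views(today):
    """Background job: store each video's views dated on or after today - 7 days."""
    conn.execute("""
        UPDATE video
        SET views_in_last_week = COALESCE((
            SELECT SUM(vv.num_of_views)
            FROM videos_views vv
            WHERE vv.videos_id = video.VID
              AND vv.views_date >= date(?, '-7 days')
        ), 0)
    """, (today,))
    conn.commit()

refresh_weekly_views("2010-05-20")

# The page query now sorts on a stored, indexed column instead of a SUM().
top = conn.execute("""
    SELECT VID, views_in_last_week
    FROM video
    WHERE type = 'public'
    ORDER BY views_in_last_week DESC
    LIMIT 8
""").fetchall()
print(top)  # [(1, 8), (2, 1)]
```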
Create the following index:
videos_views (views_date, videos_id)
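, and get rid of the LEFT JOIN between video and videos_views (in your current query it already behaves like an inner join, because the WHERE condition on vvt.views_date discards the rows the LEFT JOIN would add):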
SELECT *
FROM (
SELECT videos_id, SUM(num_of_views) AS tmp_num
FROM videos_views
WHERE views_date >= '2001-05-11'
GROUP BY
videos_id
) q
JOIN video v
ON v.vid = q.videos_id
LEFT JOIN
signup s
ON s.UID = v.UID
ORDER BY
tmp_num DESC
LIMIT 8
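Comparing this with the query in the UPDATE: there the join is `LEFT JOIN signup s ON s.UID = v.VID`, which matches a user ID against a video ID. That would explain the NULL and wrong usernames; this answer joins on `v.UID`. The sketch below runs both variants in SQLite via Python on made-up rows with only the needed columns; the date filter is the one from the UPDATE, and the index name and `top_videos` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE video (VID INTEGER PRIMARY KEY, UID INTEGER, title TEXT, type TEXT);
CREATE TABLE signup (UID INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE videos_views (videos_id INTEGER, views_date TEXT, num_of_views INTEGER);
CREATE INDEX vv_date_video ON videos_views (views_date, videos_id);
""")
conn.executemany("INSERT INTO signup VALUES (?, ?)", [(1, "ann"), (2, "ben")])
conn.executemany("INSERT INTO video VALUES (?, ?, ?, ?)", [
    (1, 2, "v1", "public"), (2, 1, "v2", "public"), (3, 2, "v3", "public"),
])
conn.executemany("INSERT INTO videos_views VALUES (?, ?, ?)", [
    (1, "2010-05-13", 5), (1, "2010-05-14", 5), (2, "2010-05-14", 7),
    (3, "2010-05-15", 3), (3, "2010-05-01", 100),  # last row is before the cutoff
])

TOP_VIDEOS = """
SELECT v.VID, v.title, s.username, q.tmp_num
FROM (
    SELECT videos_id, SUM(num_of_views) AS tmp_num
    FROM videos_views
    WHERE views_date >= ?
    GROUP BY videos_id
) q
JOIN video v ON v.VID = q.videos_id
LEFT JOIN signup s ON s.UID = {uploader}
WHERE v.type = 'public'
ORDER BY q.tmp_num DESC
LIMIT 8
"""

def top_videos(uploader_column):
    # Only these two fixed column names are ever formatted into the SQL text.
    if uploader_column not in ("v.UID", "v.VID"):
        raise ValueError(uploader_column)
    sql = TOP_VIDEOS.format(uploader=uploader_column)
    return conn.execute(sql, ("2010-05-13",)).fetchall()

print(top_videos("v.VID"))  # join from the UPDATE: wrong or missing usernames
print(top_videos("v.UID"))  # join on the uploader's user ID
```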
If you want to return zero for videos that had never been viewed, change the order of fields in the index:
videos_views (videos_id, views_date)
and rewrite the query:
SELECT *,
(
SELECT COALESCE(SUM(num_of_views), 0)
FROM videos_views vw
WHERE vw.videos_id = v.vid
AND views_date >= '2001-05-11'
) AS tmp_num
FROM video v
LEFT JOIN
signup s
ON s.UID = v.UID
ORDER BY
tmp_num DESC
LIMIT 8
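A sketch showing that the correlated-subquery version keeps videos with no views in the period and returns 0 for them (SQLite via Python, made-up rows; `v.VID` is added to ORDER BY only for a stable order). Because the sum is computed for every video before sorting, this variant is about returning zeros rather than about speed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE video (VID INTEGER PRIMARY KEY, UID INTEGER, title TEXT);
CREATE TABLE signup (UID INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE videos_views (videos_id INTEGER, views_date TEXT, num_of_views INTEGER);
-- Index column order from the answer: video first, then date.
CREATE INDEX vv_video_date ON videos_views (videos_id, views_date);
""")
conn.executemany("INSERT INTO signup VALUES (?, ?)", [(1, "ann"), (2, "ben")])
conn.executemany("INSERT INTO video VALUES (?, ?, ?)",
                 [(1, 1, "v1"), (2, 2, "v2"), (3, 2, "never watched")])
conn.executemany("INSERT INTO videos_views VALUES (?, ?, ?)",
                 [(1, "2010-05-14", 9), (2, "2010-05-15", 4), (2, "2000-01-01", 50)])

rows = conn.execute("""
    SELECT v.VID, s.username,
           (SELECT COALESCE(SUM(num_of_views), 0)   -- 0 instead of NULL when no rows match
              FROM videos_views vw
             WHERE vw.videos_id = v.VID
               AND vw.views_date >= '2001-05-11') AS tmp_num
    FROM video v
    LEFT JOIN signup s ON s.UID = v.UID
    ORDER BY tmp_num DESC, v.VID
    LIMIT 8
""").fetchall()
print(rows)  # [(1, 'ann', 9), (2, 'ben', 4), (3, 'ben', 0)]
```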