Count tweets between dates (mysql) - mysql

I have an assignment to create a Twitter-like database, and as part of it I have to filter out the trending topics. My idea was to count the tweets with a specific tag between the date the tweet was made and 7 days later, and order them by the count.
I am using the following 2 tables for this query:
Table Tweet : id , message, users_id, date
Table Tweet_tags : id, tag, tweet_id
Since MySQL isn't my strong point at all, I'm having trouble getting any results from the query.
The query I tried is:
Select
Count(twitter.tweet_tags.id) As NumberofTweets,
twitter.tweet_tags.tag
From twitter.tweet
Inner Join twitter.tweet_tags On twitter.tweet_tags.tweet_id = twitter.tweet.id
WHERE twitter.tweet_tags.tag between twitter.tweet.date and ADDDATE(twitter.tweet.date, INTERVAL 7 day)
ORDER BY NumberofTweets
The query runs, but returns no results. I just can't get it to work. Could you please help me out with this, or if you have a better way to get the trending topics, please let me know!
Thanks a lot!

This is equivalent to your query, with table aliases to make it easier to read, with BETWEEN replaced by two inequality predicates, and the ADDDATE function replaced with equivalent operation...
SELECT COUNT(s.id) As NumberofTweets
, s.tag
FROM twitter.tweet t
JOIN twitter.tweet_tags s
ON s.tweet_id = t.id
WHERE s.tag >= t.date
AND s.tag <= t.date + INTERVAL 7 DAY
ORDER
BY NumberofTweets
Two things pop out at me here...
First, there is no GROUP BY. To get a count by "tag", you want a GROUP BY tag.
Second, you are comparing "tag" to "date". I don't know your tables, but that just doesn't look right. (I expect "date" is a DATETIME or TIMESTAMP, and "tag" is a character string, maybe what my daughter calls a "hash tag". Or is that tumblr she's talking about?)
If I understand your requirement:
For each tweet, and for each tag associated with that tweet, you want to get a count of the number of other tweets, that have a matching tag, that are made within 7 days after the datetime of the tweet.
One way to get this result would be to use a correlated subquery. (This is probably the easiest approach to understand, but is probably not the best approach from a performance standpoint).
SELECT t.id
, s.tag
, ( SELECT COUNT(1)
FROM twitter.tweet_tags r
JOIN twitter.tweet q
ON q.id = r.tweet_id
WHERE r.tag = s.tag
AND q.date >= t.date
AND q.date <= t.date + INTERVAL 7 DAY
) AS cnt
FROM twitter.tweet t
JOIN twitter.tweet_tags s
ON s.tweet_id = t.id
ORDER
BY cnt DESC
Another approach would be to use a join operation:
SELECT t.id
, s.tag
, COUNT(q.id) AS cnt
FROM twitter.tweet t
JOIN twitter.tweet_tags s
ON s.tweet_id = t.id
LEFT
JOIN twitter.tweet_tags r
ON r.tag = s.tag
LEFT
JOIN twitter.tweet q
ON q.id = r.tweet_id
AND q.date >= t.date
AND q.date <= t.date + INTERVAL 7 DAY
GROUP
BY t.id
, s.tag
ORDER
BY cnt DESC
The counts from both of these queries assume that tweet_tags (tweet_id, tag) is unique. If there are any "duplicates", then including the DISTINCT keyword, i.e. COUNT(DISTINCT q.id) (in place of COUNT(1) and COUNT(q.id) respectively) would get you the count of "related" tweets.
NOTE: the counts returned will include the original tweet itself.
NOTE: removing the LEFT keywords from the query above should return an equivalent result, since the tweet/tag (from t/s) is guaranteed to match itself (from r/q), as long as the tag is not null and the tweet date is not null.
Those queries are going to have problematic performance on large sets. Appropriate covering indexes are going to be needed for acceptable performance:
... ON twitter.tweet_tags (tag, tweet_id)
... ON twitter.tweet (date)
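To see how the join-based count behaves, including the fact that each tweet counts itself, here is a small sketch you can run. It is not MySQL: it uses SQLite through Python's sqlite3 module so it needs no server, the sample rows are made up, and `datetime(t.date, '+7 days')` stands in for `t.date + INTERVAL 7 DAY`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tweet (id INTEGER PRIMARY KEY, message TEXT,
                        users_id INTEGER, date TEXT);
    CREATE TABLE tweet_tags (id INTEGER PRIMARY KEY, tag TEXT,
                             tweet_id INTEGER);
    INSERT INTO tweet VALUES
        (1, 'a', 1, '2013-01-01 10:00:00'),
        (2, 'b', 2, '2013-01-03 10:00:00'),
        (3, 'c', 3, '2013-01-20 10:00:00'),
        (4, 'd', 1, '2013-01-02 10:00:00');
    INSERT INTO tweet_tags VALUES
        (1, 'sql', 1),
        (2, 'sql', 2),
        (3, 'sql', 3),
        (4, 'php', 4);
""")

# For each tweet/tag pair, count tweets with the same tag posted
# from that tweet's date up to 7 days later (the tweet itself included).
rows = conn.execute("""
    SELECT t.id, s.tag, COUNT(q.id) AS cnt
      FROM tweet t
      JOIN tweet_tags s ON s.tweet_id = t.id
      JOIN tweet_tags r ON r.tag = s.tag
      JOIN tweet q      ON q.id = r.tweet_id
                       AND q.date >= t.date
                       AND q.date <= datetime(t.date, '+7 days')
     GROUP BY t.id, s.tag
     ORDER BY cnt DESC, t.id
""").fetchall()

for row in rows:
    print(row)
```

Tweet 1 gets a count of 2 (itself plus tweet 2, posted two days later); every other tweet only matches itself, so its count is 1.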

Related

Searching and Sorting Using MySQL Inner Join

I guess I can't explain my problem properly. I want to explain this to you with a picture.
Picture 1
In the first picture you can see the hashtags in the trend section. These hashtags are sorted by the highest total, and it is checked whether the date has passed. If valid data is available, the first 5 hashtags are taken.
Picture 2
In the second picture, the posts are checked to see whether they contain the hashtag. If any do, the one with the oldest date is taken (LIMIT 1), and the id value from the oyuncular table is matched with sid, so the name of the person who shared it can be accessed.
Picture 3
My English is a little bad, I hope I could explain it properly.
SELECT
social_trend.hashtag,
social_trend.total,
social_trend.tarih,
social_post.sid,
social_post.tarih,
social_post.post,
oyuncular.id,
oyuncular.isim
FROM
social_trend
INNER JOIN
social_post
ON
social_post.post LIKE '%social_trend.hashtag%' ORDER BY social_post.tarih LIMIT 1
INNER JOIN
oyuncular
ON
oyuncular.id = social_post.sid
WHERE
social_trend.tarih > UNIX_TIMESTAMP() ORDER BY social_trend.total DESC LIMIT 5
You should use a subquery
and add a proper join between the subquery and social_trend
(I assumed using both sid)
SELECT
social_trend.hashtag,
social_trend.total,
social_trend.tarih,
t.sid,
t.tarih,
t.post,
oyuncular.id,
oyuncular.isim
FROM (
select social_post.*
from social_post
INNER JOIN social_trend ON social_post.post LIKE concat('%',social_trend.hashtag,'%' )
ORDER BY social_post.tarih LIMIT 1
) t
INNER JOIN social_trend ON social_trend.hashtag= t.post
INNER JOIN oyuncular ON oyuncular.id = t.sid
WHERE
social_trend.tarih > UNIX_TIMESTAMP() ORDER BY social_trend.total DESC LIMIT 5
But looking at your new explanation and image, it seems you need:
SELECT
t.hashtag,
t.total,
t.tarih_trend,
t.sid,
t.tarih,
t.post,
oyuncular.id,
oyuncular.isim
FROM (
select social_post.sid
, social_post.tarih
, social_post.post
, st.hashtag
, st.total
, st.tarih tarih_trend
from social_post
INNER JOIN (
select * from social_trend
WHERE social_trend.tarih > UNIX_TIMESTAMP()
order by total DESC LIMIT 5
) st ON social_post.post LIKE concat('%',st.hashtag,'%' )
ORDER BY social_post.tarih LIMIT 5
) t
INNER JOIN oyuncular ON oyuncular.id = t.sid

Passing argument in LEFT JOIN

I am currently trying to get data from 2 tables with a LEFT JOIN that depends on an unknown value.
I tried using LEFT JOIN but it didn't work.
Here is my code example :
SELECT
cc.shid,
cc.user,
ts.type,
sum(cc.qty1) + sum(cc.qty2) as qty_tot,
COUNT(cc.id) as nb
FROM
content_c cc
LEFT JOIN
(SELECT
s.shid,
s.type
FROM
tab_s s
LIMIT 1
) as ts ON ts.shid = cc.shid
WHERE
cc.time_i like '2019-01%'
GROUP BY
cc.user,
ts.type
That query will never work: ts will contain the first occurrence in tab_s regardless of cc.shid. I wonder if there is a way to do this:
LEFT JOIN
(SELECT
s.shid,
s.type
FROM
tab_s s
WHERE
s.shid = cc.shid
LIMIT 1
) as ts ON ts.shid = cc.shid
Any ideas? Is there a notion of pointers in SQL, or something like it? For example, could I use &cc.shid or #cc.shid?
Note that doing the following :
LEFT JOIN tab_s ts ON ts.shid = cc.shid
Will make my query take more than 1 minute to display results. And I cannot set an index on tab_s.shid or on cc.shid, since both have multiple occurrences.
Please keep in mind that content_c can have multiple occurrences of cc.shid; that's why I need to take only the first result (LIMIT 1). It's important.
Use a correlated subquery:
SELECT cc.shid, cc.user, cc.type,
SUM(cc.qty1) + SUM(cc.qty2) as qty_tot,
COUNT(cc.id) as nb
FROM (SELECT cc.*,
(SELECT s.type
FROM tab_s s
WHERE s.shid = cc.shid
LIMIT 1
) as type
FROM content_c cc
) cc
WHERE cc.time_i >= '2019-01-01' AND
cc.time_i < '2019-02-01'
GROUP BY cc.shid, cc.user, cc.type;
Notes:
The use of LIMIT with no ORDER BY is suspicious. Why would there be duplicates in the underlying table?
Your date comparisons are bad. Use date/time functions when working with date/time values. Don't use string functions.
The GROUP BY should include all non-aggregated columns in the SELECT.
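A small runnable sketch of the correlated-subquery idea, in case it helps to see it on data. Assumptions: SQLite via Python's sqlite3 instead of MySQL, invented sample rows, time_i stored as text, and an ORDER BY added inside the subquery so the LIMIT 1 pick is deterministic. The second query shows what a plain LEFT JOIN does when tab_s has duplicate shid rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content_c (id INTEGER PRIMARY KEY, shid INTEGER, user TEXT,
                            qty1 INTEGER, qty2 INTEGER, time_i TEXT);
    CREATE TABLE tab_s (shid INTEGER, type TEXT);
    INSERT INTO content_c VALUES
        (1, 10, 'ann', 1, 2, '2019-01-05 08:00:00'),
        (2, 10, 'ann', 3, 4, '2019-01-20 09:00:00'),
        (3, 20, 'bob', 5, 6, '2019-01-31 23:59:59'),
        (4, 20, 'bob', 7, 8, '2019-02-01 00:00:00');
    INSERT INTO tab_s VALUES (10, 'A'), (10, 'A'), (20, 'B');
""")

# The inner SELECT can reference c.shid from the enclosing row,
# which is exactly the "pass an argument" behaviour asked about.
rows = conn.execute("""
    SELECT cc.shid, cc.user, cc.type,
           SUM(cc.qty1) + SUM(cc.qty2) AS qty_tot,
           COUNT(cc.id) AS nb
      FROM (SELECT c.*,
                   (SELECT s.type
                      FROM tab_s s
                     WHERE s.shid = c.shid
                     ORDER BY s.type
                     LIMIT 1) AS type
              FROM content_c c) cc
     WHERE cc.time_i >= '2019-01-01'
       AND cc.time_i <  '2019-02-01'
     GROUP BY cc.shid, cc.user, cc.type
     ORDER BY cc.shid
""").fetchall()

# For contrast: a plain join multiplies rows when tab_s repeats a shid.
naive_count = conn.execute("""
    SELECT COUNT(cc.id)
      FROM content_c cc
      LEFT JOIN tab_s ts ON ts.shid = cc.shid
     WHERE cc.shid = 10
""").fetchone()[0]

print(rows)
print(naive_count)
```

With this data the subquery version counts 2 rows for shid 10, while the plain join counts 4 because each content_c row matches both duplicate tab_s rows.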
As discussed in the question comments, can you please try this script and see if it meets your requirements? It will return one row per group from the "content_c" table, without a LIMIT in the joined subquery.
SELECT
cc.shid,
cc.user,
ts.type,
sum(cc.qty1) + sum(cc.qty2) as qty_tot,
COUNT(cc.id) as nb
FROM content_c cc
LEFT JOIN
(
SELECT DISTINCT s.shid, s.type FROM tab_s s
) AS ts ON ts.shid = cc.shid
WHERE cc.time_i like '2019-01%'
GROUP BY cc.shid,cc.user,ts.type

group by month and year, count from another table

I'm trying to get my query to group rows by month and year from the assignments table, and count the number of rows that have a certain value in the leads table. They are linked together: the assignments table has an id_lead field, which is the id of the row in the leads table.
d_new would be a count of the assignments for leads for the month whose website is newsite.com
d_subprime would be a count of the assignments for leads for the month whose website is not newsite.com
here are the tables being used:
`leads`
id (int)
website (varchar)
`assignments`
id_lead (int)
date_assigned (int)
Here's my query, which is not working:
SELECT
MONTHNAME(FROM_UNIXTIME(a.date_assigned)) as d_month,
YEAR(FROM_UNIXTIME(a.date_assigned)) as d_year,
(select COUNT(*) from leads where website='newsite.com' ) as d_new,
(select COUNT(*) from leads where website!='newsite.com') as d_subprime
FROM assignments as a
left join leads as l on (l.id = a.id_lead)
where id_dealership='$id_dealership2'
GROUP BY
d_month,
d_year
ORDER BY
d_year asc,
MONTH(FROM_UNIXTIME(a.date_assigned)) asc
$id_dealership is a variable containing the id of the dealership I'm trying to view the count for.
Any help would be greatly appreciated.
You can sort of truncate your timestamps to months and use the obtained values for grouping, then derive the necessary date parts from them:
SELECT
YEAR(d_yearmonth) AS d_year,
MONTHNAME(d_yearmonth) AS d_month,
…
FROM (
SELECT
LAST_DAY(FROM_UNIXTIME(a.date_assigned)) as d_yearmonth,
…
FROM assignments AS a
LEFT JOIN leads AS l ON (l.id = a.id_lead)
WHERE id_dealership = '$id_dealership2'
GROUP BY
d_yearmonth
) AS s
ORDER BY
d_year ASC,
MONTH(d_yearmonth) ASC
Well, LAST_DAY() doesn't really truncate a timestamp, but it does turn all the values belonging to the same month into the same value, which is basically what we need.
And I guess the counts should be related to the rows you are actually selecting, which is not what your subqueries are. Something like this might do:
…
COUNT(l.website = 'newsite.com' OR NULL) AS d_new,
/* or: COUNT(l.website) - COUNT(NULLIF(l.website, 'newsite.com')) AS d_new */
COUNT(NULLIF(l.website, 'newsite.com')) AS d_subprime
…
Here's the entire query with all the modifications mentioned:
SELECT
YEAR(d_yearmonth) AS d_year,
MONTHNAME(d_yearmonth) AS d_month,
d_new,
d_subprime
FROM (
SELECT
LAST_DAY(FROM_UNIXTIME(a.date_assigned)) as d_yearmonth,
COUNT(l.website = 'newsite.com' OR NULL) AS d_new,
COUNT(NULLIF(l.website, 'newsite.com')) AS d_subprime
FROM assignments AS a
LEFT JOIN leads AS l ON (l.id = a.id_lead)
WHERE id_dealership = '$id_dealership2'
GROUP BY
d_yearmonth
) AS s
ORDER BY
d_year ASC,
MONTH(d_yearmonth) ASC
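If the COUNT(... OR NULL) and COUNT(NULLIF(...)) tricks look opaque, here is a rough sketch that runs them on a few made-up rows. It assumes SQLite through Python's sqlite3 rather than MySQL, so `strftime('%Y-%m', ..., 'unixepoch')` stands in for the LAST_DAY(FROM_UNIXTIME(...)) month bucket, and the id_dealership filter is left out because the question doesn't say which table it lives in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 1325376000 is 2012-01-01 00:00:00 UTC, 1328054400 is 2012-02-01 00:00:00 UTC.
# Lead 99 does not exist, so the LEFT JOIN yields a NULL website for it.
conn.executescript("""
    CREATE TABLE leads (id INTEGER PRIMARY KEY, website TEXT);
    CREATE TABLE assignments (id_lead INTEGER, date_assigned INTEGER);
    INSERT INTO leads VALUES
        (1, 'newsite.com'), (2, 'other.com'), (3, 'newsite.com');
    INSERT INTO assignments VALUES
        (1, 1325376000), (2, 1325376000), (3, 1325376000),
        (2, 1328054400), (99, 1328054400);
""")

# COUNT(expr) only counts non-NULL values:
#   (website = 'newsite.com' OR NULL) is non-NULL only when the test is true;
#   NULLIF(website, 'newsite.com') is NULL exactly when website matches.
rows = conn.execute("""
    SELECT strftime('%Y-%m', a.date_assigned, 'unixepoch') AS d_yearmonth,
           COUNT(l.website = 'newsite.com' OR NULL)        AS d_new,
           COUNT(NULLIF(l.website, 'newsite.com'))         AS d_subprime
      FROM assignments a
      LEFT JOIN leads l ON l.id = a.id_lead
     GROUP BY d_yearmonth
     ORDER BY d_yearmonth
""").fetchall()

for row in rows:
    print(row)
```

January has two newsite.com leads and one other; February has one other lead plus an unmatched assignment, which neither counter picks up because its website is NULL.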
This should do the trick:
SELECT
YEAR(FROM_UNIXTIME(a.date_assigned)) as d_year,
MONTHNAME(FROM_UNIXTIME(a.date_assigned)) as d_month,
l.website,
COUNT(*)
FROM
assignments AS a
INNER JOIN leads AS l on (l.id = a.id_lead) /*are you sure, that you need a LEFT JOIN?*/
WHERE id_dealership='$id_dealership2'
GROUP BY
d_year, d_month, website
/*MySQL before 8.0 sorts implicitly on the GROUP BY columns; 8.0 and later do not, so add an ORDER BY if the order matters*/
If you really need a LEFT JOIN, be aware that COUNT() ignores NULL values. If you want to count those as well (which I can't imagine to make sense) write it like this:
SELECT
YEAR(FROM_UNIXTIME(a.date_assigned)) as d_year,
MONTHNAME(FROM_UNIXTIME(a.date_assigned)) as d_month,
l.website,
COUNT(COALESCE(l.id, 1))
FROM
assignments AS a
LEFT JOIN leads AS l on (l.id = a.id_lead)
WHERE id_dealership='$id_dealership2'
GROUP BY
d_year, d_month, website
Start with
SELECT
MONTHNAME(FROM_UNIXTIME(a.date_assigned)) as d_month,
YEAR(FROM_UNIXTIME(a.date_assigned)) as d_year,
SUM(IF(l.website='newsite.com',1,0)) AS d_new,
SUM(IF(l.website IS NOT NULL AND l.website!='newsite.com',1,0)) AS d_subprime
FROM assignments AS a
LEFT JOIN leads AS l ON l.id = a.id_lead
WHERE id_dealership='$id_dealership2'
GROUP BY
d_month,
d_year
ORDER BY
d_year asc,
MONTH(FROM_UNIXTIME(a.date_assigned)) asc
and work from here: The field id_dealership is neither in leads nor in assignments, so you need more work.
If you edit your question to account for id_dealership we might be able to help you further.

single result from a query with group by

Please help me to write this query.
I want to get all Words for which the last repetition for a given user has a date earlier than today OR there is no repetition.
I have something like this, but it's incorrect.
SELECT * FROM `Word` w LEFT JOIN (
SELECT * FROM `Repetition`
GROUP BY word_id
ORDER BY next DESC
) r ON w.id = r.word_id
WHERE wordset_id = 1 AND (r.user_id IS NULL) OR r.next < CURRENT_DATE
"Subquery" should return a table of last repetition for a given user/word combination
I want to get: All words that don't have any repetitions or have due ( meaning earlier or today) repetitions (both for a given user)
SELECT * FROM `Word` w LEFT JOIN
(SELECT word_id, user_id, `repNo`, `repCount`, `date`, `ef`, MAX(next) next
FROM `Repetition` GROUP BY `word_id`,`user_id`) r ON w.id = r.word_id
WHERE wordset_id = 1 AND (r.user_id IS NULL OR r.next < CURRENT_DATE)
I might not understand the problem completely, but to word it differently, you're wanting to return all of the words that don't have repetitions today?
SELECT * FROM word
WHERE word_id NOT IN (
SELECT word_id FROM repetition WHERE next >= CURRENT_DATE
)
Something like that or am I way off? Not really sure what you're trying to do with wordset_id or user_id there.
I think you don't have to use a subquery here; you should be fine with just a join:
SELECT Word.*
FROM Word
-- Left join makes sure that you get one result for each word
LEFT JOIN Repetition ON (
(Word.id = Repetition.word_id)
-- Get just one user's result
AND (Repetition.user_id = ?)
)
-- And filter today's results:
WHERE (Repetition.id IS NULL) OR (Repetition.next < NOW())
GROUP BY Word.id
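For comparison, here is a sketch of the "latest repetition per word for one user" reading of the question, which is one way to interpret it and not necessarily what the asker needs. Assumptions: SQLite via Python's sqlite3, invented rows, `next` stored as a YYYY-MM-DD string, a hypothetical `word` text column, and `date('now')` in place of CURRENT_DATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Word 1: only repetition is in the past, so it is due.
# Word 2: latest repetition is far in the future, so it is not due.
# Word 3: only repeated by user 8, so user 7 has no repetition for it.
# Word 4: belongs to another wordset.
conn.executescript("""
    CREATE TABLE Word (id INTEGER PRIMARY KEY, wordset_id INTEGER, word TEXT);
    CREATE TABLE Repetition (id INTEGER PRIMARY KEY, word_id INTEGER,
                             user_id INTEGER, next TEXT);
    INSERT INTO Word VALUES
        (1, 1, 'apple'), (2, 1, 'bread'), (3, 1, 'cloud'), (4, 2, 'door');
    INSERT INTO Repetition (word_id, user_id, next) VALUES
        (1, 7, '2000-01-01'),
        (2, 7, '2000-01-01'),
        (2, 7, '2999-01-01'),
        (3, 8, '2999-01-01');
""")

def due_words(conn, user_id, wordset_id):
    """Ids of words with no repetition for the user, or whose latest one is past."""
    return conn.execute("""
        SELECT w.id
          FROM Word w
          LEFT JOIN (SELECT word_id, MAX(next) AS last_next
                       FROM Repetition
                      WHERE user_id = ?
                      GROUP BY word_id) r ON r.word_id = w.id
         WHERE w.wordset_id = ?
           AND (r.word_id IS NULL OR r.last_next < date('now'))
         ORDER BY w.id
    """, (user_id, wordset_id)).fetchall()

result = due_words(conn, 7, 1)
print(result)
```

Filtering by user inside the derived table keeps words that only other users have repeated, and the parentheses around the OR keep the wordset filter applied to both cases.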

Using INNER JOIN to receive crossed data by date range

I have a theme gallery. In the dashboard i have to display the most viewed themes BY date (today, last 7 days, last 30 days, all time).
These are the 2 involved tables:
theme
id_theme
title
views
id_view
id_theme
date
The $timestamp values are calculated with mktime() (no prob in there).
This is my current SQL query:
SELECT t.id_theme,t.title,
(SELECT COUNT(*)
FROM views
WHERE views.id_theme=t.id_theme
AND views.date BETWEEN '.$timestamp1.' AND '.$timestamp2.')
AS q
FROM theme AS t
INNER JOIN views ON t.id_theme = views.id_theme
GROUP BY views.id_theme
ORDER BY q
DESC LIMIT 10
The catch is that sometimes it returns themes with 0 views, and that should not happen. I tried changing the INNER JOIN to a RIGHT JOIN, with no results. Any ideas?
Hmm. not sure why you're using subqueries for this, seems like this would work better:
SELECT theme.id_theme, theme.title, COUNT(views.id_view) AS view_count
FROM theme
LEFT JOIN views ON (theme.id_theme = views.id_theme)
WHERE views.date > DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY theme.id_theme
HAVING view_count > 0
ORDER BY view_count DESC
SELECT t.id_theme, t.title, COUNT(*) AS q
FROM theme AS t
INNER JOIN views ON t.id_theme = views.id_theme
AND views.date BETWEEN '.$timestamp1.' AND '.$timestamp2.'
GROUP BY t.id_theme, t.title
ORDER BY q
DESC LIMIT 10
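A quick way to convince yourself that putting the date range in the join condition drops the zero-view themes is to run it on a toy dataset. The sketch below is an assumption-laden stand-in: SQLite via Python's sqlite3 instead of MySQL and PHP, made-up rows with small integers as timestamps, and bound parameters where the original concatenates $timestamp1 and $timestamp2 into the string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE theme (id_theme INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE views (id_view INTEGER PRIMARY KEY, id_theme INTEGER,
                        date INTEGER);
    INSERT INTO theme VALUES (1, 'Dark'), (2, 'Light'), (3, 'Retro');
    INSERT INTO views (id_theme, date) VALUES
        (1, 100), (1, 150), (1, 900),
        (2, 120),
        (3, 950);
""")

def top_themes(conn, ts_from, ts_to, limit=10):
    """Most viewed themes whose views fall inside [ts_from, ts_to]."""
    return conn.execute("""
        SELECT t.id_theme, t.title, COUNT(*) AS q
          FROM theme t
          JOIN views v ON v.id_theme = t.id_theme
                      AND v.date BETWEEN ? AND ?
         GROUP BY t.id_theme, t.title
         ORDER BY q DESC, t.id_theme
         LIMIT ?
    """, (ts_from, ts_to, limit)).fetchall()

in_range = top_themes(conn, 100, 200)
print(in_range)
```

Theme 3 has a view, but not inside the range, so the inner join produces no row for it and it never reaches the GROUP BY.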