Count retweets per user by linking two tables - mysql

I have the following tables:
users                  retweets
-----------------      ----------------
user_id  retweets      user_id (etc...)
-----------------      ----------------
1        0             1
2        0             1
                       1
                       2
                       2
I want to count the number of retweets per user and update users.retweets accordingly:
UPDATE users
SET retweets = (
SELECT COUNT(*) FROM retweets WHERE retweets.user_id = users.user_id
)
I have run this query twice, and it timed out both times (on tables that are not that large). Is my query wrong?
Also see the SQL Fiddle (although it apparently doesn't allow UPDATE statements): http://www.sqlfiddle.com/#!2/f591e/1

This solution should be much faster than using a subquery to get the retweet count of each user (your correlated subquery executes once for every user, whereas here the retweets table is aggregated only once):
UPDATE users a
LEFT JOIN
(
SELECT user_id, COUNT(1) AS retweet_count
FROM retweets
GROUP BY user_id
) b ON a.user_id = b.user_id
SET a.retweets = COALESCE(b.retweet_count, 0)
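To see what values that UPDATE would assign, here is a minimal sketch using Python's built-in sqlite3 module. It is not MySQL: SQLite does not accept the UPDATE ... JOIN syntax, so the sketch runs the same derived table as a SELECT and writes the counts back separately. User 3 (with no retweets) is a made-up row added to show the COALESCE fallback.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, retweets INTEGER DEFAULT 0);
    CREATE TABLE retweets (user_id INTEGER);
    INSERT INTO users (user_id) VALUES (1), (2), (3);  -- user 3 is hypothetical
    INSERT INTO retweets (user_id) VALUES (1), (1), (1), (2), (2);
""")

# Same derived table as in the UPDATE above: retweets is aggregated once,
# then each user is matched to its (possibly missing) count.
rows = conn.execute("""
    SELECT a.user_id, COALESCE(b.retweet_count, 0)
    FROM users a
    LEFT JOIN (
        SELECT user_id, COUNT(*) AS retweet_count
        FROM retweets
        GROUP BY user_id
    ) b ON a.user_id = b.user_id
    ORDER BY a.user_id
""").fetchall()

# Write the counts back with one parameterised UPDATE per user.
conn.executemany(
    "UPDATE users SET retweets = ? WHERE user_id = ?",
    [(count, user_id) for user_id, count in rows],
)
counts = dict(conn.execute("SELECT user_id, retweets FROM users"))
print(counts)
```

With the sample rows from the question, user 1 ends up with 3 retweets, user 2 with 2, and the extra user with 0.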

If your retweets table is not changing dynamically, why not gather the counts first and then update the destination table, like this:
CREATE TABLE retweets_hist AS SELECT COUNT(*) AS retweets, user_id FROM retweets GROUP BY user_id;
then
-- COALESCE is used instead of Oracle's NVL so the statement also runs on MySQL
UPDATE users
SET retweets = COALESCE((
SELECT retweets FROM retweets_hist WHERE retweets_hist.user_id = users.user_id
), 0)
If it is dynamic, then I think using triggers is better.
The main issue here is that when a user has never retweeted, counting their retweets is still time-consuming.
In answer to your question: yes, counting takes only a fraction of the time, but counting something that never existed takes time too. That is the problem!
Maybe this one would have better timing:
UPDATE users
SET retweets = (
-- the retweets table has no retweets column, so count the matching rows
SELECT COUNT(*)
FROM retweets
WHERE retweets.user_id = users.user_id)
WHERE EXISTS (SELECT *
FROM retweets
WHERE retweets.user_id = users.user_id)
But then you still have to set the count to zero for users who have never retweeted.
Note: EXISTS is an Oracle keyword; I don't know whether MySQL supports it.
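For what it's worth, MySQL does accept EXISTS and NOT EXISTS in a WHERE clause. The following is a rough sketch of the summary-table idea with both steps (update users that have retweets, then zero out the rest), run on SQLite through Python's sqlite3 module rather than on MySQL or Oracle. The starting value 99 and user 3 are made up to make the effect of each step visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, retweets INTEGER);
    CREATE TABLE retweets (user_id INTEGER);
    INSERT INTO users VALUES (1, 99), (2, 99), (3, 99);  -- 99 stands for a stale count
    INSERT INTO retweets VALUES (1), (1), (1), (2), (2);

    -- Step 1: gather the counts once.
    CREATE TABLE retweets_hist AS
        SELECT user_id, COUNT(*) AS retweets FROM retweets GROUP BY user_id;

    -- Step 2: update only the users that have a summary row.
    UPDATE users
    SET retweets = (SELECT h.retweets FROM retweets_hist h
                    WHERE h.user_id = users.user_id)
    WHERE EXISTS (SELECT 1 FROM retweets_hist h WHERE h.user_id = users.user_id);

    -- Step 3: users without a summary row have never retweeted.
    UPDATE users
    SET retweets = 0
    WHERE NOT EXISTS (SELECT 1 FROM retweets_hist h WHERE h.user_id = users.user_id);
""")

counts = dict(conn.execute("SELECT user_id, retweets FROM users"))
print(counts)
```

Whether this is actually faster than a single join-based update depends on the data and indexes, so it would need measuring on the real tables.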

Related

Display all rows in table even when WHERE statement does not match

I have a table declarations where users can record the time they spent on projects. Declarations have a begin and end time.
For a specific project (id = 1), I want a grand total of the seconds that every user has spent on the project, even if a user didn't spend any time on the project at all.
Tables (simplified):
users
-----
- id
- name
- ...
projects
--------
- id
- name
declarations
------------
- id
- user_id
- project_id
- start
- end
Let's say there are 2 users. User ID=1 has spent some time on projects, and user ID=2 didn't do anything.
select users.*, sum(timestampdiff(second, declarations.start, declarations.end)) as seconds
from users
join declarations on declarations.user_id = users.id
where declarations.project_id = 1
group by users.id
With the above query, only user 1 will appear. How can I modify the query in such a way that it includes all the other users as well, with a value of 0 for seconds?
Use a LEFT OUTER JOIN and move the WHERE condition into the JOIN's ON condition, like this:
select users.*,
sum(timestampdiff(second, declarations.start, declarations.end)) as seconds
from users
left join declarations on declarations.user_id = users.id
and declarations.project_id = 1
group by users.id
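The difference between filtering in WHERE and filtering in ON can be checked with a small sketch on SQLite via Python's sqlite3 module. Assumptions: SQLite has no timestampdiff, so the hypothetical columns started_at and ended_at hold plain integer seconds, and the user names and rows are invented. The sketch also wraps the sum in COALESCE, because SUM over no matching rows yields NULL rather than the 0 the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE declarations (
        id INTEGER PRIMARY KEY, user_id INTEGER, project_id INTEGER,
        started_at INTEGER, ended_at INTEGER  -- hypothetical: seconds as integers
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO declarations VALUES
        (1, 1, 1, 0, 3600), (2, 1, 1, 7200, 9000), (3, 1, 2, 0, 60);
""")

# Filter in WHERE: the NULL row produced for bob fails the test and is dropped.
filter_in_where = conn.execute("""
    SELECT u.id, COALESCE(SUM(d.ended_at - d.started_at), 0) AS seconds
    FROM users u
    LEFT JOIN declarations d ON d.user_id = u.id
    WHERE d.project_id = 1
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()

# Filter in ON: bob keeps his row, and COALESCE turns the NULL sum into 0.
filter_in_on = conn.execute("""
    SELECT u.id, COALESCE(SUM(d.ended_at - d.started_at), 0) AS seconds
    FROM users u
    LEFT JOIN declarations d ON d.user_id = u.id AND d.project_id = 1
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```

In MySQL the same COALESCE wrapper can go around the sum(timestampdiff(...)) expression.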

SQL Query to count sessions without repeating lines

I have two tables that are joined on a field called user_id. The first table, called sessions, can have multiple lines for the same day. I'm trying to find a way of selecting the total number of those sessions without repeating the days (sort of).
Example:
Table sessions
ID | user_id | datestart
1  | 1       | 2014-08-05
2  | 1       | 2014-08-05
3  | 2       | 2014-08-05
As you can see there are two lines that are repeated (the first and second). If I query SELECT COUNT(sess.id) AS total this will retrieve 3, but I want it to retrieve 2 because the first two lines have the same user_id so it must count as one.
Using the clause Group By will retrieve two different lines: 2 and 1, which is also incorrect.
You can view a full example working at SQLFiddle.
Is there any way of solving this with a query alone, or do I need to do it in application code?
I think you are looking for count(distinct):
SELECT COUNT(distinct user_id) AS total
FROM sessions sess INNER JOIN
users user
ON user.id = sess.user_id
WHERE user.equipment_id = 1 AND
sess.datestart = CURDATE();
If I understand the problem correctly, you want the number of users with sessions, rather than number of unique sessions. Use DISTINCT:
SELECT COUNT(DISTINCT(user_id)) FROM sessions,users WHERE user_id=users.id
Try this way:
SELECT COUNT(distinct sess.user_id) AS total
FROM sessions AS sess
INNER JOIN users AS user ON user.id = sess.user_id
WHERE user.equipment_id = 1 AND sess.datestart = CURDATE()
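A quick way to compare the two counts is the sketch below, which loads the three sample sessions into SQLite through Python's sqlite3 module. Assumptions: the users rows and their equipment_id values are invented, the alias usr replaces user, and the date is passed as a parameter instead of calling CURDATE() so the result does not depend on the day it is run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, equipment_id INTEGER);
    CREATE TABLE sessions (id INTEGER PRIMARY KEY, user_id INTEGER, datestart TEXT);
    INSERT INTO users VALUES (1, 1), (2, 1);
    INSERT INTO sessions VALUES
        (1, 1, '2014-08-05'), (2, 1, '2014-08-05'), (3, 2, '2014-08-05');
""")

# COUNT(sess.id) counts every session row; COUNT(DISTINCT sess.user_id)
# counts each user once, however many sessions that user has on the day.
query = """
    SELECT COUNT(sess.id), COUNT(DISTINCT sess.user_id)
    FROM sessions AS sess
    INNER JOIN users AS usr ON usr.id = sess.user_id
    WHERE usr.equipment_id = 1 AND sess.datestart = ?
"""
total_sessions, distinct_users = conn.execute(query, ("2014-08-05",)).fetchone()
print(total_sessions, distinct_users)
```

For the sample data this gives 3 session rows but 2 distinct users, which is the number the question is after.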

How to check if id exists in another table given table is 30 million records?

I know the question seems duplicate, but I don't know how to ask it differently.
I have two very simple tables in a MySQL database. The first is the Users table:
id  user_id
1   1
2   3
4   4
The second is the Friends table:
id  user_id  friend_id
1   1        3
2   1        4
3   1        8
I loaded the data from a CSV file and would like to clean it. I need to check whether each friend_id also exists in the Users table. The first table has around 30,000 rows, but the second table has around 30 million rows.
And I use this query to check
SELECT u.user_id, uf.friend_id as exists_friend_ids
FROM Users u, Friends uf
WHERE u.user_id = '1'
and uf.friend_id IN (select user_id from eventify.Users)
My desired output is shown below, but because the query above never finishes, I cannot check it against my test data and cannot continue.
user_id, exists_friend_ids
1 3
1 4
You can see that 8 is not there, because it doesn't exist in the Users table. But as the second table has over 30 million records, the query just runs forever on my computer. Am I doing it right, or is this the only way to do it? Or should I learn Hadoop instead?
I have updated my query to use equal join.
Have you tried a LEFT JOIN query with a GROUP BY friend_id ? If a user doesn't exist, it won't add a line to the result.
If all you are doing is cleaning the table, then you have some flexibility: a slow query will not matter much, since you only need to run it once. Here are a couple of different options:
Use a left join to find the rows in Friends without a corresponding friend id in the Users table (untested):
SELECT Friends.id, Users.user_id
FROM Friends LEFT JOIN Users on Friends.friend_id = Users.user_id
WHERE Users.user_id is NULL
Then delete the records you find.
Use an inner join to find the friends that exist, then create a new table with those records (untested):
SELECT Friends.id, Users.user_id
FROM Friends INNER JOIN Users on Friends.friend_id = Users.user_id
Then insert the resulting rows into a new table, which will become your new "Friends" table.
Hope that helps
I don't understand why you do the CASE construct here. If you want to get a list of all friend_ids that don't exist in the users table, then what about something like:
select friends.friend_id,
count(*)
from friends
where friends.friend_id not in (select users.user_id
from users)
group by 1
You will of course have an index on users.user_id...
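Here is a small sketch of the join-based checks from the answers above, run on the question's sample rows with SQLite through Python's sqlite3 module. The index names are made up, and nothing here says how the queries behave on 30 million rows in MySQL; it only shows that the inner join returns the friends that exist and the left join with IS NULL returns the orphaned rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE Friends (id INTEGER PRIMARY KEY, user_id INTEGER, friend_id INTEGER);
    INSERT INTO Users VALUES (1, 1), (2, 3), (4, 4);
    INSERT INTO Friends VALUES (1, 1, 3), (2, 1, 4), (3, 1, 8);

    -- Index names are hypothetical; the point is that both join columns are indexed.
    CREATE INDEX idx_users_user_id ON Users (user_id);
    CREATE INDEX idx_friends_friend_id ON Friends (friend_id);
""")

# Inner join: keeps only friendships whose friend_id exists in Users.
existing = conn.execute("""
    SELECT f.user_id, f.friend_id
    FROM Friends f
    INNER JOIN Users u ON u.user_id = f.friend_id
    WHERE f.user_id = 1
    ORDER BY f.friend_id
""").fetchall()

# Anti-join: Friends rows with no matching user, i.e. candidates for deletion.
missing = conn.execute("""
    SELECT f.id, f.friend_id
    FROM Friends f
    LEFT JOIN Users u ON u.user_id = f.friend_id
    WHERE u.user_id IS NULL
""").fetchall()

print(existing)
print(missing)
```

The first result matches the desired output in the question (friends 3 and 4 for user 1), and the second lists the row that points at the non-existent user 8.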

MySQL nested query counting

A bit of background info: this is an application that allows users to create challenges and then vote on those challenges (bog standard userX-vs-userY type application).
The end goal here is to get a list of 5 users sorted by the number of challenges they have won, to create a type of leaderboard. A challenge is won by a user if its status = expired and the user has > 50 votes for that challenge (challenges expire after 100 votes in total).
I'll simplify things a bit here, but essentially there are three tables:
users
id
username
...
challenges
id
issued_to
issued_by
status
challenges_votes
id
challenge_id
user_id
voted_for
So far I have an inner query which looks like:
SELECT `challenges`.`id`
FROM `challenges_votes`
LEFT JOIN `challenges` ON (`challenges`.`id` = `challenges_votes`.`challenge_id`)
WHERE `voted_for` = 1
AND `challenges`.`status` = 'expired'
GROUP BY `challenges`.`id`
HAVING COUNT(`challenges_votes`.`id`) > 50
Which in this example would return challenge IDs that have expired and where the user with ID 1 has > 50 votes for.
What I need to do is count the number of rows returned here, apply it to each user from the users table, order this by the number of rows returned and limit it to 5.
To this end I have the following query:
SELECT `users`.`id`, `users`.`username`, COUNT(*) AS challenges_won
FROM (
SELECT `challenges`.`id`
FROM `challenges_votes`
LEFT JOIN `challenges` ON (`challenges`.`id` = `challenges_votes`.`challenge_id`)
WHERE `voted_for` = 1
GROUP BY `challenges`.`id`
HAVING COUNT(`challenges_votes`.`id`) > 0
) AS challenges_won, `users`
GROUP BY `users`.`id`
ORDER BY challenges_won
LIMIT 5
Which is kinda getting there but of course the voted_for user ID here is always 1. Is this even the right way to go about this type of query? Can anyone shed any light on how I should be doing it?
Thanks!
I guess the following script will solve your problem:
-- get the number of challenges won by each user and return the top 5
SELECT usr.id, usr.username, COUNT(*) AS challenges_won
FROM users usr
JOIN (
SELECT vot.challenge_id, vot.voted_for
FROM challenges_votes vot
WHERE vot.challenge_id IN ( -- is this check really necessary?
SELECT cha.id -- if any user is voted 51 he wins, so
FROM challenges cha -- why wait another 49 votes that won't
WHERE cha.status = 'expired' -- change the result?
) --
GROUP BY vot.challenge_id, vot.voted_for -- count votes per candidate, not per challenge
HAVING COUNT(*) > 50
) aux ON (aux.voted_for = usr.id)
GROUP BY usr.id, usr.username
ORDER BY challenges_won DESC LIMIT 5;
Please allow me to propose a small consideration to the condition to close a challenge: if any user wins after 51 votes, why is it necessary to wait another 49 votes that will not change the result? If this constraint can be dropped, you won't have to check challenges table and this can improve the query performance -- but, it can worsen too, you can only tell after testing with your actual database.
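As a sanity check of the leaderboard idea, here is a sketch on SQLite through Python's sqlite3 module. Everything in the data is invented (user names, vote tallies, one challenge that is still active), the tables are trimmed to the columns the query needs, and the inner query groups by both challenge and candidate so that COUNT(*) is the number of votes one user received in one challenge. It uses a join to challenges instead of IN, which is a variation rather than the exact query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE challenges (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE challenges_votes (
        id INTEGER PRIMARY KEY, challenge_id INTEGER, voted_for INTEGER
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO challenges VALUES
        (1, 'expired'), (2, 'expired'), (3, 'active'), (4, 'expired');
""")

# (challenge_id, voted_for, number_of_votes): made-up tallies.
tallies = [(1, 1, 60), (1, 2, 40), (2, 2, 51), (2, 3, 49),
           (3, 1, 55), (4, 1, 70), (4, 3, 30)]
votes = [(challenge, user) for challenge, user, n in tallies for _ in range(n)]
conn.executemany(
    "INSERT INTO challenges_votes (challenge_id, voted_for) VALUES (?, ?)", votes
)

leaderboard = conn.execute("""
    SELECT usr.id, usr.username, COUNT(*) AS challenges_won
    FROM users usr
    JOIN (
        -- one row per (challenge, candidate) that got more than 50 votes
        SELECT vot.challenge_id, vot.voted_for
        FROM challenges_votes vot
        JOIN challenges cha ON cha.id = vot.challenge_id
        WHERE cha.status = 'expired'
        GROUP BY vot.challenge_id, vot.voted_for
        HAVING COUNT(*) > 50
    ) aux ON aux.voted_for = usr.id
    GROUP BY usr.id, usr.username
    ORDER BY challenges_won DESC, usr.id
    LIMIT 5
""").fetchall()

print(leaderboard)
```

Users with no wins (carol here) do not appear at all because of the inner join; a LEFT JOIN from users would be needed to list them with zero.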

MySQL subtracting multiple times for same row in update

I have a table of comments and a table of posts.
Whenever a post is deleted, a query runs to subtract the number of that post's comments (which are deleted later) from each user's comment_count.
So if a user has 2 comments in a post, and that post is deleted, their comment_count should have 2 subtracted from it.
My query is as follows:
UPDATE users
INNER JOIN comment ON users.id = comment.author
SET comment_count = comment_count - 1
WHERE comment.post = 1
User A has 2 comments with .post = 1, but for some reason that user only gets comment_count subtracted by 1 once, when it should happen twice
I think my syntax is right because when I:
SELECT *
FROM users
INNER JOIN comment ON users.id = comment.author
WHERE comment.post = 1
I get two results for user A
Shouldn't UPDATE be iterating over those two results, subtracting each time?
Can someone explain what I am missing? thank you
If you're going to store the count, use:
UPDATE USERS
SET comment_count = (SELECT COUNT(*)
FROM COMMENT c
WHERE c.author = USERS.id)
...or:
UPDATE USERS u
JOIN (SELECT c.author,
COUNT(*) AS numComments
FROM COMMENT c
GROUP BY c.author) x ON x.author = u.id
SET comment_count = x.numComments
There's no point in relying on two records to subtract twice, when you could perform the operation once.
I prefer not to store such values, because they can be calculated based on records without the hassle of keeping the counts in sync. A view might be a better idea...
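On the "why only once" part: in a MySQL multi-table UPDATE, each matching row is updated once, even if it matches the join conditions several times, so the two joined rows for user A still produce a single subtraction. If you do want to keep the stored count and only adjust it for the deleted post, one option is to subtract the per-author comment count in a single statement. The sketch below tries that on SQLite through Python's sqlite3 module with invented rows; the statement uses only subqueries, but it has not been run against MySQL here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, comment_count INTEGER);
    CREATE TABLE comment (id INTEGER PRIMARY KEY, author INTEGER, post INTEGER);
    INSERT INTO users VALUES (1, 5), (2, 3), (3, 4);
    -- user 1 has two comments on post 1, user 2 has one, user 3 has none
    INSERT INTO comment VALUES (1, 1, 1), (2, 1, 1), (3, 2, 1), (4, 3, 2);
""")

post_id = 1  # the post being deleted

# Subtract each author's number of comments on the post in one pass,
# touching only users who actually commented on it.
conn.execute("""
    UPDATE users
    SET comment_count = comment_count - (
        SELECT COUNT(*) FROM comment c
        WHERE c.author = users.id AND c.post = ?
    )
    WHERE id IN (SELECT author FROM comment WHERE post = ?)
""", (post_id, post_id))

counts = dict(conn.execute("SELECT id, comment_count FROM users"))
print(counts)
```

User 1 goes from 5 to 3, user 2 from 3 to 2, and user 3 is left untouched.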