I have two tables, users and user_logs (a login history). I need to make a report of the number of times a particular user has logged into the system. The tables contain millions of rows of data, so running a nested query to fetch the number of logins per user takes a lot of time.
I am trying to loop through all the users and update the logins column. How can I do this in one query?
The schema is like this:
users table:
id INT(10)
username VARCHAR(7)
logins INT(10)
user_logs table:
id INT(10)
userid INT(10)
login_date DATETIME(19)
http://sqlfiddle.com/#!9/dc4149
I'm running this query
UPDATE users u
SET u.logins = (SELECT COUNT(*)
FROM user_logs
WHERE userid = u.id)
LIMIT 1
This is not working.
Is there any way how I could loop through users & update their respective login count?
I tried doing this with PHP, but the tables are very large and updating one row at a time takes a very long time.
Can I do this via the command line?
An update shouldn't take that long, especially if you have proper indexes on both tables.
Try this:
UPDATE users u
INNER JOIN(SELECT ul.userid,count(1) as cnt FROM user_logs ul GROUP BY ul.userid) u2
ON(u2.userid = u.id)
SET u.logins = u2.cnt
Then make sure you have the following indexes:
users - (id,logins)
user_logs - (userid)
If that doesn't help, try doing this in two steps: build a temporary table from the subquery results, then update from it:
CREATE TABLE temp_for_update AS(
SELECT ul.userid,count(1) as cnt
FROM user_logs ul
GROUP BY ul.userid);
CREATE INDEX YourIndex
ON temp_for_update (userid,cnt);
UPDATE users u
INNER JOIN temp_for_update u2
ON(u2.userid = u.id)
SET u.logins = u2.cnt
This should definitely be faster.
Try using an update join, like:
UPDATE users a
JOIN (
SELECT userid, COUNT(*) as count_login
FROM user_logs
GROUP BY userid) b ON b.userid = a.id
SET a.logins = b.count_login;
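For a quick sanity check of the single-statement approach, here is a minimal sketch using Python's sqlite3 as a stand-in for MySQL (sample rows are made up; SQLite accepts the correlated-subquery form shown here, while the UPDATE ... JOIN form in the answers above is MySQL syntax):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, logins INTEGER DEFAULT 0)")
cur.execute("CREATE TABLE user_logs (id INTEGER PRIMARY KEY, userid INTEGER, login_date TEXT)")
cur.executemany("INSERT INTO users (id, username) VALUES (?, ?)",
                [(1, "alice"), (2, "bob")])
cur.executemany("INSERT INTO user_logs (userid, login_date) VALUES (?, ?)",
                [(1, "2023-01-01"), (1, "2023-01-02"), (2, "2023-01-01")])

# One statement: set every user's login count from an aggregate over user_logs,
# instead of looping over users in application code.
cur.execute("""
    UPDATE users
    SET logins = (SELECT COUNT(*) FROM user_logs WHERE userid = users.id)
""")
conn.commit()

counts = dict(cur.execute("SELECT username, logins FROM users"))
print(counts)  # alice logged in twice, bob once
```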
Related
I have a user-specific query which I need to run for all users.
I am struggling with how to replace the hard-coded UUID with a reference, or whether this needs a different approach altogether.
select max(MaxDate), users.created_at
from (
select max(`moment`.`created_at`) as MaxDate
from `moment`
where `moment`.`user_uuid` = "7dd668af-241a-4176-a1da-f5689214b206"
union (
select max(`module`.`updated_at`) as MaxDate
from `module`
where `module`.`user_uuid` = "7dd668af-241a-4176-a1da-f5689214b206"
)
) as stuff, `users`
where `users`.`uuid` = "7dd668af-241a-4176-a1da-f5689214b206"
The end goal is to get the date a user was created and the date the same user last updated something, and then get the average time between them. But for all users, not a single user.
Here is a general query which would report all users, sorted by user:
SELECT
u.user_uuid,
GREATEST(COALESCE(t1.max_created_at, t2.max_updated_at),
COALESCE(t2.max_updated_at, t1.max_created_at)) AS max_date
FROM users u
LEFT JOIN
(
SELECT user_uuid, MAX(created_at) AS max_created_at
FROM moment
GROUP BY user_uuid
) t1
ON u.user_uuid = t1.user_uuid
LEFT JOIN
(
SELECT user_uuid, MAX(updated_at) AS max_updated_at
FROM module
GROUP BY user_uuid
) t2
ON u.user_uuid = t2.user_uuid
ORDER BY
u.user_uuid;
If you want to restrict to a single user, you may still do so via a WHERE clause or via a WHERE IN clause for a group of users.
Note that there is a bit of a smell in your database design, because you have your user information strewn across multiple tables. My answer assumes that in general every user would appear in both tables, but maybe this is not the case.
Use GROUP BY:
select `users`.`uuid`,max(MaxDate) as maxdate, min(users.created_at) as createddate
from (
select `moment`.`user_uuid`,max(`moment`.`created_at`) as MaxDate
from `moment`
group by `moment`.`user_uuid`
union
select `module`.`user_uuid`,max(`module`.`updated_at`) as MaxDate
from `module` group by `module`.`user_uuid`
) as stuff inner join `users` on `users`.`uuid`=stuff.user_uuid
group by `users`.`uuid`
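A runnable sketch of this UNION-plus-GROUP BY approach, using Python's sqlite3 as a stand-in for MySQL (table names taken from the question, sample rows invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE users  (uuid TEXT PRIMARY KEY, created_at TEXT);
CREATE TABLE moment (user_uuid TEXT, created_at TEXT);
CREATE TABLE module (user_uuid TEXT, updated_at TEXT);
INSERT INTO users  VALUES ('u1', '2023-01-01'), ('u2', '2023-01-01');
INSERT INTO moment VALUES ('u1', '2023-01-10'), ('u2', '2023-01-05');
INSERT INTO module VALUES ('u1', '2023-01-20');
""")

# Per-user max across both tables: union the per-table maxima,
# then take the greatest of the (at most two) rows per user.
latest = dict(cur.execute("""
    SELECT u.uuid, MAX(stuff.MaxDate)
    FROM (
        SELECT user_uuid, MAX(created_at) AS MaxDate FROM moment GROUP BY user_uuid
        UNION
        SELECT user_uuid, MAX(updated_at) FROM module GROUP BY user_uuid
    ) AS stuff
    JOIN users u ON u.uuid = stuff.user_uuid
    GROUP BY u.uuid
"""))
print(latest)  # u1's latest activity is in module, u2's in moment
```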
I'm trying to create a favorite system for my website. I have a user table, job table and a favorites table. When a user adds a job to their favorites, their userId and that jobsId is saved to the favorites table. When the user wants to see their favorites I've been trying to select all the rows in the favorites table that have the same userId as the current user. I then need to select the jobId's from these rows and select all the rows in the job table that have a matching jobId.
I've been trying variations of this query but haven't had any luck.
$sqlQuery = "SELECT * FROM job WHERE id = favorites.jobId AND :userId = favorites.userId"
You want all records from the jobs table whose IDs are in the user's favorites. This translates to:
select * from jobs where id in
(
select jobid from favorites where userid = :userid
);
How about:
Select j.*
from job j
join favorites f on j.id = f.jobId
where :userId = f.userId
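Either form can be checked quickly with Python's sqlite3 (hypothetical sample data; the `?` placeholder stands in for the :userId parameter from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE job (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE favorites (userId INTEGER, jobId INTEGER);
INSERT INTO job VALUES (1, 'plumber'), (2, 'teacher'), (3, 'chef');
INSERT INTO favorites VALUES (7, 1), (7, 3), (8, 2);
""")

# Same shape as the join answer: favorites links the user to jobs.
titles = [row[0] for row in cur.execute(
    "SELECT j.title FROM job j JOIN favorites f ON j.id = f.jobId WHERE f.userId = ?",
    (7,))]
print(titles)  # user 7 favorited jobs 1 and 3
```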
I have a list of submissions of exercises done by students who are part of a group (classroom). It contains:
submission table: userId, groupId, exercise_id (and more irrelevant data)
users table: userId, groupId
I want to select all the exercises done by all the students in a specific group. For this I currently have:
SELECT DISTINCT(exercise_id) FROM submissions as c1 WHERE c1.groupId = 1
AND NOT EXISTS(
SELECT DISTINCT(UserId) FROM users as u WHERE u.GroupId = 1
AND NOT EXISTS (
SELECT exercise_id FROM submissions as c2 WHERE u.UserId = c2.UserId
AND c2.exercise_id = c1.exercise_id
)
)
i.e. I select all the exercises for which there are no users in the group that have not done the exercise.
However, this query takes 5 seconds on a submission table with 1.5 million rows. Which steps could I take to further optimise this query? I have considered inner joins, but won't this result in the same query execution plan?
The groupid really shouldn't be in both tables. Assuming the values are consistent, try the following:
select s.exercise_id
from submissions s
where s.groupid = 1
group by s.exercise_id
having count(distinct userid) = (select count(distinct userid) from users where groupid = 1);
For performance, you want an index on submissions(groupid, exercise_id). Also, if you know there are no duplicate submissions or users, then remove the distinct, because that has an adverse effect on performance.
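A minimal sketch of this GROUP BY / HAVING approach (relational division), using Python's sqlite3 with made-up data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE users (userId INTEGER, groupId INTEGER);
CREATE TABLE submissions (userId INTEGER, groupId INTEGER, exercise_id INTEGER);
INSERT INTO users VALUES (1, 1), (2, 1), (3, 2);
-- exercise 10 was done by both users in group 1; exercise 20 only by user 1
INSERT INTO submissions VALUES (1, 1, 10), (2, 1, 10), (1, 1, 20);
""")

# An exercise qualifies when its distinct submitter count equals
# the number of distinct users in the group.
done_by_all = [row[0] for row in cur.execute("""
    SELECT s.exercise_id
    FROM submissions s
    WHERE s.groupId = 1
    GROUP BY s.exercise_id
    HAVING COUNT(DISTINCT s.userId) =
           (SELECT COUNT(DISTINCT userId) FROM users WHERE groupId = 1)
""")]
print(done_by_all)  # only exercise 10 was done by everyone in group 1
```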
I'm working with a third-party database of a vendor we use for our account management. One of the queries we need to run programmatically is to see which accounts are currently active. There is no single column for this -- status is tracked in a separate table from the main information table that tracks all changes in status, from activation to deletion.
I want to do a simple join like this:
SELECT u.id ,
s.status
FROM user_table u
JOIN status_table s ON u.id = s._id
WHERE s.acct_status = "ACTIVE"
OR s.new_status = "ACTIVE";
But this doesn't work, because there might be a later record that sets the account's status to TERMINATED or something else. Note: every account will have a status entry of one sort or another.
For the purposes of this question, it doesn't matter what the user table is like. The status table is very simple:
_id
date_entered
acct_status
new_status
I'm pretty sure that this query would get me the latest status update (thanks to this post), but I'm not sure how to throw in a join here:
select
*
from
(select
_id, new_status
from
aria.get_acct_status_history
order by date_entered desc) as t1
group by _id;
Any ideas?
If you need the latest record per user from your status table, use a self-join on the date column, matching each row against the max date per user. I assume _id in status_table refers to the user id:
SELECT s.* FROM status_table s
JOIN (
SELECT _id ,MAX(date_entered) date_entered
FROM status_table
GROUP BY _id
) s1
ON(s._id = s1._id AND s.date_entered = s1.date_entered )
WHERE s.acct_status = "ACTIVE" or s.new_status = "ACTIVE";
Later on you can join your users table to get the user info. Joining on the max of the date column, i.e. AND s.date_entered = s1.date_entered, satisfies your requirement of keeping only the most recent row per user:
SELECT u.id, s.status
FROM user_table u
JOIN status_table s ON u.id = s._id
JOIN (
SELECT _id ,MAX(date_entered) date_entered
FROM status_table
GROUP BY _id
) s1
ON(s._id = s1._id AND s.date_entered = s1.date_entered )
WHERE s.acct_status = "ACTIVE" or s.new_status = "ACTIVE";
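Here is a self-contained sketch of this self-join-on-max-date pattern using Python's sqlite3 (status values and dates are invented; only user 2's most recent row is ACTIVE):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE status_table (_id INTEGER, date_entered TEXT, acct_status TEXT, new_status TEXT);
-- user 1 was activated, then terminated; user 2 is still active
INSERT INTO status_table VALUES
    (1, '2023-01-01', 'NEW',        'ACTIVE'),
    (1, '2023-02-01', 'TERMINATED', 'TERMINATED'),
    (2, '2023-01-15', 'NEW',        'ACTIVE');
""")

# Keep only each user's most recent status row, then filter on ACTIVE.
# User 1's old ACTIVE row is discarded by the max-date join.
active = [row[0] for row in cur.execute("""
    SELECT s._id
    FROM status_table s
    JOIN (SELECT _id, MAX(date_entered) AS date_entered
          FROM status_table GROUP BY _id) s1
      ON s._id = s1._id AND s.date_entered = s1.date_entered
    WHERE s.acct_status = 'ACTIVE' OR s.new_status = 'ACTIVE'
""")]
print(active)  # only user 2 is currently active
```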
I have 2 tables :
1) user, where the credentials of users are stored. It holds more than 1,000 user records.
ID | NAME | PASSWORD | USERTYPEID
2) user_logs, in which the login details are captured. It's fairly large, i.e. more than 100,000 records.
ID | NAME | DATEOFLOGIN | USERID | LOGINTYPE
I have to find the users who did not access the system between two given dates, along with their last login date.
SELECT MAX(userlogs.dateoflogin) AS lastlogindate,
u1.id AS Id,
u1.name AS Name
FROM USER u1
LEFT OUTER JOIN user_logs userlogs ON u1.id = userlogs.userid
WHERE u1.id NOT IN
( SELECT userid
FROM user_logs userlogs2
WHERE userlogs2.logtype='Login'
AND userlogs2.dateoflogin BETWEEN '2013-05-10' AND '2013-05-20'
AND userlogs2.userid IS NOT NULL)
GROUP BY u1.id;
If the tables hold fewer records, it works well.
But in the live system, where the user table has more than 1,000 records and the user_logs table has more than 100,000 records, the query took a very long time and I don't know whether it succeeded or not. :)
How do I optimize the above query? It also has to find users who never attempted to log in.
First off, you need to get rid of that subquery if you want to improve performance. Subqueries are convenient, but they have a tendency to significantly slow down a query.
Secondly, make sure you have indexes on all the columns in a WHERE clause.
This is equivalent to your query, but a LEFT JOIN with a NULL check is often more efficient than NOT IN.
SELECT MAX(userlogs.dateoflogin) as lastlogindate , u1.id as Id , u1.name as Name
FROM user u1
LEFT OUTER JOIN user_logs userlogs ON u1.id = userlogs.userid
LEFT OUTER JOIN (SELECT distinct userid
FROM user_logs
WHERE logtype='Login'
AND dateoflogin BETWEEN '2013-05-10' AND '2013-05-20'
AND userid IS NOT null) userlogs2 ON u1.id = userlogs2.userid
WHERE userlogs2.userid IS NULL
GROUP BY u1.id
Make sure you have an index on dateoflogin so the subquery will perform well.
Compare the output of EXPLAIN with both queries.
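To see the anti-join behave as intended, here is a sketch with Python's sqlite3 and invented data (bob last logged in before the window, carol never logged in, and alice is excluded because she logged in inside it):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_logs (id INTEGER PRIMARY KEY, userid INTEGER,
                        dateoflogin TEXT, logtype TEXT);
INSERT INTO user VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO user_logs (userid, dateoflogin, logtype) VALUES
    (1, '2013-05-12', 'Login'),
    (2, '2013-04-01', 'Login');
""")

# LEFT JOIN + IS NULL keeps exactly the users with no login in the window;
# the first LEFT JOIN still supplies each kept user's last login date.
rows = cur.execute("""
    SELECT MAX(ul.dateoflogin) AS lastlogindate, u.id, u.name
    FROM user u
    LEFT JOIN user_logs ul ON u.id = ul.userid
    LEFT JOIN (SELECT DISTINCT userid FROM user_logs
               WHERE logtype = 'Login'
                 AND dateoflogin BETWEEN '2013-05-10' AND '2013-05-20') recent
      ON u.id = recent.userid
    WHERE recent.userid IS NULL
    GROUP BY u.id
    ORDER BY u.id
""").fetchall()
print(rows)  # bob with his old login date, carol with NULL
```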