Deduplicate rows in MySQL

I have a user table. I need to deduplicate records: when the name and age fields are equal and the records were created within 5 minutes of each other, keep only the earliest record.
I have SQL like this, but it doesn't work:
SELECT *
FROM user u1
WHERE NOT EXISTS (SELECT *
FROM user u2
WHERE u2.name = u1.name AND u2.age = u1.age
AND u2.created_at > u1.created_at - INTERVAL 5 MINUTE)

We could phrase the logic as deleting any record for which we can find another record with the same age and name that was created earlier, but no more than 5 minutes earlier, than the record being considered for deletion. MySQL does not allow a subquery to read from the table that a DELETE is modifying, so a multi-table DELETE with a self-join is used:
DELETE u1
FROM user u1
JOIN user u2
ON u2.age = u1.age AND u2.name = u1.name
AND u2.created_at < u1.created_at
AND u2.created_at >= u1.created_at - INTERVAL 5 MINUTE;
This assumes that you actually want to remove the duplicate records. If you just want to select, use the logic provided by @Nick, which is very similar to my delete query.
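The "earlier record within 5 minutes" rule can be tried out on a few rows before deleting anything. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the sample rows are invented, and SQLite's datetime(x, '-5 minutes') plays the role of MySQL's x - INTERVAL 5 MINUTE.

```python
import sqlite3

# SQLite stands in for MySQL here; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO user (name, age, created_at) VALUES (?, ?, ?)",
    [
        ("ann", 30, "2020-01-01 10:00:00"),  # earliest of its group: kept
        ("ann", 30, "2020-01-01 10:03:00"),  # 3 minutes after the first: dropped
        ("ann", 30, "2020-01-01 10:20:00"),  # more than 5 minutes later: kept
        ("bob", 30, "2020-01-01 10:01:00"),  # different name: kept
    ],
)

# Keep a row only if no earlier row with the same name and age exists
# in the 5 minutes before it.
kept = conn.execute(
    """
    SELECT u1.name, u1.created_at
    FROM user u1
    WHERE NOT EXISTS (
        SELECT 1
        FROM user u2
        WHERE u2.name = u1.name AND u2.age = u1.age
          AND u2.created_at < u1.created_at
          AND u2.created_at >= datetime(u1.created_at, '-5 minutes')
    )
    ORDER BY u1.id
    """
).fetchall()
print(kept)
```

Two rows with exactly the same timestamp would both survive under this rule, so a tie-breaker (for example on id) may be needed if that can happen in your data.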

Related

sort details order by user id from another mysql table by activity less than 3600

I want to fetch all users from the "members" table, and also check whether the member_id from the members table exists as user_id in the "login" table. If the "activity" column (a timestamp) in the login table is less than 3600 seconds old, those users should be ordered on top; users that don't exist in the login table should be shown at the bottom.
How can I query this?
This is how I fetch users:
$query = "SELECT * FROM members WHERE member_id != '".$_SESSION['member_id']."'";
But how do I query the rest?
Thank you very much for your help.
You need something like
SELECT members.*
FROM members
JOIN logins ON members.member_id = logins.member_id
WHERE logins.logged_in_at >= CURRENT_TIMESTAMP - INTERVAL 3600 SECOND
-- AND members.member_id != '$_SESSION['tfs_member_id']'
The JOIN ensures the member is present in the logins table; the WHERE condition on logged_in_at keeps only the "active" logins.
Use a LEFT JOIN with the logins table so you get members that aren't in the table. Then test whether the user is found in the logins table in the ORDER BY clause, and follow that by ordering by whether the last activity is recent.
SELECT m.*
FROM members AS m
LEFT JOIN logins as l ON m.member_id = l.member_id
ORDER BY l.member_id IS NULL,
l.activity > DATE_SUB(NOW(), INTERVAL 1 HOUR) DESC
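To see how the two ORDER BY keys interact, here is a small sketch of the LEFT JOIN ordering. It runs on Python's sqlite3 module rather than MySQL, stores activity as plain Unix seconds, and uses a fixed, made-up "now" value in place of NOW(); the table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE members (member_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE logins (member_id INTEGER, activity INTEGER);
    INSERT INTO members VALUES (1, 'never_logged_in'), (2, 'stale'), (3, 'recent');
    INSERT INTO logins VALUES (2, 1000), (3, 9000);
    """
)

now = 10000  # hypothetical current time in Unix seconds

ordered = [
    row[0]
    for row in conn.execute(
        """
        SELECT m.name
        FROM members AS m
        LEFT JOIN logins AS l ON m.member_id = l.member_id
        ORDER BY l.member_id IS NULL,         -- 0 (found) sorts before 1 (missing)
                 l.activity > ? - 3600 DESC   -- 1 (recent) sorts before 0 (stale)
        """,
        (now,),
    )
]
print(ordered)
```

If a member can have several rows in the logins table, the join returns one row per login, so you may want to aggregate to the latest activity per member first.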

MySQL Delete from database where count of grouped records less than value

I have a database similar to this:
Name State
Bill CA
Joe NY
Susan CA
I know I can get a total count of the number of records for each state like this:
SELECT State, COUNT(*) as count FROM users GROUP BY State
I'm trying to delete all records where the total count of states is less than 2 (or any arbitrary number)
Something like this (Pseudocode):
DELETE FROM users WHERE totalUsersInState < 2
So the final database should be like this
Name State
Bill CA
Susan CA
What is the correct syntax for that? I can't figure it out.
We can use a join to an inline view (a derived table in MySQL parlance).
Write it as a SELECT statement first:
SELECT t.*
FROM users t
JOIN ( SELECT r.state
FROM users r
GROUP BY r.state
HAVING SUM(1) < 2
) s
ON s.state = t.state
Verify that these are the rows we want to delete, and then convert that into a DELETE statement by just replacing the first SELECT keyword...
DELETE t.*
FROM ...
Note that this will not remove a row with a NULL value for state because of the equality comparison in the join predicate.
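One option: find all states with fewer than 2 users, then delete all records for those states. MySQL refuses a subquery that selects directly from the table being deleted from (error 1093), so the grouped subquery is wrapped in a derived table:
DELETE FROM
users
WHERE
state IN (SELECT state
FROM (SELECT state FROM users GROUP BY state HAVING COUNT(*) < 2) AS small_states)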
Or (because the < 2 means "delete users where they are the only user in the state"), in a DBMS that allows a correlated subquery on the table being deleted from (MySQL does not)...
DELETE FROM
users
WHERE
NOT EXISTS (SELECT *
FROM users lookup
WHERE lookup.Name <> users.Name
AND lookup.State = users.State
)
-- WHERE NOT EXISTS (any _other_ user that's in the same state)
-- => WHERE this is the only user in the state
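As a quick check of the grouped delete, the sketch below runs it against the three sample rows from the question. It uses Python's sqlite3 module instead of MySQL, and the min_count name is just an illustrative parameter. The grouped subquery is wrapped in a derived table, which is the usual workaround for MySQL's restriction on selecting from the table being deleted from; SQLite does not need the wrapper but accepts it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (Name TEXT, State TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("Bill", "CA"), ("Joe", "NY"), ("Susan", "CA")],  # sample data from the question
)

min_count = 2  # states with fewer users than this are removed

conn.execute(
    """
    DELETE FROM users
    WHERE State IN (
        SELECT State
        FROM (
            SELECT State FROM users GROUP BY State HAVING COUNT(*) < ?
        ) AS small_states
    )
    """,
    (min_count,),
)

remaining = conn.execute("SELECT Name, State FROM users ORDER BY Name").fetchall()
print(remaining)
```

Like the IN-based statement it mirrors, this leaves rows with a NULL state untouched, because NULL never matches inside IN.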

Display all rows in table even when WHERE statement does not match

I have a table declarations where users can record the time they spent on projects. Declarations have a begin and end time.
For a specific project (id = 1), I want a grand total of the seconds that every user has spent on the project, even if a user didn't spend any time on the project at all.
Tables (simplified):
users
-----
- id
- name
- ...
projects
--------
- id
- name
declarations
------------
- id
- user_id
- project_id
- begin
- end
Let's say there are 2 users. User ID=1 has spent some time on projects, and user ID=2 didn't do anything.
select users.*, sum(timestampdiff(second, declarations.begin, declarations.end)) as seconds
from users
join declarations on declarations.user_id = users.id
where declarations.project_id = 1
group by users.id
With the above query, only user 1 will appear. How can I modify the query in such a way that it includes all the other users as well, with a value of 0 for seconds?
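Consider using a LEFT OUTER JOIN and moving the WHERE condition into the JOIN ... ON condition. A WHERE condition on a declarations column discards the rows of users without declarations, because that column is NULL for them. Wrapping the sum in COALESCE turns the resulting NULL into the 0 you asked for:
select users.*,
coalesce(sum(timestampdiff(second, declarations.begin, declarations.end)), 0) as seconds
from users
left join declarations on declarations.user_id = users.id
and declarations.project_id = 1
group by users.id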
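The difference between filtering in ON and filtering in WHERE is easy to see on two users. The following sketch is an illustration only: it runs on Python's sqlite3 module, the rows are invented, and the hypothetical integer columns begin_ts and end_ts (Unix seconds) replace begin and end so that end_ts - begin_ts can stand in for MySQL's timestampdiff.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE declarations (
        user_id INTEGER, project_id INTEGER, begin_ts INTEGER, end_ts INTEGER
    );
    INSERT INTO users VALUES (1, 'worker'), (2, 'idle');
    INSERT INTO declarations VALUES (1, 1, 0, 600), (1, 1, 1000, 1300), (1, 2, 0, 50);
    """
)

# Project filter in the ON clause: users without matching rows are kept.
in_on = conn.execute(
    """
    SELECT users.name, COALESCE(SUM(d.end_ts - d.begin_ts), 0) AS seconds
    FROM users
    LEFT JOIN declarations AS d
           ON d.user_id = users.id AND d.project_id = 1
    GROUP BY users.id, users.name
    ORDER BY users.id
    """
).fetchall()

# Project filter in the WHERE clause: the NULL-extended rows are thrown away.
in_where = conn.execute(
    """
    SELECT users.name, COALESCE(SUM(d.end_ts - d.begin_ts), 0) AS seconds
    FROM users
    LEFT JOIN declarations AS d ON d.user_id = users.id
    WHERE d.project_id = 1
    GROUP BY users.id, users.name
    ORDER BY users.id
    """
).fetchall()

print(in_on)
print(in_where)
```

With the filter in WHERE, the LEFT JOIN effectively behaves like an inner join again, which is why the user without declarations disappears.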

MYSQL: doing a select to filter using the ID of 2 tables

I really don't know how to ask this, so I will summarize and get to the point.
I have 2 tables.
(I'm using WordPress, so you may know the tables.)
Table 1 = wp_users (contains an id column and a display_name column with the nickname)
Table 2 = wp_simple_login_log (contains a uid column, which is the ID of the user and should match the ID in wp_users. It also contains a time column where the last login is recorded in this format: 2015-3-11)
Table 2 saves the date of the last login of each user. Table 1 has all the users of my site.
I want to select all the users that have logged in during the last 3 months.
This is what I thought of doing:
SELECT *
FROM `wp_simple_login_log`
INNER JOIN `wp_users` ON wp_users.id = wp_simple_login_log.uid
That works, but it brings back ALL the logs, which is expected because I never specified the range that I wanted.
So I added at the end:
WHERE time '%2015%'
which obviously didn't work, so here I am. Any ideas?
I want to filter the logs of the last 3 months where the "display_name" column contains an "@" (I also need to filter those users with emails in their nickname).
If the column is stored as a date, then you would use:
SELECT *
FROM wp_simple_login_log l INNER JOIN
wp_users u
ON u.id = l.uid
WHERE l.time >= date_sub(curdate(), interval 3 month);
If it is stored as a string, then do an explicit conversion:
SELECT *
FROM wp_simple_login_log l INNER JOIN
wp_users u
ON u.id = l.uid
WHERE date(l.time) >= date_sub(curdate(), interval 3 month);
For the YYYY-MM-DD format, conversion can just use the date() function.

Count rows after specific one

A table users has three columns: id, name, pass.
Another table logins has a user_id column, an isright boolean (tinyint) column that says whether the login was successful, and a date column.
I need a simple left join to get the user's name and password (1), the last login datetime, successful or not (2), and the count of logins for the specific user since their last successful login (3).
(1) and (2) I can achieve using
SELECT name, pass, MAX(date)
FROM users
LEFT JOIN logins ON logins.user_id = users.id
-- here either "GROUP BY users.id" or "WHERE users.id = 1234"
But (3) seems to be harder. I googled it and found many similar questions, but none of them asked exactly how to count rows after a specific condition is true. (It's even more complicated: count the logins for that user, not for everyone.)
I don't even know how to do it in a separate query (I'd prefer having one query for the 3 things, and I suppose I'd have to use a subquery, although I prefer joins).
SQL fiddle with the tables and some data: http://sqlfiddle.com/#!9/a932b
Any ideas?
The straightforward way is to have two derived tables: one to get the last login date per user, the other to get the last successful login date per user. Then select from users, outer join the two derived tables, check whether the last login was successful, and count the (failed) logins after the last successful login. (With another DBMS, or with MySQL 8.0 and later, you would rather use analytic functions, which older MySQL versions lack.)
select
users.name,
users.pass,
(
select max(isright)
from logins
where user_id = users.id and date = last_login.date
) as last_login_successful,
(
select count(*)
from logins
where user_id = users.id and date > last_successful_login.date
) as last_logins_failed
from users
left outer join
(
select user_id, max(date) as date
from logins
group by user_id
) last_login on last_login.user_id = users.id
left outer join
(
select user_id, max(date) as date
from logins
where isright = 1
group by user_id
) last_successful_login on last_successful_login.user_id = users.id;
This gives you four possibilities per user:
The user never tried to login. last_login_successful is null and last_logins_failed is meaningless.
The user's logins all failed. last_login_successful is 0 and last_logins_failed is meaningless.
The user's last login was successful. last_login_successful is 1 and last_logins_failed is meaningless.
The user logged in successfully at least once, but failed at least the last time they tried. last_login_successful is 0 and last_logins_failed is the number of failures after the last successful login.
And here is a fiddle: http://sqlfiddle.com/#!9/57b7d/1.
EDIT: To also count failed logins when a user never logged in successfully: if a user never logged in successfully, their last_successful_login.date is null. In last_logins_failed you want to count all records for which the last successful login occurred before OR never:
(
select count(*)
from logins
where user_id = users.id and (date > last_successful_login.date
or last_successful_login.date is null)
) as last_logins_failed
I guess you could do something like
select count(*)
from logins as l join users as u on l.user_id = u.id
where l.date > (select max(date)
from logins
where user_id = u.id and isright = 1)
The subquery gets the date of the user's last successful login. The outer query then counts all logins with a later date. This automatically gives you only the unsuccessful logins, because the last successful one is the reference point. As written, the count runs over all users; add a filter on u.id (or a GROUP BY u.id) to get it per user.
How about using a view that holds only the latest login details, defined with the query below:
select l.user_id, l.date, l.isright from logins l
where l.date >= (select max(l2.date) from logins l2
where l2.isright=1 and l2.user_id = l.user_id)
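For comparison, the three requested values can also be computed in one pass with joins and conditional aggregation. This is a different technique from the correlated subqueries above, not a restatement of them. The sketch runs on Python's sqlite3 module rather than MySQL; the rows, the last_ok column and the logins_since_success alias are invented for illustration, and pass is left out of the select list for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, pass TEXT);
    CREATE TABLE logins (user_id INTEGER, isright INTEGER, date TEXT);
    INSERT INTO users VALUES
        (1, 'ok_then_failed', 'x'), (2, 'never_succeeded', 'y'), (3, 'no_logins', 'z');
    INSERT INTO logins VALUES
        (1, 0, '2015-01-01 10:00:00'),
        (1, 1, '2015-01-02 10:00:00'),
        (1, 0, '2015-01-03 10:00:00'),
        (1, 0, '2015-01-04 10:00:00'),
        (2, 0, '2015-01-01 09:00:00');
    """
)

# s holds the last successful login per user. COUNT only counts non-NULL
# values, so the CASE yields a value for logins after that date (or for
# every login when the user never succeeded) and NULL otherwise.
rows = conn.execute(
    """
    SELECT u.name,
           MAX(l.date) AS last_login,
           COUNT(CASE WHEN s.last_ok IS NULL OR l.date > s.last_ok
                      THEN l.user_id END) AS logins_since_success
    FROM users AS u
    LEFT JOIN logins AS l ON l.user_id = u.id
    LEFT JOIN (
        SELECT user_id, MAX(date) AS last_ok
        FROM logins
        WHERE isright = 1
        GROUP BY user_id
    ) AS s ON s.user_id = u.id
    GROUP BY u.id, u.name
    ORDER BY u.id
    """
).fetchall()
print(rows)
```

A user without any logins still appears, with a NULL last login and a count of 0, because both joins are outer joins.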