Delete all SQL rows except one for a Group - mysql

I have a table like this:
Schema (MySQL v5.7)
CREATE TABLE likethis
(`id` int, `userid` int, `date` DATE)
;
INSERT INTO likethis
(`id`, `userid`, `date`)
VALUES
(1, 1, "2021-11-15"),
(2, 2, "2021-11-15"),
(3, 1, "2021-11-13"),
(4, 3, "2021-10-13"),
(5, 3, "2021-09-13"),
(6, 2, "2021-09-13");
id | userid | date
------------------------
1 | 1 | 2021-11-15
2 | 2 | 2021-11-15
3 | 1 | 2021-11-13
4 | 3 | 2021-10-13
5 | 3 | 2021-09-13
6 | 2 | 2021-09-13
I want to delete all records which are older than 14 days, EXCEPT if the user only has records which are older; then keep the "newest" (biggest "id") row for this user.
Desired target after that action shall be:
id | userid | date
------------------------
1 | 1 | 2021-11-15
2 | 2 | 2021-11-15
3 | 1 | 2021-11-13
4 | 3 | 2021-10-13
i.e.: User ID 1 only has records within the last 14 days: keep all of them. User ID 2 has a record within the last 14 days, so delete ALL of his records which are older than 14 days. User ID 3 has only "old" records, i.e. older than 14 days, so keep only the newest of those records, even though it's older than 14 days.
I thought of something like a self join with a subquery where I group by user-id ... but can't really get to it ...

This query could work
DELETE b
FROM likethis a
JOIN likethis b ON a.`userid` = b.`userid` AND a.`date` > b.`date`
WHERE b.`date` < NOW() - INTERVAL 14 DAY
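Before running a DELETE like this, it may be worth checking which rows it would hit. A minimal sketch, reusing the same join and filter as a read-only SELECT (the ORDER BY is only there for readability):

```sql
-- Preview of the rows the DELETE above would remove (nothing is modified).
-- b is an "old" row for which the same user also has a newer row a.
SELECT DISTINCT b.*
FROM likethis a
JOIN likethis b ON a.`userid` = b.`userid` AND a.`date` > b.`date`
WHERE b.`date` < NOW() - INTERVAL 14 DAY
ORDER BY b.`userid`, b.`date`;
```

Note that the join compares dates, not ids, so if a user's two newest old rows share the same date, neither is treated as older than the other and both are kept.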

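I believe you can use the CASE expression in MySQL to flag the rows first.
For example:
-- CASE only labels each row here; it does not delete anything.
-- Assumes the intent of "Date > 30" is "older than 30 days".
SELECT TableID, TableCol,
CASE
WHEN DATEDIFF(CURDATE(), `Date`) > 30 THEN 'Delete'
ELSE 'Do not delete (record is not older than 30 days)'
END AS suggested_action
FROM TableName;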
Suggested link:
https://www.w3schools.com/sql/func_mysql_case.asp
https://dev.mysql.com/doc/refman/5.7/en/case.html
Hope this helps...

Related

MYSQL , Count week wise and also show sum with empty dates

I have two tables
Table_1 : Routes_Day_plan
Date Status_Id
------------------------
2019-06-09 1
2019-06-10 2
2019-06-09 2
2019-06-11 3
2019-06-14 4
2019-06-14 6
2019-06-15 8
Table_2 : Codes
id code
-------
1 Leave
2 Half_leave
3 Holiday
4 Work
5 Full_Hours
Now my task is to count the rows per day for one week from table 1 where the code (from the second table) is Leave, Half_leave or Work, and then also show the sum. Where a date has no rows it should show 0. I wrote this query; it returns data, but not the empty dates. Can someone please help?
My Query:
select COUNT(*) as available, DATE(date)
from Table_1
where status_id in (
select id from codes
where code in ('Leave','Half_leave','work'))
AND DATE(date) >= DATE('2019-06-09') AND DATE(date) <= DATE('2019-06-16')
group by date
UNION ALL
SELECT COUNT(date), 'SUM' date
FROM Table_1
where status_id in (
select id from codes
where code in ('Leave','Half_leave','work'))
AND DATE(date) >= DATE('2019-06-09') AND DATE(date) <= DATE('2019-06-16')
The result is something like:
available Dates
------------------------
5 2019-06-09
2 2019-06-10
3 2019-06-11
3 2019-06-12
2 2019-06-14
2 2019-06-15
17 SUM
I want it like this:
available Dates
------------------------
5 2019-06-09
2 2019-06-10
3 2019-06-11
3 2019-06-12
0 2019-06-13
2 2019-06-14
2 2019-06-15
17 SUM
Your best bet here would be a date dimension (lookup) table pre-populated with every date in the year. LEFT JOIN your record table to this lookup so that every date appears in the result, including ones such as 2019-06-13 that have no records; for those dates the columns coming from the record table are NULL.
COUNT(column) skips NULLs, so counting a column from the record table returns 0 for those dates (COUNT(*) would return 1 instead). Just make sure you group on the date field from your lookup table and not from your record table.
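Make a date dimension table that contains every date from beginning to end. In MySQL a WHILE loop only runs inside a stored program, so wrap it in a procedure, like this:
-- fill_dim_date is a made-up name; DELIMITER is a mysql client command.
DELIMITER //
CREATE PROCEDURE fill_dim_date()
BEGIN
DECLARE RunDate DATE DEFAULT '1900-01-01';
DECLARE EndDate DATE DEFAULT '2099-01-01';
-- insert one row per day until EndDate is reached
WHILE RunDate <= EndDate DO
INSERT INTO dim_date (`DATE`) VALUES (RunDate);
SET RunDate = ADDDATE(RunDate, 1);
END WHILE;
END //
DELIMITER ;
CALL fill_dim_date();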
Then create a temporary table from dim_date LEFT JOIN Routes_Day_plan, setting the status to 0 for dates that have no match, and use that temporary table instead of Routes_Day_plan in your queries.
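The join can also be written directly, without the temporary table. This is only a sketch: it assumes dim_date(`DATE`) is already populated, and that the tables and columns are named Routes_Day_plan(`Date`, Status_Id) and Codes(id, code) as in the question's listing.

```sql
-- Per-day counts for one week, with 0 for dates that have no matching rows.
SELECT d.`DATE` AS dates,
       COUNT(m.`Date`) AS available   -- counts matched rows only, so gaps give 0
FROM dim_date AS d
LEFT JOIN (
    -- rows of the plan table whose status maps to one of the wanted codes
    SELECT r.`Date`
    FROM Routes_Day_plan AS r
    JOIN Codes AS c ON c.id = r.Status_Id
    WHERE c.code IN ('Leave', 'Half_leave', 'Work')
) AS m ON DATE(m.`Date`) = d.`DATE`
WHERE d.`DATE` BETWEEN '2019-06-09' AND '2019-06-16'
GROUP BY d.`DATE`
ORDER BY d.`DATE`;
```

The code filter sits inside the derived table rather than in the outer WHERE; filtering the joined columns in the outer WHERE would drop the NULL rows again and the empty dates would disappear.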

SQL query which selects the last row from each day

I need your help. I have a table (sensorId, time, data), and I need to select the latest data from each day for one of the sensors for the last 10 days.
For MS SQL, tested, compiled:
Test table:
CREATE TABLE [dbo].[DataTable](
[SensorId] [int] NULL,
[SensorTime] [datetime] NULL,
[SensorData] [int] NULL
)
Run several times to insert demo data:
insert into DataTable (SensorId, SensorTime, SensorData) select 1, getdate() - 15*rand(), convert(int, rand()*100)
Get last value for each of the last 10 days (actual answer):
select top 10 *
from DataTable
inner join ( -- max time for each day
select SensorId, max(SensorTime) as maxtime, convert(varchar(10), SensorTime, 112) as notneededcolumn
from DataTable
group by SensorId, convert(varchar(10), SensorTime, 112)
) lastvalues on lastvalues.maxtime=DataTable.SensorTime and lastvalues.SensorId=DataTable.SensorId
where DataTable.SensorId=1
order by DataTable.SensorTime desc
Example output:
1 2017-05-17 21:07:14.840 54 1 2017-05-17 21:07:14.840 20170517
1 2017-05-16 23:35:37.220 94 1 2017-05-16 23:35:37.220 20170516
1 2017-05-14 22:35:48.970 8 1 2017-05-14 22:35:48.970 20170514
1 2017-05-13 14:56:34.557 94 1 2017-05-13 14:56:34.557 20170513
1 2017-05-12 22:28:55.400 89 1 2017-05-12 22:28:55.400 20170512
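A window-function variant of the same idea, as a rough sketch. It assumes SQL Server 2008 or later (for the date type) and reads "last 10 days" as today plus the nine calendar days before it, which is an interpretation rather than something the question states:

```sql
-- Newest reading per calendar day for sensor 1, limited to the last 10 days.
SELECT SensorId, SensorTime, SensorData
FROM (
    SELECT SensorId, SensorTime, SensorData,
           ROW_NUMBER() OVER (
               PARTITION BY CAST(SensorTime AS date)   -- one group per day
               ORDER BY SensorTime DESC                -- newest first
           ) AS rn
    FROM DataTable
    WHERE SensorId = 1
      AND SensorTime >= DATEADD(day, -9, CAST(GETDATE() AS date))
) AS per_day
WHERE rn = 1
ORDER BY SensorTime DESC;
```

Unlike the top 10 query, this bounds the result by calendar date, so days without any reading are simply missing instead of being replaced by older days.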

MySQL find only those that are not within 24 hours with foreign keys

Let's say today is 16th of November, 2013
and, let's say I have a table of dates like so
------------------------
id | foreign_id | date
------------------------
1 | 1 | 2013-11-01 05:42:38
2 | 2 | 2013-11-04 04:21:22
3 | 2 | 2013-11-16 15:11:55
I want to select only those entries whose foreign_id has no records within the past 24 hours.
i.e.: id #3 was today at 15:11, so I don't want to select any rows whose foreign_id is 2.
I tried the following with CakePHP 2.x in its find API conditions
'HOUR(TIMEDIFF(NOW(), Traffic.accessed)) > 24'
Where Traffic is some table and accessed is some date field.
Unfortunately this still selects the foreign_id, because it finds previous entries with a date older than 24 hours.
Please help.
UPDATE
My actual query is like this
SELECT Traffic.accessed, Traffic.access_ip, Traffic.client_order_id
FROM `mdb`.`client_orders` AS `ClientOrder`
LEFT JOIN `geclicks`.`traffics`
AS `Traffic` ON (`Traffic`.`client_order_id` = `ClientOrder`.`id`)
WHERE `ClientOrder`.`status` = 'ACTIVE' AND `ClientOrder`.`offer_clicks` = 0
AND NOT (`ClientOrder`.`offer_id` = 0) AND ((`Traffic`.`access_ip` IS NULL)
OR (((`Traffic`.`access_ip` <> "444")
OR (((HOUR(TIMEDIFF(NOW(), MAX(`Traffic`.`accessed`))) > 24)
AND (`Traffic`.`access_ip` = "444"))))))
GROUP BY `Traffic`.`client_order_id` ORDER BY RAND() ASC
To select everything:
SELECT * FROM Records
Now let's filter out bad foreign keys (e.g. 1, 2, and 3):
SELECT * FROM Records WHERE foreign_id NOT IN (1, 2, 3)
Park that aside... how do we find the foreign_ids that have entries within the last 24 hours? TIMEDIFF() returns a TIME value rather than a number of hours, so compare the column against a cutoff instead:
SELECT DISTINCT foreign_id FROM Records WHERE `date` >= NOW() - INTERVAL 24 HOUR
Plug that back into the original in place of the (1, 2, 3) example:
SELECT * FROM Records WHERE foreign_id NOT IN (SELECT DISTINCT foreign_id FROM Records WHERE `date` >= NOW() - INTERVAL 24 HOUR)
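The same filter can be sketched with NOT EXISTS, which may be preferable if foreign_id can be NULL: a NULL in the subquery's result makes NOT IN match nothing. Records is the placeholder table name used in this answer, not the asker's real table.

```sql
-- Rows whose foreign_id has no entry in the last 24 hours.
SELECT r.*
FROM Records AS r
WHERE NOT EXISTS (
    SELECT 1
    FROM Records AS recent
    WHERE recent.foreign_id = r.foreign_id
      AND recent.`date` >= NOW() - INTERVAL 24 HOUR
);
```

With the sample rows from the question, foreign_id 2 has an entry from today, so both of its rows are skipped and only the row for foreign_id 1 remains.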

MySQL MAX() with GROUP BY

Considering these entries:
INSERT INTO `schedule_hours` (`id`, `weekday`, `start_hour`) VALUES
(1, 1, '09:00:00'),
(2, 2, '09:00:00'),
(3, 3, '09:00:00'),
(4, 4, '09:00:00'),
(5, 5, '09:00:00'),
(6, 6, NULL),
(7, 7, NULL),
(8, 1, '12:00:00');
I'm running the following query:
SELECT MAX(id), weekday, start_hour
FROM schedule_hours
GROUP BY weekday
ORDER BY weekday
The objective is to get a whole week (weekday 1-monday, 2-tuesday, etc...) but return the most recent entries.
So, in my table I now have 2 entries for Monday and 1 entry for the rest of the days, I only want to return the latest ones (id is an increment field), the right result should be:
8 1 12:00:00
2 2 09:00:00
3 3 09:00:00
4 4 09:00:00
5 5 09:00:00
6 6 NULL
7 7 NULL
What I'm currently getting:
8 1 09:00:00 < wrong
2 2 09:00:00
3 3 09:00:00
4 4 09:00:00
5 5 09:00:00
6 6 NULL
7 7 NULL
The id and weekday columns are correct, but the first row is showing a wrong result for the start_hour column!
You should try this query:
SELECT id, weekday, start_hour
FROM schedule_hours
WHERE id IN (
SELECT MAX(id)
FROM schedule_hours
GROUP BY weekday
)
ORDER BY weekday
Currently in your query, the SELECT clause contains a column (start_hour) that is neither aggregated nor listed in the GROUP BY clause. In standard SQL such a query is illegal and is rejected with an error. However, MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause (unless the ONLY_FULL_GROUP_BY SQL mode is enabled, which is the default from MySQL 5.7.5). The server is then free to take start_hour from any row in the group, which is why you are not getting an error but the output is not what you are expecting. For more details, you may read MySQL Extensions to GROUP BY.
An alternative which joins against the maximum id per weekday, and so also avoids relying on MySQL allowing a SELECT of a field which isn't in the GROUP BY clause:
SELECT schedule_hours.id, schedule_hours.weekday, schedule_hours.start_hour
FROM schedule_hours
INNER JOIN
(
SELECT weekday, MAX(id) AS MaxId
FROM schedule_hours
GROUP BY weekday
)Sub1
ON schedule_hours.id = Sub1.MaxId
AND schedule_hours.weekday = Sub1.weekday
ORDER BY schedule_hours.weekday
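On MySQL 8.0 or later (an assumption; window functions are not available in 5.x), the "latest row per weekday" pattern can also be sketched with ROW_NUMBER():

```sql
-- Number the rows of each weekday from newest id to oldest, keep number 1.
SELECT id, weekday, start_hour
FROM (
    SELECT id, weekday, start_hour,
           ROW_NUMBER() OVER (PARTITION BY weekday ORDER BY id DESC) AS rn
    FROM schedule_hours
) AS latest
WHERE rn = 1
ORDER BY weekday;
```

Because every selected column comes from the same physical row, start_hour always belongs to the row with the highest id for that weekday.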

Select all rows containing duplicate values in one of two columns from within distinct groups of related records

I'm trying to create a MySQL query that will return all individual rows (not grouped) containing duplicate values from within a group of related records. By 'groups of related records' I mean those with the same account number (per the sample below).
Basically, within each group of related records that share the same distinct account number, select just those rows whose values for the date or amount columns are the same as another row's values within that account's group of records. Values should only be considered duplicate from within that account's group. The sample table and ideal output details below should clear things up.
Also, I'm not concerned with any records with a status of X being returned, even if they have duplicate values.
Small sample table with relevant data:
id account invoice date amount status
1 1 1 2012-04-01 0 X
2 1 2 2012-04-01 120 P
3 1 2 2012-05-01 120 U
4 1 3 2012-05-01 117 U
5 2 4 2012-04-01 82 X
6 2 4 2012-05-01 82 U
7 2 5 2012-03-01 81 P
8 2 6 2012-05-01 80 U
9 3 7 2012-03-01 80 P
10 3 8 2012-04-01 79 U
11 3 9 2012-04-01 78 U
Ideal output returned from desired SQL query:
id account invoice date amount status
2 1 2 2012-04-01 120 P
3 1 2 2012-05-01 120 U
4 1 3 2012-05-01 117 U
6 2 4 2012-05-01 82 U
8 2 6 2012-05-01 80 U
10 3 8 2012-04-01 79 U
11 3 9 2012-04-01 78 U
Thus, rows 7/9 and 8/9 should not be returned as pairs, because their matching values (the date for 7/9, the amount for 8/9) belong to different accounts and so are not considered duplicates. However, row 8 should be returned because it shares a duplicate value with row 6.
Later, I may want to further hone the selection by grabbing only duplicate rows that have matching statuses, thus row 2 would be excluded because it doesn't match the other two found within that account's group of records. How much more difficult would that make the query? Would it just be a matter of adding a WHERE or HAVING clause, or is it more complicated than that?
I hope my explanation of what I'm trying to accomplish makes sense. I've tried using INNER JOIN but that returns each desired row more than once. I don't want duplicates of duplicates.
Table Structure and Sample Values:
CREATE TABLE payment (
id int(11) NOT NULL auto_increment,
account int(10) NOT NULL default '0',
invoice int(10) NOT NULL default '0',
date date NOT NULL default '0000-00-00',
amount int(10) NOT NULL default '0',
status char(1) NOT NULL default '',
PRIMARY KEY (id)
);
INSERT INTO payment VALUES (1, 1, 1, '2012-04-01', 0, 'X');
INSERT INTO payment VALUES (2, 1, 2, '2012-04-01', 120, 'P');
INSERT INTO payment VALUES (3, 1, 2, '2012-05-01', 120, 'U');
INSERT INTO payment VALUES (4, 1, 3, '2012-05-01', 117, 'U');
INSERT INTO payment VALUES (5, 2, 4, '2012-04-01', 82, 'X');
INSERT INTO payment VALUES (6, 2, 4, '2012-05-01', 82, 'U');
INSERT INTO payment VALUES (7, 2, 5, '2012-03-01', 81, 'p');
INSERT INTO payment VALUES (8, 2, 6, '2012-05-01', 80, 'U');
INSERT INTO payment VALUES (9, 3, 7, '2012-03-01', 80, 'U');
INSERT INTO payment VALUES (10, 3, 8, '2012-04-01', 79, 'U');
INSERT INTO payment VALUES (11, 3, 9, '2012-04-01', 78, 'U');
This type of query can be implemented as a semijoin.
Semijoins are used to select rows from only one of the tables in the join, based on whether a match exists in the other.
For example:
select distinct l.*
from payment l
inner join payment r
on
l.id != r.id and l.account = r.account and
(l.date = r.date or l.amount = r.amount)
where l.status != 'X' and r.status != 'X'
order by l.id asc;
Note the use of distinct, and that I'm only selecting columns from the left table. This ensures that there are no duplicates.
The join condition checks that:
it's not joining a row to itself (l.id != r.id)
rows are in the same account (l.account = r.account)
and either the date or the amount is the same (l.date = r.date or l.amount = r.amount)
For the second part of your question, you would need to update the on clause in the query.
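As a sketch of that change, assuming "matching statuses" means the two rows being compared must carry the same status, one extra condition in the on clause is enough:

```sql
-- Same self-join as above, but a pair only counts when the statuses match.
select distinct l.*
from payment l
inner join payment r
on
l.id != r.id and l.account = r.account and
l.status = r.status and
(l.date = r.date or l.amount = r.amount)
where l.status != 'X'   -- r.status equals l.status, so one check covers both
order by l.id asc;
```

With the sample data, row 2 (status P) no longer qualifies, because the rows it shares a date or amount with have status X or U.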
This seems to work (selecting only p1's columns, so that each matching row comes back once per p1.id):
select p1.* from payment p1
join payment p2 on
(p1.id != p2.id
and p1.status != 'X'
and p1.account = p2.account
and (p1.amount = p2.amount or p1.date = p2.date))
group by p1.id
http://sqlfiddle.com/#!2/a50e9/3