Deleting records based on a GROUP BY in MySQL

I'm having trouble coming up with a query which is going to allow me to keep only the most recent order from a user (maybe a better way to say this is delete all old orders):
CREATE TABLE orders(id integer, created_at datetime, user_id integer, label nvarchar(25));
INSERT INTO orders values(1, now(), 1, 'FRED FIRST');
INSERT INTO orders values(2, DATE_ADD(now(), INTERVAL 1 DAY), 1, 'FRED SECOND');
INSERT INTO orders values(3, DATE_ADD(now(), INTERVAL 2 DAY), 1, 'FRED THIRD');
INSERT INTO orders values(4, DATE_ADD(now(), INTERVAL 1 DAY), 3, 'BARNEY FIRST');
SELECT * FROM orders;
'1','2014-03-07 08:39:36','1','FRED FIRST'
'2','2014-03-08 08:39:36','1','FRED SECOND'
'3','2014-03-09 08:39:36','1','FRED THIRD'
'4','2014-03-08 08:39:36','3','BARNEY FIRST'
I would like to run a query which would leave me with FRED's THIRD order and BARNEY's FIRST order. FRED FIRST and FRED SECOND should be deleted because they are not the latest order from FRED.
Any thoughts about how I might be able to do this with a single query?
EDIT: After posting this, I found something that works (it does what I'm looking to do)-- but it seems a bit messy:
DELETE old_orders
FROM orders old_orders
left outer join(
SELECT MAX(created_at) as created_at, user_id
FROM orders
GROUP BY user_id) new_orders
ON new_orders.user_id = old_orders.user_id and new_orders.created_at = old_orders.created_at
WHERE new_orders.user_id is null;

Use a nested query, like this:
DELETE FROM orders
WHERE id NOT IN (
SELECT id FROM (
select id from orders o JOIN (
select user_id, max(created_at) t from orders group by user_id
) o1 ON o.user_id = o1.user_id AND o.created_at = o1.t
) AS tmp
)
Working Fiddle: http://sqlfiddle.com/#!2/56d913/1

One way you might achieve this is to set a flag on the row indicating that it is the most recent order. When a new order is placed, you would clear the flag on that customer's other orders and set it on the row you're inserting. Your DELETE query could then simply delete all orders that don't have the flag set.

Related

How to calculate 3 month active users

I have a table that contains an orderId, a timestamp and a customerId, like this:
DROP TABLE IF EXISTS testdata;
CREATE TABLE testdata (
`orderId` int,
`createdOn` datetime(6),
`customerId` int,
PRIMARY KEY (`orderId`)
);
INSERT INTO testdata (orderId, createdOn, customerId) VALUES
('1000001','2020-01-01 17:08:41.460000','101'),
('1000002','2020-01-02 18:01:00.180000','102'),
('1000003','2020-01-03 12:26:02.460000','103'),
('1000004','2020-01-04 13:32:42.610000','104'),
('1000005','2020-01-05 20:21:28.540000','101'),
('1000006','2020-01-06 11:54:20.530000','102'),
('1000007','2020-02-01 20:54:42.470000','102'),
('1000008','2020-02-02 10:21:29.470000','102'),
('1000009','2020-02-03 16:22:23.880000','102'),
('1000010','2020-02-04 16:22:23.880000','103'),
('1000011','2020-02-05 17:08:41.460000','103'),
('1000012','2020-02-06 18:01:00.180000','103'),
('1000013','2020-03-01 12:26:02.460000','102'),
('1000014','2020-03-02 13:32:42.610000','102'),
('1000015','2020-03-03 20:21:28.540000','103'),
('1000016','2020-03-04 11:54:20.530000','103'),
('1000017','2020-03-05 20:54:42.470000','104'),
('1000018','2020-03-06 10:21:29.470000','104'),
('1000019','2020-04-01 16:22:23.880000','103'),
('1000020','2020-04-02 16:22:23.880000','103'),
('1000021','2020-04-03 17:08:41.460000','103'),
('1000022','2020-04-04 18:01:00.180000','104'),
('1000023','2020-04-05 12:26:02.460000','104'),
('1000024','2020-04-06 13:32:42.610000','104'),
('1000025','2020-05-01 20:21:28.540000','103'),
('1000026','2020-05-02 11:54:20.530000','103'),
('1000027','2020-05-03 20:54:42.470000','104'),
('1000028','2020-05-04 10:21:29.470000','104'),
('1000029','2020-05-05 16:22:23.880000','105'),
('1000030','2020-05-06 16:22:23.880000','105'),
('1000031','2020-05-01 20:21:28.540000','104'),
('1000032','2020-05-02 11:54:20.530000','104'),
('1000033','2020-05-03 20:54:42.470000','104'),
('1000034','2020-05-04 10:21:29.470000','105'),
('1000035','2020-05-05 16:22:23.880000','105'),
('1000036','2020-05-06 16:22:23.880000','105')
;
Now I want to calculate for each month the number of customers that have been active (i.e., have an order) within the last 3 months (i.e., current month or the preceding two months).
I manage to calculate the active users for the current month, like this:
SELECT
EXTRACT(YEAR_MONTH FROM createdOn) AS order_createdOn_ym
,COUNT(DISTINCT customerId) AS mau
FROM testdata
GROUP BY order_createdOn_ym
ORDER BY order_createdOn_ym asc
;
However, I'm completely stumped as to how you can approach calculating the 3-months-active users.
Any help is greatly appreciated!
Here is one option:
select c.createdmonth, count(distinct customerid) as mau
from (
select distinct date_format(createdon, '%Y-%m-01') as createdmonth
from testdata
) c
left join testdata t
on t.createdon >= c.createdmonth - interval 2 month
and t.createdon < c.createdmonth + interval 1 month
group by c.createdmonth
The idea is to enumerate the distinct months, then bring in the table with a left join that picks up the current month and the two preceding months. You can then aggregate and count the number of distinct customers per group.
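To make the three-month window concrete outside the database, here is a minimal Python sketch of the same logic. It is not the MySQL query itself; the function name `three_month_active`, the input shape, and the condensed sample (one order per customer per month, taken from the question's data) are assumptions made for illustration.

```python
from datetime import date


def three_month_active(orders):
    """Count distinct customers per month over that month and the two before it.

    orders: iterable of (order_date, customer_id) pairs (hypothetical input shape).
    Only months that contain at least one order are reported, as in the query.
    """
    customers_by_month = {}
    for order_date, customer_id in orders:
        # Number months consecutively so "two months earlier" is plain subtraction.
        index = order_date.year * 12 + (order_date.month - 1)
        customers_by_month.setdefault(index, set()).add(customer_id)

    result = {}
    for index in sorted(customers_by_month):
        active = set()
        for window_month in (index - 2, index - 1, index):
            active |= customers_by_month.get(window_month, set())
        label = f"{index // 12:04d}-{index % 12 + 1:02d}"
        result[label] = len(active)
    return result


# Condensed from the question's data: which customers ordered in which month.
sample_orders = [
    (date(2020, 1, 1), 101), (date(2020, 1, 2), 102),
    (date(2020, 1, 3), 103), (date(2020, 1, 4), 104),
    (date(2020, 2, 1), 102), (date(2020, 2, 4), 103),
    (date(2020, 3, 1), 102), (date(2020, 3, 3), 103), (date(2020, 3, 5), 104),
    (date(2020, 4, 1), 103), (date(2020, 4, 4), 104),
    (date(2020, 5, 1), 103), (date(2020, 5, 3), 104), (date(2020, 5, 5), 105),
]
print(three_month_active(sample_orders))
```

April is the interesting month in this sample: customer 101 last ordered in January, so they drop out of the February-to-April window.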
Thanks to @GMB for providing the solution. Purely as a matter of taste, though, I prefer to write the month interval the following way:
SELECT date_format(c.end_of_createdOn_month, '%Y-%m') as order_month,
count(distinct customerid) as mau_3m
FROM (
select distinct LAST_DAY(createdOn) as end_of_createdOn_month
from testdata
) c
LEFT JOIN testdata t
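ON t.createdon >= LAST_DAY(c.end_of_createdOn_month - interval 3 month) + interval 1 day -- first day of the 3-month window
AND t.createdon < c.end_of_createdOn_month + interval 1 day -- keeps orders placed during the last day of the month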
GROUP BY c.end_of_createdOn_month;

Minimum number of Meeting Rooms required to Accommodate all Meetings in MySQL

I have the following columns in a table called meetings: meeting_id - int, start_time - time, end_time - time. Assuming that this table has data for one calendar day only, what is the minimum number of rooms I need to accommodate all the meetings? Room size and the number of people attending the meetings don't matter.
Here's the solution:
select * from
(select t.start_time,
t.end_time,
count(*) - 1 overlapping_meetings,
count(*) minimum_rooms_required,
group_concat(distinct concat(y.start_time,' to ',t.end_time)
separator ' // ') meeting_details from
(select 1 meeting_id, '08:00' start_time, '09:15' end_time union all
select 2, '13:20', '15:20' union all
select 3, '10:00', '14:00' union all
select 4, '13:55', '16:25' union all
select 5, '14:00', '17:45' union all
select 6, '14:05', '17:45') t left join
(select 1 meeting_id, '08:00' start_time, '09:15' end_time union all
select 2, '13:20', '15:20' union all
select 3, '10:00', '14:00' union all
select 4, '13:55', '16:25' union all
select 5, '14:00', '17:45' union all
select 6, '14:05', '17:45') y
on t.start_time between y.start_time and y.end_time
group by start_time, end_time) z;
My question - is there anything wrong with this answer? Even if there's nothing wrong with this, can someone share a better answer?
Let's say you have a table called 'meetings' with the columns id, start_time and end_time.
Then you can use this query to get the minimum number of meeting rooms required to accommodate all meetings:
select max(minimum_rooms_required)
from (select count(*) minimum_rooms_required
from meetings t
left join meetings y on t.start_time >= y.start_time and t.start_time < y.end_time group by t.id
) z;
This is clearer and simpler, and it works fine.
Meetings can "overlap". So, GROUP BY start_time, end_time can't figure this out.
Not every algorithm can be done in SQL. Or, at least, it may be grossly inefficient.
I would use a real programming language for the computation, leaving the database for what it is good at -- being a data repository.
Build an array of 1440 entries (one per minute of the day); initialize each to 0.
Foreach meeting:
    Foreach minute in the meeting (excluding the last minute):
        increment that element of the array.
Find the largest element in the array -- that is the number of rooms needed.
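The steps above can be sketched in Python as follows. The function name `min_rooms` and the 'HH:MM' string input are assumptions for illustration; the sample meetings are the ones hard-coded in the question's query.

```python
def min_rooms(meetings):
    """Return the peak number of simultaneous meetings in one day.

    meetings: iterable of (start, end) pairs as 'HH:MM' strings (assumed format).
    The end minute is excluded, so back-to-back meetings can share a room.
    """
    def to_minute(hhmm):
        hours, minutes = hhmm.split(":")
        return int(hours) * 60 + int(minutes)

    occupancy = [0] * 1440  # one counter per minute of the day
    for start, end in meetings:
        for minute in range(to_minute(start), to_minute(end)):
            occupancy[minute] += 1
    return max(occupancy)


sample_meetings = [
    ("08:00", "09:15"), ("13:20", "15:20"), ("10:00", "14:00"),
    ("13:55", "16:25"), ("14:00", "17:45"), ("14:05", "17:45"),
]
print(min_rooms(sample_meetings))
```

For this sample the peak is reached from 14:05, when meetings 2, 4, 5 and 6 are all running.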
CREATE TABLE [dbo].[Meetings](
[id] [int] NOT NULL,
[Starttime] [time](7) NOT NULL,
[EndTime] [time](7) NOT NULL) ON [PRIMARY]
GO
sample data set:
INSERT INTO Meetings VALUES (1,'8:00','09:00')
INSERT INTO Meetings VALUES (2,'8:00','10:00')
INSERT INTO Meetings VALUES (3,'10:00','11:00')
INSERT INTO Meetings VALUES (4,'11:00','12:00')
INSERT INTO Meetings VALUES (5,'11:00','13:00')
INSERT INTO Meetings VALUES (6,'13:00','14:00')
INSERT INTO Meetings VALUES (7,'13:00','15:00')
To Find Minimum number of rooms required run the below query:
create table #TempMeeting
(
id int,Starttime time,EndTime time,MeetingRoomNo int,Rownumber int
)
insert into #TempMeeting select id, Starttime,EndTime,0 as MeetingRoomNo,ROW_NUMBER()
over (order by starttime asc) as Rownumber from Meetings
declare @RowCounter int
select top 1 @RowCounter=Rownumber from #TempMeeting order by Rownumber
WHILE @RowCounter<=(Select count(*) from #TempMeeting)
BEGIN
update #TempMeeting set MeetingRoomNo=1
where Rownumber=(select top 1 Rownumber from #TempMeeting where
Rownumber>@RowCounter and Starttime>=(select top 1 EndTime from #TempMeeting
where Rownumber=@RowCounter) and MeetingRoomNo=0)
set @RowCounter=@RowCounter+1
END
select count(*) from #TempMeeting where MeetingRoomNo=0
Consider a table meetings with columns id, start_time and end_time. Then the following query should give correct answer.
with mod_meetings as (select id, to_timestamp(start_time, 'HH24:MI')::TIME as start_time,
to_timestamp(end_time, 'HH24:MI')::TIME as end_time from meetings)
select CASE when max(a_cnt)>1 then max(a_cnt)+1
when max(a_cnt)=1 and max(b_cnt)=1 then 2 else 1 end as rooms
from
(select count(*) as a_cnt, a.id, count(b.id) as b_cnt from mod_meetings a left join mod_meetings b
on a.start_time>b.start_time and a.start_time<b.end_time group by a.id) join_table;
Sample DATA:
DROP TABLE IF EXISTS meeting;
CREATE TABLE "meeting" (
"meeting_id" INTEGER NOT NULL UNIQUE,
"start_time" TEXT NOT NULL,
"end_time" TEXT NOT NULL,
PRIMARY KEY("meeting_id")
);
INSERT INTO meeting values (1,'08:00','14:00');
INSERT INTO meeting values (2,'09:00','10:30');
INSERT INTO meeting values (3,'11:00','12:00');
INSERT INTO meeting values (4,'12:00','13:00');
INSERT INTO meeting values (5,'10:15','11:00');
INSERT INTO meeting values (6,'12:00','13:00');
INSERT INTO meeting values (7,'10:00','10:30');
INSERT INTO meeting values (8,'11:00','13:00');
INSERT INTO meeting values (9,'11:00','14:00');
INSERT INTO meeting values (10,'12:00','14:00');
INSERT INTO meeting values (11,'10:00','14:00');
INSERT INTO meeting values (12,'12:00','14:00');
INSERT INTO meeting values (13,'10:00','14:00');
INSERT INTO meeting values (14,'13:00','14:00');
Solution:
DROP VIEW IF EXISTS Final;
CREATE VIEW Final AS SELECT time, group_concat(event), sum(num) num from (
select start_time time, 's' event, 1 num from meeting
union all
select end_time time, 'e' event, -1 num from meeting)
group by 1
order by 1;
select max(room) AS Min_Rooms_Required FROM (
select
a.time,
sum(b.num) as room
from
Final a
, Final b
where a.time >= b.time
group by a.time
order by a.time
);
Here's the explanation to gashu's nicely working code (or otherwise a non-code explanation of how to solve it with any language).
Firstly, renaming the variable 'minimum_rooms_required' to 'overlap' would make the whole thing much easier to understand, because for each start or end time we want to know the number of overlapping ongoing meetings. Once we have found the maximum, we know there is no way of getting by with fewer rooms than that, because those meetings overlap.
By the way, I think there might be a mistake in the code. It should check for t.start_time or t.end_time between y.start_time and y.end_time. Counterexample: meeting 1 starts at 8:00 and ends at 11:00, and meeting 2 starts at 10:00 and ends at 12:00.
(I'd post this as a comment on gashu's answer, but I don't have enough reputation.)
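The overlap-counting idea can also be written as a sweep over start and end events, which works in any language. This is a minimal Python sketch, not the SQL from the answer being discussed; the name `max_overlap` and the 'HH:MM' input format are assumptions. It adds 1 at every start and subtracts 1 at every end, and the highest running total is the number of rooms needed.

```python
def max_overlap(meetings):
    """Return the largest number of meetings running at the same time.

    meetings: iterable of (start, end) pairs as 'HH:MM' strings (assumed format).
    """
    def to_minutes(hhmm):
        hours, minutes = hhmm.split(":")
        return int(hours) * 60 + int(minutes)

    events = []
    for start, end in meetings:
        events.append((to_minutes(start), 1))   # a meeting begins
        events.append((to_minutes(end), -1))    # a meeting finishes

    # Tuples sort by time first; at equal times -1 sorts before +1, so a room
    # freed at 14:00 can be reused by a meeting that starts at 14:00.
    events.sort()

    ongoing = 0
    peak = 0
    for _, change in events:
        ongoing += change
        peak = max(peak, ongoing)
    return peak


# The two-meeting example from the text: 8:00-11:00 and 10:00-12:00.
print(max_overlap([("08:00", "11:00"), ("10:00", "12:00")]))
```

Sorting the events costs O(n log n) for n meetings, independent of how long the meetings are.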
I'd go for the LEAD() analytic function:
select
sum(needs_room_ind) as min_rooms
from (
select
id,
start_time,
end_time,
case when lead(start_time,1) over (order by start_time asc) between start_time
and end_time then 1 else 0 end as needs_room_ind
from
meetings
) a
My approach is to take, at the moment each meeting starts, the difference between the number of meetings that have already started and the number that have already ended (assuming meetings start and end on time).
My code looks like this:
with alpha as
(
select a.meeting_id,a.start_time,
count(distinct b.meeting_id) ttl_meeting_start_before,
count(distinct c.meeting_id) ttl_meeting_end_before
from meeting a
left join
(
select meeting_id,start_time from meeting
) b
on a.start_time > b.start_time
left join
(
select meeting_id,end_time from meeting
) c
on a.start_time > c.end_time
group by a.meeting_id,a.start_time
)
select max(ttl_meeting_start_before-ttl_meeting_end_before) max_meeting_room
from alpha

Get transactions balance for each month

I have a two-column table of transactions that stores the time of each change (as a Unix timestamp) and the change value.
create table transactions (
changed int(11),
points int(11)
);
insert into transactions values (UNIX_TIMESTAMP('2014-03-27 03:00:00'), +100);
insert into transactions values (UNIX_TIMESTAMP('2014-05-02 03:00:00'), +100);
insert into transactions values (UNIX_TIMESTAMP('2015-01-01 03:00:00'), -100);
insert into transactions values (UNIX_TIMESTAMP('2015-05-01 03:00:00'), +150);
To get the current balance you need to sum all values, and to get a balance from the past you need to sum only the values whose change time is earlier than the requested date, like:
select
sum(case when changed < unix_timestamp('2013-12-01') then
points
else
0
end) as cash_balance_2013_11,
...
So each month needs its own separate piece of SQL. I would like to have SQL code that gives me the balances for all months (e.g. from a fixed date until now).
Can you just group by and order by month?
UPDATE: to get running totals you have to join the individual months to a set of totals-by-month, matching on "less than or equal to":-
select
m.single_month
, sum(monthly_totals.total_points) as running_total_by_month
from
(
select
sum(points) as total_points
, month_of_change
from
(
select
points
, DATE_FORMAT(FROM_UNIXTIME(t.changed), '%Y-%m') as month_of_change -- assumes unix_time; year and month, so 2014-05 and 2015-05 stay distinct
from transactions t
) x
group by month_of_change
) monthly_totals
inner join
(
select distinct DATE_FORMAT(FROM_UNIXTIME(t.changed), '%Y-%m') as single_month
from transactions t
) m
on monthly_totals.month_of_change <= m.single_month
group by m.single_month
order by m.single_month
(N.B: not tested)
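For comparison, the same running total is easy to express outside the database. The following is a minimal Python sketch under stated assumptions: the function name `running_balance_by_month` is hypothetical, timestamps are interpreted as UTC (MySQL's FROM_UNIXTIME uses the session time zone instead), and only months that contain a transaction are listed.

```python
from collections import defaultdict
from datetime import datetime, timezone
from itertools import accumulate


def running_balance_by_month(transactions):
    """Return [(month, balance at the end of that month), ...] in month order.

    transactions: iterable of (unix_timestamp, points) pairs.
    """
    monthly_totals = defaultdict(int)
    for changed, points in transactions:
        # Assumption: bucket by UTC month.
        month = datetime.fromtimestamp(changed, tz=timezone.utc).strftime("%Y-%m")
        monthly_totals[month] += points

    months = sorted(monthly_totals)  # 'YYYY-MM' strings sort chronologically
    balances = accumulate(monthly_totals[month] for month in months)
    return list(zip(months, balances))


def unix_time(year, month, day, hour):
    """Helper for the sample data: build a UTC Unix timestamp."""
    return int(datetime(year, month, day, hour, tzinfo=timezone.utc).timestamp())


sample_transactions = [
    (unix_time(2014, 3, 27, 3), 100),
    (unix_time(2014, 5, 2, 3), 100),
    (unix_time(2015, 1, 1, 3), -100),
    (unix_time(2015, 5, 1, 3), 150),
]
print(running_balance_by_month(sample_transactions))
```

Here `accumulate` plays the role of the "less than or equal to" self-join: each month's balance is the sum of all monthly totals up to and including that month.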

MySQL Query for finding a "LAST" row, based on two fields

I have the following MySQL table to log the registration status changes of pupils:
CREATE TABLE `pupil_registration_statuses` (
`status_id` INT(11) NOT NULL AUTO_INCREMENT,
`status_pupil_id` INT(10) UNSIGNED NOT NULL,
`status_status_id` INT(10) UNSIGNED NOT NULL,
`status_effectivedate` DATE NOT NULL,
PRIMARY KEY (`status_id`),
INDEX `status_pupil_id` (`status_pupil_id`)
)
COLLATE='utf8_general_ci'
ENGINE=MyISAM;
Example data:
INSERT INTO `pupil_registration_statuses` (`status_id`, `status_pupil_id`, `status_status_id`, `status_effectivedate`) VALUES
(1, 123, 1, '2013-05-06'),
(2, 123, 2, '2014-03-15'),
(3, 123, 5, '2013-03-15'),
(4, 123, 6, '2013-05-06'),
(5, 234, 2, '2013-02-02'),
(6, 234, 4, '2013-04-17'),
(7, 345, 2, '2014-02-01'),
(8, 345, 3, '2013-06-01');
Statuses can be inserted out of date order, so the sequence of dates does not necessarily follow the sequence of IDs.
For example: status_id 1 might have a date of 2013-05-06, but status_id 3 might have a date of 2013-03-15.
status_id values are, however, sequential within any particular date. Thus, if a pupil's registration status changes multiple times on one day, the last row will reflect their status for that date.
It is necessary to find out a particular student's registration status on a particular date. The following query works for an individual pupil:
SELECT *
FROM pupil_registration_statuses
WHERE status_pupil_id = 123
AND status_effectivedate <= '2013-05-06'
ORDER BY status_effectivedate DESC, status_id DESC
LIMIT 1;
This returns the expected row of status_id = 4
However, I now need to issue a (single) query to return the status for all pupils on a particular date.
The following query is proposed, but doesn't obey the "last status_id in a day" requirement:
SELECT *
FROM pupil_registration_statuses prs
INNER JOIN (SELECT status_pupil_id, MAX(status_effectivedate) last_date
FROM pupil_registration_statuses
WHERE status_effectivedate <= '2013-05-06'
GROUP BY status_pupil_id) qprs ON prs.status_pupil_id = qprs.status_pupil_id AND prs.status_effectivedate = qprs.last_date;
This query, however, returns 2 rows for pupil 123.
EDIT
To clarify, if the input is the date '2013-05-06', I expect to get the rows 4 and 6 from the query.
http://sqlfiddle.com/#!2/68ee6/2
Is this what you're after?
SELECT a.*
FROM pupil_registration_statuses a
JOIN
( SELECT prs.status_pupil_id
, MAX(prs.status_id) max_status_id
FROM pupil_registration_statuses prs
JOIN
( SELECT status_pupil_id
, MAX(status_effectivedate) last_date
FROM pupil_registration_statuses
WHERE status_effectivedate <= '2013-05-06'
GROUP
BY status_pupil_id
) qprs
ON prs.status_pupil_id = qprs.status_pupil_id
AND prs.status_effectivedate = qprs.last_date
GROUP
BY prs.status_pupil_id
) b
ON b.max_status_id = a.status_id;
http://sqlfiddle.com/#!2/68ee6/7
(Incidentally, there's an ugly and undocumented hack for this kind of problem which goes something like this:
SELECT x.* FROM (SELECT * FROM pupil_registration_statuses WHERE status_effectivedate <= '2013-05-06' ORDER BY status_pupil_id, status_effectivedate DESC, status_id DESC)x GROUP BY status_pupil_id;
...but I didn't tell you that! ;) )
If I understood right, you want to...
1) Get 1 row per person.
2) Get the status changes from the specific day you manually input.
3) Get the last status changes from within the specific day.
If that's right, you need the query you already have, ordering by date and then by id, just with a DISTINCT ON (note that DISTINCT ON is PostgreSQL syntax and is not available in MySQL):
SELECT DISTINCT ON (status_pupil_id) *
FROM pupil_registration_statuses
WHERE status_effectivedate <= '2013-05-06'
ORDER BY status_pupil_id, status_effectivedate DESC, status_id DESC
I have changed where clause, please try it.
SELECT *
FROM pupil_registration_statuses prs
INNER JOIN (SELECT status_pupil_id, MAX(status_effectivedate) last_date
FROM pupil_registration_statuses
WHERE Datediff(status_effectivedate, '2013-05-06') <= 0
GROUP BY status_pupil_id) qprs ON prs.status_pupil_id = qprs.status_pupil_id AND prs.status_effectivedate = qprs.last_date;
EDIT
Try this
SELECT *
FROM
(
select status_pupil_id,max(status_id) as status_id from pupil_registration_statuses innr
-- where Datediff('2013-05-06', status_effectivedate) >= 0
group by status_pupil_id
)as ca
inner join pupil_registration_statuses prs on prs.status_id = ca.status_id
where Datediff('2013-05-06', prs.status_effectivedate) >= 0

How to get users that purchased items ONLY in a specific time period (MySQL Database)

I have a table that contains all purchased items.
I need to check which users purchased items in a specific period of time (say between 2013-03-21 and 2013-04-21) and never purchased anything after that.
I can select users that purchased items in that period of time, but I don't know how to filter those users that never purchased anything after that...
SELECT `userId`, `email` FROM my_table
WHERE `date` BETWEEN '2013-03-21' AND '2013-04-21' GROUP BY `userId`
Give this a try
SELECT
user_id
FROM
my_table
WHERE
purchase_date >= '2012-05-01' -- your_start_date
GROUP BY
user_id
HAVING
max(purchase_date) <= '2012-06-01'; -- your_end_date
It works by taking all the records on or after the start date, grouping the result set by user_id, and then finding the max purchase date for every user. The max purchase date must be <= the end date. Since this query does not use a join or inner query, it could be faster.
Test data
CREATE table user_purchases(user_id int, purchase_date date);
insert into user_purchases values (1, '2012-05-01');
insert into user_purchases values (2, '2012-05-06');
insert into user_purchases values (3, '2012-05-20');
insert into user_purchases values (4, '2012-06-01');
insert into user_purchases values (4, '2012-09-06');
insert into user_purchases values (1, '2012-09-06');
Output
| USER_ID |
-----------
| 2 |
| 3 |
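The same filter-then-check-the-maximum logic can be traced by hand with a short Python sketch. The function name `bought_only_within` is hypothetical and ISO 'YYYY-MM-DD' strings are assumed so that dates compare correctly as text; the sample rows are the test data above.

```python
def bought_only_within(purchases, start_date, end_date):
    """Return users who bought on or after start_date and never after end_date.

    purchases: iterable of (user_id, purchase_date) pairs, dates as ISO strings.
    """
    latest_purchase = {}
    for user_id, purchase_date in purchases:
        if purchase_date < start_date:
            continue  # mirrors WHERE purchase_date >= start_date
        previous = latest_purchase.get(user_id)
        if previous is None or purchase_date > previous:
            latest_purchase[user_id] = purchase_date  # mirrors MAX(purchase_date)

    # mirrors HAVING MAX(purchase_date) <= end_date
    return sorted(
        user_id for user_id, latest in latest_purchase.items() if latest <= end_date
    )


user_purchases = [
    (1, "2012-05-01"), (2, "2012-05-06"), (3, "2012-05-20"),
    (4, "2012-06-01"), (4, "2012-09-06"), (1, "2012-09-06"),
]
print(bought_only_within(user_purchases, "2012-05-01", "2012-06-01"))
```

Users 1 and 4 are dropped because their latest purchase, on 2012-09-06, falls after the end date.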
This is probably a standard way to accomplish that:
SELECT `userId`, `email` FROM my_table mt
WHERE `date` BETWEEN '2013-03-21' AND '2013-04-21'
AND NOT EXISTS (
SELECT * FROM my_table mt2 WHERE
mt2.`userId` = mt.`userId`
and mt2.`date` > '2013-04-21'
)
GROUP BY `userId`
SELECT `userId`, `email` FROM my_table WHERE (`date` BETWEEN '2013-03-21' AND '2013-04-21') and `date` >= '2013-04-21' GROUP BY `userId`
This will select only the users who purchased during that timeframe AND purchased after that timeframe.
Hope this helps.
Try the following
SELECT `userId`, `email`
FROM my_table WHERE `date` BETWEEN '2013-03-21' AND '2013-04-21'
and userId not in
(select userId from my_table
where `date` < '2013-03-21' or `date` > '2013-04-21' )
GROUP BY `userId`
You'll have to do it in two stages - one query to get the list of users who did buy within the time period, then another query to take that list of users and see if they bought anything afterwards, e.g.
SELECT during.userID, during.email, COUNT(after.userID) AS purchases
FROM (
SELECT DISTINCT userID, email
FROM my_table
WHERE `date` BETWEEN '2013-03-21' AND '2013-04-21'
) AS during
LEFT JOIN my_table AS after
ON after.userID = during.userID AND after.`date` > '2013-04-21'
GROUP BY during.userID, during.email
HAVING purchases = 0;
Inner query gets the list of userIDs who purchased at least one thing during that period. That list is then joined back against the same table, but filtered for purchases AFTER the period , and counts how many purchases they made and filters down to only those users with 0 "after" purchases.
probably won't work as written - haven't had my morning tea yet.
SELECT
a.userId,
a.email
FROM
my_table AS a
WHERE a.date BETWEEN '2013-03-21'
AND '2013-04-21'
AND a.userId NOT IN
(SELECT
b.userId
FROM
my_table AS b
WHERE b.date BETWEEN '2013-04-22'
AND CURDATE()
GROUP BY b.userId)
GROUP BY a.userId
This keeps only the users who have not purchased anything from the end date to the present.