MySQL: how to delete records with a condition

Here is my table structure:
CREATE TABLE `order`
(`order_id` int, `order_status_id` int, `ip` varchar(11), `date_added` datetime)
;
INSERT INTO `order`
(`order_id`, `order_status_id`, `ip`, `date_added`)
VALUES
(1, 0, '192.168.1.1', '2016-12-07 00:00:00'),
(2, 0, '192.168.1.1', '2016-12-07 00:00:00'),
(3, 0, '192.168.1.1', '2016-12-07 00:00:00'),
(4, 0, '192.168.1.1', '2016-12-07 00:00:00'),
(5, 1, '192.168.1.1', '2016-12-07 00:00:00'),
(6, 0, '192.168.1.2', '2016-12-08 00:00:00'),
(7, 0, '192.168.1.2', '2016-12-08 00:00:00'),
(8, 0, '192.168.1.2', '2016-12-08 00:00:00'),
(9, 0, '192.168.1.2', '2016-12-08 00:00:00'),
(10, 1, '192.168.1.2', '2016-12-08 00:00:00'),
(11, 0, '192.168.1.3', '2016-12-09 00:00:00'),
(12, 0, '192.168.1.3', '2016-12-09 00:00:00'),
(13, 0, '192.168.1.3', '2016-12-09 00:00:00'),
(14, 0, '192.168.1.3', '2016-12-09 00:00:00'),
(15, 0, '192.168.1.3', '2016-12-09 00:00:00');
http://sqlfiddle.com/#!9/20c0f
I expect the SQL to erase all records except those whose order_id is 5, 10 or 15.
Explanation: I would like to erase records which have order_status_id = 0, under these conditions:
a) If one of the records with the same ip/date_added has order_status_id = 1, then erase all records with order_status_id = 0 from that ip/date_added. In my example, records 1-4 and 6-9 should be deleted.
b) If no record with the same ip/date_added has order_status_id = 1 (all records have order_status_id = 0), then keep the one record with the highest order_id and delete all the others. In my example, records 11-14 should be deleted.

SQL DEMO
First you need to separate your conditions. Each {ip, date_added} group holds order_status_id values from {0, 1}, so the maximum per group tells the two cases apart:
-- first filter
SELECT `ip`, `date_added`
FROM `order`
GROUP BY `ip`, `date_added`
HAVING MAX(`order_status_id`) = 1;
-- second filter
SELECT `ip`, `date_added`
FROM `order`
GROUP BY `ip`, `date_added`
HAVING MAX(`order_status_id`) = 0;
First delete
Delete every row in a group matching the first filter, but only the rows with order_status_id = 0, so the row with status 1 is left in place.
DELETE o
FROM `order` o
INNER JOIN (
SELECT `ip`, `date_added`
FROM `order`
GROUP BY `ip`, `date_added`
HAVING MAX(`order_status_id`) = 1
) filter
ON o.`ip` = filter.`ip`
AND o.`date_added` = filter.`date_added`
WHERE o.`order_status_id` = 0;
Second delete
Delete the rows in groups matching the second filter, but keep the one row in each group that has no higher order_id. The self-join has to be restricted to the same ip/date_added; otherwise a group whose ids are all lower than another group's would be deleted entirely.
DELETE o1
FROM `order` o1
INNER JOIN (
SELECT `ip`, `date_added`
FROM `order`
GROUP BY `ip`, `date_added`
HAVING MAX(`order_status_id`) = 0
) filter
ON o1.`ip` = filter.`ip`
AND o1.`date_added` = filter.`date_added`
INNER JOIN `order` o2
ON o2.`ip` = o1.`ip`
AND o2.`date_added` = o1.`date_added`
AND o1.`order_id` < o2.`order_id`;
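The two rules can be checked end to end against the sample data. The following is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and because SQLite has no multi-table DELETE ... JOIN, the deletes are written with correlated EXISTS subqueries instead (an assumed equivalent, not the syntax used above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "order" (order_id INT, order_status_id INT, ip TEXT, date_added TEXT)'
)

# Rebuild the 15 sample rows: three ip/date groups of five orders each.
groups = [("192.168.1.1", "2016-12-07"),
          ("192.168.1.2", "2016-12-08"),
          ("192.168.1.3", "2016-12-09")]
rows = []
for group, (ip, day) in enumerate(groups):
    for offset in range(1, 6):
        order_id = group * 5 + offset
        # Orders 5 and 10 are the only ones with status 1, as in the question.
        status = 1 if order_id in (5, 10) else 0
        rows.append((order_id, status, ip, day + " 00:00:00"))
conn.executemany('INSERT INTO "order" VALUES (?, ?, ?, ?)', rows)

# Rule (a): a status-0 row is deleted if its group contains a status-1 row.
conn.execute("""
    DELETE FROM "order"
    WHERE order_status_id = 0
      AND EXISTS (SELECT 1 FROM "order" AS s
                  WHERE s.ip = "order".ip
                    AND s.date_added = "order".date_added
                    AND s.order_status_id = 1)
""")

# Rule (b): among the remaining status-0 rows, keep only the highest
# order_id in each group.
conn.execute("""
    DELETE FROM "order"
    WHERE order_status_id = 0
      AND EXISTS (SELECT 1 FROM "order" AS h
                  WHERE h.ip = "order".ip
                    AND h.date_added = "order".date_added
                    AND h.order_id > "order".order_id)
""")

remaining = [r[0] for r in conn.execute('SELECT order_id FROM "order" ORDER BY order_id')]
print(remaining)
```

Both subqueries compare rows only within the same ip/date_added group, which is what keeps the "highest order_id" test from looking at other groups.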

Related

How do I build a query to get the latest row per user where a third criteria is in a separate table?

I have three tables
CREATE TABLE `LineItems` (
`LineItemID` int NOT NULL,
`OrderID` int NOT NULL,
`ProductID` int NOT NULL
);
INSERT INTO `LineItems` (`LineItemID`, `OrderID`, `ProductID`) VALUES
(1, 1, 2),
(2, 1, 1),
(3, 2, 3),
(4, 2, 4),
(5, 3, 1),
(6, 4, 2),
(7, 5, 4),
(8, 5, 2),
(9, 5, 3),
(10, 6, 1),
(11, 6, 4),
(12, 7, 4),
(13, 7, 1),
(14, 7, 2),
(15, 8, 1),
(16, 9, 3),
(17, 9, 4),
(18, 10, 3);
CREATE TABLE `Orders` (
`OrderID` int NOT NULL,
`UserID` int NOT NULL,
`OrderDate` datetime NOT NULL
);
INSERT INTO `Orders` (`OrderID`, `UserID`, `OrderDate`) VALUES
(1, 21, '2021-05-01 00:00:00'),
(2, 21, '2021-05-03 00:00:00'),
(3, 24, '2021-05-06 00:00:00'),
(4, 23, '2021-05-12 00:00:00'),
(5, 21, '2021-05-14 00:00:00'),
(6, 22, '2021-05-16 00:00:00'),
(7, 23, '2021-05-20 00:00:00'),
(8, 21, '2021-05-22 00:00:00'),
(9, 24, '2021-05-23 00:00:00'),
(10, 23, '2021-05-26 00:00:00');
CREATE TABLE `Products` (
`ProductID` int NOT NULL,
`ProductTitle` VARCHAR(250) NOT NULL,
`ProductType` enum('doors','windows','flooring') NOT NULL
);
INSERT INTO `Products` (`ProductID`, `ProductTitle`, `ProductType`) VALUES
(1, 'French Doors','doors'),
(2, 'Sash Windows','windows'),
(3, 'Sliding Doors','doors'),
(4, 'Parquet Floor','flooring');
SQL Fiddle:
Orders - contains an order date and a user id
LineItems - Foreign key to the orders table, contains product ids that are in the order
Products - Contains details of the products (including if they are a door, window, or flooring)
I have figured out how to get the latest order per user with
SELECT O.* FROM Orders O LEFT JOIN Orders O2
ON O2.UserID=O.UserID AND O.OrderDate < O2.OrderDate
WHERE O2.OrderDate IS NULL;
This works fine and is included in the SQL fiddle, along with a query that returns a complete picture for reference.
I am trying to figure out how to get the latest order with flooring per user, but I'm not having any luck.
In the SQL fiddle linked above, the intended output for what I am after would be
OrderID | UserID | OrderDate
6 | 22 | 2021-05-16T00:00:00Z
5 | 21 | 2021-05-14T00:00:00Z
9 | 24 | 2021-05-23T00:00:00Z
7 | 23 | 2021-05-20T00:00:00Z
EDIT: To clarify, in the intended result, two rows (for users 21 and 23) are different than in the query that gets just latest order per user. This is because order IDs 8 and 10 (from the latest order per user query) do not include flooring. The intended query has to find the latest order with flooring from each user to return in the result set.
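You need to add the LineItems and Products tables to your query to find orders where flooring was purchased. The "no later order" comparison must also look only at flooring orders; otherwise a user whose most recent order has no flooring (users 21 and 23 here) drops out of the result:
SELECT DISTINCT O.*
FROM Orders O
INNER JOIN LineItems i
ON i.OrderID = O.OrderID
INNER JOIN Products p
ON p.ProductID = i.ProductID
LEFT JOIN (
SELECT O2.UserID, O2.OrderDate
FROM Orders O2
INNER JOIN LineItems i2
ON i2.OrderID = O2.OrderID
INNER JOIN Products p2
ON p2.ProductID = i2.ProductID
WHERE p2.ProductType = 'flooring'
) F
ON F.UserID = O.UserID AND
O.OrderDate < F.OrderDate
WHERE F.OrderDate IS NULL AND
p.ProductType = 'flooring'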
db<>fiddle here
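As a way to check "latest order with flooring per user" against the sample data, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. The CTE name flooring_orders is made up for this example, and the WITH syntax assumes MySQL 8.0 or later if you port it back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (OrderID INT, UserID INT, OrderDate TEXT);
CREATE TABLE LineItems (LineItemID INT, OrderID INT, ProductID INT);
CREATE TABLE Products (ProductID INT, ProductTitle TEXT, ProductType TEXT);
INSERT INTO Orders VALUES
 (1,21,'2021-05-01'),(2,21,'2021-05-03'),(3,24,'2021-05-06'),(4,23,'2021-05-12'),
 (5,21,'2021-05-14'),(6,22,'2021-05-16'),(7,23,'2021-05-20'),(8,21,'2021-05-22'),
 (9,24,'2021-05-23'),(10,23,'2021-05-26');
INSERT INTO LineItems VALUES
 (1,1,2),(2,1,1),(3,2,3),(4,2,4),(5,3,1),(6,4,2),(7,5,4),(8,5,2),(9,5,3),
 (10,6,1),(11,6,4),(12,7,4),(13,7,1),(14,7,2),(15,8,1),(16,9,3),(17,9,4),(18,10,3);
INSERT INTO Products VALUES
 (1,'French Doors','doors'),(2,'Sash Windows','windows'),
 (3,'Sliding Doors','doors'),(4,'Parquet Floor','flooring');
""")

query = """
WITH flooring_orders AS (
    -- One row per order that contains at least one flooring product.
    SELECT DISTINCT o.OrderID, o.UserID, o.OrderDate
    FROM Orders o
    JOIN LineItems i ON i.OrderID = o.OrderID
    JOIN Products p ON p.ProductID = i.ProductID
    WHERE p.ProductType = 'flooring'
)
SELECT f.OrderID, f.UserID, f.OrderDate
FROM flooring_orders f
LEFT JOIN flooring_orders later
       ON later.UserID = f.UserID AND later.OrderDate > f.OrderDate
WHERE later.OrderID IS NULL  -- no later flooring order for this user
ORDER BY f.UserID
"""
latest_flooring = conn.execute(query).fetchall()
for row in latest_flooring:
    print(row)
```

Filtering to flooring orders first and only then looking for "no later row" is the point of the sketch: both sides of the self-join see the same restricted set of orders.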

Find users with activities in all the last 6 months

I'm looking for the best way to retrieve the list of user IDs with activity in every one of the last 6 months.
Table structure and data, simplified, is the following:
CREATE TABLE activities (
id int,
client_id int,
created_at timestamp
);
insert into activities values
(1, 1, '2019-06-01 00:00:00'),
(2, 2, '2019-06-01 00:00:00'),
(3, 1, '2019-07-01 00:00:00'),
(4, 1, '2019-08-01 00:00:00'),
(5, 1, '2019-09-01 00:00:00'),
(6, 1, '2019-10-01 00:00:00'),
(7, 1, '2019-11-01 00:00:00'),
(8, 2, '2019-11-01 00:00:00'),
(9, 3, '2019-11-01 00:00:00');
I need to retrieve the list of users that have at least one activity in each of the last 6 months. In the example above, that is just client_id 1.
I thought of doing a join, but it seems too expensive. I won't suggest any possible solutions, so as not to bias you, and will accept whatever you have in mind.
Please consider that I have to manage a really big data source (more than 50 million rows).
Any quick idea?
I make no claims for the supremacy of this solution, partly because I find such requests disingenuous, but it should work, at least...
CREATE TABLE activities (
id int,
client_id int,
created_at timestamp
);
insert into activities values
(1, 1, '2019-06-01 00:00:00'),
(2, 2, '2019-06-01 00:00:00'),
(3, 1, '2019-07-01 00:00:00'),
(4, 1, '2019-08-01 00:00:00'),
(5, 1, '2019-09-01 00:00:00'),
(6, 1, '2019-10-01 00:00:00'),
(7, 1, '2019-11-01 00:00:00'),
(8, 2, '2019-11-01 00:00:00'),
(9, 3, '2019-11-01 00:00:00');
SELECT a.client_id
FROM activities a
WHERE a.created_at >= LAST_DAY(CURDATE() - INTERVAL 7 MONTH)+INTERVAL 1 DAY
GROUP
BY a.client_id
HAVING COUNT(DISTINCT(DATE_FORMAT(a.created_at,'%Y-%m'))) >= 6;
+-----------+
| client_id |
+-----------+
| 1 |
+-----------+
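Since the query above depends on the current date, here is a sketch with a pinned reference date so the result is reproducible. It uses Python's sqlite3 module in place of MySQL, so date() and strftime() stand in for LAST_DAY and DATE_FORMAT. The variable today and the reading of "last 6 months" as the current calendar month plus the five before it are assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activities (id INT, client_id INT, created_at TEXT)")
conn.executemany("INSERT INTO activities VALUES (?, ?, ?)", [
    (1, 1, "2019-06-01 00:00:00"), (2, 2, "2019-06-01 00:00:00"),
    (3, 1, "2019-07-01 00:00:00"), (4, 1, "2019-08-01 00:00:00"),
    (5, 1, "2019-09-01 00:00:00"), (6, 1, "2019-10-01 00:00:00"),
    (7, 1, "2019-11-01 00:00:00"), (8, 2, "2019-11-01 00:00:00"),
    (9, 3, "2019-11-01 00:00:00"),
])

# Hypothetical fixed "today" so the result does not depend on the real clock.
today = "2019-11-15"

query = """
SELECT client_id
FROM activities
WHERE created_at >= date(:today, 'start of month', '-5 months')
  AND created_at <  date(:today, 'start of month', '+1 month')
GROUP BY client_id
-- Exactly six distinct calendar months fall inside the window.
HAVING COUNT(DISTINCT strftime('%Y-%m', created_at)) = 6
"""
active_clients = [r[0] for r in conn.execute(query, {"today": today})]
print(active_clients)
```

The range predicate on created_at is kept free of function calls on the column, so an index on (client_id, created_at) or (created_at) could in principle be used on a large table; whether it helps at 50 million rows would need measuring.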

Select rows grouped by a column having max aggregate

Given the following data set, how would I find the email addresses that were references for the most ApplicationIDs that have an "Accepted" decision?
CREATE TABLE IF NOT EXISTS `EmailReferences` (
`ApplicationID` INT NOT NULL,
`Email` VARCHAR(45) NOT NULL,
PRIMARY KEY (`ApplicationID`, `Email`)
);
INSERT INTO EmailReferences (ApplicationID, Email)
VALUES
(1, 'ref10#test.org'), (1, 'ref11#test.org'), (1, 'ref12#test.org'),
(2, 'ref20#test.org'), (2, 'ref21#test.org'), (2, 'ref22#test.org'),
(3, 'ref11#test.org'), (3, 'ref31#test.org'), (3, 'ref32#test.org'),
(4, 'ref40#test.org'), (4, 'ref41#test.org'), (4, 'ref42#test.org'),
(5, 'ref50#test.org'), (5, 'ref51#test.org'), (5, 'ref52#test.org'),
(6, 'ref60#test.org'), (6, 'ref11#test.org'), (6, 'ref62#test.org'),
(7, 'ref70#test.org'), (7, 'ref71#test.org'), (7, 'ref72#test.org'),
(8, 'ref10#test.org'), (8, 'ref81#test.org'), (8, 'ref82#test.org')
;
CREATE TABLE IF NOT EXISTS `FinalDecision` (
`ApplicationID` INT NOT NULL,
`Decision` ENUM('Accepted', 'Denied') NOT NULL,
PRIMARY KEY (`ApplicationID`)
);
INSERT INTO FinalDecision (ApplicationID, Decision)
VALUES
(1, 'Accepted'), (2, 'Denied'),
(3, 'Accepted'), (4, 'Denied'),
(5, 'Denied'), (6, 'Denied'),
(7, 'Denied'), (8, 'Accepted')
;
Fiddle of same: http://sqlfiddle.com/#!9/03bcf2/1
Initially, I was using LIMIT 1 and ORDER BY CountDecision DESC, like so:
SELECT er.email, COUNT(fd.Decision) AS CountDecision
FROM EmailReferences AS er
JOIN FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
WHERE fd.Decision = 'Accepted'
GROUP BY er.email
ORDER BY CountDecision DESC
LIMIT 1
;
However, it occurred to me that I could have multiple email addresses that referred different "most accepted" decisions (i.e., a tie, so to speak), and those would be filtered out (is that the right phrasing?) with the LIMIT keyword.
I then tried a variation on the above query, replacing the ORDER BY and LIMIT lines with:
HAVING MAX(CountDecision)
But I realized that that's only half a statement: MAX(CountDecision) needs to be compared to something. I just don't know what.
Any pointers would be much appreciated. Thanks!
Note: this is for a homework assignment.
Update: To be clear, I'm trying to find the value and count of Emails from EmailReferences. However, I only want rows that have FinalDecision.Decision = 'Accepted' (on matching ApplicationIDs). Based on my data, the result should be:
Email | CountDecision
---------------+--------------
ref10#test.org | 2
ref11#test.org | 2
For example...
SELECT a.*
FROM
( SELECT x.email
, COUNT(*) total
FROM emailreferences x
JOIN finaldecision y
ON y.applicationid = x.applicationid
WHERE y.decision = 'accepted'
GROUP
BY x.email
) a
JOIN
( SELECT COUNT(*) total
FROM emailreferences x
JOIN finaldecision y
ON y.applicationid = x.applicationid
WHERE y.decision = 'accepted'
GROUP
BY x.email
ORDER
BY total DESC
LIMIT 1
) b
ON b.total = a.total;
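On the question of what HAVING MAX(...) should be compared to: one possible form compares each group's count with the maximum over all per-email counts. The sketch below runs it on the question's data through Python's sqlite3 module (a stand-in for MySQL); the aliases per_email, er2 and fd2 are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EmailReferences (ApplicationID INT NOT NULL, Email TEXT NOT NULL,
                              PRIMARY KEY (ApplicationID, Email));
CREATE TABLE FinalDecision (ApplicationID INT PRIMARY KEY, Decision TEXT NOT NULL);
INSERT INTO EmailReferences VALUES
 (1,'ref10#test.org'),(1,'ref11#test.org'),(1,'ref12#test.org'),
 (2,'ref20#test.org'),(2,'ref21#test.org'),(2,'ref22#test.org'),
 (3,'ref11#test.org'),(3,'ref31#test.org'),(3,'ref32#test.org'),
 (4,'ref40#test.org'),(4,'ref41#test.org'),(4,'ref42#test.org'),
 (5,'ref50#test.org'),(5,'ref51#test.org'),(5,'ref52#test.org'),
 (6,'ref60#test.org'),(6,'ref11#test.org'),(6,'ref62#test.org'),
 (7,'ref70#test.org'),(7,'ref71#test.org'),(7,'ref72#test.org'),
 (8,'ref10#test.org'),(8,'ref81#test.org'),(8,'ref82#test.org');
INSERT INTO FinalDecision VALUES
 (1,'Accepted'),(2,'Denied'),(3,'Accepted'),(4,'Denied'),
 (5,'Denied'),(6,'Denied'),(7,'Denied'),(8,'Accepted');
""")

query = """
SELECT er.Email, COUNT(*) AS CountDecision
FROM EmailReferences er
JOIN FinalDecision fd ON fd.ApplicationID = er.ApplicationID
WHERE fd.Decision = 'Accepted'
GROUP BY er.Email
-- Keep every email whose count equals the largest per-email count,
-- so ties are all returned.
HAVING COUNT(*) = (
    SELECT MAX(total) FROM (
        SELECT COUNT(*) AS total
        FROM EmailReferences er2
        JOIN FinalDecision fd2 ON fd2.ApplicationID = er2.ApplicationID
        WHERE fd2.Decision = 'Accepted'
        GROUP BY er2.Email
    ) AS per_email
)
ORDER BY er.Email
"""
top = conn.execute(query).fetchall()
print(top)
```

Unlike ORDER BY ... LIMIT 1, this returns every email tied for the highest count.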
MySQL versions before 8.0 lack window functions; from version 8.0 on, this becomes easier. So for future reference, or for databases like MariaDB that already have window functions:
CREATE TABLE IF NOT EXISTS `EmailReferences` (
`ApplicationID` INT NOT NULL,
`Email` VARCHAR(45) NOT NULL,
PRIMARY KEY (`ApplicationID`, `Email`)
);
INSERT INTO EmailReferences (ApplicationID, Email)
VALUES
(1, 'ref10#test.org'), (1, 'ref11#test.org'), (1, 'ref12#test.org'),
(2, 'ref20#test.org'), (2, 'ref21#test.org'), (2, 'ref22#test.org'),
(3, 'ref30#test.org'), (3, 'ref31#test.org'), (3, 'ref32#test.org'),
(4, 'ref40#test.org'), (4, 'ref41#test.org'), (4, 'ref42#test.org'),
(5, 'ref50#test.org'), (5, 'ref51#test.org'), (5, 'ref52#test.org'),
(6, 'ref60#test.org'), (6, 'ref11#test.org'), (6, 'ref62#test.org'),
(7, 'ref70#test.org'), (7, 'ref71#test.org'), (7, 'ref72#test.org'),
(8, 'ref10#test.org'), (8, 'ref81#test.org'), (8, 'ref82#test.org')
;
CREATE TABLE IF NOT EXISTS `FinalDecision` (
`ApplicationID` INT NOT NULL,
`Decision` ENUM('Accepted', 'Denied') NOT NULL,
PRIMARY KEY (`ApplicationID`)
);
INSERT INTO FinalDecision (ApplicationID, Decision)
VALUES
(1, 'Accepted'), (2, 'Denied'),
(3, 'Accepted'), (4, 'Denied'),
(5, 'Denied'), (6, 'Denied'),
(7, 'Denied'), (8, 'Accepted')
;
select email, CountDecision
from (
SELECT er.email, COUNT(fd.Decision) AS CountDecision
, max(COUNT(fd.Decision)) over() maxCountDecision
FROM EmailReferences AS er
JOIN FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
WHERE fd.Decision = 'Accepted'
GROUP BY er.email
) d
where CountDecision = maxCountDecision
email | CountDecision
:------------- | ------------:
ref10#test.org | 2
dbfiddle here

Selecting only the first two items from an order

I need your help with something. I have 3 tables: ORDERS, ORDER_ITEM, ORDER_ITEM_LINE.
CREATE TABLE orders
(`id` int, `date` datetime)
;
INSERT INTO orders
(`id`, `date`)
VALUES
(78, '2017-01-03 00:00:00'),
(79, '2017-02-03 00:00:00'),
(80, '2017-03-03 00:00:00'),
(81, '2017-04-03 00:00:00'),
(82, '2017-05-03 00:00:00'),
(83, '2017-06-03 00:00:00'),
(84, '2017-07-03 00:00:00')
;
CREATE TABLE order_item
(`id` int, `fk_o_id` int, `sku` int)
;
INSERT INTO order_item
(`id`, `fk_o_id`, `sku`)
VALUES
(10, 78, 123),
(11, 79, 124),
(12, 79, 125),
(13, 80, 126),
(14, 82, 127),
(15, 82, 128),
(16, 82, 129)
;
CREATE TABLE order_item_line
(`id` int, `fk_oi_id` int, `line_id` int)
;
INSERT INTO order_item_line
(`id`, `fk_oi_id`, `line_id`)
VALUES
(33, 10, 1),
(34, 11, 1),
(35, 12, 2),
(36, 13, 1),
(37, 14, 1),
(38, 15, 2),
(39, 16, 3)
;
I would like to display all orders with 2 or more items, but only the first two items of each, i.e. line_id 1 and 2.
The outcome should look like:
Outcome
If you have any ideas, thank you in advance.
To get the result you require, one option is to create a helper table. In this example I created a table called TESTQUERY and filled it with the number of times each order id appears.
Table creation
CREATE TABLE TESTQUERY
(`id` int, `count` int)
Data into the test table
INSERT INTO TESTQUERY
(
SELECT o.id, COUNT(o.id) as count FROM orders o
JOIN order_item oi ON oi.fk_o_id = o.id
JOIN order_item_line oil ON oil.fk_oi_id = oi.id
GROUP BY o.id
)
I then queried against all four tables using the query below and it returned your desired outcome
SELECT o.id, oi.sku, oil.line_id FROM orders o
JOIN order_item oi ON oi.fk_o_id = o.id
JOIN order_item_line oil ON oil.fk_oi_id = oi.id
JOIN TESTQUERY t ON t.id = o.id
WHERE t.count > 1 AND oil.line_id < 3
I hope this helps
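The same result can probably be had without a helper table by moving the count into a subquery. This is a sketch run through Python's sqlite3 module as a stand-in for MySQL. It counts rows in order_item, which assumes each item has exactly one row in order_item_line, as in the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INT, date TEXT);
CREATE TABLE order_item (id INT, fk_o_id INT, sku INT);
CREATE TABLE order_item_line (id INT, fk_oi_id INT, line_id INT);
INSERT INTO orders VALUES
 (78,'2017-01-03'),(79,'2017-02-03'),(80,'2017-03-03'),(81,'2017-04-03'),
 (82,'2017-05-03'),(83,'2017-06-03'),(84,'2017-07-03');
INSERT INTO order_item VALUES
 (10,78,123),(11,79,124),(12,79,125),(13,80,126),
 (14,82,127),(15,82,128),(16,82,129);
INSERT INTO order_item_line VALUES
 (33,10,1),(34,11,1),(35,12,2),(36,13,1),(37,14,1),(38,15,2),(39,16,3);
""")

query = """
SELECT o.id, oi.sku, oil.line_id
FROM orders o
JOIN order_item oi ON oi.fk_o_id = o.id
JOIN order_item_line oil ON oil.fk_oi_id = oi.id
WHERE oil.line_id <= 2                      -- only the first two lines
  AND o.id IN (SELECT fk_o_id               -- orders with at least two items
               FROM order_item
               GROUP BY fk_o_id
               HAVING COUNT(*) >= 2)
ORDER BY o.id, oil.line_id
"""
first_two = conn.execute(query).fetchall()
for row in first_two:
    print(row)
```

A subquery like this is recomputed on every run, so it cannot go stale the way a separately populated TESTQUERY table can.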

MySQL: SELECT all users with 5 unopened messages in their last 5 messages received in a messages table

I'm using mysql.
I have a messages table with userid, message_id, opened (true or false), timestamp.
I want all users who did not open any of their last 5 messages received.
This is what I have right now:
SELECT mnc.userid
FROM `messages` mnc
WHERE (select count(*) from messages as m where m.userid = mnc.userid
and m.message_sendtime_timestamp >= mnc.message_sendtime_timestamp
and m.opened = 'FALSE') >= 6
But this gives me users with 6 or more unopened messages, not necessarily consecutive ones.
Here are sample data
CREATE TABLE messages
(`user_id` int, `timestamp` datetime, `opened` varchar(5))
;
INSERT INTO messages
(`user_id`, `timestamp`, `opened`)
VALUES
(1, '2016-01-01 00:00:00', 'false'),
(1, '2016-02-01 00:00:00', 'false'),
(1, '2016-03-01 00:00:00', 'false'),
(1, '2016-04-01 00:00:00', 'false'),
(1, '2016-05-01 00:00:00', 'false'),
(1, '2016-06-01 00:00:00', 'false'),
(2, '2016-01-01 00:00:00', 'false'),
(2, '2016-02-01 00:00:00', 'false'),
(2, '2016-03-01 00:00:00', 'false'),
(3, '2015-01-01 00:00:00', 'false'),
(3, '2016-01-01 00:00:00', 'false'),
(3, '2016-02-01 00:00:00', 'false'),
(3, '2016-03-01 00:00:00', 'false'),
(3, '2016-04-01 00:00:00', 'false'),
(3, '2016-05-01 00:00:00', 'true'),
(3, '2016-06-01 00:00:00', 'false'),
(4, '2015-01-01 00:00:00', 'true'),
(4, '2015-02-01 00:00:00', 'true'),
(4, '2016-01-01 00:00:00', 'false'),
(4, '2016-02-01 00:00:00', 'false'),
(4, '2016-03-01 00:00:00', 'false'),
(4, '2016-04-01 00:00:00', 'false'),
(4, '2016-05-01 00:00:00', 'false'),
(4, '2016-06-01 00:00:00', 'false')
Expected result :
userid
1
4
To answer this question:
"I want all users who did not open a message in their last 5 messages received"
First you need to create a row_id for each user_id, numbering each user's messages from newest to oldest:
SELECT @rowid := IF(@prev_value = user_id, @rowid + 1, 1) as row_id,
m.*,
@prev_value := user_id
FROM messages m,
(SELECT @rowid := 0) x,
(SELECT @prev_value := '') y
ORDER BY user_id, `timestamp` DESC
Then check how many unopened messages you have in that subquery
SQL Fiddle Demo
SELECT user_id, COUNT(*), SUM(opened = 'false')
FROM (
SELECT @rowid := IF(@prev_value = user_id, @rowid + 1, 1) as row_id,
m.*,
@prev_value := user_id
FROM messages m,
(SELECT @rowid := 0) x,
(SELECT @prev_value := '') y
ORDER BY user_id, `timestamp` DESC
) T
WHERE row_id <= 5 -- only check last 5 or less messages
GROUP BY user_id
HAVING COUNT(*) = SUM(opened = 'false') -- Check all messages are NOT opened
SELECT
MAX(CASE WHEN (t.ct = 5 and t.op=5) THEN t.user_id END) AS userid
FROM
(
SELECT
user_id,
opened,timestamp ,
@opened := opened,
IF ( (@opened = 'false' && @prev = user_id) ,@o := @o + 1,@o := 1),
IF(@opened='true',@o:=0,@o) op,
IF (@prev = user_id ,@c := @c + 1,(@c := 1)) ct,
@prev := user_id
FROM (SELECT @prev := 0 ,@c := 1,@opened :='0',@o := 0) var,
messages
order by user_id asc,timestamp desc
) t
GROUP BY t.user_id
Check here: http://sqlfiddle.com/#!9/8447a3/1
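Where window functions are available, the per-user row numbering can be done with ROW_NUMBER() instead of user variables. The sketch below runs on the sample data through Python's sqlite3 module as a stand-in for MySQL; it assumes SQLite 3.25 or later (or MySQL 8.0 or later if ported back). It also assumes a user must have a full 5 messages to qualify, which matches the expected result (user 2, with only 3 messages, is left out).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (user_id INT, timestamp TEXT, opened TEXT)")
# Sample data from the question; the midnight time part is dropped for brevity.
rows = [
    (1, "2016-01-01", "false"), (1, "2016-02-01", "false"), (1, "2016-03-01", "false"),
    (1, "2016-04-01", "false"), (1, "2016-05-01", "false"), (1, "2016-06-01", "false"),
    (2, "2016-01-01", "false"), (2, "2016-02-01", "false"), (2, "2016-03-01", "false"),
    (3, "2015-01-01", "false"), (3, "2016-01-01", "false"), (3, "2016-02-01", "false"),
    (3, "2016-03-01", "false"), (3, "2016-04-01", "false"), (3, "2016-05-01", "true"),
    (3, "2016-06-01", "false"),
    (4, "2015-01-01", "true"), (4, "2015-02-01", "true"), (4, "2016-01-01", "false"),
    (4, "2016-02-01", "false"), (4, "2016-03-01", "false"), (4, "2016-04-01", "false"),
    (4, "2016-05-01", "false"), (4, "2016-06-01", "false"),
]
conn.executemany("INSERT INTO messages VALUES (?, ?, ?)", rows)

query = """
SELECT user_id
FROM (
    -- rn = 1 is each user's newest message.
    SELECT user_id, opened,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY timestamp DESC) AS rn
    FROM messages
) AS ranked
WHERE rn <= 5
GROUP BY user_id
-- Exactly five recent messages, all of them unopened.
HAVING COUNT(*) = 5 AND SUM(opened = 'false') = 5
ORDER BY user_id
"""
unopened_users = [r[0] for r in conn.execute(query)]
print(unopened_users)
```

Dropping the COUNT(*) = 5 condition and comparing COUNT(*) with the SUM instead would also return users with fewer than five messages, all unopened.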