Identifying users with a downward trend (SQL / MySQL)

Trying to identify a list of customers whose quantity decreases from their previous purchase.
In this example we see that with each new purchase Mary's quantity decreases over time. However, while Bob shows a decline, he would not appear in the results because on 9/19 he purchased a quantity of 8, which is greater than his previous purchase of 5.
I'm trying to figure out a query for this, but for the life of me I can't seem to get it together.
Customer PurchaseDate Quantity
Bob 9/1/2021 10
Bob 9/10/2021 6
Bob 9/18/2021 5
Bob 9/19/2021 8
Mary 9/1/2021 10
Mary 9/10/2021 6
Mary 9/18/2021 5
Mary 9/19/2021 3
Frank 9/1/2021 5
Lucus 9/1/2021 5
Lucus 9/10/2021 6
Lucus 9/18/2021 10
The end result should be:
Customer
Mary

This is a bit tricky. To find results that are steadily increasing or decreasing you would probably want to use the MATCH_RECOGNIZE clause, which MySQL doesn't (yet) support; that way you can define a pattern whereby each quantity is less than the previous value. Additionally, you could probably do this with a recursive CTE, but that would be outside of my abilities. (Note that the examples below also use QUALIFY and SELECT ... FROM VALUES, which won't run as-is on MySQL, so treat them as illustrations of the approach.)
Here is what I came up with, with the caveat that it only compares the first and last values:
WITH
tbl (customer, purchasedate, quantity) AS (
SELECT * FROM VALUES
('Bob', '9/1/2021', 10),
('Bob', '9/10/2021', 6),
('Bob', '9/18/2021', 5),
('Bob', '9/19/2021', 8),
('Mary', '9/1/2021', 10),
('Mary', '9/10/2021', 6),
('Mary', '9/18/2021', 5),
('Mary', '9/19/2021', 3),
('Frank', '9/1/2021', 5),
('Lucus', '9/1/2021', 5),
('Lucus', '9/10/2021', 6),
('Lucus', '9/18/2021', 10)
)
SELECT
DISTINCT customer
FROM
tbl
QUALIFY
FIRST_VALUE(quantity) OVER (partition BY customer ORDER BY purchasedate)
> LAST_VALUE(quantity) OVER (PARTITION BY customer ORDER BY purchasedate)
Which gives:
CUSTOMER
Bob
Mary
Or, to get strictly decreasing with a known maximum number of purchases, you can chain them all together, which gets pretty ugly:
WITH
tbl (customer, purchasedate, quantity) AS (
SELECT * FROM VALUES
('Bob', '9/1/2021', 10),
('Bob', '9/10/2021', 6),
('Bob', '9/18/2021', 5),
('Bob', '9/19/2021', 8),
('Mary', '9/1/2021', 10),
('Mary', '9/10/2021', 6),
('Mary', '9/18/2021', 5),
('Mary', '9/19/2021', 3),
('Frank', '9/1/2021', 5),
('Lucus', '9/1/2021', 5),
('Lucus', '9/10/2021', 6),
('Lucus', '9/18/2021', 10)
)
SELECT
DISTINCT customer
FROM
tbl
qualify
(NTH_VALUE(quantity, 1) OVER (partition BY customer ORDER BY purchasedate) >= NTH_VALUE(quantity, 2) OVER (partition BY customer ORDER BY purchasedate))
and ((NTH_VALUE(quantity, 2) OVER (partition BY customer ORDER BY purchasedate) >= NTH_VALUE(quantity, 3) OVER (partition BY customer ORDER BY purchasedate)) or (NTH_VALUE(quantity, 3) OVER (partition BY customer ORDER BY purchasedate) is null))
and ((NTH_VALUE(quantity,3) OVER (partition BY customer ORDER BY purchasedate) >= NTH_VALUE(quantity, 4) OVER (partition BY customer ORDER BY purchasedate)) or (NTH_VALUE(quantity, 4) OVER (partition BY customer ORDER BY purchasedate) is null))
Which gives:
CUSTOMER
Mary
Though for an unknown number of purchases I would think MATCH_RECOGNIZE would be the best solution (or you could add in some recursion or a custom function).
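For what it's worth, on MySQL 8.0+ a similar check can be sketched with LAG() instead of QUALIFY. This is only a sketch, assuming the data sits in a table called purchases (a stand-in name) with the same three columns and with purchasedate stored as a real DATE:
-- Compare each purchase with the customer's previous one, then keep customers
-- that have at least two purchases and never held steady or increased.
SELECT customer
FROM (
    SELECT customer,
           quantity,
           LAG(quantity) OVER (PARTITION BY customer ORDER BY purchasedate) AS prev_qty
    FROM purchases
) x
GROUP BY customer
HAVING COUNT(prev_qty) > 0             -- at least two purchases
   AND SUM(quantity >= prev_qty) = 0;  -- no purchase was flat or higher than the previous one
Against the sample rows this keeps only Mary: Bob is excluded by his 9/19 increase, Frank has a single purchase, and Lucus is increasing.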

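-- MySQL 5.x-style session-variable approach: walk each customer's rows in purchase-date
-- order, flag any row whose quantity is higher than that customer's previous row, then
-- keep only the customers for which no increase was flagged.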
SELECT Customer
FROM ( SELECT CASE WHEN Customer = #customer AND Quantity > #quantity
THEN 1
ELSE 0
END AS increase_detected,
#customer := Customer Customer,
PurchaseDate,
#quantity := Quantity Quantity
FROM test
CROSS JOIN ( SELECT #customer := NULL, #quantity := NULL ) init_variables
ORDER BY Customer, PurchaseDate
) subquery
GROUP BY Customer
HAVING NOT SUM(increase_detected);
https://dbfiddle.uk/?rdbms=mysql_5.6&fiddle=68b75b0df7fe4b383896e78db0caa569

Related

How to find the difference between two timestamped rows whenever value changes between rows in MySQL

My Data Set looks like this:
The Output given in column D is derived as follows:
Output against index 2 : TimeStamp in Index 3 - TimeStamp in Index 2
Output against index 6 : TimeStamp in Index 10 - TimeStamp in Index 6
Output against index 12 : TimeStamp in Index 15 - TimeStamp in Index 12
DataSet MySQL V2012
create table #temp11 (Index# int, TimeStamp# Datetime, Alarm int)
insert into #temp11 values
(1, '10/6/2019 00:08:01', 0),
(2, '10/6/2019 00:08:13' ,1),
(3, '10/6/2019 00:08:15' ,1),
(4, '10/6/2019 00:10:47' ,0),
(5, '10/6/2019 00:10:58' ,0),
(6, '10/6/2019 00:10:59' ,1),
(7, '10/6/2019 00:11:00' ,1),
(8, '10/6/2019 00:11:01' ,1),
(9, '10/6/2019 00:11:02' ,1),
(10, '10/6/2019 00:11:03' ,1),
(11, '10/6/2019 00:11:04' ,0),
(12, '10/6/2019 00:11:05' ,1),
(13, '10/6/2019 00:11:06' ,1),
(14, '10/6/2019 00:11:07' ,1),
(15,'10/6/2019 00:11:15' ,1)
TIA
This is a variant of the gaps-and-islands problem. Here is one way to solve it using window functions (available in MySQL 8.0):
select
t.*,
case when
alarm = 1
and row_number() over(partition by alarm, rn1 - rn2 order by TimeStamp) = 1
then timestampdiff(
second,
min(TimeStamp) over(partition by alarm, rn1 - rn2),
max(TimeStamp) over(partition by alarm, rn1 - rn2)
)
end as `out`
from (
select
t.*,
row_number() over(order by TimeStamp) rn1,
row_number() over(partition by alarm order by TimeStamp) rn2
from mytable t
) t
The inner query ranks records over the whole table and within partitions of records that share the same alarm value. The difference between the two ranks gives you the group each record belongs to.
Then, the outer query identifies the first record in each group with alarm = 1 and computes the difference between the first and last record of the group, in seconds.
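Working that through on the sample data: the alarm = 1 rows at indexes 2-3 get rn1 - rn2 = 1, indexes 6-10 get rn1 - rn2 = 3, and indexes 12-15 get rn1 - rn2 = 4, so each island of consecutive 1s lands in its own partition. The output against indexes 2, 6 and 12 is then the span in seconds between the first and last TimeStamp of that island.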

Calculating product purchases in a Financial Year | SQL Server

I would like to find out product purchases for 2 financial years (FY16-17 & FY17-18).
To go about it:
OwnerID: 101, the first purchase is in 2014 with 3 purchases in FY17-18.
OwnerID: 102, the first purchase is in 2011 with 1 purchase in FY16-17, 1 purchase in FY17-18.
OwnerID: 103, the first purchase is in 2017; however, it should not be considered, as he's a new customer with only 1 purchase in FY17-18 (i.e. the first purchase is not considered for a new customer).
OwnerID: 104, the first purchase is in 2016, and he made 3 more purchases in FY16-17.
Code:
CREATE TABLE Test
(
OwnerID INT,
ProductID VARCHAR(255),
PurchaseDate DATE
);
INSERT INTO Test (OwnerID, ProductID, PurchaseDate)
VALUES (101, 'P2', '2014-04-03'), (101, 'P9', '2017-08-09'),
(101, 'P11', '2017-10-05'), (101, 'P12', '2018-01-15'),
(102, 'P1', '2011-06-02'), (102, 'P3', '2016-06-03'),
(102, 'P10', '2017-09-01'),
(103, 'P8', '2017-06-23'),
(104, 'P4', '2016-12-17'), (104, 'P5', '2016-12-18'),
(104, 'P6', '2016-12-19'), (104, 'P7', '2016-12-20');
Desired output:
FY16-17 FY17-18
-----------------
5 4
I tried the query below to fetch records that aren't the first occurrence, thereby getting the count within the financial years:
SELECT *
FROM
(SELECT
ROW_NUMBER() OVER(PARTITION BY OwnerID ORDER BY PurchaseDate) AS OCCURANCE
FROM Test
GROUP BY OwnerID, PurchaseDate)
WHERE
OCCURANCE <> 1
However it throws an error:
Msg 102, Level 15, State 1, Line 5
Incorrect syntax near ')'.
The subquery needs to have an alias - try this:
SELECT *
FROM
(SELECT
ROW_NUMBER() OVER(PARTITION BY OwnerID ORDER BY PurchaseDate) AS OCCURRENCE
FROM Test
GROUP BY OwnerID, PurchaseDate) subQry
WHERE
subQry.OCCURRENCE <> 1
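That fixes the syntax error, but as written the subquery only projects the row number, so SELECT * in the outer query returns nothing but a column of numbers. For example, a sketch that also carries the purchase columns through (dropping the GROUP BY and numbering the raw rows):
SELECT subQry.OwnerID, subQry.ProductID, subQry.PurchaseDate
FROM
(SELECT
OwnerID, ProductID, PurchaseDate,
ROW_NUMBER() OVER(PARTITION BY OwnerID ORDER BY PurchaseDate) AS OCCURRENCE
FROM Test) subQry
WHERE
subQry.OCCURRENCE <> 1;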
I am using IIF to separate the two fiscal years and a subquery to filter out owners with only one purchase:
SELECT SUM(IIF(PurchaseDate >= '2016-04-01' AND PurchaseDate < '2017-04-01',1,0)) AS 'FY16-17',
SUM(IIF(PurchaseDate >= '2017-04-01' AND PurchaseDate < '2018-04-01',1,0)) AS 'FY17-18'
FROM test t1
JOIN (SELECT ownerID, COUNT(*) count
FROM test
GROUP BY ownerID) t2 on t1.ownerID = t2.ownerID
WHERE t2.count > 1

Count consecutive rows with a particular status

I need to count whether there are three consecutive failed login attempts by a user in the last hour.
For example
id userid status logindate
1 1 0 2014-08-28 10:00:00
2 1 1 2014-08-28 10:10:35
3 1 0 2014-08-28 10:30:00
4 1 0 2014-08-28 10:40:00
In the above example, status 0 means failed attempt and 1 means successful attempt.
I need a query that will count three consecutive records of a user with status 0 that occurred in the last hour.
I tried the query below:
SELECT COUNT( * ) AS total, Temp.status
FROM (
SELECT a.status, MAX( a.id ) AS idlimit
FROM loginAttempts a
GROUP BY a.status
ORDER BY MAX( a.id ) DESC
) AS Temp
JOIN loginAttempts t ON Temp.idlimit < t.id
HAVING total >1
Result:
total status
2 1
I don't know why it displays status as 1. I also need to add a WHERE condition on the logindate and status fields, but I don't know how that would work.
For a consecutive count you can use user-defined variables to track the series. In the query below I use the #g and #r variables: in the inner query I store the current status value (1 or 0) in #g, and in the CASE expression I compare the value held in #g (the previous row's status) with the current row's status. If they are equal, #r keeps its value; if they differ (#g <> a.status), #r is incremented by 1. One thing to note: I order by the id column and assume it is AUTO_INCREMENT, so for a run of consecutive 1s #r keeps the same value (say 3 for the first status 1, and still 3 for the next status 1) until the status changes to 0, and vice versa for runs of 0s.
SELECT t.userid,t.consecutive,t.status,COUNT(1) consecutive_count
FROM (
SELECT a.* ,
#r:= CASE WHEN #g = a.status THEN #r ELSE #r + 1 END consecutive,
#g:= a.status g
FROM attempts a
CROSS JOIN (SELECT #g:=2, #r:=0) t1
WHERE a.`logindate` BETWEEN '2014-08-28 10:00:00' AND '2014-08-28 11:00:00'
ORDER BY id
) t
GROUP BY t.userid,t.consecutive,t.status
HAVING consecutive_count >= 3 AND t.status = 0
Now in the parent query I group the results by userid, by the resulting value of the CASE expression (which I have named consecutive), and by status, to get the count of each user's consecutive statuses.
One thing to note about the above query: it's necessary to provide the hour range, as I have done with BETWEEN; without it, it would be much more difficult to find exactly 3 consecutive statuses within an hour.
Sample data
INSERT INTO attempts
(`id`, `userid`, `status`, `logindate`)
VALUES
(1, 1, 0, '2014-08-28 10:00:00'),
(2, 1, 1, '2014-08-28 10:10:35'),
(3, 1, 0, '2014-08-28 10:30:00'),
(4, 1, 0, '2014-08-28 10:40:00'),
(5, 1, 0, '2014-08-28 10:50:00'),
(6, 2, 0, '2014-08-28 10:00:00'),
(7, 2, 0, '2014-08-28 10:10:35'),
(8, 2, 0, '2014-08-28 10:30:00'),
(9, 2, 1, '2014-08-28 10:40:00'),
(10, 2, 1, '2014-08-28 10:50:00')
;
As you can see, ids 3 to 5 are consecutive 0s for userid 1, and similarly ids 6 to 8 are consecutive 0s for userid 2, both within an hour range. Using the above query you get the results below:
userid consecutive status consecutive_count
------ ----------- ------ -------------------
1 2 0 3
2 2 0 3
Fiddle Demo
M Khalid Junaid's answer is great, but his Fiddle Demo didn't work for me when I clicked it.
Here is a Fiddle Demo which works as of this writing.
In case it doesn't work later, I used the following in the schema:
CREATE TABLE attempts
(`id` int, `userid` int, `status` int, `logindate` datetime);
INSERT INTO attempts
(`id`, `userid`, `status`, `logindate`)
VALUES
(1, 1, 0, '2014-08-28 10:00:00'),
(2, 1, 1, '2014-08-28 10:10:35'),
(3, 1, 0, '2014-08-28 10:30:00'),
(4, 1, 0, '2014-08-28 10:40:00'),
(5, 1, 0, '2014-08-28 10:50:00'),
(6, 2, 0, '2014-08-28 10:00:00'),
(7, 2, 0, '2014-08-28 10:10:35'),
(8, 2, 0, '2014-08-28 10:30:00'),
(9, 2, 1, '2014-08-28 10:40:00'),
(10, 2, 1, '2014-08-28 10:50:00')
;
And this as the query:
SELECT t.userid,t.consecutive,t.status,COUNT(1) consecutive_count
FROM (
SELECT a.* ,
#r:= CASE WHEN #g = a.status THEN #r ELSE #r + 1 END consecutive,
#g:= a.status g
FROM attempts a
CROSS JOIN (SELECT #g:=2, #r:=0) t1
WHERE a.`logindate` BETWEEN '2014-08-28 10:00:00' AND '2014-08-28 11:00:00'
ORDER BY id
) t
GROUP BY t.userid,t.consecutive,t.status
HAVING consecutive_count >= 3 AND t.status = 0;
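For MySQL 8.0 and later, where assigning user variables inside a SELECT is deprecated, the same gaps-and-islands idea can be sketched with window functions instead; this is only a sketch against the attempts table above:
SELECT userid,
       COUNT(*) AS consecutive_count
FROM (
    SELECT a.*,
           ROW_NUMBER() OVER (PARTITION BY userid ORDER BY id) -
           ROW_NUMBER() OVER (PARTITION BY userid, status ORDER BY id) AS grp
    FROM attempts a
    WHERE logindate BETWEEN '2014-08-28 10:00:00' AND '2014-08-28 11:00:00'
) t
WHERE status = 0
GROUP BY userid, grp
HAVING COUNT(*) >= 3;
Each run of identical statuses for a user gets its own grp value, so any status-0 group with three or more rows is a streak of at least three consecutive failed attempts within the chosen hour.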

Count occurrences that differ within a column

I want to be able to select the number of times the data in columns Somedata_A and Somedata_B has changed from the previous row within its column. I've tried using DISTINCT and it works to some degree: {1,2,3,2,1,1} will show 3 when I want it to show 4, since in sequence there are 5 different values (and therefore 4 changes).
Example:
A,B,C,D,E,F
{1,2,3,2,1,1}
Comparing A to B gives a difference, B to C gives a difference . . . E to F gives no difference. All in all that gives 4 differences within a set of 6 values.
I have gotten DISTINCT to work but it does not really do the trick for me. To add more to the question, I'm really not interested in the whole range; let's say just the last 2 days/entries per Title.
Second, I'm concerned about performance. I tried the query below on a real set of data and it got interrupted, probably due to a timeout.
SQL Fiddle
MySQL 5.5.32 Schema Setup:
CREATE TABLE testdata(
Title varchar(10),
Date varchar(10),
Somedata_A int(5),
Somedata_B int(5));
INSERT INTO testdata (Title, Date, Somedata_A, Somedata_B) VALUES
("Alpha", '123', 1, 2),
("Alpha", '234', 2, 2),
("Alpha", '345', 1, 2),
("Alpha", '349', 1, 2),
("Alpha", '456', 1, 2),
("Omega", '123', 1, 1),
("Omega", '234', 2, 2),
("Omega", '345', 3, 3),
("Omega", '349', 4, 3),
("Omega", '456', 5, 4),
("Delta", '123', 1, 1),
("Delta", '234', 2, 2),
("Delta", '345', 1, 3),
("Delta", '349', 2, 3),
("Delta", '456', 1, 4);
Query 1:
SELECT t.Title, (SELECT COUNT(DISTINCT Somedata_A) FROM testdata AS tt WHERE t.Title = tt.Title) AS A,
(SELECT COUNT(DISTINCT Somedata_B) FROM testdata AS tt WHERE t.Title = tt.Title) AS B
FROM testdata AS t
GROUP BY t.Title
Results:
| TITLE | A | B |
|-------|---|---|
| Alpha | 2 | 1 |
| Delta | 2 | 4 |
| Omega | 5 | 4 |
Something like this may work: it uses a variable for row number, joins on an offset of 1 and then counts differences for A and B.
http://sqlfiddle.com/#!2/3bbc8/9/2
set #i = 0;
set #j = 0;
Select
A.Title aTitle,
sum(Case when A.SomeData_A <> B.SomeData_A then 1 else 0 end) AVar,
sum(Case when A.SomeData_B <> B.SomeData_B then 1 else 0 end) BVar
from
(SELECT Title, #i:=#i+1 as ROWID, SomeData_A, SomeData_B
FROM testdata
ORDER BY Title, date desc) as A
INNER JOIN
(SELECT Title, #j:=#j+1 as ROWID, SomeData_A, SomeData_B
FROM testdata
ORDER BY Title, date desc) as B
ON A.RowID= B.RowID + 1
AND A.Title=B.Title
Group by A.Title
This works (see here). (FYI: your results in the question do not match your data; for instance, for Alpha, Somedata_B never changes from 2, so the answer should be 0 rather than 1.)
Hopefully you can adapt this statement to your actual data model:
SELECT t1.title, SUM(t1.Somedata_A<>t2.Somedata_a) as SomeData_A
,SUM(t1.Somedata_b<>t2.Somedata_b) as SomeData_B
FROM testdata AS t1
JOIN testdata AS t2
ON t1.title = t2.title
AND t2.date = DATE_ADD(t1.date, INTERVAL 1 DAY)
GROUP BY t1.title
ORDER BY t1.title;
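On MySQL 8.0+ the per-title change counts can also be sketched with LAG(), which avoids both the variable trick and the one-day date offset. A sketch against the testdata table above (note that Date is a varchar in the sample schema, so ordering by it is only chronological for values like these):
SELECT Title,
       SUM(Somedata_A <> prev_A) AS SomeData_A,
       SUM(Somedata_B <> prev_B) AS SomeData_B
FROM (
    SELECT Title, Somedata_A, Somedata_B,
           LAG(Somedata_A) OVER (PARTITION BY Title ORDER BY Date) AS prev_A,
           LAG(Somedata_B) OVER (PARTITION BY Title ORDER BY Date) AS prev_B
    FROM testdata
) t
GROUP BY Title
ORDER BY Title;
The first row of each title compares against NULL and is skipped by SUM(), so each result is simply the number of row-to-row changes in that column.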

Top contributing users for a particular category

I would like to find the top contributors for a particular state:
The candidates below have gathered votes for that state.
Find the top candidates for each state.
create table uservotes(id int, name varchar(50), vote int,state int);
INSERT INTO uservotes VALUES
(1, 'A', 34,1),
(2, 'B', 80,1),
(3, 'bA', 30,1),
(4, 'C', 8,1),
(5, 'D', 4,1),
(6, 'E', 14,2),
(7, 'F', 304,2),
(8, 'AA', 42,3),
(9, 'Ab', 6,3),
(10, 'Aa', 10,3);
States
create table states(state_id int, name_state varchar(50));
INSERT INTO states VALUES
(1, 'CA'),
(2, 'AL'),
(3, 'AZ');
For state 'CA' I am looking for:
2
1
3
4
5
based on the ranks of contribution.
How do I get that? I really appreciate any help. Thanks in advance.
Code tried:
select uv.*, (#rank := #rank + 1) as rank
from uservotes uv,states s cross join
(select #rank := 0) const on uv.statesid = s.state_id
where name_state = 'CAL'
order by vote desc;
This is easy. You can use join and a group_concat():
select name_state, substring_index(group_concat(id order by vote desc), ',', 5)
from uservotes uv join
states s
on uv.state = s.state_id
group by name_state;
group_concat() will put all the ids in order, with the highest vote first. substring_index() will extract the first five.
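With the sample data above, that should come out roughly as (the second column is unaliased in the query):
name_state   top_ids
CA           2,1,3,4,5
AL           7,6
AZ           8,10,9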
EDIT:
To get the top ranked users in one row, just add a where name_state = 'CA' to the above query.
To get them in different rows:
select uv.*
from uservotes uv join
states s
on uv.state = s.state_id
where name_state = 'CA'
order by vote desc
limit 5;
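On MySQL 8.0+ you could also rank within each state using ROW_NUMBER() instead of a session-variable counter; a sketch against the tables above:
SELECT name_state, id, name, vote, rnk
FROM (
    SELECT s.name_state, uv.id, uv.name, uv.vote,
           ROW_NUMBER() OVER (PARTITION BY uv.state ORDER BY uv.vote DESC) AS rnk
    FROM uservotes uv
    JOIN states s ON uv.state = s.state_id
) ranked
WHERE name_state = 'CA'
  AND rnk <= 5
ORDER BY rnk;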