MySQL Days Between One Order and Next Order Having > 1 Order

My goal is to find how many customers order more than once, bucketed by the criteria below. Or, to put it in other terms, how long it takes for customers to place their next order, using these buckets:
0 - 12 mos
13 - 24 mos
25 - 36 mos
37+ mos
The table is set up by line item on each order, like this:
Customer  Item    OrderNumber  OrderDate
1500      item1   5555         2015-02-01
1500      item2   5555         2015-02-01
1500      item34  5255         2014-05-25
1500      item44  4100         2012-12-30
2200      item55  5100         2014-02-15
2200      item1   5100         2014-02-15
3255      item12  5300         2015-03-05
3255      item34  5399         2014-05-01
3255      item22  5399         2014-05-01
So if it takes less than 12 mos for a customer to order more than once then it should be counted towards the "0-12 mos". If a customer took 18 mos to place their next order they would be counted towards the "13-24 mos" and so on and so forth.
I don't really know where to begin on this one. I will probably need at least: HAVING COUNT(DISTINCT OrderNumber) > 1. I have never used LAG; is this something I should use a MySQL variant of to find the next OrderDate in the sequence?
Any help would be appreciated to at least start to identify the components of the query needed.

If you just want to bucket customers by the time between their first and last orders, you can do something like this:
select floor(days_between / 365) as numyears, count(*)
from (select customer,
             datediff(max(orderdate), min(orderdate)) as days_between,
             count(*) as numorders
      from orders
      group by customer
      having count(*) >= 2
     ) c
group by numyears;
If you really want to understand the timing between orders, learn about survival analysis, particularly recurrent event analysis. Your question, although well-enough formed, is rather naive because it does not consider customers with only one order nor the time since a customer's last order.
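If you do want the gap between each order and the customer's next order, rather than the first-to-last span, here is a minimal sketch assuming MySQL 8.0 or later (where LAG() is available) and a table named orders with the columns from the sample data; the inner DISTINCT collapses line items into one row per order:
-- Sketch only: assumes MySQL 8.0+ and an orders table with
-- customer, ordernumber, orderdate columns as in the sample data.
SELECT CASE
         WHEN gap_months <= 12 THEN '0 - 12 mos'
         WHEN gap_months <= 24 THEN '13 - 24 mos'
         WHEN gap_months <= 36 THEN '25 - 36 mos'
         ELSE '37+ mos'
       END AS bucket,
       COUNT(*) AS num_next_orders
FROM (SELECT customer,
             orderdate,
             TIMESTAMPDIFF(MONTH,
                           LAG(orderdate) OVER (PARTITION BY customer ORDER BY orderdate),
                           orderdate) AS gap_months
      FROM (SELECT DISTINCT customer, ordernumber, orderdate FROM orders) o
     ) gaps
WHERE gap_months IS NOT NULL   -- each customer's first order has no previous order
GROUP BY bucket;
Each row of the derived table carries the months elapsed since the same customer's previous order, so counting rows per bucket answers how long customers take to place their next order.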

Related

SQL subquery in SELECT clause

I'm trying to find admin activity within the last 30 days.
The accounts table stores the user data (username, password, etc.)
At the end of each day, if a user had logged in, it will create a new entry in the player_history table with their updated data. This is so we can track progress over time.
accounts table:
id  username  admin
1   Michael   4
2   Steve     3
3   Louise    3
4   Joe       0
5   Amy       1
player_history table:
id  user_id  created_at  playtime
0   1        2021-04-03  10
1   2        2021-04-04  10
2   3        2021-04-05  15
3   4        2021-04-10  20
4   5        2021-04-11  20
5   1        2021-05-12  40
6   2        2021-05-13  55
7   3        2021-05-17  65
8   4        2021-05-19  75
9   5        2021-05-23  30
10  1        2021-06-01  60
11  2        2021-06-02  65
12  3        2021-06-02  67
13  4        2021-06-03  90
The following query
SELECT a.`username`, SEC_TO_TIME((MAX(h.`playtime`) - MIN(h.`playtime`)) * 60) as 'time'
FROM `player_history` h, `accounts` a
WHERE h.`created_at` > '2021-05-06'
  AND h.`user_id` = a.`id`
  AND a.`admin` > 0
GROUP BY h.`user_id`
Outputs this table:
Note that this is just admin activity, so Joe is not included in this data.
from 2021-05-06 to present (yyyy-mm-dd):
username  time
Michael   00:20:00
Steve     00:10:00
Louise    00:02:00
Amy       00:00:00
As you can see from this data, Amy's time is shown as 0 although she has played for 10 minutes in the last month. This is because she only has one entry from 2021-05-06 onward, so there is no earlier row to compare against: 10 - 10 = 0.
Another flaw is that it doesn't really capture all activity in the last month; it only subtracts the lowest value from the highest.
So I tried fixing this by comparing the highest value after 2021-05-06 to the most recent login before that date. So I modified the query a bit:
SELECT a.`Username`,
       SEC_TO_TIME((MAX(h.`playtime`) - (SELECT MAX(`playtime`)
                                         FROM `player_history`
                                         WHERE a.`id` = `user_id`
                                           AND `created_at` < '2021-05-06')) * 60) as 'Time'
FROM `player_history` h, `accounts` a
WHERE h.`created_at` >= '2021-05-06'
  AND h.`user_id` = a.`id`
  AND a.`admin` > 0
GROUP BY h.`user_id`
So now it will output:
username  time
Michael   00:50:00
Steve     00:50:00
Louise    00:52:00
Amy       00:10:00
But I feel like this whole query is quite inefficient. Is there a better way to do this?
I think you want lag():
SELECT a.username,
       -- playtime is stored in minutes, so convert to seconds for SEC_TO_TIME
       SEC_TO_TIME(SUM(h.playtime - COALESCE(h.prev_playtime, 0)) * 60) as time
FROM accounts a JOIN
     (SELECT h.*,
             LAG(playtime) OVER (PARTITION BY h.user_id ORDER BY h.created_at) as prev_playtime
      FROM player_history h
     ) h
     ON h.user_id = a.id
WHERE h.created_at > '2021-05-06' AND
      a.admin > 0
GROUP BY a.username;
In addition to the LAG() logic, note the other changes to the query:
The use of proper, explicit, standard, readable JOIN syntax.
The use of consistent columns for the SELECT and GROUP BY.
The removal of single quotes around the column alias.
The removal of backticks; they just clutter the query, making it harder to write and to read.
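If the server is older than MySQL 8.0 and LAG() is not available, a correlated subquery can stand in for it. This is only a sketch using the same table and column names as above:
-- Sketch for pre-8.0 servers: emulate LAG() with a correlated subquery
-- that fetches each row's previous playtime for the same user.
SELECT a.username,
       SEC_TO_TIME(SUM(h.playtime - COALESCE(
           (SELECT h2.playtime
            FROM player_history h2
            WHERE h2.user_id = h.user_id
              AND h2.created_at < h.created_at
            ORDER BY h2.created_at DESC
            LIMIT 1), 0)) * 60) AS time
FROM player_history h
JOIN accounts a ON h.user_id = a.id
WHERE h.created_at > '2021-05-06'
  AND a.admin > 0
GROUP BY a.username;
An index on player_history (user_id, created_at) would keep the correlated subquery cheap.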

mysql group by day and count then filter only the highest value for each day

I'm stuck on this query. I need to do a group by date, card_id and only show the highest hits. I have this data:
date card_name card_id hits
29/02/2016 Paul Stanley 1345 12
29/02/2016 Phil Anselmo 1347 16
25/02/2016 Dave Mustaine 1349 10
25/02/2016 Ozzy 1351 17
23/02/2016 Jhonny Cash 1353 13
23/02/2016 Elvis 1355 15
20/02/2016 James Hethfield 1357 9
20/02/2016 Max Cavalera 1359 12
My query at the moment
SELECT DATE(card.create_date) `day`, `name`, card_model_id, count(1) hits
FROM card
JOIN card_model ON card.card_model_id = card_model.id
WHERE DATE(card.create_date) >= DATE(DATE_SUB(NOW(), INTERVAL 1 MONTH))
  AND card_model.preview = 0
GROUP BY `day`, card_model_id
;
I want to group by date, card_id and keep only the highest hits result, showing only one row per date. As if I ran max(hits) with a group by, but that doesn't work.
Like:
date card_name card_id hits
29/02/2016 Phil Anselmo 1347 16
25/02/2016 Ozzy 1351 17
23/02/2016 Elvis 1355 15
20/02/2016 Max Cavalera 1359 12
Any light on that will be appreciated. Thanks for reading.
Here is one way to do this. Based on your sample data (not the query):
select s.*
from sample s
where s.hits = (select max(s2.hits)
                from sample s2
                where date(s2.date) = date(s.date)
               );
Your attempted query seems to have no relationship to the sample data, so it is unclear how to incorporate those tables (the attempted query has different columns and two tables).
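On MySQL 8.0+ the same greatest-n-per-group filter can also be written with a window function. A sketch against the same hypothetical sample table; it keeps one arbitrary row per day if two cards tie for the highest hits:
-- Sketch, MySQL 8.0+ only: rank rows within each day by hits and keep the top one.
SELECT date, card_name, card_id, hits
FROM (SELECT s.*,
             ROW_NUMBER() OVER (PARTITION BY date(s.date) ORDER BY s.hits DESC) AS rn
      FROM sample s
     ) ranked
WHERE rn = 1;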

sum on column for records returned from another query

I'm wondering what the best way to do this is:
I have a table voldata(vol_id, key_id, dte, volume) that will have hundreds of thousands of entries. There are thousands of unique key_id entries. Each key_id entry has about 100 dte entries (i.e., dates). So the table might look something like this:
vol_id key_id dte volume
...
186303 K_DXNRTDZL 2013-09-01 2900
186304 K_DXNRTDZL 2013-10-01 4400
186305 K_DXNRTDZL 2013-11-01 4400
186306 K_DXNRTDZL 2013-12-01 4400
...
186433 K_WXDNKG3O 2014-03-01 8100
186434 K_WXDNKG3O 2014-04-01 8100
186435 K_WXDNKG3O 2014-05-01 6600
186436 K_WXDNKG3O 2014-06-01 8100
...
186338 K_X4TSU3RD 2014-01-01 5400
186339 K_X4TSU3RD 2014-02-01 6600
186340 K_X4TSU3RD 2014-03-01 8100
186341 K_X4TSU3RD 2014-04-01 8100
I have another table catkeydata(catkey_id, cat_id, key_id). Each cat_id (category) is made up of potentially hundreds of key_ids. So this table stores data like:
catkey_id cat_id key_id
7305 C_B3ZRB0QR K_DXNRTDZL
7306 C_B3ZRB0QR K_X4TSU3RD
7307 C_B3ZRB0QR K_G7TBKU83
7308 C_B3ZRB0QR K_8X0L681N
7312 C_B3ZRB0QR K_WXDNKG3O
ie, the category C_B3ZRB0QR is made up of the 5 key_ids shown there. I want to be able to SUM the volume from the voldata table for a given date and a given category. So, I'll be requesting the full list of key_ids that make up a given category then I want to sum the volume for those keywords for a specific date.
Is there an easy/one-line way to do that without looping through all the keywords in the category? Thanks!
Try something like
SELECT SUM(volume)
FROM voldata
INNER JOIN catkeydata ON voldata.key_id = catkeydata.key_id
WHERE cat_id = 'C_B3ZRB0QR'
  AND dte = '2013-09-01';
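Since voldata will have hundreds of thousands of rows, composite indexes covering the join and filter columns should keep this lookup fast. The index names below are hypothetical:
-- Hypothetical index names; adjust to your own naming convention.
CREATE INDEX idx_voldata_key_dte ON voldata (key_id, dte);
CREATE INDEX idx_catkeydata_cat_key ON catkeydata (cat_id, key_id);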
This would also bring in the categories that have no matching volume rows, giving visibility to those:
$query = "SELECT c.CAT_ID, SUM(v.VOLUME)
          FROM CATKEYDATA AS c
          LEFT JOIN VOLDATA v ON c.KEY_ID = v.KEY_ID AND v.DTE = '2014-06-01'
          GROUP BY c.CAT_ID";

MySQL sum amounts for transactions related to loans, in reverse order

I'm trying to sum an amount stored in a transactions table, but the summing must be done taking the transactions in reverse order, grouped by the loan they belong to. Let me give some examples:
Transactions
tid loanid amount entrydate
------------------------------------
1 1 1,500 2013-06-01
2 2 1,500 2013-06-01
3 1 1,000 2013-06-02
4 3 2,300 2013-06-04
5 5 2,000 2013-06-04
6 1 1,100 2013-06-07
7 2 1,000 2013-06-09
| | | |
Loans
loanid
------
1
2
3
4
5
|
As you can see, there's no transactions for loanid 4, just to make clear the point that there's no obligation for the transactions to exist for every loan.
Now, what I'm trying to achieve is to sum up the amounts for the transactions of each loan. This first approach achieves this:
SELECT tr.tid,
       l.loanid,
       tr.entrydate,
       tr.amount,
       @prevLoan := l.loanid prevloan,
       @amnt := if(@prevLoan := l.loanid, @amnt + tr.amount, tr.amount) totAmnt
FROM (SELECT * FROM Transactions) tr
JOIN (SELECT @prevLoan := 0, @amnt := 0) t
JOIN Loans l
  ON l.loanid = tr.loanid
GROUP BY l.loanid, tr.tid
Which achieves something like this:
tid loanid entrydate amount prevloan totAmnt
-----------------------------------------------------------
1 1 2013-06-01 1,500 1 1,500
3 1 2013-06-02 1,000 1 2,500
6 1 2013-06-07 1,100 1 3,600 <-- final result for loanid=1
2 2 2013-06-01 1,500 2 1,500
7 2 2013-06-09 1,000 2 2,500 <-- final result for loanid=2
4 3 2013-06-04 2,300 3 2,300 <-- final result for loanid=3
5 5 2013-06-04 2,000 5 2,000 <-- final result for loanid=5
| | | | | |
As you can see, for each loan, the amounts of the transactions belonging to it are summed up in the totAmnt column, so the last transaction of each loan has the total sum of the transactions for that loan.
Now.... what I actually need is something to get the sum to be done in reversed order. I mean, for the same transactions of each loan, the sum still gets the same result, but I need the sum to be done from the last transaction of each loan up to the first one.
I've tried the following, but to no avail (it's the same query as the last one, but with an Order By DESC on the FROM transactions table):
SELECT tr.tid,
       l.loanid,
       tr.entrydate,
       tr.amount,
       @prevLoan := l.loanid prevloan,
       @amnt := if(@prevLoan := l.loanid, @amnt + tr.amount, tr.amount) totAmnt
FROM (SELECT * FROM Transactions ORDER BY tr.entrydate DESC) tr
JOIN (SELECT @prevLoan := 0, @amnt := 0) t
JOIN Loans l
  ON l.loanid = tr.loanid
GROUP BY l.loanid, tr.tid
I'm using tr.entrydate because it is a more familiar way to express the ordering criteria, and besides, that's what policy says is the valid ordering criteria; tid may suggest an order, but entrydate is the ordering column of the Transactions table...
Using the previous query, I just get the same results I get with the first query, so I guess something must be missing there. What I need is to get results as the following:
tid loanid entrydate amount prevloan totAmnt
-----------------------------------------------------------
6 1 2013-06-07 1,100 1 1,100
3 1 2013-06-02 1,000 1 2,100
1 1 2013-06-01 1,500 1 3,600 <-- final result for loanid=1
7 2 2013-06-09 1,000 2 1,000
2 2 2013-06-01 1,500 2 2,500 <-- final result for loanid=2
4 3 2013-06-04 2,300 3 2,300 <-- final result for loanid=3
5 5 2013-06-04 2,000 5 2,000 <-- final result for loanid=5
| | | | | |
As you can see the sum for each loanid gets the same final result, but the sum is done for the transactions in reversed order...
Hope all this mess is clear... How can I achieve such a result?
You appear to be VERY close... I think you need a few small adjustments. First, don't use the outer GROUP BY, as you are not doing any aggregations (sum, min, max, avg, etc). Second, when querying your transaction table, just order it by the loan ID FIRST, THEN the date descending... This way all the loan IDs are grouped together in proper order, but within each loan THEY are sorted in the descending order you are looking for. Also, adjust your @prevLoan AFTER you have accumulated, so the current record can be compared to the next. You are starting the @variable at zero, so it won't match the first loan ID on the first pass anyhow. Finally, you don't even need the join to the loan table, since the transaction table has the loan ID to use as the basis of the comparison test. Since the inner-most query is ordered by loan and then entry date descending, you should not need to order it again in the outer query.
SELECT
       tr.tid,
       tr.loanid,
       tr.entrydate,
       tr.amount,
       @amnt := if( @prevLoan = tr.loanid, @amnt + tr.amount, tr.amount) totAmnt,
       @prevLoan := tr.loanid prevloan
FROM
       ( SELECT *
         FROM Transactions
         ORDER BY loanid, entrydate DESC) tr
       JOIN (SELECT @prevLoan := 0,
                    @amnt := 0) t
Alternate Solution?? Per my comment, it looks like you want the high totals and shrinking down... Is this closer?
SELECT
       tr.tid,
       tr.loanid,
       tr.entrydate,
       tr.amount,
       trTotals.TotalLoans - if( @prevLoan = tr.loanid, @amnt, 0 ) as NewBal,
       @amnt := if( @prevLoan = tr.loanid, @amnt + tr.amount, tr.amount) runningTotals,
       @prevLoan := tr.loanid prevloan
FROM
       ( SELECT *
         FROM Transactions
         ORDER BY loanid, entrydate DESC) tr
       JOIN ( SELECT loanid, sum( amount ) as TotalLoans
              FROM Transactions
              group by loanid ) trTotals
            on tr.loanid = trTotals.loanid
       JOIN (SELECT @prevLoan := 0,
                    @amnt := 0) t
Produces... (Total Paid) (for reversing calc)
tid loanid entrydate amount NewBal Running Totals prevLoan
6 1 2013-06-07 1100 3600 1100 1
3 1 2013-06-02 1000 2500 2100 1
1 1 2013-06-01 1500 1500 3600 1
7 2 2013-06-09 1000 2500 1000 2
2 2 2013-06-01 1500 1500 2500 2
4 3 2013-06-04 2300 2300 2300 3
5 5 2013-06-04 2000 2000 2000 5
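For what it's worth, on MySQL 8.0+ the user variables can be avoided entirely. A sketch of the same reverse running sum with a window function, using the table and column names from the question:
-- Sketch, MySQL 8.0+ only: running total per loan, accumulated from the
-- latest transaction back to the earliest.
SELECT tid,
       loanid,
       entrydate,
       amount,
       SUM(amount) OVER (PARTITION BY loanid
                         ORDER BY entrydate DESC, tid DESC
                         ROWS UNBOUNDED PRECEDING) AS totAmnt
FROM Transactions
ORDER BY loanid, entrydate DESC;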

SQL query by date

MySql 5.5.
I have a table that represents a work assignment:
empId jobNo workDate hours
4 441 10/1/2012 10
4 441 9/1/2012 22
4 441 8/1/2012 6
And one that represents salary:
empId effDate rate
4 10/1/2012 6.50
4 9/1/2012 5.85
4 6/1/2012 4.00
The salary applies to all work performed on or after the effective date. So the rate in jun, jul, and aug is 4.00; sep is 5.85, and oct is 6.50.
If I naively query for October's work:
SELECT Work.empId, Work.jobNo, Work.workDate, Work.hours, Salary.effDate, Salary.rate
FROM Work
JOIN Salary ON Work.empId = Salary.empId
WHERE Work.workDate <= '2012-10-01'
AND Salary.effDate <= Work.workDate
ORDER BY Work.jobNo ASC, Work.workDate DESC;
I do not get what I want. I get something like
4 441 10/1/2012 10 10/1/2012 6.50
4 441 10/1/2012 10 9/1/2012 5.85
4 441 10/1/2012 10 6/1/2012 4.00
4 441 9/1/2012 22 9/1/2012 5.85
4 441 9/1/2012 22 6/1/2012 4.00
4 441 8/1/2012 6 6/1/2012 4.00
When I want
4 441 10/1/2012 10 10/1/2012 6.50
4 441 9/1/2012 22 9/1/2012 5.85
4 441 8/1/2012 6 6/1/2012 4.00
I can't quite wrap my head around how to create the query I want.
The real situation has multiple employees, multiple jobs, obviously.
Thanks for your help.
Here is your actual issue: you want to be able to detect, for each record in Work, what is the corresponding effective rate, according to the work date x salary effective date. When you simply do Salary.effDate <= WORK.workDate you get ALL rates before the work date. But you only want the most recent one.
This is a slightly complicated variant of the greatest-n-per-group problem. There are many ways of doing this, here is one:
SELECT sel.*, Salary.Rate
FROM
(
SELECT Work.empId, Work.jobNo, Work.workDate,
Work.hours, Max(Salary.effDate) effDate
FROM WORK
JOIN Salary ON WORK.empId = Salary.empId
WHERE WORK.workDate <= '2012-10-01'
AND Salary.effDate <= WORK.workDate
GROUP BY WORK.empId, WORK.jobNo, WORK.workDate, WORK.hours
ORDER BY WORK.jobNo ASC, WORK.workDate DESC
) sel
INNER JOIN Salary ON sel.empId = Salary.empId
AND sel.EffDate = Salary.EffDate
First of all, the inner query finds the most recent salary effective date for each work record. Then we join that with Salary again to get the rate.
See the working SQLFiddle.
You're using what's called a NATURAL JOIN. Try changing the word "JOIN" to "LEFT JOIN" which should group the results on the left, giving you the desired results.
Assuming the salary table has a primary or alternate key (unique index) consisting of the columns empId and effDate, I'd do something like this:
select w.empID as EMPLOYEE_ID ,
w.jobNo as JOB_NUMBER ,
w.workDate as DATE_WORKED ,
w.hours as HOURS_WORKED ,
rate.HourlyWage as HOURLY_WAGE ,
w.hours * rate.HourlyWage as WAGES_CHARGED ,
rate.effDateFrom as HOURLY_WAGE_EFFECTIVE_DATE
from work w
join ( select sfrom.EmpId as EmpID ,
sfrom.rate as HourlyWage ,
sfrom.EffDate as effDateFrom ,
( select min(Effdate)
from salary t
where t.empId = sfrom.EmpId
and t.effDate > sfrom.EffDate
) as effDateThru
from salary sfrom
) rate on rate.empID = w.empID
and rate.EffDateFrom <= w.workDate
and ( rate.effDateThru is null -- if the rate has no end date, it is the current period
or rate.effDateThru > w.workDate -- 'date-thru' is the start date of the next period, so this upper bound is EXCLUSIVE
)
We join the work table against a virtual rate table that gives us each employee's wage and the date range for which it is effective. The 'current' row for each employee will have the thru/expiry date set to null. And, since the thru/expiry date is actually the effective date of the next salary entry, the upper bound is exclusive rather than inclusive. Consequently, the range test must check for null, and one can't use BETWEEN.
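On MySQL 8.0+ the same virtual rate table can be built without a correlated subquery by using LEAD() to derive each rate's exclusive end date. A sketch with the question's table and column names:
-- Sketch, MySQL 8.0+ only: derive each rate's (exclusive) end date with LEAD(),
-- then join each work row to the rate whose date range contains the work date.
SELECT w.empId, w.jobNo, w.workDate, w.hours, r.rate
FROM work w
JOIN (SELECT s.empId,
             s.effDate,
             s.rate,
             LEAD(s.effDate) OVER (PARTITION BY s.empId ORDER BY s.effDate) AS effDateThru
      FROM salary s
     ) r
  ON r.empId = w.empId
 AND r.effDate <= w.workDate
 AND (r.effDateThru IS NULL OR r.effDateThru > w.workDate)
ORDER BY w.jobNo, w.workDate DESC;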