I have a list of drivers, orders, and dates in a table named all_data between 2022-01-01 and 2022-01-15 (15 days) like this:
| driver_id | order_id | order_date |
|---|---|---|
| 1 | a | 2022-01-01 |
| 1 | b | 2022-01-02 |
| 2 | c | 2022-01-01 |
| 2 | d | 2022-01-03 |
For all 15 days, how do I find the number of continually active drivers, who completed at least one order every single day, up to that date? The output should be a table like this:
| order_date | active_drivers |
|---|---|
| 2022-01-01 | 30 |
| 2022-01-02 | 27 |
| 2022-01-03 | 25 |
For example, on 2022-01-01, there were 30 unique drivers who completed at least one order that day. On 2022-01-02, we must find the number of unique drivers who completed at least one order on 2022-01-01 and 2022-01-02. On 2022-01-03, we must count drivers who completed at least one order on 2022-01-01, 2022-01-02, and 2022-01-03.
What I have tried
I found a similar solution in MySQL (below), but BigQuery rejects it with the error "Unsupported subquery with table in join predicate".
MySQL
SELECT order_date,
(SELECT COUNT(distinct s1.driver_id) as num_hackers
FROM all_data s2
join all_data s1
on s2. order_date = s1. order_date and
(SELECT COUNT(distinct s3. order_date)
FROM all_data s3
WHERE s3.driver_id = s2.driver_id
AND s3. order_date < s1. order_date)
= datediff(s1. order_date, date('2022-01-01'), day)
))
from all_data
I also read the question "Google BigQuery: Rolling Count Distinct", but that is for a fixed number of days (45), while the number of days here varies with the date. How do I write a query in BigQuery SQL to find the rolling number of continually active drivers per day?
Consider below
select order_date, count(distinct if(flag, driver_id, null)) active_drivers
from (
select order_date, driver_id,
row_number() over(partition by driver_id order by order_date) -
date_diff(order_date, min(order_date) over(), day) = 1 as flag
from (select distinct order_date, driver_id from all_data)
)
group by order_date
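A minimal sketch of why the flag works, run against SQLite through Python's sqlite3 module instead of BigQuery (an assumption made so it can be tried locally; it needs SQLite 3.25+ for window functions). julianday() stands in for BigQuery's date_diff, and the data is the four sample rows from the question. A driver is continually active on a date exactly when their row number (the count of distinct days they have worked so far) equals the number of days elapsed since the first date, plus one.

```python
import sqlite3

# Sample rows from the question: (driver_id, order_id, order_date).
ROWS = [
    (1, "a", "2022-01-01"),
    (1, "b", "2022-01-02"),
    (2, "c", "2022-01-01"),
    (2, "d", "2022-01-03"),
]

# SQLite port of the query above; julianday() replaces BigQuery's date_diff.
QUERY = """
SELECT order_date,
       COUNT(DISTINCT CASE WHEN flag THEN driver_id END) AS active_drivers
FROM (
    SELECT order_date, driver_id,
           ROW_NUMBER() OVER (PARTITION BY driver_id ORDER BY order_date)
             - CAST(julianday(order_date)
                    - julianday(MIN(order_date) OVER ()) AS INTEGER) = 1 AS flag
    FROM (SELECT DISTINCT order_date, driver_id FROM all_data)
)
GROUP BY order_date
ORDER BY order_date
"""


def continually_active(rows):
    """Return (order_date, active_drivers) pairs for the given order rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE all_data (driver_id INTEGER, order_id TEXT, order_date TEXT)"
    )
    conn.executemany("INSERT INTO all_data VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(continually_active(ROWS))
# [('2022-01-01', 2), ('2022-01-02', 1), ('2022-01-03', 0)]
```

Driver 2 skips 2022-01-02, so on 2022-01-03 their row number is 2 while 2 days have elapsed, and the flag is false. Dates on which nobody ordered at all do not appear in the output.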
First find all combinations of dates and drivers, then count the drivers per date. Try this:
select order_date, count(*)
from(
select order_date, driver_id, count(*)
from all_data ad
group by order_date, driver_id)
group by order_date
Related
I'm having great difficulty writing this query and cannot find any answers online which could be applied to my problem.
I have a couple of tables which look similar to the one below. Each purchase date corresponds to an item purchased.
| Cust_ID | Purchase_Date |
|---|---|
| 123 | 08/01/2022 |
| 123 | 08/20/2022 |
| 123 | 09/05/2022 |
| 123 | 10/08/2022 |
| 123 | 12/25/2022 |
| 123 | 01/26/2023 |
The result I am looking for should contain the customers ID, a range of the purchases, the number of consecutive months they had made a purchase (regardless of which day they purchased), and a count of how many purchases they had made in the time frame. The result should look something like the below for my example.
| Cust_ID | Min Purchase Date | Max Purchase Date | Consecutive Months | No. Items Purchased |
|---|---|---|---|---|
| 123 | 08/01/2022 | 10/08/2022 | 3 | 4 |
| 123 | 12/25/2022 | 01/26/2023 | 2 | 2 |
I have tried using CTEs with queries similar to
WITH CTE as
(
SELECT
PaymentDate PD,
CustomerID CustID,
DATEADD(m, -ROW_NUMBER() OVER (PARTITION BY c.CustomerID ORDER BY
DATEPART(m,PaymentDate)), PaymentDate) as TempCol1
FROM customers as c
LEFT JOIN payments as p on c.customerid = p.customerid
GROUP BY c.CustomerID, p.PaymentDate
)
SELECT
CustID,
MIN(PD) AS MinPaymentDate,
MAX(PD) AS MaxPaymentDate,
COUNT(*) as ConsecutiveMonths
FROM CTE
GROUP BY CustID, TempCol1
However, the above failed to properly count consecutive months. When the payment dates matched a month apart (e.g. 1/1/22 - 2/1/22), the query properly counts the consecutive months. However, if the dates do not match from month to month (e.g. 1/5/22 - 2/15/22), the count breaks.
Any guidance/help would be much appreciated!
This is just a small enhancement on the answer already given by ahmed. If the date range for this query spans more than a year, then year(M.Purchase_Date) + month(M.Purchase_Date) will be 2024 for both 2022-02-01 and 2023-01-01, because YEAR() and MONTH() both return integers. This returns an incorrect count of consecutive months. You can use CONCAT() or FORMAT() instead. Also, the COUNT(*) for ItemsPurchased should count the right-hand side of the join, as it is a LEFT JOIN.
WITH consecutive_months AS
(
SELECT *,
DATEADD(
month,
-DENSE_RANK() OVER (
PARTITION BY CustomerID
ORDER BY YEAR(PaymentDate), MONTH(PaymentDate)
),
PaymentDate
) AS grp_date
FROM payments
)
SELECT
C.CustomerID AS CustID,
MIN(M.PaymentDate) AS MinPaymentDate,
MAX(M.PaymentDate) AS MaxPaymentDate,
COUNT(DISTINCT FORMAT(M.PaymentDate, 'yyyyMM')) AS ConsecutiveMonths,
COUNT(M.CustomerID) AS ItemsPurchased
FROM customers C
LEFT JOIN consecutive_months M
ON C.CustomerID = M.CustomerID
GROUP BY C.CustomerID, YEAR(M.grp_date), MONTH(M.grp_date)
Here's a db<>fiddle
You need to use the dense_rank function instead of row_number; this gives the same rank to rows in the same month and avoids breaking the grouping column. Also, you need to aggregate by the 'year-month' of the grouping date column.
with consecutive_months as
(
select *,
Purchase_Date - interval
dense_rank() over (partition by Cust_ID order by year(Purchase_Date), month(Purchase_Date))
month as grp_date
from payments
)
select C.Cust_ID,
min(M.Purchase_Date) as MinPurchaseDate,
max(M.Purchase_Date) as MaxPurchaseDate,
count(distinct year(M.Purchase_Date), month(M.Purchase_Date)) as ConsecutiveMonthsNo,
count(M.Cust_ID) as ItemsPurchased
from customers C left join consecutive_months M
on C.Cust_ID = M.Cust_ID
group by C.Cust_ID, year(M.grp_date), month(M.grp_date)
See demo on MySQL
You tagged your question with MySQL, but it seems that you posted SQL Server query syntax. For SQL Server, just use dateadd(month, -dense_rank() over (partition by Cust_ID order by year(Purchase_Date), month(Purchase_Date)), Purchase_Date).
See demo on SQL Server.
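The dense_rank grouping described above can be sketched in a form that runs locally. This is a variation, not the query from either answer: it uses SQLite through Python's sqlite3 module (an assumption, SQLite 3.25+), stores the question's dates in ISO format, and replaces the date subtraction with a plain month number (year * 12 + month), which stays unique across years. Subtracting the dense rank from that month number yields a constant for each run of consecutive months.

```python
import sqlite3

# Purchases from the question, rewritten as ISO dates (an assumption for SQLite).
PURCHASES = [
    (123, "2022-08-01"), (123, "2022-08-20"), (123, "2022-09-05"),
    (123, "2022-10-08"), (123, "2022-12-25"), (123, "2023-01-26"),
]

QUERY = """
WITH numbered AS (
    SELECT cust_id, purchase_date,
           -- One integer per calendar month, unique across years.
           CAST(strftime('%Y', purchase_date) AS INTEGER) * 12
             + CAST(strftime('%m', purchase_date) AS INTEGER) AS month_no
    FROM payments
),
islands AS (
    SELECT *,
           -- Constant within a run of consecutive months.
           month_no - DENSE_RANK() OVER (
               PARTITION BY cust_id ORDER BY month_no
           ) AS grp
    FROM numbered
)
SELECT cust_id,
       MIN(purchase_date) AS min_purchase_date,
       MAX(purchase_date) AS max_purchase_date,
       COUNT(DISTINCT month_no) AS consecutive_months,
       COUNT(*) AS items_purchased
FROM islands
GROUP BY cust_id, grp
ORDER BY cust_id, MIN(purchase_date)
"""


def consecutive_month_ranges(rows):
    """Return one row per customer per run of consecutive purchase months."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (cust_id INTEGER, purchase_date TEXT)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in consecutive_month_ranges(PURCHASES):
    print(row)
# (123, '2022-08-01', '2022-10-08', 3, 4)
# (123, '2022-12-25', '2023-01-26', 2, 2)
```

Using ROW_NUMBER here instead of DENSE_RANK would split August 2022 into two groups, because the two August purchases would receive different ranks for the same month.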
I am having trouble coming up with a query to get a list of customer ids and the date of their 20th purchase.
I am given a table called transactions with the column name customer_id and purchase_date. Each row in the table is equal to one transaction.
| customer_id | purchase_date |
|---|---|
| 1 | 2020-11-19 |
| 2 | 2022-01-01 |
| 3 | 2021-12-05 |
| 3 | 2021-12-09 |
| 3 | 2021-12-16 |
I tried to do it like this and assumed I would have to count the number of times the customer_id has been mentioned and return the id number if the count equals 20.
SELECT customer_id, MAX(purchase_date)
FROM transactions
(
SELECT customer_id,
FROM transactions
GROUP BY customer_id
HAVING COUNT (customer_id) =20
)
How can I get this to return the list of customer_id and only the date of the 20th transaction?
You need to select the rows of transactions belonging to the customer_id and filter the result by the 20th row
SELECT * FROM (
SELECT customer_id, purchase_date, ROW_NUMBER() OVER(
PARTITION BY customer_id
ORDER BY purchase_date
) AS nth
FROM transactions
) as t WHERE nth = 20
My solution:
select *
from transactions t
inner join (
select
customer_id,
purchase_date,
row_number() over (partition by customer_id order by purchase_date) R
from transactions) x on x.purchase_date=t.purchase_date
and x.customer_id=t.customer_id
where x.R=20;
see: DBFIDDLE
For MySQL5.7, see: DBFIDDLE
set @r:=1;
select *
from transactions t
inner join (
select
customer_id,
purchase_date,
@r:=@r+1 R
from transactions) x on x.purchase_date=t.purchase_date
and x.customer_id=t.customer_id
where x.R=20;
Use row_number = 20
SELECT
customer_id,
purchase_date as date_t_20
FROM
(
SELECT
customer_id,
purchase_date,
Row_number() OVER (
PARTITION BY customer_id
ORDER BY purchase_date) AS rn
FROM transactions
) T
WHERE rn = 20;
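A small runnable sketch of the row_number approach, using SQLite through Python's sqlite3 module (an assumption; it needs SQLite 3.25+). The helper name nth_purchase and its n parameter are hypothetical, added so the five sample rows from the question can be checked with n = 3 instead of 20.

```python
import sqlite3

# Sample rows from the question: (customer_id, purchase_date).
TRANSACTIONS = [
    (1, "2020-11-19"),
    (2, "2022-01-01"),
    (3, "2021-12-05"),
    (3, "2021-12-09"),
    (3, "2021-12-16"),
]

QUERY = """
SELECT customer_id, purchase_date
FROM (
    SELECT customer_id, purchase_date,
           -- Number each customer's purchases from oldest to newest.
           ROW_NUMBER() OVER (
               PARTITION BY customer_id ORDER BY purchase_date
           ) AS rn
    FROM transactions
)
WHERE rn = ?
ORDER BY customer_id
"""


def nth_purchase(rows, n):
    """Return (customer_id, purchase_date) of each customer's n-th purchase."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE transactions (customer_id INTEGER, purchase_date TEXT)")
    conn.executemany("INSERT INTO transactions VALUES (?, ?)", rows)
    result = conn.execute(QUERY, (n,)).fetchall()
    conn.close()
    return result


print(nth_purchase(TRANSACTIONS, 3))   # [(3, '2021-12-16')]
print(nth_purchase(TRANSACTIONS, 20))  # []
```

Customers with fewer than n purchases simply drop out of the result, which matches the requirement to list only customers who reached their 20th transaction.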
I want to fetch the first and last record of every month in SQL, but my query gives the results below. Here is my query:
SELECT DISTINCT month, amount,
MIN(date) OVER (PARTITION BY month ORDER BY utility.month)
FROM
utility;
results of the query above
| month | amount | min(date) |
|---|---|---|
| February/2022 | 200 | 2022-02-02 |
| January/2022 | 1000 | 2022-01-01 |
| January/2022 | 200 | 2022-01-01 |
| March/2022 | 1000 | 2022-02-06 |
You can get the MIN() and MAX() value first, turn into a subquery then join utility table twice to get the amount corresponding to the extracted dates, like this:
SELECT v.month,
v.mindt,
u1.amount,
v.maxdt,
u2.amount
FROM
(SELECT month,
MIN(date) mindt, MAX(date) maxdt
FROM
utility
GROUP BY month) v
JOIN utility u1 ON u1.date=v.mindt
JOIN utility u2 ON u2.date=v.maxdt
;
That will give result something like this:
| month | mindt | amount | maxdt | amount |
|---|---|---|---|---|
| January2022 | 2022-01-02 | 250 | 2022-01-29 | 350 |
| February2022 | 2022-02-01 | 300 | 2022-02-28 | 500 |
| March2022 | 2022-03-03 | 500 | 2022-03-18 | 300 |
Or you can modify the subquery to do UNION ALL, join utility once and return all in just the original 3 columns:
SELECT v.month,
v.minmaxdt,
u.amount
FROM
(SELECT month,
MIN(date) minmaxdt
FROM utility
GROUP BY month
UNION ALL
SELECT month,
MAX(date)
FROM utility
GROUP BY month
) v
JOIN utility u ON u.date=v.minmaxdt
ORDER BY v.month, v.minmaxdt;
That will give result something like this:
| month | minmaxdt | amount |
|---|---|---|
| February2022 | 2022-02-01 | 300 |
| February2022 | 2022-02-28 | 500 |
| January2022 | 2022-01-02 | 250 |
| January2022 | 2022-01-29 | 350 |
| March2022 | 2022-03-03 | 500 |
| March2022 | 2022-03-18 | 300 |
Demo fiddle
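For a version that can be run locally, here is a sketch of the first variant (aggregate, then join utility twice) using SQLite through Python's sqlite3 module. The sample rows are hypothetical, chosen to reproduce the result table above, and the sketch assumes each date appears only once in utility; with duplicate dates the joins would return extra rows.

```python
import sqlite3

# Hypothetical rows (month, amount, date); one extra mid-month row is
# included to show that only the first and last records survive.
UTILITY = [
    ("January2022", 250, "2022-01-02"),
    ("January2022", 1000, "2022-01-15"),
    ("January2022", 350, "2022-01-29"),
    ("February2022", 300, "2022-02-01"),
    ("February2022", 500, "2022-02-28"),
    ("March2022", 500, "2022-03-03"),
    ("March2022", 300, "2022-03-18"),
]

QUERY = """
SELECT v.month, v.mindt, u1.amount, v.maxdt, u2.amount
FROM (
    -- First and last date of every month.
    SELECT month, MIN(date) AS mindt, MAX(date) AS maxdt
    FROM utility
    GROUP BY month
) v
JOIN utility u1 ON u1.date = v.mindt   -- amount on the first date
JOIN utility u2 ON u2.date = v.maxdt   -- amount on the last date
ORDER BY v.mindt
"""


def first_and_last_per_month(rows):
    """Return (month, first date, amount, last date, amount) per month."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE utility (month TEXT, amount INTEGER, date TEXT)")
    conn.executemany("INSERT INTO utility VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in first_and_last_per_month(UTILITY):
    print(row)
# ('January2022', '2022-01-02', 250, '2022-01-29', 350)
# ('February2022', '2022-02-01', 300, '2022-02-28', 500)
# ('March2022', '2022-03-03', 500, '2022-03-18', 300)
```

Ordering by v.mindt instead of v.month keeps the months in calendar order, since the month labels themselves sort alphabetically.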
Try using MIN and MAX at the same time together with GROUP BY.
Check this from W3Schools.
The MIN() function returns the smallest value of the selected column.
The MAX() function returns the largest value of the selected column.
Try this code:
SELECT month, MIN(date), MAX(date) FROM utility GROUP BY month;
I find myself often wanting to get an adjacent row value when I do a MIN or MAX statement. For example in the following statement:
WITH people AS (
select 'Greg' as name, 20 as age union
select 'Tom' as name, 17 as age
) SELECT MAX(age) FROM people;
# MAX(age)
20
The MAX function does the equivalent of: MAX(eval_expression=age, return_expression=age), where it always has the same evaluation and return value (implicitly). However, I would like to find the name of the person with the max age. So, the conceptual syntax would be: MAX(eval_expression=age, return_expression=name). This is a pattern I find myself using quite frequently and usually end up hacking something together like:
WITH people AS (
select 'Greg' as name, 20 as age union
select 'Tom' as name, 17 as age
) SELECT name FROM people NATURAL JOIN (SELECT name, MAX(age) age FROM people) _;
# name
'Greg'
Is there a generic way to do the MAX(expr, return) that I'm trying to accomplish?
Update: to provide an example where an aggregation is required:
with sales as (
select DATE '2014-01-01' as date, 100 as sales, 'Fish' as product union
select DATE '2014-01-01' as date, 105 as sales, 'Potatoes' as product union
select DATE '2014-01-02' as date, 84 as sales, 'Salsa' as product
) select date, max(sales) from sales group by date
# date, max(sales)
2014-01-01, 105
2014-01-02, 84
And how to get the equivalent of: MAX(expr=sales, return=product)? Something like:
WITH sales AS (
select DATE '2014-01-01' as d, 100 as revenue, 'Fish' as product union
select DATE '2014-01-01' as d, 105 as revenue, 'Potatoes' as product union
select DATE '2014-01-02' as d, 84 as revenue, 'Salsa' as product
) SELECT d AS date, product FROM sales NATURAL JOIN (SELECT d, MAX(revenue) AS revenue FROM sales GROUP BY d) _;
# date, product
2014-01-01, Potatoes
2014-01-02, Salsa
Unless I'm missing something here -
use limit with order by:
WITH people AS (
select 'Greg' as name, 20 as age union
select 'Tom' as name, 17 as age
)
SELECT name
FROM people
ORDER BY age DESC
LIMIT 1;
# name
'Greg'
If you want to use first_value(), I would recommend:
select distinct date,
first_value(product) over(partition by date order by sales desc) top_product
from sales
No need for aggregation here, nor for a frame specification in the window function. The window function walks the dataset starting from the row with the greatest sales, so all rows in the partition get the same top_product assigned. Then distinct retains only one row per partition.
But basically, this ends up as a greatest-n-per group problem, where you want the row with the greatest sale for each date. The first_value() solution does not scale well if you want more than one column on that row. A typical solution is to rank records in a subquery, then filter. Again, no aggregation is needed, that's filtering logic only:
select *
from (
select s.*,
row_number() over(partition by date order by sales desc) rn
from sales s
) t
where rn = 1
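The rank-then-filter pattern can be tried locally with the sketch below, which uses SQLite through Python's sqlite3 module (an assumption; it needs SQLite 3.25+). It borrows the column names d and revenue from the question's second example to avoid a column that shares the table's name, and the helper top_product_per_day is a hypothetical name.

```python
import sqlite3

# Sample rows from the question: (d, revenue, product).
SALES = [
    ("2014-01-01", 100, "Fish"),
    ("2014-01-01", 105, "Potatoes"),
    ("2014-01-02", 84, "Salsa"),
]

QUERY = """
SELECT d, product
FROM (
    SELECT d, product,
           -- Rank rows within each day, highest revenue first.
           ROW_NUMBER() OVER (PARTITION BY d ORDER BY revenue DESC) AS rn
    FROM sales
)
WHERE rn = 1
ORDER BY d
"""


def top_product_per_day(rows):
    """Return (d, product) for the highest-revenue row of each day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (d TEXT, revenue INTEGER, product TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(top_product_per_day(SALES))
# [('2014-01-01', 'Potatoes'), ('2014-01-02', 'Salsa')]
```

Because the outer query filters whole rows, any other column of the winning row can be returned by adding it to both select lists. If two products tie for the top revenue on a day, ROW_NUMBER picks one of them arbitrarily.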
One solution would be to use a window function with an unbounded frame, such as FIRST_VALUE or LAST_VALUE, where you can sort the date partition by sales. Here is an example:
;WITH sales AS (
select DATE '2014-01-01' as date, 100 as sales, 'Fish' as product union
select DATE '2014-01-01' as date, 105 as sales, 'Potatoes' as product union
select DATE '2014-01-01' as date, 103 as sales, 'Lettuce' as product union
select DATE '2014-01-02' as date, 84 as sales, 'Salsa' as product
)
SELECT DISTINCT date, LAST_VALUE(product) OVER (
partition by date
order by sales
-- Default: https://dev.mysql.com/doc/refman/8.0/en/window-functions-frames.html
-- rows between unbounded preceding and current row
rows between unbounded preceding and unbounded following
) top_product
FROM sales;
# date, top_product
'2014-01-01', 'Potatoes'
'2014-01-02', 'Salsa'
I think the subselect might be easier to read (at least for me), but this is another option. You'd have to check the performance of the two, but I'd think the analytic function (without the non-indexable join) would be much faster.
I am using PostgreSQL. I need to get the dates for the first 5 transactions of every user on my DB.
Transaction - trans.id, trans.date, trans.cust_id, trans.value
Customer - cust.id, cust.created_at
I need to get the date of the first 5 transactions for all the customers.
Try this query:
SELECT cust_id, date
FROM (
SELECT cust_id,
date,
row_number() OVER (partition by cust_id
ORDER BY date, id ) rn
FROM Transaction
) as alias
WHERE rn <= 5
ORDER BY 1,2
demo: http://sqlfiddle.com/#!15/cfd2e/4