Conditional subquery in where clause - mysql

I'm trying to create a query with conditional logic where I only calculate revenue for the most recent records by each month using a datetime column (start_date), but only if there are multiple records in that month from the same account_id.
Here's a basic example of the schema after I join two tables (full schema in sqlfiddle link).
| account_id | plan_id | start_date | plan_interval | price |
|------------|---------|----------------------|---------------|-------|
| 1 | 1 | 2018-01-03T14:52:13Z | month | 39 |
| 1 | 3 | 2018-02-07T11:10:17Z | year | 999 |
| 1 | 2 | 2018-02-07T11:11:17Z | month | 99 |
In the above example, I would like to include only rows 1 and 3 in my output: row 1 is the only record for account_id 1 in January, and row 3 is the most recent of the two records for account_id 1 in February.
SELECT
MONTH(start_date) AS month,
SUM(CASE WHEN plan_interval = 'month'
THEN price * .01
ELSE (price * .01)/12 END) AS mrr
FROM subscriptions
JOIN plans
ON plans.id = subscriptions.plan_id
WHERE Year(start_date) = 2018 AND
CASE WHEN (account_id = account_id
AND MONTH(start_date) = MONTH(start_date))
THEN (SELECT MAX(start_date) FROM subscriptions)
ELSE (SELECT start_date FROM subscriptions)
END
GROUP BY month
ORDER BY month ASC;
The CASE expression in the WHERE clause above does not seem to do this. It returns the data without filtering out records when the first condition is met.
Here is an example: sqlfiddle

This query returns the rows that you are asking for in the question:
SELECT s.*, p.plan_interval, p.price,
(CASE WHEN p.plan_interval = 'month'
THEN p.price * 0.01
ELSE (p.price * 0.01)/12
END) AS mrr
FROM subscriptions s JOIN
plans p
ON p.id = s.plan_id
WHERE YEAR(s.start_date) = 2018 AND
s.start_date = (SELECT MAX(s2.start_date)
FROM subscriptions s2
WHERE s2.account_id = s.account_id AND
EXTRACT(YEAR_MONTH FROM s2.start_date) = EXTRACT(YEAR_MONTH FROM s.start_date)
)
ORDER BY s.start_date ASC;
This uses a correlated subquery to get the most recent subscription record for each account in each month.
You can then aggregate this however you wish.
Notes about the query:
Table aliases make the query easier to write and to read.
The subquery uses the handy YEAR_MONTH option of EXTRACT(), so it handles both years and months.
For numeric constants between -1 and 1, I always prepend a 0, so 0.12 rather than .12. I find that this makes the decimal point more obvious.
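As a rough illustration of the "aggregate this however you wish" step, here is a minimal sketch that sums MRR per month over the filtered rows. It runs against an in-memory SQLite database through Python's sqlite3 module, which is only a stand-in for MySQL: strftime('%Y-%m', ...) replaces EXTRACT(YEAR_MONTH FROM ...). The table layout and the sample rows are assumptions pieced together from the tables shown in this thread, with prices stored in cents.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plans (id INTEGER PRIMARY KEY, plan_interval TEXT, price INTEGER);
CREATE TABLE subscriptions (account_id INTEGER, plan_id INTEGER, start_date TEXT);
-- Hypothetical sample data; prices are in cents.
INSERT INTO plans VALUES (1, 'month', 3900), (2, 'month', 9900), (3, 'year', 99900);
INSERT INTO subscriptions VALUES
  (1, 1, '2018-01-03 14:52:13'),
  (1, 3, '2018-02-07 11:10:17'),
  (1, 2, '2018-02-07 11:11:17'),
  (2, 3, '2018-01-03 17:40:05');
""")

# Keep only the latest subscription per account and month, then sum MRR per month.
mrr_by_month = conn.execute("""
SELECT strftime('%m', s.start_date) AS month,
       ROUND(SUM(CASE WHEN p.plan_interval = 'month'
                      THEN p.price * 0.01
                      ELSE (p.price * 0.01) / 12
                 END), 2) AS mrr
FROM subscriptions s
JOIN plans p ON p.id = s.plan_id
WHERE strftime('%Y', s.start_date) = '2018'
  AND s.start_date = (SELECT MAX(s2.start_date)
                      FROM subscriptions s2
                      WHERE s2.account_id = s.account_id
                        AND strftime('%Y-%m', s2.start_date) = strftime('%Y-%m', s.start_date))
GROUP BY month
ORDER BY month
""").fetchall()

for month, mrr in mrr_by_month:
    print(month, mrr)
```

With this sample data, February counts only the monthly plan that account 1 switched to a minute after the yearly one, which is the behaviour the question asks for.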

First work out the last entry per account and month (subquery a), join it back to subscriptions to get the plan_id, and then join plans to get the plan details.
SELECT S.ACCOUNT_id,s.plan_id,s.start_date,p.Price,p.plan_interval,
case when p.plan_interval = 'month' then p.price * .01 else p.price * .01 / 12 end as rev
from subscriptions s
join (select s.account_id,month(s.start_date), max(s.start_date) start_date
from subscriptions s
group by account_id,month(start_date)) a on a.account_id = s.account_id and a.start_date = s.start_date
join plans p on p.id = s.plan_id;
+------------+---------+---------------------+----------+---------------+-------------+
| ACCOUNT_id | plan_id | start_date | Price | plan_interval | rev |
+------------+---------+---------------------+----------+---------------+-------------+
| 1 | 1 | 2018-01-03 14:52:13 | 3900.00 | month | 39.00000000 |
| 1 | 2 | 2018-02-07 11:11:17 | 9900.00 | month | 99.00000000 |
| 2 | 3 | 2018-01-03 17:40:05 | 99900.00 | year | 83.25000000 |
+------------+---------+---------------------+----------+---------------+-------------+

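In your case, the WHERE clause does not filter anything because the CASE expression is used as a condition by itself rather than being compared with anything. Both account_id = account_id and MONTH(start_date) = MONTH(start_date) compare a column with itself, so the first branch is always taken and the CASE returns MAX(start_date), a non-zero value that MySQL treats as true for every row.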
CASE WHEN (account_id = account_id
AND MONTH(start_date) = MONTH(start_date))
THEN (SELECT MAX(start_date) FROM subscriptions)
ELSE (SELECT start_date FROM subscriptions)
END
Another approach to what you are building would involve using a subquery to order the rows the way you want within the groups.
SELECT
account_id,
month,
CASE WHEN plan_interval = 'month'
THEN price * .01
ELSE (price * .01)/12
END AS mrr
FROM (
SELECT *, MONTH(start_date) AS month
FROM subscriptions
INNER JOIN plans ON plans.id = subscriptions.plan_id
ORDER BY account_id, start_date DESC
) sq
GROUP BY account_id, month
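This works when ONLY_FULL_GROUP_BY is disabled, because MySQL then returns one value per non-aggregated column for each group, in practice often from the first row that the subquery returns. MySQL does not guarantee which row is used, and newer versions may ignore the ORDER BY inside the derived table, so verify the result on your version before relying on it.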

Related

Dividing new created columns

orders_table:
orders_id_column | user_id_column | final_status_column
----------------------------------------------------
1 | 4455 | DeliveredStatus
2 | 4455 | DeliveredStatus
3 | 4455 | CanceledStatus
4 | 8888 | CanceledStatus
I want to calculate the total number of orders and the number of canceled orders by user_id, and then the quotient of the two, to arrive at something like this:
user_id | total_orders | canceled_orders | cocient
---------------------------------------------------
4455 | 3 | 1 | 0.33
8888 | 1 | 1 | 1.00
I managed to create the first two columns, but not the last one:
SELECT
COUNT(order_id) AS total_orders,
SUM(if(orders.final_status = 'DeliveredStatus', 1, 0)) AS canceled_orders
FROM users
GROUP BY user_id;
You can use a simple approach:
SELECT
user_id,
COUNT(order_id) AS total_orders,
SUM(CASE WHEN final_status = 'CanceledStatus' THEN 1 ELSE 0 END) AS canceled_orders,
SUM(CASE WHEN final_status = 'CanceledStatus' THEN 1 ELSE 0 END) / COUNT(order_id) AS cocient
FROM users
GROUP BY user_id;
Demo: https://www.db-fiddle.com/f/7yUJcuMJPncBBnrExKbzYz/136
You could just use a sub-query.
Then you can refer to the newly created columns, as the outer query exists in a different scope (one where the new columns now exist).
(This avoids repeating any logic and keeps the code DRY.)
SELECT
user_id,
total_orders,
canceled_orders,
canceled_orders / total_orders AS cocient
FROM
(
SELECT
user_id,
COUNT(order_id) AS total_orders,
SUM(if(final_status = 'CanceledStatus', 1, 0)) AS canceled_orders
FROM
users
GROUP BY
user_id
)
AS per_user
Note, selecting from the users table appears to be a typo in your example. It would appear that you should select from the orders table...
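To make the sub-query idea concrete, here is a small sketch that reads from an orders table (assumed to have the columns order_id, user_id and final_status, as the sample data in the question suggests). It uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL. One difference to be aware of: SQLite divides integers as integers, so the sketch multiplies by 1.0 before dividing, whereas MySQL's / operator already returns a decimal result.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id INTEGER, final_status TEXT);
INSERT INTO orders VALUES
  (1, 4455, 'DeliveredStatus'),
  (2, 4455, 'DeliveredStatus'),
  (3, 4455, 'CanceledStatus'),
  (4, 8888, 'CanceledStatus');
""")

# The inner query builds the two counts; the outer query can then refer to
# them by name to compute the quotient without repeating the CASE logic.
per_user = conn.execute("""
SELECT user_id,
       total_orders,
       canceled_orders,
       ROUND(1.0 * canceled_orders / total_orders, 2) AS cocient
FROM (
    SELECT user_id,
           COUNT(order_id) AS total_orders,
           SUM(CASE WHEN final_status = 'CanceledStatus' THEN 1 ELSE 0 END) AS canceled_orders
    FROM orders
    GROUP BY user_id
) AS per_user
ORDER BY user_id
""").fetchall()

for row in per_user:
    print(row)
```

For the sample rows this gives 3 orders, 1 canceled for user 4455 and 1 order, 1 canceled for user 8888, matching the counts in the expected output of the question.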

Mysql sub select is very slow

I have a table like this :
| id | valid_until | username |
|----|-------------|----------|
| 1 | 2020-01-01 | user1 |
| 1 | 2020-01-01 | user2 |
| 1 | 2020-01-02 | user3 |
| 1 | 2020-01-02 | user4 |
| 1 | 2020-01-03 | user5 |
| 1 | 2020-01-03 | user6 |
| 1 | 2020-01-03 | user7 |
This is the user subscription table; valid_until indicates when the subscription ends.
I want to know the number of active subscriptions on each day in a range, for example 2020-01-01 to 2020-01-03. Here is my query:
SELECT
valid_until qva,
(SELECT
COUNT(iod.id) AS ct
FROM
`order_detail` iod
WHERE
valid_until > qva)
FROM
`order_detail`
WHERE
valid_until >= '2020-01-01'
AND valid_until <= '2020-01-03'
GROUP BY qva
But this query is too slow (response time: 230 sec). What is the problem with my query?
Your query is invalid according to the SQL standard. You have GROUP BY qva (and using the alias name here is usually not allowed, but in MySQL and MariaDB it is), but you don't apply an aggregate function on the subquery result. MySQL is known for violating the standard here and they silently apply ANY_VALUE on such unaggregated expressions.
This makes the query slow. For each order in the date range the count is evaluated, only to pick one of those counts per date at the very end.
So, let's build the query up from scratch. You want one result row for each day in the given date range. That is:
select distinct valid_until
from order_detail
where valid_until between date '2020-01-01' and date '2020-01-03';
Then, for each of these dates you want to get the count. You can do this in a subquery as in your original query. I suppose it must be >= instead of > though, as an order valid until a day is still valid that day (or at least this is what I'd expect).
SELECT
dates.day,
(
select count(*)
from order_detail od
where od.valid_until >= dates.day
) as ct
FROM
(
select distinct valid_until as day
from order_detail
where valid_until between date '2020-01-01' and date '2020-01-03'
) dates
order by dates.day;
(This assumes that valid_until is a mere date. If it's a datetime, you'll have to adjust this query a little.)
UPDATE:
As ysth told you in the request comments, your query will get you only days in the given range that exist as valid_until in your table. So does mine. If you wanted the three days regardless, you'd have to replace the subquery and make it independent from the table, e.g.
FROM
(
select date '2020-01-01' as day union all
select date '2020-01-02' as day union all
select date '2020-01-03' as day
) dates
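Putting the pieces together, the following sketch counts the subscriptions still valid on each day of a fixed list of days. It uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL, so the dates are plain ISO strings rather than DATE literals. The index on valid_until is my own assumption and not something mentioned in the question; on a large table an index of this kind usually helps the range count in the subquery.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_detail (id INTEGER, valid_until TEXT, username TEXT);
INSERT INTO order_detail VALUES
  (1, '2020-01-01', 'user1'), (1, '2020-01-01', 'user2'),
  (1, '2020-01-02', 'user3'), (1, '2020-01-02', 'user4'),
  (1, '2020-01-03', 'user5'), (1, '2020-01-03', 'user6'),
  (1, '2020-01-03', 'user7');
-- Assumed index (hypothetical name) so the range count can avoid a full scan.
CREATE INDEX idx_order_detail_valid_until ON order_detail (valid_until);
""")

# One row per requested day, independent of which days occur in the table.
active_per_day = conn.execute("""
SELECT dates.day,
       (SELECT COUNT(*)
        FROM order_detail od
        WHERE od.valid_until >= dates.day) AS ct
FROM (
    SELECT '2020-01-01' AS day UNION ALL
    SELECT '2020-01-02' UNION ALL
    SELECT '2020-01-03'
) AS dates
ORDER BY dates.day
""").fetchall()

print(active_per_day)
# [('2020-01-01', 7), ('2020-01-02', 5), ('2020-01-03', 3)]
```

The count subquery is evaluated once per day in the list, not once per order row, which is the main difference from the original query.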
I think you just want a cumulative count of valid_until from the latest date to the earliest:
select valid_until, count(*) day_count,
sum(count(*)) over (order by valid_until desc) as count_as_of_day
from order_detail od
group by valid_until;
If you want this for a range of dates, then use a subquery:
select od.*
from (select valid_until, count(*) day_count,
sum(count(*)) over (order by valid_until desc) as count_as_of_day
from order_detail od
group by valid_until
) od
where valid_until >= :from and valid_until <= :to
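Here is a sketch of the cumulative-count idea, again using Python's sqlite3 module as a stand-in for MySQL. It needs window-function support (MySQL 8.0 or later, SQLite 3.25 or later). The sketch splits the grouping and the running sum into two nested queries, which is a small restructuring of the query above rather than a different technique. The date range is filtered only in the outermost query, so rows outside the range still contribute to the running total.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_detail (id INTEGER, valid_until TEXT, username TEXT);
INSERT INTO order_detail VALUES
  (1, '2020-01-01', 'user1'), (1, '2020-01-01', 'user2'),
  (1, '2020-01-02', 'user3'), (1, '2020-01-02', 'user4'),
  (1, '2020-01-03', 'user5'), (1, '2020-01-03', 'user6'),
  (1, '2020-01-03', 'user7');
""")

# Innermost query: orders per day. Middle query: running total from the latest
# day backwards. Outer query: restrict to the requested range afterwards.
query = """
SELECT valid_until, day_count, count_as_of_day
FROM (
    SELECT valid_until,
           day_count,
           SUM(day_count) OVER (ORDER BY valid_until DESC) AS count_as_of_day
    FROM (
        SELECT valid_until, COUNT(*) AS day_count
        FROM order_detail
        GROUP BY valid_until
    ) AS per_day
) AS od
WHERE valid_until >= ? AND valid_until <= ?
ORDER BY valid_until
"""
counts = conn.execute(query, ("2020-01-01", "2020-01-02")).fetchall()

print(counts)
# [('2020-01-01', 2, 7), ('2020-01-02', 2, 5)]
```

Even though 2020-01-03 is outside the requested range, its three rows are included in the running totals for the earlier days, because those subscriptions are still valid then.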

How merge two select with different WHERE and special conditions

I have table something like this:
date|status|value
date is the order date,
status is 1 for pending and 2 for confirmed,
and value is the value of the order
I want to get 3 columns:
date|#status confirmed|#status pending+confirmed
example of data:
+------------+-----------------+-----------------+
| date | status | value |
+------------+-----------------+-----------------+
| 2015-11-17 | 1 | 89|
| 2015-11-16 | 1 | 6 |
| 2015-11-16 | 2 | 16 |
| 2015-11-16 | 2 | 26 |
| 2015-11-15 | 2 | 26 |
| 2015-11-14 | 2 | 24 |
+------------+-----------------+-----------------+
example of what I want:
+------------+-----------------+-----------------+
| date | confirmed |confirmed+pending|
+------------+-----------------+-----------------+
| 2015-11-17 | 0 | 1 |
| 2015-11-16 | 2 | 3 |
| 2015-11-15 | 1 | 1 |
| 2015-11-14 | 1 | 1 |
+------------+-----------------+-----------------+
I am trying to do:
SELECT array1.DATE
,array1.confirmed
,array2.total
FROM (
SELECT DATE (DATE) AS DATE
,count(value) AS confirmed
FROM Orders
WHERE STATUS = '2'
GROUP BY DATE (DATE) DESC limit 5
) AS array1
INNER JOIN (
SELECT DATE (DATE) AS DATE
,count(value) AS total
FROM Orders
GROUP BY DATE (DATE) DESC limit 5
) AS array2
But I get 4 results per date, with the confirmed value repeated and different totals.
If I run the two queries separately, I get the correct information from each.
This lists only the number of confirmed orders for the last 5 days:
SELECT DATE (DATE) AS DATE
,count(value) AS confirmed
FROM Orders
WHERE STATUS = '2'
GROUP BY DATE (DATE) DESC limit 5
This lists the number of all orders for the last 5 days:
SELECT DATE (DATE) AS DATE
,count(value) AS total
FROM Orders
GROUP BY DATE (DATE) DESC limit 5
I noticed at least one big problem:
Sometimes there will be a day with many unconfirmed orders and zero confirmed ones, so an inner join will probably drop that day.
You can use CASE WHEN to get the expected output you have given.
SELECT DATE(`date`) AS `date`,
(SUM(CASE WHEN `status`=2 THEN 1 ELSE 0 END)) AS Confirmed,
(SUM(CASE WHEN `status`=1 OR `status`=2 THEN 1 ELSE 0 END)) AS Confirmed_Pending
FROM
table_name
GROUP BY DATE(`date`) DESC
Hope this helps.
You are missing an ON clause in your INNER JOIN. Or, since in your case the column you join on is the same on both sides, you can use USING:
SELECT array1.DATE
,array1.confirmed
,array2.total
FROM (
SELECT DATE (DATE) AS DATE
,count(value) AS confirmed
FROM Orders
WHERE STATUS = '2'
GROUP BY DATE (DATE) DESC limit 5
) AS array1
INNER JOIN (
SELECT DATE (DATE) AS DATE
,count(value) AS total
FROM Orders
GROUP BY DATE (DATE) DESC limit 5
) AS array2
USING (DATE)
An easier approach could be to use a case expression to evaluate whether the status is something you'd like to count, and apply the count function to that:
SELECT DATE (`date`) AS `date`,
COUNT(CASE status WHEN 2 THEN 1 END) AS `confirmed`,
COUNT(CASE WHEN status IN (1, 2) THEN 1 END) AS `pending and confirmed`
FROM orders
GROUP BY DATE (`date`) DESC
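The conditional-count approach can be checked against the sample data from the question with a short sketch. It uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL; the table name orders and the column alias names are assumptions. Because a single pass over the table produces both counts, a day with pending orders but no confirmed ones still appears, with 0 in the confirmed column.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (date TEXT, status INTEGER, value INTEGER);
INSERT INTO orders VALUES
  ('2015-11-17', 1, 89),
  ('2015-11-16', 1, 6),
  ('2015-11-16', 2, 16),
  ('2015-11-16', 2, 26),
  ('2015-11-15', 2, 26),
  ('2015-11-14', 2, 24);
""")

# COUNT ignores NULLs, so a CASE without ELSE counts only the matching rows.
daily = conn.execute("""
SELECT DATE(date) AS day,
       COUNT(CASE WHEN status = 2 THEN 1 END) AS confirmed,
       COUNT(CASE WHEN status IN (1, 2) THEN 1 END) AS pending_and_confirmed
FROM orders
GROUP BY day
ORDER BY day DESC
LIMIT 5
""").fetchall()

for row in daily:
    print(row)
# ('2015-11-17', 0, 1)
# ('2015-11-16', 2, 3)
# ('2015-11-15', 1, 1)
# ('2015-11-14', 1, 1)
```

The sketch orders with a separate ORDER BY clause instead of GROUP BY ... DESC, since the latter is a MySQL-specific form that SQLite does not accept.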

MySQL count daily new users VS returned users (cohort analysis)

The table structure is: user_id, Date (I'm used to working with timestamps)
for example
user id | Date (TS)
A | '2014-08-10 14:02:53'
A | '2014-08-12 14:03:25'
A | '2014-08-13 14:04:47'
B | '2014-08-13 04:04:47'
...
and for the next week I have
user id | Date (TS)
A | '2014-08-17 09:02:53'
B | '2014-08-17 10:04:47'
B | '2014-08-18 10:04:47'
A | '2014-08-19 10:04:22'
C | '2014-08-19 11:04:47'
...
and for today I have
user id | Date (TS)
A | '2015-05-27 09:02:53'
B | '2015-05-27 10:04:47'
C | '2015-05-27 10:04:22'
D | '2015-05-27 17:04:47'
I need to know how to write a single query that finds, for each day, the number of new users and the number of "returned" users, counting from the very beginning of their activity.
Expected results :
date | New user | returned User
2014-08-10 | 1 | 0
2014-08-11 | 0 | 0
2014-08-12 | 0 | 1 (A was active on 08/10)
2014-08-13 | 1 | 1 (A was active on 08/12 & 08/10)
...
2014-08-17 | 0 | 2 (A & B were already active )
2014-08-18 | 0 | 1
2014-08-19 | 1 | 1
...
2015-05-27 | 1 | 3 (D is a new user)
After a long search on Stack Overflow I found some material provided by https://meta.stackoverflow.com/users/107744/spencer7593 here: Weekly Active Users for each day from log, but I didn't manage to adapt his query to output my expected results.
Thanks for your help
Assuming you have a date table somewhere (and using T-SQL syntax because I know it better...), the key is to calculate the MinDate for each user separately, calculate the total number of distinct users on each day, and then declare a returning user to be a user who wasn't new:
SELECT DateTable.Date, NewUsers, NumUsers - NewUsers AS ReturningUsers
FROM
DateTable
LEFT JOIN
(
SELECT MinDate, COUNT(user_id) AS NewUsers
FROM (
SELECT user_id, min(CAST(date AS Date)) as MinDate
FROM Table
GROUP BY user_id
) A
GROUP BY MinDate
) B ON DateTable.Date = B.MinDate
LEFT JOIN
(
SELECT CAST(date AS Date) AS Date, COUNT(DISTINCT user_id) AS NumUsers
FROM Table
GROUP BY CAST(date AS Date)
) C ON DateTable.Date = C.Date
Thanks to Stephen. I made a small fix to his query, which works well even if it's a bit time-consuming on a large database:
SELECT
DATE(Stats.Created),
NewUsers,
NumUsers - NewUsers AS ReturningUsers
FROM
Stats
LEFT JOIN
(
SELECT
MinDate,
COUNT(user_id) AS NewUsers
FROM (
SELECT
user_id,
MIN(DATE(Created)) as MinDate
FROM Stats
GROUP BY user_id
) A
GROUP BY MinDate
) B
ON DATE(Stats.Created) = B.MinDate
LEFT JOIN
(
SELECT
DATE(Created) AS Date,
COUNT(DISTINCT user_id) AS NumUsers
FROM Stats
GROUP BY DATE(Created)
) C
ON DATE(Stats.Created) = C.Date
GROUP BY DATE(Stats.Created)
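For reference, here is a small sketch of the same new-versus-returning logic on the first week of sample data. It uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL; the table name stats and the column name created are assumptions modelled on the query above. Instead of joining every row of the log to the two aggregates, it starts from the per-day aggregate, which may be cheaper on a large table. Days without any activity (such as 2014-08-11) do not appear; producing them would still require a separate date table.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stats (user_id TEXT, created TEXT);
INSERT INTO stats VALUES
  ('A', '2014-08-10 14:02:53'),
  ('A', '2014-08-12 14:03:25'),
  ('A', '2014-08-13 14:04:47'),
  ('B', '2014-08-13 04:04:47');
""")

# d: distinct active users per day.
# n: users whose first ever activity falls on that day (new users).
# Returning users are the active users who are not new on that day.
cohorts = conn.execute("""
SELECT d.day,
       COALESCE(n.new_users, 0) AS new_users,
       d.num_users - COALESCE(n.new_users, 0) AS returning_users
FROM (
    SELECT DATE(created) AS day, COUNT(DISTINCT user_id) AS num_users
    FROM stats
    GROUP BY DATE(created)
) AS d
LEFT JOIN (
    SELECT first_day, COUNT(*) AS new_users
    FROM (
        SELECT user_id, MIN(DATE(created)) AS first_day
        FROM stats
        GROUP BY user_id
    ) AS f
    GROUP BY first_day
) AS n ON n.first_day = d.day
ORDER BY d.day
""").fetchall()

for row in cohorts:
    print(row)
# ('2014-08-10', 1, 0)
# ('2014-08-12', 0, 1)
# ('2014-08-13', 1, 1)
```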

Counting multiple columns based on datetable - MySQL

With a table of dates, I'm trying to count different columns based on weeks.
I manage to do it with one column, and it works fine. But when I'm counting multiple columns I get either wrong or duplicated results. I think it's because of the join.
This works for one column as expected:
SELECT
DATE_FORMAT(thedate, '%u') as week
,COUNT(t.completed_date) as completed
FROM datetable
LEFT JOIN projects t ON t.completed_date = thedate
WHERE thedate BETWEEN YEAR(NOW()) AND NOW()
GROUP BY YEARWEEK(thedate,7)
By adding ,COUNT(t.sales_date) as sales to the select, I will get duplicated counts for completed and sales.
Based on this sample (projects)
| id | completed_date | sales_date |
| 1 | NULL | NULL |
| 2 | NULL | 2013-08-26 |
| 3 | NULL | 2013-08-28 |
| 4 | 2013-09-06 | NULL |
I'm looking for
| week | completed | sales |
| 34 | 0 | 0 |
| 35 | 0 | 2 |
| 36 | 1 | 0 |
I'm using a datetable because I need all dates, with 0 when there are no matching dates.
I think I could solve it with subqueries, but there are 12 other date fields I need to count in this query as well (excluded from the sample).
Is there a better way of solving this than by using lots of subqueries? My SQL is a bit rusty.
One way is to use subqueries that group each value by week, then join them all together.
SELECT d.week, completed, sales
FROM (SELECT YEARWEEK(thedate) week
FROM datetable
WHERE thedate BETWEEN YEAR(NOW()) AND NOW()
GROUP BY week) d
LEFT JOIN (SELECT YEARWEEK(completed_date) week, COUNT(*) completed
FROM projects
WHERE completed_date BETWEEN YEAR(NOW()) AND NOW()
GROUP BY week) c
ON c.week = d.week
LEFT JOIN (SELECT YEARWEEK(sales_date) week, COUNT(*) sales
FROM projects
WHERE sales_date BETWEEN YEAR(NOW()) AND NOW()
GROUP BY week) s
ON s.week = d.week
This way is more easily extended to additional columns:
SELECT DATE_FORMAT(thedate, '%u') AS week,
IFNULL(SUM(completed_date = thedate), 0) AS completed,
IFNULL(SUM(sales_date = thedate), 0) AS sales
FROM datetable
LEFT JOIN projects
ON thedate IN (completed_date, sales_date)
WHERE thedate BETWEEN YEAR(NOW()) AND NOW()
GROUP BY week
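The second query can be tried on the sample data with the sketch below, which uses Python's sqlite3 module with an in-memory database as a stand-in for MySQL. Two things are assumptions of the sketch: the datetable here carries a precomputed ISO week column (filled in from Python) in place of DATE_FORMAT(thedate, '%u'), and it covers only the three weeks of the sample instead of filtering by the current year.

```python
import sqlite3
from datetime import date, timedelta

# In-memory SQLite database as a stand-in for MySQL (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE datetable (thedate TEXT PRIMARY KEY, week INTEGER);
CREATE TABLE projects (id INTEGER PRIMARY KEY, completed_date TEXT, sales_date TEXT);
INSERT INTO projects VALUES
  (1, NULL, NULL),
  (2, NULL, '2013-08-26'),
  (3, NULL, '2013-08-28'),
  (4, '2013-09-06', NULL);
""")

# Fill the date table with 21 consecutive days (ISO weeks 34 to 36 of 2013).
start = date(2013, 8, 19)
days = [start + timedelta(days=i) for i in range(21)]
conn.executemany(
    "INSERT INTO datetable VALUES (?, ?)",
    [(d.isoformat(), d.isocalendar()[1]) for d in days],
)

# A project row joins to every date that matches any of its date columns;
# each SUM then counts only the matches for its own column.
weekly = conn.execute("""
SELECT d.week,
       COALESCE(SUM(p.completed_date = d.thedate), 0) AS completed,
       COALESCE(SUM(p.sales_date = d.thedate), 0) AS sales
FROM datetable d
LEFT JOIN projects p
       ON d.thedate IN (p.completed_date, p.sales_date)
GROUP BY d.week
ORDER BY d.week
""").fetchall()

print(weekly)
# [(34, 0, 0), (35, 0, 2), (36, 1, 0)]
```

Adding another date field means adding it to the IN list and adding one more SUM column, which is why this form extends more easily than one subquery per field.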