Number of Customers Grouped By number of orders with Revenue - mysql

I am looking to build a query that gives me the number of customers grouped by how many orders they have placed, plus the revenue for each group.
For example:
| Number of Orders | Number of Customers | Revenue of Orders |
| 1 | 312 | 4350.88 |
| 2 | 208 | 3490.00 |
| 3 | 152 | 2240.50 |
I have the first two columns working correctly. This is that query:
SELECT
r.num_of_orders ,
count(*) AS num_of_customers
FROM
(
SELECT
count(*) AS num_of_orders
FROM
reservations r
WHERE
created_at >= '2008-01-01 00:00:00'
AND `status` = 'closed'
GROUP BY
r.customer_id
) r
GROUP BY
r.num_of_orders
To add revenue, I tried:
SELECT
r.num_of_orders ,
count(*) AS num_of_customers,
sum(b.total) as total_revenue
FROM
(
SELECT
count(*) AS num_of_orders
FROM
reservations r
WHERE
created_at >= '2008-01-01 00:00:00'
GROUP BY
r.customer_id
) r,
(
SELECT
sum(payments.total) AS total
FROM
reservations r
JOIN payments ON payments.id = r.reservation_id
WHERE
r.created_at >= '2008-01-01 00:00:00'
GROUP BY
r.customer_id
) b
GROUP BY
r.num_of_orders
But I know the revenue numbers are wrong.
I hope you can advise.

Add it in the original subquery:
SELECT r.num_of_orders, count(*) AS num_of_customers,
SUM(revenue) as total_revenue
FROM (SELECT COUNT(DISTINCT r.reservation_id) AS num_of_orders,
SUM(p.total) as revenue
FROM reservations r JOIN
payments p
ON p.id = r.reservation_id
WHERE r.created_at >= '2008-01-01' AND
r.status = 'closed'
GROUP BY r.customer_id
) r
GROUP BY r.num_of_orders;
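The comma-joined attempt in the question has no join condition between the two derived tables, so every customer's order count is paired with every customer's revenue, which inflates the sum. Computing both values in the same per-customer subquery avoids that. Here is a minimal sketch of the idea, using Python's sqlite3 module as a stand-in for MySQL. The table and column names follow the question; the rows are made up, and one payment per reservation is assumed.

```python
import sqlite3

def revenue_by_order_count():
    """Customers and revenue grouped by orders per customer (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    # Customer 10 has two closed orders; customers 20 and 30 have one each.
    conn.executescript("""
        CREATE TABLE reservations (
            reservation_id INTEGER PRIMARY KEY,
            customer_id INTEGER, created_at TEXT, status TEXT);
        CREATE TABLE payments (id INTEGER PRIMARY KEY, total REAL);
        INSERT INTO reservations VALUES
            (1, 10, '2009-01-01 00:00:00', 'closed'),
            (2, 10, '2009-02-01 00:00:00', 'closed'),
            (3, 20, '2009-03-01 00:00:00', 'closed'),
            (4, 30, '2009-04-01 00:00:00', 'closed');
        INSERT INTO payments VALUES (1, 50.0), (2, 25.0), (3, 40.0), (4, 10.0);
    """)
    rows = conn.execute("""
        SELECT c.num_of_orders,
               COUNT(*)       AS num_of_customers,
               SUM(c.revenue) AS total_revenue
        FROM (SELECT COUNT(DISTINCT r.reservation_id) AS num_of_orders,
                     SUM(p.total) AS revenue
              FROM reservations r
              JOIN payments p ON p.id = r.reservation_id
              WHERE r.created_at >= '2008-01-01' AND r.status = 'closed'
              GROUP BY r.customer_id) c
        GROUP BY c.num_of_orders
        ORDER BY c.num_of_orders
    """).fetchall()
    conn.close()
    return rows

print(revenue_by_order_count())
```

With this data, two customers have one order each (revenue 40 + 10) and one customer has two orders (revenue 50 + 25).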

Retrieving last record in each group filtering by date - MySQL

I need to extend a problem already solved in another question: Retrieving the last record in each group - MySQL
My problem is very similar, yet I cannot achieve the results I need.
In my first table, VAT_types, I define which kinds of rates are available, by name:
id type
--------------
1 ordinaria
2 ridotta
3 minima
4 esente
My second table, VAT_rates, holds multiple VAT rates, each dated from when the law makes it official. The rates are updated once in a while, but a record of all rates must always remain available:
id date type rate
-----------------------------
1 2013-01-01 1 22.0000
2 2013-01-01 2 10.0000
3 2013-01-01 3 4.0000
4 2000-01-01 4 0.0000
9 2019-01-01 2 11.5000
10 2021-01-01 2 12.0000
11 2019-01-01 1 24.2000
12 2021-01-01 1 25.0000
So if I want to filter them according to the current date (or a future date), I just have to query them like this:
SELECT VAT.id, TYPE.type, VAT.date, VAT.rate
FROM VAT_rates VAT JOIN VAT_types TYPE on TYPE.id = VAT.type
WHERE cast(VAT.date as date) <= cast("2022-11-22" as date)
ORDER BY VAT.type ASC, VAT.date DESC
"2022-11-22" can be any date, and in fact if I change it to CURDATE() it will display all available rates until that date.
Now I want to group them by VAT type and retrieve just the most recently updated one. So I searched here and found the solution linked above, which I tweaked like this:
SELECT T1.*
FROM (
SELECT VAT.id, TYPE.type, VAT.date, VAT.rate
FROM VAT_rates VAT JOIN VAT_types TYPE on TYPE.id = VAT.type
WHERE cast(VAT.date as date) <= cast("2022-11-22" as date)
ORDER BY VAT.type ASC, VAT.date DESC
) T1
LEFT JOIN (
SELECT VAT.id, TYPE.type, VAT.date, VAT.rate
FROM VAT_rates VAT JOIN VAT_types TYPE on TYPE.id = VAT.type
WHERE cast(VAT.date as date) <= cast("2022-11-22" as date)
ORDER BY VAT.type ASC, VAT.date DESC
) T2
ON (T1.type = T2.type AND T1.id < T2.id)
WHERE T2.id IS NULL
ORDER BY T1.rate DESC;
The result will be:
id type date rate
--------------------------------
12 Ordinaria 2021-01-01 25,0000
10 Ridotta 2021-01-01 12,0000
3 Minima 2013-01-01 4,0000
4 Esente 2000-01-01 0,0000
It seems to work, but of course it's way too complicated. I also want to use this query from PHP, setting the date just once to retrieve the right rates and then the specific rate needed.
How can I simplify the query above?
You could use an inner join on a subquery that finds the max date per type:
select VAT.id, TYPE.type, VAT.date, VAT.rate
from VAT_rates VAT
inner JOIN VAT_types TYPE on TYPE.id = VAT.type
inner join (
select max(VAT.date) max_date, TYPE.type
from VAT_rates VAT
INNER JOIN VAT_types TYPE on TYPE.id = VAT.type
WHERE str_to_date(VAT.date, '%Y-%m-%d') <= str_to_date("2022-11-22", '%Y-%m-%d')
group by TYPE.type
) T on T.max_date = VAT.date and T.type = TYPE.type
It is common to find the greatest row per group using the following approach:
select VAT.id, TYPE.type, VAT.date, VAT.rate
from VAT_rates VAT
join VAT_types TYPE on VAT.type = TYPE.id
join
(
select type, max(date) max_date
from VAT_rates
where cast(date as date) <= cast("2022-11-22" as date)
group by type
) t on VAT.type = t.type and
VAT.date = t.max_date and
cast(VAT.date as date) <= cast("2022-11-22" as date)
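Since the date appears only inside the grouped subquery, it can be passed once as a parameter. Here is a minimal sketch of the max-date-per-type join, run against the sample rows from the question with Python's sqlite3 as a stand-in for MySQL. The function name `latest_rates` is hypothetical, and dates are stored as ISO strings so that plain comparison orders them correctly.

```python
import sqlite3

def latest_rates(as_of):
    """Return the most recent rate per VAT type on or before `as_of` (YYYY-MM-DD)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE VAT_types (id INTEGER PRIMARY KEY, type TEXT);
        CREATE TABLE VAT_rates (id INTEGER PRIMARY KEY, date TEXT, type INTEGER, rate REAL);
        INSERT INTO VAT_types VALUES
            (1, 'ordinaria'), (2, 'ridotta'), (3, 'minima'), (4, 'esente');
        INSERT INTO VAT_rates VALUES
            (1,  '2013-01-01', 1, 22.0), (2,  '2013-01-01', 2, 10.0),
            (3,  '2013-01-01', 3, 4.0),  (4,  '2000-01-01', 4, 0.0),
            (9,  '2019-01-01', 2, 11.5), (10, '2021-01-01', 2, 12.0),
            (11, '2019-01-01', 1, 24.2), (12, '2021-01-01', 1, 25.0);
    """)
    rows = conn.execute("""
        SELECT v.id, t.type, v.date, v.rate
        FROM VAT_rates v
        JOIN VAT_types t ON t.id = v.type
        JOIN (SELECT type, MAX(date) AS max_date
              FROM VAT_rates
              WHERE date <= ?          -- the only place the date is needed
              GROUP BY type) m
          ON m.type = v.type AND m.max_date = v.date
        ORDER BY v.rate DESC
    """, (as_of,)).fetchall()
    conn.close()
    return rows

for row in latest_rates("2022-11-22"):
    print(row)
```

Note that this compares dates rather than ids, so it does not rely on newer rates having been inserted with higher ids.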

MySQL - Get the max count from a subquery group

I have a table logins with the following schema:
| id | user_id | weekday |
|----|---------|---------|
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 2 |
...
Weekday is a number from 0 to 6.
I want to get which weekday has the highest count, for each user_id in the table.
I tried the following query:
SELECT MAX(num) as max_num, user_id, weekday
FROM (
SELECT COUNT(*) as num, user_id, weekday
FROM logins
GROUP BY user_id, weekday
) C
WHERE user_id = C.user_id AND num = C.num
GROUP BY user_id;
Which gets me weekday = 1 instead of 2. I think that I shouldn't use a WHERE clause here, but I couldn't manage to get the correct result.
I've checked other similar questions with no luck, such as:
MYSQL, Max,Group by and Max
Select first row in each GROUP BY group?
I created a SQL Fiddle with my example: http://sqlfiddle.com/#!9/e43a71/1
Here is a method:
SELECT user_id, MAX(num) as max_num,
SUBSTRING_INDEX(GROUP_CONCAT(weekday ORDER BY num DESC), ',', 1) as weekday_max
FROM (SELECT user_id, weekday, COUNT(*) as num
FROM logins l
GROUP BY user_id, weekday
) uw
GROUP BY user_id;
SELECT days.user_id, days.weekday, days.num
FROM (
SELECT user_id, MAX(num) AS num
FROM (
SELECT user_id, weekday, COUNT(*) AS num
FROM logins
GROUP BY user_id, weekday
) max
GROUP BY user_id
) nums
JOIN (
SELECT user_id, weekday, COUNT(*) as num
FROM logins
GROUP BY user_id, weekday
) days ON(days.user_id = nums.user_id AND days.num = nums.num);
-- With Mariadb 10.2 or MySQL 8.0.2
WITH days AS (
SELECT user_id, weekday, COUNT(*) as num
FROM logins
GROUP BY user_id, weekday
)
SELECT days.user_id, days.weekday, days.num
FROM (
SELECT user_id, MAX(num) AS num
FROM days
GROUP BY user_id
) nums
JOIN days ON(days.user_id = nums.user_id AND days.num = nums.num);
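One difference between the approaches is worth knowing: the join on the per-user maximum returns every weekday that ties for the top count, whereas the GROUP_CONCAT method returns a single weekday per user. Here is a minimal sketch of the CTE version, using Python's sqlite3 as a stand-in for MySQL 8 or MariaDB 10.2. The function name `busiest_weekdays` and the second data set are made up for illustration.

```python
import sqlite3

def busiest_weekdays(logins):
    """logins: iterable of (user_id, weekday). Returns (user_id, weekday, num) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logins (id INTEGER PRIMARY KEY, user_id INTEGER, weekday INTEGER)"
    )
    conn.executemany("INSERT INTO logins (user_id, weekday) VALUES (?, ?)", logins)
    rows = conn.execute("""
        WITH days AS (
            SELECT user_id, weekday, COUNT(*) AS num
            FROM logins
            GROUP BY user_id, weekday
        )
        SELECT days.user_id, days.weekday, days.num
        FROM (SELECT user_id, MAX(num) AS num
              FROM days
              GROUP BY user_id) nums
        JOIN days ON days.user_id = nums.user_id AND days.num = nums.num
        ORDER BY days.user_id, days.weekday
    """).fetchall()
    conn.close()
    return rows

# Sample rows from the question: user 1 logs in once on weekday 1, twice on weekday 2.
print(busiest_weekdays([(1, 1), (1, 2), (1, 2)]))
# A tie for user 1 yields two rows for that user.
print(busiest_weekdays([(1, 1), (1, 2), (2, 3)]))
```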

Cumulative Sum in MySQL

Using MySQL. I want to get cumulative sum.
This is my table
CREATE TABLE `user_infos` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
(..)
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
`updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`) )
And what I want to get is
+-------+-------+----------------+
| month | count | cumulative_sum |
+-------+-------+----------------+
| 01 | 100 | 100 |
| 02 | 101 | 201 |
| ... | 110 | 311 |
| 12 | 200 | 511 |
+-------+-------+----------------+
but the result is
+-------+-------+----------------+
| month | count | cumulative_sum |
+-------+-------+----------------+
| 01 | 100 | 100 |
| 02 | 101 | 101 |
| ... | 110 | 110 |
| 12 | 200 | 200 |
+-------+-------+----------------+
This is my wrong query..
select
T1.Month,T1.Count,
@runnung_total := (@running_total + T1.Count) as cumulative_sum
from (
select date_format(created_at,'%m') as Month,count(1) as Count from users
where date_format(created_at,'%Y')='2016'
group by(date_format(created_at,'%m'))
union
select date_format(created_at,'%m') as Month,count(1) as Count from users
where date_format(created_at,'%Y')='2017'
group by(date_format(created_at,'%m')) ) as T1
join (select @running_total := 0) as R1;
I referred to this. What's wrong in my code?
You can achieve that in two steps. First, get the count for each year and month:
select concat(year(created_at), lpad(month(created_at), 2, '0')) as ye_mo,
count(*) as cnt
from users
group by concat(year(created_at), lpad(month(created_at), 2, '0'))
Then join it with itself, matching each row with all previous ones:
select t1.ye_mo, sum(t2.cnt)
from (
select concat(year(created_at), lpad(month(created_at), 2, '0')) as ye_mo,
count(*) as cnt
from users
group by concat(year(created_at), lpad(month(created_at), 2, '0'))
) t1
join (
select concat(year(created_at), lpad(month(created_at), 2, '0')) as ye_mo,
count(*) as cnt
from users
group by concat(year(created_at), lpad(month(created_at), 2, '0'))
) t2
on t1.ye_mo >= t2.ye_mo
group by t1.ye_mo
order by t1.ye_mo
Edit
The query above assumes you want the running sum to increase across different years. If you want to display the months only, and aggregate the values of different years in the same month, you can change it this way:
select t1.mnt, sum(t2.cnt)
from (
select month(created_at) as mnt,
count(*) as cnt
from users
group by month(created_at)
) t1
join (
select month(created_at) as mnt,
count(*) as cnt
from users
group by month(created_at)
) t2
on t1.mnt >= t2.mnt
group by t1.mnt
order by t1.mnt
Finally, if you want the running sum to reset at the beginning of each year, you can do that like this
select t1.yr, t1.mn, sum(t2.cnt)
from (
select year(created_at) as yr, month(created_at) as mn,
count(*) as cnt
from users
group by year(created_at), month(created_at)
) t1
join (
select year(created_at) as yr, month(created_at) as mn,
count(*) as cnt
from users
group by year(created_at), month(created_at)
) t2
on t1.yr = t2.yr and
t1.mn >= t2.mn
group by t1.yr, t1.mn
order by t1.yr, t1.mn
All three versions can be seen in action here
Variables are the right way to go. You can simplify your query:
select m.Month, m.cnt,
(@running_total := (@running_total + m.cnt) ) as cumulative_sum
from (select month(created_at) as Month, count(*) as cnt
from users
where year(created_at) in (2016, 2017)
group by month(created_at)
) m cross join
(select @running_total := 0) params
order by m.Month;
Starting with MySQL 8, the ideal approach to calculating cumulative sums is to use SQL standard window functions rather than the vendor-specific, and not strictly declarative, approach of using local variables. Your query can be written as follows:
WITH data(month, count) AS (
SELECT date_format(created_at, '%m') AS month, count(*) AS count
FROM users
GROUP BY date_format(created_at, '%m')
)
SELECT
month,
count,
sum(count) OVER (ORDER BY month) AS cumulative_sum
FROM data
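To try the window-function version without a MySQL 8 server, here is a minimal sketch using Python's sqlite3 (SQLite 3.25 or later also supports `SUM(...) OVER`). Two things differ from the MySQL query and are assumptions of this sketch: SQLite's `strftime('%m', ...)` replaces `date_format`, and the sign-up dates are made up.

```python
import sqlite3

def monthly_running_total(created_dates):
    """created_dates: iterable of 'YYYY-MM-DD' strings, one per user row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, created_at TEXT)")
    conn.executemany(
        "INSERT INTO users (created_at) VALUES (?)", [(d,) for d in created_dates]
    )
    rows = conn.execute("""
        WITH data(month, cnt) AS (
            SELECT strftime('%m', created_at), COUNT(*)
            FROM users
            GROUP BY strftime('%m', created_at)
        )
        SELECT month,
               cnt,
               SUM(cnt) OVER (ORDER BY month) AS cumulative_sum  -- running total
        FROM data
        ORDER BY month
    """).fetchall()
    conn.close()
    return rows

sample = ["2016-01-05", "2016-01-20", "2016-02-01", "2017-02-10", "2016-03-15"]
print(monthly_running_total(sample))
```

Here January has 2 rows, February 2 (one from each year) and March 1, so the running total goes 2, 4, 5.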

Optimise MySQL - JOIN vs Nested query

I have been trying to optimise some SQL queries based on the assumption that Joining tables is more efficient than nesting queries. I am joining the same table multiple times to perform a different analysis on the data.
I have 2 tables:
transactions:
id | date_add | merchant_id | transaction_type | amount
1 1488733332 108 add 20.00
2 1488733550 108 remove 5.00
and a calendar table which just lists dates so that I can create empty records where there are no transactions on particular days:
calendar:
id | datefield
1 2017-03-01
2 2017-03-02
3 2017-03-03
4 2017-03-04
I have many thousands of rows in the transactions table, and I'm trying to get an annual summary of total and different types of transactions per month (i.e 12 rows in total), where
transactions = sum of all "amount"s,
additions = sum of all "amounts" where transaction_type = "add"
redemptions = sum of all "amounts" where transaction_type = "remove"
result:
month | transactions | additions | redemptions
Jan 15 12 3
Feb 20 15 5
...
My initial query looks like this:
SELECT COALESCE(tr.transactions, 0) AS transactions,
COALESCE(ad.additions, 0) AS additions,
COALESCE(re.redemptions, 0) AS redemptions,
calendar.date
FROM (SELECT DATE_FORMAT(datefield, '%b %Y') AS date FROM calendar WHERE datefield LIKE '2017-%' GROUP BY YEAR(datefield), MONTH(datefield)) AS calendar
LEFT JOIN (SELECT COUNT(transaction_type) as transactions, from_unixtime(date_add, '%b %Y') as date_t FROM transactions WHERE merchant_id = 108 GROUP BY from_unixtime(date_add, '%b %Y')) AS tr
ON calendar.date = tr.date_t
LEFT JOIN (SELECT COUNT(transaction_type = 'add') as additions, from_unixtime(date_add, '%b %Y') as date_a FROM transactions WHERE merchant_id = 108 AND transaction_type = 'add' GROUP BY from_unixtime(date_add, '%b %Y')) AS ad
ON calendar.date = ad.date_a
LEFT JOIN (SELECT COUNT(transaction_type = 'remove') as redemptions, from_unixtime(date_add, '%b %Y') as date_r FROM transactions WHERE merchant_id = 108 AND transaction_type = 'remove' GROUP BY from_unixtime(date_add, '%b %Y')) AS re
ON calendar.date = re.date_r
I tried optimising and cleaning it up a little, removing the nested statements and came up with this:
SELECT
DATE_FORMAT(cal.datefield, '%b %d') as date,
IFNULL(count(ct.amount),0) as transactions,
IFNULL(count(a.amount),0) as additions,
IFNULL(count(r.amount),0) as redemptions
FROM calendar as cal
LEFT JOIN transactions as ct ON cal.datefield = date(from_unixtime(ct.date_add)) && ct.merchant_id = 108
LEFT JOIN transactions as r ON r.id = ct.id && r.transaction_type = 'remove'
LEFT JOIN transactions as a ON a.id = ct.id && a.transaction_type = 'add'
WHERE cal.datefield like '2017-%'
GROUP BY month(cal.datefield)
I was surprised to see that the revised statement was about 20x slower than the original with my dataset. Have I missed some sort of logic? Is there a better way to achieve the same result with a more streamlined query, given I am joining the same table multiple times?
EDIT:
So to further explain the results I'm looking for - I'd like a single row for each month of the year (12 rows) each with a column for the total transactions, total additions, and total redemptions in each month.
With the first query I was getting a result in about 0.5 sec, but with the second it took 9.5 sec.
Looking at your query, you could use a single left join with CASE WHEN:
SELECT COALESCE(t.transactions, 0) AS transactions,
COALESCE(t.additions, 0) AS additions,
COALESCE(t.redemptions, 0) AS redemptions,
calendar.date
FROM (SELECT DATE_FORMAT(datefield, '%b %Y') AS date
FROM calendar
WHERE datefield LIKE '2017-%'
GROUP BY YEAR(datefield), MONTH(datefield)) AS calendar
LEFT JOIN
( select
COUNT(transaction_type) as transactions
, sum( case when transaction_type = 'add' then 1 else 0 end ) as additions
, sum( case when transaction_type = 'remove' then 1 else 0 end ) as redemptions
, from_unixtime(date_add, '%b %Y') as date_t
FROM transactions
WHERE merchant_id = 108
GROUP BY from_unixtime(date_add, '%b %Y') ) t ON calendar.date = t.date_t
First I would create a derived table with timestamp ranges for every month from your calendar table. This way a join with the transactions table will be efficient if date_add is indexed.
select month(c.datefield) as month,
unix_timestamp(timestamp(min(c.datefield), '00:00:00')) as ts_from,
unix_timestamp(timestamp(max(c.datefield), '23:59:59')) as ts_to
from calendar c
where c.datefield between '2017-01-01' and '2017-12-31'
group by month(c.datefield)
Join it with the transactions table and use conditional aggregation to get your data:
select c.month,
sum(t.amount) as transactions,
sum(case when t.transaction_type = 'add' then t.amount else 0 end) as additions,
sum(case when t.transaction_type = 'remove' then t.amount else 0 end) as redemptions
from (
select month(c.datefield) as m, date_format(c.datefield, '%b') as `month`,
unix_timestamp(timestamp(min(c.datefield), '00:00:00')) as ts_from,
unix_timestamp(timestamp(max(c.datefield), '23:59:59')) as ts_to
from calendar c
where c.datefield between '2017-01-01' and '2017-12-31'
group by month(c.datefield), date_format(c.datefield, '%b')
) c
left join transactions t on t.date_add between c.ts_from and c.ts_to
and t.merchant_id = 108
group by c.m, c.month
order by c.m
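One detail matters in any of the LEFT JOIN variants when empty months must still appear. A filter on the transactions table placed in WHERE discards calendar rows that have no matching transaction, while the same filter in the ON clause keeps them. Here is a minimal sketch that shows both behaviours, using Python's sqlite3 as a stand-in for MySQL. It counts rows per month, joins on the calendar date for brevity rather than on timestamp ranges, and uses a tiny made-up calendar; the third and fourth transactions are hypothetical.

```python
import sqlite3

def monthly_summary(filter_in_on=True):
    """Per-month transaction counts for merchant 108, from a LEFT JOIN on calendar."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE calendar (id INTEGER PRIMARY KEY, datefield TEXT);
        CREATE TABLE transactions (
            id INTEGER PRIMARY KEY, date_add INTEGER, merchant_id INTEGER,
            transaction_type TEXT, amount REAL);
        INSERT INTO calendar (datefield) VALUES
            ('2017-01-11'), ('2017-02-01'), ('2017-03-05'), ('2017-04-01');
        INSERT INTO transactions VALUES
            (1, 1488733332, 108, 'add',    20.0),
            (2, 1488733550, 108, 'remove',  5.0),
            (3, 1484092800, 108, 'add',    10.0),
            (4, 1485907200, 999, 'add',     7.0);
    """)
    merchant_filter = "t.merchant_id = 108"
    on_extra = f" AND {merchant_filter}" if filter_in_on else ""
    where = "" if filter_in_on else f"WHERE {merchant_filter}"
    rows = conn.execute(f"""
        SELECT strftime('%m', c.datefield) AS month,
               COUNT(t.id) AS transactions,
               SUM(CASE WHEN t.transaction_type = 'add'    THEN 1 ELSE 0 END) AS additions,
               SUM(CASE WHEN t.transaction_type = 'remove' THEN 1 ELSE 0 END) AS redemptions
        FROM calendar c
        LEFT JOIN transactions t
               ON date(t.date_add, 'unixepoch') = c.datefield{on_extra}
        {where}
        GROUP BY strftime('%m', c.datefield)
        ORDER BY month
    """).fetchall()
    conn.close()
    return rows

print(monthly_summary(filter_in_on=True))   # all four months, zeros where empty
print(monthly_summary(filter_in_on=False))  # months without merchant 108 rows vanish
```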

Self join only returning one record

Working on an inventory management system, and we have the following tables:
================================================
| orders | order_line_items | product_options |
|--------|-------------------|-----------------|
| id | id | id |
| start | order_id | name |
| end | product_option_id | |
| | quantity | |
| | price | |
| | event_start | |
| | event_end | |
================================================
I'm trying to calculate inventory on a certain date, so I need to make a self join to compare the quantity on order_line_items to the SUM of the quantity of other records in order_line_items with the same product_option_id, and where the event start and end are within a range.
So, given a date 2016-01-20, I have:
SELECT order_line_items.id, order_line_items.product_option_id, order_line_items.order_id FROM order_line_items
WHERE order_line_items.event_end_date >= '2016-01-20 04:00:00'
AND order_line_items.event_start_date <= '2016-01-21 04:00:00'
AND order_line_items.product_option_id IS NOT NULL;
The above returns 127 rows
When I try to do a self join, like so:
SELECT
order_line_items.id,
order_line_items.product_option_id,
order_line_items.order_id,
order_line_items.quantity,
other_line_items.other_product_option_id,
other_line_items.other_order_id,
other_line_items.other_quantity,
other_line_items.total
FROM order_line_items
JOIN (
SELECT
id,
product_option_id AS other_product_option_id,
order_id AS other_order_id,
quantity AS other_quantity,
SUM(quantity) total
FROM order_line_items
WHERE order_line_items.event_end_date >= '2016-01-20 04:00:00'
AND order_line_items.event_start_date <= '2016-01-21 04:00:00'
) other_line_items ON order_line_items.product_option_id = other_line_items.other_product_option_id
WHERE order_line_items.event_end_date >= '2016-01-20 04:00:00'
AND order_line_items.event_start_date <= '2016-01-21 04:00:00'
AND order_line_items.product_option_id IS NOT NULL;
It only returns 1 record. As you can see here: (https://goo.gl/BhUYxK) there are plenty of records with the same product_option_id, so this last query should be returning a lot of rows.
The added SUM(...) turns the subquery into a single row. Perhaps the subquery needed to have one of these:
GROUP BY (id)
GROUP BY (product_option_id)
GROUP BY (order_id)
(I don't know the schema or application well enough to say which makes sense.)
(Please use shorter, more distinctive, aliases; the SQL is very hard to read because of the length and similarity of order_line_items and other_line_items.)
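The collapse to one row is easy to reproduce. An aggregate with no GROUP BY treats the whole result as a single group, so the derived table has exactly one row for the outer query to join against. Here is a minimal sketch with Python's sqlite3 and made-up line items; only the aggregate is selected, because how non-aggregated columns behave next to SUM without GROUP BY depends on the engine and, in MySQL, on the ONLY_FULL_GROUP_BY setting.

```python
import sqlite3

def quantity_totals(grouped):
    """Sum quantities over hypothetical line items, with or without GROUP BY."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE order_line_items (
            id INTEGER PRIMARY KEY, order_id INTEGER,
            product_option_id INTEGER, quantity INTEGER);
        INSERT INTO order_line_items VALUES
            (1, 100, 7, 2), (2, 101, 7, 3), (3, 102, 8, 5);
    """)
    if grouped:
        sql = """SELECT product_option_id, SUM(quantity) AS total
                 FROM order_line_items
                 GROUP BY product_option_id
                 ORDER BY product_option_id"""
    else:
        # No GROUP BY: the whole table is one group, so one row comes back.
        sql = "SELECT SUM(quantity) AS total FROM order_line_items"
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows

print(quantity_totals(grouped=False))  # a single row for the whole table
print(quantity_totals(grouped=True))   # one row per product_option_id
```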
Do you actually want to get the following?
SELECT product_option_id, sum(quantity)
FROM order_line_items
WHERE event_end_date >= '2016-01-20 04:00:00'
AND event_start_date <= '2016-01-21 04:00:00'
GROUP BY 1
I cannot tell why you need a self join here.
It's hard to tell what you want without a sample result, but this will give you the comparison of quantity of each order's product options to the total within the range:
SELECT oli.order_id,
oli.product_option_id,
oli.quantity,
po.total_quantity
FROM order_line_items oli
JOIN (
SELECT product_option_id,
SUM(quantity) total_quantity
FROM order_line_items
WHERE event_end >= '2016-01-20 04:00:00'
AND event_start <= '2016-01-21 04:00:00'
GROUP BY product_option_id
) po
ON po.product_option_id = oli.product_option_id
WHERE oli.event_end >= '2016-01-20 04:00:00'
AND oli.event_start <= '2016-01-21 04:00:00'
If you can have multiple rows of the same product_option in an order, you may need to tweak this like so:
SELECT oli.order_id,
oli.product_option_id,
SUM(oli.quantity) order_quantity,
po.total_quantity
FROM order_line_items oli
JOIN (
SELECT product_option_id,
SUM(quantity) total_quantity
FROM order_line_items
WHERE event_end >= '2016-01-20 04:00:00'
AND event_start <= '2016-01-21 04:00:00'
GROUP BY product_option_id
) po
ON po.product_option_id = oli.product_option_id
WHERE oli.event_end >= '2016-01-20 04:00:00'
AND oli.event_start <= '2016-01-21 04:00:00'
GROUP BY oli.order_id,
oli.product_option_id
From what I understand, you actually should be left joining the orders and product_options to the order_line_items table.
Your query should look something like this, for the quantity of items on orders between certain dates.
SELECT product_options.id, product_options.name, SUM(order_line_items.quantity)
FROM order_line_items
LEFT JOIN product_options ON product_options.id=order_line_items.product_option_id
LEFT JOIN orders ON orders.id=order_line_items.order_id
WHERE orders.start>='SOME DATETIME' AND orders.end<='SOME DATETIME'
GROUP BY product_options.id
Also, just a comment, product_options should probably just be named products. Shorter table names for the win!