average, count, group by in select query - mysql

I am developing a booking engine web app.
Once a user makes a booking, it goes into this table.
id | Promo_code | total | arrival_date | departure_date | booked_date
1 | ABC1 | 1000 | 2019-02-06 | 2019-02-10 | 2019-02-02
2 | ABC1 | 2500 | 2019-02-07 | 2019-02-11 | 2019-02-03
3 | ABC1 | 3000 | 2019-02-12 | 2019-02-15 | 2019-02-03
4 | ABC2 | 5000 | 2019-02-07 | 2019-02-11 | 2019-02-02
5 | null | 3000 | 2019-02-12 | 2019-02-15 | 2019-02-01
Here promo_code is what its name implies. If the user doesn't book with a promo code, it is null (the 5th record).
I hope the other fields (total, arrival_date, departure_date and booked_date) are clear.
I want to generate a report like this:
promo_code | number_of_bookings | revenue | Average_length_of_stay | Average_depart_date | Average_reservation_revenue
ABC1 | 3 | 6500 | 3 | 5 | 2166
ABC2 | 1 | 5000 | 4 | 5 | 5000
This is called the revenue by promo code report.
The calculated columns work as follows:
Average_length_of_stay = sum of (departure_date - arrival_date) / number_of_bookings
Average_depart_date = sum of (departure_date - booked_date) / number_of_bookings
Of course, I could generate this report in the backend logic somehow, but that would be very painful. There must be a way to query this in SQL directly.
What I have done up to now is
SELECT promo_code ,count(*) as number_of_bookings,
sum(total) as revenue
FROM booking_widget.User_packages group by promo_code;
I am stuck with Average_length_of_stay, Average_depart_date and Average_reservation_revenue.
How do I get the average values with the group by clause?

It is trivial:
SELECT promo_code
, COUNT(*) AS number_of_bookings
, SUM(total) AS revenue
, AVG(DATEDIFF(departure_date, arrival_date)) AS average_length_of_stay
, AVG(DATEDIFF(departure_date, booked_date)) AS average_depart_date
, AVG(total) AS average_reservation_revenue
FROM t
GROUP BY promo_code
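To check the aggregation against the question's sample rows, here is a minimal sketch that runs the same query shape in an in-memory SQLite database from Python. Two things are assumptions and not part of the answer above: SQLite has no DATEDIFF(), so a julianday() difference stands in for it, and the WHERE clause drops the NULL promo code because the desired report does not list it.

```python
import sqlite3

# Sample rows copied from the question's booking table.
BOOKINGS = [
    (1, "ABC1", 1000, "2019-02-06", "2019-02-10", "2019-02-02"),
    (2, "ABC1", 2500, "2019-02-07", "2019-02-11", "2019-02-03"),
    (3, "ABC1", 3000, "2019-02-12", "2019-02-15", "2019-02-03"),
    (4, "ABC2", 5000, "2019-02-07", "2019-02-11", "2019-02-02"),
    (5, None, 3000, "2019-02-12", "2019-02-15", "2019-02-01"),
]

# Assumption: julianday(a) - julianday(b) replaces MySQL's DATEDIFF(a, b).
# Assumption: bookings without a promo code are left out of the report.
REPORT_SQL = """
SELECT promo_code,
       COUNT(*) AS number_of_bookings,
       SUM(total) AS revenue,
       AVG(julianday(departure_date) - julianday(arrival_date)) AS average_length_of_stay,
       AVG(julianday(departure_date) - julianday(booked_date)) AS average_depart_date,
       AVG(total) AS average_reservation_revenue
FROM t
WHERE promo_code IS NOT NULL
GROUP BY promo_code
ORDER BY promo_code
"""


def revenue_by_promo_code(rows):
    """Load the rows into a throwaway table and return the grouped report."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER, promo_code TEXT, total INTEGER, "
        "arrival_date TEXT, departure_date TEXT, booked_date TEXT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = conn.execute(REPORT_SQL).fetchall()
    conn.close()
    return result


report = revenue_by_promo_code(BOOKINGS)
for row in report:
    print(row)
```

On the sample rows, the averages for ABC1 come out fractional (11/3 nights of stay, 28/3 days between booking and departure), so wrap them in ROUND() or FLOOR() in the real query if whole numbers are wanted.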

Related

Joining two MySQL datasets by date as well as id

I am struggling to find a way to efficiently join two datasets using a single query.
Dataset one can be returned using the following query:
SELECT hours_person_id, hours_date, hours_job, SUM(hours_value) AS hours
FROM hours
WHERE hours_status = 1
GROUP BY hours_person_id, hours_date, hours_job
which gives a dataset similar to
| 1 | 2020-06-07 | 101 | 25 |
| 1 | 2020-06-07 | 102 | 10 |
| 1 | 2020-06-07 | 103 | 5 |
| 2 | 2020-06-07 | 101 | 30 |
| 2 | 2020-06-07 | 104 | 10 |
From which we can get total hours per week, per job, etc...
Our second dataset gives us the hourly rates for each person. The problem is that this table contains both historical and future hourly rates, so the join needs to ensure that the rate applies to the correct person_id and date. There could also be more than 1 rate for a person on a date.
The following gives all the rates that are active
SELECT rate_person_id, rate_date, rate_value
FROM rates
WHERE rate_active = 1
Which could look like
| 1 | 2020-01-01 | 20.00 |
| 1 | 2020-05-01 | 25.00 |
| 1 | 2020-07-01 | 22.00 |
| 2 | 2020-01-01 | 22.00 |
| 2 | 2020-05-01 | 24.00 |
| 3 | 2020-05-01 | 20.00 |
| 3 | 2020-05-01 | 21.00 |
| 3 | 2020-07-01 | 18.00 |
So for the hours above the rate from the 2020-05-01 would be the expected result, with the 21.00 value being the result for person_id === 3
Can what I am looking for be done in a single Query, or am I better off Joining two subqueries?
Update
As requested here is a fiddle that represents the above
https://www.db-fiddle.com/f/oiUpTnajY6M6ZTfZgRf4kT/0
As you can see, we have a query that returns the correct data, but this query does not scale to our current data set (1.8m lines and more sub tables).
So for the hours above the rate from the 2020-05-01 would be the expected result, with the 21.00 value being the result for person_id === 1
From your rates output, person_id = 1 was never on rate value 21.00.
| 1 | 2020-01-01 | 20.00 |
| 1 | 2020-05-01 | 25.00 |
| 1 | 2020-07-01 | 22.00 |
If a person has 2 active rates, do you need the most recent rate, or the rate for the month in which they worked? If there is no rate for that month, do you want a rate of 0 or something else?
SELECT h.*,
(SELECT r.rate_value
FROM rates r
WHERE h.hours_person_id = r.rate_person_id AND
r.rate_date <= h.hours_date
ORDER BY r.rate_date DESC
LIMIT 1
) as rate_value
FROM hours h
I don't see what active has to do with the question, because you need to go back in time. You can then aggregate or do whatever you want once you have the correct rate on the date.
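As a rough check of the correlated-subquery idea, the sketch below loads the question's sample figures into an in-memory SQLite database and picks, for each hours row, the latest rate dated on or before it. The column names rate_date and hours_date come from the question's own queries; the pre-aggregated hours values, the outer ORDER BY and the use of SQLite in place of MySQL are assumptions made for the example.

```python
import sqlite3

# (hours_person_id, hours_date, hours_job, hours_value), taken from the question.
HOURS = [
    (1, "2020-06-07", 101, 25),
    (1, "2020-06-07", 102, 10),
    (1, "2020-06-07", 103, 5),
    (2, "2020-06-07", 101, 30),
    (2, "2020-06-07", 104, 10),
]

# (rate_person_id, rate_date, rate_value), taken from the question.
RATES = [
    (1, "2020-01-01", 20.00),
    (1, "2020-05-01", 25.00),
    (1, "2020-07-01", 22.00),
    (2, "2020-01-01", 22.00),
    (2, "2020-05-01", 24.00),
]

# For every hours row, the subquery returns the newest rate that is not in the future.
RATE_SQL = """
SELECT h.hours_person_id, h.hours_date, h.hours_job, h.hours_value,
       (SELECT r.rate_value
        FROM rates r
        WHERE r.rate_person_id = h.hours_person_id
          AND r.rate_date <= h.hours_date
        ORDER BY r.rate_date DESC
        LIMIT 1) AS rate_value
FROM hours h
ORDER BY h.hours_person_id, h.hours_job
"""


def hours_with_rates(hours, rates):
    """Return each hours row with the rate that applied on its date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE hours (hours_person_id INTEGER, hours_date TEXT, "
        "hours_job INTEGER, hours_value INTEGER)"
    )
    conn.execute(
        "CREATE TABLE rates (rate_person_id INTEGER, rate_date TEXT, rate_value REAL)"
    )
    conn.executemany("INSERT INTO hours VALUES (?, ?, ?, ?)", hours)
    conn.executemany("INSERT INTO rates VALUES (?, ?, ?)", rates)
    rows = conn.execute(RATE_SQL).fetchall()
    conn.close()
    return rows


rated_rows = hours_with_rates(HOURS, RATES)
for row in rated_rows:
    print(row)
```

The 2020-07-01 rate for person 1 is skipped because it lies after the hours date. When a person has two rates on the same date, the ORDER BY needs an extra tie-breaking column, which the question does not specify.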

Getting the purchase number based on user ID in MySQL

I am trying to get a fourth column with the purchase number of that user. I have this data:
user date purchase_id
a 01-01-2018 1
b 02-01-2018 2
a 02-01-2018 3
a 03-01-2018 4
b 04-01-2018 5
a 04-01-2018 6
and would like to get something like this:
user date purchase_id purchase_order
a 01-01-2018 1 1
b 02-01-2018 2 1
a 02-01-2018 3 2
a 03-01-2018 4 3
b 04-01-2018 5 2
a 04-01-2018 6 4
The final use of this is to build a cohort analysis to check user retention.
Thanks
You seem to be looking for ROW_NUMBER() (available in MySQL 8.0). This window function can be used to rank records within groups sharing the same user.
SELECT
user,
date,
purchase_id,
ROW_NUMBER() OVER(PARTITION BY user ORDER BY purchase_id ) purchase_order
FROM mytable
NB: it is unclear what column you want to use for ordering. It could be purchase_id (as shown in the above query), or maybe date: you can change the query as per your requirement.
Demo on DB Fiddle:
| user | date | purchase_id | purchase_order |
| ---- | ---------- | ----------- | -------------- |
| a | 2018-01-01 | 1 | 1 |
| a | 2018-01-02 | 3 | 2 |
| a | 2018-01-03 | 4 | 3 |
| a | 2018-01-04 | 6 | 4 |
| b | 2018-01-02 | 2 | 1 |
| b | 2018-01-04 | 5 | 2 |
Exclusively for versions prior to 8.0...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(purchase_id SERIAL PRIMARY KEY
,user CHAR(1) NOT NULL
,date DATE NOT NULL
);
INSERT INTO my_table VALUES
(1,'a','2018-01-01'),
(2,'b','2018-01-02'),
(3,'a','2018-01-02'),
(4,'a','2018-01-03'),
(5,'b','2018-01-04'),
(6,'a','2018-01-04');
SELECT a.purchase_id
, a.user
, a.date
, a.i rank
FROM
( SELECT x.*
, CASE WHEN @prev = user THEN @i:=@i+1 ELSE @i:=1 END i
, @prev := user
FROM my_table x
, (SELECT @prev:=null,@i:=0) vars
ORDER
BY user
, date
) a
ORDER
BY purchase_id;
+-------------+------+------------+------+
| purchase_id | user | date | rank |
+-------------+------+------------+------+
| 1 | a | 2018-01-01 | 1 |
| 2 | b | 2018-01-02 | 1 |
| 3 | a | 2018-01-02 | 2 |
| 4 | a | 2018-01-03 | 3 |
| 5 | b | 2018-01-04 | 2 |
| 6 | a | 2018-01-04 | 4 |
+-------------+------+------------+------+
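The counter logic behind the session-variable query can be sketched in a few lines of Python: sort by user and date, restart the counter whenever the user changes, then restore purchase_id order. This only illustrates the technique on the sample rows; the function name number_purchases is made up for the example.

```python
# (purchase_id, user, date), the same rows as the INSERT statement above.
PURCHASES = [
    (1, "a", "2018-01-01"),
    (2, "b", "2018-01-02"),
    (3, "a", "2018-01-02"),
    (4, "a", "2018-01-03"),
    (5, "b", "2018-01-04"),
    (6, "a", "2018-01-04"),
]


def number_purchases(rows):
    """Return (purchase_id, user, date, rank) tuples ordered by purchase_id."""
    ranked = []
    prev_user = None  # plays the role of the "prev" session variable
    counter = 0       # plays the role of the "i" session variable
    for purchase_id, user, date in sorted(rows, key=lambda r: (r[1], r[2])):
        # Keep counting while the user stays the same, otherwise restart at 1.
        counter = counter + 1 if user == prev_user else 1
        prev_user = user
        ranked.append((purchase_id, user, date, counter))
    return sorted(ranked)  # back to purchase_id order, like the outer ORDER BY


numbered = number_purchases(PURCHASES)
for row in numbered:
    print(row)
```

The ranks it prints match the result table above: 1, 1, 2, 3, 2, 4 in purchase_id order.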

How to sum values of two tables and group by date

I am building a trading system where users need to know their running account balance by date for a specific user (uid) including how much they made from trading (results table) and how much they deposited or withdrew from their accounts (adjustments table).
Here is the sqlfiddle and tables: http://sqlfiddle.com/#!9/6bc9e4/1
Adjustments table:
+-------+-----+-----+--------+------------+
| adjid | aid | uid | amount | date |
+-------+-----+-----+--------+------------+
| 1 | 1 | 1 | 20 | 2019-08-18 |
| 2 | 1 | 1 | 50 | 2019-08-21 |
| 3 | 1 | 1 | 40 | 2019-08-21 |
| 4 | 1 | 1 | 10 | 2019-08-19 |
+-------+-----+-----+--------+------------+
Results table:
+-----+-----+-----+--------+-------+------------+
| tid | uid | aid | amount | taxes | date |
+-----+-----+-----+--------+-------+------------+
| 1 | 1 | 1 | 100 | 3 | 2019-08-19 |
| 2 | 1 | 1 | -50 | 1 | 2019-08-20 |
| 3 | 1 | 1 | 100 | 2 | 2019-08-21 |
| 4 | 1 | 1 | 100 | 2 | 2019-08-21 |
+-----+-----+-----+--------+-------+------------+
How do I get the below results for uid (1)
+--------------+------------+------------------+----------------+------------+
| ResultsTotal | TaxesTotal | AdjustmentsTotal | RunningBalance | Date |
+--------------+------------+------------------+----------------+------------+
| - | - | 20 | 20 | 2019-08-18 |
| 100 | 3 | 10 | 133 | 2019-08-19 |
| -50 | 1 | - | 84 | 2019-08-20 |
| 200 | 4 | 90 | 378 | 2019-08-21 |
+--------------+------------+------------------+----------------+------------+
Where RunningBalance is the current account balance for the particular user (uid).
Based on @Gabriel's answer, I came up with something like this, but it gives me an empty balance and duplicate records:
SELECT SUM(ResultsTotal), SUM(TaxesTotal), SUM(AdjustmentsTotal), @runningtotal:= @runningtotal+SUM(ResultsTotal)+SUM(TaxesTotal)+SUM(AdjustmentsTotal) as Balance, date
FROM (
SELECT 0 AS ResultsTotal, 0 AS TaxesTotal, adjustments.amount AS AdjustmentsTotal, adjustments.date
FROM adjustments LEFT JOIN results ON (results.uid=adjustments.uid) WHERE adjustments.uid='1'
UNION ALL
SELECT results.amount AS ResultsTotal, taxes AS TaxesTotal, 0 as AdjustmentsTotal, results.date
FROM results LEFT JOIN adjustments ON (results.uid=adjustments.uid) WHERE results.uid='1'
) unionTable
GROUP BY DATE ORDER BY date
For what you are asking you would want to union then group the results from both tables, this should give the results you want. However, I recommend calculating the running balance outside of MySQL since this adds some complexity to our query.
Weird things could start to happen, for example, if someone has already defined the @runningBalance variable within the query's scope.
SELECT aggregateTable.*, @runningBalance := IFNULL(@runningBalance, 0) + TOTAL
FROM (
SELECT SUM(ResultsTotal), SUM(TaxesTotal), SUM(AdjustmentsTotal)
, SUM(ResultsTotal) + SUM(TaxesTotal) + SUM(AdjustmentsTotal) as TOTAL
, date
FROM (
SELECT 0 AS ResultsTotal, 0 AS TaxesTotal, amount AS AdjustmentsTotal, date
FROM adjustments
UNION ALL
SELECT amount AS ResultsTotal, taxes AS TaxesTotal, 0 as AdjustmentsTotal, date
FROM results
) unionTable
GROUP BY date
) aggregateTable
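Since the answer suggests computing the running balance outside MySQL, here is a minimal Python sketch of that step on the question's sample rows. It mirrors the union, group and accumulate shape of the query and, like the query and the expected output, adds taxes to the balance. The function name running_balance and the tuple layouts are assumptions for the example.

```python
from collections import defaultdict
from itertools import accumulate

# (adjid, aid, uid, amount, date), from the adjustments table in the question.
ADJUSTMENTS = [
    (1, 1, 1, 20, "2019-08-18"),
    (2, 1, 1, 50, "2019-08-21"),
    (3, 1, 1, 40, "2019-08-21"),
    (4, 1, 1, 10, "2019-08-19"),
]

# (tid, uid, aid, amount, taxes, date), from the results table in the question.
RESULTS = [
    (1, 1, 1, 100, 3, "2019-08-19"),
    (2, 1, 1, -50, 1, "2019-08-20"),
    (3, 1, 1, 100, 2, "2019-08-21"),
    (4, 1, 1, 100, 2, "2019-08-21"),
]


def running_balance(adjustments, results, uid):
    """Return (results_total, taxes_total, adjustments_total, balance, date) per date."""
    totals = defaultdict(lambda: [0, 0, 0])  # date -> [results, taxes, adjustments]
    for _adjid, _aid, row_uid, amount, date in adjustments:
        if row_uid == uid:
            totals[date][2] += amount
    for _tid, row_uid, _aid, amount, taxes, date in results:
        if row_uid == uid:
            totals[date][0] += amount
            totals[date][1] += taxes
    dates = sorted(totals)  # ISO dates sort chronologically as strings
    balances = accumulate(sum(totals[d]) for d in dates)
    return [(*totals[d], balance, d) for d, balance in zip(dates, balances)]


balance_rows = running_balance(ADJUSTMENTS, RESULTS, uid=1)
for row in balance_rows:
    print(row)
```

For uid 1 the balances come out as 20, 133, 84 and 378, which matches the table the question asks for (with 0 where the question shows a dash).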

Query with dynamic date intervals

Given a statuses table that holds information about products availability, how do I select the date that corresponds to the 1st day in the latest 20 days that the product has been active?
Yes I know the question is hard to follow. I think another way to put it would be: I want to know how many times each product has been sold in the last 20 days that it was active, meaning the product could have been active for years, but I'd only want the sales count from the latest 20 days that it had a status of "active".
It's something easily doable in the server-side (i.e. getting any collection of products from the DB, iterating them, performing n+1 queries on the statuses table, etc), but I have hundreds of thousands of items so it's imperative to do it in SQL for performance reasons.
table : products
+-------+-----------+
| id | name |
+-------+-----------+
| 1 | Apple |
| 2 | Banana |
| 3 | Grape |
+-------+-----------+
table : statuses
+-------+-------------+---------------+---------------+
| id | name | product_id | created_at |
+-------+-------------+---------------+---------------+
| 1 | active | 1 | 2018-01-01 |
| 2 | inactive | 1 | 2018-02-01 |
| 3 | active | 1 | 2018-03-01 |
| 4 | inactive | 1 | 2018-03-15 |
| 6 | active | 1 | 2018-04-25 |
| 7 | active | 2 | 2018-03-01 |
| 8 | active | 3 | 2018-03-10 |
| 9 | inactive | 3 | 2018-03-15 |
+-------+-------------+---------------+---------------+
table : items (ordered products)
+-------+---------------+-------------+
| id | product_id | order_id |
+-------+---------------+-------------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 1 | 4 |
| 5 | 1 | 5 |
| 6 | 2 | 3 |
| 7 | 2 | 4 |
| 8 | 2 | 5 |
| 9 | 3 | 5 |
+-------+---------------+-------------+
table : orders
+-------+---------------+
| id | created_at |
+-------+---------------+
| 1 | 2018-01-02 |
| 2 | 2018-01-15 |
| 3 | 2018-03-02 |
| 4 | 2018-03-10 |
| 5 | 2018-03-13 |
+-------+---------------+
I want my final results to look like this:
+-------+-----------+----------------------+--------------------------------+
| id | name | recent_sales_count | date_to_start_counting_sales |
+-------+-----------+----------------------+--------------------------------+
| 1 | Apple | 3 | 2018-01-30 |
| 2 | Banana | 0 | 2018-04-09 |
| 3 | Grape | 1 | 2018-03-10 |
+-------+-----------+----------------------+--------------------------------+
So this is what I mean by latest 20 active days for e.g. Apple:
It was last activated at '2018-04-25'. That's 4 days ago.
Before that, it was inactive since '2018-03-15', so all these days until '2018-04-25' don't count.
Before that, it was active since '2018-03-01'. That's 14 more days until '2018-03-15'.
Before that, inactive since '2018-02-01'.
Finally, it was active since '2018-01-01', so it should only count the missing 2 days (4 + 14 + 2 = 20) backwards from '2018-02-01', resulting in date_to_start_counting_sales = '2018-01-30'.
With the '2018-01-30' date in hand, I'm then able to count Apple orders in the last 20 active days: 3.
Hope that makes sense.
Here is a fiddle with the data provided above.
I've got a standard SQL solution, that does not use any window function as you are on MySQL 5
My solution requires 3 stacked views.
It would have been better with a CTE but your version doesn't support it. Same goes for the stacked Views... I don't like to stack views and always try to avoid it, but sometimes you have no other choice, because MySQL doesn't accept subqueries in FROM clause for Views.
CREATE VIEW VIEW_product_dates AS
(
SELECT product_id, created_at AS active_date,
(
SELECT MIN(created_at)
FROM statuses ti
WHERE name = 'inactive' AND ta.created_at < ti.created_at AND ti.product_id=ta.product_id
GROUP BY product_id
) AS inactive_date
FROM statuses ta
WHERE name = 'active'
);
CREATE VIEW VIEW_product_dates_days AS
(
SELECT product_id, active_date, inactive_date, datediff(IFNULL(inactive_date, SYSDATE()),active_date) AS nb_days
FROM VIEW_product_dates
);
CREATE VIEW VIEW_product_dates_days_cumul AS
(
SELECT product_id, active_date, ifnull(inactive_date,sysdate()) AS inactive_date, nb_days,
IFNULL((SELECT SUM(V2.nb_days) + V1.nb_days
FROM VIEW_product_dates_days V2
WHERE V2.active_date >= IFNULL(V1.inactive_date, SYSDATE()) AND V1.product_id=V2.product_id
),V1.nb_days) AS cumul_days
FROM VIEW_product_dates_days V1
);
The final view produces this:
| product_id | active_date | inactive_date | nb_days | cumul_days |
|------------|----------------------|----------------------|---------|------------|
| 1 | 2018-01-01T00:00:00Z | 2018-02-01T00:00:00Z | 31 | 49 |
| 1 | 2018-03-01T00:00:00Z | 2018-03-15T00:00:00Z | 14 | 18 |
| 1 | 2018-04-25T00:00:00Z | 2018-04-29T11:28:39Z | 4 | 4 |
| 2 | 2018-03-01T00:00:00Z | 2018-04-29T11:28:39Z | 59 | 59 |
| 3 | 2018-03-10T00:00:00Z | 2018-03-15T00:00:00Z | 5 | 5 |
So it aggregates all active periods of all products, it counts the number of days for each period, and the cumulative days of all past active periods since current date.
Then we can query this final view to get the desired date for each product. I set a variable for your 20 days, so you can change that number easily if you want.
SET @cap_days = 20;
SELECT PD.id, PD.name,
SUM(CASE WHEN o.created_at > PD.date_to_start_counting_sales THEN 1 ELSE 0 END) AS recent_sales_count,
PD.date_to_start_counting_sales
FROM
(
SELECT P.*,
(CASE WHEN LowerCap.max_cumul_days IS NULL
THEN ADDDATE(ifnull(HigherCap.min_inactive_date,sysdate()),(-@cap_days))
ELSE
CASE WHEN LowerCap.max_cumul_days < @cap_days AND HigherCap.min_inactive_date IS NULL
THEN ADDDATE(ifnull(LowerCap.max_inactive_date,sysdate()),(-LowerCap.max_cumul_days))
ELSE ADDDATE(ifnull(HigherCap.min_inactive_date,sysdate()),(LowerCap.max_cumul_days-@cap_days))
END
END) as date_to_start_counting_sales
FROM products P
LEFT JOIN
(
SELECT product_id, MAX(cumul_days) AS max_cumul_days, MAX(inactive_date) AS max_inactive_date
FROM VIEW_product_dates_days_cumul
WHERE cumul_days <= @cap_days
GROUP BY product_id
) LowerCap ON P.id=LowerCap.product_id
LEFT JOIN
(
SELECT product_id, MIN(cumul_days) AS min_cumul_days, MIN(inactive_date) AS min_inactive_date
FROM VIEW_product_dates_days_cumul
WHERE cumul_days > @cap_days
GROUP BY product_id
) HigherCap ON P.id=HigherCap.product_id
) PD
LEFT JOIN items i ON PD.id = i.product_id
LEFT JOIN orders o ON o.id = i.order_id
GROUP BY PD.id, PD.name, PD.date_to_start_counting_sales
Returns
| id | name | recent_sales_count | date_to_start_counting_sales |
|----|--------|--------------------|------------------------------|
| 1 | Apple | 3 | 2018-01-30T00:00:00Z |
| 2 | Banana | 0 | 2018-04-09T20:43:23Z |
| 3 | Grape | 1 | 2018-03-10T00:00:00Z |
FIDDLE : http://sqlfiddle.com/#!9/804f52/24
Not sure which version of MySql you're working with, but if you can use 8.0, that version came out with a lot of functionality that makes things slightly more doable (CTE's, row_number(), partition, etc.).
My recommendation would be to create a view like in this DB-Fiddle Example, call the view on the server side and iterate programmatically. There are ways of doing it in SQL, but it'd be a bear to write and test, and would likely be less efficient.
Assumptions:
Products cannot be sold during inactive date ranges
Statuses table will always alternate status active/inactive/active for each product. I.e. no date ranges where a certain product is both active and inactive.
View Results:
+------------+-------------+------------+-------------+
| product_id | active_date | end_date | days_active |
+------------+-------------+------------+-------------+
| 1 | 2018-01-01 | 2018-02-01 | 31 |
+------------+-------------+------------+-------------+
| 1 | 2018-03-01 | 2018-03-15 | 14 |
+------------+-------------+------------+-------------+
| 1 | 2018-04-25 | 2018-04-29 | 4 |
+------------+-------------+------------+-------------+
| 2 | 2018-03-01 | 2018-04-29 | 59 |
+------------+-------------+------------+-------------+
| 3 | 2018-03-10 | 2018-03-15 | 5 |
+------------+-------------+------------+-------------+
View:
CREATE OR REPLACE VIEW days_active AS (
WITH active_rn
AS (SELECT *, Row_number()
OVER ( partition BY NAME, product_id
ORDER BY created_at) AS rownum
FROM statuses
WHERE name = 'active'),
inactive_rn
AS (SELECT *, Row_number()
OVER ( partition BY NAME, product_id
ORDER BY created_at) AS rownum
FROM statuses
WHERE name = 'inactive')
SELECT x1.product_id,
x1.created_at AS active_date,
CASE WHEN x2.created_at IS NULL
THEN Curdate()
ELSE x2.created_at
END AS end_date,
CASE WHEN x2.created_at IS NULL
THEN Datediff(Curdate(), x1.created_at)
ELSE Datediff(x2.created_at,x1.created_at)
END AS days_active
FROM active_rn x1
LEFT OUTER JOIN inactive_rn x2
ON x1.rownum = x2.rownum
AND x1.product_id = x2.product_id ORDER BY
x1.product_id);
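Iterating over the view rows on the server side could look roughly like the Python sketch below. It walks each product's active periods from newest to oldest, uses up the 20-day budget, and returns the date where the budget runs out (or the earliest activation if the product has fewer active days in total). The rows are the view results shown above, with 2018-04-29 as the current date; the function name start_dates and the cap_days parameter are assumptions for the example.

```python
from datetime import date, timedelta

# Rows shaped like the view output: (product_id, active_date, end_date, days_active).
VIEW_ROWS = [
    (1, date(2018, 1, 1), date(2018, 2, 1), 31),
    (1, date(2018, 3, 1), date(2018, 3, 15), 14),
    (1, date(2018, 4, 25), date(2018, 4, 29), 4),
    (2, date(2018, 3, 1), date(2018, 4, 29), 59),
    (3, date(2018, 3, 10), date(2018, 3, 15), 5),
]


def start_dates(view_rows, cap_days=20):
    """Map product_id to the first day of its latest cap_days active days."""
    periods = {}
    for product_id, active_date, end_date, days_active in view_rows:
        periods.setdefault(product_id, []).append((active_date, end_date, days_active))

    result = {}
    for product_id, spans in periods.items():
        remaining = cap_days
        start = None
        # Newest period first, so the budget is spent on the most recent days.
        for active_date, end_date, days_active in sorted(spans, reverse=True):
            if days_active >= remaining:
                # The budget runs out inside this period.
                start = end_date - timedelta(days=remaining)
                break
            remaining -= days_active
            start = active_date  # fallback if no older period exists
        result[product_id] = start
    return result


starts = start_dates(VIEW_ROWS)
for product_id, start in sorted(starts.items()):
    print(product_id, start.isoformat())
```

With the sample data this gives 2018-01-30 for product 1, 2018-04-09 for product 2 and 2018-03-10 for product 3, the same dates as in the question's expected result. Counting the orders placed after each date is then a simple filter on the orders table.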

Group records by two columns

I have a table named "invoices". I would like to sum the amount, grouped by company and date.
+------------+------------------+-------------+-----------+
| company_id | company | date | amount |
+------------+------------------+-------------+-----------+
| 1 | chevrolet | 2017-11-18 | 100 |
| 1 | chevrolet | 2017-11-18 | -70 |
| 1 | chevrolet | 2017-11-25 | 50 |
| 2 | mercedes | 2017-04-01 | 30 |
| 2 | mercedes | 2017-04-01 | -30 |
| 2 | mercedes | 2017-09-01 | 50 |
| 3 | toyota | 2017-05-12 | 60 |
+------------+------------------+-------------+-----------+
The desired result is:
+------------+------------------+-------------+-----------+
| company_id | model_name | date | amount |
+------------+------------------+-------------+-----------+
| 1 | chevrolet | 2017-11-18 | 30 |
| 1 | chevrolet | 2017-11-25 | 50 |
| 2 | mercedes | 2017-04-01 | 0 |
| 2 | mercedes | 2017-09-01 | 50 |
| 3 | toyota | 2017-05-12 | 60 |
+------------+------------------+-------------+-----------+
How can I do it?
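You already have the spec in English there; it just needs translating to SQL (the table's column is called company, so it is aliased to model_name to match your desired output):
select company_id, company as model_name, date, sum(amount) as amount
from invoices
group by company_id, company, date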
In MySQL you can (depending on how it's configured) get away without the GROUP BY line, in the sense that the query does not raise an error, and you might see SQL like this on your travels through the world of MySQL:
select company_id, company as model_name, date, sum(amount) as amount
from invoices
MySQL is not implicitly inserting the group by for you here, though: with ONLY_FULL_GROUP_BY disabled, the whole table is treated as one group, so you get a single row holding the overall sum and arbitrary values for the other columns; with it enabled, the query is rejected. Personally I'd always recommend putting the GROUP BY in explicitly, as sticking to standard SQL makes your SQL knowledge more portable. You might also find strong proponents of the "group by should always be implicit" argument which, I acknowledge, has its merits :)
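To see the explicit GROUP BY at work on the question's rows, here is a small sketch that runs the grouped query against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here, and the ORDER BY is added so the output order is predictable; both are assumptions for the example.

```python
import sqlite3

# (company_id, company, date, amount), the rows of the invoices table in the question.
INVOICES = [
    (1, "chevrolet", "2017-11-18", 100),
    (1, "chevrolet", "2017-11-18", -70),
    (1, "chevrolet", "2017-11-25", 50),
    (2, "mercedes", "2017-04-01", 30),
    (2, "mercedes", "2017-04-01", -30),
    (2, "mercedes", "2017-09-01", 50),
    (3, "toyota", "2017-05-12", 60),
]


def totals_by_company_and_date(rows):
    """Sum the amount per (company_id, company, date) group."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE invoices (company_id INTEGER, company TEXT, date TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO invoices VALUES (?, ?, ?, ?)", rows)
    grouped = conn.execute(
        "SELECT company_id, company, date, SUM(amount) AS amount "
        "FROM invoices "
        "GROUP BY company_id, company, date "
        "ORDER BY company_id, date"
    ).fetchall()
    conn.close()
    return grouped


grouped_invoices = totals_by_company_and_date(INVOICES)
for row in grouped_invoices:
    print(row)
```

The seven input rows collapse into five groups, with the two offsetting mercedes invoices on 2017-04-01 summing to 0, as in the desired result.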
Together
SELECT company, date, SUM(amount) AS amount
FROM <table-name>
GROUP BY company, date;
For grouping by company
SELECT company, SUM(amount) AS amount
FROM <table-name>
GROUP BY company;
For grouping by date
SELECT date, SUM(amount) AS amount
FROM <table-name>
GROUP BY date;
Use the following:
SELECT company_id, company AS model_name, date, SUM(amount) AS amount
FROM invoices GROUP BY company_id, company, date;
See here and here for more about GROUP BY clause and examples.