Group records by two columns - mysql

I have a table named "invoices". I would like to sum the amount, grouped by company and date.
+------------+-----------+------------+--------+
| company_id | company   | date       | amount |
+------------+-----------+------------+--------+
| 1          | chevrolet | 2017-11-18 | 100    |
| 1          | chevrolet | 2017-11-18 | -70    |
| 1          | chevrolet | 2017-11-25 | 50     |
| 2          | mercedes  | 2017-04-01 | 30     |
| 2          | mercedes  | 2017-04-01 | -30    |
| 2          | mercedes  | 2017-09-01 | 50     |
| 3          | toyota    | 2017-05-12 | 60     |
+------------+-----------+------------+--------+
The desired result is:
+------------+------------+------------+--------+
| company_id | model_name | date       | amount |
+------------+------------+------------+--------+
| 1          | chevrolet  | 2017-11-18 | 30     |
| 1          | chevrolet  | 2017-11-25 | 50     |
| 2          | mercedes   | 2017-04-01 | 0      |
| 2          | mercedes   | 2017-09-01 | 50     |
| 3          | toyota     | 2017-05-12 | 60     |
+------------+------------+------------+--------+
How can I do it?

You already have the spec in English there; it just needs translating to SQL:
select company_id, model_name, date, sum(amount) as amount
from invoices
group by company_id, model_name, date
In MySQL you can (depending on whether the ONLY_FULL_GROUP_BY SQL mode is enabled) get away without the GROUP BY line, and you might see SQL like this on your travels through the world of MySQL:
select company_id, model_name, date, sum(amount) as amount
from invoices
Be aware that MySQL does not insert the GROUP BY for you here. With ONLY_FULL_GROUP_BY disabled, the query runs but treats the whole table as a single group: you get one row holding the grand total, with company_id, model_name and date taken from an arbitrary row. With ONLY_FULL_GROUP_BY enabled (the default since MySQL 5.7.5), the query is rejected with an error. So always write the GROUP BY explicitly; it is also standard SQL, which makes your SQL knowledge more portable.
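A minimal sketch of the grouped query run against the sample rows from the question. The column types are assumptions, and the name column is called company here, as in the question's input table (the desired output labels it model_name).

```sql
-- Assumed schema; the question does not give column types.
CREATE TABLE invoices (
  company_id INT         NOT NULL,
  company    VARCHAR(50) NOT NULL,
  `date`     DATE        NOT NULL,
  amount     INT         NOT NULL
);

INSERT INTO invoices (company_id, company, `date`, amount) VALUES
  (1, 'chevrolet', '2017-11-18', 100),
  (1, 'chevrolet', '2017-11-18', -70),
  (1, 'chevrolet', '2017-11-25',  50),
  (2, 'mercedes',  '2017-04-01',  30),
  (2, 'mercedes',  '2017-04-01', -30),
  (2, 'mercedes',  '2017-09-01',  50),
  (3, 'toyota',    '2017-05-12',  60);

-- One output row per distinct (company_id, company, date) combination.
SELECT company_id, company, `date`, SUM(amount) AS amount
FROM invoices
GROUP BY company_id, company, `date`
ORDER BY company_id, `date`;
-- Amounts, in order: 30, 50, 0, 50, 60
```

Listing every non-aggregated column in the GROUP BY keeps the query valid under ONLY_FULL_GROUP_BY and in other database systems.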

Together
SELECT
company, date, SUM(amount)
FROM <table-name>
GROUP BY company, date;
For grouping by company
SELECT
company, SUM(amount)
FROM <table-name>
GROUP BY company;
For grouping by date
SELECT
date, SUM(amount)
FROM <table-name>
GROUP BY date;

Use the following:
SELECT company_id, model_name, date, SUM(amount) AS amount
FROM invoices GROUP BY company_id, model_name, date;
See here and here for more about GROUP BY clause and examples.

Related

average, count, group by in select query

I am developing a booking engine web app.
Once a user has made a booking, it goes into this table.
id | Promo_code | total | arrival_date | departure_date | booked_date
1 | ABC1 | 1000 | 2019-02-06 | 2019-02-10 | 2019-02-02
2 | ABC1 | 2500 | 2019-02-07 | 2019-02-11 | 2019-02-03
3 | ABC1 | 3000 | 2019-02-12 | 2019-02-15 | 2019-02-03
4 | ABC2 | 5000 | 2019-02-07 | 2019-02-11 | 2019-02-02
5 | null | 3000 | 2019-02-12 | 2019-02-15 | 2019-02-01
Here promo_code is what its name implies. If the user doesn't book with a promo code, it is null (5th record).
I hope the other fields (total, arrival_date, departure_date and booked_date) are clear.
I want to generate a report like this:
promo_code | number_of_bookings | revenue | Average_length_of_stay | Average_depart_date | Average_reservation_revenue
ABC1 | 3 | 6500 | 3 | 5 | 2166
ABC2 | 1 | 5000 | 4 | 5 | 5000
This report is called revenue by promo code report.
To explain how the report is calculated:
Average_length_of_stay = SUM(departure_date - arrival_date) / number_of_bookings
Average_depart_date = SUM(departure_date - booked_date) / number_of_bookings
Of course I could generate this report in the backend logic somehow, but that would be very painful. There must be a way to query this in SQL directly.
What I have done up to now is:
SELECT promo_code ,count(*) as number_of_bookings,
sum(total) as revenue
FROM booking_widget.User_packages group by promo_code;
I am stuck on Average_length_of_stay, Average_depart_date and Average_reservation_revenue.
How do I get the average values with the GROUP BY clause?

It is trivial:
SELECT promo_code
, COUNT(*) AS number_of_bookings
, SUM(total) AS revenue
, AVG(DATEDIFF(departure_date, arrival_date)) AS average_length_of_stay
, AVG(DATEDIFF(departure_date, booked_date)) AS average_depart_date
, AVG(total) AS average_reservation_revenue
FROM t
GROUP BY promo_code
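Note that the query above also returns a row for bookings made without a promo code (promo_code IS NULL), which the requested report does not show. A minimal sketch that filters those out and rounds the averages, run against the question's sample rows. The table definition and the two-decimal rounding are assumptions, not part of the question.

```sql
-- Assumed table definition; the question only shows sample rows.
CREATE TABLE User_packages (
  id             INT PRIMARY KEY,
  promo_code     VARCHAR(20) NULL,
  total          INT  NOT NULL,
  arrival_date   DATE NOT NULL,
  departure_date DATE NOT NULL,
  booked_date    DATE NOT NULL
);

INSERT INTO User_packages VALUES
  (1, 'ABC1', 1000, '2019-02-06', '2019-02-10', '2019-02-02'),
  (2, 'ABC1', 2500, '2019-02-07', '2019-02-11', '2019-02-03'),
  (3, 'ABC1', 3000, '2019-02-12', '2019-02-15', '2019-02-03'),
  (4, 'ABC2', 5000, '2019-02-07', '2019-02-11', '2019-02-02'),
  (5, NULL,   3000, '2019-02-12', '2019-02-15', '2019-02-01');

SELECT promo_code,
       COUNT(*)   AS number_of_bookings,
       SUM(total) AS revenue,
       ROUND(AVG(DATEDIFF(departure_date, arrival_date)), 2) AS average_length_of_stay,
       ROUND(AVG(DATEDIFF(departure_date, booked_date)), 2)  AS average_depart_date,
       ROUND(AVG(total), 2)                                  AS average_reservation_revenue
FROM User_packages
WHERE promo_code IS NOT NULL   -- drop bookings made without a promo code
GROUP BY promo_code
ORDER BY promo_code;
-- ABC1: 3 bookings, revenue 6500; ABC2: 1 booking, revenue 5000
```

AVG() already divides by the number of rows in each group, so there is no need to divide by COUNT(*) manually.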

Selecting the most recent result from one table joining to another

I have two tables.
One table contains customer data, like name and email address. The other table contains a log of the status changes.
The status log table looks like this:
+-------------+------------+------------+
| customer_id | status | date |
+-------------+------------+------------+
| 1 | Bought | 2018-07-01 |
| 1 | Bought | 2018-07-02 |
| 2 | Ongoing | 2018-07-03 |
| 3 | Ongoing | 2018-07-04 |
| 1 | Not Bought | 2018-07-05 |
| 4 | Bought | 2018-07-06 |
| 4 | Not Bought | 2018-07-07 |
| 4 | Bought | 2018-07-08 | *
| 3 | Cancelled | 2018-07-09 |
+-------------+------------+------------+
And the customer data:
+----+--------+-----------+
| id | name   | email     |
+----+--------+-----------+
| 1  | Alex   | alex@home |
| 2  | John   | john@home |
| 3  | Simon  | si@home   |
| 4  | Philip | phil@home |
+----+--------+-----------+
I would like to select the customers who have "Bought" in July (07), but exclude customers whose most recent status is anything other than "Bought".
The result should be just one customer (Philip); all the others have most recently had their status changed to something other than Bought.
I have the following SQL:
SELECT
a.customer_id
FROM
statuslog a
WHERE
DATE(a.`date`) LIKE '2018-07-%'
AND a.status = 'Bought'
ORDER BY a.date DESC
LIMIT 1
But that is as far as I have got! The above query only returns one result, but essentially there could be more than one.
Any help is appreciated!

Here is an approach that uses a correlated subquery to get the most recent status record:
SELECT sl.customer_id
FROM statuslog sl
WHERE sl.date = (SELECT MAX(sl2.date)
FROM statuslog sl2
WHERE sl2.customer_id = sl.customer_id AND
sl2.date >= '2018-07-01' AND
sl2.date < '2018-08-01'
) AND
sl.status = 'Bought'
ORDER BY sl.date DESC;
Notes:
Use meaningful table aliases! That is, abbreviations for the table names, rather than arbitrary letters such as a and b.
Use proper date arithmetic. LIKE is for strings. MySQL has lots of date functions that work.
In MySQL 8+, you would use ROW_NUMBER().
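The ROW_NUMBER() note can be sketched as follows for MySQL 8.0 or later. The table name customers, the column types, and the email addresses written with @ are assumptions; the question only describes the customer table without naming it.

```sql
-- Assumed definitions, filled with the question's sample rows.
CREATE TABLE customers (id INT PRIMARY KEY, name VARCHAR(50), email VARCHAR(100));
CREATE TABLE statuslog (customer_id INT, status VARCHAR(20), `date` DATE);

INSERT INTO customers VALUES
  (1, 'Alex', 'alex@home'), (2, 'John', 'john@home'),
  (3, 'Simon', 'si@home'),  (4, 'Philip', 'phil@home');
INSERT INTO statuslog VALUES
  (1, 'Bought',     '2018-07-01'), (1, 'Bought',     '2018-07-02'),
  (2, 'Ongoing',    '2018-07-03'), (3, 'Ongoing',    '2018-07-04'),
  (1, 'Not Bought', '2018-07-05'), (4, 'Bought',     '2018-07-06'),
  (4, 'Not Bought', '2018-07-07'), (4, 'Bought',     '2018-07-08'),
  (3, 'Cancelled',  '2018-07-09');

-- Number each customer's July rows from newest to oldest,
-- then keep customers whose newest row is 'Bought'.
SELECT c.id, c.name, c.email
FROM (
  SELECT sl.customer_id,
         sl.status,
         ROW_NUMBER() OVER (PARTITION BY sl.customer_id
                            ORDER BY sl.`date` DESC) AS rn
  FROM statuslog sl
  WHERE sl.`date` >= '2018-07-01'
    AND sl.`date` <  '2018-08-01'
) latest
JOIN customers c ON c.id = latest.customer_id
WHERE latest.rn = 1
  AND latest.status = 'Bought';
-- Returns one row: 4, Philip, phil@home
```

There is no LIMIT here, so every customer whose latest July status is Bought is returned, not just one.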

Query with dynamic date intervals

Given a statuses table that holds information about products availability, how do I select the date that corresponds to the 1st day in the latest 20 days that the product has been active?
Yes I know the question is hard to follow. I think another way to put it would be: I want to know how many times each product has been sold in the last 20 days that it was active, meaning the product could have been active for years, but I'd only want the sales count from the latest 20 days that it had a status of "active".
It's something easily doable in the server-side (i.e. getting any collection of products from the DB, iterating them, performing n+1 queries on the statuses table, etc), but I have hundreds of thousands of items so it's imperative to do it in SQL for performance reasons.
table : products
+-------+-----------+
| id | name |
+-------+-----------+
| 1 | Apple |
| 2 | Banana |
| 3 | Grape |
+-------+-----------+
table : statuses
+-------+-------------+---------------+---------------+
| id | name | product_id | created_at |
+-------+-------------+---------------+---------------+
| 1 | active | 1 | 2018-01-01 |
| 2 | inactive | 1 | 2018-02-01 |
| 3 | active | 1 | 2018-03-01 |
| 4 | inactive | 1 | 2018-03-15 |
| 6 | active | 1 | 2018-04-25 |
| 7 | active | 2 | 2018-03-01 |
| 8 | active | 3 | 2018-03-10 |
| 9 | inactive | 3 | 2018-03-15 |
+-------+-------------+---------------+---------------+
table : items (ordered products)
+-------+---------------+-------------+
| id | product_id | order_id |
+-------+---------------+-------------+
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 1 | 3 |
| 4 | 1 | 4 |
| 5 | 1 | 5 |
| 6 | 2 | 3 |
| 7 | 2 | 4 |
| 8 | 2 | 5 |
| 9 | 3 | 5 |
+-------+---------------+-------------+
table : orders
+-------+---------------+
| id | created_at |
+-------+---------------+
| 1 | 2018-01-02 |
| 2 | 2018-01-15 |
| 3 | 2018-03-02 |
| 4 | 2018-03-10 |
| 5 | 2018-03-13 |
+-------+---------------+
I want my final results to look like this:
+-------+-----------+----------------------+--------------------------------+
| id | name | recent_sales_count | date_to_start_counting_sales |
+-------+-----------+----------------------+--------------------------------+
| 1 | Apple | 3 | 2018-01-30 |
| 2 | Banana | 0 | 2018-04-09 |
| 3 | Grape | 1 | 2018-03-10 |
+-------+-----------+----------------------+--------------------------------+
So this is what I mean by latest 20 active days for e.g. Apple:
It was last activated at '2018-04-25'. That's 4 days ago.
Before that, it was inactive since '2018-03-15', so all these days until '2018-04-25' don't count.
Before that, it was active since '2018-03-01'. That's 14 more days until '2018-03-15'.
Before that, inactive since '2018-02-01'.
Finally, it was active since '2018-01-01', so it should only count the missing 2 days (4 + 14 + 2 = 20) backwards from '2018-02-01', resulting in date_to_start_counting_sales = '2018-01-30'.
With the '2018-01-30' date in hand, I'm then able to count Apple orders in the last 20 active days: 3.
Hope that makes sense.
Here is a fiddle with the data provided above.

I've got a standard SQL solution that does not use any window functions, as you are on MySQL 5.
My solution requires 3 stacked views.
It would have been better with a CTE, but your version doesn't support it. I don't like stacking views and always try to avoid it, but sometimes there is no other choice, because older MySQL versions don't accept subqueries in the FROM clause of a view.
CREATE VIEW VIEW_product_dates AS
(
SELECT product_id, created_at AS active_date,
(
SELECT MIN(ti.created_at)
FROM statuses ti
WHERE ti.name = 'inactive' AND ta.created_at < ti.created_at AND ti.product_id = ta.product_id
) AS inactive_date
FROM statuses ta
WHERE name = 'active'
);
CREATE VIEW VIEW_product_dates_days AS
(
SELECT product_id, active_date, inactive_date, datediff(IFNULL(inactive_date, SYSDATE()),active_date) AS nb_days
FROM VIEW_product_dates
);
CREATE VIEW VIEW_product_dates_days_cumul AS
(
SELECT product_id, active_date, ifnull(inactive_date,sysdate()) AS inactive_date, nb_days,
IFNULL((SELECT SUM(V2.nb_days) + V1.nb_days
FROM VIEW_product_dates_days V2
WHERE V2.active_date >= IFNULL(V1.inactive_date, SYSDATE()) AND V1.product_id=V2.product_id
),V1.nb_days) AS cumul_days
FROM VIEW_product_dates_days V1
);
The final view produces this:
| product_id | active_date | inactive_date | nb_days | cumul_days |
|------------|----------------------|----------------------|---------|------------|
| 1 | 2018-01-01T00:00:00Z | 2018-02-01T00:00:00Z | 31 | 49 |
| 1 | 2018-03-01T00:00:00Z | 2018-03-15T00:00:00Z | 14 | 18 |
| 1 | 2018-04-25T00:00:00Z | 2018-04-29T11:28:39Z | 4 | 4 |
| 2 | 2018-03-01T00:00:00Z | 2018-04-29T11:28:39Z | 59 | 59 |
| 3 | 2018-03-10T00:00:00Z | 2018-03-15T00:00:00Z | 5 | 5 |
So it lists all active periods of all products, with the number of days in each period and the cumulative number of active days counted backwards from the current date to the start of that period.
Then we can query this final view to get the desired date for each product. I set a variable for your 20 days, so you can change that number easily if you want.
SET @cap_days = 20;
SELECT PD.id, PD.name,
SUM(CASE WHEN o.created_at > PD.date_to_start_counting_sales THEN 1 ELSE 0 END) AS recent_sales_count,
PD.date_to_start_counting_sales
FROM
(
SELECT P.*,
(CASE WHEN LowerCap.max_cumul_days IS NULL
THEN ADDDATE(IFNULL(HigherCap.min_inactive_date, SYSDATE()), (-@cap_days))
ELSE
CASE WHEN LowerCap.max_cumul_days < @cap_days AND HigherCap.min_inactive_date IS NULL
THEN ADDDATE(IFNULL(LowerCap.max_inactive_date, SYSDATE()), (-LowerCap.max_cumul_days))
ELSE ADDDATE(IFNULL(HigherCap.min_inactive_date, SYSDATE()), (LowerCap.max_cumul_days - @cap_days))
END
END) AS date_to_start_counting_sales
FROM products P
LEFT JOIN
(
SELECT product_id, MAX(cumul_days) AS max_cumul_days, MAX(inactive_date) AS max_inactive_date
FROM VIEW_product_dates_days_cumul
WHERE cumul_days <= @cap_days
GROUP BY product_id
) LowerCap ON P.id = LowerCap.product_id
LEFT JOIN
(
SELECT product_id, MIN(cumul_days) AS min_cumul_days, MIN(inactive_date) AS min_inactive_date
FROM VIEW_product_dates_days_cumul
WHERE cumul_days > @cap_days
GROUP BY product_id
) HigherCap ON P.id = HigherCap.product_id
) PD
LEFT JOIN items i ON PD.id = i.product_id
LEFT JOIN orders o ON o.id = i.order_id
GROUP BY PD.id, PD.name, PD.date_to_start_counting_sales
Returns
| id | name | recent_sales_count | date_to_start_counting_sales |
|----|--------|--------------------|------------------------------|
| 1 | Apple | 3 | 2018-01-30T00:00:00Z |
| 2 | Banana | 0 | 2018-04-09T20:43:23Z |
| 3 | Grape | 1 | 2018-03-10T00:00:00Z |
FIDDLE : http://sqlfiddle.com/#!9/804f52/24

Not sure which version of MySQL you're working with, but if you can use 8.0, that version came out with a lot of functionality that makes things slightly more doable (CTEs, ROW_NUMBER(), PARTITION BY, etc.).
My recommendation would be to create a view like in this DB-Fiddle Example, call the view on the server side and iterate programmatically. There are ways of doing it in SQL, but it'd be a bear to write and test, and it would likely be less efficient.
Assumptions:
Products cannot be sold during inactive date ranges
Statuses table will always alternate status active/inactive/active for each product. I.e. no date ranges where a certain product is both active and inactive.
View Results:
+------------+-------------+------------+-------------+
| product_id | active_date | end_date | days_active |
+------------+-------------+------------+-------------+
| 1 | 2018-01-01 | 2018-02-01 | 31 |
+------------+-------------+------------+-------------+
| 1 | 2018-03-01 | 2018-03-15 | 14 |
+------------+-------------+------------+-------------+
| 1 | 2018-04-25 | 2018-04-29 | 4 |
+------------+-------------+------------+-------------+
| 2 | 2018-03-01 | 2018-04-29 | 59 |
+------------+-------------+------------+-------------+
| 3 | 2018-03-10 | 2018-03-15 | 5 |
+------------+-------------+------------+-------------+
View:
CREATE OR REPLACE VIEW days_active AS (
WITH active_rn
AS (SELECT *, Row_number()
OVER ( partition BY NAME, product_id
ORDER BY created_at) AS rownum
FROM statuses
WHERE name = 'active'),
inactive_rn
AS (SELECT *, Row_number()
OVER ( partition BY NAME, product_id
ORDER BY created_at) AS rownum
FROM statuses
WHERE name = 'inactive')
SELECT x1.product_id,
x1.created_at AS active_date,
CASE WHEN x2.created_at IS NULL
THEN Curdate()
ELSE x2.created_at
END AS end_date,
CASE WHEN x2.created_at IS NULL
THEN Datediff(Curdate(), x1.created_at)
ELSE Datediff(x2.created_at,x1.created_at)
END AS days_active
FROM active_rn x1
LEFT OUTER JOIN inactive_rn x2
ON x1.rownum = x2.rownum
AND x1.product_id = x2.product_id ORDER BY
x1.product_id);
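For MySQL 8.0 or later, the whole calculation can also be sketched in a single statement with window functions. This is an illustration under the same two assumptions listed above, not a tuned production query. The table definitions are assumed from the question's sample data, and @today is fixed to 2018-04-29 (the date implied by "4 days ago" in the question) so that the question's expected result can be reproduced; in real use it would be CURDATE().

```sql
CREATE TABLE products (id INT PRIMARY KEY, name VARCHAR(50));
CREATE TABLE statuses (id INT PRIMARY KEY, name VARCHAR(20), product_id INT, created_at DATE);
CREATE TABLE orders   (id INT PRIMARY KEY, created_at DATE);
CREATE TABLE items    (id INT PRIMARY KEY, product_id INT, order_id INT);

INSERT INTO products VALUES (1, 'Apple'), (2, 'Banana'), (3, 'Grape');
INSERT INTO statuses VALUES
  (1, 'active', 1, '2018-01-01'), (2, 'inactive', 1, '2018-02-01'),
  (3, 'active', 1, '2018-03-01'), (4, 'inactive', 1, '2018-03-15'),
  (6, 'active', 1, '2018-04-25'), (7, 'active',   2, '2018-03-01'),
  (8, 'active', 3, '2018-03-10'), (9, 'inactive', 3, '2018-03-15');
INSERT INTO orders VALUES
  (1, '2018-01-02'), (2, '2018-01-15'), (3, '2018-03-02'),
  (4, '2018-03-10'), (5, '2018-03-13');
INSERT INTO items VALUES
  (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4), (5, 1, 5),
  (6, 2, 3), (7, 2, 4), (8, 2, 5), (9, 3, 5);

SET @cap_days = 20;
SET @today = '2018-04-29';  -- assumption for reproducibility; use CURDATE() in real use

WITH periods AS (
  -- Pair each status row with the date of the next status change for the same product.
  SELECT product_id, name, created_at AS active_date,
         COALESCE(LEAD(created_at) OVER (PARTITION BY product_id ORDER BY created_at),
                  CAST(@today AS DATE)) AS end_date
  FROM statuses
),
running AS (
  -- Keep active periods and accumulate their lengths from the most recent backwards.
  SELECT product_id, active_date, end_date,
         DATEDIFF(end_date, active_date) AS days_active,
         SUM(DATEDIFF(end_date, active_date))
           OVER (PARTITION BY product_id ORDER BY active_date DESC) AS cumul_days
  FROM periods
  WHERE name = 'active'
),
start_dates AS (
  SELECT product_id,
         COALESCE(
           -- Period in which the cap is reached: step back only the days still missing.
           MAX(CASE WHEN cumul_days >= @cap_days AND cumul_days - days_active < @cap_days
                    THEN DATE_SUB(end_date, INTERVAL @cap_days - (cumul_days - days_active) DAY)
               END),
           -- Fewer active days than the cap: start at the first activation.
           MIN(active_date)) AS start_date
  FROM running
  GROUP BY product_id
)
SELECT p.id, p.name,
       COUNT(o.id)  AS recent_sales_count,
       s.start_date AS date_to_start_counting_sales
FROM products p
LEFT JOIN start_dates s ON s.product_id = p.id
LEFT JOIN items i ON i.product_id = p.id
LEFT JOIN orders o ON o.id = i.order_id AND o.created_at >= s.start_date
GROUP BY p.id, p.name, s.start_date
ORDER BY p.id;
-- With @today = '2018-04-29':
-- 1 Apple 3 2018-01-30 / 2 Banana 0 2018-04-09 / 3 Grape 1 2018-03-10
```

The running total is ordered from the newest period to the oldest, so the first period whose total reaches the cap is the one that contains the start date.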

Get MAX inputted date within sub query

I have a script which works, but not as desired. My aim is to select the most recently inputted record in the plans table for each seller in the account_manager_sellers list.
The current issue with the script below is that it returns the oldest record rather than the newest. For example, it selects a record from 2016 rather than one with a timestamp in 2018. (Eventually I need to change the WHERE clause to get all lastsale records before 2017-01-01.)
Simple database samples:
plans AKA (sales list)
+----+------------------+-----------+
| id | plan_written | seller_id |
+----+------------------+-----------+
| 1 | 20/09/2016 09:12 | 123 |
| 2 | 22/12/2016 09:45 | 444 |
| 3 | 19/10/2016 09:07 | 555 |
| 4 | 02/10/2015 14:26 | 123 |
| 5 | 15/08/2016 11:06 | 444 |
| 6 | 16/08/2016 11:03 | 123 |
| 7 | 03/10/2016 10:15 | 555 |
| 8 | 28/09/2016 10:12 | 123 |
| 9 | 27/09/2016 15:12 | 444 |
+----+------------------+-----------+
account_manager_sellers (seller list)
+-----+----------+
| id | name |
+-----+----------+
| 123 | person 1 |
| 444 | person 2 |
| 555 | person 3 |
+-----+----------+
Current Code Used
SELECT p.plan_written, p.seller_id
FROM plans AS p NATURAL JOIN (
SELECT id, MAX(plan_written) AS lastsale
FROM plans
GROUP BY seller_id
) AS t
JOIN account_manager_sellers AS a ON a.id = p.seller_id
WHERE lastsale < "2018-05-08 00:00:00"
Summary
Using the code and example tables above, the query returns the 3 rows below. We do expect 3 results, but MAX(plan_written) does not seem to have been applied. My guess is that it has something to do with the GROUP BY clause; I am not sure whether an ORDER BY and LIMIT clause could be used instead.
+--------------+------------------+
| seller_id | plan_written |
+--------------+------------------+
| 123 | 16/08/2016 11:03 |
| 444 | 15/08/2016 11:06 |
| 555 | 03/10/2016 10:15 |
+--------------+------------------+

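The join condition in your query is off: NATURAL JOIN matches on the only column the two sides share, id, and the id returned by the grouped subquery is not guaranteed to belong to the row with the latest plan_written. Instead, join on seller_id and restrict to the max date for each seller. Also, you don't need to join to the account_manager_sellers table to get your expected output: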
SELECT p1.*
FROM plans p1
INNER JOIN
(
SELECT
seller_id, MAX(plan_written) AS max_plan_written
FROM plans
WHERE plan_written < '2018-05-08 00:00:00'
GROUP BY seller_id
) p2
ON p1.seller_id = p2.seller_id AND
p1.plan_written = p2.max_plan_written;
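One caveat, based on an assumption about the schema: the sample shows plan_written as dd/mm/yyyy hh:mm. If the column is stored as text in that form rather than as DATETIME, MAX() and < compare the strings character by character, not chronologically. A minimal sketch that converts the text with STR_TO_DATE() first (the VARCHAR column type is hypothetical; with a real DATETIME column the query above needs no conversion):

```sql
-- Hypothetical definition: plan_written held as text, as displayed in the question.
CREATE TABLE plans (id INT PRIMARY KEY, plan_written VARCHAR(16), seller_id INT);

INSERT INTO plans VALUES
  (1, '20/09/2016 09:12', 123), (2, '22/12/2016 09:45', 444),
  (3, '19/10/2016 09:07', 555), (4, '02/10/2015 14:26', 123),
  (5, '15/08/2016 11:06', 444), (6, '16/08/2016 11:03', 123),
  (7, '03/10/2016 10:15', 555), (8, '28/09/2016 10:12', 123),
  (9, '27/09/2016 15:12', 444);

SELECT p1.id, p1.seller_id, p1.plan_written
FROM plans p1
INNER JOIN (
  -- Latest real timestamp per seller, after parsing the text.
  SELECT seller_id,
         MAX(STR_TO_DATE(plan_written, '%d/%m/%Y %H:%i')) AS max_plan_written
  FROM plans
  WHERE STR_TO_DATE(plan_written, '%d/%m/%Y %H:%i') < '2018-05-08 00:00:00'
  GROUP BY seller_id
) p2
  ON p1.seller_id = p2.seller_id
 AND STR_TO_DATE(p1.plan_written, '%d/%m/%Y %H:%i') = p2.max_plan_written
ORDER BY p1.seller_id;
-- Returns plan ids 8 (seller 123), 2 (seller 444) and 3 (seller 555)
```

Storing the value in a DATETIME column avoids the repeated conversion and lets MySQL use an index on it.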

Calculate sum for group records in MySQL

I have this table of orders
| ORDER_ID | PRODUCT | CUSTOMER | QTY | DATE
---------------------------------------------
| 1 | shoes | Nick | 1 | 01/01/2016
| 2 | shirts | Nick | 5 | 02/02/2016
| 3 | shoes | Paul | 10 | 03/03/2016
| 4 | shirts | Paul | 20 | 04/04/2016
So, how can I achieve this report result with ONE SELECT statement?
| Date_of_Order | Customer | Quantity | PRODUCT_TOTAL_SALES |
-----------------------------------------------------------------
| 01/01/2016 | Nick | 1 | shoes : 11 |
| 02/02/2016 | Nick | 10 | shirts : 25 |
| 03/03/2016 | Paul | 5 | shoes : 11 |
| 04/04/2016 | Paul | 20 | shirts : 25 |
I know how to use concat(column1, ' ', column2) to create a combined column, but I haven't succeeded in adding a sum for a grouped item there. When I try with a left join I get the sum for a product, but it's always the whole sum and it's not related to the dates of the orders, so when I filter the results of my query for a certain period I still get 11 for shoes and 25 for shirts.

You can group by multiple columns and get the sum for the smallest group.
If you want the daily sales, then instead of GROUP BY product use GROUP BY product, date
SELECT
o.`date` AS Date_of_Order,
SUM(o.qty) as Total_Quantity,
CONCAT(o.product, ':', SUM(o.qty))
FROM
orders o
GROUP BY product, `date`
ORDER BY `date`

Simple additional SELECT from same table can do that for entire period:
SELECT
o.`date` AS Date_of_Order,
o.Customer,
o.qty as Quantity,
(SELECT
CONCAT(oo.product, ':', SUM(oo.qty))
FROM
orders oo
WHERE
oo.product = o.product
) PRODUCT_TOTAL_SALES
FROM
orders o
Output:
+---------------+----------+----------+---------------------+
| Date_of_Order | Customer | Quantity | PRODUCT_TOTAL_SALES |
+---------------+----------+----------+---------------------+
| 01/01/2016 | Nick | 1 | shoes:11 |
| 02/02/2016 | Nick | 5 | shirts:25 |
| 03/03/2016 | Paul | 10 | shoes:11 |
| 04/04/2016 | Paul | 20 | shirts:25 |
+---------------+----------+----------+---------------------+
4 rows in set
If you want to filter by certain period, you must include it in both:
SELECT
o.`date` AS Date_of_Order,
o.Customer,
o.qty as Quantity,
(SELECT
CONCAT(oo.product, ':', sum(oo.qty))
FROM
orders oo
WHERE
oo.product = o.product
AND STR_TO_DATE(oo.`date`,'%d/%m/%Y') BETWEEN '2016-01-01' AND '2016-03-03'
) PRODUCT_TOTAL_SALES
FROM
orders o
WHERE
STR_TO_DATE(o.`date`,'%d/%m/%Y') BETWEEN '2016-01-01' AND '2016-03-03'
Output:
+---------------+----------+----------+---------------------+
| Date_of_Order | customer | Quantity | PRODUCT_TOTAL_SALES |
+---------------+----------+----------+---------------------+
| 01/01/2016 | Nick | 1 | shoes:11 |
| 02/02/2016 | Nick | 5 | shirts:5 |
| 03/03/2016 | Paul | 10 | shoes:11 |
+---------------+----------+----------+---------------------+
3 rows in set
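On MySQL 8.0 or later, a window function is another way to get the per-product total next to each order row. A minimal sketch, assuming (as the STR_TO_DATE calls above do) that the date column holds text in dd/mm/yyyy form; the other column types are also assumptions.

```sql
-- Assumed definition, filled with the question's sample rows.
CREATE TABLE orders (
  order_id INT PRIMARY KEY,
  product  VARCHAR(50),
  customer VARCHAR(50),
  qty      INT,
  `date`   VARCHAR(10)
);

INSERT INTO orders VALUES
  (1, 'shoes',  'Nick',  1, '01/01/2016'),
  (2, 'shirts', 'Nick',  5, '02/02/2016'),
  (3, 'shoes',  'Paul', 10, '03/03/2016'),
  (4, 'shirts', 'Paul', 20, '04/04/2016');

-- SUM() OVER (PARTITION BY product) is evaluated after the WHERE clause,
-- so the totals only cover the rows inside the selected period.
SELECT o.`date`   AS Date_of_Order,
       o.customer AS Customer,
       o.qty      AS Quantity,
       CONCAT(o.product, ':', SUM(o.qty) OVER (PARTITION BY o.product)) AS PRODUCT_TOTAL_SALES
FROM orders o
WHERE STR_TO_DATE(o.`date`, '%d/%m/%Y') BETWEEN '2016-01-01' AND '2016-03-03'
ORDER BY STR_TO_DATE(o.`date`, '%d/%m/%Y');
-- Three rows: shoes:11, shirts:5, shoes:11
```

Because the window is computed over the filtered rows, the period only has to be written once instead of in both the outer query and the subquery.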
