MySQL self join to get past averages - mysql

I am trying to find an average of past records in the database based on a specific time frame (between 9 and 3 months ago) if there is no value recorded for a recent sale. The reason is that recent sales on our website sometimes do not immediately collect commissions, so I need to go back to historic records to estimate what the commission rate might be.
Commission rate is calculated as:
total_commission / gross_sales
An estimate is only needed if a recent sale has no "total_commission" recorded.
Here is what I have tried so far, but I think it is wrong:
SELECT
cs.*
,SUM(cs2.gross_sales)
,SUM(cs2.total_commission)
FROM
(SELECT
sale_id
, date
, customer_code
, customer_country
, gross_sales
, total_commission
FROM customer_sale cs ) cs
LEFT JOIN customer_sale cs2
ON cs2.customer_code = cs.customer_code
AND cs2.customer_country = cs.customer_country
AND cs2.date > cs.date - interval 9 month
AND cs2.date < cs.date - interval 3 month
GROUP BY cs.sale_id
The data is structured as follows:
sale_id  date        customer_code  customer_country  gross_sales  total_commission
1        2013-12-01  cust1          united states     10000        1500
2        2013-12-01  cust2          france            20000        3000
3        2013-12-01  cust3          united states     15000        2250
4        2013-12-01  cust4          france            14000        2100
5        2013-12-01  cust5          united states     13000        1950
6        2013-12-01  cust6          france            12000        1800
7        2014-04-02  cust1          united states     10000
8        2014-04-02  cust2          france            20000
9        2014-04-02  cust3          united states     15000
10       2014-04-02  cust4          france            14000
11       2014-04-02  cust5          united states     13000
12       2014-04-02  cust6          france            12000
So I would need the query to output results similar to this (based on sales between 9 and 3 months ago from the same customer_code in the same customer_country):
sale_id  date        customer_code  customer_country  gross_sales  total_commission  gross_sales_past  total_commission_past
1        2013-12-01  cust1          united states     10000        1500
2        2013-12-01  cust2          france            20000        3000
3        2013-12-01  cust3          united states     15000        2250
4        2013-12-01  cust4          france            14000        2100
5        2013-12-01  cust5          united states     13000        1950
6        2013-12-01  cust6          france            12000        1800
7        2014-04-02  cust1          united states     10000                          10000             1500
8        2014-04-02  cust2          france            20000                          20000             3000
9        2014-04-02  cust3          united states     15000                          15000             2250
10       2014-04-02  cust4          france            14000                          14000             2100
11       2014-04-02  cust5          united states     13000                          13000             1950
12       2014-04-02  cust6          france            12000                          12000             1800

Your query looks mostly right, but I think your outer query needs to be GROUP BY cs.sale_id (assuming that sale_id is unique in the customer_sale table, and assuming that the date column is datatype DATE, DATETIME, or TIMESTAMP).
And I think you want to include a join predicate so that "past" rows are matched only to rows that have no total commission, e.g.
AND cs.total_commission IS NULL
And I don't think you really need an inline view.
Here's what I came up with:
SELECT cs.sale_id
, cs.date
, cs.customer_code
, cs.customer_country
, cs.gross_sales
, cs.total_commission
, SUM(ps.gross_sales) AS gross_sales_past
, SUM(ps.total_commission) AS total_commission_past
FROM customer_sale cs
LEFT
JOIN customer_sale ps
ON ps.customer_code = cs.customer_code
AND ps.customer_country = cs.customer_country
AND ps.date > cs.date - INTERVAL 9 MONTH
AND ps.date < cs.date - INTERVAL 3 MONTH
AND cs.total_commission IS NULL
GROUP
BY cs.sale_id
Appropriate indexes will likely improve performance of the query. Likely, the EXPLAIN output will show "Using temporary; Using filesort", and that can be expensive for large sets.
MySQL will likely be able to make use of a covering index for the JOIN:
... ON customer_sale (customer_code,customer_country,date,gross_sales,total_commission).
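Building on the query above, the past totals can be turned into the commission rate the question asks about (total_commission / gross_sales). This is only a sketch: the alias commission_rate_estimate is hypothetical, and the extra predicate ps.total_commission IS NOT NULL is an assumption of mine, added so that past sales with no commission recorded do not dilute the estimate.

```sql
-- Sketch: actual rate when a commission is recorded, otherwise an
-- estimate from the same customer's sales 9 to 3 months earlier.
SELECT cs.sale_id
     , cs.gross_sales
     , cs.total_commission
     , COALESCE(
         cs.total_commission / NULLIF(cs.gross_sales, 0),
         SUM(ps.total_commission) / NULLIF(SUM(ps.gross_sales), 0)
       ) AS commission_rate_estimate  -- hypothetical alias
  FROM customer_sale cs
  LEFT
  JOIN customer_sale ps
    ON ps.customer_code    = cs.customer_code
   AND ps.customer_country = cs.customer_country
   AND ps.date > cs.date - INTERVAL 9 MONTH
   AND ps.date < cs.date - INTERVAL 3 MONTH
   AND ps.total_commission IS NOT NULL  -- assumption: skip uncommissioned past sales
   AND cs.total_commission IS NULL
 GROUP
    BY cs.sale_id
     , cs.gross_sales
     , cs.total_commission;
```

NULLIF guards against a zero gross_sales, and a sale with neither a recorded commission nor any matching past rows gets NULL rather than a made-up rate.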

Related

Moving average in MYSQL without dates but grouped by another column

I have a table in MySQL (version 5.7.33) which looks like this:
Date        SalesRep  Sale
2021-04-01  Jack      10
2021-04-02  Jack      8
2021-03-01  Lisa      10
2021-03-02  Lisa      14
2021-03-03  Lisa      21
2021-03-04  Lisa      7
2021-03-08  Lisa      10
2021-03-09  Lisa      20
2021-03-10  Lisa      15
I want the moving average of the Sale column, but I don't want it to be based on the dates since the dates have gaps; instead I want it based on row numbers and grouped by SalesRep. So something like this:
Date        SalesRep  Sale  MoveAvg
2021-04-01  Jack      10    10
2021-04-02  Jack      8     9
2021-03-01  Lisa      10    10
2021-03-02  Lisa      14    12
2021-03-03  Lisa      21    15
2021-03-04  Lisa      7     13
2021-03-08  Lisa      10    12.4
2021-03-09  Lisa      20    13.6
2021-03-10  Lisa      15    13.8
So the moving average is for all the dates from start to finish for a particular sales rep and then it starts over for another sales rep and so on. Is this possible to do in MYSQL? Thank you in advance!
With MySQL 8.0 or later, you could use AVG as a window function with a frame clause for this (window functions are not available in 5.7):
SELECT dt, salesrep, sale,
AVG(sale) OVER (PARTITION BY salesrep ORDER BY dt
ROWS UNBOUNDED PRECEDING)
AS moveavg
FROM mysterytablename
Without window functions, you can self-join each row to all earlier rows (and itself) for the same salesrep:
select a.dt, a.salesrep, a.sale, avg(b.sale) as moveavg
from mysterytablename a
join mysterytablename b on b.salesrep=a.salesrep and b.dt <= a.dt
group by a.salesrep, a.dt, a.sale

Select limit result in mysql

I have table with these data:
Id City Amount
1 London 25000
2 New York 20000
3 London 23000
4 Paris 22000
5 Moscow 18000
6 London 21000
7 New York 19000
8 Moscow 26000
9 London 24000
10 Moscow 16000
11 London 15000
12 Moscow 23000
13 Paris 19000
14 New York 15000
15 London 26000
What SQL do I need to write to get results like this?
Id City Amount
1 London 25000
2 New York 20000
3 London 23000
4 Paris 22000
5 Moscow 18000
7 New York 19000
8 Moscow 26000
13 Paris 19000
That means I want each city to appear at most 2 times.
I just want to get the first two records or the last two records for each city.
Thanks!
If your MySQL version < 8.0, you can simulate RowNumber functionality, using Session variables. To achieve Partition By functionality based on grouping, we will use two session variables, one for the row number and the other for storing the old City to compare it with the current one, and increment it by 1, if belonging to same City group, else reset to 1.
Following code will get your first two records. You can easily change it to get first n records, by changing 2 to n in the query.
SET @row_number = 0;
SET @city_var = '';
SELECT inner_nest.Id,
inner_nest.City,
inner_nest.Amount
FROM (
SELECT
@row_number:=CASE
WHEN @city_var = City THEN @row_number + 1
ELSE 1
END AS num,
Id,
@city_var:=City as City,
Amount
FROM
table_name
ORDER BY
City, Id) AS inner_nest
WHERE inner_nest.num <= 2
ORDER BY inner_nest.Id ASC
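For comparison, on MySQL 8.0 or later the same result can be had without session variables by using ROW_NUMBER(). A minimal sketch, reusing table_name from the query above; the derived-table alias ranked is just a placeholder:

```sql
-- Number the rows within each city by Id, then keep the first two.
SELECT Id, City, Amount
FROM (
    SELECT Id,
           City,
           Amount,
           ROW_NUMBER() OVER (PARTITION BY City ORDER BY Id) AS num
    FROM table_name
) AS ranked  -- placeholder alias
WHERE num <= 2
ORDER BY Id;
```

To get the last two records per city instead, change the window ordering to ORDER BY Id DESC.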

Selecting Only the most recent date

I'm having difficulties with a query that absolutely has me stumped. I have a mysql database for a restaurant chain that keeps track of menu item prices from year to year. In this particular query I'm trying to obtain only the most recent price for an item at each store.
ItemMenu
pk storeNum itemNum vendorNum size price year
1 5555 2000 3150 Large 3.99 2015
2 5555 2000 3150 Large 3.75 2014
3 3333 2000 3153 Large 3.69 2014
4 2222 2000 3150 Large 3.89 2014
5 2222 2000 3150 Large 3.69 2013
ItemList
itemNum item categoryNum
2000 Mashed Potatoes 2000
2001 Green Beans 2000
2002 Coleslaw 2000
2003 Baked Beans 2000
2004 Corn 2000
ItemCategory
categoryNum type
2000 Side
2001 Dessert
2002 Drink
2003 Sauce
ItemVendor
vendorNum vendorName
3150 Acme Foods
3152 John's Vegetables
3153 Smith's Wholesale
Stores
storeNum franchisee address phone
5555 David Smith 9999 Main st 555-1212
3333 James Bond 123 Baker 867-5309
2222 Mark Jones 450 21st Ave 888-5411
What I would like to have returned is
storeNum, franchisee, item, type, vendorName, size, price, year
But only for the most recent year.
5555, David Smith, Mashed Potatoes, Side, Acme Foods, Large, 3.99, 2015
3333, James Bond, Mashed Potatoes, Side, Smith's Wholesale, Large, 3.69, 2014
2222, Mark Jones, Mashed Potatoes, Side, Acme Foods, Large, 3.89, 2014
I hope that made sense. I'm at a complete loss as to how to join the multiple tables and only pull data for the most recent year.
Thanks,
Kevin
I have this working but have run into another issue where I may have multiple prices for a given year due to a mid-year price increase. How can I go about adding an additional sub-query to grab the max price after I've selected the max year?
My current query
SELECT m.storeNum, m.itemNum,size,m.price,year FROM ItemMenu m,
(SELECT storeNum, itemNum, MAX(year) maxYear FROM ItemMenu
GROUP BY storeNum, itemNum) yt, (SELECT storeNum, itemNum, MAX(price)
maxPrice FROM ItemMenu) mp
WHERE m.storeNum=yt.storeNum AND m.itemNum=yt.itemNum
AND m.year=yt.maxYear AND m.itemNum=5000 AND m.storeNum=205706;
Returns valid results for max year (I've selected a specific store and item to reduce the number of results).
+----------+---------+------------+-------+------+
| storeNum | itemNum | size | price | year |
+----------+---------+------------+-------+------+
| 205706 | 5000 | Individual | 1.59 | 2014 |
| 205706 | 5000 | Large | 3.69 | 2014 |
| 205706 | 5000 | Large | 3.59 | 2014 |
| 205706 | 5000 | Individual | 1.79 | 2014 |
+----------+---------+------------+-------+------+
I need to further reduce this so I only get the values of $1.79 and $3.69.
Thanks
-Kevin
You'll need to use a subquery: first get the most recent year for each (item, store) pairing, then select the price for that (item, store, year) triplet:
SELECT m.storeNum, m.itemNum,price,year FROM ItemMenu m,
(SELECT storeNum, itemNum, MAX(year) maxYear FROM ItemMenu
GROUP BY storeNum, itemNum) yt
WHERE m.storeNum=yt.storeNum AND m.itemNum=yt.itemNum
AND m.year=yt.maxYear;
You can, of course, join the various ID->name tables onto this to get the human-readable data, but I suspect your issue was figuring out how to get the most recent prices.
It should be also noted that this could be done with a JOIN rather than including the subquery in the FROM section; that may be faster.
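One way to combine the pieces, including the follow-up about mid-year price increases, is sketched below. It joins the lookup tables from the question and takes MAX(price) per size within the most recent year. It assumes that the highest price in the latest year is the current one (as the follow-up requests) and that a store buys a given item and size from a single vendor; otherwise each vendor would get its own row.

```sql
-- Latest year per (store, item), then the highest price per size in that year.
SELECT s.storeNum, s.franchisee, il.item, ic.type, v.vendorName,
       m.size, MAX(m.price) AS price, m.year
FROM ItemMenu m
JOIN (SELECT storeNum, itemNum, MAX(year) AS maxYear
      FROM ItemMenu
      GROUP BY storeNum, itemNum) yt
  ON yt.storeNum = m.storeNum
 AND yt.itemNum  = m.itemNum
 AND yt.maxYear  = m.year
JOIN Stores s        ON s.storeNum     = m.storeNum
JOIN ItemList il     ON il.itemNum     = m.itemNum
JOIN ItemCategory ic ON ic.categoryNum = il.categoryNum
JOIN ItemVendor v    ON v.vendorNum    = m.vendorNum
GROUP BY s.storeNum, s.franchisee, il.item, ic.type, v.vendorName,
         m.size, m.year;
```

Against the four 2014 rows in the follow-up, grouping by size keeps 1.79 for Individual and 3.69 for Large.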

MySQL AVG and Grouping

I'm struggling with a MySQL statement and was hoping for some guidance as I am close, but not quite there. I have a database that contains a table of property addresses and a table of property rental listings. The property addresses are related to a table of regions, which is related to a table of districts, which is then related to a table of suburbs.
I am trying to create a result which gives me the average rent in each suburb per month and by the number of bedrooms.
For example:
District Suburb Month YEAR YMD Bedrooms DataAverage
Nelson The Brook 01 2012 2012-01-01 00:00 1 190
Nelson The Brook 01 2012 2012-01-01 00:00 2 274
Nelson The Brook 01 2012 2012-01-01 00:00 3 341
Which I can then convert into a table as follows:
Average Rent
Beds by Suburb Jan-12 Feb-12 Mar-12 Apr-12 May-12 Jun-12 Jul-12
The Brook
1 $150 $245 $160 $285 $135 $370 $350
2 $330 $340 $380 $310 $335 $345 $355
3 $350 $380 $310 $395 $380 $350 $350
Inner City
1 $160 $245 $260 $285 $295 $300 $350
2 $360 $440 $480 $410 $535 $545 $555
3 $370 $480 $510 $595 $480 $450 $550
My Current SQL query is this:
SELECT d.name as District, s.name AS Suburb,
FROM_UNIXTIME(l.StartDate,'%m') AS Month,
FROM_UNIXTIME(l.StartDate,'%Y') AS YEAR,
FROM_UNIXTIME(l.StartDate, '%Y-%m-01 00:00') AS YMD,
p.Bedrooms,
REPLACE(FORMAT(AVG(l.RentPerWeek),0),',','') AS DataAverage
FROM properties p
LEFT JOIN listings l on l.property_id=p.id
LEFT JOIN regions r on p.region_id=r.id
LEFT JOIN districts d on d.region_id=r.id
LEFT JOIN suburbs s on s.district_id=d.id
WHERE FROM_UNIXTIME(l.StartDate) BETWEEN DATE(NOW()) - INTERVAL (DAY(NOW()) - 1) DAY - INTERVAL 11 MONTH AND NOW()
GROUP BY District, Suburb, Year, Month, Bedrooms
ORDER BY District, Suburb ASC, YMD ASC, Bedrooms ASC
Unfortunately what I am getting is the same result for each and every suburb. I think I may need to create a subquery SQL statement to get this to work properly, but I'm not entirely sure.
So I am getting something like this:
District Suburb Month YEAR YMD Bedrooms DataAverage
Nelson The Brook 01 2012 2012-01-01 00:00 1 190
Nelson The Brook 01 2012 2012-01-01 00:00 2 330
Nelson The Brook 01 2012 2012-01-01 00:00 3 350
Nelson The Brook 02 2012 2012-02-01 00:00 1 245
Nelson The Brook 02 2012 2012-02-01 00:00 2 340
Nelson The Brook 02 2012 2012-02-01 00:00 3 380
...
Nelson Inner City 01 2012 2012-01-01 00:00 1 190
Nelson Inner City 01 2012 2012-01-01 00:00 2 330
Nelson Inner City 01 2012 2012-01-01 00:00 3 350
Nelson Inner City 02 2012 2012-02-01 00:00 1 245
Nelson Inner City 02 2012 2012-02-01 00:00 2 340
Nelson Inner City 02 2012 2012-02-01 00:00 3 380
.etc.
Average Rent
Beds by Suburb Jan-12 Feb-12 Mar-12 Apr-12 May-12 Jun-12 Jul-12
The Brook
1 $150 $245 $160 $285 $135 $370 $350
2 $330 $340 $380 $310 $335 $345 $355
3 $350 $380 $310 $395 $380 $350 $350
Inner City
1 $150 $245 $160 $285 $135 $370 $350
2 $330 $340 $380 $310 $335 $345 $355
3 $350 $380 $310 $395 $380 $350 $350
Any pointers or assistance would be greatly appreciated.
Assuming that id is the primary key of each table, then according to your query text, a property is associated with a region, by virtue of the region_id column on the properties table:
FROM properties p
LEFT
JOIN regions r
ON p.region_id=r.id
A district is associated with a region (presumably, a district is a subdivision of a region.)
LEFT
JOIN districts d
ON d.region_id=r.id
and a suburb is associated with a district (presumably, a suburb is a subdivision of a district.)
LEFT
JOIN suburbs s
ON s.district_id=d.id
The net result is that every property within a region is getting associated with EVERY district within that region, and associated with EVERY suburb within each district.
So, you are getting the rent values averaged for all properties within a region.
To get rent values per suburb, you really need the relationship between a property and its suburb.
What you really need is a suburb_id column on the properties table as a foreign key to the suburbs table.
LEFT
JOIN suburbs s
ON s.district_id=d.id
AND s.id = p.suburb_id
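Put together, the query might look like the sketch below once the properties table has a suburb_id column. That column is hypothetical (it does not exist in the schema described in the question), and the district is reached through the suburb, so the regions table is no longer needed for this report.

```sql
-- Assumes a new, hypothetical column properties.suburb_id referencing suburbs.id.
SELECT d.name AS District, s.name AS Suburb,
       FROM_UNIXTIME(l.StartDate, '%m') AS Month,
       FROM_UNIXTIME(l.StartDate, '%Y') AS Year,
       FROM_UNIXTIME(l.StartDate, '%Y-%m-01 00:00') AS YMD,
       p.Bedrooms,
       REPLACE(FORMAT(AVG(l.RentPerWeek), 0), ',', '') AS DataAverage
FROM properties p
JOIN listings l  ON l.property_id = p.id
JOIN suburbs s   ON s.id = p.suburb_id       -- one suburb per property
JOIN districts d ON d.id = s.district_id     -- district follows from the suburb
WHERE FROM_UNIXTIME(l.StartDate)
      BETWEEN DATE(NOW()) - INTERVAL (DAY(NOW()) - 1) DAY - INTERVAL 11 MONTH
          AND NOW()
GROUP BY District, Suburb, Year, Month, YMD, p.Bedrooms
ORDER BY District, Suburb, YMD, p.Bedrooms;
```

Inner joins are used here because the WHERE clause on l.StartDate already discards properties without listings, so the LEFT JOINs in the original had no effect on the result.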

Empty set returned from query

Any help is greatly appreciated.
I have a table hospital:
Nurse + Year + No.Patients
A001 |2000 | 23
A001 |2001 | 30
A001 |2002 | 35
B001 |2000 | 12
B001 |2001 | 15
B001 |2002 | 45
C001 |2000 | 50
C002 |2001 | 59
C003 |2002 | 69
etc
What I am trying to do is work out which nurse
had the greatest increase of patients for the years 2000 - 2002.
Clearly B001 did, as her patients increased from 12 to 45, an increase of 33,
and what I am trying to produce is the result B001 | 33.
This is what I have so far:
select a.nurse,a.nopats from hospital as a
join
( select nurse,max(nopats)-min(nopats) as growth
from hospital where year between 2000 and 2002 group by nurse ) as s1
on a.nurse = s1.nurse and a.nopats = s1.growth
where year between 2000 and 2002;
but all I get returned is an empty set.
I think I need an overall max(nopats) after the join.
Any help here would be great.
Thanks!
Try this:
SELECT nurse, (max(nopats) - min(nopats)) AS growth
FROM hospital
WHERE year BETWEEN 2000 AND 2002
GROUP BY nurse
ORDER BY growth DESC
LIMIT 1;
With LIMIT 1 the result is B001 | 33; omit the LIMIT if you want the growth for every nurse.
SELECT nurse, MAX(nopats) - MIN(nopats) AS Growth
FROM hospital
WHERE year BETWEEN 2000 AND 2002
GROUP BY nurse
ORDER BY Growth DESC
That should do it; the nurse with the greatest increase is listed first. Let me know if that's what you needed.
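Note that MAX(nopats) - MIN(nopats) measures the spread between the largest and smallest counts, so a nurse whose patients fell from 45 to 12 would score the same as one whose patients rose. If the goal is strictly the change from 2000 to 2002, a variant using conditional aggregation is sketched here, assuming one row per nurse per year and the column names nurse, year and nopats used in the queries above:

```sql
-- Growth = patients in 2002 minus patients in 2000, per nurse.
SELECT nurse,
       MAX(CASE WHEN year = 2002 THEN nopats END)
     - MAX(CASE WHEN year = 2000 THEN nopats END) AS growth
FROM hospital
WHERE year IN (2000, 2002)
GROUP BY nurse
HAVING growth IS NOT NULL   -- drop nurses missing either year
ORDER BY growth DESC
LIMIT 1;
```

For the sample rows shown in the question this also gives B001 with a growth of 33, while nurses with data for only one of the two years are left out.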