I have 1 table:
id | year | quarter | month | brand | sku | total_unit_sales
-----------------------------------------------------------------
1 | 2010 | 1 | Jan | Toyota | 123 | 156
2 | 2010 | 1 | Jan | Toyota | 124 | 77
3 | 2010 | 1 | Jan | Toyota | 125 | 325
4 | 2010 | 1 | Feb | Toyota | 123 | 184
5 | 2010 | 1 | Feb | Toyota | 124 | 98
6 | 2010 | 1 | Feb | Toyota | 125 | 219
7 | 2010 | 1 | Mar | Toyota | 123 | 178
8 | 2010 | 1 | Mar | Toyota | 124 | 101
9 | 2010 | 1 | Mar | Toyota | 125 | 215
10 | 2010 | 2 | Apr | Toyota | 123 | 216
11 | 2010 | 2 | Apr | Toyota | 124 | 115
12 | 2010 | 2 | Apr | Toyota | 125 | 278
I need to create delta indexes (the percentage variation from one time period to another) on sales by brand, month, and year. The indexes are: the average variation over the last 12 months, the variation in the current quarter, and last month versus the previous month.
I once achieved this in a multi-stage way, creating many summary tables and then generating the desired report. However, this was a manual, customized process. Now I need a fully automated way, where the data source is updated and the report is generated.
I've been working on a self join, but the results are less than desirable. The idea is to compare the previous value with the newest value by self-joining the table with:
left join on a.id=b.id+1
This is prone to error because some months have no sales row for a specific sku that was not sold in that month, so consecutive ids do not always correspond to consecutive months.
I appreciate your help. Thanks in advance. MySQL version 5.5+.
Your join approach sounds like a valid way to proceed. You can improve the quality of your figures by using an outer join, and IFNULL() to generate sensible values.
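As a rough sketch of that suggestion (not a tested report), the query below first sums the sales per brand and month, then left-joins each month to the month before it by calendar position instead of by id, so a missing sku row cannot shift the comparison. The table name sales is an assumption, and FIELD() is used only to turn the three-letter month names from the sample into month numbers.

```sql
-- Assumed table name: sales (columns as shown in the question).
-- cur / prev: total units per brand and calendar month.
SELECT cur.brand,
       cur.year,
       cur.month_no,
       cur.units,
       IFNULL(prev.units, 0) AS prev_units,
       -- NULL when there is no previous month or it had no sales
       100 * (cur.units - prev.units) / prev.units AS pct_change
FROM (
    SELECT brand, year,
           FIELD(month, 'Jan','Feb','Mar','Apr','May','Jun',
                        'Jul','Aug','Sep','Oct','Nov','Dec') AS month_no,
           SUM(total_unit_sales) AS units
    FROM sales
    GROUP BY brand, year, month_no
) AS cur
LEFT JOIN (
    SELECT brand, year,
           FIELD(month, 'Jan','Feb','Mar','Apr','May','Jun',
                        'Jul','Aug','Sep','Oct','Nov','Dec') AS month_no,
           SUM(total_unit_sales) AS units
    FROM sales
    GROUP BY brand, year, month_no
) AS prev
       ON prev.brand = cur.brand
      -- previous calendar month, also across a year boundary
      AND prev.year * 12 + prev.month_no = cur.year * 12 + cur.month_no - 1
ORDER BY cur.brand, cur.year, cur.month_no;
```

With the sample rows, January totals 558 units and February 501, so the February row would show a change of about -10.2 percent. The quarterly and 12-month figures could be built on the same per-month subquery by changing the join condition or by averaging pct_change over the last twelve rows.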
I am trying to get the rows that don't exist in one table where one table called schedules (match_week, player_home_id, player_away_id) and the other table called match (match_week, Winner_id, Defeated_id) are joined. The players look at their schedule and play a match. I am trying to get a list of the scheduled matches that do not exist in the match table. The IDs in the match table can be in either column Winner_id or Defeated_id.
I have reviewed a number of Stack Exchange examples, but most use "IS NULL" and I don't have null values. I have used a Join that does give the output of the matches played. I would like the matches that have not been played.
CSV - wp_schedule_test
+----+------------+--------------+--------------+-----------------+-----------------+
| ID | match_week | home_player1 | away_player1 | player1_home_id | player1_away_id |
+----+------------+--------------+--------------+-----------------+-----------------+
| 1 | WEEK 1 | James Rives | Dale Hemme | 164 | 169 |
| 2 | WEEK 1 | John Head | David Foster | 81 | 175 |
| 3 | WEEK 1 | John Dalton | Eric Simmons | 82 | 23 |
| 4 | WEEK 2 | John Head | James Rives | 81 | 164 |
| 5 | WEEK 2 | Dale Hemme | John Dalton | 169 | 82 |
| 6 | WEEK 2 | David Foster | Eric Simmons | 175 | 23 |
| 7 | WEEK 3 | John Dalton | James Rives | 82 | 164 |
| 8 | WEEK 3 | John Head | Eric Simmons | 81 | 23 |
| 9 | WEEK 3 | Dale Hemme | David Foster | 169 | 175 |
| 10 | WEEK 4 | Eric Simmons | James Rives | 23 | 164 |
| 11 | WEEK 4 | David Foster | John Dalton | 175 | 82 |
| 12 | WEEK 4 | Dale Hemme | John Head | 169 | 81 |
+----+------------+--------------+--------------+-----------------+-----------------+
CSV - wp_match_scores_test
+----+------------+------------+------------+
| ID | match_week | player1_id | player2_id |
+----+------------+------------+------------+
| 5 | WEEK 1 | 82 | 23 |
| 20 | WEEK 1 | 164 | 169 |
| 21 | WEEK 2 | 164 | 81 |
| 25 | WEEK 2 | 82 | 169 |
| 61 | WEEK 3 | 175 | 169 |
| 62 | WEEK 4 | 175 | 82 |
| 69 | WEEK 2 | 175 | 23 |
| 85 | WEEK 3 | 164 | 82 |
| 86 | WEEK 4 | 164 | 23 |
+----+------------+------------+------------+
The output of the MySQL query is the list of matches that have been played. I am trying to figure out how to list the matches from the schedule table that have not been played.
CSV - MySQL Output
+------------+------------+------------+
| match_week | player1_id | player2_id |
+------------+------------+------------+
| WEEK 1 | 164 | 169 |
| WEEK 1 | 82 | 23 |
| WEEK 2 | 164 | 81 |
| WEEK 2 | 82 | 169 |
| WEEK 2 | 175 | 23 |
| WEEK 3 | 175 | 169 |
| WEEK 3 | 164 | 82 |
| WEEK 4 | 175 | 82 |
| WEEK 4 | 164 | 23 |
+------------+------------+------------+
MYSQL
select DISTINCT ms.match_week, ms.player1_id , ms.player2_id FROM
wp_match_scores_test ms
JOIN wp_schedules_test s
ON (s.player1_home_id = ms.player1_id or s.player1_away_id =
ms.player2_id)
Order by ms.match_week
The expected output is:
CSV - Desired Output
+------------+----------------+----------------+
| match_week | player_home_id | player_away_id |
+------------+----------------+----------------+
| WEEK 1 | 81 | 175 |
| WEEK 3 | 81 | 23 |
| WEEK 4 | 169 | 81 |
+------------+----------------+----------------+
The added code I would like to use is
SELECT s.*
FROM wp_schedules_test s
WHERE NOT EXISTS
(select DISTINCT ms.match_week, ms.player1_id , ms.player2_id FROM
wp_match_scores_test ms
JOIN wp_schedules_test s
ON (s.player1_home_id = ms.player1_id or s.player1_away_id =
ms.player2_id)
Order by ms.match_week)
Unfortunately, the output yields "No Rows"
You can use a LEFT JOIN to achieve the desired results, joining the two tables on matching player ids (noting that a player id in wp_match_scores_test can correspond to either player1_home_id or player1_away_id in wp_schedule_test). If there is no match, the columns from wp_match_scores_test will be NULL in the joined row, and you can use that to select the matches which have not been played:
SELECT sch.*
FROM wp_schedule_test sch
LEFT JOIN wp_match_scores_test ms
ON (ms.player1_id = sch.player1_home_id
OR ms.player2_id = sch.player1_home_id)
AND (ms.player1_id = sch.player1_away_id
OR ms.player2_id = sch.player1_away_id)
WHERE ms.ID IS NULL
Output:
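+----+------------+--------------+--------------+-----------------+-----------------+
| ID | match_week | home_player1 | away_player1 | player1_home_id | player1_away_id |
+----+------------+--------------+--------------+-----------------+-----------------+
| 2 | WEEK 1 | John Head | David Foster | 81 | 175 |
| 8 | WEEK 3 | John Head | Eric Simmons | 81 | 23 |
| 12 | WEEK 4 | Dale Hemme | John Head | 169 | 81 |
+----+------------+--------------+--------------+-----------------+-----------------+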
Note that you can also use a NOT EXISTS query, using the same condition as I used in the JOIN:
SELECT sch.*
FROM wp_schedule_test sch
WHERE NOT EXISTS (SELECT *
FROM wp_match_scores_test ms
WHERE (ms.player1_id = sch.player1_home_id
OR ms.player2_id = sch.player1_home_id)
AND (ms.player1_id = sch.player1_away_id
OR ms.player2_id = sch.player1_away_id))
The output of this query is the same. Note, though, that the correlated subquery is evaluated for each row of wp_schedule_test, which can make this query less efficient than the LEFT JOIN equivalent, depending on the MySQL version and the available indexes.
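Both queries above compare only the player ids. If the same two players could be scheduled against each other in more than one week, it may be safer to match on the week as well. A minimal sketch of that variant, assuming match_week holds the same text in both tables and that home and away player always differ:

```sql
-- Anti-join on week plus both players, in either order.
SELECT sch.*
FROM wp_schedule_test sch
LEFT JOIN wp_match_scores_test ms
       ON ms.match_week = sch.match_week
      AND sch.player1_home_id IN (ms.player1_id, ms.player2_id)
      AND sch.player1_away_id IN (ms.player1_id, ms.player2_id)
WHERE ms.ID IS NULL   -- no played match found for this scheduled match
ORDER BY sch.ID;
```

For the sample data this should still return the schedule rows with ID 2, 8 and 12, because every played match was played in its scheduled week.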
I have a table which saves monthly values, but also a value for the complete year. Is it possible to add to the yearly value whenever I insert a value for a month?
I want to avoid loading the value first, adding to it in the server-code and writing it again.
You can write a trigger that updates the year table whenever a value is inserted into the month table, for example:
CREATE TRIGGER tr_month ON monthly_table
AFTER INSERT
AS
BEGIN
    -- SQL Server syntax. The column names (year_no, amount, total) are
    -- placeholders, because the actual table structure was not given.
    UPDATE y
    SET y.total = y.total + i.amount
    FROM year_table AS y
    JOIN (SELECT year_no, SUM(amount) AS amount
          FROM inserted
          GROUP BY year_no) AS i
      ON i.year_no = y.year_no;
END
GO
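The trigger above uses SQL Server syntax. If the database is MySQL, a comparable sketch could look like the following. The table and column names are hypothetical (months(year, month, value) and years(year, value), with year as the primary key of years), so adjust them to the real schema.

```sql
-- Hypothetical tables: months(year, month, value) and
-- years(year PRIMARY KEY, value).
-- Single-statement trigger body, so no DELIMITER change is needed.
CREATE TRIGGER months_after_insert
AFTER INSERT ON months
FOR EACH ROW
    INSERT INTO years (year, value)
    VALUES (NEW.year, NEW.value)
    -- the year row already exists: add the new month value to it
    ON DUPLICATE KEY UPDATE value = value + NEW.value;
```

This only covers inserts; keeping the yearly value correct after updates or deletes of month rows would need additional triggers. Without any trigger, the read-then-write round trip can also be avoided with a single statement of the form UPDATE years SET value = value + 42 WHERE year = 2019.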
Your best approach to this is avoiding redundant data in your table. When you need year totals, SELECT them.
You didn't tell us your table definition, so I will guess. The table months contains
year int (for example, 2019)
month int (1-12)
value number
You can get the details of this the obvious way:
SELECT year, month, value FROM months;
You can get the details and the yearly sums this way:
SELECT year, month, SUM(value) value
FROM months
GROUP BY year, month WITH ROLLUP;
The result set for this query looks like the other result set, but also contains sums. It looks like this:
| year | month | value |
| ---- | ----- | ----- |
| 2018 | 1 | 100 | detail month values...
| 2018 | 2 | 140 |
| 2018 | 3 | 130 |
| 2018 | 4 | 190 |
| 2018 | 5 | 120 |
| 2018 | 6 | 180 |
| 2018 | 7 | 130 |
| 2018 | 8 | 140 |
| 2018 | 9 | 150 |
| 2018 | 10 | 200 |
| 2018 | 11 | 230 |
| 2018 | 12 | 300 |
| 2018 | | 2010 | yearly sum for 2018 (month is NULL)
| 2019 | 1 | 100 |
| 2019 | 2 | 130 |
| 2019 | 3 | 160 |
| 2019 | 4 | 140 |
| 2019 | 5 | 190 |
| 2019 | 6 | 240 |
| 2019 | | 960 | yearly sum for 2019 (month is NULL)
| | | 2970 | total sum (both month and year are NULL)
Why is this a good process?
you need to store no extra data.
it works correctly even if you update or delete rows in your table.
it's fast: SQL is made to do this kind of thing.
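If only the yearly totals are needed, without the monthly detail rows, a plain GROUP BY on the same assumed months table is enough:

```sql
-- One row per year with the sum of its monthly values.
SELECT year, SUM(value) AS year_total
FROM months
GROUP BY year;
```

For the sample data above this gives 2010 for 2018 and 960 for 2019, the same figures that appear in the ROLLUP rows.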
I just solved it by adding the values on the client side; this also saves computing time on the server.
I have a table that looks like this:
+--------+----------+------+-----------+
| make | model | year | avg_price |
+--------+----------+------+-----------+
| Subaru | Forester | 2013 | 18533 |
| Ford | F-150 | 2014 | 27284 |
| Ford | F-150 | 2010 | 18296 |
| Subaru | Forester | 2012 | 16589 |
| Ford | F-150 | 2013 | 25330 |
| Ford | F-150 | 2011 | 20366 |
| Subaru | Forester | 2008 | 7256 |
| Ford | F-150 | 2015 | 33519 |
| Ford | F-150 | 2012 | 23033 |
| Subaru | Forester | 2011 | 15789 |
+--------+----------+------+-----------+
Using MySQL, I want to add a new column with a three year average price centered on the record year. It should look like this when done:
+--------+----------+------+-----------+---------------------+
| make | model | year | avg_price | 3_yr_center_average |
+--------+----------+------+-----------+---------------------+
| Subaru | Forester | 2013 | 18533 | 17561 |
| Ford | F-150 | 2014 | 27284 | 28711 |
| Ford | F-150 | 2010 | 18296 | 19331 |
| Subaru | Forester | 2012 | 16589 | 16970 |
| Ford | F-150 | 2013 | 25330 | 25216 |
| Ford | F-150 | 2011 | 20366 | 20565 |
| Subaru | Forester | 2008 | 7256 | 7256 |
| Ford | F-150 | 2015 | 33519 | 30401 |
| Ford | F-150 | 2012 | 23033 | 22910 |
| Subaru | Forester | 2011 | 15789 | 16189 |
+--------+----------+------+-----------+---------------------+
It seems that this should be straightforward if the data were ordered and everything were the same make and model. In reality, the working table has over 4000 unique make, model, and year combinations, and they are not ordered by year.
Therefore, the query cannot rely on the records being ordered, or on adjacent records being related to each other. The query needs to work per distinct make and model, then average over the three-year interval centered on each year, without failing on the first or last year of a range, where one or two of the three years are missing.
Any MySQL tips would be greatly appreciated! Thanks.
We can try joining twice to bring the previous and following years into a single row with the current year, for each make and model. Then, in an outer query, take the average of the prices from the three years:
SELECT make, model, year, avg_price,
(avg_price + last_price + next_price) / (1.0 + last_cnt + next_cnt) AS 3_yr_center_average
FROM
(
SELECT t1.make, t1.model, t1.year, t1.avg_price,
COALESCE(t2.avg_price, 0) AS last_price,
COALESCE(t3.avg_price, 0) AS next_price,
CASE WHEN t2.avg_price IS NOT NULL THEN 1 ELSE 0 END AS last_cnt,
CASE WHEN t3.avg_price IS NOT NULL THEN 1 ELSE 0 END AS next_cnt
FROM yourTable t1
LEFT JOIN yourTable t2
ON t1.make = t2.make AND t1.model = t2.model AND t1.year = t2.year + 1
LEFT JOIN yourTable t3
ON t1.make = t3.make AND t1.model = t3.model AND t1.year = t3.year - 1
) t
ORDER BY
make, model, year;
Note that there is an edge case here in your data with regard to what should happen for a record which is the last (or first) year for that make and model. In that case, there are only two years available for the three year moving average. I made the assumption in this case that you would be OK with actually just reporting a two year moving average. For example, for the Subaru Forester in 2013, I report a three year moving average of 17561, which is actually the average of the 2013 price 18533 and the previous 2012 price 16589.
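An alternative sketch, assuming there is at most one row per make, model, and year: a correlated subquery with AVG() averages whatever rows exist in the window from year - 1 to year + 1, so the missing-year case needs no special handling. The alias center_average is just an illustrative name.

```sql
SELECT t1.make, t1.model, t1.year, t1.avg_price,
       -- average over the rows that exist for year-1, year, year+1
       (SELECT AVG(t2.avg_price)
        FROM yourTable t2
        WHERE t2.make  = t1.make
          AND t2.model = t1.model
          AND t2.year BETWEEN t1.year - 1 AND t1.year + 1) AS center_average
FROM yourTable t1
ORDER BY t1.make, t1.model, t1.year;
```

For the Ford F-150 in 2014, for example, the window holds the 2013, 2014, and 2015 prices (25330, 27284, 33519), whose average is 28711. An index on (make, model, year) would probably help on a table with several thousand rows.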
I am looking for a SQL statement that outputs missing calendar weeks based on a table. Here is a short example. We are using MySQL. I dropped all the irrelevant columns.
+------+--------------+
| Year | CalendarWeek |
+------+--------------+
| 2012 | 1 |
| 2012 | 5 |
| 2012 | 8 |
| 2012 | 9 |
| 2012 | 51 |
| 2013 | 2 |
+------+--------------+
What I am trying to get:
+------+--------------+
| Year | CalendarWeek |
+------+--------------+
| 2012 | 2 |
| 2012 | 3 |
| 2012 | 4 |
| 2012 | 6 |
| 2012 | 7 |
| 2012 | 10 |
| ... | ... |
| 2012 | 50 |
| 2012 | 52 |
| 2013 | 1 |
+------+--------------+
I added the dots to shorten the output.
Further background: The columns in each row are computed via some logic in Java. If we create a new row, let's say for the current calendar week, we have to check whether we need to fill gaps. It's a simple routine: is there a row for the previous week? If yes, we are done. Otherwise, compute the value for the previous week, insert it, and check the week before that. It is the one and only do-while loop in the whole program.
This apparently fails if we start with a table that has more than one gap. In that case we have to run through all the calendar weeks of every year and check for missing rows, which takes some time.
tl;dr I am trying to reduce roundtrips to the database with a shortcut.
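One possible shortcut, as a sketch only: build the list of candidate weeks in SQL and keep those that have no row. The table name weeks is an assumption, every year is treated as having 52 weeks (some ISO years have 53), and a year with no rows at all would not be reported.

```sql
-- Assumed table name: weeks(Year, CalendarWeek).
SELECT y.Year, n.CalendarWeek
FROM (SELECT DISTINCT Year FROM weeks) AS y
CROSS JOIN (
    -- numbers 1..52 built from two digit lists
    SELECT tens.d * 10 + ones.d AS CalendarWeek
    FROM (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
          SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL
          SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL
          SELECT 9) AS ones
    CROSS JOIN
         (SELECT 0 AS d UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL
          SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) AS tens
    WHERE tens.d * 10 + ones.d BETWEEN 1 AND 52
) AS n
CROSS JOIN (
    -- first and last existing week, encoded as Year * 100 + week
    SELECT MIN(Year * 100 + CalendarWeek) AS first_week,
           MAX(Year * 100 + CalendarWeek) AS last_week
    FROM weeks
) AS b
LEFT JOIN weeks AS w
       ON w.Year = y.Year AND w.CalendarWeek = n.CalendarWeek
WHERE w.Year IS NULL   -- candidate week has no row
  AND y.Year * 100 + n.CalendarWeek BETWEEN b.first_week AND b.last_week
ORDER BY y.Year, n.CalendarWeek;
```

For the sample table this should list 2012 weeks 2 to 4, 6, 7, 10 to 50 and 52, followed by 2013 week 1, in a single round trip. The Java code could then compute and insert the values for exactly those rows.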
I have two tables connected by a one-to-many relationship.
The parent table is a simple user table with id and first_name columns.
Parent
id | first_name
1 | Bob
2 | Dick
3 | Harry
4 | Tom
5 | Holly
Child Table contains insurance selection for a user.
Child
id | insu_id | user_id | year
1 | 188765 | 1 | 2008
2 | 188765 | 1 | 2009
3 | 188765 | 1 | 2010
4 | 188765 | 1 | 2011
5 | 188765 | 1 | 2012
I want to copy the insurance selection of user_id 1 to all the other users (2, 3, 4, 5) in the child table, so that the table looks like this:
id | insu_id | user_id | year
1 | 188765 | 1 | 2008
2 | 188765 | 1 | 2009
3 | 188765 | 1 | 2010
4 | 188765 | 1 | 2011
5 | 188765 | 1 | 2012
6 | 188765 | 2 | 2008
7 | 188765 | 2 | 2009
8 | 188765 | 2 | 2010
9 | 188765 | 2 | 2011
10 | 188765 | 2 | 2012
11 | 188765 | 3 | 2008
12 | 188765 | 3 | 2009
13 | 188765 | 3 | 2010
14 | 188765 | 3 | 2011
15 | 188765 | 3 | 2012
16 | 188765 | 4 | 2008
17 | 188765 | 4 | 2009
18 | 188765 | 4 | 2010
19 | 188765 | 4 | 2011
20 | 188765 | 4 | 2012
21 | 188765 | 5 | 2008
22 | 188765 | 5 | 2009
23 | 188765 | 5 | 2010
24 | 188765 | 5 | 2011
25 | 188765 | 5 | 2012
What I can do
INSERT INTO child(insurance_id, user_id, year) SELECT insurance_id, '2', year FROM child WHERE user_id = 1
INSERT INTO child(insurance_id, user_id, year) SELECT insurance_id, '3', year FROM child WHERE user_id = 1
INSERT INTO child(insurance_id, user_id, year) SELECT insurance_id, '4', year FROM child WHERE user_id = 1
INSERT INTO child(insurance_id, user_id, year) SELECT insurance_id, '5', year FROM child WHERE user_id = 1
What I want
I don't want to run 4 different INSERT INTO ... SELECT queries, because the number of users can grow beyond that. I want one query that selects the user_id dynamically rather than hard-coding it.
I have tried that and it seems to work. I get the insurance data from child table for the first user and join it with all other users in the parent table in order to insert the result. You may have to fix the column names if they are not the same as your database.
INSERT INTO child(insurance_id, user_id, year)
SELECT a.insurance_id, b.id, a.year
FROM
child a, parent b
WHERE a.user_id = 1 AND b.id > 1
ORDER BY b.id, a.year
This will always insert the insurance data for all users except the one with id = 1, no matter how many of them you have. Hope that helps :)
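Running that statement twice would insert the copies twice. A variant that skips rows which already exist, written as a sketch with an explicit CROSS JOIN and an anti-join on the child table, could look like this. The column name insurance_id follows the queries above; the sample table shows insu_id, so use whichever your schema has.

```sql
INSERT INTO child (insurance_id, user_id, year)
SELECT src.insurance_id, p.id, src.year
FROM child AS src
CROSS JOIN parent AS p             -- every user paired with every source row
LEFT JOIN child AS c               -- an existing copy for that user, if any
       ON c.user_id = p.id
      AND c.insurance_id = src.insurance_id
      AND c.year = src.year
WHERE src.user_id = 1
  AND p.id <> 1
  AND c.id IS NULL                 -- only insert what is not there yet
ORDER BY p.id, src.year;
```

With the sample data the first run would add the 20 rows for users 2 to 5, and a second run would add nothing.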