MYSQL JOIN with lower-than / greater-than conditions (minimum_quantity, valid_from) - mysql

I have
order table with columns
id
date
supplier_id
order_lineitem table with columns
id
order_id
article_id
order_quantity
order_price
a prices table with columns
id
article_id
supplier_id
valid_until
minimum_order_quantity
list_price
The prices table doesn't necessarily have to have a matching / valid entry, so this one would have to be joined via an outer join.
I'd like to compare order_prices against list_prices.
Therefore I need to somehow join
SELECT
o.id,
o.date,
ol.article_id,
ol.order_quantity,
ol.order_price,
p.list_price
FROM
`order` o JOIN order_lineitem ol on ol.order_id = o.id
LEFT OUTER JOIN prices p on
p.article_id = ol.article_id
AND p.supplier_id = o.supplier_id
AND p.minimum_order_quantity <= ol.order_quantity
AND IFNULL(p.valid_until, DATE('2099-12-31')) >= o.date
/* here comes the fun part that doesn't work (reliably) */
ORDER BY
IFNULL(p.valid_until, DATE('2099-12-31')) asc,
p.minimum_order_quantity desc
GROUP BY o.id, ol.id, p.article_id
/* ... trying to get only THAT price from the prices table that applies for the
(a) the given article
(b) from the given supplier
(c) that was valid at the time of purchase (i.e. has the smallest "valid_until" date that is greater than the purchase date)
(d) when ordering the given quantity (prices can also increase with higher quantities, so it has to be the price with the largest minimum_order_quantity that is smaller than the ordered quantity)
*/
I particularly don't want to fall into the trap (which I dug for myself here) of using group by to limit the results to 1 record from the prices table based on a previous sorting, since
(i) as per MySQL documentation it is non-deterministic which record will actually get returned (although it may in effect often work and this is a frequently suggested route to go) - also see this excellent explanation on the issue: https://stackoverflow.com/a/14770936/9818188 and
(ii) this concept wouldn't work on other SQL implementations like SQL Server, Maria DB & Co.
The question is not around putting in a nested query in order to be able to ORDER first and then GROUP subsequently. It's more about how to really properly get the correct row--ideally also working on other SQL implementations like SQL Server, Maria DB or Google BigQuery.
And since I can't really rely on prices being cheaper the more I buy I also can't simply get the min(list_price).
How can this can be achieved?
Since the output of this query is required for downstream processing, I can't slice & dice the task but need a full list of all orders with respective list prices.
EDIT
Here is a SQL fiddle - the desired prices are shown in column order_price, the prices incorrectly determined by the JOIN (excluding the order byclause - as this would cause non-deterministic results) are shown in column list_price:
http://sqlfiddle.com/#!9/f03a4f/2
CREATE TABLE `order`
(`id` int, `date` datetime, `supplier_id` int)
;
INSERT INTO `order`
(`id`, `date`, `supplier_id`)
VALUES
(1, '2022-01-15 00:00:00', 1),
(2, '2022-02-15 00:00:00', 1),
(3, '2022-03-15 00:00:00', 1),
(4, '2022-01-15 00:00:00', 2),
(5, '2022-02-15 00:00:00', 2),
(6, '2022-03-15 00:00:00', 2)
;
CREATE TABLE order_lineitem
(`id` int, `order_id` int, `article_id` int, `order_quantity` int, `order_price` int)
;
INSERT INTO order_lineitem
(`id`, `order_id`, `article_id`, `order_quantity`, `order_price`)
VALUES
(1, 1, 1, 1, 11),
(2, 1, 1, 10, 8),
(3, 1, 1, 100, 9),
(4, 2, 1, 1, 15),
(5, 2, 1, 10, 12),
(6, 2, 1, 100, 13),
(7, 3, 1, 1, 17),
(8, 3, 1, 10, 14),
(9, 3, 1, 100, 16),
(10, 4, 1, 1, 10),
(11, 4, 1, 10, 80),
(12, 4, 1, 100, 80),
(13, 5, 1, 1, 10),
(14, 5, 1, 10, 80),
(15, 5, 1, 100, 80),
(16, 6, 1, 1, 10),
(17, 6, 1, 10, 10),
(18, 6, 1, 100, 10)
;
CREATE TABLE prices
(`id` int, `article_id` int, `supplier_id` int, `valid_until` varchar(10), `minimum_order_quantity` int, `list_price` int)
;
INSERT INTO prices
(`id`, `article_id`, `supplier_id`, `valid_until`, `minimum_order_quantity`, `list_price`)
VALUES
(1, 1, 1, '2022-01-31', 1, 11),
(2, 1, 1, '2022-01-31', 10, 8),
(3, 1, 1, '2022-01-31', 100, 9),
(4, 1, 2, NULL, 1, 10),
(5, 1, 1, '2022-02-31', 1, 15),
(6, 1, 1, '2022-02-31', 10, 12),
(7, 1, 1, '2022-02-31', 100, 13),
(8, 1, 1, NULL, 1, 17),
(9, 1, 1, NULL, 10, 14),
(10, 1, 1, NULL, 100, 16),
(11, 2, 1, NULL, 1, 99),
(12, 1, 2, '2022-02-31', 10, 80)
;
SELECT
o.id,
o.supplier_id,
o.date,
ol.article_id,
ol.order_quantity,
ol.order_price,
p.list_price
FROM
`order` o JOIN order_lineitem ol on ol.order_id = o.id
LEFT OUTER JOIN prices p on
p.article_id = ol.article_id
AND p.supplier_id = o.supplier_id
AND p.minimum_order_quantity <= ol.order_quantity
AND IFNULL(p.valid_until, DATE('2099-12-31')) >= o.date
/* here comes the fun part that doesn't work (reliably) */
/* NOTE: I am purposesly commenting out the ORDER BY clause here, because
(a) it would have to go after GROUP BY - requiring a nested table which I would like to prevent AND, more importantly,
(b) limiting the numer of rows returned to 1 by GROUPing with an incomplete set of columns on a sorted table may return non-deterministic results as per the MySQL documentation.
see also https://stackoverflow.com/a/14770936/9818188 explaining the issue with GROUP BY in this context
#
# ORDER BY
# IFNULL(p.valid_until, DATE('2099-12-31')) asc,
# p.minimum_order_quantity desc
*/
GROUP BY o.id, ol.id, p.article_id
/* ... trying to get only THAT price from the prices table that applies for the
(a) the given article
(b) from the given supplier
(c) that was valid at the time of purchase (i.e. has the smallest "valid_until" date that is greater than the purchase date)
(d) when ordering the given quantity (prices can also increase with higher quantities, so it has to be the price with the largest minimum_order_quantity that is smaller than the ordered quantity)
*/

If you are interrestd in the highest listprice, you would do it like the.
If you need also other columns from theprices table, you need to SQL select only rows with max value on a column
as you have to join the sub querys for all articles
SELECT
o.id,
o.date,
ol.article_id,
ol.order_quantity,
ol.order_price,
(SELECT `list_price` FROM prices p WHERE
p.article_id = ol.article_id
AND p.supplier_id = o.supplier_id
AND p.minimum_order_quantity <= ol.order_quantity
AND IFNULL(p.valid_until, DATE('2099-12-31')) >= o.date
ORDER BY `list_price` DESC
LIMIT 1
) list_price
FROM
`order` o JOIN order_lineitem ol on ol.order_id = o.id

Related

How do I build a query to get the latest row per user where a third criteria is in a separate table?

I have three tables
CREATE TABLE `LineItems` (
`LineItemID` int NOT NULL,
`OrderID` int NOT NULL,
`ProductID` int NOT NULL
);
INSERT INTO `LineItems` (`LineItemID`, `OrderID`, `ProductID`) VALUES
(1, 1, 2),
(2, 1, 1),
(3, 2, 3),
(4, 2, 4),
(5, 3, 1),
(6, 4, 2),
(7, 5, 4),
(8, 5, 2),
(9, 5, 3),
(10, 6, 1),
(11, 6, 4),
(12, 7, 4),
(13, 7, 1),
(14, 7, 2),
(15, 8, 1),
(16, 9, 3),
(17, 9, 4),
(18, 10, 3);
CREATE TABLE `Orders` (
`OrderID` int NOT NULL,
`UserID` int NOT NULL,
`OrderDate` datetime NOT NULL
);
INSERT INTO `Orders` (`OrderID`, `UserID`, `OrderDate`) VALUES
(1, 21, '2021-05-01 00:00:00'),
(2, 21, '2021-05-03 00:00:00'),
(3, 24, '2021-05-06 00:00:00'),
(4, 23, '2021-05-12 00:00:00'),
(5, 21, '2021-05-14 00:00:00'),
(6, 22, '2021-05-16 00:00:00'),
(7, 23, '2021-05-20 00:00:00'),
(8, 21, '2021-05-22 00:00:00'),
(9, 24, '2021-05-23 00:00:00'),
(10, 23, '2021-05-26 00:00:00');
CREATE TABLE `Products` (
`ProductID` int NOT NULL,
`ProductTitle` VARCHAR(250) NOT NULL,
`ProductType` enum('doors','windows','flooring') NOT NULL
);
INSERT INTO `Products` (`ProductID`, `ProductTitle`, `ProductType`) VALUES
(1, 'French Doors','doors'),
(2, 'Sash Windows','windows'),
(3, 'Sliding Doors','doors'),
(4, 'Parquet Floor','flooring');
SQL Fiddle:
Orders - contains an order date and a user id
LineItems - Foreign key to the orders table, contains product ids that are in the order
Products - Contains details of the products (including if they are a door, window, or flooring)
I have figured out how to get the latest order per user with
SELECT O.* FROM Orders O LEFT JOIN Orders O2
ON O2.UserID=O.UserID AND O.OrderDate < O2.OrderDate
WHERE O2.OrderDate IS NULL;
This works fine and is included in the SQL fiddle, along with a query that returns a complete picture for reference.
I am trying to figure out how to get the latest order with flooring per user, but I'm not having any luck.
In the SQL fiddle linked above, the intended output for what I am after would be
OrderID | UserID | OrderDate
6 | 22 | 2021-05-16T00:00:00Z
5 | 21 | 2021-05-14T00:00:00Z
9 | 24 | 2021-05-23T00:00:00Z
7 | 23 | 2021-05-20T00:00:00Z
EDIT: To clarify, in the intended result, two rows (for users 21 and 23) are different than in the query that gets just latest order per user. This is because order IDs 8 and 10 (from the latest order per user query) do not include flooring. The intended query has to find the latest order with flooring from each user to return in the result set.
You need to add the LineItems and Products tables to your query to find orders where flooring was purchased:
SELECT DISTINCT O.*
FROM Orders O
LEFT JOIN Orders O2
ON O2.UserID=O.UserID AND
O.OrderDate < O2.OrderDate
INNER JOIN LineItems i
ON i.OrderID = O.OrderID
INNER JOIN Products p
ON p.ProductID = i.ProductID
WHERE O2.OrderDate IS NULL AND
p.ProductType = 'flooring'
db<>fiddle here

How do i get a limited number of rows for each value in a field? (db is mysql but could be changed if easier)

I found this difficult to search for this question. I have a table of sports fixtures (tbl_fixture) and a table of sports participants (tbl_participant) which have a many-to-many relationship via a linking table (tbl_fixture_participant)
I need to return the most recent 3 fixtures (ie latest tbl_fixture.start_datetime) of multiple participants and whether they won each of the fixtures, (eg more recent 3 fixtures of participant 1 and most recent 3 fixtures of participant 2, and most recent 3 fixtures of participant 3, with each record returning the fixture_id, participant_id, start_datetime and is_winner fields).
The number of participants that i need to get the data for could be between 1 and 100.
If there's a better way to structure my data, or a better database for this type of query (graph db?) then i'm happy to look into those.
Here's a sample schema:
CREATE TABLE tbl_fixture (
fixture_id INT AUTO_INCREMENT PRIMARY KEY,
start_datetime DATETIME NOT NULL
);
CREATE TABLE tbl_participant (
participant_id INT AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(255) NOT NULL
);
CREATE TABLE tbl_fixture_participant (
fixture_id INT NOT NULL,
participant_id INT NOT NULL,
is_winner TINYINT NOT NULL,
FOREIGN KEY (fixture_id)
REFERENCES tbl_fixture (fixture_id)
ON UPDATE RESTRICT ON DELETE CASCADE,
FOREIGN KEY (participant_id)
REFERENCES tbl_participant (participant_id)
ON UPDATE RESTRICT ON DELETE CASCADE
);
INSERT INTO tbl_fixture (fixture_id, start_datetime)
VALUES (1, "2021-01-14 15:00:00"),
(2, "2021-01-13 16:00:00"),
(3, "2021-01-12 17:00:00"),
(4, "2021-01-11 15:00:00"),
(5, "2021-01-19 16:00:00"),
(6, "2021-01-18 17:00:00"),
(7, "2021-01-05 15:00:00"),
(8, "2021-01-03 16:00:00"),
(9, "2021-01-03 17:00:00"),
(10, "2021-01-11 15:00:00"),
(11, "2021-01-12 16:00:00"),
(12, "2021-01-13 17:00:00"),
(13, "2021-01-14 15:00:00"),
(14, "2021-01-19 16:00:00");
INSERT INTO tbl_participant (participant_id, name) VALUES
( 1,"Team 1"),
( 2,"Team 2"),
( 3,"Team 3");
INSERT INTO tbl_fixture_participant (fixture_id, participant_id, is_winner)
VALUES (1, 1, 0)
,(2, 1, 1)
,(2, 2, 0)
,(3, 1, 1)
,(12, 2, 0)
,(4, 3, 1)
,(4, 2, 0)
,(6, 3, 1)
,(1, 2, 1)
,(10, 1, 1)
,(5, 2, 0)
,(6, 1, 0)
,(11, 1, 1)
,(14, 1, 0)
,(7, 2, 0)
,(7, 3, 1)
,(3, 3, 0)
,(8, 1, 0)
,(5, 3, 1)
,(13, 2, 0)
,(8, 3, 1)
,(13, 3, 1)
,(9, 1, 0)
,(9, 2, 1)
,(10, 2, 0)
,(11, 3, 0)
,(12, 3, 1)
,(14, 3, 1);
And SQL Fiddle of same.
I would like the data to come back like:
fixture_id
start_datetime
participant_id
is_winner
14
2021-01-19T16:00:00Z
1
0
6
2021-01-18T17:00:00Z
1
0
1
2021-01-14T15:00:00Z
1
0
5
2021-01-19T16:00:00Z
2
0
13
2021-01-14T15:00:00Z
2
0
1
2021-01-14T15:00:00Z
2
1
EDITED TO REFLECT FACT THAT DATES ARE NOT NECESSARILY SEQUENTIAL...
E.g. (for older versions of MySQL)...
SELECT x.*
, fx.start_datetime
FROM fixture_participant x
JOIN fixture fx
ON fx.fixture_id = x.fixture_id
JOIN fixture_participant y
ON y.participant_id = x.participant_id
JOIN fixture fy
ON fy.fixture_id = y.fixture_id
AND fy.start_datetime > fx.start_datetime
GROUP
BY x.fixture_id
, x.participant_id
, x.is_winner
, fx.start_datetime
HAVING COUNT(x.fixture_id) <=3
ORDER
BY participant_id,fixture_id;
...or something like that.

Ponderate average MYSQL

We have a little simulator of a tour-operator DB (MYSQL) and we are asked to get a Query that gives us the weighted avg of duration of the tours that we have.
https://en.wikipedia.org/wiki/Weighted_arithmetic_mean
Using subquery I got to this point where I have the days that each tour lasts and the weight of each tour from the total of tours, but I am stuck and don't know how to get the weighted avg from here. I know I have to use another select from the result I already got but I would appreciate some help.
SQLfiddle down here:
http://sqlfiddle.com/#!9/53d80/2
Tables and data
CREATE TABLE STAGE
(
ID INT AUTO_INCREMENT NOT NULL,
TOUR INT NOT NULL,
TYPE INT NOT NULL,
CITY INT NOT NULL,
DAYS INT NOT NULL,
PRIMARY KEY (ID)
);
CREATE TABLE TOUR
(
ID INT AUTO_INCREMENT NOT NULL,
DESCRIPTION VARCHAR(255) CHARACTER SET UTF8 COLLATE UTF8_UNICODE_CI
NOT NULL,
STARTED_ON DATE NOT NULL,
TYPE INT NOT NULL,
PRIMARY KEY (ID)
);
INSERT INTO TOUR (DESCRIPTION, STARTED_ON, TYPE) VALUES
('Mediterranian Cruise','2018-01-01',3),
('Trip to Nepal','2017-12-01',1),
('Tour in Nova York','2015-04-24',5),
('A week at the Amazones','2014-09-11',2),
('Visiting the Machu Picchu','2013-02-19',4);
INSERT INTO STAGE (TOUR, TYPE, CITY, DAYS) VALUES
(1, 1, 38254, 1),
(1, 2, 22460, 3),
(1, 2, 47940, 3),
(1, 2, 42600, 4),
(1, 3, 38254, 1),
(2, 1, 13097, 1),
(2, 2, 29785, 5),
(2, 3, 13097, 1),
(3, 1, 788, 2); ,
(3, 2, 48019, 6),
(3, 3, 788, 1),
(4, 1, 38254, 2),
(4, 2, 8703, 3);,
(4, 3, 38254, 4),
(5, 1, 10453, 1),
(5, 2, 32045, 5),
(5, 3, 10453, 2);
Query:
SELECT
AVG(TD.TOUR_DAYS) AS AVERAGE_DAYS,
COUNT(TD.TOUR_ID) AS WEIGHT
FROM
(
SELECT
TOUR.ID AS TOUR_ID,
SUM(DAYS) AS TOUR_DAYS,
COUNT(STAGE.ID) AS STAGE_DAYS
FROM
TOUR
INNER JOIN
STAGE
ON
TOUR.ID = STAGE.TOUR
GROUP BY
TOUR.ID
) AS TD
GROUP BY
TD.TOUR_DAYS
weigthed avg would be:
(1×7+1×8+2×9+1×12) / (1+1+2+1) = 9
Wheighted AVG can be calculated with SUM(value * wheight) / SUM(wheight). In your case:
SELECT SUM(AVERAGE_DAYS * WEIGHT) / SUM(WEIGHT)
FROM (
SELECT
AVG(TD.TOUR_DAYS) AS AVERAGE_DAYS,
COUNT(TD.TOUR_ID) AS WEIGHT
FROM
(
SELECT
TOUR.ID AS TOUR_ID,
SUM(DAYS) AS TOUR_DAYS,
COUNT(STAGE.ID) AS STAGE_DAYS
FROM
TOUR
INNER JOIN
STAGE
ON
TOUR.ID = STAGE.TOUR
GROUP BY
TOUR.ID
) AS TD
GROUP BY
TD.TOUR_DAYS
) sub
http://sqlfiddle.com/#!9/53d80/4
I'm not 100% sure, but it looks like the following query is doing exactly the same:
SELECT AVG(TOUR_DAYS)
FROM (
SELECT TOUR, SUM(DAYS) AS TOUR_DAYS
FROM STAGE
GROUP BY TOUR
) sub;
Or even without any subqueries:
SELECT SUM(DAYS) / COUNT(DISTINCT TOUR)
FROM STAGE;
That would mean, the requirement should be simplified to "Get average number of days per tour".

Selecting only the first two items from and order

I need you help regarding something, i have 3 tables ORDERS, ORDER_ITEM, ORDER_ITEM_LINE.
CREATE TABLE orders
(`id` int, `date` datetime)
;
INSERT INTO orders
(`id`, `date`)
VALUES
(78, '2017-01-03 00:00:00'),
(79, '2017-02-03 00:00:00'),
(80, '2017-03-03 00:00:00'),
(81, '2017-04-03 00:00:00'),
(82, '2017-05-03 00:00:00'),
(83, '2017-06-03 00:00:00'),
(84, '2017-07-03 00:00:00')
;
CREATE TABLE order_item
(`id` int, `fk_o_id` int, `sku` int)
;
INSERT INTO order_item
(`id`, `fk_o_id`, `sku`)
VALUES
(10, 78, 123),
(11, 79, 124),
(12, 79, 125),
(13, 80, 126),
(14, 82, 127),
(15, 82, 128),
(16, 82, 129)
;
CREATE TABLE order_item_line
(`id` int, `fk_oi_id` int, `line_id` int)
;
INSERT INTO order_item_line
(`id`, `fk_oi_id`, `line_id`)
VALUES
(33, 10, 1),
(34, 11, 1),
(35, 12, 2),
(36, 13, 1),
(37, 14, 1),
(38, 15, 2),
(39, 16, 3)
;
I would like to display all orders with 2 or more than 2 items but only first two so it will be line_id - 1 and 2.
The outcome should look like:
Outcome
If you have any ideas, thank you in advance.
To get the result you require, you will need to create another table. In this example I created a table called TESTQUERY and inserted data to count how many times the orders id appeared
Table creation
CREATE TABLE TESTQUERY
(`id` int, `count` int)
Data into the test table
INSERT INTO TESTQUERY
(
SELECT o.id, COUNT(o.id) as count FROM orders o
JOIN order_item oi ON oi.fk_o_id = o.id
JOIN order_item_line oil ON oil.fk_oi_id = oi.id
GROUP BY o.id
)
I then queried against all for databases using the query below and it returned your desired outcome
SELECT o.id, oi.sku, oil.line_id FROM orders o
JOIN order_item oi ON oi.fk_o_id = o.id
JOIN order_item_line oil ON oil.fk_oi_id = oi.id
JOIN TESTQUERY t ON t.id = o.id
WHERE t.count > 1 AND oil.line_id < 3
I hope this helps

MySQL sort column by values

I want to sort my database result based on a column value. I thought I needed to do this:
SELECT * FROM products
ORDER BY FIELD(brand_id, 4, 1, 6)
But this doesn't sort my result at all. Or at least in a way I don't understand.
What I want to achieve is that I get all products, ordered by brand_id. First all with brand_id 4, then 1 then 6 and then the rest.
Here is my example on jsfiddle.
If not working (jsfiddle not working fine here), see my code below:
Schema:
CREATE TABLE `products` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`brand_id` int(10) unsigned NOT NULL,
`price` decimal(8,2) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB;
INSERT INTO `products` (`id`, `brand_id`, `price`)
VALUES
(1, 1, 10.32),
(2, 4, 15.32),
(3, 2, 23.32),
(4, 6, 56.32),
(5, 4, 23.32),
(6, 4, 23.32),
(7, 1, 25.32),
(8, 5, 15.32),
(9, 3, 55.32),
(10, 5, 23.32);
The problem is here:
FIELD(brand_id, 4, 1, 6)
will return 0 if brand_id is not present in the list, so brands not present will be at the top. You can use this:
ORDER BY FIELD(brand_id, 6, 1, 4) DESC
with all of the values in reverse order, and then you have to order in DESC order.
Normally, when you use field() for sorting you also have in:
SELECT p.*
FROM products p
WHERE brand_id IN (4, 1, 6)
ORDER BY FIELD(brand_id, 4, 1, 6);
Then it works as advertised.
Instead of reversing values, you can also take a more explicit approach:
SELECT p.*
FROM products p
ORDER BY brand_id IN (4, 1, 6) DESC,
FIELD(brand_id, 4, 1, 6);
This is longer, but the intention is clear.