MySQL 8.0 GROUP BY / FULL_GROUP_BY - mysql

Since upgrading to MySQL 8, I have a lot of queries that don't comply with MySQL's ONLY_FULL_GROUP_BY setting. Below is a simplified version of one of the queries. I have A LOT of them, and before going through all the code I want to fully understand the problem.
I have the following records in the database:
| prod_id | prod_name | prod_new | prod_size |
| ------- | --------- | -------- | --------- |
| 1 | Product 1 | 50.00 | L |
| 2 | Product 1 | 45.00 | M |
| 3 | Product 1 | 40.00 | S |
| 4 | Product 4 | 100.00 | M |
| 5 | Product 5 | 200.00 | M |
When I ran the following query in MySQL 5.x, I got 3 results containing products 1, 4 and 5, each with the corresponding name, price and size.
SELECT prod_id, prod_name, prod_price, prod_size
FROM prod_product
GROUP BY prod_name
Since the upgrade I get the widely known error about nonaggregated columns. So I want to fix this, but in some cases the fix gives me unwanted results. Let's say, for some reason, I wanted the highest product ID.
SELECT MAX(prod_id), prod_name, ANY_VALUE(prod_price), ANY_VALUE(prod_size)
FROM prod_product
GROUP BY prod_name
This gives me product IDs 3, 4 and 5. But for product ID 3 it gives me the price and size of product ID 1.
Obviously that is unwanted behaviour. I would assume that, since prod_id is the primary-key, the database knows which values to show with the corresponding id. When I say MAX(prod_id) this already pinpoints a single record in this group, why give me values of other records from this group?
I guess I am missing something important here. =)
Thanks!

I would assume that, since prod_id is the primary-key, the database knows which values to show with the corresponding id.
Why would it know that?
Consider the following queries:
SELECT prod_name, prod_id, MIN(prod_price), MAX(prod_price)
FROM prod_product
GROUP BY prod_name
Which value should it return for prod_id here? The product that corresponds to the minimum price? Or the product that corresponds to the maximum price?
Also, what if there are multiple products that tie for the minimum or maximum price? Which one should it return?
SELECT prod_name, prod_id, AVG(prod_price)
FROM prod_product
GROUP BY prod_name
Now which prod_id should it infer? The aggregate calculation AVG() is likely to return a value that doesn't correspond to any single product.
The same happens with the aggregate SUM().
The fact is, there is no implicit correlation between an aggregate function and a specific row in the group. You should not expect SQL to guess which row from the group you mean to reference when you use non-aggregated expressions.
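If the goal is the complete row that holds the highest prod_id in each group, one common approach is to compute the maximum in a subquery and join it back to the table. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so the example is self-contained), with the sample rows from the question. The subquery alias `m` and the variable name `latest_rows` are illustrative.

```python
import sqlite3

# SQLite stands in for MySQL here; the query itself is plain SQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prod_product ("
    "prod_id INTEGER PRIMARY KEY, prod_name TEXT, prod_new REAL, prod_size TEXT)"
)
conn.executemany(
    "INSERT INTO prod_product VALUES (?, ?, ?, ?)",
    [
        (1, "Product 1", 50.0, "L"),
        (2, "Product 1", 45.0, "M"),
        (3, "Product 1", 40.0, "S"),
        (4, "Product 4", 100.0, "M"),
        (5, "Product 5", 200.0, "M"),
    ],
)

# The subquery finds the highest prod_id per name; the join then fetches
# the complete row for exactly that id, so every column comes from one row.
latest_rows = conn.execute(
    """
    SELECT p.prod_id, p.prod_name, p.prod_new, p.prod_size
    FROM prod_product AS p
    JOIN (
        SELECT MAX(prod_id) AS prod_id
        FROM prod_product
        GROUP BY prod_name
    ) AS m ON m.prod_id = p.prod_id
    ORDER BY p.prod_id
    """
).fetchall()

for row in latest_rows:
    print(row)
```

Because the outer query has no GROUP BY, it contains no nonaggregated columns for ONLY_FULL_GROUP_BY to reject.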

If you want the first record in every group of rows having the same prod_name ordered by prod_id, you can use window function ROW_NUMBER(), which is available in MySQL 8 :
SELECT x.prod_id, x.prod_name, x.prod_new, x.prod_size
FROM (
SELECT
p.prod_id, p.prod_name, p.prod_new, p.prod_size,
ROW_NUMBER() OVER(PARTITION BY p.prod_name ORDER BY p.prod_id) rn
FROM prod_product p
) x WHERE x.rn = 1
The inner query assigns a number to each record within each group, and the outer query filters in the first record in each group.
Demo on DB Fiddle :
WITH prod_product AS (
SELECT 1 prod_id, 'Product 1' prod_name, 50 prod_new, 'L' prod_size
UNION ALL SELECT 2, 'Product 1', 45, 'M'
UNION ALL SELECT 3, 'Product 1', 40, 'S'
UNION ALL SELECT 4, 'Product 4', 100, 'M'
UNION ALL SELECT 5, 'Product 5', 200, 'M'
)
SELECT x.prod_id, x.prod_name, x.prod_new, x.prod_size
FROM (
SELECT
p.prod_id, p.prod_name, p.prod_new, p.prod_size,
ROW_NUMBER() OVER(PARTITION BY p.prod_name ORDER BY p.prod_id) rn
FROM prod_product p
) x WHERE x.rn = 1;
| prod_id | prod_name | prod_new | prod_size |
| ------- | --------- | -------- | --------- |
| 1 | Product 1 | 50 | L |
| 4 | Product 4 | 100 | M |
| 5 | Product 5 | 200 | M |

Related

SQL Order results by Match Against Relevance and display the price based on sellers rank

I'm looking to display results based on the 'relevance' of the user's search, along with the price of the seller that ranks highest. A live example of what I'm after is Amazon's search results. I understand their algorithm is extremely complicated, but I'm after a simplified version.
Let's say we search for 'Jumper'. The results returned are products related to 'Jumper', but the price shown is not always the cheapest; it is based on the seller's rank. The seller with the highest rank gets his/her price displayed.
Here's what I have been working on. It is not giving me the expected results mentioned above, and to be honest I don't think it is very efficient.
SELECT a.catalogue_id, a.productTitle, a.prod_rank, b.catalogue_id, b.display_price, b.sellers_rank
FROM
(
SELECT c.catalogue_id,
c.productTitle,
MATCH(c.productTitle) AGAINST ('+jumper*' IN BOOLEAN MODE) AS prod_rank
FROM catalogue AS c
WHERE c.catalogue_id IN (1, 2, 3)
) a
JOIN
(
SELECT inventory.catalogue_id,
inventory.amount AS display_price,
(accounts.comsn + inventory.quantity - inventory.amount) AS sellers_rank
FROM inventory
JOIN accounts ON inventory.account_id = accounts.account_id
WHERE inventory.catalogue_id IN (1, 2, 3)
) AS b
ON a.catalogue_id = b.catalogue_id
ORDER BY a.prod_rank DESC
LIMIT 100;
Sample Tables:
Accounts:
----------------------------
account_id | comsn
----------------------------
1 | 100
2 | 9999
Catalogue:
----------------------------
catalogue_id | productTitle
----------------------------
1 | blue jumper
2 | red jumper
3 | green jumper
Inventory:
-----------------------------------------------
product_id | catalogue_id | account_id | quantity | amount |
-----------------------------------------------
1 | 2 | 1 | 6 | 699
2 | 2 | 2 | 2 | 2999
Expected Results:
Product Title:
red jumper
Amount:
29.99 (because he/she has sellers rank of: 7002)
First, you should limit the results of the first subquery to the rows that actually match.
Second, you should eliminate the second subquery:
SELECT p.catalogue_id, p.productTitle, p.prod_rank,
i.amount AS display_price,
(a.comsn + i.quantity - i.amount) AS sellers_rank
FROM (SELECT c.catalogue_id, c.productTitle,
MATCH(c.productTitle) AGAINST ('+jumper*' IN BOOLEAN MODE) AS prod_rank
FROM catalogue AS c
WHERE c.catalogue_id IN (1, 2, 3)
HAVING prod_rank > 0
) p JOIN
inventory i
ON i.catalogue_id = p.catalogue_id JOIN
accounts a
ON i.account_id = a.account_id
ORDER BY p.prod_rank DESC
LIMIT 100;
I'm not sure if you can get rid of the final ORDER BY. MATCH with JOIN can be a bit tricky in that respect. But only ordering by the matches should help.
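The query above still returns one row per seller. To keep only the highest-ranked seller's price for each product, a window function can number the sellers within each catalogue entry. This is a rough sketch under stated assumptions: it runs on Python's sqlite3 rather than MySQL, and a plain LIKE filter stands in for MATCH ... AGAINST (SQLite's full-text search works differently). The names `ranked`, `rn` and `best_offers` are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, comsn INTEGER);
    CREATE TABLE catalogue (catalogue_id INTEGER PRIMARY KEY, productTitle TEXT);
    CREATE TABLE inventory (
        product_id INTEGER PRIMARY KEY, catalogue_id INTEGER,
        account_id INTEGER, quantity INTEGER, amount INTEGER
    );
    INSERT INTO accounts VALUES (1, 100), (2, 9999);
    INSERT INTO catalogue VALUES (1, 'blue jumper'), (2, 'red jumper'), (3, 'green jumper');
    INSERT INTO inventory VALUES (1, 2, 1, 6, 699), (2, 2, 2, 2, 2999);
    """
)

# Number the sellers of each catalogue entry by descending sellers_rank,
# then keep only the first one (rn = 1).
best_offers = conn.execute(
    """
    SELECT productTitle, display_price, sellers_rank
    FROM (
        SELECT c.productTitle,
               i.amount AS display_price,
               a.comsn + i.quantity - i.amount AS sellers_rank,
               ROW_NUMBER() OVER (
                   PARTITION BY c.catalogue_id
                   ORDER BY a.comsn + i.quantity - i.amount DESC
               ) AS rn
        FROM catalogue AS c
        JOIN inventory AS i ON i.catalogue_id = c.catalogue_id
        JOIN accounts AS a ON a.account_id = i.account_id
        WHERE c.productTitle LIKE '%jumper%'  -- stand-in for MATCH ... AGAINST
    ) AS ranked
    WHERE rn = 1
    """
).fetchall()

print(best_offers)
```

With the sample tables this keeps the offer from account 2, whose sellers_rank is 9999 + 2 - 2999 = 7002.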

select min value of a field from joins table

Hi guys! I have three tables:
Products
Categories
Prices
A product belongs to one category and may have several prices.
Consider this set of data:
Product:
id | title | featured | category_id
1 | bread | yes | 99
2 | milk | yes | 99
3 | honey | yes | 99
Price:
id | product_id | price | quantity
1 | 1 | 99.99 | 10
2 | 1 | 150.00 | 50
3 | 2 | 33.10 | 20
4 | 2 | 10.00 | 11
I need to create a view: a full list of products that shows, for each product, its minimum price and its category.
E.g.
id | title | featured | cat.name | price | quantity
1 | bread | yes | food | 99.99 | 10
I tried the following query, but it only gets the minimum Price.price value right; Price.quantity, for example, comes from another row. I need to find the row with the minimum Price.price and take Price.quantity from that same row.
CREATE VIEW products_view
AS
SELECT `Prod`.`id`, `Prod`.`title`, `Prod`.`featured`, `Cat`.`name`, MIN(`Price`.`price`) as price,`Price`.`quantity`
FROM `products` AS `Prod`
LEFT JOIN `prices` AS `Price` ON (`Price`.`product_id` = `Prod`.`id`)
LEFT JOIN `categories` AS `Cat` ON (`Prod`.`category_id` = `Cat`.`id`)
GROUP BY `Prod`.`id`
ORDER BY `Prod`.`id` ASC
My result is:
id | title | featured | cat.name | price | quantity
1 | bread | yes | food | 99.99 | 50 <-- wrong
Can you help me ? Thx in advance !
As documented under MySQL Extensions to GROUP BY (emphasis added):
In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause. For example, this query is illegal in standard SQL because the name column in the select list does not appear in the GROUP BY:
SELECT o.custid, c.name, MAX(o.payment)
FROM orders AS o, customers AS c
WHERE o.custid = c.custid
GROUP BY o.custid;
For the query to be legal, the name column must be omitted from the select list or named in the GROUP BY clause.
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values within each group the server chooses.
What you are looking for is the group-wise minimum, which can be obtained by joining the grouped results back to the table:
SELECT Prod.id, Prod.title, Prod.featured, Cat.name, Price.price, Price.quantity
FROM products AS Prod
LEFT JOIN categories AS Cat ON Prod.category_id = Cat.id
LEFT JOIN (
prices AS Price NATURAL JOIN (
SELECT product_id, MIN(price) AS price
FROM prices
GROUP BY product_id
) t
) ON Price.product_id = Prod.id
ORDER BY Prod.id
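To see the group-wise minimum on the sample data from the question, here is a small sketch. It is not the exact query above: it assumes Python's sqlite3 as a stand-in for MySQL and replaces the NATURAL JOIN with an explicit join on product_id and price, which makes the matching columns visible. The category name 'food' is taken from the expected output in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (
        id INTEGER PRIMARY KEY, title TEXT, featured TEXT, category_id INTEGER
    );
    CREATE TABLE prices (
        id INTEGER PRIMARY KEY, product_id INTEGER, price REAL, quantity INTEGER
    );
    INSERT INTO categories VALUES (99, 'food');
    INSERT INTO products VALUES
        (1, 'bread', 'yes', 99), (2, 'milk', 'yes', 99), (3, 'honey', 'yes', 99);
    INSERT INTO prices VALUES
        (1, 1, 99.99, 10), (2, 1, 150.00, 50), (3, 2, 33.10, 20), (4, 2, 10.00, 11);
    """
)

# t holds the minimum price per product; joining it back to prices on
# (product_id, price) recovers the whole row, including its quantity.
cheapest = conn.execute(
    """
    SELECT Prod.id, Prod.title, Cat.name, Price.price, Price.quantity
    FROM products AS Prod
    LEFT JOIN categories AS Cat ON Cat.id = Prod.category_id
    LEFT JOIN (
        SELECT p.product_id, p.price, p.quantity
        FROM prices AS p
        JOIN (
            SELECT product_id, MIN(price) AS price
            FROM prices
            GROUP BY product_id
        ) AS t ON t.product_id = p.product_id AND t.price = p.price
    ) AS Price ON Price.product_id = Prod.id
    ORDER BY Prod.id
    """
).fetchall()

for row in cheapest:
    print(row)
```

Bread now comes back with quantity 10, the quantity stored next to its minimum price, and honey (which has no prices) is kept with NULLs thanks to the LEFT JOIN. If two price rows tie for the minimum, this approach returns both.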

SQL query suggestions please

I'm trying to write a SQL query to calculate prices for licenses.
Please check the schema below:
Table: Prices
| ID (bigint) | USERS (bigint) | TYPE (varchar) | PRICE (bigint) |
------------------------------------------------------------------
| 1 | 1 | other | 20 |
| 2 | 15 | local | 13.96 |
Table: Licenses
| ID (bigint) | USERID (bigint) | STATUS (varchar) | USERS (bigint) | DEVICES (bigint) | TYPE (varchar) | REQUEST_TYPE (varchar) |
--------------------------------------------------------------------------------------------------------------
| 1 | 13 | approved | 10 | 10 | local | add |
| 2 | 13 | approved | 15 | 20 | other | extend |
My objective:
Given a userid and type, I want to calculate the total price of all the licenses based on the following criteria:
For the given userid and type:
1) Get all licenses which have request_type as either new (or) extend
2) For each such license, match the number of users (USERS column) with the USERS column from the 'prices' table and calculate devices * (associated price from the prices table)
3) Sum all such prices and return a total price.
I'm trying to do this with the following query, but I haven't been successful yet:
SELECT SUM(PRICE) FROM prices
LEFT OUTER JOIN licenses
ON (
prices.users=licenses.users
AND prices.type=licenses.type
)
WHERE licenses.userid=13
AND licenses.status='approved'
AND licenses.request_type IN ('add','extend')
Please check SQL Fiddle here: http://sqlfiddle.com/#!2/05f5cf
Please help.
Thanks,
David
The result is NULL because no row satisfies the LEFT OUTER JOIN condition.
Here is the query:
SELECT SUM(PRICE) FROM prices
LEFT OUTER JOIN licenses
ON (
prices.users=licenses.users
AND prices.type=licenses.type -- no pair of rows satisfies both conditions
)
WHERE licenses.userid=13
AND licenses.status='approved'
AND licenses.request_type IN ('add','extend')
This is the data in the tables.
Table licenses:
(1, 'approved', 10, 10, 'local', 'add', 13),
(2, 'approved', 15, 20, 'other', 'extend', 13);
Table prices:
(1, 1, 'other', 20),
(2, 15, 'local', 13.96);
Your join condition is:
prices.users=licenses.users
AND prices.type=licenses.type -- no pair of rows satisfies both conditions
That means:
the licenses table has a row with type='other' and users=15,
but the prices table has no row with type='other' and users=15,
so the result is NULL.
When I change the first row of the prices table from
(1, 1, 'other', 20)
to (1, 15, 'other', 20),
the result is 20.
You need to change the first line of the query from
SELECT SUM(PRICE) FROM prices
to
SELECT IFNULL(SUM(PRICE),0) FROM prices
This returns 0 instead of NULL when no matching row is found.
From your comments and updates, I think you want the following (I'm not sure whether it's necessary to compare users in licenses with users in prices, but it seems you'd want this):
select coalesce(sum( p.users * p.price), 0)
FROM licenses l
inner join prices p
on p.type = l.type
-- and p.users = l.users
where l.status = 'approved'
and l.request_type in ('add', 'extend')
and l.userid = 13
Edit
In fact, do you need to check that type AND users are identical, or just users ?
If you need only check on users, then
inner join prices p
on p.users = l.users
If you need only check on type
inner join prices p
on p.type = l.type
If you need both, you will get 0 with your sample data.
See SqlFiddle with 3 versions.
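The three join variants can be compared side by side on the sample data. The sketch below makes two assumptions of its own: it runs on Python's sqlite3 instead of MySQL, and it multiplies `l.devices * p.price`, following step 2 of the question ("devices * associated price"). The helper `total_price` and the variable names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE prices (id INTEGER PRIMARY KEY, users INTEGER, type TEXT, price REAL);
    CREATE TABLE licenses (
        id INTEGER PRIMARY KEY, status TEXT, users INTEGER, devices INTEGER,
        type TEXT, request_type TEXT, userid INTEGER
    );
    INSERT INTO prices VALUES (1, 1, 'other', 20), (2, 15, 'local', 13.96);
    INSERT INTO licenses VALUES
        (1, 'approved', 10, 10, 'local', 'add', 13),
        (2, 'approved', 15, 20, 'other', 'extend', 13);
    """
)

FILTER = """
    WHERE l.userid = 13
      AND l.status = 'approved'
      AND l.request_type IN ('add', 'extend')
"""


def total_price(join_condition):
    """Sum devices * price over approved licenses, for a fixed join condition."""
    sql = (
        "SELECT COALESCE(SUM(l.devices * p.price), 0) "
        "FROM licenses AS l JOIN prices AS p ON " + join_condition + FILTER
    )
    return conn.execute(sql).fetchone()[0]


by_type = total_price("p.type = l.type")
by_users = total_price("p.users = l.users")
by_type_and_users = total_price("p.type = l.type AND p.users = l.users")

print(by_type, by_users, by_type_and_users)
```

Joining on type alone pairs each license with the price of its type (10 * 13.96 + 20 * 20), joining on users alone matches only license 2 (20 * 13.96), and requiring both matches nothing, so COALESCE turns the NULL sum into 0.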

complex sql query (GROUP BY)

I need some help building a query.
Here is what I need :
I have a table called data:
ID| PRODUCT | VALUE |COUNTRY| DEVICE | SYSTEM
-----+---------+-------+-------+---------+--------
48 | p1 | 0.4 | US | dev1 | system1
47 | p2 | 0.67 | IT | dev2 | system2
46 | p3 | 1.2 | GB | dev3 | system3
45 | p1 | 0.9 | ES | dev4 | system4
44 | p1 | 0.6 | ES | dev4 | system1
I need to show which products have produced the most revenue and which country, device and system contributed the most.
For example, the result I would get from the table would be:
PRODUCT | TOTAL COST |COUNTRY| DEVICE | SYSTEM
-------+------------+-------+---------+--------
p1 | 1.9 | ES | dev4 | system1
p2 | 0.67 | IT | dev2 | system2
p3 | 1.2 | GB | dev3 | system3
Top country is ES because ES contributed 0.9 + 0.6 = 1.5 > 0.4 (the contribution of US).
The same logic applies to the top device and top system.
I guess for total revenue and product something like this will do :
SELECT SUM(value) as total_revenue,product FROM data GROUP BY product
But how can I add country,device and system?
Is this even feasible in a single query, if not what is the best way (performance wise) to do it?
Many thanks for your help.
EDIT
I edited the sample table to explain better.
Do it in separate queries:
SELECT country,
SUM(value) AS amount
FROM data
GROUP BY country -- change country to device, system, etc. as required (here and in the SELECT list)
ORDER BY amount DESC
LIMIT 1
You are correct... it is not just a simple query... but 3 queries wrapped into one result.
I've posted my sample out on SQL Fiddle here...
First query -- the inner most. You need to get all revenue based on a per product/country and sort that by the product and DESCENDING on the total revenue to have highest revenue in first position per product.
Next query (where I've used MySQL @variables). Since the first result is already ordered by product and revenue, I reset the rank to 1 every time the product changes from whatever "@LastProd" is... This would create ES = Rank #1 for product 1, then US = Rank #2 for product 1, then continue on the other "products".
The final outermost query re-joins back to the raw Data table but gets a list of all the devices and systems that comprised the product sale in question, but ONLY where the product rank was #1.
select
pqRank.product,
pqRank.country,
pqRank.revenue,
group_concat( distinct d2.device ) as PartDevices,
group_concat( distinct d2.system ) as PartSystems
from
( select
pq.product,
pq.country,
pq.revenue,
@RevenueRank := if( @LastProd = pq.product, @RevenueRank +1, 1 ) as ProdRank,
@LastProd := pq.product
from
( select
d.product,
d.country,
sum( d.value ) as Revenue
from
data d
group by
d.product,
d.country
order by
d.product,
Revenue desc ) pq,
( select @RevenueRank := 0,
@LastProd := ' ') as sqlvars
) pqRank
JOIN data d2
on pqRank.product = d2.product
and pqRank.country = d2.country
where
pqRank.ProdRank = 1
group by
pqRank.product,
pqRank.country
You could do something like this:
CREATE TABLE data
(
id int auto_increment primary key,
product varchar(20),
country varchar(4),
device varchar(20),
system varchar(20),
value decimal(5,2)
);
INSERT INTO data (product, country, device, system, value)
VALUES
('p1', 'US', 'dev1', 'system1', 0.4),
('p2', 'IT', 'dev2', 'system2', 0.67),
('p1', 'IT', 'dev1', 'system2', 0.23);
select 'p' as grouping_type, product, sum(value) as sumval
from data
group by product
union all
select 'c' as grouping_type, country, sum(value) as sumval
from data
group by country
union all
select 'd' as grouping_type, device, sum(value) as sumval
from data
group by device
union all
select 's' as grouping_type, system, sum(value) as sumval
from data
group by system
order by grouping_type, sumval
It's ugly, I wouldn't use it, but it should work.
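With window functions (available in MySQL 8), the top contributor per product can also be found without user variables: sum the value per (product, dimension) pair, number the pairs within each product by descending sum, and keep number 1. The sketch below is an illustration under assumptions, not one of the answers above: it uses Python's sqlite3 as a stand-in for MySQL and the sample rows from the question, and the helper `top_by` is a made-up name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data (id INTEGER PRIMARY KEY, product TEXT, value REAL, "
    "country TEXT, device TEXT, system TEXT)"
)
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?, ?, ?)",
    [
        (48, "p1", 0.4, "US", "dev1", "system1"),
        (47, "p2", 0.67, "IT", "dev2", "system2"),
        (46, "p3", 1.2, "GB", "dev3", "system3"),
        (45, "p1", 0.9, "ES", "dev4", "system4"),
        (44, "p1", 0.6, "ES", "dev4", "system1"),
    ],
)


def top_by(column):
    """Return {product: value of `column` with the largest summed value}."""
    # The column name is interpolated into the SQL, so restrict it to a whitelist.
    if column not in ("country", "device", "system"):
        raise ValueError(column)
    sql = f"""
        SELECT product, {column}
        FROM (
            SELECT product, {column},
                   ROW_NUMBER() OVER (
                       PARTITION BY product ORDER BY total DESC
                   ) AS rn
            FROM (
                SELECT product, {column}, SUM(value) AS total
                FROM data
                GROUP BY product, {column}
            ) AS sums
        ) AS ranked
        WHERE rn = 1
    """
    return dict(conn.execute(sql).fetchall())


top_country = top_by("country")
top_device = top_by("device")
top_system = top_by("system")

print(top_country, top_device, top_system)
```

For p1 this picks ES, dev4 and system1, matching the expected result in the question. Ties are broken arbitrarily by ROW_NUMBER(), so an extra ORDER BY column would be needed if a deterministic winner matters.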

How to add column totals to view

I'm constructing a SQL query for a business report (using MySQL). What I would like to do is create a table that looks like the following:
Product | Quantity | Price | Total
widget1 | 3 | 1.00 | 3.00
widget1 | 1 | 1.00 | 1.00
widget1 | 2 | 1.00 | 2.00
widget1 | 3 | 1.00 | 3.00
Total | 9 | 1.00 | 9.00
I can write a query to output everything except the last line of the table. Is this possible? If so how would one implement it?
I have tried some of the answers below with the following query but it doesn't work. I must be missing something fundamental.
SELECT uc_order_products.nid AS nid,
uc_orders.order_id AS 'order_id',
first_name.value AS 'firstname',
last_name.value AS 'lastname',
uc_order_products.title AS 'program',
uc_order_products.qty AS 'quantity',
uc_order_products.price AS 'price',
(uc_order_products.qty * uc_order_products.price) AS 'total',
sum(uc_order_products.qty) AS 'total quantity',
sum(uc_order_products.qty * uc_order_products.price) AS 'total revenue'
FROM profile_values first_name
INNER JOIN profile_values last_name ON first_name.uid = last_name.uid
LEFT JOIN uc_orders uc_orders ON uc_orders.uid = first_name.uid
LEFT JOIN uc_order_products uc_order_products ON uc_orders.order_id = uc_order_products.order_id
WHERE uc_orders.order_status IN ('completed')
AND first_name.fid =5
AND last_name.fid =6
AND COALESCE(:nid,nid) = nid
GROUP BY uc_order_products.nid WITH ROLLUP
I suspect that I can't use group by with rollup within the query that creates reporting table. How would I wrap the query to produce the desired result?
Thanks
I have had a little attempt at this, mainly because i hadn't heard of WITH ROLLUP (thanks biziclop) and I wanted to try it out.
CREATE TABLE test.MyTable(
product TEXT(10),
quantity NUMERIC,
price NUMERIC
);
INSERT INTO MyTable VALUES
("widget1", 3, 1),
("widget1", 1, 1),
("widget1", 2, 1),
("widget1", 3, 1);
SELECT
Product,
Quantity,
Price,
Total
FROM
(
SELECT
rownum,
COALESCE(Product, 'Total') AS Product,
Quantity,
Price,
(Quantity * Price) AS Total
FROM
(
SELECT
@rownum:=@rownum+1 rownum,
Product,
SUM(Quantity) AS Quantity,
Price AS Price
FROM
MyTable,
(SELECT @rownum:=0) r
GROUP BY
product, rownum
WITH ROLLUP
)
AS myalias
) AS myalias2
WHERE rownum IS NOT NULL
OR Product = 'Total'
Outputs:
I'm giving up now, but i am looking forward to seeing how a pro does it!
try this:
SELECT
product,
COUNT(product) AS quantity,
SUM(price) price
FROM product
GROUP BY product WITH ROLLUP
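If the report needs every detail row followed by a single total line, as in the table from the question, another option is to append the totals with UNION ALL instead of relying on WITH ROLLUP. The sketch below is only an illustration: it uses Python's sqlite3 (which has no WITH ROLLUP) as a stand-in for MySQL, a made-up table name `sales`, and MAX(price) for the price shown on the total line, which is an assumption that only makes sense when all rows share one price.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT, "
    "quantity INTEGER, price REAL)"
)
conn.executemany(
    "INSERT INTO sales (product, quantity, price) VALUES (?, ?, ?)",
    [("widget1", 3, 1.0), ("widget1", 1, 1.0), ("widget1", 2, 1.0), ("widget1", 3, 1.0)],
)

# Detail rows and the single total row are combined with UNION ALL.
# is_total and sort_id exist only to keep the total line at the bottom.
report = conn.execute(
    """
    SELECT product, quantity, price, total
    FROM (
        SELECT id AS sort_id, 0 AS is_total,
               product, quantity, price, quantity * price AS total
        FROM sales
        UNION ALL
        SELECT NULL, 1, 'Total', SUM(quantity), MAX(price), SUM(quantity * price)
        FROM sales
    ) AS combined
    ORDER BY is_total, sort_id
    """
).fetchall()

for row in report:
    print(row)
```

The last row printed is the total line with quantity 9 and total 9.0, after the four widget1 rows in insertion order.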