MySQL: Aggregating counts - mysql

I'm trying to find how many companies had sales in a specific segment. I've managed to get a count of the sales entries (5), but I can't seem to aggregate by the product segment as well. Please see this simplification:
http://sqlfiddle.com/#!9/685cb/1
CREATE TABLE Table1
(`company` text, `sales` int, `segment` text)
;
INSERT INTO Table1
(`company`, `segment`, `sales`)
VALUES
('ACME',10,100),
('ACME',11,100),
('HAL',10,25),
('HAL',13,25),
('GEN',11,50)
;
SELECT COUNT(company) AS companies,
CASE
WHEN segment IN (10, 11, 12, 13, 14, 15, 16)
THEN 'Product segment A'
WHEN segment IN (20, 21, 22)
THEN 'Product segment B'
WHEN segment IN (30)
THEN 'Product segment C'
END AS grp, SUM(sales) AS sum_sales
FROM Table1
WHERE
(company LIKE '%ACME%'
OR company LIKE '%HAL%'
OR company LIKE '%GEN%'
)
AND
segment IN (10, 11, 12, 13, 14, 15 ,16, 20, 21, 22, 30)
GROUP BY grp
ORDER BY grp
;
The goal is to get "companies" to show 3, as there are three companies that had sales in segment A.

You could use the distinct modifier in the count function to get the number of different entries:
SELECT COUNT(DISTINCT company) AS companies,
-- Here -----^
CASE
WHEN segment IN (10, 11, 12, 13, 14, 15, 16)
THEN 'Product segment A'
WHEN segment IN (20, 21, 22)
THEN 'Product segment B'
WHEN segment IN (30)
THEN 'Product segment C'
END AS grp, SUM(sales) AS sum_sales
FROM Table1
WHERE
(company LIKE '%ACME%'
OR company LIKE '%HAL%'
OR company LIKE '%GEN%'
)
AND
segment IN (10, 11, 12, 13, 14, 15 ,16, 20, 21, 22, 30)
GROUP BY grp
ORDER BY grp
;
SQLFiddle
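To see why DISTINCT matters here, a minimal check against the Table1 sample data from the question contrasts the two counts. It is only a sketch: it is restricted to the segment A codes, and the column aliases are illustrative.

```sql
-- Sketch against the question's Table1 sample data.
-- COUNT(company) counts rows; COUNT(DISTINCT company) counts unique names.
SELECT COUNT(company)          AS sales_rows,  -- 5: one per sales entry
       COUNT(DISTINCT company) AS companies    -- 3: ACME, HAL, GEN
FROM Table1
WHERE segment IN (10, 11, 12, 13, 14, 15, 16);
```

ACME and HAL each have two sales entries in segment A, which is why the plain count is 5 while the distinct count is 3.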

Related

MYSQL JOIN with lower-than / greater-than conditions (minimum_quantity, valid_from)

I have
order table with columns
id
date
supplier_id
order_lineitem table with columns
id
order_id
article_id
order_quantity
order_price
a prices table with columns
id
article_id
supplier_id
valid_until
minimum_order_quantity
list_price
The prices table doesn't necessarily have to have a matching / valid entry, so this one would have to be joined via an outer join.
I'd like to compare order_prices against list_prices.
Therefore I need to somehow join
SELECT
o.id,
o.date,
ol.article_id,
ol.order_quantity,
ol.order_price,
p.list_price
FROM
`order` o JOIN order_lineitem ol on ol.order_id = o.id
LEFT OUTER JOIN prices p on
p.article_id = ol.article_id
AND p.supplier_id = o.supplier_id
AND p.minimum_order_quantity <= ol.order_quantity
AND IFNULL(p.valid_until, DATE('2099-12-31')) >= o.date
/* here comes the fun part that doesn't work (reliably) */
ORDER BY
IFNULL(p.valid_until, DATE('2099-12-31')) asc,
p.minimum_order_quantity desc
GROUP BY o.id, ol.id, p.article_id
/* ... trying to get only THAT price from the prices table that applies for the
(a) the given article
(b) from the given supplier
(c) that was valid at the time of purchase (i.e. has the smallest "valid_until" date that is greater than the purchase date)
(d) when ordering the given quantity (prices can also increase with higher quantities, so it has to be the price with the largest minimum_order_quantity that is smaller than the ordered quantity)
*/
I particularly don't want to fall into the trap (which I dug for myself here) of using group by to limit the results to 1 record from the prices table based on a previous sorting, since
(i) as per MySQL documentation it is non-deterministic which record will actually get returned (although it may in effect often work and this is a frequently suggested route to go) - also see this excellent explanation on the issue: https://stackoverflow.com/a/14770936/9818188 and
(ii) this concept wouldn't work on other SQL implementations like SQL Server, MariaDB & Co.
The question is not around putting in a nested query in order to be able to ORDER first and then GROUP subsequently. It's more about how to really properly get the correct row--ideally also working on other SQL implementations like SQL Server, MariaDB or Google BigQuery.
And since I can't really rely on prices being cheaper the more I buy I also can't simply get the min(list_price).
How can this be achieved?
Since the output of this query is required for downstream processing, I can't slice & dice the task but need a full list of all orders with respective list prices.
EDIT
Here is a SQL fiddle - the desired prices are shown in column order_price, the prices incorrectly determined by the JOIN (excluding the ORDER BY clause - as this would cause non-deterministic results) are shown in column list_price:
http://sqlfiddle.com/#!9/f03a4f/2
CREATE TABLE `order`
(`id` int, `date` datetime, `supplier_id` int)
;
INSERT INTO `order`
(`id`, `date`, `supplier_id`)
VALUES
(1, '2022-01-15 00:00:00', 1),
(2, '2022-02-15 00:00:00', 1),
(3, '2022-03-15 00:00:00', 1),
(4, '2022-01-15 00:00:00', 2),
(5, '2022-02-15 00:00:00', 2),
(6, '2022-03-15 00:00:00', 2)
;
CREATE TABLE order_lineitem
(`id` int, `order_id` int, `article_id` int, `order_quantity` int, `order_price` int)
;
INSERT INTO order_lineitem
(`id`, `order_id`, `article_id`, `order_quantity`, `order_price`)
VALUES
(1, 1, 1, 1, 11),
(2, 1, 1, 10, 8),
(3, 1, 1, 100, 9),
(4, 2, 1, 1, 15),
(5, 2, 1, 10, 12),
(6, 2, 1, 100, 13),
(7, 3, 1, 1, 17),
(8, 3, 1, 10, 14),
(9, 3, 1, 100, 16),
(10, 4, 1, 1, 10),
(11, 4, 1, 10, 80),
(12, 4, 1, 100, 80),
(13, 5, 1, 1, 10),
(14, 5, 1, 10, 80),
(15, 5, 1, 100, 80),
(16, 6, 1, 1, 10),
(17, 6, 1, 10, 10),
(18, 6, 1, 100, 10)
;
CREATE TABLE prices
(`id` int, `article_id` int, `supplier_id` int, `valid_until` varchar(10), `minimum_order_quantity` int, `list_price` int)
;
INSERT INTO prices
(`id`, `article_id`, `supplier_id`, `valid_until`, `minimum_order_quantity`, `list_price`)
VALUES
(1, 1, 1, '2022-01-31', 1, 11),
(2, 1, 1, '2022-01-31', 10, 8),
(3, 1, 1, '2022-01-31', 100, 9),
(4, 1, 2, NULL, 1, 10),
(5, 1, 1, '2022-02-31', 1, 15),
(6, 1, 1, '2022-02-31', 10, 12),
(7, 1, 1, '2022-02-31', 100, 13),
(8, 1, 1, NULL, 1, 17),
(9, 1, 1, NULL, 10, 14),
(10, 1, 1, NULL, 100, 16),
(11, 2, 1, NULL, 1, 99),
(12, 1, 2, '2022-02-31', 10, 80)
;
SELECT
o.id,
o.supplier_id,
o.date,
ol.article_id,
ol.order_quantity,
ol.order_price,
p.list_price
FROM
`order` o JOIN order_lineitem ol on ol.order_id = o.id
LEFT OUTER JOIN prices p on
p.article_id = ol.article_id
AND p.supplier_id = o.supplier_id
AND p.minimum_order_quantity <= ol.order_quantity
AND IFNULL(p.valid_until, DATE('2099-12-31')) >= o.date
/* here comes the fun part that doesn't work (reliably) */
/* NOTE: I am purposely commenting out the ORDER BY clause here, because
(a) it would have to go after GROUP BY - requiring a nested table which I would like to prevent AND, more importantly,
(b) limiting the number of rows returned to 1 by GROUPing with an incomplete set of columns on a sorted table may return non-deterministic results as per the MySQL documentation.
see also https://stackoverflow.com/a/14770936/9818188 explaining the issue with GROUP BY in this context
#
# ORDER BY
# IFNULL(p.valid_until, DATE('2099-12-31')) asc,
# p.minimum_order_quantity desc
*/
GROUP BY o.id, ol.id, p.article_id
/* ... trying to get only THAT price from the prices table that applies for the
(a) the given article
(b) from the given supplier
(c) that was valid at the time of purchase (i.e. has the smallest "valid_until" date that is greater than the purchase date)
(d) when ordering the given quantity (prices can also increase with higher quantities, so it has to be the price with the largest minimum_order_quantity that is smaller than the ordered quantity)
*/
If you are interested in the highest list_price, you could do it like this.
If you also need other columns from the prices table, see "SQL select only rows with max value on a column", since you would then have to join the subqueries for all articles.
SELECT
o.id,
o.date,
ol.article_id,
ol.order_quantity,
ol.order_price,
(SELECT `list_price` FROM prices p WHERE
p.article_id = ol.article_id
AND p.supplier_id = o.supplier_id
AND p.minimum_order_quantity <= ol.order_quantity
AND IFNULL(p.valid_until, DATE('2099-12-31')) >= o.date
ORDER BY `list_price` DESC
LIMIT 1
) list_price
FROM
`order` o JOIN order_lineitem ol on ol.order_id = o.id
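The query above picks the highest list price. The question actually asks for the price with the earliest valid_until that is not before the order date and, within that, the largest minimum_order_quantity. A possible variation is to keep the same correlated subquery and only change its ORDER BY. This is a sketch under assumptions: valid_until is stored as a real DATE (the fiddle uses VARCHAR and contains '2022-02-31', which is not a valid date), and p.id is added as an arbitrary tie-breaker so the chosen row is deterministic.

```sql
-- Sketch: one price per line item, chosen by the question's criteria (c) and (d).
SELECT
  o.id,
  o.date,
  ol.article_id,
  ol.order_quantity,
  ol.order_price,
  (SELECT p.list_price
     FROM prices p
    WHERE p.article_id = ol.article_id
      AND p.supplier_id = o.supplier_id
      AND p.minimum_order_quantity <= ol.order_quantity
      AND IFNULL(p.valid_until, DATE('2099-12-31')) >= o.date
    ORDER BY IFNULL(p.valid_until, DATE('2099-12-31')) ASC,  -- earliest still-valid price
             p.minimum_order_quantity DESC,                  -- largest applicable quantity tier
             p.id ASC                                        -- assumed tie-breaker
    LIMIT 1
  ) AS list_price
FROM `order` o
JOIN order_lineitem ol ON ol.order_id = o.id;
```

Line items without a matching price get NULL in list_price, which mirrors the outer join in the question. LIMIT is MySQL/MariaDB syntax, so other engines would need their own row-limiting clause.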

Get previous X days of revenue for each group

Here is my table
CREATE TABLE financials (
id INT(6) UNSIGNED AUTO_INCREMENT PRIMARY KEY,
CountryID VARCHAR(30) NOT NULL,
ProductID VARCHAR(30) NOT NULL,
Revenue INT NOT NULL,
cost INT NOT NULL,
reg_date TIMESTAMP
);
INSERT INTO `financials` (`id`, `CountryID`, `ProductID`, `Revenue`, `cost`, `reg_date`) VALUES
( 1, 'Canada', 'Doe' , 20, 5, '2010-01-31 12:01:01'),
( 2, 'USA' , 'Tyson' , 40, 15, '2010-02-14 12:01:01'),
( 3, 'France', 'Keaton', 80, 25, '2010-03-25 12:01:01'),
( 4, 'France', 'Keaton',180, 45, '2010-04-24 12:01:01'),
( 5, 'France', 'Keaton', 30, 6, '2010-04-25 12:01:01'),
( 6, 'France', 'Emma' , 15, 2, '2010-01-24 12:01:01'),
( 7, 'France', 'Emma' , 60, 36, '2010-01-25 12:01:01'),
( 8, 'France', 'Lammy' ,130, 26, '2010-04-25 12:01:01'),
( 9, 'France', 'Louis' ,350, 12, '2010-04-25 12:01:01'),
(10, 'France', 'Dennis',100,200, '2010-04-25 12:01:01'),
(11, 'USA' , 'Zooey' , 70, 16, '2010-04-25 12:01:01'),
(12, 'France', 'Alex' , 2, 16, '2010-04-25 12:01:01');
For each product and date combination, I need to get the revenue for the previous 5 days. For instance, for product 'Keaton', the last purchase was on 2010-04-25, so it will only sum up revenue between 2010-04-20 and 2010-04-25, and therefore it will be 210. For "Emma", it would return 75, since it would sum everything between 2010-01-20 and 2010-01-25.
SELECT ProductID, sum(revenue), reg_date
FROM financials f
Where reg_date in (
SELECT reg_date
FROM financials as t2
WHERE t2.ProductID = f.productID
ORDER BY reg_date
LIMIT 5)
Unfortunately, when I use either https://sqltest.net/ or http://sqlfiddle.com/ it says that 'LIMIT & IN/ALL/ANY/SOME subquery' is not supported. Would my query work or not?
Your query is on the right track, but probably won't work in MySQL, which does not support LIMIT inside IN/ALL/ANY/SOME subqueries.
Instead:
SELECT f.ProductID, SUM(f.revenue)
FROM financials f JOIN
(SELECT ProductId, MAX(reg_date) as max_reg_date
FROM financials
GROUP BY ProductId
) ff
ON f.ProductId = ff.ProductId and
f.reg_date >= ff.max_reg_date - interval 5 day
GROUP BY f.ProductId;
EDIT:
If you want this for each product and date combination, then you can use a self join or correlated subquery:
SELECT f.*,
(SELECT SUM(f2.revenue)
FROM financials f2
WHERE f2.ProductId = f.ProductId AND
f2.reg_date <= f.reg_date AND
f2.reg_date >= f.reg_date - interval 5 day
) as sum_five_preceding_days
FROM financials f;
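On MySQL 8.0 or later, the same per-row sum can also be written with a window function instead of a correlated subquery. This is a sketch that assumes window functions are available (they are not in MySQL 5.x); the alias simply reuses the name from the query above.

```sql
-- Sketch for MySQL 8.0+: rolling 5-day revenue per product.
-- The RANGE frame covers every row of the same product whose reg_date lies
-- between (current reg_date - 5 days) and the current reg_date, inclusive.
SELECT f.*,
       SUM(f.Revenue) OVER (
         PARTITION BY f.ProductID
         ORDER BY f.reg_date
         RANGE BETWEEN INTERVAL 5 DAY PRECEDING AND CURRENT ROW
       ) AS sum_five_preceding_days
FROM financials f;
```

With the sample data, the Keaton row dated 2010-04-25 gets 210 (180 + 30) and the Emma row dated 2010-01-25 gets 75 (15 + 60), matching the figures in the question.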
After some trials I ended up with a somewhat complex query that I think solves your problem:
SELECT
financials.ProductID, sum(financials.Revenue) as Revenues
FROM
financials
INNER JOIN (
SELECT ProductId, GROUP_CONCAT(id ORDER BY reg_date DESC) groupedIds
FROM financials
group by ProductId
) group_max
ON financials.ProductId = group_max.ProductId
AND FIND_IN_SET(financials.id, groupedIds) BETWEEN 1 AND 5
group by financials.ProductID
First I used GROUP BY financials.ProductID to sum revenues per product. The real problem you are facing is eliminating all rows that are not in the top 5 for each group. For that I used the solution from this question, GROUP_CONCAT and FIND_IN_SET, to get the top 5 rows without LIMIT. Note that this keeps the five most recent rows per product rather than the rows from the last five days. I used a JOIN instead of WHERE ... IN, but WHERE ... IN might also work.
Here's the FIDDLE

Mysql Match and count in the same table

I'm having trouble counting products with some conditions on the same table.
Table structure:
INSERT INTO `filter` (`filter_seq_id`, `group_id`, `product_seq_id`) VALUES
(1, 1, 10),
(2, 1, 11),
(3, 1, 12),
(4, 1, 13),
(5, 2, 14),
(6, 2, 15),
(7, 2, 16),
(8, 2, 17),
(9, 3, 18),
(10, 3, 19),
(11, 3, 20),
(12, 3, 21),
(13, 4, 20),
(14, 4, 11),
(15, 4, 27),
(16, 4, 29),
(17, 5, 11),
(18, 5, 20),
(19, 5, 27),
(20, 5, 13);
Here I want to count the distinct product_seq_id values for each group_id in (1,2,3), but only if the product_seq_id also exists in both group ids (4,5).
for example:
group_id -> 1 found product_seq_id 11 in 4,5 so distinct count is 1
group_id -> 2 found nothing
group_id -> 3 found product_seq_id 20 in 4,5 so distinct count is 1
I have tried the query below, but it does not return what I expect:
it counts a product_seq_id if it exists in either one of the groups (4,5),
and I want to count it only if the product_seq_id exists in both group_id "4" and "5".
SELECT
`f`.`group_id`, count(distinct f.product_seq_id) as count
FROM
filter f
JOIN
filter ff ON `ff`.`product_seq_id` = `f`.`product_seq_id`
AND `f`.`group_id` IN (1,2,3)
AND `ff`.`group_id` IN (4,5)
GROUP BY `f`.`group_id`
http://sqlfiddle.com/#!2/996c1/1
Is this what you are looking for?
select f123.GROUP_ID, count(f123.PRODUCT_SEQ_ID)
from (select product_seq_id from filter where group_id=4) f4
inner join (select product_seq_id from filter where group_id=5) f5
on f4.product_seq_id = f5.product_seq_id
inner join (select group_id, product_seq_id from filter where group_id<4) f123
on f123.product_seq_id = f5.product_seq_id
group by GROUP_ID order by GROUP_ID
The first two subselects select all product_seq_id values that are in group 4 and in group 5, respectively. Those two lists are joined only where both contain the same product_seq_id.
The result of this is all the product_seq_id values that are in both group 4 and group 5 (in this example 11, 20 and 27).
Next, this result is joined with the last subselect, which selects all group_id and product_seq_id in groups 1 to 3. Rows are only joined if their product_seq_id is in the result of the previous join (here only 11 and 20 match, since 27 does not occur in groups 1 to 3). The result looks like this:
Group_ID, Product_Seq_ID
1 11
3 20
This result is then grouped by the Group_ID and the amount of product_seq_id in each group is counted
EDIT: Added Explanation
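The same "must be in both group 4 and group 5" condition can also be expressed with GROUP BY and HAVING instead of joining two subselects. This is only a sketch against the filter table from the question; the alias cnt is illustrative.

```sql
-- Sketch: count products per group (1-3) that appear in BOTH group 4 and group 5.
SELECT f.group_id,
       COUNT(DISTINCT f.product_seq_id) AS cnt
FROM filter f
WHERE f.group_id IN (1, 2, 3)
  AND f.product_seq_id IN (
        SELECT product_seq_id
        FROM filter
        WHERE group_id IN (4, 5)
        GROUP BY product_seq_id
        HAVING COUNT(DISTINCT group_id) = 2   -- present in 4 and in 5
      )
GROUP BY f.group_id
ORDER BY f.group_id;
```

With the sample data this returns group 1 with a count of 1 (product 11) and group 3 with a count of 1 (product 20). Group 2 has no matching product and therefore does not appear in the result.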

SQL to fetch similar "match" results by percentage

This table stores user votes between user matches. There is always one winner, one loser and the voter.
CREATE TABLE `user_versus` (
`id_user_versus` int(11) NOT NULL AUTO_INCREMENT,
`id_user_winner` int(10) unsigned NOT NULL,
`id_user_loser` int(10) unsigned NOT NULL,
`id_user` int(10) unsigned NOT NULL,
`date_versus` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id_user_versus`),
KEY `id_user_winner` (`id_user_winner`,`id_user_loser`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=17 ;
INSERT INTO `user_versus` (`id_user_versus`, `id_user_winner`, `id_user_loser`, `id_user`, `date_versus`) VALUES
(1, 6, 7, 1, '2013-10-25 23:02:57'),
(2, 6, 8, 1, '2013-10-25 23:02:57'),
(3, 6, 9, 1, '2013-10-25 23:03:04'),
(4, 6, 10, 1, '2013-10-25 23:03:04'),
(5, 6, 11, 1, '2013-10-25 23:03:10'),
(6, 6, 12, 1, '2013-10-25 23:03:10'),
(7, 6, 13, 1, '2013-10-25 23:03:18'),
(8, 6, 14, 1, '2013-10-25 23:03:18'),
(9, 7, 6, 2, '2013-10-26 04:02:57'),
(10, 8, 6, 2, '2013-10-26 04:02:57'),
(11, 9, 8, 2, '2013-10-26 04:03:04'),
(12, 9, 10, 2, '2013-10-26 04:03:04'),
(13, 9, 11, 2, '2013-10-26 04:03:10'),
(14, 9, 12, 2, '2013-10-26 04:03:10'),
(15, 9, 13, 2, '2013-10-26 04:03:18'),
(16, 9, 14, 2, '2013-10-26 04:03:18');
I'm working on a query that fetches similar profiles. A profile is similar when its voting percentage (wins vs. losses) is within +/- 10% of the specified profile's.
SELECT id_user_winner AS id_user,
IFNULL(wins, 0) AS wins,
IFNULL(loses, 0) AS loses,
IFNULL(wins, 0) + IFNULL(loses, 0) AS total,
IFNULL(wins, 0) / (IFNULL(wins, 0) + IFNULL(loses, 0)) AS percent
FROM
(
SELECT id_user_winner AS id_user FROM user_versus
UNION
SELECT id_user_loser FROM user_versus
) AS u
LEFT JOIN
(
SELECT id_user_winner, COUNT(*) AS wins
FROM user_versus
GROUP BY id_user_winner
) AS w
ON u.id_user = id_user_winner
LEFT JOIN
(
SELECT id_user_loser, COUNT(*) AS loses
FROM user_versus
GROUP BY id_user_loser
) AS l
ON u.id_user = l.id_user_loser
This is the current result:
It's currently returning NULL rows, and they shouldn't be there. What still needs to get optimized (and can't quite put my finger on it) is:
bring users similar to user ABC only
specify condition that defines who is a similar user to, e.g. user id = 6 (where similar users have +/- 10% difference in percentage with user id 6)
Any help will be appreciated. Thanks!
To calculate the wins and losses of each user without joining the table to itself or using OUTER joins, you can select wins and losses separately and combine them with UNION ALL, adding a flag to each row that says whether it represents a win or a loss for that user.
Then it's easy to sum all wins and losses for each user. The tricky part was specifying which user the profiles are compared to. I did that with a user variable that is set to the win percentage of the user with the given user_id; the hard-coded id can be replaced with a variable.
Here is my proposal (comparing to user with id = 6):
SELECT
player_id AS id_user,
wins,
losses,
wins + losses AS total,
wins / (wins + losses) AS percent
FROM (
SELECT
player_id,
SUM(is_a_win) wins,
SUM(is_a_loss) losses,
CASE
WHEN player_id = 6
THEN @the_user_score := SUM(is_a_win) / (SUM(is_a_win) + SUM(is_a_loss))
ELSE NULL
END
FROM (
SELECT id_user_winner AS player_id, 1 AS is_a_win, 0 AS is_a_loss FROM user_versus
UNION ALL SELECT id_user_loser, 0, 1 FROM user_versus
) games
GROUP BY player_id
) data
WHERE
ABS(wins / (wins + losses) - @the_user_score) <= 0.1
;
Output:
ID_USER WINS LOSSES TOTAL PERCENT
6 8 2 10 0.8
9 6 1 7 0.8571
You could of course remove the user whose profile is the base for comparison by adding player_id != 6 (or, in the final solution, some variable name) condition to the outermost WHERE clause.
Example at SQLFiddle: Matching Profiles - Example
Could you provide some feedback if this is what you were looking for, and, if not, what output would you expect?
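The MySQL documentation cautions that the order of evaluation for expressions involving user variables is undefined, so assigning a variable and reading it in the same statement can be fragile. One possible variation computes the reference percentage in a separate derived table instead. This is a sketch: the aliases stats, ref and target_percent are illustrative, and the user id 6 is hard-coded as in the proposal above.

```sql
-- Sketch: same comparison without a user variable.
SELECT
  stats.player_id AS id_user,
  stats.wins,
  stats.losses,
  stats.wins + stats.losses AS total,
  stats.wins / (stats.wins + stats.losses) AS percent
FROM (
  SELECT player_id, SUM(is_a_win) AS wins, SUM(is_a_loss) AS losses
  FROM (
    SELECT id_user_winner AS player_id, 1 AS is_a_win, 0 AS is_a_loss FROM user_versus
    UNION ALL
    SELECT id_user_loser, 0, 1 FROM user_versus
  ) games
  GROUP BY player_id
) stats
CROSS JOIN (
  -- win percentage of the reference user (one row)
  SELECT SUM(is_a_win) / COUNT(*) AS target_percent
  FROM (
    SELECT id_user_winner AS player_id, 1 AS is_a_win FROM user_versus
    UNION ALL
    SELECT id_user_loser, 0 FROM user_versus
  ) g
  WHERE player_id = 6
) ref
WHERE ABS(stats.wins / (stats.wins + stats.losses) - ref.target_percent) <= 0.1;
```

With the sample data this returns the same two users as the output above: 6 (8 wins, 2 losses) and 9 (6 wins, 1 loss).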

Select multiple rows from one table with a count of another

I am having trouble with a select statement. What I have so far is this -
SELECT COUNT(booked.desk_id),
name,
desk.desk_id,
phone,
fax,
dock,
pc
FROM desk, booked
WHERE desk.desk_id = booked.desk_id
AND booking_id >=1
AND location = "Cheltenham"
Which outputs
"12" "Desk 1" "1" "1" "0" "0" "1"
Which is close to what I want, but there is another desk in the desk table called Desk 2, which it is completely ignoring. And indeed, if there are bookings for Desk 2 it includes their count in what it is showing as a count for Desk 1...
The entire table structures are as follows:
table "booked"
INSERT INTO `booked` (`id`, `booking_id`, `desk_id`, `member_id`, `date_booked`) VALUES
(246, 1358121601, 1, 1, 'Monday 14th January at 4:40pm'),
(247, 1358121602, 1, 1, 'Monday 14th January at 4:40pm'),
(248, 1358121604, 1, 1, 'Monday 14th January at 4:40pm'),
(249, 1358121603, 1, 1, 'Monday 14th January at 4:40pm'),
(250, 1358121606, 1, 1, 'Monday 14th January at 4:40pm'),
(251, 1358121605, 1, 1, 'Monday 14th January at 4:40pm'),
(252, 1358121607, 2, 1, 'Monday 14th January at 4:40pm'),
(253, 1358121609, 2, 1, 'Monday 14th January at 4:40pm'),
(254, 1358121608, 2, 1, 'Monday 14th January at 4:40pm'),
(255, 1358121610, 2, 1, 'Monday 14th January at 4:40pm'),
(256, 1358121612, 2, 1, 'Monday 14th January at 4:40pm'),
(257, 1358121611, 2, 1, 'Monday 14th January at 4:40pm');
table "desk"
INSERT INTO `desk` (`location`, `desk_id`, `name`, `phone`, `fax`, `dock`, `pc`) VALUES
('Cheltenham', 1, 'Desk 1', 1, 0, 0, 1),
('Cheltenham', 2, 'Desk 2', 1, 1, 0, 1);
What I need help with is how to correctly structure the statement so it will output individual rows for each desk with its relevant information.
You are missing a GROUP BY to go along with your aggregate function:
SELECT COUNT(booked.desk_id),
name,
desk.desk_id,
phone,
fax,
dock,
pc
FROM desk
INNER JOIN booked
ON desk.desk_id = booked.desk_id
WHERE booking_id >=1
AND location = "Cheltenham"
GROUP BY name;
In MySQL you do not have to GROUP BY all fields in the select list (provided the ONLY_FULL_GROUP_BY SQL mode is disabled; it is enabled by default since MySQL 5.7.5), but in other RDBMS you would have to use:
SELECT COUNT(booked.desk_id),
name,
desk.desk_id,
phone,
fax,
dock,
pc
FROM desk
INNER JOIN booked
ON desk.desk_id = booked.desk_id
WHERE booking_id >=1
AND location = "Cheltenham"
GROUP BY name, desk.desk_id, phone, fax, dock, pc
Based on your sample data and comment, you can use:
SELECT coalesce(CountDesk, 0) Total,
name,
d.desk_id,
phone,
fax,
dock,
pc
FROM desk d
LEFT JOIN
(
select COUNT(booked.desk_id) CountDesk,
desk_id
from booked
WHERE booking_id >=1
GROUP BY desk_id
) b
ON d.desk_id = b.desk_id
WHERE location = "Cheltenham"
See SQL Fiddle with Demo
If you want to do this without the subquery:
SELECT
Coalesce(count(b.desk_id), 0) Total,
name,
d.desk_id,
phone,
fax,
dock,
pc
FROM desk d
LEFT JOIN booked b
ON d.desk_id = b.desk_id
AND b.booking_id >= 1 -- kept in ON: in WHERE it would drop desks that have no bookings
WHERE location = "Cheltenham"
GROUP BY name, d.desk_id, phone, fax, dock, pc ;
See SQL Fiddle with Demo