Getting a COUNT() from a GROUP BY - mysql

I've got an incredibly convoluted SQL query (three INNER JOINS) that results in an easy to read result set as follows (simplified). I have inherited the db, so it's impossible to change the structure of any of the existing tables, and therefore I have to perform the convoluted query to get to this point:
product_category | product_code
------------------------------------
Hardware         | 102
Hardware         | 104
Hardware         | 102
Software         | 205
Software         | 104
If I then simply do a GROUP BY product_category, product_code, I get most of the final result set I'm interested in:
product_category | product_code
------------------------------------
Hardware         | 102
Hardware         | 104
Software         | 205
Software         | 104
However, what's missing is the number in stock:
product_category | product_code | num_in_stock
--------------------------------------------------------
Hardware         | 102          | 2
Hardware         | 104          | 1
Software         | 205          | 1
Software         | 104          | 1
Since I want to be able to COUNT() directly from the processing done by the GROUP BY statement, I'm a little lost.
Here is the SQL query thus far:
SELECT categories.product_category, codes.product_code FROM stock
INNER JOIN products ON stock.product_id = products.id
INNER JOIN codes ON products.code_id = codes.id
INNER JOIN categories ON codes.category_id = categories.id
GROUP BY categories.product_category, codes.product_code
The tables are as follows:
CATEGORIES - e.g., "Hardware", "Software"
CODES - e.g., 100, 204 (belongs to a category)
PRODUCTS - combinations of categories + codes, with a useless version #
STOCK - entries of products, if more than one is in stock, there are multiple entries
So the reason this is so messy is because of the useless version # field in PRODUCTS. What this means is that for a particular combo (e.g., "Hardware 102") it can be entered in PRODUCTS multiple times, each with different version # values, which will then cause STOCK to refer to different ids from PRODUCTS, even though, to me, it's the same product. Ugh!
Any ideas?
Edit:
So let's say there's a product "Misc 999" that has two different versions. This means that there will be an entry in CATEGORIES of "Misc", one in CODES of "999" (with a category_id of that belonging to "Misc"), and two entries in PRODUCTS (both with the same code_id but with different version info, which I'm ignoring).
Then, if we have 10 of these in stock (3 of one version and 7 of the other, but I'm ignoring version info) there will be 10 entries in the STOCK table, each of which will refer to the PRODUCTS table through an id (two different ids, in this case).

Just add count(*) to your select clause:
SELECT categories.product_category, codes.product_code, count(*) qty_in_stock
FROM stock
INNER JOIN products ON stock.product_id = products.id
INNER JOIN codes ON products.code_id = codes.id
INNER JOIN categories ON codes.category_id = categories.id
GROUP BY categories.product_category, codes.product_code
SQLFiddle here.
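To check the count(*) version against the expected result set from the question, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The column types, the version column, and the sample rows are assumptions made up for illustration; only the table and column names used in the joins come from the question.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, product_category TEXT);
    CREATE TABLE codes (id INTEGER PRIMARY KEY, product_code INTEGER, category_id INTEGER);
    CREATE TABLE products (id INTEGER PRIMARY KEY, code_id INTEGER, version TEXT);
    CREATE TABLE stock (id INTEGER PRIMARY KEY, product_id INTEGER);

    INSERT INTO categories VALUES (1, 'Hardware'), (2, 'Software');
    INSERT INTO codes VALUES (1, 102, 1), (2, 104, 1), (3, 205, 2), (4, 104, 2);
    -- Hardware 102 exists in two versions, so it has two product ids.
    INSERT INTO products VALUES
        (1, 1, 'v1'), (2, 1, 'v2'), (3, 2, 'v1'), (4, 3, 'v1'), (5, 4, 'v1');
    -- One stock row per physical item.
    INSERT INTO stock (product_id) VALUES (1), (3), (2), (4), (5);
""")

# COUNT(*) counts the joined stock rows that fall into each group.
rows = conn.execute("""
    SELECT categories.product_category, codes.product_code, COUNT(*) AS qty_in_stock
    FROM stock
    INNER JOIN products ON stock.product_id = products.id
    INNER JOIN codes ON products.code_id = codes.id
    INNER JOIN categories ON codes.category_id = categories.id
    GROUP BY categories.product_category, codes.product_code
    ORDER BY categories.product_category, codes.product_code
""").fetchall()

for row in rows:
    print(row)
```

Because the grouping is on category and code rather than on products.id, the two versions of Hardware 102 collapse into a single row with a count of 2.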

It's not entirely clear what you want, but perhaps this works:
SELECT categories.product_category
, codes.product_code
, SUM(num_in_stock) as num_in_stock
FROM (
SELECT product_id
, count(*) as num_in_stock
FROM stock
group by product_id
) a
INNER JOIN products
ON a.product_id = products.id
INNER JOIN codes
ON products.code_id = codes.id
INNER JOIN categories
ON codes.category_id = categories.id
GROUP BY categories.product_category
, codes.product_code
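As a rough check of the pre-aggregated variant, the sketch below replays the "Misc 999" example from the question (3 items of one version, 7 of the other) in an in-memory SQLite database. SQLite is only a stand-in for MySQL, and the column types and the version column are assumptions.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; the schema details are assumed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (id INTEGER PRIMARY KEY, product_category TEXT);
    CREATE TABLE codes (id INTEGER PRIMARY KEY, product_code INTEGER, category_id INTEGER);
    CREATE TABLE products (id INTEGER PRIMARY KEY, code_id INTEGER, version TEXT);
    CREATE TABLE stock (id INTEGER PRIMARY KEY, product_id INTEGER);

    INSERT INTO categories VALUES (1, 'Misc');
    INSERT INTO codes VALUES (1, 999, 1);
    -- Two versions of the same category/code combination.
    INSERT INTO products VALUES (1, 1, 'v1'), (2, 1, 'v2');
""")
# 3 items of version 1 and 7 items of version 2 are in stock.
conn.executemany(
    "INSERT INTO stock (product_id) VALUES (?)",
    [(1,)] * 3 + [(2,)] * 7,
)

# Count per product id first, then sum those counts per category/code.
misc_rows = conn.execute("""
    SELECT categories.product_category,
           codes.product_code,
           SUM(a.num_in_stock) AS num_in_stock
    FROM (
        SELECT product_id, COUNT(*) AS num_in_stock
        FROM stock
        GROUP BY product_id
    ) a
    INNER JOIN products ON a.product_id = products.id
    INNER JOIN codes ON products.code_id = codes.id
    INNER JOIN categories ON codes.category_id = categories.id
    GROUP BY categories.product_category, codes.product_code
""").fetchall()

print(misc_rows)
```

The inner query yields one row per product id (3 and 7), and the outer SUM folds them into a single row for Misc 999.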

Related

Is there an SQL command to count frequencies of a value in different columns?

I have a very large dataset of donations to educational projects. I have done some processing and for this question there are three tables of interest: Project, Funding and Category.
Project contains the project ID, some other negligible info (e.g. date started), and the category ID the project belongs to. Projects can belong to one or two categories, and so there are two columns for each project. If a project only belongs to one category, category 2 is NULL. In total there are 8 categories, with category ID's going from 1 to 8.
Funding contains the project ID, some other negligible info (e.g. total cost), and the current status of the project. This is either 'fully funded' or 'expired', as all projects are done.
Category only contains 2 columns, one with the 8 category ID's and the other with the category names (1 - Sports, 2 - Science, etc).
*Project*
project_id category_id1 category_id2
... ... ...
... ... ...
*Funding*
project_id status
... ...
... ...
*Category*
Category_ID project_category
... ...
... ...
I'm now trying to find out for each category the percentage of those fully funded, which would be (fully funded) / (fully funded + expired). However, I can't seem to find a way to make SQL count instances for each category regardless of whether they are in category column 1 or category column 2 of 'Project' table. This is the code I have so far with its output:
SELECT project_category, status, count(project_category)
FROM Project
INNER JOIN Category ON Project.Category_ID1 = Category.Category_ID
INNER JOIN Funding ON Project.project_id = Funding.project_id
GROUP BY project_category, status
project_category status count(project_category)
Applied Learning Expired 4003
Applied Learning Fully Funded 11441
Essentials Expired 16
Essentials Fully Funded 219
Health & Sports Expired 1235
Health & Sports Fully Funded 4518
... .... ...
... .... ...
This output only counts the categories from project.category_id1. I could just make another table for project.category_id2 and add them up manually, but I would rather have it one table. Is there a way to do this?
Thanks for trying to help!!
You can unpivot and then aggregate:
SELECT c.project_category, f.status, count(*)
FROM (SELECT p.project_id, p.category_id1 AS Category_ID FROM Project p
UNION ALL
SELECT p.project_id, p.category_id2 AS Category_ID FROM Project p
) p JOIN
Category c
ON p.Category_ID = c.Category_ID JOIN
Funding f
ON p.project_id = f.project_id
GROUP BY c.project_category, f.status;
Note that this also introduces table aliases and qualifies all column references.
Here is a db<>fiddle.
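The question ultimately asks for the share of fully funded projects per category, so here is a small sketch of the unpivot-then-aggregate idea taken one step further. It uses Python's sqlite3 as a stand-in for the real database; the four sample projects, the two category names, and the exact status spellings are assumptions.

```python
import sqlite3

# In-memory SQLite stand-in; all sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Category (Category_ID INTEGER PRIMARY KEY, project_category TEXT);
    CREATE TABLE Project (project_id INTEGER PRIMARY KEY,
                          category_id1 INTEGER, category_id2 INTEGER);
    CREATE TABLE Funding (project_id INTEGER PRIMARY KEY, status TEXT);

    INSERT INTO Category VALUES (1, 'Sports'), (2, 'Science');
    INSERT INTO Project VALUES (1, 1, NULL), (2, 1, 2), (3, 2, NULL), (4, 2, 1);
    INSERT INTO Funding VALUES
        (1, 'Fully Funded'), (2, 'Expired'), (3, 'Expired'), (4, 'Fully Funded');
""")

# Stack both category columns into one, then aggregate per category.
# Rows with a NULL second category drop out at the join to Category.
funding_rows = conn.execute("""
    SELECT c.project_category,
           SUM(CASE WHEN f.status = 'Fully Funded' THEN 1 ELSE 0 END) AS funded,
           COUNT(*) AS total
    FROM (SELECT project_id, category_id1 AS category_id FROM Project
          UNION ALL
          SELECT project_id, category_id2 AS category_id FROM Project) p
    JOIN Category c ON c.Category_ID = p.category_id
    JOIN Funding f ON f.project_id = p.project_id
    GROUP BY c.project_category
    ORDER BY c.project_category
""").fetchall()

for name, funded, total in funding_rows:
    print(f"{name}: {100 * funded / total:.1f}% fully funded")
```

A project with two categories is counted once in each of them, which matches the "regardless of column" requirement in the question.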

SQL query involving comparison of sets

Background
Products can be sold as bundles. The following tables are present: products, bundles, bundles_products, orders, orders_products.
An order would be said to "contain" a bundle if it contains all the bundle's products.
Problem
How would one go about counting orders for bundles?
Example
products table
id name
1 broom
2 mug
3 spoon
4 candle
bundles table
id name
1 dining
2 witchcraft
bundles_products table
bundle_id product_id
1 2
1 3
2 1
2 4
orders_products table
order_id product_id
1000 1
1000 3
1001 1
1001 2
1001 3
The query would return the following table:
bundle orders
dining 1
witchcraft 0
Notes
The example intentionally misses the orders table as it is not relevant what it contains.
Of course, this could be approached imperatively, by writing some code and gathering the data, but I was hoping there is a declarative, SQL way of querying for this kind of things?
One idea I had was to use a GROUP_CONCAT to concatenate all the products of a bundle and somehow compare that with products of each order. Still, a long way from clear.
One way is to use two derived tables (subqueries). In the first subquery, we fetch the number of unique products in every bundle. In the second subquery, we fetch, for every combination of order and bundle, the number of the order's products that belong to that bundle.
We then LEFT JOIN them on bundle_id and additionally require the per-order count to match the bundle's total product count. Finally, we group by bundle and count the orders that matched.
SELECT dt1.id AS bundle_id,
dt1.name AS bundle,
Count(dt2.order_id) AS orders
FROM (SELECT b.id,
b.name,
Count(DISTINCT bp.product_id) AS total_bundle_products
FROM bundles AS b
JOIN bundles_products AS bp
ON bp.bundle_id = b.id
GROUP BY b.id,
b.name) AS dt1
LEFT JOIN (SELECT op.order_id,
bp.bundle_id,
Count(DISTINCT op.product_id) AS order_bundle_products
FROM orders_products AS op
JOIN bundles_products AS bp
ON bp.product_id = op.product_id
GROUP BY bp.bundle_id,
op.order_id) AS dt2
ON dt2.bundle_id = dt1.id
AND dt2.order_bundle_products = dt1.total_bundle_products
GROUP BY dt1.id,
dt1.name
SQL Fiddle DEMO
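To see the two-derived-table approach run end to end, here is a sketch that loads the sample data from the question into an in-memory SQLite database (a stand-in for MySQL; the column types are assumed) and executes the same query.

```python
import sqlite3

# In-memory SQLite stand-in, loaded with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bundles (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE bundles_products (bundle_id INTEGER, product_id INTEGER);
    CREATE TABLE orders_products (order_id INTEGER, product_id INTEGER);

    INSERT INTO products VALUES (1, 'broom'), (2, 'mug'), (3, 'spoon'), (4, 'candle');
    INSERT INTO bundles VALUES (1, 'dining'), (2, 'witchcraft');
    INSERT INTO bundles_products VALUES (1, 2), (1, 3), (2, 1), (2, 4);
    INSERT INTO orders_products VALUES (1000, 1), (1000, 3), (1001, 1), (1001, 2), (1001, 3);
""")

bundle_orders = conn.execute("""
    SELECT dt1.id AS bundle_id, dt1.name AS bundle, COUNT(dt2.order_id) AS orders
    FROM (SELECT b.id, b.name,
                 COUNT(DISTINCT bp.product_id) AS total_bundle_products
          FROM bundles AS b
          JOIN bundles_products AS bp ON bp.bundle_id = b.id
          GROUP BY b.id, b.name) AS dt1
    LEFT JOIN (SELECT op.order_id, bp.bundle_id,
                      COUNT(DISTINCT op.product_id) AS order_bundle_products
               FROM orders_products AS op
               JOIN bundles_products AS bp ON bp.product_id = op.product_id
               GROUP BY bp.bundle_id, op.order_id) AS dt2
           ON dt2.bundle_id = dt1.id
          AND dt2.order_bundle_products = dt1.total_bundle_products
    GROUP BY dt1.id, dt1.name
    ORDER BY dt1.id
""").fetchall()

for row in bundle_orders:
    print(row)
```

Order 1001 contains both the mug and the spoon, so it counts for "dining"; no order contains both the broom and the candle, so the LEFT JOIN leaves "witchcraft" with a count of 0.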
Here's a brief example for a single bundle (bundle_id = 1). I don't know the precise database structure, so treat the table and column names as a starting point. The logic is:
A derived table is generated with 3 columns: the order, the count of the order's
products that belong to the bundle, and the count of products in the bundle
Then we select only the orders from this table in which those last two
values are equal
SELECT COUNT(*) AS orders
FROM (
SELECT op.order_id,
COUNT(DISTINCT op.product_id) AS order_total,
(SELECT COUNT(*) FROM bundles_products WHERE bundle_id = 1) AS bundle_amount
FROM orders_products AS op
JOIN bundles_products AS bp ON bp.product_id = op.product_id
WHERE bp.bundle_id = 1
GROUP BY op.order_id
) AS my_table
WHERE my_table.bundle_amount = my_table.order_total
Edit: I posted this as a response to a previous version of the question, without a detailed explanation.
Edit2: fixed the query a bit. It can be a starting point. The logic is still the same; you can use it to get the number of orders for each bundle_id.

SQL query to get results between 2 tables, and the second one has 3 possibilities of returning data

Even though my question was flagged as having a similar title, I couldn't find a similar problem here. Let me explain in detail:
I've got two tables (I'm working with MySQL) with these values inserted:
table products:
id name
1 TV
2 RADIO
3 COMPUTER
table sales (product_id is a FK which references products(id)):
id quantity product_id
1 50 2
2 100 3
3 200 3
The TVs haven't been sold, radios got 1 sale (of 50 units) and computers got two sales (one of 100 and the other of 200 units).
Now I must create a query that shows the products and their sales, but there are some conditions that make the task difficult:
1 - If there are no sales, show NULL;
2 - If there's 1 sale, show that sale;
3 - If there's more than 1 sale, show the latest sale (I tried to use MAX(id) to keep it simple, but it didn't work);
In the tables example above, I expect to show this, after a proper SQL Query:
products.NAME sales.QUANTITY
TV NULL
RADIO 50
COMPUTER 200
I've been trying lots of joins, inner joins, etc., but couldn't find the result I expect. Which SQL query can give the answer I expect?
Any help will be very appreciated.
Thanks.
Hope the below query works.
SELECT products.name, sl.quantity
FROM products LEFT JOIN (
SELECT product_id, max(quantity) as quantity FROM sales GROUP BY product_id) sl
ON products.id = sl.product_id
In MySQL 8.0 you can do:
with m (product_id, max_id) as ( -- This is a CTE
select product_id, max(id) from sales group by product_id
)
select
p.name,
s.quantity
from products p
left join m on m.product_id = p.id
left join sales s on s.id = m.max_id
If you have an older MySQL, you can use a Table Expression:
select
p.name,
s.quantity
from products p
left join ( -- This is a table expression
select product_id, max(id) as max_id from sales group by product_id
) m on m.product_id = p.id
left join sales s on s.id = m.max_id
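For a quick check of the "latest sale per product" query, here is a sketch that loads the sample rows from the question into an in-memory SQLite database and runs the derived-table version. SQLite is a stand-in for MySQL here, and "latest" is assumed to mean the highest sales.id, as in the answers above.

```python
import sqlite3

# In-memory SQLite stand-in, loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales (
        id INTEGER PRIMARY KEY,
        quantity INTEGER,
        product_id INTEGER REFERENCES products(id)
    );
    INSERT INTO products VALUES (1, 'TV'), (2, 'RADIO'), (3, 'COMPUTER');
    INSERT INTO sales VALUES (1, 50, 2), (2, 100, 3), (3, 200, 3);
""")

# m holds the id of the latest sale per product; the second LEFT JOIN
# fetches that sale's quantity. Products without sales keep NULL.
latest_sales = conn.execute("""
    SELECT p.name, s.quantity
    FROM products p
    LEFT JOIN (
        SELECT product_id, MAX(id) AS max_id
        FROM sales
        GROUP BY product_id
    ) m ON m.product_id = p.id
    LEFT JOIN sales s ON s.id = m.max_id
    ORDER BY p.id
""").fetchall()

for name, quantity in latest_sales:
    print(name, quantity)
```

Both joins are LEFT JOINs, so the TV row survives with a NULL quantity (None in Python) instead of disappearing.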

MYSQL Query search in relationship

For the sake of clarity I will rename the tables so that it is a bit clearer for everybody, and explain what I want to achieve:
There is an input form with options that return category IDs. I want to find each 'Product' that has one or more categories, all of which are inside the array that is passed from the form.
Products table
ID Title
1 Pizza
2 Ice Cream
Categories table
ID Title
1 Baked food
2 Hot food
ProductsCategories table
ID ProductId CategoryId
1 1 1
2 1 2
So if I pass [1,2] the query should return the Product with id 1, since all of its ProductsCategories are inside the requested array, but if I pass only 1 or 2, the query should return no results.
Currently I have the following query, which works, but for some reason if I create a second Product and give it a ProductCategory with the same CategoryId as the first product, the query returns null...
SELECT products.*
FROM products
JOIN products_categories
ON products_categories.product_id= products.id
WHERE products_categories.category_id IN (1, 2)
HAVING COUNT(*) = (select count(*) from products_categories pc
WHERE pc .product_id = products.id)
All help is deeply appreciated! Cheers!
To match all values in the IN clause, you also need to know the number of categories passed and use it in the HAVING clause:
SELECT
p.*,
GROUP_CONCAT(c.title) AS categories
FROM
Products p
INNER JOIN ProductsCategories pc ON pc.productId = p.ID
INNER JOIN Categories c ON c.ID = pc.categoryId
WHERE
pc.categoryId IN (1,2)
GROUP BY
p.id
HAVING
COUNT(DISTINCT pc.categoryId) = 2 -- this is # of unique categories in IN clause
So in case IN (1,2) result is:
+----+-------+---------------------+
| id | title | categories |
+----+-------+---------------------+
| 1 | Pizza | Baked Food,Hot Food |
+----+-------+---------------------+
1 row in set
In case IN (1,3) result is Empty set (no results).
@mitkosoft, thanks for your answer, but sadly the query does not produce the needed results. If the product's categories are only partially in the passed categories, the product is still returned. Additionally, I might not know how many parameters are sent by the form.
Luckily I managed to create a query that does the trick and works perfectly fine (at least so far):
SELECT products.*,
COUNT(*) as resultsCount,
(SELECT COUNT(*) FROM products_categories pc WHERE pc.product_id = products.id) as categoriesCount
FROM products
JOIN products_categories AS productsCategories
ON productsCategories.product_id= products.id
WHERE productsCategories.category_id IN (7, 15, 8, 1, 50)
GROUP BY products.id
HAVING resultsCount = categoriesCount
ORDER BY amount DESC #optional
That way the query is flexible and gives me exactly what I needed! - Only those products that have all their categories inside the search parameters(not partially).
Cheers! :)
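Since the number of category ids coming from the form is not known in advance, the IN list has to be built per request. Below is a minimal sketch of that, using Python's sqlite3 as a stand-in for MySQL. The helper name find_products, the second product's categories, and the category id 3 are all hypothetical; the query itself follows the same idea as above (matched rows must equal the product's total number of categories).

```python
import sqlite3

# In-memory SQLite stand-in; category 3 and Ice Cream's rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE products_categories (product_id INTEGER, category_id INTEGER);

    INSERT INTO products VALUES (1, 'Pizza'), (2, 'Ice Cream');
    INSERT INTO products_categories VALUES (1, 1), (1, 2), (2, 2), (2, 3);
""")


def find_products(conn, category_ids):
    """Return products whose categories ALL appear in category_ids."""
    if not category_ids:
        return []  # an empty IN () list is not valid in MySQL
    # One "?" placeholder per id keeps the values parameterized.
    placeholders = ", ".join("?" for _ in category_ids)
    sql = f"""
        SELECT p.id, p.title
        FROM products p
        JOIN products_categories pc ON pc.product_id = p.id
        WHERE pc.category_id IN ({placeholders})
        GROUP BY p.id, p.title
        HAVING COUNT(*) = (
            SELECT COUNT(*)
            FROM products_categories pc2
            WHERE pc2.product_id = p.id
        )
        ORDER BY p.id
    """
    return conn.execute(sql, list(category_ids)).fetchall()


print(find_products(conn, [1, 2]))
print(find_products(conn, [2]))
print(find_products(conn, [1, 2, 3]))
```

Only the placeholders are interpolated into the SQL string; the ids themselves are passed as bound parameters, so the form input never ends up in the query text.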

Get random records distributed by categories

I need to get records from a products table, spread across several categories, without running the risk of receiving records concentrated in just a few categories.
My sqlFiddle: http://sqlfiddle.com/#!2/060877/2
As you can see in the SQLFiddle, I have 5 categories and 20 products (of course it's an example). I want to get, for example, 10 products without running the risk of receiving 5 or more products from category 2 and only 1 (or none) from category 5.
If possible, the result should be in proportion to the size of each category (if a category has 6 products, return 3, and so on).
Does MySQL do this automatically for me?
What I meant (setting the limit as you need):
(SELECT p.*, c.name AS category
FROM products AS p
INNER JOIN category c ON p.category_id=c.id
WHERE c.id = 1 LIMIT 6)
UNION
(SELECT p.*, c.name AS category
FROM products AS p
INNER JOIN category c ON p.category_id=c.id
WHERE c.id = 2 LIMIT 3)
ORDER BY RAND()
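The per-category limits can also be computed rather than hard-coded. The sketch below is one way to do that, assuming "half of each category" as the proportion (taken from the "if 6 products, return 3" example in the question). It uses Python's sqlite3 as a stand-in for MySQL, so it calls RANDOM() where MySQL would use RAND(); the helper name sample_by_category and the sample data are hypothetical.

```python
import random
import sqlite3
from collections import Counter

# In-memory SQLite stand-in; category names and sizes are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    INSERT INTO category VALUES (1, 'A'), (2, 'B'), (3, 'C');
""")
for category_id, size in {1: 6, 2: 4, 3: 2}.items():
    conn.executemany(
        "INSERT INTO products (name, category_id) VALUES (?, ?)",
        [(f"product {category_id}-{n}", category_id) for n in range(size)],
    )


def sample_by_category(conn, fraction=0.5):
    """Pick a random `fraction` of the products of every category."""
    picked = []
    counts = conn.execute(
        "SELECT category_id, COUNT(*) FROM products GROUP BY category_id"
    ).fetchall()
    for category_id, total in counts:
        quota = max(1, int(total * fraction))  # at least one per category
        picked += conn.execute(
            """
            SELECT p.id, p.name, c.name AS category
            FROM products AS p
            INNER JOIN category c ON p.category_id = c.id
            WHERE c.id = ?
            ORDER BY RANDOM()  -- RAND() in MySQL
            LIMIT ?
            """,
            (category_id, quota),
        ).fetchall()
    random.shuffle(picked)  # mix the categories in the final list
    return picked


sample = sample_by_category(conn)
print(Counter(category for _, _, category in sample))
```

Randomizing inside each per-category query matters: a bare LIMIT without an ORDER BY returns whichever rows the engine reads first, which is typically the same rows every time.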