Normalize data in SQL query - mysql

I have an SQL query A (see below for more details) that returns a table like the following:
cluster brand amount
0 bos 600
0 phi 300
0 har 100
1 pro 2500
1 wal 1500
1 ash 1000
2 dil 4200
2 sor 500
2 van 300
...
However, instead of the amount I want to show each brand's fraction of the total amount in its cluster, as in the following table:
cluster brand amount
0 bos 0.60
0 phi 0.30
0 har 0.10
1 pro 0.50
1 wal 0.30
1 ash 0.20
2 dil 0.84
2 sor 0.10
2 van 0.06
...
How should I change my SQL so that I can access the sum of all amounts in one cluster while still having multiple rows with the same cluster?
** Details **
SQL server: MySQL, interfaced through the python-MySQL connector.
Current SQL query to generate the first table:
SELECT c.cluster, brand, COUNT(o.id) AS brand_amount
FROM nyon_all.clustering AS c
LEFT JOIN nyon_all.persons AS p ON c.pid = p.id
LEFT JOIN nyon_all.orders AS o ON p.id = o.pid
LEFT JOIN nyon_all.articles AS a ON o.aid = a.id
LEFT JOIN nyon_all.brands AS ab ON a.brand_id = ab.id
WHERE c.cluster_round = 'Org_2014-08-27_10:45:35'
GROUP BY cluster, brand
HAVING brand_amount > 100
ORDER BY c.cluster ASC, brand_amount DESC;
Table orders (primary key id) links persons (foreign key pid) with articles (foreign key aid). Each article has a brand (foreign key brand_id), whose name is stored in the table brands.
The total amount of articles per cluster can be retrieved with the following SQL query:
SELECT c.cluster, COUNT(o.pid) AS amount
FROM nyon_all.clustering AS c
LEFT JOIN nyon_all.persons AS p ON c.pid = p.id
LEFT JOIN nyon_all.orders AS o ON p.id = o.pid
WHERE c.cluster_round = 'Org_2014-08-27_10:45:35'
GROUP BY cluster
ORDER BY c.cluster ASC, amount DESC;
Result:
cluster amount
0 1000
1 5000
2 5000
However, I can't seem to combine the two SQL queries.

You could join on a subquery that sums the amount per cluster:
select t1.cluster, amount / sumAmount
from Table1 t1
join (select cluster, sum(amount) as sumAmount
from Table1
group by cluster) s
on t1.cluster = s.cluster
see SqlFiddle
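To check the subquery-join idea without a MySQL server, here is a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database. It assumes a table shaped like the first result table above: the name Table1 comes from the answer, and the rows are the question's sample values. This is SQLite, not MySQL. SQLite's / truncates when both operands are integers, so the sketch multiplies by 1.0, whereas MySQL's / already returns a decimal.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; rows are the question's sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (cluster INTEGER, brand TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        (0, "bos", 600), (0, "phi", 300), (0, "har", 100),
        (1, "pro", 2500), (1, "wal", 1500), (1, "ash", 1000),
        (2, "dil", 4200), (2, "sor", 500), (2, "van", 300),
    ],
)

# Each row is joined to its cluster's total, computed once per cluster in the subquery.
# "* 1.0" forces real division in SQLite (integer / integer truncates there).
rows = conn.execute(
    """
    SELECT t1.cluster, t1.brand, t1.amount * 1.0 / s.sumAmount AS fraction
    FROM Table1 t1
    JOIN (SELECT cluster, SUM(amount) AS sumAmount
          FROM Table1
          GROUP BY cluster) s
      ON t1.cluster = s.cluster
    ORDER BY t1.cluster, t1.amount DESC
    """
).fetchall()

for row in rows:
    print(row)
```

Because the totals come from a separate grouped subquery, the outer query keeps one row per brand while still seeing its cluster's sum.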
EDIT
SELECT
c.cluster,
brand,
COUNT(o.id) / NULLIF(s.sumBrandAmount, 0) AS brand_amount -- NULLIF yields NULL instead of dividing by 0
FROM nyon_all.clustering AS c
LEFT JOIN nyon_all.persons AS p ON c.pid = p.id
LEFT JOIN nyon_all.orders AS o ON p.id = o.pid
LEFT JOIN nyon_all.articles AS a ON o.aid = a.id
LEFT JOIN nyon_all.brands AS ab ON a.brand_id = ab.id
LEFT JOIN (SELECT c1.cluster, COUNT(o1.id) AS sumBrandAmount -- total orders per cluster
FROM nyon_all.clustering AS c1
LEFT JOIN nyon_all.persons AS p1 ON p1.id = c1.pid
LEFT JOIN nyon_all.orders AS o1 ON o1.pid = p1.id
WHERE c1.cluster_round = 'Org_2014-08-27_10:45:35' -- same filter as the main query
GROUP BY c1.cluster) AS s
ON s.cluster = c.cluster
WHERE c.cluster_round = 'Org_2014-08-27_10:45:35'
GROUP BY c.cluster, brand, s.sumBrandAmount
HAVING COUNT(o.id) > 100 -- filter on the raw count; brand_amount is now a fraction
ORDER BY c.cluster ASC, brand_amount DESC;

Related

MYSQL JOIN query to group user skills for conditional count

I am struggling to find the logic for a JOIN query with GROUP BY.
I have 3 tables.
1. tbl_users
2. tbl_event_orders
3. tbl_event_signature (for saving signatures on completed events)
tbl_users
id name skill
---------------------------
1 user1 A
2 user2 B
3 user3 A
4 user4 A
tbl_orders
id user_id item_id price
------------------------------------
1 1 1 100
2 2 1 100
3 3 1 100
4 4 1 100
tbl_signature
id item_id user_id signature
----------------------------------------------
1 1 1 xxxxxxxx...
1 1 3 NULL
1 1 4 xxxxxxxx...
I need the event details for a given item id.
For example for item with id 1, I need the following result.
skill total_count attended_users_count amount
A 3 2 300
B 1 0 100
skill - skill from the user table.
total_count - total number of orders from users with that particular skill.
attended_users_count - number of those orders that also have an entry with a non-NULL signature in the tbl_signature table.
amount - sum of price over the orders counted in total_count.
I have the following query for getting users with skills and total count.
SELECT
U.skill as skill,
count(U.skill) as total_count,
sum( O.price ) as amount
FROM tbl_users U
INNER JOIN tbl_orders O
ON U.id = O.user_id
WHERE O.item_id = 1
GROUP BY U.skill
But when I add the attended users count, I get unexpected results.
I have tried the following query:
SELECT
U.skill as skill,
count(U.skill) as total_count,
count( S.signature ) as attended_users_count,
sum( O.price ) as amount
FROM tbl_users U
INNER JOIN tbl_orders O
ON U.id = O.user_id
LEFT JOIN tbl_signature S
ON O.item_id = S.item_id
WHERE O.item_id = 1
GROUP BY U.skill
Is there any way to get this in a single query?
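This query should give you the results you want. It JOINs tbl_orders to tbl_users, and then LEFT JOINs to tbl_signature on both item_id and user_id. Your query joined tbl_signature on item_id alone, so every order matched all three signature rows for item 1 and every count and sum was multiplied. Rows of tbl_signature that don't match, or that have no signature, are not included in the count for that order: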
SELECT u.skill,
COUNT(o.id) AS total_count,
COUNT(s.signature) AS attended_users_count,
SUM(o.price) AS amount
FROM tbl_orders o
JOIN tbl_users u ON u.id = o.user_id
LEFT JOIN tbl_signature s ON s.item_id = o.item_id AND s.user_id = u.id
WHERE o.item_id = 1
GROUP BY u.skill
Output:
skill total_count attended_users_count amount
A 3 2 300
B 1 0 100
Demo on dbfiddle
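A minimal sketch can show both queries side by side. It loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module, which stands in for MySQL here. It then runs the question's query and the answer's query. The ORDER BY clauses are added only to make the output order deterministic, and 'xxxxxxxx' is a placeholder for the signature data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tbl_users (id INTEGER, name TEXT, skill TEXT);
    CREATE TABLE tbl_orders (id INTEGER, user_id INTEGER, item_id INTEGER, price INTEGER);
    CREATE TABLE tbl_signature (id INTEGER, item_id INTEGER, user_id INTEGER, signature TEXT);
    INSERT INTO tbl_users VALUES (1, 'user1', 'A'), (2, 'user2', 'B'), (3, 'user3', 'A'), (4, 'user4', 'A');
    INSERT INTO tbl_orders VALUES (1, 1, 1, 100), (2, 2, 1, 100), (3, 3, 1, 100), (4, 4, 1, 100);
    INSERT INTO tbl_signature VALUES (1, 1, 1, 'xxxxxxxx'), (1, 1, 3, NULL), (1, 1, 4, 'xxxxxxxx');
    """
)

# Question's query: joining on item_id only matches every order to all 3 signature rows.
question_rows = conn.execute(
    """
    SELECT U.skill, COUNT(U.skill), COUNT(S.signature), SUM(O.price)
    FROM tbl_users U
    INNER JOIN tbl_orders O ON U.id = O.user_id
    LEFT JOIN tbl_signature S ON O.item_id = S.item_id
    WHERE O.item_id = 1
    GROUP BY U.skill
    ORDER BY U.skill
    """
).fetchall()

# Answer's query: matching on item_id AND user_id gives at most one signature row per order.
answer_rows = conn.execute(
    """
    SELECT u.skill, COUNT(o.id), COUNT(s.signature), SUM(o.price)
    FROM tbl_orders o
    JOIN tbl_users u ON u.id = o.user_id
    LEFT JOIN tbl_signature s ON s.item_id = o.item_id AND s.user_id = u.id
    WHERE o.item_id = 1
    GROUP BY u.skill
    ORDER BY u.skill
    """
).fetchall()

print(question_rows)
print(answer_rows)
```

COUNT(s.signature) skips NULLs, so user3's unsigned entry and user2's missing entry both leave attended_users_count unchanged.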

Retrieving data from 2 tables

I have two tables, described below:
table 1: customer, columns : customer_id, source
table 2: source, columns: source, rank
One customer can have many sources, and each source has a particular rank in the source table. For each customer, I need to fetch the record whose source has the lowest rank value.
Here is an example:
Customer table data:
1 abc
2 efg
3 abc
1 efg
1 hij
2 hij
Source table data:
abc 2
hij 1
efg 3
The result set should be:
1 hij
2 hij
3 abc
You could use either of the two queries below to satisfy your requirement.
QUERY 1
SELECT c.customer_id,
c.source
FROM customer c
INNER JOIN source s
ON c.source = s.source
WHERE s.rank = (SELECT Min(s1.rank)
FROM source s1 inner join customer c1 on s1.source = c1.source
WHERE c1.customer_id = c.customer_id)
QUERY 2
SELECT x.customer_id ,
c1.source
FROM
(SELECT c.customer_id ,
MIN(s.rank) AS MinRank
FROM customer c
INNER JOIN source s ON c.source = s.source
GROUP BY c.customer_id) x
INNER JOIN customer c1 ON x.customer_id = c1.customer_id
INNER JOIN source s1 ON s1.source = c1.source
AND s1.rank = x.MinRank;
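Both queries can be checked against the sample data with a small sketch. It uses Python's sqlite3 module and an in-memory SQLite database; the question does not name its DBMS, so SQLite here is just a convenient stand-in. ORDER BY is added for deterministic output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER, source TEXT);
    CREATE TABLE source (source TEXT, rank INTEGER);
    INSERT INTO customer VALUES (1, 'abc'), (2, 'efg'), (3, 'abc'), (1, 'efg'), (1, 'hij'), (2, 'hij');
    INSERT INTO source VALUES ('abc', 2), ('hij', 1), ('efg', 3);
    """
)

# Query 1: correlated subquery finds each customer's minimum rank.
rows_1 = conn.execute(
    """
    SELECT c.customer_id, c.source
    FROM customer c
    INNER JOIN source s ON c.source = s.source
    WHERE s.rank = (SELECT MIN(s1.rank)
                    FROM source s1 INNER JOIN customer c1 ON s1.source = c1.source
                    WHERE c1.customer_id = c.customer_id)
    ORDER BY c.customer_id
    """
).fetchall()

# Query 2: derived table of per-customer minimum ranks, joined back to the rows.
rows_2 = conn.execute(
    """
    SELECT x.customer_id, c1.source
    FROM (SELECT c.customer_id, MIN(s.rank) AS MinRank
          FROM customer c
          INNER JOIN source s ON c.source = s.source
          GROUP BY c.customer_id) x
    INNER JOIN customer c1 ON x.customer_id = c1.customer_id
    INNER JOIN source s1 ON s1.source = c1.source AND s1.rank = x.MinRank
    ORDER BY x.customer_id
    """
).fetchall()

print(rows_1)
print(rows_2)
```

Query 2 computes each minimum once per customer instead of once per row, which usually scales better than the correlated form.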
UPDATE 1
This update responds to your comment about 3 tables rather than 2. The query below extends Query 1 to a schema spread across 3 tables.
SELECT c.customer_id,
s.source_name
FROM customer c
INNER JOIN source s
ON c.cust_id = s.cust_id
INNER JOIN rank r
ON s.source_name = r.source_name
WHERE r.rank = (SELECT Min(r1.rank)
FROM customer c1
INNER JOIN source s1
ON s1.cust_id = c1.cust_id
INNER JOIN rank r1
ON r1.source_name = s1.source_name
WHERE c1.cust_id = c.cust_id);
For Oracle:
select d.customer_id, d.source
from (
select
c.customer_id,
s.source,
row_number() over (partition by c.customer_id order by s.rank asc) as rn
from customer c
join source s
on c.source = s.source
) d
where d.rn = 1
;
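A much simpler way. Try this (caveat: c.sourceid is neither grouped nor aggregated, so MySQL may return a sourceid other than the one with the minimum rank, and with ONLY_FULL_GROUP_BY enabled, the default since MySQL 5.7.5, the query is rejected):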
select c.cid,c.sourceid,min(s.rankid)
from customer c inner join sourc s
on (c.sourceid=s.sourceid)
group by c.cid order by c.cid asc
Here's an SQLFiddle
Select a.customer_id,b.source
from
(select c.customer_id,min(s.rank) as rank
from customer c
inner join source s
on c.source=s.source
group by c.customer_id) as a
inner join source b
on a.rank = b.rank

MySQL Queries with Meta Keys and Values

I'm having a hard time coming up with a clean MySQL query for this problem. I have two tables:
ORDER ITEMS
-----------------
ID Name
-----------------
24 Product A
30 Product B
33 Product B
55 Product A
-----------------
ORDER ITEM META
---------------------
ID Key Value
---------------------
24 _qty 3
30 _qty 5
30 _weight 1000g
33 _qty 1
33 _weight 500g
55 _qty 2
---------------------
I ran this query:
SELECT
oi.ID,
oi.Name,
oim1.Value as Qty,
oim2.Value as Weight
FROM
`order_items` as oi
INNER JOIN `order_item_meta` as oim1 ON oim1.ID = oi.ID
INNER JOIN `order_item_meta` as oim2 ON oim2.ID = oi.ID
WHERE
oim1.Key = '_qty' AND
oim2.Key = '_weight'
But it only gives me:
-------------------------------
ID Name Qty Weight
-------------------------------
30 Product B 5 1000g
33 Product B 1 500g
-------------------------------
I need to include products that do not have _weight defined as a key so it will give me the following results:
-------------------------------
ID Name Qty Weight
-------------------------------
24 Product A 3
30 Product B 5 1000g
33 Product B 1 500g
55 Product A 2
-------------------------------
Try using an outer join:
select oi.id, oi.name, oim1.value as qty, oim2.value as weight
from order_items as oi
join order_item_meta as oim1
on oim1.id = oi.id
left join order_item_meta as oim2
on oim2.id = oi.id
and oim2.key = '_weight'
where oim1.key = '_qty'
Fiddle Test:
http://sqlfiddle.com/#!2/dd3ad6/2/0
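A small sketch shows the difference between filtering the meta key in the ON clause and filtering it in the WHERE clause. It uses Python's sqlite3 module, with SQLite standing in for MySQL. The column `key` is backquoted because KEY is a keyword in both. The rows are the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE order_items (id INTEGER, name TEXT);
    CREATE TABLE order_item_meta (id INTEGER, `key` TEXT, value TEXT);
    INSERT INTO order_items VALUES (24, 'Product A'), (30, 'Product B'), (33, 'Product B'), (55, 'Product A');
    INSERT INTO order_item_meta VALUES
        (24, '_qty', '3'), (30, '_qty', '5'), (30, '_weight', '1000g'),
        (33, '_qty', '1'), (33, '_weight', '500g'), (55, '_qty', '2');
    """
)

# Key test in ON: items without a _weight row are kept, with weight = NULL.
on_rows = conn.execute(
    """
    SELECT oi.id, oi.name, oim1.value AS qty, oim2.value AS weight
    FROM order_items AS oi
    JOIN order_item_meta AS oim1 ON oim1.id = oi.id
    LEFT JOIN order_item_meta AS oim2 ON oim2.id = oi.id AND oim2.`key` = '_weight'
    WHERE oim1.`key` = '_qty'
    ORDER BY oi.id
    """
).fetchall()

# Key test in WHERE: rows whose oim2 side is NULL or not '_weight' are filtered out,
# so the LEFT JOIN behaves like an inner join.
where_rows = conn.execute(
    """
    SELECT oi.id, oi.name, oim1.value AS qty, oim2.value AS weight
    FROM order_items AS oi
    JOIN order_item_meta AS oim1 ON oim1.id = oi.id
    LEFT JOIN order_item_meta AS oim2 ON oim2.id = oi.id
    WHERE oim1.`key` = '_qty' AND oim2.`key` = '_weight'
    ORDER BY oi.id
    """
).fetchall()

print(on_rows)
print(where_rows)
```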
If there is ever a situation where an order doesn't have a quantity you would also have to use an outer join for the quantity, like this:
select oi.id, oi.name, oim1.value as qty, oim2.value as weight
from order_items as oi
left join order_item_meta as oim1
on oim1.id = oi.id
and oim1.key = '_qty'
left join order_item_meta as oim2
on oim2.id = oi.id
and oim2.key = '_weight'
However, if an order ALWAYS has an associated quantity (just not necessarily an associated weight), you should use the first query instead: an inner join for the quantity and an outer join for the weight. It all depends on your situation.

MySQL - Total of parent + child categories

I have two tables:
parent-child 'categories':
id name parent_id
1 Food NULL
2 Pizza 1
3 Pasta 2
'transactions':
id amount category_id
1 100 1
2 50 2
3 25 2
I want to return all the Categories along with two total columns:
total = The sum of the amount for all transactions with this category_id
parentTotal = total + the total of all its child categories
Example (using the tables above):
id name parent_id total parentTotal
1 Food NULL 100 175
2 Pizza 1 0 0
3 Pasta 2 75 0
EDIT:
Code updated (based on code from Nedret Recep below) and works fine...
SELECT
tmp1.id, tmp1.name, tmp1.parent_id, tmp1.total, IFNULL(tmp1.total, 0) + IFNULL(tmp2.s, 0) AS parenttotal
FROM
(SELECT
ca.id, ca.name, ca.parent_id, SUM(tr.amount) as total
FROM
categories ca
LEFT JOIN
transactions tr
ON
tr.category_id = ca.id
GROUP BY
ca.id)
AS tmp1
LEFT JOIN
(SELECT
c.parent_id as categoryid, SUM(t.amount) AS s
FROM
transactions t
RIGHT JOIN
categories c
ON
t.category_id = c.id
GROUP BY
c.parent_id)
AS tmp2
ON tmp2.categoryid = tmp1.id
order by coalesce(tmp1.parent_id, tmp1.id), tmp1.parent_id
I'd really appreciate some help - thanks!
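Here is a sketch that runs the updated query from the EDIT against the sample rows. It uses Python's sqlite3 module in place of MySQL. The RIGHT JOIN in tmp2 is written as the equivalent LEFT JOIN with the two tables swapped, because SQLite only added RIGHT JOIN in version 3.39. With the rows exactly as listed, transactions 2 and 3 reference category_id 2 (Pizza), so Pizza's own total comes out as 75. Also, tmp2 groups by parent_id, so parentTotal adds only direct children, not grandchildren.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, amount INTEGER, category_id INTEGER);
    INSERT INTO categories VALUES (1, 'Food', NULL), (2, 'Pizza', 1), (3, 'Pasta', 2);
    INSERT INTO transactions VALUES (1, 100, 1), (2, 50, 2), (3, 25, 2);
    """
)

rows = conn.execute(
    """
    SELECT tmp1.id, tmp1.name, tmp1.parent_id, tmp1.total,
           IFNULL(tmp1.total, 0) + IFNULL(tmp2.s, 0) AS parenttotal
    FROM (SELECT ca.id, ca.name, ca.parent_id, SUM(tr.amount) AS total  -- own total
          FROM categories ca
          LEFT JOIN transactions tr ON tr.category_id = ca.id
          GROUP BY ca.id) AS tmp1
    LEFT JOIN (SELECT c.parent_id AS categoryid, SUM(t.amount) AS s    -- direct children's total
               FROM categories c
               LEFT JOIN transactions t ON t.category_id = c.id
               GROUP BY c.parent_id) AS tmp2
      ON tmp2.categoryid = tmp1.id
    ORDER BY tmp1.id
    """
).fetchall()

for row in rows:
    print(row)
```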
With one inner join we calculate the total for each category, which is standard. Then with another inner join we calculate the sums again, this time grouping by parent_id. Finally we join the two result tables to get both sums in one row. This query may be slow on large tables, so computing the totals at the application level could perform better.
SELECT
tmp1.id, tmp1.name, tmp1.parent_id, tmp1.total, tmp1.total + IFNULL(tmp2.s, 0) AS parenttotal
FROM
(SELECT
ca.id, ca.name, ca.parent_id, SUM(tr.amount) as total
FROM
transactions tr
INNER JOIN
categories ca
ON
tr.category_id = ca.id
GROUP BY
ca.id) AS tmp1
LEFT OUTER JOIN
(
SELECT
c.parent_id as categoryid, SUM(t.amount) AS s
FROM
transactions t
INNER JOIN
categories c
ON
t.category_id = c.id
GROUP
BY c.parent_id ) AS tmp2
ON
tmp2.categoryid = tmp1.id

MySQL GROUP BY not doing what I am expecting?

SELECT
bp.product_id,bs.step_number,
p.price, pd.name as product_name
FROM
builder_product bp
JOIN builder_step bs ON bp.builder_step_id = bs.builder_step_id
JOIN builder b ON bp.builder_id = b.builder_id
JOIN product p ON p.product_id = bp.product_id
JOIN product_description pd ON p.product_id = pd.product_id
WHERE b.builder_id = '74' and bs.optional != '1'
group by bs.step_number
ORDER by bs.step_number, p.price
But here are my results:
88 1 575.0000 Lenovo Thinkcentre POS PC
244 2 559.0000 Touchscreen with MSR - Firebox 15"
104 3 285.0000 Remote Order Printer - Epson
97 4 395.0000 Aldelo Lite
121 5 549.0000 Cash Register Express - Pro
191 6 349.0000 Integrated Payment Processing
155 7 369.0000 Accessory - Posiflex 12.1" LCD Customer Display
That's not how GROUP BY is supposed to work. If you group by a number of columns, your select can only return:
The columns you group by
Aggregate functions over other columns, such as MIN(), MAX(), AVG()...
So you'd need to do this:
SELECT
bs.step_number,
MIN(p.price) AS min_price, pd.name as product_name
FROM
builder_product bp
JOIN builder_step bs ON bp.builder_step_id = bs.builder_step_id
JOIN builder b ON bp.builder_id = b.builder_id
JOIN product p ON p.product_id = bp.product_id
JOIN product_description pd ON p.product_id = pd.product_id
WHERE b.builder_id = '74' and bs.optional != '1'
group by bs.step_number, pd.name
ORDER by bs.step_number, min_price
(MySQL allows a very relaxed syntax and will happily remove random rows for each group but other DBMS will trigger an error with your original query.)
Join to a subselect that contains only the minimum value of each group.
In this example, MIN(Amt) grouped by myGroup returns the lowest dollar amount for each group.
I then join this back to the main table with an inner join, which limits the records to that minimum.
Select A.myGROUP, A.Amt
from mtest A
INNER JOIN (Select myGroup, min(Amt) as minAmt from mtest group by mygroup) B
ON B.myGroup=A.mygroup
and B.MinAmt = A.Amt
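The groupwise-minimum join can be tried with a small sketch using Python's sqlite3 module. The rows in mtest are hypothetical and made up for illustration. Ties are kept: if two rows share a group's minimum, both are returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mtest (myGroup TEXT, Amt INTEGER)")
# Hypothetical sample rows (not from the original answer); group 'b' has a tie at 5.
conn.executemany(
    "INSERT INTO mtest VALUES (?, ?)",
    [("a", 30), ("a", 10), ("a", 20), ("b", 5), ("b", 5), ("b", 8)],
)

# Subquery B holds one minimum per group; the inner join keeps only rows equal to it.
rows = conn.execute(
    """
    SELECT A.myGroup, A.Amt
    FROM mtest A
    INNER JOIN (SELECT myGroup, MIN(Amt) AS minAmt FROM mtest GROUP BY myGroup) B
      ON B.myGroup = A.myGroup
     AND B.minAmt = A.Amt
    ORDER BY A.myGroup
    """
).fetchall()

print(rows)
```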
Yes. Each distinct group key is returned only once. This problem is not easily solved: run two separate queries and combine the results afterwards. If that is not an option, create a temporary table holding the minimum price for each step and join it to the tables in the query.