Eliminate certain duplicated rows after group by - mysql

With this db:
Chef(cid,cname,age),
Recipe(rid,rname),
Cooked(orderid,cid,rid,price)
Customers(cuid,orderid,time,daytime,age)
[cid means chef id, and so on]
Given orders from customers, I need to find, for each chef, the difference between the chef's age and the average age of the people who ordered his/her meals.
I wrote the following query:
select cid, Ch.age - AVG(Cu.age) as Diff
from Chef Ch NATURAL JOIN Cooked Co,Customers Cu
where Co.orderid = Cu.orderid
group by cid
This solves the problem, but if you assume that customers have their own unique id, it might not work, because one customer can then order two meals from the same chef and skew the calculation.
Now I know it can be answered with NOT EXISTS, but I'm looking for a solution that uses GROUP BY (something similar to what I wrote). So far I couldn't find one (I searched and tried many ways, from select distinct, to manipulation in the where clause, to "having count(distinct..)").
Edit: People asked for an example. I'm coding using SQLFiddle and it crashes a lot, so I'll try my best:
cid | cuid | orderid | Cu.age
-----------------------------
1   | 333  | 1       | 20
1   | 200  | 2       | 40
1   | 200  | 5       | 40
2   | 4    | 3       | 36
Let's say Chef 1's age is 50. My query will give you 50 - (20+40+40)/3 = 16 and 2/3, although it should actually be 50 - (20+40)/2 = 20 (because the guy with id 200 ordered two recipes from our beloved Chef 1).
Assume Chef 2's age is 47. My query will result:
cid | Diff
----------
1   | 16.667
2   | 11
Another edit: I wasn't taught any particular SQL dialect, so I really have no idea what the differences are between Oracle, MySQL and Microsoft SQL Server; I'm basically "freestyle" querying. (I hope it will be good enough in my exam as well :O )

First, you should write your query as:
select Ch.cid, Ch.age - AVG(Cu.age) as Diff
from Chef Ch join
Cooked Co
on Ch.cid = Co.cid join
Customers Cu
on Co.orderid = Cu.orderid
group by Ch.cid, Ch.age;
Two different reasons:
NATURAL JOIN is just a bug waiting to happen. List the columns that you want used for the join, lest an unexpected field or spelling difference affect the results.
Never use commas in the FROM clause. Always use explicit JOIN syntax.
Next, the answer to your question is more complicated. For each chef, we can get the average age of the customers by doing:
select cid, avg(age)
from (select distinct co.cid, cu.cuid, cu.age
from Cooked Co join
Customers Cu
on Co.orderid = Cu.orderid
) c
group by cid;
Then, for the difference, you need to bring that information in as well. One method is in the subquery:
select cid, ( cage - avg(cuage) ) as diff
from (select distinct co.cid, cu.cuid, cu.age as cuage, ch.age as cage
from Chef ch join
Cooked co
on ch.cid = co.cid join
Customers cu
on co.orderid = cu.orderid
) c
group by cid, cage;
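To check the difference between the two approaches on the example from the question, here is a minimal sketch that runs both queries. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; the SQL used here is plain enough to behave the same way), customer ages of 20, 40, 40 and 36 as in the question's arithmetic, and made-up values for cname, rid and price.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Chef (cid INTEGER PRIMARY KEY, cname TEXT, age INTEGER);
CREATE TABLE Cooked (orderid INTEGER, cid INTEGER, rid INTEGER, price REAL);
CREATE TABLE Customers (cuid INTEGER, orderid INTEGER, age INTEGER);
-- cname, rid and price are placeholder values, not from the question
INSERT INTO Chef VALUES (1, 'Chef 1', 50), (2, 'Chef 2', 47);
INSERT INTO Cooked VALUES (1, 1, 10, 5.0), (2, 1, 11, 6.0), (5, 1, 12, 7.0), (3, 2, 10, 5.0);
INSERT INTO Customers VALUES (333, 1, 20), (200, 2, 40), (200, 5, 40), (4, 3, 36);
""")

# Plain join + GROUP BY: customer 200 is counted once per order.
naive = conn.execute("""
SELECT ch.cid, ch.age - AVG(cu.age) AS diff
FROM Chef ch
JOIN Cooked co ON ch.cid = co.cid
JOIN Customers cu ON co.orderid = cu.orderid
GROUP BY ch.cid, ch.age
ORDER BY ch.cid
""").fetchall()

# DISTINCT in the derived table collapses repeat customers before averaging.
dedup = conn.execute("""
SELECT cid, cage - AVG(cuage) AS diff
FROM (SELECT DISTINCT co.cid, cu.cuid, cu.age AS cuage, ch.age AS cage
      FROM Chef ch
      JOIN Cooked co ON ch.cid = co.cid
      JOIN Customers cu ON co.orderid = cu.orderid) c
GROUP BY cid, cage
ORDER BY cid
""").fetchall()

print(naive)  # chef 1 comes out near 16.667
print(dedup)  # chef 1 comes out as 20.0
```

The only difference between the two is where the duplicates are removed: the second query deduplicates (chef, customer) pairs first, so each customer contributes to the average once.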

Related

How can I add a column to a right join select query

I am trying to find a way to add a country code to a database call record based on a phone number column. I have a table with countries and their dialling codes called countries. I can query all records and add the country code after but I need to be able to filter and paginate the results.
I am working with a system I don't have much control over so adding new columns to tables or rewriting large blocks of code isn't really an option. This is what I have to work with.
Countries Table.
id | name    | dialling_code
----------------------------
1  | Ireland | 353
2  | America | 1
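Call Record table.
id | startdatetime       | enddatetime         | route_id | phonenumber | duration_seconds
------------------------------------------------------------------------------------------
1  | 2014-12-18 18:51:12 | 2014-12-18 18:52:12 | 23       | 3538700000  | 60
2  | 2014-12-18 17:41:02 | 2014-12-18 17:43:02 | 43       | 18700000    | 120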
Routes table.
id | number     | enabled
-------------------------
23 | 1234567890 | 1
43 | 0987654321 | 1
I need to get the total duration and the number of unique phone numbers, grouped by route_id and route_number, and we now also need to group these results by country_id so we can group callers by country. I currently use the MySQL query below, written by another developer a long time ago, to get the totals grouped by route_id and route_number.
SELECT
phone_number,
route_number,
COUNT(callrecord_id) AS total_calls,
SUM(duration_sec) AS total_duration,
callrecord_join.route_id
FROM routes
RIGHT JOIN (
SELECT
DATE(a.startdatetime) AS call_date,
a.id AS callrecord_id,
a.route_id AS route_id,
a.phonenumber AS phone_number,
a.duration_seconds as duration_sec,
b.inboundnumber AS route_number
FROM callrecord AS a
INNER JOIN routes AS b ON a.route_id = b.id
WHERE DATE_FORMAT(a.startdatetime, '%Y-%m-%d') >= '2014-12-18'
AND DATE_FORMAT(a.startdatetime, '%Y-%m-%d') <= '2014-12-18'
AND b.isenabled = 1
) AS callrecord_join ON routes.id = callrecord_join.route_id
GROUP BY route_id, route_number
LIMIT 10 offset 0;
I have everything up to adding a country_id in the right join table so I can group by the country_id.
I know I could loop through each country using php and get the results using a where clause, something like the below but I cannot paginate these results or filter them easily.
WHERE LEFT(a.phonenumber, strlen($dialling_code)) = $dialling_code
How can I use the countries table to add a column to the join table query with the country id so I can group by route_id, route_number and country_id? Something like the table below.
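id | startdatetime       | enddatetime         | route_id | phonenumber | duration_seconds | country_id
-------------------------------------------------------------------------------------------------------
1  | 2014-12-18 18:51:12 | 2014-12-18 18:52:12 | 23       | 3538700000  | 60               | 1
2  | 2014-12-18 17:41:02 | 2014-12-18 17:43:02 | 43       | 18700000    | 120              | 2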
The RIGHT JOIN from routes to callrecord_join serves no purpose, as you already have the INNER JOIN between routes and callrecord in the sub-query, which is on the right-hand side of the join.
You can use the join you have described -
JOIN countries c ON LEFT(a.phonenumber, LENGTH(c.dialling_code)) = c.dialling_code
but it will give the same result as:
JOIN countries c ON a.phonenumber LIKE CONCAT(c.dialling_code, '%')
which should be slightly less expensive.
You should test the join to countries to make sure none of your numbers in callrecord join to multiple countries. Some international dialling codes are ambiguous, so it depends on which list of dialling codes you are using.
SELECT a.*, COUNT(*), GROUP_CONCAT(c.dialling_code)
FROM callrecord a
JOIN countries c ON a.phonenumber LIKE CONCAT(c.dialling_code, '%')
GROUP BY a.id
HAVING COUNT(*) > 1;
Obviously, you will need to batch the above query if your dataset is very large.
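To illustrate both points (the two join conditions match the same rows, and overlapping dialling codes produce duplicates), here is a small sketch using Python's sqlite3 module as a stand-in for MySQL. That is an assumption: SQLite spells CONCAT(x, '%') as x || '%' and LEFT(s, n) as substr(s, 1, n). The Jamaica row (code 1876, which shares the leading 1 with America) and the third call record are added purely for illustration, and dialling_code is assumed to be stored as text.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT, dialling_code TEXT);
CREATE TABLE callrecord (id INTEGER PRIMARY KEY, phonenumber TEXT);
-- Jamaica and call record 3 are illustrative additions
INSERT INTO countries VALUES (1, 'Ireland', '353'), (2, 'America', '1'), (3, 'Jamaica', '1876');
INSERT INTO callrecord VALUES (1, '3538700000'), (2, '18700000'), (3, '18765550000');
""")

# SQLite spelling of: a.phonenumber LIKE CONCAT(c.dialling_code, '%')
like_rows = conn.execute("""
SELECT a.id, c.id FROM callrecord a
JOIN countries c ON a.phonenumber LIKE c.dialling_code || '%'
ORDER BY a.id, c.id
""").fetchall()

# SQLite spelling of: LEFT(a.phonenumber, LENGTH(c.dialling_code)) = c.dialling_code
left_rows = conn.execute("""
SELECT a.id, c.id FROM callrecord a
JOIN countries c ON substr(a.phonenumber, 1, length(c.dialling_code)) = c.dialling_code
ORDER BY a.id, c.id
""").fetchall()

# Call records that join to more than one country.
ambiguous = conn.execute("""
SELECT a.id, COUNT(*) FROM callrecord a
JOIN countries c ON a.phonenumber LIKE c.dialling_code || '%'
GROUP BY a.id
HAVING COUNT(*) > 1
""").fetchall()

print(like_rows == left_rows)  # both conditions match the same pairs
print(ambiguous)               # call 3 matches both '1' and '1876'
```

If the check returns rows on your data, one option is to keep only the longest matching code per number before grouping.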
I hope I am not grossly over-simplifying things, but from what I understand of your question the query is just:
SELECT
r.id AS route_id,
r.number AS route_number,
c.name AS country_name,
SUM(a.duration_seconds) AS total_duration,
COUNT(a.id) AS total_calls,
COUNT(DISTINCT a.phonenumber) AS unique_numbers
FROM callrecord AS a
JOIN routes AS r ON a.route_id = r.id
JOIN countries c ON a.phonenumber LIKE CONCAT(c.dialling_code, '%')
WHERE a.startdatetime >= '2014-12-18'
AND a.startdatetime < '2014-12-19'
AND r.isenabled = 1
GROUP BY r.id, r.number, c.name
LIMIT 10 offset 0;
Please note the removal of DATE_FORMAT() from the startdatetime to make these criteria sargable, assuming a suitable index is available.
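As a quick sanity check, the query can be run against the sample rows from the question. This is only a sketch: it uses Python's sqlite3 module in place of MySQL (so CONCAT becomes ||), it assumes the routes flag is called isenabled as in the queries rather than enabled as in the sample table, and the third call record (exactly midnight on 2014-12-19) is an added row to show that the half-open date range excludes the next day.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (id INTEGER PRIMARY KEY, name TEXT, dialling_code TEXT);
CREATE TABLE routes (id INTEGER PRIMARY KEY, number TEXT, isenabled INTEGER);
CREATE TABLE callrecord (id INTEGER PRIMARY KEY, startdatetime TEXT, enddatetime TEXT,
                         route_id INTEGER, phonenumber TEXT, duration_seconds INTEGER);
INSERT INTO countries VALUES (1, 'Ireland', '353'), (2, 'America', '1');
INSERT INTO routes VALUES (23, '1234567890', 1), (43, '0987654321', 1);
INSERT INTO callrecord VALUES
  (1, '2014-12-18 18:51:12', '2014-12-18 18:52:12', 23, '3538700000', 60),
  (2, '2014-12-18 17:41:02', '2014-12-18 17:43:02', 43, '18700000', 120),
  -- illustrative extra row: falls on the next day, so it must be filtered out
  (3, '2014-12-19 00:00:00', '2014-12-19 00:01:00', 23, '3538700000', 60);
""")

rows = conn.execute("""
SELECT r.id, r.number, c.name,
       SUM(a.duration_seconds) AS total_duration,
       COUNT(a.id) AS total_calls,
       COUNT(DISTINCT a.phonenumber) AS unique_numbers
FROM callrecord AS a
JOIN routes AS r ON a.route_id = r.id
JOIN countries AS c ON a.phonenumber LIKE c.dialling_code || '%'
WHERE a.startdatetime >= '2014-12-18'
  AND a.startdatetime < '2014-12-19'
  AND r.isenabled = 1
GROUP BY r.id, r.number, c.name
ORDER BY r.id, c.name
LIMIT 10 OFFSET 0
""").fetchall()

for row in rows:
    print(row)
```

The ORDER BY is an addition in this sketch: without one, LIMIT/OFFSET pagination has no guaranteed row order from page to page.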

SQL query involving comparison of sets

Background
Products can be sold as bundles. Following tables are present: products, bundles, bundles_products, orders, orders_products.
An order would be said to "contain" a bundle if it contains all the bundle's products.
Problem
How would one go about counting orders for bundles?
Example
products table
id name
1 broom
2 mug
3 spoon
4 candle
bundles table
id name
1 dining
2 witchcraft
bundles_products table
bundle_id product_id
1 2
1 3
2 1
2 4
orders_products table
order_id product_id
1000 1
1000 3
1001 1
1001 2
1001 3
The query would return the following table:
bundle orders
dining 1
witchcraft 0
Notes
The example intentionally misses the orders table as it is not relevant what it contains.
Of course, this could be approached imperatively, by writing some code and gathering the data, but I was hoping there is a declarative, SQL way of querying for this kind of things?
One idea I had was to use a GROUP_CONCAT to concatenate all the products of a bundle and somehow compare that with products of each order. Still, a long way from clear.
One way is to use two derived tables (subqueries). In the first subquery, we fetch the total number of unique products for every bundle. In the second subquery, we fetch, for each combination of order and bundle, the number of the order's products that belong to that bundle.
We LEFT JOIN them on bundle_id and also require the two product counts to match. Finally, we group by bundle and count the orders that matched.
SELECT dt1.id AS bundle_id,
dt1.name AS bundle,
Count(dt2.order_id) AS orders
FROM (SELECT b.id,
b.name,
Count(DISTINCT bp.product_id) AS total_bundle_products
FROM bundles AS b
JOIN bundles_products AS bp
ON bp.bundle_id = b.id
GROUP BY b.id,
b.name) AS dt1
LEFT JOIN (SELECT op.order_id,
bp.bundle_id,
Count(DISTINCT op.product_id) AS order_bundle_products
FROM orders_products AS op
JOIN bundles_products AS bp
ON bp.product_id = op.product_id
GROUP BY bp.bundle_id,
op.order_id) AS dt2
ON dt2.bundle_id = dt1.id
AND dt2.order_bundle_products = dt1.total_bundle_products
GROUP BY dt1.id,
dt1.name
SQL Fiddle DEMO
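For anyone who wants to try this without a MySQL server, here is a minimal sketch that loads the sample data from the question and runs the same query through Python's sqlite3 module (SQLite instead of MySQL is an assumption; the query itself is unchanged apart from an added ORDER BY and dropping the bundle_id column from the output).

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bundles (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE bundles_products (bundle_id INTEGER, product_id INTEGER);
CREATE TABLE orders_products (order_id INTEGER, product_id INTEGER);
INSERT INTO bundles VALUES (1, 'dining'), (2, 'witchcraft');
INSERT INTO bundles_products VALUES (1, 2), (1, 3), (2, 1), (2, 4);
INSERT INTO orders_products VALUES (1000, 1), (1000, 3), (1001, 1), (1001, 2), (1001, 3);
""")

rows = conn.execute("""
SELECT dt1.name AS bundle,
       COUNT(dt2.order_id) AS orders
FROM (SELECT b.id, b.name,
             COUNT(DISTINCT bp.product_id) AS total_bundle_products
      FROM bundles AS b
      JOIN bundles_products AS bp ON bp.bundle_id = b.id
      GROUP BY b.id, b.name) AS dt1
LEFT JOIN (SELECT op.order_id, bp.bundle_id,
                  COUNT(DISTINCT op.product_id) AS order_bundle_products
           FROM orders_products AS op
           JOIN bundles_products AS bp ON bp.product_id = op.product_id
           GROUP BY bp.bundle_id, op.order_id) AS dt2
  ON dt2.bundle_id = dt1.id
 AND dt2.order_bundle_products = dt1.total_bundle_products
GROUP BY dt1.id, dt1.name
ORDER BY dt1.id
""").fetchall()

print(rows)
```

Order 1001 contains both the mug and the spoon, so it counts for dining; no order contains both the broom and the candle, so the LEFT JOIN keeps witchcraft with a count of 0.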
Here's a brief example. Some parts are approximate because I don't know the precise database structure. The logic is:
Build a derived table with 3 columns: the order, the count of the order's products that belong to the bundle, and the count of products in the bundle.
Then select only the orders from this table where those last two values are equal.
select count(*) as orders
from (
  select op.order_id,
         count(distinct op.product_id) as order_total,
         (select count(*) from bundles_products where bundle_id = 1) as bundle_amount
  from orders_products as op
  join bundles_products as bp on bp.product_id = op.product_id
  where bp.bundle_id = 1
  group by op.order_id
) as my_table
where my_table.bundle_amount = my_table.order_total
Edit: I posted this as a response to a previous version of the question, without a detailed explanation.
Edit2: Fixed the query a bit. It can be a starting point. The logic is still the same; you can use it to get the number of orders for each bundle_id.

How to determine the first result of a group in MySQL?

So I'm doing a collectible cards managing app, and I have these tables:
"card": contains all the distinct cards ever made, and gives their information (id, name, text, and so on)
"edition": contains all the card sets ever released (called it "edition" because "set" was a reserved word)
"cardinset": since a card can appear in more than one set, this is the associative table between the two. It also gives the number of the card in the set and the number of copies I have of it in french (fr) and in english (en) from that set.
Of course, all tables have a unique auto-increment id, called "id".
The purpose of my SQL request is to list all the cards of a set, ordered by the number of the card in the set. I need all the info of the "cardinset" entry (number, fr, en) and of course the info of the "card" entry (name, rarity, etc.), but I also need to know how many copies of each card I have in total (across all sets, not just in this one).
My SQL request looks like this (I removed a few fields that weren't important):
SELECT
c.name,
c.rarity,
SUM(cis.fr) + SUM(cis.en) AS available,
cis.number,
cis.fr,
cis.en
FROM
card AS c
INNER JOIN cardinset AS cis ON c.id = cis.cardId
WHERE
c.id IN
(
SELECT
cardId
FROM
cardinset AS cs
WHERE
setId = 104
ORDER BY
number
)
GROUP BY
c.id,
c.name
ORDER BY
cis.number
It almost works, but it doesn't retrieve the right cardinset entry for each card, since it takes the first one of the group, which is not always the one linked to the right set.
Example:
| c.name | c.rarity | available | cis.number | cis.fr | cis.en |
| -------------- | -------- | --------- | ---------- | ------ | ------ |
| Divine Verdict | Common | 9 | 008 | 1 | 1 |
Here, the card info (name and rarity) are correct, as well as the "available" field. However the cis field are wrong: they are part of a cis entry linking this card to another set.
The question is: is it possible to define which entry is the first in the group, and therefore is returned in this case? And if not, is there another way (maybe cleaner) to get the result I want?
Thank you in advance for your answer, I really don't know what to do here... I guess I've reached the limits of my knowledge of MySQL...
Here's a more precise example. This screenshot n°1 shows the first results of my query (described above), knowing that there are 212 results in total. They should be ordered by number, and there should be exactly one result of each number, and yet there are some exceptions:
n° 005, which should be "Divine Verdict", isn't there, because it appears instead as n° 008. That's because that card is part of 6 different sets, as we can see in screenshot n°2 (result of the query "SELECT * FROM cardinset WHERE cardId = 13984"), and the group returns the first entry, which is for set n°12 and not n°104 as I would have it. However, the "available" field shows "9", which is the result I want: the sum of all the "fr" and "en" fields for that card in all 6 sets it appears in.
There are other cards that don't have the right cardinset info: n° 011 and 019 are missing, but can be found lower with other cardinset info.
I believe this is the way you would want to format your query.
SELECT
c.name,
c.rarity,
cis.fr + cis.en AS available,
cis.number,
cis.fr,
cis.en
FROM
card AS c
INNER JOIN cardinset AS cis ON c.id = cis.cardId
WHERE
c.id IN
(
SELECT
cardId
FROM
cardinset AS cs
WHERE
setId = 104
GROUP BY
setID, cardID
)
ORDER BY
cis.number
The GROUP BY clause was moved into the sub select and modified to make sure an entry is the right combo of card/set. Also removed the SUMs because that was not necessary.
I made it at last!
I used a subquery to get the "available" field with a GROUP BY clause. It's long, and not very fast, but it gets the job done. If you have an idea that could improve it, don't hesitate.
SELECT e.code, cs.number, sub.name, sub.rarity, cs.fr, cs.en, sub.available
FROM cardinset as cs
INNER JOIN edition as e ON e.id = cs.setId
INNER JOIN (
SELECT c.id, c.name, c.rarity,
SUM(cis.fr)+SUM(cis.en) as available, SUM(cis.frused)+SUM(cis.enused) as used
FROM card as c
INNER JOIN cardinset as cis ON c.id = cis.cardId
WHERE c.id IN (
SELECT cardId
FROM cardinset as cins
WHERE setId = 54)
GROUP BY c.id, c.name
ORDER BY c.id
) AS sub ON cs.cardId = sub.id
WHERE setId = 54
ORDER BY cs.number
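For reference, here is a reduced sketch of the same idea: compute the per-card total once in a derived table, then join it to the single cardinset row of the set being listed. It is not the exact query above (the edition join and the IN filter are left out), it runs on Python's sqlite3 module rather than MySQL, and apart from the card id 13984, sets 12 and 104, numbers 008 and 005 and the total of 9, the data values are invented.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE card (id INTEGER PRIMARY KEY, name TEXT, rarity TEXT);
CREATE TABLE cardinset (id INTEGER PRIMARY KEY, cardId INTEGER, setId INTEGER,
                        number TEXT, fr INTEGER, en INTEGER);
-- 'Other Card' and the fr/en counts are made-up sample values
INSERT INTO card VALUES (13984, 'Divine Verdict', 'Common'), (2, 'Other Card', 'Rare');
INSERT INTO cardinset (cardId, setId, number, fr, en) VALUES
  (13984, 12, '008', 1, 1),
  (13984, 104, '005', 3, 4),
  (2, 104, '001', 0, 2);
""")

rows = conn.execute("""
SELECT cs.number, c.name, c.rarity, cs.fr, cs.en, tot.available
FROM cardinset AS cs
JOIN card AS c ON c.id = cs.cardId
JOIN (SELECT cardId, SUM(fr) + SUM(en) AS available  -- total across all sets
      FROM cardinset
      GROUP BY cardId) AS tot ON tot.cardId = cs.cardId
WHERE cs.setId = 104          -- only this set's cardinset rows are listed
ORDER BY cs.number
""").fetchall()

for row in rows:
    print(row)
```

Because the outer query is not grouped, each row keeps the number, fr and en of set 104, while "available" still sums every set the card appears in.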

MySQL How do I count the number of occurrences of each ID?

It may be difficult to explain what I am after, apologies if the question is vague.
I have a table which associates products with keywords using IDs
So I may have product IDs, 2,3,4,5 associated with Keyword id 14
and product IDs 3,6,9 associated with Keyword id 15
My question is: how do I count and store the total for the IDs associated with Keyword 14, for the IDs associated with Keyword 15, and so on (new keywords are added all the time)?
My SQL so far:
select products_keyword_categories.key_cat_name
from products_keyword_to_product
inner join products_keyword_categories
on products_keyword_categories.key_cat_id = products_keyword_to_product.key_cat_id
group by products_keyword_categories.key_cat_name
Many thanks in advance for any advice. Also, if there is any terminology that will aid me in further research via a Google search that would also be most welcome.
Edit to add: In the example above the table containing the associations is products_keyword_to_product - I inner join the other table to return the Keyword name.
Edit to add (2): Sorry I was afraid my question would be vague.
If I wanted to just count all the products using keyword ID 14 I would use COUNT() AS ..., as mentioned in the answers, but I also need to count the number of products using keyword ID 15, keyword ID 16, etc. Hope that makes more sense.
select pkc.key_cat_name, count(*)
from products_keyword_categories pkc
inner join products_keyword_to_product ptk on pkc.key_cat_id = ptk.key_cat_id
group by pkc.key_cat_id, pkc.key_cat_name;
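To see what a per-keyword count looks like with the figures from the question (products 2,3,4,5 on keyword 14 and products 3,6,9 on keyword 15), here is a minimal sketch run through Python's sqlite3 module instead of MySQL. The product_id column name and the keyword names are assumptions made for the example.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products_keyword_categories (key_cat_id INTEGER PRIMARY KEY, key_cat_name TEXT);
CREATE TABLE products_keyword_to_product (key_cat_id INTEGER, product_id INTEGER);
-- keyword names are placeholders
INSERT INTO products_keyword_categories VALUES (14, 'Keyword 14'), (15, 'Keyword 15');
INSERT INTO products_keyword_to_product VALUES
  (14, 2), (14, 3), (14, 4), (14, 5), (15, 3), (15, 6), (15, 9);
""")

# One output row per keyword, with the number of associated products.
rows = conn.execute("""
SELECT pkc.key_cat_name, COUNT(*) AS product_count
FROM products_keyword_categories pkc
INNER JOIN products_keyword_to_product ptk ON pkc.key_cat_id = ptk.key_cat_id
GROUP BY pkc.key_cat_id, pkc.key_cat_name
ORDER BY pkc.key_cat_id
""").fetchall()

print(rows)
```

New keywords need no change to the query: GROUP BY produces one row for every keyword that has at least one association.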
select cat.key_cat_name, count(*) from
products_keyword_categories cat inner join products_keyword_to_product prod
on prod.key_cat_id=cat.key_cat_id
group by cat.key_cat_name
Edit:
select cat.key_cat_name, prod_assoc.product_id, count(*) from
products_keyword_categories cat inner join products_keyword_to_product prod_assoc
on prod_assoc.key_cat_id=cat.key_cat_id
group by cat.key_cat_name,prod_assoc.product_id
Assuming your tables structure is like this:
products_keyword_categories
key_cat_id key_cat_name
1 Electronics
2 Toys
3 Software
products_keyword_to_product
key_cat_id product_id
1 1
2 1
3 2
1 2
products
product_id name
1 Product A
2 Robot
Edit 2:
Try this
SELECT key_cat_name, product_id, COUNT(*)
FROM
(select cat.key_cat_name, prod_assoc.product_id from
products_keyword_categories cat inner join products_keyword_to_product prod_assoc
on prod_assoc.key_cat_id=cat.key_cat_id) as tbl
GROUP BY key_cat_name, product_id
Edit 3:
The query above is made of 2 parts:
The inner part:
(select cat.key_cat_name, prod_assoc.product_id from
products_keyword_categories cat inner join products_keyword_to_product prod_assoc
on prod_assoc.key_cat_id=cat.key_cat_id)
Which gives 1 row per combination of product_id and key_cat_name.
The outer part:
SELECT key_cat_name, product_id, COUNT(*)
FROM (...) as tbl
GROUP BY key_cat_name, product_id
Which operates on the results of the inner part (as tbl), counting how many times a combination of key_cat_name and product_id appears on the inner part.
Check this: Subqueries in MySQL, Part 1
You are almost there, you just need to add the following:
select count(products_keyword_to_product.id), products_keyword_categories.key_cat_name
...
the rest is correct
Updated Answer:
SELECT COUNT(*), reference_field FROM table WHERE...
GROUP BY field
HAVING field=value
For conditions on aggregates you must use HAVING, and it has to come after GROUP BY.

Grouping, counting and excluding based on column value

Although I'm not a complete newbie in SQL or MySQL, I notice that there's still quite a bit to learn. I cannot get my head around this one, after much trying, reading and searching. If you can give any pointers, I'd be grateful.
I have simplified the actual data and tables to the following.
Two tables are relevant: Staff, and Work. They contain data on staff in various projects.
Staff:
ID Name Unit
1 Smith Chicago
2 Clarke London
3 Hess Chicago
Work:
StaffID ProjectID
1 10
2 10
3 10
1 20
2 30
3 40
1 50
3 50
Goal:
To get grouped all those projects where there are staff from Chicago, with the count of all staff in that project.
Expected result:
Project Staff count
10 3
20 1
40 1
50 2
So the project 30 is not listed because its member(s) are not from Chicago.
My query below is obviously wrong. It counts only those project members who are from Chicago, not the whole project staff.
SELECT
work.projectID as Project, COUNT(*) as "Staff count"
FROM
staff
JOIN
work ON staff.ID=work.staffID
WHERE
unit="Chicago"
GROUP BY
work.projectID;
I'd put the test for Chicago in a subselect.
Alternatively you can use a self-join, but I find the sub-select easier to understand.
SELECT
w.projectid as project
,COUNT(*) as `staff count`
FROM work w
INNER JOIN staff s ON (w.staffID = s.id)
WHERE w.projectID IN (
SELECT w2.projectID FROM work w2
INNER JOIN staff s2 ON (w2.staffID = s2.id AND s2.unit = 'Chicago'))
GROUP BY w.projectid
Remove the where clause and add a having clause which checks that at least one member of staff is from Chicago.
SELECT
work.projectID as Project, COUNT(*) as "Staff count"
FROM
staff
JOIN
work ON staff.ID=work.staffID
GROUP BY
work.projectID
HAVING
count(case unit when 'Chicago' then 1 end) > 0;
Finally: the result. Thanks again both @Johan and @a'r for your help, and @Johan for getting me on the right track (in my case).
I changed the sub-select to a derived table, and inner-joined this with the Work table on projectID.
SELECT
w.projectID AS project
,COUNT(*) AS `staff count`
FROM work w
INNER JOIN
(SELECT DISTINCT w2.projectID
FROM work w2
INNER JOIN staff s ON (w2.staffID = s.id AND s.unit = 'Chicago')) c
ON (w.projectID = c.projectID)
GROUP BY w.projectID
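For comparison, the three variants from this thread (IN sub-select, HAVING with a conditional count, and the derived table) can be run side by side on the sample data. This is a sketch using Python's sqlite3 module rather than MySQL, which is an assumption; an ORDER BY is added to each query so the results can be compared directly.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, unit TEXT);
CREATE TABLE work (staffID INTEGER, projectID INTEGER);
INSERT INTO staff VALUES (1, 'Smith', 'Chicago'), (2, 'Clarke', 'London'), (3, 'Hess', 'Chicago');
INSERT INTO work VALUES (1, 10), (2, 10), (3, 10), (1, 20), (2, 30), (3, 40), (1, 50), (3, 50);
""")

# Variant 1: keep projects that appear in a sub-select of Chicago projects.
in_subquery = """
SELECT w.projectID, COUNT(*)
FROM work w
JOIN staff s ON w.staffID = s.id
WHERE w.projectID IN (SELECT w2.projectID
                      FROM work w2
                      JOIN staff s2 ON w2.staffID = s2.id AND s2.unit = 'Chicago')
GROUP BY w.projectID
ORDER BY w.projectID
"""

# Variant 2: group everything, then keep groups with at least one Chicago member.
having = """
SELECT w.projectID, COUNT(*)
FROM staff s
JOIN work w ON s.id = w.staffID
GROUP BY w.projectID
HAVING COUNT(CASE s.unit WHEN 'Chicago' THEN 1 END) > 0
ORDER BY w.projectID
"""

# Variant 3: join against a derived table of distinct Chicago projects.
derived = """
SELECT w.projectID, COUNT(*)
FROM work w
JOIN (SELECT DISTINCT w2.projectID
      FROM work w2
      JOIN staff s ON w2.staffID = s.id AND s.unit = 'Chicago') c
  ON w.projectID = c.projectID
GROUP BY w.projectID
ORDER BY w.projectID
"""

results = [conn.execute(q).fetchall() for q in (in_subquery, having, derived)]
for r in results:
    print(r)
```

All three return the expected table from the question; project 30 is dropped in each case because Clarke (London) is its only member.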