MySQL GROUP_CONCAT with SUM() and multiple JOINs inside subquery - mysql

I'm very average with MySQL, but usually I can write all the needed queries after reading documentation and searching for examples. Now, I'm in the situation where I spent 3 days re-searching and re-writing queries, but I can't get it to work the exact way I need. Here's the deal:
1st table (mpt_companies) contains companies:
| company_id | company_title |
------------------------------
| 1 | Company A |
| 2 | Company B |
2nd table (mpt_payment_methods) contains payment methods:
| payment_method_id | payment_method_title |
--------------------------------------------
| 1 | Cash |
| 2 | PayPal |
| 3 | Wire |
3rd table (mpt_payments) contains payments for each company:
| payment_id | company_id | payment_method_id | payment_amount |
----------------------------------------------------------------
| 1 | 1 | 1 | 10.00 |
| 2 | 2 | 3 | 15.00 |
| 3 | 1 | 1 | 20.00 |
| 4 | 1 | 2 | 10.00 |
I need to list each company along with many stats. One of stats is the sum of payments in each payment method. In other words, the result should be:
| company_id | company_title | payment_data |
--------------------------------------------------------
| 1 | Company A | Cash:30.00,PayPal:10.00 |
| 2 | Company B | Wire:15.00 |
Obviously, I need to:
Select all the companies;
Join payments for each company;
Join payment methods for each payment;
Calculate sum of payments in each method;
GROUP_CONCAT payment methods and sums;
Unfortunately, SUM() doesn't work with GROUP_CONCAT. Some solutions I found on this site suggest using CONCAT, but that doesn't produce the list I need. Other solutions suggest using CAST(), but maybe I do something wrong because it doesn't work too. This is the closest query I wrote, which returns each company, and unique list of payment methods used by each company, but doesn't return the sum of payments:
SELECT *,
(some other sub-queries I need...),
(SELECT GROUP_CONCAT(DISTINCT(mpt_payment_methods.payment_method_title))
FROM mpt_payments
JOIN mpt_payment_methods
ON mpt_payments.payment_method_id=mpt_payment_methods.payment_method_id
WHERE mpt_payments.company_id=mpt_companies.company_id
ORDER BY mpt_payment_methods.payment_method_title) AS payment_data
FROM mpt_companies
Then I tried:
SELECT *,
(some other sub-queries I need...),
(SELECT GROUP_CONCAT(DISTINCT(mpt_payment_methods.payment_method_title), ':', CAST(SUM(mpt_payments.payment_amount) AS CHAR))
FROM mpt_payments
JOIN mpt_payment_methods
ON mpt_payments.payment_method_id=mpt_payment_methods.payment_method_id
WHERE mpt_payments.company_id=mpt_companies.company_id
ORDER BY mpt_payment_methods.payment_method_title) AS payment_data
FROM mpt_companies
...and many other variations, but all of them either returned query errors, either didn't return/format data I need.
The closest answer I could find was MySQL one to many relationship: GROUP_CONCAT or JOIN or both? but after spending 2 hours re-writing the provided query to work with my data, I couldn't do it.
Could anyone give me a suggestion, please?

You can do that by aggregating twice. First for the sum of payments per method and company and then to concatenate the sums for each company.
SELECT x.company_id,
x.company_title,
group_concat(payment_amount_and_method) payment_data
FROM (SELECT c.company_id,
c.company_title,
concat(pm.payment_method_title, ':', sum(p.payment_amount)) payment_amount_and_method
FROM mpt_companies c
INNER JOIN mpt_payments p
ON p.company_id = c.company_id
INNER JOIN mpt_payment_methods pm
ON pm.payment_method_id = p.payment_method_id
GROUP BY c.company_id,
c.company_title,
pm.payment_method_id,
pm.payment_method_title) x
GROUP BY x.company_id,
x.company_title;
db<>fiddle

Here you go
SELECT company_id,
company_title,
GROUP_CONCAT(
CONCAT(payment_method_title, ':', payment_amount)
) AS payment_data
FROM (
SELECT c.company_id, c.company_title, pm.payment_method_id, pm.payment_method_title, SUM(p.payment_amount) AS payment_amount
FROM mpt_payments p
JOIN mpt_companies c ON p.company_id = c.company_id
JOIN mpt_payment_methods pm ON pm.payment_method_id = p.payment_method_id
GROUP BY p.company_id, p.payment_method_id
) distinct_company_payments
GROUP BY distinct_company_payments.company_id
;

Related

Efficiently join same data twice without CTE in MySQL?

I have this query that sums payments and reimbursements against pledges. It works, but smells bad:
select P.pledgeID, P.decamount,
(
sum(coalesce(C1.decamount, 0)) - sum(coalesce(C2.decamount, 0))
) as paymentTotal
from Pledge P
left join (select C.*, CT.eaddOrSubtract
from `Payment` C
left join PaymentType CT on C.paymentTypeID = CT.paymentTypeID )
C1 on P.pledgeID = C1.pledgeID and C1.eaddOrSubtract = 'add'
left join (select C.*, CT.eaddOrSubtract
from `Payment` C
left join PaymentType CT on C.paymentTypeID = CT.paymentTypeID)
C2 on P.pledgeID = C2.pledgeID and C2.eaddOrSubtract = 'subtract'
group by pledgeID
Particularly, I think there should be a better way to handle the joins inside the joins, especially since they produce the same results. On another RDBMS, I'd use a CTE, but that's not available here. Is there a more efficient way to calculate these payment totals (taking into account the fact that some are net additions and other net subtractions)?
Schema info:
PaymentType
---
| paymentTypeID | eaddOrSubtract | ...
| 1 | add |
| 2 | add |
| 3 | subtract |
| 4 | add |
| 5 | subtract |
Payment
---
| checkID | pledgeID | paymentTypeID | decamount | ...
| 1 | 19415 | 4 | 15.19 |
| 2 | 19414 | 2 | 900.00 |
| 3 | 19106 | 5 | 3856.00 |
| 4 | 19106 | 3 | 52.00 |
| 5 | 19414 | 1 | 15.00 |
The query should select all pledges (their pledgeID and decamount) and the total of payments for each pledge. Some payments are positive, some negative.
You query selects all pledges, joins the positive payments to each pledge and joins the negative payments to each row related to the pledge. If there is at most one negative and at most one positive payment it almost works (except that it returns NULL instead of 0 when there are no payments). Once there are at least two payments in one category (positive/negative) and at least one in the other category, problems arise. Each negative payment is joined to each positive payment on the same pledge and all the pairs are summed.
A cleaner way to look at the problem directly follows the first paragraph of this answer. Instead of two joins with filter on eaddOrSubtract, it suffices to make one join to a subquery, where the subquery internally handles the sign of the amount being summed. The CASE operator is great for such a job.
SELECT
P.pledgeID
, P.decamount
, COALESCE(SUM(C.signedDecamount), 0) AS paymentTotal
FROM Pledge P
LEFT JOIN (
SELECT
C.*
, CASE CT.eaddOrSubtract
WHEN 'add' THEN C.decamount
WHEN 'subtract' THEN -C.decamount
END AS signedDecamount
FROM Payment C
LEFT JOIN PaymentType CT ON C.paymentTypeID = CT.paymentTypeID
) C ON P.pledgeID = C.pledgeID
GROUP BY P.pledgeID
The COALESCE() call is there for the case when no payments are joined to the pledge or all they joined payments have NULL decamount. COALESCE() to 0 inside SUM() can always be safely omitted, as SUM() skips NULLs; I guess those calls were just artifacts of hacking the corner cases of joins in the original query.
SQL Fiddle

MySQL - Get records from INNER JOIN not between dates

I have two tables
Accounts:
+------------+--------+
| accountsid | name |
+------------+--------+
| 1 | Bob |
| 2 | Rachel |
| 3 | Mark |
+------------+--------+
Sales Orders
+--------------+------------+------------+--------+
| salesorderid | accountsid | so_date | amount |
+--------------+------------+------------+--------+
| 1 | 1 | 2015-12-16 | 50 |
| 2 | 1 | 2016-01-13 | 20 |
| 3 | 2 | 2015-12-14 | 10 |
| 4 | 3 | 2016-02-14 | 35 |
+--------------+------------+------------+--------+
As you can see, is a 1-N relation where Accounts has many Salesorders and Salesorder has 1 Account.
I need to retrieve "old" Accounts where are not active anymore. For example, If some Account dont have Salesorder in 2016 is an inactive Account.
So, in this example the result will be ONLY Rachel.
How can i retrieve this? I think its the "opposite" of between but I cant figure how to do it...
Thanks.
PS. Despite the title I can get this without INNER JOIN.
You're looking to effect an anti-join, for which there are three possibilities in MySQL:
Using NOT IN:
SELECT a.*
FROM Accounts a
WHERE a.accountsid NOT IN (
SELECT so.accountsid
FROM `Sales Orders` so
WHERE so.so_date >= '2016-01-01'
)
Using NOT EXISTS:
SELECT a.*
FROM Accounts a
WHERE NOT EXISTS (
SELECT *
FROM `Sales Orders` so
WHERE so.accountsid = a.accountsid
AND so.so_date >= '2016-01-01'
)
Using an outer JOIN:
SELECT a.*
FROM Accounts a LEFT JOIN `Sales Orders` so
ON so.accountsid = a.accountsid
AND so.so_date >= '2016-01-01'
WHERE so.accountsid IS NULL
why do you need to use only inner join? inner join is for cases you have data matching on two tables but in this case you don't you need to be using a subquery with either "not in" or "not exists"
What you want is to get the ids that didn´t make any order, so get the ids that made some order and the rest of them are the ones that didn´t make orders.
It should be something like this SELECT * FROM Accounts WHERE accountsid NOT IN (SELECT accountsid FROM Sales Orders WHERE so_date > your_date)

SQL sorting does not follow group by statement, always uses primary key

I have a SQL database with a table called staff, having following columns:
workerID (Prim.key), name, department, salary
I am supposed to find the workers with the highest salary per department and used the following statement:
select staff.workerID, staff.name, staff.department, max(staff.salary) AS biggest
from staff
group by staff.department
I get one worker shown from each department, but they are NOT the workers with the highest salary, BUT the biggest salary value is shown, even though the worker does not get that salary.
The person shown is the worker with the "lowest" workerID per department.
So, there is some sorting going on using the primary key, even though it is not mentioned in the group by statement.
Can someone explain, what is going on and maybe how to sort correctly.
Explanation for what is going on:
You are performing a GROUP BY on staff.department, however your SELECT list contains 2 non-grouping columns staff.workerID, staff.name. In standard sql this is a syntax error, however MySql allows it so the query writers have to make sure that they handle such situations themselves.
Reference: http://dev.mysql.com/doc/refman/5.0/en/group-by-handling.html
In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause.
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause.
The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate.
Starting with MySQL 5.1 the non-standard feature can be disabled by setting the ONLY_FULL_GROUP_BY flag in sql_mode: http://dev.mysql.com/doc/refman/5.6/en/sql-mode.html#sqlmode_only_full_group_by
How to fix:
select staff.workerID, staff.name, staff.department, staff.salary
from staff
join (
select staff.department, max(staff.salary) AS biggest
from staff
group by staff.department
) t
on t.department = staff.department and t.biggest = staff.salary
In the inner query, fetch department and its highest salary using GROUP BY. Then in the outer query join those results with the main table which would give you the desired results.
This is the usual case group by with a aggregate function does not guarantee proper row corresponding to the aggregate function. Now there are many ways to do it and the usual practice is a sub-query and join. But if the table is big then performance wise it kills, so the other approach is to use left join
So lets say we have the table
+----------+------+-------------+--------+
| workerid | name | department | salary |
+----------+------+-------------+--------+
| 1 | abc | computer | 400 |
| 2 | cdf | electronics | 200 |
| 3 | gfd | computer | 400 |
| 4 | wer | physics | 300 |
| 5 | hgt | computer | 700 |
| 6 | juy | electronics | 100 |
| 7 | wer | physics | 400 |
| 8 | qwe | computer | 200 |
| 9 | iop | electronics | 800 |
| 10 | kli | physics | 800 |
| 11 | qsq | computer | 600 |
| 12 | asd | electronics | 300 |
+----------+------+-------------+--------+
SO we can get the data as
select st.* from staff st
left join staff st1 on st1.department = st.department
and st.salary < st1.salary
where
st1.workerid is null
The above will give you as
+----------+------+-------------+--------+
| workerid | name | department | salary |
+----------+------+-------------+--------+
| 5 | hgt | computer | 700 |
| 9 | iop | electronics | 800 |
| 10 | kli | physics | 800 |
+----------+------+-------------+--------+
My favorite solution to this problem uses LEFT JOIN:
SELECT m.workerID, m.name, m.department, m.salary
FROM staff m # 'm' from 'maximum'
LEFT JOIN staff o # 'o' from 'other'
ON m.department = o.department # match rows by department
AND m.salary < o.salary # match each row in `m` with the rows from `o` having bigger salary
WHERE o.salary IS NULL # no bigger salary exists in `o`, i.e. `m`.`salary` is the maximum of its dept.
;
This query selects all the workers that have the biggest salary from their department; i.e. if two or more workers have the same salary and it is the bigger in their department then all these workers are selected.
Try this:
SELECT s.workerID, s.name, s.department, s.salary
FROM staff s
INNER JOIN (SELECT s.department, MAX(s.salary) AS biggest
FROM staff s GROUP BY s.department
) AS B ON s.department = B.department AND s.salary = B.biggest;
OR
SELECT s.workerID, s.name, s.department, s.salary
FROM (SELECT s.workerID, s.name, s.department, s.salary
FROM staff s
ORDER BY s.department, s.salary DESC
) AS s
GROUP BY s.department;

Using subqueries with multiple results which depend on the main query

I have a persons table called "tc_person" and a marriage table called "tc_marriage". I want to select a few columns from my persons table and one column which represents the id of the partner.
The marriage table includes the pid_1 and pid_2 of two people - but it is important that there is only one entry for a couple and the order of the couples ids may vary. Here's the tables:
tc_person:
| id | name | lastname |
--------------------------------------
| 4 | peter | smith |
| 5 | sarah | smith |
tc_marriage:
| id | pid_1 | pid_2 |
--------------------------------------
| 0 | 5 | 4 |
| 1 | 7 | 9 |
It seems that my subquery is interpreted as a whole before the original select statement. Now I get the error that my subquery returns more than one row.
SELECT p.id, p.name, p.lastname,
(SELECT m.pid_1 FROM tc_marriage m WHERE m.pid_2 = p.id UNION
SELECT m.pid_2 FROM tc_marriage m WHERE m.pid_1 = p.id) as partner_id
FROM tc_person p WHERE p.lastname LIKE 'smith';
I am looking for the following output:
| id | name | lastname | partner_id |
-----------------------------------------------------
| 4 | peter | smith | 5 |
| 5 | sarah | smith | 4 |
Is this even possible with only one single query? You can probably tell by now that I'm quite the SQL noob. Maybe you guys can help.
You can use IN to avoid a UNION (which is typically slower), and a CASE statement to pick out the correct partner_id:
SELECT p.id,
p.name,
p.last_name,
CASE p.id WHEN m.pid_1 THEN m.pid_2 ELSE m.pid_1 END AS partner_id
FROM tc_person p
JOIN tc_marriage m ON p.id IN (m.pid_1, m.pid_2)
What you want to do is to join the person table to the marriage table. However, the problem is that you have two keys. One solution is to do two joins and then some logic to choose the right value.
I prefer to "double" the marriage table by swapping the partners. The following query takes this approach in a subquery:
select p.id, p.name, p.lastname, partner_id
from tc_person p join
(select pid_1 as firstid, pid_2 as partner_id
from marriage
union all
select pid_2 as firstid, pid_1 as partner_id
from marriage
) m
on p.id = m.firstid

MySQL nested query

I have 2 tables, one containing products, one purchases. I'm trying to find the average of the last 3 purchase prices for each product. So in the example below, for the product 'beans' I would want to return the average of the last 3 purchase prices before the product time 1230854663, i.e. average of customer C,D,E (239)
Products
+-------+------------+
| Name | time |
+-------+------------+
| beans | 1230854764 |
+-------+------------+
Purchases
+----------+------------+-------+
| Customer | time | price |
+----------+------------+-------+
| B | 1230854661 | 207 |
| C | 1230854662 | 444 |
| D | 1230854663 | 66 |
| E | 1230854764 | 88 |
| A | 1230854660 | 155 |
+----------+------------+-------+
I've come up with a nested select query which nearly gets me there i.e. it works if I hard code the time:
SELECT products.name,(SELECT avg(temp.price) FROM (select purchases.price from purchases WHERE purchases.time < 1230854764 order by purchases.time desc limit 3) temp) as av_price
from products products
But if the query references product.time rather than a hard coded time such as below I get an error that column products.time does not exist.
SELECT products.name,(SELECT avg(temp.price) FROM (select purchases.price from purchases WHERE purchases.time < products.time order by purchases.time desc limit 3) temp) as av_price from products products
I'm not sure if I'm making a simple mistake with nested queries or I'm going about this in totally the wrong way and should be using joins or some other construct, either way I'm stuck. Any help would be greatfully received.
The only problem in your query is you haven't mentioned products table in your inner query.
SELECT products.name,(SELECT avg(temp.price)
FROM (select purchases.price from purchases,products
WHERE purchases.time < products.time order by purchases.time desc limit 3) temp) as
av_price from products products