Efficiently join same data twice without CTE in MySQL? - mysql

I have this query that sums payments and reimbursements against pledges. It works, but smells bad:
select P.pledgeID, P.decamount,
(
sum(coalesce(C1.decamount, 0)) - sum(coalesce(C2.decamount, 0))
) as paymentTotal
from Pledge P
left join (select C.*, CT.eaddOrSubtract
from `Payment` C
left join PaymentType CT on C.paymentTypeID = CT.paymentTypeID )
C1 on P.pledgeID = C1.pledgeID and C1.eaddOrSubtract = 'add'
left join (select C.*, CT.eaddOrSubtract
from `Payment` C
left join PaymentType CT on C.paymentTypeID = CT.paymentTypeID)
C2 on P.pledgeID = C2.pledgeID and C2.eaddOrSubtract = 'subtract'
group by pledgeID
In particular, I think there should be a better way to handle the joins inside the joins, especially since they produce the same results. On another RDBMS, I'd use a CTE, but that's not available here. Is there a more efficient way to calculate these payment totals (taking into account the fact that some are net additions and others net subtractions)?
Schema info:
PaymentType
---
| paymentTypeID | eaddOrSubtract | ...
| 1 | add |
| 2 | add |
| 3 | subtract |
| 4 | add |
| 5 | subtract |
Payment
---
| checkID | pledgeID | paymentTypeID | decamount | ...
| 1 | 19415 | 4 | 15.19 |
| 2 | 19414 | 2 | 900.00 |
| 3 | 19106 | 5 | 3856.00 |
| 4 | 19106 | 3 | 52.00 |
| 5 | 19414 | 1 | 15.00 |

The query should select all pledges (their pledgeID and decamount) and the total of payments for each pledge. Some payments are positive, some negative.
Your query selects all pledges, joins the positive payments to each pledge and joins the negative payments to each row related to the pledge. If there is at most one negative and at most one positive payment it almost works (except that it returns NULL instead of 0 when there are no payments). Once there are at least two payments in one category (positive/negative) and at least one in the other category, problems arise: each negative payment is joined to each positive payment on the same pledge, and all the pairs are summed.
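The pairing effect described above can be reproduced with a small sketch. It assumes an in-memory SQLite database standing in for MySQL, and the pledge and payment rows are hypothetical (two additions of 100 and 50, one subtraction of 30); only the table and column names come from the question.

```python
import sqlite3

# SQLite in memory stands in for MySQL here (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Pledge (pledgeID INTEGER PRIMARY KEY, decamount REAL);
CREATE TABLE PaymentType (paymentTypeID INTEGER PRIMARY KEY, eaddOrSubtract TEXT);
CREATE TABLE Payment (checkID INTEGER PRIMARY KEY, pledgeID INTEGER,
                      paymentTypeID INTEGER, decamount REAL);
INSERT INTO Pledge VALUES (1, 500);
INSERT INTO PaymentType VALUES (1, 'add'), (3, 'subtract');
-- Hypothetical rows: two additions (100, 50) and one subtraction (30).
INSERT INTO Payment VALUES (1, 1, 1, 100), (2, 1, 1, 50), (3, 1, 3, 30);
""")

# The derived table used twice in the question's query.
typed_payment = """(SELECT C.*, CT.eaddOrSubtract
                    FROM Payment C
                    LEFT JOIN PaymentType CT ON C.paymentTypeID = CT.paymentTypeID)"""

# The question's double join: every 'add' row is paired with every 'subtract' row.
double_join_total = conn.execute(f"""
    SELECT SUM(COALESCE(C1.decamount, 0)) - SUM(COALESCE(C2.decamount, 0))
    FROM Pledge P
    LEFT JOIN {typed_payment} C1
           ON P.pledgeID = C1.pledgeID AND C1.eaddOrSubtract = 'add'
    LEFT JOIN {typed_payment} C2
           ON P.pledgeID = C2.pledgeID AND C2.eaddOrSubtract = 'subtract'
    GROUP BY P.pledgeID
""").fetchone()[0]

# Reference value: each payment counted exactly once.
added, subtracted = conn.execute("""
    SELECT SUM(CASE WHEN CT.eaddOrSubtract = 'add' THEN C.decamount ELSE 0 END),
           SUM(CASE WHEN CT.eaddOrSubtract = 'subtract' THEN C.decamount ELSE 0 END)
    FROM Payment C
    JOIN PaymentType CT ON C.paymentTypeID = CT.paymentTypeID
    WHERE C.pledgeID = 1
""").fetchone()
expected_total = added - subtracted

print(double_join_total, expected_total)
```

With these rows the single subtraction is paired with both additions, so it is subtracted twice and the double join reports a lower total than the reference value.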
A cleaner way to look at the problem directly follows the first paragraph of this answer. Instead of two joins with filter on eaddOrSubtract, it suffices to make one join to a subquery, where the subquery internally handles the sign of the amount being summed. The CASE operator is great for such a job.
SELECT
P.pledgeID
, P.decamount
, COALESCE(SUM(C.signedDecamount), 0) AS paymentTotal
FROM Pledge P
LEFT JOIN (
SELECT
C.*
, CASE CT.eaddOrSubtract
WHEN 'add' THEN C.decamount
WHEN 'subtract' THEN -C.decamount
END AS signedDecamount
FROM Payment C
LEFT JOIN PaymentType CT ON C.paymentTypeID = CT.paymentTypeID
) C ON P.pledgeID = C.pledgeID
GROUP BY P.pledgeID
The COALESCE() call is there for the case when no payments are joined to the pledge or all the joined payments have a NULL decamount. COALESCE() to 0 inside SUM() can always be safely omitted, as SUM() skips NULLs; I guess those calls were just artifacts of hacking the corner cases of joins in the original query.

Related

MySQL GROUP_CONCAT with SUM() and multiple JOINs inside subquery

I'm very average with MySQL, but usually I can write all the needed queries after reading documentation and searching for examples. Now, I'm in the situation where I spent 3 days re-searching and re-writing queries, but I can't get it to work the exact way I need. Here's the deal:
1st table (mpt_companies) contains companies:
| company_id | company_title |
------------------------------
| 1 | Company A |
| 2 | Company B |
2nd table (mpt_payment_methods) contains payment methods:
| payment_method_id | payment_method_title |
--------------------------------------------
| 1 | Cash |
| 2 | PayPal |
| 3 | Wire |
3rd table (mpt_payments) contains payments for each company:
| payment_id | company_id | payment_method_id | payment_amount |
----------------------------------------------------------------
| 1 | 1 | 1 | 10.00 |
| 2 | 2 | 3 | 15.00 |
| 3 | 1 | 1 | 20.00 |
| 4 | 1 | 2 | 10.00 |
I need to list each company along with many stats. One of stats is the sum of payments in each payment method. In other words, the result should be:
| company_id | company_title | payment_data |
--------------------------------------------------------
| 1 | Company A | Cash:30.00,PayPal:10.00 |
| 2 | Company B | Wire:15.00 |
Obviously, I need to:
Select all the companies;
Join payments for each company;
Join payment methods for each payment;
Calculate sum of payments in each method;
GROUP_CONCAT payment methods and sums;
Unfortunately, SUM() doesn't work inside GROUP_CONCAT. Some solutions I found on this site suggest using CONCAT, but that doesn't produce the list I need. Other solutions suggest using CAST(), but maybe I'm doing something wrong, because that doesn't work either. This is the closest query I wrote, which returns each company and a unique list of payment methods used by each company, but doesn't return the sum of payments:
SELECT *,
(some other sub-queries I need...),
(SELECT GROUP_CONCAT(DISTINCT(mpt_payment_methods.payment_method_title))
FROM mpt_payments
JOIN mpt_payment_methods
ON mpt_payments.payment_method_id=mpt_payment_methods.payment_method_id
WHERE mpt_payments.company_id=mpt_companies.company_id
ORDER BY mpt_payment_methods.payment_method_title) AS payment_data
FROM mpt_companies
Then I tried:
SELECT *,
(some other sub-queries I need...),
(SELECT GROUP_CONCAT(DISTINCT(mpt_payment_methods.payment_method_title), ':', CAST(SUM(mpt_payments.payment_amount) AS CHAR))
FROM mpt_payments
JOIN mpt_payment_methods
ON mpt_payments.payment_method_id=mpt_payment_methods.payment_method_id
WHERE mpt_payments.company_id=mpt_companies.company_id
ORDER BY mpt_payment_methods.payment_method_title) AS payment_data
FROM mpt_companies
...and many other variations, but all of them either returned query errors or didn't return the data in the format I need.
The closest answer I could find was MySQL one to many relationship: GROUP_CONCAT or JOIN or both? but after spending 2 hours re-writing the provided query to work with my data, I couldn't do it.
Could anyone give me a suggestion, please?
You can do that by aggregating twice. First for the sum of payments per method and company and then to concatenate the sums for each company.
SELECT x.company_id,
x.company_title,
group_concat(payment_amount_and_method) payment_data
FROM (SELECT c.company_id,
c.company_title,
concat(pm.payment_method_title, ':', sum(p.payment_amount)) payment_amount_and_method
FROM mpt_companies c
INNER JOIN mpt_payments p
ON p.company_id = c.company_id
INNER JOIN mpt_payment_methods pm
ON pm.payment_method_id = p.payment_method_id
GROUP BY c.company_id,
c.company_title,
pm.payment_method_id,
pm.payment_method_title) x
GROUP BY x.company_id,
x.company_title;
Here you go
SELECT company_id,
company_title,
GROUP_CONCAT(
CONCAT(payment_method_title, ':', payment_amount)
) AS payment_data
FROM (
SELECT c.company_id, c.company_title, pm.payment_method_id, pm.payment_method_title, SUM(p.payment_amount) AS payment_amount
FROM mpt_payments p
JOIN mpt_companies c ON p.company_id = c.company_id
JOIN mpt_payment_methods pm ON pm.payment_method_id = p.payment_method_id
GROUP BY p.company_id, p.payment_method_id
) distinct_company_payments
GROUP BY distinct_company_payments.company_id, distinct_company_payments.company_title
;

SQL left join: how to return the newest from tableB and grouped by another field

I've been trying for two days, without luck.
I have the following simplified tables in my database:
customers:
| id | name |
| 1 | andrea |
| 2 | marco |
| 3 | giovanni |
access:
| id | name_id | date |
| 1 | 1 | 5000 |
| 2 | 1 | 4000 |
| 3 | 2 | 1500 |
| 4 | 2 | 3000 |
| 5 | 2 | 1000 |
| 6 | 3 | 6000 |
| 7 | 3 | 2000 |
I want to return all the names with their last access date.
At first I tried simply with
SELECT * FROM customers LEFT JOIN access ON customers.id =
access.name_id
But I got 7 rows instead of the 3 I expected. So I understood I need to use a GROUP BY statement, as follows:
SELECT * FROM customers LEFT JOIN access ON customers.id =
access.name_id GROUP BY customers.id
As far as I know, GROUP BY picks an arbitrary row from each group. In fact, I got unordered access dates in several tests.
Instead I need to group every customer id with its corresponding latest access! How this can be done?
You have to get the latest date from the access table with a GROUP BY on name_id, then join this result with the customers table. Here is the query:
select c.id, c.name, a.last_access_date from customers c left join
(select name_id, max(date) last_access_date from access group by name_id) a
on c.id=a.name_id;
Here is a DEMO on sqlfiddle.
I think this is what you'd like to achieve:
SELECT c.id, c.name, max(a.date) last_access
FROM customers c
LEFT JOIN access a ON c.id = a.name_id
GROUP BY c.id, c.name
The LEFT JOIN will return all entries in table customers regardless of whether the join criterion (c.id = a.name_id) is satisfied. This means that you might get some NULL entries.
Example:
Simply add a new row in the customers table (id: 4, name: manuela). The output will have 4 rows and the newest row will be (id: 4, last_access: null)
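The example above can be checked with a short sketch. It assumes an in-memory SQLite database in place of MySQL; the rows are the ones from the question plus the extra customer (id 4, manuela) suggested above.

```python
import sqlite3

# SQLite in memory stands in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE access (id INTEGER PRIMARY KEY, name_id INTEGER, date INTEGER);
INSERT INTO customers VALUES (1, 'andrea'), (2, 'marco'), (3, 'giovanni'),
                             (4, 'manuela');  -- customer with no access rows
INSERT INTO access VALUES (1, 1, 5000), (2, 1, 4000), (3, 2, 1500), (4, 2, 3000),
                          (5, 2, 1000), (6, 3, 6000), (7, 3, 2000);
""")

# One row per customer; MAX() ignores the NULL produced for unmatched customers,
# so a customer with no access rows gets NULL as last_access.
last_access = conn.execute("""
    SELECT c.id, c.name, MAX(a.date) AS last_access
    FROM customers c
    LEFT JOIN access a ON c.id = a.name_id
    GROUP BY c.id, c.name
    ORDER BY c.id
""").fetchall()

for row in last_access:
    print(row)
```

Each customer appears once with the largest date value, and manuela is kept with a NULL (None in Python) because of the LEFT JOIN.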
I would do this using a correlated subquery in the ON clause:
SELECT a.*, c.*
FROM customers c LEFT JOIN
access a
ON c.id = a.name_id AND
a.DATE = (SELECT MAX(a2.date) FROM access a2 WHERE a2.name_id = a.name_id);
If this statement is true:
I need to group every customer id with its corresponding latest access! How this can be done?
Then you can simply do:
select a.name_id, max(a.date)
from access a
group by a.name_id;
You do not need the customers table because:
All customers are in access, so the left join is not necessary.
You need no columns from customers.

Can I be selective on what rows I join on in MySQL

Suppose I have two tables, people and emails. emails has a person_id, an address, and an is_primary:
people:
id
emails:
person_id
address
is_primary
To get all email addresses per person, I can do a simple join:
select * from people join emails on people.id = emails.person_id
What if I only want (at most) one row from the right table for each row in the left table? And, if a particular person has multiple emails and one is marked as is_primary, is there a way to prefer which row to use when joining?
So, if I have
people: emails:
------ -----------------------------------------
| id | | id | person_id | address | is_primary |
------ -----------------------------------------
| 1 | | 1 | 1 | a#b.c | true |
| 2 | | 2 | 1 | b#b.c | false |
| 3 | | 3 | 2 | c#b.c | true |
| 4 | | 4 | 4 | d#b.c | false |
------ -----------------------------------------
is there a way to get this result:
------------------------------------------------
| people.id | emails.id | address | is_primary |
------------------------------------------------
| 1 | 1 | a#b.c | true |
| 2 | 3 | c#b.c | true | // chosen over b#b.c because it's primary
| 3 | null | null | null | // no email for person 3
| 4 | 4 | d#b.c | false | // no primary email for person 4
------------------------------------------------
You have a slight misunderstanding of how left/right joins work.
This join
select * from people join emails on people.id = emails.person_id
will get you every column from both tables for all records that match your ON condition.
The left join
select * from people left join emails on people.id = emails.person_id
will give you every record from people, regardless if there's a corresponding record in emails or not. When there's not, the columns from the emails table will just be NULL.
If a person has multiple emails, there will be multiple records in the result for this person. Beginners often wonder why the data appears duplicated.
If you want to restrict the data to the rows where is_primary has the value 1, you can do so in the WHERE clause when you're doing an inner join (your first query, although you omitted the inner keyword).
When you have a left/right join query, you have to put this filter in the ON clause. If you put it in the WHERE clause, you would implicitly turn the left/right join into an inner join, because the WHERE clause would filter out the NULL rows that I mentioned above. Or you could write the query like this:
select * from people left join emails on people.id = emails.person_id
where (emails.is_primary = 1 or emails.is_primary is null)
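To make the difference between the three filter placements concrete, here is a small sketch using the sample rows from the question. It assumes an in-memory SQLite database in place of MySQL and stores is_primary as 1/0.

```python
import sqlite3

# SQLite in memory stands in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY);
CREATE TABLE emails (id INTEGER PRIMARY KEY, person_id INTEGER,
                     address TEXT, is_primary INTEGER);
INSERT INTO people VALUES (1), (2), (3), (4);
INSERT INTO emails VALUES (1, 1, 'a#b.c', 1), (2, 1, 'b#b.c', 0),
                          (3, 2, 'c#b.c', 1), (4, 4, 'd#b.c', 0);
""")

def run(sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Filter in ON: every person survives; non-primary emails become NULL.
filter_in_on = run("""
    SELECT p.id, e.address
    FROM people p
    LEFT JOIN emails e ON p.id = e.person_id AND e.is_primary = 1
    ORDER BY p.id
""")

# Filter in WHERE: the NULL-extended rows are discarded, as in an inner join.
filter_in_where = run("""
    SELECT p.id, e.address
    FROM people p
    LEFT JOIN emails e ON p.id = e.person_id
    WHERE e.is_primary = 1
    ORDER BY p.id
""")

# WHERE with an IS NULL escape hatch: keeps people with no email at all,
# but a person whose only email is non-primary is still filtered out.
where_or_null = run("""
    SELECT p.id, e.address
    FROM people p
    LEFT JOIN emails e ON p.id = e.person_id
    WHERE (e.is_primary = 1 OR e.is_primary IS NULL)
    ORDER BY p.id
""")

print(filter_in_on)
print(filter_in_where)
print(where_or_null)
```

On this data the last variant keeps person 3 (no emails) but drops person 4, whose only email is non-primary, so it is not equivalent to putting the filter in the ON clause.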
EDIT after clarification:
Paul Spiegel's answer is good, therefore my upvote, but I'm not sure if it performs well, since it has a dependent subquery. So I created this query. It may depend on your data though. Try both answers.
select
p.*,
coalesce(e1.address, e2.address) AS address
from people p
left join emails e1 on p.id = e1.person_id and e1.is_primary = 1
left join (
select person_id, address
from emails e
where id = (select min(id) from emails where emails.is_primary = 0 and emails.person_id = e.person_id)
) e2 on p.id = e2.person_id
Use a correlated subquery with LIMIT 1 in the ON clause of the LEFT JOIN:
select *
from people p
left join emails e
on e.person_id = p.id
and e.id = (
select e1.id
from emails e1
where e1.person_id = e.person_id
order by e1.is_primary desc, -- true first
e1.id -- If e1.is_primary is ambiguous
limit 1
)
order by p.id
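As a quick check of the query above against the sample rows from the question, here is a sketch that assumes an in-memory SQLite database in place of MySQL, with is_primary stored as 1/0.

```python
import sqlite3

# SQLite in memory stands in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY);
CREATE TABLE emails (id INTEGER PRIMARY KEY, person_id INTEGER,
                     address TEXT, is_primary INTEGER);
INSERT INTO people VALUES (1), (2), (3), (4);
INSERT INTO emails VALUES (1, 1, 'a#b.c', 1), (2, 1, 'b#b.c', 0),
                          (3, 2, 'c#b.c', 1), (4, 4, 'd#b.c', 0);
""")

# For each candidate email row, the subquery picks the single preferred email
# of that person (primary first, lowest id as tie-breaker); only that row joins.
preferred = conn.execute("""
    SELECT p.id, e.id, e.address, e.is_primary
    FROM people p
    LEFT JOIN emails e
           ON e.person_id = p.id
          AND e.id = (
              SELECT e1.id
              FROM emails e1
              WHERE e1.person_id = e.person_id
              ORDER BY e1.is_primary DESC, e1.id
              LIMIT 1
          )
    ORDER BY p.id
""").fetchall()

for row in preferred:
    print(row)
```

Every person yields exactly one row: the primary email when one exists, otherwise the lowest-id email, otherwise NULLs.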

MySQL - Get records from INNER JOIN not between dates

I have two tables
Accounts:
+------------+--------+
| accountsid | name |
+------------+--------+
| 1 | Bob |
| 2 | Rachel |
| 3 | Mark |
+------------+--------+
Sales Orders
+--------------+------------+------------+--------+
| salesorderid | accountsid | so_date | amount |
+--------------+------------+------------+--------+
| 1 | 1 | 2015-12-16 | 50 |
| 2 | 1 | 2016-01-13 | 20 |
| 3 | 2 | 2015-12-14 | 10 |
| 4 | 3 | 2016-02-14 | 35 |
+--------------+------------+------------+--------+
As you can see, it is a 1-N relation: an Account has many Sales Orders and each Sales Order has one Account.
I need to retrieve "old" Accounts that are not active anymore. For example, if an Account doesn't have a Sales Order in 2016, it is an inactive Account.
So, in this example the result will be ONLY Rachel.
How can I retrieve this? I think it's the "opposite" of BETWEEN, but I can't figure out how to do it...
Thanks.
PS. Despite the title, a solution without INNER JOIN is fine.
You're looking to effect an anti-join, for which there are three possibilities in MySQL:
Using NOT IN:
SELECT a.*
FROM Accounts a
WHERE a.accountsid NOT IN (
SELECT so.accountsid
FROM `Sales Orders` so
WHERE so.so_date >= '2016-01-01'
)
Using NOT EXISTS:
SELECT a.*
FROM Accounts a
WHERE NOT EXISTS (
SELECT *
FROM `Sales Orders` so
WHERE so.accountsid = a.accountsid
AND so.so_date >= '2016-01-01'
)
Using an outer JOIN:
SELECT a.*
FROM Accounts a LEFT JOIN `Sales Orders` so
ON so.accountsid = a.accountsid
AND so.so_date >= '2016-01-01'
WHERE so.accountsid IS NULL
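The three forms can be compared side by side on the sample rows from the question. This is a sketch that assumes an in-memory SQLite database in place of MySQL; SQLite also accepts MySQL-style backtick quoting for the `Sales Orders` table name.

```python
import sqlite3

# SQLite in memory stands in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Accounts (accountsid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE `Sales Orders` (salesorderid INTEGER PRIMARY KEY, accountsid INTEGER,
                             so_date TEXT, amount INTEGER);
INSERT INTO Accounts VALUES (1, 'Bob'), (2, 'Rachel'), (3, 'Mark');
INSERT INTO `Sales Orders` VALUES (1, 1, '2015-12-16', 50), (2, 1, '2016-01-13', 20),
                                  (3, 2, '2015-12-14', 10), (4, 3, '2016-02-14', 35);
""")

queries = {
    "not_in": """
        SELECT a.* FROM Accounts a
        WHERE a.accountsid NOT IN (
            SELECT so.accountsid FROM `Sales Orders` so
            WHERE so.so_date >= '2016-01-01')
    """,
    "not_exists": """
        SELECT a.* FROM Accounts a
        WHERE NOT EXISTS (
            SELECT * FROM `Sales Orders` so
            WHERE so.accountsid = a.accountsid
              AND so.so_date >= '2016-01-01')
    """,
    "outer_join": """
        SELECT a.* FROM Accounts a
        LEFT JOIN `Sales Orders` so
               ON so.accountsid = a.accountsid
              AND so.so_date >= '2016-01-01'
        WHERE so.accountsid IS NULL
    """,
}

# Each variant should list the accounts with no order on or after 2016-01-01.
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

On this data all three variants return only Rachel, the one account without a 2016 order.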
Why do you need to use only an inner join? An inner join is for cases where you have matching data in two tables, which is not the case here. You need a subquery with either NOT IN or NOT EXISTS.
What you want is the ids that didn't place any order, so get the ids that placed some order; the rest of them are the ones that didn't place orders.
It should be something like this: SELECT * FROM Accounts WHERE accountsid NOT IN (SELECT accountsid FROM `Sales Orders` WHERE so_date > your_date)

join one row to all row and returning all row

can I get data like this from my table
| id_outlet| date | count(msisdn) |
| 34.10.1 | 2014-08 | 0 |
| 34.10.1 | 2014-09 | 3 |
| 34.10.1 | 2014-10 | 2 |
| 34.10.2 | 2014-08 | 1 |
| 34.10.2 | 2014-09 | 0 |
| 34.10.2 | 2014-10 | 0 |
So I have 2 tables
1. table outlet (unique)
2. table sales (detail of table outlet)
As you can see, my second table contains 3 periods (2014-08, 2014-09, 2014-10).
I want to join each of those periods with every id_outlet in the first table, as in the example above.
Is that possible?
Please help me.
Using a CROSS JOIN:
SELECT
o.id_outlet,
s_main.periode,
o.branch,
count(msisdn)
FROM
(
SELECT DISTINCT SUBSTRING(date,1,7) AS periode
FROM sales
) s_main
CROSS JOIN outlet o
LEFT OUTER JOIN sales s
ON s_main.periode = SUBSTRING(s.date,1,7)
AND o.id_outlet = s.id_outlet
WHERE (o.STATUS LIKE 'STREET%')
GROUP BY s_main.periode, o.branch, o.id_outlet
If you have a table of dates, you can use that instead of the subquery to get the periods (which also avoids the potential problem of a month missing from the results when no outlet had any sales in it).
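Here is a minimal sketch of the CROSS JOIN approach, assuming an in-memory SQLite database in place of MySQL. The sales rows and msisdn values are hypothetical, chosen so that the counts match the result table in the question; the branch and STATUS columns are left out, and SUBSTR is used because both MySQL and SQLite accept it.

```python
import sqlite3

# SQLite in memory stands in for MySQL (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE outlet (id_outlet TEXT PRIMARY KEY);
CREATE TABLE sales (id_outlet TEXT, date TEXT, msisdn TEXT);
INSERT INTO outlet VALUES ('34.10.1'), ('34.10.2');
-- Hypothetical sales rows; msisdn values are made up.
INSERT INTO sales VALUES
    ('34.10.1', '2014-09-03', 'm1'), ('34.10.1', '2014-09-10', 'm2'),
    ('34.10.1', '2014-09-21', 'm3'), ('34.10.1', '2014-10-02', 'm4'),
    ('34.10.1', '2014-10-15', 'm5'), ('34.10.2', '2014-08-07', 'm6');
""")

# Build every (period, outlet) pair first, then attach matching sales rows.
# COUNT(s.msisdn) counts only matched rows, so empty pairs report 0.
grid = conn.execute("""
    SELECT o.id_outlet, s_main.periode, COUNT(s.msisdn) AS n
    FROM (SELECT DISTINCT SUBSTR(date, 1, 7) AS periode FROM sales) s_main
    CROSS JOIN outlet o
    LEFT JOIN sales s
           ON s_main.periode = SUBSTR(s.date, 1, 7)
          AND o.id_outlet = s.id_outlet
    GROUP BY o.id_outlet, s_main.periode
    ORDER BY o.id_outlet, s_main.periode
""").fetchall()

for row in grid:
    print(row)
```

Each outlet gets one row per period, including the periods in which it had no sales.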
Don't worry, be happy!
SELECT
o.id_outlet,
SUBSTRING(s.date,1,7) AS periode,
o.branch
FROM outlet o LEFT JOIN sales s ON o.id_outlet = s.id_outlet
WHERE (o.STATUS LIKE 'STREET%')
ORDER BY o.id_outlet, YEAR(s.DATE), MONTH(s.DATE), branch
You need this query:
SELECT
o.id_outlet,
d.period AS periode,
o.branch,
count(msisdn)
FROM dates d LEFT JOIN outlet o ON d.period = SUBSTRING(o.date,1,7) LEFT JOIN sales s ON o.id_outlet = s.id_outlet
WHERE (o.STATUS LIKE 'STREET%')
GROUP BY CONCAT(d.period, '#', s.id_outlet)
ORDER BY o.id_outlet, d.period, branch