Use aggregate values for further calculation - mysql

I have the following query:
SELECT contracts.id,
(SELECT sum(pos.sum_to_pay) FROM pos
where pos.contract_id=contracts.id and pos.is_draft=0) as paid,
(SELECT sum(acts.amount) FROM acts
where acts.contract_id=contracts.id) as acts_sum
from contracts
It works, but I want to add another result field, to_pay, calculated as to_pay = acts_sum - paid.
I'm trying to do it like this:
SELECT contracts.id,
(SELECT sum(pos.sum_to_pay) FROM pos
where pos.contract_id=contracts.id and pos.is_draft=0) as paid,
(SELECT sum(acts.amount) FROM acts
where acts.contract_id=contracts.id) as acts_sum,
(acts_sum - paid) as to_pay
from contracts
but I get the error Unknown column 'acts_sum'. How can I compute the to_pay value from acts_sum and paid?

An alias defined in a SELECT list cannot be referenced elsewhere in that same list, so wrap the query in a subquery and compute to_pay in the outer query, like this:
SELECT id, acts_sum, paid, (acts_sum - paid) as to_pay FROM
(SELECT contracts.id,
(SELECT sum(pos.sum_to_pay) FROM pos
where pos.contract_id=contracts.id and pos.is_draft=0) as paid,
(SELECT sum(acts.amount) FROM acts
where acts.contract_id=contracts.id) as acts_sum
from contracts ) subq

You could also rewrite your query using joins; correlated subqueries are sometimes considered a costly solution:
select c.id,
COALESCE(a.acts_sum,0) as acts_sum,
COALESCE(p.paid,0) as paid,
COALESCE(a.acts_sum,0) - COALESCE(p.paid,0) as to_pay
from contracts c
left join (
SELECT contract_id,sum(sum_to_pay) paid
FROM pos
where is_draft=0
group by contract_id
) p on c.id = p.contract_id
left join (
SELECT contract_id,sum(amount) acts_sum
FROM acts
group by contract_id
) a on c.id = a.contract_id
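For what it's worth, here is a minimal sketch that runs both approaches against made-up rows. It uses Python's built-in sqlite3 module instead of MySQL (an assumption, chosen only so the example is self-contained), and the sample data is hypothetical. It shows one practical difference: without COALESCE, to_pay comes out as NULL for any contract that has no acts or no non-draft payments.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
SCHEMA = """
CREATE TABLE contracts (id INTEGER PRIMARY KEY);
CREATE TABLE pos  (contract_id INTEGER, sum_to_pay INTEGER, is_draft INTEGER);
CREATE TABLE acts (contract_id INTEGER, amount INTEGER);
INSERT INTO contracts VALUES (1), (2), (3);
INSERT INTO pos  VALUES (1, 40, 0), (1, 10, 0), (1, 99, 1), (2, 5, 0);
INSERT INTO acts VALUES (1, 100), (1, 20), (3, 7);
"""

# Wrapping the query lets the outer SELECT see the aliases, but SUM over
# zero rows is NULL, so to_pay is NULL when either side is missing.
WRAPPED = """
SELECT id, acts_sum, paid, acts_sum - paid AS to_pay
FROM (SELECT contracts.id,
             (SELECT SUM(pos.sum_to_pay) FROM pos
               WHERE pos.contract_id = contracts.id AND pos.is_draft = 0) AS paid,
             (SELECT SUM(acts.amount) FROM acts
               WHERE acts.contract_id = contracts.id) AS acts_sum
      FROM contracts) subq
ORDER BY id
"""

# Pre-aggregated LEFT JOINs with COALESCE treat a missing side as 0.
JOINED = """
SELECT c.id,
       COALESCE(a.acts_sum, 0) AS acts_sum,
       COALESCE(p.paid, 0)     AS paid,
       COALESCE(a.acts_sum, 0) - COALESCE(p.paid, 0) AS to_pay
FROM contracts c
LEFT JOIN (SELECT contract_id, SUM(sum_to_pay) AS paid
           FROM pos WHERE is_draft = 0 GROUP BY contract_id) p
       ON p.contract_id = c.id
LEFT JOIN (SELECT contract_id, SUM(amount) AS acts_sum
           FROM acts GROUP BY contract_id) a
       ON a.contract_id = c.id
ORDER BY c.id
"""

def demo():
    """Run both queries on the sample data and return their rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    wrapped = conn.execute(WRAPPED).fetchall()
    joined = conn.execute(JOINED).fetchall()
    conn.close()
    return wrapped, joined

if __name__ == "__main__":
    for label, rows in zip(("wrapped", "joined"), demo()):
        print(label, rows)
```

Contract 1 gets the same result from both queries; contracts 2 and 3 differ only in how the missing side is treated.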

Related

SELECT query takes too much time to process

The following MySQL query takes too much time: about 24 seconds, even though no table has more than 15,000 records. Please guide me on how to make it faster.
Thanks
select c1.code,
( SELECT COALESCE(sum(i.total_amount),0)
FROM invoice as i
WHERE i.customer_code= c1.code
)-
( SELECT COALESCE(sum(p.amount),0)
FROM collection as p
where p.customer_code = c1.code
)-
( SELECT COALESCE(sum(CN.amount),0)
FROM cr_note as CN
where CN.customer_code= c1.code
) as rem_Balance
from customer as c1
You can make it faster by replacing the correlated subqueries with pre-aggregated queries and left joins, like this (the WITH clause requires MySQL 8.0 or later):
WITH allInvoice AS (SELECT customer_code AS code, SUM(total_amount) AS amount FROM invoice GROUP BY customer_code),
allCollection AS (SELECT customer_code AS code, SUM(amount) AS amount FROM collection GROUP BY customer_code),
allNote AS (SELECT customer_code AS code, SUM(amount) AS amount FROM cr_note GROUP BY customer_code)
SELECT customer.code,
(COALESCE(allInvoice.amount, 0) - COALESCE(allCollection.amount, 0) - COALESCE(allNote.amount, 0)) AS rem_Balance
FROM customer
LEFT JOIN allInvoice ON allInvoice.code = customer.code
LEFT JOIN allCollection ON allCollection.code = customer.code
LEFT JOIN allNote ON allNote.code = customer.code

SQL: Select records based on comparison of two most recent associated records

Let's say we have a person table and a survey table. A survey is a set of attributes collected from a person at some point in time. Let's say survey has columns address and marriage_status.
How do I select all persons whose address or marriage status has changed in the last survey?
Here's how I would write it if MySQL were able to magically interpret my intention:
SELECT *
FROM person
JOIN
(SELECT *
FROM survey
GROUP BY survey.person_id
ORDER BY survey.timestamp DESC
LIMIT 2 EACH) -- of course this part doesn't actually work. Trying to get last 2 records per person
surveys
ON surveys.person_id = person.id
WHERE surveys[0].address != surveys[1].address
OR surveys[0].marriage_status != surveys[1].marriage_status;
Or:
SELECT *
FROM person
JOIN
(SELECT MOST RECENT survey FOR EACH person) latest_survey
ON latest_survey.person_id = person.id
JOIN
(SELECT SECOND MOST RECENT survey FOR EACH person) previous_survey
ON previous_survey.person_id = person.id
WHERE latest_survey.address != previous_survey.address
OR latest_survey.marriage_status != previous_survey.marriage_status;
This seems like a relatively straightforward query, but it's driving me crazy. I suspect I have tunnel vision and I'm not approaching this the right way.
EDIT: I am on MySQL v5. Based on the first couple answers, it seems like this might be the time to migrate to v8 (among other reasons)
So here's how I ended up doing it. It's a little long, but I think it's pretty straightforward? This felt amazing to get working.
(Note that underscores are used as prefixes in table aliases to help keep track of subquery depth)
SELECT person.*
FROM person
JOIN (
-- Join full survey data against each 'most recent' survey timestamp
SELECT s1.*
FROM survey s1
JOIN (
-- get most recent timestamp for each person
SELECT _s1.person_id, MAX(_s1.timestamp) timestamp
FROM survey _s1
GROUP BY person_id
) latest_surveys
ON latest_surveys.person_id = s1.person_id and latest_surveys.timestamp = s1.timestamp
) latest
ON latest.person_id = person.id
JOIN (
-- Join full survey data against each 'SECOND most recent' survey timestamp
select s2.*
from survey s2
JOIN (
-- to get SECOND most recent survey timestamp, do similar query, but exclude latest timestamp
SELECT _s2.person_id, MAX(_s2.timestamp) timestamp
FROM survey _s2
JOIN (
-- get most recent timestamp for each person (again)
SELECT __s2.person_id, MAX(__s2.timestamp) timestamp
FROM survey __s2
GROUP BY person_id
) _latest_surveys
-- Note the *NOT* equal here
ON _latest_surveys.person_id = _s2.person_id and _latest_surveys.timestamp != _s2.timestamp
GROUP BY _s2.person_id
) previous_surveys
ON previous_surveys.person_id = s2.person_id and previous_surveys.timestamp = s2.timestamp
) previous
ON previous.person_id = person.id
WHERE latest.address != previous.address
OR latest.marriage_status != previous.marriage_status;
Analytic functions make your question much more tractable. If you are not yet using MySQL 8+, then now would be a good time to upgrade. Assuming you are using MySQL 8+, we can try:
WITH cte AS (
SELECT p.id, s.address, s.marriage_status,
ROW_NUMBER() OVER (PARTITION BY p.id ORDER BY s.timestamp DESC) rn
FROM person p
INNER JOIN survey s ON p.id = s.person_id
)
SELECT id
FROM cte
GROUP BY id
HAVING
MAX(CASE WHEN rn = 1 THEN address END) <> MAX(CASE WHEN rn = 2 THEN address END) OR
MAX(CASE WHEN rn = 1 THEN marriage_status END) <> MAX(CASE WHEN rn = 2 THEN marriage_status END);
The above query uses a pivot trick to isolate the latest, and second latest, addresses and marriage statuses for each person. It retains person id values for those whose latest and second latest addresses or marriage statuses are not identical.
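Here is a small, hedged sketch of the same pivot trick on made-up data. It runs through Python's built-in sqlite3 module rather than MySQL 8 (an assumption for self-containment; the bundled SQLite must be 3.25 or newer for window functions), and it reads survey directly instead of joining person. The sample rows are hypothetical.

```python
import sqlite3

def make_db():
    """Build an in-memory survey table with hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE survey (person_id INTEGER, timestamp INTEGER,
                             address TEXT, marriage_status TEXT);
        INSERT INTO survey VALUES
            (1, 1, 'A', 'single'), (1, 2, 'A', 'single'), (1, 3, 'B', 'single'),
            (2, 1, 'X', 'single'), (2, 2, 'Y', 'married'), (2, 3, 'Y', 'married'),
            (3, 1, 'Z', 'single'),
            (4, 1, 'Q', 'single'), (4, 2, 'Q', 'married');
    """)
    return conn

# rn = 1 is each person's latest survey, rn = 2 the one before it.
# MAX(CASE ...) pivots those two rows into one row per person.
CHANGED_SQL = """
WITH cte AS (
    SELECT s.person_id, s.address, s.marriage_status,
           ROW_NUMBER() OVER (PARTITION BY s.person_id
                              ORDER BY s.timestamp DESC) AS rn
    FROM survey s
)
SELECT person_id
FROM cte
GROUP BY person_id
HAVING MAX(CASE WHEN rn = 1 THEN address END)
       <> MAX(CASE WHEN rn = 2 THEN address END)
    OR MAX(CASE WHEN rn = 1 THEN marriage_status END)
       <> MAX(CASE WHEN rn = 2 THEN marriage_status END)
ORDER BY person_id
"""

def changed_person_ids(conn):
    """Return ids of people whose two most recent surveys differ."""
    return [row[0] for row in conn.execute(CHANGED_SQL)]

if __name__ == "__main__":
    print(changed_person_ids(make_db()))
```

Person 2 changed earlier but not between the last two surveys, and person 3 has only one survey, so the comparison against a missing second row is NULL and both are left out.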
This might be how you can achieve that:
SELECT *
FROM person
JOIN (
SELECT person_id,
MAX(survey_date) latest_survey,
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(survey_date ORDER BY person_id, survey_date ASC),',',-2),',',1) previous_survey,
SUBSTRING_INDEX(GROUP_CONCAT(address ORDER BY person_id, survey_date ASC),',',-1) curadd,
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(address ORDER BY person_id, survey_date ASC),',',-2),',',1) prevadd,
SUBSTRING_INDEX(GROUP_CONCAT(marriage_status ORDER BY person_id, survey_date ASC),',',-1) curms,
SUBSTRING_INDEX(SUBSTRING_INDEX(GROUP_CONCAT(marriage_status ORDER BY person_id, survey_date ASC),',',-2),',',1) prevms
FROM survey GROUP BY person_id
HAVING curadd != prevadd OR curms != prevms) A
ON person.id=A.person_id;
This uses GROUP_CONCAT and SUBSTRING_INDEX to combine each person's values into one string and then split it apart again, so that the latest and previous values can be compared at the end. It assumes the values themselves contain no commas. There are other ways to achieve this without these functions; your second example, for instance, can probably be done, but I think it would be a very long query. Since you're not using MySQL 8+, this query is much shorter, but its performance is a concern, especially on a large table.
You didn't say, but I hope you have at least MySQL 8 (or similar) so that you can use Common Table Expressions, which can simplify a complex query.
The tricky part is getting survey records #1 and #2 for each person. I would do it this way (see the cte1 and cte2 definitions):
WITH
cte1 AS (
SELECT MAX(x1.id) AS id, x1.person_id
FROM survey x1
GROUP BY x1.person_id),
cte2 AS (
SELECT MAX(x2.id) AS id, x2.person_id
FROM survey x2
JOIN cte1 ON cte1.person_id = x2.person_id
AND cte1.id > x2.id
GROUP BY x2.person_id)
SELECT
p.*,
s1.address, s2.address address2,
s1.marriage_status, s2.marriage_status marriage_status2
FROM person AS p
JOIN (
cte1 JOIN survey s1 ON s1.id = cte1.id
) ON cte1.person_id = p.id
JOIN (
cte2 JOIN survey s2 ON s2.id = cte2.id
) ON cte2.person_id = p.id
WHERE
(s1.address <> s2.address)
OR (s1.marriage_status <> s2.marriage_status)
https://www.db-fiddle.com/f/hLwdHiZin4MkdUZ4aBz67H/2
Update: Thanks to Ian, I replaced MIN with MAX to get the most recent records.

Postgresql jsonb_agg subquery sort

How can I sort the results of a subquery that's using a json aggregate?
If I had a schema like this:
CREATE TABLE plans( id integer NOT NULL, name character varying(255));
CREATE TABLE plan_items ( id integer NOT NULL, plan_id integer NOT NULL, expected_at date, status integer);
I'm aggregating the plan_items result on a json column through a subquery.
Like this:
SELECT
plans.id,
plans.name,
jsonb_agg((SELECT pi_cols FROM
(SELECT plan_items.id, plan_items.expected_at, plan_items.status) pi_cols
)) AS plan_items_data
FROM
plans
INNER JOIN plan_items ON plan_items.plan_id = plans.id
GROUP BY
plans.id,
plans.name
ORDER BY plans.id;
The JSON aggregate is working as expected and gives me the results that I need. Ok.
But I can't order the results.
I've tried:
jsonb_agg((SELECT pi_cols FROM
(SELECT plan_items.id, plan_items.expected_at, plan_items.status ORDER BY plan_items.expected_at) pi_cols
)) AS plan_items_data
and also:
jsonb_agg((SELECT pi_cols FROM
(SELECT plan_items.id, plan_items.expected_at, plan_items.status) pi_cols ORDER BY pi_cols.expected_at
)) AS plan_items_data
But neither of these worked.
Any ideas?
As Abelisto suggests, just use a simple aggregate expression with ordering:
jsonb_agg(plan_items ORDER BY plan_items.expected_at) AS plan_items_data
Join the tables with the desired sort order and use a lateral join to select the columns for jsonb_agg():
select s.plan_id id, name, jsonb_agg(pi_col)
from (
select p.id plan_id, p.name, pi.id, expected_at, status
from plans p
join plan_items pi
on p.id = pi.plan_id
order by p.id, expected_at
) s,
lateral (
select s.id, expected_at, status
) pi_col
group by 1, 2
order by 1;
The above query seems more natural and flexible (and a bit faster in most cases) than the one with a subquery in the select list. However, for better performance you should also apply Abelisto's suggestion:
select s.plan_id id, name, jsonb_agg(pi_col order by pi_col.expected_at)
from (
select p.id plan_id, p.name, pi.id, expected_at, status
from plans p
join plan_items pi
on p.id = pi.plan_id
) s,
lateral (
select s.id, expected_at, status
) pi_col
group by 1, 2
order by 1;

SQL counts wrong number of likes

I have written an SQL statement that, besides all the other columns, should return the number of comments and the number of likes of a certain post. It works perfectly when I don't also try to get the number of times it has been shared. When I do, it returns a wrong number of likes, which seems to be either the number of shares plus likes or something like that. Here is the code:
SELECT
[...],
count(CS.commentId) as shares,
count(CL.commentId) as numberOfLikes
FROM
(SELECT *
FROM accountSpecifics
WHERE institutionId= '{$keyword['id']}') `AS`
INNER JOIN
account A ON A.id = `AS`.accountId
INNER JOIN
comment C ON C.accountId = A.id
LEFT JOIN
commentLikes CL ON C.commentId = CL.commentId
LEFT JOIN
commentShares CS ON C.commentId = CS.commentId
GROUP BY
C.time
ORDER BY
year, month, hour, month
Could you also tell me whether you think this is an efficient SQL statement or whether you would do it differently? Thank you!
Do this instead:
SELECT
[...],
(select count(*) from commentShares CS where C.commentId = CS.commentId) as shares,
(select count(*) from commentLikes CL where C.commentId = CL.commentId) as numberOfLikes
FROM
(SELECT *
FROM accountSpecifics
WHERE institutionId= '{$keyword['id']}') `AS`
INNER JOIN account A ON A.id = `AS`.accountId
INNER JOIN comment C ON C.accountId = A.id
GROUP BY C.time
ORDER BY year, month, hour, month
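If you use JOINs, you get back a single result set in which every like row of a comment is paired with every share row of that comment. COUNT(some column) simply counts the rows where that column is not NULL, so whenever a comment has both likes and shares, both counts come out the same (likes multiplied by shares), which is the wrong thing here. Subqueries are what you need. Good luck!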
EDIT: as posted below, count(distinct something) can also work, but it's making the database do more work than necessary for the answer you want to end up with.
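To make the row multiplication concrete, here is a minimal sketch on made-up data. It uses Python's built-in sqlite3 module instead of MySQL (an assumption for self-containment), reduces the query to the three tables that matter, and gives commentLikes and commentShares a hypothetical per-row id column that the question does not show.

```python
import sqlite3

def make_db():
    """Comment 1 has 3 likes and 2 shares; comment 2 has 1 like, 0 shares."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE comment       (commentId INTEGER PRIMARY KEY);
        -- The id columns below are assumed, not taken from the question.
        CREATE TABLE commentLikes  (id INTEGER PRIMARY KEY, commentId INTEGER);
        CREATE TABLE commentShares (id INTEGER PRIMARY KEY, commentId INTEGER);
        INSERT INTO comment VALUES (1), (2);
        INSERT INTO commentLikes  (commentId) VALUES (1), (1), (1), (2);
        INSERT INTO commentShares (commentId) VALUES (1), (1);
    """)
    return conn

# Two LEFT JOINs pair every like with every share: 3 x 2 = 6 rows for comment 1.
JOINED = """
SELECT C.commentId, COUNT(CL.commentId) AS likes, COUNT(CS.commentId) AS shares
FROM comment C
LEFT JOIN commentLikes  CL ON CL.commentId = C.commentId
LEFT JOIN commentShares CS ON CS.commentId = C.commentId
GROUP BY C.commentId ORDER BY C.commentId
"""

# Correlated subqueries count each table on its own.
SUBQUERIES = """
SELECT C.commentId,
       (SELECT COUNT(*) FROM commentLikes  CL WHERE CL.commentId = C.commentId) AS likes,
       (SELECT COUNT(*) FROM commentShares CS WHERE CS.commentId = C.commentId) AS shares
FROM comment C ORDER BY C.commentId
"""

# COUNT(DISTINCT ...) also works, provided the column is unique per like/share row.
DISTINCT_IDS = """
SELECT C.commentId, COUNT(DISTINCT CL.id) AS likes, COUNT(DISTINCT CS.id) AS shares
FROM comment C
LEFT JOIN commentLikes  CL ON CL.commentId = C.commentId
LEFT JOIN commentShares CS ON CS.commentId = C.commentId
GROUP BY C.commentId ORDER BY C.commentId
"""

def counts(conn, sql):
    """Return (commentId, likes, shares) rows for the given query."""
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    db = make_db()
    for name, sql in (("joined", JOINED), ("subqueries", SUBQUERIES),
                      ("distinct ids", DISTINCT_IDS)):
        print(name, counts(db, sql))
```

With this data the joined version reports 6 likes and 6 shares for comment 1, while the other two report 3 and 2.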
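Quick fix (note that commentId is the join key, so DISTINCT on it counts comments that have at least one share or like; to count the shares and likes themselves, apply DISTINCT to a column that is unique per row of commentShares and commentLikes):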
SELECT
[...],
count(DISTINCT CS.commentId) as shares,
count(DISTINCT CL.commentId) as numberOfLikes
Better approach:
SELECT [...]
, Coalesce(shares.numberOfShares, 0) As numberOfShares
, Coalesce(likes.numberOfLikes , 0) As numberOfLikes
FROM [...]
LEFT
JOIN (
SELECT commentId
, Count(*) As numberOfShares
FROM commentShares
GROUP
BY commentId
) As shares
ON shares.commentId = C.commentId
LEFT
JOIN (
SELECT commentId
, Count(*) As numberOfLikes
FROM commentLikes
GROUP
BY commentId
) As likes
ON likes.commentId = C.commentId

Sums are being multiplied in query

I am working on 2 problems for homework, and after many hours I have just about solved them both. The last issue I have is that both of my queries are coming back with doubled numerical values instead of single.
Here is what I have:
SELECT SUM(P.AMT_PAID) AS TOTAL_PAID, C.CITATION_ID, C.DATE_ISSUED, SUM(V.FINE_CHARGED) AS TOTAL_CHARGED
FROM PAYMENT P, CITATION C, VIOLATION_CITATION V
WHERE V.CITATION_ID = C.CITATION_ID
AND C.CITATION_ID = P.CITATION_ID
GROUP BY C.CITATION_ID;
and my other one:
SELECT C.CITATION_ID, C.DATE_ISSUED, SUM(V.FINE_CHARGED) AS TOTAL_CHARGED, SUM(P.AMT_PAID) AS TOTAL_PAID, SUM(V.FINE_CHARGED) - SUM(P.AMT_PAID) AS TOTAL_OWED
FROM (CITATION C)
LEFT JOIN VIOLATION_CITATION V
ON V.CITATION_ID = C.CITATION_ID
LEFT JOIN PAYMENT P
ON P.CITATION_ID = C.CITATION_ID
GROUP BY C.CITATION_ID
ORDER BY TOTAL_OWED DESC;
I am sure there is just something that I am overlooking. If someone else could kindly tell me where I went awry it would be a great help.
Select Sum(P.Amt_Paid) As Total_Paid, C.Citation_Id
, C.Date_Issued, Sum(V.Fine_Charged) As Total_Charged
From Payment P
Join Citation C
On C.Citation_Id = P.Citation_Id
Join Violation_Citation V
On V.Citation_Id = C.Citation_Id
Group By C.Citation_Id
First, you should use the JOIN syntax instead of the comma-delimited list of tables. It is easier to read, more standardized, and helps prevent problems caused by overlooking a filtering clause.
Second, the most likely reason for a sum that is too large is the join to the VIOLATION_CITATION table. If you remove the Group By and the columns with aggregate functions, you will likely see that P.AMT_PAID is repeated for each instance of VIOLATION_CITATION. Perhaps the following will solve the problem:
Select Coalesce(PaidByCitation.TotalAmtPaid,0) As Total_Paid
, C.Citation_Id, C.Date_Issued
, Coalesce(ViolationByCitation.TotalCharged,0) As Total_Charged
, Coalesce(ViolationByCitation.TotalCharged,0)
- Coalesce(PaidByCitation.TotalAmtPaid,0) As Total_Owed
From Citation As C
Left Join (
Select P.Citation_Id, Sum( P.Amt_Paid ) As TotalAmtPaid
From Payment As P
Group By P.Citation_Id
) As PaidByCitation
On PaidByCitation.Citation_Id = C.Citation_Id
Left Join (
Select V.Citation_Id, Sum( V.Fine_Charged ) As TotalCharged
From Violation_Citation As V
Group By V.Citation_Id
) As ViolationByCitation
On ViolationByCitation.Citation_Id = C.Citation_Id
Coalesce ensures that if the left join returns no rows for a given Citation_ID value, the resulting Null is replaced with zero.
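The diagnosis described above (drop the Group By and look at the raw joined rows) can be tried on made-up data with the sketch below. It runs through Python's built-in sqlite3 module rather than the asker's database (an assumption for self-containment), and the amounts and dates are hypothetical.

```python
import sqlite3

def make_db():
    """Citation 1: fines 50 + 30, payments 20 + 10. Citation 2: fine 100, unpaid."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE CITATION (CITATION_ID INTEGER PRIMARY KEY, DATE_ISSUED TEXT);
        CREATE TABLE VIOLATION_CITATION (CITATION_ID INTEGER, FINE_CHARGED INTEGER);
        CREATE TABLE PAYMENT (CITATION_ID INTEGER, AMT_PAID INTEGER);
        INSERT INTO CITATION VALUES (1, '2020-01-01'), (2, '2020-02-01');
        INSERT INTO VIOLATION_CITATION VALUES (1, 50), (1, 30), (2, 100);
        INSERT INTO PAYMENT VALUES (1, 20), (1, 10);
    """)
    return conn

# Without GROUP BY, each payment shows up once per violation (and vice versa),
# which is why summing over these rows doubles both totals for citation 1.
RAW_ROWS = """
SELECT C.CITATION_ID, V.FINE_CHARGED, P.AMT_PAID
FROM CITATION C
JOIN VIOLATION_CITATION V ON V.CITATION_ID = C.CITATION_ID
JOIN PAYMENT P ON P.CITATION_ID = C.CITATION_ID
ORDER BY C.CITATION_ID, V.FINE_CHARGED, P.AMT_PAID
"""

# Aggregating each child table first leaves one row per citation to join.
PRE_AGGREGATED = """
SELECT C.CITATION_ID,
       COALESCE(P.TOTAL_PAID, 0)    AS TOTAL_PAID,
       COALESCE(V.TOTAL_CHARGED, 0) AS TOTAL_CHARGED,
       COALESCE(V.TOTAL_CHARGED, 0) - COALESCE(P.TOTAL_PAID, 0) AS TOTAL_OWED
FROM CITATION C
LEFT JOIN (SELECT CITATION_ID, SUM(AMT_PAID) AS TOTAL_PAID
           FROM PAYMENT GROUP BY CITATION_ID) P
       ON P.CITATION_ID = C.CITATION_ID
LEFT JOIN (SELECT CITATION_ID, SUM(FINE_CHARGED) AS TOTAL_CHARGED
           FROM VIOLATION_CITATION GROUP BY CITATION_ID) V
       ON V.CITATION_ID = C.CITATION_ID
ORDER BY C.CITATION_ID
"""

def run(conn, sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

if __name__ == "__main__":
    db = make_db()
    print(run(db, RAW_ROWS))
    print(run(db, PRE_AGGREGATED))
```

The raw rows list each of the two payments twice, once per violation, while the pre-aggregated query returns one correct row per citation, including the unpaid one.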