MySQL join 3 calculation results - mysql

I have 3 different MySQL calculations, which I'd like to join. I need to be able to show lines where a sum may not exist for some column, or some invoice might not have a corporation ID.
I'm trying to get something like:
passport_amount | invoice_amount | balance_amount | corporation_id
------------------------------------------------------------------
345 | 2345 | 56 | 56
So that I can work on these values in my application code by iterating the list once and not fetching data from the database three times, and then iterating three times to combine the values.
SELECT
sum(passports.amount) AS passports_amount,
companies.corporation_id
FROM
passports
INNER JOIN
employees ON ( passports.employee_id = employees.id )
INNER JOIN
companies ON ( employees.company_id = companies.id )
WHERE
((((passports.pass_type IN ('sport','culture','both'))
AND
(MONTH(passports.valid_from) >= 1
AND MONTH(passports.valid_from) <= 9
AND YEAR(passports.valid_from) = year(now())))
AND (passports.removed = 0
AND passports.valid_from <= date('2014-09-29 11:55:26')))
AND (companies.removed = 0)
AND companies.corporation_id IS NOT NULL)
GROUP BY
companies.corporation_id;
SELECT
sum(invoices.amount) AS invoices_amount,
invoices.corporation_id
FROM
invoices
WHERE
((((YEAR(sent_at) = 2014)
AND (invoices.product_type_id IN (2,3,4)))
AND
(invoices.removed = 0
AND invoices.activated = 1))
AND invoices.corporation_id IS NOT NULL)
GROUP BY
invoices.corporation_id;
SELECT
amount AS balance_amount,
business_id AS corporation_id
FROM
invoice_balances
WHERE
business_type = 'Corporation';

You can combine the balance and invoice-amount queries by using subselects in LEFT JOINs, but this will look odd and can be expensive in terms of performance:
SELECT
SUM(p.amount) AS passports_amount,
ii.invoices_amount,
b.balance_amount,
c.corporation_id
FROM
passports p
INNER JOIN employees e ON ( p.employee_id = e.id )
INNER JOIN companies c ON ( e.company_id = c.id )
/* Added left join for balance using subselect*/
LEFT JOIN (
SELECT amount AS balance_amount, business_id AS corporation_id
FROM invoice_balances
WHERE business_type = 'Corporation'
) b ON (c.corporation_id = b.corporation_id)
/* Added left join for invoices_amount using subselect*/
LEFT JOIN(
SELECT SUM(i.amount) AS invoices_amount,
i.corporation_id
FROM
invoices i
WHERE
((((YEAR(sent_at) = 2014)
AND (i.product_type_id IN (2,3,4)))
AND (i.removed = 0 AND i.activated = 1)
)
AND i.corporation_id IS NOT NULL)
GROUP BY i.corporation_id
) ii ON(c.corporation_id = ii.corporation_id)
/* end of joins */
WHERE
((((p.pass_type IN ('sport','culture','both'))
AND
(MONTH(p.valid_from) >= 1
AND MONTH(p.valid_from) <= 9
AND YEAR(p.valid_from) = YEAR(NOW())))
AND (p.removed = 0
AND p.valid_from <= DATE('2014-09-29 11:55:26')))
AND (c.removed = 0)
AND c.corporation_id IS NOT NULL)
GROUP BY c.corporation_id;
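Note that the query above still starts from passports with INNER JOINs, so a corporation with invoices or a balance but no passports would not appear. A minimal sketch of one way around that, assuming it is acceptable to build the list of corporation IDs from companies and invoices first and then LEFT JOIN each pre-aggregated total to it. It uses Python's sqlite3 as a stand-in for MySQL, the sample rows are invented, and the date/type filters from the question are left out for brevity:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (id INTEGER PRIMARY KEY, corporation_id INTEGER);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, company_id INTEGER);
    CREATE TABLE passports (id INTEGER PRIMARY KEY, employee_id INTEGER, amount INTEGER);
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, corporation_id INTEGER, amount INTEGER);
    CREATE TABLE invoice_balances (business_type TEXT, business_id INTEGER, amount INTEGER);

    -- Invented sample data: corporation 57 has no passports, 58 has only an invoice.
    INSERT INTO companies VALUES (1, 56), (2, 57);
    INSERT INTO employees VALUES (1, 1), (2, 2);
    INSERT INTO passports VALUES (1, 1, 300), (2, 1, 45);
    INSERT INTO invoices VALUES (1, 56, 2000), (2, 56, 345), (3, 58, 100);
    INSERT INTO invoice_balances VALUES ('Corporation', 56, 56), ('Company', 57, 9);
""")

def combined_totals(conn):
    """Return one row per corporation; a total is NULL (None) where it does not exist."""
    return conn.execute("""
        SELECT corp.corporation_id,
               p.passports_amount,
               i.invoices_amount,
               b.balance_amount
        FROM (SELECT corporation_id FROM companies WHERE corporation_id IS NOT NULL
              UNION
              SELECT corporation_id FROM invoices WHERE corporation_id IS NOT NULL) AS corp
        LEFT JOIN (SELECT c.corporation_id, SUM(pa.amount) AS passports_amount
                   FROM passports pa
                   JOIN employees e ON pa.employee_id = e.id
                   JOIN companies c ON e.company_id = c.id
                   GROUP BY c.corporation_id) AS p
               ON p.corporation_id = corp.corporation_id
        LEFT JOIN (SELECT corporation_id, SUM(amount) AS invoices_amount
                   FROM invoices
                   GROUP BY corporation_id) AS i
               ON i.corporation_id = corp.corporation_id
        LEFT JOIN (SELECT business_id AS corporation_id, amount AS balance_amount
                   FROM invoice_balances
                   WHERE business_type = 'Corporation') AS b
               ON b.corporation_id = corp.corporation_id
        ORDER BY corp.corporation_id
    """).fetchall()

rows = combined_totals(conn)
for row in rows:
    print(row)
```

Because every total is aggregated inside its own derived table before the join, no join can multiply rows of another total, and the application can iterate the result once.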

Related

Is there a method of counting an attribute that is in a GROUP BY clause?

I have created a select statement (below) that lists all the customers that have been to multiple merchants.
I want to create another statement to display how many of those customers have been to each merchant.
What is the optimal method of approaching this problem?
Lists out all customers that have been to multiple merchants.
WITH valentinesDayMerchant AS (
SELECT m.MerchantId, m.MerchantGroupId, m.WebsiteName
FROM Merchant m
INNER JOIN OpeningHours oh ON m.MerchantId = oh.MerchantId AND oh.DayOfWeek = 'TUE'
LEFT JOIN devices.DeviceConnectionState AS dcs ON dcs.MerchantId = oh.MerchantId
WHERE MerchantStatus = '-' AND (m.PrinterType IN ('V','O') OR dcs.State = 1 OR dcs.StateTransitionDateTime > '2023-01-23')
)
SELECT DISTINCT ul.UserLoginId, ul.FullName, ul.EmailAddress, ul.Mobile
FROM dbo.UserLogin AS ul
INNER JOIN dbo.Patron AS p ON p.UserLoginId = ul.UserLoginId
INNER JOIN valentinesDayMerchant AS m ON (m.MerchantId = ul.ReferringMerchantId OR m.MerchantId IN (SELECT pml.MerchantId FROM dbo.PatronMerchantLink AS pml WHERE pml.PatronId = p.PatronId AND ISNULL(pml.IsBanned, 0) = 0))
LEFT JOIN (
SELECT mg.MerchantGroupId, mg.MerchantGroupName, groupHost.HostName [GroupHostName]
FROM dbo.MerchantGroup AS mg
INNER JOIN dbo.Merchant AS parent ON parent.MerchantId = mg.ParentMerchantId
INNER JOIN dbo.HttpHostName AS groupHost ON groupHost.MerchantID = parent.MerchantId AND groupHost.Priority = 0
) mGroup ON mGroup.MerchantGroupId = m.MerchantGroupId
LEFT JOIN (
SELECT po.PatronId, MAX(po.OrderDateTime) [LastOrder]
FROM dbo.PatronsOrder AS po
GROUP BY po.PatronId
) orders ON orders.PatronId = p.PatronId
INNER JOIN dbo.HttpHostName AS hhn ON hhn.MerchantID = m.MerchantId AND hhn.Priority = 1
WHERE ul.UserLoginId NOT IN (1,2,100,372) AND ul.UserStatus <> 'D' AND (
ISNULL(orders.LastOrder, '2000-01-01') > '2020-01-01' OR ul.RegistrationDate > '2022-01-01'
)
GROUP BY ul.UserLoginId, ul.FullName, ul.EmailAddress, ul.Mobile
HAVING COUNT(m.MerchantId) > 1
Methods I have tried include adding the merchant name to a group by and displaying the count of the customers, however this does not work as I cannot have anything related to the Merchant in the GROUP BY, or I wouldn't be able to use HAVING clause to identify the customers that have been to multiple merchants. I have also tried selecting all the merchants and counting the distinct customers which doesn't work as it takes into account all the customers, not specifically the customers that have been to multiple merchants only.
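One possible direction, sketched under heavy simplification: first isolate the customers that have more than one merchant in a CTE, then join that list back to the customer/merchant pairs and group by merchant. The `visits` table below is hypothetical and stands in for the whole UserLogin/Patron/Merchant join of the question, and the sketch runs on SQLite through Python rather than SQL Server:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table: one row per (customer, merchant) link.
    CREATE TABLE visits (customer_id INTEGER, merchant_id TEXT);
    INSERT INTO visits VALUES
        (1, 'A'), (1, 'A'), (1, 'B'),   -- customer 1: two merchants
        (2, 'A'),                       -- customer 2: one merchant only
        (3, 'A'), (3, 'B'), (3, 'C');   -- customer 3: three merchants
""")

def multi_merchant_customers_per_merchant(conn):
    """Count, per merchant, only the customers that have been to more than one merchant."""
    return conn.execute("""
        WITH multi AS (
            SELECT customer_id
            FROM visits
            GROUP BY customer_id
            HAVING COUNT(DISTINCT merchant_id) > 1
        )
        SELECT v.merchant_id, COUNT(DISTINCT v.customer_id) AS customers
        FROM visits v
        JOIN multi m ON m.customer_id = v.customer_id
        GROUP BY v.merchant_id
        ORDER BY v.merchant_id
    """).fetchall()

per_merchant = multi_merchant_customers_per_merchant(conn)
print(per_merchant)
```

The HAVING filter stays in the CTE, grouped by customer only, so the outer query is free to group by merchant.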

How can I optimize my sql code?

I have following tables
contacts
contact_id | contact_slug | contact_first_name | contact_email | contact_date_added | company_id | contact_is_active | contact_subscribed | contact_last_name | contact_company | contact_twitter
contact_campaigns
contact_campaign_id | contact_id | contact_campaign_created | company_id | contact_campaign_sent
bundle_feedback
bundle_feedback_id | bundle_id | contact_id | company_id | bundle_feedback_rating | bundle_feedback_favorite_track_id | bundle_feedback_supporting | campaign_id
bundles
bundle_id | bundle_name | bundle_created | company_id | bundle_is_active
tracks
track_id | company_id | track_title
I wrote this query, but it runs slowly. How can I optimize it to make it faster?
SELECT SQL_CALC_FOUND_ROWS c.contact_id,
c.contact_first_name,
c.contact_last_name,
c.contact_email,
c.contact_date_added,
c.contact_company,
c.contact_twitter,
concat(c.contact_first_name," ", c.contact_last_name) AS fullname,
c.contact_subscribed,
ifnull(icc.sendCampaignsCount, 0) AS sendCampaignsCount,
ifnull(round((ibf.countfeedbacks/sendCampaignsCount * 100),2), 0) AS percentFeedback,
ifnull(ibf.bundle_feedback_supporting, 0) AS feedbackSupporting
FROM contacts AS c
LEFT JOIN
(SELECT c.contact_id,
count(cc.contact_campaign_id) AS sendCampaignsCount
FROM contacts AS c
LEFT JOIN contact_campaigns AS cc ON cc.contact_id = c.contact_id
WHERE c.company_id = '876'
AND c.contact_is_active = '1'
AND cc.contact_campaign_sent = '1'
GROUP BY c.contact_id) AS icc ON icc.contact_id = c.contact_id
LEFT JOIN
(SELECT bf.contact_id,
count(*) AS countfeedbacks,
bf.bundle_feedback_supporting
FROM bundle_feedback bf
JOIN bundles b
JOIN contacts c
LEFT JOIN tracks t ON bf.bundle_feedback_favorite_track_id = t.track_id
WHERE bf.bundle_id = b.bundle_id
AND bf.contact_id = c.contact_id
AND bf.company_id='876'
GROUP BY bf.contact_id) AS ibf ON ibf.contact_id = c.contact_id
WHERE c.company_id = '876'
AND contact_is_active = '1'
ORDER BY percentFeedback DESC LIMIT 0, 25;
I have made 2 improvements:
1) Removed the contacts table, which was being joined unnecessarily a second time, and moved its conditions to the final WHERE clause.
2) Removed SQL_CALC_FOUND_ROWS, as per "Which is fastest? SELECT SQL_CALC_FOUND_ROWS FROM `table`, or SELECT COUNT(*)".
SELECT c.contact_id,
c.contact_first_name,
c.contact_last_name,
c.contact_email,
c.contact_date_added,
c.contact_company,
c.contact_twitter,
concat(c.contact_first_name," ", c.contact_last_name) AS fullname,
c.contact_subscribed,
ifnull(icc.sendCampaignsCount, 0) AS sendCampaignsCount,
ifnull(round((ibf.countfeedbacks/sendCampaignsCount * 100),2), 0) AS percentFeedback,
ifnull(ibf.bundle_feedback_supporting, 0) AS feedbackSupporting
FROM contacts AS c
LEFT JOIN
(SELECT cc.contact_id,
count(cc.contact_campaign_id) AS sendCampaignsCount
FROM contact_campaigns AS cc
WHERE cc.contact_campaign_sent = '1'
GROUP BY cc.contact_id) AS icc ON icc.contact_id = c.contact_id
LEFT JOIN
(SELECT bf.contact_id,
count(*) AS countfeedbacks,
bf.bundle_feedback_supporting
FROM bundle_feedback bf
JOIN bundles b
LEFT JOIN tracks t ON bf.bundle_feedback_favorite_track_id = t.track_id
WHERE bf.bundle_id = b.bundle_id
GROUP BY bf.contact_id) AS ibf ON ibf.contact_id = c.contact_id
WHERE c.company_id = '876' and c.contact_is_active = '1'
First, you are not identifying any indexes you have to optimize the query. That said, I would ensure you have at least the following composite / covering indexes.
table | index
------------------------------------------------------------------
contacts | ( company_id, contact_is_active )
contact_campaigns | ( contact_id, contact_campaign_sent )
bundle_feedback | ( contact_id, bundle_feedback_supporting )
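As a rough sketch, the three composite indexes could be created like this. The index names are made up, the tables are cut down to the relevant columns, and SQLite (via Python) stands in for MySQL, where you would check the plan with EXPLAIN instead of EXPLAIN QUERY PLAN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, company_id INTEGER,
                           contact_is_active INTEGER, contact_email TEXT);
    CREATE TABLE contact_campaigns (contact_campaign_id INTEGER PRIMARY KEY,
                                    contact_id INTEGER, contact_campaign_sent INTEGER);
    CREATE TABLE bundle_feedback (bundle_feedback_id INTEGER PRIMARY KEY,
                                  contact_id INTEGER, bundle_feedback_supporting TEXT);

    -- Hypothetical index names; the column lists follow the table above.
    CREATE INDEX idx_contacts_company_active
        ON contacts (company_id, contact_is_active);
    CREATE INDEX idx_cc_contact_sent
        ON contact_campaigns (contact_id, contact_campaign_sent);
    CREATE INDEX idx_bf_contact_supporting
        ON bundle_feedback (contact_id, bundle_feedback_supporting);
""")

# Ask the planner how it would run the outer filter of the query.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT contact_email
    FROM contacts
    WHERE company_id = 876 AND contact_is_active = 1
""").fetchall()

# The human-readable detail is the last column of each plan row.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

The printed plan should mention idx_contacts_company_active, which suggests the filter on company_id and contact_is_active is answered from the index rather than a full table scan.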
Next, as noted in other answer, unless you really need how many rows qualified, remove the "SQL_CALC_FOUND_ROWS".
In your first left-join (icc), you do a left-join on contact_campaigns (cc), but then throw into your WHERE clause an "AND cc.contact_campaign_sent = '1'" which turns that into an INNER JOIN. At the outer query level, these would result in no matching record and thus NULL for your percentage calculations.
In your second left-join (ibf), you are doing a join to the tracks table, but not utilizing anything from it. Also, you are joining to the bundles table but not using anything from there either, unless you are getting multiple rows in the bundles and tracks tables, which would result in a Cartesian result and possibly overstate your "CountFeedbacks" value. You also do not need the contacts table as you are not doing anything else with it, and the feedback table already has the contact ID you are querying for. Since that is only grouped by the contact_id, your "bf.bundle_feedback_supporting" is otherwise wasted. If you want counts of feedback, just count from that table per contact ID and remove the rest. (Also, the join conditions should be in "ON" clauses instead of within the WHERE clause, for consistency.)
Also, for your supporting feedback, the data type and value are unclear, so I assumed it is a Yes/No flag and used a SUM() to count how many are supporting. So, a given contact may have 100 records but only 37 are supporting. This gives you 1 record for the contact having BOTH values, 100 and 37 respectively, instead of whatever value happens to be on the first entry found for the contact in a group by.
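To illustrate the conditional SUM in isolation, here is a small sketch with invented rows. It assumes, as above, that bundle_feedback_supporting holds 'Y' or 'N', and it uses SQLite through Python instead of MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bundle_feedback (contact_id INTEGER, bundle_feedback_supporting TEXT);
    -- Invented rows: contact 1 has three feedbacks (two supporting), contact 2 has one.
    INSERT INTO bundle_feedback VALUES (1, 'Y'), (1, 'N'), (1, 'Y'), (2, 'N');
""")

def feedback_counts(conn):
    """Return (contact_id, total feedbacks, supporting feedbacks) per contact."""
    return conn.execute("""
        SELECT contact_id,
               COUNT(*) AS countfeedbacks,
               SUM(CASE WHEN bundle_feedback_supporting = 'Y'
                        THEN 1 ELSE 0 END) AS SupportCount
        FROM bundle_feedback
        GROUP BY contact_id
        ORDER BY contact_id
    """).fetchall()

counts = feedback_counts(conn)
print(counts)
```

Both numbers come out of a single pass over the table, one row per contact.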
I would try to summarize your query to below:
SELECT
c.contact_id,
c.contact_first_name,
c.contact_last_name,
c.contact_email,
c.contact_date_added,
c.contact_company,
c.contact_twitter,
concat(c.contact_first_name," ", c.contact_last_name) AS fullname,
c.contact_subscribed,
ifnull(icc.sendCampaignsCount, 0) AS sendCampaignsCount,
ifnull(round((ibf.countfeedbacks / icc.sendCampaignsCount * 100),2), 0) AS percentFeedback,
ifnull(ibf.SupportCount, 0) AS feedbackSupporting
FROM
contacts AS c
LEFT JOIN
( SELECT
c.contact_id,
count(*) AS sendCampaignsCount
FROM
contacts AS c
JOIN contact_campaigns AS cc
ON c.contact_id = cc.contact_id
AND cc.contact_campaign_sent = '1'
WHERE
c.company_id = '876'
AND c.contact_is_active = '1'
GROUP BY
c.contact_id) AS icc
ON c.contact_id = icc.contact_id
LEFT JOIN
( SELECT
bf.contact_id,
count(*) AS countfeedbacks,
SUM( case when bf.bundle_feedback_supporting = 'Y'
then 1 else 0 end ) as SupportCount
FROM
contacts AS c
JOIN bundle_feedback bf
ON c.contact_id = bf.contact_id
WHERE
c.company_id = '876'
AND c.contact_is_active = '1'
GROUP BY
bf.contact_id) AS ibf
ON c.contact_id = ibf.contact_id
WHERE
c.company_id = '876'
AND c.contact_is_active = '1'
ORDER BY
percentFeedback DESC LIMIT 0, 25;

using joins together with aggregates, and retrieving rows when no aggregate exists

The following query on my MySQL tables returns rows from the purchaseorder table that have corresponding entries in the deliveryorder table. How do I construct this query so that I get rows from the purchaseorder table even if no corresponding rows exist in the deliveryorder table? If the users want to see sql table CREATE statements, I can post those, but I'm not posting now as it really makes the question too big.
SELECT
`purchaseorder`.`id` AS `po_id`,
`purchaseorder`.`order_quantity` AS `po_order_quantity`,
`purchaseorder`.`applicable_approved_unit_rate` AS `po_unit_rate`,
`purchaseorder`.`applicable_sales_tax_rate` AS `po_tax_rate`,
`purchaseorder`.`order_date` AS `po_order_date`,
`purchaseorder`.`remarks` AS `po_remarks`,
`purchaseorder`.`is_open` AS `po_is_open`,
`purchaseorder`.`is_active` AS `po_is_active`,
`purchaseorder`.`approved_rate_id` AS `po_app_rate_id`,
`supplier`.`name` AS `sup_name`,
SUM(`deliveryorder`.`quantity`) AS `total_ordered`
FROM `purchaseorder`
LEFT JOIN `deliveryorder` ON (`deliveryorder`.`purchase_order_id` = `purchaseorder`.`id`)
INNER JOIN `approvedrate` ON (`purchaseorder`.`approved_rate_id` = `approvedrate`.`id`)
INNER JOIN `supplier` ON (`approvedrate`.`supplier_id` = `supplier`.`id`)
WHERE (
`purchaseorder`.`is_active` = 1
AND `purchaseorder`.`is_open` = 1
AND `deliveryorder`.`is_active` = 1
AND `approvedrate`.`material_id` = 2
)
HAVING `purchaseorder`.`order_quantity` >= `total_ordered` + 1
You have an aggregating function but no GROUP BY clause, which is weird, but anyway - something like this? Oops - edited...
SELECT po.id po_id
, po.order_quantity po_order_quantity
, po.applicable_approved_unit_rate po_unit_rate
, po.applicable_sales_tax_rate po_tax_rate
, po.order_date po_order_date
, po.remarks po_remarks
, po.is_open po_is_open
, po.is_active po_is_active
, po.approved_rate_id po_app_rate_id
, s.name sup_name
, SUM(do.quantity) total_ordered
FROM purchaseorder po
LEFT
JOIN deliveryorder do
ON do.purchase_order_id = po.id
AND do.is_active = 1
LEFT
JOIN approvedrate ar
ON ar.id = po.approved_rate_id
AND ar.material_id = 2
LEFT
JOIN supplier s
ON s.id = ar.supplier_id
WHERE po.is_active = 1
AND po.is_open = 1
HAVING po.order_quantity >= total_ordered + 1
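One thing to watch with this shape of query: for a purchase order without delivery orders, SUM() over the left-joined rows is NULL, and a comparison against NULL never passes the HAVING filter. A minimal sketch of a variant that wraps the sum in COALESCE and adds a GROUP BY, with invented rows, only the columns needed for the point, and SQLite via Python standing in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchaseorder (id INTEGER PRIMARY KEY, order_quantity INTEGER,
                                is_active INTEGER, is_open INTEGER);
    CREATE TABLE deliveryorder (id INTEGER PRIMARY KEY, purchase_order_id INTEGER,
                                quantity INTEGER, is_active INTEGER);

    -- Invented rows: PO 2 has no deliveries, PO 3 is fully delivered,
    -- PO 4 only has an inactive delivery.
    INSERT INTO purchaseorder VALUES (1, 10, 1, 1), (2, 5, 1, 1), (3, 6, 1, 1), (4, 8, 1, 1);
    INSERT INTO deliveryorder VALUES (1, 1, 4, 1), (2, 1, 3, 1), (3, 3, 6, 1), (4, 4, 8, 0);
""")

def open_orders(conn, needed):
    """Purchase orders that can still take `needed` units, including ones with no deliveries."""
    return conn.execute("""
        SELECT po.id,
               po.order_quantity,
               COALESCE(SUM(d.quantity), 0) AS total_ordered
        FROM purchaseorder po
        LEFT JOIN deliveryorder d
               ON d.purchase_order_id = po.id
              AND d.is_active = 1            -- filter in ON keeps unmatched POs
        WHERE po.is_active = 1
          AND po.is_open = 1
        GROUP BY po.id, po.order_quantity
        HAVING po.order_quantity >= COALESCE(SUM(d.quantity), 0) + ?
        ORDER BY po.id
    """, (needed,)).fetchall()

orders = open_orders(conn, 1)
print(orders)
```

Purchase orders 2 and 4 come back with a total of 0 instead of disappearing, while the fully delivered order 3 is filtered out.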
I couldn't work out how to get the desired results all in one query, but ended up using the following two queries to fulfill my requirements: -
1st query
SELECT
pot.`id` AS `po_id`,
pot.`order_quantity` AS `po_order_quantity`,
pot.`applicable_approved_unit_rate` AS `po_unit_rate`,
pot.`applicable_sales_tax_rate` AS `po_tax_rate`,
pot.`is_open` AS `po_is_open`,
pot.`is_active` AS `po_is_active`,
st.`id` AS `sup_id`,
st.`name` AS `sup_name`,
SUM(dot.`quantity`) AS `total_ordered`
FROM `purchaseorder` pot
INNER JOIN `deliveryorder` dot ON (dot.`purchase_order_id` = pot.`id`)
INNER JOIN `approvedrate` art ON (pot.`approved_rate_id` = art.`id`)
INNER JOIN `supplier` st ON (art.`supplier_id` = st.`id`)
WHERE (
pot.`is_active` = 1
AND pot.`is_open` = 1
AND art.`material_id` = #materialid
AND art.`in_effect` = 1
AND art.`is_active` = 1
AND dot.`is_active` = 1
AND st.`is_active` = 1
)
HAVING pot.`order_quantity` >= `total_ordered` + #materialquantity
2nd query
SELECT
pot.`id` AS `po_id`,
pot.`order_quantity` AS `po_order_quantity`,
pot.`applicable_approved_unit_rate` AS `po_unit_rate`,
pot.`applicable_sales_tax_rate` AS `po_tax_rate`,
pot.`is_open` AS `po_is_open`,
pot.`is_active` AS `po_is_active`,
st.`id` AS `sup_id`,
st.`name` AS `sup_name`,
0 AS `total_ordered`
FROM `purchaseorder` pot
INNER JOIN `approvedrate` art ON (pot.`approved_rate_id` = art.`id`)
INNER JOIN `supplier` st ON (art.`supplier_id` = st.`id`)
WHERE (
pot.`is_active` = 1
AND pot.`is_open` = 1
AND art.`material_id` = #materialid
AND art.`in_effect` = 1
AND art.`is_active` = 1
AND st.`is_active` = 1
AND pot.`order_quantity` >= #materialquantity
AND pot.`id` NOT IN
(
SELECT dot.`purchase_order_id`
FROM `deliveryorder` dot
WHERE dot.is_active = 1
)
)

MySQL getting sum of tables with the same id

I have four tables: person, loan, ca and payments.
I would like to get the sum of all payments amounts and cash advance amounts which has the same ID as the loan joined with a person from a specific date.
Here is my code, but the sum is calculated incorrectly:
SELECT pd.*,
l.total_loan_amount,
sum(c.ca_total_amount) AS ctot,
sum(p.payment_amount)
FROM personal_data pd
LEFT JOIN loans l
ON pd.id_personal_data = l.id_personal_data
LEFT JOIN ca c
ON l.id_loan = c.id_loan
LEFT JOIN payments p
ON l.id_loan = p.id_loan
WHERE l.loan_date = curDate()
AND (
c.ca_date = curDate()
OR c.ca_date IS NULL
)
AND (
p.payment_date = curDate()
OR p.payment_date IS NULL
)
GROUP BY pd.id_personal_data
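Joining ca and payments directly to loans can return wrong totals: every ca row is paired with every payments row of the same loan, so amounts are summed more than once, and an id may or may not be present in the other table.
Try using a subquery for each total you want to retrieve.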
SELECT pd.*,
l.total_loan_amount,
c.totalCA,
p.totalPayment
FROM personal_data pd
LEFT JOIN loans l
ON pd.id_personal_data = l.id_personal_data
LEFT JOIN
(
SELECT id_loan, SUM(ca_total_amount) totalCA
FROM ca
-- WHERE DATE(ca_date) = DATE(CURDATE()) OR
-- ca_date IS NULL
GROUP BY id_loan
) c ON l.id_loan = c.id_loan
LEFT JOIN
(
SELECT id_loan, SUM(payment_amount) totalPayment
FROM payments
-- WHERE DATE(payment_date) = DATE(CURDATE()) OR
-- payment_date IS NULL
GROUP BY id_loan
) p ON l.id_loan = p.id_loan
WHERE DATE(l.loan_date) = DATE(curDate())
I think the dates on each payment and cash advance are irrelevant, because you are looking for their totals based on the date of the loan.
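To make the difference visible, here is a small sketch that runs both shapes of the query on the same invented rows (SQLite via Python rather than MySQL, tables reduced to the columns that matter). Loan 1 has two cash advances and three payments, so the direct join produces 2 x 3 = 6 rows for it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loans (id_loan INTEGER PRIMARY KEY);
    CREATE TABLE ca (id_loan INTEGER, ca_total_amount INTEGER);
    CREATE TABLE payments (id_loan INTEGER, payment_amount INTEGER);

    -- Invented rows.
    INSERT INTO loans VALUES (1), (2);
    INSERT INTO ca VALUES (1, 100), (1, 50);
    INSERT INTO payments VALUES (1, 10), (1, 20), (1, 30), (2, 5);
""")

# Direct joins: each ca row is repeated once per payment row and vice versa.
naive = conn.execute("""
    SELECT l.id_loan, SUM(c.ca_total_amount), SUM(p.payment_amount)
    FROM loans l
    LEFT JOIN ca c ON c.id_loan = l.id_loan
    LEFT JOIN payments p ON p.id_loan = l.id_loan
    GROUP BY l.id_loan
    ORDER BY l.id_loan
""").fetchall()

# Pre-aggregated subqueries: at most one row per loan on each side of the join.
preaggregated = conn.execute("""
    SELECT l.id_loan, c.totalCA, p.totalPayment
    FROM loans l
    LEFT JOIN (SELECT id_loan, SUM(ca_total_amount) AS totalCA
               FROM ca GROUP BY id_loan) c ON c.id_loan = l.id_loan
    LEFT JOIN (SELECT id_loan, SUM(payment_amount) AS totalPayment
               FROM payments GROUP BY id_loan) p ON p.id_loan = l.id_loan
    ORDER BY l.id_loan
""").fetchall()

print("direct joins:  ", naive)
print("pre-aggregated:", preaggregated)
```

For loan 1 the direct join reports 450 and 120, while the real totals are 150 and 60. Loan 2 comes out the same either way, because it has at most one row per child table.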

Sum of rows with join

This is the current table layout.
There are 3 legs
Each leg has 2 points, where is_start = 1 is the start of the leg, and is_start = 0 is the end of the leg.
When the user checks in at a point, an entry in points_users is created.
In this application you have multiple legs which has 2 points where one marks the start of the leg, where the other marks the end of the leg. So the sum of User's (with id = 2) Leg (with id= 1) is points_users.created where points_users.leg_id = 1 and points_users.user_id = 2 and points_users.is_start = 0 minus points_users where is_start = 1 (and the other parameters stay the same). And that's for just one leg.
What I would like is to sum all the time differences for each leg, we get the data like this:
| User.id | User.name | total_time |
| 1 | John | 129934 |
Anyone know how I can join these tables and sum it up grouped by user?
(No, this is not homework)
As far as I got:
SELECT
( `end_time` - `start_time` ) AS `diff`
FROM
(
SELECT SUM(UNIX_TIMESTAMP(`p1`.`created`)) AS `start_time`
FROM `points_users` AS `pu1`
LEFT JOIN `points` AS `p1` ON `pu1`.`point_id` = `p1`.`id`
WHERE `p1`.`is_start` = 1
) AS `start_time`,
(
SELECT SUM(UNIX_TIMESTAMP(`pu2`.`created`)) AS `end_time`
FROM `points_users` AS `pu2`
LEFT JOIN `points` AS `p2` ON `pu2`.`point_id` = `p2`.`id`
WHERE `p2`.`is_start` = 0
) AS `end_time`
Try this:
select users.user_id,
users.user_name,
SUM(timeDuration) totalTime
from users
join (
select
pStart.User_id,
pStart.leg_id,
(pEnd.created - pStart.created) timeDuration
from (select pu.user_id, pu.leg_id, pu.created
from points_users pu
join points p on pu.point_id = p.id and pu.leg_id = p.leg_id
where p.is_start = 1 ) pStart
join (select pu.user_id, pu.leg_id, pu.created
from points_users pu
join points p on pu.point_id = p.id and pu.leg_id = p.leg_id
where p.is_start = 0 ) pEnd
on pStart.user_id = pEnd.user_id
and pStart.leg_id = pEnd.leg_id
) tt
on users.user_id = tt.user_id
group by users.user_id, users.user_name
Subquery gets the time duration for each user/leg, and main query then sums them for all the legs of each user.
EDIT: Added the points table now that I can see your attempt at a query.
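A runnable sketch of the same idea on invented data, using SQLite through Python. It assumes points_users has user_id, leg_id, point_id and created, that points has id, leg_id and is_start, and that created is stored as integer seconds so that plain subtraction is meaningful. If created is a DATETIME in MySQL, you would probably want UNIX_TIMESTAMP() or TIMESTAMPDIFF() around it instead of subtracting the raw values:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE points (id INTEGER PRIMARY KEY, leg_id INTEGER, is_start INTEGER);
    CREATE TABLE points_users (user_id INTEGER, leg_id INTEGER,
                               point_id INTEGER, created INTEGER);

    -- Two legs, each with a start point and an end point.
    INSERT INTO points VALUES (1, 1, 1), (2, 1, 0), (3, 2, 1), (4, 2, 0);

    -- Invented check-ins (created = seconds). User 2 never finished leg 2.
    INSERT INTO points_users VALUES
        (1, 1, 1, 1000), (1, 1, 2, 1600),
        (1, 2, 3, 2000), (1, 2, 4, 2500),
        (2, 1, 1, 100),  (2, 1, 2, 400),
        (2, 2, 3, 500);
""")

def total_time_per_user(conn):
    """Pair each start check-in with the end check-in of the same user and leg, then sum."""
    return conn.execute("""
        SELECT s.user_id, SUM(e.created - s.created) AS total_time
        FROM points_users s
        JOIN points ps ON ps.id = s.point_id AND ps.is_start = 1
        JOIN points_users e ON e.user_id = s.user_id AND e.leg_id = s.leg_id
        JOIN points pe ON pe.id = e.point_id AND pe.is_start = 0
        GROUP BY s.user_id
        ORDER BY s.user_id
    """).fetchall()

totals = total_time_per_user(conn)
print(totals)
```

Because these are inner joins, a leg that was started but not finished (user 2, leg 2) simply does not contribute to the total.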
The simplest way is to join points_users to itself:
select leg_start.user_id, sum(leg_end.created - leg_start.created)
from points_users leg_start
join points_users leg_end on leg_start.user_id = leg_end.user_id
and leg_start.leg_id = leg_end.leg_id
join points point_start on leg_start.point_id = point_start.id
join points point_end on leg_end.point_id = point_end.id
where point_start.is_start = 1 and point_end.is_start = 0
group by leg_start.user_id
Some people prefer to put those is_start filters in the join condition, but since it's an inner join that's mainly just a point of style. If it were an outer join, then moving them from the WHERE to the JOIN could have an effect on the results.
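A small sketch of that outer-join effect, on invented rows and with SQLite via Python standing in for MySQL. To keep it short it assumes a hypothetical, denormalized points_users table that carries is_start itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical simplified table: is_start stored directly on the check-in.
    CREATE TABLE points_users (user_id INTEGER, leg_id INTEGER,
                               is_start INTEGER, created INTEGER);
    -- User 1 finished leg 1 but only started leg 2.
    INSERT INTO points_users VALUES (1, 1, 1, 100), (1, 1, 0, 400), (1, 2, 1, 500);
""")

# Filter on the outer-joined side in ON: unfinished legs survive with a NULL end time.
filter_in_on = conn.execute("""
    SELECT s.user_id, s.leg_id, e.created AS end_time
    FROM points_users s
    LEFT JOIN points_users e
           ON e.user_id = s.user_id
          AND e.leg_id = s.leg_id
          AND e.is_start = 0
    WHERE s.is_start = 1
    ORDER BY s.leg_id
""").fetchall()

# Same filter moved to WHERE: rows without a matching end are discarded,
# so the LEFT JOIN behaves like an INNER JOIN.
filter_in_where = conn.execute("""
    SELECT s.user_id, s.leg_id, e.created AS end_time
    FROM points_users s
    LEFT JOIN points_users e
           ON e.user_id = s.user_id
          AND e.leg_id = s.leg_id
    WHERE s.is_start = 1
      AND e.is_start = 0
    ORDER BY s.leg_id
""").fetchall()

print("filter in ON:   ", filter_in_on)
print("filter in WHERE:", filter_in_where)
```

With the filter in ON, the unfinished leg 2 is returned with an empty end time. With the filter in WHERE, it drops out of the result.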