SQL Aggregate with join giving incorrect results - mysql

In a bid to learn SQL, I've loaded some dummy data that I generated in Excel into a few tables: one for customers, one for order headers and one for order lines.
I'm trying to check that each customer's balance, order header total and order line total all match.
But when I run this query I get incorrect output for the order header total. I believe it's because the header value is summed once for every matching row in the orderlines table.
Can anyone tell me the correct way to do this?
SELECT
cus.cus_id,
cus.cus_name,
cus.cus_balance,
SUM(orderheader.orderheader_currentsell) AS orderHeader_total,
SUM(orderlines.orderlines_currentsell) AS orderLines_total
FROM
cus
JOIN
orderheader ON orderheader.orderHeader_customer = cus.cus_id
JOIN
orderlines ON orderlines.orderlines_orderid = orderheader.orderHeader_id
GROUP BY cus.cus_name
Output (the highlighted column should be the same as the other values.)

Each order header row is repeated once for every matching order line, so the header amount is summed several times. To solve this, aggregate before doing the join. In your case, just aggregating the order lines should be sufficient:
SELECT c.cus_id, c.cus_name, c.cus_balance,
SUM(oh.orderheader_currentsell) AS orderHeader_total,
SUM(ol.orderLines_total) AS orderLines_total
FROM cus c JOIN
orderheader oh
ON oh.orderHeader_customer = c.cus_id JOIN
(SELECT ol.orderlines_orderid, SUM(ol.orderlines_currentsell) AS orderLines_total
FROM orderlines ol
GROUP BY ol.orderlines_orderid
) ol
ON ol.orderlines_orderid = oh.orderHeader_id
GROUP BY c.cus_id, c.cus_name, c.cus_balance;
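The effect of aggregating before the join can be checked on a tiny data set. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the rows are invented, and the orderlines_id column is an assumption (the question does not show the full table definitions).

```python
import sqlite3

# Hypothetical miniature of the question's schema; all values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cus (cus_id INTEGER PRIMARY KEY, cus_name TEXT, cus_balance REAL);
CREATE TABLE orderheader (orderHeader_id INTEGER PRIMARY KEY,
                          orderHeader_customer INTEGER,
                          orderheader_currentsell REAL);
CREATE TABLE orderlines (orderlines_id INTEGER PRIMARY KEY,
                         orderlines_orderid INTEGER,
                         orderlines_currentsell REAL);
INSERT INTO cus VALUES (1, 'Acme', 100);
-- Order 10 has three lines, order 11 has one line.
INSERT INTO orderheader VALUES (10, 1, 60), (11, 1, 40);
INSERT INTO orderlines VALUES (1, 10, 10), (2, 10, 20), (3, 10, 30), (4, 11, 40);
""")

# Naive version: header 10 is joined to three lines, so its 60 is summed three times.
naive = conn.execute("""
SELECT c.cus_id,
       SUM(oh.orderheader_currentsell) AS orderHeader_total,
       SUM(ol.orderlines_currentsell)  AS orderLines_total
FROM cus c
JOIN orderheader oh ON oh.orderHeader_customer = c.cus_id
JOIN orderlines ol  ON ol.orderlines_orderid = oh.orderHeader_id
GROUP BY c.cus_id
""").fetchall()

# Pre-aggregated version: one row per order comes out of the subquery,
# so each header is joined to exactly one row.
fixed = conn.execute("""
SELECT c.cus_id,
       SUM(oh.orderheader_currentsell) AS orderHeader_total,
       SUM(ol.orderLines_total)        AS orderLines_total
FROM cus c
JOIN orderheader oh ON oh.orderHeader_customer = c.cus_id
JOIN (SELECT orderlines_orderid, SUM(orderlines_currentsell) AS orderLines_total
      FROM orderlines
      GROUP BY orderlines_orderid) ol
  ON ol.orderlines_orderid = oh.orderHeader_id
GROUP BY c.cus_id
""").fetchall()

print(naive)  # header total inflated to 220
print(fixed)  # header total and line total both 100
```

Note that the inner join still drops orders that have no lines at all; a LEFT JOIN to the subquery would keep them, if that matters for the check.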

Because you have different levels of grouping, it's not that trivial, and you need subselects.
You can calculate the total per customer as a subselect in the field list. In the code below I've done that just for the orders, but you could do the same for the order lines which are still solved by the grouping.
SELECT
cus.cus_id,
cus.cus_name,
cus.cus_balance,
( SELECT
SUM(orderheader_currentsell)
FROM
orderheader
WHERE
orderheader.orderHeader_customer = cus.cus_id) AS orderHeader_total,
SUM(orderlines.orderlines_currentsell) AS orderLines_total
FROM
cus
JOIN
orderheader ON orderheader.orderHeader_customer = cus.cus_id
JOIN
orderlines ON orderlines.orderlines_orderid = orderheader.orderHeader_id
GROUP BY cus.cus_id, cus.cus_name, cus.cus_balance

At first glance, I notice that you have:
cus.cus_id,
cus.cus_name,
cus.cus_balance,
as the non-aggregate columns. But in your GROUP BY you only have:
GROUP BY cus.cus_name
GROUP BY should include all of the non-aggregate columns. This may be why you're not getting the expected results. It would change to:
GROUP BY cus.cus_id,
cus.cus_name,
cus.cus_balance

Related

sql query group by not working in subquery

I have a SQL query based on two IDs in the same table. The sums come out correctly, but the problem is that GROUP BY cannot handle data that appears twice.
SELECT coa_a.debet_april,coa_a.namacoaapril, coa_b.kredit_april ,coa_b.namacoaapril
FROM `t_jurnalumum` join coa on coa.m_coa_4_id=t_jurnalumum.IdDebet
join (select DISTINCT m_coa_4_id, sum(a.Nilai) as debet_april, coa.namacoa as namacoaapril
from t_jurnalumum a
join coa on a.IdDebet=coa.m_coa_4_id
where year (a.Tanggal)=2021 and month (a.Tanggal)=4
GROUP by a.IdDebet ) as coa_a on coa_a.m_coa_4_id=t_jurnalumum.IdDebet
join (select DISTINCT m_coa_4_id, sum(b.Nilai) as kredit_april, coa.namacoa as namacoaapril
from t_jurnalumum b
join coa on b.IdKredit=coa.m_coa_4_id
where year (b.Tanggal)=2021 and month (b.Tanggal)=4
GROUP by b.IdKredit ) as coa_b on coa_b.m_coa_4_id=t_jurnalumum.IdKredit
GROUP by coa_b.namacoaapril, coa_a.namacoaapril
This is the result,
and this is the main table.
It is because you are joining the same table twice. The first subquery in the join returns one row per distinct value of its GROUP BY (IdDebet), and the second returns one row per distinct value of its own GROUP BY (IdKredit). At the end you also group the whole joined result. The first subquery gives two distinct values and the second gives two distinct values, so the final result ends up containing all of the combinations.

MYSQL query to get project details and last MAX() action details from log

How can I write a MySQL query to get project details and the entire last row of the activity log? I want a list of all the projects, with the data from each project's most recent row in the action log, all of it ordered by the most recent action log date DESC. Sorry, I know that this is a common query and the answer must be very easy, but I can't find the solution. I searched with every possible word combination. I found examples that need only one field, such as MAX(id), from the joined table. I found solutions with COALESCE but can't seem to make them work. My problem is that I need many fields from the 'parent' table row in PL_PROJECTS as well as many fields from the joined PL_LOG row, not to mention people's names from the same table joined twice.
Everything I try either gives me all the rows of the PL_LOG, repeating rows from PL_PROJECTS. Or, I get just one row from PL_LOG for just one project if I put a LIMIT in the sub query. Here's my query that doesn't work:
SELECT
PJ.pj_id, PJ.pj_title, PJ.pj_location, PJ.pj_desc, PJ.pj_request, PJ.pj_date_start, PP1.pp_name AS supervisor_name, PP2.pp_name AS customer_name, ST.st_desc, logDate, logDesc
FROM PL_PROJECTS PJ
INNER JOIN PL_PEOPLE PP1 ON PJ.pj_spst_member = PP1.pp_id
INNER JOIN PL_PEOPLE PP2 ON PJ.pj_pp_id = PP2.pp_id
INNER JOIN PL_STATUS ST ON PJ.pj_status = ST.st_id
LEFT OUTER JOIN (
SELECT MAX(lg_pj_id) MaxLogID, lg_date AS logDate, lg_desc AS logDesc, lg_pj_id
FROM PL_LOG PL
ORDER BY lg_id DESC
)
LR ON LR.lg_pj_id = PJ.pj_id
GROUP BY PJ.pj_id
ORDER BY logDate DESC
LIMIT 9999999
I think your problem is that your subselect generates only one row, because you are using MAX() without a GROUP BY, while you need one row per project (lg_pj_id, I think).
You only need to rewrite the subselect so that it generates one row per project with the information from the most recent activity. Do you have an activity ID in your action log? It looks like lg_pj_id is the project ID. The meaning of lg_desc is also unclear (or is that the action log ID?). Try grouping by the project ID in your subselect and, depending on your needs, either select the max values from the corresponding rows or select the row with the maximum value per group (project ID).
Thanks for the suggestion of GROUP BY to get one row per project. I tried changing the sub-query like so:
SELECT MAX(lg_id) AS MaxLogID, lg_desc, lg_pj_id
FROM PL_LOG PL
GROUP BY lg_pj_id
Now, I get one row from the log, but it gives me the max id, but not the lg_desc from the same row! If I try the sub-query by itself:
SELECT lg_id, lg_pj_id, lg_date, lg_desc
FROM `PL_LOG`
WHERE lg_pj_id = 33
ORDER BY lg_date DESC
I get these rows. You can see the max row, 68 has a description "30 minute skype call."
68,33,"2018-06-10 00:00:00","30 minute skype call."
61,33,"2018-06-02 00:00:00","Sent email to try to elicit a response."
52,33,"2018-05-10 00:00:00","sent follow up email"
47,33,"2018-03-26 00:00:00","sent initial email"
46,33,"2018-03-26 00:00:00","sent initial email"
But when I try to get just that row, using GROUP BY, it gives me the max lg_id, but the first lg_desc. I need the data all from the max(lg_id) row:
SELECT MAX(lg_id) AS MaxLogID, lg_pj_id, lg_date, lg_desc
FROM PL_LOG
WHERE lg_pj_id = 33
GROUP BY lg_pj_id
ORDER BY MaxLogID DESC
Returns:
68, 33, "2018-03-26 00:00:00", "sent initial email"
Try this as mentioned in my comment:
SELECT
PJ.pj_id, PJ.pj_title, PJ.pj_location, PJ.pj_desc, PJ.pj_request,
PJ.pj_date_start, PP1.pp_name AS supervisor_name, PP2.pp_name AS
customer_name, ST.st_desc, logDate, logDesc
FROM PL_PROJECTS PJ
INNER JOIN PL_PEOPLE PP1 ON PJ.pj_spst_member = PP1.pp_id
INNER JOIN PL_PEOPLE PP2 ON PJ.pj_pp_id = PP2.pp_id
INNER JOIN PL_STATUS ST ON PJ.pj_status = ST.st_id
LEFT JOIN (SELECT lg_id, lg_date AS logDate, lg_desc AS logDesc, lg_pj_id
FROM PL_LOG AS PL
WHERE PL.lg_id=(SELECT MAX(lg_id) FROM PL_LOG AS PL_2
WHERE PL.lg_pj_id = PL_2.lg_pj_id )
) LR ON LR.lg_pj_id = PJ.pj_id
GROUP BY PJ.pj_id
ORDER BY logDate DESC
LIMIT 9999999
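To see why the correlated MAX(lg_id) subquery returns the whole latest row rather than a mix of values, here is a small sketch of just the PL_LOG part. It runs on Python's sqlite3 module as a stand-in for MySQL. The rows for project 33 are the ones quoted in the question; project 34 and its two rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PL_LOG (lg_id INTEGER PRIMARY KEY, lg_pj_id INTEGER,
                     lg_date TEXT, lg_desc TEXT);
INSERT INTO PL_LOG VALUES
  (46, 33, '2018-03-26 00:00:00', 'sent initial email'),
  (47, 33, '2018-03-26 00:00:00', 'sent initial email'),
  (52, 33, '2018-05-10 00:00:00', 'sent follow up email'),
  (61, 33, '2018-06-02 00:00:00', 'Sent email to try to elicit a response.'),
  (68, 33, '2018-06-10 00:00:00', '30 minute skype call.'),
  -- Hypothetical second project, not in the original question.
  (69, 34, '2018-06-11 00:00:00', 'sent proposal'),
  (70, 34, '2018-06-12 00:00:00', 'kickoff meeting');
""")

# Keep a row only if its lg_id is the largest lg_id of its own project.
# Every selected column then comes from that one row.
latest = conn.execute("""
SELECT PL.lg_id, PL.lg_pj_id, PL.lg_date, PL.lg_desc
FROM PL_LOG AS PL
WHERE PL.lg_id = (SELECT MAX(PL_2.lg_id)
                  FROM PL_LOG AS PL_2
                  WHERE PL_2.lg_pj_id = PL.lg_pj_id)
ORDER BY PL.lg_date DESC
""").fetchall()

for row in latest:
    print(row)
```

This assumes the highest lg_id is also the most recent entry for a project; if log rows can be back-dated, the subquery would need to pick by lg_date instead.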

MySQL Inner join naming error?

http://sqlfiddle.com/#!9/e6effb/1
I'm trying to get a top 10 by revenue per brand for France in December.
There are 2 tables (the first table has the date, the second table has the brand, and I'm trying to join them).
I get this error "FUNCTION db_9_d870e5.SUM does not exist. Check the 'Function Name Parsing and Resolution' section in the Reference Manual"
Is my use of INNER JOIN there correct?
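It's because you had an extra space after SUM. Please change it from
SUM (o1.total_net_revenue) to SUM(o1.total_net_revenue).
By default, MySQL does not allow whitespace between a built-in function name and the opening parenthesis; see the 'Function Name Parsing and Resolution' section of the Reference Manual.
Also, after correcting that, your query still had another error: you were not selecting order_id in your intermediate table i2. So I edited it as follows: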
SELECT o1.order_id, o1.country, i2.brand,
SUM(o1.total_net_revenue)
FROM orders o1
INNER JOIN (
SELECT i1.brand, SUM(i1.net_revenue) AS total_net_revenue,order_id
FROM ordered_items i1
WHERE i1.country = 'France'
GROUP BY i1.brand
) i2
ON o1.order_id = i2.order_id AND o1.total_net_revenue = i2.total_net_revenue
WHERE o1.country = 'France' AND o1.created_at BETWEEN '2016-12-01' AND '2016-12-31'
GROUP BY 1,2,3
ORDER BY 4
LIMIT 10
EDIT: stack Fan is correct that o1.total_net_revenue exists. My confusion arose because the data structure duplicates three columns between the tables, including the one I was looking for.
There were a couple of errors in your SQL statement:
1. You were referencing an invalid column in the SUM function of your outer select. I believe you're actually after i2.total_net_revenue.
2. The table structure is terrible: the "important" columns (country, revenue, order_id) are duplicated between the two tables. I would also expect the revenue columns to share the same name if they always hold the same values. In the example, there's no difference between i1.net_revenue and o1.total_net_revenue.
3. In your inner join, you didn't select i1.order_id, which meant that your ON clause couldn't execute correctly.
PROTIP:
When you run into an issue like this, take all the complicated bits out of your query and get the base query working correctly first. THEN add your functions.
PROTIP:
In your GROUP BY clause, reference the actual columns, NOT the column numbers. It makes your query more robust.
This is the query I ended up with:
SELECT o1.order_id, o1.country, i2.brand,
SUM(i2.total_net_revenue) AS total_rev
FROM orders o1
INNER JOIN (
SELECT i1.order_id, i1.brand, SUM(i1.net_revenue) AS total_net_revenue
FROM ordered_items i1
WHERE i1.country = 'France'
GROUP BY i1.brand
) i2
ON o1.order_id = i2.order_id AND o1.total_net_revenue = i2.total_net_revenue
WHERE o1.country = 'France' AND o1.created_at BETWEEN '2016-12-01' AND '2016-12-31'
GROUP BY o1.order_id, o1.country, i2.brand
ORDER BY total_rev
LIMIT 10

Why does INNER JOIN not work correctly?

I have one SQL query with INNER JOINs. I need to get all offers from the table offers.
The table offers is empty now, but the following query returns one row with NULL fields.
Why is it returned, and how do I fix it? I need the query to return 0 rows if the table is empty.
Query:
select *, SUM(offers.price * announcement_product.amount) AS total, announcements.user_id AS creator_ann, announcements.id AS ann_id,
announcements.delivery AS deliveryAnn, announcements.payment AS
paymentAnn, SUM(announcement_product.amount) AS amount,
announcement_product.name as name_product
from `offers`
inner join `announcements` on `announcements`.`id` = `offers`.`announcement_id`
inner join `announcement_product` on `offers`.`announcement_product_id` = `announcement_product`.`id`
inner join `countries` on `countries`.`id` = `announcements`.`country`
where `offers`.`user_id` = 1 and `offers`.`status` = 1 and `offers`.`deleted_at` is null
You're using the aggregate function SUM(), but you don't have any GROUP BY clause.
When you do that, you are instructing MySQL to add up all the row values in the column you mention in SUM(). With no GROUP BY, the whole result set is treated as a single group, so exactly one row comes back even if there are no rows to add up, and SUM() over zero rows is NULL.
For best results you should study up on the GROUP BY clause and how to use it with SUM(). It's hard to guess what you want from your query.
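The behaviour is easy to reproduce. The sketch below uses Python's sqlite3 module as a stand-in for MySQL, with a cut-down, hypothetical version of the offers table (only three columns); the same rule about aggregates without GROUP BY applies in both engines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, trimmed-down offers table; it is left empty on purpose.
conn.execute("CREATE TABLE offers (id INTEGER PRIMARY KEY, user_id INTEGER, price REAL)")

# Aggregate without GROUP BY: always exactly one row, NULL when nothing matched.
no_group = conn.execute(
    "SELECT SUM(price) AS total FROM offers WHERE user_id = 1"
).fetchall()

# Aggregate with GROUP BY: one row per group, so no groups means no rows.
grouped = conn.execute(
    "SELECT user_id, SUM(price) AS total FROM offers WHERE user_id = 1 GROUP BY user_id"
).fetchall()

print(no_group)  # [(None,)]
print(grouped)   # []
```

Which column to group by in the real query depends on what one result row is supposed to represent (one per offer, one per announcement, and so on), which the question does not say.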
I'm not sure, but I don't think
select *, ..
when there's multiple tables in the query is valid.
Try
select offers.*,..
This is how your select should be structured:
Select
Id,
Sku,
Sum(Onhand),
Sum(price)
From mytable
Where mytable.Onhand > 0
Group by
Id,Sku
If you are going to use an aggregate function such as Max, Sum, Min, ...
you need to list the other table fields that you're using in the select part in the group by.

MySql query runs very slow(actually never gives output) without where clause

I have a MySQL query that works fine when I use a where clause, but when I do not use
the where clause it never returns any output and finally times out.
I used the EXPLAIN command to check the performance of the query, and in both cases EXPLAIN reports the same number of rows used in the joins.
I have attached an image of the EXPLAIN output.
Below is the query.
I can't figure out what the problem is here.
Any help is highly appreciated.
Thanks.
SELECT
MCI.CLIENT_ID AS CLIENT_ID, MCI.NAME AS CLIENT_NAME, MCI.PRIMARY_CONTACT AS CLIENT_PRIMARY_CONTACT,
MCI.ADDED_BY AS SP_ID, CONCAT(MUD_SP.FIRST_NAME, ' ', MUD_SP.LAST_NAME) AS SP_NAME,
MCI.FK_PROSPECT_ID AS PROSPECT_ID, MCI.DATE_ADDED AS ADDED_ON,
(SELECT GROUP_CONCAT(LT.TAG_TEXT SEPARATOR ', ')
FROM LK_TAG LT
INNER JOIN M_OBJECT_TAG_MAPPING MOTM
ON LT.PK_ID = MOTM.FK_TAG_ID
WHERE MOTM.FK_OBJECT_ID = MCI.FK_PROSPECT_ID
AND MOTM.OBJECT_TYPE = 1
AND MOTM.IS_ACTIVE = 1
) AS TAGS,
IFNULL(SUM(GET_DIGITS(MMR.RCP_AMOUNT)), 0) AS REVENUE_SO_FAR,
IFNULL(SUM(GET_DIGITS(MMR.RCP_RUPEES)), 0) AS REVENUE_INR,
COUNT(DISTINCT PMI_MONTHLY.PROJECT_ID) AS MONTHLY,
COUNT(DISTINCT PMI_FIXED.PROJECT_ID) AS FIXED,
COUNT(DISTINCT PMI_HOURLY.PROJECT_ID) AS HOURLY,
COUNT(DISTINCT PMI_ANNUAL.PROJECT_ID) AS ANNUAL,
COUNT(DISTINCT PMI_CURRENTLY_RUNNING.PROJECT_ID) AS CURRENTLY_RUNNING_PROJECTS,
COUNT(DISTINCT PMI_YET_TO_START.PROJECT_ID) AS YET_TO_START_PROJECTS,
COUNT(DISTINCT PMI_TECH_SALES_CLOSED.PROJECT_ID) AS TECH_SALES_CLOSED_PROJECTS
FROM
M_CLIENT_INFO MCI
INNER JOIN M_USER_DETAILS MUD_SP
ON MCI.ADDED_BY = MUD_SP.PK_ID
LEFT OUTER JOIN M_MONTH_RECEIPT MMR
ON MMR.CLIENT_ID = MCI.CLIENT_ID
LEFT OUTER JOIN M_PROJECT_INFO PMI_FIXED
ON PMI_FIXED.CLIENT_ID = MCI.CLIENT_ID AND PMI_FIXED.PROJECT_TYPE = 1
LEFT OUTER JOIN M_PROJECT_INFO PMI_MONTHLY
ON PMI_MONTHLY.CLIENT_ID = MCI.CLIENT_ID AND PMI_MONTHLY.PROJECT_TYPE = 2
LEFT OUTER JOIN M_PROJECT_INFO PMI_HOURLY
ON PMI_HOURLY.CLIENT_ID = MCI.CLIENT_ID AND PMI_HOURLY.PROJECT_TYPE = 3
LEFT OUTER JOIN M_PROJECT_INFO PMI_ANNUAL
ON PMI_ANNUAL.CLIENT_ID = MCI.CLIENT_ID AND PMI_ANNUAL.PROJECT_TYPE = 4
LEFT OUTER JOIN M_PROJECT_INFO PMI_CURRENTLY_RUNNING
ON PMI_CURRENTLY_RUNNING.CLIENT_ID = MCI.CLIENT_ID AND PMI_CURRENTLY_RUNNING.STATUS = 4
LEFT OUTER JOIN M_PROJECT_INFO PMI_YET_TO_START
ON PMI_YET_TO_START.CLIENT_ID = MCI.CLIENT_ID AND PMI_YET_TO_START.STATUS < 4
LEFT OUTER JOIN M_PROJECT_INFO PMI_TECH_SALES_CLOSED
ON PMI_TECH_SALES_CLOSED.CLIENT_ID = MCI.CLIENT_ID AND PMI_TECH_SALES_CLOSED.STATUS > 4
WHERE YEAR(MCI.DATE_ADDED) = '2012'
GROUP BY MCI.CLIENT_ID ORDER BY CLIENT_NAME ASC
Yes, as many people have said, the key is that when you have the where clause, the MySQL engine filters the table M_CLIENT_INFO (probably dramatically).
You get a similar result to removing the where clause if you add this where clause:
where 1 = 1
You will see that the performance degrades as well, because MySQL will try to get all the data.
Remove the where clause and all columns from the select, and add a count to see how many records you get. If it is reasonable, say up to 10k, then do the following:
1. Put back the select columns related to M_CLIENT_INFO.
2. Do not include the nested one, "TAGS".
3. Remove all your joins.
4. Run your query without the where clause and gradually include the joins.
This way you'll find out which part causes the timeout.
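The join-by-join approach can be sketched as a loop that counts rows after each added join. The example below is an illustration only: it uses Python's sqlite3 module in place of MySQL, keeps just three of the question's tables with invented rows, and the PK_ID column on M_MONTH_RECEIPT is an assumption. It shows how several joins on the same CLIENT_ID multiply the intermediate row count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE M_CLIENT_INFO (CLIENT_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE M_MONTH_RECEIPT (PK_ID INTEGER PRIMARY KEY, CLIENT_ID INTEGER, RCP_AMOUNT REAL);
CREATE TABLE M_PROJECT_INFO (PROJECT_ID INTEGER PRIMARY KEY, CLIENT_ID INTEGER, PROJECT_TYPE INTEGER);
-- Invented data: one client, 4 receipts, 3 fixed projects, 2 monthly projects.
INSERT INTO M_CLIENT_INFO VALUES (1, 'Client A');
INSERT INTO M_MONTH_RECEIPT VALUES (1, 1, 100), (2, 1, 100), (3, 1, 100), (4, 1, 100);
INSERT INTO M_PROJECT_INFO VALUES (1, 1, 1), (2, 1, 1), (3, 1, 1), (4, 1, 2), (5, 1, 2);
""")

base = "SELECT COUNT(*) FROM M_CLIENT_INFO MCI "
joins = [
    "LEFT JOIN M_MONTH_RECEIPT MMR ON MMR.CLIENT_ID = MCI.CLIENT_ID",
    "LEFT JOIN M_PROJECT_INFO PMI_FIXED ON PMI_FIXED.CLIENT_ID = MCI.CLIENT_ID "
    "AND PMI_FIXED.PROJECT_TYPE = 1",
    "LEFT JOIN M_PROJECT_INFO PMI_MONTHLY ON PMI_MONTHLY.CLIENT_ID = MCI.CLIENT_ID "
    "AND PMI_MONTHLY.PROJECT_TYPE = 2",
]

# Count rows with 0, 1, 2 and 3 joins applied, in the order listed above.
counts = []
for n in range(len(joins) + 1):
    sql = base + " ".join(joins[:n])
    counts.append(conn.execute(sql).fetchone()[0])

# With all joins applied, every receipt is repeated 3 * 2 = 6 times.
inflated_revenue = conn.execute(
    "SELECT SUM(MMR.RCP_AMOUNT) FROM M_CLIENT_INFO MCI " + " ".join(joins)
).fetchone()[0]

print(counts)            # [1, 4, 12, 24]
print(inflated_revenue)  # 2400.0, although the four receipts only add up to 400
```

If the real data behaves like this, the row count per client is the product of the matching rows in every joined table, which would explain a timeout once the where clause no longer limits the clients. It would also mean the SUM() columns are inflated even when the query does finish, while the COUNT(DISTINCT ...) columns are not.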
I would try the following. First, MySQL has a keyword "STRAIGHT_JOIN" which tells the optimizer to do the query in the table order you've specified. Since all your left joins are child-related (like a lookup table), you don't want MySQL to try and interpret one of those as the primary basis of the query.
SELECT STRAIGHT_JOIN ... rest of query.
Next, your M_PROJECT_INFO table. I don't know how many columns of data are out there, but you appear to be concentrating on just a few columns in your DISTINCT aggregates. I would make sure you have a covering index on these elements to help the query, via an index on
( Client_ID, Project_Type, Status, Project_ID )
This way the engine can apply the criteria and get the distinct all out of the index instead of having to go back to the raw data pages for the query.
Third, your M_CLIENT_INFO table. Ensure that it has an index covering your criteria, your GROUP BY and your ORDER BY, and change your ORDER BY from the aliased "CLIENT_NAME" to the actual column of the SQL table so it matches the index
( Date_Added, Client_ID, `Name` )
I have "name" in ticks as it is also a reserved word, and the ticks clarify that it is the column, not the keyword.
Next, the WHERE clause. Whenever you apply a function to an indexed column, the index generally cannot be used efficiently, especially on date/time fields. You might want to change your where clause to
WHERE MCI.Date_Added between '2012-01-01' and '2012-12-31 23:59:59'
so the BETWEEN range is showing the entire year and the index can better be utilized.
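A small check of the range rewrite, under some loud assumptions: it runs on Python's sqlite3 module rather than MySQL, strftime('%Y', ...) stands in for MySQL's YEAR(), dates are stored as text, and all rows are invented. It compares the function-based filter with the BETWEEN range from above and with a half-open range (>= start of 2012 and < start of 2013), which is a variation on the answer rather than part of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE M_CLIENT_INFO (CLIENT_ID INTEGER PRIMARY KEY, DATE_ADDED TEXT);
CREATE INDEX idx_date_added ON M_CLIENT_INFO (DATE_ADDED);
INSERT INTO M_CLIENT_INFO VALUES
  (1, '2011-12-31 23:59:59'),
  (2, '2012-01-01 00:00:00'),
  (3, '2012-06-15 12:00:00'),
  (4, '2012-12-31 23:59:59.500'),  -- falls inside the last second of the year
  (5, '2013-01-01 00:00:00');
""")

def ids(where):
    """Return the CLIENT_IDs matched by the given WHERE clause, in id order."""
    sql = "SELECT CLIENT_ID FROM M_CLIENT_INFO WHERE " + where + " ORDER BY CLIENT_ID"
    return [row[0] for row in conn.execute(sql)]

# Function applied to the column (SQLite stand-in for YEAR(DATE_ADDED) = 2012).
by_function = ids("strftime('%Y', DATE_ADDED) = '2012'")

# The BETWEEN range from the answer.
by_between = ids("DATE_ADDED BETWEEN '2012-01-01' AND '2012-12-31 23:59:59'")

# Half-open range: no need to spell out the last instant of the year.
half_open = ids("DATE_ADDED >= '2012-01-01' AND DATE_ADDED < '2013-01-01'")

print(by_function)  # [2, 3, 4]
print(by_between)   # [2, 3]  (the row with fractional seconds is missed)
print(half_open)    # [2, 3, 4]
```

Both range forms compare the bare column against constants, which is what lets an index on the date column be used. The half-open form additionally avoids the edge case at the end of the year if the column can hold fractional seconds.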
Finally, if the above does not help, I would consider splitting your query up somewhat. The GROUP_CONCAT inline select for the TAGS might be a bit of a killer for you. You might want to get all the distinct elements for the grouping per client first, THEN get those details. Something like
select
PQ.*,
group_concat(...) tags
from
( the entire primary part of the query ) as PQ
Left join yourGroupConcatTableBasis on key columns