SQL SUM issues with joins - mysql

I have quite a complex query (at least for me).
I want to create a list of users that are ready to be paid. There are 2 conditions that need to be met: the order status should be 3 and the total should be more than 50. Currently I have this query (generated with CodeIgniter Active Record):
SELECT `services_payments`.`consultant_id`
, `consultant_userdata`.`iban`
, `consultant_userdata`.`kvk`, `consultant_userdata`.`bic`
, `consultant_userdata`.`bankname`
, SUM(`services_payments`.`amount`) AS amount
FROM (`services_payments`)
JOIN `consultant_userdata`
ON `consultant_userdata`.`user_id` = `services_payments`.`consultant_id`
JOIN `services`
ON `services`.`id` = `services_payments`.`service_id`
WHERE `services`.`status` = 3
AND `services_payments`.`paid` = 0
HAVING `amount` > 50
The services_payments table contains the commissions, consultant_userdata contains the user data, and services keeps the order data. The current query only gives me 1 result, while I'm expecting 4.
Could someone please tell me what I'm doing wrong and what would be the solution?

For ActiveRecord, rsanchez' answer would be more of
$this->db->group_by('services_payments.consultant_id, consultant_userdata.iban, consultant_userdata.kvk, consultant_userdata.bic, consultant_userdata.bankname');
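For reference, the generated SQL with the GROUP BY applied would then look roughly like this (a sketch based on the query above, not tested against your schema):
SELECT `services_payments`.`consultant_id`
, `consultant_userdata`.`iban`
, `consultant_userdata`.`kvk`, `consultant_userdata`.`bic`
, `consultant_userdata`.`bankname`
, SUM(`services_payments`.`amount`) AS amount
FROM (`services_payments`)
JOIN `consultant_userdata`
ON `consultant_userdata`.`user_id` = `services_payments`.`consultant_id`
JOIN `services`
ON `services`.`id` = `services_payments`.`service_id`
WHERE `services`.`status` = 3
AND `services_payments`.`paid` = 0
GROUP BY `services_payments`.`consultant_id`, `consultant_userdata`.`iban`, `consultant_userdata`.`kvk`, `consultant_userdata`.`bic`, `consultant_userdata`.`bankname`
HAVING `amount` > 50
Without the GROUP BY, MySQL collapses every matching row into one aggregate row, which is why only 1 result comes back instead of 4.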


MYSQL Query - joining tables and grouping results

I'm after some help with a report I'm designing, please.
My report includes results from a booking database where I'd like to show each booking on a single line. However, as the booking database has a number of tables, my MySQL query involves JOINs, which results in multiple rows per booking. It is the multiple results for "dcea4_eb_field_values.field_value" per booking that cause the repeating rows.
This is my query
SELECT
dcea4_eb_events.event_date,
dcea4_eb_events.title,
dcea4_eb_registrants.id,
dcea4_eb_registrants.first_name,
dcea4_eb_registrants.last_name,
dcea4_eb_registrants.email,
dcea4_eb_registrants.register_date,
dcea4_eb_registrants.amount,
dcea4_eb_registrants.comment,
dcea4_eb_field_values.field_id,
dcea4_eb_field_values.field_value
FROM dcea4_eb_events
INNER JOIN dcea4_eb_registrants ON dcea4_eb_registrants.event_id = dcea4_eb_events.id
INNER JOIN dcea4_eb_field_values ON dcea4_eb_field_values.registrant_id = dcea4_eb_registrants.id
WHERE 1=1
AND (dcea4_eb_field_values.field_id = 14 OR dcea4_eb_field_values.field_id = 26 OR dcea4_eb_field_values.field_id = 27 OR dcea4_eb_field_values.field_id = 15)
AND dcea4_eb_registrants.published <> 2
AND dcea4_eb_registrants.published IS NOT NULL
AND (dcea4_eb_registrants.published = 1 OR dcea4_eb_registrants.payment_method = "os_offline")
[ AND (dcea4_eb_registrants.register_date {RegistrationDate} ) ]
[ AND REPLACE(dcea4_eb_events.title,'\'','') in ({Club}) ]
ORDER BY dcea4_eb_registrants.register_date,
dcea4_eb_events.title
This is what the output currently looks like (image: current result), and this is what I'd like it to look like (image: desired result).
Any help appreciated
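For what it's worth, the usual fix for this is conditional aggregation: pivot the per-booking field_value rows into columns and group by registrant, so each booking collapses to one line. A sketch, assuming each of the four field_id values maps to one desired column (rename the field_NN aliases to suit; the [bracketed] report-tool filters stay as they are):
SELECT
dcea4_eb_events.event_date,
dcea4_eb_events.title,
dcea4_eb_registrants.id,
dcea4_eb_registrants.first_name,
dcea4_eb_registrants.last_name,
dcea4_eb_registrants.email,
dcea4_eb_registrants.register_date,
dcea4_eb_registrants.amount,
dcea4_eb_registrants.comment,
MAX(CASE WHEN dcea4_eb_field_values.field_id = 14 THEN dcea4_eb_field_values.field_value END) AS field_14,
MAX(CASE WHEN dcea4_eb_field_values.field_id = 15 THEN dcea4_eb_field_values.field_value END) AS field_15,
MAX(CASE WHEN dcea4_eb_field_values.field_id = 26 THEN dcea4_eb_field_values.field_value END) AS field_26,
MAX(CASE WHEN dcea4_eb_field_values.field_id = 27 THEN dcea4_eb_field_values.field_value END) AS field_27
FROM dcea4_eb_events
INNER JOIN dcea4_eb_registrants ON dcea4_eb_registrants.event_id = dcea4_eb_events.id
INNER JOIN dcea4_eb_field_values ON dcea4_eb_field_values.registrant_id = dcea4_eb_registrants.id
WHERE dcea4_eb_field_values.field_id IN (14, 15, 26, 27)
AND dcea4_eb_registrants.published <> 2
AND dcea4_eb_registrants.published IS NOT NULL
AND (dcea4_eb_registrants.published = 1 OR dcea4_eb_registrants.payment_method = "os_offline")
GROUP BY dcea4_eb_registrants.id
ORDER BY dcea4_eb_registrants.register_date,
dcea4_eb_events.title
With ONLY_FULL_GROUP_BY enabled you may need to list the other non-aggregated columns in the GROUP BY as well; otherwise grouping by the registrant primary key is sufficient.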

MySQL in clause slow with 10 or more items

This query takes 18 seconds
SELECT `wd`.`week` AS `start_week`, `wd`.`hold_code`, COUNT(wd.hold_code) AS hold_code_count
FROM `weekly_data` AS `wd`
JOIN aol_reporting_hold_codes hc ON hc.hold_code = wd.hold_code AND chart = 'GR'
WHERE `wd`.`days` <= 6
AND `wd`.`hold_code` IS NOT NULL
AND NOT `wd`.`hold_code` = ''
AND `wd`.`week` >= '201717'
AND `wd`.`itemgroup` IN ('BOTDTO', 'BOTDWG', 'C&FORG', 'C&FOTO', 'MF-SUB', 'MI-SUB', 'PROPRI', 'PROPTO', 'STRSTO', 'STRSUB')
AND `production_type` = 2
AND `contract` = "1234"
AND `project` = 8
GROUP BY `start_week`, `wd`.`hold_code`
This query takes 4 seconds
SELECT `wd`.`week` AS `start_week`, `wd`.`hold_code`, COUNT(wd.hold_code) AS hold_code_count
FROM `weekly_data` AS `wd`
JOIN aol_reporting_hold_codes hc ON hc.hold_code = wd.hold_code AND chart = 'GR'
WHERE `wd`.`days` <= 6
AND `wd`.`hold_code` IS NOT NULL
AND NOT `wd`.`hold_code` = ''
AND `wd`.`week` >= '201717'
AND `wd`.`itemgroup` IN ('BOTDWG', 'C&FORG', 'C&FOTO', 'MF-SUB', 'MI-SUB', 'PROPRI', 'PROPTO', 'STRSTO', 'STRSUB')
AND `production_type` = 2
AND `contract` = "1234"
AND `project` = 8
GROUP BY `start_week`, `wd`.`hold_code`
All I have done is removed one item from the IN clause. I can remove any one of the items. It runs in 4 seconds as long as there are 9 items or fewer, and takes 18 seconds as soon as I increase to 10 items.
I thought MySQL limited the length of a command by size, i.e. 1MB.
More than just the EXPLAIN, use EXPLAIN FORMAT=JSON and get the "Optimizer trace" for the query. I suspect the length of the IN leads to picking a different query plan.
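To capture that trace (standard MySQL optimizer-trace usage):
SET optimizer_trace = 'enabled=on';
-- run the 18-second SELECT here
SELECT * FROM information_schema.OPTIMIZER_TRACE;
SET optimizer_trace = 'enabled=off';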
There is virtually no limit to the number of items in IN. I have seen as many as 70K.
That aside, you may be able to speed up even the 4-sec version...
I suggest having this index. Grrr... I can't tell which columns are in which tables. So, if these are all in one table, then make such an index:
INDEX(production_type, contract, project) -- in any order
If those are all in wd, then tack on a 4th column - any of week, itemgroup, days.
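In DDL terms, that would be something like the following (index name is arbitrary; this assumes all four columns are in weekly_data):
ALTER TABLE weekly_data
ADD INDEX idx_prod_contract_project_week (production_type, contract, project, week);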
Be cautious about COUNT(wd.hold_code).
COUNT(x) checks x for being non-NULL; is that what you want? If not, then simply say COUNT(*).
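For example (FROM, WHERE and GROUP BY unchanged; the count is the same here because the WHERE clause already filters out NULL and empty hold_code values):
SELECT `wd`.`week` AS `start_week`, `wd`.`hold_code`, COUNT(*) AS hold_code_count
-- ... rest of the query as above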
When JOINing, then GROUP BY, you get an "explode-implode". The number of intermediate rows is big; that is when the COUNT is performed.
It seems wrong to both COUNT(hold_code) and GROUP BY hold_code. What are you trying to do?
For further discussion, please provide SHOW CREATE TABLE and EXPLAIN.
Please note that MySQL's IN clause limit is governed by the max_allowed_packet value. You may check whether NOT IN returns results faster. I would also suggest putting the values to be checked by the IN clause into a single buffer string, rather than comma-separated values, and giving that a try.
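A related alternative (my sketch, not the commenter's exact suggestion) is to load the values into a temporary table and join against it, so the optimizer sees a table rather than a long literal list:
CREATE TEMPORARY TABLE tmp_itemgroups (itemgroup VARCHAR(10) NOT NULL PRIMARY KEY);
INSERT INTO tmp_itemgroups VALUES
('BOTDTO'), ('BOTDWG'), ('C&FORG'), ('C&FOTO'), ('MF-SUB'),
('MI-SUB'), ('PROPRI'), ('PROPTO'), ('STRSTO'), ('STRSUB');
-- then replace the IN (...) predicate with a join:
-- JOIN tmp_itemgroups tig ON tig.itemgroup = wd.itemgroup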

Having an issue with displaying most recent Call_date

I am trying to output the most recent Call_date. I have tried using the MAX function with no luck. Below I have linked 3 images showing the database tables, my current code's output, and the required output. Underneath that is my current code. Any help is appreciated!
Database Tables - https://imgur.com/a/7ZPFO
Output we are looking for - https://imgur.com/a/k3idB
Output my code currently gives - https://imgur.com/a/H53vq
Here is what I have tried:
SELECT Staff.First_name, Staff.Last_name, call_date, taken_by
FROM Issue
JOIN Caller ON Issue.Caller_id = Caller.Caller_id
JOIN Staff ON Issue.Taken_by = Staff.Staff_code
WHERE Caller.First_name = 'Harry'
I would just add the following to the end of your query:
ORDER BY call_date DESC LIMIT 1
This will give you one row as a result. And that row will be the one with the most recent call_date.
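So the full query would read:
SELECT Staff.First_name, Staff.Last_name, call_date, taken_by
FROM Issue
JOIN Caller ON Issue.Caller_id = Caller.Caller_id
JOIN Staff ON Issue.Taken_by = Staff.Staff_code
WHERE Caller.First_name = 'Harry'
ORDER BY call_date DESC
LIMIT 1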
Based on the code provided, you're only asking for 3 columns, so it's a matter of a join. When you select the max call date, you need to group by the other 2 non-aggregate columns. If the date column is of datatype Date or Datetime, then this should work:
SELECT Caller.First_name, Caller.Last_name --from Caller_id
,MAX(Issue.call_date) AS call_date
FROM Issue INNER JOIN Caller ON Issue.Caller_id = Caller.Caller_id
WHERE Caller.First_name = 'Harry'
GROUP BY Caller.First_name, Caller.Last_name

query optimization for mysql

I have the following query which takes about 28 seconds on my machine. I would like to optimize it and know if there is any way to make it faster by creating some indexes.
select rr1.person_id as person_id, rr1.t1_value, rr2.t0_value
from (select r1.person_id, avg(r1.avg_normalized_value1) as t1_value
from (select ma1.person_id, mn1.store_name, avg(mn1.normalized_value) as avg_normalized_value1
from matrix_report1 ma1, matrix_normalized_notes mn1
where ma1.final_value = 1
and (mn1.normalized_value != 0.2
and mn1.normalized_value != 0.0 )
and ma1.user_id = mn1.user_id
and ma1.request_id = mn1.request_id
and ma1.request_id = 4 group by ma1.person_id, mn1.store_name) r1
group by r1.person_id) rr1
,(select r2.person_id, avg(r2.avg_normalized_value) as t0_value
from (select ma.person_id, mn.store_name, avg(mn.normalized_value) as avg_normalized_value
from matrix_report1 ma, matrix_normalized_notes mn
where ma.final_value = 0 and (mn.normalized_value != 0.2 and mn.normalized_value != 0.0 )
and ma.user_id = mn.user_id
and ma.request_id = mn.request_id
and ma.request_id = 4
group by ma.person_id, mn.store_name) r2
group by r2.person_id) rr2
where rr1.person_id = rr2.person_id
Basically, it aggregates data depending on the request_id and final_value (0 or 1). Is there a way to simplify it for optimization? And it would be nice to know which columns should be indexed. I created an index on user_id and request_id, but it doesn't help much.
There are about 4,907,424 rows in matrix_report1 and 335,740 rows in the matrix_normalized_notes table. These tables will grow as we have more requests.
First, the others are right that you should format your samples better. Explaining in plain language what you are trying to do also helps, and sample data with expected results is better still.
That said, I think the query can be significantly simplified. Your two subqueries are almost completely identical, except that "final_value" is 1 or 0 respectively. Since each subquery results in 1 record per "person_id", you can just do the average based on a CASE/WHEN and remove the rest.
To help optimize the query, your matrix_report1 table should have an index on ( request_id, final_value, user_id ). Your matrix_normalized_notes table should have an index on ( request_id, user_id, store_name, normalized_value ).
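In DDL terms (index names are illustrative):
ALTER TABLE matrix_report1
ADD INDEX idx_req_final_user (request_id, final_value, user_id);
ALTER TABLE matrix_normalized_notes
ADD INDEX idx_req_user_store_value (request_id, user_id, store_name, normalized_value);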
Since your outer query is averaging per-store averages, you do need to keep it nested. The following should help.
SELECT
r1.person_id,
avg(r1.ANV1) as t1_value,
avg(r1.ANV0) as t0_value
from
( select
ma1.person_id,
mn1.store_name,
avg( case when ma1.final_value = 1
then mn1.normalized_value end ) as ANV1,
avg( case when ma1.final_value = 0
then mn1.normalized_value end ) as ANV0
from
matrix_report1 ma1
JOIN matrix_normalized_notes mn1
ON ma1.request_id = mn1.request_id
AND ma1.user_id = mn1.user_id
AND NOT mn1.normalized_value in ( 0.0, 0.2 )
where
ma1.request_id = 4
AND ma1.final_Value in ( 0, 1 )
group by
ma1.person_id,
mn1.store_name) r1
group by
r1.person_id
Notice the inner query pulls all transactions where the final value is either zero or one, but each AVG is based on a CASE/WHEN of the respective final value. When the condition is not the 1 or 0 respectively, the result is NULL and is thus not considered when the average is computed.
So at this point, the data is already grouped on a per-person basis, with each store's ANV1 and ANV0 already set. Now, roll these values up directly per person regardless of the store. Again, NULL values are not considered as part of the average computation, so if store "A" doesn't have a value in ANV1, it will not skew the results; similarly if store "B" doesn't have a value in ANV0.

SQL Statement running extremely slow

Okay, I have looked through several posts about SQL running slow and I didn't see anything similar to this, so my apologies if I missed one. I was asked about a month ago to create a view that would allow the user to report what hasn't been billed yet, and the SQL joins 4 tables together. One has about 1.2 million records; the rest are between 80K and 250K. (Please note that this should only return around 100 records after the WHERE clauses.)
SELECT C.Cltnum + '.' + C.CltEng AS [ClientNum],
C.CPPLname,
w.WSCDesc,
MIN(w.Wdate) AS [FirstTDate],
w.WCodeCat,
w.WCodeSub,
SUM(w.Wbilled) AS [Billed],
SUM(w.Whours * w.Wrate) AS [Billable Wip],
sum(ar.[ARProgress]) AS [Progress],
w.Winvnum,
-- dbo.GetInvoiceAmount(w.Winvnum) AS [InvoiceAmount],
SUM(cb.cinvar) AS [AR Balance]
FROM dbo.WIP AS w
--Never join on a select statement
--join BillingAuditCatagoriesT bac on w.WCodeCat = bac.Catagory and w.WCodeSub = bac.Subcatagory
INNER JOIN dbo.Clients AS C ON w.WCltID = C.ID
JOIN dbo.ClientBuckets AS cb on c.cltnum = cb.cltnum
JOIN dbo.AcctsRec AS ar on ar.arapplyto = w.[Winvnum]
-- WHERE w.wcodecat = '1AUDT'
GROUP BY C.Cltnum, C.CltEng, C.CPPLname, w.WCodeCat, w.Wdate, w.Winvnum, w.WCodeSub, w.WSCDesc
Where I think there may be a problem: Category is a varchar (values like xat, ACT, BID) and there are about 15 different categories; the same goes for SubCat. You will also notice there are 3 scalar functions involved: GetJamesProgress, which is essentially (SELECT SUM(Amount) FROM Progress WHERE inv = w.invnum), and likewise GetInvoiceAmount and GetJamesARBalance. I know this is bad to do, but when I join by invoice number it takes even longer than without them.
Please help thanks so much!
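One pattern worth trying (a sketch using the column names from the question, not a drop-in fix): pre-aggregate each per-invoice function into a derived table once and join to it, instead of evaluating a scalar function row by row. For the progress amounts, that would look like:
SELECT w.Winvnum, p.ProgressAmount
FROM dbo.WIP AS w
LEFT JOIN (
    SELECT inv, SUM(Amount) AS ProgressAmount
    FROM Progress
    GROUP BY inv
) AS p ON p.inv = w.Winvnum
The same derived-table treatment applies to the invoice-amount and AR-balance functions; the optimizer can then join one aggregated row set rather than running a correlated subquery per row.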