Complex selection query from multiple tables - mysql

I need to select records for review based on a nutty set of factors, primarily from two tables, supplemented with additional data from a third. Here are the factors:
Is it annual, and not overridden? (stays annual)
OR is it NOT annual, BUT overridden? (WAS biennial, but is now annual)
OR does the even/oddness of the searchYear match that of the recordYear?
(this handles biennial cases that would qualify for the searchYear)
AND finally, does it match the searchQtr? (because this is a quarterly summary)
So, based on that, here's the query I've assembled (it throws an error):
select ar.id as eNum, ar.quarter as qtr, dd.description as device, ar.building_name AS bldgName, ir.inspectorName
from (select * from asset_roster) ar
join device_descriptions dd on ar.descriptionID = dd.id
left join inspector_roster ir on ar.inspectorID = ir.id
where ar.id not in ( select eNumber from safety_report )
and ( ( (dd.annual = '1' and ar.override = '0') or
(dd.annual = '0' and ar.override = '1') ) or
( (select mod( left(ar.quarter,2))) = '1') )
and ( right(ar.quarter,2) = 'Q2' )
group by bldgName asc;
So this query fails with this message:
Error in query (1064): Syntax error near ')) = '1') ) and ( right(ar.quarter,2) = 'Q2' ) group by bldgName asc' at line 8
Some things to note: the query up to the first AND works on its own, and so does the last AND, just above the GROUP BY. So I've narrowed it down to the group of nested conditions. Sadly, I'm unable to work out exactly what the issue is here, which is why this post now exists.
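The error position appears to point at mod( left(ar.quarter,2)). MySQL's MOD(N, M) takes two arguments, and as written the condition never compares the parity with the search year. In MySQL, that branch might read MOD(CAST(LEFT(ar.quarter, 2) AS UNSIGNED), 2) = MOD(@searchYear, 2), where @searchYear is a hypothetical user variable holding the search year.

The sketch below checks the same WHERE logic in SQLite from Python. Here substr() and % stand in for LEFT()/RIGHT() and MOD(). It assumes quarter values look like '15Q2' (a 2-digit year, then the quarter) and uses trimmed-down hypothetical tables and rows. The safety_report and inspector_roster parts are left out.

```python
import sqlite3

# Hypothetical, trimmed-down tables; quarter is assumed to look like '15Q2'.
SETUP = """
CREATE TABLE device_descriptions (id INTEGER PRIMARY KEY, description TEXT, annual TEXT);
CREATE TABLE asset_roster (id INTEGER PRIMARY KEY, quarter TEXT, override TEXT,
                           descriptionID INTEGER);
INSERT INTO device_descriptions VALUES (1, 'Annual device', '1'), (2, 'Biennial device', '0');
INSERT INTO asset_roster VALUES
    (1, '15Q2', '0', 1),  -- annual, not overridden: stays annual
    (2, '14Q2', '1', 2),  -- biennial, overridden: now annual
    (3, '14Q2', '0', 2),  -- biennial, year parity differs from 2015
    (4, '13Q2', '0', 2),  -- biennial, same year parity as 2015
    (5, '15Q3', '0', 1);  -- annual, but a different quarter
"""

# substr() and % stand in for MySQL's LEFT()/RIGHT() and two-argument MOD().
QUERY = """
SELECT ar.id
FROM asset_roster ar
JOIN device_descriptions dd ON ar.descriptionID = dd.id
WHERE (   (dd.annual = '1' AND ar.override = '0')
       OR (dd.annual = '0' AND ar.override = '1')
       OR CAST(substr(ar.quarter, 1, 2) AS INTEGER) % 2 = :search_year % 2)
  AND substr(ar.quarter, -2) = :search_qtr
ORDER BY ar.id
"""


def review_ids(search_year, search_qtr):
    """Return the asset ids that qualify for review in the given year and quarter."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    rows = conn.execute(QUERY, {"search_year": search_year, "search_qtr": search_qtr})
    ids = [row[0] for row in rows]
    conn.close()
    return ids


print(review_ids(2015, "Q2"))  # [1, 2, 4]
```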

Related

MySQL - Select Field With Nested 'WHERE'

I'm working on a project which uses the SUM function to get a number of values. All of this works fine, but there is an issue with load time: the query takes 3.4 seconds to complete.
Here is an example of what I have so far:
SELECT
p.`player_id`,
p.`player_name` AS `name`,
d.`player_debut` AS `debut`,
SUM(a.`player_order` <= '11' OR a.`player_sub` != '0') AS `apps`,
SUM(a.`player_order` <= '11') AS `starts`,
SUM(a.`player_goals`) AS `goals`
FROM
`table1` r,
`table2` a,
`table3` p
LEFT JOIN `table4` d ON p.`player_id` = d.`player_id`
WHERE
r.`match_id` = a.`match_id` AND
a.`player_id` = p.`player_id` AND
r.`void` = '0'
GROUP BY
a.`player_id`
ORDER BY
p.`player_name` ASC
Cast your mind to line 4. That field is retrieved by making use of the LEFT JOIN further down the query. By taking those two lines out, load time decreases to less than 0.5 seconds - a significant improvement.
What I'm trying to achieve there (line 4), without success, is something similar to lines 5-7, where a sort of invisible WHERE clause has been applied.
The idea would be t4.date WHERE t2.order <= '14', but I'm not sure how I'd be able to get this to work without the aforementioned LEFT JOIN and increased load time that comes with it.
For clarification, here is how table4 was created: the following query, saved as a VIEW.
SELECT a.`player_id`, m.`date` AS `player_debut`
FROM
`table1` m, -- aliased m, since the rest of this query refers to this table as m
`table2` a,
`table3` p
WHERE
a.`match_id` = m.`match_id` AND
a.`player_id` = p.`player_id` AND
m.`match_void` = '0' AND
(
a.`player_order` BETWEEN '1' AND '11' OR
a.`player_sub_on_for` != '0'
)
GROUP BY p.`player_id`
ORDER BY p.`player_name` ASC
Essentially, as I am making use of the same tables for both queries and only utilising a different WHERE clause, I'm trying to establish if there is a way to 'nest' this.
You may just need conditional aggregation:
SELECT p.`player_id`, p.`player_name` AS `name`,
-- no ELSE branch: non-qualifying rows yield NULL, which MIN() ignores (ELSE 0 would make the minimum 0)
min(case when a.`player_order` <= '11' OR a.`player_sub` != '0' then r.`date` end) AS `debut`,
SUM(case when a.`player_order` <= '11' OR a.`player_sub` != '0' then 1 else 0 end) AS `apps`,
SUM(case when a.`player_order` <= '11' then 1 else 0 end) AS `starts`,
SUM(a.`player_goals`) AS `goals`
FROM
`table1` r
left join `table2` a on r.`match_id` = a.`match_id`
left join `table3` p on a.`player_id` = p.`player_id`
WHERE r.`void` = '0'
GROUP BY p.player_id,a.`player_id`
ORDER BY p.player_id,p.`player_name`;
There seem to be some inconsistencies in your column names (a.player_sub vs a.player_sub_on_for, m.match_void vs r.void = '0'), so I may not have got this quite right. Also, a GROUP BY clause without aggregation is pointless.
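Here is a quick check of the conditional-aggregation idea, run against SQLite from Python with made-up rows. Each CASE acts as a per-column filter over a single pass of the joined rows.

The debut column uses MIN over a CASE with no ELSE, so non-qualifying rows contribute NULL, which MIN ignores. With ELSE 0, MIN would return 0 for any player who has even one non-qualifying row. Inner joins are used here, and the column names (player_sub, void, date) follow the main query rather than the view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (match_id INTEGER PRIMARY KEY, date TEXT, void INTEGER);
CREATE TABLE table2 (match_id INTEGER, player_id INTEGER, player_order INTEGER,
                     player_sub INTEGER, player_goals INTEGER);
CREATE TABLE table3 (player_id INTEGER PRIMARY KEY, player_name TEXT);
INSERT INTO table1 VALUES (1, '2020-01-01', 0), (2, '2020-02-01', 0),
                          (3, '2020-03-01', 1), (4, '2019-12-01', 0);
INSERT INTO table3 VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO table2 VALUES
    (4, 1, 15, 0, 0),  -- unused substitute: not an appearance
    (1, 1, 5, 0, 1),   -- start
    (2, 1, 12, 1, 1),  -- came on as a sub
    (3, 1, 3, 0, 2),   -- void match, filtered out by WHERE
    (2, 2, 9, 0, 0);   -- start
""")

# One pass over the joined rows; each CASE picks out the rows for its own column.
rows = conn.execute("""
SELECT p.player_id, p.player_name,
       MIN(CASE WHEN a.player_order <= 11 OR a.player_sub != 0 THEN r.date END) AS debut,
       SUM(CASE WHEN a.player_order <= 11 OR a.player_sub != 0 THEN 1 ELSE 0 END) AS apps,
       SUM(CASE WHEN a.player_order <= 11 THEN 1 ELSE 0 END) AS starts,
       SUM(a.player_goals) AS goals
FROM table1 r
JOIN table2 a ON r.match_id = a.match_id
JOIN table3 p ON a.player_id = p.player_id
WHERE r.void = 0
GROUP BY p.player_id, p.player_name
ORDER BY p.player_name
""").fetchall()

print(rows)
# [(1, 'Alice', '2020-01-01', 2, 1, 2), (2, 'Bob', '2020-02-01', 1, 1, 0)]
```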

mysql Multiple left joins using count

I have been researching this for hours, and the best code I have come up with is this, adapted from an example I found on Stack Overflow. I have been through several variations, but the following is the only query that returns the correct data. The problem is that it takes over 139 s (more than 2 minutes) to return only 30 rows of data. I'm stuck. (life_p is a 'likes' table.)
SELECT
logos.id,
logos.in_gallery,
logos.active,
logos.pubpriv,
logos.logo_name,
logos.logo_image,
coalesce(cc.Count, 0) as CommentCount,
coalesce(lc.Count, 0) as LikeCount
FROM logos
left outer join(
select comments.logo_id, count( * ) as Count from comments group by comments.logo_id
) cc on cc.logo_id = logos.id
left outer join(
select life_p.logo_id, count( * ) as Count from life_p group by life_p.logo_id
) lc on lc.logo_id = logos.id
WHERE logos.active = '1'
AND logos.pubpriv = '0'
GROUP BY logos.id
ORDER BY logos.in_gallery desc
LIMIT 0, 30
I'm not sure what's wrong. If I do them singly, meaning I remove the COALESCE and one of the joins:
SELECT
logos.id,
logos.in_gallery,
logos.active,
logos.pubpriv,
logos.logo_name,
logos.logo_image,
count( * ) as lc
FROM logos
left join life_p on life_p.logo_id = logos.id
WHERE logos.active = '1'
AND logos.pubpriv = '0'
GROUP BY logos.id
ORDER BY logos.in_gallery desc
LIMIT 0, 30
That runs in less than half a second (200-300 ms).
Here is a link to the EXPLAIN output: https://logopond.com/img/explain.png
MySQL has a peculiar quirk that allows a group by clause that does not list all non-aggregating columns. This is NOT a good thing and you should always specify ALL non-aggregating columns in the group by clause.
Note that when counting over joined tables, it is useful to know that the COUNT() function ignores NULLs. So for a LEFT JOIN, where NULLs can occur, don't use COUNT(*); instead, count a column from the joined table, and only rows from that table will be counted. From these points, I would suggest the following query structure.
SELECT
logos.id
, logos.in_gallery
, logos.active
, logos.pubpriv
, logos.logo_name
, logos.logo_image
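, COUNT(cc.logo_id) AS CommentCount -- COUNT() never returns NULL, so no COALESCE is needed
, COUNT(lc.logo_id) AS LikeCount -- caution: with both joins, each count becomes comments x likes for logos that have both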
FROM logos
LEFT OUTER JOIN comments cc ON cc.logo_id = logos.id
LEFT OUTER JOIN life_p lc ON lc.logo_id = logos.id
WHERE logos.active = '1'
AND logos.pubpriv = '0'
GROUP BY
logos.id
, logos.in_gallery
, logos.active
, logos.pubpriv
, logos.logo_name
, logos.logo_image
ORDER BY logos.in_gallery DESC
LIMIT 0, 30
If you continue to have performance issues, look at the execution plan (EXPLAIN) and consider adding indexes to suit.
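Because COUNT(column) skips NULLs, logos with no comments or likes are handled correctly. However, joining two one-to-many tables to logos in the same query multiplies rows: a logo with 2 comments and 3 likes yields 6 joined rows, so both counts come out as 6.

The SQLite sketch below compares that query shape with per-table correlated subqueries. It uses hypothetical rows and keeps only the id columns. Two other approaches also avoid the problem:

- pre-aggregating each table in a derived table, as in the question's first query;
- COUNT(DISTINCT ...) over a unique key of each joined table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logos (id INTEGER PRIMARY KEY);
CREATE TABLE comments (id INTEGER PRIMARY KEY, logo_id INTEGER);
CREATE TABLE life_p (id INTEGER PRIMARY KEY, logo_id INTEGER);
INSERT INTO logos VALUES (1), (2), (3);
INSERT INTO comments (logo_id) VALUES (1), (1), (2);  -- logo 1: 2 comments, logo 2: 1
INSERT INTO life_p (logo_id) VALUES (1), (1), (1);    -- logo 1: 3 likes
""")

# Both one-to-many tables joined at once: every comment row pairs with every like row.
joined = conn.execute("""
SELECT logos.id, COUNT(cc.logo_id), COUNT(lc.logo_id)
FROM logos
LEFT OUTER JOIN comments cc ON cc.logo_id = logos.id
LEFT OUTER JOIN life_p lc ON lc.logo_id = logos.id
GROUP BY logos.id
ORDER BY logos.id
""").fetchall()

# Correlated subqueries count each table on its own, so nothing is multiplied.
correlated = conn.execute("""
SELECT logos.id,
       (SELECT COUNT(*) FROM comments WHERE comments.logo_id = logos.id),
       (SELECT COUNT(*) FROM life_p WHERE life_p.logo_id = logos.id)
FROM logos
ORDER BY logos.id
""").fetchall()

print(joined)      # [(1, 6, 6), (2, 1, 0), (3, 0, 0)]
print(correlated)  # [(1, 2, 3), (2, 1, 0), (3, 0, 0)]
```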
You can create some indexes on the joining fields:
ALTER TABLE table ADD INDEX idx__tableName__fieldName (field)
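In your case, it would be something like this (the index goes on the base table, since cc is only an alias in the query):
ALTER TABLE comments ADD INDEX idx__comments__logo_id (logo_id);
ALTER TABLE life_p ADD INDEX idx__life_p__logo_id (logo_id);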
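The index belongs on the base tables (comments, life_p), since cc and lc are only aliases inside the query. One way to confirm that an index is actually used is to look at the query plan.

The sketch below does this in SQLite from Python. CREATE INDEX plays the role of MySQL's ALTER TABLE ... ADD INDEX, and EXPLAIN QUERY PLAN stands in for MySQL's EXPLAIN, whose output format differs. The tables are hypothetical and empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE comments (id INTEGER PRIMARY KEY, logo_id INTEGER);
CREATE TABLE life_p (id INTEGER PRIMARY KEY, logo_id INTEGER);
-- Indexes go on the base tables, not on query aliases such as cc or lc.
CREATE INDEX idx__comments__logo_id ON comments (logo_id);
CREATE INDEX idx__life_p__logo_id ON life_p (logo_id);
""")

# The last column of each EXPLAIN QUERY PLAN row is a readable description of the step.
plan = " ".join(
    row[-1]
    for row in conn.execute(
        "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM comments WHERE logo_id = ?", (1,)
    )
)
print(plan)
```

The printed plan should name idx__comments__logo_id; without the index, SQLite would report a full scan of comments instead.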
I don't really like it, because I've always read that subqueries are bad and that joins perform better under stress, but in this particular case a subquery seems to be the only way to pull the correct data in under half a second consistently. Thanks for the suggestions, everyone.
SELECT
logos.id,
logos.in_gallery,
logos.active,
logos.pubpriv,
logos.logo_name,
logos.logo_image,
(Select COUNT(comments.logo_id) FROM comments
WHERE comments.logo_id = logos.id) AS coms,
(Select COUNT(life_p.logo_id) FROM life_p
WHERE life_p.logo_id = logos.id) AS floats
FROM logos
WHERE logos.active = '1' AND logos.pubpriv = '0'
ORDER BY logos.in_gallery desc
LIMIT 0, 30 -- the original PHP code filled these in from $start and $pageSize
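You can also create mapping tables to speed up your query. Try:
CREATE TABLE mapping_comments AS
SELECT
comments.logo_id,
count(*) AS Count
FROM
comments
GROUP BY
comments.logo_id;
Then change this part of your query:
left outer join(
select comments.logo_id, count( * ) as Count from comments group by comments.logo_id
) cc on cc.logo_id = logos.id
so that it becomes:
left outer join mapping_comments as mp on mp.logo_id = logos.id
and select coalesce(mp.Count, 0) as CommentCount. Keep it a left join: the mapping table has no row for logos without comments, so an inner join would drop those logos.
Then each time a new comment is added to the comments table, you need to update your mapping table, or you can create a trigger to do it automatically whenever the comments table changes.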
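Here is a sketch of the mapping table kept up to date automatically by triggers. It runs on SQLite from Python, and the trigger names comments_ai and comments_ad are hypothetical. In MySQL, the triggers would be declared with CREATE TRIGGER ... AFTER INSERT ON comments FOR EACH ROW, and INSERT ... ON DUPLICATE KEY UPDATE could replace the insert-then-update pair. The read query uses a LEFT JOIN with COALESCE, so logos without any comments still show a count of 0.

```python
import sqlite3


def build():
    """Create hypothetical tables plus triggers that keep mapping_comments in sync."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE logos (id INTEGER PRIMARY KEY);
        CREATE TABLE comments (id INTEGER PRIMARY KEY, logo_id INTEGER NOT NULL);
        CREATE TABLE mapping_comments (logo_id INTEGER PRIMARY KEY, Count INTEGER NOT NULL);

        -- New comment: make sure the logo has a row, then bump its count.
        CREATE TRIGGER comments_ai AFTER INSERT ON comments
        BEGIN
            INSERT OR IGNORE INTO mapping_comments (logo_id, Count) VALUES (NEW.logo_id, 0);
            UPDATE mapping_comments SET Count = Count + 1 WHERE logo_id = NEW.logo_id;
        END;

        -- Deleted comment: decrement the count for that logo.
        CREATE TRIGGER comments_ad AFTER DELETE ON comments
        BEGIN
            UPDATE mapping_comments SET Count = Count - 1 WHERE logo_id = OLD.logo_id;
        END;
    """)
    return conn


def comment_counts(conn):
    # LEFT JOIN + COALESCE keeps logos that have no row in mapping_comments.
    return conn.execute("""
        SELECT logos.id, COALESCE(mp.Count, 0)
        FROM logos
        LEFT OUTER JOIN mapping_comments AS mp ON mp.logo_id = logos.id
        ORDER BY logos.id
    """).fetchall()


conn = build()
conn.executemany("INSERT INTO logos (id) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO comments (logo_id) VALUES (?)", [(1,), (1,), (2,)])
before = comment_counts(conn)  # [(1, 2), (2, 1), (3, 0)]
conn.execute("DELETE FROM comments WHERE logo_id = 2")
after = comment_counts(conn)   # [(1, 2), (2, 0), (3, 0)]
print(before, after)
```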

Sub sub-query can't find joined column in parent select

I'm running into some trouble with SQL:
Basically I'm trying to get a result set back that contains a sum of ALL questions asked to employees (grouped by company) and also add the "onetime_items" which are manually added items in a different table.
I currently have this SQL statement (I'm using MySQL):
SELECT
CONCAT_WS(
', ', count(DISTINCT CONCAT(emailaddress, '_', a.id)),
(
SELECT GROUP_CONCAT(items SEPARATOR '; ') as OneTimeItems
FROM (
SELECT CONCAT_WS(
': ', oi.item_name, SUM(oi.item_amount)
) items
FROM onetime_item oi
WHERE oi.company_id = e.company_id
AND oi.date BETWEEN '2015-12-01'
AND LAST_DAY('2015-12-01')
GROUP BY oi.item_name
) resulta
)
) as AllItems,
e.id,
LEFT(e.firstname, 1) as voorletter,
e.lastname
FROM question q
LEFT JOIN employee e ON q.employee_id = e.id
WHERE 1=1
AND YEAR(created_at) = '2015'
AND MONTH(created_at) = '12'
GROUP BY e.company_id
Now I get the following error:
Fatal error: Uncaught exception 'PDOException' with message 'SQLSTATE[42S22]: Column not found: 1054 Unknown column 'e.company_id' in 'where clause'
The dates used are dummy dates.
This column DOES exist in the employee table, and the left join works (I tried entering an id manually instead of using the column reference and it worked; I got the right result back).
Any idea as to why the reference to e.company_id fails?
Thanks to dba.stackexchange.com
link: https://dba.stackexchange.com/questions/126339/subquery-cant-find-column-from-superquerys-join
Answer by ypersillycubeᵀᴹ
Here's the answer that was posted there, hopefully someone else will profit as well from this!
The cause of the problem was identified by @Phil in the comments:
Probably because it's nested too deep
You have 2 layers of nesting, and the reference to table e cannot "see" through these 2 layers in MySQL.
Correlated inline subqueries can usually be converted to derived tables and then LEFT joined to the other tables in the FROM clause, but in MySQL they have to be made uncorrelated first. (In other DBMSs, you could use a LATERAL join or the similar OUTER APPLY.)
A first rewrite to get the job done:
SELECT
CONCAT_WS(
', ', count(DISTINCT CONCAT(q.emailaddress, '_', e.id)),
dv.OneTimeItems
) as AllItems,
e.id,
LEFT(e.firstname, 1) as voorletter,
e.lastname
FROM question q
LEFT JOIN employee e ON q.employee_id = e.id
LEFT JOIN
(
SELECT company_id,
GROUP_CONCAT(items SEPARATOR '; ') AS OneTimeItems
FROM (
SELECT oi.company_id,
CONCAT_WS(
': ', oi.item_name, SUM(oi.item_amount)
) items
FROM onetime_item oi
WHERE oi.date BETWEEN '2015-12-01'
AND LAST_DAY('2015-12-01')
GROUP BY oi.company_id, oi.item_name
) resulta
GROUP BY company_id
) AS dv
ON dv.company_id = e.company_id
WHERE 1=1
AND YEAR(q.created_at) = '2015'
AND MONTH(q.created_at) = '12'
GROUP BY e.company_id ;
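The shape of this rewrite can be tried in SQLite from Python with hypothetical rows: an uncorrelated derived table, aggregated per company, then LEFT joined on company_id. A few details differ from the MySQL version:

- group_concat(X, '; ') and || stand in for MySQL's GROUP_CONCAT(... SEPARATOR '; ') and CONCAT_WS.
- The per-employee columns are dropped from the select list.
- The dates use the inclusive-exclusive range recommended in the comments that follow.

Since group_concat does not guarantee an order, the item list may come back in either order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, company_id INTEGER, firstname TEXT, lastname TEXT);
CREATE TABLE question (id INTEGER PRIMARY KEY, employee_id INTEGER, emailaddress TEXT,
                       created_at TEXT);
CREATE TABLE onetime_item (company_id INTEGER, item_name TEXT, item_amount INTEGER, date TEXT);
INSERT INTO employee VALUES (1, 10, 'Anna', 'A'), (2, 10, 'Bert', 'B'), (3, 20, 'Cora', 'C');
INSERT INTO question (employee_id, emailaddress, created_at) VALUES
    (1, 'a@example.com', '2015-12-03'), (1, 'a@example.com', '2015-12-05'),
    (2, 'b@example.com', '2015-12-10'), (3, 'c@example.com', '2015-12-31 23:00:00'),
    (3, 'c@example.com', '2016-01-01');
INSERT INTO onetime_item VALUES
    (10, 'Coffee', 10, '2015-12-02'), (10, 'Coffee', 20, '2015-12-20'),
    (10, 'Lunch', 50, '2015-12-15'), (20, 'Coffee', 5, '2015-11-30');
""")

rows = conn.execute("""
SELECT e.company_id,
       COUNT(DISTINCT q.emailaddress || '_' || e.id) AS questions,
       dv.OneTimeItems
FROM question q
LEFT JOIN employee e ON q.employee_id = e.id
LEFT JOIN (
    -- Uncorrelated derived table: one row per company, joined on company_id below.
    SELECT company_id, group_concat(items, '; ') AS OneTimeItems
    FROM (SELECT oi.company_id, oi.item_name || ': ' || SUM(oi.item_amount) AS items
          FROM onetime_item oi
          WHERE oi.date >= '2015-12-01' AND oi.date < '2016-01-01'
          GROUP BY oi.company_id, oi.item_name) resulta
    GROUP BY company_id
) AS dv ON dv.company_id = e.company_id
WHERE q.created_at >= '2015-12-01' AND q.created_at < '2016-01-01'
GROUP BY e.company_id, dv.OneTimeItems
ORDER BY e.company_id
""").fetchall()

print(rows)
```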
Test in SQLfiddle.
Comments unrelated to the issue:
There is GROUP BY e.company_id while the select list has e.id, LEFT(e.firstname, 1), e.lastname. All of these will give arbitrary results from a (more or less random) employee for each company, or even, in extremely rare cases, arbitrary results from 2 or 3 different employees! Before 5.7, MySQL allowed such bad use of GROUP BY, which could cause erroneous results. This was fixed in 5.7, and the default settings would reject this query.
The condition:
YEAR(created_at) = '2015' AND MONTH(created_at) = '12'
cannot make use of indexes. It's better to rewrite it either with BETWEEN, if the column is of DATE type, or with an inclusive-exclusive range condition, which works flawlessly with any datetime type (DATE, DATETIME, TIMESTAMP) of any precision:
-- use only when the type is DATE only
date BETWEEN '2015-12-01' AND LAST_DAY('2015-12-01')
or:
-- use when the type is DATE, DATETIME or TIMESTAMP
created_at >= '2015-12-01' AND created_at < '2016-01-01'
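The difference matters for DATETIME values on the last day of the month. In MySQL, a DATETIME compared with '2015-12-31' is treated as 2015-12-31 00:00:00. A row at 08:30 that day therefore falls outside BETWEEN '2015-12-01' AND '2015-12-31'.

The sketch below reproduces this in SQLite from Python. There, the values are stored as ISO-formatted text and compared as strings, which gives the same result for these rows. The table and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE question (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO question (created_at) VALUES (?)",
    [("2015-11-30 23:59:59",), ("2015-12-01 00:00:00",),
     ("2015-12-31 08:30:00",), ("2016-01-01 00:00:00",)],
)

# Date-only upper bound: '2015-12-31 08:30:00' sorts after '2015-12-31', so it is excluded.
between_dates = conn.execute(
    "SELECT COUNT(*) FROM question "
    "WHERE created_at BETWEEN '2015-12-01' AND '2015-12-31'"
).fetchone()[0]

# Inclusive-exclusive range: everything from Dec 1 up to, but not including, Jan 1.
half_open = conn.execute(
    "SELECT COUNT(*) FROM question "
    "WHERE created_at >= '2015-12-01' AND created_at < '2016-01-01'"
).fetchone()[0]

print(between_dates, half_open)  # 1 2
```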

Print all rows from one table with a nested MySQL query

I have a MySQL query that I would like to enhance so that every value of client.client_name is printed in the result, even for clients that have no matching time records. The client table currently contains:
client.client_name
Client A
Client B
Client C
The current MySQL query is below:
SELECT X.expr1 AS 'Project Name', SUM(X.expr2) AS 'Total Hours Logged', X.expr3 - sum(X.expr2) AS 'Monthly Hours Remaining', X.expr4 AS 'Last Day', DATEDIFF(X.expr4 , curdate()) AS 'Days Remaining'
FROM
(SELECT
client.client_name AS expr1
, sum(time_records.value) AS expr2
, client.monthly_hours AS expr3
FROM project_objects
INNER JOIN projects
ON projects.id = project_objects.project_id
INNER JOIN time_records
ON time_records.parent_id = project_objects.id
LEFT JOIN client
ON project_objects.project_id = client.project_id
WHERE time_records.parent_type = 'Task'
AND client.start_day_of_month < dayofmonth(curdate())
AND time_records.state = 3
GROUP BY client.client_name
UNION
SELECT
client.client_name AS expr1
, sum(time_records.value) as expr2
, client.monthly_hours AS expr3
FROM projects
INNER JOIN time_records
ON projects.id = time_records.parent_id
LEFT JOIN client
ON projects.id = client.project_id
WHERE time_records.parent_type = 'Project'
AND client.start_day_of_month < dayofmonth(curdate())
AND time_records.state = 3
GROUP BY client.client_name
) X
GROUP BY X.expr1
ORDER BY DATEDIFF(X.expr4, curdate())
As you can see from the query above, I added a LEFT JOIN for the client table; however, it doesn't print all client records, only those for which time_records are available. I think this is related to the nesting or to the order in which I am writing the joins, but I can't seem to figure it out. If you have any ideas, they would be greatly appreciated. Thanks!
Please try ordering the joins as below:
FROM client LEFT JOIN project_objects
ON project_objects.project_id = client.project_id
See SQLFiddle for the final solution: http://sqlfiddle.com/#!2/30362/16
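The fiddle's final query is not reproduced here, but the underlying issue can be sketched. The time_records joins are INNER, and the WHERE conditions on time_records and client discard the NULL rows that a LEFT JOIN produces for clients without hours. Starting FROM client and moving the time_records filters into the ON clause keeps every client.

The SQLite sketch below, run from Python, contrasts that with putting the same filters in WHERE. It uses a hypothetical simplified schema: time records are attached directly to a client's project, and the Task branch and the day-of-month filter are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE client (client_name TEXT, project_id INTEGER, monthly_hours INTEGER);
CREATE TABLE time_records (parent_id INTEGER, parent_type TEXT, state INTEGER, value INTEGER);
INSERT INTO client VALUES ('Client A', 1, 20), ('Client B', 2, 10), ('Client C', 3, 15);
INSERT INTO time_records VALUES
    (1, 'Project', 3, 5), (1, 'Project', 3, 3),
    (2, 'Project', 1, 4),  -- wrong state
    (2, 'Task', 3, 7);     -- wrong parent type
""")

# Filters on the optional table live in ON, so unmatched clients survive with NULLs.
all_clients = conn.execute("""
SELECT c.client_name,
       COALESCE(SUM(tr.value), 0) AS total_hours,
       c.monthly_hours - COALESCE(SUM(tr.value), 0) AS hours_remaining
FROM client c
LEFT JOIN time_records tr
       ON tr.parent_id = c.project_id
      AND tr.parent_type = 'Project'
      AND tr.state = 3
GROUP BY c.client_name, c.monthly_hours
ORDER BY c.client_name
""").fetchall()

# Same join with the filters in WHERE: NULL rows fail them, so it acts like an inner join.
filtered_in_where = conn.execute("""
SELECT c.client_name
FROM client c
LEFT JOIN time_records tr ON tr.parent_id = c.project_id
WHERE tr.parent_type = 'Project' AND tr.state = 3
GROUP BY c.client_name
ORDER BY c.client_name
""").fetchall()

print(all_clients)        # [('Client A', 8, 12), ('Client B', 0, 10), ('Client C', 0, 15)]
print(filtered_in_where)  # [('Client A',)]
```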

optimize Mysql: get latest status of the sale

In the following query, I show the latest status of the sale (by stage, in this case the number 3). The query is based on a subquery in the status history of the sale:
SELECT v.id_sale,
IFNULL((
SELECT (CASE WHEN IFNULL( vec.description, '' ) = ''
THEN ve.name
ELSE vec.description
END)
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
WHERE veh.id_sale = v.id_sale
AND vec.id_stage = 3
ORDER BY veh.id_record DESC
LIMIT 1
), 'x') sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
WHERE 1 =1
AND v.flag =1
AND v.id_quarters =4
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
The query takes 0.0057 s and returns 1011 records.
Because I have to filter the sales by the name of the state, which would mean repeating the subquery in a WHERE clause, I decided to rewrite the same query using joins. In this case, I'm using the MAX function to obtain the latest status:
SELECT
v.id_sale,
IFNULL(veh3.State3,'x') AS sale_state_3
FROM t_sale v
INNER JOIN t_quarters sd ON v.id_quarters = sd.id_quarters
LEFT JOIN (
SELECT veh.id_sale,
(CASE WHEN IFNULL(vec.description,'') = ''
THEN ve.name
ELSE vec.description END) AS State3
FROM t_record veh
INNER JOIN (
SELECT id_sale, MAX(id_record) AS max_rating
FROM(
SELECT veh.id_sale, id_record
FROM t_record veh
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign AND vec.id_stage = 3
) m
GROUP BY id_sale
) x ON x.max_rating = veh.id_record
INNER JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
INNER JOIN t_state ve ON ve.id_state = vec.id_state
) veh3 ON veh3.id_sale = v.id_sale
WHERE v.flag = 1
AND v.id_quarters = 4
This query returns the same results (1011 records), but the problem is that it takes 0.0753 s.
Reviewing the possibilities, I found the factor that makes the difference in the speed of the query:
AND EXISTS (
SELECT '1'
FROM t_record
WHERE id_sale = v.id_sale
LIMIT 1
)
If I remove this clause, both queries take the same time. Why does it work better? Is there any way to use this clause in the joins? I hope you can help.
EDIT
I will show the results of EXPLAIN for each query (q1 and q2); the EXPLAIN output images are not reproduced here.
Interesting, so that little statement basically determines if there is a match between t_record.id_sale and t_sale.id_sale.
Why is this making your query run faster? Because the WHERE conditions are applied before the subselects in the SELECT list are evaluated, so if there is no record to go with the sale, the subselect is never processed. That saves you some time, which is why it works better.
Is it going to work in your join syntax? I don't really know without having your tables to test against, but you can always just add it at the end and find out. Add the keyword EXPLAIN to the beginning of your query and you will get an execution plan, which will help you optimize things. Probably the best way to get better results with your join syntax is to add some indexes to your tables.
But I ask you: is this even necessary? You have a query returning in under 8 hundredths of a second. Unless this query is run thousands of times an hour, it is not really taxing your DB at all, and your time is probably better spent making improvements elsewhere in your application.
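Applying the EXISTS clause to the join version, as suggested above, could look like the following. This SQLite sketch runs from Python with hypothetical rows and without the t_quarters join. It only checks that the results come out right:

- the latest stage-3 state per sale;
- 'x' when a sale has no stage-3 record;
- sales without any record excluded.

Whether it is faster in MySQL is something only EXPLAIN on the real data can show. The LIMIT 1 inside EXISTS is dropped because EXISTS only asks whether any matching row exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_state (id_state INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE t_state_campaign (id_state_campaign INTEGER PRIMARY KEY, id_state INTEGER,
                               id_stage INTEGER, description TEXT);
CREATE TABLE t_sale (id_sale INTEGER PRIMARY KEY, flag INTEGER, id_quarters INTEGER);
CREATE TABLE t_record (id_record INTEGER PRIMARY KEY, id_sale INTEGER,
                       id_state_campaign INTEGER);
INSERT INTO t_state VALUES (1, 'Open'), (2, 'Closed');
INSERT INTO t_state_campaign VALUES
    (100, 1, 3, ''), (101, 2, 3, 'Closed won'), (102, 1, 2, 'Stage two');
INSERT INTO t_sale VALUES (1, 1, 4), (2, 1, 4), (3, 1, 4), (4, 0, 4), (5, 1, 4);
INSERT INTO t_record VALUES (1, 1, 100), (2, 1, 101), (3, 2, 100), (4, 2, 102), (5, 5, 102);
""")

rows = conn.execute("""
SELECT v.id_sale, COALESCE(veh3.State3, 'x') AS sale_state_3
FROM t_sale v
LEFT JOIN (
    SELECT veh.id_sale,
           CASE WHEN COALESCE(vec.description, '') = '' THEN ve.name
                ELSE vec.description END AS State3
    FROM t_record veh
    -- latest stage-3 record per sale
    JOIN (SELECT r.id_sale, MAX(r.id_record) AS max_record
          FROM t_record r
          JOIN t_state_campaign sc
            ON sc.id_state_campaign = r.id_state_campaign AND sc.id_stage = 3
          GROUP BY r.id_sale) x ON x.max_record = veh.id_record
    JOIN t_state_campaign vec ON vec.id_state_campaign = veh.id_state_campaign
    JOIN t_state ve ON ve.id_state = vec.id_state
) veh3 ON veh3.id_sale = v.id_sale
WHERE v.flag = 1
  AND v.id_quarters = 4
  -- semi-join: keep only sales that have at least one record
  AND EXISTS (SELECT 1 FROM t_record t WHERE t.id_sale = v.id_sale)
ORDER BY v.id_sale
""").fetchall()

print(rows)  # [(1, 'Closed won'), (2, 'Open'), (5, 'x')]
```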