How can I get rid of these subqueries?
(all tables have columns Created and LastEdited as timestamps)
table Process
- ID
- Title
table ProcessHistory
- ID
- ProcessID
- HistoryID
table History
- ID
- Title (new, open, closed etc.)
When I try to get a list of processes with columns for the latest status title, I do:
SELECT DISTINCT Process.*, History.Title AS HistoryTitle, History.ID AS HistoryID
FROM `Process`
LEFT JOIN ProcessHistory AS ProcessHistory ON Process.ID=ProcessHistory.ProcessID
LEFT JOIN History AS History ON HistoryID=ProcessHistory.HistoryID
WHERE History.ID = (
SELECT HistoryID FROM ProcessHistory
WHERE ProcessID=Process.ID
ORDER BY ProcessHistory.ID DESC LIMIT 1
)
GROUP BY Process.ID
ORDER BY Process.ID DESC LIMIT 0, 100
When I try to get a list filtered by a specific status
(where the latest HistoryID is 1, i.e. "all open Processes"):
SELECT DISTINCT Process.*, History.Title AS HistoryTitle, History.ID AS HistoryID
FROM `Process`
LEFT JOIN ProcessHistory AS ProcessHistory ON Process.ID=ProcessHistory.ProcessID
LEFT JOIN History AS History ON HistoryID=ProcessHistory.HistoryID
WHERE History.ID=(
SELECT HistoryID FROM ProcessHistory
WHERE HistoryID =1
AND ProcessID=Process.ID ORDER BY ProcessHistory.ID DESC LIMIT 1
)
GROUP BY Process.ID
ORDER BY Process.ID DESC LIMIT 0, 100
For performance reasons I want to get rid of these subqueries.
How can I replace them?
Thanks in advance!
The query you posted does not give the proper result, because of the ORDER BY ProcessHistory.ID DESC LIMIT 1 in the subquery.
Try the query below, which gives the same result as your query:
SELECT DISTINCT p.*, h.Title AS HistoryTitle,h.ID AS HistoryID
FROM
Process p JOIN ProcessHistory ph ON p.ID=ph.ProcessID and ph.HistoryID=1
JOIN History h ON h.ID=ph.HistoryID
GROUP BY p.ID
ORDER BY p.ID DESC LIMIT 0, 100;
See the SQL Fiddle link.
If you want a different result, leave a comment.
You'll still need to make another select to filter out the rest of the process history, but you should use a derived table instead of a subquery, like this sqlfiddle:
SELECT Process.*, History.Title AS HistoryTitle
FROM Process
JOIN (
SELECT ProcessID, max(HistoryID) as HistoryID
FROM ProcessHistory
GROUP BY ProcessID
) PH ON PH.ProcessID = Process.ID
JOIN History ON History.ID = PH.HistoryID
ORDER BY Process.ID DESC LIMIT 0, 100
As Himanshu Patel pointed out, your query to "get a list filtered by a specific status (Where the latest HistoryID is 1 - "all open Processes")" will not produce the desired effect. It will simply return all processes that have a HistoryID of 1. See this sqlfiddle.
Instead, you want to use a derived table to get those latest process history ids, join them with the processes, and filter on the history ids like this sqlfiddle:
SELECT DISTINCT Process.*, History.Title AS HistoryTitle
FROM Process
JOIN (
SELECT ProcessID, max(HistoryID) as HistoryID
FROM ProcessHistory
GROUP BY ProcessID
) PH ON PH.ProcessID = Process.ID
JOIN History ON History.ID = PH.HistoryID
WHERE PH.HistoryID = 1
ORDER BY Process.ID DESC LIMIT 0, 100
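One caveat about the derived-table queries above: max(HistoryID) picks the highest status id per process, which is only the latest status if status ids never go backwards. The question orders by ProcessHistory.ID, so a variant that takes MAX(ID) per process and joins back to ProcessHistory may be closer to what was asked. Below is a minimal sketch of that idea, run through Python's sqlite3 so it can be tried locally. The sample rows, the alias lp and the helper latest_status are made up for illustration, and it assumes ProcessHistory.ID grows over time. The SQL itself should carry over to MySQL unchanged.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Process (ID INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE History (ID INTEGER PRIMARY KEY, Title TEXT);
CREATE TABLE ProcessHistory (ID INTEGER PRIMARY KEY, ProcessID INTEGER, HistoryID INTEGER);
INSERT INTO Process VALUES (1, 'A'), (2, 'B');
INSERT INTO History VALUES (1, 'open'), (2, 'closed');
-- Process 1: open -> closed -> open again; Process 2: open -> closed
INSERT INTO ProcessHistory VALUES (1, 1, 1), (2, 1, 2), (3, 1, 1), (4, 2, 1), (5, 2, 2);
""")

# Derived table lp finds the newest ProcessHistory row per process (assumed: highest ID = newest).
BASE_SQL = """
SELECT p.ID, p.Title, h.ID AS HistoryID, h.Title AS HistoryTitle
FROM Process AS p
JOIN (
    SELECT ProcessID, MAX(ID) AS LastID
    FROM ProcessHistory
    GROUP BY ProcessID
) AS lp ON lp.ProcessID = p.ID
JOIN ProcessHistory AS ph ON ph.ID = lp.LastID
JOIN History AS h ON h.ID = ph.HistoryID
"""

def latest_status(conn, history_id=None):
    """Return each process with its latest status, optionally filtered by that status."""
    sql, params = BASE_SQL, ()
    if history_id is not None:
        sql += "WHERE ph.HistoryID = ?\n"
        params = (history_id,)
    sql += "ORDER BY p.ID DESC LIMIT 100"
    return conn.execute(sql, params).fetchall()

print(latest_status(conn))     # every process with its latest status
print(latest_status(conn, 1))  # only processes whose latest status is 1 ("open")
```

With this sample data, process 1 was closed and then reopened, so a max(HistoryID) approach would report it as closed, while the MAX(ID) variant reports it as open.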
An alternative approach would be to create a view that filters the ProcessHistory table to the latest history per process and join on that. YMMV, but in some cases, performance can be improved that way.
SELECT *
FROM Process AS p
LEFT JOIN (
SELECT ph.ProcessID, ph.HistoryID, h.Title AS HistoryTitle
FROM ProcessHistory AS ph
JOIN History AS h ON ph.HistoryID = h.ID
WHERE h.ID = 1
) AS phh ON p.ID = phh.ProcessID
ORDER BY p.ID DESC LIMIT 100
Due to the actual subquery, used to get "last" row related to each "process", you can neither convert it to a JOIN nor can you use it as a separate query to set a session variable.
I need your help on deciding which query to use since we are facing performance issue with MySQL joins and Subqueries.
The problem is that I'm trying to find out user's 'first order date' while they should fit certain conditions:
order_status = 1(completed) or order_status = 2(canceled)
The Tables are tb_order and tb_user; All the columns that contain a 'time' are using Unix Time Stamp.
The result I need looks like this:
order_id | user_id | user_1st_order_date
1        | 47      | 1666876594
2        | 982     | 1667095997
Option 1: JOIN
Select
o.id as 'order_id',
u.id as 'user_id',
ox.create_time as 'user_1st_order_date'
from
tb_order o
left join tb_user u on o.user_id = u.id
/* here I have about 10 joins */
left join
(
select
ux.id,
ox.create_time
from
tb_user ux
left join tb_order ox on ox.user_id = ux.id
where
( ox.order_status = 1 or ox.order_status = 2 )
/* Orders can be (completed) or (canceled) */
group by
ux.id
) x on x.id = u.id
/* The thought here is by using group by `ux.id` I will get the
user's earliest completed or canceled order and it's `create_time`
then this can be used to `join` the order info */
where
o.create_time != 0
and
( o.order_status = 1 or o.order_status = 2 )
group by
o.id
Option 2: Subquery
Select
o.id as 'order_id',
u.id as 'user_id',
(
select
ox.create_time
from
tb_order ox
where
(ox.order_status = 1 or ox.order_status = 2)
and
ox.user_id = u.id
order by
ox.id asc
limit 1
) as 'user_1st_order_date'
from
tb_order o
left join tb_user u on o.user_id = u.id
/* here I have about 10 joins */
where
o.create_time != 0
and
( o.order_status = 1 or o.order_status = 2 )
group by
o.id
/* Option 1 somehow stopped working yesterday and started to give me the latest order time instead, and I don't know why. I can get the correct date back by putting Min() in front of ox.create_time, though: */
left join
(
select
ux.id,
Min(ox.create_time)
Both worked but I'm trying to find the most efficient one since I'll use this on a daily basis to update our data source for Tableau Online.
Many thanks in advance.
Just looking at query 1, you have set out a crazy set of table relationships.
Starting with the Select in parentheses, you have a Left Join that implies there are users without orders. That's OK, but your Where filter is based solely on order status, which is NULL when there is no order, so all such users will be filtered out. There is no useful purpose being served by joining the tb_user table and it can be omitted from that subquery.
In the outer query the Left join of tb_order to tb_user implies there are orders without users, but then joining the subquery using u.id instead of o.user_id guarantees that nothing from the subquery will be usable in that case. Once again, there is no purpose served in bringing tb_user in there either.
To get the desired result set you set out above, you can vastly simplify things by looking only at the tb_order table like Option 3 below:
Option 3
Select order_id, user_id, user_1st_order_date From (
Select id as 'order_id', user_id as 'user_id'
,order_status /* needed by the outer Where */
,min(Case When order_status In (1,2) Then create_time End)
Over (Partition By user_id
Rows Between unbounded preceding And unbounded following)
AS 'user_1st_order_date'
From tb_order
) t /* MySQL requires an alias on a derived table */
Where order_status in (1,2)
Order by order_id
This can be further simplified by moving the Where order_status in (1,2) inside the inner query and removing the Case expression around create_time, but it's less adaptable to use within other queries.
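The simplified variant described above might look roughly like the sketch below. It is run through Python's sqlite3 only so that it can be tried without a MySQL server. The sample rows are invented, and window functions need SQLite 3.25 or later (MySQL 8.0 or later on the MySQL side).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tb_order (
    id INTEGER PRIMARY KEY,
    user_id INTEGER,
    order_status INTEGER,
    create_time INTEGER
);
-- Hypothetical rows: status 1 = completed, 2 = canceled, 0 = anything else
INSERT INTO tb_order VALUES
    (1, 47, 1, 1666876594),
    (2, 982, 2, 1667095997),
    (3, 47, 0, 1666000000),
    (4, 47, 2, 1667200000);
""")

# The status filter sits in the WHERE clause, so no CASE is needed inside MIN().
rows = conn.execute("""
SELECT id AS order_id,
       user_id,
       MIN(create_time) OVER (PARTITION BY user_id) AS user_1st_order_date
FROM tb_order
WHERE order_status IN (1, 2)
ORDER BY order_id
""").fetchall()

for row in rows:
    print(row)
```

Order 3 is filtered out before the window is evaluated, so its earlier create_time does not leak into user 47's first-order date.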
I guess I can't explain my problem properly. I want to explain this to you with a picture.
Picture 1
In the first picture you can see the hashtags in the trend section. These hashtags are searched for the highest total and it is checked whether the date has passed. If valid data is available, the first 5 hashtags are taken.
Picture 2
In the second picture, it is checked whether the hashtag appears in any post. If it does, the post with the oldest date is taken (LIMIT is set to 1) and its sid value is matched with the id value from the oyuncular table. Thus, the name of the person who shared it can be accessed.
Picture 3
My English is a little bad, I hope I could explain it properly.
SELECT
social_trend.hashtag,
social_trend.total,
social_trend.tarih,
social_post.sid,
social_post.tarih,
social_post.post,
oyuncular.id,
oyuncular.isim
FROM
social_trend
INNER JOIN
social_post
ON
social_post.post LIKE '%social_trend.hashtag%' ORDER BY social_post.tarih LIMIT 1
INNER JOIN
oyuncular
ON
oyuncular.id = social_post.sid
WHERE
social_trend.tarih > UNIX_TIMESTAMP() ORDER BY social_trend.total DESC LIMIT 5
You should use a subquery
and add a proper join between the subquery and social_trend
(I assumed both use sid)
SELECT
social_trend.hashtag,
social_trend.total,
social_trend.tarih,
t.sid,
t.tarih,
t.post,
oyuncular.id,
oyuncular.isim
FROM (
select social_post.*
from social_post
INNER JOIN social_trend ON social_post.post LIKE concat('%',social_trend.hashtag,'%' )
ORDER BY social_post.tarih LIMIT 1
) t
INNER JOIN social_trend ON social_trend.hashtag= t.post
INNER JOIN oyuncular ON oyuncular.id = t.sid
WHERE
social_trend.tarih > UNIX_TIMESTAMP() ORDER BY social_trend.total DESC LIMIT 5
But looking at your new explanation and image, it seems you need:
SELECT
t.hashtag,
t.total,
t.tarih_trend,
t.sid,
t.tarih,
t.post,
oyuncular.id,
oyuncular.isim
FROM (
select social_post.sid
, social_post.tarih
, social_post.post
, st.hashtag
, st.total
, st.tarih tarih_trend
from social_post
INNER JOIN (
select * from social_trend
WHERE social_trend.tarih > UNIX_TIMESTAMP()
order by total DESC LIMIT 5
) st ON social_post.post LIKE concat('%',st.hashtag,'%' )
ORDER BY social_post.tarih LIMIT 5
) t
INNER JOIN oyuncular ON oyuncular.id = t.sid
This almost seems like a scope issue: the select statement in the subquery doesn't recognize the table 'candidate':
SELECT
candidate.id AS id,
candidate.image AS image,
candidate.name AS name,
candidate.party AS party,
player.order AS player_order,
c_pcts.pct AS pct
FROM `candidate`
INNER JOIN players player ON player.candidate_id = candidate.id
INNER JOIN lineups lineup ON player.lineup_id = lineup.id
INNER JOIN (
SELECT
pct
FROM candidate_pcts p
INNER JOIN weekly_game game ON p.weekly_game_id = (
SELECT id FROM weekly_game ORDER BY date DESC LIMIT 1
) WHERE p.candidate_id = candidate.id
) c_pcts
WHERE lineup.id = '31'
ORDER BY player.order ASC
gives the error: "Unknown column 'candidate.id' in 'where clause'." If instead of "FROM candidate_pcts p" I put
FROM candidate_pcts p, candidate c
then it doesn't see 'p.weekly_game_id' ...huh?
Seems like I need to identify the 'candidate' table for the subquery somehow but everything I'm trying leads me only further astray. And I have tried a mess of things: order of the tables, explicitly identifying them everywhere i could think of, backticks. I should note that the nested subquery works like a charm. Here it is again:
SELECT
pct
FROM `candidate_pcts`
INNER JOIN weekly_game game ON candidate_pcts.weekly_game_id = (
SELECT id FROM weekly_game ORDER BY date DESC LIMIT 1
) WHERE candidate_pcts.candidate_id = '5'
with a hardcoded id value there, of course. I can supply the database structure if needed here, but this is long already. The 'weekly_game' table is simply a set of scores for each candidate each week and we only want the most recent week's score, thus the 'ORDER BY date DESC LIMIT 1' clause.
Thanks very much for your time.
Tables:
table candidate: {id, image, name, party}
table candidate_pcts: {id, candidate_id, pct, weekly_game_id}
table lineups: {id, date, user_id}
table players: {id,candidate_id,lineup_id,order}
table weekly_game: {id,date}
You are basically on the right track about the problem. In essence, the nested sub-select does not know about candidate.id. If you break apart the query and just look at the sub-select in question:
SELECT
pct
FROM candidate_pcts p
INNER JOIN weekly_game game ON p.weekly_game_id = (
SELECT id FROM weekly_game ORDER BY date DESC LIMIT 1
) WHERE p.candidate_id = candidate.id
You can see there is NO reference whatsoever in that query to the candidate table other than in your where clause, thus this is an unknown column.
Since a subselect in the FROM clause (a derived table) is, in essence, evaluated before the outer select that references it, it must be a standalone, executable query.
Thanks to all, especially Mike for that excellent explanation. What I did was restructure the query like so:
SELECT
candidate.id AS id,
candidate.image AS image,
candidate.name AS name,
candidate.party AS party,
player.order AS player_order,
pcts.pct AS pct
FROM `candidate`
INNER JOIN players player ON player.candidate_id = candidate.id
INNER JOIN lineups lineup ON player.lineup_id = lineup.id
LEFT JOIN (
SELECT
p.candidate_id AS pct_id, pct AS pct
FROM candidate_pcts p
INNER JOIN weekly_game game ON p.weekly_game_id = (
SELECT id FROM weekly_game ORDER BY date DESC LIMIT 1
)
) pcts
ON pct_id = candidate.id
WHERE lineup.id = '$lineup_id'
ORDER BY player.order ASC
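For anyone who wants to try the pattern in isolation, here is a cut-down sketch of the same idea: the derived table stands alone and exposes candidate_id, and the link to candidate is made in the outer ON clause. It runs through Python's sqlite3 with invented sample rows, and it leaves out players and lineups for brevity. It also filters on the latest weekly_game id in a WHERE clause instead of joining weekly_game, which is my assumption about what was intended, since joining weekly_game on a condition that never mentions it would repeat each pct row once per week.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE candidate (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE weekly_game (id INTEGER PRIMARY KEY, date TEXT);
CREATE TABLE candidate_pcts (
    id INTEGER PRIMARY KEY,
    candidate_id INTEGER,
    pct REAL,
    weekly_game_id INTEGER
);
-- Hypothetical sample rows
INSERT INTO candidate VALUES (5, 'Smith'), (6, 'Jones');
INSERT INTO weekly_game VALUES (1, '2012-01-01'), (2, '2012-01-08');
INSERT INTO candidate_pcts VALUES (1, 5, 10.0, 1), (2, 5, 12.5, 2), (3, 6, 40.0, 1);
""")

# The derived table is a standalone query; the outer ON clause ties it to candidate.
rows = conn.execute("""
SELECT c.id, c.name, pcts.pct
FROM candidate AS c
LEFT JOIN (
    SELECT p.candidate_id, p.pct
    FROM candidate_pcts AS p
    WHERE p.weekly_game_id = (SELECT id FROM weekly_game ORDER BY date DESC LIMIT 1)
) AS pcts ON pcts.candidate_id = c.id
ORDER BY c.id
""").fetchall()

for row in rows:
    print(row)
```

Candidate 6 has no score for the most recent week, so the LEFT JOIN keeps the row with a NULL pct instead of dropping it.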
I'm trying to get my query to group rows by month and year from the assignments table, and count the number of rows that have a certain value in the leads table. The two are linked: the assignments table has an id_lead field, which is the id of the row in the leads table.
d_new would be a count of the assignments for leads for the month whose website is newsite.com
d_subprime would be a count of the assignments for leads for the month whose website is not newsite.com
here are the tables being used:
`leads`
id (int)
website (varchar)
`assignments`
id_lead (int)
date_assigned (int)
Here's my query, which is not working:
SELECT
MONTHNAME(FROM_UNIXTIME(a.date_assigned)) as d_month,
YEAR(FROM_UNIXTIME(a.date_assigned)) as d_year,
(select COUNT(*) from leads where website='newsite.com' ) as d_new,
(select COUNT(*) from leads where website!='newsite.com') as d_subprime
FROM assignments as a
left join leads as l on (l.id = a.id_lead)
where id_dealership='$id_dealership2'
GROUP BY
d_month,
d_year
ORDER BY
d_year asc,
MONTH(FROM_UNIXTIME(a.date_assigned)) asc
$id_dealership is a variable containing the id of the dealership I'm trying to view the count for.
Any help would be greatly appreciated.
You can sort of truncate your timestamps to months and use the obtained values for grouping, then derive the necessary date parts from them:
SELECT
YEAR(d_yearmonth) AS d_year,
MONTHNAME(d_yearmonth) AS d_month,
…
FROM (
SELECT
LAST_DAY(FROM_UNIXTIME(a.date_assigned)) as d_yearmonth,
…
FROM assignments AS a
LEFT JOIN leads AS l ON (l.id = a.id_lead)
WHERE id_dealership = '$id_dealership2'
GROUP BY
d_yearmonth
) AS s
ORDER BY
d_year ASC,
MONTH(d_yearmonth) ASC
Well, LAST_DAY() doesn't really truncate a timestamp, but it does turn all the values belonging to the same month into the same value, which is basically what we need.
And I guess the counts should be related to the rows you are actually selecting, which is not what your subqueries are. Something like this might do:
…
COUNT(l.website = 'newsite.com' OR NULL) AS d_new,
/* or: COUNT(l.website) - COUNT(NULLIF(l.website, 'newsite.com')) AS d_new */
COUNT(NULLIF(l.website, 'newsite.com')) AS d_subprime
…
Here's the entire query with all the modifications mentioned:
SELECT
YEAR(d_yearmonth) AS d_year,
MONTHNAME(d_yearmonth) AS d_month,
d_new,
d_subprime
FROM (
SELECT
LAST_DAY(FROM_UNIXTIME(a.date_assigned)) as d_yearmonth,
COUNT(l.website = 'newsite.com' OR NULL) AS d_new,
COUNT(NULLIF(l.website, 'newsite.com')) AS d_subprime
FROM assignments AS a
LEFT JOIN leads AS l ON (l.id = a.id_lead)
WHERE id_dealership = '$id_dealership2'
GROUP BY
d_yearmonth
) AS s
ORDER BY
d_year ASC,
MONTH(d_yearmonth) ASC
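To see how the two conditional counts behave, here is a small sketch run through Python's sqlite3 with invented rows. SQLite has no LAST_DAY(), so strftime('%Y-%m', ..., 'unixepoch') stands in as the month bucket; that substitution is mine, not part of the query above. The id_dealership filter is left out because the question does not say which table holds it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leads (id INTEGER PRIMARY KEY, website TEXT);
CREATE TABLE assignments (id_lead INTEGER, date_assigned INTEGER);
-- Hypothetical sample rows
INSERT INTO leads VALUES (1, 'newsite.com'), (2, 'other.example'), (3, 'newsite.com');
-- 1325376000 = 2012-01-01 00:00:00 UTC, 1328054400 = 2012-02-01 00:00:00 UTC
-- id_lead 99 has no matching lead, so the LEFT JOIN yields NULL for its website
INSERT INTO assignments VALUES (1, 1325376000), (2, 1325376000), (3, 1328054400), (99, 1328054400);
""")

# COUNT() skips NULLs: "cond OR NULL" is NULL unless cond is true,
# and NULLIF(x, y) is NULL when x equals y.
rows = conn.execute("""
SELECT strftime('%Y-%m', a.date_assigned, 'unixepoch') AS d_yearmonth,
       COUNT(l.website = 'newsite.com' OR NULL) AS d_new,
       COUNT(NULLIF(l.website, 'newsite.com')) AS d_subprime
FROM assignments AS a
LEFT JOIN leads AS l ON l.id = a.id_lead
GROUP BY d_yearmonth
ORDER BY d_yearmonth
""").fetchall()

for row in rows:
    print(row)
```

The assignment whose lead is missing counts toward neither column, because both expressions evaluate to NULL for it.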
This should do the trick:
SELECT
YEAR(FROM_UNIXTIME(a.date_assigned)) as d_year,
MONTHNAME(FROM_UNIXTIME(a.date_assigned)) as d_month,
l.website,
COUNT(*)
FROM
assignments AS a
INNER JOIN leads AS l on (l.id = a.id_lead) /*are you sure, that you need a LEFT JOIN?*/
WHERE id_dealership='$id_dealership2'
GROUP BY
d_year, d_month, website
/*before MySQL 8.0, GROUP BY also sorted the result implicitly; from 8.0 on, add an explicit ORDER BY if the order matters*/
If you really need a LEFT JOIN, be aware that COUNT() ignores NULL values. If you want to count those as well (which I can't imagine making sense), write it like this:
SELECT
YEAR(FROM_UNIXTIME(a.date_assigned)) as d_year,
MONTHNAME(FROM_UNIXTIME(a.date_assigned)) as d_month,
l.website,
COUNT(COALESCE(l.id, 1))
FROM
assignments AS a
LEFT JOIN leads AS l on (l.id = a.id_lead)
WHERE id_dealership='$id_dealership2'
GROUP BY
d_year, d_month, website
Start with
SELECT
MONTHNAME(FROM_UNIXTIME(a.date_assigned)) as d_month,
YEAR(FROM_UNIXTIME(a.date_assigned)) as d_year,
SUM(IF(l.website='newsite.com',1,0)) AS d_new,
SUM(IF(l.website IS NOT NULL AND l.website!='newsite.com',1,0)) AS d_subprime
FROM assignments AS a
LEFT JOIN leads AS l ON l.id = a.id_lead
WHERE id_dealership='$id_dealership2'
GROUP BY
d_month,
d_year
ORDER BY
d_year asc,
MONTH(FROM_UNIXTIME(a.date_assigned)) asc
and work from here: The field id_dealership is neither in leads nor in assignments, so you need more work.
If you edit your question to account for id_dealership we might be able to help you further.
I have the following query, but as users put more and more items into the "ci_falsepositives" table, it gets really slow.
The ci_falsepositives table contains a reference field from ci_address_book and another reference field from ci_matched_sanctions.
How can I write a new query while still being able to sort on each field?
For example, I still need to sort on "hits" or "matches".
SELECT *, matches - falsepositives AS hits
FROM (SELECT c.*, IFNULL(p.total, 0) AS matches,
(SELECT COUNT(*)
FROM ci_falsepositives n
WHERE n.addressbook_id = c.reference
AND n.sanction_key IN
(SELECT sanction_key FROM ci_matched_sanctions)
) AS falsepositives
FROM ci_address_book c
LEFT JOIN
(SELECT addressbook_id, COUNT(match_id) AS total
FROM ci_matched_sanctions
GROUP BY addressbook_id) AS p
ON c.id = p.addressbook_id
) S
ORDER BY folder asc, wholename ASC
LIMIT 0,15
The problem has to be the SELECT COUNT(*) FROM ci_falsepositives sub-query. That sub-query can be written using an inner join between ci_falsepositives and ci_matched_sanctions, but the optimizer might do that for you anyway. What I think you need to do, though, is make that sub-query into a separate query in the FROM clause of the 'next query out' (that is, SELECT c.*, ...). Probably, that query is being evaluated multiple times - and that's what's hurting you when people add records to ci_falsepositives. You should study the query plan carefully.
Maybe this query will be better:
SELECT *, matches - falsepositives AS hits
FROM (SELECT c.*, IFNULL(p.total, 0) AS matches,
IFNULL(f.falsepositives, 0) AS falsepositives
FROM ci_address_book AS c
LEFT JOIN (SELECT n.addressbook_id, COUNT(*) AS falsepositives
FROM ci_falsepositives AS n
JOIN ci_matched_sanctions AS m
ON n.sanction_key = m.sanction_key
GROUP BY n.addressbook_id
) AS f
ON c.reference = f.addressbook_id
LEFT JOIN
(SELECT addressbook_id, COUNT(match_id) AS total
FROM ci_matched_sanctions
GROUP BY addressbook_id) AS p
ON c.id = p.addressbook_id
) AS s
ORDER BY folder asc, wholename ASC
LIMIT 0, 15
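As a rough check of the pre-aggregation idea, the sketch below runs a variant of that query through Python's sqlite3 with invented rows. Two things differ from the query above, and both are my assumptions rather than something the thread confirms. The false-positive counts are LEFT JOINed with IFNULL, so address book entries without false positives keep a count of 0. The IN (...) test from the question is kept instead of a join to ci_matched_sanctions, so a sanction_key that appears several times there is not counted more than once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ci_address_book (id INTEGER PRIMARY KEY, reference INTEGER, folder TEXT, wholename TEXT);
CREATE TABLE ci_matched_sanctions (match_id INTEGER PRIMARY KEY, addressbook_id INTEGER, sanction_key TEXT);
CREATE TABLE ci_falsepositives (addressbook_id INTEGER, sanction_key TEXT);
-- Hypothetical sample rows
INSERT INTO ci_address_book VALUES (1, 101, 'a', 'Alice'), (2, 102, 'a', 'Bob');
INSERT INTO ci_matched_sanctions VALUES (1, 1, 'K1'), (2, 1, 'K2'), (3, 2, 'K1');
INSERT INTO ci_falsepositives VALUES (101, 'K1'), (101, 'K9');
""")

# Each derived table is aggregated once, then joined; nothing is re-evaluated per address book row.
rows = conn.execute("""
SELECT s.wholename, s.matches, s.falsepositives,
       s.matches - s.falsepositives AS hits
FROM (
    SELECT c.*,
           IFNULL(p.total, 0) AS matches,
           IFNULL(f.falsepositives, 0) AS falsepositives
    FROM ci_address_book AS c
    LEFT JOIN (
        SELECT n.addressbook_id, COUNT(*) AS falsepositives
        FROM ci_falsepositives AS n
        WHERE n.sanction_key IN (SELECT sanction_key FROM ci_matched_sanctions)
        GROUP BY n.addressbook_id
    ) AS f ON f.addressbook_id = c.reference
    LEFT JOIN (
        SELECT addressbook_id, COUNT(match_id) AS total
        FROM ci_matched_sanctions
        GROUP BY addressbook_id
    ) AS p ON p.addressbook_id = c.id
) AS s
ORDER BY s.folder ASC, s.wholename ASC
LIMIT 15
""").fetchall()

for row in rows:
    print(row)
```

Sorting on hits or matches only needs a different ORDER BY on the outer query. Whether this is actually faster on the real tables depends on the indexes, so it is worth comparing the EXPLAIN output of both versions.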