How to speed up the SQL query? - mysql

I have an SQL query as follows:
SELECT p.Id1, p.Id2, p.Id3
FROM dataset1 p
WHERE p.Id2 IN (
    SELECT r.Id4
    FROM dataset1 r
    WHERE r.Id5 = 125 AND r.Id6 >= 100000000000000 AND r.Id6 < 1000000000000000
)
ORDER BY p.Id1 DESC, p.Id2 DESC
However, there appears to be a huge amount of data with Id6 in this range, so the query takes quite a long time to compute, and I only have one hour in which to run it. I am therefore wondering if someone could help me improve the performance of this query.
Thanks.

Since the filtering seems to be done on r, arrange for it to be looked at first:
SELECT p.Id1, p.Id2, p.Id3
FROM ( SELECT id4
       FROM dataset1 AS r
       WHERE r.id5 = 125
         AND r.Id6 >= 100000000000000
         AND r.Id6 < 1000000000000000 ) AS x
JOIN dataset1 AS p ON p.id2 = x.id4
ORDER BY p.Id1 DESC, p.Id2 DESC;
For that, these indexes should be beneficial:
INDEX(id5, id6, id4) -- covering
INDEX(id2, id1, id3) -- covering
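If it helps, here is a hedged sketch of how those two indexes could be created; the table and column names are taken from the query above, and the index names are invented:
ALTER TABLE dataset1
    ADD INDEX idx_id5_id6_id4 (Id5, Id6, Id4),   -- covering index for the derived-table filter
    ADD INDEX idx_id2_id1_id3 (Id2, Id1, Id3);   -- covering index for the outer lookup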
You have a "range" test on Id6; double-check the two bounds, because if they were ever identical the range would be empty and the query would match nothing. Also, please don't simplify a query too much when posting it; we might give you advice that does not apply. I am assuming that the range really is a range.

IN tends to optimize poorly when the subquery returns a lot of data. You can try using EXISTS instead:
SELECT p.Id1, p.Id2, p.Id3
FROM dataset1 p
WHERE EXISTS (
    SELECT 1
    FROM dataset1 r
    WHERE r.Id4 = p.Id2
      AND r.Id5 = 125
      AND r.Id6 >= 100000000000000
      AND r.Id6 < 1000000000000000
)
ORDER BY p.Id1 DESC, p.Id2 DESC
Then, consider a multi-column index on (Id4, Id5, Id6) to speed up the subquery. The idea is to put the most selective equality columns first and the range test on Id6 last, but you might want to try swapping the first two columns to see which combination performs better.
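A hedged sketch of that experiment (the index names are invented; EXPLAIN's key and rows columns show which index gets chosen and how many rows it expects to examine):
ALTER TABLE dataset1 ADD INDEX idx_id4_id5_id6 (Id4, Id5, Id6);
ALTER TABLE dataset1 ADD INDEX idx_id5_id4_id6 (Id5, Id4, Id6);

EXPLAIN
SELECT p.Id1, p.Id2, p.Id3
FROM dataset1 p
WHERE EXISTS (
    SELECT 1
    FROM dataset1 r
    WHERE r.Id4 = p.Id2
      AND r.Id5 = 125
      AND r.Id6 >= 100000000000000
      AND r.Id6 < 1000000000000000
)
ORDER BY p.Id1 DESC, p.Id2 DESC;
-- Keep whichever index the optimizer actually uses and drop the other.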
Side note: double-check the lower and upper bounds for Id6; if they were ever the same value, the query would always return no rows.

To improve performance, don't use a subquery. You can get your desired result by using an inner join instead:
SELECT p.Id1, p.Id2, p.Id3
FROM dataset1 p
INNER JOIN dataset1 r
        ON p.Id2 = r.Id4
       AND r.Id5 = 125
       AND r.Id6 >= 100000000000000
       AND r.Id6 < 1000000000000000
ORDER BY p.Id1 DESC, p.Id2 DESC
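One hedged caveat: if several r rows can match the same p.Id2, this join returns duplicate p rows that the original IN() query would not. Adding DISTINCT restores the original semantics:
SELECT DISTINCT p.Id1, p.Id2, p.Id3
FROM dataset1 p
INNER JOIN dataset1 r
        ON p.Id2 = r.Id4
       AND r.Id5 = 125
       AND r.Id6 >= 100000000000000
       AND r.Id6 < 1000000000000000
ORDER BY p.Id1 DESC, p.Id2 DESC;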

Related

SQL query needs optimization

SELECT LM.user_id,LM.users_lineup_id, min( LM.total_score ) AS total_score
FROM vi_lineup_master LM JOIN
vi_contest AS C
ON C.contest_unique_id = LM.contest_unique_id join
(SELECT min( total_score ) as total_score
FROM vi_lineup_master
GROUP BY group_unique_id
) as preq
ON LM.total_score = preq.total_score
WHERE LM.contest_unique_id = 'iledhSBDO' AND
C.league_contest_type = 1
GROUP BY group_unique_id
The above query finds the loser per group of a game. The query returns accurate results, but it stops responding with large amounts of data. How can I optimize it?
You can try moving your JOINs into subqueries. Also, pay attention to the "wrong" GROUP BY usage in your outer query: in MySQL you can group by some columns and select others that are neither listed in the GROUP BY clause nor wrapped in an aggregation function, but then the database cannot guarantee which values it will return for them. For the sake of consistency in your application, wrap them in an aggregation function.
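A small illustration of that point, assuming nothing beyond the column names already shown: with ONLY_FULL_GROUP_BY in effect (the default sql_mode since MySQL 5.7) the loose form is rejected outright, while the aggregated form is always well defined.
-- Rejected under ONLY_FULL_GROUP_BY: user_id is neither grouped nor aggregated.
-- SELECT user_id, MIN(total_score)
-- FROM vi_lineup_master
-- GROUP BY group_unique_id;          -- ERROR 1055, nonaggregated column

-- Well defined: every selected column is wrapped in an aggregate.
SELECT MIN(user_id) AS user_id, MIN(total_score) AS total_score
FROM vi_lineup_master
GROUP BY group_unique_id;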
Check if this one helps:
SELECT
MIN(LM.user_id) AS user_id,
MIN(LM.users_lineup_id) AS users_lineup_id,
MIN(LM.total_score) AS total_score
FROM vi_lineup_master LM
WHERE 1=1
-- check if this "contest_unique_id" is equals
-- to 'iledhSBDO' for a "league_contest_type" valued 1
AND LM.contest_unique_id IN
(
SELECT C.contest_unique_id
FROM vi_contest AS C
WHERE 1=1
AND C.contest_unique_id = 'iledhSBDO'
AND C.league_contest_type = 1
)
-- check if this "total_score" is one of the
-- "min(total_score)" from each "group_unique_id"
AND LM.total_score IN
(
SELECT MIN(total_score)
FROM vi_lineup_master
GROUP BY group_unique_id
)
GROUP BY LM.group_unique_id
;
Also, some pieces of this query may seem redundant, but it's because I did not want to change the filters you wrote, just moved them.
Also, your query logic seems a bit strange to me, based on the table/column names and how you wrote it... please check the comments in my query, which reflect what I understood of your implementation.
Hope it helps.

How to fix SQL query with Left Join and subquery?

I have an SQL query with a LEFT JOIN:
SELECT COUNT(stn.stocksId) AS count_stocks
FROM MedicalFacilities AS a
LEFT JOIN stocks stn ON
(stn.stocksIdMF = ( SELECT b.MedicalFacilitiesIdUser
FROM medicalfacilities AS b
WHERE b.MedicalFacilitiesIdUser = a.MedicalFacilitiesIdUser
ORDER BY stn.stocksId DESC LIMIT 1)
AND stn.stocksEndDate >= UNIX_TIMESTAMP() AND stn.stocksStartDate <= UNIX_TIMESTAMP())
With this query I want to select one row from the stocks table that matches the conditions and whose field equals the value of a.MedicalFacilitiesIdUser.
I always get count_stocks = 0 in the result, but I need to get 1.
The count(...) aggregate doesn't count null, so its argument matters:
COUNT(stn.stocksId)
Since stn is your right hand table, this will not count anything if the left join misses. You could use:
COUNT(*)
which counts every row, even if all its columns are null. Or a column from the left hand table (a) that is never null:
COUNT(a.ID)
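A hedged sketch of the difference (the join condition is a guess based on the column names, purely to make the example concrete):
SELECT COUNT(stn.stocksId) AS matched_stocks,  -- NULLs from a missed join are not counted
       COUNT(*)            AS all_rows,        -- counts every row the join produces
       COUNT(a.ID)         AS also_all_rows    -- a column from the left table, never NULL
FROM MedicalFacilities AS a
LEFT JOIN stocks stn ON stn.stocksIdMF = a.MedicalFacilitiesIdUser;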
Your subquery in the ON clause looks very strange to me:
on stn.stocksIdMF = ( SELECT b.MedicalFacilitiesIdUser
FROM medicalfacilities AS b
WHERE b.MedicalFacilitiesIdUser = a.MedicalFacilitiesIdUser
ORDER BY stn.stocksId DESC LIMIT 1)
This is comparing MedicalFacilitiesIdUser to stocksIdMF. Admittedly, you have not provided sample data or table layouts, but the naming of the columns suggests that these are not the same thing. Perhaps you intend:
on stn.stocksIdMF = ( SELECT b.stocksId
-----------------------------^
FROM medicalfacilities AS b
WHERE b.MedicalFacilitiesIdUser = a.MedicalFacilitiesIdUser
ORDER BY b.stocksId DESC
LIMIT 1)
Also, ordering by stn.stocksid wouldn't do anything useful, because that would be coming from outside the subquery.
Your subquery seems redundant, and the main query is hard to read because much of the join logic could be placed in the WHERE clause. Additionally, the original query might have a performance issue.
Recall that WHERE is an implicit join and JOIN is an explicit join. Query optimizers make no distinction between the two when they use the same expressions, but readability and maintainability are another thing to consider.
Consider the revised version (notice I added a GROUP BY):
SELECT COUNT(stn.stocksId) AS count_stocks
FROM MedicalFacilities AS a
LEFT JOIN stocks stn ON stn.stocksIdMF = a.MedicalFacilitiesIdUser
WHERE stn.stocksEndDate >= UNIX_TIMESTAMP()
AND stn.stocksStartDate <= UNIX_TIMESTAMP()
GROUP BY stn.stocksId
ORDER BY stn.stocksId DESC
LIMIT 1

query with LEFT JOIN and ORDER BY...LIMIT slow, uses Filesort

I have the following query:
SELECT
fruit.date,
fruit.name,
fruit.reason,
fruit.id,
fruit.notes,
food.name
FROM
fruit
LEFT JOIN
food_fruits AS ff ON fruit.fruit_id = ff.fruit_id AND ff.type='fruit'
LEFT JOIN
food USING (food_id)
LEFT JOIN
fruits_sour AS fs ON fruit.id = fs.fruit_id
WHERE
(fruit.date < DATE_SUB(NOW(), INTERVAL 180 DAY))
AND (fruit.`status` = 'Rotten')
AND (fruit.location = 'USA')
AND (fruit.size = 'medium')
AND (fs.fruit_id IS NULL)
ORDER BY food.name ASC
LIMIT 15 OFFSET 0
And all the indexes you could ever want, including the following which are being used:
fruit - fruit_filter (size, status, location, date)
food_fruits - food_type (type)
food - food (id)
fruits_sour - fruit_id (fruit_id)
I even have indexes which I thought would work better which are not being used:
food_fruits - fruit_key (fruit_id, type)
food - id_name (food_id, name)
The ORDER BY clause is causing a temporary table and filesort to be used, unfortunately. Without that, the query runs lickety-split. How can I get this query to not need to filesort? What am I missing?
EDIT: The EXPLAIN output is not reproduced here.
The reason for this is your ORDER BY clause, which is on a field that is not part of the index used for this query. The engine can run the query using the fruit_filter index, but it then has to sort on a different field, and that's when filesort comes into play (which basically means "sort without using an index", thanks to the reminder in the comments).
I don't know what times you are getting as a result, but if the difference is large, I would create a temporary table with the intermediate results and sort it afterwards.
(By the way, I am not sure why you use LEFT JOIN instead of INNER JOIN, or why you use food_fruits - answered in comments.)
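A hedged sketch of that temporary-table idea, using only the tables and filters from the question (fruit_id is carried along so the later joins still work):
CREATE TEMPORARY TABLE tmp_fruit AS
SELECT fruit.date, fruit.name, fruit.reason, fruit.id, fruit.notes, fruit.fruit_id
FROM fruit
LEFT JOIN fruits_sour AS fs ON fruit.id = fs.fruit_id
WHERE fruit.date < DATE_SUB(NOW(), INTERVAL 180 DAY)
  AND fruit.`status` = 'Rotten'
  AND fruit.location = 'USA'
  AND fruit.size = 'medium'
  AND fs.fruit_id IS NULL;

-- Sort only the (much smaller) intermediate result.
SELECT t.date, t.name, t.reason, t.id, t.notes, food.name
FROM tmp_fruit AS t
LEFT JOIN food_fruits AS ff ON t.fruit_id = ff.fruit_id AND ff.type = 'fruit'
LEFT JOIN food USING (food_id)
ORDER BY food.name ASC
LIMIT 15 OFFSET 0;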
UPDATE.
Try a subquery approach, maybe (untested), which splits the sorting from the pre-filtering:
SELECT
fr.date,
fr.name,
fr.reason,
fr.id,
fr.notes,
food.name
FROM
(
SELECT
fruit.date,
fruit.name,
fruit.reason,
fruit.id,
fruit.notes,
fruit.fruit_id
FROM
fruit
LEFT JOIN
fruits_sour AS fs ON fruit.id = fs.fruit_id
WHERE
(fruit.date < DATE_SUB(NOW(), INTERVAL 180 DAY))
AND (fruit.`status` = 'Rotten')
AND (fruit.location = 'USA')
AND (fruit.size = 'medium')
AND (fs.fruit_id IS NULL)
) as fr
LEFT JOIN
food_fruits AS ff ON fr.fruit_id = ff.fruit_id AND ff.type='fruit'
LEFT JOIN
food USING (food_id)
ORDER BY food.name ASC
LIMIT 15 OFFSET 0
Your ORDER BY ... LIMIT clauses require some sorting, you know. The trick to optimizing performance is to ORDER BY ... LIMIT the minimal set of columns, and then build your full result set based on the chosen fifteen rows. So let's try for a minimal set of columns in a subquery.
SELECT fruit.id,
food.name
FROM fruit
LEFT JOIN food_fruits AS ff ON fruit.fruit_id = ff.fruit_id
AND ff.type='fruit'
LEFT JOIN food USING (food_id)
LEFT JOIN fruits_sour AS fs ON fruit.id = fs.fruit_id
WHERE fruit.date < DATE_SUB(NOW(), INTERVAL 180 DAY)
AND fruit.`status` = 'Rotten'
AND fruit.location = 'USA'
AND fruit.size = 'medium'
AND fs.fruit_id IS NULL
ORDER BY food.name ASC
LIMIT 15 OFFSET 0
This query gives you the fifteen top ids and their names.
I would add id to the end of your existing fruit_filter index to give (size, status, location, date, id). That will make it into a compound covering index, and allow your filtering query to be satisfied entirely from the index.
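A hedged sketch of that change, assuming the index really is named fruit_filter and its columns are as listed above:
ALTER TABLE fruit
    DROP INDEX fruit_filter,
    ADD INDEX fruit_filter (size, status, location, date, id);   -- now covers the filtering subquery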
Other than that, it's going to be hard to optimize this using more or different indexes because so much of the query is driven by other factors, like the LEFT JOIN ... IS NULL join-fail criterion you have applied.
Then you can join this subquery to your fruits table to pull the full result set.
That will look like this when it's all done.
SELECT fruit.date,
fruit.name,
fruit.reason,
fruit.id,
fruit.notes,
list.name
FROM fruit
JOIN (
SELECT fruit.id,
food.name
FROM fruit
LEFT JOIN food_fruits AS ff ON fruit.fruit_id = ff.fruit_id
AND ff.type='fruit'
LEFT JOIN food USING (food_id)
LEFT JOIN fruits_sour AS fs ON fruit.id = fs.fruit_id
WHERE fruit.date < DATE_SUB(NOW(), INTERVAL 180 DAY)
AND fruit.`status` = 'Rotten'
AND fruit.location = 'USA'
AND fruit.size = 'medium'
AND fs.fruit_id IS NULL
ORDER BY food.name ASC
LIMIT 15 OFFSET 0
) AS list ON fruit.id = list.id
ORDER BY list.name
Do you see how this goes? In the subquery you sling around just enough data to identify which tiny subset of rows you want to retrieve. Then, you join that subquery to your main table to pull out all your data. Limiting the row length in the stuff you have to sort helps performance because MySQL can sort it in its sort buffer, rather than having to do a more elaborate and slower sort/merge operation. (But you can't tell from EXPLAIN whether it will do this or not.)
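Since EXPLAIN won't tell you which sort path was taken, one hedged way to check is to compare the session sort counters before and after running the query:
FLUSH STATUS;                        -- reset the session status counters
-- run the full query from above here, then:
SHOW SESSION STATUS LIKE 'Sort%';    -- Sort_merge_passes > 0 means the sort spilled
                                     -- past the sort buffer into slower merge passes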

Subquery's WHERE needs fields from main query

I have been struggling with this for the whole day and would like to ask for help making this query work. My problem is that in the INNER JOIN subquery's WHERE clauses I need to use matching values from each GC row being processed, and the subquery obviously knows nothing about the main query, which is why it fails. Hopefully you'll catch the idea of what I am trying to accomplish here:
SET @now = 100; # unix datetime
SELECT a.id, b.maxdate
FROM GC AS a
INNER JOIN (
SELECT 0 id_group, MAX(dt_active_to) AS maxdate
FROM GCDeals
WHERE dt_active_from > a.dt_lastused AND dt_active_from <= @now
GROUP BY id_group
UNION ALL
SELECT 1 id_group, MAX(dt_active_to) AS maxdate
FROM GCDeals
WHERE id_group <> 2 AND dt_active_from > a.dt_lastused AND dt_active_from <= @now
UNION ALL
SELECT 2 id_group, MAX(dt_active_to) AS maxdate
FROM GCDeals
WHERE id_group <> 1 AND dt_active_from > a.dt_lastused AND dt_active_from <= @now
GROUP BY id_group
) AS b
ON a.id_group = b.id_group
LEFT JOIN GCMsg AS c
ON a.id = c.id_gc
WHERE a.is_active = 1 AND a.dt_lastused < @now AND c.id_gc IS NULL
ORDER BY a.id
Thank you
Okay, I hope I have understood your original SQL now. You want all GC with the last appropriate max date. What you consider appropriate depends both on gc.dt_lastused and on gc.id_group. So rather than joining the tables together, you should select the max date per record in a subquery:
select id,
(
    select max(dt_active_to)
    from gcdeals
    where dt_active_from > gc.dt_lastused and dt_active_from <= @now
    and
    (
        (gc.id_group = 0)
        or
        (gc.id_group = 1 and gcdeals.id_group <> 2)
        or
        (gc.id_group = 2 and gcdeals.id_group <> 1)
    )
) as maxdate
from gc
where is_active = 1 and dt_lastused < @now
and id not in (select id_gc from gcmsg)
order by id;
EDIT: Here is the same statement using a join, which makes it possible to select max(dt_active_from) and min(dt_active_to) in one pass:
select gc.id, max(gcd.dt_active_from), min(gcd.dt_active_to)
from gc
left outer join gcdeals gcd
    on gc.id = gcd.id_gc
    and gcd.dt_active_from > gc.dt_lastused and gcd.dt_active_from <= @now
    and
    (
        (gc.id_group = 0)
        or
        (gc.id_group = 1 and gcd.id_group <> 2)
        or
        (gc.id_group = 2 and gcd.id_group <> 1)
    )
where gc.is_active = 1 and gc.dt_lastused < @now
group by gc.id
order by gc.id;
You see, once you found out how to select the desired value in a subselect, it is not too hard to change it into a join. You get what you are looking for in two steps. If on the other hand you start with thinking in joins the same task can be quite abstract.
As to the execution plan: say GC has 1,000 active records and there are usually about 10 appropriate matches in GCDeals. Then the first statement selects 1,000 records and uses a loop on each record to access the GCDeals aggregate value. The second statement would join the 1,000 GC records with 10 GCDeals records each, thus getting 10,000 records, then aggregate them to make it 1,000 records again. Maybe the loops are faster, maybe the join; this depends. But say GC has one million active records and on each record you expect 1,000 GCDeals matches; then the first statement may be quite slow, having to loop so many times, while the second statement will create a billion intermediate records, which can cause memory problems and either lead to very slow execution too, or even to an insufficient-memory error. So it's good to know that both techniques are available.
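A hedged way to see which strategy the optimizer actually chooses on your data is to prefix either statement with EXPLAIN and compare the reported join types and row estimates, for example:
EXPLAIN
select gc.id, max(gcd.dt_active_from), min(gcd.dt_active_to)
from gc
left outer join gcdeals gcd
    on gc.id = gcd.id_gc
    and gcd.dt_active_from > gc.dt_lastused and gcd.dt_active_from <= @now
    and ((gc.id_group = 0)
      or (gc.id_group = 1 and gcd.id_group <> 2)
      or (gc.id_group = 2 and gcd.id_group <> 1))
where gc.is_active = 1 and gc.dt_lastused < @now
group by gc.id
order by gc.id;
-- The rows column approximates the per-loop and intermediate-row counts discussed above.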

How to limit subquery requests to one?

I was thinking of a way to use one query with a subquery instead of two separate queries.
But it turns out the subquery causes a separate request for each row in the result set. Is there a way to make that count subquery run only once within a combined query?
SELECT `ad_general`.`id`,
( SELECT count(`ad_general`.`id`) AS count
FROM (`ad_general`)
WHERE `city` = 708 ) AS count
FROM (`ad_general`)
WHERE `ad_general`.`city` = '708'
ORDER BY `ad_general`.`id` DESC
LIMIT 15
Maybe using a join can solve the problem, but I don't know how.
SELECT ad_general.id, stats.cnt
FROM ad_general
JOIN (
SELECT count(*) as cnt
FROM ad_general
WHERE city = 708
) AS stats
WHERE ad_general.city = 708
ORDER BY ad_general.id DESC
LIMIT 15;
The explicit table names aren't required, but are used both for clarity and maintainability (they will prevent any ambiguities should the schema for ad_general or the generated table ever change).
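As a hedged sanity check that the count really runs only once, EXPLAIN should show the derived stats table with select_type DERIVED (materialized a single time) rather than a DEPENDENT SUBQUERY evaluated per row:
EXPLAIN
SELECT ad_general.id, stats.cnt
FROM ad_general
JOIN (
    SELECT COUNT(*) AS cnt
    FROM ad_general
    WHERE city = 708
) AS stats
WHERE ad_general.city = 708
ORDER BY ad_general.id DESC
LIMIT 15;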
You can self-join (join the table to itself) and apply the aggregate function to the second copy.
SELECT `adgen`.`id`, COUNT(`adgen_count`.`id`) AS `count`
FROM `ad_general` AS `adgen`
JOIN `ad_general` AS `adgen_count` ON `adgen_count`.city = 708
WHERE `adgen`.`city` = 708
GROUP BY `adgen`.`id`
ORDER BY `adgen`.`id` DESC
LIMIT 15
However, it's impossible to say what the appropriate grouping is without knowing the structure of the table.