MySQL: remove NOT IN subquery on table itself

I have the following query which displays a list of accounts with a certain margin level:
SELECT
crm_margincall.id,
crm_margincall.CreationTime,
ba.name AS crm_bankaccount_id,
crm_margincall.name,
crm_margincall.MarginCallLevel,
crm_margincall.UseOfEquityForMargin,
crm_margincall.MarginRequired,
crm_margincall.NetEquityForMargin,
crm_margincall.MarginDeficit,
crm_margincall.balance,
crm_margincall.deposited,
crm_margincall.prefunded,
crm_margincall.required
FROM
crm_margincall
LEFT JOIN
crm_bankaccount ba ON crm_margincall.crm_bankaccount_id = ba.id
WHERE
crm_margincall.name = 'MarginCall'
AND
crm_margincall.MarginCallLevel >= 100
AND
crm_margincall.crm_account_id NOT IN
(
SELECT
x.crm_account_id
FROM
crm_margincall x
WHERE
x.crm_account_id = crm_margincall.crm_account_id
AND
x.name = 'LevelDrop'
AND
x.MarginCallLevel < 100
AND
x.id > crm_margincall.id
)
ORDER BY
id
DESC
This query, on a table of ~22,500 records, takes more than 10 seconds to run. This is caused by the subquery in the NOT IN section (I tried NOT EXISTS; it isn't much faster). How can I join this table on itself to achieve the same effect?

This can be done in several ways, but a scan of 22500 records taking 10 seconds means either a hardware issue or a very inefficient JOIN.
The most likely cause of the latter is a missing index or a misconfigured index, and to investigate this, you need to issue an EXPLAIN:
EXPLAIN SELECT ...
Totally shooting in the dark, judging from the selected columns being used, I'd try with
CREATE INDEX test_index ON crm_margincall(name, crm_account_id, MarginCallLevel, id)
Other improvements might be possible, but you'd need to prepare a sample structure with some fake data in a SQLfiddle to really allow debugging.

Try something like this:
SELECT
crm_margincall.id,
crm_margincall.CreationTime,
ba.name AS crm_bankaccount_id,
crm_margincall.name,
crm_margincall.MarginCallLevel,
crm_margincall.UseOfEquityForMargin,
crm_margincall.MarginRequired,
crm_margincall.NetEquityForMargin,
crm_margincall.MarginDeficit,
crm_margincall.balance,
crm_margincall.deposited,
crm_margincall.prefunded,
crm_margincall.required
FROM
crm_margincall
LEFT JOIN
crm_bankaccount ba ON crm_margincall.crm_bankaccount_id = ba.id
LEFT JOIN
crm_margincall x ON x.crm_account_id = crm_margincall.crm_account_id
AND
x.name = 'LevelDrop'
AND
x.MarginCallLevel < 100
AND
x.id > crm_margincall.id
WHERE
crm_margincall.name = 'MarginCall'
AND
crm_margincall.MarginCallLevel >= 100
AND
x.id IS NULL -- anti-join: keep only rows with no later LevelDrop
ORDER BY
crm_margincall.id
DESC
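A self-join that reproduces NOT IN is usually written as an anti-join: LEFT JOIN the table to itself on the subquery's conditions and keep only the rows where no match was found. Here is a minimal sketch of that equivalence, using Python's sqlite3 module as a stand-in for MySQL. The six rows and the trimmed-down column list are made up for illustration, and the alias m is my own.

```python
import sqlite3

# Hypothetical miniature of crm_margincall: only the columns the filter touches.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE crm_margincall (
    id INTEGER PRIMARY KEY,
    crm_account_id INTEGER,
    name TEXT,
    MarginCallLevel INTEGER
);
INSERT INTO crm_margincall VALUES
    (1, 10, 'MarginCall', 120),  -- followed by a LevelDrop (id 2): excluded
    (2, 10, 'LevelDrop',   80),
    (3, 10, 'MarginCall', 130),  -- no later LevelDrop: kept
    (4, 20, 'LevelDrop',   90),
    (5, 20, 'MarginCall', 110),  -- its LevelDrop came earlier: kept
    (6, 30, 'MarginCall',  95);  -- level below 100: filtered out
""")

# The original formulation: correlated NOT IN subquery.
NOT_IN = """
SELECT m.id FROM crm_margincall m
WHERE m.name = 'MarginCall' AND m.MarginCallLevel >= 100
  AND m.crm_account_id NOT IN (
      SELECT x.crm_account_id FROM crm_margincall x
      WHERE x.crm_account_id = m.crm_account_id
        AND x.name = 'LevelDrop' AND x.MarginCallLevel < 100
        AND x.id > m.id)
ORDER BY m.id DESC
"""

# The anti-join: same conditions in ON, then keep the unmatched rows.
ANTI_JOIN = """
SELECT m.id FROM crm_margincall m
LEFT JOIN crm_margincall x
       ON x.crm_account_id = m.crm_account_id
      AND x.name = 'LevelDrop' AND x.MarginCallLevel < 100
      AND x.id > m.id
WHERE m.name = 'MarginCall' AND m.MarginCallLevel >= 100
  AND x.id IS NULL
ORDER BY m.id DESC
"""

not_in_ids = [row[0] for row in conn.execute(NOT_IN)]
anti_join_ids = [row[0] for row in conn.execute(ANTI_JOIN)]
print(not_in_ids, anti_join_ids)
```

Both lists come out the same on this sample. Whether the anti-join is actually faster on the real table still depends on the indexes, so it is worth checking with EXPLAIN.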

Related

MySQL - Optimize JOIN only grouping necessary rows

My query selects the maximum value of a second table (laboratory values) for a subset of cases. Obviously this method is extremely slow (>25 seconds), as I am using a subquery that, not knowing the parent's filter, groups all laboratory.CaseID values (15k in a 10-million-record table) before using just 10 of them. The problem is easy to see in an EXPLAIN.
SELECT cases.CaseID,t.val FROM cases
LEFT JOIN
(SELECT Max(laboratory.LaboratoryValue) as val , CaseID FROM laboratory
WHERE laboratory.LaboratoryID =682
GROUP by laboratory.CaseID) as t
ON t.CaseID = cases.CaseID
WHERE cases.Diagnosis = 16;
I tried to optimize the query using a TEMPORARY TABLE, which brings it down to 0.008 s, but this method seems a bit clumsy when applied to many fields.
CREATE TEMPORARY TABLE tmptest
SELECT cases.CaseID FROM cases WHERE cases.Diagnosis = 16;
SELECT cases.CaseID,t.val FROM cases
LEFT JOIN
(SELECT Max(laboratory.LaboratoryValue) as val , CaseID FROM laboratory
WHERE laboratory.LaboratoryID =682
AND laboratory.CaseID IN (SELECT CaseID FROM tmptest)
GROUP by laboratory.CaseID) as t
ON t.CaseID = cases.CaseID
WHERE cases.Diagnosis = 16;
Is there a simpler solution for query optimization?
As requested the EXPLAIN. It shows the problem, CaseID is (as expected) not used.
The second query does it right (NoDoubles contains both indexes);
These indexes are needed:
laboratory: INDEX(LaboratoryID, CaseID, LaboratoryValue)
cases: INDEX(Diagnosis, CaseID)
Two more formulations to try and time:
SELECT c.CaseID, L.val
FROM
( SELECT CaseID, Max(laboratory.LaboratoryValue) as val
FROM laboratory
WHERE laboratory.LaboratoryID = 682
GROUP by laboratory.CaseID
) AS L
JOIN cases AS c USING(CaseID)
WHERE c.Diagnosis = 16;
SELECT c.CaseID,
( SELECT Max(laboratory.LaboratoryValue)
FROM laboratory
WHERE laboratory.LaboratoryID = 682
AND CaseID = c.CaseID
) AS val
FROM cases AS c
WHERE c.Diagnosis = 16;
They are not identical because of "LEFT". So, verify that you get the "right" answers. The first uses JOIN; the second is equivalent to your LEFT JOIN.
For further discussion, please provide EXPLAIN for whatever formulations you want to discuss. And SHOW CREATE TABLE.
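To make the "not identical because of LEFT" point concrete, here is a small sketch that runs both formulations on made-up data, using Python's sqlite3 module as a stand-in for MySQL. The rows are invented so that case 2 has Diagnosis 16 but no LaboratoryID 682 values. The index names ix_lab and ix_cases are placeholders of mine.

```python
import sqlite3

# Made-up rows: case 2 has Diagnosis 16 but no LaboratoryID 682 values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cases (CaseID INTEGER PRIMARY KEY, Diagnosis INTEGER);
CREATE TABLE laboratory (CaseID INTEGER, LaboratoryID INTEGER, LaboratoryValue REAL);
CREATE INDEX ix_lab ON laboratory (LaboratoryID, CaseID, LaboratoryValue);
CREATE INDEX ix_cases ON cases (Diagnosis, CaseID);
INSERT INTO cases VALUES (1, 16), (2, 16), (3, 7);
INSERT INTO laboratory VALUES
    (1, 682, 4.5), (1, 682, 9.0), (1, 100, 50.0), (3, 682, 2.0);
""")

# First formulation: derived table + inner JOIN (drops cases without values).
JOIN_FORM = """
SELECT c.CaseID, L.val
FROM ( SELECT CaseID, MAX(LaboratoryValue) AS val
       FROM laboratory
       WHERE LaboratoryID = 682
       GROUP BY CaseID ) AS L
JOIN cases AS c USING (CaseID)
WHERE c.Diagnosis = 16
ORDER BY c.CaseID
"""

# Second formulation: correlated subquery (keeps every matching case, NULL if no value).
CORRELATED_FORM = """
SELECT c.CaseID,
       ( SELECT MAX(LaboratoryValue)
         FROM laboratory
         WHERE LaboratoryID = 682 AND CaseID = c.CaseID ) AS val
FROM cases AS c
WHERE c.Diagnosis = 16
ORDER BY c.CaseID
"""

join_rows = conn.execute(JOIN_FORM).fetchall()
correlated_rows = conn.execute(CORRELATED_FORM).fetchall()
print(join_rows)
print(correlated_rows)
```

The JOIN version silently drops case 2, while the correlated version keeps it with a NULL value, which is what the original LEFT JOIN would return.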

How to tune the performance of a query with LIMIT?

I run the following MySQL statement, but it takes over 10 seconds although it returns just 10 rows. BTW, if I remove the LIMIT 0, 10, it returns 1,000,000 rows. I have created Index1 on column SceneCode and Index2 on column ProviderId.
SELECT *
FROM
(SELECT * FROM gf_sceneprovider WHERE SceneCode='DL00000003' ) AS sprovider
LEFT JOIN (SELECT * FROM gf_sceneprovidertemplate WHERE SceneCode='DL00000003' ) AS stemplate
ON sprovider.ProviderId = stemplate.ProviderId
INNER JOIN gf_provider AS provider
ON provider.ProviderId = sprovider.ProviderId
LIMIT 0, 10
I would do away with the subqueries in favor of direct joins:
SELECT *
FROM gf_sceneprovider sprovider
LEFT JOIN gf_sceneprovidertemplate stemplate
ON sprovider.ProviderId = stemplate.ProviderId AND
stemplate.SceneCode = 'DL00000003'
INNER JOIN gf_provider AS provider
ON provider.ProviderId = sprovider.ProviderId
WHERE sprovider.SceneCode = 'DL00000003'
LIMIT 0, 10
Then, add indices on the join columns if possible. Your original subqueries might prevent the indices on the gf_sceneprovider and gf_sceneprovidertemplate tables from being used effectively, or at all. The reason is that each subquery essentially creates an on-the-fly table which, unlike the tables it selects from, has no indices. I think some RDBMS can cope with this in certain scenarios, but it looks like that is not the case here.
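One detail to watch when flattening such subqueries: a filter on the LEFT JOINed table preserves the outer-join behaviour only if it sits in the ON clause. In the WHERE clause it discards the NULL-extended rows, so the join behaves like an INNER JOIN. Here is a small sketch of the difference, using Python's sqlite3 module as a stand-in for MySQL. The two rows per table and the 'OTHER' scene code are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gf_sceneprovider (ProviderId INTEGER, SceneCode TEXT);
CREATE TABLE gf_sceneprovidertemplate (ProviderId INTEGER, SceneCode TEXT);
INSERT INTO gf_sceneprovider VALUES (1, 'DL00000003'), (2, 'DL00000003');
-- Provider 2 only has a template for a different (made-up) scene.
INSERT INTO gf_sceneprovidertemplate VALUES (1, 'DL00000003'), (2, 'OTHER');
""")

# Filter on the right-hand table inside ON: provider 2 survives with NULLs.
FILTER_IN_ON = """
SELECT p.ProviderId, t.SceneCode
FROM gf_sceneprovider p
LEFT JOIN gf_sceneprovidertemplate t
       ON p.ProviderId = t.ProviderId AND t.SceneCode = 'DL00000003'
WHERE p.SceneCode = 'DL00000003'
ORDER BY p.ProviderId
"""

# Same filter moved to WHERE: the NULL-extended row for provider 2 is removed.
FILTER_IN_WHERE = """
SELECT p.ProviderId, t.SceneCode
FROM gf_sceneprovider p
LEFT JOIN gf_sceneprovidertemplate t
       ON p.ProviderId = t.ProviderId
WHERE p.SceneCode = 'DL00000003' AND t.SceneCode = 'DL00000003'
ORDER BY p.ProviderId
"""

on_rows = conn.execute(FILTER_IN_ON).fetchall()
where_rows = conn.execute(FILTER_IN_WHERE).fetchall()
print(on_rows)
print(where_rows)
```

Only the ON variant matches what the original subquery-based LEFT JOIN returned.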

Conditional order by in MYSQL. Should affect part of row

The left table is a result of my query. And I need to sort it as the right table.
I need to order by p_id, if level >= 2. The blue box of right table is a target of order by.
Is it possible? Of course it is an example. Actual data is hundreds and really need to be sorted.
I searched a lot, but couldn't find the same case.
Edit: this table will be returned as a java.util.ArrayList. If this kind of 'order by' is not possible in MySQL, is it possible on a java.util.ArrayList?
I'm sure it's not possible in one query in MySQL.
In your diagram on the right, the ordering has been done in two separate steps:
Sort by id
Sort each block by p_id if level >= 2
That's quite difficult to do in MySQL as you would need to identify the blocks and then iterate over them, sorting each block separately.
I've done something similar where ordering within blocks was required and then selecting from those ordered blocks. You can view that here but as I said, I think that that SQL code is horribly complicated involving 5 temporary tables. You would probably need fewer temp tables, but it would still be a very complicated procedure, quite slow and hard to maintain.
"Actual data is hundreds and really need to be sorted."
Is there any reason why you can't just sort it as you want in code?
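$blockStart = null;
$count = 0;
foreach ($dataArray as $data) {
    if ($data['level'] >= 2) {
        if ($blockStart === null) { // A new block of level >= 2 rows begins
            $blockStart = $count;
        }
    } elseif ($blockStart !== null) { // Block has finished on the previous row
        sortBlock($dataArray, $blockStart, $count - 1);
        $blockStart = null;
    }
    $count++;
}
if ($blockStart !== null) { // The data ended inside a block
    sortBlock($dataArray, $blockStart, $count - 1);
}
function sortBlock(array &$dataArray, $indexStart, $indexEnd) {
    // Sort the elements of $dataArray between $indexStart and $indexEnd inclusive
    // by the value of p_id. The array is passed by reference so the caller sees the result.
    $block = array_slice($dataArray, $indexStart, $indexEnd - $indexStart + 1);
    usort($block, function ($a, $b) { return $a['p_id'] <=> $b['p_id']; });
    array_splice($dataArray, $indexStart, count($block), $block);
}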
Trying to solve a general programming problem in MySQL when you could solve it in 1/10th of the programmer time (and probably have it perform faster as well) in Java is not a good path to follow.
It is possible to do this in SQL, but it would be a very, very complicated query in MySQL. Here is the approach.
(1) Create a subquery that has the original ids and an indicator of whether something is in level 2 or not. The ids in this table are going to define the final order.
(2) Next, create a separate counter for each group in the table above. In other databases, you would use row_number(). In MySQL, this requires a correlated subquery. This provides the mapping from id to the new ordering.
(3) Next, create a counter for each group, but this time with the needed order (by id for the non-level2 group, by your rules for ordering).
(4) Join the tables together to get the matching.
(5) Order by the original id.
Here is an attempt:
select altord.*
from (select t.*,
(select count(*) from t t2 where t2.id <= t.id and ((t2.level = 2 and t.level = 2) or (t2.level <> 2 and t.level <> 2))
) as seqnum
from t
) ord join
(select t.*,
(select count(*) from t t2 where (t2.id <= t.id and t2.level <> 2 and t.level <> 2) or (t2.level = 2 and t.level = 2 and (t2.p_id < t.p_id or (t2.p_id = t.p_id and t2.id <= t.id)))
) as seqnum
from t
) altord
on ord.seqnum = altord.seqnum and (ord.level = 2) = (altord.level = 2)
order by ord.id
I'm not sure if this SQL is correct, but the idea can be implemented in a single query.

How to avoid filesort for that mysql query?

I'm using this kind of queries with different parameters :
EXPLAIN SELECT SQL_NO_CACHE `ilan_genel`.`id` , `ilan_genel`.`durum` , `ilan_genel`.`kategori` , `ilan_genel`.`tip` , `ilan_genel`.`ozellik` , `ilan_genel`.`m2` , `ilan_genel`.`fiyat` , `ilan_genel`.`baslik` , `ilan_genel`.`ilce` , `ilan_genel`.`parabirimi` , `ilan_genel`.`tarih` , `kgsim_mahalleler`.`isim` AS mahalle, `kgsim_ilceler`.`isim` AS ilce, (
SELECT `ilanresimler`.`resimlink`
FROM `ilanresimler`
WHERE `ilanresimler`.`ilanid` = `ilan_genel`.`id`
LIMIT 1
) AS resim
FROM (
`ilan_genel`
)
LEFT JOIN `kgsim_ilceler` ON `kgsim_ilceler`.`id` = `ilan_genel`.`ilce`
LEFT JOIN `kgsim_mahalleler` ON `kgsim_mahalleler`.`id` = `ilan_genel`.`mahalle`
WHERE `ilan_genel`.`ilce` = '703'
AND `ilan_genel`.`durum` = '1'
AND `ilan_genel`.`kategori` = '1'
AND `ilan_genel`.`tip` = '9'
ORDER BY `ilan_genel`.`id` DESC
LIMIT 225 , 15
and this is what i get in explain section:
these are the indexes that i already tried to use:
Any help will be deeply appreciated. What kind of index would be the best option, or should I use another table structure?
You should first simplify your query to understand your problem better. As your problem appears to be confined to the ilan_genel table, the following query should show the same symptoms:
SELECT * FROM `ilan_genel` WHERE `ilan_genel`.`ilce` = '703'
AND `ilan_genel`.`durum` = '1'
AND `ilan_genel`.`kategori` = '1'
AND `ilan_genel`.`tip` = '9'
ORDER BY `ilan_genel`.`id` DESC
So the first thing to do is check that this is the case. If so, the simpler question is why this query requires a filesort on 3661 rows. Now the 'hepsi' index sort order is:
ilce->mahalle->durum->kategori->tip->ozellik
I've written it that way to emphasise that it is first sorted on 'ilce', then 'mahalle', then 'durum', etc. Note that your query does not specify the 'mahalle' value. So the best the index can do is a lookup on 'ilce'. Now I don't know the distribution of your data, but the next logical step in debugging this would be:
SELECT * FROM `ilan_genel` WHERE `ilan_genel`.`ilce` = '703'
Does this return 3661 rows?
If so, you should be able to see what is happening. The database is using the hepsi index to the best of its ability: it gets 3661 rows back via 'ilce', then has to filter those rows against the other criteria (i.e. 'durum', 'kategori', 'tip') and sort what is left for the ORDER BY.
The key point here is that if data is sorted by A, B, C in that order and B is not specified, then the best logical thing that can be done is: first a lookup on A, then a filter on the remaining values against C. The rows that survive are not in id order, so they still have to be sorted, and that is the filesort.
Possible solutions
Supply 'mahalle' (B) in your query.
Add a new index on 'ilan_genel' that doesn't require 'mahalle', i.e. A->C->D...
Another tip
In case I have misdiagnosed your problem (easy to do when I don't have your system to test against), the important thing here is the approach to solving the problem. In particular, how to break a complicated query into a simpler query that produces the same behaviour, until you get to a very simple SELECT statement that demonstrates the problem. At this point, the answer is usually much clearer.
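The effect of skipping a column in the middle of a composite index can be tried out on a toy table. The sketch below uses Python's sqlite3 module, so the planner and the plan text are SQLite's, not MySQL's; treat it as an illustration of the leftmost-prefix idea rather than a prediction of MySQL's EXPLAIN output. The column types, the index name ix_filter_id, and the choice to append id to the new index (so that it can also serve the ORDER BY) are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ilan_genel (
    id INTEGER PRIMARY KEY,
    ilce INTEGER, mahalle INTEGER, durum INTEGER,
    kategori INTEGER, tip INTEGER, ozellik INTEGER
);
-- Same column order as the 'hepsi' index described above.
CREATE INDEX hepsi ON ilan_genel (ilce, mahalle, durum, kategori, tip, ozellik);
""")

QUERY = """
SELECT id FROM ilan_genel
WHERE ilce = 703 AND durum = 1 AND kategori = 1 AND tip = 9
ORDER BY id DESC
"""

def plan(connection, sql):
    """Return SQLite's EXPLAIN QUERY PLAN detail strings joined into one line."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

# With only 'hepsi', the lookup can use the leading ilce column and nothing more.
before = plan(conn, QUERY)

# Hypothetical index that leaves out mahalle and ends with the sort column.
conn.execute(
    "CREATE INDEX ix_filter_id ON ilan_genel (ilce, durum, kategori, tip, id)"
)
after = plan(conn, QUERY)

print(before)
print(after)
```

In SQLite the first plan includes a separate sorting step for the ORDER BY, and the second does not. The MySQL counterpart would be to compare the Extra column of EXPLAIN before and after adding such an index.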

indexes in mysql SELECT AS or using Views

I'm in over my head with a big mysql query (mysql 5.0), and i'm hoping somebody here can help.
Earlier I asked how to get distinct values from a joined query
mysql count only for distinct values in joined query
The response I got worked (using a subquery with join as)
select *
from media m
inner join
( select uid
from users_tbl
limit 0,30) map
on map.uid = m.uid
inner join users_tbl u
on u.uid = m.uid
Unfortunately, my query has grown more unruly, and though I have it running, joining into a derived table is taking too long because there are no indexes available to the derived query.
my query now looks like this
SELECT mdate.bid, mdate.fid, mdate.date, mdate.time, mdate.title, mdate.name,
mdate.address, mdate.rank, mdate.city, mdate.state, mdate.lat, mdate.`long`,
ext.link,
ext.source, ext.pre, meta, mdate.img
FROM ext
RIGHT OUTER JOIN (
SELECT media.bid,
media.date, media.time, media.title, users.name, users.img, users.rank, media.address,
media.city, media.state, media.lat, media.`long`,
GROUP_CONCAT(tags.tagname SEPARATOR ' | ') AS meta
FROM media
JOIN users ON media.bid = users.bid
LEFT JOIN tags ON users.bid=tags.bid
WHERE `long` BETWEEN -122.52224684058 AND -121.79760915942
AND lat BETWEEN 37.07500915942 AND 37.79964684058
AND date = '2009-02-23'
GROUP BY media.bid, media.date
ORDER BY media.date, users.rank DESC
LIMIT 0, 30
) mdate ON (mdate.bid = ext.bid AND mdate.date = ext.date)
phew!
So, as you can see, if I understand my problem correctly, I have two derived tables without indexes (and I don't deny that I may have screwed up the JOIN statements somehow, but I kept messing with different types, and this ended up giving me the result I wanted).
What's the best way to create a query similar to this which will allow me to take advantage of the indexes?
Dare I say, I actually have one more table to add into the mix at a later date.
Currently, my query is taking .8 seconds to complete, but I'm sure if I could take advantage of the indexes, this could be significantly faster.
First, check for indices on ext(bid, date), users(bid) and tags(bid), you should really have them.
It seems, though, that it's LONG and LAT that cause you most problems. You should try keeping your LONG and LAT as a single POINT column (coordinate), create a SPATIAL INDEX on this column and query like this:
WHERE MBRContains(@MySquare, coordinate)
If you can't change your schema for some reason, you can try creating additional indices that include date as a first field:
CREATE INDEX ix_date_long ON media (date, `long`)
CREATE INDEX ix_date_lat ON media (date, lat)
These indices will be more efficient for your query, as you use an exact match on date combined with a range search on the axes.
Starting fresh:
Question - why are you grouping by both media.bid and media.date? Can a bid have records for more than one date?
Here's a simpler version to try:
SELECT
media.bid,
media.fid, -- fid is not selected in the original derived table; assumed here to be a media column
media.date,
media.time,
media.title,
users.name,
media.address,
users.rank,
media.city,
media.state,
media.lat,
media.`long`,
ext.link,
ext.source,
ext.pre,
users.img,
( SELECT GROUP_CONCAT(tags.tagname SEPARATOR ' | ')
FROM tags
WHERE ext.bid = tags.bid
) AS meta
FROM
ext
LEFT JOIN
media ON ext.bid = media.bid AND ext.date = media.date
JOIN
users ON ext.bid = users.bid
WHERE
`long` BETWEEN -122.52224684058 AND -121.79760915942
AND lat BETWEEN 37.07500915942 AND 37.79964684058
AND ext.date = '2009-02-23'
AND users.userid IN
(
SELECT userid FROM users ORDER BY rank DESC LIMIT 30
)
ORDER BY
media.date,
users.rank DESC
LIMIT 0, 30
You might want to compare your performance against using a temp table for each selection, and joining those tables together.
create temporary table whatever as select ...
create temporary table whatever2 as select ...
select ... from whatever join whatever2 on ...
....
drop temporary table whatever
drop temporary table whatever2
If your system has enough memory to hold full tables this might work out much faster. It depends on how big your database is.