I have a SQL query like this:
select t1.id,t1.type from collect t1
where t1.type='1' and t1.status='1'
and t1.total>(t1.totalComplate+t1.totalIng)
and id not in(
select tid from collect_log t2
where t2.tid=t1.id and t2.bid='1146')
limit 1;
It works, but its performance is not very good, and if I add an ORDER BY clause:
select t1.id,t1.type from collect t1
where t1.type='1' and t1.status='1'
and t1.total>(t1.totalComplate+t1.totalIng)
and id not in(
select tid from collect_log t2
where t2.tid=t1.id and t2.bid='1146')
order by t1.id asc
limit 1;
it becomes even worse.
How can I optimize this?
Here is the EXPLAIN output:
+----+--------------------+-------+------+---------------+-----+---------+-----------------+------+-----------------------------+
| id | select_type        | table | type | possible_keys | key | key_len | ref             | rows | Extra                       |
+----+--------------------+-------+------+---------------+-----+---------+-----------------+------+-----------------------------+
|  1 | PRIMARY            | t1    | ref  | i2,i1,i3      | i1  | 4       | const,const     |   99 | Using where; Using filesort |
|  2 | DEPENDENT SUBQUERY | t2    | ref  | i5            | i5  | 65      | img.t1.id,const |    2 | Using where; Using index    |
+----+--------------------+-------+------+---------------+-----+---------+-----------------+------+-----------------------------+
1) If it's not already done, define an index on the collect.id column:
CREATE INDEX idx_collect_id ON collect (id);
Or possibly a unique index if you can (if id is never the same for any two rows):
CREATE UNIQUE INDEX idx_collect_id ON collect (id);
Maybe you need an index on collect_log.tid or collect_log.bid, too. Or even on both columns, like so:
CREATE INDEX idx_collect_log_tidbid ON collect_log (tid, bid);
Make it UNIQUE if it makes sense, that is, if no two rows have the same values for the (tid, bid) pair in the table. For instance, if these queries give the same result, it might be possible:
SELECT count(*) FROM collect_log;
SELECT count(DISTINCT tid, bid) FROM collect_log;
But don't make it UNIQUE if you're unsure what it means.
2) Verify the types of the columns collect.type, collect.status and collect_log.bid. In your query, you are comparing them with strings, but maybe they are defined as INT (or SMALLINT or TINYINT...) ? In this case I advise you to drop the quotes around the numbers, because string comparisons are painfully slow compared to integer comparisons.
select t1.id,t1.type from collect t1
where t1.type=1 and t1.status=1
and t1.total>(t1.totalComplate+t1.totalIng)
and id not in(
select tid from collect_log t2
where t2.tid=t1.id and t2.bid=1146)
order by t1.id asc
limit 1;
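To double-check the actual column types before rewriting the query, something like this should do (a quick inspection, not an optimization in itself):
SHOW COLUMNS FROM collect;
SHOW COLUMNS FROM collect_log;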
3) If that still doesn't help, just add EXPLAIN in front of your query, and you'll get the execution plan. Paste the results here and we can help you make some sense out of it. Actually, I would advise you to do this step before creating any new index.
I'd try to get rid of the NOT IN subquery using a LEFT JOIN anti-join first.
Something like this (untested):
select t1.id,t1.type
from collect t1
LEFT JOIN collect_log t2 ON t1.id = t2.tid AND t2.bid = '1146'
where t1.type='1'
and t1.status='1'
and t1.total>(t1.totalComplate+t1.totalIng)
and t2.tid IS NULL
order by t1.id asc
limit 1;
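Another rewrite that often performs better than NOT IN on MySQL is NOT EXISTS. A sketch using the same tables and values as the question (untested):
select t1.id, t1.type
from collect t1
where t1.type = '1'
and t1.status = '1'
and t1.total > (t1.totalComplate + t1.totalIng)
and not exists (
select 1 from collect_log t2
where t2.tid = t1.id and t2.bid = '1146')
order by t1.id asc
limit 1;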
Related
There is a huge execution time gap between these three queries. I don't know the underlying workings of the MySQL query engine.
Query1:
select t2.* from (
select id1, id2
from table t2
where condition
) t3, table t2
where
t2.id1 = t3.id1 AND
t2.id2 = t3.id2
takes ~180 seconds to run.
Query2:
select t2.* from (
select id1, id2
from table t2
where condition
) t3 inner join table t2
on
t2.id1 = t3.id1 AND
t2.id2 = t3.id2
takes ~180 seconds to run.
Result of EXPLAIN:
id|select_type|table     |partitions|type|possible_keys|key        |key_len|ref          |rows |filtered|Extra                                             |
--|-----------|----------|----------|----|-------------|-----------|-------|-------------|-----|--------|--------------------------------------------------|
 1|PRIMARY    |t2        |          |ALL |             |           |       |             |95619|      19|Using where                                       |
 1|PRIMARY    |t2        |          |ALL |             |           |       |             |95619|       1|Using where; Using join buffer (Block Nested Loop)|
 1|PRIMARY    |<derived3>|          |ref |<auto_key0>  |<auto_key0>|216    |t2.id1,t2.id2|   10|     100|Using index                                       |
 3|DERIVED    |t3        |          |ALL |             |           |       |             |95619|     100|Using temporary; Using filesort                   |
Query3:
select t2.* from (
select id1, id2
from table t2
where condition
) t3 left join table t2
on
t2.id1 = t3.id1 AND
t2.id2 = t3.id2
takes ~1.6 seconds to run.
Result of EXPLAIN:
id|select_type|table     |partitions|type|possible_keys|key        |key_len|ref          |rows |filtered|Extra                                             |
--|-----------|----------|----------|----|-------------|-----------|-------|-------------|-----|--------|--------------------------------------------------|
 1|PRIMARY    |t2        |          |ALL |             |           |       |             |95619|      19|Using where                                       |
 1|PRIMARY    |<derived3>|          |ref |<auto_key0>  |<auto_key0>|216    |t2.id1,t2.id2|   10|     100|Using index                                       |
 1|PRIMARY    |t2        |          |ALL |             |           |       |             |95619|     100|Using where; Using join buffer (Block Nested Loop)|
 3|DERIVED    |t3        |          |ALL |             |           |       |             |95619|     100|Using temporary; Using filesort                   |
Some stats about the tables: the subquery returns 83 rows, and table t2 has ~97k rows.
I think what's going on is this:
The first two queries are completely equivalent -- MySQL doesn't distinguish between INNER JOIN and cross product when the conditions relating the two tables are the same; it doesn't matter whether the conditions are in ON or WHERE. So the relevant difference is just between inner join and outer join.
When performing any kind of join, MySQL has to decide which table to treat as the primary table and which is the dependent table. It will scan the primary table and then find the matching rows in the dependent table to produce the result set.
An outer join forces a particular ordering to this; in a LEFT JOIN, the left table is always the primary, because the result has to include at least one row for each row in that table. So after generating an intermediate table for t3, it just has to scan those 83 rows and find the corresponding rows in t2 that match the joining condition. The columns being matched are presumably indexed, so this is very fast.
But with an inner join, it can go either way. The query optimizer will try to estimate which table is smaller, and use that as the primary that it scans. But when it's the result of a subquery, it doesn't know how many rows it will return. So it's apparently choosing to use t2 as the primary, rather than the intermediate t3 table. This means it's scanning 97K rows, testing each of them to find the matching rows in t3.
A smart optimizer would notice that the subquery is simply filtering the same table, so it must return fewer rows. Few people would claim that MySQL has one of the better query planners. I suspect it's choosing to use the regular table rather than the intermediate table because it can confine the scan to the index.
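If the optimizer picks the wrong driving table for an inner join, MySQL's STRAIGHT_JOIN keyword forces the tables to be read in the order they are written, so the small derived table drives the join. A sketch reusing the placeholder names (table, condition) from the question; whether it actually helps depends on the real schema:
select t2.* from (
select id1, id2
from table t2
where condition
) t3 straight_join table t2
on
t2.id1 = t3.id1 AND
t2.id2 = t3.id2;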
It's actually surprising that it only takes 180 seconds, rather than 1800 seconds (30 minutes), which is closer to the ratio between 97k and 83.
I'm not really an expert on interpreting EXPLAIN results, but I think you can see the difference on line 2; in the fast case it's table = <derived3>, rows = 10, in the slow case it's table = t2, rows = 95619.
I have the following queries which both return the same result and row count:
select * from (
select UNIX_TIMESTAMP(network_time) * 1000 as epoch_network_datetime,
hbrl.business_rule_id,
display_advertiser_id,
hbrl.campaign_id,
truncate(sum(coalesce(hbrl.ad_spend_network, 0))/100000.0, 2) as demand_ad_spend_network,
sum(coalesce(hbrl.ad_view, 0)) as demand_ad_view,
sum(coalesce(hbrl.ad_click, 0)) as demand_ad_click,
truncate(coalesce(case when sum(hbrl.ad_view) = 0 then 0 else 100*sum(hbrl.ad_click)/sum(hbrl.ad_view) end, 0), 2) as ctr_percent,
truncate(coalesce(case when sum(hbrl.ad_view) = 0 then 0 else sum(hbrl.ad_spend_network)/100.0/sum(hbrl.ad_view) end, 0), 2) as ecpm,
truncate(coalesce(case when sum(hbrl.ad_click) = 0 then 0 else sum(hbrl.ad_spend_network)/100000.0/sum(hbrl.ad_click) end, 0), 2) as ecpc
from hourly_business_rule_level hbrl
where (publisher_network_id = 31534)
and network_time between str_to_date('2017-08-13 17:00:00.000000', '%Y-%m-%d %H:%i:%S.%f') and str_to_date('2017-08-14 16:59:59.999000', '%Y-%m-%d %H:%i:%S.%f')
and (network_time IS NOT NULL and display_advertiser_id > 0)
group by network_time, hbrl.campaign_id, hbrl.business_rule_id
having demand_ad_spend_network > 0
OR demand_ad_view > 0
OR demand_ad_click > 0
OR ctr_percent > 0
OR ecpm > 0
OR ecpc > 0
order by epoch_network_datetime) as atb
left join dim_demand demand on atb.display_advertiser_id = demand.advertiser_dsp_id
and atb.campaign_id = demand.campaign_id
and atb.business_rule_id = demand.business_rule_id
I ran EXPLAIN EXTENDED, and these are the results:
+----+-------------+----------------------------+------+-------------------------------------------------------------------------------+---------+---------+-----------------+---------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------------------------+------+-------------------------------------------------------------------------------+---------+---------+-----------------+---------+----------+----------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 1451739 | 100.00 | NULL |
| 1 | PRIMARY | demand | ref | PRIMARY,join_index | PRIMARY | 4 | atb.campaign_id | 1 | 100.00 | Using where |
| 2 | DERIVED | hourly_business_rule_level | ALL | _hourly_business_rule_level_supply_idx,_hourly_business_rule_level_demand_idx | NULL | NULL | NULL | 1494447 | 97.14 | Using where; Using temporary; Using filesort |
+----+-------------+----------------------------+------+-------------------------------------------------------------------------------+---------+---------+-----------------+---------+----------+----------------------------------------------+
and the other is:
select UNIX_TIMESTAMP(network_time) * 1000 as epoch_network_datetime,
hbrl.business_rule_id,
display_advertiser_id,
hbrl.campaign_id,
truncate(sum(coalesce(hbrl.ad_spend_network, 0))/100000.0, 2) as demand_ad_spend_network,
sum(coalesce(hbrl.ad_view, 0)) as demand_ad_view,
sum(coalesce(hbrl.ad_click, 0)) as demand_ad_click,
truncate(coalesce(case when sum(hbrl.ad_view) = 0 then 0 else 100*sum(hbrl.ad_click)/sum(hbrl.ad_view) end, 0), 2) as ctr_percent,
truncate(coalesce(case when sum(hbrl.ad_view) = 0 then 0 else sum(hbrl.ad_spend_network)/100.0/sum(hbrl.ad_view) end, 0), 2) as ecpm,
truncate(coalesce(case when sum(hbrl.ad_click) = 0 then 0 else sum(hbrl.ad_spend_network)/100000.0/sum(hbrl.ad_click) end, 0), 2) as ecpc
from hourly_business_rule_level hbrl
join dim_demand demand on hbrl.display_advertiser_id = demand.advertiser_dsp_id
and hbrl.campaign_id = demand.campaign_id
and hbrl.business_rule_id = demand.business_rule_id
where (publisher_network_id = 31534)
and network_time between str_to_date('2017-08-13 17:00:00.000000', '%Y-%m-%d %H:%i:%S.%f') and str_to_date('2017-08-14 16:59:59.999000', '%Y-%m-%d %H:%i:%S.%f')
and (network_time IS NOT NULL and display_advertiser_id > 0)
group by network_time, hbrl.campaign_id, hbrl.business_rule_id
having demand_ad_spend_network > 0
OR demand_ad_view > 0
OR demand_ad_click > 0
OR ctr_percent > 0
OR ecpm > 0
OR ecpc > 0
order by epoch_network_datetime;
and these are the results for the second query:
+----+-------------+----------------------------+------+-------------------------------------------------------------------------------+---------+---------+---------------------------------------------------------------+---------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------------------------+------+-------------------------------------------------------------------------------+---------+---------+---------------------------------------------------------------+---------+----------+----------------------------------------------+
| 1 | SIMPLE | hourly_business_rule_level | ALL | _hourly_business_rule_level_supply_idx,_hourly_business_rule_level_demand_idx | NULL | NULL | NULL | 1494447 | 97.14 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | demand | ref | PRIMARY,join_index | PRIMARY | 4 | my6sense_datawarehouse.hourly_business_rule_level.campaign_id | 1 | 100.00 | Using where; Using index |
+----+-------------+----------------------------+------+-------------------------------------------------------------------------------+---------+---------+---------------------------------------------------------------+---------+----------+----------------------------------------------+
The first one takes about 2 seconds, while the second one takes over 2 minutes!
Why is the second query taking so long?
What am I missing here?
Thanks.
Use a subquery whenever the subquery significantly shrinks the number of rows before any JOIN; this reinforces Rick James's Plan B below.
This also reinforces Rick's and Paul's answers, which you have already documented; their answers deserve acceptance.
One possible reason is the number of rows that have to be joined with the second table.
The GROUP BY clause and the HAVING clause will limit the number of rows returned from your subquery.
Only those rows will be used for the join.
Without the subquery only the WHERE clause is limiting the number of rows for the JOIN.
The JOIN is done before the GROUP BY and HAVING clauses are processed.
Depending on the group size and the selectivity of the HAVING conditions, there could be many more rows that need to be joined.
Consider the following simplified example:
We have a table users with 1000 entries and the columns id, email.
create table users(
id smallint auto_increment primary key,
email varchar(50) unique
);
Then we have a (huge) log table user_actions with 1,000,000 entries and the columns id, user_id, timestamp, action
create table user_actions(
id mediumint auto_increment primary key,
user_id smallint not null,
timestamp timestamp,
action varchar(50),
index (timestamp, user_id)
);
The task is to find all users who have at least 900 entries in the log table since 2017-02-01.
The subquery solution:
select a.user_id, a.cnt, u.email
from (
select a.user_id, count(*) as cnt
from user_actions a
where a.timestamp >= '2017-02-01 00:00:00'
group by a.user_id
having cnt >= 900
) a
left join users u on u.id = a.user_id
The subquery returns 135 rows (users). Only those rows will be joined with the users table.
The subquery runs in about 0.375 seconds. The time needed for the join is almost zero, so the full query runs in about 0.375 seconds.
Solution without subquery:
select a.user_id, count(*) as cnt, u.email
from user_actions a
left join users u on u.id = a.user_id
where a.timestamp >= '2017-02-01 00:00:00'
group by a.user_id
having cnt >= 900
The WHERE condition filters the table to 866,081 rows.
The JOIN has to be done for all those 866K rows.
After the JOIN the GROUP BY and the HAVING clauses are processed and limit the result to 135 rows.
This query needs about 0.815 seconds.
So you can already see, that a subquery can improve the performance.
But let's make things worse and drop the primary key in the users table.
This way we have no index which can be used for the JOIN.
Now the first query runs in 0.455 seconds. The second query needs 40 seconds - almost 100 times slower.
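For reference, the drop in that last test might look like this; the auto_increment attribute has to go first, because MySQL requires an auto-increment column to be covered by a key:
ALTER TABLE users MODIFY id smallint not null, DROP PRIMARY KEY;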
Notes
It's difficult to say if the same applies to your case. Reasons are:
Your queries are quite complex and far away from being an MCVE.
I don't see anything being selected from the demand table, so it's unclear why you are joining it at all.
You use a LEFT JOIN in one query and an INNER JOIN in the other one.
The relation between the two tables is unclear.
No information about indexes. You should provide the CREATE statements (SHOW CREATE TABLE table_name).
Test setup
drop table if exists users;
create table users(
id smallint auto_increment primary key,
email varchar(50) unique
)
select seq as id, rand(1) as email
from seq_1_to_1000
;
drop table if exists user_actions;
create table user_actions(
id mediumint auto_increment primary key,
user_id smallint not null,
timestamp timestamp,
action varchar(50),
index (timestamp, user_id)
)
select seq as id
, floor(rand(2)*1000)+1 as user_id
#, '2017-01-01 00:00:00' + interval seq*20 second as timestamp
, from_unixtime(unix_timestamp('2017-01-01 00:00:00') + seq*20) as timestamp
, rand(3) as action
from seq_1_to_1000000
;
MariaDB 10.0.19 with sequence plugin.
The queries are different. One says JOIN, the other says LEFT JOIN. You are not using demand, so the join is probably useless. However, in the case of JOIN, you are filtering out advertisers that are not in dim_demand; is that the intent?
But that does not address the question.
The EXPLAINs estimate that there are 1.5M rows in hbrl. But how many show up in the result? I would guess it is a lot fewer. From this, I can answer your question.
Consider these two:
SELECT ... FROM ( SELECT ... FROM a
GROUP BY or HAVING or LIMIT ) x
JOIN b
SELECT ... FROM a
JOIN b
GROUP BY or HAVING or LIMIT
The first will decrease the number of rows that need to join to b; the second will need to do a full 1.5M joins. I suspect that the time taken to do the JOIN (be it LEFT or not) is where the difference is.
Plan A: Remove demand from the query.
Plan B: Use a subquery whenever the subquery significantly shrinks the number of rows before the JOIN.
Indexing (may speed up both variants):
INDEX(publisher_network_id, network_time)
and get rid of this as being useless (since the between will fail anyway for NULL):
and network_time IS NOT NULL
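In DDL form the suggested index might look like this (idx_pub_net_time is just an illustrative name):
ALTER TABLE hourly_business_rule_level
  ADD INDEX idx_pub_net_time (publisher_network_id, network_time);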
Side note: I recommend simplifying and fixing this
and network_time
between str_to_date('2017-08-13 17:00:00.000000', '%Y-%m-%d %H:%i:%S.%f')
AND str_to_date('2017-08-14 16:59:59.999000', '%Y-%m-%d %H:%i:%S.%f')
to
and network_time >= '2017-08-13 17:00:00'
and network_time < '2017-08-13 17:00:00' + INTERVAL 24 HOUR
For information: in the following examples, big_table contains millions of rows and small_table hundreds.
Here is the basic query I'm trying to run:
SELECT b.id
FROM big_table b
LEFT JOIN small_table s
ON b.small_id=s.id
WHERE s.name like 'something%'
ORDER BY b.name
LIMIT 10, 10;
This is slow, and I can understand why both indexes can't be used.
My initial idea was to split the query into parts.
This is fast:
SELECT id FROM small_table WHERE name like 'something%';
This is also fast:
SELECT id FROM big_table WHERE small_id IN (1, 2) ORDER BY name LIMIT 10, 10;
But, put together, it becomes slow:
SELECT id FROM big_table
WHERE small_id
IN (
SELECT id
FROM small_table WHERE name like 'something%'
)
ORDER BY name
LIMIT 10, 10;
Unless the subquery is re-evaluated for every row, it shouldn't be slower than executing both queries separately, right?
I'm looking for any help optimizing the initial query and understanding why the second one doesn't work.
EXPLAIN result for the last query:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
| 1 | PRIMARY | small_table | range | PRIMARY, ix_small_name | ix_small_name | 768 | NULL | 1 | Using where; Using index; Using temporary; Using filesort |
| 1 | PRIMARY | big_table | ref | ix_join_foreign_key | ix_join_foreign_key | 9 | small_table.id | 11870 | |
Temporary solution:
SELECT id FROM big_table ignore index(ix_join_foreign_key)
WHERE small_id
IN (
SELECT id
FROM small_table ignore index(PRIMARY)
WHERE name like 'something%'
)
ORDER BY name
LIMIT 10, 10;
(The result and EXPLAIN are the same with an EXISTS instead of IN.)
EXPLAIN output becomes:
| 1 | PRIMARY | big_table | index | NULL | ix_big_name | 768 | NULL | 20 | |
| 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 8 | func | 1 | |
| 2 | MATERIALIZED | small_table | range | ix_small_name | ix_small_name | 768 | NULL | 1 | Using where; Using index |
If anyone has a better solution, I'm still interested.
The problem that you are facing is that you have conditions on the small table but are trying to avoid a sort in the large table. In MySQL, I think you need to do at least a full table scan.
One step is to write the query using exists, as others have mentioned:
SELECT b.id
FROM big_table b
WHERE EXISTS (SELECT 1
FROM small_table s
WHERE s.name LIKE 'something%' AND s.id = b.small_id
)
ORDER BY b.name;
The question is: Can you trick MySQL into doing the ORDER BY using an index? One possibility is to use the appropriate index. In this case, the appropriate index is: big_table(name, small_id, id) and small_table(id, name). The ordering of the keys in the index is important. Because the first is a covering index, MySQL might read through the index in order by name, choosing the appropriate ids.
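In DDL form those two indexes might be written as follows (index names are illustrative):
CREATE INDEX ix_big_name_cover ON big_table (name, small_id, id);
CREATE INDEX ix_small_id_name ON small_table (id, name);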
You are looking for an EXISTS or IN query. As MySQL is known to be weak on IN, I'd try EXISTS, in spite of liking IN better for its simplicity.
select id
from big_table b
where exists
(
select *
from small_table s
where s.id = b.small_id
and s.name like 'something%'
)
order by name
limit 10, 10;
It would be helpful to have a good index on big_table. It should first contain the small_id to find the match, then the name for the sorting. The ID is automatically included in MySQL indexes, as far as I know (otherwise it should also be added to the index). So thus you'd have an index containing all fields needed from big_table (that is called a covering index) in the desired order, so all data can be read from the index alone and the table itself doesn't have to get accessed.
create index idx_big_quick on big_table(small_id, name);
You can try this:
SELECT b.id
FROM big_table b
JOIN small_table s
ON b.small_id = s.id
WHERE s.name like 'something%'
ORDER BY b.name;
or
SELECT b.id FROM big_table b
WHERE EXISTS(SELECT 1 FROM small_table s
WHERE s.name LIKE 'something%' AND s.id = b.small_id)
ORDER BY b.name;
NOTE: you don't seem to need a LEFT JOIN. A left outer join will almost always result in a full table scan of big_table.
PS: make sure you have an index on big_table.small_id.
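For instance (the index name is illustrative):
CREATE INDEX ix_big_small_id ON big_table (small_id);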
Plan A
SELECT b.id
FROM big_table b
JOIN small_table s ON b.small_id=s.id
WHERE s.name like 'something%'
ORDER BY b.name
LIMIT 10, 10;
(Note removal of LEFT.)
You need
small_table: INDEX(name, id)
big_table: INDEX(small_id), or, for 'covering': INDEX(small_id, name, id)
It will use the s index to find 'something%' and walk through. But it must find all such rows, and JOIN to b to find all such rows there. Only then can it do the ORDER BY, OFFSET, and LIMIT. There will be a filesort (which may happen in RAM).
The column order in the indexes is important.
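As concrete DDL, Plan A's indexes might be (illustrative names; use the covering variant of the second one only if it pays off):
CREATE INDEX ix_small_name_id ON small_table (name, id);
CREATE INDEX ix_big_small_cover ON big_table (small_id, name, id);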
Plan B
The other suggestion may work well; it depends on various things.
SELECT b.id
FROM big_table b
WHERE EXISTS
( SELECT *
FROM small_table s
WHERE s.name LIKE 'something%'
AND s.id = b.small_id
)
ORDER BY b.name
LIMIT 10, 10;
That needs these:
big_table: INDEX(name), or for 'covering', INDEX(name, small_id, id)
small_table: INDEX(id, name), which is 'covering'
(Caveat: If you are doing something other than SELECT b.id, my comments about covering may be wrong.)
Which is faster (A or B)? Cannot predict without understanding the frequency of 'something%' and how 'many' the many-to-1 mapping is.
Settings
If these tables are InnoDB, then be sure that innodb_buffer_pool_size is set to about 70% of available RAM.
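To check the current value, and, on MySQL 5.7.5 or later, resize it without a restart (the 8 GB figure is an assumption; adjust to roughly 70% of your RAM):
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
SET GLOBAL innodb_buffer_pool_size = 8 * 1024 * 1024 * 1024; -- bytes; requires MySQL >= 5.7.5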
Pagination
Your use of OFFSET implies that you are 'paging' through the data? OFFSET is an inefficient way to do it. See my blog on such, but note that only Plan B will work with it.
Currently I have three different tables and three different queries, which are very similar to each other, with almost the same joins. I was trying to combine all three queries into one query, so far without much success. I will be very happy if someone has a better solution or a direction to point me in. Thanks.
0.0013
SELECT `ilan_genel`.`id`, `ilan_genel`.`durum`, `ilan_genel`.`kategori`, `ilan_genel`.`tip`, `ilan_genel`.`ozellik`, `ilan_genel`.`m2`, `ilan_genel`.`fiyat`, `ilan_genel`.`baslik`, `ilan_genel`.`ilce`, `ilan_genel`.`mahalle`, `ilan_genel`.`parabirimi`, `kgsim_ilceler`.`isim` as ilce, (
SELECT ilanresimler.resimlink
FROM ilanresimler
WHERE ilanresimler.ilanid = ilan_genel.id LIMIT 1
) AS resim
FROM (`ilan_genel`)
LEFT JOIN `kgsim_ilceler` ON `kgsim_ilceler`.`id` = `ilan_genel`.`ilce`
ORDER BY `id` desc
LIMIT 30
0.0006
SELECT `video`.`id`, `video`.`url`, `ilan_genel`.`ilce`, `ilan_genel`.`tip`, `ilan_genel`.`m2`, `ilan_genel`.`ozellik`, `ilan_genel`.`fiyat`, `ilan_genel`.`parabirimi`, `ilan_genel`.`kullanici`, `ilanresimler`.`resimlink` as resim, `uyeler`.`isim` as isim, `uyeler`.`soyisim` as soyisim, `kgsim_ilceler`.`isim` as ilce
FROM (`video`)
LEFT JOIN `ilan_genel` ON `ilan_genel`.`id` = `video`.`id`
LEFT JOIN `kgsim_ilceler` ON `kgsim_ilceler`.`id` = `ilan_genel`.`ilce`
LEFT JOIN `ilanresimler` ON `ilanresimler`.`id` = `ilan_genel`.`resim`
LEFT JOIN `uyeler` ON `uyeler`.`id` = `ilan_genel`.`kullanici`
ORDER BY `siralama` desc
LIMIT 30
0.0005
SELECT `sanaltur`.`id`, `ilan_genel`.`ilce`, `ilan_genel`.`tip`, `ilan_genel`.`m2`, `ilan_genel`.`ozellik`, `ilan_genel`.`fiyat`, `ilan_genel`.`parabirimi`, `ilan_genel`.`kullanici`, `ilanresimler`.`resimlink` as resim, `uyeler`.`isim` as isim, `uyeler`.`soyisim` as soyisim, `kgsim_ilceler`.`isim` as ilce
FROM (`sanaltur`)
LEFT JOIN `ilan_genel` ON `ilan_genel`.`id` = `sanaltur`.`id`
LEFT JOIN `kgsim_ilceler` ON `kgsim_ilceler`.`id` = `ilan_genel`.`ilce`
LEFT JOIN `ilanresimler` ON `ilanresimler`.`id` = `ilan_genel`.`resim`
LEFT JOIN `uyeler` ON `uyeler`.`id` = `ilan_genel`.`kullanici`
ORDER BY `siralama` desc
LIMIT 30
These are actually three very different queries. I don't think you will be able to usefully combine them. Also, they seem pretty fast to me.
However, if you want to try to optimize each individual query, you can use EXPLAIN SELECT to find out whether each query uses appropriate indexes or not.
For example:
EXPLAIN SELECT *
FROM A
WHERE foo NOT IN (1,4,5,6);
Might yield:
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | A     | ALL  | NULL          | NULL | NULL    | NULL |    2 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
In this case, the query had no possible_keys and therefore used no (or NULL) key to do the query. It's the key column you'd be interested in.
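If you then add an index on the filtered column, the optimizer at least has a key to consider (whether it is actually used depends on selectivity; idx_a_foo is an illustrative name):
CREATE INDEX idx_a_foo ON A (foo);
EXPLAIN SELECT * FROM A WHERE foo NOT IN (1,4,5,6);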
More information here:
http://dev.mysql.com/doc/refman/5.5/en/explain.html
http://dev.mysql.com/doc/refman/5.5/en/optimization-indexes.html
To start things off, I want to make it clear that I'm not trying to sort in descending order.
I am looking to order by something else, and then filter further by displaying values from a second column only while each row's value in that column is less than or equal to the one in the row before it. Once it finds a value that is higher, it stops.
Example:
Ordered-by column    Descending column
353215               20
535325               15
523532               10
666464               30
473460               20
Given that data, I would like it to return only 20, 15 and 10, because once 30 is higher than the 10 before it, we don't care about what's below it.
I've looked everywhere and can't find a solution.
EDIT: removed the big-number init and used the counter in the ifnull test, so it works in pure MySQL: ifnull(@prec,counter) and not ifnull(@prec,999999).
If your starting table is t1 and the base query was:
select id,counter from t1 order by id;
Then with a MySQL user variable you can do the job:
SET @prec=NULL;
select * from (
select id,counter,@prec:= if(
ifnull(@prec,counter)>=counter,
counter,
-1) as prec
from t1 order by id
) t2 where prec<>-1;
There may also be a way to fold the initialisation of @prec to NULL into the first query itself.
Here the prec column contains the first row's counter value, then the counter value of each row while it is not greater than the previous row's, and -1 once this becomes false.
Update
The outer select can be removed completely if the variable assignment is done in the WHERE clause:
SELECT @prec := NULL;
SELECT
id,
counter
FROM t1
WHERE
(@prec := IF(
IFNULL(@prec, counter) >= counter,
counter,
-1
)) IS NOT NULL
AND @prec <> -1
ORDER BY id;
regilero EDIT:
I can remove the first initialization query by using a one-row derived table (LEFT JOIN) this way, though it may slow down the query:
(...)
FROM t1
LEFT JOIN (select @prec:=NULL as nullinit limit 1) as tmp1 ON tmp1.nullinit is null
(..)
As said by @Mike, using a simple UNION query, or even:
(...)
FROM t1 , (select @prec:=NULL) tmp1
(...)
is better if you want to avoid the first query.
So in the end the nicest solution is:
SELECT NULL AS id, NULL AS counter FROM dual WHERE (@prec := NULL)
UNION
SELECT id, counter
FROM t1
WHERE (
@prec := IF(
IFNULL(@prec, counter) >= counter,
counter,
-1 )) IS NOT NULL
AND @prec <> -1
ORDER BY id;
+--------+---------+
| id | counter |
+--------+---------+
| 353215 | 20 |
| 523532 | 10 |
| 535325 | 15 |
+--------+---------+
EXPLAIN SELECT output:
+----+--------------+------------+------+---------------+------+---------+------+------+------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------+------------+------+---------------+------+---------+------+------+------------------+
| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE |
| 2 | UNION | t1 | ALL | NULL | NULL | NULL | NULL | 6 | Using where |
| NULL | UNION RESULT | <union1,2> | ALL | NULL | NULL | NULL | NULL | NULL | Using filesort |
+----+--------------+------------+------+---------------+------+---------+------+------+------------------+
You didn't find a solution because it is impossible.
SQL works only within a row; it cannot look at the rows above or below it.
You could write a stored procedure to do this, essentially looping one row at a time and calculating the logic.
It would probably be easier to write it in the frontend language, whatever it is you are using.
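A minimal sketch of that stored-procedure approach (untested; assumes a table t1(id, counter) as used in the answer above):
DELIMITER //
CREATE PROCEDURE filter_descending()
BEGIN
  DECLARE done INT DEFAULT 0;
  DECLARE cur_id INT;
  DECLARE cur_counter INT;
  DECLARE prev_counter INT DEFAULT NULL;
  DECLARE cur CURSOR FOR SELECT id, counter FROM t1 ORDER BY id;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;
  DROP TEMPORARY TABLE IF EXISTS result;
  CREATE TEMPORARY TABLE result (id INT, counter INT);
  OPEN cur;
  read_loop: LOOP
    FETCH cur INTO cur_id, cur_counter;
    IF done = 1 THEN
      LEAVE read_loop;
    END IF;
    -- stop at the first row whose counter rises above the previous one
    IF prev_counter IS NOT NULL AND cur_counter > prev_counter THEN
      LEAVE read_loop;
    END IF;
    INSERT INTO result VALUES (cur_id, cur_counter);
    SET prev_counter = cur_counter;
  END LOOP;
  CLOSE cur;
  SELECT * FROM result;
END//
DELIMITER ;
Call it with CALL filter_descending(); to get the leading descending run.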
I'm afraid you can't do it in SQL. Relational databases were designed for different purpose so there is no abstraction like next or previous row. Do it outside the SQL in the 'wrapping' language.
I'm not sure whether these do what you want, and they're probably too slow anyway:
SELECT t1.col1, t1.col2
FROM tbl t1
WHERE t1.col2 = (SELECT MIN(t2.col2) FROM tbl t2 WHERE t2.col1 <= t1.col1)
Or
SELECT t1.col1, t1.col2
FROM tbl t1
INNER JOIN tbl t2 ON t2.col1 <= t1.col1
GROUP BY t1.col1, t1.col2
HAVING t1.col2 = MIN(t2.col2)
I guess you could maybe select them (in order) into a temporary table that also has an auto-incrementing column, and then select from the temporary table, joining it to itself on the auto-incrementing column (id) with t1.id = t2.id + 1. You can then use WHERE criteria (with an appropriate ORDER BY and LIMIT 1) to find the t1.id of the first row where the descending column is greater in t2 than in t1. After that, select from the temporary table where the id is less than or equal to the id you just found. It's not exactly pretty though! :)
It is actually possible, but the performance isn't easy to optimize. If Col1 is ordered and Col2 is the descending column:
First you create a self-join of each row with the next row (note that this only works if the column values are unique; if not, you need to join on unique values):
(Select A.Col1,
(Select Min(B.Col1) From MyTable As B Where B.Col1 > A.Col1) As Col1FromNextRow
From MyTable As A) As D
INNER JOIN MyTable As C On C.Col1 = D.Col1FromNextRow
Then you implement the "keep going until the first ascending value" bit:
Select E.Col2 FROM
(
Select D.Col1, C.Col2
From (Select A.Col1,
(Select Min(B.Col1) From MyTable As B Where B.Col1 > A.Col1) As Col1FromNextRow
From MyTable As A) As D
INNER JOIN MyTable As C On C.Col1 = D.Col1FromNextRow
) As E
WHERE NOT EXISTS
(SELECT Z.Col1 FROM MyTable As Z Where Z.Col1 < E.Col1 and Z.Col2 < E.Col2)
I don't have an environment to test this, so it probably has bugs. My apologies, but hopefully the idea is semi-clear.
I would still try to do it outside of SQL.