where like and order by on different tables/columns - mysql

For context: in the following examples, big_table has millions of rows and small_table has hundreds.
Here is the basic query I'm trying to run:
SELECT b.id
FROM big_table b
LEFT JOIN small_table s
ON b.small_id=s.id
WHERE s.name like 'something%'
ORDER BY b.name
LIMIT 10, 10;
This is slow, and I can understand why both indexes can't be used.
My initial idea was to split the query into parts.
This is fast:
SELECT id FROM small_table WHERE name like 'something%';
This is also fast:
SELECT id FROM big_table WHERE small_id IN (1, 2) ORDER BY name LIMIT 10, 10;
But, put together, it becomes slow:
SELECT id FROM big_table
WHERE small_id
IN (
SELECT id
FROM small_table WHERE name like 'something%'
)
ORDER BY name
LIMIT 10, 10;
Unless the subquery is re-evaluated for every row, it shouldn't be slower than executing both queries separately, right?
I'm looking for any help optimizing the initial query and understanding why the second one doesn't work.
EXPLAIN result for the last query:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | PRIMARY | small_table | range | PRIMARY, ix_small_name | ix_small_name | 768 | NULL | 1 | Using where; Using index; Using temporary; Using filesort |
| 1 | PRIMARY | big_table | ref | ix_join_foreign_key | ix_join_foreign_key | 9 | small_table.id | 11870 | |
Temporary solution:
SELECT id FROM big_table ignore index(ix_join_foreign_key)
WHERE small_id
IN (
SELECT id
FROM small_table ignore index(PRIMARY)
WHERE name like 'something%'
)
ORDER BY name
LIMIT 10, 10;
(The result and the EXPLAIN output are the same with EXISTS instead of IN.)
EXPLAIN output becomes:
| 1 | PRIMARY | big_table | index | NULL | ix_big_name | 768 | NULL | 20 | |
| 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 8 | func | 1 | |
| 2 | MATERIALIZED | small_table | range | ix_small_name | ix_small_name | 768 | NULL | 1 | Using where; Using index |
If anyone has a better solution, I'm still interested.

The problem that you are facing is that you have conditions on the small table but are trying to avoid a sort in the large table. In MySQL, I think you need to do at least a full table scan.
One step is to write the query using exists, as others have mentioned:
SELECT b.id
FROM big_table b
WHERE EXISTS (SELECT 1
FROM small_table s
WHERE s.name LIKE 'something%' AND s.id = b.small_id
)
ORDER BY b.name;
The question is: can you trick MySQL into doing the ORDER BY using an index? One possibility is to create suitable indexes. In this case, they are big_table(name, small_id, id) and small_table(id, name). The order of the columns in each index is important. Because the first is a covering index, MySQL might read through it in name order, picking out the rows whose small_id matches.
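As a rough sketch, the two indexes named above could be created as follows. The index names are made up for illustration; the table and column names come from the question. Whether the optimizer actually picks them is something only EXPLAIN on the real data can confirm.

```sql
-- Hypothetical index names; tables and columns are the ones from the question.

-- Covering index for the outer query: rows are read in name order,
-- small_id is available for the EXISTS check, and id is returned
-- without touching the table rows.
CREATE INDEX ix_big_name_small_id ON big_table (name, small_id, id);

-- Lets the EXISTS probe (s.id = b.small_id AND s.name LIKE ...) be
-- answered from the index alone.
CREATE INDEX ix_small_id_name ON small_table (id, name);
```

If id is already the InnoDB primary key of big_table, listing it explicitly is redundant but harmless.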

You are looking for an EXISTS or IN query. As MySQL is known to be weak on IN (particularly in versions before 5.6), I'd try EXISTS, even though I like IN better for its simplicity.
select id
from big_table b
where exists
(
select *
from small_table s
where s.id = b.small_id
and s.name like 'something%'
)
order by name
limit 10, 10;
It would help to have a good index on big_table. It should contain small_id first, to find the matches, then name for the sorting. In InnoDB, the primary key (presumably id here) is automatically included in every secondary index, as far as I know (otherwise id should also be added to the index). That gives you an index containing all the fields needed from big_table, in the desired order (a so-called covering index), so all the data can be read from the index alone and the table itself doesn't have to be accessed.
create index idx_big_quick on big_table(small_id, name);

You can try this:
SELECT b.id
FROM big_table b
JOIN small_table s
ON b.small_id = s.id
WHERE s.name like 'something%'
ORDER BY b.name;
or
SELECT b.id FROM big_table b
WHERE EXISTS(SELECT 1 FROM small_table s
WHERE s.name LIKE 'something%' AND s.id = b.small_id)
ORDER BY b.name;
NOTE: you don't seem to need LEFT JOIN. A left outer join will almost always result in a full table scan of big_table.
PS: make sure you have an index on big_table.small_id.

Plan A
SELECT b.id
FROM big_table b
JOIN small_table s ON b.small_id=s.id
WHERE s.name like 'something%'
ORDER BY b.name
LIMIT 10, 10;
(Note removal of LEFT.)
You need
small_table: INDEX(name, id)
big_table: INDEX(small_id), or, for 'covering': INDEX(small_id, name, id)
It will use the s index to find 'something%' and walk through. But it must find all such rows, and JOIN to b to find all such rows there. Only then can it do the ORDER BY, OFFSET, and LIMIT. There will be a filesort (which may happen in RAM).
The column order in the indexes is important.
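A minimal sketch of the Plan A indexes, with hypothetical index names (only the tables and columns are from the question). Running EXPLAIN afterwards should show whether they are picked up.

```sql
-- Hypothetical index names.

-- Range scan on name for LIKE 'something%'; id comes along for the join.
CREATE INDEX ix_small_name_id ON small_table (name, id);

-- Covering variant for big_table: look up by small_id, then read
-- name and id straight from the index.
CREATE INDEX ix_big_small_id_name ON big_table (small_id, name, id);

-- Check the resulting plan.
EXPLAIN
SELECT b.id
FROM big_table b
JOIN small_table s ON b.small_id = s.id
WHERE s.name LIKE 'something%'
ORDER BY b.name
LIMIT 10, 10;
```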
Plan B
The other suggestion may work well; it depends on various things.
SELECT b.id
FROM big_table b
WHERE EXISTS
( SELECT *
FROM small_table s
WHERE s.name LIKE 'something%'
AND s.id = b.small_id
)
ORDER BY b.name
LIMIT 10, 10;
That needs these:
big_table: INDEX(name), or for 'covering', INDEX(name, small_id, id)
small_table: INDEX(id, name), which is 'covering'
(Caveat: If you are doing something other than SELECT b.id, my comments about covering may be wrong.)
Which is faster (A or B)? Cannot predict without understanding the frequency of 'something%' and how 'many' the many-to-1 mapping is.
Settings
If these tables are InnoDB, then be sure that innodb_buffer_pool_size is set to about 70% of available RAM.
Pagination
Your use of OFFSET implies that you are 'paging' through the data. OFFSET is an inefficient way to do it, because the skipped rows still have to be read and thrown away. See my blog on such, but note that only Plan B will work with it.
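The blog is not reproduced here, so the following is only a sketch of the general "remember where you left off" technique applied to Plan B, not necessarily what the blog describes. It assumes b.name is NOT NULL and b.id is unique; @last_name and @last_id are hypothetical user variables that the application sets from the last row of the previous page.

```sql
-- First page: no OFFSET needed. name is selected so it can be remembered.
SELECT b.id, b.name
FROM big_table b
WHERE EXISTS (SELECT 1
              FROM small_table s
              WHERE s.name LIKE 'something%'
                AND s.id = b.small_id)
ORDER BY b.name, b.id
LIMIT 10;

-- Following pages: continue right after the last row already shown.
-- @last_name / @last_id are assumptions, set by the application.
SELECT b.id, b.name
FROM big_table b
WHERE (b.name > @last_name
       OR (b.name = @last_name AND b.id > @last_id))
  AND EXISTS (SELECT 1
              FROM small_table s
              WHERE s.name LIKE 'something%'
                AND s.id = b.small_id)
ORDER BY b.name, b.id
LIMIT 10;
```

Adding id to the ORDER BY makes the order unambiguous when several rows share the same name, which is what lets the second query resume at an exact position.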

Related

how to optimize this sql in mysql

I have a SQL query like this:
select t1.id,t1.type from collect t1
where t1.type='1' and t1.status='1'
and t1.total>(t1.totalComplate+t1.totalIng)
and id not in(
select tid from collect_log t2
where t2.tid=t1.id and t2.bid='1146')
limit 1;
It is OK, but its performance doesn't seem very good, and if I add an ORDER BY:
select t1.id,t1.type from collect t1
where t1.type='1' and t1.status='1'
and t1.total>(t1.totalComplate+t1.totalIng)
and id not in(
select tid from collect_log t2
where t2.tid=t1.id and t2.bid='1146')
order by t1.id asc
limit 1;
it becomes even worse.
How can I optimize this?
Here is the EXPLAIN output:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+--------------------+-------+------+---------------+-----+---------+-----------------+------+-----------------------------+
| 1 | PRIMARY | t1 | ref | i2,i1,i3 | i1 | 4 | const,const | 99 | Using where; Using filesort |
| 2 | DEPENDENT SUBQUERY | t2 | ref | i5 | i5 | 65 | img.t1.id,const | 2 | Using where; Using index |
1) If it's not already done, define an index on the collect.id column:
CREATE INDEX idx_collect_id ON collect (id);
Or possibly a unique index if you can (if id is never the same for any two rows):
CREATE UNIQUE INDEX idx_collect_id ON collect (id);
Maybe you need an index on collect_log.tid or collect_log.bid, too. Or even on both columns, like so :
CREATE INDEX idx_collect_log_tidbid ON collect_log (tid, bid);
Make it UNIQUE if it makes sense, that is, if no two rows in the table have the same values for the (tid, bid) pair. For instance, if these queries give the same result, it might be possible:
SELECT count(*) FROM collect_log;
SELECT count(DISTINCT tid, bid) FROM collect_log;
But don't make it UNIQUE if you're unsure what it means.
2) Verify the types of the columns collect.type, collect.status and collect_log.bid. In your query, you are comparing them with strings, but maybe they are defined as INT (or SMALLINT or TINYINT...)? In that case I advise you to drop the quotes around the numbers, so that the constants match the column type and no implicit conversion is needed.
select t1.id,t1.type from collect t1
where t1.type=1 and t1.status=1
and t1.total>(t1.totalComplate+t1.totalIng)
and id not in(
select tid from collect_log t2
where t2.tid=t1.id and t2.bid=1146)
order by t1.id asc
limit 1;
3) If that still doesn't help, just add EXPLAIN in front of your query, and you'll get the execution plan. Paste the results here and we can help you make some sense out of it. Actually, I would advise you to do this step before creating any new index.
I'd try to get rid of the IN statement using a LEFT JOIN first. The bid condition has to go into the ON clause, and the WHERE clause then keeps only the rows for which no matching log entry was found.
Something like this (untested):
select t1.id,t1.type
from collect t1
LEFT JOIN collect_log t2 ON t1.id = t2.tid AND t2.bid = '1146'
where t1.type='1'
and t1.status='1'
and t1.total>(t1.totalComplate+t1.totalIng)
and t2.tid IS NULL
order by t1.id asc
limit 1;
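The "no matching log row" condition from the question can also be written with NOT EXISTS, which states the intent directly. This is a sketch using only the tables and columns from the question; whether it is faster than NOT IN depends on the MySQL version and the indexes, so compare the EXPLAIN output before settling on it.

```sql
-- Anti-join written with NOT EXISTS: keep a collect row only if there is
-- no collect_log row for it with bid = '1146'.
SELECT t1.id, t1.type
FROM collect t1
WHERE t1.type = '1'
  AND t1.status = '1'
  AND t1.total > (t1.totalComplate + t1.totalIng)
  AND NOT EXISTS (SELECT 1
                  FROM collect_log t2
                  WHERE t2.tid = t1.id
                    AND t2.bid = '1146')
ORDER BY t1.id ASC
LIMIT 1;
```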

Query is too slow when there are no results. How to improve it?

I have three tables
filters (id, name)
items(item_id, name)
items_filters(item_id, filter_id, value_id)
values(id, filter_id, filter_value)
about 20000 entries in items.
about 80000 entries in items_filters.
SELECT i.*
FROM items_filters itf INNER JOIN items i ON i.item_id = itf.item_id
WHERE (itf.filter_id = 1 AND itf.value_id = '1')
OR (itf.filter_id = 2 AND itf.value_id = '7')
GROUP BY itf.item_id
WITH ROLLUP
HAVING COUNT(*) = 2
LIMIT 0,10;
It takes 0.008 when there are entries that match the query and 0.05 when no entries match.
I tried different variations before:
SELECT * FROM items WHERE item_id IN (
SELECT `item_id`
FROM `items_filters`
WHERE (`filter_id`='1' AND `value_id`=1)
OR (`filter_id`='2' AND `value_id`=7)
GROUP BY `item_id`
HAVING COUNT(*) = 2
) LIMIT 0,6;
This completely freezes mysql when there are no entries.
What I really don't get is that
SELECT i.*
FROM items_filters itf INNER JOIN items i ON i.item_id = itf.item_id
WHERE itf.filter_id = 1 AND itf.value_id = '1' LIMIT 0,1
takes ~0.05 when no entries are found and ~0.008 when there are some.
EXPLAIN:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | i | ALL | PRIMARY | NULL | NULL | NULL | 10 | Using temporary; Using filesort |
| 1 | SIMPLE | itf | ref | item_id | item_id | 4 | ss_stylet.i.item_id | 1 | Using where; Using index |
Aside from ensuring an index on items_filters on (filter_id, value_id), I would prequalify your item IDs up front with a GROUP BY, THEN join to the items table. It looks like you are trying to find items that meet two specific conditions and, for those, grab the item details...
I've also left the "group by with rollup" in the outer, even though there will be a single instance per ID returned from the inner query. But since the inner query is already applying the limit of 0,10 records, it's not throwing too many results to be joined to your items table.
However, since you are not doing any aggregates, I believe the outer group by and rollup are not really going to provide you any benefit and could otherwise be removed.
SELECT i.*
FROM
( select itf.item_id
from items_filters itf
WHERE (itf.filter_id = 1 AND itf.value_id = '1')
OR (itf.filter_id = 2 AND itf.value_id = '7')
GROUP BY itf.item_id
HAVING COUNT(*) = 2
LIMIT 0, 10 ) PreQualified
JOIN items i
ON PreQualified.item_id = i.item_id
Another approach MIGHT be to do a JOIN in the inner query so you don't even need to apply a GROUP BY and HAVING. Since you are explicitly looking for exactly two matching entries, I would then try the following. This way, the first qualifier is that it MUST have an entry with filter_id = 1 and value_id = '1'. If it doesn't even hit THAT entry, it would never CARE about the second. Then, by applying a join to the same table (aliased itf2), it has to find, on that same item ID, the conditions for the second (filter_id = 2, value_id = '7'). This basically forces a look almost like a single pass against the one entry FIRST and foremost before CONSIDERING anything else. That would STILL result in your limited set of 10 before getting item details.
SELECT i.*
FROM
( select itf.item_id
from items_filters itf
join items_filters itf2
on itf.item_id = itf2.item_id
AND itf2.filter_id = 2
AND itf2.value_id = '7'
WHERE
itf.filter_id = 1 AND itf.value_id = '1'
LIMIT 0, 10 ) PreQualified
JOIN items i
ON PreQualified.item_id = i.item_id
I also removed the group by / with rollup as per your comment of duplicates (which is what I expected).
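The index mentioned at the start of this answer might look like the sketch below. The index name is hypothetical, and appending item_id is an addition of mine so that both inner queries can be answered from the index alone.

```sql
-- Hypothetical index name. filter_id and value_id serve the WHERE / ON
-- conditions; item_id is appended so the inner queries never need to
-- read the items_filters rows themselves.
CREATE INDEX ix_itf_filter_value_item
    ON items_filters (filter_id, value_id, item_id);

-- Check that the inner self-join now uses it.
EXPLAIN
SELECT itf.item_id
FROM items_filters itf
JOIN items_filters itf2
  ON itf.item_id = itf2.item_id
 AND itf2.filter_id = 2
 AND itf2.value_id = '7'
WHERE itf.filter_id = 1
  AND itf.value_id = '1'
LIMIT 0, 10;
```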
That looks like four tables to me.
Run EXPLAIN on the query and look for a full table scan (type ALL). If you see one, add indexes on the columns in the WHERE clauses. Those will certainly help.

Why is MySQL using the wrong index?

MySQL is using an index on (faver_profile_id,removed,notice_id) when it should be using the index on (faver_profile_id,removed,id). The weird thing is that for some values of faver_profile_id it does use the correct index. I can use FORCE INDEX, which drastically speeds up the query, but I'd like to figure out why MySQL is doing this.
This is a new table (35m rows) copied from another table using INSERT INTO.. SELECT FROM.
I did not run OPTIMIZE TABLE or ANALYZE after. Could that help?
SELECT `Item`.`id` , `Item`.`cached_image` , `Item`.`submitter_id` , `Item`.`source_title` , `Item`.`source_url` , `Item`.`source_image` , `Item`.`nudity` , `Item`.`tags` , `Item`.`width` , `Item`.`height` , `Item`.`tumblr_id` , `Item`.`tumblr_reblog_key` , `Item`.`fave_count` , `Item`.`file_size` , `Item`.`animated` , `Favorite`.`id` , `Favorite`.`created`
FROM `favorites` AS `Favorite`
LEFT JOIN `items` AS `Item` ON ( `Favorite`.`notice_id` = `Item`.`id` )
WHERE `faver_profile_id` =11619
AND `Favorite`.`removed` =0
AND `Item`.`removed` =0
AND `nudity` =0
ORDER BY `Favorite`.`id` DESC
LIMIT 26
Query execution plan: "idx_notice_id_profile_id" is an index on (faver_profile_id,removed,notice_id)
1 | SIMPLE | Favorite | ref | idx_faver_idx_id,idx_notice_id_profile_id,notice_id_idx | idx_notice_id_profile_id | 4 | const,const | 15742 | Using where; Using filesort |
1 | SIMPLE | Item | eq_ref | PRIMARY | PRIMARY | 4 | gragland_imgfave.Favorite.notice_id | 1 | Using where
I don't know whether it's causing any confusion, but moving some of the AND qualifiers into the Item join might help, as they relate directly to Item and not to Favorite. In addition, I've explicitly qualified the table.field references where they were otherwise missing.
SELECT
Item.id,
Item.cached_image,
Item.submitter_id,
Item.source_title,
Item.source_url,
Item.source_image,
Item.nudity,
Item.tags,
Item.width,
Item.height,
Item.tumblr_id,
Item.tumblr_reblog_key,
Item.fave_count,
Item.file_size,
Item.animated,
Favorite.id,
Favorite.created
FROM favorites AS Favorite
LEFT JOIN items AS Item
ON Favorite.notice_id = Item.id
AND Item.Removed = 0
AND Item.Nudity = 0
WHERE Favorite.faver_profile_id = 11619
AND Favorite.removed = 0
ORDER BY Favorite.id DESC
LIMIT 26
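So now, for the favorites table, the criteria are explicitly down to faver_profile_id, removed and id (for the ordering). Be aware that with a LEFT JOIN, conditions in the ON clause no longer filter out favorites: a favorite whose item is removed or flagged for nudity is still returned, with NULL Item columns. Change LEFT JOIN to JOIN if such rows should be excluded, as in the original query.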
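The question also asks whether ANALYZE could help and mentions FORCE INDEX. A sketch of both follows. It assumes that idx_faver_idx_id is the index on (faver_profile_id, removed, id): that name appears in possible_keys, but the question does not say which columns it covers. The column list is shortened for readability.

```sql
-- Refresh the index statistics the optimizer uses for its row estimates.
-- This may or may not change the chosen index.
ANALYZE TABLE favorites;

-- See which index is picked afterwards.
EXPLAIN
SELECT Favorite.id, Favorite.created
FROM favorites AS Favorite
WHERE Favorite.faver_profile_id = 11619
  AND Favorite.removed = 0
ORDER BY Favorite.id DESC
LIMIT 26;

-- If the plan is still poor, pin the index explicitly.
-- idx_faver_idx_id is an assumed name for the (faver_profile_id, removed, id) index.
SELECT Favorite.id, Favorite.created
FROM favorites AS Favorite FORCE INDEX (idx_faver_idx_id)
WHERE Favorite.faver_profile_id = 11619
  AND Favorite.removed = 0
ORDER BY Favorite.id DESC
LIMIT 26;
```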

Is there a better way to do these MySQL queries?

Currently I have three different tables and three different queries, which are very similar to each other, with almost the same joins. I was trying to combine all three queries into one, so far without much success. I would be very happy if someone has a better solution or could point me in the right direction. Thanks.
0.0013
SELECT `ilan_genel`.`id`, `ilan_genel`.`durum`, `ilan_genel`.`kategori`, `ilan_genel`.`tip`, `ilan_genel`.`ozellik`, `ilan_genel`.`m2`, `ilan_genel`.`fiyat`, `ilan_genel`.`baslik`, `ilan_genel`.`ilce`, `ilan_genel`.`mahalle`, `ilan_genel`.`parabirimi`, `kgsim_ilceler`.`isim` as ilce, (
SELECT ilanresimler.resimlink
FROM ilanresimler
WHERE ilanresimler.ilanid = ilan_genel.id LIMIT 1
) AS resim
FROM (`ilan_genel`)
LEFT JOIN `kgsim_ilceler` ON `kgsim_ilceler`.`id` = `ilan_genel`.`ilce`
ORDER BY `id` desc
LIMIT 30
0.0006
SELECT `video`.`id`, `video`.`url`, `ilan_genel`.`ilce`, `ilan_genel`.`tip`, `ilan_genel`.`m2`, `ilan_genel`.`ozellik`, `ilan_genel`.`fiyat`, `ilan_genel`.`parabirimi`, `ilan_genel`.`kullanici`, `ilanresimler`.`resimlink` as resim, `uyeler`.`isim` as isim, `uyeler`.`soyisim` as soyisim, `kgsim_ilceler`.`isim` as ilce
FROM (`video`)
LEFT JOIN `ilan_genel` ON `ilan_genel`.`id` = `video`.`id`
LEFT JOIN `kgsim_ilceler` ON `kgsim_ilceler`.`id` = `ilan_genel`.`ilce`
LEFT JOIN `ilanresimler` ON `ilanresimler`.`id` = `ilan_genel`.`resim`
LEFT JOIN `uyeler` ON `uyeler`.`id` = `ilan_genel`.`kullanici`
ORDER BY `siralama` desc
LIMIT 30
0.0005
SELECT `sanaltur`.`id`, `ilan_genel`.`ilce`, `ilan_genel`.`tip`, `ilan_genel`.`m2`, `ilan_genel`.`ozellik`, `ilan_genel`.`fiyat`, `ilan_genel`.`parabirimi`, `ilan_genel`.`kullanici`, `ilanresimler`.`resimlink` as resim, `uyeler`.`isim` as isim, `uyeler`.`soyisim` as soyisim, `kgsim_ilceler`.`isim` as ilce
FROM (`sanaltur`)
LEFT JOIN `ilan_genel` ON `ilan_genel`.`id` = `sanaltur`.`id`
LEFT JOIN `kgsim_ilceler` ON `kgsim_ilceler`.`id` = `ilan_genel`.`ilce`
LEFT JOIN `ilanresimler` ON `ilanresimler`.`id` = `ilan_genel`.`resim`
LEFT JOIN `uyeler` ON `uyeler`.`id` = `ilan_genel`.`kullanici`
ORDER BY `siralama` desc
LIMIT 30
These are actually three very different queries. I don't think you will be able to usefully combine them. Also, they seem pretty fast to me.
However, if you want to try to optimize each individual query, you can use EXPLAIN SELECT to find out whether each query uses appropriate indexes.
For example:
EXPLAIN SELECT *
FROM A
WHERE foo NOT IN (1,4,5,6);
Might yield:
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | A     | ALL  | NULL          | NULL | NULL    | NULL |    2 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
In this case, the query had no possible_keys and therefore used no (or NULL) key to do the query. It's the key column you'd be interested in.
More information here:
http://dev.mysql.com/doc/refman/5.5/en/explain.html
http://dev.mysql.com/doc/refman/5.5/en/optimization-indexes.html

How to select an item, the one below and the one above in MySQL

I have a database with IDs that are non-integers, like this:
b01
b02
b03
d01
d02
d03
d04
s01
s02
s03
s04
s05
etc. The letters represent the type of product, the numbers the next one in that group.
I'd like to be able to select an ID, say d01, and get b03, d01, d02 back. How do I do this in MySQL?
Here is another way to do it using UNIONs. I think this is a little easier to understand and more flexible than the accepted answer. Note that the example assumes the id field is unique, which appears to be the case based on your question.
The SQL query below assumes your table is called demo and has a single unique id field, and the table has been populated with the values you listed in your question.
( SELECT id FROM demo WHERE STRCMP('d01', id) > 0 ORDER BY id DESC LIMIT 1 )
UNION
( SELECT id FROM demo WHERE id = 'd01' ORDER BY id )
UNION
( SELECT id FROM demo WHERE STRCMP('d01', id) < 0 ORDER BY id ASC LIMIT 1 )
ORDER BY id
It produces the following result: b03, d01, d02.
This solution is flexible because you can change each of the LIMIT 1 statements to LIMIT N where N is any number. That way you can get the previous 3 rows and the following 6 rows, for example.
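For instance, the "previous 3 and following 6" case could look like the sketch below. It uses plain < and > comparisons in place of STRCMP, which is a substitution of mine: the result is the same, and a plain comparison on id can use an index on that column.

```sql
-- Up to 3 rows before 'd01', the row itself, and up to 6 rows after it.
-- Table and column names are the ones assumed in the answer above.
( SELECT id FROM demo WHERE id < 'd01' ORDER BY id DESC LIMIT 3 )
UNION
( SELECT id FROM demo WHERE id = 'd01' )
UNION
( SELECT id FROM demo WHERE id > 'd01' ORDER BY id ASC LIMIT 6 )
ORDER BY id;
```

With only the twelve values listed in the question, this should return b01, b02, b03, d01, d02, d03, d04, s01, s02, s03.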
Note: this is from Microsoft SQL Server, but the only thing that needs tweaking for MySQL is the isnull function (use IFNULL or COALESCE instead).
select *
from test m
where id between isnull((select max(id) from test where id < 'd01'),'d01')
and isnull((select min(id) from test where id > 'd01'),'d01')
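In MySQL, the same idea might be written with COALESCE in place of isnull. This is a sketch that assumes a table named test with a unique id column, as in the query above.

```sql
-- Everything between the closest id below 'd01' and the closest id above it.
-- COALESCE falls back to 'd01' itself when there is no neighbour on that side.
SELECT id
FROM test
WHERE id BETWEEN COALESCE((SELECT MAX(id) FROM test WHERE id < 'd01'), 'd01')
             AND COALESCE((SELECT MIN(id) FROM test WHERE id > 'd01'), 'd01')
ORDER BY id;
```

Because the bounds are the immediate neighbours, the range contains at most three rows when id is unique.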
Find your target row,
SELECT p.id FROM product AS p WHERE p.id = 'd01'
and the row above it with no other row between the two.
LEFT JOIN product AS p1 ON p1.id > p.id -- gets the rows above it
LEFT JOIN -- gets the rows between the two which needs to not exist
product AS p1a ON p1a.id > p.id AND p1a.id < p1.id
and similarly for the row below it. (Left as an exercise for the reader.)
In my experience this is also quite efficient.
SELECT
p.id, p1.id, p2.id
FROM
product AS p
LEFT JOIN
product AS p1 ON p1.id > p.id
LEFT JOIN
product AS p1a ON p1a.id > p.id AND p1a.id < p1.id
LEFT JOIN
product AS p2 ON p2.id < p.id
LEFT JOIN
product AS p2a ON p2a.id < p.id AND p2a.id > p2.id
WHERE
p.id = 'd01'
AND p1a.id IS NULL
AND p2a.ID IS NULL
Although not a direct answer to your question, I personally wouldn't rely on the natural order, since it may change due to imports/exports and produce side effects not easily understood by fellow programmers. What about creating an alternate INTEGER index and firing off another query, such as "WHERE id > ...yourdesiredid ... LIMIT 1"?
mysql> describe test;
+-------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id | varchar(50) | YES | | NULL | |
+-------+-------------+------+-----+---------+-------+
mysql> select * from test;
+------+
| id |
+------+
| b01 |
| b02 |
| b03 |
| b04 |
+------+
mysql> select * from test where id >= 'b02' LIMIT 3;
+------+
| id |
+------+
| b02 |
| b03 |
| b04 |
+------+
What about using a cursor? This would let you traverse the returned set one row at a time. Using it with two variables (like "current" and "last"), you could inchworm along the result until you hit your target. Then return the value of "last" (for n-1), your entered target (n), and then iterate one more time and return the "current" (n+1).