Is there a better way to do these MySQL queries?

Currently I have three different tables and three different queries, which are very similar to each other, with almost the same joins. I was trying to combine all three queries into one query, without much success so far. I would be very happy if someone has a better solution or a direction to point me in. Thanks.
Query 1 (0.0013 s):
SELECT `ilan_genel`.`id`, `ilan_genel`.`durum`, `ilan_genel`.`kategori`, `ilan_genel`.`tip`, `ilan_genel`.`ozellik`, `ilan_genel`.`m2`, `ilan_genel`.`fiyat`, `ilan_genel`.`baslik`, `ilan_genel`.`ilce`, `ilan_genel`.`mahalle`, `ilan_genel`.`parabirimi`, `kgsim_ilceler`.`isim` as ilce, (
SELECT ilanresimler.resimlink
FROM ilanresimler
WHERE ilanresimler.ilanid = ilan_genel.id LIMIT 1
) AS resim
FROM (`ilan_genel`)
LEFT JOIN `kgsim_ilceler` ON `kgsim_ilceler`.`id` = `ilan_genel`.`ilce`
ORDER BY `id` desc
LIMIT 30
Query 2 (0.0006 s):
SELECT `video`.`id`, `video`.`url`, `ilan_genel`.`ilce`, `ilan_genel`.`tip`, `ilan_genel`.`m2`, `ilan_genel`.`ozellik`, `ilan_genel`.`fiyat`, `ilan_genel`.`parabirimi`, `ilan_genel`.`kullanici`, `ilanresimler`.`resimlink` as resim, `uyeler`.`isim` as isim, `uyeler`.`soyisim` as soyisim, `kgsim_ilceler`.`isim` as ilce
FROM (`video`)
LEFT JOIN `ilan_genel` ON `ilan_genel`.`id` = `video`.`id`
LEFT JOIN `kgsim_ilceler` ON `kgsim_ilceler`.`id` = `ilan_genel`.`ilce`
LEFT JOIN `ilanresimler` ON `ilanresimler`.`id` = `ilan_genel`.`resim`
LEFT JOIN `uyeler` ON `uyeler`.`id` = `ilan_genel`.`kullanici`
ORDER BY `siralama` desc
LIMIT 30
Query 3 (0.0005 s):
SELECT `sanaltur`.`id`, `ilan_genel`.`ilce`, `ilan_genel`.`tip`, `ilan_genel`.`m2`, `ilan_genel`.`ozellik`, `ilan_genel`.`fiyat`, `ilan_genel`.`parabirimi`, `ilan_genel`.`kullanici`, `ilanresimler`.`resimlink` as resim, `uyeler`.`isim` as isim, `uyeler`.`soyisim` as soyisim, `kgsim_ilceler`.`isim` as ilce
FROM (`sanaltur`)
LEFT JOIN `ilan_genel` ON `ilan_genel`.`id` = `sanaltur`.`id`
LEFT JOIN `kgsim_ilceler` ON `kgsim_ilceler`.`id` = `ilan_genel`.`ilce`
LEFT JOIN `ilanresimler` ON `ilanresimler`.`id` = `ilan_genel`.`resim`
LEFT JOIN `uyeler` ON `uyeler`.`id` = `ilan_genel`.`kullanici`
ORDER BY `siralama` desc
LIMIT 30

These are actually three very different queries. I don't think you will be able to usefully combine them. Also, they seem pretty fast to me.
However, if you want to try to optimize each individual query, you can use EXPLAIN SELECT to find out whether or not each query uses appropriate indexes.
For example:
EXPLAIN SELECT *
FROM A
WHERE foo NOT IN (1,4,5,6);
Might yield:
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
| 1  | SIMPLE      | A     | ALL  | NULL          | NULL | NULL    | NULL | 2    | Using where |
+----+-------------+-------+------+---------------+------+---------+------+------+-------------+
In this case, the query had no possible_keys and therefore used no key (NULL) to run the query. The key column is the one you'd be interested in.
More information here:
http://dev.mysql.com/doc/refman/5.5/en/explain.html
http://dev.mysql.com/doc/refman/5.5/en/optimization-indexes.html
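To try the EXPLAIN advice without a MySQL server, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. SQLite's EXPLAIN QUERY PLAN is not MySQL's EXPLAIN: it has no key or possible_keys columns, only a textual detail column, so this only illustrates the idea. The table A, its columns and the index idx_a_foo are hypothetical. The plan for an equality filter on foo should change from a full scan to an index search once the index exists.

```python
import sqlite3

def plan_details(conn, sql):
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
# Hypothetical table; 'bar' keeps an index on foo from being a covering index.
conn.execute("CREATE TABLE A (id INTEGER PRIMARY KEY, foo INTEGER, bar TEXT)")
conn.executemany("INSERT INTO A (foo, bar) VALUES (?, ?)",
                 [(i % 10, "x") for i in range(1000)])

query = "SELECT * FROM A WHERE foo = 3"
before = plan_details(conn, query)   # no usable index yet: a SCAN of A
conn.execute("CREATE INDEX idx_a_foo ON A (foo)")
after = plan_details(conn, query)    # equality on foo can now SEARCH via idx_a_foo

print(before)
print(after)
```

In MySQL the equivalent check is to run EXPLAIN before and after adding the index and compare the key column.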

Related

WHERE LIKE and ORDER BY on different tables/columns

For context: in the following examples, big_table has millions of rows and small_table has hundreds.
Here is the basic query I'm trying to run:
SELECT b.id
FROM big_table b
LEFT JOIN small_table s
ON b.small_id=s.id
WHERE s.name like 'something%'
ORDER BY b.name
LIMIT 10, 10;
This is slow, and I understand why both indexes can't be used.
My initial idea was to split the query into parts.
This is fast:
SELECT id FROM small_table WHERE name like 'something%';
This is also fast:
SELECT id FROM big_table WHERE small_id IN (1, 2) ORDER BY name LIMIT 10, 10;
But, put together, it becomes slow:
SELECT id FROM big_table
WHERE small_id
IN (
SELECT id
FROM small_table WHERE name like 'something%'
)
ORDER BY name
LIMIT 10, 10;
Unless the subquery is re-evaluated for every row, it shouldn't be slower than executing both queries separately, right?
I'm looking for any help optimizing the initial query and understanding why the second one doesn't work.
EXPLAIN result for the last query:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | PRIMARY | small_table | range | PRIMARY, ix_small_name | ix_small_name | 768 | NULL | 1 | Using where; Using index; Using temporary; Using filesort |
| 1 | PRIMARY | big_table | ref | ix_join_foreign_key | ix_join_foreign_key | 9 | small_table.id | 11870 | |
Temporary solution:
SELECT id FROM big_table ignore index(ix_join_foreign_key)
WHERE small_id
IN (
SELECT id
FROM small_table ignore index(PRIMARY)
WHERE name like 'something%'
)
ORDER BY name
LIMIT 10, 10;
(The result and the EXPLAIN output are the same with EXISTS instead of IN.)
EXPLAIN output becomes:
| 1 | PRIMARY | big_table | index | NULL | ix_big_name | 768 | NULL | 20 | |
| 1 | PRIMARY | <subquery2> | eq_ref | distinct_key | distinct_key | 8 | func | 1 | |
| 2 | MATERIALIZED | small_table | range | ix_small_name | ix_small_name | 768 | NULL | 1 | Using where; Using index |
If anyone has a better solution, I'm still interested.
The problem that you are facing is that you have conditions on the small table but are trying to avoid a sort in the large table. In MySQL, I think you need to do at least a full table scan.
One step is to write the query using EXISTS, as others have mentioned:
SELECT b.id
FROM big_table b
WHERE EXISTS (SELECT 1
FROM small_table s
WHERE s.name LIKE 'something%' AND s.id = b.small_id
)
ORDER BY b.name;
The question is: can you trick MySQL into doing the ORDER BY using an index? One possibility is to provide the appropriate indexes. In this case, they are big_table(name, small_id, id) and small_table(id, name). The order of the columns in each index is important. Because the first is a covering index, MySQL might read through it in order by name, picking out the appropriate ids.
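The EXISTS, IN and inner JOIN forms discussed in this thread should return the same rows here because small_table.id is unique, so the join cannot duplicate big_table rows. A minimal sketch that checks this on made-up data, with sqlite3 standing in for MySQL (so it says nothing about MySQL's plans or speed):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE small_table (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, small_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO small_table VALUES (?, ?)",
                 [(1, "something-a"), (2, "something-b"), (3, "other")])
# Made-up rows: small_id cycles through 1, 2, 3; names sort in insertion order.
conn.executemany("INSERT INTO big_table (small_id, name) VALUES (?, ?)",
                 [((i % 3) + 1, f"row{i:03d}") for i in range(60)])

# MySQL's "LIMIT 10, 10" is written "LIMIT 10 OFFSET 10" here.
queries = {
    "exists": """SELECT b.id FROM big_table b
                 WHERE EXISTS (SELECT 1 FROM small_table s
                               WHERE s.name LIKE 'something%' AND s.id = b.small_id)
                 ORDER BY b.name LIMIT 10 OFFSET 10""",
    "in": """SELECT b.id FROM big_table b
             WHERE b.small_id IN (SELECT id FROM small_table WHERE name LIKE 'something%')
             ORDER BY b.name LIMIT 10 OFFSET 10""",
    "join": """SELECT b.id FROM big_table b JOIN small_table s ON b.small_id = s.id
               WHERE s.name LIKE 'something%'
               ORDER BY b.name LIMIT 10 OFFSET 10""",
}
results = {label: [row[0] for row in conn.execute(sql)] for label, sql in queries.items()}
print(results)
```

Which form is fastest in MySQL still has to be checked with EXPLAIN on the real data.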
You are looking for an EXISTS or IN query. As MySQL is known to be weak on IN, I'd try EXISTS in spite of liking IN better for its simplicity.
select id
from big_table b
where exists
(
select *
from small_table s
where s.id = b.small_id
and s.name like 'something%'
)
order by name
limit 10, 10;
It would be helpful to have a good index on big_table. It should contain small_id first, to find the matches, then name, for the sorting. With InnoDB, every secondary index implicitly includes the primary key, so id is already in it (with MyISAM it would have to be added to the index). You would then have an index containing all the fields needed from big_table in the desired order (a covering index), so all the data can be read from the index alone and the table itself doesn't have to be accessed.
create index idx_big_quick on big_table(small_id, name);
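A sketch of the covering-index idea, again with sqlite3 standing in for MySQL and made-up data; idx_big_quick is the index from the answer, while the payload column is hypothetical. In SQLite the INTEGER PRIMARY KEY id is the rowid, which every index entry carries, loosely like InnoDB secondary indexes carrying the primary key. For a single small_id value, the plan should report a covering index and no temporary B-tree for the ORDER BY.

```python
import sqlite3

def plan_details(conn, sql):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

conn = sqlite3.connect(":memory:")
# 'payload' is a hypothetical extra column that the index does not contain.
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, small_id INTEGER, "
             "name TEXT, payload TEXT)")
conn.executemany("INSERT INTO big_table (small_id, name, payload) VALUES (?, ?, ?)",
                 [(i % 50, f"name{i}", "x" * 20) for i in range(2000)])
conn.execute("CREATE INDEX idx_big_quick ON big_table (small_id, name)")

# id, small_id and name all live in the index, and within one small_id value
# the index entries are already sorted by name.
sql = "SELECT id FROM big_table WHERE small_id = 7 ORDER BY name"
details = plan_details(conn, sql)
print(details)
```

With several small_id values (an IN list or a join), the index only gives name order within each small_id, so a sort would normally come back; that is the trade-off the other answers discuss.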
You can try this:
SELECT b.id
FROM big_table b
JOIN small_table s
ON b.small_id = s.id
WHERE s.name like 'something%'
ORDER BY b.name;
or
SELECT b.id FROM big_table b
WHERE EXISTS(SELECT 1 FROM small_table s
WHERE s.name LIKE 'something%' AND s.id = b.small_id)
ORDER BY b.name;
NOTE: you don't seem to need a LEFT JOIN, because the WHERE condition on s.name discards the NULL-padded rows it would add anyway. A left outer join will often result in a full table scan of big_table.
P.S. Make sure you have an index on big_table.small_id.
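To see why the LEFT JOIN is unnecessary here, a minimal sketch on made-up rows, with sqlite3 as a stand-in for MySQL: a NULL-padded row has s.name = NULL, and NULL LIKE 'something%' is not true, so the WHERE filter removes exactly the rows the LEFT JOIN added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE small_table (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE big_table (id INTEGER PRIMARY KEY, small_id INTEGER, name TEXT);
INSERT INTO small_table VALUES (1, 'something-a'), (2, 'other');
-- Row 3 points at a missing small_table row, so a LEFT JOIN pads it with NULLs.
INSERT INTO big_table VALUES (1, 1, 'a'), (2, 2, 'b'), (3, 99, 'c'), (4, 1, 'd');
""")

left_sql = """SELECT b.id FROM big_table b LEFT JOIN small_table s ON b.small_id = s.id
              WHERE s.name LIKE 'something%' ORDER BY b.name"""
inner_sql = """SELECT b.id FROM big_table b JOIN small_table s ON b.small_id = s.id
               WHERE s.name LIKE 'something%' ORDER BY b.name"""
# Without the WHERE filter, the LEFT JOIN keeps every big_table row.
left_all_sql = """SELECT b.id FROM big_table b LEFT JOIN small_table s ON b.small_id = s.id
                  ORDER BY b.name"""

left_ids = [r[0] for r in conn.execute(left_sql)]
inner_ids = [r[0] for r in conn.execute(inner_sql)]
left_all_ids = [r[0] for r in conn.execute(left_all_sql)]
print(left_ids, inner_ids, left_all_ids)  # [1, 4] [1, 4] [1, 2, 3, 4]
```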
Plan A
SELECT b.id
FROM big_table b
JOIN small_table s ON b.small_id=s.id
WHERE s.name like 'something%'
ORDER BY b.name
LIMIT 10, 10;
(Note removal of LEFT.)
You need
small_table: INDEX(name, id)
big_table: INDEX(small_id), or, for 'covering': INDEX(small_id, name, id)
It will use the index on s to find the names matching 'something%' and walk through them. But it must find all such rows, then JOIN to b to find all the matching rows there. Only then can it do the ORDER BY, OFFSET, and LIMIT. There will be a filesort (which may happen in RAM).
The column order in the indexes is important.
Plan B
The other suggestion may work well; it depends on various things.
SELECT b.id
FROM big_table b
WHERE EXISTS
( SELECT *
FROM small_table s
WHERE s.name LIKE 'something%'
AND s.id = b.small_id
)
ORDER BY b.name
LIMIT 10, 10;
That needs these:
big_table: INDEX(name), or for 'covering', INDEX(name, small_id, id)
small_table: INDEX(id, name), which is 'covering'
(Caveat: If you are doing something other than SELECT b.id, my comments about covering may be wrong.)
Which is faster, A or B? I can't predict that without knowing how frequent 'something%' is and how 'many' the many-to-1 mapping is.
Settings
If these tables are InnoDB, then be sure that innodb_buffer_pool_size is set to about 70% of available RAM.
Pagination
Your use of OFFSET implies that you are 'paging' through the data. OFFSET is an inefficient way to do that. See my blog on the topic, but note that only Plan B will work with that technique.
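The blog itself is not linked here, so what follows is only a generic illustration of one common alternative to OFFSET, keyset ("seek") pagination: remember the last (name, id) already shown and ask for the rows after it. It is not necessarily the technique the blog describes. sqlite3 stands in for MySQL and the data is made up; id is used as a tie-breaker because names repeat.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE big_table (id INTEGER PRIMARY KEY, name TEXT)")
# Made-up data with many duplicate names, so the id tie-breaker matters.
conn.executemany("INSERT INTO big_table (name) VALUES (?)",
                 [(f"n{i % 7}",) for i in range(50)])
# SQLite stores the rowid (id) in every index entry, so this index is
# effectively ordered by (name, id).
conn.execute("CREATE INDEX ix_big_name ON big_table (name)")

PAGE_SIZE = 10

def page_by_offset(page_no):
    """OFFSET paging: the engine must step over page_no * PAGE_SIZE rows first."""
    return conn.execute(
        "SELECT id, name FROM big_table ORDER BY name, id LIMIT ? OFFSET ?",
        (PAGE_SIZE, page_no * PAGE_SIZE)).fetchall()

def page_after(last_row):
    """Keyset paging: continue right after the last (id, name) already shown."""
    if last_row is None:
        return conn.execute(
            "SELECT id, name FROM big_table ORDER BY name, id LIMIT ?",
            (PAGE_SIZE,)).fetchall()
    last_id, last_name = last_row
    return conn.execute(
        "SELECT id, name FROM big_table "
        "WHERE name > ? OR (name = ? AND id > ?) "
        "ORDER BY name, id LIMIT ?",
        (last_name, last_name, last_id, PAGE_SIZE)).fetchall()

keyset_pages, last_row = [], None
while True:
    rows = page_after(last_row)
    if not rows:
        break
    keyset_pages.append(rows)
    last_row = rows[-1]

offset_pages = [page_by_offset(n) for n in range(len(keyset_pages))]
print(len(keyset_pages), keyset_pages == offset_pages)  # 5 True
```

The keyset query only has to locate its starting point, whereas OFFSET has to read and discard every earlier row, which grows with the page number.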

Rows column in Query Plan confusing

I have a MySQL query:
SELECT TE.company_id,
SUM(TE.debit- TE.credit) As summation
FROM Transactions T JOIN Transaction_E TE2
ON (T.parent_id = TE2.transaction_id)
JOIN Transaction_E TE
ON (TE.transaction_id = T.id AND TE.company_id IS NOT NULL)
JOIN Accounts A
ON (TE2.account_id=A.id AND A.deactivated_timestamp=0)
WHERE (TE.company_id IN (1,2))
AND A.user_id=2341 GROUP BY TE.company_id;
When I EXPLAIN the query, the plan looks like this (summarized):
| Select type | table | type | rows |
-------------------------------------
| SIMPLE | A | ref | 2 |
| SIMPLE | TE2 | ref | 17 |
| SIMPLE | T | ref | 1 |
| SIMPLE | TE | ref | 1 |
But if I run COUNT(*) on the same query (instead of SUM(...)), it shows ~40k rows for a particular company_id. What I don't understand is why the query plan shows so few rows being scanned when at least 40k rows are being processed. What does the rows column in the query plan represent? Doesn't it represent the number of rows processed in that table? In that case it should be at most 2*17*1*1 = 34 rows.
The rows column is only the optimizer's estimate of how many rows it expects to examine in each table (for a table joined with ref, per lookup from the preceding tables), not a count of the rows actually processed.
It is meant as a tool for judging how the optimizer is 'seeing' your query, and for helping it along when query performance is poor or could be improved.
The plan may also be built from an older snapshot of the statistics, so it should not be taken at face value, especially where cardinality is concerned.
Well, first let's get rid of the computational bug:
SELECT TE.company_id,
SUM(TE.debit - TE.credit) AS summation
FROM Transaction_E TE
JOIN ( SELECT DISTINCT T.id
FROM Transactions T
JOIN Transaction_E TE2 ON T.parent_id = TE2.transaction_id
JOIN Accounts A ON TE2.account_id = A.id
AND A.deactivated_timestamp = 0
WHERE A.user_id = 2341
) OkT ON OkT.id = TE.transaction_id
WHERE TE.company_id IN (1,2)
GROUP BY TE.company_id;
Your query is probably summing the same Transaction_E rows multiple times (once per matching parent entry) before doing the GROUP BY. My variant avoids that inflation of the aggregate.
I got rid of TE.company_id IS NOT NULL because it was redundant: IN (1,2) already excludes NULLs.
See what the EXPLAIN says about this, then let's discuss your question about EXPLAIN further.
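A toy reproduction of the inflation, assuming the parent-entry joins are only meant as a filter. sqlite3 stands in for MySQL and all rows are invented. The parent transaction has two entries on accounts of user 2341, so the original join pairs the child's single company entry with both of them and sums it twice; filtering through a DISTINCT set of qualifying transaction ids counts it once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Accounts (id INTEGER PRIMARY KEY, user_id INTEGER,
                       deactivated_timestamp INTEGER);
CREATE TABLE Transactions (id INTEGER PRIMARY KEY, parent_id INTEGER);
CREATE TABLE Transaction_E (transaction_id INTEGER, company_id INTEGER,
                            account_id INTEGER, debit INTEGER, credit INTEGER);
INSERT INTO Accounts VALUES (1, 2341, 0), (2, 2341, 0);   -- two active accounts of user 2341
INSERT INTO Transactions VALUES (10, NULL), (20, 10);     -- 20 is a child of 10
INSERT INTO Transaction_E VALUES (10, NULL, 1, 0, 0),      -- parent entry on account 1
                                 (10, NULL, 2, 0, 0),      -- parent entry on account 2
                                 (20, 1, NULL, 100, 0);    -- the child's only company entry
""")

original_sql = """
SELECT TE.company_id, SUM(TE.debit - TE.credit) AS summation
FROM Transactions T
JOIN Transaction_E TE2 ON T.parent_id = TE2.transaction_id
JOIN Transaction_E TE ON TE.transaction_id = T.id AND TE.company_id IS NOT NULL
JOIN Accounts A ON TE2.account_id = A.id AND A.deactivated_timestamp = 0
WHERE TE.company_id IN (1, 2) AND A.user_id = 2341
GROUP BY TE.company_id"""

# Filter through a DISTINCT set of qualifying transaction ids instead, so each
# Transaction_E row is summed once no matter how many parent entries match.
distinct_sql = """
SELECT TE.company_id, SUM(TE.debit - TE.credit) AS summation
FROM Transaction_E TE
JOIN (SELECT DISTINCT T.id
      FROM Transactions T
      JOIN Transaction_E TE2 ON T.parent_id = TE2.transaction_id
      JOIN Accounts A ON TE2.account_id = A.id AND A.deactivated_timestamp = 0
      WHERE A.user_id = 2341) OkT ON OkT.id = TE.transaction_id
WHERE TE.company_id IN (1, 2)
GROUP BY TE.company_id"""

original = conn.execute(original_sql).fetchall()
deduplicated = conn.execute(distinct_sql).fetchall()
print(original, deduplicated)  # [(1, 200)] [(1, 100)]
```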

Why is this MySQL query slow?

I have the following query, all relevant columns are indexed correctly. MySQL version 5.0.8. The query takes forever:
SELECT COUNT(*) FROM `members` `t` WHERE t.member_type NOT IN (1,2)
AND ( SELECT end_date FROM subscriptions s
WHERE s.sub_auth_id = t.member_auth_id AND s.sub_status = 'Completed'
AND s.sub_pkg_id > 0 ORDER BY s.id DESC LIMIT 1 ) < curdate( )
EXPLAIN output:
----+--------------------+-------+-------+-----------------------+---------+---------+------+------+-------------
id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra
----+--------------------+-------+-------+-----------------------+---------+---------+------+------+-------------
1 | PRIMARY | t | ALL | membership_type | NULL | NULL | NULL | 9610 | Using where
----+--------------------+-------+-------+-----------------------+---------+---------+------+------+-------------
2 | DEPENDENT SUBQUERY | s | index | subscription_auth_id, | PRIMARY | 4 | NULL | 1 | Using where
| | | | subscription_pkg_id, | | | | |
| | | | subscription_status | | | | |
----+--------------------+-------+-------+-----------------------+---------+---------+------+------+-------------
Why?
Your subselect refers to values in the parent query. This is known as a correlated (dependent) subquery, and such a query has to be executed once for every row in the parent query, which often leads to poor performance. It is often faster to rewrite the query as a JOIN, for example like this
(Note: without a sample schema to test with, it is impossible to say in advance if this will be faster and still correct, you might need to adjust it a little):
SELECT COUNT(*) FROM members t
LEFT JOIN (
SELECT latest.member_id, sdate.end_date
FROM (
SELECT sub_auth_id AS member_id, MAX(id) AS sid FROM subscriptions
WHERE sub_status = 'Completed'
AND sub_pkg_id > 0
GROUP BY sub_auth_id
) latest
LEFT JOIN (
SELECT id AS subid, end_date FROM subscriptions
WHERE sub_status = 'Completed'
AND sub_pkg_id > 0
) sdate ON latest.sid = sdate.subid
) sub ON sub.member_id = t.member_auth_id
WHERE t.member_type NOT IN (1,2)
AND sub.end_date < curdate( )
The logic here is:
For each member, find their latest subscription.
For each latest subscription, find its end date.
Join these member/latest-end-date pairs to the members list.
Filter the list.
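A small check that the join rewrite counts the same members as the correlated subquery, using sqlite3 as a stand-in for MySQL, invented rows, and a fixed cutoff date (hypothetical) in place of CURDATE(). The join here is slightly simplified: it joins subscriptions directly for the end date, since the latest id already satisfies the status and package filters.

```python
import sqlite3

CUTOFF = "2024-01-01"  # hypothetical fixed date standing in for MySQL's CURDATE()

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE members (member_auth_id INTEGER PRIMARY KEY, member_type INTEGER);
CREATE TABLE subscriptions (id INTEGER PRIMARY KEY, sub_auth_id INTEGER,
                            sub_status TEXT, sub_pkg_id INTEGER, end_date TEXT);
INSERT INTO members VALUES (101, 3), (102, 3), (103, 1), (104, 4);
INSERT INTO subscriptions VALUES
  (1, 101, 'Completed', 5, '2023-06-01'),  -- older sub of 101, already ended
  (2, 101, 'Completed', 5, '2024-06-01'),  -- latest sub of 101, still running
  (3, 102, 'Completed', 5, '2023-01-01'),  -- latest completed sub of 102, ended
  (4, 102, 'Pending',   5, '2025-01-01'),  -- not Completed, so ignored
  (5, 103, 'Completed', 5, '2020-01-01'),  -- member_type 1 is excluded
  (6, 104, 'Completed', 0, '2020-01-01');  -- sub_pkg_id 0 is ignored
""")

# Original form: a dependent subquery evaluated once per member row.
correlated_sql = """
SELECT COUNT(*) FROM members t
WHERE t.member_type NOT IN (1, 2)
  AND (SELECT end_date FROM subscriptions s
       WHERE s.sub_auth_id = t.member_auth_id AND s.sub_status = 'Completed'
         AND s.sub_pkg_id > 0
       ORDER BY s.id DESC LIMIT 1) < ?"""

# Join form: compute each member's latest qualifying subscription once.
joined_sql = """
SELECT COUNT(*) FROM members t
LEFT JOIN (
  SELECT latest.member_id, sdate.end_date
  FROM (SELECT sub_auth_id AS member_id, MAX(id) AS sid FROM subscriptions
        WHERE sub_status = 'Completed' AND sub_pkg_id > 0
        GROUP BY sub_auth_id) latest
  JOIN subscriptions sdate ON sdate.id = latest.sid
) sub ON sub.member_id = t.member_auth_id
WHERE t.member_type NOT IN (1, 2)
  AND sub.end_date < ?"""

correlated = conn.execute(correlated_sql, (CUTOFF,)).fetchone()[0]
joined = conn.execute(joined_sql, (CUTOFF,)).fetchone()[0]
print(correlated, joined)  # 1 1 (only member 102 qualifies in this data)
```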
Your query is slow because, as written, you are considering 9,610 rows and therefore executing 9,610 subqueries in your WHERE clause. You really should rewrite the query to JOIN the members and subscriptions tables first; your WHERE conditions can still be applied to the result.
EDIT: Try this.
SELECT COUNT(*)
FROM `members` `t`
JOIN subscriptions s ON (s.sub_auth_id = t.member_auth_id)
WHERE t.member_type NOT IN (1,2)
AND s.sub_status = 'Completed'
AND s.sub_pkg_id > 0
AND end_date < curdate()
ORDER BY s.id DESC LIMIT 1
Caveat: I'm not a MySQL expert (I'm pretty good in a different SQL flavour, VFP), but I believe you will save some time if:
You count just one field, let's say memberid, instead of *.
Your comparison NOT IN (1,2) is replaced with > 2 (provided that is valid).
The ORDER BY in your subselect is unnecessary, I think. You're trying to get the last completed subscription?
The < curdate() should be inside your subselect's WHERE.
(SELECT end_date FROM subscriptions s
WHERE s.end_date < curdate() and s.sub_auth_id = t.member_auth_id AND
s.sub_status = 'Completed' AND s.sub_pkg_id > 0 ORDER BY s.id DESC LIMIT 1 )
Tune your subselect so as to trim down the set as quickly as possible. The first conditional should be the one least likely to occur.
I ended up doing it like this:
select count(*) from members t
JOIN subscriptions s ON s.sub_auth_id = t.member_auth_id
WHERE t.membership_type > 2 AND s.sub_status = 'Completed' AND s.sub_pkg_id > 0
AND s.sub_end_date < curdate( )
AND s.id = (SELECT MAX(ss.id) FROM subscriptions ss WHERE ss.sub_auth_id = t.member_auth_id)
I believe that the problem is due to a bug that won't be fixed until MySQL 6.

Query is too slow when there are no results. How can I improve it?

I have three tables
filters (id, name)
items(item_id, name)
items_filters(item_id, filter_id, value_id)
values(id, filter_id, filter_value)
About 20,000 entries in items.
About 80,000 entries in items_filters.
SELECT i.*
FROM items_filters itf INNER JOIN items i ON i.item_id = itf.item_id
WHERE (itf.filter_id = 1 AND itf.value_id = '1')
OR (itf.filter_id = 2 AND itf.value_id = '7')
GROUP BY itf.item_id
WITH ROLLUP
HAVING COUNT(*) = 2
LIMIT 0,10;
It takes 0.008 s when there are entries that match the query and 0.05 s when no entries match.
I tried different variations before:
SELECT * FROM items WHERE item_id IN (
SELECT `item_id`
FROM `items_filters`
WHERE (`filter_id`='1' AND `value_id`=1)
OR (`filter_id`='2' AND `value_id`=7)
GROUP BY `item_id`
HAVING COUNT(*) = 2
) LIMIT 0,6;
This completely freezes MySQL when there are no entries.
What I really don't get is that
SELECT i.*
FROM items_filters itf INNER JOIN items i ON i.item_id = itf.item_id
WHERE itf.filter_id = 1 AND itf.value_id = '1' LIMIT 0,1
takes ~0.05 s when no entries are found and ~0.008 s when there are.
EXPLAIN:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | i | ALL | PRIMARY | NULL | NULL | NULL | 10 | Using temporary; Using filesort |
| 1 | SIMPLE | itf | ref | item_id | item_id | 4 | ss_stylet.i.item_id | 1 | Using where; Using index |
Aside from ensuring an index on items_filters on (filter_id, value_id), I would prequalify your item IDs up front with a GROUP BY, THEN join to the items table. It looks like you are trying to find items that meet two specific conditions and, for those, grab the item details...
I've also left the "group by with rollup" in the outer query, even though the inner query returns a single instance per ID. But since the inner query already applies the limit of 0,10 records, it's not throwing too many results to be joined to your items table.
However, since you are not doing any aggregates, I believe the outer group by and rollup are not really going to provide you any benefit and could otherwise be removed.
SELECT i.*
FROM
( select itf.item_id
from items_filters itf
WHERE (itf.filter_id = 1 AND itf.value_id = '1')
OR (itf.filter_id = 2 AND itf.value_id = '7')
GROUP BY itf.item_id
HAVING COUNT(*) = 2
LIMIT 0, 10 ) PreQualified
JOIN items i
ON PreQualified.item_id = i.item_id
Another approach MIGHT be to do a JOIN in the inner query so you don't even need a GROUP BY and HAVING. Since you are explicitly looking for exactly two conditions, I would then try the following. The first qualifier is that the item MUST have an entry with filter_id = 1 and value = '1'. If it doesn't even hit THAT entry, it never CARES about the second. Then, by joining to the same table again (aliased itf2) on that same item ID, it also has to find the second condition (filter_id = 2, value = '7'). This forces a check against that one entry FIRST and foremost, almost like a single pass, before CONSIDERING anything else. That would STILL limit the set to 10 before getting the item details.
SELECT i.*
FROM
( select itf.item_id
from items_filters itf
join items_filters itf2
on itf.item_id = itf2.item_id
AND itf2.filter_id = 2
AND itf2.value_id = '7'
WHERE
itf.filter_id = 1 AND itf.value_id = '1'
LIMIT 0, 10 ) PreQualified
JOIN items i
ON PreQualified.item_id = i.item_id
I also removed the group by / with rollup as per your comment of duplicates (which is what I expected).
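Both prequalifying forms can be compared on made-up rows, with sqlite3 standing in for MySQL; value_id is stored as an integer here, and the ORDER BY is added only to make the output deterministic. They should return the same items when there are matches and an empty list when there are none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (item_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE items_filters (item_id INTEGER, filter_id INTEGER, value_id INTEGER);
CREATE INDEX ix_filter_value ON items_filters (filter_id, value_id);
INSERT INTO items VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd');
INSERT INTO items_filters VALUES
  (1, 1, 1), (1, 2, 7),   -- item 1 matches both conditions
  (2, 1, 1),              -- item 2 matches only the first
  (3, 2, 7), (3, 1, 1),   -- item 3 matches both
  (4, 1, 2), (4, 2, 7);   -- item 4 matches only the second
""")

having_sql = """
SELECT i.item_id FROM
  (SELECT itf.item_id FROM items_filters itf
   WHERE (itf.filter_id = 1 AND itf.value_id = :v1)
      OR (itf.filter_id = 2 AND itf.value_id = :v2)
   GROUP BY itf.item_id
   HAVING COUNT(*) = 2
   LIMIT 10) PreQualified
JOIN items i ON PreQualified.item_id = i.item_id
ORDER BY i.item_id"""

self_join_sql = """
SELECT i.item_id FROM
  (SELECT itf.item_id FROM items_filters itf
   JOIN items_filters itf2
     ON itf.item_id = itf2.item_id AND itf2.filter_id = 2 AND itf2.value_id = :v2
   WHERE itf.filter_id = 1 AND itf.value_id = :v1
   LIMIT 10) PreQualified
JOIN items i ON PreQualified.item_id = i.item_id
ORDER BY i.item_id"""

def ids(sql, v1, v2):
    """Run one of the queries with the two filter values bound by name."""
    return [row[0] for row in conn.execute(sql, {"v1": v1, "v2": v2})]

print(ids(having_sql, 1, 7), ids(self_join_sql, 1, 7))  # [1, 3] [1, 3]
print(ids(having_sql, 1, 8), ids(self_join_sql, 1, 8))  # [] []
```

One difference in design: HAVING COUNT(*) = 2 could be fooled by an item with a duplicated (filter_id, value_id) row, while the self-join requires both conditions explicitly.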
That looks like four tables to me.
Run EXPLAIN on the query and look for a full table scan (type ALL). If you see one, add indexes on the columns in the WHERE clauses; those should help.

MySQL Slow Query Optimisation

I have a database with ~800k records of ticket purchases. All tables are InnoDB. The slow query is:
SELECT e.id AS id, e.name AS name, e.url AS url, p.action AS action, gk.key AS `key`
FROM event AS e
LEFT JOIN participation AS p ON p.event=e.id
LEFT JOIN goldenkey AS gk ON gk.issuedto=p.person
WHERE p.person='139160'
OR p.person IS NULL;
This query comes from PDO, hence the quoting of p.person. All columns used in JOINs and WHERE are indexed. p.event has a foreign key constraint to e.id, and gk.issuedto and p.person have foreign key constraints to an unmentioned table, person.id. All of these are INTs. Table e is small, only 10 rows. Table p has ~500,000 rows and gk is empty at this time.
This query runs on a person's details page. We want to get a list of all events, then if there is a participation row their participation and if there is a golden key row then their golden key.
Slow query log gives:
Query_time: 12.391201 Lock_time: 0.000093 Rows_sent: 2 Rows_examined: 466104
EXPLAIN SELECT gives:
+----+-------------+-------+------+---------------+----------+---------+----------------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+---------------+----------+---------+----------------+------+-------------+
| 1 | SIMPLE | e | ALL | NULL | NULL | NULL | NULL | 10 | |
| 1 | SIMPLE | p | ref | event | event | 4 | msadb.e.id | 727 | Using where |
| 1 | SIMPLE | gk | ref | issuedto | issuedto | 4 | msadb.p.person | 1 | |
+----+-------------+-------+------+---------------+----------+---------+----------------+------+-------------+
This query takes 7-12 seconds on the first run for a given p.person, then <0.05 s afterwards. Dropping the OR p.person IS NULL does not improve query time. The query slowed right down when the size of p increased from ~20k to ~500k rows (an import of old data).
Does anyone have suggestions for improving performance? Remember, the overall aim is to retrieve a list of all events, then their participation if there is a participation row, and their golden key if there is a golden key row. If multiple queries would be more efficient, I can do that.
If you can do away with p.person IS NULL, try the following and see if it helps:
SELECT e.id AS id, e.name AS name, e.url AS url, p.action AS action, gk.key AS `key`
FROM event AS e
LEFT JOIN participation AS p ON (p.event=e.id AND p.person='139160')
LEFT JOIN goldenkey AS gk ON gk.issuedto=p.person
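Moving the person filter into the ON clause is not only about speed: the two queries return different rows. With the filter in WHERE, an event that someone else participated in (but not this person) disappears, because its joined row has a non-NULL p.person that is not the requested one. A minimal sketch on invented rows, with sqlite3 as a stand-in for MySQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT, url TEXT);
CREATE TABLE participation (event INTEGER, person INTEGER, action TEXT);
CREATE TABLE goldenkey (issuedto INTEGER, `key` TEXT);     -- empty, as in the question
INSERT INTO event VALUES (1, 'e1', 'u1'), (2, 'e2', 'u2'), (3, 'e3', 'u3');
INSERT INTO participation VALUES (1, 139160, 'bought'),   -- our person, event 1
                                 (2, 555, 'bought');      -- someone else, event 2
""")

where_sql = """
SELECT e.id, p.action, gk.`key`
FROM event AS e
LEFT JOIN participation AS p ON p.event = e.id
LEFT JOIN goldenkey AS gk ON gk.issuedto = p.person
WHERE p.person = :person OR p.person IS NULL
ORDER BY e.id"""

on_sql = """
SELECT e.id, p.action, gk.`key`
FROM event AS e
LEFT JOIN participation AS p ON p.event = e.id AND p.person = :person
LEFT JOIN goldenkey AS gk ON gk.issuedto = p.person
ORDER BY e.id"""

where_rows = conn.execute(where_sql, {"person": 139160}).fetchall()
on_rows = conn.execute(on_sql, {"person": 139160}).fetchall()
print(where_rows)  # [(1, 'bought', None), (3, None, None)]  (event 2 is missing)
print(on_rows)     # [(1, 'bought', None), (2, None, None), (3, None, None)]
```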
For grins... add the keyword STRAIGHT_JOIN to your SELECT...
SELECT STRAIGHT_JOIN ... rest of query...
I'm not sure how many indexes you have or what your table schema is, but try to avoid NULL values by default; they can slow down your queries dramatically.
If you are looking up one particular person (which I'm guessing you are, since you have the person id filter in there), I would try to reverse the query, so you first search for that person's rows and then UNION them with an additional query that gives you all the events.
SELECT
e.id AS id, e.name AS name, e.url AS url,
p.action AS action, gk.key AS `key`
FROM participation AS p
JOIN event AS e ON p.event=e.id
LEFT JOIN goldenkey AS gk ON gk.issuedto=p.person
WHERE p.person='139160'
UNION
SELECT
e.id AS id, e.name AS name, e.url AS url,
NULL, NULL
FROM event AS e
This would mean you get a duplicate event whenever the first query matches, but that's easily solved by wrapping a SELECT around the whole thing, or maybe by selecting e.id into a variable in the first query and using that variable in the second query (I'm not sure this will work, though; I haven't tested it, but I can't see why not).