Improve JOIN query speed - mysql

I have this simple join that works great but is HORRIBLY slow, I think because the tech table is very large. There are many rows per uid, since the table tracks a timestamp for each uid, hence the DISTINCT. What is the best way to speed this query up?
SELECT DISTINCT tech.uid,
listing.empno,
listing.firstname,
listing.lastname
FROM tech,
listing
WHERE tech.uid = listing.empno
ORDER BY listing.empno ASC

First, add an index to tech.uid and an index to listing.EmpNo on their respective tables.
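For example (a sketch; the index names are made up, and you can skip this step for any index that already exists):
CREATE INDEX idx_tech_uid ON tech (uid);
CREATE INDEX idx_listing_empno ON listing (EmpNo);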
Once you are sure the indexes exist, you can try rewriting your query like this:
SELECT DISTINCT tech.uid, listing.EmpNo, listing.FirstName, listing.LastName
FROM listing INNER JOIN tech ON tech.uid = listing.EmpNo
ORDER BY listing.EmpNo ASC;
If it's still not fast enough, put the word EXPLAIN before the query to get some hints about its execution plan:
EXPLAIN SELECT DISTINCT tech.uid, listing.EmpNo, listing.FirstName, listing.LastName
FROM listing INNER JOIN tech ON tech.uid = listing.EmpNo
ORDER BY listing.EmpNo ASC;
Post the EXPLAIN results so we can get better insight.
Hope it helps,

This is a very simple query. The only thing you can do in SQL is add indexes on the fields used in the JOIN/WHERE and ORDER BY clauses (tech.uid, listing.empno), if they are not already indexed.
If the JOIN fields contain NULL values, they may ruin your performance; you should filter them out in the WHERE clause (WHERE tech.uid IS NOT NULL AND listing.empno IS NOT NULL). If there are many rows joined on a NULL field, that data may produce a Cartesian product containing an enormous number of rows.
You may also change the MySQL configuration. There are many options useful for performance tuning, such as key_buffer_size, sort_buffer_size, tmp_table_size, max_heap_table_size, read_buffer_size, etc.
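For example, you can inspect a variable's current value and change it at runtime (the value below is purely illustrative; appropriate sizes depend on your workload, available memory, and MySQL version):
SHOW VARIABLES LIKE 'sort_buffer_size';
SET GLOBAL sort_buffer_size = 4194304;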

Related

Mysql Select INNER JOIN with order by very slow

I'm trying to speed up a MySQL query. The Listings table has several million rows. If I don't sort them I get the result in 0.1 seconds, but once I sort it takes 7 seconds. What can I improve to speed up the query?
SELECT l.*
FROM listings l
INNER JOIN listings_categories lc
ON l.id=lc.list_id
AND lc.cat_id='2058'
INNER JOIN locations loc
ON l.location_id=loc.id
WHERE l.location_id
IN (7841,7842,7843,7844,7845,7846,7847,7848,7849,7850,7851,7852,7853,7854,7855,7856,7857,7858,7859,7860,7861,7862,7863,7864,7865,7866,7867,7868,7869,7870,7871,7872,7873,7874,7875,7876,7877,7878,7879,7880,7881,7882,7883,7884,7885,7886,7887,7888,7889,7890,7891,7892,7893,7894,7895,7896,7897,7898,7899,7900,7901,7902,7903)
ORDER BY date DESC
LIMIT 0,10;
EXPLAIN SELECT: Using Index l=date, loc=primary, lc=primary
Such performance questions are really difficult to answer and depend on the setup, indexes, etc. So there will likely not be one single solution, nor even clearly correct or incorrect attempts to improve the speed. This involves a lot of trial and error. Anyway, some points I have noticed that often cause performance issues are:
Avoid conditions within joins that should be placed in the WHERE clause instead. A join condition should reference only the columns being joined, with no further conditions, so lc.cat_id = '2058' should be moved to the WHERE clause.
Using IN is often slow. You could try replacing it with OR (l.location_id = 7841 OR l.location_id = 7842 OR ...).
Open the query execution plan and check whether there is something useful for you.
Try to find out whether there are special cases/values within the affected columns that slow down your query.
Change "ORDER BY date" to "ORDER BY tablealias.date" and check whether this makes a difference in performance. Even if it does not, it is easier to read.
If you can rename the column "date", do so, because using SQL keywords as table or column names is not a good idea. I'm unsure whether this influences performance, but it should be avoided if possible.
Good luck!
You can try additional indexes to speed up the query, but you'll have a tradeoff when creating/manipulating data.
These combined keys could speed up the query:
listings: date, location_id
listings_categories: cat_id, list_id
Since the plan says it uses the date index, there would be no need to read the record to check the location_id when using the new index, and the same goes for the join with listings_categories: an index read would be enough.
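For example, they could be created like this (a sketch; the index names are invented):
ALTER TABLE listings ADD INDEX idx_date_location (`date`, location_id);
ALTER TABLE listings_categories ADD INDEX idx_cat_list (cat_id, list_id);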
l: INDEX(location_id, id)
lc: INDEX(cat_id, list_id)
If those don't suffice, try the following rewrite.
SELECT l2.*
FROM
(
SELECT l1.id
FROM listings AS l1
JOIN listings_categories AS lc ON lc.list_id = l1.id
JOIN locations AS loc ON loc.id = l1.location_id
WHERE lc.cat_id='2058'
AND l1.location_id IN (7841, ..., 7903)
ORDER BY l1.date DESC
LIMIT 0,10
) AS x
JOIN listings l2 ON l2.id = x.id
ORDER BY l2.date DESC
With
listings: INDEX(location_id, date, id)
listings_categories: INDEX(cat_id, list_id)
The idea here is to get the 10 ids from the index before reaching to the table itself. Your version is probably shoveling around the whole table before sorting, and then delivering the 10.

Complex query optimization improve speed

I have the following query that I would like to optimize:
SELECT
*, @rownum := @rownum + 1 AS rank
FROM (
SELECT
SUM(a.id = 1) as KILLS,
SUM(a.id = 2) as DEATHS,
SUM(a.id = 3) as WINS,
tb1.totalPlaytime,
p.playerName
FROM
(
SELECT
player_id,
SUM(pg.timeEnded - pg.timeStarted) as totalPlaytime
FROM playergame pg
INNER JOIN player p
ON pg.player_id = p.id
WHERE pg.game_id IN(1, 2, 3)
GROUP BY
p.id
ORDER BY
p.playerName ASC
) tb1
INNER JOIN playeraction pa
ON pa.player_id = tb1.player_id
INNER JOIN action a
ON pa.action_id = a.id
INNER JOIN player p
ON pa.player_id = p.id
GROUP BY
p.id
ORDER BY
KILLS DESC) tb2
WHERE tb2.playerName LIKE "%"
Somehow I have the feeling that this is not suited for MySQL. I keep a lot of actions in different tables for a good statistical approach, but this slows everything down. (Perhaps big data?)
This is my model
Now i tried doing the following:
Combining joins in a view
I combined the many JOINs into a VIEW. This gave me no improvement.
Indexing the tables
I indexed the frequently used keys; this did speed things up, but I can't manage to get the entire result set below 0.613s.
Starting from the action table and using left joins
This gave me a somewhat different approach, yet the joins remain slow (the first example is still the fastest).
Any hints, tips, additions, or improvements are welcome.
I removed my previous answer as it was wrong and did not help. Here I am just summarizing our conversation in the comments, with some additional comments of my own.
There are several ways to speed up the query.
Make sure you are not making any redundant queries.
Do as few joins as possible.
Make indexes on multiple columns if possible.
Make indexes clustered if needed/possible http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html
Regarding the query you wrote in the question:
Remove the ORDER BY in the inner query.
Remove the INNER JOIN in the inner query and replace GROUP BY p.id with GROUP BY player_id (see the sketch after this list).
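With both changes applied, the inner query would look roughly like this (a sketch, untested against your schema):
SELECT
    player_id,
    SUM(pg.timeEnded - pg.timeStarted) AS totalPlaytime
FROM playergame pg
WHERE pg.game_id IN (1, 2, 3)
GROUP BY player_id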
A few words on where indexes make sense and where they do not.
In your case it would not make sense to have an index on game_id on the playergame table, because it would probably match loads of rows. So that is all you can do about the innermost query.
The joins can also be optimized a bit if you know what to expect from the tables, i.e., the amount of data they may face. You may think of it as the question: are you building the database behind an MMO game or an FPS? An MMO will have millions of users per game; an FPS will have only a few. Also, different types of games may have different actions. That implies you may be able to optimize the query by making the index more precise: if you can state in the inner join on action that game_id IN (...), then creating an index on the tuple (game_id, id) might help.
A wildcard in the WHERE clause: you may try to create an index on playerName, but it will only work if you search with a wildcard at the end of your search string. For a wildcard at the beginning you would need a separate index, and you would have to hope that the query optimizer is smart enough to switch between them each time you make a query.
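To illustrate (the search strings are hypothetical, and an index on playerName is assumed):
SELECT p.id FROM player p WHERE p.playerName LIKE 'abc%'; -- trailing wildcard: the index can be used
SELECT p.id FROM player p WHERE p.playerName LIKE '%abc'; -- leading wildcard: the index cannot be used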
Keep in mind that more indexes mean slower inserts and deletes, so keep only as few as possible.
Another thing would be to redesign the structure a bit. You may still keep the database normalized, but maybe it would be useful to have a table with summaries of some games. You could keep a table summarizing the games that happened before yesterday, have your query aggregate only today's data, and then join the two if needed. You could then optimize further by either creating an index on a timestamp or partitioning the table by day. Everything depends on the load you expect.
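A very rough sketch of that idea (the summary table and its columns are invented, and the population query is simplified; a real version would need a date filter matching your timestamp columns to cover only "before yesterday"):
CREATE TABLE player_playtime_summary (
    player_id INT NOT NULL PRIMARY KEY,
    totalPlaytime BIGINT NOT NULL
);
INSERT INTO player_playtime_summary (player_id, totalPlaytime)
SELECT player_id, SUM(pg.timeEnded - pg.timeStarted)
FROM playergame pg
WHERE pg.game_id IN (1, 2, 3)
GROUP BY player_id;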
The topic is rather deep, so everything depends on what is the story behind the data.

How to optimize a MySQL query which uses variables multiple times?

I have a query like the following:
SELECT product.id FROM products AS product
INNER JOIN supplier ON supplier.id = product.supplier_id
WHERE supplier.country = 'UK'
AND (
(1000 BETWEEN product.date_on AND product.date_off) OR
(2000 BETWEEN product.date_on AND product.date_off) OR
(2000 >= product.date_on AND
(product.date_off IS NULL OR 1000 <= product.date_off))
)
That query runs way too slowly, and I believe I need some indexes, but I am not sure what to add. I have indexes on product.date_on and product.date_off, but because I am comparing those values multiple times in the AND clause, I believe the indexes aren't used.
Maybe a composite index could be used, but I have no idea which fields it should contain, and in what order, to optimize this.
Btw, 1000 and 2000 are the two variables that I am passing...
For optimizing queries, you should look at the execution plan.
Just prepend EXPLAIN to the query:
EXPLAIN SELECT product.id FROM products ...
You can interpret the results of EXPLAIN using the following link: http://dev.mysql.com/doc/refman/5.0/en/explain-output.html
Look at the parts with the most rows and think about how to optimize those parts. Use possible_keys as a hint.

Why does the query take a long time in mysql even with a LIMIT clause?

Say I have an Order table that has 100+ columns and 1 million rows. It has a PK on OrderID and an FK constraint StoreID --> Store.StoreID.
1) select * from `Order` order by OrderID desc limit 10;
the above takes a few milliseconds.
2) select * from `Order` o join `Store` s on s.StoreID = o.StoreID order by OrderID desc limit 10;
this somehow can take many seconds. The more inner joins I add, the more it slows down.
3) select OrderID, column1 from `Order` o join `Store` s on s.StoreID = o.StoreID order by OrderID desc limit 10;
this seems to speed the execution up, by limiting the columns we select.
There are a few points that I don't understand here and would really appreciate it if anyone more knowledgeable about MySQL (or RDBMS query execution in general) can enlighten me.
Query 1 is fast since it's just a reverse lookup by PK and DB only needs to return the first 10 rows it encountered.
I don't see why Query 2 should take forever. Shouldn't the operation be the same? I.e., get the first 10 rows by PK and then join them with the other table. Since there's an FK constraint, it is guaranteed that the relationship will be satisfied, so the DB doesn't need to join more rows than necessary and then trim the result, right? Unless the FK constraint allows NULL FKs? In which case I guess a LEFT JOIN would make this much faster than an INNER JOIN?
Lastly, I'm guessing Query 3 is simply faster because fewer columns are used in those unnecessary joins? But why would the query execution need the other columns while joining? Shouldn't it just join using PKs first, and then fetch the columns for just the 10 rows?
Thanks!
My understanding is that the MySQL engine applies LIMIT after any joins happen.
From http://dev.mysql.com/doc/refman/5.0/en/select.html: "The HAVING clause is applied nearly last, just before items are sent to the client, with no optimization. (LIMIT is applied after HAVING.)"
EDIT: You could try using this query to take advantage of the PK speed.
select * from (select * from `Order` order by OrderID desc limit 10) o
join `Store` s on s.StoreID = o.StoreID;
All of your examples are asking for table scans of the existing tables, so none of them will be more or less performant beyond the degree to which MySQL can cache the data or results. Some of your queries have ORDER BY or join criteria, which can take advantage of indexes purely to make the joining process more efficient; however, that is still not the same as having a set of criteria that will trigger the use of indexes.
LIMIT is not a criterion -- it can be thought of as filtration once a result set is determined. You save time on the client, once the result set is prepared, but not on the server.
Really, the only way to get the answers you are seeking is to become familiar with:
EXPLAIN EXTENDED your_sql_statement
The output of EXPLAIN will show you how many rows are being looked at by mysql, as well as whether or not any indexes are being used.

How can I improve the performance of this MySQL query?

I have a MySQL query:
SELECT DISTINCT
c.id,
c.company_name,
cd.firstname,
cd.surname,
cis.description AS industry_sector
FROM (clients c)
JOIN clients_details cd ON c.id = cd.client_id
LEFT JOIN clients_industry_sectors cis ON cd.industry_sector_id = cis.id
WHERE c.record_type='virgin'
ORDER BY date_action, company_name asc, id desc
LIMIT 30
The clients table has about 60-70k rows and has indexes on 'id', 'record_type', 'date_action' and 'company_name'; unfortunately the query still takes 5+ seconds to complete. Removing the ORDER BY reduces this to about 30ms, since a filesort is no longer required. Is there any way I can alter this query to improve on the 5+ second response time?
See: http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
Especially:
In some cases, MySQL cannot use indexes to resolve the ORDER BY (..). These cases include the following:
(..)
You are joining many tables, and the columns in the ORDER BY are not all from the first nonconstant table that is used to retrieve rows. (This is the first table in the EXPLAIN output that does not have a const join type.)
You have an index for id, record_type, and date_action. But if you want to order by date_action, you really need an index that has date_action as the first field, preferably matching the exact fields in the ORDER BY. Otherwise yes, it will be a slow query.
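For example (a sketch; the index name is invented, and note that the mixed ASC/DESC directions in the original ORDER BY may still prevent MySQL from using the index to resolve the full sort):
ALTER TABLE clients ADD INDEX idx_date_action (date_action, company_name, id);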
Without seeing all your tables and indexes, it's hard to tell. When asking a question about speeding up a query, the query is just part of the equation.
Does clients have an index on id?
Does clients have an index on record_type?
Does clients_details have an index on client_id?
Does clients_industry_sectors have an index on id?
These are the minimum you need for this query to have any chance of working quickly; a sketch of creating them follows.
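The index names below are invented, and the id columns are probably already primary keys, in which case they are indexed automatically:
CREATE INDEX idx_clients_record_type ON clients (record_type);
CREATE INDEX idx_details_client_id ON clients_details (client_id);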
Thanks so much for the input and suggestions. In the end I've decided to create a new DB table whose sole purpose is to return results for this query, so no joins are required; I just update it when records are added to or deleted from the master clients table. Not ideal from a data storage point of view, but it solves the problem and means I'm getting results fantastically fast. :)