How can I improve the performance of this MySQL query?

I have a MySQL query:
SELECT DISTINCT
c.id,
c.company_name,
cd.firstname,
cd.surname,
cis.description AS industry_sector
FROM (clients c)
JOIN clients_details cd ON c.id = cd.client_id
LEFT JOIN clients_industry_sectors cis ON cd.industry_sector_id = cis.id
WHERE c.record_type='virgin'
ORDER BY date_action, company_name asc, id desc
LIMIT 30
The clients table has about 60-70k rows and has an index for 'id', 'record_type', 'date_action' and 'company_name' - unfortunately the query still takes 5+ secs to complete. Removing the 'ORDER BY' reduces this to about 30ms since a filesort is not required. Is there any way I can alter this query to improve upon the 5+ sec response time?

See: http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
Especially:
In some cases, MySQL cannot use indexes to resolve the ORDER BY (..). These cases include the following:
(..)
You are joining many tables, and the columns in the ORDER BY are not all from the first nonconstant table that is used to retrieve rows. (This is the first table in the EXPLAIN output that does not have a const join type.)

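You have indexes on id, record_type and date_action. But if you want to order by date_action, you really need an index whose leading fields match the ORDER BY, preferably the exact fields in the ORDER BY and in the same order. Columns compared with a constant in the WHERE clause may precede them, so for this query an index on (record_type, date_action, company_name) is the kind of index to aim for. The same manual page also lists mixing ASC and DESC in one ORDER BY (company_name asc, id desc here) as a case where an index cannot be used. Without such an index, yes, it will be a slow query.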
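To see the effect of such an index, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. SQLite's planner is not MySQL's, so this only illustrates the principle: the table is cut down to the columns the query touches, the ORDER BY is reduced to its two ascending columns, and the index name idx_type_date is made up. SQLite's EXPLAIN QUERY PLAN reports "USE TEMP B-TREE FOR ORDER BY" when it has to sort, which plays roughly the role of MySQL's "Using filesort".

```python
import sqlite3

QUERY = """
    SELECT id, company_name
    FROM clients
    WHERE record_type = 'virgin'
    ORDER BY date_action, company_name
    LIMIT 30
"""

def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Return the EXPLAIN QUERY PLAN detail lines joined into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail text is the last column of each plan row.
    return "\n".join(row[-1] for row in rows)

def make_db(with_index: bool) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE clients (id INTEGER PRIMARY KEY, company_name TEXT,"
        " record_type TEXT, date_action TEXT)"
    )
    if with_index:
        # Equality column first, then the ORDER BY columns in the same order.
        conn.execute(
            "CREATE INDEX idx_type_date"
            " ON clients (record_type, date_action, company_name)"
        )
    return conn

without_index = plan(make_db(False), QUERY)  # full scan plus a sort step
with_index = plan(make_db(True), QUERY)      # index search, rows already ordered
print(without_index)
print(with_index)
```

In MySQL the thing to check is whether "Using filesort" disappears from the Extra column of EXPLAIN once the composite index exists.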

Without seeing all your tables and indexes, it's hard to tell. When asking a question about speeding up a query, the query is just part of the equation.
Does clients have an index on id?
Does clients have an index on record_type?
Does clients_details have an index on client_id?
Does clients_industry_sectors have an index on id?
These are the minimum you need for this query to have any chance of working quickly.
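As a rough illustration of what those indexes buy, the sketch below builds the four tables with the checklist's indexes and prints the plan for the original join. It uses SQLite through Python's sqlite3 as a stand-in for MySQL, and the index names idx_clients_type and idx_cd_client are made up. With the indexes in place, each table should be reached through an index lookup rather than a scan; MySQL's EXPLAIN shows the same information in its type and key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (id INTEGER PRIMARY KEY, company_name TEXT,
                      record_type TEXT, date_action TEXT);
CREATE TABLE clients_details (client_id INTEGER, firstname TEXT, surname TEXT,
                              industry_sector_id INTEGER);
CREATE TABLE clients_industry_sectors (id INTEGER PRIMARY KEY, description TEXT);
-- clients.id and clients_industry_sectors.id are indexed as primary keys;
-- the other two indexes from the checklist are added explicitly.
CREATE INDEX idx_clients_type ON clients (record_type);
CREATE INDEX idx_cd_client ON clients_details (client_id);
""")

rows = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT DISTINCT c.id, c.company_name, cd.firstname, cd.surname,
           cis.description AS industry_sector
    FROM clients c
    JOIN clients_details cd ON c.id = cd.client_id
    LEFT JOIN clients_industry_sectors cis ON cd.industry_sector_id = cis.id
    WHERE c.record_type = 'virgin'
""").fetchall()
join_plan = "\n".join(row[-1] for row in rows)  # detail text of each plan step
print(join_plan)
```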

Thanks so much for the input and suggestions. In the end I decided to create a new DB table whose sole purpose is to return results for this query, so no joins are required; I just update it whenever records are added to or deleted from the master clients table. Not ideal from a data storage point of view, but it solves the problem and I'm getting results fantastically fast. :)
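The approach described above can be sketched as follows, again with SQLite from Python standing in for MySQL. The flattened table client_list and the helper refresh_client are hypothetical; the application is assumed to call refresh_client after every insert, update or delete on the master tables (triggers could do the same job). The read query keeps only the two ascending sort columns.

```python
import sqlite3

SCHEMA = """
CREATE TABLE clients (id INTEGER PRIMARY KEY, company_name TEXT,
                      record_type TEXT, date_action TEXT);
CREATE TABLE clients_details (client_id INTEGER, firstname TEXT, surname TEXT,
                              industry_sector_id INTEGER);
CREATE TABLE clients_industry_sectors (id INTEGER PRIMARY KEY, description TEXT);
-- Hypothetical flattened table: already joined, so reads need no joins.
CREATE TABLE client_list (id INTEGER, company_name TEXT, firstname TEXT,
                          surname TEXT, industry_sector TEXT,
                          record_type TEXT, date_action TEXT);
CREATE INDEX idx_list_id ON client_list (id);
CREATE INDEX idx_list_order ON client_list (record_type, date_action, company_name);
"""

def refresh_client(conn, client_id):
    """Re-derive the flattened row(s) for one client after it changes."""
    conn.execute("DELETE FROM client_list WHERE id = ?", (client_id,))
    # If the client no longer exists, the SELECT finds nothing and no row is added.
    conn.execute("""
        INSERT INTO client_list
        SELECT c.id, c.company_name, cd.firstname, cd.surname, cis.description,
               c.record_type, c.date_action
        FROM clients c
        JOIN clients_details cd ON c.id = cd.client_id
        LEFT JOIN clients_industry_sectors cis ON cd.industry_sector_id = cis.id
        WHERE c.id = ?
    """, (client_id,))

def first_page(conn):
    """Simplified version of the original query, against a single table."""
    return conn.execute("""
        SELECT id, company_name, firstname, surname, industry_sector
        FROM client_list
        WHERE record_type = 'virgin'
        ORDER BY date_action, company_name
        LIMIT 30
    """).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO clients_industry_sectors VALUES (1, 'Retail')")
conn.executemany("INSERT INTO clients VALUES (?, ?, ?, ?)",
                 [(1, "Acme", "virgin", "2010-01-02"), (2, "Beta", "virgin", "2010-01-01")])
conn.executemany("INSERT INTO clients_details VALUES (?, ?, ?, ?)",
                 [(1, "Ann", "Lee", 1), (2, "Bob", "Ray", None)])
for client_id in (1, 2):
    refresh_client(conn, client_id)
page_before = first_page(conn)

# Deleting a client: remove it from the master tables, then refresh its flat row.
conn.execute("DELETE FROM clients_details WHERE client_id = 1")
conn.execute("DELETE FROM clients WHERE id = 1")
refresh_client(conn, 1)
page_after = first_page(conn)
```

The trade-off is the one mentioned above: the data is stored twice, and every write path has to remember to refresh the copy.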

Related

Speed of query using FIND_IN_SET on MySql

I have several problems with a query on a catalogue of products.
The query is as follows:
SELECT DISTINCT (cc_id) FROM cms_catalogo
JOIN cms_catalogo_lingua ON ccl_id_prod=cc_id
JOIN cms_catalogo_famiglia ON (FIND_IN_SET(ccf_id, cc_famiglia) != 0)
JOIN cms_catalogo_categoria ON (FIND_IN_SET(ccc_id, cc_categoria) != 0)
JOIN cms_catalogo_sottocat ON (FIND_IN_SET(ccs_id, cc_sottocat) != 0)
LEFT JOIN cms_catalogo_order ON cco_id_prod=cc_id AND cco_id_lingua=1 AND cco_id_sottocat=ccs_id
WHERE ccc_nome='Alpine Skiing' AND ccf_nome='Ski'
The first time I run the query it takes 4.5 seconds on average; after that it is fast.
I use FIND_IN_SET because in table "cms_catalogo" the columns "cc_famiglia", "cc_categoria" and "cc_sottocat" contain comma-separated IDs (I know it's stupid).
Example:
Table cms_catalogo
Column cc_famiglia: 1,2,3,4,5
Table cms_catalogo_famiglia
Column ccf_id: 3
Could the slowdown come from using FIND_IN_SET this way?
Would it be faster to store the IDs in a separate, indexed table instead of as comma-separated lists?
I also cannot explain why the first execution of the query is very slow and later ones are fast.

It is better to relate the tables through proper keys with constraints, so that you can join them on primary keys.
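A sketch of such a link-table design for the family column (cc_categoria and cc_sottocat would get analogous tables). It uses SQLite through Python's sqlite3 as a stand-in for MySQL; the table name cms_catalogo_prod_famiglia and the sample rows are hypothetical, and since SQLite has no FIND_IN_SET, the old-style join emulates it with instr() on comma-padded strings. Both queries return the same products, but only the second joins on plain equality that an index can serve.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cms_catalogo (cc_id INTEGER PRIMARY KEY, cc_famiglia TEXT);
CREATE TABLE cms_catalogo_famiglia (ccf_id INTEGER PRIMARY KEY, ccf_nome TEXT);
-- Hypothetical link table: one row per (product, family) pair.
CREATE TABLE cms_catalogo_prod_famiglia (
    cc_id  INTEGER NOT NULL REFERENCES cms_catalogo (cc_id),
    ccf_id INTEGER NOT NULL REFERENCES cms_catalogo_famiglia (ccf_id),
    PRIMARY KEY (ccf_id, cc_id)
);
""")
conn.executemany("INSERT INTO cms_catalogo_famiglia VALUES (?, ?)",
                 [(1, "Ski"), (2, "Snowboard"), (3, "Boots")])
conn.executemany("INSERT INTO cms_catalogo VALUES (?, ?)",
                 [(1, "1,3"), (2, "2"), (3, "1")])

# One-off migration: split each comma-separated list into link rows.
for cc_id, fam_list in conn.execute(
        "SELECT cc_id, cc_famiglia FROM cms_catalogo").fetchall():
    for ccf_id in fam_list.split(","):
        conn.execute("INSERT INTO cms_catalogo_prod_famiglia VALUES (?, ?)",
                     (cc_id, int(ccf_id)))

# Old style: substring match on the list (instr() emulating FIND_IN_SET);
# no index can help with this join condition.
csv_ids = [r[0] for r in conn.execute("""
    SELECT DISTINCT p.cc_id FROM cms_catalogo p
    JOIN cms_catalogo_famiglia f
      ON instr(',' || p.cc_famiglia || ',', ',' || f.ccf_id || ',') > 0
    WHERE f.ccf_nome = 'Ski' ORDER BY p.cc_id
""")]

# New style: equality joins that can use the link table's primary key.
link_ids = [r[0] for r in conn.execute("""
    SELECT DISTINCT p.cc_id FROM cms_catalogo p
    JOIN cms_catalogo_prod_famiglia l ON l.cc_id = p.cc_id
    JOIN cms_catalogo_famiglia f ON f.ccf_id = l.ccf_id
    WHERE f.ccf_nome = 'Ski' ORDER BY p.cc_id
""")]
```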
If you just want a quick optimisation for this query:
Check EXPLAIN SELECT ... in MySQL to see how your query is executed;
Add indexes for columns ccc_id, ccf_id, ccs_id;
Check EXPLAIN SELECT ... again after the indexes are added.
The first run of a query takes much more time because nothing is cached yet; subsequent runs are served from cache. So you should judge performance by the first run.
Unless this is a complicated report, execution time should be under 50-100 ms; otherwise you may run into performance problems overall, because I am quite sure this is not the only query in your application.

How can I optimize large table in mysql?

I have a table with nearly 30M records and a size of 6.6 GB. I need to query some data from it using GROUP BY and ORDER BY. The queries take so long that I have lost the connection to the DB many times.
I have indexes on all the necessary fields, both single-column and composite. What else can I do to make the query faster?
Example query:
select id, max(price), avg(`order`) from `table` group by id, date order by id, location

Use EXPLAIN query, where query is your query. For example: EXPLAIN select * from `table` group by id, date order by id, location
You'll see a table in which MySQL shows how it executes your query and which indexes it uses. Possibly you don't have sufficient (good enough) indexes.

I don't think you can. With no filter (WHERE clause) and an AVG, the entire table has to be read.
The only thing I can think of is to have a new table with ID, AVG_ORDER, MAX_PRICE (or whatever you need) and update it using a trigger or stored procedure when you insert/update rows.
An index on (ID, PRICE) might help you if you didn't need that pesky average.
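A minimal sketch of that summary table, assuming rows are only ever inserted (handling UPDATE and DELETE would need more triggers, and a maximum cannot be lowered incrementally without rescanning the group). It uses SQLite via Python's sqlite3 as a stand-in for MySQL; the names sales, sales_summary, day and qty are hypothetical replacements for the question's table, date and order columns, and location is left out. Storing the sum and count rather than the average keeps each update a simple addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical stand-ins: sales ~ the big table, qty ~ its `order` column.
CREATE TABLE sales (id INTEGER, day TEXT, price REAL, qty INTEGER);
CREATE TABLE sales_summary (
    id INTEGER, day TEXT, max_price REAL, sum_qty INTEGER, cnt INTEGER,
    PRIMARY KEY (id, day)
);
-- Insert-only maintenance: create the group row if missing, then fold in NEW.
CREATE TRIGGER sales_summary_ins AFTER INSERT ON sales
BEGIN
    INSERT OR IGNORE INTO sales_summary (id, day, max_price, sum_qty, cnt)
        VALUES (NEW.id, NEW.day, NEW.price, 0, 0);
    UPDATE sales_summary
       SET max_price = max(max_price, NEW.price),
           sum_qty = sum_qty + NEW.qty,
           cnt = cnt + 1
     WHERE id = NEW.id AND day = NEW.day;
END;
""")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (1, "2020-01-01", 10.0, 2),
    (1, "2020-01-01", 15.0, 4),
    (2, "2020-01-01", 7.0, 1),
])

# Reading the summary touches one row per group instead of every sale.
from_summary = conn.execute("""
    SELECT id, day, max_price, sum_qty * 1.0 / cnt
    FROM sales_summary ORDER BY id, day
""").fetchall()

# The full-scan aggregate it replaces, for comparison.
from_scan = conn.execute("""
    SELECT id, day, max(price), avg(qty)
    FROM sales GROUP BY id, day ORDER BY id, day
""").fetchall()
```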

Indexing isn't going to do you any good. You're averaging a column, so you have to read every row in the table. That's going to take time.

Why does the query take a long time in mysql even with a LIMIT clause?

Say I have an Order table that has 100+ columns and 1 million rows. It has a PK on OrderID and FK constraint StoreID --> Store.StoreID.
1) select * from `Order` order by OrderID desc limit 10;
This takes a few milliseconds.
2) select * from `Order` o join `Store` s on s.StoreID = o.StoreID order by OrderID desc limit 10;
This can somehow take many seconds, and each additional inner join slows it down further.
3) select OrderID, column1 from `Order` o join `Store` s on s.StoreID = o.StoreID order by OrderID desc limit 10;
Limiting the selected columns seems to speed up execution.
There are a few points here that I don't understand, and I would really appreciate it if anyone more knowledgeable about MySQL (or RDBMS query execution in general) could enlighten me.
Query 1 is fast since it's just a reverse lookup by PK and the DB only needs to return the first 10 rows it encounters.
I don't see why Query 2 should take forever. Shouldn't the operation be the same, i.e. get the first 10 rows by PK and then join with the other tables? Since there's an FK constraint, the relationship is guaranteed to be satisfied, so the DB doesn't need to join more rows than necessary and then trim the result, right? Unless the FK constraint allows a NULL FK? In that case, I guess a left join would make this much faster than an inner join?
Lastly, I'm guessing query 3 is simply faster because fewer columns are used in those unnecessary joins? But why would the query execution need the other columns while joining? Shouldn't it just join using PKs first, and then get the columns for just the 10 rows?
Thanks!

My understanding is that the MySQL engine applies LIMIT after any joins happen.
From http://dev.mysql.com/doc/refman/5.0/en/select.html, The HAVING clause is applied nearly last, just before items are sent to the client, with no optimization. (LIMIT is applied after HAVING.)
EDIT: You could try using this query to take advantage of the PK speed.
select * from (select * from `Order` order by OrderID desc limit 10) o
join `Store` s on s.StoreID = o.StoreID
order by o.OrderID desc;
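The derived-table rewrite can be checked for equivalence with a small SQLite sketch (Python's sqlite3 as a stand-in for MySQL; the schema and data are made up). It only shows that the two forms return the same rows when every order has a matching store, which the NOT NULL foreign key guarantees here; it says nothing about timings in MySQL. Note the outer ORDER BY: a join is not guaranteed to keep the derived table's order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Store (StoreID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE "Order" (OrderID INTEGER PRIMARY KEY,
                      StoreID INTEGER NOT NULL REFERENCES Store (StoreID),
                      column1 TEXT);
""")
conn.executemany("INSERT INTO Store VALUES (?, ?)",
                 [(s, f"store {s}") for s in range(1, 6)])
conn.executemany('INSERT INTO "Order" VALUES (?, ?, ?)',
                 [(o, o % 5 + 1, f"row {o}") for o in range(1, 1001)])

# Join first, then sort and limit (the shape of query 2 in the question).
join_then_limit = conn.execute("""
    SELECT o.OrderID, s.name FROM "Order" o
    JOIN Store s ON s.StoreID = o.StoreID
    ORDER BY o.OrderID DESC LIMIT 10
""").fetchall()

# Limit first inside a derived table, then join only those 10 rows.
limit_then_join = conn.execute("""
    SELECT o.OrderID, s.name
    FROM (SELECT * FROM "Order" ORDER BY OrderID DESC LIMIT 10) o
    JOIN Store s ON s.StoreID = o.StoreID
    ORDER BY o.OrderID DESC  -- the join is not guaranteed to keep inner order
""").fetchall()
```

If the foreign key could be NULL or dangling, the inner join in the first form could drop rows and pull in older orders to fill the 10, so the two forms would no longer be interchangeable.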

All of your examples are asking for table scans of the existing tables, so none of them will be more or less performant than the degree to which MySQL can cache the data or results. Some of your queries have ORDER BY or join criteria, which can take advantage of indexes purely to make the joining process more efficient; however, that is still not the same as having a set of criteria that will trigger the use of indexes.
LIMIT is not a criterion; it can be thought of as a filter applied once the result set is determined. You save time on the client once the result set is prepared, but not on the server.
Really, the only way to get the answers you are seeking is to become familiar with:
EXPLAIN EXTENDED your_sql_statement
The output of EXPLAIN will show you how many rows are being looked at by mysql, as well as whether or not any indexes are being used.

Mysql - GROUP BY Avoid using temporary

The goal of this query is to get a total of unique records (most recent per IP, by IP) per ref ID.
SELECT COUNT(DISTINCT ip), GROUP_CONCAT(ref.id)
FROM `sess` sess
JOIN `ref` USING(row_id)
WHERE sess.time BETWEEN '2010-04-21 00:00:00' AND '2010-04-21 23:59:59'
GROUP BY ref.id
ORDER BY sess.time DESC
The query works fine, but it's using a temporary table. Any ideas?
row_id is the primary key on both tables. sess.time, sess.ip, and ref.id are all indexed.

I'm having trouble understanding how this query makes sense. Why do you use GROUP_CONCAT(ref.id) if you have GROUP BY ref.id? There can be only one value for ref.id per group by definition.
Also you ORDER BY sess.time even though sess could have multiple values for time per group. Which row in the group do you want to use for sorting?
I agree that a query that invokes a temporary table usually has a performance issue in MySQL. The temporary table often writes to disk, so you get an expensive disk I/O as part of the grouping & sorting.
Could you edit your question and show the table definitions (SHOW CREATE TABLE would be best)? Also please describe what the query is supposed to represent. Then we will have a better chance of giving you some suggestions about how to rewrite it.
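One possible reading of the intent, sketched in SQLite via Python's sqlite3 with made-up data: count distinct IPs per ref.id for the day and sort the groups by their most recent session, using MAX(sess.time) so that each group has exactly one sort value. This addresses the ambiguity pointed out above rather than the temporary table; since the sort key is an aggregate rather than the GROUP BY column, MySQL will likely still need one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sess (row_id INTEGER PRIMARY KEY, ip TEXT, time TEXT);
CREATE TABLE ref (row_id INTEGER PRIMARY KEY, id INTEGER);
CREATE INDEX idx_sess_time ON sess (time);
""")
conn.executemany("INSERT INTO sess VALUES (?, ?, ?)", [
    (1, "1.1.1.1", "2010-04-21 10:00:00"),
    (2, "1.1.1.1", "2010-04-21 11:00:00"),
    (3, "2.2.2.2", "2010-04-21 12:00:00"),
    (4, "3.3.3.3", "2010-04-21 09:00:00"),
    (5, "4.4.4.4", "2010-04-22 09:00:00"),  # outside the day, filtered out
])
conn.executemany("INSERT INTO ref VALUES (?, ?)",
                 [(1, 7), (2, 7), (3, 7), (4, 8), (5, 8)])

per_ref = conn.execute("""
    SELECT ref.id, COUNT(DISTINCT sess.ip) AS unique_ips,
           MAX(sess.time) AS last_seen   -- one well-defined time per group
    FROM sess JOIN ref USING (row_id)
    WHERE sess.time BETWEEN '2010-04-21 00:00:00' AND '2010-04-21 23:59:59'
    GROUP BY ref.id
    ORDER BY last_seen DESC
""").fetchall()
```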

It's probably using a temporary table because of the GROUP_CONCAT. But is that a problem really? Is the query too slow or do you simply dislike temporary tables?

Slow query when using ORDER BY

Here's the query (the largest table has about 40,000 rows)
SELECT
Course.CourseID,
Course.Description,
UserCourse.UserID,
UserCourse.TimeAllowed,
UserCourse.CreatedOn,
UserCourse.PassedOn,
UserCourse.IssuedOn,
C.LessonCnt
FROM
UserCourse
INNER JOIN
Course
USING(CourseID)
INNER JOIN
(
SELECT CourseID, COUNT(*) AS LessonCnt FROM CourseSection GROUP BY CourseID
) C
USING(CourseID)
WHERE
UserCourse.UserID = 8810
If I run this, it executes very quickly (.05 seconds roughly). It returns 13 rows.
When I add an ORDER BY clause at the end of the query (ordering by any column) the query takes about 10 seconds.
I'm using this database in production now, and everything is working fine. All my other queries are speedy.
Any ideas of what it could be? I ran the query in MySQL's Query Browser, and from the command line. Both places it was dead slow with the ORDER BY.
EDIT: Tolgahan ALBAYRAK's solution works, but can anyone explain why it works?

Maybe this helps:
SELECT * FROM (
SELECT
Course.CourseID,
Course.Description,
UserCourse.UserID,
UserCourse.TimeAllowed,
UserCourse.CreatedOn,
UserCourse.PassedOn,
UserCourse.IssuedOn,
C.LessonCnt
FROM
UserCourse
INNER JOIN
Course
USING(CourseID)
INNER JOIN
(
SELECT CourseID, COUNT(*) AS LessonCnt FROM CourseSection GROUP BY CourseID
) C
USING(CourseID)
WHERE
UserCourse.UserID = 8810
) AS t ORDER BY CourseID

Is the column you're ordering by indexed?
Indexing drastically speeds up ordering and filtering.

You are selecting from "UserCourse" which I assume is a joining table between courses and users (Many to Many).
You should index the column that you need to order by, in the "UserCourse" table.
For example, if you want to ORDER BY CourseID, you need to index it in the UserCourse table.
Ordering by any other column that is not present in the joining table (i.e. UserCourse) may require further denormalization and indexing of the joining table to be optimized for speed.
In other words, you need to have a copy of that column in the joining table and index it.
P.S.
The answer given by Tolgahan Albayrak, although correct for this question, would not produce the desired result in cases where one is doing a "LIMIT x" query.
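The point about indexing the joining table can be illustrated with a SQLite sketch (Python's sqlite3 as a stand-in for MySQL; only a few columns are modelled and the index name idx_uc_user_course is hypothetical). With a composite index on UserCourse (UserID, CourseID), ordering by the joining table's CourseID needs no separate sort step, while ordering by Course.Description, which lives in the other table, still does.

```python
import sqlite3

QUERY = """
    SELECT Course.CourseID, Course.Description, UserCourse.CreatedOn
    FROM UserCourse
    JOIN Course USING (CourseID)
    WHERE UserCourse.UserID = 8810
    ORDER BY {order_col}
"""

def plan(conn, sql):
    """Return the EXPLAIN QUERY PLAN detail lines joined into one string."""
    return "\n".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Course (CourseID INTEGER PRIMARY KEY, Description TEXT);
CREATE TABLE UserCourse (UserID INTEGER, CourseID INTEGER, CreatedOn TEXT);
-- Composite index on the joining table: filter column first, sort column second.
CREATE INDEX idx_uc_user_course ON UserCourse (UserID, CourseID);
""")

by_course_id = plan(conn, QUERY.format(order_col="UserCourse.CourseID"))
by_description = plan(conn, QUERY.format(order_col="Course.Description"))
print(by_course_id)    # index search on UserCourse, no sort step
print(by_description)  # includes a temp B-tree for the ORDER BY
```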

Have you updated the statistics on your database (in MySQL, with ANALYZE TABLE)? I ran into something similar on mine: I had two identical queries whose only difference was a capital letter, and one returned in half a second while the other took nearly 5 minutes. Updating the statistics resolved the issue.

I realise this answer is late, but I have just had a similar problem: adding ORDER BY increased the query time from seconds to 5 minutes. Having tried most other suggestions for speeding it up, I noticed that the /tmp files were growing to 12G for this query. I changed the query so that a varchar(20000) field being returned was wrapped in TRIM(), and performance improved dramatically (back to seconds). So it's worth checking whether you are returning large varchars as part of your query and, if so, processing them (maybe with substring(x, 1, length(x)) if you don't want to trim them).
Query was returning 500k rows and the /tmp file indicated that each row was using about 20k of data.
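The two figures in this answer are consistent with each other; a back-of-the-envelope check, taking 1 GB as 10^6 KB:

```latex
500\,000~\text{rows} \times 20~\text{KB/row} = 10^{7}~\text{KB} \approx 10~\text{GB}
```

which is in line with the roughly 12G of /tmp files observed.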

A similar question was asked before here.
It might help you as well. Basically it describes using composite indexes and how order by works.

Today I ran into the same kind of problem. As soon as I sorted the result set by a field from a joined table, the whole query became horribly slow and took more than a hundred seconds.
The server was running MySQL 5.0.51a, and by chance I noticed that the same query ran as fast as it always should have on a server with MySQL 5.1. Comparing the EXPLAIN output for that query, I saw that the usage and handling of indexes had obviously changed a lot (at least from 5.0 to 5.1).
So if you encounter such a problem, maybe the resolution is simply to upgrade MySQL.
