Slow MySQL query -- possibly an index issue? - mysql

So firstly here's my query: (NOTE:I know SELECT * is bad practice I just switched it in to make the query more readable)
SELECT pcln_cities.*,COUNT(pcln_hotels.cityid) AS hotelcount
FROM pcln_cities
LEFT OUTER JOIN pcln_hotels ON pcln_hotels.cityid=pcln_cities.cityid
WHERE pcln_cities.state_name='California' GROUP BY pcln_cities.cityid
ORDER BY hotelcount DESC
LIMIT 5
So I know that to solve things like that you add EXPLAIN to the beginning of the query but I'm not 100% sure how to read the results, so here they are:
alt text http://www.andrew-g-johnson.com/query-results.JPG
Bonus points to an answer that tells me what to look for in the EXPLAIN results
EDIT The cities tables has the following indexes (or is it indices?)
cityid
state_name
and I just added one with both as I thought it might help (it didn't)
The hotels tables has the following indexes (or is it indices?)
cityid

Hmm, there's something not very right in your query.
You use an aggregate function (count), but you simply group by on id.
Normally, you should group on all columns in your select list, that are not an aggregate function.
As you've specified the query now, IMHO, the DBMS can never correctly determine which values he should display for those columns that are not an aggregate ...
It would be more correct if your query was written like:
select cityname, count(*)
from city inner join hotel on hotel.city_id = city_id
group by cityname
order by count(*) desc
If you do not have an index on the cityName, and you filter on cityname, it will improve performance if you put an index on that column.
In short: adding an index on columns that you regularly use for filtering or for sorting, may improve performance.
(That is simply put offcourse, you can use it as a 'guideline', but every situation is different. Sometimes it can be helpfull to add an index which spans multiple columns.
Also, remember that if you update or insert a record, indexes need to be updated as well, so there's a slight performance cost in adding/updating/deleting records)
Another thing that could improve performance, is using an inner join instead of an outer join. I don't think that it is necessary to use an outer join here.

It looks like you don't have an index on pcln_cities.state_name, or pcln_cities.cityid? Try adding them.
Given that you've updated your question to say that you do have these indexes, I can only suggest that your database currently has a preponderance of cities in California, so the query optimizer decided it would be easier to do a table scan and throw out the non-California ones than to use the index to pick out the California ones.

Your query looks fine. Is there a chance that something else has a lock on a record that you need? Are the tables especially big? I doubt that data is the problem as there are not that many hotels...
I've run in to similar issues with MySQL. After spending over a year tuning, patching, and thinking I'm a SQL dummy, I switched to SQL Server Express. The exact same queries with the exact same data would run 2-5 orders of magnitude faster in SQL Server Express. MySQL seemed to have an especially difficult time with moderately complex queries (5+ tables). I think the MySQL optimizer became retarded after SUN bought the organization...

Related

How to make a faster query when joining multiple huge tables?

I have 3 tables. All 3 tables have approximately 2 million rows. Everyday 10,000-100,000 new entries are entered. It takes approximately 10 seconds to finish the sql statement below. Is there a way to make this sql statement faster?
SELECT customers.name
FROM customers
INNER JOIN hotels ON hotels.cus_id = customers.cus_id
INNER JOIN bookings ON bookings.book_id = customers.book_id
WHERE customers.gender = 0 AND
customers.cus_id = 3
LIMIT 25 OFFSET 1;
Of course this statement works fine, but its slow. Is there a better way to write this code?
All database servers have a form of an optimization engine that is going to determine how best to grab the data you want. With a simple query such as the select you showed, there isn't going to be any way to greatly improve performance within the SQL. As others have said sub-queries won't helps as that will get optimized into the same plan as joins.
Reduce the number of columns, add indexes, beef up the server if that's an option.
Consider caching. I'm not a mysql expert but found this article interesting and worth a skim. https://www.percona.com/blog/2011/04/04/mysql-caching-methods-and-tips/
Look at the section on summary tables and consider if that would be appropriate. Does pulling every hotel, customer, and booking need to be up-to-the-minute or would inserting this into a summary table once an hour be fine?
A subquery don't help but a proper index can improve the performance so be sure you have proper index
create index idx1 on customers(gender , cus_id,book_id, name )
create index idex2 on hotels(cus_id)
create index idex3 on hotels(book_id)
I find it a bit hard to believe that this is related to a real problem. As written, I would expect this to return the same customer name over and over.
I would recommend the following indexes:
customers(cus_id, gender, book_id, name)
hotels(cus_id)
bookings(book_id)
It is really weird that bookings are not to a hotel.
First, these indexes cover the query, so the data pages don't need to be accessed. The logic is to start with the where clause and use those columns first. Then add additional columns from the on and select clauses.
Only one column is used for hotels and bookings, so those indexes are trivial.
The use of OFFSET without ORDER BY is quite suspicious. The result set is in indeterminate order anyway, so there is no reason to skip the nominally "first" value.

Complex MySQL Query is Slow

A program I've been working on uses a complex MySQL query to combine information from several tables that have matching item IDs. However, since I added the subqueries you see below, the query has gone from taking under 1 second to execute to over 3 seconds. Do you have any suggestions for what I might do to optimize this query to be faster? Am I wrong in my thinking that having one complex query is better than having 4 or 5 smaller queries?
SELECT uninet_articles.*,
Unix_timestamp(uninet_articles.gmt),
uninet_comments.commentcount,
uninet_comments.lastposter,
Unix_timestamp(uninet_comments.maxgmt)
FROM uninet_articles
RIGHT JOIN (SELECT aid,
(SELECT poster
FROM uninet_comments AS a
WHERE b.aid = a.aid
ORDER BY gmt DESC
LIMIT 1) AS lastposter,
Count(*) AS commentcount,
Max(gmt) AS maxgmt
FROM uninet_comments AS b
GROUP BY aid
ORDER BY maxgmt DESC
LIMIT 10) AS uninet_comments
ON uninet_articles.aid = uninet_comments.aid
LIMIT 10
Queries can be though of as going through the data to find what matches. Sub-queries require going through the data many times in order to find which items are needed. In this case, you probably want to rewrite it as multiple queries. Many times, multiple simpler queries will be better - I think this is one of those cases.
You can also look at if your indexes are working well - if you know what that is. The reason why has to do with this: How does database indexing work?.
For a specific suggestion, you can find the last poster for each AID in a different query, and simply join it afterwards.
It always depends on the data you have and the way you use it.
You should use explain on your selects to see if you are using the indexes or not.
http://dev.mysql.com/doc/refman/5.5/en/explain.html

Should criteria be duplicated on subqueries

I have a query which actually runs two queries on a table. I query the whole table, a datediff and then a subquery which tells me the sum of hours each unit spent in certain operational steps. The main query limits the results to the REP depot so technically I don't need to put that same criteria on the subquery since repair_order is unique.
Would it be faster, slower or no difference to apply the depot filter on the subquery?
SELECT
*,
DATEDIFF(date_shipped, date_received) as htg_days,
(SELECT SUM(t3.total_days) FROM report_tables.cycle_time_days as t3 WHERE t1.repair_order=t3.repair_order AND (operation='MFG' OR operation='ENG' OR operation='ENGH' OR operation='HOLD') GROUP BY t3.repair_order) as subt_days
FROM
report_tables.cycle_time_days as t1
WHERE
YEAR(t1.date_shipped)=2010
AND t1.depot='REP'
GROUP BY
repair_order
ORDER BY
date_shipped;
I run into this with a lot of situations but I never know if it would be better to put the filter in the sub query, main query or both.
In this example, it would actually alter the query if you moved your WHERE clause to filter by REP into the subquery. So it wouldn't be about performance at that point, it would be about getting the same result set. In general, though, if you will get the same exact result set by moving a WHERE clause elsewhere in a complex query, it is better to do so at the most atomic level possible, ie, in the subquery. Then the subquery returns a smaller result set to the main query before the main query has to process it.
The answer to your question will vary depending on your schema, the complexity of your queries, the reliability of your data, etc. A general rule of thumb is to try to process the least amount of data possible, which generally means filtering it at the lowest level possible as well.
When you want to optimize a query the absolute number one place to start is to use the EXPLAIN output to see what optimizations the query parser was able to figure out and check to see what the weakest link is in the query plan. Resolve that, rinse, repeat.
You can also use explain's "extended" keyword to see the actual query it built to run which will reveal more about its usage of your criteria. In some cases, it will optimize away duplicate conditions between parent/subqueries. In other cases, it may push the conditions down from the parent in to the subquery. In some cases for (too) complex queries I've seen the it repeat the condition when it was only specified in the query once. Thankfully, you don't have to guess, mysql's explain plan will reveal all, albeit sometimes in cryptic ways.
I usually use a derived table as a "driver or aggregating" query then join that result back onto whatever table that i want to pull data from:
select
t1.*,
datediff(t1.date_shipped, t1.date_received) as htg_days,
subt_days.total_days
from
cycle_time_days as t1
inner join
(
-- aggregating/driver query
select
repair_order,
sum(total_days) as total_days
from
cycle_time_days
where
year(date_shipped) = 2010 and depot = 'REP' and
operation in ('MFG','ENG','ENGH','HOLD') -- covering index on date, depot, op ???
group by
repair_order -- indexed ??
having
total_days > 14 -- added for demonstration purposes
order by
total_days desc limit 10
) as subt_days on t1.repair_order = subt_days.repair_order
order by
t1.date_shipped;

How can I improve the performance of this MySQL query?

I have a MySQL query:
SELECT DISTINCT
c.id,
c.company_name,
cd.firstname,
cd.surname,
cis.description AS industry_sector
FROM (clients c)
JOIN clients_details cd ON c.id = cd.client_id
LEFT JOIN clients_industry_sectors cis ON cd.industry_sector_id = cis.id
WHERE c.record_type='virgin'
ORDER BY date_action, company_name asc, id desc
LIMIT 30
The clients table has about 60-70k rows and has an index for 'id', 'record_type', 'date_action' and 'company_name' - unfortunately the query still takes 5+ secs to complete. Removing the 'ORDER BY' reduces this to about 30ms since a filesort is not required. Is there any way I can alter this query to improve upon the 5+ sec response time?
See: http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
Especially:
In some cases, MySQL cannot use indexes to resolve the ORDER BY (..). These cases include the following:
(..)
You are joining many tables, and the columns in the ORDER BY are not all from the first nonconstant table that is used to retrieve rows. (This is the first table in the EXPLAIN output that does not have a const join type.)
You have an index for id, record_type, date_action. But if you want to order by date_action, you really need an index that has date_action as the first field in the index, preferably matching the exact fields in the order by. Otherwise yes, it will be a slow query.
Without seeing all your tables and indexes, it's hard to tell. When asking a question about speeding up a query, the query is just part of the equation.
Does clients have an index on id?
Does clients have an index on record_type
Does clients_details have an index on client_id?
Does clients_industry_sectors have an index on id?
These are the minimum you need for this query to have any chance of working quickly.
thanks so much for the input and suggestions. In the end I've decided to create a new DB table which has the sole purpose of existing to return results for this purpose so no joins are required, I just update the table when records are added or deleted to/from the master clients table. Not ideal from a data storage point of view but it solves the problem and means I'm getting results fantastically fast. :)

Slow query when using ORDER BY

Here's the query (the largest table has about 40,000 rows)
SELECT
Course.CourseID,
Course.Description,
UserCourse.UserID,
UserCourse.TimeAllowed,
UserCourse.CreatedOn,
UserCourse.PassedOn,
UserCourse.IssuedOn,
C.LessonCnt
FROM
UserCourse
INNER JOIN
Course
USING(CourseID)
INNER JOIN
(
SELECT CourseID, COUNT(*) AS LessonCnt FROM CourseSection GROUP BY CourseID
) C
USING(CourseID)
WHERE
UserCourse.UserID = 8810
If I run this, it executes very quickly (.05 seconds roughly). It returns 13 rows.
When I add an ORDER BY clause at the end of the query (ordering by any column) the query takes about 10 seconds.
I'm using this database in production now, and everything is working fine. All my other queries are speedy.
Any ideas of what it could be? I ran the query in MySQL's Query Browser, and from the command line. Both places it was dead slow with the ORDER BY.
EDIT: Tolgahan ALBAYRAK solution works, but can anyone explain why it works?
maybe this helps:
SELECT * FROM (
SELECT
Course.CourseID,
Course.Description,
UserCourse.UserID,
UserCourse.TimeAllowed,
UserCourse.CreatedOn,
UserCourse.PassedOn,
UserCourse.IssuedOn,
C.LessonCnt
FROM
UserCourse
INNER JOIN
Course
USING(CourseID)
INNER JOIN
(
SELECT CourseID, COUNT(*) AS LessonCnt FROM CourseSection GROUP BY CourseID
) C
USING(CourseID)
WHERE
UserCourse.UserID = 8810
) ORDER BY CourseID
Is the column you're ordering by indexed?
Indexing drastically speeds up ordering and filtering.
You are selecting from "UserCourse" which I assume is a joining table between courses and users (Many to Many).
You should index the column that you need to order by, in the "UserCourse" table.
Suppose you want to "order by CourseID", then you need to index it on UserCourse table.
Ordering by any other column that is not present in the joining table (i.e. UserCourse) may require further denormalization and indexing on the joining table to be optimized for speed;
In other words, you need to have a copy of that column in the joining table and index it.
P.S.
The answer given by Tolgahan Albayrak, although correct for this question, would not produce the desired result, in cases where one is doing a "LIMIT x" query.
Have you updated the statistics on your database? I ran into something similar on mine where I had 2 identical queries where the only difference was a capital letter and one returned in 1/2 a second and the other took nearly 5 minutes. Updating the statistics resolved the issue
Realise answer is too late, however I have just had a similar problem, adding order by increased the query time from seconds to 5 minutes and having tried most other suggestions for speeding it up, noticed that the /tmp files where getting to be 12G for this query. Changed the query such that a varchar(20000) field being returned was "trim("ed and performance dramatically improved (back to seconds). So I guess its worth checking whether you are returning large varchars as part of your query and if so, process them (maybe substring(x, 1, length(x))?? if you dont want to trim them.
Query was returning 500k rows and the /tmp file indicated that each row was using about 20k of data.
A similar question was asked before here.
It might help you as well. Basically it describes using composite indexes and how order by works.
Today I was running into a same kind of problem. As soon as I was sorting the resultset by a field from a joined table, the whole query was horribly slow and took more than a hundred seconds.
The server was running MySQL 5.0.51a and by chance I noticed that the same query was running as fast as it should have always done on a server with MySQL 5.1. When comparing the explains for that query I saw that obviously the usage and handling of indexes has changed a lot (at least from 5.0 -> 5.1).
So if you encounter such a problem, maybe your resolution is to simply upgrade your MySQL