How to make a faster query when joining multiple huge tables? - mysql

I have 3 tables. All 3 tables have approximately 2 million rows. Every day, 10,000-100,000 new entries are added. The SQL statement below takes approximately 10 seconds to finish. Is there a way to make it faster?
SELECT customers.name
FROM customers
INNER JOIN hotels ON hotels.cus_id = customers.cus_id
INNER JOIN bookings ON bookings.book_id = customers.book_id
WHERE customers.gender = 0 AND
customers.cus_id = 3
LIMIT 25 OFFSET 1;
Of course this statement works fine, but it's slow. Is there a better way to write this code?

All database servers have some form of optimization engine that determines how best to fetch the data you want. With a simple query such as the SELECT you showed, there isn't going to be any way to greatly improve performance within the SQL itself. As others have said, sub-queries won't help, as they will get optimized into the same plan as joins.
Reduce the number of columns, add indexes, beef up the server if that's an option.
Consider caching. I'm not a MySQL expert, but I found this article interesting and worth a skim: https://www.percona.com/blog/2011/04/04/mysql-caching-methods-and-tips/
Look at the section on summary tables and consider if that would be appropriate. Does pulling every hotel, customer, and booking need to be up-to-the-minute or would inserting this into a summary table once an hour be fine?

A subquery won't help, but a proper index can improve performance, so be sure you have the right indexes:
create index idx1 on customers(gender, cus_id, book_id, name);
create index idx2 on hotels(cus_id);
create index idx3 on bookings(book_id);

I find it a bit hard to believe that this is related to a real problem. As written, I would expect this to return the same customer name over and over.
I would recommend the following indexes:
customers(cus_id, gender, book_id, name)
hotels(cus_id)
bookings(book_id)
It is really weird that bookings are not tied to a hotel.
First, these indexes cover the query, so the data pages don't need to be accessed. The logic is to start with the where clause and use those columns first. Then add additional columns from the on and select clauses.
Only one column is used for hotels and bookings, so those indexes are trivial.
The use of OFFSET without ORDER BY is quite suspicious. The result set is in indeterminate order anyway, so there is no reason to skip the nominally "first" value.
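To see what "these indexes cover the query" means in practice, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The column types and the index names (idx_customers_cover, idx_hotels_cus, idx_bookings_book) are assumptions made for illustration, and SQLite's planner is not MySQL's, so treat the output as a picture of the idea rather than a prediction of what MySQL's EXPLAIN would print.

```python
import sqlite3

# SQLite stands in for MySQL; column types and index names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cus_id INTEGER, gender INTEGER, book_id INTEGER, name TEXT);
    CREATE TABLE hotels (cus_id INTEGER);
    CREATE TABLE bookings (book_id INTEGER);
    -- WHERE columns first, then the join column, then the selected column.
    CREATE INDEX idx_customers_cover ON customers (cus_id, gender, book_id, name);
    CREATE INDEX idx_hotels_cus ON hotels (cus_id);
    CREATE INDEX idx_bookings_book ON bookings (book_id);
""")

query = """
    SELECT customers.name
    FROM customers
    INNER JOIN hotels ON hotels.cus_id = customers.cus_id
    INNER JOIN bookings ON bookings.book_id = customers.book_id
    WHERE customers.gender = 0 AND customers.cus_id = 3
    LIMIT 25 OFFSET 1
"""

# The fourth column of each EXPLAIN QUERY PLAN row is the readable plan step.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)
```

A plan step that mentions a covering index on customers means every column the query needs from that table is read from the index alone. In MySQL, the rough equivalent is "Using index" in the Extra column of EXPLAIN.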

Related

Optimized query to find avg in MySQL with inner join

Currently I am running a query to find averages while joining one more table.
The results are as expected, but it does not perform well and takes a long time to execute. I need help finding a better query. The current query is:
SELECT AVG(t2.a),
AVG(t2.b),
AVG(t2.c),
t1.column1,
t1.column2
FROM table1 t1
INNER JOIN table2 t2
ON t1.column = t2.column
GROUP BY t1.column1, t2.column2
In the future when asking questions related to performance please always include an EXPLAIN output. You basically just write "EXPLAIN SELECT ....;" and it'll show you the execution plan of that query which includes detailed information which may hint towards possible optimizations.
Two things:
JOINs on unindexed columns may be very slow.
GROUP BY queries are generally slow because they require sorting, especially when grouping on multiple columns. GROUP BY can use an index scan, but that requires a composite index on the columns involved, which probably doesn't work in your case because the columns come from different tables.
How many rows do you have? If you're grouping hundreds of millions of rows you can easily expect query times in the range of hours (and I'm dead serious about hours). Grouping is just a horrifically expensive operation, especially because memory limits can force the sort onto disk, and disk I/O is so much slower than memory.
There are two possible answers.
The query is wrong -- because the JOIN occurs before the AVG, the average is computed over too many rows.
The query is right -- in which case, there is a lot of work to do, so it takes time. I have to believe that this is the case, since you GROUP BY columns from both tables.
Please provide real column names; it could help our understanding of the query.
But assuming the first case, let's fix the math and speed it up.
Compute the averages in a "derived" table.
Do the JOIN.
I won't attempt to write the code until I have some assurance that the case is worth pursuing.
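For readers who want to see the shape of that two-step rewrite, here is a minimal sketch run through Python's built-in sqlite3 module as a stand-in for MySQL. The join column k, the single averaged column a, and the sample rows are all hypothetical, since the question does not give real column names. It only illustrates "aggregate first, then join" and is not a drop-in replacement for the original query.

```python
import sqlite3

# Hypothetical schema and data: k is the join column, a is the averaged column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (k INTEGER, column1 TEXT);
    CREATE TABLE table2 (k INTEGER, a REAL);
    INSERT INTO table1 VALUES (1, 'x'), (2, 'y');
    INSERT INTO table2 VALUES (1, 10), (1, 20), (2, 40);
""")

rows = conn.execute("""
    SELECT t1.k, t1.column1, d.avg_a
    FROM table1 AS t1
    INNER JOIN (
        -- Step 1: compute the averages per join key in a derived table.
        SELECT k, AVG(a) AS avg_a
        FROM table2
        GROUP BY k
    ) AS d ON d.k = t1.k   -- Step 2: join the already-aggregated rows.
    ORDER BY t1.k
""").fetchall()
print(rows)
```

Because the aggregation runs on table2 alone, the join only has to match one row per key, and duplicate matches in table1 cannot change the averages.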

Optimizing mysql query with the proper index

I have a table of 15.1 million records. I'm running the following query on it to process the records for duplicate checking.
select id, name, state, external_id
from companies
where dup_checked=0
order by name
limit 500;
When I use explain extended on the query it tells me it's using the index_companies_on_name index which is just an index on the company name. I'm assuming this is due to the ordering. I tried creating other indexes based on the name and dup_checked fields hoping it would use this one as it may be faster, but it still uses the index_companies_on_name index.
Initially it was fast enough, but now we're down to 3.3 million records left to check and this query is taking up to 90 seconds to execute. I'm not quite sure what else to do to make this run faster. Is a different index the answer or something else I'm not thinking of? Thanks.
Generally the trick here is to create an index that filters first, reducing the number of rows ("Cardinality"), and has the ordering applied secondarily:
CREATE INDEX `index_companies_on_dup_checked_name`
ON `companies` (`dup_checked`,`name`)
That should give you the scope you need.
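As a rough check of the "filter first, order second" idea, the sketch below builds both indexes in SQLite (through Python's built-in sqlite3 module) and prints the plan. SQLite is only a stand-in here and the column types are assumptions. MySQL's optimizer makes its own choice, so run EXPLAIN on the real table to confirm.

```python
import sqlite3

# SQLite stands in for MySQL; column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (
        id INTEGER PRIMARY KEY,
        name TEXT,
        state TEXT,
        external_id TEXT,
        dup_checked INTEGER
    );
    CREATE INDEX index_companies_on_name ON companies (name);
    -- Equality-filter column first, ORDER BY column second.
    CREATE INDEX index_companies_on_dup_checked_name
        ON companies (dup_checked, name);
""")

plan = [row[3] for row in conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT id, name, state, external_id
    FROM companies
    WHERE dup_checked = 0
    ORDER BY name
    LIMIT 500
""")]
for step in plan:
    print(step)
```

With the composite index, the rows matching dup_checked = 0 are already stored in name order, so the engine can stop after the first 500 index entries instead of walking the name index past every already-checked row.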

How to make my MySQL SUM() query faster

I have about 1 million rows in a users table, with columns such as A, AA, B, BB, C, CC, D, DD, E, EE, F, FF that hold int values 0 and 1, which I want to count:
SELECT
CityCode,SUM(A),SUM(B),SUM(C),SUM(D),SUM(E),SUM(F),SUM(AA),SUM(BB),SUM(CC),SUM(DD),SUM(EE),SUM(FF)
FROM users
GROUP BY CityCode
Result 8 rows in set (24.49 sec).
How can I make my statement faster?
Use EXPLAIN to see the execution plan of your query.
Create at least one index. If possible, make CityCode the primary key.
Try this one
SELECT CityCode,SUM(A),SUM(B),SUM(C),SUM(D), SUM(E),SUM(F),SUM(AA),SUM(BB),SUM(CC),SUM(DD),SUM(EE),SUM(FF)
FROM users
GROUP BY CityCode,A,B,C,D,E,F,AA,BB,CC,DD,EE,FF
Create an index on the CityCode column.
I believe it is not because of SUM(). Try running select CityCode from users group by CityCode; it should take nearly the same time...
Use better hardware
increase caching size - if you use InnoDB engine, then increase the innodb_buffer_pool_size value
refactor your query to limit the number of users (if business logic permits that, of course)
You have no WHERE clause, which means the query has to scan the whole table. This will make it slow on a large table.
You should consider how often you need to do this and what the impact of it being slow is. Some suggestions are:
Don't change anything - if it doesn't really matter
Have a table which contains the same data as "users", but without the columns that you aren't interested in querying. It will still be slow, but not as slow, especially if the omitted columns are large
(InnoDB) use CityCode as the first part of the primary key for table "users", that way it can do a PK scan and avoid any sorting (may still be too slow)
Create and maintain some kind of summary table, but you'll need to update it each time a user changes (or tolerate stale data)
But be sure that this optimisation is absolutely necessary.
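The summary-table suggestion could look something like the sketch below, which takes the "tolerate stale data" route and rebuilds the totals on a schedule. It runs against SQLite through Python's built-in sqlite3 module as a stand-in for MySQL. The city_totals table, the refresh_city_totals function, the sample rows, and the reduction to two flag columns are all hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL; only two of the flag columns are shown.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (CityCode TEXT, A INTEGER, B INTEGER);
    -- Hypothetical summary table: one row per city.
    CREATE TABLE city_totals (
        CityCode TEXT PRIMARY KEY,
        sum_a INTEGER,
        sum_b INTEGER
    );
""")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("NYC", 1, 0), ("NYC", 1, 1), ("LA", 0, 1)],
)

def refresh_city_totals(conn):
    """Rebuild the summary from scratch; intended to run on a schedule."""
    with conn:  # commit on success, roll back if either statement fails
        conn.execute("DELETE FROM city_totals")
        conn.execute("""
            INSERT INTO city_totals (CityCode, sum_a, sum_b)
            SELECT CityCode, SUM(A), SUM(B)
            FROM users
            GROUP BY CityCode
        """)

refresh_city_totals(conn)
totals = conn.execute(
    "SELECT CityCode, sum_a, sum_b FROM city_totals ORDER BY CityCode"
).fetchall()
print(totals)
```

Reads then hit a table with one row per city instead of scanning a million users, at the cost of results that are only as fresh as the last refresh.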

Slow query when using ORDER BY

Here's the query (the largest table has about 40,000 rows)
SELECT
Course.CourseID,
Course.Description,
UserCourse.UserID,
UserCourse.TimeAllowed,
UserCourse.CreatedOn,
UserCourse.PassedOn,
UserCourse.IssuedOn,
C.LessonCnt
FROM
UserCourse
INNER JOIN
Course
USING(CourseID)
INNER JOIN
(
SELECT CourseID, COUNT(*) AS LessonCnt FROM CourseSection GROUP BY CourseID
) C
USING(CourseID)
WHERE
UserCourse.UserID = 8810
If I run this, it executes very quickly (.05 seconds roughly). It returns 13 rows.
When I add an ORDER BY clause at the end of the query (ordering by any column) the query takes about 10 seconds.
I'm using this database in production now, and everything is working fine. All my other queries are speedy.
Any ideas of what it could be? I ran the query in MySQL's Query Browser, and from the command line. Both places it was dead slow with the ORDER BY.
EDIT: Tolgahan ALBAYRAK solution works, but can anyone explain why it works?
maybe this helps:
SELECT * FROM (
SELECT
Course.CourseID,
Course.Description,
UserCourse.UserID,
UserCourse.TimeAllowed,
UserCourse.CreatedOn,
UserCourse.PassedOn,
UserCourse.IssuedOn,
C.LessonCnt
FROM
UserCourse
INNER JOIN
Course
USING(CourseID)
INNER JOIN
(
SELECT CourseID, COUNT(*) AS LessonCnt FROM CourseSection GROUP BY CourseID
) C
USING(CourseID)
WHERE
UserCourse.UserID = 8810
) AS T ORDER BY CourseID
Is the column you're ordering by indexed?
Indexing drastically speeds up ordering and filtering.
You are selecting from "UserCourse" which I assume is a joining table between courses and users (Many to Many).
You should index the column that you need to order by, in the "UserCourse" table.
Suppose you want to "order by CourseID", then you need to index it on UserCourse table.
Ordering by any other column that is not present in the joining table (i.e. UserCourse) may require further denormalization and indexing on the joining table to be optimized for speed;
In other words, you need to have a copy of that column in the joining table and index it.
P.S.
The answer given by Tolgahan Albayrak, although correct for this question, would not produce the desired result, in cases where one is doing a "LIMIT x" query.
Have you updated the statistics on your database? I ran into something similar on mine, where I had 2 identical queries whose only difference was a capital letter: one returned in half a second and the other took nearly 5 minutes. Updating the statistics resolved the issue.
I realise this answer is late, but I have just had a similar problem: adding ORDER BY increased the query time from seconds to 5 minutes. Having tried most other suggestions for speeding it up, I noticed that the /tmp files were growing to 12G for this query. I changed the query so that a varchar(20000) field being returned was wrapped in trim(), and performance improved dramatically (back to seconds). So I guess it's worth checking whether you are returning large varchars as part of your query and, if so, processing them (maybe substring(x, 1, length(x))?) if you don't want to trim them.
Query was returning 500k rows and the /tmp file indicated that each row was using about 20k of data.
A similar question was asked before here.
It might help you as well. Basically it describes using composite indexes and how order by works.
Today I ran into the same kind of problem. As soon as I sorted the result set by a field from a joined table, the whole query became horribly slow and took more than a hundred seconds.
The server was running MySQL 5.0.51a, and by chance I noticed that the same query ran as fast as it should on a server with MySQL 5.1. Comparing the EXPLAIN output for that query, I saw that the usage and handling of indexes had changed a lot (at least from 5.0 to 5.1).
So if you encounter such a problem, maybe your resolution is simply to upgrade your MySQL.

Slow MySQL query -- possibly an index issue?

So firstly here's my query: (NOTE:I know SELECT * is bad practice I just switched it in to make the query more readable)
SELECT pcln_cities.*,COUNT(pcln_hotels.cityid) AS hotelcount
FROM pcln_cities
LEFT OUTER JOIN pcln_hotels ON pcln_hotels.cityid=pcln_cities.cityid
WHERE pcln_cities.state_name='California' GROUP BY pcln_cities.cityid
ORDER BY hotelcount DESC
LIMIT 5
So I know that to solve things like that you add EXPLAIN to the beginning of the query but I'm not 100% sure how to read the results, so here they are:
[Image: EXPLAIN query results, http://www.andrew-g-johnson.com/query-results.JPG]
Bonus points to an answer that tells me what to look for in the EXPLAIN results
EDIT: The cities table has the following indexes (or is it indices?)
cityid
state_name
and I just added one with both as I thought it might help (it didn't)
The hotels table has the following indexes (or is it indices?)
cityid
Hmm, there's something not very right in your query.
You use an aggregate function (count), but you group only by the id.
Normally, you should group by all columns in your select list that are not inside an aggregate function.
As you've specified the query now, IMHO, the DBMS can never correctly determine which values it should display for the columns that are not aggregated ...
It would be more correct if your query was written like:
select cityname, count(*)
from city inner join hotel on hotel.city_id = city.city_id
group by cityname
order by count(*) desc
If you do not have an index on the cityName, and you filter on cityname, it will improve performance if you put an index on that column.
In short: adding an index on columns that you regularly use for filtering or for sorting, may improve performance.
(That is simply put, of course. You can use it as a guideline, but every situation is different. Sometimes it can be helpful to add an index which spans multiple columns.
Also, remember that if you update or insert a record, indexes need to be updated as well, so there's a slight performance cost in adding/updating/deleting records.)
Another thing that could improve performance, is using an inner join instead of an outer join. I don't think that it is necessary to use an outer join here.
It looks like you don't have an index on pcln_cities.state_name, or pcln_cities.cityid? Try adding them.
Given that you've updated your question to say that you do have these indexes, I can only suggest that your database currently has a preponderance of cities in California, so the query optimizer decided it would be easier to do a table scan and throw out the non-California ones than to use the index to pick out the California ones.
Your query looks fine. Is there a chance that something else has a lock on a record that you need? Are the tables especially big? I doubt that data is the problem as there are not that many hotels...
I've run into similar issues with MySQL. After spending over a year tuning, patching, and thinking I'm a SQL dummy, I switched to SQL Server Express. The exact same queries with the exact same data would run 2-5 orders of magnitude faster in SQL Server Express. MySQL seemed to have an especially difficult time with moderately complex queries (5+ tables). I think the MySQL optimizer got worse after Sun bought the organization...