I have two tables lets say employees and order, both table have millions of records.
Select orders.*
from orders
INNER JOIN employees
on Employees.id = orders.employeeid
WHERE orders.type='daily'
and orders.date > Employees.registerDate
and orders.date < Employees.regiserDate + Interval 60 Days
The above query is rough query, there may be syntax error, but is just considerations.
The query consuming almost 60 seconds to load, any body know how can i optimize this query
The indexing is one of the best options mentioned, you could also use stored procedure to optimize its performance on retrieving data.
you may would also like to use the ANALYZE statement to optimize the retrieval of data from those tables you may read more about this statement from this site:
http://dev.mysql.com/doc/refman/5.0/en/analyze-table.html
Set your indexes right (which you probably did?) or create views. You could also consider a temp table, but most of the time the sync makes it not worth it.
Related
I have 3 tables. All 3 tables have approximately 2 million rows. Everyday 10,000-100,000 new entries are entered. It takes approximately 10 seconds to finish the sql statement below. Is there a way to make this sql statement faster?
SELECT customers.name
FROM customers
INNER JOIN hotels ON hotels.cus_id = customers.cus_id
INNER JOIN bookings ON bookings.book_id = customers.book_id
WHERE customers.gender = 0 AND
customers.cus_id = 3
LIMIT 25 OFFSET 1;
Of course this statement works fine, but its slow. Is there a better way to write this code?
All database servers have a form of an optimization engine that is going to determine how best to grab the data you want. With a simple query such as the select you showed, there isn't going to be any way to greatly improve performance within the SQL. As others have said sub-queries won't helps as that will get optimized into the same plan as joins.
Reduce the number of columns, add indexes, beef up the server if that's an option.
Consider caching. I'm not a mysql expert but found this article interesting and worth a skim. https://www.percona.com/blog/2011/04/04/mysql-caching-methods-and-tips/
Look at the section on summary tables and consider if that would be appropriate. Does pulling every hotel, customer, and booking need to be up-to-the-minute or would inserting this into a summary table once an hour be fine?
A subquery don't help but a proper index can improve the performance so be sure you have proper index
create index idx1 on customers(gender , cus_id,book_id, name )
create index idex2 on hotels(cus_id)
create index idex3 on hotels(book_id)
I find it a bit hard to believe that this is related to a real problem. As written, I would expect this to return the same customer name over and over.
I would recommend the following indexes:
customers(cus_id, gender, book_id, name)
hotels(cus_id)
bookings(book_id)
It is really weird that bookings are not to a hotel.
First, these indexes cover the query, so the data pages don't need to be accessed. The logic is to start with the where clause and use those columns first. Then add additional columns from the on and select clauses.
Only one column is used for hotels and bookings, so those indexes are trivial.
The use of OFFSET without ORDER BY is quite suspicious. The result set is in indeterminate order anyway, so there is no reason to skip the nominally "first" value.
Currently I am running a query to find average with joining one more table.
Results are as expected but it does not perform very well, taking a lot of time to execute. So need a help to find the better query. Current query is:
SELECT AVG(t2.a),
AVG(t2.b),
AVG(t2.c),
t1.column1,
t1.column2
FROM table1 t1
INNER JOIN table2 t2
ON t1.column = t2.column
GROUP BY t1.column1, t2.column2
In the future when asking questions related to performance please always include an EXPLAIN output. You basically just write "EXPLAIN SELECT ....;" and it'll show you the execution plan of that query which includes detailed information which may hint towards possible optimizations.
Two things:
JOINs on unindexed columns may be very slow.
GROUP BY statements are generally slow kind of queries as they require sorting, especially when grouping multiple columns. GROUP BY can do index scans but this requires that there's tuple indexes on the involved columns which in your case since you're selecting columns from different tables probably doesn't work.
How many rows do you have? If you're grouping hundreds of million of rows you can easily expect query times that are in the range of hours (and I'm dead serious about hours). Grouping is just a horrifically expensive operation. Especially because you have memory limits which means the sorting takes place on-disk which induces an additional slowdown due to disk i/o which is just soo much slower than memory.
There are two possible answers.
The query is wrong -- Because the JOIN occurs before the AVERAGE, hence the average is of too many rows.
The query is right -- in which case, there is a lot of work to do, so it takes time. I have to believe that this is the case, since you GROUP BY columns from both tables.
Please provide real column names; it could help our understanding of the query.
But assuming the first case, let's fix the math and speed it up.
Computer the averages in a "derived" table.
Do the JOIN.
I won't attempt to write the code until I have some assurance that the case is worth pursuing.
How can i optimize this query? It's taking a lot of time to execute.
SELECT GROUP_CONCAT(`details_value`), details_key
FROM `correspondence_package_details`
WHERE `correspondence_id` IN (SELECT GROUP_CONCAT(`correspondence_id`)
FROM `correspondence_package_header`
WHERE create_date BETWEEN '2017-11-01' AND '2017-11-30'
GROUP BY `manual_package_id`,account_id)
AND (`details_key` = 'to' OR `details_key` = 'cc')
GROUP BY details_key;
Query optimization is both an art and science. By simply looking at the sql, you are limiting your options.
Run an explain plan on your query. Check for low performing join types and scans.
Add appropriate indexes on all join columns.
For date fields, sort the index based on whether you would normally be querying most recent or oldest records.
Regular run the optimize command on your databases so that the indexes and tablespaces get a quick tune up.
The query plan will be your best tool for this.
I Had a problem with order by when joins multiple tables which have millions of data. But I got solution as instead of join with distinct use of EXISTS will improve performance from the following question
How to improve order by performance with joins in mysql
SELECT
`tracked_twitter` . *,
COUNT( * ) AS twitterContentCount,
retweet_count + favourite_count + reply_count AS engagement
FROM
`tracked_twitter`
INNER JOIN
`twitter_content`
ON `tracked_twitter`.`id` = `twitter_content`.`tracked_twitter_id`
INNER JOIN
`tracker_twitter_content`
ON `twitter_content`.`id` = `tracker_twitter_content`.`twitter_content_id`
WHERE
`tracker_twitter_content`.`tracker_id` = '88'
GROUP BY
`tracked_twitter`.`id`
ORDER BY
twitterContentCount DESC LIMIT 20 OFFSET 0
But that method solves if I only need the result set from the parent table. What if, I want to execute grouped count and other math functions in other than parent table. I wrote a query that solves my criteria, but it takes 20 sec to execute. How can I optimize it ??.
Thanks in advance
Given the query is already fairly simple the options I'd look in to are ...
Execution plan (to find any missing indexes you could add)
caching (to ensure SQL already has all the data in ram)
de-normalisation (to turn the query in to flat select)
cache the data in the application (so you could use something like PLINQ on it)
Use a ram based store (redis, elastic)
File group adjustments (physically move the db to faster discs)
Partition your tables (to spread the raw data over multiple physical discs)
The further you go down this list the more involved the solutions become.
I guess it depends how fast you need the query to be and how much you need your solution to scale.
Here's the query (the largest table has about 40,000 rows)
SELECT
Course.CourseID,
Course.Description,
UserCourse.UserID,
UserCourse.TimeAllowed,
UserCourse.CreatedOn,
UserCourse.PassedOn,
UserCourse.IssuedOn,
C.LessonCnt
FROM
UserCourse
INNER JOIN
Course
USING(CourseID)
INNER JOIN
(
SELECT CourseID, COUNT(*) AS LessonCnt FROM CourseSection GROUP BY CourseID
) C
USING(CourseID)
WHERE
UserCourse.UserID = 8810
If I run this, it executes very quickly (.05 seconds roughly). It returns 13 rows.
When I add an ORDER BY clause at the end of the query (ordering by any column) the query takes about 10 seconds.
I'm using this database in production now, and everything is working fine. All my other queries are speedy.
Any ideas of what it could be? I ran the query in MySQL's Query Browser, and from the command line. Both places it was dead slow with the ORDER BY.
EDIT: Tolgahan ALBAYRAK solution works, but can anyone explain why it works?
maybe this helps:
SELECT * FROM (
SELECT
Course.CourseID,
Course.Description,
UserCourse.UserID,
UserCourse.TimeAllowed,
UserCourse.CreatedOn,
UserCourse.PassedOn,
UserCourse.IssuedOn,
C.LessonCnt
FROM
UserCourse
INNER JOIN
Course
USING(CourseID)
INNER JOIN
(
SELECT CourseID, COUNT(*) AS LessonCnt FROM CourseSection GROUP BY CourseID
) C
USING(CourseID)
WHERE
UserCourse.UserID = 8810
) ORDER BY CourseID
Is the column you're ordering by indexed?
Indexing drastically speeds up ordering and filtering.
You are selecting from "UserCourse" which I assume is a joining table between courses and users (Many to Many).
You should index the column that you need to order by, in the "UserCourse" table.
Suppose you want to "order by CourseID", then you need to index it on UserCourse table.
Ordering by any other column that is not present in the joining table (i.e. UserCourse) may require further denormalization and indexing on the joining table to be optimized for speed;
In other words, you need to have a copy of that column in the joining table and index it.
P.S.
The answer given by Tolgahan Albayrak, although correct for this question, would not produce the desired result, in cases where one is doing a "LIMIT x" query.
Have you updated the statistics on your database? I ran into something similar on mine where I had 2 identical queries where the only difference was a capital letter and one returned in 1/2 a second and the other took nearly 5 minutes. Updating the statistics resolved the issue
Realise answer is too late, however I have just had a similar problem, adding order by increased the query time from seconds to 5 minutes and having tried most other suggestions for speeding it up, noticed that the /tmp files where getting to be 12G for this query. Changed the query such that a varchar(20000) field being returned was "trim("ed and performance dramatically improved (back to seconds). So I guess its worth checking whether you are returning large varchars as part of your query and if so, process them (maybe substring(x, 1, length(x))?? if you dont want to trim them.
Query was returning 500k rows and the /tmp file indicated that each row was using about 20k of data.
A similar question was asked before here.
It might help you as well. Basically it describes using composite indexes and how order by works.
Today I was running into a same kind of problem. As soon as I was sorting the resultset by a field from a joined table, the whole query was horribly slow and took more than a hundred seconds.
The server was running MySQL 5.0.51a and by chance I noticed that the same query was running as fast as it should have always done on a server with MySQL 5.1. When comparing the explains for that query I saw that obviously the usage and handling of indexes has changed a lot (at least from 5.0 -> 5.1).
So if you encounter such a problem, maybe your resolution is to simply upgrade your MySQL