I'm using mySQL InnoDB tables. The query is simple but it just takes too long to run. Its bound to run even slower once I upload it to a server (not to mention that these two tables will keep growing in size).
The 'author' table size is 3,045 records.
The 'book' table size is 5,278 records.
SELECT author.*, count( auth_id ) AS author_count
FROM author
LEFT JOIN book ON author.id = book.auth_id
GROUP BY author.id
I'm wondering if there is some trick I can employ to make the query run at least twice as fast as it is now (it currently takes about 10.5 seconds to run - an eternity when displaying web content!)
1) Is this returning what you want?
2) You should list in the GROUP BY clause all the fields from the SELECT clause that are not in an agregate function (like COUNT in this case, but also could be AVG or SUM).
Related
I have trawled many of the similar responses on this site and have improved my code at several stages along the way. Unfortunately, this 3-row query still won't run.
I have one table with 100k+ rows and about 30 columns of which I can filter down to 3-rows (in this example) and then perform INNER JOINs across 21 small lookup tables.
In my first attempt, I was lazy and used implicit joins.
SELECT `master_table`.*, `lookup_table`.`data_point` x 21
FROM `lookup_table` x 21
WHERE `master_table`.`indexed_col` = "value"
AND `lookup_table`.`id` = `lookup_col` x 21
The query looked to be timing out:
#2013 - Lost connection to MySQL server during query
Following this, I tried being explicit about the joins.
SELECT `master_table`.*, `lookup_table`.`data_point` x 21
FROM `master_table`
INNER JOIN `lookup_table` ON `lookup_table`.`id` = `master_table`.`lookup_col` x 21
WHERE `master_table`.`indexed_col` = "value"
Still got the same result. I then realised that the query was probably trying to perform the joins first, then filter down via the WHERE clause. So after a bit more research, I learned how I could apply a subquery to perform the filter first and then perform the joins on the newly created table. This is where I got to, and it still returns the same error. Is there any way I can improve this query further?
SELECT `temp_table`.*, `lookup_table`.`data_point` x 21
FROM (SELECT * FROM `master_table` WHERE `indexed_col` = "value") as `temp_table`
INNER JOIN `lookup_table` ON `lookup_table`.`id` = `temp_table`.`lookup_col` x 21
Is this the best way to write up this kind of query? I tested the subquery to ensure it only returns a small table and can confirm that it returns only three rows.
First, at its most simple aspect you are looking for
select
mt.*
from
Master_Table mt
where
mt.indexed_col = 'value'
That is probably instantaneous provided you have an index on your master table on the given indexed_col in the first position (in case you had a compound index of many fields)…
Now, if I am understanding you correctly on your different lookup columns (21 in total), you have just simplified them for redundancy in this post, but actually doing something in the effect of
select
mt.*,
lt1.lookupDescription1,
lt2.lookupDescription2,
...
lt21.lookupDescription21
from
Master_Table mt
JOIN Lookup_Table1 lt1
on mt.lookup_col1 = lt1.pk_col1
JOIN Lookup_Table2 lt2
on mt.lookup_col2 = lt2.pk_col2
...
JOIN Lookup_Table21 lt21
on mt.lookup_col21 = lt21.pk_col21
where
mt.indexed_col = 'value'
I had a project well over a decade ago dealing with a similar situation... the Master table had about 21+ million records and had to join to about 30+ lookup tables. The system crawled and queried died after running a query after more than 24 hrs.
This too was on a MySQL server and the fix was a single MySQL keyword...
Select STRAIGHT_JOIN mt.*, ...
By having your master table in the primary position, where clause and its criteria directly on the master table, you are good. You know the relationships of the tables. Do the query in the exact order I presented it to you. Don't try to think for me on this and try to optimize based on a subsidiary table that may have smaller record count and somehow think that will help the query faster... it won't.
Try the STRAIGHT_JOIN keyword. It took the query I was working on and finished it in about 1.5 hrs... it was returning all 21 million rows with all corresponding lookup key descriptions for final output, hence still needed a longer duration than just 3 records.
First, don't use a subquery. Write the query as:
SELECT mt.*, lt.`data_point`
FROM `master_table` mt INNER JOIN
`lookup_table` l
ON l.`id` = mt.`lookup_col`
WHERE mt.`indexed_col` = value;
The indexes that you want are master_table(value, lookup_col) and lookup_table(id, data_point).
If you are still having performance problems, then there are multiple possibilities. High among them is that the result set is simply too big to return in a reasonable amount of time. To see if that is the case, you can use select count(*) to count the number of returned rows.
I Had a problem with order by when joins multiple tables which have millions of data. But I got solution as instead of join with distinct use of EXISTS will improve performance from the following question
How to improve order by performance with joins in mysql
SELECT
`tracked_twitter` . *,
COUNT( * ) AS twitterContentCount,
retweet_count + favourite_count + reply_count AS engagement
FROM
`tracked_twitter`
INNER JOIN
`twitter_content`
ON `tracked_twitter`.`id` = `twitter_content`.`tracked_twitter_id`
INNER JOIN
`tracker_twitter_content`
ON `twitter_content`.`id` = `tracker_twitter_content`.`twitter_content_id`
WHERE
`tracker_twitter_content`.`tracker_id` = '88'
GROUP BY
`tracked_twitter`.`id`
ORDER BY
twitterContentCount DESC LIMIT 20 OFFSET 0
But that method solves if I only need the result set from the parent table. What if, I want to execute grouped count and other math functions in other than parent table. I wrote a query that solves my criteria, but it takes 20 sec to execute. How can I optimize it ??.
Thanks in advance
Given the query is already fairly simple the options I'd look in to are ...
Execution plan (to find any missing indexes you could add)
caching (to ensure SQL already has all the data in ram)
de-normalisation (to turn the query in to flat select)
cache the data in the application (so you could use something like PLINQ on it)
Use a ram based store (redis, elastic)
File group adjustments (physically move the db to faster discs)
Partition your tables (to spread the raw data over multiple physical discs)
The further you go down this list the more involved the solutions become.
I guess it depends how fast you need the query to be and how much you need your solution to scale.
[site_list] ~100,000 rows... 10mb in size.
site_id
site_url
site_data_most_recent_record_id
[site_list_data] ~ 15+ million rows and growing... about 600mb in size.
record_id
site_id
site_connect_time
site_speed
date_checked
columns in bold are unique index keys.
I need to return 50 most recently updated sites AND the recent data that goes with it - connect time, speed, date...
This is my query:
SELECT SQL_CALC_FOUND_ROWS
site_list.site_url,
site_list_data.site_connect_time,
site_list_data.site_speed,
site_list_data.date_checked
FROM site_list
LEFT JOIN site_list_data
ON site_list.site_data_most_recent_record_id = site_list_data.record_id
ORDER BY site_data.date_checked DESC
LIMIT 50
Without the ORDER BY and SQL_CALC_FOUND_ROWS(I need it for pagination), the query takes about 1.5 seconds, with those it takes over 2 seconds or more which is not good enough because that particular page where this data will be shown is getting 20K+ pageviews/day and this query is apparently too heavy(server almost dies when I put this live) and too slow.
Experts of mySQL, how would you do this? What if the table got to 100 million records? Caching this huge result into a temp table every 30 seconds is the only other solution I got.
You need to add a heuristic to the query. You need to gate the query to get reasonable performance. It is effectively sorting your site_list_date table by date descending -- the ENTIRE table.
So, if you know that the top 50 will be within the last day or week, add a "and date_checked > <boundary_date>" to the query. Then it should reduce the overall result set first, and THEN sort it.
SQL_CALC_ROWS_FOUND is slow use COUNT instead. Take a look here
A couple of observations.
Both ORDER BY and SQL_CALC_FOUND_ROWS are going to add to the cost of your performance. ORDER BY clauses can potentially be improved with appropriate indexing -- do you have an index on your date_checked column? This could help.
What is your exact need for SQL_CALC_FOUND_ROWS? Consider replacing this with a separate query that uses COUNT instead. This can be vastly better assuming your Query Cache is enabled.
And if you can use COUNT, consider replacing your LEFT JOIN with an INNER JOIN as this will help performance as well.
Good luck.
I basically have two tables, a 'server' table and a 'server_ratings' table. I need to optimize the current query that I have (It works but it takes around 4 seconds). Is there any way I can do this better?
SELECT ROUND(AVG(server_ratings.rating), 0), server.id, server.name
FROM server LEFT JOIN server_ratings ON server.id = server_ratings.server_id
GROUP BY server.id;
Query looks ok, but make sure you have proper indexes:
on id column in server table - probably primary key,
on server_id column in server_ratings table,
If it does not help, then add rating column into server table and calculate it on a constant basis (see this answer about Cron jobs). This way you will save the time you spend on calculations. They can be made separately eg. every minute, but probably some less frequent calculations are enough (depending on how dynamic is your data).
Also make sure you query proper table - in the question you have mentioned servers table, but in the code there is reference to server table. Probably a typo :)
This should be slightly faster, because the aggregate function is executed first, resulting in fewer JOIN operations.
SELECT s.id, s.name, r.avg_rating
FROM server s
LEFT JOIN (
SELECT server_id, ROUND(AVG(rating), 0) AS avg_rating
FROM server_ratings
GROUP BY server_id
) r ON r.server_id = s.id
But the major point are matching indexes. Primary keys are indexed automatically. Make sure you have one on server_ratings.server_id, too.
Here's the query (the largest table has about 40,000 rows)
SELECT
Course.CourseID,
Course.Description,
UserCourse.UserID,
UserCourse.TimeAllowed,
UserCourse.CreatedOn,
UserCourse.PassedOn,
UserCourse.IssuedOn,
C.LessonCnt
FROM
UserCourse
INNER JOIN
Course
USING(CourseID)
INNER JOIN
(
SELECT CourseID, COUNT(*) AS LessonCnt FROM CourseSection GROUP BY CourseID
) C
USING(CourseID)
WHERE
UserCourse.UserID = 8810
If I run this, it executes very quickly (.05 seconds roughly). It returns 13 rows.
When I add an ORDER BY clause at the end of the query (ordering by any column) the query takes about 10 seconds.
I'm using this database in production now, and everything is working fine. All my other queries are speedy.
Any ideas of what it could be? I ran the query in MySQL's Query Browser, and from the command line. Both places it was dead slow with the ORDER BY.
EDIT: Tolgahan ALBAYRAK solution works, but can anyone explain why it works?
maybe this helps:
SELECT * FROM (
SELECT
Course.CourseID,
Course.Description,
UserCourse.UserID,
UserCourse.TimeAllowed,
UserCourse.CreatedOn,
UserCourse.PassedOn,
UserCourse.IssuedOn,
C.LessonCnt
FROM
UserCourse
INNER JOIN
Course
USING(CourseID)
INNER JOIN
(
SELECT CourseID, COUNT(*) AS LessonCnt FROM CourseSection GROUP BY CourseID
) C
USING(CourseID)
WHERE
UserCourse.UserID = 8810
) ORDER BY CourseID
Is the column you're ordering by indexed?
Indexing drastically speeds up ordering and filtering.
You are selecting from "UserCourse" which I assume is a joining table between courses and users (Many to Many).
You should index the column that you need to order by, in the "UserCourse" table.
Suppose you want to "order by CourseID", then you need to index it on UserCourse table.
Ordering by any other column that is not present in the joining table (i.e. UserCourse) may require further denormalization and indexing on the joining table to be optimized for speed;
In other words, you need to have a copy of that column in the joining table and index it.
P.S.
The answer given by Tolgahan Albayrak, although correct for this question, would not produce the desired result, in cases where one is doing a "LIMIT x" query.
Have you updated the statistics on your database? I ran into something similar on mine where I had 2 identical queries where the only difference was a capital letter and one returned in 1/2 a second and the other took nearly 5 minutes. Updating the statistics resolved the issue
Realise answer is too late, however I have just had a similar problem, adding order by increased the query time from seconds to 5 minutes and having tried most other suggestions for speeding it up, noticed that the /tmp files where getting to be 12G for this query. Changed the query such that a varchar(20000) field being returned was "trim("ed and performance dramatically improved (back to seconds). So I guess its worth checking whether you are returning large varchars as part of your query and if so, process them (maybe substring(x, 1, length(x))?? if you dont want to trim them.
Query was returning 500k rows and the /tmp file indicated that each row was using about 20k of data.
A similar question was asked before here.
It might help you as well. Basically it describes using composite indexes and how order by works.
Today I was running into a same kind of problem. As soon as I was sorting the resultset by a field from a joined table, the whole query was horribly slow and took more than a hundred seconds.
The server was running MySQL 5.0.51a and by chance I noticed that the same query was running as fast as it should have always done on a server with MySQL 5.1. When comparing the explains for that query I saw that obviously the usage and handling of indexes has changed a lot (at least from 5.0 -> 5.1).
So if you encounter such a problem, maybe your resolution is to simply upgrade your MySQL