I am not sure what is happening, but in my opinion MySQL should handle this just fine.
I have SQL like this:
SELECT u.id AS user_id, SUM(t.amount) AS total
FROM user u
INNER JOIN transaction t ON t.user_id = u.id
WHERE u.condition = true
GROUP BY u.id
ORDER BY total DESC;
This query runs for 10 seconds.
If I remove the ORDER BY clause, the time drops to around 4 seconds.
The tables are very large, but after the GROUP BY I have only 40 rows. Does it really take 6 seconds to sort 40 rows? I would say this should be handled by the optimizer.
However, if I run the query like this:
SELECT *
FROM (
SELECT u.id AS user_id, SUM(t.amount) AS total
FROM user u
INNER JOIN transaction t ON t.user_id = u.id
WHERE u.condition = true
GROUP BY u.id
) data
ORDER BY total DESC;
This query runs for 4 seconds. I understand that I forced MySQL to sort only the 40 records returned by the inner select.
There is one thing I really do not understand: MySQL cannot sort by total before the GROUP BY, so what is slowing the query down so much?
In this case I can use the second query, but if I had another inner SQL, MySQL would start creating temporary tables, which might kill the performance even more than the ORDER BY does. Another "problem" is that I use an ORM, and using raw SQL with it is really painful.
Thanks for any suggestions.
EDIT:
Execution plan with ORDER BY
Execution plan without ORDER BY
I can see in the execution plan that there is an additional filesort + temporary step when the ORDER BY is present.
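For reference, the plans come from prefixing the query with EXPLAIN:

-- With the ORDER BY, the Extra column shows the additional
-- "Using temporary; Using filesort" step; without it, that step is gone.
EXPLAIN
SELECT u.id AS user_id, SUM(t.amount) AS total
FROM user u
INNER JOIN transaction t ON t.user_id = u.id
WHERE u.condition = true
GROUP BY u.id
ORDER BY total DESC;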
I know there are many questions/answers about slow queries, but I'm struggling to relate an existing answer to my example.
I have the following simple query which counts article views in a subquery:
SELECT
articles.id,
articles.views,
articles.title,
articles.slug,
articles.created_at,
(SELECT count(*) FROM tracking WHERE element_id = articles.id AND tracking_type = 'article_view') AS tracking_views
FROM articles
WHERE articles.company_id = 123
ORDER BY articles.created_at DESC
This particular company has ~250 articles, and the query takes over 12 seconds.
Is there a better/more efficient way I could be doing this?
Try joining to a GROUP BY. It's pretty hard to say without knowing how many articles/views and companies there are, though.
What you want is for SQL to be able to do the aggregation of tracking in one go, rather than individually for every row in the result, which is what the position of your tracking_views sub-select implies.
If you're lucky (I didn't check), the join to the counts sub-select will be smart enough to skip any articles that are not for the right company. If not, you can include the join back to the company in the counts sub-select.
e.g.
select a.*, counts.count
from articles a
join (
select count(*) as count, element_id
from tracking
where tracking_type = 'article_view'
group by tracking.element_id
) as counts on counts.element_id = a.id
where a.company_id = 123
ORDER BY a.created_at DESC
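For example, a sketch of that fallback, filtering the counts to the right company inside the sub-select (reusing the names from the query above):

-- join tracking back to articles inside the derived table so rows for
-- other companies are never counted
select a.*, counts.count
from articles a
join (
    select count(*) as count, t.element_id
    from tracking t
    join articles a2 on a2.id = t.element_id
    where t.tracking_type = 'article_view'
      and a2.company_id = 123
    group by t.element_id
) as counts on counts.element_id = a.id
where a.company_id = 123
ORDER BY a.created_at DESC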
I have two tables. The first table (users) is a simple "id, username" with 100,000 rows, and the second (stats) is "id, date, stat" with 20M rows.
I'm trying to figure out which username went up the most in stat, and here's the query I have. On a powerful machine, this query takes minutes to complete. Is there a better way to write it to speed it up?
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON (b.id=a.id)
INNER JOIN stats AS c ON (c.id=a.id)
WHERE b.date = '2016-01-10'
AND c.date = '2016-01-13'
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
The other way I tried, which doesn't seem optimal, is:
SELECT a.id, a.username,
(SELECT b.stat FROM stats AS b WHERE b.id = a.id AND b.date = '2016-01-10') AS start,
(SELECT c.stat FROM stats AS c WHERE c.id = a.id AND c.date = '2016-01-14') AS `end`,
((SELECT b.stat FROM stats AS b WHERE b.id = a.id AND b.date = '2016-01-10') -
(SELECT c.stat FROM stats AS c WHERE c.id = a.id AND c.date = '2016-01-14')) AS stat_diff
FROM users AS a
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
Introduction
Let's suppose we rewrite the statement like this:
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON
b.date = STR_TO_DATE('2016-01-10', '%Y-%m-%d' ) and b.id=a.id
INNER JOIN stats AS c ON
c.date = STR_TO_DATE('2016-01-13', '%Y-%m-%d' ) and c.id=a.id
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
And we ensure that:
the users table has an index on the id field: create index users_idx_i on users ( id );
the stats table has an index on the composite fields date, id: create index stats_idx_d_i on stats ( date, id );
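A possible refinement (the index name is illustrative): including stat as well makes the index covering, so the filtered rows can be read from the index without touching the table: create index stats_idx_d_i_s on stats ( date, id, stat );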
Then
The database optimizer may use those indexes to select a Restricted Set of Data ('RSD'), that is, the rows that match the filtered dates. This is fast.
But
You are sorting by a calculated field:
(b.stat - c.stat) AS stat_diff #<-- calculated
ORDER BY stat_diff DESC #<-- this forces to calculate it
There is no possible optimization for this sort, because the value has to be calculated one by one for every row in your 'RSD' (restricted set of data).
Conclusion
The question is: how many rows are there in your 'RSD'? If there are only a few hundred rows, your query may run fast; otherwise, your query will be slow.
In any case, you should make sure the first step of the query (without the sorting) is done via the indexes and not by full-scanning. Use the EXPLAIN command to be sure.
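For instance, prefix the rewritten query with EXPLAIN; for the two stats lookups, the type column should read "ref" (an index lookup), not "ALL" (a full scan):

EXPLAIN
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON
b.date = STR_TO_DATE('2016-01-10', '%Y-%m-%d' ) and b.id=a.id
INNER JOIN stats AS c ON
c.date = STR_TO_DATE('2016-01-13', '%Y-%m-%d' ) and c.id=a.id
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100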
All you need to do is help the optimizer. At a bare minimum, have a checklist which looks like the one below:
1. Are my join columns indexed?
2. Are the WHERE clauses sargable?
3. Are there any implicit/explicit conversions?
4. Am I seeing any statistics issues?
One more interesting aspect to look at is how your data is distributed. Once you understand the data, you will be able to interpret the execution plan and alter it as per your needs.
EX:
Say I have a customers table with 100 rows, and each customer has a minimum of 10 orders (up to 10,000 orders in total). Now, if you need to find only the top 3 orders by date, you don't want a scan happening on the orders table.
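A sketch of that situation (table, column, and index names are illustrative):

-- with a composite index, the top 3 orders for a customer come straight
-- off the index instead of a scan of the whole orders table
CREATE INDEX orders_idx_cust_date ON orders (customer_id, order_date);

SELECT *
FROM orders
WHERE customer_id = 42
ORDER BY order_date DESC
LIMIT 3;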
Now, in your case, I may not go with the second option, even though the optimizer may choose a good plan for it as well. I would go with the first approach and see whether the execution time is acceptable; if not, I would go through my checklist and try to tune it further.
The query seems OK; verify your indexes.
Or try this query:
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN (select id,stat from stats where date = '2016-01-10') AS b ON (b.id=a.id)
INNER JOIN (select id,stat from stats where date = '2016-01-13') AS c ON (c.id=a.id)
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
I have two tables, users and results.
The results table contains a user_id column, matching the users table.
I want to query the results table, sum the result_value column, and then use user_id to grab additional info from the users table... I came up with this:
SELECT results.user_id, SUM(results.result_value), users.user_name, users.user_pic, users.user_level
FROM results, users
WHERE users.user_id = results.user_id
GROUP BY results.user_id
ORDER BY SUM(results.result_value) DESC
LIMIT 4
It actually works, but being a complete mysql beginner, I'm wondering if I'm doing something stupid, or maybe it works but there is a better way (faster) way of doing the same thing?
Your method is actually fine. It is also possible to write it without GROUP BY, which is probably easier to understand:
SELECT
u.user_id,
( SELECT sum(r.result_value)
FROM results r
WHERE r.user_id = u.user_id
) AS rsum,
u.user_name,
u.user_pic,
u.user_level
FROM users u
ORDER BY 2 DESC
LIMIT 4
As you may notice, using table aliases helps a lot with readability of your queries.
The basics look fine. The main thing is to make sure you have suitable indexes (a sketch of one follows the query below). Also, you should name all the non-aggregated columns in the GROUP BY clause (not 100% necessary in MySQL, but it is in some other flavours of SQL, and not doing so can result in indeterminate values being returned for those columns).
You can also write the JOIN using more modern syntax:
SELECT results.user_id, SUM(results.result_value) AS Asum, users.user_name, users.user_pic, users.user_level
FROM results
INNER JOIN users
ON users.user_id = results.user_id
GROUP BY results.user_id, users.user_name, users.user_pic, users.user_level
ORDER BY Asum DESC
LIMIT 4
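As for suitable indexes, a minimal sketch (the index name is illustrative): a composite index on the join column plus the summed column lets both the join and the SUM be served from the index alone.

CREATE INDEX results_idx_user_value ON results (user_id, result_value);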
When I run this query, the MySQL server's CPU usage stays at 100% and it chokes the server. What am I doing wrong?
SELECT *
FROM projects p, orders o, invoices i
WHERE p.project_state = 'product'
AND (
p.status = 'expired'
OR p.status = 'finished'
OR p.status = 'open'
)
AND p.user_id = '12'
AND i.projectid =0
GROUP BY i.invoiceid
LIMIT 0 , 30
You are including the orders table but not joining to it. This will make a full cross join that can potentially produce millions of rows.
Use EXPLAIN to find out the query plan. From that you can work out what indexes will be required. Those indexes will vastly improve performance.
Also you are not limiting the orders in any way.
You didn't put any joins on the tables. I believe by default that will do a cross join. That means if you have 1,000 projects, 100,000 orders and 100,000 invoices, the result set will be 10,000,000,000,000 (10 trillion) records.
You probably want to put some inner joins between those tables.
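A minimal sketch of that, assuming projects.id is the key that orders.project_id and invoices.projectid point at (those join columns are guesses; the question doesn't show the schema):

SELECT *
FROM projects p
INNER JOIN orders o ON o.project_id = p.id
INNER JOIN invoices i ON i.projectid = p.id
WHERE p.project_state = 'product'
AND p.status IN ('expired', 'finished', 'open')
AND p.user_id = '12'
LIMIT 0 , 30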
I have got this query:
SELECT
t.type_id, t.product_id, u.account_id, t.name, u.username
FROM
types AS t
INNER JOIN
( SELECT user_id, username, account_id
FROM users WHERE account_id=$account_id ) AS u
ON
t.user_id = u.user_id
ORDER BY
t.type_id DESC
1st question:
It takes around 30 seconds to do this at the moment, with only 18k records in the types table.
The only indexes at the moment are the primary indexes, on id alone.
Would the long run time be caused by the lack of further indexes, or is it more to do with the structure of this query?
2nd question:
How can I add the LIMIT so I only get 100 records with the highest type_id?
Without changing the results, I think it will be about 100 times faster if you don't make a sub-select of your users table. It is not needed at all in this case.
You can just add LIMIT 100 to get only the first 100 results (or fewer if there aren't 100).
SELECT SQL_CALC_FOUND_ROWS /* Calculate the total number of rows, without the LIMIT */
t.type_id, t.product_id, u.account_id, t.name, u.username
FROM
types t
INNER JOIN users u ON u.user_id = t.user_id
WHERE
u.account_id = $account_id
ORDER BY
t.type_id DESC
LIMIT 100
Then, execute a second query to get the total number of rows that was calculated.
SELECT FOUND_ROWS()
That sub select on MySQL is going to slow down your query. I'm assuming that this
SELECT user_id, username, account_id
FROM users WHERE account_id=$account_id
doesn't return many rows at all. If that's the case then the sub select alone won't explain the delay you're seeing.
Try throwing an index on user_id in your types table. Without it, you're doing a full table scan of 18k records for each record returned by that sub select.
Inner join the users table and add that index, and I bet you'll see a huge increase in speed.
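A sketch of that index (the name is illustrative):

CREATE INDEX types_idx_user_id ON types (user_id);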