I have got this query:
SELECT
t.type_id, t.product_id, u.account_id, t.name, u.username
FROM
types AS t
INNER JOIN
( SELECT user_id, username, account_id
FROM users WHERE account_id=$account_id ) AS u
ON
t.user_id = u.user_id
ORDER BY
t.type_id DESC
1st question:
It currently takes around 30 seconds to run with only 18k records in the types table.
The only indexes at the moment are the primary keys on id.
Would the long runtime be caused by the lack of additional indexes, or is it more to do with the structure of this query?
2nd question:
How can I add a LIMIT so I only get the 100 records with the highest type_id?
Without changing the results, I think it is about 100 times faster if you don't make a sub-select of your users table. It is not needed at all in this case.
You can just add LIMIT 100 to get only the first 100 results (or fewer if there aren't 100).
SELECT SQL_CALC_FOUND_ROWS /* Calculate the total number of rows, without the LIMIT */
t.type_id, t.product_id, u.account_id, t.name, u.username
FROM
types t
INNER JOIN users u ON u.user_id = t.user_id
WHERE
u.account_id = $account_id
ORDER BY
t.type_id DESC
LIMIT 100
Then, execute a second query to get the total number of rows that was calculated:
SELECT FOUND_ROWS()
That sub select on MySQL is going to slow down your query. I'm assuming that this
SELECT user_id, username, account_id
FROM users WHERE account_id=$account_id
doesn't return many rows at all. If that's the case then the sub select alone won't explain the delay you're seeing.
Try throwing an index on user_id in your types table. Without it, you're doing a full table scan of 18k records for each record returned by that sub select.
Inner join the users table directly, add that index, and I bet you'll see a huge increase in speed.
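Something like this should do it (the index name is just an example, and this assumes types does not already have an index on user_id):
-- Illustrative index name; adjust to your naming convention.
CREATE INDEX idx_types_user_id ON types (user_id);

-- Optional: check that the join now uses the index instead of a full table scan.
EXPLAIN
SELECT t.type_id, t.product_id, u.account_id, t.name, u.username
FROM types t
INNER JOIN users u ON u.user_id = t.user_id
WHERE u.account_id = $account_id;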
I have this query in my program that sorts by a SELECT COUNT(*) subquery field, and I don't know why it is so slow.
The problem is that when I order by posts_count, it runs much slower than when I order by any other field.
Here's the query:
select `tags`.*,
    (select count(*)
     from `posts`
     inner join `post_tag` on `posts`.`id` = `post_tag`.`post_id`
     where `tags`.`id` = `post_tag`.`tag_id`) as `posts_count`
from `tags`
order by `posts_count` asc
limit 15 offset 0;
Here's the execution time:
Can someone please help me improve this query? Thank you.
What I expect is for the query to run faster.
SELECT t.*, COUNT(pt.tag_id) AS count
FROM tags AS t
LEFT OUTER JOIN post_tag AS pt ON t.id = pt.tag_id
GROUP BY t.id
ORDER BY count ASC LIMIT 15 OFFSET 0;
You should make sure post_tag has an index starting with the tag_id column. You didn't include your table definition in your question, so I must assume the index is there. If the primary key starts with tag_id, that's okay too.
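If that index is missing, adding it could look something like this (the index name is just an example; skip it if tag_id already leads an existing index or the primary key):
ALTER TABLE post_tag ADD INDEX idx_post_tag_tag_id (tag_id);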
You don't need to join to posts, assuming that a row existing in post_tag means it references an existing row in posts. You can get the information you need by joining to post_tag alone.
I am not sure what is happening, but MySQL should handle this just fine in my opinion.
I have SQL like this.
SELECT u.id AS user_id, SUM(t.amount) AS total
FROM user u
INNER JOIN transaction t ON t.user_id = u.id
WHERE u.condition = true
GROUP BY u.id
ORDER BY total DESC;
This query runs for 10 seconds.
If I remove the ORDER BY clause, the time drops to around 4 seconds.
The tables are very large, but after the GROUP BY I have only 40 rows. Does it really take 6 seconds to sort 40 rows? I would say this should be handled by the optimizer.
However if I run the query like this:
SELECT *
FROM (
SELECT u.id AS user_id, SUM(t.amount) AS total
FROM user u
INNER JOIN transaction t ON t.user_id = u.id
WHERE u.condition = true
GROUP BY u.id
) data
ORDER BY total DESC;
This query runs for 4 seconds. I understand that I forced MySQL to sort only the 40 records returned by the inner select.
I really do not understand one thing: MySQL cannot sort by total before the GROUP BY, so what is slowing the query down so much?
In this case I can use the second query, but if I had another inner query, MySQL would start creating temporary tables, which might hurt performance even more than the ORDER BY. Another "problem" is that I use an ORM, and writing raw SQL is really painful.
Thanks for suggestions.
EDIT:
Execution plan with ORDER BY
Execution plan without ORDER BY
I can see in the execution plan that there is an additional filesort + temporary step when using ORDER BY.
I have two tables. The first table (users) is a simple "id, username" table with 100,000 rows, and the second (stats) is "id, date, stat" with 20M rows.
I'm trying to figure out which username went up the most in stat, and here's the query I have. On a powerful machine, this query takes minutes to complete. Is there a better way to write it to speed it up?
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON (b.id=a.id)
INNER JOIN stats AS c ON (c.id=a.id)
WHERE b.date = '2016-01-10'
AND c.date = '2016-01-13'
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
The other way I tried, which doesn't seem optimal either, is:
SELECT a.id, a.username,
(SELECT b.stat FROM stats AS b WHERE b.id=a.id AND b.date = '2016-01-10') AS start,
(SELECT c.stat FROM stats AS c WHERE c.id=a.id AND c.date = '2016-01-14') AS `end`,
((SELECT b.stat FROM stats AS b WHERE b.id=a.id AND b.date = '2016-01-10') -
(SELECT c.stat FROM stats AS c WHERE c.id=a.id AND c.date = '2016-01-14')) AS stat_diff
FROM users AS a
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
Introduction
Let's suppose we rewrite the query like this:
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON
b.date = STR_TO_DATE('2016-01-10', '%Y-%m-%d' ) and b.id=a.id
INNER JOIN stats AS c ON
c.date = STR_TO_DATE('2016-01-13', '%Y-%m-%d' ) and c.id=a.id
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
And we ensure that:
the users table has an index on the id field;
stats has a composite index on (date, id): create index stats_idx_d_i on stats ( date, id );
Then
The database optimizer may use the indexes to select a Restricted Set of Data ('RSD'), that is, the rows that match the filtered dates. This is fast.
But
You are sorting by a calculated field:
(b.stat - c.stat) AS stat_diff #<-- calculated
ORDER BY stat_diff DESC #<-- this forces it to be calculated
There is no possible optimization for this sort, because the value has to be calculated one by one for every row in your 'RSD' (restricted set of data).
Conclusion
The question is: how many rows are in your 'RSD'? If there are only a few hundred rows, your query may run fast; otherwise it will be slow.
In any case, you should make sure that the first step of the query (without the sort) is done via the indexes and not by full scanning. Use the EXPLAIN command to verify.
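For example, the check could look something like this, assuming the stats_idx_d_i index from above exists (the exact output depends on your data):
EXPLAIN
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN stats AS b ON b.date = STR_TO_DATE('2016-01-10', '%Y-%m-%d') AND b.id = a.id
INNER JOIN stats AS c ON c.date = STR_TO_DATE('2016-01-13', '%Y-%m-%d') AND c.id = a.id
GROUP BY a.id;
-- In the output, the rows for b and c should show stats_idx_d_i in the "key" column,
-- and their "type" column should not be ALL (which would mean a full table scan).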
All you need to do is help the optimizer. At a bare minimum, have a checklist which looks like the one below:
1. Are my join columns indexed?
2. Are the WHERE clauses sargable?
3. Are there any implicit or explicit conversions?
4. Am I seeing any statistics issues?
One more interesting aspect to look at is how your data is distributed; once you understand the data, you will be able to interpret the execution plan and alter it as per your needs.
EX:
Say I have a customers table with 100 rows, and each customer has a minimum of 10 orders (up to 10,000 orders in total). Now if you need to find only the top 3 orders by date, you don't want a scan happening on the orders table (see the sketch below).
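A minimal sketch of that idea, with hypothetical table and column names (the composite index lets the top 3 rows be read straight off the index instead of scanning orders):
-- Hypothetical schema; the index serves both the filter and the sort.
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);

SELECT order_id, order_date
FROM orders
WHERE customer_id = 42      -- one customer
ORDER BY order_date DESC
LIMIT 3;                    -- top 3 orders by date, no full scan of orders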
Now in your case, I would not go with the second option, even though the optimizer may choose a good plan for it as well. I would go with the first approach and see whether the execution time is acceptable; if not, I would go through my checklist and try to tune it further.
The query seems OK; verify your indexes.
Or
Try this query:
SELECT a.id, a.username, b.stat, c.stat, (b.stat - c.stat) AS stat_diff
FROM users AS a
INNER JOIN (select id,stat from stats where date = '2016-01-10') AS b ON (b.id=a.id)
INNER JOIN (select id,stat from stats where date = '2016-01-13') AS c ON (c.id=a.id)
GROUP BY a.id
ORDER BY stat_diff DESC
LIMIT 100
I know this was discussed many times but my research did not help me with my problem.
I have a table (InnoDB) with about 3k records. I need to pick 1 random row with some filters, which I do like this:
select id, title, topic_id
from posts
where id not in
(select post_id from records where user_id='$my_id' and checked='1')
and topic_id='$topic_id' and status='1'
order by RAND() limit 1
This gives me the result I wanted. The problem is that it takes too much time even with 3k records, and it will only get slower as the number of records increases.
I have to find a solution for this. Any suggestions?
Update: both tables have indexes on their id columns.
Instead of using WHERE id NOT IN, I would use a LEFT JOIN:
SELECT id,
title,
topic_id
FROM posts p
LEFT JOIN records r
ON p.id = r.post_id
AND r.user_id='$my_id'
AND r.checked = '1'
WHERE p.topic_id='$topic_id'
AND status='1'
AND r.post_id IS NULL
ORDER BY RAND()
LIMIT 1;
With this, you will want an index on posts.id and another index on (records.post_id, records.user_id, records.checked).
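Something like this, for example (index names are just illustrative; skip the first one if posts.id is already the primary key):
CREATE INDEX idx_posts_id ON posts (id);

-- Composite index covering the join column and the two filter columns.
CREATE INDEX idx_records_post_user_checked
    ON records (post_id, user_id, checked);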
I'm a MySQL query noobie so I'm sure this is a question with an obvious answer.
But I was looking at these two queries. Will they return different result sets? I understand that the filtering would happen at different stages, but I believe they will return the same results, with the first query being slightly more efficient?
Query 1: HAVING, then AND
SELECT user_id
FROM forum_posts
GROUP BY user_id
HAVING COUNT(id) >= 100
AND user_id NOT IN (SELECT user_id FROM banned_users)
Query 2: WHERE, then HAVING
SELECT user_id
FROM forum_posts
WHERE user_id NOT IN(SELECT user_id FROM banned_users)
GROUP BY user_id
HAVING COUNT(id) >= 100
Actually, the first query will be less efficient (HAVING is applied after WHERE).
UPDATE
Some pseudo code to illustrate how your queries are executed ([very] simplified version).
First query:
1. SELECT user_id FROM forum_posts
2. SELECT user_id FROM banned_user
3. Group, count, etc.
4. Exclude records from the first result set if they are present in the second
Second query
1. SELECT user_id FROM forum_posts
2. SELECT user_id FROM banned_user
3. Exclude records from the first result set if they are present in the second
4. Group, count, etc.
The order of steps 1 and 2 is not important; MySQL can choose whichever it thinks is better. The important difference is in steps 3 and 4: HAVING is applied after GROUP BY. Grouping is usually more expensive than joining (excluding records can be considered a join operation in this case), so the fewer records it has to group, the better the performance.
You already have answers saying that the two queries return the same results, and various opinions on which one is more efficient.
My opinion is that there will be a difference in efficiency (speed) only if the optimizer produces different plans for the two queries. I think that in the latest MySQL versions the optimizer is smart enough to find the same plan for either query, so there will be no difference at all, but of course one can check the execution plans with EXPLAIN or run the two queries against some test tables.
I would use the second version in any case, just to play it safe.
Let me add that:
COUNT(*) is usually more efficient than COUNT(notNullableField) in MySQL. Until that is fixed in future MySQL versions, use COUNT(*) where applicable.
Therefore, you can also use:
SELECT user_id
FROM forum_posts
WHERE user_id NOT IN
( SELECT user_id FROM banned_users )
GROUP BY user_id
HAVING COUNT(*) >= 100
There are also other ways to achieve the same (NOT IN-like) sub-results before applying GROUP BY.
Using LEFT JOIN / IS NULL:
SELECT fp.user_id
FROM forum_posts AS fp
LEFT JOIN banned_users AS bu
ON bu.user_id = fp.user_id
WHERE bu.user_id IS NULL
GROUP BY fp.user_id
HAVING COUNT(*) >= 100
Using NOT EXISTS:
SELECT fp.user_id
FROM forum_posts AS fp
WHERE NOT EXISTS
( SELECT *
FROM banned_users AS bu
WHERE bu.user_id = fp.user_id
)
GROUP BY fp.user_id
HAVING COUNT(*) >= 100
Which of the 3 methods is faster depends on your table sizes and a lot of other factors, so it is best to test with your data.
HAVING conditions are applied to the grouped results, and since you group by user_id, all of its possible values will be present in the grouped result, so the placement of the user_id condition does not change the results.
To me, the second query is more efficient because it lowers the number of records going into GROUP BY and HAVING.
Alternatively, you may try the following query to avoid using IN:
SELECT `fp`.`user_id`
FROM `forum_posts` `fp`
LEFT JOIN `banned_users` `bu` ON `fp`.`user_id` = `bu`.`user_id`
WHERE `bu`.`user_id` IS NULL
GROUP BY `fp`.`user_id`
HAVING COUNT(`fp`.`id`) >= 100
Hope this helps.
No, it does not give the same results.
The first query filters records with the COUNT(id) condition,
while the other query filters records first and then applies the HAVING clause.
The second query is the correctly written one.