I have the following query:
SELECT *
FROM products
INNER JOIN product_meta
ON products.id = product_meta.product_id
JOIN sales_rights
ON product_meta.product_id = sales_rights.product_id
WHERE ( products.categories REGEXP '[[:<:]]5[[:>:]]' )
AND ( active = '1' )
AND ( products.show_browse = 1 )
AND ( product_meta.software_platform_mac IS NOT NULL )
AND ( sales_rights.country_id = '240'
OR sales_rights.country_id = '223' )
GROUP BY products.id
ORDER BY products.avg_rating DESC
LIMIT 0, 18;
Running the query with the omission of the ORDER BY section and the query runs in ~90ms, with the ORDER BY section and the query takes ~8s.
I've browsed around SO and have found the reason for this could be that the sort is being executed before all the data is returned, and instead we should be running ORDER BY on the result set instead? (See this post: Slow query when using ORDER BY)
But I can't quite figure out the definitive way on how I do this?
I've browsed around SO and have found the reason for this could be
that the sort is being executed before all the data is returned, and
instead we should be running ORDER BY on the result set instead?
I find that hard to believe, but if that's indeed the issue, I think you'll need to do something like this. (Note where I put the parens.)
select * from
(
SELECT products.id, products.avg_rating
FROM products
INNER JOIN product_meta
ON products.id = product_meta.product_id
JOIN sales_rights
ON product_meta.product_id = sales_rights.product_id
WHERE ( products.categories REGEXP '[[:<:]]5[[:>:]]' )
AND ( active = '1' )
AND ( products.show_browse = 1 )
AND ( product_meta.software_platform_mac IS NOT NULL )
AND ( sales_rights.country_id = '240'
OR sales_rights.country_id = '223' )
GROUP BY products.id
) as X
ORDER BY avg_rating DESC
LIMIT 0, 18;
Also, edit your question and include a link to that advice. I think many of us would benefit from reading it.
Additional, possibly unrelated issues
Every column used in a WHERE clause should probably be indexed somehow. Multi-column indexes might perform better for this particular query.
The column products.categories seems to be storing multiple values that you filter with regular expressions. Storing multiple values in a single column is usually a bad idea.
MySQL's GROUP BY is indeterminate. A standard SQL statement using a GROUP BY might return fewer rows, and it might return them faster.
If you can, you may want to index your ID columns so that the query will run quicker. This is a DBA-level solution, rather than a SQL solution - tuning the database will help overall performance.
The issue in the instance of this query, was that by using GROUP BY and ORDER BY in a query, MySQL is unable to use the index if the GROUP BY and ORDER BY expressions are different.
Related Reading:
http://dev.mysql.com/doc/refman/5.0/en/order-by-optimization.html
http://mysqldba.blogspot.co.uk/2008/06/how-to-pick-indexes-for-order-by-and.html
Related
I want to improve my current query. So I have this table called Incomes. Where I have a sourceId varchar field. I have a single SELECT for the fields I need, but I needed to add an extra field called isFirstTime to represent if it was the first time on the row on what that sourceId was used. This is my current query:
SELECT DISTINCT
`income`.*,
CASE WHEN (
SELECT
`income2`.id
FROM
`income` as `income2`
WHERE
`income2`."sourceId" = `income`."sourceId"
ORDER BY
`income2`.created asc
LIMIT 1
) = `income`.id THEN true ELSE false END
as isFirstIncome
FROM
`income` as `income`
WHERE `income`.incomeType IN ('passive', 'active') AND `income`.status = 'paid'
ORDER BY `income`.created desc
LIMIT 50
The query works but slows down if I keep increasing the LIMIT or OFFSET. Any suggestions?
UPDATE 1:
Added WHERE statements used on the original query
UPDATE 2:
MYSQL version 5.7.22
You can achieve it using Ordered Analytical Function.
You can use ROW_NUMBER or RANK to get the desired result.
Below query will give the desired output.
SELECT *,
CASE
WHEN Row_number()
OVER(
PARTITION BY sourceid
ORDER BY created ASC) = 1 THEN true
ELSE false
END AS isFirstIncome
FROM income
WHERE incomeType IN ('passive', 'active') AND status = 'paid'
ORDER BY created desc
DB Fiddle: See the result here
My first thought is that isFirstIncome should be an extra column in the table. It should be populated as the data is inserted.
If you don't like that, let's try to optimize the query...
Let's avoid doing the subquery more than 50 times. This requires turning the query inside-out. (It's like "explode-implode", where the query gathers lots of stuff, then sorts it and throws most of the rows away.)
To summarize:
do the least amount of effort to just identify the 5 rows.
JOIN to whatever tables are needed (including itself if appropriate); this is to get any other columns desired (including isFirstIncome).
SELECT i3.*,
( ... using i3 ... ) as isFirstIncome
FROM (
SELECT i1.id, i1.sourceId
FROM `income` AS i1
WHERE i1.incomeType IN ('passive', 'active')
AND i1.status = 'paid'
ORDER BY i1.created DESC
LIMIT 50
) AS i2
JOIN income AS i3 USING(id)
ORDER BY i2.created DESC -- yes, repeated
(I left out the computation of isFirstIncome; it is discussed in other Answers. But note that it will be executed at most 50 times.)
(The aliases -- i1, i2, i3 -- are numbered in the order they will be "used"; this is to assist in following the SQL.)
To assist in performance, add
INDEX(status, incomeType, created, id, sourceId)
It should help with my formulation, but probably not for the other versions. Your version would benefit from
INDEX(sourceId, created, id)
First off, take a look at diagram (this is an application for testing students knowledge)
I already have working application, which calculates score (in percents), but to sort by score, it is required to select all the records (of current test). And it drastically slows down app (~ 10 seconds of waiting). So I decided to move that logic into single sql query.
Now, my SQL query looks like this:
select test_results.*,
(
select test_result_total_score * 100 / test_result_total_max_score
from (
select (select sum(question_score)
from (
select question_total_right_answers = question_total_options as question_score
from (
select (
select count(*)
from answers
inner join answer_options on answer_options.id = answers.answer_option_id
where answers.asked_question_id = asked_questions.id
and answers.is_chosen = answer_options.is_right
) as question_total_right_answers,
(
select count(*)
from answers
left join answer_options on answer_options.id = answers.answer_option_id
where answers.asked_question_id = asked_questions.id
) as question_total_options
from asked_questions
where asked_questions.test_result_id = test_results.id
) as rigt_per_question
) as questions_scores) as test_result_total_score,
(select count(*)
from asked_questions
where asked_questions.test_result_id = test_results.id) as test_result_total_max_score
) as right_per_test_result
) as result_in_percents
from test_results
where test_results.id between 1 and 200;
Here is what it should do: for each asked question collect how many answer options there are (question_total_options) and how many answers user selected right (question_total_right_answers) - the very nested subqueries.
Then for each of this results calculate score (this is basically 1 if user selected all right options and 0 if at least one option is selected not right).
After that, we sum scores of all that questions (test_result_total_score - how many questions user answered right). Also, we calculate how many questions there are in test result (test_result_total_max_score).
With that information we can calculate percentage of right answered questions (test_result_total_score * 100 / test_result_total_max_score)
And the error is on lines 23 and 28:
where asked_questions.test_result_id = test_results.id
where asked_questions.test_result_id = test_results.id) as test_result_total_max_score
It says: [42S22][1054] Unknown column 'test_results.id' in 'where clause'
I have tried using variable #test_result_id like this:
select test_results.*,
#test_result_id := test_results.id,
( ... )
where asked_questions.test_result_id = #test_result_id
where asked_questions.test_result_id = #test_result_id) as test_result_total_max_score
And it evaluates, but in wrong way (probably because order of evaluation select values is undefined). BTW, all result_in_percents correspond to very first result.
For those facing similar problem, it seems that there is no simple solution.
First off, you can try rewrite your subqueries with joins as I did (see below). But when you would like to perform group operations on grouped results, you are really unhappy person). A "dirty" solution might be create function to overcome barrier of nesting subqueries.
create function test_result_in_percents(test_result_id bigint unsigned)
returns float
begin
return (
select sum(tmp.question_right) * 100 / count(*)
from (select sum(answers.is_chosen = answer_options.is_right) = count(*) as question_right
, asked_questions.test_result_id as test_result_id
from answers
inner join answer_options on answer_options.id = answers.answer_option_id
inner join asked_questions on asked_questions.id = answers.asked_question_id
where asked_questions.test_result_id = test_result_id
group by answers.asked_question_id
) as tmp
group by test_result_id
);
end;
And then, just use this function:
select (test_result_in_percents(test_results.id)) as `result_percents`
from `test_results`
where `test_results`.`test_id` = 181
and `test_results`.`test_id` is not null
order by `test_results`.`id` desc;
i try to optimize my query because it takes 3.5 seconds and its too long.
this is my query:
SELECT
`products`.*,
IFNULL(SUM(`products_uses`.`quantity`),0) as `usesQuantity`,
IF((`products`.`productQuantity` - IFNULL(SUM(`products_uses`.`quantity`),0)) > 0, (`products`.`productQuantity` - IFNULL(SUM(`products_uses`.`quantity`),0)), 0) AS `totalUses`
FROM `products`
LEFT JOIN `products_uses` ON `products`.`id` = `products_uses`.`productId`
WHERE `products`.`nurseForm` = 1
GROUP BY `products`.`id`
ORDER BY `products`.`fav` DESC, `products`.`productName` ASC
i tried to optimize with variables but nothing changed:
SELECT
`products`.*,
#usesQuantity := IFNULL(SUM(`products_uses`.`quantity`),0) as `usesQuantity`,
IF((`products`.`productQuantity` - #usesQuantity) > 0, (`products`.`productQuantity` - #usesQuantity), 0) AS `totalUses`
FROM `products`
LEFT JOIN `products_uses` ON `products`.`id` = `products_uses`.`productId`
WHERE `products`.`nurseForm` = 1
GROUP BY `products`.`id`
ORDER BY `products`.`fav` DESC, `products`.`productName` ASC
this query
sum how much quantity used each product - IFNULL(SUM(products_uses.quantity),0)
,
how much uses each product -
IF((`products`.`productQuantity` - IFNULL(SUM(`products_uses`.`quantity`),0)) > 0, (`products`.`productQuantity` - IFNULL(SUM(`products_uses`.`quantity`),0)), 0)
i tried to changed structure of the tables to myISAM and InnoDB nothing changed.
what can i do to optimize this query?
tnx
I'm not sure that introducing user variables will change much. But stategically adding indices on your two tables might help. Try this:
ALTER TABLE products ADD INDEX nurse_index (nurseForm);
ALTER TABLE products_uses ADD INDEX product_index (productId);
The first index, on the products.nurseForm column, might help the WHERE clause. In particular, this index would be a big help if only a few records match.
The second index, on products_uses.productId, might help the join go faster. Again, this would depend on how large your tables are.
You may also run EXPLAIN to see if any other bottlenecks stand out.
Providing SHOW CREATE TABLE would help. Meanwhile, I will guess.
This may (or may not) help. It attempts to do the summing without hauling all the columns of products around. It avoids the GROUP BY.
SELECT p2.*,
IFNULL(x.usesQuantity, 0) as `usesQuantity`,
GREATEST(p2.`productQuantity` - IFNULL(x.usesQuantity), 0) AS `totalUses`
FROM `products` AS p2
LEFT JOIN
( SELECT p.id AS xid,
SUM(pu.quantity) as `usesQuantity`
FROM products_uses AS pu
JOIN products AS p ON p.id = pu.productId
WHERE p.nurseForm = 1
) AS x ON x.xid = p2.id
ORDER BY p2.`fav` DESC,
p2.`productName` ASC
These indexes should help:
products: INDEX(nurseForm, id)
products: PRIMARY KEY(id) -- I am assuming this??
products_uses: INDEX(productId, quantity)
If LEFT were unnecessary there would be other optimizations.
MySQL 5.6 would help with the subquery.
MySQL 8.0 could help with the ORDER BY, meanwhile, a sort is required due to the mixture of DESC and ASC.
This query (along with a few others I think have a related issue) did not take 30 seconds when MySQL was local on the same EC2 instance as the rest of the website. More like milliseconds.
Does anything look off?
SELECT *, chv_images.image_id FROM chv_images
LEFT JOIN chv_storages ON chv_images.image_storage_id =
chv_storages.storage_id
LEFT JOIN chv_users ON chv_images.image_user_id = chv_users.user_id
LEFT JOIN chv_albums ON chv_images.image_album_id = chv_albums.album_id
LEFT JOIN chv_categories ON chv_images.image_category_id =
chv_categories.category_id
LEFT JOIN chv_meta ON chv_images.image_id = chv_meta.image_id
LEFT JOIN chv_likes ON chv_likes.like_content_type = "image" AND
chv_likes.like_content_id = chv_images.image_id AND chv_likes.like_user_id = 1
LEFT JOIN chv_follows ON chv_follows.follow_followed_user_id =
chv_images.image_user_id
LEFT JOIN chv_follows_projects ON
chv_follows_projects.follows_project_project_id =
chv_images.image_project_id LEFT JOIN chv_projects ON
chv_projects.project_id = follows_project_project_id WHERE
chv_follows.follow_user_id='1' OR (follows_project_user_id = 1 AND
chv_projects.project_privacy = "public" AND
chv_projects.project_is_public_upload = 1) GROUP BY chv_images.image_id
ORDER BY chv_images.image_id DESC
LIMIT 0,15
And this is what EXPLAIN shows:
Thank you
Update: This query has the same issue. It does not have a GROUP BY.
SELECT *, chv_images.image_id FROM chv_images
LEFT JOIN chv_storages ON chv_images.image_storage_id =
chv_storages.storage_id
LEFT JOIN chv_users ON chv_images.image_user_id = chv_users.user_id
LEFT JOIN chv_albums ON chv_images.image_album_id = chv_albums.album_id
LEFT JOIN chv_categories ON chv_images.image_category_id =
chv_categories.category_id
LEFT JOIN chv_meta ON chv_images.image_id = chv_meta.image_id
LEFT JOIN chv_likes ON chv_likes.like_content_type = "image" AND
chv_likes.like_content_id = chv_images.image_id AND chv_likes.like_user_id = 1
ORDER BY chv_images.image_id DESC
LIMIT 0,15
That EXPLAIN shows several table-scans (type: ALL), so it's not surprising that it takes over 30 seconds.
Here's your EXPLAIN:
Notice the column rows shows an estimated 14420 rows read from the first table chv_images. It's doing a table-scan of all the rows.
In general, when you do a series of JOINs, you can multiple together all the values in the rows column of the EXPLAIN, and the final result is how many row-reads MySQL has to do. In this case it's 14420 * 2 * 1 * 1 * 2 * 1 * 916, or 52,834,880 row-reads. That should put into perspective the high cost of doing several table-scans in the same query.
You might help avoid those table-scans by creating some indexes on these tables:
ALTER TABLE chv_storages
ADD INDEX (storage_id);
ALTER TABLE chv_categories
ADD INDEX (category_id);
ALTER TABLE chv_likes
ADD INDEX (like_content_id, like_content_type, like_user_id);
Try creating those indexes and then run the EXPLAIN again.
The other tables are already doing lookups by primary key (type: eq_ref) or by secondary key (type: ref) so those are already optimized.
Your EXPLAIN shows your query uses a temporary table and filesort. You should reconsider whether you need the GROUP BY, because that's probably causing the extra work.
Another tip is to avoid using SELECT * because it might be forcing the query to read many extra columns that you don't need. Instead, explicitly name only the columns you need.
Is there any indexes in chv_images?
I propose:
CREATE INDEX idx_image_id ON chv_images (image_id);
(Bill's ideas are good. I'll take the discussion a different way...)
Explode-Implode -- If the LEFT JOINs match no more than 1 row, change, for example,
SELECT
...
LEFT JOIN chv_meta ON chv_images.image_id = chv_meta.image_id
into
SELECT ...,
( SELECT foo FROM chv_meta WHERE image_id = chv_images.image_id ) AS foo, ...
If that can be done for all the JOINs, you can get rid of GROUP BY. This will avoid the costly "explode-implode" where JOINs lead to more rows, then GROUP BY gets rid of the dups. (I suspect you can't move all the joins in.)
OR -> UNION -- OR is hard to optimize. Your query looks like a good candidate for turning into UNION, then making more indexes that will become useful.
WHERE chv_follows.follow_user_id='1'
OR (follows_project_user_id = 1
AND chv_projects.project_privacy = "public"
AND chv_projects.project_is_public_upload = 1
)
Assuming that follows_project_user_id is in `chv_images,
( SELECT ...
WHERE chv_follows.follow_user_id='1' )
UNION DISTINCT -- or ALL, if you are sure there won't be dups
( SELECT ...
WHERE follows_project_user_id = 1
AND chv_projects.project_privacy = "public"
AND chv_projects.project_is_public_upload = 1 )
Indexes needed:
chv_follows: (follow_user_id)
chv_projects: (project_privacy, project_is_public_upload) -- either order
But this has not yet handled the ORDER BY and LIMIT. The general pattern for such:
( SELECT ... ORDER BY ... LIMIT 15 )
UNION
( SELECT ... ORDER BY ... LIMIT 15 )
ORDER BY ... LIMIT 15
Yes, the ORDER BY and LIMIT are repeated.
That works for page 1. If you want the next 15 rows, see http://mysql.rjweb.org/doc.php/pagination#pagination_and_union
After building those two sub-selects, look at them; I think you will be able to optimize each one, and may need new indexes because the Optimizer will start with a different 'first' table.
I have an existing mysql query that I need to add to and I'm not sure how to go about it.
Here is my current sql query.
SELECT tbl_brokerage_names.brokerage_id, tbl_brokerage_names.short_name,
b.indication, b.max_indication
FROM tbl_brokerage_names
LEFT JOIN (
SELECT * FROM tbl_recommendation_brokerages
WHERE recommendation_id = {$_GET['id']}
) b ON (tbl_brokerage_names.brokerage_id = b.brokerage_id)
ORDER BY tbl_brokerage_names.short_name ASC
Here is the query that I need to work into the previous query.
SELECT * , COUNT( * )
FROM tbl_streetaccounts
JOIN tbl_brokerage_names
WHERE tbl_brokerage_names.brokerage_id = tbl_streetaccounts.brokerage_id
Basically I need to return a count, so I need to combine these two queries.
You should run these as two separate queries.
The COUNT(*) query will return a single row, so there's no way to "combine" it with the first query while preserving the multi-row result of the first query.
Also, when you SELECT *, COUNT(*) you will get columns from some arbitrary row.
By the way, you have a glaring SQL injection vulnerability. Don't interpolate $_GET parameters directly in your SQL query. Instead, coerce it to an integer:
<?php
$id = (int) $_GET['id'];
$sql = "SELECT ... WHERE recommendation_id = {$id}";
Like #Bill said, you cannot get the count in every row without really weird syntax, but you can get an overall count using GROUP BY ... WITH ROLLUP.
e.g.:
<?php
$id = mysql_real_escape_string($_GET['id']); //works with anything, not just numbers
$query = "
SELECT tbl_brokerage_names.brokerage_id
, tbl_brokerage_names.short_name
, b.indication
, b.max_indication
, count(*) as rowcount
FROM tbl_brokerage_names
LEFT JOIN (
SELECT * FROM tbl_recommendation_brokerages
WHERE recommendation_id = '$id' //The single quotes are essential for safety!
) b ON (tbl_brokerage_names.brokerage_id = b.brokerage_id)
GROUP BY tbl_brokerage_names.brokerage_id WITH ROLLUP
ORDER BY tbl_brokerage_names.short_name ASC
";
The GROUP BY .. WITH ROLLUP will add an extra line to the result with all NULL's for the non aggregated columns and a grand total count.
If you have any lines where rowcount > 0 then you need to add extra clauses from table b to the group by clause to prevent MySQL from hiding arbitrary rows.
Table tbl_brokerage_names is already fully defined because you are grouping by the primary key.