I have the following query:
SELECT AVG(time) FROM
(SELECT UNIX_TIMESTAMP(max(datelast)) - UNIX_TIMESTAMP(min(datestart)) AS time
FROM table
WHERE id IN
(SELECT DISTINCT id
FROM table
WHERE product_id = 12394 AND datelast > '2011-04-13 00:26:59'
)
GROUP BY id
)
as T
For every ID, the query takes the greatest datelast value and subtracts the smallest datestart value from it (which gives the length of a user session), and then averages those lengths.
The outermost query is there only to average the resulting times. Is there any way to optimize this query?
Output from EXPLAIN:
id select_type table type possible_keys key key_len ref rows extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 7
2 DERIVED table index NULL id 16 NULL 26 Using where
3 DEPENDENT SUBQUERY table index_subquery id,product_id,datelast id 12 func 2 Using index; Using where
Is the first SELECT really necessary?
SELECT
AVG(time)
FROM
(
SELECT
UNIX_TIMESTAMP(max(datelast)) - UNIX_TIMESTAMP(min(datestart)) AS time
FROM
table
WHERE
product_id = 12394 AND datelast > '2011-04-13 00:26:59'
GROUP BY
id
) AS T
I can't test right now, but I think it would work too. Otherwise, your query looks good.
You can optimize the query by adding a (datelast, product_id) key (always put the most restrictive field first, to increase selectivity).
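For example, a minimal sketch of that index (the index name is illustrative, and `table` stands in for the real table name, as in the question):

ALTER TABLE `table` ADD INDEX idx_datelast_product (datelast, product_id);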
Related
I want to retrieve the latest status for an item from a history table. The history table has a record of all status changes for an item. The query must be quick to run.
Below is the query that I use to get the latest status per item
SELECT item_history.*
FROM item_history
INNER JOIN (
SELECT MAX(created_at) as created_at, item_id
FROM item_history
GROUP BY item_id
) as latest_status
on latest_status.item_id = item_history.item_id
and latest_status.created_at = item_history.created_at
WHERE item_history.status_id = 1
and item_history.created_at BETWEEN "2020-12-16" AND "2020-12-23"
I've tried putting the query above into another inner join to link the data with an item:
SELECT *
FROM `items`
INNER JOIN ( [query from above] )
WHERE items.category_id = 3
Notes about the item_history table: I have indexes on the following columns: status_id, created_at and listing_id. I have also turned those 3 into a compound primary key.
My issue is that MySQL keeps scanning the full table to grab MAX(created_at), which is a very slow operation, even though I only have 3 million records in the history table.
Query plan as follows:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 PRIMARY items NULL ref "PRIMARY,district" district 18 const 694 100.00 NULL
1 PRIMARY item_history NULL ref "PRIMARY,status_id,created_at,item_history_item_id_index" PRIMARY 9 "main.items.id,const" 1 100.00 "Using where"
1 PRIMARY <derived2> NULL ref <auto_key0> <auto_key0> 14 "func,main.items.id" 10 100.00 "Using where; Using index"
2 DERIVED item_history NULL range "PRIMARY,status_id,created_at,item_history_item_id_index" item_history_item_id_index 8 NULL 2751323 100.00 "Using index"
I want to retrieve the latest status for an item from a history table.
If you want the results for just one item, then use order by and limit:
select *
from item_history
where item_id = ? and created_at between '2020-12-16' and '2020-12-23'
order by created_at desc limit 1
This query would benefit from an index on (item_id, created_at).
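For example, a minimal sketch (the index name is illustrative):

create index idx_item_history_item_created on item_history (item_id, created_at);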
If you want the latest status per item, I would recommend a correlated subquery:
select *
from item_history h
where created_at = (
select max(h1.created_at)
from item_history h1
where h1.item_id = h.item_id
and h1.created_at between '2020-12-16' and '2020-12-23'
)
The same index should be beneficial.
Using a window function (MySQL 8.0.14+):
WITH cte AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY item_id ORDER BY created_at DESC) r
FROM item_history
WHERE item_history.status_id = 1
and item_history.created_at BETWEEN '2020-12-16' AND '2020-12-23'
)
SELECT *
FROM cte WHERE r = 1;
An index on (item_id, created_at) will also help.
I want to fetch the last record in each group. I have used the following query with a very small database and it works perfectly --
SELECT * FROM logs
WHERE id IN (
SELECT max(id) FROM logs
WHERE id_search_option = 31
GROUP BY items_id
)
ORDER BY id DESC
But when it comes to the actual database having millions of rows (8,000,000+), the system hangs.
I also tried another query, which returns results in 6.6 sec on average --
SELECT p1.id, p1.itemtype, p1.items_id, p1.date_mod
FROM logs p1
INNER JOIN (
SELECT max(id) as max_id, itemtype, items_id, date_mod
FROM logs
WHERE id_search_option = 31
GROUP BY items_id) p2
ON (p1.id = p2.max_id)
ORDER BY p1.items_id DESC;
Please help!
EDIT: EXPLAIN for the 2nd query
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 1177 Using temporary; Using filesort
1 PRIMARY p1 eq_ref PRIMARY PRIMARY 4 p2.max_id 1
2 DERIVED logs ALL NULL NULL NULL NULL 7930527 Using where; Using temporary; Using filesort
select * from tablename order by unique_column desc limit 0,1;
Try it; it will work.
Here 0 is the offset (start at the 0th record) and 1 is the number of rows to return.
Both of these MySQL queries produce exactly the same result, but query A is a simple union with the postType WHERE clause embedded inside the individual queries, whereas query B applies the same WHERE clause to the outer SELECT over the virtual table that is the union of the individual query results. I am concerned that the virtual table sigma from query B might get too large for no good reason if there are a lot of rows. But then I am a bit confused, because how would the ORDER BY work for query A? Would it not also have to build a virtual table or something like that to sort the results? It all may depend on how ORDER BY works for a union. If the ORDER BY for a union also builds a temp table, would query A then use almost the same resources as query B? (It would be much easier for us to implement query B in our system than query A.) Please guide/advise in any way possible, thanks
Query A
SELECT `t1`.*, `t2`.*
FROM `t1` INNER JOIN `t2` ON
`t1`.websiteID= `t2`.ownerID
AND `t1`.authorID= `t2`.authorID
AND `t1`.authorID=1559 AND `t1`.postType="simplePost"
UNION
SELECT `t1`.*
FROM `t1` where websiteID=1559 AND postType="simplePost"
ORDER BY postID limit 0,50
Query B
Select * from (
SELECT `t1`.*,`t2`.*
FROM `t1` INNER JOIN `t2` ON
`t1`.websiteID= `t2`.ownerID
AND `t1`.authorID= `t2`.authorID
AND `t1`.authorID=1559
UNION
SELECT `t1`.*
FROM `t1` where websiteID=1559
)
As sigma where postType="simplePost" ORDER BY postID limit 0,50
EXPLAIN FOR QUERY A
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY t2 ref userID userID 4 const 1
1 PRIMARY t1 ref authorID authorID 4 const 2 Using where
2 UNION t1 ref websiteID websiteID 4 const 9 Using where
NULL UNION RESULT <union1,2> ALL NULL NULL NULL NULL NULL Using filesort
EXPLAIN FOR QUERY B
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 10 Using where; Using filesort
2 DERIVED t2 ref userID userID 4 1
2 DERIVED t1 ref authorID authorID 4 2 Using where
3 UNION t1 ref websiteID websiteID 4 9
NULL UNION RESULT <union2,3> ALL NULL NULL NULL NULL NULL
There is no doubt that version 1 - separate where clauses in each side of the union - will be faster. Let's look at why version 2 - a where clause over the union result - is worse:
data volume: there are always going to be more rows in the union result, because there are fewer conditions on which rows are returned. This means more disk I/O (depending on indexes) and more temporary storage to hold the rowset, which means more processing time
repeated scan: the entire result of the union must be scanned again to apply the condition, when it could have been handled during the initial scan. This means double-handling the rowset; albeit probably in-memory, it's still extra work
indexes aren't used for where clauses on a union result. If you have an index over the foreign key fields and postType, it would not be used
If you want maximum performance, use UNION ALL, which passes the rows straight out into the result with no overhead, instead of UNION, which removes duplicates (usually by sorting); that can be expensive and, based on your comments, is unnecessary here (see the sketch after the index definitions below)
Define these indexes and use version 1 for maximum performance:
create index t1_authorID_postType on t1(authorID, postType);
create index t1_websiteID_postType on t1(websiteID, postType);
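Putting the pieces together, version 1 with UNION ALL might look like this (a sketch only: UNION requires both branches to return the same columns, so the first branch's select list is reduced to `t1`.* here as an assumption):

SELECT `t1`.*
FROM `t1` INNER JOIN `t2` ON
`t1`.websiteID = `t2`.ownerID
AND `t1`.authorID = `t2`.authorID
AND `t1`.authorID = 1559 AND `t1`.postType = "simplePost"
UNION ALL
SELECT `t1`.*
FROM `t1` WHERE websiteID = 1559 AND postType = "simplePost"
ORDER BY postID LIMIT 0,50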
perhaps this would work in lieu of the union:
SELECT
`t1`.*
,`t2`.*
FROM `t1`
LEFT JOIN `t2` ON `t1`.websiteID = `t2`.ownerID
AND `t1`.authorID = `t2`.authorID
AND `t1`.authorID = 1559
WHERE ( `t1`.authorID = 1559 OR `t1`.websiteID = 1559 )
AND `t1`.postType = 'simplePost'
ORDER BY postID LIMIT 0,50
I have the following query:
SELECT `p_products`.`id`, `p_products`.`name`, `p_products`.`date`,
`p_products`.`img`, `p_products`.`safe_name`, `p_products`.`sku`,
`p_products`.`productstatusid`, `op`.`quantity`
FROM `p_products`
INNER JOIN `p_product_p_category`
ON `p_products`.`id` = `p_product_p_category`.`p_product_id`
LEFT JOIN (SELECT `p_product_id`,`order_date`,SUM(`product_quantity`) as quantity
FROM `p_orderedproducts`
WHERE `order_date`>='2013-03-01 16:51:17'
GROUP BY `p_product_id`) AS op
ON `p_products`.`id` = `op`.`p_product_id`
WHERE `p_product_p_category`.`p_category_id` IN ('15','23','32')
AND `p_products`.`active` = '1'
GROUP BY `p_products`.`id`
ORDER BY `date` DESC
Explain says:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY p_product_p_category ref p_product_id,p_category_id,p_product_id_2 p_category_id 4 const 8239 Using temporary; Using filesort
1 PRIMARY p_products eq_ref PRIMARY PRIMARY 4 pdev.p_product_p_category.p_product_id 1 Using where
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 78
2 DERIVED p_orderedproducts index order_date p_product_id 4 NULL 201 Using where
And I have indexes on a number of columns including p_products.date.
The problem is the speed when there are more than 5,000 products in a number of categories: 60,000 products take >1 second. Is there any way to speed things up?
This also holds true if I remove the left join, in which case the result is:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE p_product_p_category index p_product_id,p_category_id,p_product_id_2 p_product_id_2 8 NULL 91167 Using where; Using index; Using temporary; Using filesort
1 SIMPLE p_products eq_ref PRIMARY PRIMARY 4 pdev.p_product_p_category.p_product_id 1 Using where
The intermediate table p_product_p_category has indexes on both p_product_id and p_category_id, as well as a combined index with both.
Tried Ochi's suggestion and ended up with:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 62087 Using temporary; Using filesort
1 PRIMARY nr1media_products eq_ref PRIMARY PRIMARY 4 cats.nr1media_product_id 1 Using where
2 DERIVED nr1media_product_nr1media_category range nr1media_category_id nr1media_category_id 4 NULL 62066 Using where
I think I can simplify the question to: how can I join my products to the category intermediate table to fetch all unique products for the selected categories, sorted by date?
EDIT:
This gives me all unique products in the categories without using a temp table for ordering or grouping:
SELECT
`p_products`.`id`,
`p_products`.`name`,
`p_products`.`img`,
`p_products`.`safe_name`,
`p_products`.`sku`,
`p_products`.`productstatusid`
FROM
p_products
WHERE
EXISTS (
SELECT
1
FROM
p_product_p_category
WHERE
p_product_p_category.p_product_id = p_products.id
AND p_category_id IN ('15', '23', '32')
)
AND p_products.active = 1
ORDER BY
`date` DESC
The above query is very fast, much faster than the join using GROUP BY/ORDER BY (0.04 vs 0.7 sec), although I don't understand why it can do this query without temp tables.
I think I need to find another solution for the orderedproducts join; it still slows the query down to >1 sec. I might make a cron job to update the ranking of the products sold once every night and save that info to the p_products table.
Unless someone has a definitive solution...
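For reference, a minimal sketch of such a nightly job, assuming a hypothetical quantity_sold column is added to p_products:

UPDATE `p_products` P
LEFT JOIN (
    SELECT `p_product_id`, SUM(`product_quantity`) AS qty
    FROM `p_orderedproducts`
    WHERE `order_date` >= '2013-03-01 16:51:17'
    GROUP BY `p_product_id`
) OP ON OP.p_product_id = P.id
SET P.quantity_sold = COALESCE(OP.qty, 0); -- quantity_sold is a hypothetical column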
You are joining every category to your products - only then does it get filtered by category id.
try to limit your query as early as possible; e.g. instead of
INNER JOIN `p_product_p_category`
do
INNER JOIN ( SELECT * FROM `p_product_p_category` WHERE `p_category_id` IN ('15','23','32') ) AS cats
    ON cats.p_product_id = `p_products`.`id`
so that you will be working on a smaller subset of products right from the beginning
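In full, the rewritten query might look like this (a sketch; the GROUP BY is kept because a product that belongs to several of the selected categories would otherwise appear more than once):

SELECT `p_products`.`id`, `p_products`.`name`, `p_products`.`date`,
       `p_products`.`img`, `p_products`.`safe_name`, `p_products`.`sku`,
       `p_products`.`productstatusid`
FROM `p_products`
INNER JOIN (
    SELECT `p_product_id`
    FROM `p_product_p_category`
    WHERE `p_category_id` IN ('15','23','32')
) AS cats ON cats.p_product_id = `p_products`.`id`
WHERE `p_products`.`active` = '1'
GROUP BY `p_products`.`id`
ORDER BY `p_products`.`date` DESC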
One possible solution would be to remove the derived table and just do a single Group By:
Select P.id, P.name, P.date
, P.img, P.safe_name, P.sku
, P.productstatusid
, Sum( OP.product_quantity ) As quantity
From p_products As P
Join p_product_p_category As CAT
On P.id = CAT.p_product_id
Left Join p_orderedproducts As OP
On OP.p_product_id = P.id
And OP.order_date >= '2013-03-01 16:51:17'
Where CAT.p_category_id In ('15','23','32')
And P.active = '1'
Group By P.id, P.name, P.date
, P.img, P.safe_name, P.sku
, P.productstatusid
Order By P.date Desc
In a table 'ttraces' I have many records for different tasks (the task is held in the 'taskid' column, a foreign key to the 'id' column of a table 'ttasks'). Each task inserts a record into 'ttraces' every 8-10 seconds, so caching data to increase performance is not a good idea. What I need is to select only the newest record for each task from 'ttraces', that is, the record with the maximum value of the column 'time'. At the moment I have over 500,000 records in the table. The very simplified structure of these two tables looks as follows:
-----------------------
| ttasks |
-----------------------
| id | name | blocked |
-----------------------
---------------------
| ttraces |
---------------------
| id | taskid | time |
---------------------
And my query is shown below:
SELECT t.name,tr.time
FROM
ttraces tr
JOIN
ttasks t ON tr.taskid = t.id
JOIN (
SELECT taskid, MAX(time) AS max_time
FROM ttraces
GROUP BY taskid
) x ON tr.taskid = x.taskid AND tr.time = x.max_time
WHERE t.blocked
All columns used in WHERE and JOIN clauses are indexed. As of now the query runs for ~1.5 seconds. It's crucial to increase its speed. Thanks for all suggestions. BTW: the database is running on a hosted, shared server and I can't move it anywhere else for the moment.
[EDIT]
EXPLAIN SELECT... results are:
--------------------------------------------------------------------------------------------------------------
id select_type table type possible_keys key key_len ref rows Extra
--------------------------------------------------------------------------------------------------------------
1 PRIMARY <derived2> ALL NULL NULL NULL NULL 74
1 PRIMARY t eq_ref PRIMARY PRIMARY 4 x.taskid 1 Using where
1 PRIMARY tr ref taskid,time time 9 x.max_time 1 Using where
2 DERIVED ttraces index NULL itask 5 NULL 570853
--------------------------------------------------------------------------------------------------------------
The engine is InnoDB.
I may be having a bit of a moment, but is this query not logically the same, and (almost certainly) faster?
SELECT t.id, t.name, MAX(tr.time)
FROM
ttraces tr
JOIN
ttasks t ON tr.taskid = t.id
where t.blocked
group by t.id, t.name
Here's my idea... You need one composite index on ttraces having the taskid and time columns (in that order). For example (a sketch; the index name is illustrative):
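create index idx_ttraces_taskid_time on ttraces (taskid, time);

Then, use this query: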
SELECT t.name,
trm.mtime
FROM ttasks AS t
JOIN (SELECT taskid,
Max(time) AS mtime
FROM ttraces
GROUP BY taskid) AS trm
ON t.id = trm.taskid
WHERE t.blocked
Does this query return the correct result? If so, how fast is it?
SELECT t.name, max_time
FROM ttasks t JOIN (
SELECT taskid, MAX(time) AS max_time
FROM ttraces
GROUP BY taskid
) x ON t.id = x.taskid
If there are many traces for each task, then you can keep a side table with only the newest trace per task. Whenever you insert into ttraces you also upsert into ttraces_newest:
insert into ttraces_newest (id, taskid, time) values
(3, 1, '2012-01-01 08:02:01')
on duplicate key update
`id` = values(`id`),
`time` = values(`time`)
The primary key on ttraces_newest would be (taskid), so the upsert keeps exactly one row per task: its newest trace. Querying ttraces_newest would be cheaper. How much cheaper depends on how many traces there are for each task. As a sketch, the table could be defined as follows (column types are assumed):
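create table ttraces_newest (
    taskid int not null primary key, -- one row per task
    id int not null,                 -- id of the newest trace for this task
    time datetime not null           -- time of the newest trace
);

Now the query is: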
SELECT t.name,tr.time
FROM
ttraces_newest tr
JOIN
ttasks t ON tr.taskid = t.id
WHERE t.blocked