Please suggest indexes to optimize the query below. I am not allowed to rewrite the query, only to create indexes:
SELECT
`ADV`.`inds` as `c0`,
sum(`ADVpost`.`clk`) as `m0`
FROM
(SELECT *
FROM advts
WHERE comp_id =
(SELECT comp_id
FROM comp
WHERE name = 'abc')) as `ADV`,
(SELECT dt_id,
comp_id,
b_id,
ad_id,
clk,
resp
FROM advts_post
WHERE comp_id =
(SELECT comp_id
FROM comp
WHERE name = 'abc')) as `ADVpost`
WHERE
`ADVpost`.`ad_id` = `ADV`.`ad_id`
GROUP BY
`ADV`.`inds`
ORDER BY
ISNULL(`ADV`.`inds`), `ADV`.`inds` ASC
The EXPLAIN for the query is as follows:
select_type table type possible_keys Extra
PRIMARY <derived2> ALL null Using temporary; Using filesort
PRIMARY <derived4> ALL null Using where; Using join buffer
DERIVED ADVpost ALL null Using where
SUBQUERY comp ALL null Using where
DERIVED advts ALL null Using where
SUBQUERY comp ALL null Using where
Existing indexes are as follows:
ADVpost > PRIMARY KEY (`dt_id`,`comp_id`,`b_id`,`ad_id`)
comp > PRIMARY KEY (`comp_id`)
advts > PRIMARY KEY (`ad_id`)
Thanks in advance.
OK, maybe I am not an expert in MySQL optimization, but:
if it is possible and reasonable, try to avoid subselects (instead, it may be better to run a separate query and then pass the retrieved ID, like comp_id, to the containing query),
put an index on comp.name,
put an index on advts_post.comp_id (a single-column one),
put an index on advts_post.ad_id (a single-column one).
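For example (the index names are my own):
CREATE INDEX idx_comp_name ON comp (name);
CREATE INDEX idx_advts_post_comp_id ON advts_post (comp_id);
CREATE INDEX idx_advts_post_ad_id ON advts_post (ad_id);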
Maybe it is rather simple, but it should at least make it slightly faster. Tell us about the results.
That query is a dog's dinner - whoever wrote it should be severely punished, and the person who said you can't rewrite it but must make it run faster should be shot.
Lose the sub-selects!
MySQL does not push predicates into derived tables very well (if at all).
Use proper joins instead, and state the implied join conditions explicitly:
SELECT ad.inds, SUM(ap.clk)
FROM advts_post AS ap
JOIN comp AS co ON co.comp_id = ap.comp_id
JOIN advts AS ad ON ad.comp_id = co.comp_id
                AND ad.ad_id = ap.ad_id
WHERE co.name = 'abc'
GROUP BY ad.inds
ORDER BY ISNULL(ad.inds), ad.inds ASC
I am using spring-data-jpa and PostgreSQL 9.4.
There is a table, tbl_oplog, with about seven million rows of data that need to be displayed (paged) on the front end.
I use Spring's PagingAndSortingRepository, and I found that the data query was very slow. From the logs, I found that two SQL queries were issued:
select
oplog0_.id as id1_8_,
oplog0_.deleted as deleted2_8_,
oplog0_.result_desc as result_d3_8_,
oplog0_.extra as extra4_8_,
oplog0_.info as info5_8_,
oplog0_.login_ipaddr as login_ip6_8_,
oplog0_.level as level7_8_,
oplog0_.op_type as op_type8_8_,
oplog0_.user_name as user_nam9_8_,
oplog0_.op_obj as op_obj10_8_,
oplog0_.op as op11_8_,
oplog0_.result as result12_8_,
oplog0_.op_time as op_time13_8_,
oplog0_.login_name as login_n14_8_
from
tbl_oplog oplog0_
where
oplog0_.deleted=false
order by
oplog0_.op_time desc limit 10
And:
select
count(oplog0_.id) as col_0_0_
from
tbl_oplog oplog0_
where
oplog0_.deleted=?
(The second SQL statement is used to populate the page object, which is necessary.)
I found the second statement to be very time-consuming. Why does it take so long?
How can I optimize it? Does this happen with MySQL as well?
Or is there any other way I can optimize this requirement? (It seems that the SELECT count is inevitable.)
EDIT:
I'll use another table for the demonstration (same situation):
Table:
select count(*) from tbl_gather_log; -- count is 6300931, takes 5.408 s
EXPLAIN select count(*) from tbl_gather_log:
Aggregate (cost=246566.58..246566.59 rows=1 width=0)
-> Index Only Scan using tbl_gather_log_pkey on tbl_gather_log (cost=0.43..230814.70 rows=6300751 width=0)
EXPLAIN ANALYSE select count(*) from tbl_gather_log:
Aggregate (cost=246566.58..246566.59 rows=1 width=0) (actual time=6697.102..6697.102 rows=1 loops=1)
-> Index Only Scan using tbl_gather_log_pkey on tbl_gather_log (cost=0.43..230814.70 rows=6300751 width=0) (actual time=0.173..4622.674 rows=6300936 loops=1)
Heap Fetches: 298
Planning time: 0.312 ms
Execution time: 6697.267 ms
EDIT2:
TABLE:
create table tbl_gather_log (
id bigserial not null primary key,
event_level int,
event_time timestamp,
event_type int,
event_dis_type int,
event_childtype int,
event_name varchar(64),
dev_name varchar(32),
dev_ip varchar(32),
sys_type varchar(16),
event_content jsonb,
extra jsonb
);
And:
There are many filtering criteria supported, so I can't simply do special operations on deleted. For example, a query might be issued like select * from tbl_oplog where name like xxx and type = xxx limit 10, so there will also be a query select count(*) from tbl_oplog where name like xxx and type = xxx. Furthermore, I have to know the exact counts, because I need to show how many pages there are on the front end.
The second statement takes a long time because it has to scan the whole table in order to count the rows.
One thing you can do is use an index:
CREATE INDEX ON tbl_oplog (deleted) INCLUDE (id);
VACUUM tbl_oplog; -- so you get an index only scan
Assuming that id is the primary key, it would be much better to use count(*) and omit the INCLUDE clause from the index.
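A minimal sketch of that count(*) variant (the index name is my own):
CREATE INDEX tbl_oplog_deleted_idx ON tbl_oplog (deleted);
VACUUM tbl_oplog;  -- so you get an index only scan
SELECT count(*) FROM tbl_oplog WHERE deleted = false;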
But the best is probably to use an estimate:
SELECT t.reltuples * freq.f AS estimated_rows
FROM pg_stats AS s
JOIN pg_namespace AS n
ON s.schemaname = n.nspname
JOIN pg_class AS t
ON s.tablename = t.relname
AND n.oid = t.relnamespace
CROSS JOIN LATERAL
unnest(s.most_common_vals::text::boolean[]) WITH ORDINALITY AS val(v,id)
JOIN LATERAL
unnest(s.most_common_freqs) WITH ORDINALITY AS freq(f,id)
USING (id)
WHERE s.tablename = 'tbl_oplog'
AND s.attname = 'deleted'
AND val.v = ?;
This uses the distribution statistics to estimate the desired count.
If it is just about pagination, you don't need exact counts.
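If a rough number for the whole table is enough (note this ignores the deleted filter), the cheapest estimate comes straight from pg_class:
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'tbl_oplog';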
Read my blog for more on the topic of counting in PostgreSQL.
I am trying to understand MySQL EXPLAIN, and I read that the type "ALL" is the worst for performance. I just wrote a very simple SQL statement with one left join:
SELECT * FROM production_plan_header pph LEFT JOIN production_plan_details ppd ON ppd.ppd_header_id = pph.pph_id WHERE pph.pph_id =1
If I use EXPLAIN on this, I get the following:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE pph const PRIMARY PRIMARY 4 const 1
1 SIMPLE ppd ALL ppd_header_id NULL NULL NULL 7
As you can see, production_plan_details has the type "ALL". The rows column shows the total number of rows in the table (7). The ppd_header_id column is indexed. Is there a way to prevent this "ALL" in my SQL statement?
I am not sure if I understood your question correctly, but if you want to remove rows having type = "ALL", then you can write your query like the following:
SELECT *
FROM production_plan_header pph
LEFT JOIN production_plan_details ppd
ON ppd.ppd_header_id = pph.pph_id
WHERE pph.pph_id = 1
AND ppd.type <> "ALL"
My problem is with this query in MySQL:
select
SUM(OrderThreshold < #LOW_COST) as LOW_COUNT,
SUM(OrderThreshold > #HIGH_COST) as HIGH_COUNT
FROM parts
-- where parttypeid = 1
When the WHERE is uncommented, my run time jumps from 4.5 seconds to 341 seconds. There are approximately 21M total records in this table.
My EXPLAIN looks like this, which seems to indicate that it is utilizing the index I have on PartTypeId:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE parts ref PartTypeId PartTypeId 1 const 11090057
I created my table using this query:
CREATE TABLE IF NOT EXISTS parts (
Id INTEGER NOT NULL PRIMARY KEY,
PartTypeId TINYINT NOT NULL,
OrderThreshold INTEGER NOT NULL,
PartName VARCHAR(500),
INDEX(Id),
INDEX(PartTypeId),
INDEX(OrderThreshold)
);
The query without the WHERE returns:
LOW_COUNT HIGH_COUNT
3570 3584
With the WHERE, the results look like this:
LOW_COUNT HIGH_COUNT
2791 2147
How can I improve the performance of my query to keep the run time in the seconds (instead of minutes) range when adding a WHERE clause that only looks at one column?
Try:
select SUM(OrderThreshold < #LOW_COST) as LOW_COUNT,
SUM(OrderThreshold > #HIGH_COST) as HIGH_COUNT
from parts
where parttypeid = 1
and OrderThreshold not between #LOW_COST and #HIGH_COST
and, as an alternative:
select count(*) as LOW_COUNT, null as HIGH_COUNT
from parts
where parttypeid = 1
and OrderThreshold < #LOW_COST
union all
select null, count(*)
from parts
where parttypeid = 1
and OrderThreshold > #HIGH_COST
Your accepted answer doesn't explain what is going wrong with your original query:
select SUM(OrderThreshold < #LOW_COST) as LOW_COUNT,
SUM(OrderThreshold > #HIGH_COST) as HIGH_COUNT
from parts
where parttypeid = 1;
The index is being used to find the results, but there are a lot of rows with parttypeid = 1. I am guessing that each data page probably has at least one such row. That means that all the rows are being fetched, but they are being read out-of-order. That is slower than just doing a full table scan (as in the first query). In other words, all the data pages are being read, but the index is adding additional overhead.
As Juergen points out, a better form of the query moves the conditions into the where clause:
select SUM(OrderThreshold < #LOW_COST) as LOW_COUNT,
SUM(OrderThreshold > #HIGH_COST) as HIGH_COUNT
from parts
where parttypeid = 1 AND
(OrderThreshold < #LOW_COST OR OrderThreshold > #HIGH_COST)
(I prefer this form, because the where conditions match the case conditions.) For this query, you want an index on parts(parttypeid, OrderThreshold). I'm not sure about the MySQL optimizer in this case, but it might be better to write it as:
select 'Low' as which, count(*) as CNT
from parts
where parttypeid = 1 AND
OrderThreshold < #LOW_COST
union all
select 'High', count(*) as CNT
from parts
where parttypeid = 1 AND
OrderThreshold > #HIGH_COST;
Each subquery should definitely use the index in this case. (If you want them in one row with two columns, there are a couple of ways to achieve that, but I'm guessing that is not so important.)
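For instance, one way to fold the union back into a single row (a sketch, keeping the same placeholder costs):
SELECT SUM(IF(which = 'Low', CNT, 0)) AS LOW_COUNT,
       SUM(IF(which = 'High', CNT, 0)) AS HIGH_COUNT
FROM (
    SELECT 'Low' AS which, COUNT(*) AS CNT
    FROM parts
    WHERE parttypeid = 1 AND OrderThreshold < #LOW_COST
    UNION ALL
    SELECT 'High', COUNT(*)
    FROM parts
    WHERE parttypeid = 1 AND OrderThreshold > #HIGH_COST
) AS counts;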
Unfortunately, the best index for your query without the where clause is parts(OrderThreshold). This is a different index from the above.
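For reference, the compound index could be created like this (the index name is my own; a single-column index on OrderThreshold already exists per the CREATE TABLE above):
CREATE INDEX idx_parts_type_threshold ON parts (PartTypeId, OrderThreshold);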
My problem is this query:
(SELECT
ID1,ID2,ID3,ID4,ID5,ID10,ID11,ID13,ID14,ID454,ID453,
TIME,TEMP_ID,'ID_AUTO',PREDAJCA,VYTVORIL,MAIL,TEMP_ID_HASH,ID_SEND
FROM `load_send_calc`
WHERE `TEMP_ID` LIKE '$find%'
AND ACTIVE = 1 AND TEMP_ID > 0)
UNION ALL
(SELECT
ID1,ID2,ID3,ID4,ID5,ID10,ID11,ID13,ID14,ID454,ID453,TIME,'',
ID_AUTO,'','','','',''
FROM `temp`
WHERE `ID_AUTO` LIKE '$find%'
AND `ID_AUTO` NOT IN (SELECT TEMP_ID
FROM `load_send_calc`
WHERE `load_send_calc`.ACTIVE = 1)
)
ORDER BY TIME DESC LIMIT $limitFrom,$limitTo;
There are 18000 records in table load_send_calc and 3000 in table temp. The query itself takes more than 2 minutes to execute. Is there any way to optimize this time?
I already tried to put an ORDER BY into each subquery, but it didn't help significantly. I am really desperate, so I would appreciate any kind of help.
EDIT:
Here is the EXPLAIN result:
id select_type table type possible_keys key key_len ref rows Extra
1 PRIMARY load_send_calc ALL NULL NULL NULL NULL 18394 Using where
2 UNION temp ALL NULL NULL NULL NULL 1918 Using where
3 DEPENDENT SUBQUERY load_send_calc ALL NULL NULL NULL NULL 18394 Using where
NULL UNION RESULT <union1,2> ALL NULL NULL NULL NULL NULL Using filesort
Thanks for adding your EXPLAIN output - it tells us a lot. The query isn't using a single index, which is very bad for performance. A very simple optimisation would be to add indexes on the fields that are used in the joins and in the where clauses. In your case those fields are:
load_send_calc.temp_id
load_send_calc.active
temp.id_auto
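For example (the index names are my own):
CREATE INDEX idx_lsc_temp_id ON load_send_calc (TEMP_ID);
CREATE INDEX idx_lsc_active ON load_send_calc (ACTIVE);
CREATE INDEX idx_temp_id_auto ON temp (ID_AUTO);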
In addition to these, you have an unnecessary AND TEMP_ID > 0, since you are already restricting the same field with WHERE TEMP_ID LIKE '$find%'.
3 things to speed it up:
INDEX(active, temp_id) on load_send_calc -- a compound index is significantly better than two separate indexes here
IN ( SELECT ... ) performs poorly, especially in old versions of MySQL. Turn it into a JOIN (see the sketch after this list).
Add a LIMIT to each SELECT. For example:
( SELECT ... ORDER BY ... LIMIT 80 )
UNION ALL
( SELECT ... ORDER BY ... LIMIT 80 )
ORDER BY ... LIMIT 70, 10;
The inner ones have a limit of the maximum needed -- the outer's offset + limit (here 70 + 10 = 80).
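A sketch of the NOT IN-to-JOIN rewrite for the second branch of the union, as an anti-join via LEFT JOIN ... IS NULL (this assumes TEMP_ID is never NULL; NOT IN and an anti-join differ when NULLs are present):
SELECT t.ID1, t.ID2, t.ID3, t.ID4, t.ID5, t.ID10, t.ID11, t.ID13, t.ID14,
       t.ID454, t.ID453, t.TIME, '', t.ID_AUTO, '', '', '', '', ''
FROM `temp` AS t
LEFT JOIN `load_send_calc` AS l
       ON l.TEMP_ID = t.ID_AUTO
      AND l.ACTIVE = 1
WHERE t.ID_AUTO LIKE '$find%'
  AND l.TEMP_ID IS NULL  -- keep only rows with no active match in load_send_calc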
I had a query that was resulting in the error:
ERROR 1104: The SELECT would examine more rows than MAX_JOIN_SIZE.
Check your WHERE and use SET SQL_BIG_SELECTS=1 or SET SQL_MAX_JOIN_SIZE=# if the SELECT is ok.
I have now changed the query, and I no longer get this error. max_join_size = 900,000 and sql_big_selects = 0. Unfortunately I don't have SSH access; I have to use phpMyAdmin.
So my question is: is there any way of determining how many rows a particular query would examine? I would like to see how close a query is to the max_join_size limit.
EDIT: This was the original query:
SELECT * FROM `tz_verify`
LEFT JOIN `tz_sessions` ON `tz_sessions`.`timesheet_id` = `tz_verify`.`timesheet_id`
AND `tz_sessions`.`client_id` = `tz_verify`.`client_id`
LEFT JOIN `tz_clients` ON `tz_sessions`.`client_id` = `tz_clients`.`id`
LEFT JOIN `tz_tutor_comments` ON `tz_sessions`.`timesheet_id` = `tz_tutor_comments`.`timesheet_id`
AND `tz_sessions`.`client_id` = `tz_tutor_comments`.`client_id`
LEFT JOIN `tz_notes` ON `tz_sessions`.`notes` = `tz_notes`.`id`
WHERE `tz_verify`.`code` = 'b65f35601c' AND `confirmed` = 0;
I can temporarily enable SQL_BIG_SELECTS to get the EXPLAIN to run - here is the output:
id select_type table type possible_keys key ref rows extra
1 SIMPLE tz_verify ALL NULL NULL NULL 93 Using where
1 SIMPLE tz_sessions ALL NULL NULL NULL 559
1 SIMPLE tz_clients eq_ref PRIMARY PRIMARY tz_sessions.client_id 1
1 SIMPLE tz_tutor_comments ALL NULL NULL NULL 185
1 SIMPLE tz_notes eq_ref PRIMARY PRIMARY tz_sessions.notes 1
In rewriting the query, I just split it into two separate queries: first finding client_id (e.g. 226) and timesheet_id (e.g. 75) from tz_verify, then using these values in this query:
SELECT * FROM `tz_sessions`
LEFT JOIN `tz_clients`
ON `tz_clients`.`id` = 226
LEFT JOIN `tz_tutor_comments`
ON `tz_tutor_comments`.`timesheet_id` = 75
AND `tz_tutor_comments`.`client_id` = 226
LEFT JOIN `tz_notes`
ON `tz_sessions`.`notes` = `tz_notes`.`id`
WHERE `tz_sessions`.`client_id` = 226 AND `tz_sessions`.`timesheet_id` = 75;
Here is the EXPLAIN:
id select_type table type possible_keys key ref rows extra
1 SIMPLE tz_sessions ALL NULL NULL NULL 559 Using where
1 SIMPLE tz_clients const PRIMARY PRIMARY const 1
1 SIMPLE tz_tutor_comments ALL NULL NULL NULL 185
1 SIMPLE tz_notes eq_ref PRIMARY PRIMARY tz_sessions.notes 1
This doesn't seem as neat, though, as doing it in one go!
Based on the first output of EXPLAIN you posted:
I think the join size is 93 * 559 * 185 = 9,617,595.
Your initial query does not seem to always be using indexes when joining with the tables tz_sessions and tz_tutor_comments. I suggest adding the following compound indexes (each index is made of 2 fields):
table tz_verify: (timesheet_id, client_id)
table tz_sessions: (timesheet_id, client_id)
table tz_tutor_comments: (timesheet_id, client_id)
If one of these indexes already exists, do not create it again.
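For example (the index names are my own):
CREATE INDEX idx_tz_verify_ts_client ON tz_verify (timesheet_id, client_id);
CREATE INDEX idx_tz_sessions_ts_client ON tz_sessions (timesheet_id, client_id);
CREATE INDEX idx_tz_comments_ts_client ON tz_tutor_comments (timesheet_id, client_id);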
Once you have added the indexes, run the EXPLAIN again (using your initial query). You should notice that the query now uses indexes for each join (look at the "key" column). You can then run your initial query again; it should no longer cause the error you had.
Documentation:
Optimizing Queries with EXPLAIN
CREATE INDEX Syntax
How MySQL Uses Indexes
Multiple-Column Indexes