MySQL union optimization with subquery - mysql

My problem is this query:
(SELECT
ID1,ID2,ID3,ID4,ID5,ID10,ID11,ID13,ID14,ID454,ID453,
TIME,TEMP_ID,'ID_AUTO',PREDAJCA,VYTVORIL,MAIL,TEMP_ID_HASH,ID_SEND
FROM `load_send_calc`
WHERE `TEMP_ID` LIKE '$find%'
AND ACTIVE = 1 AND TEMP_ID > 0)
UNION ALL
(SELECT
ID1,ID2,ID3,ID4,ID5,ID10,ID11,ID13,ID14,ID454,ID453,TIME,'',
ID_AUTO,'','','','',''
FROM `temp`
WHERE `ID_AUTO` LIKE '$find%'
AND `ID_AUTO` NOT IN (SELECT TEMP_ID
FROM `load_send_calc`
WHERE `load_send_calc`.ACTIVE = 1)
)
ORDER BY TIME DESC LIMIT $limitFrom,$limitTo;
There are 18000 records in table load_send_calc and 3000 in table temp. The query itself takes more than 2 minutes to execute. Is there any way to reduce this time?
I already tried putting an ORDER BY into each subquery, but it didn't help significantly. I am really desperate, so I would appreciate any kind of help.
EDIT:
Here is EXPLAIN result :
id    select_type         table           type  possible_keys  key   key_len  ref   rows   Extra
1     PRIMARY             load_send_calc  ALL   NULL           NULL  NULL     NULL  18394  Using where
2     UNION               temp            ALL   NULL           NULL  NULL     NULL  1918   Using where
3     DEPENDENT SUBQUERY  load_send_calc  ALL   NULL           NULL  NULL     NULL  18394  Using where
NULL  UNION RESULT        <union1,2>      ALL   NULL           NULL  NULL     NULL  NULL   Using filesort

Thanks for adding your EXPLAIN output - it tells us a lot. The query isn't using a single index (every table shows type ALL, i.e. a full scan), which is very bad for performance. A very simple optimisation would be to add indexes on the fields that are used in the subquery comparison and in the WHERE clauses. In your case those fields would be:
load_send_calc.temp_id
load_send_calc.active
temp.id_auto
In addition to these, the AND TEMP_ID > 0 condition is probably unnecessary, since you are already filtering on the same field with WHERE TEMP_ID LIKE '$find%'.

3 things to speed it up:
INDEX(active, temp_id) -- significantly better than two separate indexes
NOT IN ( SELECT ... ) performs poorly, especially in old versions of MySQL. Turn it into a JOIN (here a LEFT JOIN ... IS NULL, since the original is a NOT IN).
Add a LIMIT to each SELECT. For example:
( SELECT ... ORDER BY ... LIMIT 80 )
UNION ALL
( SELECT ... ORDER BY ... LIMIT 80 )
ORDER BY ... LIMIT 70, 10;
The inner ones have a limit of the max needed -- the outer's offset + limit.
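Putting the three suggestions together, the query might look roughly like the sketch below. It is only a sketch under a few assumptions: the index names are made up, the inner LIMIT 80 and outer LIMIT 70, 10 are just the example numbers from above (in general the inner limit is the outer offset plus the outer row count), and the LEFT JOIN ... IS NULL form gives the same rows as NOT IN only if load_send_calc.TEMP_ID is never NULL.

```sql
-- Hypothetical index names; skip any index that already exists.
ALTER TABLE load_send_calc ADD INDEX idx_active_temp_id (ACTIVE, TEMP_ID);
ALTER TABLE temp ADD INDEX idx_id_auto (ID_AUTO);

(SELECT ID1,ID2,ID3,ID4,ID5,ID10,ID11,ID13,ID14,ID454,ID453,
        TIME,TEMP_ID,'ID_AUTO',PREDAJCA,VYTVORIL,MAIL,TEMP_ID_HASH,ID_SEND
   FROM load_send_calc
  WHERE ACTIVE = 1
    AND TEMP_ID LIKE '$find%'
    AND TEMP_ID > 0
  ORDER BY TIME DESC
  LIMIT 80)                       -- outer offset + outer row count
UNION ALL
(SELECT t.ID1,t.ID2,t.ID3,t.ID4,t.ID5,t.ID10,t.ID11,t.ID13,t.ID14,t.ID454,t.ID453,
        t.TIME,'',t.ID_AUTO,'','','','',''
   FROM temp AS t
   LEFT JOIN load_send_calc AS l  -- replaces NOT IN ( SELECT ... )
          ON l.TEMP_ID = t.ID_AUTO
         AND l.ACTIVE = 1
  WHERE t.ID_AUTO LIKE '$find%'
    AND l.TEMP_ID IS NULL         -- keep only rows with no active match
  ORDER BY t.TIME DESC
  LIMIT 80)
ORDER BY TIME DESC
LIMIT 70, 10;
```

Running EXPLAIN on the result should show whether the DEPENDENT SUBQUERY row is gone and whether the new indexes appear in the key column.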

Related

Why doesn't this query use an index for ORDER BY?

SELECT `f`.*
FROM `files_table` `f`
WHERE f.`application_id` IN(6)
AND `f`.`project_id` IN(130418)
AND `f`.`is_last_version` = 1
AND `f`.`temporary` = 0
AND f.deleted_by is null
ORDER BY `f`.`date` DESC
LIMIT 5
When I remove the ORDER BY, query executes in 0.1 seconds. With the ORDER BY it takes 3 seconds.
There is an index on every WHERE column and there is also an index on ORDER BY field (date).
What can I do to make this query faster? Why is ORDER BY slowing it down so much? Table has 3M rows.
Instead of an index on each column in the WHERE clause, make sure you have a composite index that covers all the columns in the WHERE clause,
e.g.
create index idx1 on files_table (application_id, project_id,is_last_version,temporary,deleted_by)
Avoid the IN clause for a single value; use = instead:
SELECT `f`.*
FROM `files_table` `f`
WHERE f.`application_id` = 6
AND `f`.`project_id` = 130418
AND `f`.`is_last_version` = 1
AND `f`.`temporary` = 0
AND f.deleted_by is null
ORDER BY `f`.`date` DESC
LIMIT 5
Adding date (or other selected columns) to the index can let MySQL retrieve everything from the index without accessing the table data. With SELECT * you probably need several columns, so the table data is accessed anyway, but you can try it and evaluate the performance.
With date as the last column, rows that match the equality conditions are also stored in date order inside the index, so the ORDER BY can be served from the index instead of a separate sort.
Be careful to place the columns not involved in the WHERE clause to the right of all the columns that are:
create index idx1 on files_table (application_id, project_id,is_last_version,temporary,deleted_by, date)
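One way to check whether the extended index helps is to look at the EXPLAIN output after creating it. The following is a sketch only: idx_where_date is a made-up name, and whether the optimizer actually picks the index depends on the data and the MySQL version.

```sql
-- Hypothetical index name; equality columns first, the ORDER BY column last.
CREATE INDEX idx_where_date
    ON files_table (application_id, project_id, is_last_version,
                    `temporary`, deleted_by, `date`);

EXPLAIN
SELECT f.*
  FROM files_table AS f
 WHERE f.application_id = 6
   AND f.project_id = 130418
   AND f.is_last_version = 1
   AND f.`temporary` = 0
   AND f.deleted_by IS NULL
 ORDER BY f.`date` DESC
 LIMIT 5;
```

If the Extra column no longer shows "Using filesort", the rows are being read in index order and the LIMIT 5 can stop after the first five matches.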

SQL - Strange issue with SELECT

I have a strange situation with a simple select by column pqth_scan_code from the following table:
table pqth_
Field           Type          Null  Key  Default  Extra
pqth_id         int(11)       NO    PRI  NULL     auto_increment
pqth_scan_code  varchar(250)  NO         NULL
pqth_info       text          YES        NULL
pqth_opk        int(11)       NO         999
query 1
This query took 12.7221 seconds to execute
SELECT * FROM `pqth_` WHERE pqth_scan_code = "7900722!30#3#6$EN"
query 2
This query took 0.0003 seconds to execute
SELECT * FROM `pqth` WHERE `pqth_id`=27597
Based on data from table pqth_ I have created the following table, where pqthc_id = pqth_id and pqthc_scan_code=pqth_scan_code
table pqthc
Field            Type      Null  Key  Default  Extra
pqthc_id         int(11)   NO    PRI  NULL
pqthc_scan_code  tinytext  NO         NULL
The same query (query 1) on table pqthc took 0.0259 seconds to run
SELECT * FROM `pqthc` WHERE pqthc_scan_code = "7900722!30#3#6$EN"
If I run the following query, it takes 0.0971 seconds, which is very strange.
query 3
SELECT * FROM `pqth` WHERE pqth_id = (SELECT pqthc_id From pqthc where pqthc_scan_code = "7900722!30#3#6$EN")
My question is: why is a SELECT by pqth_scan_code slow while a SELECT by pqth_id is fast? Both columns are indexed.
For testing please get the export from this link
The same behavior is with MySQL and MariaDB server
SELECT * FROM `pqth_` WHERE pqth_scan_code = "7900722!30#3#6$EN"
needs INDEX(pqth_scan_code). Period. End of discussion.
SELECT * FROM `pqth` WHERE `pqth_id`=27597
has a useful index, since a PRIMARY KEY is an index (and it is unique).
SELECT * FROM `pqthc` WHERE pqthc_scan_code = "7900722!30#3#6$EN"
also needs INDEX(pqthc_scan_code). But it may have been faster because (1) the table is smaller, or (2) you ran the query before, thereby caching what was needed in RAM.
Please don't prefix column names with the table name.
Please don't have table names so close to each other that they are hard to distinguish. (pqth and pqthc)
SELECT *
FROM `pqth`
WHERE pqth_id =
( SELECT pqthc_id
From pqthc
where pqthc_scan_code = "7900722!30#3#6$EN"
)
The construct IN ( SELECT ... ) is not efficient.
It is rare to have two tables with the same PRIMARY KEY; are you sure you meant that?
Use a JOIN instead (written here with the unprefixed column names suggested above):
SELECT a.*
FROM `pqth` AS a
JOIN pqthc AS c ON a.id = c.id
where c.scan_code = "7900722!30#3#6$EN"
If that is 'correct', then I recommend this 'covering' index:
INDEX(scan_code, id)
instead of the shorter INDEX(scan_code) I previously recommended.
More on indexing.
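With the column names as they appear in the question, the index and JOIN suggestions could be sketched as follows. Several things here are assumptions: the index names are invented, the first table is taken to be pqth (the question shows both pqth_ and pqth), and the prefix length of 64 characters is an arbitrary choice that merely needs to be long enough for the scan codes.

```sql
-- Plain index on the VARCHAR(250) column.
ALTER TABLE pqth ADD INDEX idx_scan_code (pqth_scan_code);

-- TINYTEXT is a TEXT type, so MySQL requires an explicit prefix length.
-- 64 is an assumed value, not something stated in the question.
ALTER TABLE pqthc ADD INDEX idx_scan_code (pqthc_scan_code(64));

-- JOIN form of query 3, checked with EXPLAIN.
EXPLAIN
SELECT a.*
  FROM pqth AS a
  JOIN pqthc AS c ON a.pqth_id = c.pqthc_id
 WHERE c.pqthc_scan_code = '7900722!30#3#6$EN';
```

Note that a prefix index cannot act as a covering index, so the INDEX(scan_code, id) idea applies fully only if the column is a VARCHAR rather than TINYTEXT.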
You have to understand the concept of primary keys and indexes and how they help in searching;
the MySQL reference documentation covers this.
First of all, pqthc_scan_code has no index/key and pqthc_id does; keys help make searches faster.
Another difference is that pqthc_id is an integer whereas pqthc_scan_code is a string. Comparing integers is a lot more efficient than comparing strings.
You should avoid having to search on strings in really large tables.
You could add an index/key to pqthc_scan_code, but I don't know how much it will help.
You can put EXPLAIN in front of your query to try to figure out what takes so long.

Why does adding a WHERE statement (on a column with an index) to my query increase my run time from seconds to minutes?

My problem is with this query in MySQL:
select
SUM(OrderThreshold < #LOW_COST) as LOW_COUNT,
SUM(OrderThreshold > #HIGH_COST) as HIGH_COUNT
FROM parts
-- where parttypeid = 1
When the WHERE is uncommented, my run time jumps from 4.5 seconds to 341 seconds. There are approximately 21M total records in this table.
My EXPLAIN looks like this, which seems to indicate that it is utilizing the INDEX I have on PartTypeId.
id  select_type  table  type  possible_keys  key         key_len  ref    rows      Extra
1   SIMPLE       parts  ref   PartTypeId     PartTypeId  1        const  11090057
I created my table using this query:
CREATE TABLE IF NOT EXISTS parts (
Id INTEGER NOT NULL PRIMARY KEY,
PartTypeId TINYINT NOT NULL,
OrderThreshold INTEGER NOT NULL,
PartName VARCHAR(500),
INDEX(Id),
INDEX(PartTypeId),
INDEX(OrderThreshold)
);
The query without the WHERE returns
LOW_COUNT HIGH_COUNT
3570 3584
With the where the results look like this:
LOW_COUNT HIGH_COUNT
2791 2147
How can I improve the performance of my query to keep the run time down in the seconds (instead of minutes) range when adding a where statement that only looks at one column?
Try
select SUM(OrderThreshold < #LOW_COST) as LOW_COUNT,
SUM(OrderThreshold > #HIGH_COST) as HIGH_COUNT
from parts
where parttypeid = 1
and OrderThreshold not between #LOW_COST and #HIGH_COST
or, as two separate counts:
select count(*) as LOW_COUNT, null as HIGH_COUNT
from parts
where parttypeid = 1
and OrderThreshold < #LOW_COST
union all
select null, count(*)
from parts
where parttypeid = 1
and OrderThreshold > #HIGH_COST
Your accepted answer doesn't explain what is going wrong with your original query:
select SUM(OrderThreshold < #LOW_COST) as LOW_COUNT,
SUM(OrderThreshold > #HIGH_COST) as HIGH_COUNT
from parts
where parttypeid = 1;
The index is being used to find the results, but there are a lot of rows with parttypeid = 1. I am guessing that each data page probably has at least one such row. That means that all the rows are being fetched, but they are being read out-of-order. That is slower than just doing a full table scan (as in the first query). In other words, all the data pages are being read, but the index is adding additional overhead.
As Juergen points out, a better form of the query moves the conditions into the where clause:
select SUM(OrderThreshold < #LOW_COST) as LOW_COUNT,
SUM(OrderThreshold > #HIGH_COST) as HIGH_COUNT
from parts
where parttypeid = 1 AND
(OrderThreshold < #LOW_COST OR OrderThreshold > #HIGH_COST)
(I prefer this form, because the where conditions match the case conditions.) For this query, you want an index on parts(parttypeid, OrderThreshold). I'm not sure about the MySQL optimizer in this case, but it might be better to write as:
select 'Low' as which, count(*) as CNT
from parts
where parttypeid = 1 AND
OrderThreshold < #LOW_COST
union all
select 'High', count(*) as CNT
from parts
where parttypeid = 1 AND
OrderThreshold > #HIGH_COST;
Each subquery should definitely use the index in this case. (If you want them in one row with two columns, there are a couple ways to achieve that, but I'm guessing that is not so important.)
Unfortunately, the best index for your query without the where clause is parts(OrderThreshold). This is a different index from the above.
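The composite index and a single-row variant of the two counts could be sketched like this. The index name is made up, and the literals 100 and 500 are placeholder values standing in for #LOW_COST and #HIGH_COST; neither comes from the question.

```sql
-- Hypothetical index name: equality column first, range column second.
ALTER TABLE parts ADD INDEX idx_type_threshold (PartTypeId, OrderThreshold);

-- Both counts in one row; each scalar subquery can use a range scan
-- on the composite index instead of reading every row of that type.
SELECT
    (SELECT COUNT(*)
       FROM parts
      WHERE PartTypeId = 1
        AND OrderThreshold < 100) AS LOW_COUNT,   -- 100 stands in for #LOW_COST
    (SELECT COUNT(*)
       FROM parts
      WHERE PartTypeId = 1
        AND OrderThreshold > 500) AS HIGH_COUNT;  -- 500 stands in for #HIGH_COST
```

Because both columns the subqueries touch are in the index, each count can in principle be answered from the index alone, which EXPLAIN would report as "Using index".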

How do I find the join size of a mysql query?

I had a query that was resulting in the error:
ERROR 1104: The SELECT would examine more rows than MAX_JOIN_SIZE.
Check your WHERE and use SET SQL_BIG_SELECTS=1 or SET SQL_MAX_JOIN_SIZE=# if the SELECT is ok.
I have now changed the query, and I no longer get this error. max_join_size = 900,000 and sql_big_selects = 0. Unfortunately I don't have ssh access, I have to use phpmyadmin.
So my question is, is there any way of determining how many rows a particular query would examine? I would like to see how close a query is to the max_join_size limit.
EDIT: This was the original query:
SELECT * FROM `tz_verify`
LEFT JOIN `tz_sessions` ON `tz_sessions`.`timesheet_id` = `tz_verify`.`timesheet_id`
AND `tz_sessions`.`client_id` = `tz_verify`.`client_id`
LEFT JOIN `tz_clients` ON `tz_sessions`.`client_id` = `tz_clients`.`id`
LEFT JOIN `tz_tutor_comments` ON `tz_sessions`.`timesheet_id` = `tz_tutor_comments`.`timesheet_id`
AND `tz_sessions`.`client_id` = `tz_tutor_comments`.`client_id`
LEFT JOIN `tz_notes` ON `tz_sessions`.`notes` = `tz_notes`.`id`
WHERE `tz_verify`.`code` = 'b65f35601c' AND `confirmed` = 0;
I can temporarily enable SQL_BIG_SELECTS to get the EXPLAIN to run - here is the output:
id select_type table type possible_keys key ref rows extra
1 SIMPLE tz_verify ALL NULL NULL NULL 93 Using where
1 SIMPLE tz_sessions ALL NULL NULL NULL 559
1 SIMPLE tz_clients eq_ref PRIMARY PRIMARY tz_sessions.client_id 1
1 SIMPLE tz_tutor_comments ALL NULL NULL NULL 185
1 SIMPLE tz_notes eq_ref PRIMARY PRIMARY tz_sessions.notes 1
In rewriting the query, I just split it up to run two separate queries, first to find client_id (e.g. 226) and timesheet_id (e.g. 75) from tz_verify, then used these values in this query:
SELECT * FROM `tz_sessions`
LEFT JOIN `tz_clients`
ON `tz_clients`.`id` = 226
LEFT JOIN `tz_tutor_comments`
ON `tz_tutor_comments`.`timesheet_id` = 75
AND `tz_tutor_comments`.`client_id` = 226
LEFT JOIN `tz_notes`
ON `tz_sessions`.`notes` = `tz_notes`.`id`
WHERE `tz_sessions`.`client_id` = 226 AND `tz_sessions`.`timesheet_id` = 75;
Here is the EXPLAIN:
id select_type table type possible_keys key ref rows extra
1 SIMPLE tz_sessions ALL NULL NULL NULL 559 Using where
1 SIMPLE tz_clients const PRIMARY PRIMARY const 1
1 SIMPLE tz_tutor_comments ALL NULL NULL NULL 185
1 SIMPLE tz_notes eq_ref PRIMARY PRIMARY tz_sessions.notes 1
This doesn't seem as neat though as doing it in one go!
Based on the first output of EXPLAIN you posted:
I think the join size is: 93*559*185 = 9 617 595, which is well above your max_join_size of 900,000. (The two eq_ref tables contribute a factor of 1 each.)
Your initial query does not use indexes when joining with tables tz_sessions and tz_tutor_comments (both show type ALL). I suggest adding the following compound indexes (each index is made of 2 fields):
table tz_verify: (timesheet_id, client_id)
table tz_sessions: (timesheet_id, client_id)
table tz_tutor_comments: (timesheet_id, client_id)
If one of these indexes already exists, do not create it again.
Once you have added the indexes, run the EXPLAIN again (using your initial query). You should see that the query is now using indexes for each join (look in the column "key"). You may then run your initial query again; it should no longer cause the error you had.
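The steps above can be sketched as plain SQL that also runs from phpMyAdmin. The index name idx_timesheet_client is invented for illustration; everything else is taken from the question.

```sql
-- Compound indexes on the join columns (hypothetical index name).
ALTER TABLE tz_verify         ADD INDEX idx_timesheet_client (timesheet_id, client_id);
ALTER TABLE tz_sessions       ADD INDEX idx_timesheet_client (timesheet_id, client_id);
ALTER TABLE tz_tutor_comments ADD INDEX idx_timesheet_client (timesheet_id, client_id);

-- Re-check the plan of the original query.
EXPLAIN
SELECT * FROM `tz_verify`
LEFT JOIN `tz_sessions` ON `tz_sessions`.`timesheet_id` = `tz_verify`.`timesheet_id`
    AND `tz_sessions`.`client_id` = `tz_verify`.`client_id`
LEFT JOIN `tz_clients` ON `tz_sessions`.`client_id` = `tz_clients`.`id`
LEFT JOIN `tz_tutor_comments` ON `tz_sessions`.`timesheet_id` = `tz_tutor_comments`.`timesheet_id`
    AND `tz_sessions`.`client_id` = `tz_tutor_comments`.`client_id`
LEFT JOIN `tz_notes` ON `tz_sessions`.`notes` = `tz_notes`.`id`
WHERE `tz_verify`.`code` = 'b65f35601c' AND `confirmed` = 0;
```

Multiplying the values in the rows column of the new EXPLAIN output gives a rough estimate of how many row combinations the query examines, which can then be compared with max_join_size.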
Documentation:
Optimizing Queries with EXPLAIN
CREATE INDEX Syntax
How MySQL Uses Indexes
Multiple-Column Indexes

Help me to optimize this query

Please suggest indexes to optimize the query below. I am not allowed to rewrite the query, only to create indexes:
SELECT
`ADV`.`inds` as `c0`,
sum(`ADVpost`.`clk`) as `m0`
FROM
(SELECT *
FROM advts
WHERE comp_id =
(SELECT comp_id
FROM comp
WHERE name = 'abc')) as `ADV`,
(SELECT dt_id,
comp_id,
b_id,
ad_id,
clk,
resp
FROM advts_post
WHERE comp_id =
(SELECT comp_id
FROM comp
WHERE name = 'abc')) as `ADVpost`
WHERE
`ADVpost`.`ad_id` = `ADV`.`ad_id`
GROUP BY
`ADV`.`inds`
ORDER BY
ISNULL(`ADV`.`inds`), `ADV`.`inds` ASC
The explain for the query is as:
select_type  table       type  possible_keys  Extra
PRIMARY      <derived2>  ALL   null           Using temporary; Using filesort
PRIMARY      <derived4>  ALL   null           Using where; Using join buffer
DERIVED      ADVpost     ALL   null           Using where
SUBQUERY     comp        ALL   null           Using where
DERIVED      advts       ALL   null           Using where
SUBQUERY     comp        ALL   null           Using where
Existing indexes are as follows:
ADVpost > PRIMARY KEY (`dt_id`,`comp_id`,`b_id`,`ad_id`)
comp > PRIMARY KEY (`comp_id`)
advts > PRIMARY KEY (`ad_id`)
Thanks in advance.
Ok, maybe I am not an expert with MySQL optimization, but:
if it is possible and reasonable, try to avoid subselects where possible (instead it may be better to make separate query and then pass the retrieved ID, like comp_id, to the containing query),
put index on comp.name,
put index on advts_post.comp_id (single one),
put index on advts_post.ad_id (single one),
Maybe it is rather simple, but should help at least slightly make it faster. Tell us about the results.
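As index definitions, the suggestions above might look like the sketch below. The index names are made up, and the last statement (an index on advts.comp_id) is an extra assumption that is not part of the list above; it is included only because advts is filtered on comp_id in the same way as advts_post.

```sql
-- Hypothetical index names; the tables and columns come from the question.
ALTER TABLE comp       ADD INDEX idx_name    (name);
ALTER TABLE advts_post ADD INDEX idx_comp_id (comp_id);
ALTER TABLE advts_post ADD INDEX idx_ad_id   (ad_id);

-- Assumption, not in the list above: advts is also filtered by comp_id.
ALTER TABLE advts      ADD INDEX idx_comp_id (comp_id);
```

Since the query itself cannot be changed, comparing the EXPLAIN output before and after adding each index is the simplest way to see which ones the optimizer actually uses.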
That query is a dog's dinner - whoever wrote it should be severely punished, and the person who said you can't rewrite it but must make it run faster should be shot.
Lose the sub-selects!
MySQL does not do predicate pushdown into derived tables very well (at all?).
Use proper joins instead and state implied joins:
SELECT ad.inds, SUM(ap.clk)
FROM advts_post AS ap
, comp AS co
, advts AS ad
WHERE ap.comp_id = co.comp_id
AND ad.comp_id = co.comp_id
AND ap.comp_id = ad.comp_id
AND ap.ad_id = ad.ad_id
AND co.name='abc'
GROUP BY ad.inds
ORDER BY ISNULL(ad.inds), ad.inds ASC