Performance issue full index scan in mysql database - mysql

I have a database with a table called QuizMatches. The table has the following structure:
CREATE TABLE `QuizMatches` (
`QuizMatchesGuid` binary(16) NOT NULL,
`DateStarted` datetime NOT NULL,
`LatestChanged` datetime NOT NULL,
`HostFBUserToken` varchar(250) NOT NULL,
`GuestFBUserToken` varchar(250) NOT NULL,
`ArrayOfQuestionIDs` varchar(200) NOT NULL,
`ArrayOfQuestionResponseTimesAndAnswersHost` varchar(900) NOT NULL,
`ArrayOfQuestionResponseTimesAndAnswersGuest` varchar(900) NOT NULL,
`MatchFinished` int(1) NOT NULL DEFAULT '0',
`Category` varchar(45) NOT NULL,
`JsonQuestions` varchar(4000) NOT NULL DEFAULT '[]',
`DateFinished` datetime NOT NULL,
`LatestPushSentDate` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`QuizMatchesGuid`),
KEY `HostFBUserTokenIX` (`HostFBUserToken`),
KEY `GuestFBUserTokenIX` (`GuestFBUserToken`),
KEY `MatchFinishedIX` (`MatchFinished`),
KEY `LatestChangedIX` (`LatestChanged`),
KEY `LatestPushSentDateIX` (`LatestPushSentDate`),
KEY `DateFinishedIX` (`LatestChanged`,`HostFBUserToken`,`GuestFBUserToken`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
There is a large number of rows in this table and it is heavily used by multiple clients especially queries like the following are executed:
SELECT HEX(QuizMatchesGuid) AS QuizMatchesGuid, DateStarted,
LatestChanged, HostFBUserToken, GuestFBUserToken,
ArrayOfQuestionIDs, ArrayOfQuestionResponseTimesAndAnswersHost,
ArrayOfQuestionResponseTimesAndAnswersGuest, JsonQuestions
FROM CrystalDBQuiz.QuizMatches
ORDER BY LatestChanged DESC
LIMIT 10
The main problem seem to be that the database performs a full index scan. I have tried with different combinations of indexes but to no success.
If I run EXPLAIN on the above SELECT query I receive the following:
id: 1
select_type: SIMPLE
table: 'QuizMatches'
type: index
possible_keys: NULL
key: 'LatestChangedIX'
key_len: 8
ref: NULL
rows: 10
Extra:
Is there a way I can optimize SELECTS as the above example towards this database table?

If you are using the LIMIT statement for pagination I suggest you to use LatestChanged value for this due to ordering. So your query will turn to
SELECT HEX(QuizMatchesGuid) AS QuizMatchesGuid, DateStarted, LatestChanged,
HostFBUserToken, GuestFBUserToken, ArrayOfQuestionIDs,
ArrayOfQuestionResponseTimesAndAnswersHost,
ArrayOfQuestionResponseTimesAndAnswersGuest, JsonQuestions
FROM CrystalDBQuiz.QuizMatches
WHERE LatestChanged<[lastValue]
ORDER BY LatestChanged DESC
LIMIT 10

Related

does this update query have any possible rewrite options?

i have a below query and I don't know how to do explain plan for it. so what i have is temp table create query and table structure.
create temporary table if not exists tmp_staging_task_ids as
select distinct s.usr_task_id
from ue_events_staging s
where s.queue_id is null
limit 6500;
the above select query explain plan ;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: s
partitions: NULL
type: ref
possible_keys: ue_events_staging_queue_id,usr_task_id,queue_id_usr_task_id,queue_id_app_id
key: queue_id_usr_task_id
key_len: 303
ref: const
rows: 17774428
filtered: 100.00
Extra: Using where; Using index; Using temporary
Query;
update ue_events_staging s
join tmp_staging_task_ids t on t.usr_task_id = s.usr_task_id
set s.queue_id = 'queue_id';
table structure;
Create Table: CREATE TABLE `ue_events_staging` (
`id` bigint NOT NULL AUTO_INCREMENT,
`queue_id` varchar(100) DEFAULT NULL,
`usr_task_id` bigint NOT NULL,
`app_id` bigint NOT NULL,
`platform` tinyint NOT NULL,
`capture_time` bigint NOT NULL,
`input_type` varchar(50) NOT NULL,
`type` varchar(100) NOT NULL,
`event_type` varchar(10) NOT NULL,
`screen` varchar(100) NOT NULL,
`object_name` varchar(255) DEFAULT NULL,
`app_custom_tag` varchar(255) DEFAULT NULL,
`exception_class_name` varchar(250) DEFAULT NULL,
`exception_tag` varchar(250) DEFAULT NULL,
`non_responsive` tinyint(1) DEFAULT '0',
`is_first` tinyint(1) DEFAULT '0',
`is_second` tinyint(1) DEFAULT '0',
`is_last` tinyint(1) DEFAULT '0',
`is_quit` tinyint(1) DEFAULT '0',
`x_coordinate` double DEFAULT NULL,
`y_coordinate` double DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `ue_events_staging_queue_id` (`queue_id`),
KEY `usr_task_id` (`usr_task_id`),
KEY `screen` (`app_id`,`platform`,`screen`),
KEY `app_id_queue_id` (`app_id`,`queue_id`),
KEY `queue_id_usr_task_id` (`queue_id`,`usr_task_id`),
KEY `queue_id_app_id` (`queue_id`,`app_id`)
please check the possibilities it takes around 3.5K seconds and causes load.
This looks like you're doing your updates in batches of 6500 rows.
If you don't need that temporary table, you can refactor your update query to stand alone. You don't need the temporary table because you can put its WHERE queue_id IS NULL directly into the WHERE of your UPDATE.
UPDATE ue_events_staging
SET queue_id = 'queue_id'
WHERE queue_id IS NULL
LIMIT 6500;
Your temporary table creation step pulls 6500 distinct (arbitrarily chosen) usr_task_id values from your table. Some of those values may relate to more than one row in your table, so your UPDATE statement may update more than 6500 rows in your table.
The refactoring I suggest will update 6500 arbitarily chosen rows in your table. At the end of the statement it's possible that some rows with a particular usr_task_id value will be updated and others will not. If that's acceptable for your business rules it will be faster.
If your business rules require all rows with each particular usr_task_id value to be updated at once, you could try this to simplify both statements.
create temporary table if not exists tmp_staging_task_ids as
select s.usr_task_id
from ue_events_staging s
where s.queue_id is null
limit 6500;
update ue_events_staging
set queue_id = 'queue_id'
where usr_task_id IN
(select usr_task_id from tmp_staging_task_ids);
This gets rid of the DISTINCT operator in the creation of your temporary table and may save a little time. The IN clause implies DISTINCT values.
"Arbitrarily chosen"? Statements without ORDER BY and with LIMIT clauses instruct MySQL to choose rows arbitrarily. MySQL picks the rows that are fastest to retrieve (hopefully).

Why Mysql use index for a single table, but not when join

I wonder why mysql use index for a single table query, while not for a join query, even i force using index.
Just show the tables and querys:
show create table dc_assess_plan_batch;
CREATE TABLE `dc_assess_plan_batch` (
`id` varchar(255) NOT NULL,
`app_batch_id` varchar(255) NOT NULL COMMENT '数据来源的id ',
`project_id` varchar(255) NOT NULL COMMENT '课程计划id',
`name` varchar(32) DEFAULT NULL COMMENT '批次名',
`time` datetime DEFAULT NULL COMMENT '批次的时间',
`batch_id` varchar(255) DEFAULT NULL,
`status` tinyint(4) DEFAULT NULL COMMENT '1.可用关系.0 不可用',
`app_from` varchar(10) DEFAULT NULL COMMENT '数据来源',
PRIMARY KEY (`id`),
KEY `inx_project_id` (`project_id`) USING BTREE,
KEY `app_batch_id` (`app_batch_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `dc_assess_task` (
`id` varchar(255) NOT NULL,
`name` varchar(255) DEFAULT NULL COMMENT '任务名称',
`class_id` varchar(255) DEFAULT NULL COMMENT '班级id',
`course_id` varchar(255) DEFAULT NULL COMMENT '课程id',
`app_from` varchar(255) DEFAULT NULL COMMENT '成绩来源',
`term_id` varchar(255) DEFAULT NULL COMMENT '学期id',
`publish_time` datetime DEFAULT NULL COMMENT '任务发布时间',
`completion_degree` double(10,3) DEFAULT NULL COMMENT '完成度',
`course_name` varchar(255) DEFAULT NULL COMMENT '课程名称',
`school_id` varchar(255) DEFAULT NULL,
`app_batch_id` varchar(255) NOT NULL,
`score_type` varchar(10) DEFAULT NULL COMMENT '''成绩计算规则 COUNT :按数量, SCORE:按照分数, KCBCOUNT:按课程币数量'',',
`xxpj_mode` enum('BY_GROUP','BY_PERSON') DEFAULT NULL COMMENT '线下评价评分模式',
`upload_flag` tinyint(4) NOT NULL DEFAULT '0' COMMENT '标志任务是否线下上传,1是,0否',
PRIMARY KEY (`id`),
KEY `FKrx8fxs12oe1mafttciiwir6cm` (`app_from`) USING BTREE,
KEY `app_batch_id` (`app_batch_id`) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
There are two tables, and both have index on filed 'app_batch_id'
query1:
explain select * from dc_assess_plan_batch where app_batch_id = '2ebb4066038441229e937d2de4a9b5632018-12-07';
# id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
1 SIMPLE dc_assess_plan_batch ref app_batch_id app_batch_id 767 const 1 Using index condition
query2
explain select * from dc_assess_plan_batch dapb join dc_assess_task dat on dapb.app_batch_id = dat.app_batch_id where dat.course_id = '562aecc2d1c64fabbff4a18496acc757';
# id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
1 SIMPLE dapb ALL app_batch_id 277947
1 SIMPLE dat ref app_batch_id app_batch_id 767 ifass.dapb.app_batch_id 1 Using where
-query3
explain select * from dc_assess_plan_batch dapb force index (app_batch_id) join dc_assess_task dat on dapb.app_batch_id = dat.app_batch_id where dat.course_id = '562aecc2d1c64fabbff4a18496acc757';
# id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra
1 SIMPLE dapb ALL app_batch_id 277947
1 SIMPLE dat ref app_batch_id app_batch_id 767 ifass.dapb.app_batch_id 1 Using where
Why index on 'app_batch_id' not work when join, even force using index
Which runs faster? Sometimes the "Optimizer knows best". However, the EXPLAINs seem the same?
I would expect these to help significantly with performance:
dat: (course_id, app_batch_id) -- the table has nothing like this
dapb: (app_batch_id)
When seeing a JOIN, the Optimizer usually looks at the WHERE clause to decide which table to start with. But, alas, there is no index for dat.course_id. SO it needs to run the query in a "brute force" way -- Scan all of one table, then reach into the other. A strong clue of this is ALL .. 277947 in the EXPLAIN. That says that it could not benefit from any available index on dat.
If it had such an index, the Optimizer would start with dat, reach in for perhaps 1 row, not 277947, then reach into the other table using the index you already have, even without using FORCE INDEX.

Mysql Query run faster

Table structure:
CREATE TABLE IF NOT EXISTS `logs` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`user` bigint(20) unsigned NOT NULL,
`type` tinyint(1) unsigned NOT NULL,
`date` int(11) unsigned NOT NULL,
`plus` decimal(10,2) unsigned NOT NULL,
`minus` decimal(10,2) unsigned NOT NULL,
`tax` decimal(10,2) unsigned NOT NULL,
`item` bigint(20) unsigned NOT NULL,
`info` char(10) NOT NULL,
PRIMARY KEY (`id`),
KEY `item` (`item`),
KEY `user` (`user`),
KEY `type` (`type`),
KEY `date` (`date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 PACK_KEYS=0 ROW_FORMAT=FIXED;
Query:
SELECT logs.item, COUNT(logs.item) AS total FROM logs WHERE logs.type = 4 GROUP BY logs.item;
Table holds 110k records out of which 50k type 4 records.
Execution time: 0.13 seconds
I know this is fast, but can I make it faster?
I am expecting 1 million records and thus the time would grow quite a bit.
Analyze queries with EXPLAIN:
mysql> EXPLAIN SELECT logs.item, COUNT(logs.item) AS total FROM logs
WHERE logs.type = 4 GROUP BY logs.item\G
id: 1
select_type: SIMPLE
table: logs
type: ref
possible_keys: type
key: type
key_len: 1
ref: const
rows: 1
Extra: Using where; Using temporary; Using filesort
The "Using temporary; Using filesort" indicates some costly operations. Because the optimizer knows it can't rely on the rows with each value of item being stored together, it needs to scan the whole table and collect the count per distinct item in a temporary table. Then sort the resulting temp table to produce the result.
You need an index on the logs table on columns (type, item) in that order. Then the optimizer knows it can leverage the index tree to scan each value of logs.item fully before moving on to the next value. By doing this, it can skip the temporary table to collect values, and skip the implicit sorting of the result.
mysql> CREATE INDEX logs_type_item ON logs (type,item);
mysql> EXPLAIN SELECT logs.item, COUNT(logs.item) AS total FROM logs
WHERE logs.type = 4 GROUP BY logs.item\G
id: 1
select_type: SIMPLE
table: logs
type: ref
possible_keys: type,logs_type_item
key: logs_type_item
key_len: 1
ref: const
rows: 1
Extra: Using where

Mysql query optimisation, EXPLAIN and slow execution

Having some real issues with a few queries, this one inparticular. Info below.
tgmp_games, about 20k rows
CREATE TABLE IF NOT EXISTS `tgmp_games` (
`g_id` int(8) NOT NULL AUTO_INCREMENT,
`site_id` int(6) NOT NULL,
`g_name` varchar(255) NOT NULL,
`g_link` varchar(255) NOT NULL,
`g_url` varchar(255) NOT NULL,
`g_platforms` varchar(128) NOT NULL,
`g_added` datetime NOT NULL,
`g_cover` varchar(255) NOT NULL,
`g_impressions` int(8) NOT NULL,
PRIMARY KEY (`g_id`),
KEY `g_platforms` (`g_platforms`),
KEY `site_id` (`site_id`),
KEY `g_link` (`g_link`),
KEY `g_release` (`g_release`),
KEY `g_genre` (`g_genre`),
KEY `g_name` (`g_name`),
KEY `g_impressions` (`g_impressions`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
tgmp_reviews - about 200k rows
CREATE TABLE IF NOT EXISTS `tgmp_reviews` (
`r_id` int(8) NOT NULL AUTO_INCREMENT,
`site_id` int(6) NOT NULL,
`r_source` varchar(128) NOT NULL,
`r_date` date NOT NULL,
`r_score` int(3) NOT NULL,
`r_copy` text NOT NULL,
`r_link` text NOT NULL,
`r_int_link` text NOT NULL,
`r_parent` int(8) NOT NULL,
`r_platform` varchar(12) NOT NULL,
`r_impressions` int(8) NOT NULL,
PRIMARY KEY (`r_id`),
KEY `site_id` (`site_id`),
KEY `r_parent` (`r_parent`),
KEY `r_platform` (`r_platform`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ;
Here is the query, takes 3 seconds ish
SELECT * FROM tgmp_games g
RIGHT JOIN tgmp_reviews r ON g_id = r.r_parent
WHERE g.site_id = '34'
GROUP BY g_name
ORDER BY g_impressions DESC LIMIT 15
EXPLAIN
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE r ALL r_parent NULL NULL NULL 201133 Using temporary; Using filesort
1 SIMPLE g eq_ref PRIMARY,site_id PRIMARY 4 engine_comp.r.r_parent 1 Using where
I am just trying to grab the 15 most viewed games, then grab a single review (doesnt really matter which, I guess highest rated would be ideal, r_score) for each game.
Can someone help me figure out why this is so horribly inefficient?
I don't understand what is the purpose of having a GROUP BY g_name in your query, but this makes MySQL performing aggregates on the columns selected, or all columns from both table. So please try to exclude it and check if it helps.
Also, RIGHT JOIN makes database to query tgmp_reviews first, which is not what you want. I suppose LEFT JOIN is a better choice here. Please, try to change the join type.
If none of the first options helps, you need to redesign your query. As you need to obtain 15 most viewed games for the site, the query will be:
SELECT g_id
FROM tgmp_games g
WHERE site_id = 34
ORDER BY g_impressions DESC
LIMIT 15;
This is the very first part that should be executed by the database, as it provides the best selectivity. Then you can get the desired reviews for the games:
SELECT r_parent, max(r_score)
FROM tgmp_reviews r
WHERE r_parent IN (/*1st query*/)
GROUP BY r_parent;
Such construct will force database to execute the first query first (sorry for the tautology) and will give you the maximal score for each of the wanted games. I hope you will be able to use the obtained results for your purpose.
Your MyISAM table is small, you can try converting it to see if that resolves the issue. Do you have a reason for using MyISAM instead of InnoDB for that table?
You can also try running an analyze on each table to update the statistics to see if the optimizer chooses something different.

Mysql big table multiple dates in where clause query performance

I have following table with 2 million rows in it.
CREATE TABLE `gen_fmt_lookup` (
`episode_id` varchar(30) DEFAULT NULL,
`media_type` enum('Audio','Video') NOT NULL DEFAULT 'Video',
`service_id` varchar(50) DEFAULT NULL,
`genre_id` varchar(30) DEFAULT NULL,
`format_id` varchar(30) DEFAULT NULL,
`masterbrand_id` varchar(30) DEFAULT NULL,
`signed` int(11) DEFAULT NULL,
`actual_start` datetime DEFAULT NULL,
`scheduled_start` datetime DEFAULT NULL,
`scheduled_end` datetime DEFAULT NULL,
`discoverable_end` datetime DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
KEY `idx_discoverability_gn` (`media_type`,`service_id`,`genre_id`,`actual_start`,`scheduled_end`,`scheduled_start`,`episode_id`),
KEY `idx_discoverability_fmt` (`media_type`,`service_id`,`format_id`,`actual_start`,`scheduled_end`,`scheduled_start`,`episode_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
Below is query with explain which I am running against this table
mysql> EXPLAIN select episode_id,scheduled_start
from gen_fmt_lookup
where media_type='video'
and service_id in ('mobile_streaming_100','mobile_streaming_200','iplayer_streaming_h264_flv_vlo','mobile_streaming_500','iplayer_stb_uk_stream_aac_concrete','captions','iplayer_uk_stream_aac_rtmp_concrete','iplayer_streaming_n95_3g','iplayer_uk_download_oma_wifi','iplayer_uk_stream_aac_3gp_concrete')
and genre_id in ('100001','100002','100003','100004','100005','100006','100007','100008','100009','100010')
and NOW() BETWEEN actual_start and scheduled_end
group by episode_id order by min(scheduled_start) limit 1 offset 100\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: nitro_episodes_gen_fmt_lookup
type: range
possible_keys: idx_discoverability_gn,idx_discoverability_fmt
key: idx_discoverability_gn
key_len: 96
ref: NULL
rows: 31719
Extra: Using where; Using index; Using temporary; Using filesort
1 row in set (0.16 sec)
So my questions are
Is the index used in query execution best? And if not can someone please suggest better index?
Can mysql use composite index with 2 dates in where clause? As in the query above where clause has and condition "and NOW() BETWEEN actual_start and scheduled_end " but mysql is using index 'idx_discoverability_gn' with key length of 96 only. Which means it is using index upto (media_type,service_id,genre_id,actual_start) only.why can't it use index upto (media_type,service_id,genre_id,actual_start,scheduled_end) ?
What else I can do to improve performance?
You have a range check, so a clustered index on (scheduled_start, actual_start, scheduled_end) might help. Your current indexes are not very useful.You can get rid of them and build one primary key (episode_id) and another index (service_id, genre-id, episode_id)