MySQL query is not using any key

CREATE TABLE `TEST` (
`ID1` mediumint(8) NOT NULL default '0',
`ID2` mediumint(8) NOT NULL default '0',
`DATE` datetime NOT NULL default '0000-00-00 00:00:00',
UNIQUE KEY `COMBO_INDEX` (`ID1`,`ID2`),
KEY `ID2` (`ID2`)
) ENGINE=InnoDB;
This table has approximately 16,196,496 records.
EXPLAIN SELECT * FROM TEST WHERE ID1 IN ('8518582', '5398912', '6120243', '6841316', '7580078', '7671953', '7775737', '7792470', '7887985', '7888375', '7946516', '8008760', '8111722', '8211235', '8262746', '8365675', '8396853', '8399818', '8410062', '8459079', '8490683')
I am getting output as
+----+-------------+------------------------+------+---------------+------+---------+------+----------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+------------------------+------+---------------+------+---------+------+----------+-------------+
| 1 | SIMPLE | TEST | ALL | ID1 | NULL | NULL | NULL | 16196496 | Using where |
+----+-------------+------------------------+------+---------------+------+---------+------+----------+-------------+
I don't understand why the query is not using any key.
Also, when I run this query against this InnoDB table, it takes a huge amount of time: 329 seconds (MySQL version 5.0.45-log).
If I run the same query on a MyISAM table, it takes just 2 seconds (though EXPLAIN shows the same result). I am using MySQL version 5.5.
Why is the query not using any key?

InnoDB needs a primary key to seek quickly to the row found in an index. As long as you don't have one, MySQL cannot do that, so it prefers a full scan.
http://dev.mysql.com/doc/refman/5.6/en/innodb-table-and-index.html
Accessing a row through the clustered index is fast because the index search leads directly to the page with all the row data. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record. (For example, MyISAM uses one file for data rows and another for index records.)
So the obvious solution is to replace the unique key with a primary key (though personally I don't like natural primary keys, especially composite ones).
PS: it seems my guess in the comments about using numbers instead of strings helped. The advice about adding a primary key still stands - do that to get even better performance.
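A sketch of both fixes, using the table and index names from the question - promoting the existing unique key to the primary key, and passing the ID1 values as numbers rather than quoted strings so no type conversion gets in the way of the index:

```sql
-- Promote the existing unique key to the clustered primary key
ALTER TABLE TEST
  DROP INDEX COMBO_INDEX,
  ADD PRIMARY KEY (ID1, ID2);

-- Pass the IDs as numbers, not strings (shortened list for illustration)
SELECT * FROM TEST
WHERE ID1 IN (8518582, 5398912, 6120243, 6841316, 7580078);
```

Rebuilding the clustered index on a 16M-row InnoDB table takes time, so try this on a copy of the table first.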

I am not sure, but another possible reason is that there is insufficient memory available for the index to be loaded, forcing a full scan.

Related

How can I optimise this mysql query that includes a where clause with an epoch time range?

I am trying to optimize the following mysql query:
SELECT events.id, events.tracking_id, events.event_time, events.event_type_id
FROM events
WHERE events.event_time >= 1564617600000000 AND events.event_time <= 1567295999000000
Here are the events table details:
CREATE TABLE `events` (
`id` char(36) NOT NULL,
`tracking_id` char(72) NOT NULL,
`event_time` bigint(16) NOT NULL,
`server_id` char(36) NOT NULL,
`project_id` char(36) NOT NULL,
`data_type_id` char(36) NOT NULL,
`event_type_id` char(36) NOT NULL,
PRIMARY KEY (`tracking_id`,`event_time`),
KEY `id_idx` (`id`),
KEY `server_id_idx` (`server_id`),
KEY `event_type_id_idx` (`event_type_id`),
KEY `event_time_idx` (`event_time`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
And the Explain output:
+----+-------------+--------+------------+------+----------------+------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+--------+------------+------+----------------+------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | events | NULL | ALL | event_time_idx | NULL | NULL | NULL | 2877592 | 37.48 | Using where |
+----+-------------+--------+------------+------+----------------+------+---------+------+---------+----------+-------------+
The query takes about 30 seconds to run, and adding an index on event_time doesn't seem to have made any difference to the execution time - it doesn't look like the index is being used?
event_time was originally a char(36), but then I was getting the following warning: 'Cannot use range access on index 'event_time_idx' due to type or collation conversion on field 'event_time''. The warning has disappeared since I converted event_time to a bigint, but it's still not using the index.
What can I do to improve the performance of this query (which is actually a subquery in a much larger query)?
Do all the rows in your table, or at least a majority of them, match the condition? In other words, the timestamps you give are from 2019-08-01 00:00:00 to 2019-08-31 23:59:59, so one full month. Are most of the rows currently in your table from this month?
MySQL does cost-based optimization. It estimates the cost of reading an index entry, then using that to look up a row - two lookups per index entry, plus some overhead.
MySQL is correct to estimate that a table scan might be better than using an index in certain cases. The threshold is not documented, but in my experience, if it estimates that the matching rows are over about 20% of the table, it tends to do a table scan. YMMV.
You can use an index hint to tell MySQL that it should treat a table-scan as infinitely costly, so if the index can be used at all, it should prefer it.
SELECT events.id, events.tracking_id, events.event_time, events.event_type_id
FROM events FORCE INDEX (event_time_idx)
WHERE events.event_time >= 1564617600000000 AND events.event_time <= 1567295999000000
But keep in mind MySQL's cost-based optimizer might have been right. It might in fact be less costly to do the table-scan, depending on your data.

How to optimize a query using a range with Mysql

I am trying to optimize this query:
select id_store from receipt where receiptDate between '20151109' and '20151116'
I execute this query with the command EXPLAIN. It appears that no key is used - the index on receiptDate is not used. What's wrong?
Here's the structure of the table receipt :
CREATE TABLE receipt (
id_store tinyint(3) unsigned NOT NULL default '0',
id_receipt int(7) unsigned NOT NULL default '0',
id_product smallint(6) unsigned NOT NULL default '0',
receiptDate char(8) NOT NULL default '',
qty float NOT NULL default '0',
turnover float NOT NULL default '0',
PRIMARY KEY (id_store,id_receipt,id_product,receiptDate),
KEY NDX_1 (receiptDate)
) ENGINE=MEMORY;
Here's the result of the command EXPLAIN :
+----+-------------+---------------------+--------+-----------------------------------------------------------+---------------------------------+---------+--------------------------------------+------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------+--------+-----------------------------------------------------------+---------------------------------+---------+--------------------------------------+------+---------------------------------+
| 1 | SIMPLE | receipt | ALL |NDX_1 | | | |24789225| Using where |
+----+-------------+---------------------+--------+-----------------------------------------------------------+---------------------------------+---------+--------------------------------------+------+---------------------------------+
The table receipt contains 24,789,225 rows, with an average of 15,000 rows per day (receiptDate).
I execute the following query and obtain 120,295 rows:
select count(*) from receipt where receiptDate between '20151109' and '20151116'
Thanks in advance for your help.
Since you indexed receiptDate, the database engine will use this index if the optimizer thinks it will improve performance. The optimizer bases its decisions on statistics about your table; it creates these statistics in the background, in a mostly transparent process.
Now, you are using the MEMORY engine. Because memory tables are supposed to be short-lived, these special engines have very limited optimizer capabilities. You might want to force the query to use your index with the FORCE INDEX keyword.
Your date is stored as a CHAR(8), which is slow because the engine has to parse all eight characters. You would get better performance with an INT (convert the date to YYYYMMDD). Your queries should still work, since the engine converts string inputs to integers automatically.
If you were going to use the InnoDB engine, then you should put the date as a primary key if possible, because the primary key is also the Clustered Index with this engine, meaning the data will be physically sorted by date on the storage.
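A sketch of the FORCE INDEX suggestion, using the index name from the question (whether it actually helps depends on the MEMORY engine's limited optimizer):

```sql
SELECT id_store
FROM receipt FORCE INDEX (NDX_1)
WHERE receiptDate BETWEEN '20151109' AND '20151116';
```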
Instead of an 8-byte (or 24-byte if utf8) CHAR(8) for receiptDate, use a 3-byte DATE datatype.
While you are at it, you could save another byte by making id_receipt MEDIUMINT UNSIGNED if it is only 7 digits.
Specify BTREE:
KEY `NDX_1` (`receiptDate`) USING BTREE
since MEMORY may be making it a HASH index. BTREE can handle ranges; HASH has to do a table scan.
I agree that InnoDB is likely to be better overall.
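A sketch combining the suggestions above - DATE instead of CHAR(8), MEDIUMINT for id_receipt, and an explicit BTREE index (table and column names from the question; '20151109'-style literals in existing queries would then need to become proper dates, and a 24M-row rebuild should be tried on a copy first):

```sql
ALTER TABLE receipt
  MODIFY receiptDate DATE NOT NULL,               -- 3 bytes instead of 8
  MODIFY id_receipt MEDIUMINT UNSIGNED NOT NULL;  -- 3 bytes instead of 4

-- MEMORY tables may default to HASH indexes; BTREE handles ranges
ALTER TABLE receipt
  DROP INDEX NDX_1,
  ADD INDEX NDX_1 (receiptDate) USING BTREE;
```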

MySQL hanging on large SELECT

I'm trying to create a new table by joining four existing ones. My database is static, so making one large preprocessed table will simplify programming, and save lots of time in future queries. My query works fine when limited with a WHERE, but seems to either hang, or go too slowly to notice any progress.
Here's the working query. The result only takes a few seconds.
SELECT group.group_id, MIN(application.date), person.person_name, pers_appln.sequence
FROM group
JOIN application ON group.appln_id=application.appln_id
JOIN pers_appln ON pers_appln.appln_id=application.appln_id
JOIN person ON person.person_id=pers_appln.person_id
WHERE group_id="24601"
GROUP BY group.group_id, pers_appln.sequence
;
If I simply remove the WHERE line, it will run for days with nothing to show. Adding a CREATE TABLE newtable AS at the beginning does the same thing. It never moves beyond 0% progress.
The group, application, and person tables all use the MyISAM engine, while pers_appln uses InnoDB. The columns are all indexed. The table sizes range from about 40 million to 150 million rows. I know it's rather large, but I wouldn't think it would pose this much of a problem. The computer currently has 4GB of ram.
Any ideas how to make this work?
Here's the SHOW CREATE TABLE info. There are no views or virtual tables:
CREATE TABLE `group` (
`APPLN_ID` int(10) unsigned NOT NULL,
`GROUP_ID` int(10) unsigned NOT NULL,
KEY `idx_appln` (`APPLN_ID`),
KEY `idx_group` (`GROUP_ID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
CREATE TABLE `application` (
`APPLN_ID` int(10) unsigned NOT NULL,
`APPLN_AUTH` char(2) NOT NULL DEFAULT '',
`APPLN_NR` varchar(20) NOT NULL DEFAULT '',
`APPLN_KIND` char(2) DEFAULT '',
`DATE` date DEFAULT NULL,
`IPR_TYPE` char(2) DEFAULT '',
PRIMARY KEY (`APPLN_ID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
CREATE TABLE `person` (
`PERSON_ID` int(10) unsigned NOT NULL,
`PERSON_CTRY_CODE` char(2) NOT NULL,
`PERSON_NAME` varchar(300) DEFAULT NULL,
`PERSON_ADDRESS` varchar(500) DEFAULT NULL,
KEY `idx_person` (`PERSON_ID`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 MAX_ROWS=30000000 AVG_ROW_LENGTH=100
CREATE TABLE `pers_appln` (
`PERSON_ID` int(10) unsigned NOT NULL,
`APPLN_ID` int(10) unsigned NOT NULL,
`SEQUENCE` smallint(4) unsigned DEFAULT NULL,
`PLACE` smallint(4) unsigned DEFAULT NULL,
KEY `idx_pers_appln` (`APPLN_ID`),
KEY `idx_person` (`PERSON_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY HASH (appln_id)
PARTITIONS 20 */
Here's the EXPLAIN of my query:
+----+-------------+-------------+--------+----------------------------+-----------------+---------+--------------------------+----------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+--------+----------------------------+-----------------+---------+--------------------------+----------+---------------------------------+
| 1 | SIMPLE | person | ALL | idx_person | NULL | NULL | NULL | 47827690 | Using temporary; Using filesort |
| 1 | SIMPLE | pers_appln | ref | idx_application,idx_person | idx_person | 4 | mydb.person.PERSON_ID | 1 | |
| 1 | SIMPLE | application | eq_ref | PRIMARY | PRIMARY | 4 | mydb.pers_appln.APPLN_ID | 1 | |
| 1 | SIMPLE | group | ref | idx_application | idx_application | 4 | mydb.pers_appln.APPLN_ID | 1 | |
+----+-------------+-------------+--------+----------------------------+-----------------+---------+--------------------------+----------+---------------------------------+
Verify that key_buffer_size is about 200M and innodb_buffer_pool_size is about 1200M. Perhaps they could be bigger, but make sure you are not swapping.
group should have PRIMARY KEY(appln_id, group_id) and INDEX(group_id, appln_id) instead of the two KEYs it has.
pers_appln should have INDEX(person_id, appln_id) and INDEX(appln_id, person_id) instead of the two keys it has. If possible, one of those should be PRIMARY KEY, but watch out for the PARTITIONing.
A minor improvement would be to change those CHAR(2) fields to be CHARACTER SET ascii -- assuming you don't really need utf8. That would shrink the field from 6 bytes to 2 bytes per row.
The PARTITIONing is probably not helping at all. (No, I can't say that removing the PARTITIONing will speed it up much.)
If these suggestions do not help enough, please provide the output from EXPLAIN SELECT ...
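A sketch of the index changes suggested above, using the table and column names from the question. The primary key on `group` assumes (APPLN_ID, GROUP_ID) pairs are unique, which the original UNIQUE-less schema does not guarantee - verify that before running it, and try everything on a copy first given the table sizes:

```sql
-- group: cluster on (appln_id, group_id), keep a reverse-order index
ALTER TABLE `group`
  DROP INDEX idx_appln,
  DROP INDEX idx_group,
  ADD PRIMARY KEY (APPLN_ID, GROUP_ID),
  ADD INDEX (GROUP_ID, APPLN_ID);

-- pers_appln: composite indexes covering both join directions
ALTER TABLE pers_appln
  DROP INDEX idx_pers_appln,
  DROP INDEX idx_person,
  ADD INDEX (PERSON_ID, APPLN_ID),
  ADD INDEX (APPLN_ID, PERSON_ID);
```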
Edit
Converting to InnoDB and specifying PRIMARY KEYs for all tables will help. This is because InnoDB "clusters" the PRIMARY KEY with the data. What you have now is a lot of bouncing between a MyISAM index and its data -- literally hundreds of millions of times. Assuming not everything can be cached in your small 4GB, that means a lot of disk I/O. I would not be surprised if the non-WHERE version would take a week to run. Even with InnoDB, there will be I/O, but some of it will be avoided because:
1. reaching into a table with the PK gets the data without another disk hit.
2. the extra indexes I proposed will avoid hitting the data, again avoiding an extra disk hit.
(Millions of references * "an extra disk hit" = days of time.)
If you switch all of your tables to InnoDB, you should lower key_buffer_size to 20M and raise innodb_buffer_pool_size to 1500M. (These are approximate; do not raise them so high that there is any swapping.)
Please show us the CREATE TABLEs with InnoDB -- I want to make sure each table has a PRIMARY KEY and which column(s) that is. The PRIMARY KEY makes a big difference in this particular situation.
For person, the MyISAM version has just a KEY(person_id). If you did not change the keys in the conversion, InnoDB will invent a PRIMARY KEY. When the JOIN to that table occurs, InnoDB will (1) drill down the secondary-key BTree to find the invented PK value, then (2) drill down the PK+data BTree to find the row. If, instead, person_id could be the PK, that JOIN would run twice as fast - possibly even faster, depending on how big the table is and how much it needs to jump around in the index/data. That is, the two BTree lookups add pressure on the cache (buffer_pool).
How big is each table? What was the final value for innodb_buffer_pool_size? Once you have changed everything from MyISAM to InnoDB, set key_buffer_size to 40M or less, and set innodb_buffer_pool_size to about 70% of available RAM. If the Data + Index sizes for all the tables are less than the buffer_pool, then (once cache is primed) the query won't have to do any I/O. This is easily a 10x speedup.
Is pers_appln a many-to-many relationship? Then, probably:
PRIMARY KEY(appln_id, person_id),
INDEX(person_id, appln_id) -- if you need to go the other direction, too.
I found the solution: switching to an SSD. My table creation time went from an estimated 45 days to 16 hours. Previously, the database spent all its time with hard drive I/O, barely even using 5% of the CPU or RAM.
Thanks everyone.

MySQL I/O bound InnoDB query optimization problem without setting innodb_buffer_pool_size to 5GB

I got myself into a MySQL design scalability issue. Any help would be greatly appreciated.
The requirements:
Storing users' SOCIAL_GRAPH and USER_INFO about each user in their social graph. Many concurrent reads and writes per second occur. Dirty reads acceptable.
Current design:
We have 2 (relevant) tables. Both InnoDB for row locking, instead of table locking.
USER_SOCIAL_GRAPH table that maps a logged in (user_id) to another (related_user_id). PRIMARY key composite user_id and related_user_id.
USER_INFO table with information about each related user. PRIMARY key is (related_user_id).
Note 1: No relationships defined.
Note 2: Each table is now about 1GB in size, with 8 million and 2 million records, respectively.
Simplified table SQL creates:
CREATE TABLE `user_social_graph` (
`user_id` int(10) unsigned NOT NULL,
`related_user_id` int(11) NOT NULL,
PRIMARY KEY (`user_id`,`related_user_id`),
KEY `user_idx` (`user_id`)
) ENGINE=InnoDB;
CREATE TABLE `user_info` (
`related_user_id` int(10) unsigned NOT NULL,
`screen_name` varchar(20) CHARACTER SET latin1 DEFAULT NULL,
[... and many other non-indexed fields irrelevant]
`last_updated` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`related_user_id`),
KEY `last_updated_idx` (`last_updated`)
) ENGINE=InnoDB;
MY.CFG values set:
innodb_buffer_pool_size = 256M
key_buffer_size = 320M
Note 3: Memory available 1GB, these 2 tables are 2GBs, other innoDB tables 3GB.
Problem:
The following example SQL statement, which needs to access all records found, takes 15 seconds to execute (!!) and num_results = 220,000:
SELECT SQL_NO_CACHE COUNT(u.related_user_id)
FROM user_info u LEFT JOIN user_socialgraph u2 ON u.related_user_id = u2.related_user_id
WHERE u2.user_id = '1'
AND u.related_user_id = u2.related_user_id
AND (NOT (u.related_user_id IS NULL));
For a user_id with a count of 30,000, it takes about 3 seconds (!).
EXPLAIN EXTENDED for the 220,000 count user. It uses indices:
+----+-------------+-------+--------+------------------------+----------+---------+--------------------+--------+----------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+--------+------------------------+----------+---------+--------------------+--------+----------+--------------------------+
| 1 | SIMPLE | u2 | ref | user_user_idx,user_idx | user_idx | 4 | const | 157320 | 100.00 | Using where |
| 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | u2.related_user_id | 1 | 100.00 | Using where; Using index |
+----+-------------+-------+--------+------------------------+----------+---------+--------------------+--------+----------+--------------------------+
How do we speed these up without setting innodb_buffer_pool_size to 5GB?
Thank you!
The user_social_graph table is not indexed correctly !!!
You have this:
CREATE TABLE user_social_graph
(user_id int(10) unsigned NOT NULL,
related_user_id int(11) NOT NULL,
PRIMARY KEY (user_id,related_user_id),
KEY user_idx (user_id))
ENGINE=InnoDB;
The second index is redundant, since user_id is already the first column of the primary key. You are joining the related_user_id column over to the user_info table; that column needs to be indexed.
Change user_social_graphs as follows:
CREATE TABLE user_social_graph
(user_id int(10) unsigned NOT NULL,
related_user_id int(11) NOT NULL,
PRIMARY KEY (user_id,related_user_id),
UNIQUE KEY related_user_idx (related_user_id,user_id))
ENGINE=InnoDB;
This should change the EXPLAIN plan. Keep in mind that index column order matters, depending on the way you query the columns.
Give it a Try !!!
What is the MySQL version? Its manual contains important information for speeding up statements and code in general.
Consider changing your paradigm to a data warehouse capable of managing terabyte-scale tables, and migrating your legacy MySQL database to the new system with a free tool or application. This is one example: http://www.infobright.org/Downloads/What-is-ICE/ - there are many others (free and commercial).
PostgreSQL is not commercial either, and there are a lot of tools to migrate MySQL to it!

How to optimize a query that's using group by on a large number of rows

The table looks like this:
CREATE TABLE `tweet_tweet` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`text` varchar(256) NOT NULL,
`created_at` datetime NOT NULL,
`created_date` date NOT NULL,
...
`positive_sentiment` decimal(5,2) DEFAULT NULL,
`negative_sentiment` decimal(5,2) DEFAULT NULL,
`entity_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `tweet_tweet_entity_created` (`entity_id`,`created_at`)
) ENGINE=MyISAM AUTO_INCREMENT=1097134 DEFAULT CHARSET=utf8
The explain on the query looks like this:
mysql> explain SELECT `tweet_tweet`.`entity_id`,
STDDEV_POP(`tweet_tweet`.`positive_sentiment`) AS `sentiment_stddev`,
AVG(`tweet_tweet`.`positive_sentiment`) AS `sentiment_avg`,
COUNT(`tweet_tweet`.`id`) AS `tweet_count`
FROM `tweet_tweet`
WHERE `tweet_tweet`.`created_at` > '2010-10-06 16:24:43'
GROUP BY `tweet_tweet`.`entity_id` ORDER BY `tweet_tweet`.`entity_id` ASC;
+----+-------------+-------------+------+---------------+------+---------+------+---------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+---------------+------+---------+------+---------+----------------------------------------------+
| 1 | SIMPLE | tweet_tweet | ALL | NULL | NULL | NULL | NULL | 1097452 | Using where; Using temporary; Using filesort |
+----+-------------+-------------+------+---------------+------+---------+------+---------+----------------------------------------------+
1 row in set (0.00 sec)
About 300k rows are added to the table every day. The query runs about 4 seconds right now but I want to get it down to around 1 second and I'm afraid the query will take exponentially longer as the days go on. Total number of rows in tweet_tweet is currently only a little over 1M, but it will be growing fast.
Any thoughts on optimizing this? Do I need any more indexes? Should I be using something like Cassandra instead of MySQL? =)
You may try reordering the fields in the index, i.e. KEY tweet_tweet_entity_created (created_at, entity_id). That will allow MySQL to use the index to reduce the number of rows that actually need to be grouped and ordered.
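A sketch of that reordering, using the index and column names from the question (dropping and re-adding an index on a million-row table takes time, so try it on a copy first):

```sql
ALTER TABLE tweet_tweet
  DROP INDEX tweet_tweet_entity_created,
  ADD INDEX tweet_tweet_entity_created (created_at, entity_id);
```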
You're not using the index tweet_tweet_entity_created. Change your query to:
explain SELECT `tweet_tweet`.`entity_id`,
STDDEV_POP(`tweet_tweet`.`positive_sentiment`) AS `sentiment_stddev`,
AVG(`tweet_tweet`.`positive_sentiment`) AS `sentiment_avg`,
COUNT(`tweet_tweet`.`id`) AS `tweet_count`
FROM `tweet_tweet` FORCE INDEX (tweet_tweet_entity_created)
WHERE `tweet_tweet`.`created_at` > '2010-10-06 16:24:43'
GROUP BY `tweet_tweet`.`entity_id` ORDER BY `tweet_tweet`.`entity_id` ASC;
You can read more about index hints in the MySQL manual http://dev.mysql.com/doc/refman/5.1/en/index-hints.html
Sometimes MySQL's query optimizer needs a little help.
MySQL has a dirty little secret: an index over multiple columns can only be used through its leftmost columns, so a query that doesn't filter on the first column (like this one, which ranges over created_at, the second column of the composite index) can't use it. I've made tables that used unique keys and foreign keys, and I often had to add a separate index for one or more of the columns.
I suggest adding an extra index on just created_at at a minimum. I do not know whether adding indexes to the aggregated columns would also speed things up.
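A sketch of that suggestion, with a hypothetical index name:

```sql
ALTER TABLE tweet_tweet ADD INDEX created_at_idx (created_at);
```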
If your MySQL version is 5.1 or higher, you can consider the partitioning option for large tables.
http://dev.mysql.com/doc/refman/5.1/en/partitioning.html