MySQL: Using the EXPLAIN keyword to learn about indexing (specific use case)

This is my table structure:
CREATE TABLE `channel_play_times_bar_chart` (
`playing_date` datetime NOT NULL,
`channel_report_tag` varchar(50) NOT NULL,
`country_code` varchar(50) NOT NULL,
`device_report_tag` int(11) NOT NULL,
`greater_than_30_minutes` decimal(10,0) NOT NULL,
`15_to_30_minutes` decimal(10,0) NOT NULL,
`0-15_minutes` decimal(10,0) NOT NULL,
PRIMARY KEY (`country_code`,`device_report_tag`,`channel_report_tag`,`playing_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
When I run the following query:
EXPLAIN EXTENDED
SELECT
channel_report_tag,
SUM(`greater_than_30_minutes`) AS '>30 minutes',
SUM(`15_to_30_minutes`) AS '15-30 Minutes',
SUM(`0-15_minutes`) AS '0-15 Minutes'
FROM
channel_play_times_bar_chart USE INDEX (ABCDE)
WHERE country_code = 'US'
AND device_report_tag = 14
AND channel_report_tag = 'DUNYA NEWS'
AND playing_date BETWEEN '2016-09-01'
AND '2016-09-13'
GROUP BY channel_report_tag
ORDER BY SUM(`greater_than_30_minutes`) DESC
LIMIT 10
This is the output I get:
The index was defined as:
CREATE INDEX ABCDE
ON channel_play_times_bar_chart (
`country_code`,
`device_report_tag`,
`channel_report_tag`,
`playing_date`,
`greater_than_30_minutes`
)
I am a bit confused here; the key column shows ABCDE being used as the index, yet the ref column shows NULL. What does this mean? Is the index actually being used? If not, what did I do wrong?

It is using the key you are showing in the create index, that is, ABCDE.
It would be nice if you did a
show create table channel_play_times_bar_chart
and just showed it all at once. That key might not be of much use to you as it replicates most of what your rather wide Primary Key already gives you.
Once the query uses the key up through the 3rd segment of the composite key, it resumes with a WHERE range on playing_date in that composite and finds 8 rows.
Note EXPLAIN is an estimate.
Further, I would reconsider your PRIMARY KEY (PK) strategy, especially considering that you more or less duplicated it when creating ABCDE. That means you are maintaining two indexes with little if anything gained by the second one (you added just one column to the secondary index ABCDE).
The PK is rather WIDE (118 bytes, I believe). It dictates the physical ordering, and that idea can easily be a bad one if used throughout the way you architect things. Changes made via UPDATE that impact the columns in the PK force a reshuffle of the physical ordering of the table. That is a good indication of why id INT AUTO_INCREMENT PRIMARY KEY is often used as a best practice: it never endures a reshuffle and is THIN (4 bytes).
The width of keys and their strategy with referencing (other) tables (in Foreign Key Constraints) impact key sizes and performance for lookups. Wide keys can measurably slow down that process.
This is not to suggest that you shouldn't have a key on those columns like your secondary index ABCDE. But in general that is not a good idea for the PK.
Note that it could be argued that ABCDE never gives you any benefit over your PK, since the range condition on playing_date stops the index walk at that point, leaving the trailing column unused for seeking. Just a thought.
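As a rough sketch of that suggestion (a hypothetical rework, not tested against your workload; idx_lookup is a made-up name), the table could carry a thin surrogate PK and fold ABCDE's job into a single secondary index:

CREATE TABLE channel_play_times_bar_chart (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,   -- THIN (4 bytes), never reshuffled
  playing_date DATETIME NOT NULL,
  channel_report_tag VARCHAR(50) NOT NULL,
  country_code VARCHAR(50) NOT NULL,
  device_report_tag INT NOT NULL,
  greater_than_30_minutes DECIMAL(10,0) NOT NULL,
  `15_to_30_minutes` DECIMAL(10,0) NOT NULL,
  `0-15_minutes` DECIMAL(10,0) NOT NULL,
  PRIMARY KEY (id),
  -- one secondary index serving the WHERE clause, replacing both the
  -- wide PK and ABCDE (one index to maintain instead of two):
  KEY idx_lookup (country_code, device_report_tag, channel_report_tag, playing_date)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;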
A nice read and rather brief is the article Using EXPLAIN to Write Better MySQL Queries.

Your query does use the ABCDE index. As the MySQL documentation on EXPLAIN Output Format explains (bolding is mine):
key (JSON name: key)
The key column indicates the key (index) that MySQL actually decided
to use. If MySQL decides to use one of the possible_keys indexes to
look up rows, that index is listed as the key value.
The ref column of the EXPLAIN output is primarily used when joins are present; it shows the columns / constants / expressions the index was compared with. Your query uses a range on playing_date rather than a constant lookup over the full key, which is why ref is NULL even though the index is used.
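To illustrate (hypothetical queries against the same table; exact EXPLAIN output depends on your data and MySQL version): an equality lookup on all four key parts is typically reported as type=ref with constants in the ref column, while the BETWEEN on playing_date makes your query a range access, for which ref is NULL:

-- All key parts compared to constants: EXPLAIN usually shows type=ref,
-- ref=const,const,const,const
EXPLAIN SELECT channel_report_tag
FROM channel_play_times_bar_chart
WHERE country_code = 'US' AND device_report_tag = 14
  AND channel_report_tag = 'DUNYA NEWS' AND playing_date = '2016-09-01';

-- Range on the last key part: type=range, ref=NULL (your case)
EXPLAIN SELECT channel_report_tag
FROM channel_play_times_bar_chart
WHERE country_code = 'US' AND device_report_tag = 14
  AND channel_report_tag = 'DUNYA NEWS'
  AND playing_date BETWEEN '2016-09-01' AND '2016-09-13';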

Is this second multicolumn unique key in my MySQL statement redundant or does it improve performance?

I found this old code and I'm not sure if it's optimized or just doing something silly.
I have a SQL create statement like this:
CREATE TABLE `wp_pmpro_memberships_categories` (
`membership_id` int(11) unsigned NOT NULL,
`category_id` int(11) unsigned NOT NULL,
`modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY `membership_category` (`membership_id`,`category_id`),
UNIQUE KEY `category_membership` (`category_id`,`membership_id`)
);
Is that second UNIQUE KEY redundant with the PRIMARY KEY on the same 2 columns? Or would the second one help for queries that filter by category_id first and then by membership_id? Is it being ignored?
I'm trying to remember why I coded it that way, way back when. Seems similar to what this comment is describing: https://dba.stackexchange.com/a/1793/245678
Thanks!
It depends on your query patterns. If you do SELECT, UPDATE, or DELETE only on the category_id column, then the 2nd index makes sense, but you should omit the membership_id column (redundant) and the UNIQUE constraint.
MySQL will automatically use the PRIMARY KEY index if you use either membership_id or both columns. It doesn't matter in which order these columns appear in your WHERE clauses.
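For example (illustrative queries, assuming the table above):

-- Can seek the PRIMARY KEY (leading column membership_id is constrained):
SELECT * FROM wp_pmpro_memberships_categories WHERE membership_id = 5;
SELECT * FROM wp_pmpro_memberships_categories
WHERE category_id = 9 AND membership_id = 5;   -- order in WHERE is irrelevant

-- Cannot seek the PRIMARY KEY (no membership_id); this is where the
-- (category_id, membership_id) index earns its keep:
SELECT * FROM wp_pmpro_memberships_categories WHERE category_id = 9;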
The secondary index does improve performance when going from a "category" to a "membership".
You coded it with those two indexes because some queries start with a "membership" and need to locate a "category"; some queries go the 'other' direction.
That's a well-coded "many-to-many mapping table".
InnoDB provides better performance than MyISAM.
The "Uniqueness" constraint in the UNIQUE key is redundant.
Checking for uniqueness slows down writes by a very small amount. (The constraint must be checked before finishing the update to the index's BTree. A non-unique index can put off the update until later; see "change buffering".)
I like to say this to indicate that I have some reason for the pair of columns being together in the index:
INDEX(`category_id`,`membership_id`)
I discuss the schema pattern here: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
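If you do decide to drop the redundant uniqueness check while keeping the category-first lookup path, a minimal sketch would be:

-- The PK already guarantees the pair is unique, so a plain INDEX suffices
-- for the category -> membership direction:
ALTER TABLE wp_pmpro_memberships_categories
  DROP INDEX category_membership,
  ADD INDEX category_membership (category_id, membership_id);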

Optimising a query that uses index merge by intersection

I have a MySQL 8 database table accounts that has the following columns:
id (primary)
city_id (foreign key)
province_id (foreign key)
country_id (foreign key)
school_id (foreign key)
age (indexed)
EDIT: See bottom for complete table structure.
Now, imagine the following SQL query:
SELECT
COUNT(`id`) AS AGGREGATE
FROM
`accounts`
WHERE
`city_id` = 1
AND
`country_id` = 7
AND
`age` = 3
At 1 million records, this query becomes slow (~200ms).
When running EXPLAIN, I receive the following output:
id: 1
select_type: SIMPLE
table: accounts
partitions: NULL
type: index_merge
possible_keys: accounts_city_id_foreign, accounts_country_id_foreign, accounts_age_index
key: accounts_city_id_foreign, accounts_country_id_foreign, accounts_age_index
key_len: 9,2,9
ref: NULL
rows: 15542
filtered: 100.00
Extra: Using intersect(accounts_city_id_foreign, accounts_country_id_foreign, accounts_age_index); Using where; Using index
Given that MySQL appears to be using the indexes, I'm not sure what I can do to bring the execution time down. Does anyone have any ideas?
EDIT: In the future, the table will include more columns that will make it impossible to use a composite index as it will exceed the 16 column limit.
EDIT: Here's the complete table structure:
CREATE TABLE `accounts` (
`id` bigint unsigned NOT NULL AUTO_INCREMENT,
`city_id` bigint unsigned DEFAULT NULL,
`school_id` bigint unsigned DEFAULT NULL,
`country_id` bigint unsigned DEFAULT NULL,
`province_id` bigint unsigned DEFAULT NULL,
`age` tinyint unsigned DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `accounts_city_id_foreign` (`city_id`),
KEY `accounts_school_id_foreign` (`school_id`),
KEY `accounts_country_id_foreign` (`country_id`),
KEY `accounts_province_id_foreign` (`province_id`),
KEY `accounts_age_index` (`age`),
CONSTRAINT `accounts_city_id_foreign` FOREIGN KEY (`city_id`) REFERENCES `cities` (`id`) ON DELETE SET NULL,
CONSTRAINT `accounts_country_id_foreign` FOREIGN KEY (`country_id`) REFERENCES `countries` (`id`) ON DELETE SET NULL,
CONSTRAINT `accounts_province_id_foreign` FOREIGN KEY (`province_id`) REFERENCES `provinces` (`id`) ON DELETE SET NULL,
CONSTRAINT `accounts_school_id_foreign` FOREIGN KEY (`school_id`) REFERENCES `schools` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB AUTO_INCREMENT=1000002 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
Try creating a composite index on all three columns, e.g. CREATE INDEX idx_city_country_age ON accounts (city_id, country_id, age)
Indexes are there to help your querying. So, as suggested by Marko and agreed by others, an index on (city_id, country_id, age) should significantly help. Now, yes, you will add other columns to the table, but are you really going to filter on 16+ criteria? I doubt it. And of the queries you would be running, even with multiple composite indexes to optimize them, how many columns might you need at any one time? 4, 5, 6? Beyond that, how granular do you plan on getting with your data: country, state/province, city, town, village, neighborhood, street, house? By the time you are that low in the data, you would be at the page level anyhow, wouldn't you?
So, your query of country = 7 already chops off a ton of stuff. Then down to a given city within that country? Great, now you are at a finite level.
If you are going to be doing queries against large data that require aggregation, and the data is rather fixed from a historical perspective, pre-aggregated tables keyed by some common elements might help long term.
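A minimal sketch of that pre-aggregation idea (hypothetical rollup table and refresh statement; adapt the grouping columns to what you actually report on):

-- Hypothetical rollup, refreshed periodically (e.g. nightly):
CREATE TABLE accounts_rollup (
  city_id    BIGINT UNSIGNED  NOT NULL,
  country_id BIGINT UNSIGNED  NOT NULL,
  age        TINYINT UNSIGNED NOT NULL,
  cnt        BIGINT UNSIGNED  NOT NULL,
  PRIMARY KEY (city_id, country_id, age)
);

REPLACE INTO accounts_rollup
SELECT city_id, country_id, age, COUNT(*)
FROM accounts
WHERE city_id IS NOT NULL AND country_id IS NOT NULL AND age IS NOT NULL
GROUP BY city_id, country_id, age;

-- The original COUNT query then becomes a single-row PK lookup:
SELECT cnt FROM accounts_rollup WHERE city_id = 1 AND country_id = 7 AND age = 3;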
FEEDBACK
The performance hit is not necessarily in the querying; it is in the inserts, updates, and deletes, since whatever changes has to update all the indexes on the table, single-column or composite. If you are getting more than 5 columns in an index, ask yourself: really? How granular do you need to be for the index to be optimal? Querying the data should be very fast with proper indexes. Updating indexes is also quick, but if you are dealing with millions of inserts in a month, quarter, or year, each user may only see a slight delay (a quarter second?), yet a million quarter-seconds adds up. Then again, consider over what period of time those inserts/updates/deletes are spread anyhow.
You asked what will bring the query time down, and using a composite index will do that. Searching a single composite index is faster than searching several single-column indexes and performing an intersection merge on the results.
You commented that you will be adding more columns in the future, and there will eventually be more than 16 columns.
You don't have to add ALL the columns to the composite index!
Index design is not magic. It follows rules. You create indexes designed to support specific queries that you need to run. You don't add columns to an index unless they help the given query. You may have multiple composite indexes in the table, created to help different queries.
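For instance (hypothetical index names; create only the ones your real queries need):

-- One composite index per query shape, not one giant index:
CREATE INDEX idx_city_country_age ON accounts (city_id, country_id, age);
CREATE INDEX idx_school_age       ON accounts (school_id, age);
-- A future query filtering on province_id + age would get its own index,
-- added only once that query actually exists.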
You might like my presentation How to Design Indexes, Really (or the video).
Re your comment:
I won't know every possible query combination ahead of time.
Yes, that's true. You can only create indexes for queries that you know. Other queries will not be optimized. If you need to optimize queries in the future, you might need to add new indexes to support them.
In my experience, this happens regularly, and I address this in the presentation. You will review your queries from time to time, because of course your application code changes and the queries you need change. You may add new indexes, or replace an index with a different index, or drop indexes that are no longer needed.

MySQL table with composite index but no primary key

I need a table to store some ratings. In this table I have a composite index (user_id, post_id) and another column to identify different rating systems.
user_id - bigint
post_id - bigint
type - varchar
...
Composite Index (user_id, post_id)
This table has no primary key, because a primary key needs to be unique and the INDEX does not need to be unique; in my case uniqueness is a problem.
For example I can have
INSERT INTO tbl_rate
(user_id,post_id,type)
VALUES
(24,1234,'like'),
(24,1234,'love'),
(24,1234,'other');
Could the missing PRIMARY KEY cause performance problems? Is my table structure good, or do I need to change it?
Thank you
A few points:
It sounds like you are just taking what is currently unique about the table and making that the primary key. That works. And natural keys have some advantages when it comes to querying because of locality (the data for each user is stored in the same area), and because the table is clustered by that key, which eliminates lookups to the data if you are searching by the columns in the primary key.
But, using a natural primary key like you chose has disadvantages for performance as well.
Using a very large primary key will make all other indexes very large in InnoDB, because the full primary key value is included in each secondary index entry.
Using a natural primary key isn't as fast as a surrogate key for INSERTs, because in addition to being bigger, it can't just insert at the end of the table each time; it has to insert into the section for that user and post, etc.
Also, if you are searching by time, you will most likely be seeking all over the table with a natural key unless time is your first column. Surrogate keys tend to be roughly ordered by time and can often be just right for some queries.
Using a natural key like yours as a primary key can also be annoying. What if you want to refer to a particular vote? You need several fields. It is also a little awkward to use with many ORMs.
Here's the Answer
I would create a surrogate key and use it as the primary key rather than relying on InnoDB's internal primary key, because you'll be able to use it for updates and lookups.
ALTER TABLE tbl_rate
ADD id INT UNSIGNED NOT NULL AUTO_INCREMENT,
ADD PRIMARY KEY(id);
But if you do create a surrogate primary key, I'd also make your natural key UNIQUE. Same cost, but it enforces correctness.
ALTER TABLE tbl_rate
ADD UNIQUE ( user_id, post_id, type );
Could the missing PRIMARY KEY cause performance problems?
Yes, in InnoDB for sure, as InnoDB will use an internal algorithm to create its own "ROWID", which is defined in dict0boot.ic:
/* Returns a new row id.
@return the new id */
UNIV_INLINE
row_id_t
dict_sys_get_new_row_id(void)
/*=========================*/
{
    row_id_t id;

    mutex_enter(&(dict_sys->mutex));

    id = dict_sys->row_id;

    if (0 == (id % DICT_HDR_ROW_ID_WRITE_MARGIN)) {
        dict_hdr_flush_row_id();
    }

    dict_sys->row_id++;

    mutex_exit(&(dict_sys->mutex));

    return(id);
}
The main problem in that code is mutex_enter(&(dict_sys->mutex));, which blocks other threads from running this code if one thread is already inside it.
Meaning it will lock the table much the same as MyISAM would.
% may take a few nanoseconds. That is insignificant compared to everything else. Anyway, #define DICT_HDR_ROW_ID_WRITE_MARGIN 256
Indeed yes, Rick James, that is insignificant compared to what was mentioned above.
The C/C++ compiler would micro-optimize it further to get even more performance out of it by making the CPU instructions lighter.
Still, the main performance concern is the mutex mentioned above.
Also, the modulo operator (%) is a CPU-heavy instruction. But depending on the C/C++ compiler (and/or configuration options), it might be optimized when DICT_HDR_ROW_ID_WRITE_MARGIN is a power of two, rewriting the test as (0 == (id & (DICT_HDR_ROW_ID_WRITE_MARGIN - 1))), since bitmasking is much faster. I believe DICT_HDR_ROW_ID_WRITE_MARGIN is indeed a power of 2.

Is there any difference with composite primary key order?

I am curious: is there any difference with composite primary key order?
For example, is there any difference between the primary keys of the two tables below? Would the key order make no difference to the table?
CREATE TABLE `Q3` (
`user_id` VARCHAR(20) NOT NULL,
`retweet_id` VARCHAR(20) NOT NULL,
PRIMARY KEY (`user_id`,`retweet_id`)
)
vs
CREATE TABLE `Q3` (
`user_id` VARCHAR(20) NOT NULL,
`retweet_id` VARCHAR(20) NOT NULL,
PRIMARY KEY (`retweet_id`,`user_id`)
)
It makes a difference in the index structure.
In a composite index, the index value consists of several column values, one after another, and the order determines which queries can be optimized using that particular index.
E.g.:
For the index created as
PRIMARY KEY (`user_id`,`retweet_id`)
a query like WHERE user_id = 42 can be optimized (not guaranteed, but technically possible), whereas the query WHERE retweet_id = 4242 cannot be.
PS: it's a good idea to always have an artificial primary key, like a sequence (or an auto-increment column in the case of MySQL), instead of using natural primary keys. The primary key is a clustered key, which means it defines how rows are physically stored in pages on disk, so it's a good idea for a PK to be monotonically increasing (or decreasing, it doesn't matter).
The order does affect how the index is used in queries. When you use multiple columns, each column is a sub-tree of the preceding column.
In your first case (user_id, retweet_id) - if you searched the index for user_id 1, you then have all the retweet_ids under that.
Subsequently, if you wish to search only for retweet_id = 7 (for all users), the index cannot be used, because you would first have to step through each user's entries in the index.
So if you wish to query for user_id or retweet_id individually (without the other), put that column first. If you need both directions, you could consider adding a secondary index, as sketched below.
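A sketch of that option (idx_retweet is a hypothetical name):

-- PRIMARY KEY (user_id, retweet_id) already serves user_id-first lookups;
-- a secondary index serves the retweet_id-first direction:
ALTER TABLE Q3 ADD INDEX idx_retweet (retweet_id);
-- InnoDB secondary indexes implicitly carry the PK columns, so this index
-- is effectively ordered by (retweet_id, user_id) and can also satisfy
-- WHERE retweet_id = ... AND user_id = ...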
There are also limitations for range scans: only the last column used in the query can effectively use a range scan. You can read more about all of this here:
http://dev.mysql.com/doc/refman/5.6/en/multiple-column-indexes.html
Additionally, if using InnoDB, the rows are stored in order of the PRIMARY KEY. This might matter for performance, depending on how you query your data.

MySQL design for a log table

I would like some advice about a MySQL table design for an event logger.
Our needs:
- track a lot of actions
- 10,000 actions / second
- 1 billion rows at this time
Our hardware :
- 2*Xeon (seen as 32 CPUs by the system)
- 128 GB RAM
- 6*600 SSD with RAID 10
Our table design :
CREATE TABLE IF NOT EXISTS `log_event` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`id_event` smallint(6) NOT NULL,
`id_user` bigint(20) NOT NULL,
`date` int(11) NOT NULL,
`data` bigint(20) NOT NULL,
PRIMARY KEY (`id`),
KEY `id_event_2` (`id_event`,`data`),
KEY `id_user` (`id_user`),
KEY `date` (`date`),
KEY `id_event_4` (`id_event`,`date`,`data`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;
ALTER TABLE `log_event`
ADD CONSTRAINT `log_event_ibfk_1` FOREIGN KEY (`id_user`) REFERENCES `inscription` (`id_inscri`) ON DELETE CASCADE ON UPDATE CASCADE;
Our problems:
- We have an auto-increment as the primary key, but it is not really used. Is it a problem to remove it? We would have no primary key if we removed it, so how would we identify a row?
- We would like to do partitioning, but with the foreign key it seems to be impossible?
- We don't do bulk inserts. Is it a good idea to insert into a MEMORY table without indexes and copy the data every 5 minutes?
- Do you have any ideas for optimizing this? Do you have best practices for this kind of system?
Thanks!
François
Primary keys of relational tables (relations) can be of two types:
Natural - exists in the subject area and completely determines each row of the relational table.
Natural primary keys can be simple (consisting of only one column) or composite (consisting of more than one column). It is not recommended to set a natural primary key on a large string column.
Artificial - a special column injected by the database designer/developer to boost table performance: when the natural key is composite and has to be used in a related table (as a foreign key), or when it is simple but large and would produce data overhead when copied into related tables as a foreign key, or when it is slow to search (for example, CRUD operations on VARCHAR IDs can be slower than on INT IDs). There can be other reasons. TL;DR: an artificial key is one special column that completely determines each row of a relational table and boosts its performance for CRUD operations.
We have an auto-increment as the primary key, but it is not really used. Is it a problem to remove it? We would have no primary key if we removed it, so how would we identify a row?
If you do not need to reference your table from other tables (as a source), then you can probably remove the artificial key without any consequences. Still, I recommend you set some other PRIMARY KEY on this table, to avoid data duplication and for clarity (if it matters).
Your table by itself (if properly normalized) will have a natural key as one of its "candidate keys". It might be composite (consisting of a few columns). That is normal. But don't build the primary key from strings, because a PRIMARY KEY always has an index, which would produce data overhead. A combination of INT or "small" VARCHAR columns is fine.
Consider as an option: id_event + id_user + date.
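A sketch of that option (untested; verify first that the triple really is unique, and note that promoting a new PK rebuilds the clustered index of a billion-row table, which is expensive):

-- Strip AUTO_INCREMENT, then swap the primary key:
ALTER TABLE log_event MODIFY id bigint(20) NOT NULL;
ALTER TABLE log_event
  DROP PRIMARY KEY,
  DROP COLUMN id,
  ADD PRIMARY KEY (id_event, id_user, `date`);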
We don't do bulk inserts. Is it a good idea to insert into a MEMORY table without indexes and copy the data every 5 minutes?
It is not a bad idea. But it is not a good idea until it has been properly tested. Try a load test before real use.
If you do not reference the MEMORY table from other tables, you can still join it with any InnoDB table. But you will lose InnoDB functionality (referential integrity). If losing the parent table's ON DELETE CASCADE ON UPDATE CASCADE is not a concern, then it can be done. As for me, InnoDB is not so slow that I would switch the table engine in your case.
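A sketch of the staging idea (hypothetical buffer table name; load-test it, and make sure the flush job cannot lose rows that arrive between the INSERT and the DELETE, e.g. by atomically swapping in a fresh buffer table with RENAME TABLE first):

-- Unindexed MEMORY buffer receiving the raw event stream:
CREATE TABLE log_event_buffer (
  id_event smallint(6) NOT NULL,
  id_user bigint(20) NOT NULL,
  `date` int(11) NOT NULL,
  data bigint(20) NOT NULL
) ENGINE=MEMORY;

-- Flush job, run every 5 minutes:
INSERT INTO log_event (id_event, id_user, `date`, data)
SELECT id_event, id_user, `date`, data FROM log_event_buffer;
DELETE FROM log_event_buffer;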