I am curious: does the column order in a composite primary key make any difference?
For example, is there any difference between the primary keys of the following two tables? Or does the key order make no difference to the table?
CREATE TABLE `Q3` (
`user_id` VARCHAR(20) NOT NULL,
`retweet_id` VARCHAR(20) NOT NULL,
PRIMARY KEY (`user_id`,`retweet_id`)
)
vs
CREATE TABLE `Q3` (
`user_id` VARCHAR(20) NOT NULL,
`retweet_id` VARCHAR(20) NOT NULL,
PRIMARY KEY (`retweet_id`,`user_id`)
)
It makes a difference in the index structure.
In a composite index, each index entry consists of several values stored one after another, and their order determines which queries can be optimized using this particular index.
For example:
For the index created as
PRIMARY KEY (`user_id`,`retweet_id`)
A query like WHERE user_id = 42 can be optimized with it (not guaranteed, but technically possible), whereas a query like WHERE retweet_id = 4242 cannot.
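To see why the order matters, it can help to picture the index as a list of (user_id, retweet_id) pairs kept in sorted order. The Python sketch below is only an illustration with made-up rows, not how MySQL implements its B-tree; the function names seek_by_user and scan_by_retweet are hypothetical.

```python
from bisect import bisect_left

# Hypothetical sample rows (user_id, retweet_id), kept in the order a
# composite index on (user_id, retweet_id) would store them.
index = sorted([(42, 7), (7, 4242), (42, 4242), (99, 1), (42, 1)])

def seek_by_user(index, user_id):
    """Leftmost-prefix lookup: binary-search to the first matching entry,
    then read neighbouring entries until user_id changes."""
    start = bisect_left(index, (user_id,))
    matches = []
    for entry in index[start:]:
        if entry[0] != user_id:
            break
        matches.append(entry)
    return matches

def scan_by_retweet(index, retweet_id):
    """No usable prefix: every entry has to be inspected."""
    return [entry for entry in index if entry[1] == retweet_id]

print(seek_by_user(index, 42))
print(scan_by_retweet(index, 4242))
```

The first function can jump straight to the right place because the entries are sorted by user_id first; the second has nothing to jump to, since matching retweet_id values are spread across the whole list.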
PS: it's a good idea to always have an artificial primary key, like a sequence (or an AUTO_INCREMENT column in the case of MySQL), instead of using natural primary keys. The reason is that in InnoDB the primary key is the clustered key, which means it defines how rows are physically stored in pages on disk. That in turn means it's a good idea for a PK to be monotonically increasing (or decreasing, it doesn't matter).
The order does affect how the index is used in queries. When you use multiple columns, each column is a sub-tree of the preceding column.
In your first case (user_id, retweet_id) - if you searched the index for user_id 1, you then have all the retweet_ids under that.
Conversely, if you wish to search only for retweet_id=7 (for all users), the index cannot be used to seek, because you would first have to step through each user's entries in the index.
So if you wish to query for user_id, or retweet_id individually (without the other), put that column first. If you need both you could consider adding a secondary index.
There are also limitations for range scans: a range condition is only effective on the last index column the query uses. You can read more about all of this here:
http://dev.mysql.com/doc/refman/5.6/en/multiple-column-indexes.html
Additionally if using InnoDB, the tables are stored in order of the PRIMARY KEY. This might matter for performance depending on how you query your data.
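The range-scan limitation mentioned above can be illustrated with the same "sorted list of tuples" picture of a composite index. This is a rough Python sketch with made-up values, not MySQL's actual implementation; positions and is_contiguous are hypothetical helper names.

```python
# Hypothetical entries of an index on (user_id, retweet_id), in index order.
index = sorted((u, r) for u in (1, 2, 3) for r in (5, 7, 9))

def positions(index, predicate):
    """Positions of the index entries that satisfy the predicate."""
    return [i for i, entry in enumerate(index) if predicate(entry)]

def is_contiguous(pos):
    """True when the positions form one unbroken run."""
    return pos == list(range(pos[0], pos[-1] + 1))

# Equality on the first column, range on the second: one contiguous slice.
eq_then_range = positions(index, lambda e: e[0] == 2 and 5 <= e[1] <= 7)

# Range on the first column, equality on the second: the matches are
# scattered, so the second condition can only be applied as a filter.
range_then_eq = positions(index, lambda e: 1 <= e[0] <= 3 and e[1] == 7)

print(eq_then_range, is_contiguous(eq_then_range))
print(range_then_eq, is_contiguous(range_then_eq))
```

When the range is on the last column used, the matching entries sit next to each other and can be read in one pass. When the range comes first, the entries matching the later column are interleaved with non-matching ones.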
I found this old code and I'm not sure if it's optimized or just doing something silly.
I have a SQL create statement like this:
CREATE TABLE `wp_pmpro_memberships_categories` (
`membership_id` int(11) unsigned NOT NULL,
`category_id` int(11) unsigned NOT NULL,
`modified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY `membership_category` (`membership_id`,`category_id`),
UNIQUE KEY `category_membership` (`category_id`,`membership_id`)
);
Is that second UNIQUE KEY there redundant with the PRIMARY KEY on the same 2 columns? Or would the second one help for queries that filter by the category_id first then by the membership_id? Is it being ignored?
I'm trying to remember why I coded it that way, way back when. Seems similar to what this comment is describing: https://dba.stackexchange.com/a/1793/245678
Thanks!
It depends on your query patterns. If you do SELECT, UPDATE, DELETE only on the category_id column, then the 2nd index makes sense but you should omit the membership_id column (redundant) and the UNIQUE constraint.
MySQL will automatically use the PRIMARY KEY index if you use either membership_id or both columns. It doesn't matter in which order these columns appear in your WHERE clauses.
The secondary index does improve performance when going from a "category" to a "membership".
You coded it with those two indexes because some queries start with a "membership" and need to locate a "category"; some queries go the 'other' direction.
That's a well-coded "many-to-many mapping table".
InnoDB provides better performance than MyISAM.
The "Uniqueness" constraint in the UNIQUE key is redundant.
Checking for uniqueness slows down writes by a very small amount. (The constraint must be checked before finishing the update to the index's BTree. A non-unique index can put off the update until later; see "change buffering".)
I like to say this to indicate that I have some reason for the pair of columns being together in the index:
INDEX(`category_id`,`membership_id`)
I discuss the schema pattern here: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#many_to_many_mapping_table
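If you want to watch the two directions pick different indexes, here is a small experiment. Note the assumptions: it uses SQLite through Python's sqlite3 module as a stand-in, not MySQL/InnoDB, so the planner and its output format differ from MySQL's EXPLAIN; the table name is shortened and the plan helper is hypothetical. It only shows the general idea that each index serves the lookups that start with its leading column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE memberships_categories (
        membership_id INTEGER NOT NULL,
        category_id   INTEGER NOT NULL,
        PRIMARY KEY (membership_id, category_id)
    )
""")
# Plain (non-unique) secondary index for the "other" direction.
conn.execute("""
    CREATE INDEX category_membership
        ON memberships_categories (category_id, membership_id)
""")

def plan(sql):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

by_membership = plan(
    "SELECT category_id FROM memberships_categories WHERE membership_id = 1")
by_category = plan(
    "SELECT membership_id FROM memberships_categories WHERE category_id = 1")

print(by_membership)
print(by_category)
```

In MySQL you would run EXPLAIN on the equivalent queries and compare the key column instead.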
I created/defined an admin table. Now I have seen other programmers alter the table and add keys to it:
CREATE TABLE `admin` (
`admin_id` int(11) NOT NULL AUTO_INCREMENT,
`admin_name` varchar(255) NOT NULL,
`admin_surname` varchar(255) NOT NULL,
`phone` CHAR(10) NOT NULL,
`admin_email` varchar(255) NOT NULL,
`password` varchar(255) NOT NULL,
PRIMARY KEY (`admin_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
ALTER TABLE `admin`
ADD PRIMARY KEY (`admin_id`),
ADD UNIQUE KEY `admin_email` (`admin_email`);
If I have already defined the table why should I alter the definition again here?
In InnoDB a clustered index always exists.
When a primary key exists in the table, it is used as the clustered index.
When there is no primary key but there are unique indexes whose columns are all NOT NULL, the first such unique index in the table definition becomes the clustered index.
When there is no such unique index either, a hidden internal row number is used as the clustered index.
Hence, if you create a table (and something is already used as the clustered index) and then use ALTER TABLE to add a primary key, the table must be rebuilt. It doesn't matter when the table is empty, but when there is data in it the process may take a long time (because the COPY method is used).
If you create the primary key as part of CREATE TABLE, this is always fast.
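The selection rules above can be written down as a small decision function. This is just a Python model of the rules as described in this answer, not InnoDB code; the UniqueIndex class, the function name, and the nullable phone column are assumptions made for the illustration (in the question's table, phone is NOT NULL).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class UniqueIndex:
    name: str
    columns: List[str]

def choose_clustered_index(primary_key: Optional[List[str]],
                           unique_indexes: List[UniqueIndex],
                           nullable: Dict[str, bool]) -> str:
    """Apply the three rules in order and describe the clustered index."""
    if primary_key:
        return "PRIMARY KEY (" + ", ".join(primary_key) + ")"
    for index in unique_indexes:  # in table-definition order
        if not any(nullable[col] for col in index.columns):
            return "UNIQUE KEY " + index.name
    return "hidden row ID"

# Hypothetical column nullability and unique indexes.
nullable = {"admin_id": False, "admin_email": False, "phone": True}
uniques = [UniqueIndex("phone", ["phone"]),
           UniqueIndex("admin_email", ["admin_email"])]

print(choose_clustered_index(["admin_id"], uniques, nullable))
print(choose_clustered_index(None, uniques, nullable))
print(choose_clustered_index(None, [uniques[0]], nullable))
```

The second call skips the unique index on the nullable column and settles on admin_email; the third has no eligible unique index and falls back to the hidden row ID.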
I like to put all the index definitions inside the CREATE TABLE, and put them at the end instead of sitting on the column definitions.
Potential problem 1:
But I notice that some dump utilities like to add the indexes later. This may be a kludge to handle FOREIGN KEY definitions. Those have trouble if the tables are not created in just the right order.
It would seem better to simply ADD FOREIGN KEY... after all the tables are created and indexed.
Potential problem 2:
If you will be inserting a huge number of rows, it is usually more efficient to make the secondary keys after loading the data. This is more efficient than augmenting the indexes as it goes. For small tables (under, say, a million rows), this is not a big deal.
I do not understand why they ADD PRIMARY KEY after loading the data. That requires (as Akina points out) tossing the fabricated PK, sorting the data, and adding the real PK. That seems like extra work, even for a huge table.
If the rows are sorted in PK order, the loading is more efficient. The table is ordered by the PK (for InnoDB); inserting in that order is faster than jumping around. (mysqldump will necessarily provide them in PK order, so it is usually a non-issue.)
I'm new to sql and now working with MySQL.
I'm going through the concept of indexes and I'm not sure what would happen in the following case:
CREATE TABLE test (
id INT NOT NULL,
last_name CHAR(30) NOT NULL,
first_name CHAR(30) NOT NULL,
PRIMARY KEY (id),
INDEX name (last_name,first_name)
);
I have read that here, last_name or (last_name,first_name) can be used for lookup, whereas first_name cannot be used for lookup directly (it is not a leftmost prefix).
I have also read that PRIMARY KEY and UNIQUE KEY are indexed automatically. So, in my case, where does the id index come in? Doesn't it count as a leftmost prefix?
select * from test
where id=xxx and last_name=xxxx
Will this use an index lookup, or will it search the entire table?
First, your query is redundant. The id comparison is sufficient.
The optimizer is going to recognize that two indexes can be used for the query. I'm pretty sure that MySQL will choose the primary key index, because it is unique and clustered. Hence, it is obviously the correct one.
If neither index is unique or a primary key, then MySQL will resort to statistics about the indexes (or arbitrarily choosing one of them). You can read about index statistics in the documentation.
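You can check what the optimizer does with such a query by asking for its plan. The sketch below uses SQLite through Python's sqlite3 module as a stand-in, so it is an assumption that the behaviour carries over: MySQL's optimizer and its EXPLAIN output are different, and the sample parameter values are made up. The point is only that an index lookup is chosen rather than a full scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE test (
        id INT NOT NULL,
        last_name CHAR(30) NOT NULL,
        first_name CHAR(30) NOT NULL,
        PRIMARY KEY (id)
    )
""")
conn.execute("CREATE INDEX name ON test (last_name, first_name)")

# Ask the planner how it would execute the query from the question.
rows = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM test WHERE id = ? AND last_name = ?", (1, "Smith")
).fetchall()
details = [row[3] for row in rows]  # fourth column is the plan text
print(details)
```

In MySQL the equivalent check is to prefix the SELECT with EXPLAIN and look at the key column.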
In MySQL, does the following statement make sense?
CREATE TABLE `sku_classification` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`sku` int(10) unsigned NOT NULL,
`business_classification_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `IDX_SKU_BUSINESS_CLASSIFICATION` (`sku`,`business_classification_id`),
UNIQUE KEY `sku` (`sku`)
)
Is it an unnecessary overkill to add a unique key on a combination of fields (sku,business_classification_id), one of which (sku) already has unique index on it? Or is it not, and there is indeed some reason for such duplicate unique index?
Yes, you can. But it does not make sense. But, let's analyze what is going on.
An INDEX (UNIQUE or not) is a BTree that facilitates lookups in the table.
A UNIQUE index is both an index and a "constraint" saying that there shall not be any duplicates.
You have already said UNIQUE(sku). This provides both an index and a uniqueness constraint.
Adding UNIQUE(sku, x) in that order:
Does not provide any additional uniqueness constraint,
Does not provide any additional indexing capability, except...
Does provide a "covering" index that could be useful if the only two columns mentioned in a SELECT were sku and x. Even so, you may as well make it an INDEX, not a UNIQUE, because...
Every INSERT must do some extra effort to prevent "duplicate key". (OK, the INSERT code is not smart enough to see that you have UNIQUE(sku).)
If that is your complete table, there is no good reason to have the id AUTO_INCREMENT; you may as well promote sku to be the PRIMARY KEY. (A PK is a UNIQUE KEY.)
Furthermore... If, on the other hand, you were suggesting UNIQUE(x, sku), then there is one slight difference. This provides you a way to efficiently look up by x -- a range of x, or x=constant AND sku BETWEEN ..., or certain other things that are not provided by (sku, x). Order matters in an index. But, again, it may as well be INDEX(x, sku), not UNIQUE.
So, the optimal set of indexes for the table as presented is not 3 indexes, but 1:
PRIMARY KEY(sku)
One more note: With InnoDB, the PK is "clustered" in BTree with the data. That is, looking up by the PK is very efficient. When you need to go through a "secondary index", there are two steps: first drill down the secondary index's BTree to find the PK, then drill down the PK's BTree.
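The one-step versus two-step lookup can be sketched with two Python dictionaries. This is a loose model, assuming made-up rows and using hash maps in place of real BTrees; the names clustered, secondary_sku, lookup_by_pk and lookup_by_sku are hypothetical.

```python
# Clustered index: primary key -> full row (hypothetical sample data).
clustered = {
    1: {"id": 1, "sku": 5001, "business_classification_id": 10},
    2: {"id": 2, "sku": 5002, "business_classification_id": None},
}
# Secondary index on sku: indexed value -> primary key.
secondary_sku = {row["sku"]: pk for pk, row in clustered.items()}

def lookup_by_pk(pk):
    """One descent: the clustered index entry already holds the row."""
    return clustered[pk], 1

def lookup_by_sku(sku):
    """Two descents: secondary index to get the PK, then the clustered index."""
    pk = secondary_sku[sku]
    row, steps = lookup_by_pk(pk)
    return row, steps + 1

print(lookup_by_pk(2))
print(lookup_by_sku(5002))
```

Both calls return the same row; the second simply needs one more index traversal, which is the cost that promoting sku to the PRIMARY KEY would remove.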
This is my table structure:
CREATE TABLE `channel_play_times_bar_chart` (
`playing_date` datetime NOT NULL,
`channel_report_tag` varchar(50) NOT NULL,
`country_code` varchar(50) NOT NULL,
`device_report_tag` int(11) NOT NULL,
`greater_than_30_minutes` decimal(10,0) NOT NULL,
`15_to_30_minutes` decimal(10,0) NOT NULL,
`0-15_minutes` decimal(10,0) NOT NULL,
PRIMARY KEY (`country_code`,`device_report_tag`,`channel_report_tag`,`playing_date`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
When I run the following query:
EXPLAIN EXTENDED
SELECT
channel_report_tag,
SUM(`greater_than_30_minutes`) AS '>30 minutes',
SUM(`15_to_30_minutes`) AS '15-30 Minutes',
SUM(`0-15_minutes`) AS '0-15 Minutes'
FROM
channel_play_times_bar_chart USE INDEX (ABCDE)
WHERE country_code = 'US'
AND device_report_tag = 14
AND channel_report_tag = 'DUNYA NEWS'
AND playing_date BETWEEN '2016-09-01'
AND '2016-09-13'
GROUP BY channel_report_tag
ORDER BY SUM(`greater_than_30_minutes`) DESC
LIMIT 10
This is the output I get: [image: EXPLAIN output]
The index was defined as:
CREATE INDEX ABCDE
ON channel_play_times_bar_chart (
`country_code`,
`device_report_tag`,
`channel_report_tag`,
`playing_date`,
`greater_than_30_minutes`
)
I am a bit confused here. The key column shows ABCDE being used as the index, yet the ref column shows NULL. What does this mean? Is the index actually being used? If not, what did I do wrong?
It is using the key you are showing in the create index, that is, ABCDE.
It would be nice if you did a
show create table channel_play_times_bar_chart
and just showed it all at once. That key might not be of much use to you as it replicates most of what your rather wide Primary Key already gives you.
Once the query uses the key up thru the 3rd segment of the composite key, it resumes with a WHERE range on playing_date in that composite and finds 8 rows.
Note EXPLAIN is an estimate.
Further, I would reconsider the strategy for your PRIMARY KEY (PK) ideas especially considering that you decided to dupe it up more or less with the creation of ABCDE. That means you are maintaining two indexes with little if anything gained on the second one (you added one column to the secondary index ABCDE).
The PK is rather WIDE (118 bytes I believe). It dictates the physical ordering. And that idea could easily be a bad one if used throughout the way you architect things. Changes made to data via UPDATE that impact the columns in the PK force a reshuffle of the physical ordering of the table. That fact is a good indication of why id INT AUTO_INCREMENT PRIMARY KEY is often used as a best practice, as it never endures a reshuffle and is THIN (4 bytes).
The width of keys and their strategy with referencing (other) tables (in Foreign Key Constraints) impact key sizes and performance for lookups. Wide keys can measurably slow down that process.
This is not to suggest that you shouldn't have a key on those columns like your secondary index ABCDE. But in general that is not a good idea for the PK.
Note that it could be argued that ABCDE never gives you any benefit over your PK: once the query reaches the range condition on the date, the index columns after it are no longer used for seeking, so the extra trailing column does not help narrow the lookup. Just a thought.
A nice read and rather brief is the article Using EXPLAIN to Write Better MySQL Queries.
Your query does use the ABCDE index. As MySQL documentation on EXPLAIN Output Format explains :) (bolding is mine):
key (JSON name: key)
The key column indicates the key (index) that MySQL actually decided
to use. If MySQL decides to use one of the possible_keys indexes to
look up rows, that index is listed as the key value.
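The ref field of the EXPLAIN output is primarily used when joins are present, to show the fields / constants / expressions the index was compared with. If the type column shows range, as the BETWEEN on playing_date suggests, a NULL ref is expected: the MySQL documentation states that the ref column is NULL for the range access type.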