I am creating a table which will store around 100 million rows in MySQL 5.6 using the InnoDB storage engine. This table will have a foreign key that links to another table with around 5 million rows.
Current Table Structure:
`pid`: [Foreign key from another table]
`price`: [decimal(9,2)]
`date`: [date field]
Every pid should have only one record per date.
What is the best way to create indexes on this table?
Option #1: Create Primary index on two fields pid and date
Option #2: Add another column id with AUTO_INCREMENT and primary index and create a unique index on column pid and date
Or any other option?
The only SELECT query I will be using on this table is:
SELECT pid,price,date FROM table WHERE pid = 123
Based on what you said (100M; the only query is...; InnoDB; etc):
PRIMARY KEY(pid, date);
and no other indexes
Some notes:
Since it is InnoDB, all the rest of the fields are "clustered" with the PK, so a lookup by pid acts as if price were part of the PK. Also WHERE pid=123 ORDER BY date would be very efficient.
No need for INDEX(pid, date, price)
Adding an AUTO_INCREMENT gains nothing (except a hint of ordering). If you needed ordering, then an index starting with date might be best.
Extra indexes slow down inserts. Especially UNIQUE ones.
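A minimal sketch of that layout. The table names `product` and `product_price`, the constraint name, and the INT UNSIGNED key type are assumptions for illustration; the question does not give them.

```sql
-- Hypothetical parent table (about 5 million rows in the question).
CREATE TABLE product (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    PRIMARY KEY (id)
) ENGINE=InnoDB;

-- Price history: the composite primary key enforces one row per (pid, date)
-- and also serves as the index required by the foreign key on pid.
CREATE TABLE product_price (
    pid    INT UNSIGNED NOT NULL,
    price  DECIMAL(9,2) NOT NULL,
    `date` DATE NOT NULL,
    PRIMARY KEY (pid, `date`),
    CONSTRAINT fk_product_price_pid
        FOREIGN KEY (pid) REFERENCES product (id)
) ENGINE=InnoDB;

-- The query from the question: all rows for one pid are adjacent
-- in the clustered index and are already in date order.
SELECT pid, price, `date`
FROM product_price
WHERE pid = 123
ORDER BY `date`;
```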
Either method is fine. I prefer having synthetic primary keys (that is, the auto-incremented version with the additional unique index). I find that this is useful for several reasons:
You can have a foreign key relationship to the table.
You have an indicator of the order of insertion.
You can change requirements, so if some pids allow two values per day or only one per week, then the table can support them.
That said, there is additional overhead for such a column. This overhead adds space and a small amount of time when you are accessing the data. You have a pretty large table, so you might want to avoid this additional effort.
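For comparison, the synthetic-key variant might look roughly like this. The table name, index name and column types are assumptions, and the foreign key on pid is left out for brevity.

```sql
-- Alternative definition with a surrogate key (names are hypothetical).
CREATE TABLE product_price (
    id     INT UNSIGNED NOT NULL AUTO_INCREMENT,
    pid    INT UNSIGNED NOT NULL,
    price  DECIMAL(9,2) NOT NULL,
    `date` DATE NOT NULL,
    PRIMARY KEY (id),
    -- Still guarantees one row per pid per date.
    UNIQUE KEY uq_pid_date (pid, `date`)
) ENGINE=InnoDB;
```

With this layout, a lookup by pid goes through uq_pid_date first and then fetches price from the clustered index by id, which is part of the extra time mentioned above.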
I would try an index that covers the query, in the hope that MySQL only has to access the index to get the result set.
ALTER TABLE `table` ADD INDEX `pid_date_price` (`pid` , `date`, `price`);
or
ALTER TABLE `table` ADD INDEX `pid_price_date` (`pid` , `price`, `date`);
Choose the first one if you think you may need to select applying conditions over pid and date in the future, or the second one if you think the conditions will be most probable over pid and price.
This way, the index has all the data the query needs (pid, price and date) and it is indexed on the right column (pid).
By the way, always use EXPLAIN to see if the query planner will really use the whole index (take a look at the key and key_len outputs).
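A short sketch of that check, using the placeholder table name `table` from the statements above:

```sql
EXPLAIN
SELECT pid, price, `date`
FROM `table`
WHERE pid = 123;
-- Columns worth reading in the output:
--   key     : the index the optimizer chose (for example pid_date_price)
--   key_len : how many bytes of the index key are used for the lookup
--   Extra   : "Using index" means the query is answered from the index alone
```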
I was wondering how MySQL would act if I partition a table by date and then have some select or update queries by primary key?
Is it going to search all partitions, or does the query optimizer know in which partition the row is saved?
What about other unique and non-unique indexed columns?
Background
Think of a PARTITIONed table as a collection of virtually independent tables, each with its own data BTree and index BTree(s).
All UNIQUE keys, including the PRIMARY KEY must include the "partition key".
If the partition key is available in the query, the query will first try to do "partition pruning" to limit the number of partitions to actually look at. Without that info, it must look at all partitions.
After the "pruning", the processing goes to each of the possible partitions, and performs the query.
Select, Update
A SELECT logically does a UNION ALL of whatever was found in the non-pruned partitions.
An UPDATE applies its action to each non-pruned partition. No harm is done (except performance) by the updates that did nothing.
Opinion
In my experience, PARTITIONing often slows things down due to things such as the above. There are a small number of use cases for partitioning: http://mysql.rjweb.org/doc.php/partitionmaint
Your specific questions
partition a table by date and then have some select or update queries by primary key ?
All partitions will be touched. The SELECT combines the one result with N-1 empty results. The UPDATE will do one update, plus N-1 useless attempts to update.
An AUTO_INCREMENT column must be the first column in some index (not necessarily the PK, not necessarily alone). So, using the id is quite efficient in each partition. But that means that it is N times as much effort as in a non-partitioned table. (This is a performance drag for partitioning.)
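A minimal sketch of the pruning behaviour described above. The table `event_log`, its columns and the partition names are hypothetical, since the question gives no schema.

```sql
-- The partition key (created) has to be part of every unique key,
-- so the primary key is (id, created) rather than id alone.
CREATE TABLE event_log (
    id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
    created DATE NOT NULL,
    payload VARCHAR(100),
    PRIMARY KEY (id, created)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(created)) (
    PARTITION p2023 VALUES LESS THAN (TO_DAYS('2024-01-01')),
    PARTITION p2024 VALUES LESS THAN (TO_DAYS('2025-01-01')),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- No partition key in the WHERE clause: expect every partition to be listed.
-- (EXPLAIN PARTITIONS is the MySQL 5.6 form; from 5.7 on, plain EXPLAIN
-- already includes a partitions column.)
EXPLAIN PARTITIONS
SELECT * FROM event_log WHERE id = 42;

-- Partition key supplied: pruning can narrow the lookup to one partition.
EXPLAIN PARTITIONS
SELECT * FROM event_log WHERE id = 42 AND created = '2024-06-01';
```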
Step 1:
I am creating a simple table.
CREATE TABLE `indexs`.`table_one` (
`id` INT NOT NULL AUTO_INCREMENT,
`name` VARCHAR(45) NULL,
PRIMARY KEY (`id`));
Step 2:
I make two inserts into this table.
insert into table_one (name) values ("B");
insert into table_one (name) values ("A");
Step 3:
I make a select, I get a table, the records in which are ordered by id.
SELECT * FROM table_one;
This is the expected result, because in mysql the primary key is a clustered index, therefore the data will be physically ordered by it.
Now the part I don't understand.
Step 4:
I am creating an index on the name column.
CREATE INDEX index_name ON table_one(name)
I repeat step 3 again, but I get a different result. The lines are now ordered according to the name column.
Why is this happening? Why does the order of the rows in the table change in accordance with the new index on the name column? As far as I understand, in MySQL the primary key is the only clustered index, and all additionally created indexes are secondary.
I make a select, I get a table, the records in which are ordered by id. [...] This is the expected result, because in mysql the primary key is a clustered index, therefore the data will be physically ordered by it.
There is some misunderstanding of a concept here.
Table rows have no inherent ordering: they represent an unordered set of rows. While the clustered index enforces a physical ordering of data in storage, it does not guarantee the order in which rows are returned by a select query.
If you want the results of the query to be ordered, then use an order by clause. Without such a clause, the ordering of the rows is undefined: the database is free to return results in whichever order it likes, and results are not guaranteed to be consistent over consecutive executions of the same query.
select * from table_one order by id;
select * from table_one order by name;
(GMB explains most)
Why is this happening? Why does the order of the rows in the table change in accordance with the new index on the name column?
Use EXPLAIN SELECT ... -- it might give a clue of what I am about to suggest.
You added INDEX(name). In InnoDB, the PRIMARY KEY column(s) are tacked onto the end of each secondary index. So it is effectively a BTree ordered by (name,id) and containing only those columns.
Now, the Optimizer is free to fetch the data from the index, since it has everything you asked for (id and name). (This index is called "covering".)
Since you did not specify an ORDER BY, the result set ordering is valid (see GMB's discussion).
Moral of the story: If you want an ordering, specify ORDER BY. (The Optimizer is smart enough to "do no extra work" if it can see how to provide the data without doing a sort.)
Further experiment: Add another column to the table but don't change the indexes. Now you will find SELECT * FROM t is ordered differently than SELECT id, name FROM t. I think I have given you enough clues to predict this difference, if not, ask.
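The experiment could be run roughly as follows. The column name `note` is made up for this sketch, and the plan the optimizer actually picks may differ between versions, so treat the comments as what to look for rather than guaranteed output.

```sql
-- Which index does the optimizer read for the original query?
EXPLAIN SELECT * FROM table_one;

-- Add a column that no secondary index contains.
ALTER TABLE table_one ADD COLUMN note VARCHAR(45) NULL;

-- index_name no longer holds every requested column, so it cannot cover SELECT *.
EXPLAIN SELECT * FROM table_one;

-- id and name are both still in index_name, so this query can still be covered.
EXPLAIN SELECT id, name FROM table_one;

-- Whatever plan is chosen, only ORDER BY guarantees the order of the result.
SELECT * FROM table_one ORDER BY id;
```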
What is the difference between creating a covering index for all the foreign keys of a relation table and creating one index for each column (foreign key) of the relation table?
For instance, I have the table sales(p_id, e_id, c_id, ammount) where p_id is a foreign key (products table), e_id is a foreign key (employee table) and c_id a foreign key (customer_table). The primary key of the table is {p_id, e_id, c_id}.
Which one is better?
CREATE INDEX cmpindex ON sales(p_id, e_id, c_id)
OR
CREATE INDEX pindex on sales(p_id)
CREATE INDEX eindex on sales(e_id)
CREATE INDEX cindex on sales(c_id)
I mostly run queries with joins on the relation table and the parent tables.
Which one is better depends on your actual queries.
One thing to understand is that when you join the table sales once in your query, it will only use one index for it (at the most). So you need to make sure an index is available that is most appropriate for the query.
If you join the sales table always to all three other tables (customer, product and employee) then a composite index is to be preferred, assuming that the engine will actually use it and not perform a table scan.
The order of the fields in the composite index is important when it comes to the order of the results. For instance, if your query is going to group the results by product (first), and then order the details per customer, you could benefit from an index that has the product id first, and the customer id as second.
But it may also be that the engine decides that it is better to start scanning the table sales first and then join in the other three tables using their respective primary key indexes. In that case no index is used that exists on the sales table.
The only way to find out is to get the execution plan of your query and see which indexes will be used when they are defined.
If you only have one query on the sales table, there is no need to have several indexes. But more likely you have several queries which output completely different results, with different field selections, filters, groupings, ...Etc.
In that case you may need several indexes, some of which will serve for one type of query, and others for others. Note that what you propose is not mutually exclusive. You could maybe benefit from several composite indexes, which just have a different order of fields.
Obviously, a multitude of indexes will slow down data changes in those tables, so you need to consider that trade-off as well.
Note that an index on a compound key can only be used if you query on a leftmost prefix of its columns: the first column, the first and second, the first, second and third, and so on. So querying on p_id, or on p_id and e_id (or e_id and p_id, since the order in the WHERE clause does not matter), can use the index. Indeed, any query that filters on p_id is a candidate for this index.
However, if you query your sales table on e_id or c_id, or any combination of these two, cmpindex cannot be used to seek, and unless another suitable index exists a full table scan will be performed.
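The prefix rule can be checked with EXPLAIN. This sketch assumes the sales table and cmpindex from the question already exist; the literal values are arbitrary, and on a small or empty table the optimizer may choose differently.

```sql
-- Filters on the leading column: the composite index is a candidate.
EXPLAIN SELECT * FROM sales WHERE p_id = 1;

-- The order of conditions in WHERE does not matter; this is still the (p_id, e_id) prefix.
EXPLAIN SELECT * FROM sales WHERE e_id = 2 AND p_id = 1;

-- All three columns: the whole key can be used.
EXPLAIN SELECT * FROM sales WHERE p_id = 1 AND e_id = 2 AND c_id = 3;

-- No condition on p_id: the composite index cannot be used to seek to the rows.
EXPLAIN SELECT * FROM sales WHERE e_id = 2 AND c_id = 3;
```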
One benefit of having an index on each foreign key (a non-unique index, as there could be multiple sales of the same product, or by the same employee, or to the same customer, leading to duplicate entries in the index) is that the query optimizer has the option of using the index to reduce the number of rows returned, and then doing a sequential search through the result set.
E.g. if the query is a search on sales of a particular product to a particular customer (regardless of employee) and you have a million sales, the foreign key index cindex could be used to return 20 sales items to that particular customer, and that result set could be very efficiently searched sequentially to find which of these sales were for a particular product.
If the search was performed on Product and pindex was used, the result set may be 10,000 rows (all sales of that product), which would have to be sequentially searched to find the sales of that product to a particular customer, leading to a very inefficient query.
I believe that the statistics kept for a table (used by the optimizer) keep track of the average number of rows that will be returned for a query using each index, so the optimizer will be able to work out that cindex should be used rather than pindex in the examples above. Alternatively, you can give hints on your queries to specify that a particular index be used.
It is, obviously, important to refresh the table statistics on a regular basis (ANALYZE TABLE in MySQL; UPDATE STATISTICS in SQL Server), as the execution plan would use pindex in the example above if there were, on average, only 10 sales of each product.
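In MySQL terms, refreshing the statistics and hinting an index might look like this. The sketch assumes the sales table with the pindex, eindex and cindex indexes from the question; the literal values are arbitrary.

```sql
-- Refresh the index statistics the optimizer relies on.
ANALYZE TABLE sales;

-- Inspect the estimated number of distinct values per index (Cardinality column).
SHOW INDEX FROM sales;

-- Suggest cindex for a search by customer and product.
-- USE INDEX is only a hint; FORCE INDEX is the stronger form.
SELECT *
FROM sales USE INDEX (cindex)
WHERE c_id = 42 AND p_id = 7;
```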
If your queries search sales through each of the parent tables independently, then you must create a separate index for each.
If that's not necessary then you can go for composite.
As HoneyBadger commented, you already have a composite index, since your primary key is itself an index.
In general, you should use a single index for each column whenever you think you will have queries involving each field by itself.
As stated here, when you have a composite index, it can work with queries involving all fields, or with queries involving the first field (in order), the first and the second, or the first,second and third together. It won't be used in queries involving only the second and third field.
The other answers are missing an important point. When you declare a foreign key in MySQL, it creates an index on the column. This is not (necessarily) true in other databases, but it is true in MySQL.
So, the declaration automatically creates these indexes:
CREATE INDEX pindex on sales(p_id);
CREATE INDEX eindex on sales(e_id);
CREATE INDEX cindex on sales(c_id);
(These indexes are very handy for dealing with cascading constraints and maintaining the data integrity based on the foreign key.)
If you happen to have also declared an index on sales(p_id, e_id, c_id, amount), then the first of the indexes is not needed -- it is a subset of this index. However, the other two are needed.
Is this index needed? As mentioned in other questions, that depends on the queries you want to use the index for. I recommend starting with the documentation on this subject to understand how the indexes get used.
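One way to see which indexes the foreign key declarations actually produce is to create the tables and list the indexes. The parent table names and all column types here are assumptions; `ammount` is spelled as in the question.

```sql
CREATE TABLE products  (p_id INT NOT NULL PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE employees (e_id INT NOT NULL PRIMARY KEY) ENGINE=InnoDB;
CREATE TABLE customers (c_id INT NOT NULL PRIMARY KEY) ENGINE=InnoDB;

CREATE TABLE sales (
    p_id    INT NOT NULL,
    e_id    INT NOT NULL,
    c_id    INT NOT NULL,
    ammount DECIMAL(10,2),
    PRIMARY KEY (p_id, e_id, c_id),
    FOREIGN KEY (p_id) REFERENCES products  (p_id),
    FOREIGN KEY (e_id) REFERENCES employees (e_id),
    FOREIGN KEY (c_id) REFERENCES customers (c_id)
) ENGINE=InnoDB;

-- Lists the primary key plus any indexes MySQL added for the foreign keys.
SHOW INDEX FROM sales;
```

Because the primary key already starts with p_id, MySQL can typically reuse it for that foreign key and only add separate indexes for e_id and c_id, which matches the point above about the first index being redundant.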
I have a table with almost 20 fields, several of which are foreign keys that MySQL has already indexed. Now I want to create a multi-column index that contains 3 of the FK fields.
My first attempt was based on columns:
ALTER TABLE `Add`
Add INDEX `IX_Add_ON_IDCat_IDStatus_IDModeration_DateTo_DateAdded`
(`IDCategory`,`IDStatus`,`IDModeration`,`DateTo`,`DateAdded`);
But I think it would be better to build an index on the existing indexes instead of on the columns. My attempt at that, shown below, failed with the error: Error Code: 1072. Key column 'FK_Add_Category' doesn't exist in table
ALTER TABLE `Add`
Add INDEX `IX_Add_ON_IDCat_IDStatus_IDModeration_DateTo_DateAdded`
(`FK_Add_Category`,`FK_Add_AddStatus`,`FK_Add_AddModeration`,
`IX_Add_DateTo`,`IX_Add_DateAdded`);
My question: is it possible to add an index on existing indexes (the FK indexes in my case), or is creating an index on columns the only way? If it is possible, how do I create it?
An index is an ordered list of values. It is used to make it more efficient to find rows in the table.
Think about the common, real-life, example of INDEX(last_name, first_name). It makes it easy to look up someone if you have their last name and first name. And sort of easy if you have only their last name.
But it is useless if all you have is their first name.
FOREIGN KEYs necessitate a lookup. Apparently you have a FK to AddStatus, since I see FK_Add_AddStatus. That FK generated a lookup for AddStatus. Think of that as being like a separate index on first_name. It is totally separate from the index on last_name & first_name.
5 columns is usually too many to put into a single index.
MySQL generally uses only one index per table for a given SELECT.
So, now, I ask, what SELECT might use that 5-column index? Please show us it. We can discuss whether it is useful, and whether the columns are in the optimal order.
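As a starting point for that discussion, here is a sketch of how to list what the existing indexes contain, followed by one query shape that a 5-column index in that order could serve. The query and its literal values are purely hypothetical; only the table and column names come from the question.

```sql
-- Every index is defined on columns; this lists them per index
-- (see Key_name, Seq_in_index and Column_name in the output).
SHOW INDEX FROM `Add`;

-- Hypothetical query: equality on the three FK columns, then a range on DateTo.
EXPLAIN
SELECT IDCategory, IDStatus, IDModeration, DateTo, DateAdded
FROM `Add`
WHERE IDCategory = 3
  AND IDStatus = 1
  AND IDModeration = 2
  AND DateTo >= '2024-01-01';
```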
I'm still trying to get my head around the best way to use INDEXES in MySQL. How do you know when to merge them together and when to have them separate?
Below are the indexes from the WordPress posts table. See how post_name, post_parent and post_author are separate entries? And then they have type_status_date which is a mixture of 4 fields?
http://img215.imageshack.us/img215/5976/screenshot20120426at431.png
I don't understand the logic behind this? Can anyone enlighten me?
Going to be a bit of a long answer but here we go. Please note I am not going to deal with the differences in database engines here (MyISAM and InnoDB have distinct ways of implementing what I am trying to describe).
The first thing you have to understand about an index is that it is a separate data structure stored on disk. Normally this is a b-tree data structure containing the column(s) that you have indexed, and it also contains a pointer to the row in the table (this pointer is normally the primary key).
The only index that is stored with the data is the primary key index. Thus a primary key index IS the table.
Lets assume you have following table definition.
CREATE TABLE `Student` (
`StudentNumber` INT NOT NULL ,
`Name` VARCHAR(32) NULL ,
`Surname` VARCHAR(32) NULL ,
`StudentEmail` VARCHAR(32) NULL ,
PRIMARY KEY (`StudentNumber`) );
Since we have a primary key on StudentNumber, there will be an index containing the primary key together with the other columns. If you looked at the data in the index, you would probably see something like this.
1 , John ,Doe ,Jdoe#gmail.com
As you can see this is the table data once again showing you that the primary key index IS the table.
The StudentNumber column is indexed, which allows you to search on it efficiently; the rest of the data is stored with the key. Thus, if you ran the following query:
SELECT * FROM Student WHERE StudentNumber=1
MySQL would use the primary index to quickly find the row and then read the data stored with the indexed column. Since there is an index, MySQL can use it to do an efficient binary seek operation on the b-tree.
Also when it comes to retrieving the data after doing the search MySQL can read the data from the index thus we are using 1 operation in the index to retrieve the data. Now if I ran the following query:
SELECT * FROM Student WHERE Name ='Joe'
MySQL would check if there is an index that it could use to speed the query up. However, in my case there is no index on Name, so MySQL would do a sequential read from the table, one row at a time, from the first row to the last.
At each row it would evaluate the row against the where clause and return matching row. So basically it reads the primary key index from top to bottom. Remember the primary key index is the table.
If I ran the following statement:
ALTER TABLE `TimLog`.`student`
ADD INDEX `ix_name` (`Name` ASC) ;
ALTER TABLE `TimLog`.`student`
ADD INDEX `ix_surname` (`Surname` ASC) ;
MySQL would create new indexes on the Student table. This will be stored away from the table on disk and the data inside would look something like this:
Data in ix_Name
John, 1 <--PRIMARY KEY VALUE
Data in ix_Surname
Doe, 1 <--PRIMARY KEY VALUE
Notice the data in the ix_Name index is the name and the primary key value. Great so if I ran the previous select statement MySQL would then read the ix_name index and get the primary key value for matching items and then use the primary key index to get the rest of the data.
So the number of operations to get the data from the index is 2. The matching rows are found in the index and then a lookup happens on the primary key to get the row data out.
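The difference between the two-step lookup and an index-only read can be seen with EXPLAIN. This is a sketch assuming the Student table and the ix_name index defined above, on InnoDB, where the secondary index carries the primary key value.

```sql
-- Needs Surname and StudentEmail, which ix_name does not hold:
-- an index search followed by a primary key lookup for each match.
EXPLAIN SELECT * FROM Student WHERE Name = 'John';

-- Needs only Name and StudentNumber, both stored in ix_name:
-- the query can be answered from the index (look for "Using index" in Extra).
EXPLAIN SELECT StudentNumber, Name FROM Student WHERE Name = 'John';
```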
You now have the following query:
SELECT * FROM Student WHERE Name='John' AND surname ='Doe'
Here MySQL can't use both indexes, as it would be a waste of operations. If MySQL had to use both indexes in this query, the following would happen (this should not happen).
1 Find in the ix_Name the rows with the value John
2 Read the primary key that matches to get the row data
3 Store the matching results
4 Find in the ix_surname the rows with the value Doe
5 Read the primary key that matches to get row data.
6 Store the matching results
7 Take the Name results and Surname results and merge them
8 Return query results.
This is really a waste of IO, as MySQL would then read the table twice. Basically, using one index would be better than trying to use two (I will explain in a moment why). MySQL will choose one index to use in this simple query.
So how does MySQL decide on which index to use?
MySQL keeps statistics about indexes internally. These statistics basically tell MySQL how unique an index is. So, for the sake of argument, let's say the surname index (ix_surname) was more unique than the name index (ix_name); MySQL would then use the surname index (ix_surname).
Thus query retrieval would be like this:
1 Use the ix_surname and find rows that match the value Doe
2 Read the primary key and apply the filter for the value John on the actual column data in the row.
3 Return the matched row.
As you can see, the number of operations in this search is much smaller. I have oversimplified a lot of the technical detail. Indexing is an interesting thing to master, but you have to look at it from the perspective of: how do I get the data with the minimal amount of IO?
Hope it is as clear as mud now!
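For the two-column query in this walkthrough, a single composite index is another option worth testing. The index name is made up, and putting Surname first is only an assumption that it is the more selective column, as in the example above.

```sql
-- One index that holds both filter columns (hypothetical name).
ALTER TABLE Student
    ADD INDEX ix_surname_name (Surname, Name);

-- Both conditions can now be resolved inside one index before
-- the primary key is read for the remaining columns.
EXPLAIN
SELECT * FROM Student
WHERE Name = 'John' AND Surname = 'Doe';
```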
MySQL cannot normally use more than one index at a time. That means, for instance, that when you have a query that filters or sorts on two fields you put them both into the same index.
WordPress likely has a common query that filters and/or sorts on post_type, post_status and post_date. Making an educated guess as to what they stand for, this would likely be the core query for WordPress's Post listing pages. So the three fields are put into the same index.