I have a table we will be of 15-30k in size eventually not more.
I have only two columns in that table one is id and other is status
We will have insert queries ofc
We will have update query but not on id
We will have delete query
My question is should I create index on the column id ?
Will index be useful for a table having 15k-30k rows ? Or will it be negligible ?
Also I am concerned about the increase in cost of insert queries? Will it be worth to add index on id ? Considering the small table size will it be any faster or the effect will be negligible ?
If effect will be negligible, I should not add index to it right as it will increase the insert queries cost right ?
If your id column is a PRIMARY KEY, then it's already an index and there is no need to create a new one.
If no primary key is defined, it's best to get used of creating one for each table.
Without any index, MySQL has to start with the first row and go through the whole table to find the relevant rows.
Indexes make it possible to find data much faster, even on tables with few data.
MySQL 8.0 Reference Manual:
https://dev.mysql.com/doc/refman/8.0/en/mysql-indexes.html
Related
I was wondering how would mysql act if i partition a table by date and then have some select or update queries by primary key ?
is it going to search all partitions or query optimizer knows in which partition the row is saved ?
What about other unique and not-unique indexed columns ?
Background
Think of a PARTITIONed table as a collection of virtually independent tables, each with its own data BTree and index BTree(s).
All UNIQUE keys, including the PRIMARY KEY must include the "partition key".
If the partition key is available in the query, the query will first try to do "partition pruning" to limit the number of partitions to actually look at. Without that info, it must look at all partitions.
After the "pruning", the processing goes to each of the possible partitions, and performs the query.
Select, Update
A SELECT logically does a UNION ALL of whatever was found in the non-pruned partitions.
An UPDATE applies its action to each non-pruned partitions. No harm is done (except performance) by the updates that did nothing.
Opinion
In my experience, PARTITIONing often slows thing down due to things such as the above. There are a small number of use cases for partitioning: http://mysql.rjweb.org/doc.php/partitionmaint
Your specific questions
partition a table by date and then have some select or update queries by primary key ?
All partitions will be touched. The SELECT combines the one result with N-1 empty results. The UPDATE will do one update, plus N-1 useless attempts to update.
An AUTO_INCREMENT column must be the first column in some index (not necessarily the PK, not necessarily alone). So, using the id is quite efficient in each partition. But that means that it is N times as much effort as in a non-partitioned table. (This is a performance drag for partitioning.)
I am creating a table which will store around 100million rows in MySQL 5.6 using InnoDB storage engine. This table will have a foreign key that will link to another table with around 5 million rows.
Current Table Structure:
`pid`: [Foreign key from another table]
`price`: [decimal(9,2)]
`date`: [date field]
and every pid should have only one record for a date
What is the best way to create indexes on this table?
Option #1: Create Primary index on two fields pid and date
Option #2: Add another column id with AUTO_INCREMENT and primary index and create a unique index on column pid and date
Or any other option?
Only select query i will be using on this table is:
SELECT pid,price,date FROM table WHERE pid = 123
Based on what you said (100M; the only query is...; InnoDB; etc):
PRIMARY KEY(pid, date);
and no other indexes
Some notes:
Since it is InnoDB, all the rest of the fields are "clustered" with the PK, so a lookup by pid is acts as if price were part of the PK. Also WHERE pid=123 ORDER BY date would be very efficient.
No need for INDEX(pid, date, price)
Adding an AUTO_INCREMENT gains nothing (except a hint of ordering). If you needed ordering, then an index starting with date might be best.
Extra indexes slow down inserts. Especially UNIQUE ones.
Either method is fine. I prefer having synthetic primary keys (that is, the auto-incremented version with the additional unique index). I find that this is useful for several reasons:
You can have a foreign key relationship to the table.
You have an indicator of the order of insertion.
You can change requirements, so if some pids allows two values per day or only one per week, then the table can support them.
That said, there is additional overhead for such a column. This overhead adds space and a small amount of time when you are accessing the data. You have a pretty large table, so you might want to avoid this additional effort.
I would try with an index that attempts to cover the query, in the hope that MySQL has to access to the index only in order to get the result set.
ALTER TABLE `table` ADD INDEX `pid_date_price` (`pid` , `date`, `price`);
or
ALTER TABLE `table` ADD INDEX `pid_price_date` (`pid` , `price`, `date`);
Choose the first one if you think you may need to select applying conditions over pid and date in the future, or the second one if you think the conditions will be most probable over pid and price.
This way, the index has all the data the query needs (pid, price and date) and its indexing on the right column (pid)
By the way, always use EXPLAIN to see if the query planner will really use the whole index (take a look at the key and keylen outputs)
Let us consider I have a table with 60 columns , I need to perform all kind of queries on that table and need to join that table with other tables as well. And I almost using all rows for searching data in that table including other tables. This table is the like a primary table(like a primary key) in the database. So all table are in relation with this table.
By considering the above scenario can I create index on each column on the table (60 columns )
,is it good practice ?
In single sentence:
Is it best practice to create index on each column in a table ?
What might happens if I create index on each column in a table?
Where index might be "Primary key", "unique key" or "index"
Please comment, if this question is unclear for you people I will try to improve this question.
MySQL's documentation is pretty clear on this (in summary use indices on columns you will use in WHERE, JOIN, and aggregation functions).
Therefore there is nothing inherently wrong with creating an index on all columns in a table, even if it is 60 columns. The more indices there are the slower inserts and some updates will be because MySQL has to create the keys, but if you don't create the indices MySQL has to scan the entire table if only non-indexed columns are used in comparisons and joins.
I have to say that I'm astonished that you would
Have a table with 60 columns
Have all of those columns used either in a JOIN or WHERE clause without dependency on any other column in the same table
...but that's a separate issue.
It is not best practice to create index on each column in a table.
Indexes are most commonly used to improve query performance when the column is used in a where clause.
Suppose you use this query a lot:
select * from tablewith60cols where col10 = 'xx';
then it would be useful to have an index on col10.
Note that primary keys by default have an index on them, so when you join the table with other tables you should use the primary key to join.
Adding an index means that the database has to maintain it, that means that it has to be updated, so the more writes you have, the more the index will be updated.
Creating index out of the box is not a good idea, create an index only when you need it (or when you can see the need in the future... only if it is pretty obvious)
creating more index in SQL will increase only search speed while you will get slowness of insert and update and also it will take more storage.
I have a query of the following form:
SELECT * FROM MyTable WHERE Timestamp > [SomeTime] AND Timestamp < [SomeOtherTime]
I would like to optimize this query, and I am thinking about putting an index on timestamp, but am not sure if this would help. Ideally I would like to make timestamp a clustered index, but MySQL does not support clustered indexes, except for primary keys.
MyTable has 4 million+ rows.
Timestamp is actually of type INT.
Once a row has been inserted, it is never changed.
The number of rows with any given Timestamp is on average about 20, but could be as high as 200.
Newly inserted rows have a Timestamp that is greater than most of the existing rows, but could be less than some of the more recent rows.
Would an index on Timestamp help me to optimize this query?
No question about it. Without the index, your query has to look at every row in the table. With the index, the query will be pretty much instantaneous as far as locating the right rows goes. The price you'll pay is a slight performance decrease in inserts; but that really will be slight.
You should definitely use an index. MySQL has no clue what order those timestamps are in, and in order to find a record for a given timestamp (or timestamp range) it needs to look through every single record. And with 4 million of them, that's quite a bit of time! Indexes are your way of telling MySQL about your data -- "I'm going to look at this field quite often, so keep an list of where I can find the records for each value."
Indexes in general are a good idea for regularly queried fields. The only downside to defining indexes is that they use extra storage space, so unless you're real tight on space, you should try to use them. If they don't apply, MySQL will just ignore them anyway.
I don't disagree with the importance of indexing to improve select query times, but if you can index on other keys (and form your queries with these indexes), the need to index on timestamp may not be needed.
For example, if you have a table with timestamp, category, and userId, it may be better to create an index on userId instead. In a table with many different users this will reduce considerably the remaining set on which to search the timestamp.
...and If I'm not mistaken, the advantage of this would be to avoid the overhead of creating the timestamp index on each insertion -- in a table with high insertion rates and highly unique timestamps this could be an important consideration.
I'm struggling with the same problems of indexing based on timestamps and other keys. I still have testing to do so I can put proof behind what I say here. I'll try to postback based on my results.
A scenario for better explanation:
timestamp 99% unique
userId 80% unique
category 25% unique
Indexing on timestamp will quickly reduce query results to 1% the table size
Indexing on userId will quickly reduce query results to 20% the table size
Indexing on category will quickly reduce query results to 75% the table size
Insertion with indexes on timestamp will have high overhead **
Despite our knowledge that our insertions will respect the fact of have incrementing timestamps, I don't see any discussion of MySQL optimisation based on incremental keys.
Insertion with indexes on userId will reasonably high overhead.
Insertion with indexes on category will have reasonably low overhead.
** I'm sorry, I don't know the calculated overhead or insertion with indexing.
If your queries are mainly using this timestamp, you could test this design (enlarging the Primary Key with the timestamp as first part):
CREATE TABLE perf (
, ts INT NOT NULL
, oldPK
, ... other columns
, PRIMARY KEY(ts, oldPK)
, UNIQUE (oldPK)
) ENGINE=InnoDB ;
This will ensure that the queries like the one you posted will be using the clustered (primary) key.
Disadvantage is that your Inserts will be a bit slower. Also, If you have other indices on the table, they will be using a bit more space (as they will include the 4-bytes wider primary key).
The biggest advantage of such a clustered index is that queries with big range scans, e.g. queries that have to read large parts of the table or the whole table will find the related rows sequentially and in the wanted order (BY timestamp), which will also be useful if you want to group by day or week or month or year.
The old PK can still be used to identify rows by keeping a UNIQUE constraint on it.
You may also want to have a look at TokuDB, a MySQL (and open source) variant that allows multiple clustered indices.
I run the following query on my database :
SELECT e.id_dernier_fichier
FROM Enfants e JOIN FichiersEnfants f
ON e.id_dernier_fichier = f.id_fichier_enfant
And the query runs fine. If I modifiy the query like this :
SELECT e.codega
FROM Enfants e JOIN FichiersEnfants f
ON e.id_dernier_fichier = f.id_fichier_enfant
The query becomes very slow ! The problem is I want to select many columns in table e and f, and the query can take up to 1 minute ! I tried different modifications but nothing works. I have indexes on id_* also on e.codega. Enfants has 9000 lines and FichiersEnfants has 20000 lines. Any suggestions ?
Here are the info asked (sorry not having shown them from the beginning) :
The difference in performance is possibly due to e.id_dernier_fichier being in the index used for the JOIN, but e.codega not being in that index.
Without a full definition of both tables, and all of their indexes, it's not possible to tell for certain. Also, including the two EXPLAIN PLANs for the two queries would help.
For now, however, I can elaborate on a couple of things...
If an INDEX is CLUSTERED (this also applies to PRIMARY KEYs), the data is actually physically stored in the order of the INDEX. This means that knowing you want position x in the INDEX also implicity means you want position x in the TABLE.
If the INDEX is not clustered, however, the INDEX is just providing a lookup for you. Effectively saying position x in the INDEX corresponds to position y in the TABLE.
The importance here is when accessing fields not specified in the INDEX. Doing so means you have to actually go to the TABLE to get the data. In the case of a CLUSTERED INDEX, you're already there, the overhead of finding that field is pretty low. If the INDEX isn't clustered, however, you effectifvely have to JOIN the TABLE to the INDEX, then find the field you're interested in.
Note; Having a composite index on (id_dernier_fichier, codega) is very different from having one index on just (id_dernier_fichier) and a seperate index on just (codega).
In the case of your query, I don't think you need to change the code at all. But you may benefit from changing the indexes.
You mention that you want to access many fields. Putting all those fields in a composite index is porbably not the best solution. Instead you may want to create a CLUSTERED INDEX on (id_dernier_fichier). This will mean that once the *id_dernier_fichier* has been located, you're already in the right place to get all the other fields as well.
EDIT Note About MySQL and CLUSTERED INDEXes
13.2.10.1. Clustered and Secondary Indexes
Every InnoDB table has a special index called the clustered index where the data for the rows is stored:
If you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index.
If you do not define a PRIMARY KEY for your table, MySQL picks the first UNIQUE index that has only NOT NULL columns as the primary key and InnoDB uses it as the clustered index.
If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.