how is consistency ensured when using indexes? - mysql

For select statements like
select * from table where indexed_col='abc';
MySQL would go to the index, fetch the row address, and return the required rows.
But what about dml statements like
update table set indexed_col='abc' where condition;
how is consistency ensured between the table and its indexes?

MySQL updates all indexes that include the column indexed_col when you update that column.
It must update the clustered index (aka primary key index) of course. The columns of a given row are stored in the leaf node of the clustered index, so changing any column requires updating that index.
Other unique indexes on the same table that include the updated column must be updated at the same time. In other words, when you execute UPDATE, the time it takes for that statement to execute includes the time to update the clustered index and also any unique indexes that include the column indexed_col.
For non-unique secondary indexes, MySQL's default storage engine InnoDB uses a change buffer, which is a temporary list of pending changes to those indexes. When you update the column indexed_col, MySQL adds an entry to the change buffer for each non-unique secondary index that column is part of. It then considers the execution of your UPDATE done, and returns control to the client.
If you subsequently run a SELECT query like the one you show, MySQL checks both the index and the change buffer. Any entries in the change buffer for that index take priority, since they reflect more recent changes.
Eventually, MySQL runs a background thread to merge change buffer entries into the respective index.
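A minimal sketch of which index maintenance happens synchronously and which can go through the change buffer (the table and column names here are made up for illustration):

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE t (
  id INT NOT NULL PRIMARY KEY,        -- clustered index
  indexed_col VARCHAR(20) NOT NULL,
  other_col VARCHAR(20),
  UNIQUE KEY uq_other (other_col),    -- unique: must be updated synchronously
  KEY idx_col (indexed_col)           -- non-unique: eligible for change buffering
) ENGINE=InnoDB;

-- This UPDATE synchronously maintains the clustered index (and any unique
-- index covering a changed column); the change to idx_col may only be
-- queued in the change buffer and merged into the index later.
UPDATE t SET indexed_col = 'abc' WHERE id = 1;

-- This server variable controls which kinds of secondary-index changes
-- InnoDB is allowed to buffer:
SHOW VARIABLES LIKE 'innodb_change_buffering';
```

Note that change buffering only applies when the affected secondary index pages are not already in the buffer pool; if they are cached, InnoDB updates them directly.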

Related

InnoDB Locking - Does record lock use indexes?

https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html#innodb-intention-locks
Record Locks
A record lock is a lock on an index record. For example, SELECT c1
FROM t WHERE c1 = 10 FOR UPDATE; prevents any other transaction from
inserting, updating, or deleting rows where the value of t.c1 is 10.
Record locks always lock index records, even if a table is defined
with no indexes. For such cases, InnoDB creates a hidden clustered
index and uses this index for record locking. See Section 15.6.2.1,
“Clustered and Secondary Indexes”.
An index is a data structure (behind the scenes it looks like a small table where each record contains a column with the primary key of the original record, and another column with the page where the original record is located in the original table, among other columns), from what I understand,
so index record refers to a "node" of that index which is a data structure?
So, you mean a record lock uses indexes "by default" to provide more performance?
I guess, to understand that sentence, you need to know that InnoDB always stores table data in B-trees, i.e. in indexes; see Clustered and Secondary Indexes:
Each InnoDB table has a special index called the clustered index that stores row data.
[...]
If a table has no PRIMARY KEY or suitable UNIQUE index, InnoDB generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column that contains row ID values.
So this index exists anyway, and the sentence "For such cases, InnoDB creates a hidden clustered index and uses this index for record locking", which implies the index is created just for locking, might throw you a bit off track.
So to answer your question: MySQL does not lock the index instead of the record because it would provide more performance, but because "locking the record" and "locking the entry in the clustered index" are equivalent.
In addition, MySQL can and will also place locks on secondary indexes. These are your data structures that point to a record in the original table by providing the primary key (or the GEN_CLUST_INDEX). But note that no "page where the original record is located" is needed for this (for InnoDB).
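As a sketch of that equivalence, a record lock taken through a secondary index can be observed in MySQL 8.0 via performance_schema (the table t mirrors the manual's example; the data is made up):

```sql
-- Session 1:
CREATE TABLE t (c1 INT, KEY (c1)) ENGINE=InnoDB;
INSERT INTO t VALUES (10), (20);
START TRANSACTION;
SELECT c1 FROM t WHERE c1 = 10 FOR UPDATE;  -- locks the index record for c1=10

-- Session 2 (blocks until session 1 commits, rolls back, or times out):
UPDATE t SET c1 = 11 WHERE c1 = 10;

-- Session 1: inspect the locks. They are reported against index records:
-- here the secondary index on c1, plus the corresponding clustered-index record.
SELECT engine, index_name, lock_type, lock_mode, lock_data
FROM performance_schema.data_locks;
```

The data_locks output makes the point concrete: there is no separate "row lock" object, only locks on index records in the clustered and secondary indexes.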

Determining partitioning key in range based partitioning of a MySQL Table

I've been researching for a while regarding database partitioning in MySQL. Since I have one ever-growing table in my DB, I thought of using partitioning as an effective tool to optimize it. I'm only interested in retaining recent data (say the last 6 months), and the table has a column named 'CREATED_AT' (TIMESTAMP, non-primary), so the approach which popped up in my mind is as follows:
Create a time-based range partition on the table by using 'CREATED_AT' as the partition key.
Run a DB-level event periodically and drop partitions which are obsolete (older than 6 months).
However, the partitioning can only be realized if I make the 'CREATED_AT' field part of the primary key. But doesn't it violate the primary key principle? Since the same field is non-unique and can have tons of rows with the same value, doesn't marking it as primary turn out to be an anti-pattern? Is there any workaround to achieve time-based range partitioning in this scenario?
This is a problem that prevents many MySQL users from using partitioning.
The column you use for your partitioning key must be in every PRIMARY KEY or UNIQUE KEY of the table. It doesn't have to be the only column in those keys (because keys can be multi-column), but it has to be part of every unique key.
Still, in many tables it would violate the logical design of the table. So partitioning is not practical.
You could grit your teeth and design a table with partitions that has a compromised design:
create table mytable (
  id bigint auto_increment not null,
  created_at datetime not null,
  primary key (id, created_at)
) partition by range columns (created_at) (
  partition p20190101 values less than ('2019-01-01'),
  partition p20190201 values less than ('2019-02-01'),
  partition p20190301 values less than ('2019-03-01'),
  partition p20190401 values less than ('2019-04-01'),
  -- etc...
  partition pMAX values less than (MAXVALUE)
);
I tested this table and there's no error when I define it. Even though this table technically allows multiple rows with the same id value (if they have different timestamps), in practice you can code your application to just let id values be auto-incremented and never change the id. As long as your code is the only application that inserts data, you can have reasonable assurance that the data doesn't contain multiple rows with the same id.
You might think you can add a secondary unique key constraint to enforce that id must be unique by itself. But this violates the partitioning rules:
mysql> alter table mytable add unique key (id);
ERROR 1503 (HY000): A UNIQUE INDEX must include all columns in the table's partitioning function
You just have to trust that your application won't insert invalid data.
Or else forget about using partitioning, and instead just add an index to the created_at column, and use incremental DELETE instead of using DROP PARTITION to prune old data.
The latter strategy is what I see used in almost every case. Usually, it's important to have the RDBMS enforce strict uniqueness on the id column. It's not safe to allow this uniqueness to be unenforced.
Re your comment:
Isn't dropping an entire partition a much cheaper operation than performing incremental deletes?
Yes and no.
DELETE can be rolled back, so it results in some overhead, like temporarily storing data in the rollback segment. On the other hand, it locks only the rows that match the index search.
Dropping a partition doesn't do rollback, so there are some steps it can skip. But it does an ALTER TABLE, so it needs to first acquire a metadata lock on the whole table. Any concurrent query, either read or write, will block that and be blocked by it.
Demo:
Open two MySQL client windows. In the first session do this:
mysql> START TRANSACTION;
mysql> SELECT * FROM mytable;
This holds a metadata lock on the table, which blocks things like ALTER TABLE.
In the second window:
mysql> ALTER TABLE mytable DROP PARTITION p20190101;
<pauses, waiting for the metadata lock held by the first session!>
You can even open a third session and do this:
mysql> SELECT * FROM mytable;
<also pauses>
The second SELECT is waiting behind the ALTER TABLE. They are both queued for the metadata lock.
If I commit the first SELECT, then the ALTER TABLE finally finishes:
mysql> ALTER TABLE mytable DROP PARTITION p20190101;
Query OK, 0 rows affected (6 min 25.25 sec)
That 6 min 25 sec isn't because it takes a long time to do the DROP PARTITION. It's because I had left my transaction uncommitted that long while writing this post.
Metadata lock waits don't time out the way InnoDB row lock waits do (those time out after 50 seconds by default). The default metadata lock timeout is 1 year! See https://dev.mysql.com/doc/refman/8.0/en/server-system-variables.html#sysvar_lock_wait_timeout
Statements like ALTER TABLE, DROP TABLE, RENAME TABLE, and even things like CREATE TRIGGER need to acquire a metadata lock.
So in some cases, depending on if you have long-running transactions holding onto metadata locks, it could be better for your concurrent throughput to use DELETE to remove data incrementally, even if it takes longer.
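For completeness, the two pruning strategies discussed above could be sketched like this (all object names are hypothetical; the event variant assumes the event scheduler is enabled):

```sql
-- Variant 1: a scheduled event that prunes with incremental DELETEs.
-- Requires SET GLOBAL event_scheduler = ON;
CREATE EVENT prune_old_rows
ON SCHEDULE EVERY 1 DAY
DO
  DELETE FROM mytable
  WHERE created_at < NOW() - INTERVAL 6 MONTH
  LIMIT 10000;  -- keep each run short to limit row locking and undo growth

-- Variant 2: drop the oldest partition. Fast, and skips undo logging,
-- but it is an ALTER TABLE and therefore needs the table's metadata lock.
ALTER TABLE mytable DROP PARTITION p20190101;
```

The LIMIT on the DELETE is the key to the "incremental" part: repeated small batches avoid one huge transaction holding row locks and rollback data for the whole purge.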

Will MySQL use Multiple-column index if I use columns in different order?

Reading the MySQL docs we see this example table with multiple-column index name:
CREATE TABLE test (
id INT NOT NULL,
last_name CHAR(30) NOT NULL,
first_name CHAR(30) NOT NULL,
PRIMARY KEY (id),
INDEX name (last_name,first_name)
);
It is explained with examples in which cases the index will or will not be utilized. For example, it will be used for such query:
SELECT * FROM test
WHERE last_name='Widenius' AND first_name='Michael';
My question is, would it work for this query (which is effectively the same):
SELECT * FROM test
WHERE first_name='Michael' AND last_name='Widenius';
I couldn't find any word about that in the documentation - does MySQL try to swap columns to find appropriate index or is it all up to the query?
It should be the same because (from the MySQL docs) the query optimizer works by looking at:
Each table index is queried, and the best index is used unless the
optimizer believes that it is more efficient to use a table scan. At
one time, a scan was used based on whether the best index spanned more
than 30% of the table, but a fixed percentage no longer determines the
choice between using an index or a scan. The optimizer now is more
complex and bases its estimate on additional factors such as table
size, number of rows, and I/O block size.
http://dev.mysql.com/doc/refman/5.7/en/where-optimizations.html
In some cases, MySQL can read rows from the index without even
consulting the data file.
and this should be your case:
Without ICP, the storage engine traverses the index to locate rows in
the base table and returns them to the MySQL server which evaluates
the WHERE condition for the rows. With ICP enabled, and if parts of
the WHERE condition can be evaluated by using only fields from the
index, the MySQL server pushes this part of the WHERE condition down
to the storage engine. The storage engine then evaluates the pushed
index condition by using the index entry and only if this is satisfied
is the row read from the table. ICP can reduce the number of times the
storage engine must access the base table and the number of times the
MySQL server must access the storage engine.
http://dev.mysql.com/doc/refman/5.7/en/index-condition-pushdown-optimization.html
For the two queries you stated, it will work the same.
However, for queries which have only one of the columns, the order of the index matters.
For example, this will use the index:
SELECT * FROM test WHERE last_name='Widenius';
But this won't:
SELECT * FROM test WHERE first_name='Michael';
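A quick way to verify this yourself is EXPLAIN against the test table from the question; the key column shows which index, if any, was chosen:

```sql
-- The optimizer matches predicates to index columns regardless of the
-- order they appear in the WHERE clause, so both forms can use the index.
EXPLAIN SELECT * FROM test
WHERE first_name='Michael' AND last_name='Widenius';
-- typically: key = name (same as with the conditions in the other order)

EXPLAIN SELECT * FROM test WHERE last_name='Widenius';
-- typically: key = name (last_name is the leftmost prefix of the index)

EXPLAIN SELECT * FROM test WHERE first_name='Michael';
-- typically: key = NULL (first_name alone is not a leftmost prefix)
```

On a very small table the optimizer may prefer a full scan even where the index is usable, so populate the table with some rows before drawing conclusions from EXPLAIN.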

Use of (INDEX(0)) in sql query

I run this query on SQL Server 2008:
SELECT *
FROM Dealers WITH (INDEX(0))
WHERE ID = 'rrsdsd'
But the execution plan shows that it used the clustered index of the table.
Why so?
What you are telling SQL Server is to execute the query not using any indexes. Therefore, I would expect the query plan to show the clustered index being used (because that is your data), but a scan being done and not a seek. Is this the case?
According to the documentation:
If a clustered index exists, INDEX(0) forces a clustered index scan and INDEX(1) forces a clustered index scan or seek.
This is exactly what you are seeing.
Documentation says it all...
INDEX (index_value [,... n ] ) | INDEX = ( index_value) The INDEX()
syntax specifies the names or IDs of one or more indexes to be used by
the query optimizer when it processes the statement. The alternative
INDEX = syntax specifies a single index value. Only one index hint per
table can be specified.
If a clustered index exists, INDEX(0) forces a clustered index scan and INDEX(1) forces a clustered index scan or seek. If no
clustered index exists, INDEX(0) forces a table scan and INDEX(1) is
interpreted as an error.
If multiple indexes are used in a single hint list, the duplicates are
ignored and the rest of the listed indexes are used to retrieve the
rows of the table. The order of the indexes in the index hint is
significant. A multiple index hint also enforces index ANDing, and the
query optimizer applies as many conditions as possible on each index
accessed. If the collection of hinted indexes do not include all
columns referenced by the query, a fetch is performed to retrieve the
remaining columns after the SQL Server Database Engine retrieves all
the indexed columns.
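Putting the quoted rules together against the Dealers table from the question (assuming its clustered index is on ID), the three forms below would typically produce a scan, a seek, and an optimizer-chosen plan respectively; compare their actual execution plans to confirm:

```sql
-- Forces a Clustered Index Scan: every row is read.
SELECT * FROM Dealers WITH (INDEX(0)) WHERE ID = 'rrsdsd';

-- Allows a Clustered Index Seek when the predicate matches the index key.
SELECT * FROM Dealers WITH (INDEX(1)) WHERE ID = 'rrsdsd';

-- No hint: the optimizer is free to pick the cheapest access path.
SELECT * FROM Dealers WHERE ID = 'rrsdsd';
```

In all three cases the plan mentions the clustered index, because in a clustered table the clustered index is the data; the hint only changes whether it is scanned or sought.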

MySql Query very slow

I run the following query on my database :
SELECT e.id_dernier_fichier
FROM Enfants e JOIN FichiersEnfants f
ON e.id_dernier_fichier = f.id_fichier_enfant
And the query runs fine. If I modifiy the query like this :
SELECT e.codega
FROM Enfants e JOIN FichiersEnfants f
ON e.id_dernier_fichier = f.id_fichier_enfant
The query becomes very slow! The problem is I want to select many columns in table e and f, and the query can take up to 1 minute! I tried different modifications but nothing works. I have indexes on the id_* columns and also on e.codega. Enfants has 9000 rows and FichiersEnfants has 20000 rows. Any suggestions?
Here is the info that was asked for (sorry for not including it from the beginning):
The difference in performance is possibly due to e.id_dernier_fichier being in the index used for the JOIN, but e.codega not being in that index.
Without a full definition of both tables, and all of their indexes, it's not possible to tell for certain. Also, including the two EXPLAIN PLANs for the two queries would help.
For now, however, I can elaborate on a couple of things...
If an INDEX is CLUSTERED (this also applies to PRIMARY KEYs), the data is actually physically stored in the order of the INDEX. This means that knowing you want position x in the INDEX also implicitly means you want position x in the TABLE.
If the INDEX is not clustered, however, the INDEX is just providing a lookup for you. Effectively saying position x in the INDEX corresponds to position y in the TABLE.
The importance here is when accessing fields not specified in the INDEX. Doing so means you have to actually go to the TABLE to get the data. In the case of a CLUSTERED INDEX, you're already there, so the overhead of finding that field is pretty low. If the INDEX isn't clustered, however, you effectively have to JOIN the TABLE to the INDEX, then find the field you're interested in.
Note: having a composite index on (id_dernier_fichier, codega) is very different from having one index on just (id_dernier_fichier) and a separate index on just (codega).
In the case of your query, I don't think you need to change the code at all. But you may benefit from changing the indexes.
You mention that you want to access many fields. Putting all those fields in a composite index is probably not the best solution. Instead you may want to create a CLUSTERED INDEX on (id_dernier_fichier). This will mean that once id_dernier_fichier has been located, you're already in the right place to get all the other fields as well.
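In MySQL/InnoDB specifically, you cannot declare a separate clustered index (the primary key is the clustered index, as the note below explains), so the usual way to get this effect for a handful of columns is a covering secondary index. A sketch using the question's column names (the index name is made up):

```sql
-- A secondary index that covers both the join column and the selected
-- column, so the query can be answered from the index alone.
ALTER TABLE Enfants
  ADD INDEX idx_fichier_codega (id_dernier_fichier, codega);

-- With this index, EXPLAIN for the slow query should show "Using index"
-- in the Extra column, meaning no lookup into the base table is needed:
EXPLAIN
SELECT e.codega
FROM Enfants e JOIN FichiersEnfants f
  ON e.id_dernier_fichier = f.id_fichier_enfant;
```

This scales poorly if you truly need many columns, though; at that point reading through the clustered index (the primary key) is the intended access path.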
EDIT Note About MySQL and CLUSTERED INDEXes
13.2.10.1. Clustered and Secondary Indexes
Every InnoDB table has a special index called the clustered index where the data for the rows is stored:
If you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index.
If you do not define a PRIMARY KEY for your table, MySQL picks the first UNIQUE index that has only NOT NULL columns as the primary key and InnoDB uses it as the clustered index.
If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.