InnoDB Locking - Does record lock use indexes? - mysql

https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html#innodb-intention-locks
Record Locks
A record lock is a lock on an index record. For example, SELECT c1
FROM t WHERE c1 = 10 FOR UPDATE; prevents any other transaction from
inserting, updating, or deleting rows where the value of t.c1 is 10.
Record locks always lock index records, even if a table is defined
with no indexes. For such cases, InnoDB creates a hidden clustered
index and uses this index for record locking. See Section 15.6.2.1,
“Clustered and Secondary Indexes”.
From what I understand, an index is a data structure: behind the scenes it looks like a small table in which each record holds the primary key of the original record, the page where the original record is located in the original table, and some other columns.
So does "index record" refer to a "node" of that index data structure?
And does this mean that record locks use indexes "by default" to provide better performance?

I guess that to understand that sentence, you need to know that InnoDB always stores table data in B-trees, i.e. in indexes; see Clustered and Secondary Indexes:
Each InnoDB table has a special index called the clustered index that stores row data.
[...]
If a table has no PRIMARY KEY or suitable UNIQUE index, InnoDB generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column that contains row ID values.
So this index exists anyway. The sentence "For such cases, InnoDB creates a hidden clustered index and uses this index for record locking" implies that the index is created just for locking, which might throw you a bit off track.
So to answer your question: MySQL locks the index record not because that would perform better, but because "locking the record" and "locking the entry in the clustered index" are equivalent.
In addition, MySQL can and will also place locks on secondary indexes. These are the data structures you describe: they point to a record in the original table by storing its primary key (or the hidden row ID of GEN_CLUST_INDEX). Note, however, that InnoDB does not need "the page where the original record is located" for this.
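
A toy model may make the equivalence easier to see. The sketch below is not InnoDB code: the class ToyTable and all of its fields are made up for illustration. It assumes only what the answer above states, namely that rows live in a clustered index keyed by the primary key or by a generated row ID, so a record lock can be modelled as an entry keyed by that index key.

```python
import itertools


class ToyTable:
    """Toy model: rows live in a clustered index keyed by PK or a hidden row ID."""

    def __init__(self, primary_key=None):
        self.primary_key = primary_key       # column name, or None if no PK is declared
        self._row_ids = itertools.count(1)   # stand-in for the hidden row ID counter
        self.clustered = {}                  # clustered index: key -> row
        self.locks = {}                      # record locks: clustered key -> owning transaction

    def insert(self, row):
        # Without a declared PK, the row still gets a key in the clustered index.
        key = row[self.primary_key] if self.primary_key else next(self._row_ids)
        self.clustered[key] = row
        return key

    def lock_record(self, key, txn):
        """Lock the index record; return False if another transaction holds it."""
        owner = self.locks.setdefault(key, txn)
        return owner == txn


no_pk = ToyTable()                        # table declared without any index
k = no_pk.insert({"c1": 10})              # keyed by the generated row ID
got_lock_t1 = no_pk.lock_record(k, "T1")
got_lock_t2 = no_pk.lock_record(k, "T2")  # refused: T1 already holds the record
```

In this model there is no separate "row" to lock apart from the clustered index entry, which is the point the answer makes.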

Related

how is consistency ensured when using indexes?

For select statements like
select * from table where indexed_col='abc';
SQL would go to the index table, fetch the row address, and return the required row.
But what about DML statements like
update table set indexed_col='abc' where condition;
How is consistency ensured between the table and the index table?
MySQL updates all indexes that include the column indexed_col when you update that column.
It must update the clustered index (aka primary key index) of course. The columns of a given row are stored in the leaf node of the clustered index, so changing any column requires updating that index.
Other unique indexes on the same table that include the updated column must be updated at the same time. In other words, when you execute UPDATE, the time it takes for that statement to execute includes the time to update the clustered index and also any unique indexes that include the column indexed_col.
For non-unique secondary indexes, MySQL's default storage engine InnoDB uses a change buffer, which is a temporary list of pending changes to those indexes (it is used when the affected index page is not already in the buffer pool). When you update the column indexed_col, MySQL adds an entry to the change buffer for each index that column is part of. Then it considers the execution of your UPDATE done and returns control to the client.
If you subsequently do a SELECT query as you show, MySQL checks both the table's indexes and the change buffer. Any entries in the change buffer for that index take priority, since they reflect more recent changes.
Eventually, MySQL runs a background thread to merge change buffer entries into the respective index.
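
The read path described above can be sketched in a few lines. This is only a model of the description in this answer, not of InnoDB internals: the class ToySecondaryIndex, its pages and pending fields, and the "insert"/"delete" operation names are all invented for the example.

```python
class ToySecondaryIndex:
    """Toy model of a secondary index whose pages lag behind a change buffer."""

    def __init__(self):
        self.pages = {}      # merged index entries: indexed value -> set of primary keys
        self.pending = []    # buffered changes (op, value, pk), oldest first

    def buffer_change(self, op, value, pk):
        # op is "insert" or "delete"; the change is recorded, not applied.
        self.pending.append((op, value, pk))

    def lookup(self, value):
        # Start from the merged entries, then replay buffered changes in order,
        # so the more recent buffered changes take priority.
        result = set(self.pages.get(value, set()))
        for op, v, pk in self.pending:
            if v != value:
                continue
            if op == "insert":
                result.add(pk)
            else:
                result.discard(pk)
        return result

    def merge(self):
        # Background merge: apply every buffered change to the index pages.
        for op, value, pk in self.pending:
            entries = self.pages.setdefault(value, set())
            if op == "insert":
                entries.add(pk)
            else:
                entries.discard(pk)
        self.pending.clear()


idx = ToySecondaryIndex()
idx.pages = {"xyz": {1}}                   # row 1 currently has indexed_col = 'xyz'
# UPDATE ... SET indexed_col = 'abc' for row 1: drop the old entry, add the new one.
idx.buffer_change("delete", "xyz", 1)
idx.buffer_change("insert", "abc", 1)
before_merge = (idx.lookup("abc"), idx.lookup("xyz"))
idx.merge()
after_merge = (idx.lookup("abc"), idx.lookup("xyz"))
```

A lookup gives the same answer before and after the merge, which is what keeps the index consistent with the table from the client's point of view.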

What is the default order of a MySQL Table before creating a Clustered Index?

So I am just learning about clustered/nonclustered indexes.
Now I read that clustered indexes order the data physically by, e.g., the primary key.
But why would this even be necessary? Isn't the table ordered by the ID (Primary Key) by default? Because you start with record A (ID 1) then record B (ID 2) and so on. They are always sorted. Why is there a need for clustered indexes?
Tables are not sorted. While an auto-incremented ID is issued in ascending order, the DBMS is free to store the record wherever there is room on the disk. And if you query table data without an ORDER BY clause, you may get the rows in any old order.
An index on the ID can be used to find these rows quickly. It is very fast to find an ID in the index and the index tells you which row to read from the table.
If your table is all about finding a row by ID quickly, which is typical for mere lookup tables, say a table with all country names, you can instead make this a clustered index.
"Clustered index" simply means that the whole table data is inside the index structure, so instead of searching the index and then going to the table row, you get the row straight away. Oracle has come up with a better name for this in my opinion; they call this an "index organized table".
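
The difference between the two layouts can be sketched as follows. This is an illustration only: a plain Python list stands in for the unordered table, dictionaries stand in for the B-trees, and the names heap, id_index and clustered are made up for the example.

```python
# Table plus separate index: rows sit wherever there was room.
heap = [
    {"id": 3, "name": "France"},
    {"id": 1, "name": "Austria"},
    {"id": 2, "name": "Brazil"},
]
id_index = {row["id"]: pos for pos, row in enumerate(heap)}  # id -> position in the table


def find_via_index(row_id):
    pos = id_index[row_id]     # step 1: search the index
    return heap[pos]           # step 2: fetch the row from the table


# Clustered index: the index entries are the rows themselves.
clustered = {row["id"]: row for row in heap}


def find_clustered(row_id):
    return clustered[row_id]   # single step: the index entry is the row


heap_scan_order = [row["id"] for row in heap]   # storage order, not sorted by id
clustered_order = sorted(clustered)             # key order of the clustered index
```

Both lookups return the same row; the clustered variant simply skips the second hop.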

Is the primary key stored implicitly in other keys in mysql myisam engine?

My problem: imagine a table with millions of rows, like
CREATE TABLE a (
id INT PRIMARY KEY,
column2..,
column3..,
many other columns..
..
INDEX (column2)
);
and a query like this:
SELECT id FROM a WHERE column2 > 10000 LIMIT 1000 OFFSET 5000;
My question: does mysql only use the index "column2" (so the primary key id is implicitly stored as a reference in other indexes), or does it have to fetch all rows to get also the id, which is selected for output? In that case the query should be much faster with a key declared as:
INDEX column2(column2, id)
Short answer: No.
Long answer:
MyISAM, unlike InnoDB, has a "pointer" to the data in the leaf node of each index, including the PRIMARY KEY.
So, INDEX(col2) is essentially INDEX(col2, ptr). Ditto for INDEX(id) being INDEX(id, ptr).
The "pointer" is either a byte offset into the .MYD file (for DYNAMIC) or record number (for FIXED). In either case, the pointer leads to a "seek" into the .MYD file.
The pointer defaults to a 6-byte number, allowing for a huge number of rows. It can be changed by a setting (myisam_data_pointer_size), either to save space or to allow an even bigger number of rows.
For your particular query, INDEX(col2, id) is optimal and "covering". It is better than INDEX(col2) for MyISAM, but they are equivalent for InnoDB, since InnoDB implicitly has the PK in each secondary index.
The query will have to scan at least 5000+1000 rows, at least in the index's BTree.
Note that InnoDB's PRIMARY KEY is clustered with the data, but MyISAM's PRIMARY KEY is a separate BTree, just like other secondary indexes.
You really should consider moving to InnoDB; there is virtually no reason to use MyISAM today.
An index on column2 is required. Your suggestion with id in the index will prevent table scans and should be very efficient.
Furthermore, assuming that column2 is a continuous sequence, it is faster to do this:
SELECT id FROM a WHERE column2 > 15000 LIMIT 1000;
This is because, with OFFSET, MySQL still has to scan and discard the first 5000 matching records (it does not realize that you are actually offsetting column2).
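
A quick way to check that the two forms return the same rows is a small experiment. The sketch below uses Python's built-in sqlite3 module rather than MySQL, purely so that it is self-contained, and it says nothing about MySQL timings. It assumes column2 is a gap-free sequence from 1 to 20000, and it adds ORDER BY column2 (not present in the original queries) so that the results are deterministic. The id values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, column2 INTEGER)")
conn.execute("CREATE INDEX idx_column2 ON a (column2)")

# Assumption: column2 is a gap-free sequence 1..20000; id is an unrelated numbering.
conn.executemany(
    "INSERT INTO a (id, column2) VALUES (?, ?)",
    ((100000 + n, n) for n in range(1, 20001)),
)

# OFFSET form: the engine walks past 5000 matching entries before returning rows.
offset_rows = conn.execute(
    "SELECT id FROM a WHERE column2 > 10000 ORDER BY column2 LIMIT 1000 OFFSET 5000"
).fetchall()

# Seek form: the range condition jumps straight to the first wanted entry.
seek_rows = conn.execute(
    "SELECT id FROM a WHERE column2 > 15000 ORDER BY column2 LIMIT 1000"
).fetchall()

same_result = offset_rows == seek_rows
```

If column2 has gaps or duplicates, the two queries are no longer interchangeable, and the seek form needs the last column2 value actually seen on the previous page.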

Overhead of Composite Indexes

I have many tables where I have indexes on foreign keys, and clustered indexes which include those foreign keys. For example, I have a table like the following:
TABLE: Item
------------------------
id PRIMARY KEY
owner FOREIGN KEY
status
... many more columns
MySQL generates indexes for primary and foreign keys, but sometimes I want to improve query performance, so I'll create clustered or covering indexes. This leads to indexes with overlapping columns.
INDEXES ON: Item
------------------------
idx_owner (owner)
idx_owner_status (owner, status)
If I dropped idx_owner, future queries that would normally use idx_owner would just use idx_owner_status since it has owner as the first column in the index.
Is it worth keeping idx_owner around? Is there an additional I/O overhead to use idx_owner_status even though MySQL only uses part of the index?
Edit: I am really only interested in the way InnoDB behaves regarding indexes.
Short Answer
Drop the shorter index.
Long Answer
Things to consider:
Drop it:
Each INDEX is a separate BTree that resides on disk, so it takes space.
Each INDEX is updated (sooner or later) when you INSERT a new row or an UPDATE modifies an indexed column. This takes some CPU and I/O and buffer_pool space for the 'change buffer'.
Any functional use (as opposed to performance) for the shorter index can be performed by the longer one.
Don't drop it:
The longer index is bulkier than the shorter one. So it is less cacheable. So (in extreme situations) using the bulkier one in place of the shorter one could cause more I/O. A case that aggravates this: INDEX(int, varchar255).
It is very rare that this last item really outweighs the other items.
Bonus
A "covering" index is one that contains all the columns mentioned in a SELECT. For example:
SELECT status FROM tbl WHERE owner = 123;
This will touch only the BTree for INDEX(owner, status), thereby being noticeably faster than
SELECT status, foo FROM tbl WHERE owner = 123;
If you really need that query to be faster, then replace both of your indexes with INDEX(owner, status, foo).
PK in Secondary key
One more tidbit... In InnoDB, the columns of the PRIMARY KEY are implicitly appended to every secondary key. So, the three examples are really
INDEX(owner, id)
INDEX(owner, status, id)
INDEX(owner, status, foo, id)
More discussion in my blogs on composite indexes and index cookbook.
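
To see why the longer index can stand in for the shorter one, here is a small sketch. It is only a model: a sorted Python list of tuples stands in for the BTree, the sample rows are invented, and the helper names are hypothetical. The primary key is appended as the last component, as described above for InnoDB.

```python
from bisect import bisect_left

# Toy rows of Item: (id, owner, status)
rows = [(1, 7, "open"), (2, 7, "closed"), (3, 9, "open"), (4, 7, "open")]

# INDEX(owner, status) with the primary key appended, kept in sorted order.
idx_owner_status = sorted((owner, status, pk) for pk, owner, status in rows)


def entries_for_owner(index, owner):
    """Range scan that uses only the leading column of the composite index."""
    start = bisect_left(index, (owner,))   # first entry whose owner is >= the target
    found = []
    for entry in index[start:]:
        if entry[0] != owner:
            break                          # left the range for this owner
        found.append(entry)
    return found


def ids_for_owner(index, owner):
    # What a lookup through idx_owner would have produced: the primary keys.
    return [entry[-1] for entry in entries_for_owner(index, owner)]


def statuses_for_owner(index, owner):
    # Covering read: status comes straight from the index entries.
    return [entry[1] for entry in entries_for_owner(index, owner)]


owner7_ids = sorted(ids_for_owner(idx_owner_status, 7))
owner7_statuses = statuses_for_owner(idx_owner_status, 7)
owner8_ids = ids_for_owner(idx_owner_status, 8)
```

Because all entries for one owner are adjacent in the sorted order, a search on owner alone works on the (owner, status) index exactly as it would on an (owner) index, just over slightly wider entries.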

MySql Query very slow

I run the following query on my database :
SELECT e.id_dernier_fichier
FROM Enfants e JOIN FichiersEnfants f
ON e.id_dernier_fichier = f.id_fichier_enfant
And the query runs fine. If I modify the query like this:
SELECT e.codega
FROM Enfants e JOIN FichiersEnfants f
ON e.id_dernier_fichier = f.id_fichier_enfant
The query becomes very slow! The problem is that I want to select many columns from tables e and f, and the query can take up to 1 minute! I tried different modifications but nothing works. I have indexes on id_* and also on e.codega. Enfants has 9000 rows and FichiersEnfants has 20000 rows. Any suggestions?
Here is the requested info (sorry for not showing it from the beginning):
The difference in performance is possibly due to e.id_dernier_fichier being in the index used for the JOIN, but e.codega not being in that index.
Without a full definition of both tables, and all of their indexes, it's not possible to tell for certain. Also, including the two EXPLAIN PLANs for the two queries would help.
For now, however, I can elaborate on a couple of things...
If an INDEX is CLUSTERED (this also applies to PRIMARY KEYs), the data is actually physically stored in the order of the INDEX. This means that knowing you want position x in the INDEX also implicitly means you want position x in the TABLE.
If the INDEX is not clustered, however, the INDEX is just providing a lookup for you. Effectively saying position x in the INDEX corresponds to position y in the TABLE.
The importance here is when accessing fields not specified in the INDEX. Doing so means you have to actually go to the TABLE to get the data. In the case of a CLUSTERED INDEX, you're already there, so the overhead of finding that field is pretty low. If the INDEX isn't clustered, however, you effectively have to JOIN the TABLE to the INDEX, then find the field you're interested in.
Note: Having a composite index on (id_dernier_fichier, codega) is very different from having one index on just (id_dernier_fichier) and a separate index on just (codega).
In the case of your query, I don't think you need to change the code at all. But you may benefit from changing the indexes.
You mention that you want to access many fields. Putting all those fields in a composite index is probably not the best solution. Instead you may want to create a CLUSTERED INDEX on (id_dernier_fichier). This will mean that once the id_dernier_fichier has been located, you're already in the right place to get all the other fields as well.
EDIT Note About MySQL and CLUSTERED INDEXes
13.2.10.1. Clustered and Secondary Indexes
Every InnoDB table has a special index called the clustered index where the data for the rows is stored:
If you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index.
If you do not define a PRIMARY KEY for your table, MySQL picks the first UNIQUE index that has only NOT NULL columns as the primary key and InnoDB uses it as the clustered index.
If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.