What is stored in the leaf node of clustered index

I understand that the leaf node of a clustered index stores the table record itself, together with, say, the primary key.
But I found some articles stating that the primary key is stored with the block address of the real record instead of the record itself.
Could you tell me which is correct?
(1) store the block address
(2) store the real data

Be careful what you read. Be sure the article talks about "MySQL" and its main 'engine' "InnoDB".
"primary key is stored with block address of real record instead of real table record."
Several entire rows are stored in each leaf node (block) of the data's B+Tree. That BTree is ordered by the PRIMARY KEY, which is (obviously) part of the row.
The only "block addresses" are the links you have in both of your diagrams.
I vote for your number 2 diagram, with these provisos:
There is a 4-column row with id=6 and other columns of James, 37, LA.
The row with id=15 is not fully shown. That is, you left out the other 3 columns.
A "block" is 16KB and can hold between 1 and several hundred rows, depending on
size of rows,
whether rows have been deleted, leaving 'free' space,
etc.
(100 rows per block for either data or index is a simple Rule of Thumb.)
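To get a feel for the "1 to several hundred rows" range, here is a rough sketch in Python. The 16KB block size comes from the answer above; the average row sizes and the fill factor (a stand-in for page overhead and free space) are assumptions picked for illustration, not InnoDB constants.

```python
# Rough estimate of how many rows fit in one 16KB block.
# Assumptions: a fixed average row size, and a fill factor that stands in
# for per-page overhead and free space left by deletes or splits.

PAGE_SIZE = 16 * 1024  # bytes, the block size mentioned above


def rows_per_block(avg_row_bytes, fill_factor=0.7):
    """Estimate rows per block; a block always holds at least one row."""
    usable = int(PAGE_SIZE * fill_factor)
    return max(1, usable // avg_row_bytes)


for size in (50, 100, 1000, 8000):
    print(size, "bytes/row ->", rows_per_block(size), "rows/block")
```

With these made-up numbers a 100-byte row lands close to the 100-rows-per-block Rule of Thumb, while very wide rows drop to a handful per block.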

In the context of MySQL and InnoDB, from the official MySQL documentation:
https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html
Each InnoDB table has a special index called the clustered index that stores row data.
If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record.
Based on the facts above, especially the second one, I believe #2 is correct. My reasons are:
(1) It saves one I/O.
If the leaf node stored only a page address, one more I/O would be needed to fetch the record.
(2) It is easier to maintain.
If a page split happened and the leaf node stored only page addresses, the clustered index would have to update the stored address of every record that moved.
However, #1 has a point too: storing only an address is cheaper than storing the whole row, so each leaf could hold more index entries.
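Reason (1) can be shown with a toy model in Python. Dicts stand in for pages, "block-A" is a made-up address, and the column names are assumed (the first answer only gives the values 6, James, 37, LA). It is a sketch of the two diagrams, not of InnoDB's real page format.

```python
# Toy comparison of the two layouts. Each dict lookup that represents
# fetching a page is counted as one read.

ROW = {"id": 6, "name": "James", "age": 37, "city": "LA"}  # column names assumed

# Layout (2): the clustered leaf holds the entire row.
clustered_leaf = {6: ROW}

# Layout (1): the leaf holds only a block address; the row lives elsewhere.
address_leaf = {6: "block-A"}          # "block-A" is a hypothetical address
data_blocks = {"block-A": {6: ROW}}


def read_clustered(pk):
    """One leaf read returns the row."""
    reads = 1
    return clustered_leaf[pk], reads


def read_via_address(pk):
    """One leaf read for the address, then one more read for the data block."""
    reads = 1
    address = address_leaf[pk]
    reads += 1
    return data_blocks[address][pk], reads


print(read_clustered(6)[1], "read vs", read_via_address(6)[1], "reads")
```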

Related

Is a MySQL SELECT query slower if a table has lots of other data?

Suppose I have two tables (with the same amount of rows). The first one only has one column, say ID. The second one has the same column plus an additional one, say Text (of type longtext) which may contain a lot of data.
Now consider a query of the form SELECT ID FROM table WHERE ID>=7.
Question: Is this query slower on the second table than on the first one? The query does not use the Text column, but maybe this data still affects the performance of the query.

For a simple query like this, most of the execution time is spent reading data from disk, so the total time is approximately equal to the disk read time.
Case 1. The column is not indexed in either table.
The server must scan (and hence read) the whole table to find the rows that match the condition. In this case the execution time is approximately proportional to the disk space occupied by the table.
Case 2. The column is the clustered index expression in both tables.
This happens when the column is the only column in the primary key or, if no primary key is defined, when it is the only column of a unique key, that unique key is the first unique index defined in the table structure (physically, in the CREATE TABLE code), and the column is defined as NOT NULL.
The server must read the part of the table starting from the block containing the first row that matches the condition. The execution time is approximately proportional to the disk space occupied by the part of the table that is read, and therefore roughly proportional to the disk space occupied by the whole table. But when the table's on-disk pages contain many "empty" slots (deleted rows), the difference is unpredictable.
Case 3. The column is indexed by a secondary index in both tables.
In this case the index is a separate on-disk structure, and the server reads only this index. When the index expressions and the datatypes are the same, the indexes occupy approximately the same disk space, so both queries take approximately the same time.
Case 4. The column is the clustered index expression in one table and a secondary index expression in the other.
The secondary index needs less disk space than the clustered index, so the query against the table where this column is a secondary index will be faster.

(My "cases" are different than Akina's)
Assuming (Please add to the Question):
id is the PRIMARY KEY on each table.
Both tables are InnoDB.
Let's look at the structure of each:
The id-only table is a long list of ids in a B+Tree. Aside from InnoDB overhead, there is nothing extra.
Each row in the other table has two things (plus overhead): id, and either the text value (if short enough) or a "pointer" to an "off-record" place containing the text. (The cutoff depends on ROW_FORMAT, MySQL version, and the phase of the moon.) Note that each row, therefore, is bulkier than in the id-only table, but it is tricky to predict how much bulkier.
Conclusion 1: The SELECT id FROM with_text will need to read more blocks because of the text column. The number of blocks for the second table is likely to be twice as many, even without fetching the text.
We need more assumptions before we can launch into answering the question:
Will the data (excluding the off-record parts) fit in the buffer_pool? If not, there will be a lot of I/O.
Is this the first run of your 'query'? If the data will fit into the buffer pool, the first time you run it, it will be slow; subsequent times will have no I/O, hence be faster.
Will the rows be inserted in primary-key order? "In order" will be up to twice as densely packed.
There are no secondary indexes. If there were, the SELECT id ... might use the index instead. That would mess up the block count, hence speed.
Conclusion 2: Nothing is trivial.
I spelled out details so that you would hesitate to apply the conclusions to some other pair of tables.
In particular, INDEX(foo) is like the SELECT id FROM without_text case.

To echo "PM-77's" reply: if ID is a primary key, then it necessarily has an (implicit ...) index. The query will make use of this index.
Therefore, any other (irrelevant) structural differences will not affect the performance of the query.

Are database records with sequential primary autoincrement keys likely to be on the same page in an INNODB table?

Given a table with large auto-incremented integer primary keys, I would like to know if transactions on multiple rows with consecutive or near consecutive primary keys are likely to result in fewer disk IO operations than the same transactions on the same number of rows with values that are more widely distributed.
So for example:
How would a SELECT WHERE id IN statement on records with the primary keys:
10202, 10203, 10205, 10207, 10208, 10209, compare to the same statement with the keys: 7, 10202, 52401, 28772, 924, 1189, assuming all the records are considerably smaller than 1/6th of the page size?

As @Gordan pointed out, the records are stored ordered by the primary key. It doesn't matter in what order they're inserted, though.
I'd say InnoDB is likely to make fewer I/O requests if the primary key values are clustered. Let me explain why.
Let's say all InnoDB pages are on disk and there is nothing in the buffer pool. To pull record 10202, InnoDB needs to read the root page and all non-leaf pages along the way to the leaf page. All the pages it reads are then stored in the buffer pool.
Next read: 10203. Most likely InnoDB won't do any disk read because all the pages are already in the buffer pool (except in the less likely case that 10203 is on a neighboring leaf page).
Now, if InnoDB needs to read some id from another part of the index, it needs to read the root page (most likely cached), the non-leaf pages (some of them most likely cached; keep in mind that a B+tree is quite shallow), and the leaf page (not cached). So you get an extra disk read.
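A back-of-the-envelope version of this in Python, using the two key lists from the question. It assumes ids are packed densely in primary-key order with exactly 100 rows per leaf page and no gaps, which is a simplification; real pages vary in fill.

```python
# Count the distinct leaf pages touched by each IN list.
# Assumption: row with id N sits on leaf page N // ROWS_PER_PAGE.

ROWS_PER_PAGE = 100  # assumed rows per leaf page


def leaf_pages_touched(ids, rows_per_page=ROWS_PER_PAGE):
    """Number of distinct leaf pages that hold the given ids."""
    return len({i // rows_per_page for i in ids})


clustered_ids = [10202, 10203, 10205, 10207, 10208, 10209]
scattered_ids = [7, 10202, 52401, 28772, 924, 1189]

print("near-consecutive keys:", leaf_pages_touched(clustered_ids), "leaf page(s)")
print("scattered keys:", leaf_pages_touched(scattered_ids), "leaf page(s)")
```

Under these assumptions the near-consecutive keys share one leaf page, while each scattered key needs its own.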

Yes they are, if they were inserted in that order. In reality, however, overall database I/O pattern is pretty random, so the net benefit may be difficult to measure conclusively.

How search key is searched in the "index file"

What I have understood so far is that in a database the data is actually stored on the hard disk in blocks in files, and an index points to the block of the file where the data is actually stored.
Now, what I am wondering is how a search key is looked up in the index files. Suppose my query is select empname from employee where empid = 12345 and I have an index on empid. I think the "index file" will then contain all the employee ids. How will empid 12345 be searched for in it? Sequentially?

an index points to the block of the file where data is actually stored.
A non-clustered index points to where the data is actually stored
There are two types of index:
A clustered index - where the rows are physically stored in the disk in the order specified in the index.
A non clustered index - a reference to a data page where the data is stored.
A table can only have one clustered index (as the data can only be physically stored one way) but can have many non-clustered indexes.
The process of finding the data using the index you have described is called an index seek.
Consider the diagram below, which is similar to your example.
The index seek starts at the root node (Level 2 on the diagram). To find your example value of 12345 we go down to level 1 via the left branch of the tree, because the value we are looking for falls into that value range.
We then have multiple leaf nodes on level 0 in which to find our value (the diagram shows three for simplicity, but there could be hundreds). Each leaf node in the diagram holds 2000 values, so we go to the one that holds 12345. We then search the values in that leaf and check whether each is 12345.
Once we have our value, what happens next depends on whether the index is clustered or non-clustered. If it is a clustered index, we retrieve the value; if it is a non-clustered index, we go to the data page it refers to and then retrieve the value.
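The seek can be sketched in a few lines of Python. This is a simplified two-level model (one root over three leaves of 2000 values each, as in the description above), with an assumed key range of 10000 to 15999. It uses binary search inside the leaf rather than checking each value, which is one possible way to search a sorted page and not necessarily what any given engine does.

```python
from bisect import bisect_left, bisect_right


def build_leaves(keys, per_leaf):
    """Split sorted keys into leaf nodes of at most per_leaf values."""
    keys = sorted(keys)
    return [keys[i:i + per_leaf] for i in range(0, len(keys), per_leaf)]


def index_seek(leaves, target):
    """Return (leaf number, position in leaf) for target, or None."""
    # The "root" holds the first key of each leaf as a separator.
    separators = [leaf[0] for leaf in leaves]
    leaf_no = bisect_right(separators, target) - 1
    if leaf_no < 0:
        return None  # target is smaller than every key
    leaf = leaves[leaf_no]
    pos = bisect_left(leaf, target)
    if pos < len(leaf) and leaf[pos] == target:
        return leaf_no, pos
    return None


# Assumed example data: empids 10000..15999 in three leaves of 2000 values.
leaves = build_leaves(range(10000, 16000), per_leaf=2000)
print(index_seek(leaves, 12345))
```

The point is that the lookup is not sequential over the whole index: it only touches the root and one leaf.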
Example diagram taken from here

How does MySQL determine if an INSERT is unique?

I would like to know if there is an implicit SELECT being run prior to performing an INSERT on a table that has any column defined as UNIQUE. I cannot find anything about this in the documentation for INSERT.
I have asked some other questions that nobody seems to be able to answer - perhaps because I'm not properly explaining myself - that are related to the above question.
If I understand correctly, then I assume the following would be true:
CASE 1:
You have a table with 1 billion rows. Each row has a UUID column which is unique. If you perform an insert the server must do some kind of implicit SELECT COUNT(*) FROM table WHERE UUID = [new uuid] and determine if the count is 0 or 1. Correct?
CASE 2:
You have a table with 1 billion rows. Each row has a composite unique key consisting of a DATE and a UUID. If you perform an insert the server must do some kind of implicit SELECT COUNT(*) FROM table WHERE DATE = [date] AND UUID = [new uuid] and check if the count is 0 or 1. Yes?
I use the word implicit because at some point, somewhere in the process, the server MUST be checking the value. If not it would require that the laws of physics dictate that two identical rows cannot exist - and as far as I'm informed physics don't play a big role when it comes to the uniqueness of numbers written down somewhere, in binary, on a magnetic disk in a computer.
Let's assume your 1 billion rows are equally and sequentially distributed across 2,000 different dates. Would this not mean that case 2 would perform the insert faster because it can look up the UUIDs segmented into dates? If not, then would it be better to use case 1 for insert speed - and in that case, why?
This question is theoretical, so don't bother with considering regular SELECT performance in this case. The primary key wouldn't be the UUID+DATE index.
As a response to comments: The UUID in my case is designed solely for the purpose of avoiding duplicate entries because of bad connections. Since you cannot make the same entry for a different date twice (without it logically being a new entry), the UUID does not need to be globally unique - it needs only be unique for each date. This is why I can permit it being part of a composite key.

There are a few flaws and misconceptions in the previous answers; rather than point them out, I will start from scratch.
Referring to InnoDB only...
An INDEX (including UNIQUE and PRIMARY KEY) is a BTree. BTrees are very efficient at locating one row based on the key the BTree is sorted on. (They are also efficient at scanning in key order.) The "fan out" of a typical BTree in MySQL is on the order of 100. So, for a million rows, the BTree is about 3 levels deep (log100(million)); for a trillion rows, it is only twice as deep (approximately). So, even if nothing is cached, it takes only 3 disk hits to locate one particular row in a million-row index.
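The depth arithmetic is easy to check. The Python sketch below assumes a uniform fan out of 100 at every level, which is the rule of thumb used above rather than an exact property of any real table.

```python
def btree_levels(rows, fanout=100):
    """Smallest number of levels whose capacity (fanout ** levels) covers rows."""
    levels = 1
    capacity = fanout
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels


for n in (10**6, 10**9, 10**12):
    print(f"{n:>16,} rows -> about {btree_levels(n)} levels")
```

Integer multiplication is used instead of a floating-point logarithm so that exact powers of 100 do not round the wrong way.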
I am being loose here with "index" versus "table" because they are essentially the same (in InnoDB, at least). Both are BTrees. What differs is what is in the leaf nodes: The leaf nodes of a table BTree have all the columns. (I am ignoring the off-block storage for TEXT/BLOB in InnoDB.) An INDEX (other than the PRIMARY KEY) has a copy of the PRIMARY KEY in the leaf node. This is how a secondary key can get from the INDEX BTree to the rest of the row's columns, and how InnoDB does not have to store multiple copies of all the columns.
The PRIMARY KEY is "clustered" with the data. That is, one BTree contains all the columns of all the rows, and it is ordered according to the PRIMARY KEY specification.
Locating a record by PRIMARY KEY is one BTree search. Locating a record by a SECONDARY KEY is two BTree searches, one in the secondary INDEX's BTree which gives you the PRIMARY KEY; then a second one to drill down the data/PK BTree.
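As a toy illustration in Python, plain dicts can stand in for the two BTrees. The column names and values are made up; the only thing the sketch is meant to show is the number of searches on each path.

```python
# Data/PK BTree stand-in: PRIMARY KEY -> all columns of the row.
data_tree = {
    1: {"id": 1, "uuid": "aaa", "note": "first"},   # hypothetical rows
    2: {"id": 2, "uuid": "bbb", "note": "second"},
}

# Secondary INDEX(uuid) stand-in: indexed value -> copy of the PRIMARY KEY.
uuid_index = {"aaa": 1, "bbb": 2}


def by_primary_key(pk):
    """One search: the leaf of the data tree already holds the row."""
    return data_tree[pk], 1


def by_secondary_key(uuid):
    """Two searches: the secondary tree gives the PK, the data tree the row."""
    pk = uuid_index[uuid]                  # search 1: secondary index
    row, searches = by_primary_key(pk)     # search 2: data/PK tree
    return row, searches + 1


print(by_primary_key(2))
print(by_secondary_key("bbb"))
```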
PRIMARY KEY(UUID)... Since the UUID is very random, the "next" row you INSERT will be located at a 'random' spot. If the table is much bigger than can be cached in the buffer_pool, the block the new row needs to go into is very likely to not be cached. This leads to a disk hit to pull the block into cache (the buffer pool), and eventually another disk hit to write it back to disk.
Since a PRIMARY KEY is a UNIQUE KEY, something else is going on at the same time (No SELECT COUNT(*) etc). The UNIQUEness is checked after the block is fetched and before deciding whether to give a "duplicate key" error, or to store the row. Also, if the block is "full" then the block will need to be 'split' to make room for the new row.
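A minimal Python sketch of that idea, with a sorted list standing in for one block of the BTree. The class and function names are hypothetical. The search that finds where the new key belongs is the same search that reveals a duplicate, so no separate SELECT COUNT(*) step appears.

```python
from bisect import bisect_left


class DuplicateKey(Exception):
    """Raised when the key is already present (the "duplicate key" error)."""


def insert_unique(sorted_keys, key):
    """Insert key into sorted_keys, checking uniqueness at the insert position."""
    pos = bisect_left(sorted_keys, key)          # locate the insert position
    if pos < len(sorted_keys) and sorted_keys[pos] == key:
        raise DuplicateKey(key)                  # the neighbor IS the duplicate
    sorted_keys.insert(pos, key)                 # otherwise store the key
    return pos


keys = [10, 20, 30]
insert_unique(keys, 25)
print(keys)
try:
    insert_unique(keys, 20)
except DuplicateKey as exc:
    print("duplicate key:", exc)
```

Block splits and locking are left out; the sketch only covers the locate-then-check order described above.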
INDEX(UUID) or UNIQUE(UUID)... There is a BTree for that index. On INSERT, some randomly located block will need to be fetched, modified, possibly split, and written back to disk, very much like the PK discussion above. If you had UNIQUE(UUID), there would also be a check for UNIQUEness and possibly an error message. In either case, there is, now and/or later, disk I/O.
AUTO_INCREMENT PK... If the PRIMARY KEY is an auto_increment, then new records are added to the 'last' block in the data BTree. When it gets full (every 100 or so records) there is (logically) a block split and flush of the old block to disk. (Actually, the I/O is probably delayed and done in the background.)
PRIMARY KEY(id) + UNIQUE(UUID)... Two BTrees. On an INSERT, there is activity in both. This is likely to be worse than simply PRIMARY KEY(UUID). Add up the disk hits above to see what I mean.
"Disk hits" are the killer in huge tables, and especially with UUIDs. "Count the disk hits" to get a feel for performance, especially when comparing two possible techniques.
Now for your secret sauce... PRIMARY KEY(date, UUID)... You are allowing the same UUID to show up on two different days. This can help! Back to how a PK works and checking for UNIQUEness... The "compound" index (date, UUID) is checked for UNIQUEness as the record is inserted. The records are sorted by date+UUID, so all of today's records are clumped together. IF (and this might be a big IF) one day's data fits in the buffer pool (but the entire table does not), then this is what is happening every morning... INSERTs are suddenly adding new records to the "end" of the table because of the new "date". These inserts are occurring randomly within the new date. Blocks in the buffer_pool are being pushed out to disk to make room for the new blocks. But, nicely, what you see is smooth, fast, INSERTs. This is unlike what you saw with PRIMARY KEY(UUID), when many rows had to wait for a disk read before UNIQUEness could be checked. All of today's blocks stay cached, and you don't have to wait for I/O.
But, if you ever get so big that you cannot fit one day's data in the buffer pool, things will start slowing down, first at the end of the day, then it will creep earlier and earlier as the frequency of INSERTs increases.
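"Count the disk hits" can be simulated roughly in Python. The sketch below models the buffer pool as a simple LRU cache of blocks and compares inserts that land on a random block of the whole table (like PRIMARY KEY(UUID)) with inserts that land on a random block within today's range (like PRIMARY KEY(date, UUID)). All sizes are invented, and InnoDB's real buffer pool is not a plain LRU, so treat the numbers as an illustration only.

```python
import random
from collections import OrderedDict


def count_misses(block_ids, pool_size):
    """Count accesses that miss an LRU cache holding pool_size blocks."""
    pool = OrderedDict()
    misses = 0
    for block in block_ids:
        if block in pool:
            pool.move_to_end(block)        # cached: mark as recently used
        else:
            misses += 1                    # not cached: one disk read
            pool[block] = True
            if len(pool) > pool_size:
                pool.popitem(last=False)   # evict the least recently used
    return misses


# Assumed sizes: today's data fits in the pool, the whole table does not.
TOTAL_BLOCKS = 10_000
TODAY_BLOCKS = 50
POOL_SIZE = 200
INSERTS = 5_000

rng = random.Random(0)  # fixed seed so the run is repeatable

# PRIMARY KEY(UUID): each insert hits a random block anywhere in the table.
uuid_pk = [rng.randrange(TOTAL_BLOCKS) for _ in range(INSERTS)]

# PRIMARY KEY(date, UUID): random, but only within the last TODAY_BLOCKS blocks.
date_uuid_pk = [TOTAL_BLOCKS - 1 - rng.randrange(TODAY_BLOCKS)
                for _ in range(INSERTS)]

misses_uuid = count_misses(uuid_pk, POOL_SIZE)
misses_date = count_misses(date_uuid_pk, POOL_SIZE)
print("PK(UUID) misses:", misses_uuid)
print("PK(date, UUID) misses:", misses_date)
```

If TODAY_BLOCKS is raised above POOL_SIZE, the second case starts missing as well, which matches the slowdown described above once one day's data no longer fits.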
By the way, PARTITION BY RANGE(date), together with PRIMARY KEY(uuid, date) has somewhat similar characteristics. (Yes I deliberately flipped the PK columns.)

When inserting large amounts of data in a table, keep in mind that the data ends up being physically stored on a disk somewhere. To actually read and write the data from the disk, MySQL (and most other RDBMS) uses something called a clustered index. If you specify a Primary Key or a Unique Index on a table, the column or columns participating in the key/index becomes the clustered index key. This means that on the disk, data is physically stored in the same order as the values in the key column(s).
By utilising the clustered index, the database engine can quickly determine whether a value already exists, without having to scan the whole table. In theory, if a table contains N = 1.000.000 records, the engine on average needs log2(N) = 20 operations to check if a value exists, regardless of how many columns participate in the index. For secondary indexes, a B-tree or a hash table is typically used (search the web for these terms, for a detailed explanation of how they work).
The conclusion of this article is wrong:
"... MySQL is unable to buffer enough data to guarantee a value is
unique and is therefore caused to perform a tremendous amount of
reading for each insert to guarantee uniqueness"
This is incorrect. Checking uniqueness does not really require any additional work, as the engine had to locate the place to insert the new record anyway. What causes the performance slowdown is the use of UUID's. Remember that UUID's are randomly generated whenever a new record is inserted. This means that the new record needs to be inserted at a random physical position on the disk, and this causes existing data to be shifted around to accommodate the new record. If, on the other hand, the index column is a value that increases monotonically (such as an auto-increment INT), new records will always be inserted after the last record, meaning no existing data will ever need to be moved.
In your case, there won't be any performance difference between case 1 and case 2. But you will still run into trouble because of the randomness of the UUID's. It would be much better if you used an auto-incrementing value instead of the UUID. Also, since UUID's are always unique by nature, it really doesn't make much sense to index them with a UNIQUE constraint. Alternatively, if you really must use UUID's, make sure that you have a primary key on your table, that is based on auto-incrementing INT's, to ensure that new records are never randomly inserted on the disk.

This is the very purpose of a UNIQUE constraint:
A UNIQUE index creates a constraint such that all values in the index must be distinct. An error occurs if you try to add a new row [or update an existing row] with a key value that matches [another] existing row.
Earlier in the same manual page, it is stated that
A column list of the form (col1,col2,...) creates a multiple-column index. Index key values are formed by concatenating the values of the given columns.
How this constraint is implemented is not documented, but it must somehow equate to a preliminary SELECT with the values to be inserted/updated. The cost of such a check is often negligible, because, by definition, the fields are indexed (this overhead becomes relevant when dealing with bulk inserts).
The number of columns covered by the index is not meaningful in terms of performance (for example, compared to the number of rows in the table). It does impact the disk space occupied by the index, but this should really not matter in your design decisions.

Short, single-field indexes or enormous covering indexes in MySQL

I am trying to understand exactly what is and is not useful in a multiple-field index. I have read this existing question (and many more) plus other sites/resources (MySQL Performance Blog, Percona slideshares, etc.) but I'm not totally confident that what I've found on the subject is current and accurate. So please bear with me while I repeat some of what I think I know.
By indexing wisely, I can not only reduce how long it takes to match my query condition(s), but also reduce how long it takes to fetch the fields I want in my query result.
The index is just a sorted, duplicated subset of the full data, paired with pointers (MyISAM) or PKs (InnoDB), that I can search more efficiently than the full table.
Given the above, using an index to match my condition(s) really happens in the same way as fetching my desired result, except I created this special-purpose table (the index) that gets me an intermediate result set really quickly; and with this intermediate result set I can retrieve my final desired result set much more efficiently than by performing a full table scan.
Furthermore, if the index covers all the fields in my query (not just the conditions), instead of an intermediate result set, the index will give me everything I need without having to fetch any rows from the complete table.
InnoDB tables are clustered on the PK, so rows with consecutive PKs are likely to be stored in the same block (given many rows per block), and I can grab a range of rows with consecutive PKs fairly efficiently.
MyISAM tables are not clustered; there is some hidden internal row ordering that has no fixed relation to the PK (or any index), so any time I want to grab a set of rows, I may have to retrieve a different block for every single row - even if these rows have consecutive PKs.
Assuming the above is at least generally accurate, here's my puzzle. I have a slowly changing dimension table defined with the following columns (more or less) and using MyISAM:
dim_owner_ID INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
person_ID INT UNSIGNED NOT NULL,
raw_name VARCHAR(92) NOT NULL,
first VARCHAR(30),
middle VARCHAR(50),
last VARCHAR(30),
suffix CHAR(3),
flag CHAR(1)
Each "owner" is a unique instance of a particular individual with a particular name, so if Sue Smith changes her name to Sue Brown, that results in two rows that are the same except for the last field and the surrogate key. My understanding is that the only way to enforce this constraint internally is to do:
UNIQUE INDEX uq_owner_complete (person_ID, raw_name, first, middle, last, suffix, flag)
And that's basically going to duplicate the entire table (except for the surrogate key).
I also need to index a few other fields for quick joins and searches. While there will be some writes, and disk space is neither free nor infinite, read performance is absolutely the #1 priority here. These smaller indexes should serve very well to cover the conditions of the queries that will be run against the table, but in almost every case, the entire row needs to be selected.
With that in mind:
Is there any reasonable middle ground between sticking with short, single-field indexes (prefix where possible) and expanding every index to cover the entire table?
How would the latter be any different from storing the entire dataset five times on disk, but sorted differently each time?
Is there any benefit to adding the PK/surrogate ID to each of the smaller indexes in the hope that the query optimizer will be able to work some sort of index merge magic?
If this were an InnoDB index, the PK would already be there, but since it's MyISAM it's got pointers to the full rows instead. So if I'm understanding things correctly, there's no point (no pun intended) to adding the PK to any other index, unless doing so would allow the retrieval of the desired result set directly from the index. Which is not likely here.
I understand if it seems like I'm trying too hard to optimize, and maybe I am, but the tasks I need to perform using this database take weeks at a time, so every little bit helps.

You have to understand one concept. An index (either InnoDB or MyISAM, either primary or secondary) is a data structure called a "B+ tree".
Each node in the B+ tree is a pair (k, v), where k is a key and v is a value. If you build an index on last_name, your keys will be "Smith", "Johnson", "Kuzminsky", etc.
The value in the index is some data. If the index is a secondary index, the data is the primary key value.
So if you build an index on last_name, each node will be a pair (last_name, id), e.g. ("Smith", 5).
The primary index is an index where k is the primary key and the data is all the other fields.
Bearing the above in mind, let me comment on some points:
By indexing wisely, I can not only reduce how long it takes to match my query condition(s), but also reduce how long it takes to fetch the fields I want in my query result.
Not exactly. If your secondary index is good, you can quickly find v based on your query condition. E.g. you can quickly find the PK by last name.
The index is just a sorted, duplicated subset of the full data, paired with pointers (MyISAM) or PKs (InnoDB), that I can search more efficiently than the full table.
An index is a B+tree where each node is a pair of the indexed field value(s) and the PK.
Given the above, using an index to match my condition(s) really happens in the same way as fetching my desired result, except I created this special-purpose table (the index) that gets me an intermediate result set really quickly; and with this intermediate result set I can retrieve my final desired result set much more efficiently than by performing a full table scan.
Not exactly. If there were no index you'd have to scan the whole table and choose only the records where last_name = "Smith". But you have an index (last_name, PK), so with the key "Smith" you can quickly find all PKs where last_name = "Smith". And then you can quickly find your full result (because you need not only the last name, but the first name too). So you're right, queries like SELECT * FROM table WHERE last_name = "Smith" are executed in two steps:
Find all PK
By PK find full record.
Furthermore, if the index covers all the fields in my query (not just the conditions), instead of an intermediate result set, the index will give me everything I need without having to fetch any rows from the complete table.
Exactly. If your index is actually (last_name, first_name, id) and your query is SELECT first_name FROM table WHERE last_name = "Smith", you don't do the second step. The first name is in the secondary index, so you don't have to go to the primary index.
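Here is a small Python sketch of that covered query. A sorted list of tuples stands in for the (last_name, first_name, id) index; the first names and most ids are invented for the example. The query is answered from the index entries alone, with no lookup by PK.

```python
from bisect import bisect_left

# Secondary index entries (last_name, first_name, id), kept in sorted order.
# Sample values are hypothetical.
index = sorted([
    ("Smith", "Sue", 5),
    ("Johnson", "Al", 2),
    ("Smith", "Bob", 9),
])


def first_names(index, last_name):
    """SELECT first_name ... WHERE last_name = ?, served by the index only."""
    pos = bisect_left(index, (last_name,))   # first entry with this last_name
    found = []
    while pos < len(index) and index[pos][0] == last_name:
        found.append(index[pos][1])          # first_name is inside the entry
        pos += 1
    return found


print(first_names(index, "Smith"))
```

If the query also asked for a column that is not in the index, each matching id would need the second step (a lookup in the primary index).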
InnoDB tables are clustered on the PK, so rows with consecutive PKs are likely to be stored in the same block (given many rows per block), and I can grab a range of rows with consecutive PKs fairly efficiently.
Right. Two neighboring PK values will most likely be in the same page, except when one PK is the last value in a page and the next PK value is stored in the next page.
Basically, this is why the B+ tree structure was invented. It's efficient not only for search but also for sequential access. And until recently we had rotating hard drives.
MyISAM tables are not clustered; there is some hidden internal row ordering that has no fixed relation to the PK (or any index), so any time I want to grab a set of rows, I may have to retrieve a different block for every single row - even if these rows have consecutive PKs.
Right. If you insert new records into a MyISAM table, the records are added to the end of the MYD file regardless of the PK order.
The primary index of a MyISAM table is a B+tree with pointers to records in the MYD file.
Now about your particular problem. I don't see any reason to define UNIQUE INDEX uq_owner_complete.
Is there any reasonable middle ground between sticking with short, single-field indexes (prefix where possible) and expanding every index to cover the entire table?
The best approach is a secondary index on all columns used in the WHERE clause, except low-selectivity fields (like sex). The most selective fields must go first in the index. For example, (last_name, eye_color) is good; (eye_color, last_name) is bad.
If a covering index lets you avoid the additional PK lookup, that's excellent. But if not, that's acceptable too.
How would the latter be any different from storing the entire dataset five times on disk, but sorted differently each time?
Yes.
Is there any benefit to adding the PK/surrogate ID to each of the smaller indexes in the hope that the query optimizer will be able to work some sort of index merge magic?
The PK is already part of the index. (Remember, it's stored as data.) So it makes no sense to explicitly add PK fields to a secondary index. I think (but am not sure) that MyISAM secondary indexes store the PK values too (and primary indexes do store the pointers).
To summarize:
Make your PK as short as possible (a surrogate PK works great).
Add as many indexes as you need until write performance becomes unacceptable for you.
