Clustered Index in MySQL

Are clustered indexes created and stored separately from the actual data in MySQL? If so, why can we not have more than one clustered index? All we would need to do is create another index and store it in memory.

A clustered index is, at least partially, the way the table is physically sorted and stored, i.e. the row order on disk. That's why you can only have one. But because it reflects the physical layout of the rows, it's potentially more compact and performant than a typical index.
UPDATE:
As @RickJames excellently points out below, in InnoDB (MySQL's default engine since 5.5.5), a lookup is typically a two-stage process. One b-tree relates a secondary key to a primary key; a second b-tree relates the primary key to the location of a data record. If retrieving data on primary key, only the second lookup is necessary. In that sense, a b-tree lookup is always necessary.
Additionally, according to the MySQL documentation:
Typically, the clustered index is synonymous with the primary key.
And the reason it's considered "clustered" and not just a primary key is because InnoDB attempts to order the data records according to primary key and leaves room for future records to be inserted in the correct location in its data pages.
Because of that, not only is a query on an InnoDB primary key one fewer b-tree lookup than a query on a secondary index, but the primary key b-tree can be significantly smaller because of the physical ordering of the data on disk.
It stands to reason that even if there were a mechanism to make a secondary index that pointed directly to a data record (like an index in MyISAM), it wouldn't perform as well as InnoDB's primary/clustered index.
So, it's fundamentally the (at least partial) physical ordering of data records by primary key which prevents you from getting the same performance from a secondary index.

MySQL's InnoDB does the following for its PRIMARY KEY: The data records are in PK order, stored together in a B+Tree structure. This allows for rapid point-queries and range scans. That is, the value at the 'bottom' of the tree has all the columns of the table.
InnoDB's secondary keys are also in a B+Tree, but the bottom values are the PK columns. Hence, a second lookup is needed to fetch a row(s) by a secondary key.
Note that a secondary key could contain all the columns of the table, thereby acting like a second clustered index. The drawback is that any modification to the table would necessarily involve changes to both BTrees.
MyISAM, in contrast, throws the data into a file (the .MYD) and has every index in its own BTree in the .MYI file. The bottom of each BTree is a pointer (row number or byte offset) into the .MYD. The PK is not implemented any differently than a secondary key.
(Note: FULLTEXT and SPATIAL indexes are not covered by the above discussion.)
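The two-stage lookup described above can be sketched in a few lines of Python. This is only a toy model, not InnoDB's implementation: sorted lists searched with bisect stand in for the B+Trees, and the table (id, email, name) and its rows are hypothetical.

```python
import bisect

# Hypothetical table rows: (id, email, name); id is the PRIMARY KEY.
rows = [(3, "c@x.org", "Cy"), (1, "a@x.org", "Al"), (2, "b@x.org", "Bo")]

# "Clustered index": the full rows, kept in primary key order.
clustered = sorted(rows)
pk_keys = [r[0] for r in clustered]

# "Secondary index" on email: each entry holds only (email, primary key).
secondary = sorted((r[1], r[0]) for r in rows)
sec_keys = [e[0] for e in secondary]

def lookup_by_pk(pk):
    """One search: the entry found already contains every column."""
    i = bisect.bisect_left(pk_keys, pk)
    if i < len(pk_keys) and pk_keys[i] == pk:
        return clustered[i]
    return None

def lookup_by_email(email):
    """Two searches: secondary index -> primary key -> clustered index."""
    i = bisect.bisect_left(sec_keys, email)
    if i == len(sec_keys) or sec_keys[i] != email:
        return None
    pk = secondary[i][1]      # the secondary entry yields only the PK
    return lookup_by_pk(pk)   # second search fetches the whole row

print(lookup_by_pk(2))
print(lookup_by_email("c@x.org"))
```

The secondary structure never stores a position in the clustered list, only the key value, so rows can move around in the clustered structure without the secondary entries changing.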

Related

When we create a clustered index, does it take extra space?

I am asking this question with respect to the MySQL database. I read that a clustered index orders the table based on the primary key or the columns that we provide for the clustered index, whereas in a non-clustered index separate space is taken for the key and record pointer.
I also read that, as there is no separate index table, a clustered index is faster than a non-clustered index, whereas a non-clustered index must first look into the index table, find the corresponding record pointer, and fetch the record data.
Does that mean there is no extra space taken for a clustered index?
PS: I know that there are already some similar answers to this question, but I can't understand them.

There is no extra space taken because every InnoDB table is stored as the clustered index. There is in fact only the clustered index, and secondary indexes. There's no separate storage for data, because all the unindexed columns are simply stored in the terminal nodes of the clustered index. You might like to read more about it here: https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html
It is true that if you do a lookup using a secondary index, and then select columns besides those in the secondary index, InnoDB would do a sort of double lookup. Once to search the secondary index, which results in the value of the primary key(s) where the value you are searching for is found, and then it uses those primary keys to search the clustered index to combine with the other columns.
This double-lookup is mitigated partially by the Adaptive Hash Index, which is a cache of frequently-searched values. This cache is populated automatically as you run queries. So over time, if you run queries for the same values over again, it isn't so costly.

The situation is more complex than your question.
First, let's talk only about ENGINE=InnoDB; other engines work differently.
There is about 1% overhead for the non-leaf BTree nodes to "cluster" the PRIMARY KEY with the data.
If you do not explicitly specify a PRIMARY KEY, InnoDB may be able to use a UNIQUE key (one whose columns are all NOT NULL) as the PK. But if not, then a hidden, 6-byte number will be used for the PK. This would take more space than if you had, say, a 4-byte INT for the PK! That is, every InnoDB table has a PRIMARY KEY, whether explicit or hidden.
The above 2 items are TMI; think of the PK as taking no extra space.
Yes, lookup by the PK is faster than lookup by a secondary key. But if you need a secondary key, then create it. Playing a game of first fetching ids, then fetching the rows is slower than doing all the work in a single query.
A secondary key also uses a BTree. But it is sorted by the key's column(s) and does not include all the other columns. Instead, it includes the PK's columns. (Hence the "double-lookup" that Bill mentioned.)
A "covering index" is one that contains all the columns needed for a particular SELECT. In that case, all the work can be done in the index's BTree, thereby avoiding the double-lookup. That is, a covering index is as fast as a primary key lookup. (I would guess that 20% of indexes are "covering" or could be made covering by adding a column or two.)
BTrees have a bunch of overhead. A Rule of Thumb: Add up the size of each column (4 bytes for INT, etc), then multiply by 2 or 3. The result will often be a good estimate of the disk space needed for the Data or Index Btree.
This discussion does not cover FULLTEXT or SPATIAL indexes.
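The rule of thumb above (add up the column sizes, then multiply by 2 or 3) is simple arithmetic, sketched here in Python. Only the 4-byte INT comes from the answer; the other byte sizes, the column list, and the row count are assumptions chosen for illustration, and the result is a rough estimate rather than what the server would actually report.

```python
# Assumed per-column sizes in bytes. Only INT = 4 is taken from the text above.
COLUMN_BYTES = {"INT": 4, "BIGINT": 8, "DATETIME": 5}

def estimate_btree_bytes(column_types, row_count):
    """Return a (low, high) size estimate using the 2x and 3x multipliers."""
    raw_row_bytes = sum(COLUMN_BYTES[t] for t in column_types)
    low = raw_row_bytes * 2 * row_count
    high = raw_row_bytes * 3 * row_count
    return low, high

# Hypothetical table: two INT columns and one BIGINT, one million rows.
low, high = estimate_btree_bytes(["INT", "INT", "BIGINT"], 1_000_000)
print(f"roughly {low / 1e6:.0f} MB to {high / 1e6:.0f} MB")
```

For a secondary index, the same arithmetic would be applied to the index columns plus the PK columns, since those are what its BTree stores.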

InnoDB secondary index includes value instead of pointer to PK, how is it enough?

I am reading Effective Mysql - Optimizing Mysql Statements and in chapter 3 there was this explanation:
The secondary indexes in InnoDB use the B-tree data structure; however, they differ from the MyISAM implementation. In InnoDB, the secondary index stores the physical value of the primary key. In MyISAM, the secondary index stores a pointer to the data that contains the primary key value.
This is important for two reasons. First, the size of secondary indexes in InnoDB can be much larger when a large primary key is defined—for example when your primary key in InnoDB is 40 bytes in length. As the number of secondary indexes increase, the comparison size of the indexes can become significant. The second difference is that the secondary index now includes the primary key value and is not required as part of the index. This can be a significant performance improvement with table joins and covering indexes.
There are many questions that come to my mind, mostly due to a lack of understanding of what the author is trying to convey.
It is unclear what the author means by the second difference in the second paragraph. What is not required as part of the index anymore?
Does the InnoDB secondary index B-tree store only the PK value, the PK value and a pointer to it, or the PK value and a pointer to the data row?
What kind of performance improvement would there be due to the storage method (2nd question's answer)?
This question contains an example and also an answer. He explains how it contains the PK value, but what I am still not understanding is:
To complete the join, if the secondary index holds only the value and not a pointer, won't MySQL do a full index scan on the primary key index with that value from the secondary index? How would that be more efficient than having the pointer as well?

The secondary index is an indirect way to access the data. Unlike the primary (clustered) index, when you traverse the secondary index in InnoDB and you reach the leaf node you find a primary key value for the corresponding row the query is looking for. Using this value you traverse the primary index to fetch the row. This means 2 index look ups in InnoDB.
For MyISAM, because the leaf node of the secondary index holds a pointer to the actual row, you only require 1 index lookup.
The secondary index is formed based on certain attributes of your table that are not the PK. Hence the PK is not required to be part of the index by definition. Whether it is (InnoDB) or not (MyISAM) is implementation detail with corresponding performance implications.
Now the approach that InnoDB follows might at first seem inefficient in comparison to MyISAM (2 lookups vs 1 lookup), but it is not, because the pages of the primary index are usually cached in memory, so the penalty is low.
The advantage is that InnoDB can split and move rows to optimize the table layout on inserts/updates/deletes of rows without needing to do any updates on the secondary index, since it does not refer to the affected rows directly.

Basics..
MyISAM's PRIMARY KEY and secondary keys work the same. -- Both are BTrees in the .MYI file where a "pointer" in the leaf node points to the .MYD file.
The "pointer" is either a byte offset into the .MYD file, or a record number (for FIXED). Either results in a "seek" into the .MYD file.
InnoDB's data, including the columns of the PRIMARY KEY, is stored in one BTree ordered by the PK.
This makes a PK lookup slightly faster. Both drill down a BTree, but MyISAM needs an extra seek.
Each InnoDB secondary key is stored in a separate BTree. But in this case the leaf nodes contain any extra columns of the PK. So, a secondary key lookup first drills down that BTree based on the secondary key. There it will find all the columns of both the secondary key and the primary key. If those are all the columns you need, this is a "covering index" for the query, and nothing further is done. (Faster than MyISAM.)
But usually you need some other columns, so the column(s) of the PK are used to drill down the data/PK BTree to find the rest of the columns in the row. (Slower than MyISAM.)
So, there are some cases where MyISAM does less work; some cases where InnoDB does less work. There are a lot of other things going on; InnoDB is winning many comparison benchmarks over MyISAM.
Caching...
MyISAM controls the caching of 1KB index blocks in the key_buffer. Data blocks are cached by the Operating System.
InnoDB caches both data and secondary index blocks (16KB in both cases) in the buffer_pool.
"Caching" refers to swapping in/out blocks as needed, with roughly a "least recently used" algorithm.
No BTree is loaded into RAM. No BTree is explicitly kept in RAM. Every block is requested as needed, with the hope that it is cached in RAM. For data and/or index(es) smaller than the associated buffer (key_buffer / buffer_pool), the BTree may happen to stay in RAM until shutdown.
The source-of-truth is on disk. (OK, there are complex tricks that InnoDB uses with log files to avoid loss of data when a crash occurs before blocks are flushed to disk. That cleanup automatically occurs when restarting after the crash.)
Pulling the plug..
MyISAM:
Mess #1: Indexes will be left in an unclean state. CHECK TABLE and REPAIR TABLE are needed.
Mess #2: If you are in the middle of UPDATEing a thousand rows in a single statement, some will be updated, some won't.
InnoDB:
As alluded to above, InnoDB performs things atomically, even across pulling the plug. No index is left mangled. No UPDATE is left half-finished; it will be ROLLBACKed.
Example..
Given
columns a,b,c,d,e,f,g
PRIMARY KEY(a,b,c)
INDEX(c,d)
The BTree leaf nodes will contain:
MyISAM:
for the PK: a,b,c,pointer
for secondary: c,d,pointer
InnoDB:
for the PK: a,b,c,d,e,f,g (the entire row is stored with the PK)
for secondary: c,d,a,b
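The leaf contents listed in this example follow a mechanical rule, which can be sketched as two small Python functions. The function names are hypothetical; the logic only encodes what the example above states: a MyISAM leaf holds the index columns plus a pointer, and an InnoDB secondary leaf holds the index columns plus any PK columns not already in the index.

```python
def myisam_leaf(index_cols):
    """MyISAM: every index (PK or secondary) stores its columns plus a row pointer."""
    return list(index_cols) + ["pointer"]

def innodb_secondary_leaf(index_cols, pk_cols):
    """InnoDB: a secondary index stores its columns, then the missing PK columns."""
    extra = [c for c in pk_cols if c not in index_cols]
    return list(index_cols) + extra

pk = ["a", "b", "c"]
secondary = ["c", "d"]

print(myisam_leaf(pk))                       # MyISAM PK leaf
print(myisam_leaf(secondary))                # MyISAM secondary leaf
print(innodb_secondary_leaf(secondary, pk))  # InnoDB secondary leaf
```

Column c is not repeated in the InnoDB secondary leaf because it is already part of the index key.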

Non-clustered index contains copies of physical data or just positions to physical data ? [duplicate]

I have a limited exposure to DB and have only used DB as an application programmer. I want to know about Clustered and Non clustered indexes.
I googled and what I found was :
A clustered index is a special type of index that reorders the way
records in the table are physically
stored. Therefore table can have only
one clustered index. The leaf nodes
of a clustered index contain the data
pages. A nonclustered index is a
special type of index in which the
logical order of the index does not
match the physical stored order of
the rows on disk. The leaf node of a
nonclustered index does not consist of
the data pages. Instead, the leaf
nodes contain index rows.
What I found on SO was "What are the differences between a clustered and a non-clustered index?".
Can someone explain this in plain English?

With a clustered index the rows are stored physically on the disk in the same order as the index. Therefore, there can be only one clustered index.
With a non clustered index there is a second list that has pointers to the physical rows. You can have many non clustered indices, although each new index will increase the time it takes to write new records.
It is generally faster to read from a clustered index if you want to get back all the columns. You do not have to go first to the index and then to the table.
Writing to a table with a clustered index can be slower, if there is a need to rearrange the data.

A clustered index means you are telling the database to store close values actually close to one another on the disk. This has the benefit of rapid scan / retrieval of records falling into some range of clustered index values.
For example, you have two tables, Customer and Order:
Customer
----------
ID
Name
Address
Order
----------
ID
CustomerID
Price
If you wish to quickly retrieve all orders of one particular customer, you may wish to create a clustered index on the "CustomerID" column of the Order table. This way the records with the same CustomerID will be physically stored close to each other on disk (clustered) which speeds up their retrieval.
P.S. The index on CustomerID will obviously not be unique, so you either need to add a second field to "uniquify" the index or let the database handle that for you, but that's another story.
Regarding multiple indexes. You can have only one clustered index per table because this defines how the data is physically arranged. If you wish an analogy, imagine a big room with many tables in it. You can either put these tables to form several rows or pull them all together to form a big conference table, but not both ways at the same time. A table can have other indexes, they will then point to the entries in the clustered index which in its turn will finally say where to find the actual data.

In SQL Server, with row-oriented storage, both clustered and nonclustered indexes are organized as B-trees.
The key difference between clustered indexes and non clustered indexes is that the leaf level of the clustered index is the table. This has two implications.
1. The rows on the clustered index leaf pages always contain something for each of the (non-sparse) columns in the table (either the value or a pointer to the actual value).
2. The clustered index is the primary copy of a table.
Non clustered indexes can also do point 1 by using the INCLUDE clause (Since SQL Server 2005) to explicitly include all non-key columns but they are secondary representations and there is always another copy of the data around (the table itself).
CREATE TABLE T
(
A INT,
B INT,
C INT,
D INT
)
CREATE UNIQUE CLUSTERED INDEX ci ON T(A, B)
CREATE UNIQUE NONCLUSTERED INDEX nci ON T(A, B) INCLUDE (C, D)
The two indexes above will be nearly identical, with the upper-level index pages containing values for the key columns A, B and the leaf level pages containing A, B, C, D.
There can be only one clustered index per table, because the data rows
themselves can be sorted in only one order.
The above quote from SQL Server Books Online causes much confusion.
In my opinion, it would be much better phrased as:
There can be only one clustered index per table because the leaf level rows of the clustered index are the table rows.
The Books Online quote is not incorrect, but you should be clear that the "sorting" of both non clustered and clustered indices is logical, not physical. If you read the pages at leaf level by following the linked list and read the rows on the page in slot array order, then you will read the index rows in sorted order, but physically the pages may not be sorted. The commonly held belief that with a clustered index the rows are always stored physically on the disk in the same order as the index key is false.
This would be an absurd implementation. For example, if a row is inserted into the middle of a 4GB table SQL Server does not have to copy 2GB of data up in the file to make room for the newly inserted row.
Instead, a page split occurs. Each page at the leaf level of both clustered and non clustered indexes has the address (File: Page) of the next and previous page in logical key order. These pages need not be either contiguous or in key order.
e.g. the linked page chain might be 1:2000 <-> 1:157 <-> 1:7053
When a page split happens a new page is allocated from anywhere in the filegroup (from either a mixed extent, for small tables or a non-empty uniform extent belonging to that object or a newly allocated uniform extent). This might not even be in the same file if the filegroup contains more than one.
The degree to which the logical order and contiguity differ from the idealized physical version is the degree of logical fragmentation.
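The difference between logical and physical order after page splits can be shown with a toy Python model. This is a sketch under stated assumptions and not SQL Server's allocator: the page capacity of 4 is arbitrary, pages carry only a next pointer (no previous pointer), and a new page is always appended at the end of the "file", whereas the text above notes that a real page can come from anywhere in the filegroup.

```python
import bisect

PAGE_CAPACITY = 4  # assumed tiny capacity so that splits happen quickly

class Page:
    def __init__(self, page_id):
        self.page_id = page_id  # physical position in the file
        self.keys = []          # keys kept sorted within the page
        self.next = None        # next page in logical key order

class LeafChain:
    def __init__(self):
        self.pages = [Page(0)]  # list index == physical page id
        self.head = self.pages[0]

    def insert(self, key):
        page = self.head
        # Follow the logical chain to the page that should hold the key.
        while page.next is not None and page.next.keys[0] <= key:
            page = page.next
        bisect.insort(page.keys, key)
        if len(page.keys) > PAGE_CAPACITY:
            self._split(page)

    def _split(self, page):
        new_page = Page(len(self.pages))  # allocated at the end of the file
        self.pages.append(new_page)
        mid = len(page.keys) // 2
        new_page.keys = page.keys[mid:]   # upper half moves to the new page
        page.keys = page.keys[:mid]
        new_page.next = page.next         # relink the logical chain
        page.next = new_page

    def logical_order(self):
        page = self.head
        while page is not None:
            yield page.page_id, list(page.keys)
            page = page.next

chain = LeafChain()
for key in range(10, 0, -1):  # insert 10, 9, ..., 1
    chain.insert(key)

for page_id, keys in chain.logical_order():
    print(page_id, keys)
```

Following the chain returns the keys in sorted order, yet the page ids visited are not ascending. That mismatch is what the answer calls logical fragmentation, and no existing rows had to be shifted within the file to make room.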
In a newly created database with a single file, I ran the following.
CREATE TABLE T
(
X TINYINT NOT NULL,
Y CHAR(3000) NULL
);
CREATE CLUSTERED INDEX ix
ON T(X);
GO
--Insert 100 rows with values 1 - 100 in random order
DECLARE @C1 AS CURSOR,
@X AS INT
SET @C1 = CURSOR FAST_FORWARD
FOR SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number BETWEEN 1 AND 100
ORDER BY CRYPT_GEN_RANDOM(4)
OPEN @C1;
FETCH NEXT FROM @C1 INTO @X;
WHILE @@FETCH_STATUS = 0
BEGIN
INSERT INTO T (X)
VALUES (@X);
FETCH NEXT FROM @C1 INTO @X;
END
Then checked the page layout with
SELECT page_id,
X,
geometry::Point(page_id, X, 0).STBuffer(1)
FROM T
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%)
ORDER BY page_id
The results were all over the place. The first row in key order (with value 1) was on nearly the last physical page.
Fragmentation can be reduced or removed by rebuilding or reorganizing an index to increase the correlation between logical order and physical order.
After running
ALTER INDEX ix ON T REBUILD;
I got the following: [image: page layout after the rebuild]
If the table has no clustered index it is called a heap.
Non clustered indexes can be built on either a heap or a clustered index. They always contain a row locator back to the base table. In the case of a heap, this is a physical row identifier (rid) and consists of three components (File:Page:Slot). In the case of a clustered index, the row locator is logical (the clustered index key).
For the latter case, if the non clustered index already naturally includes the CI key column(s), either as NCI key columns or INCLUDE-d columns, then nothing is added. Otherwise, the missing CI key column(s) silently get added to the NCI.
SQL Server always ensures that the key columns are unique for both types of indexes. The mechanism in which this is enforced for indexes not declared as unique differs between the two index types, however.
Clustered indexes get a uniquifier added for any rows with key values that duplicate an existing row. This is just an ascending integer.
For non clustered indexes not declared as unique SQL Server silently adds the row locator into the non clustered index key. This applies to all rows, not just those that are actually duplicates.
The clustered vs non clustered nomenclature is also used for column store indexes. The paper Enhancements to SQL Server Column Stores states
Although column store data is not really "clustered" on any key, we
decided to retain the traditional SQL Server convention of referring
to the primary index as a clustered index.

I realize this is a very old question, but I thought I would offer an analogy to help illustrate the fine answers above.
CLUSTERED INDEX
If you walk into a public library, you will find that the books are all arranged in a particular order (most likely the Dewey Decimal System, or DDS). This corresponds to the "clustered index" of the books. If the DDS# for the book you want was 005.7565 F736s, you would start by locating the row of bookshelves that is labeled 001-099 or something like that. (This endcap sign at the end of the stack corresponds to an "intermediate node" in the index.) Eventually you would drill down to the specific shelf labelled 005.7450 - 005.7600, then you would scan until you found the book with the specified DDS#, and at that point you have found your book.
NON-CLUSTERED INDEX
But if you didn't come into the library with the DDS# of your book memorized, then you would need a second index to assist you. In the olden days you would find at the front of the library a wonderful bureau of drawers known as the "Card Catalog". In it were thousands of 3x5 cards -- one for each book, sorted in alphabetical order (by title, perhaps). This corresponds to the "non-clustered index". These card catalogs were organized in a hierarchical structure, so that each drawer would be labeled with the range of cards it contained (Ka - Kl, for example; i.e., the "intermediate node"). Once again, you would drill in until you found your book, but in this case, once you have found it (i.e, the "leaf node"), you don't have the book itself, but just a card with an index number (the DDS#) with which you could find the actual book in the clustered index.
Of course, nothing would stop the librarian from photocopying all the cards and sorting them in a different order in a separate card catalog. (Typically there were at least two such catalogs: one sorted by author name, and one by title.) In principle, you could have as many of these "non-clustered" indexes as you want.

Find below some characteristics of clustered and non-clustered indexes:
Clustered Indexes
Clustered indexes are indexes that define the order in which the rows of an SQL table are stored; they are usually built on a key that uniquely identifies each row.
Every table can have at most one clustered index.
You can create a clustered index that covers more than one column. For example: CREATE CLUSTERED INDEX index_name ON table_name (col1, col2, ...).
By default, a column with a primary key already has a clustered index.
Non-clustered Indexes
Non-clustered indexes are like simple indexes. They are just used for fast retrieval of data and are not guaranteed to contain unique data.

Clustered Index
A clustered index determines the physical order of data in a table. For this reason, a table has only one clustered index (on the primary key or a composite key).
Example: a dictionary. No other index is needed, because the entries are already ordered by word.
Nonclustered Index
A non-clustered index is analogous to an index in a book. The data is stored in one place. The index is stored in another place, and the index has pointers to the storage location. This helps in the fast search of data. For this reason, a table can have more than one nonclustered index.
Example: a biology textbook. At the start there is a table of contents pointing to the chapter locations, and at the end there is another index pointing to the locations of common words.

A very simple, non-technical rule-of-thumb would be that clustered indexes are usually used for your primary key (or, at least, a unique column) and that non-clustered are used for other situations (maybe a foreign key). Indeed, SQL Server will by default create a clustered index on your primary key column(s). As you will have learnt, the clustered index relates to the way data is physically sorted on disk, which means it's a good all-round choice for most situations.

Clustered Index
A Clustered Index is basically a tree-organized table. Instead of storing the records in an unsorted Heap table space, the clustered index is a B+Tree index whose Leaf Nodes, which are ordered by the clustered key column value, store the actual table records.
The Clustered Index is the default table structure in SQL Server and MySQL. While MySQL adds a hidden clustered index even if a table doesn't have a Primary Key, SQL Server by default builds a Clustered Index only if a table has a Primary Key column. Otherwise, the SQL Server table is stored as a Heap Table.
The Clustered Index can speed up queries that filter records by the clustered index key, like the usual CRUD statements. Since the records are located in the Leaf Nodes, there's no additional lookup for extra column values when locating records by their Primary Key values.
For example, when executing the following SQL query on SQL Server:
SELECT PostId, Title
FROM Post
WHERE PostId = ?
You can see that the Execution Plan uses a Clustered Index Seek operation to locate the Leaf Node containing the Post record, and there are only two logical reads required to scan the Clustered Index nodes:
|StmtText |
|-------------------------------------------------------------------------------------|
|SELECT PostId, Title FROM Post WHERE PostId = @P0 |
| |--Clustered Index Seek(OBJECT:([high_performance_sql].[dbo].[Post].[PK_Post_Id]), |
| SEEK:([high_performance_sql].[dbo].[Post].[PostID]=[@P0]) ORDERED FORWARD) |
Table 'Post'. Scan count 0, logical reads 2, physical reads 0
Non-Clustered Index
Since the Clustered Index is usually built using the Primary Key column values, if you want to speed up queries that use some other column, then you'll have to add a Secondary Non-Clustered Index.
The Secondary Index is going to store the Primary Key value in its Leaf Nodes.
So, if we create a Secondary Index on the Title column of the Post table:
CREATE INDEX IDX_Post_Title on Post (Title)
And we execute the following SQL query:
SELECT PostId, Title
FROM Post
WHERE Title = ?
We can see that an Index Seek operation is used to locate the Leaf Node in the IDX_Post_Title index that can provide the SQL query projection we are interested in:
|StmtText |
|------------------------------------------------------------------------------|
|SELECT PostId, Title FROM Post WHERE Title = @P0 |
| |--Index Seek(OBJECT:([high_performance_sql].[dbo].[Post].[IDX_Post_Title]),|
| SEEK:([high_performance_sql].[dbo].[Post].[Title]=[@P0]) ORDERED FORWARD)|
Table 'Post'. Scan count 1, logical reads 2, physical reads 0
Since the associated PostId Primary Key column value is stored in the IDX_Post_Title Leaf Node, this query doesn't need an extra lookup to locate the Post row in the Clustered Index.
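Whether a query needs that extra lookup comes down to a set comparison, sketched below in Python. The function name is hypothetical, and the Body column in the second call is an assumed extra column of the Post table that does not appear in the original example. The sketch only models which columns are available in the secondary index leaf; it says nothing about how the optimizer actually chooses a plan.

```python
def needs_clustered_lookup(selected_cols, index_cols, pk_cols):
    """True if the query reads a column that the secondary index leaf lacks."""
    available = set(index_cols) | set(pk_cols)  # leaf holds index key + PK
    return not set(selected_cols) <= available

# SELECT PostId, Title ... WHERE Title = ?  using IDX_Post_Title
print(needs_clustered_lookup(["PostId", "Title"], ["Title"], ["PostId"]))

# Same query, but also selecting the assumed Body column
print(needs_clustered_lookup(["PostId", "Title", "Body"], ["Title"], ["PostId"]))
```

The first query is covered by the index, so it can be answered from the secondary index alone. The second would need a further lookup in the clustered index to fetch Body.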

Clustered Index
Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be sorted in only one order.
The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap.
Nonclustered
Nonclustered indexes have a structure separate from the data rows. A nonclustered index contains the nonclustered index key values and each key value entry has a pointer to the data row that contains the key value.
The pointer from an index row in a nonclustered index to a data row is called a row locator. The structure of the row locator depends on whether the data pages are stored in a heap or a clustered table. For a heap, a row locator is a pointer to the row. For a clustered table, the row locator is the clustered index key.
You can add nonkey columns to the leaf level of the nonclustered index to by-pass existing index key limits, and execute fully covered, indexed, queries. For more information, see Create Indexes with Included Columns. For details about index key limits see Maximum Capacity Specifications for SQL Server.
Reference: https://learn.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described

Let me offer a textbook definition on "clustering index", which is taken from 15.6.1 from Database Systems: The Complete Book:
We may also speak of clustering indexes, which are indexes on an attribute or attributes such that all of tuples with a fixed value for the search key of this index appear on roughly as few blocks as can hold them.
To understand the definition, let's take a look at Example 15.10 provided by the textbook:
A relation R(a,b) that is sorted on attribute a and stored in that
order, packed into blocks, is surely clustered. An index on a is a
clustering index, since for a given a-value a1, all the tuples with
that value for a are consecutive. They thus appear packed into
blocks, except possibly for the first and last blocks that contain
a-value a1, as suggested in Fig.15.14. However, an index on b is
unlikely to be clustering, since the tuples with a fixed b-value
will be spread all over the file unless the values of a and b are
very closely correlated.
Note that the definition does not require the data blocks to be contiguous on the disk; it only says tuples with the search key are packed into as few data blocks as possible.
A related concept is clustered relation. A relation is "clustered" if its tuples are packed into roughly as few blocks as can possibly hold those tuples. In other words, from a disk block perspective, if it contains tuples from different relations, then those relations cannot be clustered (i.e., there is a more packed way to store such a relation by swapping the tuples of that relation from other disk blocks with the tuples that don't belong to the relation in the current disk block). Clearly, R(a,b) in the example above is clustered.
To connect the two concepts, a clustered relation can have both a clustering index and a nonclustering index. However, for a non-clustered relation, a clustering index is not possible unless the index is built on top of the primary key of the relation.
"Cluster" as a word is overused across all abstraction levels of the database storage side (three levels of abstraction: tuples, blocks, file). There is also a concept called "clustered file", which describes whether a file (an abstraction for a group of one or more disk blocks) contains tuples from one relation or from different relations. It doesn't relate to the clustering index concept, as it is defined at the file level.
However, some teaching material likes to define clustering index based on the clustered file definition. Those two types of definitions are the same on clustered relation level, no matter whether they define clustered relation in terms of data disk block or file. From the link in this paragraph,
An index on attribute(s) A on a file is a clustering index when: All tuples with attribute value A = a are stored sequentially (= consecutively) in the data file
Storing tuples consecutively is the same as saying "tuples are packed into roughly as few blocks as can possibly hold those tuples" (with minor difference on one talking about file, the other talking about disk). It's because storing tuple consecutively is the way to achieve "packed into roughly as few blocks as can possibly hold those tuples".

Clustered Index:
A Primary Key constraint creates a clustered index automatically if no clustered index already exists on the table. The actual data of a clustered index is stored at the leaf level of the index.
Non Clustered Index:
The actual data is not found directly at the leaf node of a non clustered index; instead, an additional step is needed to find it, because the leaf node holds only row locators pointing towards the actual data.
A non clustered index does not determine the order of the table's rows the way a clustered index does. There can be multiple non clustered indexes per table; the limit depends on the SQL Server version in use. SQL Server 2005 allows 249 non clustered indexes per table, and later versions such as 2008 and 2016 allow 999.

Clustered Index - A clustered index defines the order in which data is physically stored in a table. Table data can be sorted in only one way; therefore, there can be only one clustered index per table. In SQL Server, the primary key constraint automatically creates a clustered index on that particular column.
Non-Clustered Index - A non-clustered index doesn’t sort the physical data inside the table. In fact, a non-clustered index is stored in one place and the table data is stored in another. This is similar to a textbook, where the book content is located in one place and the index is located in another. This allows for more than one non-clustered index per table. It is important to mention here that inside the table the data will be sorted by the clustered index. However, inside the non-clustered index, data is stored in the specified order. The index contains the column values on which the index is created and the address of the record that each column value belongs to. When a query is issued against a column on which the index is created, the database will first go to the index and look for the address of the corresponding row in the table. It will then go to that row address and fetch the other column values. It is due to this additional step that non-clustered indexes are slower than clustered indexes.
Differences between clustered and Non-clustered index
There can be only one clustered index per table. However, you can create multiple non-clustered indexes on a single table.
Clustered indexes only sort tables. Therefore, they do not consume extra storage. Non-clustered indexes are stored in a separate place from the actual table, claiming more storage space.
Clustered indexes are faster than non-clustered indexes since they don’t involve any extra lookup step.
For more information refer to this article.
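The extra lookup step described above can be sketched in Python. This is only a toy model under stated assumptions: a sorted list stands in for the clustered layout, a list position stands in for a row address, and the table, column names and helper functions are hypothetical.

```python
import bisect

# Heap storage: rows in arrival order; the list position acts as the row address.
heap = [
    {"id": 3, "name": "carol", "city": "Oslo"},
    {"id": 1, "name": "alice", "city": "Rome"},
    {"id": 2, "name": "bob", "city": "Lima"},
]

# Clustered layout: the rows themselves are kept sorted by id.
clustered = sorted(heap, key=lambda row: row["id"])
clustered_keys = [row["id"] for row in clustered]

# Non-clustered index on name: sorted (name, row address) pairs, stored apart from the rows.
name_index = sorted((row["name"], address) for address, row in enumerate(heap))
name_keys = [name for name, _ in name_index]

def find_by_id(key):
    """One step: the search lands directly on the full row."""
    i = bisect.bisect_left(clustered_keys, key)
    if i < len(clustered_keys) and clustered_keys[i] == key:
        return clustered[i]
    return None

def find_by_name(name):
    """Two steps: search the index for the address, then fetch the row."""
    i = bisect.bisect_left(name_keys, name)
    if i < len(name_keys) and name_keys[i] == name:
        _, address = name_index[i]
        return heap[address]
    return None
```

Here find_by_id returns the row as soon as the key is found, while find_by_name first obtains an address and then has to visit the row storage.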

Cluster or unique index MySql DB?

I have one interesting question: what is the difference between a clustered index and a unique index? Which is better and faster, and why?

With MySQL's InnoDB engine, you get 3 choices of ordinary indexes:
PRIMARY KEY -- "clustered" with the data and "unique". (Often AUTO_INCREMENT)
UNIQUE -- unique, not clustered.
INDEX -- non unique, not clustered.
All are implemented as BTrees. Clustered means that the data is in the leaf node.
Considerations:
Any of the 3 choices can help with performance of searching for row(s).
InnoDB needs a PRIMARY KEY.
If you want the database to spit at you when you are trying to insert something that is already there, you need to specify the column(s) of the PRIMARY KEY or a UNIQUE.
Secondary keys (UNIQUE or INDEX) go through the PRIMARY KEY to find the data. Corollary 1: Finding a row by the PK is faster. Corollary 2: A bulky (eg, VARCHAR(255)) PK is an extra burden on every secondary key.
A "covering" secondary key does not need to go beyond its BTree.
Data and indexes are cached in RAM at the 16KB block unit; caching may be an important consideration in performance.
UUID/GUID indexes are terrible when the index cannot be fully cached -- due to high I/O.
An INSERT must immediately check the PRIMARY KEY and any UNIQUE key(s) for duplicate, but it can delay updating other secondary keys. (This may have impact on performance during inserts.)
From those details, it is sometimes possible to deduce the answers to your questions.
(Caveat: Engines other than InnoDB have some different characteristics.)
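To make the "secondary keys go through the PRIMARY KEY" and "covering" points concrete, here is a small Python model. It is a sketch, not InnoDB's implementation: plain dicts stand in for the BTrees, and the table, the email column and the lookup_by_email helper are invented for the example.

```python
# Clustered "tree": primary key -> full row (the leaf holds all columns).
primary = {
    1: {"id": 1, "email": "alice@example.com", "name": "Alice"},
    2: {"id": 2, "email": "bob@example.com", "name": "Bob"},
}

# Secondary index on email: each entry carries a copy of the primary key.
email_index = {row["email"]: pk for pk, row in primary.items()}

def lookup_by_email(email, columns):
    """Return (result, number_of_trees_touched) for the requested columns."""
    pk = email_index.get(email)
    if pk is None:
        return None, 1
    # Covering case: the index entry already holds every requested column.
    if set(columns) <= {"email", "id"}:
        entry = {"email": email, "id": pk}
        return {c: entry[c] for c in columns}, 1
    # Otherwise a second lookup through the primary key is needed.
    row = primary[pk]
    return {c: row[c] for c in columns}, 2

covered = lookup_by_email("bob@example.com", ["id"])
not_covered = lookup_by_email("bob@example.com", ["name"])
```

The first call is answered from the secondary index alone; the second has to continue into the clustered structure to reach the name column.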

What happens with an index exactly when a row is inserted or updated in mysql?

Is the index re-built completely or is the index updated? If it is updated then what exactly is updated?
Assume InnoDB is being used.

All indexes for a table in MySQL are "immediately" updated (not rebuilt) as a row is INSERTed into that table. Ditto for DELETE. In some cases, UPDATE causes index update(s).
By "immediately", I mean that you cannot tell whether it is finished before control is returned to you, or whether there is some form of caching going on.
Most indexes in MySQL are BTrees. In a few cases, there is FULLTEXT, SPATIAL, or HASH.
Adding an entry to a BTree involves drilling down the "tree" (~3 levels for a million-row table) and adding a 'record' in the leaf node. This is fast enough that you cannot tell whether it is done live.
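The "~3 levels for a million-row table" figure can be checked with a back-of-the-envelope calculation. The sketch below assumes a fanout of about 100 entries per BTree block; that number is a rule-of-thumb assumption for illustration, not something stated above, and real fanout depends on key and row sizes.

```python
def btree_levels(rows, fanout=100):
    """Estimate how many levels a BTree needs to address 'rows' entries,
    assuming every block holds 'fanout' entries (a hypothetical average)."""
    levels = 1
    capacity = fanout
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels

million_row_levels = btree_levels(1_000_000)
billion_row_levels = btree_levels(1_000_000_000)
```

Under that assumption a million rows need 3 levels and a billion rows need 5, so the depth grows very slowly with table size.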
If you have a dozen indexes, then there are a dozen BTrees (or whatever) to update. This suggests you should not have more indexes than you need.
In InnoDB the PRIMARY KEY is "clustered". That is, the data and the PRIMARY KEY live together in a single BTree, ordered by the PRIMARY KEY and containing all the data.
In InnoDB, each 'record' in a secondary index (also structured as a BTree) contains a copy of the PRIMARY KEY. (This may be what Zafar is alluding to.)
A BTree index is very efficient for
"Point queries" -- Finding one row, given the 'key'.
"Range queries" -- Finding rows given a key range (eg, WHERE key BETWEEN 22 AND 44)
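Both access patterns rely on the keys being kept in sorted order. The following Python sketch uses a sorted list with binary search as a stand-in for a BTree; the key values and function names are made up for the example.

```python
import bisect

# Sorted keys, standing in for the ordered entries of a BTree index.
keys = [5, 12, 22, 30, 44, 51, 60]

def point_query(sorted_keys, key):
    """Return True if the key is present, using binary search."""
    i = bisect.bisect_left(sorted_keys, key)
    return i < len(sorted_keys) and sorted_keys[i] == key

def range_query(sorted_keys, low, high):
    """Return all keys with low <= key <= high, like WHERE key BETWEEN low AND high."""
    start = bisect.bisect_left(sorted_keys, low)
    end = bisect.bisect_right(sorted_keys, high)
    return sorted_keys[start:end]

in_range = range_query(keys, 22, 44)
```

A range query locates the first qualifying key with one search and then reads neighbouring entries in order, which is why sorted structures handle BETWEEN efficiently.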

In short, you can say that in InnoDB every index is associated with the clustered index (normally the primary key), so whenever an indexed value is updated, the changed value is again associated with the clustered index.
