InnoDB B+ tree index - duplicate values - mysql

How are duplicate keys handled in InnoDB's implementation of the B+ tree for its indexes?
For example, suppose there is a table with 1 million rows and a column with a cardinality of 10. If we create an index on this column, what will the resulting B+ tree look like?
Will it have just 10 keys, with the value of each key being the list of primary keys that belong to it (if so, in what structure? A linked list?), or will it have 1M keys (if so, the B+ tree would have to be handled differently)?

In some sense, an InnoDB BTree has no duplicates. This is because the columns of the PRIMARY KEY are appended to the columns specified for a secondary key. That leads to a fully-ordered list.
When you look up via a secondary key (or the initial part of a key), the query drills down the BTree to find the first entry in the index matching what you gave, then scans forward to get any others. To get the rest of the columns, it uses the PRIMARY KEY columns to do a second BTree lookup.
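The lookup described above can be sketched in a few lines. This is only a toy model, not InnoDB's actual page format: the sorted list `secondary_index`, the dictionary `clustered_index`, and the function `lookup` are hypothetical names, and a sorted Python list stands in for the secondary BTree.

```python
from bisect import bisect_left

# Toy secondary index: (secondary_key_value, primary_key) pairs kept sorted.
# Appending the primary key makes every entry unique and fully ordered.
secondary_index = sorted([
    ("blue", 7), ("red", 2), ("blue", 3), ("green", 5), ("red", 9), ("blue", 1),
])

# Toy clustered index: primary key -> full row.
clustered_index = {pk: {"id": pk, "color": color} for color, pk in secondary_index}


def lookup(color):
    """Return the full rows whose secondary key equals `color`."""
    # "Drill down" to the first entry for this key. The 1-tuple (color,)
    # sorts before every (color, pk) pair, so bisect lands on the first match.
    pos = bisect_left(secondary_index, (color,))
    rows = []
    # Scan forward while the secondary key still matches.
    while pos < len(secondary_index) and secondary_index[pos][0] == color:
        pk = secondary_index[pos][1]
        rows.append(clustered_index[pk])  # second lookup, via the primary key
        pos += 1
    return rows


print([row["id"] for row in lookup("blue")])
```

Matching rows come back in primary key order within one secondary key value, which is a direct consequence of the PK being part of the index entry.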
The Optimizer will rarely use an index with "low cardinality". For example, a yes/no or true/false or male/female column should not be indexed. The Optimizer would find it faster to simply scan the table rather than bounce back and forth between the index and (via the PK columns) the main BTree.
The cutoff for when to use the index versus punting is somewhere around 20%, depending on the phase of the moon.

Bad index
The case you propose is a bad one for a B+ tree. A cardinality of 10 means the 1 million rows contain only 10 distinct values. Actually it is not only bad for a B+ tree, it is a bad index in general. Filtering on this index leaves you, on average, with a subset of approx. 100,000 rows, which you either have to look through or filter further using another value.
B+ tree properties
Concerning the structure of the resulting tree there are some things to keep in mind here:
A node cannot contain an arbitrary amount of data.
Inserts may require splits if the leaf node is full
Occasionally the split of a leaf node necessitates split of the next higher node
In worst case scenarios the split may cascade all the way up to the root node
https://www.percona.com/files/presentations/percona-live/london-2011/PLUK2011-b-
Leaves are linked as a doubly linked list.
Leaf nodes are linked together as doubly linked list
[…]
Entire tree may be scanned without visiting the higher nodes at all
https://www.percona.com/files/presentations/percona-live/london-2011/PLUK2011-b-
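To make the quoted property concrete, here is a minimal sketch of leaves linked in both directions. The `Leaf` class and the helper functions are hypothetical and only model the links between leaves, not the upper levels of the tree.

```python
class Leaf:
    """A toy leaf page: sorted keys plus links to the neighbouring leaves."""

    def __init__(self, keys):
        self.keys = keys
        self.prev = None
        self.next = None


def link_leaves(leaves):
    """Chain the leaves in both directions and return the first one."""
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
        right.prev = left
    return leaves[0] if leaves else None


def scan_forward(first_leaf):
    """Yield every key in order without touching any non-leaf node."""
    leaf = first_leaf
    while leaf is not None:
        yield from leaf.keys
        leaf = leaf.next


def scan_backward(last_leaf):
    """Yield every key in reverse order by following the prev links."""
    leaf = last_leaf
    while leaf is not None:
        yield from reversed(leaf.keys)
        leaf = leaf.prev


leaves = [Leaf([1, 2]), Leaf([3, 4]), Leaf([5])]
first = link_leaves(leaves)
print(list(scan_forward(first)))
```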
Expectation
If you insert a lot of data whose keys more or less all belong to the same equivalence class, I would expect a tree that does not help much. The 10 keys might be present solely in the root node, and all data deeper in the tree would just be unsorted (because there is nothing left to sort it by).
Because the leaves form a doubly linked list, you are basically left with what I wrote at the beginning: you have to traverse a big subset of the values. Given this index, that had to be expected, and the B+ tree might be doing well under the circumstances (a list is fine for just going through all the data).
Actually this goes one abstraction level deeper: the leaves are doubly linked, but there are multiple values in each leaf (data or a link to the PK). Nevertheless, these are in a list too, so if you just traverse everything it does not make much of a difference.
Examining InnoDB space
Note that you can also investigate what MySQL is really building. There are tools to inspect the index data structures it builds, see for example
https://blog.jcole.us/2013/01/10/btree-index-structures-in-innodb/
https://github.com/jeremycole/innodb_ruby

InnoDB stores a table in a B+ tree index internally called PRIMARY. The key of the index is your primary key fields.
If you define a secondary index, there will be an additional B+ tree index (in the .ibd file or ibdata1) where the key is the secondary index fields and the value is the primary key.
The B+ tree itself doesn't require keys to be unique. Uniqueness of PRIMARY and all UNIQUE indexes is enforced at server level.
Here are some slides about how InnoDB organizes indexes and uses them to access the data. http://www.slideshare.net/akuzminsky/efficient-indexes-in-mysql#downloads-panel

Related

Build secondary indexes more freely

There is a table t_bl with the following fields (id,a,b).
select a,b from t_bl where a = "XXX";
In this case, the b field is not used as the basis for retrieval, but needs to appear in the retrieval results.
Then there are the following two index building schemes.
1. Create an index on the a field
Advantage: Each index node avoids the space overhead of storing the b field.
Disadvantage: Returning the query results requires going back to the clustered index, via the primary key id stored in the secondary index, to fetch the value of the b field, which affects query performance.
2. Create a composite index on the a,b fields
Advantage: A covering index avoids going back to the table and improves query efficiency.
Disadvantage: Every non-leaf node in the secondary index carries unnecessary space overhead for storing the b field (because the b field is not the basis for retrieval).
So why not provide a mechanism that enables users to create a secondary index that is asymmetric between non-leaf nodes and leaf nodes?
In this example, the user's better choice would be to have the non-leaf nodes store only the a field, and the leaf nodes store both fields a and b.
Some implementations of SQL databases do exactly what you describe, adding a column to the leaf node only, so it doesn't take space in the non-leaf nodes, but can be used for covering indexes.
An example of a product that does this is Microsoft SQL Server, which supports syntax allowing you to define optional non-key columns to INCLUDE() in a secondary index. See https://learn.microsoft.com/en-us/sql/relational-databases/indexes/create-indexes-with-included-columns?view=sql-server-ver16
However, InnoDB is not currently implemented to do this. As far as I know, there's no reason it can't do this, but they have not implemented it. I guess other features were higher priority.
For what it's worth, the SQL standard doesn't include anything about indexes, so each vendor implements their indexing feature totally as an extension to the standard. They are therefore free to implement index features according to their own priorities.
Simply make a "composite" "covering" index:
INDEX(a, b)
That particular query will run faster because it does not need to bounce between the index's BTree and the data's BTree.
In some similar situations, I recommend
PRIMARY KEY(a, id), -- all columns are efficiently accessed via a=..
INDEX(id) -- to keep AUTO_INCREMENT happy
As for 2's disadvantage -- It is tiny. I like the Rule of Thumb that MySQL's BTrees have a fan-out of about 100. That is, each node has 100 child nodes. The corollary is that the non-leaf nodes take up only 1% of the total disk space. (This may be the clinching argument for the developers to say "let's not bother implementing INCLUDE".)
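The 1% corollary is easy to check with a little arithmetic. The sketch below assumes a uniform fan-out of 100 and completely full pages, which real trees only approximate; `count_pages` is a hypothetical helper, not anything MySQL provides.

```python
def count_pages(leaf_pages, fan_out=100):
    """Return (non_leaf_pages, fraction_of_all_pages) for a uniform fan-out."""
    non_leaf = 0
    level = leaf_pages
    while level > 1:
        # Ceiling division: number of parent pages needed for this level.
        level = -(-level // fan_out)
        non_leaf += level
    total = leaf_pages + non_leaf
    return non_leaf, non_leaf / total


non_leaf, fraction = count_pages(1_000_000)
print(non_leaf, round(fraction * 100, 2))
```

With one million leaf pages this gives 10,000 + 100 + 1 non-leaf pages, so widening only the non-leaf entries affects roughly one page in a hundred.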

Do RDBs like MySQL and PostgreSQL store copy of data for each index? Or just B-tree with links to real objects?

How do RDBs like MySQL and PostgreSQL manage memory for new indexes?
My guess is that the RDB creates a B-tree (or other index) with references/links to the real objects in memory.
Another guess is that it duplicates all the data for each new index.
So basically this question is about "What does a B-tree consist of? References, or real objects?"
Search results are swamped with general DB topics and RDBMS product pages, so I would also be very grateful for pointers to good articles about this.
The details vary, but a B-tree index is a tree structure that is stored on disk. It contains duplicates of the indexed terms (the index keys) and a (direct or indirect) pointer to the indexed row in the table.
A B-tree index represents a sorted list of the index keys that allows fast searches. The tree structure speeds up searching through the list and allows inserting and deleting entries without too much data churn.
It is unclear what you mean by a "real object". The index keys are certainly real, and they are stored in the index. But if you mean the whole table row, that is only referenced from the index.
For MySQL's Engine=InnoDB, it works this way:
The data is stored in PRIMARY KEY order in a B+Tree. This makes lookups and ranges based on the PK very efficient.
Each secondary key is also a B+Tree, ordered by the secondary key column(s). Each "row" also has the columns of the PK, thereby providing the reference (link) to the data's BTree.
If the columns of the secondary key plus the PK are the only columns you need, then the query is performed using only the secondary key's BTree.
There is no "ROWNUM" as found in some other database brands.
If you don't hit certain limits, you could include all the table's columns in a secondary index.
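The "covering" condition described above boils down to a set comparison. Here is a minimal sketch; `is_covering` and the column names are hypothetical, and the real optimizer considers more than this.

```python
def is_covering(index_columns, pk_columns, needed_columns):
    """True if every needed column is available in the secondary index entry.

    An InnoDB secondary index entry carries its own columns plus the PK columns.
    """
    available = set(index_columns) | set(pk_columns)
    return set(needed_columns) <= available


# Hypothetical table (id, a, b) with PRIMARY KEY(id).
print(is_covering(["a"], ["id"], ["a", "id"]))      # index on (a) alone suffices
print(is_covering(["a"], ["id"], ["a", "b"]))       # b is missing, second lookup needed
print(is_covering(["a", "b"], ["id"], ["a", "b"]))  # INDEX(a, b) covers the query
```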

Non-clustered index contains copies of physical data or just positions to physical data ? [duplicate]

I have a limited exposure to DB and have only used DB as an application programmer. I want to know about Clustered and Non clustered indexes.
I googled and what I found was :
A clustered index is a special type of index that reorders the way
records in the table are physically
stored. Therefore table can have only
one clustered index. The leaf nodes
of a clustered index contain the data
pages. A nonclustered index is a
special type of index in which the
logical order of the index does not
match the physical stored order of
the rows on disk. The leaf node of a
nonclustered index does not consist of
the data pages. Instead, the leaf
nodes contain index rows.
What I found on SO was "What are the differences between a clustered and a non-clustered index?".
Can someone explain this in plain English?
With a clustered index the rows are stored physically on the disk in the same order as the index. Therefore, there can be only one clustered index.
With a non clustered index there is a second list that has pointers to the physical rows. You can have many non clustered indices, although each new index will increase the time it takes to write new records.
It is generally faster to read from a clustered index if you want to get back all the columns. You do not have to go first to the index and then to the table.
Writing to a table with a clustered index can be slower, if there is a need to rearrange the data.
A clustered index means you are telling the database to store close values actually close to one another on the disk. This has the benefit of rapid scan / retrieval of records falling into some range of clustered index values.
For example, you have two tables, Customer and Order:
Customer
----------
ID
Name
Address
Order
----------
ID
CustomerID
Price
If you wish to quickly retrieve all orders of one particular customer, you may wish to create a clustered index on the "CustomerID" column of the Order table. This way the records with the same CustomerID will be physically stored close to each other on disk (clustered) which speeds up their retrieval.
P.S. The index on CustomerID will obviously be not unique, so you either need to add a second field to "uniquify" the index or let the database handle that for you but that's another story.
Regarding multiple indexes. You can have only one clustered index per table because this defines how the data is physically arranged. If you wish an analogy, imagine a big room with many tables in it. You can either put these tables to form several rows or pull them all together to form a big conference table, but not both ways at the same time. A table can have other indexes, they will then point to the entries in the clustered index which in its turn will finally say where to find the actual data.
In SQL Server, for row-oriented storage, both clustered and nonclustered indexes are organized as B trees.
The key difference between clustered indexes and non clustered indexes is that the leaf level of the clustered index is the table. This has two implications.
1. The rows on the clustered index leaf pages always contain something for each of the (non-sparse) columns in the table (either the value or a pointer to the actual value).
2. The clustered index is the primary copy of a table.
Non clustered indexes can also do point 1 by using the INCLUDE clause (Since SQL Server 2005) to explicitly include all non-key columns but they are secondary representations and there is always another copy of the data around (the table itself).
CREATE TABLE T
(
A INT,
B INT,
C INT,
D INT
)
CREATE UNIQUE CLUSTERED INDEX ci ON T(A, B)
CREATE UNIQUE NONCLUSTERED INDEX nci ON T(A, B) INCLUDE (C, D)
The two indexes above will be nearly identical, with the upper-level index pages containing values for the key columns A, B and the leaf-level pages containing A, B, C, D.
There can be only one clustered index per table, because the data rows
themselves can be sorted in only one order.
The above quote from SQL Server Books Online causes much confusion.
In my opinion, it would be much better phrased as:
There can be only one clustered index per table because the leaf level rows of the clustered index are the table rows.
The Books Online quote is not incorrect, but you should be clear that the "sorting" of both non clustered and clustered indices is logical, not physical. If you read the pages at leaf level by following the linked list, and read the rows on each page in slot array order, then you will read the index rows in sorted order, but physically the pages may not be sorted. The commonly held belief that with a clustered index the rows are always stored physically on the disk in the same order as the index key is false.
This would be an absurd implementation. For example, if a row is inserted into the middle of a 4GB table SQL Server does not have to copy 2GB of data up in the file to make room for the newly inserted row.
Instead, a page split occurs. Each page at the leaf level of both clustered and non clustered indexes has the address (File: Page) of the next and previous page in logical key order. These pages need not be either contiguous or in key order.
e.g. the linked page chain might be 1:2000 <-> 1:157 <-> 1:7053
When a page split happens a new page is allocated from anywhere in the filegroup (from either a mixed extent, for small tables or a non-empty uniform extent belonging to that object or a newly allocated uniform extent). This might not even be in the same file if the filegroup contains more than one.
The degree to which the logical order and contiguity differ from the idealized physical version is the degree of logical fragmentation.
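The gap between logical and physical order can be illustrated with the page chain from the example above. The metric below is a simplified stand-in (the share of next-page links that point backwards), not the exact formula SQL Server reports, and `out_of_order_fraction` is a hypothetical name.

```python
# Leaf pages as (file_id, page_id), listed in logical key order,
# i.e. the order in which the next-page links are followed.
chain = [(1, 2000), (1, 157), (1, 7053)]


def out_of_order_fraction(pages):
    """Fraction of next-page links that lead to a lower physical address."""
    if len(pages) < 2:
        return 0.0
    backwards = sum(1 for cur, nxt in zip(pages, pages[1:]) if nxt < cur)
    return backwards / (len(pages) - 1)


print(out_of_order_fraction(chain))          # one of the two links goes backwards
print(out_of_order_fraction(sorted(chain)))  # physically ordered chain
```

Reading the chain in link order still returns rows in key order; only the physical addresses jump around.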
In a newly created database with a single file, I ran the following.
CREATE TABLE T
(
X TINYINT NOT NULL,
Y CHAR(3000) NULL
);
CREATE CLUSTERED INDEX ix
ON T(X);
GO
--Insert 100 rows with values 1 - 100 in random order
DECLARE @C1 AS CURSOR,
@X AS INT
SET @C1 = CURSOR FAST_FORWARD
FOR SELECT number
FROM master..spt_values
WHERE type = 'P'
AND number BETWEEN 1 AND 100
ORDER BY CRYPT_GEN_RANDOM(4)
OPEN @C1;
FETCH NEXT FROM @C1 INTO @X;
WHILE @@FETCH_STATUS = 0
BEGIN
INSERT INTO T (X)
VALUES (@X);
FETCH NEXT FROM @C1 INTO @X;
END
Then checked the page layout with
SELECT page_id,
X,
geometry::Point(page_id, X, 0).STBuffer(1)
FROM T
CROSS APPLY sys.fn_PhysLocCracker(%%physloc%%)
ORDER BY page_id
The results were all over the place. The first row in key order (with value 1 - highlighted with an arrow below) was on nearly the last physical page.
Fragmentation can be reduced or removed by rebuilding or reorganizing an index to increase the correlation between logical order and physical order.
After running
ALTER INDEX ix ON T REBUILD;
I got the following
If the table has no clustered index it is called a heap.
Non clustered indexes can be built on either a heap or a clustered index. They always contain a row locator back to the base table. In the case of a heap, this is a physical row identifier (rid) and consists of three components (File:Page:Slot). In the case of a clustered index, the row locator is logical (the clustered index key).
For the latter case, if the non clustered index already naturally includes the CI key column(s), either as NCI key columns or INCLUDE-d columns, then nothing is added. Otherwise, the missing CI key column(s) silently get added to the NCI.
SQL Server always ensures that the key columns are unique for both types of indexes. The mechanism in which this is enforced for indexes not declared as unique differs between the two index types, however.
Clustered indexes get a uniquifier added for any rows with key values that duplicate an existing row. This is just an ascending integer.
For non clustered indexes not declared as unique SQL Server silently adds the row locator into the non clustered index key. This applies to all rows, not just those that are actually duplicates.
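As a rough illustration of the two mechanisms, consider the toy model below. It is not SQL Server's on-disk format: representing "no uniquifier" as `None` and starting the counter at 1 are assumptions made for the sketch, and both function names are hypothetical.

```python
from collections import defaultdict


def add_uniquifiers(keys):
    """Pair each clustered key with a uniquifier; only duplicates receive one."""
    seen = defaultdict(int)
    result = []
    for key in keys:
        count = seen[key]
        # First occurrence: no uniquifier. Later duplicates: ascending integer.
        result.append((key, None if count == 0 else count))
        seen[key] = count + 1
    return result


def nonclustered_key(index_value, row_locator):
    """Non-unique nonclustered index: the row locator joins the key for every row."""
    return (index_value, row_locator)


print(add_uniquifiers([10, 20, 10, 10]))
print(nonclustered_key("Smith", (1, 157, 3)))  # heap locator as (file, page, slot)
```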
The clustered vs non clustered nomenclature is also used for column store indexes. The paper Enhancements to SQL Server Column Stores states
Although column store data is not really "clustered" on any key, we
decided to retain the traditional SQL Server convention of referring
to the primary index as a clustered index.
I realize this is a very old question, but I thought I would offer an analogy to help illustrate the fine answers above.
CLUSTERED INDEX
If you walk into a public library, you will find that the books are all arranged in a particular order (most likely the Dewey Decimal System, or DDS). This corresponds to the "clustered index" of the books. If the DDS# for the book you want was 005.7565 F736s, you would start by locating the row of bookshelves that is labeled 001-099 or something like that. (This endcap sign at the end of the stack corresponds to an "intermediate node" in the index.) Eventually you would drill down to the specific shelf labelled 005.7450 - 005.7600, then you would scan until you found the book with the specified DDS#, and at that point you have found your book.
NON-CLUSTERED INDEX
But if you didn't come into the library with the DDS# of your book memorized, then you would need a second index to assist you. In the olden days you would find at the front of the library a wonderful bureau of drawers known as the "Card Catalog". In it were thousands of 3x5 cards -- one for each book, sorted in alphabetical order (by title, perhaps). This corresponds to the "non-clustered index". These card catalogs were organized in a hierarchical structure, so that each drawer would be labeled with the range of cards it contained (Ka - Kl, for example; i.e., the "intermediate node"). Once again, you would drill in until you found your book, but in this case, once you have found it (i.e, the "leaf node"), you don't have the book itself, but just a card with an index number (the DDS#) with which you could find the actual book in the clustered index.
Of course, nothing would stop the librarian from photocopying all the cards and sorting them in a different order in a separate card catalog. (Typically there were at least two such catalogs: one sorted by author name, and one by title.) In principle, you could have as many of these "non-clustered" indexes as you want.
Find below some characteristics of clustered and non-clustered indexes:
Clustered Indexes
Clustered indexes are indexes that uniquely identify the rows in an SQL table.
Every table can have at most one clustered index.
You can create a clustered index that covers more than one column. For example: CREATE CLUSTERED INDEX index_name ON table_name (col1, col2, ...).
By default, a column with a primary key already has a clustered index.
Non-clustered Indexes
Non-clustered indexes are like simple indexes. They are just used for fast retrieval of data, and they are not guaranteed to contain unique values.
Clustered Index
A clustered index determines the physical order of the data in a table. For this reason, a table has only one clustered index (on the primary key or a composite key).
Analogy: a dictionary needs no other index; it is already ordered by word.
Nonclustered Index
A non-clustered index is analogous to the index in a book. The data is stored in one place; the index is stored in another place and has pointers to the storage location. This helps in searching the data quickly. For this reason, a table can have more than one nonclustered index.
Analogy: a biology book has one index at the start pointing to chapter locations, and another at the end pointing to the locations of common words.
A very simple, non-technical rule-of-thumb would be that clustered indexes are usually used for your primary key (or, at least, a unique column) and that non-clustered are used for other situations (maybe a foreign key). Indeed, SQL Server will by default create a clustered index on your primary key column(s). As you will have learnt, the clustered index relates to the way data is physically sorted on disk, which means it's a good all-round choice for most situations.
Clustered Index
A Clustered Index is basically a tree-organized table. Instead of storing the records in an unsorted Heap table space, the clustered index is a B+Tree index whose Leaf Nodes, ordered by the clustered key column value, store the actual table records.
The Clustered Index is the default table structure in SQL Server and MySQL. While MySQL adds a hidden clustered index even if a table doesn't have a Primary Key, SQL Server by default builds a Clustered Index only if a table has a Primary Key column. Otherwise, the table is stored as a Heap Table.
The Clustered Index can speed up queries that filter records by the clustered index key, like the usual CRUD statements. Since the records are located in the Leaf Nodes, there's no additional lookup for extra column values when locating records by their Primary Key values.
For example, when executing the following SQL query on SQL Server:
SELECT PostId, Title
FROM Post
WHERE PostId = ?
You can see that the Execution Plan uses a Clustered Index Seek operation to locate the Leaf Node containing the Post record, and there are only two logical reads required to scan the Clustered Index nodes:
|StmtText |
|-------------------------------------------------------------------------------------|
|SELECT PostId, Title FROM Post WHERE PostId = @P0 |
| |--Clustered Index Seek(OBJECT:([high_performance_sql].[dbo].[Post].[PK_Post_Id]), |
| SEEK:([high_performance_sql].[dbo].[Post].[PostID]=[@P0]) ORDERED FORWARD) |
Table 'Post'. Scan count 0, logical reads 2, physical reads 0
Non-Clustered Index
Since the Clustered Index is usually built using the Primary Key column values, if you want to speed up queries that use some other column, then you'll have to add a Secondary Non-Clustered Index.
The Secondary Index stores the Primary Key value in its Leaf Nodes.
So, if we create a Secondary Index on the Title column of the Post table:
CREATE INDEX IDX_Post_Title on Post (Title)
And we execute the following SQL query:
SELECT PostId, Title
FROM Post
WHERE Title = ?
We can see that an Index Seek operation is used to locate the Leaf Node in the IDX_Post_Title index that can provide the SQL query projection we are interested in:
|StmtText |
|------------------------------------------------------------------------------|
|SELECT PostId, Title FROM Post WHERE Title = @P0 |
| |--Index Seek(OBJECT:([high_performance_sql].[dbo].[Post].[IDX_Post_Title]),|
| SEEK:([high_performance_sql].[dbo].[Post].[Title]=[@P0]) ORDERED FORWARD)|
Table 'Post'. Scan count 1, logical reads 2, physical reads 0
Since the associated PostId Primary Key column value is stored in the IDX_Post_Title Leaf Node, this query doesn't need an extra lookup to locate the Post row in the Clustered Index.
Clustered Index
Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be sorted in only one order.
The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap.
Nonclustered
Nonclustered indexes have a structure separate from the data rows. A nonclustered index contains the nonclustered index key values and each key value entry has a pointer to the data row that contains the key value.
The pointer from an index row in a nonclustered index to a data row is called a row locator. The structure of the row locator depends on whether the data pages are stored in a heap or a clustered table. For a heap, a row locator is a pointer to the row. For a clustered table, the row locator is the clustered index key.
You can add nonkey columns to the leaf level of the nonclustered index to by-pass existing index key limits, and execute fully covered, indexed, queries. For more information, see Create Indexes with Included Columns. For details about index key limits see Maximum Capacity Specifications for SQL Server.
Reference: https://learn.microsoft.com/en-us/sql/relational-databases/indexes/clustered-and-nonclustered-indexes-described
Let me offer a textbook definition on "clustering index", which is taken from 15.6.1 from Database Systems: The Complete Book:
We may also speak of clustering indexes, which are indexes on an attribute or attributes such that all of tuples with a fixed value for the search key of this index appear on roughly as few blocks as can hold them.
To understand the definition, let's take a look at Example 15.10 provided by the textbook:
A relation R(a,b) that is sorted on attribute a and stored in that
order, packed into blocks, is surely clustered. An index on a is a
clustering index, since for a given a-value a1, all the tuples with
that value for a are consecutive. They thus appear packed into
blocks, except possibly for the first and last blocks that contain
a-value a1, as suggested in Fig.15.14. However, an index on b is
unlikely to be clustering, since the tuples with a fixed b-value
will be spread all over the file unless the values of a and b are
very closely correlated.
Note that the definition does not enforce the data blocks have to be contiguous on the disk; it only says tuples with the search key are packed into as few data blocks as possible.
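The textbook definition can be turned into a small check. This is only a sketch under assumptions: each block is modelled as a list of search-key values, "roughly as few blocks" is interpreted as "at most one block more than the minimum" (to allow for partly filled first and last blocks), and `is_clustering` is a hypothetical helper.

```python
from math import ceil


def blocks_holding(blocks, value):
    """Number of blocks that contain at least one tuple with this key value."""
    return sum(1 for block in blocks if value in block)


def is_clustering(blocks, block_size, slack=1):
    """True if every key value occupies roughly as few blocks as can hold it."""
    values = {v for block in blocks for v in block}
    for v in values:
        count = sum(block.count(v) for block in blocks)
        minimum = ceil(count / block_size)
        if blocks_holding(blocks, v) > minimum + slack:
            return False
    return True


sorted_on_a = [[1, 1, 1], [1, 2, 2], [2, 2, 3]]     # tuples stored in key order
scattered = [[1, 2, 3], [1, 2, 3], [1, 2, 3], [1, 2, 3]]
print(is_clustering(sorted_on_a, block_size=3))
print(is_clustering(scattered, block_size=3))
```

Nothing in the check looks at where the blocks sit on disk, which matches the note that contiguity is not part of the definition.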
A related concept is the clustered relation. A relation is "clustered" if its tuples are packed into roughly as few blocks as can possibly hold those tuples. In other words, from a disk block perspective, if a block contains tuples from different relations, then those relations cannot be clustered (i.e., there is a more compact way to store such a relation: swap its tuples held in other disk blocks with the tuples in the current disk block that don't belong to it). Clearly, R(a,b) in the example above is clustered.
To connect the two concepts: a clustered relation can have both clustering and nonclustering indexes. For a non-clustered relation, however, a clustering index is not possible unless the index is built on the primary key of the relation.
"Cluster" as a word is used at every abstraction level of database storage (three levels of abstraction: tuples, blocks, file). There is also a concept called "clustered file", which describes whether a file (an abstraction for a group of one or more disk blocks) contains tuples from one relation or from different relations. It doesn't relate to the clustering index concept, as it is defined at the file level.
However, some teaching material likes to define clustering index based on the clustered file definition. Those two types of definitions are the same on clustered relation level, no matter whether they define clustered relation in terms of data disk block or file. From the link in this paragraph,
An index on attribute(s) A on a file is a clustering index when: All tuples with attribute value A = a are stored sequentially (= consecutively) in the data file
Storing tuples consecutively is the same as saying "tuples are packed into roughly as few blocks as can possibly hold those tuples" (with the minor difference that one talks about a file and the other about disk blocks), because storing tuples consecutively is the way to achieve "packed into roughly as few blocks as can possibly hold those tuples".
Clustered Index:
A Primary Key constraint creates a clustered index automatically if no clustered index already exists on the table. The actual data is stored at the leaf level of a clustered index.
Non Clustered Index:
The actual data is not found directly at the leaf node of a non clustered index; an additional step is needed to reach it, because the leaf holds only row locators pointing towards the actual data.
A non clustered index does not determine the order of the table rows the way a clustered index does. There can be multiple non clustered indexes per table; the limit depends on the SQL Server version. SQL Server 2005 allows 249 non clustered indexes per table, and later versions such as 2008 and 2016 allow 999.
Clustered Index - A clustered index defines the order in which data is physically stored in a table. Table data can be sorted in only one way; therefore, there can be only one clustered index per table. In SQL Server, the primary key constraint automatically creates a clustered index on that particular column.
Non-Clustered Index - A non-clustered index doesn't sort the physical data inside the table. In fact, a non-clustered index is stored in one place and the table data is stored in another. This is similar to a textbook where the book content is located in one place and the index is located in another. This allows for more than one non-clustered index per table.
It is important to mention here that inside the table the data will be sorted by the clustered index. However, inside the non-clustered index the data is stored in the specified order. The index contains the column values on which the index is created and the address of the record that each column value belongs to.
When a query is issued against a column on which the index is created, the database will first go to the index and look for the address of the corresponding row in the table. It will then go to that row address and fetch the other column values. It is due to this additional step that non-clustered indexes are slower than clustered indexes.
Differences between clustered and Non-clustered index
There can be only one clustered index per table. However, you can
create multiple non-clustered indexes on a single table.
Clustered indexes only sort tables. Therefore, they do not consume
extra storage. Non-clustered indexes are stored in a separate place
from the actual table claiming more storage space.
Clustered indexes are faster than non-clustered indexes since they
don’t involve any extra lookup step.

Does postgresql have O(1) lookups against the clustered index?

The questions are the following:
Does postgresql (or other database implementations) have O(1) lookups against the clustered index?
ie, a direct lookup of the row position on the file system from the row's id (where the id column is the clustered index)
If there is no way to do a such lookups, is the lookup for a row by id log2n?
Considering this, does postgresql or any sql engine have a way to have indexes yield positions to rows in other tables to avoid this?
Does postgresql or any sql engine have a way to lookup rows directly (and the lifecycle associated with how rows are moved)?
I am presuming rows don't move relative to database engine storage format unless the clustered index is changed...
These questions stem from the following junction table necessary for implementing many-to-many relationships:
junction_table:
parent_id
child_id
retrieving set of child_ids
select * from junction_table where parent_id=parent_value
a fundamentally correct implementation should yield a set of locations for the child rows
or, failing that, at least a way to calculate the child rows' positions from the set of child_ids
versus a one-to-many query that yields the direct location of the child row:
one_to_many_child_table:
id
name
parent_id
select * from one_to_many_child_table where parent_id=parent_value
Many Issues -- Let me mention each, then put the pieces together.
BTrees, by their nature, are O(log n). But in practice you can think of that as pretty much O(1). A BTree might typically have 100 child links in each node. That means a million rows would be only 3 levels deep; a trillion rows would be about 6 levels deep.
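The depth figures above are plain arithmetic and can be checked with a short sketch. The function name `btree_levels` and the fixed fanout of 100 are illustrative assumptions; real fanout depends on page size and key width.

```python
def btree_levels(rows: int, fanout: int = 100) -> int:
    """Smallest number of levels whose capacity (fanout ** levels) covers `rows`."""
    levels = 1
    capacity = fanout
    # Each extra level multiplies the number of reachable entries by the fanout.
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels


for rows in (10**6, 10**9, 10**12):
    print(rows, "rows ->", btree_levels(rows), "levels")
```

Going from a million rows to a trillion only doubles the depth, which is why the lookup cost feels nearly constant.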
Furthermore, LRU caching (such as MySQL does at the block level) tends to keep at least the non-leaf nodes in cache. (Having what you need in cache is the real optimization for large databases.)
B+Tree -- Take a BTree and add bidirectional links between the leaf nodes. This makes "range scans" very efficient.
B+Tree indexes are the "best overall".
Clustering -- In this context, let's say that 'clustering' implies that the unique row identifier is stored with the data. (For MySQL, that's the PRIMARY KEY; some others use a 'rownum'.)
PRIMARY KEY may be clustered and/or unique -- this varies with database implementations.
Secondary key is usually a BTree, but getting from it to the data is implemented in different ways. It might point directly to the data; it might have a "rownum", which can be used to find the record; or it might have a copy of the Primary key, thereby allowing the lookup of the row via the PK.
MySQL's InnoDB -- A PRIMARY KEY is clustered with the data, organized as a B+Tree, and unique. This implies that a point query by the PK will do one dive in a BTree to find the entire row.
A Secondary key in InnoDB has a separate BTree, and a copy of the PK is found in the leaf node. So, a secondary key lookup is two dives (one in secondary BTree, one in PK+data BTree). That is, unless the index is 'covering' and all the columns needed (for a SELECT) are found in the Secondary key + primary key.
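A toy model of the two dives may help. This is only a sketch: a dict stands in for the clustered PK+data BTree, a sorted list of (value, pk) pairs stands in for the secondary BTree, and the table, column, and function names are all made up for illustration.

```python
import bisect

# Clustered "PK + data" tree: primary key -> full row (a dict stands in for the B+Tree).
clustered = {
    1: {"id": 1, "name": "ann", "city": "Oslo"},
    2: {"id": 2, "name": "bob", "city": "Rome"},
    3: {"id": 3, "name": "cat", "city": "Oslo"},
}

# Secondary index on city: sorted (city, pk) pairs, i.e. a copy of the PK sits in each entry.
secondary_city = sorted((row["city"], pk) for pk, row in clustered.items())


def pks_for_city(city):
    """First dive: locate the first matching index entry, then scan forward."""
    i = bisect.bisect_left(secondary_city, (city,))
    pks = []
    while i < len(secondary_city) and secondary_city[i][0] == city:
        pks.append(secondary_city[i][1])
        i += 1
    return pks


def rows_for_city(city):
    """Second dive: one clustered lookup per PK found in the secondary index."""
    return [clustered[pk] for pk in pks_for_city(city)]


print(pks_for_city("Oslo"))   # answered from the index alone (the "covering" case)
print(rows_for_city("Oslo"))  # needs the second lookup to fetch the name column
```

When a query only needs `city` and `id`, `pks_for_city` already has everything, which is the 'covering' situation described above.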
MySQL's MyISAM -- MySQL's older engine (which has gone out of favor) implemented both PRIMARY KEY and Secondary keys as BTrees where the leaf node has a byte address into the data file. So both types of key involve one BTree dive plus a filesystem 'seek' into another file.
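The "byte address into the data file" idea can be sketched as follows. This is not MyISAM's actual file format: an in-memory `io.BytesIO` stands in for the data file, a dict stands in for the BTree, and the newline-terminated record layout is an assumption made for the example.

```python
import io

data_file = io.BytesIO()  # stands in for the separate data file
index = {}                # key -> byte offset of the record (stands in for the BTree leaf)


def insert(key, payload):
    """Append the record to the data file and remember where it starts."""
    data_file.seek(0, io.SEEK_END)
    index[key] = data_file.tell()
    data_file.write(f"{key},{payload}\n".encode())


def lookup(key):
    """One index lookup, then one seek into the data file."""
    offset = index.get(key)
    if offset is None:
        return None
    data_file.seek(offset)
    return data_file.readline().decode().rstrip("\n")


insert(42, "alice")
insert(7, "bob")
print(index)      # offsets follow insertion order, not key order
print(lookup(7))
```

A secondary index in this scheme would hold the same kind of offsets, which is why both key types cost one dive plus one seek.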
Hash -- A true O(1) lookup requires a perfect hash. No one implements that. However some implementations have a Hash + some form of handling overflows. So that is O(1) sometimes, and a little slower other times. (MySQL has Hash available on its MEMORY engine.)
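"Hash + some form of handling overflows" could look like the sketch below, which uses per-bucket chaining. The class name and bucket count are illustrative, and this is not how any particular engine lays out its hash index.

```python
class ChainedHashIndex:
    """Hash index where colliding keys overflow into a per-bucket list."""

    def __init__(self, buckets=8):
        self.buckets = [[] for _ in range(buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))       # new key: extend the overflow chain

    def get(self, key):
        # O(1) when the bucket holds one entry, slower as the chain grows.
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None


idx = ChainedHashIndex(buckets=2)  # deliberately tiny to force collisions
for n in range(10):
    idx.put(n, f"row-{n}")
print(idx.get(7), max(len(b) for b in idx.buckets))
```

Note that nothing here keeps keys in order, so a hash index cannot serve range scans the way a B+Tree can.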
Rownum / Rowid -- This is some kind of number that lets the db go straight to the row. Oracle, for example, uses this kind of thing. However, you have to map your key to a rownum first. So, it is somewhat of a 2-step process. (MySQL does not use Rownum/Rowid.)
One to many -- In any situation, the index that makes 1:many efficient will have the "many" entries clustered next to each other in the index, but the "rows" they point to are likely to be scattered around the data.
Postgresql -- (I do not know how Postgres works.)
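A small sketch of that effect: the entries for one parent are adjacent in the index, while the row positions they refer to are spread through the data. The list position stands in for a row's location, and all names are hypothetical.

```python
import bisect

# Child rows in insertion order; the list position stands in for the row's location.
child_rows = [
    {"id": 10, "parent_id": 2},
    {"id": 11, "parent_id": 1},
    {"id": 12, "parent_id": 3},
    {"id": 13, "parent_id": 1},
    {"id": 14, "parent_id": 2},
    {"id": 15, "parent_id": 1},
]

# Index on parent_id: sorted (parent_id, position) pairs.
parent_index = sorted((row["parent_id"], pos) for pos, row in enumerate(child_rows))


def index_range(parent_id):
    """Contiguous slice of the index that holds every entry for this parent."""
    lo = bisect.bisect_left(parent_index, (parent_id,))
    hi = bisect.bisect_left(parent_index, (parent_id + 1,))
    return lo, hi


def positions(parent_id):
    """Row positions for the parent: adjacent in the index, scattered in the data."""
    lo, hi = index_range(parent_id)
    return [pos for _, pos in parent_index[lo:hi]]


print(index_range(1), positions(1))
```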

Understanding keys in databases

This question is geared towards MySQL, since that is what I'm using -- but I think that it's probably the same or similar for almost every major database implementation.
How do keys work in a database? By that I mean, when you set a field to 'primary key', 'unique key' or an 'index' -- what do each of these do, and when should I use each one?
Right now I have a table containing a few fields, one of them being a GUID (minus the { and } around it). I set the GUID field to the primary key and I see that it created a binary tree. So it improves search performance -- but what differentiates that from other types of keys?
I realize this may not really be programming related (although it is development related) -- I wasn't sure where exactly to ask this but SO is what I use the most so I'll ask here. Migrate as necessary
There are probably hundreds of references for this elsewhere on the web, so a bit of Googling will help you get deep into understanding DB design. That said, the basic gist is:
primary key: a field or combination of fields which must be unique for each row, and which is/are indexed to provide rapid lookup of a row given a key value; cannot contain NULL, and a table can only have one primary key. Generally indexed in a clustered index, which means that the data in the table is ordered to match the order of the index, which greatly improves serial data retrieval. (This is why a table can have only one clustered index -- the order of the data can't match the order of more than one index!)
unique key: same as a primary key, but on some DB platforms, can contain NULL values so long as they don't violate the uniqueness constraint. (In other words, if the unique key contains a single column, there can only be one row in the table with NULL in that column; if the key contains more than one column, then the table can only contain rows with NULLs in the columns such that there's no non-unique duplication of NULL values across the columns in the key.) On other platforms (including MySQL), unique constraints can contain multiple NULLs; the uniqueness constraint only applies to non-NULL values of the referenced columns. There can be more than one of these per table. Indexed in a non-clustered index.
index: a field or combination of fields which are pre-indexed for more rapid retrieval given a value for the field(s) in the index. A table can have more than one index.
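The NULL handling described for unique keys can be tried out quickly. The sketch below uses Python's built-in sqlite3 module purely for convenience; SQLite is an assumption here, not one of the platforms named above, though it also accepts multiple NULLs in a UNIQUE column. The table and column names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")


def try_insert(email):
    """Return True if the row was accepted, False on a constraint violation."""
    try:
        conn.execute("INSERT INTO t (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False


first_null = try_insert(None)
second_null = try_insert(None)          # NULLs do not collide with each other here
first_value = try_insert("a@example.com")
duplicate_value = try_insert("a@example.com")  # non-NULL duplicate is rejected

null_count = conn.execute("SELECT COUNT(*) FROM t WHERE email IS NULL").fetchone()[0]
print(first_null, second_null, first_value, duplicate_value, null_count)
```

On a platform that allows only one NULL per unique column, the second insert would fail instead.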
When you define a primary key, the database creates an index based on that key. It needs to be unique. In general, you can also create an index to speed up access to data based on non-unique query data. The indexed retrieval time for uniquely keyed data should be better than for non-uniquely keyed indexes, so I try to use unique indexes where possible.
At the most basic, primary keys represent how the records will be physically stored in memory / on disk. You would want the unique field you're going to search on the most to be the primary key, as it will greatly reduce searching.
Unique keys are fields that can only contain unique values.
An index is a specialized "map" to the database file that queries can reference.
These are extremely simplified answers, but I think that's the gist of it.
One more thing: any key is essentially a separate table, sorted by the indexed columns, whose entries point directly to the row(s) that match the key.
A BTree style index is stored in a balanced tree. In such a tree, keys to the left of a node are smaller and keys to the right are larger.
      5
   3     7
  2 4   6 8
Would be an example of a balanced tree. The other major type is a Hash, where a mathematical expression turns the key into the relative memory location of the key.
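The seven-key tree drawn above can be searched with a few lines of code. This is a sketch of a plain binary search tree, which is simpler than the multi-way BTree a database actually uses; the `Node` class and `search` function are illustrative.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


# The example tree: 5 at the root, 3 and 7 below it, then 2 4 6 8.
root = Node(5, Node(3, Node(2), Node(4)), Node(7, Node(6), Node(8)))


def search(node, key):
    """Return (found, nodes_visited): go left for smaller keys, right for larger."""
    visited = 0
    while node is not None:
        visited += 1
        if key == node.key:
            return True, visited
        node = node.left if key < node.key else node.right
    return False, visited


print(search(root, 6))
print(search(root, 9))
```

Because the tree is balanced, no search visits more than three nodes for these seven keys.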
In order to really understand keys, you have to understand them at three levels: conceptual, logical, and physical. I'm going to reverse my habitual order, and discuss physical first.
Most programmers tend to think at the physical level. At the physical level, a key is a surrogate (stand-in) for the address of a row. When a row is to be referenced, a copy of the key can be used to specify the row. When a reference to a row is made in another row, the copy is known as a foreign key.
Most experienced programmers have a thorough understanding of pointers and addresses, and would understand exactly how the data structure worked if only it used pointers and addresses. Before the relational databases became dominant, there were in fact databases that used pointers to records embedded in other records to tie the data together.
A disadvantage to using keys instead of pointers is that the DBMS has to use an index to translate a key reference back to a pointer in order to retrieve the row in question. An advantage is that the level of indirection allows the DBMS to shuffle all the rows in a table for whatever purpose, as long as the DBMS updates all the relevant indexes accordingly.
Viewed at this level, keys might as well be simple, integer, and autoincremented. These work faster than other kinds of keys, and they sidestep certain data management issues that arise when user supplied data is missing or inconsistent. However, sidestepping data management issues at this level can create a minefield at the two higher levels.
At the logical level, a key is a minimal subset of the data in a tuple (row) that allows a single matching tuple to be specified, and when the DBMS retrieves the container for that tuple, all the attributes in the tuple are now available. Every relation has at least one candidate key. In the worst case, the entire tuple is the only candidate key. When multiple candidate keys exist for a single relation (table), common practice is to choose one candidate key as the primary key, and to make all references via this primary key.
(Actually, relation and table are not synonymous, but I'm simplifying here. Likewise, tuple and row are not synonymous, although they look identical at first glance.)
The primary reason to declare a primary key is to rule out duplicate keys or missing keys.
Sometimes database people choose to leave duplicate and missing key avoidance up to the programmers whose applications write to the database. More commonly, a primary key constraint serves to reflect an error back to a program that violates a primary key constraint.
When a DBMS sets up a primary key constraint, it also builds an index on the primary key. This allows the DBMS to find duplicates quickly, and it also speeds up certain queries that use the key column(s).
At the conceptual level, keys are the means by which the user community identifies instances of entities, whether those entities are persons (employees, travellers, etc.), things (bank accounts, hotel rooms, etc.) or whatever. The key is data and the entity identified by the key is not data. The key can thus be seen as a surrogate for the entity in the database.
At the conceptual level, keys are always natural, and never automatically supplied by the system. However, in the real world, keys are often mismanaged, and the consequences of mismanagement are overcome by what is called "common sense". Instilling common sense into an automated system is generally not feasible.
I never really described an index in the above, but it's implicit in what I said. An index is a data structure that serves to map from a key to a pointer. In all the databases you are likely to use, indexes are declared by the database builder (or perhaps a DBA) and managed by the DBMS.