There is a table t_bl with the following fields (id,a,b).
select a, b from t_bl where a = 'XXX';
In this case, the b field is not used as a search condition, but it needs to appear in the result set.
There are two possible index-building schemes:
1. Create an index on the a field only
Advantage: Each node of the index avoids the space overhead of storing the b field.
Disadvantage: Returning the query result requires a lookup back to the clustered index, via the primary key id stored in the secondary index, to fetch the value of the b field, which hurts query performance.
2. Create a composite index on the a and b fields
Advantage: The query can be answered by a covering index, avoiding lookups back to the table and improving query efficiency.
Disadvantage: Every non-leaf node in the secondary index carries unnecessary space overhead for storing the b field (because the b field is not a search condition).
So why not provide a mechanism that enables users to create a secondary index that is asymmetric between non-leaf nodes and leaf nodes?
For example, in this case the better choice would be an index whose non-leaf nodes store only the a field, while the leaf nodes store both the a and b fields.
Some implementations of SQL databases do exactly what you describe, adding a column to the leaf node only, so it doesn't take space in the non-leaf nodes, but can be used for covering indexes.
An example of a product that does this is Microsoft SQL Server, which supports syntax allowing you to define optional non-key columns to INCLUDE() in a secondary index. See https://learn.microsoft.com/en-us/sql/relational-databases/indexes/create-indexes-with-included-columns?view=sql-server-ver16
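For the t_bl example above, that looks roughly like this (a sketch; the index name is made up):

-- SQL Server: only column a is a key column (and so appears in non-leaf nodes);
-- column b is stored in the leaf level only, yet the index still covers the query.
CREATE NONCLUSTERED INDEX ix_t_bl_a_incl_b
ON t_bl (a)
INCLUDE (b);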
However, InnoDB does not currently implement this. As far as I know, there's no reason it can't, but they have not implemented it. I guess other features were higher priority.
For what it's worth, the SQL standard doesn't include anything about indexes, so each vendor implements their indexing feature totally as an extension to the standard. They are therefore free to implement index features according to their own priorities.
Simply make a "composite" "covering" index:
INDEX(a, b)
That particular query will run faster because it does not need to bounce between the index's BTree and the data's BTree.
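For the table in the question, a sketch of that (the index name is made up): after building the composite index, EXPLAIN's Extra column should show "Using index", confirming the query is answered from the index alone.

ALTER TABLE t_bl ADD INDEX idx_a_b (a, b);
EXPLAIN SELECT a, b FROM t_bl WHERE a = 'XXX';
-- Extra: "Using index"  => covering index, no second dive into the data's BTree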
In some similar situations, I recommend
PRIMARY KEY(a, id), -- all columns are efficiently accessed via a=..
INDEX(id) -- to keep AUTO_INCREMENT happy
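A sketch of that pattern on the question's table (column types assumed; INDEX(id) is needed because InnoDB requires an AUTO_INCREMENT column to be the leftmost column of some index):

CREATE TABLE t_bl (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    a  VARCHAR(50)  NOT NULL,
    b  VARCHAR(100) NOT NULL,
    PRIMARY KEY (a, id),  -- rows are clustered by a; a = '...' reads consecutive rows
    INDEX (id)            -- keeps AUTO_INCREMENT happy
) ENGINE=InnoDB;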
As for 2's disadvantage -- It is tiny. I like the Rule of Thumb that MySQL's BTrees have a fan-out of about 100. That is, each node has 100 child nodes. The corollary is that the non-leaf nodes take up only 1% of the total disk space. (This may be the clinching argument for the developers to say "let's not bother implementing INCLUDE".)
Related
Even if I don't have a primary key or unique key, InnoDB still creates a clustered index on a synthetic column, as described below.
https://dev.mysql.com/doc/refman/5.5/en/innodb-index-types.html
So, why does InnoDB have to require a clustered index? Is there a definite reason a clustered index must exist here?
In Oracle Database or MSSQL I don't see such a requirement.
Also, I don't think a clustered index has such a tremendous advantage compared to an ordinary (heap) table either.
It is true that looking up data by the clustering key avoids an additional disk read and is faster than when I don't have one, but without a clustered index, a secondary index can look rows up faster by using a physical row ID.
Therefore, I don't see any reason for insisting on using it.
Other vendors have a "ROWNUM" or something like that. InnoDB is much simpler. Instead of having that animal, it simply requires something that you will usually want anyway. In both cases, it is a value that uniquely identifies a row. This is needed for the guts of transactions -- knowing which row(s) to lock, etc., to provide transactional integrity. (I won't go into the rationale here.)
In requiring (or providing) a PK, and in doing certain other simplifications, InnoDB sacrifices several little-used (or easily worked around) features: Multiple pks, multiple clustered indexes, no pk, etc.
Since the "synthetic column" takes 6 bytes, it is almost always better to simply provide id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY, even if you don't use it. But if you don't use it, but do have a non-NULL UNIQUE key, then you may as well make it the PK. (As MySQL does by default.)
A lookup by a secondary key first gets the PK value from the secondary key's BTree. Then the main BTree (with the data ordered by the PK) is drilled down to find the row. Hence, secondary keys can be slower than use of the PK. (Usually this is not enough slower to matter.) So, this points out one design decision that required a PK. (Other vendors use ROWNUM, or something, to locate the record, instead of the PK.)
Back to "Why?". There are many decisions in MySQL where the designers said "simplicity is better for this free product, let's not bother building some complex, but little-used feature. At first there were no subqueries (temp tables were a workaround). No Views (they are only syntactic sugar). No Materialized Views (OK, this may be a failing; but they can be simulated). No bit-mapped or hash or isam (etc) indexing (BTree is very good for "all-around" usage).
Also, by always "clustering" the PK with the data, lookups via the PK are inherently faster than the competition (no going through a ROWNUM). (Secondary key lookups may not be faster.)
Another difference -- MySQL was very late in implementing "index merge", wherein it uses two indexes, then ANDs or ORs the results. This can be efficient with ROWNUMs, but not with clustered PKs.
(I'm not a MySQL/MariaDB/Percona developer, but I have used them since 1999, and have been to virtually all major MySQL Conferences, where inside info is often divulged. So, I think I have enough insight into their thinking to present this answer.)
The questions are the following:
Does postgresql (or other database implementations) have O(1) lookups against the clustered index?
i.e., a direct lookup of the row's position on the file system from the row's id (where the id column is the clustered index)
If there is no way to do such lookups, is the lookup for a row by id O(log2 n)?
Considering this, does postgresql or any sql engine have a way to have indexes yield the positions of rows in other tables to avoid this?
Does postgresql or any sql engine have a way to lookup rows directly (and the lifecycle associated with how rows are moved)?
I am presuming rows don't move relative to database engine storage format unless the clustered index is changed...
These questions stem from the following junction table necessary for implementing many-to-many relationships:
junction_table:
parent_id
child_id
retrieving set of child_ids
select * from junction_table where parent_id=parent_value
a fundamentally correct implementation should yield a set of locations for the child rows,
or, at worst, at least a way to calculate the child rows' positions from the set of child_ids,
vs. a one-to-many query that yields the direct location of the child row:
one_to_many_child_table:
id
name
parent_id
select * from one_to_many_child_table where parent_id=parent_value
Many Issues -- Let me mention each, then put the pieces together.
BTrees, by their nature, are O(log n). But you can think of them pretty much as O(1). A BTree might typically have 100 child links in each node. That says that a million rows would be only 3 levels deep; a trillion rows would be about 6 levels deep.
Furthermore, LRU caching (such as MySQL does at the block level) tends to keep at least the non-leaf nodes in cache. (Having what you need in cache is the real optimization for large databases.)
B+Tree -- Take a BTree and add bidirectional links between the leaf nodes. This makes "range scans" very efficient.
B+Tree indexes are the "best overall".
Clustering -- In this context, let's say that 'clustering' implies that the unique row identifier is stored with the data. (For MySQL, that's the PRIMARY KEY; some others use a 'rownum'.)
PRIMARY KEY may be clustered and/or unique -- this varies with database implementations.
Secondary key is usually a BTree, but getting from it to the data is implemented in different ways. It might point directly to the data; it might have a "rownum", which can be used to find the record; or it might have a copy of the Primary key, thereby allowing the lookup of the row via the PK.
MySQL's InnoDB -- A PRIMARY KEY is clustered with the data, organized as a B+Tree, and unique. This implies that a point query by the PK will do one dive in a BTree to find the entire row.
A Secondary key in InnoDB has a separate BTree, and a copy of the PK is found in the leaf node. So, a secondary key lookup is two dives (one in secondary BTree, one in PK+data BTree). That is, unless the index is 'covering' and all the columns needed (for a SELECT) are found in the Secondary key + primary key.
MySQL's MyISAM -- MySQL's older engine (which has gone out of favor) implemented both PRIMARY KEY and Secondary keys as BTrees where the leaf node has a byte address into the data file. So both types of key involve one BTree dive plus a filesystem 'seek' into another file.
Hash -- A true O(1) lookup requires a perfect hash. No one implements that. However some implementations have a Hash + some form of handling overflows. So that is O(1) sometimes, and a little slower other times. (MySQL has Hash available on its MEMORY engine.)
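(As a sketch, with a made-up table, this is what a hash index looks like in MySQL's MEMORY engine:)

CREATE TABLE session_cache (
    token   CHAR(32) NOT NULL,
    user_id INT      NOT NULL,
    PRIMARY KEY (token) USING HASH  -- point lookups are roughly O(1); the hash cannot serve range scans
) ENGINE=MEMORY;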
Rownum / Rowid -- This is some kind of number that lets the db go straight to the row. Oracle, for example, uses this kind of thing. However, you have to map your key to a rownum first. So, it is somewhat a 2-step process. (MySQL does not use Rownum/Rowid.)
One to many -- In any situation, the index that makes 1:many efficient will have the "many" entries clustered next to each other in the index, but the rows they point to are likely to be scattered around the data.
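For the junction table in the earlier question, a common remedy (a sketch; column types are assumed) is a composite primary key plus the reverse index, so at least the index entries for one parent sit next to each other, even though the child rows themselves remain scattered:

CREATE TABLE junction_table (
    parent_id INT UNSIGNED NOT NULL,
    child_id  INT UNSIGNED NOT NULL,
    PRIMARY KEY (parent_id, child_id),  -- all child_ids of one parent are adjacent in the clustered index
    INDEX (child_id, parent_id)         -- covers lookups in the other direction
) ENGINE=InnoDB;

SELECT child_id FROM junction_table WHERE parent_id = 123;  -- one range scan, no lookups back to the table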
Postgresql (I do not know how Postgres works.)
We are in the process of migrating from MySQL to PGSQL and we have a 100 million row table.
When I was trying to ascertain how much space both systems use, I found much less difference for tables, but found huge differences for indexes.
MySQL indexes were occupying more space than the table data itself, while postgres was using considerably less.
When digging for the reason, I found that MySQL uses B+ trees to store its indexes and postgres uses B-trees.
MySQL's usage of indexes is a little different: it stores the data along with the (clustered) index, which accounts for the increased size, but postgres doesn't.
Now the questions:
Comparing B-trees and B+ trees in database terms, is it better to use B+ trees since they are better for range queries: O(log N) for the initial lookup plus O(m) to walk the range, where m is the number of entries in the range?
In B-trees the point lookup is logarithmic, but for range queries the cost shoots up toward O(N), since there is no underlying linked-list structure between the data nodes. With that said, why does postgres use B-trees? Does it perform well for range queries (it does, but how does it handle them internally with B-trees)?
The above question is from a postgres point of view, but from a MySQL perspective, why does it use more storage than postgres, and what is the performance benefit of using B+ trees in practice?
I could have missed/misunderstood many things, so please feel free to correct my understanding here.
Edit: answers to Rick James's questions
I am using InnoDB engine for MySQL
I built the index after populating the data - same way I did in postgres
The indexes are not UNIQUE indexes, just normal indexes
There were no random inserts, I used csv loading in both postgres and MySQL and only after this I created the indexes.
Postgres block size for both indexes and data is 8KB, I am not sure for MySQL, but I didn't change it, so it must be the defaults.
I would not call the rows big, they have around 4 text fields with 200 characters long, 4 decimal fields and 2 bigint fields - 19 numbers long.
The PK is a bigint column holding 19-digit numbers; I am not sure if this is bulky. On what scale should we differentiate bulky vs non-bulky?
The MySQL table size was 600 MB and Postgres was around 310 MB, both including indexes; if my math is right, MySQL is nearly twice the size (Postgres is roughly 48% smaller). But is there a way that I can measure the index size alone in MySQL, excluding the table size? That could lead to better numbers, I guess.
Machine info : I had enough RAM - 256GB to fit all the tables and indexes together, but I don't think we need to traverse this route at all, I didn't see any noticeable performance difference in both of them.
Additional Questions
When do we say fragmentation occurs? Is there a way to do de-fragmentation so that we can say that beyond this, there is nothing more to be done? I am using CentOS, by the way.
Is there a way to measure index size alone in MySQL, ignoring the primary key as it is clustered, so that we can actually see which type is occupying more space, if any.
First, and foremost, if you are not using InnoDB, close this question, rebuild with InnoDB, then see if you need to re-open the question. MyISAM is not preferred and should not be discussed.
How did you build the indexes in MySQL? There are several ways to explicitly or implicitly build indexes; they lead to better or worse packing.
MySQL: Data and Indexes are stored in B+Trees composed of 16KB blocks.
MySQL: UNIQUE indexes (including the PRIMARY KEY) must be updated as you insert rows. So, a UNIQUE index will necessarily have a lot of block splits, etc.
MySQL: The PRIMARY KEY is clustered with the data, so it effectively takes zero space. If you load the data in PK order, then the block fragmentation is minimal.
Non-UNIQUE secondary keys may be built on the fly, which leads to some fragmentation. Or they can be constructed after the table is loaded; this leads to denser packing.
Secondary keys (UNIQUE or not) implicitly include the PRIMARY KEY in them. If the PK is "large" then the secondary keys are bulky. What is your PK? Is this the 'answer'?
In theory, totally random inserts into a BTree lead to the blocks being about 69% full. Maybe this is the answer. Is MySQL about 45% bigger (1/69%)?
With 100M rows, probably many operations are I/O-bound because you don't have enough RAM to cache all the data and/or index blocks needed. If everything is cached, then B-Tree versus B+Tree won't make much difference. Let's analyze what needs to happen for a range query when things are not fully cached.
With either type of Tree, the operation starts with a drill-down in the Tree. For MySQL, 100M rows will have a B+Tree of about 4 levels deep. The 3 non-leaf nodes (again 16KB blocks) will be cached (if they weren't already) and be reused. Even for Postgres, this caching probably occurs. (I don't know Postgres.) Then the range scan starts. With MySQL it walks through the rest of the block. (Rule of Thumb: 100 rows in a block.) Ditto for Postgres?
At the end of the block something different has to happen. For MySQL, there is a link to the next block. That block (with 100 more rows) is fetched from disk (if not cached). For a B-Tree the non-leaf nodes need to be traversed again. 2, probably 3 levels are still cached. I would expect the need for another non-leaf node to be fetched from disk only 1/10K rows. (10K = 100*100) That is, Postgres might hit the disk 1% more often than MySQL, even on a "cold" system.
On the other hand, if the rows are so fat that only 1 or 2 can fit in a 16K block, the "100" I kept using is more like "2", and the 1% becomes maybe 50%. That is, if you have big rows this could be the "answer". Is it?
What is the block size in Postgres? Note that many of the computations above depend on the relative size between the block and the data. Could this be an answer?
Conclusion: I've given you 4 possible answers. Would you like to augment the question to confirm or refute that each of these apply? (Existence of secondary indexes, large PK, inefficient building of secondary indexes, large rows, block size, ...)
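(On measuring the index size alone in MySQL: information_schema reports data and index sizes separately per table; note that the clustered PRIMARY KEY is counted in DATA_LENGTH, not INDEX_LENGTH. A sketch, assuming the table lives in a schema called mydb:)

SELECT table_name,
       ROUND(data_length  / 1024 / 1024) AS data_mb,   -- row data plus the clustered PK
       ROUND(index_length / 1024 / 1024) AS index_mb   -- secondary indexes only
FROM information_schema.tables
WHERE table_schema = 'mydb';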
Addenda about PRIMARY KEY
For InnoDB, another thing to note... It is best to have a PRIMARY KEY in the definition of the table before loading the data. It is also best to sort the data in PK order before LOAD DATA. Without specifying any PRIMARY KEY or UNIQUE key, InnoDB builds a hidden 6-byte PK; this is usually sub-optimal.
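A minimal sketch of that loading pattern (file, table, and column names are hypothetical):

CREATE TABLE big_table (
    id    BIGINT NOT NULL,
    name  VARCHAR(200),
    price DECIMAL(10,2),
    PRIMARY KEY (id)            -- declared before loading
) ENGINE=InnoDB;

-- the CSV is assumed to be pre-sorted by id (the PK)
LOAD DATA INFILE '/tmp/big_table_sorted.csv'
INTO TABLE big_table
FIELDS TERMINATED BY ',';

-- build secondary indexes afterwards, for denser packing
ALTER TABLE big_table ADD INDEX idx_name (name);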
In databases you often have queries that deliver ranges of data, such as ids from 100 to 200.
In this case:
A B-Tree needs to follow the path from the root to the leaves for every single entry to get the data pointer.
A B+-Tree can 'walk' through the leaves, and has to follow the path from the root down to a leaf only the first time (i.e. for id 100).
This is because a B+-Tree stores the data (or data pointers) only in the leaves, and the leaves are linked, so you can perform a rapid in-order traversal.
(B+-Tree diagram)
Another point is:
In a B+Tree the inner nodes store only pointers to other nodes, without any data pointers, so there is more space for pointers, you can store more node pointers per memory page, and you need fewer IO operations.
So for range queries B+-Trees are the optimal data structure. For single selections B-Trees might be better (because of the depth/size of the tree), since the data pointers are also located inside the tree.
MySQL and PostgreSQL aren't really comparable here. InnoDB uses an index to store table data (and secondary indexes just point at the pkey). This is great for single-row pkey lookups and, with B+ trees, does ok with range queries on the pkey field, but it has performance drawbacks for everything else.
PostgreSQL uses heap tables and puts indexes as separate. It supports a number of different indexing algorithms. Depending on your range query, a btree index may not help you and you may need a GiST Index instead. Similarly GIN indexes work well with member lookups (for arrays, fts etc).
I think btree is used because it excels at the simple use case: what rows contain the following data? This becomes a building block of GIN, for example.
But it isn't true that PostgreSQL cannot use B+ trees. GiST is built on B+ Tree indexes in a generalized format. So PostgreSQL gives you the option to use B+ trees where they come in handy.
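For illustration, a sketch of choosing different index methods in Postgres (table and column names are made up; period is assumed to be a range-type column and tags an array column):

CREATE INDEX idx_events_created ON events (created_at);         -- default btree: equality and simple ranges
CREATE INDEX idx_events_period  ON events USING gist (period);  -- GiST: overlap-style queries on range/geometric types
CREATE INDEX idx_events_tags    ON events USING gin  (tags);    -- GIN: membership lookups in arrays, jsonb, full-text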
I've always heard that "proper" indexing of one's SQL tables is key for performance. I've never seen a real-world example of this and would like to make one using SQLFiddle but not sure on the SQL syntax to do so.
Let's say I have 3 tables: 1) Users 2) Comments 3) Items.
Let's also say that each item can be commented on by any user. So to get item=3's comments here's what the SQL SELECT would look like:
SELECT * from comments join users on comments.commenter_id=users.user_id
WHERE comments.item_id=3
I've heard that generally speaking if the number of rows gets large, i.e., many thousands/millions, one should put indices on the WHERE and the JOINed column. So in this case, comments.item_id, comments.commenter_id, and users.user_id.
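For reference, the indexing for that example boils down to a couple of statements (a sketch; it assumes the tables already exist and users.user_id is the primary key, and therefore already indexed):

CREATE INDEX idx_comments_item      ON comments (item_id);       -- for the WHERE comments.item_id=3 filter
CREATE INDEX idx_comments_commenter ON comments (commenter_id);  -- for the join to users
-- users.user_id needs no extra index if it is the PRIMARY KEY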
I'd like to make a SQLFiddle to compare having these tables indexed vs. not using many thousands, millions rows for each table. Might someone help with generating this SQLFiddle?
I'm the owner of SQL Fiddle. It definitely is not the place for generating huge databases for performance testing. There are too many other variables that you don't (but should, in real life) have control over, such as memory, hdd configuration, etc.... Also, as a shared environment, there are other people using it which could also impact your tests. That being said, you can still build a small db in sqlfiddle and then view the execution plans for queries with and without indexes. These will be consistent regardless of other environmental factors, and will be a good source for learning optimization.
There are quite a few different ways to index a table, and you might choose to index multiple tables differently depending on what your most-used SELECT statements are. The 2 fundamental types of indexes are called clustered and non-clustered.
Clustered indexes store all of the information in the index itself, rather than storing a list of references that the database can pull from and then use to find the actual data. The easiest way to visualize this is to think of the index and the table itself as separate objects. With a clustered index, if the column you indexed is used as a criterion (in the WHERE clause), then the information the query pulls comes directly from the index and not from the table.
On the other hand, a non-clustered index is more like a reference table. It tells the query where the actual information it is requesting is stored in the table object itself. So, in essence, there is an extra step involved of actually retrieving the data from the table itself when you use non-clustered indexes.
Clustered indexes store data physically on the hard disk in a sequential order, and as a result of that, you can only have one clustered index on a table (since we can only store a table in one 'physical' way on a disk drive). Clustered indexes also need to be unique (although this may not be the case to the naked eye, it is always the case to the database itself). Because of this, most clustered indexes are put on the primary key (since most primary keys are unique).
Unlike clustered indexes, you can have as many non-clustered indexes as you want on a table since, after all, they are just reference tables for the actual table itself. Since we have an essentially unlimited number of options for non-clustered indexes, users like to put as many of these as needed on columns that are commonly used in the WHERE clause of a SELECT statement.
But like all things, excess is not always good. The more indexes you put on a table, the more 'overhead' there is on that table. Indexes might speed up your query runs, but excessive overhead will also slow them down. The key is to find a balance between too many indexes and not enough indexes for your particular situation.
As far as a good place to test the performance of your queries with or without indexes, I would recommend using SQL Server. There's a function in SQL Server Management Studio called 'Execution Plan' which tells you the cost and time to run of a query.
Sequential keys allow one to use a clustered index. How material is that benefit? How much is lost if 1% (say) of the keys are out of sequential order by one or two ranks?
Short:
A clustered index, in general, can be used on anything that is sortable. Sequentiality (no gaps) is not required; your records will be maintained in order under the usual index maintenance principles (the only difference is that with a clustered index the leaves are big, because they hold the data, too).
Long:
Good clustering can give you orders of magnitude improvements.
Basically with good clustering you will be reading data very efficiently on any spinning media.
Whether the clustering is good should be evaluated by examining the most common queries (those that actually read data and cannot be answered by indexes alone).
So, for example, if you have a composite natural key as the primary key on which the table is clustered, AND you always access the data according to a leading subset of that key, then simple sequential disk reads will answer your query in the most efficient way.
However, if the most common way to access this data is not according to the natural key (for example the application spends 95% of time looking for last 5 records within the group AND the date of update is not part of the clustered index), then you will not be doing sequential reads and your choice of the clustered index might not be the best.
So, all this is at the level of physical implementation - this is where things depend on the usage.
Note:
Not so relevant today, but tomorrow I would expect most DBs to run off SSDs, where access times keep getting better; and since random-access reads are similar in speed to sequential reads on SSDs, the importance of clustered indexes will diminish.
You need to understand the purpose of the clustered-index.
It may be helpful in some cases, to speed up inserts, but mostly, we use clustered indexes to make queries faster.
Consider the case where you want to read a range of keys from a table - this is very common - it's called a range scan.
Range scans on a clustered index are massively better than range scans on a secondary index (one that is not a covering index). This is the main case for using clustered indexes. It mostly saves 1 IO operation per row in your result, which can be the difference between a query needing, say, 10 IO operations and 1000.
It really is amazing, particularly if you have no blobs and lots of records per page.
If you have no SPECIFIC performance problem that you need to fix, don't worry about it.
But do also remember, that it is possible to make a composite primary key, and that your "unique ID" need not be the whole primary key. A common (very good) technique is to add something which you want to range-scan, as the FIRST part of the PK, and add a unique ID (meaningless) afterwards.
So consider the case where you want to scan your table by time - you can make the time the first part of the PK (it is not going to be unique, so it's not enough on its own), and a unique ID the second.
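A sketch of that pattern (table and column names are made up):

CREATE TABLE readings (
    created_at DATETIME NOT NULL,
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    value      DECIMAL(10,2),
    PRIMARY KEY (created_at, id),  -- clustered by time; time-range scans read consecutive pages
    INDEX (id)                     -- the AUTO_INCREMENT column must lead some index
) ENGINE=InnoDB;

SELECT * FROM readings
WHERE created_at >= '2024-01-01' AND created_at < '2024-02-01';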
Do not, however, do premature optimisation. If your database fits in memory (say, 32GB), you don't care about IO operations; it's never going to do any reads anyway.