I know that if a table is too big, its indexes can hardly fit into the buffer_pool, so using an index may result in a large number of random disk I/Os. So a full table scan, in general, is probably much faster than an index scan even if the query reads only about 1% of the rows.
What I am confused about is:
[0] If there is one big table (30 million rows) and many small tables (each of which fits into memory/the buffer), will the big table also affect queries against the small tables?
My reasoning is this: the buffer is shared by the whole database, so the big table will take most of the buffer. So the indexes of the small tables can also hardly fit into the buffer (or they are often evicted from it), and the conclusion above (full table scan vs. index scan) then applies to this case as well.
[1] When the big table is partitioned into many small tables (on just one machine), the buffer situation should stay the same, so such partitioning cannot solve this problem (full table scan vs. index scan), right? In that case "big table" should really mean not "one big table" but "a huge database", i.e. a large total amount of data.
To sum up, is my conclusion right? If it is wrong, why? Please give me a hint. Thanks very much.
The buffer_pool is shared across all tables, both data and indexes. But the rest of what you said needs to focus on "blocks" instead of "tables".
Caching is performed on a block basis. A block (in InnoDB) is 16KB. Most of the innodb_buffer_pool_size is dedicated to data and index blocks.
The cache is run (approximately) as LRU (Least Recently Used) -- That is, the least recently used blocks are tossed from the cache when other blocks are needed.
No, a table or index is not "entirely" loaded into the cache. Instead, the desired blocks are loaded (and purged) when needed.
If all the data and indexes fit into the cache, then (eventually) all the blocks will 'live' there.
If the data plus indexes are too big, then blocks will come and go as needed. Usually this is nearly as good as having them all loaded. For example, if you are usually using "recent" records, then the blocks containing them will 'stay' in the cache; meanwhile "old" blocks will get bumped out.
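If you want to check how well the cache is doing, InnoDB exposes counters for this; a quick sketch (these are standard InnoDB status variables):
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
-- Innodb_buffer_pool_read_requests = logical reads satisfied from the buffer_pool
-- Innodb_buffer_pool_reads         = reads that had to go to disk
-- A small reads/read_requests ratio means most blocks are being served from the cache.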
If you are using UUIDs (GUIDs), performance can get really bad -- this is because of the random nature of such indexed values.
Full table scans (and full index scans) should be avoided whether or not things are too big to fit in cache. They are costly, and they can usually be avoided by proper indexing and/or query formulation.
When you do a full table scan on a table that is bigger than the cache, something's gotta give. You will have to do some I/O, and some blocks will be bumped out of cache. However, there is a technique built in that prevents blindly purging the entire cache for an occasional table scan. For further discussion, research innodb_old_blocks_pct. (No, I don't recommend changing it from the default 37%.)
What do you mean by partitioning a table? If you mean the builtin PARTITION mechanism, then so what? If you scan a table, you are scanning all the partitions. Same number of blocks; same impact on the cache.
I have dealt with sets of tables that exceed the buffer_pool by a factor of 10 or more. I can discuss performance techniques, but I need a specific SHOW CREATE TABLE (with or without PARTITIONs) and some of the naughty queries (such as table scans).
The Optimizer chooses between doing a table scan and using an index based on a variety of statistics, etc. A Rule of Thumb is that, if more than 20% of the rows need to be touched, it will do a table scan instead of bouncing between the index and the data. (Note: the cutoff is much higher than the 1% you mentioned.)
An Index is structured as a BTree in 16KB blocks, so it is very efficient to start in the middle and scan a range. For example: INDEX(last_name) for WHERE last_name LIKE 'J%' would probably do a "range scan" of 10% of the index, even if that involved bouncing over to the table a lot.
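To see which choice the Optimizer actually made, EXPLAIN shows it directly; a small sketch (the table and index names are made up for illustration):
EXPLAIN SELECT * FROM people WHERE last_name LIKE 'J%';
-- type=range with key=last_name  --> a range scan of the index was chosen
-- type=ALL with key=NULL         --> the Optimizer decided on a full table scan instead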
Related
I have a table such as follows:
CREATE TABLE Associations (
  obj_id int unsigned NOT NULL,
  attr_id int unsigned NOT NULL,
  assignment double NOT NULL,
  PRIMARY KEY (`obj_id`, `attr_id`)
);
Now the insertion order of the rows is/will be random. Would such a definition lead to fragmentation of the table? Should I add an auto-increment primary key, or would that only speed up the INSERTs without helping the speed of SELECT queries?
What would a better table definition be for random inserts?
Note that, performance-wise, I am more interested in SELECT than INSERT.
(Assuming you are using ENGINE=InnoDB.)
Short answer: Do not fret about fragmentation.
Long answer:
There are two types of "fragmentation" -- Which one bothers you?
BTree blocks becoming less than full.
Blocks becoming scattered around the disk.
If you have an SSD disk, the scattering of blocks around the disk has no impact on performance. For HDD, it matters some, but still not enough to get very worried about.
Fragmentation does not "run away". If two adjacent blocks are seen to be relatively empty, they are combined. Result: The "average" block is about 69% full.
In your particular example, when you want multiple "attributes" for one "object", they will be found "clustered". That is, they will mostly be in the same block, hence a bit faster to access. Adding id AUTO_INCREMENT PRIMARY KEY would slow SELECTs/UPDATEs down.
Another reason why an id would slow down SELECTs is that SELECT * FROM t WHERE obj_id=... needs to first find the item in the index, then reach into the data for the other columns. With PRIMARY KEY(obj_id, ...), there is no need for this extra hop. (In some situations, this is a big speedup.)
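For instance (a sketch; the values are placeholders), a lookup of several objects' attributes reads straight out of the clustered PK with no extra hop:
SELECT obj_id, attr_id, assignment
FROM Associations
WHERE obj_id IN (17, 42, 99);
-- The rows for each obj_id sit next to each other in the clustered index, so each object costs only a block or two.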
OPTIMIZE TABLE takes time and blocks access while you are running it.
Even after OPTIMIZE, fragmentation comes back -- for a variety of reasons.
"Fill factor" is virtually useless -- UPDATE and DELETE store extra copies of rows pending COMMIT. This leads to block splits (aka page splits) if fill_factor is too high or sparse blocks if too low. That is, it is too hard to be worth trying to tune.
Fewer indexes means less disk space, etc. You probably need an index on (obj_id, attr_id) whether or not you also have (id). So, why waste space when it does not help?
The one case where OPTIMIZE TABLE can make a noticeable difference is after you delete lots of rows. I discuss several ways to avoid this issue here: http://mysql.rjweb.org/doc.php/deletebig
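If you want a rough idea of how much reclaimable space a table is carrying before deciding on OPTIMIZE, information_schema shows it; a sketch (the schema name is a placeholder):
SELECT table_name, ROUND(data_free / 1024 / 1024) AS free_mb
FROM information_schema.TABLES
WHERE table_schema = 'your_db' AND table_name = 'Associations';
-- A large data_free after bulk DELETEs is the main case where OPTIMIZE TABLE pays off.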
I guess you are using the InnoDB storage engine. InnoDB stores its data in a so-called clustered index. That is, all the data is stashed away behind the BTREE primary key.
Read this for background.
When you insert a row, you're inserting it into the BTREE structure. To oversimplify, BTREEs are made up of elaborately linked pages accessible in order. That means your data goes into some page somewhere. When you insert data in primary-key order, the data goes into a page at the end of the BTREE. So, when a page fills up, InnoDB just makes another one and puts your data there.
But, when you insert in some other order, often your row must go between other rows in an existing BTREE page. If the page has enough free space, InnoDB can drop your data into it. But, if the page does not have enough space, InnoDB must do a page split. It makes two pages from one, and puts your new row into one of the two.
Doing inserts in some order other than index order causes more page splits. That's why it doesn't perform as well. The classic example is building a table with a UUIDv4 (random) primary key column.
Now, you asked about autoincrementing primary keys. If you have such a key in your InnoDB table, all (or almost all) your INSERTs go into the last page of the clustered index, so you don't get the page split overhead. Cool.
But, if you need an index on some other column or columns that aren't in your INSERT order, you'll get page splits in that secondary index. The entries in secondary indexes are often smaller than the ones in clustered indexes, so you get fewer page splits. But you still get them.
Some DBMSs, but not MySQL, let you declare FILL_PERCENT(50) or something similar in both clustered and secondary indexes. That's useful for out-of-order loads because you can make your pages start out with less space already used, so you get fewer page splits. (Of course, you use more RAM and SSD with lower fill factors.)
MySQL doesn't have FILL_FACTOR in its data definition language. It does have a global systemwide variable called innodb_fill_factor. It is a percentage number. Its default is 100, which actually means 1/16th of each page is left unused.
If you know you have to do a big out-of-index-order bulk load you can give this
command first to leave 60% of each new page available, to reduce page splits.
SET GLOBAL innodb_fill_factor = 40;
But beware, this is a system-wide setting. It will apply to everything on your MySQL server. You might want to put it back when done to save RAM and SSD space in production.
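And the corresponding reset once the load is finished (100 is the documented default):
SET GLOBAL innodb_fill_factor = 100;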
Finally, OPTIMIZE TABLE tablename; can reorganize tables that have had a lot of page splits to clean them up. (In InnoDB the OPTIMIZE command actually maps to ALTER TABLE tablename FORCE; ANALYZE TABLE tablename;.) It can take a while, so beware.
When you OPTIMIZE, InnoDB remakes the pages to bring their fill percentages near to the number you set in the system variable.
Unless you're doing a really vast bulk load on a vast table, my advice is to not worry about all this fill percentage business. Design your table to match your application and don't look back.
When you're done with any bulk load you can, if you want, do OPTIMIZE TABLE to get rid of any gnarly page splits.
Edit Your choice of primary key is perfect for your queries' WHERE pattern obj_id IN (val, val, val). Don't change that primary key, especially not to an autoincrementing one.
Pro tip It's tempting to try to foresee scaling problems in the early days of an app's lifetime. And there's no harm in it. But in the case of SQL databases, it's really hard to foresee the actual query patterns that will emerge as your app scales up. Fortunately, SQL is designed so you can add and tweak indexes as you go. You don't have to achieve performance perfection on day 1. So, my advice: think about this issue, but avoid overthinking it. With respect, you're starting to overthink it.
In MySQL InnoDB, is there a performance advantage of partitioning the table compared to simply using an index?
Common considerations:
Is an Index the Best Solution?
An index isn’t always the right tool. At a high level, keep in mind that indexes are most
effective when they help the storage engine find rows without adding more work than
they avoid. For very small tables, it is often more effective to simply read all the rows
in the table. For medium to large tables, indexes can be very effective. For enormous
tables, the overhead of indexing, as well as the work required to actually use the indexes,
can start to add up. In such cases you might need to choose a technique that identifies
groups of rows that are interesting to the query, instead of individual rows. You can
use partitioning for this purpose.
If you have lots of tables, it can also make sense to create a metadata table to store some
characteristics of interest for your queries. For example, if you execute queries that
perform aggregations over rows in a multitenant application whose data is partitioned
into many tables, you can record which users of the system are actually stored in each
table, thus letting you simply ignore tables that don’t have information about those
users. These tactics are usually useful only at extremely large scales. In fact, this is a
crude approximation of what Infobright does. At the scale of terabytes, locating individual rows doesn’t make sense; indexes are replaced by per-block metadata.
One thing is sure: you can’t scan the whole table every time you want to query it,
because it’s too big. And you don’t want to use an index because of the maintenance
cost and space consumption. Depending on the index, you could get a lot of fragmentation and poorly clustered data, which would cause death by a thousand cuts through
random I/O. You can sometimes work around this for one or two indexes, but rarely
for more. Only two workable options remain: your query must be a sequential scan
over a portion of the table, or the desired portion of the table and index must fit entirely
in memory.
It’s worth restating this: at very large sizes, B-Tree indexes don’t work. Unless the index
covers the query completely, the server needs to look up the full rows in the table, and
that causes random I/O a row at a time over a very large space, which will just kill query
response times. The cost of maintaining the index (disk space, I/O operations) is also
very high. Systems such as Infobright acknowledge this and throw B-Tree indexes out
entirely, opting for something coarser-grained but less costly at scale, such as per-block
metadata over large blocks of data.
This is what partitioning can accomplish, too. The key is to think about partitioning
as a crude form of indexing that has very low overhead and gets you in the neighborhood
of the data you want. From there, you can either scan the neighborhood sequentially,
or fit the neighborhood in memory and index it. Partitioning has low overhead because
there is no data structure that points to rows and must be updated—partitioning
doesn’t identify data at the precision of rows, and has no data structure to speak of.
Instead, it has an equation that says which partitions can contain which categories of
rows.
(Many thanks to the great book High Performance MySQL.)
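To make the "crude indexing" idea concrete, here is a minimal sketch (not from the book; the table and column names are invented) of a range-partitioned table where the optimizer can prune whole partitions:
CREATE TABLE metrics (
  id      BIGINT UNSIGNED NOT NULL,
  created DATE NOT NULL,
  value   DOUBLE NOT NULL,
  PRIMARY KEY (id, created)
)
PARTITION BY RANGE (TO_DAYS(created)) (
  PARTITION p2022 VALUES LESS THAN (TO_DAYS('2023-01-01')),
  PARTITION p2023 VALUES LESS THAN (TO_DAYS('2024-01-01')),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);
-- In MySQL 5.7+, EXPLAIN's "partitions" column should show that only p2023 is touched:
EXPLAIN SELECT SUM(value) FROM metrics WHERE created BETWEEN '2023-03-01' AND '2023-03-31';
The partition clause acts like the "equation" described above: it narrows the scan to one neighborhood of the data without maintaining any per-row structure.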
99% of cases I have looked at do not benefit from PARTITIONing as much as from INDEXing.
My Rules of Thumb for using Partitioning are in http://mysql.rjweb.org/doc.php/partitionmaint . Also, that lists the only 4 use cases where partitioning improves performance.
OK, I can't say "exactly" 99%, but it is very close to that. I do believe strongly in the "4" -- I have been searching since partitioning was added to MySQL many years ago.
For Data Warehousing, the usual performance solution is to create and maintain "Summary tables". This works nicely for 'most' DW applications.
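A typical summary-table pattern, as a sketch (all names here are invented), is a small aggregate table refreshed incrementally:
CREATE TABLE daily_sales_summary (
  dy       DATE NOT NULL,
  store_id INT UNSIGNED NOT NULL,
  total    DECIMAL(12,2) NOT NULL,
  cnt      INT UNSIGNED NOT NULL,
  PRIMARY KEY (dy, store_id)
);
INSERT INTO daily_sales_summary (dy, store_id, total, cnt)
    SELECT DATE(sold_at), store_id, SUM(amount), COUNT(*)
    FROM sales
    WHERE sold_at >= CURDATE() - INTERVAL 1 DAY
      AND sold_at <  CURDATE()
    GROUP BY DATE(sold_at), store_id
ON DUPLICATE KEY UPDATE total = VALUES(total), cnt = VALUES(cnt);
Reports then read the small summary table instead of scanning the big fact table.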
"Very large BTrees don't work"? Bull. A million-row index will have a BTree depth of about 3. A trillion rows -- about 6. Where's the "won't work"? A "point query" on a trillion row table will touch twice as many nodes in the BTree, and more of them are unlikely to be cached. But it "will work".
Infobright, with its "columnar storage", has its niche. TokuDB, with its "fractal indexing", has its niche. Neither one can say "we are better than BTrees most of the time". (Both those engines get part of their speed by compression.)
Bottom Line: Use an index. Probably a "composite" index. (More indexing tips: http://mysql.rjweb.org/doc.php/index_cookbook_mysql )
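For example, a composite index that matches the WHERE clause exactly (a sketch; the names are made up):
ALTER TABLE orders ADD INDEX idx_cust_date (customer_id, order_date);
-- Handles: WHERE customer_id = ? AND order_date >= ?   (equality column first, then the range column)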
We are in the process of migrating from MySQL to PGSQL and we have a 100 million row table.
When I was trying to ascertain how much space both systems use, I found much less difference for tables, but found huge differences for indexes.
MySQL's indexes were occupying more space than the table data itself, and Postgres was using considerably less.
When digging through for the reason, I found that MySQL uses B+ trees to store the indexes and postgres uses B-trees.
MySQL's use of indexes was a little different: it stores the data along with the index (which is where the increased size comes from), but Postgres doesn't.
Now the questions:
Comparing B-trees and B+trees in database terms, is it better to use B+trees since they are better for range queries: O(log N) for the lookup plus O(m) to walk a range of m entries?
In B-trees the lookup is also logarithmic, but for range queries it shoots up toward O(N), since there is no underlying linked-list structure for the data nodes. With that said, why does Postgres use B-trees? Does it perform well for range queries (it does, but how does it handle them internally with B-trees)?
The above question is from a Postgres point of view, but from a MySQL perspective: why does it use more storage than Postgres, and what is the performance benefit of using B+trees in reality?
I could have missed/misunderstood many things, so please feel free to correct my understanding here.
Edit: answering Rick James's questions
I am using InnoDB engine for MySQL
I built the index after populating the data - same way I did in postgres
The indexes are not UNIQUE indexes, just normal indexes
There were no random inserts; I used CSV loading in both Postgres and MySQL, and only after this did I create the indexes.
The Postgres block size for both indexes and data is 8KB. I am not sure about MySQL, but I didn't change it, so it must be the default.
I would not call the rows big: they have around 4 text fields about 200 characters long, 4 decimal fields, and 2 bigint fields (19 digits long).
The PK is a bigint column with 19 digits; I am not sure if this counts as bulky. On what scale should we differentiate bulky vs. non-bulky?
The MySQL table size was 600 MB and the Postgres one was around 310 MB, both including indexes -- so Postgres is about 48% smaller, if my math is right. But is there a way that I can measure the index size alone in MySQL, excluding the table size? That could lead to better numbers, I guess.
Machine info: I had enough RAM (256 GB) to fit all the tables and indexes together, but I don't think we need to go down this route at all; I didn't see any noticeable performance difference between the two.
Additional Questions
When do we say fragmentation occurs? Is there a way to defragment, so that we can say that beyond this point there is nothing more to be done? I am using CentOS, by the way.
Is there a way to measure the index size alone in MySQL, ignoring the primary key since it is clustered, so that we can actually see which type occupies more space, if any?
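One way to see the data and secondary-index sizes separately is information_schema; a sketch (the schema and table names are placeholders):
SELECT table_name,
       ROUND(data_length  / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS secondary_index_mb
FROM information_schema.TABLES
WHERE table_schema = 'your_db' AND table_name = 'your_table';
-- For InnoDB, data_length includes the clustered PRIMARY KEY; index_length covers only the secondary indexes.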
First, and foremost, if you are not using InnoDB, close this question, rebuild with InnoDB, then see if you need to re-open the question. MyISAM is not preferred and should not be discussed.
How did you build the indexes in MySQL? There are several ways to explicitly or implicitly build indexes; they lead to better or worse packing.
MySQL: Data and Indexes are stored in B+Trees composed of 16KB blocks.
MySQL: UNIQUE indexes (including the PRIMARY KEY) must be updated as you insert rows. So, a UNIQUE index will necessarily have a lot of block splits, etc.
MySQL: The PRIMARY KEY is clustered with the data, so it effectively takes zero space. If you load the data in PK order, then the block fragmentation is minimal.
Non-UNIQUE secondary keys may be built on the fly, which leads to some fragmentation. Or they can be constructed after the table is loaded; this leads to denser packing.
Secondary keys (UNIQUE or not) implicitly include the PRIMARY KEY in them. If the PK is "large" then the secondary keys are bulky. What is your PK? Is this the 'answer'?
In theory, totally random inserts into a BTree lead to the blocks being about 69% full. Maybe this is the answer. Is MySQL about 45% bigger (1/0.69 ≈ 1.45)?
With 100M rows, probably many operations are I/O-bound because you don't have enough RAM to cache all the data and/or index blocks needed. If everything is cached, then B-Tree versus B+Tree won't make much difference. Let's analyze what needs to happen for a range query when things are not fully cached.
With either type of Tree, the operation starts with a drill-down in the Tree. For MySQL, 100M rows will have a B+Tree of about 4 levels deep. The 3 non-leaf nodes (again 16KB blocks) will be cached (if they weren't already) and be reused. Even for Postgres, this caching probably occurs. (I don't know Postgres.) Then the range scan starts. With MySQL it walks through the rest of the block. (Rule of Thumb: 100 rows in a block.) Ditto for Postgres?
At the end of the block something different has to happen. For MySQL, there is a link to the next block. That block (with 100 more rows) is fetched from disk (if not cached). For a B-Tree the non-leaf nodes need to be traversed again. 2, probably 3 levels are still cached. I would expect the need for another non-leaf node to be fetched from disk only 1/10K rows. (10K = 100*100) That is, Postgres might hit the disk 1% more often than MySQL, even on a "cold" system.
On the other hand, if the rows are so fat that only 1 or 2 can fit in a 16K block, the "100" I kept using is more like "2", and the 1% becomes maybe 50%. That is, if you have big rows this could be the "answer". Is it?
What is the block size in Postgres? Note that many of the computations above depend on the relative size between the block and the data. Could this be an answer?
Conclusion: I've given you 4 possible answers. Would you like to augment the question to confirm or refute that each of these apply? (Existence of secondary indexes, large PK, inefficient building of secondary indexes, large rows, block size, ...)
Addenda about PRIMARY KEY
For InnoDB, another thing to note... It is best to have a PRIMARY KEY in the definition of the table before loading the data. It is also best to sort the data in PK order before LOAD DATA. Without specifying any PRIMARY KEY or UNIQUE key, InnoDB builds a hidden 6-byte PK; this is usually sub-optimal.
In databases you often have queries that deliver ranges of data, such as IDs from 100 to 200.
In this case:
A B-Tree needs to follow the path from the root to the leaves for every single entry to get the data pointer.
A B+Tree can 'walk' through the leaves and has to follow the path from the root down to a leaf only the first time (i.e., for id 100).
This is because a B+Tree stores the data (or data pointers) only in the leaves, and the leaves are linked, so you can perform a rapid in-order traversal.
(diagram: B+-Tree)
Another point is:
In a B+Tree the inner nodes store only pointers to other nodes, without any data pointers, so there is more room for node pointers, you need fewer I/O operations, and you can store more node pointers per memory page.
So for range queries B+Trees are the optimal data structure. For single lookups B-Trees might be better (because of the depth/size of the tree), since the data pointers are also located inside the tree.
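For instance, the kind of range query this matters for (a trivial sketch):
SELECT * FROM t WHERE id BETWEEN 100 AND 200;
-- With a B+Tree, one descent to id=100 and then the linked leaves cover the rest of the range;
-- with a plain B-Tree (as described above), later entries may require climbing back through inner nodes.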
MySQL and PostgreSQL aren't really comparable here. InnoDB uses an index to store table data (and secondary indexes just point at the pkey). This is great for single-row pkey lookups and, with B+trees, does OK with range queries on the pkey field, but it has performance drawbacks for everything else.
PostgreSQL uses heap tables and puts indexes as separate. It supports a number of different indexing algorithms. Depending on your range query, a btree index may not help you and you may need a GiST Index instead. Similarly GIN indexes work well with member lookups (for arrays, fts etc).
I think btree is used because it excels at the simple use case: which rows contain the following data? This becomes a building block of GIN, for example.
But it isn't true that PostgreSQL cannot use B+ trees. GiST is built on B+ Tree indexes in a generalized format. So PostgreSQL gives you the option to use B+ trees where they come in handy.
I was reading an ebook chapter about indexes and indexing strategies. Many of these aspects I already know, but I got stuck on clustered indexes in InnoDB. Here is the quote:
Clustering gives the largest improvement for I/O-bound workloads. If
the data fits in memory the order in which it’s accessed doesn’t
really matter, so clustering doesn’t give much benefit.
I believe this is true, but how am I supposed to guess whether the data will fit in memory? How does the database decide when to process the data in memory, and when not?
Let's say we have a table Emp with columns ID, Name, and Phone, filled with 100,000 records.
If, for example, I put the clustered index on the ID column and perform this query
SELECT * FROM Employee;
How do I know whether this will benefit from the clustered index?
It's somewhat related to this thread:
Difference between In memory databases and disk memory database
but I am still not sure how the database will behave.
Your example might be 20MB.
"In memory" really means "in the InnoDB buffer_pool", whose size is controlled by innodb_buffer_pool_size, which should be set to about 70% of available RAM.
If your query hits the disk instead of finding everything cached in the buffer_pool, it will run (this is just a Rule of Thumb) 10 times as slow.
What you are saying on "clustered index" is misleading. Let me turn things around...
InnoDB really needs a PRIMARY KEY.
A PK is (by definition in MySQL) UNIQUE.
There can be only one PK for a table.
The PK can be a "natural" key composed of one (or more) columns that 'naturally' work.
If you don't have a "natural" choice, then use id INT UNSIGNED NOT NULL AUTO_INCREMENT.
The PK and the data are stored in the same BTree. (Actually a B+Tree.) This leads to "the PK is clustered with the data".
The real question is not whether something is clustered, but whether it is cached in RAM. (Remember the 10x RoT.)
If the table is small, it will stay in cache (once all its blocks are touched), hence avoid disk hits.
If some subset of a huge table is "hot", it will tend to stay in cache.
If you must access a huge table "randomly", you will suffer a slowdown due to lots of disk hits. (This happens when using UUIDs as PRIMARY KEY or other type of INDEX.)
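Putting those PK rules together, a sketch of the question's table with a surrogate key (assuming no natural key exists; the column types are guesses):
CREATE TABLE Employee (
  ID    INT UNSIGNED NOT NULL AUTO_INCREMENT,
  Name  VARCHAR(100) NOT NULL,
  Phone VARCHAR(20) NOT NULL,
  PRIMARY KEY (ID)
) ENGINE=InnoDB;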
How does the database decide when to process the data in memory, and when not?
That's 'wrong', too. All processing is in memory. On a block-by-block basis, pieces of the tables and indexes are moved into / out of the buffer_pool. A block (in InnoDB) is 16KB. And the buffer_pool is a "cache" of such blocks.
SELECT * FROM Employee;
is simple, but costly. It operates thus:
"Open" table Employee (if not already open -- a different 'cache' handles this).
Go to the start of the table. This involves drilling down the left side of the PK's BTree to the first leaf node (block). And fetch it into the buffer_pool if not already cached.
Read a row -- this will be in that leaf node.
Read next row -- this is probably in the same block. If not, get the 'next' block (read from disk if necessary).
Repeat step 4 until finished with the table.
Things get more interesting if you have a WHERE clause. And then it depends on whether the PK or some other INDEX is involved.
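A quick way to watch this is EXPLAIN (a sketch on the question's table; the literal 123 is a placeholder):
EXPLAIN SELECT * FROM Employee;                  -- type=ALL: walks the whole clustered index as in the steps above
EXPLAIN SELECT * FROM Employee WHERE ID = 123;   -- type=const: a single drill-down in the PK's BTree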
Etc, etc.
I was running a 3 GB table as a MEMORY table in order to do some analysis on it, and I was curious whether adding indexes even helps a MEMORY table. Since the data is all in memory anyway, is this just redundant?
No, they're not redundant.
Yes, continue to use indexes.
On smaller tables, the speed of access to a MEMORY table via a non-indexed column may seem almost identical to access via an indexed one, because full table scans can be so fast in memory; but as the table grows, or as you join tables together to build larger result sets, there will be a difference.
Regardless of the storage method the engine uses (disk/memory), proper indexes will improve performance as long as the storage engine supports them. How the indexes are implemented may vary, but I know they are implemented in the table types MEMORY, InnoDB, and MyISAM. BTW: the default index type for MEMORY tables is HASH instead of BTREE.
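For example, if you want range scans on a MEMORY table you have to ask for a BTREE index explicitly, since HASH is the default; a sketch (the names are invented):
CREATE TABLE t_mem (
  id  INT UNSIGNED NOT NULL,
  val INT NOT NULL,
  KEY idx_val_hash  (val) USING HASH,   -- MEMORY default: fast equality lookups only
  KEY idx_val_btree (val) USING BTREE   -- needed for ranges such as val BETWEEN 10 AND 20
) ENGINE=MEMORY;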
Also, I generally don't recommend coding to your storage engine. What's a MEMORY table today may need to be changed to InnoDB tomorrow -- the SQL and schema should stand on their own.
No, indexing has little to do with data access speed. An index reorganizes data in order to optimize specific queries.
For example, if you add a balanced binary tree index to a one-million-row column, you will be able to find the item you want in about 20 read operations, instead of an average of half a million.
So placing that million rows in memory, which is 100x faster than the disk, will speed a brute force search by 100x. Adding the index will further improve the speed by a factor of twenty-five thousand by allowing the DB to perform a smarter search instead of a merely faster search.
Things are more complicated than this, because other factors get into play, and you rarely get such large a benefit from an index. Smarter searches are also slower on a one-by-one basis: those 20 index seeks cost much more than 20 brute force seeks. Then there's index maintenance, etc.
But my suggestion is to keep the data in memory if you can -- and index it.