Oracle performance of schema changes as compared to MySQL ALTER TABLE?

When using MySQL MyISAM tables, and issuing an ALTER TABLE statement to add a column, MySQL creates a temporary table and copies all the data into the new table before overwriting the original table.
If that table has a lot of data, this process can be very slow (especially when rebuilding indexes), and requires you to have enough free space on the disk to store 2 copies of the table. This is very annoying.
How does Oracle work when adding columns? Is it fast on large tables?
I'm always interested in being able to do schema changes without having a lot of downtime. We are always adding new features to our software which require schema changes with every release. Any advice is appreciated...

Adding a column with no data to a large table in Oracle is generally very fast. There is no temporary copy of the data and no need to rebuild indexes. Slowness will generally arise when you want to add a column to a large table and backfill data into that new column for all the existing rows, since now you're talking about an UPDATE statement that affects a large number of rows.
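As a rough illustration (the table and column names here are hypothetical), the column addition itself is a quick dictionary change, while the backfill is the expensive part:

ALTER TABLE big_table ADD (new_flag CHAR(1));   -- fast: dictionary-only change, no rows copied
UPDATE big_table SET new_flag = 'N';            -- slow: touches every existing row
COMMIT;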
Adding columns can lead to row migration over time. If you have a block that is 80% full with 4 rows and you add columns that will grow the size of each row 30% over time, you'll eventually reach a point where Oracle has to move one of the 4 rows to a different block. It does this by leaving a pointer to the new block in the old block, which causes reads on that migrated row to require more I/O. Eliminating migrated rows can be somewhat costly, and though it is generally possible to do without downtime assuming you're using the enterprise edition, it is generally easier if you have a bit of downtime. But row migration is something that you generally only have to worry about well down the road. If you know that certain tables are likely to have their row size increase substantially in the future, you can mitigate problems in advance by specifying a larger PCTFREE setting for the table.
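For example (hypothetical table, and assuming the default PCTFREE of 10 is not enough for your expected row growth), you might reserve more free space per block up front:

CREATE TABLE orders (
  order_id  NUMBER PRIMARY KEY,
  status    VARCHAR2(20),
  notes     VARCHAR2(400)
) PCTFREE 30;                    -- keep 30% of each block free for rows to grow into

ALTER TABLE orders PCTFREE 30;   -- for an existing table; only affects newly used blocks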

With regard to downtime, altering a table (and a bunch of other DDL operations) takes an exclusive lock. However, Oracle can also perform online redefinition of objects using the DBMS_REDEFINITION package, which can really take a bite out of downtime.
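A rough sketch of an online redefinition might look like the following; the schema, table, and interim table names are hypothetical, and the exact parameters of the DBMS_REDEFINITION procedures should be checked against the documentation for your Oracle version:

DECLARE
  l_errors  PLS_INTEGER;
BEGIN
  -- verify the table can be redefined online
  DBMS_REDEFINITION.CAN_REDEF_TABLE('APP', 'BIG_TABLE');
  -- start copying rows into a pre-created interim table that has the new structure
  DBMS_REDEFINITION.START_REDEF_TABLE('APP', 'BIG_TABLE', 'BIG_TABLE_INTERIM');
  -- copy indexes, triggers, constraints and grants onto the interim table
  DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS('APP', 'BIG_TABLE', 'BIG_TABLE_INTERIM',
                                          num_errors => l_errors);
  -- apply changes made to the original table while the copy was running
  DBMS_REDEFINITION.SYNC_INTERIM_TABLE('APP', 'BIG_TABLE', 'BIG_TABLE_INTERIM');
  -- swap the two tables; only this step needs a brief exclusive lock
  DBMS_REDEFINITION.FINISH_REDEF_TABLE('APP', 'BIG_TABLE', 'BIG_TABLE_INTERIM');
END;
/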

Related

Giant unpartitioned MySQL table issues

I have a MySQL table which is about 8TB in size. As you can imagine, querying is horrendous.
I am thinking about:
Create a new table with partitions
Loop through a series of queries to dump data into those partitions
But the loop will require lots of queries to be submitted, and each will be REALLY slow.
Is there a better way to do this? Repartitioning a production database in situ isn't going to work; this seemed like an OK option, but slow.
And is there a tool that will make life easier, rather than a Python job looping and submitting jobs?
Thanks a lot in advance
You could use pt-online-schema-change. This free tool allows you to partition the table with an ALTER TABLE statement, but it does not block clients from using the table while it's restructuring it.
Another useful tool could be pt-archiver. You would create a new table with your partitioning scheme, then use pt-archiver to gradually copy or move data from the old table to the new table.
Of course, try these tools out in a test environment on a much smaller table first, so you get some practice using them. Do not use them for the first time on your 8TB table.
Regardless of what solution you use, you are going to need enough storage space to store the entire dataset twice, plus binary logs. The old table will not shrink, even as you remove data from it. So I hope your filesystem is at least 24TB. Or else the new table should be stored on a different server (or ideally several other servers).
It will also take a long time no matter which solution you use. I expect at least 4 weeks, and perhaps longer if you don't have a very powerful server with direct-attached NVMe storage.
If you use remote storage (like Amazon EBS) it may not finish before you retire from your career!
In my opinion, 8TB for a single table is a problem even if you try partitioning. Partitioning doesn't magically fix performance, and it can make some queries worse. Do you have experience with querying partitioned tables? Do you understand how partition pruning works, and when it doesn't?
Before you choose partitioning as your solution, I suggest you read the whole chapter on partitioning in the MySQL manual: https://dev.mysql.com/doc/refman/8.0/en/partitioning.html, especially the page on limitations: https://dev.mysql.com/doc/refman/8.0/en/partitioning-limitations.html Then try it out with a smaller table.
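As a small, hypothetical example of what pruning does and does not do (table name, columns, and partition ranges are made up for illustration):

CREATE TABLE events (
  id       BIGINT NOT NULL,
  created  DATE NOT NULL,
  payload  VARCHAR(255),
  PRIMARY KEY (id, created)
) ENGINE=InnoDB
PARTITION BY RANGE (YEAR(created)) (
  PARTITION p2022 VALUES LESS THAN (2023),
  PARTITION p2023 VALUES LESS THAN (2024),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Prunes to the p2023 partition, because the condition is on the raw partitioning column:
EXPLAIN SELECT COUNT(*) FROM events
WHERE created BETWEEN '2023-01-01' AND '2023-12-31';

-- Cannot be pruned: the partitioning column is wrapped in a function, so every partition is scanned:
EXPLAIN SELECT COUNT(*) FROM events WHERE MONTH(created) = 6;

The partitions column in the EXPLAIN output shows which partitions each query actually touches.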
A better strategy than partitioning for data at this scale is to split the data into shards, and store each shard on one of multiple database servers. You need a strategy for adding more shards as I assume the data will continue to grow.

How to alter a table in large mysql DataBase

I have a very large MySQL database (more than 100 GB) and I want to make some changes to one of its tables. The problem is that I have no space to make a copy of a table this size, and as a result the change rolls back. To put it clearly: when I want to alter a table in MySQL, it builds a copy of the table for itself on the local disk, and because there is no space left on my disk, the statement ends up rolling back. Can anyone help with this issue?
It depends on the type of ALTER change. Since MySQL 5.6, some alterations can be done "in place," which does not make a full copy of the table and so needs little or no extra storage. Read https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html for details.
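For example, you can name the algorithm you expect, so MySQL returns an error instead of silently falling back to a full table copy (table and column names here are hypothetical, and exact support depends on your version and the operation):

-- In-place: builds only the new index, no full copy of the table:
ALTER TABLE big_table ADD INDEX idx_created (created_at), ALGORITHM=INPLACE, LOCK=NONE;

-- MySQL 8.0.12+ can add a column instantly, without any rebuild:
ALTER TABLE big_table ADD COLUMN notes VARCHAR(100), ALGORITHM=INSTANT;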
But many types of alterations (basically any change that affects the storage size of a row), still require a "table rebuild" which means it makes a copy of the table in the process of doing the alter. This requires extra storage, usually about the same as the size of the original table. If your server does not have enough storage space to do this, then you cannot alter the table, FULL STOP. You should have addressed this long before the table grew to this size. It is your responsibility to monitor the size of the database and make sure you have enough storage space.
You said in the comments above that it is not possible to add more storage space.
Sometimes InnoDB tablespaces can become "fragmented." That means they could store the same rows using slightly less storage space. You can accomplish defragmentation with OPTIMIZE TABLE <name>; or ALTER TABLE <name> FORCE;
A possible strategy is to optimize each of your tables, starting with the smallest table and doing one by one in order of size. You hope that you have enough space for the copy table needed to do this, and any space freed up by optimizing the small tables will help you to optimize the medium-size tables, and so on. This is a long shot, because it's also possible you don't have enough "wasted" space to recover to make a difference. You may still not be able to alter your largest tables once you're all done with this process. Unfortunately, MySQL has no way of reporting fragmentation of the tables, so you can only try it and see how much space, if any, you recover.
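To plan the order, you can list tables from smallest to largest with a query along these lines (the schema name is a placeholder):

SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024) AS approx_size_mb
FROM information_schema.tables
WHERE table_schema = 'your_database'
ORDER BY (data_length + index_length);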
After that, the only other option is to delete data from the table until you can do the alteration. The alter still requires extra storage, but only as much as needed to copy the remaining rows of data. That is, if you delete 50% of the rows, then the tablespace file that stores the copy only needs roughly 50% of the storage space.
You can also try dropping some of the indexes of the table as you do your alteration. Indexes take space in the same tablespace, and it's not uncommon if you have multiple indexes for them to take more space than the data itself. If you include some DROP INDEX operations in your ALTER TABLE statement, the resulting copy will take less space. On the other hand, those indexes might be necessary for proper query optimization, and without the indexes, your application won't be usable.
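A hypothetical example of folding index drops into the same rebuild:

ALTER TABLE big_table
  DROP INDEX idx_rarely_used,
  DROP INDEX idx_legacy_lookup,
  MODIFY COLUMN amount DECIMAL(12,2) NOT NULL;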

Post optimization needed after deleting rows in a MYSQL Database

I have a log table that is currently 10GB. It has a lot of data for the past 2 years, and I really feel at this point I don't need so much in there. Am I wrong to assume it is not good to have years of data in a table (a smaller table is better)?
My tables all use the MyISAM engine.
I would like to delete all data from 2014 and 2015, and soon I'll do 2016, but I'm concerned about what exactly will happen after I run the DELETE statement. I understand that because it's MyISAM, a lock will occur and no writing can take place? I would probably delete data by the month, and do it late at night, to minimize this since it's a production DB.
My prime interest, specifically, is this: should I take some sort of action after this deletion? Do I need to manually tell MySQL to do anything to my table, or will MySQL do all the housekeeping itself, reclaiming everything, reindexing, and ultimately optimizing my table after the 400,000k records I'll be deleting?
Thanks everyone!
Plan A: Use a time-series PARTITIONing of the table so that future deletions are 'instantaneous' because of DROP PARTITION. More discussion here. Partitioning only works if you will be deleting all rows older than X.
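For example, with hypothetical monthly partitions, removing an old month becomes a near-instant metadata operation:

ALTER TABLE log_table DROP PARTITION p201401;   -- drops all of January 2014 without a long DELETE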
Plan B: To avoid lengthy locking, chunk the deletes. See here. This is optionally followed by an OPTIMIZE TABLE to reclaim space.
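A chunked delete might look roughly like this (hypothetical table and column names); repeat it until it affects zero rows, ideally pausing briefly between chunks:

DELETE FROM log_table
WHERE created_at < '2016-01-01'
LIMIT 10000;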
Plan C: Simply copy over what you want to keep, then abandon the rest. This is especially good if you need to preserve only a small proportion of the table.
CREATE TABLE `new` LIKE `real`;
INSERT INTO `new`
SELECT * FROM `real`
WHERE ... ;                                      -- just the newer rows
RENAME TABLE `real` TO `old`, `new` TO `real`;   -- instantaneous and atomic
DROP TABLE `old`;                                -- after verifying that all went well
Note: The .MYD file contains the data; it will never shrink. Deletes will leave holes in it. Further inserts (and updates) will use the holes in preference to growing the table. Plans A and C (but not B) will avoid the holes, and truly free up space.
Tim and e4c5 have given some good recommendations and I urge them to add their answers.
You can run OPTIMIZE TABLE after doing the deletes. Optimize table will help you with a few things (taken from the docs):
If the table has deleted or split rows, repair the table.
If the index pages are not sorted, sort them.
If the table's statistics are not up to date (and the repair could not be accomplished by sorting the index), update them.
According to the docs: http://dev.mysql.com/doc/refman/5.7/en/optimize-table.html
Use OPTIMIZE TABLE in these cases, depending on the type of table:
...
After deleting a large part of a MyISAM or ARCHIVE table, or making many changes to a MyISAM or ARCHIVE table with variable-length rows (tables that have VARCHAR, VARBINARY, BLOB, or TEXT columns). Deleted rows are maintained in a linked list and subsequent INSERT operations reuse old row positions. You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the data file. After extensive changes to a table, this statement may also improve performance of statements that use the table, sometimes significantly.

Cassandra write performance vs Relational Databases

I am trying to grasp some performance differences between Cassandra and relational databases.
From what I have read, Cassandra's write performance remains constant regardless of data volume. By write performance, I am assuming this implies both new rows being added as well as existing rows being replaced on a key match (like an update in the relational world). Is that assumption correct?
Also, from what I understand about relational databases, updates get slower when tables/partitions become larger. This is because a full table scan must be performed to locate the row, or an index lookup needs to be performed, and both of these take longer as the table or partition grows. So updates keep getting slower as the data volume of the table/partition increases?
When new data is inserted into a relational database, I know any indexes need to have the new data added, but there is no lookup involved, correct? So will inserts also become progressively slower as data volume increases, or stay constant, with relational databases?
Thanks for any tips
They will become slower if the table has indexes. Not only must the data be written, but every index must be updated too. Inserting into a table that has no indexes and no constraints is lightning fast, because no checks need to be done: the record can just be written at the end of the table space.
On the relational DB side, I've been doing load testing on our RDBMS where I can see that the performance drops exponentially as data is added to the DB.
I'm still working on a Cassandra setup to be able to realize a comparable test. In the meantime, this Cassandra presentation gives some info on Cassandra compared to MySQL:
http://www.slideshare.net/Eweaver/cassandra-presentation-at-nosql

Is Postgres better than MySql when one needs to add a column to a table with millions of rows?

We're having problems with MySQL. When I search around, I see many people having the same problem.
I have joined a product where the database has some tables with as many as 150 million rows. One example of our problem is that one of these tables has over 30 columns and about half of them are no longer used. When trying to remove or rename columns, MySQL wants to copy the entire table and rename it. With this amount of data, it would take many hours and the site would be offline pretty much the whole time. This is just the first of several large migrations to improve the schema. They aren't intended as a regular thing; it's just a lot of cleanup I inherited.
I tried searching to see whether people have the same problem with Postgres, and I found almost nothing in comparison about this issue. Is this because Postgres is a lot better at it, or just that fewer people are using Postgres?
In PostgreSQL, adding a new column without default value to a table is instantaneous, because the new column is only registered in the system catalog, not actually added on disk.
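For example (hypothetical names), both of these return almost immediately even on a table with hundreds of millions of rows:

ALTER TABLE big_table ADD COLUMN notes text;      -- catalog-only change, no rows rewritten
ALTER TABLE big_table DROP COLUMN obsolete_col;   -- likewise just marks the column as dropped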
When the only tool you know is a hammer, all your problems look like nails. For this problem, PostgreSQL is much, much better at handling these types of changes. And the fact is, it doesn't matter how well you designed your app, you WILL have to change the schema on a live database someday. While MySQL's various engines really are amazing for certain corner cases, none of them help here. PostgreSQL's very close integration between its various layers means that you get things like transactional DDL (allowing you to roll back anything that isn't an ALTER/CREATE DATABASE/TABLESPACE), very fast ALTER TABLEs, non-blocking CREATE INDEX, and so on. The trade-off is that PostgreSQL sticks to the things it does well (traditional transactional database workloads are a strong point) and is not so great at the things MySQL often fills the gaps on, like live networked clustered storage with the NDB engine.
In this case none of the different engines in MySQL allow you to easily solve this problem. The very versatility of multiple storage engines means that the lexer / parser / top layer of the DB cannot be as tightly integrated to the storage engines, and therefore a lot of the cool things pgsql can do here mysql can't.
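As a small illustration of the transactional DDL point (hypothetical table name), a schema change in PostgreSQL can be tried inside a transaction and undone:

BEGIN;
ALTER TABLE big_table DROP COLUMN obsolete_col;
-- run whatever checks you like here...
ROLLBACK;   -- the column is still there; nothing was committed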
I've got a 118 GB table in my stats db. It has 1.1 billion rows in it. It really should be partitioned, but it's not read a whole lot, and when it is we can wait on it. At 300 MB/sec (the speed the array it's on can read), it takes approximately 118 × ~3 seconds to read, or right around 6 minutes. This machine has 32 GB of RAM, so it cannot hold the table in memory.
When I ran the simple statement on this table:
alter table mytable add test text;
it hung, waiting for a vacuum. I killed the vacuum (SELECT pg_cancel_backend(12345); -- with the vacuum's pid in there) and the ALTER finished immediately. A vacuum on this table takes a long time to run, by the way. Normally that's not a big deal, but when making changes to table structure, you have to wait on vacuums, or kill them.
Dropping a column is just as simple and fast.
Now we come to the problem with PostgreSQL, and that is its in-heap MVCC storage. If you add that column and then run UPDATE mytable SET test = 'abc', it updates every row and exactly doubles the size of the table. Unless HOT can update the rows in place, but then you need a 50% fill factor, which means the table is double-sized to begin with. The only way to get the space back is either to wait and let vacuum reclaim it over time, reusing it one update at a time, or to run CLUSTER or VACUUM FULL to shrink it back down.
You can get around this by running the updates on parts of the table at a time (UPDATE ... WHERE pkid BETWEEN 1 AND 10000000; and so on) and running VACUUM between each run to reclaim the space, as sketched below.
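A sketch of that chunked backfill, with made-up table, column, and key ranges:

UPDATE mytable SET test = 'abc' WHERE pkid BETWEEN 1        AND 10000000;
VACUUM mytable;
UPDATE mytable SET test = 'abc' WHERE pkid BETWEEN 10000001 AND 20000000;
VACUUM mytable;
-- ...and so on, so the space freed by each batch can be reused by the next one.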
So, both systems have warts and bumps to deal with.
Maybe because this should not be a regular occurrence.
Perhaps, reading between the lines, you need to be adding rows to another table instead of columns to a large existing table?