We have a MySQL database that is partitioned per client, each client having a unique ID (it is not a linear partitioning created with RANGE over a value; we use LIST partitioning).
When I create or drop a partition with these statements:
ALTER TABLE "table_name"
ADD PARTITION (PARTITION "client_id_value" VALUES IN ("client_id_value"));
ALTER TABLE "table_name" DROP PARTITION "client_id_value";
MySQL does a full table scan, even when there is no row corresponding to the new partition yet (and I know there is none). This is a problem because we create and delete a partition in every fitness test we run, to verify that a dedicated, separate data space is correctly created for the client. One solution would be to keep a permanent partition just for testing, but I wonder whether we can skip the full scan while altering the table.
Any ideas?
Since partitioning also splits the table into sub-tables, I wanted to know whether there is any way to index a partitioned table one partition at a time, based on the partition name or ID. I am asking because my table can have 1 billion+ rows and an ADD INDEX query takes long hours or days, so I wanted to check whether I can start adding the index on the partition I consider most important first, and so on.
No, MySQL has no syntax to support creating indexes on a partitioned table one partition at a time. The index will be added to all partitions in one ALTER TABLE or CREATE INDEX statement.
At my company, we execute schema changes using pt-online-schema-change, a script that allows clients to continue reading and writing the table while the alter is running. It might even take longer to run the schema change, but since it doesn't block clients, this doesn't cause a problem.
The script is part of the Percona Toolkit, which is a free, open-source collection of tools written in Perl and Bash.
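For illustration, adding an index with the tool looks roughly like this (the database, table, and column names here are placeholders, not taken from the question); run with --dry-run first, then --execute:

pt-online-schema-change --alter "ADD INDEX idx_created_at (created_at)" D=mydb,t=mytable --dry-run
pt-online-schema-change --alter "ADD INDEX idx_created_at (created_at)" D=mydb,t=mytable --execute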
I have a table whose primary key has hit the maximum signed INT value, 2147483647.
Imagine I want to switch it to unsigned INT; there are no negative values in the table, since it is a primary key. I am currently under the belief that this is the fastest way to get the table going again.
Should the ALTER TABLE statement that switches it to unsigned INT be a relatively quick process, since the values of the IDs wouldn't change? And what about locking?
The MySQL documentation on the ALTER TABLE command describes in quite some detail, under the "Storage, Performance, and Concurrency Considerations" section, which changes can be done quickly, without a table copy and index rebuild, and which locks MySQL will apply during the course of the command. Changing a column's type is unfortunately not listed as something that can be done in place (of course, read the documentation corresponding to your MySQL version; I just linked the newest one).
For some operations, an in-place ALTER TABLE is possible that does not require a temporary table:

For ALTER TABLE tbl_name RENAME TO new_tbl_name without any other options, MySQL simply renames any files that correspond to the table tbl_name without making a copy. (You can also use the RENAME TABLE statement to rename tables. See Section 13.1.28, "RENAME TABLE Syntax".) Any privileges granted specifically for the renamed table are not migrated to the new name. They must be changed manually.

Alterations that modify only table metadata and not table data are immediate because the server only needs to alter the table .frm file, not touch table contents. The following changes are fast alterations that can be made this way:

Renaming a column.

Changing the default value of a column.

Changing the definition of an ENUM or SET column by adding new enumeration or set members to the end of the list of valid member values, as long as the storage size of the data type does not change. For example, adding a member to a SET column that has 8 members changes the required storage per value from 1 byte to 2 bytes; this will require a table copy. Adding members in the middle of the list causes renumbering of existing members, which requires a table copy.

ALTER TABLE with DISCARD ... PARTITION ... TABLESPACE or IMPORT ... PARTITION ... TABLESPACE do not create any temporary tables or temporary partition files.

ALTER TABLE with ADD PARTITION, DROP PARTITION, COALESCE PARTITION, REBUILD PARTITION, or REORGANIZE PARTITION does not create any temporary tables (except when used with NDB tables); however, these operations can and do create temporary partition files.

ADD or DROP operations for RANGE or LIST partitions are immediate operations or nearly so. ADD or COALESCE operations for HASH or KEY partitions copy data between all partitions, unless LINEAR HASH or LINEAR KEY was used; this is effectively the same as creating a new table, although the ADD or COALESCE operation is performed partition by partition. REORGANIZE operations copy only changed partitions and do not touch unchanged ones.

Renaming an index.

Adding or dropping an index, for InnoDB.
Locking:
While ALTER TABLE is executing, the original table is readable by other sessions (with the exception noted shortly). Updates and writes to the table that begin after the ALTER TABLE operation begins are stalled until the new table is ready, then are automatically redirected to the new table without any failed updates. The temporary copy of the original table is created in the database directory of the new table. This can differ from the database directory of the original table for ALTER TABLE operations that rename the table to a different database.

The exception referred to earlier is that ALTER TABLE blocks reads (not just writes) at the point where it is ready to install a new version of the table .frm file, discard the old file, and clear outdated table structures from the table and table definition caches. At this point, it must acquire an exclusive lock. To do so, it waits for current readers to finish, and blocks new reads (and writes).
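In practice this means your INT to INT UNSIGNED change will be executed as a table copy, with the locking behaviour quoted above. A minimal sketch, assuming a table called my_table with an auto-increment primary key id (both names are placeholders); on MySQL 5.6 and later you can also request an algorithm explicitly, so the statement fails immediately instead of silently copying:

ALTER TABLE my_table MODIFY COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT;

-- MySQL 5.6+ only: refuse to run unless the change can be made in place
ALTER TABLE my_table MODIFY COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT, ALGORITHM=INPLACE, LOCK=NONE;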
I've got a table with 10M rows, and I'm trying to ALTER TABLE to add another column (a VARCHAR(80)).
From a data-modelling perspective, that column should be NOT NULL - but the amount of time it takes to run the statement is a consideration, and the client code could be changed to deal with a NULL column if that's warranted.
Will the NULL-ability of the column I'm trying to add significantly affect the amount of time the statement takes, one way or the other?
More Information
The context in which I'm doing this is a Django app, with a migration generated by South - adding three separate columns, and adding an index on one of the newly-added columns. Looking at the South-generated SQL, it spreads this operation (adding three columns and an index) over 15 ALTER TABLE statements - which seems like it will make this operation take a whole lot longer than it should.
I've seen some references that suggest that InnoDB doesn't actually have to create a field in the on-disk file for nullable fields that are NULL, and just modifies a bitfield in the header. Would this impact the speed of the ALTER TABLE operation?
I don't think the NULL-ability of the column has anything to do with the speed of ALTER TABLE. In most ALTER TABLE operations, the whole table, with all its indexes, has to be copied temporarily, and the alteration is applied to the copy. With 10M rows, that is bound to be slow. From the MySQL docs:
Storage, Performance, and Concurrency Considerations
In most cases, ALTER TABLE makes a temporary copy of the original table. MySQL waits for other operations that are modifying the table, then proceeds. It incorporates the alteration into the copy, deletes the original table, and renames the new one. While ALTER TABLE is executing, the original table is readable by other sessions. Updates and writes to the table that begin after the ALTER TABLE operation begins are stalled until the new table is ready, then are automatically redirected to the new table without any failed updates. The temporary table is created in the database directory of the new table. This can differ from the database directory of the original table for ALTER TABLE operations that rename the table to a different database.
If you want to make several changes to a table's structure, it is usually better to do them all in a single ALTER TABLE operation.
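For example (the table and column names here are invented, not taken from your migration), one combined statement copies the table once instead of fifteen times:

ALTER TABLE my_table
ADD COLUMN col_a VARCHAR(80) NULL,
ADD COLUMN col_b VARCHAR(80) NULL,
ADD COLUMN col_c VARCHAR(80) NULL,
ADD INDEX idx_col_a (col_a);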
Allowing client code to make changes to tables is probably not the best idea, and you have hit on one good reason for not allowing it. Why do you need it? If you can't do otherwise, it would probably be better, for performance reasons, to have your client code create a new table (holding the new column plus the primary key of the existing table) instead of adding a column.
I have a table with 10 million records; what is the fastest way to delete the old rows and retain only the last 30 days?
I know this can be done with the event scheduler, but my worry is that if it takes too much time, it might lock the table for too long.
It would be great if you could suggest an optimal way.
Thanks.
Offhand, I would:
Rename the table
Create an empty table with the same name as your original table
Grab the last 30 days from your "temp" table and insert them back into the new table
Drop the temp table
This will enable you to keep the table live through (almost) the entire process and pull in the past 30 days' worth of data at your leisure.
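A minimal sketch of those steps, assuming a table named logs with a created_at column (adjust the names and the date filter to your schema):

RENAME TABLE logs TO logs_old;
CREATE TABLE logs LIKE logs_old;
INSERT INTO logs
SELECT * FROM logs_old
WHERE created_at >= NOW() - INTERVAL 30 DAY;
DROP TABLE logs_old;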
You could try partitioned tables.
PARTITION BY LIST (TO_DAYS( date_field ))
This would give you 1 partition per day, and when you need to prune data you just:
ALTER TABLE tbl_name DROP PARTITION p#
http://dev.mysql.com/doc/refman/5.1/en/partitioning.html
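A hedged sketch of what that could look like, again assuming a logs table with a created_at date column (MySQL also requires the partitioning column to be part of every unique key, including the primary key; if your version rejects the TO_DAYS() calls inside VALUES IN, substitute the literal integers they return):

ALTER TABLE logs
PARTITION BY LIST (TO_DAYS(created_at)) (
PARTITION p20240101 VALUES IN (TO_DAYS('2024-01-01')),
PARTITION p20240102 VALUES IN (TO_DAYS('2024-01-02'))
);

-- pruning a whole day is then a near-instant metadata operation, not a row-by-row DELETE
ALTER TABLE logs DROP PARTITION p20240101;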
Not that it helps you with your current problem, but if this is a regular occurrence, you might want to look into a MERGE table: just add tables for different periods in time, and remove them from the MERGE table definition when they are no longer needed. Another option is partitioning, with which it is equally trivial to drop the (oldest) partition.
To expand on Michael Todd's answer:
If you have the space,
Create a blank staging table similar to the table you want to reduce in size
Fill the staging table with only the records you want to have in your destination table
Do a double rename like the following
Assuming:
table is the table name of the table you want to purge a large amount of data from
newtable is the staging table name
no other tables are called temptable
rename table table to temptable, newtable to table;
drop table temptable;
This will be done in a single transaction, which will require an instantaneous schema lock. Most high concurrency applications won't notice the change.
Alternatively, if you don't have the space but you do have a long window in which to purge this data, you can use dynamic SQL to insert the primary keys into a temporary table, and join against that temporary table in a DELETE statement. When you insert into the temporary table, be aware of the max_allowed_packet size: most installations of MySQL use 16MB (16777216 bytes), and your INSERT command for the temporary table needs to stay under it. This approach will not lock the table. Afterwards, you'll want to run OPTIMIZE TABLE to reclaim the space for the rest of the engine to use; you probably won't be able to reclaim actual disk space unless you shut down the engine and move the data files.
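A rough sketch of that second approach, with assumed names (logs, id, created_at) and an arbitrary batch size:

CREATE TEMPORARY TABLE purge_ids (id INT UNSIGNED PRIMARY KEY);

-- one batch: collect the keys of expired rows, then delete by join;
-- TRUNCATE purge_ids and repeat until the INSERT adds no more rows
INSERT INTO purge_ids
SELECT id FROM logs
WHERE created_at < NOW() - INTERVAL 30 DAY
LIMIT 100000;

DELETE logs FROM logs
JOIN purge_ids ON purge_ids.id = logs.id;

DROP TEMPORARY TABLE purge_ids;
OPTIMIZE TABLE logs;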
Shut down access to the resource, then: SELECT ... INTO OUTFILE, filter the output, empty the table, and LOAD DATA LOCAL INFILE optimized_db.txt. It is much cheaper to re-create the data than to UPDATE or DELETE it in place.
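A sketch of that approach (table, column, and file names are placeholders; note that SELECT ... INTO OUTFILE writes the file on the server, so the non-LOCAL LOAD DATA INFILE form matches it most directly):

SELECT * INTO OUTFILE '/tmp/keep_30_days.txt'
FROM logs
WHERE created_at >= NOW() - INTERVAL 30 DAY;

TRUNCATE TABLE logs;

LOAD DATA INFILE '/tmp/keep_30_days.txt' INTO TABLE logs;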
I am working with a big table (~100,000,000 rows) in SQL Server 2008. Frequently, I need to add and remove batches of ~30,000,000 rows to and from this table. Currently, before loading a large batch into the table, I disable the indexes, insert the data, then rebuild the indexes; I have measured this to be the fastest approach.
Recently, I have been considering implementing table partitioning on this table to increase speed. I would partition the table according to my batches.
My question: is it possible to disable the index of one particular partition and load the data into it before enabling the index again? In that case, the rest of my table would not have to suffer a complete index rebuild, and my loading could be even faster.
Indexes are typically created on the partition scheme, i.e. aligned with the table's partitions. For the scenario you are describing, you can load the batch into a new table (identical structure, different name) and then use the SWITCH command to add that table as a new partition of your existing table.
I have included the code I use to perform this; you will need to modify it based on your table names:
DECLARE @importPart int;
DECLARE @hourlyPart int;

SET @importPart = 2; -- always, so long as the Import table is only made up of 1 partition

-- find the Hourly partition to switch into
SELECT @hourlyPart = MAX(V.boundary_id) + 1
FROM sys.partition_range_values V
JOIN sys.partition_functions F
ON V.function_id = F.function_id
AND F.name = 'pfHourly';

ALTER TABLE Import
SWITCH PARTITION @importPart
TO Hourly PARTITION @hourlyPart;
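To tie this back to the question: you get the effect of "disabling the index for one partition" by doing the index-light load on the staging table and rebuilding only its indexes just before the switch; the SWITCH itself is a metadata-only operation. A hedged sketch (the index name is a placeholder; SWITCH also requires the staging table's indexes, constraints, and filegroup to match the target partition):

ALTER INDEX IX_Import_SomeCol ON Import DISABLE; -- skip index maintenance during the bulk load
-- ... bulk load the ~30,000,000-row batch into Import here ...
ALTER INDEX IX_Import_SomeCol ON Import REBUILD; -- rebuild just this table's index before the SWITCH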