What happens if I drop partitions in a table? - partitioning

I have a few questions regarding partitions. I'm planning to drop the existing partition scheme/function on table1 and then point a new partition scheme/function at table1. I know how to go through the process, but before I do I would like some answers to the questions below.
If I drop the old/existing partition scheme/function while the table still holds data, what will happen to the data in the old partition .ndf files? (deleted / not modified)
If I drop the old/existing partition scheme/function while the table still holds data, what will happen to the data in the table? (will there be any loss of records)
Thanks

Finally I got the answer to the question I raised.
Even though the partitions are dropped from an existing table, the filegroups/files keep the data they already hold. After the drop, any data added to the table no longer lands in those filegroups/files.
There is no data loss in this scenario.
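For reference, a hedged sketch of the usual way to move a table onto a new scheme before dropping the old one; the names psNew, psOld, pfOld, ixTable1 and partition_col are assumptions, not taken from the question:
-- Rebuild the clustered index onto the new partition scheme (hypothetical names).
CREATE CLUSTERED INDEX ixTable1 ON table1 (partition_col)
WITH (DROP_EXISTING = ON)
ON psNew (partition_col);
-- The old scheme/function can be dropped only once nothing references them;
-- the drop removes metadata, not the rows or the .ndf files themselves.
DROP PARTITION SCHEME psOld;
DROP PARTITION FUNCTION pfOld;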

Related

Table with 50 million rows and adding an index takes too much time

I was working on a table with about 50 million rows (roughly 2 GB in size) and was required to optimize its performance. When I added an index on a column through the phpMyAdmin panel, the table got locked, which held up all queries against it in a queue and ultimately forced me to kill all the queries. (And yeah, I forgot to mention I was doing this on production. My bad!)
When I did some research I found solutions like creating a duplicate table, but is there any alternative method?
You may follow these steps (a sketch follows the list):
Create a temp table
Create triggers on the first table (for inserts, updates, deletes) so that changes are replicated to the temp table
Migrate the data in small batches
When done, rename the temp table to the original name and drop the other table
But as you said you are doing it in production, you need to consider live traffic while dropping a table and creating another one.
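A minimal sketch of those steps in MySQL, assuming a hypothetical table orders(id, customer_id, total) and that adding an index is the goal:
-- Shadow table with the desired index already built.
CREATE TABLE orders_new LIKE orders;
ALTER TABLE orders_new ADD INDEX idx_customer (customer_id);
-- Replicate live writes into the shadow table while the copy runs.
CREATE TRIGGER orders_ai AFTER INSERT ON orders FOR EACH ROW
    REPLACE INTO orders_new (id, customer_id, total)
    VALUES (NEW.id, NEW.customer_id, NEW.total);
CREATE TRIGGER orders_au AFTER UPDATE ON orders FOR EACH ROW
    REPLACE INTO orders_new (id, customer_id, total)
    VALUES (NEW.id, NEW.customer_id, NEW.total);
CREATE TRIGGER orders_ad AFTER DELETE ON orders FOR EACH ROW
    DELETE FROM orders_new WHERE id = OLD.id;
-- Copy existing rows in small id ranges, repeating with higher ranges:
INSERT IGNORE INTO orders_new
    SELECT * FROM orders WHERE id BETWEEN 1 AND 10000;
-- Once caught up, swap atomically and drop the old table:
RENAME TABLE orders TO orders_old, orders_new TO orders;
DROP TABLE orders_old;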

MySQL/ASP - Delete Duplicate Rows

I have a table with 100,000 rows called 'photoSearch'. When transferring the data from other tables (which took bloody ages, and I was bloody tired), I accidentally forgot to remove the test transfer I did, which left 3,500 rows in the table before I transferred everything over in one go.
The ID column is 'photoID' (INT) and I need to remove all duplicates that have a photoID of less than 6849. Just removing the duplicates would be far less painful than dropping the table and starting another transfer.
Has anybody got any suggestions on the most practical and safest way to do this?
UPDATE:
I actually answered my own question. I backed up my table for safety, and then I ran this:
ALTER IGNORE TABLE photoSearch ADD UNIQUE INDEX unique_id_index (photoID);
This removed all 3500 duplicates in under a minute :)
Traditional method
Back up your existing table photoSearch to something like tmp_photoSearch using:
create table tmp_photoSearch select * from photoSearch;
After that, you can perform the data massage on tmp_photoSearch (see the sketch below).
Once you have the results you expect, perform a table swap:
rename table photoSearch to photoSearch_backup, tmp_photoSearch to photoSearch;
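One possible massage step is the asker's own ALTER IGNORE trick from the update above, applied to the staging copy instead of the live table:
ALTER IGNORE TABLE tmp_photoSearch ADD UNIQUE INDEX unique_id_index (photoID);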
To increase insert speed (if the bottleneck is not the network transfer), see:
http://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
To increase performance for MyISAM tables, for both LOAD DATA INFILE and INSERT, enlarge the key cache by increasing the key_buffer_size system variable
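For example (the 256 MB figure is an assumption; size the cache to your RAM):
SHOW VARIABLES LIKE 'key_buffer_size';
SET GLOBAL key_buffer_size = 256 * 1024 * 1024;  -- 256 MB key cache for MyISAM indexes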

Table data handling - optimum usage

I have a table with 8 million records in MySQL.
I want to keep the last week of data and delete the rest; I can take a dump and recreate the table in another schema.
I am struggling to get the queries right, so please share your views on the best approach, and on the best way to delete so that it does not affect other tables in production.
Thanks.
MySQL offers you a feature called partitioning. You can do a horizontal partition and split your table by rows. 8 million isn't that much; what is the insertion rate per week?
CREATE TABLE MyVeryLargeTable (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    my_date DATE NOT NULL,
    -- your other columns
    PRIMARY KEY (id, my_date)  -- the partitioning column must be part of every unique key
) PARTITION BY HASH (YEARWEEK(my_date)) PARTITIONS 4;
You can read more about it here: http://dev.mysql.com/doc/refman/5.1/en/partitioning.html
Edit: This creates only 4 partitions, and HASH assigns a row to partition YEARWEEK(my_date) mod 4, so after 4 weeks new weeks wrap around into the same partitions; I therefore suggest partitioning by month or year instead. The partition limit is quite high, but this is really a question of what the insertion rate per week/month/year looks like.
Edit 2
MySQL 5.0 comes with an Archive engine, which you should use for your archive table ( http://dev.mysql.com/tech-resources/articles/storage-engine.html ). To get your data into the archive table, it seems you have to write a cron job that runs at the beginning of every week, moving all old records into the archive table and deleting them from the original one. You could write a stored procedure for the move, but the cron job still needs to run from the shell. Keep in mind this could affect your data integrity in some way. What about upgrading to MySQL 5.1?
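A hedged sketch of that weekly job, assuming a hypothetical table log(id, created); the ARCHIVE engine supports no ordinary indexes, so the archive table is created without any:
CREATE TABLE log_archive (
    id INT NOT NULL,
    created DATETIME NOT NULL
) ENGINE = ARCHIVE;
-- run at the start of each week (e.g. from cron via the mysql client):
INSERT INTO log_archive
    SELECT * FROM log WHERE created < CURDATE() - INTERVAL 7 DAY;
DELETE FROM log
    WHERE created < CURDATE() - INTERVAL 7 DAY;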

Is there any way to do a bulk/faster delete in mysql?

I have a table with 10 million records; what is the fastest way to delete everything except the last 30 days?
I know this can be done with the event scheduler, but my worry is that if it takes too much time, it might lock the table for too long.
It will be great if you can suggest some optimum way.
Thanks.
Offhand, I would (a sketch follows the list):
Rename the table
Create an empty table with the same name as your original table
Grab the last 30 days from your "temp" table and insert them back into the new table
Drop the temp table
This will enable you to keep the table live through (almost) the entire process and get the past 30 days worth of data at your leisure.
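A sketch of those four steps, assuming a hypothetical table events(id, created):
RENAME TABLE events TO events_old;
CREATE TABLE events LIKE events_old;  -- empty table under the original name
INSERT INTO events
    SELECT * FROM events_old
    WHERE created >= NOW() - INTERVAL 30 DAY;  -- keep only the last 30 days
DROP TABLE events_old;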
You could try partitioned tables (an expanded sketch follows below).
PARTITION BY LIST (TO_DAYS( date_field ))
This would give you 1 partition per day, and when you need to prune data you just:
ALTER TABLE tbl_name DROP PARTITION p#
http://dev.mysql.com/doc/refman/5.1/en/partitioning.html
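An expanded, hedged version of that snippet (table and column names are assumptions); with LIST partitioning, each day's TO_DAYS() value must be enumerated explicitly:
CREATE TABLE tbl_name (
    id INT NOT NULL,
    date_field DATE NOT NULL
)
PARTITION BY LIST (TO_DAYS(date_field)) (
    -- one partition per day; if your version rejects expressions here,
    -- substitute the integers returned by SELECT TO_DAYS('2011-01-01') etc.
    PARTITION p0 VALUES IN (TO_DAYS('2011-01-01')),
    PARTITION p1 VALUES IN (TO_DAYS('2011-01-02')),
    PARTITION p2 VALUES IN (TO_DAYS('2011-01-03'))
);
-- pruning a day is then a near-instant metadata operation:
ALTER TABLE tbl_name DROP PARTITION p0;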
Not that it helps you with your current problem, but if this is a regular occurrence, you might want to look into a merge table: just add tables for different periods in time, and remove them from the merge table definition when no longer needed. Another option is partitioning, in which it is equally trivial to drop the (oldest) partition.
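A hedged sketch of the merge-table idea, with hypothetical monthly MyISAM tables:
CREATE TABLE log_2011_01 (id INT NOT NULL, created DATE) ENGINE = MyISAM;
CREATE TABLE log_2011_02 LIKE log_2011_01;
CREATE TABLE log_all (id INT NOT NULL, created DATE)
    ENGINE = MERGE UNION = (log_2011_01, log_2011_02) INSERT_METHOD = LAST;
-- retiring the oldest period is just a definition change plus a drop:
ALTER TABLE log_all UNION = (log_2011_02);
DROP TABLE log_2011_01;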
To expand on Michael Todd's answer.
If you have the space:
Create a blank staging table similar to the table you want to reduce in size
Fill the staging table with only the records you want to keep in your destination table
Do a double rename like the following
Assuming:
table is the name of the table you want to purge a large amount of data from
newtable is the staging table name
no other table is called temptable
rename table table to temptable, newtable to table;
drop table temptable;
This will be done in a single transaction, which will require an instantaneous schema lock. Most high concurrency applications won't notice the change.
Alternatively, if you don't have the space and you have a long window in which to purge this data, you can use dynamic SQL to insert the primary keys into a temp table, and join the temp table in a delete statement (a sketch follows). When you insert into the temp table, be aware of the max_allowed_packet setting. Most installations of MySQL use 16 MB (16777216 bytes); each insert command for the temp table should stay under max_allowed_packet. This will not lock the table. Afterwards you'll want to run OPTIMIZE TABLE to reclaim space for the rest of the engine to use; you probably won't be able to reclaim disk space itself unless you shut down the engine and move the data files.
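A minimal sketch of that route, with a hypothetical big_table(id, created); the batch size is an assumption, and a plain INSERT ... SELECT with LIMIT stands in for the dynamic SQL described above:
CREATE TEMPORARY TABLE ids_to_delete (id INT NOT NULL PRIMARY KEY);
-- keep each batch comfortably under max_allowed_packet:
INSERT INTO ids_to_delete (id)
    SELECT id FROM big_table
    WHERE created < NOW() - INTERVAL 30 DAY
    LIMIT 10000;
DELETE big_table FROM big_table
    JOIN ids_to_delete USING (id);
OPTIMIZE TABLE big_table;  -- reclaim the freed space for the engine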
Shut down your application, then:
SELECT ... INTO OUTFILE, filter the output, truncate the table, LOAD DATA LOCAL INFILE optimized_db.txt. It is cheaper to re-create the data than to UPDATE it in place.
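A hedged sketch of that route (the path and table are assumptions; the server writes the OUTFILE, so the file must not already exist, and LOCAL assumes the client runs on the same host):
SELECT * INTO OUTFILE '/tmp/optimized_db.txt'
    FROM big_table
    WHERE created >= NOW() - INTERVAL 30 DAY;  -- export only what you want back
TRUNCATE TABLE big_table;
LOAD DATA LOCAL INFILE '/tmp/optimized_db.txt' INTO TABLE big_table;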

SQL Server 2008: Disable index on one particular table partition

I am working with a big table (~100,000,000 rows) in SQL Server 2008. Frequently, I need to add and remove batches of ~30,000,000 rows to and from this table. Currently, before loading a large batch into the table, I disable the indexes, insert the data, then rebuild the indexes. I have measured this to be the fastest approach.
Recently I have been considering implementing table partitioning on this table to increase speed. I will partition the table according to my batches.
My question: will it be possible to disable the index on one particular partition and load the data into it before enabling the index again? In that case, the rest of my table would not have to suffer a complete index rebuild, and my loading could be even faster.
Indexes are typically created on the partition scheme. For the scenario you are describing, you can actually load the batch into a new table (identical structure, different name) and then use the SWITCH command to add that table as a new partition of your existing table.
I have included code that I use to perform this; you will need to modify it based on your table names:
DECLARE @importPart int
DECLARE @hourlyPart int

SET @importPart = 2 -- always, so long as the Import table is only made up of 1 partition

-- get the Hourly partition
SELECT @hourlyPart = MAX(V.boundary_id) + 1
FROM sys.partition_range_values V
JOIN sys.partition_functions F
    ON V.function_id = F.function_id
   AND F.name = 'pfHourly'

ALTER TABLE Import
SWITCH PARTITION @importPart
TO Hourly PARTITION @hourlyPart;