What is the fastest way to delete old records - MySQL

I have a table with about 35 million rows. Each row has about 35 integer values and one time value (last updated).
The table has two indexes:
primary - uses two of the integer columns
secondary - uses the 1st integer from the primary plus another integer value.
I would like to delete the old records (about 20 million of them) according to the date field.
What is the fastest way:
1. Delete as is, according to the date field?
2. Create another index by date and then delete by date?
There will be a one-time deletion of a large portion of the data, and then incremental weekly deletions of much smaller parts.
Is there another way to do it more efficiently?

It might be quicker to create a new table containing only the rows you want to keep, drop the old table, and then rename the new table.
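A rough sketch of that approach, assuming the table is called my_table and the date column is last_updated (both names, and the cutoff date, are placeholders):

CREATE TABLE my_table_new LIKE my_table;
-- copy only the ~15 million rows worth keeping
INSERT INTO my_table_new
SELECT * FROM my_table
WHERE last_updated >= '2013-01-01';
RENAME TABLE my_table TO my_table_old, my_table_new TO my_table;
DROP TABLE my_table_old;

Copying 15 million rows is still a sizeable operation, but it avoids 20 million row-by-row deletes and leaves the new table unfragmented.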

For the weekly deletions, an index on the date field would speed things up.
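Something along these lines, with placeholder names; deleting in bounded batches keeps each statement's lock time short:

ALTER TABLE my_table ADD INDEX idx_last_updated (last_updated);
-- weekly job: repeat until no rows are affected
-- (the 6-month retention cutoff is just an example)
DELETE FROM my_table
WHERE last_updated < NOW() - INTERVAL 6 MONTH
LIMIT 10000;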

Fastest (but not easiest), I think, is to keep your records segmented into multiple tables based on date, e.g. one table per week, and then have a union table over all of those tables for the regular queries across the whole data set (so your queries would be unaltered). Each week you would create a new table and redefine the union table.
When you wish to drop old records, you simply recreate the union table to leave the old tables out, and then drop those left-out tables (remember to truncate before you drop, depending on your filesystem). This is probably the fastest way to get there with MySQL.
A mess to manage, though :)
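In MySQL this "union table" is typically a MERGE table over identically defined MyISAM tables; a minimal sketch, with made-up table and column names:

CREATE TABLE data_week_01 (
    id INT NOT NULL,
    value INT NOT NULL,
    last_updated DATETIME NOT NULL
) ENGINE=MyISAM;
CREATE TABLE data_week_02 LIKE data_week_01;

-- the union table the application actually queries
CREATE TABLE data_all (
    id INT NOT NULL,
    value INT NOT NULL,
    last_updated DATETIME NOT NULL
) ENGINE=MERGE UNION=(data_week_01, data_week_02) INSERT_METHOD=LAST;

-- retiring a week: redefine the union, then drop the old table
ALTER TABLE data_all UNION=(data_week_02);
DROP TABLE data_week_01;

Native partitioning (PARTITION BY RANGE) gives you the same drop-a-chunk behaviour with less bookkeeping, if it is available to you.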

How to structure a large table and its transactions in a database?

I have two big tables, for example:
'tbl_items' and 'tbl_items_transactions'
The first table keeps item metadata; it may have 20 (varchar) columns and millions of rows. The second table keeps each transaction on the first table.
For example, if a user inserts a new record into tbl_items, then a new record is automatically added to tbl_items_transactions with the same data plus the date, username and transaction type, to keep a history of each row.
So in the above scenario the two tables have the same columns, but tbl_items_transactions has 3 extra columns (date, username, transaction_type) to keep the tbl_items history.
Now assume we have 1000 users who want to insert, update and delete tbl_items records through a web application, so these two tables grow very quickly (maybe a billion rows in tbl_items_transactions).
I have tried MySQL, MariaDB, PostgreSQL... they are very good, but once the tables grow and millions of rows have been inserted they become slow for some select queries on tbl_items_transactions, although sometimes PostgreSQL is faster than MySQL or MariaDB.
Now I think I'm doing something wrong... If you were me, would you use MariaDB or PostgreSQL or something like that, and would you structure your database the way I did?
Your setup is wrong.
You should not duplicate the columns from tbl_items in tbl_items_transactions, rather you should have a foreign key in the latter table pointing to the former.
That way data integrity is preserved, and tbl_items_transactions will be much smaller. This technique is called normalization.
To speed up queries once the tables get large, define indexes on them that match your WHERE and JOIN conditions.
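A sketch of the normalized layout; the exact column names and types beyond those mentioned in the question are just illustrative:

CREATE TABLE tbl_items (
    item_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
    -- ...the remaining metadata columns...
) ENGINE=InnoDB;

CREATE TABLE tbl_items_transactions (
    transaction_id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    item_id BIGINT UNSIGNED NOT NULL,
    username VARCHAR(64) NOT NULL,
    transaction_type ENUM('insert','update','delete') NOT NULL,
    created_at DATETIME NOT NULL,
    KEY idx_item_created (item_id, created_at),
    CONSTRAINT fk_transaction_item FOREIGN KEY (item_id) REFERENCES tbl_items (item_id)
) ENGINE=InnoDB;

Queries that need the item metadata join back to tbl_items on item_id, so the billion-row history table stays narrow and its indexes stay small.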

Can resetting the initial/start auto-increment value affect the speed of queries?

I have a 6 million record table with an auto-increment ID as the primary key. Due to various operations over the last several weeks, my starting ID is 2 million. Updates and other queries take a long time, and I'm wondering whether having an ID range from 2 million to 8 million, rather than 1 to 6 million, could be responsible. I've noticed anecdotally that selects/updates over a range such as ID > 1000000 AND ID < 1001000 seem to be slower than ID > 1 AND ID < 1000.
Is it worth removing the existing PK and adding a new one starting at 1? I know I can do
ALTER TABLE tablename AUTO_INCREMENT = 1;
but I cannot do that here, with 6 million existing records whose auto-increment IDs are already assigned.
Clearly I could just try it and test, but for various reasons, including the time it would take given the size of the table, its indexes, etc., I'd prefer to ask before spending the time and effort, in case anyone knows the answer definitively.
Update:
For now I did the following:
CREATE TABLE table_new LIKE `table`;
to duplicate the table, indexes and all. Then:
ALTER TABLE table_new AUTO_INCREMENT = 1;
so the empty duplicate resets its counter to 1. Then I inserted from the original table into the duplicate:
INSERT INTO table_new (FieldA, FieldB, FieldC)
SELECT FieldA, FieldB, FieldC FROM `table`;
This inserts all the records minus the ID field, so each inserted record gets a fresh auto-increment ID, starting at 1 as the reset specified. And finally, of course:
RENAME TABLE `table` TO table_old;
RENAME TABLE table_new TO `table`;
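As a side note, the two renames can also be written as one statement, which MySQL performs atomically, so there is never a moment when `table` does not exist:

RENAME TABLE `table` TO table_old, table_new TO `table`;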

How to handle large amounts of data in MySQL database?

Background
I have spent a couple of days trying to figure out how I should handle large amounts of data in MySQL, and I have selected some programs and techniques for the new server for the software. I will probably use Ubuntu 14.04 LTS running nginx and Percona Server, with TokuDB for the 3 tables I have planned and InnoDB for the rest of the tables.
But the major problem is still unresolved: how do I handle the huge amount of data in the database?
Data
My estimate of the data I will receive is about 500 million rows a year; I will be receiving measurement data from sensors every 4 minutes.
Requirements
Insertion speed is not very critical, but I want to be able to select a few hundred measurements in 1-2 seconds. The amount of required resources is also a key factor.
Current plan
For now I have thought of splitting the sensor data into 3 tables.
EDIT:
On every table:
id = PK, AI
sensor_id will be indexed
CREATE TABLE measurements_minute (
    id bigint(20),
    value float,
    sensor_id mediumint(8),
    created timestamp
) ENGINE=TokuDB;

CREATE TABLE measurements_hour (
    id bigint(20),
    value float,
    sensor_id mediumint(8),
    created timestamp
) ENGINE=TokuDB;

CREATE TABLE measurements_day (
    id bigint(20),
    value float,
    sensor_id mediumint(8),
    created timestamp
) ENGINE=TokuDB;
So I would store this 4-minute data for one month. Once the data is 1 month old it would be deleted from the minute table, and hourly averages calculated from the minute values would be inserted into the measurements_hour table. Then, once the data is 1 year old, all the hourly data would be deleted and daily averages would be stored in the measurements_day table.
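For example, the hourly roll-up step could be something like this (just a sketch against the schema above; the retention cutoff and grouping are illustrative):

INSERT INTO measurements_hour (value, sensor_id, created)
SELECT AVG(m.value), m.sensor_id,
       FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(m.created) / 3600) * 3600) AS hour_start
FROM measurements_minute m
WHERE m.created < NOW() - INTERVAL 1 MONTH
GROUP BY m.sensor_id, hour_start;

DELETE FROM measurements_minute
WHERE created < NOW() - INTERVAL 1 MONTH;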
Questions
Is this considered a good way of doing it? Is there something else I should take into consideration? How about table partitioning, should I do that? How should I execute the splitting of the data into the different tables: triggers and procedures?
EDIT: My ideas
Any idea if MonetDB or Infobright would be any good for this?
I have a few suggestions, and further questions.
You have not defined a primary key on your tables, so MySQL will create one automatically. Assuming that you meant for "id" to be your primary key, you need to change the line in all your table create statements to be something like "id bigint(20) NOT NULL AUTO_INCREMENT PRIMARY KEY,".
You haven't defined any indexes on the tables, so how do you plan on querying? Without indexes, all queries will be full table scans and likely very slow.
Lastly, for this use-case, I'd partition the tables to make the removal of old data quick and easy.
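Putting those three suggestions together, the minute table could look roughly like this. The partition boundaries are placeholders, the partitioning column has to be part of the primary key, and you should confirm that your TokuDB build supports partitioning:

CREATE TABLE measurements_minute (
    id bigint(20) NOT NULL AUTO_INCREMENT,
    value float,
    sensor_id mediumint(8),
    created timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id, created),
    KEY idx_sensor_created (sensor_id, created)
) ENGINE=TokuDB
PARTITION BY RANGE (UNIX_TIMESTAMP(created)) (
    PARTITION p2014_06 VALUES LESS THAN (UNIX_TIMESTAMP('2014-07-01 00:00:00')),
    PARTITION p2014_07 VALUES LESS THAN (UNIX_TIMESTAMP('2014-08-01 00:00:00')),
    PARTITION pmax VALUES LESS THAN MAXVALUE
);

-- dropping a month of old data is then a quick metadata operation:
ALTER TABLE measurements_minute DROP PARTITION p2014_06;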
I had to solve that type of problem before, with nearly a million rows per hour.
Some tips:
Use the MyISAM engine. You don't need to update rows or manage transactions on these tables; you are only going to insert the values, select them, and eventually delete them.
Be careful with the indexes. In my case insertion speed was critical, and sometimes the MySQL queue was full of pending inserts. An insert takes more time the more indexes the table has. Which indexes you need depends on your calculated values and on when you compute them.
Shard your buffer tables. I only triggered the calculations when a table was ready: while I was calculating the values in the buffer_a table, the insertions were going into buffer_b. In my case I calculated the values every day, so I switched the destination table every day. In fact, I dumped all the data and exported it to another database to compute the averages and run the other processing without disturbing the inserts.
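One way to do that switch without touching the application is an atomic RENAME (a variation on the same idea; the table names are made up):

-- inserts go into measurements_buffer; swap in an empty copy
CREATE TABLE measurements_buffer_empty LIKE measurements_buffer;
RENAME TABLE measurements_buffer TO measurements_buffer_processing,
             measurements_buffer_empty TO measurements_buffer;
-- aggregate from measurements_buffer_processing at leisure, then drop it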
I hope you find this helpful.

Need suggestion for applying an index to a table containing 600 thousand records

I have a table storing latitude, longitude, address and an auto-generated id. The table contains 600 thousand records. I have to apply a UNIQUE index to the combination of latitude and longitude, but it is taking too much time and the query times out. The query is
ALTER TABLE `<db name>`.`<table name>`
ADD UNIQUE `latLonIndex` (`latitude`, `longitude`);
I am using MySQL Server 5.0. Please suggest a way.
Also, the current table has some duplicate lat-lon combinations, so I cannot apply the unique index. Please suggest a way to overcome this difficulty as well.
Create a second table that has the same columns and the index you need. Insert from the other table in batches. When done, remove the old table and rename the new one to the old one's name.
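A sketch of that approach, using INSERT IGNORE so the unique index silently discards the duplicate lat-lon rows; the table name and batch ranges are placeholders:

CREATE TABLE locations_new LIKE locations;
ALTER TABLE locations_new ADD UNIQUE `latLonIndex` (`latitude`, `longitude`);
-- copy in id ranges so each batch stays small
INSERT IGNORE INTO locations_new
SELECT * FROM locations WHERE id BETWEEN 1 AND 100000;
-- ...repeat for the remaining id ranges...
RENAME TABLE locations TO locations_old, locations_new TO locations;
DROP TABLE locations_old;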

MySQL Analyze and Optimize - Are they required if only inserts - and the table has no joins?

I have a MyISAM table in MySQL which consists of two fields (f1 integer unsigned, f2 integer unsigned) and contains 320 million rows. I have an index on f2. Every week I insert about 150,000 rows into this table. I would like to know how often I need to run ANALYZE and OPTIMIZE on this table (as they would probably take a long time and block the table in the meantime). I do not run any delete or update statements, just the weekly inserts. Also, I am not using this table in any joins, so based on this information, are ANALYZE and OPTIMIZE really required?
Thanks in advance,
Tim
ANALYZE TABLE updates the index statistics (key distribution); OPTIMIZE TABLE reorganizes the physical storage of the table and its indexes to reclaim space and reduce fragmentation.
If you never... ever... delete or update the data in your table, and only insert new rows, you won't need ANALYZE or OPTIMIZE.
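Should you ever want to refresh the statistics after a large batch of inserts anyway, the statements themselves are simple (the table name is a placeholder); just note that OPTIMIZE rebuilds the whole table and will lock it for a while:

ANALYZE TABLE my_big_table;
OPTIMIZE TABLE my_big_table;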