MySQL archival of a live production database table

I need a proper strategy to archive a live production table.
This table receives a high volume of inserts (close to 1,000 inserts per minute). It is a MyISAM table, and it has a key column with an auto-incremented numeric value.
I need to move data older than Jan 01 to a new archive table.
The ongoing inserts should not be affected.
The data is hosted on an Amazon RDS instance.
Please help!
EDIT:
The table structure is:
CREATE TABLE data (
  id int(20) NOT NULL AUTO_INCREMENT,
  id_1 varchar(64) CHARACTER SET utf8 NOT NULL,
  id_2 varchar(64) CHARACTER SET utf8 NOT NULL,
  timestamp int(10) unsigned NOT NULL,
  status_code int(10) unsigned NOT NULL,
  PRIMARY KEY (id),
  UNIQUE KEY check_2 (id_1, id_2, timestamp, status_code),
  KEY account_id_3 (id_1, timestamp)
) ENGINE=MyISAM AUTO_INCREMENT=75996470 DEFAULT CHARSET=latin1;
In addition to the above fields, there are about 30 more fields in this table which can accept NULL values.
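One approach that fits this setup (a minimal sketch; archive_data is a name I made up, the cutoff year is assumed since the question only says Jan 01, and I assume timestamp holds Unix epoch seconds) is to compute the boundary id once, then copy and delete in small id-bounded batches so the fresh inserts at the top of the table are never touched:

CREATE TABLE archive_data LIKE data;

-- Find the newest id before the cutoff (year assumed here). This
-- relies on rows being inserted in time order, so smaller ids are older.
SELECT MAX(id) INTO @max_id FROM data
WHERE timestamp < UNIX_TIMESTAMP('2015-01-01 00:00:00');

-- Repeat this pair until no rows remain below @max_id. Small batches
-- keep MyISAM's table-level lock short, so inserts are barely affected;
-- INSERT IGNORE makes a rerun safe if a batch is interrupted.
INSERT IGNORE INTO archive_data
SELECT * FROM data WHERE id <= @max_id ORDER BY id LIMIT 10000;

DELETE FROM data WHERE id <= @max_id ORDER BY id LIMIT 10000;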

Related

Appropriate MySQL Table Engine for this Scenario

We have a table that will store data every minute per user. We have about 1,000 users. The table only has 11 columns. As mentioned, a new record is created for each user each minute, so that's 1,440 records per user per day. The table is highly indexed.
Once created, the data is read and processed via cron jobs every hour.
After 14 days the data is deleted. This is a rolling process.
Generic MySQL wisdom seems to be to use InnoDB for everything; however, we have had problems deleting large amounts of data with InnoDB. A MEMORY table is no good, as the data must survive a reboot.
Does anyone understand the other MySQL table engines well enough to know whether a different type would be better in this scenario?
Here is the table definition:
CREATE TABLE geoc1clo_where.map_data (
  MapID bigint(20) NOT NULL AUTO_INCREMENT,
  Date date DEFAULT NULL,
  DeviceID varchar(128) DEFAULT NULL,
  Alarm varchar(255) DEFAULT NULL,
  FixTime datetime DEFAULT NULL,
  Valid int(1) DEFAULT NULL,
  Lat double DEFAULT NULL,
  Lon double DEFAULT NULL,
  Speed float DEFAULT NULL,
  Course float DEFAULT NULL,
  Address varchar(512) DEFAULT NULL,
  PRIMARY KEY (MapID),
  INDEX IDX_map_data (MapID, FixTime, DeviceID),
  INDEX IDX_map_data_FixTime (FixTime),
  INDEX IDX_map_data2 (DeviceID, FixTime),
  INDEX IDX_map_data3 (DeviceID, Date)
)
ENGINE = MYISAM
AUTO_INCREMENT = 98169276
AVG_ROW_LENGTH = 69
CHARACTER SET latin1
COLLATE latin1_swedish_ci;
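Since the pain point is the 14-day rolling delete, one option to consider (a minimal sketch, not from the question: it makes FixTime NOT NULL and widens the primary key so MySQL's partitioning rules are satisfied, and the daily partition names are illustrative) is to partition by day on FixTime, so purging a day becomes a near-instant DROP PARTITION instead of a mass DELETE:

-- Every unique key must include the partitioning column, so the
-- primary key is widened first (FixTime must also be NOT NULL).
ALTER TABLE map_data
  MODIFY FixTime datetime NOT NULL,
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (MapID, FixTime);

ALTER TABLE map_data
PARTITION BY RANGE (TO_DAYS(FixTime)) (
  PARTITION p20240101 VALUES LESS THAN (TO_DAYS('2024-01-02')),
  PARTITION p20240102 VALUES LESS THAN (TO_DAYS('2024-01-03')),
  -- ...one partition per day, created ahead of the data...
  PARTITION pmax VALUES LESS THAN MAXVALUE
);

-- Purging the oldest day is then a metadata operation:
ALTER TABLE map_data DROP PARTITION p20240101;

This works with both MyISAM and InnoDB, which sidesteps the slow bulk DELETE that made InnoDB painful here.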

MySQL partitioning by table rows

I created a table as below:
CREATE TABLE `Archive_MasterLog` (
  `LogID` INT(10) NOT NULL AUTO_INCREMENT,
  `LogDate` DATETIME NULL,
  `AssessorName` VARCHAR(255) NULL,
  `TblName` VARCHAR(100) NULL,
  PRIMARY KEY (`LogID`),
  UNIQUE INDEX `Index_72491D22_3806_4A01` (`LogID`)
)
ENGINE = INNODB;
I want to partition this table by the number of rows: every 100K rows should go into a new partition.
How can I do that in MySQL?
Why? You will probably gain no benefits from PARTITIONing.
Will you be purging old data? If so, then partition on LogDate. Then we can discuss how to purge.
You have two keys on the same column (LogID); keep the PRIMARY KEY and toss the UNIQUE key.
You have an index on RecordID, but that column does not exist??
The problem comes from the frequency of the data. In some months or weeks we have more than 2M rows/month, but in other months we have fewer than 10K rows. I reviewed the data and found that we should partition by LogID.
The reason also comes from the customer: they don't want to change the key of the table.
Here's my solution
CREATE TABLE `ULPAT`.`MasterLog` (
  `LogID` INT(10) NOT NULL AUTO_INCREMENT,
  `LogDate` DATETIME NULL,
  `AssessorName` VARCHAR(255) NULL,
  `TblName` VARCHAR(100) NULL,
  PRIMARY KEY (`LogID`),
  INDEX `LogID` (`LogID`)
)
ENGINE = INNODB
PARTITION BY HASH(mod(ceiling(LogID*0.0000005), 400))
PARTITIONS 400;
I think this is not the best solution, but it works for me.
Thanks
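For comparison, a minimal sketch of RANGE partitioning on LogID, which gives the fixed 100K-row buckets the question asked about (partition names and boundaries are illustrative); the trade-off is that new partitions must be split off as LogID grows:

CREATE TABLE MasterLog (
  LogID INT(10) NOT NULL AUTO_INCREMENT,
  LogDate DATETIME NULL,
  AssessorName VARCHAR(255) NULL,
  TblName VARCHAR(100) NULL,
  PRIMARY KEY (LogID)
)
ENGINE = INNODB
PARTITION BY RANGE (LogID) (
  PARTITION p0 VALUES LESS THAN (100000),
  PARTITION p1 VALUES LESS THAN (200000),
  PARTITION p2 VALUES LESS THAN (300000),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);

-- When pmax starts filling up, split the next bucket off of it:
ALTER TABLE MasterLog
REORGANIZE PARTITION pmax INTO (
  PARTITION p3 VALUES LESS THAN (400000),
  PARTITION pmax VALUES LESS THAN MAXVALUE
);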

What is the meaning of AUTO_INCREMENT=20018215 here in table schema

CREATE TABLE `tblspmaster` (
  `CSN` bigint(20) NOT NULL AUTO_INCREMENT,
  `SP` varchar(50) NOT NULL,
  `FileImportedDate` date NOT NULL,
  `AMZFileName` varchar(580) NOT NULL,
  `CasperBatch` varchar(50) NOT NULL,
  `BatchProcessedDate` date NOT NULL,
  `ExpiryDate` date NOT NULL,
  `Region` varchar(50) NOT NULL,
  `FCCity` varchar(50) NOT NULL,
  `VendorID` int(11) NOT NULL,
  `LocationID` int(11) NOT NULL,
  PRIMARY KEY (`CSN`)
) ENGINE=InnoDB AUTO_INCREMENT=20018215 DEFAULT CHARSET=latin1;
What is the meaning of AUTO_INCREMENT=20018215 here in the table schema? As I insert 500k records, the identity values are fine from 1 to 500k, but when I try to insert the next 500k records, the next record's identity column value is 524281 instead of 500001.
It means that the first auto-assigned value (to CSN) will be 20018215
The large initial value, 20018215, was probably the current value of the auto-increment counter when you did a "Send to SQL Editor" -> "Create Statement" menu selection in MySQL Workbench. It is a safe value to skip over existing data in case you have to reimport the previous records.
I had the same question, but after generating several "Create" edit templates from known tables, I noticed the AUTO_INCREMENT value corresponded to the quantity of existing records in those tables. I removed the large values from my templates since I want my new tables to begin with a primary key = 1.
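As for the jump from 500000 to 524281, that is likely a separate effect: InnoDB can reserve auto-increment values for multi-row inserts in power-of-two chunks, and reserved-but-unused values are simply lost, leaving gaps. If you want a new table to start at 1, drop the clause from the generated DDL or reset the counter explicitly (a minimal sketch using the question's table):

-- The current counter shows up in the generated DDL:
SHOW CREATE TABLE tblspmaster;

-- Reset it; InnoDB silently raises this to MAX(CSN) + 1
-- if the table already holds larger values.
ALTER TABLE tblspmaster AUTO_INCREMENT = 1;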

Over-Indexed a MySQL table. How can I remediate?

I have a table with 3 million rows and 6 columns. The problem is that my mysqld server would not return output for any query; it would simply time out.
I then read here that over-indexing can involve too much swapping of data between memory and disk, which can slow the server down.
So I ran a query ALTER TABLE <Tbl_name> DROP INDEX <Index_name>;. This query has been running for 10 hours and has not completed yet.
Is this expected to run for so long?
Is there a better way to drop/alter my indices?
edit - Added SHOW CREATE TABLE output
CREATE TABLE `sample` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `FiMD5` varchar(32) NOT NULL,
  `NoMD5` varchar(32) NOT NULL,
  `SeMD5` varchar(32) NOT NULL,
  `SeesMD5` varchar(32) NOT NULL,
  `ImMD5` varchar(32) NOT NULL,
  `Ovlay` tinyint(1) NOT NULL DEFAULT '1',
  PRIMARY KEY (`ID`),
  KEY `FiMD5_3` (`FiMD5`),
  KEY `ID` (`ID`),
  KEY `ID_2` (`ID`),
  KEY `pIndex` (`FiMD5`),
  KEY `FiMD5_` (`FiMD5`,`NoMD5`)
) ENGINE=InnoDB AUTO_INCREMENT=3073630 DEFAULT CHARSET=latin1;
Perhaps doing the following would be faster (a sketch of the whole sequence follows):
Use SELECT ... INTO OUTFILE first.
Use TRUNCATE TABLE to delete everything.
Modify the table.
Use LOAD DATA INFILE to restore the data.
If step 2 takes too long, perhaps drop the table and recreate it instead.
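A minimal sketch of that sequence against the sample table (the dump path is illustrative, and the choice of which indexes to drop is my reading of the duplicates in the DDL):

-- 1. Dump the rows.
SELECT * INTO OUTFILE '/tmp/sample.txt' FROM sample;

-- 2. Empty the table; TRUNCATE recreates it, so it is fast.
TRUNCATE TABLE sample;

-- 3. Drop the redundant indexes while the table is empty (near-instant).
ALTER TABLE sample
  DROP INDEX `ID`,       -- duplicates the PRIMARY KEY
  DROP INDEX `ID_2`,     -- another duplicate of the PRIMARY KEY
  DROP INDEX `pIndex`,   -- duplicates FiMD5_3
  DROP INDEX `FiMD5_3`;  -- left prefix of FiMD5_ (FiMD5, NoMD5)

-- 4. Reload the data.
LOAD DATA INFILE '/tmp/sample.txt' INTO TABLE sample;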

How can you speed up making changes to large tables (200k+ rows) in mysql databases?

I have a table in my MySQL database that I constantly need to alter and insert rows into, but it keeps running slowly when I make changes, which is difficult because there are over 200k entries. I tested another table which has very few rows and it moves quickly, so it's not the server or the database itself; it's that particular table that has a tough time. I need all of the table's rows and cannot find a solution to get around the load issues.
DROP TABLE IF EXISTS `articles`;
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `articles` (
  `id` int(11) NOT NULL auto_increment,
  `content` text NOT NULL,
  `author` varchar(255) NOT NULL,
  `alias` varchar(255) NOT NULL,
  `topic` varchar(255) NOT NULL,
  `subtopics` varchar(255) NOT NULL,
  `keywords` text NOT NULL,
  `submitdate` timestamp NOT NULL default CURRENT_TIMESTAMP,
  `date` varchar(255) NOT NULL,
  `day` varchar(255) NOT NULL,
  `month` varchar(255) NOT NULL,
  `year` varchar(255) NOT NULL,
  `time` varchar(255) NOT NULL,
  `ampm` varchar(255) NOT NULL,
  `ip` varchar(255) NOT NULL,
  `score_up` int(11) NOT NULL default '0',
  `score_down` int(11) NOT NULL default '0',
  `total_score` int(11) NOT NULL default '0',
  `approved` varchar(255) NOT NULL,
  `visible` varchar(255) NOT NULL,
  `searchable` varchar(255) NOT NULL,
  `addedby` varchar(255) NOT NULL,
  `keyword_added` varchar(255) NOT NULL,
  `topic_added` varchar(255) NOT NULL,
  PRIMARY KEY (`id`),
  KEY `score_up` (`score_up`),
  KEY `score_down` (`score_down`),
  FULLTEXT KEY `SEARCH` (`content`),
  FULLTEXT KEY `asearch` (`author`),
  FULLTEXT KEY `topic` (`topic`),
  FULLTEXT KEY `keywords` (`content`,`keywords`,`topic`,`author`),
  FULLTEXT KEY `content` (`content`,`keywords`),
  FULLTEXT KEY `new` (`keywords`),
  FULLTEXT KEY `author` (`author`)
) ENGINE=MyISAM AUTO_INCREMENT=290823 DEFAULT CHARSET=latin1;
/*!40101 SET character_set_client = @saved_cs_client */;
With indexes it depends:
more indexes = faster selects, slower inserts
fewer indexes = slower selects, faster inserts
That is because the indexes have to be updated on every insert, and the more data there is in the table, the more work MySQL has to do to maintain them.
So maybe you could remove indexes you don't need; that should speed up your inserts.
Another option is to partition your table into several smaller ones; that removes the bottleneck.
Try to apply the changes in an update script. Recreating tables is slow; instead, update only the rows where changes have been made.
For example, collect all the changes in the program in a variable and insert them with a single query. That should be fast enough for most programs. But as we all know, speed depends on how much data is processed.
Let me know if you need anything else.
This may or may not help you directly, but I notice that you have a lot of VARCHAR(255) columns in your table. Some of them seem like they might be totally unnecessary (do you really need all those date / day / month / year / time / ampm columns?) and many could be replaced by more compact datatypes:
Dates could be stored as a DATETIME (or TIMESTAMP).
IP addresses could be stored as INTEGERs, or as BINARY(16) for IPv6.
Instead of storing usernames in the article table, you should create a separate user table and reference it using INTEGER keys.
I don't know what the approved, visible and searchable fields are, but I bet they don't need to be VARCHAR(255)s.
I'd also second Adrian Cornish's suggestion to split your table. In particular, you really want to keep frequently changing and frequently accessed metadata, such as up/down vote scores, separate from rarely changing and infrequently accessed bulk data like article content. See for example http://20bits.com/articles/10-tips-for-optimizing-mysql-queries-that-dont-suck/
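A minimal sketch of such a split (these table and column names are invented for illustration): the hot vote counters live in a small row that is cheap to update, while the bulk text sits in its own table keyed by the same id:

-- Hot, frequently updated counters:
CREATE TABLE article_scores (
  article_id INT(11) NOT NULL,
  score_up INT(11) NOT NULL DEFAULT 0,
  score_down INT(11) NOT NULL DEFAULT 0,
  total_score INT(11) NOT NULL DEFAULT 0,
  PRIMARY KEY (article_id)
) ENGINE=InnoDB;

-- Rarely changing bulk data, kept out of the hot table:
CREATE TABLE article_content (
  article_id INT(11) NOT NULL,
  content TEXT NOT NULL,
  keywords TEXT NOT NULL,
  PRIMARY KEY (article_id)
) ENGINE=InnoDB;

-- A vote now touches only the small counter row:
UPDATE article_scores
SET score_up = score_up + 1, total_score = total_score + 1
WHERE article_id = 12345;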
"I have a table inside of my mysql database which I constantly need to alter and insert rows into but it continues"
Try InnoDB on this table if your application performs a lot of concurrent updates and inserts; InnoDB's row-level locking pays off there.
I recommend splitting that "big table" (not actually that big, but it may be for MySQL) into several tables to make the most of the query cache. Any time you update a record in that table, every query cache entry for the table is invalidated. You can also try reducing the isolation level, but that is a little more complicated.
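A minimal sketch of those last two suggestions (whether either helps depends on the workload):

-- Switch the engine to get row-level locking. This rebuilds the table,
-- so run it in a quiet window; note that before MySQL 5.6 InnoDB has no
-- FULLTEXT support, so the FULLTEXT keys would have to be dropped first.
ALTER TABLE articles ENGINE=InnoDB;

-- Lower the isolation level for a session doing heavy writes:
SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;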