How to optimize a table in MySQL? - mysql

I have one database containing 100 tables, and three of those tables have grown to between 3 GB and 8 GB. What can I do to reduce their size?
I have run the OPTIMIZE TABLE command; it works fine, but the size does not decrease.
I have also run the Percona Toolkit commands.
The commands work fine and complete successfully, but
the table size stays the same; there is no effect on table size.
What can I do to solve this issue?
How can I optimize a very large table with pt-online-schema-change?
My table is 10 GB; how can I decrease its size?

If the data in your tables is already well organized on disk, optimize table won't help you.
You can:
delete old data
get rid of secondary indexes that you don't need
turn on compression for your tables (see the sketch after this list)
partition your tables and move certain partitions to other storage
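For illustration, here is a rough SQL sketch of the first three options; big_table, created_at, and idx_unused are placeholder names, and the right statements depend on your schema:
-- Remove old data (rebuilding the table afterwards can shrink the file if innodb_file_per_table is on):
DELETE FROM big_table WHERE created_at < NOW() - INTERVAL 2 YEAR;
-- Drop a secondary index that no query needs:
ALTER TABLE big_table DROP INDEX idx_unused;
-- Turn on InnoDB table compression (requires innodb_file_per_table):
ALTER TABLE big_table ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;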

How to shrink disk footprint?
Don't use INT (4 bytes) when MEDIUMINT (3 bytes) will suffice.
Don't use MEDIUMINT when SMALLINT (2 bytes) will suffice. Etc.
Don't use utf8 for hex strings.
Normalize repeated values. (Within reason)
Compress TEXT fields in the client, then use BLOB.
Do not PARTITION unless there is a valid reason for it; it is likely to make the footprint bigger, perhaps significantly bigger.
INDEX(a) is useless if you have INDEX(a,b)
OPTIMIZE TABLE is likely to be a waste of time. (As you found out.)
pt-online-schema-change (or the online DDL in MySQL 5.6) helps you make certain changes without locking the table for a long time.
You have not even mentioned the basics -- MyISAM or InnoDB. (Some techniques differ.) Windows or Unix? (Some tools do not work on Windows.)
Let's see SHOW CREATE TABLE; we can probably come up with more suggestions.
Why is 10GB scaring you? Many systems deal with 100GB. 1TB gets a bit scary.
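To make a couple of those points concrete, here is a hedged sketch; the table and column names are hypothetical and the right choices depend on your data:
-- Shrink an integer column whose values never exceed roughly 16.7 million:
ALTER TABLE t MODIFY user_id MEDIUMINT UNSIGNED NOT NULL;
-- Store a 32-character hex string as 16 raw bytes instead of utf8 text (write with UNHEX(), read with HEX()):
ALTER TABLE t MODIFY md5_hash BINARY(16) NOT NULL;
-- INDEX(a) is redundant once INDEX(a,b) exists:
ALTER TABLE t DROP INDEX idx_a;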

Related

Is it a good idea to distribute the records of one table into multiple tables having a similar table structure

I have a table with records 62 Million.
Table structure: 52 columns
Storage engine: InnoDB
Collation: utf8_general_ci
SELECT - Maximum number of operations performed
INSERT - Always in bulk, but not all that frequent.
UPDATE - Usually very few operations, though sometimes many and sometimes none at all.
Since we are fetching in real time almost always: is it a good idea to distribute the records from this one big table, by some logic, into multiple similar tables, in order to make SELECTs considerably faster?
MYSQL Version: mysql Ver 15.1 Distrib 10.2.33-MariaDB
That technique is almost guaranteed to make it slower.
Provide CREATE TABLE and the important queries.
Often a SELECT can be sped up by a composite index and/or a reformulation.
62M rows is above average, but not a scary size.
"INSERT - Always in bulk" -- Let's see your technique; there may be a way to speed it up further.
"Archiving" old data -- Actually removing the data may help some. Keeping it around, but using suitable indexes is usually fine. We need to see your queries and schema.
"Sharding" is putting parts of the data in separate servers. This is handy when you have exhausted the write capacity of a single machine.
"Replication" (Primary + Replica) allows shifting reads to another server, thereby spreading the load. With this technique, you system can handle a virtually unlimited number of SELECTs.
"Yes, indexes have been implemented" -- That may mean that you have one index per column. This is almost always not optimal.
"128GB RAM" -- If the entire dataset is much smaller than that, then most of the RAM is going unused.
"Query design" -- I had one example of a CPU that was pegged at 100%. After making the following change, the CPU dropped to 1%:
SELECT ... WHERE DATE(dt) = CURDATE();
-->
SELECT ... WHERE dt >= CURDATE();
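As an aside, a composite index plus a bounded range keeps the rewrite equivalent even when future-dated rows exist; the table, column, and index names below are hypothetical:
-- A composite index for a query that filters on status and a datetime column:
ALTER TABLE orders ADD INDEX idx_status_dt (status, dt);
-- Sargable, and still equivalent to DATE(dt) = CURDATE() even if future-dated rows exist:
SELECT ... WHERE status = 'open'
           AND dt >= CURDATE()
           AND dt <  CURDATE() + INTERVAL 1 DAY;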

How to compress columns in MySQL?

I have a table which stores e-mail correspondence. Every time someone replies, the whole body of the trail is also included and saved into the database (and I need it that way, because the amount of application-level changes needed to rectify that would be too high).
The size of the mail text column is 10000, but I am having trouble storing text longer than that. As I am not sure how many replies a correspondence can accumulate, I don't know what a good size for the column would be.
The engine is InnoDB. Can I use some kind of columnar compression technique in MySQL to avoid increasing the size of the column?
And what if I go ahead and increase the VARCHAR column to, say, 20000? The table has about 2 million records. Would that be a good thing to do?
You are probably looking for the MySQL COMPRESS() and UNCOMPRESS() functions, to compress data for storage and retrieval respectively.
Also look at InnoDB Compression Usage.
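A minimal sketch of those functions; the table and column names are hypothetical, and note that the compressed value must go into a binary column such as BLOB or VARBINARY:
-- Store the compressed body in a BLOB column:
INSERT INTO mail (id, body_compressed) VALUES (1, COMPRESS(@mail_body));
-- Read it back:
SELECT UNCOMPRESS(body_compressed) AS body FROM mail WHERE id = 1;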
As long as the data doesn't need editing, you can use the ARCHIVE engine.
This answer is specific to Percona
Percona introduced a compressed column format a while ago, which you can use in CREATE or ALTER statements:
CREATE TABLE test_compressed (
id INT NOT NULL PRIMARY KEY,
value MEDIUMTEXT COLUMN_FORMAT COMPRESSED
);
Reference: https://www.percona.com/doc/percona-server/5.7/flexibility/compressed_columns.html
For me the best way to compress text data is to use Percona's compressed column format.
ALTER TABLE `tableName` MODIFY `mail` TEXT COLUMN_FORMAT COMPRESSED NOT NULL;
I've tested compression on a table used as a cache, storing mainly HTML data; the size decreased from 620 MB to 110.6 MB.
I think you should consider using TEXT type instead of long VARCHAR.
Data fields are stored separately from the InnoDB clustered index, and that can affect, and probably improve, the performance of your database.
You have a few different options:
Wait for the RFE to add column compression to MySQL (see https://bugs.mysql.com/bug.php?id=106541) - unlikely this will ever be done
Use application level compression and decompression - much more work involved in doing this
Rely on MySQL's compress and uncompress functions to do this for you (see https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_compress) - these are not reliable as they depend on how MySQL was compiled (zlib or not) - and they don't give great results a lot of the time
Don't worry about the file size as disk space is cheap and simply change the column type to TEXT (see https://dev.mysql.com/doc/refman/8.0/en/blob.html)
Often the best option if disk space is your main concern is changing the table to be compressed using: ALTER TABLE t1 ROW_FORMAT = COMPRESSED; - for emails this can give very good compression and if need be it can be tuned for even better compression for your particular workload (see https://dev.mysql.com/doc/refman/8.0/en/innodb-compression-tuning.html)
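For reference, a sketch of that last option; the table name is a placeholder, and KEY_BLOCK_SIZE is an optional tuning knob (smaller values compress harder at the cost of CPU):
-- Requires innodb_file_per_table:
ALTER TABLE emails ROW_FORMAT = COMPRESSED;
-- Or pick the compressed page size explicitly:
ALTER TABLE emails ROW_FORMAT = COMPRESSED KEY_BLOCK_SIZE = 4;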

MySQL Memory Engine vs InnoDB on RAMdisk

I'm writing a bit of software that needs to flatten data from a hierarchical type of format into tabular format. Instead of doing it all in a programming language every time and serving it up, I want to cache the results for a few seconds, and use SQL to sort and filter. When in use, we're talking 400,000 writes and 1 or 2 reads over the course of those few seconds.
Each table will contain 3 to 15 columns. Each row will contain from 100 bytes to 2,000 bytes of data, although it's possible that in some cases, some rows may get up to 15,000 bytes. I can clip data if necessary to keep things sane.
The main options I'm considering are:
MySQL's Memory engine
A good option, almost specifically written for my use case! But.. "MEMORY tables use a fixed-length row-storage format. Variable-length types such as VARCHAR are stored using a fixed length. MEMORY tables cannot contain BLOB or TEXT columns." - Unfortunately, I do have text fields with a length up to maybe 10,000 characters - and even that is a number that is not specifically limited. I could adjust the varchar length based on the max length of text columns as I loop through doing my flattening, but that's not totally elegant. Also, for my occasional 15,000 character row, does that mean I need to allocate 15,000 characters for every row in the database? If there was 100,000 rows, that's 1.3 gb not including overhead!
InnoDB on RAMDisk
This is meant to run on the cloud, and I could easily spin up a server with 16gb of ram, configure MySQL to write to tmpfs and use full featured MySQL. My concern for this is space. While I'm sure engineers have written the memory engine to prevent consuming all temp storage and crashing the server, I doubt this solution would know when to stop. How much actual space will my 2,000 bytes of data consume when in database format? How can I monitor it?
Bonus Questions
Indexes
I will in fact know in advance which columns need to be filtered and sorted by. I could set up an index before I do inserts, but what kind of performance gain could I honestly expect on top of a RAM disk? How much extra overhead do indexes add?
Inserts
I'm assuming inserting multiple rows with one query is faster. But that one query, or series of large queries, is itself held in memory, and we're writing to memory, so if I did that I'd momentarily need double the memory. So then we talk about doing one or two or a hundred at a time, and having to wait for that to complete before processing more. InnoDB doesn't lock the table, but I worry about sending two queries too close to each other and confusing MySQL. Is this a valid concern? With the MEMORY engine I'd definitely have to wait for completion, due to table locks.
Temporary
Are there any benefits to temporary tables other than the fact that they're deleted when the db connection closes?
I suggest you use MyISAM. Create your table with appropriate indexes for your query. Then disable keys, load the table, and enable keys.
I suggest you develop a discipline like this for your system. I've used a similar discipline very effectively.
Keep two copies of the table. Call one table_active and the second one table_loading.
When it's time to load a new copy of your data, use commands like this.
ALTER TABLE table_loading DISABLE KEYS;
/* do your insertions here, to table_loading */
/* consider using LOAD DATA INFILE if it makes sense. */
ALTER TABLE table_loading ENABLE KEYS; /* this will take a while */
/* at this point, suspend your software that's reading table_active */
RENAME TABLE table_active TO table_old;
RENAME TABLE table_loading TO table_active;
/* now you can resume running your software */
TRUNCATE TABLE table_old;
RENAME TABLE table_old TO table_loading;
Alternatively, you can DROP TABLE table_old; and create a new table for table_loading instead of the last rename.
This two-table (double-buffered) strategy should work pretty well. It will create some latency because your software that's reading the table will work on an old copy. But you'll avoid reading from an incompletely loaded table.
I suggest MyISAM because you won't run out of RAM and blow up, and you avoid both the fixed-row-length overhead and the transaction overhead. But you might also consider MariaDB and the Aria storage engine, which does a good job of exploiting RAM buffers.
If you do use the MEMORY storage engine, be sure to tweak your max_heap_table_size system variable. If your read queries will use index range scans (sequential index access) be sure to specify BTREE style indexes. See here: http://dev.mysql.com/doc/refman/5.1/en/memory-storage-engine.html
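A hedged sketch of those two points; the table, columns, and the size value are illustrative assumptions only:
-- Allow larger MEMORY tables in this session (1 GB here, purely as an example):
SET SESSION max_heap_table_size = 1073741824;
-- Hypothetical flattened cache table; note VARCHAR is still stored at fixed length in MEMORY:
CREATE TABLE flat_cache (
  id INT NOT NULL,
  sort_key INT NOT NULL,
  payload VARCHAR(2000),
  PRIMARY KEY (id),
  INDEX idx_sort (sort_key) USING BTREE  -- BTREE for range scans; MEMORY defaults to HASH
) ENGINE = MEMORY;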

Is InnoDB (MySQL 5.5.8) the right choice for multi-billion rows?

So, one of my tables in MySQL, which uses the InnoDB storage engine, will contain multi-billion rows (with potentially no limit to how many will be inserted).
Can you tell me what sort of optimizations I can do to help speed things up?
Because with just a few million rows already, it starts getting slow.
Of course, you could suggest using something else. The only options I have are PostgreSQL and SQLite3, but I've been told that SQLite3 is not a good choice for that.
As for PostgreSQL, I have absolutely no idea how it performs, as I've never used it.
I expect, though, at least about 1000-1500 inserts per second into that table.
A simple answer to your question would be: yes, InnoDB would be the perfect choice for a multi-billion-row data set.
There is a host of optimizations that are possible.
The most obvious optimization would be setting a large buffer pool, as the buffer pool is the single most important thing when it comes to InnoDB, because InnoDB buffers the data as well as the indexes in the buffer pool. If you have a dedicated MySQL server with only InnoDB tables, then you should set up to 80% of the available RAM to be used by InnoDB.
Another very important optimization is having proper indexes on the table (keeping in mind the data access/update pattern), both primary and secondary. (Remember that the primary key columns are automatically appended to every secondary index.)
With InnoDB there are some extra goodies, such as protection from data corruption, auto-recovery etc.
As for increasing write performance, you should set up your transaction (redo) log files to total up to 4G.
One other thing that you can do is partition the table.
You can eke out more performance by setting binlog_format to "row", and setting innodb_autoinc_lock_mode to 2 (that will ensure that InnoDB does not hold table-level locks when inserting into auto-increment columns).
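As a rough illustration, here is a my.cnf-style sketch of those settings; the sizes are assumptions, not recommendations, and changing the redo log size or innodb_autoinc_lock_mode requires a server restart:
[mysqld]
innodb_buffer_pool_size  = 8G    # ~80% of RAM on a dedicated InnoDB server
innodb_log_file_size     = 2G    # with 2 log files this gives ~4G of redo log in total
binlog_format            = ROW
innodb_autoinc_lock_mode = 2     # interleaved mode; no table-level AUTO_INCREMENT locks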
If you need any specific advice you can contact me, I would be more than willing to help.
Optimizations:
Take care not to have too many indexes. They are expensive when inserting.
Make your datatypes fit your data as tightly as you can (so don't go saving IP addresses in a TEXT or a BLOB, if you know what I mean; see the sketch after this list). Look into VARCHAR vs CHAR. Don't forget that because VARCHAR is more flexible, you are trading some things away. If you know a lot about your data it might help to use CHAR, or it might be clearly better to use VARCHAR, etc.
Do you read at all from this table? If so, you might want to do all the reading from a replicated slave, although your connection should be good enough for that amount of data.
If you have big inserts (aside from the number of inserts), make sure your IO is actually quick enough to handle the load.
I don't think there is any reason MySQL wouldn't support this. Things that can slow you down from "thousands" to "millions" to "billions" are stuff like the aforementioned indexes. There is, as far as I know, no "MySQL is full" problem.
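To illustrate the datatype point with the IP-address example (the table and column names are hypothetical):
-- An IPv4 address fits in 4 bytes instead of a 15-character string:
ALTER TABLE access_log MODIFY ip INT UNSIGNED NOT NULL;
-- Write with INET_ATON('192.168.1.10') and read back with INET_NTOA(ip).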
Look into partial indexes. From Wikipedia (the quickest source I could find; I didn't check the references, but I'm sure you can manage):
MySQL as of version 5.4 does not support partial indexes.[3] In MySQL, the term "partial index" is sometimes used to refer to prefix indexes, where only a truncated prefix of each value is stored in the index. This is another technique for reducing index size.[4]
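For reference, a prefix index in MySQL looks like this (table and column names are hypothetical):
-- Index only the first 10 characters of a long string column:
ALTER TABLE t ADD INDEX idx_name_prefix (name(10));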
No idea on the MySQL/InnoDB part (I'd assume it'll cope). But if you end up looking at alternatives, PostgreSQL can manage a DB of unlimited size on paper. (At least one 32TB database exists according to the FAQ.)
Can you tell me what sort of optimizations i can do to help speed up things?
Your mileage will vary depending on your application. But with billions of rows, you're at least looking into partitioning your data, in order to work on smaller tables.
In the case of PostgreSQL, you'd also look into creating partial indexes where appropriate.
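On the MySQL side, a minimal range-partitioning sketch might look like this; the table is hypothetical, and remember that the partitioning column must be part of every unique key, including the primary key:
CREATE TABLE events (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  created DATE NOT NULL,
  payload VARBINARY(255),
  PRIMARY KEY (id, created)  -- partition column included in the PK
) ENGINE = InnoDB
PARTITION BY RANGE (YEAR(created)) (
  PARTITION p2010 VALUES LESS THAN (2011),
  PARTITION p2011 VALUES LESS THAN (2012),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);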
You may want to have a look at:
http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-could-be-slow-with-large-tables/
http://forums.whirlpool.net.au/archive/954126
If you have a very large table (billions of records) and need to data-mine it (queries that read lots of data), MySQL can slow to a crawl.
Large databases (200+ GB) are fine, but they are bound by I/O, temp tables spilling to disk, and multiple other issues when attempting to read large groups of rows that don't fit in memory.

MySQL ALTER TABLE on very large table - is it safe to run it?

I have a MySQL database with a MyISAM table with 4 million rows. I update this table about once a week with about 2000 new rows. After updating, I then alter the table like this:
ALTER TABLE x ORDER BY PK DESC
I order the table by the primary key field in descending order. This has not given me any problems on my development machine (Windows with 3 GB memory). I have tried it successfully three times on the production Linux server (with 512 MB RAM, getting the sorted table in about 6 minutes each time), but the last time I tried it I had to stop the query after about 30 minutes and rebuild the database from a backup.
Can a 512MB server cope with that alter statement on such a large table? I have read that a temporary table is created to perform the ALTER TABLE command.
Question: Can this alter command be safely run? What should be the expected time for the alteration of the table?
As I have just read, the ALTER TABLE ... ORDER BY ... query is useful to improve performance in certain scenarios. I am surprised that the PK index does not help with this. But, from the MySQL docs, it seems that InnoDB does use the index. However, InnoDB tends to be slower than MyISAM. That said, with InnoDB you wouldn't need to re-order the table, but you would lose the blazing speed of MyISAM. It still may be worth a shot.
The way you explain the problems, it seems that there is too much data loaded into memory (maybe there is even swapping going on?). You could easily check that by monitoring your memory usage. It's hard to say, as I do not know MySQL all that well.
On the other hand, I think your problem lies in a very different place: you are using a machine with only 512 MB of RAM as a database server, with a table containing more than 4 million rows, and you are performing a very memory-heavy operation on the whole table on that machine. It seems that 512 MB will not nearly be enough for that.
A much more fundamental issue I am seeing here: you are doing development (and quite likely testing as well) in an environment that is very different from the production environment. The kind of problem you are describing is to be expected. Your development machine has six times as much memory as your production machine, and I can safely say that its processor is much faster as well. In that case, I suggest you create a virtual machine mimicking your production site. That way you can easily test your project without disrupting the production site.
What you're asking it to do is rebuild the entire table and all its indexes; this is an expensive operation particularly if the data doesn't fit in ram. It will complete, but it will be vastly slower if the data doesn't fit in ram, particularly if you have lots of indexes.
I question your judgement when choosing to run a machine with such tiny memory in production. Anyway:
Is this ALTER TABLE really necessary; what specific query are you trying to speed up, and have you tried it without?
Have you considered making your development machine more like production? I mean, using a dev box with MORE memory than production is never a good idea, and using a different OS definitely isn't either.
There is probably also some tuning you can do to try to help; it largely depends on your schema (indexes in particular). 4M rows is not very many (for a machine with normal amounts of ram).
Is the primary key AUTO_INCREMENT? If so, then doing ALTER TABLE ... ORDER BY isn't going to improve anything, since everything will be inserted in order anyway (unless you have lots of deletes).
I'd probably create a view instead that is ordered by the PK value, so that, for one thing, you don't need to lock up that huge table while the ALTER is being performed.
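A sketch of that view idea, using the table and key from the question (the view name is made up); note that MySQL may ignore a view's ORDER BY when the outer query supplies its own:
CREATE VIEW x_sorted AS
SELECT * FROM x ORDER BY PK DESC;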
If you're using InnoDB, you shouldn't have to explicitly perform the ORDER BY either post-insert or at query time. According to the MySQL 5.0 manual, InnoDB already defaults to primary key ordering for query results:
http://dev.mysql.com/doc/refman/5.0/en/alter-table.html#id4052480
MyISAM tables, by contrast, return records in insertion order by default, which may work just as well if you only ever append to the table rather than using UPDATE queries to modify rows in place.