High overhead in new MySQL table

Does anyone know why I get an overhead of 131.0 MiB on a newly created table (zero rows)?
I'm using phpMyAdmin, and the code of my script is:
CREATE TABLE IF NOT EXISTS `mydb`.`mytable` (
`idRol` INT NOT NULL AUTO_INCREMENT ,
`Rol` VARCHAR(45) NOT NULL ,
PRIMARY KEY (`idRol`) )
ENGINE = InnoDB;
Thanks in advance.

InnoDB uses a shared tablespace. That means that by default all tables, regardless of database, are stored in a single file in the filesystem. This differs from, for example, MyISAM, which stores each table in its own files.
The behaviour of InnoDB can be changed, although I don't think it's really necessary in this case. See Using Per-Table Tablespaces.
The overhead is probably the space left by deleted rows, and InnoDB will reuse it when you insert new data. It's nothing to be concerned about.

It might be because MySQL generated an index on 'idRol'.
Storing an index takes some space, but I am not sure if this is the reason. It's only a guess. I'm not a DBA.

Related

MySQL ADD COLUMN slow under AWS RDS

I have an RDS MySQL instance with the following settings:
Class: db.m5.xlarge
Storage: Provisioned 1000 IOPS (SSD)
I then want to add a few columns to a table that is about 20 GB in size (according to INFORMATION_SCHEMA.files). Here's my statement:
ALTER TABLE MY_TABLE
ADD COLUMN NEW_COLUMN_1 DECIMAL(39, 30) NULL,
ADD COLUMN NEW_COLUMN_2 DECIMAL(39, 30) NULL,
ADD COLUMN NEW_COLUMN_3 INT(10) UNSIGNED NULL,
ADD CONSTRAINT SOME_CONSTRAINT FOREIGN KEY (NEW_COLUMN_3) REFERENCES SOME_OTHER_TABLE(SOME_OTHER_PK),
ADD COLUMN NEW_COLUMN_4 DATE NULL;
This query took 172 minutes to execute. Most of this time was spent copying the data to a temporary table.
During that operation, there were no other queries (read or write) being executed. I had the database just for myself. SHOW FULL PROCESSLIST was saying that State was equal to copy to tmp table for my query.
What I don't understand is that the AWS RDS Console tells me that the write throughput was between 30 MB/s and 35 MB/s for 172 minutes.
Assuming a write throughput of 30 MB/s, I should have been able to write 30 * 60 * 172 = 309600 MB = 302 GB. This is much bigger than the size of the temporary table that was created during the operation (20 GB).
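A quick sketch to double-check that arithmetic and express it as a ratio. The figures are the ones quoted above; the variable names are just for illustration, and 1 GB is taken as 1024 MB, as in the calculation above.

```python
# Figures from the question; variable names are illustrative only.
throughput_mb_per_s = 30   # lower bound shown by the RDS console
duration_min = 172         # how long the ALTER TABLE ran
table_size_gb = 20         # size of the table being copied

written_mb = throughput_mb_per_s * 60 * duration_min
written_gb = written_mb / 1024              # 1 GB = 1024 MB
amplification = written_gb / table_size_gb  # bytes written per byte of table

print(f"written: {written_mb} MB, about {written_gb:.0f} GB")
print(f"roughly {amplification:.0f}x the size of the table")
```

So the instance wrote roughly 15 times the size of the table itself.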
So two questions:
What is MySQL/RDS writing besides my temp table? Is there a way to disable that so that I can get the full bandwidth to create the temp table?
Is there any way to accelerate that operation? Taking 3 hours to write 20 GB of data seems pretty long.
I was using MySQL 5.7. According to this MySQL blog post, version 8.0 improved the situation: "InnoDB now supports Instant ADD COLUMN".
I therefore changed my query to use the new feature.
-- Completes in 0.375 seconds!
ALTER TABLE MY_TABLE
ADD COLUMN NEW_COLUMN_1 DECIMAL(39, 30) NULL,
ADD COLUMN NEW_COLUMN_2 DECIMAL(39, 30) NULL,
ADD COLUMN NEW_COLUMN_3 INT(10) UNSIGNED NULL,
-- 'ALGORITHM=INSTANT' is not compatible with foreign keys.
-- The foreign key will need to be added in another statement
-- ADD CONSTRAINT SOME_CONSTRAINT FOREIGN KEY (NEW_COLUMN_3) REFERENCES SOME_OTHER_TABLE(SOME_OTHER_PK),
ADD COLUMN NEW_COLUMN_4 DATE NULL,
-- the new option
ALGORITHM=INSTANT;
-- This completed in about 6 minutes.
-- Adding the foreign key creates an index under the hood.
-- This index was 1.5 GB big.
SET FOREIGN_KEY_CHECKS=0;
ALTER TABLE MY_TABLE
ADD FOREIGN KEY (NEW_COLUMN_3) REFERENCES SOME_OTHER_TABLE(SOME_OTHER_PK);
SET FOREIGN_KEY_CHECKS=1;
So my conclusions:
Upgrade to MySQL 8 if you can.
Make sure that you always use (when possible) the ALGORITHM=INSTANT option.
InnoDB is probably the storage engine you are using, since it's the default storage engine. InnoDB does some I/O that might seem redundant, to ensure there is no data loss.
For example:
Data and index pages modified in the buffer pool must be written to the tablespace. The table may need to split some pages during the process of adding columns, because the rows become wider, and fewer rows fit per page.
During writing pages to the tablespace, InnoDB first writes those pages to the doublewrite buffer, to ensure against data loss if there's a crash during a page write.
Transactions are written to the InnoDB redo log, and this may even result in multiple overwrites to the same block in the log.
Transactions are also written to the binary log if it is enabled for purposes of replication. Though this shouldn't be a big cost in the case of an ALTER TABLE statement, because DDL statements are always written to the binary log in statement format, not in row format.
You also asked what can be done to speed up the ALTER TABLE. The reason to want it to run faster is usually because during an ALTER TABLE, the table is locked and may block concurrent queries.
At my company, we use the free tool pt-online-schema-change, so we can continue to use the table more or less freely while it is being altered. It actually takes longer to complete the alter this way, but it's not so inconvenient since it doesn't block our access to the table.

Reclaiming space from a small but heavily used MySQL 8 table

I have a production MySQL 8 server that has a table for user sessions for a PHP application. I am using innodb_file_per_table. The table is small at any given time (about 300-1000 rows), but rows are constantly being deleted and added. Without interference, the sessions.ibd file slowly grows until it takes up all available disk space. This morning, the table had 300 records and took up over 90GB. This built up over the long term (months).
Running OPTIMIZE TABLE reclaims all of the disk space and brings the table back under 100M. An easy solution would be to make a cron script that runs OPTIMIZE TABLE once a week during our maintenance period. Another proposed suggestion is to convert the table to a MyISAM table, since it doesn't really require any of the features of InnoDB. Both of these solutions should be effective, but they are table specific and don't protect against the general problem. I'd like to know whether there is a solution to this problem that involves database configuration.
Here are the non-default innodb configuration options that we're using:
innodb-flush-log-at-trx-commit = 1
innodb-buffer-pool-size = 24G
innodb-log-file-size = 512M
innodb-buffer-pool-instances = 8
Are there other options we should be using so that the sessions.ibd file doesn't continually grow?
Here is the CREATE TABLE for the table:
CREATE TABLE `sessions` (
`id` varchar(255) NOT NULL DEFAULT '',
`data` mediumtext,
`expires` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
In addition to additions and subtractions, the data column is updated often.
MyISAM would have a different problem -- fragmentation. After a delete, there is a hole in the table. The hole is filled in first. Then a link is made to the next piece of the record. Eventually fetching a row would involve jumping around most of the table.
If 300 rows take 100MB, a row averages 333KB? That's rather large. And does that number vary a lot? Do you have a lot of text/blob columns? Do they change often? Or is it just delete and add? Would you care to share SHOW CREATE TABLE?
I can't think how a table could grow by a factor of 900 without having at least one multi-GB row added, then deleted. Perhaps with the schema, I could think of some cause and/or workaround.
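For reference, the two figures above (333KB per row and a factor of 900) fall out of the numbers in the question. This is only a back-of-the-envelope sketch using decimal units; the variable names are made up for illustration.

```python
# Numbers quoted in the question; names are illustrative only.
rows = 300
optimized_mb = 100   # table size right after OPTIMIZE TABLE ("under 100M")
bloated_gb = 90      # sessions.ibd size before OPTIMIZE TABLE

avg_row_kb = optimized_mb * 1000 / rows           # decimal KB per row
growth_factor = bloated_gb * 1000 / optimized_mb  # bloated size vs. optimized size

print(f"about {avg_row_kb:.0f} KB per row")
print(f"file grew by a factor of about {growth_factor:.0f}")
```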

MySQL Table taking up too much space?

I have a raw text file, with the size of 8.1GB.
The input data is very straightforward:
Lab_A (string), Lab_B (string), Distance (float)
I was trying to load the data into a table, using LOAD DATA INFILE, but the drive ran out of space.
The destination table had the following format:
Id (INT), Lab_A (VARCHAR), Lab_B (VARCHAR), Distance (FLOAT).
With a primary key of Id and an index of (Lab_A + Distance).
Create statement below:
CREATE TABLE `warwick_word_suite`.`distances` (
`id` INT NOT NULL AUTO_INCREMENT,
`label1` VARCHAR(45) NOT NULL,
`label2` VARCHAR(45) NOT NULL,
`distance` FLOAT NOT NULL,
PRIMARY KEY (`id`),
INDEX `LABEL_INDEX` (`label1` ASC, `distance` ASC));
The drive had 50GB and ran out of space. Given 10GB reserved for the system, I am assuming the table was requesting more than 32GB.
My question is:
How much do InnoDB tables actually take up, relative to the size of the input data?
Do indexed tables take up a lot more space, compared to identical unindexed tables?
Should I simply order a bigger drive for my database server?
EDIT:
I tracked down the data hog to "ibdata1", stored in /var/lib/mysql. This file is taking up 30.3GB.
Double trouble.
InnoDB takes 2x-3x what the raw data takes. This is a crude approximation; there are many factors.
ibdata1 is the default place to put the table. Having tried to put the table there, that file will not shrink. This can be a problem. It would have been better to have innodb_file_per_table = ON before trying to load the file. Then the table would have gone into a separate .ibd file, and upon failure, that file would have vanished. As it is, you are low on disk space with no simple way to recover it. (Recovery includes dumping all the other InnoDB tables, stopping mysqld, removing ibdata1, restarting, and then reloading the other tables.)
Back to the ultimate problem... How to use the data. First, can we see a sample (a few rows) of the data? There may be some clues. How many rows in the table (or lines in the file)?
This may be a case for loading into MyISAM instead of InnoDB; the size for that table will be closer to 8.1GB, plus two indexes, which may add another 5-10GB. Still unpleasantly tight.
Normalizing the lab names would probably be a big gain. Suppose you have 10K labs and 100M distances (every lab to every other lab). Half of those are redundant? Normalizing lab names would save maybe 50 bytes per row -- perhaps half the space?
Or you could get more disk space.
Ponder which suggestion(s) of the above you want to tackle; then let us know what you still need help with.
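To put rough numbers on the normalization suggestion, here is a small sketch. The 10K labs and the 50 bytes saved per row are the hypothetical figures from the answer above, not measurements, and the variable names are invented for illustration.

```python
# Hypothetical figures from the answer above, not measured values.
labs = 10_000
bytes_saved_per_row = 50   # rough guess at the saving from replacing names with ids

ordered_pairs = labs * labs                # every lab to every other lab, both directions
unordered_pairs = labs * (labs - 1) // 2   # each pair stored once, no self-pairs

saved_gb = ordered_pairs * bytes_saved_per_row / 1e9

print(f"{ordered_pairs:,} rows if both directions are stored")
print(f"{unordered_pairs:,} rows if each pair is stored once")
print(f"about {saved_gb:.0f} GB saved by normalizing the names alone")
```

Under those assumptions, storing each pair once roughly halves the row count, and normalizing the names saves a further few GB.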

What could cause very slow performance of single UPDATEs of an InnoDB table?

I have a table in my web app for storing session data. It's performing badly, and I can't figure out why. Slow query log shows updating a row takes anything from 6 to 60 seconds.
CREATE TABLE `sessions` (
`id` char(40) COLLATE utf8_unicode_ci NOT NULL,
`payload` text COLLATE utf8_unicode_ci NOT NULL,
`last_activity` int(11) unsigned NOT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `session_id_unique` (`id`) USING HASH
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
The PK is a char(40) which stores a unique session hash generated by the framework this project uses (Laravel).
(I'm aware of the redundancy of the PK and unique index, but I've tried all combinations and it doesn't have any impact on performance in my testing. This is the current state of it.)
The table is small - fewer than 200 rows.
A typical query from the slow query log looks like this:
INSERT INTO sessions (id, payload, last_activity)
VALUES ('d195825ddefbc606e9087546d1254e9be97147eb',
'YTo1OntzOjY6Il90b2tlbiI7czo0MDoi...around 700 chars...oiMCI7fX0=',
1405679480)
ON DUPLICATE KEY UPDATE
payload=VALUES(payload), last_activity=VALUES(last_activity);
I've done obvious things like checking the table for corruption. I've tried adding a dedicated PK column as an auto increment int, I've tried without a PK, without the unique index, swapping the text column for a very very large varchar, you name it.
I've tried switching the table to use MyISAM, and it's still slow.
Nothing I do seems to make any difference - the table performs very slowly.
My next thought was the query. This is generated by the framework, but I've tested hacking it out into an UPDATE with an INSERT if that fails. The slowness continued on the UPDATE statement.
I've read a lot of questions about slow INSERT and UPDATE statements, but those usually related to bulk transactions. This is just one insert/update per user per request. The site is not remotely busy, and it's on its own VPS with plenty of resources.
What could be causing the slowness?
This is not an answer but SE comment length is too damn short. So.
What happens if you run an identical INSERT ... ON DUPLICATE KEY UPDATE ... statement directly on the command line? Please try with and without actual usage of the application. The application may be artificially slowing down this UPDATE (for example, in InnoDB a transaction might be opened but committed only after a lot of time has passed. You tested with MyISAM too, which does not support transactions; perhaps in that case an explicit LOCK could account for the same effect. Whether the framework uses this trick I'm not sure; I don't know Laravel). Try to benchmark to see if there is a concurrency effect.
Another question: is this a single server? Or is it a master that replicates to one or more slaves?
Apart from this question, a few observations:
The values for id are hex strings. The column is unicode. This means 3*40 bytes are reserved while only 40 are utilized. This is a waste that will make things inefficient in general. It would be much better to use BINARY or ASCII as character encoding. Better yet, change the id column to the BINARY data type and store the (unhexed) binary value.
A hash as the PK of an InnoDB table will scatter the data across pages. The idea of using an auto_increment PK, or not explicitly declaring a PK at all (this will cause InnoDB to create an auto-increment PK of its own internally), is a good one.
It looks like the payload is base64 encoded. Again the character encoding is specified to be unicode. ASCII or binary (the character encoding, not the data type) is much more appropriate.
The HASH keyword in the unique index on id is meaningless. InnoDB does not implement HASH indexes. Unfortunately MySQL is perfectly silent about this (see http://bugs.mysql.com/bug.php?id=73326).
(While this list does offer angles for improvement, it seems unlikely that the extreme slowness can be fixed with this. There must be something else going on.)
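To illustrate the first observation, here is a small sketch of the sizes involved, using the session id from the question. It runs on the client side only; `bytes.fromhex` stands in for what UNHEX() would produce on the MySQL side, and the BINARY(20) column is a suggestion, not the existing schema.

```python
# Session id copied from the slow-query example in the question.
session_id = "d195825ddefbc606e9087546d1254e9be97147eb"

chars = len(session_id)               # 40 hex characters
utf8_reserved = 3 * chars             # CHAR(40) in MySQL's utf8: up to 3 bytes per char
raw = bytes.fromhex(session_id)       # the unhexed value, as UNHEX() would return it

print(f"{chars} chars, {utf8_reserved} bytes reserved as utf8 CHAR(40)")
print(f"{len(raw)} bytes if stored unhexed, e.g. in a BINARY(20) column")

# The conversion is lossless: hex-encoding the raw bytes gives the id back.
assert raw.hex() == session_id
```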
Frustratingly, the answer in this case was a bad disk. One of the disks in the storage array had gone bad, and so writes were taking forever to complete. Simply that.

Mysql 'Partitioning' vs Splitting data into different tables

We have a mysql table called posts_content.
The structure is as follows :
CREATE TABLE IF NOT EXISTS `posts_content` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`post_id` int(11) NOT NULL,
`forum_id` int(11) NOT NULL,
`content` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=79850 ;
The problem is that the table is getting pretty huge: many gigabytes of data (we have a crawling engine).
We keep inserting data into the table on a daily basis, but we seldom retrieve the data. Now that the table is getting pretty huge, it's getting difficult to handle.
We discussed two possibilities
Use MySQL's partitioning feature to partition the table by forum_id (there are about 50 forum_ids, so there would be about 50 partitions). Note that each partition will itself eventually grow to many gigabytes of data and may even need its own drive.
Create separate tables for each forum_id and split the data like that.
I hope I have clearly explained the problem. What I need to know is which of the above two would be a better solution in the long run. What are the advantages and disadvantages of each?
Thanking you
The difference is that in the first case you leave MySQL to do the sharding, and in the second case you are doing it on your own. MySQL won't scan any shards that do not contain the data; however, if you have a query with WHERE forum_id IN(...), it may need to scan several shards. As far as I remember, in that case the operation is synchronous, i.e. MySQL queries one partition at a time, and you may want to implement it asynchronously. Generally, if you do the partitioning on your own, you are more flexible, but for simple partitioning based on the forum_id, if you query only 1 forum_id at a time, MySQL partitioning is OK.
My advice is to read the MySQL documentation on partitioning, especially the restrictions and limitations section, and then decide.
Although this is an old post, here is a caveat regarding partitioning if your engine is still MyISAM: MySQL 8.0 supports partitioning only for the InnoDB and NDB storage engines. In that case, you have to convert your MyISAM table to InnoDB or NDB, but you need to remove partitioning before converting it, or else the table cannot be used afterwards.
Here you have a good answer to your question: https://dba.stackexchange.com/a/24705/15243
Basically, let your system grow while you familiarize yourself with partitioning, and when your system really needs to be "cropped in pieces", do it with partitioning.
A quick solution for 3x space shrinkage (and probably a speedup) is to compress the content and put it into a MEDIUMBLOB. Do the compression in the client, not the server; this saves on bandwidth and allows you to distribute the computation among the many client servers you have (or will have).
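A minimal sketch of the client-side compression, assuming a Python client and zlib (the answer does not name a language or a codec). The `pack` and `unpack` helpers and the sample text are hypothetical; the actual ratio depends on the content, and the 3x figure is the estimate above, not something this sketch measures.

```python
import zlib

def pack(content: str) -> bytes:
    """Compress post text on the client before INSERTing it into a MEDIUMBLOB."""
    return zlib.compress(content.encode("utf-8"), 6)

def unpack(blob: bytes) -> str:
    """Reverse of pack(): decompress a blob read back from the table."""
    return zlib.decompress(blob).decode("utf-8")

# Hypothetical sample; real forum posts will compress less than repeated text.
sample = "Some crawled forum post text with repeated phrases. " * 200
blob = pack(sample)

assert unpack(blob) == sample   # round trip is lossless
print(len(sample.encode("utf-8")), "bytes ->", len(blob), "bytes")
```

The blob would then be passed as a bound parameter to the INSERT, so the server never sees the uncompressed text.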
"Sharding" is separating the data across multiple servers. See MariaDB and Spider. This allows for size growth and possibly performance scaling. If you end up sharding, the forum_id may be the best. But that assumes no forum is too big to fit on one server.
"Partitioning" splits up the data, but only within a single server; it does not appear that there is any advantage for your use case. Partitioning by forum_id will not provide any performance benefit.
Remove the FOREIGN KEYs; debug your application instead.