Reclaim MySQL disk space after deleting rows - mysql

First of all, sorry for repeating a question that has already been asked numerous times. I have gone through those questions and answers.
I have a table of approximately 159 GB in a production environment, and I want to migrate MySQL to another server. Since the total DB size is more than 300 GB, I cannot migrate it easily, so I am trying to reclaim space by deleting records. I deleted more than 70% of the records from this table and tried OPTIMIZE TABLE, but it gives an error:
mysql> OPTIMIZE TABLE table_name;
+--------------------------------------+----------+----------+-----------------------+
| Table | Op | Msg_type | Msg_text |
+--------------------------------------+----------+----------+-----------------------+
| table_name | optimize | note | Table does not support optimize, doing recreate + analyze instead |
| table_name | optimize | error | The table 'table_name' is full |
| table_name | optimize | status | Operation failed |
+--------------------------------------+----------+----------+-----------------------+
innodb_file_per_table is set to ON
SHOW VARIABLES LIKE '%innodb_file_per_table%';
Variable_name Value
--------------------- --------
innodb_file_per_table ON
MySQL version: 5.7.28-log
I read somewhere that ALTER TABLE will help; however, it slows down all MySQL queries while it runs.
In one answer I read that copying the data into another table, then renaming it and deleting the original table (which I assume is what OPTIMIZE TABLE does internally) will help, but doing so will need downtime.
Is there any other way to achieve this?
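For reference, a minimal sketch of that copy-and-rename approach (table_name_new and table_name_old are placeholder names; the copy needs enough free disk for a second copy of the kept rows, and writes landing on the original table during the copy are not carried over, which is why the answers above mention downtime):

-- create an empty copy with the same structure
CREATE TABLE table_name_new LIKE table_name;
-- copy over only the rows you want to keep
INSERT INTO table_name_new SELECT * FROM table_name;
-- swap the tables atomically, then drop the old one;
-- with innodb_file_per_table=ON the drop releases the old .ibd file
RENAME TABLE table_name TO table_name_old, table_name_new TO table_name;
DROP TABLE table_name_old;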

Related

Reduce the size of MySQL NDB binlog

I am running NDB Cluster and I see that on mysql api nodes, there is a very big binary log table.
+---------------------------------------+--------+-------+-------+------------+---------+
| CONCAT(table_schema, '.', table_name) | rows | DATA | idx | total_size | idxfrac |
+---------------------------------------+--------+-------+-------+------------+---------+
| mysql.ndb_binlog_index | 83.10M | 3.78G | 2.13G | 5.91G | 0.56 |
+---------------------------------------+--------+-------+-------+------------+---------+
Is there any recommended way to reduce the size of that without breaking anything? I understand that this will limit the time frame for point-in-time recovery, but the data is growing out of hand and I need to do a bit of clean-up.
It looks like this is possible. I don't see anything at http://dev.mysql.com/doc/refman/5.5/en/mysql-cluster-replication-pitr.html that says you can't, based on the last epoch.
Some additional information might be gained by reading this article:
http://www.mysqlab.net/knowledge/kb/detail/topic/backup/id/8309
The mysql.ndb_binlog_index table is MyISAM. If you are cleaning it up, make sure you don't delete entries for binary logs that you still need.
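If it helps, a hedged sketch of such a clean-up; the date and file name are placeholders, and you should verify the columns (File, epoch) against your version's ndb_binlog_index definition before deleting anything:

-- drop binary logs older than your retention window (standard MySQL syntax)
PURGE BINARY LOGS BEFORE '2013-01-01 00:00:00';
-- then remove index rows that point at binlog files which no longer exist;
-- the exact predicate depends on how your binlog files are named
DELETE FROM mysql.ndb_binlog_index WHERE File < './mysqld-bin.000100';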

mysql dump query hangs

I have run mysql -u root -p gf < ~/gf_backup.sql to restore my db. However, when I look at the process list I see that one query has been idle for a long time. I do not know why.
mysql> show processlist;
+-----+------+-----------+-------------+---------+-------+-----------+------------------------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-----+------+-----------+-------------+---------+-------+-----------+------------------------------------------------------------------------------------------------------+
| 662 | root | localhost | gf | Query | 18925 | query end | INSERT INTO `gf_1` VALUES (1767654,'90026','Lddd',3343,34349),(1 |
| 672 | root | localhost | gf | Query | 0 | NULL | show processlist |
+-----+------+-----------+-------------+---------+-------+-----------+------------------------------------------------------------------------------------------------------+
Please check free space with the df -h command (if you're on Linux/Unix). If you're out of space, do not kill or restart MySQL; free some space and let it catch up with the changes.
You may also want to check the max_allowed_packet setting in my.cnf and set it to something like 256M; please refer to http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_max_allowed_packet
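For example (256M is just the value suggested above; a global change only applies to connections opened after it):

-- raise the limit at runtime for new connections
SET GLOBAL max_allowed_packet = 256 * 1024 * 1024;
-- or persistently, in the [mysqld] section of my.cnf (then restart):
--   max_allowed_packet = 256M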
Your dump is probably very large and contains a lot of normalized data (records split across a bunch of tables, with a bunch of foreign key constraints, indexes and so on).
If so, you may try removing all constraint and index definitions from the SQL file, then importing the data and re-creating the removed constraints and indexes. This is a well-known trick to speed up imports, because INSERT commands without constraint validation are a lot faster, and the indexes and so on can be created in a single pass afterwards.
See also: http://support.tigertech.net/mysql-large-inserts
Of course, you should kill the query first. And remove all fragments it created already.
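If editing the dump itself is impractical, a lighter hedged variant of the same trick is to relax the checks for the importing session only, run from the mysql client (the path is a placeholder):

-- defer constraint and uniqueness checks while the dump loads
SET foreign_key_checks = 0;
SET unique_checks = 0;
SET autocommit = 0;
SOURCE /path/to/gf_backup.sql;
COMMIT;
-- restore the defaults afterwards
SET unique_checks = 1;
SET foreign_key_checks = 1;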

Optimizing a simple mysql select on a large table (75M+ rows)

I have a statistics table which grows at a high rate (around 25M rows/day) that I'd like to optimize for selects. The table fits in memory, and the server has plenty of spare memory (32G; the table is 4G).
My simple roll-up query is:
EXPLAIN select FROM_UNIXTIME(FLOOR(endtime/3600)*3600) as ts,sum(numevent1) as success , sum(numevent2) as failure from stats where endtime > UNIX_TIMESTAMP()-3600*96 group by ts order by ts;
+----+-------------+--------------+------+---------------+------+---------+------+----------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+------+---------------+------+---------+------+----------+----------------------------------------------+
| 1 | SIMPLE | stats | ALL | ts | NULL | NULL | NULL | 78238584 | Using where; Using temporary; Using filesort |
+----+-------------+--------------+------+---------------+------+---------+------+----------+----------------------------------------------+
stats is an InnoDB table and there is a normal index on endtime. How should I optimize this?
Note: I do plan on adding roll-up tables, but currently this is what I'm stuck with, and I'm wondering if it's possible to fix it without additional application code.
I've been doing local tests. Try the following:
alter table stats add index (endtime, numevent1, numevent2);
And remove the ORDER BY, as it should be implicit in the GROUP BY (I guess the parser just ignores the ORDER BY in this case, but just in case :)
Since you are using InnoDB you can also try the following:
a) Change innodb_buffer_pool_size to 24GB (requires a server restart; see the sketch after this list), which will ensure that your whole table can be held in memory and will therefore speed up sorting even as the table grows larger.
b) Enable innodb_file_per_table, which causes InnoDB to place each new table in its own tablespace file. This requires you to drop the existing table and recreate it.
c) Use the smallest column types that can fit the data. Without seeing the actual column definitions and some sample data, I cannot provide any specific ideas. Can you provide a sample schema and maybe 5 rows of data?
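A hedged my.cnf sketch for (a) and (b); the 24GB figure is the one suggested above, and on MySQL of this vintage changing the buffer pool size requires a restart:

[mysqld]
# hold the working set in memory (suggestion (a))
innodb_buffer_pool_size = 24G
# one tablespace file per table (suggestion (b)); only affects tables created afterwards
innodb_file_per_table = 1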

COUNT(id) query is taking too long, what performance enhancements might help?

I have a query timeout problem. When I did a:
SELECT COUNT(id) AS rowCount FROM infoTable;
in my program, my JDBC call timed out after 2.5 minutes.
I don't have much database admin expertise but I am currently tasked with supporting a legacy database. In this mysql database, there is an InnoDB table:
+-------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+----------------+
| id | bigint(20) | NO | PRI | NULL | auto_increment |
| info | longtext | NO | | | |
+-------+------------+------+-----+---------+----------------+
It currently has a high id of 5,192,540, which is the approximate number of rows in the table. Some of the info text is over 1M, some is very small. Around 3000 rows are added on a daily basis. Machine has loads of free disk space, but not a lot of extra memory. Rows are read and are occasionally modified but are rarely deleted, though I'm hoping to clean out some of the older data which is pretty much obsolete.
I tried the same query manually on a smaller test database which had 1,492,669 rows, installed on a similar machine with less disk space, and it took 9.19 seconds.
I tried the same query manually on an even smaller test database which had 98,629 rows and it took 3.85 seconds. I then added an index to id:
create index infoTable_idx on infoTable(id);
and the subsequent COUNT took 4.11 seconds, so it doesn't seem that adding an index would help in this case. (Just for kicks, I did the same on the aforementioned mid-sized db and access time increased from 9.2 to 9.3 seconds.)
Any idea how long a query like this should be taking? What is locked during this query? What happens if someone is adding data while my program is selecting?
Thanks for any advice,
Ilane
You might try executing the following EXPLAIN statement instead; it might be a bit quicker:
mysql> EXPLAIN SELECT id FROM infoTable;
That may or may not yield quicker results; look at the rows field for the estimated row count.
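Along similar lines, and going slightly beyond the original answer, the table statistics expose the same estimate directly (for InnoDB it is an approximation, not an exact count):

-- approximate row count from InnoDB table statistics
SELECT TABLE_ROWS
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 'infoTable';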

How to optimize mysql indexes so that INSERT operations happen quickly on a large table with frequent writes and reads?

I have a table watchlist that currently contains almost 3 million records.
mysql> select count(*) from watchlist;
+----------+
| count(*) |
+----------+
| 2957994 |
+----------+
It is used as a log to record product-page-views on a large e-commerce site (50,000+ products). It records the productID of the viewed product, the IP address and USER_AGENT of the viewer, and a timestamp of when it happens:
mysql> show columns from watchlist;
+-----------+--------------+------+-----+-------------------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------+--------------+------+-----+-------------------+-------+
| productID | int(11) | NO | MUL | 0 | |
| ip | varchar(16) | YES | | NULL | |
| added_on | timestamp | NO | MUL | CURRENT_TIMESTAMP | |
| agent | varchar(220) | YES | MUL | NULL | |
+-----------+--------------+------+-----+-------------------+-------+
The data is then reported on several pages throughout the site on both the back-end (e.g. checking what GoogleBot is indexing), and front-end (e.g. a side-bar box for "Recently Viewed Products" and a page showing users what "People from your region also liked" etc.).
So that these "report" pages and side-bars load quickly I put indexes on relevant fields:
mysql> show indexes from watchlist;
+-----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| watchlist | 1 | added_on | 1 | added_on | A | NULL | NULL | NULL | | BTREE | |
| watchlist | 1 | productID | 1 | productID | A | NULL | NULL | NULL | | BTREE | |
| watchlist | 1 | agent | 1 | agent | A | NULL | NULL | NULL | YES | BTREE | |
+-----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
Without the INDEXES, pages with the side-bar for example would spend about 30-45sec executing a query to get the 7 most-recent ProductIDs. With the indexes it takes <0.2sec.
The problem is that with the INDEXES the product pages themselves are taking longer and longer to load because as the table grows the write operations are taking upwards of 5sec. In addition there is a spike on the mysqld process amounting to 10-15% of available CPU each time a product page is viewed (roughly once every 2sec). We already had to upgrade the server hardware because on a previous server it was reaching 100% and caused mysqld to crash.
My plan is to attempt a 2-table solution. One table for INSERT operations, and another for SELECT operations. I plan to purge the INSERT table whenever it reaches 1000 records using a TRIGGER, and copy the oldest 900 records into the SELECT table. The report pages are a mixture of real-time (recently viewed) and analytics (which region), but the real-time pages tend to only need a handful of fresh records while the analytical pages don't need to know about the most recent trend (i.e. last 1000 views). So I can use the small table for the former and the large table for the latter reports.
My question: Is this an ideal solution to this problem?
Also: With TRIGGERS in MySQL is it possible to nice the trigger_statement so that it takes longer, but doesn't consume much CPU? Would running a cron job every 30min that is niced, and which performs the purging if required be a better solution?
Write operations for a single row into a data table should not take 5 seconds, regardless how big the table gets.
Is your clustered index based on the timestamp field? If not, it should be, so you're not writing into the middle of your table somewhere. Also, make sure you are using InnoDB tables - MyISAM is not optimized for writes.
I would propose writing into two tables: one long-term table, one short-term reporting table with little or no indexing, which is then dumped as needed.
Another solution would be to use memcached or an in-memory database for the live reporting data, so there's no hit on the production database.
One more thought: exactly how "live" must either of these reports be? Perhaps retrieving a new list on a timed basis versus once for every page view would be sufficient.
A quick fix might be to use the INSERT DELAYED syntax, which allows MySQL to queue the inserts and execute them when it has time. That's probably not a very scalable solution, though.
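For illustration, a delayed insert into the watchlist table might look like the sketch below (the values are made up; also note that DELAYED only works with MyISAM, MEMORY and ARCHIVE tables, was deprecated in MySQL 5.6 and removed in 5.7):

-- hand the row to the server's delayed-insert queue and return immediately
INSERT DELAYED INTO watchlist (productID, ip, agent)
VALUES (12345, '203.0.113.7', 'Mozilla/5.0 (compatible; Googlebot/2.1)');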
I actually think that the principle of what you will be attempting is sound, although I wouldn't use a trigger. My suggested solution would be to let the data accumulate for a day and then move it to the secondary log table with a batch script that runs at night. This is mainly because such frequent transfers of a thousand rows would still put a rather heavy load on the server, and because I don't really trust the MySQL trigger implementation (although that isn't based on any real substance).
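A minimal sketch of that nightly batch move (watchlist_log is an assumed name for the secondary log table; the CURDATE() cutoff keeps the job from touching rows inserted after midnight):

-- one-time setup: the log table mirrors the hot table's structure
CREATE TABLE IF NOT EXISTS watchlist_log LIKE watchlist;
-- copy everything older than today into the log table...
INSERT INTO watchlist_log
SELECT * FROM watchlist WHERE added_on < CURDATE();
-- ...then remove the same rows from the hot table;
-- new rows get added_on >= CURDATE(), so they are unaffected by either statement
DELETE FROM watchlist WHERE added_on < CURDATE();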
Instead of optimizing indexes, you could offload the database writes. You could delegate writing to a background process via an asynchronous queue (ActiveMQ, for example). Inserting a message into an ActiveMQ queue is very fast. We are using ActiveMQ and get about 10-20K insert operations on our test platform (and that is a single-threaded test application, so you could get more).
Look into 'shadow tables': when reconstructing tables this way, you don't need to write to the production table.
I had the same issue with both InnoDB and MyISAM tables (which, as mentioned before, are not optimized for writes), and solved it by using a second table to hold temporary data that periodically updates the huge master table. The master table has over 18 million records and is used only for reads, while results are written to the small second table.
The problem is that inserts/updates into the big master table take a while, and it gets worse if there are several updates or inserts waiting in the queue, even with the INSERT DELAYED or UPDATE LOW_PRIORITY options enabled.
To make it even faster, read the small secondary table first when searching for a record; if the record is there, work on the second table only. Use the big master table for reference and for picking up new records only: if the data is not in the small secondary table, read the record from the master (reads are fast on both InnoDB and MyISAM) and then insert that record into the small second table.
It works like a charm: reading from the huge 20-million-record master and writing to the second small table of 100K to 300K records takes less than a second, far below the 5 seconds mentioned above.
This works just fine.
Regards
Something that often helps when doing bulk loads is to drop any indexes, do the bulk load, then recreate the indexes. This is generally much faster than the database having to constantly update the index for each and every row inserted.
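As a hedged sketch, using the index names from the SHOW INDEXES output shown earlier:

-- drop the secondary indexes before the bulk load
ALTER TABLE watchlist DROP INDEX added_on, DROP INDEX productID, DROP INDEX agent;
-- ... perform the bulk load here ...
-- rebuild the indexes in a single pass afterwards
ALTER TABLE watchlist
  ADD INDEX added_on (added_on),
  ADD INDEX productID (productID),
  ADD INDEX agent (agent);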