MySQL performance - Selecting and deleting from a large table

MySQL performance - Selecting and deleting from a large table - mysql

I have a large table called "queue". It has 12 million records right now.
CREATE TABLE `queue` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`userid` varchar(64) DEFAULT NULL,
`action` varchar(32) DEFAULT NULL,
`target` varchar(64) DEFAULT NULL,
`name` varchar(64) DEFAULT NULL,
`state` int(11) DEFAULT '0',
`timestamp` int(11) DEFAULT '0',
`errors` int(11) DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `idx_unique` (`userid`,`action`,`target`),
KEY `idx_userid` (`userid`),
KEY `idx_state` (`state`)
) ENGINE=InnoDB;
Multiple PHP workers (150) use this table simultaneously.
They select a record, perform a network request using the selected data and then delete the record.
I get mixed execution times from the select and delete queries. Is the delete command locking the table?
What would be the best approach for this scenario?
SELECT record + NETWORK request + DELETE the record
SELECT record + NETWORK request + MARK record as completed + DELETE completed records using a cron from time to time (I don't want an even bigger table).
Note: The queue gets new records every minute but the INSERT query is not the issue here.
Any help is appreciated.

"Don't queue it, just do it". That is, if the tasks are rather fast, it is better to simply perform the action and not queue it. Databases don't make good queuing mechanisms.
DELETE does not lock an InnoDB table. However, you can write a DELETE that seems that naughty. Let's see your actual SQL so we can work in improving it.
12M records? That's a huge backlog; what's up?
Shrink the datatypes so that the table is not gigabytes:
action is only a small set of possible values? Normalize it down to a 1-byte ENUM or TINYINT UNSIGNED.
Ditto for state -- surely it does not need a 4-byte code?
There is no need for INDEX(userid) since there is already an index (UNIQUE) starting with userid.
If state has only a few value, the index won't be used. Let's see your enqueue and dequeue queries so we can discuss how to either get rid of that index or make it 'composite' (and useful).
What's the current value of MAX(id)? Is it threatening to exceed your current limit of about 4 billion for INT UNSIGNED?
How does PHP use the queue? Does it hang onto an item via an InnoDB transaction? That defeats any parallelism! Or does it change state. Show us the code; perhaps the lock & unlock can be made less invasive. It should be possible to run a single autocommitted UPDATE to grab a row and its id. Then, later, do an autocommitted DELETE with very little impact.
I do not see a good index for grabbing a pending item. Again, let's see the code.
150 seems like a lot -- have you experimented with fewer? They may be stumbling over each other.
Is the Slowlog turned on (with a low value for long_query_time)? If so, I wonder what is the 'worst' query. In situations like this, the answer may be surprising.

Related

Multi-Table MariaDB Database Partitioning Solution

Looking for some guidance on how to best tackle partitioning on some database tables for the purpose of archiving/deleting data over a certain age. The main reason for this is to resolve some issues in database size.
You can think of the data akin to telemetry data where is is growing over time, but once it enters the database it doesn't change outside of the first 10-15 minutes in the event there is any form of conflicting data that requires the application to update a recent record (max 15 mins).
Current database size is approaching 500GB and is sitting on NVMe storage across a 3x Node Galera cluster in three cities. Backups are becoming increasingly larger and if an SST is needed between nodes this can take a couple of hours to complete which is less than ideal.
The plan to deal with this is by way of archiving, where we plan to off-board historical data to another server (say once a year) with slower storage that can then be backed up once and won't change for 12 months. The historical data will be rarely accessed, and in the event it is our application will handle querying the archive server if older than a certain date instead of the production servers that are relied on heavily for "recent" data.
We have 3x tables per customer, and they reference each other in a sort of heirarchy. There are no foreign keys in the tables, but they do hold references to one another and are used in JOIN queries. Eg. summary table is the top of the hierarchy and holds one record per "event". Under this is the details table and there could be 1-10 detail records sitting under the summary event. Under details is the digits table that could include 0-10 records per detailed record.
CREATE TABLE data below;
CREATE TABLE `summary_X` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`start_utc` datetime DEFAULT NULL,
`end_utc` datetime DEFAULT NULL,
`total_duration` smallint(6) DEFAULT NULL,
`legs` tinyint(4) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `start_utc` (`start_utc`)
) ENGINE=InnoDB
CREATE TABLE `details_X` (
`xid` bigint(20) NOT NULL AUTO_INCREMENT,
`id` int(11) NOT NULL,
`duration` smallint(6) DEFAULT NULL,
`start_utc` timestamp NULL DEFAULT NULL,
`end_utc` timestamp NULL DEFAULT NULL,
`event` varchar(2) DEFAULT NULL,
`event_time` smallint(6) DEFAULT NULL,
`event_a` varchar(7) DEFAULT NULL,
`event_b` varchar(7) DEFAULT NULL,
`ani` varchar(20) DEFAULT NULL,
`dnis` varchar(10) DEFAULT NULL,
`first_time` varchar(30) DEFAULT NULL,
`final_time` varchar(30) DEFAULT NULL,
`digits_count` int(2) DEFAULT 0,
`sys_a` varchar(3) DEFAULT NULL,
`sys_b` varchar(3) DEFAULT NULL,
`log_id_a` varchar(12) DEFAULT NULL,
`seq_a` varchar(1) DEFAULT NULL,
`log_id_b` varchar(12) DEFAULT NULL,
`seq_b` varchar(1) DEFAULT NULL,
`assoc_log_id_a` varchar(12) DEFAULT NULL,
`assoc_log_id_b` varchar(12) DEFAULT NULL,
PRIMARY KEY (`xid`),
KEY `start_utc` (`start_utc`),
KEY `end_utc` (`end_utc`),
KEY `event_a` (`event_a`),
KEY `event_b` (`event_b`),
KEY `id` (`id`),
KEY `final_digits` (`final_digits`),
KEY `log_id_a` (`log_id_a`),
KEY `log_id_b` (`log_id_b`)
) ENGINE=InnoDB
CREATE TABLE `digits_X` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`leg_id` bigint(20) DEFAULT NULL,
`sequence` int(2) NOT NULL,
`digits` varchar(30) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `digits` (`digits`),
KEY `leg_id` (`leg_id`)
) ENGINE=InnoDB
My first thought was to partition on Year, sounds easy enough but we don't have a date column on the digits table, so records here could be orphaned away from their mapped details record and no longer match in a JOIN on the archive server.
We then can also have a similar issue with summary and the timestamps on the "details" records could span multiple years. Eg. Summary event starts at 2021-12-31 23:55:00. First detail record is same timestamp, and then the next detail under the same event could be 2022-01-01 00:11:00. If 2021 partition was archived off to the other server, the 2022 detail would be orphaned and no longer JOIN to the 2021 summary event.
One alternative could be not to partition at all and do SELECT/INSERT/DELETE which isn't practical with the volume of data. Some tables have 30M-40M rows per year so this would be very resource taxing. There are also 400+ customers each with their own sets of tables.
Another I thought of was to add a column to the three tables as a "Year" column we can partition on but would include the Year of first event across all, so all related records can be on the same partitions/server, but this seems like a waste of space and there should be a better way.
Any thoughts or guidance would be appreciated.

To add PARTITIONing will require copying the entire table over. That will involve downtime and disk space. If you can live with that, then...
PARTITION BY RANGE(...) where the expression involves, say, TO_DAYS(...) or possibly TO_SECONDS(...). Then set up cron jobs to add a new partition periodically. (There is nothing automated for such.) And to detach the oldest partition. See Partition for a discussion of the details. (TO_DAYS avoids the need for a 'year' column.)
Note that Partitioning is implemented as several sub-tables under a table. With "transportable tablespaces", you can detach a partition from the big table, turning it into a table unto itself. At that point, you are free to move it to another server of something.
In a situation like yours, I might consider the following.
Write the raw data to a file (perhaps one per day) for archiving;
Insert into a table that will live only briefly; this will be purged by some means frequently;
Update "normalization" tables
"Summarize" the data into Summary Tables, where each set of rows covers one hour (or whatever makes sense).
Write "reports" from the summary table(s).
Be aware that each Partition takes an extra 5.5MB (average), so do not make many partitions. Or do you need only 2, each containing 15 minutes' data?
Meanwhile, I would look carefully at the schema. Can an INT (4 bytes) be turned into a SMALLINT (2 bytes). Can more things be Normalized.
digits_count int(2) -- that is a 4-byte INT; the (2) has no meaning and has been removed in MySQL 8. (MariaDB may follow suit someday.) It sounds like you need only a 1-byte TINYINT UNSIGNED (range: 0..255).
Since this is log info, be aware of Daylight Savings wrt DATETIME. (One hour per year is missing; another hour repeats.) This problem does not occur with TIMESTAMP. Each one takes 5 bytes (unless you include fractional seconds.)
(I can't advise on unnecessary indexes without seeing the queries.) SHOW TABLE STATUS will tell you how much space is being consumed by all the indexes.
Are the 3 tables of similar size?
Re "orphaning" -- You need at least 2 partitions -- one being filled (0-100% full) and an older partition (100% full)
"30M-40M rows per year" times 400 customers. Does that add up to 500 rows inserted per second? Are they INSERTed one row at a time? High speed ingestion
Are there more deletes and selects than inserts? And/or do they involve more than single rows? (I'm fishing for more info go help with some other issues you either have or are threatening to have.) Even with Deletes and no Partitioning, the disk growth will slow down as free space is generated, then reused. ("Rince and repeat.")
Without partitioning, see Huge Deletes . But... DELETEing data from a table does not shrink it disk footprint. However if each 'customer' has 1/400th of the data; and (of course) you do each customer separately, then there may not be any disk problem
I've given you a lot to think about. Answer some of my questions; I may have more advice.

How to efficiently update values without a primary key in MySQL?

I am currently facing an issue with designing a database table and updating/inserting values into it.
The table is used to collect and aggregate statistics that are identified by:
the source
the user
the statistic
an optional material (e.g. item type)
an optional entity (e.g. animal)
My main issue is, that my proposed primary key is too large because of VARCHARs that are used to identify a statistic.
My current table is created like this:
CREATE TABLE `Statistics` (
`server_id` varchar(255) NOT NULL,
`player_id` binary(16) NOT NULL,
`statistic` varchar(255) NOT NULL,
`material` varchar(255) DEFAULT NULL,
`entity` varchar(255) DEFAULT NULL,
`value` bigint(20) NOT NULL)
In particular, the server_id is configurable, the player_id is a UUID, statistic is the representation of an enumeration that may change, material and entity likewise. The value is then aggregated using SUM() to calculate the overall statistic.
So far it works but I have to use DELETE AND INSERT statements whenever I want to update a value, because I have no primary key and I can't figure out how to create such a primary key in the constraints of MySQL.
My main question is: How can I efficiently update values in this table and insert them when they are not currently present without resorting to deleting all the rows and inserting new ones?
The main issue seems to be the restriction MySQL puts on the primary key. I don't think adding an id column would solve this.

Simply add an auto-incremented id:
CREATE TABLE `Statistics` (
statistis_id int auto_increment primary key,
`server_id` varchar(255) NOT NULL,
`player_id` binary(16) NOT NULL,
`statistic` varchar(255) NOT NULL,
`material` varchar(255) DEFAULT NULL,
`entity` varchar(255) DEFAULT NULL,
`value` bigint(20) NOT NULL
);
Voila! A primary key. But you probably want an index. One that comes to mind:
create index idx_statistics_server_player_statistic on statistics(server_id, player_id, statistic)`
Depending on what your code looks like, you might want additional or different keys in the index, or more than one index.

Follow the below hope it will solve your problem :-
- First use a variable let suppose "detailed" as money with your table.
- in your project when you use insert statement then before using statement get the maximum of detailed (SELECT MAX(detailed)+1 as maxid FROM TABLE_NAME( and use this as use number which will help you to FETCH,DELETE the record.
-you can also update with this also BUT during update MAXIMUM of detailed is not required.
Hope you understand this and it will help you .

I have dug a bit more through the internet and optimized my code a lot.
I asked this question because of bad performance, which I assumed was because of the DELETE and INSERT statements following each other.
I was thinking that I could try to reduce the load by doing INSERT IGNORE statements followed by UPDATE statements or INSERT .. ON DUPLICATE KEY UPDATE statements. But they require keys to be useful which I haven't had access to, because of constraints in MySQL.
I have fixed the performance issues though:
By reducing the amount of statements generated asynchronously (I know JDBC is blocking but it worked, it just blocked thousand of threads) and disabling auto-commit, I was able to improve the performance by 600 times (from 60 seconds down to 0.1 seconds).
Next steps are to improve the connection string and gaining even more performance.

Mysql: Deadlock found when trying to get lock, need remove key?

i count page view statistics in Mysql and sometimes get deat lock.
How can resolve this problem? Maybe i need remove one of key?
But what will happen with reading performance? Or is it not affect?
Table:
CREATE TABLE `pt_stat` (
`stat_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`post_id` int(11) unsigned NOT NULL,
`stat_name` varchar(50) NOT NULL,
`stat_value` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`stat_id`),
KEY `post_id` (`post_id`),
KEY `stat_name` (`stat_name`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
Error: "Deadlock found when trying to get lock; try restarting transaction".
UPDATE pt_stat SET stat_value = stat_value + 1 WHERE post_id = "21500" AND stat_name = 'day_20170111';

When dealing with deadlocks, the first thing to do, always, is to see whether you have complex transactions deadlocking against eachother. This is the normal case. I assume based on your question that the update statement, however, is in its own transaction and therefore there are no complex interdependencies among writes from a logical database perspective.
Certain multi-threaded databases (including MySQL) can have single statements deadlock against themselves due to write dependencies within threads on the same query. MySQL is not alone here btw. MS SQL Server has been known to have similar problems in some cases and workloads. The problem (as you seem to grasp) is that a thread updating an index can deadlock against another thread that updates an index (and remember, InnoDB tables are indexes with leaf-nodes containing the row data).
In these cases there are three things you can look at doing:
If the problem is not severe, then the best option is generally to retry the transaction in case of deadlock.
You could reduce the number of background threads but this will affect both read and write performance, or
You could try removing an index (key). However, keep in mind that unindexed scans on MySQL are slow.

Is this MongoDB use case?

I have a dating website in which i send daily alerts and log alerts in ALERTS_LOG.
CREATE TABLE `ALERTS_LOG` (
`RECEIVERID` mediumint(11) unsigned NOT NULL DEFAULT '0',
`MATCHID` mediumint(11) unsigned NOT NULL DEFAULT '0',
`DATE` smallint(6) NOT NULL DEFAULT '0',
KEY `RECEIVER` (`RECEIVER`),
KEY `USER` (`USER`)
) ENGINE=MRG_MyISAM DEFAULT CHARSET=latin1 INSERT_METHOD=LAST UNION=(`ALERTS_LOG110`,`ALERTS_LOG111`,`ALERTS_LOG112`)
Logic Of Insertion : I have create merge table and each sub tables like ALERTS_LOG110 store 0-15 days record. On every 1st and 16th i create a new table and change definition of mergeMyisam.
Example : INSERT_METHOD=LAST UNION=(ALERTS_LOG111,ALERTS_LOG112,ALERTS_LOG113).
Advantage :
Deletion of is super fast.
Issues with this approach:
1. When i change definition, i often got site down issue as when i change the definition, indexes need to get on cache and all select queries got stuck.
2. Locking issue because of too many inserts and select.
So, can I look MongoDB for solving this issue?

No, not really. Re-engineering your application to use two different database types because of performance on this log table seems like a poor choice.
It's not really clear why you have so many entries being logged, but on the face of it look might like to look into partitioning in MySQL and partition your table by day or week and then drop those partitions. Deletion is still super fast and there would be no downtime for it because you won't be changing object names every day.

How to optimize a 'col = col + 1' UPDATE query that runs on 100,000+ records?

See this previous question for some background. I'm trying to renumber a corrupted MPTT tree using SQL. The script is working fine logically, it is just much too slow.
I repeatedly need to execute these two queries:
UPDATE `tree`
SET `rght` = `rght` + 2
WHERE `rght` > currentLeft;
UPDATE `tree`
SET `lft` = `lft` + 2
WHERE `lft` > currentLeft;
The table is defined as such:
CREATE TABLE `tree` (
`id` char(36) NOT NULL DEFAULT '',
`parent_id` char(36) DEFAULT NULL,
`lft` int(11) unsigned DEFAULT NULL,
`rght` int(11) unsigned DEFAULT NULL,
... (a couple of more columns) ...,
PRIMARY KEY (`id`),
KEY `parent_id` (`parent_id`),
KEY `lft` (`lft`),
KEY `rght` (`rght`),
... (a few more indexes) ...
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
The database is MySQL 5.1.37. There are currently ~120,000 records in the table. Each of the two UPDATE queries takes roughly 15 - 20 seconds to execute. The WHERE condition may apply to a majority of the records, so that almost all records need to be updated each time. In the worst case both queries are executed as many times as there are records in the database.
Is there a way to optimize this query by keeping the values in memory, delaying writing to disk, delaying index updates or something along these lines? The bottleneck seems to be hard disk throughput right now, as MySQL seems to be writing everything back to disk immediately.
Any suggestion appreciated.

I never used it, but if your have enough memory, try the memory table.
Create a table with the same structure as tree, insert into .. select from .., run your scripts against the memory table, and write it back.

Expanding on some ideas from comment as requested:
The default is to flush to disk after every commit. You can wrap multiple updates in a commit or change this parameter:
http://dev.mysql.com/doc/refman/5.1/en/innodb-parameters.html#sysvar_innodb_flush_log_at_trx_commit
The isolation level is simple to change. Just make sure the level fits your design. This probably won't help because a range update is being used. It's nice to know though when looking for some more concurrency:
http://dev.mysql.com/doc/refman/5.1/en/set-transaction.html
Ultimately, after noticing the range update in the query, your best bet is the MEMORY table that andrem pointed out. Also, you'll probably be able to find some performance by using a btree indexes instead of the default of hash:
http://www.mysqlperformanceblog.com/2008/02/01/performance-gotcha-of-mysql-memory-tables/

You're updating indexed columns - indexes negatively impact (read: slow down) INSERT/UPDATEs.
If this is a one time need to get things correct:
Drop/delete the indexes on the columns being updated (lft, rght)
Run the update statements
Re-create the indexes (this can take time, possibly equivalent to what you already experience in total)

We Keep Coding

html mysql json google-apps-script actionscript-3 ms-access google-chrome google-maps reporting-services sql-server-2008