Truncating MEMORY table, and locking - mysql

I use a one-column MEMORY table to keep track of views on various items in my DB. Each view = one INSERT into the memory table. Every 10 minutes, I want to COUNT() the rows for each item and commit the changes to the DB.
The question is: if I run the query that will get the list of all items, such as
SELECT COUNT(*) AS period_views, `item_id` FROM `-views` GROUP BY `item_id` ORDER BY `item_id`
and then run an update query for each row to add the number of views in that period, and then truncate the table, this operation might take a few seconds. In those few seconds there will be other INSERTs into that table that didn't make it into the original count. Will they be truncated too once that command executes, or will the table be locked until the entire operation completes and the new INSERTs added afterwards?

MySQL does not lock the table automatically, and it is possible that you will lose some records in between getting the count and performing the truncate. So two solutions jump out at me:
1) Use table locks to prevent the memory table from being updated. Depending on the nature of your application, this means that all of your clients might freeze for a few seconds while you are updating; this might be OK (a minimal sketch appears after option 2 below).
2) Add a second column to keep track of which records you are currently updating ...
ALTER TABLE `-views` ADD work_in_progress TINYINT NOT NULL DEFAULT 0;
And then when you want to work on those records
UPDATE `-views` SET work_in_progress = 1;
SELECT COUNT(*) AS period_views, `item_id` FROM `-views` WHERE work_in_progress GROUP BY `item_id` ORDER BY `item_id`;
# [ perform updates as necessary ]
DELETE FROM `-views` WHERE work_in_progress;
This implementation will guarantee that you don't delete any -views which were added while you were updating.
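For completeness, a minimal sketch of option 1. Here `items` is a hypothetical name for the main table receiving the counts; it must also be listed in LOCK TABLES because a session can only touch tables it has locked while holding locks:
LOCK TABLES `-views` WRITE, `items` WRITE;
SELECT COUNT(*) AS period_views, `item_id` FROM `-views` GROUP BY `item_id` ORDER BY `item_id`;
-- [ one UPDATE against `items` per returned row ]
-- clear the memory table while it is still locked
DELETE FROM `-views`;
UNLOCK TABLES;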
And FWIW, -views is an awful name for a table!

Performance of mysql counting rows in a big table

This fairly obvious question has very few (I couldn't find any) solid answers.
I do a simple select from a table of 2 million rows.
select count(id) as total from big_table
On any machine I try this query on, it usually takes at least 5 seconds to complete. This is unacceptable for real-time queries.
The reason I need an exact row count is for precise statistical calculations later on.
Using the last auto-increment value is unfortunately not an option, because rows also get deleted periodically.
It can indeed be slow when running on an InnoDB engine. As stated in section 14.24 of the MySQL 5.7 Reference Manual, “InnoDB Restrictions and Limitations”, 3rd bullet point:
InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.
For information about how InnoDB processes SELECT COUNT(*) statements, refer to the COUNT() description in Section 12.20.1, “Aggregate Function Descriptions”.
The suggested solution is a counter table: a separate table with one row and one column holding the current record count, kept up to date via triggers. Something like this:
create table big_table_count (rec_count int default 0);
-- one-shot initialisation:
insert into big_table_count select count(*) from big_table;
create trigger big_insert after insert on big_table
for each row
update big_table_count set rec_count = rec_count + 1;
create trigger big_delete after delete on big_table
for each row
update big_table_count set rec_count = rec_count - 1;
You can see this in a fiddle, where you can alter the insert/delete statements in the build section to see the effect on:
select rec_count from big_table_count;
You could extend this to several tables, either by creating such a counter table for each, or by reserving a row per table in a single counter table keyed by a column table_name, as sketched below.
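A rough sketch of that shared variant, using hypothetical names (table_row_counts for the counter table, 'big_table' as the key value); the triggers would replace the single-table triggers shown above:
create table table_row_counts (
  table_name varchar(64) not null primary key,
  rec_count int not null default 0
);
-- one-shot initialisation per tracked table:
insert into table_row_counts (table_name, rec_count)
select 'big_table', count(*) from big_table;
-- the triggers then target the matching row:
create trigger big_insert after insert on big_table
for each row
update table_row_counts set rec_count = rec_count + 1 where table_name = 'big_table';
create trigger big_delete after delete on big_table
for each row
update table_row_counts set rec_count = rec_count - 1 where table_name = 'big_table';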
Improving concurrency
The above method does have an impact if you have many concurrent sessions inserting or deleting records, because they need to wait for each other to complete the update of the counter.
A solution is not to let the triggers update the same single record, but to have them insert a new record instead, like this:
create trigger big_insert after insert on big_table
for each row
insert into big_table_count (rec_count) values (1);
create trigger big_delete after delete on big_table
for each row
insert into big_table_count (rec_count) values (-1);
The way to get the count then becomes:
select sum(rec_count) from big_table_count;
Then, once in a while (e.g. daily) you should re-initialise the counter table to keep it small:
truncate table big_table_count;
insert into big_table_count select count(*) from big_table;

error 1206 whenever trying to delete records from a table

I have a table with more than 40 million records. I want to delete about 150,000 records with an SQL query:
DELETE
FROM t
WHERE date="2013-11-24"
but I get error 1206 (The total number of locks exceeds the lock table size).
I searched a lot and changed the buffer pool size:
innodb_buffer_pool_size=3GB
but it didn't work.
I also tried locking the table, but that didn't work either:
Lock Tables t write;
DELETE
FROM t
WHERE date="2013-11-24";
unlock tables;
I know one solution is to split the deletion into batches, but I want that to be my last option.
I am using MySQL Server; the OS is CentOS and the server has 4 GB of RAM.
I'd appreciate any help.
You can use LIMIT on your DELETE and delete the data in batches of, say, 10,000 records at a time:
DELETE
FROM t
WHERE date="2013-11-24"
LIMIT 10000
You can also include an ORDER BY clause so that rows are deleted in the order specified by the clause:
DELETE
FROM t
WHERE date="2013-11-24"
ORDER BY primary_key_column
LIMIT 10000
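If you want to repeat the batch until the day's rows are gone, one option (a sketch; the procedure name delete_day_in_batches is just an illustration) is a small stored procedure that loops while the previous DELETE still removed rows:
DELIMITER //
CREATE PROCEDURE delete_day_in_batches()
BEGIN
  REPEAT
    DELETE FROM t
    WHERE date="2013-11-24"
    ORDER BY primary_key_column
    LIMIT 10000;
  UNTIL ROW_COUNT() = 0 END REPEAT;
END //
DELIMITER ;
CALL delete_day_in_batches();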
There are a lot of quirky ways this error can occur. I will try to list one or two and perhaps the analogy holds true for someone reading this at some point.
On larger datasets, even after increasing innodb_buffer_pool_size, you can hit this error when an adequate index is not in place to isolate the rows in the WHERE clause, or in some cases even with the primary index (see this) and the comment from Roger Gammans:
From the MySQL 5.0 documentation for InnoDB:
If you have no indexes suitable for your statement and MySQL must scan
the entire table to process the statement, every row of the table
becomes locked, which in turn blocks all inserts by other users to the
table. It is important to create good indexes so that your queries do
not unnecessarily scan many rows.
A visual example of how this error can occur, and be difficult to solve, uses this simple schema:
CREATE TABLE `students` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`thing` int(11) NOT NULL,
`campusId` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `ix_stu_cam` (`campusId`)
) ENGINE=InnoDB;
A table with 50 million rows. FKs are not shown; they are not the issue. The table was originally for demonstrating query performance, which is also not important here. Yet, when initializing thing=id in blocks of 1M rows, I had to apply a LIMIT during the block update to prevent other problems, by using:
update students
set thing=id
where thing!=id
order by id desc
limit 1000000 ; -- 1 Million
This was all fine until it got down to, say, 600,000 rows left to update, as seen by
select count(*) from students where thing!=id;
The reason I was doing that count(*) was repeated occurrences of
Error 1206: The total number of locks exceeds the lock table size
I could keep lowering the LIMIT in the above update, but in the end I would be left with, say, 1200 rows where thing != id, and the problem just continued.
Why did it continue? Because the system filled the lock table as it scanned this large table. Sure, within the implicit transaction it might have changed those last 1200 rows to be equal, but because the lock table filled up, in reality it would abort the transaction with nothing set. And the process would stalemate.
Illustration 2:
In this example, let's say 288 rows of the 50-million-row table shown above still need updating. Due to the end-game problem described, I would often have a problem running this query twice:
update students set thing=id where thing!=id order by id desc limit 200 ;
But I would not have a problem with these:
update students set thing=id where thing!=id order by id desc limit 200;
update students set thing=id where thing!=id order by id desc limit 88 ;
Solutions
There are many ways to solve this, including but not limited to:
A. Creating another index on a column that indicates whether the data has been updated, perhaps a boolean, and incorporating it into the WHERE clause. Yet on huge tables, creating a somewhat temporary index may be out of the question.
B. Populating a second table with the ids that still need to be cleaned, coupled with an update-with-a-join pattern (see the sketch after this list).
C. Dynamically changing the LIMIT value so as not to overrun the lock table. The overrun can occur when there are simply no more rows to UPDATE or DELETE (your operation), the LIMIT has not been reached, and the lock table fills up in a fruitless scan for rows that simply don't exist (seen above in Illustration 2).
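To make option B concrete, here is a rough sketch against the students example above (students_todo is a hypothetical staging table; on a huge table, populating it could itself be done in batches):
CREATE TABLE students_todo (`id` int(11) NOT NULL, PRIMARY KEY (`id`));
INSERT INTO students_todo (`id`)
SELECT `id` FROM students WHERE thing != id;
UPDATE students s
JOIN students_todo td ON td.id = s.id
SET s.thing = s.id;
DROP TABLE students_todo;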
The main point of this answer is to offer an understanding of why it is happening, and to let readers craft an end-game solution that fits their needs (versus, at times, fruitless changes to system variables, reboots, and prayers).
The simplest way is to create an index on the date column. I had 170 million rows and was deleting 6.5 million. I ran into the same problem and solved it by creating a non-clustered index on the column I was using in the WHERE clause; then I executed the delete query and it worked.
Drop the index afterwards if you don't need it in the future.
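A minimal sketch of that sequence for the table in the question (the index name idx_t_date is arbitrary):
CREATE INDEX idx_t_date ON t (`date`);
DELETE FROM t WHERE date="2013-11-24";
-- as noted above, drop it afterwards if you will not need it:
DROP INDEX idx_t_date ON t;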

SELECT ... FOR UPDATE from one table in multiple threads

I need a little help with SELECT FOR UPDATE (resp. LOCK IN SHARE MODE).
I have a table with around 400 000 records and I need to run two different processing functions on each row.
The table structure is approximately this:
data (
`id`,
`mtime`, -- When was data1 set last
`data1`,
`data2` DEFAULT NULL,
`priority1`,
`priority2`,
PRIMARY KEY `id`,
INDEX (`mtime`),
FOREIGN KEY ON `data2`
)
Functions are a little different:
first function - has to run in a loop over all records (it is pretty fast); it should select records based on priority1, and it sets data1 and mtime
second function - has to run only once on each record (it is pretty slow); it should select records based on priority2, and it sets data1 and mtime
They shouldn't modify the same row at the same time, but the select may return one row in both of them (priority1 and priority2 have different values), and it's okay for a transaction to wait if that's the case (and I'd expect that this would be the only case when it would block).
I'm selecting data based on following queries:
-- For the first function - not processed first, then the oldest,
-- the same age goes based on priority
SELECT id FROM data ORDER BY mtime IS NULL DESC, mtime, priority1 LIMIT 250 FOR UPDATE;
-- For the second function - only not-yet-processed records, ordered by priority
SELECT id FROM data WHERE data2 IS NULL ORDER BY priority2 LIMIT 50 FOR UPDATE;
But what I am experiencing is that only one query returns at a time.
So my questions are:
Is it possible to acquire two separate locks in two separate transactions on separate bunch of rows (in the same table)?
Do I have that many collisions between the first and second query? (I have trouble debugging that; any hint on how to debug SELECT ... FROM (SELECT ...) WHERE ... IN (SELECT) would be appreciated.)
Can ORDER BY ... LIMIT ... cause any issues?
Can indexes and keys cause any issues?
Key things to check for before getting much further:
Ensure the table engine is InnoDB, otherwise "for update" isn't going to lock the row, as there will be no transactions.
Make sure you're using the "for update" feature correctly. If you select something for update, it's locked to that transaction. While other transactions may be able to read the row, it can't be selected for update, updated or deleted by any other transaction until the lock is released by the original locking transaction.
To keep things clean, try explicitly starting a transaction using "START TRANSACTION", run your select "for update", do whatever you're going to do to the records that are returned, and finish up by explicitly executing a "COMMIT" to close out the transaction.
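For example, a minimal sketch of that pattern using the first query from the question (the commented UPDATE stands in for whatever processing you actually do):
START TRANSACTION;
SELECT id FROM data ORDER BY mtime IS NULL DESC, mtime, priority1 LIMIT 250 FOR UPDATE;
-- [ process the returned rows, e.g. UPDATE data SET data1 = ..., mtime = NOW() WHERE id IN (...) ]
COMMIT;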
Order and limit will have no impact on the issue you're experiencing as far as I can tell; whatever was going to be returned by the SELECT will be the rows that get locked.
To answer your questions:
Is it possible to acquire two separate locks in two separate transactions on separate bunch of rows (in the same table)?
Yes, but not on the same rows. Locks can only exist at the row level in one transaction at a time.
Do I have that many collisions between the first and second query? (I have trouble debugging that; any hint on how to debug SELECT ... FROM (SELECT ...) WHERE ... IN (SELECT) would be appreciated.)
There could be a short period where the row lock is being calculated, which will delay the second query; however, unless you're running many hundreds of these SELECT ... FOR UPDATE statements at once, it shouldn't cause any significant or noticeable delays.
Can ORDER BY ... LIMIT ... cause any issues?
Not in my experience. They should work just as they always would on a normal select statement.
Can indexes and keys cause any issues?
Indexes should exist as always to ensure sufficient performance, but they shouldn't cause any issues with obtaining a lock.
All points in the accepted answer seem fine except for these two:
"whatever was going to be returned by the Select will be the rows that get locked." &
"Can indexes and keys cause any issues?
but they shouldn't cause any issues with obtaining a lock."
Instead, all the rows that the database reads internally while deciding which rows to select and return will be locked. For example, the query below will lock all rows of the table but might select and return only a few rows:
select * from table where non_primary_non_indexed_column = ? for update
Since there is no index, the database will have to read the entire table to find the desired rows, and hence it locks the entire table.
If you want to lock only one row, you need to specify its primary key or an indexed column in the WHERE clause. Indexing thus becomes very important for locking only the appropriate rows.
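By contrast, a lookup through the primary key (or another indexed column) only locks the matching row(s); using the data table from the question above:
select * from data where id = ? for update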
This is a good reference - https://dev.mysql.com/doc/refman/5.7/en/innodb-locking-reads.html

MySQL Query Optimization for large table

I have a very large database of images and I need to run an update to increment the view count on the images. Every hour there are over one million unique rows to update. Right now it takes about an hour to run this query; is there any way to make it run faster?
I'm creating a memory table:
CREATE TABLE IF NOT EXISTS tmp_views_table (
`key` VARCHAR(7) NOT NULL,
views INT NOT NULL,
primary key ( `key` )
) ENGINE = MEMORY
Then I insert 1000 views at a time using a loop that runs until all the views have been inserted into the memory table:
insert low_priority into tmp_views_table
values ('key', 'count'),('key', 'count'),('key', 'count'), etc...
Then I run an update on the actual table like this:
update images, tmp_views_table
set images.views = images.views+tmp_views_table.views
where images.key = tmp_views_table.key
This last update is the one that takes around an hour; the memory table stuff runs pretty quickly.
Is there a faster way to do this update?
You are using InnoDB, right? Try general tuning of MySQL and the InnoDB engine to allow for faster data changes.
I suppose you have an index on the key field of the images table. You can also try your update query without an index on the memory table - in that case the query optimizer should choose a full table scan of the memory table.
I have never used joins with UPDATE statements, so I don't know exactly how they are executed, but maybe the JOIN is taking too long. Maybe you can post an EXPLAIN result of that query.
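For reference, MySQL 5.6.3 and later can EXPLAIN data-modifying statements directly, so something like this (a sketch against the tables above) should show how the join is resolved:
EXPLAIN
UPDATE images, tmp_views_table
SET images.views = images.views + tmp_views_table.views
WHERE images.key = tmp_views_table.key;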
Here is what I have used in one project to do something similar - insert/update real-time data into a temp table and merge it into an aggregate table once a day - so you can try whether it executes faster.
INSERT INTO st_views_agg (pageid, pagetype, day, count)
SELECT pageid, pagetype, DATE(`when`) AS day, COUNT(*) AS count
FROM st_views_pending
WHERE pagetype = 4
GROUP BY pageid, pagetype, day
ON DUPLICATE KEY UPDATE count = count + VALUES(count);

MySQL - How do I efficiently get the row with the lowest ID?

Is there a faster way to update the oldest row of a MySQL table that matches a certain condition than using ORDER BY id LIMIT 1 as in the following query?
UPDATE mytable SET field1 = '1' WHERE field1 = 0 ORDER BY id LIMIT 1;
Note:
Assume the primary key is id and there is also a index on field1.
We are updating a single row.
We are not updating strictly the oldest row, we are updating the oldest row that matches a condition.
We want to update the oldest matching row, i.e. the lowest id, i.e. the head of the FIFO queue.
Questions:
Is the ORDER BY id necessary? How does MySQL order by default?
Real world example
We have a DB table being used for a email queue. Rows are added when we want to queue emails to send to our users. Rows are removed by a cron job, run each minute, processing as many as possible in that minute and sending 1 email per row.
We plan to ditch this approach and use something like Gearman or Resque to process our email queue. But in the meantime I have a question on how we can efficiently mark the oldest item of the queue for processing, a.k.a. the row with the lowest ID. This query does the job:
mysql_query("UPDATE email_queue SET processingID = '1' WHERE processingID = 0 ORDER BY id LIMIT 1");
However, it is appearing in the mysql slow log a lot due to scaling issues. The query can take more than 10s when the table has 500,000 rows. The problem is that this table has grown massively since it was first introduced and now sometimes has half a million rows and an overhead of 133.9 MiB. For example, we INSERT 6000 new rows perhaps 180 times a day and DELETE roughly the same number.
To stop the query appearing in the slow log, we removed the ORDER BY id to avoid a massive sort of the whole table, i.e.
mysql_query("UPDATE email_queue SET processingID = '1' WHERE processingID = 0 LIMIT 1");
... but the new query no longer always gets the row with the lowest id (although it often does). Is there a more efficient way of getting the row with the lowest id other than using ORDER BY id ?
For reference, this is the structure of the email queue table:
CREATE TABLE IF NOT EXISTS `email_queue` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`time_queued` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Time when item was queued',
`mem_id` int(10) NOT NULL,
`email` varchar(150) NOT NULL,
`processingID` int(2) NOT NULL COMMENT 'Indicate if row is being processed',
PRIMARY KEY (`id`),
KEY `processingID` (`processingID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Give this a read:
ORDER BY … LIMIT Performance Optimization
Sounds like you have other processes locking the table, preventing your update from completing in a timely manner - have you considered using InnoDB?
I think the 'slow part' comes from
WHERE processingID = 0
It's slow because it's not indexed. But, indexing this column (IMHO) seems incorrect too.
The idea is to change the above query to something like:
WHERE id = 0
which theoretically will be faster since it uses the index.
How about creating another table which contains the ids of rows that haven't been processed? Insertion then works twice: first insert into the real table, and second insert the id into the 'not yet processed' table. The processing part also doubles its duty: first retrieve an id from the 'not yet processed' table and delete it; the second job is, of course, the processing itself.
Of course, the id column in the 'not yet processed' table needs an index, just to ensure that selecting and deleting will be fast (a rough sketch follows).
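A rough sketch of that idea, with a hypothetical email_queue_pending table and dummy values:
CREATE TABLE email_queue_pending (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`)
);
-- queueing inserts into both tables (the values here are dummies):
INSERT INTO email_queue (mem_id, email, processingID) VALUES (123, 'user@example.com', 0);
INSERT INTO email_queue_pending (`id`) VALUES (LAST_INSERT_ID());
-- processing takes the lowest pending id and removes it:
SELECT @next_id := MIN(`id`) FROM email_queue_pending;
DELETE FROM email_queue_pending WHERE `id` = @next_id;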
This question is old, but for reference for anyone ending up here:
You have a condition on processingID (WHERE processingID = 0), and within that constraint you want to order by ID.
What's happening with your current query is that it scans the table from the lowest ID to the greatest, stopping when it finds 1 record matching the condition. Presumably, it will first find a ton of old records, scanning almost the entire table until it finds an unprocessed one near the end.
How do we improve this?
Consider that you have an index on processingID. Technically, the primary key is always appended (which is how the index can "point" to anything in the first place). So you really have an index on processingID, id. That means ordering on that will be fast.
Change your ordering to: ORDER BY processingID, id
Since you have fixed processingID to a single value with your WHERE clause, this does not change the resulting order. However, it does make it easy for the database to apply both your condition and your ordering, without scanning any records that do not match.
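Applied to the query from the question, only the ORDER BY changes:
UPDATE email_queue SET processingID = '1' WHERE processingID = 0 ORDER BY processingID, id LIMIT 1;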
One funny thing is that MySQL, by default, returns rows ordered by ID, instead of in an arbitrary order as relational theory would allow (I am not sure if this behaviour has changed in the latest versions). So the last row you get from a select should be the last inserted row. I would not rely on this, of course.
As you said, the best solution is to use something like Resque, or RabbitMQ & co.
You could use an in-memory table (volatile, but much faster) to store the latest ID there, or just use a MyISAM table to add persistence. It is simple, performs well, and takes only a little time to implement.