This fairly obvious question has surprisingly few (I couldn't find any) solid answers.
I do a simple SELECT on a table of 2 million rows:
select count(id) as total from big_table
On any machine I try this query on, it usually takes at least 5 seconds to complete. That is unacceptable for real-time queries.
The reason I need an exact row count is for precise statistical calculations later on.
Using the last auto-increment value is unfortunately not an option, because rows also get deleted periodically.
It can indeed be slow when running on the InnoDB engine. As stated in section 14.24 of the MySQL 5.7 Reference Manual, “InnoDB Restrictions and Limitations”, 3rd bullet point:
InnoDB does not keep an internal count of rows in a table because concurrent transactions might “see” different numbers of rows at the same time. Consequently, SELECT COUNT(*) statements only count rows visible to the current transaction.
For information about how InnoDB processes SELECT COUNT(*) statements, refer to the COUNT() description in Section 12.20.1, “Aggregate Function Descriptions”.
The suggested solution is a counter table. This is a separate table with one row and one column holding the current record count. It can be kept up to date via triggers. Something like this:
create table big_table_count (rec_count int default 0);
-- one-shot initialisation:
insert into big_table_count select count(*) from big_table;
create trigger big_insert after insert on big_table
for each row
update big_table_count set rec_count = rec_count + 1;
create trigger big_delete after delete on big_table
for each row
update big_table_count set rec_count = rec_count - 1;
You can see a fiddle here; alter the insert/delete statements in its build section to see the effect on:
select rec_count from big_table_count;
You could extend this to several tables, either by creating such a counter table for each one, or by reserving one row per table in a single counter table keyed by a "table_name" column.
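A possible sketch of that variant, assuming the triggers above are replaced (the table and column names here are just illustrative):
create table table_row_counts (
  table_name varchar(64) primary key,
  rec_count  int not null default 0
);
-- one-shot initialisation for big_table:
insert into table_row_counts (table_name, rec_count)
select 'big_table', count(*) from big_table;
-- each trigger now addresses the row for its own table:
create trigger big_insert after insert on big_table
for each row
update table_row_counts set rec_count = rec_count + 1
where table_name = 'big_table';
create trigger big_delete after delete on big_table
for each row
update table_row_counts set rec_count = rec_count - 1
where table_name = 'big_table';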
Improving concurrency
The above method does have an impact when many concurrent sessions insert or delete records, because they have to wait for each other to finish updating the counter.
A solution is not to let the triggers update the same single record, but to have each of them insert a new record, like this:
create trigger big_insert after insert on big_table
for each row
insert into big_table_count (rec_count) values (1);
create trigger big_delete after delete on big_table
for each row
insert into big_table_count (rec_count) values (-1);
The way to get the count then becomes:
select sum(rec_count) from big_table_count;
Then, once in a while (e.g. daily) you should re-initialise the counter table to keep it small:
truncate table big_table_count;
insert into big_table_count select count(*) from big_table;
Related
I need to set a maximum limit of rows in my MySQL table. The documentation tells us that one can use the following SQL code to create the table:
CREATE TABLE `table_with_limit` (
  `id` int(11) DEFAULT NULL
) ENGINE=InnoDB MAX_ROWS=100000
But the MAX_ROWS property is not a hard limit ("store no more than 100,000 rows and delete the rest"), only a hint to the storage engine about how many rows the table is expected to hold.
The only possible way I see to solve the problem is to use a BEFORE INSERT trigger which checks the row count of the table and deletes the older rows. But I'm pretty sure that this is a huge overhead :/
Another solution is to clear the table with a cron script every N minutes. That is the simplest way, but it still needs another system to watch over.
Does anyone know a better solution? :)
Try making a restriction on adding new records to the table: raise an error when a new record is about to be added.
DELIMITER $$
CREATE TRIGGER trigger1
BEFORE INSERT
ON table1
FOR EACH ROW
BEGIN
SELECT COUNT(*) INTO @cnt FROM table1;
IF @cnt >= 25 THEN
CALL sth(); -- calling a non-existent procedure raises an error and aborts the INSERT
END IF;
END
$$
DELIMITER ;
Note that the COUNT operation may be slow on big InnoDB tables.
On MySQL 5.5 you can use the SIGNAL / RESIGNAL statements to raise an error.
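A minimal sketch of the SIGNAL variant, assuming the same hypothetical table1 and a limit of 25 rows:
DELIMITER $$
CREATE TRIGGER limit_table1_rows
BEFORE INSERT
ON table1
FOR EACH ROW
BEGIN
  DECLARE cnt INT;
  SELECT COUNT(*) INTO cnt FROM table1;
  IF cnt >= 25 THEN
    -- raises an error with SQLSTATE '45000', aborting the INSERT
    SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Row limit reached for table1';
  END IF;
END
$$
DELIMITER ;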
Create a table with 100,000 rows.
Pre-fill one of the fields with a "time-stamp" in the past.
Select the oldest record and update its "time-stamp" when "creating" (updating) a record.
Only use SELECT and UPDATE - never INSERT or DELETE.
A reverse (descending) index on the "time-stamp" field makes the select/update fast.
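A rough sketch of that idea, with made-up table and column names:
-- fixed-size "ring" table, pre-filled once with 100,000 rows
CREATE TABLE ring (
  id INT PRIMARY KEY,
  created_at DATETIME NOT NULL,
  payload VARCHAR(255),
  KEY idx_created_at (created_at)
);
-- "inserting" a record = recycling the oldest slot
START TRANSACTION;
SELECT id INTO @slot FROM ring ORDER BY created_at ASC LIMIT 1 FOR UPDATE;
UPDATE ring SET created_at = NOW(), payload = 'new data' WHERE id = @slot;
COMMIT;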
There is no way to limit the maximum number of rows in a MySQL table, unless you write a trigger to do it.
I'm just making up an answer off the top of my head. My assumption is that you want something like a 'bucket' that you put records into, and that you want to empty it before it hits a certain record count.
After an insert statement, run SELECT LAST_INSERT_ID(), which will give you the auto-increment id of the record. Yes, you still have to run an extra query, but it is cheap. Once you reach a certain count, truncate the table and reset the auto-increment id, roughly as sketched below.
Otherwise you can't have a 'capped' table in MySQL, as you would have to pre-define the action to take (do we reject the record, do we truncate the table? etc.).
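A rough sketch of that idea (the table name, column, and threshold are made up; the check itself would live in application code):
INSERT INTO bucket_table (payload) VALUES ('...');
SELECT LAST_INSERT_ID() INTO @rid;
-- in the application: if @rid >= 100000, empty the bucket;
-- TRUNCATE TABLE also resets the AUTO_INCREMENT counter
TRUNCATE TABLE bucket_table;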
I have a table in SQL Server 2005 and 2008 databases that has a periodic job which truncates it, and another table that has a periodic job which deletes rows from the end. My question is: if a SELECT has started before the truncate or delete jobs begin, will the SELECT see the pre-truncate/delete data, or could some data be removed before the SELECT reads it? The SELECTs would use one table or the other, not join both.
The statements would be along these lines:
select * from someTable where id > x and id < y
truncate table someTable
delete from otherTable where id < z
TRUNCATE will wait for the SELECT to finish. DELETE will not: DELETE can remove rows before the SELECT has had a chance to read them. Whether this will really happen depends on more factors, like your table organization (indexes, clustered index) and cardinality (number of rows in the table, number of rows between the lower and upper bounds, number of rows below the upper bound).
You can prevent this from happening by deploying row versioning (snapshot isolation).
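If it helps, a sketch of one way to turn on row versioning at the database level in SQL Server (the database name is hypothetical):
ALTER DATABASE MyDatabase SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE MyDatabase SET READ_COMMITTED_SNAPSHOT ON;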
I am collecting readings from several thousand sensors and storing them in a MySQL database. There are several hundred inserts per second. To improve the insert performance I am storing the values initially into a MEMORY buffer table. Once a minute I run a stored procedure which moves the inserted rows from the memory buffer to a permanent table.
Basically I would like to do the following in my stored procedure to move the rows from the temporary buffer:
INSERT INTO data SELECT * FROM data_buffer;
DELETE FROM data_buffer;
Unfortunately this is not usable, because the data collection processes insert additional rows into "data_buffer" between the INSERT and the DELETE above. Those rows would thus be deleted without ever being inserted into the "data" table.
How can I make the operation atomic, or make the DELETE statement delete only the rows that were SELECTed and INSERTed by the preceding statement?
I would prefer doing this in a standard way which works on different database engines if possible.
I would prefer not adding any additional "id" columns because of performance overhead and storage requirements.
I wish there was SELECT_AND_DELETE or MOVE statement in standard SQL or something similar...
I believe this will work, but it will block until the insert is done:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
INSERT INTO data SELECT * FROM data_buffer FOR UPDATE;
DELETE FROM data_buffer;
COMMIT;
A possible way to avoid all those problems, and also stay fast, would be to use two buffer tables (let's call them data_buffer1 and data_buffer2): while the collection processes insert into data_buffer1, you do the insert and delete on data_buffer2; then you switch, so collected data goes into data_buffer2 while the contents of data_buffer1 are moved into data.
How about having a row id: get the max value before the insert, do the insert, and then delete all records <= that max(id)?
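A possible sketch of that idea, assuming an auto-increment id column were added despite the question's preference:
SET @max_id = (SELECT MAX(id) FROM data_buffer);
INSERT INTO data SELECT * FROM data_buffer WHERE id <= @max_id;
DELETE FROM data_buffer WHERE id <= @max_id;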
This is a similar solution to ammoQ's answer. The difference is that instead of having the INSERTing process figure out which table to write to, you can transparently swap the tables in the scheduled procedure.
Use RENAME in the scheduled procedure to swap tables:
CREATE TABLE IF NOT EXISTS data_buffer_new LIKE data_buffer;
RENAME TABLE data_buffer TO data_buffer_old, data_buffer_new TO data_buffer;
INSERT INTO data SELECT * FROM data_buffer_old;
DROP TABLE data_buffer_old;
This works because the RENAME TABLE statement swaps the tables atomically, so the INSERTing processes will not fail with "table not found". This is MySQL-specific, though.
I assume the tables are identical, with the same columns and primary key(s)? If that is the case, you could nest a select inside the where clause... something like this:
DELETE FROM data_buffer
WHERE primarykey IN (SELECT primarykey FROM data)
This is a MySQL specific solution. You can use locking to prevent the INSERTing processes from adding new rows while you are moving rows.
The procedure which moves the rows should be as follows:
LOCK TABLES data WRITE, data_buffer WRITE; -- both tables must be locked; a WRITE lock is needed to delete from data_buffer
INSERT INTO data SELECT * FROM data_buffer;
DELETE FROM data_buffer;
UNLOCK TABLES;
The code which INSERTs new rows in the buffer should be changed as follows:
LOCK TABLES data_buffer WRITE;
INSERT INTO data_buffer VALUES (1, 2, 3);
UNLOCK TABLES;
The INSERT process will obviously block while the lock is in place.
I am trying to copy one table over another one "atomically". Basically I want to update a table periodically, such that a process that reads from the table will not get an incomplete result if another process is updating the table.
To give some background info, I want a table that acts as a leaderboard for a game. This leaderboard will update every few minutes via a separate process. My thinking is as follows:
Table SCORES contains the publicly-viewable leaderboard that will be read from when a user views the leaderboard. This table is updated every few minutes. The process that updates the leaderboard will create a SCORES_TEMP table that contains the new leaderboard. Once that table is created, I want to copy all of its contents over to SCORES "atomically". I think what I want to do is something like:
TRUNCATE TABLE SCORES;
INSERT INTO SCORES SELECT * FROM SCORES_TEMP;
I want to replace everything in SCORES. I don't need to maintain my primary keys or auto increment values. I just want to bring in all the data from SCORES_TEMP. But I know that if someone views the scores before these 2 statements are done, the leaderboard will be blank. How can I do this atomically, such that it will never show blank or incomplete data? Thanks!
Use RENAME TABLE:
RENAME TABLE old_table TO backup_table, new_table TO old_table;
It's atomic, works on all storage engines, and doesn't have to rebuild the indexes.
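Applied to the tables from the question, that could look something like this (the SCORES_old name is just a placeholder):
RENAME TABLE SCORES TO SCORES_old, SCORES_TEMP TO SCORES;
DROP TABLE SCORES_old;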
In MySQL, because of the behavior of TRUNCATE (it causes an implicit commit), I think you'll need to:
START TRANSACTION;
DELETE FROM SCORES;
INSERT INTO SCORES SELECT * FROM SCORES_TEMP;
COMMIT;
I'm not sure there's a way to make what is effectively a DDL operation transaction-safe.
You may use transactions (for InnoDB):
START TRANSACTION;
DELETE FROM SCORES;
INSERT INTO SCORES SELECT * FROM SCORES_TEMP;
COMMIT;
or LOCK TABLES (for MyISAM):
LOCK TABLES SCORES WRITE, SCORES_TEMP READ;
DELETE FROM SCORES;
INSERT INTO SCORES SELECT * FROM SCORES_TEMP;
UNLOCK TABLES;
I don't know how MySQL deals with transactions, but in T-SQL you could write:
BEGIN TRAN
DELETE FROM SCORES
INSERT INTO SCORES SELECT * FROM SCORES_TEMP
COMMIT TRAN
This way your operation would be "atomic", but not instantaneous.