How to properly clean up a table - mysql

In order to determine how often some object has been used, I use a table with the following fields:
id - objectID - timestamp
Every time an object is used, its ID and time() are added in. This allows me to determine how often an object has been used in the last hour/minute/second, etc.
After one hour, the row is useless (I'm not checking beyond one hour). However, it is my understanding that it is unwise to simply delete the row, because it may mess up the primary key (the auto_increment ID).
So I added a field called "active". Prior to checking how often an object has been used, I loop over all rows WHERE active=1 and set active to 0 if more than one hour has passed. I don't think this would give any concurrency problems between multiple users, but it leaves me with a lot of unused data.
Now I'm thinking that maybe it's best, prior to inserting new usage data, to check whether there is a row with active=0 and then, rather than inserting a new row, update that one with the new data and set active to 1 again. However, this would require table locking to prevent multiple clients from updating the same row.
Can anyone shed some more light on this, please?

I've never heard anywhere that deleting rows messes up primary keys.
Are you perhaps attempting to ensure that the id values automatically assigned by auto_increment match those of another table? This is not necessary - you can simply use an INTEGER PRIMARY KEY as the id column and assign the values explicitly.
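For what it's worth, simply deleting expired rows is the standard approach and does no harm: the auto_increment counter just keeps counting upward, and the gaps it leaves are meaningless. A minimal sketch, assuming the table is named usage_log (the name is illustrative):

DELETE FROM usage_log WHERE `timestamp` < NOW() - INTERVAL 1 HOUR;

This can be run from application code, from cron, or with MySQL's event scheduler before the hourly counts are queried.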

You could execute an update query that matches all rows older than 1 hour:
UPDATE `table` SET active = 0 WHERE `timestamp` < NOW() - INTERVAL 1 HOUR;


MySQL - Prevent Insertion of Record if ID and Timestamp Unique Combo Constraint Within Timeframe

I have a scenario in which 3 standalone agents report uptime statuses for various hosts. If a host goes down and is offline, a downtime record should be created. Unfortunately, since the agents report at exactly the same time with the same information, I've seen duplicate entries that are 1-2 seconds apart.
I have a unique constraint on the table covering both the datetime and the host ID, so that combination cannot repeat. But if the requests from the agents come in at the same time or a second apart, a duplicate might be created despite code checks looking for an existing entry (in this case, an entry hasn't yet been created in any of the three instances when the agents report at the same time). The unique constraint won't prevent the duplicates either, since the datetime might be 1 second ahead or behind by the time the PHP / MySQL call finishes being processed...
So, what is the best way to handle this situation? Is there a way in MySQL to specify that if a record matching the unique constraint (which includes a datetime field) already exists within a certain time frame, the insert shouldn't be allowed?
Do I need to run a job that removes entries within a few seconds of each other, or is there a way to get MySQL to do this for me somehow?
Table structure looks like this:
entry_id | host_id | datetime
Entries might be:
1 | 121 | 01/17/2019 02:38:04 AM
1 | 121 | 01/17/2019 02:38:05 AM
1 | 121 | 01/17/2019 02:36:04 AM
I want to prevent the insertion of the second entry, since it's within 1 second of the last entry that was inserted. Code checks won't work, because no entries may exist at the moment the check runs. I already have a code check looking for an existing entry; since it doesn't find one, and the code can run at the same time for each request, the check fails and says a new entry should be created.
There are more datetime columns in my table, but they aren't needed to understand this situation. Any help is appreciated.
If you're using MySQL 5.7 or newer, you can define a generated column that contains the timestamp rounded to a 5- or 10-second bucket. Then you can define a unique index on that column, so you'll get a constraint violation if you try to create two records with timestamps in the same bucket.
ALTER TABLE yourTable
  ADD COLUMN datetime_10sec INT AS (10 * ROUND(UNIX_TIMESTAMP(`datetime`) / 10)) STORED,
  ADD UNIQUE INDEX (host_id, datetime_10sec);
Change both occurrences of 10 to whatever time granularity you want to use. (Note: in MySQL the generated-column keyword is STORED; PERSISTENT is the MariaDB spelling.)
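For illustration, inserting two rows whose timestamps fall into the same 10-second bucket then violates the unique index (table and column names assumed from the question):

INSERT INTO yourTable (host_id, `datetime`) VALUES (121, '2019-01-17 02:38:01');
INSERT INTO yourTable (host_id, `datetime`) VALUES (121, '2019-01-17 02:38:02');
-- the second statement fails with ERROR 1062 (23000): Duplicate entry ...

Two caveats: any fixed bucketing has boundary cases, so two timestamps 1 second apart can still straddle a bucket edge (with ROUND, e.g., :04 and :05 round to different buckets); and if your server rejects UNIX_TIMESTAMP in a generated column as nondeterministic, a deterministic alternative such as FLOOR(TO_SECONDS(`datetime`) / 10) should work instead.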

How to implement temporal data in MySQL

I currently have a non-temporal MySQL DB and need to change it to a temporal MySQL DB. In other words, I need to be able to retain a history of changes that have been made to a record over time for reporting purposes.
My first thought for implementing this was to simply do inserts into the tables instead of updates, and when I need to select the data, do a GROUP BY on some column and order by the timestamp DESC.
However, after thinking about it a bit, I realized that this will really mess things up, because the primary key for each insert (which would really just be simulating a number of updates on a single record) will be different, and that breaks any linkage that uses the primary key to join to other records in the DB.
As such, my next thought was to continue updating the main tables in the DB, but also create a new insert into an "audit table" that is simply a copy of the full record after the update, and then when I needed to report on temporal data, I could use the audit table for querying purposes.
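(For concreteness, a minimal sketch of such an audit table, using an AFTER UPDATE trigger to copy the full record after each change; the table customer and all names here are illustrative:)

CREATE TABLE customer (
  id INT PRIMARY KEY,
  name VARCHAR(100) NOT NULL
);

CREATE TABLE customer_audit (
  audit_id INT AUTO_INCREMENT PRIMARY KEY,
  id INT NOT NULL,
  name VARCHAR(100) NOT NULL,
  changed_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);

CREATE TRIGGER customer_after_update
AFTER UPDATE ON customer
FOR EACH ROW
  INSERT INTO customer_audit (id, name) VALUES (NEW.id, NEW.name);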
Can someone please give me some guidance or links on how to properly do this?
Thank you.
To make the given table R temporal (i.e., to maintain the history):
One design is to leave the table R as it is and create a new table R_Hist with the columns valid_start_time and valid_end_time.
Valid time is the time during which the fact is true.
The CRUD operations can be given as:
INSERT
- Insert into R
- Insert into R_Hist with valid_end_time as infinity
UPDATE
- Update in R
- Update valid_end_time with the current time for the “latest” tuple in R_Hist
- Insert the new version into R_Hist with valid_end_time as infinity
DELETE
- Delete from R
- Update valid_end_time with the current time for the “latest” tuple in R_Hist
SELECT
- Select from R for ‘snapshot’ queries (implicitly the ‘latest’ state)
- Select from R_Hist for temporal operations
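A minimal sketch of this R / R_Hist design, using a product table as R ('9999-12-31' stands in for infinity; all names are illustrative):

CREATE TABLE product (
  id INT PRIMARY KEY,
  price DECIMAL(10,2) NOT NULL
);

CREATE TABLE product_hist (
  id INT NOT NULL,
  price DECIMAL(10,2) NOT NULL,
  valid_start_time DATETIME NOT NULL,
  valid_end_time DATETIME NOT NULL DEFAULT '9999-12-31 00:00:00',
  PRIMARY KEY (id, valid_start_time)
);

-- UPDATE: change the price, close the current history row, insert the new version
UPDATE product SET price = 9.99 WHERE id = 1;
UPDATE product_hist SET valid_end_time = NOW()
  WHERE id = 1 AND valid_end_time = '9999-12-31 00:00:00';
INSERT INTO product_hist (id, price, valid_start_time)
  VALUES (1, 9.99, NOW());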
Alternatively, you can design a new table for every attribute of table R. This design captures attribute-level temporal data, as opposed to the entity-level data of the previous design. The CRUD operations are almost the same.
I added a column Deleted and a column DeletedDate. Deleted defaults to false and DeletedDate to NULL.
Composite primary key on IDColumn, Deleted, and DeletedDate.
You can index by Deleted so you have really fast queries.
No duplicate primary key on your IDColumn, because the primary key includes Deleted and DeletedDate.
Assumption: you won't write to the same record more than once per millisecond. This could cause a duplicate primary key issue if DeletedDate is not unique.
So then I do a transaction-type deal for updates: select the row, take the results, update specific values, then insert. Really it's an update setting Deleted to true and DeletedDate to NOW(); then you have it spit out the row after the update and use that to get the primary key and/or any values not available to whatever API you built.
Not as good as a temporal table, and it takes some discipline, but it builds history into one table that is easy to report on.
I may start updating the DeletedDate column into a combined Added/Deleted date, in addition to the added date, so I can sort records by that one column, while always setting the AddedBy column to the same value for logging's sake.
Either way, you can do a CASE expression (DeletedDate when not null, else AddedDate) aliased as AddedDate and ORDER BY AddedDate DESC. So, yeah, this works.
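A sketch of this single-table design (names assumed from the answer above). One caveat: MySQL requires primary key columns to be NOT NULL, so a far-future sentinel date has to stand in for "not deleted" instead of NULL:

CREATE TABLE records (
  IDColumn INT NOT NULL,
  Payload VARCHAR(100),
  Deleted TINYINT(1) NOT NULL DEFAULT 0,
  DeletedDate DATETIME(3) NOT NULL DEFAULT '9999-12-31 00:00:00.000',  -- sentinel = not deleted
  PRIMARY KEY (IDColumn, Deleted, DeletedDate),
  KEY idx_deleted (Deleted)
);

-- An "update" closes the live row, then inserts the new version:
UPDATE records SET Deleted = 1, DeletedDate = NOW(3)
  WHERE IDColumn = 1 AND Deleted = 0;
INSERT INTO records (IDColumn, Payload) VALUES (1, 'new value');

DATETIME(3) keeps millisecond precision, matching the once-per-millisecond assumption above.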

unique id without auto_increment

I have an existing schema with a non-auto-incrementing primary key. The key is used as a foreign key in a dozen other tables.
I have inherited a program with major performance problems. Currently, when a new row is added to this table, this is how a new unique id is created:
1) all existing primary key values are retrieved with a query
2) a random number is generated
3) if the number does not exist in the retrieved values, use it, otherwise goto (2)
The app is multi-threaded and multi-server, so simply grabbing the existing ids once at startup isn't an option. I do not have unique information from the initiating request to grab and convert into a pseudo-unique value (like a member id).
I understand it is theoretically possible to perform surgery on the internals to add autoincrementing to an existing primary key. I understand it would also be possible to systematically drop all foreign keys pointing to this table, then create-rename-insert a new version of the table, then add back foreign keys, but this table format is dictated by a third-party app and if I mess this up then Bad Things happen.
Is there a way to leverage sql/mysql to come up with unique row values?
The closest I have come up with is choosing a number randomly from a large space and hoping it is unique in the database, then retrying when the odd collision occurs.
Ideas?
If the table has a primary key that isn't being used for foreign key references, then drop that primary key. The goal is to make your column an auto-incremented primary key.
So, look for the maximum value and then the following should do what you want:
alter table t modify id int not null auto_increment primary key;
alter table t auto_increment = <maximum value> + 1;
I don't think it is necessary to explicitly set the auto_increment value, but I like to be sure.
I think you can SELECT MAX(`strange-id-column`) + 1. That value will be unique, and you can put that SQL inside a transaction together with the INSERT in order to prevent errors.
It seems really expensive to pull back a list of all primary key values (for large sets), then generate a pseudo-random value and verify it's unique by checking it against the list.
One of the big problems I see with this approach is that a pseudo-random number generator will generate the same sequence of values, when the sequence is started with the same seed value.
If that ever happens, then there will be collision after collision after collision until the sequence reaches a value that hasn't yet been used. And the next time it happens, you'd spin through that whole list again, to add one more value.
I don't understand why the value has to be random.
If there's not a requirement for pseudo-randomness, and an ascending value would be okay, here's what I would do if I didn't want to make any changes to the existing table:
I'd create another "id-generator" table that has an auto_increment column, and perform inserts into that table to generate id values.
Instead of running a query to pull back all existing id values from the existing table, I'd perform an INSERT into the "id-generator" table, then a SELECT LAST_INSERT_ID() to retrieve the id of the row I just inserted, and use that as the "generated" id value.
Basically, emulating an Oracle SEQUENCE object. It wouldn't be necessary to keep all of the rows in "id-generator" table. So, I could perform a DELETE of all rows that have an id value less than the maximum id value.
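A sketch of that id-generator table (names are illustrative):

CREATE TABLE id_generator (
  id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY
);

INSERT INTO id_generator () VALUES ();
SELECT LAST_INSERT_ID();  -- the generated id; tracked per connection, so no race with other sessions

-- optional housekeeping: keep only the highest row
DELETE FROM id_generator WHERE id < LAST_INSERT_ID();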
If there is a requirement for pseudo-randomness (shudder) I'd probably just attempt the INSERT as a way to find out if the key exists or not. If the insert fails due to a duplicate key, I'd try again with a different id value.
The repeated sequence from a pseudo-random generator scares me... if I got several collisions in a row, are these from a previously used sequence, or are they values from a different sequence? I don't have any way of knowing. And if I abandon the sequence and restart with a new seed, and that seed has been used before, I'm off chasing another series of previously generated values.
For low levels of concurrency (average concurrent ongoing inserts < 1), you can use optimistic locking to achieve a unique id without auto_increment:
Set up a one-row table for this function, e.g.:
create table last_id (last_id bigint not null default 0);
insert into last_id values (0); -- seed the single row
To get your next id, retrieve this value in your app code, apply your newId function, and then attempt to update the value, e.g.:
select last_id from last_id; -- in DB
newId = lastId + 1 -- in app code
update last_id set last_id = $newId where last_id = $lastId -- in DB
Check the number of rows that were updated. If it was 0, another server beat you to it and you should return to step 1.
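The same attempt can be expressed purely in SQL with a user variable; retry from the SELECT whenever ROW_COUNT() reports 0:

SELECT last_id INTO @last FROM last_id;
UPDATE last_id SET last_id = @last + 1 WHERE last_id = @last;
SELECT ROW_COUNT();  -- 1: success, your new id is @last + 1; 0: lost the race, retry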

Reserve a block of auto-increment ids in MySQL

I receive batches of, say, 100 items that I need to insert into three related MySQL tables: say current, recent, and historical. I want to insert each batch in each table as a group in a single insert statement for speed. The current table has an auto-increment primary key id that I need to obtain for each inserted row and use as the primary key to insert the same row in the recent and historical tables. My idea is to get the current auto_increment value for current, increment it by 100 using alter table current AUTO_INCREMENT=, then insert the 100 rows into current with programmatically set ids from the block that I just "reserved". Then I can use the same 100 reserved id values for the inserts into the recent and historical tables without having to query them again from the current table.
My question: Is there some reason that this is a bad idea? I have seen nothing about it on the web. The closest I have seen on stack overflow is Insert into an auto increment field but that is not quite the same thing. I can see possible threading issues if I tried to do this from more than one thread at a time.
I'm also open to other suggestions on how to accomplish this.
There might be concurrency issues: If another connection inserts values between the time you get the current value and you set the new value, you would get duplicate keys.
I don't know whether that can happen in your situation, however, or whether the inserts happen only from the batch described above, with never another instance of it running in parallel.
Methinks you should decouple the IDs of the 3 tables, and using ALTER TABLE sounds very fishy too.
The most proper way I can think of:
In recent and historical, add a column that references the current ID; don't try to force the primary IDs to be the same.
Acquire a WRITE table lock on current.
Get the auto_increment value X for current.
Insert your 100 records; their IDs should now run from X+1 to X+100.
Release the table lock.
Insert records into recent and historical with the known IDs in the extra column.
Note: I'm not sure if the auto_increment value points to the next ID or to the current highest value. If you use MAX(id) as X, the steps above work as written.
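A sketch of that flow, with a batch of 3 (the payload column and the current_id reference column are illustrative):

LOCK TABLES `current` WRITE;
SELECT COALESCE(MAX(id), 0) INTO @x FROM `current`;        -- X = highest existing id
INSERT INTO `current` (payload) VALUES ('a'), ('b'), ('c'); -- ids X+1 .. X+3
UNLOCK TABLES;
INSERT INTO recent (current_id, payload)
  SELECT id, payload FROM `current` WHERE id BETWEEN @x + 1 AND @x + 3;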
This is [a bit] late, but in case someone else has this same question (as I did):
As Ethan pointed out in his comment, auto_increment is an internal MySQL utility to produce unique keys. Since you have the ability to generate your own id values external to MySQL, I suggest removing the auto_increment overhead from the table (but keep id as PK, for transport to the other tables). You can then insert your own id values along with the data.
Obviously, once you do this you'll have to program your own incrementing id values. To retrieve a "starting point" for each batch while keeping the speed of a single batch INSERT call, create another table (I'll call it management) with just a single record holding last_id, which is equivalent to, but independent of, max(id) of your three primary tables. Then, each time a new batch is ready to be processed, start a transaction on management with a write lock, read management.last_id, UPDATE management.last_id to last_id + the number of rows in the batch, then close the transaction. You now have a block of sequential id values reserved for that batch, because any future read of management.last_id will return the next-larger set of id values.
The write-locked transaction removes any concurrency issues (as stated in FrankPI's answer), because any other process attempting to read management must wait for the lock to be released and will see the value after the UPDATE. This also removes the id ambiguity in JvO's answer ("...IDs should now run from X+1 to X+100"), which can be a dangerous assumption.
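A sketch of the reservation step for a batch of 100, using the management table described above:

CREATE TABLE management (last_id BIGINT NOT NULL);
INSERT INTO management VALUES (0);

START TRANSACTION;
SELECT last_id INTO @start FROM management FOR UPDATE;  -- row stays write-locked until COMMIT
UPDATE management SET last_id = last_id + 100;
COMMIT;
-- ids @start + 1 .. @start + 100 are now reserved for this batch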

mysql delete, autoincrement

I have a table in MySQL using InnoDB, and it has a column named "id".
My problem is that whenever I delete the last row from the table and then insert a new one, the new row's id continues past the deleted id.
I mean, suppose the last id is 32 and I delete it; if I then insert a new row, the id auto-increments to 33. So the serial sequence is broken, i.e., id = 30, 31, 33 and no 32.
So please help me assign the id 32 instead of 33 whenever I insert after deleting the last row.
Short answer: No.
Why?
It's unnecessary work. It doesn't matter if there are gaps in the serial number.
If you don't want that, don't use auto_increment.
Don't worry, you won't run out of numbers if your column is of type int or even bigint, I promise.
There are reasons why MySQL doesn't automatically decrease the auto-increment value when you delete a row. Among them:
danger of broken data integrity (imagine multiple users performing deletes or inserts... duplicate entries may occur, or worse)
errors may occur when you use master-slave replication or transactions
and so on...
I highly recommend you don't waste time on this! It's really, really error prone.
You have two major misunderstandings about how a relational database works:
there is no such thing as the "last row" in a relational database.
The ID (assuming that it is your primary key) has no meaning whatsoever. It doesn't matter if the new row is assigned 33, 35354, or 236532652632. It's just a value that uniquely identifies that row.
Do not rely on consecutive values in your primary key column.
And do not try the max(id)+1 approach. It will simply not work in a system with more than one transaction.
You should stop fighting this; even using SELECT max(id) will not fix it properly with a transactional storage engine like InnoDB.
Why, you might ask? Imagine that you have 2 transactions, A and B, started almost at the same time, both doing an INSERT. Transaction A needs a new row id and takes it from the invisible sequence associated with this table (the AUTO_INCREMENT value), say 21. Transaction B will use the next value (say 22) - so far so good.
But what if transaction A rolls back? Value 21 cannot be reused, and 22 is already committed. And what if there were 10 such transactions?
And MAX(id) can assign the same value to both A and B, so that is not valid either.
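You can watch a value get burned in a single session (assuming a fresh InnoDB table t with an auto-increment id):

CREATE TABLE t (id INT AUTO_INCREMENT PRIMARY KEY, v INT) ENGINE=InnoDB;
START TRANSACTION;
INSERT INTO t (v) VALUES (1);  -- consumes id 1
ROLLBACK;                      -- the row is gone, but id 1 is not returned to the counter
INSERT INTO t (v) VALUES (2);  -- gets id 2
SELECT * FROM t;               -- a single row with id = 2; the gap at 1 is permanent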
I suppose you mean "whenever I delete the last row from the table", don't you?
Anyway, this is how auto-increment works. It's made to keep data relations correct: if another table refers to the id of a record that has been deleted, it's more correct to get an error than to get a different record back when querying that id.
Anyway here you can see how to get the first free id in a field.