I’m in the process of cleaning up an old database. I have a table, User, with IDENTITY(1,1) set on the User.SequenceNumber column, which is used as the foreign key in many tables as the user’s ID number. Another table, UserNotes, had many records added erroneously over the past years, and in those records UserNotes.UserID is zero. My initial thought was to add a row with a User.SequenceNumber of zero and a User.UserName of ‘Unknown’ to restore the foreign key constraint between User.SequenceNumber and UserNotes.UserID. I have been able to do this successfully in the test area with a few records.
My concern is, when I add the ‘zero’ row, will it start causing problems in my 5 million plus database that has approximately 6 reads plus 4 records saved per minute? I found where people have problems with the database doing this when they don’t want it to happen but not where someone wanted to do this intentionally.
Inserting a 0 into a column defined as IDENTITY(1,1) should be fine as far as the database is concerned, because you would be inserting a value outside the range allocated for automatically generated values (which, assuming int, is [1..2,147,483,647]), meaning that your arbitrary value would not interfere with automatic value generation.
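As a rough illustration of the insert described above, here is a minimal T-SQL sketch. It assumes the tables live in the `dbo` schema and that `UserName` is the only other column that must be supplied; both are assumptions, not details from the question.

```sql
-- Sketch only: the dbo schema and the column list are assumptions.
-- SQL Server rejects explicit values for an IDENTITY column unless
-- IDENTITY_INSERT is switched on for that table in the current session.
SET IDENTITY_INSERT dbo.[User] ON;

INSERT INTO dbo.[User] (SequenceNumber, UserName)
VALUES (0, 'Unknown');

SET IDENTITY_INSERT dbo.[User] OFF;

-- Optional check: the current identity value should be unchanged, because
-- SQL Server only moves it when an explicitly inserted value is larger
-- than the current identity value.
SELECT IDENT_CURRENT('dbo.[User]') AS CurrentIdentity;
```

Only one table per session can have IDENTITY_INSERT set to ON at a time, so it is worth switching it off again right after the insert.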
This could affect your application, although that is not very likely. For instance, if your application sees a 0 reference as a special value not because it is 0 but because of the absence of rows with SequenceNumber = 0 in User, that logic would be broken with the addition of the 0 row.
Alternatively, you could just replace all the zeros in UserNotes.UserID with NULLs. That would seem to me a more natural way to designate an "unknown" reference. On the other hand, it would also seem to me more likely to affect the application, and possibly to a greater extent too – particularly if the application is already set to work with zeros as special reference values and designed in a way so as to immediately distinguish between a 0 and a NULL when reading the results of a query.
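A minimal T-SQL sketch of the NULL approach, assuming the `dbo` schema and that `UserID` is an `INT` column currently declared NOT NULL (if it is already nullable, the first statement is unnecessary):

```sql
-- Sketch only: schema name and column type are assumptions.
-- Allow NULLs in the referencing column.
ALTER TABLE dbo.UserNotes ALTER COLUMN UserID INT NULL;

-- Replace the placeholder zeros with NULL to mark the reference as unknown.
UPDATE dbo.UserNotes
SET UserID = NULL
WHERE UserID = 0;
```

On a busy table it may be gentler to run the UPDATE in batches rather than in a single statement.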
Either way, there is a bigger issue you seem to be facing here. The fact that UserNotes.UserID has been allowed to contain non-existing references of 0, a value not found in the User.SequenceNumber column, implies that UserNotes.UserID is not formally defined as a foreign key. When there is no formal foreign key/primary key relationship, there is no referential integrity either (from the database's standpoint anyway), and your UserNotes table is prone to getting non-existent references even after the clean-up you are carrying out.
You should really consider establishing the formal relationship between the tables at the database level. That may require additional changes to the application (depending on how it is designed), but it would be a justified time/effort investment and might spare you some headache in the future.
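Establishing the formal relationship could look roughly like the following T-SQL sketch. The `dbo` schema and the constraint name `FK_UserNotes_User` are made up for illustration, and the sketch assumes User.SequenceNumber is the primary key (or at least has a unique constraint), which a foreign key requires.

```sql
-- Sketch only: schema and constraint names are hypothetical.
-- 1. List references that have no matching User row; these must be
--    fixed (or set to NULL) before the constraint can be created.
SELECT n.UserID, COUNT(*) AS OrphanedNotes
FROM dbo.UserNotes AS n
LEFT JOIN dbo.[User] AS u
       ON u.SequenceNumber = n.UserID
WHERE n.UserID IS NOT NULL
  AND u.SequenceNumber IS NULL
GROUP BY n.UserID;

-- 2. Once the query above returns no rows, add the foreign key.
--    WITH CHECK makes SQL Server validate the existing rows as well.
ALTER TABLE dbo.UserNotes WITH CHECK
ADD CONSTRAINT FK_UserNotes_User
    FOREIGN KEY (UserID) REFERENCES dbo.[User] (SequenceNumber);
```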
It's not going to break the Identity property.
I think you're right to be concerned that an artificial change could have side effects. But I think the application-side ones are the concern: what else does your application do with the record that you haven't considered?
If the zero row is unique and you don't have any other constraints on the column, you can insert the row.
I have a system whereby users can input data into a mysql table from many sites across the globe.
The data is posted via ajax to my table without issues. But, I would like to improve my insertion code to prevent insertion if the timestamp is within some interval. This would weed out duplicate rows in my table.
Before you get angry -> I do understand I can set a primary key to certain columns and prevent duplicate insertion.
In my use case, I need to allow duplications of the numeric data where it is truly duplicated values from a unique submission -> this is valid in my case. I would like to leverage the timestamp to weed out obvious double insertions where the variables were submitted by accident twice.
I have tried to disable the button for 1-2 seconds, but this hasn't solved the problem entirely.
If I have columns: weight, height, country and the timestamp, I'd like to somehow check if there is an insert within n seconds of the timestamp, where the post includes data that matches these variables. This would tell me that there is an accidental duplication from a user and I shouldn't insert it into the database.
I'm not too familiar with MYSQL, so I was hoping to get some guidance here.
Thanks.
There are different solutions, depending on the specifics of your case:
If you need to apply some rule that validates the new row using values inside the row itself, a CHECK constraint will do. Bear in mind, though, that MySQL only enforces CHECK constraints starting in version 8.0.16; earlier versions parse them but silently ignore them.
If you want to enforce a rule in relation to other rows, you can serialize the insertions into a queue. The consumer of the queue will validate the insertions one by one and accept or reject them. Bear in mind that serialization is not a good option for a massive volume of insertions, since it produces a bottleneck (this may be your case, since you say insertions come from across the globe).
Alternatively, you can use optimistic insertion and always insert the row with an intermediate status of "waiting for validation". Other process(es) can then validate the row. If all is good, the row is approved; if not, a compensation procedure is executed, microservice-style.
Which one is your case?
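For the timestamp case in the question, the simplest form of a rule "in relation to other rows" can be sketched as a conditional insert in MySQL. Everything beyond the weight, height and country columns is an assumption made for illustration: the table name `measurements`, the `created_at` column, the column types, the index and the 5-second window.

```sql
-- Hypothetical table; only weight, height and country come from the question.
CREATE TABLE measurements (
    id         BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    weight     DECIMAL(6,2) NOT NULL,
    height     DECIMAL(6,2) NOT NULL,
    country    CHAR(2)      NOT NULL,
    created_at DATETIME     NOT NULL DEFAULT CURRENT_TIMESTAMP,
    -- Row-local rule; only effective on MySQL versions that enforce CHECK.
    CHECK (weight > 0 AND height > 0),
    -- Supports the duplicate lookup below.
    KEY idx_dedupe (country, weight, height, created_at)
);

-- Insert the submitted values only if no identical row was stored
-- during the last 5 seconds.
INSERT INTO measurements (weight, height, country)
SELECT 72.50, 180.00, 'DE'
FROM DUAL
WHERE NOT EXISTS (
    SELECT 1
    FROM measurements
    WHERE weight  = 72.50
      AND height  = 180.00
      AND country = 'DE'
      AND created_at >= NOW() - INTERVAL 5 SECOND
);

-- 1 if the row was inserted, 0 if it was treated as a duplicate.
SELECT ROW_COUNT() AS rows_inserted;
```

This is not a hard guarantee: two requests arriving at the same instant can both pass the NOT EXISTS test before either row is visible to the other. It filters the common double-click case cheaply; the queue or validation-status approaches above are the stricter options.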
I have a MYSQL table, where (to an already existing table) I added another column "Number" that is auto_incremented and has a UNIQUE KEY constraint.
There are 17000+ records in the table. After adding the "Number" column, one value is missed - there is a value of 14 369 and the next one is 14 371.
I tried removing the column and adding it again, but the missing value is still missing.
What might be the problem, and what is the least painful way to solve this?
There is no problem and there is nothing to fix.
MySQL's auto_increment provides unique values, and it calculates them using a sequential increment algorithm (it just increments a number).
That algorithm is a fast and reliable way of generating unique values.
That's its job. It doesn't "reuse" numbers, and forcing it to do so comes at a disastrous cost to performance and stability.
Since queries do fail sometimes, these numbers get "lost" and you can't have them back.
If you require sequential numbers for whatever reason, create a procedure or scheduled event and maintain the numbers yourself.
You have to bear in mind that MySQL is a transactional database designed to operate under concurrent access. If it were to reuse these numbers, the performance would be abysmal since it'd have to use locks and force people to wait until it reorganizes the numbers.
InnoDB, the default engine, uses primary key values to organize records on disk. If you were to change any of the values, it would start rewriting the records, incurring a HUGE I/O wait that depends on the amount of data on the disk - it could bring the whole server to a grinding halt.
TL;DR: there is no problem, there is nothing to fix, don't do it. If you persist, expect abnormal behavior.
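How a number gets "lost" can be reproduced with a small MySQL sketch. The table `gap_demo` is made up for illustration, and a rolled-back insert is only one of several ways a gap can appear.

```sql
-- Hypothetical demo table (InnoDB).
CREATE TABLE gap_demo (
    id INT AUTO_INCREMENT PRIMARY KEY,
    v  VARCHAR(10)
) ENGINE=InnoDB;

INSERT INTO gap_demo (v) VALUES ('a');   -- gets id 1

START TRANSACTION;
INSERT INTO gap_demo (v) VALUES ('b');   -- consumes id 2
ROLLBACK;                                -- the row is gone, the counter is not rewound

INSERT INTO gap_demo (v) VALUES ('c');   -- gets id 3

SELECT id, v FROM gap_demo;              -- ids 1 and 3; 2 stays unused

-- If gap-free numbers are only needed for display, they can be computed
-- at query time instead of stored (window functions need MySQL 8.0+).
SELECT ROW_NUMBER() OVER (ORDER BY id) AS seq, id, v
FROM gap_demo;
```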
Everyone says not to reuse deleted MySQL keys, e.g. the Stack Overflow question "I want to reuse the gaps of the deleted rows".
I have read all of the "expert" opinions but have not found a single answer that gives a valid reason why not. Everyone simply asks "why do you want to"?
Well here is a very good reason. If my users have a choice of entering URL mysite.com/person.php?id=123 or a URL mysite.com/person.php?id=123456789123, which one would they most likely prefer?
So can anyone give me a reason why re-using 123 would be a bad idea? I am actually not talking about one record. My records get added and deleted in blocks of several thousand. Updates are very rare and I am the only person who does updates.
There are also no dependencies. Nothing points to those records so there are no integrity issues with other tables.
When I want to add another block of records I will have a simple search routine that searches for the first block of unused record keys large enough to accommodate all of the records being added. Much the same way that hard disk space usage works.
Keys are usually used as unique identifiers; if they are used again, they stop being unique and become shared. That is the logic behind not reusing keys.
So I would suggest splitting the key and the ID of the user into two fields: keep the key unique, and make the ID "choosable" via a gap-finding function.
Before you split, create the new column, called user-id, and copy into it the id (which is currently your key) of each user.
Then make this column unique, so that you prevent accidental cases of id reuse.
And you are "home" free.
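The steps above might be sketched in MySQL as follows. The table name `person` and the block size of 1000 are assumptions, the column is spelled `user_id` because a hyphenated name would need backticks, and the gap query uses a window function, so it needs MySQL 8.0 or later.

```sql
-- Sketch only: `person` and `user_id` are hypothetical names.
-- 1. Add the public, reusable ID next to the existing auto-increment key.
ALTER TABLE person ADD COLUMN user_id BIGINT UNSIGNED NULL;

-- 2. Seed it from the current key.
UPDATE person SET user_id = id;

-- 3. Prevent accidental reuse of an ID that is still in use.
ALTER TABLE person ADD UNIQUE KEY uq_person_user_id (user_id);

-- 4. Gap-finding: the first run of at least 1000 unused user_id values
--    that lies between two existing values.
SELECT user_id + 1           AS gap_start,
       next_id - user_id - 1 AS gap_size
FROM (
    SELECT user_id,
           LEAD(user_id) OVER (ORDER BY user_id) AS next_id
    FROM person
) AS t
WHERE next_id - user_id - 1 >= 1000
ORDER BY gap_start
LIMIT 1;
```

The query only sees gaps between existing values. Free space below the smallest `user_id` or above the largest one has to be handled separately, for example by falling back to `MAX(user_id) + 1` when no row is returned.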
I am using MySQL in phpMyAdmin. I have a table which contains a primary key. This primary key is the 'userid', and it is also an "auto increment" field. The application also has functionality for deleting a particular user by 'userid'. After deleting a user, when I create a new user, the 'userid' gets the next integer value. I want the table to take the deletion into account and assign primary key values that have been freed by deletion.
Example:
The 'userid' values in the table are 1, 2, 3, 4, 5, 6, 7, ...
I deleted the userid with value 3.
So now, when I create the next user record, the table should use the userid value '3', as it is no longer in use. How can I do that in phpMyAdmin?
I want to do this to keep the userid values as small as possible. The userid may grow to a 5-digit value, so if a 2-digit value is available because it was deleted earlier, using it will save memory usage of the database.
It is entirely possible to assign the ID that is no longer used by explicitly providing it in the next insert you make. AUTO_INCREMENT only assigns an id if you do not supply it yourself.
Be certain though that the ID is really not being used, otherwise the insertion will fail.
That being said, I would discourage doing this. As far as I know, an integer column in MySQL always occupies the full storage of its declared type (4 bytes for INT), regardless of how many digits the stored value has, so a smaller number saves no space; I am open to clarification on this point. In any case, I believe the minor benefit of potentially using a little less space is not worth risking failure by tinkering with your IDs.
In my experience, such little things have a tendency to haunt you later on, and I do not see the real benefit.
I suggest looking for other ways to improve memory usage if necessary.
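For reference, the explicit-ID insert mentioned at the start of this answer looks roughly like this in MySQL. The table `users` and its `name` column are made up for the example; the statements can be run from the SQL tab in phpMyAdmin.

```sql
-- Hypothetical table for illustration.
CREATE TABLE users (
    userid INT AUTO_INCREMENT PRIMARY KEY,
    name   VARCHAR(50) NOT NULL
);

INSERT INTO users (name) VALUES ('a'), ('b'), ('c'), ('d');  -- userid 1 to 4

DELETE FROM users WHERE userid = 3;

-- Supplying the value yourself bypasses AUTO_INCREMENT for this row.
-- It fails with a duplicate-key error if 3 is still in use.
INSERT INTO users (userid, name) VALUES (3, 'e');

-- Leaving userid out resumes normal numbering after the highest value.
INSERT INTO users (name) VALUES ('f');                       -- userid 5
```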
I have some mysql tables that have auto incrementing id's that are primary keys, but I notice that I never actually use them... I used to think that every table must have a primary key so I guess that is why I created them before. Should I remove them all if I don't use them at all?
Unless you are running into space problems I wouldn't remove them.
They are a life saver in case you by mistake (or oversight) populate the database with repeated/wrong data.
They also help to have related tables, where you reference the content on one table through the autogenerated id.
This is assuming you have indexes for the other columns you use to actually query the data (if you don't, that is all the more reason to keep the autoincrement ids and use them!).
No.
You should keep them; a database always needs something that differentiates a row from another row (a "Key" of some sort).
If you have something that is guaranteed to be unique for each row, then you can use that as a key; otherwise keep the Primary Key and the Auto generated ID.
I'd personally keep them. They will be especially useful at a later date if you expand the database design and need to reference this table.
Interesting!...
I seem to hold a minority opinion here, getting both upvoted and downvoted to currently an even 0, yet no one in the majority opinion (see responses above) seems to make much of a case for keeping the id field, and the downvoters didn't even bother leaving comments hinting at why doing away with the id is such a bad idea.
In their defense, my own original response did not include any strong argument as to why it is OK to do away with the id attribute in some cases (which seem to apply to the OP). Maybe such a gratuitous response is, in and of itself, downvotable.
Please do educate me, and the OP, by leaving comments for or against the _systematic_ (and I stress "systematic") need to include auto-incremented non-semantic primary keys in all tables. As promised, I returned and added to my response a list of reasons why it may be detrimental to [again, systematically] impose an auto-incremented PK.
My original response:
You bet! You can remove these!
Before you do anything to the database, make sure you have a backup, in particular if the DB size is significant.
Use the ALTER TABLE statement to remove the id in the tables where you want to remove it. Specifically:
ALTER TABLE myTable DROP COLUMN id;
(In some DBMSs, such as SQL Server, you also need to remove the PK constraint before removing the id, if the table has such a constraint; MySQL removes a dropped column from any index it is part of, including the primary key.)
EDIT (Added later)
There are many cases where it just doesn't make sense to carry along an autoincremented ID key, regardless of the relatively little extra storage these keys require.
In all these cases, the underlying implication is that
either the data itself supplies a primary key,
or, the application manages the key generation
The key supplied "natively" in the data doesn't necessarily need to be a single-column key; it can be a composite key, although in these cases one may wish to study the situation more closely, particularly if the overall key is a bit long.
Here are some of the drawbacks of using an auto-incremented primary key in lieu of a native or application-supplied key:
The effective data integrity may go unchecked
i.e. the server may allow record insertions or updates which create a duplicated [native] key (even though the artificial, auto-incremented primary key hides this reality)
When relying on the auto-incremented PK for the support of joins between tables, when part of the [native] key values have to be updated...
...we either create the need to delete the record in full and re-insert it with the new values,
...or the risk of keeping outdated/incorrect links.
A common "follow-up" with auto-incremented keys is to create a clustered index on the table for this key.
This does make sense for tables without a native or application-supplied primary key, not so much for data sets that have such keys.
Effectively this prevents choosing a key for the clustered index which may be more beneficial for the most common query patterns.
Migrating tables with an auto-incremented key can be made more difficult depending on the DBMS (need to declare the underlying column as a plain integer prior to the copy, then need to restart the autoincrement...)
For narrow tables, i.e. tables with a few columns only, the relative cost of the auto-incremented PK can be significant, and impact performance in a non negligible fashion.
When inserting new records along with associated records in related tables, the auto-incremented key needs to be obtained after the insertion of the main record, before the related records can be inserted; the logic is simpler when the column values supporting the link are known ahead of time.
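The extra round trip described in the last point can be illustrated with a small MySQL sketch. The `orders` and `order_lines` tables are hypothetical; `LAST_INSERT_ID()` is the MySQL function that returns the value generated by the most recent insert on the current connection.

```sql
-- Hypothetical parent/child tables.
CREATE TABLE orders (
    id       INT AUTO_INCREMENT PRIMARY KEY,
    customer VARCHAR(50) NOT NULL
);

CREATE TABLE order_lines (
    order_id INT NOT NULL,
    line_no  INT NOT NULL,
    item     VARCHAR(50) NOT NULL,
    PRIMARY KEY (order_id, line_no),
    FOREIGN KEY (order_id) REFERENCES orders (id)
);

START TRANSACTION;

-- The parent's key does not exist until this insert has run...
INSERT INTO orders (customer) VALUES ('Alice');

-- ...so it has to be fetched before the child rows can be written.
SET @order_id = LAST_INSERT_ID();

INSERT INTO order_lines (order_id, line_no, item)
VALUES (@order_id, 1, 'widget'),
       (@order_id, 2, 'gadget');

COMMIT;
```

With a key that is known ahead of time (for example an application-generated order number), parent and child rows could be prepared without the intermediate read.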
To summarize, the idea that so long as the storage can carry the [relatively minimal] extra "weight" of the artificial primary key, we should include and use such a key, is not without drawbacks of its own.
A final consideration is that just like it is rather easy to remove such keys when we don't need them, they too can be easily added, post-facto, when/if it becomes apparent that they are useful in a particular situation. Neither form of refactoring (adding vs. removing the auto-incremented columns) is risk free, but neither is a major production either.
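The first drawback in the list (integrity going unchecked) is easy to reproduce. The following MySQL sketch uses two made-up tables, one keyed by an auto-incremented id and one keyed by the data itself:

```sql
-- Hypothetical tables for comparison.
CREATE TABLE country_surrogate (
    id       INT AUTO_INCREMENT PRIMARY KEY,
    iso_code CHAR(2)      NOT NULL,
    name     VARCHAR(100) NOT NULL
);

CREATE TABLE country_natural (
    iso_code CHAR(2)      NOT NULL PRIMARY KEY,
    name     VARCHAR(100) NOT NULL
);

-- Both inserts succeed: the artificial id makes the rows "different"
-- even though they describe the same country.
INSERT INTO country_surrogate (iso_code, name) VALUES ('FR', 'France');
INSERT INTO country_surrogate (iso_code, name) VALUES ('FR', 'France');

-- The second insert is rejected with a duplicate-key error.
INSERT INTO country_natural (iso_code, name) VALUES ('FR', 'France');
INSERT INTO country_natural (iso_code, name) VALUES ('FR', 'France');
```

A table that keeps the auto-incremented key can still get the same protection by adding a UNIQUE constraint on the native key columns.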
Yes, if you can figure out another primary key.
There is probably a flaw in your table design. For example, suppose you had a table like
relation_id(PK), parent_id, child_id .
If it is known that the combination of parent_id and child_id is unique, you can make parent_id + child_id the primary key and then drop the column relation_id.
There are endless other possible cases, but just bear in mind that a primary key helps you locate data quickly, as well as helping your design make sense.
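In MySQL, the change described for the relation table might look like the following sketch. The table name `relation` and the column types are assumptions; only the three column names come from the example above.

```sql
-- Hypothetical starting point matching the example.
CREATE TABLE relation (
    relation_id INT AUTO_INCREMENT PRIMARY KEY,
    parent_id   INT NOT NULL,
    child_id    INT NOT NULL
);

-- 1. Confirm the pair really is unique; this should return no rows.
SELECT parent_id, child_id, COUNT(*) AS occurrences
FROM relation
GROUP BY parent_id, child_id
HAVING COUNT(*) > 1;

-- 2. Drop the artificial key (this also removes the old primary key)...
ALTER TABLE relation DROP COLUMN relation_id;

-- 3. ...and promote the pair to primary key.
ALTER TABLE relation ADD PRIMARY KEY (parent_id, child_id);
```

Each ALTER rebuilds the table, so on a large table it is worth taking a backup first and running this in a quiet period.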