MySQL auto increment primary key running out

I maintain a table with an ID AUTO INCREMENT PRIMARY KEY. When I delete an entry and re-add one, the new entry does not take the ID of the previous one; instead it increments again by one. Is that normal, and is it advised not to change this behavior? I just have a feeling this creates a non-scalable system, as eventually it could run out of IDs.

This is by design; millions of databases have integer primary keys like this.
Even if you delete 90% of the rows you insert, you will not run out of keys until about 400 million rows remain in the table. 1)
If and when you do, you can run
ALTER TABLE `test`.`table1` MODIFY COLUMN `item_id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT
, ROW_FORMAT = DYNAMIC;
where column item_id would be your primary key.
After that you'll never have to worry about running out of key-space again.
Don't be tempted to start out with a bigint primary key!
It will make all your queries slower.
It will make the tables bigger.
On InnoDB the primary key is included in every secondary index, so a small primary key keeps those indexes smaller and makes inserts faster.
For most tables you will never need it.
If you know your big table will have more rows than an integer can hold, then by all means make it a bigint, but you should only do this for tables that really need it, especially on InnoDB tables.
Don't use a GUID; it's just a lot of wasted space, slowing everything way down for no reason 99.99% of the time.
1) Assuming an unsigned integer as the primary key.
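The 400 million figure above can be checked with a quick calculation. This is a small Python sketch rather than SQL; the 90% deletion rate is simply the number used in the answer, and the helper name is made up for illustration. The ranges are the limits MySQL documents for its integer types.

```python
# Largest values MySQL can store in each integer column type.
INT_SIGNED_MAX = 2**31 - 1        # 2,147,483,647
INT_UNSIGNED_MAX = 2**32 - 1      # 4,294,967,295
BIGINT_UNSIGNED_MAX = 2**64 - 1   # 18,446,744,073,709,551,615


def surviving_rows(max_id: int, deleted_percent: int) -> int:
    """Rows still in the table when the last available id has been used.

    Hypothetical helper: assumes ids are handed out one by one and that
    deleted_percent of all inserted rows were deleted along the way.
    """
    return max_id * (100 - deleted_percent) // 100


rows_left = surviving_rows(INT_UNSIGNED_MAX, 90)
print(f"unsigned INT, 90% deleted: {rows_left:,} rows remain")
print(f"BIGINT UNSIGNED holds {BIGINT_UNSIGNED_MAX // INT_UNSIGNED_MAX:,} times more ids")
```

So with an unsigned INT the table would hold roughly 430 million live rows at the moment the key space runs out, which is where the "about 400 million" comes from.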

User niceguy07 uploads a picture of his kitten. The picture is saved as 000012334.jpg because you use primary keys as filenames instead of putting untrusted user data into them (which is a good idea).
niceguy07 sends a link with ?picture_id=12334 to his date.
niceguy07 deletes his kitten pictures and user fatperv08 uploads a picture of himself wearing only a batman mask.
Your database reuses primary keys, so unfortunately the link with ?picture_id=12334 now points to a picture of a naked fat perv wearing a batman mask.
Re-using primary key values of deleted records is an extremely bad idea. It is, in fact, a bug if the primary key leaks out of the database because you use it in:
a URL
a filename
dumped along with other data in a file
etc
Since it is, in fact, very useful to do all of the above, not reusing primary key ids is a good idea...
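Here is a minimal sketch of the scenario above. It uses Python's built-in sqlite3 module as a stand-in, since it needs no server; SQLite's AUTOINCREMENT keyword is only a rough analogue of MySQL's AUTO_INCREMENT, and the table and column names are invented for the example.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pictures ("
    " picture_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " owner TEXT NOT NULL)"
)

# niceguy07 uploads a picture, shares the id, and later deletes the picture.
old_id = conn.execute(
    "INSERT INTO pictures (owner) VALUES (?)", ("niceguy07",)
).lastrowid
conn.execute("DELETE FROM pictures WHERE picture_id = ?", (old_id,))

# The next upload gets a fresh id instead of the deleted one.
new_id = conn.execute(
    "INSERT INTO pictures (owner) VALUES (?)", ("someone_else",)
).lastrowid

# The old link now finds nothing rather than somebody else's picture.
stale = conn.execute(
    "SELECT owner FROM pictures WHERE picture_id = ?", (old_id,)
).fetchone()
print(old_id, new_id, stale)
```

Because the id is never handed out twice, the stale link resolves to "not found", which is the safe outcome.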

It's fine. Depending on how many records you expect, you may want to make sure it's a bigint type, but int should be fine in most cases.

It is normal. Don't worry about IDs being sequential.

It depends on the number of records you plan on having and how often they are deleted and added, but if it's going to be an issue, use a bigint primary key.
There are also other options, such as using a GUID, if you are truly worried about running out of rows, but I've never run into a situation where I actually needed a bigint. I just occasionally use them on volatile tables to be safe.

A primary key in database design should be seen as a unique identifier of the info being represented. By removing an ID from a table you are essentially saying that this record shall be no more. If something were to take its place, you would be saying that the record was brought back to life. Now technically, when you removed it in the first place you should have removed all foreign key references as well. One may be tempted at this point to say that since all traces are gone, there is no reason something shouldn't take its place. Well, what about backups? Let's say you removed the record by accident and it ended up replaced by something else; you would not be able to easily restore that record. This could also potentially cause problems with rollbacks. So reusing IDs is not only unnecessary but fundamentally wrong in the theory of database design.

adding to dykstrad's great comment
For example, say the primary key points to a point-of-sale price, such as that of a Diet Coke. Sloppy housekeeping would delete the old price and reinsert the new price. Good housekeeping would dictate an update on the item, which preserves the key and its referential relationships, as it is still the same item / unique relationship.

Related

Database Table without Primary Key

I have a table used for a message board (See below). I know best practices dictate creating a Primary Key but I can see no reason to create one. I will be searching mainly on (UID, GRP_ID) and will create an index on this. Deleting will be based on Last_timestp. In this scenario, should there be a PK?
CREATE TABLE CP.CHAT
(UID BIGINT NOT NULL, GRP_ID BIGINT NOT NULL, CHAT VARCHAR(200) NOT NULL, LAST_TIMESTP TIMESTAMP);
Primary keys are not required, and your table will work just fine.
Here are some reasons why I almost always use primary keys.
They let you uniquely target a row. If you have multiple rows with the exact same data, it can be tedious to delete just one of them.
They provide an implicit ordering to the table. If you are troubleshooting your database, the sequence of the keys tells you the order they were created.
Ultimately, the room taken up by a PK is not going to be a huge overhead for the database in terms of storage space. It won't hurt to have it, but it could help to have it.
I see no need to tell you what a Primary Key is and what it is used for. It seems you already know it, and figured out the structure just fine.
The thing is, if you won't have any other tables which depend on this table, then your table setup will be sufficient for you. As you already have a UID column there, I would make that a PK, since it won't cost you much in terms of storage, but could also help you sort your table (by auto-incrementing integers) and would work as a selector for delete and update operations in the future.
Still couldn't figure out how you'll manage to delete records by Last_timestp values, though...

Please confirm my use of primary key and unique index

I think I understand primary keys and indexes.
In my setup, I have a table with several columns. Two of these columns are User ID, and Username.
Ideally I would like both to be unique, and non nullable.
As far as I can tell, my best use would be to have the User ID as the primary key, as this is the most important field not to NULL, and it will never change as the database grows.
I would then have to have the username column as a unique index, so that it can be the same on another row, although unfortunately, could end up NULL.
This is what I will do unless there is a way to have both columns as unique and non NULLABLE?
You can declare the Username column as NOT NULL and put a unique index on it. Although the index itself won't force not-null values, the field definition will, so it will effectively be a unique non-nullable field.
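A minimal sketch of that combination, run against an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL (the NOT NULL and UNIQUE keywords are standard SQL, but the table layout and the try_insert helper are assumptions made for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " user_id INTEGER PRIMARY KEY,"
    " username TEXT NOT NULL UNIQUE)"
)
conn.execute("INSERT INTO users (user_id, username) VALUES (1, 'alice')")


def try_insert(user_id, username):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO users (user_id, username) VALUES (?, ?)",
            (user_id, username),
        )
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert(2, "alice")  # same username as row 1
null_ok = try_insert(3, None)          # missing username
fresh_ok = try_insert(4, "bob")        # new, non-null username
print(duplicate_ok, null_ok, fresh_ok)
```

The duplicate is stopped by the unique index and the NULL by the column definition, so the two rules work independently of each other.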
From both my application development and data warehouse experience, I would recommend having a separate primary key that is not used in any business setting, and not using User ID as the primary key. Using UserID as the primary key can lead to a whole host of problems. I would index each column (separately).
Anytime you need to merge or reassign a user or change their ID, etc, having actually used their userID as the primary key will lead to a lot of problems for those operations.
Also, on the web, this exposes URLs like ....user/1/details, where people can change the '1' to a '2' (for example) and potentially see other people's info. It is better if the ID is something like '57489574389ghfjghfjghf', which makes the URLs harder to hack.
The choice between a 'natural' and a 'surrogate' key is explained well here:
http://www.agiledata.org/essays/keys.html
Most of the problems people experience in this area are for edge cases such as merges and deletes. These are usually of low priority initially, but concern over them will grow over time and poorly engineered solutions will start to break down (usually because, by the time data quality is 'recognized' as a problem, there is often such a large volume of 'bad' data that going forward is untenable: the old data can't be 'fixed', and without that, rules are hard to introduce for new records which will co-exist with them). This assumes that the ability to update old records is still required.
Nope, sorry to say you are incorrect, on both counts.
1) Right about everything, except that the PK can change if you want it to.
2) A unique index is, by definition, unique: its values cannot be repeated. What you mean is a plain old index, not unique, whose values can be repeated. Its purpose is to speed up querying if you filter often by that field. Otherwise it is better not to use it.
What you want: Column1 = Primary Key (not null), Column2 = Unique Index (not null), exactly what you said, but now you know why it does work as you need it to.
EDIT: Also, it seems you assume a correlation between indexes and non-nullables. You can make a column non-nullable independently of whether it is indexed or not.
Totally agree with Michael, your primary key column should not contain any meaningful data, especially like userID. So you should add another column for the PK and fill it from a sequence.
Also agree with Darhazer: you should put a not null constraint and a unique index on both the userid and username fields.

Can a database table be without a primary key?

Can anyone tell me if a table in a relational database (such as MySQL / SQL SERVER) can be without a primary key?
For example, I could have table day_temperature, where I register temperature and time. I don't see the reason to have a primary key for such a table.
Technically, you can declare such a table.
But in your case, the time should be made the PRIMARY KEY, since it's probably wrong to have different temperatures for the same time and probably useless to have the same reading more than once.
Logically, each table should have a PRIMARY KEY so that you could distinguish two records.
If you don't have a candidate key in your data, just create a surrogate one (AUTO_INCREMENT, SERIAL or whatever your database offers).
The only excuse for not having a PRIMARY KEY is a log or similar table which is subject to heavy DML, where having an index on it would impact performance beyond the level of tolerance.
Like always it depends.
A table does not have to have a primary key. Much more important is to have the correct indexes. How a primary key affects indexes depends on the database engine (e.g. whether it creates a unique index for the primary key column or columns).
However, in your case (and 99% of other cases too), I would add a new auto-increment unique column like temp_id and make it a surrogate primary key.
It makes maintaining this table much easier, for example finding and removing records (e.g. duplicated records), and believe me, for every table there comes a time to fix things :(.
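To illustrate the clean-up point, here is a rough sketch of removing duplicate readings once a surrogate key exists. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL; temp_id comes from the answer above, while the reading_time and temperature column names and the sample data are made up. MySQL may refuse a DELETE whose subquery reads the same table directly, in which case the subquery has to be wrapped in a derived table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE day_temperature ("
    " temp_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " reading_time TEXT NOT NULL,"
    " temperature REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO day_temperature (reading_time, temperature) VALUES (?, ?)",
    [
        ("2024-01-01 12:00", 3.5),
        ("2024-01-01 12:00", 3.5),  # accidental duplicate
        ("2024-01-01 13:00", 4.0),
    ],
)

# Keep the lowest temp_id in each group of identical readings, delete the rest.
conn.execute(
    "DELETE FROM day_temperature WHERE temp_id NOT IN ("
    " SELECT MIN(temp_id) FROM day_temperature"
    " GROUP BY reading_time, temperature)"
)

remaining = conn.execute(
    "SELECT temp_id, reading_time, temperature"
    " FROM day_temperature ORDER BY temp_id"
).fetchall()
print(remaining)
```

Without temp_id the two identical rows could not be told apart, so there would be no simple way to delete only one of them.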
If the possibility of having duplicate entries (for example for the same time) is not a problem, and you don't expect to have to query for specific records or range of records, you can do without any kind of key.
You don't need a PK, but it's recommended that you have one. It's the best way to identify unique rows. Sometimes you don't want an auto-incremented int PK, but would rather create the PK on something else. For example in your case, if there's only one unique row per time, you should create the PK on the time. It makes lookups based on time faster, plus it ensures that they're unique (you can be sure that the data integrity isn't violated).
Even if you do not add a primary key to an InnoDB table in MySQL, MySQL adds a hidden clustered index to that table. If you do not define a primary key, MySQL locates the first UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the clustered index.
If the table has no primary key or suitable UNIQUE index, InnoDB internally generates a clustered index GEN_CLUST_INDEX on a synthetic column containing row ID values.
https://dev.mysql.com/doc/refman/8.0/en/innodb-index-types.html
The time would then become your primary key. It will help index that column so that you can query data based on say a date range. The PK is what ultimately makes your row unique, so in your example, the datetime is the PK.
I would include a surrogate/auto-increment key, especially if there is any possibility of duplicate time/temperature readings. You would have no other way to uniquely identify a duplicate row.
I ran into the same question on one of the tables I built.
The problem was that the PK was supposed to be composed of all the columns of the table. All is well, but this means that the table size will grow very fast with each row inserted.
I chose not to have a PK, and only have an index on the column I do the lookup on.
When you replicate a database on MySQL, a table without a primary key may cause delay in the replication.
http://lists.mysql.com/mysql/227217
The most common mistake when using ROW or MIXED is the failure to
verify that every table you want to replicate has a PRIMARY KEY on
it. This is a mistake because when a ROW event (such as the one
documented above) is sent to the slave and neither the master's copy
nor the slave's copy of the table has a PRIMARY KEY on the table,
there is no way to easily identify which unique row you want
replication to change.
According to your answer I would consider three options:
Put a PK on both columns together. This allows multiple rows with the same temp or the same time, but no two rows with the same temp AND the same time.
Don't put a PK at all, but do put one unique index containing both columns. This would allow for NULLs in temp and time but takes more space to maintain the index.
These two options would be best for retrieval speed if you have heavy reads, but would result in a lower insert rate, as the indices would have to be updated as well.
Don't put any index at all, nor a PK. This would be best for inserts but very bad for searching. It is useful for logging where retrieval is done by another mechanism, or when the inserting device is not required to check for duplicates.
Also, it is very important to consider cardinality here and think about the future consequences of using an auto-incremented number. If you're planning to do A LOT OF inserts, check that the column type is wide enough, because any auto-incremented integer has a finite range, although an unsigned BIGINT is very unlikely to be exhausted in practice. In your example I guess you'll be saving data daily; for how long? It would matter more if you saved the temp every minute, so I'll take that as an extreme example.
I guess it is best to think about what you need from the table. Are you doing "save-and-forget" for the entire year for the temp at every minute? Are you going to use this table frequently in real-time decision making in your business logic? I think it is best to segregate data necessary for real-time use (OLTP) from long-term data that is seldom required and whose retrieval latency is allowed to be high (OLAP). It's even worth duplicating the data into two different tables: one heavily indexed and erased once in a while to control cardinality, and a second that is saved on a magnetic disk with almost no indices at all (it is possible to transfer a schema from your main fs into another fs).
I've got a better example of a table that doesn't need a primary key: a joiner table. Say I have a table with something called "capabilities", and another table with something called "groups", and I want a joiner table that tells me all the capabilities that all the groups might have, so it's basically
create table capability_group
( capability_id varchar(32),
group_id varchar(32));
There is no reason to have a primary key on that, because you never address a single row: you either want all the capabilities for a given group, or all the groups for a given capability. It would be better to have a unique constraint on (capability_id, group_id), and separate indexes on both fields.
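A sketch of that joiner table with the unique constraint in place, using Python's sqlite3 module and an in-memory SQLite database as a stand-in for MySQL. The index name and the sample values are made up, and only one extra index is created here because the composite unique constraint can already serve lookups on its leading column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE capability_group (
        capability_id VARCHAR(32) NOT NULL,
        group_id      VARCHAR(32) NOT NULL,
        UNIQUE (capability_id, group_id)
    );
    -- The unique constraint is indexed with capability_id as its leading
    -- column; this second index covers lookups by group_id.
    CREATE INDEX idx_capability_group_group ON capability_group (group_id);
    """
)
conn.executemany(
    "INSERT INTO capability_group (capability_id, group_id) VALUES (?, ?)",
    [("read", "staff"), ("write", "staff"), ("read", "guests")],
)

# Inserting the same pair twice is rejected by the unique constraint.
try:
    conn.execute(
        "INSERT INTO capability_group (capability_id, group_id)"
        " VALUES ('read', 'staff')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# All capabilities for one group: the typical query on a joiner table.
staff_caps = [
    row[0]
    for row in conn.execute(
        "SELECT capability_id FROM capability_group"
        " WHERE group_id = ? ORDER BY capability_id",
        ("staff",),
    )
]
print(duplicate_rejected, staff_caps)
```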

MySQL auto increment primary key IDs

I have some MySQL tables that have auto-incrementing IDs that are primary keys, but I notice that I never actually use them... I used to think that every table must have a primary key, so I guess that is why I created them before. Should I remove them all if I don't use them at all?
Unless you are running into space problems I wouldn't remove them.
They are a life saver in case you by mistake (or oversight) populate the database with repeated/wrong data.
They also help to have related tables, where you reference the content on one table through the autogenerated id.
This is assuming you have indexes for the other columns you use to actually query the data (if you don't, then more reason to keep the autoincrement ids and use them!).
No.
You should keep them; a database always needs something that differentiates a row from another row (a "Key" of some sort).
If you have something that is guaranteed to be unique for each row, then you can use that as a key; otherwise keep the Primary Key and the Auto generated ID.
I'd personally keep them. They will be especially useful at a later date if you expand the database design and need to reference this table.
Interesting!...
I seem to hold a minority opinion here, getting both upvoted and downvoted to currently an even 0, yet no one in the majority opinion (see responses above) seems to make much of a case for keeping the id field, and the downvoters didn't even bother leaving comments hinting at why doing away with the id is such a bad idea.
In their defense, my own original response did not include any strong argument as to why it is OK to do away with the id attribute in some cases (which seem to apply to the OP). Maybe such a gratuitous response makes it, in and of itself, a downvotable response.
Please do educate me, and the OP, by leaving comments pro or against the _systematic_ (and I stress "systematic") need to include auto-incremented non-semantic primary keys in all tables. As promised, I returned and added to my response a list of reasons why it may be detrimental to [again, systematically] impose an auto-incremented PK.
My original response:
You bet! you can remove these!
Before you do anything to the database, make sure you have a backup, in particular if the DB size is significant.
Use the ALTER TABLE statement to remove the id in the tables where you want to remove it. Specifically
ALTER TABLE myTable DROP COLUMN id
(you also need to remove the PK constraint before removing the id, if the table has such a constraint)
EDIT (Added later)
There are many cases where it just doesn't make sense to carry along an autoincremented ID key, regardless of the relative little extra storage requirement these keys add.
In all these cases, the underlying implication is that
either the data itself supplies a primary key,
or, the application manages the key generation
The key supplied "natively" in the data doesn't necessarily need to be a single-column key; it can be a composite key, although in these cases one may wish to study the situation more closely, particularly if the overall key is a bit long.
Here are some of the drawbacks of using an auto-incremented primary key in lieu of a native or application-supplied key:
The effective data integrity may go unchecked
i.e. the server may allow record insertions or updates which create a duplicated [native] key (even though the artificial, auto-incremented primary key hides this reality)
When relying on the auto-incremented PK for the support of joins between tables, when part of the [native] key values have to be updated...
...we either create the need to delete the record in full and re-insert it with the new values,
...or the risk of keeping outdated/incorrect links.
A common "follow-up" with auto-incremented keys is to create a clustered index on the table for this key.
This does make sense for tables without a native or application-supplied primary key, not so much for data sets that have such keys.
Effectively this prevents choosing a key for the clustered index which may be more beneficial for the most common query patterns.
Migrating tables with an auto-incremented key can be made more difficult depending on the DBMS (need to declare the underlying column as a plain integer prior to the copy, then need to start the autoincrement again...)
For narrow tables, i.e. tables with a few columns only, the relative cost of the auto-incremented PK can be significant, and impact performance in a non negligible fashion.
When inserting new records along with associated records in related tables, the auto-incremented key needs to be obtained after the insertion of the main record, before the related records can be inserted; the logic is simpler when the column values supporting the link are known ahead of time.
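The last drawback in the list (needing the generated key before the related rows can be written) looks roughly like this in practice. The sketch uses Python's sqlite3 module with an in-memory SQLite database as a stand-in; the orders / order_lines tables are hypothetical. In MySQL the same value is available through LAST_INSERT_ID(), and most drivers expose it much like lastrowid here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY AUTOINCREMENT,
        customer TEXT NOT NULL
    );
    CREATE TABLE order_lines (
        order_id INTEGER NOT NULL REFERENCES orders (order_id),
        item     TEXT NOT NULL
    );
    """
)

with conn:  # one transaction for the parent row and its children
    cur = conn.execute("INSERT INTO orders (customer) VALUES (?)", ("alice",))
    # The generated key only exists after the parent insert has run.
    order_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO order_lines (order_id, item) VALUES (?, ?)",
        [(order_id, "tea"), (order_id, "milk")],
    )

line_count = conn.execute(
    "SELECT COUNT(*) FROM order_lines WHERE order_id = ?", (order_id,)
).fetchone()[0]
print(order_id, line_count)
```

With a native or application-supplied key, the child rows could be prepared up front without this extra round trip.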
To summarize, the idea that so long as the storage can carry the [relatively minimal] extra "weight" of the artificial primary key, we should include and use such a key, is not without drawbacks of its own.
A final consideration is that just like it is rather easy to remove such keys when we don't need them, they too can be easily added, post-facto, when/if it becomes apparent that they are useful in a particular situation. Neither form of refactoring (adding vs. removing the auto-incremented columns) is risk free, but neither is a major production either.
Yes, if you can figure out another primary key.
There is obviously a flaw in your table design. For example, say you had a table like
relation_id(PK), parent_id, child_id .
It is known that the combination of parent_id and child_id is unique, then you can assign the primary key to be parent_id + child_id, and then drop the column relation_id.
There may be endless other possible cases, but just bear in mind that a primary key helps you locate data quickly, as well as helping your design make sense.

Should I add an autoinc primary key for the sake of having a primary key?

I have a table which needs 2 fields. One will be a foreign key, the other is not necessarily unique. There really isn't a reason that I can find to have a primary key other than having read that "every single table ever needs needs needs a primary key".
Edit:
Some good thoughts in here.
For clarity's sake, I will give you an example that is similar to my database needs.
Let's say I have a table with product type, quantity, cost, and manufacturer.
Product type will not always be unique (say, MP3 Player), but manufacturer/product type will be unique (say, Apple MP3 Player). Forget about the various models the manufacturers make for this example. For ease, this table has an autoincrementing primary key.
I am giving a point value and logging how often these products are searched for, added to a cart, and bought for display on a list of hot items.
The way I have it laid out currently is in a second table with a FK pointing to the main table, and a second column for the total number of "popularity points" this item has gained.
The answers I have seen here have made me think that perhaps I should just add a "points" column to my primary products table so that I could just track there... but that seems like I'm not normalizing my database enough.
My problem is I'm currently mostly just a hobbyist doing this for learning, and don't have the luxury of a DBA to tell me how to set up my tables, so I have to learn both the coding side and the database side.
You have to distinguish between primary key and surrogate key. Auto-incremented column would be a particular case of the latter. Your question, therefore, is twofold:
Does every table need to have a primary key?
Does every table need to have a surrogate primary key?
The answer to first question is YES except in some special cases (association table for many-to-many relationship arguably being an example of such a special case). The reason for this is that you usually need to be able (if not right now then in the future) to consistently address individual rows of that table - for updates / deletion, for example.
The answer to the second question is NO. If your table represents a core business entity OR can be referenced from a many-to-one association, having a surrogate key is probably a good idea; but it's not absolutely necessary.
It's somewhat unclear what your table's function is; from your description it sounds like it has "collection of values" semantics (FK to "main" table + value). Certain ORMs don't support surrogate keys in such circumstances; if that's what has prompted your question it's OK to leave the surrogate (or even primary in case of bag) key off.
For the sake of having something unique and as identifier, please please please please have a primary key in every table :)
It also helps forward compatibility in case there are future schema changes and the 2 values are no longer unique. Plus, memory is much cheaper now, so feel free to use it as an investment. ;)
I am not sure what the other field looks like, but I am guessing that it would be OK to have a composite primary key based on the FK and the other field. But then again, I don't know your exact scenario.
I would say that it's absolutely necessary to have some sort of primary key in every table.
Interestingly enough, one of the DBAs for a Viacom property once told me that there was really no discernible difference between using an INT UNSIGNED and a VARCHAR(n) as a primary key in MySQL. This was in reference to a user table with more than 64 million rows. I believe n can be decently large (<=100), but I forget what they limited it to. Unfortunately, I don't have any empirical data to back that up.
You don't HAVE to have a primary key on every table, but it is considered best practice to have them as they are almost always necessary on a normalized relational database design. If you're finding a bunch of tables you don't think need PKs, then you should revisit the design/layout of your tables. To read more on normalization see here.
A couple of scenarios I can think of where you may not need or want a PK on a table would be a table strictly for logging (to limit the performance degradation of writing the log and maintaining a unique index), and the scenario where you're just storing data used to pump through an application for test purposes.
I'll be contrary and say you shouldn't add the key if you don't have a reason for it. It is very easy to add this column later if needed.
Strictly speaking, a surrogate key is not necessary, but a primary key is.
Many people use the term "primary key" to mean a single column that is an auto-incrementing integer. But this is not an accurate definition of a primary key.
A primary key is a constraint on one or more columns that serve to identify each row uniquely. Yes, you need some way of addressing individual rows. This is a crucial characteristic of a relation (aka a table).
You say you have a foreign key and another column that is not unique. But are these two columns taken together unique? If so, you can declare a primary key constraint over these two columns.
Defining another surrogate key (also called a pseudokey -- the auto-incrementing type) is a convenience because some people don't like to have to reference two columns when selecting a single row. Or they want the freedom to change values in the other columns easily, without changing the value of the primary key by which one addresses the individual row.
This is a technique related to normalization and a pretty good practice. A key made up of an auto incrementing number has many benefits:
You have a PK that does not pertain to the data.
You never have to change the PK value
Every row will automatically have a unique identifier