Renewable primary or unique key - mysql

There is a table with no PK, 2 FKs, and some arbitrary number of other columns.
Unfortunately the FKs are not unique in any way.
Adding new data is easy.
Deleting data (finding a row) is OK if I put a unique constraint on some other column.
(DELETE ... WHERE fk1=:fk1 AND fk2=:fk2 AND ucol=:ucol)
What to do with UPDATE?
I can't use that ucol, because that same ucol might itself be the subject of the change. I have several solutions, but none of them seem OK.
Solution 1:
Put a PK in the table and use it for DELETE and UPDATE. Deleting will leave a lot of holes in it, but that's no problem. In theory it can run out of PK numbers (int, unsigned int) if there's some heavy deleting going on.
Solution 1a:
Make a CK of (fk1, fk2, some new column) and use that to locate the row. It's essentially the same as just using the PK.
Solution 2:
Use a timestamp with microtime / a hash / a unique key generator / something to populate a new unique column. That column is used like a PK to locate the row for UPDATE and DELETE. Excellent only if the uniqueness algorithm does its job perfectly.
My question:
Is there something better? Something that doesn't require fancy algorithms and has no risk of overflowing an auto-incremented PK...
----------------- edit ----------------
Solution 2a:
Use MySQL's UUID()! It's far better (and easier to use) than creating a custom timestamp / hash / something_unique.
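A minimal sketch of Solution 2a on an existing table (the table name child_table is a placeholder; fk1, fk2 and ucol are as in the question):
-- add a UUID column to the existing table, nullable at first
ALTER TABLE child_table ADD COLUMN row_uuid CHAR(36) NULL;
-- backfill existing rows, then enforce uniqueness
UPDATE child_table SET row_uuid = UUID() WHERE row_uuid IS NULL;
ALTER TABLE child_table MODIFY row_uuid CHAR(36) NOT NULL, ADD UNIQUE KEY uq_row_uuid (row_uuid);
-- new rows supply UUID() on insert; UPDATE and DELETE locate the row by it
INSERT INTO child_table (fk1, fk2, ucol, row_uuid) VALUES (:fk1, :fk2, :ucol, UUID());
UPDATE child_table SET ucol = :new_ucol WHERE row_uuid = :row_uuid;
DELETE FROM child_table WHERE row_uuid = :row_uuid;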

As per my suggestion, it will be better to add a PK to the table, for the following reasons (a sketch follows after the list):
1. It will give a unique id to each row, which will help in DELETE and UPDATE statements.
2. The PK will create a clustered index on the column, which will improve the table's performance when retrieving data.
3. It's always advised to provide a PK in each table.
4. In the future you can use the PK as an FK in another table if required.
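A rough sketch of how that could look on the existing table (the names child_table and id are placeholders; BIGINT UNSIGNED makes overflowing the counter a non-issue in practice):
-- InnoDB clusters the table on the primary key added here
ALTER TABLE child_table
    ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;
-- UPDATE and DELETE can then target exactly one row
UPDATE child_table SET ucol = :new_ucol WHERE id = :id;
DELETE FROM child_table WHERE id = :id;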

Related

Auto-increment a primary key in MySql

During the creation of tables using MySQL in phpMyAdmin, I always run into an issue with primary keys and their auto-increments. When I insert rows into my table, the AUTO_INCREMENT works perfectly, adding a value of 1 to each primary key on each new row. But when I delete a row, for example the row where the primary key is 'id = 4', and then add a new row to the table, the primary key in the new row gets the value 'id = 5' instead of 'id = 4'. It acts like the old row was never deleted.
Here is an example of the SQL statement:
CREATE TABLE employe(
id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(30) NOT NULL
)
ENGINE = INNODB;
How do I find a solution to this problem?
Thank you.
I'm pretty sure this is by design. If you had IDs up to 6 in your table and you deleted ID 2, would you want the next insert to get an ID of 2? That doesn't seem to follow the ACID properties. Also, if other data depended on those IDs, for example if it was user data and the ID identified users, reusing an ID would invalidate pre-existing information: if user X was deleted and the same ID was assigned to user Y, that could cause integrity issues in dependent systems.
Also, imagine a table with 50 billion rows. Should the table run an O(n) search for the smallest missing ID every time you're trying to insert a new record? I can see that getting out of hand really quickly.
Some links you might like to read:
Principles of Transaction-Oriented Database Recovery (1983)
How can we re-use the deleted id from any MySQL-DB table?
Why do you care?
Primary keys are internal row identifiers that are not supposed to be sexy or good looking. As long as they are able to identify each row uniquely, they serve their purpose.
Now, if you care about its value, then you probably want to expose the primary key value somewhere, and that's a big red flag. If you need an external, visible identifier, you can create a secondary column with any formatting sequence and values you want.
As a side note, the term AUTO_INCREMENT is a bit misleading. It doesn't really mean the values increase one by one all the time. It just means MySQL will try to produce sequential numbers, as long as that is possible. In multi-threaded apps that's usually not possible, since batches of numbers are reserved per thread, so the actual row insertion sequence may end up not following the natural numbering. Row deletions have a similar effect, as do INSERTs that are rolled back.
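A quick way to see this behaviour for yourself, using a throwaway table modeled on the one in the question:
CREATE TABLE demo (id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT, name VARCHAR(30)) ENGINE = INNODB;
INSERT INTO demo (name) VALUES ('a'), ('b'), ('c');   -- rows get ids 1, 2, 3
DELETE FROM demo WHERE id = 2;
INSERT INTO demo (name) VALUES ('d');                 -- gets id 4; the gap at 2 remains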
Primary keys are meant to be used for joining tables together and
indexing, they are not meant to be used for human usage. Reordering
primary key columns could orphan data and wreak havoc on your queries.
Tip: add another column to your table and reorder that column at will if needed (show that column to your user instead of the primary key).
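If the goal is just a gap-free number for display, one alternative to maintaining a stored column is to compute it at query time. This sketch assumes MySQL 8.0+ for ROW_NUMBER() and uses the employe table from the question:
-- gap-free display number derived at query time, independent of the primary key values
SELECT id,
       name,
       ROW_NUMBER() OVER (ORDER BY id) AS display_no
FROM employe;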

Should I use my two columns that uniquely identify a record as primary key?

I started to design a database that tracks system events by following some online tutorials, and some easy examples start by assigning auto-incrementing IDs as primary keys. Looking at my database, I don't really need IDs: out of all my columns, the timestamp and the device ID are the two columns that together identify a unique event.
What my program does right now is pull some events from the system log from the past x minutes and insert these events into the database. However, I could be going far enough into the past that the events overlap with what's already in the database. As I mentioned before, timestamp and device ID are the two fields that uniquely identify an event. My question is: should I use these two fields as my primary key and use INSERT IGNORE from now on, so I can avoid having duplicate records?
It is good practice never to use business values as a table's primary key and always to use synthetic (e.g. auto-increment) values for this. You will make your life easier in the future when business requirements change :)
We are currently struggling with exactly this situation: we have had a column with business values as a primary key for 2 years and are now painfully introducing an auto-increment one.
You may need a foreign key from another table to this one in the future, to link rows between the two tables. That is easier with a one-column primary key.
But if you don't need it now, there is no need to create a special column just for the index. The table can be altered in the future to add such an auto-increment column and move the primary key to it.
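A sketch combining the two suggestions above (table and column names are assumptions): keep a synthetic auto-increment primary key, and add a unique constraint on (device ID, timestamp) so that INSERT IGNORE can skip duplicates:
CREATE TABLE events (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    device_id INT NOT NULL,
    event_ts DATETIME NOT NULL,
    details VARCHAR(255),
    PRIMARY KEY (id),
    UNIQUE KEY uq_device_ts (device_id, event_ts)
) ENGINE = INNODB;
-- re-importing an overlapping time window silently skips rows that already exist
INSERT IGNORE INTO events (device_id, event_ts, details)
VALUES (:device_id, :event_ts, :details);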

Rearrange primary keys in mysql

How can I rearrange primary key column values after deleting some rows from a table in MySQL?
For example: a table has 4 rows of data with primary key values 1, 2, 3, 4. When I delete the 2nd and 3rd rows, I want the key value of the 4th row to change to 2.
Please help me find a solution.
Why do this? You don't need to rearrange your key, since it's only a number, an identifier for the record. It has no actual meaning, so let the DBMS handle it. This is a very common mistake: trying to take over the DBMS's role.
However, I'll answer your question for the general case. In MySQL you can renumber a column with:
update t cross join (select @cur:=0) as init set t.col=@cur:=@cur+1
This, however, can't be used on a column under a UNIQUE constraint (so a primary key as well), since during the update you could temporarily produce duplicate values. You should drop the constraint first before doing that (and create it again after the update).
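If you really must renumber despite the above, the full sequence for a table t with an AUTO_INCREMENT primary key column id would look roughly like this (a single-table variant of the update is used so the existing order can be kept with ORDER BY; only safe if no other table references these ids):
ALTER TABLE t MODIFY id INT UNSIGNED NOT NULL;    -- drop AUTO_INCREMENT first so the key can be removed
ALTER TABLE t DROP PRIMARY KEY;
SET @cur := 0;
UPDATE t SET id = (@cur := @cur + 1) ORDER BY id; -- renumber 1..n keeping the old order
ALTER TABLE t MODIFY id INT UNSIGNED NOT NULL AUTO_INCREMENT,
              ADD PRIMARY KEY (id);               -- restore the key; the counter resumes at MAX(id)+1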
One method is THIS ONE.
Other than that, you can simply drop the table and then create it again. This will do the job.
Why do you want to change the primary keys for your data? In general it is a bad idea to do that, especially when integrity constraints come into play. If you need to do such a thing, I would say you have a bad DB design and you should take a closer look at that aspect.

how to perform update query using surrogate key

I am very new to database concepts and am currently learning how to design a database. I have a table with the columns below...
This is in MySQL:
1. Names - text - unique but might change in future
2. Result - varchar - not unique
3. issues_id - int - not unique
4. comments - text - not unique
5. level - varchar - not unique
6. functionality - varchar - not unique
I cannot choose any of the above columns as a primary key, as they might change in the future. So I created an auto-increment id called names_id. I also have a GUI (a JTable) that shows this table, and the user updates Result, issues_id and comments based on Names. Names here is a big text column. I cannot display names_id in the GUI, as it would not make any sense there. Now, when the user updates the database after giving inputs for columns 2, 3 and 4 in the GUI, I used the query below to update the database. I couldn't use names_id in the WHERE clause, because the JTable's row id does not match names_id (not all of the rows are loaded into the JTable).
update <tablename> set Result=<value>,issues_id=<value>,comments=<value>
where Names=<value>;
I could get the database updated, but I want to know if it's OK to update the database without even using the PK. How efficient is this? What purpose does the surrogate key serve here?
It is perfectly acceptable to update the database using a WHERE condition that doesn't reference the primary key.
You may want to learn about indexes and constraints, though. Your query could end up updating more than one row if multiple rows have the same name. If you want to ensure that names are unique, you can create a unique constraint on the column.
A primary key always creates an index on its column, and that index makes access fast. If there is no index on Names, the update will need to scan the entire table to look at all the names. You can make it faster by building an index on that field.
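For example (the table name issues and the prefix length 191 are assumptions; Names is TEXT, so the index needs a prefix):
-- note: uniqueness is then enforced on the first 191 characters only
ALTER TABLE issues ADD UNIQUE INDEX uq_names (Names(191));
-- the existing update then touches at most one row, located via the index
UPDATE issues
SET Result = :result, issues_id = :issues_id, comments = :comments
WHERE Names = :names;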

How to save all versions of posts in mysql database

It is popular to save all versions of posts when editing (as StackExchange projects do), so that old versions can be restored. I wonder what the best way to save all the versions is.
Method 1: Store all versions in the same table, adding a column for the revision order or for marking the active version. This will make the table too long.
Method 2: Create an archive table to store older versions.
In both methods, I wonder how to deal with the row ID, which is the main identifier of the article.
The "best" way to save revision history depends on what your specific goals/constraints are -- and you haven't mentioned these.
But here some thoughts about your two suggested methods:
create one table for posts, and one for post history, for example:
create table posts (
    id int primary key,
    userid int
);
create table posthistory (
    postid int,
    revisionid int,
    content varchar(1000),
    foreign key (postid) references posts(id),
    primary key (postid, revisionid)
);
(Obviously there would be more columns, foreign keys, etc.) This is straightforward to implement and easy to understand (and easy to let the RDBMS maintain referential integrity), but as you mentioned it may result in posthistory having too many rows to be searched quickly enough.
Note that postid is a foreign key in posthistory (and the PK of posts).
Use a denormalized schema where all of the latest revisions are in one table and previous revisions are in a separate table. This requires more logic on the part of the program: when adding a new version, replace the post with the same id in the posts table, and also add the new version to the revision table.
(This may be what SE sites use, based on the data dump in the SE Data Explorer. Or maybe not, I can't tell.)
For this approach, postid is also a foreign key in the posthistory table, and the primary key in the posts table.
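A sketch of the write path for this second approach, reusing the table names above (it assumes that in this variant posts also carries a content column, and :new_revisionid is a placeholder for however revision numbers are generated):
START TRANSACTION;
-- archive the outgoing version first
INSERT INTO posthistory (postid, revisionid, content)
SELECT id, :new_revisionid, content FROM posts WHERE id = :postid;
-- then overwrite the live row with the new content
UPDATE posts SET content = :new_content WHERE id = :postid;
COMMIT;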
In my opinion, an interesting approach (sketched below) is:
to define another table, for example posts_archive (it will contain all columns of the posts table + an auto-incremented primary key + optionally a date...), and
to feed this table through after-insert and after-update triggers defined on the posts table.
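A sketch of the update-side trigger for this approach (it assumes posts has a content column, and that posts_archive mirrors it plus its own auto-increment key and an archived_at date):
DELIMITER //
CREATE TRIGGER posts_after_update
AFTER UPDATE ON posts
FOR EACH ROW
BEGIN
    -- OLD holds the row as it was before the update
    INSERT INTO posts_archive (postid, userid, content, archived_at)
    VALUES (OLD.id, OLD.userid, OLD.content, NOW());
END//
DELIMITER ;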
If the size of the table is an issue, then the second option would be the better choice. That way the active version can be returned quickly from a smaller table, and restoring an older version from the larger archive table is accepted to take longer. That said, the size of the table should not be an issue with a sensible database and indexing.
Either way, you need a primary key that consists of multiple table columns instead of just a row ID. The trivial answer would be to include in the key a timestamp containing the time each revision was created, so that the ID continues to identify a specific article, while the ID and revision time together identify a specific revision of the article.
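In DDL terms, that composite key could look like this (names are illustrative; TIMESTAMP(6) adds microsecond precision to reduce the chance of two revisions sharing a timestamp):
CREATE TABLE article_revisions (
    id INT UNSIGNED NOT NULL,          -- identifies the article
    revised_at TIMESTAMP(6) NOT NULL,  -- identifies the revision within the article
    content TEXT,
    PRIMARY KEY (id, revised_at)
) ENGINE = INNODB;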
Dealing with temporal data is a known problem.
Method 1 simply changes your table identifier: you will end up with a table containing messageID, version, description, ..., with a primary key of (messageID, version).
Modifying the data is done by simply adding a row with an incremented version. Querying is a little more complicated.
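For example, fetching the current version of every message under method 1 could look like this (the table name messages is an assumption; columns as described above):
SELECT m.*
FROM messages AS m
JOIN (SELECT messageID, MAX(version) AS version
      FROM messages
      GROUP BY messageID) AS latest
  ON latest.messageID = m.messageID
 AND latest.version = m.version;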
Method 2 is more tedious: you will end up with a table with a rowID and a second table that is exactly the same as in method 1. Then, on every update, you have to remember to copy the data into the "backup table".
Method 3: the answer given by Matt.
In my opinion, methods 1 and 3 are better. The schema is simpler in 1, but you can keep unversioned data for your posts using method 3.