What is the best way to prevent duplicate values in a database?
I have a table called names that has only one column called name that is unique (declared as unique attribute).
What is the best way to insert a new name (x)?
Way 1: Should I first run a SELECT query for the name x to check whether it exists, and then run another query to insert the name only if it does not already exist in the table?
Way 2: Run only one query to insert the name and ignore the error if the name already exists.
The second way is the better way. Why run two queries when you can just run one?
When you declare the column as unique, you have told the database to do the extra work to ensure that this is true. You don't need to do anything else -- other than check the errors on the return.
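As a minimal sketch of Way 2, using the names table from the question (the value 'x' is just a placeholder):

-- name is declared UNIQUE, so the database enforces uniqueness for us.
INSERT INTO names (name) VALUES ('x');
-- If 'x' already exists, MySQL rejects the statement with error 1062 (ER_DUP_ENTRY);
-- other databases raise their own unique-violation error. The application simply
-- catches that error and treats it as "the name was already there".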
A database constraint will definitely take care of uniqueness. The only case where I think you need a manual check before the insert is when you have logic that needs the last inserted ID for a child table; otherwise, just ignore the exception if it is raised due to duplication.
The first way works. After the action you can be sure that the record exists (unless some other error occurred). You do need a second query (or some other mechanism) to retrieve the actual tuple, either the existing one or a freshly inserted one.
The second way is terrible: the DBMS session is in an error state (your current work has implicitly been rolled back, and all your cursors have been closed), so you'll have to start your work all over again, hopefully without the duplicate.
The case you give is a simplified "upsert". Do a search for upsert and you will find answers to the more general question. Some databases, like MySQL, provide INSERT IGNORE for this simple case.
Otherwise, for the simple case you mention, you can use the second approach. For the more general upsert, it is surprisingly difficult to get right. The issue is concurrent updates. In fact, I have not seen a satisfactory answer for general upserts. Some say to use "merge", but that is subject to concurrency issues.
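A hedged sketch of the MySQL options mentioned above, using the names table from the question:

-- MySQL's simple form: insert, and silently skip the row if the name already exists.
INSERT IGNORE INTO names (name) VALUES ('x');

-- The more general upsert form; with a single-column table there is nothing useful
-- to update, so the clause below is effectively a no-op shown only for the syntax.
INSERT INTO names (name) VALUES ('x')
ON DUPLICATE KEY UPDATE name = name;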
I just started with MySQL and I don't know whether this way of making the check is correct or I am going in the wrong direction.
I have a varchar named user_num in a table, and I need to check that, when I do an INSERT, the value of user_num_list is between [1, n], with "n" being the quantity of Objects that have the same Group as the new Object.
I'm not a native English speaker and I'm sure this is a bit hard to understand and for me to express, so here is some code:
create table player(
group varchar(15),
user_num int(15),
CONSTRAINT ck_player_user CHECK( user_num > 0 AND user_num < SELECT count(*) FROM player WHERE player.group=group)
)ENGINE=InnoDB;
I don't know if I can "SELECT" inside a CHECK statement, and I also don't know how to express "player.group=group", meaning that group (the newly inserted player's group) has to be the same as player.group.
Thank you.
You are out of luck.
First of all, MySQL does not validate CHECK constraints before version 8.0.16. If you are using a MySQL version older than that, MySQL will record the CHECK constraint and silently ignore it.
Second, in order to enforce this constraint you would need to use a subquery in it, and MySQL does not allow subqueries as part of the CHECK constraint. See 13.1.20.7 CHECK Constraints where it literally says:
Subqueries are not permitted.
The problem with subqueries in the CHECK condition is that they are practically impossible to implement. Consider what happens when you delete a row in this table. The count for the corresponding group will decrease, which might violate the CHECK constraint for another row. The system would need to reevaluate the checks for every row. If you have one million rows in the table, one million queries need to be executed every time you INSERT, DELETE or UPDATE a row. If you also permitted cross-table statements, all checks in the database would need to be reevaluated on any change of data.
In theory it would be possible to generate a "reverse check". The system would need to determine which operations could possibly violate a check. In your case that would be deleting a row or updating a group. The "reverse check" could be
NOT EXISTS (
    SELECT *
    FROM player p
    WHERE p.group = OLD.group
      AND p.user_num > (SELECT COUNT(*) FROM player p2 WHERE p2.group = OLD.group)
)
I think you would agree that it would be hard to implement a system that is able to generate a reverse check like this. Queries can be much more complicated than yours, and for some queries a reverse check might not even exist. Considering how long it has taken just to implement the CHECK constraint, I wouldn't expect any subqueries to be permitted in the next decade.
So I would say - Your options are:
Implement your checks in triggers (see the sketch after this list)
Implement your checks in application code
Change the requirements
I'd prefer the last one.
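If you do go with triggers (option 1), a minimal sketch might look like the following. It assumes the player table from the question, the trigger name is made up, and it only covers INSERT; an equivalent BEFORE UPDATE trigger would also be needed:

DELIMITER //
CREATE TRIGGER player_check_user_num
BEFORE INSERT ON player
FOR EACH ROW
BEGIN
    -- Enforce user_num between 1 and the number of players already in the same group.
    IF NEW.user_num < 1
       OR NEW.user_num > (SELECT COUNT(*) FROM player p WHERE p.`group` = NEW.`group`)
    THEN
        SIGNAL SQLSTATE '45000'
            SET MESSAGE_TEXT = 'user_num is out of range for this group';
    END IF;
END//
DELIMITER ;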
Example tables (not actual database):
In this example, I would have the SecurityCode(Unique), and Time. My current solution involves attempting to add a new Person using the security code, then querying the ID, then adding to the Times table. This is 3 separate statements and could likely be a lot faster. Any advice on how to optimise this?
Thanks.
Edit: I previously forgot to mention that this is normally done in a batch of 30-40 records.
I am also considering using SecurityCode as the foreign key in Times.
I think there are many ways of achieving this; the easiest:
Try using "IF"; you only need it for the first step of your statement, since the last two are independent of the result of this evaluation.
Plus, save your security code in a variable; that way you save one table scan (you already have it).
**please note it's just pseudo-code**
IF NOT EXISTS (SELECT * FROM person WHERE securityCode = @securityCode) THEN
    Step 1   -- only add the new Person when the security code isn't there yet
END IF
Step 2
Step 3
Can you try it?
The fastest way seemed to be to batch INSERT IGNORE all security codes, then batch insert all Times with a subquery to select the correct ID from Person.
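A rough sketch of that batch approach (the column names Person.ID, Person.SecurityCode, Times.PersonID and Times.Time are assumptions, as are the placeholder values):

-- 1. Batch-insert all security codes; duplicates are silently skipped.
INSERT IGNORE INTO Person (SecurityCode)
VALUES ('code-1'), ('code-2'), ('code-3');

-- 2. Batch-insert the Times rows, resolving each Person ID via a join on the code.
INSERT INTO Times (PersonID, Time)
SELECT p.ID, batch.Time
FROM (
    SELECT 'code-1' AS SecurityCode, '2015-01-01 09:00:00' AS Time
    UNION ALL SELECT 'code-2', '2015-01-01 09:05:00'
    UNION ALL SELECT 'code-3', '2015-01-01 09:10:00'
) AS batch
JOIN Person p ON p.SecurityCode = batch.SecurityCode;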
Hi all, I have an identity column and a computed primary key column in my table. I need to get the last inserted record immediately after inserting it into the database, so I have written the following queries. Can someone tell me which is the best one to choose?
SELECT t.[StudentID]
FROM [tbl_Student] t
WHERE t.ID = IDENT_CURRENT('tbl_Student')
The other is using MAX, as follows:
SELECT MAX(StudentID)
FROM tbl_Student
Of the above two queries, which is the best one to choose?
MAX and IDENT_CURRENT, according to technet, would behave much the same and both would be equally unreliable.
"IDENT_CURRENT is not limited by scope and session; it is limited to a specified table. IDENT_CURRENT returns the identity value generated for a specific table in any session and any scope. For more information, see IDENT_CURRENT (Transact-SQL)."
Basically, to return the last insert within the current scope, regardless of any potential triggers or inserts / deletes from other sessions, you should use SCOPE_IDENTITY. Of course, that's assuming you're running the query in the same scope as the actual insert in the first place. :)
If you are, you also have the alternative of simply using the OUTPUT clause to get the inserted ID values into a table variable / temporary table, and selecting from there.
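A hedged sketch of the OUTPUT approach (the column list and the inserted values are assumptions based on the question):

-- Capture the identity value(s) generated by this INSERT into a table variable.
DECLARE @NewRows TABLE (ID int);

INSERT INTO tbl_Student (Name)          -- whatever columns you actually insert
OUTPUT INSERTED.ID INTO @NewRows (ID)
VALUES ('John Doe');

-- Fetch the computed StudentID for exactly the row(s) just inserted.
SELECT t.StudentID
FROM tbl_Student t
JOIN @NewRows n ON n.ID = t.ID;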
The original answer, where my assumptions about IDENT_CURRENT were wrong:
Use the first one. IDENT_CURRENT should give you the last item for the current connection. If someone else inserts another student concurrently, IDENT_CURRENT will give you the correct value for both clients, while MAX might give you a wrong value.
EDIT:
As was mentioned in the other answer, IDENT_CURRENT and MAX are equally unreliable in case of concurrent usage. I would still go for IDENT_CURRENT, but if you want to get the last identity used by the current scope or session you can use the functions @@IDENTITY and SCOPE_IDENTITY. This TechNet article explains the detailed differences between IDENT_CURRENT, @@IDENTITY and SCOPE_IDENTITY.
Current Structure
As you can see Path can be referenced by multiple Tables and multiple records within those tables.
Points can also be referenced by two different tables.
My Question
I would like to delete a PathType; however, this gets complicated, as a Path may be owned by more than one PathType, so deleting the Path without checking how many references there are to it is out of the question.
Secondly, if this Path's only reference is the PathType I'm trying to delete, then I will want to delete this Path and any records in PathPoints.
Lastly, if there are no other references to a Point from any other records, then it will also need to be deleted, but only if it's not used by any other object.
Attempts So Far
DELETE PathType1.*, Path.*, PathPoints.*, Point.*
FROM PathType1, Path, PathPoints, Point
WHERE PathType1.ID = 1
  AND PathType1.PATH = Path.ID
  AND (SELECT COUNT(*) FROM PathType1 WHERE PathType1.PATH = Path.ID) < 1
  AND (SELECT COUNT(*) FROM PathType2 WHERE PathType2.PATH = Path.ID) = 0
Obviously the above statement goes on, but I don't think this is the right way to go about it, because if one condition fails then nothing is deleted...
I think that maybe it isn't possible to do what I'm attempting through one statement and I may have to iterate through each section and handle them based on the outcome. Not so efficient but I don't see any alternative at this time.
I hope this is clear. If you have any more questions or need any clarification then please do not hesitate to ask.
First, there is no way I would do this in a single query like that, even if the database allowed it, which most will not. This is an unmaintainable mess.
The preferred method is to create a transaction, delete from one table at a time starting with the bottommost child table, then commit the transaction. And of course have error handling so the entire transaction is rolled back if one delete fails, to maintain data integrity. If I intended to do this repeatedly, I would do it in a stored proc.
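A hedged sketch of that transaction (MySQL-style syntax; the table names are taken from the question, but the column names and reference-counting conditions are assumptions):

START TRANSACTION;

-- Remember which Path the PathType row points at before deleting it.
SELECT PATH INTO @path_id FROM PathType1 WHERE ID = 1;

-- 1. Delete the PathType row itself (the bottommost child of Path).
DELETE FROM PathType1 WHERE ID = 1;

-- 2. If nothing else references that Path, delete its PathPoints rows, then the Path.
DELETE FROM PathPoints
WHERE PATH = @path_id
  AND NOT EXISTS (SELECT 1 FROM PathType1 WHERE PathType1.PATH = @path_id)
  AND NOT EXISTS (SELECT 1 FROM PathType2 WHERE PathType2.PATH = @path_id);

DELETE FROM Path
WHERE ID = @path_id
  AND NOT EXISTS (SELECT 1 FROM PathType1 WHERE PathType1.PATH = @path_id)
  AND NOT EXISTS (SELECT 1 FROM PathType2 WHERE PathType2.PATH = @path_id);

-- 3. Delete Points that are no longer referenced by any PathPoints row
--    (extend with checks against any other table that can reference Point).
DELETE FROM Point
WHERE NOT EXISTS (SELECT 1 FROM PathPoints pp WHERE pp.POINT = Point.ID);

COMMIT;   -- issue ROLLBACK instead if any of the statements above fails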
I am pretty new to this so sorry for my lack of knowledge.
I set up a few tables which I have successfully written to and accessed via a Perl script using the CGI and DBI modules, thanks to advice here.
This is a member list for a local band newsletter. Yeah, I know, there are tons of apps out there, but I desire to learn this.
1- I wanted to avoid updating or inserting a row if a piece of my input matches the data in one particular column/field.
When creating the table, in phpmyadmin, I clicked the "U" (unique) on that columns name in structure view.
That seemed to work and no dupes are inserted, but I desire a hard-coded Perl solution so I understand the mechanics of this.
I read up on "insert ignore" / "update ignore" and searched all over, but nothing I found seems to simply skip a dupe.
The column is not a key or auto-increment, just a plain old field with an email address. (Mistake?)
2- When I write to the database, I want to do NOTHING if the incoming email address matches one in that field.
I desire the fastest method so I can loop through their existing lists' export data (they cannot figure out the software) with no racing / locking issues or whatever conditions of which I am in obvious ignorance.
Since I am creating this from scratch, 1 and 2 may be in fact partially moot. If so, what would be the best approach?
I would still like an auto-increment ID so I can access rows via the ID number or loop through with some kind of count++ foreach.
My stone-knife approach may be laughable to the gurus here, but I need to start somewhere.
Thanks in advance for your assistance.
With the email address column declared UNIQUE, INSERT IGNORE is exactly what you want for insertion. Sounds like you already know how to do the right thing!
(You could perform the "don't insert if it already exists" functionality in perl, but it's difficult to get right, because you have to wrap the test and update in a transaction. One of the big advantages of a relational database is that it will perform constraint checks like this for you, ensuring data integrity even if your application is buggy.)
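A minimal sketch for the insert side, assuming the members table and email column used in the example below, with email declared UNIQUE:

-- Insert, silently skipping the row if the email address already exists.
INSERT IGNORE INTO members (email, firstname)
VALUES ('sue@example.com', 'Sue');

-- ROW_COUNT() is 1 if the row was inserted, 0 if it was skipped as a duplicate,
-- so the calling script can tell which happened without a separate SELECT.
SELECT ROW_COUNT();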
For updating, I'm not sure what an "update ignore" would look like. What is in the WHERE clause that is limiting your UPDATE to only affect the 1 desired row? Perhaps that auto_increment primary key you mentioned? If you are wanting to write, for example,
UPDATE members SET firstname='Sue' WHERE member_id = 5;
then I think this "update ignore" functionality you want might just be something like
UPDATE members SET firstname='Sue' WHERE member_id = 5
AND email != 'sue@example.com';
which is an odd thing to do, but that's my best guess for what you might mean :)
Just do the insert; if the data would make the unique column not unique, you'll get an SQL error. You should be able to trap this and do whatever is appropriate (e.g. ignore it, log it, alert the user ...).