How can MySQL INSERT operations be optimized for possible duplicate entries? - mysql

In my Node.js application, I am using threads (defined by myself) to open ports on a computer. There is a restriction that only one thread may be bound to each port.
I maintain a table (PORTS) in MySQL which holds data about open ports and the bound thread. Currently I am using the following approach to avoid two threads being bound to the same port.
=> Insert an entry into the PORTS table with the port number as the PRIMARY KEY. On a duplicate, this throws the following error:
error inserting semaphore port:10014 error Error: ER_DUP_ENTRY: Duplicate entry 'port:10014' for key 'NAME'
I catch this error and then try another port until no error is thrown.
Is this a good practice? Or should I first check whether the value exists using a SELECT query and then INSERT the entry if it does not?
Note: AFAIK, in the second approach the SELECT and INSERT can be run as an atomic operation by using an SQL transaction.

Assuming you have multiple 'clients' attempting to grab 'ports' at the same time, your second approach is bad. This is because a different client could sneak in between the SELECT and the INSERT and grab the port.
Your first approach is better because it is 'atomic'. It is also a single action, and therefore faster (not that speed is a big deal here).
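A minimal sketch of that first approach in Node.js, assuming the mysql2/promise driver and a PORTS table where port is the PRIMARY KEY (the pool settings, table, and column names here are illustrative, not from the question):

const mysql = require('mysql2/promise');
const pool = mysql.createPool({ host: 'localhost', user: 'app', database: 'appdb' });

// Try ports in order until an INSERT succeeds; the PRIMARY KEY makes each claim atomic.
async function claimPort(threadId, startPort, endPort) {
  for (let port = startPort; port <= endPort; port++) {
    try {
      await pool.execute('INSERT INTO PORTS (port, thread_id) VALUES (?, ?)', [port, threadId]);
      return port; // the INSERT succeeded, so this thread now owns the port
    } catch (err) {
      if (err.code !== 'ER_DUP_ENTRY') throw err; // only swallow duplicate-key errors
      // the port is already claimed by another thread; try the next one
    }
  }
  throw new Error('no free port in range');
}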

Related

Should I handle ER_DUP_ENTRY before or after INSERT?

I have a Node.js backend application running with MySQL.
I am handling ER_DUP_ENTRY on the Node.js side, after the INSERT query executes, in the current version of my server. So I run the INSERT, and if MySQL returns ER_DUP_ENTRY, I show a warning to the user.
I suspect that triggering this error in MySQL every time puts extra load on the database.
My question is: should I check the database with a SELECT for the duplicate entry before executing the INSERT, or is there no problem with the current version?
You don't want to check before executing the insert query. Just set up a UNIQUE constraint on the table, over a specific column or a combination of columns. Then, whenever you insert the same data again, MySQL handles the duplicate for you.
Check the insert query's response status; from that you can determine whether it succeeded or failed (duplicate or some other error).
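A sketch of that setup, again assuming mysql2/promise; the table, column, and key names are hypothetical. The DDL (shown in comments) is the one-time step; after that, the INSERT's response tells you everything:

const mysql = require('mysql2/promise');
const pool = mysql.createPool({ host: 'localhost', user: 'app', database: 'appdb' });

// One-time schema step, for a single column or a combination of columns:
//   ALTER TABLE users ADD UNIQUE KEY uq_email (email);
//   ALTER TABLE users ADD UNIQUE KEY uq_user_tenant (username, tenant_id);

async function addUser(email, name) {
  try {
    const [result] = await pool.execute('INSERT INTO users (email, name) VALUES (?, ?)', [email, name]);
    return { ok: true, id: result.insertId };
  } catch (err) {
    if (err.code === 'ER_DUP_ENTRY') return { ok: false, reason: 'duplicate' }; // errno 1062
    throw err; // some other failure: report it as such
  }
}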

Using Unique Indices versus Querying the Database

I am working on a login / registration system in node.js.
I usually query the database to check whether the given username already exists, and if it doesn't, I create the new user.
I recently got the idea of using a Unique Index in the MySQL database for the username. I have some questions, though.
What would be the most efficient way to check for duplicates? Search the database for the given username, or use the Unique Index and catch an error from MySQL if it already exists?
I feel unsafe with MySQL spitting out errors when duplicates are made, but maybe I'm just crazy.
If I were to use the Unique Index, would it still be efficient to use it for every unique value? Such as having a Unique index for the username, email etc.?
What would be the most efficient way to check for duplicates? Search the database for the given username, or use the Unique Index and catch an error from MySQL if it already exists?
In the first case you look up the user by username and then check whether anything was found. So the DB checks for this username, and you add one more check of your own on top.
Now consider the second case, where the unique index is present. You give MySQL the data, and it checks first itself, then either throws the error or puts the data into the DB. This way you don't have to double-check whether the username is already in the DB. It will also save you from race conditions.
If you are worried about MySQL throwing errors, don't be. MySQL will throw an integrity error which you can catch and turn into an appropriate response, such as "username exists already".
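For illustration, a sketch of catching that integrity error in a registration handler (mysql2/promise assumed; the table and column names are hypothetical):

async function register(pool, username, passwordHash) {
  try {
    await pool.execute('INSERT INTO users (username, password_hash) VALUES (?, ?)', [username, passwordHash]);
    return { ok: true };
  } catch (err) {
    if (err.code === 'ER_DUP_ENTRY') {
      // the error message names the violated key, so with several unique
      // indexes you can tell the user which field is already taken
      return { ok: false, error: 'username exists already' };
    }
    throw err;
  }
}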
It's better to use Unique Indexes (with the validation occurring in the database engine), because this avoids thread races and ensures database integrity. Validating through a SELECT is unsafe and not a recommended way of doing it.
That said, I recommend checking with a SELECT before inserting anyway, so you can notify the user that the username is taken before the attempt to insert.
Another good reason to use Unique Indexes is performance. Depending on the size of the table, it can be way faster.
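A sketch combining both points: a SELECT purely for friendly feedback, with the unique index as the actual guarantee (mysql2/promise assumed; names are illustrative):

async function registerWithFeedback(pool, username, passwordHash) {
  // Friendly pre-check for UX only; it is NOT a guarantee, because another
  // request can insert the same username between this SELECT and the INSERT.
  const [rows] = await pool.execute('SELECT 1 FROM users WHERE username = ?', [username]);
  if (rows.length > 0) return { ok: false, error: 'username taken' };
  try {
    await pool.execute('INSERT INTO users (username, password_hash) VALUES (?, ?)', [username, passwordHash]);
    return { ok: true };
  } catch (err) {
    // The unique index closes the race that the pre-check cannot.
    if (err.code === 'ER_DUP_ENTRY') return { ok: false, error: 'username taken' };
    throw err;
  }
}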

MySQL performance: letting a UNIQUE field generate an error or manually checking it

Theoretical question about the impact on performance.
One of the fields in my table is unique. For instance, email_address in the Users table.
What has less of an impact on performance? Attempting to add an already existing email address and getting the error, or doing a search on the email field?
The UNIQUE field will probably be faster.
If you tell MySQL that a certain field is unique, it may perform some optimizations.
Additionally, if you want to insert the record only if it isn't in the table already, you might run into concurrency issues. Assume two people are trying to register with the same email address. If you perform the uniqueness check yourself, something like so:
bool exists = userAlreadyExists(email);
if (exists)
    showWarning();
else
    insertUser(email);
something like the following might happen:
User 1 executes userAlreadyExists("foo@example.com") // returns false
User 2 executes userAlreadyExists("foo@example.com") // returns false
User 1 executes insertUser("foo@example.com")
User 2 executes insertUser("foo@example.com") // which is now a duplicate
If you let MySQL perform the uniqueness check, the above won't happen.
If you check and then insert, you have to query the database twice, and in turn MySQL checks the table index twice. You pay both the network overhead and the database processing overhead.
My view is that you should be optimistic: insert, and handle the potential failure gracefully if there are duplicate values.
The two-step approach has another drawback: don't forget there will be concurrent access to your database. Depending on your database setup (isolation level, database engine), the table may be modified by another connection between your SELECT and your INSERT.

MySQL and PDO: Could PDO::lastInsertId theoretically fail?

I have been pondering on this for a while.
Consider a web application of huge proportions, where, let's say, millions of SQL queries are performed every second.
I run my code:
// `Table` must be backquoted, since TABLE is a reserved word in MySQL
$q = $db->prepare('INSERT INTO `Table` (First, Second, Third, Fourth)
                   VALUES (?, ?, ?, ?)');
$q->execute(array($first, $second, $third, $fourth));
Then immediately after, I want to fetch the auto incremented ID of this last query:
$id = $db->lastInsertId();
Is it possible for lastInsertId to fail, i.e. fetch the ID of some SQL insert query that was executed between my two code blocks?
Secondary:
If it can fail, what would be the best way to plug this possible leak?
Would it be safer to create another SQL query to fetch the proper ID from the database, just to be sure?
It will always be safe provided that the PDO implementation is not doing something really bone-headed. The following is from the MySQL information on last_insert_id:
The ID that was generated is maintained in the server on a per-connection basis. This means that the value returned by the function to a given client is the first AUTO_INCREMENT value generated for the most recent statement affecting an AUTO_INCREMENT column by that client. This value cannot be affected by other clients, even if they generate AUTO_INCREMENT values of their own. This behavior ensures that each client can retrieve its own ID without concern for the activity of other clients, and without the need for locks or transactions.
No. lastInsertId is per-connection, and doesn't require a request to the server: MySQL always sends it back in its response packet.
So if the execute method doesn't throw an exception, then you are guaranteed to have the right value in lastInsertId.
It won't ever give you the insert ID of anything else, unless your query failed for some reason (e.g. invalid syntax) in which case it might give you the insert ID from the previous one on the same connection. But not anybody else's.
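The same per-connection guarantee is what Node's mysql2 driver relies on; there, the ID comes back on the result of the INSERT itself, with no second query. A sketch for comparison (this is not the PDO code above; the pool and table names are illustrative):

const mysql = require('mysql2/promise');
const pool = mysql.createPool({ host: 'localhost', user: 'app', database: 'appdb' });

async function insertRow(first, second, third, fourth) {
  const [result] = await pool.execute(
    'INSERT INTO `Table` (First, Second, Third, Fourth) VALUES (?, ?, ?, ?)',
    [first, second, third, fourth]
  );
  // insertId is read from the server's response packet for THIS statement on
  // THIS connection, so other clients' inserts cannot leak into it.
  return result.insertId;
}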

Logging query errors in MySQL

We have an application that uses several SQL queries and might at times generate the odd error.
For example, it could be a:
INSERT INTO TABLE (ID, FIELD) VALUES (1, "field value");
which would result in a:
ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'
Because the unique primary key constraint has been violated.
Is it possible in MySQL to somehow log the error along with the query that caused it? I have tried enabling the error-log and general-log in /etc/mysql/my.cnf, but it never produced the expected result. I could only get it to log every query, without the errors (pretty useless for us; we're only interested in queries that result in errors).
The errors can be caught by the application (in our case they are; we are using Perl DBI). However, when there are several statements in a stored procedure, we do not know which one failed, as the error message does not include the text of the query, or even the name of the table involved. This makes troubleshooting quite difficult.
I am sure I am missing something obvious. For example, in Oracle this is the default behavior, query errors are logged into a text file where they can be easily identified.
This is a client thing. Isolate database accesses in an access layer and generate the log on the client. The database cannot log this.
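A minimal sketch of such an access layer in Node.js (the same idea carries over to Perl DBI): route every query through one function that logs the failing SQL text alongside the error. The pool settings and log path are assumptions:

const mysql = require('mysql2/promise');
const fs = require('fs');
const pool = mysql.createPool({ host: 'localhost', user: 'app', database: 'appdb' });

// Single choke point for all database access: on failure, log the SQL, the
// parameters, and the MySQL error, then rethrow for the caller to handle.
async function query(sql, params = []) {
  try {
    const [rows] = await pool.execute(sql, params);
    return rows;
  } catch (err) {
    fs.appendFileSync('query-errors.log',
      `${new Date().toISOString()} ${err.code || ''} ${err.message}\nSQL: ${sql}\nparams: ${JSON.stringify(params)}\n`);
    throw err;
  }
}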