MySQL - Is it possible to run multiple simultaneous inserts?

I googled and searched on SO, but was not able to find an answer; maybe you could point me to some reference/docs?
This is more about understanding the way MySQL treats table contents while inserting.
I have a table (MyISAM) which has an auto-increment primary key 'autoid'. I am using a simple script to insert thousands of records. What I am trying to do is run multiple instances of this script (you can imagine it as accessing the script from different machines at the same time).
Is MySQL capable of distributing the auto-increment primary keys accordingly without any further action from my side, or do I have to do some sort of table locking for each machine? Maybe I have to choose InnoDB over MyISAM?
What I am trying to achieve is: irrespective of how many machines are simultaneously triggering the script, all inserts should be completed without skipping any auto-increment id or throwing errors like "Duplicate Value for...".
Thanks a lot

The whole point of using a database is that it can handle situations like this transactionally. So yes, this scenario works fine on every commonly used DBMS, including MySQL.
How do you think the average forum would work with 50 users simultaneously posting replies to a topic, all from forked parallel Apache processes possibly only microseconds apart, or from multiple load-balanced web servers?
Internally the server just uses a mutex/semaphore, like any other process would, when accessing and incrementing the shared resource (the auto-increment value of a table in this case), which mitigates the inherent race conditions.
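A minimal sketch of what this answer describes (the table and column names below are illustrative, not taken from the question):

-- Any engine works for this; the auto-increment counter is protected internally.
CREATE TABLE items (
    autoid  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    payload VARCHAR(255) NOT NULL
) ENGINE=InnoDB;

-- Two connections running this at the same moment each get their own id;
-- MySQL serializes the allocation of auto-increment values, so no
-- "Duplicate entry" errors occur.
INSERT INTO items (payload) VALUES ('inserted from machine A');
SELECT LAST_INSERT_ID();  -- returns the id generated for this connection only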

Related

How does a lock work for two inserts in MySQL?

Let's say the isolation level is REPEATABLE READ, as it really is by default for MySQL.
I have two inserts (no checking, no unique columns).
a) Let's say these two inserts happen at the same moment. What will happen? Will MySQL run the first insert and then the second, or both of them in different threads?
b) Let's say I have an insert statement and a column called vehicle_id that is unique, but before inserting, I check whether the value already exists; if it doesn't, I go ahead and insert. Now suppose two threads in my code arrive at the same moment, so both of them pass the if check.
Now they both try to insert the same vehicle_id. How does MySQL handle this? If inserts were handled asynchronously, maybe both could happen so quickly that both rows get stored even though vehicle_id is a unique field. If not, one insert happens first while the second one waits; when the first is done, the second tries to insert and fails because of the unique vehicle_id restriction. Which of these is it?
I am asking because, for INSERT, the locks taken under REPEATABLE READ seem to lose their meaning. I know how it works for updating/selecting.
As I understand it the situation is:
a) threads are assigned per connection. If both inserts are received on the same connection then they will be executed in the same thread, one after the other, in the order in which they are received. If they arrive on different connections then they run in different threads, and it comes down to whichever thread is scheduled first; that is OS-determined and non-deterministic from your point of view.
b) if a column is defined as UNIQUE at the server, then you cannot insert a second row with the same value so the second insert must fail.
Trying to use a conflicting index in the way you described looks like an application logic problem, not a MySQL problem. Whatever entity is responsible for your unique IDs (your application, in this case) needs to ensure that they are unique. One approach is to implement an application lock using MySQL, which allows applications running in isolation from each other to share a lock at the server; check the MySQL docs for how to use it. Note that it is intended for application-level coordination - the lock is advisory, so it does not constrain ordinary statements on the server. Another approach would be to use UUIDs for unique keys and rely on their uniqueness when you need to create a new one.
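As a sketch, the server-side application lock referred to above is exposed as GET_LOCK()/RELEASE_LOCK(); the lock name and timeout here are illustrative:

SELECT GET_LOCK('vehicle_insert', 10);   -- 1 = acquired, 0 = timed out after 10 seconds
-- ... check whether the vehicle_id already exists, INSERT it if it does not ...
SELECT RELEASE_LOCK('vehicle_insert');   -- let the next waiting connection proceed

Because the lock is only honored by code that asks for it, every writer has to go through the same GET_LOCK() call for this to be effective.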

Can I INSERT into table while UPDATING multiple different rows with MariaDB or MySQL?

I am creating a custom analytics system and currently in the database designing process. I'm planning to use MariaDB with the InnoDB engine to be able to handle big loads.
The data I'm expecting could be around 500k clicks/day. I will need to insert these rows into the database, which means that I'll have around 5.8 inserts/sec on average. However, at the same time, I want to record if someone visited a page associated with that click. (basically to record funnels)
So what I'm planning to do is to create additional columns, search for the ID of the specific row, and then update the relevant column with the exact time of the visit.
My first question: is this generally a recommended approach to design the database like that? If not, how else is it worth to design the database?
My only concern is that while rows are being updated the table will be locked and inserts can't run, which would slow down the user experience.
My second question: is this something I should worry about - that the table gets locked while updating and thus slows down inserts? Does it hurt performance?
InnoDB doesn't lock the table for insert if you're performing the update. Your users won't experience any weird hanging.
It's an MVCC compliant engine, designed to handle concurrent access to underlying tables.
You can control the engine's behavior by choosing an appropriate isolation level, however the default (REPEATABLE READ) is excellent and does the job more than well.
If a table is being modified by multiple clients (not users that connect to your site, but connections established to MySQL via a scripting language or some other service) and there are many inserts/updates/deletes, MySQL can report a deadlock.
A deadlock isn't fatal: it means more than one transaction tried to lock the same resources in a conflicting order (for example, two threads updating the same rows at the same time, where only one can proceed), and MySQL rolls one of them back. It's an indication that you should retry the query.
I'm suggesting that you take care of all possible scenarios in the language of your choice when it comes to handling MySQL that's under heavier I/O.
~6 inserts a second isn't a lot; make sure you're allowing MySQL to access sufficient system resources. For InnoDB, check the value of innodb_buffer_pool_size, or read up a bit on what it is and how to use it to make your database run fast.
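For instance, you could check the current setting and raise it in the server config; the 1G value below is purely an illustrative starting point:

SHOW VARIABLES LIKE 'innodb_buffer_pool_size';  -- value is reported in bytes
-- In my.cnf, a common rule of thumb on a dedicated database server is to
-- give InnoDB most of the available RAM:
-- [mysqld]
-- innodb_buffer_pool_size = 1G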
Good luck!
At a mere 5.6 inserts/second, there won't be much of a problem.
I do, however, suggest vertical partitioning for "Likes", "Upvotes", "Clicks", and similar things. These tend to have a lot of UPDATEs of random single rows, and may interfere with other activity.
That is, have a separate table with (perhaps) just 2 columns:
The id of the item being Liked/Clicked/etc.
A counter.
It is simple enough (and fast enough) to JOIN via that id when you want to display info including the counter.
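A minimal sketch of such a counter table (all names are illustrative):

CREATE TABLE click_counts (
    item_id INT UNSIGNED NOT NULL PRIMARY KEY,   -- id of the item being Liked/Clicked
    clicks  INT UNSIGNED NOT NULL DEFAULT 0      -- the counter
) ENGINE=InnoDB;

-- Bump the counter for one item; the row is created on the first click.
INSERT INTO click_counts (item_id, clicks) VALUES (42, 1)
ON DUPLICATE KEY UPDATE clicks = clicks + 1;

-- Join back to the main table when displaying, e.g.
-- SELECT i.*, c.clicks FROM items i LEFT JOIN click_counts c ON c.item_id = i.id;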
As already pointed out, the row is locked, not the table.

Relying on MySQL features vs my script

I've always relied on my PHP programming for most processes which I need to do, that I know can be done via a MySQL query or feature. For example:
I know that MySQL has a FOREIGN KEY feature that helps maintain data integrity but I don't rely on MySQL. I might as well make my scripts do this as it is more flexible; I'm basically using MySQL as STORAGE and my SCRIPTS as the processor.
I would like to keep things that way and put most of the load on my code. I make sure my scripts are robust: they check for conflicts, orphaned rows, etc. every time they make changes, and I even have a SYSTEM CHECK routine that runs through all of these data verification processes. So I really try to do everything on the script side, as long as it doesn't significantly impact overall performance (I know MySQL can do some things faster internally; I do use MySQL's COUNT() function, of course).
Of course, any direct changes made to the tables will not trigger the routines in my script, but that's a different story. I'm pretty comfortable doing this and I plan to keep doing it until I am convinced otherwise.
The only thing that I really have an issue with right now is, checking for duplicates.
My current routine is basically inserting products with serial numbers. I need to make sure that there are no duplicate serial numbers entered into the database.
I can simply rely on a MySQL UNIQUE constraint to make sure of this, or I can do it on the script side, and the latter is what I did.
This product routine is a BATCH routine where anything from 1 to 500 products will be entered into the database at one call to the script.
Obviously I check for both duplicate entries in the data submitted as well as the data in the database. Here's a chunk of my routine
for ($i = 1; $i <= $qty; $i++) {
    $serial = $serials_array[$i - 1]; // -1 because arrays start at zero

    // Check for duplicates within the submitted data itself
    if (isset($serial_check[$serial])) { // duplicate found!
        exit("stat=err&statMsg=Duplicate serial found in your entry! ($serial)");
    } else {
        $serial_check[$serial] = 1;
    }

    // Check for duplicates already in the database (one SELECT per serial)
    if (db_checkRow("inventory_stocks", "WHERE serial='$serial'")) {
        exit("stat=err&statMsg=Serial Number is already used. ($serial)");
    }
}
OK so basically it's:
1) Check the submitted data for duplicates by building an array that each submitted serial number is checked against - this is no problem and really fast in PHP, even up to 1000 records.
2) But to check the database for duplicates, I have to call a function I made (db_checkRow), which basically issues a SELECT statement for EACH serial submitted and sees whether there's a hit/duplicate.
So, basically, 500 SELECT statements to check for duplicates vs just the MySQL unique constraint feature.
Does it really matter much??
Another reason I design my software like this is that if I ever need to deploy my stuff on a different database, I don't rely too much on database features, so I can easily port my application with very little tweaking.
It's almost guaranteed that MySQL will be faster at checking duplicates. Unless you are running your PHP on some uber-machine and MySQL on an old wristwatch, the index check will be faster and better optimized than anything you can do in PHP.
Not to mention that your process only works until someone else (or some other app) starts writing to the db. And you save yourself having to write the duplicate-checking code in the first place - and again in the next app - and so on.
You're wrong. You're very, dangerously wrong.
The database has been designed for a specific function. You will never beat MySQL at enforcing a unique constraint. The database has been designed to do explicitly that as quickly as possible. It is impossible that you can do it quicker or more efficiently in PHP as you still need to access the database to determine whether the data you're inserting would be a duplicate.
This is easily demonstrated by the fact that you have 500 select statements to enforce a single unique constraint. As your table grows this will get even more ridiculous. What happens when your table hits 2,000 rows? What if you have a new table with a million rows?
Use the database features that have been designed explicitly to make your life easy.
You're also assuming that the only way the database will be accessed is through the application. This is an extremely dangerous assumption that is almost certain to be incorrect as time progresses.
Please read this Programmers question, which seems like it was written just for you. Simply put, “Never do in code what you can get the SQL server to do well for you”. I cannot emphasise this enough.
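As a sketch of what that looks like in practice (the index name is made up; the table and column come from the question):

-- One-time schema change: MySQL itself now enforces serial uniqueness.
ALTER TABLE inventory_stocks ADD UNIQUE KEY uq_serial (serial);

-- Each insert then either succeeds or fails with error 1062 (ER_DUP_ENTRY),
-- which the script can catch and report - no per-serial SELECT needed.
INSERT INTO inventory_stocks (serial) VALUES ('SN-0001');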

Preventing duplicate database inserts/updates in our Rails app from simultaneous transactions

As our Rails application deals with increasing user activity and load, we're starting to see some issues with simultaneous transactions. We've used JavaScript to disable/remove the buttons after clicks, and this works for the most part, but isn't an ideal solution. In short, users are performing an action multiple times in rapid succession. Because the action results in a row insert into the DB, we can't just lock one row in the table. Given the high level of activity on the affected models, I can't use the usual locking mechanisms ( http://guides.rubyonrails.org/active_record_querying.html#locking-records-for-update ) that you would use for an update.
This question ( Prevent simultaneous transactions in a web application ) addresses a similar issue, but it uses file locking (flock) to provide a solution, so this won't work with multiple application servers, as we have. We could do something similar I suppose with Redis or another data store that is available to all of our application servers, but I don't know if this really solves the problem fully either.
What is the best way to prevent duplicate database inserts from simultaneously executed transactions?
Try adding a unique index to the table where you are having the issue. It won't prevent the system from attempting to insert duplicate data, but it will prevent it from getting stored in the database. You will just need to handle the insert when it fails.
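A minimal sketch of that approach at the SQL level (the table and columns here are made up; in Rails the same index can be created in a migration with unique: true, and the failed insert rescued as ActiveRecord::RecordNotUnique):

-- A second identical submission violates the unique key instead of creating a duplicate row.
ALTER TABLE user_actions ADD UNIQUE KEY uq_user_action (user_id, action_type, target_id);

INSERT INTO user_actions (user_id, action_type, target_id) VALUES (1, 'vote', 99);
-- Repeating the same insert now fails with error 1062 (ER_DUP_ENTRY),
-- which the application handles as "already done".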

Scalability of failing transactions in MySQL

I have a table that stores messages from one user to another: messages(user_id, friend_id, message, created_date). My primary key is (friend_id, created_date). This prevents duplicate messages (AFAIK) because they will fail to insert.
Right now this is OK because my code generates about 20 of these queries at a time per user, and I only have one user. But if there were hundreds or thousands of users, would this create a bottleneck in my database with all the failed transactions? And if so, what kinds of things could I do to improve the situation?
EDIT:
The boiled-down question is: should I use the primary key constraint, check outside of MySQL, or use some other MySQL functionality to keep duplicates out of the database?
Should be fine, as MySQL will just do a primary key lookup internally and ignore the record (I'm assuming you're using INSERT IGNORE). If you were checking whether rows exist before inserting, MySQL would still check again when you insert, so if most inserts are going to succeed you're saving an extra check. Only if the vast majority of inserts were failing (not likely) might the savings from not sending unnecessary data outweigh the occasional repeated check.
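A minimal sketch of the table from the question with INSERT IGNORE (the column types are assumed):

CREATE TABLE messages (
    user_id      INT UNSIGNED NOT NULL,
    friend_id    INT UNSIGNED NOT NULL,
    message      TEXT NOT NULL,
    created_date DATETIME NOT NULL,
    PRIMARY KEY (friend_id, created_date)
) ENGINE=InnoDB;

-- A duplicate (friend_id, created_date) pair is silently skipped instead of raising an error.
INSERT IGNORE INTO messages (user_id, friend_id, message, created_date)
VALUES (1, 2, 'hello', '2013-05-01 12:00:00');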