MySQL query synchronization/locking question - mysql

I have a quick question that I can't seem to find an answer to online; I'm not sure whether I'm using the right wording.
Do MySQL databases automatically synchronize queries coming in at around the same time? For example, if I send a query to insert something into a database at the same time another connection sends a query to select something from it, does MySQL automatically lock the database while the insert is happening, and then unlock it when it's done, allowing the select query to access it?
Thanks

Do MySql databases automatically synchronize queries coming in at around the same time?
Yes.
Think of it this way: there's no such thing as simultaneous queries. MySQL always carries out one of them first, then the second one. (This isn't exactly true; the server is far more complex than that. But it robustly provides the illusion of sequential queries to us users.)
If, from one connection you issue a single INSERT query or a single UPDATE query, and from another connection you issue a SELECT, your SELECT will get consistent results. Those results will reflect the state of data either before or after the change, depending on which query went first.
You can even do stuff like this (read-modify-write operations) and maintain consistency.
UPDATE table
SET update_count = update_count + 1,
update_time = NOW()
WHERE id = something
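If you must do several INSERT or UPDATE operations as if they were one, you'll need to use the InnoDB engine, and you'll need to use transactions. While the transaction is in progress, plain SELECT operations from other connections are not blocked; they keep seeing the data as it was before the transaction's changes, and see all of those changes at once after it commits. Teaching you to use transactions is beyond the scope of a Stack Overflow answer.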
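Still, for a rough feel of what "several operations as if they were one" means, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere; against MySQL you would use a MySQL driver and InnoDB tables instead. The accounts table and the transfer function are made up for illustration.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two rows; both UPDATEs take effect together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

transfer(conn, 1, 2, 30)   # succeeds: both rows change
transfer(conn, 1, 2, 500)  # fails halfway: the first UPDATE is rolled back too
balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()
```

The point is only that the failed transfer leaves no half-finished change behind; how other connections see the in-flight changes depends on the engine and isolation level.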

The key to understanding how a modern database engine like InnoDB works is Multi-Version Concurrency Control or MVCC. This is how simultaneous operations can run in parallel and then get reconciled into a consistent "view" of the database when fully committed.
If you've ever used Git you know how you can have several updates to the same base happening in parallel but so long as they can all cleanly merge together there's no conflict. The database works like that as well, where you can begin a transaction, apply a bunch of operations, and commit it. Should those apply without conflict the commit is successful. If there's trouble the transaction is rolled back as if it never happened.
This ability to juggle multiple operations simultaneously is what makes a transaction-capable database engine really powerful. It's an important component necessary to meet the ACID standard.
MyISAM, the older engine that was the default before MySQL 5.5, doesn't have any of these features and locks the whole table on a write such as an INSERT to avoid conflict. It works roughly like you thought it did, except that the lock covers one table rather than the whole database.
When creating a table in MySQL you have your choice of engine, but InnoDB should be your default. There's really no reason to use MyISAM, as the interesting features of that engine (e.g. full-text indexes, available in InnoDB since MySQL 5.6) have been ported over to InnoDB.

Related

Can I INSERT into table while UPDATING multiple different rows with MariaDB or MySQL?

I am creating a custom analytics system and currently in the database designing process. I'm planning to use MariaDB with the InnoDB engine to be able to handle big loads.
The data I'm expecting could be around 500k clicks/day. I will need to insert these rows into the database, which means that I'll have around 5.8 inserts/sec on average. However, at the same time, I want to record if someone visited a page associated with that click. (basically to record funnels)
So what I'm planning to do is to create additional columns and search for the ID of the specific row then update that column with the exact time of the visit.
My first question: is designing the database like this a generally recommended approach? If not, how should I design it instead?
My only concern is that the table will be locked while rows are being updated, so inserts will have to wait, slowing down the user experience.
My second question: is this something I should worry about, that the table gets locked while updating, and thus slowing down inserts? Does it hurt performance?
InnoDB doesn't lock the table for insert if you're performing the update. Your users won't experience any weird hanging.
It's an MVCC compliant engine, designed to handle concurrent access to underlying tables.
You can control the engine's behavior by choosing an appropriate isolation level; however, the default (REPEATABLE READ) is excellent and does the job well.
If a table is being modified over multiple connections (not users of your site, but connections established to MySQL by a scripting language or some other service) and there are many inserts/updates/deletes, MySQL can return an error saying a deadlock occurred.
A deadlock is reported as an error (1213, "Deadlock found when trying to get lock; try restarting transaction"), but it doesn't mean anything is broken. It happens when two transactions each hold a lock that the other one needs, for example when they update the same two rows in opposite order. InnoDB detects this, rolls one of the transactions back, and lets the other proceed. It's an indication you should repeat the rolled-back transaction.
I suggest handling these cases (deadlocks, lock wait timeouts) in the language of your choice when MySQL is under heavier I/O.
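A minimal sketch of that kind of handling, in Python. DeadlockError and run_with_retry are hypothetical names: a real MySQL driver raises its own exception type carrying error code 1213, and you would catch that instead.

```python
import random
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for the driver's deadlock exception (MySQL error 1213)."""


def run_with_retry(transaction, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Call `transaction()` and run it again if a deadlock rolled it back."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except DeadlockError:
            if attempt == max_attempts:
                raise  # give up and let the caller see the error
            # Back off with some jitter so the competing transactions
            # are less likely to collide again in lockstep.
            sleep(base_delay * attempt + random.uniform(0, base_delay))


# Demo: a fake transaction that deadlocks twice, then succeeds.
attempts = []


def flaky_update():
    attempts.append(1)
    if len(attempts) < 3:
        raise DeadlockError("Deadlock found when trying to get lock; try restarting transaction")
    return "committed"


result = run_with_retry(flaky_update, sleep=lambda seconds: None)
```

The function passed in has to contain the whole transaction, since the rollback undoes all of it, not just the statement that hit the deadlock.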
~6 inserts a second isn't a lot; just make sure you're allowing MySQL to access sufficient system resources. For InnoDB, check the value of innodb_buffer_pool_size or google a bit to see what it is and how to use it to make your database run fast.
Good luck!
At a mere 5.8/second, there won't be much problem.
I do, however, suggest vertical partitioning for "Likes", "Upvotes", "Clicks", and similar things. These tend to have a lot of UPDATEs of random single rows, and may interfere with other activity.
That is, have a separate table with (perhaps) just 2 columns:
The id of the item being Liked/Clicked/etc.
A counter.
It is simple enough (and fast enough) to JOIN via that id when you want to display info including the counter.
As already pointed out, the row is locked, not the table.
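A rough sketch of that two-column counter table, using Python's built-in sqlite3 module as a stand-in so it runs without a server. With MariaDB/InnoDB the same SQL shape applies. The table and column names (items, item_clicks, clicks) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT);
    -- The frequently updated counter lives in its own narrow table.
    CREATE TABLE item_clicks (
        item_id INTEGER PRIMARY KEY,
        clicks  INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO items VALUES (1, 'Landing page'), (2, 'Pricing page');
    INSERT INTO item_clicks (item_id) VALUES (1), (2);
""")


def record_click(conn, item_id):
    """Single-statement read-modify-write on one counter row."""
    with conn:
        conn.execute(
            "UPDATE item_clicks SET clicks = clicks + 1 WHERE item_id = ?", (item_id,)
        )


for item_id in (1, 1, 2):
    record_click(conn, item_id)

# JOIN via the id when the counter needs to be displayed with the item.
rows = conn.execute("""
    SELECT i.title, c.clicks
    FROM items AS i
    JOIN item_clicks AS c ON c.item_id = i.id
    ORDER BY i.id
""").fetchall()
```

The wide, mostly static table is never touched by the click traffic; only the small counter rows are.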

MySQL Transaction

This could be a dumb question, but I tried to search for it and found nothing.
I've been using MySQL for years (not that long), but I have never tried MySQL transactions.
Now my question is: what would happen if I issue an INSERT or DELETE statement from multiple clients using transactions? Would it lock the table and prevent other clients from performing their queries?
What would happen if another client issues a transactional query while the first client still has an unfinished transaction?
I appreciate any help.
P.S. Most likely I will insert from a file or CSV; it could be a big chunk of data or just a small one.
MySQL automatically performs locking for single SQL statements to keep clients from interfering with each other, but this is not always sufficient to guarantee that a database operation achieves its intended result, because some operations are performed over the course of several statements. In this case, different clients might interfere with each other.
Source: http://www.informit.com/articles/article.aspx?p=2036581&seqNum=12

Mysql Lock times in slow query log

I have an application that has been running fine for quite a while, but recently a couple of items have started popping up in the slow query log.
All the queries are complex and ugly multi-join select statements that could use refactoring. I believe all of them have blobs, meaning they get written to disk. The part that gets me curious is why some of them have a lock time associated with them. None of the queries have any specific locking protocols set by the application. As far as I know, by default you can read against locks unless explicitly specified.
So my question: what scenarios would cause a select statement to have to wait for a lock (and thereby be reported in the slow query log)? Assume both InnoDB and MyISAM environments.
Could the disk interaction be listed as some sort of lock time? If yes, is there documentation around that says this?
thanks in advance.
MyISAM will give you concurrency problems: an entire table is locked while an insert is in progress.
InnoDB should have no problems with reads, even while a write/transaction is in progress, thanks to its MVCC.
However, just because a query is showing up in the slow-query log doesn't mean the query is slow - how many seconds, how many records are being examined?
Put "EXPLAIN" in front of the query to get a breakdown of the examinations going on for the query.
here's a good resource for learning about EXPLAIN (outside of the excellent MySQL documentation about it)
I'm not certain about MySQL, but I know that in SQL Server select statements do NOT read against locks. Doing so would allow you to read uncommitted data, and potentially see duplicate records or miss a record entirely. The reason is that if another process is writing to the table, the database engine may decide it's time to reorganize some data and shift it around on disk. So it moves a record you already read to the end and you see it again, or it moves one from the end up to a spot you've already passed.
There's a guy on the net somewhere who actually wrote a couple of scripts to prove that this happens and I tried them once and it only took a few seconds before a duplicate showed up. Of course, he designed the scripts in a fashion that would make it more likely to happen, but it proves that it definitely can happen.
This is okay behaviour if your data doesn't need to be accurate and can certainly help prevent deadlocks. However, if you're working on an application dealing with something like people's money then that's very bad.
In SQL Server you can use the WITH (NOLOCK) hint to tell your select statement to ignore locks. I'm not sure what the equivalent in MySQL would be, but maybe someone else here will say.

mySQL Replication

We have an update process which currently takes over an hour and means that our DB is unusable during this period.
If I set up replication, would this solve the problem, or would the replicated DB suffer from exactly the same problem, with the tables being locked during the update?
Is it possible to have the replicated DB prioritize reading over updating?
Thanks,
D
I suspect that with replication you're just going to be duplicating the issue (unless most of the time is spent in CPU and only results in a couple of records being updated).
Without knowing a lot more about the schema, the distribution and size of the data, and the update process, it's impossible to say how best to resolve the problem. But you might get some mileage out of using InnoDB instead of MyISAM and making sure that the update is implemented as a number of discrete steps (e.g. using stored procedures) rather than a single DML statement.
MySQL gives you the ability to run queries delayed. Example: "INSERT DELAYED INTO...", which causes the query to be executed only when MySQL has time for it. (Note that INSERT DELAYED never worked with InnoDB tables, was deprecated in MySQL 5.6, and is accepted but ignored from MySQL 5.7 on.)
Based on your input, it sounds like you are using MyISAM tables. MyISAM only supports table-level locking, which means that a single update will lock the whole table until the query is completed. InnoDB, on the other hand, uses row-level locking and MVCC, so SELECT queries will not wait (hang) for updates to complete.
So you have the best chances of a better sysadmin life if you change to InnoDB :)
When it comes to replication, it is pretty normal to separate updates and selects onto two different MySQL servers, and that does tend to work very well. But if you are using MyISAM tables and do a lot of updates, the locking issue itself will still be there.
So my 2 cents: first get rid of MyISAM, then consider replication or a better-scaled MySQL server if the problem still exists. (The key to good performance in MySQL is to have at least as much physical RAM as the total size of all indexes across all databases.)

SQL Server / MySQL / Access - speeding up inserting many rows in an inefficient manner

SETUP
I have to insert a couple million rows in either SQL Server 2000/2005, MySQL, or Access. Unfortunately I don't have an easy way to use bulk insert or BCP or any of the other ways that a normal human would go about this. The inserts will happen on one particular database but that code needs to be db agnostic -- so I can't do bulk copy, or SELECT INTO, or BCP. I can however run specific queries before and after the inserts, depending on which database I'm importing to.
eg.
If IsSqlServer() Then
DisableTransactionLogging();
ElseIf IsMySQL() Then
DisableMySQLIndices();
End If
... do inserts ...
If IsSqlServer() Then
EnableTransactionLogging();
ElseIf IsMySQL() Then
EnableMySQLIndices();
End If
QUESTION
Are there any interesting things I can do to SQL Server that might speed up these inserts?
For example, is there a command I could issue to tell SQL Server, "Hey, don't bother recording these transactions in the transaction log".
Or maybe I could say, "Hey, I have a million rows coming in, so don't update your index until I'm totally finished".
ALTER INDEX [IX_TableIndex] ON Table DISABLE
... inserts
ALTER INDEX [IX_TableIndex] ON Table REBUILD
(Note: Above index disable only works on 2005, not 2000. Bonus points if you know a way to do this on 2000).
What about MySQL, and Access?
The single biggest thing that will kill performance here is the fact that (it sounds like) you're executing a million different INSERTs against the DB. Each INSERT is treated as a single operation. If you can do this as a single operation, then you will almost certainly have a huge performance improvement.
Both MySQL and SQL Server support 'selects' of constant expressions without a table name, so this should work as one statement:
INSERT INTO MyTable(ID, name)
SELECT 1, 'Fred'
UNION ALL SELECT 2, 'Wilma'
UNION ALL SELECT 3, 'Barney'
UNION ALL SELECT 4, 'Betty'
It's not clear to me if Access supports that, not having Access available. HOWEVER, Access does support constants in a SELECT, as far as I can tell, and you can coerce the above into ANSI SQL-92 (which should be supported by all 3 engines; it's about as close to 'DB agnostic' as you'll get) by just adding
FROM OneRowTable
to the end of every individual SELECT, where 'OneRowTable' is a table with just one row of dummy data.
This should let you insert a million rows of data in far fewer than a million INSERT statements, and things like index reshuffling will be done once per statement rather than once per row. You may have much less need for other optimisations after that.
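Here is a rough sketch of how client code might build such statements in batches. It is written in Python and run against the built-in sqlite3 module only so that it is self-contained; build_batched_insert and chunked are made-up helper names, and the "?" placeholder style is an assumption that depends on your driver (many MySQL drivers use %s instead). Table and column names are interpolated directly, so they must come from trusted code, never from user input.

```python
import sqlite3


def build_batched_insert(table, columns, rows):
    """Return (sql, params) for one INSERT ... SELECT ... UNION ALL SELECT ... statement."""
    placeholders = ", ".join("?" for _ in columns)
    selects = " UNION ALL ".join(f"SELECT {placeholders}" for _ in rows)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) {selects}"
    params = [value for row in rows for value in row]
    return sql, params


def chunked(rows, size):
    """Yield consecutive slices of `rows` with at most `size` items each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER, name TEXT)")

people = [(1, "Fred"), (2, "Wilma"), (3, "Barney"), (4, "Betty")]
statements = 0
for batch in chunked(people, 3):  # tiny batch size, just for the demo
    sql, params = build_batched_insert("MyTable", ["ID", "name"], batch)
    conn.execute(sql, params)
    statements += 1
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM MyTable").fetchone()[0]
```

Every engine has some limit on statement size or on the number of parameters, so in practice the batch size would be a few hundred or a few thousand rows rather than the whole million.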
Is this a regular process or a one-time event?
I have, in the past, just scripted out the current indexes, dropped them, inserted the rows, then re-added the indexes.
The SQL Management Studio can script out the indexes from the right click menus...
For SQL Server:
You can set the recovery model to "Simple", so your transaction log will be kept small. Do not forget to set it back afterwards.
Disabling the indexes is actually a good idea. This will work on SQL 2005, not on SQL Server 2000.
alter index [INDEX_NAME] on [TABLE_NAME] disable
And to enable
alter index [INDEX_NAME] on [TABLE_NAME] rebuild
And then just insert the rows one by one. You have to be patient, but at least it is somewhat faster.
If it is a one-time thing (or it happens often enough to justify automating this), also consider dropping/disabling all indexes, and then adding/re-enabling them again when the insert is done.
The trouble with setting the recovery model to simple is that it affects any other users entering data at the same time, and thus will make their changes unrecoverable.
Same thing with disabling the indexes: this disables them for everyone and may make the database run slower than a slug.
Suggest you run the import in batches.
If this is not something that needs to be read terribly quickly, you can do an "Insert Delayed" into the table on MySQL. This allows your code to continue running without having to wait for the insert to actually happen. This does have some limitations, but if your primary concern is to get the program to finish quickly, this may help. Be warned that there is a nice long list of situations where this may not act as expected (for one, it does not work with InnoDB tables, and MySQL 5.7 and later ignore the DELAYED keyword). Check the docs.
I do not know if this functionality works for Access or MS SQL, though.
Have you considered using the Factory pattern? I'm guessing you're writing the code for this, so with the factory pattern you could code up a factory that returns a concrete "IDataInserter"-type class that does the work for you.
This would still allow you to be database agnostic and get the fastest method for each type of database.
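A minimal sketch of that idea in Python. All class and function names here are made up for illustration; the SQL Server statements are the ALTER INDEX commands from the question, and the MySQL ones assume ALTER TABLE ... DISABLE KEYS, which only affects non-unique indexes on MyISAM tables.

```python
class GenericInserter:
    """Fallback for engines with no special preparation (e.g. Access)."""

    def __init__(self, table, index=None):
        self.table = table
        self.index = index

    def before_statements(self):
        return []

    def after_statements(self):
        return []


class SqlServerInserter(GenericInserter):
    """SQL Server 2005+: disable the index up front, rebuild it afterwards."""

    def before_statements(self):
        return [f"ALTER INDEX [{self.index}] ON [{self.table}] DISABLE"]

    def after_statements(self):
        return [f"ALTER INDEX [{self.index}] ON [{self.table}] REBUILD"]


class MySqlInserter(GenericInserter):
    """MySQL: DISABLE KEYS only affects non-unique indexes on MyISAM tables."""

    def before_statements(self):
        return [f"ALTER TABLE {self.table} DISABLE KEYS"]

    def after_statements(self):
        return [f"ALTER TABLE {self.table} ENABLE KEYS"]


_INSERTERS = {"sqlserver": SqlServerInserter, "mysql": MySqlInserter}


def make_inserter(db_kind, table, index=None):
    """Factory: pick the engine-specific class, or the generic fallback."""
    inserter_class = _INSERTERS.get(db_kind, GenericInserter)
    return inserter_class(table, index)


inserter = make_inserter("sqlserver", "MyTable", "IX_TableIndex")
plan = inserter.before_statements() + ["... do inserts ..."] + inserter.after_statements()
```

The calling code only ever sees before_statements and after_statements, so the If/ElseIf chain from the question collapses into the single make_inserter call.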
SQL Server 2000/2005, MySQL, and Access can all load directly from a tab/CR-delimited text file; they just have different commands to do it. If you've got the case statement to determine which DB you're importing into, just figure out each one's preferred way of importing a text file.
Can you use DTS (2000) or SSIS (2005) to build a package to do this? DTS and SSIS can both pull from the same source and pipe out to the different potential destinations. Go for SSIS if you can. There's a lot of good, fast technology in there along with functionality to embed the IsSQLServer, IsMySQL, etc. logic.
It's worth considering breaking your inserts into smaller batches; a single transaction with lots of queries will be slow.
You might consider using SQL Server's bulk-logged recovery model during your bulk insert.
http://msdn.microsoft.com/en-us/library/ms190422(SQL.90).aspx
http://msdn.microsoft.com/en-us/library/ms190203(SQL.90).aspx
You might also disable the indexes on the target table during your inserts.