concurrent mySql database queries - mysql

I'm running mySql server that being updated every 4 hours.
In the mean while data can be retrieved.
Does mySql handle this scenario where the DB is being updated
and a query from the user is received?
or I should handle this scenario?
Is it possible to create a snapshot of the DB just before the update takes place
and query this DB?
Thanks

If you are worried about how concurrent writes are handled, that's one thing, but you can safely read from a MySQL database that is being updated by another thread without worrying about corrupted data.
Of course, if the writing thread is performing multiple writes that need to be atomic (ie all or none), make sure he is using a transaction.

You can use transactions and/or locking to make sure that every user sees a consistent view (i.e. completely updated or not updated at all). The database engine itself is very well suited to the scenario. If you are not worried about what readers see at the exact time the updates are running, you don't need to do anything.

Considering my scenario I think that no further actions are required.
If the write comes before the read the user will get the most updated data
otherwise it does not, and he will wait for next query.
This fits my requirements.
Thank you all

Related

MySQL query synchronization/locking question

I have a quick question that I can't seem to find online, not sure I'm using the right wording or not.
Do MySql database automatically synchronize queries or coming in at around the same time? For example, if I send a query to insert something to a database at the same time another connection sends a query to select something from a database, does MySQL automatically lock the database while the insert is happening, and then unlock when it's done allowing the select query to access it?
Thanks
Do MySql databases automatically synchronize queries coming in at around the same time?
Yes.
Think of it this way: there's no such thing as simultaneous queries. MySQL always carries out one of them first, then the second one. (This isn't exactly true; the server is far more complex than that. But it robustly provides the illusion of sequential queries to us users.)
If, from one connection you issue a single INSERT query or a single UPDATE query, and from another connection you issue a SELECT, your SELECT will get consistent results. Those results will reflect the state of data either before or after the change, depending on which query went first.
You can even do stuff like this (read-modify-write operations) and maintain consistency.
UPDATE table
SET update_count = update_count + 1,
update_time = NOW()
WHERE id = something
If you must do several INSERT or UPDATE operations as if they were one, you'll need to use the InnoDB engine, and you'll need to use transactions. The transaction will block SELECT operations while it is in progress. Teaching you to use transactions is beyond the scope of a Stack Overflow answer.
The key to understanding how a modern database engine like InnoDB works is Multi-Version Concurrency Control or MVCC. This is how simultaneous operations can run in parallel and then get reconciled into a consistent "view" of the database when fully committed.
If you've ever used Git you know how you can have several updates to the same base happening in parallel but so long as they can all cleanly merge together there's no conflict. The database works like that as well, where you can begin a transaction, apply a bunch of operations, and commit it. Should those apply without conflict the commit is successful. If there's trouble the transaction is rolled back as if it never happened.
This ability to juggle multiple operations simultaneously is what makes a transaction-capable database engine really powerful. It's an important component necessary to meet the ACID standard.
MyISAM, the original engine from MySQL 3.0, doesn't have any of these features and locks the whole database on any INSERT operation to avoid conflict. It works like you thought it did.
When creating a database in MySQL you have your choice of engine, but using InnoDB should be your default. There's really no reason at all to use MyISAM as any of the interesting features of that engine (e.g. full-text indexes) have been ported over to InnoDB.

Many Individual Queries v. One Big One

I'm in a situation where an entire column in a table (used for user tokens) needs to be wiped, i.e., all user tokens are reset simultaneously. There are two ways of going about it: reset each user's token individually with a separate UPDATE query; or make one big query that affects all rows.
The advantage of one big query is that it will obviously be much faster, but I'm worried about the implications of a large UPDATE query when the database is big. Will requests that occur during the query be affected?
Afraid it's not that simple. Even if you enable dirty reads, running one big update has a lot of drawbacks:
long running transaction that updates one column will effectively block other insert, update and delete transactions.
long running transaction causes enourmous load on disk because server is having to write to a log file everything that is taking place so that you can roll back that huge transaction.
if a transaction fails, you would have to rerun it entirely, it is not restartable.
So if simultaneous requirement can be interpreted "in one batch that may take a while to run", I would opt for batching it. A good research write up on performance of DELETEs in MySql is here: http://mysql.rjweb.org/doc.php/deletebig, and I think most of the findings are applicable to UPDATE.
The trick will be finding the optimal "batch size".
Added benefits of batching is that you can make this process resilient to failures and restart-friendly.
The answer depends on the transaction and isolation level you've established.
You can set isolation to allow "dirty reads", "phantom reads", or force serialization of reads and writes.
However you do that UPDATE, you'll want it to be a single unit of work.
I'd recommend minimizing network latency and updating all the user tokens in one network roundtrip. This means either writing a single query or batching many into one request.

MySQL: How many UPDATES per second can an average box support?

We've got a constant stream of simple updates to a single MySQL table (storing user activity information). Let's say we group these into batch updates each second.
I want a ballpark idea of when mysql on a typical 4-core 8GB box will start having an issue keeping up with the updates coming in each second. E.g. how many rows of updates can I make # 1 per second?
This is a thought exercise to decide if I should get going with MySQL in the early days of our applications release (simplify development), or if MySQL's likely to bomb so soon as to make it not worth even venturing down that path.
The only way you can get a decent figure is through benchmarking your specific use case. There are just too many variables and there is no way around that.
It shouldn't take too long either if you just knock a bash script or a small demo app and hammer it with jmeter, then that can give you a good idea.
I used jmeter when trying to benchmark a similar use case. The difference was I was looking for write throughput for number of INSERTS. The most useful thing that came out when I was playing was the 'innodb_flush_log_at_trx_commit' param. If you are using INNODB and don't need ACID compliance for your use case, then changing it to 0. This makes a huge difference to INSERT throughput and will likely do the same in your UPDATE use case. Although note that with this setting, changes only get flushed to disk once per second, so if your server gets a power cut or something, you could lose a seconds worth of data.
On my Quad Core 8GB Machine for my use case:
innodb_flush_log_at_trx_commit=1 resulted in 80 INSERTS per second
innodb_flush_log_at_trx_commit=0 resulted in 2000 INSERTS per second
These figures will probably bear no relevance to your use case - which is why you need to benchmark it yourself.
A lot of it depends on the quality of the code which you use to push to the DB.
If you write your batch to insert a single value per INSERT request (i.e.,
INSERT INTO table (field) VALUES (value_1);
INSERT INTO table (field) VALUES (value_2);
...
INSERT INTO table (field) VALUES (value_n);
, your performance will crash and burn.
If you insert multiple values using a single INSERT (i.e.
INSERT INTO table (field) values (value_1),(value_2)...(value_n);
, you'll find that you could easily insert many records per second
As an example, I wrote a quick app which needed to add the details of a request for an LDAP account to a holding DB. Inserting one field at a time (i.e., LDAP_field, LDAP_value), execution of the whole script took 10's of seconds. When I concatenated the values into a single INSERT request, execution time of the script went down to about 2 seconds from start to finish. This included the overhead of starting and committing a transaction
Hope this helps
Its not easy to give a general answer to this question. The numbers you ask for rely heavily not only on the hardware of your database server, MySQL itself, but also on server/client configuration, network and - equally important - on your database/table design too.
Generally speaking, with a naked MySQL setup on a state-of-the-art server and update statements using unique keys, I don't have issues below 200 update-statementsp er second if I fire them from localhost, at least that's what I get on my six year old winxp test enviroment. A naked installation on a new system will scale this way higher. If you think way bigger, one server isn't the way to go. MySQL can be tweaked and scaled out in some ways, therefore many companies rely heavily on it.
Just some basics:
If the fields you want to update have huge index files, the update
statements are alot slower since each statement has to write not only
data, but also index informations.
If your update statement cannot
use an index, it might take longer for the server to allocate the
required fields it has to update.
Slow memory and/or slow harddisks
might also slow down overall server performance.
Slow network
connection slows down communication between client and server.
There are whole books written about it, so I'll stop here and advise some further reading, if you're interested!

Updating large quantities of data in a production database

I have a large quantity of data in a production database that I want to update with batches of data while the data in the table is still available for end user use. The updates could be insertion of new rows or updates of existing rows. The specific table is approximately 50M rows, and the updates will be between 100k - 1M rows per "batch". What I would like to do is insert replace with a low priority.. In other words, I want the database to kind of slowly do the batch import without impacting performance of other queries that are occurring concurrently to the same disk spindles. To complicate this, the update data is heavily indexed. 8 b-tree indexes across multiple columns to facilitate various lookup that adds quite a bit of overhead to the import.
I've thought about batching the inserts down into 1-2k record blocks, then having the external script that loads the data just pause for a couple seconds between each insert, but that's really kind of hokey IMHO. Plus, during a 1M record batch, I really don't want to add 500-1000 2second pauses to add 20-40 minutes of extra load time if its not needed. Anyone have ideas on a better way to do this?
I've dealt with a similar scenario using InnoDB and hundreds of millions of rows. Batching with a throttling mechanism is the way to go if you want to minimize risk to end users. I'd experiment with different pause times and see what works for you. With small batches you have the benefit that you can adjust accordingly. You might find that you don't need any pause if you run this all sequentially. If your end users are using more connections then they'll naturally get more resources.
If you're using MyISAM there's a LOW_PRIORITY option for UPDATE. If you're using InnoDB with replication be sure to check that it's not getting too far behind because of the extra load. Apparently it runs in a single thread and that turned out to be the bottleneck for us. Consequently we programmed our throttling mechanism to just check how far behind replication was and pause as needed.
An INSERT DELAYED might be what you need. From the linked documentation:
Each time that delayed_insert_limit rows are written, the handler checks whether any SELECT statements are still pending. If so, it permits these to execute before continuing.
Check this link: http://dev.mysql.com/doc/refman/5.0/en/server-status-variables.html What I would do is write a script that will execute your batch updates when MySQL is showing Threads_running or Connections under a certain number. Hopefully you have some sort of test server where you can determine what a good number threshold might be for either of those server variables. There are plenty of other of server status variables to look at in there also. Maybe control the executions by the Innodb_data_pending_writes number? Let us know what works for you, its an interesting question!

What is the best way to update (or replace) an entire database table on a live machine?

I'm being given a data source weekly that I'm going to parse and put into a database. The data will not change much from week to week, but I should be updating the database on a regular basis. Besides this weekly update, the data is static.
For now rebuilding the entire database isn't a problem, but eventually this database will be live and people could be querying the database while I'm rebuilding it. The amount of data isn't small (couple hundred megabytes), so it won't load that instantaneously, and personally I want a bit more of a foolproof system than "I hope no one queries while the database is in disarray."
I've thought of a few different ways of solving this problem, and was wondering what the best method would be. Here's my ideas so far:
Instead of replacing entire tables, query for the difference between my current database and what I want to place in the database. This seems like it could be an unnecessary amount of work, though.
Creating dummy data tables, then doing a table rename (or having the server code point towards the new data tables).
Just telling users that the site is going through maintenance and put the system offline for a few minutes. (This is not preferable for obvious reasons, but if it's far and away the best answer I'm willing to accept that.)
Thoughts?
I can't speak for MySQL, but PostgreSQL has transactional DDL. This is a wonderful feature, and means that your second option, loading new data into a dummy table and then executing a table rename, should work great. If you want to replace the table foo with foo_new, you only have to load the new data into foo_new and run a script to do the rename. This script should execute in its own transaction, so if something about the rename goes bad, both foo and foo_new will be left untouched when it rolls back.
The main problem with that approach is that it can get a little messy to handle foreign keys from other tables that key on foo. But at least you're guaranteed that your data will remain consistent.
A better approach in the long term, I think, is just to perform the updates on the data directly (your first option). Once again, you can stick all the updating in a single transaction, so you're guaranteed all-or-nothing semantics. Even better would be online updates, just updating the data directly as new information becomes available. This may not be an option for you if you need the results of someone else's batch job, but if you can do it, it's the best option.
BEGIN;
DELETE FROM TABLE;
INSERT INTO TABLE;
COMMIT;
Users will see the changeover instantly when you hit commit. Any queries started before the commit will run on the old data, anything afterwards will run on the new data. The database will actually clear the old table once the last user is done with it. Because everything is "static" (you're the only one who ever changes it, and only once a week), you don't have to worry about any lock issues or timeouts. For MySQL, this depends on InnoDB. PostgreSQL does it, and SQL Server calls it "snapshotting," and I can't remember the details off the top of my head since I rarely use the thing.
If you Google "transaction isolation" + the name of whatever database you're using, you'll find appropriate information.
We solved this problem by using PostgreSQL's table inheritance/constraints mechanism.
You create a trigger that auto-creates sub-tables partitioned based on a date field.
This article was the source I used.
Which database server are you using? SQL 2005 and above provides a locking method called "Snapshot". It allows you to open a transaction, do all of your updates, and then commit, all while users of the database continue to view the pre-transaction data. Normally, your transaction would lock your tables and block their queries, but snapshot locking would be perfect in your case.
More info here: http://blogs.msdn.com/craigfr/archive/2007/05/16/serializable-vs-snapshot-isolation-level.aspx
But it requires SQL Server, so if you're using something else....
Several database systems (since you didn't specify yours, I'll keep this general) do offer the SQL:2003 Standard statement called MERGE which will basically allow you to
insert new rows into a target table from a source which don't exist there yet
update existing rows in the target table based on new values from the source
optionally even delete rows from the target that don't show up in the import table anymore
SQL Server 2008 is the first Microsoft offering to have this statement - check out more here, here or here.
Other database system probably will have similar implementations - it's a SQL:2003 Standard statement after all.
Marc
Use different table names(mytable_[yyyy]_[wk]) and a view for providing you with a constant name(mytable). Once a new table is completely imported update your view so that it uses that table.