In our application we are using the SELECT FOR UPDATE statement to lock our entities against concurrent access from other threads. One of our original architects, who implemented this logic, left a comment in our wiki saying that MySQL has a limit of 200 for SELECT FOR UPDATE statements. I could not find anything like this anywhere on the internet. Does anyone know if this is true, and if so, is there any way we can increase the limit?
The primary reason SELECT FOR UPDATE is used is concurrency control, for the case where two users try to access the same data at the same time. If both users then try to update that data, there can be a serious problem in the database.

In some database systems this can seriously affect data integrity. To help prevent such concurrency problems, database management systems like SQL Server and MySQL use locking to keep data integrity problems from occurring.

These locks delay the execution of a transaction if it conflicts with a transaction that is already running.

In SQL Server and MySQL, the locks taken by SELECT FOR UPDATE are held until the transaction is committed or rolled back.
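To illustrate how this looks in practice, here is a minimal JDBC sketch of SELECT ... FOR UPDATE inside a transaction; the connection details and the accounts/balance names are placeholders, not anything from the question:

```java
import java.sql.*;

public class SelectForUpdateExample {
    public static void main(String[] args) throws SQLException {
        // Connection details are placeholders for this sketch.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password")) {
            conn.setAutoCommit(false); // locks are held until commit/rollback
            try (PreparedStatement select = conn.prepareStatement(
                     "SELECT balance FROM accounts WHERE id = ? FOR UPDATE");
                 PreparedStatement update = conn.prepareStatement(
                     "UPDATE accounts SET balance = ? WHERE id = ?")) {
                select.setLong(1, 42L);
                try (ResultSet rs = select.executeQuery()) {
                    if (rs.next()) {
                        long balance = rs.getLong("balance");
                        // The row is now locked; other transactions that try to
                        // SELECT ... FOR UPDATE or UPDATE it will wait here.
                        update.setLong(1, balance - 100);
                        update.setLong(2, 42L);
                        update.executeUpdate();
                    }
                }
                conn.commit();   // releases the row lock
            } catch (SQLException e) {
                conn.rollback(); // also releases the row lock
                throw e;
            }
        }
    }
}
```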
In MySQL NDB Cluster, however, transaction records are allocated to the individual MySQL servers, so a minimum total number of transaction records is needed for the cluster.

The MySQL documentation gives this formula for that total:

TotalNoOfConcurrentTransactions = (maximum number of tables accessed in any single transaction + 1) * number of SQL nodes

Each data node is then expected to handle TotalNoOfConcurrentTransactions / number of data nodes; the documentation's example assumes an NDB Cluster with 4 data nodes.

In that example there are 10 SQL nodes, and transactions perform joins over 10 tables. A single join of 10 tables needs 10 + 1 = 11 transaction records, so 10 such joins per transaction need 110 records per MySQL server, or 1100 for the whole cluster, which spread across the 4 data nodes gives 1100 / 4 = 275 as MaxNoOfConcurrentTransactions on each data node.
A LIMIT clause on a SELECT FOR UPDATE, by contrast, only restricts the number of rows that are selected and therefore locked; it is not a server-wide cap.

I am not sure, but your architect probably derived the 200 figure from a calculation like the one above in the MySQL documentation.
Please check the link below for more information.
https://dev.mysql.com/doc/refman/8.0/en/mysql-cluster-ndbd-definition.html#ndbparam-ndbd-maxnoofconcurrentoperations
I'm using a 3rd-party ETL application (Pentaho/Kettle/Spoon), so unfortunately I'm not sure of the exact SQL query, but I can try different manual queries.

I'm just wondering why MySQL seems to allow multiple processes to run "insert, but if found, update" queries at the same time.

MS SQL does not: it "locks" the rows while one query is doing an insert/update, and throws an error if another query tries to insert/update over the same data.
I guess this makes sense ... but I'm just a bit annoyed that MySQL allows this, and MS SQL does not.
Is there any way to get around this?
I just want the fastest way possible to insert/update a list of 1000 records into a table. In the past I divided the list into 20 processes of 50 records each doing the insert/updates. This worked in parallel because none of the 1000 records are duplicates of each other; some of them are only duplicates of rows already in the table, so they can be inserted/updated in any order, as long as it happens.
Any thoughts? Thanks
MySQL (in older versions) uses the MyISAM storage engine by default, which does not support transactions. SQL Server supports transactions, as you've observed, though you can tweak the isolation levels to do risky things like READ UNCOMMITTED (very rarely a good idea).

If you want your MySQL tables to have transaction support, you need to create them explicitly with the option ENGINE=INNODB. Older versions also supported ENGINE=BDB, the Berkeley Database engine. See the MySQL docs for more details on InnoDB:
http://dev.mysql.com/doc/refman/5.7/en/innodb-storage-engine.html
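For illustration, here is a rough JDBC sketch of creating an InnoDB table and doing the "insert, but if found, update" step as a single statement; the connection details, table, and column names are invented for the example:

```java
import java.sql.*;

public class InnoDbUpsertExample {
    public static void main(String[] args) throws SQLException {
        // Connection details are placeholders for this sketch.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password");
             Statement st = conn.createStatement()) {

            // Explicitly ask for the transactional InnoDB engine.
            st.executeUpdate(
                "CREATE TABLE IF NOT EXISTS records (" +
                "  id BIGINT PRIMARY KEY," +
                "  value VARCHAR(255)" +
                ") ENGINE=INNODB");

            // "Insert, but if found, update" as a single atomic statement.
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO records (id, value) VALUES (?, ?) " +
                    "ON DUPLICATE KEY UPDATE value = VALUES(value)")) {
                ps.setLong(1, 1L);
                ps.setString(2, "hello");
                ps.executeUpdate();
            }
        }
    }
}
```

Because the table is InnoDB, concurrent upserts on the same key block each other on row locks rather than silently interleaving.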
In Oracle we can create a table, insert data, and select from it with the PARALLEL option.

Is there any similar option in MySQL? I am migrating from Oracle to MySQL, and my system does many more selects than data changes, so an option to select in parallel is what I am looking for.

E.g.: suppose my table has 1 million rows and I use the parallel(5) option. Then five threads run the same query with limits, each fetching approximately 200K rows, and as the final result I get 1 million records in 1/5th of the usual time.
In short, the answer is no.
The MySQL server is designed to execute concurrent user sessions in parallel, but not to execute one given user session in several parts in parallel.
This is a personal opinion, but I would refrain from applying optimizations up front based on assumptions about how the RDBMS works. Better to measure the query first and see whether its response time is a real concern, and only then investigate possible optimizations.
"Premature optimization is the root of all evil." (Donald Knuth)
Different queries within MySQL already run in parallel, one per connection. If you want to run several queries simultaneously from your program, however, you need to open separate connections through workers that your program can access asynchronously, as in the sketch below.
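A rough Java sketch of that idea: split the table by primary-key range and let each worker read its own slice over its own connection. The table name, key ranges, pool size, and connection details are assumptions for the example, and whether this actually beats a single scan depends on your schema and workload:

```java
import java.sql.*;
import java.util.*;
import java.util.concurrent.*;

public class ParallelRangeSelect {
    public static void main(String[] args) throws Exception {
        int workers = 5;
        long rowsPerWorker = 200_000L;                  // ~1M rows split 5 ways
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<Integer>> results = new ArrayList<>();

        for (int i = 0; i < workers; i++) {
            long lo = i * rowsPerWorker;
            long hi = lo + rowsPerWorker;
            results.add(pool.submit(() -> {
                // Each worker uses its own connection, so MySQL runs them concurrently.
                try (Connection conn = DriverManager.getConnection(
                         "jdbc:mysql://localhost:3306/test", "user", "password");
                     PreparedStatement ps = conn.prepareStatement(
                         "SELECT id, payload FROM big_table WHERE id >= ? AND id < ?")) {
                    ps.setLong(1, lo);
                    ps.setLong(2, hi);
                    int count = 0;
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            count++;                    // process the row here
                        }
                    }
                    return count;
                }
            }));
        }

        int total = 0;
        for (Future<Integer> f : results) {
            total += f.get();
        }
        pool.shutdown();
        System.out.println("Fetched " + total + " rows across " + workers + " workers");
    }
}
```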
You could also run tasks by creating events or using delayed inserts; however, I don't think either applies very well here. Something else to consider:
Generally, some operations are guarded between individual query sessions (called transactions). These are supported by InnoDB backends, but not MyISAM tables (but it supports a concept called atomic operations). There are various levels of isolation which differ in which operations are guarded from each other (and thus how operations in one parallel transaction affect another) and in their performance impact. - Holger Just
He also mentions the MySQL transactions page, which briefly goes over the different engine types available to MySQL (MyISAM being faster, but not as reliable):

MySQL Transactions
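As a small illustration of the isolation levels mentioned in the quote, here is a JDBC sketch that picks one for a session; the connection details and the items table are placeholders:

```java
import java.sql.*;

public class IsolationLevelExample {
    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password")) {
            // Choose how strongly this session's reads are guarded from
            // concurrent transactions; REPEATABLE READ is InnoDB's default.
            conn.setTransactionIsolation(Connection.TRANSACTION_REPEATABLE_READ);
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement();
                 ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM items")) {
                rs.next();
                System.out.println("rows = " + rs.getLong(1));
            }
            conn.commit();
        }
    }
}
```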
We recently switched our tables to use InnoDB (from MyISAM) specifically so we could take advantage of the ability to make updates to our database while still allowing SELECT queries to occur (i.e. by not locking the entire table for each INSERT)
We have a cycle that runs weekly and INSERTS approximately 100 million rows using "INSERT INTO ... ON DUPLICATE KEY UPDATE ..."
We are fairly pleased with the current update performance of around 2000 insert/updates per second.
However, while this process is running, we have observed that regular queries take a very long time.
For example, this took about 5 minutes to execute:
SELECT itemid FROM items WHERE itemid = 950768
(When the INSERTs are not happening, the above query takes several milliseconds.)
Is there any way to force SELECT queries to take a higher priority? Otherwise, are there any parameters that I could change in the MySQL configuration that would improve the performance?
We would ideally perform these updates when traffic is low, but anything more than a couple seconds per SELECT query would seem to defeat the purpose of being able to simultaneously update and read from the database. I am looking for any suggestions.
We are using Amazon's RDS as our MySQL server.
Thanks!
I imagine you have already solved this nearly a year later :) but I thought I would chime in. According to MySQL's documentation on internal locking (as opposed to explicit, user-initiated locking):
Table updates are given higher priority than table retrievals. Therefore, when a lock is released, the lock is made available to the requests in the write lock queue and then to the requests in the read lock queue. This ensures that updates to a table are not “starved” even if there is heavy SELECT activity for the table. However, if you have many updates for a table, SELECT statements wait until there are no more updates.
So it sounds like your SELECT is getting queued up until your inserts/updates finish (or at least there's a pause.) Information on altering that priority can be found on MySQL's Table Locking Issues page.
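One general mitigation, not something taken from the answer above, is to commit the bulk load in smaller batches so that locks are released periodically and waiting SELECTs get a turn. A rough JDBC sketch, using the items/itemid names from the question and a value column invented for the example:

```java
import java.sql.*;
import java.util.Map;

public class BatchedUpsert {
    // Runs the weekly "INSERT ... ON DUPLICATE KEY UPDATE" load in small
    // transactions instead of one huge one.
    static void upsertInBatches(Connection conn, Map<Long, String> rows, int batchSize)
            throws SQLException {
        conn.setAutoCommit(false);
        String sql = "INSERT INTO items (itemid, value) VALUES (?, ?) "
                   + "ON DUPLICATE KEY UPDATE value = VALUES(value)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int inBatch = 0;
            for (Map.Entry<Long, String> row : rows.entrySet()) {
                ps.setLong(1, row.getKey());
                ps.setString(2, row.getValue());
                ps.addBatch();
                if (++inBatch == batchSize) {
                    ps.executeBatch();
                    conn.commit(); // release row locks so concurrent SELECTs get a turn
                    inBatch = 0;
                }
            }
            ps.executeBatch(); // flush the remainder
            conn.commit();
        }
    }
}
```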
I have a table in a SQL Server database in which I record the latest activity time of users. Can somebody please confirm that SQL Server will automatically handle the scenario where multiple update requests are received simultaneously for different users? I am expecting 25-50 concurrent update requests on this table, but each request is responsible for updating different rows in the table. Do I need something extra like connection pooling, etc.?
Yes, SQL Server will handle this scenario.

It is a DBMS, and it is built for scenarios like this one.

When you insert/update/delete a row, SQL Server locks the table/row/page to guarantee that you can do what you want. This lock is released when you are done inserting/updating/deleting the row.
Check this Link
And introduction-to-locking-in-sql-server
But there are a few things you should do:

1 - Make sure whatever you do inside the transaction is fast. Because of locking, if you stay connected for too long, other requests to the same table may be blocked until you are done, and that can lead to a timeout.

2 - Always use a transaction (see the sketch after this list).

3 - Make sure to adjust the fill factor of your indexes. Check Fill Factor on MSDN.

4 - Adjust the isolation level according to what you need.

5 - Get rid of unused indexes to speed up your inserts/updates.
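A minimal JDBC sketch of item 2 for this use case, assuming a hypothetical user_activity table with user_id and last_activity columns; obtaining the SQL Server connection is left out:

```java
import java.sql.*;

public class LastActivityUpdater {
    // Updates one user's latest activity time inside a short transaction.
    static void touchUser(Connection conn, long userId) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE user_activity SET last_activity = SYSUTCDATETIME() WHERE user_id = ?")) {
            ps.setLong(1, userId);
            ps.executeUpdate();
            conn.commit();   // keep the transaction short so row locks are released quickly
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}
```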
Connection pooling is not really related to your question. Connection pooling is a technique that avoids the extra overhead of creating a new connection to the database every time you send a request. In C# and other languages that use ADO, this is done automatically. Check this out: SQL Server Connection Pooling.

Other links that may be useful:
best-practices-for-inserting-updating-large-amount-of-data-in-sql-2008
Speed Up Insert Performance
Hi, I am developing a site with JSP/Servlets running on Tomcat for the front-end and a MySQL database for the back-end, accessed through JDBC.

Many users of the site can access and write to the database at the same time, so my question is:

Do I need to explicitly take locks before each write/read access to the DB in my code?

Or does Tomcat handle this for me?

Also, do you have any suggestions on how best to implement this? I have already written a significant amount of JDBC code without taking any locks :/
I think you are thinking about transactions when you say "locks". At the lowest level, your database server already ensures that parallel reads and writes won't corrupt your tables.

But if you want to ensure consistency across tables, you need to use transactions. Simply put, what a transaction gives you is an all-or-nothing guarantee. That is, if you want to insert an Order in one table and the related OrderItems in another table, you need the assurance that if the insertion of the OrderItems fails (in step 2), the changes made to the Order table (step 1) will also be rolled back. This way you never end up with a row in the Order table that has no associated rows in the OrderItems table.

This, of course, is a very simplified description of what a transaction is. You should read more about it if you are serious about database programming.

In Java, you usually perform a transaction with roughly the following steps (see the sketch after this list):
Set autoCommit to false on your JDBC connection.
Do several inserts and/or updates using the same connection.
Call conn.commit() when all the inserts/updates that go together are done.
If there is a problem somewhere during step 2, call conn.rollback().
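Putting those steps together, here is a minimal JDBC sketch using the Order/OrderItems example from above; the exact table and column names are made up:

```java
import java.sql.*;

public class OrderDao {
    // Inserts an order and its items atomically: either both succeed or neither does.
    static void insertOrderWithItems(Connection conn, long orderId, long[] itemIds)
            throws SQLException {
        conn.setAutoCommit(false);                      // step 1
        try (PreparedStatement insertOrder = conn.prepareStatement(
                 "INSERT INTO Orders (order_id) VALUES (?)");
             PreparedStatement insertItem = conn.prepareStatement(
                 "INSERT INTO OrderItems (order_id, item_id) VALUES (?, ?)")) {
            insertOrder.setLong(1, orderId);            // step 2
            insertOrder.executeUpdate();
            for (long itemId : itemIds) {
                insertItem.setLong(1, orderId);
                insertItem.setLong(2, itemId);
                insertItem.executeUpdate();
            }
            conn.commit();                              // step 3
        } catch (SQLException e) {
            conn.rollback();                            // step 4
            throw e;
        }
    }
}
```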