I have two transactions that share a CountDownLatch. Each transaction runs in a separate thread. I need this mechanism to reproduce a "dirty read" situation.
The first transaction updates the "target entity" and flushes the changes without committing; then I put the thread into a waiting state (countDownLatch).
In the second transaction I fetch the "target entity", copy the dirty field to another entity and save it. After the save operation I count the latch down so that the first transaction continues and rolls back.
Below are the code samples of the transaction methods.
The first:
@Transactional
public void updatePrice() {
    log.info("Start price updating. Thread '{}'", Thread.currentThread().getName());
    Collection<Product> products = productRepository.findAll();
    products.forEach(this::updatePrice); // per-product overload (not shown) that changes the price
    productRepository.saveAll(products);
    entityManager.flush();
    log.info("Flush changes and wait for order transaction");
    try {
        latch.await();
    } catch (InterruptedException e) {
        log.error("Something wrong");
        Thread.currentThread().interrupt();
    }
    log.error("Rollback changes");
    throw new RuntimeException("Unexpected exception");
}
The second: I have a main service whose method is transactional, and a wrapper service around it that counts the latch down.
Wrapper:
public void doOrder(Long productId, Long amount) {
    log.info("Start do order. Thread '{}'", Thread.currentThread().getName());
    orderService.doOrder(productId, amount);
    log.info("Order transaction committed");
    latch.countDown();
    log.info("Finish order process");
}
Main service:
@Transactional
public void doOrder(Long productId, Long amount) {
    Product product = productRepository.findById(productId).get();
    log.info("Get product");
    Order order = Order.builder().price(product.getPrice()).amount(amount).productId(new Long(productId)).build();
    orderRepository.save(order);
    log.info("Save order");
}
So, at the line
orderRepository.save(order);
the thread freezes. I see only the insert statement in the logs.
But if I remove the relation between 'order' and 'product', the insert no longer freezes.
I suspect there is a deadlock between the row lock on 'product' and the CountDownLatch. But this issue occurs only with the MySQL JDBC driver. If I switch to the PostgreSQL JDBC driver, there is no issue.
PS:
The first transaction executes the following queries:
select * from product;
update product set price = ...
The second one executes:
select * from product where id = ...;
insert into product_order(product_id, amount, price) values (...)
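For reference, this is roughly how I drive the two transactions from a test; the latch is a single CountDownLatch(1) visible to both services (the field and bean names below are simplified, not the exact ones from my project):
@Test
public void reproduceDirtyRead() throws Exception {
    // Both transactional methods run on separate threads and share one CountDownLatch(1).
    ExecutorService executor = Executors.newFixedThreadPool(2);

    // Thread 1: updates prices, flushes, then blocks on latch.await() and finally rolls back.
    Future<?> priceUpdate = executor.submit(() -> priceService.updatePrice());

    // Thread 2: reads the uncommitted price, saves the order, then counts the latch down.
    Future<?> order = executor.submit(() -> orderWrapperService.doOrder(1L, 2L));

    order.get(10, TimeUnit.SECONDS);

    // updatePrice() throws a RuntimeException on purpose, so this get() is expected
    // to fail with an ExecutionException once the first transaction has rolled back.
    try {
        priceUpdate.get(10, TimeUnit.SECONDS);
    } catch (ExecutionException expected) {
        // expected: the first transaction rolled back
    }

    executor.shutdownNow();
}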
@Test
public void transaction() throws Exception {
    Connection conn = null;
    PreparedStatement ps = null;
    try {
        String sql = "insert into `1` values(?, ?, ?, ?)";
        conn = JDBCUtils.getConnection();
        ps = conn.prepareStatement(sql);
        conn.setAutoCommit(false);
        for (int i = 1; i <= 10000; i++) {
            ps.setObject(1, i);
            ps.setObject(2, 10.12345678);
            ps.setObject(3, "num_" + i);
            ps.setObject(4, "2021-12-24 19:00:00");
            ps.addBatch();
        }
        ps.executeBatch();
        ps.clearBatch();
        conn.commit();
    } catch (Exception e) {
        if (conn != null) {
            conn.rollback();
        }
        e.printStackTrace();
    } finally {
        JDBCUtils.closeResources(conn, ps);
    }
}
With setAutoCommit(true), inserts are very slow on both local MySQL and distributed MySQL.
When I commit the transaction manually, as in the code above, local MySQL becomes much faster, but the insert speed on distributed MySQL is still very slow.
Are there any additional parameters I need to set?
Setting parameters probably won't help (much).
There are a couple of reasons for the slowness:
With autocommit=true you are committing on every insert statement. That means that each new row must be written to disk before the database server returns the response to the client.
With autocommit=false there is still a client -> server -> client round trip for each insert statement. Those round trips add up to a significant amount of time.
One way to make this faster is to insert multiple rows with each insert statement, but that is messy because you would need to generate complex (multi-row) insert statements.
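For illustration, a multi-row insert for the employees table used in the example below would look something like this (the values are placeholders, and the SQL has to be assembled with the right number of (?, ?) groups):
PreparedStatement ps = c.prepareStatement(
        "INSERT INTO employees VALUES (?, ?), (?, ?), (?, ?)");
ps.setString(1, "John"); ps.setString(2, "Doe");
ps.setString(3, "Dave"); ps.setString(4, "Smith");
ps.setString(5, "Ann");  ps.setString(6, "Lee");
ps.executeUpdate();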
A better way is to use JDBC's batch feature to reduce the number of round-trips. For example:
PreparedStatement ps = c.prepareStatement("INSERT INTO employees VALUES (?, ?)");
ps.setString(1, "John");
ps.setString(2,"Doe");
ps.addBatch();
ps.clearParameters();
ps.setString(1, "Dave");
ps.setString(2,"Smith");
ps.addBatch();
ps.clearParameters();
int[] results = ps.executeBatch();
(Attribution: above code copied from this answer by @Tusc)
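With MySQL Connector/J specifically, batching typically only cuts down the round trips if statement rewriting is enabled on the connection URL; the driver then rewrites the batch into multi-row INSERT statements on the wire. A minimal sketch (the host, schema and credentials are placeholders):
// Placeholder host/schema/credentials; the relevant part is rewriteBatchedStatements=true.
String url = "jdbc:mysql://localhost:3306/mydb?rewriteBatchedStatements=true";
Connection conn = DriverManager.getConnection(url, "user", "password");
conn.setAutoCommit(false);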
If that still isn't fast enough, you should get even better performance using MySQL's native bulk insert mechanism, e.g. LOAD DATA INFILE; see High-speed inserts with MySQL.
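A minimal sketch of issuing LOAD DATA LOCAL INFILE through plain JDBC (the table name, file path and the allowLoadLocalInfile URL option are placeholders/assumptions; the file's columns must match the table's column order):
// Illustration only: streams the whole file to the server in one statement,
// which is usually much faster than row-by-row INSERTs.
String url = "jdbc:mysql://localhost:3306/mydb?allowLoadLocalInfile=true";
try (Connection conn = DriverManager.getConnection(url, "user", "password");
     Statement stmt = conn.createStatement()) {
    stmt.execute("LOAD DATA LOCAL INFILE '/tmp/rows.csv' "
            + "INTO TABLE my_table "
            + "FIELDS TERMINATED BY ',' "
            + "LINES TERMINATED BY '\\n'");
}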
For completeness, I am adding this suggestion from @Wilson Hauck:
"In your configuration [mysqld] section, innodb_change_buffer_max_size=50 # from 25 (percent) for improved INSERT rate per second. SHOW FULL PROCESSLIST; to monitor when the instance has completed adjustment, then do your inserts and put it back to 25 percent for typical processing speed."
This may increase the insert rate depending on your table and its indexes, and on the order in which you are inserting the rows.
But the flip-side is that you may be able to achieve the same speedup (or more!) by other means; e.g.
by sorting your input so that rows are inserted in index order, or
by dropping the indexes, inserting the records and then recreating the indexes.
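As a rough sketch of the drop-and-recreate approach (the table and index names here are made up):
// Hypothetical table/index names: drop the secondary index, bulk load, then rebuild it.
try (Statement stmt = conn.createStatement()) {
    stmt.execute("ALTER TABLE my_table DROP INDEX idx_my_table_name");
    // ... run the bulk insert (batched INSERTs or LOAD DATA INFILE) here ...
    stmt.execute("ALTER TABLE my_table ADD INDEX idx_my_table_name (name)");
}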
You can read about the change buffer here and make your own judgements.
Using EF Core 2.2.6 and Pomelo.EntityFrameworkCore.MySql 2.2.6 (with MySqlConnector 0.59.2). I have a model for UserData:
public class UserData
{
    [DatabaseGenerated(DatabaseGeneratedOption.None)]
    public ulong ID { get; private set; }

    [Required]
    public Dictionary<string, InventoryItem> Inventory { get; set; }

    // a second JSON-backed column on this entity (used by the constructor and the raw INSERT further below)
    public Dictionary<string, string> Data { get; set; }

    public UserData()
    {
        Data = new Dictionary<string, string>();
    }
}
I have a REST method that can be called that will add items to the user inventory:
using (var transaction = context.Database.BeginTransaction())
{
    UserData data = await context.UserData.FindAsync(userId);
    // there is code here to detect duplicate entries/etc, but I've removed it for brevity
    foreach (var item in items) data.Inventory.Add(item.ItemId, item);
    context.UserData.Update(data);
    await context.SaveChangesAsync();
    transaction.Commit();
}
If two or more calls to this method are made with the same user id, they can run concurrently despite the transaction. This causes the data to sometimes be incorrect. For example, if the inventory is empty and two calls are made simultaneously to add items (item A and item B), sometimes the database ends up containing only A or only B, not both. From the logging it appears that EF can read from the database while the other read/write is still in progress, so the code works with a stale view of the inventory when it writes back to the database. So I tried marking the isolation level as serializable.
using (var transaction = context.Database.BeginTransaction(System.Data.IsolationLevel.Serializable))
Now I sometimes see an exception:
MySql.Data.MySqlClient.MySqlException (0x80004005): Deadlock found when trying to get lock; try restarting transaction
I don't understand how this code could deadlock... Anyways, I tried to proceed by wrapping this whole thing in a try/catch, and retry:
public static async Task<ResponseError> AddUserItem(Controller controller, MyContext context, ulong userId, List<InventoryItem> items, int retry = 5)
{
    ResponseError result = null;
    try
    {
        using (var transaction = context.Database.BeginTransaction(System.Data.IsolationLevel.Serializable))
        {
            UserData data = await context.UserData.FindAsync(userId);
            // there is code here to detect duplicate entries/etc, but I've removed it for brevity
            foreach (var item in items) data.Inventory.Add(item.ItemId, item);
            context.UserData.Update(data);
            await context.SaveChangesAsync();
            transaction.Commit();
        }
    }
    catch (Exception e)
    {
        if (retry > 0)
        {
            await Task.Delay(SafeRandomGenerator(10, 500));
            return await AddUserItem(controller, context, userId, items, retry - 1);
        }
        else
        {
            // store exception and return error
        }
    }
    return result;
}
And now I am back to the data being sometimes correct, sometimes not. So I think the deadlock is another problem, but this is the only method accessing this data. So, I'm at a loss. Is there a simple way to read from the database (locking the row in the process) and then writing back (releasing the lock on write) using EF Core? I've looked at using concurrency tokens, but this seems overkill for what appears (on the surface to me) to be a trivial task.
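At the SQL level, the pattern I'm after looks roughly like the sketch below (shown with plain JDBC purely to illustrate; my real code is EF Core, and the variable names are placeholders): read the row with SELECT ... FOR UPDATE so it stays locked until the transaction commits, then write it back.
conn.setAutoCommit(false);
try (PreparedStatement select = conn.prepareStatement(
        "SELECT Inventory FROM UserData WHERE ID = ? FOR UPDATE")) {
    select.setLong(1, userId);
    try (ResultSet rs = select.executeQuery()) {
        // ... read the inventory JSON and modify it in memory ...
    }
}
try (PreparedStatement update = conn.prepareStatement(
        "UPDATE UserData SET Inventory = ? WHERE ID = ?")) {
    update.setString(1, newInventoryJson);
    update.setLong(2, userId);
    update.executeUpdate();
}
conn.commit(); // the row lock is released here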
I added logging for MySqlConnector as well as the ASP.NET server and can see the following failure:
fail: Microsoft.EntityFrameworkCore.Database.Command[20102]
=> RequestId:0HLUD39EILP3R:00000001 RequestPath:/client/AddUserItem => Server.Controllers.ClientController.AddUserItem (ServerSoftware)
Failed executing DbCommand (78ms) [Parameters=[@p1='?' (DbType = UInt64), @p0='?' (Size = 4000)], CommandType='Text', CommandTimeout='30']
UPDATE `UserData` SET `Inventory` = @p0
WHERE `ID` = @p1;
SELECT ROW_COUNT();
A total hack is to just delay the arrival of the queries by a bit. This works because the client is most likely to generate these calls on load. Normally back-to-back calls aren't expected, so spreading them out in time by delaying on arrival works. However, I'd rather find a correct approach, since this just makes it less likely to be an issue:
ResponseError result = null;
await Task.Delay(SafeRandomGenerator(100, 500));
using (var transaction = context.Database.BeginTransaction(System.Data.IsolationLevel.Serializable))
// etc
This isn't a good answer, because it isn't what I wanted to do, but I'll post it here as it did solve my problem. My problem was that I was trying to read the database row, modify it in asp.net, and then write it back, all within a single transaction and while avoiding deadlocks. The backing field is JSON type, and MySQL provides some JSON functions to help modify that JSON directly in the database. This required me to write SQL statements directly instead of using EF, but it did work.
The first trick was to ensure I could create the row if it didn't exist, without requiring a transaction and lock.
INSERT INTO UserData VALUES ({0},'{{}}','{{}}') ON DUPLICATE KEY UPDATE ID = {0};
I used JSON_REMOVE to delete keys from the JSON field:
UPDATE UserData as S set S.Inventory = JSON_REMOVE(S.Inventory,{1}) WHERE S.ID = {0};
and JSON_SET to add/modify entries:
UPDATE UserData as S set S.Inventory = JSON_SET(S.Inventory,{1},CAST({2} as JSON)) WHERE S.ID = {0};
Note, if you're using EF Core and want to call this using FromSql then you need to return the entity as part of your SQL statement. So you'll need to add something like this to each SQL statement:
SELECT * from UserData where ID = {0} LIMIT 1;
Here is a full working example as an extension method:
public static async Task<UserData> FindOrCreateAsync(this IQueryable<UserData> table, ulong userId)
{
    string sql = "INSERT INTO UserData VALUES ({0},'{{}}','{{}}') ON DUPLICATE KEY UPDATE ID = {0}; SELECT * FROM UserData WHERE ID={0} LIMIT 1;";
    return await table.FromSql(sql, userId).SingleOrDefaultAsync();
}

public static async Task<UserData> JsonRemoveInventory(this DbSet<UserData> table, ulong userId, string key)
{
    if (!key.StartsWith("$.")) key = $"$.\"{key}\"";
    string sql = "UPDATE UserData as S set S.Inventory = JSON_REMOVE(S.Inventory,{1}) WHERE S.ID = {0}; SELECT * from UserData where ID = {0} LIMIT 1;";
    return await table.AsNoTracking().FromSql(sql, userId, key).SingleOrDefaultAsync();
}
Usage:
var data = await context.UserData.FindOrCreateAsync(userId);
await context.UserData.JsonRemoveInventory(userId, itemId);
I have a game that uses Spring, Hibernate and MySQL 5.7 in the back-end. At the end of each game I execute a native query to bulk-update the balances of the winners. For most games the update executes successfully for all winners, but for a few games the balance is updated for most of them except one or two (at random). The update doesn't throw an error and, as said, most of the winners' balances are updated successfully. Here is the method in my DAO (summarized) with the native query:
@Override
public Integer addAmountToPlayerBalance(List<Long> playerIds, Double amount, Long gameId) {
    try
    {
        StringBuilder queryNative = new StringBuilder();
        queryNative.append("update game_player_user set balance = ifnull(balance,0) + :amount where id in (:playerIds)");
        Session session = teleEM.unwrap(Session.class);
        org.hibernate.Query query = session.createSQLQuery(queryNative.toString());
        query.setParameter("amount", amount);
        query.setParameterList("playerIds", playerIds);
        int numOfUpdatedRecords = query.executeUpdate();
        return numOfUpdatedRecords;
    }
    catch (NoResultException e)
    {
        return null;
    }
}
Notes:
1) I added the code that returns the number of updated records a week ago, but the issue hasn't occurred in production since then. As I said, it only happens occasionally.
2) The list of player ids includes all of the winners, even in the games where the issue appears.
3) The amount is a double, rounded to 2 decimal places.
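For reference, this is roughly how I check the returned count on the calling side (the DAO and logger names here are simplified):
// Compare the row count reported by MySQL with the number of winners we tried to update.
Integer updated = playerDao.addAmountToPlayerBalance(playerIds, amount, gameId);
if (updated == null || updated != playerIds.size()) {
    log.warn("Balance update mismatch for game {}: expected {} rows, updated {}",
            gameId, playerIds.size(), updated);
}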
I have a table with a column that needs to be constantly recomputed, and I want this table to be scalable. Users must be able to write to it as well.
It's difficult to test this kind of thing without a server and concurrent users; at least I don't know how.
So is either of the two options below viable?
@ApplicationScoped
public class Abean {

    @EJB
    private MyService myService;

    @Asynchronous
    public void computeTheData() {
        int i = 1;
        long numberOfRows = myService.getCountRows(); // gives the number of rows in the table
        while (i < numberOfRows) {
            myService.updateMyRows(i);
            i++;
        }
        computeTheData(); // recursion so it never stops; I'm wondering if this would spawn more threads and whether that would be an issue
    }
}
public class MyService implements MyServiceInterface {
    ...
    public void updateMyRows(int row) {
        Query query = em.createQuery("SELECT m FROM MyEntity m WHERE m.id = :id");
        query.setParameter("id", row);
        List<MyEntity> myEntities = query.getResultList();
        for (MyEntity myEntity : myEntities) {
            myEntity.computeData();
        }
    }
}
VS
@ApplicationScoped
public class Abean {

    @EJB
    private MyService myService;

    @Asynchronous
    public void computeTheData() {
        myService.updateAllRows();
    }
}
public class MyService implements MyServiceInterface {
    ...
    public void updateAllRows() {
        Query query = em.createQuery("SELECT m FROM MyEntity m");
        List<MyEntity> myEntities = query.getResultList();
        for (MyEntity myEntity : myEntities) {
            myEntity.computeData();
        }
    }
}
Is any of this viable? I'm using MySQL and the table engine is InnoDB.
You should use pessimistic locking to lock the modified rows before the update, so that manual modifications by a user do not conflict with the background updates. If you did not use locking, your user's modifications would sometimes be rolled back when they collide with the background job having modified the same row.
Also, with pessimistic locking, your user may encounter a rollback if her transaction waits to acquire the lock for longer than the lock timeout. To prevent this, you should keep all transactions that use pessimistic locks as short as possible. Therefore, the background job should create a new transaction for every row or small group of rows if it may run longer than a reasonable time. Locks are released only after the transaction finishes (the user will wait until the lock is released).
Example of how your MyService could look, running every update in a separate transaction (in reality, you may run multiple updates in a batch in a single transaction, passing a list or range of ids as a parameter to updateMyRows):
public class MyService implements MyServiceInterface {
    ...
    @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW) // this will create a new transaction when running this method from another bean, e.g. from Abean
    public void updateMyRows(int row) {
        TypedQuery<MyEntity> query = em.createQuery("SELECT m FROM MyEntity m WHERE m.id = :id", MyEntity.class);
        query.setParameter("id", row);
        query.setLockMode(LockModeType.PESSIMISTIC_WRITE); // this will lock all entities retrieved by the query
        List<MyEntity> myEntities = query.getResultList();
        if (!myEntities.isEmpty()) {
            myEntities.get(0).computeData();
        }
    }
}
When you use only the id in the where condition, you may consider em.find(MyEntity.class, row, LockModeType.PESSIMISTIC_WRITE).computeData() instead of using a query (add a null pointer check after em.find()).
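A sketch of that variant (entity and variable names as in the example above):
MyEntity entity = em.find(MyEntity.class, row, LockModeType.PESSIMISTIC_WRITE); // locks the row
if (entity != null) {
    entity.computeData();
}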
Other notes:
It is not clear from the question how you trigger the background job. Running the job infinitely, as you wrote in the example, would on one hand NOT create additional threads (since you call the method on the same bean, the annotations are not applied to the recursive call). On the other hand, your background job should at least handle exceptions so that it is not stopped by the first one. You may also want to add some wait time between subsequent executions.
It is better to run background jobs as scheduled jobs. One possible option is the @Schedule annotation instead of @Asynchronous; you can specify the frequency at which the job will run in the background, as sketched below. It is then good to check at the beginning of the job whether the previous execution has finished. Another option with Java EE 7 is to use a ManagedScheduledExecutorService to trigger a background job periodically at a specified interval.
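For illustration, a minimal @Schedule-based variant (the five-minute interval, the bean name and the AtomicBoolean guard are just examples):
// Example only: runs every 5 minutes; the guard skips a run if the previous one is still in progress.
@Singleton
public class RecomputeJob {

    @EJB
    private MyService myService;

    private final AtomicBoolean running = new AtomicBoolean(false);

    @Schedule(hour = "*", minute = "*/5", persistent = false)
    public void recompute() {
        if (!running.compareAndSet(false, true)) {
            return; // previous execution has not finished yet
        }
        try {
            myService.updateAllRows();
        } finally {
            running.set(false);
        }
    }
}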
Question on locking scope in SQL Server (SQL Azure to be precise).
Scenario
A bunch of records is selected using a select statement.
We loop through the records.
Each record is updated within a TransactionScope (each record is independent of the others, and there is no need for a table lock).
Am I right in assuming that the above would result in a row level lock of just that particular record row?
Framing the question within the context of a concrete example.
In the below example would each item in itemsToMove be locked one at a time?
var itemsToMove = ObjectContext.Where(emp => emp.ExpirationDate < DateTime.Now);
bool tSuccess = false;
foreach (Item expiredItem in itemsToMove)
{
    using (TransactionScope transaction = new TransactionScope())
    {
        try
        {
            // We push this to another table. In this case Azure Storage.
            bool bSuccess = PushToBackup();
            if (bSuccess)
            {
                ObjectContext.DeleteObject(expiredItem);
            }
            else
            {
                // throw an exception or return
                return false;
            }
            ObjectContext.SaveChanges();
            transaction.Complete();
            tSuccess = true;
        }
        catch (Exception e)
        {
            return cResults;
        }
    }
}
if (tSuccess)
{
    ObjectContext.AcceptAllChanges();
}
Provided that there isn't any outer / wrapper transaction calling your code, each call to transaction.Complete() should commit and release any locks.
Just a couple of quick caveats:
SQL Server will not necessarily default to row-level locking; it may use page-level or higher locks (I recommend that you leave SQL Server to its own devices, however).
Note that the default isolation level of a new TransactionScope() is Serializable. This might be too pessimistic for your scenario.