How to bulk update in Hibernate - MySQL

I need to update multiple rows in my MySQL database using Hibernate. I have done this with JDBC, where batched queries are supported, and I want something similar in Hibernate.
Does Hibernate support batched queries?
Batched query example in JDBC:
// Create a statement object
Statement stmt = conn.createStatement();

// First INSERT statement
String sql = "INSERT INTO Employees (id, first, last, age) " +
             "VALUES (200, 'Zia', 'Ali', 30)";
// Add it to the batch
stmt.addBatch(sql);

// Second INSERT statement (reuse the variable; re-declaring it would not compile)
sql = "INSERT INTO Employees (id, first, last, age) " +
      "VALUES (201, 'Raj', 'Kumar', 35)";
stmt.addBatch(sql);

// Execute both statements as one batch
int[] count = stmt.executeBatch();
Now when we call stmt.executeBatch(), both INSERT statements are sent to the database in a single JDBC round trip.

You may check the Hibernate documentation: Hibernate has configuration properties that control (or disable) the use of JDBC batching.
If you issue the same kind of INSERT multiple times and your entity does not use an identity generator, Hibernate will use JDBC batching transparently.
The configuration must enable JDBC batching; it is disabled by default.
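For example, an entity mapped with a sequence-style generator (rather than IDENTITY) keeps its inserts batchable. This is only a minimal sketch; the entity and its fields are made up for illustration:
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

@Entity
public class Employee {

    // A SEQUENCE (or TABLE) generator lets Hibernate delay and group the INSERTs
    // into a JDBC batch; IDENTITY cannot be batched because the id is only known
    // after each individual INSERT. On MySQL, SEQUENCE falls back to a table-backed
    // generator, since MySQL has no native sequences.
    @Id
    @GeneratedValue(strategy = GenerationType.SEQUENCE)
    private Long id;

    private String firstName;
    private String lastName;

    protected Employee() {
    }

    public Employee(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }
}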
Configuring Hibernate
The hibernate.jdbc.batch_size property defines how many statements Hibernate buffers before asking the driver to execute the batch. Zero or a negative number disables batching.
You can define it globally, e.g. in persistence.xml, or per session. To configure the current session, you can use code like the following:
entityManager
.unwrap( Session.class )
.setJdbcBatchSize( 10 );
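To apply the batch size globally instead, you can set the property when the EntityManagerFactory is built (or equivalently in persistence.xml). A sketch, assuming a persistence unit named "my-unit":
import java.util.HashMap;
import java.util.Map;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Persistence;

Map<String, Object> props = new HashMap<>();
// number of statements Hibernate buffers before executing the JDBC batch
props.put("hibernate.jdbc.batch_size", "10");
// optional: group identical statement types together so batches are not broken up
props.put("hibernate.order_inserts", "true");
props.put("hibernate.order_updates", "true");

EntityManagerFactory emf = Persistence.createEntityManagerFactory("my-unit", props);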
Using JDBC batching
As mentioned before, Hibernate applies JDBC batching transparently. If you want to control when batches are flushed, you can use the flush() and clear() methods on the session.
The following example is from the documentation. It calls flush() and clear() whenever the number of persisted entities reaches a batchSize value. This works efficiently when batchSize is less than or equal to the configured hibernate.jdbc.batch_size.
EntityManager entityManager = null;
EntityTransaction txn = null;
try {
    entityManager = entityManagerFactory().createEntityManager();
    txn = entityManager.getTransaction();
    txn.begin();

    // use a batch size less than or equal to the JDBC batch size
    int batchSize = 25;

    for (int i = 0; i < entityCount; ++i) {
        Person person = new Person(String.format("Person %d", i));
        entityManager.persist(person);

        if (i > 0 && i % batchSize == 0) {
            // flush a batch of inserts and release memory
            entityManager.flush();
            entityManager.clear();
        }
    }

    txn.commit();
} catch (RuntimeException e) {
    if (txn != null && txn.isActive()) txn.rollback();
    throw e;
} finally {
    if (entityManager != null) {
        entityManager.close();
    }
}

Related

Distributed database insertion speed is very slow

@Test
public void transaction() throws Exception {
    Connection conn = null;
    PreparedStatement ps = null;
    try {
        String sql = "insert into `1` values(?, ?, ?, ?)";
        conn = JDBCUtils.getConnection();
        ps = conn.prepareStatement(sql);
        conn.setAutoCommit(false);
        for (int i = 1; i <= 10000; i++) {
            ps.setObject(1, i);
            ps.setObject(2, 10.12345678);
            ps.setObject(3, "num_" + i);
            ps.setObject(4, "2021-12-24 19:00:00");
            ps.addBatch();
        }
        ps.executeBatch();
        ps.clearBatch();
        conn.commit();
    } catch (Exception e) {
        if (conn != null) conn.rollback();
        e.printStackTrace();
    } finally {
        JDBCUtils.closeResources(conn, ps);
    }
}
With setAutoCommit = true, insert speed is very slow for both local MySQL and distributed MySQL.
When I commit the transaction manually, as in the code above, the local MySQL speed improves a lot, but the insert speed into distributed MySQL is still very slow.
Are there any additional parameters I need to set?
Setting parameters probably won't help (much).
There are a couple of reasons for the slowness:
With autocommit=true you are committing on every insert statement. That means each new row must be written to disk before the database server returns the response to the client.
With autocommit=false there is still a client -> server -> client round trip for each insert statement. Those round trips add up to a significant amount of time.
One way to make this faster is to insert multiple rows with each insert statement, but that is messy because you would need to generate complex (multi-row) insert statements.
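For illustration, a hand-built multi-row INSERT might look like the sketch below. It reuses the Employees columns from the earlier example, stmt is a java.sql.Statement, and the Row holder class is hypothetical; in real code the values should be bound as parameters rather than concatenated:
StringBuilder sql = new StringBuilder("INSERT INTO Employees (id, first, last, age) VALUES ");
for (int i = 0; i < rows.size(); i++) {
    Row r = rows.get(i);                 // Row is a hypothetical record holder
    if (i > 0) {
        sql.append(", ");
    }
    sql.append("(").append(r.id).append(", '")
       .append(r.first).append("', '")
       .append(r.last).append("', ")
       .append(r.age).append(")");
}
stmt.executeUpdate(sql.toString());      // one statement, one round trip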
A better way is to use JDBC's batch feature to reduce the number of round-trips. For example:
PreparedStatement ps = c.prepareStatement("INSERT INTO employees VALUES (?, ?)");
ps.setString(1, "John");
ps.setString(2,"Doe");
ps.addBatch();
ps.clearParameters();
ps.setString(1, "Dave");
ps.setString(2,"Smith");
ps.addBatch();
ps.clearParameters();
int[] results = ps.executeBatch();
(Attribution: the above code is copied from this answer by @Tusc)
If that still isn't fast enough, you should get even better performance using MySQL's native bulk insert mechanism, e.g. LOAD DATA INFILE; see High-speed inserts with MySQL.
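A rough sketch of driving LOAD DATA from JDBC is shown below. Note the assumptions: the CSV file path and format are made up, the target table is the hypothetical employees table from the snippet above, and LOCAL INFILE must be enabled on both the server (local_infile=1) and the Connector/J client (allowLoadLocalInfile=true in the JDBC URL):
try (Statement stmt = conn.createStatement()) {
    // bulk-load a CSV file in one server-side operation instead of many INSERTs
    stmt.execute(
        "LOAD DATA LOCAL INFILE '/tmp/employees.csv' " +
        "INTO TABLE employees " +
        "FIELDS TERMINATED BY ',' " +
        "LINES TERMINATED BY '\\n'");
}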
For completeness, I am adding this suggestion from @Wilson Hauck:
"In your configuration [mysqld] section, innodb_change_buffer_max_size=50 # from 25 (percent) for improved INSERT rate per second. SHOW FULL PROCESSLIST; to monitor when the instance has completed adjustment, then do your inserts and put it back to 25 percent for typical processing speed."
This may increase the insert rate depending on your table and its indexes, and on the order in which you are inserting the rows.
But the flip-side is that you may be able to achieve the same speedup (or more!) by other means; e.g.
by sorting your input so that rows are inserted in index order, or
by dropping the indexes, inserting the records and then recreating the indexes (sketched below).
You can read about the change buffer here and make your own judgements.
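As a rough sketch of the drop-and-recreate-indexes approach (the table, index, and column names here are hypothetical):
try (Statement stmt = conn.createStatement()) {
    // drop a secondary index before the bulk load
    stmt.execute("ALTER TABLE employees DROP INDEX idx_employees_last");

    // ... run the batched inserts from the example above ...

    // rebuild the index once all rows are in place
    stmt.execute("CREATE INDEX idx_employees_last ON employees (last)");
}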

Concurrent Read/Write MySQL EF Core

Using EF Core 2.2.6 and Pomelo.EntityFrameworkCore.MySql 2.2.6 (with MySqlConnector 0.59.2). I have a model for UserData:
public class UserData
{
    [DatabaseGenerated(DatabaseGeneratedOption.None)]
    public ulong ID { get; private set; }

    [Required]
    public Dictionary<string, InventoryItem> Inventory { get; set; }

    public UserData()
    {
        Inventory = new Dictionary<string, InventoryItem>();
    }
}
I have a REST method that can be called that will add items to the user inventory:
using (var transaction = context.Database.BeginTransaction())
{
    UserData data = await context.UserData.FindAsync(userId);
    // there is code here to detect duplicate entries/etc, but I've removed it for brevity
    foreach (var item in items) data.Inventory.Add(item.ItemId, item);
    context.UserData.Update(data);
    await context.SaveChangesAsync();
    transaction.Commit();
}
If two or more calls to this method are made with the same user id then I get concurrent accesses (despite the transaction). This causes the data to sometimes be incorrect. For example, if the inventory is empty and then two calls are made to add items simultaneously (item A and item B), sometimes the database will only contain either A or B, and not both. From logging it appears that it is possible for EF to read from the database while the other read/write is still occurring, causing the code to have the incorrect state of the inventory for when it tries to write back to the db. So I tried marking the isolation level as serializable.
using (var transaction = context.Database.BeginTransaction(System.Data.IsolationLevel.Serializable))
Now I sometimes see an exception:
MySql.Data.MySqlClient.MySqlException (0x80004005): Deadlock found when trying to get lock; try restarting transaction
I don't understand how this code could deadlock... Anyway, I tried to proceed by wrapping the whole thing in a try/catch and retrying:
public static async Task<ResponseError> AddUserItem(Controller controller, MyContext context, ulong userId, List<InventoryItem> items, int retry = 5)
{
    ResponseError result = null;
    try
    {
        using (var transaction = context.Database.BeginTransaction(System.Data.IsolationLevel.Serializable))
        {
            UserData data = await context.UserData.FindAsync(userId);
            // there is code here to detect duplicate entries/etc, but I've removed it for brevity
            foreach (var item in items) data.Inventory.Add(item.ItemId, item);
            context.UserData.Update(data);
            await context.SaveChangesAsync();
            transaction.Commit();
        }
    }
    catch (Exception e)
    {
        if (retry > 0)
        {
            await Task.Delay(SafeRandomGenerator(10, 500));
            // note: retry - 1, not retry--, so the retry count actually decreases
            return await AddUserItem(controller, context, userId, items, retry - 1);
        }
        else
        {
            // store exception and return error
        }
    }
    return result;
}
And now I am back to the data being sometimes correct, sometimes not. So I think the deadlock is another problem, but this is the only method accessing this data. So, I'm at a loss. Is there a simple way to read from the database (locking the row in the process) and then writing back (releasing the lock on write) using EF Core? I've looked at using concurrency tokens, but this seems overkill for what appears (on the surface to me) to be a trivial task.
I added logging for the MySQL connector as well as the ASP.NET server and can see the following failure:
fail: Microsoft.EntityFrameworkCore.Database.Command[20102]
=> RequestId:0HLUD39EILP3R:00000001 RequestPath:/client/AddUserItem => Server.Controllers.ClientController.AddUserItem (ServerSoftware)
Failed executing DbCommand (78ms) [Parameters=[@p1='?' (DbType = UInt64), @p0='?' (Size = 4000)], CommandType='Text', CommandTimeout='30']
UPDATE `UserData` SET `Inventory` = @p0
WHERE `ID` = @p1;
SELECT ROW_COUNT();
A total hack is to just delay the arrival of the queries by a bit. This works because the client is most likely to generate these calls on load. Normally back-to-back calls aren't expected, so spreading them out in time by delaying on arrival works. However, I'd rather find a correct approach, since this just makes it less likely to be an issue:
ResponseError result = null;
await Task.Delay(SafeRandomGenerator(100, 500));
using (var transaction = context.Database.BeginTransaction(System.Data.IsolationLevel.Serializable))
// etc
This isn't a good answer, because it isn't what I wanted to do, but I'll post it here as it did solve my problem. My problem was that I was trying to read the database row, modify it in asp.net, and then write it back, all within a single transaction and while avoiding deadlocks. The backing field is JSON type, and MySQL provides some JSON functions to help modify that JSON directly in the database. This required me to write SQL statements directly instead of using EF, but it did work.
The first trick was to ensure I could create the row if it didn't exist, without requiring a transaction and lock.
INSERT INTO UserData VALUES ({0},'{{}}','{{}}') ON DUPLICATE KEY UPDATE ID = {0};
I used JSON_REMOVE to delete keys from the JSON field:
UPDATE UserData as S set S.Inventory = JSON_REMOVE(S.Inventory,{1}) WHERE S.ID = {0};
and JSON_SET to add/modify entries:
UPDATE UserData as S set S.Inventory = JSON_SET(S.Inventory,{1},CAST({2} as JSON)) WHERE S.ID = {0};
Note, if you're using EF Core and want to call this using FromSql then you need to return the entity as part of your SQL statement. So you'll need to add something like this to each SQL statement:
SELECT * from UserData where ID = {0} LIMIT 1;
Here is a full working example as an extension method:
public static async Task<UserData> FindOrCreateAsync(this IQueryable<UserData> table, ulong userId)
{
    string sql = "INSERT INTO UserData VALUES ({0},'{{}}','{{}}') ON DUPLICATE KEY UPDATE ID = {0}; SELECT * FROM UserData WHERE ID={0} LIMIT 1;";
    return await table.FromSql(sql, userId).SingleOrDefaultAsync();
}

public static async Task<UserData> JsonRemoveInventory(this DbSet<UserData> table, ulong userId, string key)
{
    if (!key.StartsWith("$.")) key = $"$.\"{key}\"";
    string sql = "UPDATE UserData as S set S.Inventory = JSON_REMOVE(S.Inventory,{1}) WHERE S.ID = {0}; SELECT * from UserData where ID = {0} LIMIT 1;";
    return await table.AsNoTracking().FromSql(sql, userId, key).SingleOrDefaultAsync();
}
Usage:
var data = await context.UserData.FindOrCreateAsync(userId);
await context.UserData.JsonRemoveInventory(userId, itemId);

Does MySQL have concurrency control when generating auto-increment values?

My colleague provided me a code segment that simulates Oracle's sequences:
// generate ticket
pstmt = conn.prepareStatement("insert seq_pkgid values (NULL);");
if (pstmt.executeUpdate() == 1) {   // executeUpdate() returns 1 for a single inserted row
    success = 1;
} else {
    throw new Exception("Generating seq_pkgid sequence failed!");
}
pstmt.close();
pstmt = null;

// get ticket
pstmt = conn.prepareStatement("select last_insert_id() as maxid");
rs = pstmt.executeQuery();
if (rs.next()) {
    nSeq = rs.getInt("maxid");
}
rs.close();
rs = null;
pstmt.close();
pstmt = null;
But I wonder what happens if this code segment is executed from two instances at about the same time. Will they get the same generated auto-increment value? Does MySQL have concurrency control, e.g. a critical section or semaphore, when generating a new auto-increment value?
Yes! If the column has AUTO_INCREMENT in its definition, MySQL takes an auto-increment lock while generating the value. Please refer to:
https://dev.mysql.com/doc/refman/5.7/en/innodb-auto-increment-handling.html
If you really need to generate a counter in the database, you can read here how to do it on MySQL using InnoDB and locking reads:
https://dev.mysql.com/doc/refman/5.7/en/innodb-locking-reads.html
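If you do go the counter route, the usual InnoDB pattern is a locking read inside a transaction. This is only a sketch; the sequences table, its columns and the 'pkgid' row are assumptions:
// assumed schema: CREATE TABLE sequences (name VARCHAR(50) PRIMARY KEY, value BIGINT NOT NULL);
conn.setAutoCommit(false);
try (PreparedStatement select = conn.prepareStatement(
             "SELECT value FROM sequences WHERE name = ? FOR UPDATE");
     PreparedStatement update = conn.prepareStatement(
             "UPDATE sequences SET value = value + 1 WHERE name = ?")) {
    select.setString(1, "pkgid");
    long next;
    try (ResultSet rs = select.executeQuery()) {
        if (!rs.next()) {
            throw new SQLException("sequence row 'pkgid' not found");
        }
        // SELECT ... FOR UPDATE keeps this row locked until commit,
        // so no other transaction can read or bump it concurrently
        next = rs.getLong(1) + 1;
    }
    update.setString(1, "pkgid");
    update.executeUpdate();
    conn.commit();
} catch (SQLException e) {
    conn.rollback();
    throw e;
}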
But I'm wondering if you really need an auto-increment field or just a unique identifier for the object you want to store; in that case maybe UUID() is just fine:
https://dev.mysql.com/doc/refman/5.7/en/miscellaneous-functions.html#function_uuid

Update table in MySQL after a BigDecimal is declared

I have the following code in my application, in which I am trying to update the value total in my MySQL database table called "porcobrar2012". However, the only value that gets updated is the last one generated in the while loop. Why? All values are printed out on the screen with no problem, but they are not getting updated in the database.
Here is the code:
BigDecimal total = new BigDecimal("0");
try
{
    //Class.forName("sun.jdbc.odbc.JdbcOdbcDriver");
    //Connection connection=DriverManager.getConnection("jdbc:odbc:db1","","");
    Class.forName("com.mysql.jdbc.Driver").newInstance();
    Connection connection = DriverManager.getConnection("jdbc:mysql://localhost/etoolsco_VecinetSM?user=etoolsco&password=g7Xm2heD41");
    Statement statement = connection.createStatement();
    String query = "SELECT * FROM porcobrar2012";
    ResultSet resultSet = statement.executeQuery(query);
    while (resultSet.next())
    {
        out.println(resultSet.getString(2) + "");
        for (int col = 3; col <= 15; col++)
        {
            out.println(resultSet.getBigDecimal(col) + "");
        }
        total = resultSet.getBigDecimal(3);
        for (int col = 4; col <= 15; col++)
        {
            total = total.add(resultSet.getBigDecimal(col));
        }
        String query1 = "UPDATE porcobrar2012 SET total=total";
        PreparedStatement ps = connection.prepareStatement(query1);
        ps.executeUpdate();
        out.println(total);
    }
    connection.close();
    statement.close();
}
catch (Exception e)
{
    //e.printStackTrace();
    out.println(e.toString());
}
It's because the update closes the existing result set. But I would ask why you aren't doing the addition in a single UPDATE statement at the database, without any prior query, no loops, and no BigDecimals. Rule one of database programming is 'don't move the data further than you need to'. It would be many times more efficient to just write "UPDATE porcobrar2012 SET a=b+c+d+...". And you can remove the Class.forName() call too: it hasn't been required for years.
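For example, assuming the thirteen summed columns were named c3 through c15 (the real column names aren't shown in the question, so these are placeholders), the whole loop collapses into one statement:
try (Statement stmt = connection.createStatement()) {
    // one round trip, no ResultSet and no BigDecimal arithmetic in Java;
    // c3 .. c15 stand in for the actual column names of porcobrar2012
    stmt.executeUpdate(
        "UPDATE porcobrar2012 " +
        "SET total = c3 + c4 + c5 + c6 + c7 + c8 " +
        "          + c9 + c10 + c11 + c12 + c13 + c14 + c15");
}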

Linq to SQL concurrency problem

Hello,
I have a web service with multiple methods that can be called. Each time one of these methods is called I log the call to a statistics database, so we know how many times each method is called each month and the average processing time.
Each time I log statistics data I first check the database to see whether a row for that method and the current month already exists; if not, the row is created and added. If it already exists, I update the needed columns.
My problem is that sometimes when I update a row I get the "Row not found or changed" exception, and yes, I know it is because the row has been modified since I read it.
To solve this I have tried using the following without success:
Use using around my datacontext.
Use using around a TransactionScope.
Use a mutex; this doesn't work because the web service is (not sure I am using the right term here) replicated out on different PCs for performance, but they all use the same database.
Resolve the concurrency conflict in the exception handler; this doesn't work because I need to read the new database value and add a value to it.
Below I have added the code used to log the statistics data. Any help would be appreciated very much.
public class StatisticsGateway : IStatisticsGateway
{
    #region member variables
    private StatisticsDataContext db;
    #endregion

    #region Singleton
    [ThreadStatic]
    private static IStatisticsGateway instance;
    [ThreadStatic]
    private static DateTime lastEntryTime = DateTime.MinValue;

    public static IStatisticsGateway Instance
    {
        get
        {
            if (!lastEntryTime.Equals(OperationState.EntryTime) || instance == null)
            {
                instance = new StatisticsGateway();
                lastEntryTime = OperationState.EntryTime;
            }
            return instance;
        }
    }
    #endregion

    #region constructor / initialize
    private StatisticsGateway()
    {
        var configurationAppSettings = new System.Configuration.AppSettingsReader();
        var connectionString = ((string)(configurationAppSettings.GetValue("sqlConnection1.ConnectionString", typeof(string))));
        db = new StatisticsDataContext(connectionString);
    }
    #endregion

    #region IStatisticsGateway members
    public void AddStatisticRecord(StatisticRecord record)
    {
        using (db)
        {
            var existing = db.Statistics.SingleOrDefault(p => p.MethodName == record.MethodName &&
                                                              p.CountryID == record.CountryID &&
                                                              p.TokenType == record.TokenType &&
                                                              p.Year == record.Year &&
                                                              p.Month == record.Month);
            if (existing == null)
            {
                // Add new row
                this.AddNewRecord(record);
                return;
            }

            // Update
            existing.Count += record.Count;
            existing.TotalTimeValue += record.TotalTimeValue;
            db.SubmitChanges();
        }
    }
I would suggest letting SQL Server deal with the concurrency.
Here's how:
Create a stored procedure that accepts your log values (method name, month/date, and execution statistics) as arguments.
In the stored procedure, before anything else, get an application lock as described here, and here. Now you can be sure only one instance of the stored procedure will be running at once. (Disclaimer! I have not tried sp_getapplock myself. Just saying. But it seems fairly straightforward, given all the examples out there on the interwebs.)
Next, in the stored procedure, query the log table for a current-month's entry for the method to determine whether to insert or update, and then do the insert or update.
As you may know, in VS you can drag stored procedures from the Server Explorer into the DBML designer for easy access with LINQ to SQL.
If you're trying to avoid stored procedures then this solution obviously won't be for you, but it's how I'd solve it easily and quickly. Hope it helps!
If you don't want to use the stored procedure approach, a crude way of dealing with it would simply be retrying on that specific exception. E.g:
int maxRetryCount = 5;
for (int i = 0; i < maxRetryCount; i++)
{
    try
    {
        QueryAndUpdateDB();
        break;
    }
    // ChangeConflictException is what LINQ to SQL throws for "Row not found or changed"
    catch (ChangeConflictException ex)
    {
        // rethrow on the last attempt (i never reaches maxRetryCount inside the loop)
        if (i == maxRetryCount - 1) throw;
    }
}
I have not used sp_getapplock; instead I have used HOLDLOCK and ROWLOCK, as seen below:
CREATE PROCEDURE [dbo].[UpdateStatistics]
    @MethodName as varchar(50) = null,
    @CountryID as varchar(2) = null,
    @TokenType as varchar(5) = null,
    @Year as int,
    @Month as int,
    @Count bigint,
    @TotalTimeValue bigint
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRAN

    UPDATE dbo.[Statistics]
    WITH (HOLDLOCK, ROWLOCK)
    SET Count = Count + @Count
    WHERE MethodName=@MethodName and CountryID=@CountryID and TokenType=@TokenType and Year=@Year and Month=@Month

    IF @@ROWCOUNT=0
        INSERT INTO dbo.[Statistics] (MethodName, CountryID, TokenType, TotalTimeValue, Year, Month, Count) values (@MethodName, @CountryID, @TokenType, @TotalTimeValue, @Year, @Month, @Count)

    COMMIT TRAN
END
GO
I have tested it by calling my web service methods from multiple threads simultaneously, and each call is logged without any problems.