EF Core: Update thousands of records failed - exception

I am using EF Core to update thousands of records in a table called MyTable, which already has 1,500,000 rows. Here is the simple code I use to update one of its properties, MyProp, with a different value for each record:
using (MyDbContext ctx = new MyDbContext())
{
    var rows = ctx.Set<MyTable>().Take(1000000).ToList(); // I only take the first 1 million rows
    int idx = 0;
    foreach (var row in rows)
    {
        row.MyProp = $"MyNewValue{idx++}";
    }
    ctx.SaveChanges();
}
If I take one million rows to update and call ctx.SaveChanges(), I get the following exception: Expected to read 4 header bytes but only received 0. (see below for the full stack trace)
I don't know why I am facing this exception; it appears when I take 1,000,000 rows but not when I take only 750,000, for example.
I read that it could be related to one of the MySQL server timeouts, which are listed below, but I really do not know whether one of these timeouts is the cause or how to find a solution. Should I split the update into several batches and call SaveChanges() several times, as in the sketch below?
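Concretely, the batched variant I have in mind would look something like this (the batch size is arbitrary, and I am assuming MyTable has an Id key so Skip/Take page over a stable order):

const int batchSize = 100_000;
int idx = 0;

for (int skip = 0; skip < 1_000_000; skip += batchSize)
{
    using (MyDbContext ctx = new MyDbContext())
    {
        // assuming an Id key so that Skip/Take page over a stable order
        var rows = ctx.Set<MyTable>()
                      .OrderBy(r => r.Id)
                      .Skip(skip)
                      .Take(batchSize)
                      .ToList();

        foreach (var row in rows)
        {
            row.MyProp = $"MyNewValue{idx++}";
        }

        ctx.SaveChanges(); // each batch gets its own context, so the change tracker stays small
    }
}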
Thank you for any suggestions.
Here are the timeouts from the MySQL server:
connect_timeout 10
delayed_insert_timeout 300
have_statement_timeout YES
innodb_flush_log_at_timeout 1
innodb_lock_wait_timeout 50
innodb_print_lock_wait_timeout_info OFF
innodb_rollback_on_timeout OFF
interactive_timeout 610
lock_wait_timeout 31536000
net_read_timeout 30
net_write_timeout 60
rpl_stop_slave_timeout 31536000
slave_net_timeout 60
thread_pool_idle_timeout 60
wait_timeout 610
Here is the full stack trace:
From Microsoft.EntityFrameworkCore.Update : ERROR An exception occurred in the database while saving changes for context type 'MyDbContext'.
Microsoft.EntityFrameworkCore.DbUpdateException: An error occurred while updating the entries. See the inner exception for details.
---> MySql.Data.MySqlClient.MySqlException (0x80004005): Failed to read the result set.
---> System.IO.EndOfStreamException: Expected to read 4 header bytes but only received 0.
at MySqlConnector.Protocol.Serialization.ProtocolUtility.DoReadPayloadAsync(BufferedByteReader bufferedByteReader, IByteHandler byteHandler, Func1 getNextSequenceNumber, ArraySegmentHolder1 previousPayloads, ProtocolErrorBehavior protocolErrorBehavior, IOBehavior ioBehavior) in C:\projects\mysqlconnector\src\MySqlConnector\Protocol\Serialization\ProtocolUtility.cs:line 462
at MySqlConnector.Protocol.Serialization.StandardPayloadHandler.ReadPayloadAsync(ArraySegmentHolder`1 cache, ProtocolErrorBehavior protocolErrorBehavior, IOBehavior ioBehavior) in C:\projects\mysqlconnector\src\MySqlConnector\Protocol\Serialization\StandardPayloadHandler.cs:line 37
at MySqlConnector.Core.ServerSession.ReceiveReplyAsync(IOBehavior ioBehavior, CancellationToken cancellationToken) in C:\projects\mysqlconnector\src\MySqlConnector\Core\ServerSession.cs:line 665

For such an update it is better to use a temporary table that holds the new values, and then update the records by joining to that temporary table.
There is an extension for EF Core, linq2db.EntityFrameworkCore (note that I'm one of its creators). It adds temporary table support to LINQ queries and makes it possible to perform such an update without leaving type-safe LINQ.
// definition of temporary table structure
class IntermediateTable
{
    public int Id { get; set; }
    public string MyProp { get; set; }
}

var items = ... // some input list
var rowsToUpdate = items.Select(x => new IntermediateTable
{
    Id = x.Id,
    MyProp = x.MyProp
});

using var ctx = new MyDbContext();
using var db = ctx.CreateLinqToDBConnection();

// here all records are populated to the database as fast as possible
using var tempTable = db.CreateTempTable(rowsToUpdate,
    options: new BulkCopyOptions { BulkCopyType = BulkCopyType.ProviderSpecific },
    tableName: "SomeTempTableName");

// preparing update query
var queryToUpdate =
    from x in ctx.Set<MyTable>()
    join t in tempTable on x.Id equals t.Id
    select new { x, t };

queryToUpdate
    .Set(u => u.x.MyProp, u => u.t.MyProp)
    .Update();
At the end, the temporary table will be dropped automatically.

For your use case, since you want to set an explicit value, you can execute some direct SQL that will not return any rows from the server.
Read through this nice chapter and you can actually translate the whole operation into a simple SQL statement that runs on the database only.
The paging (Skip) part is moved to the database, no data is retrieved, which saves time, and the whole operation does not consume memory on the application server.
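A minimal sketch of what that could look like (not the exact statement from the linked chapter): it assumes the table and column are literally named MyTable/MyProp, that there is an Id key to order by, and that the running counter in "MyNewValueN" can be produced with a MySQL user variable (with MySqlConnector this also requires AllowUserVariables=true in the connection string). On EF Core 2.x the method is ExecuteSqlCommand instead of ExecuteSqlRaw.

using (var ctx = new MyDbContext())
{
    ctx.Database.SetCommandTimeout(600); // illustrative; a single statement over 1M rows can take a while

    // Everything runs inside MySQL; no entities are loaded or tracked by EF Core.
    ctx.Database.ExecuteSqlRaw(
        "SET @i := -1; " +
        "UPDATE MyTable " +
        "SET MyProp = CONCAT('MyNewValue', @i := @i + 1) " +
        "ORDER BY Id " +
        "LIMIT 1000000;");
}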

Related

Mysql2 UPDATE query is taking too much time to update row

I am updating a row in a MySQL database using the UPDATE keyword (in an Express server using mysql2). If the data I am using for the update is the same as the data already in the row, it takes the usual amount of time. But if I update the table with different data, it takes much longer. My code is below.
public update = async (
    obj: TUpCred,
): Promise<ResultSetHeader> => {
    const sql = 'UPDATE ?? SET ? WHERE ?';
    const values = [obj.table, obj.data, obj.where];
    const [data] = (await this.connection.query({
        sql,
        values,
    })) as TResultSetHeader;
    return data;
};
This query takes a long time: usually 4-6 seconds, but sometimes even 10 to 15 seconds. The same happens with INSERT queries, but other queries such as SELECT take the normal time to execute.

Distributed database insertion speed is very slow

@Test
public void transaction() throws Exception {
    Connection conn = null;
    PreparedStatement ps = null;
    try {
        String sql = "insert into `1` values(?, ?, ?, ?)";
        conn = JDBCUtils.getConnection();
        ps = conn.prepareStatement(sql);
        conn.setAutoCommit(false);
        for (int i = 1; i <= 10000; i++) {
            ps.setObject(1, i);
            ps.setObject(2, 10.12345678);
            ps.setObject(3, "num_" + i);
            ps.setObject(4, "2021-12-24 19:00:00");
            ps.addBatch();
        }
        ps.executeBatch();
        ps.clearBatch();
        conn.commit();
    } catch (Exception e) {
        conn.rollback();
        e.printStackTrace();
    } finally {
        JDBCUtils.closeResources(conn, ps);
    }
}
When setAutoCommit = true, local MySQL and distributed MySQL insert speeds are very slow.
When I set the transaction to commit manually, as in the code above, the local MySQL speed increases a lot, but the insertion speed of distributed MySQL is still very slow.
Are there any additional parameters I need to set?
Setting parameters probably won't help (much).
There are a couple of reasons for the slowness:
With autocommit=true you are committing on every insert statement. That means each new row must be written to disk before the database server returns the response to the client.
With autocommit=false there is still a client -> server -> client round trip for each insert statement. Those round trips add up to a significant amount of time.
One way to make this faster is to insert multiple rows with each insert statement, but that is messy because you would need to generate complex (multi-row) insert statements.
A better way is to use JDBC's batch feature to reduce the number of round-trips. For example:
PreparedStatement ps = c.prepareStatement("INSERT INTO employees VALUES (?, ?)");
ps.setString(1, "John");
ps.setString(2,"Doe");
ps.addBatch();
ps.clearParameters();
ps.setString(1, "Dave");
ps.setString(2,"Smith");
ps.addBatch();
ps.clearParameters();
int[] results = ps.executeBatch();
(Attribution: above code copied from this answer by @Tusc)
If that still isn't fast enough, you should get even better performance using MySQL's native bulk insert mechanism, e.g. LOAD DATA INFILE; see High-speed inserts with MySQL.
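Since the rest of this thread is about MySQL from .NET, here is a hypothetical C# sketch of that bulk-load path using MySqlConnector's MySqlBulkLoader (the table name, CSV file and column layout are assumptions; LOAD DATA LOCAL also needs AllowLoadLocalInfile=true in the connection string):

using MySqlConnector;

using var connection = new MySqlConnection(
    "Server=localhost;Database=test;Uid=app;Pwd=secret;AllowLoadLocalInfile=true");
connection.Open();

var loader = new MySqlBulkLoader(connection)
{
    TableName = "employees",        // assumed target table
    FileName = "employees.csv",     // assumed client-side CSV file
    FieldTerminator = ",",
    LineTerminator = "\n",
    NumberOfLinesToSkip = 1,        // skip the CSV header row
    Local = true                    // stream the file from the client machine
};

int rowsInserted = loader.Load();   // issues LOAD DATA LOCAL INFILE under the hood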
For completeness, I am adding this suggestion from @Wilson Hauck:
"In your configuration [mysqld] section, innodb_change_buffer_max_size=50 # from 25 (percent) for improved INSERT rate per second. SHOW FULL PROCESSLIST; to monitor when the instance has completed adjustment, then do your inserts and put it back to 25 percent for typical processing speed."
This may increase the insert rate depending on your table and its indexes, and on the order in which you are inserting the rows.
But the flip-side is that you may be able to achieve the same speedup (or more!) by other means; e.g.
by sorting your input so that rows are inserted in index order, or
by dropping the indexes, inserting the records and then recreating the indexes.
You can read about the change buffer here and make your own judgements.

Concurrent Read/Write MySQL EF Core

Using EF Core 2.2.6 and Pomelo.EntityFrameworkCore.MySql 2.2.6 (with MySqlConnector 0.59.2). I have a model for UserData:
public class UserData
{
    [DatabaseGenerated(DatabaseGeneratedOption.None)]
    public ulong ID { get; private set; }

    [Required]
    public Dictionary<string, InventoryItem> Inventory { get; set; }

    public UserData()
    {
        Data = new Dictionary<string, string>();
    }
}
I have a REST method that can be called that will add items to the user inventory:
using (var transaction = context.Database.BeginTransaction())
{
    UserData data = await context.UserData.FindAsync(userId);
    // there is code here to detect duplicate entries/etc, but I've removed it for brevity
    foreach (var item in items) data.Inventory.Add(item.ItemId, item);
    context.UserData.Update(data);
    await context.SaveChangesAsync();
    transaction.Commit();
}
If two or more calls to this method are made with the same user id then I get concurrent accesses (despite the transaction). This causes the data to sometimes be incorrect. For example, if the inventory is empty and then two calls are made to add items simultaneously (item A and item B), sometimes the database will only contain either A or B, and not both. From logging it appears that it is possible for EF to read from the database while the other read/write is still occurring, causing the code to have the incorrect state of the inventory for when it tries to write back to the db. So I tried marking the isolation level as serializable.
using (var transaction = context.Database.BeginTransaction(System.Data.IsolationLevel.Serializable))
Now I sometimes see an exception:
MySql.Data.MySqlClient.MySqlException (0x80004005): Deadlock found when trying to get lock; try restarting transaction
I don't understand how this code could deadlock... Anyway, I tried to proceed by wrapping the whole thing in a try/catch and retrying:
public static async Task<ResponseError> AddUserItem(Controller controller, MyContext context, ulong userId, List<InventoryItem> items, int retry = 5)
{
    ResponseError result = null;
    try
    {
        using (var transaction = context.Database.BeginTransaction(System.Data.IsolationLevel.Serializable))
        {
            UserData data = await context.UserData.FindAsync(userId);
            // there is code here to detect duplicate entries/etc, but I've removed it for brevity
            foreach (var item in items) data.Inventory.Add(item.ItemId, item);
            context.UserData.Update(data);
            await context.SaveChangesAsync();
            transaction.Commit();
        }
    }
    catch (Exception e)
    {
        if (retry > 0)
        {
            await Task.Delay(SafeRandomGenerator(10, 500));
            // decrement the remaining retries for the recursive call
            return await AddUserItem(controller, context, userId, items, retry - 1);
        }
        else
        {
            // store exception and return error
        }
    }
    return result;
}
And now I am back to the data being sometimes correct, sometimes not. So I think the deadlock is another problem, but this is the only method accessing this data. So, I'm at a loss. Is there a simple way to read from the database (locking the row in the process) and then writing back (releasing the lock on write) using EF Core? I've looked at using concurrency tokens, but this seems overkill for what appears (on the surface to me) to be a trivial task.
I added logging for the MySQL connector as well as the ASP.NET server and can see the following failure:
fail: Microsoft.EntityFrameworkCore.Database.Command[20102]
=> RequestId:0HLUD39EILP3R:00000001 RequestPath:/client/AddUserItem => Server.Controllers.ClientController.AddUserItem (ServerSoftware)
Failed executing DbCommand (78ms) [Parameters=[@p1='?' (DbType = UInt64), @p0='?' (Size = 4000)], CommandType='Text', CommandTimeout='30']
UPDATE `UserData` SET `Inventory` = @p0
WHERE `ID` = @p1;
SELECT ROW_COUNT();
A total hack is to just delay the arrival of the queries by a bit. This works because the client is most likely to generate these calls on load. Normally back-to-back calls aren't expected, so spreading them out in time by delaying on arrival works. However, I'd rather find a correct approach, since this just makes it less likely to be an issue:
ResponseError result = null;
await Task.Delay(SafeRandomGenerator(100, 500));
using (var transaction = context.Database.BeginTransaction(System.Data.IsolationLevel.Serializable))
// etc
This isn't a good answer, because it isn't what I wanted to do, but I'll post it here as it did solve my problem. My problem was that I was trying to read the database row, modify it in asp.net, and then write it back, all within a single transaction and while avoiding deadlocks. The backing field is JSON type, and MySQL provides some JSON functions to help modify that JSON directly in the database. This required me to write SQL statements directly instead of using EF, but it did work.
The first trick was to ensure I could create the row if it didn't exist, without requiring a transaction and lock.
INSERT INTO UserData VALUES ({0},'{{}}','{{}}') ON DUPLICATE KEY UPDATE ID = {0};
I used JSON_REMOVE to delete keys from the JSON field:
UPDATE UserData as S set S.Inventory = JSON_REMOVE(S.Inventory,{1}) WHERE S.ID = {0};
and JSON_SET to add/modify entries:
UPDATE UserData as S set S.Inventory = JSON_SET(S.Inventory,{1},CAST({2} as JSON)) WHERE S.ID = {0};
Note, if you're using EF Core and want to call this using FromSql then you need to return the entity as part of your SQL statement. So you'll need to add something like this to each SQL statement:
SELECT * from UserData where ID = {0} LIMIT 1;
Here is a full working example as an extension method:
public static async Task<UserData> FindOrCreateAsync(this IQueryable<UserData> table, ulong userId)
{
    string sql = "INSERT INTO UserData VALUES ({0},'{{}}','{{}}') ON DUPLICATE KEY UPDATE ID = {0}; SELECT * FROM UserData WHERE ID={0} LIMIT 1;";
    return await table.FromSql(sql, userId).SingleOrDefaultAsync();
}

public static async Task<UserData> JsonRemoveInventory(this DbSet<UserData> table, ulong userId, string key)
{
    if (!key.StartsWith("$.")) key = $"$.\"{key}\"";
    string sql = "UPDATE UserData as S set S.Inventory = JSON_REMOVE(S.Inventory,{1}) WHERE S.ID = {0}; SELECT * from UserData where ID = {0} LIMIT 1;";
    return await table.AsNoTracking().FromSql(sql, userId, key).SingleOrDefaultAsync();
}
Usage:
var data = await context.UserData.FindOrCreateAsync(userId);
await context.UserData.JsonRemoveInventory(userId, itemId);
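A corresponding wrapper for the JSON_SET statement above could look like this (a sketch in the same pattern, not part of the original answer; it assumes the caller passes the item already serialized as a JSON string):

public static async Task<UserData> JsonSetInventory(this DbSet<UserData> table, ulong userId, string key, string itemJson)
{
    if (!key.StartsWith("$.")) key = $"$.\"{key}\"";
    // same shape as JsonRemoveInventory: modify the JSON in place, then select the row so FromSql can materialize the entity
    string sql = "UPDATE UserData as S set S.Inventory = JSON_SET(S.Inventory,{1},CAST({2} as JSON)) WHERE S.ID = {0}; SELECT * from UserData where ID = {0} LIMIT 1;";
    return await table.AsNoTracking().FromSql(sql, userId, key, itemJson).SingleOrDefaultAsync();
}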

query execution time is more in NamedParameterJdbcTemplate than mySql

I am using Spring's NamedParameterJdbcTemplate with MySQL. I am executing 2 queries through the JDBC template and they take 1.1 s and 4 s respectively, but the same queries run directly in MySQL take 0.5 s and 1 s respectively. I don't understand what my bottleneck could be. My application and database reside on the same server, so there can be no network overhead, and my connections are pooled. I can tell this is working because a query returning a smaller amount of data takes 50 ms through the application. Please let me know what my bottleneck could be.
Below is my NamedParameterJdbcTemplate code:
MapSqlParameterSource parameters = new MapSqlParameterSource();
parameters.addValue("organizationIds", list);
parameters.addValue("fromDate", fromDate);
parameters.addValue("toDate", toDate);
long startTime = System.currentTimeMillis();
List<MappingUI> list = namedParameterJdbcTemplate.query(QueryList.CALCULATE_SCORE, parameters, new RowMapper<MappingUI>() {
    @Override
    public MappingUI mapRow(ResultSet rs, int rowNum) throws SQLException {
        MappingUI mapping = new MappingUI();
        mapping.setCompetitor(rs.getInt("ID"));
        mapping.setTotalRepufactScore(rs.getFloat("AVG_SCORE"));
        return mapping;
    }
});
Below is my query:
SELECT AVG(SCORE.SCORE) AS AVG_SCORE,ANALYSIS.ID AS ID FROM SCORE SCORE,QC_ANALYSIS ANALYSIS WHERE SCORE.TAG_ID = ANALYSIS.ID AND ANALYSIS.ORGANIZATION_ID IN (:organizationIds) AND DATE(ANALYSIS.DATES) BETWEEN DATE(:fromDate) AND DATE(:toDate) GROUP BY ANALYSIS.ORGANIZATION_ID

How can I execute a query when the column comparator_type is LexicalUUIDType?

I have created a column family with Comparator_type="LexicalUUIDType", Default_validation_class="UTF8Type" and Key_validation_class="UTF8Type", and set TimeUUID as the column_name within that column family. Insertion works very well, but how can I get the columns back? I can't set the correct column_name! Here is the code:
ColumnPath path = new ColumnPath();
path.setColumn_family("test");
path.setColumn("44c32fe1-38a4-11e1-a06a-485d60c81a3e".getBytes());
ColumnOrSuperColumn or = new ColumnOrSuperColumn();
try {
    or = client.get(ByteBuffer.wrap("key".getBytes()), path, ConsistencyLevel.ONE);
} catch (InvalidRequestException e) {
    ...
data in Cassandra DB:
=> (column=44c32fe0-38a4-11e1-a06a-485d60c81a3e, value=32, timestamp=1325881397726)
=> (column=44c32fe1-38a4-11e1-a06a-485d60c81a3e, value=33, timestamp=1325881397726)
=> (column=44c32fe2-38a4-11e1-a06a-485d60c81a3e, value=34, timestamp=1325881397727)
=> (column=44c37e00-38a4-11e1-a06a-485d60c81a3e, value=35, timestamp=1325881397728)
=> (column=44c37e01-38a4-11e1-a06a-485d60c81a3e, value=36, timestamp=1325881397728)
...
And the exception information:
InvalidRequestException(why:LexicalUUID should be 16 or 0 bytes (36))
at org.apache.cassandra.thrift.Cassandra$get_result.read(Cassandra.java:6490)
at org.apache.cassandra.thrift.Cassandra$Client.recv_get(Cassandra.java:519)
at org.apache.cassandra.thrift.Cassandra$Client.get(Cassandra.java:492)
at test.cassandra.MainTest.query(MainTest.java:118)
...
Why is that? I can't execute a single-column query or a slice query now. How can I execute a query by key and column name with a UUID? Thanks in advance!
The byte representation of a UUID is not what you get when you call "44c32fe1-38a4-11e1-a06a-485d60c81a3e".getBytes() (a UUID is 16 bytes; this string's getBytes() is 36 bytes). The FAQ on the Cassandra wiki has instructions on how to do what you want in Java:
java.util.UUID uuid = java.util.UUID.fromString("44c32fe1-38a4-11e1-a06a-485d60c81a3e");
// write the UUID's two longs into a 16-byte buffer to get the representation LexicalUUIDType expects
path.setColumn(ByteBuffer.allocate(16).putLong(uuid.getMostSignificantBits()).putLong(uuid.getLeastSignificantBits()).array());