How to run multiple MySQL queries concurrently without waiting for the result? - mysql

function dbQuery(sql)
{
    dbcon.query(sql, function(err, result) {
        // Imagine the PING to the DB server is 1 second.
        // It would take 100 seconds to complete 100 queries if they are run one by one.
    });
}

for (var i = 0; i < 100; i++) {
    dbQuery("INSERT INTO table VALUES some values");
}
I am running a socket client to get continuous streaming data, and I need to feed the remote database in real time. However, the current design executes the queries sequentially.
Imagine that a PING to the remote MySQL server takes 1 second. It would then take 100 seconds to complete 100 queries.
I need 100 queries to complete in 1 second without waiting for the results. In other words, just "push to the DB server and forget about it", ignoring the network delays.
If this is not possible in Node, is it possible to use another programming language to get the desired result, e.g. Java, PHP, Python, or something else?
Note: I need to execute each query in real time. Sending INSERTs in batches is not an option; I need to feed the database instantly when I get the data from elsewhere.
Non-blocking, non sequential
Simultaneously, Concurrently, Parallel

Just put them in a Promise.all without waiting for the result. They are executed at the same time (more or less):
Promise.all(
    Array.from(Array(100)).map(
        () => dbQuery("INSERT INTO table VALUES some values")
    )
)
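For Promise.all to help, dbQuery has to return a promise, and the queries need more than one connection to actually overlap, because a single MySQL connection processes queries one at a time. A minimal sketch, assuming the mysql2/promise driver and a connection pool (pool size and connection settings are placeholders; the SQL string is the question's own placeholder):
const mysql = require('mysql2/promise');

// Pool of connections so up to 20 INSERTs can be in flight at the same time.
const pool = mysql.createPool({
    host: 'localhost',          // placeholder connection settings
    user: 'user',
    password: 'secret',
    database: 'test',
    connectionLimit: 20
});

function dbQuery(sql) {
    // Returns a promise; we deliberately do not await it here.
    return pool.query(sql);
}

const pending = [];
for (let i = 0; i < 100; i++) {
    pending.push(dbQuery("INSERT INTO table VALUES some values")); // placeholder SQL from the question
}

// Fire-and-forget, but still log failures so errors are not silently swallowed.
Promise.allSettled(pending).then((results) => {
    const failed = results.filter((r) => r.status === 'rejected').length;
    if (failed) console.error(failed + ' inserts failed');
});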

Related

MySql get_lock for concurrency safe upsert

I'm writing an API in Node.js with a MySQL DB and am implementing a fairly standard pattern of:
If exists then update
else insert
This of course works fine until multiple simultaneous requests are made to the API, at which point the "if exists" check of request 2 can be executed before the insert of request 1, leading to two records instead of one.
I know that one way of dealing with this is to ensure that the DB has a constraint or key that prevents the duplicate record, but in this case the rules that determine whether we should insert or update are more complicated, so the check needs to be done in code.
This sounded like a good case for using a mutex/lock. I need this to be distributed as the api may have multiple instances running as part of a pool/farm.
I've come up with the following implementation:
try {
    await this.databaseConnection.knexRaw().raw(`SELECT GET_LOCK('lock1',10);`);
    await this.databaseConnection.knexRaw().transaction(async (trx) => {
        const existing = await this.findExisting(id);
        if (existing) {
            await this.update(myThing);
        } else {
            await this.insert(myThing);
        }
    });
} finally {
    await this.databaseConnection.knexRaw().raw(`SELECT RELEASE_LOCK('lock1');`);
}
This all seems to work fine, and my tests now produce only a single insert, although it seems a bit brute force/manual. Being new to MySQL and Node (I come from a C# and SQL Server background), is this approach sane? Is there a better approach?
Is it sane? Subjective.
Is it technically safe? It could be -- GET_LOCK() is reliable -- but not as you have written it.
You are ignoring the return value of GET_LOCK(), which is 1 if you got the lock, 0 if the timeout expired and you didn't get the lock, and NULL in some failure cases.
As written, you'll wait 10 seconds and then do the work anyway, so, not safe.
This assumes you have only one MySQL master. It wouldn't work if you have multiple masters or Galera, since Galera doesn't replicate GET_LOCK() across all nodes. (A Galera cluster is a high availability MySQL/MariaDB/Percona cluster of writable masters that replicate synchronously and will survive the failure/isolation of up to (ceil(n/2) - 1) out of n total nodes).
It would be better to find and lock the relevant rows using SELECT ... FOR UPDATE, which locks the found rows or, in some cases, the gap where they would be if they existed, blocking other transactions that are attempting to capture the same locks until you rollback or commit... but if that is not practical, using GET_LOCK() is valid, subject to the point made above about the return value.
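To make the return-value point concrete, here is a rough sketch of the original snippet with the GET_LOCK() result actually checked before any work is done. It reuses the question's knexRaw() wrapper; the way the result row is unpacked assumes the mysql/mysql2 knex client and may need adjusting:
const LOCK_NAME = 'lock1';

// GET_LOCK returns 1 (got it), 0 (timed out) or NULL (error) -- only proceed on 1.
const [rows] = await this.databaseConnection
    .knexRaw()
    .raw('SELECT GET_LOCK(?, 10) AS got_lock', [LOCK_NAME]);

if (!rows || rows[0].got_lock !== 1) {
    // Do NOT fall through and do the work anyway.
    throw new Error('Could not acquire lock ' + LOCK_NAME);
}

try {
    await this.databaseConnection.knexRaw().transaction(async (trx) => {
        const existing = await this.findExisting(id);
        if (existing) {
            await this.update(myThing);
        } else {
            await this.insert(myThing);
        }
    });
} finally {
    await this.databaseConnection.knexRaw().raw('SELECT RELEASE_LOCK(?)', [LOCK_NAME]);
}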

How does the web server communicate with the database?

This is just to explain how I think it probably works:
Let's say the web server needs data from 10 tables. The data that will finally be displayed on the client needs some kind of formatting, which can be done either on the database or on the web server. Let's say the time to fetch the raw data for one table is 1 second and the time to fetch formatted data for one table is 2 seconds (it takes one second to format the data for one table, and the formatting can be done equally easily on the web server or on the database).
Let's consider the following cases for communication:
Case 1:
for (i = 0; i < 10; i++)
{
    table[i].getDataFromDB();    // 2 sec - gets formatted data from the DB; the call completes before control moves to the next statement
    table[i].sendDataToClient(); // 0 sec - control doesn't wait for completion of this step
}
Case 2:
for (i = 0; i < 10; i++)
{
    table[i].getDataFromDB();    // 1 sec - gets raw data from the DB; the call completes before control moves to the next statement
    table[i].formatData();       // 0 sec - executed as a parallel process which takes 1 sec to complete (control moves to the next statement before completion)
}
formatData()
{
    // format the data, which takes 1 sec
    sendDataToClient(); // 0 sec - control doesn't wait for completion of this step
}
Assume it takes no time (0 sec) to send the data from the web server to the client since it will be constant for both cases.
In case 1, the data for each table will be displayed on the client at intervals of 2 seconds, and the complete data will be on the client after 20 seconds.
In case 2, the data for the first table will be displayed after 2 seconds, but the data for the next 9 tables will then be displayed at seconds 3, 4, ..., 11.
Which is the correct way, and how is it handled between popular web servers and databases?
Popular web servers and databases can work either way, depending on how the application is written.
That said, unless you have an extreme situation, you will likely find that the performance impact is small enough that your primary concern should instead be code maintainability. From this point of view, formatting the data in the application (which runs on the web server) is usually preferred, as business logic implemented at the database level is usually harder to maintain.
Many web application frameworks will do much of the formatting work for you, as well.

Provoking MySQL read/write timeouts

Is there a way to test query timeouts systematically using a MySQL 5.6 server without overloading the server with some insanely busy query? Is it maybe possible to build test SQL statements (read and/or write) that run indefinitely (or for several minutes) without driving the server into the ground?
MySQL has a sleep() function, so you can do this:
SELECT SLEEP(10);
to craft a query that will take 10 seconds without taking up resources. SLEEP() returns either 0 or 1, so you can take advantage of that to craft an UPDATE or DELETE query that will have no effect:
UPDATE users SET username='blah' WHERE id=1 AND SLEEP(1) > 1;
You need to ensure that the rest of the WHERE clause (id=1 in this case) matches exactly one row. If it matches more than one row, it will sleep for every single row it matches; if it matches zero rows, it will return immediately.
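As a concrete illustration, here is a small Node.js sketch (assuming the node mysql driver, which supports a per-query timeout option in milliseconds) that provokes a client-side read timeout against such a sleeping query; the connection settings are placeholders:
const mysql = require('mysql');
const connection = mysql.createConnection({ /* placeholder connection settings */ });

connection.query(
    { sql: 'SELECT SLEEP(10)', timeout: 2000 },   // server sleeps 10 s, client gives up after 2 s
    function (err, results) {
        if (err) {
            // With this driver a timeout surfaces as a PROTOCOL_SEQUENCE_TIMEOUT error.
            console.error('query timed out:', err.code);
        } else {
            console.log('query finished without timing out', results);
        }
    }
);

connection.end();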

Speedy way to send Sql Data to RavenDB?

I created a console application which sends data from a SQL database to RavenDB.
I have a freakish amount of data to transfer, so it's taking an incredibly long time.
(1,000,000 rows takes RavenDB about 2 hours to store.)
RavenDB takes longer to import the data than it takes the console application to collect it from SQL Server.
Is there any way to speed up the transfer or perhaps an existing tool which does this already?
using (var session = this._store.OpenSession())
{
    // row.Count is never more than 1024
    while (i < row.Count)
    {
        session.Store(row[i]);
        i++;
    }
    session.SaveChanges();
}
Could you post the code where you insert into RavenDB? That is likely where the bottleneck lies. You should be making requests concurrently.
Setting:
HttpJsonRequest.ConfigureRequest += (e,x)=>((HttpWebRequest)x.Request).UnsafeAuthenticatedConnectionSharing = true;
as well as processing your insert records in batches, will help.
As for insert performance, you'll likely never match SQL Server, as RavenDB is optimized for reads rather than writes.

Which is faster: multiple single INSERTs or one multiple-row INSERT?

I am trying to optimize one part of my code that inserts data into MySQL. Should I chain INSERTs to make one huge multiple-row INSERT or are multiple separate INSERTs faster?
https://dev.mysql.com/doc/refman/8.0/en/insert-optimization.html
The time required for inserting a row is determined by the following factors, where the numbers indicate approximate proportions:
Connecting: (3)
Sending query to server: (2)
Parsing query: (2)
Inserting row: (1 × size of row)
Inserting indexes: (1 × number of indexes)
Closing: (1)
From this it should be obvious that sending one large statement will save you an overhead of 7 per INSERT statement. Reading further, the text also says:
If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements.
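As a client-side illustration of that advice, a hedged sketch using the mysql2/promise driver, whose escaping expands a nested array after VALUES ? into a grouped list; the table and column names are made up for the example:
const mysql = require('mysql2/promise');

async function bulkInsert(rows) {
    // rows looks like [['alice', 'a@example.com'], ['bob', 'b@example.com'], ...]
    const conn = await mysql.createConnection({ /* placeholder connection settings */ });
    try {
        // Expands to: INSERT INTO users (name, email) VALUES ('alice','a@example.com'), ('bob','b@example.com'), ...
        await conn.query('INSERT INTO users (name, email) VALUES ?', [rows]);
    } finally {
        await conn.end();
    }
}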
I know I'm answering this question almost two and a half years after it was asked, but I just wanted to provide some hard data from a project I'm working on right now that shows that indeed doing multiple VALUE blocks per insert is MUCH faster than sequential single VALUE block INSERT statements.
The code I wrote for this benchmark in C# uses ODBC to read data into memory from an MSSQL data source (~19,000 rows, all are read before any writing commences), and the MySQL .NET connector (MySql.Data.*) stuff to INSERT the data from memory into a table on a MySQL server via prepared statements. It was written in such a way as to allow me to dynamically adjust the number of VALUE blocks per prepared INSERT (i.e., insert n rows at a time, where I could adjust the value of n before a run). I also ran the test multiple times for each n.
Doing single VALUE blocks (e.g., 1 row at a time) took 5.7 - 5.9 seconds to run. The other values are as follows:
2 rows at a time: 3.5 - 3.5 seconds
5 rows at a time: 2.2 - 2.2 seconds
10 rows at a time: 1.7 - 1.7 seconds
50 rows at a time: 1.17 - 1.18 seconds
100 rows at a time: 1.1 - 1.4 seconds
500 rows at a time: 1.1 - 1.2 seconds
1000 rows at a time: 1.17 - 1.17 seconds
So yes, even just bundling 2 or 3 writes together provides a dramatic improvement in speed (runtime cut by a factor of n), until you get to somewhere between n = 5 and n = 10, at which point the improvement drops off markedly, and somewhere in the n = 10 to n = 50 range the improvement becomes negligible.
Hope that helps people decide on (a) whether to use the multiprepare idea, and (b) how many VALUE blocks to create per statement (assuming you want to work with data that may be large enough to push the query past the max query size for MySQL, which I believe is 16MB by default in a lot of places, possibly larger or smaller depending on the value of max_allowed_packet set on the server.)
A major factor will be whether you're using a transactional engine and whether you have autocommit on.
Autocommit is on by default and you probably want to leave it on; therefore, each insert that you do does its own transaction. This means that if you do one insert per row, you're going to be committing a transaction for each row.
Assuming a single thread, that means the server needs to sync some data to disk for EVERY ROW. It needs to wait for the data to reach a persistent storage location (hopefully the battery-backed RAM in your RAID controller). This is inherently rather slow and will probably become the limiting factor in these cases.
I'm of course assuming that you're using a transactional engine (usually innodb) AND that you haven't tweaked the settings to reduce durability.
I'm also assuming that you're using a single thread to do these inserts. Using multiple threads muddies things a bit because some versions of MySQL have working group-commit in innodb - this means that multiple threads doing their own commits can share a single write to the transaction log, which is good because it means fewer syncs to persistent storage.
Either way, the upshot is that you REALLY WANT to use multi-row inserts.
There is a limit beyond which it gets counter-productive, but in most cases it's at least 10,000 rows. So if you batch them up to 1,000 rows, you're probably safe.
If you're using MyISAM, there's a whole other load of things, but I'll not bore you with those. Peace.
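To make the autocommit point concrete, a minimal Node.js sketch (assuming the mysql2/promise driver; the table and column names are placeholders) that turns 1,000 single-row INSERTs into a single commit, so the server syncs to persistent storage once instead of 1,000 times:
const mysql = require('mysql2/promise');

async function insertBatch(rows) {
    const conn = await mysql.createConnection({ /* placeholder connection settings */ });
    try {
        await conn.beginTransaction();        // suspends the commit-per-row behaviour
        for (const row of rows) {
            await conn.query('INSERT INTO my_table (col) VALUES (?)', [row]);
        }
        await conn.commit();                  // one durable sync for the whole batch
    } catch (err) {
        await conn.rollback();
        throw err;
    } finally {
        await conn.end();
    }
}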
Here are the results of a little PHP benchmark I did:
I'm inserting 3000 records in 3 different ways, using PHP 8.0 and MySQL 8.1 (mysqli).
Multiple INSERT queries, with multiple transactions:
$start = microtime(true);
for ($i = 0; $i < 3000; $i++)
{
    mysqli_query($res, "insert into app__debuglog VALUE (null,now(), 'msg : $i','callstack','user','debug_speed','vars')");
}
$end = microtime(true);
echo "Took " . ($end - $start) . " s\n";
Did it 5 times, average : 11.132s (+/- 0.6s)
Multiple INSERT queries, with a single transaction:
$start = microtime(true);
mysqli_begin_transaction($res, MYSQLI_TRANS_START_READ_WRITE);
for ($i = 0; $i < 3000; $i++)
{
    mysqli_query($res, "insert into app__debuglog VALUE (null,now(), 'msg : $i','callstack','user','debug_speed','vars')");
}
mysqli_commit($res);
$end = microtime(true);
echo "Took " . ($end - $start) . " s\n";
Result with 5 tests : 0.48s (+/- 0.04s)
Single aggregated INSERT query:
$start = microtime(true);
$values = "";
for ($i = 0; $i < 3000; $i++)
{
    $values .= "(null,now(), 'msg : $i','callstack','user','debug_speed','vars')";
    if ($i !== 2999)
        $values .= ",";
}
mysqli_query($res, "insert into app__debuglog VALUES $values");
$end = microtime(true);
echo "Took " . ($end - $start) . " s\n";
Result with 5 tests : 0.085s (+/- 0.05s)
So, for a 3000-row insert, it looks like:
Using multiple queries in a single write transaction is ~22 times faster than making multiple queries with a separate transaction for each insert.
Using a single aggregated INSERT statement is still ~6 times faster than using multiple queries with a single write transaction.
Send as many inserts across the wire at one time as possible. The actual insert speed should be the same, but you will see performance gains from the reduction of network overhead.
In general, the fewer calls to the database the better (meaning faster and more efficient), so try to code the inserts in a way that minimizes database accesses. Remember, unless you're using a connection pool, each database access has to create a connection, execute the SQL, and then tear down the connection. Quite a bit of overhead!
You might want to :
Check that auto-commit is off
Open Connection
Send multiple batches of inserts in a single transaction (a batch size of about 4,000-10,000 rows; experiment to see what works)
Close connection
Depending on how well your server scales (it's definitely OK with PostgreSQL, Oracle and MSSQL), do the above with multiple threads and multiple connections.
I just did a small benchmark, and it appears that with a lot of rows per batch it's not faster. Here are my results for inserting 280,000 rows:
by 10,000: 164.96 seconds
by 5,000: 37 seconds
by 1,000: 12.56 seconds
by 600: 12.59 seconds
by 500: 13.81 seconds
by 250: 17.96 seconds
by 400: 14.75 seconds
by 100: 27 seconds
It appears that 1,000 by 1,000 is the best choice.
In general, multiple separate inserts will be slower because of the connection overhead. Doing multiple inserts at once will reduce the per-insert cost of that overhead.
Depending on which language you are using, you can possibly create a batch in your programming/scripting language before going to the DB and add each insert to the batch. Then you would be able to execute a large batch using one connect operation. Here's an example in Java.
MySQL 5.5
One SQL INSERT statement took ~300 to ~450 ms,
while the stats below are for inline multiple INSERT statements:
(25492 row(s) affected)
Execution Time : 00:00:03:343
Transfer Time : 00:00:00:000
Total Time : 00:00:03:343
I would say inline is the way to go :)
It's ridiculous how badly MySQL and MariaDB are optimized when it comes to inserts.
I tested MySQL 5.7 and MariaDB 10.3; no real difference between them.
I've tested this on a server with NVMe disks, 70,000 IOPS, 1.1 GB/sec sequential throughput, and that's possible full duplex (read and write).
The server is a high-performance server as well.
I gave it 20 GB of RAM.
The database was completely empty.
The speed I got was 5,000 inserts per second when doing multi-row inserts (tried it with 1 MB up to 10 MB chunks of data).
Now the clue:
If I add another thread and insert into the SAME tables, I suddenly have 2x 5,000/sec.
One more thread and I have 15,000 total/sec.
Consider this: when doing inserts from ONE thread, you can write to the disk sequentially (with exceptions for indexes).
When using threads you actually degrade the possible performance, because the disk now needs to do a lot more random accesses.
But a reality check shows MySQL is so badly optimized that threads help a lot.
The real performance possible with such a server is probably millions per second; the CPU is idle and the disk is idle.
The reason is quite clearly that MariaDB, just like MySQL, has internal delays.
I would add that too many rows at a time, depending on their contents, could lead to a Got a packet bigger than 'max_allowed_packet' error.
Maybe consider using functions like PHP's array_chunk to do multiple inserts for your big datasets; a rough sketch of the same idea follows below.
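A rough Node.js equivalent of that array_chunk idea, assuming the mysql2/promise driver and its VALUES ? expansion; the chunk size and the table/column names are placeholders chosen to stay well under max_allowed_packet:
const CHUNK_SIZE = 1000;   // tune so each statement stays below max_allowed_packet

async function insertInChunks(conn, rows) {
    for (let i = 0; i < rows.length; i += CHUNK_SIZE) {
        const chunk = rows.slice(i, i + CHUNK_SIZE);
        // One multi-row INSERT per chunk instead of one giant statement.
        await conn.query('INSERT INTO my_table (col_a, col_b) VALUES ?', [chunk]);
    }
}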
Multiple-row inserts are faster, but there is a threshold. Another trick is temporarily disabling constraint checks, which makes inserts much, much faster; it doesn't matter whether your table has them or not. For example, test disabling foreign keys and enjoy the speed:
SET FOREIGN_KEY_CHECKS=0;
Of course you should turn it back on after the inserts with:
SET FOREIGN_KEY_CHECKS=1;
This is a common way to insert huge amounts of data. Data integrity may break, so you should take care of that before disabling foreign key checks.