I use PHP with MySQL. I care about performance and stability, and that's why I now use multiple inserts/updates in one single SQL query (which is generated by PHP). With that I insert/update 1000 rows; it takes about 6 seconds.
Kind of like this but larger:
Inserting multiple rows in mysql
I've read about transactions, which are meant to speed things up: multiple SQL queries are added to a kind of buffer (I guess) and then all of them are executed at once.
Which method should I use in terms of performance and stability? Pros and cons?
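For reference, here is a minimal PDO sketch of the transaction approach being asked about (table, column names, and credentials are made up): many single-row prepared inserts wrapped in one BEGIN/COMMIT, so all rows are flushed with a single commit instead of one commit per statement.

<?php
// Hypothetical table items(name, price); connection details are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=test;charset=utf8mb4', 'user', 'pass',
               [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

$rows = [/* ... ~1000 arrays like ['name' => 'a', 'price' => 1.5] ... */];

$pdo->beginTransaction();
$stmt = $pdo->prepare('INSERT INTO items (name, price) VALUES (?, ?)');
foreach ($rows as $r) {
    $stmt->execute([$r['name'], $r['price']]);  // reuses the prepared statement for each row
}
$pdo->commit();                                 // one commit (and one disk flush) for all rows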
Is it possible? From a single process?
DB is on SATA disk.
I am using Ubuntu 14.04. All tables have 20-60 rows and 6 columns each.
I am using transactions.
The current sequence is:
Create table
Start transaction
Insert #1
Insert #2
...
Insert #n
Commit
Right now I am getting about 3-4 tables/second.
Conclusion: When I disabled logging, my performance became similar to phpMyAdmin. So, as Rick James suggested, I guess there is no way to achieve further improvements without faster storage.
On a spinning drive, you can get about 100 operations per second. CREATE TABLE might be slower since it involves multiple file operations in the OS. So, I would expect 1000 CREATE TABLEs to take more than 10 seconds. That's on Ubuntu; longer on Windows.
It is usually poor schema design to make multiple tables that are identical; instead have a single table with an extra column to distinguish the subsets.
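To illustrate that, a sketch of the single-table alternative, assuming the identical tables all share the same six columns (the names below are invented):

<?php
// One table with a 'subset' discriminator column instead of N identical tables.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$pdo->exec("
    CREATE TABLE measurements (
        subset VARCHAR(64) NOT NULL,    -- which 'table' the row would have gone into
        c1 INT, c2 INT, c3 INT, c4 INT, c5 INT, c6 INT,
        INDEX (subset)                  -- queries filter on subset instead of picking a table
    ) ENGINE=InnoDB
");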
INSERTing 40K rows--
40K single-row INSERTs with autocommit=ON -- 400 seconds.
1000 multi-row INSERTs of 20-60 rows each, again COMMITted after each statement -- 10 seconds.
A single INSERT with 40K rows (if you don't blow out some other limitation) -- possibly less than 1 second.
Do not use multi-statement queries; it is a potential security problem. Anyway, it won't help much.
For CREATE TABLE you could perform a multi-statement query (PDO supports this), so in a single query you can create several tables; and for the inserts you could use a bulk insert, preparing a single SQL INSERT query with repeated VALUES groups and executing it as one statement.
The bulk insert is based on:

INSERT INTO your_table (col1, col2, ...)
VALUES (val1_1, val1_2, ...),
       (val2_1, val2_2, ...),
       ...
Then you can build a PDO query based on this technique. Since execution happens once for the whole statement instead of once per row, you can insert thousands of values in a single query and get the result in a few seconds.
Use the multiple-row INSERT syntax to reduce communication overhead between the client and the server if you need to insert many rows. This tip is valid for inserts into any table, not just InnoDB tables.
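A sketch of how such a multi-row statement might be assembled with PDO and bound placeholders (table and column names are invented); the point is that the VALUES groups are repeated so the server receives a single statement and a single round trip:

<?php
$pdo  = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass',
                [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
$rows = [
    ['a', 1],
    ['b', 2],
    // ... thousands more ...
];

$placeholders = implode(',', array_fill(0, count($rows), '(?, ?)'));  // one "(?, ?)" per row
$params = [];
foreach ($rows as $r) {
    array_push($params, ...$r);          // flatten the rows into one flat parameter list
}

$sql = "INSERT INTO your_table (col1, col2) VALUES $placeholders";
$pdo->prepare($sql)->execute($params);   // executed as one statement

Keep the total statement size under max_allowed_packet; for very large batches, split the rows into chunks and repeat.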
A single insert statement is taking, occasionally, more than 2 seconds. The inserts are potentially concurrent, as it depends on our site traffic which can result in 200 inserts per minute.
The table has more than 150M rows, 4 indexes and is accessed using a simple select statement for reporting purposes.
SHOW INDEX FROM ouptut
How to speed up the inserts considering that all indexes are required?
You haven't provided many details but it seems like you need partitions.
An insertion operation on a database index has, in general, O(log N) time complexity, where N is the number of rows in the table. If your table is really huge, even log N may become too much.
So, to address that scalability issue you can make use of index partitions to transparently split up your table indexes in smaller internal pieces and reduce that N without changing your application or SQL scripts.
https://dev.mysql.com/doc/refman/5.7/en/partitioning-overview.html
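As a minimal sketch of what that could look like, assuming (purely for illustration) the table is named output, rows carry a created_at date, and created_at is part of the primary key (MySQL requires the partitioning column to be in every unique key):

<?php
// Hypothetical: range-partition the table by year of created_at.
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$pdo->exec("
    ALTER TABLE output
    PARTITION BY RANGE (YEAR(created_at)) (
        PARTITION p2015 VALUES LESS THAN (2016),
        PARTITION p2016 VALUES LESS THAN (2017),
        PARTITION p2017 VALUES LESS THAN (2018),
        PARTITION pmax  VALUES LESS THAN MAXVALUE
    )
");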
[EDIT]
Considering the information initially added in the comments and now included in the question itself:
200 potentially concurrent inserts per minute
4 indexes
1 select for reporting purposes
There are a few not mutually exclusive improvements:
Check the output of EXPLAIN for that SELECT and remove indexes that are not being used, or combine them into a single index.
Make the inserts in batches:
https://dev.mysql.com/doc/refman/5.6/en/insert-optimization.html
https://dev.mysql.com/doc/refman/5.6/en/optimizing-innodb-bulk-data-loading.html
Partitioning is still an option.
Alternatively, change your approach: save the data to a NoSQL store like Redis and populate the MySQL table asynchronously for reporting purposes (see the sketch below).
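A rough sketch of that last alternative, assuming the phpredis extension and an invented events(user_id, payload) table: the web request only pushes to a Redis list, and a separate worker drains the list into MySQL with one multi-row INSERT per batch.

<?php
// Web request path: buffer the row in Redis (cheap, no MySQL contention).
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$redis->rPush('pending_events', json_encode(['user_id' => 42, 'payload' => '...']));

// Worker path (cron or a long-running loop): drain the buffer in batches.
$pdo   = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass',
                 [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
$batch = [];
while (count($batch) < 500 && ($item = $redis->lPop('pending_events')) !== false) {
    $batch[] = json_decode($item, true);
}
if ($batch) {
    $placeholders = implode(',', array_fill(0, count($batch), '(?, ?)'));
    $params = [];
    foreach ($batch as $e) {
        $params[] = $e['user_id'];
        $params[] = $e['payload'];
    }
    $pdo->prepare("INSERT INTO events (user_id, payload) VALUES $placeholders")
        ->execute($params);
}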
I have a large MySQL database that receives large volumes of queries, each of which takes around 5-10 seconds to perform.
Queries involve checking records, updating records and adding records.
I'm experiencing some significant bottlenecks in query execution, which I believe is because incoming queries have to 'queue' while current queries are using records that the incoming queries need to access.
Is there a way, besides completely reformatting my database structure and SQL queries, to enable simultaneous use of database records by queries?
An INSERT, UPDATE, or DELETE operation locks the relevant tables (MyISAM) or rows (InnoDB) until the operation completes. Make sure queries of this type are committed quickly, and also check the transaction isolation level used around the parts that take the relevant locks.
For MySQL internal locking see: https://dev.mysql.com/doc/refman/5.5/en/internal-locking.html
Also remember that MySQL has different storage engines with different features, e.g.:
The MyISAM storage engine supports concurrent inserts to reduce
contention between readers and writers for a given table: If a MyISAM
table has no holes in the data file (deleted rows in the middle), an
INSERT statement can be executed to add rows to the end of the table
at the same time that SELECT statements are reading rows from the
table.
https://dev.mysql.com/doc/refman/5.7/en/concurrent-inserts.html
Also take a look at https://dev.mysql.com/doc/refman/5.7/en/optimization.html
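Since the locking behavior depends on the storage engine, a quick first check is to see which engine each table actually uses; a small sketch (database credentials are placeholders):

<?php
// List each table's storage engine for the current database
// (MyISAM => table-level locks, InnoDB => row-level locks).
$pdo  = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$stmt = $pdo->query("
    SELECT TABLE_NAME, ENGINE
    FROM information_schema.TABLES
    WHERE TABLE_SCHEMA = DATABASE()
");
foreach ($stmt as $row) {
    echo $row['TABLE_NAME'] . ' => ' . $row['ENGINE'] . PHP_EOL;
}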
I was wondering if I can reduce the overhead by sending multiple statements in the same query to the database. Is this possible?
I am currently sending the queries one by one and would like to send several at the same time as a batch (all in all I'm sending 2k or so queries).
They are all select queries
I used batch inserts when working with Grails and MySQL, and the time for inserts was reduced by a factor of 100! (I processed about 50 inserts at once with batch processing) So I can definitely say batch inserts save a lot of time.
I am not sure how much of this post can help you, but here it is: Help with performance: SUBQUERY vs JOIN
The way you join tables could also be a major issue on performance.
Batching SQL operations can definitely improve overall speed. For small queries, the slowest part is often establishing a DB connection. This is an expensive step, regardless of the SQL query itself. You can reduce this overhead by reducing the number of times you have to create and destroy these DB connections.
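One way to cut that cost in PHP, as a sketch (credentials, table, and the query are placeholders): open the connection once, optionally as a persistent connection, prepare the SELECT once, and reuse both for all 2k queries instead of reconnecting each time.

<?php
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass', [
    PDO::ATTR_PERSISTENT => true,           // keep the connection alive across requests
    PDO::ATTR_ERRMODE    => PDO::ERRMODE_EXCEPTION,
]);

$ids  = [/* ... the ~2000 ids to look up ... */];
$stmt = $pdo->prepare('SELECT * FROM some_table WHERE id = ?');  // prepared once
foreach ($ids as $id) {
    $stmt->execute([$id]);                                       // ...executed many times
    $result = $stmt->fetchAll(PDO::FETCH_ASSOC);
    // process $result ...
}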
I have to upload about 16 million records to a MySQL 5.1 server on a shared webspace which does not permit LOAD DATA functionality. The table is an Innodb table. I have not assigned any keys yet.
Therefore, I use a Python script to convert my CSV file (2.5 GB in size) to an SQL file with individual INSERT statements. I've launched the SQL file, and the process is incredibly slow; it feels like 1000-1500 lines are processed every minute!
In the meantime, I read about bulk inserts, but did not find any reliable source telling how many records one insert statement can have. Do you know?
Is it an advantage to have no keys and add them later?
Would a transaction around all the inserts help speed up the process? In fact, there's just a single connection (mine) working with the database at this time.
If you use the INSERT ... VALUES ... syntax to insert multiple rows in a single request, your query size is limited by the max_allowed_packet value rather than by the number of rows.
Concerning keys: it's a good practice to define keys before any data manipulations. Actually, when you build a model you must think of keys, relations, indexes etc.
It's better to define indexes before you insert data as well. CREATE INDEX works quite slowly on huge datasets, but postponing index creation is not a huge disadvantage.
To make your inserts faster, turn autocommit mode off (so rows are committed in batches rather than one by one) and do not run concurrent requests on your tables.
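Putting those points together, a sketch of the chunked approach (file name, table, and columns are invented, and it uses PDO to match the rest of this thread even though the asker's converter was written in Python): read the CSV, build a multi-row INSERT of a few thousand rows at a time, and commit per chunk so neither max_allowed_packet nor the open transaction grows without bound.

<?php
// Bulk-load a CSV in chunks of multi-row INSERTs (hypothetical table big_table(col1, col2, col3)).
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass',
               [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);

$chunkSize = 2000;    // rows per INSERT; keep each statement well under max_allowed_packet

$flush = function (array $chunk) use ($pdo) {
    $placeholders = implode(',', array_fill(0, count($chunk), '(?, ?, ?)'));
    $params = [];
    foreach ($chunk as $row) {
        array_push($params, ...$row);   // flatten the CSV rows into one parameter list
    }
    $pdo->beginTransaction();
    $pdo->prepare("INSERT INTO big_table (col1, col2, col3) VALUES $placeholders")
        ->execute($params);
    $pdo->commit();                     // one commit per chunk, not per row
};

$fh    = fopen('data.csv', 'r');
$chunk = [];
while (($row = fgetcsv($fh)) !== false) {
    $chunk[] = $row;
    if (count($chunk) >= $chunkSize) {
        $flush($chunk);
        $chunk = [];
    }
}
if ($chunk) {
    $flush($chunk);                     // final partial chunk
}
fclose($fh);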