I have created a query using the Doctrine query builder which inserts almost 65,000 rows (across all 3 tables) into 3 different tables whenever a certain action is performed. The complete process takes almost 2-3 minutes to execute.
What I have done is persist the records in a loop and then flush at the end.
So, is there any way to minimize the execution time and insert the data within seconds?
No, unfortunately Doctrine doesn't support grouping inserts into a single statement. If you need to do bulk inserts, one possibility is doing an $em->flush() and $em->clear() after every 100th row or so; see the manual's recommendation on batch processing:
https://doctrine-orm.readthedocs.org/en/latest/reference/batch-processing.html
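To make the difference concrete, here is a rough illustration (table and column names are made up for the example): the ORM emits one INSERT per persisted entity, whereas a grouped multi-row INSERT, which Doctrine won't generate for you, sends many rows in a single statement and avoids most of the per-row round trips.
-- One statement per entity, which is what the ORM produces
INSERT INTO items (name, price) VALUES ('a', 1.00);
INSERT INTO items (name, price) VALUES ('b', 2.00);
-- A grouped (multi-row) insert; Doctrine won't emit this, but it can be run
-- as native SQL if raw speed matters more than ORM features
INSERT INTO items (name, price) VALUES ('a', 1.00), ('b', 2.00), ('c', 3.00);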
I have to fetch some records and then display them in the output. In my Spring Boot application, I am using a JPA Specification to create the criteria and then calling repo.findAll(Specification, Pageable). The query generated by JPA is below:
Select *
From "Table"
Where "condition1" and "condition2" and "condition3"
Order By column1 desc
Offset 0 rows fetch next 10 rows only
This query sometimes takes more than 40 seconds, but the rest of the time around 100 ms. The issue is very rare (once in 300-400 executions).
The table has around 40,000 rows and one of the columns holds JSON data.
Is there any way to detect why this query randomly takes so much time? When I manually triggered the query in the DB it took 35+ seconds only once; every time after that it hardly takes 200 ms. Is there any tool or approach to detect this rarely occurring issue?
First of all, for 40,000 rows that execution time seems far too high.
Do check your indexed columns.
Secondly, as per your question, the query takes a long time the first time it is executed and runs faster on subsequent executions.
Databases use a query cache (and cached execution plans) to speed up repeated executions.
When you execute a query for the first time, the database has to build an execution plan to retrieve your result, but on later executions the plan is already there. Hence the query takes much less time after its first execution; beyond that, look for ways to optimize the query itself.
You can check your query execution plan with
explain Select * from "Table" where "condition1" and "condition2" and "condition3" order by column1 desc offset 0 rows
EXPLAIN is a keyword which shows the tentative query execution plan.
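For the rare outliers, one approach (assuming you are on MySQL/MariaDB, since the question doesn't name the database) is to enable the slow query log so the occasional 40-second executions are captured as they happen:
-- Log statements slower than 5 seconds (MySQL/MariaDB; adjust for other databases)
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 5;
-- Shows where the log file is written
SHOW VARIABLES LIKE 'slow_query_log_file';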
Hope this helps. :)
What is the best way to execute a bulk update of all records in a large table in MySQL?
During the sanitization process, we are updating all rows in the users table, which has 28M rows, to mask a few columns. This is currently taking right around 2 hours to complete in a rake task, and the AWS session expiration is also 2 hours. If the rake task takes longer than the session expiration, the build will fail.
Due to the large number of records, we are updating 25K rows at a time using find_in_batches and then update_all on the results. We throttle between batches by sleeping for 0.1 s to avoid high CPU usage.
So the question is, is there any way we can optimize the bulk update further, or shall we increase the AWS session expiration to 3 hours?
One option could be to batch by id ranges, rather than by exact batch sizes. So update between id 1-100000, followed by 100001-200000, and so on. This avoids the large sets of ids being passed around. As there will be gaps in the ids, each batch would be a different size, but this might not be an issue.
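A rough sketch of the id-range approach (the users table is from the question; the masked columns are placeholders):
-- Update one id range per statement, then repeat with the next range
UPDATE users
SET email = CONCAT('masked_', id, '@example.invalid'),
    phone = NULL
WHERE id BETWEEN 1 AND 100000;
-- next batch: WHERE id BETWEEN 100001 AND 200000, and so on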
Thanks for your input.
For such large updates the overhead of fetching records and instantiating AR objects is very significant (and there will also be a slowdown from GC). The fastest way to execute is to write a raw SQL query that does the update (or to use update_all to construct it, which is very similar but lets you use scopes/joins on relations).
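For instance, a single raw statement over the whole table (the masked column is again a placeholder) touches no AR objects at all:
UPDATE users
SET email = CONCAT('masked_', id, '@example.invalid');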
I have a MySQL table that keeps gaining new records every 5 seconds.
The questions are:
Can I run a query on this set of data that may take more than 5 seconds?
If a SELECT statement takes more than 5 seconds, will it affect the scheduled INSERT statement?
What happens when an INSERT statement is invoked while a SELECT is still running? Will the SELECT get the newly inserted records?
I'll go over your questions and some of the comments you added later.
Can I run a query on this set of data that may take more than 5 seconds?
Can you? Yes. Should you? It depends. In a MySQL configuration I set up, any query taking longer than 3 seconds was considered slow and logged accordingly. In addition, you need to keep in mind the frequency of the queries you intend to run.
For example, if you try to run a 10 second query every 3 seconds, you can probably see how things won't end well. If you run a 10 second query every few hours or so, then it becomes more tolerable for the system.
That being said, slow queries can often benefit from optimizations, such as not scanning the entire table (i.e. searching via primary keys), and from using the EXPLAIN keyword to get the database's query planner to tell you how it intends to work on the query internally (e.g. is it using PKs, FKs, or indexes, or is it scanning all table rows?).
If a SELECT statement takes more than 5 seconds, will it affect the scheduled INSERT statement?
"Affect" in what way? If you mean "prevent insert from actually inserting until the select has completed", that depends on the storage engine. For example, MyISAM and InnoDB are different, and that includes locking policies. For example, MyISAM tends to lock entire tables while InnoDB tends to lock specific rows. InnoDB is also ACID-compliant, which means it can provide certain integrity guarantees. You should read the docs on this for more details.
What happens when an INSERT statement is invoked while a SELECT is still running? Will the SELECT get the newly inserted records?
Part of "what happens" is determined by how the specific storage engine behaves. Regardless of what happens, the database is designed to answer application queries in a way that's consistent.
As an example, if the SELECT statement were to lock an entire table, then the INSERT statement would have to wait until the SELECT has completed and the lock has been released, meaning that the app would see the results as they were before the insert.
I understand that locking the database can prevent the SELECT statement from being messed up.
It can, but it also introduces a potentially unacceptable performance bottleneck, especially if, as you say, the system is inserting lots of rows every 5 seconds, and depending on how frequently you run your queries, how efficiently they've been written, and so on.
What is the good practice when I need the data for calculations while that data will be updated within a short period?
My recommendation is to simply accept the fact that the calculations are based on a snapshot of the data at the specific point in time the calculation was requested, and to let the database do its job of ensuring the consistency and integrity of that data. When the app requests data, it should trust that the database has done its best to provide the most up-to-date piece of consistent information (i.e. not providing a row where some columns have been updated but others haven't yet).
With new rows coming in at the frequency you mentioned, reasonable users will understand that the results they're seeing are based on data available at the time of request.
All of your questions are related to table locking.
The answers to all of them depend on the way the database is configured.
Read: http://www.mysqltutorial.org/mysql-table-locking/
Performing a SELECT statement while an INSERT statement is running
If you want to perform a SELECT statement while an INSERT is still executing, you should open a new connection and close it every time you check. I.e. if I want to insert lots of records and find out, via a SELECT query, whether the last record has been inserted, I have to open and close the connection inside a for or while loop, as in the sketch below.
# The long-running INSERT is issued elsewhere (another process or connection).
# Poll with a fresh connection each time so newly committed rows are visible.
import mysql.connector

while True:
    cnx = mysql.connector.connect(user='app', password='secret', database='mydb')  # placeholder credentials
    cur = cnx.cursor()
    cur.execute("SELECT id FROM my_table ORDER BY id DESC LIMIT 1")  # placeholder query
    row = cur.fetchone()
    cnx.close()
    if row is not None:  # break the while loop once you get the result
        break
I have a big SQL dump with ~1.3 million rows.
I am trying to import it through the MySQL console this way:
source mysql_dump.sql
It goes well at the start: it creates the new tables and so on, but after some time the insert queries take longer and longer to process.
E.g. every ~1,700 records the console outputs the results and the time consumed for that batch of queries. In the beginning ~1,700 inserts take MySQL ~0.3 seconds. After 5 minutes they take ~1 minute.
What can be done to make it process the queries as fast as it did in the beginning?
This is a bit long for a comment.
One possibility is indexes. You should drop all the indexes on the table before inserting records, then add the indexes back after all the data is in the table. Maintaining indexes during insertion slows down the inserts.
Second, if you want to load all the data into a table, it is better to load it using LOAD DATA INFILE.
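A rough sketch of both ideas (table, index, column, and file names below are placeholders):
-- Drop secondary indexes before the bulk load, re-create them afterwards
ALTER TABLE my_table DROP INDEX idx_name;
-- Load the data from a flat file instead of running millions of INSERTs
LOAD DATA INFILE '/tmp/my_table.csv'
INTO TABLE my_table
FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n';
ALTER TABLE my_table ADD INDEX idx_name (name);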
When you do so many inserts, do a commit after every 1,000 records.
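A minimal sketch of that, assuming the dump doesn't already wrap the inserts in its own transactions:
SET autocommit = 0;
-- ... roughly 1,000 INSERT statements ...
COMMIT;
-- ... the next 1,000 INSERT statements ...
COMMIT;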
I have the following scenario:
I have a database with a particular MyISAM table of about 4 million rows. I use stored procedures (MySQL Version 5.1) and one in particular to search through these rows on various criteria. This table has several indexes on it, and the queries through this stored procedure are normally very fast ( <1s). Basically I use a prepared statement and create and execute some dynamic SQL in this search sp. After executing the prepared statement, I perform "DEALLOCATE PREPARED stmt;"
Most of the queries run in under a second (I use LIMIT to get just 15 rows at any time). However, there are some rare queries which take longer to run (say 2-3s). I have optimized the searched table as far as I can.
I have developed a web application and I can run and see the results of the fast queries in under a second on my development machine.
However, if I open two browser instances and do a simultaneous search (against the development machine), one with the longer-running query and the other with the faster query, the results are returned at the same time. It seems as if the fast query waits for the slower query to finish before returning its results, i.e. both queries take 2-3 seconds...
Is there a reason for this? I thought that MyISAM handles SELECTs independently of one another, and this is not the behaviour I am currently experiencing...
Thanks in advance!
Tim
This is just because you are doing it from the same machine; if the searches were coming from two different machines they would run at the same time. Would you really want one person to be able to bog down your MySQL server just by opening a bunch of browser windows and hitting refresh?
That is right. Each SELECT query on a MyISAM table locks the entire table until it is finished. Their excuse is that this achieves "a very high read throughput". Switching to InnoDB will allow concurrent reads.
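If you do decide to switch engines, the conversion itself is a single statement (the table name is a placeholder; try it on a copy first, since rebuilding a 4-million-row table takes a while):
ALTER TABLE search_table ENGINE = InnoDB;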