What does it mean if the MySQL query:
SHOW PROCESSLIST;
returns "Sending data" in the State column?
I imagine it means the query has been executed and MySQL is sending the result data to the client, but I'm wondering why it's taking so much time (up to an hour).
Thank you.
This is quite a misleading status. It should be called "reading and filtering data".
This means that MySQL has some data stored on the disk (or in memory) which is yet to be read and sent over. It may be the table itself, an index, a temporary table, a sorted output, etc.
If you have a 1M-record table (without an index) from which you need only one record, MySQL will still report the state as "sending data" while scanning the table, despite the fact that it has not sent anything yet.
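A quick way to see whether a query will spend its time scanning is EXPLAIN. A minimal sketch (the table, column, and index names here are made up):

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- type: ALL with a rows estimate near the table size means a full scan;
-- the query will sit in "Sending data" for the whole read.
ALTER TABLE orders ADD INDEX idx_customer (customer_id);
-- EXPLAIN now shows type: ref with a small rows estimate, and the
-- "Sending data" phase shrinks accordingly.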
MySQL 8.0.17 and later: This state is no longer indicated separately, but rather is included in the Executing state.
In this state, the thread is reading and processing rows for a SELECT statement, and sending data to the client. Because operations occurring during this state tend to perform large amounts of disk access (reads), it is often the longest-running state over the lifetime of a given query.
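To see which state a long-running statement is currently in, SHOW FULL PROCESSLIST shows the State column, and the Performance Schema can break a statement down by stage; a sketch (assumes the stage instruments and consumers are enabled):

SHOW FULL PROCESSLIST;
-- Per-stage view of statements currently executing:
SELECT event_name, timer_wait
FROM performance_schema.events_stages_current;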
I have a MySQL table that keeps gaining new records every 5 seconds.
The questions are:
can I run a query on this set of data that may take more than 5 seconds?
if a SELECT statement takes more than 5s, will it affect the scheduled INSERT statement?
what happens when an INSERT statement is invoked while a SELECT is still running? Will the SELECT get the newly inserted records?
I'll go over your questions and some of the comments you added later.
can I run a query on this set of data that may take more than 5 seconds?
Can you? Yes. Should you? It depends. In a MySQL configuration I set up, any query taking longer than 3 seconds was considered slow and logged accordingly. In addition, you need to keep in mind the frequency of the queries you intend to run.
For example, if you try to run a 10 second query every 3 seconds, you can probably see how things won't end well. If you run a 10 second query every few hours or so, then it becomes more tolerable for the system.
That being said, slow queries can often benefit from optimizations, such as not scanning the entire table (e.g. searching by primary key) and using the EXPLAIN keyword to get the database's query planner to tell you how it intends to execute the query internally (e.g. is it using primary keys, foreign keys, or indexes, or is it scanning all table rows?).
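If you control the server, you can also have MySQL log everything slower than a chosen threshold; a minimal sketch, using the 3-second threshold mentioned above (the log file path is made up):

SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 3;  -- log anything slower than 3 seconds
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';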
if a SELECT statement takes more than 5s, will it affect the scheduled INSERT statement?
"Affect" in what way? If you mean "prevent the insert from actually inserting until the select has completed", that depends on the storage engine. MyISAM and InnoDB behave differently, and that includes their locking policies: MyISAM tends to lock entire tables, while InnoDB tends to lock specific rows. InnoDB is also ACID-compliant, which means it can provide certain integrity guarantees. You should read the docs on this for more details.
what happens when an INSERT statement is invoked while a SELECT is still running? Will the SELECT get the newly inserted records?
Part of "what happens" is determined by how the specific storage engine behaves. Regardless of what happens, the database is designed to answer application queries in a way that's consistent.
As an example, if the select statement were to lock an entire table, then the insert statement would have to wait until the select has completed and the lock has been released, meaning that the app would see the results prior to the insert's update.
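To make that concrete, here's a sketch (the table name is made up): check which engine the table uses, then watch an explicit table lock hold off writers the way MyISAM's implicit locks do:

SHOW TABLE STATUS LIKE 'records';  -- the Engine column shows MyISAM or InnoDB
LOCK TABLES records READ;          -- other sessions can still read, but their INSERTs block
SELECT COUNT(*) FROM records;      -- sees a stable view of the table
UNLOCK TABLES;                     -- blocked INSERTs proceed from here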
I understand that locking the database can keep the SELECT statement from being messed up.
It can also introduce a potentially unacceptable performance bottleneck, especially if, as you say, the system is inserting lots of rows every 5 seconds, and depending on the frequency with which you're running your queries, how efficiently they've been built, etc.
what is the good practice when I need the data for calculations while that data will be updated within a short period?
My recommendation is to simply accept the fact that the calculations are based on a snapshot of the data at the specific point in time the calculation was requested and to let the database do its job of ensuring the consistency and integrity of said data. When the app requests data, it should trust that the database has done its best to provide the most up-to-date piece of consistent information (i.e. not providing a row where some columns have been updated, but others yet haven't).
With new rows coming in at the frequency you mentioned, reasonable users will understand that the results they're seeing are based on data available at the time of request.
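With InnoDB you can make that snapshot explicit by wrapping the calculation in a transaction; a minimal sketch (table and column names are made up):

START TRANSACTION WITH CONSISTENT SNAPSHOT;
-- Every SELECT inside this transaction sees the data as of this point,
-- even while other sessions keep inserting new rows.
SELECT SUM(amount) FROM measurements;
SELECT COUNT(*) FROM measurements;
COMMIT;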
All of your questions are related to table locking.
They all depend on the way the database is configured.
Read: http://www.mysqltutorial.org/mysql-table-locking/
Performing a SELECT statement while an INSERT statement is working
If you want to perform a SELECT statement while an INSERT is still running, you should open a new connection and close it for every check. That is, if I want to insert lots of records and then find out, via a query, that the last record has been inserted, I have to open and close the connection inside a for or while loop. A sketch (assuming a Python client such as mysql-connector; the table name, credentials, and expected last id are made up):
import time
import mysql.connector  # assumed client library

EXPECTED_LAST_ID = 1000000  # id of the final record the long INSERT will write

while True:
    # fresh connection per check, so each SELECT sees the latest committed data
    cnx = mysql.connector.connect(user='app', password='secret',
                                  host='localhost', database='mydb')
    cur = cnx.cursor()
    cur.execute("SELECT 1 FROM records WHERE id = %s", (EXPECTED_LAST_ID,))
    found = cur.fetchone() is not None
    cnx.close()
    if found:
        break  # stop polling once the last record is visible
    time.sleep(1)
Does my query get sent to the database once and I get a list of all the results in one shot which I then loop through, or do I have to request the next row from the DB each time?
Essentially, does reducing the number of rows I expect to return mean fewer connections/calls to the DB, meaning my DB will be able to handle more connections at once, or is the number of database connections not dependent on the number of returned rows?
Your question is very vague, and seems to have the terminology jumbled up.
The number of rows retrieved from a resultset has no bearing on the number of connections. Nor does the number of statements executed have any bearing on connections.
(Unless it's a very badly written application that churns connections, connecting and disconnecting from the database for each statement execution.)
I think what you're asking is whether there's a "roundtrip" made to the database server for each "fetch", to retrieve a row from a resultset returned by a SELECT query.
The answer to that question is no: most database libraries fetch a "batch" of rows. When the client requests the next row, it's returned from the set the library has already retrieved. Once the batch is exhausted, it takes another roundtrip to get the next set. That's all "under the covers", and your application doesn't need to worry about it.
It doesn't matter whether you fetch just one row and then discard the resultset, or whether you loop through and fetch every row. It's the same pattern.
In terms of performance and scalability, if you only need to retrieve four rows, then there's no sense in preparing a resultset containing more than that. When your query runs against the database, the database server generates the resultset, and holds that, until the client requests a row from the resultset. Larger resultsets require more resources on the database server.
A huge resultset is going to need more roundtrips to the database server, to retrieve all of the rows, than a smaller resultset.
It's not just the number of rows; it's also the size of each row being returned. (Which is why DBA types gripe about developer queries that SELECT * to retrieve every flipping column, when the client actually uses only a small subset of them.)
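As a sketch of both points (table and column names are made up):

-- Drags back every column of every row, even if the client only uses two fields:
SELECT * FROM customers;
-- Returns just what the client needs, in a far smaller resultset:
SELECT id, email FROM customers WHERE active = 1 LIMIT 4;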
Reducing roundtrips to the database generally improves performance, especially if the client is connecting to the database server over a network connection.
But we don't really need to be concerned how many roundtrips it requires to fetch all the rows from a resultset... it takes what it takes, the client needs what it needs.
What we should be concerned with is the number of queries we run.
A query execution is WAY more overhead than the simple roundtrip for a fetch. The SQL statement has to be parsed for syntax (the order and pattern of keywords and identifiers is correct) and for semantics (the identifiers reference valid tables, columns, and functions, and the user has appropriate permissions on the database objects). Then the server generates an execution plan (evaluating which predicates and operations can be satisfied by which indexes, and the permutations of the order the operations are performed in). Finally, the statement can be executed; if a resultset is being returned, the server prepares it, notifies the client of query completion status, and waits for the client to request rows from it. When the client closes the resultset, the database server can clean up, releasing the memory, etc.
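This overhead is one reason server-side prepared statements exist: the statement is parsed once at PREPARE time and reused on each EXECUTE. A minimal sketch (the table and column names are made up):

PREPARE get_user FROM 'SELECT id, email FROM users WHERE id = ?';
SET @uid = 42;
EXECUTE get_user USING @uid;   -- the parse step is already done
DEALLOCATE PREPARE get_user;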
These aren't linked. The number of connections you are able to make is based on the quality of the thread library and the amount of RAM available to (and used by) each thread. So essentially, it is limited by the capacity of the system and not the complexity of the database. Since each thread uses a buffer, the number of rows will only make the process slower or consume a fixed amount of RAM. See here for more:
https://dev.mysql.com/doc/refman/5.0/en/memory-use.html
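That server-side limit is itself just configuration; a sketch:

SHOW VARIABLES LIKE 'max_connections';  -- current connection cap
SET GLOBAL max_connections = 500;       -- raise it, RAM permitting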
Q: Does my query get sent to the database once and I get a list of all the results in one shot which I then loop through, or do I have to request the next row from the DB each time?
A: You get back a batch of rows, and you iterate through it until you need the next batch (another trip to the DB on the same connection). The size of the batch depends on multiple conditions; if your query's result dataset is small, you may get all the results in one shot.
Q: Essentially, does reducing the number of rows I expect to return mean fewer connections/calls to the DB, meaning my DB will be able to handle more connections at once, or is the number of database connections not dependent on the number of returned rows?
A: The larger the dataset, the more trips to the DB there may be (to grab the next set of rows). But the number of connections opened to the DB does not depend on the result dataset size of a single query.
I'm currently building a system that does running computations, and every 5 seconds inserts or updates information based on those computations to a few rows in MySQL. I'm currently working on running this system on a few different servers at once, with a few agents that each do similar processing and then write to the same set of rows. I already randomize the order in which each agent writes its set of rows, but there's still a lot of deadlock happening. What's the best/fastest way to get through those deadlocks? Should I just rerun the query each time one happens, or use row locks, or something else entirely?
I suggest you try something that won't require more than one client to update your 'few rows.'
For example, you could have each agent that produces results do an INSERT to a staging table with the MEMORY access method.
Then, every five seconds you can run a MySQL event (a stored procedure within the server) that loops through all the rows in that table, posting their results to your 'few rows' and then deleting them. If it's important for the rows in your staging table to be processed in order, then you can use an AUTO_INCREMENT id field. But it might not be important for them to be in order.
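A minimal sketch of that staging approach (all table, column, and event names here are made up, and shared_results is assumed to have a unique key on metric):

-- Agents only ever INSERT here, so they never contend on the shared rows.
CREATE TABLE result_staging (
    id     INT AUTO_INCREMENT PRIMARY KEY,  -- preserves arrival order
    metric VARCHAR(64) NOT NULL,
    value  DOUBLE NOT NULL
) ENGINE = MEMORY;

-- A single event (requires event_scheduler = ON) folds staged rows into
-- the shared table every 5 seconds, so only one writer touches those rows.
DELIMITER //
CREATE EVENT fold_staged_results
ON SCHEDULE EVERY 5 SECOND
DO
BEGIN
    -- Remember the high-water mark so rows arriving mid-fold aren't lost.
    SET @max_id = (SELECT COALESCE(MAX(id), 0) FROM result_staging);
    INSERT INTO shared_results (metric, value)
        SELECT metric, value FROM result_staging
        WHERE id <= @max_id ORDER BY id
        ON DUPLICATE KEY UPDATE value = VALUES(value);
    DELETE FROM result_staging WHERE id <= @max_id;
END //
DELIMITER ;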
If you want to get fancier and more scalable than that, you'll need a queue management system like Apache ActiveMQ.
While working with MySQL and some really "performance-greedy" queries, I noticed that if I run such a greedy query it can take 2 or 3 minutes to compute. But if I retry the query immediately after it finishes the first time, it takes only a few seconds. Does MySQL store something like "the last x queries"?
The short answer is yes: there is a query cache (note that it was removed entirely in MySQL 8.0).
The query cache stores the text of a SELECT statement together with the corresponding result that was sent to the client. If an identical statement is received later, the server retrieves the results from the query cache rather than parsing and executing the statement again. The query cache is shared among sessions, so a result set generated by one client can be sent in response to the same query issued by another client.
from the MySQL reference manual
MySQL doesn't cache execution plans between statements, but the data the query reads will be cached (in the InnoDB buffer pool and the OS file cache), so subsequent executions will be faster even when the query cache doesn't apply.
Yes, depending on how the MySQL server is configured, it may be using the query cache. This stores the results of identical queries until a certain limit (which you can set if you control the server) has been reached. Read http://dev.mysql.com/doc/refman/5.1/en/query-cache.html to find out how to tune the query cache to speed up your application if it issues many identical queries.
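On versions that still have the query cache (it was deprecated in MySQL 5.7.20 and removed in 8.0), you can inspect and size it like this; a sketch:

SHOW VARIABLES LIKE 'query_cache%';      -- is the cache enabled, and how big?
SHOW STATUS LIKE 'Qcache%';              -- hit/insert counters and free memory
SET GLOBAL query_cache_size = 67108864;  -- give the cache 64 MB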