Are "cleaned up" MySQL connections from a connection pool safe to delete? - mysql

Consider the following list of connections:
+----------+---------+------+------------------------+
| ID       | COMMAND | TIME | STATE                  |
+----------+---------+------+------------------------+
| 87997796 | Sleep   |   15 | cleaned up             |
| 90850182 | Sleep   |  105 | cleaned up             |
| 88009697 | Sleep   |   38 | delayed commit ok done |
| 88000267 | Sleep   |    6 | delayed commit ok done |
| 88009819 | Sleep   |   38 | delayed commit ok done |
| 90634882 | Sleep   |   21 | cleaned up             |
| 90634878 | Sleep   |   21 | cleaned up             |
| 90634884 | Sleep   |   21 | cleaned up             |
| 90634875 | Sleep   |   21 | cleaned up             |
+----------+---------+------+------------------------+
After a short time (under a minute):
+----------+---------+------+------------------------+
| ID       | COMMAND | TIME | STATE                  |
+----------+---------+------+------------------------+
| 87997796 | Sleep   |    9 | cleaned up             |
| 88009697 | Sleep   |   32 | delayed commit ok done |
| 88000267 | Sleep   |    9 | delayed commit ok done |
| 88009819 | Sleep   |   31 | delayed commit ok done |
| 90634882 | Sleep   |   14 | cleaned up             |
| 90634878 | Sleep   |   14 | cleaned up             |
| 90634884 | Sleep   |   14 | cleaned up             |
| 90634875 | Sleep   |   14 | cleaned up             |
+----------+---------+------+------------------------+
8 rows in set (0.02 sec)
After I finished writing this Stack Overflow post:
+----------+---------+------+------------------------+
| ID       | COMMAND | TIME | STATE                  |
+----------+---------+------+------------------------+
| 87997796 | Sleep   |    0 | cleaned up             |
| 88009697 | Sleep   |   53 | delayed commit ok done |
| 88000267 | Sleep   |    0 | delayed commit ok done |
| 88009819 | Sleep   |   52 | delayed commit ok done |
| 90634882 | Sleep   |    5 | cleaned up             |
| 90634878 | Sleep   |    5 | cleaned up             |
| 90634884 | Sleep   |    5 | cleaned up             |
| 90634875 | Sleep   |    5 | cleaned up             |
+----------+---------+------+------------------------+
Context:
This is some 3rd-party vendor app opening the connections (the source code isn't available to us, so we don't know the details). We know that their connection management is awful, and they know it as well. It is awful because connections leak, which you can see in the first table - 90850182. While the others have their timers reset, that one ages indefinitely. In older versions of the app it would stay forever. In newer versions it is eventually captured by a "patch" the vendor introduced, which effectively cleans connections after the x seconds you specify. So it's "a leak-healing patch".
The problem:
We are hosting hundreds of such vendor apps, and most of them have far more than 8 connections as they carry more traffic. That results in a disgusting number (thousands) of connections we have to maintain. About 80% of the connections sit in the "cleaned up" state for under 120 seconds (eventually cleaned by the aforementioned configurable app parameter).
This is all handled by Aurora RDS, and AWS engineers told us that if the app doesn't close its connections properly, the standard "wait_timeout" isn't going to work. Well, "wait_timeout" becomes a useless decoration in AWS Aurora, but let's take that up with Jeff in another thread/topic.
So regardless, we have this magic configurable parameter from the third-party vendor set on this obscure app, it controls eviction of stale connections, and it works.
The questions:
Is it safe to evict connections which are in the "cleaned up" state immediately?
At the moment this happens after 120 seconds, which results in a huge number of such connections. Yet in the tables above you can see that the timers are reset, meaning that something is happening to these connections and they are not entirely stale. I.e., does the app's connection pooling "touch" them for further re-use?
I don't possess knowledge of connection pools' inner guts or how they are seen from within the database. Are all reserved connections of a connection pool "sleeping" in the "cleaned up" state by default?
So, say you start cleaning too aggressively - will you be fighting the connection pool as it creates more connections to replenish itself?
Or do reserved connections have some different state?
Even if you don't fully understand the context, I'd expect a veteran DBA or a connection pool library maintainer to be able to help with such questions. Otherwise I'll get my hands dirty and answer this myself eventually: I'd try the Apache connection pool and HikariCP, observe them, try to kill their idle connections (simulating the magic parameter), and try this 3rd-party app's connections with a 0-second magic parameter to see if it still works.
Appreciate your time :bow:.

The Answer
Yes. From the AWS forum (https://forums.aws.amazon.com/thread.jspa?messageID=708499):
In Aurora the 'cleaned up' state is the final state of a connection
whose work is complete but which has not been closed from the client
side. In MySQL this field is left blank (no State) in the same
circumstance.
Also from the same post:
Ultimately, explicitly closing the connection in code is the best
solution here
From my personal experience as a MySQL DBA, and knowing that "cleaned up" represents a blank state, I'd definitely kill those connections.
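If you decide to do the same, here is a minimal sketch for finding the candidates (this assumes information_schema.processlist is available, as on Aurora MySQL; the 10-second grace period is an arbitrary choice, tune it for your pool):

-- Generate KILL statements for idle connections stuck in 'cleaned up';
-- the 10-second grace period is an assumption, adjust for your workload
SELECT CONCAT('KILL ', id, ';') AS kill_stmt
FROM information_schema.processlist
WHERE command = 'Sleep'
  AND state = 'cleaned up'
  AND time > 10;

Run the generated statements back through the client; note that on RDS/Aurora you may need CALL mysql.rds_kill(<thread id>) instead of a plain KILL, depending on your privileges.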

Related

Google Cloud functions + SQL Broken Pipe error

I have various Google Cloud Functions which write to and read from a Cloud SQL database (MySQL). The processes work; however, when the functions happen to run at the same time, I get a broken pipe error. I am using SQLAlchemy with Python; the processes are Cloud Functions and the db is a Google Cloud SQL database. I have seen suggested solutions that involve setting timeout values higher. I was wondering if this would be a good approach or if there is a better one? Thanks for your help in advance.
Here's the SQL broken pipe error:
(pymysql.err.OperationalError) (2006, "MySQL server has gone away (BrokenPipeError(32, 'Broken pipe'))")
(Background on this error at: http://sqlalche.me/e/13/e3q8)
Here are the MySQL timeout values:
show variables like '%timeout%';
+-------------------------------------------+----------+
| Variable_name                             | Value    |
+-------------------------------------------+----------+
| connect_timeout                           | 10       |
| delayed_insert_timeout                    | 300      |
| have_statement_timeout                    | YES      |
| innodb_flush_log_at_timeout               | 1        |
| innodb_lock_wait_timeout                  | 50       |
| innodb_rollback_on_timeout                | OFF      |
| interactive_timeout                       | 28800    |
| lock_wait_timeout                         | 31536000 |
| net_read_timeout                          | 30       |
| net_write_timeout                         | 60       |
| rpl_semi_sync_master_async_notify_timeout | 5000000  |
| rpl_semi_sync_master_timeout              | 3000     |
| rpl_stop_slave_timeout                    | 31536000 |
| slave_net_timeout                         | 30       |
| wait_timeout                              | 28800    |
+-------------------------------------------+----------+
15 rows in set (0.01 sec)
If you cache your connection for performance, it's normal to lose it after a while. To prevent this, you have to handle disconnection.
In addition, because you are working with Cloud Functions, only one request can be handled at a time on one instance (if you have 2 concurrent requests, you will get 2 instances). Thus, set your pool size to 1 to save resources on your database side (in case of huge parallelization).
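A common way to handle disconnection is to have the pool validate a connection before handing it out, typically with a trivial probe query; SQLAlchemy's pool_pre_ping=True engine option does exactly this, discarding dead connections and opening fresh ones instead of surfacing a broken pipe. A sketch of the probe and of the server-side knobs involved:

-- What the pool's pre-ping amounts to: a trivial round trip before reuse
SELECT 1;
-- The idle timeouts that produce "MySQL server has gone away" when exceeded
SHOW GLOBAL VARIABLES LIKE 'wait_timeout';
SHOW GLOBAL VARIABLES LIKE 'interactive_timeout';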

Why am I getting these random timeout errors on a MySQL database being accessed directly

I have Tomcat and MySQL installed on the same machine; it's not been updated recently, so we are using Tomcat 7.0.31 and MySQL 5.0.95 on Linux.
When a user makes a purchase it's processed by PayPal, which then contacts our server; we create the license and store it in the database. Unfortunately this doesn't always work, giving errors such as:
The last packet successfully received from the server was 44,533,707
milliseconds ago. The last packet sent successfully to the server was
44,533,707 milliseconds ago. is longer than the server configured
value of 'wait_timeout'. You should consider either expiring and/or
testing connection validity before use in your application, increasing
the server configured values for client timeouts, or using the
Connector/J connection property 'autoReconnect=true' to avoid this
problem.
But I don't think MySQL is down, since I have never had a problem connecting to it; the errors occur randomly, about 5% of the time.
MySQL contains two db instances, and in my web.xml file I have
<resource-ref>
    <description>DB Connection</description>
    <res-ref-name>jdbc/jaikoz</res-ref-name>
    <res-type>javax.sql.DataSource</res-type>
    <res-auth>Container</res-auth>
</resource-ref>
<resource-ref>
    <description>DB Connection2</description>
    <res-ref-name>jdbc/songkong</res-ref-name>
    <res-type>javax.sql.DataSource</res-type>
    <res-auth>Container</res-auth>
</resource-ref>
and in context.xml (I have changed the username and password) I have
<Context path="/store" privileged="true">
    <Resource name="jdbc/jaikoz" auth="Container" type="javax.sql.DataSource"
              maxActive="-1" maxIdle="-1" maxWait="10000"
              username="usrnm" password="pwd" driverClassName="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/jaikoz?autoReconnect=true"/>
    <Resource name="jdbc/songkong" auth="Container" type="javax.sql.DataSource"
              maxActive="-1" maxIdle="-1" maxWait="10000"
              username="usernm" password="pwd" driverClassName="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/songkong?autoReconnect=true"/>
</Context>
This configuration is from my Store web application; I also have a jaikoz and a songkong web application, and both of these define one of the connections. I had to introduce 'store' since the payment provider required a single URL to send all successful payments to (be it from songkong or jaikoz).
I wonder if having two database connections is breaking things, or if having two applications define the same connection is breaking things, since I don't think I had such a problem when I only had the jaikoz application.
The errors say I could use autoReconnect=true, but I am already doing that.
I ran SHOW PROCESSLIST as suggested and got the following:
mysql> show processlist
-> ;
+--------+------+-----------------+----------+---------+-------+-------+------------------+
| Id     | User | Host            | db       | Command | Time  | State | Info             |
+--------+------+-----------------+----------+---------+-------+-------+------------------+
| 127681 | paul | localhost:40360 | jaikoz   | Sleep   |     5 |       | NULL             |
| 127682 | paul | localhost:40361 | jaikoz   | Sleep   |     7 |       | NULL             |
| 127683 | paul | localhost:40362 | jaikoz   | Sleep   |    11 |       | NULL             |
| 127684 | paul | localhost:40363 | jaikoz   | Sleep   |     2 |       | NULL             |
| 127685 | paul | localhost:40364 | jaikoz   | Sleep   |    17 |       | NULL             |
| 127754 | paul | localhost:40664 | jaikoz   | Sleep   |    20 |       | NULL             |
| 127755 | paul | localhost:40665 | jaikoz   | Sleep   |     8 |       | NULL             |
| 127756 | paul | localhost:40666 | jaikoz   | Sleep   |    25 |       | NULL             |
| 128444 | paul | localhost:41250 | jaikoz   | Sleep   |    14 |       | NULL             |
| 128445 | paul | localhost:41251 | jaikoz   | Sleep   |    10 |       | NULL             |
| 134807 | paul | localhost:56829 | jaikoz   | Sleep   |   226 |       | NULL             |
| 134849 | paul | localhost:38795 | songkong | Sleep   |   475 |       | NULL             |
| 143552 | paul | localhost:35811 | jaikoz   | Sleep   | 19338 |       | NULL             |
| 145211 | paul | localhost       | jaikoz   | Query   |     0 | NULL  | show processlist |
+--------+------+-----------------+----------+---------+-------+-------+------------------+
14 rows in set (0.00 sec)
Maybe there should not be so many jaikoz processes?
How do I resolve this?
Edited
Speaking to someone else: since it is only connections from the Store web application that fail, and that application only does anything when a purchase is made (currently only a few times a day), it sounds like the connection is being timed out by MySQL itself without telling the JDBC pool, so when we try to use the connection via the pool we get the error.
I added testWhileIdle="true" to my context.xml and removed autoReconnect=true, hoping this would remove connections from the pool before MySQL just gives up on them.
Unfortunately it is still failing sporadically, but now with a slightly different error message:
JDBC begin transaction failed
Turn off connection pooling.
Turn off auto-reconnect
Be sure your code catches transaction errors, and replays them if necessary -- especially in the case of deadlocks.
Those tend to be issues relating to integrity of financial transactions.
If those do not clear up the problem, then provide
Queries per second
SHOW VARIABLES LIKE '%timeout';
Sample of the transaction (SQL, please, not Java)
SHOW CREATE TABLE
As for the PROCESSLIST...
I see one connection (possibly pooled) that performed some SQL 5 seconds ago.
I see one connection (possibly pooled) that has not done anything for 19338 seconds (over 5 hours).
I would guess that you have not needed more than 12 simultaneous connections in the past 5 hours.
The list looks "normal".
The key thing was that I needed to provide a validationQuery in order for the validation done by testWhileIdle to work; since then I have had no problems.
<Context path="/store" privileged="true">
    <Resource name="jdbc/myapp" auth="Container" type="javax.sql.DataSource"
              maxActive="10" maxIdle="10"
              testWhileIdle="true" validationQuery="select 1" validationQueryTimeout="5"
              username="usrnm" password="pwd"
              driverClassName="com.mysql.jdbc.Driver" testOnBorrow="true"
              url="jdbc:mysql://localhost:3306/myapp"/>
</Context>

Killing sleeping processes in MySQL?

Can anyone tell me how I can kill all the sleeping processes?
I searched for it and found that we can do it with the command
mk-kill --match-command Sleep --kill --victims all --interval 10
I connected to the DB server (Linux) but I got the message that the command was not found. I tried to connect via MySQL Administrator; it doesn't say the command was not found, but it doesn't execute the query either, it just says I have an SQL error. (mk-kill comes from the Maatkit toolkit, since superseded by pt-kill in Percona Toolkit; it is a shell utility that has to be installed on the host, not a SQL statement, which explains both errors.)
Log in to MySQL as admin:
mysql -uroot -ppassword
And then run the command:
mysql> show processlist;
You will get something like the below:
+----+-------------+--------------------+----------+---------+------+-------+------------------+
| Id | User        | Host               | db       | Command | Time | State | Info             |
+----+-------------+--------------------+----------+---------+------+-------+------------------+
| 49 | application | 192.168.44.1:51718 | XXXXXXXX | Sleep   |  183 |       | NULL             |
| 55 | application | 192.168.44.1:51769 | XXXXXXXX | Sleep   |  148 |       | NULL             |
| 56 | application | 192.168.44.1:51770 | XXXXXXXX | Sleep   |  148 |       | NULL             |
| 57 | application | 192.168.44.1:51771 | XXXXXXXX | Sleep   |  148 |       | NULL             |
| 58 | application | 192.168.44.1:51968 | XXXXXXXX | Sleep   |   11 |       | NULL             |
| 59 | root        | localhost          | NULL     | Query   |    0 | NULL  | show processlist |
+----+-------------+--------------------+----------+---------+------+-------+------------------+
You will see the complete details of the different connections. Now you can kill the sleeping connection as below:
mysql> kill 55;
Query OK, 0 rows affected (0.00 sec)
kill $queryID; is helpful if there is only one query causing an issue.
Having a lot of sleeping MySQL processes can cause a huge spike in your CPU load or IO.
Here is a simple one-line command (if the MySQL server runs on Linux) which will kill all of the current sleeping MySQL processes:
for i in `mysql -e "show processlist" | awk '/Sleep/ {print $1}'` ; do mysql -e "KILL $i;"; done
This is only a temporary fix; I strongly advise identifying and addressing the problem's root cause.
For instance, you may set the wait_timeout variable to the amount of time you want MySQL to hold idle connections open before closing them.
But if the issue still persists and you have to investigate the DB queries that cause the problem, there is another way: in a screen session you can run a while loop that continuously kills the sleeping queries (while there is output from show processlist, grep for Sleep, take the id column with awk, and kill it). If you are using MySQL replication between different hosts, this will help the replicas catch up, so in show slave status\G the Seconds_Behind_Master will start to drop.
Of course, you should investigate the root cause again.
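Before killing anything, it helps to see who owns the sleepers; a sketch against information_schema (available since MySQL 5.1):

-- Count sleeping connections per user and client host to find the leaker
SELECT user,
       SUBSTRING_INDEX(host, ':', 1) AS client,
       COUNT(*) AS sleepers,
       MAX(time) AS oldest_seconds
FROM information_schema.processlist
WHERE command = 'Sleep'
GROUP BY user, client
ORDER BY sleepers DESC;

The user/host pair with the most (and oldest) sleepers is usually the application whose pool or cleanup is misbehaving.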

Logarithmically increasing execution time for each loop of a ForEach control

First, some background: I'm an SSIS newbie and I've just completed my second data-import project.
The package is very simple and consists of a dataflow that imports a tab-separated customer values file of ~30,000 records into an ADO recordset variable, which in turn is used to power a ForEach Loop Container that executes a piece of SQL, passing in values from each row of the recordset.
The import of the first ~21,000 records took 59 hours before it failed! The last ~9,000 took a further 8 hours. Yes, 67 hours in total!
The SQL consists of a check to determine if the record already exists, a call to a procedure to generate a new password, and a final call to another procedure to insert the customer data into our system. The final procedure returns a recordset, but I'm not interested in the result and so I have just ignored it. I don't know whether SSIS discards the recordset or not. I am aware that this is the slowest possible way of getting the data into the system, but I did not expect it to be this slow, nor to fail two-thirds of the way through, and again whilst processing the last ~9,000.
When I tested a ~3,000-record subset on my local machine, the Execute Package Utility reported that each insert was taking approximately 1 second. A bit of quick math suggested that the total import would take around 8 hours to run. That seemed like a long time, which I had expected given all that I had read about SSIS and RBAR execution. I figured that the final import would be a bit quicker as the server is considerably more powerful. I am accessing the server remotely, but I wouldn't have expected this to be an issue, as I have performed imports in the past using bespoke C# console applications with simple ADO connections, and nothing ran anywhere near as slowly.
Initially the destination table wasn't optimised for the existence check, and I thought this could be the cause of the slow performance. I added an appropriate index to the table to change the test from a scan to a seek, expecting that this would get rid of the performance issue. Bizarrely, it seemed to have no visible effect!
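For reference, the index for that existence check would be shaped roughly like this (column names taken from the SQL further down; the exact definition on the real system may have differed):

-- Index matching the WHERE clause of the existence check (scan -> seek)
CREATE NONCLUSTERED INDEX IX_tblSubscriber_Email_Product_Source
    ON [dbo].[tblSubscriber] (strSubscriberEmail, ProductId, strTrialSource);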
The reason we use the sproc to insert the data into our system is for consistency. It represents the same route that the data takes if it is inserted into our system via our web front-end. The insertion of the data also causes a number of triggers to fire and update various other entities in the database.
What's been occurring during this import, though, and what has me scratching my head, is that the execution time for the SQL batch, as reported by the output of the Execute Package Utility, has been logarithmically increasing during the run. What starts out as a sub-one-second execution time ends up, over the course of the import, at greater than 20 seconds, and eventually the import package simply ground to a complete halt.
I've searched all over the web multiple times (thanks, Google) as well as Stack Overflow, and haven't found anything that describes these symptoms.
Hopefully someone out there has some clues.
Thanks
In response to ErikE: (I couldn’t fit this into a comment, so I've added it here.)
Erik, as per your request I ran the profiler over the database whilst putting the three-thousand-item test file through its paces.
I wasn't able to easily figure out how to get SSIS to insert a marker into the SQL that would be visible to the profiler, so I just ran the profiler for the whole run. I know there will be some overhead associated with this, but, theoretically, it should be more or less consistent over the run.
The duration on a per item basis remains pretty constant over the whole run.
Below is cropped output from the trace. In this run the first 800 rows overlapped previously entered data, so the system was effectively doing no work (yay, indexes!). As soon as the index stopped being useful and the system was actually inserting new data, you can see the times jump accordingly, but they don't seem to change much, if at all, between the first and last elements, with the number of reads being the largest item.
------------------------------------------
| Item | CPU | Reads | Writes | Duration |
------------------------------------------
| 0001 |   0 |    29 |      0 |        0 |
| 0002 |   0 |    32 |      0 |        0 |
| 0003 |   0 |    27 |      0 |        0 |
| ...  |     |       |        |          |
| 0799 |   0 |    32 |      0 |        0 |
| 0800 |  78 |  4073 |     40 |      124 |
| 0801 |  32 |  2122 |      4 |       54 |
| 0802 |  46 |  2128 |      8 |      174 |
| 0803 |  46 |  2128 |      8 |      174 |
| 0804 |  47 |  2131 |     15 |      242 |
| ...  |     |       |        |          |
| 1400 |  16 |  2156 |      1 |       54 |
| 1401 |  16 |  2167 |      3 |       72 |
| 1402 |  16 |  2153 |      4 |       84 |
| ...  |     |       |        |          |
| 2997 |  31 |  2193 |      2 |       72 |
| 2998 |  31 |  2195 |      2 |       48 |
| 2999 |  31 |  2184 |      2 |       35 |
| 3000 |  31 |  2180 |      2 |       53 |
------------------------------------------
Overnight I also put the system through a full re-run of the import with the profiler switched on, to see how things fared. It managed to get through one third of the import in 15.5 hours on my local machine. I exported the trace data to a SQL table so that I could get some statistics from it. Looking at the data in the trace, the delta between inserts increases by ~1 second per thousand records processed, so by the time it's reached record 10,000 it's taking 10 seconds per record to perform the insert. The actual code being executed for each record is below. Don't bother critiquing the procedure; the SQL was written by a self-taught developer who was originally our receptionist, long before anyone with actual developer education was employed by the company. We are well aware that it's not good. The main thing is that I believe it should execute at a constant rate, and it very obviously doesn't.
if not exists
(
    select 1
    from [dbo].[tblSubscriber]
    where strSubscriberEmail = @EmailAddress
      and ProductId = @ProductId
      and strTrialSource = @Source
)
begin
    declare @ThePassword varchar(20)
    select @ThePassword = [dbo].[DefaultPassword]()
    exec [dbo].[MemberLookupTransitionCDS5]
        @ProductId
        ,@EmailAddress
        ,@ThePassword
        ,NULL        --IP Address
        ,NULL        --BrowserName
        ,NULL        --BrowserVersion
        ,2           --blnUpdate
        ,@FirstName  --strFirstName
        ,@Surname    --strLastName
        ,@Source     --strTrialSource
        ,@Comments   --strTrialComments
        ,@Phone      --strSubscriberPhone
        ,@TrialType  --intTrialType
        ,NULL        --Redundant MonitorGroupID
        ,NULL        --strTrialFirstPage
        ,NULL        --strTrialRefererUrl
        ,30          --intTrialSubscriptionDaysLength
        ,0           --SourceCategoryId
end
GO
Results of determining the difference in time between each execution (cropped for brevity).
----------------------
| Row   | Delta (ms) |
----------------------
|   500 |        510 |
|  1000 |        976 |
|  1500 |       1436 |
|  2000 |       1916 |
|  2500 |       2336 |
|  3000 |       2816 |
|  3500 |       3263 |
|  4000 |       3726 |
|  4500 |       4163 |
|  5000 |       4633 |
|  5500 |       5223 |
|  6000 |       5563 |
|  6500 |       6053 |
|  7000 |       6510 |
|  7500 |       6926 |
|  8000 |       7393 |
|  8500 |       7846 |
|  9000 |       8503 |
|  9500 |       8820 |
| 10000 |       9296 |
| 10500 |       9750 |
----------------------
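For reference, deltas like those above can be computed from the exported trace table with a self-join on row number (the table and column names here are assumptions about my export, not something the profiler creates for you):

-- Hypothetical export: dbo.TraceData(RowNum int, StartTime datetime)
SELECT cur.RowNum,
       DATEDIFF(ms, prev.StartTime, cur.StartTime) AS DeltaMs
FROM dbo.TraceData AS cur
JOIN dbo.TraceData AS prev
  ON prev.RowNum = cur.RowNum - 1
ORDER BY cur.RowNum;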
Let's take some steps:
Advice: Isolate whether it is a server issue or a client one. Run a trace and see how long the first insert takes compared to the 3000th. Include in the SQL statements some difference on the 1st and 3000th iterations that can be filtered for in the trace, so it is not capturing the other events. Try to avoid statement completion events -- use batch or RPC completion.
Response: The recorded CPU, reads, and duration from your profiler trace are not increasing, but the actual elapsed/effective insert time is.
Advice: Assuming that the above pattern holds true through the 10,000th insert (please advise if different), my best guess is that some blocking is occurring, maybe something like a constraint validation that is doing a nested loop join, which would scale logarithmically with the number of rows in the table just as you are seeing. Would you please do the following:
Provide the full execution plan of the INSERT statement using SET SHOWPLAN_TEXT ON.
Run a trace on the Blocked Process Report event and report on anything interesting.
Read Eliminating Deadlocks Caused by Foreign Keys with Large Transactions and let me know if this might be the cause or if I am barking up the wrong tree.
If none of this makes progress on the problem, simply update your question with any new information and comment here, and I'll continue to do my best to help.
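For the SHOWPLAN_TEXT step, a minimal sketch (SET SHOWPLAN_TEXT must be the only statement in its batch; while it is ON, statements return their estimated plan as text instead of executing):

SET SHOWPLAN_TEXT ON;
GO
-- Substitute the real INSERT or the sproc call here; this SELECT is a placeholder
SELECT 1
FROM [dbo].[tblSubscriber]
WHERE strSubscriberEmail = 'someone@example.com';
GO
SET SHOWPLAN_TEXT OFF;
GO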

How to delete sleeping processes in MySQL

I found that my MySQL server has many connections that are sleeping. I want to delete them all.
How can I configure my MySQL server to delete or dispose of connections that are sleeping rather than currently processing a query?
Is it possible to do this in MySQL? Tell me how I can achieve the following:
a connection allows a data reader to be opened only once, and the connection [process] is destroyed after the response to the query has been given.
If you want to do it manually, you can do it like this:
Log in to MySQL as admin:
mysql -uroot -ppassword
And then run the command:
mysql> show processlist;
You will get something like the below:
+----+-------------+--------------------+----------+---------+------+-------+------------------+
| Id | User        | Host               | db       | Command | Time | State | Info             |
+----+-------------+--------------------+----------+---------+------+-------+------------------+
| 49 | application | 192.168.44.1:51718 | XXXXXXXX | Sleep   |  183 |       | NULL             |
| 55 | application | 192.168.44.1:51769 | XXXXXXXX | Sleep   |  148 |       | NULL             |
| 56 | application | 192.168.44.1:51770 | XXXXXXXX | Sleep   |  148 |       | NULL             |
| 57 | application | 192.168.44.1:51771 | XXXXXXXX | Sleep   |  148 |       | NULL             |
| 58 | application | 192.168.44.1:51968 | XXXXXXXX | Sleep   |   11 |       | NULL             |
| 59 | root        | localhost          | NULL     | Query   |    0 | NULL  | show processlist |
+----+-------------+--------------------+----------+---------+------+-------+------------------+
You will see the complete details of the different connections. Now you can kill the sleeping connection as below:
mysql> kill 52;
Query OK, 0 rows affected (0.00 sec)
Why would you want to delete a sleeping thread? MySQL creates threads for connection requests, and when the client disconnects the thread is put back into the cache and waits for another connection.
This avoids a lot of the overhead of creating threads on demand, and it's nothing to worry about. A sleeping thread uses about 256KB of memory.
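You can watch the thread cache at work with a couple of status queries (a sketch; these variable and counter names are standard MySQL):

-- How many threads the server will keep cached
SHOW VARIABLES LIKE 'thread_cache_size';
-- Threads_cached / Threads_connected / Threads_created / Threads_running
SHOW GLOBAL STATUS LIKE 'Threads_%';

If Threads_created keeps climbing while Threads_cached stays near zero, the cache is too small; otherwise sleeping threads really are cheap.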
You can find all working processes by executing:
show processlist;
and you will find the sleeping processes. If you want to terminate one, note its process id and execute:
kill <processid>;
But actually you can set timeout variables in my.cnf:
wait_timeout=15
connect_timeout=10
interactive_timeout=100
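If your account has the SUPER privilege, the same values can also be applied at runtime, without editing my.cnf or restarting (they take effect for new connections):

SET GLOBAL wait_timeout = 15;
SET GLOBAL connect_timeout = 10;
SET GLOBAL interactive_timeout = 100;
-- verify:
SHOW GLOBAL VARIABLES LIKE '%timeout';

As the next answer notes, without that privilege editing the config file is the only option.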
For me, with a MySQL server on Windows, I updated the file (because I could not set the variables with a SQL request, due to privileges):
D:\MySQL\mysql-5.6.48-winx64\my.ini
adding the lines:
wait_timeout=61
interactive_timeout=61
Then I restarted the service and confirmed the new values with:
SHOW VARIABLES LIKE '%_timeout';
==> I did a connection test, and after 1 minute all 10+ sleeping connections had disappeared!