I have a reasonably large dataset of about 6,000,000 rows x 60 columns that I am trying to insert into a database. I am chunking the rows and inserting them 10,000 at a time into a MySQL database using a class I've written and pymysql. The problem is that I occasionally time out the server while writing, so I've modified my executemany call to reconnect on errors. This works fine when I lose the connection once, but if I lose it a second time, I get a pymysql.err.InternalError stating that the lock wait timeout was exceeded. I was wondering how I could modify the following code to catch that and destroy the transaction completely before attempting again.
I've tried calling rollback() on the connection, but this causes another InternalError if the connection has been destroyed, because there is no cursor anymore.
Any help would be greatly appreciated. (I also don't understand why I am getting the timeouts to begin with, but the data is relatively large.)
import pymysql
import time


class Database:
    def __init__(self, **creds):
        self.conn = None
        self.user = creds['user']
        self.password = creds['password']
        self.host = creds['host']
        self.port = creds['port']
        self.database = creds['database']

    def connect(self, type=None):
        self.conn = pymysql.connect(
            host=self.host,
            user=self.user,
            password=self.password,
            port=self.port,
            database=self.database
        )

    def executemany(self, sql, data):
        while True:
            try:
                with self.conn.cursor() as cursor:
                    cursor.executemany(sql, data)
                    self.conn.commit()
                break
            except pymysql.err.OperationalError:
                print('Connection error. Reconnecting to database.')
                time.sleep(2)
                self.connect()
                continue
        return cursor
and I am calling it like this:
for index, chunk in enumerate(dataframe_chunker(df), start=1):
    print(f"Writing chunk\t{index}\t{timer():.2f}")
    db.executemany(insert_query, chunk.values.tolist())
Take a look at what MySQL is doing. The lock wait timeouts happen because the inserts cannot be done until something else finishes, and that something else could be your own code.
SELECT * FROM `information_schema`.`innodb_locks`;
Will show the current locks.
select * from information_schema.innodb_trx where trx_id = [lock_trx_id];
Will show the involved transactions
SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST where id = [trx_mysql_thread_id];
Will show the involved connection and may show the query whose lock results in the lock wait timeout. Maybe there is an uncommitted transaction.
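A hedged Python sketch that chains those three lookups with pymysql (the credentials are placeholders, and the table and column names follow MySQL 5.7's information_schema; in 8.0 the lock tables moved to performance_schema):
import pymysql

diag = pymysql.connect(host='localhost', user='root', password='...',
                       database='information_schema',
                       cursorclass=pymysql.cursors.DictCursor)
with diag.cursor() as cur:
    cur.execute("SELECT * FROM innodb_locks")
    for lock in cur.fetchall():
        cur.execute("SELECT * FROM innodb_trx WHERE trx_id = %s", (lock['lock_trx_id'],))
        trx = cur.fetchone()
        if trx:
            cur.execute("SELECT * FROM PROCESSLIST WHERE id = %s", (trx['trx_mysql_thread_id'],))
            print(cur.fetchone())  # the connection (and possibly the query) holding the lock
diag.close()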
It is likely your own code, because of the interaction with your executemany function, which catches exceptions and reconnects to the database. What about the prior connection? Does the lock wait timeout kill the prior connection? That while True is going to be trouble.
For the code calling executemany on the db connection, be more defensive on the try/except with something like:
def executemany(self, sql, data):
    while True:
        try:
            with self.conn.cursor() as cursor:
                cursor.executemany(sql, data)
                self.conn.commit()
            break
        except pymysql.err.OperationalError:
            print('Connection error. Reconnecting to database.')
            if self.conn.open:       # pymysql exposes .open rather than is_connected()
                self.conn.close()    # drop the old connection so its locks are released
            time.sleep(2)
            self.connect()
But the real solution here will be to not induce lock wait timeouts in the first place if there are no other database clients.
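If you do also want the wrapper to recover from the lock wait timeout itself, here is a minimal sketch, not the poster's original code, assuming the Database class from the question; it rolls back and discards the broken transaction before reconnecting and retrying (the retry loop is unbounded, as in the original):
def executemany(self, sql, data):
    while True:
        try:
            with self.conn.cursor() as cursor:
                cursor.executemany(sql, data)
                self.conn.commit()
            return
        except (pymysql.err.OperationalError, pymysql.err.InternalError) as exc:
            # The lock wait timeout surfaces as InternalError (or OperationalError,
            # depending on the pymysql version); lost connections as OperationalError.
            print(f'Insert failed ({exc}). Resetting the connection and retrying.')
            try:
                self.conn.rollback()   # abandon the half-finished transaction so its locks are released
            except pymysql.err.Error:
                pass                   # the connection is already gone; nothing to roll back
            try:
                self.conn.close()
            except pymysql.err.Error:
                pass
            time.sleep(2)
            self.connect()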
Related
I make multiple requests to MySQLdb, and for specific users I get this error:
MySQLdb._exceptions.OperationalError: (2013, 'Lost connection to MySQL server during query')
The error occurs on this line:
cursor.execute('Select * from process WHERE email=%s ORDER BY timestamp DESC LIMIT 20', ("tommy345a@outlook.com",))
But when I run the same query for a different user, there is no problem:
cursor.execute('Select * from process WHERE email=%s ORDER BY timestamp DESC LIMIT 20', ("cafahxxxx@gmail.com",))
The page loads correctly.
More details on the MySQL setup are given below.
# modules used
from flask_mysqldb import MySQL
import MySQLdb.cursors

# setup
app.config['MYSQL_HOST'] = 'localhost'
app.config['MYSQL_USER'] = 'myusername'
app.config['MYSQL_PASSWORD'] = 'mypassword'
app.config['MYSQL_DB'] = 'my_db'
mysql = MySQL(app)

# extract of requests made to db
@app.route('/myhome', methods=['GET', 'POST'])
def home_page():
    email = "tommy345a@outlook.com"
    cursor = mysql.connection.cursor(MySQLdb.cursors.DictCursor)
    cursor.execute('SELECT * FROM mail WHERE email = %s', (email,))
    completed = cursor.fetchone()
    cursor.execute('SELECT sum(transaction) FROM transactions WHERE email=%s', (email,))
    points = cursor.fetchone()
    cursor.execute('Select * from process WHERE email=%s ORDER BY timestamp DESC LIMIT 20', (email,))
    transactions = cursor.fetchall()
I will post the process table from MySQL here so you can figure out why the issue is happening for the red-marked user and not for the green-marked user.
Also, this might be an issue of packet size, but I hadn't had any issue until yesterday (there were more than 11 entries under the user tommy; I deleted 6 rows and it is still not working). If you think it is due to packet size, please tell me how I can solve it without increasing the packet size, because I am on shared hosting and the hosting provider will not let me increase the packet size limit.
Structure
There are multiple potential reasons for "server has gone away" and only one of them is due to data being too large for your max allowed packet size.
https://dev.mysql.com/doc/refman/8.0/en/gone-away.html
If it is caused by the data being too large, keep in mind that each row of result is its own packet. It's not the whole result set that must fit in a packet. If you had 11 rows and you deleted 6 but still get the error, then the row that caused the problem still exists.
You could:
Remove the row that is too large. You might want to change the column data types of your table so that a given row cannot be too large. You showed a screenshot of some data, but I have no idea what data types you use. Hint: use SHOW CREATE TABLE process.
Change the max_allowed_packet as a session variable, since you don't have access to change the global variable. Also keep in mind the client must also change its max allowed packet to match. Read https://dev.mysql.com/doc/refman/8.0/en/packet-too-large.html
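For the first option, a hedged diagnostic sketch that reuses the cursor from the Flask code above to find the oversized row; id and large_text are placeholder column names, and the real ones come from SHOW CREATE TABLE process:
# Placeholder column names; substitute the real ones from SHOW CREATE TABLE process.
cursor.execute(
    "SELECT id, LENGTH(large_text) AS row_bytes "
    "FROM process WHERE email = %s ORDER BY row_bytes DESC LIMIT 5",
    ("tommy345a@outlook.com",))
print(cursor.fetchall())

# Compare the largest row against the server's current packet limit.
cursor.execute("SHOW VARIABLES LIKE 'max_allowed_packet'")
print(cursor.fetchone())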
Here is one part of my Python code for inserting data retrieved from SharePoint into a MySQL database; it runs every 10 minutes.
cnx = mysql.connector.connect(user='user', password='pwd', host='CP-MSQL03', port='3306', database='datacenter')
cnx.autocommit = True
MySQL_cursor = cnx.cursor()

r = requests.get(url, headers=headers, params=params)
delete_data = []
for item in r.json()['d']['results']:
    id = item['Id']
    lot_No = item['LOT_No']
    start = datetime.strptime(item['START'], '%Y-%m-%dT%H:%M:%SZ')
    lbs = item['LBS']
    line_NAV = item['Line_No']['NAV_x0020_HS_x0020_LINE']
    rpo_No = item['RPO_No']['RPO_No']
    item_NO = item['RPO_No']['Item_No']
    mrt_No = item['RPO_No']['MRT_No']
    SQL_Insert = (
        "INSERT INTO datacenter.mrt_consumption_archive (Line_No, RPO_No, Item_No, MRT_No, LOT_No, Start_Time, LBS) "
        "VALUES('%s', '%s', '%s', '%s', '%s', '%s', %s);" % (
            line_NAV, rpo_No, item_NO, mrt_No, lot_No, start, lbs))
    MySQL_cursor.execute(SQL_Insert)
    delete_data.append(id)
And here is the error I got after it ran successfully for a few hours.
My question is, why do I get this error? Is it a firewall issue? timeout setting issue? How can I troubleshoot it?
And why do I keep getting the same error on every retry after it fails the first time?
Connections drop, it happens. A firewall, NAT-enabled router, etc. may be making it happen more often than it should, but it's still not something you want your program to crash from.
So, in general, before you run your query, you have to test the connection and catch any connection exceptions; when one is caught, restart the connection. Luckily this is a familiar concept called pooling, and it's already available from the connector.
Excerpted from https://dev.mysql.com/doc/connector-python/en/connector-python-connection-pooling.html :
To create a connection pool implicitly: Open a connection and specify one or more pool-related arguments (pool_name, pool_size). For example:
dbconfig = {
    "database": "test",
    "user": "joe"
}

cnx = mysql.connector.connect(pool_name="mypool",
                              pool_size=3,
                              **dbconfig)
If you just want to look at it from the perspective of keeping the connections open for some reason, you could also set a short duration keepalive (which may be all the workaround you need, if the problem is that an unreliable network device is purging your connections from tables in its memory). If you can get your network problem fixed, that is a better route than customizing connection settings.
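A minimal sketch of what the 10-minute job could look like with the pooled connection, assuming mysql.connector; the pool name and run_job wrapper are illustrative, and ping(reconnect=True) re-establishes a connection that a firewall or NAT device has silently dropped:
import mysql.connector

dbconfig = {"user": "user", "password": "pwd", "host": "CP-MSQL03",
            "port": 3306, "database": "datacenter"}

def run_job():
    # Draws a connection from the pool (created implicitly on the first call).
    cnx = mysql.connector.connect(pool_name="sharepoint_pool", pool_size=3, **dbconfig)
    try:
        # Transparently reconnect if the idle connection was dropped since the last run.
        cnx.ping(reconnect=True, attempts=3, delay=2)
        cur = cnx.cursor()
        cur.execute("SELECT 1")   # the real INSERTs from the question go here
        cur.fetchall()
        cur.close()
    finally:
        cnx.close()               # returns the connection to the pool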
If your script ran successfully for a few hours, I think it could be a parameter of your database configuration. Review your configuration with this statement:
SHOW GLOBAL VARIABLES WHERE variable_name LIKE '%time%';
Maybe it's too many connections, or maybe a session timeout, but I'm pretty sure it's a problem with your MySQL server.
I got the same ConnectionAbortedError and the resulting OperationalError while attempting to write a dataframe to a MySQL table.
df.to_sql('table_name', db_conn, if_exists='append', index=False)
Technically, the above operation is similar to yours:
SQL_Insert = (
    "INSERT INTO datacenter.mrt_consumption_archive (Line_No, RPO_No, Item_No, MRT_No, LOT_No, Start_Time, LBS) "
    "VALUES('%s', '%s', '%s', '%s', '%s', '%s', %s);" % (
        line_NAV, rpo_No, item_NO, mrt_No, lot_No, start, lbs))
MySQL_cursor.execute(SQL_Insert)
MySQL Documentation suggests increasing the connect_timeout might help:
Increasing the connect_timeout value might help if clients frequently encounter errors of the form Lost connection to MySQL server at 'XXX', system error: errno.
That did not work for me, though. I resolved the error by specifying the chunksize parameter:
df.to_sql('table_name', db_conn, if_exists='append', index=False, chunksize=2000)
Try:
increasing connect_timeout
specifying and increasing pool_size in your database connection.
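For reference, a hedged sketch of both suggestions applied to a SQLAlchemy engine passed to to_sql; the URL, driver, and credentials are placeholders:
from sqlalchemy import create_engine

# Placeholder connection URL; connect_timeout is passed through to the DBAPI driver.
engine = create_engine(
    "mysql+pymysql://user:pwd@CP-MSQL03:3306/datacenter",
    connect_args={"connect_timeout": 60},
    pool_size=5,
    pool_recycle=3600,   # recycle idle connections before the server drops them
)

df.to_sql("table_name", engine, if_exists="append", index=False, chunksize=2000)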
I have a large query to execute through SQLAlchemy which returns approximately 2.5 million rows. It's connecting to a MySQL database. When I do:
transactions = Transaction.query.all()
it eventually times out after around ten minutes and gives this error: sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (2013, 'Lost connection to MySQL server during query')
I've tried setting different parameters when doing create_engine like:
create_engine(connect_args={'connect_timeout': 30})
What do I need to change so the query will not timeout?
I would also be fine if there is a way to paginate the results and go through them that way.
Solved by pagination:
page_size = 10000  # get x number of items at a time
step = 0
while True:
    start, stop = page_size * step, page_size * (step + 1)
    transactions = sql_session.query(Transaction).slice(start, stop).all()
    if transactions is None:
        break
    for t in transactions:
        f.write(str(t))
        f.write('\n')
    if len(transactions) < page_size:
        break
    step += 1
f.close()
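A hedged alternative sketch, in case re-running the query for every slice gets slow: Query.yield_per() streams the rows in batches, and with the MySQL drivers it sets the stream_results option so a server-side cursor is used (exact behaviour depends on the SQLAlchemy version and dialect):
# Stream rows in batches of 10,000 instead of paging with slice().
for t in sql_session.query(Transaction).yield_per(10000):
    f.write(str(t))
    f.write('\n')
f.close()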
I'm connecting to a MySQL database through the Matlab Database Toolbox in order to run the same query over and over again within 2 nested for loops. After each iteration I get this warning:
Warning: com.mathworks.toolbox.database.databaseConnect#26960369 is not serializable
In Import_Matrices_DOandT_julaugsept_inflow_nomettsed at 476
Warning: com.mysql.jdbc.Connection#6e544a45 is not serializable
In Import_Matrices_DOandT_julaugsept_inflow_nomettsed at 476
Warning: com.mathworks.toolbox.database.databaseConnect#26960369 not serializable
In Import_Matrices_DOandT_julaugsept_inflow_nomettsed at 476
Warning: com.mysql.jdbc.Connection#6e544a45 is not serializable
In Import_Matrices_DOandT_julaugsept_inflow_nomettsed at 476
My code is basically structured like this:
%Server
host =
user =
password =
dbName =

%# JDBC parameters
jdbcString = sprintf('jdbc:mysql://%s/%s', host, dbName);
jdbcDriver = 'com.mysql.jdbc.Driver';

%# Create the database connection object
conn = database(dbName, user, password, jdbcDriver, jdbcString);
setdbprefs('DataReturnFormat', 'numeric');

%Loop
for SegmentNum=3:41;
    for tl=1:15;
        tic;
        sqlquery=['giant string'];
        results = fetch(conn, sqlquery);
        (some code here that saves the results into a few variables)
        save('inflow.mat');
    end
end
time = toc
close(conn);
clear conn
Eventually, after some iterations the code will crash with this error:
Error using database/fetch (line 37)
Query execution was interrupted
Error in Import_Matrices_DOandT_julaugsept_inflow_nomettsed (line
466)
results = fetch(conn, sqlquery);
Last night it errored after 25 iterations. I have about 600 iterations total I need to do, and I don't want to have to keep checking back on it every 25. I've heard there can be memory issues with database connection objects...is there a way to keep my code running?
Let's take this one step at a time.
Warning: com.mathworks.toolbox.database.databaseConnect#26960369 is not serializable
This comes from this line
save('inflow.mat');
You are trying to save the database connection. That doesn't work. Try specifying only the variables you wish to save, and it should work better.
There are a couple of tricks to excluding the values, but honestly, I suggest you just find the most important variables you wish to save, and save those. But if you wish, you can piece together a solution from this page.
save inflow.mat a b c d e
Try wrapping the query in a try/catch block. Whenever you catch an error, reset the connection to the database, which should free up the object.
nQuery = 100;
while(nQuery>0)
    try
        query_the_database();
        nQuery = nQuery - 1;
    catch
        reset_database_connection();
    end
end
The main reason for this is that database connection objects use TCP/IP ports, and multiple processes cannot access the same port. That is why database connection objects are not serializable: ports cannot be serialized.
The workaround is to create the connection within the for loop.
This is sample code I'd like to run:
for i in range(1,2000):
    db = create_engine('mysql://root@localhost/test_database')
    conn = db.connect()
    # some simple data operations
    conn.close()
    db.dispose()
Is there a way of running this without getting "Too many connections" errors from MySQL?
I already know I can handle the connection otherwise or have a connection pool. I'd just like to understand how to properly close a connection from sqlalchemy.
Here's how to write that code correctly:
db = create_engine('mysql://root@localhost/test_database')

for i in range(1,2000):
    conn = db.connect()
    # some simple data operations
    conn.close()

db.dispose()
That is, the Engine is a factory for connections as well as a pool of connections, not the connection itself. When you say conn.close(), the connection is returned to the connection pool within the Engine, not actually closed.
If you do want the connection to be actually closed, that is, not pooled, disable pooling via NullPool:
from sqlalchemy.pool import NullPool
db = create_engine('mysql://root#localhost/test_database', poolclass=NullPool)
With the above Engine configuration, each call to conn.close() will close the underlying DBAPI connection.
If OTOH you actually want to connect to different databases on each call, that is, your hardcoded "localhost/test_database" is just an example and you actually have lots of different databases, then the approach using dispose() is fine; it will close out every connection that is not checked out from the pool.
In all of the above cases, the important thing is that the Connection object is closed via close(). If you're using any kind of "connectionless" execution, that is engine.execute() or statement.execute(), the ResultProxy object returned from that execute call should be fully read, or otherwise explicitly closed via close(). A Connection or ResultProxy that's still open will prohibit the NullPool or dispose() approaches from closing every last connection.
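For example, a minimal sketch of that caveat in the 1.x "connectionless" style (the query is illustrative):
# The ResultProxy holds a pooled DBAPI connection until it is fully read or closed.
result = db.execute("SELECT 1")
rows = result.fetchall()   # fully consume the result ...
result.close()             # ... or close it explicitly
db.dispose()               # now every underlying DBAPI connection can really be closed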
I tried to figure out a solution for disconnecting from the database for an unrelated problem (you must disconnect before forking).
You need to invalidate the connection from the connection pool too.
In your example:
for i in range(1,2000):
    db = create_engine('mysql://root@localhost/test_database')
    conn = db.connect()
    # some simple data operations
    # session.close() if needed
    conn.invalidate()
    db.dispose()
I use this one:
from sqlalchemy import create_engine, text

engine = create_engine('...')
with engine.connect() as conn:
    conn.execute(text(f"CREATE SCHEMA IF NOT EXISTS..."))
engine.dispose()
In my case this always works and I am able to close!
So using invalidate() before close() does the trick. Otherwise close() alone doesn't do it.
conn = engine.raw_connection()
conn.get_warnings = True
cur = conn.cursor()  # obtain a cursor from the raw DBAPI connection

curSql = xx_tmpsql
myresults = cur.execute(curSql, multi=True)
print("Warnings: #####")
print(cur.fetchwarnings())
for curresult in myresults:
    print(curresult)
    if curresult.with_rows:
        print(curresult.column_names)
        print(curresult.fetchall())
    else:
        print("no rows returned")
cur.close()
conn.invalidate()
conn.close()
engine.dispose()