I was running into a memory leak using MathWorks' Database Toolbox with MySQL. When returning large amounts of data, I would eventually get an out-of-memory error from exhausting the Java heap space.
I had seen this before when using the com.mysql.jdbc.JDBC4PreparedStatement and com.mysql.jdbc.JDBC4ResultSet driver calls myself. When you do that, you have to be sure to call the close() method on each of those objects, or else you have the same problem.
I remembered this today when the MathWorks cursor object was hitting heap problems after just exec'ing the same query and fetching the results a few thousand times. It turns out you can see the underlying MySQL JDBC objects inside the MathWorks cursor and result-set objects:
curs = exec(dbConn, sqlStr)
rs = fetch(curs)
curs =
Attributes: []
Data: 0
DatabaseObject: [1x1 database]
RowLimit: 0
SQLQuery: 'xxxxxomitxxxx'
Message: []
Type: 'Database Cursor Object'
ResultSet: [1x1 com.mysql.jdbc.JDBC4ResultSet]
Cursor: [1x1 com.mathworks.toolbox.database.sqlExec]
Statement: [1x1 com.mysql.jdbc.StatementImpl]
Fetch: 0
rs =
Attributes: []
Data: {2497x8 cell}
DatabaseObject: [1x1 database]
RowLimit: 0
SQLQuery: 'xxxxxomitxxxx'
Message: []
Type: 'Database Cursor Object'
ResultSet: [1x1 com.mysql.jdbc.JDBC4ResultSet]
Cursor: [1x1 com.mathworks.toolbox.database.sqlExec]
Statement: [1x1 com.mysql.jdbc.StatementImpl]
Fetch: [1x1 com.mathworks.toolbox.database.fetchTheData]
The OP wrote:
Well, by closing the MySQL JDBC objects yourself after every fetch, you can fix the problem without having to rework all of the MathWorks-based code you've used everywhere:
rs.Statement.close()
rs.ResultSet.close()
I've tried using SQLAlchemy as well as raw mysql.connector here, but committing an insert into a SQL database from FastAPI takes forever.
I wanted to make sure it wasn't just my DB, so I tried it in a local script and it ran in a couple of seconds.
How can I get this query to perform well from FastAPI?
Thanks!
from typing import List

from fastapi import Depends

# router, models, pydanticModels, and get_raw_db are defined elsewhere in the app.

@router.post('/')
def postStockData(data: List[pydanticModels.StockPrices], raw_db=Depends(get_raw_db)):
    cursor = raw_db[0]
    cnxn = raw_db[1]
    # i = 0
    # for row in data:
    #     if i % 10 == 0:
    #         print(i)
    #         db.flush()
    #     i += 1
    #     db_pricing = models.StockPricing(**row.dict())
    #     db.add(db_pricing)
    # db.commit()
    SQL = "INSERT INTO " + models.StockPricing.__tablename__ + " VALUES (%s, %s, %s)"
    print(SQL)
    valsToInsert = []
    for row in data:
        rowD = row.dict()
        valsToInsert.append((rowD['date'], rowD['symbol'], rowD['value']))
    cursor.executemany(SQL, valsToInsert)
    cnxn.commit()
    return {'message': 'Pricing Updated'}
You are killing performance because you are using an "RBAR" (row-by-agonizing-row) approach, which is not suitable in an RDBMS...
You use a loop and execute an SQL INSERT of only one row at a time...
When the RDBMS is handed a query, the sequence of execution is the following:
- authenticate the user who issued the query
- parse the string to verify the syntax
- look up metadata (tables, columns, datatypes...)
- check which operations on those tables and columns this user is granted
- create an execution plan to sequence all the operations needed for the query
- set up locks for concurrency
- execute the query (inserting only 1 row)
- send back an error or an OK message
Every step consumes time... and you are running all these steps 100,000 times because of your loop.
Usually, when inserting many rows into a table, there is just one query to run, even if the INSERT concerns 10,000,000,000 rows from a file!
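As a minimal sketch of the one-round-trip idea (the table name, column list, and connection details below are made up for illustration), with mysql.connector you can hand the whole batch to executemany(), which batches a plain INSERT ... VALUES into a single multi-row INSERT:

import mysql.connector

# Placeholder connection details.
cnxn = mysql.connector.connect(host='localhost', user='user',
                               password='secret', database='mydb')
cursor = cnxn.cursor()

rows = [('2021-01-04', 'AAPL', 129.41),
        ('2021-01-04', 'MSFT', 217.69)]

# One statement, one parse/plan/lock cycle, instead of one per row.
cursor.executemany(
    "INSERT INTO stock_pricing (date, symbol, value) VALUES (%s, %s, %s)",
    rows)
cnxn.commit()
cursor.close()
cnxn.close()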
Actually, I am trying to update one table from multiple processes via pymysql, with each process reading a CSV file split from one huge file in order to speed things up. But I get a "Lock wait timeout exceeded; try restarting transaction" exception when I run the script. After searching the posts on this site, I found one which mentioned using MySQL's built-in LOAD DATA INFILE, but gave no details on it. How can I do this with pymysql to reach my aim?
---------------------------first edit----------------------------------------
Here's the job method:
import codecs
import csv
import time

import pymysql

def importprogram(path, name):
    begin = time.time()
    print('begin to import program' + name + ' info.')
    # "c:\\sometest.csv"
    file = open(path, mode='rb')
    csvfile = csv.reader(codecs.iterdecode(file, 'utf-8'))
    connection = None
    try:
        connection = pymysql.connect(host='a host', user='someuser', password='somepsd',
                                     db='mydb', cursorclass=pymysql.cursors.DictCursor)
        count = 1
        with connection.cursor() as cursor:
            # note: building SQL via str.format is injection-prone; kept as in the original
            sql = '''update sometable set Acolumn='{guid}' where someid='{pid}';'''
            next(csvfile, None)  # skip the header row
            for line in csvfile:
                try:
                    count = count + 1
                    if ''.join(line).strip():
                        command = sql.format(guid=line[2], pid=line[1])
                        cursor.execute(command)
                    if count % 1000 == 0:
                        print('program' + name + ' cursor execute', count)
                except csv.Error:
                    print('program csv.Error:', count)
                    continue
                except IndexError:
                    print('program IndexError:', count)
                    continue
                except StopIteration:
                    break
    except Exception as e:
        print('program' + name, str(e))
    finally:
        if connection is not None:  # connect() may have failed
            connection.commit()
            connection.close()
        file.close()
        print('program' + name + ' info done. time cost:', time.time() - begin)
And the multi-processing method:
import multiprocessing as mp

def multiprocess():
    pool = mp.Pool(3)
    results = []
    paths = ['C:\\testfile01.csv', 'C:\\testfile02.csv', 'C:\\testfile03.csv']
    name = 1
    for path in paths:
        results.append(pool.apply_async(importprogram, args=(path, str(name))))
        name = name + 1
    print([result.get() for result in results])  # wait for the workers and collect their results
    pool.close()
    pool.join()
And the main method:
if __name__ == '__main__':
    multiprocess()
I am new to Python. Where does the code, or the approach itself, go wrong? Should I use just a single process to read and import the data?
Your issue is that you are exceeding the time allowed for a response to be fetched from the server, so the client is automatically timing out.
In my experience, adjust the wait timeout to something like 6000 seconds, combine the files into one CSV, and just leave the data to import. Also, I would recommend running the query directly in MySQL rather than from Python.
The way I usually import CSV data from Python into MySQL is through the INSERT ... VALUES ... method, and I only do so when some kind of manipulation of the data is required (e.g. inserting different rows into different tables).
I like your approach and understand your thinking, but in reality there is no need. The benefit of the INSERT ... VALUES ... method is that you won't run into any timeout issue.
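For reference, here is a minimal sketch of the LOAD DATA LOCAL INFILE route the question asks about, using pymysql; the host, credentials, file path, and table layout are placeholders, and local_infile must be enabled on both the client and the server:

import pymysql

# local_infile=True enables the client side; the server must also
# allow it (local_infile = 1 in its configuration).
connection = pymysql.connect(host='a host', user='someuser', password='somepsd',
                             db='mydb', local_infile=True)
try:
    with connection.cursor() as cursor:
        # One statement loads the whole file, avoiding a round trip
        # (and a parse/plan/lock cycle) per row.
        cursor.execute("""
            LOAD DATA LOCAL INFILE 'c:/sometest.csv'
            INTO TABLE sometable
            FIELDS TERMINATED BY ','
            LINES TERMINATED BY '\\n'
            IGNORE 1 LINES
        """)
    connection.commit()
finally:
    connection.close()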
Good day to you. I'm writing a cron job that will hopefully split a huge MySQL table across several threads and do some work on each piece. This is a minimal sample of what I have at the moment:
require 'mysql'
require 'parallel'

@db = Mysql.real_connect("localhost", "root", "", "database")
@threads = 10

Parallel.map(1..@threads, :in_processes => 8) do |i|
  begin
    @db.query("SELECT url FROM pages LIMIT 1 OFFSET #{i}")
  rescue Mysql::Error => e
    @db.reconnect()
    puts "Error code: #{e.errno}"
    puts "Error message: #{e.error}"
    puts "Error SQLSTATE: #{e.sqlstate}" if e.respond_to?("sqlstate")
  end
end
@db.close
The threads don't need to return anything; they get their share of the job and they do it. Only they don't. Either the connection to MySQL is lost during the query, or the connection doesn't exist ("MySQL server has gone away"?!), or I get "no _dump_data is defined for class Mysql::Result" followed by Parallel::DeadWorker.
How to do that right?
The map method expects a result; I don't need one, so I switched to each:
Parallel.each(1..@threads, :in_processes => 8) do |i|
This also solves the problem with MySQL: I just needed to open the connection inside the parallel process, which the each loop makes possible. Of course, the connection should also be closed inside the process, as sketched below.
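The same open-inside-the-worker pattern, shown in Python with multiprocessing and pymysql purely for illustration (the table, credentials, and offsets are made up):

import multiprocessing as mp

import pymysql

def work(i):
    # Each worker process opens, uses, and closes its own connection;
    # sharing one connection across processes is what breaks.
    conn = pymysql.connect(host='localhost', user='root', password='', db='database')
    try:
        with conn.cursor() as cursor:
            cursor.execute("SELECT url FROM pages LIMIT 1 OFFSET %s", (i,))
            return cursor.fetchone()
    finally:
        conn.close()

if __name__ == '__main__':
    with mp.Pool(8) as pool:
        pool.map(work, range(1, 11))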
I'd like to open a connection to a MySQL database and retrieve data with different queries. Do I need to close the connection every time I fetch data, or is there a better way to query multiple times and close the connection only at the end?
Currently I do this:
db = dbConnect(MySQL(), user='root', password='1234', dbname='my_db', host='localhost')
query1=dbSendQuery(db, "select * from table1")
data1 = fetch(query1, n=10000)
query2=dbSendQuery(db, "select * from table2") ##ERROR !
and I get the error message:
Error in mysqlExecStatement(conn, statement, ...) :
RS-DBI driver: (connection with pending rows, close resultSet before continuing)
Now, if I clear the result with dbClearResult(query1), I need to redo the connection (dbConnect...).
Is there a better, more efficient way to fetch everything instead of opening and closing the connection every time?
Try dbGetQuery(...) instead of using dbSendQuery(...) and fetch(), like this:
db = dbConnect(MySQL(), user='root', password='1234', dbname='my_db', host='localhost')
query1=dbGetQuery(db, "select * from table1")
query2=dbGetQuery(db, "select * from table2")
From the help page:
The function ‘dbGetQuery’ does all these in one operation (submits the statement, fetches all output records, and clears the result set).
First of all, I must say that I have a somewhat basic knowledge of SQL Server, and with that I'm trying to figure out how to resolve a deadlock.
I ran dbcc traceon (1204, -1), executed the culprit code, and finally executed the xp_readerrorlog stored procedure, which gave me the following output:
Deadlock encountered .... Printing deadlock information
Wait-for graph
NULL
Node:1
OBJECT: 9:1093578934:0 CleanCnt:2 Mode:IX Flags: 0x1
Grant List 2:
Grant List 3:
Owner:0x000000008165A780 Mode: IX Flg:0x40 Ref:2 Life:02000000 SPID:57 ECID:0 XactLockInfo: 0x0000000082F00EC0
SPID: 57 ECID: 0 Statement Type: EXECUTE Line #: 1
Input Buf: RPC Event: Proc [Database Id = 9 Object Id = 1877581727]
Requested by:
ResType:LockOwner Stype:'OR'Xdes:0x0000000082E02E80 Mode: S SPID:56 BatchID:0 ECID:0 TaskProxy:(0x00000000826EE538) Value:0x81a6f9c0 Cost:(0/1492)
NULL
Node:2
APPLICATION: 9:0:[Proligent Analytics]:(6ff56412) CleanCnt:2 Mode:X Flags: 0x5
Grant List 2:
Owner:0x000000008165DE40 Mode: X Flg:0x40 Ref:1 Life:00000000 SPID:56 ECID:0 XactLockInfo: 0x0000000082E02EC0
SPID: 56 ECID: 0 Statement Type: OPEN CURSOR Line #: 27
Input Buf: RPC Event: Proc [Database Id = 9 Object Id = 1966630049]
Requested by:
ResType:LockOwner Stype:'OR'Xdes:0x0000000082F00E80 Mode: X SPID:57 BatchID:0 ECID:0 TaskProxy:(0x00000000827B8538) Value:0x83e29d40 Cost:(0/250576)
NULL
Victim Resource Owner:
ResType:LockOwner Stype:'OR'Xdes:0x0000000082E02E80 Mode: S SPID:56 BatchID:0 ECID:0 TaskProxy:(0x00000000826EE538) Value:0x81a6f9c0 Cost:(0/1492)
My problem is that I have no idea how to use this output to find out what's going on. I've read that you can identify the stored procedure involved in the lock, but I don't know how.
A few pointers would be appreciated.
Thanks
As @Martin Smith says in his comment: select db_name(9), object_name(1093578934, 9), object_name(1966630049, 9), object_name(1877581727, 9) should give you some of the objects.