I'm connecting to a MySQL database through the Matlab Database Toolbox in order to run the same query over and over again within 2 nested for loops. After each iteration I get this warning:
Warning: com.mathworks.toolbox.database.databaseConnect@26960369 is not serializable
In Import_Matrices_DOandT_julaugsept_inflow_nomettsed at 476
Warning: com.mysql.jdbc.Connection@6e544a45 is not serializable
In Import_Matrices_DOandT_julaugsept_inflow_nomettsed at 476
Warning: com.mathworks.toolbox.database.databaseConnect@26960369 is not serializable
In Import_Matrices_DOandT_julaugsept_inflow_nomettsed at 476
Warning: com.mysql.jdbc.Connection@6e544a45 is not serializable
In Import_Matrices_DOandT_julaugsept_inflow_nomettsed at 476
My code is basically structured like this:
%Server
host =
user =
password =
dbName =
%# JDBC parameters
jdbcString = sprintf('jdbc:mysql://%s/%s', host, dbName);
jdbcDriver = 'com.mysql.jdbc.Driver';
%# Create the database connection object
conn = database(dbName, user , password, jdbcDriver, jdbcString);
setdbprefs('DataReturnFormat', 'numeric');
%Loop
for SegmentNum=3:41;
for tl=1:15;
tic;
sqlquery=['giant string'];
results = fetch(conn, sqlquery);
(some code here that saves the results into a few variables)
save('inflow.mat');
end
end
time = toc
close(conn);
clear conn
Eventually, after some iterations the code will crash with this error:
Error using database/fetch (line 37)
Query execution was interrupted
Error in Import_Matrices_DOandT_julaugsept_inflow_nomettsed (line
466)
results = fetch(conn, sqlquery);
Last night it errored after 25 iterations. I have about 600 iterations total I need to do, and I don't want to have to keep checking back on it every 25. I've heard there can be memory issues with database connection objects...is there a way to keep my code running?
Let's take this one step at a time.
Warning: com.mathworks.toolbox.database.databaseConnect@26960369 is not serializable
This comes from this line:
save('inflow.mat');
You are trying to save the database connection along with everything else in the workspace, and that doesn't work. Try specifying only the variables you wish to save, and it should work better.
There are a couple of tricks to excluding the values, but honestly, I suggest you just find the most important variables you wish to save, and save those. But if you wish, you can piece together a solution from this page.
save inflow.mat a b c d e
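The same limitation shows up outside MATLAB, which may make the cause easier to see. The sketch below is a Python analogy, not MATLAB code: it assumes the standard-library sqlite3 and pickle modules as stand-ins for the MySQL connection and for save. Saving a workspace that contains the live connection fails, while saving only the fetched data works.

```python
import pickle
import sqlite3

# Stand-in for the database connection (assumption: sqlite3 instead of MySQL).
conn = sqlite3.connect(":memory:")
rows = conn.execute("SELECT 1, 2").fetchall()

# A "workspace" holding both the live connection and the plain results.
workspace = {"conn": conn, "rows": rows}

# Saving everything fails because the connection is a live handle, not data.
try:
    pickle.dumps(workspace)
    saved_everything = True
except (TypeError, pickle.PicklingError):
    saved_everything = False

# Saving only the variables that hold data works.
payload = pickle.dumps({"rows": rows})
restored = pickle.loads(payload)

conn.close()
```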
Try wrapping the query in a try/catch block. Whenever you catch an error, reset the connection to the database, which should free up the object.
nQuery = 100;
while(nQuery>0)
try
query_the_database();
nQuery = nQuery - 1;
catch
reset_database_connection();
end
end
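As a rough illustration of the retry idea, here is a minimal Python sketch (not MATLAB). It assumes sqlite3 as a stand-in database, and run_with_reconnect and flaky_query are hypothetical names. Unlike the loop above, it caps the number of attempts so that a persistent failure surfaces instead of retrying forever.

```python
import sqlite3


def run_with_reconnect(connect, run_query, max_attempts=3):
    """Run run_query(conn); on a database error, reconnect and try again."""
    conn = connect()
    try:
        for attempt in range(1, max_attempts + 1):
            try:
                return run_query(conn)
            except sqlite3.Error:
                if attempt == max_attempts:
                    raise              # give up after the last attempt
                conn.close()           # drop the possibly broken connection
                conn = connect()       # replace it with a fresh one
    finally:
        conn.close()


# Hypothetical query that fails once, then succeeds.
calls = {"n": 0}


def flaky_query(conn):
    calls["n"] += 1
    if calls["n"] == 1:
        raise sqlite3.OperationalError("Query execution was interrupted")
    return conn.execute("SELECT 42").fetchone()[0]


result = run_with_reconnect(lambda: sqlite3.connect(":memory:"), flaky_query)
```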
The underlying reason is that a database connection object holds an open TCP/IP connection (a socket bound to a port), and multiple processes cannot access the same port. That is why database connection objects are not serializable: a port cannot be serialized.
A workaround is to create the connection within the for loop.
MySQL scenario:
When I execute "SELECT" queries in MySQL from multiple threads, I get the following message: "Commands out of sync; you can't run this command now". I found that this is due to a limitation: the results of one query must be consumed before another query can be made on the same connection.
C++ example:
void DataProcAsyncWorker::Execute()
{
    std::thread(&DataProcAsyncWorker::Run, this).join();
}
void DataProcAsyncWorker::Run() {
    sql::PreparedStatement *prep_stmt = c->con->prepareStatement(query);
    ...
}
Important:
I can't avoid using a separate thread per query (SELECT, INSERT, etc.), because the module I'm building is integrated with NodeJS and a query "locks" the thread until the result is obtained. For this reason I need to run each query in the background (in a new thread) and resolve the "promise" with the result obtained from MySQL.
Important:
I keep several "connections" (for example, 10), and with each SQL call the function chooses one of them.
That is:
1. A connection pool that contains 10 established connections, e.g.:
for (int i = 0; i < 10; i++) {
    Com *c = new Com;
    c->id = i;
    c->con = openConnection();
    c->con->setSchema("gateway");
    conns.push_back(c);
}
2. The problem occurs when executing >= 100 SELECT queries per second. I believe that even with the load balanced across the connections, 100 per second is a high number, so the chosen connection (e.g. conns.at(50)) may still be in use, with its result not yet consumed.
My question:
A. Does PostgreSQL have this limitation as well?
B. Which SQL server is recommended for a large number of queries per second without the need to open new connections? That is:
on a single connection such as conns.at(0), I could execute SELECT commands from 2 simultaneous threads.
Additional:
1. I can create a larger number of connections in the pool, but when I simulate more queries per second than there are pre-set connections, I get the "Commands out of sync" error. The only solution I found was a mutex, which is bad for performance.
I found that PostgreSQL handles this well (with a queue) and very efficiently, unlike MySQL, where I need to call "_free_result". In PostgreSQL I can run multiple queries on the same connection without receiving the "Commands out of sync" error.
Note: I did the test using libpqxx (a C++ library for connecting to and querying a PostgreSQL server) and it worked like a charm without giving me a headache.
Note: I don't know whether it allows multi-threaded execution or whether execution is done synchronously on the server side for each connection; the only thing I know is that this error does not occur in PostgreSQL.
Note:
According to the PHP docs for MySQL, mysqli_free_result must be called after using mysqli_query; if it is not, I get a "Commands out of sync" error. In contrast, according to the PostgreSQL documentation, calling pg_free_result after pg_query is completely optional.
That said, has anyone using PostgreSQL ever faced problems related to "commands out of sync"? Maybe there is another name for this error?
Or does PostgreSQL deal with this problem automatically, with free_result being called invisibly by the server, so that I never see this error?
You need to finish using one prepared statement (or cursor or similar construct) before starting another.
"Commands out of sync" is often cured by adding the closing statement.
"Question:
Does PostgreSQL have this limitation as well? Or in PostgreSQL there is also such a limitation?"
No, PostgreSQL does not have this limitation.
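Whichever server is used, one way to make sure a connection is never handed to a second thread while its previous result is still unread is to make borrowing exclusive. The following is a minimal Python sketch of that idea, not the C++ connector API from the question: it assumes sqlite3 in-memory databases as stand-ins, and POOL_SIZE, borrowed_connection and worker are hypothetical names. Callers wait on a queue for a free connection instead of picking one that may be busy.

```python
import queue
import sqlite3
import threading
from contextlib import contextmanager

POOL_SIZE = 3  # hypothetical size; the question uses 10

# The queue holds idle connections only.
pool = queue.Queue()
for _ in range(POOL_SIZE):
    # check_same_thread=False lets a connection move between threads,
    # provided only one thread uses it at a time.
    pool.put(sqlite3.connect(":memory:", check_same_thread=False))


@contextmanager
def borrowed_connection():
    conn = pool.get()       # blocks until some connection is idle
    try:
        yield conn
    finally:
        pool.put(conn)      # returned only after the caller is done with it


results = []
results_lock = threading.Lock()


def worker(n):
    with borrowed_connection() as conn:
        # Read the whole result before giving the connection back.
        rows = conn.execute("SELECT ? * 2", (n,)).fetchall()
    with results_lock:
        results.append(rows[0][0])


threads = [threading.Thread(target=worker, args=(n,)) for n in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With more callers than connections, the extra callers simply wait their turn. This does not make a single connection run commands concurrently; it only prevents two threads from interleaving commands on the same one.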
I have a strange problem with my database, which I host using AWS RDS. For a certain table, I sometimes suddenly get timeouts for almost all queries. Interestingly, for the other tables there are almost no timeouts, even though they contain similar data. (A timeout here means 150,000 ms, the maximum I have set for the Lambda; after that it terminates.)
This is the Lambda (the function that gets the data from the database) log:
15:38:10 Connecting db: jdbc:mysql://database.rds.amazonaws.com:3306/database_name Connected
15:38:10 Connection retrieved for matches_table matches, proceeding to statement
15:38:10 Statement created, proceeding to executing SQL
15:40:35 END RequestId: 410f7edf-0f48-45df-b509-a9b822fa5c1c
15:40:35 REPORT RequestId: 410f7edf-0f48-45df-b509-a9b822fa5c1c Duration: 150083.43 ms Billed Duration: 150000 ms Memory Size: 1024 MB Max Memory Used: 115 MB
15:40:35 2019-06-04T15:40:35.514Z 410f7edf-0f48-45df-b509-a9b822fa5c1c Task timed out after 150.08 seconds
And this is Java code that I use:
LinkedList<Object> matches = new LinkedList<Object>();
try {
String sql = db_conn.getRetrieveAllMatchesSqlSpecificColumn(userid, websiteid, profileid, matches_table, "matchid");
Connection conn = db_conn.getConnection();
System.out.println("Connection retrieved for matches_table " +matches_table+", proceeding to statement");
Statement st = conn.createStatement();
System.out.println("Statement created, proceeding to executing SQL");
// execute the query, and get a java resultset
ResultSet rs = st.executeQuery(sql);
System.out.println("SQL executed, now iterating to resultset");
// iterate through the java resultset
st.close();
} catch (SQLException ex) {
Logger.getLogger(AncestryDnaSQliteJDBC.class.getName()).log(Level.SEVERE, null, ex);
}
return matches;
A couple of months ago I did a big database resources upgrade and some removal of unwanted data and that more or less fixed it. But if I look at the current stats, it looks ok. Plenty of RAM (1GB) available, no swap used, enough cpu credits.
So I am not sure if this is a MySQL problem or a database problem linked to AWS RDS. Any suggestions?
Alright, it turned out to be an AWS-specific thing: there is some kind of I/O credit system linked to the database. Interestingly, the chart that shows the number of credits left is not available in the default monitoring view of AWS RDS; you have to dive into CloudWatch, where it is quite hidden. Increasing the allocated storage for this database earns more credits, and doing so fixed the problem.
I'm using the Sequel gem, which works great. However, I'm trying to debug a multithreading bug, so I activated logging at the Sequel level (i.e. by passing a Logger when creating the connection to the database). My problem is that all the SQL logs coming from the different connections are tangled together in the log file, and there is no way to know which query corresponds to which connection. Having a connection id or something similar added to the log would be really useful.
Is there a way to do so, or an alternative solution?
If there's nothing built-in, try monkey patching or changing the logger, or the call to it, so it prepends each log line with the thread's ID.
The relevant file in Sequel would be:
https://github.com/jeremyevans/sequel/blob/master/lib/sequel/database/logging.rb
Based on it, chances are you could subclass Logger and throw that in to make it work.
http://www.ruby-doc.org/stdlib-2.1.0/libdoc/logger/rdoc/Logger.html
If the Logger docs and its code are anything to go by, you can probably do what you want by overriding the add() method, e.g.:
def add(severity, message = nil, progname = nil, &block)
  thread_msg = "thread: #{Thread.current.object_id}"
  progname ||= @progname
  if message.nil?
    if block_given?
      message = yield
    else
      message = progname
      progname = @progname
    end
  end
  message = "#{thread_msg}\n#{message}"
  super(severity, message, progname, &block)
end
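For comparison, the same idea (tagging each log line with the thread that produced it) can be sketched in Python, where the standard logging module exposes the thread id as a format field. This is only an analogy to the Ruby approach above, not Sequel code; the logger name sql_demo and the run_query helper are hypothetical.

```python
import io
import logging
import threading

# Capture log output in memory so the result can be inspected.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
# %(thread)d is the id of the thread that issued the log call.
handler.setFormatter(logging.Formatter("thread=%(thread)d %(message)s"))

logger = logging.getLogger("sql_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False


def run_query(sql):
    # Stand-in for executing a query; only the logging matters here.
    logger.info(sql)


threads = [
    threading.Thread(target=run_query, args=(f"SELECT {i}",)) for i in range(3)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

log_lines = stream.getvalue().splitlines()
```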
This is a sample code I'd like to run:
for i in range(1,2000):
    db = create_engine('mysql://root@localhost/test_database')
    conn = db.connect()
    #some simple data operations
    conn.close()
    db.dispose()
Is there a way of running this without getting "Too many connections" errors from MySQL?
I already know I can handle the connection otherwise or have a connection pool. I'd just like to understand how to properly close a connection from sqlalchemy.
Here's how to write that code correctly:
db = create_engine('mysql://root@localhost/test_database')
for i in range(1,2000):
    conn = db.connect()
    #some simple data operations
    conn.close()
db.dispose()
That is, the Engine is a factory for connections as well as a pool of connections, not the connection itself. When you say conn.close(), the connection is returned to the connection pool within the Engine, not actually closed.
If you do want the connection to be actually closed, that is, not pooled, disable pooling via NullPool:
from sqlalchemy.pool import NullPool
db = create_engine('mysql://root@localhost/test_database', poolclass=NullPool)
With the above Engine configuration, each call to conn.close() will close the underlying DBAPI connection.
If OTOH you actually want to connect to different databases on each call, that is, your hardcoded "localhost/test_database" is just an example and you actually have lots of different databases, then the approach using dispose() is fine; it will close out every connection that is not checked out from the pool.
In all of the above cases, the important thing is that the Connection object is closed via close(). If you're using any kind of "connectionless" execution, that is engine.execute() or statement.execute(), the ResultProxy object returned from that execute call should be fully read, or otherwise explicitly closed via close(). A Connection or ResultProxy that's still open will prohibit the NullPool or dispose() approaches from closing every last connection.
I ran into this while trying to figure out how to disconnect from the database for an unrelated problem (I must disconnect before forking).
You need to invalidate the connection from the connection Pool too.
In your example:
for i in range(1,2000):
    db = create_engine('mysql://root@localhost/test_database')
    conn = db.connect()
    # some simple data operations
    # session.close() if needed
    conn.invalidate()
    db.dispose()
I use this one
engine = create_engine('...')
with engine.connect() as conn:
    conn.execute(text(f"CREATE SCHEMA IF NOT EXISTS..."))
engine.dispose()
In my case this always works and I am able to close the connection!
So calling invalidate() before close() does the trick. Otherwise close() on its own does not actually close the connection.
conn = engine.raw_connection()
conn.get_warnings = True
cur = conn.cursor()  # cursor on the raw DBAPI connection
curSql = xx_tmpsql
myresults = cur.execute(curSql, multi=True)
print("Warnings: #####")
print(cur.fetchwarnings())
for curresult in myresults:
    print(curresult)
    if curresult.with_rows:
        print(curresult.column_names)
        print(curresult.fetchall())
    else:
        print("no rows returned")
cur.close()
conn.invalidate()
conn.close()
engine.dispose()