I have a strange problem with a database I host on AWS RDS. For one particular table, I sometimes suddenly get timeouts on almost all queries. Interestingly, the other tables see almost no timeouts (a timeout here means hitting 150,000 ms, the maximum I have configured for the Lambda, after which it terminates), even though they contain similar data.
This is the Lambda (the function that gets the data from the database) log:
15:38:10 Connecting db: jdbc:mysql://database.rds.amazonaws.com:3306/database_name Connected
15:38:10 Connection retrieved for matches_table matches, proceeding to statement
15:38:10 Statement created, proceeding to executing SQL
15:40:35 END RequestId: 410f7edf-0f48-45df-b509-a9b822fa5c1c
15:40:35 REPORT RequestId: 410f7edf-0f48-45df-b509-a9b822fa5c1c Duration: 150083.43 ms Billed Duration: 150000 ms Memory Size: 1024 MB Max Memory Used: 115 MB
15:40:35 2019-06-04T15:40:35.514Z 410f7edf-0f48-45df-b509-a9b822fa5c1c Task timed out after 150.08 seconds
And this is the Java code I use:
LinkedList<Object> matches = new LinkedList<Object>();
try {
String sql = db_conn.getRetrieveAllMatchesSqlSpecificColumn(userid, websiteid, profileid, matches_table, "matchid");
Connection conn = db_conn.getConnection();
System.out.println("Connection retrieved for matches_table " +matches_table+", proceeding to statement");
Statement st = conn.createStatement();
System.out.println("Statement created, proceeding to executing SQL");
// execute the query, and get a java resultset
ResultSet rs = st.executeQuery(sql);
System.out.println("SQL executed, now iterating to resultset");
// iterate through the ResultSet (loop body elided in the original post;
// presumably the selected matchid values are collected into the returned list)
while (rs.next()) {
    matches.add(rs.getObject("matchid"));
}
rs.close();
st.close();
} catch (SQLException ex) {
Logger.getLogger(AncestryDnaSQliteJDBC.class.getName()).log(Level.SEVERE, null, ex);
}
return matches;
A couple of months ago I did a big upgrade of the database resources and removed some unwanted data, and that more or less fixed it. But the current stats look fine: plenty of RAM available (1 GB), no swap used, and enough CPU credits.
So I am not sure whether this is a MySQL problem or a problem specific to AWS RDS. Any suggestions?
Alright, it turned out to be an AWS-specific issue. There is an I/O credit system linked to the database storage. Interestingly, the chart showing the number of credits left is not in the default monitoring view of AWS RDS; you have to dig into CloudWatch, where it is quite well hidden. Increasing the allocated storage for the database earns you more credits, and doing so fixed the problem.
Related
I have a really big table that gets several million records every day, and at the end of each day I extract all the records of the previous day. I am doing this like:
String SQL = "select col1, col2, coln from mytable where timecol = yesterday";
Statement.executeQuery(SQL);
The problem is that this program takes about 2 GB of memory, because it pulls all the results into memory before processing them.
I tried setting Statement.setFetchSize(10), but it takes exactly the same amount of memory from the OS; it makes no difference. I am using the Microsoft SQL Server 2005 JDBC driver.
Is there any way to read the results in small chunks, the way the Oracle driver does, where the query initially returns only a few rows and more are fetched as you scroll down?
In JDBC, the setFetchSize(int) method is very important to performance and memory-management within the JVM as it controls the number of network calls from the JVM to the database and correspondingly the amount of RAM used for ResultSet processing.
If setFetchSize(10) is being called and the driver is ignoring it, there are probably only two options:
Try a different JDBC driver that will honor the fetch-size hint.
Look at driver-specific properties on the Connection (URL and/or property map when creating the Connection instance).
The RESULT-SET is the number of rows marshalled on the DB in response to the query.
The ROW-SET is the chunk of rows that are fetched out of the RESULT-SET per call from the JVM to the DB.
The number of these calls and resulting RAM required for processing is dependent on the fetch-size setting.
So if the RESULT-SET has 100 rows and the fetch-size is 10,
there will be 10 network calls to retrieve all of the data, using roughly 10*{row-content-size} RAM at any given time.
The default fetch-size is 10, which is rather small.
In the case posted, it would appear the driver is ignoring the fetch-size setting and retrieving all the data in one call (large RAM requirement, but minimal network calls).
Under the hood, ResultSet.next() doesn't actually fetch one row at a time from the RESULT-SET. It reads from the (local) ROW-SET and (invisibly) fetches the next ROW-SET from the server once the local one is exhausted on the client.
All of this depends on the driver as the setting is just a 'hint' but in practice I have found this is how it works for many drivers and databases (verified in many versions of Oracle, DB2 and MySQL).
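To make that concrete, here is a minimal sketch of the usual pattern (the URL, credentials, table and column names are placeholders of mine, not from the post); with a 100-row RESULT-SET and a fetch size of 10, a driver that honours the hint would make roughly 10 round trips:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class FetchSizeDemo {
    public static void main(String[] args) throws SQLException {
        String url = "jdbc:mysql://localhost:3306/test"; // placeholder URL
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement st = conn.createStatement()) {
            st.setFetchSize(10); // hint: rows per JVM-to-DB round trip (the ROW-SET size)
            try (ResultSet rs = st.executeQuery("select col1, col2 from mytable")) {
                while (rs.next()) {
                    // next() reads from the local ROW-SET; the driver fetches the next
                    // chunk from the server only once the current one is exhausted
                    System.out.println(rs.getString("col1"));
                }
            }
        }
    }
}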
The fetchSize parameter is a hint to the JDBC driver as to how many rows to fetch in one go from the database. But the driver is free to ignore this and do what it sees fit. Some drivers, like the Oracle one, fetch rows in chunks, so you can read very large result sets without needing lots of memory. Other drivers just read the whole result set in one go, and I'm guessing that's what your driver is doing.
You can try upgrading your driver to the SQL Server 2008 version (which might be better), or the open-source jTDS driver.
You need to ensure that auto-commit on the Connection is turned off, or setFetchSize will have no effect.
dbConnection.setAutoCommit(false);
Edit: I remembered that when I used this fix it was Postgres-specific, but hopefully it will still work for SQL Server.
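For what it's worth, a small sketch of that Postgres-specific combination (imports from java.sql as in the earlier sketch; the URL, query and fetch size are placeholders of mine):
try (Connection conn = DriverManager.getConnection(
        "jdbc:postgresql://localhost:5432/test", "user", "password")) {
    conn.setAutoCommit(false);   // required, or the PostgreSQL driver ignores the fetch size
    try (Statement st = conn.createStatement()) {
        st.setFetchSize(50);     // rows now arrive from the server in chunks of ~50
        try (ResultSet rs = st.executeQuery("select col1 from mytable")) {
            while (rs.next()) {
                // process rs.getString("col1") here
            }
        }
    }
}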
Statement interface Doc
SUMMARY: void setFetchSize(int rows)
Gives the JDBC driver a hint as to the number of rows that should be fetched from the database when more rows are needed.
Read the ebook J2EE and Beyond by Art Taylor.
Sounds like the MSSQL JDBC driver is buffering the entire result set for you. You can add a connection string parameter such as selectMethod=cursor or responseBuffering=adaptive. If you are on version 2.0+ of the 2005 MSSQL JDBC driver, response buffering should default to adaptive.
http://msdn.microsoft.com/en-us/library/bb879937.aspx
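For example, the properties can be appended to the JDBC URL (host, port and database name below are placeholders of mine; responseBuffering=adaptive applies to the Microsoft driver 2.0+, selectMethod=cursor also works on older versions):
String url = "jdbc:sqlserver://localhost:1433;databaseName=mydb;"
           + "responseBuffering=adaptive;selectMethod=cursor";
Connection conn = DriverManager.getConnection(url, "user", "password");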
It sounds to me that you really want to limit the rows being returned in your query and page through the results. If so, you can do something like:
select * from (select rownum myrow, a.* from TEST1 a)
where myrow between 5 and 10;
You just have to determine your boundaries.
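As a rough sketch of what "determining your boundaries" might look like from JDBC (note the rownum form above is Oracle syntax; conn, the page size and the row processing below are placeholders of mine, with imports from java.sql):
String paged = "select * from (select rownum myrow, a.* from TEST1 a) "
             + "where myrow between ? and ?";
int pageSize = 1000;
try (PreparedStatement ps = conn.prepareStatement(paged)) {
    for (int start = 1; ; start += pageSize) {
        ps.setInt(1, start);                  // lower boundary of this page
        ps.setInt(2, start + pageSize - 1);   // upper boundary of this page
        boolean gotRows = false;
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                gotRows = true;
                // process one page's worth of rows here
            }
        }
        if (!gotRows) {
            break; // no rows returned: we are past the last page
        }
    }
}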
Try this:
String SQL = "select col1, col2, coln from mytable where timecol = yesterday";
connection.setAutoCommit(false);
PreparedStatement stmt = connection.prepareStatement(SQL, SQLServerResultSet.TYPE_SS_SERVER_CURSOR_FORWARD_ONLY, SQLServerResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(2000);
stmt.set....
stmt.execute();
ResultSet rset = stmt.getResultSet();
while (rset.next()) {
    // ......
}
I had the exact same problem in a project. The issue is that even though the fetch size might be small enough, the JdbcTemplate reads the entire result of your query and maps it into a huge list, which can blow your memory. I ended up extending NamedParameterJdbcTemplate to create a function which returns a Stream of objects. That Stream is backed by the ResultSet normally returned by JDBC, but it pulls data from the ResultSet only as the Stream requires it. This works as long as you don't keep a reference to all the objects the Stream emits. I based it largely on the implementation of org.springframework.jdbc.core.JdbcTemplate#execute(org.springframework.jdbc.core.ConnectionCallback). The only real difference is what happens to the ResultSet. I ended up writing this function to wrap up the ResultSet:
private <T> Stream<T> wrapIntoStream(ResultSet rs, RowMapper<T> mapper) {
CustomSpliterator<T> spliterator = new CustomSpliterator<T>(rs, mapper, Long.MAX_VALUE, Spliterator.NONNULL | Spliterator.IMMUTABLE | Spliterator.ORDERED);
Stream<T> stream = StreamSupport.stream(spliterator, false);
return stream;
}
private static class CustomSpliterator<T> extends Spliterators.AbstractSpliterator<T> {
// won't put code for constructor or properties here
// the idea is to pull for the ResultSet and set into the Stream
@Override
public boolean tryAdvance(Consumer<? super T> action) {
try {
// you can add some logic to close the stream/Resultset automatically
if(rs.next()) {
T mapped = mapper.mapRow(rs, rowNumber++);
action.accept(mapped);
return true;
} else {
return false;
}
} catch (SQLException ex) {
    // do something with this exception; rethrowing keeps tryAdvance's boolean contract intact
    throw new RuntimeException(ex);
}
}
}
You can add some logic to make that Stream auto-closable; otherwise, don't forget to close it when you are done.
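For the auto-closing part, one option (my own sketch, not from the original answer) is to register the JDBC cleanup on the Stream itself and then consume it with try-with-resources; streamRows(), MyRow and process() are hypothetical names:
// In the wrapping method: tie the ResultSet's lifecycle to the Stream.
Stream<T> stream = StreamSupport.stream(spliterator, false)
        .onClose(() -> {
            try {
                rs.close(); // also close the Statement/Connection that produced it
            } catch (SQLException ex) {
                throw new RuntimeException(ex);
            }
        });

// On the caller side, try-with-resources guarantees the onClose hook runs.
try (Stream<MyRow> rows = repository.streamRows()) {
    rows.forEach(row -> process(row));
}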
MySQL scenario:
When I execute "SELECT" queries in MySQL using multiple threads I get the following message: "Commands out of sync; you can't run this command now", I found that this is due to the limitation of having to wait "consume" the results to make another query.
C ++ example:
void DataProcAsyncWorker::Execute()
{
    std::thread(&DataProcAsyncWorker::Run, this).join();
}
void DataProcAsyncWorker::Run() {
    sql::PreparedStatement *prep_stmt = c->con->prepareStatement(query);
    ...
}
Important:
I can't avoid using a separate thread per query (SELECT, INSERT, etc.) because the module I'm building, which is being integrated with Node.js, "locks" the thread until the result is obtained. For this reason I need to run the query in the background (in a new thread) and resolve the "promise" with the result obtained from MySQL.
Important:
I keep several connections open [for example, 10], and for each SQL call the function picks one of them.
This is:
1. A connection pool that contains 10 established connections, e.g.:
for (int i = 0; i < 10; i++) {
    Com *c = new Com;
    c->id = i;
    c->con = openConnection();
    c->con->setSchema("gateway");
    conns.push_back(c);
}
2. The problem occurs when executing >= 100 SELECT queries per second. I believe that even with the connection balancing, 100 queries per second is a high number, and the chosen connection (e.g. conns.at(50)) may still be busy with a result that has not yet been consumed.
My question:
A. Does PostgreSQL have this limitation as well?
B. Which SQL server is recommended for a large number of queries per second without the need to open new connections? That is:
On a single connection such as conns.at(0), can I execute SELECT commands from two simultaneous threads?
Additional:
1. I can even create a larger number of connections in the pool, but when I simulate a number of queries per second greater than the number of pre-set connections, I get the "Commands out of sync" error. The only solution I found was a mutex, which is bad for performance.
I found that PostgreSQL handles this (queuing) very efficiently. Unlike MySQL, where I need to call _free_result, in PostgreSQL I can run multiple queries on the same connection without getting the "Commands out of sync" error.
Note: I did the test using libpqxx (the C++ library for connecting to and querying a PostgreSQL server) and it really worked like a charm, without giving me any headache.
Note: I don't know whether it allows multi-threaded execution or whether execution is serialized on the server side for each connection; the only thing I know is that this error does not occur with PostgreSQL.
Note:
The PHP docs for MySQL say the mysqli_free_result call is required after mysqli_query; if it is omitted, I get a "Commands out of sync" error. In contrast, in the PostgreSQL documentation, pg_free_result is completely optional after pg_query.
That said, has anyone using PostgreSQL faced problems related to "commands out of sync"? Maybe there is another name for this error?
Or is PostgreSQL able to deal with this automatically, effectively calling free_result invisibly on the server side, so that the error never reaches me?
You need to finish using one prepared statement (or cursor or similar construct) before starting another.
"Commands out of sync" is often cured by adding the closing statement.
"Question:
Does PostgreSQL have this limitation as well? Or in PostgreSQL there is also such a limitation?"
No, PostgreSQL does not have this limitation.
I'm connecting to a MySQL database through the Matlab Database Toolbox in order to run the same query over and over again within 2 nested for loops. After each iteration I get this warning:
Warning: com.mathworks.toolbox.database.databaseConnect#26960369 is not serializable
In Import_Matrices_DOandT_julaugsept_inflow_nomettsed at 476
Warning: com.mysql.jdbc.Connection#6e544a45 is not serializable
In Import_Matrices_DOandT_julaugsept_inflow_nomettsed at 476
Warning: com.mathworks.toolbox.database.databaseConnect#26960369 not serializable
In Import_Matrices_DOandT_julaugsept_inflow_nomettsed at 476
Warning: com.mysql.jdbc.Connection#6e544a45 is not serializable
In Import_Matrices_DOandT_julaugsept_inflow_nomettsed at 476
My code is basically structured like this:
%Server
host =
user =
password =
dbName =
%# JDBC parameters
jdbcString = sprintf('jdbc:mysql://%s/%s', host, dbName);
jdbcDriver = 'com.mysql.jdbc.Driver';
%# Create the database connection object
conn = database(dbName, user , password, jdbcDriver, jdbcString);
setdbprefs('DataReturnFormat', 'numeric');
%Loop
for SegmentNum=3:41;
for tl=1:15;
tic;
sqlquery=['giant string'];
results = fetch(conn, sqlquery);
(some code here that saves the results into a few variables)
save('inflow.mat');
end
end
time = toc
close(conn);
clear conn
Eventually, after some iterations the code will crash with this error:
Error using database/fetch (line 37)
Query execution was interrupted
Error in Import_Matrices_DOandT_julaugsept_inflow_nomettsed (line
466)
results = fetch(conn, sqlquery);
Last night it errored after 25 iterations. I have about 600 iterations total I need to do, and I don't want to have to keep checking back on it every 25. I've heard there can be memory issues with database connection objects...is there a way to keep my code running?
Let's take this one step at a time.
Warning: com.mathworks.toolbox.database.databaseConnect#26960369 is not serializable
This comes from this line
save('inflow.mat');
You are trying to save the database connection object. That doesn't work. Specify only the variables you actually want to save, and it should work better.
There are a couple of tricks for excluding values, but honestly, I suggest you just find the most important variables you wish to save, and save those. If you wish, you can piece together a solution from this page.
save inflow.mat a b c d e
Try wrapping the query in a try/catch block. Whenever you catch an error, reset the connection to the database, which should free up the object.
nQuery = 100;
while (nQuery > 0)
    try
        query_the_database();
        nQuery = nQuery - 1;
    catch
        reset_database_connection();
    end
end
The main reason for this is that database connection objects are backed by TCP/IP ports, and multiple processes cannot access the same port. That is why database connection objects are not serializable: ports cannot be serialized.
A workaround is to create the connection inside the for loop.
I intend to use foreach to utilize all the cores of my CPU. The catch is that I need to send an SQL query inside the loop. The script works fine with a normal for loop, but it gives the following error when I change it to foreach.
The error is:
select: Interrupted system call
select: Interrupted system call
select: Interrupted system call
Error in { : task 1 failed - "expired MySQLConnection"
The code I used is:
library(foreach)
library(doMC)
library(RMySQL)
library(multicore)
registerDoMC(cores=6)
m <- dbDriver("MySQL", max.con = 100)
con <- dbConnect(m, user="*****", password = "******", host ="**.**.***",dbname="dbname")
list<-dbListTables(con)
foreach(i = 1:length(list)) %dopar% {
    query <- paste("SELECT * FROM ", list[i], " WHERE `CLOSE` BETWEEN 1 AND 100", sep = "")
    t <- dbGetQuery(con, query)
}
Though foreach works fine on my system for all other purposes, it gives this error only for SQL queries. Is there a way to send SQL queries inside a foreach loop?
My suggestion is this:
Move the database queries outside the loop, and lock access so you don't do parallel database queries. I think that will speed things up too, as you won't have parallel disk access, while still being able to do parallel processing.
Meaning (pseudo code)
db = connect to database
threadlock = lock();
parfor {
threadlock.lock
result = db query (pull all data here, as you can't process while you load without keeping the database locked)
threadlock.unlock
process resulting data (which is now just data, and not a sql object).
}