I'm implementing code where I need to perform a few actions at fixed intervals. A few of them involve fetching data from a MySQL database.
To schedule these actions at fixed intervals I'm using gocron, and it is working quite well.
For the database, as of now, I'm creating an instance at the start of the main program and passing it to the subroutines. I'm using https://github.com/jmoiron/sqlx to work with the DB.
The flow of the code is:
i- initialise resources, e.g. db = sql.Open(...); put the DB in a common struct that is passed to all subroutines
ii- schedule the actions using gocron (passing resources as needed)
iii- the actions are specific subroutines that perform their task using the given resources (e.g. the DB), roughly as in the sketch below
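A minimal sketch of this flow (I've swapped gocron for a plain time.Ticker so the snippet stays self-contained; the Resources struct, fetchAction, the DSN and the interval are just illustrative):
package main

import (
    "log"
    "time"

    _ "github.com/go-sql-driver/mysql"
    "github.com/jmoiron/sqlx"
)

// Resources is the common struct handed to every scheduled subroutine.
type Resources struct {
    DB *sqlx.DB
}

func main() {
    // i- initialise resources once
    db, err := sqlx.Open("mysql", "user:password@tcp(127.0.0.1:3306)/dbname")
    if err != nil {
        log.Fatal(err)
    }
    res := &Resources{DB: db}

    // ii- run the action at a fixed interval (gocron in the real code)
    ticker := time.NewTicker(time.Minute)
    defer ticker.Stop()
    for range ticker.C {
        fetchAction(res)
    }
}

// iii- the action uses the shared resources instead of opening its own DB
func fetchAction(res *Resources) {
    var count int
    if err := res.DB.Get(&count, "SELECT COUNT(*) FROM some_table"); err != nil {
        log.Println("fetch failed:", err)
    }
}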
I have a few cases where the MySQL service needs to be restarted. Then, as expected, I get errors stating the connection is invalid, something like:
[mysql] packets.go:33: unexpected EOF
[mysql] packets.go:130: write tcp 127.0.0.1:36191->127.0.0.1:3306: write: broken pipe
[mysql] connection.go:312: invalid connection
To get around this, I changed the implementation to acquire the DB connection within the subroutine and close it with defer db.Close(). With this I'm getting errors about too many open connections. I have checked for proper closing of rows as well as correct usage of Scan, and the usual recommendations seem to be followed.
I would like to understand how to handle opening and closing the DB in my case.
You can use sync.Once to prevent this:
package db

import (
    "database/sql"
    "log"
    "sync"
    "time"

    _ "github.com/lib/pq" // assumed driver for this example; swap in your database's driver
)

var conn *sql.DB // Set package-wide, but not exported
var once sync.Once

func GetConnection() *sql.DB {
    once.Do(func() {
        var err error
        if conn, err = sql.Open("postgres", "<credentials>"); err != nil {
            log.Panic(err)
        }
        conn.SetMaxOpenConns(20) // Sane default
        conn.SetMaxIdleConns(0)
        conn.SetConnMaxLifetime(time.Nanosecond) // connections are not kept idle and expire immediately after use
    })
    return conn
}
Read this: https://aaronoellis.com/articles/preventing-max-connection-errors-in-go
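In your case, the usual fix is to keep the single shared pool but give it a finite SetConnMaxLifetime: database/sql then retires old connections and dials fresh ones, so the pool recovers on its own after a MySQL restart instead of handing out dead sockets. A minimal sketch, assuming the go-sql-driver/mysql driver; the DSN, the limits and the query are placeholders:
package main

import (
    "database/sql"
    "log"
    "time"

    _ "github.com/go-sql-driver/mysql"
)

var db *sql.DB

func main() {
    var err error
    db, err = sql.Open("mysql", "user:password@tcp(127.0.0.1:3306)/dbname")
    if err != nil {
        log.Fatal(err)
    }
    // Cap the pool and retire connections periodically so that stale
    // connections left over from a MySQL restart get discarded.
    db.SetMaxOpenConns(20)
    db.SetMaxIdleConns(5)
    db.SetConnMaxLifetime(3 * time.Minute)

    scheduledAction()
}

// scheduledAction is the kind of function you would register with gocron;
// it reuses the shared pool instead of opening and closing its own DB.
func scheduledAction() {
    var count int
    if err := db.QueryRow("SELECT COUNT(*) FROM some_table").Scan(&count); err != nil {
        log.Println("query failed:", err)
        return
    }
    log.Println("rows:", count)
}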
Related
I am using gorm v1 (ORM) and Go version 1.14.
The DB connection is created at the start of my app, and that DB is passed throughout the app.
I have a complex and long-running piece of functionality.
Let's say I have 10 sets of queries to run and the order doesn't matter.
So, what I did was
go queryset1(DB)
go queryset2(DB)
...
go queryset10(DB)
// here I have a wait, maybe via channel or WaitGroup.
Inside queryset1:
func queryset1(db *gorm.DB /*, wg or errChannel */) {
    db.Count() // basic count query
    wg.Done()  // or errChannel <- nil
}
Now, the problem is that I encounter MySQL error 1040, "too many connections".
Why is this happening? Does every goroutine create a new connection?
If so, is there a way to check this and see the "live" connections in MySQL
(not the SHOW STATUS counters like Connections)?
How can I concurrently query the DB?
Edit:
This guy has the same problem
The error is not directly related to go-gorm, but to the underlying MySQL configuration and your initial connection configuration. In your code, you can manage the following parameters during your initial connection to the database.
maximum open connections (SetMaxOpenConns function)
maximum idle connections (SetMaxIdleConns function)
maximum lifetime of a connection (SetConnMaxLifetime function)
For more details, check the official docs or this article on how to get the maximum performance from your connection configuration.
If you want to prevent a situation where each goroutine uses a separate connection, you can do something like this:
// restrict the goroutines to be executed 5 at a time
connCh := make(chan bool, 5)
var wg sync.WaitGroup
wg.Add(10)
go queryset1(DB, &wg, connCh)
go queryset2(DB, &wg, connCh)
...
go queryset10(DB, &wg, connCh)
wg.Wait()
close(connCh)
Inside your queryset functions:
func queryset1(db *gorm.DB, wg *sync.WaitGroup, connCh chan bool) {
    connCh <- true
    db.Count() // basic count query
    <-connCh
    wg.Done()
}
The connCh channel allows the first 5 goroutines to write to it and blocks the remaining goroutines until one of the first 5 takes a value back out of connCh. This prevents a situation where each goroutine starts its own connection. Some of the connections should be reused, but that also depends on the initial connection configuration.
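If you also want the pool itself to enforce a hard cap, set the limits from the list above when you first open the database; then even a mistake in the semaphore can't exceed them. A minimal GORM v1 sketch (the import paths match gorm v1, the DSN and the limits are placeholders):
package main

import (
    "log"
    "time"

    "github.com/jinzhu/gorm"
    _ "github.com/jinzhu/gorm/dialects/mysql"
)

func openDB() *gorm.DB {
    db, err := gorm.Open("mysql", "user:password@tcp(127.0.0.1:3306)/dbname?parseTime=true")
    if err != nil {
        log.Fatal(err)
    }
    // At most 5 open connections, matching the 5-slot semaphore above.
    db.DB().SetMaxOpenConns(5)
    db.DB().SetMaxIdleConns(5)
    db.DB().SetConnMaxLifetime(time.Hour)
    return db
}

func main() {
    DB := openDB()
    defer DB.Close()
    // the queryset goroutines from the example above would all share this DB
}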
I'm working on a piece of code that is making calls to the database from several different places. In this code I have been using the GORM library, and calling gorm.Open() every time I need to interact with the database.
What I'm wondering is what is happening under the hood when I call this? Is a new connection pool created every time I call it or is each call to gorm.Open() sharing the same connection pool?
TLDR: yes, try to reuse the returned DB object.
gorm.Open does the following (more or less):
look up the driver for the given dialect
call sql.Open to return a DB object
call DB.Ping() to force it to talk to the database
This means that one sql.DB object is created for every call to gorm.Open. Per the docs, this means one connection pool per DB object.
This means that the recommendations for sql.Open apply to gorm.Open:
The returned DB is safe for concurrent use by multiple goroutines and
maintains its own pool of idle connections. Thus, the Open function
should be called just once. It is rarely necessary to close a DB.
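A minimal sketch of that recommendation, using GORM v2 import paths (the DSN and the countUsers helper are just illustrative): open the *gorm.DB once and have every caller reuse it, rather than calling gorm.Open at each call site.
package main

import (
    "log"

    "gorm.io/driver/mysql"
    "gorm.io/gorm"
)

// DB is opened exactly once and shared by every caller,
// so there is a single underlying connection pool.
var DB *gorm.DB

func main() {
    var err error
    dsn := "user:password@tcp(127.0.0.1:3306)/dbname?parseTime=true"
    DB, err = gorm.Open(mysql.Open(dsn), &gorm.Config{})
    if err != nil {
        log.Fatal(err)
    }
    log.Println(countUsers())
}

// countUsers reuses the shared DB instead of opening a new one.
func countUsers() int64 {
    var n int64
    DB.Table("users").Count(&n)
    return n
}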
Yes. Also note that the connection pool can be configured in both GORM v1 and v2; in GORM v1 it looks like this:
// SetMaxIdleConns sets the maximum number of connections in the idle connection pool.
db.DB().SetMaxIdleConns(10)
// SetMaxOpenConns sets the maximum number of open connections to the database.
db.DB().SetMaxOpenConns(100)
// SetConnMaxLifetime sets the maximum amount of time a connection may be reused.
db.DB().SetConnMaxLifetime(time.Hour)
Calling the DB() function on the *gorm.DB instance returns the underlying *sql.DB instance.
For those who are just starting with gorm, here is a more complete example.
package main

import (
    "time"
    "gorm.io/driver/mysql"
    "gorm.io/gorm"
)

func main() {
    url := "user:password@tcp(127.0.0.1:3306)/dbname" // placeholder DSN
    db, err := gorm.Open(mysql.Open(url))
    if err != nil {
        // control error
    }
    sqlDB, err := db.DB()
    if err != nil {
        // control error
    }
    sqlDB.SetMaxIdleConns(10)
    sqlDB.SetMaxOpenConns(100)
    sqlDB.SetConnMaxLifetime(time.Hour)
}
There are a couple of earlier related questions, but none of them solves the issue for me:
https://dba.stackexchange.com/questions/160444/parallel-postgresql-queries-with-r
Parallel Database calls with RODBC
"foreach" loop : Using all cores in R (especially if we are sending sql queries inside foreach loop)
My use case is the following: I have a large database whose contents need to be plotted. Each plot takes a few seconds to create due to some necessary pre-processing of the data and the plotting itself (ggplot2). I need to produce a large number of plots. My thinking is that I will connect to the database via dplyr without downloading all the data to memory. Then I have a function that fetches a subset of the data to be plotted. This approach works fine when using single-threading, but when I try to use parallel processing I run into SQL errors related to the connection ("MySQL server has gone away").
Now, I recently solved the same issue working in Python, in which case the solution was simply to kill the current connection inside the function, which forced the establishment of a new connection. I did this using connection.close() where connection is from Django's django.db.
My problem is that I cannot find an R equivalent of this approach. I thought I had found the solution when I found the pool package for R:
This package enables the creation of object pools for various types of
objects in R, to make it less computationally expensive to fetch one.
Currently the only supported pooled objects are DBI connections (see
the DBI package for more info), which can be used to query a database
either directly through DBI or through dplyr. However, the Pool class
is general enough to allow for pooling of any R objects, provided that
someone implements the backend appropriately (creating the object
factory class and all the required methods) -- a vignette with
instructions on how to do so will be coming soon.
My code is too large to post here, but essentially, it looks like this:
#libraries loaded as necessary
#connect to the db in some kind of way
#with dplyr
db = src_mysql(db_database, username = db_username, password = db_password)
#with RMySQL directly
db = dbConnect(RMySQL::MySQL(), dbname = db_database, username = db_username, password = db_password)
#with pool
db = pool::dbPool(RMySQL::MySQL(),
dbname = db_database,
username = db_username,
password = db_password,
minSize = 4)
#I tried all 3
#connect to a table
some_large_table = tbl(db, 'table')
#define the function
some_function = function(some_id) {
  # fetch data from table
  subtable = some_large_table %>% filter(id == some_id) %>% collect()
  # do something with the data
  something(subtable)
}
#parallel process
mclapply(vector_of_ids,
FUN = some_function,
mc.cores = num_of_threads)
The code you have above is not the equivalent of your Python code, and that is the key difference. What you did in Python is totally possible in R (see MWE below). However, the code you have above is not:
kill[ing] the current connection inside the function, which forced the establishment of a new connection.
What it is trying (and failing) to do is to make a database connection travel from the parent process to each child process opened by the call to mclapply. This is not possible. Database connections can never travel across process boundaries no matter what.
This is an example of the more general "rule" that a child process cannot affect the state of the parent process, period. For example, the child process also cannot write to variables in the parent process's memory, and you can't plot to the parent process's graphics device from those child processes either.
In order to do the same thing you did in Python, you need to open a new connection inside of the function FUN (the second argument to mclapply) if you want it to be truly parallel. I.e. you have to make sure that the dbConnect call happens inside the child process.
This eliminates the point of pool (though it’s perfectly safe to use), since pool is useful when you reuse connections and generally want them to be easily accessible. For your parallel use case, since you can't cross process boundaries, this is useless: you will always need to open and close the connection for each new process, so you might as well skip pool entirely.
Here's the correct "translation" of your Python solution to R:
library(dplyr)
getById <- function(id) {
  # create a connection and close it on exit
  conn <- DBI::dbConnect(
    drv = RMySQL::MySQL(),
    dbname = "shinydemo",
    host = "shiny-demo.csa7qlmguqrf.us-east-1.rds.amazonaws.com",
    username = "guest",
    password = "guest"
  )
  on.exit(DBI::dbDisconnect(conn))
  # get a specific row based on ID
  conn %>% tbl("City") %>% filter(ID == id) %>% collect()
}
parallel::mclapply(1:10, getById, mc.cores = 12)
I think I am having a serious issue managing the database connection pool in Go. I built a RESTful API using the Gorilla web toolkit, which works great when only a few requests are being sent to the server. But now I have started performing load testing using the loader.io site. I apologize for the long post, but I wanted to give you the full picture.
Before going further, here are some info on the server running the API and MySQL:
Dedicated Hosting Linux
8GB RAM
Go version 1.1.1
Database connectivity using go-sql-driver
MySQL 5.1
Using loader.io I can send 1000 GET requests/15 seconds without problems. But when I send 1000 POST requests/15 seconds I get lots of errors, all of which are ERROR 1040, too many connections. Many people have reported similar issues online. Note that I am only testing one specific POST request for now. For this POST request I ensured the following (which was also suggested by many others online):
I made sure not to Open and Close the *sql.DB in short-lived functions. So I created only a global variable for the connection pool, as you see in the code below, although I am open to suggestions here because I do not like to use global variables.
I made sure to use db.Exec when possible and only use db.Query and db.QueryRow when results are expected.
Since the above did not solve my problem, I tried setting db.SetMaxIdleConns(1000), which solved the problem for 1000 POST requests/15 seconds, meaning no more 1040 errors. Then I increased the load to 2000 POST requests/15 seconds and started getting ERROR 1040 again. I tried increasing the value in db.SetMaxIdleConns(), but that did not make a difference.
Here are some connection statistics I get from MySQL by running SHOW STATUS WHERE variable_name = 'Threads_connected';
For 1000 POST requests/15 seconds: observed #threads_connected ~= 100
For 2000 POST requests/15 seconds: observed #threads_connected ~= 600
I also increased the maximum connections for MySQL in my.cnf, but that did not make a difference. What do you suggest? Does the code look fine? If so, then the connections are probably just limited.
You will find a simplified version of the code below.
var db *sql.DB

func main() {
    db = DbConnect()
    db.SetMaxIdleConns(1000)

    http.Handle("/", r) // r is the Gorilla mux router (set up elsewhere)
    err := http.ListenAndServe(fmt.Sprintf("%s:%s", API_HOST, API_PORT), nil)
    if err != nil {
        fmt.Println(err)
    }
}

func DbConnect() *sql.DB {
    db, err := sql.Open("mysql", connectionString)
    if err != nil {
        fmt.Printf("Connection error: %s\n", err.Error())
        return nil
    }
    return db
}

func PostBounce(w http.ResponseWriter, r *http.Request) {
    userId, err := AuthRequest(r)
    // error checking

    // read request body and use json.Unmarshal

    bounceId, err := CreateBounce(userId, b)
    // return HTTP status code here
}

func AuthRequest(r *http.Request) (id int, err error) {
    // parse header and get username and password

    query := "SELECT Id FROM Users WHERE Username=? AND Password=PASSWORD(?)"
    err = db.QueryRow(query, username, password).Scan(&id)
    // error checking and return
}

func CreateBounce(userId int, bounce NewBounce) (bounceId int64, err error) {
    // initialize some variables

    query := "INSERT INTO Bounces (.....) VALUES (?, ?, ?, ?, ?, ?, ?, ?)"
    result, err := db.Exec(query, ......)
    // error checking

    bounceId, _ = result.LastInsertId()
    // return
}
Go database/sql doesn't prevent you from creating an infinite number of connections to the database. If there is an idle connection in the pool, it will be used, otherwise a new connection is created.
So, under load, the sql.DB used by your request handlers is probably finding no idle connections, and so new connections are created as needed. This churns for a bit, reusing idle connections when possible and creating new ones when needed, ultimately reaching the database's maximum number of connections. And, unfortunately, in Go 1.1 there isn't a convenient way (e.g. SetMaxOpenConns) to limit open connections.
Upgrade to a newer version of Go. In Go 1.2+ you get SetMaxOpenConns. Check the MySQL docs for a starting setting, then tune:
db.SetMaxOpenConns(100) //tune this
If you must use Go 1.1 you'll need to ensure in your code that *sql.DB is only being used by N clients at a time.
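One way to do that, sketched below with illustrative names and limits: guard every database call with a buffered channel acting as a counting semaphore, so at most N goroutines can use the *sql.DB at a time, which in turn bounds how many MySQL connections the pool can ever need.
package main

import (
    "database/sql"
    "log"

    _ "github.com/go-sql-driver/mysql"
)

var (
    db *sql.DB
    // dbSlots bounds concurrent use of db to 50 goroutines, which in
    // turn bounds how many connections the pool can ever open.
    dbSlots = make(chan struct{}, 50)
)

func authRequest(username, password string) (id int, err error) {
    dbSlots <- struct{}{}        // acquire a slot
    defer func() { <-dbSlots }() // release it when the query is done

    query := "SELECT Id FROM Users WHERE Username=? AND Password=PASSWORD(?)"
    err = db.QueryRow(query, username, password).Scan(&id)
    return
}

func main() {
    var err error
    db, err = sql.Open("mysql", "user:password@tcp(127.0.0.1:3306)/dbname")
    if err != nil {
        log.Fatal(err)
    }
    id, err := authRequest("alice", "secret")
    log.Println(id, err)
}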
@MattSelf's proposed solution is correct, but I ran into other issues. Here is exactly what I did to solve the problem (by the way, the server is running CentOS).
Since I have a dedicated server, I increased max_connections for MySQL:
In /etc/my.cnf I added the line max_connections=10000. Admittedly, that is more connections than I need.
Restart MySQL: service mysql restart
Changed ulimit -n, that is, increased the number of file descriptors that can be open.
To do that I made changes to two files:
In /etc/sysctl.conf I added the line
fs.file-max = 65536
In /etc/security/limits.conf I added the following lines:
* soft nproc 65535
* hard nproc 65535
* soft nofile 65535
* hard nofile 65535
Reboot your server
Upgraded Go to 1.3.3, as suggested by @MattSelf
Set
db.SetMaxOpenConns(10000)
Again, the number is too large for what I need, but this proved to me that things worked.
I ran a test using loader.io consisting of 5000 clients, each sending a POST request, all within 15 seconds. Everything went through without errors.
Something else to note: set back_log to a higher value in your my.cnf file, something like a few hundred or 1000. This will help handle more connections per second. See High connections per second.
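For reference, the relevant /etc/my.cnf lines would look roughly like this (the values are illustrative; tune them to your workload):
[mysqld]
max_connections = 10000
back_log = 1000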
I have a Go API endpoint that makes several MySQL queries. When the endpoint receives a small number of requests, it works just fine. However, I am now testing it using Apache Bench with 100 requests. The first 100 all went through; however, the second 100 caused this error to appear:
2014/01/15 12:08:03 http: panic serving 127.0.0.1:58602: runtime error: invalid memory address or nil pointer dereference
goroutine 973 [running]:
net/http.func·009()
/usr/local/Cellar/go/1.2/libexec/src/pkg/net/http/server.go:1093 +0xae
runtime.panic(0x402960, 0x9cf419)
/usr/local/Cellar/go/1.2/libexec/src/pkg/runtime/panic.c:248 +0x106
database/sql.(*Rows).Close(0x0, 0xc2107af540, 0x69)
/usr/local/Cellar/go/1.2/libexec/src/pkg/database/sql/sql.go:1576 +0x1e
store.findProductByQuery(0xc2107af540, 0x69, 0x0, 0xb88e80, 0xc21000ac70)
/Users/dennis.suratna/workspace/session-go/src/store/product.go:83 +0xe3
store.FindProductByAppKey(0xc210337748, 0x7, 0x496960, 0x6, 0xc2105eb1b0)
/Users/dennis.suratna/workspace/session-go/src/store/product.go:28 +0x11c
api.SessionHandler(0xb9eff8, 0xc2108ee200, 0xc2108f5750, 0xc2103285a0, 0x0, ...)
/Users/dennis.suratna/workspace/session-go/src/api/session_handler.go:31 +0x2fb
api.func·001(0xb9eff8, 0xc2108ee200, 0xc2108f5750, 0xc2103285a0)
/Users/dennis.suratna/workspace/session-go/src/api/api.go:81 +0x4f
reflect.Value.call(0x3ad9a0, 0xc2101ffdb0, 0x130, 0x48d520, 0x4, ...)
/usr/local/Cellar/go/1.2/libexec/src/pkg/reflect/value.go:474 +0xe0b
reflect.Value.Call(0x3ad9a0, 0xc2101ffdb0, 0x130, 0xc2103c4a00, 0x3, ...)
/usr/local/Cellar/go/1.2/libexec/src/pkg/reflect/value.go:345 +0x9d
github.com/codegangsta/inject.(*injector).Invoke(0xc2103379c0, 0x3ad9a0, 0xc2101ffdb0, 0x4311a0, 0x1db94e, ...)
It looks like it's not caused by the number of concurrent requests but rather by something that is not being closed properly. I am already closing every prepared statement that I create in my code. I am wondering if anyone has seen this before.
Edit:
This is how I am initializing my MySQL connection:
func InitStore(environment string) error {
    db, err := sql.Open("mysql", connStr(environment))
    ....
    S = &Store{
        Mysql:       db,
        Environment: environment,
    }
}
This happens only once, when I start the server.
OK, so I was able to solve this problem, and now I can send ~500 requests with concurrency 10 with no more broken pipe or too many connections errors.
I think it all comes down to following best practices. When you don't expect multiple rows to be returned, use QueryRow instead of Query, and chain it with Scan:
db.QueryRow(...).Scan(...)
If you don't expect rows to be returned and you're not going to reuse your statements, use Exec, not Prepare.
If you have a prepared statement or are querying multiple rows, don't forget to Close() it. (A short sketch of these points follows after the link below.)
Got all of the above from
https://github.com/go-sql-driver/mysql/issues/111
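A minimal sketch of those three points together (the table, columns and DSN are made up):
package main

import (
    "database/sql"
    "log"

    _ "github.com/go-sql-driver/mysql"
)

func examples(db *sql.DB) error {
    // 1. A single row expected: QueryRow chained with Scan, nothing to Close.
    var name string
    if err := db.QueryRow("SELECT name FROM products WHERE id = ?", 1).Scan(&name); err != nil {
        return err
    }

    // 2. No rows expected and no statement reuse: Exec, not Prepare.
    if _, err := db.Exec("UPDATE products SET views = views + 1 WHERE id = ?", 1); err != nil {
        return err
    }

    // 3. Multiple rows: always close the *sql.Rows when done.
    rows, err := db.Query("SELECT name FROM products WHERE price < ?", 10)
    if err != nil {
        return err
    }
    defer rows.Close()
    for rows.Next() {
        if err := rows.Scan(&name); err != nil {
            return err
        }
    }
    return rows.Err()
}

func main() {
    db, err := sql.Open("mysql", "user:password@tcp(127.0.0.1:3306)/dbname")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()
    if err := examples(db); err != nil {
        log.Println(err)
    }
}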
If you use Go 1.2.x you can use db.SetMaxOpenConns to tell the sql package to not open more than X connections. Queries that need a database connection after X connections are already open (and busy) will block until there's an available connection.
That being said: what are the next lines of the "stack trace"? Line ~1093 in http/server.go is the recover code that runs when your serve function panics. It looks more like you are mishandling some data and that makes it fail, or you are missing an error check and then try to process data when you were actually returned an error, etc.