Go write unix /tmp/mysql.sock: broken pipe when sending a lot of requests - mysql

I have a Go API endpoint that makes several MySQL queries. When the endpoint receives a small number of requests, it works just fine. However, I am now testing it using ApacheBench with 100 requests. The first 100 all went through, but the second 100 caused this error to appear:
2014/01/15 12:08:03 http: panic serving 127.0.0.1:58602: runtime error: invalid memory address or nil pointer dereference
goroutine 973 [running]:
net/http.func·009()
/usr/local/Cellar/go/1.2/libexec/src/pkg/net/http/server.go:1093 +0xae
runtime.panic(0x402960, 0x9cf419)
/usr/local/Cellar/go/1.2/libexec/src/pkg/runtime/panic.c:248 +0x106
database/sql.(*Rows).Close(0x0, 0xc2107af540, 0x69)
/usr/local/Cellar/go/1.2/libexec/src/pkg/database/sql/sql.go:1576 +0x1e
store.findProductByQuery(0xc2107af540, 0x69, 0x0, 0xb88e80, 0xc21000ac70)
/Users/dennis.suratna/workspace/session-go/src/store/product.go:83 +0xe3
store.FindProductByAppKey(0xc210337748, 0x7, 0x496960, 0x6, 0xc2105eb1b0)
/Users/dennis.suratna/workspace/session-go/src/store/product.go:28 +0x11c
api.SessionHandler(0xb9eff8, 0xc2108ee200, 0xc2108f5750, 0xc2103285a0, 0x0, ...)
/Users/dennis.suratna/workspace/session-go/src/api/session_handler.go:31 +0x2fb
api.func·001(0xb9eff8, 0xc2108ee200, 0xc2108f5750, 0xc2103285a0)
/Users/dennis.suratna/workspace/session-go/src/api/api.go:81 +0x4f
reflect.Value.call(0x3ad9a0, 0xc2101ffdb0, 0x130, 0x48d520, 0x4, ...)
/usr/local/Cellar/go/1.2/libexec/src/pkg/reflect/value.go:474 +0xe0b
reflect.Value.Call(0x3ad9a0, 0xc2101ffdb0, 0x130, 0xc2103c4a00, 0x3, ...)
/usr/local/Cellar/go/1.2/libexec/src/pkg/reflect/value.go:345 +0x9d
github.com/codegangsta/inject.(*injector).Invoke(0xc2103379c0, 0x3ad9a0, 0xc2101ffdb0, 0x4311a0, 0x1db94e, ...)
It looks like it's not caused by the number of concurrent requests but, rather, by something that is not properly closed. I am already closing every prepared statement that I create in my code. I am wondering if anyone has ever seen this before.
Edit:
This is how I am initializing my MySQL connection:
func InitStore(environment string) error {
    db, err := sql.Open("mysql", connStr(environment))
    ....
    S = &Store{
        Mysql:       db,
        Environment: environment,
    }
}
This happens only once, when I start the server.

OK, so I was able to solve this problem, and now I can send ~500 requests at a concurrency of 10 with no more "broken pipe" or "too many connections" errors.
I think it all comes down to following best practices. When you don't expect multiple rows to be returned, use QueryRow instead of Query and chain it with Scan:
db.QueryRow(...).Scan(...)
If you don't expect rows to be returned and you're not going to reuse your statements, use Exec, not Prepare.
If you have a prepared statement or are querying multiple rows, don't forget to Close() (a minimal sketch of these patterns follows the link below).
Got all of the above from https://github.com/go-sql-driver/mysql/issues/111
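For illustration, here is a minimal sketch of these three patterns; the products table, its columns, and the function name are made-up examples, not from the original code:

package store

import "database/sql"

func productExamples(db *sql.DB, appKey, category string) error {
    // Single expected row: QueryRow chained with Scan; the connection
    // is released back to the pool automatically once Scan has run.
    var name string
    if err := db.QueryRow("SELECT name FROM products WHERE app_key = ?", appKey).Scan(&name); err != nil {
        return err
    }

    // No rows expected and no statement reuse: Exec, no Prepare.
    if _, err := db.Exec("UPDATE products SET views = views + 1 WHERE app_key = ?", appKey); err != nil {
        return err
    }

    // Multiple rows: always close the *sql.Rows, ideally with defer.
    rows, err := db.Query("SELECT id, name FROM products WHERE category = ?", category)
    if err != nil {
        return err
    }
    defer rows.Close()
    for rows.Next() {
        var id int
        if err := rows.Scan(&id, &name); err != nil {
            return err
        }
    }
    return rows.Err()
}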

If you use Go 1.2.x you can use db.SetMaxOpenConns to tell the sql package not to open more than X connections. Queries that need a database connection after X connections are already open (and busy) will block until a connection becomes available.
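For example, a quick sketch along the lines of the InitStore function above (the limit values are arbitrary examples to be tuned, and connStr is the question's own helper):

func initStore(environment string) (*sql.DB, error) {
    db, err := sql.Open("mysql", connStr(environment))
    if err != nil {
        return nil, err
    }
    db.SetMaxOpenConns(50) // never open more than 50 connections at once (example value)
    db.SetMaxIdleConns(10) // keep up to 10 idle connections around for reuse
    return db, nil
}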
That being said: what are the next lines of the "stack trace"? Line ~1093 in http/server.go is the recover code that runs when your serve function fails. It looks more like you are just mishandling some data and that makes it fail, or you are missing an error check and then try to process data when you were actually returned an error, etc.

Related

How can we run queries concurrently, using goroutines?

I am using gorm v1 (ORM), Go version 1.14.
The DB connection is created at the start of my app,
and that DB is passed throughout the app.
I have a complex, long-running piece of functionality.
Let's say I have 10 sets of queries to run and the order doesn't matter.
So, what I did was
go queryset1(DB)
go queryset2(DB)
...
go queryset10(DB)
// here I have a wait, maybe via channel or WaitGroup.
Inside queryset1:
func queryset1(db *gorm.DB, /*wg or errChannel*/) {
    db.Count() // basic count query
    wg.Done() or errChannel <- nil
}
Now, the problem is that I encounter MySQL error 1040, "too many connections".
Why is this happening? Does every goroutine create a new connection?
If so, is there a way to check this and the "live connections" in MySQL
(not the SHOW STATUS variables like Connections)?
How can I concurrently query the DB?
Edit:
This guy has the same problem
The error is not directly related to go-gorm, but to the underlying MySQL configuration and your initial connection configuration. In your code, you can manage the following parameters during your initial connection to the database:
maximum open connections (the SetMaxOpenConns function)
maximum idle connections (the SetMaxIdleConns function)
maximum lifetime of a connection (the SetConnMaxLifetime function)
For more details, check the official docs or this article on how to get the maximum performance from your connection configuration.
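For illustration, a rough sketch of how these can be set with gorm v1 (github.com/jinzhu/gorm); the function name, DSN handling, and limit values are assumptions to be tuned for your workload:

package main

import (
    "time"

    "github.com/jinzhu/gorm"
    _ "github.com/jinzhu/gorm/dialects/mysql"
)

func openDB(dsn string) (*gorm.DB, error) {
    db, err := gorm.Open("mysql", dsn)
    if err != nil {
        return nil, err
    }
    sqlDB := db.DB()                           // underlying *sql.DB
    sqlDB.SetMaxOpenConns(20)                  // hard cap on concurrent connections
    sqlDB.SetMaxIdleConns(10)                  // connections kept ready for reuse
    sqlDB.SetConnMaxLifetime(30 * time.Minute) // recycle connections periodically
    return db, nil
}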
If you want to prevent a situation where each goroutine uses a separate connection, you can do something like this:
// restrict goroutines to be executed 5 at a time
connCh := make(chan bool, 5)

var wg sync.WaitGroup
wg.Add(10) // one per queryset goroutine

go queryset1(DB, &wg, connCh)
go queryset2(DB, &wg, connCh)
...
go queryset10(DB, &wg, connCh)

wg.Wait()
close(connCh)
Inside your queryset functions:
func queryset1(db *gorm.DB, wg *sync.WaitGroup, connCh chan bool) {
    connCh <- true
    db.Count() // basic count query
    <-connCh
    wg.Done()
}
The connCh will allow the first 5 goroutines to write into it and block the execution of the remaining goroutines until one of the first 5 goroutines takes a value from the connCh channel. This will prevent a situation where each goroutine starts its own connection. Some of the connections should be reused, but that also depends on the initial connection configuration.

go-sql-driver: get invalid connection when wait_timeout is 8h as default

One Sentence
Got a MySQL "invalid connection" issue even though MaxOpenConns is abundant and wait_timeout is 8h.
Detailed
I have a script that reads all records from table A, makes some transformation, and writes the resulting records to table B. The code works this way:
One goroutine scans table A, putting the records into a channel;
Four other goroutines (number configurable) concurrently consume from the above channel, accumulating 50 rows (batch size configurable) to insert into table B, then accumulating another 50 rows, and so on;
The scanner goroutine holds one *sql.DB, and the inserter goroutines share another *sql.DB (a rough sketch of this layout follows the environment details below).
go-sql-driver: either Version 1.4.1 (2018-11-14) or Version 1.5 (2020-01-07)
(The problem was encountered with 1.4.1, and the reproducible demo below uses 1.5.)
Go version: go1.13.15 darwin/amd64
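For reference, a rough, compilable sketch of the scanner/inserter layout described above; Row, scanTableA, insertBatchB, and the batching details are hypothetical placeholders, not the actual script:

package main

import "sync"

// Hypothetical placeholders for the real script's row type and helpers.
type Row struct{ ID int }

func scanTableA(out chan<- Row) { defer close(out) /* read table A, 1000 rows at a time ... */ }
func insertBatchB(batch []Row)  { /* INSERT the batch into table B via a shared *sql.DB ... */ }

func main() {
    rowCh := make(chan Row, 100000) // channel size is configurable

    // One scanner goroutine: reads table A and feeds the channel.
    go scanTableA(rowCh)

    // Four inserter goroutines: accumulate 50 rows, then write to table B.
    var wg sync.WaitGroup
    for i := 0; i < 4; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            batch := make([]Row, 0, 50)
            for row := range rowCh {
                batch = append(batch, row)
                if len(batch) == 50 {
                    insertBatchB(batch) // synchronous insert, so the slice can be reused
                    batch = batch[:0]
                }
            }
            if len(batch) > 0 {
                insertBatchB(batch) // flush the remainder
            }
        }()
    }
    wg.Wait()
}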
The invalid connection issue is almost consistently reproducible.
In a specific run, table A has 67227 records, the channel size is set to 100000, the table A scanner (1 goroutine) reads 1000 rows at a time, and the table B inserters (4 goroutines) write 50 rows at a time. It ends up with 67127 records in table B (2*50 lost), and 2 lines of error output in the console:
[mysql] 2020/12/11 21:54:18 packets.go:36: read tcp x.x.x.x:64062->x.x.x.x:3306: read: operation timed out
[mysql] 2020/12/11 21:54:21 packets.go:36: read tcp x.x.x.x:64070->x.x.x.x:3306: read: operation timed out
(The number of error lines varies when I reproduce the issue; it's usually 1, 2 or 3. N error lines coincide with N*50 records failing to insert into table B.)
And from my log file, it prints invalid connection:
2020/12/11 21:54:18 main.go:135: [goroutine 56] BatchExecute: BatchInsertPlace(): SqlDb.ExecContext(): invalid connection
Stats={MaxOpenConnections:0 OpenConnections:4 InUse:3 Idle:1 WaitCount:0 WaitDuration:0s MaxIdleClosed:14 MaxLifetimeClosed:0}
2020/12/11 21:54:21 main.go:135: [goroutine 55] BatchExecute: BatchInsertPlace(): SqlDb.ExecContext(): invalid connection
Stats={MaxOpenConnections:0 OpenConnections:4 InUse:3 Idle:1 WaitCount:0 WaitDuration:0s MaxIdleClosed:14 MaxLifetimeClosed:0}
Trials and observations
By printing each successful/failed write operation with the goroutine id in the log, it appears that the error always happens when any one of the 4 inserting goroutines has an interval of over ~45 seconds between 2 consecutive writes. I think it simply takes this long to accumulate 50 records before inserting them into table B.
In contrast, when I happened to make a change so that the 4 inserting goroutines write evenly (i.e. no one has a much longer writing interval than the others), the error is not seen. Repeated 3 times.
It looks like one error only affects one batch write operation, and the following batches work well. So why not retry the errored batch? I supposed one retry would get it through. Still, I don't mind retrying until success:
var retryExecTillSucc = func(goroutineId int, records []*MyDto) {
    err := inserter.BatchInsert(records)
    for { // retry until success. This is a workaround for the 'invalid connection' issue
        if err == nil {
            break
        }
        logger.Printf("[goroutine %v] BatchExecute: %v \nStats=%+v\n", goroutineId, err, inserter.RdsClient.SqlDb.Stats())
        err = inserter.retryBatchInsert(records)
    }
    logger.Printf("[goroutine %v] BatchExecute: Success \nStats=%+v\n", goroutineId, inserter.RdsClient.SqlDb.Stats())
}
Surprisingly, with this change, retries of the errored batch keep getting the error and never succeed...
Summary
It looks obvious that one (idle) connection was broken when the error occurred, but my questions are:
MySQL's wait_timeout is set to 8h, so why does the connection time out so quickly?
Since MaxOpenConns is not set, it shouldn't be a limitation, especially considering the mere 4 OpenConnections in the log.
What else should I check as a potential root cause?
(Too long, but I just hope to put it clearly and get some advice~)
Update
Minimal, reproducible example, including:
Code
One sample log file
MySQL error log
Do you use a Context? I suppose the read timeout is caused by a context timeout, or by the readTimeout parameter.
MySQL doesn't provide a safe and efficient cancellation mechanism. When the context is cancelled or the readTimeout is reached, DB.ExecContext returns without terminating the connection it was using. That causes "invalid connection" the next time the connection is used.
If you want to limit the execution time of a long query, you can use the MAX_EXECUTION_TIME hint instead of a context.
See https://dev.mysql.com/doc/refman/5.7/en/optimizer-hints.html#optimizer-hints-execution-time for reference.
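For example, a rough sketch of using the hint from Go; the table and column names and the 5000 ms value are made-up, and note that the hint applies to SELECT statements only:

// listPlaces bounds a long SELECT with the server-side optimizer hint
// instead of a context deadline, so the driver's connection stays valid.
func listPlaces(db *sql.DB, region string) error {
    rows, err := db.Query(
        "SELECT /*+ MAX_EXECUTION_TIME(5000) */ id, name FROM place WHERE region = ?",
        region,
    )
    if err != nil {
        return err
    }
    defer rows.Close()

    for rows.Next() {
        var id int64
        var name string
        if err := rows.Scan(&id, &name); err != nil {
            return err
        }
        // use the row ...
    }
    return rows.Err()
}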

Handling database open and close in subroutines

I'm implementing code where I need to perform a few actions at fixed intervals.
A few of them involve fetching data from a MySQL database.
To schedule these actions at fixed intervals I'm using gocron. It is working quite well.
For the database, as of now, I'm creating an instance at the start of the main program and passing it to the subroutines. I'm using https://github.com/jmoiron/sqlx to work with the DB.
The flow of code is:
i - initialise resources, e.g. db = sql.Open; put the DB in a common struct to pass to all subroutines
ii - schedule actions using gocron (passing resources as needed)
iii - actions are specific subroutines that perform tasks as needed using the given resources (e.g. the DB)
I have a few cases where the MySQL service needs to be restarted.
Then, as expected, I get errors stating invalid connection, something like:
[mysql] packets.go:33: unexpected EOF
[mysql] packets.go:130: write tcp 127.0.0.1:36191->127.0.0.1:3306: write: broken pipe
[mysql] connection.go:312: invalid connection
To get around this, I changed the implementation to acquire the DB connection within the subroutine and close it with defer db.Close(). With this I'm getting errors about too many open connections. I have checked for proper closing of rows, as well as correct usage of Scan, and the recommendations appear to be followed.
I would like to understand how to go about DB open and close handling in my case.
You can use sync.Once to prevent this:
var conn *sql.DB // Set package-wide, but not exported
var once sync.Once

func GetConnection() *sql.DB {
    once.Do(func() {
        var err error
        if conn, err = sql.Open("postgres", "<credentials>"); err != nil {
            log.Panic(err)
        }
        conn.SetMaxOpenConns(20) // Sane default
        conn.SetMaxIdleConns(0)
        conn.SetConnMaxLifetime(time.Nanosecond)
    })
    return conn
}
Read this: https://aaronoellis.com/articles/preventing-max-connection-errors-in-go
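For example, a gocron-scheduled subroutine could then reuse the shared pool instead of opening and closing its own *sql.DB. This is only a sketch; the function name, query, and table are placeholders:

func fetchJob() {
    db := GetConnection() // same *sql.DB every time, created once

    rows, err := db.Query("SELECT id, payload FROM tasks WHERE done = 0")
    if err != nil {
        log.Println("fetchJob:", err)
        return
    }
    defer rows.Close()

    for rows.Next() {
        var id int
        var payload string
        if err := rows.Scan(&id, &payload); err != nil {
            log.Println("fetchJob:", err)
            return
        }
        // process the row ...
    }
    if err := rows.Err(); err != nil {
        log.Println("fetchJob:", err)
    }
}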

Neo4j server hangs every 2 hours consistently. Please help me understand if something is wrong with the configuration

We have a neo4j graph database with around 60 million nodes and an equivalent relationships.
We have been facing consistent packet drops, delays in processing, and a completely hung server after 2 hours. We have had to shut down and restart our servers every time this happens, and we are having trouble understanding where we went wrong with our configuration.
We are seeing the following kind of exceptions in the console.log file -
java.lang.IllegalStateException: s=DISPATCHED i=true a=null o.e.jetty.server.HttpConnection - HttpConnection#609c1158{FILLING}
java.lang.IllegalStateException: s=DISPATCHED i=true a=null o.e.j.util.thread.QueuedThreadPool
java.lang.IllegalStateException: org.eclipse.jetty.util.SharedBlockingCallback$BlockerTimeoutException
o.e.j.util.thread.QueuedThreadPool - Unexpected thread death: org.eclipse.jetty.util.thread.QueuedThreadPool$3#59d5a975 in qtp1667455214{STARTED,14<=21<=21,i=0,q=58}
org.eclipse.jetty.server.Response - Committed before 500 org.neo4j.server.rest.repr.OutputFormat$1#39beaadf
o.e.jetty.servlet.ServletHandler - /db/data/cypher java.lang.IllegalStateException: Committed at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1253) ~[jetty-server-9.2.
org.eclipse.jetty.server.HttpChannel - /db/data/cypher java.lang.IllegalStateException: Committed at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1253) ~[jetty-server-9.2.
org.eclipse.jetty.server.HttpChannel - Could not send response error 500: java.lang.IllegalStateException: Committed
o.e.jetty.server.ServerConnector - Stopped
o.e.jetty.servlet.ServletHandler - /db/data/cypher org.neo4j.graphdb.TransactionFailureException: Transaction was marked as successful, but unable to commit transaction so rolled back.
We are using the Neo4j Enterprise Edition 2.2.5 server in SINGLE/NON-CLUSTER mode on an Azure D-series machine (8-core CPU, 56 GB RAM, Ubuntu 14.04 LTS) with an attached 500 GB data disk.
Here is a snapshot of the sizes of neostore files
8.5G Oct 2 15:48 neostore.propertystore.db
15G Oct 2 15:48 neostore.relationshipstore.db
2.5G Oct 2 15:48 neostore.nodestore.db
6.9M Oct 2 15:48 neostore.relationshipgroupstore.db
3.7K Oct 2 15:07 neostore.schemastore.db
145 Oct 2 15:07 neostore.labeltokenstore.db
170 Oct 2 15:07 neostore.relationshiptypestore.db
The Neo4j configuration is as follows -
Allocated 30GB to file buffer cache (dbms.pagecache.memory=30G)
Allocated 20GB to JVM heap memory (wrapper.java.initmemory=20480, wrapper.java.maxmemory=20480)
Using the default hpc(High performance) type cache.
Forcing the RULE planner by default (dbms.cypher.planner=RULE)
Maximum threads processing queries is 16 (twice the number of cores) - org.neo4j.server.webserver.maxthreads=16
Transaction timeout of 60 seconds - org.neo4j.server.transaction.timeout=60
Guard Timeout if query execution time is greater than 10 seconds - org.neo4j.server.webserver.limit.executiontime=10000
Rest of the settings are default
We actually want to set up a cluster of 3 nodes, but before that we want to be sure our basic configuration is correct. Please help us.
--------------------------------------------------------------------------
EDITED to ADD Query Sample
Typically our cypher query frequency is 18K queries in an hour with an average of roughly 5-6 queries a second. There are also times when there are about 80 queries per second.
Our Typical Queries look like the ones below
match (a:TypeA {param:{param}})-[:RELA]->(d:TypeD) with distinct d,a skip {skip} limit 100 optional match (d)-[:RELF]->(c:TypeC)<-[:RELF]-(b:TypeB)<-[:RELB]-(a) with distinct d,a,collect(distinct b.bid) as bids,collect(distinct c.param3) as param3Coll optional match (d)-[:RELE]->(p:TypeE)<-[:RELE]-(b1:TypeB)<-[:RELB]-(a) with distinct d as distD,bids+collect(distinct b1.bid) as tbids,param3Coll,collect(distinct p.param4) as param4Coll optional match (distD)-[:RELC]->(f:TypeF) return id(distD),distD.param5,exists((distD)<-[:RELG]-()) as param6, tbids,param3Coll,param4Coll,collect(distinct id(f)) as fids
match (a:TypeA {param:{param}})-[:RELB]->(b) return count(distinct b)
MATCH (a:TypeA{param:{param}})-[r:RELD]->(a1)-[:RELH]->(h) where r.param1=true with a,a1,h match (h)-[:RELL]->(d:TypeI) where (d.param2/2)%2=1 optional match (a)-[:RELB]-(b)-[:RELM {param3:true}]->(c) return a1.param,id(a1),collect(b.bid),c.param5
match (a:TypeA {param:{param}}) match (a)-[:RELB]->(b) with distinct b,a skip {skip} limit 100 match (a)-[:RELH]->(h1:TypeH) match (b)-[:RELF|RELE]->(x)<-[:RELF|RELE]-(h2:TypeH)<-[:RELH]-(a1) optional match (a1)<-[rd:RELD]-(a) with distinct a1,a,h1,b,h2,rd.param1 as param2,collect(distinct x.param3) as param3s,collect(distinct x.param4) as param4s optional match (a1)-[:RELB]->(b1) where b1.param7 in [0,1] and exists((b1)-[:RELF|RELE]->()<-[:RELF|RELE]-(h1)) with distinct a1,a,b,h2,param2,param3s,param4s,b1,case when param2 then false else case when ((a1.param5 in [2,3] or length(param3s)>0) or (a1.param5 in [1,3] or length(param4s)>0)) then case when b1.param7=0 then false else true end else false end end as param8 MERGE (a)-[r2:RELD]->(a1) on create set r2.param6=true on match set r2.param6=case when param8=true and r2.param9=false then true else false end MERGE (b)-[r3:RELM]->(h2) SET r2.param9=param8, r3.param9=param8
MATCH (a:TypeA {param:{param}})-[:RELI]->(g:TypeG {type:'type1'}) match (g)<-[r:RELI]-(a1:TypeA)-[:RELJ]->(j)-[:RELK]->(g) return distinct g, collect(j.displayName), collect(r.param1), g.gid, collect(a1.param),collect(id(a1))
match (a:TypeA {param:{param}})-[r:RELD {param2:true}]->(a1:TypeA)-[:RELH]->(b:TypeE) remove r.param2 return id(a1),b.displayName, b.firstName,b.lastName
match (a:TypeA {param:{param}})-[:RELA]->(b:TypeB) return a.param1,count(distinct id(b))
MATCH (a:TypeA {param:{param}}) set a.param1=true;
match (a:TypeE)<-[r:RELE]-(b:TypeB) where a.param4 in {param4s} delete r return count(b);
MATCH (a:TypeA {param:{param}}) return id(a);
Adding a few more strange things I have been noticing....
I have stopped all my web servers, so currently there are no incoming requests to Neo4j. However, I see that there are about 40K open file handles in the TCP CLOSE_WAIT state, implying the clients have closed their connections because of timeouts and Neo4j has not processed and responded to those requests. I also see (from messages.log) that the Neo4j server is still processing queries, and as it does this, the 40K open file handles slowly reduce. By the time I write this post there are about 27K open file handles in the TCP CLOSE_WAIT state.
Also, I see that the queries are not processed continuously. Every once in a while I see a pause in messages.log, along with these messages about log rotation because of some out-of-order sequence, as below:
Rotating log version:5630
2015-10-04 05:10:42.712+0000 INFO [o.n.k.LogRotationImpl]: Log Rotation [5630]: Awaiting all transactions closed...
2015-10-04 05:10:42.712+0000 INFO [o.n.k.i.s.StoreFactory]: Waiting for all transactions to close...
committed: out-of-order-sequence:95494483 [95494476]
committing: 95494483
closed: out-of-order-sequence:95494480 [95494246]
2015-10-04 05:10:43.293+0000 INFO [o.n.k.LogRotationImpl]: Log Rotation [5630]: Starting store flush...
2015-10-04 05:10:44.941+0000 INFO [o.n.k.i.s.StoreFactory]: About to rotate counts store at transaction 95494483 to [/datadrive/graph.db/neostore.counts.db.b], from [/datadrive/graph.db/neostore.counts.db.a].
2015-10-04 05:10:44.944+0000 INFO [o.n.k.i.s.StoreFactory]: Successfully rotated counts store at transaction 95494483 to [/datadrive/graph.db/neostore.counts.db.b], from [/datadrive/graph.db/neostore.counts.db.a].
I also see these messages once in a while
2015-10-04 04:59:59.731+0000 DEBUG [o.n.k.EmbeddedGraphDatabase]: NodeCache array:66890956 purge:93 size:1.3485746GiB misses:0.80978173% collisions:1.9829895% (345785) av.purge waits:13 purge waits:0 avg. purge time:110ms
or
2015-10-04 05:10:20.768+0000 DEBUG [o.n.k.EmbeddedGraphDatabase]: RelationshipCache array:66890956 purge:0 size:257.883MiB misses:10.522135% collisions:11.121769% (5442101) av.purge waits:0 purge waits:0 avg. purge time:N/A
All of this is happening while there are no incoming requests and Neo4j is processing the old pending ~40K requests I mentioned above.
Since it is a dedicated server, shouldn't the server be processing the queries continuously without such a large pending queue? Am I missing something here? Please help me.
I didn't go completely over your queries. You should examine each of the queries you send often by prefixing it with PROFILE or EXPLAIN to see the query plan and get an idea of how many accesses it causes.
E.g. the second match in the following query looks expensive, since the two patterns are not connected to each other:
MATCH (a:TypeA{param:{param}})-[r:RELD]->(a1)-[:RELH]->(h) where r.param1=true with a,a1,h match (m)-[:RELL]->(d:TypeI) where (d.param2/2)%2=1 optional match (a)-[:RELB]-(b)-[:RELM {param3:true}]->(c) return a1.param,id(a1),collect(b.bid),c.bPhoto
Also enable garbage collection logging in neo4j-wrapper.conf and check if you're suffering from long pauses. If so, consider reducing the heap size.
It looks like this issue requires more research on your side, but here are some things from my experience.
TL;DR: I had a similar issue with my own unmanaged extension, where transactions were not properly handled.
Language/connector
What language/connector is used in your application?
You should verify that:
If some popular open-source library is used - make sure your application is using the latest version. There may be a bug in your connector.
If you have your own, hand-written solution that works with the REST API - verify that ALL HTTP requests are closed on the client side.
Extension/plugins
It's quite easy to mess things up if custom-written extensions/plugins are used.
What should be checked:
All transactions are always closed (try-with-resources is used)
Neo4j settings
Verify your server configuration. For example, if you have a large value for org.neo4j.server.transaction.timeout and you don't handle transactions properly on the client side, you can end up with a lot of running transactions.
Monitoring
You are using the Enterprise version. That means you have access to JMX. It's a good idea to check information about active Locks & Transactions.
Another Neo4j version
Maybe you can try another Neo4j version. For example 2.3.0-M03.
This will give answers to questions like:
Is this a Neo4j 2.2.5 bug?
Is this a misconfiguration of the existing Neo4j installation?
Linux configuration
Check your Linux configuration.
What is in your /etc/sysctl.conf? Are there any invalid/unrelated settings?
Another server
You can try to spin up another server (e.g. a VM at DigitalOcean), deploy the database there, and load it with Gatling.
Maybe your server has some invalid configuration?
Try to get rid of everything that could be a cause of the problem, to make it easier to find.

Golang RESTful API load testing causing too many database connections

I think I am having a serious issue managing the database connection pool in Golang. I built a RESTful API using the Gorilla web toolkit, which works great when only a few requests are being sent to the server. But now I have started performing load testing using the loader.io site. I apologize for the long post, but I wanted to give you the full picture.
Before going further, here is some info on the server running the API and MySQL:
Dedicated Hosting Linux
8GB RAM
Go version 1.1.1
Database connectivity using go-sql-driver
MySQL 5.1
Using loader.io I can send 1000 GET requests/15 seconds without problems. But when I send 1000 POST requests/15 seconds I get lots of errors, all of which are ERROR 1040: too many database connections. Many people have reported similar issues online. Note that I am only testing one specific POST request for now. For this POST request I ensured the following (which was also suggested by many others online):
I made sure not to Open and Close *sql.DB in short-lived functions. So I created only a global variable for the connection pool, as you see in the code below, although I am open to suggestions here because I do not like to use global variables.
I made sure to use db.Exec when possible and to only use db.Query and db.QueryRow when results are expected.
Since the above did not solve my problem, I tried to set db.SetMaxIdleConns(1000), which solved the problem for 1000 POST requests/15 seconds. Meaning no more 1040 errors. Then I increased the load to 2000 POST requests/15 seconds and I started getting ERROR 1040 again. I tried to increase the value in db.SetMaxIdleConns(), but that did not make a difference.
Here are some connection statistics I get from the MySQL database on the number of connections, by running SHOW STATUS WHERE variable_name = 'Threads_connected';
For 1000 POST requests/15 seconds: observed #threads_connected ~= 100
For 2000 POST requests/15 seconds: observed #threads_connected ~= 600
I also increased the maximum connections for MySQL in my.cnf, but that did not make a difference. What do you suggest? Does the code look fine? If so, then the connections are probably just limited.
You will find a simplified version of the code below.
var db *sql.DB

func main() {
    db = DbConnect()
    db.SetMaxIdleConns(1000)
    http.Handle("/", r)
    err := http.ListenAndServe(fmt.Sprintf("%s:%s", API_HOST, API_PORT), nil)
    if err != nil {
        fmt.Println(err)
    }
}

func DbConnect() *sql.DB {
    db, err := sql.Open("mysql", connectionString)
    if err != nil {
        fmt.Printf("Connection error: %s\n", err.Error())
        return nil
    }
    return db
}

func PostBounce(w http.ResponseWriter, r *http.Request) {
    userId, err := AuthRequest(r)
    //error checking
    //read request body and use json.Unmarshal
    bounceId, err := CreateBounce(userId, b)
    //return HTTP status code here
}

func AuthRequest(r *http.Request) (id int, err error) {
    //parse header and get username and password
    query := "SELECT Id FROM Users WHERE Username=? AND Password=PASSWORD(?)"
    err = db.QueryRow(query, username, password).Scan(&id)
    //error checking and return
}

func CreateBounce(userId int, bounce NewBounce) (bounceId int64, err error) {
    //initialize some variables
    query := "INSERT INTO Bounces (.....) VALUES (?, ?, ?, ?, ?, ?, ?, ?)"
    result, err := db.Exec(query, ......)
    //error checking
    bounceId, _ = result.LastInsertId()
    //return
}
Go's database/sql doesn't prevent you from creating an infinite number of connections to the database. If there is an idle connection in the pool, it will be used; otherwise a new connection is created.
So, under load, your request handlers' sql.DB is probably finding no idle connections, and so a new connection is created when needed. This churns for a bit, reusing idle connections when possible and creating new ones when needed, ultimately reaching the maximum connections for the DB. And, unfortunately, in Go 1.1 there isn't a convenient way (e.g. SetMaxOpenConns) to limit open connections.
Upgrade to a newer version of Golang. In Go 1.2+ you get SetMaxOpenConns. And check out the MySQL docs for a starting setting, then tune:
db.SetMaxOpenConns(100) //tune this
If you must use Go 1.1, you'll need to ensure in your code that *sql.DB is only being used by N clients at a time.
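One possible sketch is a buffered channel used as a semaphore around the database calls; the limit of 100 and the wrapper function name are illustrative assumptions, reusing the question's global db and the AuthRequest query:

// dbSem caps concurrent use of the shared *sql.DB on Go 1.1,
// where SetMaxOpenConns is not available.
var dbSem = make(chan struct{}, 100)

func authRequestLimited(username, password string) (id int, err error) {
    dbSem <- struct{}{}        // acquire a slot; blocks when 100 calls are in flight
    defer func() { <-dbSem }() // release the slot

    query := "SELECT Id FROM Users WHERE Username=? AND Password=PASSWORD(?)"
    err = db.QueryRow(query, username, password).Scan(&id)
    return
}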
@MattSelf's proposed solution is correct, but I ran into other issues. Here is exactly what I did to solve the problem (by the way, the server runs CentOS).
Since I have a dedicated server I increased the max_connections for MySQL
In /etc/my.cnf I added the line max_connections=10000. Although, that is more connections than what I need.
Restart MySQL: service mysql restart
Changed ulimit -n, i.e. increased the number of file descriptors that can be open.
To do that I made changes to two files:
In /etc/sysctl.conf I added the line
fs.file-max = 65536
In /etc/security/limits.conf I added the following lines:
* soft nproc 65535
* hard nproc 65535
* soft nofile 65535
* hard nofile 65535
Reboot your server
Upgraded Go to 1.3.3, as suggested by @MattSelf.
Set
db.SetMaxOpenConns(10000)
Again the number is too large for what I need, but this proved to me that things worked.
I ran a test using loader.io that consists of 5000 clients each sending a POST request, all within 15 seconds. Everything went through without errors.
Something else to note: set back_log to a higher value in your my.cnf file, something like a few hundred or 1000. This will help handle more connections per second. See High connections per second.