SQL Server "network-related or instance-specific error" once a day or so (perplexed!) - linq-to-sql

We are experiencing the same error as this StackOverflow Q ...
System.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 0 - A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.)
at System.Data.ProviderBase.DbConnectionPool.GetConnection(DbConnection owningObject)
at System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection)
at System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory)
at System.Data.SqlClient.SqlConnection.Open()
at System.Data.Linq.SqlClient.SqlConnectionManager.UseConnection(IConnectionUser user)
at System.Data.Linq.SqlClient.SqlProvider.get_IsSqlCe()
at System.Data.Linq.SqlClient.SqlProvider.InitializeProviderMode()
at System.Data.Linq.SqlClient.SqlProvider.System.Data.Linq.Provider.IProvider.Execute(Expression query)
... except that in the referenced StackOverflow Q, they need to restart SQL Server once the error occurs - and we do not. We'll get this error once a day, or once every few days - and all is fine after the error occurs, until the next time it occurs.
This makes us think it's not a "forgot to close connections" issue. We have a moderately busy ASP.NET 4.0 WebForms / SQL Server 2008 R2 app; but we're quite positive we're not exceeding the max # of database connections.
Any thoughts on this problem, or an approach to diagnose?

Thought I would comment on our progress with this.
While none of the SQL Server documentation/articles/blogs mention that this error can be caused by server busyness, I found a forum posting where some seasoned IT pro named Matt Neerincx states that it can be, as follows:
Possible reasons for this error include:
1. Poor network link from client to server.
2. Server is very busy (meaning high CPU) and cannot respond to new connection attempts.
3. Server is running out of memory (so high memory usage for SQL).
4. tcp-ip layer on client is over-saturated with connection attempts so tcp-ip layer rejects the connection.
5. tcp-ip layer on server side is over-saturated with connection attempts and so tcp-ip layer is rejecting new connections.
6. With SQL 2005 SP2 and later there could be a custom login trigger that rejects your connection.
You can increase the connect timeout to potentially alleviate issues #2, #3, #4, #5. Setting a longer connect timeout means the driver will try longer to connect and may eventually succeed.
Determining the root cause of these intermittent failures is unfortunately not super easy. What I normally do is start by examining the server environment. Is the server constantly running at high CPU, for example? This points to #2. Is the server using a huge amount of memory? This points to #3. You can run SQL Profiler to monitor logins and look for patterns of logins; perhaps every morning at 9AM there is a flurry of connections, etc.
So we are presently walking down this path - reducing the # of queries that execute at the same time in some of our batch queries, optimizing some of our queries, etc.
Also, in our app connection string, we increased the connection timeout, and set Min Pool Size to 20 (thinking it's good to try to ensure some existing, unused connections for the app to grab, rather than needing to establish a new connection).
At this moment, it's been almost 48 hours without receiving the error; making us very hopeful.
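The quoted advice about looking for patterns (for example, a flurry of connections every morning at 9AM) can also be applied to the failures themselves. The following is only a minimal sketch, assuming the failure times have already been pulled out of an application error log as ISO-8601 strings; the function name `failures_by_hour` and the sample timestamps are hypothetical, not something from our app.

```python
from collections import Counter
from datetime import datetime


def failures_by_hour(timestamps):
    """Count failures per hour of day (0-23) from ISO-8601 timestamp strings."""
    counts = Counter(datetime.fromisoformat(ts).hour for ts in timestamps)
    return dict(sorted(counts.items()))


# Hypothetical failure times, made up for illustration only.
sample = [
    "2012-03-01T09:02:11",
    "2012-03-02T09:04:55",
    "2012-03-04T09:01:30",
    "2012-03-05T14:37:02",
]
print(failures_by_hour(sample))
```

If most failures land in the same hour, that is a hint to look at what else runs on the server at that time (batch jobs, backups, login bursts).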

Related

Azure database for MySQL DB 5.7 Transient handling in .net core

I am creating a .NET Core 2.1 MVC application that uses Azure Database for MySQL 5.7.
I have read the links below, but they seem to apply only to MS SQL:
https://learn.microsoft.com/en-us/azure/mysql/concepts-high-availability
https://learn.microsoft.com/en-us/azure/architecture/best-practices/retry-service-specific
Is transient fault handling not possible for MySQL? Please point me to similar pages that cover MySQL.
A transient error, also known as a transient fault, is an error that will resolve itself. Most typically these errors manifest as a connection to the database server being dropped. Also new connections to a server can't be opened. Transient errors can occur for example when hardware or network failure happens.
Transient errors should be handled using retry logic. Situations that must be considered:
An error occurs when you try to open a connection
An idle connection is dropped on the server side. When you try to issue a command it can't be executed
An active connection that currently is executing a command is dropped.
The first and second cases are fairly straightforward to handle. Try to open the connection again. When you succeed, the transient error has been mitigated by the system. You can use your Azure Database for MySQL again. We recommend having waits before retrying the connection. Back off if the initial retries fail. This way the system can use all resources available to overcome the error situation. A good pattern to follow is:
Wait for 5 seconds before your first retry.
For each following retry, increase the wait exponentially, up to 60 seconds.
Set a max number of retries at which point your application considers the operation failed.
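The pattern above (5 seconds before the first retry, exponential growth capped at 60 seconds, a maximum number of retries) could be sketched roughly as follows. This is an illustrative sketch, not code from the Azure documentation: `backoff_delays` and `connect_with_retry` are hypothetical names, `connect` stands for whatever function opens your connection, and doubling is just one assumed way to grow the wait exponentially.

```python
import time


def backoff_delays(max_retries, first_wait=5.0, max_wait=60.0):
    """Yield the wait before each retry: 5, 10, 20, 40, 60, 60, ..."""
    wait = first_wait
    for _ in range(max_retries):
        yield min(wait, max_wait)
        wait *= 2


def connect_with_retry(connect, max_retries=5, sleep=time.sleep):
    """Call connect(); on failure wait and retry, giving up after max_retries."""
    delays = backoff_delays(max_retries)
    while True:
        try:
            return connect()
        except Exception:
            # Assumption: every exception is treated as transient here.
            # Real code should catch only the driver's connection errors.
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: the operation is considered failed
            sleep(delay)
```

The `sleep` parameter exists only so the waits can be replaced in tests. In a real application you would narrow the `except` clause to the connection errors raised by your MySQL driver, so that programming errors are not retried.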
The Azure Database for MySQL documentation covers this in more detail, including how to troubleshoot connection issues.

Increase the amount of connections in my server MySQL

I have applications that connect to a remote server (MySQL 5.5 on Windows Server 2012). At first I started receiving a "too many connections" message, which I solved by increasing the max_connections value in my.ini to 500. Then I started getting a "can't create new thread" message, so I decreased the timeouts to avoid idle connections holding a socket, which didn't completely work. Now I get odd messages like 'file not found'; as soon as I restart the service, the messages stop and everything works correctly.
The problem occurs when the server reaches around 170 connections at the same time.
Is there some configuration I'm missing? I really don't know what information you need to give me a hint to fix this. I mean, there are servers that accept a lot more connections at the same time, right? What am I missing?
RAM and CPU usage on the system doesn't go above 35-40% at max connections (170).
Edit: The error occurs in two places: when running a query, or when attempting to connect. It's as if the MySQL service rejects the attempt. The client app is written in VB6 (ODBC connector). The app opens the connection, executes, and closes it.
Note: I have full control over client app and server config.

A transport-level error has occurred when receiving results from the server in sql server 2008

I ran the SELECT statement below and got this error. Can anyone help me?
select top 100
MenuID, MenuGroup, MenuName, ObjectName, ObjectTitle
from tblMenuMaster
where
ApplicationID = 3
and recStatus = 'A'
Error Message.
A transport-level error has occurred when receiving results from the
server. (provider: TCP Provider, error: 0 - The semaphore timeout
period has expired.)
I have already applied a nonclustered index on tblMenuMaster (MenuGroup, MenuName, ObjectName, ObjectTitle).
It's one of the random errors that SQL Server can produce. If you reboot your machine and then run the query again, it usually won't recur.
You can check this MSDN blog for the details, however.
Removing Connections
The connection pooler removes a connection from the pool after it has
been idle for a long time, or if the pooler detects that the
connection with the server has been severed.
Note that a severed connection can be detected only after attempting
to communicate with the server. If a connection is found that is no
longer connected to the server, it is marked as invalid.
Invalid connections are removed from the connection pool only when
they are closed or reclaimed.
If a connection exists to a server that has disappeared, this
connection can be drawn from the pool even if the connection pooler
has not detected the severed connection and marked it as invalid.
This is the case because the overhead of checking that the connection
is still valid would eliminate the benefits of having a pooler by
causing another round trip to the server to occur.
When this occurs, the first attempt to use the connection will detect
that the connection has been severed, and an exception is thrown.
I think the server connection must have been terminated, which is why the error occurs. You need to reconnect to the server and run the query again.
However, if this occurs frequently, you need to get in touch with your DBA.
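The "reconnect and run the query again" advice can be sketched in a driver-independent way as below. This is a sketch under assumptions, not the pooler's actual behavior: `run_with_reconnect`, `get_connection`, `run_query` and `is_connection_error` are hypothetical names supplied by the caller, and a single retry is only safe for statements that can be repeated without side effects, such as the SELECT in the question.

```python
def run_with_reconnect(get_connection, run_query, is_connection_error):
    """Run run_query(conn); if the connection turns out to be dead, reconnect once."""
    conn = get_connection()
    try:
        return run_query(conn)
    except Exception as exc:
        # Only swallow errors the caller identifies as a severed connection.
        if not is_connection_error(exc):
            raise
    # The first connection was stale; a fresh one gets exactly one more attempt.
    conn = get_connection()
    return run_query(conn)
```

If the second attempt also fails, the exception propagates to the caller, which matches the advice to escalate when the problem is frequent rather than a one-off stale connection.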

Intermittently can't connect to mysql on AWS RDS (Error 2003)

We are having an intermittent issue with connections to our mysql server timing out.
The error we are receiving is as follows.
(2003, 'Can\'t connect to MySQL server on \'<connection>\' ((2013, "Lost connection to MySQL server during query (error(104, \'Connection reset by peer\'))"))')
Callstack:
File "/usr/lib64/python2.7/site-packages/pymysql/connections.py", line 818, in _connect
2003, "Can't connect to MySQL server on %r (%s)" % (self.host, e))
File "/usr/lib64/python2.7/site-packages/pymysql/connections.py", line 626, in __init__
self._connect()
Some more info:
We have a fleet of EC2 servers that are constantly running queries against a backend RDS.
We average about 500 connections per second to the RDS
We have around 0 - 4 hiccups per RDS per day
The hiccups don't correspond with our maintenance window
When we hit a hiccup it can affect quite a few connections ~50
When a hiccup happens it will disrupt connections across all servers and ports
The error itself looks to be generated from the tcp connection being closed on the ec2. Our TCP keep alive time is set to 7200 seconds and that's when the error is fired off.
My question is what can be done to track down why these hiccups happen? It's great that they don't happen often, but it's not ideal that they happen at all.
Any advice would be appreciated thanks!
Update 10/29:
I've been running a service checking to see if I have any long processes running on the sql server and it looks like these errors aren't getting that far. A new process is never created for this connection! I have still been receiving the hiccups, just no signs of connections.
So after a back and forth with amazon support here is the current solution we have come to.
Amazon has raised our socket listen backlog by adjusting the somaxconn value on the RDS instance.
The value was at the default of 128 and has been bumped up to 1024.
Once the value was adjusted we no longer received the Lost Connection error.

IIS SQL connection fails under heavy load; reset IIS fixes temporarily

Server 1: SQL Server 2008 Standard Edition
Server 2: Windows 2008 Server R2, IIS 7.0
A web-site on Server 2 requires data in SQL Server on Server 1.
Everything works fine for a while (weeks sometimes). Then, under heavy load, Server 2 reports it cannot connect SQL Server on Server 1. Once IIS on Server 2 reports it cannot connect to SQL on Server 1, it does not get better until IIS on Server 2 is restarted. Perhaps restarting the application pool would work as well as a full IIS reset. I'm not certain.
I've tried changing the connection string to increase the connection pool size to unreasonably large values (1,000). The failures still happen.
The web-site is written in C#. The data access layer is NHibernate.
Here is the start of the exception:
[SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 - Could not open a connection to SQL Server)]
System.Data.ProviderBase.DbConnectionPool.GetConnection(DbConnection owningObject) +428
System.Data.ProviderBase.DbConnectionFactory.GetConnection(DbConnection owningConnection) +65
System.Data.ProviderBase.DbConnectionClosed.OpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory) +117
System.Data.SqlClient.SqlConnection.Open() +122
NHibernate.Connection.DriverConnectionProvider.GetConnection() +60
NHibernate.Impl.SessionFactoryImpl.OpenConnection() +39
Any ideas why this is happening? How to diagnose/fix it? I'm frustrated and considering ripping out NHibernate, which will take months and probably lead to many other kinds of problems.
I think you have to increase the connection timeout value and the connection pool size as well, so that it won't release connections to SQL Server under heavy load.
The fix for me was to configure SQL Server to allow an unlimited number of connections (set to a value of "0"). I still had a few connection issues as described in the original post, but nowhere near as many. So I also set up a direct connection between the web server and the SQL Server via the secondary Ethernet port on each server, kept the connection private using 196.168.1.x between the servers, and gave it its own Ethernet switch, not connected to the public switch.