I'm setting up a PrestaShop installation on a development server (a GCE instance), using Cloud SQL as the database server. Everything works just fine except for one thing: after a long period of inactivity on the site, the first page load always gives me this error:
Link to database cannot be established: SQLSTATE[HY000] [2003]
If I refresh the page, the error is gone and never appears again until I stop using the site for an hour or so. It almost looks like the database instance is going into sleep mode or something like that.
The reason I mentioned PrestaShop is that I never get this error when using Adminer or when connecting to the database from the mysql console client.
With the per-use billing model, instances are spun down after a 15-minute timeout to save you money. They then take a few seconds to spin up when next accessed. It may be that PrestaShop is timing out on these first requests (though I have no experience with that application).
Try changing your instance to package billing, which has a 12-hour timeout, to see if this helps:
https://developers.google.com/cloud-sql/faq#how_usage_calculated
According to the GCE documentation:
Once a connection has been established with an instance, traffic is permitted in both directions over that connection, until the connection times out after 10 minutes of inactivity
I suspect that might be the cause. To get around it, you can try lowering the TCP keepalive time.
Refer here: https://cloud.google.com/sql/docs/compute-engine-access
To keep long-lived unused connections alive, you can set the TCP keepalive. The following commands set the TCP keepalive value to one minute and make the configuration permanent across instance reboots.
# Display the current tcp_keepalive_time value.
$ cat /proc/sys/net/ipv4/tcp_keepalive_time
# Set tcp_keepalive_time to 60 seconds and make it permanent across reboots.
$ echo 'net.ipv4.tcp_keepalive_time = 60' | sudo tee -a /etc/sysctl.conf
# Apply the change.
$ sudo /sbin/sysctl --load=/etc/sysctl.conf
# Display the tcp_keepalive_time value to verify the change was applied.
$ cat /proc/sys/net/ipv4/tcp_keepalive_time
Related
Since Google Compute Engine does not provide internal DNS, I created two CentOS BIND machines that do the resolving for the machines on GCE and forward the lookups over VPN to my private cloud, and vice versa.
As the Google Cloud help docs suggest, you can have this kind of scenario and edit resolv.conf on each instance to do the resolving.
What I did was edit ifcfg-eth0 to disable PEERDNS, and in /etc/resolv.conf I added the search domain and my two nameservers at the top on each instance.
Now, after one instance gets rebooted, it won't start again because it is searching for the metadata.google.internal domain:
Jul 8 10:17:14 instance-1 google: Waiting for metadata server, attempt 412
What is the best practice in this kind of scenario?
Thanks.
Also, I need the internal DNS to do poor man's round-robin failover, since GCE does not provide internal load balancers.
As mentioned at https://cloud.google.com/compute/docs/networking:
Each instance's metadata server acts as a DNS server. It stores the DNS entries for all network IP addresses in the local network and calls Google's public DNS server for entries outside the network. You cannot configure this DNS server, but you can set up your own DNS server if you like and configure your instances to use that server instead by editing the /etc/resolv.conf file.
So you should be able to just use 169.254.169.254 for your DNS server. If you need to define external DNS entries, you might like Cloud DNS. If you set up a domain with Cloud DNS, or any other DNS provider, the 169.254.169.254 resolver should find it.
If you need something more complex, such as custom internal DNS names, then your own BIND server might be the best solution. Just make sure that metadata.google.internal. resolves to 169.254.169.254.
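As a rough sketch of that setup (the search domain and 10.240.0.x addresses below are placeholders, not values from the docs), the per-instance resolv.conf and the forwarding part of the BIND config could look like this:
# /etc/resolv.conf on each instance (placeholder search domain and IPs)
search c.my-project.internal
nameserver 10.240.0.10    # primary BIND server
nameserver 10.240.0.11    # secondary BIND server
# named.conf fragment on the BIND servers: forward anything you do not host
# yourself (including metadata.google.internal) to the metadata resolver.
options {
    forwarders { 169.254.169.254; };
    forward only;
};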
OK, I just ran into this, but unfortunately even waiting 30 minutes did not get it working. Fortunately nelasx had correctly diagnosed it and given the fix. I'm adding this to list the steps I had to take, based on his excellent question and commented answer. I've just pulled the info I had to gather together into one place to get to a solution.
Symptoms: on startup of the Google instance, you get connection refused.
After inspecting the serial console output, you will see:
Jul 8 10:17:14 instance-1 google: Waiting for metadata server, attempt 412
You could try waiting; that didn't work for me, and inspection of https://github.com/GoogleCloudPlatform/compute-image-packages/blob/master/google-startup-scripts/usr/share/google/onboot
# Failed to resolve host or connect to host. Retry indefinitely.
6|7) sleep 1.0
log "Waiting for metadata server, attempt ${count}"
led me to believe that waiting will not work.
So, the solution was to fiddle with the disk to add in nelasx's solution:
"edit ifcfg-eth0 and change PEERDNS=no; edit /etc/resolv.conf and put your nameservers + search domain on top; edit /etc/hosts and add: 169.254.169.254 metadata.google.internal"
To do this:
Best to create a snapshot backup before you start in case it goes awry
Uncheck "Delete boot disk when instance is deleted" for your instance
Delete the instance
Create a micro instance
Mount the disk
sudo ls -l /dev/disk/by-id/* # this will give you the device names of the attached disks
sudo mkdir /mnt/new
sudo mount /dev/disk/by-id/scsi-0Google_PersistentDisk_instance-1-part1 /mnt/new
where instance-1 should be changed as per your setup
Go in and edit the files as per nelasx's solution (see the example file edits after these steps). Idiot trap I fell for: edit the files under the mount point. Don't just sudo vi /etc/hosts; use /mnt/new/etc/hosts. That cost me 15 more minutes as I had to go through the got-depressed, scratched-head, kicked-myself cycle.
Delete the debug instance, ensuring your attached disk delete option is unchecked
Create a new instance matching your original with the edited disk as your boot disk and fire it up.
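For reference, here is roughly what the three edits look like once the disk is mounted under /mnt/new (the search domain and nameserver IPs are placeholders; adjust them to your setup):
# /mnt/new/etc/sysconfig/network-scripts/ifcfg-eth0
PEERDNS=no
# /mnt/new/etc/resolv.conf (your own nameservers on top, placeholder IPs)
search c.my-project.internal
nameserver 10.240.0.10
nameserver 10.240.0.11
# /mnt/new/etc/hosts (so the startup scripts can reach the metadata server)
169.254.169.254 metadata.google.internal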
I set up an old Django app on a new GCE instance on Sunday and pointed it at a new Cloud SQL instance with imported data. This code and data have successfully run over the past few years on a variety of dedicated hosting setups, on EC2, and on EC2+RDS.
Since Sunday I have had intermittent reports of (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") from the app. In particular, today it happened in two bursts of three separated by about 7 hours.
I panicked in the earlier outages and restarted both the app and the Cloud SQL instance, which did the trick. However, the later ones today righted themselves after a few minutes.
I've never encountered this error before when working with MySQL, and any searching on it gives results related to people who have general access problems to the DB.
On the GCE side, the only difference I can think of from previous setups is that it is using Google's out-of-the-box Debian image instead of Ubuntu 12.04. On the MySQL side I have no idea, as I've successfully run this on both MySQL 5.x and MariaDB.
Is there any way of figuring out why this is happening and fixing it?
Thanks.
Have you tried changing the keepalive settings for TCP connections? GCE has a firewall rule that drops idle TCP connections after 10 minutes:
https://cloud.google.com/compute/docs/troubleshooting#communicatewithinternet
You can check the current value of tcp_keepalive_time:
cat /proc/sys/net/ipv4/tcp_keepalive_time
And change it to 60 seconds:
vi /etc/sysctl.conf
# Add this line
net.ipv4.tcp_keepalive_time = 60
# Reload Sysctl interface
sudo /sbin/sysctl --load=/etc/sysctl.conf
You might need to restart the Django server to pick up the new keepalive settings.
Note: if this problem was limited to yesterday (18/11/2014) and your Cloud SQL instances are located in the EU, you might have been affected by this:
https://groups.google.com/forum/#!topic/google-cloud-sql-announce/k5raPT48hc0
It seems there is an issue with the firewall preventing incoming connections. Just change the server's location to another mirror and things are supposed to go fine. It worked for me.
We have 8 Phusion Passenger processes with 20 connections each, which should be 160 connections max. Today our MySQL connections crossed 300 and our server stopped responding.
What would happen if a thread dies unnaturally? How do the DB connections associated with it get cleaned up?
How do I debug this type of scenario?
What would happen if a thread dies unnaturally?
If you are running your statements inside a transaction, then all statements of an unsuccessful transaction will be rolled back. But if you are executing your statements individually, then whatever has already executed will be saved in the DB while the rest will not, and the data can be inconsistent.
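As a small illustration (the database, table, and values here are made up), the difference looks like this:
# Inside a transaction: if the client dies before the COMMIT, both updates are rolled back.
$ mysql -u app -p mydb -e "START TRANSACTION; UPDATE accounts SET balance = balance - 100 WHERE id = 1; UPDATE accounts SET balance = balance + 100 WHERE id = 2; COMMIT;"
# With autocommit (no transaction): each statement is saved as soon as it runs,
# so a crash between the two leaves the data inconsistent.
$ mysql -u app -p mydb -e "UPDATE accounts SET balance = balance - 100 WHERE id = 1;"
$ mysql -u app -p mydb -e "UPDATE accounts SET balance = balance + 100 WHERE id = 2;"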
How do DB connections associated with it get cleaned up?
My assumption is that your application is opening more connections than required and not closing them properly. In that case, if you check connections in MySQL Administrator, you will see many connections in sleep mode. You can kill all sleeping connections to clean them up, for example as shown below.
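One way to do this in bulk (the 300-second threshold is just an assumption; tune it to your application):
# Generate KILL statements for connections that have been sleeping longer than 300 seconds.
$ mysql -u root -p -N -e "SELECT CONCAT('KILL ', id, ';') FROM information_schema.processlist WHERE command = 'Sleep' AND time > 300;" > /tmp/kill_sleeping.sql
# Review the generated statements, then run them.
$ mysql -u root -p < /tmp/kill_sleeping.sql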
How do I debug this type of scenario?
Step 1: Enable the general query log on your server:
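One way to enable it at runtime, without a restart (the log file path is just an example and should match whatever you grep later):
# The target file must be writable by the mysqld user.
$ mysql -u root -p -e "SET GLOBAL general_log_file = '/var/log/mysqldquery.log'; SET GLOBAL general_log = 'ON';"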
Step 2: Execute the command below in any GUI tool like SQLyog, etc.:
show full processlist;
Step 3: Take the first sleeping process from the command above.
Step 4: Find/grep the above process ID (suppose it is 235798) in your general log. You can use the command below:
$ cat /var/log/mysqldquery.log | grep 235798
Note: the name and path of your general log file may be different.
The above command will show you a few lines; check whether the connection is closed at the end, which should show a "Quit" statement at the end of the line. In this way you can check a few of the processes that are in sleep mode and judge which kind of statements (and which module) are opening extra connections without closing them, and take action accordingly.
I am getting intermittent 'Connection Timed Out' errors when a PHP script on my web server connects to the MySQL database server over the private network. However, if I tell the script to use the public network to connect, these errors do not appear.
My connection script is set up so that whenever I try to connect to MySQL, it checks for errors; if there is an error, it sends me an email and then automatically switches to the public network to try that connection. If the public connection fails, it sends me another email and displays a custom web page to the user.
I get about 5 to 10 connection errors every hour. There are hundreds of successful connections every minute.
These machines are dedicated machines. I contacted our hosting company and they tested the routers and cables and said everything is fine. I tried pinging the servers both ways and there are no errors at all for test periods over an hour.
I am using the latest Nginx with the latest PHP and PHP-FPM. MySQL is 5.5.27. These are CentOS 6 64-bit systems with the latest updates.
I've tried many network configuration options and adjustments to the PHP-FPM and MySQL config files, and no matter what I do or change, nothing fixes it.
The weird thing is, everything works great over the public network and pings and file transfer work great over the private network between both machines.
Any ideas?
** UPDATE **
I made some changes to the PHP-FPM config file and to the MySQL config file, and the errors are now down to about 2 to 3 per hour, but the issue is still unresolved.
I'm not sure this is your case, but it's still worth mentioning, as it helped me in a similar situation. Basically, there is a cap on the maximum number of connections in the Linux kernel: https://serverfault.com/questions/10852/what-limits-the-maximum-number-of-connections-on-a-linux-server
I'm not sure whether it is shared between all the networks, but if you think it's worth checking, I'd raise those values (say, double them) and see whether it has any effect on how frequently the error happens.
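If it helps, here is a rough sketch of the limits I would inspect first; the values shown are assumptions to illustrate the idea, not recommendations:
# Per-process open file descriptor limit (each TCP connection uses one).
$ ulimit -n
# System-wide file descriptor cap.
$ cat /proc/sys/fs/file-max
# Backlog of pending connections a listening socket may queue.
$ cat /proc/sys/net/core/somaxconn
# Ephemeral port range available for outgoing connections.
$ cat /proc/sys/net/ipv4/ip_local_port_range
# Example: raise the listen backlog and make it persistent across reboots (assumed value).
$ echo 'net.core.somaxconn = 1024' | sudo tee -a /etc/sysctl.conf
$ sudo /sbin/sysctl --load=/etc/sysctl.conf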
When I was checking the MySQL load time on the site, I got results showing connections in TIME_WAIT, even though I close the connection on every page. Sometimes the site doesn't load, saying too many connections. What could be the solution to this problem?
Thanks in advance for any replies or suggestions.
If a client connects to a MySQL server, it usually opens a local port, for example:
localhost:12345 -> mysqlserver:3306
If the client closes the connection, the client's side goes into TIME_WAIT. Due to TCP routing, a packet might arrive late on the temporary port. A connection in TIME_WAIT just discards these packets. Without a TIME_WAIT, the local port might be reused for another connection and might receive packets from a former connection.
On a high-traffic web application that opens a MySQL connection per request, a high number of TIME_WAIT connections is to be expected. There is nothing wrong with that.
Problems can occur if your local port range is too small, so that you cannot open any more outgoing connections. The usual timeout is 60 seconds, so with a small port range a problem can already occur at more than about 400 requests per second.
Check:
To check the number of connections in TIME_WAIT, you can use the following command:
$ cat /proc/net/sockstat
sockets: used 341
TCP: inuse 12 orphan 0 tw 33365 alloc 23 mem 16
UDP: inuse 9 mem 2
UDPLITE: inuse 0
RAW: inuse 0
FRAG: inuse 0 memory 0
The value after "tw", in this case 33365, shows the amount of TIME_WAIT.
Solutions:
a. TIME_WAIT tuning (Linux based OS examples):
Reduce the timeout for TIME_WAIT:
# small values are ok, if your mysql server is in the same local network
echo 15 > /proc/sys/net/ipv4/tcp_fin_timeout
Increase the port range for local ports:
# check what your highest listening ports are before setting this
echo 15000 65000 > /proc/sys/net/ipv4/ip_local_port_range
The settings /proc/sys/net/ipv4/tcp_tw_recycle and /proc/sys/net/ipv4/tcp_tw_reuse might be interesting, too. (But we experienced strange side effects with these settings, so better avoid them. More information in this answer.)
b. Persistent Connections
Some programming languages and libraries support persistent connections. Another solution might be using a locally installed proxy like ProxySQL. This reduces the number of connections that are opened and closed.
If you are getting a lot of TIME_WAIT connections on the MySQL server, then that means the MySQL server is closing the connections. The most likely cause in this instance is that a host or several hosts got on a block list. You can clear this by running:
mysqladmin flush-hosts
To get a list of the number of connections you have per IP, run:
netstat -nat | awk {'print $5'} | cut -d ":" -f1 | sort | uniq -c | sort -n
You can also confirm this is happening by going to one of the clients that is having trouble connecting and telnetting to port 3306. It will throw a message something like:
telnet mysqlserver 3306
Trying 192.168.1.102...
Connected to mysqlserver.
Escape character is '^]'.
sHost 'clienthost.local' is blocked because of many connection errors; unblock with 'mysqladmin flush-hosts'Connection closed by foreign host.
As @Zimbabao suggested in the comment, debug your code for any potential errors that may prevent the MySQL connection from being closed.
If nothing works, check your my.cnf for a system variable called wait_timeout. If it's not present, add it to the [mysqld] section and restart your MySQL server:
[mysqld]
wait_timeout = 3600
It's the number of seconds the server waits for activity on a non-interactive connection before closing it. Further information can be found at http://dev.mysql.com/doc/refman/5.0/en/server-system-variables.html#sysvar_wait_timeout
Tune the figure 3600 (1 hour) to your requirements.
HTH