Spain2 node: instance rebooting forever - fiware

I'm working on the Spain2 node. I'm trying to reboot an instance, but it stays stuck in the "rebooting" state forever. I have also tried to shut off the instance, but that doesn't work either: after shutting it off, the instance remains in the "rebooting" state.
What can I do? Thank you in advance!
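For reference, when an instance is stuck in a transitional state like this, these are the kinds of commands that can help if you have CLI access to the cloud (the instance name below is just an example, and resetting the state normally needs admin rights):
nova show my-instance                    # check status and task_state
nova reboot --hard my-instance           # retry with a hard reboot
nova reset-state --active my-instance    # admin only: clear a stuck task state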

The service has been recovered on the Spain2 node. Try it again, and if you have any problem, send us a ticket through the mailing list fiware-lab-help#lists.fiware.org

Related

Communication link failure: 1047 WSREP has not yet prepared node for application use in

We had a three-node cluster with MariaDB 10.4. We had an outage and the servers all rebooted with one having an irrecoverable network issue at the time.
We set up another server and added it to the cluster as a third member later.
However, ever since then, we have been getting this error intermittently.
*3287799 FastCGI sent in stderr: "PHP message: An Error occurred while handling another error:
PDOException: SQLSTATE[08S01]: Communication link failure: 1047 WSREP has not yet prepared node for application use in /var/....yii2/db/Command.php:1293
To fix this issue, we shut down all three nodes one by one and then re-initialized the cluster, even with a new cluster name and all.
The first one was started with "galera_new_cluster" and the remaining two were added to this cluster. However, we still kept getting the same error intermittently.
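For anyone reading along, the bootstrap sequence we mean is roughly this (assuming a systemd-based MariaDB Galera install):
galera_new_cluster                                   # on the first node only: bootstrap the new cluster
systemctl start mariadb                              # on each remaining node: join the cluster
mysql -e "SHOW STATUS LIKE 'wsrep_ready'"            # should report ON once the node is usable
mysql -e "SHOW STATUS LIKE 'wsrep_cluster_size'"     # should match the number of nodes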
We also followed the workaround at "mariadb galera - Error when a node shutdown ERROR 1047 WSREP has not yet prepared node for application use", but, as expected, it didn't help.
Next, we set up a fresh single server and installed the new 10.5.X MariaDB server on it, took a backup from the old cluster using mariabackup, and restored it onto this new single server.
This single server was set up as a new cluster with fresh details and everything. We wanted to run it as a single-node cluster to check whether the error still persisted. Oddly enough, the error is still there, and it comes up every half an hour or so.
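For completeness, the backup and restore went roughly like this (paths and credentials are placeholders):
mariabackup --backup --target-dir=/backup/full --user=root --password=SECRET   # on a node of the old cluster
mariabackup --prepare --target-dir=/backup/full
mariabackup --copy-back --target-dir=/backup/full    # on the new 10.5 server, with mariadb stopped and an empty datadir
chown -R mysql:mysql /var/lib/mysql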
Has anyone got any clue what could be the reason for this weird issue we're facing? Currently, we don't know what exactly the issue is, which is why we're having a hard time solving it.
Any help would be greatly appreciated.
Update:
We turned off Galera on this single-node cluster and ran it as a simple stand-alone MariaDB server. However, we still got the same errors in our web server's logs. This is bonkers.
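By "turned off Galera" I mean roughly this in the server configuration (a sketch; the exact file location depends on the distro):
# /etc/my.cnf.d/galera.cnf (or equivalent)
[galera]
wsrep_on = OFF
# wsrep_provider = ...          (commented out)
# wsrep_cluster_address = ...   (commented out)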
Any idea? Anyone?

Running Fiware-Cygnus listener in CentOS

I have a VM with CentOS installed, where Orion Context Broker, Cygnus, Mosquitto and MongoDB are present. When I check the connections with the following command:
netstat -ntlpd
I receive the data shown in the screenshot "Connections".
Something is already listening on ports 8081 and 5050 (which belong to Cygnus). But Cygnus itself does not appear to be active when I use the following:
service cygnus status
There are no instances of Cygnus running.
When I try to run the Cygnus test, it gives me a fatal error stating that the ports are taken and that the configuration is wrong.
Trying to run cygnus from
sudo service cygnus start
also fails. Here is the systemctl status:
(screenshot: FailedCygnus)
After checking which process holds the PID bound to the Cygnus ports, I see this:
(screenshot: CygnusPorts)
Perhaps someone has a clue what that can be? It feels like Cygnus is there but something is configured wrong. Also, is there another way of running Cygnus, because I need to receive notifications from subscriptions somehow?
Thank you in advance, Stackoverflow!
EDIT 1
Tried killing the processes under the PIDs that are listening on ports 5050 and 8081, but it did not help; Cygnus still cannot be started.
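For reference, this is roughly how I found and killed them (the PID is whatever the first command reports):
sudo lsof -i :5050 -i :8081             # show which processes hold the Cygnus ports
sudo ss -ltnp | grep -E ':5050|:8081'   # alternative using ss
sudo kill <PID>                         # then, if it refuses to die:
sudo kill -9 <PID>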
Currently thinking of simply reinstalling everything.
EDIT 2
So, I have managed to run the simple "dummy" listener using the agent_test file. But I guess that is good only in the beginning and for learning purposes, and later using your own configuration is preferred?
For my further investigation the agent_test.conf file is enough: the listener works and data is stored in a database. Perhaps I will encounter this problem again in the future, but for now it works.
What I had to do beforehand was to kill the existing processes.
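In case it helps someone: Cygnus is built on Apache Flume, so running the test by hand essentially boils down to a flume-ng invocation along these lines (the paths and the agent name here are assumptions and must match your install and your .conf file):
flume-ng agent --conf /usr/cygnus/conf -f /usr/cygnus/conf/agent_test.conf -n cygnusagent -Dflume.root.logger=INFO,console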

Google Cloud instance can't be accessed via SSH after cloning

I'm desperate for help here. I have a compute engine instance that hosts a lot of websites. These are the steps that I took:
Go to Compute Engine > Snapshots and take a snapshot of my instance
Click on the newly created snapshot and click Create Instance.
The new instance has all the configs of the current running instance
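For the record, the gcloud equivalent of those steps looks roughly like this (the disk, snapshot and instance names, and the zone, are placeholders):
gcloud compute disks snapshot my-disk --snapshot-names=my-snapshot --zone=us-central1-a
gcloud compute disks create my-clone-disk --source-snapshot=my-snapshot --zone=us-central1-a
gcloud compute instances create my-clone --disk=name=my-clone-disk,boot=yes --zone=us-central1-a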
Then when I tried to access the new instance via SSH, it wouldn't work. Error message:
"Connection Failed
We are unable to connect to the VM on port 22. Learn more about possible causes of this issue."
Clicking on Learn more gets me to https://cloud.google.com/compute/docs/ssh-in-browser#ssherror
The instance is booting up and sshd is not yet running - Not sure how to check this (see the checks sketched after this list)
The instance is not running sshd - Not sure how to check this either
sshd is listening on a port other than the one you are connecting to - My current instance has ssh running on port 22, so I guess this is fine?
There is no firewall rule allowing SSH access on the port - Again, my current instance has ssh running, so I don't think it's the firewall, right?
The firewall rule allowing SSH access is enabled, but is not configured to allow connections from GCP Console services. - Same as above
The instance is shut down - The instance is still running.
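For reference, these checks map to the items above (the instance name and zone are placeholders):
gcloud compute instances get-serial-port-output my-clone --zone=us-central1-a   # boot log, shows whether sshd started
gcloud compute firewall-rules list                                              # look for a rule allowing tcp:22
gcloud compute instances describe my-clone --zone=us-central1-a --format="value(status)"   # should say RUNNING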
The strange thing is that if I create a fresh instance from scratch and then follow the steps above to clone it, the new instance can be accessed normally via SSH.
Can anyone show me how to fix this if possible? Or show me how to see the logs and check what went wrong? I tried to google, but I'm pretty confused by all the jargon and where to find particular things. Sorry for the wall of text. Thanks
Edit #1: I got technical support from Google. The steps below might help someone else, but not me, as when I reached step 7 I waited forever and couldn't get to the login page.
1.) Go to the VM instances page and click on the Instance name of your VM.
2.) Click the Edit button at the top of the page.
3.) Under Custom metadata, click Add item.
4.) Set 'Key' to 'startup-script' and set 'Value' to this script:
#! /bin/bash
useradd -G sudo USERNAME
echo 'USERNAME:PASSWORD' | chpasswd
NOTE: change the value of USERNAME and PASSWORD to the name and password of your choice.
5.) Enable "Enable connecting to serial ports" by checking the box below the SSH button.
6.) Click Save and then click RESET on the top of the page. Wait for some time for the instance to reboot.
7.) Click on 'Connect to serial port' on the page. In the new window, you might need to wait a bit and press Enter on your keyboard once; then you should see the login prompt.
8.) Login using the USERNAME and PASSWORD you provided.
Note: Please do not share any of your password and username for your data security.
As those steps couldn't help me, and the Google support representative looked at the log but didn't see anything wrong, she suggested debugging SSH following this guide: https://cloud.google.com/compute/docs/troubleshooting/troubleshooting-ssh#use_your_disk_on_a_new_instance, which I will do when I have time. Feels like I'm writing an essay. Will keep you posted.
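(For anyone trying the same steps: 5 to 7 can also be done from the command line, assuming the gcloud SDK is set up; the instance name and zone are placeholders:)
gcloud compute instances add-metadata my-clone --zone=us-central1-a --metadata=serial-port-enable=TRUE
gcloud compute connect-to-serial-port my-clone --zone=us-central1-a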
The troubleshooting steps that you can follow are:
Use the serial console to view your instance logs and check whether the new instance you created from the snapshot failed to start to the appropriate run level where the ssh daemon would get started. If sshd was not started you would not have ssh access to your instance.
You can try restarting the instance if it doesn't affect production and try to gain SSH access again. It might be that some issue prevented the instance from starting up properly, and restarting it could fix it.
You can try creating another VM instance from the snapshot in case the previous instance wasn’t created properly.
If creating a new VM instance from the snapshot doesn’t fix the issue, it might be that the snapshot itself wasn’t created properly. You can read this documentation guide, section Understanding snapshot best practices, and try creating another snapshot and VM instances from it.
I had the same problem and after a lot of searching, I found an answer from user Peripheral from ServerFault that worked for me.
I found the fix for me. A recent update has a known issue where it removes the default gateway from the iptables. To fix it, I have to go to the instance and select Edit. Scroll down, and under Custom Metadata put the following:
key: startup-script
value: route add default gw <gatewayIP> eth0
Save and restart the VM.
Source
All credits to him/her, just want to share to help others find their solution faster.
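If you prefer the command line, the same metadata can be set with gcloud (the instance name and zone are placeholders; replace <gatewayIP> with your actual gateway):
gcloud compute instances add-metadata my-clone --zone=us-central1-a --metadata=startup-script='#! /bin/bash
route add default gw <gatewayIP> eth0'
gcloud compute instances reset my-clone --zone=us-central1-a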
I had the same issue. I eventually figured out that it was because I had attached a persistent disk and added an entry for it to the /etc/fstab file. This entry is supposed to automatically mount the attached disk when the instance restarts.
However, when I created a snapshot of the boot disk, I didn't remove the /etc/fstab entry. So creating a new instance from this snapshot will always cause a boot error, as the system tries to mount a disk that is not attached.
This information is present in the documentation.
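A minimal illustration of the pattern (the device UUID and mount point are just examples): the first fstab line below causes a boot error when the disk is absent, while adding nofail lets the instance boot anyway.
UUID=1234-5678  /mnt/data  ext4  discard,defaults         0 2
UUID=1234-5678  /mnt/data  ext4  discard,defaults,nofail  0 2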

WARNING neutron_lbaas.services.loadbalancer.drivers.haproxy.namespace_driver [-] Stats socket not found for pool

I have the following problem with OpenStack Liberty lbaas. When I create a new pool, this error starts to appear:
WARNING neutron_lbaas.services.loadbalancer.drivers.haproxy.namespace_driver [-] Stats socket not found for pool
I have deployed controller and compute services on the same node and I use lbaas not lbaasv2. I use linuxBridgeDriver.
Can you help with that? I don't know what is wrong.
I had the same problem. For me the messages stopped as soon as I assigned a VIP to the LBaaS.
The stats socket is in a directory with the ID as part of it:
/var/lib/neutron/lbaas/<lbaas-id>
But this directory is created the moment you assign a VIP.
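For reference, with LBaaS v1 that means the warnings stop once you create the VIP, e.g. with something like this (the names and the subnet ID are placeholders):
neutron lb-vip-create --name my-vip --protocol HTTP --protocol-port 80 --subnet-id <subnet-id> my-pool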

TimeOut Error in Openshift

I want to create an application in my local broker with the rhc tools, and I get this error. Any ideas how to fix it:
Unable to complete the requested operation due to: Timed out while trying to fetch
information from the nodes. If the problem persists please contact Red Hat support
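(For context, this happens with an ordinary create command; the app and cartridge names below are just examples:)
rhc app create myapp php-5.4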
Thanks
Wait a few minutes and try again. Worked for me.