OpenShift 3.11 StorageOS networking issue

I've created an OpenShift 3.11 three-node cluster, two of which are compute
nodes, and installed StorageOS on it. One of the compute nodes seems fine
with the StorageOS installation, but the second compute node can't reach
the first. The error appears to be routing related: the second node will
not route to the first.
[root@cortado-o1 standard]# oc get pod -n storageos
NAME READY STATUS RESTARTS AGE
storageos-47qgc 1/1 Running 0 6m
storageos-6bqqp 0/1 Running 3 7m
[root@cortado-o2 ~]# netstat -na | grep 5705
tcp6 0 0 :::5705 :::* LISTEN
[root@cortado-o3 ~]# netstat -na | grep 5705
tcp 0 0 192.168.0.101:43588 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43548 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43522 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43458 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43628 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43602 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43562 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43502 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43476 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43412 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43430 192.168.0.101:5705 TIME_WAIT
tcp6 0 0 :::5705 :::* LISTEN
[root@cortado-o3 ~]# !nc
nc 192.168.0.102 5705
Ncat: No route to host.
[root@cortado-o3 ~]# hostname --ip-address
192.168.0.101
time="2018-11-13T04:24:38Z" level=error msg="failed to join existing cluster" action=create category=etcd endpoint="192.168.0.102,192.168.0.101" error="Get http://192.168.0.102:5705/v1/members: dial tcp 192.168.0.102:5705: connect: no route to host" module=cp
time="2018-11-13T04:24:38Z" level=info msg="not first cluster node, joining first node" action=create address=192.168.0.101 category=etcd host=cortado-o3 module=cp target=192.168.0.101
time="2018-11-13T04:24:38Z" level=error msg="failed to join existing cluster" action=create category=etcd endpoint="192.168.0.102,192.168.0.101" error="503 Service Unavailable" module=cp
time="2018-11-13T04:24:38Z" level=info msg="retrying cluster join in 5 seconds..." action=create category=etcd module=cp
Any suggestions? Many thanks.

Your netstat output shows that StorageOS is bound to the port, not that the nodes can communicate. In fact, the ncat output shows "No route to host", so they can't connect. StorageOS needs to be able to communicate among its nodes.
The StorageOS docs list the port prerequisites and how to open them: https://docs.storageos.com/docs/prerequisites/firewalls
Whether you use ufw, firewalld, or plain iptables depends on your OpenShift installation.
For ufw try this:
ufw default allow outgoing
ufw allow 5701:5711/tcp
ufw allow 5711/udp
For firewalld try this:
firewall-cmd --permanent --new-service=storageos
firewall-cmd --permanent --service=storageos --add-port=5700-5800/tcp
firewall-cmd --add-service=storageos --zone=public --permanent
firewall-cmd --reload
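To confirm the service and its ports were picked up after the reload:
# List everything active in the public zone, including the new storageos service
firewall-cmd --zone=public --list-all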
For straight iptables:
# Inbound traffic
iptables -I INPUT -i lo -m comment --comment 'Permit loopback traffic' -j ACCEPT
iptables -I INPUT -m state --state ESTABLISHED,RELATED -m comment --comment 'Permit established traffic' -j ACCEPT
iptables -A INPUT -p tcp --dport 5701:5711 -m comment --comment 'StorageOS' -j ACCEPT
iptables -A INPUT -p udp --dport 5711 -m comment --comment 'StorageOS' -j ACCEPT
# Outbound traffic
iptables -I OUTPUT -o lo -m comment --comment 'Permit loopback traffic' -j ACCEPT
iptables -I OUTPUT -d 0.0.0.0/0 -m comment --comment 'Permit outbound traffic' -j ACCEPT
Also check the StorageOS troubleshooting page for this particular issue:
https://docs.storageos.com/docs/platforms/openshift/troubleshoot/install#peer-discovery---networking
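Once the ports are open, you can re-run the same checks from the question to verify that the nodes can reach each other:
# From cortado-o3 (192.168.0.101): this should now connect instead of failing with "No route to host"
nc 192.168.0.102 5705
# And both StorageOS pods should eventually show 1/1 Ready
oc get pod -n storageos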
In addition, a cluster with fewer than 3 nodes is not supported. You can have 1 node for testing, or 3 or more. Having 2 nodes makes it impossible to ensure quorum in a distributed environment, unless you run StorageOS with its kv store pointed at an external etcd.
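If you do go the external etcd route, a minimal sketch of the change, assuming the StorageOS v1 KV_BACKEND/KV_ADDR environment variables and a placeholder etcd address:
# Point the StorageOS kv store at an external etcd instead of the embedded one
oc -n storageos set env daemonset/storageos KV_BACKEND=etcd KV_ADDR=192.168.0.50:2379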

Related

OKD 4.5 single node installation

I'm trying to build an OKD 4.5 single node cluster following Craig Robinson's blog post (at https://medium.com/swlh/guide-okd-4-5-single-node-cluster-832693cb752b). I first faced this issue on the bootstrap node, but after deleting it and recreating the whole process again, it booted up successfully. The same issue then happened again while preparing the control plane master node. After the initial CoreOS download (which proves the webserver is working fine), I get this recurring GET error message over and over again:
ignition[xxx]: GET error: Get "https://api-int.lab.okd.local:22623/config/master": EOF
And this is my control plane node config:
ip=10.106.31.233::10.106.31.1:255.255.255.0:::none nameserver=10.106.31.231 coreos.inst.install_dev=/dev/sda coreos.inst.image_url=http://10.106.31.231:8080/okd4/fcos.raw.xz coreos.inst.ignition_url=http://10.106.31.231:8080/okd4/master.ign
IPs are:
okd-services: 10.106.31.231 ;
bootstrap: 10.106.31.232 ;
control-plane: 10.106.31.233
I can reach the http://10.106.31.231:8080/okd4 address from a remote PC and list the contents, including the master.ign file. Pinging "api-int.lab.okd.local" is successful too. The listening ports on the okd-services node are:
[root@okd4-services ~]# ss -ltu
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
udp UNCONN 0 0 0.0.0.0:hostmon 0.0.0.0:*
udp UNCONN 0 0 10.106.31.231:domain 0.0.0.0:*
udp UNCONN 0 0 127.0.0.1:domain 0.0.0.0:*
udp UNCONN 0 0 127.0.0.53%lo:domain 0.0.0.0:*
udp UNCONN 0 0 [::]:hostmon [::]:*
udp UNCONN 0 0 [::]:domain [::]:*
tcp LISTEN 0 128 0.0.0.0:ssh 0.0.0.0:*
tcp LISTEN 0 4096 127.0.0.1:rndc 0.0.0.0:*
tcp LISTEN 0 4096 0.0.0.0:https 0.0.0.0:*
tcp LISTEN 0 4096 0.0.0.0:22623 0.0.0.0:*
tcp LISTEN 0 4096 0.0.0.0:cslistener 0.0.0.0:*
tcp LISTEN 0 4096 0.0.0.0:sun-sr-https 0.0.0.0:*
tcp LISTEN 0 4096 0.0.0.0:hostmon 0.0.0.0:*
tcp LISTEN 0 4096 0.0.0.0:http 0.0.0.0:*
tcp LISTEN 0 10 10.106.31.231:domain 0.0.0.0:*
tcp LISTEN 0 10 127.0.0.1:domain 0.0.0.0:*
tcp LISTEN 0 4096 127.0.0.53%lo:domain 0.0.0.0:*
tcp LISTEN 0 128 [::]:ssh [::]:*
tcp LISTEN 0 4096 [::1]:rndc [::]:*
tcp LISTEN 0 4096 [::]:hostmon [::]:*
tcp LISTEN 0 511 *:webcache *:*
tcp LISTEN 0 10 [::]:domain [::]:*
The output of the dig test on the okd-services node is:
[root@okd4-services ~]# dig -x 10.106.31.231
; <<>> DiG 9.11.25-RedHat-9.11.25-2.fc33 <<>> -x 10.106.31.231
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60620
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;231.31.106.10.in-addr.arpa. IN PTR
;; ANSWER SECTION:
231.31.106.10.in-addr.arpa. 604800 IN PTR api-int.lab.okd.local.
231.31.106.10.in-addr.arpa. 604800 IN PTR api.lab.okd.local.
231.31.106.10.in-addr.arpa. 604800 IN PTR okd4-services.okd.local.
;; SERVER: 127.0.0.53#53(127.0.0.53)
I deleted and recreated the control plane node to see if that would solve the issue, but it did not. Any idea what this issue means?
I had exactly the same issue with this article. The problem was the bootstrap node, which could not finish its initialization process. First of all, initialize the bootstrap node and make sure that process has finished. The simplest way to check what's going on with a node:
Connect to the node with ssh core@<NODE'S-IP> from the machine where you generated the ssh certificate.
ssh will show you useful information:
This is the bootstrap node; it will be destroyed when the master is fully up.
The primary services are release-image.service followed by bootkube.service. To watch their status, run e.g.
journalctl -b -f -u release-image.service -u bootkube.service
Fedora CoreOS 32.20200629.3.0
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/c/server/coreos/
First of all, check journalctl -b -f -u release-image.service -u bootkube.service, because those are the main units. However, there might be other issues, so check for failed or unfinished systemd service units with systemctl list-units --type=service, and follow a particular unit with journalctl -f -u <UNIT-NAME>.
The whole process takes some time (~20-40 minutes in my case), and there may be some timeouts in the journal's logs, so just wait for the final status of the unit (a combined sketch of these checks follows below).
Only then can you start initializing the control-plane node.
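Putting those checks together (the bootstrap IP is taken from the question; <UNIT-NAME> is a placeholder):
# From the machine holding the generated ssh key, log into the bootstrap node
ssh core@10.106.31.232
# Follow the two main bootstrap units
journalctl -b -f -u release-image.service -u bootkube.service
# Look for failed or unfinished service units, then follow a specific one
systemctl list-units --type=service
journalctl -f -u <UNIT-NAME>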
Finally, my issue was a wrong Fedora CoreOS version, because you can only use version 32. For me it installed okay with:
fedora-coreos-32.20201104.3.0-metal.x86_64.raw.xz
fedora-coreos-32.20201104.3.0-metal.x86_64.raw.xz.sig
fedora-coreos-32.20201104.3.0-live.x86_64.iso
openshift-client-linux-4.5.0-0.okd-2020-10-15-235428.tar.gz
openshift-install-linux-4.5.0-0.okd-2020-10-15-235428.tar.gz

NodePort in OpenShift is not working

Deployed a single node cluster with oc:
$ sudo oc cluster up --public-hostname=$PUBLIC_IP
$ oc version
oc v3.7.14
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO
Added allowHostPorts: true via oc edit scc restricted.
Deployed an application with a NodePort config:
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
tiger-dev 172.30.30.215 <nodes> 8080:30111/TCP 10d
I was not able to access the application at $NODEIP:30111; I can see the traffic coming into the node but NOT going out.
Update1
iptables rules -> https://pastebin.com/NPeZM26W
$ sudo iptables-save | grep 30111
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/tiger-dev:http" -m tcp --dport 30111 -j KUBE-MARK-MASQ
-A KUBE-NODEPORTS -p tcp -m comment --comment "default/tiger-dev:http" -m tcp --dport 30111 -j KUBE-SVC-PPQAPMXK5BHYARPU
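A quick way to narrow such a problem down is to check whether packets actually arrive on the node and whether the service has backing endpoints (generic diagnostics; the service name is taken from the question):
# Watch traffic arriving on the NodePort
sudo tcpdump -i any -nn tcp port 30111
# Confirm the service actually has backing endpoints
oc get endpoints tiger-dev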

Can't connect to MySQL server on 'x.x.x.x' (110)

Hello, I have an issue with MySQL: I can't connect to it remotely. I already looked through the answers posted here, but none of them works for me!
This is the error message when I try to connect to MySQL:
$> mysql -u user01 -h x.x.x.x -p
ERROR 2003 (HY000): Can't connect to MySQL server on 'x.x.x.x' (110)
Telnet
[root@machine2 ~]# telnet x.x.x.x 3306
Trying x.x.x.x...
telnet: connect to address x.x.x.x: Connection timed out
This is the iptables file:
#Generated by iptables-save v1.4.7 on Thu Jan 4 21:58:18 2018
*filter
:INPUT ACCEPT [56:6256]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [35:3538]
-A INPUT -p tcp -m tcp --dport 3306 -m state --state NEW,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m tcp --dport 3306 -j ACCEPT
-A INPUT -i eth0 -p tcp -m tcp --dport 3306 -j ACCEPT
-A INPUT -i eth0 -p tcp -m tcp --dport 3306 -j ACCEPT
-A INPUT -i lo -p tcp -m tcp --dport 3306 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 3306 -j ACCEPT
-A OUTPUT -p tcp -m tcp --sport 3306 -m state --state ESTABLISHED -j ACCEPT
COMMIT
#Completed on Thu Jan 4 21:58:18 2018
I already created a user:
CREATE USER 'user' IDENTIFIED BY 'pass';
GRANT ALL PRIVILEGES ON *.* TO 'user';
FLUSH PRIVILEGES;
and this is my my.cnf file:
#For advice on how to change settings please see
#http://dev.mysql.com/doc/refman/5.7/en/server-configuration-defaults.html
[mysqld]
port=3306
skip-name-resolve
skip-external-locking
innodb_buffer_pool_size=3G
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
and my port 3306 is already open:
[root@localhost ~]# netstat -petulan
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State User Inode PID/Program name
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 0 12889 2056/rpcbind
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 0 13930 2441/sshd
tcp 0 0 127.0.0.1:631 0.0.0.0:* LISTEN 0 13141 2133/cupsd
tcp 0 0 0.0.0.0:32895 0.0.0.0:* LISTEN 29 12977 2078/rpc.statd
tcp 0 0 x.x.x.x:22 x.x.x.x:49964 ESTABLISHED 0 7891927 4453/sshd
tcp 0 64 x.x.x.x:22 x.x.x.x:50203 ESTABLISHED 0 7892871 4482/sshd
tcp 0 0 :::3306 :::* LISTEN 27 7896831 6648/mysqld
tcp 0 0 :::111 :::* LISTEN 0 12892 2056/rpcbind
Is there something wrong here? Thank you!
I'm using CentOS 6.9.
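Error 110 is a connection timeout, which usually means packets are being dropped before they ever reach mysqld. Two quick checks worth running (generic diagnostics; they assume root access on the server):
# Show the effective INPUT rule order with counters; on CentOS 6 a
# pre-existing REJECT rule can sit above rules appended with -A
iptables -L INPUT -n -v --line-numbers
# Confirm which hosts the account may connect from ('%' means any host)
mysql -u root -p -e "SELECT user, host FROM mysql.user;"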

Can't open ports on VM instances

I'm trying to open TCP and UDP port 7774 on a Google Cloud VM instance, without results.
I'm sure that my server is using this network. For example, the ssh port is open; the rdp port should also be open, but I can't communicate with the server on that port. The same goes for port 7774: I have to set up something which needs this port to communicate, but I don't know how.
I also added rules to iptables:
iptables -A INPUT -p tcp -d 0/0 -s 0/0 --dport 7774 -j ACCEPT
iptables -A INPUT -p udp -d 0/0 -s 0/0 --dport 7774 -j ACCEPT
Without any results.
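On Google Cloud, instance-level iptables rules are not enough: traffic is also filtered by VPC firewall rules, which must allow the port before packets ever reach the VM. A sketch with the gcloud CLI (the rule name is a placeholder; adjust the network and source ranges to your setup):
# Allow TCP and UDP 7774 through the VPC firewall on the default network
gcloud compute firewall-rules create allow-port-7774 \
    --network=default \
    --allow=tcp:7774,udp:7774 \
    --source-ranges=0.0.0.0/0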

How to whitelist 2 IP addresses with iptables and block everything else?

I want to whitelist 2 external IP addresses for port 3306 (MySQL), but block all other IP addresses from port 3306 on a Debian server running a MySQL instance. Both external IP addresses should be able to connect to the MySQL server.
What is the best way to do this with iptables?
What I did:
/sbin/iptables -A INPUT -p tcp -d 127.0.0.1 --dport 3306 -j ACCEPT
/sbin/iptables -A INPUT -p tcp -d 1.1.1.1 --dport 3306 -j ACCEPT
/sbin/iptables -A INPUT -p tcp -d 85.x.x.x --dport 3306 -j ACCEPT
(1.1.1.1 is an internal ip and masked here for security purposes)
## Block all connections to 3306 ##
/sbin/iptables -A INPUT -p tcp --dport 3306 -j DROP
What happened:
every external IP is blocked and can't connect.
What should happen:
every external IP will be blocked and can't connect, except 1.1.1.1, 85.x.x.x, and 127.0.0.1.
iptables -N mysql # create chain for mysql
iptables -A mysql --src 127.0.0.1 -j ACCEPT
iptables -A mysql --src 1.1.1.1 -j ACCEPT
iptables -A mysql --src 85.x.x.x -j ACCEPT
iptables -A mysql -j DROP # drop packets from other hosts
iptables -I INPUT -m tcp -p tcp --dport 3306 -j mysql # use chain for packets to MySQL port
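Note that rules added this way do not survive a reboot; on Debian you can persist them with the iptables-persistent package (an assumption about the setup):
# Persist the current ruleset so it is restored at boot
apt-get install iptables-persistent
iptables-save > /etc/iptables/rules.v4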