OKD 4.5 single node installation - openshift

I'm trying to build an OKD 4.5 single node cluster following Craig Robinson blog post (at https://medium.com/swlh/guide-okd-4-5-single-node-cluster-832693cb752b). I faced with this issue first on bootstrap node, but after deleting and recreating the whole process again, it booted up successfully. But the same issue happened again while preparing control plane master node. After initial coreos download (which proves webserver is working fine), I get this recurring GET error message over and over again:
ignition[xxx]: GET error: Get "https://api-int.lab.okd.local:22623/config/master": EOF
And this is my control plane node config:
ip=10.106.31.233::10.106.31.1:255.255.255.0:::none nameserver=10.106.31.231 coreos.inst.install_dev=/dev/sda coreos.inst.image_url=http://10.106.31.231:8080/okd4/ fcos.raw.xz coreos.inst.ignition_url=http://10.106.31.231:8080/okd4/master.ign
IPs are:
okd-services: 10.106.31.231 ;
bootstrap: 10.106.31.232 ;
control-plane: 10.106.31.233
I can reach the http://10.106.31.231:8080/okd4 address from remote pc and list the contents including master.ign file. Also pinging "api-int.lab.okd.local" is successful too. firewalld open ports on okd-services node are:
[root#okd4-services ~]# ss -ltu
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
udp UNCONN 0 0 0.0.0.0:hostmon 0.0.0.0:*
udp UNCONN 0 0 10.106.31.231:domain 0.0.0.0:*
udp UNCONN 0 0 127.0.0.1:domain 0.0.0.0:*
udp UNCONN 0 0 127.0.0.53%lo:domain 0.0.0.0:*
udp UNCONN 0 0 [::]:hostmon [::]:*
udp UNCONN 0 0 [::]:domain [::]:*
tcp LISTEN 0 128 0.0.0.0:ssh 0.0.0.0:*
tcp LISTEN 0 4096 127.0.0.1:rndc 0.0.0.0:*
tcp LISTEN 0 4096 0.0.0.0:https 0.0.0.0:*
tcp LISTEN 0 4096 0.0.0.0:22623 0.0.0.0:*
tcp LISTEN 0 4096 0.0.0.0:cslistener 0.0.0.0:*
tcp LISTEN 0 4096 0.0.0.0:sun-sr-https 0.0.0.0:*
tcp LISTEN 0 4096 0.0.0.0:hostmon 0.0.0.0:*
tcp LISTEN 0 4096 0.0.0.0:http 0.0.0.0:*
tcp LISTEN 0 10 10.106.31.231:domain 0.0.0.0:*
tcp LISTEN 0 10 127.0.0.1:domain 0.0.0.0:*
tcp LISTEN 0 4096 127.0.0.53%lo:domain 0.0.0.0:*
tcp LISTEN 0 128 [::]:ssh [::]:*
tcp LISTEN 0 4096 [::1]:rndc [::]:*
tcp LISTEN 0 4096 [::]:hostmon [::]:*
tcp LISTEN 0 511 *:webcache *:*
tcp LISTEN 0 10 [::]:domain [::]:*
the output of the dig test on okd-services node is:
[root#okd4-services ~]# dig -x 10.106.31.231
; <<>> DiG 9.11.25-RedHat-9.11.25-2.fc33 <<>> -x 10.106.31.231
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60620
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;231.31.106.10.in-addr.arpa. IN PTR
;; ANSWER SECTION:
231.31.106.10.in-addr.arpa. 604800 IN PTR api-int.lab.okd.local.
231.31.106.10.in-addr.arpa. 604800 IN PTR api.lab.okd.local.
231.31.106.10.in-addr.arpa. 604800 IN PTR okd4-services.okd.local.
;; SERVER: 127.0.0.53#53(127.0.0.53)
I deleted and recreated the control plane to see if it solved the issue, but was not successful. Any idea what this issue means?

I had exact the same issue with this article. The problem was with bootstrap node that cannot finish initialization process. First of all initialize bootstrap node and sure that the process was finished. The simplest way to check what's going on with a node:
Connect to a node with ssh core#<NODE'S-IP> from the machine where you generate ssh-certificate
ssh will provide you useful information:
This is the bootstrap node; it will be destroyed when the master is
fully up.
The primary services are release-image.service followed by
bootkube.service. To watch their status, run e.g.
journalctl -b -f -u release-image.service -u bootkube.service This
is the bootstrap node; it will be destroyed when the master is fully
up.
The primary services are release-image.service followed by
bootkube.service. To watch their status, run e.g.
journalctl -b -f -u release-image.service -u bootkube.service Fedora
CoreOS 32.20200629.3.0 Tracker:
https://github.com/coreos/fedora-coreos-tracker Discuss:
https://discussion.fedoraproject.org/c/server/coreos/
First of all you have to check journalctl -b -f -u release-image.service -u bootkube.service because it's main units. However, there're might be other issues so check failed/not finished systemd's service units with systemctl list-units --type=service and journalctl -f -u <UNIT-NAME> to follow the process
Whole process will spend some time (~20-40 mins in my case) and there're might be some timeouts in journal's logs so just wait the final status of the unit.
Only then you can start initializing of control-plane node
Finally, my issue was in wrong fedora-core-os version because you can use 32 version only. For me it's installed okay with:
fedora-coreos-32.20201104.3.0-metal.x86_64.raw.xz
fedora-coreos-32.20201104.3.0-metal.x86_64.raw.xz.sig
fedora-coreos-32.20201104.3.0-live.x86_64.iso
openshift-client-linux-4.5.0-0.okd-2020-10-15-235428.tar.gz
openshift-install-linux-4.5.0-0.okd-2020-10-15-235428.tar.gz

Related

Ejabberd does not work on MAC with [Failed to open socket at [::]:5222]

I am facing errors at the first cup of Ejabberd.
On my Mac(10.13.6) I installed "ejabberd-18.12.1-osx.app" and I have followed all instruction written in official website. (https://docs.ejabberd.im/admin/installation/#install-on-macos)
After installation was completed I noticed nothing significant and found error logs were generated as below.
2019-01-16 10:02:03.936 [error] <0.316.0>#ejabberd_listener:report_socket_error:417 Failed to open socket at [::]:5222 for ejabberd_c2s: address already in use
2019-01-16 10:02:03.937 [error] <0.315.0> Supervisor ejabberd_listener had child {5222,{0,0,0,0,0,0,0,0},tcp} started with ejabberd_listener:start({5222,{0,0,0,0,0,0,0,0},tcp}, ejabberd_c2s, [{ip,{0,0,0,0,0,0,0,0}},{max_stanza_size,262144},{shaper,c2s_shaper},{access,c2s},{starttls_required,...}]) at undefined exit with reason eaddrinuse in context start_error
2019-01-16 10:02:03.937 [error] <0.274.0> Supervisor ejabberd_sup had child ejabberd_listener started with ejabberd_listener:start_link() at undefined exit with reason {shutdown,{failed_to_start_child,{5222,{0,0,0,0,0,0,0,0},tcp},eaddrinuse}} in context start_error
2019-01-16 10:02:03.942 [critical] <0.81.0>#ejabberd_app:start:66 Failed to start ejabberd application: {error,{shutdown,{failed_to_start_child,ejabberd_listener,{shutdown,{failed_to_start_child,{5222,{0,0,0,0,0,0,0,0},tcp},eaddrinuse}}}}}
I exactly understand what "address already in use" means but netstat does not show any possession on this port. Also I never changed any of the server configuration. I tried to start server manually but same errors repeats.
Does this version of Ejabberd have bugs on Mac installation?
Many thanks in advance.
When ejabberd starts, it uses several ports (some for XMPP, others for additional ejabberd features, others for Erlang). Notice that some ports may be in IPv6:
tcp 0 0 0.0.0.0:42859 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:4560 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:epmd 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:5280 0.0.0.0:* LISTEN
tcp6 0 0 [::]:epmd [::]:* LISTEN
tcp6 0 0 [::]:xmpp-client [::]:* LISTEN
tcp6 0 0 [::]:xmpp-server [::]:* LISTEN
Maybe you have other previous ejabberd installation around there messing? Or other XMPP server?

openshift 3.11 storageos networking issue

I've created an openshift 3.11 3 node cluster, 2 of which are compute
nodes. I've installed storageos on this cluster. One of the compute
nodes seems fine with the storageos installation, however the 2nd
compute node can't reach the 1st node. It appears that the error
is routing related.
the 2nd node will not route to the 1st node it appears.
[root#cortado-o1 standard]# oc get pod -n storageos
NAME READY STATUS RESTARTS AGE
storageos-47qgc 1/1 Running 0 6m
storageos-6bqqp 0/1 Running 3 7m
[root#cortado-o2 ~]# netstat -na | grep 5705
tcp6 0 0 :::5705
[root#cortado-o3 ~]# netstat -na | grep 5705
tcp 0 0 192.168.0.101:43588 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43548 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43522 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43458 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43628 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43602 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43562 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43502 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43476 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43412 192.168.0.101:5705 TIME_WAIT
tcp 0 0 192.168.0.101:43430 192.168.0.101:5705 TIME_WAIT
tcp6 0 0 :::5705 :::* LISTEN
[root#cortado-o3 ~]# !nc
nc 192.168.0.102 5705
Ncat: No route to host.
[root#cortado-o3 ~]# hostname --ip-address
192.168.0.101
time="2018-11-13T04:24:38Z" level=error msg="failed to join existing cluster" action=create category=etcd endpoint="192.168.0.102,192.168.0.101" error="Get http://192.168.0.102:5705/v1/members: dial tcp 192.168.0.102:5705: connect: no route to host" module=cp
time="2018-11-13T04:24:38Z" level=info msg="not first cluster node, joining first node" action=create address=192.168.0.101 category=etcd host=cortado-o3 module=cp target=192.168.0.101
time="2018-11-13T04:24:38Z" level=error msg="failed to join existing cluster" action=create category=etcd endpoint="192.168.0.102,192.168.0.101" error="503 Service Unavailable" module=cp
time="2018-11-13T04:24:38Z" level=info msg="retrying cluster join in 5 seconds..." action=create category=etcd module=cp
any suggestions? many thanks.
I can see on your netstat output that StorageOS is bound to the port, not that they can communicate. In fact the Ncat shows that there is no route to host, so they can't connect. StorageOS needs to be able to communicate among its nodes.
The StorageOS docs have a reference about the prerequisites of the ports and how to open them. https://docs.storageos.com/docs/prerequisites/firewalls
It depends on your OpenShift installation if you use ufw, firewalld or straight ip tables.
For ufw try this:
ufw default allow outgoing
ufw allow 5701:5711/tcp
ufw allow 5711/udp
For firewalld try this:
firewall-cmd --permanent --new-service=storageos
firewall-cmd --permanent --service=storageos --add-port=5700-5800/tcp
firewall-cmd --add-service=storageos --zone=public --permanent
firewall-cmd --reload
For straight iptables:
# Inbound traffic
iptables -I INPUT -i lo -m comment --comment 'Permit loopback traffic' -j ACCEPT
iptables -I INPUT -m state --state ESTABLISHED,RELATED -m comment --comment 'Permit established traffic' -j ACCEPT
iptables -A INPUT -p tcp --dport 5701:5711 -m comment --comment 'StorageOS' -j ACCEPT
iptables -A INPUT -p udp --dport 5711 -m comment --comment 'StorageOS' -j ACCEPT
# Outbound traffic
iptables -I OUTPUT -o lo -m comment --comment 'Permit loopback traffic' -j ACCEPT
iptables -I OUTPUT -d 0.0.0.0/0 -m comment --comment 'Permit outbound traffic' -j ACCEPT
Check also the troubleshooting page of storageos for this particular issue.
https://docs.storageos.com/docs/platforms/openshift/troubleshoot/install#peer-discovery---networking
In addition, less than 3 node cluster is not supported. You can have 1 node for testing or 3+. But having 2 nodes makes impossible to ensure quorum in a distributed environment unless you use StorageOS pointing the kv store to a external etcd.

Bootnode public address

I am trying to deploy an small private Ethereum network using geth. I have a server running geth configured as a miner in my local network. In the other side I have a droplet in DigitalOcean that I want to use as a bootnode to connect future nodes to my network.
I have executed the following commands in my DigitalOcean Droplet:
bootnode --genkey=boot.key
bootnode --nodekey=boot.key --addr:$(MY_PUBLICIP):30301
And I get the following output from the command instead of my public key that I need to introduce as my enode reference in the future nodes:
INFO [10-29|18:13:32.851] New local node record seq=1 id=785b198c28c625f8 ip=<nil> udp=0 tcp=0
Could please somebody tell how to interpret the output from the bootnode command?
I introduced a netstat command in order to find out whether or not the program is opening a port.
ether#ubuntu-s-1vcpu-2gb-ams3-01:~$ netstat -l
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 localhost:domain 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:ssh 0.0.0.0:* LISTEN
tcp6 0 0 [::]:ssh [::]:* LISTEN
udp 6912 0 localhost:domain 0.0.0.0:*
udp 0 0 ubuntu-s-1vcpu-2g:30301 0.0.0.0:*
raw6 0 0 [::]:ipv6-icmp [::]:* 7
raw6 0 0 [::]:ipv6-icmp [::]:* 7
I'm using a standard Ubuntu 18.04 configuration of the basic droplet DigitalOcean, I would like to know if I should configure something else besides the usual compilation of the geth code in order to make a bootnode work.
Thanks any help is welcomed.

Mysql connect host change every time in the docker container,why?

In the docker container.I try to login the host mysql server. First the host ip is changed,so confused for me.But second login success. Anyone can explain this strange happening?
I login ip is 192.168.100.164,but the error info shows ip 172.18.0.4,which is the container localhost.
More info:
root#b67c39311dbb:~/project# netstat -nr
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 172.18.0.1 0.0.0.0 UG 0 0 0 eth0
172.18.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
root#b67c39311dbb:~/project# ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 172.18.0.4 netmask 255.255.0.0 broadcast 172.18.255.255
ether 02:42:ac:12:00:04 txqueuelen 0 (Ethernet)
RX packets 2099 bytes 2414555 (2.4 MB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1752 bytes 132863 (132.8 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
inet 127.0.0.1 netmask 255.0.0.0
loop txqueuelen 1000 (Local Loopback)
RX packets 35 bytes 3216 (3.2 KB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 35 bytes 3216 (3.2 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Try adding --add-host="localhost:192.168.100.164" when launching docker run. But this is not a good practice to my mind. You should move your mysql database to another container and create a network between them
That is true, when we start docker container it is get their own ip for container. You need to map host port with docker container. then, when you try to connect host port it redirect to myssql docker container. Please look at https://docs.docker.com/config/containers/container-networking/
I'd suggest you to create a docker bridged network and then create your container using the --add-host as suggested by Alexey.
In a simple script:
DOCKER_NET_NAME='the_docker_network_name'
DOCKER_GATEWAY='172.100.0.1'
docker network create \
--driver bridge \
--subnet=172.100.0.0/16 \
--gateway=$DOCKER_GATEWAY \
$DOCKER_NET_NAME
docker run -d \
--add-host db.onthehost:$DOCKER_GATEWAY \
--restart=always \
--network=$DOCKER_NET_NAME \
--name=magicContainerName \
yourImage:latest
EDIT: creating a network will also simplify the communication among containers (if you plan to have it in the future) since you'll be able to use the container names instead of their IP.

Can't open port 8080 on Google Compute Engine running Debian

I am trying to run a simple Python http server that displays "hello world" on port 8080 using a micro instance. I also have 4 instances of Tornado running behind Nginx. Connecting to Nginx/Tornado on port 80 is not a problem.
I have added port 8080 to my firewall settings, and ensured port 8080 is open and listening on the server but no matter what I do, my connection is always refused. I have tried connecting using browsers, telnet and wget and every single connection is refused.
Here is the output of netstat -an | grep "LISTEN "
tcp 0 0 0.0.0.0:8000 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:8001 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:8002 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:8003 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:8080 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp6 0 0 :::8000 :::* LISTEN
tcp6 0 0 :::8001 :::* LISTEN
tcp6 0 0 :::8002 :::* LISTEN
tcp6 0 0 :::8003 :::* LISTEN
tcp6 0 0 :::22 :::* LISTEN
Here is my iptables list
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT tcp -- anywhere anywhere tcp dpt:http-alt
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
Here is the Python script I am using:
#!/usr/bin/python
from BaseHTTPServer import BaseHTTPRequestHandler,HTTPServer
PORT_NUMBER = 8080
#This class will handles any incoming request from
#the browser
class myHandler(BaseHTTPRequestHandler):
#Handler for the GET requests
def do_GET(self):
self.send_response(200)
self.send_header('Content-type','text/html')
self.end_headers()
# Send the html message
self.wfile.write("Hello World!")
return
try:
#Create a web server and define the handler to manage the
#incoming request
server = HTTPServer(('', PORT_NUMBER), myHandler)
print 'Started httpserver on port ' , PORT_NUMBER
#Wait forever for incoming htto requests
server.serve_forever()
except KeyboardInterrupt:
print '^C received, shutting down the web server'
server.socket.close()
Does your network have the corresponding firewall rule? Follow the next steps to create it.
Go to the Developers Console and click on the corresponding project.
Click on 'Compute'
Click on 'Networks'
Click on the name of the corresponding network. You can see in which network is your instance clicking on 'VM instances' under the 'Compute Engine' section or with the command:
gcloud compute instances describe <instance> | grep "network:" | awk -F/ '{print $(NF)}'
Under the Firewall rules section, click 'Create new'
Enter a name for the firewall rule and in the field 'Protocols & ports' type: tcp:8080
Save the rule
After that, you should be able to access your HTTP server.
Otherwise you can try to see if your machine receives the SYN TCP packets in that port with the command: sudo tcpdump -i eth0 port 8080
Hope it helps
In GCE Web Console > Networks > Firewall rules > edit your RULE, remove TARGET TAGS and apply.
GL
probably somethings goes wrong when You've created the network rule. When a network rule is described and related MetaTag is created, assure that the VMs instances contain the same MetaTag, so the wanted traffic will be redirected to the machine.
Still not sure what went wrong, but I deleted my instance and network then created new ones. The new instance and network seem to be working fine, so I can only assume something went wrong when playing around with the old network as the new one doesn't seem to have the same problem.
make sure to add the right port.
the answer above states "tcp:80" , but this will not work if your server is running on another port.
thats probably the reason why it is not working for others
In GCE Web Console, goto Networks > Firewall rules > create firewall rule,
then
input a name for the new rule
choose "all instance in the network" for targets
put 0.0.0.0/0 in source IPv4 ranges
checkbox TCP and input 8080 as port
and save