Our origin-node.service on the master node fails with:
root@master> systemctl start origin-node.service
Job for origin-node.service failed because the control process exited with error code. See "systemctl status origin-node.service" and "journalctl -xe" for details.
root@master> systemctl status origin-node.service -l
[...]
May 05 07:17:47 master origin-node[44066]: bootstrap.go:195] Part of the existing bootstrap client certificate is expired: 2020-02-20 13:14:27 +0000 UTC
May 05 07:17:47 master origin-node[44066]: bootstrap.go:56] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
May 05 07:17:47 master origin-node[44066]: certificate_store.go:131] Loading cert/key pair from "/etc/origin/node/certificates/kubelet-client-current.pem".
May 05 07:17:47 master origin-node[44066]: server.go:262] failed to run Kubelet: cannot create certificate signing request: Post https://lb.openshift-cluster.mydomain.com:8443/apis/certificates.k8s.io/v1beta1/certificatesigningrequests: EOF
So it seems that kubelet-client-current.pem and/or kubelet-server-current.pem contains an expired certificate, and the service tries to create a CSR against an endpoint that is not yet available (because the master is down). We tried redeploying the certificates according to the OpenShift documentation, Redeploying Certificates, but this fails when it detects an expired certificate:
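The expiry can be confirmed directly with openssl (assuming it is available on the node; the paths are the ones from the journal output above):

# Print the expiry (notAfter) of the certificates the kubelet is loading
openssl x509 -noout -enddate -in /etc/origin/node/certificates/kubelet-client-current.pem
openssl x509 -noout -enddate -in /etc/origin/node/certificates/kubelet-server-current.pem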
root@master> ansible-playbook -i /etc/ansible/hosts openshift-master/redeploy-openshift-ca.yml
[...]
TASK [openshift_certificate_expiry : Fail when certs are near or already expired] *******************************************************************************************************************************************
fatal: [master.openshift-cluster.mydomain.com]: FAILED! => {"changed": false, "msg": "Cluster certificates found to be expired or within 60 days of expiring. You may view the report at /root/cert-expiry-report.20200505T042754.html or /root/cert-expiry-report.20200505T042754.json.\n"}
[...]
root@master> cat /root/cert-expiry-report.20200505T042754.json
[...]
"kubeconfigs": [
{
"cert_cn": "O:system:cluster-admins, CN:system:admin",
"days_remaining": -75,
"expiry": "2020-02-20 13:14:27",
"health": "expired",
"issuer": "CN=openshift-signer#1519045219 ",
"path": "/etc/origin/node/node.kubeconfig",
"serial": 27,
"serial_hex": "0x1b"
},
{
"cert_cn": "O:system:cluster-admins, CN:system:admin",
"days_remaining": -75,
"expiry": "2020-02-20 13:14:27",
"health": "expired",
"issuer": "CN=openshift-signer#1519045219 ",
"path": "/etc/origin/node/node.kubeconfig",
"serial": 27,
"serial_hex": "0x1b"
},
[...]
"summary": {
"expired": 2,
"ok": 22,
"total": 24,
"warning": 0
}
}
There is a guide for OpenShift 4.4, Recovering from expired control plane certificates, but it does not apply to 3.11, and we did not find a comparable guide for our version.
Is it possible to recreate the expired certificates without a running master node for 3.11? Thanks for any help.
OpenShift Ansible: https://github.com/openshift/openshift-ansible/releases/tag/openshift-ansible-3.11.153-2
Update 2020-05-06: I also executed redeploy-certificates.yml, but it fails at the same TASK:
root@master> ansible-playbook -i /etc/ansible/hosts playbooks/redeploy-certificates.yml
[...]
TASK [openshift_certificate_expiry : Fail when certs are near or already expired] ******************************************************************************
Wednesday 06 May 2020 04:07:06 -0400 (0:00:00.909) 0:01:07.582 *********
fatal: [master.openshift-cluster.mydomain.com]: FAILED! => {"changed": false, "msg": "Cluster certificates found to be expired or within 60 days of expiring. You may view the report at /root/cert-expiry-report.20200506T040603.html or /root/cert-expiry-report.20200506T040603.json.\n"}
Update 2020-05-11: Running with -e openshift_certificate_expiry_fail_on_warn=False results in:
root@master> ansible-playbook -i /etc/ansible/hosts -e openshift_certificate_expiry_fail_on_warn=False playbooks/redeploy-certificates.yml
[...]
TASK [Wait for master API to come back online] *****************************************************************************************************************
Monday 11 May 2020 03:48:56 -0400 (0:00:00.111) 0:02:25.186 ************
skipping: [master.openshift-cluster.mydomain.com]
TASK [openshift_control_plane : restart master] ****************************************************************************************************************
Monday 11 May 2020 03:48:56 -0400 (0:00:00.257) 0:02:25.444 ************
changed: [master.openshift-cluster.mydomain.com] => (item=api)
changed: [master.openshift-cluster.mydomain.com] => (item=controllers)
RUNNING HANDLER [openshift_control_plane : verify API server] **************************************************************************************************
Monday 11 May 2020 03:48:57 -0400 (0:00:00.945) 0:02:26.389 ************
FAILED - RETRYING: verify API server (120 retries left).
FAILED - RETRYING: verify API server (119 retries left).
[...]
FAILED - RETRYING: verify API server (1 retries left).
fatal: [master.openshift-cluster.mydomain.com]: FAILED! => {"attempts": 120, "changed": false, "cmd": ["curl", "--silent", "--tlsv1.2", "--max-time", "2", "--cacert", "/etc/origin/master/ca-bundle.crt", "https://lb.openshift-cluster.mydomain.com:8443/healthz/ready"], "delta": "0:00:00.182367", "end": "2020-05-11 03:51:52.245644", "msg": "non-zero return code", "rc": 35, "start": "2020-05-11 03:51:52.063277", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
root@master> systemctl status origin-node.service -l
[...]
May 11 04:23:28 master.openshift-cluster.mydomain.com origin-node[109972]: E0511 04:23:28.077964 109972 bootstrap.go:195] Part of the existing bootstrap client certificate is expired: 2020-02-20 13:14:27 +0000 UTC
May 11 04:23:28 master.openshift-cluster.mydomain.com origin-node[109972]: I0511 04:23:28.078001 109972 bootstrap.go:56] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
May 11 04:23:28 master.openshift-cluster.mydomain.com origin-node[109972]: I0511 04:23:28.080555 109972 certificate_store.go:131] Loading cert/key pair from "/etc/origin/node/certificates/kubelet-client-current.pem".
May 11 04:23:28 master.openshift-cluster.mydomain.com origin-node[109972]: F0511 04:23:28.130968 109972 server.go:262] failed to run Kubelet: cannot create certificate signing request: Post https://lb.openshift-cluster.mydomain.com:8443/apis/certificates.k8s.io/v1beta1/certificatesigningrequests: EOF
[...]
I have this same case in a customer environment. The error occurs because the certificate has expired. I "cheated" by changing the OS date to before the expiry date, and the origin-node service started on my masters:
systemctl status origin-node
● origin-node.service - OpenShift Node
Loaded: loaded (/etc/systemd/system/origin-node.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2021-02-20 20:22:21 -02; 6min ago
Docs: https://github.com/openshift/origin
Main PID: 37230 (hyperkube)
Memory: 79.0M
CGroup: /system.slice/origin-node.service
└─37230 /usr/bin/hyperkube kubelet --v=2 --address=0.0.0.0 --allow-privileged=true --anonymous-auth=true --authentication-token-webhook=true --authentication-token-webhook-cache-ttl=5m --authorization-mode=Webhook --authorization-webhook-c...
You have mail in /var/spool/mail/okd
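A rough sketch of that workaround (the exact commands are assumed, not from the original answer; pick a date just before the expiry shown in your logs, and disable clock sync first so the fake date is not immediately reset):

# Stop NTP so the changed date sticks, wind the clock back, start the node
timedatectl set-ntp false
date -s "2020-02-19 12:00:00"
systemctl start origin-node.service
# After redeploying the certificates, re-enable clock sync
timedatectl set-ntp true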
The openshift_certificate_expiry role uses the openshift_certificate_expiry_fail_on_warn variable to determine whether the playbook should fail when the days remaining are fewer than openshift_certificate_expiry_warning_days.
So try running the redeploy-certificates.yml with this additional variable set to "False":
ansible-playbook -i /etc/ansible/hosts -e openshift_certificate_expiry_fail_on_warn=False playbooks/redeploy-certificates.yml
Please bear with me, I'm new to express-fileupload. I have a simple form to upload a file. It works for files of around 50 MB, but it fails for files larger than 100 MB.
Here is the HTML file:
<form action="/parse"
method="post"
ref="parse"
encType="multipart/form-data">
<input type="file" class="btn btn-secondary input-file" name="inputFile" id="inputFile"/>
<input type="submit" class="btn btn-primary submit-btn" value="Upload"/>
</form>
And here is my initialization:
const express = require('express');
const fileUpload = require('express-fileupload');
var app = express();
app.use(fileUpload({
    limits: {
        fileSize: 1024 * 1024 * 1024 * 1024 // per-file size cap (1 TB, effectively unlimited)
    },
    abortOnLimit: false // top-level express-fileupload option, not part of limits
}));
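For completeness, the /parse route is a plain express-fileupload handler along these lines (sketched here; the /tmp destination is illustrative):

// Receive the file posted by the form above; "inputFile" matches the input's name.
app.post('/parse', function (req, res) {
    if (!req.files || !req.files.inputFile) {
        return res.status(400).send('No file uploaded');
    }
    // mv() moves the uploaded file out of the temp buffer onto disk.
    req.files.inputFile.mv('/tmp/' + req.files.inputFile.name, function (err) {
        if (err) return res.status(500).send(err);
        res.send('File uploaded');
    });
});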
It won't even hit my /parse URL; it gives the following error on the Express app console:
0 info it worked if it ends with ok
1 verbose cli [ 'C:\\Program Files\\nodejs\\node.exe',
1 verbose cli '~\AppData\\Roaming\\npm\\node_modules\\npm\\bin\\npm-cli.js',
1 verbose cli 'start' ]
2 info using npm@5.6.0
3 info using node@v8.6.0
4 verbose run-script [ 'prestart', 'start', 'poststart' ]
5 info lifecycle flow-trace-analizer@1.2.2~prestart: flow-trace-analizer@1.2.2
6 info lifecycle flow-trace-analizer@1.2.2~start: flow-trace-analizer@1.2.2
7 verbose lifecycle flow-trace-analizer@1.2.2~start: unsafe-perm in lifecycle true
8 verbose lifecycle flow-trace-analizer@1.2.2~start: PATH: ~\AppData\Roaming\npm\node_modules\npm\node_modules\npm-lifecycle\node-gyp-bin;~\flow-trace-analizer\node_modules\.bin;C:\ProgramData\Oracle\Java\javapath;C:\windows\system32;C:\windows;C:\windows\System32\Wbem;C:\windows\System32\WindowsPowerShell\v1.0\;C:\Program Files (x86)\Sennheiser\SoftphoneSDK\;~\apache-maven-3.3.9\bin;C:\Program Files\nodejs\;C:\Program Files\Git\bin;~\AppData\Roaming\npm;C:\Program Files (x86)\WebEx\PTools020000000;C:\Program Files\nodejs\;C:\Program Files\Git\cmd;C:\Program Files\dotnet\;C:\Program Files\R\R-3.5.0\bin\R.exe;~\Gradle\gradle-4.7\bin;~\Groovy\groovy-2.4.12\bin;C:\Program Files\Heroku\bin;~\AppData\Roaming\npm
9 verbose lifecycle flow-trace-analizer@1.2.2~start: CWD: ~\flow-trace-analizer
10 silly lifecycle flow-trace-analizer@1.2.2~start: Args: [ '/d /s /c', 'node ./bin/www' ]
11 silly lifecycle flow-trace-analizer@1.2.2~start: Returned: code: 3 signal: null
12 info lifecycle flow-trace-analizer@1.2.2~start: Failed to exec start script
13 verbose stack Error: flow-trace-analizer@1.2.2 start: `node ./bin/www`
13 verbose stack Exit status 3
13 verbose stack at EventEmitter.<anonymous> (
~\AppData\Roaming\npm\node_modules\npm\node_modules\npm-lifecycle\index.js:285:16)
13 verbose stack at emitTwo (events.js:125:13)
13 verbose stack at EventEmitter.emit (events.js:213:7)
13 verbose stack at ChildProcess.<anonymous> (
~\AppData\Roaming\npm\node_modules\npm\node_modules\npm-lifecycle\lib\spawn.js:55:14)
13 verbose stack at emitTwo (events.js:125:13)
13 verbose stack at ChildProcess.emit (events.js:213:7)
13 verbose stack at maybeClose (internal/child_process.js:927:16)
13 verbose stack at Process.ChildProcess._handle.onexit (internal/child_process.js:211:5)
14 verbose pkgid flow-trace-analizer@1.2.2
15 verbose cwd ~\flow-trace-analizer
16 verbose Windows_NT 6.1.7601
17 verbose argv "C:\\Program Files\\nodejs\\node.exe" "~\\AppData\\Roaming\\npm\\node_modules\\npm\\bin\\npm-cli.js" "start"
18 verbose node v8.6.0
19 verbose npm v5.6.0
20 error code ELIFECYCLE
21 error errno 3
22 error flow-trace-analizer@1.2.2 start: `node ./bin/www`
22 error Exit status 3
23 error Failed at the flow-trace-analizer@1.2.2 start script.
23 error This is probably not a problem with npm. There is likely additional logging output above.
24 verbose exit [ 3, true ]
No code change is needed. In your package.json, use this command for the start script:
"start": "node --max-old-space-size=4096 ./bin/www"
Or, if you run your Node file directly from the command line, run it like:
node --max-old-space-size=4096 PATH/TO/YOUR/EXECUTABLE/NODE/FILE
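For context, the relevant part of package.json would then look something like this (package name and entry point taken from the npm log above):

{
    "name": "flow-trace-analizer",
    "version": "1.2.2",
    "scripts": {
        "start": "node --max-old-space-size=4096 ./bin/www"
    }
}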
I am setting up a Couchbase cluster. I tried to join 3 nodes, but I got the error below while rebalancing. I extracted some info from debug.log but am unable to identify the exact issue. Appreciate any help.
=========================CRASH REPORT=========================
crasher:
initial call: service_agent:-spawn_connection_waiter/2-fun-0-/0
pid: <0.18486.7>
registered_name: []
exception exit: {no_connection,"index-service_api"}
in function service_agent:wait_for_connection_loop/3 (src/service_agent.erl, line 305)
ancestors: ['service_agent-index',service_agent_children_sup,
service_agent_sup,ns_server_sup,ns_server_nodes_sup,
<0.170.0>,ns_server_cluster_sup,<0.89.0>]
messages: []
links: [<0.18481.7>,<0.18490.7>]
dictionary: []
trap_exit: false
status: running
heap_size: 987
stack_size: 27
reductions: 1195
neighbours:
[ns_server:error,2018-02-12T13:54:43.531-05:00,ns_1@xuodf9.firebrand.com:service_agent-index<0.18481.7>:service_agent:terminate:264]Terminating abnormally
[ns_server:debug,2018-02-12T13:54:43.531-05:00,ns_1@xuodf9.firebrand.com:<0.18487.7>:ns_pubsub:do_subscribe_link:145]Parent process of subscription {ns_config_events,<0.18481.7>} exited with reason {linked_process_died,
<0.18486.7>,
{no_connection,
"index-service_api"}}
[error_logger:error,2018-02-12T13:54:43.531-05:00,ns_1@xuodf9.firebrand.com:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]** Generic server 'service_agent-index' terminating
** Last message in was {'EXIT',<0.18486.7>,
{no_connection,"index-service_api"}}
** When Server state == {state,index,
{dict,6,16,16,8,80,48,
{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},
{{[[{uuid,<<"55a14ec6b06d72205b3cd956e6de60e7">>}|
'ns_1@xuodf7.firebrand.com']],
[],
[[{uuid,<<"c5e67322a74826bef8edf27d51de3257">>}|
'ns_1@xuodf8.firebrand.com']],
[],
[[{uuid,<<"3b55f7739e3fe85127dcf857a5819bdf">>}|
'ns_1@xuodf9.firebrand.com']],
[],
[[{node,'ns_1@xuodf7.firebrand.com'}|
<<"55a14ec6b06d72205b3cd956e6de60e7">>],
[{node,'ns_1@xuodf8.firebrand.com'}|
<<"c5e67322a74826bef8edf27d51de3257">>],
[{node,'ns_1@xuodf9.firebrand.com'}|
<<"3b55f7739e3fe85127dcf857a5819bdf">>]],
[],[],[],[],[],[],[],[],[]}}},
undefined,undefined,<0.18626.7>,#Ref<0.0.5.56873>,
<0.18639.7>,
{[{<0.18646.7>,#Ref<0.0.5.56891>}],[]},
undefined,undefined,undefined,undefined,undefined}
** Reason for termination ==
** {linked_process_died,<0.18486.7>,{no_connection,"index-service_api"}}
[error_logger:error,2018-02-12T13:54:43.532-05:00,ns_1@xuodf9.firebrand.com:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]
=========================CRASH REPORT=========================
crasher:
initial call: service_agent:init/1
pid: <0.18481.7>
registered_name: 'service_agent-index'
exception exit: {linked_process_died,<0.18486.7>,
{no_connection,"index-service_api"}}
in function gen_server:terminate/6 (gen_server.erl, line 744)
ancestors: [service_agent_children_sup,service_agent_sup,ns_server_sup,
ns_server_nodes_sup,<0.170.0>,ns_server_cluster_sup,
<0.89.0>]
messages: [{'EXIT',<0.18639.7>,
{linked_process_died,<0.18486.7>,
{no_connection,"index-service_api"}}}]
links: [<0.18487.7>,<0.4805.0>]
dictionary: []
trap_exit: true
status: running
heap_size: 28690
stack_size: 27
reductions: 6334
neighbours:
[error_logger:error,2018-02-12T13:54:43.533-05:00,ns_1@xuodf9.firebrand.com:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]
=========================SUPERVISOR REPORT=========================
Supervisor: {local,service_agent_children_sup}
Context: child_terminated
Reason: {linked_process_died,<0.18486.7>,
{no_connection,"index-service_api"}}
Offender: [{pid,<0.18481.7>},
{name,{service_agent,index}},
{mfargs,{service_agent,start_link,[index]}},
{restart_type,permanent},
{shutdown,1000},
{child_type,worker}]
[ns_server:error,2018-02-12T13:54:43.533-05:00,ns_1@xuodf9.firebrand.com:service_rebalancer-index<0.18626.7>:service_rebalancer:run_rebalance:80]Agent terminated during the rebalance: {'DOWN',#Ref<0.0.5.56860>,process,
<0.18481.7>,
{linked_process_died,<0.18486.7>,
{no_connection,"index-service_api"}}}
[error_logger:info,2018-02-12T13:54:43.534-05:00,ns_1#xuodf9.firebrand.com:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]
=========================PROGRESS REPORT=========================
supervisor: {local,service_agent_children_sup}
started: [{pid,<0.20369.7>},
{name,{service_agent,index}},
{mfargs,{service_agent,start_link,[index]}},
{restart_type,permanent},
{shutdown,1000},
{child_type,worker}]
[ns_server:error,2018-02-12T13:54:43.534-05:00,ns_1@xuodf9.firebrand.com:service_agent-index<0.20369.7>:service_agent:handle_call:186]Got rebalance-only call {if_rebalance,<0.18626.7>,unset_rebalancer} that doesn't match rebalancer pid undefined
[ns_server:error,2018-02-12T13:54:43.534-05:00,ns_1@xuodf9.firebrand.com:service_rebalancer-index<0.18626.7>:service_agent:process_bad_results:815]Service call unset_rebalancer (service index) failed on some nodes:
[{'ns_1@xuodf9.firebrand.com',nack}]
[ns_server:warn,2018-02-12T13:54:43.534-05:00,ns_1@xuodf9.firebrand.com:service_rebalancer-index<0.18626.7>:service_rebalancer:run_rebalance:89]Failed to unset rebalancer on some nodes:
{error,{bad_nodes,index,unset_rebalancer,
[{'ns_1@xuodf9.firebrand.com',nack}]}}
[error_logger:error,2018-02-12T13:54:43.535-05:00,ns_1@xuodf9.firebrand.com:error_logger<0.6.0>:ale_error_logger_handler:do_log:203]
=========================CRASH REPORT=========================
crasher:
initial call: service_rebalancer:-spawn_monitor/6-fun-0-/0
pid: <0.18626.7>
registered_name: 'service_rebalancer-index'
exception exit: {linked_process_died,<0.18486.7>,
{no_connection,"index-service_api"}}
in function service_rebalancer:run_rebalance/7 (src/service_rebalancer.erl, line 92)
ancestors: [cleanup_process,ns_janitor_server,ns_orchestrator_child_sup,
ns_orchestrator_sup,mb_master_sup,mb_master,<0.4893.0>,
ns_server_sup,ns_server_nodes_sup,<0.170.0>,
ns_server_cluster_sup,<0.89.0>]
messages: [{'EXIT',<0.18640.7>,
{linked_process_died,<0.18486.7>,
{no_connection,"index-service_api"}}}]
links: []
dictionary: []
trap_exit: true
status: running
heap_size: 2586
stack_size: 27
reductions: 6359
neighbours:
[ns_server:error,2018-02-12T13:54:43.536-05:00,ns_1@xuodf9.firebrand.com:cleanup_process<0.18625.7>:service_janitor:maybe_init_topology_aware_service:84]Initial rebalance for `index` failed: {error,
{initial_rebalance_failed,index,
{linked_process_died,<0.18486.7>,
{no_connection,
"index-service_api"}}}}
[ns_server:debug,2018-02-12T13:54:43.536-05:00,ns_1@xuodf9.firebrand.com:menelaus_cbauth<0.4796.0>:menelaus_cbauth:handle_cast:95]Observed json rpc process {"projector-cbauth",<0.5099.0>} needs_update
[ns_server:debug,2018-02-12T13:54:43.538-05:00,ns_1@xuodf9.firebrand.com:menelaus_cbauth<0.4796.0>:menelaus_cbauth:handle_cast:95]Observed json rpc process {"goxdcr-cbauth",<0.479.0>} needs_update
[ns_server:debug,2018-02-12T13:54:43.539-05:00,ns_1@xuodf9.firebrand.com:menelaus_cbauth<0.4796.0>:menelaus_cbauth:handle_cast:95]Observed json rpc process {"cbq-engine-cbauth",<0.5124.0>} needs_update
[ns_server:debug,2018-02-12T13:54:43.540-05:00,ns_1@xuodf9.firebrand.com:menelaus_cbauth<0.4796.0>:menelaus_cbauth:handle_cast:95]Observed json rpc process {"fts-cbauth",<0.5129.0>} needs_update
This is a blocker for cluster creation at this point.
The rebalance error is caused by the index service. Check indexer.log to see whether it reports any errors and whether the process is able to bootstrap correctly.
Please make sure the communication ports are open as described here: https://developer.couchbase.com/documentation/server/current/install/install-ports.html
A blocked projector_port (9999) in particular can lead to this.
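To quickly check reachability of the index-related ports from another node, something like this should succeed for every port (a sketch; the hostname is taken from the logs above, and the port list follows the linked documentation):

# 9100-9105 are index service ports; 9999 is the projector port
for port in 9100 9101 9102 9103 9104 9105 9999; do
    nc -zv xuodf9.firebrand.com "$port"
done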
I am trying to provision an AWS AMI, but the Packer script terminates with the following error.
Build 'amazon-ebs' errored: Script exited with non-zero exit status: 127
==> Some builds didn't complete successfully and had errors:
--> amazon-ebs: Script exited with non-zero exit status: 127
==> Builds finished but no artifacts were created.
My template for Packer is as follows:
{
  "variables": {
    "aws_access_key": "{{env `MY_ACCESS_KEY`}}",
    "aws_secret_key": "{{env `MY_SECRET_KEY`}}"
  },
  "builders": [{
    "type": "amazon-ebs",
    "access_key": "{{user `aws_access_key`}}",
    "secret_key": "{{user `aws_secret_key`}}",
    "region": "us-east-1",
    "source_ami": "ami-8c1be5f6",
    "instance_type": "t2.micro",
    "ssh_username": "ec2-user",
    "ami_name": "packer-example {{timestamp}}"
  }],
  "provisioners": [{
    "type": "shell",
    "script": "provision.sh"
  }]
}
The error log from PACKER_LOG=1 packer build template.json is as follows:
2017/11/01 01:30:37 [INFO] (telemetry) ending amazon-ebs
2017/11/01 01:30:37 [INFO] (telemetry) found error: Script exited with non-zero exit status: 127
2017/11/01 01:30:37 ui error: Build 'amazon-ebs' errored: Script exited with non-zero exit status: 127
2017/11/01 01:30:37 Builds completed. Waiting on interrupt barrier...
2017/11/01 01:30:37 machine readable: error-count []string{"1"}
2017/11/01 01:30:37 ui error:
==> Some builds didn't complete successfully and had errors:
2017/11/01 01:30:37 machine readable: amazon-ebs,error []string{"Script exited with non-zero exit status: 127"}
2017/11/01 01:30:37 ui error: --> amazon-ebs: Script exited with non-zero exit status: 127
2017/11/01 01:30:37 ui:
==> Builds finished but no artifacts were created.
Build 'amazon-ebs' errored: Script exited with non-zero exit status: 127
provision.sh consists of:
#!/bin/bash
sudo yum install httpd -y
sudo yum update -y
sudo aws s3 cp s3://zbcxlkxcjvlxkj/index1.html /var/www/html/ --region us-east-1
sudo service httpd start
sudo chkconfig httpd on
Exit code 127 is returned by the shell when a command is not found. Most likely one of the commands used in your script is not installed on the image at the time the script runs.
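One way to pin down which command is missing (a sketch, assuming bash on the instance): enable tracing at the top of provision.sh and verify each external tool exists before the script uses it:

#!/bin/bash
set -euxo pipefail   # print each command as it runs; stop at the first failure

# Exit 127 early with a clear message if a required command is absent
for cmd in yum aws service chkconfig; do
    command -v "$cmd" >/dev/null || { echo "missing: $cmd" >&2; exit 127; }
done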