Can PM2 take an action upon a process being marked "Errored" - pm2

PM2 will mark a process as "errored" if it restarts more than "max_restarts" times, with each run lasting less than "min_uptime". Perhaps it happens in other circumstances as well.
I'd like to take an action in the event that such a string of fatal errors occurs. In my case, I'd like to reboot the whole machine, since it means something horrible has happened. Is this possible?
Note: I now see that it's possible to do this when PM2 is being used programmatically (see answer below). Is there a way to do it automatically through the CLI instead? Something similar to a githook that runs automatically upon the "errored" status being raised.

If PM2 is being used programmatically, this function can be used:
pm2.describe(process,errback)
It passes a 'processDescription' to the callback, which includes 'pm2_env', which in turn includes 'status'. For a process in this state, the status would show 'errored'.
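A minimal sketch of that check, assuming the `pm2` package is installed and a PM2 daemon is running. The helper names `hasErrored` and `checkProcess` are made up for illustration, and the helper accepts either a single description or an array, since the exact shape may depend on the PM2 version:

```javascript
// Pure helper: true when a describe() result reports status "errored".
// Accepts a single process description or an array of them.
function hasErrored(described) {
  const list = Array.isArray(described) ? described : [described];
  return list.some((d) => Boolean(d && d.pm2_env && d.pm2_env.status === 'errored'));
}

// Connects to the PM2 daemon, describes one process, and reports whether it is errored.
// The `pm2` package is loaded lazily so the helper above works without it.
function checkProcess(name, callback) {
  const pm2 = require('pm2');
  pm2.connect((connectErr) => {
    if (connectErr) return callback(connectErr);
    pm2.describe(name, (err, described) => {
      pm2.disconnect();
      if (err) return callback(err);
      callback(null, hasErrored(described));
    });
  });
}
```

Calling `checkProcess('my-app', (err, errored) => { ... })` from a timer would let a script react once the status flips; `my-app` is a placeholder process name.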
This may answer the question for someone else, but it does not answer the question for me, as I would like to use PM2 via CLI call, and not from within another node script.

The question is quite old, but I had the same problem and nowadays, there is a CLI solution:
You can use pm2 jlist to get the current process list as JSON and parse it for example with jq. To search for all processes managed by pm2 in status "errored", you could call something like:
pm2 jlist | jq '.[] | {"name": .name, "status": .pm2_env.status} | select(.status=="errored")'
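If jq is not available, the same filter can be sketched in Node. This assumes `pm2` is on the PATH and that `pm2 jlist` prints a JSON array whose entries carry `name` and `pm2_env.status`, as the jq filter above implies. The function names are hypothetical, and the action to take (a reboot, an alert) is left to the caller:

```javascript
const { execFile } = require('node:child_process');

// Pure helper: names of all entries whose pm2_env.status is "errored".
function findErrored(processList) {
  return processList
    .filter((p) => p && p.pm2_env && p.pm2_env.status === 'errored')
    .map((p) => p.name);
}

// Runs `pm2 jlist` and passes the names of errored processes to the callback.
function checkPm2(callback) {
  execFile('pm2', ['jlist'], { maxBuffer: 16 * 1024 * 1024 }, (err, stdout) => {
    if (err) return callback(err);
    let list;
    try {
      list = JSON.parse(stdout);
    } catch (parseErr) {
      return callback(parseErr);
    }
    callback(null, findErrored(list));
  });
}

// Usage (for example from a cron job):
// checkPm2((err, names) => {
//   if (err) throw err;
//   if (names.length > 0) console.log(`errored: ${names.join(', ')}`);
// });
```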


How to get who or what turned off a pod?

We are currently trying to debug an issue with a pod and noticed that 6 other (unrelated) pods were turned off. We want to find out when that happened and who or what turned them off, to see whether it is related to the first issue.
Is it possible to get this kind of information with OpenShift?
These operations are typically recorded in the audit logs (if you have enabled those): https://docs.openshift.com/container-platform/4.7/security/audit-log-view.html
So you can filter the log for certain actions, for example like so (this keeps everything except GET actions):
oc adm node-logs node-1.example.com --path=oauth-apiserver/audit.log \
| jq 'select(.verb != "get")'
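To narrow this down to "who removed a pod", the audit lines can also be filtered in a small Node script. The sketch below assumes the log is newline-delimited JSON using the standard Kubernetes audit event fields (`verb`, `user.username`, `objectRef`, `requestReceivedTimestamp`). The function name is hypothetical. Note that pod deletions are most likely recorded in the kube-apiserver audit log rather than the oauth-apiserver one; that is an assumption worth checking on your cluster.

```javascript
// Parses newline-delimited audit events and returns a summary of pod deletions.
function findPodDeletions(auditLogText) {
  return auditLogText
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => {
      try {
        return JSON.parse(line);
      } catch {
        return null; // skip lines that are not valid JSON
      }
    })
    .filter((e) => e && e.verb === 'delete' && e.objectRef && e.objectRef.resource === 'pods')
    .map((e) => ({
      when: e.requestReceivedTimestamp,
      who: e.user ? e.user.username : undefined,
      namespace: e.objectRef.namespace,
      pod: e.objectRef.name,
    }));
}
```

The input text could come from the output of `oc adm node-logs` saved to a file and read with `fs.readFileSync`.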

How to ping from Zabbix agent?

Is it possible to ping from Zabbix agent and pass that data into Zabbix server? I would like to be able to get response time from the agent.
I read that it is possible by using fping, would be great if someone could guide me to the correct path.
Thank you,
Rijath Mohammed
While that is not currently available out of the box, you can implement such a functionality using a feature called "user parameters". This forum thread has a simple example:
UserParameter=myping[*],/etc/zabbix/fping -q $1;echo $?
Although for you the path to fping is likely to be /usr/sbin/fping or /usr/bin/fping.
You can read more about user parameters in the official manual: https://www.zabbix.com/documentation/3.0/manual/config/items/userparameters .
While I haven't ever configured that, it would be similar on Windows - see this forum thread for some inspiration.
And if you would like to see this feature implemented out of the box, make sure to vote on this feature request.
Got it working using the PowerShell script below :)
$Test = test-connection google.com -count 1
$Test.responsetime
This will just return the response time for Google.com and that value is passed to Zabbix using the below user parameter:
UnsafeUserParameters=1
UserParameter=ping.google,C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe C:\zabbix\pinggoogle.ps1
I am calling this parameter from Zabbix using the key "ping.google"

Fiware CEP server stops responding

In developing in Fi-Cloud's CEP I've been having an issue that has been happening repeatedly. As I'm trying to develop a definition to perform a task, CEP's server and Authoring Tool stop responding, although ssh is still responsive.
This issue happens as I develop. I'm using the AuthoringTool to alter the definition bit by bit and then I re-upload it to the server through the authoring tool's export feature.
To restart Proton with the new definition each time I alter it, I use Google's Postman with this single operation:
-PUT (url:http://{ip}:8080/ProtonOnWebServerAdmin/resources/instances/ProtonOnWebServer)
header: 'Content-Type' : 'application/json'; body : {"action": "ChangeDefinitions","definitions-url" : "/ProtonOnWebServerAdmin/resources/definitions/Definition_Name"}
At the same time, I'm logged in with three ssh instances: one to monitor the files being created in /opt/tomcat10/sample/ and other things, and the other two to 'tail -f ' the log files the definition writes to as events are processed (one log for events received and another for events detected by the EPAgent).
I iterate through these procedures over and over as I develop, and eventually the CEP server and the Authoring Tool stop responding.
By "tailing" tomcat's log file (# tail -f /opt/tomcat10/logs/catalina.out) I can see that, under these circumstances, if I attempt a:
-GET (url: http://{ip}:8080/ProtonOnWebServerAdmin/resources/instances/ProtonOnWebServer)
I get no response back and tomcat logs the following response:
11452100 [http-bio-8080-exec-167] ERROR org.apache.wink.server.internal.RequestProcessor - An unhandled exception occurred which will be propagated to the container.
java.lang.OutOfMemoryError: PermGen space
Exception in thread "http-bio-8080-exec-167" java.lang.OutOfMemoryError: PermGen space
Ssh is still responsive and I can look at tomcat's log this way.
To get over this and continue, I exit ssh connections and restart CEP's instance in the Fi-Cloud.
Is the procedure I'm using to re-upload and re-run the definition inappropriate? Should I take a different approach to developing?
When you update a definition that the CEP is already working with, and you want the CEP engine to work with the updated definition, you need to:
1. Export the definition using the authoring tool export (as you did).
2. Stop the engine run, using REST PUT:
PUT //host:8080/ProtonOnWebServerAdmin/resources/instances/ProtonOnWebServer
{"action":"ChangeState","state":"stop"}
3. Start the engine, using REST PUT:
PUT //host:8080/ProtonOnWebServerAdmin/resources/instances/ProtonOnWebServer
{"action":"ChangeState","state":"start"}
You don't need to invoke the "ChangeDefinitions" action, since the engine is already working with that definition name.
The "ChangeDefinitions" action only affects the next run of the CEP and has no effect on the current run.
This answers your question about how you should update a CEP definition.
Hope it will solve your issue.
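The stop/start sequence described above could be scripted instead of clicked through in Postman. This is only a sketch: it assumes Node 18 or later (for the built-in `fetch`), plain HTTP on port 8080 as in the question, and the `Content-Type` header the question already uses. The function names are made up for illustration.

```javascript
// Builds the URL and fetch options for a ChangeState request.
function changeStateRequest(host, state) {
  return {
    url: `http://${host}:8080/ProtonOnWebServerAdmin/resources/instances/ProtonOnWebServer`,
    options: {
      method: 'PUT',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ action: 'ChangeState', state }),
    },
  };
}

// Stops the engine, then starts it again, failing fast on a non-2xx response.
async function restartEngine(host) {
  for (const state of ['stop', 'start']) {
    const { url, options } = changeStateRequest(host, state);
    const response = await fetch(url, options);
    if (!response.ok) {
      throw new Error(`ChangeState ${state} failed with HTTP ${response.status}`);
    }
  }
}
```

After exporting from the authoring tool, `restartEngine('<ip>')` would then replace the manual PUT requests.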

my nodejs script is not exiting on its own after successful execution

I have written a script that updates my db table after reading data from db tables and Solr. I am using the async.waterfall module. The problem is that the script does not exit after all operations have completed successfully. I also use a db connection pool, which I suspect may be making the script wait indefinitely.
I want to put this script in crontab, and if it does not exit properly it will create a lot of unnecessary instances.
I just went through this issue.
The problem with just using process.exit() is that the program I am working on was creating handles, but never destroying them.
It was processing a directory and putting data into orientdb.
So some of the things that I have come to learn are that database connections need to be closed before getting rid of the reference, and that process.exit() does not solve all cases.
When my project processed 2,000 files, it would get down to about 500 left, and by then the extra handles had filled up the available working memory. That meant it could not continue, and so it never reached the process.exit() at the end.
On the other hand, if you close the items that are requesting the app to stay open, you can solve the problem at its source.
The two undocumented functions that I was able to use were:
process._getActiveHandles();
process._getActiveRequests();
I am not sure what other functions will help with debugging these types of issues, but these ones were amazing.
They return an array, and you can determine a lot about what is going on in your process by using these methods.
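As a rough illustration of how those arrays can be inspected, the sketch below summarises the open handles by constructor name. Both calls are undocumented internals, so their behaviour may differ between Node versions; the helper name `activeHandleNames` is made up for this example.

```javascript
// Returns the constructor names of everything currently keeping the event loop alive,
// e.g. "Server" for a listening net server or "Socket" for an open connection.
function activeHandleNames() {
  return process
    ._getActiveHandles()
    .map((handle) => (handle && handle.constructor ? handle.constructor.name : typeof handle));
}

// Returns how many asynchronous requests (e.g. pending file system calls) are in flight.
function activeRequestCount() {
  return process._getActiveRequests().length;
}
```

Logging `activeHandleNames()` just before the point where the script should exit usually shows which connection or server is still open.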
You have to tell it when you're done, by calling
process.exit();
More specifically, you'll want to call this in the callback from async.waterfall() (the second argument to that function). At that point, all your asynchronous code has executed, and your script should be ready to exit.
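A sketch of such a final callback that releases the connection pool before finishing. It assumes the pool exposes `end(callback)`, as the `mysql` package's pool does; `makeDone` and `onFinished` are hypothetical names. Closing the pool first often lets Node exit on its own, so process.exit() becomes a fallback rather than a necessity.

```javascript
// Builds the final callback for async.waterfall(tasks, done).
// It closes the pool, then reports the first error (if any) and the result.
function makeDone(pool, onFinished) {
  return function done(err, result) {
    pool.end((closeErr) => {
      onFinished(err || closeErr || null, result);
    });
  };
}

// Usage, assuming `async`, `tasks` and `pool` already exist in the script:
// async.waterfall(tasks, makeDone(pool, (failure) => {
//   if (failure) console.error(failure);
//   process.exitCode = failure ? 1 : 0; // or call process.exit() here
// }));
```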
EDIT: As pointed out by @Aaron below, this likely has to do with something like a database connection being active, and not allowing the node process to end.
You can use the node module why-is-node-running:
Run npm install -D why-is-node-running
Add import * as log from 'why-is-node-running'; in your code
When you expect your program to exit, add a log statement:
afterAll(async () => {
  await app.close();
  log();
});
This will print a list of open handles with a stacktrace to find out where they originated:
There are 5 handle(s) keeping the process running
# Timeout
/home/maf/dev/node_modules/why-is-node-running/example.js:6 - setInterval(function () {}, 1000)
/home/maf/dev/node_modules/why-is-node-running/example.js:10 - createServer()
# TCPSERVERWRAP
/home/maf/dev/node_modules/why-is-node-running/example.js:7 - server.listen(0)
/home/maf/dev/node_modules/why-is-node-running/example.js:10 - createServer()
We can quit the execution by using:
connection.destroy();
If you use Visual Studio Code, you can attach to an already running Node script directly from it.
First, run the Debug: Attach to Node Process command.
When you invoke the command, VS Code will prompt you for the Node.js process to attach to.
Your terminal should display this message:
Debugger listening on ws://127.0.0.1:9229/<...>
For help, see: https://nodejs.org/en/docs/inspector
Debugger attached.
Then, inside your debug console, you can use the code from The Lazy Coder’s answer:
process._getActiveHandles();
process._getActiveRequests();

How do BundleActivator, ManagedService, and my application interact on start/stop?

I had a non-OSGi application. To convert it to OSGi, I first bundled it up and gave it a simple BundleActivator. The activator's start() started up a thread of what used to be the main() of my app (and is now a Runnable), and remembered that thread. The activator's stop() interrupted that thread, and waited for it to end (via join()), then returned. This all seemed to be working fine.
As a next step in the OSGiification process, I am now trying to use OSGi configuration management instead of the Properties-based configuration that the application used to use. So I am adding in a ManagedService in addition to the Activator.
But it's no longer clear to me how I am supposed to start and stop my application; examples that I've seen are only serving to confuse me. Specifically, here:
http://felix.apache.org/site/apache-felix-config-admin.html
They no longer seem to do any real starting of the application in BundleActivator.start(). Instead, they just register a ManagedService to receive configuration. So I'm guessing maybe I start up the app's main thread when I receive configuration, in the ManagedService? They don't show it - the ManagedService's updated() just has vague comments saying to "apply configuration from config admin" when it is passed a non-null Dictionary.
So then I look here:
http://blog.osgi.org/2010/06/how-to-use-config-admin.html
In there, it seems like maybe they're doing what I guessed. They seem to have moved the actual app from BundleActivator to ManagedService, and are dealing with starting it when updated() receives non-null configuration, stopping it first if it's already started.
But now what about when the BundleActivator's stop() gets called?
Back on the first example page that I mentioned above, they unregister the ManagedService. On the second example page, they don't show what they do.
So I'm guessing maybe unregistering the ManagedService will cause null configuration to be sent to ManagedService.updated(), at which point I can interrupt the app thread, wait for it to end, and then return?
I suspect that I'm thoroughly incorrect, but I don't know what the "real" way to do this is. Thanks in advance for any help.
BundleActivator (BA) and ManagedService (MS) are callbacks to your bundle. BundleActivator is for the active state of your bundle. BA.start is called when your bundle is being started and BA.stop when it is being stopped. MS is called to provide your bundle with a configuration, if there is one, or to notify you that there is no configuration.
So in BA.start, you register your MS service and return. When MS is called (on some other thread), you will either receive your configuration or be told there is no configuration, and you can act accordingly (start app, etc.).
Your MS can also be called at any time to advise you of the modification or deletion of your configuration, and you should act accordingly (i.e. adjust your app behavior).
When you are called at BA.stop, you need to stop your app. You can unregister the MS or let the framework do it for you as part of normal bundle stop processing.