Zabbix items: 36,080.
Zabbix proxy.
The "Zabbix poller processes more than 75% busy" warning fires all the time. I added pollers, but it didn't help!
You should adjust these two values in zabbix_server.conf, depending on the resources you have at your disposal:
StartPollers=56
StartPollersUnreachable=10
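After changing the values, restart the server so the new poller processes are actually started. A minimal sketch, assuming a systemd-based install:

    # after editing /etc/zabbix/zabbix_server.conf (path may differ per distro)
    systemctl restart zabbix-server

You can then watch the internal item zabbix[process,poller,avg,busy] (the one this trigger is built on) to see whether the busy percentage really drops.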
We plan to scale up the max pods per node from 200 to 350.
Based on the documentation, the atomic-openshift-node service needs to be restarted for the new node config to take effect.
The cluster where the node is located serves business-critical DCs, pods, services, routes, etc.
The question is: what is the possible operational impact, if any, during the restart of the atomic-openshift-node service? Or is there no direct impact to the applications at all?
Ref: https://docs.openshift.com/container-platform/3.3/install_config/master_node_configuration.html
Containers should keep running without interruption. But in some cases the node service (node controller) can report a "NotReady" status for some pods. I don't know the exact cause; I suspect a race condition, probably depending on the restart timing and the readiness-probe parameters, and maybe on other node performance conditions.
This may make a service unavailable for a while if its pod is removed from the "router" backends.
Most likely, if only one node is changed at a time and the important applications are scaled with HA rules in mind, there will be no "business impact".
However, with the node configmap introduced in 3.11 (a new design that arrived long after the original question), a configmap change can restart many node controllers in parallel (not literally at the same instant, but within a short period), which I consider a problematic consequence of the node configmap concept (one configmap for all app nodes).
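To keep the blast radius small, you can handle one node at a time and drain it before the restart. A rough sketch, assuming OpenShift 3.x and that the applications tolerate a single-node drain (the node name is a placeholder):

    # stop scheduling onto the node and evict its pods gracefully
    oc adm cordon node1.example.com
    oc adm drain node1.example.com --ignore-daemonsets --delete-local-data

    # apply the new max-pods setting, then restart the node service
    systemctl restart atomic-openshift-node

    # allow workloads back onto the node
    oc adm uncordon node1.example.com

Drained pods are rescheduled elsewhere before the restart, so the "NotReady" race described above never hits live traffic.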
I've been having trouble with our mail server since yesterday.
First, the server was down for a couple of days: thanks to KVM, the VMs were paused because storage was apparently full. I managed to fix that issue. But since the mail server came back online, CPU usage has been constantly at 100%. I checked the logs, and there were "millions" of mails waiting in the Postfix queue.
I flushed the queue with the PFDel script; it took some time, but all the mails were gone and we were finally able to receive new emails. I also forced a logrotate, because fail2ban was also using lots of CPU.
Unfortunately, after a couple of hours the Postfix active queue is growing again, and I really don't understand why.
Another script I found gives me this result right now:
Incoming: 1649
Active: 10760
Deferred: 0
Bounced: 2
Hold: 0
Corrupt: 0
Is there a way to deactivate "Undelivered Mail returned to Sender"?
Any help would be appreciated.
Many thanks
As a first step, you could temporarily stop sending bounce mails completely, or set stricter rules, in order to analyze the reasons for the flood. See for example: http://domainhostseotool.com/how-to-configure-postfix-to-stop-sending-undelivered-mail-returned-to-sender-emails-thoroughly.html
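One approach along the lines of that article is to hand the bounce service over to Postfix's discard(8) daemon, so the "Undelivered Mail returned to Sender" notifications are silently dropped. A sketch only; check the column values against your own master.cf before applying:

    # /etc/postfix/master.cf: let discard(8) handle bounces
    bounce    unix  -       -       n       -       0       discard

    # reload to apply
    postfix reload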
Sometimes spammers find a weakness (or even a vulnerability) in your configuration or in the SMTP server and use it to send spam (even if it can reach the addressee only via a bounce). In that case you will usually find your IP/domain in the common blacklist services (or it will be blacklisted by the large mail providers very fast), which adds to the flood: the bounces get rejected by the recipient servers, which makes your queue grow even more.
So also check your IP/domain using https://mxtoolbox.com/blacklists.aspx or a similar service (sometimes they also show the reason why it was blocked).
As for fail2ban, you can also analyze the logs (find some pattern) to detect the evildoers (the initial senders) and write a custom RE for fail2ban to ban them, for example after 10 attempts in 20 minutes (or add them to an ignore list for bounce messages in Postfix). You'd still send the first X bounces, but after that the repeat offenders' IPs would be banned, which could reduce the flood significantly.
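For illustration, the "10 attempts in 20 minutes" rule could look like this as a jail; the postfix-flood filter (holding your custom RE) is hypothetical and has to match your actual log pattern:

    # /etc/fail2ban/jail.local
    [postfix-flood]
    enabled  = true
    filter   = postfix-flood
    logpath  = /var/log/mail.log
    # 10 hits within 20 minutes (1200 s) => ban for 1 hour
    findtime = 1200
    maxretry = 10
    bantime  = 3600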
And last but not least, check your config (follow the best practices for it) and set up at least MX/SPF records, DKIM signing/verification, and a DMARC policy.
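For reference, minimal SPF and DMARC records look roughly like this (domain and policy values are placeholders to adapt):

    example.com.         IN TXT "v=spf1 mx -all"
    _dmarc.example.com.  IN TXT "v=DMARC1; p=quarantine; rua=mailto:postmaster@example.com"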
The zone does not have enough resources available to fulfill the request / the resource is not ready
I failed to start my instance (through the web browser); it gave me this error:
"The zone 'projects/XXXXX/zones/europe-west2-c' does not have enough resources available to fulfill the request. Try a different zone, or try again later."
At first I thought it might be a quota problem, but after checking my quota, everything looked fine. I listed the available zones and europe-west2-c was available, but I still gave moving the zone a shot. I tried "gcloud compute instances move XXXX --zone europe-west2-c --destination-zone europe-west2-c", but it failed too, popping up this error:
"ERROR: (gcloud.compute.instances.move) Instance cannot be moved while in state: TERMINATED"
Okay, terminated... so then I tried to restart it with "gcloud compute instances reset XXX", and the error showed up this way:
ERROR: (gcloud.compute.instances.reset) Could not fetch resource: - The resource 'projects/XXXXX/zones/europe-west2-c/instances/XXX' is not ready
I searched for the error; some people solved this problem by deleting the disk. But I don't want to wipe the disk's data, so how can I solve this?
BTW, I only have one instance, with one persistent disk attached.
It's recommended to deploy and balance your workload across multiple zones or regions [1] to reduce the likelihood of an outage, by building resilient and scalable architectures.
If you want an immediate solution, create a snapshot [2], then create an instance from the snapshot in a different zone or region [3].
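A sketch of that snapshot-and-recreate flow with gcloud; the disk, snapshot, and instance names and the target zone are placeholders:

    # snapshot the existing persistent disk
    gcloud compute disks snapshot my-disk --zone europe-west2-c --snapshot-names my-snap

    # recreate the disk from the snapshot in another zone
    gcloud compute disks create my-disk-b --source-snapshot my-snap --zone europe-west2-b

    # boot a new instance from the recreated disk
    gcloud compute instances create my-instance-b --zone europe-west2-b --disk name=my-disk-b,boot=yes

The snapshot route also addresses the "I don't want to wipe the data" concern: the disk contents travel with the snapshot.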
If you are still experiencing the same issue after migrating, I suggest contacting GCP support [4].
Is there some way to run GNU parallel with a dynamically changing list of remote hosts? The dynamism isn't intermittent or irregular -- I'm looking for a way to use the Google Compute Engine autoscaling feature to smoothly scale up to a maximum number of hosts and have GNU parallel dispatch jobs as these hosts come alive. I guess I could create a fake job to trigger autoscaling to launch the multiple hosts and have them register themselves in some central host file... Any ideas how best to manage this?
From man parallel:
--slf filename
File with sshlogins. The file consists of sshlogins on
separate lines.
:
If the sshloginfile is changed it will be re-read when a
job finishes though at most once per second. This makes it
possible to add and remove hosts while running.
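So one way to wire this to GCE autoscaling is to regenerate the sshloginfile periodically from the live instance list while parallel runs. A sketch; the instance-name filter and file names are assumptions:

    # refresh the host list every 30 s as the autoscaler adds/removes VMs
    while true; do
      gcloud compute instances list \
        --filter='name~^worker-' \
        --format='value(networkInterfaces[0].networkIP)' > hosts.tmp \
        && mv hosts.tmp hosts.txt
      sleep 30
    done &

    # parallel re-reads hosts.txt as jobs finish (at most once per second)
    parallel --slf hosts.txt < joblist.txt

The mv keeps the update atomic, so parallel never sees a half-written host list.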
I need to set up a job/message queue with the option to set a delay for a task, so that it's not picked up immediately by a free worker, but only after a certain time (which can vary from task to task). I looked into a couple of Linux queue solutions (RabbitMQ, Gearman, MemcacheQ), but none of them seem to offer this feature out of the box.
Any ideas on how I could achieve this?
Thanks!
I've used BeanstalkD to great effect, using the delay option on inserting a new job to wait several seconds till the item becomes available to be reserved.
If you are doing longer-term delays (more than, say, 30 seconds), or the jobs are somewhat important to perform (albeit later), then it also has a binary logging system, so that after any daemon crash there would still be a record of the job. That said, I've put hundreds of thousands of live jobs through Beanstalkd instances, and the workers that I wrote were always more problematic than the server.
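For illustration, putting and reserving a delayed job with the beanstalkc Python client looks roughly like this (host, port, tube name, and job body are assumptions):

    import beanstalkc  # pip install beanstalkc

    conn = beanstalkc.Connection(host='localhost', port=11300)
    conn.use('tasks')            # tube (named queue) to put jobs into
    # the job stays invisible to workers for 30 seconds, then becomes ready
    conn.put('send-report', delay=30)

    # in a worker process:
    worker = beanstalkc.Connection(host='localhost', port=11300)
    worker.watch('tasks')
    job = worker.reserve()       # blocks until a job becomes ready
    job.delete()                 # acknowledge after processing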
You could use an AMQP broker (such as RabbitMQ); I have an "agent" (e.g. a Python process built using python-amqplib) that sits on an exchange and intercepts specific messages (a specific routing_key); once a timer has elapsed, it sends the message back on the exchange with a different routing_key.
I realize this means "translating/mapping" routing keys, but it works. Working with RabbitMQ and python-amqplib is very straightforward.
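A stripped-down sketch of such an agent; the exchange, queue, and routing keys are made up, and the per-message timer is simplified to a blocking sleep:

    import time
    from amqplib import client_0_8 as amqp

    conn = amqp.Connection(host='localhost:5672', userid='guest', password='guest')
    chan = conn.channel()
    chan.exchange_declare(exchange='jobs', type='topic', durable=True, auto_delete=False)
    chan.queue_declare(queue='delayed', durable=True, auto_delete=False)
    chan.queue_bind(queue='delayed', exchange='jobs', routing_key='delay.report')

    def on_message(msg):
        time.sleep(30)  # crude fixed delay; a real agent would schedule per message
        # re-publish under the routing key the workers actually consume
        chan.basic_publish(msg, exchange='jobs', routing_key='ready.report')
        chan.basic_ack(msg.delivery_tag)

    chan.basic_consume(queue='delayed', callback=on_message)
    while chan.callbacks:
        chan.wait()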