Why does the CPU load dropped in the last days? - google-compute-engine

Anybody has a hint? I didn't change anything in the machine (except for the security updates), and the sites hosted there didn't suffer a significant change in connections.
May be Google changed something in their infrastructure? Coincidentally, it was an issue with the Cloud DNS ManagedZone these days: they charged me with $ 920 for half month usage, and it was an error (they counted thousands of weeks of usage too) so they recently changed back to $ 0,28. May be there was some process that indeed used the Cloud DNS by error and thus consumed CPU power, and they corrected now?
I wish to know what is happening from someone that knows what going on in GC. Thank you.

CPU utilization reporting is now more accurate from a VM guest perspective as it doesn't include virtualization layer overhead anymore. It has nothing to do with Cloud DNS.
See this issue for some extra context:
https://code.google.com/p/google-compute-engine/issues/detail?id=281

Related

IMX6ULL 4G download stalling issue

We have been using IMX6ULL processors along with a Quectel 4G module in our custom made boards. The 4G module can be initialised, brought up and the PPP0 interface can also be initialised which in-turn does provide us with internet connectivity too but, when we start downloading files (of about 10 MB - 200 MB), we have observed that the download begins to stall at irregular intervals. While the download does stall, the PPP0 interface is still up but we lose internet connectivity hence, we have to kill PPPD and re-initialise PPP0.
We have tried using different variations of PPP0 initialisation scripts that we could get our hands on but the issue still persisted however, recently when we wanted to dump the traffic on the PPP0 interface using TCPDUMP in-order to analyse the same, we observed that the download does not stall anymore and we also observed a much better 4G throughput too. We have still not been able to figure out why this is indeed the case. Any inputs or guidance on the same would be of great help.
P.S: The kernel version we have been using is 4.1.15 but, we have observed a similar behavior with the 5.4.70 kernel too.
Thanks in advance
Regards
Nitin
Check the 4G network first with AT+COPS? and AT+CSQ
Does the Module disconnect from the base station?
Do not try kill pppd and restart setting up ppp0, and try AT+CFUN=0 \ AT+CFUN=1 to restart the network registration first.
And for 4G module, the Quectel provide a tool named quectel-CM to setup internet connection, it is of better performance than ppp.
btw, have you check the the memory used and the CPU status?

Kubernetes on GCE / Prevent pods undergoing an eviction with "The node was low on compute resources."

Painful investigation on aspects that so far are not that highlighted by documentation (at least from what I've googled)
My cluster's kube-proxy became evicted (+-experienced users might be able to consider the faced issues). Searched a lot, but no clues about how to have them up again.
Until describing the concerned pod gave a clear reason : "The node was low on compute resources."
Still not that experienced with resources balance between pods/deployments and "physical" compute, how would one 'prioritizes' (or similar approach) to make sure specific pods will never end up in such a state ?
The cluster has been created with fairly low resources in order to get our hands on while keeping low costs and eventually witnessing such problems (gcloud container clusters create deemx --machine-type g1-small --enable-autoscaling --min-nodes=1 --max-nodes=5 --disk-size=30), is using g1-small is to prohibit ?
If you are using iptables-based kube-proxy (the current best practice), then kube-proxy being killed should not immediately cause your network connectivity to fail, but new services and updates to endpoints will stop working. Still, your apps should continue to work, but degrade slowly.
If you are using userspace kube-proxy, you might want to upgrade.
The error message sounds like it was due to memory pressure on the machine.
When there is memory pressure, Kubelet tries to terminate things in order of lowest to highest QoS level.
If your kube-proxy pod is not using Guaranteed resources, then you might want to change that.
Other things to look at:
if kube-proxy suddenly used a lot more memory, it could be terminated. If you made a huge number of pods or services or endpoints, this could cause it to use more memory.
if you started processes on the machine that are not under kubernetes control, that could cause kubelet to make an incorrect decision about what to terminate. Avoid this.
It is possible that on such a small machine as a g1-small, the amount of node resources held back is insufficient, such that too much guaranteed work got put on the machine -- see allocatable vs capacity. This might need tweaking.
Node oom documentation

How to find running process details?

i need to get the running process on windows phone 8. and also the details of each process like.
Memory usage
CPU usage
Running time
and what if i want to kill any process by user action
and also total used and free memory and CPU usage status
help required
regards
On Windows Phone, applications are sandboxed. You can get the amount of memory your own application uses, but that's about it. You can't get any kind of information about the other applications. Of course, it also means you can't kill them.
If you ever wondered why there isn't any task manager app on the marketplace, now you know.

swap space used while physical memory is free

i recently have migrated between 2 servers (the newest has lower specs), and it freezes all the time even though there is no load on the server, below are my specs:
HP DL120G5 / Intel Quad-Core Xeon X3210 / 8GB RAM
free -m output:
total used free shared buffers cached
Mem: 7863 7603 260 0 176 5736
-/+ buffers/cache: 1690 6173
Swap: 4094 412 3681
as you can see there is 412 mb ysed in swap while there is almost 80% of the physical ram available
I don't know if this should cause any trouble, but almost no swap was used in my old server so I'm thinking this does not seem right.
i have cPanel license so i contacted their support and they noted that i have high iowait, and yes when i ran sar i noticed sometimes it exceeds 60%, most often it's 20% but sometimes it reaches to 60% or even 70%
i don't really know how to diagnose that, i was suspecting my drive is slow and this might cause the latency so i ran a test using dd and the speed was 250 mb/s so i think the transfer speed is ok plus the hardware is supposed to be brand new.
the high load usually happens when i use gzip or tar to extract files (backup or restore a cpanel account).
one important thing to mention is that top is reporting that mysql is using 100% to 125% of the CPU and sometimes it reaches much more, if i trace the mysql process i keep getting this error continually:
setsockopt(376, SOL_IP, IP_TOS, [8], 4) = -1 EOPNOTSUPP (Operation not supported)
i don't know what that means nor did i get useful information googling it.
i forgot to mention that it's a web hosting server for what it's worth, so it has the standard setup for web hosting (apache,php,mysql .. etc)
so how do i properly diagnose this issue and find the solution, or what might be the possible causes?
As you may have realized by now, the free -m output shows 7603MiB (~7.6GiB) USED, not free.
You're out of memory and it has started swapping which will drastically slow things down. Since most applications are unaware that the virtual memory is now coming from much slower disk, the system may very well appear to "hang" with no feedback describing the problem.
From your description, the first process I'd kill in order to regain control would be the Mysql. If you have ssh/rsh/telnet connectivity to this box from another machine, you may have to login from that in order to get a usable commandline to kill from.
My first thought (hypothesis?) for what's happening is...
MySQL is trying to do something that is not supported as this machine is currently configured. It could be missing a library or an environment variable is not set or any number things.
That operation allocates some memory but is failing and not cleaning up the allocation when it does. If this were a shell script, it could be fixed by putting an event trap command at the beginning that runs a function that releases memory and cleans up.
The code is written to keep retrying on failure so it rapidly uses up all your memory. Refering back to the shell script illustration, the trap function might also prompt to see if you really want to keep retrying.
Not a complete answer but hopefully will help.

What happens during Stand-By and Hibernation?

It just hit me the other day. What actually happens when I tell the computer to go into Stand-By or to Hibernate?
More spesifically, what implications, if any, does it have on code that is running? For example if an application is compressing some files, encoding video files, checking email, running a database query, generating reports or just processing lots of data or doing complicated math stuff. What happens? Can you end up with a bug in your video? Can the database query fail? Can data processing end up containing errors?
I'm asking this both out of general curiosity, but also because I started to wonder if this is something I should think about when I program myself.
You should remember that the OS (scheduler) freezes your program about a gazillion times each second. This means that your program can already function pretty well when the operating system freezes it. There isn't much difference, from your point of view, between stand-by, hibernate and context switching.
What is different is that you'll be frozen for a long time. And this is the only thing you need to think about. In most cases, this shouldn't be a problem.
If you have a network connection you'll probably need to re-establish it, and similar issues. But this just means checking for errors in all IO operations, which I'm sure you're already doing... :-)
My initial thought is that as long as your program and its eco-system is contained within the pc that is going on stand - by or hibernation, then, upon resume your program should not be affected.
However, if you are say updating a record in some database hosted on a separate machine then hibernation / stand - by will be treated as a timeout.
If your program is dependent on such a change in "power status" you can listen to WM_POWERBROADCAST Message as mentioned on msdn
Stand-By keeps your "state" alive by keeping it in RAM. As a consequence if you lose power you'll lose your stored "state".
But it makes it quicker to achieve.
Hibernation stores your "state" in virtual RAM on the hard disk, so if you lose power you can still come back three days later. But it's slower.
I guess a limitation with Stand-By is how much RAM you've got, but I'm sure virtual RAM must be employed by Stand-By when it runs out of standard RAM. I'll look that up though and get back!
The Wikipedia article on ACPI contains the details about the different power savings modes which are present in modern PCs.
Here's the basic idea, from how I understand things:
The basic idea is to keep the current state of the system persisted, so when the machine is brought back into operation, it can resume at the state it was before the machine was put into sleep/standby/hibernation, etc. Think of it as serialization for your PC.
In standby, the computer will keep feeding power to the RAM, as the main memory is volatile memory that needs constant refreshing to hold on to its state. This means that the hard drives, CPU, and other components can be turned off, as long as there is enough power to keep the DRAM refreshed to keep its contents from disappearing.
In hibernation, the main memory will also be turned off, so the contents must be copied to permanent storage, such as a hard drive, before the system power is turned off. Other than that, the basic premise of hiberation is no different from standby -- to store the current state of the machine to restore at a later time.
With that in mind, it's probably not too likely that going into standby or hibernate will cause problems with tasks that are executing at the moment. However, it may not be a good idea to allow network activity to stop in the middle of execution, as depending on the protocol, your network connection could timeout and be unable to resume upon returning the system to its running state.
Also, there may be some machines that just have flaky power-savings drivers which may cause it to go to standby and never come back, but that's completely a different issue.
There are some implications for your code. Hibernation is more than just a context switch from the scheduler. Network connections will be closed, network drives or removable media might be disconnected during the hibernation, ...
I dont think your application can be notified of hibernation (but I might be wrong). What you should do is handle error scenarios (loss of network connectivity for example) as gracefully as possible. And note that those error scenario can occur during normal operation as well, not only when going into hibernation ...