Why is GCE VM with COS image faster? - google-compute-engine

I have two GCE instances: one with a COS image running CentOS 7 in a container. Let's call it VM1. And another with a CentOS 7 image directly on it. Let's call it VM2. Both of them run the same PHP app (Laravel).
VM1
Image: COS container with CentOS 7
Type: n1-standard-1 (1 vCPUj, 3,75 GB)
Disk: persistent disk 10 GB
VM2
Image: CentOS 7
Type: n1-standard-1 (2 vCPUj, 3,75 GB)
Disk: persistent disk 20 GB
As you can see, VM2 has a slightly better spec than VM1. So it should perform better, right?
That said, when I request an specific endpoint, VM1 responds in ~ 1.6s, while VM2 responds in ~ 10s. It's about 10x slower. The endpoint does exactly the same thing on both VMs, it queries the data base on a GCP SQL instance, and returns the results. Nothing abnormal.
So, it's almost the same hardware, it's the same guest OS and the same app. The only difference is that the VM1 is running the app via Docker.
I search and tried to debug many things but have no idea of what is going on. Maybe I'm misunderstanding something.
My best guess is that the COS image has some optimization that makes the app execution faster. But I don't know what exactly. Firstly I thought it could be some disk IO problem. But the disk utilization is OK on VM2. Then I thought it could be some OS configuration, then I compared the sysctl settings of both VMs and there's a lot of differences, as well. But I'm not sure what could be the key for optimization.
My questions are: why is this difference? And what can I change to make VM2 as faster as VM1?

First of all, Container-Optimized OS is based on the open-source Chromium OS, it is not CentOS, basically it is another Linux distribution.
Having said that, you need to understand that this OS is optimized for running Docker containers.
It means that Container-Optimized OS instances come pre-installed with the Docker runtime and cloud-init and basically that is all that this OS contains, because it is a minimalistic container-optimized operating system.
So, this OS doesn’t waste resources with all the applications, libraries and packages that CentOS have that can consume extra resources.
I have installed both OSs in my own project to check the Disk usage of each OS, and Container-Optimized OS from Google only uses 740MB, and CentOS consumes 2.1GB instead.
$ hostnamectl | grep Operating
Operating System: Container-Optimized OS from Google
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/root 1.2G 740M 482M 61% /
$ hostnamectl | grep Operating
Operating System: CentOS Linux 7 (Core)
$ df -h
/dev/sda2 20G 2.1G 18G 11% /
I wasn't able to use a small persistent disk with CentOS, the minimum is 20 GB.
On the other hand, containers let your apps run with fewer dependencies on the host virtual machine (VM) and run independently from other containerized apps that you deploy to the same VM instance and optimize the resources used.

I don't have many experience with GCP (just Azure and AWS), but this problem maybe about latency. So You need confirm if all your assets are in the same region.
You can try to verify the time to response from each VM to your BD. With this information you will be able to know if this situation is about of latency or not.
https://cloud.google.com/compute/docs/regions-zones/

Related

EVE-NG QEMU based nodes are not starting

Setup:Dell PowerEdge R620 128GB Ram 12 Core server.
Vmware ESXI 6.5 Based setup: 1 VM for EVE-NG: 500GB SSD + 32 GB allocated RAM.
2nd VM for Windows Server 2016: 100GB HDD + 16 GB RAM.
On Windows client, I can access the EVE-NG via Firefox and Putty. I have tried cisco Dynamips images and nodes are starting (I can telnet with putty and change config)
When I try to created nodes based on Qemu Images(Cisco, Aruba, Paloalto, etc), the nodes do not start. I have followed the guidelines for qcow2 names as well as checked multiple sources. I have also edited the node and tried to play with all possible settings.
I have reinstalled the EVE-NG on ESXi as well but the issue remains the same.
Thanks a lot for your help and support.
I finally found an answer in the EVE-NG cookbook: https://www.eve-ng.net/wp-content/uploads/2020/06/EVE-Comm-BOOK-1.09-2020.pdf
Page33: Step 6: IMPORTANT Open VM Settings. Set the quantity of CPUs and number of
cores per socket. Set Intel VT-x/EPT. Hardware Virtualization engine to ON
(checked).
Once I checked this field, all nodes are started to work.
On CPU Settings on Set "Enable Virtualized CPU Performance Counter"
This helped me a bit. I was using mac and struggling to get the consoles up. Posting the below steps in case it helps someone like me:
Shut down the VM using command shutdown -h now
Go to the VM settings
Click Advanced -> Check "Remote display over VNC"
Check Enable IOMMU in this virtual machine

Image push/pull very slow with OpenShift Origin 3.11

I'm setting up an OKD Cluster in a VM via oc cluster up. My VM has 6 GB of RAM and 2 CPUs. Interactions with the internal image registry are very, very slow (multiple minutes to pull or push an image, e.g. when building an application via S2I). At the same time, htop shows me a CPU utilization of 100% for both CPUs within the VM. Is there any way to avoid this issue?

Instance is overutilized. Consider switching to the machine type: g1-small

I created a new f1 micro instance with Ubuntu 16.04. I haven't logged in yet as I have not figured out how to create the SSH key-pair yet. But after two days, the Dashboard now shows:
Instance "xxx" is overutilized. Consider switching to the machine type: g1-small
Why is this happening? Isn't a f1 micro similar to an ec2 t1.nano? I have a t1.nano running a Node.js web site (with nginx, pm2, etc) and my CPU credit has been consistently at the maximum of 150 during this period with only me as a test user.
I started the f1 micro to run the same Node application to see which is more cost-effective. The parameter that was cloudy to me was that unexplained "0.2 virtual CPU". Is 0.2 CPU virtually unuseable? Would 0.5 (g1 small) be significantly better?
To address your connection problems, perhaps temporarily until you figure out the manual key management, you might want to try SSH from the browser which is possible from the Cloud Platform console or use gcloud CLI to assist you.
https://cloud.google.com/compute/docs/instances/connecting-to-instance
Once you get access via the terminal I would run 'top' or 'ps'.
Example of using ps to find the top CPU users:
ps wwaxr -o pid,stat,%cpu,time,command | head -10
Example of running top to find the top memory users:
top -l 1 -o rsize | head -20
Google Cloud also offers a monitoring product called Stackdriver which would give you this information in the Cloud console but it requires an agent to be running on your VM. See the getting started guide if this sounds like a good option for you.
https://cloud.google.com/monitoring/quickstart-lamp
Once you get access to the resource usage data you should be able to determine if 1) the VM isn't powerful enough to run your node.js server or 2) perhaps something else got started on the host unexpectedly and that's the source of your usage.

EC2 Instance is running very slow

I am running an EC2 Instance on Ubuntu Server machine. Tomcat and MySQL are installed and deployed java web-application on it since 1 month. It was running good with great performance for almost 1 month but now my application is responding very slow.
Also, point to note is: Earlier when I used to log into my Ubuntu Server through PuTTY, it was quick but now its taking time even when I enter Ubuntu password.
Is there any solution?
I would start with checking with memory/CPU/network availability to check if it is not bottleneck.
Try following commands:
To check memory availability:
free -m
To check CPU usage:
top
To check network usage:
ntop
To check disk usage:
df -h
To check disk io operations:
iotop
Please also check if when you disable your application you are able to quickly log in to that machine. If login is still slow, then you should contact your EC2 support complaining about poor performance and asking for assigning more resources for that machine.
You can use WAIT Tool to diagnose what is wrong with your server or your application. The tool will gather all information about CPU and memoru utilization, running threads etc.
In addition, I would definitely check Tomcat application server with VisualVM or some other profiler. For configuring JMX for Tomcat you can check article here.
For network monitoring - nload tool is worth your attention. You can launch it in screen so you always check network utilization stats when server is slown.
First check is there any application using too much cpu or memory. This can be checked by using top command. I'll tell you two simple shortcut keys that may be helpful while using top command. In top command result page, if you enter M it will sort application based on memory usage, from highest to lowest. If you enter P it will sort application based on cpu usage, from highest to lowest.
If you are unable to find any suspicious application using top you can use iotop it will show disk I/O usage details.
I was facing the same issue, the solution which worked for me was
Restart the ec2 instance
Edit
lately, I figure out this issue is happening due to the fewer resources (memory, CPU) available to the EC2 machine. So check available resources to the EC2 machine.

Development Environment for Testing MySQL Replication

Is there an easy way to setup an environment on one machine (or a VM) with MySQL replication? I would like to put together a proof of concept of MySQL replication with one Master write instance and two slave instances for reads.
I can see doing it across 2 or 3 VMs running on my computer, but that would really bog down my system. I'd rather have everything running on the same VM. What's the best way to proof out scalability solutions like this in a local dev environment?
Thanks for your help,
Dave
I think to truly test MySQL Replication it is important to do so in realistic constraints.
If you put all the replicate nodes under one operating system then you no longer have the bandwidth constraint, the data transfer speed would be much higher that what you would get if those replicate DBs are on different sites.
Everything under one VM is a shortcut to configurations, for instance it does not make you go through the configuration of the networking.
I suggest you use multiple VMs, even if you have to put them under one physical machine, you can always configure the hypervisor to make the packets go through a router, in which case the I/O will be bound by whatever the network interface has as throughput.
I can see doing it across 2 or 3 VMs
running on my computer, but that would
really bog down my system.
You can try and make a few VMs with JeOS (Just Enough OS) versions of the operating system you want. I know Ubuntu has one and it can boot on 128 RAM, which makes it convenient to deploy lots of cloned VMs under one physical machine without monster RAM.
Next step would be doing the same thing on a cloud (Infrastructure as a Service, IaaS) provider, and try your setup on different geographical sites.
If what you're testing is machine-to-machine replication, then setting up multiple VMs on a virtual private network would be the correct environment to test it. If you use Ubuntu Server, you don't have to install more than you actually need -- just give the VMs enough space for a base install + MySQL + your data. Memory usage can be as little as 256MB per VM. All you have to do is suspend or shutdown the VMs when you're not running a full-up test.
I've had situations where I was running 4 or more VMs simultaneously on my workstation, either for development or testing purposes -- it's not that taxing unless you're trying to do video rendering in each VM.