gear size of more than one CPU / core in OpenShift Enterprise 2? - openshift

I'm setting up OpenShift Enterprise 2 and I'd like to create a district with a larger gear size. Changing
/etc/openshift/resource_limits.conf
on the nodes is straightforward for increasing memory and disk available to the gear, but CPU resource management is less intuitive (from resource_limits.conf):
# cpu cpu_rt_period_us=100000 cpu_rt_runtime_us=950000
cpu_shares=128
cpu_cfs_quota_us=100000
By default, a gear can only consume a maximum of 100% of a single processor core. If I want to allow a bigger gear size that could allow full utilization of 2 processor cores, how would I do that, or is it currently not possible at all in OpenShift?

Since all the gears are the same, and since 'cpu_shares' are compared on a relative basis when restricting a group, I'm not sure it makes sense to change 'cpu_shares'.
However, 'cpu_cfs_quota_us' looks like it might be the right knob to turn. From this page:
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html
It appears that I should be able to double the quota to get a full 2 cores. However, it's not clear whether OpenShift will respect this, since the 'cpu_cfs_period_us' parameter is not even found in resource_limits.conf.
I performed an experiment using 'stress'. I first confirmed that I could load 2 cores under a normal ssh login (using 'stress --cpu 2'). Then I logged in to a gear on that host and ran the same thing. With cpu_cfs_quota_us=100000, I can only consume a max of 50% CPU for each stress process. But when I change to cpu_cfs_quota_us=200000, I can consume over 99% for each process, so it appears that it is now successful. Would be nice if this was called out in the OpenShift docs...

Related

Compute Engine - Automatic scale

I have one VM Compute Engine to host simple apps. My apps is growing and the number of users too.
Now my users work basicaly from 08:00 AM to 07:00 PM, in this period the usage os CPU and Memory is High and the speed of work is very important.
I'm preparing to expand the memory and processor in the next days, but i search a more scalable and cost efective way.
Is there a way for automatic add resources when i need and reduce after no more need?
Thanks
The cost of running your VMs is directly related to a number of different factors i.e. the type of network in use (premium vs standard), the machine type, the boot disk image you use (premium vs open-source images) and the region/zone where your workloads are running, among other things.
Your use case seems to fit managed instance groups (MIGs). With MIGs you essentially configure a template for VMs that share the same attributes. During the configuration of your MIG, you will be able to specify the CPU/memory limit beyond which the MIG autoscaler will kick off. When your CPU/memory reading goes below that threshold, MIG scales your VMs down to the number of instances specified in your template.
You can also use requests per second as a threshold for autoscaling and I would recommend you explore the docs to know more about it.
See docs

looking for simple cluster configuration

I am using compute engine for embarrassingly parallel scientific calculations. Some of my calculations require a single core and some require 64-cores machines. I am currently using my own scripts: I have a qsub-like command that creates a new instance with the required number of cores, booting it from a custom image with the pre-installed software, connects to a storage bucket via gcsfuse, runs the required command and then kills the instance after it's done.
Do I really need to do all of that with my own scripts, or is there any tool that I should use instead? I'd much rather use some ready made tool for all of the management.
My usage fluctuates widely (hundreds of cores in parallel for 3 hours, then 2 days with nothing, etc). So I don't want constant sized machines: I like to be billed by the minute for my computations.
You may want to use auto-scaling feature for managed instance group in Google Compute Engine(GCE). This feature adds more instances to your instance group when there is more load (upscaling), and removes instances when there is less load (downscaling). Moreover, you can define autoscaling policy based upon CPU utilization, or Load balancer utilization or request per seconds. Please refer autoscaler decisions document to understand decisions that autoscaler might make when scaling instance groups.

How to correctly calculate RAM requirement for a bucket in Couchbase

We have a bucket of about 34 million items in a Couchbase cluster setup of 6 AWS nodes. The bucket has been allocated 32.1GB of RAM (5482MB per node) and is currently using 29.1GB. If I use the formula provided in the Couchbase documentation (http://docs.couchbase.com/admin/admin/Concepts/bp-sizingGuidelines.html) it should use approx. 8.94GB of RAM.
Am I calculating it incorrectly? Below is link to google spreadsheet with all the details.
https://docs.google.com/spreadsheets/d/1b9XQn030TBCurUjv3bkhiHJ_aahepaBmFg_lJQj-EzQ/edit?usp=sharing
Assuming that you indeed have a working set of 0.5%, which as Kirk pointed out in his comment, is odd but not impossible, then you are calculating the result of the memory sizing formula correctly. However, it's important to understand that the formula is not a hard and fast rule that fits all situations. Rather, it's a general guideline and serves as a good starting point for you to go and begin your performance tests. Also, keep in mind that the RAM sizing isn't the only consideration for deciding on cluster size, because you also have to consider data safety, total disk write throughput, network bandwidth, CPU, how much a single node failure affects the rest of the cluster, and more.
Using the result of the RAM sizing formula as a starting point, you should now actually test whether your working assumptions were correct. Which means putting real (or close to representative) load on the bucket and seeing whether the % of cache misses is low enough and the operation lacency is within your acceptable limits. There is no general rule for this, what's acceptable to some applications might be too slow for others.
Just as an example, if you see that under load your cache miss ratio is 5% and while the average read latency is 3ms, the top 1% latency is 100ms - then you have to consider whether having one out of every 100 reads take that much longer is acceptable in your application. If it is - great, if not - you need to start increasing the RAM size until it matches your actual working set. Similarly, you should keep an eye on the disk throughput, CPU usage, etc.

Knowing ressources taken by libgdx

I've made a game using libgdx. When I launch it, it takes ~30 Mo of RAM. But the memory taken in RAM keep expanding with time, whereas all of the textures are loaded.
Is there a way to know the ressources used by libgdx?
Yes there is. Start the jvisualvm.exe from your java sdk and connect to your running application. (Program/java/jdk_x_x/bin) It does show you the real RAM usage and the created classes and so on (See the Monitor tab).
Moreover you can profile with it to check if there are performance issues. It also can track the RAM usage. Check out the sampler tab for profiling. Simply start the sampler play a bit and shut down the game. After that it asks if you like to have a snapshot of the current status. Or just stop sampling and check the data. Take it and check than which of your stuff does take the most RAM and so on.
Else go for logging your Assetloading and check if you meight make misstakes.

swap space used while physical memory is free

i recently have migrated between 2 servers (the newest has lower specs), and it freezes all the time even though there is no load on the server, below are my specs:
HP DL120G5 / Intel Quad-Core Xeon X3210 / 8GB RAM
free -m output:
total used free shared buffers cached
Mem: 7863 7603 260 0 176 5736
-/+ buffers/cache: 1690 6173
Swap: 4094 412 3681
as you can see there is 412 mb ysed in swap while there is almost 80% of the physical ram available
I don't know if this should cause any trouble, but almost no swap was used in my old server so I'm thinking this does not seem right.
i have cPanel license so i contacted their support and they noted that i have high iowait, and yes when i ran sar i noticed sometimes it exceeds 60%, most often it's 20% but sometimes it reaches to 60% or even 70%
i don't really know how to diagnose that, i was suspecting my drive is slow and this might cause the latency so i ran a test using dd and the speed was 250 mb/s so i think the transfer speed is ok plus the hardware is supposed to be brand new.
the high load usually happens when i use gzip or tar to extract files (backup or restore a cpanel account).
one important thing to mention is that top is reporting that mysql is using 100% to 125% of the CPU and sometimes it reaches much more, if i trace the mysql process i keep getting this error continually:
setsockopt(376, SOL_IP, IP_TOS, [8], 4) = -1 EOPNOTSUPP (Operation not supported)
i don't know what that means nor did i get useful information googling it.
i forgot to mention that it's a web hosting server for what it's worth, so it has the standard setup for web hosting (apache,php,mysql .. etc)
so how do i properly diagnose this issue and find the solution, or what might be the possible causes?
As you may have realized by now, the free -m output shows 7603MiB (~7.6GiB) USED, not free.
You're out of memory and it has started swapping which will drastically slow things down. Since most applications are unaware that the virtual memory is now coming from much slower disk, the system may very well appear to "hang" with no feedback describing the problem.
From your description, the first process I'd kill in order to regain control would be the Mysql. If you have ssh/rsh/telnet connectivity to this box from another machine, you may have to login from that in order to get a usable commandline to kill from.
My first thought (hypothesis?) for what's happening is...
MySQL is trying to do something that is not supported as this machine is currently configured. It could be missing a library or an environment variable is not set or any number things.
That operation allocates some memory but is failing and not cleaning up the allocation when it does. If this were a shell script, it could be fixed by putting an event trap command at the beginning that runs a function that releases memory and cleans up.
The code is written to keep retrying on failure so it rapidly uses up all your memory. Refering back to the shell script illustration, the trap function might also prompt to see if you really want to keep retrying.
Not a complete answer but hopefully will help.