Changing metrics time intervals in OCP (OpenShift)

I'm using OpenShift Container Platform. When I view the metrics of a pod, I notice the interval between metric points is set to 8 minutes. Is there any way to make it shorter, or any option to edit that time interval at all?
Thank you.

Related

Having a hard time correctly setting RAILS_MAX_THREADS based on the max number of allowed connections

I'm having a hard time understanding the math I have to do to find the correct value for RAILS_MAX_THREADS based on my infrastructure.
I'm using multiple containers to host one copy of my API that accepts HTTP requests and one copy of my API that runs Sidekiq (job processing). The database I'm using has a max_connections of 45. With that said, what should the value of RAILS_MAX_THREADS be? I'm currently using 9 for both RAILS_MAX_THREADS and WEB_CONCURRENCY. I've read a few articles about it but haven't been able to fully wrap my head around it.
Heroku's docs on Puma sizing are some of the best, even if you aren't using Heroku.
https://devcenter.heroku.com/articles/deploying-rails-applications-with-the-puma-web-server
Where they say "dyno", you can just read "host", "virtual machine", or "container" for a non-Heroku deploy.
If you are using 9 for RAILS_MAX_THREADS and WEB_CONCURRENCY, and your Puma config uses these settings in the normal way, then every host will have 9 Puma workers running (WEB_CONCURRENCY), and each worker will run 9 threads (RAILS_MAX_THREADS), for a total of 9 * 9 = 81 threads.
You really need enough database connections for each thread, so you are already over your 45 database connections by almost 2x. And that's only on ONE container -- if you are running multiple containers, each with these settings, then multiply the 81 by the number of containers -- so that's far too many for your database connections!
So if you are unable to change the max database connections, that is a hard limit and you need to reduce your numbers.
Otherwise, the main limiting factor is how much RAM you have available in each container, and how many vCPUs. Ideally you'd run at least as many workers on a container (WEB_CONCURRENCY) as you have vCPUs -- if you have enough RAM to do it. Workers take a lot of RAM. There is usually no reason to run MORE workers than vCPUs, so whether 9 makes sense or is larger than needed depends on your infrastructure.
How many threads per worker (RAILS_MAX_THREADS) is optimal can depend on exactly what your app is doing, but as a good rule of thumb you can start at 5. 9 is probably more than is useful, generally.
So I'd try a RAILS_MAX_THREADS of 3-5, then as much WEB_CONCURRENCY as you can without running out of RAM (to see how much RAM the app will really take, you may need to leave it up under load for a while). Keep containers * WEB_CONCURRENCY * RAILS_MAX_THREADS below your database's max connections -- if it isn't, either reduce your values so it is, or increase your database max connections. A rough check of those numbers is sketched below.
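A minimal back-of-the-envelope version of that check, in Python; the figures are only the ones discussed in this thread, and the Sidekiq numbers are an assumption you'd adjust for your own setup:

```python
# Rough check: every web thread and every Sidekiq thread may hold a DB
# connection, and the total must stay below the database's max_connections.
def total_db_connections(containers, web_concurrency, rails_max_threads,
                         sidekiq_processes=0, sidekiq_concurrency=0):
    """Rough upper bound on connections the app tier can open at once."""
    web = containers * web_concurrency * rails_max_threads
    jobs = sidekiq_processes * sidekiq_concurrency
    return web + jobs

MAX_CONNECTIONS = 45  # database limit from the question

# Current settings: 9 workers x 9 threads on one web container -> 81 (> 45).
print(total_db_connections(containers=1, web_concurrency=9, rails_max_threads=9))

# One possible direction: 4 workers x 5 threads plus one Sidekiq process
# with 5 threads -> 25, which fits within the 45-connection limit.
print(total_db_connections(containers=1, web_concurrency=4, rails_max_threads=5,
                           sidekiq_processes=1, sidekiq_concurrency=5))
```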

Why does Google Compute Engine autoscaling create too many instances when multi-zone is selected?

I've been using autoscaling based on CPU usage. We used to set it up in a single zone, but to ensure instance availability we are now creating it with multi-zone enabled.
Now it seems to create many more instances than the CPU usage requires. I believe it has to do with the fact that instances are created across different zones and the total-usage calculation somehow is not taking that into account.
According to the documentation, a regional autoscaler keeps at least 3 instances, located in 3 different zones, even if your utilisation is low enough that it could be served by an instance in a single zone. This is to provide resiliency, because a whole region is less likely to go down than a single zone.
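A rough illustration of why that regional floor dominates at low load; this is only a sketch of the behaviour, not the real autoscaler algorithm:

```python
import math

def instances_needed(total_cpu_demand, target_utilization, zones, regional=True):
    """Very rough sketch: instances implied by the CPU signal, plus a
    regional floor of (at least) one instance per zone the group spans."""
    by_cpu = math.ceil(total_cpu_demand / target_utilization)
    floor = zones if regional else 1
    return max(by_cpu, floor)

# Low load: a single-zone group would scale down to 1 instance...
print(instances_needed(0.4, 0.6, zones=1, regional=False))  # 1
# ...but a regional group spread over 3 zones keeps 3 for resiliency.
print(instances_needed(0.4, 0.6, zones=3, regional=True))   # 3
```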

Google Cloud SQL Storage Usage High Issue

We have set up Cloud SQL in Google Cloud with a db-n1-standard-4 tier and 100 GB of SSD storage. My actual database is only about 160 MB at most, but the Cloud SQL instance shows up to 72 GB used, and I don't know why; it's still increasing by about 10 GB per day. Can anyone explain this issue?
Thanks
Most of the time this is due to binary logs that are used for replication.
The growth of binary logs is roughly proportional to the number of modified rows.
Binary logs are purged after 7 days so the space will stabilize after 7 days.
You may also have the general_log option enabled. Check EDIT -> Cloud SQL flags -> general_log; if it is on, turn it off.
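If you want to confirm where the space is going, a quick check along these lines can help; this is only a sketch, the connection details are placeholders, and it assumes a MySQL-based instance plus the mysql-connector-python package:

```python
# Sketch: measure binary-log usage and check the general_log flag on a
# MySQL-based Cloud SQL instance. Host/user/password are placeholders.
import mysql.connector

conn = mysql.connector.connect(host="INSTANCE_IP", user="root", password="PASSWORD")
cur = conn.cursor()

# Each row is (log_name, file_size, ...) depending on the MySQL version.
cur.execute("SHOW BINARY LOGS")
total_bytes = sum(row[1] for row in cur.fetchall())
print(f"binary logs: {total_bytes / 1024 ** 3:.1f} GiB")

# The general query log is another common source of runaway storage growth.
cur.execute("SHOW VARIABLES LIKE 'general_log'")
print(cur.fetchall())

conn.close()
```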

Creating a large click-to-deploy cassandra cluster

I have worked through a number of quota issues while trying to stand up a 30-node click-to-deploy Cassandra cluster. The next issue is that the data disks are not becoming available within the 300 seconds allotted in wait-for-disk.sh.
I've tried several times in us-central1-b, once in us-central1-a and the results range from half of the disks up to 24 of 30. The disks eventually all show up, so no quota issue here, just the timing as far as I can tell.
I've been able to SSH into one node and nearly figure out which steps to run, setting up the required env vars and running the steps in /gagent/. I've gotten the disk mounted and configured and Cassandra started, but the manually repaired node is still missing from the all-important CASSANDRA_NODE_VIEW_NAME, and I must be missing some services because I still can't run cqlsh on that node.
It's tedious, but I could complete the cluster manually this way. Do I need to get the node added to the view? How? Or is there a way to specify a longer timeout in wait-for-disk.sh? I'd be willing to wait a fairly long time rather than do the remaining setup manually.
We'll look at updating the disk wait value for the next release. Thanks for the feedback! You should be able to join the Cassandra cluster manually after running the install scripts in /gagent. Let me know if you're still having trouble.
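In the meantime, the same wait can be reproduced by hand with a longer timeout. A minimal sketch of that kind of polling loop; the device path is only an example and should be matched to the actual data disk:

```python
# Minimal sketch of a "wait for disk" poll with a configurable timeout.
# The device path below is hypothetical; match it to your deployment.
import os
import sys
import time

def wait_for_disk(device, timeout=900, interval=5):
    """Return True once the block device exists, False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(device):
            return True
        time.sleep(interval)
    return False

if __name__ == "__main__":
    disk = sys.argv[1] if len(sys.argv) > 1 else "/dev/disk/by-id/google-cassandra-data"
    if not wait_for_disk(disk):
        sys.exit(f"{disk} did not appear within the timeout")
```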

Best Approach to Storing Mean Uptime Data

We have 500+ remote locations. Each location has a Linux router which checks in to our management system (homemade using RoR3) every 15 minutes.
We need to log and calculate the mean uptime of each box's Internet connectivity.
Each router posts a request every 15 minutes to a script on the server. (Currently this just records the last checkin time and the uptime.)
If we want to plot the historical uptime of each box, what is the most efficient way to do this without clogging up our DB?
500 boxes checking in every 15 minutes would (according to my calculations) result in 17,520,000 inserts per year. That's quite a hefty amount of data that I don't think we need.
Could anyone help solve this riddle for us?
Why not take a look at RRDTool (Wiki entry)? It's just the tool for this kind of situation.
It works as a sort of round-robin, self-averaging database, and it's used in many logging applications for purposes very similar to yours.
As an example, take a look at Cacti, a data-logging / network-monitoring and graphing front-end built around RRDTool (implemented in PHP).
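A minimal sketch of what one RRD per router could look like, assuming the Python rrdtool bindings are available; the file name, step, heartbeat, and archive sizes are only example values to adjust:

```python
# Sketch: one round-robin database per router, updated on every check-in.
import time
import rrdtool  # assumes the python rrdtool bindings are installed

# One data point every 15 minutes (900 s); if no update arrives within
# 30 minutes the sample becomes UNKNOWN, i.e. the box is treated as down.
rrdtool.create(
    "router-0001-uptime.rrd",
    "--step", "900",
    "DS:up:GAUGE:1800:0:1",        # 1 = checked in, UNKNOWN = missed check-in
    "RRA:AVERAGE:0.5:1:2880",      # raw 15-minute samples for ~30 days
    "RRA:AVERAGE:0.5:96:3650",     # daily averages for ~10 years
)

# Called from the check-in endpoint each time a router posts.
rrdtool.update("router-0001-uptime.rrd", f"{int(time.time())}:1")
```

Because the older archives only keep averages, the database stays a fixed size no matter how long you run it, which avoids the millions of raw inserts entirely.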