OpenShift scaling best practices

Could you help me with one thing?
I have a project with a few deployments - databases, app1, app2, monitoring, etc.
I have a very specific system that needs to be scaled based on multiple metrics that are stored in the monitoring system.
I have created a small microservice which checks whether the conditions are met (it's not a simple GET - there is a whole algorithm that calculates it) and scales the environment (via oc or curl - it doesn't matter at this point).
Here's my question - is this a good solution? I wonder if it can be done in a better way.
And the second one - is it okay to create a new service account with the edit role just to perform the scaling?
I know it's not complicated when you have some experience with OpenShift, but this system is so specific (autoscaling managed by algorithms) that I couldn't find anything useful in the docs.
Thank you.
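For reference, the scaling call itself can stay very small regardless of which decision algorithm drives it. Below is a minimal sketch using client-go against a plain Deployment; a DeploymentConfig would go through its scale subresource in the same way. The namespace, deployment name, and replica count are placeholders, not values from the question.

package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// scaleTo reads the scale subresource, changes only the replica count, and
// writes it back - the API-level equivalent of "oc scale --replicas=N".
func scaleTo(ctx context.Context, cs *kubernetes.Clientset, ns, name string, replicas int32) error {
	scale, err := cs.AppsV1().Deployments(ns).GetScale(ctx, name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	scale.Spec.Replicas = replicas
	_, err = cs.AppsV1().Deployments(ns).UpdateScale(ctx, name, scale, metav1.UpdateOptions{})
	return err
}

func main() {
	// In-cluster config picks up the service account the microservice runs as.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	// "my-project", "app1", and 3 are placeholder values.
	if err := scaleTo(context.Background(), cs, "my-project", "app1", 3); err != nil {
		log.Fatal(err)
	}
}

On the service account question: a dedicated account works, but the edit role grants far more than scaling. A narrower Role that only allows get/update/patch on the scale subresource of the targeted deployments, bound to that account, is the tighter option.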

Related

Kubernetes container cluster conventions

I have been trying out Kubernetes on GCP to build microservices for a while and it has been amazing so far.
I am a bit confused, though, about what would be the best approach. Should I:
1. create (gcloud container clusters create "[cluster-name]" ...) one container cluster per service?
2. create one container cluster for multiple services? or
3. do both of the above depending on my situation?
All of the examples I have managed to find have only covered #2, but my hunch is telling me that I should do #1, and it is also telling me that I have probably missed some basic understanding of containers. I have been trying to find answers without any luck; I guess I just can't figure out the right search keywords, so I am hoping to find some answers here.
I suspect the answer is "it depends" (Option 3)
How isolated do you need each application to be? How much redundancy (tolerance of VM failure) do you need? How many developers will have access to the Kubernetes cluster? How much traffic do the apps generate?
If in doubt, I recommend running all your apps on a single cluster. It's simpler to manage, and the overhead of providing highly available infrastructure is then shared. You'll also get higher VM utilization, which might result in reduced hosting costs. The latter is a subjective observation; some apps have very little or only occasional traffic, resulting in very bored servers :-)

How to get started with a project

I have to write a program that tests products fully automatically.
I don't have the finished prototype board yet, but I have a development board for the i.MX6UL processor (the evaluation kit linked below).
http://www.nxp.com/products/sensors/gyroscopes/i.mx6ultralite-evaluation-kit:MCIMX6UL-EVK?
My first task is to put U-Boot and a Linux file system on the board through Tcl code (as requested by the customer).
This all has to be done through a USB connection to the development kit board.
NXP provides a tool called MFGTOOL2; with it I can install a fully working Linux, but of course I need to do this from a script and not via the tool itself, because it's for production testing.
All of this has to be installed on NAND flash.
Your question is very vague and likely to not be useful to others as so much of it looks specific to your development environment. This means that the first step is going to be for you to do some research so that you can learn how to split the project up into smaller tasks that are more easily achievable. One of the key things is likely to be identifying a mechanism — any mechanism — for achieving what you want in terms of communication (and enumerating exactly what are the things that will be communicating!) Once you know what program to run, API to call or message to send, you can then think “How do I do that in Tcl?” but until you are clear what you are trying to do, you won't get good answers from anyone else. Indeed, right now your question is so vague that we could answer “write some programs” and have that be a precisely correct (but unhelpful) reply to you.
You might want to start by identifying in your head whether the program will be running on the board, or on the system that is connected to the board, or on a system that is remote from both of those and that will be doing more complex management from afar.

Accessing heapster metrics in Kubernetes code

I would like to expand/shrink the number of kubelets used by the Kubernetes cluster based on resource usage. I have been looking at the code and have some idea of how to implement it at a high level.
I am stuck on 2 things:
What would be a good way to access the cluster metrics (via Heapster)? Should I try to use kubedns to find the Heapster endpoint and directly query the API, or is there some other way? Also, I am not sure how to use kubedns to get the Heapster URL in the former case.
The rescheduler which expands/shrinks the number of nodes will need to kick in every 30 minutes. What would be the best way to do that? Is there some interface in the code which I can use for it, or should I write a code segment which gets called every 30 mins and put it in the main loop?
Any help would be greatly appreciated :)
Part 1:
What you said about using kubedns to find heapster and querying that REST API is fine.
You could also write a client interface that abstracts the interface to heapster -- that would help with unit testing.
Take a look at this metrics client:
https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/podautoscaler/metrics/metrics_client.go
It doesn't do exactly what you want: it gets per-Pod stats instead of per-cluster or per-node stats. But you could modify it.
In function getForPods, you can see the code that resolves the heapster service and connects to it here:
// Resolve the heapster service through the API server's service proxy
// and fetch the raw metrics for the given path and time window.
resultRaw, err := h.client.Services(h.heapsterNamespace).
    ProxyGet(h.heapsterService, metricPath, map[string]string{"start": startTime.Format(time.RFC3339)}).
    DoRaw()
where heapsterNamespace is "kube-system" and heapsterService is "heapster".
That metrics client is part of the "horizontal pod autoscaler" implementation. It is solving a slightly different problem, but you should take a look at it if you haven't already. It is described here: https://github.com/kubernetes/kubernetes/blob/master/docs/design/horizontal-pod-autoscaler.md
FYI: The Heapster REST API is defined here:
https://github.com/kubernetes/heapster/blob/master/docs/model.md
You should poke around and see if there are node-level or cluster-level CPU metrics that work for you.
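If you do put your own client interface in front of Heapster (as suggested above, mainly so the rescaling logic can be unit-tested against a fake), it might look roughly like the sketch below. The type and method names are assumptions for illustration, not an existing API; the real implementation would use the same ProxyGet pattern shown earlier against the heapster service in kube-system.

package metrics

import "time"

// NodeCPU is whatever per-node summary your algorithm needs.
type NodeCPU struct {
	Node       string
	UsageMilli int64 // recent CPU usage in millicores
	Timestamp  time.Time
}

// ClusterMetrics hides how the numbers are obtained: a Heapster proxy call
// in production, canned values in tests.
type ClusterMetrics interface {
	// NodeCPUUsage returns a recent CPU usage sample for every node.
	NodeCPUUsage() ([]NodeCPU, error)
}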
Part 2:
There is no standard interface for shrinking nodes. It is different for each cloud provider. And if you are on-premises, then you can't shrink nodes.
Related discussion:
https://github.com/kubernetes/kubernetes/issues/11935
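As for the "kick in every 30 minutes" part of your question: there is no special interface needed for that; a plain timer loop in your controller's main loop is enough. A minimal sketch, where the interval and checkAndResize are placeholders for your own logic:

package main

import (
	"log"
	"time"
)

// checkAndResize would query Heapster and decide whether to add or remove nodes.
func checkAndResize() error {
	return nil // placeholder
}

func main() {
	ticker := time.NewTicker(30 * time.Minute)
	defer ticker.Stop()
	for range ticker.C {
		if err := checkAndResize(); err != nil {
			log.Printf("resize check failed: %v", err)
		}
	}
}

The Kubernetes controllers you are reading use a similar pattern through the wait utilities (wait.Until), if you prefer to match that style.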
Side note: among Kubernetes developers, we typically use the term "rescheduler" for something that rebalances pods across machines by removing a pod from one machine and creating the same kind of pod on another. That is a different thing from what you are talking about building. We haven't built a rescheduler yet, but there is an outline of how to build one here:
https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/rescheduler.md

Simple Pulling Mechanisms and Approaches in Enterprise Platforms

This question is not directly about code or debugging; it is more about a solution design.
Working on defining a solution around the following requirements:
1. Consume data from diverse upstream systems.
2. Publish the data to the subscribers.
Point 2 is straightforward to some extent, because I just need to identify a suitable pub/sub mechanism that fits an enterprise platform; point 1 is where I am facing some complexity, because I am confused about whether to go for a push approach or a pull approach.
For a push approach, what I can think of is to use an MQ as a broker in between and define a standard message format for the upstream systems. The main drawback of this approach is that it requires some level of code change in the upstream systems.
So, why not the pull approach? If that question arises, I don't have a good answer, because I cannot think of any way to pull the data from the source systems without native services being available on those systems. Please suggest some approaches best suited for this problem.
Please do not suggest ESB-style solutions; this is a simple case we are trying to solve, and an ESB would be overkill. Please let me know if my question is not clear.
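For what it's worth, the "standard message format" piece of the push approach can stay very small. Below is a sketch of what the envelope and the broker-facing abstraction might look like; the field and type names are placeholders, and no particular MQ product is assumed:

package feed

import "time"

// Message is the standard envelope every upstream system would emit.
type Message struct {
	Source    string            // upstream system identifier
	Type      string            // logical event type
	Timestamp time.Time
	Headers   map[string]string // routing / correlation metadata
	Payload   []byte            // serialized body, format agreed per Type
}

// Publisher is the only thing an upstream system codes against in the push
// approach; the concrete MQ broker sits behind it. A pull approach would
// invert this into a Fetcher that the platform calls for each source.
type Publisher interface {
	Publish(m Message) error
}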

Building a recommendation engine for a webapp (Rails + MySQL + ?) -- where to start

I'd like to build a recommendation engine to support a web app which is running on Rails and has its data stored in MySQL... something along the lines where users click on things and their feedback updates the database, which then is processed in some sort of realtime-esque fashion. Order of magnitude I'm thinking probably 10s of interactions a second across all users; 1M datapoints a day.
My question is how to structure and handle the analysis so that things can be processed quickly. Using what I already know, I could use some flavor of Ruby and R (RServe, RSRuby) to run SVD/clustering/ensemble/whatever models on the existing dataset, and update the model/formulas via sampling every so often, but that seems like a really clunky way to do things. What is a better way of doing this? Running the math directly in MySQL? Using some cool Ruby library that has great math functions? Using an off-the-shelf recommendation engine package?
(I have a distinct lack of awareness in what's out there, despite looking at all the "similar questions" links suggested. Sweet irony. :( )
PS: My background: numbers guy with a few years of R, but entirely for static/offline data. Newbie programmer in Python, Rails, etc., but I can work on that front.
Do you really need realtime?
I have found that most of these "realtime" cases don't really require true realtime and can be handled in the background.
Assuming a web shop where you want to give your customers recommendations based on their past purchases, or maybe on the currently selected item (related items other people bought with this one), you could simply precalculate that data at set intervals.
For cases like the one described above I would suggest you use a Rake task to do the heavy lifting (recommendations based on past purchases are not really something that changes during the session, and recommendations for related items are also fairly static).
So I would calculate those in a cron job or some other recurring task that runs asynchronously to your web application, while you serve the resulting (precomputed) data to active users.
That way you also get a bit more flexibility in the complexity of your calculations, since you can run longer than a web request should take at maximum.
A sample rake task would look like this:
task :calculate_recommendations => :environment do
  # do your calculation
  # you have full access to ActiveRecord here
end
(Make sure to include the :environment dependency, otherwise Rake will not load the database connection for you.)
How you do the calculation is up to you, but I would suggest you look at gems like Recommendify to see what libraries they use to calculate recommendations. Maybe that is of help to you.
Also, the Ruby Toolbox has a Recommender-Engine category that lists a few similar gems and may give you pointers in the right direction.