Dataflow in OpenShift: Configure memory for individual Dataflow tasks

I'd like to configure individual Data Flow tasks with their own memory request and limit values. The default configuration works fine for most tasks, but some tasks have higher memory needs. We can launch those tasks from Data Flow with their own properties, overriding the default configuration. But is it possible to configure individual tasks in the dataflow-config on OpenShift, so that we don't have to pass those overriding arguments each time we launch the task?
Something like this:
deployer:
  kubernetes:
    requests:
      memory: '256Mi'
      cpu: '1m'
    limits:
      memory: '4Gi'
      cpu: '6000m'
  my-individual-task:
    kubernetes:
      requests:
        memory: '8G'
      limits:
        memory: '8G'
Testing it with this configuration, the "my-individual-task" had the default configuration with 256Mi-4Gi instead of 8G-8G. (I restarted the dataflow pod with the new configuration before starting the task.)

You shouldn't set task-specific properties as server-level properties. Instead, you can set them as task launch properties when launching the task. For instance:
task launch mytask --properties "deployer.my-individual-task.kubernetes.requests.memory=8G,deployer.my-individual-task.kubernetes.limits.memory=8G"

Testing it with this configuration, the "my-individual-task" had the default configuration with 256Mi-4Gi instead of 8G-8G
In my experience, the only way to successfully change the memory and CPU for an individual task is to provide the following properties when launching the task from the Data Flow Server UI (as properties) or when using the Task Java DSL:
deployer.<application>.kubernetes.limits.cpu=1000m
deployer.<application>.kubernetes.limits.memory=1024Mi
deployer.<application>.kubernetes.requests.cpu=800m
deployer.<application>.kubernetes.requests.memory=640Mi
Note: Specifying these in the application.properties or application.yaml file has no effect whatsoever.
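For example, launched from the Data Flow shell, following the same pattern as the command above (the task/application name my-individual-task stands in for your own, and the values are illustrative):
task launch my-individual-task --properties "deployer.my-individual-task.kubernetes.requests.cpu=800m,deployer.my-individual-task.kubernetes.requests.memory=640Mi,deployer.my-individual-task.kubernetes.limits.cpu=1000m,deployer.my-individual-task.kubernetes.limits.memory=1024Mi"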

Related

How to change memory allocation for cloud function when deploying by gcloud command

When deploying a cloud function, I'm using a command like this:
gcloud functions deploy MyCloudFunction --runtime nodejs8 --trigger-http
Default memory allocation is 256MB. I changed it to 1GB using the Google Cloud console in the browser.
Is there a way to change memory allocation when deploying by gcloud command?
You might want to read over the CLI documentation for gcloud functions deploy.
You can use the --memory flag to set the memory:
gcloud functions deploy MyCloudFunction ... --memory 1024MB
You may also need to increase the CPU count to be able to raise memory beyond 512 MiB. Otherwise, with the default 0.33 vCPU Cloud Function allocation, I saw errors like the following, where [SERVICE] is the name of your Cloud Function:
ERROR: (gcloud.functions.deploy) INVALID_ARGUMENT: Could not update Cloud Run service [SERVICE]. spec.template.spec.containers[0].resources.limits.memory: Invalid value specified for container memory. For 0.333 CPU, memory must be between 128Mi and 512Mi inclusive.
From https://cloud.google.com/run/docs/configuring/cpu#command-line, this can be done by calling gcloud run services update [SERVICE] --cpu [CPU], for example:
gcloud run services update [SERVICE] --cpu=4 --memory=16Gi --region=northamerica-northeast1
You should see a response like:
Service [SERVICE] revision [SERVICE-*****-***] has been deployed and is serving 100 percent of traffic.
https://console.cloud.google.com/run can help show what is happening too.

ECS service keeps deregistering Target Group and start/stop tasks

I have an ECS service that is repeatedly starting and stopping a task running on an EC2 (m5.large) launch-type container instance. The Events tab shows these messages in a loop:
service test-service deregistered 1 targets in target-group localhost-localhost-default
service test-service has begun draining connections on 1 tasks.
service test-service deregistered 1 targets in target-group localhost-localhost-default
service test-service has started 2 tasks: task 4e1569b3-a15c-4bac-85f7-396b530113a5 task d5651035-8e3d-48df-b457-d05e5b7be8db.
There is nothing more there to help understand what might be going on. When I checked the target group itself, the instances are no longer registered to it. I have allocated memory: 1024 and cpu: 512 for the task, which should be enough.
Is there anything I can do to understand what the problem here is?
On this line,
service test-service has started 2 tasks: task 4e1569b3-a15c-4bac-85f7-396b530113a5 task d5651035-8e3d-48df-b457-d05e5b7be8db.
the task ID is a hyperlink; when you click it, it will take you to the page where you can find all the details about that particular task.
There is an entry "stopped reason" which will show why the task was stopped.
If it stopped because of health check failures, it will show in events page itself.
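The same "stopped reason" can also be pulled with the AWS CLI, using the task ID from the event above (the cluster name my-cluster is a placeholder, and stopped tasks only remain queryable for a short while):
aws ecs describe-tasks --cluster my-cluster --tasks 4e1569b3-a15c-4bac-85f7-396b530113a5 --query 'tasks[].stoppedReason'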

How to compare memory quota control implementation, openshift vs. docker

My customer asked me whether OpenShift can provide the same control over memory usage as Docker can. For example, docker run accepts the following parameters to control memory usage when running a container:
--kernel-memory
--memory
--memory-reservation
Searching for the corresponding features in OpenShift, I found that ResourceQuota and LimitRange should work for that. But what if a pod claims via LimitRange that it will use 100Mi of memory but actually consumes 500Mi instead? The memory can still be used "illegally"; Docker with --memory seems to control this situation better.
In OpenShift, is there any method for controlling real memory usage, instead of just checking what a pod claims in a LimitRange or using "oc set resources dc hello --requests=memory=256Mi"?
Best regards
Lan
In my experience with OpenShift, I have not come across a situation where a pod consumed more memory or CPU than it was configured for. If a pod reaches the threshold, it is automatically killed and restarted.
You can set the pod resource limits in the deployment config:
resources:
  limits:
    cpu: 750m
    memory: 1024Mi
The resources can be monitored in the metrics section of the respective pod.
Apart from the individual pod settings, you can define your own overall project settings for each container in the pod.
$ oc get limits
NAME
limits
$ oc describe limits <NAME>
Name: <NAME>
Namespace: <NAME_SPACE>
Type       Resource  Min    Max    Default Request  Default Limit  Max Limit/Request Ratio
----       --------  ---    ---    ---------------  -------------  -----------------------
Pod        memory    256Mi  32Gi   -                -              -
Pod        cpu       125m   6400m  -                -              -
Container  cpu       125m   6400m  125m             750m           -
Container  memory    256Mi  32Gi   512Mi            1Gi            -
For more information on resource settings refer here.
If you only use --requests=memory=256Mi, you set the QoS level to "Burstable", which means the pod can request at least 256Mi of memory with no upper limit except the project quota. If you want to limit pod memory, use --limits=memory=256Mi instead.
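For example, building on the command from the question, the request and limit can be set together (the values are illustrative):
oc set resources dc hello --requests=memory=256Mi --limits=memory=256Mi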

OpenShift Next Gen fails to mount persistent volume

I'm trying to set up an app on OpenShift Online Next Gen and I need to store a small file at runtime and read it again during startup. The content of the file changes, so I cannot simply add it to my source code.
My project is already up and running; all I need is persistent storage. So, I open the Web Console, click Browse->Storage, and it says there are no volumes available. Same thing if I go to Browse->Deployments and try to attach a volume.
So, I logged in via cli and issued the following command:
oc volume dc/mypingbot --add --type=pvc --claim-name=data1 --claim-size=1Gi
Now my volume appears both in the Storage section and in the Deployments section. I attach it to my deployment config using the Web Console and set its mount point to /data1.
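For reference, the Web Console step should be roughly equivalent to adding --mount-path to the CLI command above:
oc volume dc/mypingbot --add --type=pvc --claim-name=data1 --claim-size=1Gi --mount-path=/data1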
The deployment process now takes a while and then fails with the following two errors:
Error syncing pod, skipping: Could not attach EBS Disk "aws://us-east-1c/vol-ba78501e": Error attaching EBS volume: VolumeInUse: vol-ba78501e is already attached to an instance status code: 400, request id:
Unable to mount volumes for pod "mypingbot-18-ilklx_mypingbot(0d22f712-58a3-11e6-a1a5-0e3d364e19a5)": Could not attach EBS Disk "aws://us-east-1c/vol-ba78501e": Error attaching EBS volume: VolumeInUse: vol-ba78501e is already attached to an instance status code: 400, request id:
What am I missing?

Unable to access Google Compute Engine instance using external IP address

I have a Google Compute Engine instance (CentOS) which I could access using its external IP address until recently.
Now suddenly the instance cannot be accessed using its external IP address.
I logged in to the developer console and tried rebooting the instance but that did not help.
I also noticed that the CPU usage is almost at 100% continuously.
On further analysis of the Serial port output it appears the init module is not loading properly.
I am pasting below the last few lines from the serial port output of the virtual machine.
rtc_cmos 00:01: RTC can wake from S4
rtc_cmos 00:01: rtc core: registered rtc_cmos as rtc0
rtc0: alarms up to one day, 114 bytes nvram
cpuidle: using governor ladder
cpuidle: using governor menu
EFI Variables Facility v0.08 2004-May-17
usbcore: registered new interface driver hiddev
usbcore: registered new interface driver usbhid
usbhid: v2.6:USB HID core driver
GRE over IPv4 demultiplexor driver
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 17
registered taskstats version 1
rtc_cmos 00:01: setting system clock to 2014-07-04 07:40:53 UTC (1404459653)
Initalizing network drop monitor service
Freeing unused kernel memory: 1280k freed
Write protecting the kernel read-only data: 10240k
Freeing unused kernel memory: 800k freed
Freeing unused kernel memory: 1584k freed
Failed to execute /init
Kernel panic - not syncing: No init found. Try passing init= option to kernel.
Pid: 1, comm: swapper Not tainted 2.6.32-431.17.1.el6.x86_64 #1
Call Trace:
[] ? panic+0xa7/0x16f
[] ? init_post+0xa8/0x100
[] ? kernel_init+0x2e6/0x2f7
[] ? child_rip+0xa/0x20
[] ? kernel_init+0x0/0x2f7
[] ? child_rip+0x0/0x20
Thanks in advance for any tips to resolve this issue.
Mathew
It looks like you might have a script or other program that is causing you to run out of inodes.
You can delete the instance without deleting the persistent disk (PD) and create a new VM with a higher capacity using your PD; however, if a script is causing this, you will end up with the same issue. It's always recommended to back up your PD before making any changes.
Run this command to find more info about your instance:
gcutil --project=<project-id> getserialportoutput <instance-name>
If the issue still continues, you can either:
- Make a snapshot of your PD and work on a copy, or
- Delete the instance without deleting the PD.
Then attach and mount the PD to another VM as a second disk, so you can access it to find what is causing the issue. Visit this link https://developers.google.com/compute/docs/disks#attach_disk for more information on how to do this.
Visit this page http://www.ivankuznetsov.com/2010/02/no-space-left-on-device-running-out-of-inodes.html for more information about inodes troubleshooting.
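Once the PD is mounted on the second VM, a quick way to confirm inode exhaustion is the standard df tool (the mount point below is just an example):
df -i /mnt/pd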
Make sure the Allow HTTP traffic setting on the VM is still enabled.
Then see which network firewall you are using and its rules.
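You can review the firewall rules from the CLI (using the newer gcloud syntax rather than the gcutil of that era):
gcloud compute firewall-rules list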
If your network is set up to use an ephemeral IP, it will be periodically released back. This will cause your IP to change over time. Set it to static/reserved instead (on the Networks page).
https://developers.google.com/compute/docs/instances-and-network#externaladdresses