HAProxy - URL Based routing with load balancing - configuration

I am new to HAProxy and I have a question about HAProxy configuration whose answer will help me make a key decision and choose the right architecture.
I have 3 apps. Let's say app1, app2, app3.
Each app is distinguished by its URL as follows:
www.example.com/app1/123 -> app1
www.example.com/app2/123 -> app2
www.example.com/app3/123 -> app3
I am planning to have 2 instances of each app in 2 different regions:
Region 1 - app1, app2, app3
Region 2 - app1, app2, app3
I see two methods to configure this, but I am not sure which is the best practice here:
Method 1: Have HAProxy1 first differentiate the requests using the URL patterns. HAProxy1 then routes each request to another HAProxy server set up for that individual app (3 additional HAProxy servers in this case), which does the load balancing.
Method 2: Have one big HAProxy server which does both of the steps from Method 1. That is, configure it to segregate the requests depending on the URL and then pass each request through a per-app pool set up for load balancing.
I am not sure whether Method 2 is supported in HAProxy. Any ideas or suggestions are greatly appreciated. Please shed some light.

You can segregate requests based on URL and load balance with a single HAProxy server.
Your configuration will have something like this:
frontend http
    bind *:80
    acl app1 path_end -i /app1/123  # matches paths ending in "/app1/123"; use path_beg -i /app1/ to match everything under /app1/
    acl app2 path_end -i /app2/123
    acl app3 path_end -i /app3/123
    use_backend srvs_app1 if app1
    use_backend srvs_app2 if app2
    use_backend srvs_app3 if app3
backend srvs_app1
    # Backend that lists your servers; use a balancing algorithm per your needs.
    balance roundrobin
    server host1 REGION1_HOST_FOR_APP1:PORT check
    server host2 REGION2_HOST_FOR_APP1:PORT check
backend srvs_app2
    balance roundrobin
    server host1 REGION1_HOST_FOR_APP2:PORT check
    server host2 REGION2_HOST_FOR_APP2:PORT check
backend srvs_app3
    balance roundrobin
    server host1 REGION1_HOST_FOR_APP3:PORT check
    server host2 REGION2_HOST_FOR_APP3:PORT check
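With this in place, a request for www.example.com/app1/123 matches the app1 ACL and is balanced round-robin across the two region hosts in srvs_app1; app2 and app3 behave analogously.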
More information can be found in the HAProxy documentation.

Use acl rules in HAProxy to separate the route for each application; you can use path_end or path_beg to match the path. If you'd also like to change the request path sent to the backend, use 'http-request set-uri' with a regsub pattern:
backend be_images
    balance roundrobin
    # Rewrite the /images/ prefix to /static/images/ before forwarding the request.
    http-request set-uri '%[path,regsub(^/images/,/static/images/,g)]'
    server srv1 127.0.0.1:8001
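With that rule in place, a request for /images/logo.png, for example, reaches srv1 as /static/images/logo.png.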


How to allow IP dynamically using ingress controller

My structure
Kubernetes cluster on GKE
Ingress controller deployed using Helm
An application which returns a list of IP ranges (note: the list gets updated periodically):
curl https://allowed.domain.com
172.30.1.210/32,172.30.2.60/32
The secured application, whose access restriction is the part that is not working
What am I trying to do?
Have my clients' IPs served by my API endpoint, which is done:
curl https://allowed.domain.com
172.30.1.210/32,172.30.2.60/32
Deploy my example app with an ingress so it can pull the list from https://allowed.domain.com and allow only those addresses to access the app.
What did I try that didn't work?
Deploying the application with the include feature of nginx:
nginx.ingress.kubernetes.io/configuration-snippet: |
  include /tmp/allowed-ips.conf;
  deny all;
Yes, it works, but the problem is that when /tmp/allowed-ips.conf gets updated, the ingress config doesn't.
I also tried to use an if condition to pull the IPs from the endpoint and deny access if the user is not in the list:
nginx.ingress.kubernetes.io/configuration-snippet: |
  set $deny_access off;
  if ($remote_addr !~ (https://2ce8-73-56-131-204.ngrok.io)) {
    set $deny_access on;
  }
I am using the nginx.ingress.kubernetes.io/whitelist-source-range annotation, but that is not what I am looking for.
None of these options work for me.
From the official docs of ingress-nginx controller:
The goal of this Ingress controller is the assembly of a configuration file (nginx.conf). The main implication of this requirement is the need to reload NGINX after any change in the configuration file. Though it is important to note that we don't reload Nginx on changes that impact only an upstream configuration (i.e Endpoints change when you deploy your app)
After the nginx ingress resource is initially created, the ingress controller assembles the nginx.conf file and uses it for routing traffic. The nginx web server does not auto-reload its configuration when nginx.conf and the other config files are changed.
So, you can work around this problem in several ways:
Update the k8s ingress resource with the new IP addresses and then apply the change to the Kubernetes cluster (kubectl apply / kubectl patch / something else); this covers your options 2 and 3 (a sketch is shown after this list).
Run nginx -s reload inside an ingress Pod to reload the nginx configuration; this covers your option 1 with the included allow-list file:
$ kubectl exec ingress-nginx-controller-xxx-xxx -n ingress-nginx -- nginx -s reload
Try writing a Lua script (there is a good example for Nginx+Lua+Redis here and here). You should have a good understanding of nginx and lua to estimate whether it is worth trying.
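For the first workaround, a minimal sketch of pushing a fresh list into the whitelist annotation (the ingress name, namespace, and IPs are placeholders):
kubectl -n my-namespace annotate ingress/my-ingress --overwrite \
    nginx.ingress.kubernetes.io/whitelist-source-range="172.30.1.210/32,172.30.2.60/32"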
Sharing what I implemented at my workplace. We had a managed monitoring tool called Site24x7. The tool pings our server from its VMs with dynamic IPs, and we had to automate the whitelisting of those IPs at GKE.
nginx.ingress.kubernetes.io/configuration-snippet allows you to set arbitrary Nginx configurations.
Set up a K8s CronJob resource in the specific namespace.
The CronJob runs a shell script, which:
fetches the list of IPs to be allowed (curl, getent, etc.),
generates the set of NGINX configuration lines (the value for nginx.ingress.kubernetes.io/configuration-snippet), and
runs a kubectl command which overwrites the annotation of the target ingresses.
Example shell/bash script:
#!/bin/bash
# Resolve the monitoring vendor's hosts and turn each IP into an nginx "allow" directive.
site24x7_ip_lookup_url="site24x7.enduserexp.com"
site247_ips=$(getent ahosts $site24x7_ip_lookup_url | awk '{print "allow "$1";"}' | sort -u)
# Assemble the full snippet: static entries first, then the dynamic ones, then deny all.
ip_whitelist=$(cat <<-EOT
# ---------- Default whitelist (Static IPs) ----------
# Office
allow vv.xx.yyy.zzz;
# VPN
allow aa.bbb.ccc.ddd;
# ---------- Custom whitelist (Dynamic IPs) ----------
$site247_ips # Here!
deny all;
EOT
)
# Overwrite the annotations on every target ingress.
for target_ingress in $TARGET_INGRESS_NAMES; do
    kubectl -n $NAMESPACE annotate ingress/$target_ingress \
        --overwrite \
        nginx.ingress.kubernetes.io/satisfy="any" \
        nginx.ingress.kubernetes.io/configuration-snippet="$ip_whitelist" \
        description="*** $(date '+%Y/%m/%d %H:%M:%S') NGINX annotation 'configuration-snippet' updated by cronjob $CRONJOB_NAME ***"
done
The shell/bash script can be stored as ConfigMap to be mounted on the CronJob resource.
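For illustration, a minimal sketch of such a CronJob, assuming the script above is stored in a ConfigMap named ip-whitelist-script and a service account with permission to annotate ingresses exists (all names and the schedule are placeholders):
apiVersion: batch/v1
kind: CronJob
metadata:
  name: update-ip-whitelist
spec:
  schedule: "*/30 * * * *"          # every 30 minutes
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: ingress-annotator   # needs RBAC to annotate ingresses
          restartPolicy: OnFailure
          containers:
          - name: annotate
            image: bitnami/kubectl                # any image with kubectl and bash
            command: ["/bin/bash", "/scripts/update-whitelist.sh"]
            volumeMounts:
            - name: script
              mountPath: /scripts
          volumes:
          - name: script
            configMap:
              name: ip-whitelist-script
              defaultMode: 0755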

How to view Routes pod in OpenShift

I have created a route for my service in OpenShift:
oc get routes
NAME              HOST/PORT                              PATH   SERVICES          PORT
simplewebserver   simpleweb.apps.devcluster.os.fly.com          simplewebserver   9999
When I ran the command curl http://simpleweb.apps.devcluster.os.fly.com/world,
it failed to access my web service. I suspect my route has some problem, but I cannot see any route debug information.
My question is: how do I find the router pod in OpenShift, or how do I find some routing activity information when I access the route?
You can check the router logs in the logs container of the router pods. In our OCP cluster the router pods are in the openshift-ingress namespace:
oc get pods -n openshift-ingress
NAME                               READY   STATUS    RESTARTS   AGE
router-default-5f9c4b6cb4-12121a   2/2     Running   0          40h
router-default-5f9c4b6cb4-12133a   2/2     Running   0          40h
To get the logs, use the command below:
oc -n openshift-ingress logs -f <router_pod_name> -c logs
Also make sure HAProxy logging is enabled, to find out which URLs are being hit via the router:
https://access.redhat.com/solutions/3397701
As there is limited information about your problem, here are a few things you can try:
Curl the route using the port:
curl -kv http://simpleweb.apps.devcluster.os.fly.com:9999
Access the logs of the pod for which the route was created. Check that the service simplewebserver is using the correct selector to route the traffic to the pod.
Do an oc describe service simplewebserver to see the selectors being used.
Check whether any network policy is blocking the external traffic.
Check whether you can access the target pod through that service from within the same namespace. You can do that by rsh-ing into a pod and then accessing the service:
curl -kv http://servicename.projectname.svc.cluster.local
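For example (the pod and project names here are placeholders):
oc rsh mypod
curl -kv http://simplewebserver.myproject.svc.cluster.local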

How is the traffic to the openshift_cluster_hostname redirected to the OpenShift web console?

Question 1:
1.1. Who is sitting behind the "openshift_master_cluster_public_hostname" hostname? Is it the web console (the web console service? or the web console deployment) or something else?
1.2. When doing oc get service -n openshift-web-console I can see that the web console is running on 443. Isn't it supposed to work on port 8443? Same thing for the API server, shouldn't it be working on port 8443?
1.3. Can you explain to me the flow of a request to https://openshift_master_cluster_public_hostname:8443?
1.4. in the documentation is
Question 2:
Why do I get different responses for curl and wget?
When I run curl https://openshift_master_cluster_public_hostname:8443, I get:
{
"paths": [
"/api",
"/api/v1",
"/apis",
"/apis/",
"/apis/admissionregistration.k8s.io",
"/apis/admissionregistration.k8s.io/v1beta1",
"/apis/apiextensions.k8s.io",
"/apis/apiextensions.k8s.io/v1beta1",
...
"/swagger.json",
"/swaggerapi",
"/version",
"/version/openshift"
]
}
When I run wget https://openshift_master_cluster_public_hostname:8443, I get an index.html page.
Is the web console answering this request or the
Question 3:
How can I expose the web console on port 443 rather than 8443? I found several solutions:
Using the variables "openshift_master_console_port,openshift_master_api_port", but I found out that these ports are 'internal' ports and not designed to be the public ports, so changing these ports could crash your OpenShift setup.
Using an external service (described here).
I'm kind of trying to set up port forwarding on an external haproxy; is it doable?
Answer to Q1:
1.1. Quoting from the documentation, Configuring Your Inventory File:
This variable overrides the public host name for the cluster,
which defaults to the host name of the master. If you use an
external load balancer, specify the address of the external load balancer.
For example:
> openshift_master_cluster_public_hostname=openshift-ansible.public.example.com
This means that this variable is the public-facing interface to the OpenShift web console.
1.2. A Service is a virtual object which connects the service name to the pods and is used to connect the Route object with the Service object. This is explained in the documentation on Services. You can use almost any port for a Service because it is virtual and nothing will bind on this port.
1.3. The answer depends on your setup. I will explain it for an HA setup with a TCP load balancer in front of the masters:

                           /-> Master API 1
client -> load balancer ---> Master API 2
                           \-> Master API 3

The client makes a request to https://openshift_master_cluster_public_hostname:8443, the load balancer forwards the request to Master API 1, 2, or 3, and the client gets the answer from that Master API server.
The API server redirects to the console if the request comes from a browser (https://github.com/openshift/origin/blob/release-3.11/pkg/cmd/openshift-kube-apiserver/openshiftkubeapiserver/patch_handlerchain.go#L60-L61).
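For illustration, a minimal sketch of such a TCP load balancer in HAProxy (the master hostnames are placeholders):
frontend openshift-api
    bind *:8443
    mode tcp
    default_backend masters
backend masters
    mode tcp
    balance source
    server master1 master1.example.com:8443 check
    server master2 master2.example.com:8443 check
    server master3 master3.example.com:8443 check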
Answer to Q2:
curl and wget behave differently because they are different tools, but the HTTPS request is the same.
To get curl's behavior with wget:
wget --output-document=- https://openshift_master_cluster_public_hostname:8443
To get wget's behavior with curl:
curl -o index.html https://openshift_master_cluster_public_hostname:8443
Why - works there is described in Usage of dash (-) in place of a filename.
Answer to Q3:
You can use the OpenShift router, which you already use for the apps, to make the web console available on 443. This post is a little bit outdated, but the concept is the same for the current 3.x versions: Make OpenShift console available on port 443 (https) [UPDATE]

HAProxy - Rewriting URLs transparently

I need to implement a URL rewriting action for a project. This has to be done with HAProxy 1.5 because it is implemented on a pfSense firewall and later versions are not available at this point.
I have the following URLs:
update.domain.com
repository.domain.com
which both point to the same backend server1. The challenge now is to move the document root:
- update.domain.com >> /some/path/repo1
- repository.domain.com >> /some/path/repo2
Not only is the document root moved, but due to an earlier implementation with TMG servers, links exist that point to files like this:
update.domain.com/file1.txt
I have tried to work with http-request set-path and some ACLs on the frontend, but unfortunately this function is only available in HAProxy 1.6 and later:
frontend www
    bind *:80
    acl update_url hdr_beg(host) -m beg update.domain.com
    acl update_root path_beg /some/path/repo1/
    http-request set-header /some/path/repo1/%[path] if !update_root update_url
    use_backend testServer if update_root update_url
    default_backend testServer
Links to files such as update.domain.com/file1.txt can't be changed, and keeping TMG is not a solution. How can I get this working with HAProxy 1.5?
For HAProxy 1.5, you can use reqrep, which will rewrite the request line (and any header lines) matching your regex, e.g. something like:
reqrep ^([^\ :]*)\ /some/path/repo1/(.*) \1\ /some/path/repo2/\2
A more detailed explanation of how to use reqrep can be found here.
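As a sketch of how this could solve the original problem (prefixing the document root per host; the ACL and backend names are illustrative and the rules are untested):
frontend www
    bind *:80
    acl host_update hdr(host) -i update.domain.com
    acl host_repo hdr(host) -i repository.domain.com
    # Prepend the document root, so update.domain.com/file1.txt becomes /some/path/repo1/file1.txt
    reqrep ^([^\ :]*)\ /(.*) \1\ /some/path/repo1/\2 if host_update
    reqrep ^([^\ :]*)\ /(.*) \1\ /some/path/repo2/\2 if host_repo
    default_backend bk_server1   # backend pointing at server1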

HAProxy define subdomain wildcard

I am trying to create an HAProxy configuration which maps certain subdomains to a specific backend.
Given the domains:
foo.x.y.z
bar.x.y.z
bar.a.b.c
baz.a.b.d.e
I want these frontends to be mapped to the backends foo, bar and baz.
I've tried to get this working by using hdr_beg(), but I'm missing something, so it does not work :-/
This is my config so far:
frontend HttpFrontend
    bind *:80
    mode http
    acl fooBackend hdr_beg(host) -i foo.
    acl barBackend hdr_beg(host) -i bar.
    default_backend bazBackend

backend bazBackend
    mode http
    balance leastconn
    option forwardfor
    server node1 10.0.1.10:80 check inter 5000 rise 3 fall 3
    server node2 10.0.2.10:80 check inter 5000 rise 3 fall 3
    server node3 10.0.3.10:80 check inter 5000 rise 3 fall 3
backend fooBackend
    mode http
    option forwardfor
    server node4 10.0.1.14:80
backend barBackend
    mode http
    option forwardfor
    server node4 10.0.1.14:80
Can you give me a hint about what I am missing?
Thanks in advance!
You need use_backend rules to send the matched traffic to the right backends:
frontend HttpFrontend
    bind *:80
    mode http
    acl fooBackend hdr_beg(host) -i foo.
    acl barBackend hdr_beg(host) -i bar.
    use_backend fooBackend if fooBackend
    use_backend barBackend if barBackend
    default_backend bazBackend
<...>
Source: https://cbonte.github.io/haproxy-dconv/configuration-1.6.html#use_backend