dev-docs: Helm chart for full L3 VPN connectivity (#2620)

* dev-docs: add 'things to try' section to VPN howto

* dev-docs: full L3 connectivity in VPN chart
Markus Rudy 2024-01-16 13:59:33 +01:00 committed by GitHub
parent 9181705299
commit 16c63d57cd
GPG Key ID: 4AEE18F83AFDEB23
15 changed files with 242 additions and 246 deletions


@@ -2,35 +2,83 @@
 This Helm chart deploys a VPN server to your Constellation cluster.
-## Installation
+## Prerequisites
-1. Create and populate the configuration.
+* Constellation >= v2.14.0
+* A publicly routable VPN endpoint on premises that supports IPSec in IKEv2
+tunnel mode with NAT traversal enabled.
+* A list of on-prem CIDRs that should be reachable from Constellation.
+## Setup
+1. Configure Cilium to route services for the VPN (see [Architecture](#architecture) for details).
+* Edit the Cilium config: `kubectl -n kube-system edit configmap cilium-config`.
+* Set the config item `enable-sctp: "true"`.
+* Restart the Cilium agents: `kubectl -n kube-system rollout restart daemonset/cilium`.
+2. Create the Constellation VPN configuration file.
 ```sh
 helm inspect values . >config.yaml
 ```
-2. Install the Helm chart.
+3. Populate the Constellation VPN configuration file. At least the following
+need to be configured:
+* The list of on-prem CIDRs (`peerCIDRs`).
+* The `ipsec` subsection.
+4. Install the Helm chart.
 ```sh
 helm install -f config.yaml vpn .
 ```
-3. Follow the post-installation instructions displayed by the CLI.
+5. Configure the on-prem gateway with Constellation's pod and service CIDR
+(see `config.yaml`).
+## Things to try
+Ask CoreDNS about its own service IP:
+```sh
+dig +notcp @10.96.0.10 kube-dns.kube-system.svc.cluster.local
+```
+Ask the Kubernetes API server about its wellbeing:
+```sh
+curl --insecure https://10.96.0.1:6443/healthz
+```
+Ping a pod:
+```sh
+ping $(kubectl get pods vpn-frontend-0 -o go-template --template '{{ .status.podIP }}')
+```
 ## Architecture
-The VPN server is deployed as a `StatefulSet` to the cluster. It hosts the VPN frontend component, which is responsible for relaying traffic between the pod and the on-prem network, and the routing components that provide access to Constellation resources. The frontend supports IPSec and Wireguard.
+The VPN server is deployed as a `StatefulSet` to the cluster. It hosts the VPN
+frontend component, which is responsible for relaying traffic between the pod
+and the on-prem network over an IPSec tunnel.
-The VPN frontend is exposed with a public LoadBalancer to be accessible from the on-prem network. Traffic that reaches the VPN server pod is split into two categories: pod IPs and service IPs.
+The VPN frontend is exposed with a public LoadBalancer so that it becomes
+accessible from the on-prem network.
-The pod IP range is NATed with an iptables rule. On-prem workloads can establish connections to a pod IP, but the Constellation workloads will see the client IP translated to that of the VPN frontend pod.
+An init container sets up IP routes on the frontend host and inside the
+frontend pod. All routes are bound to the frontend pod's lxc interface and thus
+deleted together with it.
-The service IP range is handed to a transparent proxy running in the VPN frontend pod, which relays the connection to a backend pod. This is necessary because of the load-balancing mechanism of Cilium, which assumes service IP traffic to originate from the Constellation cluster itself. As for pod IP ranges, Constellation pods will only see the translated client address.
+A VPN operator deployment is added that configures the `CiliumEndpoint` with
+on-prem IP ranges, thus configuring routes on non-frontend hosts. The endpoint
+shares the frontend pod's lifecycle.
+In Cilium's default configuration, service endpoints are resolved in cgroup
+eBPF hooks that are not applicable to VPN traffic. We force Cilium to apply
+service NAT at the LXC interface by enabling SCTP support.
 ## Limitations
-* Service IPs need to be proxied by the VPN frontend pod. This is a single point of failure, and it may become a bottleneck.
-* IPs are NATed, so the Constellation pods won't see the real on-prem IPs.
-* NetworkPolicy can't be applied selectively to the on-prem ranges.
-* No connectivity from Constellation to on-prem workloads.
+* VPN traffic is handled by a single pod, which may become a bottleneck.
+* Frontend pod restarts / migrations invalidate IPSec connections.
+* Only pre-shared key authentication is supported.
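Setup steps 3 and 5, and the addresses used under "Things to try", depend on how the on-prem `peerCIDRs` relate to Constellation's pod and service CIDRs. The POSIX sh sketch below is not part of the chart, and all function names in it are hypothetical. It checks whether two CIDRs overlap and derives an indexed address inside a CIDR. It assumes IPv4 only, dotted quads without leading zeros, and a shell with 64-bit arithmetic (for example dash or bash). Under the usual Kubernetes/kubeadm conventions, which are an assumption here, the `kubernetes` API service takes the first address of the service CIDR and cluster DNS takes the tenth. For the default `serviceCIDR` of 10.96.0.0/12, that gives 10.96.0.1 and 10.96.0.10.

```sh
#!/bin/sh
# Hypothetical planning helpers, not part of the chart. IPv4 only; octets
# must not have leading zeros (shell arithmetic would read them as octal).
# Variables are global because POSIX sh has no "local".

# Convert a dotted quad to an integer, e.g. 10.96.0.1 -> 174063617.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  # Intentionally unquoted: split the address into its four octets.
  set -- $1
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Convert an integer back to a dotted quad.
int_to_ip() {
  echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print the first and last address of a CIDR as integers.
cidr_bounds() {
  size=$(( 1 << (32 - ${1#*/}) ))
  first=$(( $(ip_to_int "${1%/*}") & ~(size - 1) ))
  echo "${first} $(( first + size - 1 ))"
}

# Succeed if the two CIDRs share at least one address.
cidrs_overlap() {
  # Both substitutions are expanded before "set" replaces $1 and $2.
  set -- $(cidr_bounds "$1") $(cidr_bounds "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Print the address at a given offset inside a CIDR.
indexed_ip() {
  set -- $(cidr_bounds "$1") "$2"
  int_to_ip $(( $1 + $3 ))
}
```

For example, `indexed_ip 10.96.0.0/12 10` prints `10.96.0.10`. Looping `cidrs_overlap "$cidr" 10.96.0.0/12` over the planned `peerCIDRs`, and repeating it for the pod CIDR, flags on-prem ranges that would collide with cluster-internal addresses.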


@@ -0,0 +1,46 @@
+#!/bin/sh
+signaled() {
+exit 143
+}
+trap signaled INT TERM
+all_ips() {
+kubectl get pods "${VPN_FRONTEND_POD}" -o go-template --template '{{ range .status.podIPs }}{{ printf "%s " .ip }}{{ end }}'
+echo "${VPN_PEER_CIDRS}"
+}
+cep_patch() {
+for ip in $(all_ips); do printf '{"ipv4": "%s"}' "${ip}"; done | jq -s -c -j |
+jq '[{op: "replace", path: "/status/networking/addressing", value: . }]'
+}
+# Format the space-separated CIDRs into a JSON array.
+vpn_cidrs=$(for ip in ${VPN_PEER_CIDRS}; do printf '"%s" ' "${ip}"; done | jq -s -c -j)
+masq_patch() {
+kubectl -n kube-system get configmap ip-masq-agent -o json |
+jq -r .data.config |
+jq "{ masqLinkLocal: .masqLinkLocal, nonMasqueradeCIDRs: ((.nonMasqueradeCIDRs - ${vpn_cidrs}) + ${vpn_cidrs}) }" |
+jq '@json | [{op: "replace", path: "/data/config", value: . }]'
+}
+reconcile_masq() {
+if ! kubectl -n kube-system get configmap ip-masq-agent > /dev/null; then
+# We don't know enough to create an ip-masq-agent.
+return 0
+fi
+kubectl -n kube-system patch configmap ip-masq-agent --type json --patch "$(masq_patch)" > /dev/null
+}
+while true; do
+# Reconcile CiliumEndpoint to advertise VPN CIDRs.
+kubectl patch ciliumendpoint "${VPN_FRONTEND_POD}" --type json --patch "$(cep_patch)" > /dev/null
+# Reconcile ip-masq-agent configuration to exclude VPN traffic.
+reconcile_masq
+sleep 10
+done
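The operator keeps two pieces of cluster state in sync. One is the addressing list of the frontend's `CiliumEndpoint`, built by `cep_patch`. The other is the `nonMasqueradeCIDRs` list of the `ip-masq-agent` config, built by `masq_patch`. The jq-free sketch below reproduces the shape of both transformations so they can be inspected without a cluster. The function names are hypothetical, and the sketch assumes every input is a plain IP or CIDR that needs no JSON escaping.

```sh
#!/bin/sh
# Hypothetical, jq-free illustration of the operator's two reconcile steps.

# Compact form of the JSON Patch built by cep_patch. Arguments: the frontend
# pod IPs followed by the on-prem CIDRs. No JSON escaping is performed.
cep_patch_sketch() {
  entries=""
  for ip in "$@"; do
    # Prefix every entry except the first with a comma.
    entries="${entries:+${entries},}{\"ipv4\":\"${ip}\"}"
  done
  printf '[{"op":"replace","path":"/status/networking/addressing","value":[%s]}]' "${entries}"
}

# Shell analogue of the jq expression
#   (.nonMasqueradeCIDRs - vpn) + vpn
# Remove every VPN CIDR from the existing list, then append the VPN CIDRs,
# so applying it repeatedly never accumulates duplicates.
# Arguments: existing CIDRs and VPN CIDRs, each as one space-separated string.
merge_cidrs() {
  out=""
  for c in $1; do
    case " $2 " in
      *" ${c} "*) ;; # a VPN CIDR: it is appended again below
      *) out="${out:+${out} }${c}" ;;
    esac
  done
  for c in $2; do
    out="${out:+${out} }${c}"
  done
  echo "${out}"
}
```

The remove-then-append form is what makes the 10-second loop safe to rerun: a second pass produces the same list as the first. Note that the operator's jq filter rebuilds the config object with only `masqLinkLocal` and `nonMasqueradeCIDRs`, so any other `ip-masq-agent` settings present in that ConfigMap would not survive the patch.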


@@ -0,0 +1,44 @@
+#!/bin/sh
+set -u
+if [ "$$" -eq "1" ]; then
+echo 'This script must run in the root PID namespace, but $$ == 1!' >&2
+exit 1
+fi
+myip() {
+ip -j addr show eth0 | jq -r '.[0].addr_info[] | select(.family == "inet") | .local'
+}
+# Disable source IP verification on our network interface. Otherwise, VPN
+# packets will be dropped by Cilium.
+reconcile_sip_verification() {
+# We want all of the cilium calls in this function to target the same
+# process, so that we fail if the agent restarts in between. Thus, we only
+# query the pid once per reconciliation.
+cilium_agent=$(pidof cilium-agent) || return 0
+cilium() {
+nsenter -t "${cilium_agent}" -a -r -w cilium "$@"
+}
+myendpoint=$(cilium endpoint get "ipv4:$(myip)" | jq '.[0].id') || return 0
+if [ "$(cilium endpoint config "${myendpoint}" -o json | jq -r .realized.options.SourceIPVerification)" = "Enabled" ]; then
+cilium endpoint config "${myendpoint}" SourceIPVerification=Disabled
+fi
+}
+# Set up the route from the node network namespace to the VPN pod.
+reconcile_route() {
+for cidr in ${VPN_PEER_CIDRS}; do
+nsenter -t 1 -n ip route replace "${cidr}" via "$(myip)"
+done
+}
+while true; do
+reconcile_route
+reconcile_sip_verification
+sleep 10
+done
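The sidecar can rerun its reconcile functions every 10 seconds because each step is idempotent. `ip route replace` adds the route if it is missing and overwrites it otherwise. The hypothetical dry-run helper below prints the host-namespace commands that `reconcile_route` would issue for a given frontend pod IP, which can help when comparing against `ip route` output on the node. As a simplifying assumption, the pod IP is passed explicitly instead of being read from `eth0` via `myip`.

```sh
#!/bin/sh
# Hypothetical dry run of the sidecar's reconcile_route: print the commands
# instead of executing them. Arguments: the frontend pod IP, then the CIDRs.
plan_routes() {
  pod_ip=$1
  shift
  for cidr in "$@"; do
    printf 'nsenter -t 1 -n ip route replace %s via %s\n' "${cidr}" "${pod_ip}"
  done
}

# Example input in the same space-separated form as VPN_PEER_CIDRS.
VPN_PEER_CIDRS="192.168.0.0/16 172.16.0.0/12"
# Intentionally unquoted, like the sidecar's own loop over VPN_PEER_CIDRS.
plan_routes 10.244.1.5 ${VPN_PEER_CIDRS}
```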


@@ -1,38 +0,0 @@
-#!/bin/sh
-set -eu
-### Pod IPs ###
-# Pod IPs are just NATed.
-iptables -t nat -N VPN_POST || iptables -t nat -F VPN_POST
-for cidr in ${VPN_PEER_CIDRS}; do
-iptables -t nat -A VPN_POST -s "${cidr}" -d "${VPN_POD_CIDR}" -j MASQUERADE
-done
-iptables -t nat -C POSTROUTING -j VPN_POST || iptables -t nat -A POSTROUTING -j VPN_POST
-### Service IPs ###
-# Service IPs need to be connected to locally to trigger the cgroup connect hook, thus we send them to the transparent proxy.
-# Packets with mark 1 are for tproxy and need to be delivered locally.
-# For more information see: https://www.kernel.org/doc/Documentation/networking/tproxy.txt
-pref=42
-table=42
-mark=0x1/0x1
-ip rule add pref "${pref}" fwmark "${mark}" lookup "${table}"
-ip route replace local 0.0.0.0/0 dev lo table "${table}"
-iptables -t mangle -N VPN_PRE || iptables -t mangle -F VPN_PRE
-for cidr in ${VPN_PEER_CIDRS}; do
-for proto in tcp udp; do
-iptables -t mangle -A VPN_PRE -p "${proto}" -s "${cidr}" -d "${VPN_SERVICE_CIDR}" \
--j TPROXY --tproxy-mark "${mark}" --on-port 61001
-done
-done
-iptables -t mangle -C PREROUTING -j VPN_PRE || iptables -t mangle -A PREROUTING -j VPN_PRE


@@ -1,13 +0,0 @@
-#!/bin/sh
-set -eu
-dev=vpn_wg0
-ip link add dev "${dev}" type wireguard
-wg setconf "${dev}" /etc/wireguard/wg.conf
-ip link set dev "${dev}" up
-for cidr in ${VPN_PEER_CIDRS}; do
-ip route replace "${cidr}" dev "${dev}"
-done


@@ -37,4 +37,6 @@ app.kubernetes.io/instance: {{ .Release.Name }}
 value: {{ .Values.podCIDR | quote }}
 - name: VPN_SERVICE_CIDR
 value: {{ .Values.serviceCIDR | quote }}
+- name: VPN_FRONTEND_POD
+value: {{ include "..fullname" . }}-frontend-0
 {{- end }}


@@ -1,22 +1,11 @@
 apiVersion: v1
 kind: ConfigMap
 metadata:
-name: {{ include "..fullname" . }}-tproxy
+name: {{ include "..fullname" . }}-operator
 labels: {{- include "..labels" . | nindent 4 }}
 data:
-{{ (.Files.Glob "files/tproxy-setup.sh").AsConfig | indent 2 }}
+{{ (.Files.Glob "files/operator/*").AsConfig | indent 2 }}
 ---
-{{- if .Values.wireguard.enabled }}
-apiVersion: v1
-kind: ConfigMap
-metadata:
-name: {{ include "..fullname" . }}-wg
-labels: {{- include "..labels" . | nindent 4 }}
-data:
-{{ (.Files.Glob "files/wireguard-setup.sh").AsConfig | indent 2 }}
-{{- end }}
----
-{{ if .Values.ipsec.enabled }}
 apiVersion: v1
 kind: ConfigMap
 metadata:
@@ -24,4 +13,3 @@ metadata:
 labels: {{- include "..labels" . | nindent 4 }}
 data:
 {{ (.Files.Glob "files/strongswan/*").AsConfig | indent 2 }}
-{{- end }}


@@ -0,0 +1,32 @@
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+name: {{ include "..fullname" . }}-operator
+labels: {{- include "..labels" . | nindent 4 }}
+spec:
+replicas: 1
+selector:
+matchLabels:
+{{- include "..selectorLabels" . | nindent 6 }}
+component: operator
+template:
+metadata:
+labels:
+{{- include "..selectorLabels" . | nindent 8 }}
+component: operator
+spec:
+serviceAccountName: {{ include "..fullname" . }}
+automountServiceAccountToken: true
+containers:
+- name: operator
+image: {{ .Values.image | quote }}
+command: ["sh", "/scripts/entrypoint.sh"]
+env: {{- include "..commonEnv" . | nindent 10 }}
+volumeMounts:
+- name: scripts
+mountPath: "/scripts"
+readOnly: true
+volumes:
+- name: scripts
+configMap:
+name: {{ include "..fullname" . }}-operator


@@ -0,0 +1,33 @@
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+name: {{ include "..fullname" . }}
+automountServiceAccountToken: false
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+name: {{ include "..fullname" . }}
+rules:
+- apiGroups: [""]
+resources: ["pods"]
+verbs: ["get"]
+- apiGroups: [""]
+resources: ["configmaps"]
+verbs: ["get", "patch"]
+- apiGroups: ["cilium.io"]
+resources: ["ciliumendpoints"]
+verbs: ["get", "patch"]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+name: {{ include "..fullname" . }}
+subjects:
+- kind: ServiceAccount
+name: {{ include "..fullname" . }}
+namespace: {{ .Release.Namespace }}
+roleRef:
+kind: ClusterRole
+name: {{ include "..fullname" . }}
+apiGroup: rbac.authorization.k8s.io


@@ -1,15 +1,3 @@
-{{- if .Values.wireguard.enabled }}
-apiVersion: v1
-kind: Secret
-metadata:
-name: {{ include "..fullname" . }}-wg
-labels:
-{{- include "..labels" . | nindent 4 }}
-data:
-wg.conf: {{ include "wireguard.conf" . | b64enc }}
-{{- end }}
----
-{{ if .Values.ipsec.enabled }}
 apiVersion: v1
 kind: Secret
 metadata:
@@ -18,4 +6,3 @@ metadata:
 {{- include "..labels" . | nindent 4 }}
 data:
 swanctl.conf: {{ include "strongswan.swanctl-conf" . | b64enc }}
-{{- end }}


@@ -11,16 +11,9 @@ spec:
 component: frontend
 externalTrafficPolicy: Local
 ports:
-{{- if .Values.ipsec.enabled }}
 - name: isakmp
 protocol: UDP
 port: 500
 - name: ipsec-nat-t
 protocol: UDP
 port: 4500
-{{- end }}
-{{- if .Values.wireguard.enabled }}
-- name: wg
-protocol: UDP
-port: {{ .Values.wireguard.port }}
-{{- end }}


@@ -1,4 +1,3 @@
-{{ if .Values.ipsec.enabled -}}
 apiVersion: apps/v1
 kind: StatefulSet
 metadata:
@@ -15,64 +14,41 @@ spec:
 {{- include "..selectorLabels" . | nindent 8 }}
 component: frontend
 spec:
-hostNetwork: false
-initContainers:
-- name: tproxy-setup
-image: nixery.dev/busybox/iptables
-command: ["/bin/sh", "-x", "/entrypoint.sh"]
-env: {{- include "..commonEnv" . | nindent 10 }}
-securityContext:
-capabilities:
-add: ["NET_ADMIN"]
-volumeMounts:
-- name: tproxy-setup
-mountPath: "/entrypoint.sh"
-subPath: "tproxy-setup.sh"
-readOnly: true
+hostPID: true
 containers:
-- name: tproxy
-# Image source: github.com/burgerdev/go-tproxy
-image: ghcr.io/burgerdev/go-tproxy:latest
-command: ["/tproxy", "--port=61001", "--nat=true"]
-securityContext:
-capabilities:
-add: ["NET_RAW"]
 - name: strongswan
-image: "nixery.dev/shell/strongswan"
-command: ["/bin/sh", "-x", "/entrypoint.sh"]
+image: {{ .Values.image | quote }}
+command: ["sh", "-x", "/entrypoint.sh"]
 securityContext:
 capabilities:
 add: ["NET_ADMIN"]
 volumeMounts:
-- name: strongswan
+- name: files
 mountPath: "/entrypoint.sh"
 subPath: "entrypoint.sh"
 readOnly: true
-- name: strongswan
+- name: files
 mountPath: "/etc/strongswan.d/charon-logging.conf"
 subPath: "charon-logging.conf"
 readOnly: true
-- name: strongswan
+- name: config
 mountPath: "/etc/swanctl/swanctl.conf"
 subPath: "swanctl.conf"
 readOnly: true
+- name: cilium-setup
+image: {{ .Values.image | quote }}
+command: ["sh", "/scripts/sidecar.sh"]
+env: {{- include "..commonEnv" . | nindent 10 }}
+securityContext:
+privileged: true
+volumeMounts:
+- name: files
+mountPath: "/scripts"
+readOnly: true
 volumes:
-- name: tproxy-setup
+- name: files
 configMap:
-name: {{ include "..fullname" . }}-tproxy
-- name: strongswan
-projected:
-sources:
-- secret:
 name: {{ include "..fullname" . }}-strongswan
-items:
-- key: swanctl.conf
-path: swanctl.conf
-- configMap:
-name: {{ include "..fullname" . }}-strongswan
-items:
-- key: entrypoint.sh
-path: entrypoint.sh
-- key: charon-logging.conf
-path: charon-logging.conf
-{{- end }}
+- name: config
+secret:
+secretName: {{ include "..fullname" . }}-strongswan


@@ -1,14 +0,0 @@
-{{- define "wireguard.conf" }}
-[Interface]
-ListenPort = {{ .Values.wireguard.port }}
-PrivateKey = {{ .Values.wireguard.private_key }}
-[Peer]
-PublicKey = {{ .Values.wireguard.peer_key }}
-AllowedIPs = {{ join "," .Values.peerCIDRs }}
-{{- if .Values.wireguard.endpoint }}
-Endpoint = {{- .Values.wireguard.endpoint }}
-{{- end }}
-{{- if .Values.wireguard.keepAlive }}
-PersistentKeepalive = {{- .Values.wireguard.keepAlive }}
-{{- end }}
-{{ end }}


@@ -1,68 +0,0 @@
-{{ if .Values.wireguard.enabled -}}
-apiVersion: apps/v1
-kind: StatefulSet
-metadata:
-name: {{ include "..fullname" . }}-frontend
-labels: {{- include "..labels" . | nindent 4 }}
-spec:
-selector:
-matchLabels:
-{{- include "..selectorLabels" . | nindent 6 }}
-component: frontend
-template:
-metadata:
-labels:
-{{- include "..selectorLabels" . | nindent 8 }}
-component: frontend
-spec:
-hostNetwork: false
-initContainers:
-- name: tproxy-setup
-image: nixery.dev/busybox/iptables
-command: ["/bin/sh", "-x", "/entrypoint.sh"]
-env: {{- include "..commonEnv" . | nindent 10 }}
-securityContext:
-capabilities:
-add: ["NET_ADMIN"]
-volumeMounts:
-- name: tproxy-setup
-mountPath: "/entrypoint.sh"
-subPath: "tproxy-setup.sh"
-readOnly: true
-- name: wg-setup
-image: "nixery.dev/busybox/wireguard-tools"
-command: ["/bin/sh", "-x", "/etc/wireguard/wireguard-setup.sh"]
-env: {{- include "..commonEnv" . | nindent 10 }}
-securityContext:
-capabilities:
-add: ["NET_ADMIN"]
-volumeMounts:
-- name: wireguard
-mountPath: "/etc/wireguard"
-readOnly: true
-containers:
-- name: tproxy
-# Image source: github.com/burgerdev/go-tproxy
-image: ghcr.io/burgerdev/go-tproxy:latest
-command: ["/tproxy", "--port=61001", "--nat=true"]
-securityContext:
-capabilities:
-add: ["NET_RAW"]
-volumes:
-- name: tproxy-setup
-configMap:
-name: {{ include "..fullname" . }}-tproxy
-- name: wireguard
-projected:
-sources:
-- secret:
-name: {{ include "..fullname" . }}-wg
-items:
-- key: wg.conf
-path: wg.conf
-- configMap:
-name: {{ include "..fullname" . }}-wg
-items:
-- key: wireguard-setup.sh
-path: wireguard-setup.sh
-{{- end }}


@@ -8,32 +8,12 @@ serviceCIDR: "10.96.0.0/12"
 # on-prem IP ranges to expose to Constellation. Must contain at least one CIDR.
 peerCIDRs: []
-# The sections below configure the VPN connectivity to the Constellation
-# cluster. Exactly one `enabled` must be set to true.
 # IPSec configuration
 ipsec:
-enabled: false
 # pre-shared key used for authentication
 psk: ""
 # Address of the peer's gateway router.
 peer: ""
-# Wireguard configuration
-wireguard:
-enabled: false
-# If Wireguard is enabled, these fields for the Constellation side must be populated.
-private_key: ""
-peer_key: ""
-# Listening port of the Constellation Wireguard.
-port: 51820
-# Optional host:port of the on-prem Wireguard.
-endpoint: ""
-# Optional interval for keep-alive packets in seconds. Setting this helps the on-prem server to
-# discover a restarted Constellation VPN frontend.
-keepAlive: ""
+# required tools: sh nsenter ip pidof jq kubectl charon
+image: "nixery.dev/shell/util-linux/iproute2/procps/jq/kubernetes/strongswan"
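With WireGuard removed, the only VPN settings left in `values.yaml` are `peerCIDRs` and the `ipsec` section (`psk`, `peer`), which is what the Setup steps ask you to fill in. The sketch below renders such a values override from the command line. `render_values` is a hypothetical helper, not part of the chart. It assumes that none of its arguments contain characters that would need YAML escaping (such as `"` or `\`).

```sh
#!/bin/sh
# Hypothetical helper: render a minimal values override for this chart.
# Usage: render_values <psk> <peer-gateway-address> <cidr>...
# Assumes no argument contains characters that need YAML escaping.
render_values() {
  if [ "$#" -lt 3 ]; then
    # values.yaml: peerCIDRs "Must contain at least one CIDR".
    echo "usage: render_values <psk> <peer> <cidr>..." >&2
    return 1
  fi
  psk=$1
  peer=$2
  shift 2
  echo "peerCIDRs:"
  for cidr in "$@"; do
    printf '  - "%s"\n' "${cidr}"
  done
  echo "ipsec:"
  printf '  psk: "%s"\n' "${psk}"
  printf '  peer: "%s"\n' "${peer}"
}
```

The output can be passed as an additional values file, for example `helm install -f config.yaml -f override.yaml vpn .`. When several `-f` files are given, Helm gives precedence to the last one, so the override replaces the `peerCIDRs` list from `config.yaml` as a whole.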