dev-docs: Helm chart for full L3 VPN connectivity (#2620)

* dev-docs: add 'things to try' section to VPN howto

* dev-docs: full L3 connectivity in VPN chart
Markus Rudy 2024-01-16 13:59:33 +01:00 committed by GitHub
parent 9181705299
commit 16c63d57cd
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
15 changed files with 242 additions and 246 deletions


@@ -2,35 +2,83 @@
This Helm chart deploys a VPN server to your Constellation cluster.
## Prerequisites
* Constellation >= v2.14.0
* A publicly routable VPN endpoint on premises that supports IPSec in IKEv2
tunnel mode with NAT traversal enabled.
* A list of on-prem CIDRs that should be reachable from Constellation.
## Setup
1. Configure Cilium to route services for the VPN (see [Architecture](#architecture) for details).
   * Edit the Cilium config: `kubectl -n kube-system edit configmap cilium-config`.
   * Set the config item `enable-sctp: "true"`.
   * Restart the Cilium agents: `kubectl -n kube-system rollout restart daemonset/cilium`.
2. Create the Constellation VPN configuration file.
   ```sh
   helm inspect values . >config.yaml
   ```
3. Populate the Constellation VPN configuration file. At least the following
   need to be configured:
   * The list of on-prem CIDRs (`peerCIDRs`).
   * The `ipsec` subsection.
4. Install the Helm chart.
   ```sh
   helm install -f config.yaml vpn .
   ```
5. Configure the on-prem gateway with Constellation's pod and service CIDRs
   (see `config.yaml`).
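
Step 1 edits the config map interactively. For scripted setups, you can make the same change with a merge patch and then wait for all Cilium agents to come back. This is a minimal sketch that assumes `kubectl` is configured for the Constellation cluster. The function name `enable_cilium_sctp` is hypothetical and not part of the chart.

```sh
# Hypothetical helper (not shipped with the chart): a non-interactive
# equivalent of setup step 1.
enable_cilium_sctp() {
  # Set enable-sctp in the Cilium config map without opening an editor.
  kubectl -n kube-system patch configmap cilium-config \
    --type merge -p '{"data":{"enable-sctp":"true"}}' || return 1
  # Restart the agents so that they read the new setting ...
  kubectl -n kube-system rollout restart daemonset/cilium || return 1
  # ... and wait until the rollout of the restarted agents has completed.
  kubectl -n kube-system rollout status daemonset/cilium --timeout=5m
}

# Usage:
#   enable_cilium_sctp
```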
## Things to try
Ask CoreDNS about its own service IP:
```sh
dig +notcp @10.96.0.10 kube-dns.kube-system.svc.cluster.local
```
Ask the Kubernetes API server about its wellbeing:
```sh
curl --insecure https://10.96.0.1:6443/healthz
```
Ping a pod:
```sh
ping $(kubectl get pods vpn-frontend-0 -o go-template --template '{{ .status.podIP }}')
```
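
The commands above hardcode `10.96.0.10` and `10.96.0.1`. These are the addresses that `kube-dns` and the `kubernetes` service typically get when the service CIDR is `10.96.0.0/12`. If your cluster uses a different service CIDR, a small helper can look the cluster IPs up instead. This sketch assumes `kubectl` access from the machine running the checks, and `svc_ip` is a hypothetical name.

```sh
# Hypothetical helper: print the cluster IP of service $2 in namespace $1.
svc_ip() {
  kubectl -n "$1" get service "$2" -o go-template='{{ .spec.clusterIP }}'
}

# Usage, mirroring the checks above (ports unchanged):
#   dig +notcp @"$(svc_ip kube-system kube-dns)" kube-dns.kube-system.svc.cluster.local
#   curl --insecure "https://$(svc_ip default kubernetes):6443/healthz"
```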
## Architecture
The VPN server is deployed as a `StatefulSet` to the cluster. It hosts the VPN
frontend component, which is responsible for relaying traffic between the pod
and the on-prem network over an IPSec tunnel.

The VPN frontend is exposed with a public LoadBalancer so that it becomes
accessible from the on-prem network.
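
When configuring the on-prem gateway, you will likely need this public address as the tunnel's remote endpoint. Below is a minimal sketch for reading it from the service status, assuming `kubectl` access. The helper `lb_ips` is hypothetical, and the LoadBalancer service's name is not given here (`kubectl get services` lists it). Some cloud providers publish a hostname instead of an IP; in that case, use `.hostname` in place of `.ip`.

```sh
# Hypothetical helper: print the external IP(s) assigned to LoadBalancer
# service $1, one per line (empty until the cloud provider assigns one).
lb_ips() {
  kubectl get service "$1" \
    -o go-template='{{ range .status.loadBalancer.ingress }}{{ .ip }}{{ "\n" }}{{ end }}'
}

# Usage:
#   lb_ips <loadbalancer-service-name>
```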

An init container sets up IP routes on the frontend host and inside the
frontend pod. All routes are bound to the frontend pod's lxc interface and thus
deleted together with it.
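
To inspect the routes that the init container installed inside the frontend pod, you can run `ip route` in the pod. This sketch assumes the pod is named `vpn-frontend-0`, as in the ping example above. It also assumes the pod's default container image ships the `ip` tool from iproute2, which is not confirmed here. `show_frontend_routes` is a hypothetical helper. The host-side routes are not visible this way and require a shell on the node.

```sh
# Hypothetical helper: list the routing table inside the frontend pod.
# Assumes the pod's default container has iproute2 installed.
show_frontend_routes() {
  kubectl exec "${1:-vpn-frontend-0}" -- ip route show
}

# Usage:
#   show_frontend_routes                  # defaults to vpn-frontend-0
#   show_frontend_routes <other-pod-name>
```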

A VPN operator deployment is added that configures the `CiliumEndpoint` with
on-prem IP ranges, thus configuring routes on non-frontend hosts. The endpoint
shares the frontend pod's lifecycle.
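
To see which addresses Cilium associates with each endpoint, you can list the `CiliumEndpoint` objects together with their `status.networking.addressing` entries. The VPN frontend's endpoint is where the on-prem ranges configured by the operator are expected to appear. This sketch assumes Cilium's `CiliumEndpoint` schema and `kubectl` access. `list_cep_addresses` is a hypothetical helper, and it passes extra arguments (for example `-n <namespace>` or `-A`) through to `kubectl`.

```sh
# Hypothetical helper: print "<endpoint>: <ipv4> <ipv4> ..." for each
# CiliumEndpoint; extra arguments are passed through to kubectl.
list_cep_addresses() {
  tmpl='{{ range .items }}{{ .metadata.name }}:'
  tmpl="$tmpl"'{{ range .status.networking.addressing }} {{ .ipv4 }}{{ end }}'
  tmpl="$tmpl"'{{ "\n" }}{{ end }}'
  kubectl get ciliumendpoints "$@" -o go-template="$tmpl"
}

# Usage:
#   list_cep_addresses        # current namespace
#   list_cep_addresses -A     # all namespaces
```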

In Cilium's default configuration, service endpoints are resolved in cgroup
eBPF hooks that are not applicable to VPN traffic. We force Cilium to apply
service NAT at the LXC interface by enabling SCTP support.
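
Service NAT for VPN traffic depends on this setting, so it is worth confirming before you debug connectivity. Below is a minimal check, assuming `kubectl` access to the cluster. `cilium_sctp_enabled` is a hypothetical helper.

```sh
# Hypothetical helper: succeed only if the Cilium config map enables SCTP.
# This reads the config map, not the running agents, so agents that were
# not restarted after a change may still use the previous value.
cilium_sctp_enabled() {
  sctp=$(kubectl -n kube-system get configmap cilium-config \
    -o go-template='{{ index .data "enable-sctp" }}') || return 1
  [ "$sctp" = "true" ]
}

# Usage:
#   cilium_sctp_enabled && echo "SCTP enabled" || echo "SCTP disabled"
```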
## Limitations
* VPN traffic is handled by a single pod, which may become a bottleneck.
* Frontend pod restarts / migrations invalidate IPSec connections.
* Only pre-shared key authentication is supported.