constellation/dev-docs/workflows/logcollection.md
Moritz Sanft f4b2d02194
2023-09-27 16:17:31 +02:00

Logcollection

Filebeat and Logstash can be deployed to collect logs in OpenSearch, which allows for aggregation and easy inspection of those logs. The logcollection functionality can be deployed to both debug and non-debug clusters.

Deployment in Debug Clusters

In debug clusters, logcollection functionality should be deployed automatically through the debug daemon debugd. Because debugd runs before the bootstrapper, it can also collect the bootstrapper's logs, which is not possible in non-debug clusters.

Warning

If logs from an E2E test run for a debug cluster with a bootstrapping failure are missing in OpenSearch, the cause might be a race condition between the termination of the cluster and the start-up of the logcollection containers in debugd. If the failure can be reproduced manually, it is best to do so and observe the serial console of the bootstrapping node with the following command until the logcollection containers have started.

journalctl _SYSTEMD_UNIT=debugd.service | grep logcollect
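When waiting for the containers to start, it can be more convenient to stream the journal instead of re-running the command. The following is a minimal sketch, not part of the repository: the function name `watch_logcollect` and the `JOURNAL_CMD` override are hypothetical and exist only so the log source can be swapped out, for example when trying the filter on sample input.

```shell
# Hypothetical helper: print debugd journal lines mentioning "logcollect"
# as they arrive. JOURNAL_CMD is an assumption of this sketch; by default
# the journal of debugd.service is followed.
watch_logcollect() {
    # --line-buffered makes grep flush each match immediately, which matters
    # because its input is a stream that does not end on its own.
    ${JOURNAL_CMD:-journalctl --follow _SYSTEMD_UNIT=debugd.service} \
        | grep --line-buffered logcollect
}
```

Run `watch_logcollect` on the bootstrapping node and stop it with Ctrl+C once the logcollection containers show up in the output.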

Deployment in Non-Debug Clusters

In non-debug clusters, logcollection functionality needs to be explicitly deployed as a Kubernetes Deployment through Helm. To do that, a few steps need to be followed:

  1. Template the deployment configuration through the loco CLI.

    bazel run //hack/logcollector template -- \
        --dir $(realpath .) \
        --username <OPENSEARCH_USERNAME> \
        --password <OPENSEARCH_PW> \
        --info deployment-type={k8s, debugd}
        ...
    

    This will place the templated configuration in the current directory. OpenSearch user credentials can be created by any admin in OpenSearch. Logging in with your company CSP accounts should grant you sufficient permissions to create a user and grant it the required all_access role. Additional key-value pairs can be added to the configuration by appending --info key=value to the command. These key-value pairs will be attached to the log entries and can be used to filter them in OpenSearch. For example, it might be helpful to add a test=<xyz> tag to be able to filter out logs from a specific test run.

  2. Add the Elastic Helm repository

    helm repo add elastic https://helm.elastic.co
    helm repo update
    
  3. Deploy Logstash

    cd logstash
    helm install logstash elastic/logstash \
        --wait --timeout=1200s --values values.yml
    cd ..
    

    This will add the required Logstash Helm charts and deploy them to your cluster.

  4. Deploy Beats

    cd metricbeat
    helm install metricbeat-k8s elastic/metricbeat \
        --wait --timeout=1200s --values values-control-plane.yml
    helm install metricbeat-system elastic/metricbeat \
        --wait --timeout=1200s --values values-all-nodes.yml
    cd ..
    cd filebeat
    helm install filebeat elastic/filebeat \
        --wait --timeout=1200s --values values.yml
    cd ..
    

    This will add the required Filebeat and Metricbeat Helm charts and deploy them to your cluster.
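Steps 3 and 4 assume that the templating step has written a values file into each of the logstash, metricbeat, and filebeat directories. A small pre-flight check can catch a missing file before a `helm install` fails on it. This is only a sketch: the function name `check_templates` is hypothetical, and the file paths are taken from the `--values` arguments used above.

```shell
# Hypothetical pre-flight check: verify that the values files referenced by
# the helm install commands exist below the templating directory.
# Usage: check_templates [DIR]   (DIR defaults to the current directory)
check_templates() {
    dir="${1:-.}"
    missing=0
    for f in logstash/values.yml \
             metricbeat/values-control-plane.yml \
             metricbeat/values-all-nodes.yml \
             filebeat/values.yml; do
        if [ ! -f "$dir/$f" ]; then
            # Report every missing file instead of stopping at the first one.
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    # Exit status 0 if all files exist, 1 otherwise.
    return "$missing"
}
```

Calling `check_templates "$(realpath .)"` right after templating returns a non-zero status and lists the missing files if the template step did not complete.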

To remove Logstash or one of the Beats, cd into the corresponding directory and run helm uninstall with the release name used during installation: {logstash,filebeat,metricbeat-k8s,metricbeat-system}.
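To tear down the whole logcollection stack at once, the uninstall can be looped over the release names given to `helm install` in steps 3 and 4. The sketch below is an illustration rather than a script from the repository: `uninstall_logcollection`, `RELEASES`, and the `HELM` override are hypothetical names, and `HELM` exists only to allow a dry run.

```shell
# Release names as passed to "helm install" in steps 3 and 4.
# The Beats are removed before Logstash, which they ship their data to.
RELEASES="filebeat metricbeat-system metricbeat-k8s logstash"

# Hypothetical helper: uninstall every logcollection release.
# HELM is an assumption of this sketch; it defaults to the helm binary and
# can be set to "echo helm" to print the commands instead of running them.
uninstall_logcollection() {
    for release in $RELEASES; do
        ${HELM:-helm} uninstall "$release" \
            || echo "could not uninstall $release" >&2
    done
}
```

A dry run with `HELM="echo helm" uninstall_logcollection` prints the four `helm uninstall` commands without touching the cluster.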

Inspecting Logs in OpenSearch

To search through logs in OpenSearch, open the Discover page in the OpenSearch dashboard and set the timeframe selector in the top right to cover the period you are interested in. Click Refresh. You can now see all logs recorded in the specified timeframe. To get a less cluttered view, select the fields you want to inspect in the left sidebar.