Logcollection
Filebeat and Logstash can be deployed to collect logs into OpenSearch, which allows for aggregation and easy inspection of those logs. The logcollection functionality can be deployed to both debug and non-debug clusters.
Deployment in Debug Clusters
In debug clusters, logcollection functionality should be deployed automatically through the debug daemon debugd, which runs before the bootstrapper and can therefore, unlike in non-debug clusters, also collect the logs of the bootstrapper.
Warning
If logs from an E2E test run for a debug cluster with a bootstrapping failure are missing in OpenSearch, this might be caused by a race condition between the termination of the cluster and the start-up of the logcollection containers in debugd. If the failure can be reproduced manually, it is best to do so and to observe the serial console of the bootstrapping node with the following command until the logcollection containers have started.
journalctl _SYSTEMD_UNIT=debugd.service | grep logcollect
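To wait for the first logcollection message instead of re-running the command, the journal can be followed live. The following is a minimal sketch: first_logcollect_line is a hypothetical helper name, and matching on the substring logcollect is an assumption, since the exact start-up message of the containers is not specified here.

```shell
#!/bin/sh
# Hypothetical helper: read lines from stdin and stop at the first one
# that mentions "logcollect". Returns 1 if the input ends without a match.
first_logcollect_line() {
  while IFS= read -r line; do
    case "$line" in
      *logcollect*)
        printf '%s\n' "$line"
        return 0
        ;;
    esac
  done
  return 1
}

# On the serial console of the bootstrapping node (not executed here):
#   journalctl -f _SYSTEMD_UNIT=debugd.service | first_logcollect_line
```

The -f flag makes journalctl keep printing new entries, so the pipeline blocks until a matching line appears.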
Deployment in Non-Debug Clusters
In non-debug clusters, logcollection functionality needs to be explicitly deployed as a Kubernetes Deployment through Helm. To do that, a few steps need to be followed:
- Template the deployment configuration through the loco CLI.
    bazel run //hack/logcollector template -- \
      --dir $(realpath .) \
      --username <OPENSEARCH_USERNAME> \
      --password <OPENSEARCH_PW> \
      --info deployment-type={k8s, debugd} ...
This places the templated configuration in the current directory. OpenSearch user credentials can be created by any admin in OpenSearch. Logging in with your company CSP accounts should grant you sufficient permissions to create a user and grant it the required all_access role. Additional key-value pairs can be added to the configuration by appending --info key=value to the command. These key-value pairs are attached to the log entries and can be used to filter them in OpenSearch. For example, it might be helpful to add a test=<xyz> tag to be able to select only the logs from a specific test run.
- Add the Elastic Helm repository
    helm repo add elastic https://helm.elastic.co
    helm repo update
- Deploy Logstash
    cd logstash
    helm install logstash elastic/logstash \
      --wait --timeout=1200s --values values.yml
    cd ..
This will add the required Logstash Helm charts and deploy them to your cluster.
- Deploy Beats
    cd metricbeat
    helm install metricbeat-k8s elastic/metricbeat \
      --wait --timeout=1200s --values values-control-plane.yml
    helm install metricbeat-system elastic/metricbeat \
      --wait --timeout=1200s --values values-all-nodes.yml
    cd ..
    cd filebeat
    helm install filebeat elastic/filebeat \
      --wait --timeout=1200s --values values.yml
    cd ..
This will add the required Filebeat and Metricbeat Helm charts and deploy them to your cluster.
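When several tags are attached during templating, the repeated --info flags can be generated rather than typed by hand. The sketch below assumes a hypothetical helper named info_flags; it is not part of the logcollector CLI, and the tag value e2e-1234 is only an illustration.

```shell
#!/bin/sh
# Hypothetical helper: turn key=value arguments into repeated --info flags.
# Values containing spaces are not supported by this sketch.
info_flags() {
  flags=""
  for pair in "$@"; do
    case "$pair" in
      # Accept only non-empty key and non-empty value.
      ?*=?*) flags="${flags:+$flags }--info $pair" ;;
      *) echo "invalid pair: $pair" >&2; return 1 ;;
    esac
  done
  printf '%s\n' "$flags"
}

# Possible use with the template command (not executed here):
#   bazel run //hack/logcollector template -- \
#     --dir "$(realpath .)" \
#     --username <OPENSEARCH_USERNAME> \
#     --password <OPENSEARCH_PW> \
#     $(info_flags deployment-type=k8s test=e2e-1234)
```

Rejecting malformed pairs up front avoids templating a configuration with a tag that cannot be filtered on later.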
To remove Logstash or one of the beats, run helm uninstall with the release name used during installation: helm uninstall {logstash,filebeat,metricbeat-k8s,metricbeat-system}.
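To tear down the whole logcollection stack at once, the uninstall calls can be looped over the release names. This is a sketch only: uninstall_all and the DRY_RUN switch are hypothetical, and the release names are taken from the helm install commands in the steps above.

```shell
#!/bin/sh
# Release names as passed to `helm install` in the deployment steps.
RELEASES="filebeat metricbeat-system metricbeat-k8s logstash"

# Hypothetical helper: uninstall every release.
# With DRY_RUN=1 the commands are printed instead of executed.
uninstall_all() {
  for release in $RELEASES; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "helm uninstall $release"
    else
      helm uninstall "$release"
    fi
  done
}

# Preview first, then run for real (not executed here):
#   DRY_RUN=1 uninstall_all
#   uninstall_all
```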
Inspecting Logs in OpenSearch
To search through logs in OpenSearch, head to the Discover page in the OpenSearch dashboard and set the timeframe selector in the top right to the period you want to inspect. Click Refresh. You can now see all logs recorded in the specified timeframe. To get a less cluttered view, select the fields you want to inspect in the left sidebar.