Application Logging in Kubernetes

Note: Throughout the article we will use the following terms interchangeably:

  - Log aggregator, log ingestion tool, log forwarder – a tool that reads logs from a certain input and streams them to a log repository, e.g. Logstash, Fluentd, Fluent Bit, Splunk Forwarder.
  - Log repository, centralized log storage – a component capable of storing a large volume of log data, e.g. ElasticSearch, Splunk.
  - Container engine – an engine that handles running containers on a host, e.g. dockerd, docker-engine.

Most applications support the generation of log messages to track events, configuration changes, and errors. Often, these logs are stored in local files. When working with such log files, one has to consider how to maintain them to avoid running out of disk space (roll-over, max file size, number of log files to preserve), and, in the case of applications that have multiple instances, how to track events over multiple instances. These challenges call for externalizing and storing all log messages in a centralized repository, to allow for querying, alerting, and tracking of issues and activities reported by the application across all of its instances. 

Running applications in containers stresses the need for externalizing the logs because of the transient nature of containers and the dynamic nature of the containers environment, where scaling (creation and deletion of instances) can happen frequently.

In this article, we will see how to handle the collection and aggregation of the logs from applications running in containers on Kubernetes, into centralized log storage.

Kubernetes offers three ways for application logs to be exposed from a container
(see: Kubernetes Cluster Level Logging Architecture):

  1. Use a node-level logging agent that runs on every node. This approach uses a Kubernetes/Docker feature that saves the application's console output to a file on the host machine.
    This is the recommended and most common way of handling application logs and is covered in more detail below.
  2. Use the sidecar pattern in the application's pod to run a Log Forwarder container together with the main application container. In this option, the Log Forwarder sidecar container runs alongside the application container and sends the application log file content directly to the log storage (ElasticSearch, Splunk, etc.). The Log Forwarder can be a Fluentd sidecar container that streams the file content to ElasticSearch, a Splunk Universal Forwarder that sends data to Splunk, Filebeat, etc.
  3. Push logs directly from within the application to a backend. This option requires the application to be aware of the centralized log storage and to use a special API to communicate with it and send it log data. It also couples the application directly with the log storage and its availability. Examples include sending log messages to Kafka to be handled and forwarded from there, or sending the log messages directly using the log-storage API (Splunk, ElasticSearch, etc.).

In this article, we will concentrate on the first approach, using a node-level logging agent.

Using Node-Level Logging Agent

The Node-Level Logging Agent approach assumes that the containerized application prints all of its log messages to its (the container's) console, i.e. its standard output and standard error (stdout/stderr). It then uses the functionality of the container engine (e.g. the Docker engine) to save the container-level logs to a certain folder on the host machine (node).

Collecting Logs on the Node

According to the twelve-factor app methodology, an application should print its log messages to stdout/stderr. When running a Docker container, the Docker Engine captures and persists the stdout and stderr of the container according to the configured logging driver.
Docker supports multiple logging drivers. The logging driver can be configured:

  1. Via a command-line option specified when starting the Docker daemon. These options can be set in a number of ways, for example, by adding --log-driver=journald to the OPTIONS environment variable in /etc/sysconfig/docker.
  2. By specifying the log driver in /etc/docker/daemon.json. This is the recommended way since it’s platform-independent. Note however that if the setting in the daemon.json file and the options on the dockerd command conflict, Docker will fail to start.
    Note: If daemon.json is not used, check whether dockerd is being started with the --config-file option pointing to an alternative configuration file.

To determine the current logging driver, use the following command:

docker info | grep -i log

If no log driver is specified, the default log driver is json-file. It is recommended to use the json-file log driver for log aggregation in Kubernetes. This driver saves the logs in JSON format on the host machine under the following directory structure: /var/lib/docker/containers/{ContainerId}.
Since saving the logs is done by the Docker engine, these files are named according to their container id, without reference to their Kubernetes context, which makes it difficult to identify their origin. Kubernetes resolves this by creating a symbolic link for each log file that contains the pod name, namespace, container name, and Docker container id, as follows: /var/log/containers/{PodName}_{Namespace}_{ContainerName}-{ContainerId}.log -> /var/lib/docker/containers/{ContainerId}/{ContainerId}-json.log.

Log Rotation

When using node-level logging, it is important to make sure that log rotation is handled, so that logs don't consume all available storage on the node. Kubernetes itself is currently not responsible for rotating logs. Log rotation can be handled by the Docker json-file log driver by setting the max-file and max-size options.

e.g.
Set daemon.json with the following values:

{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  }
}

Or set the sysconfig/docker OPTIONS variable to include "--log-opt max-size=50m --log-opt max-file=5".

With these options, the Docker log driver automatically rotates the logs when they reach the specified size and removes old log files as needed.
Some Kubernetes deployment tools may already configure a scheduled task that takes care of log rotation.

Note that since the kubectl logs command returns the content of these log files (as read and returned by the kubelet on the pod's node), there may be situations, for example right after a log rotation, where kubectl logs returns empty output.

Forwarding Logs to a Centralized Log Repository

Once all of the containers' logs are saved on their respective nodes, we will need to use a log forwarder to collect and send them to a centralized log repository. For example, we can use a Splunk Forwarder to forward messages into Splunk, or Fluentd to send the log messages to ElasticSearch (the EFK stack – ElasticSearch, Fluentd, Kibana).

Since Kubernetes can schedule our application's pods to run on any node in the cluster, we need a Log Forwarder instance running on every node. In order to guarantee that every node in the cluster runs the log forwarder, Kubernetes offers a construct called "DaemonSet". Unlike a Deployment, where one can specify the desired number of instances (replicas) of a pod (but not on which nodes they are going to run), a DaemonSet makes sure that every node in the cluster runs an instance of its pod. Whenever a new machine joins the cluster, the DaemonSet pod is scheduled to run on it. Therefore, by defining a log-forwarder DaemonSet we guarantee that each node in the Kubernetes cluster has one running on it. Each instance of the log forwarder mounts a hostPath volume that points to the node's logs folder and streams all of the logs into the centralized log repository.
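As an illustrative sketch only, a minimal log-forwarder DaemonSet using Fluent Bit could look like the following (the namespace, names, image tag, and volume paths are placeholder assumptions, and the Fluent Bit configuration itself is omitted):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-forwarder
  namespace: logging
spec:
  selector:
    matchLabels:
      app: log-forwarder
  template:
    metadata:
      labels:
        app: log-forwarder
    spec:
      containers:
      - name: fluent-bit
        image: fluent/fluent-bit:1.3
        volumeMounts:
        # The symbolic links described earlier live under /var/log/containers
        - name: varlog
          mountPath: /var/log
        # The actual json-file logs the links point to
        - name: dockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
      - name: dockercontainers
        hostPath:
          path: /var/lib/docker/containers
```

Because both volumes are of type hostPath, each pod of the DaemonSet reads the log files of its own node and streams them to the centralized repository.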

What to Do When the Application Doesn't Support Writing Logs to stdout

For applications that write their logs into files and do not support writing their logs to stdout/stderr, or for applications that have multiple log files, there are a couple of options that allow sending the logs to stdout without making changes to the application.

  1. Use ln to create a symbolic link to stdout in place of the log file so that when the application writes to the file, it will actually write to stdout.
    For example, see the nginx Dockerfile:
    # forward request and error logs to docker log collector
    RUN ln -sf /dev/stdout /var/log/nginx/access.log \
        && ln -sf /dev/stderr /var/log/nginx/error.log

This approach may not always work, for example, if the application keeps using different log file names or renames the log file on start-up. This option is also not sufficient when the application has multiple log files that we want to expose.
  2. Use a sidecar container that will 'tail -F' the log file as it's being written by the main application container. To accomplish this, a lightweight container is added to the application's pod definition; this pattern (where the main application container has a "helper" container running alongside it) is called a sidecar.
The application container and the sidecar container must be able to access the same log file. This is done by:

  1. Defining a volume of type emptyDir in the application pod.
  2. Mounting that volume into both the application container and the sidecar container (using volumeMounts).
  3. Making sure the application writes its log to that mounted volume folder. This way, when the application writes its log file to that location, the sidecar container will be able to read it.

e.g.

  spec:
    containers:
    - name: main-app
      image: main-app:latest
      volumeMounts:
      - name: main-app-logs-dir
        mountPath: /opt/mainApp/logs
    - name: main-app-log
      image: log-collector:latest
      args: [/bin/bash, -c, 'tail -n+1 -F /opt/mainApp/logs/app.log']
      volumeMounts:
      - name: main-app-logs-dir
        mountPath: /opt/mainApp/logs
    volumes:
    - name: main-app-logs-dir
      emptyDir: {}

Note that with this approach, since the application would still write to a file, the application itself would have to take care of rolling the file after a certain size to avoid running out of space/memory (depending on the emptyDir medium).

An advantage of this approach is that if the application has more than one log file that should be exposed, we can create a sidecar container for each log file, making sure each log is handled separately.

Converting Traditional Log Output to JSON

Once the application logs go through the json-file Docker log driver, they appear in the /var/log/containers folder in the following format (shown here with spaces and newlines added for readability):

{
    "log": "[02:36:43] - example message\n",
    "stream": "stdout",
    "time": "2019-09-24T02:36:43.777792885Z"
}

When aggregating logs from different applications and components, it is important that these logs are sent in a unified format. Log forwarding tools such as Fluentd/Fluent Bit can be configured to transform a plaintext log message into JSON format (although multiline messages are still problematic, see https://github.com/fluent/fluent-bit/issues/1115).

Alternatively, plain log messages can be converted to JSON format at the container level, before being sent to stdout (to be collected by the container engine's log driver). This conversion can be done by passing the output of 'tail' through a conversion program that converts the plain log messages to JSON format.

For example, the command passed to /bin/bash in the sidecar example above could be changed to:

tail -n+1 -F /opt/mainApp/logs/app.log | gawk -f /opt/mainApp/bin/log2json.awk

Where /opt/mainApp/bin/log2json.awk is a GAWK script that takes care of UTF-8 encoding, wrapping of multiline log entries, and conversion to JSON format. This approach should be used when the log messages are in a complex format (including multiline log messages) and require a custom transformation into JSON that takes care of the edge cases (see the sections below for edge cases that may have to be handled).
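The log2json.awk script itself is not listed here. Purely as an illustration of the core idea, a minimal Python sketch of such a converter could look like the following; the field name "message" and all behavior are assumptions, not the actual script:

```python
import json

def to_json(line: str) -> str:
    """Wrap one plain-text log line in a single-line JSON object.
    json.dumps escapes quotes, backslashes and control characters, and
    with ensure_ascii=True emits non-ASCII characters as \\uXXXX escapes,
    so the result is always one valid JSON line."""
    return json.dumps({"message": line.rstrip("\n")}, ensure_ascii=True)
```

Such a converter would run in the sidecar as `tail -n+1 -F app.log | <converter>`, printing one JSON object per input line; a production script additionally has to handle the multiline and encoding edge cases discussed below.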

Note that with this approach, the application's JSON log messages will be encapsulated and escaped by the container engine's log driver into the log field. This requires the log forwarder to un-escape the JSON string. For example, in Fluent Bit this can be done using the parser's Decode_Field json log directive.
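To illustrate the double encoding, the following Python sketch decodes a json-file record whose log field itself contains an application JSON message (the record content is made up for the example):

```python
import json

# A record as written by the json-file log driver: the application's own
# JSON message is escaped inside the "log" field.
docker_record = '{"log":"{\\"level\\":\\"info\\",\\"msg\\":\\"started\\"}\\n","stream":"stdout","time":"2019-09-24T02:36:43.777792885Z"}'

outer = json.loads(docker_record)   # first decode: the driver's envelope
inner = json.loads(outer["log"])    # second decode: the application's JSON
```

The log forwarder has to perform the equivalent of the second json.loads so that the application's fields become queryable in the log repository.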

Another enhancement can be to identify the time field name and its format, to make sure the log repository stores the message with its original timestamp rather than the time at which the message was sent to the container engine.

An example of such a parser from Fluent Bit can be seen below:

[PARSER]
    Name        myapp_parser
    # Time field example: \"date\":\"Mon Jul 22 2019 21:58:41\"
    Time_Key    date
    Time_Format %a %b %d %Y %H:%M:%S
    Time_Keep   On
    Format      json
    # Command     | Decoder | Field | Optional Action
    # ============|=========|=======|================
    Decode_Field  json log

Considerations For Converting Traditional Log Output to JSON

UTF-8 Encoding

Most application logs tend to be in plain ASCII format. There may be instances, though, where applications log user-supplied data without proper encoding, leading to logging of characters that are not valid, well-formed UTF-8 byte sequences. In these cases, the invalid bytes must be encoded using the JSON “\uXXXX” format for each offending byte (where XXXX are four hexadecimal digits) so the result is a valid JSON string.

Encode or Strip Windows Carriage-Return Characters

Logs coming from Microsoft Windows systems or written in Microsoft Windows format end with carriage-return/newline pairs as opposed to UNIX-format lines that end only with a newline. A decision must be made whether the carriage-return characters are important and should be encoded, or if the carriage-return characters should be removed from the end of lines. We tend to prefer stripping over encoding but encoding might make more sense if the carriage-return is actually part of application protocol messages (like HTTP headers).
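Both options can be sketched in a few lines of Python (illustrative only; the function names are ours):

```python
def strip_cr(line: str) -> str:
    # Preferred option: drop a trailing Windows carriage return.
    return line[:-1] if line.endswith("\r") else line

def encode_cr(line: str) -> str:
    # Alternative option: keep carriage returns but encode them so they
    # remain visible and valid inside a JSON string value.
    return line.replace("\r", "\\u000d")
```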

Multiline Wrapping

Some applications may emit a single log entry that spans multiple lines, e.g. a multiline XML response. These multiline messages must be recognized and formatted such that internal newlines are encoded with “\n” or “\u000a” character sequences instead. This usually requires an understanding of how new log entries begin and/or end, typically specified using a regular expression, to determine that either the next line starts a new log entry (using a beginning regular expression) or that a complete log entry has been read (using an ending regular expression). Tools that process multiline input based on a beginning regular expression should support some sort of read timeout so the last line of the log is not held in memory while the tool waits to observe the start of the next valid log entry.
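A minimal Python sketch of grouping by a beginning regular expression follows; the entry-start pattern is an assumed example matching the "[02:36:43]" timestamp prefix shown earlier, and no read timeout is implemented:

```python
import re

# Hypothetical entry-start pattern: a line beginning with "[HH:MM:SS]"
# starts a new log entry; anything else continues the previous entry.
ENTRY_START = re.compile(r"^\[\d{2}:\d{2}:\d{2}\]")

def group_entries(lines):
    """Group physical lines into logical log entries, encoding each
    internal newline as a literal backslash-n sequence."""
    entries = []
    for line in lines:
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += "\\n" + line
    return entries
```

A streaming implementation of the same idea would also need the read timeout mentioned above, so the final entry is flushed even when no further entry-start line arrives.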

References

  1. UTF-8, a transformation format of ISO 10646
    https://tools.ietf.org/html/rfc3629
  2. The JavaScript Object Notation (JSON) Data Interchange Format
    Section 8: String and Character Issues
    https://tools.ietf.org/html/rfc8259#section-8
  3. The Unicode Standard: Version 12.1.0 – Core Specification
    Chapter 3: Conformance, D92 UTF-8, Table 3-7. Well-Formed UTF-8 Byte Sequences
    https://www.unicode.org/versions/Unicode12.1.0/ch03.pdf#G27506
  4. tail(1) Manual Page
    http://man7.org/linux/man-pages/man1/tail.1.html

About the author

Uzi Cohen, CISSP, is a Consulting Architect at Broadcom with over 23 years of experience in Software Development and Security. Previously, Uzi was a team-lead, lead-engineer, and the software architect for SiteMinder. Currently, Uzi is working with customers to help them better utilize the Identity & Access Management portfolio products. In addition, Uzi is working with the development teams to further enhance these products.

Uzi has a Bachelor’s degree in Computer Science and holds patents in software design. He is an evangelist in the Containers and Kubernetes field and guides development teams on the enhancements of Broadcom's software solutions in that field.