Friday, June 11, 2021

Send Journald to CloudWatch Logs with Vector

Timber's Vector log collection tool is a nifty Swiss Army knife for collecting and shipping logs and metrics from one system to another. In particular, I think it's the best tool for shipping structured journald events to CloudWatch Logs.

Here's how to start using Vector to send journald log events to CloudWatch:

Grant Permissions to EC2 Roles

In order to push logs (or metrics) from your EC2 instances to CloudWatch, you first need to grant those EC2 instances some CloudWatch permissions. The permissions you need are basically the same as the AWS CloudWatch Agent needs, so just follow the Create IAM roles and users for use with the CloudWatch agent tutorial to assign the AWS-managed CloudWatchAgentServerPolicy to the IAM roles of the EC2 instances from which you plan on shipping journald logs.
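If you prefer the command line to the console, you can attach the managed policy with the AWS CLI; here's a sketch, assuming a hypothetical instance role named my-ec2-instance-role:

$ # my-ec2-instance-role is a placeholder; use your instances' actual role name
$ aws iam attach-role-policy \
    --role-name my-ec2-instance-role \
    --policy-arn arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy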

The current version of the CloudWatchAgentServerPolicy looks like this:

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "cloudwatch:PutMetricData", "ec2:DescribeVolumes", "ec2:DescribeTags", "logs:PutLogEvents", "logs:DescribeLogStreams", "logs:DescribeLogGroups", "logs:CreateLogStream", "logs:CreateLogGroup" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "ssm:GetParameter" ], "Resource": "arn:aws:ssm:*:*:parameter/AmazonCloudWatch-*" } ] }

With the Vector configuration described below, however, you actually only need to grant the logs:PutLogEvents, logs:DescribeLogStreams, logs:DescribeLogGroups, logs:CreateLogStream, and logs:CreateLogGroup permissions to your EC2 roles.
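If you want to grant only that minimal set, a trimmed-down custom policy might look like the following sketch (you could also narrow Resource from "*" down to specific log group ARNs):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:PutLogEvents",
                "logs:DescribeLogStreams",
                "logs:DescribeLogGroups",
                "logs:CreateLogStream",
                "logs:CreateLogGroup"
            ],
            "Resource": "*"
        }
    ]
}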

Install Vector

Installing Vector is easy on Linux. Timber maintains their own deb repo for Vector, so on a Debian-based distro like Ubuntu, you can just add the Vector signing key and repo to the system's APT sources, and then install the Vector package:

$ wget https://repositories.timber.io/public/vector/gpg.3543DB2D0A2BC4B8.key -O - | sudo apt-key add -
$ cat <<EOF | sudo tee /etc/apt/sources.list.d/timber-vector.list
deb https://repositories.timber.io/public/vector/deb/ubuntu focal main
deb-src https://repositories.timber.io/public/vector/deb/ubuntu focal main
EOF
$ sudo apt update
$ sudo apt install vector
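Once that completes, you can confirm the binary landed on your path (and check which version you got), and see that the packaged systemd service is in place:

$ vector --version
$ systemctl status vector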

Configure Vector

The default Vector config file, located at /etc/vector/vector.toml, just includes a sample source and sink, so you can replace it entirely with your own config settings. This is the minimum you need to ship journald logs to CloudWatch:

[sources.my_journald_source]
type = "journald"

[sinks.my_cloudwatch_sink]
type = "aws_cloudwatch_logs"
inputs = ["my_journald_source"]
compression = "gzip"
encoding.codec = "json"
region = "us-east-1"
group_name = "myenv"
stream_name = "mysite/myhost"

Replace the CloudWatch region, group_name, and stream_name settings above with whatever's appropriate for your EC2 instances.
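Before restarting the service, you can ask Vector to check your work; its validate subcommand parses the config file and verifies that the components in it are well-formed:

$ vector validate /etc/vector/vector.toml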

Restart Vector

In one terminal screen, watch for errors by tailing Vector's own log entries with the journalctl -u vector -f command, and in another terminal restart Vector with the sudo systemctl restart vector command. If everything works, this is what you'll see in Vector's own logs:

$ journalctl -u vector -f
Jun 11 19:54:02 myhost systemd[1]: Started Vector.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.008 INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=trace,rdkafka=info"
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.008 INFO vector::sources::host_metrics: PROCFS_ROOT is unset. Using default '/proc' for procfs root.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.008 INFO vector::sources::host_metrics: SYSFS_ROOT is unset. Using default '/sys' for sysfs root.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.010 INFO vector::app: Loading configs. path=[("/etc/vector/vector.toml", Some(Toml))]
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.060 INFO vector::topology: Running healthchecks.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.060 INFO vector::topology: Starting source. name="journald"
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.061 INFO vector::topology: Starting sink. name="aws_cloudwatch_logs"
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.061 INFO vector: Vector has started. version="0.14.0" arch="x86_64" build_id="5f3a319 2021-06-03"
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.062 INFO vector::app: API is disabled, enable by setting `api.enabled` to `true` and use commands like `vector top`.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.063 INFO journald-server: vector::sources::journald: Starting journalctl.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.128 INFO vector::sinks::aws_cloudwatch_logs: Skipping healthcheck log group check: `group_name` will be created if missing.
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.430 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs: Sending events. events=4
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.453 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Log group provided does not exist; creating a new one.
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.489 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Group created. name=myenv
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.507 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Stream created. name=mysite/myhost
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.523 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Putting logs. token=None
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.560 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Putting logs was successful. next_token=Some("49610241853835534178700884863462197886393926766970915618")

If something went wrong, Vector will output some error messages (these are especially helpful as you add transformation steps to your basic Vector configuration).
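To cut through the INFO-level chatter while you iterate, you can also ask journalctl to show only the more severe entries from the Vector unit:

$ # -p warning filters to entries of warning severity or worse
$ journalctl -u vector -p warning -f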

Check Your CloudWatch Logs

Vector will have also shipped some logs to CloudWatch, so check them now. If you use a command-line tool like Saw, you'll see some log events like this:

$ saw watch myenv --expand --prefix mysite
[2021-06-11T13:00:27-07:00] (myhost) {
    "PRIORITY": "6",
    "SYSLOG_FACILITY": "3",
    "SYSLOG_IDENTIFIER": "uwsgi",
    "_BOOT_ID": "6cb87d254d3742728b4fe20e746bcbe6",
    "_CAP_EFFECTIVE": "0",
    "_CMDLINE": "/usr/bin/uwsgi /etc/myapp/uwsgi.ini",
    "_COMM": "uwsgi",
    "_EXE": "/usr/bin/uwsgi-core",
    "_GID": "33",
    "_MACHINE_ID": "ec2aff1204bfae2781faf97e68afb1d4",
    "_PID": "363",
    "_SELINUX_CONTEXT": "unconfined\n",
    "_STREAM_ID": "aa261772c2e74663a7bb122c24b92e64",
    "_SYSTEMD_CGROUP": "/system.slice/myapp.service",
    "_SYSTEMD_INVOCATION_ID": "b5e117501bbb43428ab7565659022c20",
    "_SYSTEMD_SLICE": "system.slice",
    "_SYSTEMD_UNIT": "myapp.service",
    "_TRANSPORT": "stdout",
    "_UID": "33",
    "__MONOTONIC_TIMESTAMP": "511441719050",
    "__REALTIME_TIMESTAMP": "1623441627906124",
    "host": "myhost",
    "message": "[pid: 363|app: 0|req: 501/501] 203.0.113.2 () {34 vars in 377 bytes} [Fri Jun 11 20:00:27 2021] HEAD / => generated 0 bytes in 0 msecs (HTTP/1.1 200) 2 headers in 78 bytes (0 switches on core 0)",
    "source_type": "journald"
}

With Saw, use the saw watch command to tail log events as they come in, and use the saw get command to get historical events. For example, this command will print the last 10 minutes of events using the mysite log stream prefix from the myenv log group:

$ saw get myenv --expand --pretty --prefix mysite --start -10m
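If you don't have Saw installed, the built-in tail subcommand of AWS CLI v2 can approximate both of the Saw commands above (if less prettily):

$ # follow new events, like saw watch
$ aws logs tail myenv --log-stream-name-prefix mysite --follow
$ # print the last 10 minutes of events, like saw get
$ aws logs tail myenv --log-stream-name-prefix mysite --since 10m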

Filter and Remap Your Logs

With that working, you can tune your Vector configuration to filter out log events you don't care about, and remap certain log fields into a more useful format. Let's add two "transform" steps to our /etc/vector/vector.toml file between the Journald Source and the AWS CloudWatch Logs Sink: a Filter transform and a Remap transform:

[sources.my_journald_source]
type = "journald"

[transforms.my_journald_filter]
type = "filter"
inputs = ["my_journald_source"]
condition = '''
(includes(["0", "1", "2", "3", "4"], .PRIORITY) ||
    includes(["systemd", "uwsgi"], .SYSLOG_IDENTIFIER))
'''

[transforms.my_journald_remap]
type = "remap"
inputs = ["my_journald_filter"]
source = '''
.app = .SYSLOG_IDENTIFIER
.datetime = to_timestamp(round((to_int(.__REALTIME_TIMESTAMP) ?? 0) / 1000000 ?? 0))
.facility = to_syslog_facility(to_int(.SYSLOG_FACILITY) ?? 0) ?? ""
.severity = to_int(.PRIORITY) ?? 0
.level = to_syslog_level(.severity) ?? ""
'''

[sinks.my_cloudwatch_sink]
type = "aws_cloudwatch_logs"
inputs = ["my_journald_remap"]
compression = "gzip"
encoding.codec = "json"
region = "us-east-1"
group_name = "myenv"
stream_name = "mysite/myhost"

In the above pipeline, the my_journald_source step pipes to the my_journald_filter step, which pipes to the my_journald_remap step, which pipes to the my_cloudwatch_sink step (configured via the inputs setting of each receiving step). The condition VRL expression in the filter step drops entries unless the entry's PRIORITY field is less than 5 (aka "emerg", "alert", "crit", "err", and "warning"), or unless the entry's SYSLOG_IDENTIFIER field is "systemd" or "uwsgi". And the source VRL program in the remap step adds some additional conveniently-formatted fields (app, datetime, facility, severity, and level) to each log entry (the ?? operator in the source coerces "fallible" expressions to a default value when they would otherwise throw an error).
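If you want to try out VRL expressions like these interactively before wiring them into your pipeline, recent Vector releases also ship a VRL REPL:

$ vector vrl    # opens a REPL; try an expression like: to_int(.PRIORITY) ?? 0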

Now if you restart Vector and check your CloudWatch logs, you'll see fewer unimportant entries (the filter drops any entry less severe than "warning" unless it comes from systemd or uwsgi), plus the additional fields that we added:

$ saw watch myenv --expand --prefix mysite
[2021-06-11T13:00:27-07:00] (myhost) {
    "PRIORITY": "6",
    "SYSLOG_FACILITY": "3",
    "SYSLOG_IDENTIFIER": "uwsgi",
    "_BOOT_ID": "6cb87d254d3742728b4fe20e746bcbe6",
    "_CAP_EFFECTIVE": "0",
    "_CMDLINE": "/usr/bin/uwsgi /etc/myapp/uwsgi.ini",
    "_COMM": "uwsgi",
    "_EXE": "/usr/bin/uwsgi-core",
    "_GID": "33",
    "_MACHINE_ID": "ec2aff1204bfae2781faf97e68afb1d4",
    "_PID": "363",
    "_SELINUX_CONTEXT": "unconfined\n",
    "_STREAM_ID": "aa261772c2e74663a7bb122c24b92e64",
    "_SYSTEMD_CGROUP": "/system.slice/myapp.service",
    "_SYSTEMD_INVOCATION_ID": "b5e117501bbb43428ab7565659022c20",
    "_SYSTEMD_SLICE": "system.slice",
    "_SYSTEMD_UNIT": "myapp.service",
    "_TRANSPORT": "stdout",
    "_UID": "33",
    "__MONOTONIC_TIMESTAMP": "511441719050",
    "__REALTIME_TIMESTAMP": "1623441627906124",
    "app": "uwsgi",
    "datetime": "2021-06-11T20:00:27Z",
    "facility": "daemon",
    "host": "myhost",
    "level": "info",
    "message": "[pid: 363|app: 0|req: 501/501] 203.0.113.2 () {34 vars in 377 bytes} [Fri Jun 11 20:00:27 2021] HEAD / => generated 0 bytes in 0 msecs (HTTP/1.1 200) 2 headers in 78 bytes (0 switches on core 0)",
    "severity": 6,
    "source_type": "journald"
}

And we can use the new fields we added to further filter our output from Saw, as well as print compact log lines with jq:

$ saw watch myenv --raw --prefix mysite --filter '{ $.severity < 4 || $.app = "uwsgi" }' | jq --unbuffered -r '[.datetime, .level, .host, .app, .message] | join(" ")'
2021-06-11T20:00:27Z info myhost uwsgi [pid: 363|app: 0|req: 501/501] 203.0.113.2 () {34 vars in 377 bytes} [Fri Jun 11 20:00:27 2021] HEAD / => generated 0 bytes in 0 msecs (HTTP/1.1 200) 2 headers in 78 bytes (0 switches on core 0)

Remove Irrelevant Fields

You can also use Vector's remap transform to remove extraneous fields that you don't want to ship to and store in CloudWatch. You can use the del function to delete specific fields from each event — for example, to drop the journald fields that duplicate the custom fields we added:

source = '''
.app = .SYSLOG_IDENTIFIER
.datetime = to_timestamp(round((to_int(.__REALTIME_TIMESTAMP) ?? 0) / 1000000 ?? 0))
.facility = to_syslog_facility(to_int(.SYSLOG_FACILITY) ?? 0) ?? ""
.severity = to_int(.PRIORITY) ?? 0
.level = to_syslog_level(.severity) ?? ""
del(.PRIORITY)
del(.SYSLOG_IDENTIFIER)
del(.SYSLOG_FACILITY)
'''

Or you could replace the original event entirely with a new object that contains just your desired fields:

source = '''
e = {}
e.app = .SYSLOG_IDENTIFIER
e.cgroup = ._SYSTEMD_CGROUP
e.cmd = ._CMDLINE
e.facility = to_int(.SYSLOG_FACILITY) ?? 0
e.gid = to_int(._GID) ?? 0
e.host = .host
e.message = .message
e.monotime = to_int(.__MONOTONIC_TIMESTAMP) ?? 0
e.pid = to_int(._PID) ?? 0
e.realtime = to_int(.__REALTIME_TIMESTAMP) ?? 0
e.datetime = to_timestamp(round(e.realtime / 1000000 ?? 0))
e.severity = to_int(.PRIORITY) ?? 0
e.level = to_syslog_level(e.severity) ?? ""
e.uid = to_int(._UID) ?? 0
. = [e]
'''

If you change your Vector pipeline to remap events like the above and restart Vector, you'll now see log events with only the following fields shipped to CloudWatch:

$ saw watch myenv --expand --prefix mysite
[2021-06-11T13:00:27-07:00] (myhost) {
    "app": "uwsgi",
    "cgroup": "/system.slice/myapp.service",
    "cmd": "/usr/bin/uwsgi /etc/myapp/uwsgi.ini",
    "datetime": "2021-06-11T20:00:27Z",
    "facility": 3,
    "gid": 33,
    "host": "myhost",
    "level": "info",
    "message": "[pid: 363|app: 0|req: 501/501] 203.0.113.2 () {34 vars in 377 bytes} [Fri Jun 11 20:00:27 2021] HEAD / => generated 0 bytes in 0 msecs (HTTP/1.1 200) 2 headers in 78 bytes (0 switches on core 0)",
    "monotime": 511441719050,
    "pid": 363,
    "realtime": 1623441627906124,
    "severity": 6,
    "uid": 33
}


Edit 4/23/2022: As of Vector 0.21.1, the rounding shown in the to_timestamp examples is no longer fallible — but the to_timestamp function itself is. So the to_timestamp examples should now look like the following:

e.datetime = to_timestamp(round(e.realtime / 1000000)) ?? now()
