Friday, June 11, 2021

Send Journald to CloudWatch Logs with Vector

Timber's Vector log collection tool is a nifty Swiss Army knife for collecting and shipping logs and metrics from one system to another. In particular, I think it's the best tool for shipping structured journald events to CloudWatch Logs.

Here's how to start using Vector to send journald log events to CloudWatch:

Grant Permissions to EC2 Roles

In order to push logs (or metrics) from your EC2 instances to CloudWatch, you first need to grant those EC2 instances some CloudWatch permissions. The permissions you need are basically the same as the AWS CloudWatch Agent needs, so just follow the Create IAM roles and users for use with the CloudWatch agent tutorial to assign the AWS-managed CloudWatchAgentServerPolicy to the IAM roles of the EC2 instances from which you plan on shipping journald logs.

The current version of the CloudWatchAgentServerPolicy looks like this:

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "cloudwatch:PutMetricData", "ec2:DescribeVolumes", "ec2:DescribeTags", "logs:PutLogEvents", "logs:DescribeLogStreams", "logs:DescribeLogGroups", "logs:CreateLogStream", "logs:CreateLogGroup" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "ssm:GetParameter" ], "Resource": "arn:aws:ssm:*:*:parameter/AmazonCloudWatch-*" } ] }

With the Vector configuration described below, however, you actually only need to grant the logs:PutLogEvents, logs:DescribeLogStreams, logs:DescribeLogGroups, logs:CreateLogStream, and logs:CreateLogGroup permissions to your EC2 roles.

Install Vector

Installing Vector is easy on Linux. Timber maintains their own deb repo for Vector, so on a Debian-based distro like Ubuntu, you can just update the system's APT package manager with the Vector signing-key and repo, and install the Vector package:

$ wget https://repositories.timber.io/public/vector/gpg.3543DB2D0A2BC4B8.key -O - | sudo apt-key add - $ cat <<EOF | sudo tee /etc/apt/sources.list.d/timber-vector.list deb https://repositories.timber.io/public/vector/deb/ubuntu focal main deb-src https://repositories.timber.io/public/vector/deb/ubuntu focal main EOF $ sudo apt update $ sudo apt install vector

Configure Vector

The default Vector config file, located at /etc/vector/vector.toml, just includes a sample source and sink, so you can replace it entirely with your own config settings. This is the minimum you need to ship journald logs to CloudWatch:

[sources.my_journald_source] type = "journald" [sinks.my_cloudwatch_sink] type = "aws_cloudwatch_logs" inputs = ["my_journald_source"] compression = "gzip" encoding.codec = "json" region = "us-east-1" group_name = "myenv" stream_name = "mysite/myhost"

Replace the CloudWatch region, group_name, and stream_name settings above with whatever's appropriate for your EC2 instances.

Restart Vector

In one terminal screen, watch for errors by tailing Vector's own log entries with the journalctl -u vector -f command, and in another terminal restart Vector with the sudo systemctl restart vector command. If everything works, this is what you'll see in Vector's own logs:

$ journalctl -u vector -f Jun 11 19:54:02 myhost systemd[1]: Started Vector. Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.008 INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=trace,rdkafka=info" Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.008 INFO vector::sources::host_metrics: PROCFS_ROOT is unset. Using default '/proc' for procfs root. Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.008 INFO vector::sources::host_metrics: SYSFS_ROOT is unset. Using default '/sys' for sysfs root. Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.010 INFO vector::app: Loading configs. path=[("/etc/vector/vector.toml", Some(Toml))] Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.060 INFO vector::topology: Running healthchecks. Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.060 INFO vector::topology: Starting source. name="journald" Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.061 INFO vector::topology: Starting sink. name="aws_cloudwatch_logs" Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.061 INFO vector: Vector has started. version="0.14.0" arch="x86_64" build_id="5f3a319 2021-06-03" Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.062 INFO vector::app: API is disabled, enable by setting `api.enabled` to `true` and use commands like `vector top`. Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.063 INFO journald-server: vector::sources::journald: Starting journalctl. Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.128 INFO vector::sinks::aws_cloudwatch_logs: Skipping healthcheck log group check: `group_name` will be created if missing. Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.430 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs: Sending events. events=4 Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.453 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Log group provided does not exist; creating a new one. Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.489 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Group created. name=myenv Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.507 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Stream created. name=mysite/myhost Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.523 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Putting logs. token=None Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.560 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Putting logs was successful. next_token=Some("49610241853835534178700884863462197886393926766970915618")

If something went wrong, Vector will output some error messages (these are especially helpful as you add transformation steps to your basic Vector configuration).

Check Your CloudWatch Logs

Vector will have also shipped some logs to CloudWatch, so check them now. If you use a command-line tool like Saw, you'll see some log events like this:

$ saw watch myenv --expand --prefix mysite [2021-06-11T13:00:27-07:00] (myhost) { "PRIORITY": "6", "SYSLOG_FACILITY": "3", "SYSLOG_IDENTIFIER": "uwsgi", "_BOOT_ID": "6cb87d254d3742728b4fe20e746bcbe6", "_CAP_EFFECTIVE": "0", "_CMDLINE": "/usr/bin/uwsgi /etc/myapp/uwsgi.ini", "_COMM": "uwsgi", "_EXE": "/usr/bin/uwsgi-core", "_GID": "33", "_MACHINE_ID": "ec2aff1204bfae2781faf97e68afb1d4", "_PID": "363", "_SELINUX_CONTEXT": "unconfined\n", "_STREAM_ID": "aa261772c2e74663a7bb122c24b92e64", "_SYSTEMD_CGROUP": "/system.slice/myapp.service", "_SYSTEMD_INVOCATION_ID": "b5e117501bbb43428ab7565659022c20", "_SYSTEMD_SLICE": "system.slice", "_SYSTEMD_UNIT": "myapp.service", "_TRANSPORT": "stdout", "_UID": "33", "__MONOTONIC_TIMESTAMP": "511441719050", "__REALTIME_TIMESTAMP": "1623441627906124", "host": "myhost", "message": "[pid: 363|app: 0|req: 501/501] 203.0.113.2 () {34 vars in 377 bytes} [Fri Jun 11 20:00:27 2021] HEAD / =< generated 0 bytes in 0 msecs (HTTP/1.1 200) 2 headers in 78 bytes (0 switches on core 0)", "source_type": "journald" }

With Saw, use the saw watch command to tail log events as they come in, and use the saw get command to get historical events. For example, this command will print the last 10 minutes of events using the mysite log stream prefix from the myenv log group:

$ saw get myenv --expand --pretty --prefix mysite --start -10m

Filter and Remap Your Logs

With that working, you can tune your Vector configuration to filter out log events you don't care about, and remap certain log fields into a more useful format. Let's add two "transform" steps to our /etc/vector/vector.toml file between the Journald Source and the AWS CloudWatch Logs Sink: a Filter transform, and a Remap transform:

[sources.my_journald_source] type = "journald" [transforms.my_journald_filter] type = "filter" inputs = ["my_journald_source"] condition = ''' (includes(["0", "1", "2", "3", "4"], .PRIORITY) || includes(["systemd", "uwsgi"], .SYSLOG_IDENTIFIER)) ''' [transforms.my_journald_remap] type = "remap" inputs = ["my_journald_filter"] source = ''' .app = .SYSLOG_IDENTIFIER .datetime = to_timestamp(round((to_int(.__REALTIME_TIMESTAMP) ?? 0) / 1000000 ?? 0)) .facility = to_syslog_facility(to_int(.SYSLOG_FACILITY) ?? 0) ?? "" .severity = to_int(.PRIORITY) ?? 0 .level = to_syslog_level(.severity) ?? "" ''' [sinks.my_cloudwatch_sink] type = "aws_cloudwatch_logs" inputs = ["my_journald_filter"] compression = "gzip" encoding.codec = "json" region = "us-east-1" group_name = "myenv" stream_name = "mysite/myhost"

In the above pipeline, the my_journald_source step pipes to the my_journald_transform step, which pipes to the my_journald_transform step, which pipes to the my_cloudwatch_sink step (configured via the inputs setting of each receiving step). The condition VRL expression in the filter step drops entries unless the entry's PRIORITY field is less than 5 (aka "emerg", "alert", "crit", "err", and "warning"), or unless the entry's SYSLOG_IDENTITY field is "systemd" or "uwsgi". And the source VRL program in the remap step adds some additional conveniently-formatted fields (app datetime facility severity and level) to each log entry (the ?? operator in the source coerces "fallible" expressions to a default value when they would otherwise throw an error).

Now if you restart Vector and check your CloudWatch logs, you'll see fewer unimportant entries (those with lower priorities or uninteresting sources that we filtered), plus some additional fields that we added:

$ saw watch myenv --expand --prefix mysite [2021-06-11T13:00:27-07:00] (myhost) { "PRIORITY": "6", "SYSLOG_FACILITY": "3", "SYSLOG_IDENTIFIER": "uwsgi", "_BOOT_ID": "6cb87d254d3742728b4fe20e746bcbe6", "_CAP_EFFECTIVE": "0", "_CMDLINE": "/usr/bin/uwsgi /etc/myapp/uwsgi.ini", "_COMM": "uwsgi", "_EXE": "/usr/bin/uwsgi-core", "_GID": "33", "_MACHINE_ID": "ec2aff1204bfae2781faf97e68afb1d4", "_PID": "363", "_SELINUX_CONTEXT": "unconfined\n", "_STREAM_ID": "aa261772c2e74663a7bb122c24b92e64", "_SYSTEMD_CGROUP": "/system.slice/myapp.service", "_SYSTEMD_INVOCATION_ID": "b5e117501bbb43428ab7565659022c20", "_SYSTEMD_SLICE": "system.slice", "_SYSTEMD_UNIT": "myapp.service", "_TRANSPORT": "stdout", "_UID": "33", "__MONOTONIC_TIMESTAMP": "511441719050", "__REALTIME_TIMESTAMP": "1623441627906124", "app": "uwsgi", "datetime": "2021-06-11T20:00:27Z", "facility": "daemon", "host": "myhost", "level": "info", "message": "[pid: 363|app: 0|req: 501/501] 203.0.113.2 () {34 vars in 377 bytes} [Fri Jun 11 20:00:27 2021] HEAD / =< generated 0 bytes in 0 msecs (HTTP/1.1 200) 2 headers in 78 bytes (0 switches on core 0)", "severity": 6, "source_type": "journald" }

And we can use the new fields we added to further filter our output from Saw, as well as print compact log lines with jq:

$ saw watch myenv --raw --prefix mysite --filter '{ $.severity < 4 || $.app = "uwsgi" }' | jq --unbuffered -r '[.datetime, .level, .host, .app, .message] | join(" ")' 2021-06-11T20:00:27Z info myhost uwsgi [pid: 363|app: 0|req: 501/501] 203.0.113.2 () {34 vars in 377 bytes} [Fri Jun 11 20:00:27 2021] HEAD / =< generated 0 bytes in 0 msecs (HTTP/1.1 200) 2 headers in 78 bytes (0 switches on core 0)

Remove Irrelevant Fields

You can also use Vector's remap filter to remove extraneous fields that you don't want to ship to and store in CloudWatch. You can use the del function to delete specific fields from each event — for example, to skip the journald fields which duplicate the custom fields we added:

source = ''' .app = .SYSLOG_IDENTIFIER .datetime = to_timestamp(round((to_int(.__REALTIME_TIMESTAMP) ?? 0) / 1000000 ?? 0)) .facility = to_syslog_facility(to_int(.SYSLOG_FACILITY) ?? 0) ?? "" .severity = to_int(.PRIORITY) ?? 0 .level = to_syslog_level(.severity) ?? "" del(.PRIORITY) del(.SYSLOG_IDENTIFIER) del(.SYSLOG_FACILITY) '''

Or you could replace the original event entirely with a new object that contains just your desired fields:

source = ''' e = {} e.app = .SYSLOG_IDENTIFIER e.cgroup = ._SYSTEMD_CGROUP e.cmd = ._CMDLINE e.facility = to_int(.SYSLOG_FACILITY) ?? 0 e.gid = to_int(._GID) ?? 0 e.host = .host e.message = .message e.monotime = to_int(.__MONOTONIC_TIMESTAMP) ?? 0 e.pid = to_int(._PID) ?? 0 e.realtime = to_int(.__REALTIME_TIMESTAMP) ?? 0 e.datetime = to_timestamp(round(e.realtime / 1000000 ?? 0)) e.severity = to_int(.PRIORITY) ?? 0 e.level = to_syslog_level(e.severity) ?? "" e.uid = to_int(._UID) ?? 0 . = [e] '''

If you change your Vector pipeline to remap events like the above and restart it, you'll now see log events with only the following fields shipped to CloudWatch:

$ saw watch myenv --expand --prefix mysite [2021-06-11T13:00:27-07:00] (myhost) { "app": "uwsgi", "cgroup": "/system.slice/myapp.service", "cmd": "/usr/bin/uwsgi /etc/myapp/uwsgi.ini", "datetime": "2021-06-11T20:00:27Z", "facility": 3, "gid": 33, "host": "myhost", "level": "info", "message": "[pid: 363|app: 0|req: 501/501] 203.0.113.2 () {34 vars in 377 bytes} [Fri Jun 11 20:00:27 2021] HEAD / =< generated 0 bytes in 0 msecs (HTTP/1.1 200) 2 headers in 78 bytes (0 switches on core 0)", "monotime": 511441719050, "pid": 363, "realtime": 1623441627906124, "severity": 6, "uid": 33 }

Monday, April 19, 2021

Elixir AWS SDK

While AWS doesn't provide an SDK directly for Erlang or Elixir, the AWS for the BEAM project has built a nice solution for this — a code generator that uses the JSON API definitions from the official AWS Go SDK to create native Erlang and Elixir AWS SDK bindings. The result for Elixir is the nifty aws-elixir library.

The aws-elixir library itself doesn't have the automagic functionality from other AWS SDKs of being able to pull AWS credentials from various sources like environment variables, profile files, IAM roles for tasks or EC2, etc. However, the AWS for the BEAM project has another library you can use for that: aws_credentials. Here's how to use aws-elixir in combination with aws_credentials for a standard Mix project:

1. Add aws dependencies

First, add the aws, aws_credentials, and hackney libraries as dependencies to your mix.exs file:

# mix.exs defp deps do [ {:aws, "~> 0.7.0"}, {:aws_credentials, git: "https://github.com/aws-beam/aws_credentials", ref: "0.1.0"}, {:hackney, "~> 1.17"}, ] end

2. Set up AWS.Client struct

Next, set up aws-elixir's AWS.Client struct with the AWS credentials found by the :aws_credentials.get_credentials/0 function. In this example, I'm going to create a simple MyApp.AwsUtils module, with a client/0 function that I can call from anywhere else in my app to initialize the AWS.Client struct:

# lib/my_app/aws_utils.ex defmodule MyApp.AwsUtils do @doc """ Creates a new AWS.Client with default settings. """ @spec client() :: AWS.Client.t() def client, do: :aws_credentials.get_credentials() |> build_client() defp build_client(%{access_key_id: id, secret_access_key: key, token: "", region: region}) do AWS.Client.create(id, key, region) end defp build_client(%{access_key_id: id, secret_access_key: key, token: token, region: region}) do AWS.Client.create(id, key, token, region) end defp build_client(credentials), do: struct(AWS.Client, credentials) end

The aws_credentials library will handle caching for you, so you don't need to separately cache the credentials it returns — just call get_credentials/0 every time you need them. By default, it will first check for the standard AWS environment variables (AWS_ACCESS_KEY_ID etc), then for the standard credentials file (~/.aws/credentials), then for ECS task credentials, and then for credentials from the EC2 metadata service.

So the above example will work if on one system you configure the environment variables for your Elixir program like this:

# .env AWS_DEFAULT_REGION=us-east-1 AWS_ACCESS_KEY_ID=ABCDEFGHIJKLMNOPQRST AWS_SECRET_ACCESS_KEY=01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ/+a AWS_SESSION_TOKEN=

And on another system you configure the user account running your Elixir program with a ~/.aws/credentials file like this:

# ~/.aws/credentials [default] aws_access_key_id = ABCDEFGHIJKLMNOPQRST aws_secret_access_key = 01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ/+a

And when running the Elixir program in an ECS task or EC2 instance, it will automatically pick up the credentials configured for the ECS task or EC2 instance under which the program is running.

If you do use a credentials file, you can customize the path to the credentials file, or profile within the file, via the :provider_options configuration parameter, like so:

# config/config.exs config :aws_credentials, :provider_options, %{ credential_path: "/home/me/.aws/config", profile: "myprofile" }

Some caveats with the current aws_credentials implementation are:

  1. With environment variables, you can specify the region (via the AWS_DEFAULT_REGION or AWS_REGION variable) only if you also specify the session token (via the AWS_SESSION_TOKEN or AWS_SECURITY_TOKEN variable).
  2. With credential files, the region and aws_session_token settings won't be included.

3. Call AWS.* module functions

Now you can go ahead and call any AWS SDK function. In this example, I'm going to create a get_my_special_file/0 function to get the contents of a file from S3:

# lib/my_app/my_files.ex defmodule MyApp.MyFiles do @doc """ Gets the content of my special file from S3. """ @spec get_my_special_file() :: binary def get_my_special_file do client = MyApp.AwsUtils.client() bucket = "my-bucket" key = "my/special/file.txt" {:ok, %{"Body" => body}, %{status_code: 200}} = AWS.S3.get_object(client, bucket, key) body end

For any AWS SDK function, you can use the Hex docs to guide you as to the Elixir function signature, the Go docs for any structs not explained in the Hex docs, and the AWS docs for more details and examples. For example, here are the docs for the get_object function used above:

  1. Hex docs for AWS.S3.get_object/22
  2. Go docs for S3.GetObject
  3. AWS docs for S3 GetObject

The general response format form each aws-elixir SDK function is this:

# successful response { :ok, map_of_parsed_response_body_with_string_keys, %{body: body_binary, headers: list_of_string_header_tuples, status_code: integer} } # error response { :error, { :unexpected_response, %{body: body_binary, headers: list_of_string_header_tuples, status_code: integer} } }

With the AWS.S3.get_object/22 example above, a successful response will look like this:

iex> AWS.S3.get_object(MyApp.AwsUtils.client(), "my-bucket", "my/special/file.txt") {:ok, %{ "Body" => "my special file content\n", "ContentLength" => "24", "ContentType" => "text/plain", "ETag" => "\"00733c197e5877adf705a2ec6d881d44\"", "LastModified" => "Wed, 14 Apr 2021 19:05:34 GMT" }, %{ body: "my special file content\n", headers: [ {"x-amz-id-2", "ouJJOzsesw0m24Y6SCxtnDquPbo4rg0BwSORyMn3lOJ8PIeptboR8ozKgIwuPGRAtRPyRIPi6Dk="}, {"x-amz-request-id", "P9ZVDJ2L378Q3EGX"}, {"Date", "Wed, 14 Apr 2021 20:40:46 GMT"}, {"Last-Modified", "Wed, 14 Apr 2021 19:05:34 GMT"}, {"ETag", "\"00733c197e59877ad705a2ec6d881d44\""}, {"Accept-Ranges", "bytes"}, {"Content-Type", "text/plain"}, {"Content-Length", "24"}, {"Server", "AmazonS3"} ], status_code: 200 }}

And an error response will look like this:

iex> AWS.S3.get_object(MyApp.AwsUtils.client(), "my-bucket", "not/my/special/file.txt") {:error, {:unexpected_response, %{ body: "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>FJWGFYKL44AB4XZK</RequestId><HostId>G4mzxVPQdjFsHpErTWZhG7djVLks1Vu7RLLYS37XA38c6JsAaJs+QMp3bR3Vm9aKhoWBuS/Mk6Y=</HostId></Error>", headers: [ {"x-amz-request-id", "FJWGFYKL44AB4XZK"}, {"x-amz-id-2", "G4mzxVPQdjFsHpErTWZhG7djVLks1Vu7RLLYS37XA38c6JsAaJs+QMp3bR3Vm9aKhoWBuS/Mk6Y="}, {"Content-Type", "application/xml"}, {"Transfer-Encoding", "chunked"}, {"Date", "Wed, 14 Apr 2021 19:25:01 GMT"}, {"Server", "AmazonS3"} ], status_code: 403 }}}

Temporary aws-elixir workaround

The 0.7.0 version of the aws-elixir library has a few bugs that affect its S3 module. These bugs have already been fixed in the master branch, so if you want to use the S3 module, until its next release, just use the latest version of aws-elixir from the master branch on GitHub — this is the latest version as of 2021-04-16:

# mix.exs defp deps do [ {:aws, "~> 0.7.0"}, {:aws, git: "https://github.com/aws-beam/aws-elixir", ref: "52f67ba"}, {:aws_credentials, git: "https://github.com/aws-beam/aws_credentials", ref: "0.1.0"}, {:hackney, "~> 1.17"}, ] end

Temporary aws_credentials workaround

The 0.1.0 version of the aws_credentials library has bug where it fails to start if you don't explicitly set its :provider_options configuration parameter. Until its next release, the workaround is simply to set that config param to an empty map:

# config/config.exs config :aws_credentials, :provider_options, %{}

Friday, March 26, 2021

Elixir Systemd Logging

If you run an Elixir application as a Linux service with systemd, you'll probably find that logging works pretty well out of the box. By default, Elixir uses the Console logger backend, which sends all log messages to stdout. And with systemd services, by default all stdout messages are sent to journald.

This means you can view your application's logs easily via the journalctl command. For example, you can "tail" your app's logs with a command like this (if the systemd unit for the app was named my_app):

journalctl -u my_app -f

You can also configure systemd to send your app's stdout to a custom log file instead of journald, using the StandardOutput directive. You can add that directive to the [Service] section of a systemd unit file (for example, to log to a custom /var/log/my_app.log):

# /etc/systemd/system/my_app.service [Service] ExecStart=/srv/my_app/bin/my_app start ExecStop=/srv/my_app/bin/my_app stop StandardOutput=append:/var/log/my_app.log

Problems

If you collect and ship your log messages off to a centralized log service (like AWS CloudWatch, Google Cloud Logging, Azure Monitor, Splunk, Sumologic, Elasticsearch, Loggly, Datadog, New Relic, etc), you'll find two problems with this, however:

  1. Multi-line messages are broken up into a separate log entry for each line
  2. Log level/priority is lost

You can add some steps further down your logging pipeline to try to correct this, but the easiest way to fix it is at the source: Replace the default Console logger with the ExSyslogger backend.

Here's how you'd do that with a Phoenix web app:

1. Add the ex_syslogger dependency

First, add the ex_syslogger library as a dependency to your mix.exs file:

# mix.exs defp deps do [ {:ex_syslogger, "~> 1.5"} ] end

2. Register the ex_syslogger backend

Update the root config :logger options in your config/prod.exs file to register the ExSyslogger backend under the name :ex_syslogger:

# config/prod.exs # Do not print debug messages in production config :logger, level: :info config :logger, level: :info, backends: [{ExSyslogger, :ex_syslogger}]

Note that the :ex_syslogger name isn't special — you can call it whatever you want. It just has to match the name you use in the next section:

3. Configure the ex_syslogger backend

Now add config :logger, :ex_syslogger options to your config/config.exs file to configure the backend named :ex_syslogger that you registered above. I'd suggest just duplicating the configuration you already have for the default :console backend, plus setting the syslog APP-NAME field to your app's name via the ident option:

# config/config.exs # Configures Elixir's Logger config :logger, :console, format: "$time $metadata[$level] $message\n", metadata: [:request_id] config :logger, :ex_syslogger, format: "$time $metadata[$level] $message\n", metadata: [:request_id], ident: "my_app"

Result

Now when you compile your app with MIX_ENV=prod and run it as a systemd service, journald will automatically handle multi-line messages and log levels/priorities correctly. Furthermore, you can use any generic syslog collector to ship log entries to your log service as soon as they occur — with multi-line messages and log levels intact.

For example, when using the default Console logger, an error message from a Phoenix web app would have been displayed like this by journalctl:

$ journalctl -u my_app -f Mar 26 18:21:10 foo my_app[580361]: 18:21:10.337 request_id=Fm_3dFhPMtEHARkAAALy [info] Sent 500 in 16ms Mar 26 18:21:10 foo my_app[580361]: 18:21:10.345 [error] #PID<0.4149.0> running MyAppWeb.Endpoint (connection #PID<0.4148.0>, stream id 1) terminated Mar 26 18:21:10 foo my_app[580361]: Server: foo.example.com:443 (https) Mar 26 18:21:10 foo my_app[580361]: Request: GET /test/error Mar 26 18:21:10 foo my_app[580361]: ** (exit) an exception was raised: Mar 26 18:21:10 foo my_app[580361]: ** (RuntimeError) test runtime error Mar 26 18:21:10 foo my_app[580361]: (my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:9: MyAppWeb.TestController.error/2 Mar 26 18:21:10 foo my_app[580361]: (my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:1: MyAppWeb.TestController.action/2 Mar 26 18:21:10 foo my_app[580361]: (my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:1: MyAppWeb.TestController.phoenix_controller_pipeline/2 Mar 26 18:21:10 foo my_app[580361]: (phoenix 1.5.8) lib/phoenix/router.ex:352: Phoenix.Router.__call__/2 Mar 26 18:21:10 foo my_app[580361]: (my_app 0.1.0) lib/my_app_web/endpoint.ex:1: MyAppWeb.Endpoint.plug_builder_call/2 Mar 26 18:21:10 foo my_app[580361]: (my_app 0.1.0) lib/my_app_web/endpoint.ex:1: MyAppWeb.Endpoint.call/2 Mar 26 18:21:10 foo my_app[580361]: (phoenix 1.5.8) lib/phoenix/endpoint/cowboy2_handler.ex:65: Phoenix.Endpoint.Cowboy2Handler.init/4 Mar 26 18:21:10 foo my_app[580361]: (cowboy 2.8.0) /srv/my_app/deps/cowboy/src/cowboy_handler.erl:37: :cowboy_handler.execute/2

But with ExSyslogger in place, you'll now see this (where the full error message is captured as a single log entry, and is recognized as an error-level message):

$ journalctl -u my_app -f Mar 26 18:21:10 foo my_app[580361]: 18:21:10.337 request_id=Fm_3dFhPMtEHARkAAALy [info] Sent 500 in 16ms Mar 26 18:21:10 foo my_app[580361]: 18:21:10.345 [error] #PID<0.4149.0> running MyAppWeb.Endpoint (connection #PID<0.4148.0>, stream id 1) terminated Server: foo.example.com:443 (https) Request: GET /test/error ** (exit) an exception was raised: ** (RuntimeError) test runtime error (my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:9: MyAppWeb.TestController.error/2 (my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:1: MyAppWeb.TestController.action/2 (my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:1: MyAppWeb.TestController.phoenix_controller_pipeline/2 (phoenix 1.5.8) lib/phoenix/router.ex:352: Phoenix.Router.__call__/2 (my_app 0.1.0) lib/my_app_web/endpoint.ex:1: MyAppWeb.Endpoint.plug_builder_call/2 (my_app 0.1.0) lib/my_app_web/endpoint.ex:1: MyAppWeb.Endpoint.call/2 (phoenix 1.5.8) lib/phoenix/endpoint/cowboy2_handler.ex:65: Phoenix.Endpoint.Cowboy2Handler.init/4 (cowboy 2.8.0) /srv/my_app/deps/cowboy/src/cowboy_handler.erl:37: :cowboy_handler.execute/2

And as a side note, you can use journalctl to view just error-level messages and above via the --priority=err flag (-p3 for short):

journalctl -u my_app -p3

Monday, March 8, 2021

D3v6 Pan and Zoom

Since D3 version 3 it's been really easy to add panning and zooming to custom visualizations, allowing the user to scroll the SVG canvas vertically and horizontally by clicking and dragging the mouse cursor around the canvas, and to scale the canvas larger and smaller by spinning the mouse wheel.

Simplest way

For the simplest case, all you need is to apply the d3.zoom() behavior to your root svg element. This is how you do it with D3 version 6 (d3.v6.js):

<svg id="viz1" width="300" height="300" style="background:#ffc"> <circle cx="50%" cy="50%" r="25%" fill="#69c" /> </svg> <script> const svg = d3 .select('#viz1') .call(d3.zoom().on('zoom', ({ transform }) => svg.attr('transform', transform))) </script>

It'll work like the following:

Smoothest way

In most cases, however, you'll get smoother behavior by adding some group (<g>) elements to wrap your main visualization elements. If you're starting with a structure like the following, where you've got a .canvas group element containing the main content you want to pan and zoom:

<svg id="viz2" width="300" height="300"> <g class="canvas" transform="translate(150,150)"> <circle cx="0" cy="0" r="25%" fill="#69c" /> </g> </svg>

Do this: add one wrapper group element, .zoomed, around the original .canvas group; and a second group element, .bg, around .zoomed; and add a rect inside the .bg group:

<svg id="viz2" width="300" height="300"> <g class="bg"> <rect width="100%" height="100%" fill="#efc" /> <g class="zoomed"> <g class="canvas" transform="translate(150,150)"> <circle cx="0" cy="0" r="25%" fill="#69c" /> </g> </g> </g> </svg>

The rect inside the .bg group will ensure that the user's click-n-drag or mouse wheeling will be captured as long as the mouse pointer is anywhere inside the svg element (without this rect, the mouse would be captured only when the user positions the mouse over a graphical element drawn inside the .bg group — like the circle in this example). For this example, I've set the fill of the rect to a light green-yellow color; but usually you'd just set it to transparent.

Then attach the pan & zoom behavior to the .bg group — but apply the pan & zoom transform to the .zoomed group it contains. This will prevent stuttering when panning, since the .bg group will remain fixed; and it will avoid messing with any transforms or other fancy styling/positioning you already have on your inner .canvas group:

<script> const zoomed = d3.select('#viz2 .zoomed') const bg = d3 .select('#viz2 .bg') .call( d3 // base d3 pan & zoom behavior .zoom() // limit zoom to between 20% and 200% of original size .scaleExtent([0.2, 2]) // apply pan & zoom transform to 'zoomed' element .on('zoom', ({ transform }) => zoomed.attr('transform', transform)) // add 'grabbing' class to 'bg' element when panning; // add 'scaling' class to 'bg' element when zooming .on('start', ({ sourceEvent: { type } }) => { bg.classed(type === 'wheel' ? 'scaling' : 'grabbing', true) }) // remove 'grabbing' and 'scaling' classes when done panning & zooming .on('end', () => bg.classed('grabbing scaling', false)), ) </script>

Finally, set the mouse cursor via CSS when the user positions the pointer over the rect element. The grabbing and scaling classes will be added to the .bg group while the pan or zoom activity is ongoing, via the on('start') and on('end') hooks above:

<style lang="css"> .bg > rect { cursor: move; } .bg.grabbing > rect { cursor: grabbing; } .bg.scaling > rect { cursor: zoom-in; } </style>

When you put it all together, it will work like the following:

Thursday, January 14, 2021

Using an Ecto Readonly Replica Repo

Elixir Ecto has excellent documentation for how to use read-only replica databases, but because I'm so dense it took me a bit of trial and error to figure out where all the changes suggested by the documentation should go in my own app. Here's a concrete example of what I had to change for my conventional Mix + Phoenix application.

(The docs describe how to add N different replicas dynamically — MyApp.Repo.Replica1, MyApp.Repo.Replica2, etc — but since I only have to worry about a single endpoint for my read replicas, I simplified things and just used a single, static MyApp.Repo.Replica instance in my Elixir configuration and code.)

Mix Environment Helper

To allow my app to determine whether it was compiled and is running with a test, dev, or prod configuration, I added a config_env setting to my app, and set it to the value of the Mix.env/0 function at compile time:

# config/config.exs config :my_app, config_env: Mix.env(), ecto_repos: [MyApp.Repo] end

Note that with Elixir 1.11 and newer, you can instead use Config.config_env/0 in place of Mix.env/0:

# config/config.exs config :my_app, config_env: Config.config_env(), ecto_repos: [MyApp.Repo] end

And in my root MyApp module, I added a helper function to access this config_env setting:

# lib/my_app.ex defmodule MyApp do def config_env, do: Application.get_env(:my_app, :config_env) end

This means that I can call MyApp.config_env/0 at runtime in various places in my app's code, and get the Mix.env/0 value with which the app was compiled (like :test, :dev, or :prod).

Replica Module

To my existing lib/my_app/repo.ex file (which already contained the MyApp.Repo module), I added the definition for my new MyApp.Repo.Replica module, like so:

# lib/my_app/repo.ex defmodule MyApp.Repo do use Ecto.Repo, otp_app: :my_app, adapter: Ecto.Adapters.Postgres def replica, do: MyApp.Repo.Replica end defmodule MyApp.Repo.Replica do use Ecto.Repo, otp_app: :my_app, adapter: Ecto.Adapters.Postgres, default_dynamic_repo: if(MyApp.config_env() != :test, do: MyApp.Repo.Replica, else: MyApp.Repo), read_only: true end

The default_dynamic_repo option in the MyApp.Repo.Replica module uses the config_env helper I added above to set up the module to use the primary MyApp.Repo's own connection pool for the read replica in the test environment, as recommended by the Ecto docs. This way the replica instance will just delegate to the primary repo instance for all of its read operations in the test environment, but will still enforce its own read-only setting. Also, this way I don't have to configure any test-env-specific settings for the read replica in my config/test.exs file (nor do I need to start up another child process for the replica, as we'll see in the next section).

Application Module

In non-test environments, the new read replica module does need to be started as a child process, alongside the primary repo. So I modified the start/2 function in my application module to start it:

# lib/my_app/application.ex defmodule MyApp.Application do use Application def start(_type, _args) do # don't start separate readonly repo in test mode repos = if MyApp.config_env() != :test do [MyApp.Repo, MyApp.Repo.Replica] else [MyApp.Repo] end children = repos ++ [ MyApp.Repo, MyAppWeb.Endpoint ] opts = [strategy: :one_for_one, name: MyApp.Supervisor] Supervisor.start_link(children, opts) end end

Dev Config

For my dev environment configuration, I updated my config/dev.exs file to simply duplicate the configuration of the primary MyApp.Reop for the MyApp.Repo.Replica (creating a separate connection pool to the same database as the primary for the replica):

# config/dev.exs config :my_app, MyApp.Repo, for repo <- [MyApp.Repo, MyApp.Repo.Replica] do config :my_app, repo, username: "myusername", password: "mypassword", database: "mydatabase", hostname: "localhost", show_sensitive_data_on_connection_error: true, pool_size: 10 end

Prod Config

For the prod environment configuration, I updated my config/releases.exs file to use a similar configuration as the primary for the replica, but have it instead pull the replica hostname from a different environment variable (DB_READONLY in this case):

# config/releases.exs config :my_app, MyApp.Repo, ssl: true, username: System.get_env("DB_USERNAME"), password: System.get_env("DB_PASSWORD"), database: System.get_env("DB_DATABASE"), hostname: System.get_env("DB_HOSTNAME"), pool_size: String.to_integer(System.get_env("DB_POOLSIZE") || "10") config :my_app, MyApp.Repo.Replica, ssl: true, username: System.get_env("DB_USERNAME"), password: System.get_env("DB_PASSWORD"), database: System.get_env("DB_DATABASE"), hostname: System.get_env("DB_READONLY"), pool_size: String.to_integer(System.get_env("DB_POOLSIZE") || "10")

Using the Replica

With all the above in place, everywhere in my Elixir code that I want to query a read replica instead the primary database, I can just replace MyApp.Repo with MyApp.Repo.replica():

# lib/my_app/users.ex import Ecto.Query alias MyApp.Repo alias MyApp.Users.User def list_usernames do from(u in User, select: u.username) |> Repo.replica().all() end