Thursday, September 23, 2021

Sourcehut Docker Builds on Fedora

Building and running Docker images on builds.sr.ht works nicely with Alpine Linux VMs (example here from Drew DeVault). Tim Schumacher figured out a similar way to set it up with Arch Linux VMs (example here).

I couldn't find an example specifically for Fedora VMs, however. But with a little trial and error, it turns out what you need is pretty similar to Arch — this is what I ended up with:

# .build.yml
image: fedora/34
tasks:
  - install-docker: |
      curl -fsSL https://get.docker.com | sudo bash
      sudo mount -t tmpfs -o size=4G /dev/null /dev/shm
      until [ -e /dev/shm ]; do sleep 1; done
      sudo nohup dockerd --bip 172.18.0.1/16 </dev/null >/dev/null 2>&1 &
      sudo usermod -aG docker $(whoami)
      until sudo docker version >/dev/null 2>&1; do sleep 1; done
  - run-docker: |
      cat <<EOF >Dockerfile
      FROM alpine:latest
      RUN apk add htop
      CMD ["htop"]
      EOF
      docker build .

In the install-docker task, the first line installs the latest version of Docker. The second line sets up the shared-memory mount that Docker requires, and the third line waits until the mount is ready. The fourth line runs the Docker daemon as a background job, and the sixth line waits until the Docker daemon is fully up and initialized.

The fifth line (the usermod command) makes the current user a member of the docker group, so the current user can run Docker commands directly (without sudo). It doesn't take effect within the install-docker task, however — so within the install-docker task, you still have to use sudo to run Docker; but in following tasks (like run-docker), it is in effect — so the example docker build . can be run without sudo.
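The until loops in the install-docker task implement a simple poll-until-ready pattern. Here's the same idea as a generic sketch with a bounded timeout (the wait_for helper is my own invention, not part of the build manifest):

```shell
# Poll a command until it succeeds, giving up after a timeout in seconds.
wait_for() {
  timeout="$1"; shift
  while ! "$@" >/dev/null 2>&1; do
    timeout=$((timeout - 1))
    if [ "$timeout" -le 0 ]; then
      return 1
    fi
    sleep 1
  done
}

# e.g. in the install-docker task: wait_for 30 sudo docker version
wait_for 30 true && echo "ready"
```

Bounding the loop means the build fails fast with a clear exit code if the daemon never comes up, rather than hanging until the build VM times out.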

Friday, June 11, 2021

Send Journald to CloudWatch Logs with Vector

Timber's Vector log collection tool is a nifty Swiss Army knife for collecting and shipping logs and metrics from one system to another. In particular, I think it's the best tool for shipping structured journald events to CloudWatch Logs.

Here's how to start using Vector to send journald log events to CloudWatch:

Grant Permissions to EC2 Roles

In order to push logs (or metrics) from your EC2 instances to CloudWatch, you first need to grant those EC2 instances some CloudWatch permissions. The permissions you need are basically the same as the AWS CloudWatch Agent needs, so just follow the Create IAM roles and users for use with the CloudWatch agent tutorial to assign the AWS-managed CloudWatchAgentServerPolicy to the IAM roles of the EC2 instances from which you plan on shipping journald logs.

The current version of the CloudWatchAgentServerPolicy looks like this:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData",
                "ec2:DescribeVolumes",
                "ec2:DescribeTags",
                "logs:PutLogEvents",
                "logs:DescribeLogStreams",
                "logs:DescribeLogGroups",
                "logs:CreateLogStream",
                "logs:CreateLogGroup"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssm:GetParameter"
            ],
            "Resource": "arn:aws:ssm:*:*:parameter/AmazonCloudWatch-*"
        }
    ]
}

With the Vector configuration described below, however, you actually only need to grant the logs:PutLogEvents, logs:DescribeLogStreams, logs:DescribeLogGroups, logs:CreateLogStream, and logs:CreateLogGroup permissions to your EC2 roles.
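For reference, a minimal policy granting just those five actions might look like this (my own trimmed-down sketch of the managed policy above, not an AWS-published document):

```json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "logs:PutLogEvents",
                "logs:DescribeLogStreams",
                "logs:DescribeLogGroups",
                "logs:CreateLogStream",
                "logs:CreateLogGroup"
            ],
            "Resource": "*"
        }
    ]
}
```

You could likely restrict the Resource further to the specific log-group ARNs you plan to write to, though some of the Describe actions may require broader scope.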

Install Vector

Installing Vector is easy on Linux. Timber maintains their own deb repo for Vector, so on a Debian-based distro like Ubuntu, you can just add Vector's signing key and package repo to the system's APT configuration, and then install the Vector package:

$ wget https://repositories.timber.io/public/vector/gpg.3543DB2D0A2BC4B8.key -O - | sudo apt-key add -
$ cat <<EOF | sudo tee /etc/apt/sources.list.d/timber-vector.list
deb https://repositories.timber.io/public/vector/deb/ubuntu focal main
deb-src https://repositories.timber.io/public/vector/deb/ubuntu focal main
EOF
$ sudo apt update
$ sudo apt install vector

Configure Vector

The default Vector config file, located at /etc/vector/vector.toml, just includes a sample source and sink, so you can replace it entirely with your own config settings. This is the minimum you need to ship journald logs to CloudWatch:

[sources.my_journald_source]
type = "journald"

[sinks.my_cloudwatch_sink]
type = "aws_cloudwatch_logs"
inputs = ["my_journald_source"]
compression = "gzip"
encoding.codec = "json"
region = "us-east-1"
group_name = "myenv"
stream_name = "mysite/myhost"

Replace the CloudWatch region, group_name, and stream_name settings above with whatever's appropriate for your EC2 instances.
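One option worth knowing about (assuming a reasonably recent Vector release): the group_name and stream_name settings accept Vector's template syntax, so you can interpolate event fields instead of hardcoding per-host values. For example, since the journald source adds a host field to each event, something like this should give each instance its own stream without per-host configuration:

```toml
[sinks.my_cloudwatch_sink]
type = "aws_cloudwatch_logs"
inputs = ["my_journald_source"]
compression = "gzip"
encoding.codec = "json"
region = "us-east-1"
group_name = "myenv"
# each instance writes to a stream named after the event's host field
stream_name = "mysite/{{ host }}"
```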

Restart Vector

In one terminal screen, watch for errors by tailing Vector's own log entries with the journalctl -u vector -f command, and in another terminal restart Vector with the sudo systemctl restart vector command. If everything works, this is what you'll see in Vector's own logs:

$ journalctl -u vector -f
Jun 11 19:54:02 myhost systemd[1]: Started Vector.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.008 INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=trace,rdkafka=info"
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.008 INFO vector::sources::host_metrics: PROCFS_ROOT is unset. Using default '/proc' for procfs root.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.008 INFO vector::sources::host_metrics: SYSFS_ROOT is unset. Using default '/sys' for sysfs root.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.010 INFO vector::app: Loading configs. path=[("/etc/vector/vector.toml", Some(Toml))]
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.060 INFO vector::topology: Running healthchecks.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.060 INFO vector::topology: Starting source. name="journald"
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.061 INFO vector::topology: Starting sink. name="aws_cloudwatch_logs"
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.061 INFO vector: Vector has started. version="0.14.0" arch="x86_64" build_id="5f3a319 2021-06-03"
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.062 INFO vector::app: API is disabled, enable by setting `api.enabled` to `true` and use commands like `vector top`.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.063 INFO journald-server: vector::sources::journald: Starting journalctl.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.128 INFO vector::sinks::aws_cloudwatch_logs: Skipping healthcheck log group check: `group_name` will be created if missing.
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.430 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs: Sending events. events=4
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.453 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Log group provided does not exist; creating a new one.
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.489 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Group created. name=myenv
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.507 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Stream created. name=mysite/myhost
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.523 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Putting logs. token=None
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.560 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Putting logs was successful. next_token=Some("49610241853835534178700884863462197886393926766970915618")

If something went wrong, Vector will output some error messages (these are especially helpful as you add transformation steps to your basic Vector configuration).

Check Your CloudWatch Logs

Vector will have also shipped some logs to CloudWatch, so check them now. If you use a command-line tool like Saw, you'll see some log events like this:

$ saw watch myenv --expand --prefix mysite
[2021-06-11T13:00:27-07:00] (myhost) {
    "PRIORITY": "6",
    "SYSLOG_FACILITY": "3",
    "SYSLOG_IDENTIFIER": "uwsgi",
    "_BOOT_ID": "6cb87d254d3742728b4fe20e746bcbe6",
    "_CAP_EFFECTIVE": "0",
    "_CMDLINE": "/usr/bin/uwsgi /etc/myapp/uwsgi.ini",
    "_COMM": "uwsgi",
    "_EXE": "/usr/bin/uwsgi-core",
    "_GID": "33",
    "_MACHINE_ID": "ec2aff1204bfae2781faf97e68afb1d4",
    "_PID": "363",
    "_SELINUX_CONTEXT": "unconfined\n",
    "_STREAM_ID": "aa261772c2e74663a7bb122c24b92e64",
    "_SYSTEMD_CGROUP": "/system.slice/myapp.service",
    "_SYSTEMD_INVOCATION_ID": "b5e117501bbb43428ab7565659022c20",
    "_SYSTEMD_SLICE": "system.slice",
    "_SYSTEMD_UNIT": "myapp.service",
    "_TRANSPORT": "stdout",
    "_UID": "33",
    "__MONOTONIC_TIMESTAMP": "511441719050",
    "__REALTIME_TIMESTAMP": "1623441627906124",
    "host": "myhost",
    "message": "[pid: 363|app: 0|req: 501/501] 203.0.113.2 () {34 vars in 377 bytes} [Fri Jun 11 20:00:27 2021] HEAD / => generated 0 bytes in 0 msecs (HTTP/1.1 200) 2 headers in 78 bytes (0 switches on core 0)",
    "source_type": "journald"
}

With Saw, use the saw watch command to tail log events as they come in, and use the saw get command to get historical events. For example, this command will print the last 10 minutes of events using the mysite log stream prefix from the myenv log group:

$ saw get myenv --expand --pretty --prefix mysite --start -10m

Filter and Remap Your Logs

With that working, you can tune your Vector configuration to filter out log events you don't care about, and remap certain log fields into a more useful format. Let's add two "transform" steps to our /etc/vector/vector.toml file between the Journald Source and the AWS CloudWatch Logs Sink: a Filter transform, and a Remap transform:

[sources.my_journald_source]
type = "journald"

[transforms.my_journald_filter]
type = "filter"
inputs = ["my_journald_source"]
condition = '''
(includes(["0", "1", "2", "3", "4"], .PRIORITY) ||
 includes(["systemd", "uwsgi"], .SYSLOG_IDENTIFIER))
'''

[transforms.my_journald_remap]
type = "remap"
inputs = ["my_journald_filter"]
source = '''
.app = .SYSLOG_IDENTIFIER
.datetime = to_timestamp(round((to_int(.__REALTIME_TIMESTAMP) ?? 0) / 1000000 ?? 0))
.facility = to_syslog_facility(to_int(.SYSLOG_FACILITY) ?? 0) ?? ""
.severity = to_int(.PRIORITY) ?? 0
.level = to_syslog_level(.severity) ?? ""
'''

[sinks.my_cloudwatch_sink]
type = "aws_cloudwatch_logs"
inputs = ["my_journald_remap"]
compression = "gzip"
encoding.codec = "json"
region = "us-east-1"
group_name = "myenv"
stream_name = "mysite/myhost"

In the above pipeline, the my_journald_source step pipes to the my_journald_filter step, which pipes to the my_journald_remap step, which pipes to the my_cloudwatch_sink step (configured via the inputs setting of each receiving step). The condition VRL expression in the filter step drops entries unless the entry's PRIORITY field is less than 5 (aka "emerg", "alert", "crit", "err", and "warning"), or unless the entry's SYSLOG_IDENTIFIER field is "systemd" or "uwsgi". And the source VRL program in the remap step adds some additional conveniently-formatted fields (app, datetime, facility, severity, and level) to each log entry (the ?? operator in the source coerces "fallible" expressions to a default value when they would otherwise throw an error).

Now if you restart Vector and check your CloudWatch logs, you'll see fewer unimportant entries (those with lower priorities or uninteresting sources that we filtered), plus some additional fields that we added:

$ saw watch myenv --expand --prefix mysite
[2021-06-11T13:00:27-07:00] (myhost) {
    "PRIORITY": "6",
    "SYSLOG_FACILITY": "3",
    "SYSLOG_IDENTIFIER": "uwsgi",
    "_BOOT_ID": "6cb87d254d3742728b4fe20e746bcbe6",
    "_CAP_EFFECTIVE": "0",
    "_CMDLINE": "/usr/bin/uwsgi /etc/myapp/uwsgi.ini",
    "_COMM": "uwsgi",
    "_EXE": "/usr/bin/uwsgi-core",
    "_GID": "33",
    "_MACHINE_ID": "ec2aff1204bfae2781faf97e68afb1d4",
    "_PID": "363",
    "_SELINUX_CONTEXT": "unconfined\n",
    "_STREAM_ID": "aa261772c2e74663a7bb122c24b92e64",
    "_SYSTEMD_CGROUP": "/system.slice/myapp.service",
    "_SYSTEMD_INVOCATION_ID": "b5e117501bbb43428ab7565659022c20",
    "_SYSTEMD_SLICE": "system.slice",
    "_SYSTEMD_UNIT": "myapp.service",
    "_TRANSPORT": "stdout",
    "_UID": "33",
    "__MONOTONIC_TIMESTAMP": "511441719050",
    "__REALTIME_TIMESTAMP": "1623441627906124",
    "app": "uwsgi",
    "datetime": "2021-06-11T20:00:27Z",
    "facility": "daemon",
    "host": "myhost",
    "level": "info",
    "message": "[pid: 363|app: 0|req: 501/501] 203.0.113.2 () {34 vars in 377 bytes} [Fri Jun 11 20:00:27 2021] HEAD / => generated 0 bytes in 0 msecs (HTTP/1.1 200) 2 headers in 78 bytes (0 switches on core 0)",
    "severity": 6,
    "source_type": "journald"
}

And we can use the new fields we added to further filter our output from Saw, as well as print compact log lines with jq:

$ saw watch myenv --raw --prefix mysite --filter '{ $.severity < 4 || $.app = "uwsgi" }' |
    jq --unbuffered -r '[.datetime, .level, .host, .app, .message] | join(" ")'
2021-06-11T20:00:27Z info myhost uwsgi [pid: 363|app: 0|req: 501/501] 203.0.113.2 () {34 vars in 377 bytes} [Fri Jun 11 20:00:27 2021] HEAD / => generated 0 bytes in 0 msecs (HTTP/1.1 200) 2 headers in 78 bytes (0 switches on core 0)

Remove Irrelevant Fields

You can also use Vector's remap transform to remove extraneous fields that you don't want to ship to and store in CloudWatch. You can use the del function to delete specific fields from each event — for example, to skip the journald fields which duplicate the custom fields we added:

source = '''
.app = .SYSLOG_IDENTIFIER
.datetime = to_timestamp(round((to_int(.__REALTIME_TIMESTAMP) ?? 0) / 1000000 ?? 0))
.facility = to_syslog_facility(to_int(.SYSLOG_FACILITY) ?? 0) ?? ""
.severity = to_int(.PRIORITY) ?? 0
.level = to_syslog_level(.severity) ?? ""
del(.PRIORITY)
del(.SYSLOG_IDENTIFIER)
del(.SYSLOG_FACILITY)
'''

Or you could replace the original event entirely with a new object that contains just your desired fields:

source = '''
e = {}
e.app = .SYSLOG_IDENTIFIER
e.cgroup = ._SYSTEMD_CGROUP
e.cmd = ._CMDLINE
e.facility = to_int(.SYSLOG_FACILITY) ?? 0
e.gid = to_int(._GID) ?? 0
e.host = .host
e.message = .message
e.monotime = to_int(.__MONOTONIC_TIMESTAMP) ?? 0
e.pid = to_int(._PID) ?? 0
e.realtime = to_int(.__REALTIME_TIMESTAMP) ?? 0
e.datetime = to_timestamp(round(e.realtime / 1000000 ?? 0))
e.severity = to_int(.PRIORITY) ?? 0
e.level = to_syslog_level(e.severity) ?? ""
e.uid = to_int(._UID) ?? 0
. = [e]
'''

If you change your Vector pipeline to remap events like the above and restart it, you'll now see log events with only the following fields shipped to CloudWatch:

$ saw watch myenv --expand --prefix mysite
[2021-06-11T13:00:27-07:00] (myhost) {
    "app": "uwsgi",
    "cgroup": "/system.slice/myapp.service",
    "cmd": "/usr/bin/uwsgi /etc/myapp/uwsgi.ini",
    "datetime": "2021-06-11T20:00:27Z",
    "facility": 3,
    "gid": 33,
    "host": "myhost",
    "level": "info",
    "message": "[pid: 363|app: 0|req: 501/501] 203.0.113.2 () {34 vars in 377 bytes} [Fri Jun 11 20:00:27 2021] HEAD / => generated 0 bytes in 0 msecs (HTTP/1.1 200) 2 headers in 78 bytes (0 switches on core 0)",
    "monotime": 511441719050,
    "pid": 363,
    "realtime": 1623441627906124,
    "severity": 6,
    "uid": 33
}


Edit 4/23/2022: As of Vector 0.21.1, the rounding shown in the to_timestamp examples is no longer fallible — but the to_timestamp function itself is. So the to_timestamp examples should now look like the following:

e.datetime = to_timestamp(round(e.realtime / 1000000)) ?? now()

Monday, April 19, 2021

Elixir AWS SDK

While AWS doesn't provide an SDK directly for Erlang or Elixir, the AWS for the BEAM project has built a nice solution for this — a code generator that uses the JSON API definitions from the official AWS Go SDK to create native Erlang and Elixir AWS SDK bindings. The result for Elixir is the nifty aws-elixir library.

The aws-elixir library itself doesn't have the automagic functionality from other AWS SDKs of being able to pull AWS credentials from various sources like environment variables, profile files, IAM roles for tasks or EC2, etc. However, the AWS for the BEAM project has another library you can use for that: aws_credentials. Here's how to use aws-elixir in combination with aws_credentials for a standard Mix project:

1. Add aws dependencies

First, add the aws, aws_credentials, and hackney libraries as dependencies to your mix.exs file:

# mix.exs
defp deps do
  [
    {:aws, "~> 0.8.0"},
    {:aws_credentials,
     git: "https://github.com/aws-beam/aws_credentials", ref: "0.1.1"},
    {:hackney, "~> 1.17"}
  ]
end

2. Set up AWS.Client struct

Next, set up aws-elixir's AWS.Client struct with the AWS credentials found by the :aws_credentials.get_credentials/0 function. In this example, I'm going to create a simple MyApp.AwsUtils module, with a client/0 function that I can call from anywhere else in my app to initialize the AWS.Client struct:

# lib/my_app/aws_utils.ex
defmodule MyApp.AwsUtils do
  @doc """
  Creates a new AWS.Client with default settings.
  """
  @spec client() :: AWS.Client.t()
  def client, do: :aws_credentials.get_credentials() |> build_client()

  defp build_client(%{access_key_id: id, secret_access_key: key, token: "", region: region}) do
    AWS.Client.create(id, key, region)
  end

  defp build_client(%{access_key_id: id, secret_access_key: key, token: token, region: region}) do
    AWS.Client.create(id, key, token, region)
  end

  defp build_client(credentials), do: struct(AWS.Client, credentials)
end

The aws_credentials library will handle caching for you, so you don't need to separately cache the credentials it returns — just call get_credentials/0 every time you need them. By default, it will first check for the standard AWS environment variables (AWS_ACCESS_KEY_ID etc), then for the standard credentials file (~/.aws/credentials), then for ECS task credentials, and then for credentials from the EC2 metadata service.

So the above example will work if on one system you configure the environment variables for your Elixir program like this:

# .env
AWS_DEFAULT_REGION=us-east-1
AWS_ACCESS_KEY_ID=ABCDEFGHIJKLMNOPQRST
AWS_SECRET_ACCESS_KEY=01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ/+a
AWS_SESSION_TOKEN=

And on another system you configure the user account running your Elixir program with a ~/.aws/credentials file like this:

# ~/.aws/credentials
[default]
aws_access_key_id = ABCDEFGHIJKLMNOPQRST
aws_secret_access_key = 01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ/+a

And when running the Elixir program in an ECS task or EC2 instance, it will automatically pick up the credentials configured for the ECS task or EC2 instance under which the program is running.

If you do use a credentials file, you can customize the path to the credentials file, or profile within the file, via the :provider_options configuration parameter, like so:

# config/config.exs
config :aws_credentials, :provider_options, %{
  credential_path: "/home/me/.aws/config",
  profile: "myprofile"
}

Some caveats with the current aws_credentials implementation are:

  1. With environment variables, you can specify the region (via the AWS_DEFAULT_REGION or AWS_REGION variable) only if you also specify the session token (via the AWS_SESSION_TOKEN or AWS_SECURITY_TOKEN variable).
  2. With credential files, the region and aws_session_token settings won't be included.

3. Call AWS.* module functions

Now you can go ahead and call any AWS SDK function. In this example, I'm going to create a get_my_special_file/0 function to get the contents of a file from S3:

# lib/my_app/my_files.ex
defmodule MyApp.MyFiles do
  @doc """
  Gets the content of my special file from S3.
  """
  @spec get_my_special_file() :: binary
  def get_my_special_file do
    client = MyApp.AwsUtils.client()
    bucket = "my-bucket"
    key = "my/special/file.txt"
    {:ok, %{"Body" => body}, %{status_code: 200}} = AWS.S3.get_object(client, bucket, key)
    body
  end
end

For any AWS SDK function, you can use the Hex docs to guide you as to the Elixir function signature, the Go docs for any structs not explained in the Hex docs, and the AWS docs for more details and examples. For example, here are the docs for the get_object function used above:

  1. Hex docs for AWS.S3.get_object/22
  2. Go docs for S3.GetObject
  3. AWS docs for S3 GetObject

The general response format from each aws-elixir SDK function is this:

# successful response
{
  :ok,
  map_of_parsed_response_body_with_string_keys,
  %{body: body_binary, headers: list_of_string_header_tuples, status_code: integer}
}

# error response
{
  :error,
  {
    :unexpected_response,
    %{body: body_binary, headers: list_of_string_header_tuples, status_code: integer}
  }
}

With the AWS.S3.get_object/22 example above, a successful response will look like this:

iex> AWS.S3.get_object(MyApp.AwsUtils.client(), "my-bucket", "my/special/file.txt")
{:ok,
 %{
   "Body" => "my special file content\n",
   "ContentLength" => "24",
   "ContentType" => "text/plain",
   "ETag" => "\"00733c197e5877adf705a2ec6d881d44\"",
   "LastModified" => "Wed, 14 Apr 2021 19:05:34 GMT"
 },
 %{
   body: "my special file content\n",
   headers: [
     {"x-amz-id-2", "ouJJOzsesw0m24Y6SCxtnDquPbo4rg0BwSORyMn3lOJ8PIeptboR8ozKgIwuPGRAtRPyRIPi6Dk="},
     {"x-amz-request-id", "P9ZVDJ2L378Q3EGX"},
     {"Date", "Wed, 14 Apr 2021 20:40:46 GMT"},
     {"Last-Modified", "Wed, 14 Apr 2021 19:05:34 GMT"},
     {"ETag", "\"00733c197e59877ad705a2ec6d881d44\""},
     {"Accept-Ranges", "bytes"},
     {"Content-Type", "text/plain"},
     {"Content-Length", "24"},
     {"Server", "AmazonS3"}
   ],
   status_code: 200
 }}

And an error response will look like this:

iex> AWS.S3.get_object(MyApp.AwsUtils.client(), "my-bucket", "not/my/special/file.txt")
{:error,
 {:unexpected_response,
  %{
    body: "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>FJWGFYKL44AB4XZK</RequestId><HostId>G4mzxVPQdjFsHpErTWZhG7djVLks1Vu7RLLYS37XA38c6JsAaJs+QMp3bR3Vm9aKhoWBuS/Mk6Y=</HostId></Error>",
    headers: [
      {"x-amz-request-id", "FJWGFYKL44AB4XZK"},
      {"x-amz-id-2", "G4mzxVPQdjFsHpErTWZhG7djVLks1Vu7RLLYS37XA38c6JsAaJs+QMp3bR3Vm9aKhoWBuS/Mk6Y="},
      {"Content-Type", "application/xml"},
      {"Transfer-Encoding", "chunked"},
      {"Date", "Wed, 14 Apr 2021 19:25:01 GMT"},
      {"Server", "AmazonS3"}
    ],
    status_code: 403
  }}}

Friday, March 26, 2021

Elixir Systemd Logging

If you run an Elixir application as a Linux service with systemd, you'll probably find that logging works pretty well out of the box. By default, Elixir uses the Console logger backend, which sends all log messages to stdout. And with systemd services, by default all stdout messages are sent to journald.

This means you can view your application's logs easily via the journalctl command. For example, you can "tail" your app's logs with a command like this (if the systemd unit for the app was named my_app):

journalctl -u my_app -f

You can also configure systemd to send your app's stdout to a custom log file instead of journald, using the StandardOutput directive. You can add that directive to the [Service] section of a systemd unit file (for example, to log to a custom /var/log/my_app.log):

# /etc/systemd/system/my_app.service
[Service]
ExecStart=/srv/my_app/bin/my_app start
ExecStop=/srv/my_app/bin/my_app stop
StandardOutput=append:/var/log/my_app.log

Problems

If you collect and ship your log messages off to a centralized log service (like AWS CloudWatch, Google Cloud Logging, Azure Monitor, Splunk, Sumologic, Elasticsearch, Loggly, Datadog, New Relic, etc), you'll find two problems with this, however:

  1. Multi-line messages are broken up into a separate log entry for each line
  2. Log level/priority is lost

You can add some steps further down your logging pipeline to try to correct this, but the easiest way to fix it is at the source: Replace the default Console logger with the ExSyslogger backend.

Here's how you'd do that with a Phoenix web app:

1. Add the ex_syslogger dependency

First, add the ex_syslogger library as a dependency to your mix.exs file:

# mix.exs
defp deps do
  [
    {:ex_syslogger, "~> 1.5"}
  ]
end

2. Register the ex_syslogger backend

Update the root config :logger options in your config/prod.exs file to register the ExSyslogger backend under the name :ex_syslogger:

# config/prod.exs

# Do not print debug messages in production
config :logger,
  level: :info,
  backends: [{ExSyslogger, :ex_syslogger}]

Note that the :ex_syslogger name isn't special — you can call it whatever you want. It just has to match the name you use in the next section.

3. Configure the ex_syslogger backend

Now add config :logger, :ex_syslogger options to your config/config.exs file to configure the backend named :ex_syslogger that you registered above. I'd suggest just duplicating the configuration you already have for the default :console backend, plus setting the syslog APP-NAME field to your app's name via the ident option:

# config/config.exs

# Configures Elixir's Logger
config :logger, :console,
  format: "$time $metadata[$level] $message\n",
  metadata: [:request_id]

config :logger, :ex_syslogger,
  format: "$time $metadata[$level] $message\n",
  metadata: [:request_id],
  ident: "my_app"

Result

Now when you compile your app with MIX_ENV=prod and run it as a systemd service, journald will automatically handle multi-line messages and log levels/priorities correctly. Furthermore, you can use any generic syslog collector to ship log entries to your log service as soon as they occur — with multi-line messages and log levels intact.

For example, when using the default Console logger, an error message from a Phoenix web app would have been displayed like this by journalctl:

$ journalctl -u my_app -f
Mar 26 18:21:10 foo my_app[580361]: 18:21:10.337 request_id=Fm_3dFhPMtEHARkAAALy [info] Sent 500 in 16ms
Mar 26 18:21:10 foo my_app[580361]: 18:21:10.345 [error] #PID<0.4149.0> running MyAppWeb.Endpoint (connection #PID<0.4148.0>, stream id 1) terminated
Mar 26 18:21:10 foo my_app[580361]: Server: foo.example.com:443 (https)
Mar 26 18:21:10 foo my_app[580361]: Request: GET /test/error
Mar 26 18:21:10 foo my_app[580361]: ** (exit) an exception was raised:
Mar 26 18:21:10 foo my_app[580361]: ** (RuntimeError) test runtime error
Mar 26 18:21:10 foo my_app[580361]: (my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:9: MyAppWeb.TestController.error/2
Mar 26 18:21:10 foo my_app[580361]: (my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:1: MyAppWeb.TestController.action/2
Mar 26 18:21:10 foo my_app[580361]: (my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:1: MyAppWeb.TestController.phoenix_controller_pipeline/2
Mar 26 18:21:10 foo my_app[580361]: (phoenix 1.5.8) lib/phoenix/router.ex:352: Phoenix.Router.__call__/2
Mar 26 18:21:10 foo my_app[580361]: (my_app 0.1.0) lib/my_app_web/endpoint.ex:1: MyAppWeb.Endpoint.plug_builder_call/2
Mar 26 18:21:10 foo my_app[580361]: (my_app 0.1.0) lib/my_app_web/endpoint.ex:1: MyAppWeb.Endpoint.call/2
Mar 26 18:21:10 foo my_app[580361]: (phoenix 1.5.8) lib/phoenix/endpoint/cowboy2_handler.ex:65: Phoenix.Endpoint.Cowboy2Handler.init/4
Mar 26 18:21:10 foo my_app[580361]: (cowboy 2.8.0) /srv/my_app/deps/cowboy/src/cowboy_handler.erl:37: :cowboy_handler.execute/2

But with ExSyslogger in place, you'll now see this (where the full error message is captured as a single log entry, and is recognized as an error-level message):

$ journalctl -u my_app -f
Mar 26 18:21:10 foo my_app[580361]: 18:21:10.337 request_id=Fm_3dFhPMtEHARkAAALy [info] Sent 500 in 16ms
Mar 26 18:21:10 foo my_app[580361]: 18:21:10.345 [error] #PID<0.4149.0> running MyAppWeb.Endpoint (connection #PID<0.4148.0>, stream id 1) terminated
    Server: foo.example.com:443 (https)
    Request: GET /test/error
    ** (exit) an exception was raised:
        ** (RuntimeError) test runtime error
            (my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:9: MyAppWeb.TestController.error/2
            (my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:1: MyAppWeb.TestController.action/2
            (my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:1: MyAppWeb.TestController.phoenix_controller_pipeline/2
            (phoenix 1.5.8) lib/phoenix/router.ex:352: Phoenix.Router.__call__/2
            (my_app 0.1.0) lib/my_app_web/endpoint.ex:1: MyAppWeb.Endpoint.plug_builder_call/2
            (my_app 0.1.0) lib/my_app_web/endpoint.ex:1: MyAppWeb.Endpoint.call/2
            (phoenix 1.5.8) lib/phoenix/endpoint/cowboy2_handler.ex:65: Phoenix.Endpoint.Cowboy2Handler.init/4
            (cowboy 2.8.0) /srv/my_app/deps/cowboy/src/cowboy_handler.erl:37: :cowboy_handler.execute/2

And as a side note, you can use journalctl to view just error-level messages and above via the --priority=err flag (-p3 for short):

journalctl -u my_app -p3

Monday, March 8, 2021

D3v6 Pan and Zoom

Since D3 version 3 it's been really easy to add panning and zooming to custom visualizations, allowing the user to scroll the SVG canvas vertically and horizontally by clicking and dragging the mouse cursor around the canvas, and to scale the canvas larger and smaller by spinning the mouse wheel.

Simplest way

For the simplest case, all you need is to apply the d3.zoom() behavior to your root svg element. This is how you do it with D3 version 6 (d3.v6.js):

<svg id="viz1" width="300" height="300" style="background:#ffc">
  <circle cx="50%" cy="50%" r="25%" fill="#69c" />
</svg>
<script>
  const svg = d3
    .select('#viz1')
    .call(d3.zoom().on('zoom', ({ transform }) => svg.attr('transform', transform)))
</script>

It'll work like this: click and drag anywhere in the SVG canvas to pan it around, and spin the mouse wheel to zoom in and out.

Smoothest way

In most cases, however, you'll get smoother behavior by adding some group (<g>) elements to wrap your main visualization elements. If you're starting with a structure like the following, where you've got a .canvas group element containing the main content you want to pan and zoom:

<svg id="viz2" width="300" height="300">
  <g class="canvas" transform="translate(150,150)">
    <circle cx="0" cy="0" r="25%" fill="#69c" />
  </g>
</svg>

Do this: add one wrapper group element, .zoomed, around the original .canvas group; and a second group element, .bg, around .zoomed; and add a rect inside the .bg group:

<svg id="viz2" width="300" height="300">
  <g class="bg">
    <rect width="100%" height="100%" fill="#efc" />
    <g class="zoomed">
      <g class="canvas" transform="translate(150,150)">
        <circle cx="0" cy="0" r="25%" fill="#69c" />
      </g>
    </g>
  </g>
</svg>

The rect inside the .bg group will ensure that the user's click-n-drag or mouse wheeling will be captured as long as the mouse pointer is anywhere inside the svg element (without this rect, the mouse would be captured only when the user positions the mouse over a graphical element drawn inside the .bg group — like the circle in this example). For this example, I've set the fill of the rect to a light green-yellow color; but usually you'd just set it to transparent.

Then attach the pan & zoom behavior to the .bg group — but apply the pan & zoom transform to the .zoomed group it contains. This will prevent stuttering when panning, since the .bg group will remain fixed; and it will avoid messing with any transforms or other fancy styling/positioning you already have on your inner .canvas group:

<script>
  const zoomed = d3.select('#viz2 .zoomed')
  const bg = d3
    .select('#viz2 .bg')
    .call(
      d3
        // base d3 pan & zoom behavior
        .zoom()
        // limit zoom to between 20% and 200% of original size
        .scaleExtent([0.2, 2])
        // apply pan & zoom transform to 'zoomed' element
        .on('zoom', ({ transform }) => zoomed.attr('transform', transform))
        // add 'grabbing' class to 'bg' element when panning;
        // add 'scaling' class to 'bg' element when zooming
        .on('start', ({ sourceEvent: { type } }) => {
          bg.classed(type === 'wheel' ? 'scaling' : 'grabbing', true)
        })
        // remove 'grabbing' and 'scaling' classes when done panning & zooming
        .on('end', () => bg.classed('grabbing scaling', false)),
    )
</script>

Finally, set the mouse cursor via CSS when the user positions the pointer over the rect element. The grabbing and scaling classes will be added to the .bg group while the pan or zoom activity is ongoing, via the on('start') and on('end') hooks above:

<style lang="css">
  .bg > rect { cursor: move; }
  .bg.grabbing > rect { cursor: grabbing; }
  .bg.scaling > rect { cursor: zoom-in; }
</style>

When you put it all together, the user can pan and zoom the whole SVG smoothly, with the mouse cursor reflecting the current activity.

Thursday, January 14, 2021

Using an Ecto Readonly Replica Repo

Elixir Ecto has excellent documentation for how to use read-only replica databases, but because I'm so dense it took me a bit of trial and error to figure out where all the changes suggested by the documentation should go in my own app. Here's a concrete example of what I had to change for my conventional Mix + Phoenix application.

(The docs describe how to add N different replicas dynamically — MyApp.Repo.Replica1, MyApp.Repo.Replica2, etc — but since I only have to worry about a single endpoint for my read replicas, I simplified things and just used a single, static MyApp.Repo.Replica instance in my Elixir configuration and code.)

Mix Environment Helper

To allow my app to determine whether it was compiled and is running with a test, dev, or prod configuration, I added a config_env setting to my app, and set it to the value of the Mix.env/0 function at compile time:

# config/config.exs
config :my_app,
  config_env: Mix.env(),
  ecto_repos: [MyApp.Repo]

Note that with Elixir 1.11 and newer, you can instead use Config.config_env/0 in place of Mix.env/0:

# config/config.exs
config :my_app,
  config_env: Config.config_env(),
  ecto_repos: [MyApp.Repo]

And in my root MyApp module, I added a helper function to access this config_env setting:

# lib/my_app.ex
defmodule MyApp do
  def config_env, do: Application.get_env(:my_app, :config_env)
end

This means that I can call MyApp.config_env/0 at runtime in various places in my app's code, and get the Mix.env/0 value with which the app was compiled (like :test, :dev, or :prod).
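For example (this snippet is illustrative, not from the original app — the log-level policy is made up), you might branch on it at runtime:

```elixir
# hypothetical usage — pick a log level based on the compiled env:
level =
  case MyApp.config_env() do
    :prod -> :info
    _ -> :debug
  end

Logger.configure(level: level)
```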

Replica Module

To my existing lib/my_app/repo.ex file (which already contained the MyApp.Repo module), I added the definition for my new MyApp.Repo.Replica module, like so:

# lib/my_app/repo.ex
defmodule MyApp.Repo do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.Postgres

  def replica, do: MyApp.Repo.Replica
end

defmodule MyApp.Repo.Replica do
  use Ecto.Repo,
    otp_app: :my_app,
    adapter: Ecto.Adapters.Postgres,
    default_dynamic_repo:
      if(MyApp.config_env() != :test, do: MyApp.Repo.Replica, else: MyApp.Repo),
    read_only: true
end

The default_dynamic_repo option in the MyApp.Repo.Replica module uses the config_env helper I added above to set up the module to use the primary MyApp.Repo's own connection pool for the read replica in the test environment, as recommended by the Ecto docs. This way the replica instance will just delegate to the primary repo instance for all of its read operations in the test environment, but will still enforce its own read-only setting. Also, this way I don't have to configure any test-env-specific settings for the read replica in my config/test.exs file (nor do I need to start up another child process for the replica, as we'll see in the next section).
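As a sketch of what this buys you in tests (the MyApp.Users.User schema here is hypothetical), reads through the replica flow through the primary's sandboxed connection, while the replica's read-only flag still blocks writes:

```elixir
# in an ExUnit test (hypothetical schema MyApp.Users.User):

# reads via the replica delegate to the primary's (sandboxed) pool,
# so they see data inserted earlier in the same test transaction:
users = MyApp.Repo.replica().all(MyApp.Users.User)

# writes via the replica still raise, because of read_only: true —
# the delegation affects only which connection pool is used:
# MyApp.Repo.replica().insert(%MyApp.Users.User{username: "alice"})
```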

Application Module

In non-test environments, the new read replica module does need to be started as a child process, alongside the primary repo. So I modified the start/2 function in my application module to start it:

# lib/my_app/application.ex
defmodule MyApp.Application do
  use Application

  def start(_type, _args) do
    # don't start separate readonly repo in test mode
    repos =
      if MyApp.config_env() != :test do
        [MyApp.Repo, MyApp.Repo.Replica]
      else
        [MyApp.Repo]
      end

    children = repos ++ [MyAppWeb.Endpoint]

    opts = [strategy: :one_for_one, name: MyApp.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

Dev Config

For my dev environment configuration, I updated my config/dev.exs file to simply duplicate the configuration of the primary MyApp.Repo for the MyApp.Repo.Replica (creating a separate connection pool for the replica that points at the same database as the primary):

# config/dev.exs
for repo <- [MyApp.Repo, MyApp.Repo.Replica] do
  config :my_app, repo,
    username: "myusername",
    password: "mypassword",
    database: "mydatabase",
    hostname: "localhost",
    show_sensitive_data_on_connection_error: true,
    pool_size: 10
end

Prod Config

For the prod environment configuration, I updated my config/releases.exs file to use a similar configuration as the primary for the replica, but have it instead pull the replica hostname from a different environment variable (DB_READONLY in this case):

# config/releases.exs
config :my_app, MyApp.Repo,
  ssl: true,
  username: System.get_env("DB_USERNAME"),
  password: System.get_env("DB_PASSWORD"),
  database: System.get_env("DB_DATABASE"),
  hostname: System.get_env("DB_HOSTNAME"),
  pool_size: String.to_integer(System.get_env("DB_POOLSIZE") || "10")

config :my_app, MyApp.Repo.Replica,
  ssl: true,
  username: System.get_env("DB_USERNAME"),
  password: System.get_env("DB_PASSWORD"),
  database: System.get_env("DB_DATABASE"),
  hostname: System.get_env("DB_READONLY"),
  pool_size: String.to_integer(System.get_env("DB_POOLSIZE") || "10")
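For illustration, the release might then be launched with environment variables like these (all values are placeholders; Aurora reader endpoints typically contain a "cluster-ro" segment, which is where DB_READONLY would point):

```shell
# example environment for the release (placeholder values):
export DB_USERNAME=myapp
export DB_PASSWORD=secret
export DB_DATABASE=myapp_prod
export DB_HOSTNAME=my-cluster.cluster-abcd1234.us-east-1.rds.amazonaws.com
export DB_READONLY=my-cluster.cluster-ro-abcd1234.us-east-1.rds.amazonaws.com
export DB_POOLSIZE=10
```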

Using the Replica

With all the above in place, everywhere in my Elixir code that I want to query a read replica instead of the primary database, I can just replace MyApp.Repo with MyApp.Repo.replica():

# lib/my_app/users.ex
defmodule MyApp.Users do
  import Ecto.Query

  alias MyApp.Repo
  alias MyApp.Users.User

  def list_usernames do
    from(u in User, select: u.username)
    |> Repo.replica().all()
  end
end

Wednesday, December 30, 2020

Ecto RDS SSL Connection with Certificate Verification

It's nice and easy to connect to an AWS RDS instance with Elixir Ecto over SSL/TLS, as long as you're not worried about verifying the database server's certificate. You just add an ssl: true setting when you configure the Ecto Repo, like this snippet from a config/releases.exs file for a hypothetical "myapp":

# config/releases.exs
config :myapp, MyApp.Repo,
  hostname: System.get_env("DB_HOSTNAME"),
  database: System.get_env("DB_DATABASE"),
  username: System.get_env("DB_USERNAME"),
  password: System.get_env("DB_PASSWORD"),
  ssl: true

That's probably good enough for most cloud environments; but if you want to defend against a sophisticated attacker eavesdropping on or manipulating the SSL connections between your DB client and the RDS server, you also need to configure your Ecto Repo's ssl_opts setting to verify the server's certificate.

Unfortunately, this is not so straightforward. You need to either write your own certificate verification function (not trivial), or use one supplied by another library — like the ssl_verify_fun.erl library.

To use the :ssl_verify_hostname verification function from the ssl_verify_fun.erl library, first add the library as a dependency to your mix.exs file:

# mix.exs
defp deps do
  [
    {:ecto_sql, "~> 3.5"},
    {:ssl_verify_fun, ">= 0.0.0"}
  ]
end

Then add the following ssl_opts setting to your Ecto Repo config:

# config/releases.exs
check_hostname = String.to_charlist(System.get_env("DB_HOSTNAME"))

config :myapp, MyApp.Repo,
  hostname: System.get_env("DB_HOSTNAME"),
  database: System.get_env("DB_DATABASE"),
  username: System.get_env("DB_USERNAME"),
  password: System.get_env("DB_PASSWORD"),
  ssl: true,
  ssl_opts: [
    cacertfile: "/etc/ssl/certs/rds-ca-2019-root.pem",
    server_name_indication: check_hostname,
    verify: :verify_peer,
    verify_fun:
      {&:ssl_verify_hostname.verify_fun/3, [check_hostname: check_hostname]}
  ]

Note the RDS server hostname (which would be something like my-rds-cluster.cluster-abcd1234efgh.us-east-1.rds.amazonaws.com) needs to be passed to the server_name_indication and check_hostname options as a charlist. The above example also assumes that you have downloaded the root RDS SSL certificate to /etc/ssl/certs/rds-ca-2019-root.pem on your DB client hosts.

I'd also suggest pulling out the generation of ssl_opts into a function, to make it easy to set up multiple repos. Here's how I'd do it with our hypothetical "myapp" repo: I'd add one environment variable (DB_SSL) to trigger the Ecto ssl setting (with or without verifying the server cert), and another environment variable (DB_SSL_CA_CERT) to specify the path for the cacertfile option (triggering cert verification):

# config/releases.exs
make_ssl_opts = fn
  "", _hostname ->
    []

  cacertfile, hostname ->
    check_hostname = String.to_charlist(hostname)

    [
      cacertfile: cacertfile,
      server_name_indication: check_hostname,
      verify: :verify_peer,
      verify_fun:
        {&:ssl_verify_hostname.verify_fun/3, [check_hostname: check_hostname]}
    ]
end

db_ssl_ca_cert = System.get_env("DB_SSL_CA_CERT", "")
db_ssl = db_ssl_ca_cert != "" or System.get_env("DB_SSL", "") != ""
db_hostname = System.get_env("DB_HOSTNAME")

config :myapp, MyApp.Repo,
  hostname: db_hostname,
  database: System.get_env("DB_DATABASE"),
  username: System.get_env("DB_USERNAME"),
  password: System.get_env("DB_PASSWORD"),
  ssl: db_ssl,
  ssl_opts: make_ssl_opts.(db_ssl_ca_cert, db_hostname)

With this verification in place, you'd see an error like the following if your DB client tries to connect to a server with a SSL certificate signed by a CA other than the one you configured:

{:tls_alert, {:unknown_ca, 'TLS client: In state certify at ssl_handshake.erl:1950 generated CLIENT ALERT: Fatal - Unknown CA\n'}}

And you'd see an error like the following if the certificate was signed by the expected CA, but for a different hostname:

{bad_cert,unable_to_match_altnames} - {:tls_alert, {:handshake_failure, 'TLS client: In state certify at ssl_handshake.erl:1952 generated CLIENT ALERT: Fatal - Handshake Failure\n {bad_cert,unable_to_match_altnames}'}}