Tuesday, February 11, 2025

Windows MSI for a Python Service With cx_Freeze

The excellent cx_Freeze project makes it easy to build Windows executables for Python scripts. It also has some handy glue code to enable you to install and run a Python script as a Windows service; and includes a simple Windows service example in the project's samples directory.

It's also possible to package up the service into an MSI file (which can be installed via the msiexec tool included with Windows). However, I didn't find any good examples for how to do this; so this is what I did for a recent project:

  1. Set Up Basic Project With UV
  2. Set Up Basic Windows Service
  3. Build and Test Windows Service
  4. Build Basic MSI Package
  5. Configure MSI to Install and Start the Service
  6. Extra: Add Other Executables
  7. Extra: Add Icons
  8. Extra: Add Other Files
  9. Extra: Add Start Menu

Set Up Basic Project With UV

I set up a basic project with UV, using a pyproject.toml file that looks like this:

# pyproject.toml
[project]
name = "my_service"
dynamic = ["version"]
description = "My service description."
readme = "README.md"
license = { file = "LICENSE" }
requires-python = ">=3.12"
dependencies = [
    "pywin32~=306.0 ; sys_platform == 'win32'",
]

[project.scripts]
my-service-cli = "my_service.cli:main"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[dependency-groups]
dev = [
    "pytest>=8.3.4",
    "ruff>=0.8.6",
]
freeze = [
    "cx-freeze>=6.13.2",
    "cx-logging>=3.0",
]

[tool.hatch.version]
path = "src/my_service/__init__.py"

I put the cx_Freeze-specific dependencies in my own custom freeze dependency group, as I only need them to be installed when running cx_Freeze commands.

In the my_service package, I set up globals for the version number and service details:

# src/my_service/__init__.py
"""My Service."""

__version__ = "1.0.0"

DISPLAY_NAME = "My Service"
DESCRIPTION = "My service description."
SERVICE_NAME = "my-service"

Set Up Basic Windows Service

Next I set up a basic service class (similar to the Handler class in the ServiceHandler.py file of cx_Freeze's service sample) to run my service code (and to set up some log files):

# src/my_service/windows_service.py
"""cx_Freeze Win32Service for My Service."""

import sys
from pathlib import Path

try:
    import cx_Logging
except ImportError:
    cx_Logging = None

from my_service.service import run_my_service


class Service:
    """cx_Freeze Win32Service for My Service."""

    def initialize(self, cnf_file):
        """Called when the service is starting.

        Arguments:
            cnf_file (str): Path to configuration file.
        """
        try:
            init_log()
        except Exception:
            cx_Logging.LogException()

    def run(self):
        """Called when the service is running."""
        try:
            cx_Logging.Debug("running my service")
            run_my_service()
        except Exception:
            cx_Logging.LogException()

    def stop(self):
        """Called when the service is stopping."""
        # Adapt this to however run_my_service() is told to exit its loop.
        self.cnf.loop = 0


def init_log():
    """Initializes service logging."""
    log_dir = get_log_dir()
    cx_Logging.StartLogging(str(log_dir / "init.log"), cx_Logging.DEBUG)
    sys.stdout = open(log_dir / "stdout.log", "a")


def get_log_dir():
    """Gets service logging directory.

    Returns:
        Path: Path to logging directory.
    """
    executable_dir = Path(sys.executable).parent
    log_dir = executable_dir / "log"
    Path.mkdir(log_dir, parents=True, exist_ok=True)
    return log_dir
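The run_my_service() function imported above is the actual service code, which isn't shown in this post. As a hypothetical sketch (the module path comes from the import above, but the loop flag is an assumption), it could look something like this:

# src/my_service/service.py (hypothetical sketch)
"""Core service code for My Service."""
import time

RUNNING = True  # the stop() handler should clear whatever flag this loop checks


def run_my_service():
    """Runs the main service loop until stopped."""
    while RUNNING:
        # ... do one unit of the service's work ...
        time.sleep(1)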

Then I set up the service-definition config for it (similar to the Config.py file of cx_Freeze's service sample), using some of the constants I had defined in my root my_service module:

# src/my_service/windows_service_config.py
"""cx_Freeze config for my service as a Win32Service."""

import my_service

# cx_Freeze substitutes the instance name passed to --install
# (e.g. "default") for the %s placeholders below.
NAME = f"{my_service.SERVICE_NAME}-%s"
DISPLAY_NAME = f"{my_service.DISPLAY_NAME} %s"
MODULE_NAME = "my_service.windows_service"
CLASS_NAME = "Service"
DESCRIPTION = my_service.DESCRIPTION
AUTO_START = False
SESSION_CHANGES = False

And then I put together a cx_Freeze setup script (similar to the setup.py file of cx_Freeze's service sample) to run the cx_Freeze build_exe build, again using some of the constants I had defined in my root my_service module:

# cx_freeze_setup.py
"""cx_Freeze setup script."""

from cx_Freeze import Executable, setup

from my_service import DESCRIPTION, SERVICE_NAME
from my_service import __version__ as VERSION

EXECUTABLES = [
    Executable(
        script="src/my_service/windows_service_config.py",
        base="Win32Service",
        target_name=SERVICE_NAME,
    ),
]

setup(
    name=SERVICE_NAME,
    version=VERSION,
    description=DESCRIPTION,
    options={
        "build_exe": {
            "excludes": [
                "test",
                "tkinter",
                "unittest",
            ],
            "includes": [
                "_cffi_backend",
                "cx_Logging",
            ],
            "include_msvcr": True,
            "packages": [
                "my_service",
            ],
        },
    },
    executables=EXECUTABLES,
)

Build and Test Windows Service

With that in place, I could run cx_Freeze's build_exe command to build my service as a Windows executable, generating a my-service.exe file in my project's build\exe.win-amd64-3.12 directory; and alongside the exe file, a lib folder containing all the compiled Python modules needed for the executable, plus some core DLLs:

> uv run --group freeze cx_freeze_setup.py build_exe
Using CPython 3.12.9
Creating virtual environment at: .venv
Built my-service @ file:///C:/my_project
Installed 31 packages in 44.15s
running build_exe
creating directory C:\my_project\build\exe.win-amd64-3.12
copying C:\my_project\.venv\Lib\site-packages\cx_Freeze\bases\Win32Service-cpython-312-win_amd64.exe -> C:\my_project\build\exe.win-amd64-3.12\my-service.exe
...
copying C:\my_project\.venv\Lib\site-packages\pywin32_system32\pywintypes312.dll -> C:\my_project\build\exe.win-amd64-3.12\lib\pywintypes312.dll
writing zip file C:\my_project\build\exe.win-amd64-3.12\lib\library.zip

  Name             File
  ----             ----
m BUILD_CONSTANTS  C:\Users\JUSTIN~1\AppData\Local\Temp\2\cxfreeze-8edcizah\BUILD_CONSTANTS.py
...
m zipimport        C:\Users\Justin\AppData\Roaming\uv\python\cpython-3.12.9-windows-x86_64-none\Lib\zipimport.py

Missing modules:
? OpenSSL.SSL imported from urllib3.contrib.pyopenssl
...
? zstandard imported from urllib3.response, urllib3.util.request
This is not necessarily a problem - the modules may not be needed on this platform.

Missing dependencies:
? api-ms-win-core-path-l1-1-0.dll
...
? api-ms-win-crt-utility-l1-1-0.dll
This is not necessarily a problem - the dependencies may not be needed on this platform.

> dir build\exe.win-amd64-3.12
 Volume in drive C is Windows
 Volume Serial Number is 7EC2-1A39

 Directory of C:\my_project\build\exe.win-amd64-3.12

02/13/2025  12:19 AM    <DIR>          .
02/13/2025  12:19 AM    <DIR>          ..
02/13/2025  12:10 AM            44,032 cx_Logging.cp312-win_amd64.pyd
02/13/2025  12:14 AM             3,326 frozen_application_license.txt
02/13/2025  12:19 AM    <DIR>          lib
02/13/2025  12:19 AM           139,264 my-service.exe
02/13/2025  12:04 AM         6,925,312 python312.dll
02/13/2025  12:06 AM            99,840 vcruntime140.dll
02/13/2025  12:06 AM            29,184 vcruntime140_1.dll
               6 File(s)      7,623,119 bytes
               3 Dir(s)  10,450,911,232 bytes free

This build\exe.win-amd64-3.12 directory is basically what I want to package up and ship to users. In this state, I can also test out the Windows service by manually installing it and running it on the build box:

> .\build\exe.win-amd64-3.12\my-service.exe --install default
Service installed.

> sc start my-service-default

SERVICE_NAME: my-service-default
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 2  START_PENDING
                                (NOT_STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x7d0
        PID                : 1732
        FLAGS              :

> sc stop my-service-default

SERVICE_NAME: my-service-default
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 1  STOPPED
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0

> .\build\exe.win-amd64-3.12\my-service.exe --uninstall default
Service uninstalled.

Build Basic MSI Package

Next I can use cx_Freeze's bdist_msi command to create an MSI package that doesn't install or start my service, but simply installs the contents of the build\exe.win-amd64-3.12 directory into the Program Files directory of a user's Windows machine. First, however, I added a bit more configuration to my cx_Freeze setup script for the bdist_msi command options:

# cx_freeze_setup.py
"""cx_Freeze setup script."""

from cx_Freeze import Executable, setup

from my_service import DESCRIPTION, DISPLAY_NAME, SERVICE_NAME
from my_service import __version__ as VERSION

EXECUTABLES = [
    Executable(
        script="src/my_service/windows_service_config.py",
        base="Win32Service",
        target_name=SERVICE_NAME,
    ),
]

setup(
    name=SERVICE_NAME,
    version=VERSION,
    description=DESCRIPTION,
    options={
        "build_exe": {
            "excludes": [
                "test",
                "tkinter",
                "unittest",
            ],
            "includes": [
                "_cffi_backend",
                "cx_Logging",
            ],
            "include_msvcr": True,
            "packages": [
                "my_service",
            ],
        },
        "bdist_msi": {
            "all_users": True,
            "initial_target_dir": f"[ProgramFiles64Folder]{DISPLAY_NAME}",
            # IMPORTANT: generate a unique UUID for your service
            "upgrade_code": "{26483DF9-540E-43D7-B543-795C62E3AF2D}",
        },
    },
    executables=EXECUTABLES,
)

You should generate a separate UUID for each project, and then keep that UUID constant for the project's lifetime, so that Windows will know which existing package to upgrade when the user tries to install a new version of it. I generated the UUID for my service by running this command on a Linux box:

$ tr '[a-z]' '[A-Z]' < /proc/sys/kernel/random/uuid
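Alternatively, Python's standard uuid module can generate one on any platform (printed in the braced, uppercase form used above):

$ python -c "import uuid; print('{' + str(uuid.uuid4()).upper() + '}')"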

Building the MSI is now as simple as running the following command:

> uv run --group freeze cx_freeze_setup.py bdist_msi
running bdist_msi
running build
running build_exe
creating directory C:\my_project\build\exe.win-amd64-3.12
copying C:\my_project\.venv\Lib\site-packages\cx_Freeze\bases\Win32Service-cpython-312-win_amd64.exe -> C:\my_project\build\exe.win-amd64-3.12\my-service.exe
...
? api-ms-win-crt-utility-l1-1-0.dll
This is not necessarily a problem - the dependencies may not be needed on this platform.
installing to build\bdist.win-amd64\msi
running install_exe
creating build\bdist.win-amd64\msi
...
copying build\exe.win-amd64-3.12\vcruntime140_1.dll -> build\bdist.win-amd64\msi
creating dist
removing 'build\bdist.win-amd64\msi' (and everything under it)

> dir dist
 Volume in drive C is Windows
 Volume Serial Number is 7EC2-1A39

 Directory of C:\my_project\dist

02/13/2025  01:05 AM    <DIR>          .
02/13/2025  01:05 AM    <DIR>          ..
02/13/2025  01:05 AM        10,788,864 my-service-1.0.0-win64.msi
               1 File(s)     10,788,864 bytes
               2 Dir(s)  10,826,158,080 bytes free

This will automatically run the same build_exe command as before, but after that will also create an MSI file with the contents of the generated build\exe.win-amd64-3.12 directory, and save it as the dist\my-service-1.0.0-win64.msi file.

This dist\my-service-1.0.0-win64.msi file can be run directly to launch a graphical installer; or it can be run with the msiexec utility to install silently with no prompts:

> msiexec /i dist\my-service-1.0.0-win64.msi /qn

Configure MSI to Install and Start the Service

To enable my service to install and start automatically when the user installs the MSI package, I had to configure the bdist_msi command with some special MSI data tables for installing a Windows service (ServiceInstall), and for running it (ServiceControl):

# cx_freeze_setup.py
"""cx_Freeze setup script."""

from re import sub

from cx_Freeze import Executable, setup

from my_service import DESCRIPTION, DISPLAY_NAME, SERVICE_NAME
from my_service import __version__ as VERSION

SERVICE_DEFAULT = "default"

SERVICE_WIN32_OWN_PROCESS = 0x10
SERVICE_AUTO_START = 0x2
SERVICE_ERROR_NORMAL = 0x1

MSIDB_SERVICE_CONTROL_EVENT_START = 0x1
MSIDB_SERVICE_CONTROL_EVENT_STOP = 0x2
MSIDB_SERVICE_CONTROL_EVENT_UNINSTALL_STOP = 0x20
MSIDB_SERVICE_CONTROL_EVENT_UNINSTALL_DELETE = 0x80

EXECUTABLES = [
    Executable(
        script="src/my_service/windows_service_config.py",
        base="Win32Service",
        target_name=SERVICE_NAME,
    ),
]


def _make_component_id(executables, index):
    executable = executables[index]
    component = f"_cx_executable{index}_{executable}"
    return sub(r"[^\w.]", "_", component)


setup(
    name=SERVICE_NAME,
    version=VERSION,
    description=DESCRIPTION,
    options={
        "build_exe": {
            "excludes": [
                "test",
                "tkinter",
                "unittest",
            ],
            "includes": [
                "_cffi_backend",
                "cx_Logging",
            ],
            "include_msvcr": True,
            "packages": [
                "my_service",
            ],
        },
        "bdist_msi": {
            "all_users": True,
            "initial_target_dir": f"[ProgramFiles64Folder]{DISPLAY_NAME}",
            # IMPORTANT: generate a unique UUID for your service
            "upgrade_code": "{26483DF9-540E-43D7-B543-795C62E3AF2D}",
            "data": {
                "ServiceInstall": [
                    (
                        f"{SERVICE_NAME}-{SERVICE_DEFAULT}Install",  # ID
                        f"{SERVICE_NAME}-{SERVICE_DEFAULT}",  # Name
                        DISPLAY_NAME,  # DisplayName
                        SERVICE_WIN32_OWN_PROCESS,  # ServiceType
                        SERVICE_AUTO_START,  # StartType
                        SERVICE_ERROR_NORMAL,  # ErrorControl
                        None,  # LoadOrderGroup
                        None,  # Dependencies
                        None,  # StartName
                        None,  # Password
                        None,  # Arguments
                        _make_component_id(EXECUTABLES, 0),  # Component
                        DESCRIPTION,  # Description
                    ),
                ],
                "ServiceControl": [
                    (
                        f"{SERVICE_NAME}-{SERVICE_DEFAULT}Control",  # ID
                        f"{SERVICE_NAME}-{SERVICE_DEFAULT}",  # Name
                        (
                            MSIDB_SERVICE_CONTROL_EVENT_START
                            + MSIDB_SERVICE_CONTROL_EVENT_STOP
                            + MSIDB_SERVICE_CONTROL_EVENT_UNINSTALL_STOP
                            + MSIDB_SERVICE_CONTROL_EVENT_UNINSTALL_DELETE
                        ),  # Event
                        None,  # Arguments
                        0,  # Wait
                        _make_component_id(EXECUTABLES, 0),  # Component
                    ),
                ],
            },
        },
    },
    executables=EXECUTABLES,
)

In the above, I've set up a row for the ServiceInstall MSI table to direct the Windows installer to install my service as a Windows service, and configure it to auto-start on system boot. This is basically the equivalent of running the following two commands:

> .\build\exe.win-amd64-3.12\my-service.exe --install default
> sc config my-service-default start=auto

I also set up a row for the ServiceControl MSI table to direct the Windows installer to:

  1. Start my service on install (MSIDB_SERVICE_CONTROL_EVENT_START)
  2. Stop the old version of my service on upgrade (MSIDB_SERVICE_CONTROL_EVENT_STOP)
  3. Stop my service on uninstall (MSIDB_SERVICE_CONTROL_EVENT_UNINSTALL_STOP)
  4. Delete my service on uninstall (MSIDB_SERVICE_CONTROL_EVENT_UNINSTALL_DELETE)

This is basically the equivalent of running this command on install:

> sc start my-service-default

These commands on upgrade:

> sc stop my-service-default
... install the new executable and support files ...
> sc start my-service-default

And these commands on uninstall:

> sc stop my-service-default
> .\build\exe.win-amd64-3.12\my-service.exe --uninstall default

The one tricky part of the above is that in the ServiceInstall and ServiceControl tables, you have to reference the service executable by the component ID that cx_Freeze auto-generates for it — which is based on the index of the corresponding Executable object that you've configured in the setup command's executables option, as well as the string representation of the Executable object itself. I encapsulated the logic to derive this ID in my _make_component_id() function, which takes as arguments the list that will be used for the executables option, as well as the index in that list to the service's Executable object.
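To make that concrete, here's a hypothetical illustration of the derivation; it assumes the string representation of an Executable is its target name (e.g. "my-service"), and the helper below is a stand-in, not cx_Freeze's API:

# component ID illustration (stand-in for the derivation described above)
from re import sub

def make_component_id(target_name, index):
    component = f"_cx_executable{index}_{target_name}"
    return sub(r"[^\w.]", "_", component)

print(make_component_id("my-service", 0))  # prints: _cx_executable0_my_service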

Now when building and installing my service's MSI file, we can see that my service is started automatically after install:

> uv run --group freeze cx_freeze_setup.py bdist_msi
...
> msiexec /i dist\my-service-1.0.0-win64.msi /qn
> sc query my-service-default

SERVICE_NAME: my-service-default
        TYPE               : 10  WIN32_OWN_PROCESS
        STATE              : 4  RUNNING
                                (STOPPABLE, NOT_PAUSABLE, IGNORES_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0

> sc qc my-service-default
[SC] QueryServiceConfig SUCCESS

SERVICE_NAME: my-service-default
        TYPE               : 10  WIN32_OWN_PROCESS
        START_TYPE         : 2   AUTO_START
        ERROR_CONTROL      : 1   NORMAL
        BINARY_PATH_NAME   : "C:\Program Files\My Service\my-service.exe"
        LOAD_ORDER_GROUP   :
        TAG                : 0
        DISPLAY_NAME       : My Service
        DEPENDENCIES       :
        SERVICE_START_NAME : LocalSystem

Extra: Add Other Executables

That's all I needed for the service itself; but for my project, I also had some CLI (Command Line Interface) scripts I wanted to include in its installed Program Files directory.

For each script, I simply had to add another Executable object to my EXECUTABLES list — for example, I had a my-service-cli script defined in my pyproject.toml file:

# pyproject.toml
[project.scripts]
my-service-cli = "my_service.cli:main"

So I added a corresponding my-service-cli executable definition to my cx_Freeze setup script:

# cx_freeze_setup.py
EXECUTABLES = [
    # ... the service Executable from before ...
    Executable(
        script="src/my_service/cli.py",
        base="console",
        target_name="my-service-cli",
    ),
]

Which resulted in a my-service-cli.exe executable generated by cx_Freeze — and added to my project's Program Files directory when the MSI is installed:

C:\Program Files\My Service\my-service-cli.exe
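The cli.py module itself isn't shown in this post; a minimal hypothetical version, exposing the main() entry point that pyproject.toml references (plus the --version flag used later in this post), might look like this:

# src/my_service/cli.py (hypothetical sketch)
"""Command-line interface for My Service."""
import argparse

from my_service import DESCRIPTION, __version__


def main():
    """Parses CLI arguments (just --version for now)."""
    parser = argparse.ArgumentParser(description=DESCRIPTION)
    parser.add_argument("--version", action="version", version=__version__)
    parser.parse_args()


if __name__ == "__main__":
    main()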

Extra: Add Icons

To add icons to my executables — and to my installer MSI file — I created a custom ICO icon, saved it in my project's source directory as installer/my_icon.ico, and annotated my Executable and bdist_msi configuration with it:

# cx_freeze_setup.py
EXECUTABLES = [
    Executable(
        script="src/my_service/windows_service_config.py",
        base="Win32Service",
        target_name=SERVICE_NAME,
        icon="installer/my_icon.ico",
    ),
    Executable(
        script="src/my_service/cli.py",
        base="console",
        target_name="my-service-cli",
        icon="installer/my_icon.ico",
    ),
]

...

setup(
    ...
    options={
        "bdist_msi": {
            ...
            "install_icon": "installer/my_icon.ico",
            ...
        },
    },
)

Extra: Add Other Files

I also had some other miscellaneous files I wanted to include in my project's Program Files directory. That turned out to be as easy as using the include_files option of the build_exe command:

# cx_freeze_setup.py
setup(
    ...
    options={
        "build_exe": {
            ...
            "include_files": [
                ("LICENSE", "LICENSE.txt"),
                ("installer/help.url", "help.url"),
                ("installer/log.txt", "log/README.txt"),
            ],
            ...
        },
    },
)

This enabled me to take these files from my project source directory:

LICENSE
installer/help.url
installer/log.txt

And install them into the C:\Program Files\My Service directory as these files:

LICENSE.txt
help.url
log\README.txt

The installer automatically creates any necessary subdirectories for these files (such as the log subdirectory).
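Since a typo in one of these source paths won't surface until the build runs, a tiny pre-flight check can catch it earlier; this is a hypothetical helper, not part of cx_Freeze:

# check_include_files.py (hypothetical helper): verify include_files sources exist
from pathlib import Path

INCLUDE_FILES = [
    ("LICENSE", "LICENSE.txt"),
    ("installer/help.url", "help.url"),
    ("installer/log.txt", "log/README.txt"),
]

for source, target in INCLUDE_FILES:
    status = "ok" if Path(source).is_file() else "MISSING"
    print(f"{status}: {source} -> {target}")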

Extra: Add Start Menu

Finally, I also wanted to add some custom commands and links to things for my project in its own Start Menu folder. That turned out to require adding several more custom MSI tables to the bdist_msi config.

First, I had to create a custom Start Menu folder for my project, using the Directory table:

# cx_freeze_setup.py
setup(
    ...
    options={
        "bdist_msi": {
            ...
            "data": {
                "Directory": [
                    (
                        "ProgramMenuFolder",  # ID
                        "TARGETDIR",  # DirectoryParent
                        ".",  # DefaultDir
                    ),
                    (
                        f"{SERVICE_NAME}Folder",  # ID
                        "ProgramMenuFolder",  # DirectoryParent
                        DISPLAY_NAME,  # DefaultDir
                    ),
                ],
                ...
            },
        },
    },
)

The first row in the Directory table sets up a reference to Windows' Start Menu folder (using the special ProgramMenuFolder and TARGETDIR keywords); and the second row creates a new folder named "My Service" in it.

Next, I used the Property table to define installation properties for Windows' cmd and explorer commands, so that I could use them for several of the Start Menu items themselves:

# cx_freeze_setup.py
setup(
    ...
    options={
        "bdist_msi": {
            ...
            "data": {
                ...
                "Property": [
                    ("cmd", "cmd"),
                    ("explorer", "explorer"),
                ],
                ...
            },
        },
    },
)

Finally, I used the Shortcut table to define each item I wanted to add to my custom Start Menu:

# cx_freeze_setup.py
setup(
    ...
    options={
        "bdist_msi": {
            ...
            "data": {
                ...
                "Shortcut": [
                    (
                        f"{SERVICE_NAME}VersionMenuItem",  # ID
                        f"{SERVICE_NAME}Folder",  # Directory
                        f"{DISPLAY_NAME} Version",  # Name
                        "TARGETDIR",  # Component
                        "[cmd]",  # Target
                        "/k my-service-cli.exe --version",  # Arguments
                        None,  # Description
                        None,  # Hotkey
                        None,  # Icon
                        None,  # IconIndex
                        None,  # ShowCmd
                        "TARGETDIR",  # WkDir
                    ),
                    (
                        f"{SERVICE_NAME}HelpMenuItem",  # ID
                        f"{SERVICE_NAME}Folder",  # Directory
                        "Help",  # Name
                        "TARGETDIR",  # Component
                        "[TARGETDIR]help.url",  # Target
                        None,  # Arguments
                        None,  # Description
                        None,  # Hotkey
                        None,  # Icon
                        None,  # IconIndex
                        None,  # ShowCmd
                        "TARGETDIR",  # WkDir
                    ),
                    (
                        f"{SERVICE_NAME}LogsMenuItem",  # ID
                        f"{SERVICE_NAME}Folder",  # Directory
                        "Logs",  # Name
                        "TARGETDIR",  # Component
                        "[explorer]",  # Target
                        "log",  # Arguments
                        None,  # Description
                        None,  # Hotkey
                        None,  # Icon
                        None,  # IconIndex
                        None,  # ShowCmd
                        "TARGETDIR",  # WkDir
                    ),
                ],
                ...
            },
        },
    },
)

In each of the above rows, the first column (like f"{SERVICE_NAME}VersionMenuItem") is an arbitrary ID for the row; the second column (f"{SERVICE_NAME}Folder") is a reference to the second row in the Directory table (directing the installer to create the shortcut in my service's custom Start Menu folder); and the third column (like f"{DISPLAY_NAME} Version") is the display name for the shortcut.

The fifth and sixth columns are the command and command arguments to use for the shortcut; and the last column is a reference to the working directory in which to run the command. So my first row defined a shortcut that is equivalent to opening up a command prompt and running the following commands:

> cd C:\Program Files\My Service
> my-service-cli.exe --version

My second row defined a shortcut that's equivalent to double-clicking on the help.url file that I installed into my project's Program Files directory with the include_files config from the earlier Add Other Files section. And my third row defined a shortcut that's equivalent to opening Windows' File Explorer to the C:\Program Files\My Service\log directory (created via the same include_files config).

Finished Product

With the above extras, my complete cx_Freeze setup script ended up looking like the following:

# cx_freeze_setup.py
"""cx_Freeze setup script."""

from re import sub

from cx_Freeze import Executable, setup

from my_service import DESCRIPTION, DISPLAY_NAME, SERVICE_NAME
from my_service import __version__ as VERSION

SERVICE_DEFAULT = "default"

SERVICE_WIN32_OWN_PROCESS = 0x10
SERVICE_AUTO_START = 0x2
SERVICE_ERROR_NORMAL = 0x1

MSIDB_SERVICE_CONTROL_EVENT_START = 0x1
MSIDB_SERVICE_CONTROL_EVENT_STOP = 0x2
MSIDB_SERVICE_CONTROL_EVENT_UNINSTALL_STOP = 0x20
MSIDB_SERVICE_CONTROL_EVENT_UNINSTALL_DELETE = 0x80

EXECUTABLES = [
    Executable(
        script="src/my_service/windows_service_config.py",
        base="Win32Service",
        target_name=SERVICE_NAME,
        icon="installer/my_icon.ico",
    ),
    Executable(
        script="src/my_service/cli.py",
        base="console",
        target_name="my-service-cli",
        icon="installer/my_icon.ico",
    ),
]


def _make_component_id(executables, index):
    executable = executables[index]
    component = f"_cx_executable{index}_{executable}"
    return sub(r"[^\w.]", "_", component)


setup(
    name=SERVICE_NAME,
    version=VERSION,
    description=DESCRIPTION,
    options={
        "build_exe": {
            "excludes": [
                "test",
                "tkinter",
                "unittest",
            ],
            "includes": [
                "_cffi_backend",
                "cx_Logging",
            ],
            "include_files": [
                ("LICENSE", "LICENSE.txt"),
                ("installer/help.url", "help.url"),
                ("installer/log.txt", "log/README.txt"),
            ],
            "include_msvcr": True,
            "packages": [
                "my_service",
            ],
        },
        "bdist_msi": {
            "all_users": True,
            "initial_target_dir": f"[ProgramFiles64Folder]{DISPLAY_NAME}",
            "install_icon": "installer/my_icon.ico",
            # IMPORTANT: generate a unique UUID for your service
            "upgrade_code": "{26483DF9-540E-43D7-B543-795C62E3AF2D}",
            "data": {
                "Directory": [
                    (
                        "ProgramMenuFolder",  # ID
                        "TARGETDIR",  # DirectoryParent
                        ".",  # DefaultDir
                    ),
                    (
                        f"{SERVICE_NAME}Folder",  # ID
                        "ProgramMenuFolder",  # DirectoryParent
                        DISPLAY_NAME,  # DefaultDir
                    ),
                ],
                "Property": [
                    ("cmd", "cmd"),
                    ("explorer", "explorer"),
                ],
                "ServiceInstall": [
                    (
                        f"{SERVICE_NAME}-{SERVICE_DEFAULT}Install",  # ID
                        f"{SERVICE_NAME}-{SERVICE_DEFAULT}",  # Name
                        DISPLAY_NAME,  # DisplayName
                        SERVICE_WIN32_OWN_PROCESS,  # ServiceType
                        SERVICE_AUTO_START,  # StartType
                        SERVICE_ERROR_NORMAL,  # ErrorControl
                        None,  # LoadOrderGroup
                        None,  # Dependencies
                        None,  # StartName
                        None,  # Password
                        None,  # Arguments
                        _make_component_id(EXECUTABLES, 0),  # Component
                        DESCRIPTION,  # Description
                    ),
                ],
                "ServiceControl": [
                    (
                        f"{SERVICE_NAME}-{SERVICE_DEFAULT}Control",  # ID
                        f"{SERVICE_NAME}-{SERVICE_DEFAULT}",  # Name
                        (
                            MSIDB_SERVICE_CONTROL_EVENT_START
                            + MSIDB_SERVICE_CONTROL_EVENT_STOP
                            + MSIDB_SERVICE_CONTROL_EVENT_UNINSTALL_STOP
                            + MSIDB_SERVICE_CONTROL_EVENT_UNINSTALL_DELETE
                        ),  # Event
                        None,  # Arguments
                        0,  # Wait
                        _make_component_id(EXECUTABLES, 0),  # Component
                    ),
                ],
                "Shortcut": [
                    (
                        f"{SERVICE_NAME}VersionMenuItem",  # ID
                        f"{SERVICE_NAME}Folder",  # Directory
                        f"{DISPLAY_NAME} Version",  # Name
                        "TARGETDIR",  # Component
                        "[cmd]",  # Target
                        "/k my-service-cli.exe --version",  # Arguments
                        None,  # Description
                        None,  # Hotkey
                        None,  # Icon
                        None,  # IconIndex
                        None,  # ShowCmd
                        "TARGETDIR",  # WkDir
                    ),
                    (
                        f"{SERVICE_NAME}HelpMenuItem",  # ID
                        f"{SERVICE_NAME}Folder",  # Directory
                        "Help",  # Name
                        "TARGETDIR",  # Component
                        "[TARGETDIR]help.url",  # Target
                        None,  # Arguments
                        None,  # Description
                        None,  # Hotkey
                        None,  # Icon
                        None,  # IconIndex
                        None,  # ShowCmd
                        "TARGETDIR",  # WkDir
                    ),
                    (
                        f"{SERVICE_NAME}LogsMenuItem",  # ID
                        f"{SERVICE_NAME}Folder",  # Directory
                        "Logs",  # Name
                        "TARGETDIR",  # Component
                        "[explorer]",  # Target
                        "log",  # Arguments
                        None,  # Description
                        None,  # Hotkey
                        None,  # Icon
                        None,  # IconIndex
                        None,  # ShowCmd
                        "TARGETDIR",  # WkDir
                    ),
                ],
            },
        },
    },
    executables=EXECUTABLES,
)

Wednesday, October 4, 2023

LXD Containers and FIDO Security Keys

With the rise of WebAuthn, I've had to figure out how to expose my various FIDO security keys (YubiKey, Nitrokey, OnlyKey, SoloKeys, etc) to the LXD containers I use for web browsers.

The core of the solution is to expose the HIDRAW device that the security key is using to the LXD container — and to configure the device in the container to be owned by the user account who will use it. If you only have one such key plugged in, it's most likely using the /dev/hidraw0 device; and usually it's user 1000 who needs to use it. An LXD profile entry like the following allows such access:

config: {}
description: exposes FIDO devices
devices:
  hidraw0:
    required: false
    source: /dev/hidraw0
    type: unix-char
    uid: "1000"
name: fido
used_by: []

A profile like this can be created, configured, and applied to a container with the following commands:

$ lxc profile create fido
Profile fido created
$ lxc profile device add fido hidraw0 unix-char required=false source=/dev/hidraw0 uid=1000
$ lxc profile add mycontainer fido
Profile fido added to mycontainer

However, the exact HIDRAW device number that a particular security key uses is not stable, and may vary as you plug and unplug various keys (or other USB or Bluetooth devices). How do you tell which HIDRAW device is being used by a particular physical device? The simplest way is to print out the content of the uevent pseudo file in the sysfs filesystem corresponding to each HIDRAW device until you find the one you want. For example, this is what the entry for one of my SoloKeys looks like, at hidraw11:

$ cat /sys/class/hidraw/hidraw11/device/uevent
DRIVER=hid-generic
HID_ID=0003:00001209:0000BEEE
HID_NAME=SoloKeys Solo 2 Security Key
HID_PHYS=usb-0000:00:14.0-4/input1
HID_UNIQ=1234567890ABCDEF1234567890ABCDEF
MODALIAS=hid:b0003g0001v00001209p0000BEEE
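Rather than cat-ing each uevent file by hand, a short Python sketch can dump them all at once (the HID_ID field encodes the bus, vendor, and product IDs):

# list_hidraw_devices.py: print the HID_ID and name of every hidraw device
from pathlib import Path

for uevent in sorted(Path("/sys/class/hidraw").glob("*/device/uevent")):
    fields = dict(
        line.split("=", 1)
        for line in uevent.read_text().splitlines()
        if "=" in line
    )
    hidraw = uevent.parent.parent.name  # e.g. "hidraw11"
    print(hidraw, fields.get("HID_ID", "?"), fields.get("HID_NAME", "?"))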

You can also get similar information — without the specific device name, but with the general type of device, like FIDO_TOKEN — from the udevadm command:

$ udevadm info /dev/hidraw11
P: /devices/pci0000:00/0000:00:24.0/usb1/2-4/2-4:1.4/0003:1209:BEEE.0022/hidraw/hidraw11
N: hidraw11
L: 0
E: DEVPATH=/devices/pci0000:00/0000:00:24.0/usb1/2-4/2-4:1.4/0003:1209:BEEE.0022/hidraw/hidraw11
E: DEVNAME=/dev/hidraw11
E: MAJOR=232
E: MINOR=12
E: SUBSYSTEM=hidraw
E: USEC_INITIALIZED=123456789010
E: ID_FIDO_TOKEN=1
E: ID_SECURITY_TOKEN=1
E: ID_PATH=pci-0000:00:24.0-usb-0:4:1.4
E: ID_PATH_TAG=pci-0000_00_24_0-usb-0_4_1_4
E: ID_FOR_SEAT=hidraw-pci-0000_00_24_0-usb-0_4_1_4
E: TAGS=:uaccess:seat:snap_firefox_geckodriver:security-device:snap_firefox_firefox:
E: CURRENT_TAGS=:uaccess:seat:snap_firefox_geckodriver:security-device:snap_firefox_firefox:

Using the udevadm info and lxc profile device list commands, you can write a simple script that checks each /dev/hidraw* device on your host system against the HIDRAW devices registered for a particular LXD profile, and adds or removes HIDRAW devices dynamically in that profile to match the current FIDO devices you have plugged in. Here's such a script:

#!/bin/sh -eu
profile=${1:-fido}
existing=$(lxc profile device list $profile)

for dev_path in /dev/hidraw*; do
    dev_name=$(basename $dev_path)
    if udevadm info $dev_path | grep FIDO >/dev/null; then
        if ! echo "$existing" | egrep '^'$dev_name'$' >/dev/null; then
            lxc profile device add $profile $dev_name \
                unix-char required=false source=$dev_path uid=1000
        fi
    else
        if echo "$existing" | egrep '^'$dev_name'$' >/dev/null; then
            lxc profile device remove $profile $dev_name
        fi
    fi
done

echo done

You can run the script manually every time you plug in a new security key, to make sure the security key is registered at the right HIDRAW slot in your LXD profile — or you can add a custom udev rule file to run it automatically.

If you save the above script as /usr/local/bin/add-fido-hidraw-devices-to-lxc-profile.sh, you can then add the below file as /etc/udev/rules.d/75-fido.rules (replacing justin with the username of your daily user) to automatically run the script for several different brands of FIDO security keys:

# Nitrokey 3
SUBSYSTEM=="hidraw", KERNEL=="hidraw*", ATTRS{idVendor}=="20a0", ATTRS{idProduct}=="42b2", RUN+="/bin/su justin -c /usr/local/bin/add-fido-hidraw-devices-to-lxc-profile.sh"
# OnlyKey
SUBSYSTEM=="hidraw", KERNEL=="hidraw*", ATTRS{idVendor}=="1d50", ATTRS{idProduct}=="60fc", RUN+="/bin/su justin -c /usr/local/bin/add-fido-hidraw-devices-to-lxc-profile.sh"
# SoloKeys
SUBSYSTEM=="hidraw", KERNEL=="hidraw*", ATTRS{idVendor}=="1209", ATTRS{idProduct}=="5070|50b0|beee", RUN+="/bin/su justin -c /usr/local/bin/add-fido-hidraw-devices-to-lxc-profile.sh"
# Yubico YubiKey
SUBSYSTEM=="hidraw", KERNEL=="hidraw*", ATTRS{idVendor}=="1050", ATTRS{idProduct}=="0113|0114|0115|0116|0120|0121|0200|0402|0403|0406|0407|0410", RUN+="/bin/su justin -c /usr/local/bin/add-fido-hidraw-devices-to-lxc-profile.sh"

Run the sudo udevadm control --reload-rules and sudo udevadm trigger commands to reload your udev rule files and trigger them for your currently plugged-in devices. If you use a different brand of security key, you can probably find its vendor and product IDs in the libfido2 udev rules file (or you can figure it out from the output of the udevadm info command).

Sunday, August 28, 2022

LXD Containers for Wayland GUI Apps

Having upgraded my home computers to Ubuntu 22.04, which features the latest version of LXD (5.5) via Snap, and using Wayland (via the Sway window manager), I spent some time working out how to run Wayland-native GUI apps in an LXD container. With the help of a few posts (Running X11 Software in LXD Containers, GUI Application via Wayland From Ubuntu LXD Container on Arch Linux Host, and Howto Use the Host's Wayland and XWayland Servers Inside Containers), I was able to get this working quite nicely.

Basic Profile

Most apps I tried, like LibreOffice or Eye of Gnome, worked with this basic LXD container profile (for Ubuntu 22.04 container images):

config:
  boot.autostart: false
  user.user-data: |
    #cloud-config
    write_files:
      - path: /usr/local/bin/mystartup.sh
        permissions: 0755
        content: |
          #!/bin/sh
          uid=$(id -u)
          run_dir=/run/user/$uid
          mkdir -p $run_dir && chmod 700 $run_dir && chown $uid:$uid $run_dir
          ln -sf /mnt/wayland-socket $run_dir/wayland-0
      - path: /usr/local/etc/mystartup.service
        content: |
          [Unit]
          After=local-fs.target
          [Service]
          Type=oneshot
          ExecStart=/usr/local/bin/mystartup.sh
          [Install]
          WantedBy=default.target
    runcmd:
      - mkdir -p /home/ubuntu/.config/systemd/user/default.target.wants
      - ln -s /usr/local/etc/mystartup.service /home/ubuntu/.config/systemd/user/default.target.wants/mystartup.service
      - ln -s /usr/local/etc/mystartup.service /home/ubuntu/.config/systemd/user/mystartup.service
      - chown -R ubuntu:ubuntu /home/ubuntu
      - echo 'export WAYLAND_DISPLAY=wayland-0' >> /home/ubuntu/.profile
description: Basic Wayland Jammy
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
  wayland-socket:
    bind: container
    connect: unix:/run/user/1000/wayland-1
    listen: unix:/mnt/wayland-socket
    uid: 1000
    gid: 1000
    type: proxy

It binds the host's Wayland socket (/run/user/1000/wayland-1) to the container at /mnt/wayland-socket, via the wayland-socket device config. Via its cloud config user data, it sets up a systemd service in the container that will run when the ubuntu user logs in, and link the Wayland socket to its usual location in the container (/run/user/1000/wayland-0). This cloud config also adds the WAYLAND_DISPLAY variable to the ubuntu user's .profile, ensuring that Wayland-capable apps will try to access the Wayland socket at that location.

(Note that you may be using a different user ID or Wayland socket number on your own host; run ls /run/user/*/wayland-? to check. If so, change the connect: unix:/run/user/1000/wayland-1 line above to match the actual location of your Wayland socket.)

To set up a profile like this, save it as a file like wayland-basic.yml on the host. Create a new profile with the following command:

$ lxc profile create wayland-basic

And then update the profile with the file's content:

$ cat wayland-basic.yml | lxc profile edit wayland-basic

You can continue to edit the profile and update it with the same lxc profile edit command; LXD will apply your changes to existing containers which use the profile. You can view the latest version of the profile with the following command:

$ lxc profile show wayland-basic

With this profile set up, you can launch a new Ubuntu 22.04 container from it using the following command (the last argument, mycontainer, is the name to use for the new container):

$ lxc launch ubuntu:22.04 --profile wayland-basic mycontainer

Once launched, you can log into an interactive terminal session on the container as the ubuntu user with the following command:

$ lxc exec mycontainer -- sudo -u ubuntu -i

Once logged in, you can install apps into the container; for example, to install LibreOffice Writer (the LibreOffice alternative to Microsoft Word):

ubuntu@mycontainer:~$ sudo apt update
ubuntu@mycontainer:~$ sudo apt install libreoffice-gtk3 libreoffice-writer

Then you can run the app, which should open up in a native Wayland window:

ubuntu@mycontainer:~$ libreoffice

Sharing Folders

This basic profile doesn't have access to the host's filesystem, however. To allow the container to access a specific directory on the host, run the following command on the host:

$ lxc config device add mycontainer mymount disk source=/home/me/Documents/myshare path=/home/ubuntu/mydir

This will mount the source directory from the host (/home/me/Documents/myshare) at the specified path in the container (/home/ubuntu/mydir). LXD's name for the device within the container will be mymount — you can use the device's name in combination with the container's own name to edit or remove the device; and you can mount additional directories if you give each mount device a unique name within the container.

Our basic profile allows only read access to the mounted directory within the container, however, as the directory will be mounted with the nobody user as its owner. To change the owner to the ubuntu user (so you can write to the directory from within the container), shut down the container, change the user ID mapping for its mounts, and then start the container back up again:

$ lxc stop mycontainer
$ lxc config set mycontainer raw.idmap='both 1000 1000'
$ lxc start mycontainer
$ lxc exec mycontainer -- sudo -u ubuntu -i
ubuntu@mycontainer:~$ ls -l mydir

The mounted mydir directory and its contents will now be owned by the ubuntu user, with full read and write access. (If you need to map a host user or group with an ID other than 1000 to the container's ubuntu user, you can do so with the uid and gid directives instead of the both directive; see the LXD idmap documentation for details.)

If you want to use these same settings for all containers that use the same profile, you can add these settings directly to the profile's config:

config:
  boot.autostart: false
  raw.idmap: both 1000 1000
  user.user-data: |
    #cloud-config
    write_files:
      - path: /usr/local/bin/mystartup.sh
        permissions: 0755
        content: |
          #!/bin/sh
          uid=$(id -u)
          run_dir=/run/user/$uid
          mkdir -p $run_dir && chmod 700 $run_dir && chown $uid:$uid $run_dir
          ln -sf /mnt/wayland-socket $run_dir/wayland-0
      - path: /usr/local/etc/mystartup.service
        content: |
          [Unit]
          After=local-fs.target
          [Service]
          Type=oneshot
          ExecStart=/usr/local/bin/mystartup.sh
          [Install]
          WantedBy=default.target
    runcmd:
      - mkdir -p /home/ubuntu/.config/systemd/user/default.target.wants
      - ln -s /usr/local/etc/mystartup.service /home/ubuntu/.config/systemd/user/default.target.wants/mystartup.service
      - ln -s /usr/local/etc/mystartup.service /home/ubuntu/.config/systemd/user/mystartup.service
      - chown -R ubuntu:ubuntu /home/ubuntu
      - echo 'export WAYLAND_DISPLAY=wayland-0' >> /home/ubuntu/.profile
description: Myshare Wayland Jammy
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  mymount:
    source: /home/me/Documents/myshare
    path: /home/ubuntu/mydir
    type: disk
  root:
    path: /
    pool: default
    type: disk
  wayland-socket:
    bind: container
    connect: unix:/run/user/1000/wayland-1
    listen: unix:/mnt/wayland-socket
    uid: 1000
    gid: 1000
    type: proxy

Launcher Script

When an LXD container is running, you don't have to log into it via a terminal session to launch an application in it — you can launch the application directly from the host. The following command will launch LibreOffice directly from the host:

$ lxc exec mycontainer -- sudo -u ubuntu -i libreoffice

So save the following as a shell script on the host (eg mycontainer-libreoffice.sh) and make it executable (eg chmod +x mycontainer-libreoffice.sh), and then you can simply run the script any time you want to launch libreoffice in mycontainer:

#!/bin/sh
lxc info mycontainer 2>/dev/null | grep RUNNING >/dev/null || (lxc start mycontainer; sleep 2)
lxc exec mycontainer -- sudo -u ubuntu -i libreoffice

(Note that if you did not add the WAYLAND_DISPLAY variable to the user's .profile file, or if you added it to the user's .bashrc file instead of .profile, you'll need to include this variable in the launch command like this: lxc exec mycontainer -- sudo WAYLAND_DISPLAY=wayland-0 -u ubuntu -i libreoffice .)

AppArmor Issues

Some Wayland-capable GUI apps may fail to run inside an LXD container due to issues with the app's AppArmor profile; but you may be able to work around this by adjusting the profile. One such app I've encountered is Evince.

A good way to check for AppArmor issues is by tailing the syslog, and filtering on its audit identifier, like with the following command:

$ journalctl -t audit -f

Access denied by AppArmor will look like this:

Aug 25 19:30:07 jp audit[99194]: AVC apparmor="DENIED" operation="connect" namespace="root//lxd-mycontainer_<var-snap-lxd-common-lxd>" profile="/usr/bin/evince" name="/mnt/wayland-socket" pid=99194 comm="evince" requested_mask="wr" denied_mask="wr" fsuid=1001000 ouid=1001000

In the case of Evince, I found I could work around it by adjusting the container's own AppArmor profile for Evince. Run the following commands in the container to grant Evince read/write access to the Wayland socket:

ubuntu@mycontainer:~$ echo '/mnt/wayland-socket wr,' | sudo tee -a /etc/apparmor.d/local/usr.bin.evince
ubuntu@mycontainer:~$ sudo apparmor_parser -r /etc/apparmor.d/usr.bin.evince

The first command adds a line to the user-managed additions of the Evince AppArmor policy (which is usually empty); the second command reloads the packaged version of the policy (a different file), which references the user-managed additions via an include statement.

Browser Quirks

Unfortunately, Firefox and Chromium don't work with the LXD-proxied Wayland socket (at least the Snap-packaged Ubuntu versions of Firefox and Chromium don't). But fortunately, they do work (mostly) when the Wayland socket is shared with them via disk mount.

If you create a new profile like the following, with a disk mount used to share the Wayland socket instead of a network proxy, you can keep the startup script the same as before:

config:
  boot.autostart: false
  raw.idmap: both 1000 1000
  user.user-data: |
    #cloud-config
    write_files:
      - path: /usr/local/bin/mystartup.sh
        permissions: 0755
        content: |
          #!/bin/sh
          uid=$(id -u)
          run_dir=/run/user/$uid
          mkdir -p $run_dir && chmod 700 $run_dir && chown $uid:$uid $run_dir
          ln -sf /mnt/wayland-socket $run_dir/wayland-0
      - path: /usr/local/etc/mystartup.service
        content: |
          [Unit]
          After=local-fs.target
          [Service]
          Type=oneshot
          ExecStart=/usr/local/bin/mystartup.sh
          [Install]
          WantedBy=default.target
    runcmd:
      - mkdir -p /home/ubuntu/.config/systemd/user/default.target.wants
      - ln -s /usr/local/etc/mystartup.service /home/ubuntu/.config/systemd/user/default.target.wants/mystartup.service
      - ln -s /usr/local/etc/mystartup.service /home/ubuntu/.config/systemd/user/mystartup.service
      - chown -R ubuntu:ubuntu /home/ubuntu
      - echo 'export WAYLAND_DISPLAY=wayland-0' >> /home/ubuntu/.profile
description: Browser Wayland Jammy
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
  wayland-socket:
    source: /run/user/1000/wayland-1
    path: /mnt/wayland-socket
    type: disk

Save this profile as a file like wayland-browser.yml. Create a new profile for it, and update the profile from the file's content:

$ lxc profile create wayland-browser
$ cat wayland-browser.yml | lxc profile edit wayland-browser

Launch an Ubuntu 22.04 container with it, log into it, and install a browser:

$ lxc launch ubuntu:22.04 --profile wayland-browser myfirefox
$ lxc exec myfirefox -- sudo -u ubuntu -i
ubuntu@myfirefox:~$ sudo snap install firefox

Once installed, you should be able to start up the browser and have it open in a new Wayland window:

ubuntu@myfirefox:~$ firefox

Using a disk mount instead of a network proxy to share the Wayland socket seems much more flaky, however. I find that I'm not always able to start Firefox back up after quitting from it if I leave its LXD container running (especially if I put the computer to sleep in between quitting and starting again). Also, Firefox's "crash reporter" window, when it appears, seems to trigger a new crash, resulting in a continuous loop of crashes.

So now I always stop and restart the browser's LXD container before starting a new browser session (and I disable the crash reporter). This is what I use for my Firefox launcher script:

#!/bin/sh
lxc info myfirefox 2>/dev/null | grep STOPPED >/dev/null || lxc stop myfirefox
lxc start myfirefox
sleep 3
lxc exec myfirefox -- sudo MOZ_CRASHREPORTER_DISABLE=1 -u ubuntu -i firefox

And this for my Chromium launcher:

#!/bin/sh
lxc info mychromium 2>/dev/null | grep STOPPED >/dev/null || lxc stop mychromium
lxc start mychromium
sleep 3
lxc exec mychromium -- sudo -u ubuntu -i chromium --ozone-platform=wayland

Also, there are a few facets of the browsers that still don't work under this regime — in particular, open/save file dialogs don't appear when you try to download/upload files.

PulseAudio Output

To output audio from an LXD container, bind the host's PulseAudio socket (/run/user/1000/pulse/native) to the container at /mnt/pulse-socket, similar to the original Wayland socket:

config:
  boot.autostart: false
  raw.idmap: both 1000 1000
  user.user-data: |
    #cloud-config
    write_files:
      - path: /usr/local/bin/mystartup.sh
        permissions: 0755
        content: |
          #!/bin/sh
          uid=$(id -u)
          run_dir=/run/user/$uid
          mkdir -p $run_dir && chmod 700 $run_dir && chown $uid:$uid $run_dir
          ln -sf /mnt/wayland-socket $run_dir/wayland-0
          mkdir -p $run_dir/pulse && chmod 700 $run_dir/pulse && chown $uid:$uid $run_dir/pulse
          ln -sf /mnt/pulse-socket $run_dir/pulse/native
      - path: /usr/local/etc/mystartup.service
        content: |
          [Unit]
          After=local-fs.target
          [Service]
          Type=oneshot
          ExecStart=/usr/local/bin/mystartup.sh
          [Install]
          WantedBy=default.target
    runcmd:
      - mkdir -p /home/ubuntu/.config/systemd/user/default.target.wants
      - ln -s /usr/local/etc/mystartup.service /home/ubuntu/.config/systemd/user/default.target.wants/mystartup.service
      - ln -s /usr/local/etc/mystartup.service /home/ubuntu/.config/systemd/user/mystartup.service
      - chown -R ubuntu:ubuntu /home/ubuntu
      - echo 'export WAYLAND_DISPLAY=wayland-0' >> /home/ubuntu/.profile
description: Pulse Wayland Jammy
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
  pulse-socket:
    bind: container
    connect: unix:/run/user/1000/pulse/native
    listen: unix:/mnt/pulse-socket
    uid: 1000
    gid: 1000
    type: proxy
  wayland-socket:
    source: /run/user/1000/wayland-1
    path: /mnt/wayland-socket
    type: disk

Update the startup script to link the PulseAudio socket to its usual location in the container (/run/user/1000/pulse/native) when the ubuntu user logs in, just like we did for the Wayland socket. (Note that the mystartup.sh script's content from the cloud config of this profile is applied only when the container is first created, so you have to manually edit it in any containers that you've already created if you want to update them, too.)

Useful Commands

If you are just getting started with LXD containers, here are a few more useful commands that are good to know:

  • lxc ls: Lists all LXD containers.
  • lxc snapshot mycontainer mysnapshot: Creates a snapshot of mycontainer named mysnapshot.
  • lxc restore mycontainer mysnapshot: Restores mycontainer to the mysnapshot snapshot.
  • lxc delete mycontainer: Deletes mycontainer.
  • lxc storage info default: Shows the space used and available in the default storage pool.
  • lxc config show mycontainer: Shows the container-customized config settings for mycontainer.
  • lxc config show mycontainer -e: Shows all config settings for mycontainer (including those inherited from its profiles).

Thursday, September 23, 2021

Sourcehut Docker Builds on Fedora

Building and running Docker images on builds.sr.ht works nicely with Alpine Linux VMs (example here from Drew DeVault). Tim Schumacher figured out a similar way to set it up with Arch Linux VMs (example here).

I couldn't find an example specifically for Fedora VMs, however. But with a little trial and error, it turns out what you need is pretty similar to Arch — this is what I ended up with:

# .build.yml
image: fedora/34
tasks:
  - install-docker: |
      curl -fsSL https://get.docker.com | sudo bash
      sudo mount -t tmpfs -o size=4G /dev/null /dev/shm
      until [ -e /dev/shm ]; do sleep 1; done
      sudo nohup dockerd --bip 172.18.0.1/16 </dev/null >/dev/null 2>&1 &
      sudo usermod -aG docker $(whoami)
      until sudo docker version >/dev/null 2>&1; do sleep 1; done
  - run-docker: |
      cat <<EOF >Dockerfile
      FROM alpine:latest
      RUN apk add htop
      CMD ["htop"]
      EOF
      docker build .

In the install-docker task, the first line installs the latest version of Docker. The second line sets up the shared-memory mount that Docker requires; and the third line waits until the mount is ready. The fourth line runs the Docker daemon as a background job; and the sixth line waits until the Docker daemon is fully up and initialized.

The fifth line (the usermod command) makes the current user a member of the docker group, so the current user can run Docker commands directly (without sudo). It doesn't take effect within the install-docker task, however — so within the install-docker task, you still have to use sudo to run Docker; but in following tasks (like run-docker), it is in effect — so the example docker build . can be run without sudo.

Friday, June 11, 2021

Send Journald to CloudWatch Logs with Vector

Timber's Vector log collection tool is a nifty Swiss Army knife for collecting and shipping logs and metrics from one system to another. In particular, I think it's the best tool for shipping structured journald events to CloudWatch Logs.

Here's how to start using Vector to send journald log events to CloudWatch:

Grant Permissions to EC2 Roles

In order to push logs (or metrics) from your EC2 instances to CloudWatch, you first need to grant those EC2 instances some CloudWatch permissions. The permissions you need are basically the same as the AWS CloudWatch Agent needs, so just follow the Create IAM roles and users for use with the CloudWatch agent tutorial to assign the AWS-managed CloudWatchAgentServerPolicy to the IAM roles of the EC2 instances from which you plan on shipping journald logs.

The current version of the CloudWatchAgentServerPolicy looks like this:

{ "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "cloudwatch:PutMetricData", "ec2:DescribeVolumes", "ec2:DescribeTags", "logs:PutLogEvents", "logs:DescribeLogStreams", "logs:DescribeLogGroups", "logs:CreateLogStream", "logs:CreateLogGroup" ], "Resource": "*" }, { "Effect": "Allow", "Action": [ "ssm:GetParameter" ], "Resource": "arn:aws:ssm:*:*:parameter/AmazonCloudWatch-*" } ] }

With the Vector configuration described below, however, you actually only need to grant the logs:PutLogEvents, logs:DescribeLogStreams, logs:DescribeLogGroups, logs:CreateLogStream, and logs:CreateLogGroup permissions to your EC2 roles.

Install Vector

Installing Vector is easy on Linux. Timber maintains their own deb repo for Vector, so on a Debian-based distro like Ubuntu, you can just update the system's APT package manager with the Vector signing-key and repo, and install the Vector package:

$ wget https://repositories.timber.io/public/vector/gpg.3543DB2D0A2BC4B8.key -O - | sudo apt-key add -
$ cat <<EOF | sudo tee /etc/apt/sources.list.d/timber-vector.list
deb https://repositories.timber.io/public/vector/deb/ubuntu focal main
deb-src https://repositories.timber.io/public/vector/deb/ubuntu focal main
EOF
$ sudo apt update
$ sudo apt install vector

Configure Vector

The default Vector config file, located at /etc/vector/vector.toml, just includes a sample source and sink, so you can replace it entirely with your own config settings. This is the minimum you need to ship journald logs to CloudWatch:

[sources.my_journald_source]
type = "journald"

[sinks.my_cloudwatch_sink]
type = "aws_cloudwatch_logs"
inputs = ["my_journald_source"]
compression = "gzip"
encoding.codec = "json"
region = "us-east-1"
group_name = "myenv"
stream_name = "mysite/myhost"

Replace the CloudWatch region, group_name, and stream_name settings above with whatever's appropriate for your EC2 instances.

Restart Vector

In one terminal screen, watch for errors by tailing Vector's own log entries with the journalctl -u vector -f command, and in another terminal restart Vector with the sudo systemctl restart vector command. If everything works, this is what you'll see in Vector's own logs:

$ journalctl -u vector -f
Jun 11 19:54:02 myhost systemd[1]: Started Vector.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.008 INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=trace,rdkafka=info"
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.008 INFO vector::sources::host_metrics: PROCFS_ROOT is unset. Using default '/proc' for procfs root.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.008 INFO vector::sources::host_metrics: SYSFS_ROOT is unset. Using default '/sys' for sysfs root.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.010 INFO vector::app: Loading configs. path=[("/etc/vector/vector.toml", Some(Toml))]
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.060 INFO vector::topology: Running healthchecks.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.060 INFO vector::topology: Starting source. name="journald"
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.061 INFO vector::topology: Starting sink. name="aws_cloudwatch_logs"
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.061 INFO vector: Vector has started. version="0.14.0" arch="x86_64" build_id="5f3a319 2021-06-03"
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.062 INFO vector::app: API is disabled, enable by setting `api.enabled` to `true` and use commands like `vector top`.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.063 INFO journald-server: vector::sources::journald: Starting journalctl.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.128 INFO vector::sinks::aws_cloudwatch_logs: Skipping healthcheck log group check: `group_name` will be created if missing.
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.430 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs: Sending events. events=4
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.453 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Log group provided does not exist; creating a new one.
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.489 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Group created. name=myenv
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.507 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Stream created. name=mysite/myhost
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.523 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Putting logs. token=None
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.560 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Putting logs was successful. next_token=Some("49610241853835534178700884863462197886393926766970915618")

If something went wrong, Vector will output some error messages (these are especially helpful as you add transformation steps to your basic Vector configuration).

Check Your CloudWatch Logs

Vector will have also shipped some logs to CloudWatch, so check them now. If you use a command-line tool like Saw, you'll see some log events like this:

$ saw watch myenv --expand --prefix mysite
[2021-06-11T13:00:27-07:00] (myhost) {
  "PRIORITY": "6",
  "SYSLOG_FACILITY": "3",
  "SYSLOG_IDENTIFIER": "uwsgi",
  "_BOOT_ID": "6cb87d254d3742728b4fe20e746bcbe6",
  "_CAP_EFFECTIVE": "0",
  "_CMDLINE": "/usr/bin/uwsgi /etc/myapp/uwsgi.ini",
  "_COMM": "uwsgi",
  "_EXE": "/usr/bin/uwsgi-core",
  "_GID": "33",
  "_MACHINE_ID": "ec2aff1204bfae2781faf97e68afb1d4",
  "_PID": "363",
  "_SELINUX_CONTEXT": "unconfined\n",
  "_STREAM_ID": "aa261772c2e74663a7bb122c24b92e64",
  "_SYSTEMD_CGROUP": "/system.slice/myapp.service",
  "_SYSTEMD_INVOCATION_ID": "b5e117501bbb43428ab7565659022c20",
  "_SYSTEMD_SLICE": "system.slice",
  "_SYSTEMD_UNIT": "myapp.service",
  "_TRANSPORT": "stdout",
  "_UID": "33",
  "__MONOTONIC_TIMESTAMP": "511441719050",
  "__REALTIME_TIMESTAMP": "1623441627906124",
  "host": "myhost",
  "message": "[pid: 363|app: 0|req: 501/501] 203.0.113.2 () {34 vars in 377 bytes} [Fri Jun 11 20:00:27 2021] HEAD / =< generated 0 bytes in 0 msecs (HTTP/1.1 200) 2 headers in 78 bytes (0 switches on core 0)",
  "source_type": "journald"
}

With Saw, use the saw watch command to tail log events as they come in, and use the saw get command to get historical events. For example, this command will print the last 10 minutes of events using the mysite log stream prefix from the myenv log group:

$ saw get myenv --expand --pretty --prefix mysite --start -10m
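If you'd rather query CloudWatch without installing Saw, a few lines of boto3 can pull recent events from the same log group (a minimal sketch, using the group and stream names configured above):

# fetch_logs.py: minimal boto3 sketch for the log group/stream used above
import boto3

logs = boto3.client("logs", region_name="us-east-1")
response = logs.filter_log_events(
    logGroupName="myenv",
    logStreamNamePrefix="mysite",
    limit=10,
)
for event in response["events"]:
    print(event["message"])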

Filter and Remap Your Logs

With that working, you can tune your Vector configuration to filter out log events you don't care about, and remap certain log fields into a more useful format. Let's add two "transform" steps to our /etc/vector/vector.toml file between the Journald Source and the AWS CloudWatch Logs Sink: a Filter transform, and a Remap transform:

[sources.my_journald_source]
type = "journald"

[transforms.my_journald_filter]
type = "filter"
inputs = ["my_journald_source"]
condition = '''
(includes(["0", "1", "2", "3", "4"], .PRIORITY) ||
    includes(["systemd", "uwsgi"], .SYSLOG_IDENTIFIER))
'''

[transforms.my_journald_remap]
type = "remap"
inputs = ["my_journald_filter"]
source = '''
.app = .SYSLOG_IDENTIFIER
.datetime = to_timestamp(round((to_int(.__REALTIME_TIMESTAMP) ?? 0) / 1000000 ?? 0))
.facility = to_syslog_facility(to_int(.SYSLOG_FACILITY) ?? 0) ?? ""
.severity = to_int(.PRIORITY) ?? 0
.level = to_syslog_level(.severity) ?? ""
'''

[sinks.my_cloudwatch_sink]
type = "aws_cloudwatch_logs"
inputs = ["my_journald_remap"]
compression = "gzip"
encoding.codec = "json"
region = "us-east-1"
group_name = "myenv"
stream_name = "mysite/myhost"

In the above pipeline, the my_journald_source step pipes to the my_journald_filter step, which pipes to the my_journald_remap step, which pipes to the my_cloudwatch_sink step (each receiving step names its upstream step via its inputs setting). The condition VRL expression in the filter step drops entries unless the entry's PRIORITY field is less than 5 (i.e. "emerg", "alert", "crit", "err", or "warning"), or unless the entry's SYSLOG_IDENTIFIER field is "systemd" or "uwsgi". And the source VRL program in the remap step adds some conveniently-formatted fields (app, datetime, facility, severity, and level) to each log entry (the ?? operator in the source coerces "fallible" expressions to a default value when they would otherwise throw an error).

Now if you restart Vector and check your CloudWatch logs, you'll see fewer unimportant entries (the filter drops entries that are both less severe than "warning" and from sources other than systemd or uwsgi), plus the additional fields we added:

$ saw watch myenv --expand --prefix mysite
[2021-06-11T13:00:27-07:00] (myhost) {
  "PRIORITY": "6",
  "SYSLOG_FACILITY": "3",
  "SYSLOG_IDENTIFIER": "uwsgi",
  "_BOOT_ID": "6cb87d254d3742728b4fe20e746bcbe6",
  "_CAP_EFFECTIVE": "0",
  "_CMDLINE": "/usr/bin/uwsgi /etc/myapp/uwsgi.ini",
  "_COMM": "uwsgi",
  "_EXE": "/usr/bin/uwsgi-core",
  "_GID": "33",
  "_MACHINE_ID": "ec2aff1204bfae2781faf97e68afb1d4",
  "_PID": "363",
  "_SELINUX_CONTEXT": "unconfined\n",
  "_STREAM_ID": "aa261772c2e74663a7bb122c24b92e64",
  "_SYSTEMD_CGROUP": "/system.slice/myapp.service",
  "_SYSTEMD_INVOCATION_ID": "b5e117501bbb43428ab7565659022c20",
  "_SYSTEMD_SLICE": "system.slice",
  "_SYSTEMD_UNIT": "myapp.service",
  "_TRANSPORT": "stdout",
  "_UID": "33",
  "__MONOTONIC_TIMESTAMP": "511441719050",
  "__REALTIME_TIMESTAMP": "1623441627906124",
  "app": "uwsgi",
  "datetime": "2021-06-11T20:00:27Z",
  "facility": "daemon",
  "host": "myhost",
  "level": "info",
  "message": "[pid: 363|app: 0|req: 501/501] 203.0.113.2 () {34 vars in 377 bytes} [Fri Jun 11 20:00:27 2021] HEAD / =< generated 0 bytes in 0 msecs (HTTP/1.1 200) 2 headers in 78 bytes (0 switches on core 0)",
  "severity": 6,
  "source_type": "journald"
}

And we can use the new fields we added to further filter our output from Saw, as well as print compact log lines with jq:

$ saw watch myenv --raw --prefix mysite --filter '{ $.severity < 4 || $.app = "uwsgi" }' | jq --unbuffered -r '[.datetime, .level, .host, .app, .message] | join(" ")'
2021-06-11T20:00:27Z info myhost uwsgi [pid: 363|app: 0|req: 501/501] 203.0.113.2 () {34 vars in 377 bytes} [Fri Jun 11 20:00:27 2021] HEAD / =< generated 0 bytes in 0 msecs (HTTP/1.1 200) 2 headers in 78 bytes (0 switches on core 0)

Remove Irrelevant Fields

You can also use Vector's remap transform to remove extraneous fields that you don't want to ship to and store in CloudWatch. Use the del function to delete specific fields from each event. For example, to drop the journald fields that duplicate the custom fields we added:

source = '''
.app = .SYSLOG_IDENTIFIER
.datetime = to_timestamp(round((to_int(.__REALTIME_TIMESTAMP) ?? 0) / 1000000 ?? 0))
.facility = to_syslog_facility(to_int(.SYSLOG_FACILITY) ?? 0) ?? ""
.severity = to_int(.PRIORITY) ?? 0
.level = to_syslog_level(.severity) ?? ""
del(.PRIORITY)
del(.SYSLOG_IDENTIFIER)
del(.SYSLOG_FACILITY)
'''

Or you could replace the original event entirely with a new object that contains just your desired fields:

source = '''
e = {}
e.app = .SYSLOG_IDENTIFIER
e.cgroup = ._SYSTEMD_CGROUP
e.cmd = ._CMDLINE
e.facility = to_int(.SYSLOG_FACILITY) ?? 0
e.gid = to_int(._GID) ?? 0
e.host = .host
e.message = .message
e.monotime = to_int(.__MONOTONIC_TIMESTAMP) ?? 0
e.pid = to_int(._PID) ?? 0
e.realtime = to_int(.__REALTIME_TIMESTAMP) ?? 0
e.datetime = to_timestamp(round(e.realtime / 1000000 ?? 0))
e.severity = to_int(.PRIORITY) ?? 0
e.level = to_syslog_level(e.severity) ?? ""
e.uid = to_int(._UID) ?? 0
. = [e]
'''

If you change your Vector pipeline to remap events like the above and restart Vector, you'll now see log events with only the following fields shipped to CloudWatch:

$ saw watch myenv --expand --prefix mysite
[2021-06-11T13:00:27-07:00] (myhost) {
  "app": "uwsgi",
  "cgroup": "/system.slice/myapp.service",
  "cmd": "/usr/bin/uwsgi /etc/myapp/uwsgi.ini",
  "datetime": "2021-06-11T20:00:27Z",
  "facility": 3,
  "gid": 33,
  "host": "myhost",
  "level": "info",
  "message": "[pid: 363|app: 0|req: 501/501] 203.0.113.2 () {34 vars in 377 bytes} [Fri Jun 11 20:00:27 2021] HEAD / =< generated 0 bytes in 0 msecs (HTTP/1.1 200) 2 headers in 78 bytes (0 switches on core 0)",
  "monotime": 511441719050,
  "pid": 363,
  "realtime": 1623441627906124,
  "severity": 6,
  "uid": 33
}


Edit 4/23/2022: As of Vector 0.21.1, the rounding shown in the to_timestamp examples is no longer fallible — but the to_timestamp function itself is. So the to_timestamp examples should now look like the following:

e.datetime = to_timestamp(round(e.realtime / 1000000)) ?? now()

Monday, April 19, 2021

Elixir AWS SDK

While AWS doesn't provide an SDK directly for Erlang or Elixir, the AWS for the BEAM project has built a nice solution for this — a code generator that uses the JSON API definitions from the official AWS Go SDK to create native Erlang and Elixir AWS SDK bindings. The result for Elixir is the nifty aws-elixir library.

The aws-elixir library itself doesn't have the automagic functionality found in other AWS SDKs that pulls AWS credentials from various sources (environment variables, profile files, IAM roles for ECS tasks or EC2 instances, etc). However, the AWS for the BEAM project has another library you can use for that: aws_credentials. Here's how to use aws-elixir in combination with aws_credentials for a standard Mix project:

1. Add aws dependencies

First, add the aws, aws_credentials, and hackney libraries as dependencies to your mix.exs file:

# mix.exs
defp deps do
  [
    {:aws, "~> 0.8.0"},
    {:aws_credentials, git: "https://github.com/aws-beam/aws_credentials", ref: "0.1.1"},
    {:hackney, "~> 1.17"}
  ]
end

2. Set up AWS.Client struct

Next, set up aws-elixir's AWS.Client struct with the AWS credentials found by the :aws_credentials.get_credentials/0 function. In this example, I'm going to create a simple MyApp.AwsUtils module, with a client/0 function that I can call from anywhere else in my app to initialize the AWS.Client struct:

# lib/my_app/aws_utils.ex
defmodule MyApp.AwsUtils do
  @doc """
  Creates a new AWS.Client with default settings.
  """
  @spec client() :: AWS.Client.t()
  def client, do: :aws_credentials.get_credentials() |> build_client()

  defp build_client(%{access_key_id: id, secret_access_key: key, token: "", region: region}) do
    AWS.Client.create(id, key, region)
  end

  defp build_client(%{access_key_id: id, secret_access_key: key, token: token, region: region}) do
    AWS.Client.create(id, key, token, region)
  end

  defp build_client(credentials), do: struct(AWS.Client, credentials)
end

The aws_credentials library will handle caching for you, so you don't need to separately cache the credentials it returns — just call get_credentials/0 every time you need them. By default, it will first check for the standard AWS environment variables (AWS_ACCESS_KEY_ID etc), then for the standard credentials file (~/.aws/credentials), then for ECS task credentials, and then for credentials from the EC2 metadata service.
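
As a quick sanity check, you can call get_credentials/0 yourself from iex. The credential values below are invented, but the key shape is the one matched by the build_client/1 clauses above (and the library may return :undefined instead of a map if no provider finds credentials):

iex> :aws_credentials.get_credentials()
%{
  access_key_id: "ABCDEFGHIJKLMNOPQRST",
  secret_access_key: "01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ/+a",
  token: "",
  region: "us-east-1"
}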

So the above example will work if on one system you configure the environment variables for your Elixir program like this:

# .env
AWS_DEFAULT_REGION=us-east-1
AWS_ACCESS_KEY_ID=ABCDEFGHIJKLMNOPQRST
AWS_SECRET_ACCESS_KEY=01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ/+a
AWS_SESSION_TOKEN=

And on another system you configure the user account running your Elixir program with a ~/.aws/credentials file like this:

# ~/.aws/credentials
[default]
aws_access_key_id = ABCDEFGHIJKLMNOPQRST
aws_secret_access_key = 01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ/+a

And when running the Elixir program in an ECS task or EC2 instance, it will automatically pick up the credentials configured for the ECS task or EC2 instance under which the program is running.

If you do use a credentials file, you can customize the path to the file, or the profile within it, via the :provider_options configuration parameter, like so:

# config/config.exs
config :aws_credentials, :provider_options, %{
  credential_path: "/home/me/.aws/config",
  profile: "myprofile"
}

Some caveats with the current aws_credentials implementation are:

  1. With environment variables, you can specify the region (via the AWS_DEFAULT_REGION or AWS_REGION variable) only if you also specify the session token (via the AWS_SESSION_TOKEN or AWS_SECURITY_TOKEN variable).
  2. With credential files, the region and aws_session_token settings won't be picked up (see the sketch below for one way to supply a region yourself).
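
For example, if your credentials come from a file but you still need a region on the client, you could add a client/1 variant to the MyApp.AwsUtils module above that injects the region before building the client. This is just a minimal sketch of mine (the client/1 function is not part of aws_credentials or aws-elixir):

# lib/my_app/aws_utils.ex (hypothetical addition)
@doc """
Creates a new AWS.Client, overriding the region.
"""
@spec client(String.t()) :: AWS.Client.t()
def client(region) do
  :aws_credentials.get_credentials()
  # force the region, since file-based credentials won't include one
  |> Map.put(:region, region)
  |> build_client()
end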

3. Call AWS.* module functions

Now you can go ahead and call any AWS SDK function. In this example, I'm going to create a get_my_special_file/0 function to get the contents of a file from S3:

# lib/my_app/my_files.ex
defmodule MyApp.MyFiles do
  @doc """
  Gets the content of my special file from S3.
  """
  @spec get_my_special_file() :: binary
  def get_my_special_file do
    client = MyApp.AwsUtils.client()
    bucket = "my-bucket"
    key = "my/special/file.txt"
    {:ok, %{"Body" => body}, %{status_code: 200}} = AWS.S3.get_object(client, bucket, key)
    body
  end
end

For any AWS SDK function, you can use the Hex docs to guide you as to the Elixir function signature, the Go docs for any structs not explained in the Hex docs, and the AWS docs for more details and examples. For example, here are the docs for the get_object function used above:

  1. Hex docs for AWS.S3.get_object/22
  2. Go docs for S3.GetObject
  3. AWS docs for S3 GetObject

The general response format from each aws-elixir SDK function is this:

# successful response
{
  :ok,
  map_of_parsed_response_body_with_string_keys,
  %{body: body_binary, headers: list_of_string_header_tuples, status_code: integer}
}

# error response
{
  :error,
  {
    :unexpected_response,
    %{body: body_binary, headers: list_of_string_header_tuples, status_code: integer}
  }
}

With the AWS.S3.get_object/22 example above, a successful response will look like this:

iex> AWS.S3.get_object(MyApp.AwsUtils.client(), "my-bucket", "my/special/file.txt")
{:ok,
 %{
   "Body" => "my special file content\n",
   "ContentLength" => "24",
   "ContentType" => "text/plain",
   "ETag" => "\"00733c197e5877adf705a2ec6d881d44\"",
   "LastModified" => "Wed, 14 Apr 2021 19:05:34 GMT"
 },
 %{
   body: "my special file content\n",
   headers: [
     {"x-amz-id-2", "ouJJOzsesw0m24Y6SCxtnDquPbo4rg0BwSORyMn3lOJ8PIeptboR8ozKgIwuPGRAtRPyRIPi6Dk="},
     {"x-amz-request-id", "P9ZVDJ2L378Q3EGX"},
     {"Date", "Wed, 14 Apr 2021 20:40:46 GMT"},
     {"Last-Modified", "Wed, 14 Apr 2021 19:05:34 GMT"},
     {"ETag", "\"00733c197e59877ad705a2ec6d881d44\""},
     {"Accept-Ranges", "bytes"},
     {"Content-Type", "text/plain"},
     {"Content-Length", "24"},
     {"Server", "AmazonS3"}
   ],
   status_code: 200
 }}

And an error response will look like this:

iex> AWS.S3.get_object(MyApp.AwsUtils.client(), "my-bucket", "not/my/special/file.txt")
{:error,
 {:unexpected_response,
  %{
    body: "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>FJWGFYKL44AB4XZK</RequestId><HostId>G4mzxVPQdjFsHpErTWZhG7djVLks1Vu7RLLYS37XA38c6JsAaJs+QMp3bR3Vm9aKhoWBuS/Mk6Y=</HostId></Error>",
    headers: [
      {"x-amz-request-id", "FJWGFYKL44AB4XZK"},
      {"x-amz-id-2", "G4mzxVPQdjFsHpErTWZhG7djVLks1Vu7RLLYS37XA38c6JsAaJs+QMp3bR3Vm9aKhoWBuS/Mk6Y="},
      {"Content-Type", "application/xml"},
      {"Transfer-Encoding", "chunked"},
      {"Date", "Wed, 14 Apr 2021 19:25:01 GMT"},
      {"Server", "AmazonS3"}
    ],
    status_code: 403
  }}}
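
Since both shapes are ordinary tuples, a function can pattern-match on them rather than assert success the way get_my_special_file/0 does above. Here's a minimal sketch of mine (the fetch_my_special_file/0 name is hypothetical, not from the libraries):

# lib/my_app/my_files.ex (hypothetical error-tolerant variant)
@spec fetch_my_special_file() :: {:ok, binary} | {:error, {integer, binary}}
def fetch_my_special_file do
  client = MyApp.AwsUtils.client()

  case AWS.S3.get_object(client, "my-bucket", "my/special/file.txt") do
    # success: return just the parsed body
    {:ok, %{"Body" => body}, %{status_code: 200}} ->
      {:ok, body}

    # error: surface the HTTP status and raw body to the caller
    {:error, {:unexpected_response, %{status_code: status, body: body}}} ->
      {:error, {status, body}}
  end
end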

Friday, March 26, 2021

Elixir Systemd Logging

If you run an Elixir application as a Linux service with systemd, you'll probably find that logging works pretty well out of the box. By default, Elixir uses the Console logger backend, which sends all log messages to stdout. And with systemd services, by default all stdout messages are sent to journald.

This means you can view your application's logs easily via the journalctl command. For example, you can "tail" your app's logs with a command like this (if the systemd unit for the app was named my_app):

journalctl -u my_app -f

You can also configure systemd to send your app's stdout to a custom log file instead of journald, using the StandardOutput directive. You can add that directive to the [Service] section of a systemd unit file (for example, to log to a custom /var/log/my_app.log):

# /etc/systemd/system/my_app.service
[Service]
ExecStart=/srv/my_app/bin/my_app start
ExecStop=/srv/my_app/bin/my_app stop
StandardOutput=append:/var/log/my_app.log

Problems

If you collect and ship your log messages off to a centralized log service (like AWS CloudWatch, Google Cloud Logging, Azure Monitor, Splunk, Sumologic, Elasticsearch, Loggly, Datadog, New Relic, etc), you'll find two problems with this, however:

  1. Multi-line messages are broken up into a separate log entry for each line
  2. Log level/priority is lost

You can add some steps further down your logging pipeline to try to correct this, but the easiest way to fix it is at the source: Replace the default Console logger with the ExSyslogger backend.

Here's how you'd do that with a Phoenix web app:

1. Add the ex_syslogger dependency

First, add the ex_syslogger library as a dependency to your mix.exs file:

# mix.exs
defp deps do
  [
    {:ex_syslogger, "~> 1.5"}
  ]
end

2. Register the ex_syslogger backend

Update the root config :logger options in your config/prod.exs file to register the ExSyslogger backend under the name :ex_syslogger:

# config/prod.exs

# before:
# Do not print debug messages in production
config :logger, level: :info

# after, with the ExSyslogger backend registered:
config :logger,
  level: :info,
  backends: [{ExSyslogger, :ex_syslogger}]

Note that the :ex_syslogger name isn't special (you can call it whatever you want); it just has to match the name you use in the next step:

3. Configure the ex_syslogger backend

Now add config :logger, :ex_syslogger options to your config/config.exs file to configure the backend named :ex_syslogger that you registered above. I'd suggest just duplicating the configuration you already have for the default :console backend, plus setting the syslog APP-NAME field to your app's name via the ident option:

# config/config.exs
# Configures Elixir's Logger
config :logger, :console,
  format: "$time $metadata[$level] $message\n",
  metadata: [:request_id]

config :logger, :ex_syslogger,
  format: "$time $metadata[$level] $message\n",
  metadata: [:request_id],
  ident: "my_app"

Result

Now when you compile your app with MIX_ENV=prod and run it as a systemd service, journald will automatically handle multi-line messages and log levels/priorities correctly. Furthermore, you can use any generic syslog collector to ship log entries to your log service as soon as they occur — with multi-line messages and log levels intact.
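
To see the difference yourself, you can emit a multi-line, error-level message from anywhere in your app (a throwaway example of mine; the message text is made up):

# e.g. from iex -S mix
require Logger
Logger.error("something failed\nwith a second line of detail")

With the default Console backend, journald would record that as two separate stdout entries at the default info priority; with ExSyslogger, it arrives as a single entry at the err priority.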

For example, when using the default Console logger, an error message from a Phoenix web app would have been displayed like this by journalctl:

$ journalctl -u my_app -f
Mar 26 18:21:10 foo my_app[580361]: 18:21:10.337 request_id=Fm_3dFhPMtEHARkAAALy [info] Sent 500 in 16ms
Mar 26 18:21:10 foo my_app[580361]: 18:21:10.345 [error] #PID<0.4149.0> running MyAppWeb.Endpoint (connection #PID<0.4148.0>, stream id 1) terminated
Mar 26 18:21:10 foo my_app[580361]: Server: foo.example.com:443 (https)
Mar 26 18:21:10 foo my_app[580361]: Request: GET /test/error
Mar 26 18:21:10 foo my_app[580361]: ** (exit) an exception was raised:
Mar 26 18:21:10 foo my_app[580361]:     ** (RuntimeError) test runtime error
Mar 26 18:21:10 foo my_app[580361]:         (my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:9: MyAppWeb.TestController.error/2
Mar 26 18:21:10 foo my_app[580361]:         (my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:1: MyAppWeb.TestController.action/2
Mar 26 18:21:10 foo my_app[580361]:         (my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:1: MyAppWeb.TestController.phoenix_controller_pipeline/2
Mar 26 18:21:10 foo my_app[580361]:         (phoenix 1.5.8) lib/phoenix/router.ex:352: Phoenix.Router.__call__/2
Mar 26 18:21:10 foo my_app[580361]:         (my_app 0.1.0) lib/my_app_web/endpoint.ex:1: MyAppWeb.Endpoint.plug_builder_call/2
Mar 26 18:21:10 foo my_app[580361]:         (my_app 0.1.0) lib/my_app_web/endpoint.ex:1: MyAppWeb.Endpoint.call/2
Mar 26 18:21:10 foo my_app[580361]:         (phoenix 1.5.8) lib/phoenix/endpoint/cowboy2_handler.ex:65: Phoenix.Endpoint.Cowboy2Handler.init/4
Mar 26 18:21:10 foo my_app[580361]:         (cowboy 2.8.0) /srv/my_app/deps/cowboy/src/cowboy_handler.erl:37: :cowboy_handler.execute/2

But with ExSyslogger in place, you'll now see this (where the full error message is captured as a single log entry, and is recognized as an error-level message):

$ journalctl -u my_app -f
Mar 26 18:21:10 foo my_app[580361]: 18:21:10.337 request_id=Fm_3dFhPMtEHARkAAALy [info] Sent 500 in 16ms
Mar 26 18:21:10 foo my_app[580361]: 18:21:10.345 [error] #PID<0.4149.0> running MyAppWeb.Endpoint (connection #PID<0.4148.0>, stream id 1) terminated
    Server: foo.example.com:443 (https)
    Request: GET /test/error
    ** (exit) an exception was raised:
        ** (RuntimeError) test runtime error
            (my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:9: MyAppWeb.TestController.error/2
            (my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:1: MyAppWeb.TestController.action/2
            (my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:1: MyAppWeb.TestController.phoenix_controller_pipeline/2
            (phoenix 1.5.8) lib/phoenix/router.ex:352: Phoenix.Router.__call__/2
            (my_app 0.1.0) lib/my_app_web/endpoint.ex:1: MyAppWeb.Endpoint.plug_builder_call/2
            (my_app 0.1.0) lib/my_app_web/endpoint.ex:1: MyAppWeb.Endpoint.call/2
            (phoenix 1.5.8) lib/phoenix/endpoint/cowboy2_handler.ex:65: Phoenix.Endpoint.Cowboy2Handler.init/4
            (cowboy 2.8.0) /srv/my_app/deps/cowboy/src/cowboy_handler.erl:37: :cowboy_handler.execute/2

And as a side note, you can use journalctl to view just error-level messages and above via the --priority=err flag (-p3 for short):

journalctl -u my_app -p3