<h1>SWWOMM: Software Which Works on My Machine</h1>
<h2>LXD Containers and FIDO Security Keys</h2><p>With the rise of <a href="https://webauthn.guide/">WebAuthn</a>, I've had to figure out how to expose my various <a href="https://fidoalliance.org/how-fido-works/">FIDO</a> security keys (<a href="https://www.yubico.com/">YubiKey</a>, <a href="https://www.nitrokey.com/">Nitrokey</a>, <a href="https://onlykey.io/">OnlyKey</a>, <a href="https://solokeys.com/">SoloKeys</a>, etc) to the LXD containers I use for <a href="/2022/08/lxd-containers-for-wayland-gui-apps.html">web browsers</a>.</p>
<p>The core of the solution is to expose the <a href="https://docs.kernel.org/hid/hidraw.html">HIDRAW</a> device that the security key is using to the LXD container — and to configure the device in the container to be owned by the user account who will use it. If you only have one such key plugged in, it's most likely using the <tt>/dev/hidraw0</tt> device; and usually it's user <tt>1000</tt> who needs to use it. An LXD <a href="https://documentation.ubuntu.com/lxd/en/latest/profiles/">profile</a> entry like the following allows such access:</p>
<blockquote><pre>
config: {}
description: exposes FIDO devices
devices:
  hidraw0:
    required: false
    source: /dev/hidraw0
    type: unix-char
    uid: "1000"
name: fido
used_by: []
</pre></blockquote>
<p>A profile like this can be created, configured, and applied to a container with the following commands:</p>
<blockquote><pre>
$ lxc profile create fido
Profile fido created
$ lxc profile device add fido hidraw0 unix-char required=false source=/dev/hidraw0 uid=1000
$ lxc profile add mycontainer fido
Profile fido added to mycontainer
</pre></blockquote>
<p>However, the exact HIDRAW device number that a particular security key uses is not stable, and may vary as you plug and unplug various keys (or other USB or Bluetooth devices). How do you tell which HIDRAW device is being used by a particular physical device? The simplest way is to print out the content of the <tt>uevent</tt> pseudo file in the <a href="https://www.man7.org/linux/man-pages/man5/sysfs.5.html">sysfs</a> filesystem corresponding to each HIDRAW device until you find the one you want. For example, this is what the entry for one of my SoloKeys looks like, at <tt>hidraw11</tt>:</p>
<blockquote><pre>
$ cat /sys/class/hidraw/hidraw11/device/uevent
DRIVER=hid-generic
HID_ID=0003:00001209:0000BEEE
HID_NAME=SoloKeys Solo 2 Security Key
HID_PHYS=usb-0000:00:14.0-4/input1
HID_UNIQ=1234567890ABCDEF1234567890ABCDEF
MODALIAS=hid:b0003g0001v00001209p0000BEEE
</pre></blockquote>
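<p>If you have several HIDRAW devices plugged in, a quick shell loop like the following (just a convenience, not required) will dump the <tt>uevent</tt> details for all of them at once, so you can pick out the one you're looking for:</p>
<blockquote><pre>
$ for f in /sys/class/hidraw/hidraw*/device/uevent; do echo "== $f"; cat "$f"; done
</pre></blockquote>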
<p>You can also get similar information — without the specific device name, but with the general type of device, like <tt>FIDO_TOKEN</tt> — from the <a href="https://www.man7.org/linux/man-pages/man8/udevadm.8.html">udevadm</a> command:</p>
<blockquote><pre>
$ udevadm info /dev/hidraw11
P: /devices/pci0000:00/0000:00:24.0/usb1/2-4/2-4:1.4/0003:1209:BEEE.0022/hidraw/hidraw11
N: hidraw11
L: 0
E: DEVPATH=/devices/pci0000:00/0000:00:24.0/usb1/2-4/2-4:1.4/0003:1209:BEEE.0022/hidraw/hidraw11
E: DEVNAME=/dev/hidraw11
E: MAJOR=232
E: MINOR=12
E: SUBSYSTEM=hidraw
E: USEC_INITIALIZED=123456789010
E: ID_FIDO_TOKEN=1
E: ID_SECURITY_TOKEN=1
E: ID_PATH=pci-0000:00:24.0-usb-0:4:1.4
E: ID_PATH_TAG=pci-0000_00_24_0-usb-0_4_1_4
E: ID_FOR_SEAT=hidraw-pci-0000_00_24_0-usb-0_4_1_4
E: TAGS=:uaccess:seat:snap_firefox_geckodriver:security-device:snap_firefox_firefox:
E: CURRENT_TAGS=:uaccess:seat:snap_firefox_geckodriver:security-device:snap_firefox_firefox:
</pre></blockquote>
<p>Using the <tt>udevadm info</tt> and <tt>lxc profile device list</tt> commands, you can write a simple script that checks each <tt>/dev/hidraw*</tt> device on your host system against the HIDRAW devices registered for a particular LXD profile, and adds or removes HIDRAW devices dynamically from that profile to match the current FIDO devices you have plugged in. Here's such a script:</p>
<blockquote><pre>
#!/bin/sh -eu
profile=${1:-fido}
existing=$(lxc profile device list $profile)
for dev_path in /dev/hidraw*; do
    dev_name=$(basename $dev_path)
    if udevadm info $dev_path | grep FIDO >/dev/null; then
        if ! echo "$existing" | egrep '^'$dev_name'$' >/dev/null; then
            lxc profile device add $profile $dev_name \
                unix-char required=false source=$dev_path uid=1000
        fi
    else
        if echo "$existing" | egrep '^'$dev_name'$' >/dev/null; then
            lxc profile device remove $profile $dev_name
        fi
    fi
done
echo done
</pre></blockquote>
<p>You can run the script manually every time you plug in a new security key, to make sure the security key is registered at the right HIDRAW slot in your LXD profile — or you can add a custom <a href="https://www.man7.org/linux/man-pages/man7/udev.7.html#RULES_FILES">udev rule file</a> to run it automatically.</p>
<p>If you save the above script as <tt>/usr/local/bin/add-fido-hidraw-devices-to-lxc-profile.sh</tt>, you can then add the below file as <tt>/etc/udev/rules.d/75-fido.rules</tt> (replacing <tt>justin</tt> with the username of your daily user) to automatically run the script for several different brands of FIDO security keys:</p>
<blockquote><pre>
# Nitrokey 3
SUBSYSTEM=="hidraw", KERNEL=="hidraw*", ATTRS{idVendor}=="20a0", ATTRS{idProduct}=="42b2", RUN+="/bin/su justin -c /usr/local/bin/add-fido-hidraw-devices-to-lxc-profile.sh"
# OnlyKey
SUBSYSTEM=="hidraw", KERNEL=="hidraw*", ATTRS{idVendor}=="1d50", ATTRS{idProduct}=="60fc", RUN+="/bin/su justin -c /usr/local/bin/add-fido-hidraw-devices-to-lxc-profile.sh"
# SoloKeys
SUBSYSTEM=="hidraw", KERNEL=="hidraw*", ATTRS{idVendor}=="1209", ATTRS{idProduct}=="5070|50b0|beee", RUN+="/bin/su justin -c /usr/local/bin/add-fido-hidraw-devices-to-lxc-profile.sh"
# Yubico YubiKey
SUBSYSTEM=="hidraw", KERNEL=="hidraw*", ATTRS{idVendor}=="1050", ATTRS{idProduct}=="0113|0114|0115|0116|0120|0121|0200|0402|0403|0406|0407|0410", RUN+="/bin/su justin -c /usr/local/bin/add-fido-hidraw-devices-to-lxc-profile.sh"
</pre></blockquote>
<p>Run the <tt>sudo udevadm control --reload-rules</tt> and <tt>sudo udevadm trigger</tt> commands to reload your udev rule files and trigger them for your currently plugged-in devices. If you use a different brand of security key, you can probably find its vendor and product IDs in the <a href="https://github.com/Yubico/libfido2/blob/main/udev/70-u2f.rules">libfido2 udev rules file</a> (or you can figure it out from the output of the <tt>udevadm info</tt> command).</p>
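<p>Another easy way to find a key's vendor and product IDs is the <tt>lsusb</tt> command; the two hex values after <tt>ID</tt> are the vendor and product IDs to use in your udev rule. The output below is just an illustration for a YubiKey:</p>
<blockquote><pre>
$ lsusb
...
Bus 001 Device 004: ID 1050:0407 Yubico.com Yubikey 4/5 OTP+U2F+CCID
...
</pre></blockquote>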
<h2>LXD Containers for Wayland GUI Apps</h2><p>Having upgraded my home computers to <a href="https://wiki.ubuntu.com/Releases">Ubuntu 22.04</a>, which features the latest version of <a href="https://linuxcontainers.org/lxd/">LXD</a> (5.5) via <a href="https://snapcraft.io/">Snap</a>, and using <a href="https://wayland.freedesktop.org/">Wayland</a> (via the <a href="https://swaywm.org/">Sway</a> window manager), I spent some time working out how to run Wayland-native GUI apps in an LXD container. With the help of a few posts (<a href="https://blog.simos.info/running-x11-software-in-lxd-containers/">Running X11 Software in LXD Containers</a>, <a href="https://gist.github.com/stueja/447bd3bc0d510a0a7e50f9f1ef58ad75">GUI Application via Wayland From Ubuntu LXD Container on Arch Linux Host</a>, and <a href="https://discuss.linuxcontainers.org/t/howto-use-the-hosts-wayland-and-xwayland-servers-inside-containers/8765">Howto Use the Host's Wayland and XWayland Servers Inside Containers</a>), I was able to get this working quite nicely.</p>
<h3>Basic Profile</h3>
<p>Most apps I tried, like <a href="https://www.libreoffice.org/">LibreOffice</a> or <a href="https://wiki.gnome.org/Apps/EyeOfGnome">Eye of Gnome</a>, worked with this basic LXD container profile (for Ubuntu 22.04 container images):
<blockquote><code>config:
  boot.autostart: false
  user.user-data: |
    #cloud-config
    write_files:
      - path: /usr/local/bin/mystartup.sh
        permissions: 0755
        content: |
          #!/bin/sh
          uid=$(id -u)
          run_dir=/run/user/$uid
          mkdir -p $run_dir && chmod 700 $run_dir && chown $uid:$uid $run_dir
          ln -sf /mnt/wayland-socket $run_dir/wayland-0
      - path: /usr/local/etc/mystartup.service
        content: |
          [Unit]
          After=local-fs.target
          [Service]
          Type=oneshot
          ExecStart=/usr/local/bin/mystartup.sh
          [Install]
          WantedBy=default.target
    runcmd:
      - mkdir -p /home/ubuntu/.config/systemd/user/default.target.wants
      - ln -s /usr/local/etc/mystartup.service /home/ubuntu/.config/systemd/user/default.target.wants/mystartup.service
      - ln -s /usr/local/etc/mystartup.service /home/ubuntu/.config/systemd/user/mystartup.service
      - chown -R ubuntu:ubuntu /home/ubuntu
      - echo 'export WAYLAND_DISPLAY=wayland-0' >> /home/ubuntu/.profile
description: Basic Wayland Jammy
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
  wayland-socket:
    bind: container
    connect: unix:/run/user/1000/wayland-1
    listen: unix:/mnt/wayland-socket
    uid: 1000
    gid: 1000
    type: proxy
</code></blockquote>
<p>It binds the host's Wayland socket (<tt>/run/user/1000/wayland-1</tt>) to the container at <tt>/mnt/wayland-socket</tt>, via the <tt>wayland-socket</tt> device config. Via its <a href="https://cloudinit.readthedocs.io/en/latest/topics/format.html">cloud config user data</a>, it sets up a <a href="https://systemd.io/">systemd</a> service in the container that will run when the <tt>ubuntu</tt> user logs in, and link the Wayland socket to its usual location in the container (<tt>/run/user/1000/wayland-0</tt>). This cloud config also adds the <tt>WAYLAND_DISPLAY</tt> variable to the <tt>ubuntu</tt> user's <tt>.profile</tt>, ensuring that Wayland-capable apps will try to access the Wayland socket at that location.</p>
<p>(Note that you may be using a different user ID or Wayland socket number on your own host; run <tt>ls /run/user/*/wayland-?</tt> to check. If so, change the <tt>connect: unix:/run/user/1000/wayland-1</tt> line above to match the actual location of your Wayland socket.)</p>
<p>To set up a profile like this, save it as a file like <tt>wayland-basic.yml</tt> on the host. Create a new profile with the following command:</p>
<blockquote><code>$ lxc profile create wayland-basic</code></blockquote>
<p>And then update the profile with the file's content:</p>
<blockquote><code>$ cat wayland-basic.yml | lxc profile edit wayland-basic</code></blockquote>
<p>You can continue to edit the profile and update it with the same <tt>lxc profile edit</tt> command; LXD will apply your changes to existing containers which use the profile. You can view the latest version of the profile with the following command:</p>
<blockquote><code>$ lxc profile show wayland-basic</code></blockquote>
<p>With this profile set up, you can launch a new Ubuntu 22.04 container from it using the following command (the last argument, <tt>mycontainer</tt>, is the name to use for the new container):</p>
<blockquote><code>$ lxc launch ubuntu:22.04 --profile wayland-basic mycontainer</code></blockquote>
<p>Once launched, you can log into an interactive terminal session on the container as the <tt>ubuntu</tt> user with the following command:</p>
<blockquote><code>$ lxc exec mycontainer -- sudo -u ubuntu -i</code></blockquote>
<p>Once logged in, you can install apps into the container; for example, to install LibreOffice Writer (the LibreOffice alternative to Microsoft Word):</p>
<blockquote><code>ubuntu@mycontainer:~$ sudo apt update
ubuntu@mycontainer:~$ sudo apt install libreoffice-gtk3 libreoffice-writer</code></blockquote>
<p>Then you can run the app, which should open up in a native Wayland window:</p>
<blockquote><code>ubuntu@mycontainer:~$ libreoffice</code></blockquote>
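<p>If the app doesn't open a window, it's worth checking from the same session that the startup script actually linked the proxied socket into place; you should see <tt>wayland-0</tt> as the display name, and a <tt>wayland-0</tt> symlink pointing at <tt>/mnt/wayland-socket</tt>:</p>
<blockquote><code>ubuntu@mycontainer:~$ echo $WAYLAND_DISPLAY
ubuntu@mycontainer:~$ ls -l /run/user/1000/wayland-0 /mnt/wayland-socket
</code></blockquote>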
<h3>Sharing Folders</h3>
<p>A container using this basic profile doesn't have access to the host's filesystem, however. To allow the container to access a specific directory on the host, run the following command on the host:</p>
<blockquote><code>$ lxc config device add mycontainer mymount disk source=/home/me/Documents/myshare path=/home/ubuntu/mydir</code></blockquote>
<p>This will mount the source directory from the host (<tt>/home/me/Documents/myshare</tt>) at the specified path in the container (<tt>/home/ubuntu/mydir</tt>). LXD's name for the device within the container will be <tt>mymount</tt> — you can use the device's name in combination with the container's own name to edit or remove the device; and you can mount additional directories if you give each mount device a unique name within the container.</p>
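<p>For example, you might add a second share under a different device name, or remove the original one, with commands like these (the source paths and device names here are just placeholders):</p>
<blockquote><code>$ lxc config device add mycontainer mymount2 disk source=/home/me/Pictures path=/home/ubuntu/pictures
$ lxc config device remove mycontainer mymount
</code></blockquote>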
<p>Our basic profile allows only read access to the mounted directory within the container, however, as the directory will be mounted with the <tt>nobody</tt> user as its owner. To change the owner to the <tt>ubuntu</tt> user (so you can write to the directory from within the container), shut down the container, change the user ID mapping for its mounts, and then start the container back up again:</p>
<blockquote><code>$ lxc stop mycontainer
$ lxc config set mycontainer raw.idmap='both 1000 1000'
$ lxc start mycontainer
$ lxc exec mycontainer -- sudo -u ubuntu -i
ubuntu@mycontainer:~$ ls -l mydir
</code></blockquote>
<p>The mounted <tt>mydir</tt> directory and its contents will now be owned by the <tt>ubuntu</tt> user, with full read and write access. (If you need to map a host user or group with an ID other than <tt>1000</tt> to the container's <tt>ubuntu</tt> user, you can do so with the <tt>uid</tt> and <tt>gid</tt> directives instead of the <tt>both</tt> directive; see the LXD <a href="https://linuxcontainers.org/lxd/docs/master/userns-idmap">idmap</a> documentation for details.)</p>
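<p>For example, if your host user has UID and GID 1001 instead of 1000, a mapping with separate <tt>uid</tt> and <tt>gid</tt> lines would look something like this (a sketch; adjust the IDs to your own host account):</p>
<blockquote><code>$ lxc stop mycontainer
$ lxc config set mycontainer raw.idmap='uid 1001 1000
gid 1001 1000'
$ lxc start mycontainer
</code></blockquote>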
<p>If you want to use these same settings for all containers that use the same profile, you can add these settings directly to the profile's config:</p>
<blockquote><code>config:
  boot.autostart: false
  <ins>raw.idmap: both 1000 1000</ins>
  user.user-data: |
    #cloud-config
    write_files:
      - path: /usr/local/bin/mystartup.sh
        permissions: 0755
        content: |
          #!/bin/sh
          uid=$(id -u)
          run_dir=/run/user/$uid
          mkdir -p $run_dir && chmod 700 $run_dir && chown $uid:$uid $run_dir
          ln -sf /mnt/wayland-socket $run_dir/wayland-0
      - path: /usr/local/etc/mystartup.service
        content: |
          [Unit]
          After=local-fs.target
          [Service]
          Type=oneshot
          ExecStart=/usr/local/bin/mystartup.sh
          [Install]
          WantedBy=default.target
    runcmd:
      - mkdir -p /home/ubuntu/.config/systemd/user/default.target.wants
      - ln -s /usr/local/etc/mystartup.service /home/ubuntu/.config/systemd/user/default.target.wants/mystartup.service
      - ln -s /usr/local/etc/mystartup.service /home/ubuntu/.config/systemd/user/mystartup.service
      - chown -R ubuntu:ubuntu /home/ubuntu
      - echo 'export WAYLAND_DISPLAY=wayland-0' >> /home/ubuntu/.profile
description: Myshare Wayland Jammy
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  <ins>mymount:
    source: /home/me/Documents/myshare
    path: /home/ubuntu/mydir
    type: disk</ins>
  root:
    path: /
    pool: default
    type: disk
  wayland-socket:
    bind: container
    connect: unix:/run/user/1000/wayland-1
    listen: unix:/mnt/wayland-socket
    uid: 1000
    gid: 1000
    type: proxy
</code></blockquote>
<h3>Launcher Script</h3>
<p>When an LXD container is running, you don't have to log into it via a terminal session to launch an application in it — you can launch the application directly from the host. The following command will launch LibreOffice directly from the host:</p>
<blockquote><code>$ lxc exec mycontainer -- sudo -u ubuntu -i libreoffice</code></blockquote>
<p>So save the following as a shell script on the host (eg <tt>mycontainer-libreoffice.sh</tt>) and make it executable (eg <tt>chmod +x mycontainer-libreoffice.sh</tt>), and then you can simply run the script any time you want to launch <tt>libreoffice</tt> in <tt>mycontainer</tt>:</p>
<blockquote><code>#!/bin/sh
lxc info mycontainer 2>/dev/null | grep RUNNING >/dev/null || (lxc start mycontainer; sleep 2)
lxc exec mycontainer -- sudo -u ubuntu -i libreoffice
</code></blockquote>
<p>(Note that if you <i>did not</i> add the <tt>WAYLAND_DISPLAY</tt> variable to the user's <tt>.profile</tt> file, or if you added it to the user's <tt>.bashrc</tt> file instead of <tt>.profile</tt>, you'll need to include this variable in the launch command like this: <tt>lxc exec mycontainer -- sudo WAYLAND_DISPLAY=wayland-0 -u ubuntu -i libreoffice</tt> .)</p>
<h3>AppArmor Issues</h3>
<p>Some Wayland-capable GUI apps may fail to run inside an LXD container due to issues with the app's <a href="https://www.apparmor.net/">AppArmor</a> profile, but you may be able to work around this by adjusting the profile. One such app I've encountered is <a href="https://wiki.gnome.org/Apps/Evince">Evince</a>.</p>
<p>A good way to check for AppArmor issues is by tailing the syslog, and filtering on its <tt>audit</tt> identifier, like with the following command:</p>
<blockquote><code>$ journalctl -t audit -f</code></blockquote>
<p>An access denial by AppArmor will look like this:</p>
<blockquote><code>Aug 25 19:30:07 jp audit[99194]: AVC apparmor="DENIED" operation="connect" namespace="root//lxd-mycontainer_<var-snap-lxd-common-lxd>" profile="/usr/bin/evince" name="/mnt/wayland-socket" pid=99194 comm="evince" requested_mask="wr" denied_mask="wr" fsuid=1001000 ouid=1001000</code></blockquote>
<p>In the case of Evince, I found I could work around this by adjusting the container's own AppArmor profile for Evince. Run the following commands in the container to grant Evince read/write access to the Wayland socket:</p>
<blockquote><code>ubuntu@mycontainer:~$ echo '/mnt/wayland-socket wr,' | sudo tee -a /etc/apparmor.d/local/usr.bin.evince
ubuntu@mycontainer:~$ sudo apparmor_parser -r /etc/apparmor.d/usr.bin.evince
</code></blockquote>
<p>The first command adds a line to the <i>user-managed</i> additions of the Evince AppArmor policy (which is usually empty); the second command reloads the <i>packaged</i> version of the policy (a different file), which references the user-managed additions via an include statement.</p>
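<p>To double-check that the override took, you can print the local additions file and confirm the Evince profile is still loaded (both commands run inside the container):</p>
<blockquote><code>ubuntu@mycontainer:~$ cat /etc/apparmor.d/local/usr.bin.evince
ubuntu@mycontainer:~$ sudo aa-status | grep evince
</code></blockquote>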
<h3>Browser Quirks</h3>
<p>Unfortunately, <a href="https://www.mozilla.org/en-US/firefox/new/">Firefox</a> and <a href="https://www.chromium.org/Home/">Chromium</a> don't work with the LXD-proxied Wayland socket (at least the Snap-packaged Ubuntu versions of Firefox and Chromium don't). But fortunately, they <i>do work</i> (mostly) when the Wayland socket is shared with them via <i>disk mount</i>.</p>
<p>If you create a new profile like the following, with a disk mount used to share the Wayland socket instead of a network proxy, you can keep the startup script the same as before:</p>
<blockquote><code>config:
  boot.autostart: false
  <ins>raw.idmap: both 1000 1000</ins>
  user.user-data: |
    #cloud-config
    write_files:
      - path: /usr/local/bin/mystartup.sh
        permissions: 0755
        content: |
          #!/bin/sh
          uid=$(id -u)
          run_dir=/run/user/$uid
          mkdir -p $run_dir && chmod 700 $run_dir && chown $uid:$uid $run_dir
          ln -sf /mnt/wayland-socket $run_dir/wayland-0
      - path: /usr/local/etc/mystartup.service
        content: |
          [Unit]
          After=local-fs.target
          [Service]
          Type=oneshot
          ExecStart=/usr/local/bin/mystartup.sh
          [Install]
          WantedBy=default.target
    runcmd:
      - mkdir -p /home/ubuntu/.config/systemd/user/default.target.wants
      - ln -s /usr/local/etc/mystartup.service /home/ubuntu/.config/systemd/user/default.target.wants/mystartup.service
      - ln -s /usr/local/etc/mystartup.service /home/ubuntu/.config/systemd/user/mystartup.service
      - chown -R ubuntu:ubuntu /home/ubuntu
      - echo 'export WAYLAND_DISPLAY=wayland-0' >> /home/ubuntu/.profile
description: Browser Wayland Jammy
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
  <ins>wayland-socket:
    source: /run/user/1000/wayland-1
    path: /mnt/wayland-socket
    type: disk</ins>
</code></blockquote>
<p>Save this profile as a file like <tt>wayland-browser.yml</tt>. Create a new profile for it, and update the profile from the file's content:</p>
<blockquote><code>$ lxc profile create wayland-browser
$ cat wayland-browser.yml | lxc profile edit wayland-browser</code></blockquote>
<p>Launch an Ubuntu 22.04 container with it, log into it, and install a browser:</p>
<blockquote><code>$ lxc launch ubuntu:22.04 --profile wayland-browser myfirefox
$ lxc exec myfirefox -- sudo -u ubuntu -i
ubuntu@myfirefox:~$ sudo snap install firefox</code></blockquote>
<p>Once installed, you should be able to start up the browser and have it open in a new Wayland window:</p>
<blockquote><code>ubuntu@myfirefox:~$ firefox</code></blockquote>
<p>Using a disk mount instead of a network proxy to share the Wayland socket seems much more flaky, however. I find that I'm not always able to start Firefox back up after quitting from it if I leave its LXD container running (especially if I put the computer to sleep in between quitting and starting again). Also, Firefox's "crash reporter" window, when it appears, seems to trigger a new crash, resulting in a continuous loop of crashes.</p>
<p>So now I always stop and restart the browser's LXD container before starting a new browser session (and I disable the crash reporter). This is what I use for my Firefox launcher script:</p>
<blockquote><code>#!/bin/sh
lxc info myfirefox 2>/dev/null | grep STOPPED >/dev/null || lxc stop myfirefox
lxc start myfirefox
sleep 3
lxc exec myfirefox -- sudo MOZ_CRASHREPORTER_DISABLE=1 -u ubuntu -i firefox
</code></blockquote>
<p>And this for my Chromium launcher:</p>
<blockquote><code>#!/bin/sh
lxc info mychromium 2>/dev/null | grep STOPPED >/dev/null || lxc stop mychromium
lxc start mychromium
sleep 3
lxc exec mychromium -- sudo -u ubuntu -i chromium --ozone-platform=wayland
</code></blockquote>
<p>Also, there are a few facets of the browsers that still don't work under this regime — in particular, open/save file dialogs don't appear when you try to download/upload files.</p>
<h3>PulseAudio Output</h3>
<p>To output audio from an LXD container, bind the host's <a href="https://www.freedesktop.org/wiki/Software/PulseAudio/">PulseAudio</a> socket (<tt>/run/user/1000/pulse/native</tt>) to the container at <tt>/mnt/pulse-socket</tt>, similar to the original Wayland socket:</p>
<blockquote><code>config:
  boot.autostart: false
  raw.idmap: both 1000 1000
  user.user-data: |
    #cloud-config
    write_files:
      - path: /usr/local/bin/mystartup.sh
        permissions: 0755
        content: |
          #!/bin/sh
          uid=$(id -u)
          run_dir=/run/user/$uid
          mkdir -p $run_dir && chmod 700 $run_dir && chown $uid:$uid $run_dir
          ln -sf /mnt/wayland-socket $run_dir/wayland-0
          <ins>mkdir -p $run_dir/pulse && chmod 700 $run_dir/pulse && chown $uid:$uid $run_dir/pulse
          ln -sf /mnt/pulse-socket $run_dir/pulse/native</ins>
      - path: /usr/local/etc/mystartup.service
        content: |
          [Unit]
          After=local-fs.target
          [Service]
          Type=oneshot
          ExecStart=/usr/local/bin/mystartup.sh
          [Install]
          WantedBy=default.target
    runcmd:
      - mkdir -p /home/ubuntu/.config/systemd/user/default.target.wants
      - ln -s /usr/local/etc/mystartup.service /home/ubuntu/.config/systemd/user/default.target.wants/mystartup.service
      - ln -s /usr/local/etc/mystartup.service /home/ubuntu/.config/systemd/user/mystartup.service
      - chown -R ubuntu:ubuntu /home/ubuntu
      - echo 'export WAYLAND_DISPLAY=wayland-0' >> /home/ubuntu/.profile
description: Pulse Wayland Jammy
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
  <ins>pulse-socket:
    bind: container
    connect: unix:/run/user/1000/pulse/native
    listen: unix:/mnt/pulse-socket
    uid: 1000
    gid: 1000
    type: proxy</ins>
  wayland-socket:
    source: /run/user/1000/wayland-1
    path: /mnt/wayland-socket
    type: disk
</code></blockquote>
<p>Update the startup script to link the PulseAudio socket to its usual location in the container (<tt>/run/user/1000/pulse/native</tt>) when the <tt>ubuntu</tt> user logs in, just like we did for the Wayland socket. (Note that the <tt>mystartup.sh</tt> script's content from the cloud config of this profile is applied only when the container is first created, so you have to manually edit it in any containers that you've already created if you want to update them, too.)</p>
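<p>To confirm that audio is actually reaching the host, you can install the PulseAudio client utilities in the container and query the daemon through the proxied socket (the explicit <tt>PULSE_SERVER</tt> value below just points at the linked socket location):</p>
<blockquote><code>ubuntu@mycontainer:~$ sudo apt install pulseaudio-utils
ubuntu@mycontainer:~$ PULSE_SERVER=unix:/run/user/1000/pulse/native pactl info
</code></blockquote>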
<h3>Useful Commands</h3>
<p>If you are just getting started with LXD containers, here are a few more commands that are good to know:
<ul>
<li><tt>lxc ls</tt>: Lists all LXD containers.</li>
<li><tt>lxc snapshot mycontainer mysnapshot</tt>: Creates a snapshot of <tt>mycontainer</tt> named <tt>mysnapshot</tt>.</li>
<li><tt>lxc restore mycontainer mysnapshot</tt>: Restores <tt>mycontainer</tt> to the <tt>mysnapshot</tt> snapshot.</li>
<li><tt>lxc delete mycontainer</tt>: Deletes <tt>mycontainer</tt>.</li>
<li><tt>lxc storage info default</tt>: Shows the space used and available in the <tt>default</tt> storage pool.</li>
<li><tt>lxc config show mycontainer</tt>: Shows the container-customized config settings for <tt>mycontainer</tt>.</li>
<li><tt>lxc config show mycontainer -e</tt>: Shows all config settings for <tt>mycontainer</tt> (including those inherited from its profiles).</li>
</ul>
<h2>Sourcehut Docker Builds on Fedora</h2><p>Building and running Docker images on <a href="https://builds.sr.ht/">builds.sr.ht</a> works nicely with <a href="https://www.alpinelinux.org/">Alpine Linux</a> VMs (<a href="https://builds.sr.ht/api/jobs/339091/manifest">example here</a> from Drew DeVault). Tim Schumacher figured out a similar way to set it up with <a href="https://archlinux.org/">Arch Linux</a> VMs (<a href="https://git.sr.ht/~xaffe/takingstack/tree/develop/files/setup-docker.sh">example here</a>).</p>
<p>I couldn't find an example specifically for <a href="https://getfedora.org/">Fedora</a> VMs, however. But with a little trial and error, it turns out what you need is pretty similar to Arch — this is what I ended up with:</p>
<blockquote><code># .build.yml
image: fedora/34
tasks:
  - install-docker: |
      curl -fsSL https://get.docker.com | sudo bash
      sudo mount -t tmpfs -o size=4G /dev/null /dev/shm
      until [ -e /dev/shm ]; do sleep 1; done
      sudo nohup dockerd --bip 172.18.0.1/16 </dev/null >/dev/null 2>&1 &
      sudo usermod -aG docker $(whoami)
      until sudo docker version >/dev/null 2>&1; do sleep 1; done
  - run-docker: |
      cat <<EOF >Dockerfile
      FROM alpine:latest
      RUN apk add htop
      CMD ["htop"]
      EOF
      docker build .
</code></blockquote>
<p>In the <tt>install-docker</tt> task, the first line installs the latest version of Docker. The second line sets up the shared-memory mount that Docker requires; and the third line waits until the mount is ready. The fourth line runs the Docker daemon as a background job; and the sixth line waits until the Docker daemon is fully up and initialized.</p>
<p>The fifth line (the <tt>usermod</tt> command) makes the current user a member of the <tt>docker</tt> group, so the current user can run Docker commands directly (without <tt>sudo</tt>). It doesn't take effect within the <tt>install-docker</tt> task, however — so within the <tt>install-docker</tt> task, you still have to use <tt>sudo</tt> to run Docker; but in following tasks (like <tt>run-docker</tt>), it is in effect — so the example <tt>docker build .</tt> can be run without <tt>sudo</tt>.
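<p>As a quick sanity check that the daemon really is usable without <tt>sudo</tt> in later tasks, you could append one more task to the manifest that just runs a throwaway container (this extra task is my own addition, not part of the original example):</p>
<blockquote><code>  - check-docker: |
      docker run --rm alpine:latest echo "docker works"
</code></blockquote>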
<h2>Send Journald to CloudWatch Logs with Vector</h2><p>Timber's <a href="https://vector.dev">Vector</a> log collection tool is a nifty Swiss Army knife for collecting and shipping logs and metrics from one system to another. In particular, I think it's the best tool for shipping structured <a href="https://man7.org/linux/man-pages/man5/journald.conf.5.html">journald</a> events to <a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html">CloudWatch Logs</a>.</p>
<p>Here's how to start using Vector to send journald log events to CloudWatch:</p>
<h3>Grant Permissions to EC2 Roles</h3>
<p>In order to push logs (or metrics) from your EC2 instances to CloudWatch, you first need to grant those EC2 instances some CloudWatch permissions. The permissions you need are basically the same as the <a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html">AWS CloudWatch Agent</a> needs, so just follow the <a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/create-iam-roles-for-cloudwatch-agent.html">Create IAM roles and users for use with the CloudWatch agent</a> tutorial to assign the AWS-managed <tt>CloudWatchAgentServerPolicy</tt> to the IAM roles of the EC2 instances from which you plan on shipping journald logs.</p>
<p>The current version of the <tt>CloudWatchAgentServerPolicy</tt> looks like this:</p>
<blockquote><code>{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData",
                "ec2:DescribeVolumes",
                "ec2:DescribeTags",
                "logs:PutLogEvents",
                "logs:DescribeLogStreams",
                "logs:DescribeLogGroups",
                "logs:CreateLogStream",
                "logs:CreateLogGroup"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssm:GetParameter"
            ],
            "Resource": "arn:aws:ssm:*:*:parameter/AmazonCloudWatch-*"
        }
    ]
}
</code></blockquote>
<p>With the Vector configuration described below, however, you actually only need to grant the <tt>logs:PutLogEvents</tt>, <tt>logs:DescribeLogStreams</tt>, <tt>logs:DescribeLogGroups</tt>, <tt>logs:CreateLogStream</tt>, and <tt>logs:CreateLogGroup</tt> permissions to your EC2 roles.</p>
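<p>If you'd rather grant only that narrower set of permissions, you could attach a small inline policy like the following to your instance role instead of the full <tt>CloudWatchAgentServerPolicy</tt> (the role and policy names here are placeholders):</p>
<blockquote><code>$ aws iam put-role-policy --role-name my-ec2-role --policy-name vector-cloudwatch-logs --policy-document '{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "logs:PutLogEvents",
      "logs:DescribeLogStreams",
      "logs:DescribeLogGroups",
      "logs:CreateLogStream",
      "logs:CreateLogGroup"
    ],
    "Resource": "*"
  }]
}'
</code></blockquote>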
<h3>Install Vector</h3>
<p>Installing Vector is easy on Linux. Timber maintains their own deb repo for Vector, so on a Debian-based distro like Ubuntu, you can just update the system's APT package manager with the Vector signing-key and repo, and install the Vector package:</p>
<blockquote><code>$ wget https://repositories.timber.io/public/vector/gpg.3543DB2D0A2BC4B8.key -O - | sudo apt-key add -
$ cat <<EOF | sudo tee /etc/apt/sources.list.d/timber-vector.list
deb https://repositories.timber.io/public/vector/deb/ubuntu focal main
deb-src https://repositories.timber.io/public/vector/deb/ubuntu focal main
EOF
$ sudo apt update
$ sudo apt install vector
</code></blockquote>
<h3>Configure Vector</h3>
<p>The default Vector config file, located at <tt>/etc/vector/vector.toml</tt>, just includes a sample source and sink, so you can replace it entirely with your own config settings. This is the minimum you need to ship journald logs to CloudWatch:</p>
<blockquote><code>[sources.my_journald_source]
type = "journald"
[sinks.my_cloudwatch_sink]
type = "aws_cloudwatch_logs"
inputs = ["my_journald_source"]
compression = "gzip"
encoding.codec = "json"
region = "us-east-1"
group_name = "myenv"
stream_name = "mysite/myhost"
</code></blockquote>
<p>Replace the CloudWatch <tt>region</tt>, <tt>group_name</tt>, and <tt>stream_name</tt> settings above with whatever's appropriate for your EC2 instances.</p>
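<p>Before restarting the service, it's worth asking Vector to check the new config file with its <tt>validate</tt> subcommand, so syntax or type errors show up here rather than at startup:</p>
<blockquote><code>$ vector validate /etc/vector/vector.toml
</code></blockquote>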
<h3>Restart Vector</h3>
<p>In one terminal screen, watch for errors by tailing Vector's own log entries with the <tt>journalctl -u vector -f</tt> command, and in another terminal restart Vector with the <tt>sudo systemctl restart vector</tt> command. If everything works, this is what you'll see in Vector's own logs:</p>
<blockquote><code>$ journalctl -u vector -f
Jun 11 19:54:02 myhost systemd[1]: Started Vector.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.008 INFO vector::app: Log level is enabled. level="vector=info,codec=info,vrl=info,file_source=info,tower_limit=trace,rdkafka=info"
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.008 INFO vector::sources::host_metrics: PROCFS_ROOT is unset. Using default '/proc' for procfs root.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.008 INFO vector::sources::host_metrics: SYSFS_ROOT is unset. Using default '/sys' for sysfs root.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.010 INFO vector::app: Loading configs. path=[("/etc/vector/vector.toml", Some(Toml))]
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.060 INFO vector::topology: Running healthchecks.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.060 INFO vector::topology: Starting source. name="journald"
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.061 INFO vector::topology: Starting sink. name="aws_cloudwatch_logs"
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.061 INFO vector: Vector has started. version="0.14.0" arch="x86_64" build_id="5f3a319 2021-06-03"
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.062 INFO vector::app: API is disabled, enable by setting `api.enabled` to `true` and use commands like `vector top`.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.063 INFO journald-server: vector::sources::journald: Starting journalctl.
Jun 11 19:54:03 myhost vector[686208]: Jun 11 19:54:03.128 INFO vector::sinks::aws_cloudwatch_logs: Skipping healthcheck log group check: `group_name` will be created if missing.
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.430 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs: Sending events. events=4
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.453 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Log group provided does not exist; creating a new one.
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.489 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Group created. name=myenv
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.507 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Stream created. name=mysite/myhost
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.523 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Putting logs. token=None
Jun 11 19:55:04 myhost vector[686208]: Jun 11 19:55:04.560 INFO sink{component_kind="sink" component_name=aws_cloudwatch_logs component_type=aws_cloudwatch_logs}:request{request_id=0}: vector::sinks::aws_cloudwatch_logs::request: Putting logs was successful. next_token=Some("49610241853835534178700884863462197886393926766970915618")
</code></blockquote>
<p>If something went wrong, Vector will output some error messages (these are especially helpful as you add transformation steps to your basic Vector configuration).</p>
<h3>Check Your CloudWatch Logs</h3>
<p>Vector will have also shipped some logs to CloudWatch, so check them now. If you use a command-line tool like <a href="https://github.com/TylerBrock/saw">Saw</a>, you'll see some log events like this:</p>
<blockquote><code>$ saw watch myenv --expand --prefix mysite
[2021-06-11T13:00:27-07:00] (myhost) {
"PRIORITY": "6",
"SYSLOG_FACILITY": "3",
"SYSLOG_IDENTIFIER": "uwsgi",
"_BOOT_ID": "6cb87d254d3742728b4fe20e746bcbe6",
"_CAP_EFFECTIVE": "0",
"_CMDLINE": "/usr/bin/uwsgi /etc/myapp/uwsgi.ini",
"_COMM": "uwsgi",
"_EXE": "/usr/bin/uwsgi-core",
"_GID": "33",
"_MACHINE_ID": "ec2aff1204bfae2781faf97e68afb1d4",
"_PID": "363",
"_SELINUX_CONTEXT": "unconfined\n",
"_STREAM_ID": "aa261772c2e74663a7bb122c24b92e64",
"_SYSTEMD_CGROUP": "/system.slice/myapp.service",
"_SYSTEMD_INVOCATION_ID": "b5e117501bbb43428ab7565659022c20",
"_SYSTEMD_SLICE": "system.slice",
"_SYSTEMD_UNIT": "myapp.service",
"_TRANSPORT": "stdout",
"_UID": "33",
"__MONOTONIC_TIMESTAMP": "511441719050",
"__REALTIME_TIMESTAMP": "1623441627906124",
"host": "myhost",
"message": "[pid: 363|app: 0|req: 501/501] 203.0.113.2 () {34 vars in 377 bytes} [Fri Jun 11 20:00:27 2021] HEAD / =< generated 0 bytes in 0 msecs (HTTP/1.1 200) 2 headers in 78 bytes (0 switches on core 0)",
"source_type": "journald"
}
</code></blockquote>
<p>With Saw, use the <tt>saw watch</tt> command to tail log events as they come in, and use the <tt>saw get</tt> command to get historical events. For example, this command will print the last 10 minutes of events using the <tt>mysite</tt> log stream prefix from the <tt>myenv</tt> log group:</p>
<blockquote><code>$ saw get myenv --expand --pretty --prefix mysite --start -10m
</code></blockquote>
<h3>Filter and Remap Your Logs</h3>
<p>With that working, you can tune your Vector configuration to filter out log events you don't care about, and remap certain log fields into a more useful format. Let's add two "transform" steps to our <tt>/etc/vector/vector.toml</tt> file between the <a href="https://vector.dev/docs/reference/configuration/sources/journald/">Journald Source</a> and the <a href="https://vector.dev/docs/reference/configuration/sinks/aws_cloudwatch_logs/">AWS CloudWatch Logs Sink</a>: a <a href="https://vector.dev/docs/reference/configuration/transforms/filter/">Filter</a> transform, and a <a href="https://vector.dev/docs/reference/configuration/transforms/remap/">Remap</a> transform:</p>
<blockquote><code>[sources.my_journald_source]
type = "journald"
[transforms.my_journald_filter]
type = "filter"
inputs = ["my_journald_source"]
condition = '''
(includes(["0", "1", "2", "3", "4"], .PRIORITY) || includes(["systemd", "uwsgi"], .SYSLOG_IDENTIFIER))
'''
[transforms.my_journald_remap]
type = "remap"
inputs = ["my_journald_filter"]
source = '''
.app = .SYSLOG_IDENTIFIER
.datetime = to_timestamp(round((to_int(.__REALTIME_TIMESTAMP) ?? 0) / 1000000 ?? 0))
.facility = to_syslog_facility(to_int(.SYSLOG_FACILITY) ?? 0) ?? ""
.severity = to_int(.PRIORITY) ?? 0
.level = to_syslog_level(.severity) ?? ""
'''
[sinks.my_cloudwatch_sink]
type = "aws_cloudwatch_logs"
inputs = ["my_journald_filter"]
compression = "gzip"
encoding.codec = "json"
region = "us-east-1"
group_name = "myenv"
stream_name = "mysite/myhost"
</code></blockquote>
<p>In the above pipeline, the <tt>my_journald_source</tt> step pipes to the <tt>my_journald_filter</tt> step, which pipes to the <tt>my_journald_remap</tt> step, which pipes to the <tt>my_cloudwatch_sink</tt> step (configured via the <tt>inputs</tt> setting of each receiving step). The <tt>condition</tt> <a href="https://vector.dev/docs/reference/vrl/">VRL</a> expression in the filter step drops entries unless the entry's <tt>PRIORITY</tt> field is less than 5 (aka "emerg", "alert", "crit", "err", and "warning"), or unless the entry's <tt>SYSLOG_IDENTIFIER</tt> field is "systemd" or "uwsgi". And the <tt>source</tt> VRL program in the remap step adds some additional conveniently-formatted fields (<tt>app</tt>, <tt>datetime</tt>, <tt>facility</tt>, <tt>severity</tt>, and <tt>level</tt>) to each log entry (the <tt>??</tt> operator in the source coerces "fallible" expressions to a default value when they would otherwise throw an error).</p>
<p>Now if you restart Vector and check your CloudWatch logs, you'll see fewer unimportant entries (those with lower priorities or uninteresting sources that we filtered), plus some additional fields that we added:</p>
<blockquote><code>$ saw watch myenv --expand --prefix mysite
[2021-06-11T13:00:27-07:00] (myhost) {
"PRIORITY": "6",
"SYSLOG_FACILITY": "3",
"SYSLOG_IDENTIFIER": "uwsgi",
"_BOOT_ID": "6cb87d254d3742728b4fe20e746bcbe6",
"_CAP_EFFECTIVE": "0",
"_CMDLINE": "/usr/bin/uwsgi /etc/myapp/uwsgi.ini",
"_COMM": "uwsgi",
"_EXE": "/usr/bin/uwsgi-core",
"_GID": "33",
"_MACHINE_ID": "ec2aff1204bfae2781faf97e68afb1d4",
"_PID": "363",
"_SELINUX_CONTEXT": "unconfined\n",
"_STREAM_ID": "aa261772c2e74663a7bb122c24b92e64",
"_SYSTEMD_CGROUP": "/system.slice/myapp.service",
"_SYSTEMD_INVOCATION_ID": "b5e117501bbb43428ab7565659022c20",
"_SYSTEMD_SLICE": "system.slice",
"_SYSTEMD_UNIT": "myapp.service",
"_TRANSPORT": "stdout",
"_UID": "33",
"__MONOTONIC_TIMESTAMP": "511441719050",
"__REALTIME_TIMESTAMP": "1623441627906124",
"app": "uwsgi",
"datetime": "2021-06-11T20:00:27Z",
"facility": "daemon",
"host": "myhost",
"level": "info",
"message": "[pid: 363|app: 0|req: 501/501] 203.0.113.2 () {34 vars in 377 bytes} [Fri Jun 11 20:00:27 2021] HEAD / =< generated 0 bytes in 0 msecs (HTTP/1.1 200) 2 headers in 78 bytes (0 switches on core 0)",
"severity": 6,
"source_type": "journald"
}
</code></blockquote>
<p>And we can use the new fields we added to further <a href="https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html">filter</a> our output from Saw, as well as print compact log lines with <a href="https://stedolan.github.io/jq/">jq</a>:</p>
<blockquote><code>$ saw watch myenv --raw --prefix mysite --filter '{ $.severity < 4 || $.app = "uwsgi" }' | jq --unbuffered -r '[.datetime, .level, .host, .app, .message] | join(" ")'
2021-06-11T20:00:27Z info myhost uwsgi [pid: 363|app: 0|req: 501/501] 203.0.113.2 () {34 vars in 377 bytes} [Fri Jun 11 20:00:27 2021] HEAD / =< generated 0 bytes in 0 msecs (HTTP/1.1 200) 2 headers in 78 bytes (0 switches on core 0)
</code></blockquote>
<h3>Remove Irrelevant Fields</h3>
<p>You can also use Vector's remap transform to remove extraneous fields that you don't want to ship to and store in CloudWatch. You can use the <tt><a href="https://vector.dev/docs/reference/vrl/functions/#del">del</a></tt> function to delete specific fields from each event — for example, to skip the journald fields which duplicate the custom fields we added:</p>
<blockquote><code>source = '''
.app = .SYSLOG_IDENTIFIER
.datetime = to_timestamp(round((to_int(.__REALTIME_TIMESTAMP) ?? 0) / 1000000 ?? 0))
.facility = to_syslog_facility(to_int(.SYSLOG_FACILITY) ?? 0) ?? ""
.severity = to_int(.PRIORITY) ?? 0
.level = to_syslog_level(.severity) ?? ""
del(.PRIORITY)
del(.SYSLOG_IDENTIFIER)
del(.SYSLOG_FACILITY)
'''
</code></blockquote>
<p>Or you could replace the original event entirely with a new object that contains just your desired fields:</p>
<blockquote><code>source = '''
e = {}
e.app = .SYSLOG_IDENTIFIER
e.cgroup = ._SYSTEMD_CGROUP
e.cmd = ._CMDLINE
e.facility = to_int(.SYSLOG_FACILITY) ?? 0
e.gid = to_int(._GID) ?? 0
e.host = .host
e.message = .message
e.monotime = to_int(.__MONOTONIC_TIMESTAMP) ?? 0
e.pid = to_int(._PID) ?? 0
e.realtime = to_int(.__REALTIME_TIMESTAMP) ?? 0
e.datetime = to_timestamp(round(e.realtime / 1000000 ?? 0))
e.severity = to_int(.PRIORITY) ?? 0
e.level = to_syslog_level(e.severity) ?? ""
e.uid = to_int(._UID) ?? 0
. = [e]
'''
</code></blockquote>
<p>If you change your Vector pipeline to remap events like the above and restart it, you'll now see log events with only the following fields shipped to CloudWatch:</p>
<blockquote><code>$ saw watch myenv --expand --prefix mysite
[2021-06-11T13:00:27-07:00] (myhost) {
"app": "uwsgi",
"cgroup": "/system.slice/myapp.service",
"cmd": "/usr/bin/uwsgi /etc/myapp/uwsgi.ini",
"datetime": "2021-06-11T20:00:27Z",
"facility": 3,
"gid": 33,
"host": "myhost",
"level": "info",
"message": "[pid: 363|app: 0|req: 501/501] 203.0.113.2 () {34 vars in 377 bytes} [Fri Jun 11 20:00:27 2021] HEAD / =< generated 0 bytes in 0 msecs (HTTP/1.1 200) 2 headers in 78 bytes (0 switches on core 0)",
"monotime": 511441719050,
"pid": 363,
"realtime": 1623441627906124,
"severity": 6,
"uid": 33
}
</code></blockquote>
<br/><hr/>
<p><b>Edit 4/23/2022</b>: As of Vector 0.21.1, the rounding shown in the <tt>to_timestamp</tt> examples is no longer fallible — but the <tt>to_timestamp</tt> function itself is. So the <tt>to_timestamp</tt> examples should now look like the following:</p>
<blockquote><code>e.datetime = to_timestamp(round(e.realtime / 1000000)) ?? now()</code></blockquote>
<h2>Elixir AWS SDK</h2><p>While <a href="https://aws.amazon.com/">AWS</a> doesn't provide an SDK directly for <a href="https://www.erlang.org/">Erlang</a> or <a href="https://elixir-lang.org/">Elixir</a>, the <a href="https://github.com/aws-beam">AWS for the BEAM</a> project has built a nice solution for this — a code generator that uses the JSON API definitions from the official AWS Go SDK to create native Erlang and Elixir AWS SDK bindings. The result for Elixir is the nifty <a href="https://hexdocs.pm/aws/">aws-elixir</a> library.</p>
<p>The aws-elixir library itself doesn't have the automagic functionality found in other AWS SDKs of pulling AWS credentials from various sources like environment variables, profile files, IAM roles for tasks or EC2, etc. However, the AWS for the BEAM project has another library you can use for that: <a href="https://github.com/aws-beam/aws_credentials">aws_credentials</a>. Here's how to use aws-elixir in combination with aws_credentials for a standard <a href="https://hexdocs.pm/mix/">Mix</a> project:</p>
<h3>1. Add <tt>aws</tt> dependencies</h3>
<p>First, add the <tt>aws</tt>, <tt>aws_credentials</tt>, and <tt>hackney</tt> libraries as dependencies to your <tt>mix.exs</tt> file:</p>
<blockquote><code># mix.exs
defp deps do
  [
    <ins>{:aws, "~> 0.8.0"},
    {:aws_credentials, git: "https://github.com/aws-beam/aws_credentials", ref: "0.1.1"},
    {:hackney, "~> 1.17"},</ins>
  ]
end
</code></blockquote>
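<p>After adding the dependencies, fetch and compile them as usual:</p>
<blockquote><code>$ mix deps.get
$ mix compile
</code></blockquote>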
<h3>2. Set up <tt>AWS.Client</tt> struct</h3>
<p>Next, set up aws-elixir's <tt>AWS.Client</tt> struct with the AWS credentials found by the <tt>:aws_credentials.get_credentials/0</tt> function. In this example, I'm going to create a simple <tt>MyApp.AwsUtils</tt> module, with a <tt>client/0</tt> function that I can call from anywhere else in my app to initialize the <tt>AWS.Client</tt> struct:</p>
<blockquote><code># lib/my_app/aws_utils.ex
defmodule MyApp.AwsUtils do
  @doc """
  Creates a new AWS.Client with default settings.
  """
  @spec client() :: AWS.Client.t()
  def client, do: :aws_credentials.get_credentials() |> build_client()

  defp build_client(%{access_key_id: id, secret_access_key: key, token: "", region: region}) do
    AWS.Client.create(id, key, region)
  end

  defp build_client(%{access_key_id: id, secret_access_key: key, token: token, region: region}) do
    AWS.Client.create(id, key, token, region)
  end

  defp build_client(credentials), do: struct(AWS.Client, credentials)
end
</code></blockquote>
<p>The aws_credentials library will handle caching for you, so you don't need to separately cache the credentials it returns — just call <tt>get_credentials/0</tt> every time you need them. By default, it will first check for the standard AWS environment variables (<tt>AWS_ACCESS_KEY_ID</tt> etc), then for the standard credentials file (<tt>~/.aws/credentials</tt>), then for ECS task credentials, and then for credentials from the EC2 metadata service.</p>
<p>So the above example will work if on one system you configure the environment variables for your Elixir program like this:</p>
<blockquote><code># .env
AWS_DEFAULT_REGION=us-east-1
AWS_ACCESS_KEY_ID=ABCDEFGHIJKLMNOPQRST
AWS_SECRET_ACCESS_KEY=01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ/+a
AWS_SESSION_TOKEN=
</code></blockquote>
<p>And on another system you configure the user account running your Elixir program with a <tt>~/.aws/credentials</tt> file like this:</p>
<blockquote><code># ~/.aws/credentials
[default]
aws_access_key_id = ABCDEFGHIJKLMNOPQRST
aws_secret_access_key = 01234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ/+a
</code></blockquote>
<p>And when running the Elixir program in an ECS task or EC2 instance, it will automatically pick up the credentials configured for the ECS task or EC2 instance under which the program is running.</p>
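<p>Whichever source applies on a given system, a quick way to confirm that credentials are being picked up is to call the same function from an IEx session (the output is the credentials map, which I won't reproduce here):</p>
<blockquote><code>$ iex -S mix
iex> :aws_credentials.get_credentials()
</code></blockquote>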
<p>If you do use a credentials file, you can customize the path to the credentials file, or profile within the file, via the <tt>:provider_options</tt> configuration parameter, like so:</p>
<blockquote><code># config/config.exs
config :aws_credentials, :provider_options, %{
credential_path: "/home/me/.aws/config",
profile: "myprofile"
}
</code></blockquote>
<p>Some caveats with the current aws_credentials implementation are:</p>
<ol>
<li>With environment variables, you can specify the region (via the <tt>AWS_DEFAULT_REGION</tt> or <tt>AWS_REGION</tt> variable) only if you also specify the session token (via the <tt>AWS_SESSION_TOKEN</tt> or <tt>AWS_SECURITY_TOKEN</tt> variable).</li>
<li>With credential files, the <tt>region</tt> and <tt>aws_session_token</tt> settings won't be included.</li>
</ol>
<h3>3. Call <tt>AWS.*</tt> module functions</h3>
<p>Now you can go ahead and call any AWS SDK function. In this example, I'm going to create a <tt>get_my_special_file/0</tt> function to get the contents of a file from S3:</p>
<blockquote><code># lib/my_app/my_files.ex
defmodule MyApp.MyFiles do
  @doc """
  Gets the content of my special file from S3.
  """
  @spec get_my_special_file() :: binary
  def get_my_special_file do
    client = MyApp.AwsUtils.client()
    bucket = "my-bucket"
    key = "my/special/file.txt"
    {:ok, %{"Body" => body}, %{status_code: 200}} = AWS.S3.get_object(client, bucket, key)
    body
  end
end
</code></blockquote>
<p>For any AWS SDK function, you can use the Hex docs to guide you as to the Elixir function signature, the Go docs for any structs not explained in the Hex docs, and the AWS docs for more details and examples. For example, here are the docs for the <tt>get_object</tt> function used above:</p>
<ol>
<li><a href="https://hexdocs.pm/aws/AWS.S3.html#get_object/22">Hex docs for <tt>AWS.S3.get_object/22</tt></a>
<li><a href="https://docs.aws.amazon.com/sdk-for-go/api/service/s3/#S3.GetObject">Go docs for <tt>S3.GetObject</tt></a>
<li><a href="https://docs.aws.amazon.com/AmazonS3/latest/API/API_GetObject.html">AWS docs for S3 GetObject</a>
</ol>
<p>The general response format from each aws-elixir SDK function is this:</p>
<blockquote><code># successful response
{
  :ok,
  map_of_parsed_response_body_with_string_keys,
  %{body: body_binary, headers: list_of_string_header_tuples, status_code: integer}
}

# error response
{
  :error,
  {
    :unexpected_response,
    %{body: body_binary, headers: list_of_string_header_tuples, status_code: integer}
  }
}
</code></blockquote>
<p>With the <tt>AWS.S3.get_object/22</tt> example above, a successful response will look like this:</p>
<blockquote><code>
iex> AWS.S3.get_object(MyApp.AwsUtils.client(), "my-bucket", "my/special/file.txt")
{:ok,
%{
"Body" => "my special file content\n",
"ContentLength" => "24",
"ContentType" => "text/plain",
"ETag" => "\"00733c197e5877adf705a2ec6d881d44\"",
"LastModified" => "Wed, 14 Apr 2021 19:05:34 GMT"
},
%{
body: "my special file content\n",
headers: [
{"x-amz-id-2",
"ouJJOzsesw0m24Y6SCxtnDquPbo4rg0BwSORyMn3lOJ8PIeptboR8ozKgIwuPGRAtRPyRIPi6Dk="},
{"x-amz-request-id", "P9ZVDJ2L378Q3EGX"},
{"Date", "Wed, 14 Apr 2021 20:40:46 GMT"},
{"Last-Modified", "Wed, 14 Apr 2021 19:05:34 GMT"},
{"ETag", "\"00733c197e59877ad705a2ec6d881d44\""},
{"Accept-Ranges", "bytes"},
{"Content-Type", "text/plain"},
{"Content-Length", "24"},
{"Server", "AmazonS3"}
],
status_code: 200
}}
</code></blockquote>
<p>And an error response will look like this:</p>
<blockquote><code>
iex> AWS.S3.get_object(MyApp.AwsUtils.client(), "my-bucket", "not/my/special/file.txt")
{:error,
{:unexpected_response,
%{
body: "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>AccessDenied</Code><Message>Access Denied</Message><RequestId>FJWGFYKL44AB4XZK</RequestId><HostId>G4mzxVPQdjFsHpErTWZhG7djVLks1Vu7RLLYS37XA38c6JsAaJs+QMp3bR3Vm9aKhoWBuS/Mk6Y=</HostId></Error>",
headers: [
{"x-amz-request-id", "FJWGFYKL44AB4XZK"},
{"x-amz-id-2",
"G4mzxVPQdjFsHpErTWZhG7djVLks1Vu7RLLYS37XA38c6JsAaJs+QMp3bR3Vm9aKhoWBuS/Mk6Y="},
{"Content-Type", "application/xml"},
{"Transfer-Encoding", "chunked"},
{"Date", "Wed, 14 Apr 2021 19:25:01 GMT"},
{"Server", "AmazonS3"}
],
status_code: 403
}}}
</code></blockquote>
<h2>Elixir Systemd Logging</h2><p>If you run an <a href="https://elixir-lang.org/">Elixir</a> application as a Linux service with <a href="https://systemd.io/">systemd</a>, you'll probably find that logging works pretty well out of the box. By default, Elixir uses the <a href="https://hexdocs.pm/logger/master/Logger.Backends.Console.html">Console</a> logger backend, which sends all log messages to stdout. And with systemd services, by default all stdout messages are sent to <a href="https://man7.org/linux/man-pages/man5/journald.conf.5.html">journald</a>.</p>
<p>This means you can view your application's logs easily via the <a href="https://man7.org/linux/man-pages/man1/journalctl.1.html">journalctl</a> command. For example, you can "tail" your app's logs with a command like this (if the systemd unit for the app was named <tt>my_app</tt>):</p>
<blockquote><code>journalctl -u my_app -f</code></blockquote>
<p>You can also configure systemd to send your app's stdout to a custom log file instead of journald, using the <a href="https://www.freedesktop.org/software/systemd/man/systemd.exec.html#StandardOutput=">StandardOutput</a> directive. You can add that directive to the <tt>[Service]</tt> section of a systemd unit file (for example, to log to a custom <tt>/var/log/my_app.log</tt>):</p>
<blockquote><code># /etc/systemd/system/my_app.service
[Service]
ExecStart=/srv/my_app/bin/my_app start
ExecStop=/srv/my_app/bin/my_app stop
<ins>StandardOutput=append:/var/log/my_app.log</ins>
</code></blockquote>
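<p>After editing the unit file, reload systemd and restart the service so the new logging destination takes effect:</p>
<blockquote><code>$ sudo systemctl daemon-reload
$ sudo systemctl restart my_app
</code></blockquote>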
<h3>Problems</h3>
<p>If you collect and ship your log messages off to a centralized log service (like <a href="https://aws.amazon.com/cloudwatch/">AWS CloudWatch</a>, <a href="https://cloud.google.com/logging/">Google Cloud Logging</a>, <a href="https://azure.microsoft.com/en-us/services/monitor/">Azure Monitor</a>, <a href="https://www.splunk.com/">Splunk</a>, <a href="https://www.sumologic.com/">Sumologic</a>, <a href="https://www.elastic.co/">Elasticsearch</a>, <a href="https://www.loggly.com/">Loggly</a>, <a href="https://www.datadoghq.com/">Datadog</a>, <a href="https://newrelic.com/">New Relic</a>, etc), you'll find two problems with this, however:</p>
<ol>
<li>Multi-line messages are broken up into a separate log entry for each line</li>
<li>Log level/priority is lost</li>
</ol>
<p>You can add some steps further down your logging pipeline to try to correct this, but the easiest way to fix it is at the source: Replace the default Console logger with the <a href="https://github.com/slashmili/ex_syslogger">ExSyslogger</a> backend.</p>
<p>Here's how you'd do that with a <a href="https://www.phoenixframework.org/">Phoenix</a> web app:</p>
<h3>1. Add the ex_syslogger dependency</h3>
<p>First, add the <tt>ex_syslogger</tt> library as a dependency to your <tt>mix.exs</tt> file:</p>
<blockquote><code># mix.exs
defp deps do
  [
    <ins>{:ex_syslogger, "~> 1.5"}</ins>
  ]
end
</code></blockquote>
<h3>2. Register the ex_syslogger backend</h3>
<p>Update the root <tt>config :logger</tt> options in your <tt>config/prod.exs</tt> file to register the <tt>ExSyslogger</tt> backend under the name <tt>:ex_syslogger</tt>:</p>
<blockquote><code># config/prod.exs
# Do not print debug messages in production
<del>config :logger, level: :info</del>
<ins>config :logger,
level: :info,
backends: [{ExSyslogger, :ex_syslogger}]</ins>
</code></blockquote>
<p>Note that the <tt>:ex_syslogger</tt> name isn't special — you can call it whatever you want. It just has to match the name you use in the next section:</p>
<h3>3. Configure the ex_syslogger backend</h3>
<p>Now add <tt>config :logger, :ex_syslogger</tt> options to your <tt>config/config.exs</tt> file to configure the backend named <tt>:ex_syslogger</tt> that you registered above. I'd suggest just duplicating the configuration you already have for the default <tt>:console</tt> backend, plus setting the syslog <tt>APP-NAME</tt> field to your app's name via the <tt>ident</tt> option:</p>
<blockquote><code># config/config.exs
# Configures Elixir's Logger
config :logger, :console,
format: "$time $metadata[$level] $message\n",
metadata: [:request_id]
<ins>
config :logger, :ex_syslogger,
format: "$time $metadata[$level] $message\n",
metadata: [:request_id],
ident: "my_app"</ins>
</code></blockquote>
<h3>Result</h3>
<p>Now when you compile your app with <tt>MIX_ENV=prod</tt> and run it as a systemd service, journald will automatically handle multi-line messages and log levels/priorities correctly. Furthermore, you can use any generic <a href="https://man7.org/linux/man-pages/man3/syslog.3.html">syslog</a> collector to ship log entries to your log service as soon as they occur — with multi-line messages and log levels intact.</p>
<p>For example, when using the default Console logger, an error message from a Phoenix web app would have been displayed like this by journalctl:</p>
<blockquote><code>$ journalctl -u my_app -f
Mar 26 18:21:10 foo my_app[580361]: 18:21:10.337 request_id=Fm_3dFhPMtEHARkAAALy [info] Sent 500 in 16ms
Mar 26 18:21:10 foo my_app[580361]: 18:21:10.345 [error] #PID<0.4149.0> running MyAppWeb.Endpoint (connection #PID<0.4148.0>, stream id 1) terminated
Mar 26 18:21:10 foo my_app[580361]: Server: foo.example.com:443 (https)
Mar 26 18:21:10 foo my_app[580361]: Request: GET /test/error
Mar 26 18:21:10 foo my_app[580361]: ** (exit) an exception was raised:
Mar 26 18:21:10 foo my_app[580361]: ** (RuntimeError) test runtime error
Mar 26 18:21:10 foo my_app[580361]: (my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:9: MyAppWeb.TestController.error/2
Mar 26 18:21:10 foo my_app[580361]: (my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:1: MyAppWeb.TestController.action/2
Mar 26 18:21:10 foo my_app[580361]: (my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:1: MyAppWeb.TestController.phoenix_controller_pipeline/2
Mar 26 18:21:10 foo my_app[580361]: (phoenix 1.5.8) lib/phoenix/router.ex:352: Phoenix.Router.__call__/2
Mar 26 18:21:10 foo my_app[580361]: (my_app 0.1.0) lib/my_app_web/endpoint.ex:1: MyAppWeb.Endpoint.plug_builder_call/2
Mar 26 18:21:10 foo my_app[580361]: (my_app 0.1.0) lib/my_app_web/endpoint.ex:1: MyAppWeb.Endpoint.call/2
Mar 26 18:21:10 foo my_app[580361]: (phoenix 1.5.8) lib/phoenix/endpoint/cowboy2_handler.ex:65: Phoenix.Endpoint.Cowboy2Handler.init/4
Mar 26 18:21:10 foo my_app[580361]: (cowboy 2.8.0) /srv/my_app/deps/cowboy/src/cowboy_handler.erl:37: :cowboy_handler.execute/2
</code></blockquote>
<p>But with ExSyslogger in place, you'll now see this (where the full error message is captured as a single log entry, and is recognized as an error-level message):</p>
<blockquote><code>$ journalctl -u my_app -f
Mar 26 18:21:10 foo my_app[580361]: 18:21:10.337 request_id=Fm_3dFhPMtEHARkAAALy [info] Sent 500 in 16ms
Mar 26 18:21:10 foo my_app[580361]: <span style="color:red;font-weight:bold">18:21:10.345 [error] #PID<0.4149.0> running MyAppWeb.Endpoint (connection #PID<0.4148.0>, stream id 1) terminated
Server: foo.example.com:443 (https)
Request: GET /test/error
** (exit) an exception was raised:
** (RuntimeError) test runtime error
(my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:9: MyAppWeb.TestController.error/2
(my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:1: MyAppWeb.TestController.action/2
(my_app 0.1.0) lib/my_app_web/controllers/test_controller.ex:1: MyAppWeb.TestController.phoenix_controller_pipeline/2
(phoenix 1.5.8) lib/phoenix/router.ex:352: Phoenix.Router.__call__/2
(my_app 0.1.0) lib/my_app_web/endpoint.ex:1: MyAppWeb.Endpoint.plug_builder_call/2
(my_app 0.1.0) lib/my_app_web/endpoint.ex:1: MyAppWeb.Endpoint.call/2
(phoenix 1.5.8) lib/phoenix/endpoint/cowboy2_handler.ex:65: Phoenix.Endpoint.Cowboy2Handler.init/4
(cowboy 2.8.0) /srv/my_app/deps/cowboy/src/cowboy_handler.erl:37: :cowboy_handler.execute/2</span>
</code></blockquote>
<p>And as a side note, you can use journalctl to view just error-level messages and above via the <tt>--priority=err</tt> flag (<tt>-p3</tt> for short):</p>
<blockquote><code>journalctl -u my_app -p3</code></blockquote>
Justin Ludwighttp://www.blogger.com/profile/03245749869056259124noreply@blogger.com0tag:blogger.com,1999:blog-3778768890472614719.post-84621879099266824262021-03-08T10:22:00.000-08:002021-03-08T10:22:03.686-08:00D3v6 Pan and Zoom<script src="https://d3js.org/d3.v6.min.js"></script>
<p>Since <a href="https://github.com/d3/d3">D3</a> version 3 it's been really easy to add panning and zooming to custom visualizations, allowing the user to scroll the SVG canvas vertically and horizontally by clicking and dragging the mouse cursor around the canvas, and to scale the canvas larger and smaller by spinning the mouse wheel.</p>
<h5>Simplest way</h5>
<p>For the simplest case, all you need is to apply the <tt><a href="https://github.com/d3/d3-zoom#zoom">d3.zoom()</a></tt> behavior to your root <tt>svg</tt> element. This is how you do it with D3 version 6 (<tt>d3.v6.js</tt>):</p>
<blockquote><code><svg id="viz1" width="300" height="300" style="background:#ffc">
<circle cx="50%" cy="50%" r="25%" fill="#69c" />
</svg>
<script>
const svg = d3
.select('#viz1')
.call(d3.zoom().on('zoom', ({ transform }) => svg.attr('transform', transform)))
</script>
</code></blockquote>
<p>It'll work like the following:</p>
<svg id="viz1" width="300" height="300" style="background:#ffc">
<circle cx="50%" cy="50%" r="25%" fill="#69c" />
</svg>
<script>
const svg = d3
.select('#viz1')
.call(d3.zoom().on('zoom', ({ transform }) => svg.attr('transform', transform)))
</script>
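<p>(A side note: in D3 v5 and earlier, the zoom event isn't passed to the listener as an argument; it's exposed via the global <tt>d3.event</tt> instead. A rough sketch of the equivalent handler, assuming the same <tt>#viz1</tt> markup as above, would look like this:)</p>
<blockquote><code><script>
// D3 v5 and earlier: read the transform from the global d3.event
// (sketch only; assumes the same #viz1 svg element as above)
const svg = d3
.select('#viz1')
.call(d3.zoom().on('zoom', () => svg.attr('transform', d3.event.transform)))
</script>
</code></blockquote>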
<h5>Smoothest way</h5>
<p>In most cases, however, you'll get smoother behavior by adding some group (<tt><g></tt>) elements to wrap your main visualization elements. If you're starting with a structure like the following, where you've got a <tt>.canvas</tt> group element containing the main content you want to pan and zoom:</p>
<blockquote><code><svg id="viz2" width="300" height="300">
<g class="canvas" transform="translate(150,150)">
<circle cx="0" cy="0" r="25%" fill="#69c" />
</g>
</svg>
</code></blockquote>
<p>Do this: add one wrapper group element, <tt>.zoomed</tt>, around the original <tt>.canvas</tt> group; and a second group element, <tt>.bg</tt>, around <tt>.zoomed</tt>; and add a <tt>rect</tt> inside the <tt>.bg</tt> group:</p>
<blockquote><code><svg id="viz2" width="300" height="300">
<g class="bg">
<rect width="100%" height="100%" fill="#efc" />
<g class="zoomed">
<g class="canvas" transform="translate(150,150)">
<circle cx="0" cy="0" r="25%" fill="#69c" />
</g>
</g>
</g>
</svg>
</code></blockquote>
<p>The <tt>rect</tt> inside the <tt>.bg</tt> group will ensure that the user's click-n-drag or mouse wheeling will be captured as long as the mouse pointer is anywhere inside the <tt>svg</tt> element (without this <tt>rect</tt>, the mouse would be captured only when the user positions the mouse over a graphical element drawn inside the <tt>.bg</tt> group — like the <tt>circle</tt> in this example). For this example, I've set the fill of the <tt>rect</tt> to a light green-yellow color; but usually you'd just set it to <tt>transparent</tt>.</p>
<p>Then attach the pan & zoom behavior to the <tt>.bg</tt> group — but apply the pan & zoom transform to the <tt>.zoomed</tt> group it contains. This will prevent stuttering when panning, since the <tt>.bg</tt> group will remain fixed; and it will avoid messing with any transforms or other fancy styling/positioning you already have on your inner <tt>.canvas</tt> group:</p>
<blockquote><code><script>
const zoomed = d3.select('#viz2 .zoomed')
const bg = d3
.select('#viz2 .bg')
.call(
d3
// base d3 pan & zoom behavior
.zoom()
// limit zoom to between 20% and 200% of original size
.scaleExtent([0.2, 2])
// apply pan & zoom transform to 'zoomed' element
.on('zoom', ({ transform }) => zoomed.attr('transform', transform))
// add 'grabbing' class to 'bg' element when panning;
// add 'scaling' class to 'bg' element when zooming
.on('start', ({ sourceEvent: { type } }) => {
bg.classed(type === 'wheel' ? 'scaling' : 'grabbing', true)
})
// remove 'grabbing' and 'scaling' classes when done panning & zooming
.on('end', () => bg.classed('grabbing scaling', false)),
)
</script>
</code></blockquote>
<p>Finally, set the mouse cursor via CSS when the user positions the pointer over the <tt>rect</tt> element. The <tt>grabbing</tt> and <tt>scaling</tt> classes will be added to the <tt>.bg</tt> group while the pan or zoom activity is ongoing, via the <tt>on('start')</tt> and <tt>on('end')</tt> hooks above:</p>
<blockquote><code><style lang="css">
.bg > rect {
cursor: move;
}
.bg.grabbing > rect {
cursor: grabbing;
}
.bg.scaling > rect {
cursor: zoom-in;
}
</style>
</code></blockquote>
<p>When you put it all together, it will work like the following:</p>
<svg id="viz2" width="300" height="300">
<g class="bg">
<rect width="100%" height="100%" fill="#efc" />
<g class="zoomed">
<g class="canvas" transform="translate(150,150)">
<circle cx="0" cy="0" r="25%" fill="#69c" />
</g>
</g>
</g>
</svg>
<script>
const zoomed = d3.select('#viz2 .zoomed')
const bg = d3
.select('#viz2 .bg')
.call(
d3
.zoom()
.scaleExtent([0.2, 2])
.on('zoom', ({ transform }) => zoomed.attr('transform', transform))
.on('start', ({ sourceEvent: { type } }) => {
bg.classed(type === 'wheel' ? 'scaling' : 'grabbing', true)
})
.on('end', () => bg.classed('grabbing scaling', false)),
)
</script>
<style lang="css">
.bg > rect {
cursor: move;
}
.bg.grabbing > rect {
cursor: grabbing;
}
.bg.scaling > rect {
cursor: zoom-in;
}
</style>
Justin Ludwighttp://www.blogger.com/profile/03245749869056259124noreply@blogger.com0tag:blogger.com,1999:blog-3778768890472614719.post-66514235105856224342021-01-14T21:18:00.000-08:002021-01-14T21:18:16.444-08:00Using an Ecto Readonly Replica Repo<p>Elixir <a href="https://hexdocs.pm/ecto/Ecto.html">Ecto</a> has excellent documentation for how to use <a href="https://hexdocs.pm/ecto/replicas-and-dynamic-repositories.html">read-only replica databases</a>, but because I'm so dense it took me a bit of trial and error to figure out where all the changes suggested by the documentation should go in my own app. Here's a concrete example of what I had to change for my conventional <a href="https://hexdocs.pm/mix/Mix.html">Mix</a> + <a href="https://www.phoenixframework.org/">Phoenix</a> application.</p>
<p>(The docs describe how to add N different replicas dynamically — <tt>MyApp.Repo.Replica1</tt>, <tt>MyApp.Repo.Replica2</tt>, etc — but since I only have to worry about a single endpoint for my read replicas, I simplified things and just used a single, static <tt>MyApp.Repo.Replica</tt> instance in my Elixir configuration and code.)</p>
<h3>Mix Environment Helper</h3>
<p>To allow my app to determine whether it was compiled and is running with a test, dev, or prod configuration, I added a <tt>config_env</tt> setting to my app, and set it to the value of the <tt>Mix.env/0</tt> function at compile time:</p>
<blockquote><code># config/config.exs
config :my_app,
<ins>config_env: Mix.env(),</ins>
ecto_repos: [MyApp.Repo]
</code></blockquote>
<p>Note that with Elixir 1.11 and newer, you can instead use <tt>Config.config_env/0</tt> in place of <tt>Mix.env/0</tt>:</p>
<blockquote><code># config/config.exs
config :my_app,
<ins>config_env: Config.config_env(),</ins>
ecto_repos: [MyApp.Repo]
</code></blockquote>
<p>And in my root <tt>MyApp</tt> module, I added a helper function to access this <tt>config_env</tt> setting:</p>
<blockquote><code># lib/my_app.ex
defmodule MyApp do
<ins>def config_env, do: Application.get_env(:my_app, :config_env)</ins>
end
</code></blockquote>
<p>This means that I can call <tt>MyApp.config_env/0</tt> at runtime in various places in my app's code, and get the <tt>Mix.env/0</tt> value with which the app was compiled (like <tt>:test</tt>, <tt>:dev</tt>, or <tt>:prod</tt>).</p>
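<p>For example (a trivial, hypothetical usage), you can branch on that value anywhere at runtime:</p>
<blockquote><code># anywhere in my_app's runtime code (hypothetical example)
if MyApp.config_env() != :test do
IO.puts("not running under the test configuration")
end
</code></blockquote>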
<h3>Replica Module</h3>
<p>To my existing <tt>lib/my_app/repo.ex</tt> file (which already contained the <tt>MyApp.Repo</tt> module), I added the definition for my new <tt>MyApp.Repo.Replica</tt> module, like so:</p>
<blockquote><code># lib/my_app/repo.ex
defmodule MyApp.Repo do
use Ecto.Repo,
otp_app: :my_app,
adapter: Ecto.Adapters.Postgres
<ins>def replica, do: MyApp.Repo.Replica
end
defmodule MyApp.Repo.Replica do
use Ecto.Repo,
otp_app: :my_app,
adapter: Ecto.Adapters.Postgres,
default_dynamic_repo: if(MyApp.config_env() != :test, do: MyApp.Repo.Replica, else: MyApp.Repo),
read_only: true</ins>
end
</code></blockquote>
<p>The <tt>default_dynamic_repo</tt> option in the <tt>MyApp.Repo.Replica</tt> module uses the <tt>config_env</tt> helper I added above to set up the module to use the primary <tt>MyApp.Repo</tt>'s own connection pool for the read replica in the test environment, as recommended by the Ecto docs. This way the replica instance will just delegate to the primary repo instance for all of its read operations in the test environment, but will still enforce its own read-only setting. Also, this way I don't have to configure any test-env-specific settings for the read replica in my <tt>config/test.exs</tt> file (nor do I need to start up another child process for the replica, as we'll see in the next section).</p>
<h3>Application Module</h3>
<p>In non-test environments, the new read replica module does need to be started as a child process, alongside the primary repo. So I modified the <tt>start/2</tt> function in my application module to start it:</p>
<blockquote><code># lib/my_app/application.ex
defmodule MyApp.Application do
use Application
def start(_type, _args) do
<ins># don't start separate readonly repo in test mode
repos =
if MyApp.config_env() != :test do
[MyApp.Repo, MyApp.Repo.Replica]
else
[MyApp.Repo]
end</ins>
children =
<ins>repos ++</ins>
[
<del>MyApp.Repo,</del>
MyAppWeb.Endpoint
]
opts = [strategy: :one_for_one, name: MyApp.Supervisor]
Supervisor.start_link(children, opts)
end
end
</code></blockquote>
<h3>Dev Config</h3>
<p>For my dev environment configuration, I updated my <tt>config/dev.exs</tt> file to simply duplicate the configuration of the primary <tt>MyApp.Repo</tt> for <tt>MyApp.Repo.Replica</tt> (giving the replica its own connection pool to the same database as the primary):</p>
<blockquote><code># config/dev.exs
<del>config :my_app, MyApp.Repo,</del>
<ins>for repo <- [MyApp.Repo, MyApp.Repo.Replica] do
config :my_app, repo,</ins>
username: "myusername",
password: "mypassword",
database: "mydatabase",
hostname: "localhost",
show_sensitive_data_on_connection_error: true,
pool_size: 10
<ins>end</ins>
</code></blockquote>
<h3>Prod Config</h3>
<p>For the prod environment configuration, I updated my <tt>config/releases.exs</tt> file to use a similar configuration as the primary for the replica, but have it instead pull the replica hostname from a different environment variable (<tt>DB_READONLY</tt> in this case):</p>
<blockquote><code># config/releases.exs
config :my_app, MyApp.Repo,
ssl: true,
username: System.get_env("DB_USERNAME"),
password: System.get_env("DB_PASSWORD"),
database: System.get_env("DB_DATABASE"),
hostname: System.get_env("DB_HOSTNAME"),
pool_size: String.to_integer(System.get_env("DB_POOLSIZE") || "10")
<ins>config :my_app, MyApp.Repo.Replica,
ssl: true,
username: System.get_env("DB_USERNAME"),
password: System.get_env("DB_PASSWORD"),
database: System.get_env("DB_DATABASE"),
hostname: System.get_env("DB_READONLY"),
pool_size: String.to_integer(System.get_env("DB_POOLSIZE") || "10")</ins>
</code></blockquote>
<h3>Using the Replica</h3>
<p>With all the above in place, everywhere in my Elixir code that I want to query a read replica instead of the primary database, I can just replace <tt>MyApp.Repo</tt> with <tt>MyApp.Repo.replica()</tt>:</p>
<blockquote><code># lib/my_app/users.ex
import Ecto.Query
alias MyApp.Repo
alias MyApp.Users.User
def list_usernames do
from(u in User, select: u.username)
|> Repo<ins>.replica()</ins>.all()
end
</code></blockquote>
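<p>A nice side effect of the <tt>read_only: true</tt> setting is that the replica repo shouldn't even define the write functions, so an accidental write through the replica fails loudly rather than silently hitting the replica's connection. A hypothetical example (in the same module context as above, with <tt>Repo</tt> and <tt>User</tt> aliased):</p>
<blockquote><code># writes still go through the primary repo:
Repo.insert!(%User{username: "alice"})
# ...but a write through the replica should blow up,
# since read-only repos don't define insert and friends:
Repo.replica().insert(%User{username: "bob"})
</code></blockquote>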
Justin Ludwighttp://www.blogger.com/profile/03245749869056259124noreply@blogger.com0tag:blogger.com,1999:blog-3778768890472614719.post-77625060039629759192020-12-30T20:33:00.001-08:002020-12-30T20:58:22.652-08:00Ecto RDS SSL Connection with Certificate Verification<p>It's nice and easy to connect to an <a href="https://aws.amazon.com/rds/">AWS RDS</a> instance with Elixir <a href="https://hexdocs.pm/ecto/Ecto.html">Ecto</a> over SSL/TLS, as long as you're not worried about verifying the database server's certificate. You just add an <tt>ssl: true</tt> setting when you configure the Ecto Repo, like this snippet from a <tt>config/releases.exs</tt> file for a hypothetical "myapp":</p>
<blockquote><code># config/releases.exs
config :myapp, MyApp.Repo,
hostname: System.get_env("DB_HOSTNAME"),
database: System.get_env("DB_DATABASE"),
username: System.get_env("DB_USERNAME"),
password: System.get_env("DB_PASSWORD"),
<ins>ssl: true</ins>
</code></blockquote>
<p>That's probably good enough for most cloud environments; but if you want to defend against a sophisticated attacker eavesdropping on or manipulating the SSL connections between your DB client and the RDS server, you also need to configure your Ecto Repo's <tt>ssl_opts</tt> setting to verify the server's certificate.</p>
<p>Unfortunately, this is not so straightforward. You need to either write your own certificate verification function (not trivial), or use one supplied by another library — like the <a href="https://github.com/deadtrickster/ssl_verify_fun.erl">ssl_verify_fun.erl</a> library.</p>
<p>To use the <tt>:ssl_verify_hostname</tt> verification function from the <tt>ssl_verify_fun.erl</tt> library, first add the library as a dependency to your <tt>mix.exs</tt> file:</p>
<blockquote><code># mix.exs
defp deps do
[
{:ecto_sql, "~> 3.5"},
<ins>{:ssl_verify_fun, ">= 0.0.0"}</ins>
]
end
</code></blockquote>
<p>Then add the following <tt>ssl_opts</tt> setting to your Ecto Repo config:</p>
<blockquote><code># config/releases.exs
<ins>check_hostname = String.to_charlist(System.get_env("DB_HOSTNAME"))</ins>
config :myapp, MyApp.Repo,
hostname: System.get_env("DB_HOSTNAME"),
database: System.get_env("DB_DATABASE"),
username: System.get_env("DB_USERNAME"),
password: System.get_env("DB_PASSWORD"),
ssl: true,
<ins>ssl_opts: [
cacertfile: "/etc/ssl/certs/rds-ca-2019-root.pem",
server_name_indication: check_hostname,
verify: :verify_peer,
verify_fun: {&:ssl_verify_hostname.verify_fun/3, [check_hostname: check_hostname]}
]</ins>
</code></blockquote>
<p>Note the RDS server hostname (which would be something like <tt>my-rds-cluster.cluster-abcd1234efgh.us-east-1.rds.amazonaws.com</tt>) needs to be passed to the <tt>server_name_indication</tt> and <tt>check_hostname</tt> options as a charlist. The above example also assumes that you have downloaded the <a href="https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.SSL.html">root RDS SSL certificate</a> to <tt>/etc/ssl/certs/rds-ca-2019-root.pem</tt> on your DB client hosts.</p>
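<p>If you don't already have the RDS root certificate on your DB client hosts, you can download it with something like the following (this was the URL referenced by the linked AWS docs at the time; double-check it against the current docs):</p>
<blockquote><code>$ sudo curl -o /etc/ssl/certs/rds-ca-2019-root.pem \
https://s3.amazonaws.com/rds-downloads/rds-ca-2019-root.pem
</code></blockquote>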
<p>I'd also suggest pulling out the generation of <tt>ssl_opts</tt> into a function, to make it easy to set up multiple repos. This is the way I'd do it with our hypothetical "myapp" repo: I'd add one environment variable (<tt>DB_SSL</tt>) to trigger the Ecto <tt>ssl</tt> setting (with or without verifying the server cert), and another environment variable (<tt>DB_SSL_CA_CERT</tt>) to specify the path for the <tt>cacertfile</tt> option (triggering cert verification):</p>
<blockquote><code># config/releases.exs
make_ssl_opts = fn
"", _hostname ->
[]
cacertfile, hostname ->
check_hostname = String.to_charlist(hostname)
[
cacertfile: cacertfile,
server_name_indication: check_hostname,
verify: :verify_peer,
verify_fun: {&:ssl_verify_hostname.verify_fun/3, [check_hostname: check_hostname]}
]
end
db_ssl_ca_cert = System.get_env("DB_SSL_CA_CERT", "")
db_ssl = db_ssl_ca_cert != "" or System.get_env("DB_SSL", "") != ""
db_hostname = System.get_env("DB_HOSTNAME")
config :myapp, MyApp.Repo,
hostname: db_hostname,
database: System.get_env("DB_DATABASE"),
username: System.get_env("DB_USERNAME"),
password: System.get_env("DB_PASSWORD"),
ssl: db_ssl,
ssl_opts: make_ssl_opts.(db_ssl_ca_cert, db_hostname)
</code></blockquote>
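<p>With that in place, enabling SSL (with or without certificate verification) in a given environment is just a matter of setting the right environment variables. For example (hypothetical values):</p>
<blockquote><code># SSL without certificate verification:
DB_SSL=true
# SSL with verification against the RDS root CA:
DB_SSL_CA_CERT=/etc/ssl/certs/rds-ca-2019-root.pem
</code></blockquote>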
<p>With this verification in place, you'd see an error like the following if your DB client tries to connect to a server with an SSL certificate signed by a CA other than the one you configured:</p>
<blockquote><code>{:tls_alert, {:unknown_ca, 'TLS client: In state certify at ssl_handshake.erl:1950 generated CLIENT ALERT: Fatal - Unknown CA\n'}}</code></blockquote>
<p>And you'd see an error like the following if the certificate was signed by the expected CA, but for a different hostname:</p>
<blockquote><code>{bad_cert,unable_to_match_altnames} - {:tls_alert, {:handshake_failure, 'TLS client: In state certify at ssl_handshake.erl:1952 generated CLIENT ALERT: Fatal - Handshake Failure\n {bad_cert,unable_to_match_altnames}'}}</code></blockquote>Justin Ludwighttp://www.blogger.com/profile/03245749869056259124noreply@blogger.com0tag:blogger.com,1999:blog-3778768890472614719.post-44109454457759164892020-12-16T15:33:00.003-08:002020-12-16T15:33:49.829-08:00Using Logstash to Ingest CloudFront Logs Into Elasticsearch<p><a href="https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html">Elasticsearch</a> can be a good way of monitoring usage of your <a href="https://aws.amazon.com/cloudfront/">AWS CloudFront</a> websites. There are some fairly straightforward paths to shipping CloudFront logs to hosted Elasticsearch services like <a href="https://dzone.com/articles/cloudfront-log-analysis-using-the-logzio-elk-stack">Logz.io</a> or <a href="https://aws.amazon.com/blogs/networking-and-content-delivery/cloudfront-realtime-logs/">Amazon Elasticsearch</a>. Here's how to do it with your own self-hosted Elasticsearch and Logstash instances:</p>
<ol>
<li><a href="set-up-cloudfront-logging">Set up CloudFront logging</a></li>
<li><a href="set-up-sqs-notifications">Set up SQS notifications</a></li>
<li><a href="set-up-test-logstash-pipeline">Set up test Logstash pipeline</a></li>
<li><a href="set-up-main-logstash-pipeline">Set up main Logstash pipeline</a></li>
<li><a href="view-logs-in-kibana">View logs in Kibana</a></li>
</ol>
<h3 id="set-up-cloudfront-logging">Set up CloudFront logging</h3>
<p>First, you need an S3 bucket to store your CloudFront logs. You can use an existing bucket, or create a new one. You don't need to set up any special permissions for the bucket — but you probably will want to make sure the bucket denies public access to its content by default. In this example, we'll use an S3 bucket for logs called <tt>my-log-bucket</tt>, and we'll store our CloudFront logs under a directory of the bucket called <tt>my-cloudfront-logs</tt>. Also, we'll store each CloudFront distribution's logs in its own subdirectory of that directory; so for the distribution serving the <tt>www.example.com</tt> domain, we'll store that distribution's logs under the <tt>my-cloudfront-logs/www.example.com</tt> subdirectory.</p>
<p>With the S3 logging bucket created and available, update each of your CloudFront distributions to log to it. You can do this via the AWS console by editing the distribution, turning the "Standard Logging" setting on, setting the "S3 Bucket for Logs" to your S3 logging bucket (<tt>my-log-bucket.s3.amazonaws.com</tt>), and setting the "Log Prefix" to the directory path of the subdirectory of the S3 bucket under which you'll store the logs (<tt>my-cloudfront-logs/www.example.com/</tt>). Save your changes, and every few minutes CloudFront will save a new <tt>.gz</tt> file to the <tt>my-cloudfront-logs/www.example.com/</tt> subdirectory of the <tt>my-log-bucket</tt> (see the <a href="https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/AccessLogs.html">CloudFront access logs</a> docs for details).</p>
<h3 id="set-up-sqs-notifications">Set up SQS notifications</h3>
<p>Next, create a new SQS queue. We'll call ours <tt>my-cloudfront-log-notifications</tt>, and we'll create it in the <tt>us-east-1</tt> AWS region. When you create the queue, configure its "Receive message wait time" setting to <tt>10</tt> seconds or so; this will ensure the SQS client doesn't make way more SQS requests than needed (a setting of 10 seconds should keep the cost of this queue down to less than $1/month).</p>
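<p>If you'd rather use the CLI than the console to create the queue, the equivalent command would look something like this (adjust the queue name and region to your own):</p>
<blockquote><code>$ aws sqs create-queue \
--region us-east-1 \
--queue-name my-cloudfront-log-notifications \
--attributes ReceiveMessageWaitTimeSeconds=10
</code></blockquote>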
<p>The only other thing special you need to do when you create the queue is add an access policy to it that allows S3 to send messages to it. The policy should look like this (replace <tt>my-cloudfront-log-notifications</tt> with the name of your queue, <tt>us-east-1</tt> with your queue's region, <tt>my-log-bucket</tt> with the name of your log bucket, and <tt>123456789012</tt> with your AWS account ID):</p>
<blockquote><code>{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"AWS": "*"
},
"Action": "SQS:SendMessage",
"Resource": "arn:aws:sqs:us-east-1:123456789012:my-cloudfront-log-notifications",
"Condition": {
"StringEquals": {
"aws:SourceAccount": "123456789012"
},
"ArnLike": {
"aws:SourceArn": "arn:aws:s3:*:*:my-log-bucket"
}
}
}
]
}</code></blockquote>
<p>With the SQS queue created, update the S3 bucket to send all object-create events to the queue. You can do this via the AWS console by selecting the bucket and opening the "Events" block in the "Advanced Settings" section of the "Properties" tab of the bucket. There you can add a notification; name it <tt>my-cloudfront-log-configuration</tt>, check the "All object create events" checkbox, set the "Prefix" to <tt>my-cloudfront-logs/</tt>, and send it to your SQS queue <tt>my-cloudfront-log-notifications</tt>.</p>
<p>Alternately, you can add a notification with the same settings as above via the <tt>put-bucket-notification-configuration</tt> command of the <tt>s3api</tt> CLI, using a notification-configuration JSON file like the following:</p>
<blockquote><code>{
"QueueConfigurations": [
{
"Id": "my-cloudfront-log-configuration",
"QueueArn": "arn:aws:sqs:us-east-1:123456789012:my-cloudfront-log-notifications",
"Events": [
"s3:ObjectCreated:*"
],
"Filter": {
"Key": {
"FilterRules": [
{
"Name": "prefix",
"Value": "my-cloudfront-logs/"
}
]
}
}
}
]
}</code></blockquote>
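<p>Assuming you saved the above JSON to a file named <tt>notification.json</tt>, the command itself would be:</p>
<blockquote><code>$ aws s3api put-bucket-notification-configuration \
--bucket my-log-bucket \
--notification-configuration file://notification.json
</code></blockquote>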
<p>Now that you've hooked up S3 bucket notifications to the SQS queue, if you look in the AWS console for the SQS queue, under the Monitoring tab's charts you'll start to see messages received every few minutes.</p>
<h3 id="set-up-logstash-pipeline">Set up test Logstash pipeline</h3>
<p>Download a sample <tt>.gz</tt> log file from your S3 logging bucket, and copy it over to the machine you have Logstash running on. Move the file to a directory that Logstash can access, and make sure it has read permissions on the file. Our sample file will live at <tt>/var/log/my-cloudfront-logs/www.example.com/E123456789ABCD.2020-01-02-03.abcd1234.gz</tt>.</p>
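<p>One way to pull down a sample file is via the <tt>cp</tt> command of the <tt>s3</tt> CLI (using the example bucket, prefix, and file names from this article):</p>
<blockquote><code>$ aws s3 cp s3://my-log-bucket/my-cloudfront-logs/www.example.com/E123456789ABCD.2020-01-02-03.abcd1234.gz .
$ sudo mkdir -p /var/log/my-cloudfront-logs/www.example.com
$ sudo mv E123456789ABCD.2020-01-02-03.abcd1234.gz /var/log/my-cloudfront-logs/www.example.com/
$ sudo chmod a+r /var/log/my-cloudfront-logs/www.example.com/*.gz
</code></blockquote>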
<p>Copy the following <tt>my-cloudfront-pipeline.conf</tt> file into the <tt>/etc/logstash/conf.d</tt> directory on your Logstash machine (replacing the input path with your sample <tt>.gz</tt> log file), tail the Logstash logs (<tt>journalctl -u logstash -f</tt> if managed with systemd), and restart the Logstash service (<tt>sudo systemctl restart logstash</tt>):</p>
<blockquote><code># /etc/logstash/conf.d/my-cloudfront-pipeline.conf
input {
file {
file_completed_action => "log"
file_completed_log_path => "/var/lib/logstash/cloudfront-completed.log"
mode => "read"
path => "/var/log/my-cloudfront-logs/www.example.com/E123456789ABCD.2020-01-02-03.abcd1234.gz"
sincedb_path => "/var/lib/logstash/cloudfront-since.db"
type => "cloudfront"
}
}
filter {
if [type] == "cloudfront" {
if (("#Version: 1.0" in [message]) or ("#Fields: date" in [message])) {
drop {}
}
mutate {
rename => {
"type" => "[@metadata][type]"
}
# strip dashes that indicate empty fields
gsub => ["message", "\t-(?=\t)", " "] # literal tab
}
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version fle-status fle-encrypted-fields c-port time-to-first-byte x-edge-detailed-result-type sc-content-type sc-content-len sc-range-start sc-range-end
csv {
separator => " " # literal tab
columns => [
"date",
"time",
"x_edge_location",
"sc_bytes",
"c_ip",
"cs_method",
"cs_host",
"cs_uri_stem",
"sc_status",
"cs_referer",
"cs_user_agent",
"cs_uri_query",
"cs_cookie",
"x_edge_result_type",
"x_edge_request_id",
"x_host_header",
"cs_protocol",
"cs_bytes",
"time_taken",
"x_forwarded_for",
"ssl_protocol",
"ssl_cipher",
"x_edge_response_result_type",
"cs_protocol_version",
"fle_status",
"fle_encrypted_fields",
"c_port",
"time_to_first_byte",
"x_edge_detailed_result_type",
"sc_content_type",
"sc_content_len",
"sc_range_start",
"sc_range_end"
]
convert => {
"c_port" => "integer"
"cs_bytes" => "integer"
"sc_bytes" => "integer"
"sc_content_len" => "integer"
"sc_range_end" => "integer"
"sc_range_start" => "integer"
"sc_status" => "integer"
"time_taken" => "float"
"time_to_first_byte" => "float"
}
add_field => {
"datetime" => "%{date} %{time}"
"[@metadata][document_id]" => "%{x_edge_request_id}"
}
remove_field => ["cloudfront_fields", "cloudfront_version", "message"]
}
# parse datetime
date {
match => ["datetime", "yy-MM-dd HH:mm:ss"]
remove_field => ["datetime", "date", "time"]
}
# lookup geolocation of client ip address
geoip {
source => "c_ip"
target => "geo"
}
# parse user-agent into subfields
urldecode {
field => "cs_user_agent"
}
useragent {
source => "cs_user_agent"
target => "ua"
add_field => {
"user_agent.name" => "%{[ua][name]}"
"user_agent.version" => "%{[ua][major]}"
"user_agent.device.name" => "%{[ua][device]}"
"user_agent.os.name" => "%{[ua][os_name]}"
"user_agent.os.version" => "%{[ua][os_major]}"
}
remove_field => ["cs_user_agent", "ua"]
}
# pull logfile path from s3 metadata, if present
if [@metadata][s3][object_key] {
mutate {
add_field => {
"path" => "%{[@metadata][s3][object_key]}"
}
}
}
# strip directory path from logfile path, and canonicalize field name
mutate {
rename => {
"path" => "log.file.path"
}
gsub => ["log.file.path", ".*/", ""]
remove_field => "host"
}
# canonicalize field names, and drop unwanted fields
mutate {
rename => {
"c_ip" => "client.ip"
"cs_bytes" => "http.request.bytes"
"sc_content_len" => "http.response.body.bytes"
"sc_content_type" => "http.response.body.type"
"cs_method" => "http.request.method"
"cs_protocol" => "url.scheme"
"cs_protocol_version" => "http.version"
"cs_referer" => "http.request.referrer"
"cs_uri_query" => "url.query"
"cs_uri_stem" => "url.path"
"sc_bytes" => "http.response.bytes"
"sc_status" => "http.response.status_code"
"ssl_cipher" => "tls.cipher"
"ssl_protocol" => "tls.protocol_version"
"x_host_header" => "url.domain"
}
gsub => [
"http.version", "HTTP/", "",
"tls.protocol_version", "TLSv", ""
]
remove_field => [
"c_port",
"cs_cookie",
"cs_host",
"fle_encrypted_fields",
"fle_status",
"sc_range_end",
"sc_range_start",
"x_forwarded_for"
]
}
}
}
output {
stdout {
codec => "rubydebug"
}
}
</code></blockquote>
<p>You should see a bunch of entries in the Logstash logs like the following, one for each entry from your sample log file (note the fields will appear in a different order every time you run this):</p>
<blockquote><code>
Jan 02 03:04:05 logs1 logstash[12345]: {
Jan 02 03:04:05 logs1 logstash[12345]: "x_edge_detailed_result_type" => "Hit",
Jan 02 03:04:05 logs1 logstash[12345]: "@timestamp" => 2020-01-02T03:01:02.000Z,
Jan 02 03:04:05 logs1 logstash[12345]: "user_agent.device.name" => "EML-AL00",
Jan 02 03:04:05 logs1 logstash[12345]: "time_taken" => 0.001,
Jan 02 03:04:05 logs1 logstash[12345]: "http.version" => "2.0",
Jan 02 03:04:05 logs1 logstash[12345]: "user_agent.os.version" => "8",
Jan 02 03:04:05 logs1 logstash[12345]: "http.response.body.bytes" => nil,
Jan 02 03:04:05 logs1 logstash[12345]: "tls.cipher" => "ECDHE-RSA-AES128-GCM-SHA256",
Jan 02 03:04:05 logs1 logstash[12345]: "http.response.bytes" => 2318,
Jan 02 03:04:05 logs1 logstash[12345]: "@version" => "1",
Jan 02 03:04:05 logs1 logstash[12345]: "time_to_first_byte" => 0.001,
Jan 02 03:04:05 logs1 logstash[12345]: "http.request.method" => "GET",
Jan 02 03:04:05 logs1 logstash[12345]: "x_edge_request_id" => "s7lmJasUXiAm7w2oR34Gfg5zTgeQSTkYwiYV1pnz5Hzv8mRmBzyGrw==",
Jan 02 03:04:05 logs1 logstash[12345]: "log.file.path" => "EML9FBPJY2494.2020-01-02-03.abcd1234.gz",
Jan 02 03:04:05 logs1 logstash[12345]: "x_edge_result_type" => "Hit",
Jan 02 03:04:05 logs1 logstash[12345]: "http.request.bytes" => 388,
Jan 02 03:04:05 logs1 logstash[12345]: "http.request.referrer" => "http://baidu.com/",
Jan 02 03:04:05 logs1 logstash[12345]: "client.ip" => "192.0.2.0",
Jan 02 03:04:05 logs1 logstash[12345]: "user_agent.name" => "UC Browser",
Jan 02 03:04:05 logs1 logstash[12345]: "user_agent.version" => "11",
Jan 02 03:04:05 logs1 logstash[12345]: "url.query" => nil,
Jan 02 03:04:05 logs1 logstash[12345]: "http.response.body.type" => "text/html",
Jan 02 03:04:05 logs1 logstash[12345]: "url.domain" => "www.example.com",
Jan 02 03:04:05 logs1 logstash[12345]: "x_edge_location" => "LAX50-C3",
Jan 02 03:04:05 logs1 logstash[12345]: "http.response.status_code" => 200,
Jan 02 03:04:05 logs1 logstash[12345]: "geo" => {
Jan 02 03:04:05 logs1 logstash[12345]: "ip" => "192.0.2.0",
Jan 02 03:04:05 logs1 logstash[12345]: "region_name" => "Shanghai",
Jan 02 03:04:05 logs1 logstash[12345]: "country_name" => "China",
Jan 02 03:04:05 logs1 logstash[12345]: "timezone" => "Asia/Shanghai",
Jan 02 03:04:05 logs1 logstash[12345]: "longitude" => 121.4012,
Jan 02 03:04:05 logs1 logstash[12345]: "country_code3" => "CN",
Jan 02 03:04:05 logs1 logstash[12345]: "location" => {
Jan 02 03:04:05 logs1 logstash[12345]: "lon" => 121.4012,
Jan 02 03:04:05 logs1 logstash[12345]: "lat" => 31.0449
Jan 02 03:04:05 logs1 logstash[12345]: },
Jan 02 03:04:05 logs1 logstash[12345]: "region_code" => "SH",
Jan 02 03:04:05 logs1 logstash[12345]: "country_code2" => "CN",
Jan 02 03:04:05 logs1 logstash[12345]: "continent_code" => "AS",
Jan 02 03:04:05 logs1 logstash[12345]: "latitude" => 31.0449
Jan 02 03:04:05 logs1 logstash[12345]: },
Jan 02 03:04:05 logs1 logstash[12345]: "url.scheme" => "https",
Jan 02 03:04:05 logs1 logstash[12345]: "tls.protocol_version" => "1.2",
Jan 02 03:04:05 logs1 logstash[12345]: "user_agent.os.name" => "Android",
Jan 02 03:04:05 logs1 logstash[12345]: "x_edge_response_result_type" => "Hit",
Jan 02 03:04:05 logs1 logstash[12345]: "url.path" => "/"
Jan 02 03:04:05 logs1 logstash[12345]: }
</code></blockquote>
<p>These entries show you what Logstash will push to Elasticsearch, once you hook it up. You can adjust this <tt>my-cloudfront-pipeline.conf</tt> file and restart Logstash again and again until you get the exact field names and values that you want to push to Elasticsearch.</p>
<p>Let's look at each part of the pipeline individually.</p>
<p>In the <tt>input</tt> section, we're using the <tt>file</tt> input to read just our one sample file:</p>
<blockquote><code>
input {
file {
file_completed_action => "log"
file_completed_log_path => "/var/lib/logstash/cloudfront-completed.log"
mode => "read"
path => "/var/log/my-cloudfront-logs/www.example.com/E123456789ABCD.2020-01-02-03.abcd1234.gz"
sincedb_path => "/var/lib/logstash/cloudfront-since.db"
type => "cloudfront"
}
}
</code></blockquote>
<p>The key bit here is that we set the <tt>type</tt> field to <tt>cloudfront</tt>, which we'll use in the <tt>filter</tt> section below to apply our filtering logic only to entries of this type. If you're only going to process CloudFront log files in this pipeline, you can omit all the bits of the pipeline that deal with "type", which would simplify it some.</p>
<p>In the <tt>filter</tt> section, the first step is to check if the <tt>type</tt> field was set to <tt>"cloudfront"</tt>, and only execute the rest of the filter block if so:</p>
<blockquote><code>
filter {
if [type] == "cloudfront" {
</code></blockquote>
<p>Then the next step in <tt>filter</tt> section is to drop the two header lines in each CloudFront log file, the first beginning with <tt>#Version</tt>, and the second beginning with <tt>#Fields</tt>:</p>
<blockquote><code>
if (("#Version: 1.0" in [message]) or ("#Fields: date" in [message])) {
drop {}
}
</code></blockquote>
<p>After that, the next step renames the <tt>type</tt> field to <tt>[@metadata][type]</tt>, so that it won't be pushed to the Elasticsearch index. I've opted to use Elasticsearch indexes that are for my CloudFront logs only; however, if you want to push your CloudFront logs into indexes that are shared with other data, you may want to keep the <tt>type</tt> field.</p>
<blockquote><code>
mutate {
rename => {
"type" => "[@metadata][type]"
}
</code></blockquote>
<p>The second half of this <tt>mutate</tt> filter strips out the <tt>-</tt> characters that indicate empty field values from all the columns in the log entry. Note that the last argument of this <tt>gsub</tt> function is a literal tab character — make sure your text editor does not convert it to spaces!</p>
<blockquote><code>
# strip dashes that indicate empty fields
gsub => ["message", "\t-(?=\t)", " "] # literal tab
}
</code></blockquote>
<p>For example, it will convert an entry like this:</p>
<blockquote><code>
2020-01-02 03:03:03 HIO50-C1 6564 192.0.2.0 GET d2c4n4ttot8c65.cloudfront.net / 200 - Mozilla/5.0%20(Windows%20NT%206.1;%20WOW64;%20rv:40.0)%20Gecko/20100101%20Firefox/40.1 - - Miss nY0knXse4vDxS5uOBe3YAhDpH809bqhsILUUFAtE_4ZLlfXCiYcD0A== www.example.com https 170 0.164 - TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Miss HTTP/1.1 - - 62684 0.164 Miss text/html 6111 - -
</code></blockquote>
<p>Into this (removing the dashes that indicate empty values, but not the dashes in non-empty values like the date or ciphersuite):</p>
<blockquote><code>
2020-01-02 03:03:03 HIO50-C1 6564 192.0.2.0 GET d2c4n4ttot8c65.cloudfront.net / 200 Mozilla/5.0%20(Windows%20NT%206.1;%20WOW64;%20rv:40.0)%20Gecko/20100101%20Firefox/40.1 Miss nY0knXse4vDxS5uOBe3YAhDpH809bqhsILUUFAtE_4ZLlfXCiYcD0A== www.example.com https 170 0.164 TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 Miss HTTP/1.1 62684 0.164 Miss text/html 6111
</code></blockquote>
<p>The next step is the meat of the process, using the <tt>csv</tt> filter to convert each tab-separated log line into named fields. Note that the <tt>separator</tt> property value is also a literal tab character:</p>
<blockquote><code>
#Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-edge-response-result-type cs-protocol-version fle-status fle-encrypted-fields c-port time-to-first-byte x-edge-detailed-result-type sc-content-type sc-content-len sc-range-start sc-range-end
csv {
separator => " " # literal tab
columns => [
"date",
"time",
"x_edge_location",
"sc_bytes",
"c_ip",
"cs_method",
"cs_host",
"cs_uri_stem",
"sc_status",
"cs_referer",
"cs_user_agent",
"cs_uri_query",
"cs_cookie",
"x_edge_result_type",
"x_edge_request_id",
"x_host_header",
"cs_protocol",
"cs_bytes",
"time_taken",
"x_forwarded_for",
"ssl_protocol",
"ssl_cipher",
"x_edge_response_result_type",
"cs_protocol_version",
"fle_status",
"fle_encrypted_fields",
"c_port",
"time_to_first_byte",
"x_edge_detailed_result_type",
"sc_content_type",
"sc_content_len",
"sc_range_start",
"sc_range_end"
]
}
</code></blockquote>
<p>The <tt>columns</tt> property lists out each field name, in order. Later on in this pipeline, we'll rename many of these fields to use the <a href="https://www.elastic.co/guide/en/ecs/current/index.html">ECS</a> nomenclature, but this step uses the field names as defined by CloudFront, for clarity.</p>
<p>The middle part of the <tt>csv</tt> filter converts the numeric fields to actual numbers, via the <tt>convert</tt> property mapping:</p>
<blockquote><code>
convert => {
"c_port" => "integer"
"cs_bytes" => "integer"
"sc_bytes" => "integer"
"sc_content_len" => "integer"
"sc_range_end" => "integer"
"sc_range_start" => "integer"
"sc_status" => "integer"
"time_taken" => "float"
"time_to_first_byte" => "float"
}
</code></blockquote>
<p>The <tt>add_field</tt> part of the <tt>csv</tt> filter combines the individual <tt>date</tt> and <tt>time</tt> fields into a combined <tt>datetime</tt> field (to be converted to a timestamp object later); and also copies the <tt>x_edge_request_id</tt> field value as the <tt>[@metadata][document_id]</tt> field:</p>
<blockquote><code>
add_field => {
"datetime" => "%{date} %{time}"
"[@metadata][document_id]" => "%{x_edge_request_id}"
}
</code></blockquote>
<p>The <tt>[@metadata][document_id]</tt> field will be used later on when we push the record to Elasticsearch (to be used as the record's ID). Like with the <tt>[@metadata][type]</tt> field, this is another case where if you're only going to process CloudFront log files in this pipeline, you could omit this extra metadata field, and just use the <tt>x_edge_request_id</tt> directly when configuring the Elasticsearch record ID.</p>
<p>The final part of the <tt>csv</tt> filter removes some fields that are redundant once the log entry has been parsed: <tt>message</tt> (the full log entry text itself), and <tt>cloudfront_fields</tt> and <tt>cloudfront_version</tt> (which the <tt>s3snssqs</tt> input we'll add later automatically includes):</p>
<blockquote><code>
remove_field => ["cloudfront_fields", "cloudfront_version", "message"]
}
</code></blockquote>
<p>The next filter step is to convert the <tt>datetime</tt> field (created from the <tt>date</tt> and <tt>time</tt> fields above) into a proper datetime object:</p>
<blockquote><code>
# parse datetime
date {
match => ["datetime", "yy-MM-dd HH:mm:ss"]
remove_field => ["datetime", "date", "time"]
}
</code></blockquote>
<p>This sets the datetime as the value of the <tt>@timestamp</tt> field. We'll also remove the <tt>datetime</tt>, <tt>date</tt>, and <tt>time</tt> fields, since we won't need them now that we have the parsed datetime in the <tt>@timestamp</tt> field.</p>
<p>The next filter uses the client IP address to lookup a probable physical location for the client:</p>
<blockquote><code>
# lookup geolocation of client ip address
geoip {
source => "c_ip"
target => "geo"
}
</code></blockquote>
<p>This creates a <tt>geo</tt> field with a bunch of subfields (like <tt>[geo][country_name]</tt>, <tt>[geo][city_name]</tt>, etc) containing the probable location details. Note that many IP addresses won't have a mapping value for many of the subfields; see the <a href="https://www.elastic.co/guide/en/logstash/current/plugins-filters-geoip.html">Geoip filter docs</a> for more details.</p>
<p>The next filter decodes the user-agent field, and the filter after that parses it. The <tt>useragent</tt> filter parses the <tt>cs_user_agent</tt> field into the <tt>ua</tt> field, which, like the <tt>geo</tt> field, will contain a bunch of subfields. We'll pull out a few of those subfields, and add fields with ECS names for them:</p>
<blockquote><code>
# parse user-agent into subfields
urldecode {
field => "cs_user_agent"
}
useragent {
source => "cs_user_agent"
target => "ua"
add_field => {
"user_agent.name" => "%{[ua][name]}"
"user_agent.version" => "%{[ua][major]}"
"user_agent.device.name" => "%{[ua][device]}"
"user_agent.os.name" => "%{[ua][os_name]}"
"user_agent.os.version" => "%{[ua][os_major]}"
}
remove_field => ["cs_user_agent", "ua"]
}
</code></blockquote>
<p>Since the user-agent info we want are now in those newly added <tt>user_agent.*</tt> fields, the last part of the <tt>useragent</tt> filter removes the <tt>cs_user_agent</tt> field and intermediate <tt>ua</tt> field.</p>
<p>When using the <tt>file</tt> input, like we are while testing this pipeline, the <tt>file</tt> input will add a <tt>path</tt> field to each record, containing the path to the file it's reading. Later on, when we use the <tt>s3snssqs</tt> input, the <tt>s3snssqs</tt> input will pass the same path as the <tt>[@metadata][s3][object_key]</tt> field. So that we can access this value uniformly, regardless of which input we used, we have this next filter step, where if the <tt>[@metadata][s3][object_key]</tt> field is present, we set the <tt>path</tt> field to the <tt>[@metadata][s3][object_key]</tt> field's value:</p>
<blockquote><code>
# pull logfile path from s3 metadata, if present
if [@metadata][s3][object_key] {
mutate {
add_field => {
"path" => "%{[@metadata][s3][object_key]}"
}
}
}
</code></blockquote>
<p>With the <tt>path</tt> field now containing the file path, regardless of input, we use the next filter to chop the path down to just the log file name (like <tt>E123456789ABCD.2020-01-02-03.abcd1234.gz</tt>):</p>
<blockquote><code>
# strip directory path from logfile path, and canonicalize field name
mutate {
rename => {
"path" => "log.file.path"
}
gsub => ["log.file.path", ".*/", ""]
remove_field => "host"
}
</code></blockquote>
<p>We also have the filter rename the <tt>path</tt> field to <tt>log.file.path</tt> (the canonical ECS name for it); and have the filter remove the <tt>host</tt> field (added by the <tt>file</tt> input along with the <tt>path</tt> field, based on the host Logstash is running on — which we don't really care to have as part of our log record in Elasticsearch).</p>
<p>The last filter in our pipeline renames all CloudFront fields that have equivalent <a href="https://www.elastic.co/guide/en/ecs/current/index.html">ECS</a> (Elastic Common Schema) field names:</p>
<blockquote><code>
# canonicalize field names, and drop unwanted fields
mutate {
rename => {
"c_ip" => "client.ip"
"cs_bytes" => "http.request.bytes"
"sc_content_len" => "http.response.body.bytes"
"sc_content_type" => "http.response.body.type"
"cs_method" => "http.request.method"
"cs_protocol" => "url.scheme"
"cs_protocol_version" => "http.version"
"cs_referer" => "http.request.referrer"
"cs_uri_query" => "url.query"
"cs_uri_stem" => "url.path"
"sc_bytes" => "http.response.bytes"
"sc_status" => "http.response.status_code"
"ssl_cipher" => "tls.cipher"
"ssl_protocol" => "tls.protocol_version"
"x_host_header" => "url.domain"
}
</code></blockquote>
<p>To match the ECS field specs, the middle part of the filter removes the <tt>HTTP/</tt> prefix from the <tt>http.version</tt> field values (converting values like <tt>HTTP/2.0</tt> to just <tt>2.0</tt>); and removes the <tt>TLSv</tt> prefix from the <tt>tls.protocol_version</tt> field values (converting values like <tt>TLSv1.2</tt> to just <tt>1.2</tt>):</p>
<blockquote><code>
gsub => [
"http.version", "HTTP/", "",
"tls.protocol_version", "TLSv", ""
]
</code></blockquote>
<p>And finally, the last part of the filter removes miscellaneous CloudFront fields that we don't care about:</p>
<blockquote><code>
remove_field => [
"c_port",
"cs_cookie",
"cs_host",
"fle_encrypted_fields",
"fle_status",
"sc_range_end",
"sc_range_start",
"x_forwarded_for"
]
}
}
}
</code></blockquote>
<p>The output section of the pipeline simply outputs each log record to Logstash's own log output — which is what you see when you tail Logstash's logs:</p>
<blockquote><code>
output {
stdout {
codec => "rubydebug"
}
}
</code></blockquote>
<h3 id="set-up-main-logstash-pipeline">Set up main Logstash pipeline</h3>
<p>Once you have this test pipeline working to your satisfaction, it's time to change the output section of the pipeline to push the output to Elasticsearch. Replace the <tt>output</tt> block of the <tt>/etc/logstash/conf.d/my-cloudfront-pipeline.conf</tt> file with this block (substituting your own <tt>host</tt>, <tt>user</tt>, and <tt>password</tt> settings, as well as any custom SSL settings you need — see the <a href="https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html">Elasticsearch output plugin</a> docs for details):</p>
<blockquote><code>
output {
# don't try to index anything that didn't get a document_id
if [@metadata][document_id] {
elasticsearch {
hosts => ["https://elasticsearch.example.com:9243"]
user => "elastic"
password => "password123"
document_id => "%{[@metadata][document_id]}"
ecs_compatibility => "v1"
index => "ecs-logstash-%{[@metadata][type]}-%{+YYYY.MM.dd}"
}
}
}
</code></blockquote>
<p>The following line in this block serves as one more guard to avoid indexing anything that didn't get parsed properly (you may want to send such log entries to a dedicated errors index, to keep an eye on entries that failed to parse):</p>
<blockquote><code>if [@metadata][document_id] {</code></blockquote>
<p>And this line uses the <tt>[@metadata][document_id]</tt> field to set the record ID for each entry (recall in the pipeline filters, we copied the value of the CloudFront <tt>x_edge_request_id</tt>, which should be unique for each request, to the <tt>[@metadata][document_id]</tt> field):</p>
<blockquote><code>document_id => "%{[@metadata][document_id]}"</code></blockquote>
<p>And since our output block includes setting <tt>ecs_compatibility</tt> to <tt>v1</tt>, which directs Logstash to use ECS-compatible index templates, this line directs Logstash to create a separate index for each day and type of log entry we process:</p>
<blockquote><code>index => "ecs-logstash-%{[@metadata][type]}-%{+YYYY.MM.dd}"</code></blockquote>
<p>For example, Logstash will create an index named <tt>ecs-logstash-cloudfront-2020.01.02</tt> if we process a CloudFront log entry for January 2, 2020 (or use the existing index with that name, if it already exists).</p>
<p>Restart Logstash once you change the output block. In Logstash's own log output, you should see entries indicating successful connections to your Elasticsearch host, as well as a ginormous entry for the index template it installs in Elasticsearch. Once you see that, check your Elasticsearch instance — you should see a new <tt>ecs-logstash-cloudfront-YYYY.MM.DD</tt> index created, with entries from your sample CloudFront log file.</p>
<p>You can use this same mechanism to backfill your existing CloudFront log files to Elasticsearch — manually download the log files you want to backfill to your Logstash machine (like via the <tt>sync</tt> command of the <tt>s3</tt> CLI), and customize the <tt>file</tt> input block's <tt>path</tt> property (with wildcards) to direct Logstash to read them in.</p>
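<p>For example, the backfill download might look like this (again using the hypothetical bucket and directory names from earlier):</p>
<blockquote><code>$ aws s3 sync \
s3://my-log-bucket/my-cloudfront-logs/www.example.com/ \
/var/log/my-cloudfront-logs/www.example.com/
</code></blockquote>
<p>And the corresponding wildcard <tt>path</tt> property in the <tt>file</tt> input would be something like:</p>
<blockquote><code>path => "/var/log/my-cloudfront-logs/www.example.com/*.gz"
</code></blockquote>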
<p>For future CloudFront log files, however, we're going to make one more change to our pipeline, and use the <a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-s3-sns-sqs.html">S3 via SNS/SQS</a> input (aka <tt>s3snssqs</tt>) to pull CloudFront log files from S3 as soon as CloudFront publishes them.</p>
<p>First, create a new IAM policy for your Logstash machine to use that will allow it to both read from your logging bucket, and to read and delete items from the SQS queue we set up above. The policy should look like this (change the <tt>Resource</tt> elements to point to your own S3 log bucket and SQS log queue, set up in the first two sections of this article):</p>
<blockquote><code>{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": "s3:ListBucket",
"Resource": "arn:aws:s3:::my-log-bucket"
},
{
"Effect": "Allow",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::my-log-bucket/my-cloudfront-logs/*"
},
{
"Effect": "Allow",
"Action": [
"sqs:Get*",
"sqs:List*",
"sqs:ReceiveMessage",
"sqs:ChangeMessageVisibility",
"sqs:DeleteMessage"
],
"Resource": [
"arn:aws:sqs:us-east-1:123456789012:my-cloudfront-log-notifications"
]
}
]
}</code></blockquote>
<p>Then install the <tt>logstash-input-s3-sns-sqs</tt> plugin on your Logstash machine:</p>
<blockquote><code>cd /usr/share/logstash
sudo -u logstash bin/logstash-plugin install logstash-input-s3-sns-sqs
</code></blockquote>
<p>Then update the <tt>input</tt> section of your pipeline to be the following (substituting your own SQS <tt>queue</tt> name and its AWS <tt>region</tt>):</p>
<blockquote><code>
input {
# pull new logfiles from s3 when notified
s3snssqs {
region => "us-east-1"
queue => "my-cloudfront-log-notifications"
from_sns => false
type => "cloudfront"
}
}
</code></blockquote>
<p>If you're running the Logstash machine in AWS, you can use the usual <a href="https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2_instance-profiles.html">EC2 instance profiles</a> or <a href="https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-iam-roles.html">IAM roles for tasks</a> to grant the machine access to the policy you created above. Otherwise, you'll need to add some AWS credential settings to the <tt>s3snssqs</tt> input as well; consult the <a href="https://www.elastic.co/guide/en/logstash/current/plugins-inputs-s3.html">S3 input plugin's</a> docs for options (the <tt>s3snssqs</tt> input allows for the same AWS credential options as the <tt>s3</tt> input does, but the <tt>s3</tt> input has better documentation for them).</p>
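<p>For example, with static credentials, the input block might look something like the following (the <tt>access_key_id</tt> and <tt>secret_access_key</tt> option names are the shared AWS options used by the <tt>s3</tt> input; verify them against the plugin docs for your version):</p>
<blockquote><code>input {
s3snssqs {
region => "us-east-1"
queue => "my-cloudfront-log-notifications"
from_sns => false
type => "cloudfront"
access_key_id => "AKIAEXAMPLEEXAMPLE"
secret_access_key => "example-secret-access-key"
}
}
</code></blockquote>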
<p>Now restart Logstash. You should see the same output in Logstash's own log as before; but if you check Elasticsearch, you should see new records being added.</p>
<h3 id="view-logs-in-kibana">View logs in Kibana</h3>
<p>Eventually you'll want to create fancy dashboards in Kibana for your new CloudFront data; but for now we'll just get started by setting up a listing where you can view them in the "Discover" section of Kibana.</p>
<p>First log into Kibana, and navigate to the "Management" > "Stack Management" section of Kibana. Within the "Stack Management" section, if you navigate to the "Data" > "Index management" subsection, you should see a bunch of new indexes named in the form of <tt>ecs-logstash-cloudfront-YYYY.MM.DD</tt> (like <tt>ecs-logstash-cloudfront-2020.01.01</tt> and so on):</p>
<p><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0PkwPlTPORrOfXqFxFc0zN9tSa3Ts0V3vsx10MxoINu-RwD3sKzc4pG5WYYW243F8ea3lUwddKg3oDZ7ushOsc4n5J2kQlbsnLTQz1BdxskZsdca_hvZ2WTUwTamODe-vCUFVGKlWLeI/s1664/screenshot-kibana-index-management.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" width="320" data-original-height="880" data-original-width="1664" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh0PkwPlTPORrOfXqFxFc0zN9tSa3Ts0V3vsx10MxoINu-RwD3sKzc4pG5WYYW243F8ea3lUwddKg3oDZ7ushOsc4n5J2kQlbsnLTQz1BdxskZsdca_hvZ2WTUwTamODe-vCUFVGKlWLeI/s320/screenshot-kibana-index-management.png"/></a></p>
<p>Once you've verified Kibana is seeing the indexes, navigate to the "Kibana" > "Index Patterns" subsection, and click the "Create index pattern" button. Specify <tt>ecs-logstash-cloudfront-*</tt> as the pattern, and select <tt>@timestamp</tt> as the time field:</p>
<p><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0jJiWBRjZqtQBPRf7_YrcMggR4ammAITKGuSAgk0sizPDe9uVdb7QbDx1lmOg0bmmkeizFSmRIr5xE8ayILjOz2GdWE9pi954Kjvvvxj6g6x0jh9yR55ebNxEreyaU6ScfnxBHvx2Nck/s1664/screenshot-kibana-index-patterns.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" width="320" data-original-height="880" data-original-width="1664" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi0jJiWBRjZqtQBPRf7_YrcMggR4ammAITKGuSAgk0sizPDe9uVdb7QbDx1lmOg0bmmkeizFSmRIr5xE8ayILjOz2GdWE9pi954Kjvvvxj6g6x0jh9yR55ebNxEreyaU6ScfnxBHvx2Nck/s320/screenshot-kibana-index-patterns.png"/></a></p>
<p>With the new index pattern created, navigate out of the "Stack Management" section of Kibana into the main "Kibana" > "Discover" section. This will show your most recent "Discover" search. On the left side of the page, change the selected index pattern to the pattern you just created (<tt>ecs-logstash-cloudfront-*</tt>). You should now see your most recent CloudFront entries listed (if not, use the time window selector in the top right of the page to expand the time window to include a range you know should include some entries). You can use this page to create a list with custom columns and custom filter settings for your CloudFront logs:</p>
<p><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiY7nN6uIuBCA0ajUSRAyWTLcaFxjtT-Qf81Qv2bXwk8zFiXWM_tjaXenyv8sG8wYEQpW8zdCSUod6IM7DTMGfmssfQ4mcMYm9q8mkomwgrBldnzAB27eIrSbhNRWSZkUVPX6q9Ud8BBTQ/s1664/screenshot-kibana-discover.png" style="display: block; padding: 1em 0; text-align: center; "><img alt="" border="0" width="320" data-original-height="880" data-original-width="1664" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiY7nN6uIuBCA0ajUSRAyWTLcaFxjtT-Qf81Qv2bXwk8zFiXWM_tjaXenyv8sG8wYEQpW8zdCSUod6IM7DTMGfmssfQ4mcMYm9q8mkomwgrBldnzAB27eIrSbhNRWSZkUVPX6q9Ud8BBTQ/s320/screenshot-kibana-discover.png"/></a></p>
Justin Ludwighttp://www.blogger.com/profile/03245749869056259124noreply@blogger.com0tag:blogger.com,1999:blog-3778768890472614719.post-64095134072460115162020-12-04T15:05:00.003-08:002020-12-04T15:05:56.350-08:00Building a Logstash Offline Plugin Pack with Docker<p>If you run a <a href="https://www.elastic.co/guide/en/logstash/current/index.html">Logstash</a> node in an environment where it doesn't have access to the public Internet, and need to install some extra plugins, you have to build an "<a href="https://www.elastic.co/guide/en/logstash/current/offline-plugins.html">offline plugin pack</a>" (a zip containing the plugins and their dependencies) on a machine that does have public Internet access. You can then copy the pack to your Logstash node, and install the plugins from it directly.</p>
<p>Here's a quick little script I whipped up to build the offline plugin pack using the official Logstash docker container:</p>
<blockquote><code>#!/bin/sh -e
logstash_version=7.10.0
logstash_plugins=$(echo '
logstash-codec-cloudfront
logstash-input-s3-sns-sqs
' | xargs)
echo "
bin/logstash-plugin install $logstash_plugins
bin/logstash-plugin prepare-offline-pack \
--output /srv/logstash/logstash-plugins.zip \
$logstash_plugins
" |
docker run -i -u $(id -u) -v $(pwd):/srv/logstash --rm \
docker.elastic.co/logstash/logstash:$logstash_version /bin/sh
</code></blockquote>
<p>Set the script's <tt>logstash_version</tt> variable to the version of Logstash you're using, and set the (whitespace-separated) list of plugins in the <tt>logstash_plugins</tt> variable to the plugins you need. Run the script, and it will output a <tt>logstash-plugins.zip</tt> into your working directory.</p>
<p>You can then copy the <tt>logstash-plugins.zip</tt> file to your Logstash node (for example, to the <tt>/usr/share/logstash</tt> directory of the machine), and install the contained plugins like this:</p>
<blockquote><code>cd /usr/share/logstash
sudo -u logstash bin/logstash-plugin install file://logstash-plugins.zip
</code></blockquote>
<p>Make sure you run the <tt>logstash-plugin</tt> command as the same user you use to run Logstash itself (typically the <tt>logstash</tt> user) — otherwise the plugins will be installed with the wrong filesystem permissions (and you'll see errors about it when you run the main Logstash process).</p>
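<p>To double-check the result, you can list the installed plugins as that same user. For example (a quick sanity check, using the two example plugins from the pack-building script above):</p>
<blockquote><code>cd /usr/share/logstash
# both plugins from the pack should show up in the installed list
sudo -u logstash bin/logstash-plugin list | grep -E 'cloudfront|s3-sns-sqs'
</code></blockquote>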
Justin Ludwighttp://www.blogger.com/profile/03245749869056259124noreply@blogger.com0tag:blogger.com,1999:blog-3778768890472614719.post-14132677268744862982020-11-18T10:53:00.000-08:002020-11-18T10:53:01.943-08:00Antora Deploy to S3 and CloudFront<p>Even though there aren't any dedicated <a href="https://antora.org/">Antora</a> components for deploying to <a href="https://aws.amazon.com/cloudfront/">AWS CloudFront</a> or <a href="https://aws.amazon.com/s3/">S3</a>, it's still really easy to do — most of the Antora settings you'd use for a generic web hosting site work perfectly for S3 + CloudFront. Here's how:</p>
<ol>
<li><a href="#playbook-settings">Playbook Settings</a></li>
<li><a href="#s3-settings">S3 Settings</a></li>
<li><a href="#cloudfront-settings">CloudFront Settings</a></li>
<li><a href="#upload-script">Upload Script</a></li>
<li><a href="#redirects-script">Redirects Script</a></li>
</ol>
<h3 id="playbook-settings">Playbook Settings</h3>
<p>Here's an example playbook file that you'd use to build your documentation as part of your production deploy process:</p>
<blockquote><code># antora-playbook.yml
site:
  robots: allow
  start_page: example-user-guide::index.adoc
  title: Example Documentation
  url: https://docs.example.com
content:
  sources:
  - url: https://git.example.com/my-account/my-docs.git
    branches: master
    start_paths: content/*
output:
  clean: true
runtime:
  fetch: true
ui:
  bundle:
    url: https://ci.example.com/my-account/my-docs-ui/builds/latest/ui-bundle.zip
    snapshot: true
urls:
  html_extension_style: indexify
</code></blockquote>
<p>If you run Antora in the same directory as this playbook, with a command like the following, Antora will generate your site to the <tt>build/site</tt> sub-directory:</p>
<blockquote><code>antora generate antora-playbook.yml</code></blockquote>
<p>These are the key playbook settings for S3/CloudFront (including some settings omitted from the above playbook, because the default value is perfect already):</p>
<p><a href="https://docs.antora.org/antora/2.3/playbook/site-robots/"><b><tt>site.robots</tt></b></a>: Set this to <tt>allow</tt> (or <tt>disallow</tt> if you want to forbid search-engines from crawling your docs), so that Antora will generate a <tt>robots.txt</tt> file for you.</p>
<p><a href="https://docs.antora.org/antora/2.3/playbook/site-url/"><b><tt>site.url</tt></b></a>: Make sure you set this to an absolute URL — doing so will trigger Antora to build out a bunch of desirable files, like a <tt>404.html</tt> and <tt>sitemap.xml</tt>. If your documentation has its own dedicated domain name, like <tt>docs.example.com</tt>, set <tt>site.url</tt> to <tt>https://docs.example.com</tt>; if instead your documentation can be found at a sub-directory of your main website, like under the <tt>docs</tt> directory of <tt>www.example.com</tt>, set <tt>site.url</tt> to <tt>https://www.example.com/docs</tt>. In either case, omit the trailing slash (eg <b>don't</b> set it to <tt>https://www.example.com/docs/</tt> — <b>do</b> set it to <tt>https://www.example.com/docs</tt>).</p>
<p><a href="https://docs.antora.org/antora/2.3/playbook/output-dir/"><b><tt>output.dir</tt></b></a>: By default, Antora will generate your site to the <tt>build/site</tt> sub-directory of whatever directory you ran the <tt>antora</tt> command from. If this is good for you, you can omit the <tt>output.dir</tt> setting; otherwise you can set <tt>output.dir</tt> to some other local filesystem path.</p>
<p><a href="https://docs.antora.org/antora/2.3/playbook/urls-html-extension-style/"><b><tt>urls.html_extension_style</tt></b></a>: Set this to <tt>indexify</tt>, which directs Antora to a) build out each documentation page to an <tt>index.html</tt> file in a sub-directory named for the path of the page, and b) to build links to each page via the path to the page with a trailing slash. For example, for a page named <tt>how-it-works.adoc</tt> in the <tt>ROOT</tt> module of the <tt>example-ui-guide</tt> component, with <tt>indexify</tt> Antora will build the page out as a file named <tt>example-ui-guide/how-it-works/index.html</tt> (within its <tt>build/site</tt> output directory), and build links to the page as <tt>/example-ui-guide/how-it-works/</tt>. This is exactly what you want when you your site is served by S3.</p>
<p><a href="https://docs.antora.org/antora/2.3/playbook/urls-redirect-facility/"><b><tt>urls.redirect_facility</tt></b></a>: The default setting, <tt>static</tt> is what you want for S3, so you can omit this setting from your playbook (or set it explicity to <tt>static</tt> if you like).</p>
<h3 id="s3-settings">S3 Settings</h3>
<p>When hosting Antora-generated sites on S3, you don't need to do anything different than you would for any other statically-generated website, so you can follow any of the dozens of online guides for S3 website hosting, like Amazon's own <a href="https://docs.aws.amazon.com/AmazonS3/latest/user-guide/static-website-hosting.html">S3 static website hosting guide</a>. The key things you need to set are:</p>
<ol>
<li>Turn on static website hosting for the S3 bucket.</li>
<li>Set the "index document" to <tt>index.html</tt> (the default for S3 website hosting).</li>
<li>Set the "error document" to <tt>404.html</tt> (Antora generates this file for you).</li>
<li>Either configure the permissions of the S3 bucket to explicitly allow public access to read all objects in the bucket; or when you upload files to the bucket, explicitly upload them with a canned ACL setting that allows public read access (as the scripts covered later in this article will).</li>
</ol>
<p>You need to make the files in your S3 bucket publicly-accessible (point #4 above) so that CloudFront can access them. While there technically is a way to configure S3 and CloudFront so that the files are not publicly-accessible in S3 but CloudFront can still access them (via an <a href="https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-restricting-access-to-s3.html">Origin Access Identity</a>), it's kind of a pain. Since these files are ultimately meant to be served to the public through CloudFront anyway, it's simpler just to make them publicly-accessible in S3.</p>
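<p>If you'd rather do this from the command line than the S3 console, something like the following covers the settings above (a sketch using the AWS CLI, assuming the bucket already exists and your CLI is configured; the bucket name is just the example from this article, and depending on your account's "block public access" settings you may also need to relax those before a public bucket policy will be accepted):</p>
<blockquote><code># points 1-3: enable website hosting with the Antora-friendly index/error documents
aws s3 website s3://example-bucket \
    --index-document index.html --error-document 404.html
# point 4: allow public reads of every object in the bucket
aws s3api put-bucket-policy --bucket example-bucket --policy '{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::example-bucket/*"
  }]
}'
</code></blockquote>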
<h3 id="cloudfront-settings">CloudFront Settings</h3>
<p>There's also nothing special you need to do for Antora-generated sites with CloudFront — any of the dozens of online guides for S3 + CloudFront hosting will work to set it up. Just make sure that when you set the origin for your CloudFront distribution, you use the "<a href="https://docs.aws.amazon.com/AmazonS3/latest/dev/WebsiteEndpoints.html">website endpoint</a>" of your S3 bucket, and not the standard endpoint.</p>
<p>For example, if your S3 bucket is named "example-bucket" and it's located in the <tt>us-west-2</tt> region, <b>don't</b> use <tt>example-bucket.s3.us-west-2.amazonaws.com</tt> as your CloudFront origin — instead <b>do</b> use <tt>example-bucket.s3-website-us-west-2.amazonaws.com</tt>. Using the website endpoint will ensure that CloudFront serves the Antora-generated <tt>404.html</tt> page for pages that don't exist, and that it also serves a <tt>301</tt> redirect for pages for which you've configured S3 to redirect (as the scripts covered later in this article will).</p>
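<p>If you want to double-check which origin an existing distribution is pointed at, a query like this works (a sketch with the AWS CLI, using the example distribution ID from the upload script below); it should print the website-style endpoint, eg <tt>example-bucket.s3-website-us-west-2.amazonaws.com</tt>:</p>
<blockquote><code>aws cloudfront get-distribution-config --id E1234567890ABC \
    --query 'DistributionConfig.Origins.Items[].DomainName' --output text
</code></blockquote>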
<h3 id="upload-script">Upload Script</h3>
<p>Once you've set up your Antora playbook, S3 bucket, and CloudFront distribution, you're ready to deploy your site. If you've set up your <tt>antora-playbook.yml</tt> as above, you can build your documentation, upload it to S3, and clear the CloudFront caches of the old version of your docs with the following simple script:</p>
<blockquote><code>#!/bin/sh -e
build_dir=build/site
cf_distro=E1234567890ABC
s3_bucket=example-bucket
antora generate antora-playbook.yml
aws s3 sync $build_dir s3://$s3_bucket --acl public-read --delete
aws cloudfront create-invalidation --distribution-id $cf_distro --paths '/*'
</code></blockquote>
<p>The <tt>antora</tt> command generates your documentation to the <tt>build/site</tt> directory. The <tt>aws s3 sync</tt> command replaces the existing content of <tt>example-bucket</tt> with the content of the <tt>build/site</tt> directory (granting public read-access to each individual file uploaded). The <tt>aws cloudfront create-invalidation</tt> command clears the CloudFront caches for all the content of your CloudFront distribution.</p>
<p>If your documentation is part of a larger site (eg hosted as <tt>https://www.example.com/docs/</tt>) instead of being hosted as its own site (eg <tt>https://docs.example.com/</tt>), add the sub-directory under which your documentation is hosted (eg <tt>/docs</tt>) to the last two lines of the above script; for example, like the following:</p>
<blockquote><code>aws s3 sync $build_dir s3://$s3_bucket<ins>/docs</ins> --acl public-read --delete
aws cloudfront create-invalidation --distribution-id $cf_distro --paths '<ins>/docs</ins>/*'
</code></blockquote>
<h3 id="redirects-script">Redirects Script</h3>
<p>The redirect pages that Antora will generate when you set the Antora <tt>urls.redirect_facility</tt> setting to <tt>static</tt> will work fine for your website users as is. But search engines will like it better if you serve real HTTP redirect responses (with the redirect information embedded in HTTP header fields) instead of just HTML pages that, once parsed, tell the client browser to redirect to a different location. You can get S3 + CloudFront to serve <tt>301 Moved Permanently</tt> redirects in place of all the redirect pages Antora generates by uploading them separately to S3 with a special <tt>x-amz-website-redirect-location</tt> header.</p>
<p>To do so, insert the following block into your upload script between the <tt>aws s3 sync</tt> and <tt>aws cloudfront create-invalidation</tt> commands:</p>
<blockquote><code>#!/bin/sh -e
build_dir=build/site
cf_distro=E1234567890ABC
s3_bucket=example-bucket
antora generate antora-playbook.yml
aws s3 sync $build_dir s3://$s3_bucket --acl public-read --delete
<ins>grep -lR 'http-equiv="refresh"' $build_dir | while read file; do
redirect_url=$(awk -F'"' '/rel="canonical"/ { print $4 }' $file)
aws s3 cp $file s3://$s3_bucket/${file##$build_dir/} \
--website-redirect $redirect_url --acl public-read
done</ins>
aws cloudfront create-invalidation --distribution-id $cf_distro --paths '/*'
</code></blockquote>
<p>The above script block will search the Antora build dir for all redirect pages (with the <tt>grep</tt> command), and loop over each (with the <tt>while</tt> command, reading the local filepath to each into the <tt>file</tt> variable). It will pull out the canonical URL of the page to redirect to from the redirect page (via the <tt>awk</tt> command, into the <tt>redirect_url</tt> variable), and re-upload the file using the <tt>--website-redirect</tt> flag of the <tt>aws s3 cp</tt> command to indicate that S3 should serve a <tt>301</tt> redirect to the specified URL instead of the file content itself (when accessed through the S3 website endpoint).</p>
<p>As a concrete example of this redirect capability, say you had a page named <tt>how-it-works.adoc</tt> in the <tt>ROOT</tt> module of your <tt>example-user-guide</tt> component. If you added metadata to that <tt>how-it-works.adoc</tt> page to redirect to it from the non-existent <tt>inner-workings.adoc</tt> page (eg via a <a href="https://docs.antora.org/antora/2.3/page/page-aliases/"><tt>page-aliases</tt></a> header attribute value of <tt>inner-workings.adoc</tt>), Antora would generate the following redirect page for you at <tt>build/site/example-user-guide/inner-workings/index.html</tt>:</p>
<blockquote><code><!DOCTYPE html>
<meta charset="utf-8">
<link rel="canonical" href="https://docs.example.com/example-user-guide/how-it-works/">
<script>location="../how-it-works/"</script>
<meta http-equiv="refresh" content="0; url=../how-it-works/">
<meta name="robots" content="noindex">
<title>Redirect Notice</title>
<h1>Redirect Notice</h1>
<p>The page you requested has been relocated to <a href="../how-it-works/">https://docs.example.com/example-user-guide/how-it-works/</a>.</p>
</code></blockquote>
<p>The above script would re-upload this file to S3 like so (with all variables expanded, and some additional line-wrapping for legibility):</p>
<blockquote><code> aws s3 cp build/site/example-user-guide/inner-workings/index.html \
s3://example-bucket/example-user-guide/inner-workings/index.html \
--website-redirect https://docs.example.com/example-user-guide/how-it-works/ --acl public-read
</code></blockquote>
<p>If a user (or search engine) then navigates to <tt>https://docs.example.com/example-user-guide/inner-workings/</tt>, S3 + CloudFront will send this response back:</p>
<blockquote><code>HTTP/2 301
location: https://docs.example.com/example-user-guide/how-it-works/
</code></blockquote>
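<p>You can check this yourself with <tt>curl</tt> once the re-uploaded redirect page (and any CloudFront invalidation) has gone through; substitute your own redirect URL:</p>
<blockquote><code>$ curl -sI https://docs.example.com/example-user-guide/inner-workings/ | grep -iE '^(HTTP|location)'
HTTP/2 301
location: https://docs.example.com/example-user-guide/how-it-works/
</code></blockquote>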
Justin Ludwighttp://www.blogger.com/profile/03245749869056259124noreply@blogger.com0tag:blogger.com,1999:blog-3778768890472614719.post-91150327419062902032020-10-30T15:11:00.008-07:002020-11-01T15:55:15.893-08:00Antora Quick Start Tutorial<p><a href="https://antora.org/">Antora</a> is a sophisticated documentation "static site generator" (SSG) for <a href="https://asciidoc.org">AsciiDoc</a>/<a href="https://asciidoctor.org">Asciidoctor</a>. It's built for big projects with multiple sets of documentation, so just getting started can be a little intimidating. Here's a quick guide for how to get up and running with Antora, using <a href="https://docs.docker.com/">Docker</a> and some <a href="https://www.gnu.org/software/make/">Makefiles</a> on your local dev machine.</p>
<p>At minimum, you really should use at least three separate git repos with Antora:</p>
<ol>
<li>Your documentation content (AsciiDoc source files).</li>
<li>Your customized UI theme/style (JavaScript, CSS, and Handlebars templates).</li>
<li>Your Antora build and deployment configuration (<tt>antora-playbook.yml</tt>, and whatever additional scripts/config files you use to deploy to production).</li>
</ol>
<p>In practice, you may in fact have several different repos for #1, like the <tt>docs</tt> directory from several different software products, or simply different repos for different pieces of documentation (like an Install Guide, Product Manual, API Reference, etc). And #3 might be part of other repos you use for devops configuration/deployment/infrastructure/etc.</p>
<p>But you're really going to want to have a new dedicated repo for #2 — this is where you customize your page header and footer content with links to your own websites, with your own logos, colors, and general look-and-feel.</p>
<p>So this guide is just going to focus on #2, with development-focused Antora config (#3) and initial documentation content (#1) snuck into the same repo. Before you begin, make sure you have <a href="https://git-scm.com/book/en/v2/Getting-Started-Installing-Git">Git</a>, Make, and <a href="https://docs.docker.com/get-docker/">Docker</a> installed on your local dev machine.</p>
<ol>
<li><a href="#ui-set-up">UI Set Up</a></li>
<li><a href="#ui-customization">UI Customization</a></li>
<li><a href="#documentation-set-up">Documentation Set Up</a></li>
<li><a href="#documentation-customization">Documentation Customization</a></li>
<li><a href="#development-workflow">Development Workflow</a></li>
</ol>
<h3 id="ui-set-up">UI Set Up</h3>
<h5>Clone Antora Default UI</h5>
<p>To start off, clone the <a href="https://gitlab.com/antora/antora-ui-default">Antora Default UI</a> repo. We're going to save all our UI customizations, our initial doc content, and our dev Antora config in this new repo, which we'll call <tt>docs-ui</tt>. Run these commands in a terminal:</p>
<blockquote><code>$ git clone https://gitlab.com/antora/antora-ui-default.git docs-ui
$ cd docs-ui
$ git remote rename origin antora-ui-default
$ git branch --move antora
</code></blockquote>
<p>This will keep <tt>antora-ui-default</tt> as a remote source for your git repository, so you can easily diff and pull in core fixes to the UI from the Antora Default UI repo. We'll keep the local copy of the Antora Default UI in a branch called <tt>antora</tt>.</p>
<p>But you'll want to set the main remote source of your customized repo (called <tt>origin</tt> by convention) to a new remote source. For example, if you have a github account named <tt>my-account</tt>, create a new repo named <tt>docs-ui</tt> in it, and push the local content of this new repo to it. We'll call the local branch of the main source <tt>main</tt>:</p>
<blockquote><code>$ git checkout -b main
$ git remote add origin https://github.com/my-account/docs-ui.git
$ git push -u origin main
</code></blockquote>
<h5>Dockerized UI Build</h5>
<p>Now we're ready to start making changes to the repo. To make it easy to preview and build those changes, we'll add three files to the root of the repo: <tt>ui.dockerfile</tt>, <tt>docker-compose.yml</tt>, and <tt>Makefile</tt>. Create these files:</p>
<blockquote><code># ui.dockerfile
FROM node:12-buster
WORKDIR /srv/docs-ui
</code></blockquote>
<blockquote><code># docker-compose.yml
version: '3'
services:
  ui:
    build:
      context: .
      dockerfile: ui.dockerfile
    ports:
      - 8052:5252
    volumes:
      - .:/srv/docs-ui
</code></blockquote>
<blockquote><code># Makefile
# help: @ Lists available make tasks
help:
	@egrep -oh '[0-9a-zA-Z_\.\-]+:.*?@ .*' $(MAKEFILE_LIST) | \
	awk 'BEGIN {FS = ":.*?@ "}; {printf "\033[36m%-30s\033[0m %s\n", $$1, $$2}' | sort
# node_modules: @ Runs initial ui npm install
node_modules:
	docker-compose run ui npm install
# ui.build: @ Builds ui production output (to build/ui-bundle.zip)
ui.build: node_modules
	docker-compose run -u $$(id -u) ui node_modules/.bin/gulp bundle
# ui.lint: @ Runs ui linting
ui.lint: node_modules
	docker-compose run -u $$(id -u) ui node_modules/.bin/gulp lint
# ui.run: @ Runs ui server in preview mode (on port 8052)
ui.run: node_modules
	docker-compose run -u $$(id -u) --service-ports ui node_modules/.bin/gulp preview
# ui.shell: @ Opens bash shell in ui container
ui.shell: CMD ?= /bin/bash
ui.shell:
	docker-compose run -u $$(id -u) ui $(CMD)
</code></blockquote>
<p>Note that Makefiles require you to use tabs for indentation, instead of spaces, so make sure your editor has preserved the indentation in the above <tt>Makefile</tt> with tabs.</p>
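<p>If you're not sure whether your editor kept the tabs, one quick way to check is to make them visible; <tt>cat -A</tt> prints each tab as <tt>^I</tt> (and each line ending as <tt>$</tt>), so every recipe line under a target should start with <tt>^I</tt> rather than spaces:</p>
<blockquote><code>cat -A Makefile | head
</code></blockquote>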
<h5>Run UI Preview</h5>
<p>With those three files in place, run this command from the repo root:</p>
<blockquote><code>make ui.run</code></blockquote>
<p>This will build the Docker image defined by the <tt>ui.dockerfile</tt> file, launch it as a container, and run the Antora UI preview server in it, exposed on port 8052. Open up a browser, and navigate to:</p>
<blockquote><code>http://localhost:8052/</code></blockquote>
<p>You should see a test page for the Antora UI (titled "Hardware and Software Requirements") with a bunch of sample content.</p>
<h3 id="ui-customization">UI Customization</h3>
<h5>Customize Header Content</h5>
<p>Now open up the <tt>src/partials/header-content.hbs</tt> file from the repo. This is the template for the page header content. Change the line with the "Home" link to this:</p>
<blockquote><code><a class="navbar-item" href="https://example.com/">Example Co Home</a></code></blockquote>
<p>Save the file, go back to your browser, and refresh the page. The "Home" link in the page header should now read "Example Co Home" (and link to https://example.com/).</p>
<h5>Customize Header Color</h5>
<p>Next open up the <tt>src/css/vars.css</tt> file. This file defines (via CSS variables, which begin with the <tt>--</tt> prefix) the basic colors and sizes of the UI. Change the <tt>--navbar-background</tt> variable to this:</p>
<blockquote><code>--navbar-background: #39f;</code></blockquote>
<p>Save the file, go back to your browser, and refresh the page. The background of the page header should now be a medium blue.</p>
<h5>Build UI Bundle</h5>
<p>Kill the <tt>make ui.run</tt> task (by pressing control-C in the terminal it's running in), and run this command in its place:</p>
<blockquote><code>make ui.build</code></blockquote>
<p>This will build an Antora UI bundle as a zip file called <tt>ui-bundle.zip</tt> in the repo's <tt>build</tt> directory. We'll use this bundle in the next step, when we set up our basic Antora documentation build.</p>
<h3 id="documentation-set-up">Documentation Set Up</h3>
<p>Now we're ready to actually build some documentation. The Antora Default UI repo includes an Antora documentation component in its <tt>docs</tt> directory, so we'll start by building it.</p>
<h5>Create Antora Playbook</h5>
<p>The first thing to do is create an <tt>antora-playbook.yml</tt> file. This file defines the configuration for the Antora build process. Add the following <tt>antora-playbook.yml</tt> to the root of the repo:</p>
<blockquote><code># antora-playbook.yml
site:
  robots: allow
  start_page: antora-ui-default::index.adoc
  title: Example Documentation
  url: https://docs.example.com
content:
  sources:
  - url: ./
    branches: HEAD
    start_path: docs
runtime:
  cache_dir: ./build/cache
ui:
  bundle:
    url: ./build/ui-bundle.zip
urls:
  html_extension_style: indexify
</code></blockquote>
<p>This will configure Antora to use "<a href="https://docs.antora.org/antora/2.3/playbook/author-mode/">author mode</a>", where it pulls its documentation content from local files instead of remote git repos. Your production build scripts will probably configure Antora to pull documentation content from specific release branches of various product or documentation repos, but while you're developing the documentation, you'll just want to build the latest docs from your working copy.</p>
<p>The Antora documentation covers all playbook settings thoroughly in its <a href="https://docs.antora.org/antora/2.3/playbook/">Antora Playbook</a> section, but let's briefly touch on each setting of this <tt>antora-playbook.yml</tt>:</p>
<p><b><tt>site.robots</tt></b>: Setting this to <tt>allow</tt> directs Antora to generate a <tt>robots.txt</tt> file which allows bots to spider your site; setting it to <tt>disallow</tt> generates a <tt>robots.txt</tt> that disallows bots. While you can omit this setting in your dev build (and not generate a <tt>robots.txt</tt> file), it's nice to see what Antora will generate.</p>
<p><b><tt>site.start_page</tt></b>: This is the <a href="https://docs.antora.org/antora/2.3/page/page-id/">page ID</a> of the page to redirect to when a visitor navigates to the root of your site (eg https://docs.example.com/). A value of <tt>antora-ui-default::index.adoc</tt> sets it to redirect to the <tt>index.adoc</tt> page of the <tt>ROOT</tt> module in the <tt>antora-ui-default</tt> component (https://docs.example.com/antora-ui-default/ under this configuration). The source file for this page is located at <tt>docs/modules/ROOT/pages/index.adoc</tt> in our repo.</p>
<p><b><tt>site.title</tt></b>: This is the site-wide title that will be displayed in the header of every page, as well as the browser titlebar.</p>
<p><b><tt>site.url</tt></b>: This is the base URL of the site, when deployed, minus the trailing slash. For deployments where the documentation site has its own domain name (eg <tt>docs.example.com</tt>), the value of this setting should just be the URL scheme plus domain name (eg <tt>https://docs.example.com</tt>); for deployments where the documentation lives as a sub-directory of a larger site, the value should also include the sub-directory under which the documentation will live (eg <tt>https://www.example.com/docs</tt>). While you can safely omit this setting for dev builds, it's nice to include just so you can see how this URL will be used in production.</p>
<p><b><tt>content.sources</tt></b>: This is a list of sources (each separate item in a <a href="https://yaml.org/spec/1.2/spec.html">YAML</a> list is denoted by a <tt>-</tt> sign); initially, we'll just have one source: the <tt>docs</tt> directory that came with the Antora Default UI repo.</p>
<p><b><tt>content.sources[0].url</tt></b>: With your production configuration, you'd usually specify a full URL to a git repo with this setting; but with our dev config, we'll just use this local repo, indicated by <tt>./</tt>. Note that this setting has to be the path to the root of a git repo — it can't specify a sub-directory of the repo itself (like say <tt>./docs</tt>).</p>
<p><b><tt>content.sources[0].branches</tt></b>: This specifies which branch of the repo to use. With your production config, you'd probably want to specify specific branches that correspond to releases of your product (eg <tt>[1.0, 1.1, 1.2]</tt>); but for development, we just want whatever your current working branch is (<tt>HEAD</tt> in git parlance).</p>
<p><b><tt>content.sources[0].start_path</tt></b>: This specifies the path to the <a href="https://docs.antora.org/antora/2.3/standard-directories/">content source root</a> of each Antora component in the repo you want to include. We just have one to start with, located in the <tt>docs</tt> directory. Each component must have an <tt>antora.yml</tt> file in its source root; this file defines its component name, display title, version string, and navigation structure (and it can also include AsciiDoc setting customizations that apply to just the particular component).</p>
<p><b><tt>runtime.cache_dir</tt></b>: This specifies where Antora keeps its internal cache of files. The default location (<tt>~/.cache/antora</tt>) would put it inside our Docker container, which is fine except that it means that Antora has to rebuild the cache every time we run one of our Makefile commands. So for a moderately improved experience, we'll move it to the <tt>build/cache</tt> directory inside the repo itself (the <tt>build</tt> directory is already conveniently listed in the <tt>.gitignore</tt> file we cloned from the Antora Default UI).</p>
<p><b><tt>ui.bundle.url</tt></b>: This specifies the location where Antora should fetch your customized UI files from. We want to use the UI built by this very repo, so we'll specify the path to the UI bundle built by our <tt>make ui.build</tt> command, <tt>./build/ui-bundle.zip</tt>. We prefix the path with <tt>./</tt> to indicate to Antora that this is a local file path — usually this would be the full URL to the location where your build system has stored the latest build of your customized UI; the build system for the Antora project saves the latest stable build of the Antora Default UI to <tt>https://gitlab.com/antora/antora-ui-default/-/jobs/artifacts/master/raw/build/ui-bundle.zip?job=bundle-stable</tt>.</p>
<p><b><tt>urls.html_extension_style</tt></b>: Setting this to <tt>indexify</tt> directs Antora to generate each page as an <tt>index.html</tt> file within a directory named for the page — eg <tt>https://docs.example.com/antora-ui-default/index.html</tt> for our start page — and to drop <tt>index.html</tt> from the page URL when linking to it — eg <tt>https://docs.example.com/antora-ui-default/</tt>. This is the style you'd usually use when hosting static content with NGINX, Apache, and many static hosting services. Antora allows for <a href="https://docs.antora.org/antora/2.3/playbook/urls-html-extension-style/">several different options</a>, however, and while you don't need to set this in your dev config, it's nice to just so you can see what your URLs will look like in production.</p>
<h5>Dockerized Docs Build</h5>
<p>To make it easy to run the build, we'll add another Dockerfile, and add to our existing docker-compose configuration and Makefile. Add this <tt>antora.dockerfile</tt> (substituting <tt>2.3.4</tt> in the file for whatever Antora's latest stable version number is):</p>
<blockquote><code># antora.dockerfile
FROM antora/antora:2.3.4
RUN yarn global add http-server onchange
WORKDIR /srv/docs
</code></blockquote>
<p>Then add an <tt>antora</tt> service to our existing <tt>docker-compose.yml</tt> file:</p>
<blockquote><code># docker-compose.yml
version: '3'
services:
  <ins>antora:
    build:
      context: .
      dockerfile: antora.dockerfile
    environment:
      CI: 'true'
    ports:
      - 8051:8080
    volumes:
      - .:/srv/docs</ins>
  ui:
    build:
      context: .
      dockerfile: ui.dockerfile
    ports:
      - 8052:5252
    volumes:
      - .:/srv/docs-ui
</code></blockquote>
<p>Note that the <tt>CI: 'true'</tt> environment variable will suppress the "Edit this Page" link that otherwise would be displayed in the top right of each page (making the output generated by Antora in our Docker containers more similar to what we'd see in production).</p>
<p>Finally, add some <tt>antora.*</tt> tasks to our <tt>Makefile</tt>:</p>
<blockquote><code># Makefile
# help: @ Lists available make tasks
help:
	@egrep -oh '[0-9a-zA-Z_\.\-]+:.*?@ .*' $(MAKEFILE_LIST) | \
	awk 'BEGIN {FS = ":.*?@ "}; {printf "\033[36m%-30s\033[0m %s\n", $$1, $$2}' | sort
<ins># antora.build: @ Builds documentation production output (to build/site)
antora.build:
	docker-compose run -u $$(id -u) antora antora generate --clean antora-playbook.yml
# antora.run: @ Serves documentation output (on port 8051)
antora.run:
	docker-compose run --service-ports antora http-server build/site -c-1
# antora.watch: @ Watches for documentation changes and rebuilds (to build/site)
antora.watch:
	docker-compose run -u $$(id -u) -T antora onchange \
	-i antora-playbook.yml 'docs/**' \
	-- antora generate antora-playbook.yml
# antora.shell: @ Opens bash shell in antora container
antora.shell: CMD ?= /bin/sh
antora.shell:
	docker-compose run -u $$(id -u) antora $(CMD)</ins>
# node_modules: @ Runs initial ui npm install
node_modules:
	docker-compose run ui npm install
# ui.build: @ Builds ui production output (to build/ui-bundle.zip)
ui.build: node_modules
	docker-compose run -u $$(id -u) ui node_modules/.bin/gulp bundle
# ui.lint: @ Runs ui linting
ui.lint: node_modules
	docker-compose run -u $$(id -u) ui node_modules/.bin/gulp lint
# ui.run: @ Runs ui server in preview mode (port 8052)
ui.run: node_modules
	docker-compose run -u $$(id -u) --service-ports ui node_modules/.bin/gulp preview
# ui.shell: @ Opens bash shell in ui container
ui.shell: CMD ?= /bin/bash
ui.shell:
	docker-compose run -u $$(id -u) ui $(CMD)
</code></blockquote>
<p>This <tt>antora.dockerfile</tt> takes the base Antora image produced by the Antora project, and adds the <a href="https://github.com/http-party/http-server">http-server</a> and <a href="https://github.com/Qard/onchange">onchange</a> Node.js modules to it. Our <tt>Makefile</tt>'s <tt>antora.run</tt> task uses the http-server module to serve the content Antora generates, and our <tt>antora.watch</tt> task uses the onchange module to automatically rebuild that content whenever we change a documentation source file.</p>
<h5>Build Documentation</h5>
<p>With those file changes in place, run this command from the repo root:</p>
<blockquote><code>make antora.build</code></blockquote>
<p>This will build the Docker image defined by the <tt>antora.dockerfile</tt> file, launch it as a container, and run Antora using the <tt>antora-playbook.yml</tt> playbook we just wrote. Antora will generate the built documentation files to the <tt>build/site</tt> directory in our repo.</p>
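<p>As a quick sanity check, you can list the top of the output directory; because the playbook sets <tt>site.url</tt> and <tt>site.robots</tt>, you should see files like <tt>404.html</tt>, <tt>robots.txt</tt>, and a sitemap in there, alongside the <tt>antora-ui-default</tt> component directory and the <tt>_</tt> directory of UI assets (the exact listing may vary a little by Antora version):</p>
<blockquote><code>ls build/site
</code></blockquote>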
<h5>Run Documentation Preview</h5>
<p>Now let's take a look at that documentation — run this command from the repo root:</p>
<blockquote><code>make antora.run</code></blockquote>
<p>This will serve the built files via the http-server Node.js module on port 8051. Open up a browser, and navigate to:</p>
<blockquote><code>http://localhost:8051/</code></blockquote>
<p>You should be redirected to <tt>http://localhost:8051/antora-ui-default/</tt>, and see the start page of the Antora Default UI documentation.</p>
<h5>Rebuild on Documentation Changes</h5>
<p>Finally, in another terminal (while the <tt>make antora.run</tt> command is still running in the first terminal), run this command:</p>
<blockquote><code>make antora.watch</code></blockquote>
<p>This will use the onchange Node.js module to watch for changes to the documentation source, and automatically trigger Antora to rebuild whenever you make a change. We'll rely on this functionality with the next step.</p>
<h3 id="documentation-customization">Documentation Customization</h3>
<h5>Create New Component</h5>
<p>Now, finally, we're ready to start writing our own documentation! We'll start with a "User Guide" component, initially consisting of a "root" module with one page. Create a <tt>content</tt> directory, and a <tt>user-guide</tt> directory within it — this is where our "User Guide" component will live.</p>
<p>Within the <tt>user-guide</tt> directory, create a new <a href="https://docs.antora.org/antora/2.3/component-version-descriptor/"><tt>antora.yml</tt></a> file. This file will contain the basic metadata about our "User Guide" component. Create it with this content:</p>
<blockquote><code># content/user-guide/antora.yml
name: example-user-guide
title: Example User Guide
version: master
</code></blockquote>
<p>Note that <tt>master</tt> is a special keyword in Antora that means a <a href="https://docs.antora.org/antora/2.3/component-with-no-version/">component with no version</a>. Usually Antora includes the version number of a component in all the URLs it generates for that component, but it will omit the version number if it is <tt>master</tt>.</p>
<h5>Create First Page</h5>
<p>Also within the <tt>user-guide</tt> directory, create a <tt>modules</tt> directory; within the <tt>modules</tt> directory, create a <tt>ROOT</tt> directory; and within the <tt>ROOT</tt> directory, create a <tt>pages</tt> directory. Within the <tt>pages</tt> directory, create our first page, called <tt>index.adoc</tt>:</p>
<blockquote><code># content/user-guide/modules/ROOT/pages/index.adoc
= User Guide
== Welcome
Welcome to our product! This is the user guide.
</code></blockquote>
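<p>If you prefer to create that directory scaffolding in one shot, a single command from the repo root does it (then create <tt>index.adoc</tt> inside the new <tt>pages</tt> directory, as above):</p>
<blockquote><code>mkdir -p content/user-guide/modules/ROOT/pages
</code></blockquote>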
<h5>Add Watch Path</h5>
<p>Update our <tt>make antora.watch</tt> command in our <tt>Makefile</tt> to the following, adding <tt>content/**</tt> as a path to watch for changes:</p>
<blockquote><code># Makefile
antora.watch:
	docker-compose run -u $$(id -u) -T antora onchange \
	-i antora-playbook.yml <ins>'content/**'</ins> 'docs/**' \
	-- antora generate antora-playbook.yml
</code></blockquote>
<p>If you still have the <tt>make antora.watch</tt> command running in a terminal, kill it (by pressing control-C in the terminal it's running in), and re-run <tt>make antora.watch</tt> again.</p>
<h5>Register Component</h5>
<p>Now we'll register the component in the <tt>antora-playbook.yml</tt> at the root of our repo, so that we can have Antora build it. We could add it as a second item to our <tt>sources</tt> list, but since we already have the current repo listed as a source, we can just change the existing single <tt>start_path</tt> setting to be a multiple <tt>start_paths</tt> setting (note the "s" on the end of the setting), and direct Antora to include any sub-directory of the <tt>content</tt> directory in our repo as a component:</p>
<blockquote><code># antora-playbook.yml
site:
  robots: allow
  start_page: antora-ui-default::index.adoc
  title: Example Documentation
  url: https://docs.example.com
content:
  sources:
  - url: ./
    branches: HEAD
    <del>start_path: docs</del>
    <ins>start_paths:
    - docs
    - content/*</ins>
runtime:
  cache_dir: ./build/cache
ui:
  bundle:
    url: ./build/ui-bundle.zip
urls:
  html_extension_style: indexify
</code></blockquote>
<p>With our <tt>make antora.watch</tt> command running, as soon as you save this change to <tt>antora-playbook.yml</tt>, Antora will rebuild all its content. So make the change, and refresh the browser window you have opened at <tt>http://localhost:8051</tt>.</p>
<p>The page will look exactly the same as it did before — but click the "Antora Default UI" label in the bottom-left corner of the page. The bottom of the left navigation should expand to show two items: "Antora Default UI" and "Example User Guide". Click the "master" label directly below "Example User Guide", and you will navigate to your new "Example User Guide" index page at <tt>http://localhost:8051/example-user-guide/</tt>.</p>
<p>The component is listed as "Example User Guide", because that is the title we gave it in the <tt>content/user-guide/antora.yml</tt> file. And its URL path is <tt>/example-user-guide/</tt> because we set its name to <tt>example-user-guide</tt> in that same file.</p>
<p>The title of the page is "User Guide", as displayed in the browser titlebar and the main title of the page's body, because that's the title we gave the page in its AsciiDoc source at <tt>content/user-guide/modules/ROOT/pages/index.adoc</tt>. And because we added a "Welcome" section to that page, we see "Welcome" as a section title in the page's body, as well as its right navigation.</p>
<p>You'll also want to change your <tt>antora-playbook.yml</tt> to make the new component the start page for the site (so that whenever someone navigates to <tt>https://docs.example.com/</tt>, they'll be redirected to <tt>https://docs.example.com/example-user-guide/</tt> instead of <tt>https://docs.example.com/antora-ui-default/</tt>). Just replace <tt>antora-ui-default</tt> in the <tt>start_page</tt> setting of your <tt>antora-playbook.yml</tt> with <tt>example-user-guide</tt>:</p>
<blockquote><code># antora-playbook.yml
<del>start_page: antora-ui-default::index.adoc</del>
<ins>start_page: example-user-guide::index.adoc</ins>
</code></blockquote>
<h5>Create Navigation</h5>
<p>We don't have any left navigation for our new component, however, so let's fix that. Create a navigation file at <tt>content/user-guide/modules/ROOT/nav.adoc</tt> with this content:</p>
<blockquote><code># content/user-guide/modules/ROOT/nav.adoc
* xref:index.adoc#_welcome[Welcome!]
</code></blockquote>
<p>Then configure Antora to use this navigation file by updating the <tt>content/user-guide/antora.yml</tt> file like so:</p>
<blockquote><code># content/user-guide/antora.yml
name: example-user-guide
title: Example User Guide
version: master
<ins>nav:
- modules/ROOT/nav.adoc</ins>
</code></blockquote>
<p>Go back to your browser and refresh the page, and you will see that Antora added a navigation list for the user-guide component to the left navigation, with one item beneath the root. This item links to the "Welcome" section of the "root" module's index page, displaying the link text as "Welcome!".</p>
<h5>Create Another Page</h5>
<p>Now create a second page for our component at <tt>content/user-guide/modules/ROOT/pages/how-it-works.adoc</tt> with this content:</p>
<blockquote><code># content/user-guide/modules/ROOT/pages/how-it-works.adoc
= This Is How It Works
:navtitle: But How Does It Work?
Well, to be honest, we're not exactly sure how any of this works.
</code></blockquote>
<p>Update the navigation file at <tt>content/user-guide/modules/ROOT/nav.adoc</tt> to add an item for our new page:</p>
<blockquote><code># content/user-guide/modules/ROOT/nav.adoc
* xref:index.adoc#_welcome[Welcome!]
<ins>* xref:how-it-works.adoc[]</ins>
</code></blockquote>
<p>Go back to your browser and refresh the page again, and you will see that Antora added another item to the left navigation. The item label is "But How Does It Work?", matching the <tt>navtitle</tt> attribute set under the title of the <tt>how-it-works.adoc</tt> page. The URL it links to is <tt>http://localhost:8051/example-user-guide/how-it-works/</tt> — the first path segment comes from the component name defined in the <tt>antora.yml</tt> file of the component, and the last path segment comes from the file name of the page itself. If the file was part of any module other than the "root" module for the component, it would have another path segment between the component name and page name for the module name (taken from the directory name of the module).</p>
<p>When you click the link, you'll see that the page title is "This Is How It Works", as displayed in the browser titlebar and the main title of the page's body, matching the page title from the <tt>how-it-works.adoc</tt> file.</p>
<h3 id="development-workflow">Development Workflow</h3>
<h5>Work on the UI</h5>
<p>With your initial UI and documentation content now set up, whenever you want to make changes to the look-and-feel of your documentation, you would follow these steps:</p>
<ol>
<li>Fire up the Antora UI preview server with <tt>make ui.run</tt></li>
<li>View the Antora UI preview in your web browser at <tt>http://localhost:8052/</tt></li>
<li>Iteratively make changes to the CSS, Handlebars, and other UI files in the <tt>src</tt> directory of your project</li>
</ol>
<h5>Build the UI</h5>
<p>Once your changes look good with the Antora UI preview content, follow these steps to incorporate them into the local version of your documentation:</p>
<ol>
<li>Build the UI bundle with <tt>make ui.build</tt></li>
<li>Pull the new UI bundle into your local doc build with <tt>make antora.build</tt></li>
<li>Fire up a web server for your local docs with <tt>make antora.run</tt></li>
<li>View your UI changes with your local copy of the docs at <tt>http://localhost:8051/</tt></li>
</ol>
<h5>Write Documentation</h5>
<p>And whenever you get the hankering to write some documentation, follow these steps:</p>
<ol>
<li>Fire up a web server for your local docs with <tt>make antora.run</tt></li>
<li>View your local copy of the docs at <tt>http://localhost:8051/</tt></li>
<li>Fire up the onchange watcher for the docs in a separate terminal with <tt>make antora.watch</tt> (to automatically re-build the docs whenever you make a change)</li>
<li>Iteratively make changes to AsciiDoc files in the <tt>content</tt> directory of your project</li>
</ol>
Justin Ludwighttp://www.blogger.com/profile/03245749869056259124noreply@blogger.com5tag:blogger.com,1999:blog-3778768890472614719.post-75870237511303317192020-10-13T11:17:00.001-07:002020-10-13T11:17:27.032-07:00How To Test OpenRC Services with Docker-Compose<p>Similar to how I abused <a href="https://docs.docker.com/">Docker</a> conceptually to <a href="/2020/01/testing-systemd-services-with-docker.html">test systemd services with docker-compose</a>, I spent some time recently trying to do the same thing with <a href="https://wiki.gentoo.org/wiki/OpenRC">OpenRC</a> for Alpine Linux.</p>
<p>It basically requires the same steps as systemd. With the base 3.12 Alpine image, it's a matter of:</p>
<ol>
<li>Install OpenRC</li>
<li>Optionally map <tt>/sys/fs/cgroup</tt></li>
<li>Start up with <tt>/sbin/init</tt></li>
<li>Run tests via <tt>docker exec</tt></li>
</ol>
<h3>1. Install OpenRC</h3>
<p>The base Alpine images don't include OpenRC, so you have to install it with <tt>apk</tt>. I do this in my <tt>Dockerfile</tt>:</p>
<blockquote><code>FROM alpine:3.12
RUN apk add openrc
CMD ["/sbin/init"]
</code></blockquote>
<h3>2. Optionally map <tt>/sys/fs/cgroup</tt></h3>
<p>Unlike with systemd, I didn't have to set up any <tt>tmpfs</tt> mounts to get OpenRC services running. I also didn't <i>have</i> to map the <tt>/sys/fs/cgroup</tt> directory -- but if I didn't, I would get a bunch of cgroup-related error messages when starting and stopping services (although the services themselves still seemed to work fine). So I just went ahead and mapped the dir in my <tt>docker-compose.yml</tt> to avoid those error messages:</p>
<blockquote><code>version: '3'
services:
  my_test_container:
    build: .
    image: my_test_image
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
</code></blockquote>
<h3>3. Start up with <tt>/sbin/init</tt></h3>
<p>With the Alpine <tt>openrc</tt> package, the traditional <tt>/sbin/init</tt> startup command works to start OpenRC. I added <tt>CMD ["/sbin/init"]</tt> to my <tt>Dockerfile</tt> to start up with it, but you could instead add <tt>command: /sbin/init</tt> to the service in your <tt>docker-compose.yml</tt> file.</p>
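<p>For reference, that alternative would look something like this (a sketch of the same <tt>docker-compose.yml</tt> as above, with the startup command moved out of the Dockerfile; either approach is sufficient on its own):</p>
<blockquote><code>version: '3'
services:
  my_test_container:
    build: .
    image: my_test_image
    command: /sbin/init
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
</code></blockquote>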
<h3>4. Run tests via <tt>docker exec</tt></h3>
<p>The above <tt>docker-compose.yml</tt> and <tt>Dockerfile</tt> will allow you to start up OpenRC in <tt>my_test_container</tt> with one command:</p>
<blockquote><code>docker-compose up -d my_test_container</code></blockquote>
<p>With OpenRC up and running, you can use a second command to execute a shell on the very same container to test it out:</p>
<blockquote><code>docker-compose exec my_test_container /bin/sh</code></blockquote>
<p>Or use <tt>exec</tt> to run other commands to test the services managed by OpenRC:</p>
<blockquote><code>docker-compose exec my_test_container rc-status --servicelist</code></blockquote>
<h3>Cleaning up</h3>
<p>The clean up steps with OpenRC are also basically the same as with systemd:</p>
<ol>
<li>Stop the running container: <tt>docker-compose stop my_test_container</tt></li>
<li>Remove the saved container state: <tt>docker-compose rm my_test_container</tt></li>
<li>Remove the built image: <tt>docker image rm my_test_image</tt></li>
</ol>
Justin Ludwighttp://www.blogger.com/profile/03245749869056259124noreply@blogger.com0tag:blogger.com,1999:blog-3778768890472614719.post-84640644556208528582020-10-02T15:57:00.005-07:002020-10-02T18:19:54.441-07:00Testing Systemd Services on Arch, Fedora, and Friends<p>Following up on a previous post about <a href="/2020/01/testing-systemd-services-with-docker.html">how to test systemd services with docker-compose</a> on Ubuntu, I spent some time recently trying to do the same thing with a few other Linux distributions. I was able to get the same tricks to work on these other distributions:</p>
<ul>
<li>Amazon Linux</li>
<li>Arch</li>
<li>CentOS</li>
<li>Debian</li>
<li>Fedora</li>
<li>openSUSE</li>
<li>RHEL</li>
</ul>
<p>A few of those distros required an additional tweak, however.</p>
<h3>One more <tt>tmpfs</tt> directory for Arch and Fedora</h3>
<p>For Arch and Fedora, I had to do one more thing: add <tt>/tmp</tt> as a <tt>tmpfs</tt> mount.</p>
<p>So the <tt>docker-compose.yml</tt> file for those distros should look like this:</p>
<blockquote><code>version: '3'
services:
  my_test_container:
    build: .
    image: my_test_image
    tmpfs:
      - /run
      - /run/lock
      <ins>- /tmp</ins>
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
</code></blockquote>
<p>Or when running as a regular <tt>docker</tt> command, start the container like this:</p>
<blockquote><code>docker run \
--tmpfs /run --tmpfs /run/lock <ins>--tmpfs /tmp</ins> \
--volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
--detach --rm \
--name my_test_container my_test_image
</code></blockquote>
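<p>Once the container is up, it's worth confirming that systemd actually reached a running state before kicking off any tests. A quick check like this should do it (it may report <tt>degraded</tt> if some unit failed to start, which is often still fine for testing):</p>
<blockquote><code># check overall state, then list any units that failed to start
docker exec my_test_container systemctl is-system-running
docker exec my_test_container systemctl list-units --failed
</code></blockquote>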
<h3>Different init script location for openSUSE</h3>
<p>For openSUSE, the systemd init script is located at <tt>/usr/lib/systemd/systemd</tt> instead of just <tt>/lib/systemd/systemd</tt>. So the <tt>Dockerfile</tt> I used for it looks like this:</p>
<blockquote><code>FROM opensuse/leap:15
RUN zypper install -y systemd
CMD ["<ins>/usr</ins>/lib/systemd/systemd"]
</code></blockquote>
Justin Ludwighttp://www.blogger.com/profile/03245749869056259124noreply@blogger.com0tag:blogger.com,1999:blog-3778768890472614719.post-59166348347979913862020-09-14T17:30:00.000-07:002020-09-14T17:30:38.164-07:00Elixir Ed25519 Signatures With Enacl<p>The most-actively supported library for using ed25519 with <a href="https://elixir-lang.org/">Elixir</a> currently looks to be <a href="https://github.com/jlouis/enacl">enacl</a>. It provides straightforward, idiomatic Erlang bindings for <a href="https://doc.libsodium.org/">libsodium</a>.</p>
<h3>Installing</h3>
<p>Installing enacl for a <a href="https://hexdocs.pm/mix/Mix.html">Mix</a> project requires first installing your operating system's <tt>libsodium-dev</tt> package on your dev & build machines (as well as the regular <tt>libsodium</tt> package anywhere else you run your project binaries). Then in the <tt>mix.exs</tt> file of your project, add <tt>{:enacl, "~> 1.0.0"}</tt> to the <tt>deps</tt> section of that file; and then run <tt>mix deps.get</tt> to download the enacl package from <a href="https://hex.pm/">Hex</a>.</p>
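<p>For reference, the relevant part of <tt>mix.exs</tt> would look something like this (a minimal sketch; only the <tt>:enacl</tt> entry is specific to this post, and your project will likely have other deps listed alongside it):</p>
<blockquote><code># mix.exs
defp deps do
  [
    {:enacl, "~> 1.0.0"}
  ]
end
</code></blockquote>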
<h3>Keys</h3>
<p>In the parlance of libsodium, the "secret key" is the full keypair, the "public key" is the public part of the keypair (the public curve point), and the "seed" is the private part of the keypair (the 256-bit secret). The seed is represented in enacl as a 32-byte binary string, as is the public key; and the secret key is the 64-byte binary concatenation of the seed plus the public key.</p>
<p>You can generate a brand new ed25519 keypair with enacl via the <tt>sign_keypair/0</tt> function. After generating, usually you'd want to save the keypair somewhere as a base64- or hex-encoded string:</p>
<blockquote><code>iex> keypair = :enacl.sign_keypair()
%{
public: <<215, 90, 152, 1, 130, 177, 10, 183, 213, 75, 254, 211, 201, 100, 7,
58, 14, 225, 114, 243, 218, 166, 35, 37, 175, 2, 26, 104, 247, 7, 81, 26>>,
secret: <<157, 97, 177, 157, 239, 253, 90, 96, 186, 132, 74, 244, 146, 236,
44, 196, 68, 73, 197, 105, 123, 50, 105, 25, 112, 59, 172, 3, 28, 174, 127,
96, 215, 90, 152, 1, 130, 177, 10, 183, 213, 75, 254, 211, 201, 100, 7, 58,
...>>
}
iex> <<seed::binary-size(32), public_key::binary>> = keypair.secret
<<157, 97, 177, 157, 239, 253, 90, 96, 186, 132, 74, 244, 146, 236, 44, 196, 68,
73, 197, 105, 123, 50, 105, 25, 112, 59, 172, 3, 28, 174, 127, 96, 215, 90,
152, 1, 130, 177, 10, 183, 213, 75, 254, 211, 201, 100, 7, 58, 14, 225, ...>>
iex> public_key == keypair.public
true
iex> seed <> public_key == keypair.secret
true
iex> public_key_base64 = public_key |> Base.encode64()
"11qYAYKxCrfVS/7TyWQHOg7hcvPapiMlrwIaaPcHURo="
iex> public_key_hex = public_key |> Base.encode16(case: :lower)
"d75a980182b10ab7d54bfed3c964073a0ee172f3daa62325af021a68f707511a"
iex> private_key_base64 = seed |> Base.encode64()
"nWGxne/9WmC6hEr0kuwsxERJxWl7MmkZcDusAxyuf2A="
iex> private_key_hex = seed |> Base.encode16(case: :lower)
"9d61b19deffd5a60ba844af492ec2cc44449c5697b326919703bac031cae7f60"
</code></blockquote>
<p>You can also reconstitute a keypair from just the private part (the "seed") with the enacl <tt>sign_seed_keypair/1</tt> function:</p>
<blockquote><code>iex> reloaded_keypair = (
...> "9d61b19deffd5a60ba844af492ec2cc44449c5697b326919703bac031cae7f60"
...> |> Base.decode16!(case: :lower)
...> |> :enacl.sign_seed_keypair()
...>)
%{
public: <<215, 90, 152, 1, 130, 177, 10, 183, 213, 75, 254, 211, 201, 100, 7,
58, 14, 225, 114, 243, 218, 166, 35, 37, 175, 2, 26, 104, 247, 7, 81, 26>>,
secret: <<157, 97, 177, 157, 239, 253, 90, 96, 186, 132, 74, 244, 146, 236,
44, 196, 68, 73, 197, 105, 123, 50, 105, 25, 112, 59, 172, 3, 28, 174, 127,
96, 215, 90, 152, 1, 130, 177, 10, 183, 213, 75, 254, 211, 201, 100, 7, 58,
...>>
}
iex> reloaded_keypair == keypair
true
</code></blockquote>
<h3>Signing</h3>
<p>Libsodium has a series of functions for signing large documents that won't fit into memory or otherwise have to be split into chunks — but for most cases, the simpler enacl <tt>sign/2</tt> or <tt>sign_detached/2</tt> functions are what you want to use.</p>
<p>The enacl <tt>sign/2</tt> function produces a binary string that combines the original message with the message signature, which the <tt>sign_open/2</tt> function can later unpack and verify. This is ideal for preventing misuse, since it makes it harder to just use the message without verifying the signature first.</p>
<p>The enacl <tt>sign_detached/2</tt> function produces the message signature as a stand-alone 64-byte binary string — if you need to store or send the signature separately from the message itself, this is the function you'd use. And often when using detached signatures, you will also base64- or hex-encode the resulting signature:</p>
<blockquote><code>iex> message = "test"
"test"
iex> signed_message = :enacl.sign(message, keypair.secret)
<<143, 152, 176, 38, 66, 39, 246, 31, 9, 107, 120, 221, 227, 176, 240, 13, 25,
1, 236, 254, 16, 80, 94, 65, 71, 57, 6, 144, 122, 82, 53, 107, 233, 83, 26,
215, 109, 77, 1, 219, 7, 67, 77, 72, 147, 94, 245, 81, 222, 80, ...>>
iex> signature = :enacl.sign_detached(message, keypair.secret)
<<143, 152, 176, 38, 66, 39, 246, 31, 9, 107, 120, 221, 227, 176, 240, 13, 25,
1, 236, 254, 16, 80, 94, 65, 71, 57, 6, 144, 122, 82, 53, 107, 233, 83, 26,
215, 109, 77, 1, 219, 7, 67, 77, 72, 147, 94, 245, 81, 222, 80, ...>>
iex> signature <> message == signed_message
true
iex> signature |> Base.encode64()
"j5iwJkIn9h8Ja3jd47DwDRkB7P4QUF5BRzkGkHpSNWvpUxrXbU0B2wdDTUiTXvVR3lBULDNm0/t1DY8GBoxfCA=="
iex> signature |> Base.encode16(case: :lower)
"8f98b0264227f61f096b78dde3b0f00d1901ecfe10505e41473906907a52356be9531ad76d4d01db07434d48935ef551de50542c3366d3fb750d8f06068c5f08"
</code></blockquote>
<h3>Verifying</h3>
<p>To verify a signed message (the message combined with the signature), and then access the message itself, you'd use the enacl <tt>sign_open/2</tt> function:</p>
<blockquote><code>iex> unpacked_message = :enacl.sign_open(signed_message, public_key)
{:ok, "test"}
</code></blockquote>
<p>If you try to verify the signed message with a different public key (or if the message is otherwise improperly signed or not signed at all), you'll get an error result from the <tt>sign_open/2</tt> function:</p>
<blockquote><code>iex> wrong_public_key = (
...> "3d4017c3e843895a92b70aa74d1b7ebc9c982ccf2ec4968cc0cd55f12af4660c"
...> |> Base.decode16!(case: :lower)
...> )
<<61, 64, 23, 195, 232, 67, 137, 90, 146, 183, 10, 167, 77, 27, 126, 188, 156,
152, 44, 207, 46, 196, 150, 140, 192, 205, 85, 241, 42, 244, 102, 12>>
iex> error_result = :enacl.sign_open(signed_message, wrong_public_key)
{:error, :failed_verification}
</code></blockquote>
<p>To verify a message with a detached signature, you need the original message itself (in the same binary form with which it was signed), and the signature (in binary form as well). You pass them both, plus the public key, to the <tt>sign_verify_detached/3</tt> function; <tt>sign_verify_detached/3</tt> returns <tt>true</tt> if the signature is legit, and <tt>false</tt> otherwise:</p>
<blockquote><code>iex> :enacl.sign_verify_detached(signature, message, public_key)
true
iex> :enacl.sign_verify_detached(signature, "wrong message", public_key)
false
iex> :enacl.sign_verify_detached(signature, message, wrong_public_key)
false
</code></blockquote>
<h3>Full Example</h3>
<p>To put it all together, if you have an ed25519 private key, like <tt>"nWGxne/9WmC6hEr0kuwsxERJxWl7MmkZcDusAxyuf2A="</tt>, and you want to sign a message (<tt>"test"</tt>) that someone else already has in their possession, you'd do the following to produce a stand-alone signature that you can send them:</p>
<blockquote><code>iex> secret_key = (
...> "nWGxne/9WmC6hEr0kuwsxERJxWl7MmkZcDusAxyuf2A="
...> |> Base.decode64!()
...> |> :enacl.sign_seed_keypair()
...> |> Map.get(:secret)
...> )
<<157, 97, 177, 157, 239, 253, 90, 96, 186, 132, 74, 244, 146, 236, 44, 196, 68,
73, 197, 105, 123, 50, 105, 25, 112, 59, 172, 3, 28, 174, 127, 96, 215, 90,
152, 1, 130, 177, 10, 183, 213, 75, 254, 211, 201, 100, 7, 58, 14, 225, ...>>
iex> signature_base64 = (
...> "test"
...> |> :enacl.sign_detached(secret_key)
...> |> Base.encode64()
...> )
"j5iwJkIn9h8Ja3jd47DwDRkB7P4QUF5BRzkGkHpSNWvpUxrXbU0B2wdDTUiTXvVR3lBULDNm0/t1DY8GBoxfCA=="
</code></blockquote>
<p>And if you're the one given an ed25519 public key (<tt>"11qYAYKxCrfVS/7TyWQHOg7hcvPapiMlrwIaaPcHURo="</tt>) and signature (<tt>"j5iwJkIn9h8Ja3jd47DwDRkB7P4QUF5BRzkGkHpSNWvpUxrXbU0B2wdDTUiTXvVR3lBULDNm0/t1DY8GBoxfCA=="</tt>), with the original message (<tt>"test"</tt>) in hand you can verify the signature like the following:</p>
<blockquote><code>iex> public_key = (
...> "11qYAYKxCrfVS/7TyWQHOg7hcvPapiMlrwIaaPcHURo="
...> |> Base.decode64!()
...> )
<<215, 90, 152, 1, 130, 177, 10, 183, 213, 75, 254, 211, 201, 100, 7, 58, 14,
225, 114, 243, 218, 166, 35, 37, 175, 2, 26, 104, 247, 7, 81, 26>>
iex> signature_legitimate? = (
...> "j5iwJkIn9h8Ja3jd47DwDRkB7P4QUF5BRzkGkHpSNWvpUxrXbU0B2wdDTUiTXvVR3lBULDNm0/t1DY8GBoxfCA=="
...> |> Base.decode64!()
...> |> :enacl.sign_verify_detached("test", public_key)
...> )
true
</code></blockquote>
Justin Ludwighttp://www.blogger.com/profile/03245749869056259124noreply@blogger.com0tag:blogger.com,1999:blog-3778768890472614719.post-3236022914033426472020-08-28T10:31:00.001-07:002020-12-30T20:59:21.768-08:00Postgrex Ecto Types<p>One thing I found confusing about <a href="https://github.com/elixir-ecto/postgrex">Postgrex</a>, the excellent <a href="https://www.postgresql.org/">PostgreSQL</a> adapter for <a href="https://elixir-lang.org/">Elixir</a>, was how to use PostgreSQL-specific data types (<tt>cidr</tt>, <tt>inet</tt>, <tt>interval</tt>, <tt>lexeme</tt>, <tt>range</tt>, etc) with <a href="https://github.com/elixir-ecto/ecto">Ecto</a>. As far as I can tell, while Postgrex includes structs that can be converted to each type (like <a href="https://hexdocs.pm/postgrex/Postgrex.INET.html">Postgrex.INET</a> etc), you still have to write your own <a href="https://hexdocs.pm/ecto/Ecto.Type.html">Ecto.Type</a> implementation for each type you use in an Ecto schema.</p>
<p>For example, to implement an Ecto schema for a table like the following:</p>
<blockquote><code>CREATE TABLE hits (
id BIGSERIAL NOT NULL PRIMARY KEY,
url TEXT NOT NULL,
ip INET NOT NULL,
inserted_at TIMESTAMP NOT NULL
);
</code></blockquote>
<p>At minimum, you'd need to create an Ecto type implementation like the following:</p>
<blockquote><code># lib/my_app/inet_type.ex
defmodule MyApp.InetType do
@moduledoc """
`Ecto.Type` implementation for postgres `INET` type.
"""
use Ecto.Type
def type, do: :inet
def cast(term), do: {:ok, term}
def dump(term), do: {:ok, term}
def load(term), do: {:ok, term}
end
</code></blockquote>
<p>So that you could then define the Ecto schema with your custom Ecto type:</p>
<blockquote><code># lib/my_app/hits/hit.ex
defmodule MyApp.Hits.Hit do
@moduledoc """
The Hit schema.
"""
use Ecto.Schema
import Ecto.Changeset
schema "hits" do
field :url, :string
field :ip, MyApp.InetType
timestamps updated_at: false
end
def changeset(hit, attrs) do
hit
|> cast(attrs, [:url, :ip])
|> validate_required([:url, :ip])
end
end
</code></blockquote>
<p>Note that in your migration code, you use the native database type name (eg <tt>inet</tt>), not your custom type name:</p>
<blockquote><code># priv/repo/migrations/202001010000_create_hits.exs
defmodule MyApp.Repo.Migrations.CreateHits do
use Ecto.Migration
def change do
create table(:hits) do
add :url, :text, null: false
add :ip, :inet, null: false
timestamps updated_at: false
end
end
end
</code></blockquote>
<h3>A Fancier Type</h3>
<p>However, the above basic <tt>InetType</tt> implementation limits this example <tt>Hit</tt> schema to working only with <tt>Postgrex.INET</tt> structs for its <tt>ip</tt> field — so, while creating a <tt>Hit</tt> record with the IP address specified via a <tt>Postgrex.INET</tt> struct works nicely:</p>
<blockquote><code>iex> (
...> %MyApp.Hits.Hit{}
...> |> MyApp.Hits.Hit.changeset(%{url: "/", ip: %Postgrex.INET{address: {127, 0, 0, 1}}})
...> |> MyApp.Repo.insert!()
...> |> Map.get(:ip)
...> )
%Postgrex.INET{address: {127, 0, 0, 1}}
</code></blockquote>
<p>Creating one with the IP address specified as a string (or even a plain tuple like <tt>{127, 0, 0, 1}</tt>) won't work:</p>
<blockquote><code>iex> (
...> %MyApp.Hits.Hit{}
...> |> MyApp.Hits.Hit.changeset(%{url: "/", ip: "127.0.0.1"})
...> |> MyApp.Repo.insert!()
...> |> Map.get(:ip)
...> )
** (Ecto.InvalidChangesetError) could not perform insert because changeset is invalid.
</code></blockquote>
<p>This can be solved by implementing a fancier version of the <tt>cast</tt> function in the <tt>MyApp.InetType</tt> module, enabling Ecto to cast strings (and tuples) to the <tt>Postgrex.INET</tt> type. Here's a version of <tt>MyApp.InetType</tt> that does that, as well as allows the <tt>Postgrex.INET</tt> struct to be serialized as a string (including when serialized to JSON with the <a href="https://github.com/michalmuskala/jason">Jason</a> library, or when rendered as part of a <a href="https://hexdocs.pm/phoenix_html/Phoenix.HTML.html">Phoenix HTML</a> template):</p>
<blockquote><code># lib/my_app/inet_type.ex
defmodule MyApp.InetType do
@moduledoc """
`Ecto.Type` implementation for postgres `INET` type.
"""
use Bitwise
use Ecto.Type
alias Postgrex.INET
def type, do: :inet
def cast(nil), do: {:ok, nil}
def cast(""), do: {:ok, nil}
def cast(%INET{address: nil}), do: {:ok, nil}
def cast(%INET{} = term), do: {:ok, term}
def cast(term) when is_tuple(term), do: {:ok, %INET{address: term}}
def cast(term) when is_binary(term) do
[addr | mask] = String.split(term, "/", parts: 2)
with {:ok, address} <- parse_address(addr),
{:ok, number} <- parse_netmask(mask),
{:ok, netmask} <- validate_netmask(number, address) do
{:ok, %INET{address: address, netmask: netmask}}
else
message -> {:error, [message: message]}
end
end
def cast(_), do: :error
def dump(term), do: {:ok, term}
def load(term), do: {:ok, term}
defp parse_address(addr) do
case :inet.parse_strict_address(String.to_charlist(addr)) do
{:ok, address} -> {:ok, address}
_ -> "not a valid IP address"
end
end
defp parse_netmask([]), do: {:ok, nil}
defp parse_netmask([mask]) do
case Integer.parse(mask) do
{number, ""} -> {:ok, number}
_ -> "not a CIDR netmask"
end
end
defp validate_netmask(nil, _addr), do: {:ok, nil}
defp validate_netmask(mask, _addr) when mask < 0 do
"CIDR netmask cannot be negative"
end
defp validate_netmask(mask, addr) when mask > 32 and tuple_size(addr) == 4 do
"CIDR netmask cannot be greater than 32"
end
defp validate_netmask(mask, _addr) when mask > 128 do
"CIDR netmask cannot be greater than 128"
end
defp validate_netmask(mask, addr) do
ipv4 = tuple_size(addr) == 4
max = if ipv4, do: 32, else: 128
subnet = if ipv4, do: 8, else: 16
bits =
addr
|> Tuple.to_list()
|> Enum.reverse()
|> Enum.with_index()
|> Enum.reduce(0, fn {value, index}, acc -> acc + (value <<< (index * subnet)) end)
bitmask = ((1 <<< max) - 1) ^^^ ((1 <<< (max - mask)) - 1)
if (bits &&& bitmask) == bits do
{:ok, mask}
else
"masked bits of IP address all must be 0s"
end
end
end
defimpl String.Chars, for: Postgrex.INET do
def to_string(%{address: address, netmask: netmask}) do
"#{address_to_string(address)}#{netmask_to_string(netmask)}"
end
defp address_to_string(nil), do: ""
defp address_to_string(address), do: address |> :inet.ntoa()
defp netmask_to_string(nil), do: ""
defp netmask_to_string(netmask), do: "/#{netmask}"
end
defimpl Jason.Encoder, for: Postgrex.INET do
def encode(term, opts), do: term |> to_string() |> Jason.Encode.string(opts)
end
defimpl Phoenix.HTML.Safe, for: Postgrex.INET do
def to_iodata(term), do: term |> to_string()
end
</code></blockquote>
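<p>With this fancier <tt>cast</tt> in place, the earlier example that failed with a string IP address should now work: strings and tuples get cast to <tt>Postgrex.INET</tt> structs, and the struct serializes back to a string (including via Jason). The following iex session is a sketch of what I'd expect, rather than output captured from a real app:</p>
<blockquote><code>iex> (
...> %MyApp.Hits.Hit{}
...> |> MyApp.Hits.Hit.changeset(%{url: "/", ip: "127.0.0.1/32"})
...> |> MyApp.Repo.insert!()
...> |> Map.get(:ip)
...> )
%Postgrex.INET{address: {127, 0, 0, 1}, netmask: 32}
iex> to_string(%Postgrex.INET{address: {10, 0, 0, 0}, netmask: 8})
"10.0.0.0/8"
iex> Jason.encode!(%Postgrex.INET{address: {10, 0, 0, 0}, netmask: 8})
"\"10.0.0.0/8\""
</code></blockquote>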
<h3>Alternative Canonical Representation</h3>
<p>Note that an alternative way of implementing your Ecto type would be to make the <tt>dump</tt> and <tt>load</tt> functions round-trip the <tt>Postgrex.INET</tt> struct to and from some more convenient canonical representation (like a plain string). For example, a <tt>MyApp.InetType</tt> like the following would allow you to use plain strings to represent IP address values in your schemas (instead of <tt>Postgrex.INET</tt> structs). It would dump each such string to a <tt>Postgrex.INET</tt> struct when Ecto attempts to save the value to the database, and load the value from a <tt>Postgrex.INET</tt> struct into a string when Ecto attempts to load the value from the database:</p>
<blockquote><code># lib/my_app/inet_type.ex
defmodule MyApp.InetType do
@moduledoc """
`Ecto.Type` implementation for postgres `INET` type.
"""
use Bitwise
use Ecto.Type
alias Postgrex.INET
def type, do: :inet
def cast(nil), do: {:ok, ""}
def cast(term) when is_tuple(term), do: {:ok, address_to_string(term)}
def cast(term) when is_binary(term), do: {:ok, term}
def cast(_), do: :error
def dump(nil), do: {:ok, nil}
def dump(""), do: {:ok, nil}
def dump(term) when is_binary(term) do
[addr | mask] = String.split(term, "/", parts: 2)
with {:ok, address} <- parse_address(addr),
{:ok, number} <- parse_netmask(mask),
{:ok, netmask} <- validate_netmask(number, address) do
{:ok, %INET{address: address, netmask: netmask}}
else
message -> {:error, [message: message]}
end
end
def dump(_), do: :error
def load(nil), do: {:ok, ""}
def load(%INET{address: address, netmask: netmask}) do
"#{address_to_string(address)}#{netmask_to_string(netmask)}"
end
def load(_), do: :error
defp parse_address(addr) do
case :inet.parse_strict_address(String.to_charlist(addr)) do
{:ok, address} -> {:ok, address}
_ -> "not a valid IP address"
end
end
defp parse_netmask([]), do: {:ok, nil}
defp parse_netmask([mask]) do
case Integer.parse(mask) do
{number, ""} -> {:ok, number}
_ -> "not a CIDR netmask"
end
end
defp validate_netmask(nil, _addr), do: {:ok, nil}
defp validate_netmask(mask, _addr) when mask < 0 do
"CIDR netmask cannot be negative"
end
defp validate_netmask(mask, addr) when mask > 32 and tuple_size(addr) == 4 do
"CIDR netmask cannot be greater than 32"
end
defp validate_netmask(mask, _addr) when mask > 128 do
"CIDR netmask cannot be greater than 128"
end
defp validate_netmask(mask, addr) do
ipv4 = tuple_size(addr) == 4
max = if ipv4, do: 32, else: 128
subnet = if ipv4, do: 8, else: 16
bits =
addr
|> Tuple.to_list()
|> Enum.reverse()
|> Enum.with_index()
|> Enum.reduce(0, fn {value, index}, acc -> acc + (value <<< (index * subnet)) end)
bitmask = ((1 <<< max) - 1) ^^^ ((1 <<< (max - mask)) - 1)
if (bits &&& bitmask) == bits do
{:ok, mask}
else
"masked bits of IP address all must be 0s"
end
end
defp address_to_string(nil), do: ""
defp address_to_string(address), do: address |> :inet.ntoa() |> to_string()
defp netmask_to_string(nil), do: ""
defp netmask_to_string(netmask), do: "/#{netmask}"
end
</code></blockquote>
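<p>With this alternative implementation, the <tt>ip</tt> field holds a plain string on the Elixir side, while still being stored as a native <tt>inet</tt> value in the database. As a rough sketch of what I'd expect (again, not captured output), creating and re-loading a record would look something like this:</p>
<blockquote><code>iex> (
...> %MyApp.Hits.Hit{}
...> |> MyApp.Hits.Hit.changeset(%{url: "/", ip: "10.0.0.0/8"})
...> |> MyApp.Repo.insert!()
...> |> Map.get(:ip)
...> )
"10.0.0.0/8"
iex> MyApp.Repo.get_by!(MyApp.Hits.Hit, url: "/").ip
"10.0.0.0/8"
</code></blockquote>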
Justin Ludwighttp://www.blogger.com/profile/03245749869056259124noreply@blogger.com0tag:blogger.com,1999:blog-3778768890472614719.post-51757808570666280342020-08-14T16:24:00.004-07:002020-08-14T16:24:55.424-07:00Elixir Event Queue<p>As I'm learning <a href="https://elixir-lang.org/">Elixir</a>, I was trying to find the idiomatic way to build an event queue in Elixir. After a few twists and turns, I found that it's easy, elegant, and pretty well documented — you just need to know what to look for.</p>
<p>There are a number of nifty "job queue" libraries for Elixir (like <a href="https://github.com/koudelka/honeydew">Honeydew</a> or <a href="https://github.com/sorentwo/oban">Oban</a>), but they're directed more toward queueing jobs themselves, rather than enabling a single job to work on a queue of items. What I was looking for was this:</p>
<ol>
<li>A singleton queue that would have events enqueued from multiple processes (in Elixir-world, this would take the form of an <a href="https://hexdocs.pm/elixir/Agent.html">Agent</a>).</li>
<li>A client app running multiple processes that would enqueue events (in my case, a <a href="https://www.phoenixframework.org/">Phoenix</a> app).</li>
<li>A worker process that dequeues a batch of events and processes them (in Elixir-world, this would be a <a href="https://hexdocs.pm/elixir/GenServer.html">GenServer</a>).</li>
</ol>
<h3 id="Example">An Example</h3>
<p>Following is an example of what I found to be the idiomatic Elixir way of implementing this, with 1) a generic agent that holds a queue as its state (<tt>QueueAgent</tt>), 2) the bits of a Phoenix app that listens for Phoenix telemetry events and enqueues some data from them onto this queue (<tt>RequestListener</tt>), and 3) a gen-server worker that dequeues those events and saves them to the DB (<tt>RequestSaver</tt>). These three components each are started by the Phoenix application's supervisor (4).</p>
<h3 id="QueueAgent">1. The Queue Agent</h3>
<p>The <tt>QueueAgent</tt> module holds the state of the queue, as a <a href="https://hexdocs.pm/qex/Qex.html">Qex</a> struct. Qex is a wrapper around the native Erlang/OTP <a href="https://erlang.org/doc/man/queue.html">:queue</a> module, adding some Elixir syntactic sugar and implementing the <a href="https://hexdocs.pm/elixir/Inspect.html">Inspect</a>, <a href="https://hexdocs.pm/elixir/Collectable.html">Collectable</a>, and <a href="https://hexdocs.pm/elixir/Enumerable.html">Enumerable</a> protocols.</p>
<p>The <tt>QueueAgent</tt> module can pretty much just proxy basic Qex calls through the core agent <tt>get</tt>, <tt>update</tt>, and <tt>get_and_update</tt> functions. Each of these functions accepts a function itself, to which the current state of the agent (the Qex queue) is passed. The function passed to <tt>update</tt> returns the new state of the agent (the updated Qex queue), while the function passed to <tt>get_and_update</tt> returns a tuple of the value to return and the new state.</p>
<blockquote><code>
# lib/my_app/queue_agent.ex
defmodule MyApp.QueueAgent do
@moduledoc """
Agent that holds a queue as its state.
"""
use Agent
@doc """
Starts the agent with the specified options.
"""
@spec start_link(GenServer.options()) :: Agent.on_start()
def start_link(opts \\ []) do
Agent.start_link(&Qex.new/0, opts)
end
@doc """
Returns the length of the queue.
"""
@spec count(Agent.agent()) :: integer
def count(agent) do
Agent.get(agent, & &1) |> Enum.count()
end
@doc """
Enqueues the specified item to the end of the queue.
"""
@spec push(Agent.agent(), any) :: :ok
def push(agent, value) do
Agent.update(agent, &Qex.push(&1, value))
end
@doc """
Dequeues the first item from the front of the queue, and returns it.
If the queue is empty, returns the specified default value.
"""
@spec pop(Agent.agent(), any) :: any
def pop(agent, default \\ nil) do
case Agent.get_and_update(agent, &Qex.pop/1) do
{:value, value} -> value
_ -> default
end
end
@doc """
Takes the specified number of items off the front of the queue, and returns them.
If the queue has less than the specified number of items, empties the queue
and returns all items.
"""
@spec split(Agent.agent(), integer) :: Qex.t()
def split(agent, max) do
Agent.get_and_update(agent, fn queue ->
Qex.split(queue, Enum.min([Enum.count(queue), max]))
end)
end
end
</code></blockquote>
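<p>As a quick illustration of the <tt>QueueAgent</tt> API on its own, here's a hypothetical iex session (a sketch, not output from a running app): start a named instance, push a couple of items, and then split a batch off the front:</p>
<blockquote><code>iex> {:ok, _pid} = MyApp.QueueAgent.start_link(name: MyApp.RequestQueue)
iex> MyApp.QueueAgent.push(MyApp.RequestQueue, %{path: "/a"})
:ok
iex> MyApp.QueueAgent.push(MyApp.RequestQueue, %{path: "/b"})
:ok
iex> MyApp.QueueAgent.count(MyApp.RequestQueue)
2
iex> MyApp.QueueAgent.split(MyApp.RequestQueue, 100) |> Enum.to_list()
[%{path: "/a"}, %{path: "/b"}]
iex> MyApp.QueueAgent.count(MyApp.RequestQueue)
0
</code></blockquote>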
<h3 id="RequestListener">2. The Request Listener</h3>
<p>The <tt>RequestListener</tt> module attaches a <a href="https://github.com/beam-telemetry/telemetry">Telemetry</a> listener (with the arbitrary name <tt>"my_app_web_request_listener"</tt>) to handle one specific event (the "response sent" event from the <a href="https://hexdocs.pm/phoenix/Phoenix.Logger.html">Phoenix Logger</a>, identified by <tt>[:phoenix, :endpoint, :stop]</tt>). The listener's <tt>handle_event</tt> function will be called whenever a response is sent (including error responses), and the response's <a href="https://hexdocs.pm/plug/Plug.Conn.html">Plug.Conn</a> struct will be included under the <tt>:conn</tt> key of the event metadata.</p>
<p>In handling the event, the <tt>RequestListener</tt> simply pushes onto a named <tt>QueueAgent</tt> queue a new map containing the request details I want to save. The name can be arbitrary — in this example it's <tt>MyApp.RequestQueue</tt> (a module name that doesn't happen to exist) — what's important is that a <tt>QueueAgent</tt> with that name has been started (it will be started by the application, later on in step 4), and that the <tt>RequestSaver</tt> (later on in step 3) will use the same name to dequeue events.</p>
<blockquote><code>
# lib/my_app_web/request_listener.ex
defmodule MyAppWeb.RequestListener do
@moduledoc """
Listens for request telemetry events, and queues them to be saved.
"""
require Logger
@response_sent [:phoenix, :endpoint, :stop]
@events [@response_sent]
@doc """
Sets up event listener.
"""
def setup do
:telemetry.attach_many("my_app_web_request_listener", @events, &handle_event/4, nil)
end
@doc """
Telemetry callback to handle specified event.
"""
def handle_event(@response_sent, measurement, metadata, _config) do
handle_response_sent(measurement, metadata, MyApp.RequestQueue)
end
@doc """
Handles Phoenix response sent event.
"""
def handle_response_sent(measurement, metadata, queue_name) do
conn = metadata.conn
reason = conn.assigns[:reason]
MyApp.QueueAgent.push(queue_name, %{
inserted_at: DateTime.utc_now(),
ip: conn.remote_ip,
request_id: Logger.metadata()[:request_id],
controller: conn.private[:phoenix_controller],
action: conn.private[:phoenix_action],
status: conn.status,
method: conn.method,
path: conn.request_path,
query: conn.query_string,
error: if(reason, do: Exception.message(reason)),
# nanoseconds
duration: measurement.duration
})
end
end
</code></blockquote>
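<p>To exercise this listener without a real request, you could call <tt>handle_response_sent</tt> directly with a bare <tt>Plug.Conn</tt> struct. The following is a hypothetical iex sketch (the controller module name is made up, and it assumes a <tt>QueueAgent</tt> named <tt>MyApp.RequestQueue</tt> has already been started, as the application does in step 4):</p>
<blockquote><code>iex> conn = %Plug.Conn{
...> remote_ip: {127, 0, 0, 1},
...> method: "GET",
...> request_path: "/",
...> query_string: "",
...> status: 200,
...> private: %{phoenix_controller: MyAppWeb.PageController, phoenix_action: :index}
...> }
iex> MyAppWeb.RequestListener.handle_response_sent(
...> %{duration: 1_000_000},
...> %{conn: conn},
...> MyApp.RequestQueue
...> )
:ok
</code></blockquote>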
<h3 id="RequestSaver">3. The Request Saver</h3>
<p>The <tt>RequestSaver</tt> module is run as a dedicated process, dequeueing batches of up to 100 events, and saving each batch. When done saving, it will "sleep" for a minute, then try to dequeue some more events. Everything but the <tt>do_work</tt>, <tt>save_next_batch</tt>, <tt>save_batch</tt>, and <tt>batch_changeset</tt> functions is boilerplate gen-server functionality for running a process periodically.</p>
<p>The <tt>do_work</tt> function uses the same <tt>MyApp.RequestQueue</tt> name as the <tt>RequestListener</tt> to identify the queue, ensuring that both modules use the same <tt>QueueAgent</tt> instance. The <tt>save_next_batch</tt> function dequeues up to 100 events and saves them via the <tt>save_batch</tt> function (and continues working until it has emptied the queue). The <tt>save_batch</tt> and <tt>batch_changeset</tt> functions create and commit an <a href="https://hexdocs.pm/ecto/Ecto.Multi.html">Ecto.Multi</a> changeset using the app's <tt>MyApp.RequestEvent</tt> schema (not included in this example, but as you can imagine, it would include fields for the various properties that the <tt>RequestListener</tt> extracted from the event metadata).</p>
<p>The <tt>handle_info</tt> callback is the entry point for the gen-server's processing. It ignores the gen-server's state (it doesn't need to maintain any state itself) — it simply does some work, and then calls <tt>schedule_work</tt> to schedule itself to be called again in another minute.</p>
<blockquote><code>
# lib/my_app/request_saver.ex
defmodule MyApp.RequestSaver do
@moduledoc """
Saves queued events to the DB.
"""
use GenServer
@doc """
Starts the server with the specified options.
"""
def start_link(_opts) do
GenServer.start_link(__MODULE__, %{})
end
@doc """
GenServer callback to start process.
"""
@impl true
def init(state) do
schedule_work()
{:ok, state}
end
@doc """
GenServer callback to handle process messages.
"""
@impl true
def handle_info(:work, state) do
do_work()
schedule_work()
{:noreply, state}
end
@doc """
Does the next unit of work.
"""
def do_work do
save_next_batch(MyApp.RequestQueue)
end
@doc """
Pops the next 100 events from the specified queue and saves them.
"""
def save_next_batch(queue_name) do
batch = MyApp.QueueAgent.split(queue_name, 100)
if Enum.count(batch) > 0 do
save_batch(batch)
save_next_batch(queue_name)
end
end
@doc """
Saves the specified list of events in one big transaction.
"""
def save_batch(batch) do
batch_changeset(batch)
|> MyApp.Repo.transaction()
end
@doc """
Creates an Ecto.Multi from the specified list of events.
"""
def batch_changeset(batch) do
batch
|> Enum.reduce(Ecto.Multi.new(), fn event, multi ->
changeset = MyApp.RequestEvent.changeset(event)
Ecto.Multi.insert(multi, {:event, event.request_id}, changeset)
end)
end
defp schedule_work do
# in 1 minute
Process.send_after(self(), :work, 60 * 1000)
end
end
</code></blockquote>
<h3 id="Application">4. The Application Supervisor</h3>
<p>The above three components are all started in my Phoenix app via the standard Phoenix <tt>Application</tt> module. On start, it calls the <tt>RequestListener</tt> <tt>setup</tt> function, registering the <tt>RequestListener</tt> to receive Phoenix Telemetry events. Then the <tt>RequestSaver</tt> gen-server is started as a child process of the app (with no arguments, identified by its own module name); and the <tt>QueueAgent</tt> agent is also started as a child process — but with a <tt>name</tt> option, so that it can be identified via the <tt>MyApp.RequestQueue</tt> name. (Lines added to the boilerplate Phoenix <tt>Application</tt> module are highlighted in green.)</p>
<blockquote><code>
# lib/my_app/application.ex
defmodule MyApp.Application do
@moduledoc false
use Application
def start(_type, _args) do
<ins>MyAppWeb.RequestListener.setup()</ins>
children = [
MyApp.Repo,
MyAppWeb.Endpoint<ins>,
MyApp.RequestSaver,
{MyApp.QueueAgent, name: MyApp.RequestQueue}</ins>
]
opts = [strategy: :one_for_one, name: MyApp.Supervisor]
Supervisor.start_link(children, opts)
end
end
</code></blockquote>
<p>However, you usually don't want periodic jobs popping up randomly while you run your unit tests; so I added a little extra logic to avoid starting up the <tt>RequestSaver</tt> in test mode:</p>
<blockquote><code>
# lib/my_app/application.ex
defmodule MyApp.Application do
@moduledoc false
use Application
def start(_type, _args) do
<ins>MyAppWeb.RequestListener.setup()
periodic_jobs =
if Mix.env != :test do
[MyApp.RequestSaver]
else
[]
end</ins>
children = [
MyApp.Repo,
MyAppWeb.Endpoint<ins>,
{MyApp.QueueAgent, name: MyApp.RequestQueue}</ins>
]<ins> ++ periodic_jobs</ins>
opts = [strategy: :one_for_one, name: MyApp.Supervisor]
Supervisor.start_link(children, opts)
end
end
</code></blockquote>
<p>The overall processing flow of this queueing system, then, works like this:</p>
<ol>
<li>A request is handled by Phoenix, which raises a "response sent" telemetry event.</li>
<li>The <tt>RequestListener</tt> <tt>handle_event</tt> function is called by the Phoenix process.</li>
<li>The <tt>RequestListener</tt> calls the <tt>QueueAgent</tt> <tt>push</tt> function to queue the event (which the <tt>QueueAgent</tt> does within its own internal process).</li>
<li>Once a minute, the <tt>RequestSaver</tt> process runs the <tt>handle_info</tt> function, which tries to dequeue the next batch of events via the <tt>QueueAgent</tt> <tt>split</tt> function (again with the <tt>QueueAgent</tt> managing the state update in its own internal process).</li>
<li>The <tt>RequestSaver</tt>, continuing on in its process, saves any dequeued events to the DB.</li>
</ol>
Justin Ludwighttp://www.blogger.com/profile/03245749869056259124noreply@blogger.com0tag:blogger.com,1999:blog-3778768890472614719.post-26977235492794716322020-07-31T13:21:00.000-07:002020-07-31T13:21:47.231-07:00WeeChat Light Theme<p>When I started using <a href="https://weechat.org/">WeeChat</a> as my IRC client, I searched around for a light theme to use with it (since I use a light theme for my terminal, and the default WeeChat colors make several UI bits difficult to read with a light terminal theme). I couldn't find much in the way of WeeChat light themes, however, so I set up my own — I went through all the color settings listed in the <a href="https://weechat.org/files/doc/stable/weechat_user.en.html">WeeChat User's Guide</a>, and replaced defaults that showed up as dark-on-dark or light-on-light in my terminal with colors that instead would be dark-on-light or light-on-dark.</p>
<p>These are the settings I ended up modifying to build the light theme I now use (you can look up each setting in the User's Guide for a brief description of each, if you're curious).</p>
<blockquote><code>/set weechat.bar.status.color_bg gray
/set weechat.bar.title.color_bg gray
/set weechat.color.chat_buffer black
/set weechat.color.chat_channel black
/set weechat.color.chat_nick_offline_highlight_bg gray
/set weechat.color.chat_nick_self darkgray
/set weechat.color.chat_prefix_action darkgray
/set weechat.color.status_data_msg lightblue
/set weechat.color.status_more lightblue
/set weechat.color.status_name darkgray
/set weechat.color.status_number lightblue
/set buflist.format.buffer_current "${color:,yellow}${format_buffer}"
/set buflist.format.hotlist_low "${color:cyan}"
/set irc.color.topic_new red
/set relay.color.text_selected lightblue
</code></blockquote>
<p>There are a lot more color settings, but these were the only ones I needed to change to fix the dark-on-dark and light-on-light issues (I left the other settings alone). I use <a href="https://github.com/susam/inwee">InWee</a> to manage my custom WeeChat settings; I store these color settings in a file called <tt>colors.txt</tt>, and then start up WeeChat and run a command like <tt>inwee colors.txt</tt> whenever I want to make changes.</p>
<p>Here's a screenshot of what it looks like in my terminal's color scheme:</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjBUgxKnuZQvgQCNMVWT_u0ZcM9_yXKDGM2yz1CaHsKFli9u-y4vVZhzxBhPcWHnXCpkstWE4i5CVnHQMHdI4KgudmH5NsOlzWRy7WtdL1McQD1Wt6reAZmaKJk3AjO_LB3NrzLJJwFxo/s949/weechat.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" data-original-height="516" data-original-width="949" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjBUgxKnuZQvgQCNMVWT_u0ZcM9_yXKDGM2yz1CaHsKFli9u-y4vVZhzxBhPcWHnXCpkstWE4i5CVnHQMHdI4KgudmH5NsOlzWRy7WtdL1McQD1Wt6reAZmaKJk3AjO_LB3NrzLJJwFxo/s640/weechat.png" width="640" /></a></div>
Justin Ludwighttp://www.blogger.com/profile/03245749869056259124noreply@blogger.com0tag:blogger.com,1999:blog-3778768890472614719.post-69077538642438690442020-02-23T12:11:00.000-08:002020-02-23T12:11:59.004-08:00IPv6 for Private AWS Subnets<p>Instead of setting up <a href="https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Scenario2.html">"private" subnets in AWS</a> with the use of moderately-expensive AWS <a href="https://docs.aws.amazon.com/vpc/latest/userguide/vpc-nat-gateway.html">NAT Gateways</a>, I've been experimenting with free <a href="https://docs.aws.amazon.com/vpc/latest/userguide/egress-only-internet-gateway.html">Egress-Only Internet Gateways</a> (EIGW). The downside with EIGW is that it's IPv6 only — so you can communicate to the outside world from instances in your private subnet only through IPv6.</p>
<p>In theory, that shouldn't be a problem — it's 2020 and IPv6 just works everywhere, right? Well, actually, in practice it does work pretty well — but there are still a few gotchas. Here are a few hurdles I had to work around when setting up a fleet of Ubuntu Linux EC2 instances with <a href="https://docs.ansible.com/ansible/latest/index.html">Ansible</a>:</p>
<h3>APT Repos</h3>
<p>The base AMIs that Ubuntu provides are configured to use the Ubuntu APT repos hosted by AWS (like <tt>us-east-1.ec2.archive.ubuntu.com</tt>); however, these repos only support IPv4. So the first thing you need to do is change the repos listed in <tt>/etc/apt/sources.list</tt> to use external repos that support IPv6 (like <tt>us.archive.ubuntu.com</tt>, or other <a href="https://launchpad.net/ubuntu/+archivemirrors">Ubuntu mirrors</a> you might find).</p>
<p>And since you won't be able to use IPv4 to access the repos, you can speed up APT updates by configuring APT to try only IPv6. To do so, add a file in your <tt>/etc/apt/apt.conf.d/</tt> directory (call it something like <tt>99force-ipv6</tt>) with the following content:</p>
<blockquote><code>Acquire::ForceIPv6 "true";</code></blockquote>
<p>Also don't forget that if you do set up a restrictive <a href="https://docs.aws.amazon.com/vpc/latest/userguide/vpc-network-acls.html">Network ACL</a> for your private subnet, you'll need to allow inbound TCP access to the standard Linux ephemeral port range (32768-61000) from whatever APT repos you use.</p>
<h3>NTP Pools</h3>
<p>The NTP pools used by the base AMIs also don't support IPv6. I use the <a href="https://www.eecis.udel.edu/~mills/ntp/html/index.html">traditional NTP daemon</a> provided by the Ubuntu <tt>ntp</tt> package, rather than the default <tt>systemd-timesyncd</tt> service. To configure the NTP daemon, I remove all the default pools from the <tt>/etc/ntp.conf</tt> file, and instead just use the <tt>2.us.pool.ntp.org</tt> pool (the convention with NTP is that for domains that have numbered pools, like <tt>0.us.pool.ntp.org</tt>, <tt>1.us.pool.ntp.org</tt>, <tt>2.us.pool.ntp.org</tt>, etc, the pool numbered <tt>2</tt> is the one that supports IPv6).</p>
<p>Specifically, this is how I configure the <tt>2.us.pool.ntp.org</tt> pool in <tt>/etc/ntp.conf</tt>:</p>
<blockquote><code>pool -6 2.us.pool.ntp.org iburst minpoll 10 maxpoll 12</code></blockquote>
<p>The <tt>-6</tt> flag means to use IPv6; the <tt>iburst</tt> part is supposed to help speed up initial synchronization; the <tt>minpoll 10</tt> part means to poll no more often than every 2^10 seconds (around 17 minutes); and the <tt>maxpoll 12</tt> part means to poll no less often than every 2^12 seconds (around 68 minutes).</p>
<p>Also, if you set up a restrictive Network ACL for your private subnet, you'll need to allow inbound access to UDP port 123.</p>
<h3>AWS APIs</h3>
<p>If you are planning to directly call AWS APIs (either through the various per-language <a href="https://aws.amazon.com/tools/">SDKs</a>, or the <a href="https://docs.aws.amazon.com/cli/latest/reference/">CLI</a>), a huge gotcha is that very few AWS services as of yet provide IPv6 endpoints. This means that you won't be able to use most AWS services at all from within your private IPv6 subnet (with the exception of services that consist of instances that themselves reside within your VPCs, like RDS; rather than endpoints hosted outside of your VPCs, like DynamoDB).</p>
<p>The only major AWS service I've tried that does support IPv6 through its APIs is S3. When connecting to it via CLI, you can get it to use IPv6 by explicitly specifying the "dualstack" endpoint via command-line flag, like this:</p>
<blockquote><code>aws --endpoint-url https://s3.dualstack.us-east-1.amazonaws.com --region us-east-1 s3 ls</code></blockquote>
<p>Or, alternately, you can enable IPv6 usage via the AWS config file (<tt>~/.aws/config</tt>), like this:</p>
<blockquote><code>[default]
region = us-east-1
s3 =
  use_dualstack_endpoint = true
  addressing_style = virtual
</code></blockquote>
<h3>Ansible Inventory</h3>
<p>To access EC2 instances in a private subnet, typically you'd use a VPN running in a public subnet of the same (or bridged) VPC, with the VPN client set to route the VPC's private IPv4 block through the VPN. For IPv6, I also have my VPN set to route the VPC's IPv6 block through the VPN, too.</p>
<p>Using Ansible through a VPN with IPv4 is pretty much as simple as configuring Ansible's <tt>ec2.ini</tt> file to set its <tt>destination_variable</tt> and <tt>vpc_destination_variable</tt> settings to <tt>private_ip_address</tt>. But since I decided to disallow any IPv4 access to my private subnets (even from other subnets within the same VPC), I had to jump through a few extra hoops:</p>
<h5>1. Custom Internal Domain Names</h5>
<p>I use a custom internal domain name for all my servers (I'll use <tt>example.net</tt> as the custom domain in the following examples), and assign each server its own domain name (like <tt>db1.example.net</tt> or <tt>mail2.example.net</tt>, etc). When I launch a new EC2 server, I create a DNS <tt>AAAA</tt> record for it (via Route53), pointing the DNS record to the IPv6 address of the newly-launched server. In this way I can use the DNS name to refer to the same server throughout its lifetime.</p>
<p>I also tag my EC2 instances as soon as I launch them, with tags from which the DNS name can be constructed. For example, I'd assign the server with the DNS name of <tt>fe3.example.net</tt> a "node" tag of <tt>fe</tt> and a "number" tag of <tt>3</tt>.</p>
<h5>2. SSH Config</h5>
<p>In my SSH config file (<tt>~/.ssh/config</tt>), I have an entry like the following, to make sure SSH (and Ansible) only tries to access my EC2 instances through IPv6:</p>
<blockquote><code>Host *.example.net
AddressFamily inet6
</code></blockquote>
<h5>3. Ansible EC2 Config</h5>
<p>With the above two elements in place, I can then enable the <tt>destination_format</tt> (and <tt>destination_format_tags</tt>) settings in the Ansible <tt>ec2.ini</tt> configuration file to direct Ansible to use DNS names instead of IP address for EC2 inventory. With the "node" and "number" tags described above, I can use the following configuration in my <tt>ec2.ini</tt> file:</p>
<blockquote><code>destination_format = {0}{1}.example.net
destination_format_tags = node,number
</code></blockquote>
<p>When the above is set up correctly, you can run the <tt>ec2.py</tt> script (eg as <tt>./ec2.py</tt>), and see your DNS names in its output (like <tt>db1.example.net</tt> or <tt>mail2.example.net</tt>, etc), instead of IPv4 addresses. And when you run an ad-hoc Ansible module (like <tt>ansible fe3.example.net -i ec2.py -m setup</tt>) everything should "just work".</p>
Justin Ludwighttp://www.blogger.com/profile/03245749869056259124noreply@blogger.com0tag:blogger.com,1999:blog-3778768890472614719.post-87941110448394448322020-01-08T11:40:00.001-08:002020-10-02T17:59:24.728-07:00Testing Systemd Services with Docker-Compose<p>I've been using <a href="https://docs.docker.com/">Docker</a> containers to test out the install process for a project I've been working on, and have found it can be a little tricky to get systemd booted up and running in Docker. Normally running a service manager like <a href="https://systemd.io/">systemd</a> within a container would be redundant and unnecessary, but in this case I'm specifically trying to test out systemd service files (and directory paths, and user permissions, etc) that have been set up by my install process.</p>
<p>With the base 18.04 <a href="https://hub.docker.com/_/ubuntu/">Ubuntu image</a>, there are 4 key steps to getting systemd running and testable in a Docker container:</p>
<ol>
<li>Install systemd</li>
<li>Map a few key system directories</li>
<li>Start up with <tt>/lib/systemd/systemd</tt></li>
<li>Use <tt>docker exec</tt> to test</li>
</ol>
<h3>1. Install systemd</h3>
<p>With the base Ubuntu image, it's as simple as installing the <tt>systemd</tt> package with <tt>apt-get</tt> — like this <tt>Dockerfile</tt>:</p>
<blockquote><code>FROM ubuntu:18.04
ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update && apt-get install -y systemd
CMD ["/lib/systemd/systemd"]
</code></blockquote>
<h3>2. Map a few key system directories</h3>
<p>I found I had to mount <tt>/run</tt> and <tt>/run/lock</tt> as <tt>tmpfs</tt> directories and map <tt>/sys/fs/cgroup</tt> to my local <tt>/sys/fs/cgroup</tt> directory. You can do that with this <tt>docker-compose.yml</tt> file:</p>
<blockquote><code>version: '3'
services:
  my_test_container:
    build: .
    image: my_test_image
    tmpfs:
      - /run
      - /run/lock
    volumes:
      - /sys/fs/cgroup:/sys/fs/cgroup:ro
</code></blockquote>
<p>Or, alternately, when using the <tt>docker run</tt> command, by specifying the <tt>--tmpfs /run --tmpfs /run/lock --volume /sys/fs/cgroup:/sys/fs/cgroup:ro</tt> flags.</p>
<h3>3. Start up with <tt>/lib/systemd/systemd</tt></h3>
<p>I added <tt>CMD ["/lib/systemd/systemd"]</tt> to my <tt>Dockerfile</tt> to start the container with <tt>/lib/systemd/systemd</tt> by default; but you can instead add <tt>command: /lib/systemd/systemd</tt> to a service in your <tt>docker-compose.yml</tt> file, or just run <tt>/lib/systemd/systemd</tt> directly with the <tt>docker run</tt> command.</p>
<h3>4. Use <tt>docker exec</tt> to test</h3>
<p>With the above <tt>docker-compose.yml</tt> and <tt>Dockerfile</tt>, you can start up the test container with one command:</p>
<blockquote><code>docker-compose up -d my_test_container</code></blockquote>
<p>And then, with systemd running, use a second command to execute a shell on the container to test it out:</p>
<blockquote><code>docker-compose exec my_test_container bash</code></blockquote>
<p>Or use <tt>exec</tt> to run whatever other commands you need to test systemd:</p>
<blockquote><code>docker-compose exec my_test_container systemctl list-units</code></blockquote>
<p>Alternately, if you built and ran the above <tt>Dockerfile</tt> with <tt>docker</tt> commands instead of <tt>docker-compose</tt>, you'd use the following command to test the container out:</p>
<blockquote><code>docker exec -it my_test_container bash</code></blockquote>
<h3>Cleaning up</h3>
<p>To clean everything up, stop the container with <tt>docker-compose</tt>:</p>
<blockquote><code>docker-compose stop my_test_container</code></blockquote>
<p>Then remove the container:</p>
<blockquote><code>docker-compose rm my_test_container</code></blockquote>
<p>And finally, remove the image:</p>
<blockquote><code>docker image rm my_test_image</code></blockquote>
<p>Or execute all 3 clean-up steps at once (as well as removing all other containers/images referenced by your <tt>docker-compose.yml</tt> file), in a single command:</p>
<blockquote><code>docker-compose down --rmi all</code></blockquote>
<h3>Without <tt>docker-compose</tt></h3>
<p>The following <tt>docker</tt> commands would allow you to build, run, and clean up the above <tt>Dockerfile</tt> without using <tt>docker-compose</tt> at all:</p>
<blockquote><code>docker build --tag my_test_image .
docker run \
--tmpfs /run --tmpfs /run/lock \
--volume /sys/fs/cgroup:/sys/fs/cgroup:ro \
--detach --rm \
--name my_test_container my_test_image
docker exec --interactive --tty my_test_container bash
docker stop my_test_container
docker image rm my_test_image
</code></blockquote>
Justin Ludwighttp://www.blogger.com/profile/03245749869056259124noreply@blogger.com0tag:blogger.com,1999:blog-3778768890472614719.post-89519237960889526272019-10-25T11:05:00.000-07:002019-10-25T16:24:05.230-07:00Adapting PostgreSQL Timestamps To Arrow With Psycopg2<p>I did some digging the other day to try to figure out how to use the excellent Python datetime library <a href="https://github.com/crsmithdev/arrow">Arrow</a> with the workhorse <a href="http://initd.org/psycopg/">psycopg2</a> Python-PostgreSQL database adapter (plus the nifty <a href="https://github.com/coleifer/peewee">Peewee</a> ORM on top of psycopg2). I was pleasantly surprised how easy and painless it was to implement, with help from a blog post by <a href="https://medium.com/building-the-system/breaking-free-from-the-orm-replacing-database-queries-3e8c4820d697">Omar Rayward</a>, and the <a href="http://initd.org/psycopg/docs/advanced.html#adapting-new-python-types-to-sql-syntax">psycopg2 docs</a> (and source code) as a guide.</p>
<p>There are 5 core PostgreSQL date/time types that Arrow can handle, which psycopg2 maps to the 3 core Python date/time classes — by default through 4 core psycopg2 datatypes:</p>
<table class="table">
<thead>
<tr>
<th>PostgreSQL Type</th>
<th>Example Output</th>
<th>Psycopg2 Type</th>
<th>Python Type</th>
</tr>
</thead>
<tbody>
<tr>
<td>timestamp [without time zone]</td>
<td>2001-02-03 04:05:06</td>
<td>PYDATETIME</td>
<td>datetime</td>
</tr>
<tr>
<td>timestamp with time zone</td>
<td>2001-02-03 04:05:06-07</td>
<td>PYDATETIMETZ</td>
<td>datetime</td>
</tr>
<tr>
<td>date</td>
<td>2001-02-03</td>
<td>PYDATE</td>
<td>date</td>
</tr>
<tr>
<td>time [without time zone]</td>
<td>04:05:06</td>
<td>PYTIME</td>
<td>time</td>
</tr>
<tr>
<td>time with time zone</td>
<td>04:05:06-07</td>
<td>PYTIME</td>
<td>time</td>
</tr>
</tbody>
</table>
<p>Arrow can be used to handle each of these 5 types, via its single <tt>Arrow</tt> class. Here's how you set up the mappings:</p>
<blockquote><code>
import arrow
import psycopg2.extensions

def adapt_arrow_to_psql(value):
    """Formats an Arrow object as a quoted string for use in a SQL statement."""
    # assume Arrow object is being used for TIME datatype if date is 1900 or earlier
    if value.year <= 1900:
        value = value.format("HH:mm:ss.SZ")
    elif value == arrow.Arrow.max:
        value = "infinity"
    elif value == arrow.Arrow.min:
        value = "-infinity"
    return psycopg2.extensions.AsIs("'{}'".format(value))

# register adapter to format Arrow objects when passed as parameters to SQL statements
psycopg2.extensions.register_adapter(arrow.Arrow, adapt_arrow_to_psql)

def cast_psql_date_to_arrow(value, conn):
    """Parses a SQL timestamp or date string to an Arrow object."""
    # handle NULL and special "infinity"/"-infinity" values
    if not value:
        return None
    elif value == "infinity":
        return arrow.Arrow.max
    elif value == "-infinity":
        return arrow.Arrow.min
    return arrow.get(value)

def cast_psql_time_to_arrow(value, conn):
    """Parses a SQL time string to an Arrow object."""
    # handle NULL
    if not value:
        return None
    # handle TIME, TIME with fractional seconds (.S), and TIME WITH TIME ZONE (Z)
    return arrow.get(value, ["HH:mm:ss", "HH:mm:ss.S", "HH:mm:ssZ", "HH:mm:ss.SZ"])

# override default timestamp/date converters
# to convert from SQL timestamp/date results to Arrow objects
psycopg2.extensions.register_type(psycopg2.extensions.new_type(
    (
        psycopg2.extensions.PYDATETIME.values +
        psycopg2.extensions.PYDATETIMETZ.values +
        psycopg2.extensions.PYDATE.values
    ),
    "ARROW",
    cast_psql_date_to_arrow,
))

# override default time converter to convert from SQL time results to Arrow objects
psycopg2.extensions.register_type(psycopg2.extensions.new_type(
    psycopg2.extensions.PYTIME.values, "ARROW_TIME", cast_psql_time_to_arrow
))
</code></blockquote>
<p>The 3 slightly tricky bits are:</p>
<ol>
<li>Deciding whether to format an Arrow object as a date or a time (in <tt>adapt_arrow_to_psql()</tt>) — you may want to handle it differently, but since Arrow will parse times without dates as occurring on <tt>"0001-01-01"</tt>, the simplest thing to do is assume a date with an early year (like 1900 or earlier) represents a time instead of a date (which allows round-tripping of times from PostgreSQL to Arrow and back).</li>
<li>Handling PostgreSQL's special <tt>"-infinity"</tt> and <tt>"infinity"</tt> values when converting between PostgreSQL and Arrow dates (in <tt>adapt_arrow_to_psql()</tt> and <tt>cast_psql_date_to_arrow()</tt>) — <tt>Arrow.min</tt> and <tt>Arrow.max</tt> are the closest equivalents.</li>
<li>Handling the 4 different time variants that PostgreSQL emits (in <tt>cast_psql_time_to_arrow()</tt>):
<ul>
<li><tt>"12:34:56"</tt> (no fractional seconds or time zone)</li>
<li><tt>"12:34:56.123456"</tt> (fractional seconds but no time zone)</li>
<li><tt>"12:34:56-07"</tt> (no fractional seconds but time zone)</li>
<li><tt>"12:34:56.123456-07"</tt> (fractional seconds and time zone)</li>
</ul>
</li>
</ol>
<p>With those mappings in place, you can now use Arrow objects natively with psycopg2:</p>
<blockquote><code>
import arrow
import psycopg2

def test_datetimes():
    conn = psycopg2.connect(dbname="mydbname", user="myuser")
    try:
        cur = conn.cursor()
        cur.execute("""
            CREATE TABLE foo (
                id SERIAL PRIMARY KEY,
                dt TIMESTAMP,
                dtz TIMESTAMP WITH TIME ZONE,
                d DATE,
                t TIME,
                twtz TIME WITH TIME ZONE
            )
        """)
        cur.execute(
            "INSERT INTO foo (dt, dtz, d, t, twtz) VALUES (%s, %s, %s, %s, %s)",
            (
                arrow.get("2001-02-03 04:05:06"),
                arrow.get("2001-02-03 04:05:06-07"),
                arrow.get("2001-02-03"),
                arrow.get("04:05:06", "HH:mm:ss"),
                arrow.get("04:05:06-07", "HH:mm:ssZ"),
            ),
        )
        cur.execute("SELECT * FROM foo")
        result = cur.fetchone()
        assert result[1] == arrow.get("2001-02-03 04:05:06")
        assert result[2] == arrow.get("2001-02-03 04:05:06-07")
        assert result[3] == arrow.get("2001-02-03")
        assert result[4] == arrow.get("04:05:06", "HH:mm:ss")
        assert result[5] == arrow.get("04:05:06-07", "HH:mm:ssZ")
    finally:
        conn.rollback()
</code></blockquote>
<p>Or with the Peewee ORM, you can use Peewee's built-in date/time fields, and pass and receive Arrow objects to/from those fields:</p>
<blockquote><code>
import arrow
import peewee
import playhouse.postgres_ext

db = playhouse.postgres_ext.PostgresqlExtDatabase("mydbname", user="myuser")

class Foo(peewee.Model):
    dt = peewee.DateTimeField(
        default=arrow.utcnow,
        constraints=[peewee.SQL("DEFAULT (CURRENT_TIMESTAMP AT TIME ZONE 'UTC')")],
    )
    dtz = playhouse.postgres_ext.DateTimeTZField(
        default=arrow.utcnow,
        constraints=[peewee.SQL("DEFAULT (CURRENT_TIMESTAMP AT TIME ZONE 'UTC')")],
    )
    d = peewee.DateField(
        default=arrow.utcnow,
        constraints=[peewee.SQL("DEFAULT (CURRENT_DATE AT TIME ZONE 'UTC')")],
    )
    t = peewee.TimeField(
        default=lambda: arrow.utcnow().time(),
        constraints=[peewee.SQL("DEFAULT (CURRENT_TIME AT TIME ZONE 'UTC')")],
    )

    class Meta:
        database = db

def test_datetimes():
    with db.transaction() as tx:
        try:
            Foo.create_table()
            result = Foo.get_by_id(
                Foo.create(
                    dt=arrow.get("2001-02-03 04:05:06"),
                    dtz=arrow.get("2001-02-03 04:05:06-07"),
                    d=arrow.get("2001-02-03"),
                    t=arrow.get("04:05:06", "HH:mm:ss"),
                ).id
            )
            assert result.dt == arrow.get("2001-02-03 04:05:06")
            assert result.dtz == arrow.get("2001-02-03 04:05:06-07")
            assert result.d == arrow.get("2001-02-03")
            assert result.t == arrow.get("04:05:06", "HH:mm:ss")
        finally:
            tx.rollback()
</code></blockquote>
Justin Ludwighttp://www.blogger.com/profile/03245749869056259124noreply@blogger.com0tag:blogger.com,1999:blog-3778768890472614719.post-19501798719685029442018-10-07T15:05:00.000-07:002018-10-07T15:05:25.001-07:00Unbricking My TRENDnet TEW-812DRU Wireless Router<p>Upgrading my <a href="http://www.trendnet.com/support/supportdetail.asp?prod=105_TEW-812DRU">TRENDnet TEW-812DRU v2</a> router with <a href="https://dd-wrt.com/">DD-WRT</a> firmware sometimes goes smoothly, and sometimes not. Usually if the upgrade fails on the first try, I can just unplug the router, wait 10 seconds, plug it in again, wait for the web UI to come up again, re-upload the firmware (and wait), and the upgrade will work on the second try.</p>
<p>But sometimes the router won't boot up correctly. All the blinking lights come on as normal, but it doesn't do any actual routing — or provide any DHCP services, which makes the router look bricked, even for devices connected to it physically with an ethernet cord.</p>
<p>But fortunately, it's not actually bricked. The router still grabs its usual local address (192.168.1.1, if you haven't configured it to be something else), and runs its nifty "TRENDnet - Emergency miniWeb Server" on port 80. The emergency page served up allows you to upload a new firmware image — and every time (so far) that I've gotten to that page, I've simply been able to upload the firmware image I've been trying to install (ie the latest <tt>trendnet-812dru-webflash.bin</tt> file from DD-WRT); and the router accepts it, installs it, and reboots itself, and everything is back to normal and happy in a few minutes.</p>
<p>The trick to accessing the router when its usual networking services are down is to 1) connect a computer to the router via wired ethernet connection (if you don't have one set up that way already), and 2) configure that computer with a static IP on the router's local subnet.</p>
<p>Since I'm running my router at 192.168.1.1, I just set the computer's static IP address to 192.168.1.10, and point its browser to http://192.168.1.1. The emergency web server seems to listen only for a minute or two after booting, though, and then goes away; so if the emergency page won't load, I unplug the router, wait 10 seconds, and plug it in again.</p>
<p>And since that wired computer is running Ubuntu 16.04 (with a wired interface named <tt>enp1s2f3</tt> — look it up via a command like <tt>ifconfig</tt> or <tt>ip address</tt> etc), I set its static IP address by adding the following to my <tt>/etc/network/interfaces</tt>:</p>
<blockquote><code>iface enp1s2f3 inet static
address 192.168.1.10
netmask 255.255.255.0
gateway 192.168.1.1
dns-nameservers 192.168.1.1
</code></blockquote>
<p>And then run <tt>sudo service network-manager stop</tt> to make NetworkManager cool its butt, and <tt>sudo service networking restart</tt> to use the static IP.</p>
Justin Ludwighttp://www.blogger.com/profile/03245749869056259124noreply@blogger.com0tag:blogger.com,1999:blog-3778768890472614719.post-17234658571716697232018-07-21T10:39:00.000-07:002018-10-07T15:06:04.855-07:00DD-WRT Firmware for TRENDnet TEW-812DRU Wireless Router<p>I'm lucky enough to get gigabit internet access at home, from <a href="https://waveg.wavebroadband.com/">Wave Broadband</a>, with which I'm quite happy. And I don't even need a modem — with my apartment building, I can just jack directly into the Ethernet port in my living room wall. I originally got a Kasada router, but its firmware hasn't been updated in a couple years, so I decided to get a new router that I knew would be updateable with the Free Software <a href="https://dd-wrt.com/">DD-WRT</a> firmware.</p>
<p>I got a <a href="http://www.trendnet.com/support/supportdetail.asp?prod=105_TEW-812DRU">TRENDnet TEW-812DRU v2</a>, which although it's like 5 years old, is as fast as I need (supporting gigabit ethernet, plus 1.3 Gbps 802.11ac and 450 Mbps 802.11n wireless on separate 5 GHz and 2.4 GHz channels) — and a quarter of the price of comparable new routers. And, importantly, it looked like it was well supported by DD-WRT.</p>
<p>And it did turn out to be well supported by DD-WRT. Having first read through all the forum posts about the TEW-812DRU, I found that, unlike some other routers, <a href="https://forum.dd-wrt.com/phpBB2/viewtopic.php?p=1023598#1023598">you don't need anything special</a> to use DD-WRT on the TEW-812DRU — just upload the new firmware to the router through its web UI and let it do its thing, no tricks needed.</p>
<p>So the first thing I did when I plugged in the router was login to its web UI and flash it with the "Open Source" firmware I downloaded from the <a href="http://www.trendnet.com/support/downloads.asp?SUBTYPE_ID=1674">TRENDnet TEW-812DRU downloads page</a>. That turned out to be DD-WRT v24-sp2 r23194, compiled on 12/21/2013. I was happy it worked, but that firmware was just way too old.</p>
<p>So next I looked up the TEW-812DRU in the <a href="https://dd-wrt.com/support/router-database/">DD-WRT router database</a>, and that prompted me to download something labeled DD-WRT v24-sp2 r23804 (but turned out actually to be r23808), compiled on 3/27/2014. I flashed that firmware through the web UI — but when the router rebooted, it presented me with an "Emergency Web Server" page. I went to look up what that meant via a working internet connection, and when I checked on the router again, the Emergency Web Server page had been replaced with the working DD-WRT web UI. I figure it must have just taken a little extra while for the router to boot everything up, no big deal.</p>
<p>But that firmware was also way older than I was hoping for, so I went searching through the downloads directory of the DD-WRT site — and finally found the latest version of the TEW-812DRU firmware here:</p>
<p><a href="https://download1.dd-wrt.com/dd-wrtv2/downloads/betas/2018/07-16-2018-r36330/trendnet-812DRUv2/">https://download1.dd-wrt.com/dd-wrtv2/downloads/betas/2018/07-16-2018-r36330/trendnet-812DRUv2/</a></p>
<p>I flashed that firmware through the router's web UI, and was very pleased to see the router reboot with no issues at all, happily running DD-WRT v3.0-r36330 mini — compiled just a few days ago on 7/16/2018. Finally, peace of mind that no bears, pandas, or kittens will be <a href="https://krebsonsecurity.com/2018/05/fbi-kindly-reboot-your-router-now-please/">making themselves at home</a> inside my router!</p>Justin Ludwighttp://www.blogger.com/profile/03245749869056259124noreply@blogger.com4tag:blogger.com,1999:blog-3778768890472614719.post-9592458855019400412018-06-24T17:58:00.000-07:002018-06-24T17:58:14.733-07:00Skip the Pre-Commit Hook on Git Rebase or Merge
<p>When you want to skip the git pre-commit hook for a single commit, it's easy — you just add the <tt>--no-verify</tt> flag (or <tt>-n</tt> for short) to the <tt>git commit</tt> command:</p>
<blockquote><pre>
git commit --no-verify
</pre></blockquote>
<p>But to skip multiple commits executed by another git command, like rebase or merge, the <tt>--no-verify</tt> flag doesn't work. The best way I've found to skip the pre-commit hook in that case is to code the hook to check for a custom environment variable (I like to use <tt>NO_VERIFY</tt>), and skip the pre-commit logic if it's not empty. For example, the <tt>pre-commit.sh</tt> script in my <a href="https://github.com/justinludwig/gjfpc-hook">Google Java Format Pre-Commit Hook</a> has a block of code like this at the top of the file, which skips the main functionality of the pre-commit hook if the <tt>NO_VERIFY</tt> environment variable has been set to anything other than an empty string:</p>
<blockquote><pre>
if [ "$NO_VERIFY" ]; then
echo 'pre-commit hook skipped' 1>&2
exit 0
fi
</pre></blockquote>
<p>So when I want to skip that pre-commit hook when doing a complicated rebase or merge, I simply run the following commands in the same shell:</p>
<blockquote><pre>
export NO_VERIFY=1
git rebase -i master # or `git merge some-branch` or whatever
export NO_VERIFY=
</pre></blockquote>
Justin Ludwighttp://www.blogger.com/profile/03245749869056259124noreply@blogger.com0