Friday, August 16, 2013

Jenkins EC2 Slave Plugin

Just finished configuring our Jenkins build machine to use the Jenkins EC2 Plugin (currently at version 1.18), which allows the Jenkins server to spin up AWS EC2 instances on demand to use as slaves for selected build jobs. It's very cool, and requires only a little bit of configuration to get it running your existing jobs automatically on EC2 slaves. These were the steps I had to take:

Create an IAM user for Jenkins

The Jenkins server needs access to your AWS account in order to run and kill EC2 instances; the best way to enable this is to create a separate IAM user that is used only by Jenkins, and has only the minimum permissions required. I used the IAM section of the AWS Console for this (although you can also do it via command line). When you create the new IAM user, create an access key for the user, and make sure you save the secret key part of it, since AWS does not store the secret key (you have to generate a new access key if you lose the secret part of it). You will use this access key when you configure the Jenkins EC2 plugin.

Once you create the user, you need to attach a policy to it that will allow the user to run and kill EC2 instances. Via trial-and-error, I found that these were the minimum permissions currently required by the Jenkins EC2 plugin (I limited it to a single region, us-west-2; if you want to allow Jenkins to manage instances in all regions, remove the ec2:Region condition):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "ec2:DescribeRegions"
            ],
            "Effect": "Allow",
            "Resource": "*"
        },
        {
            "Action": [
                "ec2:CreateTags",
                "ec2:DescribeInstances",
                "ec2:DescribeKeyPairs",
                "ec2:GetConsoleOutput",
                "ec2:RunInstances",
                "ec2:StartInstances",
                "ec2:StopInstances",
                "ec2:TerminateInstances"
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    "ec2:Region": "us-west-2"
                }
            }
        }
    ]
}
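
If you would rather create the user from the command line, the following is a minimal sketch using the AWS CLI. It assumes the policy above has been saved to a local file; the user name, the policy name (jenkins-ec2-slaves), and the file name are illustrative choices of mine, not something the plugin requires. The AWS variable is only there so you can preview the commands by setting it to "echo aws".

```shell
#!/bin/sh
# Sketch only: user name, policy name, and policy file are assumptions.
# Override AWS (e.g. AWS="echo aws") to preview the commands without running them.
AWS=${AWS:-aws}

# Creates the user, attaches the policy as an inline user policy, and
# creates an access key (the secret part is printed only this one time).
create_jenkins_iam_user() {
    user=$1
    policy_file=$2
    $AWS iam create-user --user-name "$user" &&
    $AWS iam put-user-policy \
        --user-name "$user" \
        --policy-name jenkins-ec2-slaves \
        --policy-document "file://$policy_file" &&
    $AWS iam create-access-key --user-name "$user"
}
```

You would call it as something like create_jenkins_iam_user jenkins policy.json, and save the access key it prints.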

Create an SSH key pair for Jenkins

Once the Jenkins server starts a new EC2 slave, it will connect to it via SSH, using public-key authentication. At minimum, you need to provide Jenkins with the private key of a user account that has password-less sudo privileges on the slave; the easiest way to do this with a stock AMI image is to just let Jenkins use the image's default account via the EC2-registered key-pair used to boot it.

I just used the stock Ubuntu AMI (12.04, 64-bit instance-store variety), where the default user (ubuntu) has password-less sudo privileges. And I created an SSH key-pair for Jenkins to use via the EC2 section of the AWS Console (although you can also do it via the command line). As with the secret part of the IAM user's access key, make sure you save the private key for the key pair, since AWS does not store it (you'll have to generate a new key pair if you lose the private key).
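
For the command-line route, here is a rough sketch of creating the key pair with the AWS CLI and then checking that the default account really does have password-less sudo. The key name, pem file path, and host name are placeholders I made up; the AWS and SSH variables exist only so the commands can be previewed with echo.

```shell
#!/bin/sh
# Sketch only: key name, file path, and host are illustrative assumptions.
AWS=${AWS:-aws}
SSH=${SSH:-ssh}

# KeyMaterial is the private key; AWS returns it only when the pair is created.
create_jenkins_key_pair() {
    key_name=$1
    pem_file=$2
    $AWS ec2 create-key-pair --key-name "$key_name" \
        --query KeyMaterial --output text > "$pem_file" &&
    chmod 600 "$pem_file"
}

# "sudo -n" fails instead of prompting if a password would be needed.
check_passwordless_sudo() {
    pem_file=$1
    host=$2
    $SSH -i "$pem_file" "ubuntu@$host" sudo -n true
}
```

The second function is only useful once you have an instance booted with that key pair; it exits non-zero if the login or the sudo check fails.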

Create an EC2 security-group for the slave

Whenever you launch an EC2 instance, you must specify the "security group" to which the instance belongs. This is basically just a set of firewall rules for the built-in EC2 firewall that restrict inbound access to a limited set of ports (optionally from a limited set of hosts). You can change the firewall rules for a group even while that group has running instances, but you can't switch an instance to use a different group once the instance has been launched. So I recommend creating a separate group just for your Jenkins slaves, even if the group has the same rules as you use for other security groups (so that you can change the rules for the different groups independently, should the need ever arise).

The only port you'll need inbound access to on the slaves is 22 (SSH). Optionally, you can set the source for that rule to the IP address of the Jenkins master server (if you want to disallow any inbound access to the slave other than from the master).
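
As a sketch of the same thing from the command line (assuming the AWS CLI and a setup where security groups can be referenced by name), the group and its single SSH rule could be created like this. The group name and the master's address are placeholders; 203.0.113.10 is a documentation-only IP.

```shell
#!/bin/sh
# Sketch only: group name, description, and CIDR are illustrative assumptions.
AWS=${AWS:-aws}

# Creates the group and allows inbound SSH (port 22) from one CIDR block.
create_slave_security_group() {
    group=$1
    master_cidr=$2
    $AWS ec2 create-security-group --group-name "$group" \
        --description "Jenkins EC2 slaves" &&
    $AWS ec2 authorize-security-group-ingress --group-name "$group" \
        --protocol tcp --port 22 --cidr "$master_cidr"
}
```

For example, create_slave_security_group jenkins-slave 203.0.113.10/32 would restrict SSH to a single master address, while 0.0.0.0/0 would leave it open to any source.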

Create an init script for the slave

When Jenkins starts a new EC2 slave (and once the slave has booted and Jenkins connects to it), Jenkins will run a script that you specify to prepare the slave to run builds. Jenkins will automatically bootstrap the Jenkins-specific environment on the slave (once your init script has run), so you really only need to set up a few things, like java and git (or svn, hg, etc).

Following is the init script I used with the stock Ubuntu AMI. It installs a few things via Apt; installs a few particular versions of Grails that we use; and then installs and runs the latest version of PhantomJS (which we need to have running with some specific arguments for our functional tests). It also sets up a working directory for Jenkins to use as the ubuntu user at /srv/jenkins; creates a bigger swap file than comes with the Ubuntu AMIs; and moves the /srv, /var, and /tmp directories to the faster "ephemeral" drive on the EC2 instance (mounted on the smaller EC2 instances at /mnt):

#!/bin/sh

# run the install three times (the first pass often fails right after boot)
for I in 1 2 3; do
    sudo DEBIAN_FRONTEND=noninteractive apt-get -y -q update
    sudo DEBIAN_FRONTEND=noninteractive apt-get -y -q upgrade
    sudo DEBIAN_FRONTEND=noninteractive apt-get -y -q install \
        git-core \
        groovy \
        openjdk-6-jdk \
        vim \
        zip
    sleep 1
done

# fix missing java links
if [ ! -e /usr/lib/jvm/default-java ]; then
    sudo ln -s /usr/lib/jvm/java-6-openjdk-amd64 /usr/lib/jvm/default-java
fi
if [ ! -e /usr/lib/jvm/java-6-openjdk ]; then
    sudo ln -s /usr/lib/jvm/java-6-openjdk-amd64 /usr/lib/jvm/java-6-openjdk
fi

# download grails if necessary
if [ ! -e /opt/grails-2.1.0 ]; then
    # download grails zip
    if [ ! -e "/tmp/grails.zip" ]; then
        wget -nd -O /tmp/grails.zip http://dist.springframework.org.s3.amazonaws.com/release/GRAILS/grails-1.3.7.zip
    fi
    # unzip grails
    unzip /tmp/grails.zip
    # move to /opt
    sudo mv grails-1.3.7 /opt/.

    # download grails zip
    if [ ! -e "/tmp/grails2.zip" ]; then
        wget -nd -O /tmp/grails2.zip http://dist.springframework.org.s3.amazonaws.com/release/GRAILS/grails-2.1.0.zip
    fi
    # unzip grails
    unzip /tmp/grails2.zip
    # move to /opt
    sudo mv grails-2.1.0 /opt/.

    # expand max-memory size for grails
    sudo perl -pli -e 's/Xmx\d+/Xmx2048/; s/MaxPermSize=\d+/MaxPermSize=1024/' /opt/grails*/bin/startGrails
fi

# download phantomjs if necessary
HAS_PHANTOMJS=`whereis phantomjs | grep bin`
if [ -z "$HAS_PHANTOMJS" ]; then
    # download phantomjs binary
    if [ ! -e "/tmp/phantomjs.tar.bz2" ]; then
        wget -nd -O /tmp/phantomjs.tar.bz2 https://phantomjs.googlecode.com/files/phantomjs-1.9.1-linux-x86_64.tar.bz2
    fi
    # unzip phantomjs
    tar xf /tmp/phantomjs.tar.bz2
    # move to /opt
    sudo mv phantomjs-1.9.1-linux-x86_64 /opt/.
    sudo ln -s /opt/phantomjs-1.9.1-linux-x86_64 /opt/phantomjs
    # run in background
    nohup /opt/phantomjs/bin/phantomjs \
        --webdriver=7483 \
        --webdriver-logfile=webdriver.log \
        --ignore-ssl-errors=true \
        >phantomjs.log 2>&1 &
fi

# creating jenkins working directory
sudo mkdir /srv/jenkins && sudo chown ubuntu:ubuntu /srv/jenkins

# add more swap
SWAPFILE=/mnt/swap1
if [ ! -f $SWAPFILE ]; then
    # creates 2G (1M * 2K) /mnt/swap1 file
    sudo dd if=/dev/zero of=$SWAPFILE bs=1M count=2K
    sudo chmod 600 $SWAPFILE
    sudo mkswap $SWAPFILE

    # add new swap to config and start using it
    echo "$SWAPFILE none swap defaults 0 0" | sudo tee -a /etc/fstab
    sudo swapon -a
fi

# move /srv, /tmp, and /var to larger/faster/transient /mnt volume
if [ ! -e /mnt/tmp ]; then
    # move each directory, then create link from old location to new
    for DIR in /srv /tmp /var; do
        echo $DIR
        sudo mv $DIR /mnt$DIR
        sudo mkdir $DIR
        echo "/mnt$DIR  $DIR    none    bind    0   0" | sudo tee -a /etc/fstab
        sudo mount $DIR
    done
fi

Among a few quirky things that the script does is re-try apt-get three times — I found that at least half the time the first run of apt-get would fail to update/install any packages without any comprehensible error messages (I think maybe because the system was still lazy-initializing some components?). Re-running it after a second seems to solve the issue, however.
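
The loop in my script simply runs everything three times whether or not it worked. A slightly tidier variant, sketched below, is a small retry helper that stops as soon as the command succeeds. The retry function and its argument order are my own invention for illustration, not part of Jenkins or Apt.

```shell
#!/bin/sh
# Sketch only: "retry" is a hypothetical helper.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        # stop at the first successful run
        "$@" && return 0
        # give up once the attempt limit is reached
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example (commented out so nothing is installed by accident):
# retry 3 1 sudo DEBIAN_FRONTEND=noninteractive apt-get -y -q update
```

Because the helper returns the failure when every attempt fails, the init script could also bail out early instead of carrying on with missing packages.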

Also, while installing only the openjdk-6-jdk package installs enough of java 6 for our build purposes, it doesn't install some of the links in the /usr/lib/jvm directory through which we reference the jdk or java home in our build scripts (these links usually are created by some other unknown java packages); so the script manually creates these java links.

And finally, the default JVM memory settings used by grails aren't sufficient for running or testing our apps; the quick fix for this is just to overwrite the Xmx and MaxPermSize settings in the default GRAILS_OPTS of the startGrails script that comes with each version of grails.
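
To make the effect of that substitution concrete, here is a small sketch that applies the equivalent sed expressions to a sample line. The sample GRAILS_OPTS value is made up for illustration (the real defaults differ between grails versions), and the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: same rewrite as the perl one-liner, applied to stdin with sed.
raise_grails_memory() {
    sed -e 's/Xmx[0-9][0-9]*/Xmx2048/' \
        -e 's/MaxPermSize=[0-9][0-9]*/MaxPermSize=1024/'
}

# Hypothetical sample line, not copied from any real startGrails script.
sample='GRAILS_OPTS="-server -Xmx512M -XX:MaxPermSize=192m"'
printf '%s\n' "$sample" | raise_grails_memory
# prints: GRAILS_OPTS="-server -Xmx2048M -XX:MaxPermSize=1024m"
```

Only the digits are replaced, so whatever unit suffix the original setting used (M or m here) is kept.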

Configure the Jenkins EC2 plugin

Now, finally, you're ready to configure the EC2 plugin itself. Once you've installed the plugin, you navigate to the main "Manage Jenkins" > "Configure System" page, and scroll down near the bottom to the "Cloud" section. There, you click the "Add a new cloud" button, and select the "Amazon EC2" option. This will display the UI for configuring the EC2 plugin. The first thing you configure is the access key that you created for the Jenkins IAM user (via the "Access Key ID" and "Secret Access Key" fields). If you've configured the permissions for the Jenkins IAM user correctly, this will populate the "Region" dropdown, and allow you to select the AWS region to use. Next, paste the text from the private-key pem file for the SSH key pair that you created for Jenkins into the "EC2 Key Pair's Private Key" field (this text will start with the line "-----BEGIN RSA PRIVATE KEY-----").

In the "AMIs" section of the configuration UI, you'll configure the per-slave settings. If you want, you can generate multiple slave profiles (from different AMI images, or different instance sizes, or different initialization parameters, etc). But you can start with just one, which you can add by clicking the "Add" button at the bottom of the "AMIs" section.

For each slave profile, you configure the profile with a description ("Standard EC2 Slave"), as well as the ID of the AMI to use (I used the ami-5168f861 AMI, the current official Ubuntu 12.04, 64-bit instance-store AMI). Next, select the instance type; a "micro" instance is probably too small for just about anything other than a one-line shell job; a "small" instance may be fine for some jobs; but with most of our builds (which all include at least one grails build step), we get a 3-4x improvement over "small" instance times with a "medium" instance (which is 2x the price of a "small").

You can optionally select a specific availability-zone within your selected region in which to launch the slave; this matters a lot if you have an existing EBS volume that you will attach to the slave, and it matters a little if you have other EC2 instances which the slave is going to access (like if the master Jenkins server is in EC2, or if you have a version-control repo in EC2 from which the slave is going to download source code, etc); otherwise you can just leave the "Availability Zone" field blank, and AWS will automatically launch the slave in whatever zone of your configured region is least active when the slave is launched.

Enter the name of the EC2 security-group that you created for the slave in the "Security group names" field. Enter the path to the directory that Jenkins should use as its working directory (similar to the /var/lib/jenkins directory on the master); you probably created this in your slave init script (it should be writable by the user Jenkins uses on the slave; the directory that my init script created for this was /srv/jenkins). Specify the user Jenkins should run as on the slave in the "Remote user" field; on a stock Ubuntu AMI, this is ubuntu. And unless you use root for this user, enter sudo in the "Root command prefix" field.

Enter one or more labels (separated by spaces) in the "Labels" field. You can assign jobs to specific slave profiles via labels, so if you have multiple slave profiles, you may want to include a label for each distinguishing feature of the slave. For example, if you have one profile for a small instance with no DB, you might label it just "small"; and for another profile using a medium instance with a MySQL DB, you might label as "medium mysql". Then for a job that only needs a small instance, you can set the job to use the slave labeled "small"; and for a job that requires a medium instance, you can set it to use the slave labeled "medium"; and for a job that requires MySQL, you can set the job to use the slave with the "mysql" label. If you have only one slave profile, you can just use a simple label like "slave" or "ec2" (and then configure any job that you want to run on the slave with the "slave" or "ec2" label).
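
The matching rule described here, for a job that asks for a single label, can be sketched in a few lines of shell. This is only an illustration of the idea, not how Jenkins implements it; the has_label function is hypothetical.

```shell
#!/bin/sh
# Sketch only: succeeds if the space-separated label list in $1
# contains the single label given in $2.
has_label() {
    for label in $1; do
        [ "$label" = "$2" ] && return 0
    done
    return 1
}
```

With the example profiles above, has_label "medium mysql" mysql succeeds (so a job that needs MySQL can run on that profile), while has_label "small" mysql fails.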

The default "Usage" setting for both the master Jenkins server and each slave server is to "Utilize this slave as much as possible". This means that Jenkins will not boot a slave for a job unless the job specifically has been configured (via label) to use a slave that currently is not running, or unless all the build-executors on the master currently are in use. If you instead change the slave's "Usage" setting to "Leave this machine for tied jobs only", Jenkins will use the slave only if it can't run the job on the master or any other slaves. See the "Usage" settings matrix table below for a clearer description of the interaction of the "Usage" setting between slaves and the master.

Set "Idle termination time" to the number of minutes a slave must sit idle before Jenkins shuts it down (the default is 30 minutes; if you use the slaves only for scheduled jobs you might want to cut this down to 0). Paste your init script into the "Init script" field. Jenkins executes the init script by writing it to a file on the slave, setting execute permissions on it, and then running the file as the user you specified in the "Remote user" field, so make sure that you include #!/bin/sh at the top of the script (if it's a shell script; otherwise use the appropriate shebang line for your scripting language).

Click the "Advanced..." button at the bottom of the AMI's configuration section to access a few more options. A couple of settings that you may want to customize are "Number of executors" (you may want to set this to 1 — unless you intend to run multiple jobs on the same slave at the same time); and "Instance Cap", which is the maximum number of slaves from this profile that Jenkins can have running at the same time.

Click the "Save" button at the bottom of the page to save and apply your changes.

Configure the job-selecting behavior of the master

Once you've saved your first slave profile, go back to the same "Manage Jenkins" > "Configure System" page, and at the top of the page you'll find a new "Labels" and a new "Usage" field (just below the "# of executors" field). The "Usage" field determines how Jenkins utilizes the master for jobs. If you leave the "Utilize this slave as much as possible" option selected, whenever a job is triggered that either doesn't have a label, or is labeled so that it could be executed either by the master or by another slave, Jenkins will run the job on an already-running slave only if the slave has a free executor; but otherwise it will try to run the job on the master, and only boot a new slave if the master has no free executors. If you change the "Usage" field to "Leave this machine for tied jobs only", whenever a job is triggered that either doesn't have a label, or is labeled so that it could be executed either by the master or by another slave, Jenkins will first try a running slave, and then try booting a slave; and only try using the master if it can't boot any more slaves.

Here's a description of the interaction between the master and slave "Usage" settings in tabular form:

"Usage" setting matrix (the order in which Jenkins tries to place a job)

Slave: "Utilize this slave as much as possible"; Master: "Utilize this slave as much as possible"
  1. use executor on running slave if free
  2. use executor on master if free
  3. boot new slave if below instance cap
  4. wait

Slave: "Utilize this slave as much as possible"; Master: "Leave this machine for tied jobs only"
  1. use executor on running slave if free
  2. boot new slave if below instance cap
  3. use executor on master if free
  4. wait

Slave: "Leave this machine for tied jobs only"; Master: "Utilize this slave as much as possible"
  1. use executor on master if free
  2. use executor on running slave if free
  3. boot new slave if below instance cap
  4. wait

Slave: "Leave this machine for tied jobs only"; Master: "Leave this machine for tied jobs only"
  1. use executor on running slave if free
  2. use executor on master if free
  3. boot new slave if below instance cap
  4. wait

Note that if a job is labeled with a label that the master doesn't have, Jenkins will not run it on the master regardless of the "Usage" setting — it will wait until it can boot or otherwise free up an executor on a slave that does have that label. So if there are jobs that you do want to run on the master, do give the master a label or two via the "Labels" field. For example, if you want to allow only small jobs and one or two master-specific jobs to run on the master, you might label the master "small master".

Earmark jobs for slaves with labels

The last step is to re-configure individual jobs to run on your new slaves. You can skip this step entirely if any slave can take any job — just configure the "Usage" setting of the master and slaves to indicate how Jenkins should utilize the slaves (if it should boot slaves to run jobs on them even if the master is free, or if it should max out the master before booting slaves).

Otherwise, navigate to the configuration page of each job for which you want to specify the type of slave to run the job, and check the "Restrict where this project can be run" checkbox at the bottom of the first section of the page. This will reveal the "Label expression" field; enter the label (or space-separated labels) that defines what kind of slave the job requires. For example, if the job requires a MySQL DB, you might enter "mysql" as the label (requiring a slave with the "mysql" label); or if the job can run on any small instance, you might enter "small" as the label (requiring a slave with the "small" label).

The next time you trigger a job that is labeled for a slave, Jenkins automatically will boot the slave (if no slaves of that type are currently running and have free executors), and run the job. The display of Jenkins' leftnav will also change, to include the list of executors on the slave in the "Build Executor" box (and when the slave is terminated, the slave's executors will be removed from this box).

7 comments:

  1. Nice write-up!

    I had to add the IAM "ec2:DescribeImages" action.

    David

  2. Nice Blog. It helps a lot in setting up jenkins slaves.

    Just a quick question around this. My Jenkins setup is not available outside firewall. Is it required for Jenkins to be available over internet to be able to launch slave instances in EC2?

    Thanks in advance!!

    Replies
    1. You don't need Jenkins to be available to the internet in order to launch regular EC2 slaves. The master (your Jenkins instance behind the firewall) launches/kills the slaves via the HTTP/HTTPS AWS API, and communicates with the slaves by opening SSH connections to them. So as long as your Jenkins instance has outbound SSH/HTTP/HTTPS access, you should be fine.

      The only case where you'd need Jenkins to be accessible from outside the firewall is if you wanted to use EC2 spot instances (a more complicated feature that you don't have to use). In that case, the slaves access the master over HTTP/HTTPS to let the master know when they've been launched (see https://wiki.jenkins-ci.org/display/JENKINS/Amazon+EC2+Plugin#AmazonEC2Plugin-SpotInstances).

  3. I setup the ec2 plugin on jenkins and getting this error. What could be wrong?

    403, AWS Service: AmazonEC2, AWS Request ID: 00b7348d-8fb7-4880-99ee-d404d4bff8cd, AWS Error Code: UnauthorizedOperation, AWS Error Message: You are not authorized to perform this operation.

    Replies
    1. I was having the same issue. Updating the user access policy to a power user through the console resolved this problem.

  4. I've a quick question. How can I inject environment variable for a Jenkins slave? Is it possible?
