Jiby's toolbox

Jb Doyon’s personal website

Reproducible workspace deployment with Ansible and Vagrant

Posted on — Jun 11, 2020

Like many developers, I keep my configuration files in version control, to make moving to new machines simpler and to share code with others. This started with my Emacs config files, but grew to cover other aspects: bash functions, git aliases… Since 2016, I have accumulated over 500 git commits.

Now that the dotfiles are tracked, the next biggest pain is installing the packages, creating the folder structures, and running the obscure commands that some subsystems need before their config files are of any use. Wouldn’t it be nice to have an all-in-one setup?

As an avid fan of Infrastructure as Code, and wanting to experiment some more on automation of machines, I’ve started using Ansible to create a reproducible workspace in the form of “playbooks”, complementing my dotfiles. To support this effort, I needed an automated way to get a fresh machine to experiment on, enabling from-scratch reproduction, for which I used Vagrant. I can now spin up and down a new Debian machine in one command. This evolved to eat up my entire emacs config and become something more: jibyconf.

In the rest of this article, we’ll set aside particularities of my dotfiles and explore how Ansible and Vagrant can be used together to make a reproducible machine setup that enables flexible deployments with minimal overhead.

First, here’s a demo of the result, sped up.

From Vagrant…

Nowadays, “reproducible development environment” means Docker, first and foremost. Containers are a really popular technology, but for this specific approach, I chose to use Vagrant.

The best practice for containers has them come and go like cattle, static application-image-zips started and stopped at a whim, laser-focused on one task for which the image is optimized. What I wanted, in contrast, is a virtual machine: a long-lived pet, growing with me over time, likely outliving its initial purpose[1]. I also wanted to emulate not just applications, but an entire machine. Since they share the kernel of the host, containers don’t lend themselves to whole-machine emulation.

Vagrant automates virtual machine management. It abstracts away the detail of Virtualbox vs VMware and Linux vs Windows, exposing virtual machines through concepts like “providers” to run VMs on, and “provisioners” that configure machines for users. A Vagrantfile declares what the base machine is (the VM’s “box” it came from) and what commands to run on it for initial setup (e.g. apt-get install). Such a file can grow to be an elaborate dance made up of multiple machines running in private networks, running different OSes and server/client software, exposing network ports. Using Ruby, this can be extended far beyond what’s reasonable, all the while remaining a plain-text file that can be reproduced by other team members.

# See the online documentation at https://docs.vagrantup.com.
Vagrant.configure("2") do |config|
  # Use debian base box: https://app.vagrantup.com/debian/boxes/testing64
  config.vm.box = "debian/testing64"
  config.vm.hostname = "randy"  # This is randy. Say hello.

  # Expose a port to outside, useful for testing server software
  config.vm.network "forwarded_port", guest: 8080, host: 80, id: "HTTP"

  # Run these shell commands on first boot
  config.vm.provision "shell", inline: <<-SHELL
    apt-get update
    apt-get install -y build-essential git
  SHELL
end
Code Snippet 1: A simple Vagrantfile

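Common operations are very simple: “vagrant up” creates and boots the machine, “vagrant ssh” opens a shell on it, “vagrant halt” shuts it down, and “vagrant destroy” deletes it.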

The great flexibility of Vagrant comes from its depth of integration with other tools: the supported Providers (Virtualbox, VMware, Docker[2], AWS) to run machines on, and the Provisioners, automation tools like Chef, Puppet, Ansible, Salt to process these machines with.
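To make the provider/provisioner split concrete, here is a minimal sketch of a Vagrantfile that tunes one provider while keeping provisioning separate. The memory and CPU values are hypothetical, picked only for illustration, and the `defined?(Vagrant)` guard is there only so that plain Ruby can load the file too; inside Vagrant the constant is always defined.

```ruby
# Hypothetical sizing values, chosen only for illustration.
VM_MEMORY_MB = 2048
VM_CPUS = 2

# Vagrant defines the Vagrant constant when it loads a Vagrantfile; the guard
# only lets plain `ruby` load this sketch without error.
if defined?(Vagrant)
  Vagrant.configure("2") do |config|
    config.vm.box = "debian/testing64"

    # Applied only when the VirtualBox provider runs this machine
    config.vm.provider "virtualbox" do |vb|
      vb.memory = VM_MEMORY_MB
      vb.cpus = VM_CPUS
    end

    # Provisioners are declared independently of the provider chosen above
    config.vm.provision "shell", inline: "apt-get update"
  end
end
```

Settings inside the provider block are ignored by other providers, so the same file can stay usable by a teammate running a different one.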

… to Ansible

Tools like Ansible are usually classified as “configuration management”. They provide a declarative file format describing the expected end state of the machine (e.g. “Apache will be installed and running, with an /opt/myapp/ folder existing and populated by this binary with that checksum”…)

Of all of these, I first considered Ansible because of its simplicity. Compared to the heavy-handed corporate orchestration tools that exist, requiring daemons, servers, certificates, cloud accounts and time investment, Ansible was refreshingly simple: give it a list of machines to connect to (called an inventory), and it can already provide value, connecting via SSH to the machines, checking service status, running arbitrary commands. Its strength, though, appears when writing playbooks, which describe the expected system state as a list of tasks to run on each machine of the inventory.

A simple playbook can be written in a few minutes, and tested just as fast. From there, it can be grown to more complex operations with larger inventory to manage. At the core, Ansible just applies a playbook (list of tasks) on an inventory. This makes it a good starter automation software to test on an old laptop, from which to pick up tenets of Infrastructure as code, and start building a tech empire with.

- name: Developer documentation  (debian manpages + ~/dev/doc)
  hosts: dev
  tags: [dev, docs, slow]
  tasks:
    - name: Ensure ~/dev/doc exists
      file:
        path: "{{ ansible_home }}/dev/doc/"
        state: directory
        mode: '0755'
    - name: Install docs packages
      become: true
      apt:
        name:
          - python3-doc
          - ansible-doc
          - debmake-doc
          - doc-debian
          - bash-doc
Code Snippet 2: A sample playbook borrowed from jibyconf, my dotfiles repository
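The first task in that playbook declares that a directory should exist, rather than saying “run mkdir”. As a rough analogy of that “describe the end state” idea, here is a small Ruby sketch. `ensure_directory` is a hypothetical helper written for this illustration, not how Ansible is implemented: it inspects the current state, acts only if needed, and reports whether anything changed.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper mimicking an idempotent "state: directory" task:
# returns :changed when it had to create the directory, :ok otherwise.
def ensure_directory(path, mode: 0o755)
  if File.directory?(path)
    :ok
  else
    FileUtils.mkdir_p(path, mode: mode)
    :changed
  end
end

# Try it in a throwaway temporary folder
Dir.mktmpdir do |tmp|
  target = File.join(tmp, "dev", "doc")
  puts ensure_directory(target)  # first run creates the directory
  puts ensure_directory(target)  # second run finds nothing left to do
end
```

Running the same description twice is harmless, which is what makes it safe to re-apply a playbook to a machine that is already partly configured.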

Workspace reproducibility in action

Now that we have seen how Ansible and Vagrant each function on their own, let’s review my setup as a demonstration of the toolchain. I wrote a playbook describing how to configure a dev machine of mine (install emacs, python, a mail client, etc), and have Vagrant bring up a new Debian machine from a base box, running the playbook on it once before handing over control. Once the machine is configured (“provisioned”), I am free to use it via SSH (or remote desktop, or clicking in Virtualbox).

Vagrant.configure("2") do |config|
  config.vm.define "dev", primary: true do |dev|
    dev.vm.box = "debian/testing64"
    dev.vm.hostname = "debby"  # Hey debby
    # Runs ansible on host, connecting via SSH (ansible assumed installed on
    # host, python on guest. See also Vagrant Provisioner "ansible_local")
    dev.vm.provision "dev-ansible", type: "ansible" do |ansible|
      ansible.playbook       = "playbook/main.yml"
      ansible.inventory_path = "vagrant_inventory"
      ansible.skip_tags      = "x11, slow"  # Can exclude tasks via tags
      ansible.limit          = "dev"
      # ansible.verbose        = true
    end
  end
  # [...] Other provisioners or machines
end
Code Snippet 3: Part of my Vagrantfile, describing debby, a debian machine provisioned by ansible

The trick is to remember that the Vagrant portion is separable from the Ansible one: Given a new laptop (not VM), I could ditch Vagrant and use Ansible alone to configure the device over SSH. The inventory also enables scaling, installing the same thing on more than one machine in parallel, with each getting tweaked values for hostnames, IPs, and port numbers.
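The vagrant_inventory file referenced in my Vagrantfile is not shown in this article. As a rough sketch of what per-host values in an inventory can look like, the Ruby below renders a small INI-style inventory. The group names dev and servers match the limit settings in my Vagrantfile; the addresses and ports are placeholder assumptions, and `render_inventory` is a hypothetical helper, not part of Vagrant or Ansible.

```ruby
# Hypothetical host table: addresses and ports are placeholders.
HOSTS = {
  "dev"     => [{ name: "debby",  ansible_host: "127.0.0.1", ansible_port: 2222 }],
  "servers" => [{ name: "sergei", ansible_host: "127.0.0.1", ansible_port: 2200 }],
}.freeze

# Render groups of hosts as INI-style inventory text, one host per line,
# followed by that host's connection variables as key=value pairs.
def render_inventory(groups)
  sections = groups.map do |group, hosts|
    lines = hosts.map do |host|
      vars = host.reject { |key, _| key == :name }
                 .map { |key, value| "#{key}=#{value}" }
      ([host[:name]] + vars).join(" ")
    end
    (["[#{group}]"] + lines).join("\n")
  end
  sections.join("\n\n") + "\n"
end

puts render_inventory(HOSTS)
```

Adding a machine then means adding one entry to the table, with its own hostname, address and port, while the playbook stays untouched.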

For extra credits, Vagrantfiles can describe more than one machine, which is handy when describing a development VM + a test server, or front and backend systems. Here’s what the server portion of my Vagrantfile looks like:

Vagrant.configure("2") do |config|

  # [...] dev machine as seen above

  # A server on which to run the docker stuff
  config.vm.define "server" do |server|
    server.vm.box = "debian/buster64"
    server.vm.hostname = "sergei"  # Hey sergei.
    server.vm.network "forwarded_port", guest: 3000, host: 3000, id: "HTTP"
    server.vm.network "forwarded_port", guest: 222, host: 2224, id: "SSH"

    server.vm.provision "server-ansible", type: "ansible" do |ansible|
      ansible.playbook       = "playbook/server/main.yml"
      ansible.inventory_path = "vagrant_inventory"
      ansible.limit          = "servers"
      ansible.galaxy_role_file = "requirements.yml"
      ansible.galaxy_roles_path = "playbook/roles/"
      # ansible.verbose        = true
    end
  end
end
Code Snippet 4: Server section of Vagrantfile, Debian stable exposing ports for Gitea

This server runs a different playbook, which involves a Docker-based install of Gitea (a Github clone).

With multiple machines in the same Vagrantfile, I can choose which machine I need and boot it selectively:

$ vagrant status
Current machine states:

dev                       running (virtualbox)
server                    not created (virtualbox)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.

Booting a specific machine is a matter of adding its name to the command:

vagrant up dev
# or
vagrant up server
# why not both?
vagrant up
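Because a Vagrantfile is plain Ruby, machine definitions like the two in this article can also be driven from a data table instead of being written out one by one. The sketch below is one possible way to do it, not what my actual Vagrantfile does: `MACHINES` is a hypothetical constant, its values are copied from the snippets in this article, and the forwarded ports and Galaxy settings are left out for brevity. The `defined?(Vagrant)` guard only lets plain Ruby load the file.

```ruby
# Hypothetical table of machines, with values taken from the snippets above.
MACHINES = {
  "dev" => {
    box: "debian/testing64", hostname: "debby",
    playbook: "playbook/main.yml", limit: "dev", primary: true
  },
  "server" => {
    box: "debian/buster64", hostname: "sergei",
    playbook: "playbook/server/main.yml", limit: "servers", primary: false
  },
}.freeze

# Inside Vagrant the constant is always defined; the guard is for plain Ruby.
if defined?(Vagrant)
  Vagrant.configure("2") do |config|
    MACHINES.each do |name, spec|
      config.vm.define name, primary: spec[:primary] do |machine|
        machine.vm.box = spec[:box]
        machine.vm.hostname = spec[:hostname]

        # One Ansible provisioner per machine, limited to its own group
        machine.vm.provision "#{name}-ansible", type: "ansible" do |ansible|
          ansible.playbook       = spec[:playbook]
          ansible.inventory_path = "vagrant_inventory"
          ansible.limit          = spec[:limit]
        end
      end
    end
  end
end
```

The trade-off is readability: two machines are arguably clearer written out by hand, and a table like this only starts paying off as the list grows.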

Towards infrastructure as code

As developers, we evolved appropriate tooling for managing (plain text) code: we use diff to examine changes, version control it with git, code review changes with our peers, and write tests to assert that it broadly does what we expect. But infrastructure is a blind spot for us: servers aren’t reviewable or diffable, rollback isn’t easy, and justification of change is usually non-existent, never mind documentation.

This is where infrastructure as code can improve our life, turning servers and other infrastructure into something we know how to manage: code. Back to the comfortable tooling and well understood practices.

As you can imagine, the setup I built is overkill for most people. That’s ok, it was partly a challenge for myself, a fun experiment in automating things for the sake of learning my tools. I’m quite proud of the result though, and I will likely write more about it, showing a more detailed tour in a future article.

I encourage you to pick a computer you don’t care much about (an old machine that would otherwise go to e-waste), read through the basic Ansible tutorials, create an inventory that connects to that machine, and write a simple playbook that installs a package. Then follow your curiosity in this safe playground, which you can tear down and rebuild easily. Feel the power of these tools by thinking how easily that playbook could be rerun on the same machine to reset its state after experiments, or how easily it would scale to 10 machines as a software deployment tool.

Adopting Infrastructure as Code pays for itself quite quickly, by saving the time wasted trying to diff two supposedly identical machines that exhibit different behaviour. Some may prefer using Docker instead, or want to switch to Terraform, and the use case might involve managing a fleet of machines rather than the odd VM, and that’s fine. The important thing is: friends don’t let friends use hand-me-down machines. Reproducibility is key.

  1. This pet/cattle distinction is quite important to the Infrastructure as Code movement, which explains part of its goal as “replace pets with cattle”. As mentioned, this specific workspace emulation project has a pet VM as intended output, which goes against these precepts. More common uses of devops tools involve fleet management, which fits the pet/cattle division more.

  2. Yes, Vagrant can use Docker as backend to provide virtual machine services. It’s crazy, and my brain hurts too. This is awesome, though, as it opens up Vagrantfiles to even broader setups. Imagine a single Vagrantfile used for local testing during development, describing 2 machines: the docker container for your backend server code, built via Dockerfile in the same folder, and a Windows client to launch as VM via Virtualbox. This Vagrant setup provides local machines for a fast development cycle, while allowing Vagrant to be skipped in production by reusing the Docker image and potentially the playbook for server/client deployment.