Scanning

Mass-driver is mostly about generating bulk changes, but migrations require a good idea of what we need to change. A good amount of thought needs to be given to finding patterns in repositories before any change can happen.

Let’s see how mass-driver scans let us explore our repositories by writing simple Python functions, reusable as plugins.

Using mass-driver to scan

Let’s define an Activity file specifying a list of scanners to run:

# An Activity file for scanning
[mass-driver.scan]
scanner_names = ["root-files", "dockerfile-from"]

This can be run just like a migration:

mass-driver run scan.toml --repo-filelist repos.txt

Creating a Scanner

Scanners are mass-driver plugins that map to functions. Let’s create one.

First some imports:

from pathlib import Path
from typing import Any

Then the function:

def dockerfile_from_scanner(repo: Path) -> dict[str, Any]:
    """Report the repo's Dockerfile's FROM line(s)"""
    dockerfile_path = repo / "Dockerfile"
    dockerfile_exists = dockerfile_path.is_file()
    if not dockerfile_exists:
        return {"dockerfile_exists": False, "dockerfile_from_lines": None}
    dkr_lines = dockerfile_path.read_text().splitlines()
    dkr_from_lines = [line for line in dkr_lines if line.startswith("FROM")]
    return {"dockerfile_exists": True, "dockerfile_from_lines": dkr_from_lines}

This scanner looks for a Dockerfile at the root of the repo and, if one exists, reports the lines that start with the FROM keyword.

Note that the scanner is built to report the same dict keys in all cases.
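As a quick sanity check of that property, the sketch below runs the scanner against two throwaway folders and compares the keys of both results. The scanner is repeated in condensed form only so the snippet is self-contained; the helper scan_sample_repos, the folder names and the Dockerfile contents are made up for illustration.

```python
import tempfile
from pathlib import Path
from typing import Any


def dockerfile_from_scanner(repo: Path) -> dict[str, Any]:
    """Condensed copy of the scanner above, repeated for self-containment."""
    dockerfile_path = repo / "Dockerfile"
    if not dockerfile_path.is_file():
        return {"dockerfile_exists": False, "dockerfile_from_lines": None}
    lines = dockerfile_path.read_text().splitlines()
    from_lines = [line for line in lines if line.startswith("FROM")]
    return {"dockerfile_exists": True, "dockerfile_from_lines": from_lines}


def scan_sample_repos() -> tuple[dict[str, Any], dict[str, Any]]:
    """Scan one folder without a Dockerfile and one with a made-up Dockerfile."""
    with tempfile.TemporaryDirectory() as tmp:
        empty_repo = Path(tmp) / "empty"
        docker_repo = Path(tmp) / "with-dockerfile"
        empty_repo.mkdir()
        docker_repo.mkdir()
        # Illustrative Dockerfile content, not taken from a real repo
        (docker_repo / "Dockerfile").write_text("FROM python:3.11\nRUN pip install .\n")
        return dockerfile_from_scanner(empty_repo), dockerfile_from_scanner(docker_repo)


missing, present = scan_sample_repos()
# Both outcomes expose exactly the same keys
assert missing.keys() == present.keys()
```

Keeping the key set stable means downstream reporting never has to guess which columns a given repo will provide.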

We suggest that scanner functions return flat dictionaries (simple key, simple value, no nesting), to make it easy to map the returned content to database-style rows of data (or simply CSV). Try to account for every check that can go wrong (like a missing file or folder), and report each such case with its own distinct value so that the state of every repository can be told apart.
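To illustrate why flat dictionaries are convenient, here is a minimal sketch that turns per-repo scan results into CSV text using only the standard library. The function scan_results_to_csv, the "repo" column and the sample data are hypothetical and not part of mass-driver; the sample also assumes list values were joined into a single string beforehand.

```python
import csv
import io
from typing import Any


def scan_results_to_csv(results: dict[str, dict[str, Any]]) -> str:
    """Render {repo_name: flat_scan_dict} as CSV text, one row per repo."""
    # Collect every key seen across repos, preserving first-seen order
    columns: list[str] = []
    for row in results.values():
        for key in row:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["repo", *columns], lineterminator="\n"
    )
    writer.writeheader()
    for repo_name, row in results.items():
        # None values are written as empty cells by the csv module
        writer.writerow({"repo": repo_name, **row})
    return buffer.getvalue()


# Made-up results for two hypothetical repos
sample = {
    "repo-a": {"dockerfile_exists": True, "dockerfile_from_lines": "FROM python:3.11"},
    "repo-b": {"dockerfile_exists": False, "dockerfile_from_lines": None},
}
csv_text = scan_results_to_csv(sample)
```

With nested values, each scanner would need its own flattening logic before this kind of export could work, which is the main argument for keeping results flat from the start.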

To package this plugin for running it, we add it to the pyproject.toml plugins:

[tool.poetry.plugins.'massdriver.scanners']
dockerfile-from = 'my_lovely_package.dockerfile:dockerfile_from_scanner'

Verify that it is available by running mass-driver scanners:

Available scanners:
root-files
dockerfile-from

And now we can define an activity file with our scanner enabled, possibly alongside other scanners:

# dockerfile_scan.toml
[mass-driver.scan]
scanner_names = ["dockerfile-from", "root-files"]

Now let’s generate the scan reports:

mass-driver run dockerfile_scan.toml --repo-filelist repos.txt

Testing the scanner

Before running it across many repos, let’s test it with sample data. For this, a couple of handy test fixtures are available that make testing straightforward.

So let’s start by writing a new empty file tests/test_scanner.py, and a folder structure like this:

tests/
├── test_scanner/
│   ├── dockerfile/
│   │   ├── activity.toml
│   │   ├── Dockerfile
│   │   └── scan_results.json
│   └── multiple-dockerfiles/
│       ├── activity.toml
│       ├── Dockerfile
│       └── scan_results.json
└── test_scanner.py

The intent is for each subfolder under tests/test_scanner/ to act as a pretend repo, with an activity.toml selecting the scanners to run, and a scan_results.json holding the expected scan results for that repo.

As for the test file, here are its contents:

"""Demonstrate the Scanner test fixture

Feature: Scanner test fixtures
  As a mass-driver plugin dev
  I need to test my future scanner on sample repos
  In order to ship scanner plugins efficiently
"""

from pathlib import Path

import pytest

from mass_driver.tests.fixtures import copy_folder, massdrive_scan_check

# Go from this filename.py to folder:
# ./test_scanner.py -> ./test_scanner/
TESTS_FOLDER = Path(__file__).with_suffix("")


@pytest.mark.parametrize(
    "test_folder", [f.name for f in TESTS_FOLDER.iterdir() if f.is_dir()]
)
def test_scanner(test_folder: str, tmp_path: Path):
    """Scenario: Check the sample folder scan results"""
    absolute_reference = TESTS_FOLDER / test_folder
    workdir = tmp_path / "repo"
    copy_folder(absolute_reference, workdir)
    result, ref = massdrive_scan_check(workdir)
    assert result == ref, "Scan results should match reference"

Note how the test relies on the massdrive_scan_check fixture to run the scan against a specific folder. This fixture wraps the massdrive fixture.

Note

The scanner subsystem swallows exceptions raised by a scanner and reports them under the scannerror key, which holds two things: exception: str, and backtrace: list[str].
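To make that behaviour concrete, here is a minimal sketch of how a wrapper could capture a scanner's exception and report it in that shape. This is not mass-driver's actual implementation: run_scanner_safely and broken_scanner are hypothetical names, and nesting the two fields in a dict under scannerror is an assumption about the layout.

```python
import traceback
from pathlib import Path
from typing import Any, Callable

Scanner = Callable[[Path], dict[str, Any]]


def run_scanner_safely(scanner: Scanner, repo: Path) -> dict[str, Any]:
    """Run one scanner, converting any exception into a report entry."""
    try:
        return scanner(repo)
    except Exception as e:
        # Assumed layout: both fields nested under the "scannerror" key
        return {
            "scannerror": {
                "exception": str(e),
                "backtrace": traceback.format_exception(type(e), e, e.__traceback__),
            }
        }


def broken_scanner(repo: Path) -> dict[str, Any]:
    """Hypothetical scanner that always fails."""
    raise ValueError(f"cannot scan {repo}")


report = run_scanner_safely(broken_scanner, Path("some-repo"))
```

Because failures end up in the report rather than aborting the run, one broken repo does not prevent the remaining repos from being scanned.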