# Scanning

Mass-driver is mostly about generating bulk changes, but a migration first
requires a clear picture of what needs to change. Finding patterns across
repositories deserves careful thought before any change can happen.

Let's see how mass-driver scans let us explore our repositories by writing
simple Python functions, reusable as plugins.

## Using mass-driver to scan

```{include} ../../README.md
---
start-after: "<!-- scanner-activity -->"
end-before: "## Reviewing bulk PR status"
---
```

## Creating a Scanner

Scanners are mass-driver plugins that map to functions. Let's create one.

First some imports:

```python
from pathlib import Path
from typing import Any
```

Then the function:

```{literalinclude} ../../src/mass_driver/scanners/basic_scanners.py
---
language: python
pyobject: dockerfile_from_scanner
---
```

This scanner tries to open the repo's `Dockerfile` and, if it exists, reports
the lines that start with the `FROM` keyword.

Note that the scanner is built to report the same dict keys in all cases.
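
To illustrate the "same keys in all cases" pattern on its own, here is a minimal
sketch of a different scanner. The name `license_scanner` and its dict keys are
hypothetical, and the signature (a repo `Path` in, a `dict` out) is assumed from
the imports above rather than taken from mass-driver's source.

```python
from pathlib import Path
from typing import Any


def license_scanner(repo: Path) -> dict[str, Any]:
    """Report on a repo's LICENSE file (hypothetical scanner, for illustration)."""
    license_path = repo / "LICENSE"
    if not license_path.is_file():
        # Failure case: same keys as the success case, with neutral values
        return {"license_file_exists": False, "license_first_line": None}
    lines = license_path.read_text(encoding="utf-8", errors="replace").splitlines()
    first_line = lines[0].strip() if lines else ""
    # Success case: identical keys, so every repo yields the same columns
    return {"license_file_exists": True, "license_first_line": first_line}
```

Because both branches return the same keys, results from many repos line up as
columns without any special-casing.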

We suggest scanner functions return flat dictionaries (simple key, simple value,
no nesting), so that the returned content maps easily to database-style rows of
data (or simply CSV). Try to account for every check that can go wrong (like a
missing file or folder), and find a way to report the specifics of each
repository unambiguously.
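
As a rough sketch of why flat dictionaries help, the snippet below turns
per-repo results into CSV using only the standard library. The `results` data
and its keys are made up for illustration and are not the actual output of
`dockerfile_from_scanner`; `results_to_csv` is a hypothetical helper, not part
of mass-driver.

```python
import csv
import io
from typing import Any

# Hypothetical scan results, keyed by repo name; each value is a flat dict
results: dict[str, dict[str, Any]] = {
    "repo-a": {"dockerfile_exists": True, "first_from_line": "FROM python:3.11"},
    "repo-b": {"dockerfile_exists": False, "first_from_line": None},
}


def results_to_csv(results: dict[str, dict[str, Any]]) -> str:
    """Render one CSV row per repo, one column per dict key."""
    keys = sorted({key for row in results.values() for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["repo", *keys], lineterminator="\n")
    writer.writeheader()
    for repo, row in results.items():
        # None values are written as empty cells by the csv module
        writer.writerow({"repo": repo, **row})
    return buffer.getvalue()


print(results_to_csv(results))
```

With nested values, each column would need its own flattening rule, which is
the work the flat-dict suggestion avoids.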

To package this plugin so it can run, we add it to the plugins in `pyproject.toml`:

```toml
[tool.poetry.plugins.'massdriver.scanners']
dockerfile-from = 'my_lovely_package.dockerfile:dockerfile_from_scanner'
```

Confirm it is available via `mass-driver scanners`:

```none
Available scanners:
root-files
dockerfile-from
```

And now we can define an activity file with our scanner enabled, possibly
alongside other scanners:

```toml
# dockerfile_scan.toml
[mass-driver.scan]
scanner_names = ["dockerfile-from", "root-files"]
```

Now let's generate the scan reports:

```shell
mass-driver run dockerfile_scan.toml --repo-filelist repos.txt
```

## Testing the scanner

Before running it across many repos, let's test it with sample data. For this,
a couple of handy test fixtures are available that make testing trivial.

So let's start by creating a new empty file `tests/test_scanner.py`, and a
folder structure like this:

```none
tests/
├── test_scanner/
│   ├── dockerfile/
│   │   ├── activity.toml
│   │   ├── Dockerfile
│   │   └── scan_results.json
│   └── multiple-dockerfiles/
│       ├── activity.toml
│       ├── Dockerfile
│       └── scan_results.json
└── test_scanner.py
```

The intent is for each subfolder under `tests/test_scanner/` to be a pretend
repo, with an `activity.toml` selecting the scanners, and a `scan_results.json`
to report on the repo's scan.

As for the test file, here are its contents:
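
If several pretend repos are needed, the layout can be scaffolded with a few
lines of `pathlib`. This is a sketch only: `make_pretend_repo` is a hypothetical
helper, the Dockerfile content is invented, and `scan_results.json` is written
as an empty placeholder because its exact format is not described here.

```python
from pathlib import Path

ACTIVITY_TOML = '[mass-driver.scan]\nscanner_names = ["dockerfile-from"]\n'


def make_pretend_repo(base: Path, name: str, dockerfile: str) -> Path:
    """Create base/name/ holding the three files each pretend repo uses."""
    repo = base / name
    repo.mkdir(parents=True, exist_ok=True)
    (repo / "activity.toml").write_text(ACTIVITY_TOML, encoding="utf-8")
    (repo / "Dockerfile").write_text(dockerfile, encoding="utf-8")
    # Placeholder: replace with the scan output expected for this repo
    (repo / "scan_results.json").write_text("{}\n", encoding="utf-8")
    return repo
```

For example, `make_pretend_repo(Path("tests/test_scanner"), "dockerfile", "FROM python:3.11\n")`
would create the first subfolder of the tree above.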

```{literalinclude} ../../src/mass_driver/tests/test_scanner.py
---
language: python
---
```

Note how the test relies on the `massdrive_scan_check` fixture to run a scan
against a specific folder. This fixture wraps around the `massdrive` fixture.

```{note}
The scanner subsystem swallows exceptions and reports them under the
`scannerror` key, which holds two fields: `exception: str` and
`backtrace: list[str]`.
```
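
The behaviour described in the note can be pictured with the sketch below. It is
not mass-driver's actual implementation: `run_scanner_safely` is a hypothetical
wrapper, and only the `scannerror`, `exception` and `backtrace` names come from
the note above.

```python
import traceback
from pathlib import Path
from typing import Any, Callable

Scanner = Callable[[Path], dict[str, Any]]


def run_scanner_safely(scanner: Scanner, repo: Path) -> dict[str, Any]:
    """Run one scanner, turning any exception into a reportable dict."""
    try:
        return scanner(repo)
    except Exception as exc:
        return {
            "scannerror": {
                "exception": str(exc),
                # format_exception returns a list of strings, one per chunk
                "backtrace": traceback.format_exception(
                    type(exc), exc, exc.__traceback__
                ),
            }
        }
```

A practical consequence is that one broken scanner does not abort a scan over
many repos, so it is worth checking results for the `scannerror` key.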