Sources

Discovering repos using a Source

Sometimes the set of repos we want to patch is dynamic, coming from tooling such as a GitHub repository search, a compliance tool report, a service catalogue, etc.

To address this, mass-driver can use a Source plugin to discover repositories to apply activities to.

# An Activity file with a repo-list Source
[mass-driver.source]
source_name = "repo-list"
# Source 'repo-list' takes a 'repos' list of cloneable URLs:
[mass-driver.source.source_config]
repos = [
  "git@github.com:OverkillGuy/mass-driver.git",
  "git@github.com:OverkillGuy/mass-driver-plugins.git",
]

Because we included a Source, we can omit the CLI flags --repo-path or --repo-filelist, to instead rely on the activity’s config to discover the repos.

mass-driver run activity.toml

Smarter Sources can use more elaborate parameters, maybe even secret parameters like API tokens.

To pass secrets safely at runtime, any parameter that would go under source_config in the file can instead be provided as an environment variable with the SOURCE_ prefix. For example, we could have left the repos entry out of the file and provided a SOURCE_REPOS envvar instead. This works because the Source class derives from pydantic.BaseSettings.
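As a rough illustration of the envvar route, the sketch below sets SOURCE_REPOS from Python before mass-driver would be invoked. It assumes pydantic's usual settings behaviour, where list-typed fields are read from the environment as JSON-encoded strings; that encoding is an assumption here, so check it against your pydantic version.

```python
import json
import os

repos = [
    "git@github.com:OverkillGuy/mass-driver.git",
    "git@github.com:OverkillGuy/mass-driver-plugins.git",
]

# Assumption: pydantic settings JSON-decode list fields read from envvars,
# so the list is serialised to a JSON string before being exported.
os.environ["SOURCE_REPOS"] = json.dumps(repos)

# mass-driver could then be launched from this process so it inherits the
# variable, e.g. with subprocess.run(["mass-driver", "run", "activity.toml"]).
```

The activity file would still need its source_name entry; only the repos value moves to the environment.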

As a Source developer, though, you should use pydantic.SecretStr for such fields, to avoid leaking the secret when the config or result is stored. See Pydantic docs on Secret fields.

Creating a Source

Sources are mass-driver plugins that map to mass_driver.models.repository.Source. Let’s create one.

First, we import relevant bits:

from mass_driver.models.repository import IndexedRepos, SourcedRepo, RepoUrl, Source

Remembering that:

IndexedRepos = dict[RepoID, SourcedRepo]

So we now write up a new class:

class RepolistSource(Source):
    """A Source that just returns a pre-configured list of repositories"""

    repos: list[RepoUrl]
    """The configured list of repositories to use, as list of cloneable URL"""

    def discover(self) -> IndexedRepos:
        """Discover a list of repositories"""
        return {url: SourcedRepo(clone_url=url, repo_id=url) for url in self.repos}

This class, taking a parameter repos, generates SourcedRepo objects when calling discover(), as a dictionary indexed by RepoID (basically a string).

The only constraint on RepoID (a type alias of str) is that each key is unique, so in this case we use the git clone URL, which is guaranteed unique. Smarter Sources will use something shorter, where appropriate.
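One possible way to get a shorter RepoID is to derive it from the clone URL. The helper below is hypothetical (it is not part of mass-driver) and only handles scp-style URLs of the form git@host:owner/name.git. The resulting ID stays unique only as long as all repos come from the same host.

```python
def short_repo_id(clone_url: str) -> str:
    """Turn 'git@host:owner/name.git' into 'owner/name' (hypothetical helper)."""
    # Keep everything after the first colon: 'owner/name.git'
    path = clone_url.split(":", 1)[-1]
    # Drop the trailing '.git' suffix if present
    if path.endswith(".git"):
        path = path[: -len(".git")]
    return path


repo_id = short_repo_id("git@github.com:OverkillGuy/mass-driver.git")
```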

Don’t use str for secret fields!

When passing sensitive config like API tokens, you should ensure that dumping the mass-driver config doesn’t disclose any secret value. Pydantic provides the pydantic.SecretStr type for that purpose; see Pydantic docs on Secret fields. Fields of this type never print their content when represented as a string: an explicit call to my_secret_field.get_secret_value() is required to actually disclose the secret.
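A minimal sketch of that masking behaviour, assuming pydantic is installed. TokenConfig and its api_token field are hypothetical names used for illustration; a real Source would declare the field on its own Source subclass instead.

```python
from pydantic import BaseModel, SecretStr


class TokenConfig(BaseModel):
    """Hypothetical config model holding one secret field."""

    api_token: SecretStr


cfg = TokenConfig(api_token="hunter2")

# The string form of the model shows a masked placeholder, not the token
printed = str(cfg)

# The real value is only available through an explicit call
revealed = cfg.api_token.get_secret_value()
```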

Note that the patch_data field of SourcedRepo, unused in this sample Source, is an arbitrary dictionary under the Source’s control. It is the right place for per-repo data extracted from the source that the migration will need, for instance the name of the file to fix, as reported by some reporting tool.
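To make the idea concrete, here is a sketch of a discovery function that fills patch_data from report rows. So that it runs without mass-driver installed, SourcedRepo is replaced by a stand-in dataclass with the three fields mentioned in this page; the report rows and the file_to_fix key are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SourcedRepo:
    """Stand-in for mass_driver.models.repository.SourcedRepo (fields assumed)."""

    clone_url: str
    repo_id: str
    patch_data: dict = field(default_factory=dict)


# Hypothetical rows, shaped like what a compliance report might contain
REPORT = [
    {"repo": "git@github.com:example/alpha.git", "file": "setup.cfg"},
    {"repo": "git@github.com:example/beta.git", "file": "tox.ini"},
]


def discover_from_report(rows):
    """Index repos by clone URL, attaching the per-repo file to fix."""
    return {
        row["repo"]: SourcedRepo(
            clone_url=row["repo"],
            repo_id=row["repo"],
            patch_data={"file_to_fix": row["file"]},
        )
        for row in rows
    }


found = discover_from_report(REPORT)
```

In a real plugin, the same dictionary comprehension would live in the discover() method of a Source subclass, using the actual SourcedRepo model.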

To package this plugin for running it, we add it to the pyproject.toml plugins:

[tool.poetry.plugins.'massdriver.sources']
repo-list = 'mass_driver.sources.simple:RepolistSource'

Prove it is available via mass-driver sources:

INFO:root:Available sources:
INFO:root:csv-filelist
INFO:root:github-app-search
INFO:root:github-search
INFO:root:repo-filelist
INFO:root:repo-list
INFO:root:template-filelist

Note

For a more elaborate Source, take a look at the PyGithub-based GithubPersonalSource, which uses inheritance to support two auth methods for PyGithub, and envvars for the secret tokens.

Testing a Source

There is no dedicated way to test a Source: write your own tests with pytest.

This is because Sources usually wrap API calls of some sort, which are hard to test: the API calls need to be mocked, and mocked calls lack realism.
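One way to keep such tests manageable is to separate the discovery logic from the API client, so the client can be swapped for a stub. The sketch below follows that approach with stand-in names: SourcedRepo is a simplified dataclass, and FakeSearchClient, discover() and the query string are all hypothetical. It shows the pattern only, not mass-driver's own test setup.

```python
from dataclasses import dataclass


@dataclass
class SourcedRepo:
    """Stand-in for mass_driver.models.repository.SourcedRepo (fields assumed)."""

    clone_url: str
    repo_id: str


class FakeSearchClient:
    """Hypothetical stub standing in for a real API client."""

    def search(self, query):
        # Canned response: (short name, clone URL) pairs
        return [("alpha", "git@github.com:example/alpha.git")]


def discover(client, query):
    """Discovery logic, taking the client as a parameter so it can be stubbed."""
    return {
        name: SourcedRepo(clone_url=url, repo_id=name)
        for name, url in client.search(query)
    }


def test_discover_indexes_by_repo_id():
    # pytest collects this function by its test_ prefix
    repos = discover(FakeSearchClient(), "topic:python")
    assert list(repos) == ["alpha"]
    assert repos["alpha"].clone_url == "git@github.com:example/alpha.git"
```

The stub only checks how results are turned into SourcedRepo objects; it says nothing about how the real API behaves, which is the lack of realism noted above.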

Warning

Unlike other mass-driver plugins, the Source subsystem bubbles exceptions up to the root, aborting any ongoing activity if left unchecked. This is because repo discovery is a critical part of a mass-driver run: with no proper list of repos, there is nothing to patch! Compare this to other plugins, where a single PatchDriver erroring on a single repo does not compromise the migrations on other repos.