Sources
Discovering repos using a Source
Sometimes the set of repos we want to patch is dynamic, coming from tooling such as a GitHub repository search, a compliance tool report, or a service catalogue.
To address this, mass-driver can use a Source plugin to discover repositories to apply activities to.
# An Activity file with a repo-list Source
[mass-driver.source]
source_name = "repo-list"
# Source 'repo-list' takes a 'repos' list of cloneable URLs:
[mass-driver.source.source_config]
repos = [
    "git@github.com:OverkillGuy/mass-driver.git",
    "git@github.com:OverkillGuy/mass-driver-plugins.git",
]
Because the activity includes a Source, we can omit the --repo-path or
--repo-filelist CLI flags and instead rely on the activity’s config to
discover the repos:
mass-driver run activity.toml
Smarter Sources can take more elaborate parameters, including secrets such as API tokens.
To pass secrets safely at runtime, any parameter that would go under
source_config in the file can instead be given as an environment variable
with the SOURCE_ prefix. For instance, we could have omitted the repos entry
from the file and provided a SOURCE_REPOS environment variable instead. This
works because the Source class derives from Pydantic's BaseSettings.
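To illustrate the SOURCE_ mapping, here is a simplified, standard-library mimic of what BaseSettings does with prefixed variables. The function source_config_from_env is hypothetical, not part of mass-driver or Pydantic, and it assumes (as Pydantic does for list and dict fields) that complex values are written as JSON.

```python
import json
import os
from typing import Any, Mapping, Optional


def source_config_from_env(
    prefix: str = "SOURCE_", environ: Optional[Mapping[str, str]] = None
) -> dict[str, Any]:
    """Hypothetical mimic: turn SOURCE_* variables into source_config fields."""
    environ = os.environ if environ is None else environ
    config: dict[str, Any] = {}
    for name, raw in environ.items():
        # Match the prefix case-insensitively, like Pydantic's default
        if not name.upper().startswith(prefix.upper()):
            continue
        field = name[len(prefix):].lower()  # SOURCE_REPOS -> "repos"
        # Simplification: decode JSON-looking values (lists, dicts), keep the rest as text
        if raw.lstrip().startswith(("[", "{")):
            config[field] = json.loads(raw)
        else:
            config[field] = raw
    return config


env = {
    "SOURCE_REPOS": '["git@github.com:OverkillGuy/mass-driver.git"]',
    "HOME": "/home/user",
}
print(source_config_from_env(environ=env))
# {'repos': ['git@github.com:OverkillGuy/mass-driver.git']}
```

On the command line, and assuming the list is read as JSON as sketched here, this would look like SOURCE_REPOS='["git@github.com:OverkillGuy/mass-driver.git"]' mass-driver run activity.toml.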
As a Source developer, you should use Pydantic's SecretStr type for such
parameters, to avoid leaking the secret when the config or results are
stored. See the Pydantic docs on Secret
fields.
Creating a Source
Sources are mass-driver plugins that subclass
mass_driver.models.repository.Source. Let’s create one.
First, we import the relevant types:
from mass_driver.models.repository import IndexedRepos, SourcedRepo, RepoUrl, Source
Recall that:
IndexedRepos = dict[RepoID, SourcedRepo]
Now we write the new class:
class RepolistSource(Source):
    """A Source that just returns a pre-configured list of repositories"""
    repos: list[RepoUrl]
    """The configured list of repositories to use, as a list of cloneable URLs"""
    def discover(self) -> IndexedRepos:
        """Discover a list of repositories"""
        return {url: SourcedRepo(clone_url=url, repo_id=url) for url in self.repos}
This class takes a repos parameter and, when discover() is called, generates
SourcedRepo objects as a dictionary indexed by RepoID.
RepoID is an alias of str, and its only constraint is that each key is
unique. Here we use the git clone URL, which uniquely identifies each
repository. Smarter Sources can use something shorter, where adequate.
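As an illustration of a shorter ID, here is a sketch that derives "owner/name" from common clone URL shapes and refuses IDs that would clash. The helpers short_repo_id and index_by_short_id are hypothetical; a real Source's discover() could then return {repo_id: SourcedRepo(clone_url=url, repo_id=repo_id) for repo_id, url in ...}, mirroring RepolistSource above.

```python
import re

# Hypothetical pattern: "git@host:owner/name(.git)" or "https://host/owner/name(.git)"
_OWNER_NAME = re.compile(
    r"^(?:git@[^:]+:|https?://[^/]+/)(?P<path>[^/]+/[^/]+?)(?:\.git)?/?$"
)


def short_repo_id(clone_url: str) -> str:
    """Shorten a clone URL to 'owner/name', falling back to the full URL."""
    match = _OWNER_NAME.match(clone_url)
    return match.group("path") if match else clone_url


def index_by_short_id(clone_urls: list[str]) -> dict[str, str]:
    """Map short repo IDs to clone URLs, refusing ambiguous IDs."""
    indexed: dict[str, str] = {}
    for url in clone_urls:
        repo_id = short_repo_id(url)
        # Two different URLs (e.g. same owner/name on two hosts) would clash
        if repo_id in indexed and indexed[repo_id] != url:
            raise ValueError(f"Repo ID {repo_id!r} is not unique: {indexed[repo_id]} vs {url}")
        indexed[repo_id] = url
    return indexed


print(index_by_short_id([
    "git@github.com:OverkillGuy/mass-driver.git",
    "https://github.com/OverkillGuy/mass-driver-plugins",
]))
```

Raising on a clash keeps the uniqueness guarantee explicit, instead of silently dropping a repo when two URLs shorten to the same key.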
Don’t use str for secret fields!
When passing sensitive config such as API tokens, make sure that dumping the
mass-driver config doesn’t disclose any secret value. Pydantic provides the
pydantic.SecretStr type for this; see the Pydantic
docs on Secret
fields. These field
types never print their content when represented as a string: you must call
my_secret_field.get_secret_value() to actually access the secret.
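A short sketch of that behaviour with Pydantic's SecretStr, which behaves this way in both Pydantic v1 and v2. The TokenConfig model and its fields are hypothetical stand-ins for a real Source's parameters; a real Source would declare the same field type on its Source subclass.

```python
from pydantic import BaseModel, SecretStr


class TokenConfig(BaseModel):
    """Hypothetical config, standing in for a Source's parameters."""

    org: str
    api_token: SecretStr  # masked whenever the config is printed


config = TokenConfig(org="example-org", api_token="not-a-real-token")

print(config.api_token)  # **********
print(config)            # the token appears masked, not in clear text

# Only an explicit call reveals the value, e.g. to build an API client
token = config.api_token.get_secret_value()
```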
Note that the patch_data field of SourcedRepo,
unused in this sample Source, is an arbitrary dictionary under the
Source’s control. It is a good place for
per-repo data extracted from the source that is relevant to the migration,
for instance the name of the file to fix, as reported by some reporting tool.
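For example, a Source reading a compliance report could group the flagged files per repository. In this sketch the report format, its column names, and the files_to_fix key are all hypothetical; a real Source would wrap each entry as SourcedRepo(clone_url=url, repo_id=url, patch_data=data), assuming patch_data is passed like the other SourcedRepo fields.

```python
import csv
import io
from collections import defaultdict

# Made-up report: one row per (repository, offending file)
REPORT_CSV = """repo,file
git@github.com:example-org/service-a.git,Dockerfile
git@github.com:example-org/service-a.git,ci/build.sh
git@github.com:example-org/service-b.git,Dockerfile
"""


def patch_data_from_report(report_text: str) -> dict[str, dict[str, list[str]]]:
    """Group reported files per repo, as candidate patch_data dictionaries."""
    files_per_repo: dict[str, list[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(report_text)):
        files_per_repo[row["repo"]].append(row["file"])
    return {url: {"files_to_fix": files} for url, files in files_per_repo.items()}


for url, data in patch_data_from_report(REPORT_CSV).items():
    print(url, data)
```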
To make this plugin available to mass-driver, we register it in the
massdriver.sources plugins group of pyproject.toml:
[tool.poetry.plugins.'massdriver.sources']
repo-list = 'mass_driver.sources.simple:RepolistSource'
Confirm it is available by running mass-driver sources:
INFO:root:Available sources:
INFO:root:csv-filelist
INFO:root:github-app-search
INFO:root:github-search
INFO:root:repo-filelist
INFO:root:repo-list
INFO:root:template-filelist
Note
For a more elaborate Source, take a look at the PyGithub-based GithubPersonalSource, which uses inheritance to support two PyGithub authentication methods and reads secret tokens from environment variables.
Testing a Source
mass-driver provides no dedicated tooling for testing Sources: use pytest on
your own.
Sources usually wrap API calls of some sort, which are hard to test: they require mocking the API, and such mocks offer limited realism.
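One way to keep a Source testable is to isolate the API call behind something a test can replace. The sketch below uses a hypothetical plain-Python stand-in (SearchSourceSketch is not a mass-driver class) whose search function is injected, so a pytest test can pass a fake. With a real Pydantic-based Source, unittest.mock.patch on the API client achieves a similar effect.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SearchSourceSketch:
    """Hypothetical Source-like class whose API call is injected."""

    query: str
    search: Callable[[str], list[str]]  # returns clone URLs matching a query

    def discover(self) -> dict[str, str]:
        # Index by clone URL, like RepolistSource does
        return {url: url for url in self.search(self.query)}


def test_discover_indexes_search_results():
    seen_queries = []

    def fake_search(query: str) -> list[str]:
        seen_queries.append(query)  # record what the Source asked for
        return [
            "git@github.com:example-org/service-a.git",
            "git@github.com:example-org/service-b.git",
        ]

    source = SearchSourceSketch(query="org:example-org", search=fake_search)
    repos = source.discover()

    assert seen_queries == ["org:example-org"]
    assert set(repos) == {
        "git@github.com:example-org/service-a.git",
        "git@github.com:example-org/service-b.git",
    }
```

The realism caveat above still applies: the fake only checks the Source's own indexing logic, not that the real API answers the way the fake does.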
Warning
The Source subsystem, unlike other mass-driver plugins, bubbles exceptions up to the root, aborting any ongoing activity if left unchecked. This is because repo discovery is a critical part of a mass-driver run: without a proper list of repos, there is nothing to patch! Compare this with other plugins, where a single PatchDriver erroring on one repo does not compromise migrations on other repos.
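Since any exception from discovery aborts the whole run, a Source talking to a flaky API may want to absorb transient network errors while still letting persistent failures propagate. Here is a minimal sketch of such a retry wrapper; call_with_retries is a hypothetical helper, not part of mass-driver, and the choice of ConnectionError and TimeoutError as "transient" is an assumption to adapt to your API client.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    attempts: int = 3,
    delay_s: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Call fn, retrying transient errors; re-raise once attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # persistent failure: let it reach mass-driver and abort the run
            time.sleep(delay_s * attempt)  # wait a little longer after each failure
    raise AssertionError("unreachable")
```

Inside discover(), usage might look like urls = call_with_retries(lambda: client.search(self.query)), where client is a hypothetical API client. Errors outside retry_on propagate immediately, so genuine misconfiguration still stops the run early.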