Sources
Discovering repos using a Source
Sometimes, the repos we want to apply patches to are dynamic, coming from tooling such as a GitHub repository search, a compliance tool report, a service catalogue, etc.
To address this, mass-driver can use a Source plugin to discover repositories to apply activities to.
# An Activity file with a repo-list Source
[mass-driver.source]
source_name = "repo-list"
# Source 'repo-list' takes a 'repos' list of cloneable URLs:
[mass-driver.source.source_config]
repos = [
"git@github.com:OverkillGuy/mass-driver.git",
"git@github.com:OverkillGuy/mass-driver-plugins.git",
]
Because we included a Source, we need neither the --repo-path nor the --repo-filelist CLI flag, and can instead rely on the activity’s config to discover the repos.
mass-driver run activity.toml
Smarter Sources can use more elaborate parameters, maybe even secret parameters like API tokens.
Note that to pass secrets safely at runtime, any parameter given in the file’s source_config can instead be passed as an envvar, using the prefix SOURCE_. So we could have omitted the repos entry from the file by providing a SOURCE_REPOS envvar instead. This works because the Source class derives from Pydantic.BaseSettings.
As a Source developer, though, you should really look into using Pydantic.SecretStr to avoid leaking the secret when the config or result is stored. See the Pydantic docs on Secret fields.
Creating a Source
Sources are mass-driver plugins that map to mass_driver.models.repository.Source. Let’s create one.
First, we import relevant bits:
from mass_driver.models.repository import IndexedRepos, SourcedRepo, RepoUrl, Source
Recall that:
IndexedRepos = dict[RepoID, SourcedRepo]
So we now write up a new class:
class RepolistSource(Source):
    """A Source that just returns a pre-configured list of repositories"""

    repos: list[RepoUrl]
    """The configured list of repositories to use, as a list of cloneable URLs"""

    def discover(self) -> IndexedRepos:
        """Discover a list of repositories"""
        return {url: SourcedRepo(clone_url=url, repo_id=url) for url in self.repos}
This class takes a repos parameter and, when discover() is called, generates SourcedRepo objects as a dictionary indexed by RepoID (basically a string).
The only constraint on RepoID (a type alias of str) is that each key is unique, so in this case we use the git clone URL, which uniquely identifies each repository. Smarter Sources can use something shorter, as appropriate.
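For instance, a Source whose repos all live on GitHub could derive a short owner/name RepoID from each clone URL, and refuse duplicates rather than silently overwriting dictionary entries. This is a minimal sketch: SourcedRepo below is a simplified stand-in dataclass (not mass-driver’s Pydantic model), and repo_id_from_url and discover_repos are hypothetical helpers.

```python
from dataclasses import dataclass, field
from typing import Any

RepoID = str  # Mirrors the alias: any string, as long as it is unique


@dataclass
class SourcedRepo:
    """Simplified stand-in for mass-driver's SourcedRepo (illustration only)."""

    clone_url: str
    repo_id: RepoID
    patch_data: dict[str, Any] = field(default_factory=dict)


def repo_id_from_url(url: str) -> RepoID:
    """Shorten a GitHub clone URL to 'owner/name'."""
    if url.startswith("git@"):
        # SSH form: git@github.com:owner/name.git
        path = url.split(":", 1)[1]
    else:
        # HTTPS form: https://github.com/owner/name.git
        path = url.split("://", 1)[1].split("/", 1)[1]
    return path.rstrip("/").removesuffix(".git")


def discover_repos(urls: list[str]) -> dict[RepoID, SourcedRepo]:
    """Index repos by short ID, failing loudly if two URLs map to the same ID."""
    indexed: dict[RepoID, SourcedRepo] = {}
    for url in urls:
        repo_id = repo_id_from_url(url)
        if repo_id in indexed:
            raise ValueError(f"Duplicate RepoID {repo_id!r} for {url}")
        indexed[repo_id] = SourcedRepo(clone_url=url, repo_id=repo_id)
    return indexed


repos = discover_repos([
    "git@github.com:OverkillGuy/mass-driver.git",
    "https://github.com/OverkillGuy/mass-driver-plugins.git",
])
print(sorted(repos))  # ['OverkillGuy/mass-driver', 'OverkillGuy/mass-driver-plugins']
```

Raising on collisions is a design choice: with a plain dictionary comprehension, a duplicate ID would silently keep only the last repo.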
Don’t use str for secret fields!
When passing sensitive config like API tokens, you should ensure that dumping the mass-driver config doesn’t disclose any secret value. Pydantic provides the pydantic.SecretStr type for exactly this; see the Pydantic docs on Secret fields. These field types never print their content when represented as a string, requiring a call to my_secret_field.get_secret_value() to actually disclose the secret.
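To see why this matters, here is a toy stdlib class that imitates the masking behaviour described above. It is not Pydantic’s implementation, and a real Source should use pydantic.SecretStr itself; the names MaskedSecret, SearchConfig and api_token are hypothetical.

```python
from dataclasses import dataclass


class MaskedSecret:
    """Toy imitation of pydantic.SecretStr masking (illustration only)."""

    def __init__(self, value: str) -> None:
        self._value = value

    def get_secret_value(self) -> str:
        # The only way to read the real value is this explicit call
        return self._value

    def __str__(self) -> str:
        return "**********" if self._value else ""

    def __repr__(self) -> str:
        return f"MaskedSecret('{self}')"


@dataclass
class SearchConfig:
    """Hypothetical Source config mixing public and secret fields."""

    query: str
    api_token: MaskedSecret


config = SearchConfig(query="org:example", api_token=MaskedSecret("not-a-real-token"))
print(config)  # SearchConfig(query='org:example', api_token=MaskedSecret('**********'))
print(config.api_token.get_secret_value())  # not-a-real-token
```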
Note the patch_data field of SourcedRepo, unused in this sample Source: it is an arbitrary dictionary under the Source’s control, perfect for providing per-repo data extracted from the source that is relevant to the migration. For instance, the name of the file to fix, as flagged by some reporting tool.
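As an illustration, a Source fed by a compliance report could stash the flagged file paths in patch_data, so that a later PatchDriver knows which files to fix. This sketch assumes a hypothetical CSV report with repo_url and file_path columns and a hypothetical files_to_fix key, and uses a simplified SourcedRepo stand-in dataclass rather than mass-driver’s model.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SourcedRepo:
    """Simplified stand-in for mass-driver's SourcedRepo (illustration only)."""

    clone_url: str
    repo_id: str
    patch_data: dict[str, Any] = field(default_factory=dict)


def repos_from_report(report_csv: str) -> dict[str, SourcedRepo]:
    """Build indexed repos from a (hypothetical) compliance report in CSV form."""
    indexed: dict[str, SourcedRepo] = {}
    for row in csv.DictReader(io.StringIO(report_csv)):
        url = row["repo_url"]
        if url not in indexed:
            indexed[url] = SourcedRepo(clone_url=url, repo_id=url)
        # Accumulate every flagged file for this repo, for the PatchDriver to read later
        indexed[url].patch_data.setdefault("files_to_fix", []).append(row["file_path"])
    return indexed


REPORT = """repo_url,file_path
git@github.com:OverkillGuy/mass-driver.git,Dockerfile
git@github.com:OverkillGuy/mass-driver.git,setup.cfg
git@github.com:OverkillGuy/mass-driver-plugins.git,Dockerfile
"""
repos = repos_from_report(REPORT)
print(repos["git@github.com:OverkillGuy/mass-driver.git"].patch_data)
# {'files_to_fix': ['Dockerfile', 'setup.cfg']}
```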
To package this plugin so that mass-driver can run it, we declare it in the pyproject.toml plugins:
[tool.poetry.plugins.'massdriver.sources']
repo-list = 'mass_driver.sources.simple:RepolistSource'
Check that it is available via mass-driver sources:
INFO:root:Available sources:
INFO:root:csv-filelist
INFO:root:github-app-search
INFO:root:github-search
INFO:root:repo-filelist
INFO:root:repo-list
INFO:root:template-filelist
Note
For a more elaborate Source, take a look at the pyGithub-enabled GithubPersonalSource, which uses inheritance to enable two pyGithub auth methods, and envvars for secret tokens.
Testing a Source
There is no dedicated way to test a Source, other than writing your own tests with pytest.
This is because Sources usually wrap API calls of some sort, which are hard to test: they require mocking those calls, and the resulting tests lack realism.
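One common workaround is to inject the API client into the Source so that tests can swap in a fake. The sketch below uses unittest.mock with a hypothetical SearchSource whose client exposes a search(query) method returning clone URLs; neither the class nor that client API comes from mass-driver. To stay self-contained it returns plain dictionaries instead of SourcedRepo, whereas a real plugin would subclass Source and build its client from its config.

```python
from unittest.mock import Mock


class SearchSource:
    """Hypothetical Source that asks an API client for matching repos."""

    def __init__(self, client, query: str) -> None:
        self.client = client  # Injected, so tests can pass a fake
        self.query = query

    def discover(self) -> dict[str, dict]:
        urls = self.client.search(self.query)
        return {url: {"clone_url": url} for url in urls}


def test_discover_indexes_search_results():
    fake_client = Mock()
    fake_client.search.return_value = ["git@github.com:OverkillGuy/mass-driver.git"]
    source = SearchSource(client=fake_client, query="mass-driver")

    repos = source.discover()

    # The Source passed its query through, and indexed what the API returned
    fake_client.search.assert_called_once_with("mass-driver")
    assert list(repos) == ["git@github.com:OverkillGuy/mass-driver.git"]
```

Run it with pytest as usual: the mock keeps the test fast and offline, at the cost of the realism concern mentioned above.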
Warning
The Source subsystem, unlike other mass-driver plugins, bubbles exceptions up to the root, aborting any ongoing activity if left unchecked. This is because repo discovery is a critical part of mass-driver runs: without a proper list of repos, there is nothing to patch! Compare this with other plugins, where a single PatchDriver erroring on one repo does not compromise the migrations of other repos.
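Since any exception escaping discovery aborts the run, a Source talking to a flaky API may want to retry transient failures itself, letting the error bubble up only once retries are exhausted. A minimal sketch, assuming a hypothetical fetch callable and treating ConnectionError as the only transient case; with_retries is not part of mass-driver.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    fetch: Callable[[], T],
    attempts: int = 3,
    delay_s: float = 1.0,
    transient: tuple[type[BaseException], ...] = (ConnectionError,),
) -> T:
    """Call fetch(), retrying transient errors; re-raise the last one when out of attempts."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except transient:
            if attempt == attempts:
                raise  # Bubble up: discovery cannot continue without a repo list
            time.sleep(delay_s * attempt)  # Simple linear backoff
    raise AssertionError("unreachable")  # The loop always returns or raises


calls = {"n": 0}


def flaky_fetch() -> list[str]:
    """Hypothetical API call that fails once, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 2:
        raise ConnectionError("temporary outage")
    return ["git@github.com:OverkillGuy/mass-driver.git"]


print(with_retries(flaky_fetch, delay_s=0))  # ['git@github.com:OverkillGuy/mass-driver.git']
```

Only retry errors you judge transient: bad credentials or malformed config should still fail fast.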