Should actions be stored in a separate repo or nested in another - github-actions

What are the best practices around creating a GitHub Action?
There seem to be roughly three approaches:
One repo = one Action
From the examples I have seen, I clearly derive that 1 action = 1 repo.
action-repo
  action.yml
  ...
With usage:
uses: org/action-repo@tag
"Normal" repo with nested Action
Some tend to just add the action to their repo like so:
repo
  github-action
    action.yml
    ...
Probably also with more than one action. This already makes for longer references, like:
uses: org/repo/github-action@tag
"Normal" repo with nested/hidden action
This is the most special case I have seen:
repo
  .github
    actions
      action1
        action.yml
        ...
      action2
        action.yml
        ...
This setup leads to some rather unwieldy references when using the actions, like:
uses: org/repo/.github/actions/action1@tag
Has anyone seen official docs around this?

Two weeks into GHA and after seeing dozens of repositories, I dare to self-answer my question.
As the initial comments suggested, the approach you choose depends mainly on your use case.
Let me also re-use part of that comment to finalize my summary (feel free to comment and I will take the argument into my list).
Approaches
org/action-repo
above called "1 action = 1 repo"
➕  Certainly the most commonly seen standard; many "best of breed" actions (e.g. checkout, setup-xyz, etc.) use this approach
➕  Transparent versioning
➕  Straightforward uses: references (no nested paths)
➕  Follows the UNIX philosophy of doing one thing well
➕  Easy to document; issues and pull requests concern your action only
➖  Additional repo
This is a suitable approach when you expect your action to be adopted by a large community, or you just feel like "the public needs that".
org/repo/action
above called nested action
➕➖  The above comparison the other way around.
This approach is mainly suitable to maintain some smaller actions that you just casually want to offer from within your project. We use it in our case now to accumulate some mono-repo-specific actions in one place.
org/repo/.github/action(s)
This approach is admittedly the most special one and a derivative of the second. Generally, there is no real use case for it; conjuring one up, it could serve e.g. in a mono-repo to move all actions into the .github folder and collect them in one place. On the other hand, you can do that just as well in org/repo/action(s).
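For illustration, the three approaches look like this from a consuming workflow (org, repo, action and tag names below are placeholders); note that within the owning repository itself, a nested action can also be referenced by a local path after a checkout:

steps:
  # 1 action = 1 repo
  - uses: org/action-repo@v1
  # nested action, consumed from another repository
  - uses: org/repo/github-action@v1
  # nested/hidden action below .github, consumed from another repository
  - uses: org/repo/.github/actions/action1@v1
  # nested action referenced from within its own repository (no ref; requires a prior actions/checkout)
  - uses: ./.github/actions/action1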
Feel free to complete my list by commenting.

You can always refer to the documentation:
If you're developing an action for other people to use, we recommend keeping the action in its own repository instead of bundling it with other application code. This allows you to version, track, and release the action just like any other software.
Storing an action in its own repository makes it easier for the GitHub community to discover the action, narrows the scope of the code base for developers fixing issues and extending the action, and decouples the action's versioning from the versioning of other application code.
If you're building an action that you don't plan to make available to the public, you can store the action's files in any location in your repository. If you plan to combine action, workflow, and application code in a single repository, we recommend storing actions in the .github directory. For example, .github/actions/action-a and .github/actions/action-b.

Related

Return passing status on GitHub workflow when using paths-ignore

I'm using a Github workflow to run tests. Because the setup can take a while, we want to skip running the tests when no code was changed. So we are using paths-ignore like this:
on:
  pull_request:
    branches:
      - develop
    paths-ignore:
      - '*.md'
The problem is that we have a protected branch here that requires a check to pass before a branch can be merged. There seem to be some workarounds (https://github.community/t/feature-request-conditional-required-checks/16761/20) but they are pretty clunky. Is there an elegant and idiomatic way to return a passing status here for a job that was essentially skipped?
Elegant and idiomatic, evidently not. The conclusion elsewhere (GitHub community forum, Reddit) is that this is expected behavior, at least right now.
The two main workarounds people seem to be using are:
Run all required status checks on all PRs, even the slow ones. Sigh.
Use paths-filter or a homegrown alternative (example) inside the required workflows, as part of their execution, and skip the actual work, returning success, if no relevant files were changed.
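A minimal sketch of that second workaround, assuming the dorny/paths-filter action (the filter name and paths are placeholders):

jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # determine whether any relevant files changed in this PR
      - uses: dorny/paths-filter@v3
        id: changes
        with:
          filters: |
            code:
              - 'src/**'
      # run the expensive tests only when code actually changed; otherwise the
      # job still finishes successfully and satisfies the required check
      - name: Run tests
        if: steps.changes.outputs.code == 'true'
        run: ./run-tests.sh

Because the job itself always runs, the required status check always gets a result.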

Should I use a Marketplace action instead of a plain bash `cp` command to copy files?

I am noticing there are many actions in the GitHub marketplace that do the same thing. Here is an example:
https://github.com/marketplace/actions/copy-file
Is there any benefit to using a GitHub Marketplace action instead of plain bash commands? Is there a recommended-practices guideline that helps decide whether to use Marketplace actions versus plain bash or the command line?
These actions don't seem to have any real value in my eyes...
Other than the fact that these run in Docker, don't need cp, wget or curl to be available on the host, and ensure a consistent version of their tools is used, there isn't much benefit. If you're lucky, these actions also run consistently the same way on Windows, Linux and Mac, whereas your bash scripts may not run on Windows. But the action author would have to ensure this; it's not something that comes by default.
One thing that could be a reason to use these actions from the marketplace is that they can run as a post-step, which the run: script/bash/pwsh steps can't.
They aren't more stable or safer: unless you pin the action to a commit hash or fork it, the owner of the action can change its behavior at any time. So you are putting trust in the original author.
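A minimal sketch of what pinning looks like, next to the plain shell equivalent (the action name, inputs and SHA below are placeholders, not a real action):

steps:
  # pinned to a full commit SHA so the referenced code cannot change underneath you
  - uses: some-owner/copy-file-action@1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b
    with:
      source: ./config/template.yml
      target: ./build/config.yml
  # the same work as a plain shell step, with no third-party code involved
  - run: cp ./config/template.yml ./build/config.yml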
Many actions provide convenience functions, like better logging, output variables, or the ability to safely pass in a credential, but here that seems to be more of an exercise in building an action by the author, and these don't really serve a great purpose.
The documentation that comes with each of these actions doesn't provide a clear reason to use them, and the actions don't follow the preferred versioning scheme... I'd not use these.
So, when would you use an action from the marketplace? In general, actions, like certain CLIs, serve a specific purpose, and an action should contain all the things it needs to run.
An action could contain a complex set of steps, ensure proper handling of arguments, issue special logging commands to make the output more human-readable or update the environment for tasks running further down in the workflow.
An action that adds this extra functionality on top of existing CLIs makes it easier to pass data from one action to another, or even from one job to another.
An action is also easier to re-use across repositories, so if you're using the same scripts in multiple repos, you could wrap them in an action and easily reference them from that one place instead of duplicating the script in each action workflow or adding the script to each repository.
GitHub provides little guidance on when to use an action, or on whether an author should publish an action to the marketplace. Basically, anyone can publish anything that fulfills the minimum metadata requirements for the marketplace.
GitHub does provide guidance on versioning for authors: good actions should create tags that a user can pin to, and authors should practice semantic versioning to avoid accidentally breaking their users. Actions that specify a branch like main or master in their docs are suspect in my eyes, and I wouldn't use them; their implementation could change from under you at any time.
As a consumer of any action, you should be aware of the security implications of using it. Other than requiring that the author has 2FA enabled on their account, GitHub does little to no verification of any actions they don't own themselves. Any author could in theory replace their implementation with ransomware or a bitcoin miner. So, for actions whose author you haven't built a trust relationship with, it's recommended to fork the action to your own account or organization and to inspect the contents before running them on your runner, especially if that's a private runner with access to protected environments. My colleague Rob Bos has researched this topic deeply and has spoken about it frequently at conferences, on podcasts and in live streams.

What are the DRY options for GitHub Action .yml workflows?

I have many workflow.yaml files whose code and logic are quite similar.
This is a big DRY (Don't Repeat Yourself) violation.
Ideally, I would create an "on: workflow_dispatch:" workflow with a series of inputs, and then call that workflow from other workflows.
If the above idea is not easily possible, what are the DRY options with GitHub workflows?
You can build your own actions to split off common logic and use it from your workflows. They have inputs and outputs to feed them data and get results out. The following types are available:
JavaScript action
Docker container action
composite run steps action
Further in-depth description: https://docs.github.com/en/free-pro-team@latest/actions/creating-actions/about-actions
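For example, a composite action lets you move shared steps into one place and call them from several workflows (a sketch; the path, inputs and tool versions are placeholders):

# .github/actions/setup/action.yml
name: Shared setup
description: Common steps reused by several workflows
inputs:
  node-version:
    description: Node.js version to install
    default: '18'
runs:
  using: composite
  steps:
    - uses: actions/setup-node@v4
      with:
        node-version: ${{ inputs.node-version }}
    - run: npm ci
      shell: bash   # run steps inside composite actions must declare a shell

A workflow then consumes it with a local reference after checkout, e.g. uses: ./.github/actions/setup, optionally passing node-version as an input.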

mercurial - several projects and repositories

(disclaimer: I am completely new to mercurial and version control)
So I have a folder structure
Programs
  CPPLib1
  CPPProject11
  CPPProject12
  CPPLib2
  CPPProject21
  CPPProject22
Each group of three is completely independent of the other group, but within each group the code is related and I'd like to manage it under version control as a whole (commit/extract everything in one transaction). As I understand from googling, I must have a repository for each group in their common parent (Programs), but I cannot have two different repositories there, right? Does that mean I must have this structure instead:
Programs
  Group1
    CPPLib1
    CPPProject11
    CPPProject12
  Group2
    CPPLib2
    CPPProject21
    CPPProject22
A related question: this site http://help.fogcreek.com/8169/using-more-than-one-repository says
"Since Mercurial and Git are Distributed Version Control Systems (DVCSs), you should use at least one separate repository per project, including shared projects and libraries."
So what does this advice mean? I can't have a separate repository for each of
  CPPLib1
  CPPProject11
  CPPProject12
and manage them as a whole. I am confused.
For each of your project groups you'll need to create one repository in a separate directory. How you structure things beneath that is up for debate and depends a bit on your preferences.
You say that you want everything in a project group managed within a single repository. That means you can simply create a directory structure as you described, with the sub-projects residing in different directories within this repository.
Within each group, you can take it further and make each of these directories (library, programme 1, programme 2, ...) a separate repository, each of which in turn becomes a sub-repository of the main repository, as described in the link given by Lasse Karlsen (Subrepository).
You could also handle it differently, if you allow a more flexible layout and let go of checking out one group in its entirety: for instance, you could declare the library a sub-repository of each of the programmes which use it. That has the advantage that each programme directly defines which library version it depends on.
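A minimal sketch of that last variant, with the library as a sub-repository of one programme (the path and URL are placeholders for your own setup):

# CPPProject11/.hgsub -- maps a working-directory path to the subrepo's source
lib/CPPLib1 = https://hg.example.com/CPPLib1

# then, inside CPPProject11:
#   hg add .hgsub
#   hg commit -m "track CPPLib1 as a subrepository"
# Mercurial records the exact subrepo revision in .hgsubstate on every commit.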
Further, before jumping to sub-repositories, you might want to look at the alternative implementation of guest repositories as well. They handle the dependency less strictly, so a failure to find the sub-repository becomes less fatal: https://bitbucket.org/selinc/guestrepo

What is "vendoring"?

What is "vendoring" exactly? How would you define this term?
Does it mean the same thing in different programming languages? Conceptually speaking, not looking at the exact implementation.
Based on this answer
Defined here for Go as:
Vendoring is the act of making your own copy of the 3rd party packages
your project is using. Those copies are traditionally placed inside
each project and then saved in the project repository.
The context of this answer is in the Go language, but the concept still applies.
If your app depends on certain third-party code to be available you could declare a dependency and let your build system install the dependency for you.
If however the source of the third-party code is not very stable you could "vendor" that code. You take the third-party code and add it to your application in a more or less isolated way. If you take this isolation seriously you should "release" this code internally to your organization/working environment.
Another reason for vendoring is if you want to use certain third-party code but you want to change it a little bit (a fork in other words). You can copy the code, change it, release it internally and then let your build system install this piece of code.
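As a concrete sketch in the Go context this answer refers to (assuming module mode, Go 1.14 or later):

# copy every dependency listed in go.mod into ./vendor
go mod vendor
# commit the copies alongside your own code
git add vendor go.mod go.sum
git commit -m "vendor third-party dependencies"
# builds can then be forced to use only the vendored copies
go build -mod=vendor ./...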
Vendoring means putting a dependency into your project folder (vs. depending on it globally) AND committing it to the repo.
For example, running cp /usr/local/bin/node ~/yourproject/vendor/node and committing it to the repo would "vendor" the Node.js binary – all devs on the project would use this exact version. This is not commonly done for node itself, but e.g. Yarn 2 ("Berry") is used like this (and only like this; they don't even install the binary globally).
The committing act is important. As an example, node_modules are already installed in your project but only committing them makes them "vendored". Almost nobody does that for node_modules but e.g. PnP + Zero Installs of Yarn 2 are actually built around vendoring – you commit .yarn/cache with many ZIP files into the repo.
"Vendoring" inherently brings tradeoffs between repo size (longer clone times, more data transferred, local storage requirements etc.) and reliability / reproducibility of installs.
Summarizing other, (too?) long answers:
Vendoring is hard-coding the often forked version of a dependency.
This typically involves static linking or some other copy but it doesn't have to.
Right or wrong, the term "hard-coding" has an old and bad reputation, so you won't find it near projects that openly vendor; however, I can't think of a more accurate term.
As far as I know the term comes from Ruby on Rails.
It describes a convention to keep a snapshot of the full set of dependencies in source control, in directories that contain package name and version number.
The earliest occurrence of vendor as a verb I found is the vendor everything post on err the blog (2007, a bit before the author co-founded GitHub). That post explains the motivation and how to add dependencies. As far as I understand the code and commands, there was no special tool support for calling the directory vendor at that time (patches and code snippets were floating around).
The err blog post links to earlier ones with the same convention, like this fairly minimal way to add vendor subdirectories to the Rails import path (2006).
Earlier articles referenced from the err blog, like this one (2005), seemed to use the lib directory, which didn't make the distinction between own code and untouched snapshots of dependencies.
The goal of vendoring is more reproducibility, better deployment, the kind of things people currently use containers for; as well as better transparency through source control.
Other languages seem to have picked up the concept as is; one related concept is lockfiles, which define the same set of dependencies in a more compact form, involving hashes and remote package repositories. Lockfiles can be used to recreate the vendor directory and detect any alterations. The lockfile concept may have come from the Ruby gems community, but don't quote me on that.
The solution we’ve come up with is to throw every Ruby dependency in vendor. Everything. Savvy? Everyone is always on the same page: we don’t have to worry about who has what version of which gem. (we know) We don’t have to worry about getting everyone to update a gem. (we just do it once) We don’t have to worry about breaking the build with our libraries. […]
The goal here is simple: always get everyone, especially your production environment, on the same page. You don’t want to guess at which gems everyone does and does not have. Right.
There’s another point lurking subtly in the background: once all your gems are under version control, you can (probably) get your app up and running at any point of its existence without fuss. You can also see, quite easily, which versions of what gems you were using when. A real history.