What are the DRY options for GitHub Action .yml workflows? - github-actions

I have many workflow.yaml files whose code and logic are quite similar.
This is a big DRY (Don't Repeat Yourself) violation.
Ideally, I would create an on: workflow_dispatch: workflow with a series of inputs and then call that workflow from my other workflows.
If the above idea is not easily possible, what are the DRY options with GitHub workflows?

You can build your own actions to split off common logic and use them from your workflows. They have inputs and outputs to feed them data and get results back. The following types are available:
JavaScript action
Docker container action
composite run steps action
Further in-depth description: https://docs.github.com/en/free-pro-team@latest/actions/creating-actions/about-actions
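For example, common logic can be split off into a composite run steps action with its own inputs and outputs. A minimal sketch, assuming a hypothetical action stored at .github/actions/greet/action.yml (the name, input, and greeting logic are purely illustrative):

# .github/actions/greet/action.yml (hypothetical)
name: Greet
description: Example composite action with an input and an output
inputs:
  who:
    description: Name to greet
    required: false
    default: world
outputs:
  greeting:
    description: The generated greeting
    value: ${{ steps.build.outputs.greeting }}
runs:
  using: composite
  steps:
    - id: build
      shell: bash
      run: echo "greeting=Hello ${{ inputs.who }}" >> "$GITHUB_OUTPUT"

A workflow in the same repository can then call it after a checkout step (it could equally live in its own repository and be referenced as org/action-repo@tag):

steps:
  - uses: actions/checkout@v4
  - id: greet
    uses: ./.github/actions/greet
    with:
      who: DRY
  - run: echo "${{ steps.greet.outputs.greeting }}"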

Related

Return passing status on Github workflow when using paths-ignore

I'm using a Github workflow to run tests. Because the setup can take a while, we want to skip running the tests when no code was changed. So we are using paths-ignore like this:
on:
  pull_request:
    branches:
      - develop
    paths-ignore:
      - '*.md'
The problem is that we have a protected branch that requires a check to pass before a branch can be merged. There seem to be some workarounds (https://github.community/t/feature-request-conditional-required-checks/16761/20), but they are pretty clunky. Is there an elegant and idiomatic way to return a passing status for a job that was essentially skipped?
Elegant and idiomatic, evidently not. The conclusion elsewhere (GitHub community forum, Reddit) is that this is expected behavior, at least right now.
The two main workarounds people seem to be using are:
Run all required status checks on all PRs, even the slow ones. Sigh.
Use paths-filter or a homegrown alternative (example) inside the required workflows, as part of their execution, and have them skip the actual work and return success if no relevant files were changed (a sketch follows below).
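A rough sketch of the second workaround, here using dorny/paths-filter; the filter name, pattern, and test command are placeholders:

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: changes
        with:
          filters: |
            code:
              - 'src/**'
      # Do the expensive work only when relevant files changed;
      # the job still completes and reports a passing status either way.
      - if: steps.changes.outputs.code == 'true'
        run: ./run-tests.sh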

Should actions be stored in a separate repo or nested in another

What are the best practices around creating a Github Action?
There seem to be roughly three approaches
One repo = one Action
From these examples I clearly derive that 1 action = 1 repo.
action-repo
  action.yml
  ...
With usage:
uses: org/action-repo@tag
"Normal" repo with nested Action
Some tend to just add the action to their repo like so:
repo
  github-action
    action.yml
    ...
Probably also with more than one action. This already results in longer references like:
uses: org/repo/github-action@tag
"Normal" repo with nested/hidden action
This is the most special case I have seen:
repo
  .github
    actions
      action1
        action.yml
        ...
      action2
        action.yml
        ...
This setup leads to rather long references when using the actions, like:
uses: org/repo/.github/actions/action1@tag
Has anyone seen official docs around this?
Two weeks into GitHub Actions and having seen dozens of repositories, I dare to self-answer my question.
As the initial comments suggested, the approach you choose depends mainly on your use case.
Let me also re-use part of that comment to finalize my summary (feel free to comment; I will add your arguments to my list).
Approaches
org/action-repo
above called "1 action = 1 repo"
➕  Certainly the most common standard; many "best of breed" actions (e.g. checkout, setup-xyz, etc.) use this approach
➕  Transparent versioning
➕  Straightforward uses: references (no nested paths)
➕  Follows the UNIX philosophy of doing one thing well
➕  Easy to document and to keep issues and pull requests scoped to your action only
➖  Requires an additional repository
This is a suitable approach when you expect your action to be adopted by a large community or you just feel like "the public needs that".
org/repo/action
above called nested action
➕➖  The above comparison, the other way around.
This approach is mainly suitable for maintaining some smaller actions that you just casually want to offer from within your project. We use it in our case to accumulate some mono-repo-specific actions in one place.
org/repo/.github/action(s)
This approach is admittedly the most special one and a derivative of the second. Generally, there is no real use case for it; if you had to conjure one up, it could be used e.g. in a mono-repo to collect all actions under the .github folder. On the other hand, you can do that in org/repo/action(s) too.
Feel free to complete my list by commenting.
You can always refer to the documentation:
If you're developing an action for other people to use, we recommend keeping the action in its own repository instead of bundling it with other application code. This allows you to version, track, and release the action just like any other software.
Storing an action in its own repository makes it easier for the GitHub community to discover the action, narrows the scope of the code base for developers fixing issues and extending the action, and decouples the action's versioning from the versioning of other application code.
If you're building an action that you don't plan to make available to the public, you can store the action's files in any location in your repository. If you plan to combine action, workflow, and application code in a single repository, we recommend storing actions in the .github directory. For example, .github/actions/action-a and .github/actions/action-b.
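If you follow that layout, a workflow in the same repository references the nested action with a relative path after a checkout step, roughly like this (re-using the action-a name from the quote above):

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      # The repository must be checked out before a local action can be used.
      - uses: actions/checkout@v4
      - uses: ./.github/actions/action-a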

Should I use a MarketPlace action instead of a plain bash `cp` command to copy files?

I am noticing there are many actions in the GitHub marketplace that do the same. Here is an example:
https://github.com/marketplace/actions/copy-file
Is there any benefit to using the GitHub Marketplace action instead of plain bash commands? Is there a recommended-practices guideline that helps decide whether to use Marketplace actions versus plain bash or command-line tools?
These actions don't seem to have any real value in my eyes...
Other than that, these run in Docker and don't need cp, wget or curl to be available on the host, and they ensure a consistent version of their tools is used. If you're lucky, these actions also run consistently the same way on Windows, Linux and Mac, whereas your bash scripts may not run on Windows. But the action author would have to ensure this; it's not something that comes by default.
One thing that could be a reason to use these actions from the marketplace is that they can run as a post-step, which the run: script/bash/pwsh steps can't.
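For reference, a post-step is declared in the action's metadata, something a plain run: step cannot do. A minimal sketch of a JavaScript action with a cleanup post-step (the file names are illustrative):

# action.yml
name: Tool with cleanup
description: Example action that runs a cleanup script after the job's other steps complete
runs:
  using: node20
  main: dist/index.js
  post: dist/cleanup.js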
They aren't more stable or safer: unless you pin the action to a commit hash or fork it, the owner of the action can change its behavior at any time. So you are putting trust in the original author.
Many actions provide convenience functions, like better logging, output variables, or the ability to safely pass in a credential, but many of these seem to be more of an exercise in building an action by their author than something that serves a great purpose.
The documentation that comes with each of these actions doesn't provide a clear reason to use them, and the actions don't follow the preferred versioning scheme... I'd not use these.
So, when would you use an action from the marketplace? In general, actions, like certain CLIs, serve a specific purpose, and an action should contain all the things it needs to run.
An action could contain a complex set of steps, ensure proper handling of arguments, issue special logging commands to make the output more human-readable or update the environment for tasks running further down in the workflow.
An action that adds this extra functionality on top of existing CLIs makes it easier to pass data from one action to another, or even from one job to another.
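The mechanism behind this is step and job outputs; a minimal sketch using plain run: steps to illustrate (the job names and version value are made up):

jobs:
  build:
    runs-on: ubuntu-latest
    outputs:
      version: ${{ steps.meta.outputs.version }}
    steps:
      # Write a step output that the job then re-exposes as a job output.
      - id: meta
        run: echo "version=1.2.3" >> "$GITHUB_OUTPUT"
  deploy:
    runs-on: ubuntu-latest
    needs: build
    steps:
      - run: echo "Deploying version ${{ needs.build.outputs.version }}"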
An action is also easier to re-use across repositories, so if you're using the same scripts in multiple repos, you could wrap them in an action and easily reference them from that one place instead of duplicating the script in each action workflow or adding the script to each repository.
GitHub provides little guidance on when to use an action or when an author should publish an action to the marketplace or not. Basically, anyone can publish anything to the marketplace that fulfills the minimum metadata requirements.
GitHub does provide guidance on versioning for authors: good actions should create tags that users can pin to, and authors should practice semantic versioning to avoid accidentally breaking their users. Actions that specify a branch like main or master in their docs are suspect in my eyes and I wouldn't use them; their implementation could change from under you at any time.
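In practice the difference looks like this (the SHA is a placeholder and some-org/some-action is made up):

steps:
  # Pinned to a tag: convenient, but the author can move the tag later.
  - uses: actions/checkout@v4
  # Pinned to a full commit SHA: immutable, the safest option.
  - uses: actions/checkout@0123456789abcdef0123456789abcdef01234567
  # Pinned to a branch: the implementation can change from under you at any time.
  - uses: some-org/some-action@main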
As a consumer of any action, you should be aware of the security implications of using it. Beyond requiring that the author has 2FA enabled on their account, GitHub does little to no verification of actions it doesn't own itself. Any author could, in theory, replace their implementation with ransomware or a bitcoin miner. So, for actions whose author you haven't built a trust relationship with, it's recommended to fork the action to your own account or organization and to inspect the contents before running them on your runner, especially if it's a private runner with access to protected environments. My colleague Rob Bos has researched this topic deeply and has spoken about it frequently at conferences, in podcasts and on live streams.

Is it possible to define more than one GitHub Actions workflow per YAML file?

I would like to define various workflows that are all related in a single YAML file. Is this possible?
I could not find any official documentation, so I just tried out the idea myself, and I think you cannot define multiple workflows in a single YAML file.
Here is the GitHub Actions run log: https://github.com/chenrui333/github-action-test/actions/runs/157799207
You cannot use YAML anchor syntax either; see more discussion in this GitHub Actions thread.
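So in practice you end up with one workflow per file under .github/workflows, for example (file names, workflow names, and triggers are arbitrary):

# .github/workflows/build.yml
name: Build
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - run: echo "build"

# .github/workflows/release.yml - a second workflow needs its own file
name: Release
on:
  release:
    types: [published]
jobs:
  release:
    runs-on: ubuntu-latest
    steps:
      - run: echo "release"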

Programmatically create gitlab-ci.yml file?

Is there any tool to generate a .gitlab-ci.yml file, like Jenkins has the job-dsl-plugin to create jobs?
The Jenkins DSL plugin allows me to generate jobs using Groovy, which outputs an XML file describing a job for Jenkins.
I can use the DSL and a JSON file to generate jobs in Jenkins. What I'm looking for is a tool to help me generate .gitlab-ci.yml based on a specification.
The main question I have to ask is: what is your goal?
Just reducing maintenance effort for repeated job snippets:
Sometimes the .gitlab-ci.yml files are pretty similar across a lot of projects, and you want to manage them centrally. Then I recommend taking a look at Having Gitlab Projects calling the same gitlab-ci.yml stored in a central location, which shows multiple ways of centralizing your build.
Generating the pipeline configuration because the build is highly flexible:
Actually, this is more of a templating task and can be achieved in nearly every scripting language you like: plain bash, Groovy, Python, Go... you name it. In the end the question is what kind of flexibility you strive for and what kind of logic you need for the generation. I will not go into detail on how to generate the .gitlab-ci.yml file, but rather on how to use it in your next step, because this is in my opinion the most crucial part. You could simply generate and commit it, but you can also have GitLab CI generate the file for you and use it in the next job of your pipeline.
setup:
  script:
    - echo ".." # generate your yaml file here, maybe use a custom image
  artifacts:
    paths:
      - generated.gitlab-ci.yml

trigger:
  needs:
    - setup
  trigger:
    include:
      - artifact: generated.gitlab-ci.yml
        job: setup
    strategy: depend
This allows you to generate a child pipeline and execute it - we use this for highly generic builds in monorepos.
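As a rough illustration of the setup job's generation step, it could be as simple as writing the child pipeline with a shell heredoc; in practice you would render it from a template or specification (the generated job below is made up):

setup:
  script:
    # Write the child pipeline file at runtime; replace this with your own templating logic.
    - |
      cat > generated.gitlab-ci.yml <<'EOF'
      generated-job:
        script:
          - echo "I was generated at pipeline runtime"
      EOF
  artifacts:
    paths:
      - generated.gitlab-ci.yml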
See for further reading:
GitLab JSONNET Example - documentation example for generated YAML files within a pipeline
Dynamic Child Pipelines - documentation for dynamically created pipelines