I have a GitHub repository that does CI/CD using GitHub Actions, and it needs more than the GitHub-hosted runners can provide, in multiple ways. For some tasks, we need to test CUDA code on GPUs. For other tasks, we need lots of CPU cores and local disk.
Is it possible to route github actions to different self-hosted runners based on the task? Some tasks go to the GPU workers, and others to the big CPU workers? The docs imply this might be possible using "runner groups" but I honestly can't tell if this is something that A) can work if I figure it out B) will only work if I upgrade my paid github account to something pricier (even though it says it's "enterprise" already) or C) can never work.
When I try to set up a runner group following the docs, I don't see the UI elements that the docs describe. So maybe my account isn't expensive enough yet?
But I also don't see any way that I would route a task to a specific runner group. To use the self-hosted runners today, I just say
    gpu-test-job:
      runs-on: self-hosted
instead of
    standard-test-job:
      runs-on: ubuntu-22.04
and I'm not sure how I would even specify which runner group (or other routing mechanism) to get it to a specific kind of self-hosted runner, if that's even a thing. I'd need to specify something like:
    big-cpu-job:
      runs-on: self-hosted
      self-hosted-runner-group: big-cpu # is this even a thing?
It looks like you won't be able to utilize runner groups on a personal account, but that's not a problem!
Labels can be added to self-hosted runners. Those labels can be referenced in the runs-on value (as an array) to specify which self-hosted runner(s) the job should go to.
You would run ./config.sh like this (you can pass in as many comma-separated labels as you like):
./config.sh --labels big-cpu
and your job would use an array in the runs-on field to make sure it selects a self-hosted runner that also has the big-cpu label:
    big-cpu-job:
      runs-on: [self-hosted, big-cpu]
      ...
Note: If you wanted to "reserve" the big-cpu runners for the jobs that need them, you'd add a separate label (regular, for example) when running ./config.sh on the other runners, and use that label in runs-on for the jobs that don't need a specialized runner.
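To make the routing concrete, here is a minimal workflow sketch under some assumptions: the gpu label is hypothetical (only big-cpu appears above) and would have to be passed to ./config.sh --labels on the GPU machines, and the run steps are placeholders.

```yaml
name: ci
on: push

jobs:
  gpu-test-job:
    # Matches only self-hosted runners that also carry the (hypothetical) gpu label
    runs-on: [self-hosted, gpu]
    steps:
      - uses: actions/checkout@v4
      - run: nvidia-smi   # placeholder for the real CUDA tests

  big-cpu-job:
    # Matches only self-hosted runners registered with --labels big-cpu
    runs-on: [self-hosted, big-cpu]
    steps:
      - uses: actions/checkout@v4
      - run: nproc        # placeholder for the CPU-heavy build

  standard-test-job:
    # Still runs on a GitHub-hosted runner
    runs-on: ubuntu-22.04
    steps:
      - uses: actions/checkout@v4
      - run: echo "regular tests"
```

A runner must have every label in the array to be eligible, which is why listing self-hosted together with the custom label narrows the job to one class of machine.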
I have two pipelines that I want to run with the same runner, is it possible?
My single runner is installed on a Linux virtual machine, and I want to use it to run all my pipelines.
If the pipelines are for different projects you will need to make sure the runner is accessible to each project.
Depending on the level of control you want, you can use GitLab CI's tags keyword, which lets you determine which runner handles which pipeline.
If you want to run the jobs in parallel you will need to make sure the runner is enabled for concurrent running and also that the jobs are in the same stages within the pipelines.
The only way to do this at the moment is to define one tag for one runner only and use this tag for your project.
This way everything is run on this single runner.
This has of course the disadvantage that the load is not spread to different runners, so be careful.
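As a rough illustration of the single-tag approach, a .gitlab-ci.yml job could look like the sketch below. The tag name single-vm-runner and the job name are hypothetical; the tag has to match whatever tag was assigned to the runner when it was registered.

```yaml
stages:
  - test

pipeline-job:
  stage: test
  tags:
    - single-vm-runner   # hypothetical tag, assigned to exactly one runner
  script:
    - echo "Running on the tagged runner"
```

Using the same tag in every project's pipeline sends all of those jobs to the one runner that carries it.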
You could improve this solution by having a parent pipeline pick a free runner tag and then create a child pipeline that uses it.
There are active issues about this problem in gitlab, see this one and this.
There is a forum entry about it as well.
In a Github action workflow, is there a way to access the "Set up Job" -> "Virtual Environment" values? Ideally, I'd like to get them from variables already present, but getting them from the output of a command would be just fine too.
I can see the values in the GitHub Actions UI by clicking on a specific job, then "Set up Job", then expanding "Virtual Environment".
Virtual Environment
Environment: ubuntu-20.04
Version: 20220425.1
Included Software: https://github.com/actions/virtual-environments/blob/ubuntu20/20220425.1/images/linux/Ubuntu2004-Readme.md
Image Release: https://github.com/actions/virtual-environments/releases/tag/ubuntu20%2F20220425.1
I'd like to create a cache key based on that info so that, when it changes, the cache is made anew.
Background:
I have some tests that run on both ["ubuntu-latest", "macos-latest"]. I also have a setup action that in part builds some shared libraries (which is slow) and caches them. The shared libraries are external though and only need to be rebuilt for updates to the runner image (or for new versions of the libraries, which isn't a concern). The current cache key is ${{ runner.os }}-${{ needs.setup.outputs.cache-key-suffix }}, which only ever changes when we bump the cache-key-suffix hard-coded string. Using that key, the cache is shared among all the workflows and branches, saving lots of needless duplicated work.
Deep Background:
The specific problem I'm solving is that one of those shared libraries is Rocks DB. The tests were working fine for a while, but recently stopped; most tests using Rocks DB started failing with signal: illegal instruction (core dumped). The only thing that might have changed is an update to ubuntu-latest. So I figure that it'd be nice to automatically have the cache recreated when that happens.
I've tried digging through the Github Actions: Variables and Github Actions: Contexts documentation, and some general Google-based research, but haven't been able to find a way to get those values for use in a workflow.
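One possible direction, offered as an unverified sketch rather than a documented feature: GitHub-hosted runner images appear to export ImageOS and ImageVersion as environment variables, so a step could copy them into a step output and fold that into the cache key. Those two variable names, the step id image, and the cache path are assumptions here; running env in a debug step first would confirm whether the variables exist on your runners.

```yaml
jobs:
  test:
    needs: setup                 # the existing setup job that provides cache-key-suffix
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest, macos-latest]
    steps:
      - name: Capture runner image identifiers
        id: image
        shell: bash
        # ImageOS / ImageVersion are assumed to be set by the hosted image;
        # fall back to "unknown" so the key is still well-formed if they are not
        run: echo "id=${ImageOS:-unknown}-${ImageVersion:-unknown}" >> "$GITHUB_OUTPUT"

      - name: Cache shared libraries
        uses: actions/cache@v4
        with:
          path: path/to/shared-libs   # hypothetical location of the built libraries
          key: ${{ runner.os }}-${{ steps.image.outputs.id }}-${{ needs.setup.outputs.cache-key-suffix }}
```

With a key like this, a new image version produces a cache miss and the libraries are rebuilt once for that image.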
I'm using a Github workflow to run tests. Because the setup can take a while, we want to skip running the tests when no code was changed. So we are using paths-ignore like this:
    on:
      pull_request:
        branches:
          - develop
        paths-ignore:
          - '*.md'
The problem is that we have a protected branch here that requires a check to pass before a branch can be merged. There seem to be some workarounds (https://github.community/t/feature-request-conditional-required-checks/16761/20), but they are pretty clunky. Is there an elegant and idiomatic way to return a passing status here for a job that was essentially skipped?
Elegant and idiomatic, evidently not. The conclusion elsewhere (GitHub community forum, Reddit) is that this is expected behavior, at least right now.
The two main workarounds people seem to be using are:
Run all required status checks on all PRs, even the slow ones. Sigh.
Use paths-filter or a homegrown alternative (example) inside the required workflows, as part of their execution, and skip the actual work and return success if no relevant files were changed.
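The second workaround can be sketched roughly as follows, assuming the dorny/paths-filter action and a hypothetical src/** layout for the code that matters. The workflow always triggers, so the required check always reports a status, but the slow steps only run when relevant files changed.

```yaml
on:
  pull_request:
    branches:
      - develop

jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Detect relevant changes
        id: changes
        uses: dorny/paths-filter@v3
        with:
          filters: |
            code:
              - 'src/**'   # hypothetical: paths that should trigger the tests

      - name: Slow setup and tests
        # Skipped when only unrelated files (e.g. docs) changed
        if: steps.changes.outputs.code == 'true'
        run: echo "run the real setup and test commands here"
```

When the condition is false the step is skipped and the job still ends successfully, which satisfies the branch protection rule.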
What are the best practices around creating a Github Action?
There seem to be roughly three approaches
One repo = one Action
From these examples I clearly derive that 1 action = 1 repo.
    action-repo
      action.yml
      ...
With usage:
    uses: org/action-repo@tag
"Normal" repo with nested Action
Some tend to just add the action to their repo like so:
    repo
      github-action
        action.yml
      ...
Probably also with more than one action. This already gives longer imports like:
    uses: org/repo/github-action@tag
"Normal" repo with nested/hidden action
This is the most special case I have seen:
    repo
      .github
        actions
          action1
            action.yml
            ...
          action2
            action.yml
            ...
This setup leads to some weird imports in the usage of the actions like
    uses: org/repo/.github/actions/action1@tag
Has anyone seen official docs around this?
Two weeks into GHA, and having seen dozens of repositories, I dare to self-answer my question.
As the initial comments suggested, the approach you choose depends mainly on your use case.
Let me also reuse part of that comment to round out my summary (feel free to comment and I will add your argument to my list).
Approaches
org/action-repo
above called "1 action = 1 repo"
➕ Certainly the most common convention; many "best of breed" actions (e.g. checkout, setup-xyz, etc.) use this approach
➕ Transparent versioning
➕ Straightforward uses (no nested paths)
➕ Follow UNIX philosophy of doing one thing well
➕ Easy to document, have issues, and pull requests for your action only
➖ Additional repo
This is a suitable approach for when you expect your action to be adopted by a large community or you just feel like "public needs that".
org/repo/action
above called nested action
➕➖ The same pros and cons as above, reversed.
This approach is mainly suitable to maintain some smaller actions that you just casually want to offer from within your project. We use it in our case now to accumulate some mono-repo-specific actions in one place.
org/repo/.github/action(s)
This approach is admittedly the most unusual one and a variation of the second. Generally, there is no real use case for it. If I had to conjure one up: in a mono-repo you could move all actions into the .github folder to collect them in one place. On the other hand, you can do that in org/repo/action(s) too.
Feel free to complete my list by commenting.
You can always refer to the documentation:
If you're developing an action for other people to use, we recommend keeping the action in its own repository instead of bundling it with other application code. This allows you to version, track, and release the action just like any other software.
Storing an action in its own repository makes it easier for the GitHub community to discover the action, narrows the scope of the code base for developers fixing issues and extending the action, and decouples the action's versioning from the versioning of other application code.
If you're building an action that you don't plan to make available to the public, you can store the action's files in any location in your repository. If you plan to combine action, workflow, and application code in a single repository, we recommend storing actions in the .github directory. For example, .github/actions/action-a and .github/actions/action-b.
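For reference, the three layouts discussed above would be consumed roughly like this. The org, repo, and action names come from the examples in the question, and the v1 tag is a hypothetical placeholder; the last step shows the local-path form for an action stored in the same repository as the workflow.

```yaml
steps:
  # 1 action = 1 repo
  - uses: org/action-repo@v1

  # Action nested in a subdirectory of a "normal" repo
  - uses: org/repo/github-action@v1

  # Action kept under .github/actions of another repo
  - uses: org/repo/.github/actions/action1@v1

  # Same-repository action, referenced by relative path (requires a prior checkout)
  - uses: actions/checkout@v4
  - uses: ./.github/actions/action1
```

The local-path form takes no version suffix because it uses whatever commit the workflow checked out.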
I am noticing there are many actions in the GitHub marketplace that do the same. Here is an example:
https://github.com/marketplace/actions/copy-file
Is there any benefit to using a GitHub Marketplace action instead of plain bash commands? Is there a recommended-practices guideline that helps decide when to use Marketplace actions versus plain bash or the command line?
These actions don't seem to have any real value in my eyes...
The exceptions are that these run in Docker, so they don't need cp, wget or curl to be available on the host, and they ensure a consistent version of their tools is used. If you're lucky, these actions also run the same way on Windows, Linux and Mac, whereas your bash scripts may not run on Windows. But the action author would have to ensure this; it's not something that comes by default.
One thing that could be a reason to use these actions from the marketplace is that they can run as a post-step, which the run: script/bash/pwsh steps can't.
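As an illustration of the post-step point, a JavaScript action can declare a cleanup script in its metadata file. This is a minimal sketch of a hypothetical action.yml; the file names dist/main.js and dist/cleanup.js are placeholders.

```yaml
# action.yml of a hypothetical JavaScript action
name: 'Example with cleanup'
description: 'Sketch of an action that registers a post step'
runs:
  using: 'node20'
  main: 'dist/main.js'      # runs where the step appears in the job
  post: 'dist/cleanup.js'   # runs at the end of the job, after the other steps
```

A plain run: step has no equivalent hook, so cleanup there has to be written as an explicit later step.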
They aren't more stable or safer: unless you pin the action to a commit hash or fork it, the owner of the action can change its behavior at any time. So you are putting trust in the original author.
Many actions provide convenience functions, like better logging or output variables or the ability to safely pass in a credential, but these tasks seem to be more of an exercise in building an action by the author and they don't really serve a great purpose.
The documentation that comes with each of these actions doesn't provide a clear reason to use them, and the actions don't follow the preferred versioning scheme. I'd not use these.
So, when would you use an action from the marketplace? In general, an action, like certain CLIs, serves a specific purpose, and an action should contain everything it needs to run.
An action could contain a complex set of steps, ensure proper handling of arguments, issue special logging commands to make the output more human-readable or update the environment for tasks running further down in the workflow.
An action that adds this extra functionality on top of existing CLIs makes it easier to pass data from one action to another or even from one job to another.
An action is also easier to re-use across repositories, so if you're using the same scripts in multiple repos, you could wrap them in an action and easily reference them from that one place instead of duplicating the script in each action workflow or adding the script to each repository.
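Wrapping a shared script for reuse can be as small as a composite action. The sketch below is hypothetical throughout (action name, input names, and the cp command standing in for your script); it only shows the general shape.

```yaml
# action.yml of a hypothetical composite action
name: 'Copy artifact'
description: 'Sketch: wraps a shared shell snippet so several repos can reuse it'
inputs:
  source:
    description: 'File to copy'
    required: true
  destination:
    description: 'Target path'
    required: true
runs:
  using: 'composite'
  steps:
    - name: Copy file
      shell: bash
      # Inputs are passed through env rather than interpolated into the script
      env:
        SOURCE: ${{ inputs.source }}
        DESTINATION: ${{ inputs.destination }}
      run: cp "$SOURCE" "$DESTINATION"
```

Each consuming repository then references the action with a single uses: line instead of carrying its own copy of the script.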
GitHub provides little guidance on when to use an action or when an author should publish an action to the marketplace or not. Basically, anyone can publish anything to the marketplace that fulfills the minimum metadata requirements for the marketplace.
GitHub does provide guidance on versioning for authors: good actions should create tags that a user can pin to. Authors should practice semantic versioning to avoid accidentally breaking their users. Actions that specify a branch like main or master in their docs are suspect in my eyes and I wouldn't use them; their implementation could change from under you at any time.
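The difference between these pinning styles shows up directly in the uses: reference. In this sketch org/action-repo, the v1.2.3 tag, and the 40-character SHA are all placeholders, not real releases.

```yaml
steps:
  # Pinned to a release tag: readable, but the author can move the tag
  - uses: org/action-repo@v1.2.3

  # Pinned to a full commit SHA (placeholder value): the code cannot change underneath you
  - uses: org/action-repo@0123456789abcdef0123456789abcdef01234567

  # Tracking a branch: picks up whatever the author pushes next
  - uses: org/action-repo@main
```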
As a consumer of any action, you should be aware of the security implications of using it. Other than requiring that the author has 2FA enabled on their account, GitHub does little to no verification of actions it doesn't own itself. Any author could in theory replace their implementation with ransomware or a bitcoin miner. So, for actions whose author you haven't built a trust relationship with, it's recommended to fork the action to your own account or organization and to inspect the contents before running them on your runner, especially if that's a private runner with access to protected environments. My colleague Rob Bos has researched this topic deeply and has spoken about it frequently at conferences and on podcasts and live streams.