I'm new to CodeQL, so my apologies if this is an obvious question, but I've been unable to understand a few simple concepts.
First, I can easily configure a public repo with a GitHub Action using a YAML file like the following:
on:
  push:
    branches: [ master ]
  pull_request:
    # The branches below must be a subset of the branches above
    branches: [ master ]

jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest
    permissions:
      actions: read
      contents: read
      security-events: write

    strategy:
      fail-fast: false
      matrix:
        language: [ 'java' ]
        # CodeQL supports [ 'cpp', 'csharp', 'go', 'java', 'javascript', 'python', 'ruby' ]
        # Learn more about CodeQL language support at https://aka.ms/codeql-docs/language-support

    steps:
    - name: Checkout repository
      uses: actions/checkout@v3

    # Initializes the CodeQL tools for scanning.
    - name: Initialize CodeQL
      uses: github/codeql-action/init@v2
      with:
        queries: +security-extended
        languages: ${{ matrix.language }}
        # If you wish to specify custom queries, you can do so here or in a config file.
        # By default, queries listed here will override any specified in a config file.
        # Prefix the list here with "+" to use these queries and those in the config file.
        # For details on CodeQL's query packs, refer to: https://docs.github.com/en/code-security/code-scanning/automatically-scanning-your-code-for-vulnerabilities-and-errors/configuring-code-scanning#using-queries-in-ql-packs
        # queries: security-extended,security-and-quality

    # Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
    # If this step fails, then you should remove it and run the build manually (see below).
    - name: Autobuild
      uses: github/codeql-action/autobuild@v2

    # Command-line programs to run using the OS shell.
    # See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idstepsrun

    # If the Autobuild step above fails, remove it, uncomment the following three
    # lines, and modify them (or add more) to build your project.
    # - run: |
    #     echo "Run, Build Application using script"
    #     ./location_of_script_within_repo/buildscript.sh

    - name: Perform CodeQL Analysis
      uses: github/codeql-action/analyze@v2
As indicated in the YAML file, I'm using Java as the language. What I'm then trying to do is trigger a failure / alert with simple Java code such as this:
public class Main {
    public static void main(String[] args) {
        // Example code for https://cwe.mitre.org/data/definitions/476.html
        String cmd = System.getProperty("cmd");
        cmd = cmd.trim();
    }
}
This simple code is an example of Common Weakness Enumeration (CWE) 476 (NULL Pointer Dereference): I'm dereferencing a variable that may not have been defined.
If I go to Security -> Code scanning alerts, it shows that the scanning was performed but no alerts were found.
Basically, I'm wondering if I need to initialize CodeQL with a specific CWE under the Initialize CodeQL step in the YAML file.
CodeQL only has a specific set of queries, which do not cover all possible CWEs. This list shows the currently covered CWEs for Java.
As far as I know, there is currently no query which detects the specific issue you are showing in your question (there are, however, queries which detect dereferencing null). The reason for this is most likely that it would be difficult to prevent false positives. For example, if your application is started with -Dcmd, the system property would not be null. Similarly, a call to System.setProperty in a different part of the application could set the system property to a non-null value.
Besides that, you have configured queries: +security-extended, but the type of query you are looking for (assuming it existed) would most likely be in the security-and-quality query suite, because it is not directly security related.
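If you want to try the broader suite, the init step from your workflow could be adjusted like this (a minimal sketch; as the comments in your workflow note, the "+" prefix keeps the default queries rather than replacing them):

- name: Initialize CodeQL
  uses: github/codeql-action/init@v2
  with:
    languages: ${{ matrix.language }}
    # "+" appends the suite to the default queries instead of replacing them
    queries: +security-and-quality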
You could also try to write your own queries and then include them in the code scanning workflow. Some concepts of CodeQL might feel a bit unfamiliar at first, but the documentation provides great examples and tutorials for getting started. However, you should probably first check whether the provided queries already suffice for your use case.
Since May 2022:
Using CodeQL query packs (and their associated CWE coverage, with query specifiers) is still in beta, but not going anywhere.
Code scanning's setup has been simplified:
Code scanning can be set up more easily without committing a workflow file to the repository (Jan. 2023)
Code scanning's new default setup feature automatically finds and sets up the best CodeQL configuration for your repository.
This will detect the languages in the repository and enable CodeQL analysis for every pull request and every push to the default branch and any protected branches.
Default setup currently supports analysis of JavaScript (including TypeScript), Python, and Ruby code.
More languages will be supported soon, and all other languages supported by CodeQL continue to work using a GitHub Actions workflow file.
The new default setup feature is available for CodeQL on repositories that use GitHub Actions.
You can use default setup on your repository's "Settings" tab under "Code security and analysis" (accessible by repository admins and security managers).
The options to set up code scanning using an Actions workflow file or through API upload from 3rd party CI/CD systems remain supported and are unchanged.
This more advanced setup method can be useful if you need to alter the default configuration, for example to include custom query packs.
Default setup configurations can also be converted to advanced setups if your analysis requirements change.
Default setup is currently available at the repository level.
We are actively working on future features at the organization level so you can easily set up code scanning at scale across large numbers of repositories.
This has shipped to GitHub.com and will be available in GitHub Enterprise Server 3.9.
To learn more, read the documentation on setting up code scanning for a repository.
In your case, you would still need an Actions workflow file in order to specify a query pack.
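For example, a query pack can be referenced from the init step like this (a sketch; my-org/my-java-pack is a hypothetical pack published to the GitHub Container registry):

- name: Initialize CodeQL
  uses: github/codeql-action/init@v2
  with:
    languages: ${{ matrix.language }}
    # "+" appends the (hypothetical) pack to the default queries
    packs: +my-org/my-java-pack@1.0.0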
Related
I know about organization-wide secrets for GitHub Actions, but I'm looking for a way to define organization-wide non-secret variables, such as default environment variables or something similar.
Sometimes you simply don't want all your shared config to be secret. An example is AWS credentials, where it may be beneficial to see the AWS_ACCESS_KEY_ID in plain text but keep the AWS_SECRET_ACCESS_KEY a secret.
It seems silly to have to set the same AWS_ACCESS_KEY_ID as an env variable in every single repo, while the secret half of that pair can be configured once for the entire organization... Is there such a way?
Perhaps a workaround could be to create a reusable workflow such as set-env that sets the shared environment variables, and then include it in each and every job with uses: my-org/github-actions/.github/workflows/set-env.yml@main, but it's not the cleanest solution, I think.
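For reference, that workaround might look roughly like this (a sketch only; the repo path and the placeholder value are hypothetical). A reusable workflow cannot inject env variables into the calling job directly, so it would have to expose them as job outputs:

# my-org/github-actions/.github/workflows/set-env.yml (hypothetical)
on:
  workflow_call:
    outputs:
      aws_access_key_id:
        value: ${{ jobs.expose.outputs.aws_access_key_id }}

jobs:
  expose:
    runs-on: ubuntu-latest
    outputs:
      aws_access_key_id: ${{ steps.vars.outputs.aws_access_key_id }}
    steps:
      - id: vars
        run: echo "::set-output name=aws_access_key_id::AKIA_PLACEHOLDER"

# in a consuming repo:
# jobs:
#   vars:
#     uses: my-org/github-actions/.github/workflows/set-env.yml@main
#   build:
#     needs: vars
#     runs-on: ubuntu-latest
#     env:
#       AWS_ACCESS_KEY_ID: ${{ needs.vars.outputs.aws_access_key_id }}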
Too bad there's simply not an option next to GitHub Actions secrets to allow them to be ... well, not so secret.
Is there any way that a step in a workflow job can specify sub-steps to pass to a custom action?
steps:
  - uses: ...
    with:
      steps:
        - ...
        - ...
        - ...
For example, uses: actions/cache@v2 will attempt to download a snapshot immediately (in the current step), and also inserts a post step that uploads a fresh snapshot after all other steps in the parent job. This works well for build processes that (through multiple stages) intelligently recognise which objects need to be recreated and which do not, but it is not suited to workflows that need to set up a clean environment (or generate test data) that may then be manipulated by the tests. I'd like to make a different cache action that is explicitly passed instructions for how to regenerate the cache from scratch, and which uploads the state before returning to the following step of the parent job.
Is there any way this could be achieved? (Composite actions? Advanced yaml syntax? Template expressions? Low-level actions toolkits/features? Implementing an interpreter to re-parse the input parameter?)
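There is no native way to pass a list of steps as an input, but a composite action can get close by nesting other actions and accepting the "regenerate" logic as a shell command instead. A sketch, assuming a runner recent enough to support uses: and if: inside composite steps; the action name and inputs are hypothetical:

# action.yml of a hypothetical composite action
name: cache-or-rebuild
description: Restore a cache, or rebuild it from scratch on a miss
inputs:
  path:
    description: Path to cache
    required: true
  key:
    description: Cache key
    required: true
  rebuild-command:
    description: Stand-in for the "sub-steps" - a single shell command
    required: true
runs:
  using: composite
  steps:
    - id: cache
      uses: actions/cache@v2
      with:
        path: ${{ inputs.path }}
        key: ${{ inputs.key }}
    - if: steps.cache.outputs.cache-hit != 'true'
      shell: bash
      run: ${{ inputs.rebuild-command }}

Note that actions/cache here still saves the cache in its post step at the end of the job, so this approximates the ergonomics but not the "upload before the following step" behaviour you describe.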
I have my CronJob working fine without the use of an image stream.
The job runs every 15 minutes and always pulls a tagged image, e.g. my-cron:stable.
Since the image is always pulled and the schedule tells the cluster when to run my job, what do I gain from knowing that there's an updated version of my image?
If the image changes and there's a running instance of my job, I want the job to complete using the current version of the image.
In the next scheduled run the updated image is pulled (imagePullPolicy: Always). So it seems I don't gain much by tracking changes to an image stream for cron jobs.
ImageStream triggers only BuildConfigs and DeploymentConfigs; see https://docs.openshift.com/container-platform/4.7/openshift_images/image-streams-manage.html.
Upstream Kubernetes doesn't have a concept of an ImageStream, so there is no triggering for 'vanilla' resource types. CronJob is used in both OpenShift and Kubernetes (apiVersion: batch/v1beta1), and AFAIK the only way to access an imagestream is to use the full path to the internal registry, which is not that convenient. Your cronjob won't be restarted or stopped if the imagestream is updated, because from the Kubernetes standpoint the image is pulled only when the cronjob is triggered; after that, it just waits for the job to complete.
As I see it, you are not gaining much from using imagestreams, because one of their main points, the ability to use triggers, is not usable for cronjobs. The only reason to use them in CronJobs is if you are pushing directly to the internal registry for some reason, but that's a bad practice too.
See following links for reference:
https://access.redhat.com/solutions/4815671
How to specify OpenShift image when creating a Job
Quoting the Red Hat solution here:
Resolution
When using an image stream inside the project to run a cronjob,
specify the full path of the image:
[...]
spec:
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - image: docker-registry.default.svc:5000/my-app-namespace/cronjob-image:latest
            name: cronjob-image
[...]
Note that you can also put the ':latest' (or a specific tag) after the
image.
In this example, the cronjob will use the imagestream cronjob-image
from project my-app-namespace:
$ oc get is -n my-app-namespace
[...]
imagestream.image.openshift.io/cronjob-image   docker-registry.default.svc:5000/my-app-namespace/cronjob-image   latest   27 minutes ago
Root Cause
The image was specified without its full path to the internal docker registry. If the full path is not used (i.e. putting only cronjob-image), OpenShift won't be able to find it. [...]
By using an ImageStream reference, you can avoid having to include the container image repository hostname and port, and the project name, in your Cron Job definition.
The docker repository reference looks like this:
image-registry.openshift-image-registry.svc:5000/my-project/my-is:latest
The value of the equivalent annotation placed on a Cron Job looks like this:
[
  {
    "from": {
      "kind": "ImageStreamTag",
      "name": "my-is:latest"
    },
    "fieldPath": "spec.jobTemplate.spec.template.spec.containers[?(@.name==\"my-container\")].image"
  }
]
On the one hand, this is longer. On the other hand, it includes less redundant information.
So, compared to other types of kubernetes resources, Image Streams don't add a great deal of functionality to Cron Jobs. But you might benefit from not having to hardcode the project name if for instance you kept the Cron Job YAML in Git and wanted to apply it to several different projects.
Kubernetes-native resources which contain a pod can be updated automatically in response to an image stream tag update by adding the image.openshift.io/triggers annotation.
This annotation can be placed on CronJobs, Deployments, StatefulSets, DaemonSets, Jobs, ReplicationControllers, etc.
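For instance, placed directly on a CronJob manifest, the annotation might look like this (a sketch reusing the names from the example above):

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: my-cronjob
  annotations:
    image.openshift.io/triggers: >-
      [{"from": {"kind": "ImageStreamTag", "name": "my-is:latest"},
        "fieldPath": "spec.jobTemplate.spec.template.spec.containers[?(@.name==\"my-container\")].image"}]
spec:
  # ... schedule and jobTemplate as usual ...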
The easiest way to do this is with the oc command.
$ oc set triggers cronjob/my-cronjob
NAME                  TYPE     VALUE                         AUTO
cronjobs/my-cronjob   config                                 true

$ oc set triggers cronjob/my-cronjob --from-image=my-is:latest -c my-container
cronjob.batch/my-cronjob updated

$ oc set triggers cronjob/my-cronjob
NAME                  TYPE     VALUE                         AUTO
cronjobs/my-cronjob   config                                 true
cronjobs/my-cronjob   image    my-is:latest (my-container)   true
The effect of the oc set triggers command was to add the annotation to the CronJob, which we can examine with:
$ oc get cronjob/my-cronjob -o json | jq '.metadata.annotations["image.openshift.io/triggers"]' -r | jq
[
  {
    "from": {
      "kind": "ImageStreamTag",
      "name": "my-is:latest"
    },
    "fieldPath": "spec.jobTemplate.spec.template.spec.containers[?(@.name==\"my-container\")].image"
  }
]
This is documented in Images - Triggering updates on image stream changes - but the syntax in the documentation appears to be wrong, so use oc set triggers if you find that the annotation you write by hand doesn't work.
I see the documentation in GitHub Actions for running a PowerShell script in a step. I need to increment a version number and check in a versions.txt file in GitHub Actions using PowerShell. Can I accomplish this by running a PowerShell script from my folder and by using git.exe and PowerShell commands?
How do I set a variable from a PowerShell script so I can use the version number in subsequent steps?
There are two questions in your question, so I will answer them in reverse order.
To have one step (for example, a shell script) set a value for subsequent steps, you can use the special workflow commands, which work by outputting specially formatted strings to STDOUT.
In your case, you are looking for Setting an Output Parameter:
echo "::set-output name=version::1.2.3"
Which can then be read by subsequent steps, like this (note that the emitting step needs an id, here the-other-step-id):

env:
  VERSION: ${{ steps.the-other-step-id.outputs.version }}
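Putting both halves together in PowerShell might look like this (a sketch; versions.txt and the bump logic are placeholders, and on newer runners writing to $GITHUB_OUTPUT replaces ::set-output):

steps:
  - id: bump-version
    shell: pwsh
    run: |
      # Hypothetical bump logic: read the current version from versions.txt
      $version = (Get-Content versions.txt).Trim()
      Write-Output "::set-output name=version::$version"
  - shell: pwsh
    env:
      VERSION: ${{ steps.bump-version.outputs.version }}
    run: Write-Output "Building version $env:VERSION"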
As for having GitHub Actions modify code and then check it in, I would strongly advise against it, as it will a) complicate your workflow and b) risk creating "workflow loops", where Actions-originated check-ins trigger this or other workflows.
For version tagging, I suggest you use git tags, as they are the almost universally accepted way to mark versions in source control. You might want to have a local script that both tags and updates your version.txt, and then your workflow just reads the file or the tag.
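Such a local script could be as simple as this (a sketch; the version value and file name are placeholders):

# Hypothetical local release script (PowerShell): bump the file, commit, tag, push
$version = "1.2.3"
Set-Content -Path version.txt -Value $version
git add version.txt
git commit -m "Release $version"
git tag "v$version"
git push --follow-tags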
I need to distribute some sort of static configuration through my application. What is the best practice to do that?
I see three options:
1. Call application:get_env directly whenever a module needs a configuration value.
Plus: simpler than the other options.
Minus: how do you test such modules without bringing the whole application up?
Minus: how do you start a certain module with a different configuration (if required)?
2. Pass the configuration (retrieved from application:get_env) to application modules during start-up.
Plus: modules are easier to test, and you can start them with a different configuration.
Minus: lots of boilerplate code. Changing the configuration format requires fixing several places.
3. Hold the configuration inside a separate configuration process.
Plus: a more-or-less type-safe approach. Easier to track where a certain parameter is used and to change those places.
Minus: need to bring up the configuration process before running the modules.
Minus: how do you start a certain module with a different configuration (if required)?
Another approach is to transform your configuration data into an Erlang source module that makes the configuration data available through exports. Then you can change the configuration at any time in a running system by simply loading a new version of the configuration module.
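Such a generated module might look like this (a sketch; the module name and values are hypothetical):

%% app_config.erl - configuration compiled into code; regenerate and
%% hot-load a new version of this module to change values at runtime.
-module(app_config).
-export([max_widgets/0, db_host/0]).

max_widgets() -> 1000.
db_host() -> "localhost".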
For static configuration in my own projects, I like option (1). I'll show you the steps I take to access a configuration parameter called max_widgets in an application called factory.
First, we'll create a module called factory_env which contains the following:
-module(factory_env).
-export([get_env/2, set_env/2]).

-define(APPLICATION, factory).

%% Look up Key in the factory application's environment,
%% falling back to Default when it is not set.
get_env(Key, Default) ->
    case application:get_env(?APPLICATION, Key) of
        {ok, Value} -> Value;
        undefined   -> Default
    end.

set_env(Key, Value) ->
    application:set_env(?APPLICATION, Key, Value).
Next, in a module that needs to read max_widgets we'll define a macro like the following:
-define(MAX_WIDGETS, factory_env:get_env(max_widgets, 1000)).
There are a few nice things about this approach:
Because we used application:set_env/3 and application:get_env/2, we don't actually need to start the factory application in order to have our tests pass.
max_widgets gets a default value, so our code will still work even if the parameter isn't defined.
A second module could use a different default value for max_widgets.
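For example (a sketch), a second module might define:

-define(MAX_WIDGETS, factory_env:get_env(max_widgets, 2000)).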
Finally, when we are ready to deploy, we'll put a sys.config file in our priv directory and load it with -config priv/sys.config during startup. This allows us to change configuration parameters on a per-node basis if desired. This cleanly separates configuration from code - e.g. we don't need to make another commit in order to change max_widgets to 500.
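That sys.config file would look something like this (a sketch, using the names from this answer):

%% priv/sys.config - per-node overrides for the factory application
[
 {factory, [
   {max_widgets, 500}
 ]}
].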
You could use a process (a gen_server maybe?) to store your configuration parameters in its state. It should expose a get/set interface. If a value hasn't been explicitly set, it should retrieve a default value.
-export([get/1, set/2]).

...

get(Param) ->
    gen_server:call(?MODULE, {get, Param}).

...

%% handle_call must return a reply tuple
handle_call({get, Param}, _From, State) ->
    Reply = case lookup(Param, State#state.params) of
                undefined -> application:get_env(...);
                Value     -> {ok, Value}
            end,
    {reply, Reply, State}.
...
You could then easily mock up this module in your tests. It will also be easy to update the process with new configuration at run-time.
You could use pattern matching and tuples to associate different configuration parameters with different modules:
set({ModuleName, ParamName}, Value) ->
    ...

get({ModuleName, ParamName}) ->
    ...
Put the process under a supervision tree, so it's started before all the other processes which are going to need the configuration.
Oh, I'm glad nobody suggested parametrized modules so far :)
I'd do option 1 for static configuration. You can always test by setting options via application:set_env/3,4. The reason is that your tests of the application will need to run the whole application anyway at some point, and the ability to set test-specific configuration at that point is really neat.
The application controller runs by default, so it is not a problem that you need to go the application way (you need to do that anyway too!).
Finally, if a process needs specific configuration, say so in the configuration data! You can store any Erlang term; in particular, you can store a term which lets you override configuration parameters for a specific node.
For dynamic configuration, you are probably better off using a gen_server or the newest gproc features, which let you store such dynamic configuration.
I've also seen people use a .hrl (Erlang header file) where all the configuration is defined, and include it at the top of any file that needs configuration.
It makes for very concise configuration lookups, and you get configuration of arbitrary complexity.
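A sketch of what that can look like (hypothetical names and values):

%% config.hrl - all configuration defined in one header file
-define(MAX_WIDGETS, 1000).
-define(DB_HOST, "localhost").

%% in any module that needs configuration:
%% -include("config.hrl").
%% ... then use ?MAX_WIDGETS, ?DB_HOST, etc.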
I believe you can also reload the configuration at runtime by hot code reloading the modules that include the header. The disadvantage is that if you use the configuration in several modules and reload only one of them, only that one module gets its configuration updated.
However, I haven't actually checked whether it works like that, and I couldn't find definitive documentation on how .hrl files and hot code reloading interact, so make sure to double-check this before you actually use it.