The use case I am trying out is a way to initialize a Postgres database after it starts up. I saw the post start hooks in the OpenShift pod lifecycle. I can't pass the SQL statements using a here-document or on the command line (the Docker command fails due to a max-length issue).
So I am looking for an option to save the SQL statements in a file via a ConfigMap and attach it to the post-hook container before it starts, so that the psql command can execute it. I couldn't see a way to attach the volume from the DeploymentConfig in the official documentation. Is there any way I can do it?
Document I referred to: openshift-doc
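A rough sketch of one possible direction, assuming the execNewPod hook's volumes field can reference a ConfigMap-backed volume defined in the pod template (the ConfigMap name init-sql, the mount path, and the file name init.sql are illustrative, not from the original setup):
spec:
  template:
    spec:
      volumes:
        - name: init-sql
          configMap:
            name: init-sql        # ConfigMap holding init.sql
      containers:
        - name: postgres
          volumeMounts:
            - name: init-sql
              mountPath: /opt/init-sql
  strategy:
    recreateParams:
      post:
        failurePolicy: Abort
        execNewPod:
          containerName: postgres
          volumes:
            - init-sql            # copy this pod-template volume into the hook pod
          command:
            - /bin/bash
            - '-c'
            - psql "host=postgres user=postgres password=postgres" -f /opt/init-sql/init.sql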
I found a workaround to pass the long SQL statements to the post lifecycle pods.
Set the SQL statements in a DeploymentConfig ENV variable. These ENV variables are also accessible inside the lifecycle pods, so we can simply run the command below:
post:
  failurePolicy: Abort
  execNewPod:
    command:
      - /bin/bash
      - '-c'
      - >-
        echo $INIT_SQL_STATEMENTS | psql "sslmode=allow
        host=postgres user=postgres password=postgres"
    containerName: postgres
.....
env:
  - name: POSTGRESQL_ADMIN_PASSWORD
    value: postgres
  - name: INIT_SQL_STATEMENTS
    value: >-
      create user haridas with encrypted password 'haridas';...
Another option, which I've employed in the past, is to pass in the SQL statements in a parameters file. This lets you more easily configuration-manage the SQL commands (e.g., check them into git) and declutters your deployment configuration (DC). Here is what I did:
Move your post hook to the DC portion of the template file. Let me know if you need steps on how to export, modify, and re-import a template file, but I didn't want to over-complicate this procedure unnecessarily.
Add a parameter to the template file called SQL_COMMANDS like this:
parameters:
  - description: The SQL commands to run.
    displayName: SQL commands
    name: SQL_COMMANDS
    required: true
In the post hook code of the template (DC section) run the SQL_COMMANDS like this:
execNewPod:
  command:
    - /bin/sh
    - -c
    - echo "${SQL_COMMANDS}" | psql -h ${DATABASE_SERVICE_NAME} -U ${POSTGRESQL_USER} -d ${POSTGRESQL_DATABASE};
Note the other variables in the command are also passed in as parameters.
Create a parameters file similar to this:
POSTGRESQL_USER=postgres
POSTGRESQL_PASSWORD=somepassword
POSTGRESQL_DATABASE=myDatabase
SQL_COMMANDS="CREATE TABLE Configuration(CONFIGURATION_ID character varying(255)
NOT NULL, description character varying(255)
NOT NULL, key character varying(255) NOT NULL, value text NOT NULL,
PRIMARY KEY (CONFIGURATION_ID) ); INSERT INTO Configuration
(CONFIGURATION_ID, description, key, value) VALUES ('10', ... etc."
Deploy your app using the template and pass in the parameters from the file:
oc new-app <template name> --param-file=ParametersFile.txt
I would like to use test databases for feature branches.
Of course it would be best to create a GitLab CI environment on the fly (review-apps style) and also create a test database with the same name on the target system. Unfortunately, this is not possible because the MySQL databases in the target system have fixed names, like xxx_1, xxx_2, etc., and this cannot be changed without moving to a different hosting provider.
So I would like to do something like "grab an empty test database from the given xxx_n and then empty it again when the branch is deleted".
How could this be handled with gitlab ci?
Can I set a variable on the project that says "feature branch Y already uses database xxx_4"?
Or should I put a table into the test database to store this information?
Using dynamic environments/variables and stop jobs might do the trick. Stop jobs run when the environment is "stopped" -- in the case of feature branches without an associated MR, when the feature branch is deleted (or, if there is an open MR for the review app, when the MR is merged or closed).
Can I set a variable on the project that says "feature branch Y already uses database xxx_4"?
One way may be to put the db name directly in the environment name. Then the Environments API keeps track of this.
stages:
  - pre-deploy
  - deploy

determine_database:
  stage: pre-deploy
  image: python:3.9-slim
  script:
    - pip install python-gitlab
    - database_name=$(determine-database) # determine what database names are not currently in use
    - echo "database_name=${database_name}" > vars.env
  artifacts:
    reports: # automatically set $database_name variable in subsequent jobs
      dotenv: "vars.env"

deploy_review_app:
  stage: deploy
  environment:
    name: review/$CI_COMMIT_REF_SLUG/$database_name
    on_stop: teardown
  script:
    - echo "deploying review app for $CI_COMMIT_REF with database name configuration $database_name"
    - ... # steps to actually do the deploy

teardown: # this will trigger when the environment is stopped
  stage: deploy
  variables:
    GIT_STRATEGY: none # ensures this works even if the branch is deleted
  when: manual
  script:
    - echo "tearing down test database $database_name"
    - ... # actual script steps to stop env and cleanup database
  environment:
    name: review/$CI_COMMIT_REF_SLUG/$database_name
    action: "stop"
The implementation of the determine-database command may have to connect to your database to determine what database names are available (or perhaps you have a set of these provisioned in advance). You can then inspect the GitLab environments API to see what database names are still in use (since it's baked into the environment name).
For example, you might have something like this. Here, I am using the python-gitlab API wrapper just because it's most familiar to me, but the same principle can be applied to any method of calling the GitLab REST API.
#!/usr/bin/env python3
import gitlab
import os, sys, random

GITLAB_URL = os.environ['CI_SERVER_URL']
PROJECT_TOKEN = os.environ['MY_PROJECT_TOKEN']  # you generate and add this to your CI/CD variables!
PROJECT_ID = os.environ['CI_PROJECT_ID']
DATABASE_NAMES = ['xxx_1', 'xxx_2', 'xxx_3']  # or determine this programmatically by connecting to the DB

gl = gitlab.Gitlab(GITLAB_URL, private_token=PROJECT_TOKEN)

in_use_databases = []
project = gl.projects.get(PROJECT_ID)
for environment in project.environments.list(state='available', all=True):
    # the in-use database name is the string after the last '/' in the env name
    in_use_db_name = environment.name.split('/')[-1]
    in_use_databases.append(in_use_db_name)

available_databases = [name for name in DATABASE_NAMES if name not in in_use_databases]

if not available_databases:  # bail if all databases are in use
    print('FATAL. no available databases', file=sys.stderr)
    raise SystemExit(1)

# otherwise pick one and output to stdout
db_name = random.choice(available_databases)
# optionally you could prepare the database here, too, instead of relying on the `on_stop` job.
print(db_name)
There is a potential concurrency problem here (two runs of determine_database on different branches could select the same database before either finishes), but that could be addressed with resource locks, as sketched below.
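A minimal sketch of one such lock, using GitLab's built-in resource_group keyword to serialize the selection job across pipelines (the resource group name is arbitrary; this narrows the race window, though the environment itself is only registered later, at deploy time):
determine_database:
  stage: pre-deploy
  resource_group: test-database-selection  # only one pipeline at a time runs this job
  image: python:3.9-slim
  script:
    - pip install python-gitlab
    - database_name=$(determine-database)
    - echo "database_name=${database_name}" > vars.env
  artifacts:
    reports:
      dotenv: "vars.env"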
I have a Snowflake query in Airflow that uses a password in a Jinja template:
create stage if not exists {{ params.dest_database }}.{{ params.stg_schema }}.{{ params.stg_prefix }}blabla_ext_stage
url='{{ params.s3_bucket }}'
credentials=(aws_key_id='{{ params.login }}' aws_secret_key='{{ params.password }}');
The problem is that the password shows up in the query in the log. Is there any way of hiding it?
Why do you want to create a stage on every run of the Airflow job? Create it in Snowflake first and then use it in the DAG run.
If you need to create the stage inside the job, then you can use Snowflake's Storage Integrations for this, as sketched below.
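A rough sketch of that approach (the integration name, role ARN, and bucket are placeholders, not from the question); the stage then references the integration instead of embedding keys:
-- one-time setup, typically by an account admin
create storage integration my_s3_integration
  type = external_stage
  storage_provider = 'S3'
  enabled = true
  storage_aws_role_arn = 'arn:aws:iam::123456789012:role/my-snowflake-role'
  storage_allowed_locations = ('s3://my-bucket/');

-- the stage no longer needs inline credentials
create stage if not exists {{ params.dest_database }}.{{ params.stg_schema }}.{{ params.stg_prefix }}blabla_ext_stage
  url = '{{ params.s3_bucket }}'
  storage_integration = my_s3_integration;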
The providers' operators work with hooks that automatically hide the sensitive parts of the credentials in the logs. For example, you can check S3ToRedshiftOperator, which uses the S3 hook to get the credentials. I don't know how you're using that template, but I highly recommend using the same pattern with hooks to prevent the secret key from showing up in the logs.
This is what it's showing for me:
[2021-08-17 07:40:33,584] {{base.py:78}} INFO - Using connection to: id: redshift. Host: my-cluster-readable.us-east-1.redshift.amazonaws.com, Port: 5439, Schema: schema, Login: my_login_readable, Password: ***, extra: {}
[2021-08-17 07:40:33,601] {{dbapi.py:204}} INFO - Running statement:
COPY schema.table1
FROM 's3://s3-bucket-xxx/folder1/folder2/'
with credentials
'aws_access_key_id=MY_ACCESS_KEY_THAT_IS_READABLE;aws_secret_access_key=***'
IGNOREHEADER 1
DELIMITER ','
FORMAT CSV
EMPTYASNULL
BLANKSASNULL
ROUNDEC
TRUNCATECOLUMNS
TRIMBLANKS
GZIP;
, parameters: None
[2021-08-17 07:40:43,105] {{redshift_operators.py:122}} INFO - COPY command complete...
As you can see there, both the connection Password and the aws_secret_access_key are shown as *** (masked automatically by Airflow using the hook).
My recommendation would be to use exactly the same logic like this:
from airflow.providers.amazon.aws.hooks.s3 import S3Hook
from airflow.providers.amazon.aws.utils.redshift import build_credentials_block
# More code here
s3_hook = S3Hook(aws_conn_id="your_conn_id")
credentials = s3_hook.get_credentials()
credentials_block = build_credentials_block(credentials)
# Invoke the template here using the credentials_block as a param
I have a cypher script file and I would like to run it directly.
All answers I could find on SO, to the best of my knowledge, use the command neo4j-shell, which in my version (Neo4j server 3.5.5) seems to be deprecated and replaced by the command cypher-shell.
Using the command sudo ./neo4j-community-3.5.5/bin/cypher-shell --help I got the following instructions.
usage: cypher-shell [-h] [-a ADDRESS] [-u USERNAME] [-p PASSWORD]
[--encryption {true,false}]
[--format {auto,verbose,plain}] [--debug] [--non-interactive] [--sample-rows SAMPLE-ROWS]
[--wrap {true,false}] [-v] [--driver-version] [--fail-fast | --fail-at-end] [cypher]
A command line shell where you can execute Cypher against an
instance of Neo4j. By default the shell is interactive but you can
use it for scripting by passing cypher directly on the command
line or by piping a file with cypher statements (requires Powershell
on Windows).
My file is the following; it tries to create a graph from CSV files and comes from the book "Graph Algorithms".
WITH "https://github.com/neo4j-graph-analytics/book/raw/master/data" AS base
WITH base + "transport-nodes.csv" AS uri
LOAD CSV WITH HEADERS FROM uri AS row
MERGE (place:Place {id:row.id})
SET place.latitude = toFloat(row.latitude),
place.longitude = toFloat(row.latitude),
place.population = toInteger(row.population)
WITH "https://github.com/neo4j-graph-analytics/book/raw/master/data/" AS base
WITH base + "transport-relationships.csv" AS uri
LOAD CSV WITH HEADERS FROM uri AS row
MATCH (origin:Place {id: row.src})
MATCH (destination:Place {id: row.dst})
MERGE (origin)-[:EROAD {distance: toInteger(row.cost)}]->(destination)
When I try to pass the file directly with the command:
sudo ./neo4j-community-3.5.5/bin/cypher-shell neo_4.cypher
first it asks for a username and password, but after typing the correct password (a wrong password results in the error The client is unauthorized due to authentication failure.) I get the error:
Invalid input 'n': expected <init> (line 1, column 1 (offset: 0))
"neo_4.cypher"
^
When I try piping with the command:
sudo cat neo_4.cypher| sudo ./neo4j-community-3.5.5/bin/cypher-shell -u usr -p 'pwd'
no output is generated and no graph either.
How to run a cypher script file with the neo4j command cypher-shell?
Use cypher-shell -f yourscriptname. Check with --help for more description.
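For example, with the binary path and credentials from the question (assuming the -f/--file flag is available in this cypher-shell build):
sudo ./neo4j-community-3.5.5/bin/cypher-shell -u usr -p 'pwd' -f neo_4.cypher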
I think the key is here:
cypher-shell --help
... Stuff deleted
positional arguments:
cypher an optional string of cypher to execute and then exit
This means that the parameter is actual cypher code, not a file name. Thus, this works:
GMc#linux-ihon:~> cypher-shell "match(n) return n;"
username: neo4j
password: ****
+-----------------------------+
| n |
+-----------------------------+
| (:Job {jobName: "Job01"}) |
| (:Job {jobName: "Job02"}) |
But this doesn't (because the text "neo_4.cypher" isn't a valid cypher query):
cypher-shell neo_4.cypher
The help also says:
example of piping a file:
cat some-cypher.txt | cypher-shell
So:
cat neo_4.cypher | cypher-shell
should work. Possibly your problem is all of the sudos, specifically the cat ... | sudo cypher-shell. It is possible that sudo is protecting cypher-shell from the piped input (although it doesn't seem to do so on my system).
If you really need to use sudo to run cypher, try using the following:
sudo cypher-shell arguments_as_needed < neo_4.cypher
Oh, also, your script doesn't have a RETURN clause, so it probably won't display any data, but you should still see the summary reports of records loaded.
Perhaps try something simpler first, such as a simple match ... return ... query in your script.
Oh, and don't forget to terminate the cypher query with a semi-colon!
The problem is in the cypher file: each statement should end with a semicolon (;). I still need sudo to run the program.
The file taken from the book actually seems to contain other errors as well.
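For illustration, a minimal sketch of that fix, showing only the tail of each of the two statements in the file, each now terminated with a semicolon:
// end of the first LOAD CSV statement -- note the terminating semicolon
  place.population = toInteger(row.population);

// end of the second LOAD CSV statement -- also terminated with a semicolon
MERGE (origin)-[:EROAD {distance: toInteger(row.cost)}]->(destination);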
I'm using Rundeck 3.0.7 with Ansible 2.7 and can't figure out the correct syntax to pass variables to my Ansible playbook. If I run it from the command line, it works fine.
ansible-playbook test-playbook.yml -i hosts -e "FirstName=John LastName=Doe OfficePhone=365"
However, when I add those vars to the "Extra Variables" section of the Rundeck job, I add the following and it doesn't work.
-e "FirstName=John LastName=Doe OfficePhone=365"
Does anyone know the proper syntax?
In your workflow define your playbook extra arguments using options like:
-e "test1=${option.test1} test2=${option.test2}"
That way the values of the options are passed to those variable names in the arguments to ansible-playbook.
You can also use the "Extra Variables" section with YAML syntax, e.g.:
FirstName: John
LastName: Doe
OfficePhone: 365
This can also be done with variables passed from Rundeck to Ansible:
Extra Variables:
FirstName: ${option.firstname}
LastName: ${option.lastname}
OfficePhone: ${option.phone}
Then of course you need the Rundeck options defined.
Source
I had a couple of nodes in my Chef server that had a problem while bootstrapping and missed the FQDN and domain automatic attributes, due to which they were not indexed by Solr and not searchable by knife. I could not re-bootstrap these machines, but I wanted to fix this, and it took me a while to do so. Therefore I am posting this, hoping that it will save others some time.
Automatic attributes are stored by Chef in the database and are not editable by knife (see Chef Attributes Overview). They are stored in Chef's database in a column named serialized_object in the nodes table, in hex; it is in fact a gzipped JSON string.
To obtain the JSON string:
Use a PostgreSQL client to connect to the Chef PostgreSQL database (you can find the credentials on the Chef server in /etc/chef-server/chef-server-secrets.json)
Save the contents of serialized_object to a file, say serialized_object.hex (it should look something like '\x1f8b08000...')
Run: xxd -p -r serialized_object.hex > serialized_object.gz
Run: gunzip serialized_object.gz
Now the file serialized_object contains the attributes in JSON format, which you can edit. After editing, you can store its contents back in the Chef server as follows:
Run: gzip serialized_object
Run: xxd -p serialized_object.gz > serialized_object.hex
Now you need to use the PostgreSQL client to insert the hex data (be sure to remove the prefix backslash and x from the hex string) with the following query:
update nodes set serialized_object = decode('1f8b08000...','hex') where name = ''
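Putting the steps together, a rough end-to-end sketch (assuming the default opscode_chef database and role and a node named mynode; take the actual connection details from chef-server-secrets.json):
# dump the column as plain hex (encode() avoids the leading \x)
psql -U opscode_chef -d opscode_chef -t -A \
  -c "SELECT encode(serialized_object, 'hex') FROM nodes WHERE name = 'mynode'" > serialized_object.hex
xxd -p -r serialized_object.hex > serialized_object.gz
gunzip serialized_object.gz
# edit serialized_object (plain JSON), then repack
gzip serialized_object
xxd -p serialized_object.gz | tr -d '\n' > serialized_object.hex
psql -U opscode_chef -d opscode_chef \
  -c "UPDATE nodes SET serialized_object = decode('$(cat serialized_object.hex)', 'hex') WHERE name = 'mynode'"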
Hope this helps someone :)