How can I avoid showing a password Jinja param in the Airflow log? - jinja2

I have a Snowflake query in Airflow that uses a password in a Jinja template:
create stage if not exists {{ params.dest_database }}.{{ params.stg_schema }}.{{ params.stg_prefix }}blabla_ext_stage
url='{{ params.s3_bucket }}'
credentials=(aws_key_id='{{ params.login }}' aws_secret_key='{{ params.password }}');
The problem is that the password shows up in the query in the log. Is there any way of hiding it?

Why do you want to create a stage on every run of the Airflow job? Create it in Snowflake first and then use it in the DAG run.
If you need to create the stage inside the job then you can use Snowflake's Storage Integrations for this.

Provider operators work with hooks that automatically hide sensitive parts of the credentials in the logs. For example, you can check S3ToRedshiftOperator, which uses the S3 hook to get the credentials. I don't know how you're using that template, but I highly recommend following the same pattern with hooks to prevent the secret key from showing in the logs.
This is what it's showing for me:
[2021-08-17 07:40:33,584] {{base.py:78}} INFO - Using connection to: id: redshift. Host: my-cluster-readable.us-east-1.redshift.amazonaws.com, Port: 5439, Schema: schema, Login: my_login_readable, Password: ***, extra: {}
[2021-08-17 07:40:33,601] {{dbapi.py:204}} INFO - Running statement:
COPY schema.table1
FROM 's3://s3-bucket-xxx/folder1/folder2/'
with credentials
'aws_access_key_id=MY_ACCESS_KEY_THAT_IS_READABLE;aws_secret_access_key=***'
IGNOREHEADER 1
DELIMITER ','
FORMAT CSV
EMPTYASNULL
BLANKSASNULL
ROUNDEC
TRUNCATECOLUMNS
TRIMBLANKS
GZIP;
, parameters: None
[2021-08-17 07:40:43,105] {{redshift_operators.py:122}} INFO - COPY command complete...
As you can see there, both the Password for the connection and the aws_secret_access_key are shown as *** (masked automatically by Airflow using the hook).
My recommendation would be to use exactly the same logic, like this:
from airflow.providers.amazon.aws.hooks.s3 import S3Hook
from airflow.providers.amazon.aws.utils.redshift import build_credentials_block
# More code here
s3_hook = S3Hook(aws_conn_id="your_conn_id")
credentials = s3_hook.get_credentials()
credentials_block = build_credentials_block(credentials)
# Invoke the template here using the credentials_block as a param
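For illustration, here is a minimal sketch of how the question's own template could be fed from a hook instead of plaintext params. The connection ids, the SnowflakeOperator wiring, and the param values are assumptions for the example, not taken from the question:

from airflow.providers.amazon.aws.hooks.s3 import S3Hook
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

# Hypothetical connection ids; the AWS keys live in an Airflow connection
# instead of being passed around as plaintext DAG params.
aws_credentials = S3Hook(aws_conn_id="aws_default").get_credentials()

create_stage = SnowflakeOperator(
    task_id="create_blabla_stage",
    snowflake_conn_id="snowflake_default",
    sql="""
        create stage if not exists {{ params.dest_database }}.{{ params.stg_schema }}.{{ params.stg_prefix }}blabla_ext_stage
        url='{{ params.s3_bucket }}'
        credentials=(aws_key_id='{{ params.login }}' aws_secret_key='{{ params.password }}');
    """,
    params={
        "dest_database": "my_db",            # hypothetical values
        "stg_schema": "my_schema",
        "stg_prefix": "tmp_",
        "s3_bucket": "s3://my-bucket/path",
        "login": aws_credentials.access_key,
        "password": aws_credentials.secret_key,
    },
)

Note that fetching credentials at the top of the DAG file runs on every scheduler parse, and whether the rendered statement is redacted in the task log depends on your Airflow version's secrets masking, so treat this as a starting point rather than a guarantee.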

Related

How to connect my Streamlit application with a MySQL database?

So I am trying to use my (continuously updating) database on MySQL with some visualizations which I want to put into my Streamlit app. In other words, I want to use the data from the MySQL database in my Streamlit application.
For this purpose I consulted the official streamlit documentation here.
The problem here is that the tutorial tells me to create a file like this: .streamlit/secrets.toml and fill it with the following information (copy-pasting the syntax):
[
mysql
]
host = "localhost"
port = 3306
database = "xxx"
user = "xxx"
password = "xxx"
Everything was going well up until now, but when I paste my secrets.toml info into the SECRET MANAGEMENT widget (it is prompted when I am creating a new app in Streamlit Cloud) it gives me a syntax error.
Invalid format: please enter valid TOML.
Up until this point I was going by the book (tutorial). To get past this, I tried using only the variable definitions like the following (since I am not aware of the .toml syntax):
db_user = "root"
db_name = "dbname"
db_password = "123abc"
Am I doing this right? Or am I missing something obvious?
With all of that aside, I also need to know how to install dependencies on Streamlit Cloud for my app. For example, I need the mysql-connector-python module but I don't see any console with which I can do that.
NOTE:
This is my first time deploying an app on the cloud
[
mysql
]
It should be [mysql] on one line; the section header cannot be split across lines.
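That is, the widget expects valid TOML such as:
[mysql]
host = "localhost"
port = 3306
database = "xxx"
user = "xxx"
password = "xxx"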
In your GitHub repo, add a requirements.txt file with your dependencies.
Streamlit Cloud will install those packages for your app.
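For example, a minimal requirements.txt for the app described in the question might look like this (the exact package list is an assumption based on what the question mentions):
streamlit
mysql-connector-python
pandas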
I want to point out another way we can use a database within a Streamlit app rather than using the conventional method.
We can refer to this Medium.com article here.
It explains a way in which we can use the pandas library to load a database, and it also updates in real time. With this knowledge, connecting to a database becomes a "Python" problem, not a "Streamlit" problem.
Assuming we are using MySQL
We can, according to the official tutorial for MySQL databases, create a .streamlit/secrets.toml file in which we will store our information (related to our database) as below:
# .streamlit/secrets.toml
[mysql]
host = "localhost"
port = 3306
database = "xxx"
user = "xxx"
password = "xxx"
Also install mysql-connector-python and import it in your application file. You will also need pandas and toml, of course:
pip install mysql-connector-python pandas toml
Here is what each of them does:
| Library | Its use |
| ------- | ------- |
| mysql-connector-python | to connect to our database |
| pandas | to read and convert our database table into a DataFrame |
| toml | to read details from the secrets.toml file |
STEP 1
We read details from secrets.toml
import mysql.connector
import pandas as pd
import toml

# Reading data from the secrets file
toml_data = toml.load(".streamlit/secrets.toml")
# Saving each credential into a variable
HOST_NAME = toml_data['mysql']['host']
DATABASE = toml_data['mysql']['database']
PASSWORD = toml_data['mysql']['password']
USER = toml_data['mysql']['user']
PORT = toml_data['mysql']['port']
STEP 2
Connecting to our Database:
# Using the variables we read from secrets.toml
mydb = mysql.connector.connect(host=HOST_NAME, port=PORT, database=DATABASE,
                               user=USER, password=PASSWORD, use_pure=True)
STEP 3
Making queries from our database:
query = pd.read_sql('SELECT * FROM mytable;' , mydb)
The query variable now holds a display-able table for Streamlit or Jupyter notebooks.
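For instance, a minimal sketch of rendering it in a Streamlit app (assuming the connection code above has already run):
import streamlit as st

# Render the query result as an interactive table in the app
st.dataframe(query)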
Likewise, we can run any MySQL query (using MySQL syntax) we want against our database.
This information is based on my own experience.

SlashDB doesn't commit transaction after executing a stored function in MySQL

I have a SlashDB installation on top of MySQL 5.7. I use it to serve custom REST API calls to allow other people to access the data in the DB. Most of these happen through the 'SQL Pass-thru' feature.
When executing straight SQL queries, changes to the DB are committed immediately. However, this is not true when I execute stored functions (through select [function name]). The function executes perfectly, but any changes to the data are not committed until I issue commit;. The main problem is that this causes stranded locks on tables and other MySQL objects.
Does anybody have any idea what's happening here?
Currently the only workaround is manually adding ?autocommit=true to the connection string in /etc/slashdb/databases.cfg.
For example
myChinook:
  alternate_key: {}
  autoconnect: true
  autoload: true
  autoload_user: {dbpass: chinook, dbuser: chinook}
  connection: 127.0.0.1:3308/Chinook?autocommit=true
  creator: admin
  db_encoding: utf-8
  db_id: myChinook
  db_schema: null
  db_type: mysql
  desc: ''
  excluded_columns: {}
  execute: []
  foreign_keys: {}
  owners: [admin]
  read: []
  write: []
After manually changing the files you need to restart the SlashDB service:
sudo service slashdb stop
sudo service slashdb start
Alternatively, call stored procedures instead of running select on a stored function.

How to attach volume to pod's post start life cycle hook?

The use case I'm trying out is a way to initialize the Postgres database after it starts up. I saw the post-start hooks in the OpenShift pod lifecycle. I can't put the SQL statements in a here-document or on the command line (the Docker command fails due to a max-length issue).
So I'm looking for an option to save the SQL statements in a file via a ConfigMap and attach it to the pod before it starts, so that the psql command can execute them. I couldn't see a way to attach the volume from the DeploymentConfig in the official document. Is there any way I can do it?
The document I referred to - openshift-doc
I found a workaround to pass the long SQL statements to the post-lifecycle pods.
Set the SQL statements in a DeploymentConfig ENV variable. These ENV variables are accessible inside the lifecycle pods as well, so we can then easily run the command below:
post:
  failurePolicy: Abort
  execNewPod:
    command:
      - /bin/bash
      - '-c'
      - >-
        echo $INIT_SQL_STATEMENTS | psql "sslmode=allow
        host=postgres user=postgres password=postgres"
    containerName: postgres
.....
env:
  - name: POSTGRESQL_ADMIN_PASSWORD
    value: postgres
  - name: INIT_SQL_STATEMENTS
    value: >-
      create user haridas with encrypted password 'haridas';...
Another option which I've employed in the past is to pass in the SQL statements in a parameters file. This allows you to more easily configuration-manage the SQL commands (e.g. check them into git) and declutter your deployment configuration (DC). Here is what I did:
Move your post hook to the DC portion of the template file. Let me know if you need steps on how to export, modify, and re-import a template file, but I didn't want to over-complicate this procedure unnecessarily.
Add a parameter to the template file called SQL_COMMANDS like this:
parameters:
  - description: The SQL commands to run.
    displayName: SQL commands
    name: SQL_COMMANDS
    required: true
In the post hook code of the template (DC section) run the SQL_COMMANDS like this:
execNewPod:
  command:
    - /bin/sh
    - -c
    - echo "${SQL_COMMANDS}" | psql -h ${DATABASE_SERVICE_NAME} -U ${POSTGRESQL_USER} -d ${POSTGRESQL_DATABASE};
Note the other variables in the command are also passed in as parameters.
Create a parameters file similar to this:
POSTGRESQL_USER=postgres
POSTGRESQL_PASSWORD=somepassword
POSTGRESQL_DATABASE=myDatabase
SQL_COMMANDS="CREATE TABLE Configuration(CONFIGURATION_ID character varying(255)
NOT NULL, description character varying(255)
NOT NULL, key character varying(255) NOT NULL, value text NOT NULL,
PRIMARY KEY (CONFIGURATION_ID) ); INSERT INTO Configuration
(CONFIGURATION_ID, description, key, value) VALUES ('10', ... etc."
Deploy your app using the template and pass in the parameters from the file:
oc new-app <template name> --param-file=ParametersFile.txt

Boto3 Error: botocore.exceptions.NoCredentialsError: Unable to locate credentials

When I simply run the following code, I always get this error.
import uuid

import boto3

s3 = boto3.resource('s3')
bucket_name = "python-sdk-sample-%s" % uuid.uuid4()
print("Creating new bucket with name:", bucket_name)
s3.create_bucket(Bucket=bucket_name)
I have saved my credential file in
C:\Users\myname\.aws\credentials, from where Boto should read my credentials.
Is my setting wrong?
Here is the output from boto3.set_stream_logger('botocore', level='DEBUG').
2015-10-24 14:22:28,761 botocore.credentials [DEBUG] Skipping environment variable credential check because profile name was explicitly set.
2015-10-24 14:22:28,761 botocore.credentials [DEBUG] Looking for credentials via: env
2015-10-24 14:22:28,773 botocore.credentials [DEBUG] Looking for credentials via: shared-credentials-file
2015-10-24 14:22:28,774 botocore.credentials [DEBUG] Looking for credentials via: config-file
2015-10-24 14:22:28,774 botocore.credentials [DEBUG] Looking for credentials via: ec2-credentials-file
2015-10-24 14:22:28,774 botocore.credentials [DEBUG] Looking for credentials via: boto-config
2015-10-24 14:22:28,774 botocore.credentials [DEBUG] Looking for credentials via: iam-role
Try specifying the keys manually:
s3 = boto3.resource('s3',
                    aws_access_key_id=ACCESS_ID,
                    aws_secret_access_key=ACCESS_KEY)
Make sure you don't include your ACCESS_ID and ACCESS_KEY in the code directly, for security reasons.
Consider using environment configs and injecting them into the code, as suggested by @Tiger_Mike.
For Prod environments consider using rotating access keys:
https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_RotateAccessKey
I had the same issue and found out that the format of my ~/.aws/credentials file was wrong.
It worked with a file containing:
[default]
aws_access_key_id=XXXXXXXXXXXXXX
aws_secret_access_key=YYYYYYYYYYYYYYYYYYYYYYYYYYY
Note that there must be a profile named "[default]". Some official documentation makes reference to a profile named "[credentials]", which did not work for me.
If you are looking for an alternative way, try adding your credentials using the AWS CLI.
From the terminal, type:
aws configure
Then fill in your keys and region.
Make sure your ~/.aws/credentials file in Unix looks like this:
[MyProfile1]
aws_access_key_id = yourAccessId
aws_secret_access_key = yourSecretKey
[MyProfile2]
aws_access_key_id = yourAccessId
aws_secret_access_key = yourSecretKey
Your Python script should look like this, and it'll work:
from __future__ import print_function
import boto3
import os
os.environ['AWS_PROFILE'] = "MyProfile1"
os.environ['AWS_DEFAULT_REGION'] = "us-east-1"
ec2 = boto3.client('ec2')
# Retrieves all regions/endpoints that work with EC2
response = ec2.describe_regions()
print('Regions:', response['Regions'])
Source: https://boto3.readthedocs.io/en/latest/guide/configuration.html#interactive-configuration.
I also had the same issue. It can be solved by creating config and credentials files in the home directory. Below are the steps I followed to solve this issue.
Create a config file:
touch ~/.aws/config
And in that file I entered the region
[default]
region = us-west-2
Then create the credential file:
touch ~/.aws/credentials
Then enter your credentials
[Profile1]
aws_access_key_id = XXXXXXXXXXXXXXXXXXXX
aws_secret_access_key = YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
After setting all these up, here is my Python file to connect to the bucket. Running this file will list all the contents.
import boto3
import os
os.environ['AWS_PROFILE'] = "Profile1"
os.environ['AWS_DEFAULT_REGION'] = "us-west-2"
s3 = boto3.client('s3', region_name='us-west-2')
print("[INFO:] Connecting to cloud")
# Retrieves all buckets visible to these credentials
response = s3.list_buckets()
print('Buckets:', response)
You can also refer below links:
Amazon S3 with Python Boto3 Library
Boto 3 documentation
Boto3: Amazon S3 as Python Object Store
From the terminal, type:
aws configure
Then fill in your keys and region.
After this, the next step is to use a session for your chosen environment. You can have multiple keys depending on your account and manage multiple environments or keys this way:
import boto3
aws_session = boto3.Session(profile_name="prod")
# Create an S3 client
s3 = aws_session.client('s3')
Create an S3 client object with your credentials:
import boto3

AWS_S3_CREDS = {
    "aws_access_key_id": "your access key",         # os.getenv("AWS_ACCESS_KEY")
    "aws_secret_access_key": "your aws secret key"  # os.getenv("AWS_SECRET_KEY")
}
s3_client = boto3.client('s3', **AWS_S3_CREDS)
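Usage is then the same as with any other client, for example:
# List bucket names using the explicitly-credentialed client
response = s3_client.list_buckets()
print([bucket['Name'] for bucket in response['Buckets']])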
It is always good to get credentials from the OS environment.
To set Environment variables run the following commands in terminal
If Linux or Mac:
$ export AWS_ACCESS_KEY="aws_access_key"
$ export AWS_SECRET_KEY="aws_secret_key"
If Windows:
C:\> set AWS_ACCESS_KEY="aws_access_key"
C:\> set AWS_SECRET_KEY="aws_secret_key"
Exporting the credentials also works. In Linux:
export AWS_SECRET_ACCESS_KEY="XXXXXXXXXXXX"
export AWS_ACCESS_KEY_ID="XXXXXXXXXXX"
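Once those variables are exported, boto3 picks them up from the environment without any explicit wiring, for example:
import boto3

# boto3 reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment
s3 = boto3.client('s3', region_name='us-east-1')  # the region is a hypothetical choice
print([bucket['Name'] for bucket in s3.list_buckets()['Buckets']])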
These instructions are for a Windows machine with a single user profile for AWS. Make sure your ~/.aws/credentials file looks like this:
[profile_name]
aws_access_key_id = yourAccessId
aws_secret_access_key = yourSecretKey
I had to set the AWS_DEFAULT_PROFILE environment variable to the profile_name found in your credentials.
Then my Python was able to connect, e.g. from here:
import boto3
# Let's use Amazon S3
s3 = boto3.resource('s3')
# Print out bucket names
for bucket in s3.buckets.all():
    print(bucket.name)
I work for a large corporation and encountered this same error, but needed a different work around. My issue was related to proxy settings. I had my proxy set up so I needed to set my no_proxy to whitelist AWS before I was able to get everything to work. You can set it in your bash script as well if you don't want to muddy up your Python code with os settings.
Python:
import os
os.environ["NO_PROXY"] = "s3.amazonaws.com"
Bash:
export no_proxy="s3.amazonaws.com"
Edit: The above assumes a US East S3 region. For other regions, use s3.[region].amazonaws.com, where region is something like us-east-1 or us-west-2.
If you have multiple aws profiles in ~/.aws/credentials like...
[Profile 1]
aws_access_key_id = *******************
aws_secret_access_key = ******************************************
[Profile 2]
aws_access_key_id = *******************
aws_secret_access_key = ******************************************
Follow two steps:
Make the one you want to use the default by running export AWS_DEFAULT_PROFILE="Profile 1" in the terminal (the name needs quotes because it contains a space).
Make sure to run the above command in the same terminal from which you use boto3 or open an editor. (Understand the following scenario.)
Scenario:
If you have two terminals open, called t1 and t2, and you run the export command in t1 but open JupyterLab or any other editor from t2, you will get the NoCredentialsError: Unable to locate credentials error.
Solution:
Run the export command in t1 and then open JupyterLab or any other editor from the same terminal t1.
In the case of MLflow, a call to mlflow.log_artifact() will raise this error if you cannot write to the AWS S3/MinIO data lake.
The reason is not having set up the credentials in your Python env (as these two env vars):
os.environ['DATA_AWS_ACCESS_KEY_ID'] = 'login'
os.environ['DATA_AWS_SECRET_ACCESS_KEY'] = 'password'
Note that you may also access MLflow artifacts directly using the MinIO client (which requires a separate connection to the data lake, apart from MLflow's connection). This client can be started like this:
import os
import minio

minio_client_mlflow = minio.Minio(
    os.environ['MLFLOW_S3_ENDPOINT_URL'].split('://')[1],
    access_key=os.environ['AWS_ACCESS_KEY_ID'],
    secret_key=os.environ['AWS_SECRET_ACCESS_KEY'],
    secure=False)
I have solved the problem like this:
aws configure
Afterwards I manually entered:
AWS Access Key ID [None]: xxxxxxxxxx
AWS Secret Access Key [None]: xxxxxxxxxx
Default region name [None]: us-east-1
Default output format [None]: just hit enter
After that it worked for me
Boto3 is looking for the credentials in a folder like:
C:\ProgramData\Anaconda3\envs\tensorflow\Lib\site-packages\botocore\.aws
You should save two files in this folder: credentials and config.
You may want to check out the general order in which boto3 searches for credentials in this link. Look under the "Configuring Credentials" subheading.
If you're sure you configured your AWS correctly, just make sure the user running the project can read from ~/.aws, or just run your project as root.
I just had this problem. This is what worked for me:
pip install botocore==1.13.20
Source: https://github.com/boto/botocore/issues/1892
In case of using AWS: in my case I had to add the following policy to the IAM role to allow EC2 tags to be read by the EC2 instances. That eliminated the Unable to locate credentials error:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "ec2:DescribeTags",
            "Resource": "*"
        }
    ]
}
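Once a policy like this is attached to the instance's role, boto3 running on that instance can call the allowed API through the instance profile with no key material on disk. A minimal sketch (the region is a hypothetical choice):
import boto3

# On an EC2 instance with the role attached, boto3 falls back to the
# instance-metadata credentials automatically.
ec2 = boto3.client('ec2', region_name='us-east-1')
print(ec2.describe_tags()['Tags'])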

Using MySQL on Openshift with Symfony 2

I added MySQL and phpMyAdmin cartridges to my OpenShift PHP app.
After the MySQL cartridge was added I saw the page which says:
Connection URL: mysql://$OPENSHIFT_MYSQL_DB_HOST:$OPENSHIFT_MYSQL_DB_PORT/
but I have no idea what it means.
When I access the MySQL database through phpMyAdmin,
I see 127.8.111.1 as the db host, so I configured my Symfony 2 app (parameters.yml):
parameters:
    database_driver: pdo_mysql
    database_host: 127.8.111.1
    database_port: 3306
    database_name: <some_database>
    database_user: admin
    database_password: <some_password>
Now when I access my web page it throws an error, which I believe is related to the MySQL connection. Can someone show me the proper way of doing the above?
EDIT: It seems the MySQL connection works fine, but somehow
Error 101 (net::ERR_CONNECTION_RESET): Unknown error
is thrown.
The code I use, which works very well to make my apps work both on localhost and OpenShift without changing database config parameters every time I move between them, is this:
<?php
# app/config/params.php
if (getenv("OPENSHIFT_APP_NAME") != '') {
    $container->setParameter('database_host', getenv("OPENSHIFT_MYSQL_DB_HOST"));
    $container->setParameter('database_port', getenv("OPENSHIFT_MYSQL_DB_PORT"));
    $container->setParameter('database_name', getenv("OPENSHIFT_APP_NAME"));
    $container->setParameter('database_user', getenv("OPENSHIFT_MYSQL_DB_USERNAME"));
    $container->setParameter('database_password', getenv("OPENSHIFT_MYSQL_DB_PASSWORD"));
}
?>
This will tell the app that if it is in the OpenShift environment it needs to load a different username, host, database, etc.
Then you have to import this file (params.php) from your app/config/config.yml file:
imports:
    - { resource: parameters.yml }
    - { resource: security.yml }
    - { resource: params.php }
...
And that's it. You will never have to touch this file or parameters.yml when you move on openshift or localhost.
Connection URL: mysql://$OPENSHIFT_MYSQL_DB_HOST:$OPENSHIFT_MYSQL_DB_PORT/
OpenShift exposes environment variables to your application containing the host and port information for your database. You should reference these environment variables in your configuration instead of hard-coding values. I am not a Symfony expert, but it looks to me like you would need to do the following in order to use this information in your app:
Create a pre-start hook for your application and export variables in Symfony's expected format. Add the following to the .openshift/action_hooks/pre_start_php-5.3 file in your application's git repo:
export SYMFONY__DATABASE__HOST=$OPENSHIFT_MYSQL_DB_HOST
export SYMFONY__DATABASE__PORT=$OPENSHIFT_MYSQL_DB_PORT
Symfony uses this pattern to identify external configuration in the environment, and will make this configuration available for use in your YAML configuration:
parameters:
    database_driver: pdo_mysql
    database_host: "%database.host%"
    database_port: "%database.port%"
EDIT:
Another option to expose this information for use in the YAML configuration is to import a PHP file in your app/config/config.yml:
imports:
    - { resource: parameters.php }
In app/config/parameters.php:
$container->setParameter('database.host', getEnv("OPENSHIFT_MYSQL_DB_HOST"));
$container->setParameter('database.port', getEnv("OPENSHIFT_MYSQL_DB_PORT"));