Azure Batch: Elevating the user privileges during pool creation using Azure CLI - azure-cli
I need to mount Azure File Storage on Linux pools when they are being spun up. I am following the instructions given here to achieve that: mounting Azure File Storage to Batch. Specifically, in my Azure CLI script, under the pool's start command, I am inserting something that looks like this:
--start-task-command-line="apt-get update && apt-get install cifs-utils && mkdir -p {} && mount -t cifs {} {} -o vers=3.0,username={},password={},dir_mode=0777,file_mode=0777,serverino".format(_COMPUTE_NODE_MOUNT_POINT, _STORAGE_ACCOUNT_SHARE_ENDPOINT, _COMPUTE_NODE_MOUNT_POINT, _STORAGE_ACCOUNT_NAME, _STORAGE_ACCOUNT_KEY)
But when I run the tasks as the auto-user that Batch uses by default, I get an error in the stderr.txt file saying that it was unable to create the "/mnt/MyAzureFileshare" directory, so my guess is that the mount did not happen during pool creation. I saw a very similar question to the one I am facing: setting custom user identity for tasks, and the official Microsoft documentation also covers this in detail: Run tasks under user accounts in Batch. But none of them shed light on how to achieve this using the Azure CLI.
Installing the packages needed to mount Azure File Storage requires sudo privileges, and I am unable to get them through the Azure CLI. To recreate the error, have a look at this: app to replicate the issue
What I want to achieve is:
1) Create a pool with Azure File Storage mounted on it, and elevate the privileges of the auto-user to admin level, using the Azure CLI
2) Run tasks as the same auto-user with admin privileges using the Azure CLI
Update 1:
I was able to mount Azure File Storage with Batch using the Azure CLI. I am still not able to populate the Azure File Storage with the output files of the app that I deployed on the Batch nodes. I get no errors in the stderr.txt files.
The output of the stderr.txt file is:
WARNING: In "login" auth mode, the following arguments are ignored: --account-key
Alive[################################################################] 100.0000%
Finished[#############################################################] 100.0000%
pdf--->png: 0%| | 0/1 [00:00<?, ?it/s]
pdf--->png: 100%|##########| 1/1 [00:00<00:00, 1.16it/s]WARNING: In "login" auth mode, the following arguments are ignored: --account-key
WARNING: uploading /mnt/batch/tasks/workitems/pdf-processing-job-2018-10-29-15-36-15/job-1/mytask-0/wd/png_files-2018-10-29-15-39-25/akronbeaconjournal_20180108_AkronBeaconJournal_0___page---0.png
Alive[################################################################] 100.0000%
Finished[#############################################################] 100.0000%
The Python App that was deployed on the Batch Nodes is:
import os
import fitz
import subprocess
import argparse
import time
from tqdm import tqdm
import sentry_sdk
import sys
import datetime
def azure_active_directory_login(azure_username,azure_password,azure_tenant):
    try:
        azure_login_output=subprocess.check_output(["az","login","--service-principal","--username",azure_username,"--password",azure_password,"--tenant",azure_tenant])
    except subprocess.CalledProcessError:
        sentry_sdk.capture_message("Invalid Azure Login Credentials")
        sys.exit("Invalid Azure Login Credentials")
def download_from_azure_blob(azure_storage_account,azure_storage_account_key,input_azure_container,file_to_process,pdf_docs_path):
    file_to_download=os.path.join(input_azure_container,file_to_process)
    try:
        subprocess.check_output(["az","storage","blob","download","--container-name",input_azure_container,"--file",os.path.join(pdf_docs_path,file_to_process),"--name",file_to_process,"--account-key",azure_storage_account_key,\
            "--account-name",azure_storage_account,"--auth-mode","login"])
    except subprocess.CalledProcessError:
        sentry_sdk.capture_message("unable to download the pdf file")
        sys.exit("unable to download the pdf file")
def pdf_to_png(input_folder_path,output_folder_path):
    pdf_files=[x for x in os.listdir(input_folder_path) if x.endswith((".pdf",".PDF"))]
    pdf_files.sort()
    for pdf in tqdm(pdf_files,desc="pdf--->png"):
        doc=fitz.open(os.path.join(input_folder_path,pdf))
        page_count=doc.pageCount
        for f in range(page_count):
            page=doc.loadPage(f)
            pix = page.getPixmap()
            if pdf.endswith(".pdf"):
                png_filename=pdf.split(".pdf")[0]+"___"+"page---"+str(f)+".png"
                pix.writePNG(os.path.join(output_folder_path,png_filename))
            elif pdf.endswith(".PDF"):
                png_filename=pdf.split(".PDF")[0]+"___"+"page---"+str(f)+".png"
                pix.writePNG(os.path.join(output_folder_path,png_filename))
def upload_to_azure_blob(azure_storage_account,azure_storage_account_key,output_azure_container,png_docs_path):
    try:
        subprocess.check_output(["az","storage","blob","upload-batch","--destination",output_azure_container,"--source",png_docs_path,"--account-key",azure_storage_account_key,\
            "--account-name",azure_storage_account,"--auth-mode","login"])
    except subprocess.CalledProcessError:
        sentry_sdk.capture_message("Unable to upload file to the container")
def upload_to_fileshare(png_docs_path):
    try:
        subprocess.check_output(["cp","-r",png_docs_path,"/mnt/MyAzureFileShare/"])
    except subprocess.CalledProcessError:
        sentry_sdk.capture_message("unable to upload to azure file share ")
if __name__=="__main__":
    #Credentials
    sentry_sdk.init("<Sentry Creds>")
    azure_username=<azure_username>
    azure_password=<azure_password>
    azure_tenant=<azure_tenant>
    azure_storage_account=<azure_storage_account>
    azure_storage_account_key=<azure_account_key>
    try:
        parser = argparse.ArgumentParser()
        parser.add_argument("input_azure_container",type=str,help="Location to download files from")
        parser.add_argument("output_azure_container",type=str,help="Location to upload files to")
        parser.add_argument("file_to_process",type=str,help="file link in azure blob storage")
        args = parser.parse_args()
        timestamp = time.time()
        timestamp_humanreadable= datetime.datetime.fromtimestamp(timestamp).strftime('%Y-%m-%d-%H-%M-%S')
        task_working_dir=os.getcwd()
        file_to_process=args.file_to_process
        input_azure_container=args.input_azure_container
        output_azure_container=args.output_azure_container
        pdf_docs_path=os.path.join(task_working_dir,"pdf_files"+"-"+timestamp_humanreadable)
        png_docs_path=os.path.join(task_working_dir,"png_files"+"-"+timestamp_humanreadable)
        os.mkdir(pdf_docs_path)
        os.mkdir(png_docs_path)
    except Exception as e:
        sentry_sdk.capture_exception(e)
    azure_active_directory_login(azure_username,azure_password,azure_tenant)
    download_from_azure_blob(azure_storage_account,azure_storage_account_key,input_azure_container,file_to_process,pdf_docs_path)
    pdf_to_png(pdf_docs_path,png_docs_path)
    upload_to_azure_blob(azure_storage_account,azure_storage_account_key,output_azure_container,png_docs_path)
    upload_to_fileshare(png_docs_path)
upload_to_fileshare() in the Python app above should copy the PNG folder to the file share, but in my case nothing happens, and the stderr.txt files show no error from the copy operation.
Please let me know a way to troubleshoot this issue.
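One way to make the copy step observable is to check its preconditions explicitly and report to stderr, since the except branch in upload_to_fileshare() reports only to Sentry. Two things worth checking: the question spells the mount point "/mnt/MyAzureFileshare" in one place and "/mnt/MyAzureFileShare" in the code (Linux paths are case-sensitive), and if that path exists as a plain directory because the mount itself failed, cp succeeds silently onto the node's local disk. The sketch below is a hypothetical replacement (copy_to_fileshare and its require_mount flag are illustrative names, not part of the original app); it uses os.path.ismount to catch the second case and shutil.copytree with dirs_exist_ok, which needs Python 3.8 or newer.

```python
import os
import shutil
import sys

# Must match the start task's mount point exactly; Linux paths are case-sensitive.
DEFAULT_MOUNT_POINT = "/mnt/MyAzureFileShare"


def copy_to_fileshare(src_dir, mount_point=DEFAULT_MOUNT_POINT, require_mount=True):
    """Copy src_dir into the share, failing loudly instead of silently."""
    # A plain directory at this path (mount failed) would otherwise receive
    # the files on local disk with no error at all.
    if require_mount and not os.path.ismount(mount_point):
        raise RuntimeError(
            "{} is not a mount point; was the share mounted by the start task?".format(mount_point)
        )
    if not os.access(mount_point, os.W_OK):
        raise PermissionError("{} is not writable by the task user".format(mount_point))
    dest = os.path.join(mount_point, os.path.basename(os.path.normpath(src_dir)))
    shutil.copytree(src_dir, dest, dirs_exist_ok=True)  # Python 3.8+
    copied = sorted(os.listdir(dest))
    # Written to stderr so it shows up in the task's stderr.txt
    print("copied {} file(s) to {}".format(len(copied), dest), file=sys.stderr)
    return copied
```

Calling copy_to_fileshare(png_docs_path) in place of upload_to_fileshare(png_docs_path) turns a silent no-op into either a log line listing the copied files or a traceback in stderr.txt.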
It does not look like the run-elevated setting is exposed as a command-line argument in the CLI. You can, however, pass a JSON file formatted as the REST API object to the --json-file argument, which gives you access to all of the functionality.
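As a sketch of what such a JSON file could contain, the snippet below builds pool and task bodies in the REST API shape, with the start task and the task both running as the pool-scoped auto-user with elevationLevel "admin". The vmSize, image, node agent SKU and node count are hypothetical placeholders; check them against what your Batch account supports. Because Batch does not run command lines through a shell by default, the && chain is wrapped in /bin/bash -c, and -y is added so apt-get does not wait for a confirmation prompt.

```python
import json
import shlex


def build_mount_command(mount_point, share_endpoint, account_name, account_key):
    # Same steps as the question's start task, plus -y for a non-interactive install
    steps = (
        "apt-get update && apt-get install -y cifs-utils && "
        "mkdir -p {mp} && "
        "mount -t cifs {ep} {mp} -o vers=3.0,username={user},password={key},"
        "dir_mode=0777,file_mode=0777,serverino"
    ).format(mp=mount_point, ep=share_endpoint, user=account_name, key=account_key)
    # Batch does not invoke a shell by default, so hand the chain to bash explicitly
    return "/bin/bash -c " + shlex.quote(steps)


def build_pool_spec(pool_id, mount_command):
    return {
        "id": pool_id,
        "vmSize": "STANDARD_D2_V3",  # hypothetical size
        "virtualMachineConfiguration": {
            # hypothetical image; use one your Batch account supports
            "imageReference": {
                "publisher": "Canonical",
                "offer": "UbuntuServer",
                "sku": "18.04-LTS",
                "version": "latest",
            },
            "nodeAgentSKUId": "batch.node.ubuntu 18.04",
        },
        "targetDedicatedNodes": 1,  # hypothetical
        "startTask": {
            "commandLine": mount_command,
            # Run the start task as the pool-wide auto-user with admin elevation
            "userIdentity": {"autoUser": {"scope": "pool", "elevationLevel": "admin"}},
            "waitForSuccess": True,
        },
    }


def build_task_spec(task_id, command_line):
    return {
        "id": task_id,
        "commandLine": command_line,
        # Same pool-scoped, elevated auto-user as the start task
        "userIdentity": {"autoUser": {"scope": "pool", "elevationLevel": "admin"}},
    }
```

Dump each dict with json.dump to, say, pool.json and task.json, then pass them with az batch pool create --json-file pool.json and az batch task create --job-id <job-id> --json-file task.json.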
Related
"Unable to read file [input.json]: [Errno 2] No such file or directory: 'input.json'" When running local predict
I am following the Google Cloud guide to upload my prediction model via this guide: https://cloud.google.com/ml-engine/docs/scikit/quickstart?fbclid=IwAR1HjyGajUj-IiiEeshmViiN3GK97sJgwO1h4O0R3yVYubtwlNOhd1v-0Vs Both the input.json file and the model.pkl file are placed directly in my bucket. When trying to do a local prediction test with:
gcloud ai-platform local predict --model-dir=$MODEL_DIR \
  --json-instances $INPUT_FILE \
  --framework $FRAMEWORK
I get the following error: "Unable to read file [input.json]: [Errno 2] No such file or directory: 'input.json'" Can anyone help me out?
${INPUT_FILE} must be a local file in the directory (pwd) from which you run the gcloud command. The gcloud command uploads ${INPUT_FILE} from your local machine to the prediction service. The prediction service uses your model (which is in the Cloud Storage bucket ${MODEL_DIR}).
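Following that answer, a small pre-flight check can catch the problem before gcloud runs: reject a gs:// path for --json-instances and resolve a relative name against the current directory. This is a sketch; build_local_predict_cmd is a hypothetical helper, and it only assembles the argv for the command shown in the question (the flags are the ones used there).

```python
import os


def build_local_predict_cmd(model_dir, input_file, framework):
    """Return argv for `gcloud ai-platform local predict`, validating the input file."""
    if input_file.startswith("gs://"):
        raise ValueError("--json-instances needs a local file; copy it down from the bucket first")
    path = os.path.abspath(input_file)  # relative names resolve against the current directory
    if not os.path.isfile(path):
        raise FileNotFoundError("{} not found (cwd is {})".format(path, os.getcwd()))
    return [
        "gcloud", "ai-platform", "local", "predict",
        "--model-dir={}".format(model_dir),
        "--json-instances", path,
        "--framework", framework,
    ]
```

The returned list can be passed to subprocess.run(cmd, check=True); the error message then names the absolute path it looked for, which makes a wrong working directory obvious.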
Export .MWB to working .SQL file using command line
We recently installed a server dedicated to unit tests, which deploys updates automatically via Jenkins when commits are made, and sends mails when a regression is noticed. This requires our database to always be up to date.
Since the database schema reference is our MWB, we added some scripts during deploy which export the .mwb to a .sql (using Python). This worked fine... but still has some issues. Our main concern is that the functions attached to the schema are not exported at all, which makes the DB unusable.
We'd like to hack into the Python code to make it export scripts, but didn't find enough information about it. Here is the only piece of documentation we found. It's not very clear to us. We didn't find any information about exporting scripts. All we found is that a db_Script class exists. We don't know where we can find its instances in our execution context, nor whether they can be exported easily. Did we miss something?
For reference, here is the script we currently use for the mwb to sql conversion (mwb2sql.sh). It calls MySQL Workbench from the command line (we use a dummy X server to flush graphical output). What we need to complete is the Python part passed in our command-line call of Workbench.
# generate sql from mwb
# usage: sh mwb2sql.sh {mwb file} {output file}
# prepare: set env MYSQL_WORKBENCH
if [ "$MYSQL_WORKBENCH" = "" ]; then
  export MYSQL_WORKBENCH="/usr/bin/mysql-workbench"
fi
export INPUT=$(cd $(dirname $1);pwd)/$(basename $1)
export OUTPUT=$(cd $(dirname $2);pwd)/$(basename $2)
"$MYSQL_WORKBENCH" \
  --open $INPUT \
  --run-python "
import os
import grt
from grt.modules import DbMySQLFE as fe
c = grt.root.wb.doc.physicalModels[0].catalog
fe.generateSQLCreateStatements(c, c.version, {})
fe.createScriptForCatalogObjects(os.getenv('OUTPUT'), c, {})" \
  --quit-when-done
set -e
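A possible direction, offered only as a sketch under stated assumptions: if the catalog's schemata each expose a routines list whose items carry their full CREATE text in a sqlDefinition attribute (an ASSUMPTION about the GRT db_Schema / db_Routine objects, to be verified in your Workbench version), the routines could be appended to the output file after the existing export. routines_to_sql is a hypothetical helper; it avoids f-strings because older Workbench releases embed Python 2.7.

```python
def routines_to_sql(schemata, delimiter="$$"):
    """Collect stored-routine definitions as a runnable SQL script.

    ASSUMPTION: each schema has `.name` and `.routines`, and each routine has
    `.sqlDefinition` holding its full CREATE PROCEDURE/FUNCTION statement.
    """
    chunks = []
    for schema in schemata:
        bodies = [r.sqlDefinition.strip() for r in schema.routines if r.sqlDefinition.strip()]
        if not bodies:
            continue
        chunks.append("USE `{0}`;".format(schema.name))
        for body in bodies:
            if "DELIMITER" in body.upper():
                # Definition already manages its own delimiter; keep it as written
                chunks.append(body)
            else:
                # Wrap so semicolons inside BEGIN...END do not end the statement early
                chunks.append("DELIMITER {0}\n{1}{0}\nDELIMITER ;".format(
                    delimiter, body.rstrip(";").rstrip()))
    return "\n".join(chunks) + "\n"
```

Inside the --run-python string, after createScriptForCatalogObjects, something like open(os.getenv('OUTPUT'), 'a').write(routines_to_sql(c.schemata)) would append the routines, again assuming the attribute names above hold.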
cx_freeze to access .json files
I have created an application for Windows using Python's cx_Freeze module. The application runs the openpyxl module, which works fine as a script, but when frozen it fails to find the .constants.json files. The following error is displayed:
FileNotFoundError: [Errno 2] No such file or directory: 'C:....\exe.win-amd64-3.4\library.zip\openpyxl.constants.json'
I have found a fix for this (https://cx-freeze.readthedocs.org/en/latest/faq.html#using-data-files), detailed below:
def find_data_file(filename):
    if getattr(sys, 'frozen', False):
        # The application is frozen
        datadir = os.path.dirname(sys.executable)
    else:
        # The application is not frozen
        # Change this bit to match where you store your data files:
        datadir = os.path.dirname(__file__)
    return os.path.join(datadir, filename)
The question I have is: where do I paste this code? Does it go in the setup.py file, or somewhere else?
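As a sketch of where the FAQ helper usually lives: it goes in the application code, next to the code that opens the data file, not in setup.py (setup.py's role is only to copy the file next to the executable, for example via the include_files build option). load_settings and settings.json below are hypothetical names for a file your own code reads. Note that this pattern only covers files your own code opens; openpyxl opens its constants.json itself, so that file additionally has to end up outside library.zip.

```python
import json
import os
import sys


def find_data_file(filename):
    # Adapted from the cx_Freeze FAQ
    if getattr(sys, "frozen", False):
        # Frozen: data files are shipped next to the executable
        datadir = os.path.dirname(sys.executable)
    else:
        # Running as a plain script: look next to this source file
        datadir = os.path.dirname(os.path.abspath(__file__))
    return os.path.join(datadir, filename)


def load_settings(name="settings.json"):
    # Hypothetical example of application code that uses the helper
    with open(find_data_file(name)) as fh:
        return json.load(fh)
```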
web2py function not triggered on user request
Using web2py (Version 2.8.2-stable+timestamp.2013.11.28.13.54.07) on 64-bit Windows, I have the following problem. There is an exe program that is started on user request (first a txt file is created, then p is triggered):
p = subprocess.Popen(['woshi_engine.exe', scriptId], shell=True, stdout = subprocess.PIPE, cwd=path_1)
While the exe is running, it creates a txt file. The program is stopped on user request by deleting the file the program needs as input. While the exe is running, the user can trigger other requests. The requests do reach the server (I used Microsoft Network Monitor to check that), but the function is not triggered. I tried using the scheduler, but with no success: same problem. I am really stuck on this problem. Thank you for your help.
With the help of the web2py Google group, here is the solution. I used the scheduler. I created a scheduler.py file with the following code:
def runWoshiEngine(scriptId, path):
    import os, sys
    import time
    import subprocess
    p = subprocess.Popen(['woshi_engine.exe', scriptId], shell=True, stdout = subprocess.PIPE, cwd=path)
    return dict(status = 1)

from gluon.scheduler import Scheduler
scheduler = Scheduler(db)
In my controller function:
task = scheduler.queue_task(runWoshiEngine, [scriptId, path])
You also have to import the scheduler (from gluon.scheduler import Scheduler). Then I run the scheduler from the command prompt with the following (so, if I understood correctly, you have two instances of web2py running: one for the web server, one for the scheduler):
web2py.py -K woshiweb -D 0
(-D 0 is for verbose logging, so it can be removed)
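Independently of the scheduler, the launch itself can be made lighter. Both snippets start the exe with shell=True and stdout=subprocess.PIPE, and nothing ever reads that pipe; a pipe that is never drained can stall the child once the OS buffer fills. A minimal sketch, assuming it is acceptable to send the engine's output to a log file (start_engine and log_path are hypothetical names):

```python
import subprocess


def start_engine(args, cwd, log_path):
    """Start a long-running program whose output goes to a log file."""
    log = open(log_path, "ab")
    # No shell=True and no undrained PIPE: the child writes straight to the file
    proc = subprocess.Popen(args, cwd=cwd, stdout=log, stderr=subprocess.STDOUT)
    log.close()  # the child keeps its own inherited handle
    return proc
```

For example, start_engine(['woshi_engine.exe', scriptId], cwd=path, log_path=...) inside runWoshiEngine, with the log path chosen by you.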
Boto3 Error: botocore.exceptions.NoCredentialsError: Unable to locate credentials
When I simply run the following code, I always get this error.
s3 = boto3.resource('s3')
bucket_name = "python-sdk-sample-%s" % uuid.uuid4()
print("Creating new bucket with name:", bucket_name)
s3.create_bucket(Bucket=bucket_name)
I have saved my credentials file in C:\Users\myname\.aws\credentials, from where Boto should read my credentials. Is my setup wrong? Here is the output from boto3.set_stream_logger('botocore', level='DEBUG'):
2015-10-24 14:22:28,761 botocore.credentials [DEBUG] Skipping environment variable credential check because profile name was explicitly set.
2015-10-24 14:22:28,761 botocore.credentials [DEBUG] Looking for credentials via: env
2015-10-24 14:22:28,773 botocore.credentials [DEBUG] Looking for credentials via: shared-credentials-file
2015-10-24 14:22:28,774 botocore.credentials [DEBUG] Looking for credentials via: config-file
2015-10-24 14:22:28,774 botocore.credentials [DEBUG] Looking for credentials via: ec2-credentials-file
2015-10-24 14:22:28,774 botocore.credentials [DEBUG] Looking for credentials via: boto-config
2015-10-24 14:22:28,774 botocore.credentials [DEBUG] Looking for credentials via: iam-role
Try specifying the keys manually:
s3 = boto3.resource('s3', aws_access_key_id=ACCESS_ID, aws_secret_access_key=ACCESS_KEY)
Make sure you don't include your ACCESS_ID and ACCESS_KEY in the code directly, for security reasons. Consider using environment configs and injecting them into the code, as suggested by #Tiger_Mike. For prod environments, consider using rotating access keys: https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html#Using_RotateAccessKey
I had the same issue and found out that the format of my ~/.aws/credentials file was wrong. It worked with a file containing:
[default]
aws_access_key_id=XXXXXXXXXXXXXX
aws_secret_access_key=YYYYYYYYYYYYYYYYYYYYYYYYYYY
Note that there must be a profile named "[default]". Some official documentation makes reference to a profile named "[credentials]", which did not work for me.
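To check a credentials file for exactly this mistake, the file can be parsed with the standard configparser module, which reads the same INI layout. This is a sketch; check_credentials_file is a hypothetical helper, and section names are compared case-sensitively, so [Default] would not count as [default].

```python
import configparser


def check_credentials_file(path, profile="default"):
    """Return the profile names in `path`, raising if `profile` is missing or incomplete."""
    parser = configparser.ConfigParser()
    if not parser.read(path):
        raise FileNotFoundError(path)
    if not parser.has_section(profile):
        raise KeyError("profile [{}] not found; sections present: {}".format(profile, parser.sections()))
    missing = [
        key for key in ("aws_access_key_id", "aws_secret_access_key")
        if not parser.get(profile, key, fallback="").strip()
    ]
    if missing:
        raise ValueError("profile [{}] is missing: {}".format(profile, ", ".join(missing)))
    return parser.sections()
```

Running check_credentials_file(os.path.expanduser("~/.aws/credentials")) on a file whose only section is [credentials] raises a KeyError that lists the sections actually present.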
If you are looking for an alternative, try adding your credentials using the AWS CLI. From the terminal, type aws configure, then fill in your keys and region.
Make sure your ~/.aws/credentials file in Unix looks like this:
[MyProfile1]
aws_access_key_id = yourAccessId
aws_secret_access_key = yourSecretKey
[MyProfile2]
aws_access_key_id = yourAccessId
aws_secret_access_key = yourSecretKey
Your Python script should look like this, and it'll work:
from __future__ import print_function
import boto3
import os

os.environ['AWS_PROFILE'] = "MyProfile1"
os.environ['AWS_DEFAULT_REGION'] = "us-east-1"
ec2 = boto3.client('ec2')
# Retrieves all regions/endpoints that work with EC2
response = ec2.describe_regions()
print('Regions:', response['Regions'])
Source: https://boto3.readthedocs.io/en/latest/guide/configuration.html#interactive-configuration.
I also had the same issue; it can be solved by creating a config file and a credentials file in the home directory. Below are the steps I took to solve it.
Create a config file:
touch ~/.aws/config
And in that file enter the region:
[default]
region = us-west-2
Then create the credentials file:
touch ~/.aws/credentials
Then enter your credentials:
[Profile1]
aws_access_key_id = XXXXXXXXXXXXXXXXXXXX
aws_secret_access_key = YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
After setting all this up, here is my Python file to connect to S3. Running it lists all your buckets.
import boto3
import os

os.environ['AWS_PROFILE'] = "Profile1"
os.environ['AWS_DEFAULT_REGION'] = "us-west-2"
s3 = boto3.client('s3', region_name='us-west-2')
print("[INFO:] Connecting to cloud")
# Lists all buckets owned by the authenticated account
response = s3.list_buckets()
print('Buckets:', response['Buckets'])
You can also refer to the links below: Amazon S3 with Python Boto3 Library; Boto 3 documentation; Boto3: Amazon S3 as Python Object Store
From the terminal, type aws configure, then fill in your keys and region. After this you can use any environment: you can have multiple keys depending on your account and manage multiple environments or keys as named profiles (e.g. created with aws configure --profile prod):
import boto3

aws_session = boto3.Session(profile_name="prod")
# Create an S3 client
s3 = aws_session.client('s3')
Create an S3 client object with your credentials:
AWS_S3_CREDS = {
    "aws_access_key_id": "your access key",  # os.getenv("AWS_ACCESS_KEY")
    "aws_secret_access_key": "your aws secret key"  # os.getenv("AWS_SECRET_KEY")
}
s3_client = boto3.client('s3', **AWS_S3_CREDS)
It is always better to get credentials from the OS environment. To set environment variables, run the following commands in a terminal.
If Linux or Mac:
$ export AWS_ACCESS_KEY="aws_access_key"
$ export AWS_SECRET_KEY="aws_secret_key"
If Windows (cmd keeps any quotes as part of the value, so leave them out):
c:System\> set AWS_ACCESS_KEY=aws_access_key
c:System\> set AWS_SECRET_KEY=aws_secret_key
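The commented-out os.getenv calls above can be turned into a helper that fails with a clear message instead of passing None to boto3. A minimal sketch: s3_credentials_from_env is a hypothetical name, and the variable names follow this answer. boto3 itself automatically reads only the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, so with those names you would not need to pass anything explicitly.

```python
import os


def s3_credentials_from_env(id_var="AWS_ACCESS_KEY", secret_var="AWS_SECRET_KEY"):
    """Build boto3 client kwargs from environment variables, failing loudly if unset."""
    missing = [name for name in (id_var, secret_var) if not os.getenv(name)]
    if missing:
        raise RuntimeError("environment variables not set: " + ", ".join(missing))
    return {
        "aws_access_key_id": os.environ[id_var],
        "aws_secret_access_key": os.environ[secret_var],
    }
```

Usage would be s3_client = boto3.client('s3', **s3_credentials_from_env()).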
Exporting the credentials also works. In Linux:
export AWS_SECRET_ACCESS_KEY="XXXXXXXXXXXX"
export AWS_ACCESS_KEY_ID="XXXXXXXXXXX"
These instructions are for a Windows machine with a single user profile for AWS. Make sure your ~/.aws/credentials file looks like this:
[profile_name]
aws_access_key_id = yourAccessId
aws_secret_access_key = yourSecretKey
I had to set the AWS_DEFAULT_PROFILE environment variable to the profile_name found in your credentials. Then my Python was able to connect, e.g. from here:
import boto3

# Let's use Amazon S3
s3 = boto3.resource('s3')
# Print out bucket names
for bucket in s3.buckets.all():
    print(bucket.name)
I work for a large corporation and encountered this same error, but needed a different workaround. My issue was related to proxy settings: I had a proxy set up, so I needed to set no_proxy to whitelist AWS before I was able to get everything to work. You can also set it in your bash script if you don't want to muddy up your Python code with OS settings.
Python:
import os
os.environ["NO_PROXY"] = "s3.amazonaws.com"
Bash:
export no_proxy="s3.amazonaws.com"
Edit: The above assumes a US East S3 region. For other regions, use s3.[region].amazonaws.com, where region is something like us-east-1 or us-west-2.
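Assigning os.environ["NO_PROXY"] as above replaces whatever bypass list was already set (corporate setups often have one). A sketch that appends the S3 host for a region instead, writing both the upper- and lower-case spellings because different tools check different ones; add_s3_to_no_proxy is a hypothetical helper, and the host pattern is the one given in this answer.

```python
import os


def add_s3_to_no_proxy(region=None, env=None):
    """Append the S3 endpoint for `region` to NO_PROXY/no_proxy without clobbering them."""
    env = os.environ if env is None else env
    host = "s3.{}.amazonaws.com".format(region) if region else "s3.amazonaws.com"
    for name in ("NO_PROXY", "no_proxy"):
        hosts = [h.strip() for h in env.get(name, "").split(",") if h.strip()]
        if host not in hosts:
            hosts.append(host)
        env[name] = ",".join(hosts)
    return host
```

Call it before creating the boto3 client, e.g. add_s3_to_no_proxy("us-west-2").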
If you have multiple AWS profiles in ~/.aws/credentials, like:
[Profile 1]
aws_access_key_id = *******************
aws_secret_access_key = ******************************************
[Profile 2]
aws_access_key_id = *******************
aws_secret_access_key = ******************************************
Follow two steps:
1. Make the one you want to use the default by running export AWS_DEFAULT_PROFILE="Profile 1" in the terminal (quote the name, since it contains a space).
2. Make sure to run the above command in the same terminal from which you use boto3 or open your editor. [Understand the following scenario]
Scenario: You have two terminals open, t1 and t2. You run the export command in t1 and open JupyterLab (or anything else) from t2; you will get the NoCredentialsError: Unable to locate credentials error.
Solution: Run the export command in t1 and then open JupyterLab (or anything else) from the same terminal, t1.
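The t1/t2 scenario comes down to environment inheritance: a variable reaches only the processes started from the environment where it was set. A small sketch that demonstrates this by launching a child Python interpreter with and without the variable (profile_seen_by_child is a hypothetical name):

```python
import os
import subprocess
import sys


def profile_seen_by_child(env_updates=None):
    """Start a child Python and report which AWS_DEFAULT_PROFILE it inherits."""
    env = dict(os.environ)
    env.update(env_updates or {})
    code = "import os; print(os.environ.get('AWS_DEFAULT_PROFILE', '<unset>'))"
    result = subprocess.run(
        [sys.executable, "-c", code], env=env, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

A child started with the variable in its environment prints "Profile 1"; one started from an environment that never had it (like terminal t2) prints "<unset>".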
In the case of MLflow, a call to mlflow.log_artifact() will raise this error if you cannot write to the AWS S3/MinIO data lake. The reason is not having set up credentials in your Python env (as these two env vars):
os.environ['DATA_AWS_ACCESS_KEY_ID'] = 'login'
os.environ['DATA_AWS_SECRET_ACCESS_KEY'] = 'password'
Note that you may also access MLflow artifacts directly using the minio client (which requires a separate connection to the data lake, apart from MLflow's connection). This client can be started like this:
minio_client_mlflow = minio.Minio(os.environ['MLFLOW_S3_ENDPOINT_URL'].split('://')[1],
                                  access_key=os.environ['AWS_ACCESS_KEY_ID'],
                                  secret_key=os.environ['AWS_SECRET_ACCESS_KEY'],
                                  secure=False)
I have solved the problem like this:
aws configure
Afterwards I manually entered:
AWS Access Key ID [None]: xxxxxxxxxx
AWS Secret Access Key [None]: xxxxxxxxxx
Default region name [None]: us-east-1
Default output format [None]: (just hit Enter)
After that it worked for me.
boto3 may be looking for the credentials in a folder like C:\ProgramData\Anaconda3\envs\tensorflow\Lib\site-packages\botocore\.aws. You should save two files in this folder: credentials and config. You may want to check out the general order in which boto3 searches for credentials in this link; look under the Configuring Credentials subheading.
If you're sure you configured AWS correctly, just make sure the user running the project can read ~/.aws, or run your project as root.
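For the last two answers, a quick check of where the shared files are expected and whether the current user can read them may help. boto3 defaults to ~/.aws/credentials and ~/.aws/config, and the AWS_SHARED_CREDENTIALS_FILE and AWS_CONFIG_FILE environment variables override those locations. This sketch only reports paths and permissions (aws_config_paths is a hypothetical name); it does not reproduce boto3's full search order.

```python
import os


def aws_config_paths():
    """Report where the shared credentials/config files are expected and their status."""
    candidates = (
        ("credentials", os.environ.get("AWS_SHARED_CREDENTIALS_FILE",
                                       os.path.join("~", ".aws", "credentials"))),
        ("config", os.environ.get("AWS_CONFIG_FILE",
                                  os.path.join("~", ".aws", "config"))),
    )
    report = {}
    for label, raw_path in candidates:
        path = os.path.expanduser(raw_path)
        report[label] = {
            "path": path,
            "exists": os.path.isfile(path),
            "readable": os.access(path, os.R_OK),
        }
    return report
```

Printing aws_config_paths() from the same process that creates the boto3 client shows which home directory it resolves (this differs between users, e.g. when a service runs under another account) and whether that user can read the files.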
I just had this problem. This is what worked for me: pip install botocore==1.13.20 Source: https://github.com/boto/botocore/issues/1892
If you are running on AWS: in my case, I had to add the following policy to the IAM role to allow EC2 tags to be read by the EC2 instances. That eliminated the Unable to locate credentials error:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": "ec2:DescribeTags",
      "Resource": "*"
    }
  ]
}