I would like to import variables defined in a JSON file (my_info.json) as attributes for Bazel rules.
I tried this (https://docs.bazel.build/versions/5.3.1/skylark/tutorial-sharing-variables.html) and it works, but I would rather not maintain a .bzl file by hand; I want to import the variables directly as attributes in BUILD.bazel.
I want to use the variables imported from my_info.json as attributes in other BUILD.bazel files.
projects/python_web/BUILD.bazel
load("//projects/tools/parser:config.bzl", "MY_REPO","MY_IMAGE")
container_push(
    name = "publish",
    format = "Docker",
    registry = "registry.hub.docker.com",
    repository = MY_REPO,
    image = MY_IMAGE,
    tag = "1",
)
When I asked a similar question in the Bazel Slack, I was told that it is not possible to import variables directly into Bazel, and that the JSON variables need to be parsed and written into a .bzl file.
I also tried the code below, but nothing is written to the config.bzl file.
my_info.json
{
"MYREPO" : "registry.hub.docker.com",
"MYIMAGE" : "michael/monorepo-python-web"
}
WORKSPACE.bazel
load("//projects/tools/parser:jsonparser.bzl", "load_my_json")
load_my_json(
name = "myjson"
)
projects/tools/parser/jsonparser.bzl
def _load_my_json_impl(repository_ctx):
    json_data = json.decode(repository_ctx.read(repository_ctx.path(Label(":my_info.json"))))
    config_lines = ["%s = %s" % (key, repr(val)) for key, val in json_data.items()]
    repository_ctx.file("config.bzl", "\n".join(config_lines))

load_my_json = repository_rule(
    implementation = _load_my_json_impl,
    attrs = {},
)
projects/tools/parser/BUILD.bazel
load("@aspect_bazel_lib//lib:yq.bzl", "yq")
load(":config.bzl", "MYREPO", "MY_IMAGE")
yq(
name = "convert",
srcs = ["my_info2.json"],
args = ["-P"],
outs = ["bar.yaml"],
)
Executing:
% bazel build projects/tools/parser:convert
ERROR: Traceback (most recent call last):
File "/Users/michael.taquia/Documents/Personal/Projects/bazel/bazel-projects/multi-language-bazel-monorepo/projects/tools/parser/BUILD.bazel", line 2, column 22, in <toplevel>
load(":config.bzl", "MYREPO", "MY_IMAGE")
Error: file ':config.bzl' does not contain symbol 'MYREPO'
While troubleshooting, I can see that the execution loads jsonparser.bzl but never enters the _load_my_json_impl function (based on print statements), and nothing is written to config.bzl.
Notes: Tested on macOS 12.6 (21G115 ) Darwin Kernel Version 21.6.0
Is there a better way to do this? A code snippet would be very useful.
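One way to follow the Slack advice is to generate the .bzl file outside of Bazel, before the build runs. The sketch below is only an illustration under stated assumptions, not an established Bazel workflow: json_to_bzl is a hypothetical helper, it only handles string, integer, and boolean values, and it expects every JSON key to be a valid identifier. Note also that the BUILD files above load MY_REPO and MY_IMAGE while my_info.json defines MYREPO and MYIMAGE, so the names would need to agree.

```python
import json


def json_to_bzl(json_text):
    """Render a flat JSON object as Starlark assignments, one per line."""
    data = json.loads(json_text)
    lines = []
    for key, value in data.items():
        if not key.isidentifier():
            raise ValueError("not a valid identifier: %r" % key)
        if isinstance(value, bool):
            # Starlark spells booleans True/False, unlike JSON's true/false.
            rendered = "True" if value else "False"
        elif isinstance(value, (str, int)):
            # json.dumps yields a double-quoted string or a plain integer.
            rendered = json.dumps(value, ensure_ascii=False)
        else:
            raise TypeError("unsupported value for %s: %r" % (key, value))
        lines.append("%s = %s" % (key, rendered))
    return "\n".join(lines) + "\n"


# Contents of my_info.json from the question.
my_info = '{"MYREPO": "registry.hub.docker.com", "MYIMAGE": "michael/monorepo-python-web"}'
print(json_to_bzl(my_info), end="")
```

As for the repository rule attempt: as far as I understand, repository_ctx.file writes into the external repository created by the rule (@myjson here) rather than into projects/tools/parser, and the rule is only evaluated when something references that repository. That would be consistent with the implementation function never running.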
I am trying to design a circos plot using the BioCircos R package. BioCircos allows plots to be saved as interactive .html files. However, when I run the script with Rscript, the saved .html file is empty. To save the .html file I used saveWidget from the htmlwidgets package. Is something wrong with the way I use saveWidget? The code I used follows:
#!/usr/bin/Rscript
######R script for BioCircos test
library(htmlwidgets)
library(BioCircos)
genomes <- list("chra1" = 217471166, "chra2" = 181034961, "chra3" = 153873357, "chra4" = 153961319, "chra5" = 164033575,
"chra6" = 154486312, "chra7" = 133565930, "chra8" = 147241510, "chra9" = 91218944, "chra10" = 52432566, "chrb1" = 843366180, "chrb2" = 842558404, "chrb3" = 707956555, "chrb4" = 635713434, "chrb5" = 567300182,
"chrb6" = 439630435, "chrb7" = 236595445, "chrb8" = 231667822, "chrb9" = 230778867, "chrb10" = 151572763, "chrb11" = 103205957) # custom genome
links_chromosomes_01 <- c("chra1", "chra2", "chra3", "chra4", "chra4", "chra5", "chra6", "chra7", "chra7", "chra8", "chra8", "chra9", "chra10") # Chromosomes on which the links should start
links_chromosomes_02 <- c("chrb2", "chrb3", "chrb1", "chrb9", "chrb10", "chrb4", "chrb5", "chrb6", "chrb1", "chrb8", "chrb3", "chrb7", "chrb6") # Chromosomes on which the links should end
links_pos_01 <- c(115060347, 102611974, 14761160, 128700431, 128681496, 42116205, 58890582, 40356090,
146935315, 136481944, 157464876, 39323393, 84752508, 136164354,
99573657, 102580613,
111139346, 120764772, 90748238, 122164776,
44933176, 18823342,
48771409, 128288229, 150613881, 18509106, 123913217, 51237349,
34237851, 53357604, 78270031,
25306417, 25320614,
94266153,
41447919, 28810876, 2802465,
45583472,
81968637, 27858237, 17263637,
30569409) ### links chra chromosomes
links_pos_02 <- c(410543481, 463189512, 825903588, 353914638, 354135472, 717707494, 643107332, 724899652,
583713545, 558756961, 642015290, 154999098, 340216235, 557731577,
643350872, 655077847,
85356666, 157889318, 226411560, 161566470,
109857786, 25338955,
473876792, 124495704, 46258030, 572314729, 141584107, 426419779,
531245660, 220131772, 353941099,
62422773, 62387030,
116923325,
76544045, 33452274, 7942164,
642047816,
215981114, 39278129, 23302654,
418922633) ### links chrb chromosomes
links_labels <- c("aldh1a3", "amh", "cyp26b1", "dmrt1", "dmrt3", "fgf20", "hhip", "srd5a3",
"amhr2", "dhh", "fgf9", "nr0b1", "rspo1", "wnt1",
"aldh1a2", "cyp19a1",
"lhx9", "pdgfb", "ptch2", "sox10",
"cbln1", "wt1",
"esr1", "foxl2", "gata4", "lrpprc", "serpine2", "srd5a2",
"asns", "ctnnb1", "srd5a1",
"cyp26a1", "cyp26c1",
"wnt4",
"ar", "nr5a1", "ptgds",
"fgf16",
"cxcr4", "pdgfa", "sox8",
"sox9")
tracklist <- BioCircosLinkTrack('myLinkTrack', links_chromosomes_01, links_pos_01,
links_pos_01, links_chromosomes_02, links_pos_02, links_pos_02,
maxRadius = 0.55, labels = links_labels)
#plotting results
plot_chra_chrb <- BioCircos(tracklist, genome = chra_chrb_genomes, genomeFillColor = "RdBu", chrPad = 0.02, displayGenomeBorder = FALSE, genomeLabelTextSize = "10pt", genomeTicksScale = 4e+3,
elementId = "chra_chrb_comp_plot_test.html")
saveWidget(plot_chra_chrb, "chra_chrb_comp_plot_test.html", selfcontained = F, libdir = "lib")
The command line to run this script:
Rscript /path_to/Circle_plot_test.r
I tried to run this script in RStudio (without the saveWidget() call), but it took too long to run on my personal computer and the result was not displayed. This could be due to memory limits, because when I removed some of the data, the script generated the plot easily in RStudio and I was able to save it. Is there another way to save interactive .html files in R, or am I doing something wrong with the htmlwidgets package in my script?
Thanks all in advance for any help and comments.
When you said it took too long to run, that was a sign that something was wrong! You weren't getting anything when you used saveWidget because nothing was being returned from BioCircos.
I found two things that are a problem. The first one will result in a blank output: you can't use a '.' in the element ID. This ID will be used in the HTML coding.
You were getting huge delays due to the scale you set for genomeTicksScale. That scaling value is for a tick mark attribute. I'm not sure why you set it to 4e+3. However, when I comment out that line, it renders immediately. I have no issues with saving the widget, either.
One other thing: you had chra_chrb_genomes as the object name assigned to the parameter genome in the function BioCircos. I assumed it was the object genomes from your question, since it was the only unused object.
The only things I changed were in the BioCircos function:
(plot_chra_chrb <- BioCircos(tracklist, genome = genomes, #chra_chrb_genomes,
genomeFillColor = "RdBu",
chrPad = 0.02,
displayGenomeBorder = FALSE,
genomeLabelTextSize = "10pt",
# genomeTicksScale = 4e+3, # problematic
elementId = "chra_chrb_comp_plot_test" # no periods
))
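To get a feel for why the tick scale could matter so much, here is a rough back-of-the-envelope sketch in Python (not R, and not part of BioCircos). It assumes that one tick is drawn every genomeTicksScale bases, which is my reading of the parameter and not something I have verified in the package source; estimated_ticks is a hypothetical helper.

```python
def estimated_ticks(chromosome_lengths, ticks_scale):
    """Estimate the tick count, assuming one tick every ticks_scale bases."""
    return sum(length // ticks_scale for length in chromosome_lengths.values())


# Two of the chromosome lengths from the question, for illustration only.
lengths = {"chra1": 217471166, "chrb1": 843366180}

# 4000 is the value of genomeTicksScale = 4e+3 used in the question.
print(estimated_ticks(lengths, 4000))
```

Under that assumption, chromosomes that are hundreds of millions of bases long produce a very large number of tick marks at a scale of 4000, which would fit the slow rendering.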
For my application, each new file uploaded to storage is read and its data is appended to a main file. The new file contains two lines: a header, and an array whose values are separated by commas. The main file will need a maximum of 265MB. The new files will have a maximum of 30MB.
import os
import numpy as np

def write_append_to_ecg_file(filename, ecg, patientdata):
    # Write the header line and the comma-separated samples to /tmp
    with open('/tmp/' + filename, "w+") as file1:
        file1.write(":".join(patientdata))
        file1.write('\n')
        file1.write(",".join(ecg.astype(str)))

def storage_trigger_function(data, context):
    # Download the segment file
    download_files_storage(bucket_name, new_file_name, storage_folder_name=blob_path)
    # Read the segment file
    data_from_new_file, meta = read_new_file(new_file_name, scale=1, fs=125, include_meta=True)
    print("Length of ECG data from segment {} file {}".format(segment_no, len(data_from_new_file)))
    os.remove(new_file_name)
    # Check if the main ECG file exists
    file_exists = blob_exists(bucket_name, blob_with_the_main_file)
    print("File status {}".format(file_exists))
    data_from_main_file = []
    if file_exists:
        download_files_storage(bucket_name, main_file_name, storage_folder_name=blob_with_the_main_file)
        data_from_main_file, meta = read_new_file(main_file_name, scale=1, fs=125, include_meta=True)
        print("ECG data from main file {}".format(len(data_from_main_file)))
        os.remove(main_file_name)
        data_from_main_file = np.append(data_from_main_file, data_from_new_file)
        print("data after appending {}".format(len(data_from_main_file)))
        write_append_to_ecg_file(main_file, data_from_main_file, meta)
        token = upload_files_storage(bucket_name, main_file, storage_folder_name=main_file_blob, upload_file=True)
    else:
        write_append_to_ecg_file(main_file, data_from_new_file, meta)
        token = upload_files_storage(bucket_name, main_file, storage_folder_name=main_file_blob, upload_file=True)
The Cloud Function is deployed with:
gcloud functions deploy storage_trigger_function --runtime python37 --trigger-resource patch-us.appspot.com --trigger-event google.storage.object.finalize --timeout 540s --memory 8192MB
For the first file, I was able to read the file and write the data to the main file. But after uploading the second file, it gives: Function execution took 70448 ms, finished with status: 'connection error'. On uploading the third file, it gives: Function invocation was interrupted. Error: memory limit exceeded. I get this error despite deploying the function with 8192MB of memory. Can I get some help with this?
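One thing that might reduce memory pressure is to avoid holding the whole main file in memory as an array and as one giant joined string. The sketch below is an illustration under assumptions, not a drop-in fix: append_samples is a hypothetical helper, it assumes the two-line layout described above with no trailing newline after the array line, and it only covers the local file handling, not the download from or upload to the bucket. As far as I know, /tmp in Cloud Functions is backed by memory as well, so files kept there also count toward the limit.

```python
import os


def append_samples(path, samples, header, chunk_size=100_000):
    """Append samples to the array line of a two-line file without reading it."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a") as f:
        if is_new:
            # A new file starts with the header line.
            f.write(header + "\n")
        first = is_new
        for start in range(0, len(samples), chunk_size):
            # Convert and write one chunk at a time to bound peak memory.
            chunk = samples[start:start + chunk_size]
            text = ",".join(str(v) for v in chunk)
            f.write(text if first else "," + text)
            first = False
```

A call such as append_samples("/tmp/main.txt", data_from_new_file, ":".join(meta)) would then only ever touch the new segment; the file name here is a placeholder.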
I have mounted my Google Drive and have a CSV file in a folder. I am following the tutorial. However, when I call tf.keras.utils.get_file(), I get a ValueError, as follows.
data_folder = r"/content/drive/My Drive/NLP/project2/data"
import os
print(os.listdir(data_folder))
It returns:
['crowdsourced_labelled_dataset.csv',
'P2_Testing_Dataset.csv',
'P2_Training_Dataset_old.csv',
'P2_Training_Dataset.csv']
TRAIN_DATA_URL = os.path.join(data_folder, 'P2_Training_Dataset.csv')
train_file_path = tf.compat.v1.keras.utils.get_file("train.csv", TRAIN_DATA_URL)
But this returns:
Downloading data from /content/drive/My Drive/NLP/project2/data/P2_Training_Dataset.csv
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-16-5bd642083471> in <module>()
2 TRAIN_DATA_URL = os.path.join(data_folder, 'P2_Training_Dataset.csv')
3 TEST_DATA_URL = os.path.join(data_folder, 'P2_Testing_Dataset.csv')
----> 4 train_file_path = tf.compat.v1.keras.utils.get_file("train.csv", TRAIN_DATA_URL)
5 test_file_path = tf.compat.v1.keras.utils.get_file("eval.csv", TEST_DATA_URL)
6 frames
/usr/lib/python3.6/urllib/request.py in _parse(self)
382 self.type, rest = splittype(self._full_url)
383 if self.type is None:
--> 384 raise ValueError("unknown url type: %r" % self.full_url)
385 self.host, self.selector = splithost(rest)
386 if self.host:
ValueError: unknown url type: '/content/drive/My Drive/NLP/project2/data/P2_Training_Dataset.csv'
What am I doing wrong please?
As per the docs, this is the signature and documented behavior of tf.keras.utils.get_file (which you call through tf.compat.v1.keras.utils.get_file):
tf.keras.utils.get_file(
    fname,
    origin,
    untar=False,
    md5_hash=None,
    file_hash=None,
    cache_subdir='datasets',
    hash_algorithm='auto',
    extract=False,
    archive_format='auto',
    cache_dir=None
)
By default the file at the url origin is downloaded to the cache_dir ~/.keras, placed in the cache_subdir datasets, and given the filename fname. The final location of a file example.txt would therefore be ~/.keras/datasets/example.txt.
Returns:
Path to the downloaded file
Since you already have the data in your drive, there's no need to download it again. The function expects origin to be an accessible URL, which is why a plain file path raises "unknown url type". There's also no need to obtain the file path from a function call, because you already know it.
Assuming the drive is mounted, you can replace your file paths as below:
train_file_path = os.path.join(data_folder, 'P2_Training_Dataset.csv')
test_file_path = os.path.join(data_folder, 'P2_Testing_Dataset.csv')
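If the same notebook sometimes receives URLs and sometimes local paths, a small check can decide whether a download is needed at all. This is a minimal sketch using only the standard library; needs_download is a hypothetical helper name, and the list of schemes is my own assumption about what counts as downloadable.

```python
import os
from urllib.parse import urlparse


def needs_download(origin):
    """Return True only when origin looks like a URL that can be fetched."""
    return urlparse(origin).scheme in ("http", "https", "ftp")


data_folder = "/content/drive/My Drive/NLP/project2/data"
train_file_path = os.path.join(data_folder, "P2_Training_Dataset.csv")

# A mounted Drive path has no URL scheme, so it can be used directly.
print(needs_download(train_file_path))
```

Only when the check returns True would it make sense to hand the value to get_file as origin.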
I am using boto3 with Python 3.6 to start a Step Function execution. The Step Function is designed to share my base AMI across all my accounts. I have 4 variables I need to pass to the input parameter to kick off the execution: the AMI ID, the list of accounts I own, the source account, and the KMS key. The AMI ID and the account list are constructed in my code and need to be passed dynamically. According to the documentation, input is a string that contains the JSON input data for the execution, and it gives the following example: "input": "{\"ami_id\" : \"ami_id\"}". My question is: how do I pass the variables in question to this parameter as the value? My code is below with the traceback:
CODE:
import boto3
import json
# Get an STS token to assume roles into AWS accounts
def get_sts_token(**kwargs):
    role_arn = kwargs['RoleArn']
    region_name = kwargs['RegionName']
    sts = boto3.client(
        'sts',
        region_name=region_name,
    )
    token = sts.assume_role(
        RoleArn=role_arn,
        RoleSessionName='GetInstances',
        DurationSeconds=900,
    )
    return token["Credentials"]
def get_accounts():
    role_arn = "arn:aws:iam::xxxxxxxxxx:role/list-accounts-role"
    region_name = "us-east-1"
    token = get_sts_token(RoleArn=role_arn, RegionName=region_name)
    access_key = token['AccessKeyId']
    secret_key = token['SecretAccessKey']
    session_token = token['SessionToken']
    client = boto3.client('organizations',
                          aws_access_key_id=access_key,
                          aws_secret_access_key=secret_key,
                          aws_session_token=session_token)
    moreAccounts=True
    nexttoken=''
    global accountList
    accountList =[]
    while moreAccounts:
        if (len(nexttoken)>0):
            accounts=client.list_accounts(NextToken=nexttoken)
        else:
            accounts=client.list_accounts()
        if 'NextToken' in accounts:
            nexttoken=accounts['NextToken']
        else:
            moreAccounts=False
        for account in accounts['Accounts']:
            if account['Status'] != 'SUSPENDED' and account['Status'] != 'CLOSED' :
                account_id = account['Id']
                accountList.append(account_id)
    print(accountList)
def trigger_sfn():
    ssm = boto3.client('ssm')
    role_arn = "arn:aws:iam::xxxxxxxx:role/execute-sfn"
    region_name = "us-east-1"
    ami_id = ssm.get_parameter(Name='/BaseAMI/newest')['Parameter']['Value']
    print(ami_id)
    token = get_sts_token(RoleArn=role_arn, RegionName=region_name)
    print(token)
    access_key = token['AccessKeyId']
    secret_key = token['SecretAccessKey']
    session_token = token['SessionToken']
    sfn = boto3.client('stepfunctions',
                       aws_access_key_id=access_key,
                       aws_secret_access_key=secret_key,
                       aws_session_token=session_token)
    response = sfn.start_execution(
        stateMachineArn='arn:aws:states:us-east-1:xxxxxxxx:stateMachine:ami-share',
        input="{\"ami_id\": ami_id, \"source_account_id\": \"112233445566\", \"accountList\": accountList, \"kms_key_arn\": \"alias/aws/ebs\"}"
    )
    print(response)
TRACE:
An error occurred (InvalidExecutionInput) when calling the
StartExecution operation: Invalid State Machine Execution Input:
'com.fasterxml.jackson.core.JsonParseException: Unrecognized token
'ami_id': was expecting ('true', 'false' or 'null')
at [Source: (String)"{"ami_id": ami_id, "source_account_id":
"112233445566", "accountList": accountList, "kms_key_arn":
"alias/aws/ebs"}"; line: 1, column: 18]': InvalidExecutionInput
Traceback (most recent call last):
File "/var/task/lambda_function.py", line 6, in lambda_handler
trigger_sfn()
File "/var/task/lambda_function.py", line 96, in trigger_sfn
input="{\"ami_id\": ami_id, \"source_account_id\": \"112233445566\",
\"accountList\": accountList, \"kms_key_arn\": \"alias/aws/ebs\"}"
File "/var/runtime/botocore/client.py", line 314, in _api_call
return self._make_api_call(operation_name, kwargs)
File "/var/runtime/botocore/client.py", line 612, in _make_api_call
raise error_class(parsed_response, operation_name)
botocore.errorfactory.InvalidExecutionInput: An error occurred
(InvalidExecutionInput) when calling the StartExecution operation:
Invalid State Machine Execution Input:
'com.fasterxml.jackson.core.JsonParseException: Unrecognized token
'ami_id': was expecting ('true', 'false' or 'null')
at [Source: (String)"{"ami_id": ami_id, "source_account_id":
"112233445566", "accountList": accountList, "kms_key_arn":
"alias/aws/ebs"}"; line: 1, column: 18]'
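You can add the values of the variables to the string like this:
input="{\"ami_id\": \"" + ami_id + "\", \"source_account_id\": \"112233445566\", \"accountList\": " + json.dumps(accountList) + ", \"kms_key_arn\": \"alias/aws/ebs\"}"
Your code currently sends the bare words ami_id and accountList inside the string instead of the variables' values, and a bare word is not valid JSON, which is what the parser complains about. The accountList value is a Python list, so it is serialized with json.dumps (json is already imported in your code).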
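An alternative that avoids hand-escaping quotes is to build a dictionary and serialize all of it with json.dumps. This is a minimal sketch: build_execution_input is a hypothetical helper name, the source account ID and KMS alias are copied from the question, and the AMI ID and account numbers in the example call are made-up placeholders.

```python
import json


def build_execution_input(ami_id, account_list):
    """Serialize the Step Functions execution input as a JSON string."""
    payload = {
        "ami_id": ami_id,
        "source_account_id": "112233445566",
        "accountList": account_list,
        "kms_key_arn": "alias/aws/ebs",
    }
    return json.dumps(payload)


# Placeholder values, for illustration only.
example = build_execution_input("ami-0123456789abcdef0", ["111111111111", "222222222222"])
print(example)
```

The result can then be passed as input=build_execution_input(ami_id, accountList) in the start_execution call.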