Invalid JSON while submitting spark submit job via NiFi - json

I am trying to submit a Spark job in which I set a date argument in a conf property, and I am running it through a script in NiFi. However, when I run the script I get an error.
The spark-submit command in the script:
aws emr add-steps --cluster-id "$1" --steps '[{"Args":["spark-submit","--deploy-mode","cluster","--jars","s3://tvsc-lumiq-edl/jars/ojdbc7.jar","--executor-memory","10g","--driver-memory","10g","--conf","spark.hadoop.yarn.timeline-service.enabled=false","--conf","currDate='\"$5\"'","--class",'\"$2\"','\"$3\"','\"$4\"'],"Type":"CUSTOM_JAR","ActionOnFailure":"CONTINUE","Jar":"command-runner.jar","Properties":"","Name":"Spark application"}]' --region "$6"
and after I run it, I get the below error:
ExecuteStreamCommand[id=5b08df5a-1f24-3958-30ca-2e27a6c4becf] Transferring flow file StandardFlowFileRecord[uuid=00f844ee-dbea-42a3-aba3-0edcabfc50a2,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1607082757752-507103, container=default, section=223], offset=29, length=-1],offset=0,name=6414901712887990,size=0] to nonzero status. Executable command /bin/bash ended in an error:
Error parsing parameter '--steps': Invalid JSON:
[{"Args":["spark-submit","--deploy-mode","cluster","--jars","s3://tvsc-lumiq-edl/jars/ojdbc7.jar","--executor-memory","10g","--driver-memory","10g","--conf","spark.hadoop.yarn.timeline-service.enabled=false","--conf","currDate="Fri
Where am I going wrong?

You can use JSONLint to validate your JSON, which makes it easier to see why it's wrong.
In your case, you are wrapping the final 3 values in single quotes (') rather than double quotes (").
Your steps JSON should look like:
[{
    "Args": [
        "spark-submit",
        "--deploy-mode",
        "cluster",
        "--jars",
        "s3://tvsc-lumiq-edl/jars/ojdbc7.jar",
        "--executor-memory",
        "10g",
        "--driver-memory",
        "10g",
        "--conf",
        "spark.hadoop.yarn.timeline-service.enabled=false",
        "--conf",
        "currDate='\"$5\"'",
        "--class",
        "\"$2\"",
        "\"$3\"",
        "\"$4\""
    ],
    "Type": "CUSTOM_JAR",
    "ActionOnFailure": "CONTINUE",
    "Jar": "command-runner.jar",
    "Properties": "",
    "Name": "Spark application"
}]
Specifically, these 3 lines:
"\"$2\"",
"\"$3\"",
"\"$4\""
Instead of the original:
'\"$2\"',
'\"$3\"',
'\"$4\"'

Related

Adding Dollar Sign ($) in a VSCode code snippet

I was writing my custom code snippets for Verilog files in VSCode; VSCode uses JSON files for snippets.
I observed that the $ before the finish and dumpvars statements doesn't get printed when I use the snippet, since in the snippet JSON format $ is reserved for inserting variables into strings.
I have tried adding \ before $ but that didn't work.
Is there any way I can insert $ in my snippets?
This is the relevant code block I am using:
"dump statements": {
"prefix": "dump",
"body": [
"initial begin",
" dumpfile(\"${1:filename}\");",
" \$dumpvars();",
"end"
],
"description": "Prints the dump file and variables statements."
},
Use a double backslash to escape it, like this: \\$
"dump statements": {
"prefix": "dump",
"body": [
"initial begin",
" dumpfile(\"${1:filename}\");",
" \\$dumpvars();",
"end"
],
"description": "Prints the dump file and variables statements."
},

Terraform's external data source: STDOUT syntax unclear

I would like to use Terraform's external data source to identify certain AWS EC2 instances:
data "external" "monitoring_instances" {
program = ["bash", "${path.module}/../bash/tf_datasource_monitoring.sh"]
query = {
env = var.env_stage
}
}
The bash script is using AWS CLI to return a list of instance IDs.
But I keep receiving this error: command "bash" produced invalid JSON: json: cannot unmarshal array into Go value of type string
I don't understand what the expected syntax of my script's STDOUT would be for Terraform to understand the result.
So let's assume the script is supposed to return 3 instance IDs: i-1, i-2 and i-3.
What would be the correct JSON syntax to be returned to terraform?
Examples, that do NOT work:
{
  "instances": [
    "i-1",
    "i-2",
    "i-3"
  ]
}
[
  "i-1",
  "i-2",
  "i-3"
]
It is a known issue in Terraform's external provider: https://github.com/hashicorp/terraform-provider-external/issues/2. It was opened a while ago and, unfortunately, is still present in the latest version (Terraform v1.0.11).
You may want to avoid returning JSON objects which contain arrays.
I had the same issue while executing a Python script to generate a dynamic BigQuery schema, which is by definition an array of JSON objects.
I solved it by implementing a wrapper JSON with the schema as a string value (see the dummy code below).
# get_dynamic_bigquery_schema.py
import json

bigquery_schema = [
    {"name": "int_field", "type": "INTEGER", "mode": "NULLABLE", "description": "int_field"},
    {"name": "int_field_repeated", "type": "INTEGER", "mode": "REPEATED", "description": "int_field_repeated"}
]

wrapper_json = {'actual_output': str(json.dumps(bigquery_schema))}
print(json.dumps(wrapper_json))
which I can then access within Terraform:
# main.tf
data "external" "bigquery_schema" {
  program = ["python", "${path.module}/get_dynamic_bigquery_schema.py"]
}

locals {
  bigquery_schema = data.external.bigquery_schema.result["actual_output"]
}
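Applied to the original EC2 example, here is a minimal sketch of what tf_datasource_monitoring.sh could look like under the same idea (the Environment tag filter and the instances key are assumptions, not from the question; the essential point is that every value in the emitted top-level object is a plain string, so the ID list is JSON-encoded into one string):
#!/bin/bash
# Sketch of tf_datasource_monitoring.sh: read the query object from stdin,
# look up instance IDs, and emit a flat JSON object containing only string values.
eval "$(jq -r '@sh "ENV=\(.env)"')"   # parses the {"env": "..."} query sent by Terraform

IDS=$(aws ec2 describe-instances \
        --filters "Name=tag:Environment,Values=$ENV" \
        --query 'Reservations[].Instances[].InstanceId' \
        --output json)

# Wrap the array as a single string, which the external provider accepts.
jq -n --arg ids "$IDS" '{"instances": $ids}'
On the Terraform side the string can then be turned back into a list with jsondecode(data.external.monitoring_instances.result.instances).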

Pass JSON variable to AzureDevops Release definition

My goal is to pass a JSON object from one machine to another using Azure DevOps pipeline variables.
The process starts with a PowerShell script that obtains the JSON object and compresses it to:
$json = [{"test":"foo","bar":"hello"}]
Please note that it will always be an array.
Now, I set the Azure variable with:
Write-Host "##vso[task.setvariable variable=Json]$json"
Now the variable is initialized in the release pipeline, BUT the double quotes are not escaped.
That means that when I try to obtain $(Json) in the next script, it fails due to invalid characters, of course. My question is: how can I escape those double quotes? I have tried adding single quotes to the beginning and the end of the string, but it won't work. Thanks!
You may try the following format for the JSON:
$json = @"
[
    {
        "op": "add",
        "path": "/fields/System.Title",
        "value": "Bug22"
    },
    {
        "op": "add",
        "path": "/fields/Microsoft.VSTS.Common.Severity",
        "value": "3 - Medium"
    }
]
"@ | ConvertTo-Json -Compress
Have you tried wrapping the pipeline variable in a here-string before using it?
$json = @"
$(Json)
"@

Bash JSON string into variable

My idea is to put a JSON string into the variable JSON. I have a command that creates a console login for a user in AWS IAM; the command is: aws iam create-login-profile --cli-input-json file://create-login-profile.json
create-login-profile.json has the following content:
{
    "UserName": "roberto.viquezzz",
    "Password": "aaaaaaaaaaa",
    "PasswordResetRequired": true
}
I tried to write a bash script that holds the JSON in a variable "JSON", like the following code:
JSON="{\"UserName\": \"roberto.viquezzz\",\"Password\": \"aaaaaaaaaaa\",\"PasswordResetRequired\" : true}"
aws iam create-login-profile --cli-input-json $JSON
The idea is that typing ./file.sh in the console should create the console user.
But when I try to execute this code I get an error: Unknown options: "aaaaaaaaaaa","PasswordResetRequired", :, true}, "roberto.viquezzz","Password":
However, if I execute the same command directly from the command line, like:
aws iam create-login-profile --cli-input-json "{\"UserName\": \"roberto.viquezzz\",\"Password\": \"aaaaaaaaaaa\",\"PasswordResetRequired\": true}"
everything is OK. Does anyone know what's wrong? Please suggest!
Put quotes around $JSON:
aws iam create-login-profile --cli-input-json "$JSON"
The quotes that are there during the assignment get consumed by the shell. You can verify this by issuing echo $JSON.
By adding the quotes you will make sure that the entire string is passed to the command "aws" as a single argument.
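A quick sketch of the difference (nothing assumed beyond a POSIX-style shell; printf stands in for the aws command to show how many arguments it receives):
JSON="{\"UserName\": \"roberto.viquezzz\", \"PasswordResetRequired\": true}"

# Unquoted: the shell word-splits the value, so the command receives
# several separate arguments instead of one JSON document.
printf 'arg: %s\n' $JSON

# Quoted: the whole JSON string is passed as a single argument.
printf 'arg: %s\n' "$JSON"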

Need help! - Unable to load JSON using COPY command

Need your expertise here!
I am trying to load a JSON file (generated by json.dumps) into Redshift using the COPY command. The file is in the following format:
[
    {
        "cookieId": "cb2278",
        "environment": "STAGE",
        "errorMessages": [
            "70460"
        ]
    },
    {
        "cookieId": "cb2271",
        "environment": "STG",
        "errorMessages": [
            "70460"
        ]
    }
]
We ran into the error "Invalid JSONPath format: Member is not an object."
When I get rid of the square brackets [] and remove the "," separators between the JSON dicts, it loads perfectly fine.
{
    "cookieId": "cb2278",
    "environment": "STAGE",
    "errorMessages": [
        "70460"
    ]
}
{
    "cookieId": "cb2271",
    "environment": "STG",
    "errorMessages": [
        "70460"
    ]
}
But in reality, most JSON files from APIs have this formatting.
I could do a string replace or use a regex to get rid of the commas and brackets, but I am wondering if there is a better way to load the file into Redshift seamlessly without modifying it.
One way to convert a JSON array into a stream of the array's elements is to pipe the former into jq '.[]'. The output is sent to stdout.
If the JSON array is in a file named input.json, then the following command will produce a stream of the array's elements on stdout:
$ jq ".[]" input.json
If you want the output in jsonlines format, then use the -c switch (i.e. jq -c ......).
For more on jq, see https://stedolan.github.io/jq
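Putting it together for the Redshift case, a rough sketch (the file name, bucket, table, and IAM role below are placeholders rather than values from the question; COPY with JSON 'auto' expects one object per line):
# Convert the array into newline-delimited objects and stage the file on S3.
jq -c '.[]' input.json > input.jsonl
aws s3 cp input.jsonl s3://my-bucket/input.jsonl

# Then load it in Redshift, for example:
#   COPY my_table
#   FROM 's3://my-bucket/input.jsonl'
#   IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
#   FORMAT AS JSON 'auto';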