Airflow 2: How to render ds_nodash macro as YYYY/MM - jinja2

I am currently migrating a job from Airflow 1.10.14 to 2.1.4.
In Airflow 2, I am using the BeamRunPythonPipelineOperator, and one of the requirements is to store data in GCS following this pattern: gs://datalate/data_source/YYYY/MM/model.
partition_sessions_unlimited = BeamRunPythonPipelineOperator(
    task_id="partition_sessions_unlimited",
    dag=aggregation_dag,
    py_file=os.path.join(
        BEAM_SRC_DIR,
        "streaming_sessions",
        "streaming_session_aggregation_pipeline.py",
    ),
    runner="DataflowRunner",
    dataflow_config=DataflowConfiguration(
        job_name="%s_partition_sessions_unlimited" % ds_env,
        project_id=GCP_PROJECT_ID,
        location="us-central1",
    ),
    pipeline_options={
        "temp_location": "gs://dataflow-temp/{}/{}/amazon_sessions/amz_unlimited".format(
            sch_date, ds_env
        ),
        "staging_location": "gs://dataflow-staging/{}/{}/amazon_sessions/amz_unlimited".format(
            sch_date, ds_env
        ),
        "disk_size_gb": "100",
        "num_workers": "10",
        "num_max_workers": "25",
        "worker_machine_type": "n1-highcpu-64",
        "setup_file": os.path.join(
            BEAM_SRC_DIR, "streaming_sessions", "setup.py"
        ),
        "input": "gs://{}/amazon_sessions/{{ ds_nodash[:4] }}/{{ ds_nodash[4:6] }}/amz_unlimited/input/listens_*".format(
            w_datalake,
        ),
        "output": "gs://{}/amazon_sessions/{{ ds_nodash[:4] }}/{{ ds_nodash[4:6] }}/amz_unlimited/output/sessions_".format(
            w_datalake
        ),
    },
)
However, I get
'output': 'gs://datalake/amazon_sessions/{ ds_nodash[:4] }/{ ds_nodash[4:6] }/amz_prime/output/sessions_',
instead of
'output': 'gs://datalake/amazon_sessions/2022/02/amz_prime/output/sessions_',
How can I achieve this?

First, you are calling format() on a string that is also a Jinja-templated field.
format() replaces each {} (or {name}) placeholder with the corresponding argument, and it treats {{ and }} as escapes for a literal { and }.
"gs://{}/.../{{ ds_nodash[:4] }}...".format(w_datalake)
The first {} is replaced with "datalake". The {{ ds_nodash[:4] }} part is not a placeholder: format() collapses the doubled braces into single ones, so the result contains the literal { ds_nodash[:4] }, which Jinja no longer recognizes as an expression.
"gs://datalake/.../{ ds_nodash[:4] }..."
To use a Jinja template inside a formatted string, escape the braces that are meant for Jinja: write {{ for each { and }} for each }. The Jinja expression already has two braces on each side, so double them to four, like this:
"gs://{}/.../{{{{ ds_nodash[:4] }}}}...".format(w_datalake)
With this, format() is applied first (substituting the value and removing the escape braces), which turns the string into
gs://datalake/.../{{ ds_nodash[:4] }}...
This string is then passed to BeamRunPythonPipelineOperator, where the Jinja expression is rendered.
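The brace handling can be checked in plain Python, without Airflow. This is a minimal sketch: the value of w_datalake and the shortened path are stand-ins chosen for illustration, not taken from the real DAG.

```python
# Hypothetical value standing in for the w_datalake variable in the question.
w_datalake = "datalake"

# Doubled braces: format() collapses {{ }} to { }, which Jinja will not render.
broken = "gs://{}/amazon_sessions/{{ ds_nodash[:4] }}/{{ ds_nodash[4:6] }}/output".format(
    w_datalake
)

# Quadrupled braces: format() collapses {{{{ }}}} to {{ }}, leaving valid Jinja.
fixed = "gs://{}/amazon_sessions/{{{{ ds_nodash[:4] }}}}/{{{{ ds_nodash[4:6] }}}}/output".format(
    w_datalake
)

print(broken)
print(fixed)
```

The first string reproduces the single-brace output from the question; the second still contains {{ ... }} for Jinja to render later.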
Second, instead of slicing ds_nodash twice, you can format execution_date directly:
{{ execution_date.strftime('%Y/%m') }}
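As a quick check outside Airflow, the sketch below uses a plain datetime with a made-up date in place of execution_date and confirms that strftime('%Y/%m') gives the same path segment as slicing ds_nodash. Inside a string that also goes through format(), the Jinja braces around this expression would still need to be quadrupled as described above.

```python
from datetime import datetime

# Made-up date standing in for Airflow's execution_date.
execution_date = datetime(2022, 2, 15)

# ds_nodash is the same date rendered as YYYYMMDD.
ds_nodash = execution_date.strftime("%Y%m%d")

# Approach from the question: slice year and month out of ds_nodash.
sliced = "{}/{}".format(ds_nodash[:4], ds_nodash[4:6])

# Approach from the answer: format the date in one step.
formatted = execution_date.strftime("%Y/%m")

print(sliced, formatted)
```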

Method to make themes in vscode conditional

I use Bracket Pair Colorizer. Is it possible to change the bracket pair settings depending on the active theme (for example, Monokai)?
I know it's a JSON file and JSON files don't accept if statements; otherwise I'd write:
if "theme" == "monokai" {
bracket-pair-colorizer-2.colors: [
"red",
"green",
"blue"
]
}, else {...}
Is there any way to make this a reality?
Edit:
I tried setting the brackets option of editor.tokenColorCustomizations to a list and to a string, then tried the same with parenthesis and bracket, but it simply said that it cannot do that.

Convert string to integer inside body data in jmeter

I stored the job ID in a variable named newID after creating a job. I am trying to pass newID as an integer, but creating a job returns the ID as a string. How can I convert it to an integer?
{"limit":100,"offset":0,"countryID":null,"experienceYear":null,"languageIdIn":[],"degreeLevel":null,"jobID":"${__groovy(newID)}"]}
In JSON, you need to remove the quotes around an integer value:
"jobID":${newID}
Your request body is not valid JSON: remove the unbalanced ]. You can check this yourself with an online JSON validator.
You need to remove the quotation marks around the jobID attribute value; see JSON Data Types for more details.
The ${__groovy(newID)} expression is syntactically incorrect. If newID is a JMeter Variable, refer to it through vars, the shorthand for the JMeterVariables class, as in vars.get('newID').
Putting everything together:
{
  "limit": 100,
  "offset": 0,
  "countryID": null,
  "experienceYear": null,
  "languageIdIn": [
  ],
  "degreeLevel": null,
  "jobID": ${__groovy(vars.get('newID'),)}
}
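The two JSON problems can also be checked locally instead of with an online validator. This is a rough sketch in Python rather than JMeter: new_id is a made-up value standing in for what JMeter would substitute, and the bodies are shortened to a few fields.

```python
import json

# Made-up value standing in for the newID JMeter variable after substitution.
new_id = "42"

# Shape of the body in the question: quoted jobID and a stray "]" before "}".
original = '{"limit":100,"offset":0,"languageIdIn":[],"jobID":"' + new_id + '"]}'

# Corrected shape: no quotes around the value, no stray bracket.
corrected = '{"limit":100,"offset":0,"languageIdIn":[],"jobID":' + new_id + "}"


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(original))
print(is_valid_json(corrected))
print(type(json.loads(corrected)["jobID"]))
```

With the quotes removed, the parsed jobID is an integer rather than a string, which is what the receiving API is expected to need.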

String Interpolation in Keys in Jsonnet

I'm wondering if it's possible to have string interpolation in keys when using jsonnet?
For example, I want to do something like this:
{
  std.format("Hello %03d", 12): "milk"
}
But it results in
STATIC ERROR: arith.jsonnet:2:5: expected token OPERATOR but got "."
I know the 'key' itself is valid, because if I don't use interpolation it works fine, i.e.
{
  "milk": std.format("Hello %03d", 12),
  "Hello 12": "milk"
}
generates:
{
  "Hello 12": "milk",
  "milk": "Hello 012"
}
It also looks like I can't use variables in keys, as they get resolved as just a string (not the variable's value) - any suggestions would be appreciated.
Computed field names must be wrapped in [] (see https://jsonnet.org/learning/tutorial.html#computed_field_names), so the following works:
{
  [std.format("Hello %03d", 12)]: "milk"
}

Ruby sort_by for arrays returned by MySQL, date formatted as string

I have a database that has a task table. In that table, there is a date column. Those dates are formatted as strings, they aren't Date.
I'm trying to sort these tasks by date. I already have an array of the tasks named tasks. I'm trying to replace it with the sorted array called tasksByDate using the below code.
tasksByDate = tasks.sort_by do |task|
  task[:date].to_date
end
The error I'm getting is:
TypeError: no implicit conversion of Symbol into Integer
I also tried without to_date just to see if it would sort it without it being a date, and just being a string.
The date field is formatted as a string like so 2016-08-29. I used the to_date method on it somewhere else in the code, and it works great, so I didn't really think that was the problem.
Edit 1
I have checked that tasks actually contains a date, and it is formatted like explained.
The output of p task.class is Array
Edit 2
The output of p task is
[#<User id: 10, login: "my.name", hashed_password: "", date: "2016-08-29">]
The elements appear to be nested one level deeper than you expected. Change your code to:
# use '{ }' instead of 'do end' for single-line blocks
tasksByDate = tasks.sort_by { |task| task.first[:date].to_date }
Explanation:
What you see as an output of p task:
[#<User id: 10, login: "my.name", hashed_password: "", date: "2016-08-29">]
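It means that each task is itself an Array containing a single element; notice the enclosing brackets [ ]. Calling task[:date] on an Array is what raises the TypeError, because Array#[] expects an Integer index, not a Symbol. So in this case you need task.first, which returns: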
#<User id: 10, login: "my.name", hashed_password: "", date: "2016-08-29">
From there you should be able to access the element's values by a key, like you intended:
task.first[:date]

Get the value of a JSON array with _attribute

I have a strange-looking JSON file (I think?) generated from Elasticsearch.
I was wondering if anyone knows how I could retrieve the data from a JSON object that looks like this:
{u'hits': {
    u'hits': [{
        u'_score': 2.1224,
        u'_source': {u'content': u'SomethingSomething'}
    }],
    u'total': 8},
 u'took': 2}
I can retrieve the total by writing {{ results.hits.hits.total }}, however, the underscore symbol (_) in front of the attribute name "_score" makes it impossible to retrieve the value of that attribute.
Any suggestions?
Try:
{{ results.hits.hits[0]._score }}
{{ results.hits.hits[0]._source }}
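The same lookup can be tried in plain Python. This is a sketch that assumes results is a dictionary shaped like the one in the question: the inner hits is a list, so an index is needed before the underscore-prefixed keys can be reached.

```python
# Response shaped like the one in the question (values copied from it).
results = {
    "hits": {
        "hits": [
            {"_score": 2.1224, "_source": {"content": "SomethingSomething"}}
        ],
        "total": 8,
    },
    "took": 2,
}

# The inner "hits" is a list, so pick an element before reading its keys.
first_hit = results["hits"]["hits"][0]

print(first_hit["_score"])
print(first_hit["_source"]["content"])
```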