Convert an array of strings to a dictionary with jq?

I am trying to convert the AWS public IP ranges document into a format that can be used with the Terraform external data provider, so I can create a security group rule based on the AWS public CIDRs. The provider requires a single JSON object in this format:
{"string": "string"}
Here is a snippet of the public ranges JSON document:
{
  "syncToken": "1589917992",
  "createDate": "2020-05-19-19-53-12",
  "prefixes": [
    {
      "ip_prefix": "35.180.0.0/16",
      "region": "eu-west-3",
      "service": "AMAZON",
      "network_border_group": "eu-west-3"
    },
    {
      "ip_prefix": "52.94.76.0/22",
      "region": "us-west-2",
      "service": "AMAZON",
      "network_border_group": "us-west-2"
    }
    // ...
  ]
}
I can successfully extract the ranges I care about with [.prefixes[] | select(.region == "us-west-2") | .ip_prefix] | sort | unique, which gives me this:
[
  "100.20.0.0/14",
  "108.166.224.0/21",
  "108.166.240.0/21",
  "13.248.112.0/24",
  ...
]
I can't figure out how to convert this to an arbitrarily-keyed object with jq. In order to properly use the array object, I need to convert it to a dictionary, something like {"arbitrary-key": "100.20.0.0/14"}, so that I can use it in Terraform like this:
data "external" "amazon-ranges" {
program = [
"cat",
"${path.cwd}/aws-ranges.json"
]
}
resource "aws_default_security_group" "allow-mysql" {
vpc_id = aws_vpc.main.id
ingress {
description = "MySQL"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = [
values(data.external.amazon-ranges.result)
]
}
}
What is the most effective way to extract the AWS public IP ranges document into a single object with arbitrary keys?

The following script uses the .ip_prefix as the key, thus perhaps avoiding the need for the sort|unique. It yields:
{
  "35.180.0.0/16": "35.180.0.0/16",
  "52.94.76.0/22": "52.94.76.0/22"
}
Script
#!/bin/bash

function data {
  cat <<EOF
{
  "syncToken": "1589917992",
  "createDate": "2020-05-19-19-53-12",
  "prefixes": [
    {
      "ip_prefix": "35.180.0.0/16",
      "region": "eu-west-3",
      "service": "AMAZON",
      "network_border_group": "eu-west-3"
    },
    {
      "ip_prefix": "52.94.76.0/22",
      "region": "us-west-2",
      "service": "AMAZON",
      "network_border_group": "us-west-2"
    }
  ]
}
EOF
}

data | jq '
  .prefixes
  | map(select(.region | test("west"))
        | {(.ip_prefix): .ip_prefix})
  | add'
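For the real ranges document, the same map/add idea can be applied directly to the published file and combined with the question's region filter to produce the aws-ranges.json that the external provider reads; a minimal sketch, assuming curl and jq are available on the machine running Terraform:
curl -s https://ip-ranges.amazonaws.com/ip-ranges.json \
  | jq '[.prefixes[]
         | select(.region == "us-west-2")
         | {(.ip_prefix): .ip_prefix}]
        | add' > aws-ranges.json
Because duplicate prefixes map to the same key, they collapse into a single entry, so the explicit sort | unique step is no longer needed.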

There's a better option for getting at the AWS IP ranges data in Terraform: the aws_ip_ranges data source, instead of trying to mangle things with the external data source and jq.
The example in the linked documentation does something similar to, though slightly more complex than, what you're trying to do here:
data "aws_ip_ranges" "european_ec2" {
regions = ["eu-west-1", "eu-central-1"]
services = ["ec2"]
}
resource "aws_security_group" "from_europe" {
name = "from_europe"
ingress {
from_port = "443"
to_port = "443"
protocol = "tcp"
cidr_blocks = data.aws_ip_ranges.european_ec2.cidr_blocks
ipv6_cidr_blocks = data.aws_ip_ranges.european_ec2.ipv6_cidr_blocks
}
tags = {
CreateDate = data.aws_ip_ranges.european_ec2.create_date
SyncToken = data.aws_ip_ranges.european_ec2.sync_token
}
}
To do your exact thing you would do something like this:
data "aws_ip_ranges" "us_west_2_amazon" {
regions = ["us_west_2"]
services = ["amazon"]
}
resource "aws_default_security_group" "allow-mysql" {
vpc_id = aws_vpc.main.id
ingress {
description = "MySQL"
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = data.aws_ip_ranges.us_west_2_amazon.cidr_blocks
}
}
However, there are two problems here.
The first, and most important, is that you're allowing access to your database from every IP address that AWS has in US-West-2 across all services. That means that anyone in the world is able to spin up an EC2 instance or Lambda function in US-West-2 and then have network access to your database. This is a very bad idea.
The second is that if the data source returns more than 60 CIDR blocks, you will end up with more than 60 rules in your security group. AWS security groups have a default limit of 60 rules per IP address type (IPv4 vs IPv6) and per direction (ingress/egress):
You can have 60 inbound and 60 outbound rules per security group (making a total of 120 rules). This quota is enforced separately for IPv4 rules and IPv6 rules; for example, a security group can have 60 inbound rules for IPv4 traffic and 60 inbound rules for IPv6 traffic. A rule that references a security group or prefix list ID counts as one rule for IPv4 and one rule for IPv6.
From https://docs.aws.amazon.com/vpc/latest/userguide/amazon-vpc-limits.html#vpc-limits-security-groups
This is technically a soft cap: you can ask AWS to raise the limit in exchange for reducing the number of security groups that can be applied to a network interface, so that the maximum number of security group rules stays at or below 1,000 per network interface. It's probably not something you want to mess around with, though.
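If the CIDR-based approach is kept despite all that, the ingress rule can at least be narrowed from every protocol and port down to MySQL's TCP port; a rough sketch (3306 is assumed as the MySQL port, the rest mirrors the resource above):
resource "aws_default_security_group" "allow-mysql" {
  vpc_id = aws_vpc.main.id

  ingress {
    description = "MySQL"
    from_port   = 3306
    to_port     = 3306
    protocol    = "tcp"
    cidr_blocks = data.aws_ip_ranges.us_west_2_amazon.cidr_blocks
  }
}
This narrows the exposed surface but does not change the fundamental problem of allowing the whole region's address space to reach the database.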

Related

TTN V3 (MQTT JSON) -> Telegraf -> Grafana / Sensor data from Dragino LSE01 does not appear

I have a problem with Telegraf. I have a Dragino LSE01-8 sensor which is registered on TTN v3. I can check the decoded payload by subscribing to the topic "v3/lse01-8#ttn/devices/+/up".
But when I want to grab the data from InfluxDB, I cannot get "temp_SOIL" and "water_SOIL", although the data appears in the JSON. "conduct_SOIL" is no problem, but I don't know why. Can somebody give me a hint?
Another sensor (Dragino LHT 65) works fine with all data I want to access.
It's possible to get this data from the InfluxDB database:
uplink_message_decoded_payload_BatV
uplink_message_decoded_payload_Mod
uplink_message_decoded_payload_conduct_SOIL
uplink_message_decoded_payload_i_flag
uplink_message_decoded_payload_s_flag
uplink_message_f_cnt
uplink_message_f_port
uplink_message_locations_user_latitude
uplink_message_locations_user_longitude
uplink_message_rx_metadata_0_channel_index
uplink_message_rx_metadata_0_channel_rssi
uplink_message_rx_metadata_0_location_altitude
uplink_message_rx_metadata_0_location_latitude
uplink_message_rx_metadata_0_location_longitude
uplink_message_rx_metadata_0_rssi
uplink_message_rx_metadata_0_snr
uplink_message_rx_metadata_0_timestamp
uplink_message_settings_data_rate_lora_bandwidth
uplink_message_settings_data_rate_lora_spreading_factor
uplink_message_settings_timestamp
## Moisture sensor Dragino LSE01-8
[[inputs.mqtt_consumer]]
  name_override = "TTN-LSE01"
  servers = ["tcp://eu1.cloud.thethings.network:1883"]
  qos = 0
  connection_timeout = "30s"
  topics = [ "v3/lse01-8#ttn/devices/+/up" ]
  client_id = "telegraf"
  username = "lse01-8#ttn"
  password = "NNSXS.LLSNSE67AP..................P67Q.Q...........HPG............KJA..........."
  data_format = "json"
This is the JSON data I can get (I changed some data in order not to send any passwords or tokens).
{
  "end_device_ids":{
    "device_id":"eui-a8.40.141.bbe4",
    "application_ids":{
      "application_id":"lse01-8"
    },
    "dev_eui":"A8...40.BE...4",
    "join_eui":"A8.40.010.1",
    "dev_addr":"2.9F.....8"
  },
  "correlation_ids":[
    "as:up:01G4WDNS..P3C3R...RK56VQ...KT7N076",
    "gs:conn:01G4H2F.ETRG.V2QER...RQ.0K1MGZ44",
    "gs:up:host:01G4H2F.ETWRZX.4PFN.A2M.6RDKD4",
    "gs:uplink:01G4WDN.N7B6P.J8E.JS.503F1",
    "ns:uplink:01G4WDNSFM.MCYYEZZ1.KY.4M78",
    "rpc:/ttn.lorawan.v3.GsNs/HandleUplink:01G4W.NSFM29Z3.PABYW...43",
    "rpc:/ttn.lorawan.v3.NsAs/HandleUplink:01G4W....VTQ4DMKBF"
  ],
  "received_at":"2022-06-06T11:51:18.979353604Z",
  "uplink_message":{
    "session_key_id":"AYE...j+DM....A==",
    "f_port":2,
    "f_cnt":292,
    "frm_payload":"DSQAAAcVB4AADBA=",
    "decoded_payload":{
      "BatV":3.364,
      "Mod":0,
      "conduct_SOIL":12,
      "i_flag":0,
      "s_flag":1,
      "temp_DS18B20":"0.00",
      "temp_SOIL":"19.20",
      "water_SOIL":"18.13"
    },
    "rx_metadata":[
      {
        "gateway_ids":{
          "gateway_id":"lr8",
          "eui":"3.6201F0.058.....00"
        },
        "time":"2022-06-06T11:51:00.289713Z",
        "timestamp":4283143007,
        "rssi":-47,
        "channel_rssi":-47,
        "snr":7,
        "location":{
          "latitude":51.______________,
          "longitude":6.__________________,
          "altitude":25,
          "source":"SOURCE_REGISTRY"
        },
        "uplink_token":"ChsKG________________________________",
        "channel_index":2
      }
    ],
    "settings":{
      "data_rate":{
        "lora":{
          "bandwidth":125000,
          "spreading_factor":7
        }
      },
      "coding_rate":"4/5",
      "frequency":"868500000",
      "timestamp":4283143007,
      "time":"2022-06-06T11:51:00.289713Z"
    },
    "received_at":"2022-06-06T11:51:18.772518399Z",
    "consumed_airtime":"0.061696s",
    "locations":{
      "user":{
        "latitude":51._________________,
        "longitude":6.__________________4,
        "source":"SOURCE_REGISTRY"
      }
    },
    "version_ids":{
      "brand_id":"dragino",
      "model_id":"lse01",
      "hardware_version":"_unknown_hw_version_",
      "firmware_version":"1.1.4",
      "band_id":"EU_863_870"
    },
    "network_ids":{
      "net_id":"000013",
      "tenant_id":"ttn",
      "cluster_id":"eu1",
      "cluster_address":"eu1.cloud.thethings.network"
    }
  }
}
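One detail that stands out in the payload above: "conduct_SOIL" is a plain JSON number, while "temp_SOIL" and "water_SOIL" are JSON strings ("19.20", "18.13"). Telegraf's json data format only collects numeric fields by default and silently drops string values, which matches this symptom exactly. A hedged sketch of the parser settings that would keep them (the exact spelling of the flattened field names here is an assumption to verify against your measurements):
  data_format = "json"
  ## Assumption: string-typed payload fields must be listed explicitly;
  ## check whether your Telegraf version expects the flattened or the nested key names.
  json_string_fields = [
    "uplink_message_decoded_payload_temp_SOIL",
    "uplink_message_decoded_payload_water_SOIL"
  ]
Fields kept this way arrive in InfluxDB as strings; turning them into floats would additionally need something like Telegraf's converter processor.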

json preserve original data order

I am rendering the following data
set firewall family inet filter v4-test term accept_1 from protocol lilla
set firewall family inet filter v4-test term accept_2 then accept
set firewall family inet filter v4-test term accept_3 from source-prefix-list v4-test
set firewall family inet filter v4-test term accept_3 from destination-port 1
set firewall family inet filter v4-test term accept_3 from destination-port 2
set firewall family inet filter v4-test term accept_3 then accept
set firewall family inet filter v4-test term access_4 from source-address x.x.x.x/32
via ttp templating
parser = ttp(data=data_to_parse, template=ttp_template)
parser.parse()
results = parser.result(format='json')[0]
results_dic = json.loads(results)
The output of the rendering is a JSON file:
[
  {
    "v4-test": {
      "accept_2": {
        "action": "accept"
      },
      "accept_1": {
        "protocol": "lilla"
      },
      "accept_3": [
        {
          "source-prefix-list": "v4-test"
        },
        {
          "destination-port": "1"
        },
        {
          "destination-port": "2"
        },
        {
          "action": "accept"
        }
      ],
      "access_4": {
        "source-prefix-list": "x.x.x.x/32"
      }
    }
  }
]
Problem: I want the output data to keep the order of the original data. Any hint?
Thank you.
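One thing worth ruling out: the json module itself does not reorder keys. json.loads and json.dumps keep keys in the order they appear in the document (Python 3.7+ dicts preserve insertion order, and nothing is sorted unless sort_keys is requested), so the reordering most likely happens while ttp builds its result structure. A quick check, assuming Python 3.7+:
import json

# json round-trips preserve key order; no sorting happens here.
doc = '{"accept_1": {}, "accept_3": {}, "accept_2": {}}'
print(list(json.loads(doc)))        # ['accept_1', 'accept_3', 'accept_2']
print(json.dumps(json.loads(doc)))  # same key order as the input document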

Glue_version and python_version not working in terraform

Hello everyone,
I am using Terraform to create a Glue job. AWS Glue now supports the ability to run ETL jobs on Apache Spark 2.4.3 (with Python 3).
I want to use this feature, but whenever I make the change it throws an error.
I am using
aws-cli/1.16.184.
Terraform v0.12.6
aws provider 2.29
resource "aws_glue_job" "aws_glue_job_foo" {
glue_version = "1"
name = "job-name"
description = "job-desc"
role_arn = data.aws_iam_role.aws_glue_iam_role.arn
max_capacity = 1
max_retries = 1
connections = [aws_glue_connection.connection.name]
timeout = 5
command {
name = "pythonshell"
script_location = "s3://bucket/script.py"
python_version = "3"
}
default_arguments = {
"--job-language" = "python"
"--ENV" = "env"
"--ROLE_ARN" = data.aws_iam_role.aws_glue_iam_role.arn
}
execution_property {
max_concurrent_runs = 1
}
}
But it throws this error:
Error: Unsupported argument
An argument named "glue_version" is not expected here.
This Terraform issue has been resolved.
Terraform aws_glue_job now accepts a glue_version argument.
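With a provider at or above that version, the Glue version can be set directly on the resource from the question; a minimal sketch (the "1.0" version string and the unchanged pythonshell command block are assumptions to adapt to your job):
resource "aws_glue_job" "aws_glue_job_foo" {
  name         = "job-name"
  role_arn     = data.aws_iam_role.aws_glue_iam_role.arn
  glue_version = "1.0"
  max_capacity = 1

  command {
    name            = "pythonshell"
    script_location = "s3://bucket/script.py"
    python_version  = "3"
  }
}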
Previous Answer
With or without python_version in the Terraform command block, I must go to the AWS console to edit the job and set "Glue version". My job fails without this manual step.
Workaround #1
This issue has been reported and debated and includes a workaround.
resource "aws_glue_job" "etl" {
name = "${var.job_name}"
role_arn = "${var.iam_role_arn}"
command {
script_location = "s3://${var.bucket_name}/${aws_s3_bucket_object.script.key}"
}
default_arguments = {
"--enable-metrics" = ""
"--job-language" = "python"
"--TempDir" = "s3://${var.bucket_name}/TEMP"
}
# Manually set python 3 and glue 1.0
provisioner "local-exec" {
command = "aws glue update-job --job-name ${var.job_name} --job-update 'Command={ScriptLocation=s3://${var.bucket_name}/${aws_s3_bucket_object.script.key},PythonVersion=3,Name=glueetl},GlueVersion=1.0,Role=${var.iam_role_arn},DefaultArguments={--enable-metrics=\"\",--job-language=python,--TempDir=\"s3://${var.bucket_name}/TEMP\"}'"
}
}
Workaround #2
Here is a different workaround.
resource "aws_cloudformation_stack" "network" {
name = "${local.name}-glue-job"
template_body = <<STACK
{
"Resources" : {
"MyJob": {
"Type": "AWS::Glue::Job",
"Properties": {
"Command": {
"Name": "glueetl",
"ScriptLocation": "s3://${local.bucket_name}/jobs/${var.job}"
},
"ExecutionProperty": {
"MaxConcurrentRuns": 2
},
"MaxRetries": 0,
"Name": "${local.name}",
"Role": "${var.role}"
}
}
}
}
STACK
}
This has been released in version 2.34.0 of the Terraform AWS provider.
It looks like Terraform uses python_version instead of glue_version.
By using python_version = "3", you should be on Glue version 1.0. Glue version 0.9 doesn't support Python 3.

Kafka-Connect JDBC Connector tinyint to boolean mapping

I have a Kafka Connect job configured to query a MySQL table periodically and place messages on a queue. The structure of these messages is defined using an Avro schema. I am having an issue with the mapping for one of my columns.
The column is defined as a tinyint(1) in my MySQL schema, and I am trying to map this to a boolean field in my avro object.
{
  "name": "is_active",
  "type": "boolean"
}
The Kafka Connect job runs, and messages are placed on the queue, but when my application that reads from the queue attempts to deserialize the messages, I get the following error:
org.apache.avro.AvroTypeException: Found int, expecting boolean
I was hoping that a 1 or 0 value could be automatically mapped to a boolean, but that does not seem to be the case.
I have also tried to configure my job to use a 'Cast' transform, but that just seems to cause issues with the other fields in the message.
"transforms": "Cast",
"transforms.Cast.type": "org.apache.kafka.connect.transforms.Cast$Value",
"transforms.Cast.spec": "is_active:boolean"
Is what I am attempting possible, or will I have to change my application to work with the int value?
Here is my full configuration (I have stripped out some other irrelevant fields).
Kafka Connect job config
{
  "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
  "mode": "bulk",
  "topic.prefix": "my_topic-name",
  "transforms.SetSchemaMetadata.type": "org.apache.kafka.connect.transforms.SetSchemaMetadata$Value",
  "query": "select is_active from my_table",
  "poll.interval.ms": "30000",
  "transforms": "SetSchemaMetadata",
  "name": "job_name",
  "connection.url": "connectiondetailshere",
  "transforms.SetSchemaMetadata.schema.name": "com.my.model.name"
}
AVRO Schema
{
  "type": "record",
  "name": "name",
  "namespace": "com.my.model",
  "fields": [
    {
      "name": "is_active",
      "type": "long"
    }
  ],
  "connect.name": "com.my.model.name"
}
You can do this either with a custom Transform (this is a perfect use case for one) or by writing a simple streaming application to do it, for example in KSQL:
CREATE STREAM my_topic AS
  SELECT COL1, COL2, …,
         CASE WHEN is_active = 1 THEN TRUE ELSE FALSE END AS is_active_bln
  FROM my_source_connect_topic;

ksql> describe my_topic;

Name : my_topic
 Field         | Type
-----------------------------------------
 ROWTIME       | BIGINT (system)
 ROWKEY        | VARCHAR(STRING) (system)
 COL1          | INTEGER
 COL2          | VARCHAR
 IS_ACTIVE_BLN | BOOLEAN
-----------------------------------------
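Note that the CREATE STREAM ... AS SELECT above assumes a stream has already been declared over the Connect output topic, along these lines (the topic name is taken from the connector config above, and AVRO is assumed as the value format):
CREATE STREAM my_source_connect_topic
  WITH (KAFKA_TOPIC='my_topic-name', VALUE_FORMAT='AVRO');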

compare input to fields in a json file in ruby

I am trying to create a function that takes an input, which in this case is a tracking code, looks that tracking code up in a JSON file, and then returns it as output. The JSON file is as follows:
[
  {
    "tracking_number": "IN175417577",
    "status": "IN_TRANSIT",
    "address": "237 Pentonville Road, N1 9NG"
  },
  {
    "tracking_number": "IN175417578",
    "status": "NOT_DISPATCHED",
    "address": "Holly House, Dale Road, Coalbrookdale, TF8 7DT"
  },
  {
    "tracking_number": "IN175417579",
    "status": "DELIVERED",
    "address": "Number 10 Downing Street, London, SW1A 2AA"
  }
]
I have started using this function:
def compare_content(tracking_number)
  File.open("pages/tracking_number.json", "r") do |file|
    file.print()
  end
end
I'm not sure how I would compare the input to the JSON file. Any help would be much appreciated.
You can use the built-in JSON module.
require 'json'

def compare_content(tracking_number)
  # Loads the ENTIRE file into a string. Will not be effective on very large files
  json_string = File.read("pages/tracking_number.json")

  # Uses the JSON module to create an array from the JSON string
  array_from_json = JSON.parse(json_string)

  # Iterates through the array of hashes
  array_from_json.each do |tracking_hash|
    if tracking_number == tracking_hash["tracking_number"]
      # If this code runs, tracking_hash has the data for the number you are looking up
    end
  end
end
This will parse the JSON supplied into an array of hashes which you can then compare to the number you are looking up.
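If you only need the first matching record, the same lookup can be written more compactly with Array#find; a small sketch under the same file-layout assumption (find_tracking is just an illustrative name):
require 'json'

def find_tracking(tracking_number)
  records = JSON.parse(File.read("pages/tracking_number.json"))
  # Returns the matching hash, or nil if the tracking number is unknown
  records.find { |record| record["tracking_number"] == tracking_number }
end

find_tracking("IN175417578")
# => {"tracking_number"=>"IN175417578", "status"=>"NOT_DISPATCHED", "address"=>"Holly House, Dale Road, Coalbrookdale, TF8 7DT"}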
If you are the one generating the JSON file and this method will be called a lot, consider mapping the tracking numbers directly to their data for this method to potentially run much faster. For example,
{
  "IN175417577": {
    "status": "IN_TRANSIT",
    "address": "237 Pentonville Road, N1 9NG"
  },
  "IN175417578": {
    "status": "NOT_DISPATCHED",
    "address": "Holly House, Dale Road, Coalbrookdale, TF8 7DT"
  },
  "IN175417579": {
    "status": "DELIVERED",
    "address": "Number 10 Downing Street, London, SW1A 2AA"
  }
}
That would parse into a hash, where you could much more easily grab the data:
require 'json'

def compare_content(tracking_number)
  json_string = File.read("pages/tracking_number.json")
  hash_from_json = JSON.parse(json_string)

  if hash_from_json.key?(tracking_number)
    tracking_hash = hash_from_json[tracking_number]
  else
    # Tracking number does not exist
  end
end
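For example, calling it with one of the keys from the restructured file returns that record's data directly:
compare_content("IN175417577")
# => {"status"=>"IN_TRANSIT", "address"=>"237 Pentonville Road, N1 9NG"}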