Perseo events do not seem to fire with NGSI-v2 - fiware

Hi,
We have Orion CB and data (NGSI-v2) like this:
[
{
"id": "bloodm1",
"type": "BloodMeter",
"hippo": {
"type": "Number",
"value": 39,
"metadata": {}
}
}
]
and a subscription like this:
{
"id": "5ecf6be4e9f143d750cb7d63",
"description": "Perseo Subscription",
"status": "active",
"subject": {
"entities": [
{
"idPattern": ".*"
}
],
"condition": {
"attrs": []
}
},
"notification": {
"timesSent": 26,
"lastNotification": "2020-05-28T11:41:54.00Z",
"attrs": [],
"onlyChangedAttrs": false,
"attrsFormat": "normalized",
"http": {
"url": "http://perseo-fe.fiware-dev.svc.cluster.local:9090/notices"
},
"metadata": [
"dateCreated",
"dateModified",
"timestamp"
],
"lastSuccess": "2020-05-28T11:41:54.00Z",
"lastSuccessCode": 200
}
}
and a rule like this:
{
"_id": "5ecfb70f1d163a0007dd715e",
"name": "perseo_email12",
"text": "select \"perseo_email12\" as ruleName, * from pattern [every ev=iotEvent(cast(hippo?,float) > 1)]",
"action": {
"type": "email",
"parameters": {
"to": "adf.fasdf#asdfator.fi",
"from": "mail#asdfator.fi",
"subject": "It's The End Of The World As We Know It (And I Feel Fine)"
}
},
"subservice": "/",
"service": "unknownt"
}
It seems that the email is not sent. What are we doing wrong? We can see from the Perseo backend logs that the event arrives there. What should we see in the logs if the action fires?
Is there any way to force a rule to fire, or to test the email (to rule out misconfiguration)?
This is what we see in the core logs:
time=2020-05-28T13:11:19.399Z | lvl=INFO | from=::ffff:192.168.29.199 | corr=b84fca16-a0e4-11ea-9391-167c661b292c; perseocep=121 | trans=51ac0299-4308-47c9-9c1b-ceb99b257c99 | srv=perseo | subsrv=/ | op=doPost | comp=perseo-core | msg=incoming event: {"noticeId":"b8557f60-a0e4-11ea-9861-53e82ada17b4","noticeTS":1590671479382,"id":"bloodm1","type":"BloodMeter","isPattern":false,"subservice":"/","service":"perseo","hippo__type":"Number","hippo":40,"hippo__metadata__dateCreated__type":"DateTime","hippo__metadata__dateCreated__ts":1590671100000,"hippo__metadata__dateCreated__day":28,"hippo__metadata__dateCreated__month":5,"hippo__metadata__dateCreated__year":2020,"hippo__metadata__dateCreated__hour":13,"hippo__metadata__dateCreated__minute":5,"hippo__metadata__dateCreated__second":0,"hippo__metadata__dateCreated__millisecond":0,"hippo__metadata__dateCreated__dayUTC":28,"hippo__metadata__dateCreated__monthUTC":5,"hippo__metadata__dateCreated__yearUTC":2020,"hippo__metadata__dateCreated__hourUTC":13,"hippo__metadata__dateCreated__minuteUTC":5,"hippo__metadata__dateCreated__secondUTC":0,"hippo__metadata__dateCreated__millisecondUTC":0,"hippo__metadata__dateModified__type":"DateTime","hippo__metadata__dateModified__ts":1590671479000,"hippo__metadata__dateModified__day":28,"hippo__metadata__dateModified__month":5,"hippo__metadata__dateModified__year":2020,"hippo__metadata__dateModified__hour":13,"hippo__metadata__dateModified__minute":11,"hippo__metadata__dateModified__second":19,"hippo__metadata__dateModified__millisecond":0,"hippo__metadata__dateModified__dayUTC":28,"hippo__metadata__dateModified__monthUTC":5,"hippo__metadata__dateModified__yearUTC":2020,"hippo__metadata__dateModified__hourUTC":13,"hippo__metadata__dateModified__minuteUTC":11,"hippo__metadata__dateModified__secondUTC":19,"hippo__metadata__dateModified__millisecondUTC":0,"stripped":{"id":"bloodm1","type":"BloodMeter","hippo":{"type":"Number","value":40,"metadata":{"dateCreated":{"type":"DateTime","value":"2020-05-28T13:05:00.00Z"},"dateModified":{"type":"DateTime","value":"2020-05-28T13:11:19.00Z"}}}}}
EDIT:
OK, we made progress (we had not understood that the fiware-service header is needed when posting the rule, our bad). But email sending is not working. We get this error:
time=2020-06-08T12:01:05.234Z | lvl=DEBUG | corr=ba89f43e-a97f-11ea-9b7c-167c661b292c; perseocep=2 | trans=3ec8910b-ef8b-461e-bf71-dbf10f9ecf85 | op=/actions/do | path=/actions/do | comp=perseo-fe | srv=perseo | subsrv=/ | msg=emailAction.SendMail {"from":"mail#profirator.fi","to":"ilari.mikkonen#profirator.fi","subject":"Perseo Test One","headers":{}} {"code":"EENVELOPE","response":"554 5.7.1 <unknown[212.15.209.181]>: Client host rejected: Access denied","responseCode":554} undefined
time=2020-06-08T12:01:05.237Z | lvl=ERROR | corr=ba89f43e-a97f-11ea-9b7c-167c661b292c; perseocep=2 | trans=3ec8910b-ef8b-461e-bf71-dbf10f9ecf85 | op=/actions/do | path=/actions/do | comp=perseo-fe | srv=perseo | subsrv=/ | msg=emailAction.SendMail {"to":"ilari.mikkonen#profirator.fi","from":"mail#profirator.fi","subject":"Perseo Test One"} Can't send mail - all recipients were rejected: 554 5.7.1 <unknown[212.15.209.181]>: Client host rejected: Access denied
The email credentials are tested and working on other components. We tested with 2 different email services. We pass these values via Docker env variables:
PERSEO_SMTP_HOST: email.service.host
PERSEO_SMTP_PORT: 587
PERSEO_SMTP_SECURE: "false"
PERSEO_SMTP_AUTH_USER: user#email.com
PERSEO_SMTP_AUTH_PASS: password
We also tried setting PERSEO_SMTP_TLS_REJECTUNAUTHORIZED to false.

I think we got it: email sending is not working because we are using STARTTLS and the email server requires a username and password: https://github.com/telefonicaid/perseo-fe/issues/272
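To rule out a credentials or relay problem independently of Perseo, the same SMTP settings can be exercised from a small script. This is only a minimal sketch in Python, assuming the server expects STARTTLS on port 587 followed by a login. The helper names `build_message` and `send_with_starttls` are hypothetical and not part of Perseo, and the host, user, and addresses in the usage comment are placeholders.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Build a plain-text test message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_with_starttls(host, port, user, password, msg, timeout=10):
    """Connect in plain text, upgrade with STARTTLS, authenticate, then send.

    Hypothetical helper for manual testing; it raises an smtplib exception
    if any step (TLS upgrade, login, delivery) is refused by the server.
    """
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.ehlo()
        smtp.starttls()
        smtp.ehlo()
        smtp.login(user, password)
        smtp.send_message(msg)


# Example usage (placeholder values, not run automatically):
# msg = build_message("mail@example.com", "someone@example.com",
#                     "Perseo Test One", "SMTP check outside Perseo")
# send_with_starttls("email.service.host", 587, "user@example.com", "password", msg)
```

If this script is accepted by the server with the same host, port, and credentials, the 554 rejection seen from Perseo would be consistent with the explanation in the linked issue rather than with bad credentials.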

Kusto Query using a bracket with a wildcard

Can you help me identify what type of wildcard I need to use to find a certain email address in my properties field?
I know that the email I'm looking for is in slot number 2.
How can I find the email address without knowing the slot number?
Can I use [*] instead of [2]?
Here's my query:
resources
| where type == 'microsoft.insights/actiongroups'
| where properties["enabled"] in~ ('true')
| where properties['emailReceivers'][2]['emailAddress'] == "DevSecOps#pato.com"
| project id,name,resourceGroup,subscriptionId,properties,location
| order by tolower(tostring(name)) asc
I have the following data in my properties field:
{
"enabled": true,
"automationRunbookReceivers": [],
"azureFunctionReceivers": [],
"azureAppPushReceivers": [],
"logicAppReceivers": [],
"eventHubReceivers": [],
"webhookReceivers": [],
"armRoleReceivers": [],
"emailReceivers": [
{
"name": "TED",
"status": "Enabled",
"useCommonAlertSchema": true,
"emailAddress": "tedtechnicalengineeringdesign#pato.com"
},
{
"name": "SevenOfNine",
"status": "Enabled",
"useCommonAlertSchema": true,
"emailAddress": "sevenofnine#pato.com"
},
{
"name": "PEAT",
"status": "Enabled",
"useCommonAlertSchema": true,
"emailAddress": "DevSecOps#pato.com"
}
],
"voiceReceivers": [],
"groupShortName": "eng-mon",
"itsmReceivers": [],
"smsReceivers": []
}
I've tried using [*] instead of [2] but it didn't work.
Do you need to find a certain email address in properties? Can you please explain a little more why you need wildcards? Could this query work for you? It expands the 'emailReceivers' list and keeps the rows where emailAddress contains the value you are searching for.
resources
| where type == 'microsoft.insights/actiongroups'
| where properties["enabled"] in~ ('true')
| mv-expand properties['emailReceivers'] | limit 100
| extend emailAddr = properties_emailReceivers['emailAddress']
| where emailAddr contains "DevSecOps#pato.com"
| project id,name,resourceGroup,subscriptionId,properties,location
| order by tolower(tostring(name)) asc
where properties.emailReceivers has_cs "DevSecOps#pato.com" is theoretically not 100% safe ("DevSecOps#pato.com" might appear in fields other than "emailAddress"), but in your case it might be enough and if you have a large data set it will also be fast.
If you need a 100% guarantee, then also add the following:
where dynamic_to_json(properties.emailReceivers) matches regex '"emailAddress":"DevSecOps#pato.com"'
It's not pretty, but Azure Resource Graph uses just a subset of the KQL supported by Azure Data Explorer.
let resources = datatable(id:string, name:string, resourceGroup:string, subscriptionId:string, location:string, type:string, properties:dynamic)
[
"my_id"
,"my_name"
,"my_resourceGroup"
,"my_subscriptionId"
,"my_location"
,"microsoft.insights/actiongroups"
,dynamic
(
{
"enabled": true,
"automationRunbookReceivers": [],
"azureFunctionReceivers": [],
"azureAppPushReceivers": [],
"logicAppReceivers": [],
"eventHubReceivers": [],
"webhookReceivers": [],
"armRoleReceivers": [],
"emailReceivers": [
{
"name": "TED",
"status": "Enabled",
"useCommonAlertSchema": true,
"emailAddress": "tedtechnicalengineeringdesign#pato.com"
},
{
"name": "SevenOfNine",
"status": "Enabled",
"useCommonAlertSchema": true,
"emailAddress": "sevenofnine#pato.com"
},
{
"name": "PEAT",
"status": "Enabled",
"useCommonAlertSchema": true,
"emailAddress": "DevSecOps#pato.com"
}
],
"voiceReceivers": [],
"groupShortName": "eng-mon",
"itsmReceivers": [],
"smsReceivers": []
}
)
];
resources
| where type == "microsoft.insights/actiongroups"
| where properties.enabled == true
| where properties.emailReceivers has_cs "DevSecOps#pato.com"
| where dynamic_to_json(properties.emailReceivers) matches regex '"emailAddress":"DevSecOps#pato.com"'
| project id,name,resourceGroup,subscriptionId,properties,location
| order by tolower(name) asc
id | name | resourceGroup | subscriptionId | properties | location
my_id | my_name | my_resourceGroup | my_subscriptionId | {"enabled":true,"automationRunbookReceivers":[],"azureFunctionReceivers":[],"azureAppPushReceivers":[],"logicAppReceivers":[],"eventHubReceivers":[],"webhookReceivers":[],"armRoleReceivers":[],"emailReceivers":[{"name":"TED","status":"Enabled","useCommonAlertSchema":true,"emailAddress":"tedtechnicalengineeringdesign#pato.com"},{"name":"SevenOfNine","status":"Enabled","useCommonAlertSchema":true,"emailAddress":"sevenofnine#pato.com"},{"name":"PEAT","status":"Enabled","useCommonAlertSchema":true,"emailAddress":"DevSecOps#pato.com"}],"voiceReceivers":[],"groupShortName":"eng-mon","itsmReceivers":[],"smsReceivers":[]} | my_location
I found a way to do it using the keyword "contains".
That way you don't need to specify which slot to look in; it could be [0], [1], [2] ... [n].
resources
| where type == 'microsoft.insights/actiongroups'
| where properties["enabled"] in~ ('true')
| where properties['emailReceivers'] contains "DevSecOps#pato.com"
| project id,name,resourceGroup,subscriptionId,properties,location
| order by tolower(tostring(name)) asc
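The difference between the approaches in this thread (a substring search over the whole array versus a check on the emailAddress field of each element) can be illustrated outside Kusto. The following is only a sketch of the matching logic in Python, not of Azure Resource Graph itself; `contains_anywhere` and `has_address` are hypothetical helper names, the comparison is assumed to be case-insensitive, and the `decoy` list is invented for illustration.

```python
import json


def contains_anywhere(receivers, needle):
    """Case-insensitive substring search over the serialized list."""
    return needle.lower() in json.dumps(receivers).lower()


def has_address(receivers, address):
    """Case-insensitive exact match on the emailAddress field of any element."""
    wanted = address.lower()
    return any(r.get("emailAddress", "").lower() == wanted for r in receivers)


# Cut-down version of the emailReceivers array from the question.
receivers = [
    {"name": "TED", "emailAddress": "tedtechnicalengineeringdesign#pato.com"},
    {"name": "PEAT", "emailAddress": "DevSecOps#pato.com"},
]

# Hypothetical list where the address only shows up inside a name field.
decoy = [{"name": "forward to DevSecOps#pato.com", "emailAddress": "other#pato.com"}]

target = "DevSecOps#pato.com"
print(contains_anywhere(receivers, target), has_address(receivers, target))  # True True
print(contains_anywhere(decoy, target), has_address(decoy, target))          # True False
```

The second line of output is the false positive that the earlier answer warns about: a substring search matches wherever the text appears, while the per-element check only looks at emailAddress, regardless of the slot number.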

Jira Xray empty Test Details with Behave or Cucumber reports

I'm trying to follow the "Testing using Behave in Python" tutorial. I can get the import to work, but the step described as "The execution details displays the result of the Cucumber Scenario." does not work for me.
Here's what I'm doing:
I'm creating a new Test Execution (say, PROJ-123).
I'm creating a new Automated[Cucumber] test (say, PROJ-234)
I'm creating a new Automated[Cucumber] test (say, PROJ-345)
I'm using the following feature file with Behave
@PROJ-123
Feature: Verify something
Scenario Outline: Verify something with <data>
Given I use the data <data>
Then the result is <result>
@PROJ-234
Examples:
| data | result |
| 1 | 1 |
@PROJ-345
Examples:
| data | result |
| 2 | 4 |
I'm running behave with:
behave -k --format=cucumber_json:PrettyCucumberJSONFormatter -o cucumber.json --junit --format=json -o reports/data.json x.feature
I'm importing the report with:
curl -H "Content-Type: application/json" -X POST -u user:password --data @reports/data.json "https://jira.example.com/rest/raven/1.0/import/execution/behave"
The server reply is:
{"testExecIssue":{"id":"574356","key":"PROJ-123","self":"https://jira.example.com/rest/api/2/issue/574356"},"testIssues":{"success":[{"id":"574408","key":"PROJ-234","self":"https://jira.example.com/rest/api/2/issue/574408"},{"id":"574409","key":"PROJ-345","self":"https://jira.example.com/rest/api/2/issue/574409"}]}}
But when I look at the Test Details for either PROJ-234 or PROJ-345, they are empty.
I've also tried to import the Cucumber JSON test report:
curl -H "Content-Type: application/json" -X POST -u user:pass --data @cucumber.json https://jira.example.com/rest/raven/1.0/import/execution/cucumber
{"testExecIssue":{"id":"574356","key":"PROJ-123","self":"https://jira.example.com/rest/api/2/issue/574356"},"testIssues":{"success":[{"id":"574408","key":"PROJ-234","self":"https://jira.example.com/rest/api/2/issue/574408"},{"id":"574409","key":"PROJ-345","self":"https://jira.example.com/rest/api/2/issue/574409"}]}}
The result is exactly the same: empty Test Details for both PROJ-234 and PROJ-345.
I'm using Jira Data Center v8.13.1 with Xray.
Edit 1: Sergio's comment below states that if I have a feature like the one below it should work:
@PROJ-123
Feature: Verify something
@PROJ-234
# Jira Test ID
Scenario Outline: Verify something with <data>
Given I use the data <data>
Then the result is <result>
Examples:
| data | result |
| 1 | 1 |
| 2 | 4 |
This second feature file generates the following Cucumber JSON report:
[
{
"description": "",
"elements": [
{
"description": "",
"id": "verify-something;verify-something-with-1----#1.1-",
"keyword": "Scenario Outline",
"line": 13,
"location": "x.feature:13",
"name": "Verify something with 1 -- #1.1 ",
"steps": [
{
"keyword": "Given",
"line": 7,
"match": {
"location": "steps/x.py:3"
},
"name": "I use the data 1",
"result": {
"duration": 1996756,
"status": "passed"
},
"step_type": "given"
},
{
"keyword": "Then",
"line": 8,
"match": {
"location": "steps/x.py:7"
},
"name": "the result is 1",
"result": {
"duration": 993013,
"status": "passed"
},
"step_type": "then"
}
],
"tags": [
{
"line": 1,
"name": "PROJ-234"
}
],
"type": "scenario"
},
{
"description": "",
"id": "verify-something;verify-something-with-2----#1.2-",
"keyword": "Scenario Outline",
"line": 14,
"location": "x.feature:14",
"name": "Verify something with 2 -- #1.2 ",
"steps": [
{
"keyword": "Given",
"line": 7,
"match": {
"location": "steps/x.py:3"
},
"name": "I use the data 2",
"result": {
"duration": 1998901,
"status": "passed"
},
"step_type": "given"
},
{
"keyword": "Then",
"line": 8,
"match": {
"location": "steps/x.py:7"
},
"name": "the result is 4",
"result": {
"duration": 0,
"status": "passed"
},
"step_type": "then"
}
],
"tags": [
{
"line": 1,
"name": "PROJ-234"
}
],
"type": "scenario"
}
],
"id": "verify-something",
"keyword": "Feature",
"line": 2,
"name": "Verify something",
"status": "passed",
"tags": [
{
"line": 1,
"name": "PROJ-123"
}
],
"uri": "x.feature"
}
]
It doesn't. The Test Details are still empty (with Behave or Cucumber reports).
Multiple "Examples" sections in a single Scenario Outline are not supported at this time.
Also, the tag should be placed on the Scenario Outline entry, not on the Examples section.
Another important point: the Cucumber Type of your Tests should be "Scenario Outline", not "Scenario". This should fix your initial problem.
This isn't supposed to work; I should have read the tutorial more carefully:
The test (specification) is initially created in Jira as a Cucumber Test and afterwards, it is exported using the UI or the REST API.
The Test Details won't be populated by a test report.
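Before importing a report, it can still help to confirm which issue keys each scenario actually carries as tags, since those tags are what ties a result to a Test. The following is a minimal sketch in Python that reads a Cucumber JSON report of the shape shown in the question. `scenario_tags` is a hypothetical helper, and the inline report is a cut-down sample modeled on the one above, not real output.

```python
import json


def scenario_tags(report):
    """Return (scenario name, [tag names]) pairs for every element in the report."""
    pairs = []
    for feature in report:
        for element in feature.get("elements", []):
            tags = [tag["name"] for tag in element.get("tags", [])]
            pairs.append((element["name"].strip(), tags))
    return pairs


# Cut-down sample; in practice the text would be read from cucumber.json.
sample = """
[
  {
    "name": "Verify something",
    "tags": [{"line": 1, "name": "PROJ-123"}],
    "elements": [
      {"name": "Verify something with 1", "tags": [{"line": 1, "name": "PROJ-234"}]},
      {"name": "Verify something with 2", "tags": [{"line": 1, "name": "PROJ-234"}]}
    ]
  }
]
"""

report = json.loads(sample)
for name, tags in scenario_tags(report):
    print(name, tags)
```

This only inspects the report. As noted above, importing results does not populate the Test Details; the specification has to exist in Jira first.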

JQ if then statement scope

I'd like to use jq to grab only the sub-records that match an if-then condition. When I use
jq 'if .services[].banner == "FQMDAAICCg==" then .services[].port else empty end'
it grabs all of the ports for the record. There are multiple services under each record, and I want to restrict the then branch to only the service where the if condition actually matched.
How do I just get the port, banner, etc. for the specific service underneath the record which hit my condition?
example:
{
"services": [
{
"tls_detected": false,
"banner_is_raw": true,
"transport_protocol": "tcp",
"banner": "PCFET0NUWVBFIEhU",
"certificate": null,
"timestamp": "2020-03-22T00:38:01.074Z",
"protocol": null,
"port": 4444
},
{
"tls_detected": false,
"banner_is_raw": true,
"transport_protocol": "tcp",
"banner": "SFRUUC8xLjEgMzA",
"certificate": null,
"timestamp": "2020-03-19T01:39:45.288Z",
"protocol": null,
"port": 8080
},
{
"tls_detected": false,
"banner_is_raw": true,
"transport_protocol": "tcp",
"banner": "FQMDAAICCg==",
"certificate": null,
"timestamp": "2020-03-19T01:39:45.288Z",
"protocol": null,
"port": 8085
},
{
"tls_detected": false,
"banner_is_raw": false,
"transport_protocol": "tcp",
"banner": "Q2FjaGUtQ29ud",
"certificate": null,
"timestamp": "2020-03-20T04:25:24Z",
"protocol": "http",
"port": 8080
}
],
"ip": "103.238.62.68",
"autonomous_system": {
"description": "CHAPTECH-AS-AP Chaptech Pty Ltd",
"asn": 133493,
"routed_prefix": "103.238.62.0/24",
"country_code": "AU",
"name": "CHAPTECH-AS-AP Chaptech Pty Ltd",
"path": [
11164,
3491,
63956,
7594,
7594,
7594,
7594,
133493
]
},
"location": {
"country_code": "AU",
"registered_country": "Australia",
"registered_country_code": "AU",
"continent": "Oceania",
"timezone": "Australia/Sydney",
"latitude": -33.494,
"longitude": 143.2104,
"country": "Australia"
}
}
Update:
Thanks to peak, but I couldn't get the "additional goal" part below working. I ended up using
jq 'select(.services[].banner == "FQMDAAICCg==") | {port: .services[].port, banner: .services[].banner, ip: .ip}' censys.json | jq 'if .banner == "FQMDAAICCg==" then .ip,.port else empty end'
which is ugly but did the trick and still allowed me to stream the data to the first filter.
Original question
How do I just get the port, banner, etc. for the specific service underneath the record which hit my condition?
To get just the "port" for the service matching the condition, you could modify your query:
.services[]
| if .banner == "FQMDAAICCg==" then .port else empty end
Equivalently:
.services[]
| select(.banner == "FQMDAAICCg==")
| .port
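The scoping issue is easier to see in a language with explicit loops. In the original filter, `.services[]` is iterated once in the condition and again in the then branch, so every port is emitted whenever any banner matches. The filters above iterate once and test and project the same element. Below is a rough Python equivalent of that second form; `matching_services` is a hypothetical helper and the record is a cut-down copy of the example.

```python
def matching_services(record, banner):
    """Return only the services whose banner equals the given value."""
    return [s for s in record.get("services", []) if s.get("banner") == banner]


# Cut-down copy of the record from the question (unused fields omitted).
record = {
    "ip": "103.238.62.68",
    "services": [
        {"banner": "PCFET0NUWVBFIEhU", "port": 4444},
        {"banner": "SFRUUC8xLjEgMzA", "port": 8080},
        {"banner": "FQMDAAICCg==", "port": 8085},
        {"banner": "Q2FjaGUtQ29ud", "port": 8080},
    ],
}

# The test and the projection both use the same element `s`, so only the
# matching service contributes a port; the ip comes from the enclosing record.
pairs = [(s["port"], record["ip"]) for s in matching_services(record, "FQMDAAICCg==")]
print(pairs)  # [(8085, '103.238.62.68')]
```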
Additional goal
I want to end up in this example with '8085' + '103.238.62.68'
If you really want the two values in that format, you could write something along the following lines, invoking jq with the -r option:
.ip as $ip
| (.services[] | select(.banner == "FQMDAAICCg==") | .port) as $port
| "'\($port)' + '\($ip)'"
or more briefly but less readably:
"'\(.services[] | select(.banner == "FQMDAAICCg==") | .port)' + '\(.ip)'"

How can I filter by a numeric field using jq?

I am writing a script to query the Bitbucket API and delete SNAPSHOT artifacts that have never been downloaded. The script is failing because it gets ALL snapshot artifacts; the select on the number of downloads does not appear to be working.
What is wrong with my select statement to filter objects by the number of downloads?
Of course the more direct solution here would be if I could just query the Bitbucket API with a filter. To the best of my knowledge the API does not support filtering by downloads.
My script is:
#!/usr/bin/env bash
curl -X GET --user "me:mykey" "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads?pagelen=100" > downloads.json
# get all values | reduce the set to just be name and downloads | select entries where downloads is zero | select entries where name contains SNAPSHOT | just get the name
#TODO i screwed up the selection somewhere its returning files that contain SNAPSHOT regardless of number of downloads
jq '.values | {name: .[].name, downloads: .[].downloads} | select(.downloads==0) | select(.name | contains("SNAPSHOT")) | .name' downloads.json > snapshots_without_any_downloads.js
#unique sort, not sure why jq gives me multiple values
sort -u snapshots_without_any_downloads.js | tr -d '"' > unique_snapshots_without_downloads.js
cat unique_snapshots_without_downloads.js | xargs -t -I % curl -Ss -X DELETE --user "me:mykey" "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads/%" > deleted_files.txt
A deidentified sample of the raw input from the API is:
{
"pagelen": 10,
"size": 40,
"values": [
{
"name": "myproject_1.1-SNAPSHOT_0210f77_mc_3.5.0.zip",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads/myproject_1.1-SNAPSHOT_0210f77_mc_3.5.0.zip"
}
},
"downloads": 2,
"created_on": "2018-03-15T17:50:00.157310+00:00",
"user": {
"username": "me",
"display_name": "me",
"type": "user",
"uuid": "{3051ec5f-cc92-4bc3-b291-38189a490a89}",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/users/me"
},
"html": {
"href": "https://bitbucket.org/me/"
},
"avatar": {
"href": "https://bitbucket.org/account/me/avatar/32/"
}
}
},
"type": "download",
"size": 430894
},
{
"name": "myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads/myproject_1.1-SNAPSHOT_0210f77_mc_3.5.0.zip"
}
},
"downloads": 0,
"created_on": "2018-03-15T17:50:00.157310+00:00",
"user": {
"username": "me",
"display_name": "me",
"type": "user",
"uuid": "{3051ec5f-cc92-4bc3-b291-38189a490a89}",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/users/me"
},
"html": {
"href": "https://bitbucket.org/me/"
},
"avatar": {
"href": "https://bitbucket.org/account/me/avatar/32/"
}
}
},
"type": "download",
"size": 430894
},
{
"name": "myproject_1.0_mc_3.5.1.zip",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads/myproject_1.1-SNAPSHOT_0210f77_mc_3.5.1.zip"
}
},
"downloads": 5,
"created_on": "2018-03-15T17:49:14.885544+00:00",
"user": {
"username": "me",
"display_name": "me",
"type": "user",
"uuid": "{3051ec5f-cc92-4bc3-b291-38189a490a89}",
"links": {
"self": {
"href": "https://api.bitbucket.org/2.0/users/me"
},
"html": {
"href": "https://bitbucket.org/me/"
},
"avatar": {
"href": "https://bitbucket.org/account/me/avatar/32/"
}
}
},
"type": "download",
"size": 430934
}
],
"page": 1,
"next": "https://api.bitbucket.org/2.0/repositories/myemployer/myproject/downloads?pagelen=10&page=2"
}
The output I want from this snippet is myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip - that artifact is a SNAPSHOT and has zero downloads.
I have used this intermediate step to do some debugging:
jq '.values | {name: .[].name, downloads: .[].downloads} | select(.downloads>0) | select(.name | contains("SNAPSHOT")) | unique' downloads.json > snapshots_with_downloads.js
jq '.values | {name: .[].name, downloads: .[].downloads} | select(.downloads==0) | select(.name | contains("SNAPSHOT")) | .name' downloads.json > snapshots_without_any_downloads.js
#this returns the same values for each list!
diff unique_snapshots_with_downloads.js unique_snapshots_without_downloads.js
This adjustment gives a cleaner, unique structure. It suggests that there's some sort of splitting or streaming aspect of jq that I do not fully understand:
#this returns a "unique" array like I expect, adding select to this still does not produce the desired outcome
jq '.values | [{name: .[].name, downloads: .[].downloads}] | unique' downloads.json
The data after this step looks like this. It just removed the cruft I didn't need from the raw API response:
[
{
"name": "myproject_1.0_2400a51_mc_3.4.0.zip",
"downloads": 0
},
{
"name": "myproject_1.0_2400a51_mc_3.4.1.zip",
"downloads": 2
},
{
"name": "myproject_1.1-SNAPSHOT_391f4d5_mc_3.5.0.zip",
"downloads": 0
},
{
"name": "myproject_1.1-SNAPSHOT_391f4d5_mc_3.5.1.zip",
"downloads": 2
}
]
As I understand it:
You want globally unique outputs
You want only items with downloads==0
You want only items whose name contains "SNAPSHOT"
The following will accomplish that:
jq -r '
[.values[] | {(.name): .downloads}]
| add
| to_entries[]
| select(.value == 0)
| .key | select(contains("SNAPSHOT"))'
Rather than making unique an explicit step, this version generates a map from names to download counters (merging the objects with add, which means that in case of conflicts, the last one wins), and thereby ensures that the outputs are unique.
Given your test JSON, output is:
myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip
Applied to the overall problem context, this strategy can be used to simplify the overall process:
jq -r '[.values[] | {(.links.self.href): .downloads}] | add | to_entries[] | select(.value == 0) | .key | select(contains("SNAPSHOT"))'
It simplifies the overall process by acting on the URL to the file rather than the name only. This simplifies the subsequent DELETE call. The sort and tr calls can also be removed.
Here's a solution which sums up the .downloads values per .name before making the selection based on the total number of downloads:
reduce (.values[] | select(.name | contains("SNAPSHOT"))) as $v
({}; .[$v.name] += $v.downloads)
| with_entries(select(.value == 0))
| keys_unsorted[]
Example:
$ jq -r -f program.jq input.json
myproject_1.1-SNAPSHOT_thanks_for_the_reminder_charles_duffy_mc_3.5.0.zip
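For readers less familiar with reduce, the same sum-then-select logic can be sketched in Python. This is an illustration of the idea rather than a replacement for the jq program; `undownloaded_snapshots` is a hypothetical helper and the sample values are invented.

```python
from collections import defaultdict


def undownloaded_snapshots(values):
    """Sum downloads per SNAPSHOT name and keep the names whose total is zero."""
    totals = defaultdict(int)
    for item in values:
        if "SNAPSHOT" in item["name"]:
            totals[item["name"]] += item["downloads"]
    return [name for name, total in totals.items() if total == 0]


# Invented sample: one name appears twice with different counters.
values = [
    {"name": "myproject_1.1-SNAPSHOT_a.zip", "downloads": 0},
    {"name": "myproject_1.1-SNAPSHOT_a.zip", "downloads": 2},
    {"name": "myproject_1.1-SNAPSHOT_b.zip", "downloads": 0},
    {"name": "myproject_1.0.zip", "downloads": 0},
]

print(undownloaded_snapshots(values))  # ['myproject_1.1-SNAPSHOT_b.zip']
```

Summing differs from the "last one wins" behavior of the add-based version when a name appears more than once: here a name is reported only if none of its entries has been downloaded.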
p.s.
What is wrong with my select statement ...?
The problem that jumps out is the bit of the pipeline just before the "select" filter:
.values | {name: .[].name, downloads: .[].downloads}
The use of .[] in this manner results in the Cartesian product being formed -- that is, the above expression will emit n*n JSON objects, where n is the length of .values. You evidently intended to write:
.values[] | {name: .name, downloads: .downloads}
which can be abbreviated to:
.values[] | {name, downloads}
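The effect of the Cartesian product on the filter can be reproduced in a few lines of Python. This is a sketch with invented sample data; `crossed` mimics the pairing of every name with every download count, and `paired` mimics one object per element.

```python
from itertools import product

# Invented sample with three entries, so n = 3.
values = [
    {"name": "a-SNAPSHOT.zip", "downloads": 2},
    {"name": "b-SNAPSHOT.zip", "downloads": 0},
    {"name": "c.zip", "downloads": 5},
]

# Every name combined with every download count: n*n objects.
crossed = [
    {"name": n["name"], "downloads": d["downloads"]}
    for n, d in product(values, values)
]

# One object per element: n objects.
paired = [{"name": v["name"], "downloads": v["downloads"]} for v in values]


def zero_download_snapshots(objects):
    """Apply the same two selects as the jq pipeline and deduplicate."""
    return sorted(
        {o["name"] for o in objects if o["downloads"] == 0 and "SNAPSHOT" in o["name"]}
    )


print(len(crossed), len(paired))         # 9 3
print(zero_download_snapshots(crossed))  # ['a-SNAPSHOT.zip', 'b-SNAPSHOT.zip']
print(zero_download_snapshots(paired))   # ['b-SNAPSHOT.zip']
```

Because every name gets paired with the zero from some other entry, the crossed version reports every SNAPSHOT file, which matches the symptom described in the question.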

Chain select and max_by on json doc with jq in bash

I want to get the SnapshotIdentifier of the snapshot with the maximum SnapshotCreateTime, and filter it by ClusterIdentifier. Here is the command I'm using:
aws redshift describe-cluster-snapshots --region us-west-2 |
jq -r '.Snapshots[]
| select(.ClusterIdentifier == "dev-cluster")
| max_by(.SnapshotCreateTime)
| .SnapshotIdentifier '
Here is the json
{
"Snapshots": [
{
"EstimatedSecondsToCompletion": 0,
"OwnerAccount": "45645641155",
"CurrentBackupRateInMegaBytesPerSecond": 6.2857,
"ActualIncrementalBackupSizeInMegaBytes": 22.0,
"NumberOfNodes": 3,
"Status": "available",
"VpcId": "myvpc",
"ClusterVersion": "1.0",
"Tags": [],
"MasterUsername": "ayxbizops",
"TotalBackupSizeInMegaBytes": 192959.0,
"DBName": "dev",
"BackupProgressInMegaBytes": 22.0,
"ClusterCreateTime": "2016-09-06T15:56:08.170Z",
"RestorableNodeTypes": [
"dc1.large"
],
"EncryptedWithHSM": false,
"ClusterIdentifier": "dev-cluster",
"SnapshotCreateTime": "2016-09-06T16:00:25.595Z",
"AvailabilityZone": "us-west-2c",
"NodeType": "dc1.large",
"Encrypted": false,
"ElapsedTimeInSeconds": 3,
"SnapshotType": "manual",
"Port": 5439,
"SnapshotIdentifier": "thismorning"
}
]
}
max_by expects an array as input. Thus the following variant of your filter would work:
[.Snapshots[] | select(.ClusterIdentifier == "dev-cluster")]
| max_by(.SnapshotCreateTime)
| .SnapshotIdentifier
Based on your verbal description, it would seem you want to run max_by before select:
.Snapshots
| max_by(.SnapshotCreateTime)
| select(.ClusterIdentifier == "dev-cluster")
| .SnapshotIdentifier
If there is possibly more than one maximal object, you might want to use maximal_by rather than max_by:
def maximal_by(f):
(map(f) | max) as $mx
| .[] | select(f == $mx);
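The order of the two steps matters, which a small Python sketch can make concrete. The helpers `max_by` and `maximal_by` below are hypothetical Python stand-ins for the jq filters, the snapshot list is invented, and the timestamps are assumed to share one ISO 8601 format so that string comparison orders them correctly.

```python
def max_by(items, key):
    """Return the item with the largest key, or None for an empty list."""
    return max(items, key=key) if items else None


def maximal_by(items, key):
    """Return every item whose key equals the maximum (handles ties)."""
    if not items:
        return []
    top = max(key(item) for item in items)
    return [item for item in items if key(item) == top]


def created(snapshot):
    return snapshot["SnapshotCreateTime"]


# Invented snapshots; only the fields used here are included.
snapshots = [
    {"ClusterIdentifier": "dev-cluster", "SnapshotCreateTime": "2016-09-06T16:00:25.595Z", "SnapshotIdentifier": "thismorning"},
    {"ClusterIdentifier": "dev-cluster", "SnapshotCreateTime": "2016-09-07T09:30:00.000Z", "SnapshotIdentifier": "nextday"},
    {"ClusterIdentifier": "prod-cluster", "SnapshotCreateTime": "2016-09-08T01:00:00.000Z", "SnapshotIdentifier": "prod-latest"},
]

# Filter first, then take the maximum: newest snapshot of dev-cluster.
dev = [s for s in snapshots if s["ClusterIdentifier"] == "dev-cluster"]
latest_dev = max_by(dev, created)
print(latest_dev["SnapshotIdentifier"])  # nextday

# Maximum first, then filter: empty if the newest snapshot belongs elsewhere.
overall = max_by(snapshots, created)
overall_if_dev = overall if overall["ClusterIdentifier"] == "dev-cluster" else None
print(overall_if_dev)  # None
```

With this data, selecting before taking the maximum returns the newest dev-cluster snapshot, while taking the maximum first yields nothing because the newest snapshot overall belongs to another cluster.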