jq get unique value from two keys - json

I know how to get unique values by one key: unique_by('.[].name').
I want to get the output by checking for unique values in two keys.
How can I do this for two keys, something like unique_by('.[].name,.[].url'), and return the input objects along with their other keys?
#input
[
{
"name": "abc",
"url": "https://aa.com",
"created_at": "2022-09-30T11:17:33.181Z"
},
{
"name": "bb",
"url": "https://ddd.com",
"created_at": "2022-09-30T11:14:33.180Z"
},
{
"name": "abc",
"url": "https://aa.com",
"created_at": "2022-09-30T11:14:33.180Z"
}
]
#expected output
[
{
"name": "abc",
"url": "https://aa.com",
"created_at": "2022-09-30T11:17:33.181Z"
},
{
"name": "bb",
"url": "https://ddd.com",
"created_at": "2022-09-30T11:14:33.180Z"
}
]

Collect the criteria into an array:
unique_by([.name, .url])

Pass unique_by an array containing every field that should take part in the comparison, so that uniqueness is decided on the array as a whole:
jq 'unique_by([.name, .url])'
[
{
"name": "abc",
"url": "https://aa.com",
"created_at": "2022-09-30T11:17:33.181Z"
},
{
"name": "bb",
"url": "https://ddd.com",
"created_at": "2022-09-30T11:14:33.180Z"
}
]
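For readers more at home in Python, the composite-key idea can be sketched as follows. This is only an illustration of the behaviour shown above, not jq's actual implementation: the helper name unique_by is borrowed for readability, and it assumes every record has both "name" and "url" as strings.

```python
def unique_by(items, key):
    """Keep the first item seen for each key, returned in key order."""
    first_seen = {}
    for item in items:
        # setdefault only stores the item if this key has not appeared yet.
        first_seen.setdefault(key(item), item)
    # Sorting the keys mirrors the key-ordered output shown above.
    return [first_seen[k] for k in sorted(first_seen)]


records = [
    {"name": "abc", "url": "https://aa.com", "created_at": "2022-09-30T11:17:33.181Z"},
    {"name": "bb", "url": "https://ddd.com", "created_at": "2022-09-30T11:14:33.180Z"},
    {"name": "abc", "url": "https://aa.com", "created_at": "2022-09-30T11:14:33.180Z"},
]

# A tuple plays the role of the [.name, .url] array in the jq filter.
deduped = unique_by(records, key=lambda r: (r["name"], r["url"]))
print(deduped)
```

The tuple (name, url) is compared as a whole, so two records count as duplicates only when both fields match.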

Related

Sort an array of objects within an array by most recent timestamp, and then sort the outer array by each array's first object's timestamp, using jq

This is an example of the JSON data, at the stage of the jq pipeline where I'm stuck.
[
[
{
"created_at": "2020-09-26T14:48:46.000Z",
"conversation_id": "1309867515456237571",
"id": "1309867515456237571",
"text": "example1"
}
],
[
{
"created_at": "2020-09-26T14:48:47.000Z",
"conversation_id": "1309867518455156736",
"id": "1309867518455156736",
"text": "example2"
},
{
"created_at": "2020-09-26T14:48:47.000Z",
"conversation_id": "1309867518455156736",
"id": "1309867517846810625",
"text": "example3"
},
{
"created_at": "2020-09-26T14:48:46.000Z",
"conversation_id": "1309867518455156736",
"id": "1309867516659937284",
"text": "example4"
}
],
[
{
"created_at": "2020-09-26T14:48:48.000Z",
"conversation_id": "1309867524473999364",
"id": "1309867524473999364",
"text": "example5"
},
{
"created_at": "2020-09-26T14:48:47.000Z",
"conversation_id": "1309867524473999364",
"id": "1309867520468291586",
"text": "example6"
},
{
"created_at": "2020-09-26T14:48:47.000Z",
"conversation_id": "1309867524473999364",
"id": "1309867520153845760",
"text": "example7"
}
],
[
{
"created_at": "2020-09-26T14:48:48.000Z",
"conversation_id": "1309867524750749705",
"id": "1309867524750749705",
"text": "example8"
}
]
]
Everything I've tried ends up with an error like this one,
jq: error (at <stdin>:8): Cannot index string with string "created_at"
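Maybe this? It sorts each inner array by created_at (oldest first), then orders the outer array by the inner arrays' created_at values: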
jq '.[] |= sort_by(.created_at) | sort_by(.[].created_at)'
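If "most recent" is taken to mean newest first, the same two-step sort can be sketched in Python. This is a rough equivalent rather than a translation of the filter above; it assumes no inner array is empty and that all timestamps share the same ISO 8601 format, so plain string comparison orders them correctly. The function name sort_conversations is made up for the example.

```python
def sort_conversations(groups):
    """Sort each inner list newest first, then the outer list by each first item."""
    newest_first = [
        sorted(group, key=lambda tweet: tweet["created_at"], reverse=True)
        for group in groups
    ]
    # After the inner sort, group[0] is the most recent item of each group.
    return sorted(newest_first, key=lambda group: group[0]["created_at"], reverse=True)


groups = [
    [{"id": "1", "created_at": "2020-09-26T14:48:46.000Z"}],
    [
        {"id": "4", "created_at": "2020-09-26T14:48:46.000Z"},
        {"id": "2", "created_at": "2020-09-26T14:48:47.000Z"},
    ],
    [{"id": "8", "created_at": "2020-09-26T14:48:48.000Z"}],
]

result = sort_conversations(groups)
print([[tweet["id"] for tweet in group] for group in result])
```

Drop reverse=True in either call if oldest-first order is wanted instead.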

Find a record in json Object if the record has specific key in python

I have a JSON object with 100000 records. I want to select the records that have a specific value for one of the keys.
Eg:
[{
"name": "bindu",
"age": "24",
"qualification": "b.tech"
},
{
"name": "naveen",
"age": "23",
"qualification": "b.tech"
},
{
"name": "parvathi",
"age": "23",
"qualification": "m.tech"
},
{
"name": "bindu s",
"status": "married"
},
{
"name": "naveen k",
"status": "unmarried"
}]
Now I want to combine the records whose names are 'bindu' and 'bindu s'. We can achieve this by iterating over the JSON object, but because it is so large this takes a long time. Is there an easier way to do this?
I want output like this:
[{
"name": "bindu",
"age": "24",
"qualification": "b.tech",
"status": "married"
},
{
"name": "naveen",
"age": "23",
"qualification": "b.tech",
"status": "unmarried"
},
{
"name": "parvathi",
"age": "23",
"qualification": "m.tech",
"status": ""
}]
This will rename and merge your objects by first name.
jq 'map(.name |= split(" ")[0]) | group_by(.name) | map(add)'
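Since the question asks about Python, here is a minimal sketch of the same idea there. It assumes, like the jq filter, that the first word of "name" identifies a person; the function name merge_by_first_name is hypothetical. A dictionary keyed by first name lets the merge run in a single pass over the records.

```python
def merge_by_first_name(records):
    """Merge records that share the first word of their "name" field."""
    merged = {}
    for record in records:
        first_name = record["name"].split(" ")[0]
        target = merged.setdefault(first_name, {})
        # Later records add to, or overwrite, fields from earlier ones.
        target.update(record)
        # Keep the shortened name rather than "bindu s" and the like.
        target["name"] = first_name
    return list(merged.values())


records = [
    {"name": "bindu", "age": "24", "qualification": "b.tech"},
    {"name": "naveen", "age": "23", "qualification": "b.tech"},
    {"name": "parvathi", "age": "23", "qualification": "m.tech"},
    {"name": "bindu s", "status": "married"},
    {"name": "naveen k", "status": "unmarried"},
]

merged = merge_by_first_name(records)
print(merged)
```

Unlike the expected output in the question, this does not add an empty "status" to records that never had one; calling setdefault("status", "") on each merged record afterwards would cover that.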

Using jq to return specific information in JSON object

I want to parse individual elements of the inner JSON object so that I can load them into a database.
The following is the JSON object. How can I parse elements like id, name, queue, etc.? I will iterate over it in a loop and build the insert query.
{
"apps": {
"app": [
{
"id": "application_1540378900448_18838",
"user": "hive",
"name": "insert overwrite tabl...summary_view_stg_etl(Stage-2)",
"queue": "Data_Ingestion",
"state": "FINISHED",
"finalStatus": "SUCCEEDED",
"progress": 100
},
{
"id": "application_1540378900448_18833",
"user": "hive",
"name": "insert into SNOW_WORK...metric_definitions')(Stage-13)",
"queue": "Data_Ingestion",
"state": "FINISHED",
"finalStatus": "SUCCEEDED",
"progress": 100
}
]
}
}
You're better off converting the data to a format that database tools consume easily, like CSV, and then working from that.
$ jq -r '(.apps.app[0] | keys_unsorted) as $k
| $k, (.apps.app[] | [.[$k[]]])
| @csv
' input.json
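The same conversion can be sketched with Python's standard csv module. This is an illustration under a few assumptions: the column list is taken from the first entry, as in the jq filter, and the function name apps_to_csv is invented for the example. csv.QUOTE_NONNUMERIC quotes strings and leaves numbers bare.

```python
import csv
import io


def apps_to_csv(doc):
    """Render doc["apps"]["app"] as CSV text with a header row."""
    apps = doc["apps"]["app"]
    # Column order follows the keys of the first entry.
    columns = list(apps[0].keys())
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerow(columns)
    for app in apps:
        # .get() leaves a column empty if an entry lacks that key.
        writer.writerow([app.get(column) for column in columns])
    return buffer.getvalue()


doc = {
    "apps": {
        "app": [
            {"id": "application_1540378900448_18838", "user": "hive",
             "queue": "Data_Ingestion", "progress": 100},
            {"id": "application_1540378900448_18833", "user": "hive",
             "queue": "Data_Ingestion", "progress": 100},
        ]
    }
}

csv_text = apps_to_csv(doc)
print(csv_text)
```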
It's pretty simple: just fetch the element that holds the array of values.
var JSONOBJ={
"apps": {
"app": [
{
"id": "application_1540378900448_18838",
"user": "hive",
"name": "insert overwrite tabl...summary_view_stg_etl(Stage-2)",
"queue": "Data_Ingestion",
"state": "FINISHED",
"finalStatus": "SUCCEEDED",
"progress": 100
},
{
"id": "application_1540378900448_18833",
"user": "hive",
"name": "insert into SNOW_WORK...metric_definitions')(Stage-13)",
"queue": "Data_Ingestion",
"state": "FINISHED",
"finalStatus": "SUCCEEDED",
"progress": 100
}
]
}
}
// Iterate over the inner "app" array and print the fields of interest.
JSONOBJ.apps.app.forEach(function (o) {
  console.log(o.id);
  console.log(o.user);
  console.log(o.name);
});

How to parse a JSON key that has a hash sign # in it with the jq command?

I have this JSON data:
{
"module": {
"data": {
"deliverySummary_200648721592191#address": {
"fields": {
"address": "MyAddress",
"consignee": "MyName",
"phone": "MyPhone",
"postCode": "",
"title": "Alamat Pengiriman \\r\\n"
},
"id": "200648721592191#address",
"tag": "deliverySummary",
"type": "biz"
}
}
}
}
And I want to extract this part:
{
"fields": {
"address": "MyAddress",
"consignee": "MyName",
"phone": "MyPhone",
"postCode": "",
"title": "Alamat Pengiriman \\r\\n"
},
"id": "200648721592191#address",
"tag": "deliverySummary",
"type": "biz"
}
I have tried jq '.module.data.deliverySummary_200648721592191#address', but it just returns null instead of the part I want above. How do I fix it?
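Put double quotes around the problematic key; for your case that is '.module.data."deliverySummary_200648721592191#address"'. In jq, an unquoted # starts a comment, so everything from # onward was being ignored and the filter looked up a key that does not exist.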

How to Index & Search Nested Json in Solr 4.9.0

I want to index and search nested JSON in Solr. Here is my JSON:
{
"id": "44444",
"headline": "testing US",
"generaltags": [
{
"type": "person",
"name": "Jayalalitha",
"relevance": "0.334",
"count": 1
},
{
"type": "person",
"name": "Kumar",
"relevance": "0.234",
"count": 1
}
],
"socialtags": {
"type": "SocialTag",
"name": "US",
"importance": 2
},
"topic": {
"type": "Topic",
"name": "US",
"score": "0.936"
}
}
When I try to index it, I get the error "Error parsing JSON field value. Unexpected OBJECT_START".
When we tried to use a multivalued field and index it, we were not able to search using the multivalued field. It returns "Undefined Field".
Also, please advise whether I need to make any changes to the schema.xml file.
You are nesting child documents within your document. You need to use the proper syntax for nested child documents in JSON:
[
{
"id": "1",
"title": "Solr adds block join support",
"content_type": "parentDocument",
"_childDocuments_": [
{
"id": "2",
"comments": "SolrCloud supports it too!"
}
]
},
{
"id": "3",
"title": "Lucene and Solr 4.5 is out",
"content_type": "parentDocument",
"_childDocuments_": [
{
"id": "4",
"comments": "Lots of new features"
}
]
}
]
Have a look at this article which describes JSON child documents and block joins.
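As a rough illustration of reshaping the asker's document into that layout, the Python sketch below moves every nested object, or list of objects, into _childDocuments_. The function name to_solr_parent, the child ID scheme, and the source_field field are all hypothetical; a real schema would need matching field definitions.

```python
def to_solr_parent(doc):
    """Move nested objects of doc into a _childDocuments_ list."""
    parent = {}
    children = []
    for field, value in doc.items():
        if isinstance(value, dict):
            nested = [value]
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            nested = value
        else:
            # Plain values stay on the parent document unchanged.
            parent[field] = value
            continue
        for child in nested:
            # Hypothetical ID scheme: parent id, source field, running counter.
            child_id = f"{doc['id']}_{field}_{len(children)}"
            # source_field is a made-up field recording where the child came from.
            children.append({"id": child_id, "source_field": field, **child})
    if children:
        parent["_childDocuments_"] = children
    return parent


doc = {
    "id": "44444",
    "headline": "testing US",
    "generaltags": [
        {"type": "person", "name": "Jayalalitha"},
        {"type": "person", "name": "Kumar"},
    ],
    "topic": {"type": "Topic", "name": "US"},
}

solr_doc = to_solr_parent(doc)
print(solr_doc)
```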
Using the format mentioned by @qux, you will get the error "Expected: OBJECT_START but got ARRAY_START at [16]" with "code": 400, because JSON that starts with [ ... ] is parsed as a JSON array.
{
"id": "44444",
"headline": "testing US",
"generaltags": [
{
"type": "person",
"name": "Jayalalitha",
"relevance": "0.334",
"count": 1
},
{
"type": "person",
"name": "Kumar",
"relevance": "0.234",
"count": 1
}
],
"socialtags": {
"type": "SocialTag",
"name": "US",
"importance": 2
},
"topic": {
"type": "Topic",
"name": "US",
"score": "0.936"
}
}
The above format is correct.
Regarding searching: use the index to search for the elements of the JSON array.
A workaround is to wrap the whole JSON object inside another JSON object and then index it. You can try the following:
{
"data": [
{
"id": "44444",
"headline": "testing US",
"generaltags": [
{
"type": "person",
"name": "Jayalalitha",
"relevance": "0.334",
"count": 1
},
{
"type": "person",
"name": "Kumar",
"relevance": "0.234",
"count": 1
}
],
"socialtags": {
"type": "SocialTag",
"name": "US",
"importance": 2
},
"topic": {
"type": "Topic",
"name": "US",
"score": "0.936"
}
}
]
}
See the syntax described at http://yonik.com/solr-nested-objects/:
$ curl http://localhost:8983/solr/demo/update?commitWithin=3000 -d '
[
{id : book1, type_s:book, title_t : "The Way of Kings", author_s : "Brandon Sanderson",
cat_s:fantasy, pubyear_i:2010, publisher_s:Tor,
_childDocuments_ : [
{ id: book1_c1, type_s:review, review_dt:"2015-01-03T14:30:00Z",
stars_i:5, author_s:yonik,
comment_t:"A great start to what looks like an epic series!"
}
,
{ id: book1_c2, type_s:review, review_dt:"2014-03-15T12:00:00Z",
stars_i:3, author_s:dan,
comment_t:"This book was too long."
}
]
}
]'
This is supported from Solr 5.3 onward.