mongoexport - Leaf Level - JSON to CSV conversion - egrep not working with multiple patterns using "|" pipe or with -f option

Why is egrep not giving me all the matching entries?
This is my simple JSON blob:
[nukaNUKA#dev-machine csv]$ cat jsonfile.json
{"number": 303,"projectName": "giga","queueId":8881,"result":"SUCCESS"}
This is my pattern file (so that I don't scare the editor):
[nukaNUKA#dev-machine csv]$ cat egrep-pattern.txt
\"number\":.*\"projectName
\"projectName\":.*,\"queueId
\"queueId\":.*,\"result
\"result\":\".*$
These are the egrep/grep commands for individual searches, and they work:
[nukaNUKA#dev-machine csv]$ egrep -o "\"number\":.*\"projectName" jsonfile.json
"number": 303,"projectName
[nukaNUKA#dev-machine csv]$ egrep -o "\"projectName\":.*,\"queueId" jsonfile.json
"projectName": "giga","queueId
[nukaNUKA#dev-machine csv]$ egrep -o "\"queueId\":.*,\"result" jsonfile.json
"queueId":8881,"result
[nukaNUKA#dev-machine csv]$ egrep -o "\"result\":\".*$" jsonfile.json
"result":"SUCCESS"}
So, why on earth didn't this work? And no, I don't wear glasses.
[nukaNUKA#dev-machine csv]$ egrep -o "\"number\":.*\"projectName|\"projectName\":.*,\"queueId|\"queueId\":.*,\"result|\"result\":\".*$" jsonfile.json
"number": 303,"projectName
"queueId":8881,"result
[nukaNUKA#dev-machine csv]$ egrep -o -f egrep-pattern.txt jsonfile.json
"number": 303,"projectName
"queueId":8881,"result
[nukaNUKA#dev-machine csv]$
I have a complex nested JSON blob, and because everything is unstructured it seems I can't use JQ or JSONV or any other Python script: the data I'm looking for is stored in arrays of one-entry dictionaries that all use the same key names, for example: { "parameters": [ { "name": "jobname", "value": "shenzi" }, { "name": "pipelineVersion", "value": "1.2.3.4" }, ...and so on..., ... ] }, and the index for jobname, pipelineVersion, or similar parameter names is not at the same index[X] location in every JSON entry I have.
Worst case, I can add conditional checks to see whether the key at each index matches jobname etc. and then grab the fields I'm looking for, but there are hundreds of such fields that I want to grab, and I don't want to hard-code them if possible.
I thought that, since each JSON entry is on a single line, I could simply write some clever patterns (ugly, I know); then I wouldn't need the conditional code and could just use the power of bash/sed/tr/cut to get what I need. But it seems egrep -f or egrep -o ... didn't work, as shown above.
Sample JSON blob object (from one Jenkins job): there are different Jenkins build jobs' JSON blob entries (each having different JSON structures, parameters, etc.) in a single JenkinsJobsBuild collection in MongoDB. Here is one such sample JSON blob object:
{
"_id": {
"$oid": "5120349es967yhsdfs907c4f"
},
"actions": [
{
"causes": [
{
"shortDescription": "Started by an SCM change"
}
]
},
{
},
{
"oneClickDeployPossible": false,
"oneClickDeployReady": false,
"oneClickDeployValid": false
},
{
},
{
},
{
},
{
"cspec": "element * ...\/MyProject_latest_int\/LATESTnelement * ...\/MyProject_integration\/LATESTnelement \/vobs\/some_vob\/gigi \/main\/myproject_integration\/MyProject_Slot_0_maint_int\/LATESTnelement * ...\/myproject_integration\/LATESTnelement \/vobs\/some_vob \/main\/LATEST",
"latestBlsOnConfiguredStream": null,
"stream": null
},
{
},
{
"parameters": [
{
"name": "CLEARCASE_VIEWTAG",
"value": "jenkins_MyProject_latest"
},
{
"name": "BUILD_DEBUG",
"value": false
},
{
"name": "CLEAN_BUILD",
"value": true
},
{
"name": "BASEVERSION",
"value": "7.4.1"
},
{
"name": "ARTIFACTID",
"value": "lowercaseprojectname"
},
{
"name": "SYSTEM",
"value": "myprojectSystem"
},
{
"name": "LOT",
"value": "02"
},
{
"name": "PIPENUMBER",
"value": "7.4.1.303"
}
]
},
{
},
{
},
{
"parameters": [
{
"name": "DESCRIPTION_SETTER_DESCRIPTION",
"value": "lowercaseprojectname_V7.4.1.303"
}
]
},
{
},
{
},
{
},
{
}
],
"artifacts": [
],
"building": false,
"builtOn": "servername",
"changeSet": {
"items": [
{
"affectedPaths": [
"vobs\/some_vob\/myproject\/apps\/app1\/Java\/test\/src\/com\/giga\/highlevelproject\/myproject\/schedule\/validation\/SomeActivityTest.java"
],
"author": {
"absoluteUrl": "http:\/\/11.22.33.44:8080\/user\/hitj1620",
"fullName": "name1, name2 A"
},
"commitId": null,
"date": {
"$numberLong": "1489439532000"
},
"dateStr": "13\/03\/2017 21:12:12",
"elements": [
{
"action": "create version",
"editType": "edit",
"file": "vobs\/some_vob\/myproject\/apps\/app1\/Java\/test\/src\/com\/giga\/highlevelproject\/myproject\/schedule\/validation\/SomeActivityTest.java",
"operation": "checkin",
"version": "\/main\/MyProject_latest_int\/2"
}
],
"msg": "",
"timestamp": -1,
"user": "user111"
}
],
"kind": null
},
"culprits": [
{
"absoluteUrl": "http:\/\/11.22.33.44:8080\/user\/nuka1620",
"fullName": "nuka, Chuck"
}
],
"description": "lowercaseprojectname_V7.4.1.303",
"displayName": "#303",
"duration": 525758,
"estimatedDuration": 306374,
"executor": null,
"fullDisplayName": "MyProject \u00bb MyProject-build #303",
"highlevelproject_metrics_source_url": "http:\/\/11.22.33.44:8080\/job\/MyProject\/job\/MyProject-build\/303\/\/api\/json",
"id": "303",
"keepLog": false,
"number": 303,
"projectName": "MyProject-build",
"queueId": 8201,
"result": "SUCCESS",
"timeToRepair": null,
"timestamp": {
"$numberLong": "1489439650307"
},
"url": "http:\/\/11.22.33.44:8080\/job\/MyProject\/job\/MyProject-build\/303\/"
}

When the regexes are in a file, you don't have to escape double quotes; you don't have to fight to get your double quotes past the shell.
"number":.*"projectName
"projectName":.*,"queueId
"queueId":.*,"result
"result":".*$
When that's fixed, I get:
$ egrep -o -f egrep-pattern.txt jsonfile.json
"number": 303,"projectName
"queueId":8881,"result
$
The trouble now is, I think, that you've consumed the projectName with the first pattern, so the others don't get a chance to match it. Change the patterns to read up to a comma and you can get better results:
"number":[^,]*
"projectName":[^,]*
"queueId":[^,]*
"result":".*$
yields:
"number": 303
"projectName": "giga"
"queueId":8881
"result":"SUCESS"}
You could try to be more delicate, but you rapidly reach a point where a JSON-aware tool becomes more sensible. Commas in a string value would mess up the modified regexes, for example. (So, if the project name was "Giga, if not Tera", you'd have problems.)
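For example, a quick demonstration of that failure mode with the modified pattern:
$ echo '{"projectName": "Giga, if not Tera"}' | grep -Eo '"projectName":[^,]*'
"projectName": "Giga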
Matching more general JSON name:value notation
As long as you're looking for simple "key":"quoted value" objects, you can use the following grep -E (aka egrep) command:
grep -Eoe '"[^"]+":"((\\(["\\/bfnrt]|u[0-9a-fA-F]{4}))|[^"])*"' data
Given the JSON-like data (in the file called data):
{"key1":"value","key2":"value2 with \"quoted\" text","key3":"value3 with \\ and \/ and \f and \uA32D embedded"}
that script produces:
"key1":"value"
"key2":"value2 with \"quoted\" text"
"key3":"value3 with \\ and \/ and \f and \uA32D embedded"
You can upgrade it to handle almost any valid JSON "key":value by using:
grep -Eoe '"[^"]+":(("((\\(["\\/bfnrt]|u[0-9a-fA-F]{4}))|[^"])*")|true|false|null|(-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][-+]?[0-9]+)?))' data
With a new data file containing:
{"key1":"value","key2":"value2 with \"quoted\" text"}
{"key3":"value3 with \\ and \/ and \f and \uA32D embedded"}
{"key4":false,"key5":true,"key6":null,"key7":7,"key8":0,"key9":0.123E-23}
{"key10":10,"key11":3.14159,"key12":0.876,"key13":-543.123}
the script produces:
"key1":"value"
"key2":"value2 with \"quoted\" text"
"key3":"value3 with \\ and \/ and \f and \uA32D embedded"
"key4":false
"key5":true
"key6":null
"key7":7
"key8":0
"key9":0.123E-23
"key10":10
"key11":3.14159
"key12":0.876
"key13":-543.123
You can follow the railroad diagrams in the outline JSON specification at http://json.org to see how I created the regex.
It could be enhanced by the judicious addition of [[:space:]]* in places where spaces are permitted but not required — before the key string, before the colon, after the colon (you could add it after the value too, but you probably don't want that).
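For instance, a sketch of that enhancement (the same regex as above, but tolerating optional whitespace before the key and around the colon) might look like:
grep -Eoe '[[:space:]]*"[^"]+"[[:space:]]*:[[:space:]]*(("((\\(["\\/bfnrt]|u[0-9a-fA-F]{4}))|[^"])*")|true|false|null|(-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][-+]?[0-9]+)?))' data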
Another simplification that I've taken is that the key doesn't allow for the various escape characters that the value string does. You could repeat that.
And, of course, this only works for 'leaf' name:value pairs; if a value is itself an object {…} or an array […], this doesn't handle the value as a whole.
However, this just goes to emphasize that it gets very messy very quickly and you would be better off using a special-purpose JSON query tool. One such tool is jq, as mentioned in a comment to the main query.
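For instance, with the one-line blob at the top of the question, a jq sketch pulls out the same four leaf values with no regex at all (assuming jq is installed; @csv quotes strings and leaves numbers bare):
$ jq -r '[.number, .projectName, .queueId, .result] | @csv' jsonfile.json
303,"giga",8881,"SUCCESS"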

The complex JSON blob I had came from Jenkins (i.e. a Jenkins job's REST API data), and I had it stored in a MongoDB database.
To grab it from MongoDB, I used the mongoexport command to successfully generate the JSON blob (in non-JsonArray, non-pretty format).
#!/bin/bash
server=localhost
db=database_Jenkins              ## database name (matches the mongo --eval connection string below)
exportDir=./mongoExports         ## adjust the export directory as needed
collectionFile=collections.txt
## Generate collection file contains all collections in the Jenkins database in MongoDB.
( set -x
mongo "mongoDbServer.company.com/database_Jenkins" --eval "rs.slaveOk();db.getCollectionNames()" --quiet > ${collectionFile}
)
## create collection based JSON files
for collection in $(cat ${collectionFile} | sed -e 's:,: :g')
do
mongoexport --host ${server} --db ${db} --collection "${collection}" --out ${exportDir}/${collection}.json
##mongoexport --host ${server} --db ${db} --collection "${collection}" --type=csv --fieldFile ~/mongoDB_fetch/get_these_csv_fields.txt --out ${exportDir}/${collection}.csv; ## This didn't work if you have nested fields. fieldFile file was just containing field name per line in a particular xyz.IndexNumber.yyy format.
done
I tried the built-in mongoexport command's --type=csv with -f fields to catch topfield.0.subField, field2, field3.7.parameters.7.., and nothing worked.
PS: The number after the . mark is how you define indexes if you are going to create a CSV file and use fields (mandatory) with the mongoexport command.
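For reference, the fieldFile was just a plain-text list roughly like this (the field names and index positions here are only illustrative, since they differ per build):
number
projectName
queueId
result
actions.7.parameters.0.value
actions.7.parameters.3.value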
As my JSON structure was all unstructured (Jenkins version bumps/upgrades happened in the past and the data about a job did not keep the same structure), I tried this final sed trick (the JSON data for each entry was on its own line).
This sed command (shown below) will give you all the keys and their values (in key=value format), one per line, at the LEAF level of almost any JSON blob, or at least the Jenkins JSON blob. Once you have this info, you can feed the output of the command to a temporary file, then read the value part (after the = mark) and create your CSV file accordingly. Yes, you have to sort it so that your CSV file's fields stay in order for the header names and thus the values land in the right column/field. I calculated the field names from the temporary key=value output generated from every collection's JSON file, then read each temporary collection file and added its values into the final CSV file under the respective header/field/column.
OK, this is a weird solution, but at least it's a solution, in one line.
cat myJenkinsJob.json | sed "s/{}//g;s/,,*/,/g;s/},\"/\n/g;s/},{/\n/g;s/\([^\"a-zA-Z]\),\"/\1\n/g;s/:\[{/\n/g;s/\"name\":\"//g;s/\",\"value//g;s/,\"/\n/g;s/\":\"*/=/g;s/\"//g;s/[\[}\]]//g;s/[{}]//g;s/\$[a-zA-Z][a-zA-Z]*=//g"|grep "=" | sed "s/,$//"|egrep -v "=-|=$|=\[|^_class="
Tweak the sed part a little for your own data if your JSON blob shows funny characters that you don't want. The order of the sed operations is important. I'm also excluding some redundant variables that I don't need at this time (for example, the JSON blob contained _class="..." values), so I'm excluding those via egrep -v after the last | pipe.
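For comparison, if a JSON-aware tool is acceptable after all, a rough jq alternative can produce a similar leaf-level key=value listing (paths come out dotted, with array indices included):
jq -r 'paths(type != "object" and type != "array") as $p | "\($p | map(tostring) | join("."))=\(getpath($p))"' myJenkinsJob.json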

Related

Merging multiple JSON Lines files into a single JSON object

I'm trying to merge / reduce many JSON objects, and somehow I'm not getting the expected result.
I'm only interested in getting all keys; the values and the number of items inside arrays are irrelevant.
file1.json:
{
"customerId": "xx",
"emails": [
{
"address": "james#zz.com",
"customType": "",
"type": "custom"
},
{
"address": "sales#x.com",
"primary": true
},
{
"address": "info#x.com"
}
]
}
{
"id": "654",
"emails": [
{
"address": "peter#x.com",
"primary": true
}
]
}
The desired output is a JSON object with all possible keys from all input objects. The values are irrelevant, any value from any input object is OK. But all keys from input objects must be present in output object:
{
"emails": [
{
"address": "james#zz.com", <--- any existing value works
"customType": "", <--- any existing value works
"type": "custom", <--- any existing value works
"primary": true <--- any existing value works
}
],
"customerId": "xx", <--- any existing value works
"id": "654" <--- any existing value works
}
I tried reducing it, but it misses many of the keys in the array:
$ jq -s 'reduce .[] as $item ({}; . + $item)' file1.json
{
"customerId": "xx",
"emails": [
{
"address": "peter#x.com",
"primary": true
}
],
"id": "654"
}
The structure of the objects contained in file1.json is unknown, so the solution must be agnostic of any keys/values and the solution must not assume any structure or depth.
Is it possible to fix this somehow considering how jq works? Or is it possible to solve this issue using another tool?
PS: For those of you that are curious, this is useful to infer a schema that can be created in a database. Given an arbitrary number of JSON objects with an arbitrary structure, it's easy to create a single JSON squished/merged/fused structure that will "accommodate" all JSON objects.
BigQuery is able to autodetect a schema, but only 500 lines are analyzed to come up with it. This presents problems if objects have different structures past that 500 line mark.
With this approach I can squish a JSON Lines file with 1000000s of objects into one line that can be then imported into BigQuery with the autodetect schema flag and it will work every time since BigQuery only has one line to analyze and this line is the "super-schema" of all the objects. After extracting the autodetected schema I can manually fine tune it to make sure types are correct and then recreate the table specifying my tuned schema:
$ ls -1 users*.json | wc --lines
3672
$ cat users*.json > users-all.json
$ cat users-all.json | wc --lines
146482633
$ jq 'squish' users-all.json > users-all-squished.json
$ cat users-all-squished.json | wc --lines
1
$ bq load --autodetect users users-all-squished.json
$ bq show schema --format=prettyjson users > users-schema.json
$ vi users-schema.json
$ bq rm --table users
$ bq mk --table users --schema=users-schema.json
$ bq load users users-all.json
[Some options are missing or changed for readability]
Here is a solution that produces the expected result in the sample example, and seems to meet all the stated requirements. It is similar to one proposed by @pmf on this page.
jq -n --stream '
def squish: map(if type == "number" then 0 else . end);
reduce (inputs | select(length==2)) as [$p, $v] ({}; setpath($p|squish; $v))
'
Output
For the example given in the Q, the output is:
{
"customerId": "xx",
"emails": [
{
"address": "peter#x.com",
"customType": "",
"type": "custom",
"primary": true
}
],
"id": "654"
}
As @peak has pointed out, some aspects are underspecified. For instance, what should happen with .customerId and .id? Are they always the same across all files (as suggested by the sample files provided)? Do you want the items of the .emails array just thrown into one large array, or do you want to have them "merged" by some criteria (e.g. by a common value in their .address field)? Here are some stubs to start from:
Simply concatenate the .emails arrays and take all other parts from the first file:
jq 'reduce inputs as $in (.; .emails += $in.emails)' file*.json
# or simpler
jq '.emails += [inputs.emails[]]' file*.json
{
"emails": [
{
"address": "cc#xx.com"
},
{
"address": "james#zz.com",
"customType": "",
"type": "custom"
},
{
"address": "james#x.com"
},
{
"address": "sales#x.com",
"primary": true
},
{
"address": "info#x.com"
},
{
"address": "james#x.com"
},
{
"address": "sales#x.com",
"primary": true
},
{
"address": "info#x.com"
}
],
"customerId": "xx",
"id": "654"
}
Merge the objects in the .emails array by a common value in their .address field, with latter values overwriting former values for other fields with colliding names, and discard all other parts from the files:
jq -n 'reduce inputs.emails[] as $e ({}; .[$e.address] += $e) | map(.)' file*.json
[
{
"address": "cc#xx.com"
},
{
"address": "james#zz.com",
"customType": "",
"type": "custom"
},
{
"address": "james#x.com"
},
{
"address": "sales#x.com",
"primary": true
},
{
"address": "info#x.com"
}
]
If you are only interested in a list of unique field names for a given address, regardless of the counts and values used, you can also go with:
jq -n '
reduce inputs.emails[] as $e ({}; .[$e.address][$e | keys_unsorted[]] = 1)
| map_values(keys)
'
{
"cc#xx.com": [
"address"
],
"james#zz.com": [
"address",
"customType",
"type"
],
"james#x.com": [
"address"
],
"sales#x.com": [
"address",
"primary"
],
"info#x.com": [
"address"
]
}
The structure of the objects contained in file1.json is unknown, so the solution must be agnostic of any keys/values and the solution must not assume any structure or depth.
You can use the --stream flag to break down the structure into an array of paths and values, discard the values part and make the paths unique:
jq --stream -nc '[inputs[0]] | unique[]' file*.json
["customerId"]
["emails"]
["emails",0,"address"]
["emails",0,"customType"]
["emails",0,"primary"]
["emails",0,"type"]
["emails",1,"address"]
["emails",2]
["emails",2,"address"]
["emails",2,"primary"]
["emails",3]
["emails",3,"address"]
["id"]
Trying to build a representation of this, similar to any of the input files, comes with a lot of caveats. For instance, how would you represent in a single structure if one file had .emails as an array of objects, and another had .emails as just an atomic value, say, a string. You would not be able to represent this plurality without introducing new, possibly ambiguous structures (e.g. putting all possibilities into an array).
Therefore, having a list of paths could be a fair compromise. Judging by your desired output, you want to focus more on the object structure, so you could further reduce complexity by discarding the array indices. Depending on your use case, you could replace them with a single value to retain the information of the presence of an array, or discard them entirely:
jq --stream -nc '[inputs[0] | map(numbers = 0)] | unique[]' file*.json
["customerId"]
["emails"]
["emails",0]
["emails",0,"address"]
["emails",0,"customType"]
["emails",0,"primary"]
["emails",0,"type"]
["id"]
jq --stream -nc '[inputs[0] | map(strings)] | unique[]' file*.json
["customerId"]
["emails"]
["emails","address"]
["emails","customType"]
["emails","primary"]
["emails","type"]
["id"]
The following program meets these two key requirements:
"all keys from input objects must be present in output object";
"the solution must be agnostic of any keys/values and the solution must not assume any structure or depth."
The approach is the same as one suggested by @pmf, and for the example given in the Q, produces results that are very similar to the one that is shown:
jq -n --stream '
def squish: map(select(type == "string"));
reduce (inputs | select(length==2)) as [$p, $v] ({};
setpath($p|squish; $v))
'
With the given input, this produces:
{
"customerId": "xx",
"emails": {
"address": "peter#x.com",
"customType": "",
"type": "custom",
"primary": true
},
"id": "654"
}

Merge and Sort JSON using JQ

I have a file containing the following structure and unknown number of results:
{
"results": [
[
{
"field": "AccountID",
"value": "5177497"
},
{
"field": "Requests",
"value": "50900"
}
],
[
{
"field": "AccountID",
"value": "pro"
},
{
"field": "Requests",
"value": "251"
}
]
],
"statistics": {
"Matched": 51498,
"Scanned": 8673577,
"ScannedByte": 2.72400814E10
},
"status": "HOLD"
}
{
"results": [
[
{
"field": "AccountID",
"value": "5577497"
},
{
"field": "Requests",
"value": "51900"
}
],
"statistics": {
"Matched": 51498,
"Scanned": 8673577,
"ScannedByte": 2.72400814E10
},
"status": "HOLD"
}
There are multiple such result objects; within each, the results are indexed as arrays under the results key. The objects are not separated by a comma.
I am trying to print just the "AccountID" sorted by "Requests", in zsh, using jq. I have tried flattening them and using:
jq -r '.results[][0] |.value ' filename
jq -r '.results[][1] |.value ' filename
to get the AccountID and Requests separately and sort them. I don't think bash has a dictionary that can be used. The problem lies in the file: the field and value are not a key/value pair but are themselves pairs. Extracting them with the above two lines into separate arrays and sorting by the second array seems a bit too long, so I was wondering if there is a way to combine both operations.
The other way is to combine it all into a string and sort it in ascending order. Python would probably have the best solution, but the code needs to be a zsh or bash script.
Solutions that use sed, jq, or any other zsh-supported tools are welcome. If there is a way to create a dictionary in bash, please do let me know.
The projected output requirement is just the AccountID vs. the request number:
5577497 has 51900 requests
5177497 has 50900 requests
pro has 251 requests
If you don't mind learning a little jq, it will probably be best to write a small jq program to do what you want.
To get you started, consider the following jq program, which assumes your input is a stream of valid JSON objects with a "results" key similar to your sample:
[inputs | .results[] | map( { (.field) : .value} ) | add]
After making minor changes to your input so that it consists of valid JSON objects, an invocation of jq with the -n option produces an array of AccountID/Requests objects:
[
{
"AccountID": "5177497",
"Requests": "50900"
},
{
"AccountID": "pro",
"Requests": "251"
},
{
"AccountID": "5577497",
"Requests": "51900"
}
]
You could (for example) now use jq's group_by to group these objects by AccountID, and thereby produce the result you want.
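For instance, a sketch along those lines (assuming the input has first been massaged into valid JSON; it sorts with sort_by rather than group_by, since each AccountID appears only once in the sample, and it assumes the Requests values are numeric strings):
jq -nr '[inputs | .results[] | map( {(.field): .value} ) | add] | sort_by(.Requests | tonumber) | reverse | .[] | "\(.AccountID) has \(.Requests) requests"' filename
would print the report in the desired form.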
jq -S '.results[] | map( { (.field) : .value} ) | add' query-results-aggregate \
| jq -s -c 'group_by(.number_of_requests) | .[]'
This does the trick. Thanks to peak for the guidance.

Adding json array via JQ introduces unicode characters in string

I have a JSON file to which I want to append an array element, using bash and the latest jq installed. I am able to append it, but the resulting string has unicode characters, as can be seen below. The first element in the validators array is the original and the second is the appended one (this is not the whole JSON file).
"validators": [
{
"address": "85BAF568E7F89277E47D3FC8E111775A4F6992FA",
"pub_key": {
"type": "tendermint/PubKeyEd25519",
"value": "BCzCLcW7rZ9VJgAtEUoDN17qcZw8ZvpYbPsL6eOy3No="
},
"power": "10",
"name": ""
},
{
"address": "\u001b[32m\"F75E15A3949824B685A3C5BFCDEED7E3DA4277AE\"\u001b[0m\r",
"pub_key": "\u001b[37m{\u001b[0m\u001b[34;1m\"type\"\u001b[0m\u001b[37m:\u001b[0m\u001b[32m\"tendermint/PubKeyEd25519\"\u001b[0m\u001b[37m,\u001b[0m\u001b[34;1m\"value\"\u001b[0m\u001b[37m:\u001b[0m\u001b[32m\"INeR51z41k6jPAEJ5rV+1TY+4sxnbIykc4bfJFmSCQ8=\"\u001b[0m\u001b[37m\u001b[37m}\u001b[0m\r",
"power": "10",
"name": "node2"
}
]
Printing the element separately (here, the pub_key) prints it without any unicode/escape characters:
{
"type": "tendermint/PubKeyEd25519",
"value": "BCzCLcW7rZ9VJgAtEUoDN17qcZw8ZvpYbPsL6eOy3No="
}
I do the merge using the following command:
cat genesis.json.src | jq --arg pub_key $PK --arg name node$i --arg addr $ADDR '.validators+= [{address: $addr, pub_key: $pub_key, power:"10",name:$name}]' > genesis.json.dest
I am running macOS. Any help or suggestion would be appreciated.
As @choroba mentioned in the comments, these are colour escape sequences. I removed them by adding the -M flag to jq, which disables colours.
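For reference, a sketch of the fixed flow (node_key.json here is a hypothetical source for the new validator's key data; the important parts are capturing the values with -M so no colour codes get embedded, quoting the shell variables, and using --argjson so pub_key stays a JSON object rather than a string):
PK=$(jq -M -c '.pub_key' node_key.json)
ADDR=$(jq -M -r '.address' node_key.json)
jq --argjson pub_key "$PK" --arg name "node$i" --arg addr "$ADDR" '.validators += [{address: $addr, pub_key: $pub_key, power: "10", name: $name}]' genesis.json.src > genesis.json.dest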

Why is my jq failing on my JSON?

I haven't used jq before, but I want to build a shell script that will get a JSON response and extract just the values. To learn, I thought I would try it on my blog's WP API, but for some reason I'm getting an error of:
jq: error (at <stdin>:322): Cannot index array with string "slug"
While researching and testing, I read these previous questions:
jq: Cannot index array with string
jq is sed for JSON
JSON array to bash variables using jq
How to use jq in a shell pipeline?
How to extract data from a JSON file
Based on the above reading, I've tried this code:
URL="http://foobar.com"
RESPONSE=$(curl -so /dev/null -w "%{http_code}" $URL)
WPAPI="/wp-json/wp/v2"
IDENTIFIER="categories"
if (("$RESPONSE" == 200)); then
curl -s {$URL$WPAPI"/"$IDENTIFIER"/"} | jq '.' >> $IDENTIFIER.json
result=$(jq .slug $IDENTIFIER.json)
echo $result
else
echo "Not returned status 200";
fi
An additional attempt changing the jq after the curl:
curl -s {$URL$WPAPI"/"$IDENTIFIER"/"} | jq '.' | $IDENTIFIER.json
result=(jq -r '.slug' $IDENTIFIER.json)
echo $result
I can pretty-print (uncompress) the response with the Python JSON tool:
result=(curl -s {$URL$WPAPI"/"$IDENTIFIER"/"} | python -m json.tool > $IDENTIFIER.json)
I can save the JSON to a file, but when I use jq I cannot get just the slug. Here are my other tries:
catCalled=$(curl -s {$URL$WPAPI"/"$IDENTIFIER"/"} | python -m json.tool | ./jq -r '.slug')
echo $catCalled
My end goal is to try to use jq in a shell script and build a slug array with jq. What am I doing wrong in my jq and can I use jq on a string without creating a file?
The return from curl after pretty-printing, as requested in a comment:
[
{
"id": 4,
"count": 18,
"description": "",
"link": "http://foobar.com/category/foo/",
"name": "Foo",
"slug": "foo",
"taxonomy": "category",
},
{
"id": 8,
"count": 9,
"description": "",
"link": "http://foobar.com/category/bar/",
"name": "Bar",
"slug": "bar",
"taxonomy": "category",
},
{
"id": 5,
"count": 1,
"description": "",
"link": "http://foobar.com/category/mon/",
"name": "Mon",
"slug": "mon",
"taxonomy": "category",
},
{
"id": 11,
"count": 8,
"description": "",
"link": "http://foobar.com/category/fort/",
"name": "Fort",
"slug": "fort",
"taxonomy": "category",
}
]
Eventually my goal is to get the names of the slugs into an array like:
catArray=('foo','bar','mon', 'fort')
There are 2 issues here:
slug is not a root level element in your example json. The root level element is an array. If you want to access the slug property of each element of the array, you can do so like this:
jq '.[].slug' $IDENTIFIER.json
Your example json has trailing commas after the last property of each array element. Remove the commas after "taxonomy": "category".
If I take your sample json, remove the errant commas, save it to a plain text file called test.json and run the following command:
jq '.[].slug' test.json
I get the following output:
"foo"
"bar"
"mon"
"fort"
Preprocessing
Unfortunately, the JSON-like data shown as having been produced by curl is not strictly JSON. jq does not have a "relaxed JSON" mode, so in order to use jq, you will have to preprocess the JSON-like data, e.g. using hjson (see http://hjson.org/):
$ hjson -j input.qjson > input.json
jq
With the JSON in input.json:
$ jq -c 'map(.slug)' input.json
["foo","bar","mon","fort"]
Your string is not JSON; notice how the last member of your objects ends with a comma:
{foo:"bar",baz:9,}
This is legal in JavaScript, but it's illegal in JSON. If you are supposed to be receiving JSON from that endpoint, contact the people behind it and tell them to fix the bug (ending an object's last member with a comma breaks the JSON spec). Until it's fixed, I guess you can patch it with a little regex; it's a dirty quickfix and probably not very reliable, but running it through
perl -p -0777 -e 's/\"\,\s*}/\"}/g;'
makes it legal JSON.
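For example, a sketch wiring that quickfix into the original pipeline (reusing the URL, WPAPI and IDENTIFIER variables from the question) might be:
curl -s "${URL}${WPAPI}/${IDENTIFIER}/" | perl -p -0777 -e 's/\"\,\s*}/\"}/g;' | jq -r '.[].slug'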

How to parse asterisk in json file as string via jq

Here is a JSON file named test.json for testing:
{
"name": "Google",
"location": {
"street": "1600 Amphitheatre Parkway",
"city": "Mountain View",
"state": "California",
"country": "US"
},
"employees": [
{
"name": "Michael",
"division": "Engineering"
},
{
"name": "Laura",
"division": "HR"
},
{
"name": "Elise",
"division": "Marketing * test"
}
]
}
If I use jq to parse it as below:
cat test.json | jq -r '.employees[2].division'
it will work well and give a correct result:
Marketing * test
but if I use $(), a bad thing happens!
echo $(cat test.json | jq -r '.employees[2].division')
the result lists all the file names under the current folder, like:
my1.json my2.json test.json test ...
I guess $() runs the asterisk * through shell expansion rather than treating it as a string.
So how do I make the asterisk (*) in the JSON file be treated just as a string when I am using jq? I am using Google Cloud Platform and Ubuntu 17.10.
Always use double-quotes around command substitution so that the * is treated literally instead of being expanded. The * is a special character in the shell: it is a wildcard that expands to all the files in the current working directory. You need to quote it to deprive it of its special meaning (refer to the GNU bash man page, under the Parameters section).
Also, jq can process the file directly, so you can avoid the useless use of cat.
result="$(jq -r '.employees[2].division' < test.json)"
echo "$result"
should produce the result as expected.
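To see the difference between the quoted and unquoted forms:
echo $result      # unquoted: the * is expanded to the file names in the current directory
echo "$result"    # quoted: prints the literal value, i.e. Marketing * test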