I am using pandas to normalize some JSON data. I am getting stuck when more than one field is either an object or an array.
If I use record_path on Car, it breaks on the second one.
Any pointers on how to turn something like this into one CSV line per Car and per Location?
[
{
"Name": "John Doe",
"Car": [
"Car1",
"Car2"
],
"Location": "Texas"
},
{
"Name": "Jane Roe",
"Car": "Car1",
"Location": [
"Illinois",
"Kansas"
]
}
]
Here is the output:
Name,Car,Location
John Doe,"['Car1', 'Car2']",Texas
Jane Roe,Car1,"['Illinois', 'Kansas']"
Here is the code:
import json
import pandas as pd

with open('file.json') as data_file:
    data = json.load(data_file)
df = pd.json_normalize(data, errors='ignore')  # older, deprecated spelling: pd.io.json.json_normalize
I would like it to end up like this:
Name,Car,Location
John Doe,Car1,Texas
John Doe,Car2,Texas
Jane Roe,Car1,Illinois
Jane Roe,Car1,Kansas
The answers worked great until I ran into a JSON file with extra data. This is what a file with the extra values looks like:
{
"Customers": [
{
"Name": "John Doe",
"Car": [
"Car1",
"Car2"
],
"Location": "Texas",
"Repairs": {
"RepairLocations": {
"RepairsCompleted":[
"Fix1",
"Fix2"
]
}
}
},
{
"Name": "Jane Roe",
"Car": "Car1",
"Location": [
"Illinois",
"Kansas"
]
}
]
}
Here is what I am going for. I think it's the most readable format, but anything that at least shows all the keys would do:
Name,Car,Location,Repairs:RepairLocation
John Doe,Car1,Texas,RepairsCompleted:Fix1
John Doe,Car1,Texas,RepairsCompleted:Fix2
John Doe,Car2,Texas,RepairsCompleted:Fix1
John Doe,Car2,Texas,RepairsCompleted:Fix2
Jane Roe,Car1,Illinois,
Jane Roe,Car1,Kansas,
Any suggestions on getting this second part?
A simple jq solution which is also a bit more generic than needed here:
["Name", "Car", "Location"],
(.[]
| [.Name] + (.Car|..|scalars|[.]) + (.Location|..|scalars|[.]))
| @csv
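For readers who want to see what `..|scalars` and the `+` of generators do, here is a rough Python sketch of the same idea (stdlib only; the helper names `scalars`, `rows` and `to_csv` are hypothetical, not part of jq or pandas). It collects every non-container value under each key at any depth and emits the Cartesian product per record. Row order may differ from jq's, but the set of rows is the same for this input.

```python
import csv
import io
from itertools import product


def scalars(value):
    """Yield every non-container value at any depth, like jq's `.. | scalars`."""
    if isinstance(value, dict):
        for v in value.values():
            yield from scalars(v)
    elif isinstance(value, list):
        for v in value:
            yield from scalars(v)
    else:
        yield value  # strings, numbers, booleans and None (null) all count


def rows(records, keys):
    """Cartesian product of the scalars found under each key, per record."""
    for rec in records:
        columns = [list(scalars(rec.get(k))) for k in keys]  # missing key -> [None]
        yield from product(*columns)


def to_csv(records, keys):
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(keys)
    writer.writerows(rows(records, keys))  # None is written as an empty field
    return buf.getvalue()
```

Because `scalars` also recurses into objects, passing the Customers array with the extra key "Repairs" would give John Doe one row per Car per Fix (four rows) and Jane Roe an empty Repairs column; the "RepairsCompleted:" prefix from the desired output is not reproduced.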
You're looking for something like this:
def expand($keys):
. as $in
| reduce $keys[] as $k ( [{}];
map(. + {
($k): ($in[$k] | if type == "array" then .[] else . end)
})
) | .[];
(.[0] | keys_unsorted) as $h
| $h, (.[] | expand($h) | [.[$h[]]]) | @csv
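The `expand($keys)` helper starts from `[{}]` and, for each key, extends every partial row once per value of that key. A minimal Python sketch of the same reduce, assuming the input is already a list of dicts (the names `expand` and `table` are just illustrative), looks like this:

```python
from functools import reduce


def expand(obj, keys):
    """Mirror of the jq expand($keys) helper: for each key, extend every
    partial row once per value; arrays contribute one value per element,
    anything else contributes itself."""
    def step(partials, k):
        v = obj.get(k)  # missing key -> None, like $in[$k] giving null
        values = v if isinstance(v, list) else [v]
        return [{**p, k: x} for p in partials for x in values]

    return reduce(step, keys, [{}])


def table(data):
    header = list(data[0])  # like `.[0] | keys_unsorted`
    yield header
    for obj in data:
        for row in expand(obj, header):
            yield [row[k] for k in header]  # like `[.[$h[]]]`
```

One consequence worth knowing: a key whose value is an empty array contributes no values, so that record yields no rows at all. The jq version behaves the same way, since `.[]` on `[]` emits nothing.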
{
"Users": [
{
"Attributes": [
{
"Name": "sub",
"Value": "1"
},
{
"Name": "phone_number",
"Value": "1234"
},
{
"Name": "referral_code",
"Value": "abc"
}
]
},
{
"Attributes": [
{
"Name": "sub",
"Value": "2"
},
{
"Name": "phone_number",
"Value": "5678"
},
{
"Name": "referral_code",
"Value": "def"
}
]
}
]
}
How can I produce output like the following?
1,1234,abc
2,5678,def
jq '.Users[] .Attributes[] .Value' test.json
produces
1
1234
abc
2
5678
def
Not sure this is the cleanest way to handle this, but the following will get the desired output:
.Users[].Attributes | map(.Value) | @csv
Loop through all the nested Attributes arrays with .Users[].Attributes
Use map() to collect every Value
Convert each resulting array to a CSV line with @csv
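The three steps translate almost one-to-one into Python; this stdlib sketch (the function name `attribute_values_csv` is hypothetical) is only meant to make the data flow concrete. One difference to keep in mind: jq's `@csv` wraps every string in double quotes (`"1","1234","abc"`), whereas Python's `csv.writer` quotes only when a field needs it.

```python
import csv
import io


def attribute_values_csv(doc):
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for user in doc["Users"]:  # .Users[]
        # .Attributes | map(.Value): one row per user, values in array order
        writer.writerow([attr["Value"] for attr in user["Attributes"]])
    return buf.getvalue()
```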
If you don't need the output to be guaranteed to be CSV, and if you're sure the "Name" values are presented in the same order, you could go with:
.Users[].Attributes
| from_entries
| [.[]]
| join(",")
To be safe, though, it would be better to ensure a consistent ordering:
(.Users[0] | [.Attributes[] | .Name]) as $keys
| .Users[]
| .Attributes
| from_entries
| [.[ $keys[] ]]
| join(",")
Using join(",") produces the comma-separated values shown in the question (without quotation marks), but it is not guaranteed to produce valid CSV for every possible input value. If you don't mind the quotation marks, you could use @csv, or, if you want to skip the quotation marks around the numeric values only:
map(tonumber? // .) | @csv
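The ordering-safe variant takes the Name order from the first user, turns each user's Name/Value pairs into a lookup table (the `from_entries` step), and reads the values back in that fixed order. A small Python sketch of that idea, with a hypothetical function name and assuming every user has the same set of Names (a missing one comes out as `None`, much like jq's `null`):

```python
def attribute_rows(doc):
    users = doc["Users"]
    # ($keys): the Name order of the first user fixes the column order
    keys = [attr["Name"] for attr in users[0]["Attributes"]]
    for user in users:
        # from_entries: turn the Name/Value pairs into a lookup table
        entries = {attr["Name"]: attr["Value"] for attr in user["Attributes"]}
        # [.[ $keys[] ]]: read the values back in the fixed key order
        yield [entries.get(k) for k in keys]
```

With this approach, a second user whose attributes arrive shuffled still produces `2,5678,def` when the row is joined with commas.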
I have the following JSON file, for example:
{
"FOO": {
"name": "Donald",
"location": "Stockholm"
},
"BAR": {
"name": "Walt",
"location": "Stockholm"
},
"BAZ": {
"name": "Jack",
"location": "Whereever"
}
}
and I have this jq command:
cat json | jq '.[] | {newname: (select(.location=="Stockholm") | .name), contains_w: (select(.location=="Stockholm") | .name | startswith("W"))}'
so I get the result:
{
"newname": "Donald",
"contains_w": false
}
{
"newname": "Walt",
"contains_w": true
}
My question is: is there any way to DRY up my command?
I mean, how can I get the same result without duplicating this part:
select(.location=="Stockholm") | .name
How can I reuse the result of the newname field?
I have a really big file to work with, so I don't want to waste time and resources.
You are filtering multiple times during object construction. You could filter first and then do the construction on the filtered list, e.g.:
map(select(.location=="Stockholm"))
| map({newname: .name, contains_w: (.name | startswith("W"))})
https://jqplay.org/s/aXjlgOEDnb
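The filter-then-construct idea is language independent; here is a Python sketch of it (the function name is hypothetical). The location test runs once per record, and the output objects are built only from records that already passed it. Like the jq answer, it returns a single list rather than a stream of separate objects.

```python
def stockholm_names(doc):
    # map(select(.location == "Stockholm")): filter once...
    matches = [rec for rec in doc.values() if rec["location"] == "Stockholm"]
    # ...then build each output object from the already-filtered records
    return [
        {"newname": rec["name"], "contains_w": rec["name"].startswith("W")}
        for rec in matches
    ]
```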
I have a json structure that looks like this:
{
"lorry1": {
"box1": [
{"item": "shoes", "state": "new"},
{"item": "snacks", "state": "new"}
],
"box2": [
{"item": "beer", "state": "cold"},
{"item": "potatoes"}
]
},
"lorry2": {
"box1": [
{"item": "shoes", "state": "new"},
{"item": "snacks", "state": "new"}
],
"box2": [
{"item": "beer", "state": "lukewarm"}
]
}
}
Now I want to know where I can find shoes:
I could come up with this jq query:
to_entries | select(.[].value | .[][].item=="shoes") | map({"lorry": "\(.key)" })
But that only gives me the lorries. Useful, but not quite there yet. I'd like to know the box they're in as well.
I came up with this, but it obviously is not correct:
to_entries | select(.[].value | .[][].item=="shoes") | keys as $box |map({"lorry": "\(.key)", "box": $box })
The answer I'd like to get is lorry1, box1 and lorry2, box1.
Even better yet: I'd like to find all items and provide the information, like this:
"shoes": [ {"lorry1", "box1"}, {"lorry2", "box1" } ],
"snacks": [ {"lorry1", "box1"}, {"lorry2", "box1"} ],
"beer": [ {"lorry1", "box2"}, {"lorry2", "box2"} ],
"potatoes": [ {"lorry1", "box2"} ]
but that may be asking a bit too much :)
This looks like overkill to me, but it does the job.
[path(.[][][].item) as $p | [$p, getpath($p)]] |
group_by( .[1] ) |
map({(.[0][1]): (. | map([.[0][0,1]]))})|
add
Save the above jq filter in a file named item_in.jq and run it as jq --from-file item_in.jq. Passing your input to it gives the following output:
{
"beer": [
[
"lorry1",
"box2"
],
[
"lorry2",
"box2"
]
],
"potatoes": [
[
"lorry1",
"box2"
]
],
"shoes": [
[
"lorry1",
"box1"
],
[
"lorry2",
"box1"
]
],
"snacks": [
[
"lorry1",
"box1"
],
[
"lorry2",
"box1"
]
]
}
The initial transformation was to dump leaves and their paths from the input JSON tree.
See
https://github.com/stedolan/jq/issues/78#issuecomment-17819519
The answer I'd like to get is lorry1, box1 and lorry2, box1
In this case, you can get it with:
path(.. | select(.item? == "shoes"))
Which returns:
["lorry1","box1",0]
["lorry2","box1",0]
These are the paths in your object that lead to an object whose .item property is set to "shoes".
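To see what `path(.. | select(...))` is doing, here is a Python sketch (the helper `paths_to` is hypothetical): it walks dicts and lists recursively in the same pre-order as `..` and records the path to every node that satisfies a predicate. The `isinstance` check in the predicate plays the role of the `?` in `.item?`, which silences errors on non-objects.

```python
def paths_to(value, predicate, path=()):
    """Yield the path (list of keys/indices) of every node for which
    predicate(node) is true, like jq's `path(.. | select(...))`."""
    if predicate(value):
        yield list(path)
    if isinstance(value, dict):
        for k, v in value.items():
            yield from paths_to(v, predicate, path + (k,))
    elif isinstance(value, list):
        for i, v in enumerate(value):
            yield from paths_to(v, predicate, path + (i,))


def is_shoes(node):
    # equivalent of `.item? == "shoes"`: only objects can match
    return isinstance(node, dict) and node.get("item") == "shoes"
```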
Here's a generic solution that does not assume the "items" are in arrays, or even that the values associated with the "item" keys are always strings:
jq -c '. as $in
| [paths as $p | select($p[-1] == "item") | $p]
| group_by(. as $p | $in|getpath($p))
| .[]
| (.[0] as $p | $in | getpath($p)) as $v
| {($v|tostring): ( map(.[:-1] | if .[-1] | type == "number" then .[:-1] else . end)) }
'
Output
With your input:
{"beer":[["lorry1","box2"],["lorry2","box2"]]}
{"potatoes":[["lorry1","box2"]]}
{"shoes":[["lorry1","box1"],["lorry2","box1"]]}
{"snacks":[["lorry1","box1"],["lorry2","box1"]]}
If you want the output as a single JSON object, then collect the above into an array and use add.
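The generic jq solution enumerates every path, keeps the ones ending in "item", groups them by the value found there, and strips the trailing "item" key plus an array index if there is one. A rough Python equivalent, assuming JSON-decoded input (all function names are hypothetical), returns the combined object directly:

```python
import json


def all_paths(value, path=()):
    """Yield (path, value) for every non-root node, in pre-order like jq's `paths`."""
    if isinstance(value, dict):
        items = value.items()
    elif isinstance(value, list):
        items = enumerate(value)
    else:
        return
    for key, child in items:
        yield list(path) + [key], child
        yield from all_paths(child, path + (key,))


def jq_tostring(value):
    # jq's tostring leaves strings alone and JSON-encodes everything else
    return value if isinstance(value, str) else json.dumps(value, separators=(",", ":"))


def locate_items(doc):
    found = {}
    for path, value in all_paths(doc):
        if path[-1] != "item":
            continue
        where = path[:-1]                      # drop the "item" key itself
        if where and isinstance(where[-1], int):
            where = where[:-1]                 # drop the array index, if any
        found.setdefault(jq_tostring(value), []).append(where)
    return dict(sorted(found.items()))         # group_by also sorts the groups
```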
I have a database in a JSON file. I have already sorted it and removed some data from the objects using ./jq,
but I'm stuck at adding new fields to the objects.
Here is a part of my JSON file:
{
"Name": "Forrest.Gump.1994.MULTi.1080p.AMZN.WEB-DL.DDP5.1.H264-Ao",
"ID": "SMwIkBoC2blXeWnBa9Hjge9YPs90"
},
{
"Name": "Point.Blank.2019.MULTi.1080p.NF.WEB-DL.DDP5.1.x264-Ao",
"ID": "OZI4mOuBXuJ7b89FLgXJoozyhHe9"
},
{
"Name": "The.Incredible.Hulk.2008.MULTi.2160p.UHD.BluRay.REMUX.HDR.HEVC.DTS-HD.MA.7.1",
"ID": "jZzR4_B_vjm593cYKR7j97XAMv6d"
},
Is it possible, using jq and, for example, a regular expression, to extract some data and insert it as a new field in each object? I wish to achieve something like this:
{
"Name": "Forrest.Gump.1994.MULTi.1080p.AMZN.WEB-DL.DDP5.1.H264-Ao",
"ID": "SMwIkBoC2blXeWnBa9Hjge9YPs90",
"Year": "1994",
"Res": "1080p"
},
{
"Name": "Point.Blank.2019.MULTi.1080p.NF.WEB-DL.DDP5.1.x264-Ao",
"ID": "OZI4mOuBXuJ7b89FLgXJoozyhHe9",
"Year": "2019",
"Res": "1080p"
},
{
"Name": "The.Incredible.Hulk.2008.MULTi.2160p.UHD.BluRay.REMUX.HDR.HEVC.DTS-HD.MA.7.1",
"ID": "jZzR4_B_vjm593cYKR7j97XAMv6d",
"Year": "2008",
"Res": "2160p"
},
Thanks in advance
Here's one solution that assumes for simplicity that the fragment you've shown comes from an array:
map( . as $in
| .Name | capture(".*[.](?<year>[12][0-9]{3})[.](?<rest>.*)")
| .year as $year
| (.rest | split(".") | .[1]) as $res
| $in + {Year: $year, Res: $res} )
Hopefully, once you're familiar with some jq basics, such as map, capture, and the EXP as $var syntax, the above will be more-or-less self-explanatory.
As a one-liner
Here's the same thing but as a one-liner:
map(. + (.Name | capture(".*[.](?<Year>[12][0-9]{3})[.](?<Res>.*)") | {Year, Res: (.Res | split(".")[1])}))
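The same regular expression carries over to Python almost unchanged; the main difference is that Python's `re` spells named groups `(?P<name>...)`, while jq's Oniguruma engine accepts `(?<name>...)`. This sketch (the function name is hypothetical) makes one deliberate design change: a record whose Name does not match is kept unchanged, whereas in the jq version `capture` emits nothing and `map` would drop that record.

```python
import re

# Named groups use Python's (?P<name>...) syntax; the pattern is otherwise the jq one.
TITLE = re.compile(r".*[.](?P<Year>[12][0-9]{3})[.](?P<Res>.*)")


def add_year_res(records):
    out = []
    for rec in records:
        m = TITLE.match(rec["Name"])
        if m is None:
            # Design choice: keep unmatched records as-is (jq would drop them).
            out.append(dict(rec))
            continue
        parts = m["Res"].split(".")
        res = parts[1] if len(parts) > 1 else None  # like jq's .[1] -> null
        out.append({**rec, "Year": m["Year"], "Res": res})
    return out
```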
My json is as shown below:
[
[
{
"id": "abcd"
},
{
"address": [
"140 Deco st"
]
}
],
[
{
"id": "xyz"
},
{
"dummy": "This is dummy"
}
],
[
{
"id": "12356"
},
{
"address": [
"140 Deco st"
]
}
]]
Now, I want to capture only the ids of the groups that have a dummy value of "This is dummy". Some of the data may or may not have the dummy and address fields.
I tried the following, but it gave me the error "... cannot have their containment checked":
jq -c '.[] | .[] | select(.dummy | contains("This is dummy")) | .[] | .id'
Any help is much appreciated!
contains is quite tricky to use correctly. Since the requirement is:
to capture only those ids who have dummy value of "This is dummy"
I would suggest:
.[]
| select( any(.[]; .dummy == "This is dummy") )
| add
| .id
or perhaps (depending on your detailed requirements):
.[]
| select( any(.[]; .dummy == "This is dummy") )
| .[]
| .id? // empty
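The logic of the second variant in Python terms: keep a group if any of its objects has the exact dummy value, then emit the id of every object in that group that has one. Comparing with `==` instead of a containment test also sidesteps the objects that lack a dummy key, which is where `contains` failed. This is a sketch with a hypothetical function name; the `is not None` check only approximates `.id? // empty`, which would also skip a `false` id.

```python
def dummy_ids(groups):
    result = []
    for group in groups:
        # select(any(.[]; .dummy == "This is dummy"))
        if any(obj.get("dummy") == "This is dummy" for obj in group):
            # .[] | .id? // empty : keep only objects that carry a non-null id
            result.extend(obj["id"] for obj in group if obj.get("id") is not None)
    return result
```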