I have a JSON file that is broken down by US county and has, in its "properties" section, the median income. So this JSON file contains median income by county.
{
"type": "Topology",
"transform": {
"scale": [
0.035896170617061705,
0.005347309530953095
],
"translate": [
-179.14734,
17.884813
]
},
"objects": {
"us_counties_20m": {
"type": "GeometryCollection",
"geometries": [
{
"type": "Polygon",
"arcs": [
[
0,
1,
2,
3,
4
]
],
"id": "0500000US01001",
"properties": {
"PRICE": 48863
}
},
{
"type": "Polygon",
"arcs": [
[
5,
6,
7,
8,
9,
10
]
],
"id": "0500000US01009",
"properties": {
"PRICE": 41940
}
},
{
"type": "Polygon",
"arcs": [
[
11,
12,
13,
14,
15
]
],
"id": "0500000US01017",
"properties": {
"PRICE": 33500
}
},
{
"type": "Polygon",
"arcs": [
[
16,
17,
-3,
18,
19,
20,
21
]
],
"id": "0500000US01021",
"properties": {
"PRICE": 38833
}
},
I wish to add another price to the "properties" section, namely the median home price per county. So I have a second JSON file with data like this:
[
{
"Full County Number": 56045,
"Price-RangeQ42019": "$150,000-$350,000",
"Geography": "Weston County, Wyoming",
"Latitude (generated)": 43.8403,
"Longitude (generated)": -104.5684,
"Q42019 Price": "$178,218"
},
{
"Full County Number": 56043,
"Price-RangeQ42019": "$150,000-$350,000",
"Geography": "Washakie County, Wyoming",
"Latitude (generated)": 43.8356,
"Longitude (generated)": -107.6602,
"Q42019 Price": "$170,665"
},
I want all of the fields of the second JSON to be appended to the "properties" section as separate entries.
Desired output (the "properties" section contains more info):
{
"type": "Topology",
"transform": {
"scale": [
0.035896170617061705,
0.005347309530953095
],
"translate": [
-179.14734,
17.884813
]
},
"objects": {
"us_counties_20m": {
"type": "GeometryCollection",
"geometries": [
{
"type": "Polygon",
"arcs": [
[
0,
1,
2,
3,
4
]
],
"id": "0500000US01001",
"properties": {
"PRICE": 48863
"Price-RangeQ42019": "$150,000-$350,000",
"Geography": "Washakie County, Wyoming",
"Latitude (generated)": 43.8356,
"Longitude (generated)": -107.6602,
"Q42019 Price": "$170,665"
}
},
etc...
The "id" and the "Full County Number" in the first and second json files match up exactly. However, the "Full County Number" lacks the "0500000US" prefix before each county. How might I merge these 2 json files to get the third json with the additional property?
Thanks so much in advance.
The following should come close to providing a solution. First, a dictionary ($dict) is constructed, and then this dictionary is used to update the first file.
Invocation:
jq -n -f program.jq secondfile.json firstfile.json
where program.jq contains:
def lpad:
tostring | if length < 5 then ("00000" + .) | .[-5:] else . end;
(input
| map( with_entries(if .key == "Full County Number"
then .key = "id" | .value |= "0500000US" + lpad
else .
end ) )
| INDEX(.[]; .id) ) as $dict
| inputs
| .objects.us_counties_20m.geometries |=
map( .id as $id
| (.properties += $dict[$id]) )
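As a quick sanity check on the key construction, you can run the lpad-and-prefix step on a couple of county numbers by hand (the numbers below are just illustrative) and confirm that the results line up with the "id" values in the first file:
$ jq -n 'def lpad: tostring | if length < 5 then ("00000" + .) | .[-5:] else . end; 1001, 56045 | "0500000US" + lpad'
"0500000US01001"
"0500000US56045"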
I need to solve this with jq. I have a large list of arrays in my JSON file and need to do a sort | uniq -c kind of operation on them. Specifically, I have a relatively nasty looking fruit array whose contents need to be broken down. I'm aware of unique and things like that, and imagine there is likely a simple way to do this, but I've been trying to assign things to variables, append, and so on, and I can't manage even the most basic part: counting the unique values in that fruit array, especially not without breaking the rest of the content (hence the variable ideas). Please tell me I'm overthinking this.
I'd like to turn this;
[
{
"uid": "123abc",
"tID": [
"T19"
],
"fruit": [
"Kiwi",
"Apple",
"",
"",
"",
"Kiwi",
"",
"Kiwi",
"",
"",
"Mango",
"Kiwi"
]
},
{
"uid": "456xyz",
"tID": [
"T15"
],
"fruit": [
"",
"Orange"
]
}
]
Into this;
[
{
"uid": "123abc",
"tID": [
"T19"
],
"metadata": [
{
"name": "fruit",
"value": "Kiwi - 3"
},
{
"name": "fruit",
"value": "Mango - 1"
},
{
"name": "fruit",
"value": "Apple - 1"
}
]
},
{
"uid": "456xyz",
"tID": [
"T15"
],
"metadata": [
{
"name": "fruit",
"value": "Orange - 1"
}
]
}
]
Using group_by and length would be one way:
jq '
map(with_entries(select(.key == "fruit") |= (
.value |= (group_by(.) | map(
{name: "fruit", value: "\(.[0] | select(. != "")) - \(length)"}
))
| .key = "metadata"
)))
'
Output
[
{
"uid": "123abc",
"tID": [
"T19"
],
"metadata": [
{
"name": "fruit",
"value": "Apple - 1"
},
{
"name": "fruit",
"value": "Kiwi - 4"
},
{
"name": "fruit",
"value": "Mango - 1"
}
]
},
{
"uid": "456xyz",
"tID": [
"T15"
],
"metadata": [
{
"name": "fruit",
"value": "Orange - 1"
}
]
}
]
Demo
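If you would rather list the counts largest first (as in the question's sample output) instead of alphabetically, one possible tweak is to reorder the groups by size before formatting them. A sketch, only checked against the sample input:
jq '
map(with_entries(select(.key == "fruit") |= (
  .value |= (group_by(.) | sort_by(-length) | map(
    {name: "fruit", value: "\(.[0] | select(. != "")) - \(length)"}
  ))
  | .key = "metadata"
)))
'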
I have a file with 30,000 JSON lines delimited by newlines. I am using jq to process it.
Below is the schema of each line (new.json).
{
"indexed": {
"date-parts": [
[
2020,
8,
13
]
],
"date-time": "2020-08-13T06:27:26Z",
"timestamp": 1597300046660
},
"reference-count": 42,
"publisher": "American Chemical Society (ACS)",
"issue": "3",
"content-domain": {
"domain": [],
"crossmark-restriction": false
},
"short-container-title": [
"Org. Lett."
],
"published-print": {
"date-parts": [
[
2005,
2
]
]
},
"DOI": "10.1021/ol047829t",
"type": "journal-article",
"created": {
"date-parts": [
[
2005,
1,
27
]
],
"date-time": "2005-01-27T05:53:29Z",
"timestamp": 1106805209000
},
"page": "383-386",
"source": "Crossref",
"is-referenced-by-count": 38,
"title": [
"Liquid-Crystalline [60]Fullerene-TTF Dyads"
],
"prefix": "10.1021",
"volume": "7",
"author": [
{
"given": "Emmanuel",
"family": "Allard",
"affiliation": []
},
{
"given": "Frédéric",
"family": "Oswald",
"affiliation": []
},
{
"given": "Bertrand",
"family": "Donnio",
"affiliation": []
},
{
"given": "Daniel",
"family": "Guillon",
"affiliation": []
}
],
"member": "316",
"container-title": [
"Organic Letters"
],
"original-title": [],
"link": [
{
"URL": "https://pubs.acs.org/doi/pdf/10.1021/ol047829t",
"content-type": "unspecified",
"content-version": "vor",
"intended-application": "similarity-checking"
}
],
"deposited": {
"date-parts": [
[
2020,
4,
7
]
],
"date-time": "2020-04-07T13:39:55Z",
"timestamp": 1586266795000
},
"score": null,
"subtitle": [],
"short-title": [],
"issued": {
"date-parts": [
[
2005,
2
]
]
},
"references-count": 42,
"alternative-id": [
"10.1021/ol047829t"
],
"URL": "http://dx.doi.org/10.1021/ol047829t",
"relation": {},
"ISSN": [
"1523-7060",
"1523-7052"
],
"issn-type": [
{
"value": "1523-7060",
"type": "print"
},
{
"value": "1523-7052",
"type": "electronic"
}
],
"subject": [
"Physical and Theoretical Chemistry",
"Organic Chemistry",
"Biochemistry"
]
}
For every DOI, I need to obtain the values of the given and family keys in the same cell of that DOI's row, in CSV/TSV format.
The expected output for the above json is (in CSV/TSV format):
|DOI| givenName|familyName|
|10.1021/ol047829t|Emmanuel; Frédéric; Bertrand; Daniel;|Allard; Oswald; Donnio; Guillon|
I am using the command line below, but it throws an error, and when I try to alter it I cannot get CSV/TSV output at all.
cat new.json | jq -r "[.DOI, .publisher, .author[] | .given] | @tsv" > manage.tsv
The same logic applies to the subject key as well. I am using the command line below to output the values of the subject key to CSV, but it only gives the first element (in this case: "Physical and Theoretical Chemistry").
cat new.json | jq -c -r "[.DOI, .publisher, .subject[0]] | @csv" > manage.csv
Any pointers to the right jq command line would be of great help.
Join the given and family names by semicolons separately, then pass the resulting strings as fields to the @tsv filter.
["DOI", "givenName", "familyName"],
(inputs | [.DOI, (.author | map(.given), map(.family) | join("; "))])
| @tsv
Online demo
Note that you need to invoke jq with the -r and -n flags for this to work and produce valid TSV output.
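For completeness, a full invocation along those lines could look roughly like this (the output file name is simply carried over from the question):
jq -rn '["DOI", "givenName", "familyName"],
(inputs | [.DOI, (.author | map(.given), map(.family) | join("; "))])
| @tsv' new.json > manage.tsv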
I have a project I'm working on that creates a choropleth map with all US county borders loaded from file1.json and filled with a color gradient based on values in file2.json. In previous iterations, I just entered values manually into file1.json, but now I want to expand my map and make it more user-friendly.
file1.json is structured like this:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"GEO_ID": "0500000US06001",
"STATE": "06",
"COUNTY": "001",
"NAME": "Alameda",
"LSAD": "County",
"CENSUSAREA": 739.017
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-122.30936,
37.77615
],
[
-122.317215,
37.778527
]
]
]
}
},
...
]
}
file2.json is structured like this:
[
{
"County": "Alameda",
"Count": 25
},
{
"County": "Amador",
"Count": 1
},
{
"County": "Butte",
"Count": 2
},
...
]
I want to create a new file that includes everything from file1.json, but extended to include the relevant Count field based on the County field.
The result would look like this:
[
{
"type": "Feature",
"properties": {
"GEO_ID": "0500000US06001",
"STATE": "06",
"COUNTY": "001",
"NAME": "Alameda",
"Count": "25",
"LSAD": "County",
"CENSUSAREA": 739.017
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-122.30936,
37.77615
],
[
-122.317215,
37.778527
]
]
]
}
},
...
]
I'm new to using jq, but I've played around with it enough to get it running in PowerShell.
Here is a test.jq file which may help
# utility to create lookup table from array of objects
# k is the name to use as the key
# f is a function to compute the value
#
def obj(k;f): reduce .[] as $o ({}; .[$o[k]] = ($o | f));
# create map from county to count
( $file2 | obj("County";.Count) ) as $count
# add .properties.Count to each feature
| .features |= map( .properties.Count = $count[.properties.NAME] )
Example use assuming suitable file1.json and file2.json:
$ jq -M --argfile file2 file2.json -f test.jq file1.json
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"GEO_ID": "0500000US06001",
"STATE": "06",
"COUNTY": "001",
"NAME": "Alameda",
"LSAD": "County",
"CENSUSAREA": 739.017,
"Count": 25
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-122.30936,
37.77615
],
[
-122.317215,
37.778527
]
]
]
}
}
]
}
Try it online!
I notice that "Count" is a string in your example output but it's a number in the sample file2. If you need to convert that to a string you can include a call to tostring. e.g.
.features |= map( .properties.Count = ( $count[.properties.NAME] | tostring ) )
or you could perform the conversion when the lookup table is created, e.g.
( $file2 | obj("County"; .Count | tostring ) ) as $count
I have been playing around with jq to format a json file but I am having some issues trying to solve a particular transformation. Given a test.json file in this format:
[
{
"name": "A", // This would be the first key
"number": 1,
"type": "apple",
"city": "NYC" // This would be the second key
},
{
"name": "A",
"number": "5",
"type": "apple",
"city": "LA"
},
{
"name": "A",
"number": 2,
"type": "apple",
"city": "NYC"
},
{
"name": "B",
"number": 3,
"type": "apple",
"city": "NYC"
}
]
I was wondering, how can I format it this way using jq?
[
{
"key": "A",
"values": [
{
"key": "NYC",
"values": [
{
"number": 1,
"type": "a"
},
{
"number": 2,
"type": "b"
}
]
},
{
"key": "LA",
"values": [
{
"number": 5,
"type": "b"
}
]
}
]
},
{
"key": "B",
"values": [
{
"key": "NYC",
"values": [
{
"number": 3,
"type": "apple"
}
]
}
]
}
]
I have followed this thread Using jq, convert array of name/value pairs to object with named keys and tried to group the json using this expression
jq '. | group_by(.name) | group_by(.city) ' ./test.json
but I have not been able to add the keys in the output.
You'll want to group the items at the different levels and build out your result objects as you go.
group_by(.name) | map({
key: .[0].name,
values: (group_by(.city) | map({
key: .[0].city,
values: map({number,type})
}))
})
Just keep in mind that group_by/1 yields the groups in sorted order. If you want to preserve the original input order, you'll need an implementation that does so, such as:
def group_by_unsorted(key_selector):
reduce .[] as $i ({};
.["\($i|key_selector)"] += [$i]
)|[.[]];
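Swapping that helper in at both levels would then look something like this (a sketch, only checked against the sample input):
def group_by_unsorted(key_selector):
  reduce .[] as $i ({};
    .["\($i|key_selector)"] += [$i]
  ) | [.[]];

group_by_unsorted(.name) | map({
  key: .[0].name,
  values: (group_by_unsorted(.city) | map({
    key: .[0].city,
    values: map({number, type})
  }))
})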
I want to convert a complex JSON file into a simple JSON file using JQ. However, the query I'm using generates an incorrect output.
My (cut down) JSON file:
[
{
"id": 100,
"foo": [
{
"bar": [
{"type": "read"},
{"type": "write"}
],
"users": ["admin_1"],
"groups": []
},
{
"bar": [
{"type": "execute"},
{ "type": "read"}
],
"users": [],
"groups": ["admin_2"]
}
]
},
{
"id": 101,
"foo": [
{
"bar": [
{"type": "read"}
],
"users": [
"admin_3"
],
"groups": []
}
]
}
]
I need to generate a flatter JSON file and combine the users and groups into one field, similar to this:
[
{
"id": 100,
"users_groups": [
"admin_1",
"admin_2"
],
"bar": ["read"]
},
{
"id": 100,
"users_groups": ["admin_1"],
"bar": ["write"]
},
{
"id": 100,
"users_groups": ["admin_2"],
"bar": ["execute"]
},
{
"id": 101,
"users_groups": ["admin_3"],
"bar": ["read"]
}
]
Everything I try in JQ results in me getting an incorrect output (where admin_1 incorrectly has bar=execute and admin_2 incorrectly has bar=write), similar to the following:
[
{
"id": 100,
"users_groups": [
"admin_1",
"admin_2"
],
"bar": ["read", "write", "execute"]
},
{
"id": 101,
"users_groups": ["admin_3"],
"bar": ["read"]
}
]
I have tried many variants of this query. Any idea what I should be doing instead?
cat file.json | jq -r '[.[] | select(has("foo")) |{"id", "users":(.foo[] | .users), "groups":(.foo[] | .groups), "bar":([.foo[].bar[] | .type])} ] '
The following filter groups by "type" as the question seems to require:
map(.id as $id
| [.foo[]
| {id: $id, bar: .bar[].type} +
{"users_groups": (.users + .groups)[]} ]
| group_by(.bar)
| map(.[0] + {"users_groups": [.[].users_groups]}) )
Output
[
[
{
"id": 100,
"bar": "execute",
"users_groups": [
"admin_2"
]
},
{
"id": 100,
"bar": "read",
"users_groups": [
"admin_1",
"admin_2"
]
},
{
"id": 100,
"bar": "write",
"users_groups": [
"admin_1"
]
}
],
[
{
"id": 101,
"bar": "read",
"users_groups": [
"admin_3"
]
}
]
]
Variations
To achieve the array-of-objects output format, simply tack on | [.[][]];
it would similarly be trivial to ensure that .bar is array-valued, though that might be pointless given that the grouping is by .type. Both variations are combined in the sketch below.
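Putting both variations together to reproduce the flat, array-valued shape shown in the question might look like this (a sketch, only checked against the sample input):
map(.id as $id
  | [.foo[]
    | {id: $id, bar: .bar[].type} +
      {"users_groups": (.users + .groups)[]} ]
  | group_by(.bar)
  | map(.[0] + {"users_groups": [.[].users_groups]}) )
| [.[][]]
| map(.bar |= [.])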