I have a project that creates a choropleth map with all US county borders loaded from file1.json, filled with a color gradient based on values in file2.json. In previous iterations I just entered values manually into file1.json, but now I want to expand my map and make it more user-friendly.
file1.json is structured like this:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"GEO_ID": "0500000US06001",
"STATE": "06",
"COUNTY": "001",
"NAME": "Alameda",
"LSAD": "County",
"CENSUSAREA": 739.017
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-122.30936,
37.77615
],
[
-122.317215,
37.778527
]
]
]
}
},
...
]
}
file2.json is structured like this:
[
{
"County": "Alameda",
"Count": 25
},
{
"County": "Amador",
"Count": 1
},
{
"County": "Butte",
"Count": 2
},
...
]
I want to create a new file that includes everything from file1.json, augmented so that each feature also includes the relevant Count field, matched on the County field.
The result would look like this:
[
{
"type": "Feature",
"properties": {
"GEO_ID": "0500000US06001",
"STATE": "06",
"COUNTY": "001",
"NAME": "Alameda",
"Count": "25",
"LSAD": "County",
"CENSUSAREA": 739.017
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-122.30936,
37.77615
],
[
-122.317215,
37.778527
]
]
]
}
},
...
]
I'm new to using jq, but I've played around with it enough to get it running in PowerShell.
Here is a test.jq file which may help:
# utility to create lookup table from array of objects
# k is the name to use as the key
# f is a function to compute the value
#
def obj(k;f): reduce .[] as $o ({}; .[$o[k]] = ($o | f));
# create map from county to count
( $file2 | obj("County";.Count) ) as $count
# add .properties.Count to each feature
| .features |= map( .properties.Count = $count[.properties.NAME] )
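For reference, with the sample file2.json above, the $count lookup table built by obj("County"; .Count) evaluates to:
{
  "Alameda": 25,
  "Amador": 1,
  "Butte": 2
}
Each feature's count is then a single object lookup rather than a scan of the whole array.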
Example use assuming suitable file1.json and file2.json:
$ jq -M --argfile file2 file2.json -f test.jq file1.json
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {
"GEO_ID": "0500000US06001",
"STATE": "06",
"COUNTY": "001",
"NAME": "Alameda",
"LSAD": "County",
"CENSUSAREA": 739.017,
"Count": 25
},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
-122.30936,
37.77615
],
[
-122.317215,
37.778527
]
]
]
}
}
]
}
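A side note on the invocation: --argfile is deprecated in recent jq releases. If your jq warns about it, --slurpfile is the usual replacement; it binds $file2 to an array holding the file's contents, so you would reference $file2[0] in test.jq instead of $file2:
$ jq -M --slurpfile file2 file2.json -f test.jq file1.json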
I notice that "Count" is a string in your example output but it's a number in the sample file2. If you need to convert that to a string you can include a call to tostring, e.g.
.features |= map( .properties.Count = ( $count[.properties.NAME] | tostring ) )
or you could perform the conversion when the lookup table is created, e.g.
( $file2 | obj("County"; .Count | tostring ) ) as $count
I have some huge JSON files I need to profile so I can transform them into some tables. I found jq to be really useful in inspecting them, but there are going to be hundreds of these, and I'm pretty new to jq.
I already have some really handy functions in my ~/.jq (big thank you to @mikehwang)
def profile_object:
  to_entries
  | def parse_entry: {"key": .key, "value": (.value | type)}; map(parse_entry)
  | sort_by(.key)
  | from_entries;
def profile_array_objects:
  map(profile_object)
  | map(to_entries)
  | reduce .[] as $item ([]; . + $item)
  | sort_by(.key)
  | from_entries;
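For instance (a made-up flat object), profile_object applied to {"name": "Austin, TX", "value": "873275"} returns {"name": "string", "value": "string"}.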
I'm sure I'll have to modify them after I describe my question.
I'd like a jq line to profile a single object. If a key maps to an array of objects, collect the unique keys across those objects, and keep profiling down if there are nested arrays of objects. If a value is an object, profile that object.
Sorry for the long example, but imagine several GBs of this:
{
"name": "XYZ Company",
"type": "Contractors",
"reporting": [
{
"group_id": "660",
"groups": [
{
"ids": [
987654321,
987654321,
987654321
],
"market": {
"name": "Austin, TX",
"value": "873275"
}
},
{
"ids": [
987654321,
987654321,
987654321
],
"market": {
"name": "Nashville, TN",
"value": "2393287"
}
}
]
}
],
"product_agreements": [
{
"negotiation_arrangement": "FFVII",
"code": "84144",
"type": "DJ",
"type_version": "V10",
"description": "DJ in a mask",
"name": "Claptone",
"negotiated_rates": [
{
"company_references": [
1,
5,
458
],
"negotiated_prices": [
{
"type": "negotiated",
"rate": 17.73,
"expiration_date": "9999-12-31",
"code": [
"11"
],
"billing_modifier_code": [
"124"
],
"billing_class": "professional"
}
]
},
{
"company_references": [
747
],
"negotiated_prices": [
{
"type": "fee",
"rate": 28.42,
"expiration_date": "9999-12-31",
"code": [
"11"
],
"billing_class": "professional"
}
]
}
]
},
{
"negotiation_arrangement": "MGS3",
"name": "David Byrne",
"type": "Producer",
"type_version": "V10",
"code": "654321",
"description": "Frontman from Talking Heads",
"negotiated_rates": [
{
"company_references": [
1,
9,
2344,
8456
],
"negotiated_prices": [
{
"type": "negotiated",
"rate": 68.73,
"expiration_date": "9999-12-31",
"code": [
"11"
],
"billing_class": "professional"
}
]
},
{
"company_references": [
679
],
"negotiated_prices": [
{
"type": "fee",
"rate": 89.25,
"expiration_date": "9999-12-31",
"code": [
"11"
],
"billing_class": "professional"
}
]
}
]
}
],
"version": "1.3.1",
"last_updated_on": "2023-02-01"
}
Desired output:
{
"name": "string",
"type": "string",
"reporting": [
{
"group_id": "number",
"groups": [
{
"ids": [
"number"
],
"market": {
"type": "string",
"value": "string"
}
}
]
}
],
"product_agreements": [
{
"negotiation_arrangement": "string",
"code": "string",
"type": "string",
"type_version": "string",
"description": "string",
"name": "string",
"negotiated_rates": [
{
"company_references": [
"number"
],
"negotiated_prices": [
{
"type": "string",
"rate": "number",
"expiration_date": "string",
"code": [
"string"
],
"billing_modifier_code": [
"string"
],
"billing_class": "string"
}
]
}
]
}
],
"version": "string",
"last_updated_on": "string"
}
Really sorry if there are any errors in that, but I tried to make it all consistent and about as simple as I could.
To restate the need: recursively profile each key in a JSON object if a value is an object or array. The solution needs to be key-name independent. Happy to clarify further if needed.
The jq module schema.jq at https://gist.github.com/pkoppstein/a5abb4ebef3b0f72a6ed was designed to produce the kind of structural schema you describe.
For very large inputs it might be quite slow, so if the JSON is sufficiently regular, a hybrid strategy may be possible: profile just enough of the data to come up with a comprehensive structural schema, and then check that it applies to the rest.
For conformance testing of structural schemas such as those produced by schema.jq, see https://github.com/pkoppstein/JESS
Given your input.json, here is a solution:
jq '
def schema:
if type == "object" then .[] |= schema
elif type == "array" then map(schema)|unique
| if (first | type) == "object" then [add] else . end
else type
end;
schema
' input.json
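To illustrate (a quick sketch; the input here is made up), given {"a": [{"b": 1}, {"c": "x"}]} this filter returns:
{
  "a": [
    {
      "b": "number",
      "c": "string"
    }
  ]
}
The [add] step is what merges the per-element object schemas into a single representative object.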
Here's a variant of @Philippe's solution: it coalesces objects in map(schema) for arrays in a principled though lossy way. (All these half-solutions trade precision for speed.)
Note that keys_unsorted is used below; if using gojq, then either this would have to be changed to keys, or a def of keys_unsorted provided.
# Use "JSON" as the union of two distinct types
# except combine([]; [ $x ]) => [ $x ]
def combine($a;$b):
if $a == $b then $a elif $a == null then $b elif $b == null then $a
elif ($a == []) and ($b|type) == "array" then $b
elif ($b == []) and ($a|type) == "array" then $a
else "JSON"
end;
# Profile an array by calling mergeTypes(.[] | schema)
# in order to coalesce objects
def mergeTypes(s):
reduce s as $t (null;
if ($t|type) != "object" then .types = (.types + [$t] | unique)
else .object as $o
| .object = reduce ($t | keys_unsorted[]) as $k ($o;
.[$k] = combine( $t[$k]; $o[$k] )
)
end)
| (if .object then [.object] else null end ) + .types ;
def schema:
if type == "object" then .[] |= schema
elif type == "array"
then if . == [] then [] else mergeTypes(.[] | schema) end
else type
end;
schema
Example:
Input:
{"a": [{"b":[1]}, {"c":[2]}, {"c": []}] }
Output:
{
"a": [
{
"b": [
"number"
],
"c": [
"number"
]
}
]
}
I have several CSV files of item data for a game I'm messing around with that I need to convert to JSON for consumption. The data can be quite irregular with several empty fields per record, which makes for sort of ugly JSON output.
Example with dummy values:
Id,Name,Value,Type,Properties/1,Properties/2,Properties/3,Properties/4
01:Foo:13,Foo,13,ACME,CanExplode,IsRocket,,
02:Bar:42,Bar,42,,IsRocket,,,
03:Baz:37,Baz,37,BlackMesa,CanExplode,IsAlive,IsHungry,
Converted output:
[
{
"Id": "01:Foo:13",
"Name": "Foo",
"Value": 13,
"Type": "ACME",
"Properties": ["CanExplode", "IsRocket", ""]
},
{
"Id": "02:Bar:42",
"Name": "Bar",
"Value": 42,
"Type": "",
"Properties": ["IsRocket", "", ""]
},
{
"Id": "03:Baz:37",
"Name": "Baz",
"Value": 37,
"Type": "BlackMesa",
"Properties": ["CanExplode", "IsAlive", "IsHungry"]
}
]
So far I've been quite successful using Miller. I've managed to remove completely empty columns from the CSV as well as aggregate the Properties/X columns into a single array.
But now I'd like to do two more things to improve the output format to make consuming the JSON easier:
remove empty strings "" from the Properties array
replace the other empty strings "" (e.g. Type of the second record) with null
Desired output:
[
{
"Id": "01:Foo:13",
"Name": "Foo",
"Value": 13,
"Type": "ACME",
"Properties": ["CanExplode", "IsRocket"]
},
{
"Id": "02:Bar:42",
"Name": "Bar",
"Value": 42,
"Type": null,
"Properties": ["IsRocket"]
},
{
"Id": "03:Baz:37",
"Name": "Baz",
"Value": 37,
"Type": "BlackMesa",
"Properties": ["CanExplode", "IsAlive", "IsHungry"]
}
]
Is there a way to achieve that with Miller?
My current commands are:
mlr -I --csv remove-empty-columns file.csv to clean up the columns
mlr --icsv --ojson --jflatsep '/' --jlistwrap cat file.csv > file.json for the conversion
It's probably not the way you want to do it; I also use jq.
Running
mlr --c2j --jflatsep '/' --jlistwrap remove-empty-columns then cat input.csv | \
jq '.[].Properties|=map(select(length > 0))' | \
jq '.[].Type|=(if . == "" then null else . end)'
you will have
[
{
"Id": "01:Foo:13",
"Name": "Foo",
"Value": 13,
"Type": "ACME",
"Properties": [
"CanExplode",
"IsRocket"
]
},
{
"Id": "02:Bar:42",
"Name": "Bar",
"Value": 42,
"Type": null,
"Properties": [
"IsRocket"
]
},
{
"Id": "03:Baz:37",
"Name": "Baz",
"Value": 37,
"Type": "BlackMesa",
"Properties": [
"CanExplode",
"IsAlive",
"IsHungry"
]
}
]
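If you prefer a single jq pass, the two filters can be combined; a minimal sketch that should produce the same output:
mlr --c2j --jflatsep '/' --jlistwrap remove-empty-columns then cat input.csv | \
jq 'map(.Properties |= map(select(length > 0)) | .Type |= (if . == "" then null else . end))'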
Using Miller, you can "filter out" the empty fields from each record with:
mlr --c2j --jflatsep '/' --jlistwrap put '
$* = select($*, func(k,v) {return v != ""})
' file.csv
Remark: we're actually building a new record containing the non-empty fields rather than deleting the empty fields from the original record, but the final result is equivalent. Note that the empty Type is dropped entirely (see the second record below) rather than being emitted as "Type": null:
[
{
"Id": "01:Foo:13",
"Name": "Foo",
"Value": 13,
"Type": "ACME",
"Properties": ["CanExplode", "IsRocket"]
},
{
"Id": "02:Bar:42",
"Name": "Bar",
"Value": 42,
"Properties": ["IsRocket"]
},
{
"Id": "03:Baz:37",
"Name": "Baz",
"Value": 37,
"Type": "BlackMesa",
"Properties": ["CanExplode", "IsAlive", "IsHungry"]
}
]
I have a file with 30,000 JSON lines delimited by newlines. I am using jq to process it.
Below is the schema of each line (new.json).
{
"indexed": {
"date-parts": [
[
2020,
8,
13
]
],
"date-time": "2020-08-13T06:27:26Z",
"timestamp": 1597300046660
},
"reference-count": 42,
"publisher": "American Chemical Society (ACS)",
"issue": "3",
"content-domain": {
"domain": [],
"crossmark-restriction": false
},
"short-container-title": [
"Org. Lett."
],
"published-print": {
"date-parts": [
[
2005,
2
]
]
},
"DOI": "10.1021/ol047829t",
"type": "journal-article",
"created": {
"date-parts": [
[
2005,
1,
27
]
],
"date-time": "2005-01-27T05:53:29Z",
"timestamp": 1106805209000
},
"page": "383-386",
"source": "Crossref",
"is-referenced-by-count": 38,
"title": [
"Liquid-Crystalline [60]Fullerene-TTF Dyads"
],
"prefix": "10.1021",
"volume": "7",
"author": [
{
"given": "Emmanuel",
"family": "Allard",
"affiliation": []
},
{
"given": "Frédéric",
"family": "Oswald",
"affiliation": []
},
{
"given": "Bertrand",
"family": "Donnio",
"affiliation": []
},
{
"given": "Daniel",
"family": "Guillon",
"affiliation": []
}
],
"member": "316",
"container-title": [
"Organic Letters"
],
"original-title": [],
"link": [
{
"URL": "https://pubs.acs.org/doi/pdf/10.1021/ol047829t",
"content-type": "unspecified",
"content-version": "vor",
"intended-application": "similarity-checking"
}
],
"deposited": {
"date-parts": [
[
2020,
4,
7
]
],
"date-time": "2020-04-07T13:39:55Z",
"timestamp": 1586266795000
},
"score": null,
"subtitle": [],
"short-title": [],
"issued": {
"date-parts": [
[
2005,
2
]
]
},
"references-count": 42,
"alternative-id": [
"10.1021/ol047829t"
],
"URL": "http://dx.doi.org/10.1021/ol047829t",
"relation": {},
"ISSN": [
"1523-7060",
"1523-7052"
],
"issn-type": [
{
"value": "1523-7060",
"type": "print"
},
{
"value": "1523-7052",
"type": "electronic"
}
],
"subject": [
"Physical and Theoretical Chemistry",
"Organic Chemistry",
"Biochemistry"
]
}
For every DOI, I need to obtain the values of the given and family keys in the same cell of the same row as that DOI, in CSV/TSV format.
The expected output for the above json is (in CSV/TSV format):
|DOI| givenName|familyName|
|10.1021/ol047829t|Emmanuel; Frédéric; Bertrand; Daniel|Allard; Oswald; Donnio; Guillon|
I am using the command line below, but it throws an error, and when I try to alter it I am unable to get CSV/TSV output at all.
cat new.json | jq -r "[.DOI, .publisher, .author[] | .given] | @tsv" > manage.tsv
The same logic applies to the subject key. I am using the command line below to output values of the subject key to CSV, but it emits only the first element (in this case: "Physical and Theoretical Chemistry").
cat new.json | jq -c -r "[.DOI, .publisher, .subject[0]] | @csv" > manage.csv
Any pointers for right jq command line will be of great help.
Join the given and family names by semicolons separately, then pass the resulting strings as fields to the TSV filter.
["DOI", "givenName", "familyName"],
(inputs | [.DOI, (.author | map(.given), map(.family) | join("; "))])
| @tsv
Note that you need to invoke jq with the -r and -n flags for this to work and produce valid TSV output.
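For completeness, a full invocation might look like this (untested against your real data, but consistent with the filter above):
jq -rn '["DOI", "givenName", "familyName"],
(inputs | [.DOI, (.author | map(.given), map(.family) | join("; "))])
| @tsv' new.json > manage.tsv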
I have a json file that is organized by US county and has, in the "properties" section, median income. So this json file contains median income by county.
{
"type": "Topology",
"transform": {
"scale": [
0.035896170617061705,
0.005347309530953095
],
"translate": [
-179.14734,
17.884813
]
},
"objects": {
"us_counties_20m": {
"type": "GeometryCollection",
"geometries": [
{
"type": "Polygon",
"arcs": [
[
0,
1,
2,
3,
4
]
],
"id": "0500000US01001",
"properties": {
"PRICE": 48863
}
},
{
"type": "Polygon",
"arcs": [
[
5,
6,
7,
8,
9,
10
]
],
"id": "0500000US01009",
"properties": {
"PRICE": 41940
}
},
{
"type": "Polygon",
"arcs": [
[
11,
12,
13,
14,
15
]
],
"id": "0500000US01017",
"properties": {
"PRICE": 33500
}
},
{
"type": "Polygon",
"arcs": [
[
16,
17,
-3,
18,
19,
20,
21
]
],
"id": "0500000US01021",
"properties": {
"PRICE": 38833
}
},
I wish to add to the "properties" section another price, namely the median home price per county. So I have a second json file with data like this:
[
{
"Full County Number": 56045,
"Price-RangeQ42019": "$150,000-$350,000",
"Geography": "Weston County, Wyoming",
"Latitude (generated)": 43.8403,
"Longitude (generated)": -104.5684,
"Q42019 Price": "$178,218"
},
{
"Full County Number": 56043,
"Price-RangeQ42019": "$150,000-$350,000",
"Geography": "Washakie County, Wyoming",
"Latitude (generated)": 43.8356,
"Longitude (generated)": -107.6602,
"Q42019 Price": "$170,665"
},
where I want all the categories of the 2nd json appended to the "properties" section as separate fields.
Desired output (the "properties" section contains more info):
{
"type": "Topology",
"transform": {
"scale": [
0.035896170617061705,
0.005347309530953095
],
"translate": [
-179.14734,
17.884813
]
},
"objects": {
"us_counties_20m": {
"type": "GeometryCollection",
"geometries": [
{
"type": "Polygon",
"arcs": [
[
0,
1,
2,
3,
4
]
],
"id": "0500000US01001",
"properties": {
"PRICE": 48863
"Price-RangeQ42019": "$150,000-$350,000",
"Geography": "Washakie County, Wyoming",
"Latitude (generated)": 43.8356,
"Longitude (generated)": -107.6602,
"Q42019 Price": "$170,665"
}
},
etc...
The "id" and the "Full County Number" in the first and second json files match up exactly. However, the "Full County Number" lacks the "0500000US" prefix before each county. How might I merge these 2 json files to get the third json with the additional property?
Thanks so much in advance.
The following should come close to providing a solution. First, a dictionary ($dict) is constructed, and then this dictionary is used to update the first file.
Invocation:
jq -n -f program.jq secondfile.json firstfile.json
where program.jq contains:
def lpad:
tostring | if length < 5 then ("00000" + .) | .[-5:] else . end;
(input
| map( with_entries(if .key == "Full County Number"
then .key = "id" | .value |= "0500000US" + lpad
else .
end ) )
| INDEX(.[]; .id) ) as $dict
| inputs
| .objects.us_counties_20m.geometries |=
map( .id as $id
| (.properties += $dict[$id]) )
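One caveat: INDEX/2 is a builtin only as of jq 1.6. If you're on jq 1.5, you can add an equivalent definition at the top of program.jq:
def INDEX(stream; idx_expr):
  reduce stream as $row ({}; .[$row | idx_expr | tostring] = $row);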
I have been playing around with jq to format a JSON file, but I am having some issues trying to solve a particular transformation. Given a test.json file in this format:
[
{
"name": "A", // This would be the first key
"number": 1,
"type": "apple",
"city": "NYC" // This would be the second key
},
{
"name": "A",
"number": "5",
"type": "apple",
"city": "LA"
},
{
"name": "A",
"number": 2,
"type": "apple",
"city": "NYC"
},
{
"name": "B",
"number": 3,
"type": "apple",
"city": "NYC"
}
]
I was wondering, how can I format it this way using jq?
[
{
"key": "A",
"values": [
{
"key": "NYC",
"values": [
{
"number": 1,
"type": "a"
},
{
"number": 2,
"type": "b"
}
]
},
{
"key": "LA",
"values": [
{
"number": 5,
"type": "b"
}
]
}
]
},
{
"key": "B",
"values": [
{
"key": "NYC",
"values": [
{
"number": 3,
"type": "apple"
}
]
}
]
}
]
I have followed the thread "Using jq, convert array of name/value pairs to object with named keys" and tried to group the JSON using this expression:
jq '. | group_by(.name) | group_by(.city) ' ./test.json
but I have not been able to add the keys in the output.
You'll want to group the items at the different levels, building out your result objects as you go.
group_by(.name) | map({
key: .[0].name,
values: (group_by(.city) | map({
key: .[0].city,
values: map({number,type})
}))
})
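Applied to your sample input, this yields the following (note that LA now sorts before NYC, which motivates the caveat below):
[
  {
    "key": "A",
    "values": [
      {
        "key": "LA",
        "values": [
          { "number": 5, "type": "apple" }
        ]
      },
      {
        "key": "NYC",
        "values": [
          { "number": 1, "type": "apple" },
          { "number": 2, "type": "apple" }
        ]
      }
    ]
  },
  {
    "key": "B",
    "values": [
      {
        "key": "NYC",
        "values": [
          { "number": 3, "type": "apple" }
        ]
      }
    ]
  }
]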
Just keep in mind that group_by/1 yields groups in sorted order. If you need to preserve the original input order, you'll want an order-preserving implementation:
def group_by_unsorted(key_selector):
reduce .[] as $i ({};
.["\($i|key_selector)"] += [$i]
)|[.[]];
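Plugging that into the solution above preserves the input order at both levels (same shape as before, just with the unsorted variant):
group_by_unsorted(.name) | map({
  key: .[0].name,
  values: (group_by_unsorted(.city) | map({
    key: .[0].city,
    values: map({number,type})
  }))
})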