Merging JSON with array to create new file - json

I am trying to make a map of the U.S. with Mapbox that shows median home price by county. I have a .json file that contains all the counties and is already accepted by Mapbox tileset -
{
"type": "Topology",
"transform": {
"scale": [
0.035896170617061705,
0.005347309530953095
],
"translate": [
-179.14734,
17.884813
]
},
"objects": {
"us_counties_20m": {
"type": "GeometryCollection",
"geometries": [
{
"type": "Polygon",
"arcs": [],
"id": "0500000US01001"
},
{
"type": "Polygon",
"arcs": [],
"id": "0500000US01009"
},
{
"type": "Polygon",
"arcs": [],
"id": "0500000US01017"
},
{
"type": "Polygon",
"arcs": [],
"id": "0500000US01021"
}
]
}
}
}
Basically, it's a json file with "type" (Polygon), "arcs" (to map the county), and "id", which is an ID for the county.
This is great and accepted by Mapbox Tilesets to give me a visualization by county, but I need to add in median home price by county (in order to get colors by county, based on price).
I have a second json file that is more like an array, which has
[
{
"0500000US01001": 51289.0,
"0500000US01009": 46793.0,
"0500000US01017": 39857.0,
"0500000US01021": 48859.0
}
]
and so on, but basically it maps each ID to the median home price for that county. The IDs are the same in both files, and there are the same number of them. So I need to produce a third json file from these, which has "type", "arcs", "id", and "PRICE" (the addition).
These files are huge - any suggestions? I tried using jq but received an error that
jq: error ... object ({"type":"To...) and array ([{"0500000U...) cannot be multiplied
Thanks in advance!

A straightforward approach is to load the second file into a variable and use it as a lookup table while updating the first file. For example:
jq 'add as $prices | input
| .objects.us_counties_20m.geometries[] |= . + {PRICE: $prices[.id]}' file2 file1
add can be substituted with .[0] if the array in file2 contains only one object.
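For readers who prefer a script to jq, the same lookup-and-merge idea can be sketched in Python. This is only an illustration under assumptions: the function name merge_prices and the inline sample data are hypothetical stand-ins for the two files, and with real files you would load them with json.load instead.

```python
import json


def merge_prices(topology, price_rows, object_name="us_counties_20m"):
    """Add a PRICE field to each geometry, looked up by the geometry's id."""
    # The second file is an array of {id: price} objects; fold them into one dict.
    prices = {}
    for row in price_rows:
        prices.update(row)
    for geometry in topology["objects"][object_name]["geometries"]:
        # .get() leaves PRICE as None when an id has no price entry.
        geometry["PRICE"] = prices.get(geometry["id"])
    return topology


# Tiny inline stand-ins for the two files from the question.
topology = {
    "type": "Topology",
    "objects": {
        "us_counties_20m": {
            "type": "GeometryCollection",
            "geometries": [
                {"type": "Polygon", "arcs": [], "id": "0500000US01001"},
                {"type": "Polygon", "arcs": [], "id": "0500000US01009"},
            ],
        }
    },
}
price_rows = [{"0500000US01001": 51289.0, "0500000US01009": 46793.0}]

merged = merge_prices(topology, price_rows)
print(json.dumps(merged["objects"]["us_counties_20m"]["geometries"], indent=2))
```

Like the jq filter, this keeps every geometry and simply records a null price when an ID is missing from the price file.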

Related

JMESPath how to write a query with multi-level filter?

I have been studying official documentation of JMESPath and a few other resources. However I was not successful with the following task:
my data structure is a json from vimeo api (video list):
data array contains lots of objects, each object is the uploaded file that has many attributes and various options.
"data": [
{
"uri": "/videos/00001",
"name": "Video will be added.mp4",
"description": null,
"type": "video",
"link": "https://vimeo.com/00001",
"duration": 9,
"files":[
{
"quality": "hd",
"type": "video/mp4",
"width": 1440,
"height": 1440,
"link": "https://player.vimeo.com/external/4443333.sd.mp4",
"created_time": "2020-09-01T19:10:01+00:00",
"fps": 30,
"size": 10807854,
"md5": "643d9f18e0a63e0630da4ad85eecc7cb",
"public_name": "UHD 1440p",
"size_short": "10.31MB"
},
{
"quality": "sd",
"type": "video/mp4",
"width": 540,
"height": 540,
"link": "https://player.vimeo.com/external/44444444.sd.mp4",
"created_time": "2020-09-01T19:10:01+00:00",
"fps": 30,
"size": 1345793,
"md5": "cb568939bb7b276eb468d9474c1f63f6",
"public_name": "SD 540p",
"size_short": "1.28MB"
},
... other data
]
},
... other uploaded files
]
The filter I need to apply: duration must be less than 10, the width of the file must be 540, and the result needs to contain the link (URL) from files.
I have only managed to get one level of the structure working:
data[].files[?width == '540'].link
I need to extract this kind of list
[
{
"uri": "/videos/111111",
"link": "https://player.vimeo.com/external/4123112312.sd.mp4"
},
{
"uri": "/videos/22222",
"link": "https://player.vimeo.com/external/1231231231.sd.mp4"
},
...other data
]
Since the duration is in your data array, you will have to add this filter at that level.
You will also have to use what is described under the section filtering and selecting nested data, because you only care about one specific type of file under the files array. You can use the same query structure, | [0], to pull only the first element of the filtered files array.
So on your reduced example, the query:
data[?duration < `10`].{ uri: uri, link: files[?width == `540`].link | [0] }
Would yield the expected:
[
{
"uri": "/videos/00001",
"link": "https://player.vimeo.com/external/44444444.sd.mp4"
}
]
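To make the two filter levels explicit, here is a rough plain-Python equivalent of that query. It is a sketch, not the JMESPath library: the function name short_videos_with_width is made up for illustration, and the second video in the sample data is a hypothetical entry added only to show that long videos are dropped.

```python
def short_videos_with_width(data, max_duration=10, width=540):
    """Mimic data[?duration < 10].{uri: uri, link: files[?width == 540].link | [0]}."""
    results = []
    for video in data:
        duration = video.get("duration")
        # Outer filter: keep only videos whose duration is a number below the limit.
        if not isinstance(duration, (int, float)) or duration >= max_duration:
            continue
        # Inner filter: links of the files that have the requested width.
        links = [f["link"] for f in video.get("files", []) if f.get("width") == width]
        # "| [0]" takes the first match, or null when nothing matched.
        results.append({"uri": video["uri"], "link": links[0] if links else None})
    return results


data = [
    {
        "uri": "/videos/00001",
        "duration": 9,
        "files": [
            {"width": 1440, "link": "https://player.vimeo.com/external/4443333.sd.mp4"},
            {"width": 540, "link": "https://player.vimeo.com/external/44444444.sd.mp4"},
        ],
    },
    # Hypothetical second entry, too long to pass the duration filter.
    {"uri": "/videos/00002", "duration": 42, "files": []},
]

result = short_videos_with_width(data)
print(result)
```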

filtering geoJSON data using JMESPath not working

I am trying to filter some data from the geoJSON data structure shown as below:
"features": [
{
"type": "Feature",
"properties": {
"#id": "node/7071544593",
"addr:city": "Joensuu",
"addr:housenumber": "12",
"addr:postcode": "80100",
"addr:street": "Siltakatu",
"addr:unit": "C 33",
"alt_name": "Crasman Oy Joensuu",
"alt_name_1": "Crasman Oy",
"name": "Crasman Joensuu",
"short_name": "Crasman",
"website": "https://www.crasman.fi"
},
"geometry": {
"type": "Point",
"coordinates": [
29.7621398,
62.6015236
]
},
"id": "node/7071544593"
},
{
"type": "Feature",
"properties": {
"#id": "node/7117872562",
"amenity": "car_rental",
"operator": "avis"
},
"geometry": {
"type": "Point",
"coordinates": [
29.7630643,
62.6036656
]
},
"id": "node/7117872562"
}
]
What I am trying to do is iterate through this array of features and check whether each properties object contains website; if it does, I want to print the coordinates from its geometry object.
This is what I tried:
Features[*].properties[?contains(#,'website')=='true'].geometry.coordinates
It gives me a null value.
Try this:
features[?contains(keys(properties),'website')].geometry.coordinates
E.g.:
$ jp "features[?contains(keys(properties),'website')].geometry.coordinates" <input.json
[
[
29.7621398,
62.6015236
]
]
With regard to why your example didn't work:
Identifiers are case-sensitive, so you need features, not Features.
properties is an object, not an array, so you can't apply a filter expression to it.
Even if you could, it's not properties that you want to filter. You are trying to filter whole features.
contains tests if an array contains an item (or if a string contains a substring), not whether an object has a key. You can use keys() to get the keys of an object as an array, which contains can then search.
You don't need to compare the result of contains() to true, it's already a boolean.
Even if you were trying to compare to true, you'd need to use backticks: `true`, not quotes 'true'.
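The logic of the corrected query (filter whole features on whether properties has a given key, then project the coordinates) can also be sketched in plain Python. The function name coordinates_with_property is hypothetical, and the sample data is a trimmed copy of the two features from the question.

```python
def coordinates_with_property(geojson, key="website"):
    """Return the coordinates of every feature whose properties contain `key`."""
    return [
        feature["geometry"]["coordinates"]
        for feature in geojson["features"]
        # GeoJSON allows properties to be null, so fall back to an empty dict.
        if key in (feature.get("properties") or {})
    ]


geojson = {
    "features": [
        {
            "type": "Feature",
            "properties": {"name": "Crasman Joensuu", "website": "https://www.crasman.fi"},
            "geometry": {"type": "Point", "coordinates": [29.7621398, 62.6015236]},
        },
        {
            "type": "Feature",
            "properties": {"amenity": "car_rental", "operator": "avis"},
            "geometry": {"type": "Point", "coordinates": [29.7630643, 62.6036656]},
        },
    ]
}

coords = coordinates_with_property(geojson)
print(coords)
```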

Jq convert an object into an array

I have the following file "Pokemon.json", it's a stripped down list of Pokémon, listing their Pokédex ID, name and an array of Object Types.
[{
"name": "onix",
"id": 95,
"types": [{
"slot": 2,
"type": {
"name": "ground"
}
},
{
"slot": 1,
"type": {
"name": "rock"
}
}
]
}, {
"name": "drowzee",
"id": 96,
"types": [{
"slot": 1,
"type": {
"name": "psychic"
}
}]
}]
The output I'm trying to achieve is, extracting the name value of the type object and inserting it into an array.
I can easily get an array of all the types with
jq -r '.pokemon[].types[].type.name' pokemon.json
But I'm missing the key part: transforming the name fields into their own array
[ {
"name": "onix",
"id": 95,
"types": [ "rock", "ground" ]
}, {
"name": "drowzee",
"id": 96,
"types": [ "psychic" ]
} ]
Any help appreciated, thank you!
The jq manual describes map, which applies a filter to every element of an array and collects the results into a new array. Here it is needed twice: once over the outer array, to build one new object per Pokémon, and once over each types array, to replace every entry with its .type.name.
The sample shown is a top-level array, so there is no .pokemon key to iterate over; if your real file wraps the list in {"pokemon": [...]}, start the filter with .pokemon | instead. Sorting by slot first reproduces the order in your expected output ("rock" before "ground").
So the solution might look like this:
jq 'map({name, id, types: (.types | sort_by(.slot) | map(.type.name))})' pokemon.json
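As a cross-check of the reshaping, the same transformation can be written in Python. This is a minimal sketch; the function name flatten_types is hypothetical, and the sort by slot is an assumption made so that the result matches the order in the expected output.

```python
def flatten_types(pokemon):
    """Replace each entry's list of type objects with a list of type names."""
    return [
        {
            "name": p["name"],
            "id": p["id"],
            # Sort by slot so "rock" (slot 1) comes before "ground" (slot 2).
            "types": [t["type"]["name"] for t in sorted(p["types"], key=lambda t: t["slot"])],
        }
        for p in pokemon
    ]


pokemon = [
    {
        "name": "onix",
        "id": 95,
        "types": [
            {"slot": 2, "type": {"name": "ground"}},
            {"slot": 1, "type": {"name": "rock"}},
        ],
    },
    {"name": "drowzee", "id": 96, "types": [{"slot": 1, "type": {"name": "psychic"}}]},
]

flattened = flatten_types(pokemon)
print(flattened)
```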

Select only some parts of a Json map file

Is there a way to select only some countries of a json file?
For example, given this json that represents the NUTS 2 subdivisions of the various European countries, I would like to modify it and select only a few countries.
For example, if I wanted only Italy (and its regions) how could I do that?
I looked for sites on the Internet that do this but didn't find anything, and processing the file manually seems like madness.
Thanks
I tried to edit the file manually but there are some problems.
The file structure is:
{
"type": "Topology",
"objects":
{
"nuts2":
{
"type": "GeometryCollection",
"bbox": [-63.15345500000001, -21.387309500000015, 55.83662850000002, 71.18531800099998],
"geometries": [
{
"type": "Polygon",
"properties":
{
"nuts_id": "ITC1",
"name": "Piemonte",
"population": 4374052
},
"id": "ITC1",
"arcs": [
[2243, 2244, 2245, -1918, -1908, 2246, -122, -164]
]
},
...
...
{
"type": "MultiPolygon",
"properties":
{
"nuts_id": "ITI4",
"name": "Lazio",
"population": 5557276
},
"id": "ITI4",
"arcs": [
[
[-2356, -2359, -2258, -2262, -2268, 2361, -2347],
[2362]
],
[
[2363]
],
[
[2364]
],
[
[2365]
],
[
[2366]
]
]
},
]
}
},
"arcs": [
[
... PROBLEM HERE
]
],
"transform":
{
"scale": [0.011900198369836986, 0.009258188568956896],
"translate": [-63.15345500000001, -21.387309500000015]
}
}
I removed the "geometries" elements that do not interest me, leaving only the Italian nuts 2.
The problem is the content inside "arcs": the elements are numerous and hard to identify.
The Italian regions are not grouped under a single parent object in the file, so there is no one key you can select; each geometry has to be picked individually.
If all the Italian regions happen to sit next to each other in the geometries array, you can also iterate from the first one and stop after the last.
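Picking the geometries one by one does not have to be manual. Below is a rough Python sketch that keeps only the geometries matching a predicate and then rebuilds the top-level "arcs" array so that it contains only the arcs still referenced. It relies on the TopoJSON convention that a negative arc index i refers to arc ~i traversed in reverse. The function name filter_topology is hypothetical, the "IT" prefix test is an assumption based on the sample ids (ITC1, ITI4), and the sketch does not handle nested GeometryCollections, drops any other objects in the topology, and does not recompute bbox.

```python
def filter_topology(topology, object_name, keep):
    """Keep geometries for which keep(geometry) is true and drop unused arcs."""
    geometries = [g for g in topology["objects"][object_name]["geometries"] if keep(g)]

    remap = {}     # old arc index -> new arc index
    new_arcs = []  # arcs that are still referenced, in first-use order

    def renumber(node):
        # Arc references nest differently per geometry type, so recurse over lists.
        if isinstance(node, list):
            return [renumber(child) for child in node]
        # A negative index i refers to arc ~i traversed in reverse.
        reverse = node < 0
        old = ~node if reverse else node
        if old not in remap:
            remap[old] = len(new_arcs)
            new_arcs.append(topology["arcs"][old])
        new = remap[old]
        return ~new if reverse else new

    filtered = [dict(g, arcs=renumber(g["arcs"])) if "arcs" in g else dict(g) for g in geometries]
    result = dict(topology)
    result["objects"] = {object_name: dict(topology["objects"][object_name], geometries=filtered)}
    result["arcs"] = new_arcs
    return result


# Toy topology: the arc coordinates are made up for illustration.
topology = {
    "type": "Topology",
    "objects": {
        "nuts2": {
            "type": "GeometryCollection",
            "geometries": [
                {"type": "Polygon", "id": "ITC1", "arcs": [[0, -3]]},
                {"type": "Polygon", "id": "FR10", "arcs": [[1]]},
            ],
        }
    },
    "arcs": [[[0, 0], [1, 1]], [[5, 5], [1, 0]], [[9, 9], [0, 1]]],
}

italy = filter_topology(topology, "nuts2", lambda g: g["id"].startswith("IT"))
print(italy["objects"]["nuts2"]["geometries"])
print(italy["arcs"])
```

The "transform" object is copied unchanged, which is fine because the arc coordinates themselves are not modified, only selected and renumbered.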

How to add properties to topojson file?

Given a data.tsv file such as:
id code name
1 AL Alabama
2 AK Alaska
4 AZ Arizona
5 AR Arkansas
6 CA California
... ... ...
Given a topojson.json file such as (the structure is correct, the numeric values are random):
{
"type":"Topology",
"transform":
{
"scale": [0.0015484881821515486,0.0010301030103010299],
"translate":[-5.491666666666662,41.008333333333354]
},
"objects":
{
"states":
{
"type":"GeometryCollection",
"geometries":
[
{"type":"Polygon","arcs":[[0]],"properties":{"code_2":"AL"}},
{"type":"Polygon","arcs":[[1]],"properties":{"code_2":"AK"}}
]
}
},
"arcs":
[
[[2466,9916],[-25,-5],[3,-13]],
[[2357,9852],[1,-2],[1,-2]]
]
}
How can I use the common fields [1] to inject the values of another field [2] into the json file?
[1]: data.tsv#code and topojson.json.objects.states.geometries.properties.code_2
[2]: data.tsv#name
The end result should contain:
{"type":"Polygon","arcs":[[0]],"properties":{"code_2":"AL", "name":"Alabama" }},
{"type":"Polygon","arcs":[[1]],"properties":{"code_2":"AK", "name":"Alaska" }},
EDIT: Accepted answer:
topojson -o final.json -e data.tsv --id-property=code_2,code -p code_2,state=name -- topojson.json
Try using this:
topojson -o final.json -e data.tsv \
--id-property=code_2,code -p code_2,state=name \
-- topojson.json
Which should output:
{
"type": "Topology",
"transform": {
"scale": [
0.000016880209206372492,
0.000007005401010148724
],
"translate": [ -1.8418800213354616, 51.15278777877789 ]
},
"objects": {
"states": {
"type": "GeometryCollection",
"geometries": [
{
"type": "Polygon",
"arcs": [
[ 0 ]
],
"id": "AK",
"properties": {
"code_2": "AK",
"state": "Alaska"
}
}
]
}
},
"arcs": [
[[2466,9916],[-25,-5],[3,-13]],
[[2357,9852],[1,-2],[1,-2]]
]
}
From the Command Line Reference wiki:
--id-property name of feature property to promote to geometry id
By using the code_2 property with this option, you promote it as the feature ID.
Prepend a + in front of the input property name to coerce its value to a number.
Plus:
If the properties referenced by --id-property are null or undefined,
they are omitted from the output geometry object. Thus, the generated
objects may not have a defined ID if the input features did not have a
property with the specified name.
So, when you are using +code and +code_2, they are probably undefined, as you can't convert the AK string value to a number.
Here, the input property "FIPS" is coerced to a number and used as the
feature identifier; likewise, the column named "FIPS" is used as the
identifier in the CSV file. (If your CSV file uses a different column
name for the feature identifier, you can specify multiple id
properties, such as --id-property=+FIPS,+id.)
That's why you have to add the code to the --id-property=code_2,code option. This is how the mapping is made (the code_2 from topojson.json and the code column from data.tsv).
Then, the output property "unemployment" is generated from the
external data file, unemployment.tsv, which defines the input property
"rate"
In our case, -p code_2,state=name specifies that we will preserve the code_2 property and we will rename the name property to state. The Properties and External Properties sections in the aforementioned documentation wiki are pretty informative on the matter.
The topojson package has been deprecated. The following steps are based on the command-line cartography workflow. These interfaces are more flexible, but a little bit more complicated to use.
Install dependencies:
npm install d3-dsv ndjson-cli
Add the node_modules/.bin directory to the path so that you can easily run the commands (npm bin was removed in npm 9; on newer versions use PATH=$PWD/node_modules/.bin:$PATH instead):
PATH=$(npm bin):$PATH
Convert the tsv file into a newline-delimited json file:
tsv2json data.tsv -n > data.ndjson
{"id":"1","code":"AL","name":"Alabama"}
{"id":"2","code":"AK","name":"Alaska"}
Parse the id column as a number:
ndjson-map '{id: +d.id, code: d.code, name: d.name}' < data.ndjson > data_parsed.ndjson
{"id":1,"code":"AL","name":"Alabama"}
{"id":2,"code":"AK","name":"Alaska"}
Extract the geometries of the topojson file:
ndjson-cat topojson.json | ndjson-split 'd.objects.states.geometries' > topojson_geometries.ndjson
{"type":"Polygon","arcs":[[0]],"properties":{"code_2":"AK"}}
{"type":"Polygon","arcs":[[1]],"properties":{"code_2":"AL"}}
Join both newline-delimited json files:
ndjson-join 'd.properties.code_2' 'd.code' topojson_geometries.ndjson data_parsed.ndjson > geometries_data_join.ndjson
[{"type":"Polygon","arcs":[[0]],"properties":{"code_2":"AK"}},{"id":2,"code":"AK","name":"Alaska"}]
[{"type":"Polygon","arcs":[[1]],"properties":{"code_2":"AL"}},{"id":1,"code":"AL","name":"Alabama"}]
Add the name column to the topojson properties and only keep the topojson geometries:
ndjson-map 'd[0].properties.name = d[1].name, d[0]' < geometries_data_join.ndjson > geometries_data_merge.ndjson
{"type":"Polygon","arcs":[[0]],"properties":{"code_2":"AK","name":"Alaska"}}
{"type":"Polygon","arcs":[[1]],"properties":{"code_2":"AL","name":"Alabama"}}
Convert the previous result into an array and concat it with the original topojson file:
ndjson-join <(ndjson-cat topojson.json) <(ndjson-reduce < geometries_data_merge.ndjson) > topojson_concat.ndjson
[{
"type": "Topology",
"transform": {
"scale": [0.0015484881821515486, 0.0010301030103010299],
"translate": [-5.491666666666662, 41.008333333333354]
},
"objects": {
"states": {
"type": "GeometryCollection",
"geometries": [{
"type": "Polygon",
"arcs": [[0]],
"properties": {
"code_2": "AK"
}
}, {
"type": "Polygon",
"arcs": [[1]],
"properties": {
"code_2": "AL"
}
}
]
}
},
"arcs": [[[2466, 9916], [-25, -5], [3, -13]], [[2357, 9852], [1, -2], [1, -2]]]
}, [{
"type": "Polygon",
"arcs": [[0]],
"properties": {
"code_2": "AK",
"name": "Alaska"
}
}, {
"type": "Polygon",
"arcs": [[1]],
"properties": {
"code_2": "AL",
"name": "Alabama"
}
}
]
]
Overwrite the geometries of original topojson file and save it as a normal json file:
ndjson-map 'd[0].objects.states.geometries = d[1], d[0]' < topojson_concat.ndjson > topojson_data.json
{
"type": "Topology",
"transform": {
"scale": [0.0015484881821515486, 0.0010301030103010299],
"translate": [-5.491666666666662, 41.008333333333354]
},
"objects": {
"states": {
"type": "GeometryCollection",
"geometries": [{
"type": "Polygon",
"arcs": [[0]],
"properties": {
"code_2": "AK",
"name": "Alaska"
}
}, {
"type": "Polygon",
"arcs": [[1]],
"properties": {
"code_2": "AL",
"name": "Alabama"
}
}
]
}
},
"arcs": [[[2466, 9916], [-25, -5], [3, -13]], [[2357, 9852], [1, -2], [1, -2]]]
}
All commands in one line:
ndjson-join <(ndjson-cat topojson.json) <(ndjson-join 'd.properties.code_2' 'd.code' <(ndjson-cat topojson.json | ndjson-split 'd.objects.states.geometries') <(tsv2json data.tsv -n | ndjson-map '{id: +d.id, code: d.code, name: d.name}') | ndjson-map 'd[0].properties.name = d[1].name, d[0]' | ndjson-reduce) | ndjson-map 'd[0].objects.states.geometries = d[1], d[0]' > topojson_data.json
Notes:
I swapped "AK" and "AL" in the topojson file to check if the join really works.
The last command (before the one-liner) only works on the original single-line output of the previous step, not on the pretty-printed version shown above, which contains newlines.
I tested the workflow on the Windows Subsystem for Linux, since ndjson-map does not seem to work properly on Windows currently.
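If installing the Node tools is not an option, the join itself is small enough to sketch with Python's standard library. This is an illustration under assumptions: the function name add_names is hypothetical, and the TSV text is an inline stand-in for data.tsv. Unlike ndjson-join, which performs an inner join by default, this version keeps geometries that have no matching row and simply leaves them without a name.

```python
import csv
import io


def add_names(topology, tsv_text, object_name="states"):
    """Copy the TSV `name` column into each geometry's properties, matched on code."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Lookup table from the two-letter code to the state name.
    names = {row["code"]: row["name"] for row in rows}
    for geometry in topology["objects"][object_name]["geometries"]:
        properties = geometry.setdefault("properties", {})
        code = properties.get("code_2")
        if code in names:
            properties["name"] = names[code]
    return topology


tsv_text = "id\tcode\tname\n1\tAL\tAlabama\n2\tAK\tAlaska\n"
topology = {
    "type": "Topology",
    "objects": {
        "states": {
            "type": "GeometryCollection",
            "geometries": [
                {"type": "Polygon", "arcs": [[0]], "properties": {"code_2": "AL"}},
                {"type": "Polygon", "arcs": [[1]], "properties": {"code_2": "AK"}},
            ],
        }
    },
    "arcs": [[[2466, 9916], [-25, -5], [3, -13]], [[2357, 9852], [1, -2], [1, -2]]],
}

joined = add_names(topology, tsv_text)
print(joined["objects"]["states"]["geometries"])
```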