Given a data.tsv file such as:
id code name
1 AL Alabama
2 AK Alaska
4 AZ Arizona
5 AR Arkansas
6 CA California
... ... ...
Given a topojson.json file such as (the structure is correct, the numeric values are random):
{
"type":"Topology",
"transform":
{
"scale": [0.0015484881821515486,0.0010301030103010299],
"translate":[-5.491666666666662,41.008333333333354]
},
"objects":
{
"states":
{
"type":"GeometryCollection",
"geometries":
[
{"type":"Polygon","arcs":[[0]],"properties":{"code_2":"AL"}},
{"type":"Polygon","arcs":[[1]],"properties":{"code_2":"AK"}}
]
}
},
"arcs":
[
[[2466,9916],[-25,-5],[3,-13]],
[[2357,9852],[1,-2],[1,-2]]
]
}
How can I use the common fields [1] to inject the values of another field [2] into the JSON file?
[1]: data.tsv#code and topojson.json.objects.states.geometries.properties.code_2
[2]: data.tsv#name
The end result should contain:
{"type":"Polygon","arcs":[[0]],"properties":{"code_2":"AL", "name":"Alabama" }},
{"type":"Polygon","arcs":[[1]],"properties":{"code_2":"AK", "name":"Alaska" }},
EDIT: Accepted answer:
topojson -o final.json -e data.tsv --id-property=code_2,code -p code_2,state=name -- topojson.json
Try using this:
topojson -o final.json -e data.tsv \
--id-property=code_2,code -p code_2,state=name \
-- topojson.json
Which should output:
{
"type": "Topology",
"transform": {
"scale": [
0.000016880209206372492,
0.000007005401010148724
],
"translate": [ -1.8418800213354616, 51.15278777877789 ]
},
"objects": {
"states": {
"type": "GeometryCollection",
"geometries": [
{
"type": "Polygon",
"arcs": [
[ 0 ]
],
"id": "AK",
"properties": {
"code_2": "AK",
"state": "Alaska"
}
}
]
}
},
"arcs": [
[[2466,9916],[-25,-5],[3,-13]],
[[2357,9852],[1,-2],[1,-2]]
]
}
From the Command Line Reference wiki:
--id-property name of feature property to promote to geometry id
By using the code_2 property with this option, you promote it as the feature ID.
Prepend a + in front of the input property name to coerce its value to a number.
Plus:
If the properties referenced by --id-property are null or undefined,
they are omitted from the output geometry object. Thus, the generated
objects may not have a defined ID if the input features did not have a
property with the specified name.
So, when you are using +code and +code_2, they are probably undefined, as you can't convert the AK string value to a number.
Here, the input property "FIPS" is coerced to a number and used as the
feature identifier; likewise, the column named "FIPS" is used as the
identifier in the CSV file. (If your CSV file uses a different column
name for the feature identifier, you can specify multiple id
properties, such as --id-property=+FIPS,+id.)
That's why you have to add the code to the --id-property=code_2,code option. This is how the mapping is made (the code_2 from topojson.json and the code column from data.tsv).
Then, the output property "unemployment" is generated from the
external data file, unemployment.tsv, which defines the input property
"rate"
In our case, -p code_2,state=name specifies that we will preserve the code_2 property and we will rename the name property to state. The Properties and External Properties sections in the aforementioned documentation wiki are pretty informative on the matter.
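For readers without the topojson CLI, the same join can be sketched in plain Python. This is only an illustration under assumptions: the function name join_names is hypothetical, the TSV is assumed to be tab-delimited with the code and name columns from the question, and the value is written under name (as the question asks) rather than state.

```python
import csv
import io
import json


def join_names(topology, tsv_text, object_name="states"):
    """Copy the TSV `name` column into geometries whose code_2 equals `code`."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    name_by_code = {row["code"]: row["name"] for row in rows}
    for geometry in topology["objects"][object_name]["geometries"]:
        props = geometry.setdefault("properties", {})
        code = props.get("code_2")
        if code in name_by_code:
            # Geometries without a matching row are left unchanged.
            props["name"] = name_by_code[code]
    return topology


# Small in-memory stand-ins for data.tsv and topojson.json.
tsv_text = "id\tcode\tname\n1\tAL\tAlabama\n2\tAK\tAlaska\n"
topology = {
    "type": "Topology",
    "objects": {"states": {"type": "GeometryCollection", "geometries": [
        {"type": "Polygon", "arcs": [[0]], "properties": {"code_2": "AL"}},
        {"type": "Polygon", "arcs": [[1]], "properties": {"code_2": "AK"}},
    ]}},
    "arcs": [],
}
joined = join_names(topology, tsv_text)
print(json.dumps(joined["objects"]["states"]["geometries"]))
```

The lookup table is built once, so each geometry is matched with a single dictionary access regardless of the order of rows in either file.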
The topojson package has been deprecated. The following steps are based on the command-line cartography workflow. These interfaces are more flexible, but a little bit more complicated to use.
Install dependencies:
npm install d3-dsv ndjson-cli
Add the node_modules/.bin directory to the path so that you can easily run the commands:
PATH=$(npm bin):$PATH
Convert the tsv file into a newline-delimited json file:
tsv2json data.tsv -n > data.ndjson
{"id":"1","code":"AL","name":"Alabama"}
{"id":"2","code":"AK","name":"Alaska"}
Parse the id column as a number:
ndjson-map '{id: +d.id, code: d.code, name: d.name}' < data.ndjson > data_parsed.ndjson
{"id":1,"code":"AL","name":"Alabama"}
{"id":2,"code":"AK","name":"Alaska"}
Extract the geometries of the topojson file:
ndjson-cat topojson.json | ndjson-split 'd.objects.states.geometries' > topojson_geometries.ndjson
{"type":"Polygon","arcs":[[0]],"properties":{"code_2":"AK"}}
{"type":"Polygon","arcs":[[1]],"properties":{"code_2":"AL"}}
Join both newline-delimited json files:
ndjson-join 'd.properties.code_2' 'd.code' topojson_geometries.ndjson data_parsed.ndjson > geometries_data_join.ndjson
[{"type":"Polygon","arcs":[[0]],"properties":{"code_2":"AK"}},{"id":2,"code":"AK","name":"Alaska"}]
[{"type":"Polygon","arcs":[[1]],"properties":{"code_2":"AL"}},{"id":1,"code":"AL","name":"Alabama"}]
Add the name column to the topojson properties and only keep the topojson geometries:
ndjson-map 'd[0].properties.name = d[1].name, d[0]' < geometries_data_join.ndjson > geometries_data_merge.ndjson
{"type":"Polygon","arcs":[[0]],"properties":{"code_2":"AK","name":"Alaska"}}
{"type":"Polygon","arcs":[[1]],"properties":{"code_2":"AL","name":"Alabama"}}
Convert the previous result into an array and concat it with the original topojson file:
ndjson-join <(ndjson-cat topojson.json) <(ndjson-reduce < geometries_data_merge.ndjson) > topojson_concat.ndjson
[{
"type": "Topology",
"transform": {
"scale": [0.0015484881821515486, 0.0010301030103010299],
"translate": [-5.491666666666662, 41.008333333333354]
},
"objects": {
"states": {
"type": "GeometryCollection",
"geometries": [{
"type": "Polygon",
"arcs": [[0]],
"properties": {
"code_2": "AK"
}
}, {
"type": "Polygon",
"arcs": [[1]],
"properties": {
"code_2": "AL"
}
}
]
}
},
"arcs": [[[2466, 9916], [-25, -5], [3, -13]], [[2357, 9852], [1, -2], [1, -2]]]
}, [{
"type": "Polygon",
"arcs": [[0]],
"properties": {
"code_2": "AK",
"name": "Alaska"
}
}, {
"type": "Polygon",
"arcs": [[1]],
"properties": {
"code_2": "AL",
"name": "Alabama"
}
}
]
]
Overwrite the geometries of original topojson file and save it as a normal json file:
ndjson-map 'd[0].objects.states.geometries = d[1], d[0]' < topojson_concat.ndjson > topojson_data.json
{
"type": "Topology",
"transform": {
"scale": [0.0015484881821515486, 0.0010301030103010299],
"translate": [-5.491666666666662, 41.008333333333354]
},
"objects": {
"states": {
"type": "GeometryCollection",
"geometries": [{
"type": "Polygon",
"arcs": [[0]],
"properties": {
"code_2": "AK",
"name": "Alaska"
}
}, {
"type": "Polygon",
"arcs": [[1]],
"properties": {
"code_2": "AL",
"name": "Alabama"
}
}
]
}
},
"arcs": [[[2466, 9916], [-25, -5], [3, -13]], [[2357, 9852], [1, -2], [1, -2]]]
}
All commands in one line:
ndjson-join <(ndjson-cat topojson.json) <(ndjson-join 'd.properties.code_2' 'd.code' <(ndjson-cat topojson.json | ndjson-split 'd.objects.states.geometries') <(tsv2json data.tsv -n | ndjson-map '{id: +d.id, code: d.code, name: d.name}') | ndjson-map 'd[0].properties.name = d[1].name, d[0]' | ndjson-reduce) | ndjson-map 'd[0].objects.states.geometries = d[1], d[0]' > topojson_data.json
Notes:
I swapped "AK" and "AL" in the topojson file to check if the join really works.
The last command (before the one-liner) only works on the original output and not on the given pretty-printed version, which has newlines in it.
I tested the workflow on the Windows Subsystem for Linux, since ndjson-map does not currently seem to work properly on Windows.
Related
I want to describe a network graph of vertices and edges with JSON Schema.
An example JSON could look like this:
{
"V": [
"1",
"2",
"3"
],
"E": [
{
"v1": "1",
"v2": "2"
},
{
"v1": "2",
"v2": "3"
}
]
}
I have a set of 3 vertices and 2 edges to connect them. I want all vertices to have an arbitrary string identifier, so it could also be "node1" or "panda". However, is there a way to validate that the endpoints of my edges only point to existing vertices?
I.e.: Should NOT pass:
{
"V": [
"n1",
"n2",
"n3"
],
"E": [
{
"v1": "n1",
"v2": "IdThatDoesNotExistAbove"
}
]
}
I looked at ENUMs, however, I struggle to have them point at data from a JSON that I want to validate rather than to the specification itself.
With jq this task can be solved.
jq -r '([.E[] | to_entries[].value] | unique) - .V |
if length == 0
then "all vertices defined"
else "undefined vertices: \(.)\n" | halt_error(1)
end
' "$FILE"
echo "exit code: $?"
Output valid file
all vertices defined
exit code: 0
Output invalid file
undefined vertices: ["IdThatDoesNotExistAbove"]
exit code: 1
If you are not interested in which vertices are undefined, you can use a shorter version:
jq -e '([.E[] | to_entries[].value]) - .V | length == 0' "$FILE"
echo "exit code: $?"
Output valid file
true
exit code: 0
Output invalid file
false
exit code: 1
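The same check can also be sketched outside jq. The following is a minimal Python illustration, assuming the graph has already been parsed into a dict with the V and E keys from the question; the function name undefined_vertices is hypothetical.

```python
def undefined_vertices(graph):
    """Return the edge endpoints that are not listed in graph["V"], sorted."""
    known = set(graph["V"])
    # Every value of every edge object counts as an endpoint (v1, v2, ...).
    used = {endpoint for edge in graph["E"] for endpoint in edge.values()}
    return sorted(used - known)


valid = {"V": ["1", "2", "3"],
         "E": [{"v1": "1", "v2": "2"}, {"v1": "2", "v2": "3"}]}
invalid = {"V": ["n1", "n2", "n3"],
           "E": [{"v1": "n1", "v2": "IdThatDoesNotExistAbove"}]}

print(undefined_vertices(valid))    # []
print(undefined_vertices(invalid))  # ['IdThatDoesNotExistAbove']
```

An empty result means every endpoint refers to a declared vertex, which corresponds to the exit code 0 case above.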
JSON Schema doesn't define a way to reference data like this, but it does have extension vocabularies, which allow the definition of custom keywords. I have created a data vocabulary that does precisely what you're looking to do.
{
"$schema": "https://json-everything.net/meta/data-2022",
"type": "object",
"$defs": {
"user-defined-vertex": {
"data": {
"enum": "/V"
}
}
},
"properties": {
"V": {
"type": "array",
"items": {"type": "string"}
},
"E": {
"type": "array",
"items": {
"type": "object",
"properties":{
"v1": { "$ref": "#/$defs/user-defined-vertex" },
"v2": { "$ref": "#/$defs/user-defined-vertex" }
},
"required": ["v1", "v2"],
"additionalProperties": false
}
}
},
"additionalProperties": false
}
The key part of this is the data keyword in #/$defs.
data takes an object with schema keywords as keys and JSON Pointers or URIs as values. If you want to extract values from the instance data, you'll use JSON Pointers. For anything else, you'll use a URI.
So for this case, I have
{
"data": {
"enum": "/V"
}
}
which says to take the value from /V in the instance data and use that as the value for the enum keyword.
In #/properties/V you define that /V must be an array with string values.
However, to my knowledge, this vocabulary is only implemented for my library, JsonSchema.Net and you'll need the extension package JsonSchema.Net.Data.
I am trying to make a map of the U.S. with Mapbox that shows median home price by county. I have a .json file that contains all the counties and is already accepted by Mapbox tileset -
{
"type": "Topology",
"transform": {
"scale": [
0.035896170617061705,
0.005347309530953095
],
"translate": [
-179.14734,
17.884813
]
},
"objects": {
"us_counties_20m": {
"type": "GeometryCollection",
"geometries": [
{
"type": "Polygon",
"arcs": [],
"id": "0500000US01001"
},
{
"type": "Polygon",
"arcs": [],
"id": "0500000US01009"
},
{
"type": "Polygon",
"arcs": [],
"id": "0500000US01017"
},
{
"type": "Polygon",
"arcs": [],
"id": "0500000US01021"
}
]
}
}
}
Basically, it's a json file with "type" (Polygon), "arcs" (to map the county), and "id", which is an ID for the county.
This is great and accepted by Mapbox Tilesets to give me a visualization by county, but I need to add in median home price by county (in order to get colors by county, based on price).
I have a second json file that is more like an array, which has
[
{
"0500000US01001": 51289.0,
"0500000US01009": 46793.0,
"0500000US01017": 39857.0,
"0500000US01021": 48859.0
}
]
and so on, but basically it maps each ID to the median home price for that county. The IDs are the same in both files, and there are the same number of them. So I need to get a third json file out of these, which has "type", "arcs", "id", and "PRICE" (the addition).
These files are huge - any suggestions? I tried using jq but received an error that
jq: error ... object ({"type":"To...) and array ([{"0500000U...) cannot be multiplied
Thanks in advance!
A straightforward approach would be saving the second file into a variable and using it as a reference while updating the first file. E.g:
jq 'add as $prices | input
| .objects.us_counties_20m.geometries[] |= . + {PRICE: $prices[.id]}' file2 file1
add can be substituted with .[0] if the array in file2 contains only one object.
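For comparison, here is a rough Python sketch of the same lookup-and-update idea. It assumes both files have already been parsed (for example with json.load), and the function name add_prices is hypothetical. Like the jq filter, it stores a null (None) PRICE when an id has no entry in the price table.

```python
def add_prices(topology, price_tables, object_name="us_counties_20m"):
    """Attach PRICE to every geometry, looked up by the geometry's id."""
    prices = {}
    for table in price_tables:
        # Merge all objects of the array into one mapping (like jq's `add`).
        prices.update(table)
    for geometry in topology["objects"][object_name]["geometries"]:
        geometry["PRICE"] = prices.get(geometry["id"])
    return topology


topology = {"type": "Topology", "objects": {"us_counties_20m": {
    "type": "GeometryCollection",
    "geometries": [
        {"type": "Polygon", "arcs": [], "id": "0500000US01001"},
        {"type": "Polygon", "arcs": [], "id": "0500000US01009"},
    ]}}}
price_tables = [{"0500000US01001": 51289.0, "0500000US01009": 46793.0}]

result = add_prices(topology, price_tables)
print(result["objects"]["us_counties_20m"]["geometries"][0])
```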
I have multiple JSON files one.json, two.json, three.json with the below format and I want to create a consolidated array from them using jq. So, from all the files I want to extract Name and Value field inside the Parameters and use them to create an array where the id value will be constructed from the Name value and value field will be constructed using Value field value.
input:
one.json:
{
"Parameters": [
{
"Name": "id1",
"Value": "one",
"Version": 2,
"LastModifiedDate": 1581663187.36
}
]
}
two.json
{
"Parameters": [
{
"Name": "id2",
"Value": "xyz",
"Version": 2,
"LastModifiedDate": 1581663187.36
}
]
}
three.json
{
"Parameters": [
{
"Name": "id3",
"Value": "xyz",
"Version": 2,
"LastModifiedDate": 1581663187.36
}
]
}
output:
[
{
"id": "id1",
"value": "one"
},
{
"id": "id2",
"value": "xyz"
},
{
"id": "id3",
"value": "xyz"
}
]
How can I achieve this using jq?
You can use a reduce expression instead of slurping all the files into memory (-s): it iterates over the contents of the input files and appends the required fields one object at a time.
jq -n 'reduce inputs.Parameters[] as $d (.; . + [ { id: $d.Name, value: $d.Value } ])' one.json two.json three.json
The -n flag tells jq not to read the input automatically, so the output is built from scratch from the file contents made available by the inputs function. Since reduce works iteratively, each Parameters entry in the input appends one object with the desired key/value pair to the final array.
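As a point of comparison, the accumulation performed by reduce could be sketched in Python as follows. The function name collect_parameters is hypothetical, and the documents are assumed to be already parsed; in practice they might come from calling json.load on one.json, two.json and three.json in turn.

```python
def collect_parameters(documents):
    """Build [{"id": Name, "value": Value}, ...] across all documents."""
    collected = []
    for document in documents:
        for parameter in document["Parameters"]:
            # Keep only the two fields of interest, renamed as requested.
            collected.append({"id": parameter["Name"],
                              "value": parameter["Value"]})
    return collected


documents = [
    {"Parameters": [{"Name": "id1", "Value": "one", "Version": 2}]},
    {"Parameters": [{"Name": "id2", "Value": "xyz", "Version": 2}]},
    {"Parameters": [{"Name": "id3", "Value": "xyz", "Version": 2}]},
]
print(collect_parameters(documents))
```

Because documents can be any iterable, a generator that loads one file at a time keeps only a single file in memory, which is the same motivation as avoiding -s.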
I have the following json
[
{
"certname": "server1",
"environment": "production",
"name": "memorysize",
"value": "62.76 GiB"
},
{
"certname": "server1",
"environment": "production",
"name": "processorcount",
"value": 12
},
{
"certname": "server2",
"environment": "production",
"name": "memorysize",
"value": "62.76 GiB"
},
{
"certname": "server2",
"environment": "production",
"name": "processorcount",
"value": 10
}
]
I want to convert it to the format below, grouped by certname. The challenge is that I need to use the value of each name field as a key, as follows:
[
{
"certname": "server1",
"memorysize": "62.76 GiB",
"processorcount": 12
},
{
"certname": "server2",
"memorysize": "62.76 GiB",
"processorcount": 10
}
]
How do I do this using jq? I have tried to_entries but it doesn't help either.
Thanks
The following is a commented jq script. Feel free to use it as is, or strip out the newlines and comments and use it as a one-liner.
# First, we construct an object that maps each `$certname` to `{certname: $certname}`. We name it $init.
(map({key:.certname, value: {certname}}) | unique | from_entries) as $init |
# Next, we take each object of the input in turn (name it $attr) and assign its
# `name:value` into one of the objects.
# $init is the dictionary above
# Reduce will pass the current dictionary as . for each invocation, and the assignment
# returns the input object.
reduce .[] as $attr ($init; .[$attr.certname][$attr.name] = $attr.value) |
# Our initial dictionary has now been expanded with attributes.
# Map it back to an array of objects. .[] is a stream of objects,
# we capture that in an outer array.
[.[]]
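The same grouping idea (a dictionary keyed by certname that is filled in one attribute at a time) can be sketched in Python. This is only an illustration; the function name group_facts is hypothetical, and the input is assumed to be the already parsed array from the question.

```python
def group_facts(facts):
    """Merge name/value records into one object per certname."""
    grouped = {}
    for fact in facts:
        # Create {"certname": ...} the first time a certname is seen.
        entry = grouped.setdefault(fact["certname"],
                                   {"certname": fact["certname"]})
        # Use the record's name as the key and its value as the value.
        entry[fact["name"]] = fact["value"]
    # Python dicts preserve insertion order, so groups appear in input order.
    return list(grouped.values())


facts = [
    {"certname": "server1", "environment": "production",
     "name": "memorysize", "value": "62.76 GiB"},
    {"certname": "server1", "environment": "production",
     "name": "processorcount", "value": 12},
    {"certname": "server2", "environment": "production",
     "name": "memorysize", "value": "62.76 GiB"},
    {"certname": "server2", "environment": "production",
     "name": "processorcount", "value": 10},
]
print(group_facts(facts))
```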
I am using jq 1.4 on a 64-bit Windows machine.
Below are the contents of input file IP.txt
{
"results": [
{
"name": "Google",
"employees": [
{
"name": "Michael",
"division": "Engineering"
},
{
"name": "Laura",
"division": "HR"
},
{
"name": "Elise",
"division": "Marketing"
}
]
},
{
"name": "Microsoft",
"employees": [
{
"name": "Brett",
"division": "Engineering"
},
{
"name": "David",
"division": "HR"
}
]
}
]
}
{
"results": [
{
"name": "Amazon",
"employees": [
{
"name": "Watson",
"division": "Marketing"
}
]
}
]
}
The file contains two "results" objects. The first contains information for two companies, Google and Microsoft; the second contains information for Amazon.
I want to convert this JSON into csv file with company name and employee name.
"Google","Michael"
"Google","Laura"
"Google","Elise"
"Microsoft","Brett"
"Microsoft","David"
"Amazon","Watson"
I was able to write the script below:
jq -r "[.results[0].name,.results[0].employees[0].name]|@csv" IP.txt
"Google","Michael"
"Amazon","Watson"
Can someone guide me in writing the script without hardcoding the index values?
The script should be able to generate output for any number of results, each containing information for any number of companies.
I tried the script below, which didn't generate the expected output:
jq -r "[.results[].name,.results[].employees[].name]|@csv" IP.txt
"Google","Microsoft","Michael","Laura","Elise","Brett","David"
"Amazon","Watson"
You need to flatten down the results first to rows of company and employee names. Then with that, you can convert to csv rows.
map(.results | map({ cn: .name, en: .employees[].name } | [ .cn, .en ])) | add[] | @csv
Since you have a stream of inputs, you'll have to slurp (-s) it in. Since you want to output csv, you'll want to use raw output (-r).
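The flattening step can also be sketched in Python, which may help if jq is awkward to run on Windows. This is an illustration under assumptions: the helper names are hypothetical, and the file is assumed to contain several JSON documents back to back, as in IP.txt, so they are decoded one after another with json.JSONDecoder.raw_decode.

```python
import csv
import io
import json


def iter_json_documents(text):
    """Yield each top-level JSON value from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    position = 0
    while position < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while position < len(text) and text[position].isspace():
            position += 1
        if position >= len(text):
            break
        document, position = decoder.raw_decode(text, position)
        yield document


def company_employee_rows(documents):
    """Yield one [company, employee] row per employee, in input order."""
    for document in documents:
        for company in document["results"]:
            for employee in company["employees"]:
                yield [company["name"], employee["name"]]


def to_csv(rows):
    """Render rows as CSV text with every field quoted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


text = """
{"results": [{"name": "Google", "employees": [{"name": "Michael"}, {"name": "Laura"}]}]}
{"results": [{"name": "Amazon", "employees": [{"name": "Watson"}]}]}
"""
print(to_csv(company_employee_rows(iter_json_documents(text))), end="")
```

The nested loops play the role of .results[] and .employees[] in the jq filter: each company name is paired with each of its own employees, rather than all names being collected into one row.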