PowerShell - Search JSON tree

I have a json structure in a file my.json that looks similar to below:
{
"key1":{
"key1A":{
"someKey":"someValue"
},
"key1B":{
"someKey":"someValue"
}
},
"key2":{
"key2A":{
"someKey":"someValue"
},
"key2B":{
"someKey":"someValue"
}
},
"key3":{
"key3A":{
"someKey":"someValue"
},
"key3B":{
"someKey":"someValue"
}
}
}
Using PowerShell, I would like to search for the "node" key1B and select it. The assumptions are:
key1B occurs only once in the whole tree.
The search term (key1B) can be either a root element or a child of a root element.
If I knew the parent under which node key1B sits, I could use the following:
Get-Content -Raw my.json | ConvertFrom-Json | Select-Object -ExpandProperty key1 | Select-Object -ExpandProperty key1B
How can I make the above selection generic, so that the node I'm searching for can be either in the root or a child of the root?

# Convert to a Hashtable (-AsHashtable requires PowerShell 6 or later)
$json = Get-Content -Raw .\my.json | ConvertFrom-Json -AsHashtable
# a root element
$json.key1B
# a child of a root element
$json.Values.key1B
# any descendant
function findByKey {
    param($json, $key)
    if ($json.$key) { $json.$key }
    else {
        $json.Values | ? values | % { findByKey $_ $key } | select -First 1
    }
}
findByKey $json "key1B"
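The recursive search above can also be sketched outside PowerShell. The following is a minimal Python illustration of the same idea, not part of the original answer; the function name find_by_key and the sample data are assumptions made for the example. It tests key membership rather than truthiness, so a key whose value is 0 or an empty string is still found.

```python
import json


def find_by_key(node, key):
    """Return the first value stored under `key` anywhere in a nested dict, or None."""
    if not isinstance(node, dict):
        return None
    if key in node:  # membership test, so falsy values such as 0 still match
        return node[key]
    for child in node.values():
        found = find_by_key(child, key)
        if found is not None:
            return found
    return None


# Hypothetical sample data shaped like the question's my.json
doc = json.loads(
    '{"key1": {"key1A": {"someKey": "a"}, "key1B": {"someKey": "b"}},'
    ' "key2": {"key2A": {"someKey": "c"}}}'
)
print(find_by_key(doc, "key1B"))
```

Because the question guarantees that the key occurs only once, returning the first match is enough here.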

jq has a recursive descent operator, .., that can search every level of the tree for a key, much like // in XPath.
get-content file.json | jq '.. | .key1B? | select(. != null)'
{
"someKey1b": "someValue1b"
}

You can find key1A, or whatever key you are looking for, by looking at the Definition column of Get-Member.
Let's define your JSON as the variable $testJson:
$testJson = @"
{
    "key1":{
        "key1A":{
            "someKey":"someValue"
        },
        "key1B":{
            "someKey":"someValue"
        }
    },
    "key2":{
        "key2A":{
            "someKey":"someValue"
        },
        "key2B":{
            "someKey":"someValue"
        }
    },
    "key3":{
        "key3A":{
            "someKey":"someValue"
        },
        "key3B":{
            "someKey":"someValue"
        }
    }
}
"@
$testJson = $testJson | ConvertFrom-Json
We are looking for key1A in $testJson, and we do not know that it sits under the parent node key1. We can find out by looking at the output of $testJson | gm:
$testJson | gm
   TypeName: System.Management.Automation.PSCustomObject

Name        MemberType   Definition
----        ----------   ----------
Equals      Method       bool Equals(System.Object obj)
GetHashCode Method       int GetHashCode()
GetType     Method       type GetType()
ToString    Method       string ToString()
key1        NoteProperty System.Management.Automation.PSCustomObject key1=@{key1A=; key1B=}
key2        NoteProperty System.Management.Automation.PSCustomObject key2=@{key2A=; key2B=}
key3        NoteProperty System.Management.Automation.PSCustomObject key3=@{key3A=; key3B=}
We can see here that the root nodes and their sub-nodes are listed in the Definition column. With bigger JSON documents the whole Definition column would not fit on screen, so we could print just that column in either of these two ways:
$testJson | gm | select-object "Definition"
($testJson | gm).Definition
So if we want to find key1A, we can run:
$testJson | gm | ? {$_.Definition -imatch "key1A"}
This returns the member whose definition mentions key1A (-imatch is case-insensitive; -cmatch would be case-sensitive), which gives us the output:
   TypeName: System.Management.Automation.PSCustomObject

Name MemberType   Definition
---- ----------   ----------
key1 NoteProperty System.Management.Automation.PSCustomObject key1=@{key1A=; key1B=}
As you can see, the parent node is key1, and we can grab its name as well with:
($testJson | gm | ? {$_.Definition -imatch "key1A"}).name
key1
And to view the content of key1 we can do
$($testJson).$(($testJson | gm | ? {$_.Definition -imatch "key1A"}).name)
key1A                key1B
-----                -----
@{someKey=someValue} @{someKey=someValue}
And key1a
$($testJson).$(($testJson | gm | ? {$_.Definition -imatch "key1A"}).name).key1a
someKey
-------
someValue

Related

Json extract Array property with index using JQ

I have a JSON document that is basically an array, but in a weird format that I cannot change.
Is there any way to get the url with jq by searching for the name, like this?
{
"servers": {
"servers[0].name": "abc",
"servers[0].url": "www.abc.test.com",
"servers[1].name": "xyz",
"servers[1].url": "www.xyz.test.com"
}
}
jq -r '.servers | select(.name=="abc") | .url'
Assuming the "=" can be naively changed to ":":
sed 's/ = /: /' | jq '
.servers
| keys_unsorted[] as $k
| select(.[$k] == "abc")
| ($k | sub("[.]name"; ".url")) as $k
| .[$k]
'
If you are looking for a general way to build a JSON array or object from such source, here's one way using reduce and setpath with regexes for splitting up the keys:
def build:
reduce (to_entries[] | .key |= [
splits("(?=\\[\\d+\\])|\\.")
| capture("\\[(?<index>\\d+)\\]|(?<field>.+)")
| (.index | tonumber)? // .field
]) as {$key, $value} (null; setpath($key; $value));
.servers | build.servers[] | select(.name == "abc").url
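To make the key-splitting idea behind build easier to follow, here is a rough Python sketch of the same approach. It is an illustration only, not a translation of the jq program: the helper names parse_path, set_path and build are invented for this example, and it assumes every flat key starts with a field name rather than an index.

```python
import re

# One token is either "[<digits>]" (a list index) or a run of name characters.
TOKEN = re.compile(r"\[(\d+)\]|([^.\[\]]+)")


def parse_path(flat_key):
    """'servers[0].name' -> ['servers', 0, 'name']"""
    return [int(idx) if idx else name for idx, name in TOKEN.findall(flat_key)]


def set_path(root, path, value):
    """Store value at path, creating intermediate dicts and lists as needed."""
    node = root
    for step, nxt in zip(path, path[1:]):
        empty = [] if isinstance(nxt, int) else {}
        if isinstance(step, int):
            while len(node) <= step:  # grow the list up to the index
                node.append(None)
            if node[step] is None:
                node[step] = empty
            node = node[step]
        else:
            node = node.setdefault(step, empty)
    last = path[-1]
    if isinstance(last, int):
        while len(node) <= last:
            node.append(None)
    node[last] = value


def build(flat):
    """Rebuild a nested structure from a dict of flat 'a[0].b' style keys."""
    root = {}
    for key, value in flat.items():
        set_path(root, parse_path(key), value)
    return root


flat = {
    "servers[0].name": "abc",
    "servers[0].url": "www.abc.test.com",
    "servers[1].name": "xyz",
    "servers[1].url": "www.xyz.test.com",
}
servers = build(flat)["servers"]
urls = [s["url"] for s in servers if s["name"] == "abc"]
print(urls)
```

Once the nested list exists, the lookup by name becomes an ordinary filter, which is the same benefit the jq version gets from select(.name == "abc").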

Convert the JSON Array values using PowerShell

I have a JSON file with the below given sample structure. How can I convert this into CSV and get the content of CSV as the below given expected output?
{
"code":"A123",
"name":"ABC",
"parties":[
{
"businessTeam":{
"code":"B123",
"name":"Plaza"
},
"TotalNumber":"1000"
},
{
"businessTeam":{
"code":"B125",
"name":"Marina"
},
"TotalNumber":"2000"
},
{
"businessTeam":{
"code":"B130",
"name":"Marriot"
},
"TotalNumber":"2500"
}
]
}
Expected Output:
Code, Name,BusinessPartyCode,BusinessPartyName,Capacity
A123,ABC,B123,Plaza,1000
A123,ABC,B125,Marina,2000
A123,ABC,B130,Marriot,2500
I have tried the script below, but it only extracts the array values as single concatenated, delimiter-separated values.
$deploy = Get-Content 'C:\psscripts\sample.json' -Raw | ConvertFrom-Json
$items = @()
foreach ($server in $deploy) {
    foreach ($item in $server) {
        $items += New-Object -TypeName PSObject -Property (
            [ordered]@{
                code = @($item.Code) -replace '"','#' -join '~'
                businessparty = @($item.parties.businessteam.code) -join '-'
                businesspartyName = @($item.parties.businessteam.name) -join '-'
                Capacity = @($item.parties.businessteamtotalnumber) -join '-'
            }
        )
    }
}
$items
-> output A123,ABC,B123-B125-B130,Plaza-Marina-Marriot,1000-2000-2500
Regards,
Sandeep
You're missing an inner loop to expand the values of businessTeam:
Get-Content 'C:\psscripts\sample.json' -Raw | ConvertFrom-Json | ForEach-Object {
    foreach($item in $_.parties) {
        foreach($team in $item.businessTeam) {
            [pscustomobject]@{
                Code = $_.code
                Name = $_.name
                BusinessPartyCode = $team.code
                BusinessPartyName = $team.name
                Capacity = $item.TotalNumber
            }
        }
    }
} | Format-Table
Using the Json in question, the array of objects generated using this code would be:
Code Name BusinessPartyCode BusinessPartyName Capacity
---- ---- ----------------- ----------------- --------
A123 ABC  B123              Plaza             1000
A123 ABC  B125              Marina            2000
A123 ABC  B130              Marriot           2500
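For readers who want to see the row expansion in isolation, the following Python sketch mirrors the inner loop: one output row per entry in parties, with the top-level code and name repeated on each row. It is an illustration added here, not the answerer's code; the function name party_rows is made up, and the column names are taken from the expected output in the question.

```python
import csv
import io

# Sample data copied from the question
doc = {
    "code": "A123",
    "name": "ABC",
    "parties": [
        {"businessTeam": {"code": "B123", "name": "Plaza"}, "TotalNumber": "1000"},
        {"businessTeam": {"code": "B125", "name": "Marina"}, "TotalNumber": "2000"},
        {"businessTeam": {"code": "B130", "name": "Marriot"}, "TotalNumber": "2500"},
    ],
}


def party_rows(record):
    """Yield one flat row per party, repeating the parent's code and name."""
    for party in record["parties"]:
        team = party["businessTeam"]
        yield [record["code"], record["name"], team["code"], team["name"], party["TotalNumber"]]


buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["Code", "Name", "BusinessPartyCode", "BusinessPartyName", "Capacity"])
writer.writerows(party_rows(doc))
csv_text = buf.getvalue()
print(csv_text)
```

In the PowerShell answer itself, replacing the final Format-Table with Export-Csv -NoTypeInformation and a target path should write the same rows to a CSV file instead of printing a table.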

jq - converting json to csv - how to treat "null" as string?

I have the following json file which I would like to convert to csv:
{
"id": 1,
"date": "2014-05-05T19:07:48.577"
}
{
"id": 2,
"date": null
}
Converting it to csv with the following jq produces:
$ jq -sr '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv' < test.json
"date","id"
"2014-05-05T19:07:48.577",1
,2
Unfortunately, for the line with "id" equal to "2", the date column was not set to "null" - instead, it was empty. This in turn makes MySQL error on import if it's a datetime column (it expects a literal "null" if we don't have a date, and errors on "").
How can I make jq print the literal "null", and not ""?
I'd go with:
(map(keys_unsorted) | add | unique) as $cols
| $cols,
  (.[] | [.[$cols[]]] | map(. // "null") )
| @csv
First, using keys_unsorted avoids useless sorting.
Second, [.[$cols[]]] is an important, recurrent and idiomatic pattern, used to ensure an array is constructed in the correct order without resorting to the reduce sledge-hammer.
Third, although map(. // "null") seems to be appropriate here, it should be noted that this expression will also replace false with "null", so, it would not be appropriate in general. Instead, to preserve false, one could write map(if . == null then "null" else . end).
Fourth, it should be noted that using map(. // "null") as above will also mask missing values of any of the keys, so if one wants some other behavior (e.g., raising an error if id is missing), then an alternative approach would be warranted.
The above assumes the stream of JSON objects shown in the question is "slurped", e.g. using jq's -s command-line option.
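The distinction drawn in the third and fourth points (replace only null, keep false, and be aware that missing keys are masked too) can be illustrated with a small Python sketch. This is an added example rather than part of the answer; the helper name cell is hypothetical, and the records are the two objects from the question.

```python
import csv
import io

# The two objects from the question, already parsed
records = [
    {"id": 1, "date": "2014-05-05T19:07:48.577"},
    {"id": 2, "date": None},
]


def cell(value):
    """Turn None into the literal string 'null'; leave everything else, including False, alone."""
    return "null" if value is None else value


# Union of all keys, sorted so the column order is stable
cols = sorted({key for record in records for key in record})

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(cols)
for record in records:
    # record.get() returns None for a missing key, so missing values are masked as 'null' too
    writer.writerow([cell(record.get(col)) for col in cols])
csv_text = buf.getvalue()
print(csv_text)
```

As in the jq version, raising an error for a missing key instead would need an explicit check rather than a default.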
Use // as alternative operator for your cell value:
jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.] // "null")) as $rows | $cols, $rows[] | @csv' < test.json
(The whole expression is explained quite well here: https://stackoverflow.com/a/32965227/16174836)
You can "stringify" the value using tostring by changing map($row[.]) into map($row[.]|tostring):
$ cat so2332.json
{
"id": 1,
"date": "2014-05-05T19:07:48.577"
}
{
"id": 2,
"date": null
}
$ jq --slurp --raw-output '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.]|tostring)) as $rows | $cols, $rows[] | @csv' so2332.json
"date","id"
"2014-05-05T19:07:48.577","1"
"null","2"
Note that the use of tostring will cause the numbers to be converted to strings.

Convert nested JSON array into separate columns in CSV file

I have a JSON file that looks like this:
{
"id": 10011,
"title": "Test procedure",
"slug": "slug",
"url": "http://test.test",
"email": "test@test.com",
"link": "http://test.er",
"subject": "testing",
"level": 1,
"disciplines": [
"discipline_a",
"discipline_b",
"discipline_c"
],
"areas": [
"area_a",
"area_b"
]
},
I was trying to use the following command to convert that into the CSV file:
(Get-Content "PATH_TO\test.json" -Raw | ConvertFrom-Json)| Convertto-CSV -NoTypeInformation | Set-Content "PATH_TO\test.csv"
However, for disciplines and areas I am getting System.Object[] in the resulting CSV file.
Is there a way to put all those nested values as a separate columns in CSV file like area_1, area_2 etc. And the same for disciplines.
2017-11-20, Completely rewrote function to improve performance and add features as -ArrayBase and support for PSStandardMembers and grouped objects.
Flatten-Object
Recursively flattens objects containing arrays, hash tables and (custom) objects. All added properties of the supplied objects will be aligned with the rest of the objects.
Requires PowerShell version 2 or higher.
Cmdlet
Function Flatten-Object { # Version 00.02.12, by iRon
    [CmdletBinding()]Param (
        [Parameter(ValueFromPipeLine = $True)][Object[]]$Objects,
        [String]$Separator = ".", [ValidateSet("", 0, 1)]$Base = 1, [Int]$Depth = 5, [Int]$Uncut = 1,
        [String[]]$ToString = ([String], [DateTime], [TimeSpan]), [String[]]$Path = @()
    )
    $PipeLine = $Input | ForEach {$_}; If ($PipeLine) {$Objects = $PipeLine}
    If (@(Get-PSCallStack)[1].Command -eq $MyInvocation.MyCommand.Name -or @(Get-PSCallStack)[1].Command -eq "<position>") {
        $Object = @($Objects)[0]; $Iterate = New-Object System.Collections.Specialized.OrderedDictionary
        If ($ToString | Where {$Object -is $_}) {$Object = $Object.ToString()}
        ElseIf ($Depth) {$Depth--
            If ($Object.GetEnumerator.OverloadDefinitions -match "[\W]IDictionaryEnumerator[\W]") {
                $Iterate = $Object
            } ElseIf ($Object.GetEnumerator.OverloadDefinitions -match "[\W]IEnumerator[\W]") {
                $Object.GetEnumerator() | ForEach -Begin {$i = $Base} {$Iterate.($i) = $_; $i += 1}
            } Else {
                $Names = If ($Uncut) {$Uncut--} Else {$Object.PSStandardMembers.DefaultDisplayPropertySet.ReferencedPropertyNames}
                If (!$Names) {$Names = $Object.PSObject.Properties | Where {$_.IsGettable} | Select -Expand Name}
                If ($Names) {$Names | ForEach {$Iterate.$_ = $Object.$_}}
            }
        }
        If (@($Iterate.Keys).Count) {
            $Iterate.Keys | ForEach {
                Flatten-Object @(,$Iterate.$_) $Separator $Base $Depth $Uncut $ToString ($Path + $_)
            }
        } Else {$Property.(($Path | Where {$_}) -Join $Separator) = $Object}
    } ElseIf ($Objects -ne $Null) {
        @($Objects) | ForEach -Begin {$Output = @(); $Names = @()} {
            New-Variable -Force -Option AllScope -Name Property -Value (New-Object System.Collections.Specialized.OrderedDictionary)
            Flatten-Object @(,$_) $Separator $Base $Depth $Uncut $ToString $Path
            $Output += New-Object PSObject -Property $Property
            $Names += $Output[-1].PSObject.Properties | Select -Expand Name
        }
        $Output | Select ([String[]]($Names | Select -Unique))
    }
}; Set-Alias Flatten Flatten-Object
Syntax
<Object[]> Flatten-Object [-Separator <String>] [-Base "" | 0 | 1] [-Depth <Int>] [-Uncut <Int>] [-ToString <Type[]>]
or:
Flatten-Object <Object[]> [[-Separator] <String>] [[-Base] "" | 0 | 1] [[-Depth] <Int>] [[-Uncut] <Int>] [[-ToString] <Type[]>]
Parameters
-Objects <Object[]>
The object (or objects) to be flattened.
-Separator <String> (Default: .)
The separator used between the recursive property names.
-Depth <Int> (Default: 5)
The maximal depth of flattening a recursive property. Any negative value will result in an unlimited depth and could cause an infinite loop.
-Uncut <Int> (Default: 1)
The number of object iterations that will be left uncut; further object properties will be limited to just the DefaultDisplayPropertySet. Any negative value will reveal all properties of all objects.
-Base "" | 0 | 1 (Default: 1)
The first index name of an embedded array:
1, arrays will be 1 based: <Parent>.1, <Parent>.2, <Parent>.3, ...
0, arrays will be 0 based: <Parent>.0, <Parent>.1, <Parent>.2, ...
"", the first item in an array will be unnamed and then followed by 1: <Parent>, <Parent>.1, <Parent>.2, ...
-ToString <Type[]= [String], [DateTime], [TimeSpan]>
A list of value types (default [String], [DateTime], [TimeSpan]) that will be converted to a string rather than further flattened. E.g. a [DateTime] could be flattened with additional properties like Date, Day, DayOfWeek etc. but will be converted to a single (String) property instead.
Note:
The parameter -Path is for internal use but could be used to prefix property names.
Examples
Answering the specific question:
(Get-Content "PATH_TO\test.json" -Raw | ConvertFrom-Json) | Flatten-Object | Convertto-CSV -NoTypeInformation | Set-Content "PATH_TO\test.csv"
Result:
{
"url": "http://test.test",
"slug": "slug",
"id": 10011,
"link": "http://test.er",
"level": 1,
"areas.2": "area_b",
"areas.1": "area_a",
"disciplines.3": "discipline_c",
"disciplines.2": "discipline_b",
"disciplines.1": "discipline_a",
"subject": "testing",
"title": "Test procedure",
"email": "test@test.com"
}
Stress testing a more complex custom object:
New-Object PSObject @{
    String = [String]"Text"
    Char = [Char]65
    Byte = [Byte]66
    Int = [Int]67
    Long = [Long]68
    Null = $Null
    Booleans = $False, $True
    Decimal = [Decimal]69
    Single = [Single]70
    Double = [Double]71
    Array = @("One", "Two", @("Three", "Four"), "Five")
    HashTable = @{city="New York"; currency="Dollar"; postalCode=10021; Etc = @("Three", "Four", "Five")}
    Object = New-Object PSObject -Property @{Name = "One"; Value = 1; Text = @("First", "1st")}
} | Flatten
Result:
Double : 71
Decimal : 69
Long : 68
Array.1 : One
Array.2 : Two
Array.3.1 : Three
Array.3.2 : Four
Array.4 : Five
Object.Name : One
Object.Value : 1
Object.Text.1 : First
Object.Text.2 : 1st
Int : 67
Byte : 66
HashTable.postalCode : 10021
HashTable.currency : Dollar
HashTable.Etc.1 : Three
HashTable.Etc.2 : Four
HashTable.Etc.3 : Five
HashTable.city : New York
Booleans.1 : False
Booleans.2 : True
String : Text
Char : A
Single : 70
Null :
Flattening grouped objects:
$csv | Group Name | Flatten | Format-Table # https://stackoverflow.com/a/47409634/1701026
Flattening common objects:
(Get-Process)[0] | Flatten-Object
Or a list (array) of objects:
Get-Service | Flatten-Object -Depth 3 | Export-CSV Service.csv
Note that a command such as the one below takes hours to compute:
Get-Process | Flatten-Object | Export-CSV Process.csv
Why? Because it results in a table with a few hundred rows and several thousand columns. So if you would like to use this for flattening processes, you had better limit the number of rows (using the Where-Object cmdlet) or the number of columns (using the Select-Object cmdlet).
For the latest Flatten-Object version, see: https://powersnippets.com/flatten-object/
The CSV conversion/export cmdlets have no way of "flattening" an object, and I may be missing something, but I know of no way to do this with a built-in cmdlet or feature.
If you can guarantee that disciplines and areas will always have the same number of elements, you can trivialize it by using Select-Object with derived properties to do this:
$properties=@('id','title','slug','url','email','link','subject','level',
    @{Name='discipline_1';Expression={$_.disciplines[0]}}
    @{Name='discipline_2';Expression={$_.disciplines[1]}}
    @{Name='discipline_3';Expression={$_.disciplines[2]}}
    @{Name='area_1';Expression={$_.areas[0]}}
    @{Name='area_2';Expression={$_.areas[1]}}
)
(Get-Content 'PATH_TO\test.json' -Raw | ConvertFrom-Json)| Select-Object -Property $properties | Export-CSV -NoTypeInformation -Path 'PATH_TO\test.csv'
However, I am assuming that disciplines and areas will be variable length for each record. In that case, you will have to loop over the input and pull the highest count value for both disciplines and areas, then build the properties array dynamically:
$inputData = Get-Content 'PATH_TO\test.json' -Raw | ConvertFrom-Json
$counts = $inputData | Select-Object -Property @{Name='disciplineCount';Expression={$_.disciplines.Count}},@{Name='areaCount';Expression={$_.areas.count}}
$maxDisciplines = $counts | Measure-Object -Maximum -Property disciplineCount | Select-Object -ExpandProperty Maximum
$maxAreas = $counts | Measure-Object -Maximum -Property areaCount | Select-Object -ExpandProperty Maximum
$properties=@('id','title','slug','url','email','link','subject','level')
1..$maxDisciplines | % {
    $properties += @{Name="discipline_$_";Expression=[scriptblock]::create("`$_.disciplines[$($_ - 1)]")}
}
1..$maxAreas | % {
    $properties += @{Name="area_$_";Expression=[scriptblock]::create("`$_.areas[$($_ - 1)]")}
}
$inputData | Select-Object -Property $properties | Export-CSV -NoTypeInformation -Path 'PATH_TO\test.csv'
This code hasn't been fully tested, so it may need some tweaking to work 100%, but I believe the ideas are solid =)
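The two-pass idea described above (first find the longest list per field, then emit that many numbered columns for every record) can be sketched in Python as follows. This is only an illustration of the logic under assumed names: widen and the second sample record are invented for the example, and short lists are padded with empty strings.

```python
def widen(records, prefixes):
    """Spread list-valued fields into numbered columns, padding short lists with ''.

    `prefixes` maps a list field name to the column prefix, e.g. {"areas": "area"}.
    """
    # Pass 1: the widest list per field decides how many columns that field gets.
    widths = {
        field: max((len(r.get(field, [])) for r in records), default=0)
        for field in prefixes
    }
    # Pass 2: copy scalar fields as-is and expand each list into prefix_1, prefix_2, ...
    rows = []
    for r in records:
        row = {k: v for k, v in r.items() if k not in prefixes}
        for field, prefix in prefixes.items():
            items = r.get(field, [])
            for i in range(widths[field]):
                row[f"{prefix}_{i + 1}"] = items[i] if i < len(items) else ""
        rows.append(row)
    return rows


# First record follows the question; the second is hypothetical, to show padding.
records = [
    {"id": 10011, "disciplines": ["discipline_a", "discipline_b", "discipline_c"], "areas": ["area_a", "area_b"]},
    {"id": 10012, "disciplines": ["discipline_d"], "areas": []},
]
rows = widen(records, {"disciplines": "discipline", "areas": "area"})
print(list(rows[0]))
```

Every row ends up with the same set of columns, which is what a CSV writer needs.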

How to convert nested JSON to CSV using only jq

I have the following JSON:
{
"A": {
"C": {
"D": "T1",
"E": 1
},
"F": {
"D": "T2",
"E": 2
}
},
"B": {
"C": {
"D": "T3",
"E": 3
}
}
}
I want to convert it into csv as follows,
A,C,T1,1
A,F,T2,2
B,C,T3,3
Description of output: the parent keys are printed until a leaf child is reached; once a leaf child is reached, its values are printed.
I tried the following without success:
cat my.json | jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $rows[] | @csv'
It throws an error.
I can't hardcode the parent keys, as the actual json has too many records. But the structure of the json is similar. What am I missing?
Some of the requirements are unclear, but the following solves one interpretation of the problem:
paths as $path
| {path: $path, value: getpath($path)}
| select(.value|type == "object" )
| select( [.value[]][0] | type != "object")
| .path + ([.value[]])
| @csv
(This program could be optimized but the presentation here is intended to make the separate steps clear.)
Invocation:
jq -r -f leaves-to-csv.jq input.json
Output:
"A","C","T1",1
"A","F","T2",2
"B","C","T3",3
Unquoted strings
To avoid the quotation marks around strings, you could replace the last component of the pipeline above with:
join(",")
Here is a solution using tostream and group_by
[
tostream
| select(length == 2) # e.g. [["A","C","D"],"T1"]
| .[0][:-1] + [.[1]] # ["A","C","T1"]
]
| group_by(.[:-1]) # [[["A","C","T1"],["A","C",1]],...
| .[] # [["A","C","T1"],["A","C",1]]
| .[0][0:2] + map(.[-1]|tostring) # ["A","C","T1","1"]
| join(",") # "A,C,T1,1"
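Both jq answers boil down to "walk to every object whose values are scalars, then print the path followed by those values". A compact Python sketch of that interpretation is shown below; it is an added illustration, not either answerer's code, and the function name leaf_rows is an assumption. Objects that mix scalars and nested objects are not covered by the question, and this sketch simply ignores their scalar values.

```python
def leaf_rows(node, path=()):
    """Yield path + values for every dict whose values are all non-dict (a 'leaf' object)."""
    if all(not isinstance(value, dict) for value in node.values()):
        yield list(path) + list(node.values())
        return
    for key, child in node.items():
        if isinstance(child, dict):  # scalars next to nested dicts are skipped
            yield from leaf_rows(child, path + (key,))


# Sample data copied from the question
doc = {
    "A": {"C": {"D": "T1", "E": 1}, "F": {"D": "T2", "E": 2}},
    "B": {"C": {"D": "T3", "E": 3}},
}
lines = [",".join(str(item) for item in row) for row in leaf_rows(doc)]
print("\n".join(lines))
```

Joining with a plain comma matches the unquoted output asked for in the question; values that themselves contain commas would need a real CSV writer.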