Extract multiline regex from extra large files in Powershell - json

I have an extra-large log file in CSV format that includes JSON-formatted data. What I'm trying to do is extract the JSON parts from the data and store them in a separate file.
The real problem is that the file is almost 70 GB, which causes some interesting problems to tackle.
The file size makes it impossible to read the whole file in one chunk. With PowerShell's Get-Content combined with -ReadCount and ForEach-Object I can take smaller chunks and run a regex pattern over them, chunk by chunk.
$Path = <pathToFile>
$outPath = <pathToOutput>
Out-File -Encoding utf8 -FilePath $outPath
$JsonRegex = "(?smi)\{.*?\}"
Get-Content -Path $Path -ReadCount 100000 | Foreach-Object {
( "$_" | Select-String -Pattern $JsonRegex -AllMatches | Foreach-Object { $_.Matches } | Foreach-Object { $_.Value } ) | Add-Content $outPath
}
But what happens here is that every 100k lines the chunk boundary falls in the middle of a JSON object, so that object is skipped and processing continues from the next one.
Here is an example of what the log data looks like. Each record starts with some columns on the first row, followed by JSON-formatted data whose length is not consistent, so I cannot use any fixed ReadCount value to avoid landing in the middle of a JSON object.
"5","5","9/10/2019 12:00:46 AM","2","some","data","removed","comment","{
"message": "comment",
"level": "Information",
"logType": "User",
"timeStamp": "2019-09-10T03:00:46.5573047+03:00",
"fingerprint": "some",
}","11"
"5","5","9/10/2019 12:00:46 AM","2","some","data","removed","comment","{
"message": "comment",
"level": "Information",
"logType": "User",
"timeStamp": "2019-09-10T03:00:46.5672713+03:00",
"fingerprint": "some",
"windowsIdentity": "LOCAL\\WinID",
"machineName": "TK-141",
"processVersion": "1.0.71",
"jobId": "24a8",
"machineId": 11
}","11"
Is there any way to accomplish this without missing any data rows from the gigantic log file?

Use a switch statement with the -Regex and -File parameters to efficiently (by PowerShell standards) read the file line by line and keep state across multiple lines.
For efficient writing to a file, use a .NET API, namely a System.IO.StreamWriter instance.
The following code assumes:
Each JSON string spans multiple lines and is non-nested.
On a given line, an opening { / closing } unambiguously marks the start / end of a (multi-line) JSON string.
# Input file path
$path = '...'
# Output file path
# Important: specify a *full* path
$outFileStream = [System.IO.StreamWriter] "$PWD/out.txt"
$json = ''
switch -Regex -File $path {
'\{.*' { $json = $Matches[0]; continue }
'.*\}' {
$json += "`n" + $Matches[0]
$outFileStream.WriteLine($json)
$json = ''
continue
}
default { if ($json) { $json += "`n" + $_ } }
}
$outFileStream.Close()
If you can further assume that no part of the JSON string follows the opening { / precedes the closing } on the same line, as your sample data suggest, you can simplify (and speed up) the switch statement:
$json = ''
switch -Regex -File $path {
'\{$' { $json ='{'; continue }
'^\}' { $outFileStream.WriteLine(($json + "`n}")); $json = ''; continue }
default { if ($json) { $json += "`n" + $_ } }
}
$outFileStream.Close()
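Before running the simplified patterns against the full file, it may help to try them on a tiny sample. The following is only a sketch: the temp file name and the sample rows are made up for illustration (modeled on the question's format, without trailing commas so that the result parses), and the extracted strings are collected in a list instead of being written through a StreamWriter.

```powershell
# Hypothetical sample file in the temp directory; rows mimic the question's log format.
$samplePath = Join-Path ([System.IO.Path]::GetTempPath()) 'sample-log.csv'
$sampleLines = @(
    '"5","5","9/10/2019 12:00:46 AM","2","some","comment","{'
    '"message": "comment",'
    '"level": "Information"'
    '}","11"'
    '"5","5","9/10/2019 12:00:46 AM","2","some","comment","{'
    '"message": "other",'
    '"machineId": 11'
    '}","11"'
)
Set-Content -LiteralPath $samplePath -Value $sampleLines

# Same state machine as above, collecting results in memory for inspection.
$objects = [System.Collections.Generic.List[string]]::new()
$json = ''
switch -Regex -File $samplePath {
    '\{$'   { $json = '{'; continue }                               # line ends with "{": start a new object
    '^\}'   { $objects.Add($json + "`n}"); $json = ''; continue }   # line starts with "}": object complete
    default { if ($json) { $json += "`n" + $_ } }                   # inside an object: accumulate the line
}

$objects.Count                                                      # number of extracted JSON strings
$objects | ForEach-Object { ($_ | ConvertFrom-Json).message }       # each one parses on its own
Remove-Item -LiteralPath $samplePath
```

With this sample, two JSON strings are extracted even though a fixed chunk size would have split them.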
Doug Maurer had a solution attempt involving a System.Text.StringBuilder instance so as to optimize the iterative concatenation of the parts making up each JSON string.
However, at least with an input file crafted from many repetitions of the sample data, I saw only a small performance gain in my informal tests.
For the sake of completeness, here's the System.Text.StringBuilder solution:
$json = [System.Text.StringBuilder]::new(512) # tweak the buffer size as needed
switch -Regex -File $path {
'\{$' { $null = $json.Append('{'); continue }
'^\}' { $outFileStream.WriteLine($json.Append("`n}").ToString()); $null = $json.Clear(); continue }
default { if ($json.Length) { $null = $json.Append("`n").Append($_) } }
}
$outFileStream.Close()

Related

I want to create json file by substituting values from environment variables in a json template file

One requirement of mine is - Using windows, not use any tools not already available as part of aws cli or windows
For example, I have this json file test.json with below content:
"My number is $myvar"
I read this into a powershell variable like so:
$myobj=(get-content .\test.json | convertfrom-json)
$myvar=1
From here, I would like to do something with this $myobj which will enable me to get this output:
$myobj | tee json_with_values_from_environment.json
My number is 1
I got some limited success with iex, but not sure if it can be made to work for this example
You can use $ExecutionContext.InvokeCommand.ExpandString()
$myobj = '{test: "My number is $myvar"}' | ConvertFrom-Json
$myvar = 1
$ExecutionContext.InvokeCommand.ExpandString($myobj.test)
Output
My number is 1
Here is one way to do it using the Parser to find all VariableExpressionAst and replace them with the values in your session.
Given the following test.json:
{
"test1": "My number is $myvar",
"test2": {
"somevalue": "$env:myothervar",
"someothervalue": "$anothervar !!"
}
}
We want to find $myvar, $env:myothervar and $anothervar and replace them with their corresponding values defined in the current session, so the code looks like this (note that we do the replacement before converting the JSON string into an object, which is much easier):
using namespace System.Management.Automation.Language
$isCore7 = $PSVersionTable.PSVersion -ge '7.2'
# Define the variables here
$myvar = 10
$env:myothervar = 'hello'
$anothervar = 'world'
# Read the Json
$json = Get-Content .\test.json -Raw
# Now parse it
$ast = [Parser]::ParseInput($json, [ref] $null, [ref] $null)
# Find all variables in it, and enumerate them
$ast.FindAll({ $args[0] -is [VariableExpressionAst] }, $true) |
Sort-Object { $_.Extent.Text } -Unique | ForEach-Object {
# now replace the text with the actual value
if($isCore7) {
# in PowerShell Core is very easy
$json = $json.Replace($_.Extent.Text, $_.SafeGetValue($true))
return
}
# in Windows PowerShell not so much
$varText = $_.Extent.Text
$varPath = $_.VariablePath
# find the value of the var (here we use the path)
$value = $ExecutionContext.SessionState.PSVariable.GetValue($varPath.UserPath)
if($varPath.IsDriveQualified) {
$value = $ExecutionContext.SessionState.InvokeProvider.Item.Get($varPath.UserPath).Value
}
# now replace the text with the actual value
$json = $json.Replace($varText, $value)
}
# now we can safely convert the string to an object
$json | ConvertFrom-Json
If we were to convert it back to Json to see the result:
{
"test1": "My number is 10",
"test2": {
"somevalue": "hello",
"someothervalue": "world !!"
}
}

Powershell 7.2: ConvertFrom-Json - Date Handling

With Powershell 7.2 there seems to be a change in how a JSON is deserialized into an object in terms of dates -> instead of string it is now datetime. But I want to have the "old" behavior, i.e. that it is handled as string and NOT datetime.
How can I achieve that when using ConvertFrom-Json in Powershell 7.2 all dates are deserialized as string and not datetime?
EDIT:
$val = '{ "date":"2022-09-30T07:04:23.571+00:00" }' | ConvertFrom-Json
$val.date.GetType().FullName
This is actually a known issue, see: #13598 Add a -DateKind parameter to ConvertFrom-Json to control how System.DateTime / System.DateTimeOffset values are constructed. I don't think there is an easy solution for this yet. One thing you might do is invoke Windows PowerShell instead. That isn't currently straightforward either, so I have created a small wrapper to send and receive complex objects between PowerShell sessions (see also my #18460 Invoke-PowerShell proposal):
function Invoke-PowerShell ($Command) {
$SerializeOutput = @"
`$Output = $Command
[System.Management.Automation.PSSerializer]::Serialize(`$Output)
"@
$Bytes = [System.Text.Encoding]::Unicode.GetBytes($SerializeOutput)
$EncodedCommand = [Convert]::ToBase64String($Bytes)
$PSSerial = PowerShell -EncodedCommand $EncodedCommand
[System.Management.Automation.PSSerializer]::Deserialize($PSSerial)
}
Usage:
Invoke-PowerShell { '{ "date":"2022-09-30T07:04:23.571+00:00" }' | ConvertFrom-Json }
date
----
2022-09-30T07:04:23.571+00:00
Update
As commented by mklement0, I clearly complicated the answer.
Calling via powershell.exe is a pragmatic workaround (albeit slow and Windows-only), but note that you don't need a helper function: if you pass a script block to powershell.exe (or pwsh.exe) from PowerShell, Base64 CLIXML-based serialization happens automatically behind the scenes: try powershell.exe -noprofile { $args | ConvertFrom-Json } -args '{ "date":"2022-09-30T07:04:23.571+00:00" }' For that reason, I don't think there's a need for an Invoke-PowerShell cmdlet.
$Json = '{ "date":"2022-09-30T07:04:23.571+00:00" }'
powershell.exe -noprofile { $args | ConvertFrom-Json } -args $Json
date
----
2022-09-30T07:04:23.571+00:00
iRon's helpful answer provides a pragmatic solution via the Windows PowerShell CLI, powershell.exe, relying on the fact that ConvertFrom-Json there does not automatically transform ISO 8601-like timestamp strings to [datetime] instances.
Hopefully, the proposal in the GitHub issue he links to, #13598, will be implemented in the future, which would then simplify the solution to:
# NOT YET IMPLEMENTED as of PowerShell 7.2.x
'{ "date":"2022-09-30T07:04:23.571+00:00" }' |
ConvertFrom-Json -DateTimeKind None
However, the powershell.exe workaround has two disadvantages: (a) it is slow (a separate PowerShell instance must be launched in a child process), and (b) it is Windows-only.
The solution below is a generalization of your own in-process approach that avoids these problems:
It injects a NUL character ("`0") at the start of each string that matches the pattern of a timestamp - the assumption is that the input itself never contains such characters, which is fair to assume.
This, as in your approach, prevents ConvertFrom-Json from recognizing timestamp strings as such, and leaves them untouched.
The [pscustomobject] graph that ConvertFrom-Json outputs must then be post-processed in order to remove the injected NUL characters again.
This is achieved with a ForEach-Object call that contains a helper script block that recursively walks the object graph, which has the advantage that it works with JSON input whose timestamp strings may be at any level of the hierarchy (i.e. they may also be in properties of nested objects).
Note: The assumption is that the timestamp strings are only ever contained as property values in the input; more work would be needed if you wanted to handle input JSON such as '[ "2022-09-30T07:04:23.571+00:00" ]' too, where the strings are input objects themselves.
# Sample JSON.
$val = '{ "date":"2022-09-30T07:04:23.571+00:00" }'
$val -replace '"(?=\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{2}:\d{2}")', "`"`0" | #"
ConvertFrom-Json |
ForEach-Object {
# Helper script block that walks the object graph
$sb = {
foreach ($o in $args[0]) {
if ($o -is [Array]) { # nested array -> recurse
foreach ($el in $o) { & $sb $el } # recurse
}
elseif ($o -is [System.Management.Automation.PSCustomObject]) {
foreach ($prop in $o.psobject.Properties) {
if ($prop.Value -is [Array]) {
foreach ($o in $prop.Value) { & $sb $o } # nested array -> recurse
}
elseif ($prop.Value -is [System.Management.Automation.PSCustomObject]) {
& $sb $prop.Value # nested custom object -> recurse
}
elseif ($prop.Value -is [string] -and $prop.Value -match '^\0') {
$prop.Value = $prop.Value.Substring(1) # Remove the NUL again.
}
}
}
}
}
# Call the helper script block with the input object.
& $sb $_
# Output the modified object.
if ($_ -is [array]) {
# Input object was array as a whole (implies use of -NoEnumerate), output as such.
, $_
} else {
$_
}
}
Based on the input from @zett42, here is my solution:
Assuming we know the regex pattern of the dates used in the JSON, I get the JSON as a string, add a prefix so that ConvertFrom-Json keeps the dates as strings instead of converting them to datetime, convert it with ConvertFrom-Json to a PSCustomObject, do whatever I need to do on the object, serialize it back to a JSON string with ConvertTo-Json, and then remove the prefix again.
[string]$json = '{ "date":"2022-09-30T07:04:23.571+00:00", "key1": "value1" }'
[string]$jsonWithDatePrefix = $json -replace '"(\d+-\d+.\d+T\d+:\d+:\d+\.\d+\+\d+:\d+)"', '"#$1"'
[pscustomobject]$jsonWithDatePrefixAsObject = $jsonWithDatePrefix | ConvertFrom-Json
$jsonWithDatePrefixAsObject.key1 = "value2"
[string]$updatedJsonString = $jsonWithDatePrefixAsObject | ConvertTo-Json
[string]$updatedJsonStringWithoutPrefix = $updatedJsonString -replace '"(#)(\d+-\d+.\d+T\d+:\d+:\d+\.\d+\+\d+:\d+)"', '"$2"'
Write-Host $updatedJsonStringWithoutPrefix
Two additional ways to change the date format:
Get-Node
Using this Get-Node, which is quite similar to mklement0's recursive function:
$Data = ConvertFrom-Json $Json
$Data |Get-Node -Where { $_.Value -is [DateTime] } | ForEach-Object {
$_.Value = Get-Date $_.Value -Format 'yyyy-MM-ddTHH\:mm\:ss.fffzzz' -AsUTC
}
$Data
DIY
Or do-it-yourself and build your own Json deserializer:
function ConvertFrom-Json {
[CmdletBinding()][OutputType([Object[]])] param(
[Parameter(ValueFromPipeLine = $True, Mandatory = $True)][String]$InputObject,
[String]$DateFormat = 'yyyy-MM-ddTHH\:mm\:ss.fffffffzzz', # Default: ISO 8601, https://www.newtonsoft.com/json/help/html/datesinjson.htm
[Switch]$AsLocalTime,
[Switch]$AsOrdered
)
function GetObject($JObject) {
switch ($JObject.GetType().Name) {
'JValue' {
switch ($JObject.Type) {
'Boolean' { $JObject.Value }
'Integer' { 0 + $JObject.Value } # https://github.com/PowerShell/PowerShell/issues/14264
'Date' { Get-Date $JObject.Value -Format $DateFormat -AsUTC:(!$AsLocalTime) } # https://github.com/PowerShell/PowerShell/issues/13598
Default { "$($JObject.Value)" }
}
}
'JArray' {
,@( $JObject.ForEach{ GetObject $_ } )
}
'JObject' {
$Properties = [Ordered]@{}
$JObject.ForEach{ $Properties[$_.Name] = GetObject $_.Value }
if ($AsOrdered) { $Properties } else { [PSCustomObject]$Properties } # https://github.com/PowerShell/PowerShell/pull/17405
}
}
}
GetObject ([Newtonsoft.Json.Linq.JObject]::Parse($InputObject))
}
Usage:
ConvertFrom-Json $Json -DateFormat 'yyyy-MM-ddTHH\:mm\:ss.fffzzz' |ConvertTo-Json -Depth 9

Powershell: Modify key value pair in JSON file

How do I modify a Key Value Pair in a JSON File with powershell?
We are trying to modify Database Connection, sometimes it can be two levels nested deep, sometimes it can be three levels deep.
I am trying to utilize this answer: Add new key value pair to JSON file in powershell.
Currently we are switching servers in multiple json files, so we can test in different server environments.
"JWTToken": {
"SecretKey": "Security Key For Generate Token",
"Issuer": "ABC Company"
},
"AllowedHosts": "*",
"ModulesConfiguration": {
"AppModules": [ "ABC Modules" ]
},
"ConnectionStrings": {
"DatabaseConnection": "Server=testserver,1433;Database=TestDatabase;User Id=code-developer;password=xyz;Trusted_Connection=False;MultipleActiveResultSets=true;",
"TableStorageConnection": "etc",
"BlobStorageConnection": "etc"
},
Once you convert JSON string to an object with PowerShell, it's not really a problem to then change the properties. The main issue you are going to face here is that your string is currently invalid JSON for .Net or at least it won't be expecting it in the current format. We can fix that though.
Here is your current JSON.
"JWTToken": {
"SecretKey": "Security Key For Generate Token",
"Issuer": "ABC Company"
},
"AllowedHosts": "*",
"ModulesConfiguration": {
"AppModules": [ "ABC Modules" ]
},
"ConnectionStrings": {
"DatabaseConnection": "Server=testserver,1433;Database=TestDatabase;User Id=code-developer;password=xyz;Trusted_Connection=False;MultipleActiveResultSets=true;",
"TableStorageConnection": "etc",
"BlobStorageConnection": "etc"
},
There may be other issues, for PowerShell JSON, in your application.config file, but these two are immediately noticeable to me.
Unnecessary trailing commas
No definitive opening { and closing }
How Can We Fix This?
We can use simple string concatenation to add { and } where necessary.
$RawText = Get-Content -Path .\path_to\application.config -Raw
$RawText = "{ " + $RawText + " }"
To avoid parsing issues with trailing commas when parsing the JSON with ConvertFrom-Json, we need to remove them via regex. My proposed approach is to identify them by whether a closing } or ] follows them, possibly after some whitespace (\s). So we would have a regex that looks like this:
"\,(?=\s*?[\}\]])".
We could then use that with -replace in PowerShell. Of course we will replace them with an empty string.
$FormattedText = $RawText -replace "\,(?=\s*?[\}\]])",""
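As a quick check of what the lookahead does, here is a minimal sketch on a made-up one-line string ($sample is only an illustration, not your config file). One caveat to keep in mind: the pattern does not know about string boundaries, so a comma inside a string value that happens to be followed by } or ] would be removed as well.

```powershell
# Made-up input with three trailing commas: after 2, after "etc", and after the inner }.
$sample = '{ "A": [ 1, 2, ], "B": { "C": "etc", }, }'

# Same pattern as above: a comma that is followed (after optional whitespace) by } or ].
$fixed = $sample -replace '\,(?=\s*?[\}\]])', ''

$fixed                              # { "A": [ 1, 2 ], "B": { "C": "etc" } }
($fixed | ConvertFrom-Json).B.C     # etc
```

Commas between elements (such as the one after 1) are untouched because they are not followed by a closing bracket.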
From here we convert the JSON string to an object.
$JsonObj = $FormattedText | ConvertFrom-Json
We can now change your database string by setting a property.
$JsonObj.ConnectionStrings.DatabaseConnection = "your new string"
We use ConvertTo-Json to convert the object back to a JSON string.
$JsonString = $JsonObj | ConvertTo-Json
It's not important to return the trailing commas, they aren't valid JSON, but your file needs the first { and last } removing before we commit it back to file with Set-Content.
# Remove the first { and trim white space. Second TrimStart() clears the space.
$JsonString = $JsonString.TrimStart("{").TrimStart()
# Repeat this but for the final } and use TrimEnd().
$JsonString = $JsonString.TrimEnd("}").TrimEnd()
# Write back to file.
$JsonString | Set-Content -Path .\path_to\application.config -Force
Your config file should be written back more or less as you found it. I will try and think of a regex to fix the appearance of the formatting, it shouldn't error, it just doesn't look great. Hope that helps.
EDIT
Here is a function to fix the unsightly appearance of the text in the file.
function Restore-Formatting {
Param (
[parameter(Mandatory=$true,ValueFromPipeline=$true)][string]$InputObject
)
$JsonArray = $InputObject -split "\n"
$Tab = 0
$Output = @()
foreach ($Line in $JsonArray) {
if ($Line -match "{" -or $Line -match "\[") {
$Output += (" " * $Tab) + $Line.TrimStart()
$Tab += 4
}
elseif ($Line -match "^\s+}" -or $Line -match "^\s+\]") {
$Tab -= 4
$Output += (" " * $Tab) + $Line.TrimStart()
}
else {
$Output += (" " * $Tab) + $Line.TrimStart()
}
}
$Output
}
TL;DR Script:
$RawText = Get-Content -Path .\path_to\application.config -Raw
$RawText = "{ " + $RawText + " }"
$FormattedText = $RawText -replace "\,(?=\s*?[\}\]])",""
$JsonObj = $FormattedText | ConvertFrom-Json
$JsonObj.ConnectionStrings.DatabaseConnection = "your new string"
$JsonString = $JsonObj | ConvertTo-Json
$JsonString = $JsonString.TrimStart("{").TrimStart()
$JsonString = $JsonString.TrimEnd("}").TrimEnd()
$JsonString | Restore-Formatting | Set-Content -Path .\path_to\application.config -NoNewLine -Force
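The question mentions that the value to change may sit two or three levels deep. One way to handle a variable depth is to walk a dot-separated property path on the converted object. This is a minimal sketch, not part of the script above: Set-JsonValueByPath is a hypothetical helper name, and it assumes that property names themselves contain no dots and that every property along the path already exists.

```powershell
# Hypothetical helper (not a built-in cmdlet): sets an existing property by dotted path.
function Set-JsonValueByPath {
    param(
        [Parameter(Mandatory)] $InputObject,
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] $Value
    )
    $names = $Path -split '\.'
    $node = $InputObject
    # Walk down to the parent of the final property.
    for ($i = 0; $i -lt $names.Count - 1; $i++) {
        $prop = $node.psobject.Properties[$names[$i]]
        if ($null -eq $prop) { throw "Property '$($names[$i])' not found." }
        $node = $prop.Value
    }
    # Assign the new value to the final property.
    $leaf = $node.psobject.Properties[$names[-1]]
    if ($null -eq $leaf) { throw "Property '$($names[-1])' not found." }
    $leaf.Value = $Value
}

# Made-up sample with a two-level and a three-level path.
$config = '{ "ConnectionStrings": { "DatabaseConnection": "old" }, "A": { "B": { "C": "old" } } }' |
    ConvertFrom-Json
Set-JsonValueByPath $config 'ConnectionStrings.DatabaseConnection' 'new two-level value'
Set-JsonValueByPath $config 'A.B.C' 'new three-level value'
$config.ConnectionStrings.DatabaseConnection   # new two-level value
$config.A.B.C                                  # new three-level value
```

When converting back, remember that ConvertTo-Json has a limited default depth, so deeper structures need an explicit -Depth value.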

Convert to JSON with comments from PowerShell

I have a very simple json and this code works for it:
function Get-CustomHeaders() {
return Get-Content -Raw -Path $JsonName | ConvertFrom-Json
}
However, if my json has any comments // wololo it breaks. Would it be too hard to make this parser accept comments ?
The solution in the other answer only removes // comments if they are at the beginning of a line (with or without spaces), and doesn't remove /* multiline comments */
This code removes all kinds of // and /* multiline comments */
$configFile = (Get-Content path-to-jsonc-file -raw)
# Keep reading, for an improvement
# $configFile = $configFile -replace '(?m)\s*//.*?$' -replace '(?ms)/\*.*?\*/'
As @Jiří Herník indicates in his answer, this expression doesn't take into account strings that contain comment-like text, for example "url": "http://mydomian.com". To handle this case:
$configFile = $configFile -replace '(?m)(?<=^([^"]|"[^"]*")*)//.*' -replace '(?ms)/\*.*?\*/'
for example removing the comments in this file:
{
// https://github.com/serilog/serilog-settings-configuration
"Serilog": {
"MinimumLevel": "Error", // Verbose, Debug, Information, Warning, Error or Fatal
"WriteTo": [
{
"Name": "File",
"Args": {
"path": "D:\\temp\\MyService\\log.txt",
"rollingInterval": "Day",
"outputTemplate": "{Timestamp:yyyy-MM-dd HH:mm:ss.fff zzz} [{Level:u3}] ({App}) ({Environment}) {Message:lj}{NewLine}{Exception}"
}
},
{/*
"Name": "Seq",*/
"Args": {
"serverUrl": "http://localhost:5341"
}
}
]
}
}
results in:
{
"Serilog": {
"MinimumLevel": "Error",
"WriteTo": [
{
"Name": "File",
"Args": {
"path": "D:\\temp\\MyService\\log.txt",
"rollingInterval": "Day",
"outputTemplate": "{Timestamp:yyyy-MM-dd HH:mm:ss.fff zzz} [{Level:u3}] ({App}) ({Environment}) {Message:lj}{NewLine}{Exception}"
}
} ,
{
"Args": {
"serverUrl": "http://localhost:5341"
}
}
]
}
}
Remove comment lines from your input before the conversion:
(Get-Content $JsonName) -replace '^\s*//.*' | Out-String | ConvertFrom-Json
Here you have an example which can't be handled right by previous answers:
{
"url":"http://something" // note the double slash in URL
}
so here is regexp that solves also this problem.
$configFile = $configFile -replace '(?m)(?<=^([^"]|"[^"]*")*)//.*' -replace '(?ms)/\*.*?\*/'
IMPORTANT NOTE:
Powershell 6.0+ can load JSON with comments in it.
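A minimal sketch of that behavior, assuming PowerShell 6 or later; the sample text is made up. The version guard is there because Windows PowerShell 5.1 needs the comments stripped first, as described above.

```powershell
# Made-up JSON with both comment styles and a URL containing "//".
$jsonc = @'
{
  // single-line comment
  "url": "http://something" /* block comment */
}
'@

if ($PSVersionTable.PSVersion.Major -ge 6) {
    # PowerShell 6+: comments are skipped by ConvertFrom-Json and are not kept in the output object.
    $obj = $jsonc | ConvertFrom-Json
    $obj.url
} else {
    Write-Warning 'Windows PowerShell 5.1 or older: strip the comments with a regex first.'
}
```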
A simpler pattern that catches all combinations of string, escapes and comments is:
$configFile = $configFile -replace '("(\\.|[^\\"])*")|/\*[\S\s]*?\*/|//.*', '$1';
This assumes the file is valid, with no unclosed strings or comments. Invalid files are beyond the scope of this question.
The first part ("(\\.|[^\\"])*") matches full strings and skips any escaped characters, including \\ and \". This is captured so it can be placed back in the replacement string.
The second part /\*[\S\s]*?\*/ matches multiline comments. It uses [\S\s] instead of ., so linebreaks are also matched. It is a combination of non-whitespace characters (\S) and whitespace characters (\s). The *? is a lazy repetition, so it will prefer to match as little as possible, so it won't skip over any closing */.
The last part //.* matches single line comments. The . won't match any linebreak, so it will only match until the end of the line.
When a string is matched, it is captured into slot 1. When a comment is matched, nothing is captured. The replacement is with whatever is in slot 1 ($1). The result is that strings are matched but preserved, but comments are removed.
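To see the three alternatives at work, here is a small sketch on a made-up input ($jsonc is illustrative only). It contains a URL with // inside a string, escaped quotes and backslashes inside a string, a single-line comment and a multiline comment.

```powershell
# Made-up JSONC input; single-quoted here-string, so nothing is expanded by PowerShell.
$jsonc = @'
{
  // line comment
  "url": "http://localhost:5341", /* block
  comment */
  "path": "D:\\temp\\a \"quoted\" name"
}
'@

# Strings are captured in group 1 and put back; comments match without a capture and vanish.
$clean = $jsonc -replace '("(\\.|[^\\"])*")|/\*[\S\s]*?\*/|//.*', '$1'

$obj = $clean | ConvertFrom-Json
$obj.url     # http://localhost:5341
$obj.path    # D:\temp\a "quoted" name
```

The // inside the URL survives because the whole string is consumed by the first alternative before the comment alternatives get a chance to match at that position.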
I wrote a function that takes any comments and puts them back into the JSON file if found.
This also allows reading and writing to the JSON file.
There are comments within. Tested in v5.1 and v7.
# Helper Function
# Write the contents of argument content to a file.
# Will create the file if it does not exist.
Function Write-ToFile {
Param ([Parameter(Mandatory=$true, Position=0)] [string] $path,[Parameter(Mandatory=$true, Position=1)] [string] $content)
[System.IO.File]::WriteAllText($path, $content)
}
Function Invoke-ReadWriteJSON {
<#
.SYNOPSIS
Reads and writes properties from a JSON file.
.DESCRIPTION
This will allow JSON files to have comments, either multi-line or single line
comments are supported.
If the file does not exist or is empty then the default file contents are
written to it.
.NOTES
Author: Ste
Date Created: 2021.05.01
Tested with PowerShell 5.1 and 7.1.
Posted here: https://stackoverflow.com/questions/51066978/convert-to-json-with-comments-from-powershell
.BUGS: NA
.TODO: NA
.PARAMETER filePath
The file path of the JSON file.
.PARAMETER Mode
This parameter is either Read or Write.
.PARAMETER Property
The property of the JSON object.
.PARAMETER newValue
The new property of the JSON object.
.INPUTS
None. You cannot pipe objects to Invoke-ReadWriteJSON.
.OUTPUTS
Writes to or reads a file using the filePath parameter.
.EXAMPLE (Write the property "Prop 1" with the value "Get in you machine!" to a file)
PS> Invoke-ReadWriteJSON -filePath $jsonFilePath "Write" "Prop 1" "Get in you machine!"
.EXAMPLE (Read a property from a file)
PS> Invoke-ReadWriteJSON -filePath $jsonFilePath "Read" "Prop 2"
PS> temp
#>
Param
(
[Parameter(Mandatory = $true, HelpMessage = 'The file path of the JSON file.')]
[String]$filePath,
[Parameter(Mandatory = $true, HelpMessage = 'This parameter is either Read or Write.')]
[String]$Mode,
[Parameter(Mandatory = $true, HelpMessage = 'The property of the JSON object.')]
[String]$Property,
[Parameter(Mandatory = $false, HelpMessage = 'The new property of the JSON object.')]
[String]$newValue
)
# If there is a file then set its content else set the content variable to empty.
if (Test-Path -LiteralPath $filePath) {
$contents = Get-Content -LiteralPath $filePath
$contents = $contents -replace '\s*' # Replace any whitespaces so that the length can be checked.
}
else {
$contents = ''
}
# if the file does not exist or the contents are empty
if ((Test-Path -LiteralPath $filePath) -eq $false -or $contents.length -eq 0) {
Write-ToFile $filePath $jsonSettingFileDefaultContents
}
# This will allow single and multiline comments in the json file.
# Regex for removing comments: https://stackoverflow.com/a/59264162/8262102
$jsonContents = (Get-Content -LiteralPath $filePath -Raw) -replace '(?m)(?<=^([^"]|"[^"]*")*)//.*' -replace '(?ms)/\*.*?\*/' | Out-String | ConvertFrom-Json
# Grab the comments that will be used later on.
$jsonComments = (Get-Content -LiteralPath $filePath -Raw) -replace '(?s)\s*\{.*\}\s*'
# Read the property.
if ($Mode -eq "Read") {return $jsonContents.$Property}
# Write the property.
if ($Mode -eq "Write") {
$jsonContents.$Property = $newValue
$jsonContents | ConvertTo-Json -depth 32 | set-content $filePath
# Trims any whitespace from the beginning and end of contents.
Set-content $filePath ((Get-Content -LiteralPath $filePath -Raw) -replace '(?s)^\s*|\s*$')
}
# If there are comments then this section will add them back in. Important to
# read contents with -Raw switch here.
if ($jsonComments.length -gt 0) {
$jsonNewcontents = (Get-Content -LiteralPath $filePath -Raw) -replace '(?m)(?<=^([^"]|"[^"]*")*)//.*' -replace '(?ms)/\*.*?\*/'
# Trims any whitespace from the beginning and end of contents.
Set-content $filePath (("$jsonComments`n" + $jsonNewcontents) -replace '(?s)^\s*|\s*$')
}
}
$deskTopFolder = [Environment]::GetFolderPath("DesktopDirectory")
$jsonFilePath = "$deskTopFolder\color-dialog-settings.json"
$jsonSettingFileDefaultContents = @'
// Some comments go here.
// Some comments go here.
// Some comments go here.
{
"Prop 1": "temp",
"Prop 2": "temp"
}
'@
# Write the JSON property.
# Invoke-ReadWriteJSON -filePath $jsonFilePath "Write" "Prop 1" "Get in you machine!"
# Read the JSON property.
Invoke-ReadWriteJSON -filePath $jsonFilePath "Read" "Prop 2"
# PS> temp

Split Period-Delimited Nodes To JSON Object

I have many string entries (these are namespace/class trees) that look like the following:
appsystem
appsystem.applications
appsystem.applications.APPactivities
appsystem.applications.APPmanager
appsystem.applications.APPmodels
appsystem.applications.MAPmanager
appsystem.applications.MAPmanager.maphub
appsystem.applications.MAPmanager.mapmanager
appsystem.applications.pagealertsmanager
appsystem.authentication
appsystem.authentication.manager
appsystem.authentication.manager.encryptionmanager
appsystem.authentication.manager.sso
appsystem.authentication.manager.tokenmanager
But, I need the final output to be like:
{
"name": "appsystem",
"children": [
{
"name": "applications",
"children": [
{"name": "APPactivities"},
{"name": "APPmanager"},
{"name": "APPmodels"},
{"name": "MAPmanager",
"children": [
{"name": "maphub"},
{"name": "mapmanager"}
]},
{"name": "pagealertsmanager"}
]
},
{
"name": "authentication",
"children": [
{"name": "manager",
"children": [
{"name": "encryptionmanager"},
{"name": "sso"},
{"name": "tokenmanager"}
]}
]
}
]
}
The total nodes can be any number.
I am assuming I am going to need recursion but I am at a loss on where even to begin.
This builds up nested lists, PowerShell ConvertTo-JSON flattens the outer list.
You can change the $Line in $s to $line in (Get-Content input.txt).
But I think this does it:
$s = @'
appsystem
appsystem.applications
appsystem.applications.APPactivities
appsystem.applications.APPmanager
appsystem.applications.APPmodels
appsystem.applications.MAPmanager
appsystem.applications.MAPmanager.maphub
appsystem.applications.MAPmanager.mapmanager
appsystem.applications.pagealertsmanager
appsystem.authentication
appsystem.authentication.manager
appsystem.authentication.manager.encryptionmanager
appsystem.authentication.manager.sso
appsystem.authentication.manager.tokenmanager
'@ -split '\r?\n' # split on CRLF or LF, whichever the script file uses
$TreeRoot = New-Object System.Collections.ArrayList
foreach ($Line in $s) {
$CurrentDepth = $TreeRoot
$RemainingChunks = $Line.Split('.')
while ($RemainingChunks)
{
# If there is a dictionary at this depth then use it, otherwise create one.
$Item = $CurrentDepth | Where-Object {$_.name -eq $RemainingChunks[0]}
if (-not $Item)
{
$Item = @{name=$RemainingChunks[0]}
$null = $CurrentDepth.Add($Item)
}
# If there will be child nodes, look for a 'children' node, or create one.
if ($RemainingChunks.Count -gt 1)
{
if (-not $Item.ContainsKey('children'))
{
$Item['children'] = New-Object System.Collections.ArrayList
}
$CurrentDepth = $Item['children']
}
$RemainingChunks = $RemainingChunks[1..$RemainingChunks.Count]
}
}
$TreeRoot | ConvertTo-Json -Depth 1000
Edit: It's too slow? I tried some random pausing profiling and found (not too surprisingly) that it's the inner nested loop, which searches children arrays for matching child nodes, which is being hit too many times.
This is a redesigned version which still builds the tree, and this time it also builds a TreeMap hashtable of shortcuts into the tree, to all the previously built nodes, so it can jump right to them instead of searching the children lists for them.
I made a testing file, some 20k random lines. Original code processed it in 108 seconds, this one does it in 1.5 seconds and the output matches.
$TreeRoot = New-Object System.Collections.ArrayList
$TreeMap = @{}
foreach ($line in (Get-Content d:\out.txt)) {
$_ = ".$line" # easier if the lines start with a dot
if ($TreeMap.ContainsKey($_)) # Skip duplicate lines
{
continue
}
# build a subtree from the right. a.b.c.d.e -> e then d->e then c->d->e
# keep going until base 'a.b' reduces to something already in the tree, connect new bit to that.
$LineSubTree = $null
$TreeConnectionPoint = $null
do {
$lastDotPos = $_.LastIndexOf('.')
$leaf = $_.Substring($lastDotPos + 1)
$_ = $_.Substring(0, $lastDotPos)
# push the leaf on top of the growing subtree
$LineSubTree = if ($LineSubTree) {
@{"name"=$leaf; "children"=([System.Collections.ArrayList]@($LineSubTree))}
} else {
@{"name"=$leaf}
}
$TreeMap["$_.$leaf"] = $LineSubTree
} while (!($TreeConnectionPoint = $TreeMap[$_]) -and $_)
# Now we have a branch built to connect in to the existing tree
# but is there somewhere to put it?
if ($TreeConnectionPoint)
{
if ($TreeConnectionPoint.ContainsKey('children'))
{
$null = $TreeConnectionPoint['children'].Add($LineSubTree)
} else {
$TreeConnectionPoint['children'] = [System.Collections.ArrayList]@($LineSubTree)
}
} else
{ # nowhere to put it, this is a new root level connection
$null = $TreeRoot.Add($LineSubTree)
}
}
$TreeRoot | ConvertTo-Json -Depth 100
(@mklement0's code takes 103 seconds and produces a wildly different output - 5.4M characters of JSON instead of 10.1M characters of JSON. [Edit: because my code allows multiple root nodes in a list which my test file has, and their code does not allow that])
To complement TessellatingHeckler's great answer with an alternative implementation that uses a recursive function.
The emphasis is on modularity and terseness, not performance.[1]
# Outer function that loops over all paths and builds up one or more nested
# hashtables reflecting the path hierarchy, which are converted to JSON on output.
# Note that only a single JSON object is output if all paths share the same root
# component; otherwise, a JSON *array* is output.
function convert-PathsToNestedJsonObject([string[]] $paths) {
$hts = New-Object Collections.ArrayList
$paths.ForEach({
$rootName = $_.split('.')[0]
$ht = $hts.Where({ $_.name -eq $rootName }, 'First')[0]
if (-not $ht) { [void] $hts.Add(($ht = @{})) }
convert-PathToNestedHashtable $ht $_
})
$hts | ConvertTo-Json -Depth 100
}
# Recursive helper function that takes a path such as "appsystem.applications"
# and converts it into a nested hashtable with keys "name" and "children" to
# reflect the path hierarchy.
function convert-PathToNestedHashtable([hashtable] $ht, [string] $path) {
$name, $childName, $rest = $path -split '\.', 3
$ht.name = $name
if ($childName) {
if ($ht.children) {
$htChild = $ht.children.Where({ $_.name -eq $childName }, 'First')[0]
} else {
$ht.children = New-Object Collections.ArrayList
$htChild = $null
}
if (-not $htChild) {
[void] $ht.children.Add(($htChild = @{}))
}
convert-PathToNestedHashtable $htChild "$childName.$rest"
}
}
# Call the outer function with the input paths (assumed to be stored in $paths).
convert-PathsToNestedJsonObject $paths
[1] One deliberate type of optimization is applied, which, however, still keeps the code terse:
PSv4+ offers the (little-known) array methods .ForEach() and .Where(), which are not only noticeably faster than their cmdlet counterparts ForEach-Object and Where-Object, but also offer additional features.
Specifically:
$paths.ForEach({ ... }) is used instead of
$paths | ForEach-Object { ... }
$ht.children.Where({ $_.name -eq $childName }, 'First')[0] is used instead of
$ht.children | Where-Object { $_.name -eq $childName } | Select-Object -First 1
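A small side-by-side sketch of the two styles, using a made-up $nodes array of hashtables similar to the ones the functions above build:

```powershell
# Made-up node list; hashtable keys can be read with property syntax ($_.name).
$nodes = @(
    @{ name = 'applications' }
    @{ name = 'authentication' }
    @{ name = 'applications' }
)

# .Where() with 'First' stops at the first match; it returns a collection, hence the [0].
$viaMethod = $nodes.Where({ $_.name -eq 'applications' }, 'First')[0]

# Pipeline equivalent.
$viaPipeline = $nodes | Where-Object { $_.name -eq 'applications' } | Select-Object -First 1

# Both approaches return the very same hashtable instance (the first element).
[object]::ReferenceEquals($viaMethod, $viaPipeline)

# .ForEach() collects the script block's output, like ForEach-Object does.
$upper = $nodes.ForEach({ $_.name.ToUpper() })
$upper -join ','     # APPLICATIONS,AUTHENTICATION,APPLICATIONS

# No match: indexing the empty result with [0] yields $null, which the code above relies on.
$null -eq $nodes.Where({ $_.name -eq 'missing' }, 'First')[0]
```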