PowerShell convert to JSON removes special characters

I have a problem saving a string to a JSON file.
$newY = "12313tytk1.xp1`F4i12313211ddsada;"
First I read the JSON file:
$a = Get-Content 'settings.json' -raw | ConvertFrom-Json
Then I update a field:
$a.X.y = $newY
And save the file:
$a | ConvertTo-Json -Depth 5 | set-content 'settings.json'
There are several problems:
After saving, Y in the file is wrong:
"12313tytk1.xp1F4i12313211ddsada;"
The special character is missing: `.
The file is badly formatted: too many spaces.
"<" and ">" are changed to \u003c and \u003e.
How can I fix this?

Backtick (`) is an escape character in PowerShell. Single-quoted strings (') are string literals, so their contents are not evaluated, escaped or the like. Double-quoted strings (") are evaluated, so the backtick is interpreted as an escape character. See about_Quoting_Rules for more information.
Consider:
PS C:\> $newY = "12313tytk1.xp1`F4i12313211ddsada;"
PS C:\> $newY # Misses the backtick
12313tytk1.xp1F4i12313211ddsada;
PS C:\> $newY2 = '12313tytk1.xp1`F4i12313211ddsada;'
PS C:\> $newY2 # Contains the backtick
12313tytk1.xp1`F4i12313211ddsada;
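Putting the pieces together for the original question, here is a minimal sketch of the full round-trip (assuming the settings.json structure from the question). The Regex.Unescape() call, also used in the answers below, turns \u003c and \u003e back into < and >; note that such a blanket unescape assumes the JSON contains no escape sequences that must stay escaped. ConvertTo-Json also supports -Compress if a minified file without the extra whitespace is acceptable.
# Single-quoted literal, so the backtick survives.
$newY = '12313tytk1.xp1`F4i12313211ddsada;'
$a = Get-Content 'settings.json' -Raw | ConvertFrom-Json
$a.X.y = $newY
# Unescape() restores \u003c / \u003e to < and > before saving.
$a | ConvertTo-Json -Depth 5 |
    ForEach-Object { [System.Text.RegularExpressions.Regex]::Unescape($_) } |
    Set-Content 'settings.json'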

Related

How to convert Cyrillic into UTF-16

tl;dr: Is there a way to convert Cyrillic stored in a hashtable into UTF-16?
Like кириллица into \u043a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430
I need to import a file, parse it into id and value, then convert it into .json, and now I'm struggling to find a way to convert the value into UTF codes.
And yes, it is needed that way.
cyrillic.txt:
1 кириллица
PowerShell:
clear-host
foreach ($line in (Get-Content C:\Users\users\Downloads\cyrillic.txt)){
$nline = $line.Split(' ', 2)
$properties = @{
'id'= $nline[0] #stores "1" from file
'value'=$nline[1] #stores "кириллица" from file
}
$temp+=New-Object PSObject -Property $properties
}
$temp | ConvertTo-Json | Out-File "C:\Users\user\Downloads\data.json"
Output:
[
{
"id": "1",
"value": "кириллица"
},
]
Needed:
[
{
"id": "1",
"value": "\u043a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430"
},
]
At this point, as a newcomer to PowerShell, I have no idea even how to search for this properly.
Building on Jeroen Mostert's helpful comment, the following works robustly, assuming that the input file contains no NUL characters (which is usually a safe assumption for text files):
# Sample value pair; loop over file lines omitted for brevity.
$nline = '1 кириллица'.Split(' ', 2)
$properties = [ordered] @{
id = $nline[0]
# Insert aux. NUL characters before the 4-digit hex representations of each
# code unit, to be removed later.
value = -join ([uint16[]] [char[]] $nline[1]).ForEach({ "`0{0:x4}" -f $_ })
}
# Convert to JSON, then remove the escaped representations of the aux. NUL chars.,
# resulting in proper JSON escape sequences.
# Note: ... | Out-File ... omitted.
(ConvertTo-Json @($properties)) -replace '\\u0000', '\u'
Output (pipe to ConvertFrom-Json to verify that it works):
[
{
"id": "1",
"value": "\u043a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430"
}
]
Explanation:
[uint16[]] [char[]] $nline[1] converts the [char] instances of the string stored in $nline[1] into the underlying UTF-16 code units (a .NET [char] is in effect an unsigned 16-bit integer representing a UTF-16 code unit).
Note that this works even with Unicode characters that have code points above 0xFFFF, i.e. that are too large to fit into a [uint16]. Such characters outside the so-called BMP (Basic Multilingual Plane), e.g. 👍, are simply represented as pairs of UTF-16 code units, so-called surrogate pairs, which a JSON processor should recognize (ConvertFrom-Json does).
However, on Windows such chars. may not render correctly, depending on your console window's font. The safest option is to use Windows Terminal, available in the Microsoft Store.
The call to the .ForEach() array method processes each resulting code unit:
"`0{0:x4}" -f $_ uses an expandable string to create a string that starts with a NUL character ("`0"), followed by a 4-digit hex. representation (x4) of the code unit at hand, created via -f, the format operator.
This trick of replacing what should ultimately be a verbatim \u prefix temporarily with a NUL character is needed, because a verbatim \ embedded in a string value would invariably be doubled in its JSON representation, given that \ acts as the escape character in JSON.
The result is something like "<NUL>043a", which ConvertTo-Json transforms as follows, given that it must escape each NUL character as \u0000:
"\u0000043a"
The result from ConvertTo-Json can then be transformed into the desired escape sequences simply by replacing \u0000 (escaped as \\u0000 for use with the regex-based -replace operator) with \u, e.g.:
"\u0000043a" -replace '\\u0000', '\u' # -> "\u043a", i.e. к
Here's a way that simply saves the string to a UTF-16BE file and then reads the bytes back out and formats them, skipping the first 2 bytes, which are the BOM (\ufeff). $_ didn't work by itself. Note that there are two UTF-16 encodings with different byte orders, big-endian and little-endian. The range of Cyrillic is U+0400..U+04FF. Added -NoNewline.
'кириллица' | set-content utf16be.txt -encoding BigEndianUnicode -nonewline
$list = get-content utf16be.txt -Encoding Byte -readcount 2 |
% { '\u{0:x2}{1:x2}' -f $_[0],$_[1] } | select -skip 1
-join $list
\u043a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430
There must be a simpler way of doing this, but this could work for you:
$temp = foreach ($line in (Get-Content -Path 'C:\Users\users\Downloads\cyrillic.txt')){
$nline = $line.Split(' ', 2)
# output an object straight away so it gets collected in variable $temp
[PsCustomObject]@{
id = $nline[0] #stores "1" from file
value = (([system.Text.Encoding]::BigEndianUnicode.GetBytes($nline[1]) |
ForEach-Object {'{0:x2}' -f $_ }) -join '' -split '(.{4})' -ne '' |
ForEach-Object { '\u{0}' -f $_ }) -join ''
}
}
($temp | ConvertTo-Json) -replace '\\\\u', '\u' | Out-File 'C:\Users\user\Downloads\data.json'
Simpler using .ToCharArray():
$temp = foreach ($line in (Get-Content -Path 'C:\Users\users\Downloads\cyrillic.txt')){
$nline = $line.Split(' ', 2)
# output an object straight away so it gets collected in variable $temp
[PsCustomObject]@{
id = $nline[0] #stores "1" from file
value = ($nline[1].ToCharArray() | ForEach-Object {'\u{0:x4}' -f [uint16]$_ }) -join ''
}
}
($temp | ConvertTo-Json) -replace '\\\\u', '\u' | Out-File 'C:\Users\user\Downloads\data.json'
Value "кириллица" will be converted to \u043a\u0438\u0440\u0438\u043b\u043b\u0438\u0446\u0430

PowerShell string: keep CR & LF

I want to insert a blob into a MySQL database via "MySql.Data.MySqlClient.MySqlConnection". It works fine, but PowerShell removes the newline characters from my variable:
$configString = Get-Content config.xml
$configString > test5.xml
$strAddConfig = "INSERT INTO config_xml(Version,ConfigXML,MD5,Comment,ClientMinVersion) VALUES('2','" + $($configString) + "','$configMD5','BLABLA','5.0.0.0')"
When I pipe the $configString variable into a text file, the CR & LF characters are kept; the newline characters are also preserved when I insert the blob via copy-paste. I tried the $strAddConfig line with different quote combinations, but I wasn't successful.
Thanks for the help.
# Read the text content *as a single, multi-line string*, using -Raw
$configString = Get-Content -Raw config.xml
# Use an expandable here-string for better formatting.
$strAddConfig = #"
INSERT INTO config_xml(Version,ConfigXML,MD5,Comment,ClientMinVersion)
VALUES('2','$configString','$configMD5','BLABLA','5.0.0.0')
"#
Your (primary) problem was the use of Get-Content config.xml without -Raw, which stored an array of lines in $configString, and when an array is used in an expandable string ("...", with string interpolation), its (stringified) elements are space-separated:
PS> $array = 'one', 'two'; "$array"
one two
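As an aside, the separator used when an array is stringified inside "..." is the $OFS preference variable, which defaults to a single space:
PS> $array = 'one', 'two'; $OFS = '; '; "$array"
one; two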

PowerShell not able to convert values containing "&" to JSON

RoleFullPath
Applications\User Admin & Support-DEMO
PowerShell Code
$NewJSON.roleFullPath = $Line.RoleFullPath
.
.
.
.
$JSONPath = $RolePath + $FolderName + "-JSON.json"
Convertto-JSON $NewJSON | Out-file -Encoding "UTF8" $JSONPath
Output:
"roleFullPath": "Applications\\User Admin \u0026 Support-DEMO"
While converting from CSV to JSON, the character '&' gets converted to '\u0026'.
Any help?
In Windows PowerShell v5.1, ConvertTo-Json indeed unexpectedly encodes & characters as Unicode escape sequence \u0026, where 0026 represents hex. number 0x26, the Unicode code point representing the & character, U+0026.
(PowerShell Core, by contrast, preserves the & as-is.)
That said, JSON parsers should be able to interpret such escape sequences and, indeed, the complementary ConvertFrom-Json cmdlet is.
Note: The solutions below are general ones that can handle the Unicode escape sequences of any Unicode character; since ConvertTo-Json seemingly only uses these Unicode escape-sequence representations for the characters &, ', < and >, a simpler solution is possible, unless false positives must be ruled out - see this answer.
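For illustration, that simpler approach might look like the following sketch (assuming, per the note above, that only &, ', < and > were escaped; it can yield false positives on a literal \\u0026 substring, and $obj is a placeholder for your input object):
$json = $obj | ConvertTo-Json -Depth 5
$json = $json -replace '\\u0026', '&' -replace '\\u0027', "'"
$json = $json -replace '\\u003c', '<' -replace '\\u003e', '>'
$json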
That said, if you do want to manually convert Unicode escape sequences into their character equivalents in JSON text, you can use the following (limited) solution:
# Sample JSON with Unicode escapes.
$json = '{ "roleFullPath": "Applications\\User Admin \u0026 Support-DEMO" }'
# Replace Unicode escapes with the chars. they represent,
# with limitations.
[regex]::replace($json, '\\u[0-9a-fA-F]{4}', {
param($match) [char] [int] ('0x' + $match.Value.Substring(2))
})
The above yields:
{ "roleFullPath": "Applications\\User Admin & Support-DEMO" }
Note how \u0026 was converted to the char. it represents, &.
A robust solution requires more work:
There are characters that must be escaped in JSON and cannot be represented literally, so in order for the to-character conversion to work generically, these characters must be excluded.
Additionally, false positives must be avoided; e.g., \\u0026 is not a valid Unicode escape sequence, because a JSON parser interprets \\ as an escaped \ followed by verbatim u0026.
Finally, the Unicode sequences for " and \ must be translated into their escaped forms, \" and \\, and it is possible to represent a few ASCII-range control characters by C-style escape sequences, e.g., \t for a tab character (\u0009).
The following robust solution addresses all these issues:
# Sample JSON with Unicode escape sequences:
# \u0026 is &, which CAN be converted to the literal char.
# \u000a is a newline (LF) character, which CANNOT be converted, but can
# be translated to escape sequence "\n"
# \\u0026 is *not* a Unicode escape sequence and must be preserved as-is.
$json = '{
"roleFullPath": "Applications\u000aUser Admin \u0026 Support-DEMO-\\u0026"
}'
[regex]::replace($json, '(?<=(?:^|[^\\])(?:\\\\)*)\\u([0-9a-fA-F]{4})', {
param($match)
$codePoint = [int] ('0x' + $match.Groups[1].Value)
if ($codePoint -in 0x22, 0x5c) {
# " or \ must be \-escaped.
'\' + [char] $codePoint
}
elseif ($codePoint -in 0x8, 0x9, 0xa, 0xc, 0xd) {
# Control chars. that can be represented as short, C-style escape sequences.
('\b', '\t', '\n', $null, '\f', '\r')[$codePoint - 0x8]
}
elseif ($codePoint -le 0x1f -or [char]::IsSurrogate([char] $codePoint)) {
# Other control chars. and halves of surrogate pairs must be retained
# as escape sequences.
# (Converting surrogate pairs to a single char. would require much more effort.)
$match.Value
}
else {
# Translate to literal char.
[char] $codePoint
}
})
Output:
{
"roleFullPath": "Applications\nUser Admin & Support-DEMO-\\u0026"
}
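To verify, you can parse the result back through ConvertFrom-Json (here $fixedJson is a hypothetical variable capturing the output of the [regex]::replace() call above):
($fixedJson | ConvertFrom-Json).roleFullPath
# -> Applications
#    User Admin & Support-DEMO-\u0026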
To stop PowerShell from doing this, pipe your JSON output through this:
$jsonOutput | ForEach-Object { [System.Text.RegularExpressions.Regex]::Unescape($_) } | Set-Content $jsonPath -Encoding UTF8;
This will prevent the & from being converted :)

JSON slashes and backslashes in a string in the Bourne shell

I am trying to parse json files that contain sequences of slashes and backslashes in some of their strings like this:
echo '{"tag_string":"/\/\/\ test"}' | jq
which gives me:
parse error: Invalid escape at line 1, column 27
I have tried escaping with backslashes at different positions, but I can't seem to find a correct way. How do I output the string as it is, without removing any character or getting errors?
This only works on bash, but not sh (or zsh):
echo '{"tag_string":"/\\/\\/\\ test"}' | jq -r '.tag_string'
/\/\/\ test
A forward slash character is legal, but a single backslash character is not. According to the json.org char description, the valid chars are:
char
any-Unicode-character-except-"-or-\-or-control-character
\"
\\
\/
\b
\f
\n
\r
\t
\u four-hex-digits
So in your example, the single backslashes are not legal; you need "\\" (a doubled backslash, which JSON interprets as a single literal backslash), or you need to remove them entirely.
If you are trying to include literal backslashes:
(bash)
echo '{"tag_string":"/\\/\\/\\ test"}' | jq
{
"tag_string": "/\\/\\/\\ test"
}
echo '{"tag_string":"/\\/\\/\\ test"}' | jq -r '.["tag_string"]'
/\/\/\ test
(sh)
echo '{"tag_string":"/\\\\/\\\\/\\\\ test"}' | jq -r '.["tag_string"]'
/\/\/\ test
printf "%s" '{"tag_string":"/\\/\\/\\ test"}' | jq -r '.["tag_string"]'
/\/\/\ test
If you are trying to convert a file with non-JSON strings, then consider a tool such as any-json. Using the "cson-to-json" mode, "\/" will be interpreted as "/":
$ any-json -format=cson
Input:
{"tag_string":"/\/\/\ test"}
Output:
{
"tag_string": "/// test"
}

PowerShell: ConvertTo-Json problem containing special characters

I am writing a script to make changes to a JSON file, but when the file is converted back to JSON, it escapes special characters.
For example, the JSON file contains passwords with "&". A quick way to replicate the problem is the following command:
PS> "Password&123" | ConvertTo-Json
The output is: "Password\u0026123"
## Here is how I import the JSON file:
$jsonfile = (Get-Content .\example.json -Encoding Ascii) -join "`n" | ConvertFrom-Json
## Exporting the JSON file without modifying it:
$jsonfile | ConvertTo-Json | Out-File "new.json"
## Here is an example of a simplified JSON file:
{
"Server1":
{
"username":"root",
"password":"Password&dfdf"
},
"Server2":
{
"username":"admin",
"password":"Password&1234"
}
}
Try the Unescape() method:
$jsonfile | ConvertTo-Json | % { [System.Text.RegularExpressions.Regex]::Unescape($_) } | Out-File "new.json"
This is caused by the automatic character-escaping feature of ConvertTo-Json, and it affects several symbols such as <>\'&.
ConvertFrom-Json will read the escaped characters properly. Using your example:
PS C:\> {"Password\u0026123"} | ConvertFrom-Json
Password&123
And your example code results in a file that has escaped characters, but ConvertFrom-Json can read it back to the original passwords. See below:
PS C:\> (Get-Content .\example.json -Encoding Ascii) -join "`n" | ConvertFrom-Json
Server1 Server2
------- -------
@{username=root; password=Password&dfdf} @{username=admin; password=Password&1234}
PS C:\> (Get-Content .\new.json -Encoding Ascii) -join "`n" | ConvertFrom-Json
Server1 Server2
------- -------
@{username=root; password=Password&dfdf} @{username=admin; password=Password&1234}
If you need the passwords to be stored unescaped, some fancier work may be needed. See this thread about Converting Unicode strings to escaped ASCII strings.
Alternatively, avoid affected characters if possible.
Tested with PowerShell 7.2: it appears Unicode and other special characters are converted successfully. Indentation also appears improved.
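For example (assuming PowerShell 7+, per the note above):
PS> "Password&123" | ConvertTo-Json
"Password&123"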