I'm processing JSON files in PowerShell, and it seems that ConvertFrom-Json changes case on its inputs only on some (rare) occasions.
For example, when I do:
$JsonStringSrc = '{"x":2.2737367544323206e-13,"y":1759,"z":33000,"width":664}'
$JsonStringTarget = $JsonStringSrc | ConvertFrom-Json | ConvertTo-Json -Depth 100 -Compress
$JsonStringTarget
It returns:
{"x":2.2737367544323206E-13,"y":1759,"z":33000,"width":664}
Lower case e became an uppercase E, messing up my hashes when validating proper i/o during processing.
Is this expected behavior (perhaps a regional setting)? Is there a setting for ConvertFrom-Json to leave my inputs alone for the output?
The problem lies in the way PowerShell's JSON library outputs CLR floating-point numbers. Converting from JSON turns the JSON string into a CLR/PowerShell object with associated types for numbers, strings and so on. Converting back to JSON serializes that object, but uses the .NET default formatter configuration to do so. No metadata from the original JSON document survives to guide the conversion. Rounding errors, truncation and a different element order may occur here too.
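The same effect can be reproduced outside PowerShell. As a minimal sketch in Python (standard json module; the sample document is made up for illustration), parsing keeps only the numeric value, so the serializer is free to pick its own spelling on the way back out:

```python
import json

# Made-up sample: the number is written with an upper-case exponent marker.
src = '{"x":1.0E2,"y":1759}'

obj = json.loads(src)                          # only the value 100.0 survives
out = json.dumps(obj, separators=(",", ":"))   # compact output, like -Compress

print(out)                                     # {"x":100.0,"y":1759}
print(out == src)                              # False: the text differs
print(json.loads(out) == json.loads(src))      # True: the values are equal
```

If the goal is only to check that nothing was lost in processing, comparing the parsed values (last line) may be a more robust test than hashing the text.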
The JSON spec for canonical form (the form you want to use when hashing) is as follows:
MUST represent all non-integer numbers in exponential notation
including a nonzero single-digit significant integer part, and
including a nonempty significant fractional part, and
including no trailing zeroes in the significant fractional part (other than as part of a “.0” required to satisfy the preceding point), and
including a capital “E”, and
including no plus sign in the exponent, and
including no insignificant leading zeroes in the exponent
Source: https://gibson042.github.io/canonicaljson-spec/
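To make the rules above concrete, here is a rough Python sketch of a formatter for a single number. The function name canonical_number is hypothetical, and printing integer-valued numbers as plain digits is my assumption, since the rules quoted above only cover non-integers. It is not a complete canonical JSON implementation.

```python
import math
from decimal import Decimal

def canonical_number(x: float) -> str:
    """Format one float following the non-integer rules quoted above."""
    if not math.isfinite(x):
        raise ValueError("JSON cannot represent NaN or Infinity")
    if x == int(x):
        # Assumption: integer values are written as plain digits.
        return str(int(x))
    # repr() gives the shortest digits that round-trip; Decimal splits them up.
    sign, digits, exp = Decimal(repr(x)).as_tuple()
    sci_exp = exp + len(digits) - 1                  # exponent for d.ddd form
    frac = "".join(map(str, digits[1:])).rstrip("0") or "0"
    return f"{'-' if sign else ''}{digits[0]}.{frac}E{sci_exp}"

print(canonical_number(2.2737367544323206e-13))  # 2.2737367544323206E-13
print(canonical_number(0.5))                     # 5.0E-1
print(canonical_number(-1234.5))                 # -1.2345E3
```

Note that the first result matches what ConvertTo-Json produced in the question, capital E included.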
The JSON grammar itself, however, allows both forms (e and E):
exponent
    ""
    'E' sign digits
    'e' sign digits
Source: https://www.crockford.com/mckeeman.html
You may be able to convert the object to JSON by using the Newtonsoft.Json classes directly and passing in a custom converter.
https://stackoverflow.com/a/28743082/736079
A better solution would probably be to use a specialized formatter component that directly manipulates the existing JSON document without converting it to CLR objects first.
Related
Get-Content 'file.json' | ConvertFrom-Json
This produces a different result in PowerShell 5 vs 7.
v5 gives me the actual timestamp values from the JSON, e.g. 2018-01-26T17:48:51.220Z
v7 gives me reprocessed timestamp values, e.g. 26/01/2018 17:48:51
How can I get v7 to behave as v5? I need the original values from the json.
The behavior of ConvertFrom-Json changed in PowerShell [Core] v6+: string values formatted with the o (round-trip) standard date/time format are now converted to [datetime] instances rather than being left as strings. This is a convenient way to round-trip timestamps via (v6+) ConvertTo-Json without having to do explicit to/from string conversions.
If you need the old behavior back, convert the resulting [datetime] instances back to strings explicitly, using .ToString('o').
Here's a simple example:
# v6+
PS> ('{ "timestamp": "2018-01-26T17:48:51.220Z" }' |
ConvertFrom-Json).timestamp.ToString('o')
2018-01-26T17:48:51.2200000Z
There is some flexibility around variations in the input format: the fractional seconds are optional and, if present, the number of decimal places is allowed to vary.
By contrast, the o format always uses 7 decimal places, which differs from your input.
You're free to apply custom formatting based on a fixed number of decimal places, but note that you won't be able to tell how many decimal places were actually used in the input.
E.g., to get 3 decimal places:
[datetime]::UtcNow.ToString("yyyy-MM-dd'T'HH':'mm':'ss'.'fffK")
If you want to prevent the to-[datetime] conversion at the source, you'll have to use a lower-level approach - ConvertFrom-Json doesn't offer a solution.
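The point about not being able to recover the original number of decimal places holds for any parser that turns the text into a date value. A small Python sketch of the same idea (Python's json module leaves the timestamp as a string, so the conversion is done by hand here; parse_utc and to_millis_text are hypothetical helper names):

```python
import json
from datetime import datetime, timezone

FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

def parse_utc(text: str) -> datetime:
    # %f accepts 1 to 6 fractional digits, so the original width is discarded.
    return datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)

def to_millis_text(dt: datetime) -> str:
    # Re-emit with a fixed 3 decimal places, like the 'fff' format above.
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"

raw = json.loads('{ "timestamp": "2018-01-26T17:48:51.220Z" }')["timestamp"]
print(to_millis_text(parse_utc(raw)))                           # 2018-01-26T17:48:51.220Z
print(parse_utc(raw) == parse_utc("2018-01-26T17:48:51.22Z"))   # True
```

The second line of output shows why the width is lost: two inputs with different numbers of fractional digits become the same date value.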
Trying to scrape a webpage, I hit the necessity to work with ASP.NET's __VIEWSTATE variables. So, ever the optimist, I decided to read up on those variables, and their formats. Even though classified as Open Source by Microsoft, I couldn't find any formal definition:
Everybody agrees the first step to do is decode the string, using a Base64 decoder. Great - that works...
Next - and this is where the confusion sets in:
Roughly 3/4 of the decoders seem to use binary values (characters whose values indicate the type of the field that follows). Here's an example of such a specification. This format also seems to expect a 'signature' of 0xFF 0x01 as the first two bytes.
The rest of the articles (such as this one) describe a format where the fields in the format are separated (or marked) by t< ... >, p< ... >, etc. (this seems to be the case of the page I'm interested in).
Even after looking at over a hundred pages, I didn't find any mention about the existence of two formats.
My questions are: Are there two different formats of __VIEWSTATE variables in use, or am I missing something basic? Is there any formal description of the __VIEWSTATE contents somewhere?
The view state is serialized and deserialized by the
System.Web.UI.LosFormatter class—the LOS stands for limited object
serialization—and is designed to efficiently serialize certain types
of objects into a base-64 encoded string. The LosFormatter can
serialize any type of object that can be serialized by the
BinaryFormatter class, but is built to efficiently serialize objects
of the following types:
Strings
Integers
Booleans
Arrays
ArrayLists
Hashtables
Pairs
Triplets
Everything you need to know about ViewState: Understanding View State
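For telling the two layouts described in the question apart, a rough Python sketch follows. Both markers (the 0xFF 0x01 signature and the letter-plus-"<" prefix such as t< or p<) are taken from the question, not from any formal specification, and the function name is hypothetical. Treat the result as a guess only.

```python
import base64

def guess_viewstate_layout(viewstate_b64: str) -> str:
    """Guess which layout a Base64-encoded __VIEWSTATE value uses."""
    raw = base64.b64decode(viewstate_b64)
    if raw[:2] == b"\xff\x01":
        return "binary"     # signature bytes mentioned in the question
    if raw[:1].isalpha() and raw[1:2] == b"<":
        return "text"       # fields marked like t< ... >, p< ... >
    return "unknown"

# Made-up payloads, encoded here only to exercise the function.
print(guess_viewstate_layout(base64.b64encode(b"\xff\x01\x0f\x05").decode()))  # binary
print(guess_viewstate_layout(base64.b64encode(b"t<p<l<a;>;>;>").decode()))     # text
```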
Here is the MWE, how to get the correct number as character.
require(jsonlite)
j <- "{\"id\": 323907258301939713}"
a <- fromJSON(j)
print(a$id, digits = 20)
class(a$id)
a$id <- as.character(a$id)
a$id
class(a$id)
Here is the output.
Loading required package: jsonlite
Loading required package: methods
[1] 323907258301939712
[1] "numeric"
[1] "323907258301939712"
[1] "character"
I want to get the exact number 323907258301939713 as character in a
In JavaScript, numbers are double precision floating point, even when they look like integers, so they only have about 16 decimal digits of precision. In particular, the JavaScript code:
console.log(12345678901234567 == 12345678901234568)
prints "true".
Strictly speaking, the JSON grammar puts no limit on the precision of numbers, but the format grew out of JavaScript, and RFC 8259 notes that good interoperability can only be expected within the range and precision of IEEE 754 doubles. So jsonlite is behaving reasonably when it reads the number as a double.
Because many JSON parsers share this limitation, if you have control over the program generating the JSON, you will save yourself pain and heartache down the road if you fix your JSON (for example, by representing the id attribute as a string):
{ "id": "323907258301939713" }
But, if you absolutely must parse this badly formed JSON, then you're in luck. The fromJSON function takes an undocumented boolean argument bigint_as_char which reads these large numbers into R as character values:
> a <- fromJSON(j, bigint_as_char=TRUE)
> print(a$id)
[1] "323907258301939713"
>
However, you must be prepared to handle both plain numbers and character values in the rest of your R code, since fromJSON will still read small integers as normal numbers and only read these too-big integers as strings.
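The precision loss itself is easy to reproduce in another language. In this Python sketch, parse_int=float is used to emulate a parser that reads every number as a double (Python's own json module keeps integers exact by default), and id_as_text is a hypothetical helper for the "both numbers and strings" situation described above:

```python
import json

j = '{"id": 323907258301939713}'

exact = json.loads(j)["id"]                        # Python ints are arbitrary precision
as_double = json.loads(j, parse_int=float)["id"]   # emulate a double-only parser

print(exact)           # 323907258301939713
print(int(as_double))  # 323907258301939712

def id_as_text(value) -> str:
    """Normalise an id that may arrive as a string or as an integer."""
    return value if isinstance(value, str) else str(value)

print(id_as_text(exact))                 # 323907258301939713
print(id_as_text("323907258301939713"))  # 323907258301939713
```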
I'm sorry if this is really trivial, but I've got a series of data as follows:
{"color":{"red":255,"blue":123,"green":1}}
I know it's in this format because, for some reason, it's easy to work with. What is this format called so that I might look it up?
\edit: If there is any significance to the organization of the data, of course.
That's JSON (JavaScript Object Notation), a text-based data serialisation format based on a subset of JavaScript syntax. To learn more about JSON, visit: http://json.org
In JSON, there are the following data types:
object
array
number
string
null
boolean
Objects are represented using the following syntax, and are key-value pairings, similar to a dictionary (the key must be a string):
{ "number": 1, "string": "test" }
Like dictionaries, objects are unordered.
An array is an ordered, heterogeneous data structure, represented using the following syntax:
[0, true, false, "1", null]
Numbers are what you'd expect, however unlike JavaScript itself they cannot be Infinity or NaN (i.e. they must be decimals or integers) and contain no leading 0s. Exponents are represented using the following format (the e is not case sensitive):
10e6
where 10 is the significand and 6 is the power of ten it is multiplied by - this is equivalent to 10000000 (10 million). The exponent section may have leading 0s, though there is not much point and it may lower compatibility with parsers which are not 100% compliant.
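These number rules can be checked against a real parser. A quick sketch using Python's standard json module (is_valid_json is a hypothetical helper; note that Python's parser accepts NaN and Infinity by default, which is a deviation from the JSON grammar, so that rule is not tested here):

```python
import json

def is_valid_json(text: str) -> bool:
    """Return True if Python's json module accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

print(json.loads("10e6"))                        # 10000000.0
print(json.loads("10E6") == json.loads("10e6"))  # True: e and E are equivalent
print(json.loads("1e06"))                        # 1000000.0 (leading 0 in exponent)
print(is_valid_json("012"))                      # False: leading 0 in the number
```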
Booleans are case sensitive and are both lowercase. In JSON, there are only two booleans:
true
false
To represent an intentionally left out or otherwise unknown field, use null (case sensitive too).
Strings must be delimited using double quotes (single quotes are invalid syntax), and single quotes need not be escaped.
"This string is valid, and it's alright to do this."
'No, this won't work'
'Nor will this.'
There are numerous escapes available using the backslash character - to use a literal backslash, use \\.
As JSON is a data transmission format, there is no syntax for comments available.
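The rules about strings, booleans, null and comments can be tried out the same way. Another small sketch with Python's json module (accepts is a hypothetical helper; other parsers may be more lenient than this one):

```python
import json

def accepts(text: str) -> bool:
    """Return True if Python's json module parses the text without error."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

print(accepts('"This string is valid, and it\'s alright to do this."'))  # True
print(accepts("'Nor will this.'"))          # False: single-quoted string
print(accepts("true"), accepts("True"))     # True False: booleans are lowercase
print(accepts("null"), accepts("NULL"))     # True False: null is lowercase too
print(accepts('{"a": 1} // a comment'))     # False: no comment syntax
print(json.loads(r'"a\\b"'))                # a\b  (\\ is a literal backslash)
```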
I am not asking for any libraries to do so and I am just writing code for bson_to_json and json_to_bson.
so here is the BSON specification.
For regular double, doc, array, string, it is fine and it is easy to convert between BSON and JSON.
However, for some particular types, such as:
Timestamp and UTC datetime:
When converting from JSON to BSON, how can I know they are a timestamp and a UTC datetime?
Regex (string, string), JavaScript code with scope (string, doc):
Their structures have multiple parts; how can I represent them in JSON?
Binary data (generic, function, etc.):
How can I represent the type of binary data in JSON?
int32 and int64:
How can I represent them in JSON so that the BSON side knows which is 32-bit and which is 64-bit?
Thanks
As we know, JSON has no way to express these object types, so you will need to decide how you want the stringified version of the BSON objects (field types) to be represented in the output of your OCaml driver.
Some of the data types are easy: Timestamp is not needed, since it is internal to sharding only, and JavaScript blocks are best left out, since they are best used only within system.js as saved functions for use in MRs.
You also have to consider that some of these fields go both in and out. What I mean by in and out is that some are used to specify input documents to be serialised to BSON, and some are part of output documents that need deserialising from BSON into JSON.
Regex is one field type you will most likely send down. As such, you will need to serialise your OCaml object to the BSON equivalent of {$regex: 'd', '$options': 'ig'} from the /d/ig PCRE representation.
Dates can be represented in JSON by choosing either the ISODate string or a timestamp. The output will be something like {$sec:556675,$usec:6787} and you can convert $sec to the display you need.
Binary data in JSON can be represented by taking the data (if I remember right) property from the output document, encoding it to base 64 and storing it as a string in the field.
int32 and int64 have no real distinction in JSON, except that a value outside the int32 range (above 2147483647 or below -2147483648) has to be a 64-bit int, so I am unsure whether you can keep the data types distinct there.
That should help get you started.
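One possible way to keep the type information, sketched here in Python rather than OCaml, is to wrap each typed value in a single-key object whose key names the BSON type. The key names below ($numberInt, $numberLong, $binary with base64 and subType) follow my recollection of MongoDB's Extended JSON convention and should be treated as an assumption; any consistent tagging scheme would work, and the function names are hypothetical.

```python
import base64

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def tag_int(value: int, force_int64: bool = False) -> dict:
    """Wrap an integer so the reader can tell int32 from int64."""
    if force_int64 or not INT32_MIN <= value <= INT32_MAX:
        return {"$numberLong": str(value)}
    return {"$numberInt": str(value)}

def tag_binary(data: bytes, subtype: int = 0) -> dict:
    """Wrap binary data as Base64 text plus its one-byte subtype in hex."""
    encoded = base64.b64encode(data).decode("ascii")
    return {"$binary": {"base64": encoded, "subType": f"{subtype:02x}"}}

def untag(value):
    """Recover (type name, payload...) from a tagged value."""
    if isinstance(value, dict):
        if "$numberInt" in value:
            return ("int32", int(value["$numberInt"]))
        if "$numberLong" in value:
            return ("int64", int(value["$numberLong"]))
        if "$binary" in value:
            b = value["$binary"]
            return ("binary", int(b["subType"], 16), base64.b64decode(b["base64"]))
    return ("plain", value)

print(tag_int(5))                            # {'$numberInt': '5'}
print(untag(tag_int(5, force_int64=True)))   # ('int64', 5)
print(untag(tag_binary(b"\x00\x01", 0x80)))  # ('binary', 128, b'\x00\x01')
```

Because the wrapper carries the type explicitly, a small value stored as int64 survives the round trip, which the range check alone cannot guarantee.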