Ruby: how to parse a JSON file with Array keys

I'm trying to get a Hash from a JSON file that has Array keys, but each Array key comes back as a String.
hash = {[10, 10] => [[1, 1], [5, 5]]}
p JSON.parse(hash.to_json) #=> {"[10, 10]" => [[1, 1], [5, 5]]}
Maybe I should use YAML? Any ideas?

There are three slightly different versions of JSON, as specified by
The original JSON.org website
ECMA-404 – The JSON Data Interchange Syntax
RFC8259 – The JavaScript Object Notation (JSON) Data Interchange Format
While there are small differences between the three, there is one thing they all agree on: Object Keys are Strings. Always.
In other words, "a Json file that has Array Keys" cannot possibly exist. Whatever you have, it is either a JSON file, but then it cannot have Array Keys, or it is simply not a JSON file.
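If the goal is just to round-trip a Ruby Hash whose keys are Arrays, one possible workaround is to stop using JSON Object keys for them at all: serialize the Hash as an array of [key, value] pairs and rebuild the Hash after parsing. This is only a sketch under that assumption (you control both the writer and the reader of the file); the variable names are illustrative.

```ruby
require "json"

hash = { [10, 10] => [[1, 1], [5, 5]] }

# Serialize the Hash as an array of [key, value] pairs. The Array key is then
# an ordinary JSON array instead of being stringified into an Object key.
json = JSON.generate(hash.to_a)

# Parsing yields the nested pairs again; to_h rebuilds a Hash with Array keys.
restored = JSON.parse(json).to_h

p restored
```

The trade-off is that the file no longer contains a JSON Object at the top level, so other consumers have to know about the pair convention.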

Related

What is the difference between as_json vs JSON.parse?

Could someone please explain the difference between as_json and JSON.parse?
Is the only difference that JSON.parse is part of the Ruby standard library and as_json is part of Rails' ActiveSupport?
Rails console:
irb(main):001:0> json_str = "{\"foo\": {\"bar\": 1, \"baz\": 2}, \"bat\": [0, 1, 2]}"
irb(main):002:0> puts JSON.parse(json_str)
{"foo"=>{"bar"=>1, "baz"=>2}, "bat"=>[0, 1, 2]}
irb(main):003:0> puts json_str.as_json
{"foo": {"bar": 1, "baz": 2}, "bat": [0, 1, 2]}
Ref:
https://ruby-doc.org/stdlib-3.1.2/libdoc/json/rdoc/JSON/Ext/Parser.html#method-i-parse
https://api.rubyonrails.org/classes/ActiveModel/Serializers/JSON.html#method-i-as_json
The two methods are in some sense exact opposites of each other:
JSON.parse parses a Ruby String containing a JSON Document into a Ruby object corresponding to the JSON Value described by the JSON document. So, it goes from JSON to Ruby.
as_json returns a simplified representation of a complex Ruby object that uses only Ruby types that can be easily represented as JSON Values. In other words, it turns an arbitrarily complex Ruby object into a Ruby object that uses only Hash, Array, String, Integer, Float, true, false, and nil which correspond to the JSON types object (really a dictionary), array, string, number, boolean, and null. The intent is that you can then easily serialize this simplified representation to JSON. So, as_json goes from Ruby (halfway) to JSON. In other words, the opposite direction from JSON.parse.
Apart from operating in opposite directions, there are some minor other differences as well:
JSON.parse is a concrete method, whereas as_json is an abstract protocol that is implemented by many different kinds of objects. (Similar to e.g. each in Ruby).
JSON.parse is part of the Ruby standard library (but not the core library; more precisely, it is part of the json gem, which is a default gem). The as_json protocol comes from Rails: ActiveSupport defines it for the core Ruby classes and ActiveModel's Serializers API defines it for models, i.e. it is part of Rails, not Ruby.
So, why does as_json exist in the first place? Why this two-step process of converting complex Ruby objects to simpler Ruby objects and then to a JSON Document instead of going straight from complex Ruby objects to a JSON Document? Well, if you have complex Ruby objects, chances are that no object actually fully knows how to serialize itself as a JSON Document. It has to first ask its constituent objects to serialize themselves, and then stitch it all together, and this applies recursively to the constituent objects as well. With all this stitching together of JSON Documents, there is a real risk of producing an invalid JSON Document or double-encoding some part of it, or something along those lines.
Basically, once you have serialized something to a JSON Document, then all you have is a String and all you can do is String manipulation. Whereas, if you have a richer Ruby object like Hash, Array, Integer, etc., then you can use that object's methods as well. Imagine, for example, having to merge two JSON Documents containing JSON Objects as a String compared to simply merging two Ruby Hashes.
So, the idea is to use as_json first to create a Ruby object that is simpler and less powerful than the original, but still much more powerful than a simple String. And only once you have assembled the entire thing, do you use to_json to serialize it to a JSON Document. (Or rather, the serialization framework does that for you.)
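The two-step idea can be sketched in plain Ruby without Rails. The Engine and Car classes below are hypothetical, and their hand-written as_json methods only imitate the protocol (the Rails version also accepts an options argument); the point is just that the pieces are combined as Hashes and serialized once at the end.

```ruby
require "json"

# Hypothetical classes for illustration; as_json is hand-rolled here,
# it is not the Rails implementation.
class Engine
  def initialize(horsepower)
    @horsepower = horsepower
  end

  # Step 1 for the part: reduce the object to JSON-friendly Ruby types.
  def as_json
    { "horsepower" => @horsepower }
  end
end

class Car
  def initialize(name, engine)
    @name = name
    @engine = engine
  end

  # Step 1 for the whole: ask the constituent object for its simple form
  # and combine the results as Hashes, not as Strings.
  def as_json
    { "name" => @name, "engine" => @engine.as_json }
  end
end

car = Car.new("Roadster", Engine.new(120))

# Still a Hash, so Hash methods such as merge are available.
simple = car.as_json.merge("color" => "red")

# Step 2: serialize exactly once, at the very end.
document = JSON.generate(simple)

# Going the other way with JSON.parse gives back the simple representation.
round_trip = JSON.parse(document)

puts document
```

Because nothing is a String until the final JSON.generate call, there is no opportunity to double-encode a nested part.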
JSON.parse() parses the given JSON string and converts it to a Ruby object, while as_json returns a Hash representing the model.
user = User.first
user.as_json
=> {"id"=>1, "email"=>"fa18-bcs-215#cuilahore.edu.pk",
"name"=>"Noman", "user_type"=>"Manager"}
If we apply as_json to a String, it simply returns that String:
json_str = "{\"foo\": {\"bar\": 1, \"baz\": 2}, \"bat\": [0, 1, 2]}"
json_str.as_json
=> "{\"foo\": {\"bar\": 1, \"baz\": 2}, \"bat\": [0, 1, 2]}"
If we apply JSON.parse() to that String, it returns a Hash:
JSON.parse(json_str)
=> {"foo"=>{"bar"=>1, "baz"=>2}, "bat"=>[0, 1, 2]}
JSON.parse parses a json string.
.as_json is a serialization method available to all data types, not just strings.
E.g. from the Rails docs:
user = User.find(1)
user.as_json
# => { "id" => 1, "name" => "Konata Izumi", "age" => 16,
# "created_at" => "2006-08-01T17:27:133.000Z", "awesome" => true}
But JSON.parse won't handle that:
user = User.find(1)
JSON.parse(user)
--> no implicit conversion of User into String (TypeError)
Edit
Edited based on a comment from @Jorg.
JSON.parse does not return specifically a Hash.
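To illustrate that last point, here is a small sketch showing that the class of the result depends on the top-level JSON Value. It assumes the json gem 2.x that ships with current Rubies; older 1.x versions rejected top-level scalars.

```ruby
require "json"

object_result = JSON.parse('{"a": 1}')   # JSON object -> Hash
array_result  = JSON.parse('[1, 2, 3]')  # JSON array  -> Array
string_result = JSON.parse('"hello"')    # JSON string -> String
number_result = JSON.parse('42')         # JSON number -> Integer
null_result   = JSON.parse('null')       # JSON null   -> nil

# Print each parsed value together with its Ruby class.
[object_result, array_result, string_result, number_result, null_result].each do |value|
  puts "#{value.inspect} (#{value.class})"
end
```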

How to parse nested json-array in streaming fashion with zio-json

For a json array like this:
[
my-json-obj1,
my-json-obj2,
my-json-obj3,
....
my-json-objN
]
and a MyJsonObj class that represents the mapping of a single object in the array, I can write:
val myJson = """[...]"""
ZStream
.fromIterable(myJson.toSeq)
.via(JsonDecoder[MyJsonObj].decodeJsonPipeline(JsonStreamDelimiter.Array))
to parse that array in a "streaming" way, i.e. emit mapped objects as they are parsed from the input, as opposed to reading all the input first and then extracting the objects.
How can I do the same if the array is nested inside a json object say like this?:
{
"hugeArray":
[
my-json-obj1,
my-json-obj2,
my-json-obj3,
....
my-json-objN
]
}
I trawled through the zio-json source code, but I can't find any foothold there for this use case. I guess I could carve out that array from the json document and feed that to decodeJsonPipeline. Is there a better, json-syntax-aware way of doing this? If not directly in zio-json, perhaps with the help of some other open source json library?

JSON deserialize with newtonsoft [duplicate]

I have seen the terms "deserialize" and "serialize" with JSON. What do they mean?
JSON is a format that encodes objects in a string. Serialization means to convert an object into that string, and deserialization is its inverse operation (convert string -> object).
When transmitting data or storing them in a file, the data are required to be byte strings, but complex objects are seldom in this format. Serialization can convert these complex objects into byte strings for such use. After the byte strings are transmitted, the receiver will have to recover the original object from the byte string. This is known as deserialization.
Say, you have an object:
{foo: [1, 4, 7, 10], bar: "baz"}
serializing into JSON will convert it into a string:
'{"foo":[1,4,7,10],"bar":"baz"}'
which can be stored or sent over the wire to anywhere. The receiver can then deserialize this string to get back the original object: {foo: [1, 4, 7, 10], bar: "baz"}.
Serialize and Deserialize
In the context of data storage, serialization (or serialisation) is the process of translating data structures or object state into a format that can be stored (for example, in a file or memory buffer) or transmitted (for example, across a network connection link) and reconstructed later. [...]
The opposite operation, extracting a data structure from a series of bytes, is deserialization.
– wikipedia.org
JSON
JSON (JavaScript Object Notation) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable values). It is a common data format with diverse uses in electronic data interchange, including that of web applications with servers.
JSON is a language-independent data format. It was derived from JavaScript, but many modern programming languages include code to generate and parse JSON-format data. JSON filenames use the extension .json.
– wikipedia.org
Explained using Python
In Python, serialization does nothing more than convert the given data structure into its valid JSON counterpart (e.g., Python's True is converted to JSON's true, and the dictionary itself is converted to a string), and vice versa for deserialization.
Python vs. JSON
You can easily spot the difference between Python and JSON representations in a side-by-side comparison. For example, by examining their Boolean values. Have a look at the following table for the basic types used in both contexts:
Python | JSON
True | true
False | false
None | null
int, float | number
str (with single ', double " and triple """ quotes) | string (only double " quotes)
dict | object
list, tuple | array
Code Example
Python's built-in json module is the standard way to do serialization and deserialization:
import json
data = {
    'president': {
        "name": """Mr. Presidente""",
        "male": True,
        'age': 60,
        'wife': None,
        'cars': ('BMW', "Audi")
    }
}
# serialize
json_data = json.dumps(data, indent=2)
print(json_data)
# {
# "president": {
# "name": "Mr. Presidente",
# "male": true,
# "age": 60,
# "wife": null,
# "cars": [
# "BMW",
# "Audi"
# ]
# }
# }
# deserialize
restored_data = json.loads(json_data)
Sources: realpython.com, geeksforgeeks.org
Explanation of Serialize and Deserialize using Python
In Python, the pickle module is used for serializing Python objects, so the serialization process is called pickling. The module is part of the Python standard library.
Serialization using pickle
import pickle
# the object to serialize
example_dic = {1: "6", 2: "2", 3: "f"}
# open the target file in "wb" (write binary) mode;
# the with block closes the file automatically
with open("dict.pickle", "wb") as pickle_out:
    # time to dump
    pickle.dump(example_dic, pickle_out)
The PICKLE file (can be opened by a text editor like notepad) contains this (serialized data):
€}q (KX 6qKX 2qKX fqu.
Deserialization using pickle
import pickle
# open the file in "rb" (read binary) mode and load the object back
with open("dict.pickle", "rb") as pickle_in:
    get_deserialized_data_back = pickle.load(pickle_in)
print(get_deserialized_data_back)
Output:
{1: '6', 2: '2', 3: 'f'}
Sharing what I learned about this topic.
What is Serialization
Serialization is the process of converting a data object into a byte stream.
What is byte stream
A byte stream is just a stream of binary data. It is needed because only binary data can be stored or transported.
What is byte string vs byte stream
Sometimes you will see people use the term byte string as well. A byte string is a sequence of bytes, for example the bytes that result from encoding a text string. That helps explain what JSON is, below.
What’s the relationship between JSON and serialization
JSON is a text format for representing data. JSON text is encoded in UTF-8, so while we see human-readable strings, behind the scenes those strings are stored and transmitted as UTF-8 bytes.
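The relationship between the readable JSON text and the bytes behind it can be made visible with a short Ruby sketch. The sample data is made up, and the non-ASCII letter is written as an escape only so the snippet does not depend on the source file's encoding.

```ruby
require "json"

# "Zo\u00EB" contains one non-ASCII character (e with diaeresis).
data = { "name" => "Zo\u00EB" }

# Serialize: Ruby object -> JSON text (a String encoded as UTF-8).
text = JSON.generate(data)

# The same text seen as raw bytes, which is what is stored or transmitted.
bytes = text.bytes

puts text.encoding   # the encoding of the JSON text
puts text.length     # number of characters
puts text.bytesize   # number of bytes (one more, since the non-ASCII letter takes two)

# Deserialize: raw bytes -> UTF-8 text -> Ruby object.
received = bytes.pack("C*").force_encoding("UTF-8")
restored = JSON.parse(received)
```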

KDB: How to parse a json file?

I created a config file in JSON format, and I want to use KDB to read it in as a dictionary.
In Python, it's so easy:
import json
with open('data.json') as f:
    data = json.load(f)
Is there a similar function in KDB?
To read your JSON file into kdb+, you should use read0. This returns the lines of the file as a list of strings.
q)read0`:sample.json
,"{"
"\"name\":\"John\","
"\"age\":30,"
"\"cars\":[ \"Ford\", \"BMW\", \"Fiat\" ]"
,"}"
kdb+ allows for the de-serialisation (and serialisation) of JSON objects to dictionaries using the .j namespace. The inbuilt .j.k expects a single string of characters containing json and converts this into a dictionary. A raze should be used to flatten our list of strings:
q)raze read0`:sample.json
"{\"name\":\"John\",\"age\":30,\"cars\":[ \"Ford\", \"BMW\", \"Fiat\" ]}"
Finally, using .j.k on this string yields the dictionary
q).j.k raze read0`:sample.json
name| "John"
age | 30f
cars| ("Ford";"BMW";"Fiat")
For a particularly large JSON file, it may be more efficient to use read1 rather than raze read0 on your file, e.g.
q).j.k read1`:sample.json
name| "John"
age | 30f
cars| ("Ford";"BMW";"Fiat")
If you're interested in the reverse operation, you can use .j.j to convert a dictionary into a JSON string and use 0: to save it.
Further information on the .j namespace can be found here.
You can also see more examples on the Kx wiki of read0, read1 and 0:.
Working with JSON is handled by the .j namespace, where .j.j serialises and .j.k deserialises the messages. Note that you will need to use raze to convert the JSON into a single string first.
There is more information available on the Kx wiki, where the following example is presented:
q).j.k "{\"a\":[0,1],\"b\":[\"hello\",\"world\"]}"
a| 0 1
b| "hello" "world"
When using .j.j, both symbols and strings in kdb are encoded as JSON strings. When decoding, kdb turns JSON strings into kdb strings, except for object keys, which become symbols.
For kdb to decode JSON into a table, send an array of objects with identical keys. kdb likewise encodes tables as arrays of objects in JSON.
q).j.k "[{\"a\":1,\"b\":2},{\"a\":3,\"b\":4}]"
a b
---
1 2
3 4
When encoding, q uses the value of \P to choose the precision. The default is 7, which could lead to unwanted rounding.
This can be changed; 0 means maximum precision, although the final digits are unreliable, as shown below. For more info, see https://code.kx.com/q/ref/cmdline/#-p-display-precision.
q).j.j 1.000001 1.0000001f
"[1.000001,1]"
q)\P 0
q).j.j 1.000001 1.0000001f
"[1.0000009999999999,1.0000001000000001]"

JSON decoding in PERL - Maintaining the original data type

I am writing a simple perl script to read JSON from a file and insert into MongoDB. But I am facing issues with json decoding.
All non-string values in my original json are getting converted to object type after decode_json.
Input JSON (only part of it, since the original is huge) -
{
"_id": 2006010100000801089,
"show_image" : false,
"event" : "publish",
"publish_date" :1136091600,
"data_version" : 1
}
JSON that gets inserted to MongoDB -
{
"_id": NumberLong("2006010100000801089"),
"show_image" : BinData(0,"MA=="),
"event" : "publish",
"publish_date" :NumberLong(1136091600),
"data_version" : NumberLong(1)
}
I am providing the custom _id for the documents, which I want to get converted to NumberLong type. That is working as expected, as you can see from the JSON above. But notice how the other non-string values for show_image, publish_date and data_version got converted to their object representations.
Is there any way I can retain the original type for these values?
Perl code snippet that does the insert -
use MongoDB;
use MongoDB::OID;
use JSON;
use JSON::XS;
# decode each line of the source file and insert it as one document
while (my $record = <$source_file>) {
    my $record_decoded = decode_json($record);
    $db_collection->insert($record_decoded);
}
Perl version used v5.18.2.
I looked up JSON::XS docs but couldn't find a way to do this. Any help is appreciated. Thanks in advance!
I am very new to perl. Sorry if this is a trivial question.
I am providing the custom _id for the documents, which I want to get converted to NumberLong type. That is working as expected, as you can see from the JSON above. But notice how the other non-string values for show_image, publish_date and data_version got converted to their object representations.
From your example all of the data types are actually matching aside from the boolean value for show_image which is currently being converted to binary data.
It is expected that numeric types are displayed as NumberLong or NumberInt when queried from the mongo shell. The mongo shell uses JavaScript, which only has a single numeric type of Number (64-bit floating point). Shell helpers like NumberLong() and NumberInt() are used to represent values in MongoDB's BSON data types that do not have a native JavaScript equivalent.
Referring to my sample JSON, I want value of show_image to be inserted as false instead of BinData(0,"MA==") and publish_date to be inserted as 1136091600 instead of NumberLong(1136091600)
While it's OK to insert publish_date as a Unixtime if that suits your use case, you may find it more useful to use MongoDB's Date type instead. There are convenience methods for querying dates including Date Aggregation Operators. FYI, date fields will be displayed in the mongo shell with an ISODate() wrapper.
The boolean value for show_image definitely needs an assist, though.
If you use Data::Dumper to inspect the result from decode_json(), you will see that the show_image field is a blessed object:
'show_image' => bless( do{\(my $o = 0)}, 'JSON::PP::Boolean' )
In order to get the expected boolean value in MongoDB, the recommended approach in the MongoDB module docs is to use the boolean module (see: MongoDB::DataTypes).
I couldn't find an obvious built-in option for JSON or JSON::XS to support serialising booleans to something other than the JSON emulated boolean class, but one solution would be to use the Data::Clean::Base module which is part of the Data::Clean::JSON distribution.
Sample snippet (excluding the MongoDB set up):
use Data::Clean::Base;
use boolean;
my $cleanser = Data::Clean::Base->new(
    'JSON::XS::Boolean' => ['call_func', 'boolean::boolean'],
    'JSON::PP::Boolean' => ['call_func', 'boolean::boolean']
);
while (my $record = <$source_file>) {
    my $record_decoded = decode_json($record);
    $cleanser->clean_in_place($record_decoded);
    $db_collection->insert($record_decoded);
}
Sample record as saved in MongoDB 3.0.2:
{
"_id": NumberLong("2006010100000801089"),
"event": "publish",
"data_version": NumberLong("1"),
"show_image": false,
"publish_date": NumberLong("1136091600")
}
JSON data contains only (double-precision) numbers, strings, and the special values true, false, and null. They can be arranged in arrays or "objects" (hashes).
The MongoDB engine is converting these basic types into something more complex, but the original values are available in the hash referred to by $record_decoded, like so
$record_decoded->{_id}
$record_decoded->{show_image}
$record_decoded->{event}
$record_decoded->{publish_date}
$record_decoded->{data_version}
Is that what you wanted?
The object serialization documentation (particularly allow_tags) in JSON::XS may do something like what you want. Note, though, that this is not a standard JSON feature and will only work with JSON::XS.