Could someone please explain the difference between as_json and JSON.parse?
Is the only difference that JSON.parse is part of the Ruby standard library and as_json is part of Rails' ActiveSupport?
Rails console:
irb(main):001:0> json_str = "{\"foo\": {\"bar\": 1, \"baz\": 2}, \"bat\": [0, 1, 2]}"
irb(main):002:0> puts JSON.parse(json_str)
{"foo"=>{"bar"=>1, "baz"=>2}, "bat"=>[0, 1, 2]}
irb(main):003:0> puts json_str.as_json
{"foo": {"bar": 1, "baz": 2}, "bat": [0, 1, 2]}
Ref:
https://ruby-doc.org/stdlib-3.1.2/libdoc/json/rdoc/JSON/Ext/Parser.html#method-i-parse
https://api.rubyonrails.org/classes/ActiveModel/Serializers/JSON.html#method-i-as_json
The two methods are in some sense exact opposites of each other:
JSON.parse parses a Ruby String containing a JSON Document into a Ruby object corresponding to the JSON Value described by the JSON document. So, it goes from JSON to Ruby.
as_json returns a simplified representation of a complex Ruby object that uses only Ruby types that can be easily represented as JSON Values. In other words, it turns an arbitrarily complex Ruby object into a Ruby object that uses only Hash, Array, String, Integer, Float, true, false, and nil which correspond to the JSON types object (really a dictionary), array, string, number, boolean, and null. The intent is that you can then easily serialize this simplified representation to JSON. So, as_json goes from Ruby (halfway) to JSON. In other words, the opposite direction from JSON.parse.
Apart from operating in opposite directions, there are some other minor differences as well:
JSON.parse is a concrete method, whereas as_json is an abstract protocol that is implemented by many different kinds of objects (similar to e.g. each in Ruby); a small console sketch of this follows below.
JSON.parse is part of the Ruby standard library (but not the core library; more precisely, it is part of the json gem, which is a default gem). The as_json protocol is defined by ActiveModel's Serializers API, with implementations for the core types provided by ActiveSupport's core extensions, i.e. it is part of Rails, not Ruby.
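For example, many different kinds of objects respond to as_json (a minimal Rails-console sketch; the exact output shapes depend on your Rails version and JSON time-precision settings):
# In a Rails console, or after: require "active_support/core_ext/object/json"
1.as_json                 #=> 1
"hello".as_json           #=> "hello"
nil.as_json               #=> nil
[1, :two, nil].as_json    #=> [1, "two", nil]
{ id: 1, at: Time.utc(2006, 8, 1) }.as_json
#=> {"id"=>1, "at"=>"2006-08-01T00:00:00.000Z"}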
So, why does as_json exist in the first place? Why this two-step process of converting complex Ruby objects to simpler Ruby objects and then to a JSON Document, instead of going straight from complex Ruby objects to a JSON Document? Well, if you have complex Ruby objects, chances are that no single object actually fully knows how to serialize itself as a JSON Document. It has to first ask its constituent objects to serialize themselves and then stitch it all together, and this applies recursively to the constituent objects as well. With all this stitching together of JSON Documents, there is a real risk of producing an invalid JSON Document, double-encoding some part of it, or something along those lines.
Basically, once you have serialized something to a JSON Document, then all you have is a String and all you can do is String manipulation. Whereas, if you have a richer Ruby object like Hash, Array, Integer, etc., then you can use that object's methods as well. Imagine, for example, having to merge two JSON Documents containing JSON Objects as a String compared to simply merging two Ruby Hashes.
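A quick sketch of that comparison in plain Ruby:
require "json"

a = '{"name":"Ada"}'
b = '{"lang":"Ruby"}'

# At the String level you would have to splice the two documents together by hand,
# which is easy to get wrong. At the Hash level it is a single method call:
merged = JSON.parse(a).merge(JSON.parse(b))  #=> {"name"=>"Ada", "lang"=>"Ruby"}
merged.to_json                               #=> '{"name":"Ada","lang":"Ruby"}'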
So, the idea is to use as_json first to create a Ruby object that is simpler and less powerful than the original, but still much more powerful than a simple String. And only once you have assembled the entire thing, do you use to_json to serialize it to a JSON Document. (Or rather, the serialization framework does that for you.)
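A hypothetical sketch of that two-step flow (assuming user is an ActiveRecord model; the keys and attribute names here are made up for illustration):
payload = {
  "user"         => user.as_json(only: [:id, :name]),  # a simplified Ruby Hash, not yet a String
  "generated_at" => Time.now.utc.as_json
}
payload["flags"] = { "beta" => true }  # still a Hash, so still easy to manipulate
payload.to_json                        # the one and only serialization step, at the very end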
JSON.parse parses the given JSON string and converts it into a Ruby object, while as_json returns a hash representing the model:
user = User.first
user.as_json
=> {"id"=>1, "email"=>"fa18-bcs-215#cuilahore.edu.pk",
"name"=>"Noman", "user_type"=>"Manager"}
And if we apply as_json to a string, it simply returns that string:
json_str = "{\"foo\": {\"bar\": 1, \"baz\": 2}, \"bat\": [0, 1, 2]}"
json_str.as_json
=> "{\"foo\": {\"bar\": 1, \"baz\": 2}, \"bat\": [0, 1, 2]}"
If we apply JSON.parse to that string, it returns the corresponding Ruby object (here, a hash):
JSON.parse(json_str)
=> {"foo"=>{"bar"=>1, "baz"=>2}, "bat"=>[0, 1, 2]}
JSON.parse parses a JSON string.
.as_json is a serialization method available on all data types, not just strings.
E.g. from the Rails docs:
user = User.find(1)
user.as_json
# => { "id" => 1, "name" => "Konata Izumi", "age" => 16,
# "created_at" => "2006-08-01T17:27:133.000Z", "awesome" => true}
But JSON.parse won't handle that:
user = User.find(1)
JSON.parse(user)
--> no implicit conversion of User into String (TypeError)
Edit
Edited based on a comment from #Jorg: JSON.parse does not return specifically a Hash; it returns whatever Ruby object the parsed JSON describes.
Related
I have this JSON String that comes from a Kafka topic
{"id": 12345, "items": {"unit17": 0, "unit74": 0, "unit42": 0, "unit96": 0, "unit13": 0, "unit16": 0, "unit11": 0, "z10": 0, "z0": 1}}
Using spray-json (version 1.3.5), I want to parse it, so I do:
val parsedStream = stream.map( event => event.parseJson )
This works well, BUT when using "parseJson" the nested JSON for items comes back alphabetically ordered and wrapped in a List:
parsedStream.print()
--> List(12345,{"unit11": 0, "unit13": 0, "unit16": 0, "unit17":0, "unit42": 0, "unit74": 0, "unit96": 0, "z0": 1, "z10": 0}}
Any ideas why spray-json is behaving like this and ordering it automatically? Any settings to avoid this or options to apply?
I have tried to do the same with another library, play-json (play.api.libs.json.Json), and it works properly, so I could use that one, but I was curious whether I was missing something with spray-json:
val parsedStream = stream.map( event => Json.parse(event))
parsedStream.print()
--> {"id":12345,"items":{"unit17":0,"unit74":0,"unit42":0,"unit96":0,"unit13":0,"unit16":0,"unit11":0,"z10":0,"z0":1}}
Finally, I would like to feed a case class with the values of that stream; for this I've implemented the following:
case class MyCaseClass(id:Int,items:Map[String,Int])
val parsedStream = stream.map{ value => MyCaseClass( (value \ "id").as[Int],
(value \ "items").as[Map[String,Int]] ) }
That works smoothly, BUT since I'm using a Map for the "items" attribute, the order from the JSON is not necessarily kept, and I need it to be: the data will be fed to a model to predict some stuff, and the model was trained with the same item order the Kafka topic produces. I know Scala provides a ListMap, which is a kind of ordered map with a List-like interface, but when using it like this:
case class MyCaseClass(id:Int,items:ListMap[String,Int])
val parsedStream = stream.map{ value => MyCaseClass( (value \ "id").as[Int],
(value \ "items").as[ListMap[String,Int]] ) }
the compiler states "No Json deserializer found for type scala.collection.immutable.ListMap[String,Int]. Try to implement an implicit Reads or Format for this type.", I guess because Play's automatic JSON macros don't cover this type and it has to be implemented by hand. Do you have any hint on how to write one for a ListMap, or any other way to get the items into my case class in the same order they appear in the JSON? Maybe changing the "items" JSON object to a JSON Array could do the trick; any hint on how to do that and whether it is a good idea?
EDIT - Resolved with circe
After some tests, play-json was a bit tricky to use with ListMap for the Reads macros, so I read up on and experimented with circe, and with it both parts were quite straightforward: the order in the parsing and the conversion to ListMap. A few imports and one line of code are enough to do the parsing and the conversion to the case class:
import io.circe._
import io.circe.parser._
import io.circe.generic.auto._
[...]
case class MyCaseClass(id:Int,items:ListMap[String,Int])
[...]
val streamParsed = stream.map{ event => parser.decode[MyCaseClass](event) match {
case Left(failure) => [...]
case Right(myCaseClassInstance) => { myCaseClassInstance }
}}
[...]
That will parse the events while keeping the order and generate a DataStream[MyCaseClass], using the ListMap to preserve the order in the generated map. The mapping from JSON to ListMap is possible thanks to "io.circe.generic.auto._", which supports ListMap and transparently takes care of this conversion.
The JSON format does not guarantee ordering of the keys, and Spray is respecting the spec.
Like bellam said, you might want to use a library that does more than the spec requires and keeps your keys ordered, like circe.
This seems to be an open issue on GitHub, introduced in 1.3.0. You could try downgrading to 1.2.6 or use another JSON lib like circe:
https://github.com/spray/spray-json/issues/119
I tested putting just a random integer value in a field during a Laravel validation, and it was accepted as valid JSON. Then I tested it at https://jsonlint.com/?code= and that also reports it as valid JSON. I am a beginner, so can anyone please explain how it is valid JSON?
Very early on, the definition of JSON was that it had to have an object or array at the top level, but that was quickly abandoned in favor of allowing any valid value at the top level. So all of these are valid JSON:
A number on its own:
42
A string on its own:
"question"
A boolean on its own:
false
An object:
{"answer": 42}
An array:
["one", "two", "three"]
More on json.org and in the standard.
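For example, Ruby's JSON.parse (the method discussed in the questions above) accepts all of these top-level values with the json gem version 2.0 or later; older versions of the gem only accepted an object or array at the top level, mirroring the history described above:
require "json"

JSON.parse("42")                      #=> 42
JSON.parse('"question"')              #=> "question"
JSON.parse("false")                   #=> false
JSON.parse('{"answer": 42}')          #=> {"answer"=>42}
JSON.parse('["one", "two", "three"]') #=> ["one", "two", "three"]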
I have seen the terms "deserialize" and "serialize" with JSON. What do they mean?
JSON is a format that encodes objects in a string. Serialization means to convert an object into that string, and deserialization is its inverse operation (convert string -> object).
When transmitting data or storing them in a file, the data are required to be byte strings, but complex objects are seldom in this format. Serialization can convert these complex objects into byte strings for such use. After the byte strings are transmitted, the receiver will have to recover the original object from the byte string. This is known as deserialization.
Say, you have an object:
{foo: [1, 4, 7, 10], bar: "baz"}
serializing into JSON will convert it into a string:
'{"foo":[1,4,7,10],"bar":"baz"}'
which can be stored or sent over the wire to anywhere. The receiver can then deserialize this string to get back the original object, {foo: [1, 4, 7, 10], bar: "baz"}.
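The same round trip shown concretely in Ruby (any language with a JSON library follows the same pattern):
require "json"

object = { "foo" => [1, 4, 7, 10], "bar" => "baz" }

serialized = object.to_json        # serialize: object -> String
#=> '{"foo":[1,4,7,10],"bar":"baz"}'

restored = JSON.parse(serialized)  # deserialize: String -> object
#=> {"foo"=>[1, 4, 7, 10], "bar"=>"baz"}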
Serialize and Deserialize
In the context of data storage, serialization (or serialisation) is the process of translating data structures or object state into a format that can be stored (for example, in a file or memory buffer) or transmitted (for example, across a network connection link) and reconstructed later. [...]
The opposite operation, extracting a data structure from a series of bytes, is deserialization.
– wikipedia.org
JSON
JSON (JavaScript Object Notation) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable values). It is a common data format with diverse uses in electronic data interchange, including that of web applications with servers.
JSON is a language-independent data format. It was derived from JavaScript, but many modern programming languages include code to generate and parse JSON-format data. JSON filenames use the extension .json.
– wikipedia.org
Explained using Python
In Python, serialization does nothing more than convert the given data structure into its JSON counterpart (e.g. Python's True is converted to JSON's true, and the dictionary itself is converted to a string), and vice versa for deserialization.
Python vs. JSON
You can easily spot the differences between the Python and JSON representations in a side-by-side comparison, for example by examining their Boolean values. Have a look at the following table of the basic types used in both contexts:

Python                                           JSON
True                                             true
False                                            false
None                                             null
int, float                                       number
str (single ', double " and triple """ quotes)   string (only double " quotes)
dict                                             object
list, tuple                                      array
Code Example
Python's built-in json module is the standard way to do JSON serialization and deserialization:
import json
data = {
'president': {
"name": """Mr. Presidente""",
"male": True,
'age': 60,
'wife': None,
'cars': ('BMW', "Audi")
}
}
# serialize
json_data = json.dumps(data, indent=2)
print(json_data)
# {
# "president": {
# "name": "Mr. Presidente",
# "male": true,
# "age": 60,
# "wife": null,
# "cars": [
# "BMW",
# "Audi"
# ]
# }
# }
# deserialize
restored_data = json.loads(json_data) # deserialize
Sources: realpython.com, geeksforgeeks.org
Explanation of Serialize and Deserialize using Python
In Python, the pickle module is used for serialization, so the serialization process is called pickling in Python. The module is available in the Python standard library.
Serialization using pickle
import pickle
#the object to serialize
example_dic={1:"6",2:"2",3:"f"}
#where the bytes after serializing end up at, wb stands for write byte
pickle_out=open("dict.pickle","wb")
#Time to dump
pickle.dump(example_dic,pickle_out)
#whatever you open, you must close
pickle_out.close()
The resulting pickle file (which can be opened in a text editor like Notepad) contains this serialized data (it is binary, so it renders as gibberish):
€}q (KX 6qKX 2qKX fqu.
Deserialization using pickle
import pickle
pickle_in=open("dict.pickle","rb")
get_deserialized_data_back=pickle.load(pickle_in)
print(get_deserialized_data_back)
Output:
{1: '6', 2: '2', 3: 'f'}
Sharing what I learned about this topic.
What is Serialization
Serialization is the process of converting a data object into a byte stream.
What is a byte stream
A byte stream is just a stream of binary data, because ultimately only binary data can be stored or transmitted.
What is a byte string vs. a byte stream
Sometimes you see people use the term byte string as well. String encodings of bytes are called byte strings. That helps explain what JSON is, below.
What’s the relationship between JSON and serialization
JSON is a string format, i.e. a textual representation of the data. JSON text is encoded in UTF-8, so while we see human-readable strings, behind the scenes those strings are encoded as bytes in UTF-8.
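A small Ruby illustration of that last point (the JSON text is a UTF-8 string, and what actually gets stored or transmitted are its bytes):
require "json"

json_text = { "id" => 1 }.to_json   #=> '{"id":1}'
json_text.encoding                  #=> #<Encoding:UTF-8>
json_text.bytes                     #=> [123, 34, 105, 100, 34, 58, 49, 125]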
I'm trying to get a Hash back from a JSON file that has Array keys, but each Array key comes back as a String.
hash = {[10, 10] => [[1, 1], [5, 5]]}
p JSON.parse(hash.to_json) #=> {"[10, 10]" => [[1, 1], [5, 5]]}
Maybe I should use YAML? Any ideas?
There are three slightly different versions of JSON, as specified by:
The original JSON.org website
ECMA-404 – The JSON Data Interchange Syntax
RFC8259 – The JavaScript Object Notation (JSON) Data Interchange Format
While there are small differences between the three, one thing they all agree on: Object Keys are Strings. Always.
In other words, "a Json file that has Array Keys" cannot possibly exist. Whatever you have, it is either a JSON file, but then it cannot have Array Keys, or it is simply not a JSON file.
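So the Array keys have to be encoded some other way. One possible workaround in Ruby (a sketch, not the only option) is to serialize the Hash as an array of [key, value] pairs; alternatively, YAML does support non-string keys, as you suspected:
require "json"
require "yaml"

hash = { [10, 10] => [[1, 1], [5, 5]] }

# Option 1: JSON, but as an array of [key, value] pairs instead of an object.
json = hash.to_a.to_json   #=> '[[[10,10],[[1,1],[5,5]]]]'
JSON.parse(json).to_h      #=> {[10, 10]=>[[1, 1], [5, 5]]}

# Option 2: YAML can represent Array keys natively.
YAML.load(hash.to_yaml)    #=> {[10, 10]=>[[1, 1], [5, 5]]}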
I am writing a simple Perl script to read JSON from a file and insert it into MongoDB, but I am facing issues with the JSON decoding.
All non-string values in my original JSON are getting converted to object types after decode_json.
Input JSON (only part of it, since the original is huge):
{
"_id": 2006010100000801089,
"show_image" : false,
"event" : "publish",
"publish_date" :1136091600,
"data_version" : 1
}
JSON that gets inserted into MongoDB:
{
"_id": NumberLong("2006010100000801089"),
"show_image" : BinData(0,"MA=="),
"event" : "publish",
"publish_date" :NumberLong(1136091600),
"data_version" : NumberLong(1)
}
I am providing a custom _id for the documents, which I want converted to the NumberLong type. That is working as expected, as you can see from the JSON above. But notice how the other non-string values for show_image, publish_date and data_version got converted to their object representations.
Is there any way I can retain the original type for these values?
Perl code snippet that does the insert:
use MongoDB;
use MongoDB::OID;
use JSON;
use JSON::XS;
while(my $record = <$source_file>) {
my $record_decoded = decode_json($record);
$db_collection->insert($record_decoded);
}
Perl version used: v5.18.2.
I looked up the JSON::XS docs but couldn't find a way to do this. Any help is appreciated. Thanks in advance!
I am very new to Perl. Sorry if this is a trivial question.
I am providing a custom _id for the documents, which I want converted to the NumberLong type. That is working as expected, as you can see from the JSON above. But notice how the other non-string values for show_image, publish_date and data_version got converted to their object representations.
From your example, all of the data types actually match, aside from the boolean value for show_image, which is currently being converted to binary data.
It is expected that numeric types are displayed as NumberLong or NumberInt when queried from the mongo shell. The mongo shell uses JavaScript, which only has a single numeric type of Number (64-bit floating point). Shell helpers like NumberLong() and NumberInt() are used to represent values in MongoDB's BSON data types that do not have a native JavaScript equivalent.
Referring to my sample JSON, I want value of show_image to be inserted as false instead of BinData(0,"MA==") and publish_date to be inserted as 1136091600 instead of NumberLong(1136091600)
While it's OK to insert publish_date as a Unixtime if that suits your use case, you may find it more useful to use MongoDB's Date type instead. There are convenience methods for querying dates including Date Aggregation Operators. FYI, date fields will be displayed in the mongo shell with an ISODate() wrapper.
The boolean value for show_image definitely needs an assist, though.
If you use Data::Dumper to inspect the result from decode_json(), you will see that the show_image field is a blessed object:
'show_image' => bless( do{\(my $o = 0)}, 'JSON::PP::Boolean' )
In order to get the expected boolean value in MongoDB, the recommended approach in the MongoDB module docs is to use the boolean module (see: MongoDB::DataTypes).
I couldn't find an obvious built-in option for JSON or JSON::XS to deserialise booleans into something other than the emulated JSON boolean class, but one solution would be to use the Data::Clean::Base module, which is part of the Data::Clean::JSON distribution.
Sample snippet (excluding the MongoDB set up):
use Data::Clean::Base;
use boolean;
my $cleanser = Data::Clean::Base->new(
'JSON::XS::Boolean' => ['call_func', 'boolean::boolean'],
'JSON::PP::Boolean' => ['call_func', 'boolean::boolean']
);
while (my $record = <$source_file>) {
my $record_decoded = decode_json($record);
$cleanser->clean_in_place($record_decoded);
$db_collection->insert($record_decoded);
}
Sample record as saved in MongoDB 3.0.2:
{
"_id": NumberLong("2006010100000801089"),
"event": "publish",
"data_version": NumberLong("1"),
"show_image": false,
"publish_date": NumberLong("1136091600")
}
JSON data contains only (double-precision) numbers, strings, and the special values true, false, and null. They can be arranged in arrays or "objects" (hashes).
The MongoDB engine is converting these basic types into something more complex, but the original values are available in the hash referred to by $record_decoded, like so
$record_decoded->{_id}
$record_decoded->{show_image}
$record_decoded->{event}
$record_decoded->{publish_date}
$record_decoded->{data_version}
Is that what you wanted?
The object serialization documentation (particularly allow_tags) in JSON::XS may do something like what you want. Note, though, that this is not a standard JSON feature and will only work with JSON::XS.