I've seen the term "serialized" all over, but never seen it explained. What does it mean?
Serialization usually refers to the process of converting an abstract datatype to a stream of bytes. (You sometimes serialize to text, XML, CSV or other formats as well; the important thing is that it is a simple format that can be read and written without understanding the abstract objects the data represents.) When saving data to a file, or transmitting it over a network, you can't just store a MyClass object; you can only store bytes. So you need to take all the data necessary to reconstruct your object and turn it into a sequence of bytes that can be written to the destination device, and at some later point read back and deserialized, reconstructing your object.
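A minimal sketch in Go (the type and field names are just for illustration):

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // MyClass stands in for any in-memory object you want to persist or transmit.
    type MyClass struct {
        Name  string `json:"name"`
        Count int    `json:"count"`
    }

    func main() {
        original := MyClass{Name: "example", Count: 3}

        // Serialize: turn the object into a sequence of bytes.
        data, err := json.Marshal(original)
        if err != nil {
            panic(err)
        }
        // These bytes can be written to a file or sent over a network.
        fmt.Println(string(data)) // {"name":"example","count":3}

        // Deserialize: reconstruct the object from the bytes.
        var restored MyClass
        if err := json.Unmarshal(data, &restored); err != nil {
            panic(err)
        }
        fmt.Printf("%+v\n", restored) // {Name:example Count:3}
    }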
Serialization is the process of taking an object instance and converting it to a format in which it can be transported across a network or persisted to storage (such as a file or database). The serialized format contains the object's state information.
Deserialization is the process of using the serialized state to reconstruct the object in its original state.
Real simple explanation: serialization is the act of taking something that is in memory, like an instance of a class (an object), and transforming it into a structure suitable for transport or storage.
A common example is XML serialization for use in web services. If I have an instance of a class on the server and need to send it over the web to you, I first serialize it into XML, which means creating an XML version of the data in the class. Once it's in XML, I can use a transport like HTTP to easily send it.
There are several forms of serialization like XML or JSON.
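For instance, a minimal sketch of XML serialization in Go (the struct here is illustrative; encoding/json works the same way for JSON):

    package main

    import (
        "encoding/xml"
        "fmt"
    )

    type Person struct {
        XMLName xml.Name `xml:"person"`
        Name    string   `xml:"name"`
        Age     int      `xml:"age"`
    }

    func main() {
        p := Person{Name: "Alice", Age: 30}

        // Serialize the in-memory object to XML, ready to send over HTTP.
        out, err := xml.MarshalIndent(p, "", "  ")
        if err != nil {
            panic(err)
        }
        fmt.Println(string(out))
    }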
There are (at least) two entirely different meanings to serialization. One is turning a data structure in memory into a stream of bits, so it can be written to disk and reconstituted later, or transmitted over a network connection and used on another machine, etc.
The other meaning relates to serial vs. parallel execution -- i.e. ensuring that only one thread of execution does something at a time. For example, if you're going to read, modify and write a variable, you need to ensure that one thread completes a read, modify, write sequence before another can start it.
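For example, a minimal Go sketch of that second meaning (the counter is illustrative):

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        var (
            mu      sync.Mutex
            counter int
            wg      sync.WaitGroup
        )

        for i := 0; i < 10; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                // The lock serializes the read-modify-write sequence:
                // only one goroutine executes it at a time.
                mu.Lock()
                counter++
                mu.Unlock()
            }()
        }
        wg.Wait()
        fmt.Println(counter) // always 10
    }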
What they said. The word "serial" refers to the fact that the data bytes must be put into some standardized order to be written to a serial storage device, like a file output stream or serial bus. In practice, the raw bytes seldom suffice. For example, a memory address from the program that serializes the data structure may be invalid in the program that reconstructs the object from the stored data. So a protocol is required. There have been many, many standards and implementations over the years. I remember one from the mid 80's called XDR, but it was not the first.
1. You have data in a certain format (e.g. list, map, object, etc.)
2. You want to transport that data (e.g. via an API or function call)
3. The means of transport only supports certain data types (e.g. JSON, XML, etc.)
4. Serialization: you convert your existing data to a supported data type so it can be transported.
The key is that you need to transport data, and the means by which you transport it only allows certain formats. Your current data format is not allowed, so you must "serialize" it. Hence, as Mitch answered:
Serialization is the process of taking an object instance and converting it to a format in which it can be transported.
Related
I am writing an application that communicates with an SQL server, which provides an array of bytes in a blob field. I have a TObjectDictionary where I store objects; each object stores the start byte and the number of bytes I need to read and convert to the required datatype.
The objects in the TObjectDictionary refer to different SQL queries. So, to reduce the response time, my plan is to fire the queries at the same time and have each one update the global TObjectDictionary whenever it finishes.
I know TObjectDictionary itself is not thread-safe, and if another thread were to delete an object from the TObjectDictionary I would have an issue, but this won't happen. Also, 2 or more threads won't be writing to the same object.
At the moment I use TCriticalSection, so only 1 thread at a time writes to objects in the dictionary, but I was wondering if this is really necessary?
Most RTL containers, including TObjectDictionary, are NOT thread-safe and do require adequate cross-thread serialization to avoid problems. Even just the act of adding an object to the dictionary will require protection. A TCriticalSection would suffice for that.
The exception is if you add all of the objects to the dictionary from the main thread before starting worker threads that only access the existing objects and don't add/remove any. In that case, you shouldn't need to serialize access to the dictionary.
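The question is Delphi-specific, but the pattern is the same in any language; a rough Go sketch of "critical section around the shared dictionary" (names are illustrative):

    package main

    import (
        "fmt"
        "sync"
    )

    // result stands in for the per-query object stored in the dictionary.
    type result struct {
        startByte, length int
    }

    func main() {
        var (
            mu      sync.Mutex // plays the role of TCriticalSection
            results = map[string]*result{}
            wg      sync.WaitGroup
        )

        queries := []string{"q1", "q2", "q3"}
        for _, q := range queries {
            wg.Add(1)
            go func(name string) {
                defer wg.Done()
                r := &result{startByte: 0, length: 8} // pretend query outcome
                // Writes to the shared map must be serialized,
                // even though each worker writes a different key.
                mu.Lock()
                results[name] = r
                mu.Unlock()
            }(q)
        }
        wg.Wait()
        fmt.Println(len(results)) // 3
    }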
We are a data team working with our data producers to store and process infrastructure log data. Workers running on our clients' systems generate log data, which is primarily in JSON format.
There is no defined structure to the JSON data, as it depends on multiple factors like the number of clusters run by the client, the tokens generated, etc. There is some definite structure to the top-level JSON elements, which contain the metadata about where the logs were generated. The actual data can go into multiple levels of nesting with varying key-value pairs.
I want to build a system to ingest these logs, parse them, and present them in a way where engineers and PMs (product managers) can read the data for analytics use cases.
My initial plan is to attach a compute layer like Kinesis to the source and write parsing logic that stores the outcome in S3. However, this would need prior knowledge of the JSON structure itself.
I would define a parser module to process the data based on the log type. For every incoming log, my compute layer (Kinesis?) directs the data to the corresponding parser module and emits the result into S3.
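To make the plan concrete, here is a rough sketch of the dispatch I have in mind (all names are hypothetical; the real system would sit behind Kinesis and write to S3):

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // parser transforms one raw log record into a normalized form.
    type parser func(raw map[string]interface{}) (map[string]interface{}, error)

    // parsers maps a top-level log type (from the metadata) to its parser module.
    var parsers = map[string]parser{
        "cluster_event": func(raw map[string]interface{}) (map[string]interface{}, error) {
            return raw, nil // real logic would flatten/validate nested fields
        },
    }

    func handle(record []byte) error {
        var raw map[string]interface{}
        if err := json.Unmarshal(record, &raw); err != nil {
            return err
        }
        logType, _ := raw["log_type"].(string) // hypothetical metadata field
        p, ok := parsers[logType]
        if !ok {
            return fmt.Errorf("no parser for log type %q", logType)
        }
        parsed, err := p(raw)
        if err != nil {
            return err
        }
        fmt.Println(parsed) // in the real pipeline: write to S3 instead
        return nil
    }

    func main() {
        _ = handle([]byte(`{"log_type": "cluster_event", "payload": {"nested": true}}`))
    }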
However, I am starting to explore whether a different storage engine (Elasticsearch, etc.) would fit my use case better. Has anyone run into such use cases, and what did you find helpful in solving the problem?
I need numbers passed in the JSON that fit the requirements of an int32 to be stored in the database as an int32 instead of a float.
I have a Go application that receives JSON request data that is inserted into a Mongo database. This data is unmarshalled into an interface and passed into mgo's Insert method (https://godoc.org/github.com/globalsign/mgo#Collection.Insert).
By default, to unmarshal data into an interface, Go converts numbers into float64s. The problem I am having is that integers unmarshalled from the JSON are not having their types preserved: all numbers are passed in as floats, and after the insertion into Mongo the data is saved as a float. Understandably, this is due to the restrictions on typing when using JSON as a data carrier.
Since the JSON data is not well-defined and may include different types of nested dynamic objects, creating structs for this problem doesn't appear to be a viable option.
I have researched some options, including iterating over the JSON data and using UseNumber() for decoding, but the issue I am facing with this is that the operation could be expensive: some of the JSON data may contain several hundred fields, and reflection over those fields may cause slowdown while handling requests, although I am not aware of how impactful the performance hit would be.
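For concreteness, a minimal sketch of the UseNumber() approach I am considering (field names are illustrative): decode with json.Decoder.UseNumber(), then walk the result and convert any json.Number that fits into an int32 before inserting.

    package main

    import (
        "encoding/json"
        "fmt"
        "math"
        "strings"
    )

    // normalize walks a decoded JSON value and converts json.Number
    // into int32 when the value is integral and fits, else float64.
    func normalize(v interface{}) interface{} {
        switch t := v.(type) {
        case json.Number:
            if i, err := t.Int64(); err == nil && i >= math.MinInt32 && i <= math.MaxInt32 {
                return int32(i)
            }
            f, _ := t.Float64()
            return f
        case map[string]interface{}:
            for k, val := range t {
                t[k] = normalize(val)
            }
            return t
        case []interface{}:
            for i, val := range t {
                t[i] = normalize(val)
            }
            return t
        default:
            return v
        }
    }

    func main() {
        dec := json.NewDecoder(strings.NewReader(`{"count": 42, "ratio": 0.5, "nested": {"id": 7}}`))
        dec.UseNumber() // keep numbers as json.Number instead of float64

        var doc interface{}
        if err := dec.Decode(&doc); err != nil {
            panic(err)
        }
        doc = normalize(doc)
        fmt.Printf("%#v\n", doc) // count and id are int32, ratio is float64
    }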
I am storing the info contained in a JSON file in Redis, using the Node.js Redis driver. Do you think I am losing something by employing a hash for storing the info?
The info is simply a large array of elements (several thousand, with several fields in every element, sometimes up to 50) in the data, and a small bunch of properties in the meta.
I understand that you're storing those JSON strings as follows:
hset some-key some-sub-key <the json>
Actually there's another valid approach which involves using the global key space directly:
set some-key:sub-key <the json>
If you're just storing those JSON strings, I would say that creating keys in the global key space is the simplest and most effective approach in your case.
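To illustrate both approaches (a Go sketch using the go-redis client rather than the Node.js driver from the question; key names mirror the commands above):

    package main

    import (
        "context"
        "fmt"

        "github.com/redis/go-redis/v9"
    )

    func main() {
        ctx := context.Background()
        rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

        payload := `{"fields": [1, 2, 3]}` // stand-in for the real JSON

        // Approach 1: a hash, with one field per element.
        if err := rdb.HSet(ctx, "some-key", "some-sub-key", payload).Err(); err != nil {
            panic(err)
        }

        // Approach 2: one key per element in the global key space.
        if err := rdb.Set(ctx, "some-key:sub-key", payload, 0).Err(); err != nil {
            panic(err)
        }

        val, _ := rdb.Get(ctx, "some-key:sub-key").Result()
        fmt.Println(val)
    }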
What do you mean by losing something? Storing values (JSON) and retrieving them in Redis can be really fast. Plus, Redis comes with some very handy commands like TTL, FLUSHALL, etc.
Personally, I'm using Redis for my Profile page. I store my image uploads in Redis and never had an issue.
My profile page: http://fanjin.computer
Github repo: https://github.com/bfwg/relay-gallery
Although this question has been answered, for future reference some might be asking the same question but looking for a different answer (like me).
If that's the case, I would suggest looking into RedisJSON for creating a JSON type in Redis.
I'm using Orion to store context information and I'm interested in storing binary data (an array of bytes) in the attributes.
Is it possible in the current version (1.1.0)?
Thanks in advance.
The short answer is no, it's not possible to store binary data in version 1.1.0.
This happens because the Orion Context Broker uses a RESTful API: all data is transported as text, in XML format (very old versions) or JSON (latest versions), and MongoDB is used as the storage engine. MongoDB stores objects in a binary format called BSON, and BinData is a BSON data type for a binary byte array. However, MongoDB objects are typically limited to 4 MB in size; to deal with this, files are "chunked" into multiple objects that are each less than 4 MB, which has the added advantage of letting us efficiently retrieve a specific range of a given file. But BSON data is not supported by Orion, and probably never will be, because the Orion Context Broker was not designed to store binary data.
You can use some alternatives:
Use a separate file server (or another server-side technology) and reference the file by URL; you can also use other FIWARE GEs like CKAN or Object Storage, for example.
Convert the binary data to hexadecimal, so it becomes alphanumeric data, and on receipt convert it back to binary. There are examples in Python, PHP, Java and C++ of how to manipulate binary data as hexadecimal.
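For example, a minimal Go sketch of the hexadecimal round-trip (the payload is illustrative):

    package main

    import (
        "encoding/hex"
        "fmt"
    )

    func main() {
        raw := []byte{0x00, 0xFF, 0x10, 0x20} // stand-in for the real binary payload

        // Encode to an alphanumeric string safe to store in an Orion attribute.
        encoded := hex.EncodeToString(raw)
        fmt.Println(encoded) // "00ff1020"

        // On receipt, decode back to the original bytes.
        decoded, err := hex.DecodeString(encoded)
        if err != nil {
            panic(err)
        }
        fmt.Println(decoded) // [0 255 16 32]
    }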