Retrieving MySQL BLOB field with Ruby/Rails - mysql

I make this simple MySQL query using ActiveRecord::Base:
sql = "SELECT * FROM schedules WHERE id = 1"
schedule = (ActiveRecord::Base.connection.select_rows sql)[0]
It happens that schedule[9] is BLOB data, but it gets retrieved as a Ruby String object. Is that normal? How are BLOB objects represented in Ruby? Coming from the Objective-C world, BLOB data is usually represented by NSData objects. Is there some kind of equivalent in Ruby?

Strings in Ruby are just a sequence of arbitrary bytes - there is no separate binary data type.
Strings can be given an encoding that tells Ruby to interpret the bytes as UTF-8, UTF-16, ISO-Latin, etc. when performing various operations on them, but there is also the ASCII-8BIT encoding (a bit of a misnomer) which just means arbitrary bytes.
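For example, in an irb session (schedule[9] being the BLOB column from the question; the exact encoding you see depends on the adapter, but for BLOB columns it is typically ASCII-8BIT):
blob = schedule[9]
blob.class                    # => String
blob.encoding                 # => #<Encoding:ASCII-8BIT> (a.k.a. BINARY)
blob.bytesize                 # number of raw bytes in the blob
blob.force_encoding('UTF-8')  # only if you know the bytes are actually UTF-8 text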

If it isn't a serialized attribute in your model, then a BLOB is nothing other than a string that doesn't have size constraints. Strings are the base "data" object in Ruby.

Related

Question about JSON vs CBOR serialization

I have a theoretical question about how serialization works, and especially about the difference between text serialization schemes like JSON and binary serialization schemes like CBOR.
My question is: if a JSON serializer converts an object into a JSON string, then, to store or transmit the resulting JSON string, do you have to also convert the JSON string into its bytes representation? Is this why binary schemes might be faster, since they produce a binary output already?
In memory, a string is in any case represented as a sequence of bytes (actually, everything is just a sequence of bytes in memory), so this should not matter.
What matters is the conversion from the in-memory representation of a JavaScript variable into the in-memory representation of its string equivalent. An extremely simple example is a numeric variable with value -1. This can be internally represented by one byte:
> Buffer.of(-1)
<Buffer ff>
but its JSON serialization "-1" takes two bytes:
> Buffer.from(JSON.stringify(-1))
<Buffer 2d 31>
This should give an idea why a binary scheme that sticks closer to the internal representation can be output (and input) faster.
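The same demonstration in Ruby (the language of the first question above), using Array#pack as a stand-in for a binary encoder:
require 'json'
[-1].pack('c')               # => "\xFF" -- one byte, as a binary scheme could emit it
JSON.generate(-1)            # => "-1"   -- two bytes of text
JSON.generate(-1).bytesize   # => 2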

Varchar or Blob for a very large string in MySQL through EclipseLink?

I have an application where I am going to store a JSON string in a MySQL database through EclipseLink JPA.
The JSON string can be of any length; most of the time it is a String from a JSON file of around 200 to 300 lines.
What is the best way to store the string? To use varchar or Blob?
Please provide an example if any.
You should not save it as a BLOB, as that is primarily used for image data or other binary data. Use VARCHAR(), or use TEXT, which can hold up to 65,535 bytes, if you are unsure how many characters you might need to store.
There was a thread previously discussing WHEN to use varchar or text: Thread
To store text, use a TEXT column (or even LONGTEXT); BLOBs are for binary data.
Also, if you're on MySQL 5.7+, there's now a JSON data type, which is validated as correct JSON, stored more efficiently, and comes with handy manipulation functions.
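If you happen to be in a Rails/ActiveRecord app instead (as in the first question on this page), a hedged migration sketch of the two options might look like this (table and column names are made up for illustration):
class AddPayloadToDocuments < ActiveRecord::Migration[6.0]
  def change
    # Plain TEXT column: works on any MySQL version, no validation of the JSON
    add_column :documents, :payload, :text
    # Native JSON column: requires MySQL 5.7+, validates and stores the JSON efficiently
    # add_column :documents, :payload, :json
  end
end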

How to convert between BSON and JSON, especially for those special objects?

I am not asking for libraries to do this; I am writing the bson_to_json and json_to_bson code myself.
Here is the BSON specification.
For the regular types (double, doc, array, string) it is fine, and it is easy to convert between BSON and JSON.
However, for the more particular types, such as:
Timestamp and UTC datetime:
When converting from JSON to BSON, how can I tell that a value is a timestamp or a UTC datetime?
Regex (string, string), JavaScript code with scope (string, doc):
Their structures have multiple parts; how can I represent those structures in JSON?
Binary data (generic, function, etc.):
How can I represent the subtype of the binary data in JSON?
int32 and int64:
How can I represent them in JSON so that the BSON side knows which is 32-bit and which is 64-bit?
Thanks
As we know, JSON cannot natively express these BSON-specific types, so you will need to decide how you want the stringified version of the BSON field types to be represented within the output of your OCaml driver.
Some of the data types are easy: Timestamp is not really needed, since it is internal to sharding only, and JavaScript blocks are best left out, since they are best used only within system.js as saved functions for map-reduce.
You also need to consider that some of these fields go in both directions: some are used to specify input documents to be serialised to BSON, and some are part of output documents that need deserialising from BSON into JSON.
Regex is one that will most likely be a field type you send down. As such, you will need to serialise your OCaml object from the /d/ig PCRE representation to the BSON equivalent of {$regex: 'd', $options: 'ig'}.
Dates can be represented in JSON either as an ISODate string or as a timestamp. The output will be something like {$sec: 556675, $usec: 6787}, and you can convert $sec to whatever display you need.
Binary data can be represented in JSON by taking the data property (if I remember right) from the output document, base64-encoding it, and storing it as a string in the field.
int32 and int64 have no real distinction in JSON, except that 64-bit ints will be bigger than 2147483647, so I am unsure whether you can keep those data types unique there.
That should help get you started.
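A couple of these mappings can be sketched in Ruby with only the standard library (the $binary/$type, $sec/$usec and $numberLong key names below follow the conventions mentioned in this answer and MongoDB's extended JSON, so treat them as an assumption rather than part of the BSON spec itself):
require 'json'
require 'base64'

# Binary data: base64-encode the raw bytes and carry the BSON subtype alongside
raw = "\x01\x02\xFF".b
binary_as_json = { '$binary' => Base64.strict_encode64(raw), '$type' => '00' }

# Dates: seconds/microseconds since the epoch, as in the {$sec, $usec} form above
t = Time.utc(2024, 1, 1)
date_as_json = { '$sec' => t.to_i, '$usec' => t.usec }

# int32 vs int64: pack to the explicit little-endian wire width on the BSON side,
# and tag 64-bit values on the JSON side so the width is not lost
int32_bytes = [42].pack('l<')        # 4 bytes
int64_bytes = [42].pack('q<')        # 8 bytes
int64_as_json = { '$numberLong' => '42' }

puts JSON.generate(binary_as_json)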

What is BSON and exactly how is it different from JSON?

I am just starting out with MongoDB and one of the things that I have noticed is that it uses BSON to store data internally. However the documentation is not exactly clear on what BSON is and how it is used in MongoDB. Can someone explain it to me, please?
BSON is the binary encoding of JSON-like documents that MongoDB uses when storing documents in collections. It adds support for data types like Date and binary that aren't supported in JSON.
In practice, you don't have to know much about BSON when working with MongoDB, you just need to use the native types of your language and the supplied types (e.g. ObjectId) of its driver when constructing documents and they will be mapped into the appropriate BSON type by the driver.
What's BSON?
BSON [bee · sahn], short for Binary JSON, is a binary-encoded serialization of JSON-like documents.
How is it different from JSON?
BSON is designed to be efficient in space, but in some cases is not much more efficient than JSON. In some cases BSON uses even more space than JSON. The reason for this is another of the BSON design goals: traversability. BSON adds some "extra" information to documents, like length of strings and subobjects. This makes traversal faster.
BSON is also designed to be fast to encode and decode. For example, integers are stored as 32 (or 64) bit integers, so they don't need to be parsed to and from text. This uses more space than JSON for small integers, but is much faster to parse.
In addition to compactness, BSON adds additional data types unavailable in JSON, notably the BinData and Date data types.
Source: http://bsonspec.org/
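To make the fixed-width integer point concrete in Ruby (Array#pack here just mimics the int32 layout from the spec; it is not a full BSON encoder):
require 'json'
[5].pack('l<').bytesize             # => 4 -- a BSON int32 always takes 4 bytes, even for small values
JSON.generate(5).bytesize           # => 1 -- tiny as text, but it must be parsed back into a number
[1234567890].pack('l<').bytesize    # => 4 -- still 4 bytes
JSON.generate(1234567890).bytesize  # => 10 -- ten bytes of text to parse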
MongoDB represents JSON documents in a binary-encoded format, which is why it is called BSON behind the scenes.
BSON extends the JSON model to provide additional data types, such as Date and binary, which are not supported in JSON, and to provide ordered fields so that documents can be encoded and decoded efficiently in different languages.
In other words, BSON is just binary JSON (a superset of JSON with some more data types, most importantly a binary byte array).
MongoDB uses it as the serialization and encoding format for storing and accessing documents; simply put, BSON is a binary-encoded format for JSON data.
For more, see this MongoDB article: https://om9x.com/blog/bson-vs-json/
MongoDB represents JSON documents in a binary-encoded format called BSON behind the scenes. BSON extends the JSON model to provide additional data types and to be efficient to encode and decode within different languages.
By using BSON encoding on top of JSON, MongoDB gains the capability of creating indexes on values that reside inside the JSON document in its raw format. This helps in running efficient analytical queries, as NoSQL systems were known for having no support for indexes.
This relatively short article gives a pretty good explanation of BSON and JSON: it talks about some of the problems with JSON, why BSON was invented, what problems it solves compared to JSON, and how it could benefit you.
https://www.compose.com/articles/from-json-to-bson-and-back/
In my use case, that article told me that serializing to JSON would work for me and that I didn't need to serialize to BSON.
To stay strictly within the boundaries of the OP question:
What is BSON?
BSON is a specification for a rich set of scalar types (int32, int64, decimal, date, etc.) plus containers (object a.k.a. a map, and array) as they might appear in a byte stream. There is no "native" string form of BSON; it is a byte[] spec. To work with this byte stream, there are many native language implementations available that can turn the byte stream into actual types appropriate for the language. These are called codecs. For example, the Java implementation of a BSON codec into the Document class from MongoDB turns objects into something that implements java.util.Map. Dates are decoded into java.util.Date. Transmitting BSON looks like this in, for example, Java and python:
Java:
import org.bson.*;
MyObject --> get() from MyObject, set() into org.bson.Document --> org.bson.standardCodec.encode(Document) to byte[]
XMIT byte[]
python:
import bson
byte[] --> bson.decode(byte[]) to dict --> get from dict --> do something
There are no to- and from- string calls involved. There is no parser. There is nothing about whitespace and double quotes and escaped characters. Dates, BigDecimal, and arrays of Long captured on the Java side reappear in python as datetime.datetime, Decimal, and array of int.
In comparison, JSON is a string. There is no codec for JSON. Transmitting JSON looks like this:
MyObject --> convert to JSON (now you have a big string with quotes and braces and commas)
XMIT string
parse string to dict (or possibly a class via a framework)
Superficially this looks the same but the JSON specification for scalars has only strings and "number" (leaving out bools and nulls, etc.). There is no direct way to send a long or a BigDecimal from sender to receiver in JSON; they are both just "number". Furthermore, JSON has no type for plain byte array. All non-ASCII data must be base64 or otherwise encoded in a way to protect it and sent as a string. BSON has a byte array type. The producer sets it, the consumer gets it. There is no secondary processing of strings to turn it back into the desired type.
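A rough Ruby sketch of the same contrast, assuming the MongoDB bson gem's 4.x API (check your driver's documentation for the exact calls):
require 'bson'      # MongoDB's Ruby BSON gem (4.x API assumed here)
require 'json'
require 'base64'
require 'time'

doc = { 'when' => Time.now.utc, 'data' => BSON::Binary.new("\x00\xFF".b) }

# BSON: a codec round trip, and the types survive
bytes = doc.to_bson.to_s
back  = Hash.from_bson(BSON::ByteBuffer.new(bytes))
back['when'].class   # => Time
back['data'].class   # => BSON::Binary

# JSON: everything has to be squeezed through strings and re-parsed by the receiver
json = JSON.generate('when' => doc['when'].iso8601,
                     'data' => Base64.strict_encode64("\x00\xFF".b))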
How does MongoDB use BSON?
To start, it is the wire protocol for content. It is also the on-disk format of the data. Because variable-length types (most notably string) carry length information in the BSON spec, MongoDB can traverse an object performantly (hopping from field to field). And finding an object in a collection involves more than just BSON, including the use of indexes.

SQL storing MD5 in char column

I have a column of type char(32) where I want to store an MD5 hash key. The problem is I've used SQL to update the existing records using the HashBytes() function, which creates values like
:›=k! ©úw"5Ýâ‘<\
but when I do the insert via .NET it comes through as
3A9B3D6B2120A9FA772235DDE2913C5C
What do I need to do to get these to match up? Is it the encoding?
HashKey isn't a SQL function; did you mean HASHBYTES? Some actual code would help. SQL appears to be computing the raw binary hash and displaying it as ASCII characters.
.NET is computing the hash, then converting it to hexadecimal (or so it appears). CHAR(32) isn't a good way to store raw binary data; you would want to use the BINARY type.
An Example in SQL:
SELECT SUBSTRING(sys.fn_varbintohexstr(HASHBYTES('MD5',0x2040)),3, 32)
And an Example in .NET:
using System;
using System.Security.Cryptography;

using (MD5 md5 = MD5.Create())
{
    var data = new byte[] { 0x20, 0x40 };                           // the sample input bytes (0x2040)
    var hashed = md5.ComputeHash(data);                             // raw 16-byte MD5 hash
    var hexHash = BitConverter.ToString(hashed).Replace("-", "");   // uppercase hex string
    Console.Out.WriteLine("hexHash = {0}", hexHash);
}
These will both produce the same value. (Where 0x2040 is sample data).
You can either store the hexadecimal string as CHAR(32), or the raw bytes as BINARY(16). Storing the binary data is twice as space-efficient as storing it as hex. What you should not be doing is storing the binary data in a CHAR(16).
It's not clear what you mean by "when I do the insert via .NET" - but you shouldn't be storing binary data just in its raw form, as it looks like you're doing using HashKey(). (Do you definitely mean HashKey, by the way? I can't find a reference for it, but there's HashBytes...)
Two common options are to encode the raw binary data as hex - which it looks like you're doing in the second case - or to use base64. Either way should be easy from .NET (Base64 marginally easier, using Convert.ToBase64String) and you probably just need to find the equivalent SQL Server function.
MD5 is typically stored in hex encoding. I'd guess that your HashKey() SQL function is not hex-encoding the MD5 hash; rather, it's just returning the raw bytes of the hash as ASCII characters. Your .NET method is hex-encoding. If you store your MD5 hashes consistently as hex (or not - up to you, but they are usually stored as hex), then the results between the two should always be consistent.
For example, the : symbol from your SQL hash corresponds to the first two characters of the .NET output, 3A. Hex 3A is 58 in decimal, and ASCII code 58 is the colon (:) character. Similarly, you can work your way through each other character and do the hex conversion.
See any ASCII codes table for reference, i.e. http://www.asciitable.com/
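For completeness, the same raw-bytes-versus-hex distinction in Ruby (Digest is in the standard library; 'hello' is just sample input):
require 'digest'
raw = Digest::MD5.digest('hello')     # 16 raw bytes -- looks like garbage in a CHAR column
hex = Digest::MD5.hexdigest('hello')  # 32 hex characters: "5d41402abc4b2a76b9719d911017c592"
raw.unpack1('H*') == hex              # => true -- hex is just a printable encoding of the same bytes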