Correct way to store a bit array - mysql

I'm working on a project that needs to store something like
101110101010100011010101001
into the database. It's not a file or archive: it's only a bit array, and I think that storing it in a varchar column is a waste of space and performance.
I've looked into the BLOB and VARBINARY types, but both of them accept a value like 54563423523515453453, which is not exactly a bit array.
If I store a bit array like 10001000 as text in a BLOB/VARBINARY/VARCHAR column, it will take more than one byte, and I want it to take the minimum space possible. Eight bits should take only one byte, 16 bits two bytes, and so on.
If it's not possible, then what is the best approach to waste the minimum amount of space in this case?
Important notes: The size of the array is variable, and is not divisible by eight in every situation. Sometimes I will need to store 325 bits, other times 7143 bits....

In one of my previous projects, I converted streams of 1s and 0s to decimal, but they were shorter. I don't know whether that would be applicable in your project.
On the other hand, IMHO, you should clarify what you will need to do with the data once it is stored. Search? Compare? It might largely depend on the purpose of the database.
Could you gzip it and then store it? Is that applicable?

Binary is a string representation of a number. The string
101110101010100011010101001
represents the number
... + 1*2^5 + 0*2^4 + 1*2^3 + 0*2^2 + 0*2^1 + 1*2^0
As such, it can be stored in a 32-bit integer if it is converted from a binary string to the number it represents. In Perl, one would use
oct('0b'.$binary)
But you have a variable number of bits. Not a problem! Just process them 8 at a time to create a string of bytes to place in a BLOB or similar.
Ah, but there's a catch. You'll need to add padding to get a number divisible by 8, which means you'll have to use a means of removing that padding. A simple approach if there's a known maximum length is to use a length prefix. e.g. If you know the number of bits is never going to exceed 65,535, encode the number of bits in the first two bytes of the string.
pack('nB*', length($binary), $binary)
which is reverted using
my ($length, $binary) = unpack('nB*', $packed);
substr($binary, $length) = '';
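
For readers who do not use Perl, the same length-prefix idea can be sketched in Python. This is only an illustration under the assumptions stated in the answer above (at most 65,535 bits, a 2-byte big-endian length prefix, zero padding on the right); the function names pack_bits and unpack_bits are made up for this example.

```python
import struct


def pack_bits(bits: str) -> bytes:
    """Pack a string of '0'/'1' characters into bytes, prefixed by its bit count."""
    if len(bits) > 0xFFFF:
        raise ValueError("a 2-byte length prefix holds at most 65,535 bits")
    # Pad on the right with zeros so the length is a multiple of 8.
    padded = bits + "0" * (-len(bits) % 8)
    body = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    # ">H" is an unsigned 16-bit big-endian integer, like Perl's 'n'.
    return struct.pack(">H", len(bits)) + body


def unpack_bits(packed: bytes) -> str:
    """Reverse pack_bits: read the bit count, expand the bytes, drop the padding."""
    (length,) = struct.unpack(">H", packed[:2])
    bits = "".join(f"{byte:08b}" for byte in packed[2:])
    return bits[:length]
```

With this layout, the 27-bit string from the question takes 4 data bytes plus the 2-byte prefix, and a 325-bit array takes 41 + 2 bytes. The resulting bytes value can go into a BLOB or VARBINARY column through a parameterized query.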

Related

Data type for huge binary numbers

I have to handle huge binary numbers (<=4096 digits) - what is the best way to handle such big numbers? I have to multiply them afterward and apply the %-operation on these numbers.
Do I have to use structs or how am I supposed to handle such data?
If you've got it as a string of 4096 digits, you can convert it into a list of smaller chunks (e.g. bytes of 8 bits each). Then, if you need to multiply these numbers or apply the %-operation to them, you will probably need to create a function that converts those "chunks" from binary to denary (so you can multiply them and so on).
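
The question does not say which language is in use. As a rough sketch of the chunking idea, here is how it could look in Python, whose built-in int already has arbitrary precision, so multiplication and % work directly once the digits are converted. The helper name binary_to_int_chunked and the sample operands are assumptions made for this example.

```python
def binary_to_int_chunked(digits: str) -> int:
    """Convert a binary digit string to an integer by way of 8-bit chunks."""
    # Pad on the left so the digit count is a multiple of 8; this keeps the value unchanged.
    padded = "0" * (-len(digits) % 8) + digits
    chunks = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    # Interpret the chunks as one big-endian number.
    return int.from_bytes(chunks, "big")


# Hypothetical operands: two 4096-digit binary numbers and a small modulus.
a = binary_to_int_chunked("1" * 4096)
b = binary_to_int_chunked("1" + "0" * 4095)
m = binary_to_int_chunked("1000000000000000000000000000001")
result = (a * b) % m
```

In a language without big integers, the same chunks would instead feed a hand-written multi-word multiply and modulo.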

smallest storage of integer array in mysql?

I have a table of user entries, and for every entry I have an array of (2-byte) integers to store (15-25, sporadically even more). The array elements will be written and read all at the same time, it is never needed to update or to access them individually. Their order matters. It makes sense to think of this as an array object.
I have many millions of these user entries and want to store them using the minimum possible amount of disk space. However, I'm struggling with MySQL's lack of an array datatype.
I've been considering the following options.
Do it the MySQL way. Make a table my_data with columns user_id, data_id and data_int. To make this efficient, one needs an index on user_id, totalling well over 10 bytes per integer.
Store the array in text format. This takes ~6.5 bytes per integer.
Make 35-40 columns ("enough") and have -32768 mean 'empty' (since this value cannot occur in my data). This takes 3.5-4 bytes per integer, but is somewhat ugly (as I have to impose a strict limit on the number of elements in the array).
Is there a better way to do this in MySQL? I know MySQL has an efficient varchar type, so ideally I'd store my 2-byte integers as 2-byte chars in a varchar (or a similar approach with blob), but I'm not sure how to do that. Is this possible? How should this be done?
You could store them as separate SMALLINT NULL columns.
In MyISAM, this uses 2 bytes of data + 1 bit of null indicator for each value.
In InnoDB, the null indicators are encoded into the column's field start offset, so they don't take any extra space, and null values are not actually stored in the row data. If the rows are small enough that all the offsets are 1 byte, then this uses 3 bytes for every existing value (1 byte offset, 2 bytes data), and 1 byte for every nonexistent value.
Either of these would be better than using INT with a special value to indicate that it doesn't exist, since that would be 4 bytes of data for every value.
See NULL in MySQL (Performance & Storage)
The best answer was given in the comments, so I'll repost it here with some use-ready code, for further reference.
MySQL has a varbinary type that works really well for this: you can simply use PHP's pack/unpack functions to convert them to and from binary form, and store that binary form in the database using varbinary. Example code for the conversion is below.
function pack24bit($n) { // input: 24-bit integer, output: binary string of 3 bytes (big-endian)
    $b1 = ($n >> 16) & 0xFF; // most significant byte
    $b2 = ($n >> 8) & 0xFF;  // middle byte
    $b3 = $n & 0xFF;         // least significant byte
    return pack('CCC', $b1, $b2, $b3);
}
function unpack24bit($packed) { // input: binary string of 3 bytes, output: 24-bit integer
    $arr = unpack('C3b', $packed); // yields the keys b1, b2, b3
    return 256 * (256 * $arr['b1'] + $arr['b2']) + $arr['b3'];
}
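
The question itself is about arrays of 2-byte integers, so a whole array can be packed in a single call. The sketch below uses Python's struct module rather than PHP and assumes signed 16-bit values stored big-endian; the function names are invented for this example.

```python
import struct


def pack_int16_array(values):
    """Pack a list of signed 16-bit integers into 2 bytes each, order preserved."""
    # ">" selects big-endian with no alignment padding, "h" is a signed 16-bit integer.
    return struct.pack(f">{len(values)}h", *values)


def unpack_int16_array(blob):
    """Reverse pack_int16_array; the element count follows from the blob length."""
    if len(blob) % 2:
        raise ValueError("blob length must be a multiple of 2 bytes")
    return list(struct.unpack(f">{len(blob) // 2}h", blob))
```

An array of 20 values then takes 40 bytes of payload in a VARBINARY column, plus the column's own length overhead.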

Parsing base 2^32 numbers to decimal (for theoretically unlimited numbers)

I am working on a C++ problem where I have to print my class.
My class stores theoretically unlimited long numbers and does arithmetic and logic operations on them. It has an array of unsigned ints to hold the number. For example:
If the number is {a*(2^32) + b} , the class stores it as {array[0]=b , array[1]=a}.
So it is like a number in base 2^32. The problem is: how do I convert this number to decimal so I can print it? Simply computing {a*(2^32) + b} will not do because it doesn't fit into an unsigned int. I do not have to store the decimal number, just print it.
What I have got so far
I have thought of firstly converting the number to binary (which is an easy task) then printing it. But same problem arises because there is still no big enough variable to hold the multiplication.
Wild thought
I wonder if I can use my own class to hold the multiplication and with some iterative method do the printing?
I also wonder if this can be solved with some use of logarithms?
Note: I am not allowed to use other libraries or other long types like double and longer.
Although I say this is for theoretically unlimited numbers, it would help if I could just find a way to print an array of size 2. Then I can think about longer numbers.
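
One common approach, sketched here in Python rather than C++, is to divide the whole array by a power of ten repeatedly and collect the remainders as groups of decimal digits. This is only an illustration: it assumes an intermediate value of up to about 62 bits is available during the division (in C++ that would mean a 64-bit type, which the constraints above may rule out, in which case the same loop can run on 16-bit halves with a smaller divisor). The names limbs_to_decimal, BASE and CHUNK are made up for this example.

```python
BASE = 2 ** 32   # each array element is one base-2^32 digit
CHUNK = 10 ** 9  # largest power of ten below 2^32


def limbs_to_decimal(limbs):
    """Return the decimal string of a number stored least significant limb first."""
    work = list(limbs)
    while work and work[-1] == 0:  # drop zero limbs at the high end
        work.pop()
    if not work:
        return "0"
    groups = []  # remainders, least significant 9-digit group first
    while work:
        remainder = 0
        # Schoolbook long division of the whole array by CHUNK, high limb first.
        for i in range(len(work) - 1, -1, -1):
            current = remainder * BASE + work[i]
            work[i] = current // CHUNK
            remainder = current % CHUNK
        groups.append(remainder)
        while work and work[-1] == 0:
            work.pop()
    # The top group is printed as-is; lower groups are zero-padded to 9 digits.
    return str(groups[-1]) + "".join(f"{g:09d}" for g in reversed(groups[:-1]))
```

For the size-2 case from the question, limbs_to_decimal([b, a]) produces the digits of a*(2^32) + b without ever holding that value in a single limb.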

Confused with endianness: bits or bytes?

I extracted this from a tutorial:
Little-Endian order is the one we will be using in this document, and unless stated specifically you should assume that Little-Endian order is used in any file. The alternate is Big-Endian ordering. So let’s see an example. Take the following stream or 8 bits:
10001110
If you have been following the document so far, you would quickly calculate the value of this 8-bit number as being
1x2^7 + 0x2^6 + … + 1x2^1 + 0x2^0 = 142
This is an example of Little-Endian ordering. However, in Big-Endian ordering we need to read the number in the opposite direction:
1x2^0 + 0x2^1 + … + 1x2^6 + 0x2^7 = 113
Is this correct?
I used to think that endianness has to do with the order in which the BYTES (not the bits) are read.
Yes, in the context of memory/storage, endianness indeed refers to byte ordering (typically). What would it mean to say that e.g. the least-significant bit "comes first"?
Bit endianness is relevant in some situations, for instance when sending data over a serial bus.
You are correct - that quote you have there is rubbish, IMHO.
It wouldn't make sense to reorder bits, and it would be pretty confusing to boot. CPUs don't read single bits; they read bytes, or combinations of bytes, at one time, so that's the ordering that's important.
When they store a number made up of multiple bytes, they can either store it from left to right, making the high-order byte lowest in memory, or right to left, with the low-order byte lowest in memory.
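
As a small illustration of the byte-order point, the following Python sketch lays out one 32-bit value both ways. The sample value 0x12345678 is an arbitrary choice for this example. The last two lines reproduce the two readings of the bit string from the quoted tutorial.

```python
value = 0x12345678

# Big-endian: the high-order byte comes first (lowest address).
big = value.to_bytes(4, "big")
# Little-endian: the low-order byte comes first (lowest address).
little = value.to_bytes(4, "little")

# Same bytes, opposite order; the bits inside each byte are untouched.
same_bytes_reversed = big == little[::-1]

# The tutorial's bit string read in both directions.
msb_first = int("10001110", 2)        # 142
lsb_first = int("10001110"[::-1], 2)  # 113
```

Here big is the byte sequence 12 34 56 78 and little is 78 56 34 12.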

Why is it useful to know how to convert between numeric bases?

We are learning about converting Binary to Decimal (and vice-versa) as well as other base-conversion methods, but I don't understand the necessity of this knowledge.
Are there any real-world uses for converting numbers between different bases?
When dealing with Unicode escape codes: '\u2014' in JavaScript is &#8212; in HTML
When debugging: many debuggers show all numbers in hex
When writing bitmasks: it's more convenient to specify powers of two in hex (or by writing 1 << 4)
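
These three cases can be checked in a few lines of Python. This is just a sketch: the flag names are invented for the example, and the HTML form used is the decimal numeric character reference for the same code point.

```python
import html

# Hex escape and decimal character reference name the same code point: 0x2014 == 8212.
em_dash = "\u2014"
same_code_point = em_dash == chr(8212) == html.unescape("&#8212;")

# Debugger-style hex view of an ordinary decimal number.
as_hex = hex(255)

# Hypothetical bitmask flags: each is a power of two, written as a shift or in hex.
FLAG_READ = 1 << 0
FLAG_WRITE = 1 << 1
FLAG_HIDDEN = 1 << 4   # same value as 0x10
mask = FLAG_READ | FLAG_HIDDEN
```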
In this article I describe a concrete use case. In short, suppose you have a series of bytes you want to transfer using some transport mechanism, but you cannot simply pass the payload as bytes, because you are not able to send binary content. Let's say you can only use 64 characters for encoding the payload. A solution to this problem is to convert the bytes (8-bit characters) into 6-bit characters. Here the number conversion comes into play. Consider the series of bytes as a big number whose base is 256. Then convert it into a number with base 64 and you are done. Each digit of the new base 64 number now denotes a character of your encoded payload...
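
The conversion described here can be sketched in Python as follows. This is a literal rendering of the "big number in base 256 to base 64" idea, not the article's code and not a full Base64 implementation: standard Base64 works in 3-byte groups with padding, and the approach below drops leading zero bytes unless the original length is passed back in. The alphabet is assumed to be the usual Base64 one, and the function names are made up.

```python
import string

# Assumed 64-character alphabet (the standard Base64 ordering).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def to_base64_digits(payload: bytes) -> str:
    """Treat the payload as one base-256 number and write it in base 64."""
    number = int.from_bytes(payload, "big")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number:
        number, digit = divmod(number, 64)  # peel off the lowest base-64 digit
        digits.append(ALPHABET[digit])
    return "".join(reversed(digits))


def from_base64_digits(text: str, length: int) -> bytes:
    """Rebuild the payload; length is the original byte count."""
    number = 0
    for char in text:
        number = number * 64 + ALPHABET.index(char)
    return number.to_bytes(length, "big")
```

For a payload whose length is a multiple of 3 and whose first byte is at least 4, the output matches ordinary Base64; for example, b"Man" becomes "TWFu".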
If you have a device, such as a hard drive, that can only have a set number of states, you can only count in a number system with that many states.
Because each bit in a computer has only two states, on and off, it can only represent 0 and 1. Therefore a base 2 system is used.
If you have a device that had 3 states, you could represent 0, 1 and 2, and therefore count in a base 3 system.