Storing very large integers in MySQL

I need to store a very large number (tens of millions) of 512-bit SHA-2 hashes in a MySQL table. To save space, I'd like to store them in binary form, rather than as a string of hex digits. I'm using an ORM (DBIx::Class), so the specific details of the storage will be abstracted from the code, which can inflate them to any object or structure that I choose.
MySQL's BIGINT type is 64 bits. So I could theoretically split the hash up amongst eight BIGINT columns. That seems pretty ridiculous though. My other thought was just using a single BLOB column, but I have heard that they can be slow to access due to MySQL's treating them as variable-length fields.
If anyone could offer some wisdom that will save me a couple of hours of benchmarking various methods, I'd appreciate it.
Note: Automatic -1 to anyone who says "just use postgres!" :)

Have you considered 'binary(64)' ? See MySQL binary type.

Use the type BINARY(64) ?
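A 512-bit hash is exactly 64 bytes, which is why BINARY(64) fits. Here is a minimal sketch of the conversion between the hex form and the raw form, written in Python rather than the asker's Perl; the function names are made up for illustration, and the database insert itself is left out.

```python
import hashlib


def hash_to_binary(data: bytes) -> bytes:
    """Return the raw 64-byte SHA-512 digest, the value a BINARY(64) column would hold."""
    return hashlib.sha512(data).digest()


def hex_to_binary(hex_digest: str) -> bytes:
    """Convert a 128-character hex digest into its 64 raw bytes."""
    raw = bytes.fromhex(hex_digest)
    if len(raw) != 64:
        raise ValueError("expected a 512-bit (64-byte) hash")
    return raw


def binary_to_hex(raw: bytes) -> str:
    """Convert the 64 raw bytes back into a hex string for display."""
    return raw.hex()
```

The raw form takes half the space of the hex string (64 bytes instead of 128 characters).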

Related

store 300 digit number in sql

Which data type can I use to store a really big integer in SQL? I am using phpMyAdmin to view data and a Java program for storing and retrieving values. Actually, I am working with bilinear maps, which use random numbers generated from Zp, where p is a very large prime number, and then "raised to" operations on those numbers.
I want to store some numbers in database like public keys. What data type can I use for table columns in SQL for such values?
You could store them as strings of decimal digits using type CHARACTER. While this does waste some space, an advantage is that the database will be easier for humans to understand.
You could store them as raw binary big-endian values using type BLOB. This is the most efficient for software to access and takes up the least space. However, humans will not be able to easily query the database for these values or understand them in dumps.
Personally, I would opt for the blob unless there's a real need for the database to be understandable by humans using standard query tools. If you can't get around needing to administer the database with tools that don't understand your data format, then just use decimal values in text.
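To illustrate the blob option, here is a rough sketch of packing a non-negative big integer into big-endian bytes and back. It is in Python (the asker uses Java, where BigInteger offers similar conversions), and the helper names are hypothetical.

```python
def int_to_blob(n: int) -> bytes:
    """Encode a non-negative integer as the shortest big-endian byte string."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    length = max(1, (n.bit_length() + 7) // 8)  # at least one byte, even for 0
    return n.to_bytes(length, "big")


def blob_to_int(blob: bytes) -> int:
    """Decode a big-endian byte string back into an integer."""
    return int.from_bytes(blob, "big")
```

With this encoding, the largest 300-digit decimal number fits in 125 bytes, compared with 300 bytes as a string of ASCII digits.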
For MySQL, VARCHAR(300) CHARACTER SET ascii.
VAR, assuming the numbers won't always be exactly 300.
CHAR -- no big advantage in BLOB.
ascii -- no need for utf8 involvement.
DECIMAL won't work because there is a 65-digit limit.
The space taken will be 2+length bytes (302 in your example), where the 2 is for length for VAR.

MySQL best way to store long strings

I'm looking for some advice from the MySQL experts on the best way to store long strings of data.
I have a general purpose table which is used to store any kind of data, by which I mean it should be able to hold alphanumeric and numeric data.
Currently, the table structure is simple with an ID and the actual data stored in a single column as follows:
id INT(11)
data VARCHAR(128)
I now have a requirement to store a larger amount of data (up to 500 characters) and am wondering whether the best way would be to simply increase the varchar column size, or whether I should add a new column (a TEXT type column?) for the times I need to store longer strings.
If any experts out there have any advice, I'm all ears!
My preferred method would be to simply increase the varchar column, but that's because I'm lazy.
The MySQL version I'm running is 5.0.77.
I should mention the new 500 character requirement will only be for the odd record; most records in the table will be not longer than 50 characters.
I thought I'd be future-proofing by making the column 128. Shows how much I knew!
Generally speaking, this is not a question that has a "correct" answer. There is no "infinite length" text storage type in MySQL. You could use LONGTEXT, but that still has an (absurdly high) upper limit. Yet if you do, you're kicking your DBMS in the teeth for having to deal with that absurd blob of a column for your 50-character text. Not to mention the fact that you hardly do anything with it.
So, most futureproofness(TM) is probably offered by LONGTEXT. But it's also a very bad method of resolving the issue. Honestly, I'd revisit the application requirements. Storing strings that have no "domain" (as in, being well-defined in their application) and arbitrary length is not one of the strengths of RDBMS.
If I wanted to solve this on the "application design" level, I'd use a NoSQL key-value store for this (and I'm as anti-NoSQL-hype as they get, so you know it's serious), even though I recognize it's a rather expensive change for such a minor requirement. But if this is an indication of what your DBMS is eventually going to hold, it might be more prudent to switch now to avoid this same problem a hundred times in the future. Data domain is very important in RDBMS, whereas it's explicitly sidelined in non-relational solutions, which seems to be what you're trying to solve here.
Stuck with MySQL? Just increase it to VARCHAR(1000). If you have no requirements for your data, it's irrelevant what you do anyway.
Be careful if using TEXT. Depending on the storage engine and row format, long TEXT values may be stored outside the main row, so reading them can need extra disk access compared with CHAR and VARCHAR, and a TEXT column can only be indexed with a prefix length. A NoSQL database may be a better way to store long strings.
We can use varchar(<maximum_limit>). The maximum length that we can declare is 65,535 bytes.
Note: This maximum comes from the row size limit, which is shared among all columns of the table (TEXT/BLOB columns count only a few bytes toward it), and the maximum number of characters depends on the character set used.
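As a back-of-the-envelope sketch of how the byte limit turns into a character limit, assuming the VARCHAR is the only column in the table, that 2 bytes go to the length prefix, and that a nullable column costs 1 more byte for the NULL flag (the function name is invented for illustration):

```python
MAX_ROW_BYTES = 65535  # MySQL's row size limit, shared by all columns


def max_varchar_chars(bytes_per_char: int, nullable: bool = True) -> int:
    """Estimate the largest VARCHAR(n) for a table whose only column is that VARCHAR."""
    overhead = 2 + (1 if nullable else 0)  # length prefix, plus NULL flag if nullable
    return (MAX_ROW_BYTES - overhead) // bytes_per_char


latin1_limit = max_varchar_chars(1)   # single-byte character set
utf8_limit = max_varchar_chars(3)     # 3-byte utf8, as in MySQL 5.0
```

Any other columns in the table eat into the same budget, so the real limit for a given table is usually lower.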

MySQL: primary key is a 8-byte string. Is it better to use BIGINT or BINARY(8)?

We need to store many rows in a MySQL (InnoDB) table, all of them having an 8-byte binary string as primary key.
I was wondering whether it was best to use the BIGINT column type (which contains 64-bit, thus 8-byte, integers) or BINARY(8), which is fixed length.
Since we're using those ids as strings in our application, and not numbers, storing them as binary strings sounds more coherent to me. However, I wonder if there are performance issues with this. Does it make any difference?
If that matters, we are reading/storing these ids using hex notation (like page_id = 0x1122334455667788).
We wouldn't use integers in queries anyway, since we're writing a PHP application and, as you surely know, there isn't an "unsigned long long int" type, so all integers have a machine-dependent size.
I'd use the binary(8) if this matches your design.
Otherwise you'll always have a conversion overhead in performance or complexity somewhere. There won't be much (if any) difference between the types at the RDBMS level.
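To show what that conversion looks like, here is a small sketch in Python (not the asker's PHP) that turns the hex notation from the question into the 8 raw bytes for BINARY(8), and into the unsigned 64-bit integer that BIGINT UNSIGNED would hold. The helper names are hypothetical.

```python
import struct


def hex_id_to_bytes(hex_id: str) -> bytes:
    """Convert an id like '0x1122334455667788' into its 8 raw bytes."""
    digits = hex_id[2:] if hex_id.lower().startswith("0x") else hex_id
    raw = bytes.fromhex(digits.zfill(16))  # left-pad short ids with zeros
    if len(raw) != 8:
        raise ValueError("expected at most 16 hex digits")
    return raw


def bytes_to_unsigned(raw: bytes) -> int:
    """Interpret 8 bytes as a big-endian unsigned 64-bit integer."""
    return struct.unpack(">Q", raw)[0]
```

Both forms carry the same 64 bits; the difference is only in which representation the application has to convert to and from.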

MySQL Storage and Optimization

I'm looking at a db schema for a project I'm inheriting. There are many instances of binary answers being stored as INT(11) rather than TinyInt(1), which is the way I've normally handled this type of storage.
I've checked the data and everything is either "1" or "0". Is there any reason to or not to change the datatype to TinyInt(1) Unsigned for all of these instances?
Similarly, if a column like "last_name" currently allows varchar(255), would switching to varchar(100) create any gains? I'm more interested in performance/efficiency than in just limiting data storage at this point.
Thanks,
D.
I would say definitely go ahead with the changes to the boolean columns. (Note: Actually if you're using MySQL 5+, I would use the bit datatype instead of tinyint).
As far as the varchar columns, it doesn't actually make a difference changing 255 to 100 length.
From the SQL docs:
A column uses one length byte if values require no more than 255 bytes, two length bytes if values may require more than 255 bytes.
So as long as it's under 255, you're really not gaining much in terms of memory storage.
That being said, by limiting the size of the names, less data needs to be transferred between your SQL server and your application.
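The length-byte rule quoted from the docs can be sketched roughly as follows. This is only an illustration in Python, assuming a single-byte character set by default; the function and parameter names are invented.

```python
def varchar_storage_bytes(value: str, max_chars: int,
                          encoding: str = "latin-1",
                          max_bytes_per_char: int = 1) -> int:
    """Estimate bytes used by one VARCHAR value: length prefix plus the data itself."""
    data = value.encode(encoding)
    # One length byte if the column can never need more than 255 bytes, else two.
    prefix = 1 if max_chars * max_bytes_per_char <= 255 else 2
    return prefix + len(data)


in_varchar_255 = varchar_storage_bytes("Smith", 255)
in_varchar_100 = varchar_storage_bytes("Smith", 100)
```

Under these assumptions both calls give the same size for the same name, which matches the point that shrinking 255 to 100 saves no storage. With a multi-byte character set, a VARCHAR(255) column can need more than 255 bytes and so takes two length bytes.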
Switching to TINYINT would save you 3 bytes I believe, which doesn't seem like a lot to me, although it's certainly a little more efficient.
I always try and make VARCHAR columns as small as I can get away with. I would personally focus on any gains you can get from that.
The main reason I can think of to avoid any of these changes is if you have so much data that running an ALTER TABLE would cause significant downtime.
Whether any of this will help your app perform better is open to debate. In theory, with VARCHARs, MySQL will only send the actual data over the wire, so if all your last names are 40 bytes long, it's only sending 40 bytes. If the column isn't being used in lookups, it shouldn't really have any impact on your performance. There are a couple of relevant questions on SO covering this issue already.

Storing a binary array in MySQL

I have an array of values called A, B... X, Y, Z. Fun though it would be to have 26 columns in the table I can't help but feel there is a better way. I have considered creating a second table with the id value of row from the first table, the id of the item in the array and then the boolean value but it seems clunky and confusing.
Is there a better way?
Short answer, no. Long answer, it depends.
You can store binary data in a bunch of ways: abusing a number, using a BINARY or VARBINARY, using a BLOB or TINYBLOB, etc. BINARY types will generally be faster than BLOB types, provided your data is a known size.
However, relational databases aren't designed for doing anything intelligent with binary data. On a project I used to work on, there was a table where each record had a specific binary pattern, stored as some sort of integer, and searching required a lot of ANDs, ORs, XORs and NOTs. It never really worked very well, performance sucked, and it held the whole project down. Looking back, I would have taken a completely different approach.
So if you just want to drop the data in and pull it out again, great. If you want to use it for anything intelligent, tough.
The situation may be different on other database vendors. In fact, have you considered using something else in place of the database? Some sort of object persistence?
Are your possible array values static?
If so, try using MySQL's SET data type.
You can try storing it as a TINYBLOB, or even an UNSIGNED INT, but you'll have to do bit masking in your code.
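The bit masking could look something like this sketch, assuming the 26 values are booleans named A to Z and that bit 0 stands for A (both assumptions for illustration, as are the function names). The resulting integer fits in an UNSIGNED INT, since only 26 of its 32 bits are used.

```python
import string
from typing import Dict

LETTERS = string.ascii_uppercase  # 'A'..'Z', one bit position each


def pack_flags(flags: Dict[str, bool]) -> int:
    """Pack per-letter booleans into one integer; missing letters count as False."""
    mask = 0
    for position, letter in enumerate(LETTERS):
        if flags.get(letter, False):
            mask |= 1 << position
    return mask


def unpack_flags(mask: int) -> Dict[str, bool]:
    """Expand the integer back into a dict with an entry for every letter."""
    return {letter: bool(mask & (1 << position))
            for position, letter in enumerate(LETTERS)}
```

Testing a single flag in SQL then becomes a bitwise AND against the column, which is simple to write but, as noted above, not something an index can help much with.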
You can store it as a string and use text manipulation functions to (re)create your array.