Is Not BigInt Enough To House sha1? - mysql

I want to know if BigInt is big enough in size.
I have created a registration.php where the user gets emailed an account activation link to click to verify their email address so that their account gets activated.
Account Activation Link is in this format:
[php]
$account_activation_link =
"http://www.".$site_domain."/".$social_network_name."/activate_account.php?primary_website_email=".$primary_website_email."&account_activation_code=".$account_activation_code."";
[/php]
Account Activation Code is in this format:
$account_activation_code = sha1( (string) mt_rand(5, 30)); // Cast the int to a string, as sha1() expects a string for its first parameter.
Now, the following link got emailed:
http://www.myssite.com/folder/activate_account.php?primary_website_email=my.email@gmail.com&account_activation_code=22d200f8670dbdb3e253a90eee5098477c95c23d
Note the account activation code that got generated by sha1:
22d200f8670dbdb3e253a90eee5098477c95c23d
But in my mysql db, in the "account_activation_code" column, I only see "22". The rest of the activation code is missing. Why is that?
The column is set to BigInt. Is that not enough to house the SHA-1 generated code?
What is your suggestion?
Thank You

Hashing methods like SHA-1 produce binary values on the order of 160+ bits, depending on the variant used; SHA-1 itself is 160 bits, and the common SHA-256 is 256 bits. No cryptographic hash will fit in a 64-bit BIGINT field, because a 64-bit hash would be uselessly small: you would have nothing but collisions.
Normally people store hashes as their hex-encoded equivalents in a VARCHAR(255) column. These can be indexed and perform well enough in most situations, especially ones like yours where you do periodic lookups based on clicks. From a performance and storage perspective there are no problems here.
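For illustration, a minimal PHP sketch of that approach, assuming a PDO connection $pdo and the question's column names (the users table name is hypothetical). Note that random_bytes() is used here instead of mt_rand(5, 30), which can only ever produce 26 distinct codes:
[php]
// Generate a 40-character hex activation code and store it in a VARCHAR(40) column.
$account_activation_code = sha1(random_bytes(32));
$stmt = $pdo->prepare('UPDATE users SET account_activation_code = ? WHERE primary_website_email = ?');
$stmt->execute([$account_activation_code, $primary_website_email]);
[/php]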

Short answer: BIGINT is way too small.
A hash is basically a stream of bits (160 bits in the case of SHA-1). While it's certainly possible to render those bits as a base-2 number and convert it to base 10, you would need a really big integer type to do so (integer types larger than 64 bits are uncommon) and there are no obvious advantages. BIGINT is a 64-bit type, and thus cannot do the job.
Unless you have a good reason to store it as a number, I'd simply go for either a binary column type or its plain-text hexadecimal representation in a good old VARCHAR (the latter tends to be more practical to handle).
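To make the two options concrete, a short PHP sketch (sha1()'s optional second parameter returns the raw bytes):
[php]
$hex = sha1('example');       // 40-character hex string -> store in CHAR(40)
$raw = sha1('example', true); // 20 raw bytes            -> store in BINARY(20)
[/php]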

You are trying to store a string in a BigInt; that is your issue. SHA hashes are a mix of alphanumeric characters, not just numbers. Change the field to a VARCHAR and you'll be fine.
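A quick sketch of that schema change, assuming a hypothetical users table and a PDO connection $pdo:
[php]
// 40 characters is exactly enough for a hex-encoded SHA-1 value.
$pdo->exec('ALTER TABLE users MODIFY account_activation_code VARCHAR(40)');
[/php]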

Related

How does Adobe Target obfuscate IPs?

I have a dataset containing obfuscated IPs. In order to do something, I would need to match IPs I know with this dataset.
If my dataset contains hashed IPs 1053617334, 1043615471
And I have IP 192.168.0.1, how can I hash it so I can verify if it is in the dataset or not?
IPv4 addresses are commonly represented as so-called "dotted quads", like 192.0.2.42 or 192.168.0.1.
That's 32 bits of data. And, that same data can be represented as a single unsigned decimal number. Your numbers like 1053617334, 1043615471 are probably examples of those numbers. They aren't, strictly speaking, hashed or obfuscated. They're just represented differently.
http://192.168.0.1 and http://3232235521 mean exactly the same thing.
There are all sorts of online tools to convert back and forth between the dotted-quad and decimal representations.
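If you'd rather do the conversion in code than with an online tool, PHP's built-in ip2long()/long2ip() pair handles it, as in this minimal sketch:
[php]
$decimal = sprintf('%u', ip2long('192.168.0.1')); // "3232235521" (printed as unsigned)
$dotted  = long2ip(3232235521);                   // "192.168.0.1"
[/php]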
Consult your AA documentation or support team to figure out how to handle this.

SQL string literal hexadecimal key to binary and back

After extensive searching I am resorting to Stack Overflow's wisdom to help me.
Problem:
I have a database table that should effectively store values of the format (UserKey, data0, data1, ..) where the UserKey is to be handled as primary key but at least as an index. The UserKey itself (externally defined) is a string of 32 characters representing a checksum, which happens to be (a very big) hexadecimal number, i.e. it looks like this UserKey = "000000003abc4f6e000000003abc4f6e".
Now I can certainly store this UserKey in a char(32) field, but I feel this is mighty inefficient, as I would store a series of in-principle arbitrary characters, i.e. reserve more space per character than the 4 bits I need to store each hexadecimal digit (0-9, A-F).
So my thought was to convert this string literal into the hex number it really represents and store that. But this number (32 * 4 bits = 16 bytes) is much too big to store or handle, as SQL only handles BIGINTs of 8 bytes.
My second thought was to convert it into a BINARY(16) representation, which should be compact and memory-efficient. However, I do not know how to efficiently convert between these two formats, as SQL internally also only handles numbers up to a maximum of 8 bytes.
Maybe there is a way to convert this string to binary block by block and stitch the binary together somehow, in the way of:
UserKey == concat( stringblock1, stringblock2, ..)
UserKey_binary = concat( toBinary( stringblock1 ), toBinary( stringblock2 ), ..)
So my question is: is there any such mechanism foreseen in SQL that would solve this for me? What would a custom solution look like? (I find it hard to believe that I should be the first to encounter such a problem, as it has become quite common to use ridiculously long hash keys in many applications.)
Also, the UserKey_binary should then act as the relational key for the table, so I hope for a bit of speed from this more compact representation, as comparisons need to examine a minimal number of bits. Additionally, I would like to do any conversion, if possible, on the server side, so that user scripts need not be altered (the user side should, if possible, still transmit a string literal, not [partially] converted values, in the insert statement).
Contrary to my previous statement, it turns out that MySQL's UNHEX() function converts a string block by block and then concatenates the results, much as I described above, so the method also works for hex literal values bigger than BIGINT's 8-byte limit. Here is an example table that illustrates this:
CREATE TABLE `testdb`.`tab` (
`hexcol_binary` BINARY(16) GENERATED ALWAYS AS (UNHEX(charcol)) STORED,
`charcol` CHAR(32) NOT NULL,
PRIMARY KEY (`hexcol_binary`));
The primary key is a generated column, so updates to charcol are the designated way of interacting with the table with string literals from the outside:
REPLACE into tab (charcol) VALUES ('1010202030304040A0A0B0B0C0C0D0D0');
SELECT HEX(hexcol_binary) as HEXstring, tab.* FROM tab;
As seen, building keys and indexes on hexcol_binary works as intended.
To verify the speedup, take
ALTER TABLE `testdb`.`tab`
ADD INDEX `charkey` (`charcol` ASC);
EXPLAIN SELECT * from tab where hexcol_binary = UNHEX('1010202030304040A0A0B0B0C0C0D0D0') #keylength 16
EXPLAIN SELECT * from tab where charcol = '1010202030304040A0A0B0B0C0C0D0D0' #keylength 97
the lookup on the hexcol_binary column performs much better, especially if it is additionally made unique.
Note: the hex conversion does not care whether the hex characters A through F are capitalized, but comparisons against charcol will be very sensitive to this.
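As a side note, should you ever need the same conversion client-side, PHP's hex2bin()/bin2hex() pair mirrors MySQL's UNHEX()/HEX(), as in this small sketch:
[php]
$userKey = '000000003abc4f6e000000003abc4f6e';
$binary  = hex2bin($userKey); // 16 bytes, the same value UNHEX() produces
$back    = bin2hex($binary);  // round-trips to the (lowercase) hex string
[/php]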

Correct way to store a bit array

I'm working on a project that needs to store something like
101110101010100011010101001
into the database. It's not a file or archive: it's only a bit array, and I think that storing it in a varchar column is a waste of space/performance.
I've looked at the BLOB and VARBINARY types, but both of them allow inserting a value like 54563423523515453453, which is not exactly a bit array.
For sure, if I store a bit array like 10001000 in a BLOB/varbinary/varchar column, it will consume more than a byte, and I want the minimum space to be consumed. In the case of eight bits it should consume only one byte, 16 bits two bytes, and so on.
If it's not possible, then what is the best approach to waste the minimum amount of space in this case?
Important notes: The size of the array is variable, and is not divisible by eight in every situation. Sometimes I will need to store 325 bits, other times 7143 bits....
In one of my previous projects, I converted streams of 1s and 0s to decimal, but they were shorter. I don't know if that would be applicable in your project.
On the other hand, IMHO, you should clarify what you will need to do with that data once it is stored. Search? Compare? It might largely depend on the purpose of the database.
Could you gzip it and then store it? Is that applicable?
Binary is a string representation of a number. The string
101110101010100011010101001
represents the number
... + 1*2^5 + 0*2^4 + 1*2^3 + 0*2^2 + 0*2^1 + 1*2^0
As such, it can be stored in a 32-bit integer if it were converted from a binary string to the number it represents. In Perl, one would use
oct('0b'.$binary)
But you have a variable number of bits. Not a problem! Just process them 8 at a time to create a string of bytes to place in a BLOB or similar.
Ah, but there's a catch. You'll need to add padding to get a bit count divisible by 8, which means you'll also need a means of removing that padding. A simple approach, if there's a known maximum length, is a length prefix. E.g. if you know the number of bits will never exceed 65,535, encode the number of bits in the first two bytes of the string:
pack('nB*', length($binary), $binary)
which is reverted using
my ($length, $binary) = unpack('nB*', $packed);
substr($binary, $length) = '';
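For completeness, here is the same length-prefix scheme as a PHP sketch; PHP's pack() has no 'B' (bit string) format, so the conversion is done byte by byte. It assumes the input is a string of ASCII '0'/'1' characters no longer than 65,535 bits:
[php]
function pack_bits(string $bits): string {
    // Pad on the right up to a whole number of bytes.
    $padded = str_pad($bits, ((int) ceil(strlen($bits) / 8)) * 8, '0');
    $bytes = '';
    foreach (str_split($padded, 8) as $chunk) {
        $bytes .= chr(bindec($chunk)); // 8 bits -> 1 byte
    }
    return pack('n', strlen($bits)) . $bytes; // 2-byte big-endian length prefix
}

function unpack_bits(string $packed): string {
    $length = unpack('n', $packed)[1]; // original number of bits
    $bits = '';
    foreach (str_split(substr($packed, 2)) as $byte) {
        $bits .= str_pad(decbin(ord($byte)), 8, '0', STR_PAD_LEFT);
    }
    return substr($bits, 0, $length); // drop the padding bits
}
[/php]
For example, pack_bits('101110101010100011010101001') packs the 27 bits into 6 bytes: 2 for the length prefix plus 4 for the padded payload.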

Truncates Long Text/Memo string to 255 characters when it is a primary key field or "Indexed: Yes (no-duplicates) allowed"

I created a table in MS Access 2013 with only one column of "Long Text" type (called Memo earlier) and made it the primary key of the table. I stored a long string of 255+ characters, then tried to store another string whose first 255 characters were the same as the previously stored string but whose characters after the first 255 were all different, and MS Access gave a "duplicate data" error. In the new string I changed the characters after the 255th position using different combinations of characters, and all gave the error. But when I change any character before the 255th position, it does not give any error. So I concluded that MS Access checks only the first 255 characters of a "Long Text" value when checking for duplicates in that column. Is it so? What else could be the reason?
String Stored of 256 characters:
LoremIpsumissimplydummytextoftheprintingandtypesettingindustryLoremIpsumhasbeentheindustrysstandarddummytexteversincethe1500swhenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbookIthassurvivednotonlyfivecenturiesbutalsotheleapintoelectr
String Gave Error:
LoremIpsumissimplydummytextoftheprintingandtypesettingindustryLoremIpsumhasbeentheindustrysstandarddummytexteversincethe1500swhenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbookIthassurvivednotonlyfivecenturiesbutalsotheleapintoelect1
String Gave Error:
LoremIpsumissimplydummytextoftheprintingandtypesettingindustryLoremIpsumhasbeentheindustrysstandarddummytexteversincethe1500swhenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbookIthassurvivednotonlyfivecenturiesbutalsotheleapintoelect2
String Gave Error:
LoremIpsumissimplydummytextoftheprintingandtypesettingindustryLoremIpsumhasbeentheindustrysstandarddummytexteversincethe1500swhenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbookIthassurvivednotonlyfivecenturiesbutalsotheleapintoelect123
Does Not Give Error:
LoremIpsumissimplydummytextoftheprintingandtypesettingindustryLoremIpsumhasbeentheindustrysstandarddummytexteversincethe1500swhenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbookIthassurvivednotonlyfivecenturiesbutalsotheleapintoelec1
Does Not Give Error:
LoremIpsumissimplydummytextoftheprintingandtypesettingindustryLoremIpsumhasbeentheindustrysstandarddummytexteversincethe1500swhenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbookIthassurvivednotonlyfivecenturiesbutalsotheleapintoelec2
Does Not Give Error:
LoremIpsumissimplydummytextoftheprintingandtypesettingindustryLoremIpsumhasbeentheindustrysstandarddummytexteversincethe1500swhenanunknownprintertookagalleyoftypeandscrambledittomakeatypespecimenbookIthassurvivednotonlyfivecenturiesbutalsotheleapintoelec3
Please notice the difference in the last few characters of the above samples. The first stored string has 256 characters. Even if the column is not the primary key, the problem remains the same if "Indexed: Yes (No duplicates)" is set for that column in the table design.
As @HansUp stated in the comments, Access (specifically the Jet/ACE db engine) only uses the first 255 characters of a Memo/Long Text field to create its index. Hence, it only uses the first 255 characters to enforce No Duplicates.
@HansUp's advice to use a different db engine that provides better support for long strings and full-text search is probably the best approach, but I understand there are often other considerations that may limit you to solving your problem in Access.
As such, here is an Access-only approach to solving your problem. This assumes the requirement you listed in the comments is valid; i.e., you need to store unique strings of between 400 and 1000 characters.
Alternative 1
Keep your initial Memo/Long Text field: Notes
Create four text fields (not Memo/Long Text) of 250 characters max: Notes1, Notes2, Notes3, Notes4
Set all four text fields: Required -> True and Allow Zero Length -> True (this is required to ensure the unique index is enforced for strings less than 751 characters)
Create a unique index and add all four text fields to that index
Don't ignore nulls in your index
When you store the values, you will need to store them in the Notes field and also split the string among the four smaller NotesX fields
Alternative 2:
Keep your current setup and enforce the uniqueness at code level. Every time you update or insert a note, do a search on all notes that match the first 255 characters, read the value and perform the comparison in code.
Alternative 3 (thanks to @HansUp for suggesting this in the comments):
Keep your initial Memo/Long Text field: Notes
Create a 16 or 32 character text field to store the 256 bit or 512 bit hash of your long text: NotesHash
Add a unique index to your NotesHash field
Every time the memo field is changed, re-compute the hash value and attempt to store it in the table
Notes for this method:
As the pigeonhole principle easily proves, there is the possibility that two different strings will generate the same hash (a collision). However, using a good hashing algorithm will make the actual probability approach zero.
This site offers some VB6/VBA/VBScript implementations of various hashing algorithms. I can't vouch for their correctness, but they passed the eye test for me. Use at your own risk, but it's at least a good starting point.
Really, you can use any deterministic function that returns a string of 255 characters or fewer given an arbitrarily large input. The difference between a crappy hash algorithm and a good one is how well it minimizes collisions. For that reason, I would suggest you use one based on a popular standard.
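For illustration only (Access itself would use one of the VBA implementations linked above), the idea is a one-liner in PHP; the 64-character hex output fits comfortably under the 255-character index limit:
[php]
$notesHash = hash('sha256', $notes); // deterministic, always 64 hex characters
[/php]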
And yes, I still highly recommend @HansUp's solution to simply use a different db engine.

What column type/length should I use for storing a Bcrypt hashed password in a Database?

I want to store a hashed password (using BCrypt) in a database. What would be a good type for this, and which would be the correct length? Are passwords hashed with BCrypt always of same length?
EDIT
Example hash:
$2a$10$KssILxWNR6k62B7yiX0GAe2Q7wwHlrzhF3LqtVvpyvHZf0MwvNfVu
After hashing some passwords, it seems that BCrypt always generates 60 character hashes.
EDIT 2
Sorry for not mentioning the implementation. I am using jBCrypt.
The modular crypt format for bcrypt consists of
$2$, $2a$ or $2y$ identifying the hashing algorithm and format
a two digit value denoting the cost parameter, followed by $
a 53-character base-64-encoded value (using the alphabet ., /, A-Z, a-z, 0-9, which differs from the standard Base64 alphabet) consisting of:
22 characters of salt (effectively only 128 bits of the 132 decoded bits)
31 characters of encrypted output (effectively only 184 bits of the 186 decoded bits)
Thus the total length is 59 or 60 bytes respectively.
As you use the 2a format, you'll need 60 bytes. Thus for MySQL I recommend using CHAR(60) BINARY or BINARY(60) (see The _bin and binary Collations for information about the difference).
CHAR is not binary safe and equality does not depend solely on the byte value but on the actual collation; in the worst case A is treated as equal to a. See The _bin and binary Collations for more information.
A Bcrypt hash can be stored in a BINARY(40) column.
BINARY(60), as the other answers suggest, is the easiest and most natural choice, but if you want to maximize storage efficiency, you can save 20 bytes by losslessly deconstructing the hash. I've documented this more thoroughly on GitHub: https://github.com/ademarre/binary-mcf
Bcrypt hashes follow a structure referred to as modular crypt format (MCF). Binary MCF (BMCF) decodes these textual hash representations to a more compact binary structure. In the case of Bcrypt, the resulting binary hash is 40 bytes.
Gumbo did a nice job of explaining the four components of a Bcrypt MCF hash:
$<id>$<cost>$<salt><digest>
Decoding to BMCF goes like this:
$<id>$ can be represented in 3 bits.
<cost>$, 04-31, can be represented in 5 bits. Put these together for 1 byte.
The 22-character salt is a (non-standard) base-64 representation of 128 bits. Base-64 decoding yields 16 bytes.
The 31-character hash digest can be base-64 decoded to 23 bytes.
Put it all together for 40 bytes: 1 + 16 + 23
You can read more at the link above, or examine my PHP implementation, also on GitHub.
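For illustration, a rough PHP sketch of that decomposition. This is not the linked implementation, and the 3-bit scheme-ID mapping below is an assumption; consult the BMCF spec for the authoritative encoding:
[php]
function bcrypt_to_bmcf(string $hash): string {
    // '$2a$10$' followed by the 22-character salt and 31-character digest.
    [, $scheme, $cost, $saltDigest] = explode('$', $hash);
    $salt   = substr($saltDigest, 0, 22);
    $digest = substr($saltDigest, 22, 31);

    // Translate bcrypt's base-64 alphabet to the standard one, then decode.
    $decode = function (string $s): string {
        $std = strtr($s,
            './ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789',
            'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/');
        return base64_decode($std . str_repeat('=', (4 - strlen($std) % 4) % 4));
    };

    $schemeId = ['2' => 1, '2a' => 2, '2x' => 3, '2y' => 4][$scheme] ?? 0; // assumed mapping
    $header   = chr(($schemeId << 5) | (int) $cost); // 3 bits of scheme + 5 bits of cost

    return $header . $decode($salt) . $decode($digest); // 1 + 16 + 23 = 40 bytes
}
[/php]
Re-encoding is the mirror image: split the 40 bytes apart, re-encode the salt and digest with the bcrypt alphabet, and rebuild the $...$ string.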
If you are using PHP's password_hash() with the PASSWORD_DEFAULT algorithm to generate the bcrypt hash (which I would assume covers a large percentage of people reading this question), keep in mind that in the future password_hash() might use a different algorithm as the default, which could therefore affect the length of the hash (though it will not necessarily be longer).
From the manual page:
Note that this constant is designed to change over time as new and
stronger algorithms are added to PHP. For that reason, the length of
the result from using this identifier can change over time. Therefore,
it is recommended to store the result in a database column that can
expand beyond 60 characters (255 characters would be a good choice).
Using bcrypt, even if you have 1 billion users (i.e. you're currently competing with Facebook), storing 255-byte password hashes would take only ~255 GB of data, about the size of a smallish SSD. It is extremely unlikely that storing the password hash will be the bottleneck in your application. However, in the off chance that storage space really is an issue for some reason, you can use PASSWORD_BCRYPT to force password_hash() to use bcrypt, even if that's not the default. Just be sure to stay informed about any vulnerabilities found in bcrypt, and review the release notes every time a new PHP version is released. If the default algorithm is ever changed, it would be good to review why and make an informed decision about whether to use the new algorithm.
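A minimal sketch of the difference between the two constants:
[php]
$password = 'correct horse battery staple';

// PASSWORD_DEFAULT may change algorithm (and output length) across PHP versions.
$hash = password_hash($password, PASSWORD_DEFAULT);

// PASSWORD_BCRYPT pins the algorithm; the result is always 60 characters.
$hash = password_hash($password, PASSWORD_BCRYPT, ['cost' => 12]);
var_dump(strlen($hash)); // int(60)
[/php]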
I don't think there are any neat tricks you can use when storing this, as you can with, for example, an MD5 hash.
I think your best bet is to store it as a CHAR(60), as it is always 60 characters long.
I think the best choice is a nonbinary type, because each position then has fewer possible values to compare, which should be faster. If the data is encoded with base64_encode, each byte has only 64 possible values; if encoded with bin2hex, each byte has only 16 possible values, but the string is much longer. A binary byte has 256 possible values at each position.
For hashes in base64 form I use a VARCHAR(255) column with the ascii character set and its matching collation.
VARBINARY causes the comparison problems described in the MySQL documentation. I don't know why answers advising VARBINARY have so many upvotes.
I checked this on my own site, where I measure the time (just refresh to see).