AES Encryption in Oracle and MySQL are giving different results - mysql

I am in need to compare data between an Oracle database and a MySQL database.
In Oracle, the data is first encrypted with the AES-128 algorithm, and then hashed. Which means it is not possible to recover the data and decrypt it.
The same data is available in MySQL, and in plain text. So to compare the data, I tried encrypting and then hashing the MySQL data while following the same steps done in Oracle.
After lots of tries, I finally found out that the aes_encrypt in MySQL returns different results than the one in Oracle.
-- ORACLE:
-- First the key is hashed with md5 to make it a 128bit key:
raw_key := DBMS_CRYPTO.Hash (UTL_I18N.STRING_TO_RAW ('test_key', 'AL32UTF8'), DBMS_CRYPTO.HASH_MD5);
-- Initialize the encrypted result
encryption_type:= DBMS_CRYPTO.ENCRYPT_AES128 + DBMS_CRYPTO.CHAIN_CBC + DBMS_CRYPTO.PAD_PKCS5;
-- Then the data is being encrypted with AES:
encrypted_result := DBMS_CRYPTO.ENCRYPT(UTL_I18N.STRING_TO_RAW('test-data', 'AL32UTF8'), encryption_type, raw_key);
The result for the oracle code will be: 8FCA326C25C8908446D28884394F2E22
-- MySQL
-- While doing the same with MySQL, I have tried the following:
SELECT hex(aes_encrypt('test-data', MD5('test_key'));
The result for the MySQL code will be: DC7ACAC07F04BBE0ECEC6B6934CF79FE
Am I missing something? Or are the encryption methods between different languages not the same?
UPDATE:
According to the comments below, I believe I should mention the fact that the result of DBMS_CRYPTO.Hash in Oracle is the same as the result returned by the MD5 function in MySQL.
Also using CBC or CBE in Oracle gives the same result, since the IV isn't being passed to the function, thus the default value of the IV is used which is NULL
BOUNTY:
If someone can verify my last comment, and the fact that if using same padding on both sides, will yield same results gets the bounty:
#rossum The default padding in MySQL is PKCS7, mmm... Oh.. In Oracle
it's using PKCS5, can't believe I didn't notice that. Thanks. (Btw
Oracle doesn't have the PAD_PKCS7 option, not in 11g at least)

MySQL's MD5 function returns a string of 32 hexadecimal characters. It's marked as a binary string but it isn't the 16 byte binary data one would expect.
So to fix it, this string must be converted back to the binary data:
SELECT hex(aes_encrypt('test-data', unhex(MD5('test_key'))));
The result is:
8FCA326C25C8908446D28884394F2E22
It's again a string of 32 hexadecimal characters. But otherwise it's the same result as with Oracle.
And BTW:
MySQL uses PKCS7 padding.
PKCS5 padding and PKCS7 padding are one and the same. So the Oracle padding option is correct.
MySQL uses ECB block cipher mode. So you'll have to adapt the code accordingly. (It doesn't make any difference for the first 16 bytes.)
MySQL uses no initialization vector (the same as your Oracle code).
MySQL uses a non-standard folding a keys. So to achieve the same result in MySQL and Oracle (or .NET or Java), only use keys that are 16 byte long.

Just would like to give the complete solution for dummies based on #Codo's very didactic answer.
EDIT:
For being exact in general cases, I found this:
- "PKCS#5 padding is a subset of PKCS#7 padding for 8 byte block sizes".
So strictly PKCS5 can't be applied to AES; they mean PKCS7 but use their
names interchangeably.
About PKCS5 and PKCS7
/* MySQL uses a non-standard folding a key.
* So to achieve the same result in MySQL and Oracle (or .NET or Java),
only use keys that are 16 bytes long (32 hexadecimal symbols) = 128 bits
AES encryption, the MySQL AES_encrypt default one.
*
* This means MySQL admits any key length between 16 and 32 bytes
for 128 bits AES encryption, but it's not allowed by the standard
AES to use a non-16 bytes key, so do not use it as you won't be able
to use the standard AES decrypt in other platform for keys with more
than 16 bytes, and would be obligued to program the MySQL folding of
the key in that other platform, with the XOR stuff, etc.
(it's already out there but why doing weird non-standard things thay
may change when MySQL decide, etc.).
Moreover, I think they say the algorithm chosen by MySQL for those
cases is a really bad choose on a security level...
*/
-- ### ORACLE:
-- First the key is hashed with md5 to make it a 128 bit key (16 bytes, 32 hex symbols):
raw_key := DBMS_CRYPTO.Hash (UTL_I18N.STRING_TO_RAW ('test_key', 'AL32UTF8'), DBMS_CRYPTO.HASH_MD5);
-- MySQL uses AL32UTF8, at least by default
-- Configure the encryption parameters:
encryption_type:= DBMS_CRYPTO.ENCRYPT_AES128 + DBMS_CRYPTO.CHAIN_ECB + DBMS_CRYPTO.PAD_PKCS5;
-- Strictly speaking, it's really PKCS7.
/* And I choose ECB for being faster if applied and
#Codo said it's the correct one, but as standard (Oracle) AES128 will only accept
16 bytes keys, CBC also works, as I believe they are not applied to a 16 bytes key.
Could someone confirm this? */
-- Then the data is encrypted with AES:
encrypted_result := DBMS_CRYPTO.ENCRYPT(UTL_I18N.STRING_TO_RAW('test-data', 'AL32UTF8'), encryption_type, raw_key);
-- The result is binary (varbinary, blob).
-- One can use RAWTOHEX() for if you want to represent it in hex characters.
In case you use directly the 16 bytes hashed passphrase in hex characters representation or 32 hex random chars:
raw_key := HEXTORAW(32_hex_key)
encryption_type := 6 + 768 + 4096 -- (same as above in numbers; see Oracle Docum.)
raw_data := UTL_I18N.STRING_TO_RAW('test-data', 'AL32UTF8')
encrypted_result := DBMS_CRYPTO.ENCRYPT( raw_data, encryption_type, raw_key )
-- ORACLE Decryption:
decrypted_result := UTL_I18N.RAW_TO_CHAR( CRYPTO.DECRYPT( raw_data, encryption_type, raw_key ), 'AL32UTF8' )
-- In SQL:
SELECT
UTL_I18N.RAW_TO_CHAR(
DBMS_CRYPTO.DECRYPT(
UTL_I18N.STRING_TO_RAW('test-data', 'AL32UTF8'),
6 + 768 + 4096,
HEXTORAW(32_hex_key)
) , 'AL32UTF8') as "decrypted"
FROM DUAL;
-- ### MySQL decryption:
-- MySQL's MD5 function returns a string of 32 hexadecimal characters (=16 bytes=128 bits).
-- It's marked as a binary string but it isn't the 16 bytes binary data one would expect.
-- NOTE: Note that the kind of return of MD5, SHA1, etc functions changed in some versions since 5.3.x. See MySQL 5.7 manual.
-- So to fix it, this string must be converted back from hex to binary data with unHex():
SELECT hex(aes_encrypt('test-data', unhex(MD5('test_key')));
P.S.:
I would recommend to read the improved explanation in MySQL 5.7 Manual, which moreover now allows a lot more configuration.
MySQL AES_ENCRYPT improved explanation from v5.7 manual

Could be CBC vs ECB. Comment at the bottom of this page: http://dev.mysql.com/doc/refman/5.5/en/encryption-functions.html says mysql function uses ECB

Related

SQL string literal hexadecimal key to binary and back

after extensive search I am resorting to stack-overflows wisdom to help me.
Problem:
I have a database table that should effectively store values of the format (UserKey, data0, data1, ..) where the UserKey is to be handled as primary key but at least as an index. The UserKey itself (externally defined) is a string of 32 characters representing a checksum, which happens to be (a very big) hexadecimal number, i.e. it looks like this UserKey = "000000003abc4f6e000000003abc4f6e".
Now I can certainly store this UserKey in a char(32)-field, but I feel this being mighty inefficient, as I store a series of in principle arbitrary characters, i.e. reserving space for for more information per character than the 4 bits i need to store the hexadecimal characters (0..9,A-F).
So my thought was to convert this string literal into the hex-number it really represents, and store that. But this number (32*4 bits = 16Bytes) is much too big to store/handle as SQL only handles BIGINTS of 8Bytes.
My second thought was to convert this into a BINARY(16) representation, which should be compact and efficient concerning memory. However, I do not know how to efficiently convert between these two formats, as SQL also internally only handles numbers up to the maximum of 8 Bytes.
Maybe there is a way to convert this string to binary block by block and stitch the binary together somehow, in the way of:
UserKey == concat( stringblock1, stringblock2, ..)
UserKey_binary = concat( toBinary( stringblock1 ), toBinary( stringblock2 ), ..)
So my question is: is there any such mechanism foreseen in SQL that would solve this for me? How would a custom solution look like? (I find it hard to believe that I should be the first to encounter such a problem, as it has become quite modern to use ridiculously long hashkeys in many applications)
Also, the Userkey_binary should than act as relational key for the table, so I hope for a bit of speed by this more compact representation, as it needs to determine the difference on a minimal number of bits. Additionally, I want to mention that I would like to do any conversion if possible on the Server-side, so that user-scripts have not to be altered (the user-side should, if possible, still transmit a string literal not [partially] converted values in the insert statement)
In Contradiction to my previous statement, it seems that MySQL's UNHEX() function does a conversion from a string block by block and then concat much like I stated above, so the method works also for HEX literal values which are bigger than the BIGINT's 8 byte limitation. Here an example table that illustrates this:
CREATE TABLE `testdb`.`tab` (
`hexcol_binary` BINARY(16) GENERATED ALWAYS AS (UNHEX(charcol)) STORED,
`charcol` CHAR(32) NOT NULL,
PRIMARY KEY (`hexcol_binary`));
The primary key is a generated column, so that that updates to charcol are the designated way of interacting with the table with string literals from the outside:
REPLACE into tab (charcol) VALUES ('1010202030304040A0A0B0B0C0C0D0D0');
SELECT HEX(hexcol_binary) as HEXstring, tab.* FROM tab;
as seen building keys and indexes on the hexcol_binary works as intended.
To verify the speedup, take
ALTER TABLE `testdb`.`tab`
ADD INDEX `charkey` (`charcol` ASC);
EXPLAIN SELECT * from tab where hexcol_binary = UNHEX('1010202030304040A0A0B0B0C0C0D0D0') #keylength 16
EXPLAIN SELECT * from tab where charcol = '1010202030304040A0A0B0B0C0C0D0D0' #keylength 97
the lookup on the hexcol_binary column is much better performing, especially if its additonally made unique.
Note: the hex conversion does not care if the hex-characters A through F are capitalized or not for the conversion process, however the charcol will be very sensitive to this.

Is Not BigInt Enough To House sha1?

I want to know if BigInt is enough in size.
I have created a registration.php where the user gets emailed an account activation link to click to verify his email so his account gets activated.
Account Activation Link is in this format:
[php]
$account_activation_link =
"http://www.".$site_domain."/".$social_network_name."/activate_account.php?primary_website_email=".$primary_website_email."&account_activation_code=".$account_activation_code."";
[/php]
Account Activation Code is in this format:
$account_activation_code = sha1( (string) mt_rand(5, 30)); //Type Casted the INT to STRING on the 1st parameter of sha1 as it needs to be a STRING.
Now, the following link got emailed:
http://www.myssite.com/folder/activate_account.php?primary_website_email=my.email#gmail.com&account_activation_code=22d200f8670dbdb3e253a90eee5098477c95c23d
Note the account activation code that got generated by sha1:
22d200f8670dbdb3e253a90eee5098477c95c23d
But in my mysql db, in the "account_activation_code" column, I only see:
"22". The rest of the activation code is missing. Why is that ?
The column is set to BigInt. Is not that enough to house the Sha1 generated code ?
What is your suggestion ?
Thank You
Hashing methods like SHA-1 produce binary values that are on the order of 160+ bits long depending on the variant used. The common SHA256 one is 256 bits long. No cryptographic hash will fit in a 64-bit BIGINT field because 64-bit hashes are uselessly small, you'll have nothing but collisions.
Normally people store hashes as their hex-encoded equivalents in a VARCHAR(255) column. These can be indexed and perform well enough in most situations, especially one where you do periodic lookups based on clicks. From a performance and storage perspective there's no problems here.
Short answer: BIGINT is way too small.
A hash is basically a stream of bits (160 bits in the case of SHA-1). While it's certainly possible to render those bits as a base 2 number and convert it to base 10, you need a really big storage to do so (as far as I know it's not common to see integer variables larger then 64 bits) and there aren't obvious advantages. BIGINT is a 64-bit type, thus cannot do the job.
Unless you have a good reason to store it as number, I'd simply go for either a binary column type or its plain-text hexadecimal representation in a good old VARCHAR (the latter tends to be more practical to handle).
You are trying to store a string in a BigInt. That is your issue. SHA hashes are a mix of alphanumeric characters not just numbers. Change the field to a VARCHAR and you'll be fine

How to insert a binary stream in to MySQL without ASCII conversation?

I have an application written in "C" - debian distro (using libmysqlclient). My application is inserting a huge number of rows in to the database (around 30.000 rows/sec, row length ~ 150 B). Insertion takes many of CPU, because the client (my app) must convert binary stream (integers, blobs) in to ASCII representation in to a valid SQL insert statement. And also the MySQL must convert this values in to binary representation and store it to the file.
And my question. Is some possibility to call SQL insert without this conversation? I have been using a classical sprintf (implemented in libc). Now, I'm using optimized version and
without calling sprintf (too many func calling). I thought I'll use a MySQL prepared statement, but it seems a prepared statement also convert '?' variables to the ASCII.
Jan
See mysql_real_query().
22.8.7.56 mysql_real_query()
int mysql_real_query(MYSQL *mysql, const char *stmt_str, unsigned long length)
Apparently, something like the below should work
#define DATA "insert into atable (acolumn) values('zero\000quotes\"\'etc ...')"
if (mysql_real_query(db, DATA, sizeof DATA - 1)) /* error */;

using binary text comparison in mysql - efficiency pitfalls?

I'm using binary string comparison on user names (defined as varchar) to be sure the string matches exactly:
... where binary Owner = '$user' ...
or
... from Records join Users on binary Records.Owner = Users.User ....
But I'm not sure if it has some (either positive or negative) impact on efficiency. The manual states:
Note that in some contexts, if you cast an indexed column to BINARY,
MySQL is not able to use the index efficiently.
Is it an issue in this case? I would expect binary to be faster on the contrary, because it doesn't have to ignore case, whitespace, some accents etc.
Indeed, it's possible for an expression in a WHERE clause to make it impossible to use an index on a column. You wouldn't expect an index on a float column called x to be any use for searching
WHERE SIN(x) BETWEEN 0.0 and 0.1
for example.
The answer to your problem is to define your Owner column to have a binary collation rather than a case-insensitive collation. See here for how to do that: http://dev.mysql.com/doc/refman/5.0/en/charset-column.html

What column type/length should I use for storing a Bcrypt hashed password in a Database?

I want to store a hashed password (using BCrypt) in a database. What would be a good type for this, and which would be the correct length? Are passwords hashed with BCrypt always of same length?
EDIT
Example hash:
$2a$10$KssILxWNR6k62B7yiX0GAe2Q7wwHlrzhF3LqtVvpyvHZf0MwvNfVu
After hashing some passwords, it seems that BCrypt always generates 60 character hashes.
EDIT 2
Sorry for not mentioning the implementation. I am using jBCrypt.
The modular crypt format for bcrypt consists of
$2$, $2a$ or $2y$ identifying the hashing algorithm and format
a two digit value denoting the cost parameter, followed by $
a 53 characters long base-64-encoded value (they use the alphabet ., /, 0–9, A–Z, a–z that is different to the standard Base 64 Encoding alphabet) consisting of:
22 characters of salt (effectively only 128 bits of the 132 decoded bits)
31 characters of encrypted output (effectively only 184 bits of the 186 decoded bits)
Thus the total length is 59 or 60 bytes respectively.
As you use the 2a format, you’ll need 60 bytes. And thus for MySQL I’ll recommend to use the CHAR(60) BINARYor BINARY(60) (see The _bin and binary Collations for information about the difference).
CHAR is not binary safe and equality does not depend solely on the byte value but on the actual collation; in the worst case A is treated as equal to a. See The _bin and binary Collations for more information.
A Bcrypt hash can be stored in a BINARY(40) column.
BINARY(60), as the other answers suggest, is the easiest and most natural choice, but if you want to maximize storage efficiency, you can save 20 bytes by losslessly deconstructing the hash. I've documented this more thoroughly on GitHub: https://github.com/ademarre/binary-mcf
Bcrypt hashes follow a structure referred to as modular crypt format (MCF). Binary MCF (BMCF) decodes these textual hash representations to a more compact binary structure. In the case of Bcrypt, the resulting binary hash is 40 bytes.
Gumbo did a nice job of explaining the four components of a Bcrypt MCF hash:
$<id>$<cost>$<salt><digest>
Decoding to BMCF goes like this:
$<id>$ can be represented in 3 bits.
<cost>$, 04-31, can be represented in 5 bits. Put these together for 1 byte.
The 22-character salt is a (non-standard) base-64 representation of 128 bits. Base-64 decoding yields 16 bytes.
The 31-character hash digest can be base-64 decoded to 23 bytes.
Put it all together for 40 bytes: 1 + 16 + 23
You can read more at the link above, or examine my PHP implementation, also on GitHub.
If you are using PHP's password_hash() with the PASSWORD_DEFAULT algorithm to generate the bcrypt hash (which I would assume is a large percentage of people reading this question) be sure to keep in mind that in the future password_hash() might use a different algorithm as the default and this could therefore affect the length of the hash (but it may not necessarily be longer).
From the manual page:
Note that this constant is designed to change over time as new and
stronger algorithms are added to PHP. For that reason, the length of
the result from using this identifier can change over time. Therefore,
it is recommended to store the result in a database column that can
expand beyond 60 characters (255 characters would be a good choice).
Using bcrypt, even if you have 1 billion users (i.e. you're currently competing with facebook) to store 255 byte password hashes it would only ~255 GB of data - about the size of a smallish SSD hard drive. It is extremely unlikely that storing the password hash is going to be the bottleneck in your application. However in the off chance that storage space really is an issue for some reason, you can use PASSWORD_BCRYPT to force password_hash() to use bcrypt, even if that's not the default. Just be sure to stay informed about any vulnerabilities found in bcrypt and review the release notes every time a new PHP version is released. If the default algorithm is ever changed it would be good to review why and make an informed decision whether to use the new algorithm or not.
I don't think that there are any neat tricks you can do storing this as you can do for example with an MD5 hash.
I think your best bet is to store it as a CHAR(60) as it is always 60 chars long
I think best choice is nonbinary type, because in comparison is less combination and should be faster. If data is encoded with base64_encode then each position, each byte have only 64 possible values. If encoded with bin2hex each byte have only 16 possible values, but string is much longer. In binary byte have 256 position on each.
I use for hashes in form of encode64 VARCHAR(255) column with ascii character set and the same collation.
VARBINARY causes comparison problem as described in MySQL documentation. I don't know why answers advice to use VARBINARY have so many positives.
I checked this on my author site, where measure time (just refresh to see).