I have a very specific requirement where some columns need to be encrypted using aes_encrypt / aes_decrypt. We need to encrypt the information at SQL level using AES so it can be read by another app or directly from MySQL using a query with aes_encrypt / aes_decrypt.
Our app was developed using CakePHP 3 and the database is MySQL 5.6.25.
I found and carefully followed the instructions in this selected answer: Encyption/Decryption of Form Fields in CakePHP 3
Now the data is being saved encrypted in the database... the problem is that we still need to be able to use aes_decrypt in MySQL to decrypt the information, and it's returning NULL.
On CakePHP 3, config/app.php:
'Security' => ['salt' => '1234567890']
Then encrypted using:
Security::encrypt($value, Security::salt());
Data is saved in MySQL, but aes_decrypt() returns NULL:
SELECT AES_DECRYPT(address_enc, '1234567890') FROM address;
How can I set up CakePHP 3 to correctly encrypt information so I can later decrypt it in MySQL using aes_decrypt()?
[EDIT]
My MySQL table:
CREATE TABLE IF NOT EXISTS `address` (
`id` int(11) NOT NULL,
`address` varchar(255) DEFAULT NULL,
`address_enc` blob,
`comment` varchar(255) DEFAULT NULL,
`comment_enc` blob
) ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;
Note: address and comment are just for testing.
Then, on CakePHP, I created a custom database type:
src/Database/Type/CryptedType.php
<?php
namespace App\Database\Type;

use Cake\Database\Driver;
use Cake\Database\Type;
use Cake\Utility\Security;

class CryptedType extends Type
{
    public function toDatabase($value, Driver $driver)
    {
        return Security::encrypt($value, Security::salt());
    }

    public function toPHP($value, Driver $driver)
    {
        if ($value === null) {
            return null;
        }

        return Security::decrypt($value, Security::salt());
    }
}
config/bootstrap.php
Register the custom type.
use Cake\Database\Type;
Type::map('crypted', 'App\Database\Type\CryptedType');
src/Model/Table/AddressTable.php
Finally, map the encryptable columns to the registered type, and that's it; from now on everything is handled automatically.
use Cake\Database\Schema\Table as Schema;

class AddressTable extends Table
{
    // ...

    protected function _initializeSchema(Schema $table)
    {
        $table->columnType('address_enc', 'crypted');
        $table->columnType('comment_enc', 'crypted');
        return $table;
    }

    // ...
}
Do you really need to do that?
I'm not going to argue about the pros and cons of storing encrypted data in databases, but whether trying to decrypt at SQL level is a good idea is a question that should be asked.
So ask yourself whether you really need to do that. Maybe it would be better to implement the decryption at application level instead; that would probably make it easier to replicate exactly what Security::decrypt() does, which is not only decrypting, but also integrity checking.
Just take a look at what Security::decrypt() does internally.
https://github.com/cakephp/cakephp/blob/3.1.7/src/Utility/Security.php#L201
https://github.com/cakephp/cakephp/blob/3.1.7/src/Utility/Crypto/OpenSsl.php#L77
https://github.com/cakephp/cakephp/blob/3.1.7/src/Utility/Crypto/Mcrypt.php#L89
It should be pretty easy to re-implement that in your other application.
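For reference, here's a minimal sketch of what such a re-implementation could look like in plain PHP (hypothetical helper name, assuming the defaults of CakePHP 3.x's OpenSSL-based crypto: a 64-character hex HMAC, a 16-byte IV, AES-256-CBC, and the salt passed as the key as in the code above) - treat it as a starting point, not a drop-in replacement for Security::decrypt():
<?php
// Hypothetical standalone decrypt for values produced by
// Security::encrypt($value, Security::salt()).
function decryptCakeValue($cipher, $salt)
{
    // Key derivation: first 32 bytes of the hex SHA-256 of salt . salt.
    $key = substr(hash('sha256', $salt . $salt), 0, 32);

    // Layout: 64-char hex HMAC | 16-byte IV | raw AES-256-CBC ciphertext.
    $hmac = substr($cipher, 0, 64);
    $rest = substr($cipher, 64);

    // Integrity check (Security::decrypt() uses a timing-safe comparison too).
    if (!hash_equals(hash_hmac('sha256', $rest, $key), $hmac)) {
        return false;
    }

    $iv = substr($rest, 0, 16);
    $data = substr($rest, 16);

    return openssl_decrypt($data, 'aes-256-cbc', $key, OPENSSL_RAW_DATA, $iv);
}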
Watch out, you may be about to burn your fingers!
I am by no means an encryption expert, so consider the following as just a basic example to get things started, and inform yourself about possible conceptual, and security related problems in particular!
Handling encryption/decryption of data without knowing exactly what you are doing, is a very bad idea - I can't stress that enough!
Decrypting data at SQL level
That being said, using the example code from my awful (sic) answer that you've linked to, i.e. using Security::encrypt() with Security::salt() as the encryption key, will by default leave you with a value that has been encrypted in AES-256-CBC mode, using an encryption key derived from the salt concatenated with itself (the first 32 bytes of its SHA-256 representation).
But that's not all: additionally, the encrypted value gets an HMAC hash and the initialization vector prepended, so you do not end up with "plain" encrypted data that you could directly pass to AES_DECRYPT().
So if you wanted to decrypt this at the MySQL level (for whatever reason), then you'd first of all have to set the proper block encryption mode
SET block_encryption_mode = 'aes-256-cbc';
strip off the HMAC hash (the first 64 bytes) and the initialization vector (the following 16 bytes)
SUBSTRING(`column` FROM 81)
and use the first 32 bytes of hash('sha256', Security::salt() . Security::salt()) as the encryption key, and the initialization vector from the encrypted value for decryption
SUBSTRING(`column`, 65, 16)
So in the end you'd be left with something like
SET block_encryption_mode = 'aes-256-cbc';
SELECT
AES_DECRYPT(
SUBSTRING(`column` FROM 81), -- the actual encrypted data
'the-encryption-key-goes-here',
SUBSTRING(`column`, 65, 16) -- the initialization vector
)
FROM table;
Finally, you may also want to cast the value (CAST(AES_DECRYPT(...) AS CHAR)) and remove possible zero padding (not sure whether AES_DECRYPT() does that automatically).
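In case it helps, the literal key string for that query can be printed with a one-off PHP snippet like this (same derivation as described above; the result is a 32-character hex string to paste in place of 'the-encryption-key-goes-here'):
<?php
use Cake\Utility\Security;

// Prints the key to substitute into the AES_DECRYPT() call above.
echo substr(hash('sha256', Security::salt() . Security::salt()), 0, 32), PHP_EOL;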
Data integrity checks
It should be noted that the HMAC hash that is prepended to the encrypted value has a specific purpose: it is used to ensure integrity, so by just dropping it, you'll lose that. In order to keep it, you'd have to implement a (timing attack safe) HMAC-SHA256 generation/comparison at SQL level too. This leads us back to the initial question: do you really need to decrypt at SQL level?
[Solution] The solution for this particular requirement (we need to encrypt the information at SQL level using AES so it can be read by another app or directly from MySQL using a query with aes_encrypt / aes_decrypt) was to create a custom database type in CakePHP; then, instead of using CakePHP's encryption method, we implemented PHP's Mcrypt.
Now the information is saved to the database from our CakePHP 3 app and the data can be read at the MySQL/phpMyAdmin level using aes_decrypt and aes_encrypt.
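Since the mcrypt extension is deprecated in newer PHP versions, here is a rough sketch of how a custom type could produce values that MySQL's default AES_ENCRYPT()/AES_DECRYPT() (aes-128-ecb with PKCS7 padding) understands, using OpenSSL instead. This is not the code we actually used, the class name is made up, and it assumes the key is exactly 16 bytes (MySQL folds longer keys by XOR, which this sketch does not reproduce):
<?php
namespace App\Database\Type;

use Cake\Database\Driver;
use Cake\Database\Type;
use Cake\Utility\Security;

// Hypothetical MySQL-compatible variant of the custom type above.
class MysqlAesType extends Type
{
    public function toDatabase($value, Driver $driver)
    {
        if ($value === null) {
            return null;
        }
        // Matches AES_ENCRYPT(value, key) when the key is exactly 16 bytes.
        return openssl_encrypt($value, 'aes-128-ecb', Security::salt(), OPENSSL_RAW_DATA);
    }

    public function toPHP($value, Driver $driver)
    {
        if ($value === null) {
            return null;
        }
        return openssl_decrypt($value, 'aes-128-ecb', Security::salt(), OPENSSL_RAW_DATA);
    }
}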
FOR ANYONE STRUGGLING TO DECRYPT WITH MYSQL: This generally applies to anyone using symmetric AES encryption/decryption - specifically when trying to decrypt with AES_DECRYPT.
For instance, if you are using aes-128-ecb and your encrypted data is 16 bytes long with no padding, you need to add padding bytes to your encrypted data before trying to decrypt (because MySQL is expecting PKCS7 padding). Because MySQL uses PKCS7, you need to add 16 more bytes; in this case those pad bytes are 0x10101010101010101010101010101010. We take the left 16 bytes because when we encrypt the 0x10101010101010101010101010101010, we get 32 bytes, and we only need the first 16.
aes_decrypt(concat(<ENCRYPTED_BYTES>, left(aes_encrypt(<PAD BYTES>, <KEY>), 16)), <KEY>)
Related
I have two fields for storing geolocation data, defined as doubles in my MySQL database:
`address_geo_latitude` float(10,6) NOT NULL,
`address_geo_longitude` float(10,6) NOT NULL
And I'm using Yii2's double validator over values passed by user:
public function rules()
{
    return [
        [['address_geo_latitude', 'address_geo_longitude'], 'double', 'min'=>0, 'max'=>360]
    ];
}
(though my tests seem to prove that this issue has nothing to do with Yii2 validators)
During tests I've observed strange (?) changes of values, i.e.:
359.90 becomes 359.899994 (0.000006 difference),
359.80 becomes 359.799988 (0.000012 difference),
311.11 becomes 311.109985 (0.000015 difference),
255.55 becomes 255.550003 (-0.000003 difference),
205.205 becomes 205.205002 (-0.000002 difference),
105.105 becomes 105.105003 (-0.000003 difference).
but:
359.899994 remains 359.899994,
311.109985 remains 311.109985,
311 remains 311,
255 remains 255,
200 remains 200,
75.75 remains 75.75,
11.11 remains 11.11.
What am I missing? I can't see any pattern or logic behind these.
Is this because I have an incorrect MySQL field declaration for this kind of data? If yes, then what is the correct one? A few different answers:
Database/SQL: How to store longitude/latitude data?
What datatype to use when storing latitude and longitude data in SQL databases?
What is the ideal data type to use when storing latitude / longitudes in a MySQL database?
suggest that using float(10,6) is the best option, if not using MySQL's spatial extensions.
My tests seem to prove that this issue has nothing to do with Yii2 validators, because the value remains correct until re-read from the database:
print_r(Yii::$app->request->post()); //Correct!
print_r($lab->address_geo_latitude); //Correct!
if ($lab->load(Yii::$app->request->post(), 'Lab') && $lab->save()) {
    print_r($lab->address_geo_latitude); //Correct!
    $lab2 = $this->findModel($lab->id);
    print_r($lab2->address_geo_latitude); //<-- HERE! Incorrect!
}
My question is the contrary of this one. My numbers gain, not lose, accuracy! And only for certain numbers, not always.
This happens not because of Yii but because of how floating-point values are stored on binary systems.
As you can read in the MySQL documentation "Problems with Floating-Point Values":
Floating-point numbers sometimes cause confusion because they are
approximate and not stored as exact values. A floating-point value as
written in an SQL statement may not be the same as the value
represented internally.
Here you can find a great explanation of this problem with examples. As you can see, numbers can get a bit bigger, a bit smaller, or not change at all, but you always have to remember that this is just an approximation.
For geolocation data you can use a simple DECIMAL type to make sure values are stored unchanged in the database, or use a spatial data type optimized to store and query data that represents objects defined in a geometric space.
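If you want to see the single-precision effect outside of MySQL, a quick PHP illustration is to round-trip a value through a 4-byte float, which is roughly what a FLOAT column does to the stored value (DECIMAL, by contrast, keeps the decimal digits exactly):
<?php
// Round-trip through a single-precision (4-byte) float.
$value = 359.90;
$single = unpack('f', pack('f', $value))[1];
printf("%.6f\n", $single); // 359.899994 - the same value the database returned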
I wrote a program that uses Chrome's login cookies to do something automatically, but since Chrome started encrypting all the cookies in January, my program can't work anymore.
I'm trying to decrypt the cookies, and succeeded in Java on Mac OS by following this topic, but my usual running environment is Windows 7, so I have to decrypt them on Windows.
I found os_crypt_win.cc in Chromium's source code; it has an encryption part:
bool OSCrypt::EncryptString(const std::string& plaintext, std::string* ciphertext) {
  DATA_BLOB input;
  input.pbData = const_cast<BYTE*>(reinterpret_cast<const BYTE*>(plaintext.data()));
  input.cbData = static_cast<DWORD>(plaintext.length());
  DATA_BLOB output;
  BOOL result = CryptProtectData(&input, L"", NULL, NULL, NULL, 0, &output);
  if (!result)
    return false;
  // this does a copy
  ciphertext->assign(reinterpret_cast<std::string::value_type*>(output.pbData), output.cbData);
  LocalFree(output.pbData);
  return true;
}
I imitated this part in Java with JNA:
String encrypted = bytesToHex(Crypt32Util.cryptProtectData(Native.toByteArray(plaintext), 0));
or
String encrypted = bytesToHex(Crypt32Util.cryptProtectData(plaintext.getBytes()));
or
String encrypted = bytesToHex(Crypt32Util.cryptProtectData(plaintext.getBytes("UTF-8")));
or
String encrypted = bytesToHex(Crypt32Util.cryptProtectData(plaintext.getBytes("UTF-16")));
But I got encrypted values different from the values stored by Chrome.
Did I use the wrong method to encrypt this, or did I miss something important?
Can you help me figure this out?
You used the correct method to encrypt the values.
How are the values "wrong"? If they are just different from the ones stored by Chrome, that is not a problem.
The reason for that is very simple:
from msdn:
"The function creates a session key to perform the encryption. The
session key is derived again when the data is to be decrypted."
from msdn blog:
"A random session key is created for each call to CryptProtectData.
This key is derived from the master key, some random data, and some
optional entropy passed in by the user. The session key is then used
to do the actual encryption."
The important thing you should check is whether you are able to decrypt the values using CryptUnprotectData.
I want to store a JSON payload in Redis. There are really 2 ways I can do this:
One using simple string keys and values.
key:user, value:payload (the entire JSON blob which can be 100-200 KB)
SET user:1 payload
Using hashes
HSET user:1 username "someone"
HSET user:1 location "NY"
HSET user:1 bio "STRING WITH OVER 100 lines"
Keep in mind that if I use a hash, the value length isn't predictable. They're not all short such as the bio example above.
Which is more memory efficient? Using string keys and values, or using a hash?
This article can provide a lot of insight here: http://redis.io/topics/memory-optimization
There are many ways to store an array of Objects in Redis (spoiler: I like option 1 for most use cases):
Store the entire object as JSON-encoded string in a single key and keep track of all Objects using a set (or list, if more appropriate). For example:
INCR id:users
SET user:{id} '{"name":"Fred","age":25}'
SADD users {id}
Generally speaking, this is probably the best method in most cases. If there are a lot of fields in the Object, your Objects are not nested with other Objects, and you tend to only access a small subset of fields at a time, it might be better to go with option 2.
Advantages: considered a "good practice." Each Object is a full-blown Redis key. JSON parsing is fast, especially when you need to access many fields for this Object at once. Disadvantages: slower when you only need to access a single field.
Store each Object's properties in a Redis hash.
INCR id:users
HMSET user:{id} name "Fred" age 25
SADD users {id}
Advantages: considered a "good practice." Each Object is a full-blown Redis key. No need to parse JSON strings. Disadvantages: possibly slower when you need to access all/most of the fields in an Object. Also, nested Objects (Objects within Objects) cannot be easily stored.
Store each Object as a JSON string in a Redis hash.
INCR id:users
HMSET users {id} '{"name":"Fred","age":25}'
This allows you to consolidate a bit and only use two keys instead of lots of keys. The obvious disadvantage is that you can't set the TTL (and other stuff) on each user Object, since it is merely a field in the Redis hash and not a full-blown Redis key.
Advantages: JSON parsing is fast, especially when you need to access many fields for this Object at once. Less "polluting" of the main key namespace. Disadvantages: About same memory usage as #1 when you have a lot of Objects. Slower than #2 when you only need to access a single field. Probably not considered a "good practice."
Store each property of each Object in a dedicated key.
INCR id:users
SET user:{id}:name "Fred"
SET user:{id}:age 25
SADD users {id}
According to the article above, this option is almost never preferred (unless the property of the Object needs to have specific TTL or something).
Advantages: Object properties are full-blown Redis keys, which might not be overkill for your app. Disadvantages: slow, uses more memory, and not considered "best practice." Lots of polluting of the main key namespace.
Overall Summary
Option 4 is generally not preferred. Options 1 and 2 are very similar, and they are both pretty common. I prefer option 1 (generally speaking) because it allows you to store more complicated Objects (with multiple layers of nesting, etc.) Option 3 is used when you really care about not polluting the main key namespace (i.e. you don't want there to be a lot of keys in your database and you don't care about things like TTL, key sharding, or whatever).
If I got something wrong here, please consider leaving a comment and allowing me to revise the answer before downvoting. Thanks! :)
It depends on how you access the data:
Go for Option 1:
If you use most of the fields on most of your accesses.
If there is variance on possible keys
Go for Option 2:
If you use just single fields on most of your accesses.
If you always know which fields are available
P.S.: As a rule of thumb, go for the option that requires fewer queries for most of your use cases.
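To make the access-pattern tradeoff concrete, here is a small sketch using the phpredis extension (the key names are just examples): option 1 always reads/writes the whole JSON blob, while option 2 lets you touch a single field without parsing anything:
<?php
// Assumes the phpredis extension; key names are illustrative only.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// Option 1: whole object as a JSON string.
$redis->set('user:1', json_encode(['username' => 'someone', 'location' => 'NY']));
$user = json_decode($redis->get('user:1'), true); // always the full object

// Option 2: one hash per object, one field per property.
$redis->hMSet('user:2', ['username' => 'someone', 'location' => 'NY']);
$location = $redis->hGet('user:2', 'location');   // single field, no JSON parsing
$user2 = $redis->hGetAll('user:2');               // or the full object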
Some additions to a given set of answers:
First of all, if you are going to use a Redis hash efficiently, you must know
the maximum number of keys and the maximum value size - otherwise, if they break hash-max-ziplist-value or hash-max-ziplist-entries, Redis will convert it to practically ordinary key/value pairs under the hood (see hash-max-ziplist-value, hash-max-ziplist-entries). And breaking out of the hash options under the hood IS REALLY BAD, because each ordinary key/value pair inside Redis costs +90 bytes per pair.
This means that if you start with option two and accidentally break out of hash-max-ziplist-value, you will get +90 bytes for EACH ATTRIBUTE you have inside the user model! (actually not +90 but +70, see the console output below)
# you need me-redis and awesome-print gems to run exact code
redis = Redis.include(MeRedis).configure( hash_max_ziplist_value: 64, hash_max_ziplist_entries: 512 ).new
=> #<Redis client v4.0.1 for redis://127.0.0.1:6379/0>
> redis.flushdb
=> "OK"
> ap redis.info(:memory)
{
"used_memory" => "529512",
**"used_memory_human" => "517.10K"**,
....
}
=> nil
# me_set( 't:i' ... ) same as hset( 't:i/512', i % 512 ... )
# txt is some English fiction book around 56K in length,
# so we just take some random 63-character string from it
> redis.pipelined{ 10000.times{ |i| redis.me_set( "t:#{i}", txt[rand(50000), 63] ) } }; :done
=> :done
> ap redis.info(:memory)
{
"used_memory" => "1251944",
**"used_memory_human" => "1.19M"**, # ~ 72b per key/value
.....
}
> redis.flushdb
=> "OK"
# setting **only one value** +1 byte per hash of 512 values equal to set them all +1 byte
> redis.pipelined{ 10000.times{ |i| redis.me_set( "t:#{i}", txt[rand(50000), i % 512 == 0 ? 65 : 63] ) } }; :done
> ap redis.info(:memory)
{
"used_memory" => "1876064",
"used_memory_human" => "1.79M", # ~ 134 bytes per pair
....
}
redis.pipelined{ 10000.times{ |i| redis.set( "t:#{i}", txt[rand(50000), 65] ) } };
ap redis.info(:memory)
{
"used_memory" => "2262312",
"used_memory_human" => "2.16M", #~155 byte per pair i.e. +90 bytes
....
}
Regarding TheHippo's answer, the comments on option one are misleading:
hgetall/hmset/hmget to the rescue if you need all fields or multiple get/set operations.
Regarding BMiner's answer.
The third option is actually really fun: for a dataset with max(id) < hash-max-ziplist-value this solution has O(N) complexity, because, surprise, Redis stores small hashes as an array-like container of length/key/value objects!
But many times hashes contain just a few fields. When hashes are small we can instead just encode them in an O(N) data structure, like a linear array with length-prefixed key value pairs. Since we do this only when N is small, the amortized time for HGET and HSET commands is still O(1): the hash will be converted into a real hash table as soon as the number of elements it contains will grow too much
But you should not worry: you'll break hash-max-ziplist-entries very fast, and there you go, you are now actually at solution number 1.
The second option will most likely go to the fourth solution under the hood, because as the question states:
Keep in mind that if I use a hash, the value length isn't predictable. They're not all short such as the bio example above.
And as you already said: the fourth solution is the most expensive, +70 bytes per attribute for sure.
My suggestion on how to optimize such a dataset:
You've got two options:
If you cannot guarantee the max size of some user attributes, then go for the first solution, and if memory is crucial, then
compress the user JSON before storing it in Redis.
If you can force a max size for all attributes,
then you can set hash-max-ziplist-entries/value and use hashes either as one hash per user representation OR as the hash memory optimization from this topic of the Redis guide: https://redis.io/topics/memory-optimization and store the user as a JSON string. Either way you may also compress long user attributes.
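A sketch of that bucketing trick with the phpredis extension (the bucket size of 512 and the key names are assumptions; hash-max-ziplist-entries must be at least the bucket size, and every value must stay under hash-max-ziplist-value, for the memory win to apply):
<?php
// Group users into hashes of 512 entries each so they stay ziplist-encoded.
function bucketedSet(Redis $redis, int $id, string $json): void
{
    $redis->hSet('users:' . intdiv($id, 512), (string)($id % 512), $json);
}

function bucketedGet(Redis $redis, int $id): ?string
{
    $value = $redis->hGet('users:' . intdiv($id, 512), (string)($id % 512));
    return $value === false ? null : $value;
}

$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
bucketedSet($redis, 1234, json_encode(['name' => 'Fred', 'age' => 25]));
var_dump(bucketedGet($redis, 1234));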
We had a similar issue in our production environment; we came up with the idea of gzipping the payload if it exceeds some threshold in KB.
I have a repo dedicated only to this Redis client lib here
The basic idea is to detect whether the payload size is greater than some threshold, and if so, gzip it and Base64 it, then keep the compressed string as a normal string in Redis. On retrieval, detect whether the string is a valid Base64 string and, if so, decompress it.
The whole compressing and decompressing is transparent, plus you save close to 50% of network traffic.
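A minimal sketch of that idea in PHP (the threshold and the prefix marker are arbitrary choices of this sketch, not what the linked library uses):
<?php
const COMPRESS_THRESHOLD = 1024;  // bytes; arbitrary for this sketch
const COMPRESS_PREFIX = 'gz64:';  // marker so reads know the value is compressed

function packPayload(string $json): string
{
    if (strlen($json) <= COMPRESS_THRESHOLD) {
        return $json;
    }
    return COMPRESS_PREFIX . base64_encode(gzencode($json));
}

function unpackPayload(string $stored): string
{
    if (strpos($stored, COMPRESS_PREFIX) !== 0) {
        return $stored;
    }
    return gzdecode(base64_decode(substr($stored, strlen(COMPRESS_PREFIX))));
}
Using an explicit prefix instead of sniffing for "valid Base64" avoids false positives on payloads that just happen to decode as Base64.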
Compression Benchmark Results
BenchmarkDotNet=v0.12.1, OS=macOS 11.3 (20E232) [Darwin 20.4.0]
Intel Core i7-9750H CPU 2.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET Core SDK=5.0.201
[Host] : .NET Core 3.1.13 (CoreCLR 4.700.21.11102, CoreFX 4.700.21.11602), X64 RyuJIT DEBUG
Method                      | Mean       | Error    | StdDev   | Gen 0 | Gen 1 | Gen 2 | Allocated
WithCompressionBenchmark    | 668.2 ms   | 13.34 ms | 27.24 ms | -     | -     | -     | 4.88 MB
WithoutCompressionBenchmark | 1,387.1 ms | 26.92 ms | 37.74 ms | -     | -     | -     | 2.39 MB
To store JSON in Redis you can use the Redis JSON module.
This gives you:
Full support for the JSON standard
A JSONPath syntax for selecting/updating elements inside documents
Documents stored as binary data in a tree structure, allowing fast access to sub-elements
Typed atomic operations for all JSON values types
https://redis.io/docs/stack/json/
https://developer.redis.com/howtos/redisjson/getting-started/
https://redis.com/blog/redisjson-public-preview-performance-benchmarking/
You can use the json module: https://redis.io/docs/stack/json/
It is fully supported and allows you to use json as a data structure in redis.
There are also Redis Object Mappers for some languages: https://redis.io/docs/stack/get-started/tutorials/
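If you are on PHP, the module's commands can be issued through phpredis' rawCommand() - a sketch, assuming the RedisJSON module is loaded on the server (phpredis has no dedicated JSON.* methods):
<?php
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

// Store and read a document with RedisJSON commands.
$redis->rawCommand('JSON.SET', 'user:1', '$', json_encode(['name' => 'Fred', 'age' => 25]));
$age = $redis->rawCommand('JSON.GET', 'user:1', '$.age'); // JSONPath result, e.g. "[25]"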
I have unified JDBC code for reading/writing large texts. The column is CLOB on Oracle and TEXT on MySQL. The following code
java.sql.Clob aClob = resultSet.getClob(COLUMN_NAME);
java.io.InputStream aStream = aClob.getAsciiStream();
int av = aStream.available();
gives a relevant value on MySQL (Connector/J 5.0.4) but zero on Oracle (Oracle JDBC driver 11.2.0.2). Clob.length() fortunately gives the correct value on both, and InputStream.read() until -1 works too, so there are other ways of obtaining the data in a unified way.
Javadoc gives this weird note:
The available method for class InputStream always returns 0.
So which driver is right? And no, I don't want to drag vendor-specific packages into the code :-) This question is JDBC neutral.
I would be tempted to say that both drivers were right.
The Javadoc for the available() method appears to suggest that the value returned is an estimate of how many bytes the InputStream currently has cached and can return to you without an I/O operation. How many bytes it has cached, and how it does any caching, would seem to me to be an implementation detail. The fact that these values are different merely suggests that the two drivers are implemented differently. Nothing in the Javadoc for the available() method suggests to me that either driver is doing anything wrong.
I'd guess that the Oracle driver doesn't cache any data from the CLOB immediately after executing the query, so that might be why the available() method returns 0. However, once data has been read from the stream, the available() method for the Oracle driver no longer returns 0, as it seems the Oracle JDBC driver has been to the database and fetched some data out of the CLOB column. On the other hand, MySQL seems to be a bit more proactive in fetching data out of the TEXT column as soon as the query has finished executing.
Having read the Javadoc for the available() method I'm not sure why I'd use it. What are you using it for?
I have a view that flattens out a hierarchy of 4 tables to display as a report. Within the view it contains the primary keys (Guid) of each of the tables along with some display data.
The problem is that the GUIDs are being returned as varbinary(16) instead of binary(16), and as a result NHibernate throws an error. These would appear to be the same to me, but maybe I am missing something.
Guid should contain 32 digits with 4 dashes (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx).
I have tried adding Respect Binary Flags=true; to the connection string; all that seems to do is affect whether the regular classes work or not.
This one has me stumped. I am about to revert the primary keys to integers as a last resort.
Solution: create a custom dialect
public class MySQL5GuidFixDialect
    : MySQL5Dialect
{
    public MySQL5GuidFixDialect()
    {
        RegisterColumnType(DbType.Guid, "CHAR(36)");
    }
}
Don't forget to configure it in your NHibernate configuration. I prefer CHAR over VARCHAR because it uses (or is supposed to use) static allocation instead of dynamic allocation for fixed-length fields.
This is a bug in the MySQL .NET connector; check this bug report for more details:
http://bugs.mysql.com/bug.php?id=52747
UPDATE:
After version 6.1.1 you should add "old guids=true" to your connection string whenever you use BINARY(16) as your storage type. Otherwise you should use CHAR(36).