Using Couchbase server 2.2 with Java SDK 1.4.4.
The documentation of MemcachedClient::add(String key, int exp, Object o) inherited by CouchbaseClient states: "Add an object to the cache (using the default transcoder) iff it does not exist already".
I haven't found any mention of the atomicity of this operation.
Will asynchronous calls keep the initial value of the added key, or is this a non-atomic wrapper for a get followed by a set?
Thanks.
add (like most Couchbase operations) is atomic: the cluster checks whether the specified key exists and, only if it doesn't, sets it to the given value, all as a single operation.
If the key does exist you'll get an error back (EEXISTS or the Java native equivalent).
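In the 1.4.x Java SDK the EEXISTS case surfaces as the returned OperationFuture<Boolean> resolving to false rather than as a thrown exception. A rough sketch (the connection details are placeholders):
import com.couchbase.client.CouchbaseClient;
import java.net.URI;
import java.util.Arrays;
import net.spy.memcached.internal.OperationFuture;
public class AtomicAddExample
{
    public static void main(String[] args) throws Exception
    {
        // Placeholder connection details - adjust to your cluster.
        CouchbaseClient client = new CouchbaseClient(
                Arrays.asList(URI.create("http://127.0.0.1:8091/pools")),
                "default", "");
        // add() is atomic: if several callers race on the same key,
        // exactly one of the returned futures resolves to true.
        OperationFuture<Boolean> result = client.add("greeting", 0, "hello");
        if (result.get())
        {
            System.out.println("Stored the initial value.");
        }
        else
        {
            // The key already existed; its current value is untouched.
            System.out.println("Not stored: " + result.getStatus().getMessage());
        }
        client.shutdown();
    }
}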
I am working on migrating 3.0 code to the new 4.2 framework and am facing a few difficulties:
1. How do I do CDR-level deduplication in the new 4.2 framework? (Note: table deduplication is already done.)
2. Where should I implement PostDedupProcessor: in the context custom or the chainsink custom? In either case, do I need to remove duplicate hash codes from the list or just reject the tuples? I am also updating columns for a few tuples here.
3. My file is not moving into the archive. A temporary output file is generated, but it is empty and lands outside the load directory. What could be the possible reasons? I have thoroughly checked the config parameters, and after adding logs it seems that correct output is being sent from the transformer custom, so I don't know where it gets stuck. I printed the TableRowGenerator stream to the logs (at the end of the DataProcessor).
1. and 2.:
You need to select the type of deduplication; it does not make a big difference whether you choose table- or CDR-level deduplication.
The ite.businessLogic.transformation.outputType setting affects this. There is only one dedup; you cannot have both.
Select recordStream for CDR-level deduplication, and do the transformation to table-row format (e.g., if you want to use the TableFileWriter) in xxx.chainsink.custom::PostContextDataProcessor.
In xxx.chainsink.custom::PostContextDataProcessor you need to add custom code for duplicate handling: reject (discard) tuples, set special column values, or write them to different target tables.
3.:
Possible reasons could be:
Missing forwarding of window punctuations or the statistics tuple
An error in the BloomFilter configuration; you would spot this easily because the PE goes down and the error log hints that the wrong sha2 functions are being used
To troubleshoot your ITE application, I recommend enabling the following debug sinks if checking the StreamsStudio live graph is not sufficient:
ite.businessLogic.transformation.debug=on
ite.businessLogic.group.debug=on
ite.businessLogic.sink.debug=on
Run a test with a single input file only and check the flow of your record and statistics tuples. Debug sinks also write punctuation markers to the debug files.
I have a very specific requirement where some columns need to be encrypted using aes_encrypt / aes_decrypt. We need to encrypt the information at SQL level using AES so it can be read by another app or directly from MySQL using a query with aes_encrypt / aes_decrypt.
Our app was developed using CakePHP 3, and the database is MySQL 5.6.25.
I found and carefully followed the instructions in this accepted answer: Encryption/Decryption of Form Fields in CakePHP 3
Now the data is being saved encrypted in the database... the problem is that we still need to be able to use aes_decrypt in MySQL to decrypt the information, and it's returning NULL.
On CakePHP 3, config/app.php:
'Security' => ['salt' => '1234567890']
Then encrypted using:
Security::encrypt($value, Security::salt());
Data is saved in MySQL, but AES_DECRYPT() returns NULL:
SELECT AES_DECRYPT(address_enc, '1234567890') FROM address;
How can I set up CakePHP 3 to correctly encrypt information so that I can later decrypt it in MySQL using AES_DECRYPT()?
[EDIT]
My MySQL table:
CREATE TABLE IF NOT EXISTS `address` (
`id` int(11) NOT NULL,
`address` varchar(255) DEFAULT NULL,
`address_enc` blob,
`comment` varchar(255) DEFAULT NULL,
`comment_enc` blob
) ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=utf8;
Note: address and comment are just for testing.
Then, on CakePHP, I created a custom database type:
src/Database/Type/CryptedType.php
<?php
namespace App\Database\Type;
use Cake\Database\Driver;
use Cake\Database\Type;
use Cake\Utility\Security;
class CryptedType extends Type
{
public function toDatabase($value, Driver $driver)
{
return Security::encrypt($value, Security::salt());
}
public function toPHP($value, Driver $driver)
{
if ($value === null) {
return null;
}
return Security::decrypt($value, Security::salt());
}
}
config/bootstrap.php
Register the custom type.
use Cake\Database\Type;
Type::map('crypted', 'App\Database\Type\CryptedType');
src/Model/Table/AddressTable.php
Finally, map the columns that should be encrypted to the registered type, and that's it; from now on, everything is handled automatically.
use Cake\Database\Schema\Table as Schema;
class AddressTable extends Table
{
// ...
protected function _initializeSchema(Schema $table)
{
$table->columnType('address_enc', 'crypted');
$table->columnType('comment_enc', 'crypted');
return $table;
}
// ...
}
Do you really need to do that?
I'm not going to argue about the pros and cons of storing encrypted data in databases, but whether trying to decrypt at SQL level is a good idea is a question that should be asked.
So ask yourself whether you really need to do that. Maybe it would be better to implement the decryption at application level instead; that would probably make it easier to replicate exactly what Security::decrypt() does, which is not only decrypting, but also integrity checking.
Just take a look at what Security::decrypt() does internally.
https://github.com/cakephp/cakephp/blob/3.1.7/src/Utility/Security.php#L201
https://github.com/cakephp/cakephp/blob/3.1.7/src/Utility/Crypto/OpenSsl.php#L77
https://github.com/cakephp/cakephp/blob/3.1.7/src/Utility/Crypto/Mcrypt.php#L89
It should be pretty easy to re-implement that in your other application.
Watch out, you may be about to burn your fingers!
I am by no means an encryption expert, so consider the following as just a basic example to get things started, and inform yourself about possible conceptual and security-related problems in particular!
Handling encryption/decryption of data without knowing exactly what you are doing is a very bad idea - I can't stress that enough!
Decrypting data at SQL level
That being said, using the example code from my awful (sic) answer that you've linked to, i.e. using Security::encrypt() with Security::salt() as the encryption key, will by default leave you with a value that has been encrypted in AES-256-CBC mode, using an encryption key derived from the salt concatenated with itself (the first 32 bytes of its SHA-256 representation).
But that's not all: additionally, an HMAC hash and the initialization vector are prepended to the encrypted value, so you do not end up with "plain" encrypted data that you could pass directly to AES_DECRYPT().
So if you wanted to decrypt this at MySQL level (for whatever reason), you'd first of all have to set the proper block encryption mode:
SET block_encryption_mode = 'aes-256-cbc';
strip out the HMAC hash (the first 64 bytes) and the initialization vector (the following 16 bytes):
SUBSTRING(`column` FROM 81)
and use the first 32 bytes of hash('sha256', Security::salt() . Security::salt()) as the encryption key, together with the initialization vector taken from the encrypted value, for decryption:
SUBSTRING(`column`, 65, 16)
So in the end you'd be left with something like:
SET block_encryption_mode = 'aes-256-cbc';
SELECT
AES_DECRYPT(
SUBSTRING(`column` FROM 81), -- the actual encrypted data
'the-encryption-key-goes-here',
SUBSTRING(`column`, 65, 16) -- the initialization vector
)
FROM table;
Finally, you may also want to cast the value (CAST(AES_DECRYPT(...) AS CHAR)) and remove possible zero padding (I'm not sure whether AES_DECRYPT() does that automatically).
Data integrity checks
It should be noted that the HMAC hash prepended to the encrypted value has a specific purpose: it is used to ensure integrity, so by just dropping it you'll lose that. In order to keep it, you'd have to implement a (timing-attack safe) HMAC-SHA256 generation/comparison at SQL level too. This leads us back to the initial question: do you really need to decrypt at SQL level?
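If you do move this to application level instead, here is a rough Java sketch of what re-implementing Security::decrypt() could look like for the OpenSSL-based format described above (64-character hex HMAC, then a 16-byte IV, then the AES-256-CBC ciphertext with PKCS7 padding). The class and method names are my own; treat it as a starting point, not as vetted crypto code:
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
public class CakeCompatDecrypt
{
    public static String decrypt(byte[] blob, String salt) throws Exception
    {
        // Key derivation as described above: the first 32 hex characters
        // of sha256(salt . salt), used as raw ASCII bytes.
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        byte[] digest = sha.digest((salt + salt).getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        byte[] key = hex.substring(0, 32).getBytes(StandardCharsets.US_ASCII);
        // Split the stored value: HMAC (64 hex chars), then IV + ciphertext.
        byte[] storedHmac = Arrays.copyOfRange(blob, 0, 64);
        byte[] ivAndData = Arrays.copyOfRange(blob, 64, blob.length);
        // Integrity check: HMAC-SHA256 over IV + ciphertext, hex encoded,
        // compared in constant time.
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(key, "HmacSHA256"));
        StringBuilder hmacHex = new StringBuilder();
        for (byte b : mac.doFinal(ivAndData)) {
            hmacHex.append(String.format("%02x", b));
        }
        byte[] computedHmac = hmacHex.toString().getBytes(StandardCharsets.US_ASCII);
        if (!MessageDigest.isEqual(storedHmac, computedHmac)) {
            throw new SecurityException("HMAC mismatch - wrong key or tampered data");
        }
        // Decrypt AES-256-CBC; PKCS5Padding strips the PKCS7 padding.
        byte[] iv = Arrays.copyOfRange(ivAndData, 0, 16);
        byte[] data = Arrays.copyOfRange(ivAndData, 16, ivAndData.length);
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                new IvParameterSpec(iv));
        return new String(cipher.doFinal(data), StandardCharsets.UTF_8);
    }
}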
[Solution] The solution for this particular requirement (encrypting the information at SQL level using AES so it can be read by another app or directly from MySQL using a query with aes_encrypt / aes_decrypt) was to create a custom database type in CakePHP; then, instead of using CakePHP's encryption method, we implemented PHP's Mcrypt.
Now the information is saved to the database from our CakePHP 3 app, and the data can be read at MySQL/phpMyAdmin level using aes_decrypt and aes_encrypt.
FOR ANYONE STRUGGLING TO DECRYPT WITH MYSQL: This generally applies to anyone using symmetric AES encryption/decryption - specifically when trying to decrypt with AES_DECRYPT.
For instance, if you are using aes-128-ecb and your encrypted data is 16 bytes long with no padding, you need to add padding bytes to your encrypted data before trying to decrypt, because MySQL expects PKCS7 padding. Since MySQL uses PKCS7, you need to append 16 more bytes; in this case those pad bytes are 0x10 repeated 16 times (0x10101010101010101010101010101010). We take the left 16 bytes because encrypting those pad bytes produces 32 bytes, and we only need the first 16.
aes_decrypt(concat(<ENCRYPTED_BYTES>, left(aes_encrypt(<PAD BYTES>, <KEY>), 16)), <KEY>)
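To see where those extra bytes come from, here is a quick standalone Java illustration (the key is just a placeholder): encrypting the 16 pad bytes with PKCS7 padding yields 32 bytes, and the left 16 of those are exactly the left(aes_encrypt(<PAD BYTES>, <KEY>), 16) used above.
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
public class PadBlockDemo
{
    public static void main(String[] args) throws Exception
    {
        byte[] key = "0123456789abcdef".getBytes("US-ASCII"); // placeholder 16-byte key
        byte[] pad = new byte[16];
        Arrays.fill(pad, (byte) 0x10); // PKCS7 pad bytes for a full 16-byte block
        // PKCS5Padding in the JCE behaves as PKCS7 for AES's 16-byte blocks.
        Cipher aes = Cipher.getInstance("AES/ECB/PKCS5Padding");
        aes.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"));
        byte[] encrypted = aes.doFinal(pad);            // 32 bytes: pad block + pad-of-pad
        byte[] padBlock = Arrays.copyOf(encrypted, 16); // the left 16 bytes to append
        System.out.println("encrypted length = " + encrypted.length); // prints 32
        System.out.println("pad block = " + Arrays.toString(padBlock));
    }
}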
Consider a running Hadoop job in which a custom InputFormat needs to communicate ("return", similarly to a callback) a few simple values to the driver class (i.e., to the class that has launched the job) from within its overridden getSplits() method, using the new mapreduce API (as opposed to mapred).
These values should ideally be returned in-memory (as opposed to saving them to HDFS or to the DistributedCache).
If these values were only numbers, one could be tempted to use Hadoop counters. However, in numerous tests counters do not seem to be available at the getSplits() phase, and in any case they are restricted to numbers.
An alternative could be to use the Configuration object of the job, which, as the source code reveals, should be the same object in memory for both the getSplits() and the driver class.
In such a scenario, if the InputFormat wants to "return" a (say) positive long value to the driver class, the code would look something like:
// In the custom InputFormat.
public List<InputSplit> getSplits(JobContext job) throws IOException, InterruptedException
{
...
long value = ... // A value >= 0
job.getConfiguration().setLong("value", value);
...
}
// In the Hadoop driver class.
Job job = ... // Get the job to be launched
...
job.submit(); // Start running the job
...
while (!job.isComplete())
{
...
if (job.getConfiguration().getLong("value", -1) >= 0)
{
...
}
else
{
continue; // Wait for the value to be set by getSplits()
}
...
}
The above works in tests, but is it a "safe" way of communicating values?
Or is there a better approach for such in-memory "callbacks"?
UPDATE
The "in-memory callback" technique may not work in all Hadoop distributions, so, as mentioned above, a safer way is, instead of saving the values to be passed back in the Configuration object, create a custom object, serialize it (e.g., as JSON), saved it (in HDFS or in the distributed cache) and have it read in the driver class. I have also tested this approach and it works as expected.
Using the configuration is a perfectly suitable solution (admittedly for a problem I'm not sure I understand), but once the job has actually been submitted to the job tracker, you will not be able to amend this value (client side or task side) and expect to see the change on the opposite side of the comms (setting configuration values in a map task, for example, will not be persisted to the other mappers, nor to the reducers, nor will it be visible to the job tracker).
So communicating information from within getSplits() back to your client polling loop (to see when the job has actually finished defining the input splits), as in your example, is fine.
What's your greater aim or use case for using this?
Imagine an instance of some lookup of configuration settings called "configuration", used like this:
if (!string.IsNullOrEmpty(configuration["MySetting"]))
{
DoSomethingWithTheValue(configuration["MySetting"]);
}
The meaning of the setting is overloaded. It means both "turn this feature on or off" and "here is a specific value to do something with". These can be decomposed into two settings:
if(configuration["UseMySetting"])
{
DoSomethingWithTheValue(configuration["MySetting"]);
}
The second approach seems to make configuration more complicated, but slightly easier to parse, and it separates out the two sorts of behaviour. The first seems much simpler at first, but it's not clear what we should choose as the default "turn this off" value: "" might actually be a valid value for MySetting.
Is there a general best practice rule for this?
I find the question to be slightly confusing, because it talks about (1) parsing, and (2) using configuration settings, but the code samples are for only the latter. That confusion means that my answer might be irrelevant to what you intended to ask. Anyway...
I suggest an approach that is illustrated by the following pseudo-code API (comments follow afterwards):
class Configuration
{
void parse(String fileName);
boolean exists(String name);
String lookupString(String name);
String lookupString(String name, String defaultValue);
int lookupInt(String name);
int lookupInt(String name, int defaultValue);
float lookupFloat(String name);
float lookupFloat(String name, float defaultValue);
boolean lookupBoolean(String name);
boolean lookupBoolean(String name, boolean defaultValue);
... // more pairs of lookup<Type>() operations for other types
}
The parse() operation parses a configuration file and stores the parsed data in a convenient format, for example, in a map or hash-table. (If you want, parse() can delegate the parsing to a third-party library, for example, a parser for XML, Java Properties, JSON, .ini files or whatever.)
After parsing is complete, your application can invoke the other operations to retrieve/use the configuration settings.
A lookup<Type>() operation retrieves the value of the specified name and parses it into the specified type (and throws an exception if the parsing fails). There are two overloadings for each lookup<Type>() operation. The version with one parameter throws an exception if the specified variable does not exist. The version with an extra parameter (denoting a default value) returns that default value if the specified variable does not exist.
The exists() operation can be used to test whether a specified name exists in the configuration file.
The above pseudo-code API offers two benefits. First, it provides type-safe access to configuration data (which wasn't a stated requirement in your question, but I think it is important anyway). Second, it enables you to distinguish between "variable is not defined in configuration" and "variable is defined but its value happens to be an empty string".
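For example, client code could then look like this (a hypothetical Java rendering of the pseudo-code API above, reusing the doSomethingWithTheValue() call from your question):
// Assumes a Java implementation of the pseudo-code Configuration API above.
Configuration cfg = new Configuration();
cfg.parse("app.cfg"); // hypothetical configuration file
// "Not defined at all" and "defined but empty" are now distinct cases:
if (cfg.exists("MySetting"))
{
    doSomethingWithTheValue(cfg.lookupString("MySetting"));
}
// Alternatively, fold the on/off decision into a typed lookup with a default:
if (cfg.lookupBoolean("UseMySetting", false))
{
    doSomethingWithTheValue(cfg.lookupString("MySetting"));
}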
If you have already committed yourself to a particular configuration syntax, then just implement the above Configuration class as a thin wrapper around a parser for the existing configuration syntax. If you haven't already chosen a configuration syntax and if your project is in C++ or Java, then you might want to look at my Config4* library, which provides a ready-to-use implementation of the above pseudo-code class (with a few extra bells and whistles).
I have a view that flattens out a hierarchy of 4 tables to display as a report. Within the view, it contains the primary keys (Guid) of each of the tables, along with some display data.
The problem is that the GUIDs are being returned as varbinary(16) instead of binary(16), and as a result NHibernate throws an error: "Guid should contain 32 digits with 4 dashes (xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx)." The two types would appear to be the same to me, but maybe I am missing something.
I have tried adding Respect Binary Flags=true; to the config string; all that seems to do is affect whether the regular classes work or not.
This one has me stumped. I am about to revert the primary keys to integers as a last resort.
Solution: create a custom dialect
public class MySQL5GuidFixDialect
: MySQL5Dialect
{
public MySQL5GuidFixDialect()
{
RegisterColumnType(DbType.Guid, "CHAR(36)");
}
}
Don't forget to configure it in your NHibernate configuration. I prefer CHAR over VARCHAR because it uses (or is supposed to use) static allocation instead of dynamic allocation for fixed-length fields.
This is a bug in the MySQL .NET connector; check this bug report for more details:
http://bugs.mysql.com/bug.php?id=52747
UPDATE:
After version 6.1.1, you should add "old guids=true" to your connection string whenever you use BINARY(16) as your storage type. Otherwise, you should use CHAR(36).