Is there any mechanism available to prevent duplicate data in certain fields of my entities? Something similar to a UNIQUE constraint in SQL.
Failing that, what techniques do people normally use to prevent duplicate values?
The only way to do the equivalent of a UNIQUE constraint in SQL will not scale very well in a NoSQL storage system like Cloud Datastore. This is mainly because it requires a read before every write, and a transaction surrounding the two operations.
If that's not an issue (i.e., you don't write values very often), the process might look something like:
Begin a serializable transaction
Query the Kind for an existing entity with property = value
If the query has matches, abort the transaction
If there are no matches, insert a new entity with property = value
Commit the transaction
Using gcloud-python, this might look like...
from gcloud import datastore

client = datastore.Client()

with client.transaction(serializable=True) as t:
    q = client.query(kind='MyKind')
    q.add_filter('property', '=', 'value')
    if list(q.fetch(limit=1)):
        # A match already exists, so abort the write.
        t.rollback()
    else:
        entity = datastore.Entity(key=client.key('MyKind'))
        entity['property'] = 'value'
        t.put(entity)  # committed when the with block exits
Note: The serializable flag on Transactions is relatively new in gcloud-python. See https://github.com/GoogleCloudPlatform/gcloud-python/pull/1205/files for details.
The "right way" to do this is to design your data such that the key is your measure of "uniqueness", but without knowing more about what you're trying to do, I can't say much else.
The approach given above will not work in Datastore, because you cannot do a query across arbitrary entities inside a transaction. If you try, an exception will be thrown.
However, you can do it by using a new kind for each unique field and doing a "get" (lookup by key) within the transaction.
For example, say you have a Person kind and you want to ensure that Person.email is unique. You then also need a kind such as UniquePersonEmail. It does not need to be referenced by anything; it is just there to ensure uniqueness.
start transaction
get UniquePersonEmail with id = theNewAccountEmail
if exists abort
put UniquePersonEmail with id = theNewAccountEmail
put Person with all the other details including the email
commit the transaction
So you end up doing one read and two writes to create your account.
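Using gcloud-python again, a rough sketch of that flow might look like the following (the create_person helper, the kind names, and the ValueError are illustrative, not prescribed):

from gcloud import datastore

client = datastore.Client()

def create_person(email, **other_details):
    with client.transaction():
        # A lookup by key is allowed inside a transaction, unlike the
        # arbitrary query used in the earlier approach.
        unique_key = client.key('UniquePersonEmail', email)
        if client.get(unique_key) is not None:
            raise ValueError('email already taken: %s' % email)

        # One write reserves the email, the other stores the Person itself.
        marker = datastore.Entity(key=unique_key)
        person = datastore.Entity(key=client.key('Person'))
        person['email'] = email
        person.update(other_details)
        client.put(marker)
        client.put(person)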
I am using the batchInsert() function of Yii2's Database Access Objects. I need to run another function, like sending an email from PHP code, after each record is inserted. What is the workaround to achieve this? Or is it possible to get all the AUTO_INCREMENT ids of the inserted rows?
The code is
Yii::$app->db->createCommand()->batchInsert(Address::tableName(),
['form_id','address'], $rows)->execute();
I am using batchInsert() documented in https://www.yiiframework.com/doc/api/2.0/yii-db-command#batchInsert()-detail
First, you're not using ActiveRecord. You can find more about ActiveRecord in the documentation: the API and the guide. Yii::$app->db->createCommand() is a DAO, which is much simpler than ActiveRecord and does not support events in the same way.
Second, there is no batchInsert() for ActiveRecord, and there is a good reason for that: it is hard to detect the IDs of inserted records when you insert them in a batch (at least in a reliable and DB-independent way). You can read more about this on GitHub.
However, if you know the IDs of the records or some unique fields before the insert (for example, a user may have a unique login and/or email in addition to its numeric ID), you can simply fetch the records after the insert and trigger the event manually:
$models = Address::findAll(['email' => $emailsArray]);
foreach ($models as $model) {
    $model->trigger('myEvent');
}
But unless you're inserting millions of records, you should probably stick to a simple foreach with $model->save(), for the sake of simplicity and reliability.
There is also the official yii2-collection extension. It is still more of a draft and proof of concept, but it may be interesting in the future.
I am trying to make a backup table of users, called archived users. It creates the ArchivedUser by taking a hash of the current user's attributes (self) and merging in the self.id as the user_id.
When a user is reinstated, their record as an ArchivedUser still remains in the ArchivedUser table. If the user gets deleted a second time, it should update any attributes that have changed.
Currently, it throws a validation error:
Validation failed: User has already been taken, as the self.id already exists in the ArchivedUser table.
What is a better way to handle this, where you update an existing record if possible, or create a new one if it doesn't exist? I am using Rails 4 and have tried find_or_create_by, but it throws an error:
Mysql2::Error: Unknown column 'device_details.device_app_version'
which is odd, as that column exists in both tables and doesn't get modified.
User Delete Method
# creates ArchivedUser with the exact attributes of the User
# object and merges self.id to fill user_id on ArchivedUser
if ArchivedUser.create!(
self.attributes.merge(user_id: self.id)
)
Thanks for taking a peek!
If your archived_users table is truly acting as a backup for users and not adding any additional functionality, I would ditch the ArchivedUser model and simply add an archived boolean on the User model to tell whether or not the user is archived.
That way you don't have to deal with moving an object to another table and hooking into a delete callback.
However, if your ArchivedUser model does offer some different functionality compared to User, another option would be to use single table inheritance to differentiate the type of user. In this case, you could have User govern all users, and then distinguish between a user being, for example, an ActiveUser or an ArchivedUser.
This takes more setup and can be a bit confusing if you haven't worked with STI, but it can be useful when two similar models need to differ only slightly.
That being said, if you want to keep your current setup, there are a few issues I see with your code:
If you are going to create an object from an existing object, it's good practice to duplicate the object (dup). That way the id won't be automatically set and can be auto-incremented.
If you truly have deleted the User record from the database, there's no reason to store a reference to its id because it's gone. But if you aren't actually deleting the record, you should definitely just use a boolean attribute to determine whether or not the user is active or archived.
I don't have enough context here as to why find_or_create_by isn't working, but if it were the case, then I would keep it as simple as possible. Don't use all the attributes, but just the consistent ones (like id) that you know will return the proper result.
if ArchivedUser.create! # ... is problematic. The bang after create (i.e. create!) will throw an error if the record could not be created, making the if pointless. So either use create (no bang) with the if, if you don't want errors thrown and want to handle the case where the record was not created, or use create! without the if, if you do want an error thrown.
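Putting those points together, a rough update-or-create sketch for the archive step (assuming you keep a user_id column on ArchivedUser; find_or_initialize_by is available in Rails 4, and the excluded attribute names are only illustrative):

# In the User model's delete/archive flow
def archive!
  archived = ArchivedUser.find_or_initialize_by(user_id: id)
  # Copy everything except the primary key and timestamps, so a reinstated
  # and re-deleted user just updates the existing archive row.
  archived.assign_attributes(attributes.except('id', 'created_at', 'updated_at'))
  archived.save!
end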
I'm building an API that receives webhook data (orders from Shopify) and saves it to a relational MySQL database. Each order contains products that belong to collections (categories). I relate the product to its collection, or create a new collection if it doesn't already exist. The problem I run into is that, since I can't use findOrCreate(), I do a find() first and then, if nothing is found, a create(). However, since Node.js is async, if many orders come in at once, sometimes the same collection isn't found for two different products at the same time, and one request creates it right before the other one tries to, so Waterline throws an error:
col_id
• A record with that col_id already exists (405145674).
What is the best way to catch this error so I can just do another find() so the function returns the proper collection instead of an undefined? I could just do this:
if (err){ findmethodagain(); }
But then it would try that for any error, not a "record with that id already exists" error.
If you're not seeing this a lot, it wouldn't hurt to just do a second attempt after a delay. You can use Promise.delay from, for example, Bluebird to run a second function after some number of milliseconds on any error. If in most situations one of them creates the collection, then you'll succeed with just one retry.
If you are seeing lots of other errors, well, fix them.
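A rough sketch of that retry, assuming a Waterline model named Collection keyed by col_id and Bluebird required as Promise (the 100 ms delay is arbitrary):

var Promise = require('bluebird');

function findOrCreateCollection(colId, attrs) {
  return Promise.resolve(Collection.findOne({ col_id: colId }))
    .then(function (existing) {
      return existing || Collection.create(attrs);
    })
    .catch(function (err) {
      // Most likely the race described above: another request created the
      // collection first. Wait briefly, then fetch the record it created.
      return Promise.delay(100).then(function () {
        return Collection.findOne({ col_id: colId });
      });
    });
}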
However, race conditions usually indicate that you need to rework the architecture. I'd suggest instead of relying on error attributes or trying to hack around them, you redesign it, at least for the situation where you have to create a new collection. One popular option is to set up a worker queue. In your case, you would just use find in the first pass to see if a collection exists, and if not, kick it off to the queue for a single worker to pick up sequentially. In the worker job, do your find then create. You can use Redis and Bull to set this up quickly and easily.
I have an application with entities like User, Message and MessageFeatures. Each User can have many messages and each message has a MessageFeatures entity. Currently the relational model is expressed as:
User{
UUID id
String email
...
}
Message{
UUID id,
UUID userId
String text
....
}
MessageFeatures{
UUID id
UUID messageId
UUID userId
PrimitiveObject feature1
....
PrimitiveObject featureN
}
The most important queries are:
Get all messages for user
Get all message features for a user
Get message by uuid
Get/Update message feature by uuid
Get message feature by message uuid
Less important (can be slow) queries are like:
Get message features where user_id = someuuid and featureX = value
Get all/count user uuids for which featureX = value
update message features set featureX = newValue where featureX = oldValue
While evaluating Couchbase I am unable to arrive at a proper data model. I do not think putting all messages and message features for a user in a single document is a good idea, because the size will keep increasing and, based on current data, it will easily be in the range of 4-5 MB for two years of data. Also, to maintain consistency I can update only one message feature at a time, as atomicity is per document.
If I do not place them in a single document, they will be scattered around the cluster, and queries like "get all messages/message features of a user" will result in scatter and gather.
I have checked out global secondary indexes and N1QL, but even if I index the user_uuid field of messages it will only help in fetching the message_uuids of that user; loading all the messages will still result in scatter and gather.
Is there a way to force all messages and message features of a user_uuid to be mapped to the same physical node, without embedding them in the same document, something like hash tags in Redis?
You should translate the relational model above directly to Couchbase and create GSI indexes for all the relationships (the id fields). Use EXPLAIN to make sure every query uses an index. For direct lookups by id, use USE KEYS.
Scatter/gather in Couchbase means something different than what you describe. It is when a single index scan has to visit several nodes and then merge the scan results (distributed index). Instead, each GSI index lives on a single node, so GSI indexes avoid scatter/gather.
Finally, note that Couchbase is fast at key-value fetches even across nodes, so you do not need to worry about locality of data.
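As a rough sketch of that approach, assuming the Couchbase Python SDK 2.x, a bucket named msgs, a type field on each document, and illustrative key and field names:

from couchbase.cluster import Cluster, PasswordAuthenticator
from couchbase.n1ql import N1QLQuery

cluster = Cluster('couchbase://localhost')
cluster.authenticate(PasswordAuthenticator('appuser', 'password'))
bucket = cluster.open_bucket('msgs')

# One GSI per relationship field, so "all messages for a user" style queries
# are index scans rather than full bucket scans.
bucket.n1ql_query('CREATE INDEX idx_msg_user ON msgs(userId) '
                  'WHERE type = "message"').execute()

# Get all messages for a user (check the plan with EXPLAIN SELECT ...).
q = N1QLQuery('SELECT m.* FROM msgs m '
              'WHERE m.type = "message" AND m.userId = $uid',
              uid='some-user-uuid')
messages = [row for row in bucket.n1ql_query(q)]

# Direct lookups by id go through the key-value API (or USE KEYS in N1QL).
features = bucket.get('messagefeatures::some-feature-uuid').value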
I am pretty new to this so sorry for my lack of knowledge.
I set up a few tables which I have successfully written to and accessed via a Perl script using the CGI and DBI modules, thanks to advice here.
This is a member list for a local band newsletter. Yeah I know, tons of apps out there but, I desire to learn this.
1- I wanted to avoid updating or inserting a row if a piece of my input matches data in one particular column/field.
When creating the table in phpMyAdmin, I clicked the "U" (unique) on that column's name in structure view.
That seemed to work and no dupes are inserted, but I desire a hard-coded Perl solution so I understand the mechanics of this.
I read up on "insert ignore" / "update ignore" and searched all over, but everything I found seems to do something other than just skip a dupe.
The column is not a key or auto-increment, just a plain old field with an email address. (Mistake?)
2- When I write to the database, I want to do NOTHING if the incoming email address matches one in that field.
I desire the fastest method so I can loop through the export data from their existing lists (they cannot figure out the software) with no racing/locking issues or whatever other conditions of which I am in obvious ignorance.
Since I am creating this from scratch, 1 and 2 may be in fact partially moot. If so, what would be the best approach?
I would still like an auto-increment ID so I can access rows via the ID number or loop through with some kind of count++ foreach.
My stone-knife approach may be laughable to the gurus here, but I need to start somewhere.
Thanks in advance for your assistance.
With the email address column declared UNIQUE, INSERT IGNORE is exactly what you want for insertion. Sounds like you already know how to do the right thing!
(You could perform the "don't insert if it already exists" functionality in perl, but it's difficult to get right, because you have to wrap the test and update in a transaction. One of the big advantages of a relational database is that it will perform constraint checks like this for you, ensuring data integrity even if your application is buggy.)
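For example, with illustrative table and column names:

INSERT IGNORE INTO members (email, firstname)
VALUES ('sue@example.com', 'Sue');

If a row with that email already exists, MySQL silently skips the insert (zero affected rows) instead of raising a duplicate-key error.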
For updating, I'm not sure what an "update ignore" would look like. What is in the WHERE clause that limits your UPDATE to affect only the one desired row? Perhaps that auto_increment primary key you mentioned? If you want to write, for example,
UPDATE members SET firstname='Sue' WHERE member_id = 5;
then I think this "update ignore" functionality you want might just be something like
UPDATE members SET firstname='Sue' WHERE member_id = 5
AND email != 'sue@example.com';
which is an odd thing to do, but that's my best guess for what you might mean :)
Just do the insert. If the data would make the unique column no longer unique, you'll get an SQL error; you should be able to trap this and do whatever is appropriate (e.g. ignore it, log it, alert the user, ...).