HANDLING MULTIPLE INSERTS IN A MICROSERVICE - mysql

I have an inventory service which is responsible for all the operations that can be performed on an inventory, namely create, update, delete, and read. The create endpoint sits behind a queue to avoid data loss.
While creating an inventory, I need to insert data into multiple tables, let's say:
stocks (primary table)
StocksadditionalInfo (secondary table)
While creating the inventory I need to first insert into the primary stocks table and then use its id to insert into the secondary table StocksadditionalInfo.
Now let's say the row got created in the primary stocks table, but due to some DB exception the StocksadditionalInfo row did not get created.
Since the endpoint is behind the queue, the creation will be retried. But because the inventory already exists in my primary table, the retry hits a duplicate exception.
How do I avoid this?
One way would be to do a get before the insert: if the row is not present, insert it; if it is present, update it.
Another way could be to have a separate endpoint for the additional data that is also called from behind a queue. The problem with this is that as the number of additional tables increases, I will have to add new queues and new endpoints.
Any suggestions/resources would be helpful.

First, I'd wrap the create/update for stocks and StocksadditionalInfo in a transaction, even if it's just two inserts. That removes the partial-insert case you describe and models the data consistency of your application better.
Secondly, to make the reprocessing idempotent, use either INSERT ... ON DUPLICATE KEY UPDATE or INSERT IGNORE to avoid the duplicate exceptions.
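For illustration only, a rough MySQL sketch of both ideas, assuming stocks has a unique business key such as sku and StocksadditionalInfo has a unique stock_id column (both names are assumptions, not from the question):

START TRANSACTION;

-- Idempotent insert into the primary table, keyed on the unique business key (sku).
INSERT INTO stocks (sku, quantity)
VALUES ('ABC-123', 10)
ON DUPLICATE KEY UPDATE quantity = VALUES(quantity), id = LAST_INSERT_ID(id);

-- LAST_INSERT_ID(id) in the update clause makes the existing id available on a retry as well.
SET @stock_id = LAST_INSERT_ID();

-- Idempotent insert into the secondary table.
INSERT INTO StocksadditionalInfo (stock_id, info)
VALUES (@stock_id, 'additional info')
ON DUPLICATE KEY UPDATE info = VALUES(info);

COMMIT;

If either statement fails, the whole transaction rolls back, so a retry from the queue always starts from a clean state.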

Related

Query optimization for insert to database

I need a solution to insert a lot of rows concurrently into my SQL DB.
I have a rule that every time I insert into my transaction table, I need a unique ID composed of currentTime + transactionSource + sequenceNumber. My problem is that when I test my service using JMeter, the service goes down when the concurrent insert volume reaches 3000 rows. The problem lies in the duplication of the unique ID I generate; there are some duplicates. My assumption is that the duplication happens because a previous insert process hasn't finished when another insert process starts, so duplicate unique IDs get generated.
Can anyone suggest the best way to do this? Thank you.
MySQL has three wonderful methods to ensure that an id is unique:
auto_increment columns
uuid()
uuid_short()
Use them! The most common way to implement a unique id is the first one:
create table t (
    t_id int auto_increment primary key,
    . . .
)
I strongly, strongly advise you not to maintain your own id. You get race conditions (as you have seen). Your code will be less efficient than the code in the database. If you need the separate components, you can implement them as columns in the table.
In other words, your fundamental problem is your "rule". And there are zillions of databases in the world that work perfectly well without such a rule.
Why don't you let the database handle the insert id and then update the row with a secondary field containing the format you want? If you have duplicates, you can always append the row id to this identifier so it will always be unique.
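As a rough sketch of that idea (the table and column names are made up for illustration):

-- Let auto_increment assign the row id first.
INSERT INTO transactions (transaction_source, amount) VALUES ('WEB', 100.00);

-- Then build the display identifier from the generated id, so it can never collide.
UPDATE transactions
SET display_id = CONCAT(DATE_FORMAT(NOW(), '%Y%m%d%H%i%s'), '-', transaction_source, '-', id)
WHERE id = LAST_INSERT_ID();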

Migrate data from MySQL with auto-increment Ids to the Google Datastore?

I am trying to migrate some data from MySQL to the Datastore. I have a table called User with an auto-increment primary key (BIGINT(20)). Now I want to move the data from the User table to the Datastore.
My plan was to let the Datastore generate new ids for the migrated users and for all new users created after the migration is done. However, we have many services (notifications, URLs, etc.) that depend on the old ids. So I want to keep the old ids for the migrated users, but how can I guarantee that newly generated ids won't collide with the migrated ids?
Record the maximum and minimum ids before migrating. Migrate all the sql rows to datastore entities, setting entity.key.id = sql.row.id.
To prevent new datastore ids from colliding with the old ones, always call AllocateIds() to allocate new ids. In C#, the code looks like this:
Key key;
Key incompleteKey = _db.CreateKeyFactory("Task").CreateIncompleteKey();
do
{
    key = _db.AllocateId(incompleteKey);
} while (key.Path[0].Id >= minOldId && key.Path[0].Id <= maxOldId);
// Use new key for new entity.
In reality, you are more likely to win the lottery than to see a key collide, so it won't cost anything more to check against the range of old ids.
You cannot hint/tell the Datastore to reserve specific IDs. So, if you manually set IDs when inserting existing data, and later have the Datastore assign an ID, it may pick an ID that you have already used. Depending on the operation you are using (e.g. INSERT or UPSERT), the operation may fail or overwrite the existing entity.
You need to come up with a migration plan to map existing IDs to Datastore IDs. Depending on the number of tables you have and the complexity of relations between them, this could become a time consuming project, but you should still be able to do it.
Let's take a simple example and assume you have two tables:
USER (USER_ID is primary key)
USER_DATA (USER_ID is foreign key)
You could add another column to USER (or maintain a separate mapping) to map each USER_ID to a DATASTORE_ID. For this, you call Datastore's allocateIds method for the Kind you want to use and store the returned ID in the new column.
Now you can move the USER data to Cloud Datastore, ignoring the MySQL USER_ID and instead using the ID from the new column.
To migrate the data from USER_DATA, join the two tables and push the data keyed by the Datastore ID.
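A rough SQL sketch of that mapping, assuming the new column is called DATASTORE_ID (the name is just for illustration):

-- Hold the id returned by the Datastore allocateIds call for each user.
ALTER TABLE USER ADD COLUMN DATASTORE_ID BIGINT;

-- Export USER_DATA keyed by the new Datastore id instead of the MySQL id.
SELECT u.DATASTORE_ID, d.*
FROM USER u
JOIN USER_DATA d ON d.USER_ID = u.USER_ID;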
Also, note that using sequential IDs (referred to as monotonically increasing values) could cause performance issues with Datastore. So, you probably want to use IDs that are generated by the Datastore.

Many-to-many without an extra table

I have two tables, operation and source, in a MySQL database.
In operation I have 10 possible rows and in source just 3 possible rows, and between them there is a many-to-many relationship.
Question: is it necessary to add an extra (junction) table, or can I just add a foreign key to source in operation?
operation can be: subscribe request, subscribe enabled, subscribe disabled, subscribe canceled, payment ok, subscribe deal ok, subscribe start.
source can be: from internet, from agent.
There are operations common to both sources and operations specific to one source.
The operation subscribe enabled can come from an internet subscription or from an agent subscription, while subscribe deal ok can come only from an agent and subscribe request only from the internet.
In a relational database you need three tables to make a many-to-many relationship: the two tables containing the primary keys, plus the join table. There's no other way.
The short answer: normally, with an RDBMS like MySQL, where only one-to-many relations are supported directly, you need a third (junction, or cross-reference) table to implement a many-to-many relation between two entities.
But....
Since you don't have too many records, you can map your many-to-many relation between source and operation with just one additional column in source and without redundant data storage. However, you may lose some performance (e.g. less powerful indexes) and definitely make your life harder working with these tables...
The trick is to use specific binary values as primary key values in your operation table, and add an integer column to the source table where you use its bits to map the relations. So one bit of this column describes one relation between the actual source record and the corresponding operation record.
For your sample operation table, you can create a table with a primary key of a bit type, with a size equal to your estimated number of rows. You say that you are going to have ~10 rows, so use bit(10) as the data type. Since MySQL stores an int on 4 bytes, you don't lose any storage size here (compared to int you may actually win some, but it is really a matter of how well the database engine compresses the records; you could simply use int as well, if you wish).
create table operation (id bit(10) primary key, title varchar(50));
insert into operation values (b'0', 'none');
insert into operation values (b'1', 'subscribe request');
insert into operation values (b'10', 'subscribe enabled');
insert into operation values (b'100', 'subscribe disabled');
insert into operation values (b'1000', 'subscribe canceled');
insert into operation values (b'10000', 'payment ok');
insert into operation values (b'100000', 'subscribe deal ok');
insert into operation values (b'1000000', 'subscribe start');
Now, suppose that you have the following in your source table:
create table source (id int primary key, value int, operations bit(10));
insert into source values (1, 1, b'0');
insert into source values (2, 2, b'1'); -- refers to subscribe request
insert into source values (3, 3, b'10'); -- refers to subscribe enabled
insert into source values (4, 4, b'10011'); -- refers to payment ok, subscribe request, subscribe enabled
insert into source values (5, 5, b'1110011'); -- refers to subscribe deal ok, subscribe start, payment ok, subscribe request, subscribe enabled
Now, if you want to select all the relations, join these two tables as follows:
select source.id, operation.title
from source
join operation
on (source.operations & operation.id);
id operation.title
2 subscribe request
4 subscribe request
5 subscribe request
3 subscribe enabled
4 subscribe enabled
5 subscribe enabled
4 payment ok
5 payment ok
5 subscribe deal ok
5 subscribe start
If you want to add a new relation, you may take advantage of the on duplicate key update clause of insert, so you don't have to worry about existing relations:
insert into source (id,value,operations)
values (2,2,(select id from operation where title = 'subscribe start'))
on duplicate key update operations = operations
| (select id from operation where title = 'subscribe start');
If you want to delete a relation:
update source set operations = operations
& ~(select id from operation where title = 'subscribe start') where source.id=2;
All in all, it is not a nice way, but it is a possible way to map your many-to-many relation onto just two tables.
Your question can have many answers depending on your real needs.
In fact, in the described situation, you can have just one table "operation" with a source column defined as a MySQL SET type. You will then be able to select 0 to many sources for each operation.
You might then alter your operation table to add a source column:
ALTER TABLE operation ADD source SET('from internet', 'from agent');
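For example, assuming an id column on operation (hypothetical), relating an operation to both sources and querying by source could look like this:

-- Mark one operation as coming from both sources.
UPDATE operation SET source = 'from internet,from agent' WHERE id = 1;

-- Find every operation that can come from an agent.
SELECT * FROM operation WHERE FIND_IN_SET('from agent', source);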
If you really need to have two tables (let's suppose your source table contains other fields), you should have a third table to make the relation between them.
But technically there are situations where, for performance reasons for instance, you might prefer to store your foreign keys in a varchar() field of one of the tables, with a comma delimiter, and use PHP to retrieve the data. This is not the proper way of doing it, although it is possible as long as data retrieval is only ever done in one direction and you're really sure of what you're doing.
For instance, in this hacky way, you can imagine an ActiveRecord-like PHP class where you might retrieve your sources with a method like this:
// $_sources is declared as a class property, not inside the method.
private $_sources;

public function getSources() {
    if (!isset($this->_sources)) {
        $this->_sources = DBSource::findByPks(explode(",", $this->sources));
    }
    return $this->_sources;
}
According to the problem you describe, it seems you don't necessarily have a many-to-many relationship, as both 'source' and 'operation' are enumerations (a constant set of values). Therefore, 'source' and 'operation' do not act as tables but as data types (i.e. column types).
You might take a look at ENUMs in MySQL, create your own 'source' and 'operation' enums, and place them in a table that keeps that "virtual many-to-many relation".
Please keep in mind that for the solution I am proposing, I am assuming that 'source' and 'operation' have a constant and known set of values. If that were not true, you would get into trouble, as you would have a non-normalized database.
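A minimal sketch of that idea, using the values from the question (the table name is made up):

CREATE TABLE subscription_event (
    id INT AUTO_INCREMENT PRIMARY KEY,
    operation ENUM('subscribe request', 'subscribe enabled', 'subscribe disabled',
                   'subscribe canceled', 'payment ok', 'subscribe deal ok',
                   'subscribe start') NOT NULL,
    source ENUM('from internet', 'from agent') NOT NULL
);

Each row then records one allowed (operation, source) pair, which is the "virtual many-to-many relation" mentioned above.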
I suggest that you take the simplest approach to the problem; it is usually the best one. Use a many-to-many relationship only when it is really needed.
You wrote:
in source just 3 rows(possibility)
source can be from internet , from agent
Those are only two options.
Why not have source like this:
from internet
from agent
from internet & agent
Basically, if you are pretty sure that the set of sources will not grow, you can hardcode all variants. It gets optimized this way but you lose flexibility. Something similar to #lp_'s answer.
If you know that the source table contains at most 3 rows, you can map the relationship as many-to-3 (instead of many-to-many) with an operation table like the following:
operation
---------
id_source_1
id_source_2
id_source_3
If you don't know how many rows there will be in source, you need a third table, because a true many-to-many relationship can only be mapped with one.
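A sketch of that layout (column names are assumptions):

CREATE TABLE source (id INT PRIMARY KEY, name VARCHAR(50));

CREATE TABLE operation (
    id INT AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(50),
    id_source_1 INT NULL,
    id_source_2 INT NULL,
    id_source_3 INT NULL,
    FOREIGN KEY (id_source_1) REFERENCES source(id),
    FOREIGN KEY (id_source_2) REFERENCES source(id),
    FOREIGN KEY (id_source_3) REFERENCES source(id)
);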

Reserve a block of auto-increment ids in MySQL

I receive batches of, say, 100 items that I need to insert into three related MySQL tables: say current, recent, and historical. I want to insert each batch in each table as a group in a single insert statement for speed. The current table has an auto-increment primary key id that I need to obtain for each inserted row and use as the primary key to insert the same row in the recent and historical tables. My idea is to get the current auto_increment value for current, increment it by 100 using alter table current AUTO_INCREMENT=, then insert the 100 rows into current with programmatically set ids from the block that I just "reserved". Then I can use the same 100 reserved id values for the inserts into the recent and historical tables without having to query them again from the current table.
My question: Is there some reason that this is a bad idea? I have seen nothing about it on the web. The closest I have seen on stack overflow is Insert into an auto increment field but that is not quite the same thing. I can see possible threading issues if I tried to do this from more than one thread at a time.
I'm also open to other suggestions on how to accomplish this.
There might be concurrency issues: If another connection inserts values between the time you get the current value and you set the new value, you would get duplicate keys.
I don't know whether that can happen in your situation, however; perhaps the inserts only ever come from the batch described above and there is never another instance of it running in parallel.
Methinks you should decouple the IDs of the three tables, and using ALTER TABLE sounds very fishy too.
The most proper way I can think of:
In recent and historical, add a column that references the current ID; don't try to force the primary IDs to be the same.
Acquire a WRITE table lock on current.
Get the auto_increment value X for current.
Insert your 100 records; their IDs should now run from X+1 to X+100.
Release the table lock.
Insert records in recent and historical with the known IDs in the extra column.
Note: I'm not sure whether the auto_increment value points to the next ID or to the current highest value. If you use MAX(id) instead, the steps above still hold.
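A minimal MySQL sketch of the locking approach above (table and column names are assumptions):

LOCK TABLES current WRITE;

SELECT MAX(id) INTO @x FROM current;   -- X, the current highest id

-- Let auto_increment assign ids; under the write lock, and assuming no higher ids
-- were ever deleted, they will be @x+1 .. @x+100.
INSERT INTO current (payload) VALUES ('row 1'), ('row 2');   -- ... up to 100 rows

UNLOCK TABLES;

-- recent and historical reference the reserved ids in an ordinary column.
INSERT INTO recent (current_id, payload) VALUES (@x + 1, 'row 1'), (@x + 2, 'row 2');
INSERT INTO historical (current_id, payload) VALUES (@x + 1, 'row 1'), (@x + 2, 'row 2');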
This is [a bit] late, but in case someone else has this same question (as I did):
As Ethan pointed out in his comment, auto_increment is an internal MySQL utility to produce unique keys. Since you have the ability to generate your own id values external to MySQL, I suggest removing the auto_increment overhead from the table (but keep id as PK, for transport to the other tables). You can then insert your own id values along with the data.
Obviously, once you do this you'll have to program your own incrementing id values. To retrieve a "starting point" for each batch and keep the speed of a single batch INSERT call, create another table (I'll call it management) with just a single record of last_id, which is equivalent to, but independent of, max(id) of your three primary tables. Then, each time a new batch is ready to be processed, start a transaction on management with a write lock, read management.last_id, UPDATE management.last_id to last_id + (number in batch), then close the transaction. You now have sequential id values reserved for that batch, because any future read of management.last_id will return the next-larger set of id values.
The write-locked transaction removes any concurrency issues (as stated in FrankPI's answer) because any other process attempting to read management must wait for the lock to be released and will then see the value after the UPDATE. This also removes the id ambiguity in JvO's answer: "...IDs should now run from X+1 to X+100", which can be a dangerous assumption.
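A rough sketch of the management table approach (names are assumptions):

CREATE TABLE management (last_id BIGINT NOT NULL) ENGINE=InnoDB;
INSERT INTO management VALUES (0);

-- Reserve a block of 100 ids for one batch.
START TRANSACTION;
SELECT last_id INTO @start FROM management FOR UPDATE;   -- row is write-locked until COMMIT
UPDATE management SET last_id = last_id + 100;
COMMIT;

-- This batch may now safely use ids @start + 1 .. @start + 100.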

MySQL Constrain Database Entries in Rails

I am using MySQL 5 and Ruby on Rails 2.3.1. I have some validations that occasionally fail to prevent duplicate items from being saved to my database. Is it possible, at the database level, to prevent a duplicate entry from being created based on certain columns?
I am saving emails to a database, and don't want to save a duplicate subject line, body, and sender address. Does anyone know how to impose such a limit on a DB through a migration?
You have a number of options to ensure a unique value set is inserted into your table. Let's consider 1) pushing responsibility to the database engine or 2) making it your application's responsibility.
Pushing responsibility to the database engine could entail creating a UNIQUE index on your table. See the MySQL CREATE INDEX syntax. Note that this solution will result in an exception being thrown if a duplicate value is inserted. As you've identified what I infer to be three columns determining uniqueness (subject line, body, and sender address), you'll create the index to include all three columns. It's been a while since I've worked with Rails, so you may want to check the inserted record count as well.
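In raw MySQL such an index might look like the following (column names are assumptions; note that MySQL needs a prefix length to index a TEXT column, so the body is only compared on its first 255 characters here):

CREATE UNIQUE INDEX index_emails_uniqueness
ON emails (sender_address, subject, body(255));

In a Rails migration you would issue the equivalent through add_index with :unique => true.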
If you want to push this responsibility to your application software, you'll need to contend with potential data insertion conflicts. That is, assume two users create an email simultaneously (just work with me here) with the same subject line, body, and sender address. Should your code simply query for any records containing that text (identical for both users in this example), both queries will return no records found, and both will merrily proceed to insert their emails, which now violates your premise. You can address this with a table lock, or some other synchronizing mechanism in the database, to ensure duplicates don't appear. The latter approach could consist of another table with a single field indicating whether someone is currently inserting a record; once the insert completes, that record is updated to say so and others can proceed.
While you could have a separate architectural discussion on the implications of each alternative, I'll leave that to a separate post. Hopefully this suffices to answer your question.
You should be able to add a unique index to any columns you want to be unique throughout the table.