Many-to-many without an extra table - MySQL

I have two tables, operation and source, in a MySQL database.
operation has 10 possible rows and source just 3 possible rows, and between them there is a many-to-many relationship.
Question: is it necessary to add the extra junction table, or can I just add a foreign key to source in operation?
operation can be: subscribe request, subscribe enabled, subscribe disabled, subscribe canceled, payment ok, subscribe deal ok, subscribe start.
source can be: from internet, from agent.
There are operations common to all sources and operations specific to one source:
the operation subscribe enabled can come from internet or from agent, while subscribe deal ok can only come from agent and subscribe request can only come from internet.

In a relational database you need three tables to model a many-to-many relationship: the two tables containing the primary keys, and a join table. There's no other way.
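For reference, a minimal sketch of the standard junction table (assuming operation and source each have an int primary key named id; the table and column names here are mine):
create table source_operation (
  source_id int not null,
  operation_id int not null,
  primary key (source_id, operation_id),
  foreign key (source_id) references source(id),
  foreign key (operation_id) references operation(id)
);
Each row then records one allowed (source, operation) pair.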

For the short and brief answer: normally, with an RDBMS like MySQL, where only one-to-many relations are supported directly, you need a third (junction, or cross-reference) table to implement a many-to-many relation between two entities.
But....
Since you don't have too many records, you can map your many-to-many relation between source and operation with just one additional column in source and without redundant data storage. However, you may lose some performance (e.g. less powerful indexes) and you definitely make your life harder when working with these tables...
The trick is to use specific binary values as primary key values in your operation table, and to add an integer column to the source table whose bits map the relations. One bit of this column describes one relation between the actual source record and the corresponding operation record.
For your sample operation table, create a table with a primary key of bit type, with a size equal to your estimated number of rows. You say that you are going to have ~10 rows, so use bit(10) as the data type. Since MySQL would store an int on 4 bytes, you don't lose on storage size here (compared to int you may actually win some, but it is really a matter of how well the database engine can compress the records; you could simply use int as well, if you wish):
create table operation (id bit(10) primary key, title varchar(50));
insert into operation values (b'0', 'none');
insert into operation values (b'1', 'subscribe request');
insert into operation values (b'10', 'subscribe enabled');
insert into operation values (b'100', 'subscribe disabled');
insert into operation values (b'1000', 'subscribe canceled');
insert into operation values (b'10000', 'payment ok');
insert into operation values (b'100000', 'subscribe deal ok');
insert into operation values (b'1000000', 'subscribe start');
Now, suppose that you have the following in your source table:
create table source (id int primary key, value int, operations bit(10));
insert into source values (1, 1, b'0');
insert into source values (2, 2, b'1'); -- refers to subscribe request
insert into source values (3, 3, b'10'); -- refers to subscribe enabled
insert into source values (4, 4, b'10011'); -- refers to payment ok, subscribe request, subscribe enabled
insert into source values (5, 5, b'1110011'); -- refers to subscribe deal ok, subscribe start, payment ok, subscribe request, subscribe enabled
Now, if you want to select all the relations, join these two tables as follows:
select source.id, operation.title
from source
join operation
on (source.operations & operation.id);
id operation.title
2 subscribe request
4 subscribe request
5 subscribe request
3 subscribe enabled
4 subscribe enabled
5 subscribe enabled
4 payment ok
5 payment ok
5 subscribe deal ok
5 subscribe start
If you want to add a new relation, you may take advantage of the on duplicate key update clause of insert, so you don't have to worry about existing relations:
insert into source (id,value,operations)
values (2,2,(select id from operation where title = 'subscribe start'))
on duplicate key update operations = operations
| (select id from operation where title = 'subscribe start');
If you want to delete a relation:
update source set operations = operations
& ~(select id from operation where title = 'subscribe start') where source.id=2;
All in all, it is not a nice way, but it is a possible way to map your many-to-many relation onto just two tables.

Your question can have many answers depending on your real needs.
In fact, in the situation you describe, you could have just one table, operation, with a source column defined as a MySQL SET type. You would then be able to select 0 to many sources for each operation.
You might then alter your operation table to add a source column:
ALTER TABLE operation ADD source SET('from internet', 'from agent');
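As a quick illustration (a sketch, assuming operation has a title column as in the other answers), a SET column is written as a comma-separated list and can be queried with FIND_IN_SET:
UPDATE operation SET source = 'from internet,from agent' WHERE title = 'subscribe enabled';
SELECT title FROM operation WHERE FIND_IN_SET('from agent', source);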
If you really need to have two tables (let's suppose your source table contains other fields), you should have a third table to make the relation between them.
But, technically, there are situations where, for performance reasons for instance, you could prefer to store your foreign keys in a varchar() field of one of the tables, with a comma delimiter, and use PHP to retrieve the data. It's not the proper way of doing it, though it is possible, as long as you only ever retrieve the data in one direction and you really know what you're doing.
For instance, in this hacky way, you can imagine an ActiveRecord-like PHP class, where you might retrieve your sources with a method like this:
private $_sources;

public function getSources() {
    if (!isset($this->_sources)) {
        // lazily resolve the comma-delimited id list into records (memoized)
        $this->_sources = DBSource::findByPks(explode(",", $this->sources));
    }
    return $this->_sources;
}

According to the problem you describe, it seems you don't necessarily have a many-to-many relationship, as both source and operation are enumerations (constant sets of values). Therefore, source and operation do not act as tables but as data types (i.e. column types).
You might take a look at ENUMs in MySQL, create your own source and operation enums, and place them in a table that keeps that "virtual many-to-many relation".
Please keep in mind that for the solution I am proposing, I am assuming that source and operation have constant and known sets of values. If that were not true, you would get into trouble, as you would have a non-normalized database.
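A minimal sketch of that idea, assuming the value sets from the question are fixed (the table name source_operation is mine):
CREATE TABLE source_operation (
  source ENUM('from internet', 'from agent') NOT NULL,
  operation ENUM('subscribe request', 'subscribe enabled', 'subscribe disabled',
                 'subscribe canceled', 'payment ok', 'subscribe deal ok',
                 'subscribe start') NOT NULL,
  PRIMARY KEY (source, operation)
);
-- each row records one allowed pairing, e.g.:
INSERT INTO source_operation VALUES ('from agent', 'subscribe deal ok');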

I suggest that you take the simplest approach to the problem; it is usually the best one. Use a many-to-many relationship only when it is really needed.
You wrote:
source just 3 possible rows
source can be: from internet, from agent
Those are only two options.
Why not have source like this:
from internet
from agent
from internet & agent
Basically, if you are pretty sure that the set of sources will not grow, you can hardcode all the variants (sketched below). It gets optimized this way, but you lose flexibility. Something similar to #lp_ 's answer.
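A sketch of that hardcoded-variants idea (column names are assumptions of mine):
create table source (
  id tinyint primary key,
  label varchar(30) not null
);
insert into source values
  (1, 'from internet'),
  (2, 'from agent'),
  (3, 'from internet & agent');
-- operation then needs only an ordinary one-to-many foreign key
alter table operation
  add column source_id tinyint,
  add foreign key (source_id) references source(id);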

If you know that the source table contains at most 3 rows, you can map the relationship as many-to-3 (instead of many-to-many), with an operation table like the following:
operation
---------
id_source_1
id_source_2
id_source_3
If you don't know how many rows there will be in source, you need a third table, because a general many-to-many relationship can only be mapped with a third table.
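In DDL terms, that fixed-width mapping might look like this (a sketch with nullable FK columns; the title column and int keys are assumptions):
create table operation (
  id int primary key,
  title varchar(50),
  id_source_1 int null,
  id_source_2 int null,
  id_source_3 int null,
  foreign key (id_source_1) references source(id),
  foreign key (id_source_2) references source(id),
  foreign key (id_source_3) references source(id)
);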

Related

Handling multiple inserts in a microservice

I have an inventory service which is responsible for all the possible operations that can be performed on an inventory, namely create, update, delete, read. The create endpoint is behind a queue to avoid data loss.
While creating an inventory, I need to insert data into multiple tables.
Let's say:
stocks (primary table)
StocksadditionalInfo (secondary table)
While creating the inventory, I need to first insert into the primary stocks table and then use its id to insert into the secondary table StocksadditionalInfo.
Now let's say that during the insert, the data got created in the primary table stocks, but due to some DB exception the additional-info data didn't get created.
Since the endpoint is behind the queue, the inventory creation will be retried. But now, since the inventory already exists in my primary table, it will give a duplicate exception.
How do I avoid this?
One way would be to do a get operation before the insert: if the record is not present, insert it; if it is present, update it.
Another way could be to have another endpoint for the additional data, called from behind the queue. The problem with this is that as the number of additional tables increases, I will have to add new queues and new endpoints.
Any suggestions/resources would be helpful.
First, I'd wrap the create/update for stocks and StocksadditionalInfo in a transaction, even if it's just two inserts. This simplifies the partial-insert case you describe and better models the data consistency of your application.
Secondly, to facilitate the reprocessing, use either INSERT ... ON DUPLICATE KEY UPDATE or INSERT IGNORE to avoid your duplicate exceptions, as sketched below.
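A minimal sketch of that retry-safe shape, assuming stocks has an auto-increment id plus a unique key on sku, and StocksadditionalInfo has a unique key on stock_id (the column names are mine):
START TRANSACTION;
INSERT INTO stocks (sku, qty) VALUES ('ABC-1', 10)
  ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id), qty = VALUES(qty);
-- the id = LAST_INSERT_ID(id) trick makes LAST_INSERT_ID() return the
-- existing row's id on the duplicate path, so a retry stays idempotent
INSERT INTO StocksadditionalInfo (stock_id, info) VALUES (LAST_INSERT_ID(), 'extra details')
  ON DUPLICATE KEY UPDATE info = VALUES(info);
COMMIT;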

Query optimization for inserts to the database

I need a solution to insert a lot of rows concurrently into my SQL DB.
I have a rule that every time I insert into my transaction table, I need a unique ID composed of currentTime + transactionSource + sequenceNumber. My problem is that when I test my service using JMeter, the service goes down once the concurrent inserts reach 3000 rows, because the unique IDs I generate contain duplicates. My assumption is that the duplication happens because another insert process starts before a previous one has finished, so both generate the same unique ID.
Can anyone give me a suggestion on the best way to do this? Thank you.
MySQL has three wonderful methods to ensure that an id is unique:
auto_increment columns
uuid()
uuid_short()
Use them! The most common way to implement a unique id is the first one:
create table t (
    t_id int auto_increment primary key,
    . . .
);
I strongly, strongly advise you not to maintain your own id. You get race conditions (as you have seen). Your code will be less efficient than the code in the database. If you need the separate components, you can implement them as columns in the table.
In other words, your fundamental problem is your "rule". And there are zillions of databases in the world that work perfectly well without such a rule.
Why don't you let the database handle the insert id, and then update the row with a secondary field containing the format you want? If you have duplicates, you can always append the row id to this identifier so it will always be unique.
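A sketch of that idea (the transactions table and its columns are hypothetical):
INSERT INTO transactions (created_at, source) VALUES (NOW(), 'web');
-- build the formatted identifier from the guaranteed-unique row id
UPDATE transactions
   SET public_ref = CONCAT(DATE_FORMAT(created_at, '%Y%m%d%H%i%s'), '-', source, '-', t_id)
 WHERE t_id = LAST_INSERT_ID();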

Integer values for status fields

Often I find myself creating 'status' fields for database tables. I set these up as TINYINT(1) as, more often than not, I only need a handful of status values. I cross-reference these values to array lookups in my code; an example is as follows:
0 - Pending
1 - Active
2 - Denied
3 - On Hold
This all works very well, except that I'm now trying to create better database structures and realise that, from a database point of view, these integer values don't actually mean anything.
Now, a solution may be to create separate tables for statuses, but there could be several status columns across the database, and having a separate table for each status column seems like overkill. (I'd like each status to start from zero, so having one status table for all statuses wouldn't be ideal for me.)
Another option is to use the ENUM data type - but there are mixed opinions on this. I see many people recommending against using ENUM fields.
So what would be the way to go? Do I absolutely need to be putting this data in to its own table?
I think the best approach is to have a single status table for each kind of status. For example, order_status ("placed", "paid", "processing", "completed") is qualitatively different from contact_status ("received", "replied", "resolved"), but the latter might work just as well for customer contacts as for supplier contacts.
This is probably already what you're doing — it's just that your "tables" are in-memory arrays rather than database tables.
I agree with ruakh on creating another table structured as (id, statusName), which is great. However, I would like to add that for such a table you can still use TINYINT for the id field: TINYINT holds values from -128 to 127 (0 to 255 unsigned), which would cover all the status cases you might need.
Can you add (or remove) a status value without changing code?
If yes, then consider a separate lookup table for each status "type". You are already treating this data in a generic way in your code, so you should have a generic data structure for it.
If no, then keep the ENUM (or a well-documented integer). You are treating each value in a special way, so there isn't much purpose in trying to generalize the data model.
(I'd like each status to start from zero - so having one status table for all statuses wouldn't be ideal for me.)
You should never mix several distinct sets of values within the same lookup table (regardless of your "zero issue"). Reasons:
A simple FOREIGN KEY alone won't be able to prevent referencing a value from the wrong set.
All values are forced into the same type, which may not always be desirable.
That's such a common anti-pattern that it even has a name: "one true lookup table".
Instead, keep each lookup "type" within a separate table. That way, FKs work predictably and you can tweak datatypes as necessary.
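For example, a sketch of separate per-type lookup tables (names assumed):
CREATE TABLE order_status (
  id TINYINT PRIMARY KEY,
  name VARCHAR(20) NOT NULL UNIQUE
);
CREATE TABLE contact_status (
  id TINYINT PRIMARY KEY,
  name VARCHAR(20) NOT NULL UNIQUE
);
CREATE TABLE orders (
  id INT AUTO_INCREMENT PRIMARY KEY,
  status_id TINYINT NOT NULL,
  FOREIGN KEY (status_id) REFERENCES order_status(id)
);
-- the FK can only ever reference an order_status value, never a contact_status one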

Reserve a block of auto-increment ids in MySQL

I receive batches of, say, 100 items that I need to insert into three related MySQL tables: say current, recent, and historical. I want to insert each batch in each table as a group in a single insert statement for speed. The current table has an auto-increment primary key id that I need to obtain for each inserted row and use as the primary key to insert the same row in the recent and historical tables. My idea is to get the current auto_increment value for current, increment it by 100 using alter table current AUTO_INCREMENT=, then insert the 100 rows into current with programmatically set ids from the block that I just "reserved". Then I can use the same 100 reserved id values for the inserts into the recent and historical tables without having to query them again from the current table.
My question: Is there some reason that this is a bad idea? I have seen nothing about it on the web. The closest I have seen on stack overflow is Insert into an auto increment field but that is not quite the same thing. I can see possible threading issues if I tried to do this from more than one thread at a time.
I'm also open to other suggestions on how to accomplish this.
There might be concurrency issues: If another connection inserts values between the time you get the current value and you set the new value, you would get duplicate keys.
I don't know whether that can happen in your situation, however, or whether the inserts happen only from the batch described above and there is never another instance of it running in parallel.
Methinks you should decouple the IDs from the 3 tables, and using ALTER TABLE sounds very fishy too.
The most proper way I can think of:
In recent and historical, add a column that references the current ID; don't try to force the primary IDs to be the same.
Acquire a WRITE table lock on current.
Get the auto_increment value X for current.
Insert your 100 records; their IDs should now run from X+1 to X+100.
Release the table lock.
Insert records in recent and historical with the known IDs in the extra column.
Note: I'm not sure whether the auto_increment value points to the next ID or to the current highest value. If you use MAX(id) instead, the steps above still work; see the sketch below.
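A sketch of those steps in SQL, using MAX(id) per the note above (the payload columns are hypothetical, and this assumes the auto-increment counter is in step with MAX(id)):
LOCK TABLES current WRITE;
SELECT COALESCE(MAX(id), 0) INTO @x FROM current;
INSERT INTO current (payload) VALUES ('a'), ('b');  -- ids @x+1 .. @x+2
UNLOCK TABLES;
-- reuse the known ids in the extra reference column
INSERT INTO recent (current_id, payload) VALUES (@x+1, 'a'), (@x+2, 'b');
INSERT INTO historical (current_id, payload) VALUES (@x+1, 'a'), (@x+2, 'b');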
This is [a bit] late, but in case someone else has this same question (as I did):
As Ethan pointed out in his comment, auto_increment is an internal MySQL utility to produce unique keys. Since you have the ability to generate your own id values external to MySQL, I suggest removing the auto_increment overhead from the table (but keep id as PK, for transport to the other tables). You can then insert your own id values along with the data.
Obviously, once you do this, you'll have to program your own incrementing id values. To retrieve a "starting point" for each batch while maintaining the speed of a single batch INSERT call, create another table (I'll call it management) with just a single record whose last_id column is equivalent to, but independent of, max(id) of your three primary tables. Then, each time a new batch is ready to be processed, start a transaction on management with a write lock, read management.last_id, UPDATE management.last_id to last_id + the number of rows in the batch, then close the transaction. You now have a block of sequential id values reserved for that batch, because any future read of management.last_id will return the next-larger set of id values.
The write-locked transaction removes any concurrency issues (as noted in FrankPI's answer) because any other process attempting to read management must wait for the lock to be released and will see the value after the UPDATE. This also removes the id ambiguity in JvO's answer ("...IDs should now run from X+1 to X+100"), which can be a dangerous assumption.
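A sketch of that reservation transaction for a batch of 100 (assuming InnoDB; names as described above):
START TRANSACTION;
SELECT last_id INTO @start FROM management FOR UPDATE;  -- row stays write-locked until COMMIT
UPDATE management SET last_id = last_id + 100;          -- reserve 100 ids for this batch
COMMIT;
-- the batch may now safely use ids @start+1 .. @start+100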

How to deal with duplicates in database?

In a program, should we use try catch to check insertion of duplicate values into tables, or should we check if the value is already present in the table and avoid insertion?
This is easy enough to enforce with a UNIQUE constraint on the database side, so that's my recommendation. I try to put as much of the data integrity into the database as possible, so that I can avoid having bad data (although sometimes that's unavoidable).
If this is how you already have it, you might as well just catch the MySQL exception for a duplicate-value insertion on such a table, as doing the check and then the insertion is more costly than having the database do one simple lookup (and possibly an insert).
Depends upon whether you are inserting one, or a million, as well as whether the duplicate is the primary key.
If its the primary key, read: http://database-programmer.blogspot.com/2009/06/approaches-to-upsert.html
An UPSERT or ON DUPLICATE KEY... The idea behind an UPSERT is simple.
The client issues an INSERT command. If a row already exists with the
given primary key, then instead of throwing a key violation error, it
takes the non-key values and updates the row.
This is one of those strange (and very unusual) cases where MySQL
actually supports something you will not find in all of the other more
mature databases. So if you are using MySQL, you do not need to do
anything special to make an UPSERT. You just add the term "ON
DUPLICATE KEY UPDATE" to the INSERT statement:
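For illustration, a minimal sketch of that syntax (the items table and its columns are hypothetical):
INSERT INTO items (id, name, qty) VALUES (1, 'widget', 5)
  ON DUPLICATE KEY UPDATE name = VALUES(name), qty = VALUES(qty);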
If it's not the primary key, and you are inserting just one row, then you can still make sure this doesn't cause a failure.
For your actual question: I don't really like the idea of using try/catch for program flow, but really, you have to evaluate readability and user experience (in this case, performance) and pick what you think is the best mix of the two.
You can add a UNIQUE constraint to your table, something like:
CREATE TABLE IF NOT EXISTS login
(
loginid SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
loginname CHAR(20) NOT NULL,
UNIQUE (loginname)
);
This will ensure no two login names are the same.
You can create a unique composite key:
ALTER TABLE `TableName` ADD UNIQUE KEY (KeyOne, KeyTwo, ...);
You just need to create a unique key in your table so that it will not permit adding the same value again.
You should try inserting the value and catch the exception. In a busy system, if you check for the existence of a value, it might get inserted between the time you check and the time you insert it.
Let the database do its job: let the database check for the duplicate entry.
A database is a computerized representation of a set of business rules, and a DBMS is used to enforce these business rules as constraints. Neither can verify that a proposition in the database is true in the real world. For example, if the model in question is the employees of an enterprise and the Employees table contains two people named 'Jimmy Barnes', neither the DBMS nor the database can know whether one is a duplicate, whether either is a real person, etc. A trusted source is required to determine existence and identity. In the above example, the enterprise's personnel department is responsible for checking public records, perusing references, ensuring the person is not already on the payroll, etc., then allocating a unique employee reference number that can be used as a key. This is why we look for industry-standard identifiers with a trusted source: ISBN for books, VIN for cars, ISO 4217 for currencies, ISO 3166 for countries, etc.
I think it is better to check if the value already exists and avoid the insertion. The check for duplicate values can be done in the procedure that saves the data (using exists if your database is an SQL database).
If a duplicate exists you avoid the insertion and can return a value to your app indicating so and then show a message accordingly.
For example, a piece of SQL code could be something like this:
select @ret_val = 0
If exists (select * from employee where last_name = @param_ln and first_name = @param_fn)
    select @ret_val = -1
Else
    -- your insert statement here
Select @ret_val
Your condition for duplicate values will depend on what you define as a duplicate record. In your application you would use the return value to know if the data was a duplicate. Good luck!