MySQL bulk insertion: parent table autogenerated key into child table - mysql

We are migrating from SQL Server 2012 to MySQL 5.6. One scenario that came up is bulk-inserting records into parent and child tables. An example:
create table parent (
parent_id int primary key auto_increment,
parent_name varchar(100) );
create table child (
child_id int primary key auto_increment,
child_name varchar(100) ,
parent_id int,
foreign key (parent_id) references parent(parent_id));
Say I have two temp tables, parent_temp and child_temp, and I want to insert their records into the parent and child tables. The problem is that I need to map each auto-generated parent_id back to its parent_temp_id. In SQL Server, we used the OUTPUT INTO clause to work around this problem. Since there is no direct equivalent in MySQL, here are some straightforward solutions that I could think of:
1. Do the insertion through Entity Framework.
2. Use a while loop to iterate over the parent records: insert each one into the parent table, get hold of the auto-generated key, and insert its children into the child table.
3. Add a spare column dummy_col to the parent table to hold the mapping. This allows bulk inserts into the parent table. The insert query looks like
insert into parent(parent_name,dummy_col )
select parent_temp_name, parent_temp_id from parent_temp
In this way we will have a 1-1 mapping between the rows of parent and parent_temp tables. The child table query looks like
insert into child(child_name,parent_id)
select child_temp_name, p.parent_id from child_temp ct
inner join parent p on p.dummy_col = ct.parent_temp_id
The problem with approaches 1 and 2 is that they are slow for bulk insertions. We could easily be inserting 15k rows at a time. Approach 3 will be problematic if two or more users run the same insertion query simultaneously and their parent_temp_id values match (we are using int, so they would always start from 1, 2, 3, 4...). If we use GUIDs instead of ints, we can probably avoid this duplicate issue. But we would always need to create extra columns in such tables and make sure that they are not used for some other purpose.
Based on the above scenario, are there any other solutions for MySQL? And which one would you prefer?
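As a rough sketch of approach 3 made safe for concurrent users, the mapping column can be paired with a per-batch identifier, so that two batches whose temp ids both start at 1 cannot be confused. The example below uses Python with SQLite purely as a runnable stand-in for MySQL; the columns batch_id and temp_id and the helper names are hypothetical, not part of the original schema.

```python
import sqlite3
import uuid


def make_db():
    """Create in-memory parent/child tables (SQLite stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE parent (
            parent_id   INTEGER PRIMARY KEY AUTOINCREMENT,
            parent_name TEXT,
            batch_id    TEXT,     -- hypothetical: identifies one bulk insert
            temp_id     INTEGER   -- hypothetical: plays the role of dummy_col
        );
        CREATE TABLE child (
            child_id   INTEGER PRIMARY KEY AUTOINCREMENT,
            child_name TEXT,
            parent_id  INTEGER REFERENCES parent(parent_id)
        );
        """
    )
    return conn


def bulk_insert(conn, parents, children):
    """parents: [(temp_id, name)]; children: [(name, parent_temp_id)]."""
    batch_id = uuid.uuid4().hex  # unique per call, so batches cannot collide
    cur = conn.cursor()
    cur.executemany(
        "INSERT INTO parent (parent_name, batch_id, temp_id) VALUES (?, ?, ?)",
        [(name, batch_id, temp_id) for temp_id, name in parents],
    )
    cur.execute(
        "CREATE TEMP TABLE IF NOT EXISTS child_temp "
        "(child_name TEXT, parent_temp_id INTEGER)"
    )
    cur.execute("DELETE FROM child_temp")
    cur.executemany("INSERT INTO child_temp VALUES (?, ?)", children)
    # Resolve temp ids to generated keys, restricted to this batch only.
    cur.execute(
        """
        INSERT INTO child (child_name, parent_id)
        SELECT ct.child_name, p.parent_id
        FROM child_temp ct
        JOIN parent p ON p.temp_id = ct.parent_temp_id AND p.batch_id = ?
        """,
        (batch_id,),
    )
    conn.commit()
    return batch_id
```

In MySQL the same idea would be two INSERT ... SELECT statements with the batch value generated once per run (for example with UUID()); the extra columns are the price of the mapping, as noted above.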

Related

Copying from one table to other, how to enforce foreign key check on the whole data set but not on separate rows?

I'm using MySQL. Let's assume I have a table hierarchy with two columns: id, parent_id.
The parent_id refers to id of other row of the same table, so I have the foreign key there.
The hierarchy table contains some data, but they are not relevant now.
I also have a second table called new_hierarchy_entries that has the same columns, but there are no foreign key restrictions set.
new_hierarchy_entries contains:
id parent_id
2 1
1 null
Now I want to copy all the rows from new_hierarchy_entries into hierarchy. When I run naively:
INSERT INTO hierarchy SELECT * FROM new_hierarchy_entries
I get error: Cannot add or update a child row: a foreign key constraint fails (my_db.hierarchy, CONSTRAINT hierarchy_ibfk_2 FOREIGN KEY (parent_id) REFERENCES hierarchy (id))
Of course, if the rows are inserted one by one, the first row (id=2, parent=1) cannot be inserted, because there is no row with id=1 in table hierarchy.
On the other hand, if all rows were added at once, then the constraints would be satisfied. So how can I copy the rows in such a way that I'm sure that constraints are satisfied after the copying, but they may not be satisfied while copying?
Sorting rows of new_hierarchy_entries by id will not help. I cannot assume that parent_id < id in the same row.
Sorting rows of new_hierarchy_entries by the hierarchy (using tree terminology, give me leaves first, then their parents etc.) would help, but I'm not sure how to do that in MySQL query.
I played with the idea of temporarily turning FOREIGN_KEY_CHECKS off. But then I could insert inconsistent data and I wouldn't find out. Turning FOREIGN_KEY_CHECKS back on doesn't make the database check the consistency of all the data. It would take too many resources anyway.
This is tricky. I don't know any way to make MySQL re-check foreign key references after enabling FOREIGN_KEY_CHECKS.
You could check yourself for orphan rows, and if there are any, roll back.
BEGIN;
SET SESSION FOREIGN_KEY_CHECKS=0;
INSERT INTO hierarchy SELECT * FROM new_hierarchy_entries;
SET SESSION FOREIGN_KEY_CHECKS=1;
SELECT COUNT(*) FROM hierarchy AS c
LEFT OUTER JOIN hierarchy AS p ON p.id=c.parent_id
WHERE c.parent_id IS NOT NULL AND p.id IS NULL;
-- if count == 0 then...
COMMIT;
-- otherwise ROLLBACK and investigate the bad data
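A minimal runnable sketch of this check-then-commit pattern, using Python with SQLite as a stand-in for MySQL. Foreign key enforcement is switched off explicitly in SQLite here, which plays the role of FOREIGN_KEY_CHECKS=0. The orphan query skips rows whose parent_id is NULL, since root rows have no parent to look up. The helper names are hypothetical.

```python
import sqlite3


def make_db(entries):
    """Create hierarchy and staging tables; entries: [(id, parent_id)]."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = OFF")  # stand-in for FOREIGN_KEY_CHECKS=0
    conn.executescript(
        """
        CREATE TABLE hierarchy (
            id        INTEGER PRIMARY KEY,
            parent_id INTEGER REFERENCES hierarchy(id)
        );
        CREATE TABLE new_hierarchy_entries (id INTEGER, parent_id INTEGER);
        """
    )
    conn.executemany("INSERT INTO new_hierarchy_entries VALUES (?, ?)", entries)
    conn.commit()
    return conn


def copy_with_orphan_check(conn):
    """Copy staging rows; commit only if no row points at a missing parent."""
    cur = conn.cursor()
    cur.execute(
        "INSERT INTO hierarchy SELECT id, parent_id FROM new_hierarchy_entries"
    )
    (orphans,) = cur.execute(
        """
        SELECT COUNT(*)
        FROM hierarchy AS c
        LEFT JOIN hierarchy AS p ON p.id = c.parent_id
        WHERE c.parent_id IS NOT NULL  -- roots are not orphans
          AND p.id IS NULL
        """
    ).fetchone()
    if orphans:
        conn.rollback()  # discard the whole copy and investigate
        return False
    conn.commit()
    return True
```

The check scans the whole hierarchy table; for a large table it may be cheaper to restrict the left side of the join to the ids that were just copied.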
One other possibility is to use INSERT with the IGNORE option, which will skip failed rows. Then repeat the same statement in a loop, as long as you see "rows affected" more than 0.
INSERT IGNORE INTO hierarchy SELECT * FROM new_hierarchy_entries;
INSERT IGNORE INTO hierarchy SELECT * FROM new_hierarchy_entries;
INSERT IGNORE INTO hierarchy SELECT * FROM new_hierarchy_entries;
...
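The repeated-pass idea can also be applied before the data reaches the database. The sketch below (plain Python, with a hypothetical function name) orders rows so that every parent precedes its children: each pass takes the rows whose parent is already present, exactly like one run of the INSERT IGNORE statement, and it stops with an error if a pass makes no progress.

```python
def order_parents_first(rows, existing_ids=()):
    """Order (id, parent_id) rows so each parent comes before its children.

    existing_ids: ids already present in the target table.
    Raises ValueError if some rows reference a parent that never appears.
    """
    present = set(existing_ids)
    pending = list(rows)
    ordered = []
    while pending:
        # Rows that could be inserted right now: roots, or parent already there.
        ready = [r for r in pending if r[1] is None or r[1] in present]
        if not ready:
            raise ValueError(f"rows with missing parents: {pending}")
        ordered.extend(ready)
        present.update(r[0] for r in ready)
        pending = [r for r in pending if r not in ready]
    return ordered
```

Inserting the rows in the returned order keeps the foreign key satisfied after every single row, so FOREIGN_KEY_CHECKS can stay enabled.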

Limit number of associated records on DB level

Let's assume I have two tables, parents and children. Naturally, a parent may have many children in a one-to-many association. Is there any construct within MySQL or PostgreSQL that would limit the number of associated objects, something like:
FOREIGN KEY (parent_id)
REFERENCES parent(id)
LIMIT 3
Does anything like this exist, or do we need a custom trigger?
Not in the definition of foreign keys. I would solve that by adding a sequential number for each child per parent like this (code for PostgreSQL, the principle is standard SQL):
CREATE TABLE child (
child_id serial PRIMARY KEY
, parent_id int NOT NULL REFERENCES parent
, child_nr int NOT NULL
, CHECK (child_nr BETWEEN 1 AND 3)
, UNIQUE (parent_id, child_nr)
);
This way you can have children 1 through 3 for each parent or some or none of those. But no others.
Since you have a natural PK with (parent_id, child_nr) now, you could drop the surrogate PK column child_id. But I like to have a single-column surrogate PK for almost every table ...
You could use a trigger to limit the number, which checks how many children are there already before inserting a new one. But you'd run into concurrency issues, and it's more expensive, less reliable, easier to circumvent and vendor-specific.
How to manage child_nr?
The RDBMS just enforces (reliably) that no illegal state can ever exist in the table. How you figure out the next child_nr is up to you. Many different approaches are possible.
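One possible way to manage child_nr, sketched in Python with SQLite standing in for the database: pick the lowest free number for the parent and let the CHECK and UNIQUE constraints reject anything beyond three. The helper names are hypothetical, and this is only one of the many approaches mentioned above.

```python
import sqlite3


def make_db():
    """Schema mirroring the child table above (SQLite syntax)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE parent (id INTEGER PRIMARY KEY);
        CREATE TABLE child (
            child_id  INTEGER PRIMARY KEY,
            parent_id INTEGER NOT NULL REFERENCES parent(id),
            child_nr  INTEGER NOT NULL CHECK (child_nr BETWEEN 1 AND 3),
            UNIQUE (parent_id, child_nr)
        );
        """
    )
    return conn


def add_child(conn, parent_id):
    """Insert a child under the lowest free child_nr and return that number.

    When slots 1..3 are all taken, the attempted value 4 violates the CHECK
    constraint and sqlite3.IntegrityError is raised.
    """
    used = {
        row[0]
        for row in conn.execute(
            "SELECT child_nr FROM child WHERE parent_id = ?", (parent_id,)
        )
    }
    child_nr = next((n for n in (1, 2, 3) if n not in used), 4)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO child (parent_id, child_nr) VALUES (?, ?)",
            (parent_id, child_nr),
        )
    return child_nr
```

If two sessions pick the same free number at the same time, the UNIQUE constraint makes one of the inserts fail, and the caller can simply retry.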
For just three children, you could insert all children automatically when creating a parent (with a trigger, a rule or in your application), with the given (parent_id, child_nr) and all additional columns NULL.
Then you would only allow UPDATE and not INSERT or DELETE for the child table (GRANT / REVOKE), or even make sure with another trigger, so that superusers cannot circumvent it. Make the FK to parent with ON DELETE CASCADE, so children die with the parent automatically.
Alternative
Somewhat less reliable, but cheaper: Keep a running count of children in the parent table and restrict it to be <= 3. Update it with every change in the child table with triggers. Be sure to cover all possible ways to alter data in the child table.

Mysql database empty column values vs additional identifying table

Sorry, I'm not sure the question title reflects the real question, but here goes:
I am designing a system which has a standard orders table, but with additional previous and next columns.
The question is which approach to the foreign keys is better.
Here I have a basic table with the columns previous and next, which are self-referencing foreign keys. The problem with this table is that the first placed order doesn't have previous and next values, so those columns are left empty. If I have, say, 10,000 records and 30% of them have those columns empty, that's 3,000 rows, which is quite a lot I think, and I expect the numbers to grow. In, say, a year's time it could come to 30,000 rows with empty columns, and I am not sure if that's OK.
The solution I've come up with is to pair the main table with 2 other tables which have foreign keys to it. In this case those 2 additional tables are identifying tables and nothing more, and there are no longer rows with empty columns.
So the question is which solution is better when considering query speed, table optimization, and common good practices, or maybe there's an even better one that I don't know of? (P.S. I am using MySQL with the InnoDB engine.)
If your aim is to do order sets, you could simply add a new table for that, and just have a single column as a foreign key to that table in the order table.
The orders could also include a rank column to indicate in which order orders belonging to the same set come.
create table order_sets (
id int not null auto_increment,
-- customer related data, etc...
primary key (id)
);
create table orders (
id int not null auto_increment,
name varchar(100), -- length chosen arbitrarily for illustration
quantity int,
set_id int not null,
set_rank int,
primary key (id),
foreign key (set_id) references order_sets (id)
);
Then inserting a new order means updating the rank of all other orders which come after in the same set, if any.
Likewise, grouping queries are much easier than having to follow prev and next links. I'm pretty sure you will need these queries, and performance will be much better that way.
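The rank update described above can be sketched as follows, again in Python with SQLite as a stand-in for MySQL. The helper names are hypothetical, and the tables are reduced to the columns needed for the illustration.

```python
import sqlite3


def make_db():
    """Minimal order_sets/orders tables (SQLite stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE order_sets (id INTEGER PRIMARY KEY);
        CREATE TABLE orders (
            id       INTEGER PRIMARY KEY,
            name     TEXT,
            set_id   INTEGER NOT NULL REFERENCES order_sets(id),
            set_rank INTEGER
        );
        INSERT INTO order_sets (id) VALUES (1);
        """
    )
    return conn


def insert_order(conn, set_id, name, rank):
    """Insert an order at position `rank`, shifting later orders in the set."""
    with conn:  # both statements succeed or fail together
        conn.execute(
            "UPDATE orders SET set_rank = set_rank + 1 "
            "WHERE set_id = ? AND set_rank >= ?",
            (set_id, rank),
        )
        conn.execute(
            "INSERT INTO orders (name, set_id, set_rank) VALUES (?, ?, ?)",
            (name, set_id, rank),
        )


def orders_in_set(conn, set_id):
    """Return order names of one set in rank order, without following links."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT name FROM orders WHERE set_id = ? ORDER BY set_rank",
            (set_id,),
        )
    ]
```

Reading a whole set in sequence is then a single ORDER BY query, which is the main advantage over chasing previous and next references row by row.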

MySQL Database design. Inserting rows in 1to1 tables.

What is the best way to insert rows into tables that reference each other 1 to 1?
I mean, in MySQL 5.5 with InnoDB tables, I have a database design where two tables, table1 and table2, reference each other.
The problem arises when we try to insert rows into table1 and table2. Since there is no multi-table insert in MySQL, I cannot insert a row, because the foreign keys are NOT NULL fields in both tables and the rows should be inserted simultaneously in both.
Which is the best way to solve this problem?
I have 3 possible solutions in mind, but I want to know if there are more than these, or which is the best and why.
1. Make the foreign key field nullable; insert a row into one table, insert the other row, and afterwards update the first one.
2. The same as above, but with a special value like -1. First, insert into one table with foreign key = -1, which stands in for NULL without making the field nullable. Afterwards, insert the row into the other table and update the first one inserted.
3. Create a relation table between the two, though it is not really necessary because it is a 1 to 1 relationship.
Thanks!!
EDIT
To briefly explain why I need this circular relationship: it is a denormalization from the parent table to one of its children. It is done for performance, so that the parent table always holds a reference to its best-ranked child.
I'll make this an answer as I feel this is a design flaw.
First, if the two tables are in true 1:1 relationship, why don't you just have one table?
Second, if it's not a true 1:1 relationship but a supertype-subtype problem, you don't need these circular foreign keys either. Let's say table1 is Employee and table2 is Customer. Of course most customers are not employees (and vice versa). But sometimes a customer may be an employee too. This can be solved by having 3 tables:
Person
------
id
PRIMARY KEY: id
Employee
--------
personid
lastname
firstname
... other data
PRIMARY KEY: personid
FOREIGN KEY: personid
REFERENCES Person(id)
Customer
--------
personid
creditCardNumber
... other data
PRIMARY KEY: personid
FOREIGN KEY: personid
REFERENCES Person(id)
In the scenario you describe you have two tables Parent and Child having 1:N relationship. Then, you want to store somehow the best performing (based on a defined calculation) child for every parent.
Would this work?:
Parent
------
id
PRIMARY KEY: id
Child
-----
id
parentid
... other data
PRIMARY KEY: id
FOREIGN KEY: parentid
REFERENCES Parent(id)
UNIQUE KEY: (id, parentid) --- needed for the FK below
BestChild
---------
parentid
childid
... other data
PRIMARY KEY: parentid
FOREIGN KEY: (childid, parentid)
REFERENCES Child(id, parentid)
This way, you enforce the wanted referential integrity (every BestChild is a Child, every Parent has only one BestChild) and there is no circular path in the References. The reference to the best child is stored in the extra table and not in the Parent table.
You can find BestChild for every Parent by joining:
Parent
JOIN BestChild
ON Parent.id = BestChild.parentid
JOIN Child
ON BestChild.childid = Child.id
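To illustrate that the composite foreign key really does reject a "best child" that belongs to a different parent, here is a small sketch in Python with SQLite as a stand-in for MySQL. The lower-case table names, the sample ids and the helper names are hypothetical.

```python
import sqlite3


def make_db():
    """Parent / Child / BestChild as described above, with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
    conn.executescript(
        """
        CREATE TABLE parent (id INTEGER PRIMARY KEY);
        CREATE TABLE child (
            id       INTEGER PRIMARY KEY,
            parentid INTEGER NOT NULL REFERENCES parent(id),
            UNIQUE (id, parentid)             -- target of the composite FK
        );
        CREATE TABLE best_child (
            parentid INTEGER PRIMARY KEY,     -- at most one best child per parent
            childid  INTEGER NOT NULL,
            FOREIGN KEY (childid, parentid) REFERENCES child (id, parentid)
        );
        INSERT INTO parent (id) VALUES (1), (2);
        INSERT INTO child (id, parentid) VALUES (10, 1), (11, 1), (20, 2);
        """
    )
    return conn


def set_best_child(conn, parent_id, child_id):
    """Record child_id as the best child of parent_id.

    Raises sqlite3.IntegrityError if the child does not belong to that parent.
    """
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO best_child (parentid, childid) VALUES (?, ?)",
            (parent_id, child_id),
        )


def best_children(conn):
    """Return (parent id, best child id) pairs via the join shown above."""
    return conn.execute(
        "SELECT p.id, c.id FROM parent p "
        "JOIN best_child b ON p.id = b.parentid "
        "JOIN child c ON b.childid = c.id ORDER BY p.id"
    ).fetchall()
```

Note that this design enforces "at most one best child per parent"; it does not by itself force every parent to have one.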
Additionally, if you want to store best children for multiple performance tests (for different types of tests, or tests in various dates), you can add a test field, and alter the Primary Key to (test, parentid):
BestChild
---------
testid
parentid
childid
... other data
PRIMARY KEY: (testid, parentid)
FOREIGN KEY: (childid, parentid)
REFERENCES Child(id, parentid)
FOREIGN KEY: testid
REFERENCES Test(id)
I'd create a blackhole table to receive the inserts
CREATE TABLE bh_table12 (
table1col varchar(45) not null,
table2col varchar(45) not null
) ENGINE = BLACKHOLE
and put a trigger on that to take care of inserts
DELIMITER $$
CREATE TRIGGER ai_bh_table12_each AFTER INSERT ON bh_table12 FOR EACH ROW
BEGIN
DECLARE mytable1id integer;
DECLARE mytable2id integer;
SET foreign_key_checks = 0;
INSERT INTO table1 (table1col, table2_id) VALUES (new.table1col, 0);
SELECT last_insert_id() INTO mytable1id;
INSERT INTO table2 (table2col, table1_id) VALUES (new.table2col, mytable1id);
SELECT last_insert_id() INTO mytable2id;
UPDATE table1 SET table2_id = mytable2id WHERE table1.id = mytable1id;
SET foreign_key_checks = 1;
END $$
DELIMITER ;
Note that actions in a trigger are part of one transaction (when using InnoDB or likewise), so an error in the trigger will rollback partial changes.
Note on your table structure
Note that if it's a 1-to-1 relationship, you only need to put a table2_id in table1 and no table1_id in table2 (or vice versa).
If you need to query table1 based on table2 you can just use:
SELECT table1.* FROM table1
INNER JOIN table2 on (table2.id = table1.table2_id)
WHERE table2.table2col = 'test2'
Likewise for the other way round
SELECT table2.* FROM table2
INNER JOIN table1 on (table2.id = table1.table2_id)
WHERE table1.table1col = 'test1'
Links:
http://dev.mysql.com/doc/refman/5.1/en/blackhole-storage-engine.html
http://dev.mysql.com/doc/refman/5.1/en/triggers.html
I feel this is an important question, and I haven't found any 100% satisfying answer throughout the web. The 2 answers that you have given are the best ones I found, yet they are not 100% satisfactory.
Here's why :
The reason why Emilio cannot put his best child inside his parent table is pretty simple, I presume, because I share the same problem : not every child will be labelled as a parent's best child. So he would still need to store information on other children somewhere else. In that case, he would have some information about the best children in their parent's table, and other children in a separate database. This is a huge mess. For example, the day he wants to change the data structure about children, he needs to change it in both tables. Every time he writes a query on all children, he should query both tables, etc...
The reason why Emilio cannot just make the best-child foreign key nullable (I presume this for Emilio, but for me it is a strict requirement) is that he needs to be sure that a parent always has a best child. In Emilio's case it's maybe not very easy to imagine, but in mine, the equivalent of the parent cannot have no child.
Thus I would have tended to think that the solution with setting foreign_key_checks to zero would be best, but here is the problem :
after setting foreign_key_checks back to 1, there is no check on data's consistency. Thus, you have a risk of making mistakes in the meantime. You can consider that you won't, but still it is not a very clean solution.

Hierarchical Data Join of parent/child relationship in same table

I have the following table:
Id ParentId Weight
1 1 0
2 1 10
3 2 5
ParentId references Id of the same table. How can I query this table so that I join it on itself, adding up the cumulative weight of the third column?
For example, if I wanted to know the cumulative weight of Id 2, the result would return 15 (Id2 + Id3 = 15) as the parent of item 3 is 2. If I wanted to know the cumulative weight of item 3, it would return 5, as no records have a parent id of item 3.
Essentially, if the record I am querying has a child, I want to add the sequence of data's children and return one result.
Is this possible to do in one fell swoop to the database or would I have to loop through the entire record set to find matches?
Take a look at this article. If your table is not updated frequently, you can slightly modify their GenericTree procedure so that it generates all paths for all rows (and call it every time you insert a record into the table or update the ParentId column), store this data in a new table, and then perform all the required tasks using simple queries. Personally, I ended up with the following table structure:
CREATE TABLE `tree_for_my_table` (
`rootID` INT(11) NOT NULL, -- root node id
`parentID` INT(11) NOT NULL, -- current parent id
`childID` INT(11) NOT NULL, -- child id (direct child of the parent)
`level` INT(11) NOT NULL, -- how far child is from root
PRIMARY KEY (`rootID`, `parentID`, `childID`),
UNIQUE INDEX `childID` (`childID`, `level`)
);
Populating data for that table doesn't take too long even for a quite large my_table.
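As a simplified illustration of the "precompute all paths, then use simple queries" idea, the sketch below stores one (ancestor, descendant) pair per path in a helper table and answers the cumulative-weight question with a single join. It uses Python with SQLite as a stand-in for MySQL; the table tree_paths, its two columns and the function names are hypothetical and deliberately simpler than the structure shown above.

```python
import sqlite3


def build_paths(rows):
    """rows: [(id, parent_id, weight)] -> set of (ancestor, descendant) pairs.

    Every node counts as its own ancestor. A row that is its own parent
    (like Id 1 in the question) simply ends the walk up the tree.
    """
    parent_of = {node_id: parent_id for node_id, parent_id, _ in rows}
    pairs = set()
    for node in parent_of:
        current, seen = node, set()
        while current is not None and current not in seen:
            seen.add(current)
            pairs.add((current, node))
            current = parent_of.get(current)
    return pairs


def make_db(rows):
    """Load the source rows and the precomputed paths into SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE my_table (id INTEGER PRIMARY KEY, parent_id INTEGER, weight INTEGER);
        CREATE TABLE tree_paths (
            ancestor   INTEGER NOT NULL,
            descendant INTEGER NOT NULL,
            PRIMARY KEY (ancestor, descendant)
        );
        """
    )
    conn.executemany("INSERT INTO my_table VALUES (?, ?, ?)", rows)
    conn.executemany("INSERT INTO tree_paths VALUES (?, ?)", sorted(build_paths(rows)))
    conn.commit()
    return conn


def cumulative_weight(conn, node_id):
    """Sum the weight of node_id and everything below it."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(t.weight), 0) FROM tree_paths p "
        "JOIN my_table t ON t.id = p.descendant WHERE p.ancestor = ?",
        (node_id,),
    ).fetchone()
    return total
```

With the three rows from the question, cumulative_weight returns 15 for Id 2 and 5 for Id 3. The paths table has to be rebuilt or patched whenever a row is inserted or its parent changes, which is why this suits tables that are not updated frequently.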
Last I looked, mysql didn't have a built-in way of doing hierarchical queries, but you can always use a technique such as the adjacency list, discussed (among other techniques) in Managing Hierarchical Data in MySQL, which encodes the hierarchy in another table and lets you join against that to retrieve subtrees in your hierarchy.
You need to index your tree. See Managing Hierarchical Data in MySQL for some ways to do this.