Copying Relational Table Data - mysql

I hope I'm not being too long-winded in this question; I just want to make sure that what I'm asking is completely clear (I think it's rather confusing :).
I've got a database with a bunch of tables with all my foreign key constraints set up. Relations are sometimes several tables deep, and there are also cases where a child is related to more than one parent table. I want to insert a copy of one of my "top level" table rows, and all of the child table data that relates to it (keeping the relational integrity). That is to say, my new top level row gets its own new primary key (via auto_increment), all the new child rows get their own primary keys (again through auto_increment), and all the foreign key information of the tables relates analogously to the data I copied (only now with the newly created primary keys). So now I would have a copy of the relational data that is independently mutable from the original.
To make my example more concrete, I have painstakingly set up an analogous but simpler example. Let's define the following tables:
[Image: schema diagram of the four tables] http://www.freeimagehosting.net/uploads/ef22070a89.png
All the green id fields are auto_increment primary keys, and the yellowish ones are indexed columns with foreign key constraints. Let's say the database has the following data in it initially:
job_types
+----+----------+
| id | jobcode  |
+----+----------+
|  1 | DEADBEEF |
|  3 | FEEDFACE |
+----+----------+
managers
+----+---------------+-------------+
| id | name          | job_type_id |
+----+---------------+-------------+
|  1 | John          |           1 |
|  3 | Michael Scott |           3 |
+----+---------------+-------------+
departments
+----+------+------------+
| id | name | manager_id |
+----+------+------------+
|  1 | H32  |          1 |
|  2 | X11  |          3 |
+----+------+------------+
employees
+----+-------------+---------------+------------+-------------+
| id | name        | department_id | manager_id | job_type_id |
+----+-------------+---------------+------------+-------------+
|  1 | Billy Bob   |             1 |          1 |           1 |
|  2 | Sandra Lee  |             1 |          1 |           3 |
|  3 | Buddy Holly |             2 |          3 |           1 |
+----+-------------+---------------+------------+-------------+
Now say what I want to do is make a relational copy of department H32 (id=1).
What I should end up with is something like the following (obviously the actual primary key values are not important; referential integrity is).
job_types
+----+----------+
| id | jobcode  |
+----+----------+
|  1 | DEADBEEF |
|  3 | FEEDFACE |
|  4 | DEADBEEF |
|  5 | FEEDFACE |
+----+----------+
managers
+----+---------------+-------------+
| id | name          | job_type_id |
+----+---------------+-------------+
|  1 | John          |           1 |
|  3 | Michael Scott |           3 |
|  4 | John          |           4 |
+----+---------------+-------------+
departments
+----+------+------------+
| id | name | manager_id |
+----+------+------------+
|  1 | H32  |          1 |
|  2 | X11  |          3 |
|  3 | H32  |          4 |
+----+------+------------+
employees
+----+-------------+---------------+------------+-------------+
| id | name        | department_id | manager_id | job_type_id |
+----+-------------+---------------+------------+-------------+
|  1 | Billy Bob   |             1 |          1 |           1 |
|  2 | Sandra Lee  |             1 |          1 |           3 |
|  3 | Buddy Holly |             2 |          3 |           1 |
|  4 | Billy Bob   |             3 |          4 |           4 |
|  5 | Sandra Lee  |             3 |          4 |           5 |
+----+-------------+---------------+------------+-------------+
What's the most efficient way to implement this type of copy operation? For what it's worth, I'm working with MySQL, using the InnoDB table engine, in the context of Grails. I'm looking forward to hearing some good ideas on how you do something like this "the right way."
-- Regards, Vic
I've posted a MySQLDump of the example initialization on PasteBin.
EDIT
For what it's worth, I posted a much simpler / broader question here and I got generally positive responses, suggesting that I'm not "just doing it wrong"...

I have done a similar thing using INSERT INTO ... SELECT syntax. (The C API also has a MYSQL_OPTION_MULTI_STATEMENTS_ON option that lets you run multiple statements in one call, or you could use a stored procedure.)
This is the most efficient approach, since you do not have to move data between the client and the server. In my case, the value copied was a template used to create a new entity. I don't understand why you are doing this with this particular data model.
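As a rough illustration of the INSERT INTO ... SELECT idea, here is a minimal sketch of copying department H32 from the question. It makes several assumptions that are not in the original posts: SQLite (via Python's sqlite3 module) stands in for MySQL so the example is self-contained, the helper names copy_rows and copy_department are hypothetical, and rows are copied parent-first while an old-id to new-id map is kept so that foreign keys can be repointed. Only the ids travel to the client; the row data itself is copied inside the database.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch runs on its own.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job_types   (id INTEGER PRIMARY KEY AUTOINCREMENT, jobcode TEXT);
CREATE TABLE managers    (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT,
                          job_type_id INTEGER REFERENCES job_types(id));
CREATE TABLE departments (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT,
                          manager_id INTEGER REFERENCES managers(id));
CREATE TABLE employees   (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT,
                          department_id INTEGER REFERENCES departments(id),
                          manager_id INTEGER REFERENCES managers(id),
                          job_type_id INTEGER REFERENCES job_types(id));
INSERT INTO job_types   VALUES (1, 'DEADBEEF'), (3, 'FEEDFACE');
INSERT INTO managers    VALUES (1, 'John', 1), (3, 'Michael Scott', 3);
INSERT INTO departments VALUES (1, 'H32', 1), (2, 'X11', 3);
INSERT INTO employees   VALUES (1, 'Billy Bob', 1, 1, 1),
                               (2, 'Sandra Lee', 1, 1, 3),
                               (3, 'Buddy Holly', 2, 3, 1);
""")


def copy_rows(conn, table, columns, old_ids, remap=None):
    """Copy rows inside the database and return {old_id: new_id}.

    remap maps a foreign key column to the {old_id: new_id} dict of its parent.
    """
    cols = ", ".join(columns)
    id_map = {}
    for old_id in old_ids:
        cur = conn.execute(
            f"INSERT INTO {table} ({cols}) SELECT {cols} FROM {table} WHERE id = ?",
            (old_id,),
        )
        new_id = id_map[old_id] = cur.lastrowid
        # Repoint each foreign key of the new row at the copied parent row.
        for fk, parent_map in (remap or {}).items():
            (old_fk,) = conn.execute(
                f"SELECT {fk} FROM {table} WHERE id = ?", (new_id,)
            ).fetchone()
            conn.execute(
                f"UPDATE {table} SET {fk} = ? WHERE id = ?",
                (parent_map[old_fk], new_id),
            )
    return id_map


def copy_department(conn, dept_id):
    """Copy one department plus the rows it depends on; return the new id."""
    with conn:  # single transaction: commit on success, roll back on error
        emps = conn.execute(
            "SELECT id, manager_id, job_type_id FROM employees"
            " WHERE department_id = ? ORDER BY id", (dept_id,)).fetchall()
        (dept_mgr,) = conn.execute(
            "SELECT manager_id FROM departments WHERE id = ?", (dept_id,)).fetchone()
        mgr_ids = sorted({dept_mgr} | {m for _, m, _ in emps})
        marks = ", ".join("?" * len(mgr_ids))
        mgr_jobs = conn.execute(
            f"SELECT job_type_id FROM managers WHERE id IN ({marks})", mgr_ids)
        job_ids = sorted({j for (j,) in mgr_jobs} | {j for _, _, j in emps})

        # Parents first, so every child can be remapped to a row that exists.
        job_map = copy_rows(conn, "job_types", ["jobcode"], job_ids)
        mgr_map = copy_rows(conn, "managers", ["name", "job_type_id"], mgr_ids,
                            {"job_type_id": job_map})
        dept_map = copy_rows(conn, "departments", ["name", "manager_id"], [dept_id],
                             {"manager_id": mgr_map})
        copy_rows(conn, "employees",
                  ["name", "department_id", "manager_id", "job_type_id"],
                  [e for e, _, _ in emps],
                  {"department_id": dept_map, "manager_id": mgr_map,
                   "job_type_id": job_map})
    return dept_map[dept_id]


new_dept_id = copy_department(conn, 1)  # 3 with the sample data above
```

With the question's sample data this should reproduce the "after" tables shown there. In MySQL the same steps could be moved into a stored procedure using LAST_INSERT_ID(), which would avoid the per-row round trips.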

Related

How to design a relational database with dynamic columns?

Let's say that I have a courses table with the following structure.
courses (Primary keys: (id) )
+-------+--------------------+
| id    | name               |
+-------+--------------------+
| 1     | Chinese - English  |
| 2     | Japanese - English |
| 3     | Chinese - Spanish  |
+-------+--------------------+
Courses will have columns like this.
course_columns (Primary keys: (id), Indexes: (course_id) )
+-------+-----------+----------+
| id    | course_id | title    |
+-------+-----------+----------+
| 1     | 1         | Chinese  |
| 2     | 1         | English  |
| 3     | 1         | Pinyin   |
| 4     | 2         | Japanese |
| 5     | 2         | English  |
| 6     | 2         | Kana     |
| 7     | 2         | Romaji   |
| 8     | 3         | Chinese  |
| 9     | 3         | Spanish  |
+-------+-----------+----------+
Lastly, courses will have words like this
course_word_data (Primary Keys: (id, column_id) )
+------+-----------+-----------+
| id   | column_id | content   |
+------+-----------+-----------+
| 1    | 1         | 你好      |
| 1    | 2         | hello     |
| 1    | 3         | nĭ hăo    |
| 2    | 1         | 谢谢      |
| 2    | 2         | goodbye   |
| 2    | 3         | xièxie    |
| 3    | 4         | 好む      |
| 3    | 5         | to prefer |
| 3    | 6         | このむ    |
| 3    | 7         | konomu    |
| 4    | 8         | 你好      |
| 4    | 9         | hola      |
+------+-----------+-----------+
Now let me say that I am pretty much a newbie when it comes to databases, and for the last few days I've been thinking about how to design this better. The use-case scenario is as follows:
User creates a course
User adds arbitrary number of columns with titles to the course
User can modify or delete existing columns
User can add, modify or delete words to the course
Also, if a word in a course is edited by the course owner, it should be course-specific and should not affect other courses.
Here are my questions:
Is this a good database design? My problem is that if there are ten columns in a course, then that means there are ten rows in course_word_data table for a single word. Basically for m columns and n words, there are m*n rows for a single course. Let's say that there are 5000 courses and the average number of columns per course is 6. If every course has an average of 1000 words, this means there are 5000*6*1000 = 30 million rows. Is it okay to do the design this way, or should I reconsider my design? What would be a better way to do it?
Is it possible to reduce the number of repeated words? As you can see, the word 「你好」 exists in the table twice. Is there an efficient way to reduce that duplication?
An example of a course (think of this as a grid element on a webpage)
+------------+--------------+--------------+
| English    | Pinyin       | Chinese      |
+------------+--------------+--------------+
| hello      | nĭ hăo       | 你好         |
| goodbye    | xièxie       | 谢谢         |
+------------+--------------+--------------+
(Columns can be added or removed by the user.
For example, a part-of-speech column which indicates
whether a word is a noun, verb, adjective, etc.)
I apologize for the length of the question in advance.
Invoke the Separation of Concerns principle here: an HTML table (a view) is not the same thing as a database table (persistence).
"User adds columns" in a view context would be equivalent to "user SELECTs columns" in a query context. Should a user create languages? Maybe not; the languages are there already.
Side note: in translation problems, it is useful to have a word's context.

What is the best way to reduce MXN table search in MySQL

I have two MySQL tables called tasks and users. All I want is to avoid displaying, in a user's panel, the tasks that user has already done. Suppose the task table has about 1000 entries and there are about 50000 users. Both the users and the tasks keep increasing.
One solution I can think of is creating a separate table of task x user size.
For example:
user table
+---------+--------+-------+
| user_id | fname  | lname |
+---------+--------+-------+
| 1       | John   | Smith |
| 2       | Steve  | Mark  |
+---------+--------+-------+
task table
+---------+-------------+---------------+
| task_id | task        | task_duration |
+---------+-------------+---------------+
| 1       | Do task 1   | 1 hour        |
| 2       | Do task 2   | 1 hour        |
+---------+-------------+---------------+
Creating a separate table called display
+------------+---------+---------+
| display_id | task_id | user_id |
+------------+---------+---------+
| 1          | 1       | 1       |
| 2          | 1       | 2       |
| 3          | 2       | 1       |
| 4          | 2       | 2       |
+------------+---------+---------+
So only listed tasks will be shown to the particular user.
The problem is that this does not look like an efficient solution. How can I design the tables for this scenario in an efficient way? If that is not possible, what are the other ways?
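To make the display-table idea concrete, here is a minimal sketch of how it could be queried. Several things here are assumptions rather than part of the question: SQLite (through Python's sqlite3 module) stands in for MySQL, the UNIQUE (user_id, task_id) constraint is added so that per-user lookups can use an index, and the helper names visible_tasks and mark_done are hypothetical. It also assumes that finishing a task simply deletes the matching display row.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
CREATE TABLE tasks (task_id INTEGER PRIMARY KEY, task TEXT, task_duration TEXT);
CREATE TABLE display (
    display_id INTEGER PRIMARY KEY,
    task_id    INTEGER NOT NULL REFERENCES tasks(task_id),
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    UNIQUE (user_id, task_id)  -- assumed: gives an index for per-user lookups
);
INSERT INTO users VALUES (1, 'John', 'Smith'), (2, 'Steve', 'Mark');
INSERT INTO tasks VALUES (1, 'Do task 1', '1 hour'), (2, 'Do task 2', '1 hour');
INSERT INTO display VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1), (4, 2, 2);
""")


def visible_tasks(conn, user_id):
    """Return (task_id, task) for every task still listed for this user."""
    return conn.execute(
        "SELECT t.task_id, t.task"
        " FROM display d JOIN tasks t ON t.task_id = d.task_id"
        " WHERE d.user_id = ? ORDER BY t.task_id",
        (user_id,),
    ).fetchall()


def mark_done(conn, user_id, task_id):
    """Assumed behaviour: a finished task is removed from the user's list."""
    with conn:
        conn.execute(
            "DELETE FROM display WHERE user_id = ? AND task_id = ?",
            (user_id, task_id),
        )


mark_done(conn, 1, 1)                # John finishes task 1
john_tasks = visible_tasks(conn, 1)  # only task 2 remains for John
steve_tasks = visible_tasks(conn, 2) # Steve still sees both tasks
```

Each lookup touches only the rows for one user, so the full task x user size matters mainly for storage, not for the per-user query.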

Should I have two separate tables if they differ by only one column?

I am developing a movie database with both credits for actors and filmmakers. Currently, the actor and filmmaker credits are all together in one table because they only differ by one column (the "role_id" foreign key never gets used for filmmaker credits -- it is always NULL). Is it still best practice to keep them in one table ("movie_credits") or would it be better to separate ("actor_credits" and "filmmaker_credits")?
Here is a very basic schema I've defined, and I'm including some example data in it:
movie_credits
+----+-----------+---------+---------------+----------+
| id | talent_id | role_id | identity_id   | title_id |
+----+-----------+---------+---------------+----------+
|  1 |         1 |       1 |             1 |        1 |
|  2 |         2 |       2 |             1 |        2 |
|  3 |         1 |    NULL |             2 |        4 |
+----+-----------+---------+---------------+----------+
talent
+----+------------+-----------+
| id | first_name | last_name |
+----+------------+-----------+
|  1 | Clint      | Eastwood  |
|  2 | Sylvester  | Stallone  |
+----+------------+-----------+
roles
+----+---------------+
| id | name          |
+----+---------------+
|  1 | Walt Kowalski |
|  2 | Rocky Balboa  |
+----+---------------+
identities
+----+--------------+
| id | identity     |
+----+--------------+
|  1 | actor        |
|  2 | director     |
|  3 | producer     |
|  4 | screenwriter |
+----+--------------+
titles
+----+-----------------+
| id | name            |
+----+-----------------+
|  1 | Gran Torino     |
|  2 | Rocky           |
|  3 | Creed           |
|  4 | American Sniper |
+----+-----------------+
If you see a completely different way of structuring the data for this scenario, please let me know. I am open to any solution that may be even more efficient and scalable.
Basically, both actors and filmmakers are persons,
so they should be stored in a single table.
We should avoid NULLs and redundancy in the database. But in this case, creating an extra table adds more overhead than removing the NULLs is worth.
Also, you can simply assign filmmaker as the role to avoid NULLs.
That is, there would be a role named filmmaker.

Assistance with database design

I've got an Excel sheet that contains all the employees who have worked for my company, including those still working for us. It's a sheet of around 200 rows. Each row has basic info, like surname, name, position, qualification, etc.: 16 columns of basic info. Now, the tricky part is this. After the 16 columns, there are months (May-05 up to the present (Apr-12)). Under every month column, an employee gets either a 0 (contract), 1 (permanent), 2 (contract-terminated) or 3 (student).
What would be the best way to do this? I was thinking of 4 tables (listed below), where the one table determines permanently terminated people (for the sake of knowing who was on what type of employment).
MySQL Table: hr_employees
|-----------------|-------|----|----|----|
| employee_number | name  | sur| etc| etc|
|-----------------|-------|----|----|----|
|               1 | Dave  | F  | xx | xx |
|-----------------|-------|----|----|----|
MySQL Table: hr_month
|----|--------|
| id | month  |
|----|--------|
|  1 | May-05 |
|  2 | Jun-05 |
|----|--------|
MySQL Table: hr_status
|----|------|------|--------|
| id | e_no | date | status |
|----|------|------|--------|
|  1 |    1 |    1 |      1 |
|  2 |    1 |    2 |      1 |
|----|------|------|--------|
MySQL Table: hr_terminated
|----|------|
| id | e_no |
|----|------|
|  1 |    1 |
|  2 |    1 |
|----|------|
I hope you guys understand what I want to achieve, otherwise, ask a question, and I'll answer as best I can! :)
Thanks.
Here is a design that simplifies your data entry and is more relational database like and less Excel like, insofar as it's normalized.
MySQL Table: hr_employee
|-----------------|-------|----|----|----|
| employee_number | name  | sur| etc| etc|
|-----------------|-------|----|----|----|
|               1 | Dave  | F  | xx | xx |
|-----------------|-------|----|----|----|
|               2 | Bob   | M  | xx | xx |
|-----------------|-------|----|----|----|
MySQL Table: hr_employee_status
|-----------------|------------|------------|--------|
| employee_number | from_date  | to_date    | status |
|-----------------|------------|------------|--------|
|               1 | 2005-05-01 | 2005-08-31 |      3 |
|-----------------|------------|------------|--------|
|               1 | 2006-05-01 | 2010-02-28 |      0 |
|-----------------|------------|------------|--------|
|               2 | 2010-03-01 | 9999-12-31 |      1 |
|-----------------|------------|------------|--------|
Here you can see that Dave was hired as a student from May '05 to August '05, then he came back in May '06 as a contract employee, a role he held until the end of February '10. Then on March 1, 2010 Bob was hired as a permanent employee and he is still working (the max collating date means "until further notice").
The great advantage of this design is that you only have to enter/edit data when something changes, not once a month for every employee that you have or have ever had. You can also see what your workforce looked like at any given date (not just by months!) with a very simple SQL query.
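The "very simple SQL query" could look something like the following sketch. It is an illustration under assumptions, not the answerer's own code: SQLite (through Python's sqlite3 module) stands in for MySQL, dates are stored as ISO-8601 text so that plain comparison orders them correctly, the composite primary key is a guess, and the function name workforce_on is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; dates are ISO-8601 strings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hr_employee (employee_number INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE hr_employee_status (
    employee_number INTEGER NOT NULL REFERENCES hr_employee(employee_number),
    from_date       TEXT NOT NULL,
    to_date         TEXT NOT NULL,  -- 9999-12-31 means "until further notice"
    status          INTEGER NOT NULL,
    PRIMARY KEY (employee_number, from_date)  -- assumed key
);
INSERT INTO hr_employee VALUES (1, 'Dave'), (2, 'Bob');
INSERT INTO hr_employee_status VALUES
    (1, '2005-05-01', '2005-08-31', 3),
    (1, '2006-05-01', '2010-02-28', 0),
    (2, '2010-03-01', '9999-12-31', 1);
""")


def workforce_on(conn, day):
    """Return (name, status) for everyone employed on the given ISO date."""
    return conn.execute(
        "SELECT e.name, s.status"
        " FROM hr_employee_status s"
        " JOIN hr_employee e ON e.employee_number = s.employee_number"
        " WHERE ? BETWEEN s.from_date AND s.to_date"
        " ORDER BY e.employee_number",
        (day,),
    ).fetchall()


summer_2005 = workforce_on(conn, "2005-06-15")  # Dave as a student
autumn_2005 = workforce_on(conn, "2005-10-01")  # nobody employed
april_2012 = workforce_on(conn, "2012-04-01")   # Bob as permanent
```

Because each row covers a date range, the same query works for any day, not only for month boundaries.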

MySQL - Optimizing column referencing?

I'm having difficulty trying to form the actual question I have, so hopefully I can get the point across with visual aid.
What I'm wondering is whether or not my current database design is horribly flawed, slightly inefficient, or ... correct.
Q: Is there a better way for a row cell to reference a column in another table, aside from just saving a string of that column's name?
Example:
There are two tables.
table01: has 3 columns - c_id, customer, tier (tier holds the column name of a tier in table02 )
table02: has 5 columns - i_id, item, tier1, tier2, tier3
table01
+------+----------+------+
|'c_id'|'customer'|'tier'|
+------+----------+------+
|  1   | John     | tier1|
|  2   | Lisa     | tier2|
|  3   | Mike     | tier1|
|  4   | Tom      | tier3|
+------+----------+------+
table02
+------+------+-------+-------+-------+
|'i_id'|'item'|'tier1'|'tier2'|'tier3'|
+------+------+-------+-------+-------+
|  1   | apple| $1.99 | $2.99 | $3.99 |
|  2   | chalk| $2.99 | $3.99 | $4.99 |
|  3   | pens | $3.99 | $4.99 | $5.99 |
|  4   | shirt| $4.99 | $5.99 | $6.99 |
+------+------+-------+-------+-------+
Result:
John
+------+-------+
| apple| $1.99 |
| chalk| $2.99 |
| pens | $3.99 |
| shirt| $4.99 |
+------+-------+
Selecting John's row would yield tier1, which is then used to query for table02.tier1 and grab that column's contents (in this case, discounted prices for tier1 customers). The tier1 items would then be displayed with their correct prices.
Is there a more efficient way to reference an outside table's column name, like... in the style of foreign keys? Where if the column name changes, then that information waterfalls to other linked data cells? Is my approach correct, or should I restructure how I'm doing things?
It isn't completely clear how the c_id and i_id columns are supposed to be used, but a more conventional design would look like this:
table01
+------+----------+------+
|'c_id'|'customer'|'tier'|
+------+----------+------+
|  1   | John     |  1   |
|  2   | Lisa     |  2   |
|  3   | Mike     |  1   |
|  4   | Tom      |  3   |
+------+----------+------+
table02
+------+------+-------+
|'i_id'|'tier'|'cost' |
+------+------+-------+
|  1   |  1   | $1.99 |
|  2   |  1   | $2.99 |
|  3   |  1   | $3.99 |
|  4   |  1   | $4.99 |
|  1   |  2   | $2.99 |
|  2   |  2   | $3.99 |
|  3   |  2   | $4.99 |
|  4   |  2   | $5.99 |
|  1   |  3   | $3.99 |
|  2   |  3   | $4.99 |
|  3   |  3   | $5.99 |
|  4   |  3   | $6.99 |
+------+------+-------+
I edited the example to better exemplify my thoughts. Which really doesn't change your answer much. So it'd be better to consolidate the tier columns into one column and just repeat the item id (i_id) data several times? My reasoning behind breaking the tier/cost structure into columns was to cut down on redundancy. Would that approach require me to make a new table to index the tiers for foreign keys?
I understand what you're doing now. Somewhere along the line, you might have a 'Sales' table which records a c_id, and i_id, a quantity, a date and other such information (if each sale consists of one item; otherwise, you'd have an 'Orders' table and an 'OrderItems', and the c_id would be associated with the orders table and the i_id would be associated with order items, and there'd be an order number to connect the two tables).
If you add your item description to the revised table02 above, you would be violating 3NF. You should have a table such as 'Products' with the i_id and item columns. This would give names to item numbers. The table02 (maybe 'TieredItemCosts') table I proposed would remain as it is; the primary key is the combination of (i_id, tier).
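A minimal sketch of the normalized layout described above, and of the join that produces John's price list, might look like this. The assumptions: SQLite (through Python's sqlite3 module) stands in for MySQL, the table names customers, products and tiered_item_costs are illustrative stand-ins for table01, 'Products' and 'TieredItemCosts', costs are stored as plain numbers without the dollar sign, and the function name prices_for is hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; table names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (c_id INTEGER PRIMARY KEY, customer TEXT, tier INTEGER);
CREATE TABLE products  (i_id INTEGER PRIMARY KEY, item TEXT);
CREATE TABLE tiered_item_costs (
    i_id INTEGER NOT NULL REFERENCES products(i_id),
    tier INTEGER NOT NULL,
    cost REAL NOT NULL,
    PRIMARY KEY (i_id, tier)  -- composite key, as described in the answer
);
INSERT INTO customers VALUES (1, 'John', 1), (2, 'Lisa', 2),
                             (3, 'Mike', 1), (4, 'Tom', 3);
INSERT INTO products VALUES (1, 'apple'), (2, 'chalk'), (3, 'pens'), (4, 'shirt');
INSERT INTO tiered_item_costs VALUES
    (1, 1, 1.99), (2, 1, 2.99), (3, 1, 3.99), (4, 1, 4.99),
    (1, 2, 2.99), (2, 2, 3.99), (3, 2, 4.99), (4, 2, 5.99),
    (1, 3, 3.99), (2, 3, 4.99), (3, 3, 5.99), (4, 3, 6.99);
""")


def prices_for(conn, customer):
    """Return (item, cost) rows at the tier assigned to the named customer."""
    return conn.execute(
        "SELECT p.item, c.cost"
        " FROM customers cu"
        " JOIN tiered_item_costs c ON c.tier = cu.tier"
        " JOIN products p ON p.i_id = c.i_id"
        " WHERE cu.customer = ?"
        " ORDER BY p.i_id",
        (customer,),
    ).fetchall()


john_prices = prices_for(conn, "John")
```

The tier is now ordinary data that a join can match on, so no query has to be assembled from a stored column name, and adding a fourth tier means inserting rows rather than altering the table.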