I am moving an old Mantis table that had a varchar(64) category_id column to a new Mantis table that has an int(10) category_id column.
The simplified structure is as follows:
bug_table (Old DB)
+----+------------+-------------+--------+
| id | project_id | category_id | report |
+----+------------+-------------+--------+
| 1  | 0          | Server      | crash  |
| 2  | 0          | Database    | error  |
| 3  | 1          | Server      | bug    |
| 4  | 1          | Server      | crash  |
+----+------------+-------------+--------+
category_table (New DB)
+----+------------+----------+
| id | project_id | name     |
+----+------------+----------+
| 0  | 1          | Server   |
| 1  | 1          | Database |
| 2  | 2          | Server   |
| 3  | 2          | Database |
+----+------------+----------+
I need a magical query that will replace category_id in the bug_table with the numerical category_id in the category_table. Thankfully I am able to match rows by project_id and categories by name.
Here is the query I am working on, but I have gotten stuck in the complexity:
UPDATE bug_table b SET b.category_id = c.id USING category_table WHERE b.category_id = c.name
I like to approach such a task a little differently than you do for a new lookup/reference table.
To me, the new category table would only have id and name columns. There are only two rows based on the sample data: Server and Database. Yes, I realize there could be other names, but those can easily be added, and should be added, before proceeding to maximize the id matching that follows.
Next I would add a new column to the bug table, which could be called 'category_new', with a data type that can store the new category id. Alternatively, you could rename the existing category_id column to category, and the new column for the ids could then be category_id.
After all that is done then you can update the new column by joining the category on names and set the id that matches: (note this assumes the non-alternative approach mentioned in step 2)
UPDATE bug_table JOIN category_table ON bug_table.category_id = category_table.name
SET bug_table.category_new = category_table.id
After that runs, check the new column to verify the updated id's.
Finally, after a successful update, the old category_id column (with the names) can be dropped from bug_table, and the category_new column can be renamed to category_id.
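The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: an in-memory SQLite database stands in for MySQL so the example runs anywhere, the lookup ids (1 and 2) are made up, and the UPDATE uses a correlated subquery as a portable equivalent of the UPDATE ... JOIN shown earlier.

```python
import sqlite3

# In-memory SQLite stands in for MySQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Step 1: simplified lookup table, one row per name (ids are hypothetical).
    CREATE TABLE category_table (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    INSERT INTO category_table (id, name) VALUES (1, 'Server'), (2, 'Database');

    CREATE TABLE bug_table (id INTEGER PRIMARY KEY, project_id INTEGER,
                            category_id TEXT, report TEXT);
    INSERT INTO bug_table VALUES
        (1, 0, 'Server', 'crash'), (2, 0, 'Database', 'error'),
        (3, 1, 'Server', 'bug'),   (4, 1, 'Server', 'crash');

    -- Step 2: add the column that will hold the numeric id.
    ALTER TABLE bug_table ADD COLUMN category_new INTEGER;

    -- Step 3: fill it by matching on the category name.
    UPDATE bug_table
    SET category_new = (SELECT c.id FROM category_table c
                        WHERE c.name = bug_table.category_id);
""")

# Step 4: verify; a NULL in category_new means the name had no match.
rows = conn.execute(
    "SELECT id, category_id, category_new FROM bug_table ORDER BY id").fetchall()
unmatched = [r for r in rows if r[2] is None]
print(rows)
print("unmatched:", unmatched)
```

On MySQL itself, the last step would then presumably be something like ALTER TABLE bug_table DROP COLUMN category_id followed by ALTER TABLE bug_table CHANGE category_new category_id INT(10), run only once the unmatched list is empty.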
=====
Note that if you decide to go with the alternative column approach mentioned, the query will be similar but differ slightly, and only a column drop is needed at the end.
If there are other tables to apply the same category changes, the operation (basically steps 2 through 5) would be similar for those tables too.
When a row is duplicated in our system, it inserts the new row with a reference back to the ID of the one it was duplicated from.
If that new row is then duplicated, it has a reference back to the row it was duplicated from and so on.
What I can't work out is how to follow this trail in a SELECT.
Take the following data...
+----+-----------------+
| ID | duplicated_from |
+----+-----------------+
| 1  | NULL            |
| 2  | 1               |
| 3  | 2               |
| 4  | NULL            |
+----+-----------------+
So, given ID 1, how would you look up all the slides in the chain that have been duplicated off it?
Or is this something that will have to be done at an application level?
It seems that you are after a recursive query. I found this solution that may help you: How to create a MySQL hierarchical recursive query
HTH
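As a rough illustration of what such a recursive query can look like, here is a minimal sketch using a recursive common table expression. The table name slides is an assumption (the question does not name the table), and SQLite is used so the example is runnable; the same WITH RECURSIVE syntax is available in MySQL 8.0 and later, while older MySQL versions need one of the workarounds from the linked answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'slides' is a hypothetical table name holding the sample data.
    CREATE TABLE slides (id INTEGER PRIMARY KEY, duplicated_from INTEGER);
    INSERT INTO slides VALUES (1, NULL), (2, 1), (3, 2), (4, NULL);
""")

chain_sql = """
    WITH RECURSIVE chain (id) AS (
        -- Anchor: rows duplicated directly from the starting ID.
        SELECT id FROM slides WHERE duplicated_from = ?
        UNION ALL
        -- Recursive step: rows duplicated from anything already in the chain.
        SELECT s.id FROM slides s JOIN chain c ON s.duplicated_from = c.id
    )
    SELECT id FROM chain ORDER BY id
"""
descendants = [row[0] for row in conn.execute(chain_sql, (1,))]
print(descendants)
```

For the sample data and a starting ID of 1, this returns IDs 2 and 3; ID 4 is not part of the chain.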
I am creating a database in which I have an artefact that can be associated with either a project, production or performance. I will call the relationship 'comes_from'. This relationship can be a project or a more specific version of a project such as a production or performance.
I don't want to have separate foreign keys on my artefact for each possible value of the 'comes_from' relationship as it feels wrong to have multiple attributes for the same relationship. The only way I can think of doing this is having a separate table that stores the comes_from relationship containing the id of the referenced project or more specific version along with the table the item is located in.
artefact table
+-------------+------------+
| artefact_id | comes_from |  -- Foreign key to comes_from
+-------------+------------+
| 1           | 7          |
| 2           | 8          |
+-------------+------------+
comes_from table
+---------------+-----------------+-----------------------------+
| comes_from_id | comes_from (FK) | comes_from_table (FK table) |
+---------------+-----------------+-----------------------------+
| 7             | 19              | project                     |
| 8             | 13              | performance                 |
| 9             | 21              | production                  |
+---------------+-----------------+-----------------------------+
project table
+------------+
| project_id |
+------------+
| 19         |
| 20         |
+------------+
performance table
+----------------+
| performance_id |
+----------------+
| 13             |
| 14             |
+----------------+
production table
+---------------+
| production_id |
+---------------+
| 21            |
| 22            |
+---------------+
Is there a better way to do this? I am not sure I can even resolve this relationship in a SQL query, and it may cause issues when I use Doctrine as an ORM on top of this database.
Your solution is good. The "comes_from_table" column could be a simple indexed VARCHAR or INT field acting as a discriminator. However, I would remove the "comes_from" column from the "artefact" table and the "comes_from_id" column from the relationship table, and use the "artefact_id" column directly to reference artefacts in the relationship table.
Regarding Doctrine, there shouldn't be any problem. I did something similar in the past using Symfony2 and Doctrine2 for an entity called Tags, where a Tag could belong either to a contact or to a contact's spouse. I also created a function in the repository file to which I could pass the "tag_type" as a parameter, so that I could get either the contact's tags or the contact spouse's tags.
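To show that the relationship can be resolved in plain SQL, here is a minimal sketch of the discriminator layout described above, with the relationship table keyed directly by artefact_id. The column types, the CHECK constraint and the COALESCE-based query are illustrative assumptions, and SQLite is used only so the example runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project     (project_id     INTEGER PRIMARY KEY);
    CREATE TABLE production  (production_id  INTEGER PRIMARY KEY);
    CREATE TABLE performance (performance_id INTEGER PRIMARY KEY);
    CREATE TABLE artefact    (artefact_id    INTEGER PRIMARY KEY);

    -- Relationship table keyed by artefact_id; comes_from_table is the discriminator.
    CREATE TABLE comes_from (
        artefact_id      INTEGER PRIMARY KEY REFERENCES artefact (artefact_id),
        comes_from       INTEGER NOT NULL,
        comes_from_table TEXT NOT NULL
            CHECK (comes_from_table IN ('project', 'production', 'performance'))
    );

    INSERT INTO project VALUES (19), (20);
    INSERT INTO production VALUES (21), (22);
    INSERT INTO performance VALUES (13), (14);
    INSERT INTO artefact VALUES (1), (2);
    INSERT INTO comes_from VALUES (1, 19, 'project'), (2, 13, 'performance');
""")

# Each LEFT JOIN can only match when the discriminator names that table.
resolve_sql = """
    SELECT cf.artefact_id, cf.comes_from_table,
           COALESCE(pj.project_id, pd.production_id, pf.performance_id) AS source_id
    FROM comes_from cf
    LEFT JOIN project pj
           ON cf.comes_from_table = 'project' AND pj.project_id = cf.comes_from
    LEFT JOIN production pd
           ON cf.comes_from_table = 'production' AND pd.production_id = cf.comes_from
    LEFT JOIN performance pf
           ON cf.comes_from_table = 'performance' AND pf.performance_id = cf.comes_from
    ORDER BY cf.artefact_id
"""
resolved = conn.execute(resolve_sql).fetchall()
print(resolved)
```

One trade-off to keep in mind: because comes_from can point at three different tables, the database cannot enforce it with an ordinary foreign key, so a NULL source_id in this query would indicate a dangling reference.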
My question is in regards to normalizing data.
INFO
I'm trying to tabulate test results in a database. The information I'd like to record is test_instance, user_id, test_id, completed(date/time), duration (of test), score, incorrect questions and reviewed questions.
For the most part, I think I'd organise the info according to TABLE 1, but I've come a little unstuck trying to work out the best way to record incorrect or reviewed questions. Please note that I DON'T want to put all the incorrect questions together in one entry as per TABLE 2.
I'd like to make a separate entry for each incorrectly marked question (or reviewed question).
NOTE: Reviewed questions are ones that at one time or another were marked incorrectly and hence need to be tested again.
TABLE 1
-------------------------------------------------------------
| instance | user_id | test_id |completed |duration|score|
-------------------------------------------------------------
| 1 | 23 | 33 | 2JAN2012 | 20m | 75 |
| 2 | 11 | 12 | 10DEC2011| 35m | 100 |
| 3 | 1 | 3 | 3JUL2008 | 1m | 0 |
| 4 | 165 | 213 | 4SEP2010 | 10m | 50 |
-------------------------------------------------------------
TABLE 2
------------------------
| instance ||wrong Q|
------------------------
| 1 || 3,5,7 |
------------------------
Ultimately, I'd like to know how many times a user has gotten a particular question wrong over time. Also, I need to keep track of which test the wrong questions came from. This is the same for the reviewed questions.
Incidentally it's possible for questions to be reviewed AND wrong in the same instance.
I've come up with 2 different ways to represent the data, but I don't like either of them.
-------------------------------------------------
| instance | Q number | Wrong | Reviewed |
-------------------------------------------------
OR
---------------------------------------------------
| user_id | test_id | Q number | Wrong | Reviewed |
---------------------------------------------------
Note: Wrong/Reviewed category is counting how many times the Q number falls into that category.
MY QUESTIONS SUMMARISED
How can I efficiently represent wrong/reviewed questions in a table? Is TABLE 1 set up efficiently?
EDIT : Questions that have been answered incorrectly can be used to generate new tests. Only incorrect questions will be used for the tests. If a generated test is taken, the questions tested will be marked as reviewed. The score will not be updated as it will be a new test and a new test_id will be generated.
NOTE-It is possible to retake old tests, but the score will not be updated. A new instance will be created for each test that is taken.
In regards to the generated tests, I guess this means I will need to include one more table to keep track of which quiz the questions originally came from. Sorry- I hadn't thought it all the way through to the end.
THANKS
It was difficult for me to choose an answer as everyone gave me really useful information. My final design will take into consideration everything you have said. Thanks again.
Revisiting my answer after your updates, I came up with this kind of layout, which I think would work quite nicely.
As a prerequisite, I'm assuming you have your tests and questions somewhere. For consistency, I'm including them (with only relevant columns) in my layout.
USERS
- user id
TESTS
- test id
QUESTIONS
- question id
- test id
Then for the interesting part. Considering how you say:
Questions that have been answered incorrectly can be used to generate
new tests. Only incorrect questions will be used for the tests
You don't mention how many times a test can be retaken; I assume indefinitely, or at least more than once.
TEST INSTANCE
- instance id [PK]
- revision id [PK]
- user id
- completed
- duration
COMMENT: you may want to consider replacing completed and duration with
a start and end timestamp. They will serve the same purpose without
the need for any calculations at insert/update.
TEST INSTANCE SCORE
- instance id [FK, TEST INSTANCE (instance id)]
- score
FAILED QUESTIONS
- question id [FK, QUESTIONS (question id)]
- instance id [FK, TEST INSTANCE (instance id)]
- reviewed [FK, TEST INSTANCE (revision id)]
Then to my comments.
As I see it, a new actual test for the failed questions wouldn't make sense, so instead I added a revision id to the TEST INSTANCE table. Each time a test is retaken, a new record for the same instance id with a revision id (e.g. a running number sequence) is created.
Any failed questions would be stored in FAILED QUESTIONS along with the instance id and initially a NULL value for reviewed. When a failed question is considered reviewed, its reviewed column would be updated with the revision id of the latest test instance for instance id.
With this approach, you will have a complete history of how many times a failed question has been attempted before it was successfully reviewed.
Furthermore, I chose in my revised answer to move the score to its own table, because you said scores won't be updated despite reviewing the failed questions and my proposed model would have introduced data duplication. You'll notice I left out the revision id from that table, because for a test instance (and any number of revisions) there is only one score.
Talking about normalization, and just to make sure that you can retrieve all kinds of calculated data out of your database, I'd propose a more complex model, which will end up being easier to manage...
You'll need the following tables:
test_table
PK: id_test
testDescription
question_table
PK: id_question
FK: id_test
questionDescription
instance_table *please note that duration and scores will be calculated later on
PK: id_instance
FK: id_user
FK: id_test
startingTime
endingTime
question_instance_table
PK: id_question_instance
FK: id_instance
FK: id_question
questionResult (Boolean)
(please note here that the PK could be id_instance + id_question ...)
Back to your needs, we then have the following:
duration is calculated from startingTime and endingTime of instance_table
score is calculated as the sum of True values from questionResult field
you can track and compare answers on same question over time for one user
thus your reviewed questions can be defined as questions with at least one false value for a specific user
if your database supports null values for boolean fields, you'll have the possibility to follow unanswered questions (with questionResult = Null). Otherwise, I advise you to use or build a three states field (integer with Null allowed, plus 0 and 1 values for example) to follow unanswered questions (null), wrong answers (0), and correct answers (1).
Score, being 100 * (number of good answers)/(number of questions in the test), can easily be calculated via SQL aggregates.
You could even calculate partial scores as number of good answers/number of questions answered in the test.
This model accepts any number of tests, any number of questions per test, any number of instances, any number of users...
Of course, it can be further improved by adding missing properties to tables (testNumber, questionNumber fields for example)
etc...
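The calculations described above could look roughly like the sketch below. The sample rows, the ISO timestamp format and the three-state integer questionResult (1, 0, NULL) are assumptions made for illustration, and SQLite is used so the example runs as-is; on MySQL the duration would more likely be computed with TIMESTAMPDIFF(MINUTE, startingTime, endingTime).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question_table (id_question INTEGER PRIMARY KEY, id_test INTEGER);
    CREATE TABLE instance_table (id_instance INTEGER PRIMARY KEY, id_user INTEGER,
                                 id_test INTEGER, startingTime TEXT, endingTime TEXT);
    -- questionResult: 1 = correct, 0 = wrong, NULL = unanswered.
    CREATE TABLE question_instance_table (
        id_instance INTEGER, id_question INTEGER, questionResult INTEGER,
        PRIMARY KEY (id_instance, id_question));

    -- Made-up sample: a 4-question test, 2 correct, 1 wrong, 1 unanswered.
    INSERT INTO question_table VALUES (1, 33), (2, 33), (3, 33), (4, 33);
    INSERT INTO instance_table VALUES
        (1, 23, 33, '2012-01-02 10:00:00', '2012-01-02 10:20:00');
    INSERT INTO question_instance_table VALUES
        (1, 1, 1), (1, 2, 1), (1, 3, 0), (1, 4, NULL);
""")

score_sql = """
    SELECT i.id_instance,
           -- good answers over all questions in the test
           100.0 * SUM(qi.questionResult = 1) / COUNT(q.id_question) AS score,
           -- good answers over answered questions only (COUNT skips NULLs)
           100.0 * SUM(qi.questionResult = 1) / COUNT(qi.questionResult) AS partial_score,
           (CAST(strftime('%s', i.endingTime) AS INTEGER)
            - CAST(strftime('%s', i.startingTime) AS INTEGER)) / 60 AS minutes
    FROM instance_table i
    JOIN question_table q ON q.id_test = i.id_test
    LEFT JOIN question_instance_table qi
           ON qi.id_instance = i.id_instance AND qi.id_question = q.id_question
    GROUP BY i.id_instance, i.startingTime, i.endingTime
"""
instance_id, score, partial_score, minutes = conn.execute(score_sql).fetchone()
print(instance_id, score, round(partial_score, 2), minutes)
```

Nothing derived (score, duration) is stored here; both are computed from the raw rows, which is the point of the model.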
Assuming the number of questions on a test doesn't change, and that each question is worth the same number of marks, I suggest the following tables:
test
----
test_id
number_of_questions
test_instance
-------------
instance_id
user_id
test_id
completed
duration
notable_questions
-----------------
instance_id
question_id
status (W - Wrong, R - Reviewed)
So, for example:
test:
---------------------------------
| test_id | number_of_questions |
---------------------------------
| 3 | 50 |
| 12 | 100 |
| 33 | 78 |
| 213 | 50 |
---------------------------------
test_instance:
-------------------------------------------------------
| instance_id | user_id | test_id |completed |duration|
-------------------------------------------------------
| 1 | 23 | 33 | 2JAN2012 | 20m |
| 2 | 11 | 12 | 10DEC2011| 35m |
| 3 | 1 | 3 | 3JUL2008 | 1m |
| 4 | 23 | 213 | 4SEP2010 | 10m |
-------------------------------------------------------
notable_questions:
------------------------------------
| instance_id |question_id| Status |
------------------------------------
| 1 | 3 | W |
| 1 | 5 | W |
| 1 | 7 | W |
| 4 | 2 | R |
------------------------------------
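With these tables, the "how many times has a user gotten a particular question wrong" report becomes a join plus a GROUP BY. The sketch below is only an illustration: it loads the example rows above into SQLite and adds one made-up retake (instance 5, marked in the code) so that a count above 1 shows up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test_instance (instance_id INTEGER PRIMARY KEY, user_id INTEGER,
                                test_id INTEGER);
    CREATE TABLE notable_questions (instance_id INTEGER, question_id INTEGER,
                                    status TEXT CHECK (status IN ('W', 'R')));

    INSERT INTO test_instance VALUES (1, 23, 33), (2, 11, 12), (3, 1, 3), (4, 23, 213);
    INSERT INTO notable_questions VALUES (1, 3, 'W'), (1, 5, 'W'), (1, 7, 'W'), (4, 2, 'R');

    -- Hypothetical extra rows: user 23 retakes test 33 and misses question 3 again.
    INSERT INTO test_instance VALUES (5, 23, 33);
    INSERT INTO notable_questions VALUES (5, 3, 'W');
""")

# Count wrong answers per user, test and question across all instances.
wrong_counts = conn.execute("""
    SELECT ti.user_id, ti.test_id, nq.question_id, COUNT(*) AS times_wrong
    FROM notable_questions nq
    JOIN test_instance ti ON ti.instance_id = nq.instance_id
    WHERE nq.status = 'W'
    GROUP BY ti.user_id, ti.test_id, nq.question_id
    ORDER BY ti.user_id, ti.test_id, nq.question_id
""").fetchall()
print(wrong_counts)
```

Grouping by test_id as well keeps track of which test each wrong question came from, since question numbers are only meaningful within a test.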
From the above example I assume that instance correlates directly to user_id+test_id combination.
If that is so, you can consider having table 2 in the following format:
Instance | question_id | status | date
The PK for the table should be on instance, question_id and status.
Entries in this table will not be updated, only inserted. That way you can have:
Instance | question_id | status | date
1        | 3           | W      | 1/1/2011
1        | 3           | R      | 1/5/2011
This will allow you complete tracking of wrong and reviewed questions, and the date of review. If you don't need the date of review, don't define this column :)
You can add a (non-unique) index on the instance and status fields, so that searches on this table are more efficient. The index must not be unique, because one instance can have several questions with the same status.
*Additional data that can be added to the 2nd table is "new test_id" and "new question_id" for reviewed questions, so that you can check if for the same question (assuming question_id is generated each time) you still have failures.
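A minimal sketch of the insert-only status table described above, assuming a hypothetical table name question_status and ISO-formatted dates (the sample dates above are read here as 1 and 5 January 2011, which is a guess). SQLite is used only so the example is runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- 'question_status' is a hypothetical name for the second table.
    CREATE TABLE question_status (
        instance INTEGER, question_id INTEGER, status TEXT, date TEXT,
        PRIMARY KEY (instance, question_id, status));
    -- Non-unique index to speed up lookups by instance and status.
    CREATE INDEX idx_instance_status ON question_status (instance, status);

    INSERT INTO question_status VALUES (1, 3, 'W', '2011-01-01');
    INSERT INTO question_status VALUES (1, 3, 'R', '2011-01-05');
""")

# The composite key rejects a second 'W' row for the same instance and question.
try:
    conn.execute("INSERT INTO question_status VALUES (1, 3, 'W', '2011-01-09')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

history = conn.execute(
    "SELECT status, date FROM question_status "
    "WHERE instance = 1 AND question_id = 3 ORDER BY date").fetchall()
print(history, duplicate_rejected)
```

Because rows are only ever inserted, the history for a question within an instance reads as a simple timeline: wrong first, reviewed later.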
Is there an easy way to back up and restore partial data from a MySQL database while maintaining the FK constraints?
Say if I have 2 tables
| CustomerId | CustomerName |
-----------------------------
| 12 | Bon Jovi |
| 13 | Seal |
and
| AddressId| CustomerId | City |
---------------------------------------
| 1 | 12 | London |
| 2 | 13 | Paris |
The backup would only take customer 12 and address 1.
My goal is to take a large database from a production server and replicate it locally, but with partial data.
Due to a fairly complicated schema, a custom query is not an option. Also, I can't rely on the existence of a main table from which one would get the related rows.
Thanks
You could replicate specific customers manually. With an FK constraint on the address table, replication will then fail to insert or update the address records whose customer was not copied.
To replicate only specified tables in the database, see http://dev.mysql.com/doc/refman/5.1/en/replication-options-slave.html#option_mysqld_replicate-do-table
To silently skip errors during replication, use this parameter: http://dev.mysql.com/doc/refman/5.1/en/replication-options-slave.html#sysvar_slave_skip_errors