Add Foreign Key relationships as bulk operation - mysql

I've inherited a database with hundreds of tables. Tables may have implicit FK relations that are not explicitly defined as such. I would like to be able to write a script or query that would be able to do this for all tables. For instance, if a table has a field called user_id, then we know there's a FK relationship with the users table on the id column. Is this even doable?
Thanks in advance.

Yes, it's possible, but I would want to explore more first. Many folks design relational databases without foreign keys, especially in the MySQL world. People also reuse column names in different tables in the same schema (often with less than optimal results). Double-check that what you think is a foreign key can actually be used that way (same data type, width, collation/character set, etc.).
Then I would recommend you copy the tables to a test machine and start running your ALTER TABLE statements to add foreign keys. Test like heck.
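As a rough sketch of how the candidate statements could be generated, assuming the convention from the question (a column named user_id points at users.id): the function name guess_foreign_keys and the schema dictionary are hypothetical. In practice the table and column names would be read from INFORMATION_SCHEMA.COLUMNS, and the naive "add an s" pluralisation would need adjusting for your schema.

```python
def guess_foreign_keys(schema):
    """Return candidate ALTER TABLE statements for columns named <name>_id.

    `schema` maps table name -> list of column names (hypothetical input,
    e.g. collected from INFORMATION_SCHEMA.COLUMNS beforehand).
    """
    statements = []
    for table in sorted(schema):
        for column in schema[table]:
            if not column.endswith("_id"):
                continue
            # Naive pluralisation (assumption): user_id -> users
            parent = column[:-3] + "s"
            # Skip columns whose guessed parent table or id column is missing
            if parent not in schema or "id" not in schema[parent]:
                continue
            statements.append(
                f"ALTER TABLE `{table}` "
                f"ADD CONSTRAINT `fk_{table}_{column}` "
                f"FOREIGN KEY (`{column}`) REFERENCES `{parent}` (`id`);"
            )
    return statements
```

The output is only a list of candidates to review by hand. It does not check data types, collations, or orphaned rows, all of which will make the ALTER TABLE fail or behave unexpectedly.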

Related

Polymorphic relationships vs separate tables per type

I am working on a database which has some types (e.g. User, Appointment, Task etc.) which can have zero or more Notes associated with each type.
The possible solutions I have come across for implementing these relationships are:
Polymorphic relationship
Separate table per type
Polymorphic Relationship
Suggested by many as being the easiest solution to implement and seemingly the most common implementation for frameworks that follow the Active Record pattern, I would add a table whose data is morphable:
My notable_type would allow me to distinguish between the type (User, Appointment, Task) the Note relates to, whilst the notable_id would allow me to obtain the individual type record in the related type table.
PROS:
Easy to scale, more models can be easily associated with the polymorphic class
Limits table bloat
Results in one class that can be used by many other classes (DRY)
CONS:
More types can make querying more difficult and expensive as the data grows
Cannot have a foreign key
Lack of data consistency
Separate Table per Type
Alternatively I could create a table for each type which is responsible for the Notes associated with that type only. The type_id foreign key would allow me to quickly obtain the individual type record.
Deemed by many online as a code smell, many articles advocate avoiding the polymorphic relationship in favour of an alternative (here and here for example).
PROS:
Allows us to use foreign keys effectively
Efficient data querying
Maintains data consistency
CONS:
Increases table bloat as each type requires a separate table
Results in multiple classes, each representing the separate type_notes table
Thoughts
The polymorphic relationship is certainly the simpler of the two options to implement, but the lack of foreign key constraints and therefore potential for consistency issues feels wrong.
A table per notes relationship (user_notes, task_notes etc.) with foreign keys seems the correct way (in keeping with design patterns) but could result in a lot of tables (addition of other types that can have notes or addition of types similar to notes [e.g. events]).
It feels like my choice is either a simplified table structure that forgoes foreign keys and increases query overhead, or a larger number of tables with the same structure that simplify queries and allow for foreign keys.
Given my scenario which of the above would be more appropriate, or is there an alternative I should consider?
What is "table bloat"? Are you concerned about having too many tables? Many real-world databases I've worked on have between 100 and 200 tables, because that's what it takes.
If you're concerned with adding multiple tables, then why do you have separate tables for User, Appointment, and Task? If you had a multi-valued attribute for User, for example for multiple phone numbers per user, would you create a separate table for phones, or would you try to combine them all into the user table somehow? Or have a polymorphic "things that belong to other things" table for user phones, appointment invitees, and task milestones?
Answer: No, you'd create a Phone table, and use it to reference only the User table. If Appointments have invitees, that gets its own table (probably a many-to-many between appointments and users). If tasks have milestones, that gets its own table too.
The correct thing to do is to model your database tables like you would model object types in your application. You might like to read a book like SQL and Relational Theory: How to Write Accurate SQL Code 3rd Edition by C. J. Date to learn more about how tables are analogous to types.
You already know instinctively that the fact that you can't create a foreign key is a red flag. A foreign key must reference exactly one parent table. This should be a clue that it's not valid relational database design to make a polymorphic foreign key. Once you start thinking of tables and their attributes as concrete types (like described in SQL and Relational Theory), this will become obvious.
If you must create one notes table, you could make it reference one table called "Notable" which is like a superclass of User, Appointment, and Task. Then each of those three tables would also reference a primary key of Notable. This mimics the object-oriented structure of polymorphism, where you can have a class Note have a reference to an object by its superclass type.
But IMHO, that's more complex than it needs to be. I would just create separate tables for UserNotes, AppointmentNotes, and TaskNotes. I'm not troubled by having three more tables, and it makes your code more clear and maintainable.
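A minimal sketch of the separate-tables approach, using Python's built-in sqlite3 module as a stand-in for MySQL so it can run anywhere. The table names (users, tasks, user_notes, task_notes) and both function names are assumptions for illustration. The PRAGMA line is SQLite-specific.

```python
import sqlite3

def build_notes_schema():
    """Create one notes table per parent type, each with a real foreign key."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
        CREATE TABLE user_notes (
            id      INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            body    TEXT NOT NULL
        );
        CREATE TABLE task_notes (
            id      INTEGER PRIMARY KEY,
            task_id INTEGER NOT NULL REFERENCES tasks(id),
            body    TEXT NOT NULL
        );
    """)
    return conn

def orphan_note_rejected(conn):
    """Return True if a note pointing at a non-existent user is refused."""
    try:
        conn.execute(
            "INSERT INTO user_notes (user_id, body) VALUES (?, ?)",
            (999, "no such user"),
        )
    except sqlite3.IntegrityError:
        return True
    return False
```

With a polymorphic notable_type/notable_id pair there is no single parent table to reference, so the database cannot reject the orphan row in this way.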
I think you should consider these two things before you make a decision:
Performance. Will there be a lot of reads or a lot of writes? Test which design performs better.
Growth of your model. Can it easily be expanded?

SQL - (Foreign key?) constraint to table names?

I'm curious if something like this is possible, if at all reasonable.
I have a column in a table that's called ref_table, and it points to the table that the current entry relates to. Let's say, in table table_people, Person ID 1 is a client and Person ID 3 is an employee, so their ref_table values will be "table_clients" and "table_employees" respectively. I shouldn't have a problem keeping the values valid through PHP, but what would some ways of achieving it through SQL be?
I tried testing it with a foreign key constraint to INFORMATION_SCHEMA:
FOREIGN KEY `people_constraint_tables` (`ref_table`)
REFERENCES `INFORMATION_SCHEMA`.`COLUMNS`(`COLUMN_NAME`)
ON DELETE RESTRICT
ON UPDATE RESTRICT
No point refining it since it didn't work. It seems like there's one way to make it work but it is a dirty cheat apparently.
Would you do it with triggers? Would you do it at all? If someone with MySQL experience could tell me whether that's reasonable at all, I'd like to know. Thank you.
MySQL doesn't have the facility to do this easily. Other databases do, through generated columns or table inheritance.
Would I do this with triggers? Well, yes and no. If I had to do this with one table and I had to use MySQL and I wanted to introduce relational integrity, then triggers are the way to go. There is little other choice.
But really, I would simply have a different table for each reference type. There is a little bit of overhead in this (in terms of partially filled tables). And for some applications, a single reference table is quite convenient (internationalization comes to mind). But in general, I would stick with the standard method of a separate table for each entity with properly declared foreign key relationships.

Foreign Keys in Database Design

Not so much a problem but a question about best practice and what will work for me in the future.
I have a number of tables which contain data that are linked to accounts in my schema - services, locations, providers, etc.
I have two choices. I can add a foreign key to accounts to all of my tables, which will reduce the number of joins needed but will potentially add to the data stored and (maybe?) lead to inconsistencies.
So, my question is, should I add an accounts FK to services, locations, etc. or rely on joins to manage that for me?
Without knowing the structure of your databases it's hard to give a correct answer. But let's take a single table providers for example. If a provider can only have one account, then I would add a FK to the providers table. If this is not the case then I would not use a FK because it wouldn't work.
Foreign Keys are to relate things together so there is no inconsistency. So if you had an employees table and a departments table, employees would have a FK to departments because an employee can only be in one department.
You seem to have some trouble understanding FKs and what they give you.
Your FK should reference a table with a PK (primary key), and this ensures data integrity between the tables.
However, if you do not index the columns of the FK, it will be an unindexed FK, and this can lead to full table scans on your joins. (MySQL's InnoDB creates an index on the FK columns automatically if one does not already exist; some other databases do not.)
The constraints themselves add essentially no storage. The indexes that back them do add to storage, but the performance benefits of indexes usually outweigh the storage overhead.
Using PKs and indexed FKs is all part of normalized data design, and you should not have concerns about using them.

What is the best way to merge 2 MySQL data dumps?

We have built an application with MySQL as the database. Every week we export the data dump from the database, and delete all the data. Now we want to merge all these dumps together for some data-analysis tasks.
The problem we are facing is that the "id" field for all the tables is Auto-Increment, so it starts with 1 in all the data dumps, which causes duplicate IDs in the table. I am sure there must be better ways to do it since it should be a pretty common task in MySQL administration.
What would be the best way to go about it?
If you can easily identify your foreign key fields (like they take the form *_id) then you can use the scripting language of your choice to modify the primary and foreign keys in the dump files by adding an "id space offset".
For example, say you have two dump files and you know their primary key range does not exceed 1,000,000. You then increment the primary and foreign keys in the second dump file by 1,000,000.
This is not entirely trivial to implement, as you will have to detect the position of the foreign key fields in the statements and then modify values at the same column position elsewhere in the statement.
If your foreign keys are not easily identifiable by a common naming convention then you must keep separate information per table about how to find their positions based on column position.
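A minimal sketch of the offset idea, assuming the rows have already been parsed out of the dump into tuples and that key columns are either named id or end in _id. The function name offset_ids is hypothetical. Parsing the INSERT statements themselves is the hard part and is not shown.

```python
def offset_ids(rows, columns, offset):
    """Add `offset` to the primary key and every *_id column of each row.

    `columns` lists the column names in row order; NULL keys (None) are kept.
    """
    # Find key columns once, by position, using the naming convention
    key_positions = [
        i for i, name in enumerate(columns)
        if name == "id" or name.endswith("_id")
    ]
    shifted = []
    for row in rows:
        new_row = list(row)
        for i in key_positions:
            if new_row[i] is not None:
                new_row[i] += offset
        shifted.append(tuple(new_row))
    return shifted
```

The same offset has to be applied to every table in a given dump file, so that shifted foreign keys still match the shifted primary keys they point at.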
Good luck.
The best way would be that you have another database that acts as data warehouse into which you copy the contents of your app's database. After that, you don't truncate all the tables, you simply use DELETE FROM tablename - that way, your auto_increments won't get reset.
It's an ugly solution to export something, truncate the database, and then expect an import to proceed properly. Even if you get around the problem of clashing auto-increments (there's the INSERT ... ON DUPLICATE KEY UPDATE statement, which lets you do something when a unique key constraint fails), nothing guarantees that relations between tables (foreign keys) will be preserved.
This is a broad topic, and the solution given is quick and not pretty. Other people will probably suggest other methods, but if you are doing this to offload the db your app uses, it's a bad design. Try searching for MySQL's partitioning support if you're aiming for better performance with a larger data set.
For the data you've already dumped, load it into a table that doesn't use the ID column as a primary key. You don't have to define any primary key. You will have multiple rows with the same ID, but that won't impede your data analysis.
Going forward, you can set up a discipline where you dump and then DELETE the rows that are more than, say, one day old. That way your ID will keep incrementing.
Or, you can copy this data to a table that uses the ARCHIVE storage engine. This is good for retaining data for analysis, because it compresses its contents.

Alternate to storing Large number of tables -- MySQL

Well, I have been working with a large amount of network data, in which I have to filter out some IP addresses and store their communication with other IPs. But the number of IPs is huge, hundreds of thousands, and I would have to create that many tables. Ultimately my MySQL access will slow down, and everything will slow down. Each table will have few columns and many rows.
My Questions:
Is there a better way to deal with this, I mean storing data of each IP?
Is there something like table of tables?
[Edit]
The reason I am storing in different tables is, I have to keep removing and add entries as time passes by.
Here is the table structure
CREATE TABLE IP(syn_time datetime, source_ip varchar(18), dest_ip varchar(18));
I use C++ to access it through the ODBC connector.
Don't DROP/CREATE tables frequently. MySQL is very buggy with doing that, and understandably so--it should only be done once when the database is created on a new machine. It will hurt things like your buffer pool hit ratio, and disk IO will spike out.
Instead, use InnoDB or XtraDB, which means you can delete old rows whilst inserting new ones.
Store the IP in a column of type int(10) unsigned e.g. 192.168.10.50 would be stored as (192 * 2^24) + (168 * 2^16) + (10 * 2^8) + 50 = 3232238130
Put all the information into 1 table, and just use a SELECT ... WHERE on an indexed column
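The integer conversion described above can be sketched as follows. The function names ip_to_int and int_to_ip are hypothetical, and no validation of the octets is done. MySQL also has the built-in functions INET_ATON() and INET_NTOA() for the same conversion, if you would rather do it in SQL.

```python
import ipaddress

def ip_to_int(address):
    """Pack a dotted-quad IPv4 string into an unsigned 32-bit integer."""
    a, b, c, d = (int(part) for part in address.split("."))
    # Same arithmetic as a * 2^24 + b * 2^16 + c * 2^8 + d
    return (a << 24) | (b << 16) | (c << 8) | d

def int_to_ip(value):
    """Turn the stored integer back into dotted-quad form."""
    return str(ipaddress.IPv4Address(value))
```

For example, ip_to_int("192.168.10.50") gives 3232238130, matching the calculation above.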
Creating tables dynamically is almost always a bad idea. The alternative is normalisation. I won't go into the academic details of that, but I'll try to explain it in more simple terms.
You can separate relationships between data into three types: one-to-one, one-to-many and many-to-many. Think about how each bit of data relates to other bits and which type of relationship it has.
If a data relationship is one-to-one, then you can usually just stick it in the same row of the same table. Occasionally there may be a reason to separate it as if it were one-to-many, but generally speaking, stick it all in the same place.
If a data relationship is one-to-many, it should be referenced between two tables by its primary key (you've given each table a primary key, right?). The "many" side of one-to-many should have a field which references the primary key of the other table. This field is called a foreign key.
Many-to-many is the most complex relationship, and it sounds like you have a few of these. You have to create a join table. This table will contain two foreign key fields, one for one table and another for the other. For each link between two records, you'll add one record to your join table.
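Applied to the IP traffic in the question, the join-table idea might look like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for MySQL so it is runnable as-is. The table names hosts and connections and all function names are assumptions, and INSERT OR IGNORE is SQLite syntax (MySQL spells it INSERT IGNORE).

```python
import sqlite3

def make_traffic_db():
    """One hosts table plus one join table, instead of a table per IP."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
    conn.executescript("""
        CREATE TABLE hosts (id INTEGER PRIMARY KEY, ip TEXT NOT NULL UNIQUE);
        CREATE TABLE connections (
            source_id INTEGER NOT NULL REFERENCES hosts(id),
            dest_id   INTEGER NOT NULL REFERENCES hosts(id),
            syn_time  TEXT NOT NULL
        );
        CREATE INDEX idx_connections_source ON connections (source_id);
    """)
    return conn

def host_id(conn, ip):
    """Return the id for an IP, inserting the host row if it is new."""
    conn.execute("INSERT OR IGNORE INTO hosts (ip) VALUES (?)", (ip,))
    return conn.execute("SELECT id FROM hosts WHERE ip = ?", (ip,)).fetchone()[0]

def record(conn, source_ip, dest_ip, syn_time):
    """Add one row to the join table for each observed connection."""
    conn.execute(
        "INSERT INTO connections VALUES (?, ?, ?)",
        (host_id(conn, source_ip), host_id(conn, dest_ip), syn_time),
    )

def peers_of(conn, source_ip):
    """List the distinct destination IPs a given source has contacted."""
    rows = conn.execute(
        """SELECT DISTINCT d.ip
           FROM connections c
           JOIN hosts s ON s.id = c.source_id
           JOIN hosts d ON d.id = c.dest_id
           WHERE s.ip = ?
           ORDER BY d.ip""",
        (source_ip,),
    )
    return [row[0] for row in rows]
```

Old traffic can then be removed with a DELETE on syn_time rather than by dropping per-IP tables.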
Hopefully this should get you started.