SQL - (Foreign key?) constraint to table names? - mysql

I'm curious if something like this is possible, if at all reasonable.
I have a column in a table called ref_table, and it names the table that the current entry relates to. Let's say, in table table_people, Person ID 1 is a client and Person ID 3 is an employee, so respectively their ref_tables will show "table_clients" and "table_employees". I shouldn't have a problem keeping the values valid through PHP, but what would some ways of achieving it through SQL be?
I tried testing it with a foreign key constraint to INFORMATION_SCHEMA:
FOREIGN KEY `people_constraint_tables` (`ref_table`)
REFERENCES `INFORMATION_SCHEMA`.`COLUMNS`(`COLUMN_NAME`)
ON DELETE RESTRICT
ON UPDATE RESTRICT
No point refining it since it didn't work. It seems like there's one way to make it work but it is a dirty cheat apparently.
Would you do it with triggers? Would you do it at all? If someone with MySQL experience could tell me whether that's reasonable at all, I'd like to know. Thank you.

MySQL doesn't have the facility to do this easily. Other databases do, through generated columns or table inheritance.
Would I do this with triggers? Well, yes and no. If I had to do this with one table and I had to use MySQL and I wanted to introduce relational integrity, then triggers are the way to go. There is little other choice.
But really, I would simply have a different table for each reference type. There is a little bit of overhead in this (in terms of partially filled tables). And for some applications, a single reference table is quite convenient (internationalization comes to mind). But in general, I would stick with the standard method of a separate table for each entity with properly declared foreign key relationships.
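As a rough illustration of the "separate table for each entity" approach, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, purely so the example runs without a server. The person_id column and the sample names are hypothetical; the table names come from the question. Each role table declares a foreign key back to table_people, so no ref_table column is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
    CREATE TABLE table_people (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- One table per role, each pointing back at the person it describes.
    CREATE TABLE table_clients (
        person_id INTEGER PRIMARY KEY REFERENCES table_people(id)
    );
    CREATE TABLE table_employees (
        person_id INTEGER PRIMARY KEY REFERENCES table_people(id)
    );
""")
conn.execute("INSERT INTO table_people (id, name) VALUES (1, 'Alice'), (3, 'Bob')")
conn.execute("INSERT INTO table_clients (person_id) VALUES (1)")
conn.execute("INSERT INTO table_employees (person_id) VALUES (3)")


def try_insert(sql):
    """Return True if the statement is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Person 99 does not exist, so the declared foreign key rejects this row.
orphan_accepted = try_insert("INSERT INTO table_clients (person_id) VALUES (99)")
```

In MySQL the same idea would be expressed with InnoDB tables and FOREIGN KEY ... REFERENCES clauses; the exact DDL differs from the SQLite syntax shown here.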

Related

Foreign keys when cascades aren't needed

If I don't need to use cascade/restrict and similar constraints in a field which would logically be a foreign key, do I have any reason to explicitly declare it as a foreign key, other than aesthetics?
Wouldn't it actually decrease performance, since it has to test for integrity?
edit: to clarify, I don't need it since:
I won't edit nor delete those values anyway, so I don't need to do cascade and similar checks
Before calling INSERT, I'll check anyway if the target key exists, so I don't need restrict checks either
I understand that this kind of constraint will ensure that that relation will be still valid if the database becomes somehow corrupted, and that is a good thing. However, I'm wondering if there is any other reason to use this function in my case. Am I missing something?
The answers to this question might actually also apply to your question.
If you have columns in tables which reference rows in other tables, you should always be using foreign keys, since even if you think that you 'do not need' the features offered by those checks, it will still help guarantee data integrity in case you forgot a check in your own code.
The performance impact of foreign key checks is negligible in most cases (see above link), since relational databases use very optimised algorithms to perform them (after all, they are a key feature since they are what actually defines relations between entities).
Another major advantage of FKs is that they will also help others to understand the layout of your database.
Edit:
Since the question linked above is referring to SQL-Server, here's one with replies of a very similar kind for MySQL: Does introducing foreign keys to MySQL reduce performance
You must do it. If it affects write performance at all, it's a "pixel" problem.
The main performance problems are in reads, and FKs can help the query optimizer select the best plan. Even if your DBMS (or DBMSs, if you provide a cross-DBMS solution) doesn't gain from it now, it may later.
So the answer is: yes, it's not only aesthetics.

What is the best way to merge 2 MySQL data dumps?

We have built an application with MySQL as the database. Every week we export the data dump from the database, and delete all the data. Now we want to merge all these dumps together for some data-analysis tasks.
The problem we are facing is that the "id" field for all the tables is Auto-Increment, so it starts with 1 in all the data dumps, which causes duplicate IDs in the table. I am sure there must be better ways to do it since it should be a pretty common task in MySQL administration.
What would be the best way to go about it?
If you can easily identify your foreign key fields (like they take the form *_id) then you can use the scripting language of your choice to modify the primary and foreign keys in the dump files by adding an "id space offset".
For example let's say you have two dump files and you know their primary key range does not exceed 1,000,000, you increment the primary and foreign keys in the second dump file by 1,000,000.
This is not entirely trivial to implement, as you will have to detect the position of the foreign key fields in the statements and then modify values at the same column position elsewhere in the statement.
If your foreign keys are not easily identifiable by a common naming convention then you must keep separate information per table about how to find their positions based on column position.
Good luck.
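The "id space offset" idea above can be sketched roughly as follows. This assumes the rows of the second dump have already been parsed into dictionaries keyed by column name (the parsing of the dump statements, which is the non-trivial part, is not shown), and that keys follow the id / *_id naming convention. The function name and sample data are hypothetical.

```python
def offset_ids(rows, offset, key_suffix="_id", primary_key="id"):
    """Return copies of rows with the primary key and *_id columns shifted by offset.

    NULL keys (None) are left alone so optional relations stay optional.
    """
    shifted = []
    for row in rows:
        new_row = dict(row)
        for column, value in row.items():
            is_key = column == primary_key or column.endswith(key_suffix)
            if is_key and value is not None:
                new_row[column] = value + offset
        shifted.append(new_row)
    return shifted


# Hypothetical rows from the second dump; ids in each dump start at 1.
second_dump_orders = [
    {"id": 1, "user_id": 1, "total": 25},
    {"id": 2, "user_id": None, "total": 40},
]
merged_ready = offset_ids(second_dump_orders, 1_000_000)
```

The same offset has to be applied to every table of a given dump so that foreign keys keep pointing at the shifted primary keys.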
The best way would be that you have another database that acts as data warehouse into which you copy the contents of your app's database. After that, you don't truncate all the tables, you simply use DELETE FROM tablename - that way, your auto_increments won't get reset.
It's an ugly solution to have something exported, then truncate the database, then expect an import will proceed properly. Even if you get around the problem of clashing auto increments (there's the ON DUPLICATE KEY UPDATE clause that allows you to do something if a unique key constraint fails), nothing guarantees that relations between tables (foreign keys) will be preserved.
This is a broad topic and the solution given is quick and not nice. Other people will probably suggest other methods, but if you are doing this to offload the db your app uses, it's a bad design. Try to google MySQL's partitioning support if you're aiming for better performance with a larger data set.
For the data you've already dumped, load it into a table that doesn't use the ID column as a primary key. You don't have to define any primary key. You will have multiple rows with the same ID, but that won't impede your data analysis.
Going forward, you can set up a discipline where you dump and then DELETE the rows that are more than, say, one day old. That way your ID will keep incrementing.
Or, you can copy this data to a table that uses the ARCHIVE storage engine. This is good for retaining data for analysis, because it compresses its contents.

How do you handle descriptive database table names and their effect on foreign key names?

I am working on a database schema, and am trying to make some decisions about table names. I like at least somewhat descriptive names, but then when I use suggested foreign key naming conventions, the result seems to get ridiculous. Consider this example:
Suppose I have table
session_subject_mark_item_info
And it has a foreign key that references
sessionSubjectID
in the
session_subjects
table.
Now when I create the foreign key name based on fk_[referencing_table]__[referenced_table]_[field_name] I end up with this madness:
fk_session_subject_mark_item_info__session_subjects_sessionSubjectID
Would this type of a foreign key name cause me problems down the road, or is it quite common to see this?
Also, how do the more experienced database designers out there handle the conflict between descriptive naming for readability vs. the long names that result?
I am using MySQL and MySQL Workbench if that makes any difference.
UPDATE
Received the answers I needed below, but I wanted to mention that after some testing, I discovered that MySQL does have a limit on how long the FK name can be. So using the naming convention I mentioned, and descriptive table names, meant that in two instances in my db I had to shorten the names to avoid the MySQL 1059 error
http://dev.mysql.com/doc/refman/5.1/en/error-messages-server.html#error_er_too_long_ident
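To catch over-long names before MySQL rejects them, a small check along these lines might help. It is only a sketch: the helper names are hypothetical, and it assumes the 64-character limit that MySQL documents for most identifiers, including constraint names.

```python
MYSQL_MAX_IDENTIFIER_LENGTH = 64  # assumed limit for constraint names


def fk_name(referencing_table, referenced_table, field_name):
    """Build a name following fk_[referencing_table]__[referenced_table]_[field_name]."""
    return f"fk_{referencing_table}__{referenced_table}_{field_name}"


def fits_mysql_limit(identifier):
    """Return True if the identifier is short enough for MySQL to accept."""
    return len(identifier) <= MYSQL_MAX_IDENTIFIER_LENGTH


name = fk_name("session_subject_mark_item_info", "session_subjects", "sessionSubjectID")
too_long = not fits_mysql_limit(name)  # this name exceeds the limit
```

Running such a check over the whole schema before generating DDL shows which names need shortening.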
Why do you care what the FK names are? You never see them in code or use them. We also name our tables quite descriptively and commonly have names like this, using SQL Server. It doesn't matter to us, because we never see them. They are just there to enforce data.
FK names are important for maintenance. Generally I only reference the FK and the two table names, not the fields, in the names. If you have named your fields correctly, it will be obvious what the fields are.
Although it probably makes no difference, I will say that I've had table names both ways. In my opinion, using long descriptive table names is overkill, and when working in code or even at the command line these long table names become burdensome and tedious. I mean seriously, who in their right mind would have a nearly 30 character table name, e.g. stationchangelogmasterreport? Now imagine tens or even hundreds of these in a database system. From a developer's point of view, this is just dumb!! My recommendation: put some thought into it, use abbreviations (when you can) and keep it short and to the point. For example, the above table name could be shortened to stnchangelog, and if someone absolutely NEEDS a huge description explaining every meaning and use case for the table, then put this description in the table metadata, i.e. the comments on the table. This keeps us developers from going crazy (and hating you for it), and offers the "meaning" of the table if needed.

Should I add a autoinc primary key for the sake of having a primary key?

I have a table which needs 2 fields. One will be a foreign key, the other is not necessarily unique. There really isn't a reason that I can find to have a primary key other than having read that "every single table ever needs needs needs a primary key".
Edit:
Some good thoughts in here.
For clarity's sake, I will give you an example that is similar to my database needs.
Let's say I have a table with product type, quantity, cost, and manufacturer.
Product type will not always be unique (say, MP3 Player), but manufacturer/product type will be unique (say, Apple MP3 Player). Forget about the various models the manufacturers make for this example. For ease, this table has a autoincrementing primary key.
I am giving a point value and logging how often these products are searched for, added to a cart, and bought for display on a list of hot items.
The way I have it laid out currently is in a second table with a FK pointing to the main table, and a second column for the total number of "popularity points" this item has gained.
The answers I have seen here have made me think that perhaps I should just add a "points" column to my primary products table so that I could just track it there... but that seems like I'm not normalizing my database enough.
My problem is I'm currently mostly just a hobbyist doing this for learning, and don't have the luxury of a DBA to tell me how to set up my tables, so I have to learn both the coding side and the database side.
You have to distinguish between primary key and surrogate key. Auto-incremented column would be a particular case of the latter. Your question, therefore, is twofold:
Does every table need to have a primary key?
Does every table need to have a surrogate primary key?
The answer to first question is YES except in some special cases (association table for many-to-many relationship arguably being an example of such a special case). The reason for this is that you usually need to be able (if not right now then in the future) to consistently address individual rows of that table - for updates / deletion, for example.
The answer to the second question is NO. If your table represents a core business entity OR it can be referenced from a many-to-one association, having a surrogate key is probably a good idea; but it's not absolutely necessary.
It's somewhat unclear what your table's function is; from your description it sounds like it has "collection of values" semantics (FK to "main" table + value). Certain ORMs don't support surrogate keys in such circumstances; if that's what has prompted your question it's OK to leave the surrogate (or even primary in case of bag) key off.
For the sake of having something unique and as identifier, please please please please have a primary key in every table :)
It also helps forward compatibility in case there are future schema changes and the 2 values are no longer unique. Plus, memory is much cheaper now, so feel free to use it as an investment. ;)
I am not sure what the other field looks like, but I am guessing that it would be OK to have a composite primary key based on the FK and the other field. Then again, I don't know your exact scenario.
I would say that it's absolutely necessary to have some sort of primary key in every table.
Interestingly enough, one of the DBAs for a Viacom property once told me that there was really no discernible difference between using an INT UNSIGNED or a VARCHAR(n) as a primary key in MySQL. This was in reference to a user table with more than 64 million rows. I believe n can be decently large (<=100), but I forget what they limited it to. Unfortunately, I don't have any empirical data to back that up.
You don't HAVE to have a primary key on every table, but it is considered best practice to have them as they are almost always necessary on a normalized relational database design. If you're finding a bunch of tables you don't think need PKs, then you should revisit the design/layout of your tables. To read more on normalization see here.
A couple of scenarios that I can think of where you may not need or want a PK on a table would be a table strictly for logging (to limit the performance degradation of writing the log and maintaining a unique index), and the scenario where you're just storing data used to pump through an application for test purposes.
I'll be contrary and say you shouldn't add the key if you don't have a reason for it. It is very easy to add this column later if needed.
Strictly speaking, a surrogate key is not necessary, but a primary key is.
Many people use the term "primary key" to mean a single column that is an auto-incrementing integer. But this is not an accurate definition of a primary key.
A primary key is a constraint on one or more columns that serve to identify each row uniquely. Yes, you need some way of addressing individual rows. This is a crucial characteristic of a relation (aka a table).
You say you have a foreign key and another column that is not unique. But are these two columns taken together unique? If so, you can declare a primary key constraint over these two columns.
Defining another surrogate key (also called a pseudokey -- the auto-incrementing type) is a convenience because some people don't like to have to reference two columns when selecting a single row. Or they want the freedom to change values in the other columns easily, without changing the value of the primary key by which one addresses the individual row.
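As a small illustration of a primary key declared over two columns, here is a sketch using Python's built-in sqlite3 module (a stand-in for MySQL so that the example runs anywhere). The products / product_tags tables and their columns are hypothetical, not the asker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY);
    -- Neither column is unique on its own, but the pair is.
    CREATE TABLE product_tags (
        product_id INTEGER NOT NULL REFERENCES products(id),
        tag        TEXT    NOT NULL,
        PRIMARY KEY (product_id, tag)
    );
    INSERT INTO products (id) VALUES (1), (2);
    INSERT INTO product_tags VALUES (1, 'audio'), (2, 'audio');
""")

# Repeating an existing (product_id, tag) pair violates the composite key.
try:
    conn.execute("INSERT INTO product_tags VALUES (1, 'audio')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Selecting a single row then requires both columns in the WHERE clause, which is the inconvenience a surrogate key avoids.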
This is a technique related to normalization and a pretty good practice. A key made up of an auto incrementing number has many benefits:
You have a PK that does not pertain to the data.
You never have to change the PK value
Every row will automatically have a unique identifier

Add Foreign Key relationships as bulk operation

I've inherited a database with hundreds of tables. Tables may have implicit FK relations that are not explicitly defined as such. I would like to be able to write a script or query that would be able to do this for all tables. For instance, if a table has a field called user_id, then we know there's a FK relationship with the users table on the id column. Is this even doable?
Thanks in advance,
Yes, it's possible, but I would want to explore more first. Many folks design relational databases without foreign keys, especially in the MySQL world. Also, people reuse column names in different tables in the same schema (often with less than optimal results). Double check that what you think is a foreign key can be used that way (same data type, width, collation/character set, etc.).
Then I would recommend you copy the tables to a test machine and start running your ALTER TABLE statements to add foreign keys. Test like heck.
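A script for this could start by generating candidate statements for review rather than executing anything. The sketch below assumes the column list per table has already been fetched (in MySQL it could come from INFORMATION_SCHEMA.COLUMNS) and uses a naive convention in which user_id points at users.id. The function name, the pluralization rule, and the sample schema are all assumptions.

```python
def guess_foreign_keys(columns_by_table, suffix="_id"):
    """Generate candidate ALTER TABLE statements from *_id column names.

    Columns whose guessed target table does not exist, or has no id column,
    are skipped. The output is meant to be reviewed by hand, not run blindly.
    """
    statements = []
    for table, columns in sorted(columns_by_table.items()):
        for column in columns:
            if not column.endswith(suffix):
                continue
            target = column[: -len(suffix)] + "s"  # user_id -> users (naive plural)
            if "id" not in columns_by_table.get(target, []):
                continue
            statements.append(
                f"ALTER TABLE `{table}` ADD CONSTRAINT `fk_{table}__{target}` "
                f"FOREIGN KEY (`{column}`) REFERENCES `{target}` (`id`);"
            )
    return statements


# Hypothetical schema: legacy_id has no matching table, so it is ignored.
schema = {
    "users": ["id", "name"],
    "orders": ["id", "user_id", "legacy_id"],
}
candidates = guess_foreign_keys(schema)
```

Each generated statement still needs the checks mentioned above (matching data type, width, collation) and a check for orphaned rows before it is applied to a copy of the database.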