I have this idea I've been mulling over, based on another concept I read somewhere. Basically you have a single "Primary" table with very few fields, and other tables inherit from that primary table through a foreign key. This much has been done before, so it's nothing new. What I would like to do is have virtually every table in the database inherit from that Primary table. This way, every object, every record, every entry in every table can have a fully unique primary key (since the PK is actually stored in the Primary table), and can be referenced simply by ID instead of by table.
Another benefit is that it becomes easy to make relationships that can touch multiple tables. For example: I have a Transaction table, and this table wants an FK to whatever the transaction is for (inventory, account, contact, order, etc.). The Transaction can just have an FK to the Primary table, and the relevant piece of data is reached through that.
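In rough SQL, the idea would look something like this (purely a sketch; all the names here are made up):

    -- The shared root table; every row anywhere gets its id from here.
    CREATE TABLE primary_entity (
        id BIGINT AUTO_INCREMENT PRIMARY KEY,
        entity_type VARCHAR(32) NOT NULL  -- 'inventory', 'account', 'order', ...
    );

    -- Every other table "inherits" its PK from primary_entity.
    CREATE TABLE account (
        id BIGINT PRIMARY KEY,
        name VARCHAR(100) NOT NULL,
        FOREIGN KEY (id) REFERENCES primary_entity(id)
    );

    -- A transaction can then point at any entity via the shared key.
    CREATE TABLE `transaction` (
        id BIGINT PRIMARY KEY,
        subject_id BIGINT NOT NULL,  -- an account, an inventory row, anything
        FOREIGN KEY (id) REFERENCES primary_entity(id),
        FOREIGN KEY (subject_id) REFERENCES primary_entity(id)
    );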
The issue that keeps coming up in my head is whether or not that Primary table will become a bottleneck. The thing is going to have literally millions of records at some point. I know that gigantic record sets can be handled by good table design, but what's the limit?
Has anyone attempted anything similar to this, and what were your results?
You have to consider that this table will have a ton of foreign key relations. These can cause performance issues if you ever want to delete a row from the root table (deletes can produce some nasty execution plans).
So if you plan to remove rows, it could impact performance. I recently had issues with a setup like this, and it was a pain to clean up (the root table was referenced by 120 other tables; deletes were slow as hell).
To overcome this performance issue you might consider not enforcing the constraints (bad plan), dropping the constraints entirely for performance (bad plan), or grouping all the data that belongs to one entity into one row and sticking to normal normalization practices (good plan).
Yes, the primary table will almost certainly be a bottleneck.
How do you enforce real referential integrity?
For example, how can you be sure that the transaction's FK is actually linked to an inventory, account, contact or order, rather than an apple, orange or pineapple?
I think this would be a horrible bottleneck. Not only that, it would make enforcing the real PK/FK relationships much harder, and it could create a data integrity nightmare. I don't see where you gain any benefit at all.
Scenario 1: Foreign keys are defined properly.
Scenario 2: No FKs are defined. Instead, on deletion and update of records, my API code ensures that no orphaned data is left behind, and it also checks data integrity on insertion.
So what is the difference between these two scenarios? I just want to know what benefit I will get from using FKs (a quantitative analysis). Will I get better performance in Scenario 1 than in Scenario 2? I am a newbie at MySQL database design.
Performance differences...
An FK check must reach into the other table (via an index) to do the integrity check. However...
Situation 1: Simple FK:
In many cases, you can, by understanding the flow of the app code, assure yourself that some FK violations "cannot" happen. For example, when you insert into two tables in a row (and have checked for errors, etc.), and the second table's row points to the first table's row you just inserted, then the FK test is redundant and hurts performance (a little).
If, on the other hand, you "simulate" an FK check by doing an extra SELECT, that would be a noticeable performance hit.
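For instance, the redundant case looks like this (assumed table names):

    -- Insert the parent row; the app now holds a known-good id.
    INSERT INTO orders (customer_id) VALUES (42);

    -- The FK check on order_items.order_id re-verifies an id the code
    -- just created, so it adds cost without adding safety.
    INSERT INTO order_items (order_id, sku, qty)
    VALUES (LAST_INSERT_ID(), 'ABC-1', 3);

    -- "Simulating" the FK instead means an extra round trip like:
    -- SELECT 1 FROM orders WHERE id = ?;  -- then insert only on a match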
Situation 2: FK with cascading delete:
If you use FKs for "cascading delete" (etc), then this may be more efficient than manually doing the delete.
Further, if you can batch the DELETEs, it is probably faster than letting the cascade do them one by one.
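As a sketch of the two approaches (assumed schema):

    CREATE TABLE parent (id BIGINT PRIMARY KEY);
    CREATE TABLE child (
        id BIGINT PRIMARY KEY,
        parent_id BIGINT NOT NULL,
        FOREIGN KEY (parent_id) REFERENCES parent(id) ON DELETE CASCADE
    );

    -- Cascade: one statement, children removed automatically, row by row.
    DELETE FROM parent WHERE id = 1;

    -- Manual batching: often faster when deleting many parents at once.
    -- DELETE FROM child  WHERE parent_id IN (1, 2, 3);
    -- DELETE FROM parent WHERE id        IN (1, 2, 3);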
Another comment: "side effects", such as cascading deletes, are considered (by some) to be a naughty coding practice.
But... The above differences are probably not enough to make a difference in your app.
I am currently learning about foreign keys and trying to add them wherever I can in my application to ensure data integrity. I am using InnoDB on MySQL.
My clicks table has a structure something like...
id, timestamp, link_id, user_id, ip_id, user_agent_id, ... etc., for about 12 _id columns.
Obviously these all point to other tables, so should I add a foreign key on them? MySQL is creating an index automatically for every foreign key, so essentially I'll have an index on every column? Is this what I want?
FYI - this table will essentially be my bulkiest table. My research basically tells me I'm sacrificing performance for integrity, but it doesn't suggest how harsh the performance drop will be.
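For reference, the structure is roughly this (lookup table names simplified):

    CREATE TABLE clicks (
        id BIGINT AUTO_INCREMENT PRIMARY KEY,
        `timestamp` TIMESTAMP NOT NULL,
        link_id BIGINT NOT NULL,
        user_id BIGINT NOT NULL,
        ip_id BIGINT NOT NULL,
        user_agent_id BIGINT NOT NULL,
        -- ...and so on for the remaining _id columns
        FOREIGN KEY (link_id)       REFERENCES links(id),
        FOREIGN KEY (user_id)       REFERENCES users(id),
        FOREIGN KEY (ip_id)         REFERENCES ips(id),
        FOREIGN KEY (user_agent_id) REFERENCES user_agents(id)
    ) ENGINE=InnoDB;
    -- InnoDB creates an index on each referencing column automatically
    -- if one does not already exist.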
Right before inserting such a row, you did 12 inserts or lookups to get the ids, correct? Then, as you do the INSERT, it will do 12 checks to verify that all of those ids have a match. Why bother; you just verified them with the code.
Sure, have FKs in development. But in production, you should have weeded out all the coding mistakes, so FKs are a waste.
A related tip -- Don't do all the work at once. Put the raw (not-yet-normalized) data into a staging table. Periodically do bulk operations to add new normalization keys and get the _id's back. Then move them into the 'real' table. This has the added advantage of decreasing the interference with reads on the table. If you are expecting more than 100 inserts/second, let's discuss further.
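A sketch of that staging flow, with invented names (user_agents is assumed to have a unique key on its user_agent column):

    -- 1. Raw, denormalized events land here; no FK checks in the hot path.
    CREATE TABLE clicks_staging (
        ts TIMESTAMP NOT NULL,
        user_agent VARCHAR(255) NOT NULL
        -- ...other raw values
    );

    -- 2. Periodically add any new lookup values in bulk.
    INSERT IGNORE INTO user_agents (user_agent)
    SELECT DISTINCT user_agent FROM clicks_staging;

    -- 3. Move rows into the real table, resolving values to ids via JOIN.
    --    (Other columns elided.)
    INSERT INTO clicks (`timestamp`, user_agent_id)
    SELECT s.ts, ua.id
    FROM clicks_staging s
    JOIN user_agents ua USING (user_agent);

    TRUNCATE TABLE clicks_staging;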
The generic answer is that if you considered a data item so important that you created a lookup table for the possible values, then you should create a foreign key relationship to ensure you are not getting any orphan records.
However, you should reconsider whether every data item (field) in your clicks table needs a lookup table. For example, the ip_id field presumably represents an IP address. You can simply store the IP address directly in the clicks table; you do not really need a lookup table, since IP addresses span a huge range of values and are close to unique per row, so normalizing them gains you little.
Based on this re-evaluation of the fields, you may be able to reduce the number of related tables, and thus the number of foreign keys and indexes.
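A sketch of that change (the FK constraint name is a guess; INET6_ATON requires MySQL 5.6+):

    -- Store the address inline; VARBINARY(16) holds both IPv4 and IPv6.
    ALTER TABLE clicks
        DROP FOREIGN KEY clicks_ibfk_3,   -- whatever the real name is
        DROP COLUMN ip_id,
        ADD COLUMN ip VARBINARY(16);

    -- INSERT INTO clicks (..., ip) VALUES (..., INET6_ATON('203.0.113.9'));
    -- SELECT INET6_NTOA(ip) FROM clicks;  -- convert back for display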
Here are three things to consider:
What is the ratio of reads to writes on this table? If you are reading much more often than writing, then more indexes could be good, but if it is the other way around then the cost of maintaining those indexes becomes harder to bear.
Are some of the foreign keys not very selective? If you have an index on a gender_id column, it is probably a waste of space. My general rule is that an index without included columns should cover about 1000 distinct values (unless the values are unique), and then tweak from there.
Are some foreign keys rarely or never going to be used as a filter for a query? If you have a last_modified_user_id field but you never have any queries that will return a list of items which were last modified by a particular user then an index on that field is less useful.
A little bit of knowledge about indexes can go a long way. I recommend http://use-the-index-luke.com
Say I have a table called 'child' that contains a foreign key referencing another table called 'parent'. If the parent table contains column values I frequently want to access when SELECTing from the child table, is it better to JOIN the tables on the foreign key, or to store a second copy of the frequently accessed parent columns in the child table?
Sometimes I also have a third 'grandchild' table that references the child table, and I want a mixture of information from all three tables. A triple JOIN seems like I'm overcomplicating it.
I feel like there's a much better way to go about this. If anyone has advice or some good resources on this topic, let me know.
This question is based on premature optimization, which is bad.
You're talking about denormalization, which should only be done if there's a genuine and pressing performance problem. While your idea sounds enticing, it's almost always a bad idea, because:
you're only doing it for performance reasons, but databases are pretty fast - you're unlikely to benefit much anyway
denormalizing introduces complexity - if a value changes in the parent, you must keep every copy of it in the child rows updated. This is a big hassle (not going into detail here)
you don't even know if you have a performance problem: if it ain't busted, don't fix it
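As for the triple JOIN from the question: it is routine, and with the foreign keys indexed it is exactly what the optimizer is built for. A sketch with assumed table and column names:

    SELECT g.note, c.title, p.name
    FROM grandchild g
    JOIN child  c ON c.id = g.child_id
    JOIN parent p ON p.id = c.parent_id
    WHERE g.id = 123;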
I have some very write-heavy tables (user tracking tables) which will be written to nonstop. The problem is that in a fully normalized schema I will have 16 foreign keys. Some keys are purely for lookup references; some are important, like the linked user ID, user session ID, activity ID, etc.
With this many FKs on a write-intensive table, performance is an issue (I have a user content website which needs near-real-time updates). So I am planning to drop all FKs on these write-intensive tables, but before I do I want to know: how else can I link data? When people say "in the code", what exactly are we doing at the code level to keep data linked together, since I assume we cannot have relationships in the application?
Secondly, if I don't use FKs, I assume the data will still be consistent as long as the correct ID is written? It's not as if, with no FK, a member ID of 2000 will somehow be written as 3000 for whatever reason?
Lastly, this will not affect joins, right? While I hope to avoid joins, I may need some. But I assume joins can still be done as usual, FKs or not?
Secondly, if I don't use FKs, I assume the data will still be consistent as long as the correct ID is written?
Yes.
Lastly, this will not affect joins, right?
Right.
When people say "in the code", what exactly are we doing at the code level to keep data linked together?
This is the real question. Actually, there are really two questions:
1) How confident are you that the incoming values are all valid and do not need to be checked?
2) How big are the lookup tables being referenced?
If the answers are "not very confident" and "really small", then you can enforce integrity in code by caching the lookup tables in the app layer and checking against these super-fast in-memory tables before inserting. However, consider this: the database will also cache those small tables, so it might still be simpler to keep the FKs.
If the answers are "not very confident" and "really huge", then you have a choice: drop the FK constraints, knowingly insert bad values, and do some post-hoc cleanup, or keep the FKs in the database, because otherwise you will accumulate all of that bad data.
For this combination it is not practical to cache the tables in the app, and if you drop the FKs and do the lookups from the app instead, it is even slower than having the FKs in the database.
If the answer to the first question is "100% confident", then the second question does not matter. Drop the FKs and insert the data with speed and confidence.
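To make "enforcing in code" concrete, the application typically issues something like this (invented names and ids):

    -- Without an FK, the app does its own existence check first...
    SELECT 1 FROM members WHERE id = 2000;

    -- ...and only writes the child row if a match came back.
    INSERT INTO activity_log (member_id, activity_id) VALUES (2000, 57);

    -- Note the race: the member could be deleted between the two statements.
    -- A real FK constraint closes that gap; app-side checks do not.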
I want to create a new table for each new user on the web site, and I assume that there will be many users. I am sure that search performance will be good, but what about maintenance?
It is MySQL, which has no limit on the number of tables.
Thanks a lot.
Actually, tables are stored in a table too. So in this case you would just be moving the search from a table of users to a search through the system tables for the right user's table.
Performance AND maintainability will suffer badly.
This is not a good idea:
The maximum number of tables is unlimited, but the table cache is finite in size and opening tables is expensive. In MyISAM, closing a table throws away its key cache. Performance will suck.
When you need to change the schema, you will need to do one ALTER TABLE per user, which will be an unnecessary pain
Searching for things across all (or many) users will involve a horrible UNION query spanning all of those per-user tables (see the sketch after this list)
It will be difficult to construct foreign key constraints correctly, as you won't have a single table holding all the user ids any more
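To make the UNION point concrete (invented names):

    -- Per-user tables: one branch per user, rebuilt every time someone signs up.
    SELECT * FROM items_user_1 WHERE created_at > '2024-01-01'
    UNION ALL
    SELECT * FROM items_user_2 WHERE created_at > '2024-01-01'
    UNION ALL
    SELECT * FROM items_user_3 WHERE created_at > '2024-01-01';

    -- One shared table: the same question is a single indexed query.
    SELECT * FROM items WHERE created_at > '2024-01-01';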
Why are you sure that performance will be good? Have you tested it?
Why would you possibly want to do this? Just have one table for each thing that needs a table, and add a "user" column. Having a bunch of tables vs a bunch of rows isn't going to make your performance better.
To give you a direct answer to your question: maintenance will lower your enthusiasm at the same rate that new users sign up for your site.
Not sure what language/framework you are using for your web site, but at this stage it is best to look up some small examples in it. Our guess is that in every example you'll find, every new user gets one record in a table, not a table in the database.
I would go with option 1 (a table called tasks with a user_id foreign key) in the short run, assuming that a task can't have more than one user. If a task can have multiple users, then you'll need a JOIN table. Check into setting up an actual foreign key as well; this promotes referential integrity in the data itself.
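A sketch of both shapes, assuming a users table already exists:

    -- Option 1: each task belongs to exactly one user.
    CREATE TABLE tasks (
        id BIGINT AUTO_INCREMENT PRIMARY KEY,
        user_id BIGINT NOT NULL,
        title VARCHAR(200) NOT NULL,
        FOREIGN KEY (user_id) REFERENCES users(id)
    );

    -- If a task can have several users, move the link into a join table
    -- (and drop tasks.user_id):
    CREATE TABLE task_users (
        task_id BIGINT NOT NULL,
        user_id BIGINT NOT NULL,
        PRIMARY KEY (task_id, user_id),
        FOREIGN KEY (task_id) REFERENCES tasks(id),
        FOREIGN KEY (user_id) REFERENCES users(id)
    );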