Is having extra data in database tables considered as a bad practice? - mysql

Consider the following scenario:
chapter(chapter_id*,book_id, chapter_no)
book(book_id*)
user(user_id*)
position(user_id*,book_id,chapter_id*)
* = (component of) PRIMARY KEY
When I want to know in which chapter of a book the user currently is, I simply query the Position table with certain User and Book. But the problem is that the foreign key to Book is garbage here because the Chapter specifies the Book already!
What should I do now?

It always depends on the queries you will run against the database. You have to decide whether duplicating a value is worth it to avoid a JOIN.
Given what you describe, I would drop the chapter table and simply use
position(user_id*,book_id,chapter_no)
Would that be enough for your application?

I think your design is valid, and position.book_id isn't "garbage" at all. You wrote:
When I want to know in which chapter of a book the user currently is [..]
So it sounds like a user can have only one "position" per book, and your unique/primary key should be user_id,book_id instead of user_id,chapter_id.
In order to prevent a foreign key mismatch (book_id and chapter_id do not match) you can define a compound foreign key (book_id, chapter_id) referencing chapter(book_id, chapter_id). In InnoDB the referenced columns need an index on chapter(book_id, chapter_id) for this to work.
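To illustrate the compound foreign key, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs anywhere), keeps the question's table and column names, and leaves out the user table for brevity. The helper set_position is hypothetical.

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE book (book_id INTEGER PRIMARY KEY);
CREATE TABLE chapter (
    chapter_id INTEGER PRIMARY KEY,
    book_id    INTEGER NOT NULL REFERENCES book (book_id),
    chapter_no INTEGER NOT NULL,
    UNIQUE (book_id, chapter_id)      -- target of the compound foreign key
);
CREATE TABLE position (
    user_id    INTEGER NOT NULL,      -- FK to user omitted for brevity
    book_id    INTEGER NOT NULL,
    chapter_id INTEGER NOT NULL,
    PRIMARY KEY (user_id, book_id),   -- one position per user and book
    FOREIGN KEY (book_id, chapter_id) REFERENCES chapter (book_id, chapter_id)
);
INSERT INTO book VALUES (1), (2);
INSERT INTO chapter VALUES (10, 1, 1), (20, 2, 1);
""")


def set_position(user_id, book_id, chapter_id):
    """Hypothetical helper: True if the row was accepted, False if rejected."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO position VALUES (?, ?, ?)",
                (user_id, book_id, chapter_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok_matching = set_position(1, 1, 10)  # chapter 10 belongs to book 1
ok_mismatch = set_position(1, 2, 10)  # chapter 10 does not belong to book 2
```

The mismatched pair is rejected by the database itself, so position.book_id can never disagree with the chapter it points to.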

Related

Database Design - Custom attributes table - Table that "relate" entities

I'm designing a database (for use in mysql) that permits new user-defined attributes to an entity called nodes.
To accomplish this I have created 2 other tables. One customvars table that holds all custom attributes and a *nodes_customvars* that define the relationship between nodes and customvars creating a 1..n and n..1 relationship.
Here is the link to the model I drew: Sketched database model
So far so good... But I'm not able to properly handle INSERTs and UPDATEs using separate IDs for each table.
For example, if I have a custom attribute called color in the *nodes_customvars* table inserted for a specific node, if I try to "INSERT ... ON DUPLICATE KEY UPDATE" either it will always insert or always update.
I've thought about removing the "ID" field from the *nodes_customvars* table and making the primary key a composite of nodes id and customvars id, but I'm not sure if this is the best solution...
I've read this article, and the comments, as well: http://weblogs.sqlteam.com/jeffs/archive/2007/08/23/composite_primary_keys.aspx
What is the best solution to this?
EDIT:
Complementing: I don't know the *nodes_customvars* id, only nodes id and customvars id. Analysing the *nodes_customvars* table:
1- If I make nodes id and/or customvars id UNIQUE in this table, "INSERT ... ON DUPLICATE KEY UPDATE" will always UPDATE. Since multiple nodes can share the same customvar, this is wrong;
2- If I don't define any UNIQUE key, "INSERT ... ON DUPLICATE KEY UPDATE" will always INSERT, since the statement contains no UNIQUE key to collide on...
You have two options for solving your specific problem of the "INSERT...ON DUPLICATE KEY" either always inserting or updating as you describe.
Change the primary key to be a composite key using nodeId and customvarId (as suggested by SyntaxGoonoo and in your question as a possible option).
Add a composite unique index using nodeId and customvarId.
CREATE UNIQUE INDEX IX_NODES_CUSTOMVARS ON NODES_CUSTOMVARS(nodeId, customvarId);
Both of the options would allow for the "INSERT...ON DUPLICATE KEY" functionality to work as you require (INSERT if a unique combination of nodeId and customvarId doesn't exist; update if it does).
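A rough sketch of the second option follows. It runs on Python's built-in sqlite3 rather than MySQL (an assumption so that it is self-contained), so it uses SQLite's ON CONFLICT ... DO UPDATE (SQLite 3.24 or later) where MySQL would use INSERT ... ON DUPLICATE KEY UPDATE. The value column and the upsert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes_customvars (
    id          INTEGER PRIMARY KEY,  -- surrogate key kept, as in the question
    nodeId      INTEGER NOT NULL,
    customvarId INTEGER NOT NULL,
    value       TEXT                  -- hypothetical payload column
);
CREATE UNIQUE INDEX IX_NODES_CUSTOMVARS
    ON nodes_customvars (nodeId, customvarId);
""")


def upsert(node_id, customvar_id, value):
    """Insert the pair, or update its value if the pair already exists."""
    with conn:
        conn.execute(
            """
            INSERT INTO nodes_customvars (nodeId, customvarId, value)
            VALUES (?, ?, ?)
            ON CONFLICT (nodeId, customvarId)
            DO UPDATE SET value = excluded.value
            """,
            (node_id, customvar_id, value),
        )


upsert(1, 7, "red")    # new pair: inserted
upsert(2, 7, "green")  # same customvar on another node: inserted
upsert(1, 7, "blue")   # existing pair: updated in place

rows = conn.execute(
    "SELECT nodeId, customvarId, value FROM nodes_customvars ORDER BY nodeId"
).fetchall()
```

Because uniqueness is defined on the pair rather than on either column alone, two nodes can share a customvar while a repeated pair turns into an update.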
As for whether to use a composite primary key or a separate primary key column with an additional unique index, there are many things to consider in the design: 1NF considerations, the physical characteristics of the database platform you're on, and the preferences of the ORM you happen to be using (if any). Given how InnoDB secondary indexes work (see last paragraph at: http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html), I would suggest that you keep the design as you currently have it and add in the additional unique index.
HTH,
-Dipin
Your current entity design breaks 1NF. This means that your schema can erroneously store duplicate data.
nodes_customvars describes the many-to-many relationship between nodes and customvars. This type of table is sometimes referred to as an auxiliary table, because its contents are purely derived from base tables (in this case nodes and customvars).
The PK for an auxiliary table describing a many-to-many relationship should be a composite key in order to prevent duplication. Basically 1NF.
Any PK on a table is inherently UNIQUE, regardless of whether it is a single or a composite key. So in some ways your question doesn't make sense, because you are talking about turning the UNIQUE constraint on/off on id for nodes and customvars, which you can't do if your id is actually a PK.
So what are you actually trying to achieve here???

Renaming foreign keys to fit the context of a table

When using a foreign key in a table, is it good form to change the name of the key for that table to make it clear what function the key performs in the table, or is it good form to retain the original name, to make it clear that it is a foreign key?
Example:
a table keeps track of users, the primary key is user_id
a second table stores articles on the website and keeps track of the author with the foreign key user_id.
In the context of the second table it would make more sense to call the foreign key author. In the context of the whole database it would make more sense to call the foreign key user_id
Is there a general convention that deals with this situation, or is that what comments are for?
Well, if you have a movie table you wouldn't want columns called person_id and person_id, but rather producer and director, or perhaps producer_id and director_id, or maybe producer_person_id and director_person_id.
I know movies can have multiple directors and multiple producers; this was just an example. Any case in which a table has two foreign keys to the same table will show you that you cannot in principle stick completely to a convention of using only the table name in the column name. You can use both (as in the producer_person_id example) but that leads to long column names.
Don't use comments. No one reads them. Okay that was just snark, perhaps, but in general favor descriptive names to comments!
Aside from the two-foreign-key issue, I'm not really aware of any universally accepted convention.
What matters is knowing how the schema is modelled and designed, so use whatever makes sense to the database administrator. Business logic is not concerned with how database columns are named, only with the results. If it makes more sense to the database administrator to name the foreign key author_id and have it refer to user_id in another table, then do so, and note in the documentation that T2.author_id must exist in T1.user_id. When transitioning from modelling to designing the database (which is where you are now) it makes sense to keep things simple, but you can rename foreign keys as long as you can remember them (and document them as well).

Sql simple beginner operation

I have a table named USERS with user_id as primary key and user_name.
I have another table USERS_ACT with user_act_id primary key, user_act_user_id and another 2 columns.
I need user_act_user_id to be foreign key in USERS? How can I achieve this?
This is my first day in SQL so please be kind to explain if what I ask is wrong.
let's assume you are not the DB admin and you just want to get all the active users' names ;))
select users.user_name
from users
join users_act on users.user_id = users_act.user_act_user_id
Without referential integrity it's up to you to make it work; there's no "magic" around it.
Populate your user_act_user_id with a pk-value from USERS and there you have it.
You may want to add constraints, but that may not be what you're asking for,
http://msdn.microsoft.com/en-us/library/ms175464.aspx
In short, they keep the keys between tables in good shape.
Assuming you are using InnoDB (which is the only engine that supports foreign keys):
ALTER TABLE users_act
ADD CONSTRAINT fk_users_act_users
FOREIGN KEY (user_act_user_id)
REFERENCES users (user_id);
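To see what such a constraint buys you, here is a small sketch. It uses Python's built-in sqlite3 in place of MySQL/InnoDB (an assumption so that it runs without a server). SQLite cannot add a foreign key with ALTER TABLE, so the same constraint is declared inside CREATE TABLE, and the two unnamed extra columns of USERS_ACT are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE users (
    user_id   INTEGER PRIMARY KEY,
    user_name TEXT NOT NULL
);
CREATE TABLE users_act (
    user_act_id      INTEGER PRIMARY KEY,
    user_act_user_id INTEGER NOT NULL,
    CONSTRAINT fk_users_act_users
        FOREIGN KEY (user_act_user_id) REFERENCES users (user_id)
);
INSERT INTO users VALUES (1, 'alice');
INSERT INTO users_act VALUES (100, 1);
""")

# A row pointing at a user that does not exist is refused by the database.
try:
    with conn:
        conn.execute("INSERT INTO users_act VALUES (101, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The join from the earlier answer works the same with or without the constraint.
active_names = [
    row[0]
    for row in conn.execute(
        "SELECT users.user_name FROM users "
        "JOIN users_act ON users.user_id = users_act.user_act_user_id"
    )
]
```

The constraint does not change how you query; it only guarantees that every user_act_user_id has a matching row in USERS.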
Whether MySQL supports foreign keys depends on the storage engine. For example, you can use foreign keys with InnoDB but not with MyISAM.
When working with MySQL I personally prefer MyISAM and do most of the integrity checking in application code.
In general you can just add user_act_user_id to your table USERS_ACT without marking it as any kind of key. After that you can simply use a JOIN, but of course referential integrity is not guaranteed, so you have to write your own "trigger" on the application side if you want, for example, to automatically delete data belonging to a user in the other table. Otherwise you have to use constraints or triggers, but this might not be that easy when you have just started with SQL.

Does MySQL require a primary key for a many-to-many link table?

Note to Mod: I read through about a dozen posts that seemed to pertain to this issue, but none of them answered my question. Please do not flag this post for deletion; this is not a duplicate question.
I am building a database for a web-gallery that will contain many-to-many relationships. For example, tags and images. Obviously, to accomplish this a third, link, table will be created. I can see a use for having a primary key column in the tags table and the images table, but I can't imagine a use for it in the links table. It would just take up server space. So, I'm thinking of just not having a primary key column in the links table. Does MySQL allow this? Or, would there be any compelling reason to have a primary key in the links table? Thanks.
Link Table:
+--------------+---------+-----------+
| primary key? | tag ids | image ids |
+--------------+---------+-----------+
Clarification
Will not having a primary key in a table break the database?
There is no requirement that you have a primary key.
However, there is also no requirement that a primary key be only one field. In this case you might declare your primary key to be (tag_id, image_id).
You've got a question in reply to another post that gives me the idea that maybe you're thinking you should concatenate the two fields to make the primary key. Don't. Define the key as
alter table link add primary key (tag_id, image_id);
Do NOT say
alter table link add primary key (tag_id + image_id);
(In MySQL "+" is arithmetic addition, not concatenation; strings are joined with CONCAT(). The SQL-standard concatenation operator is "||", which MySQL treats as logical OR by default.)
There's a big difference between the two: in the first case, 25,34 and 253,4 are two different values, while a concatenated key would turn both into 2534.
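The collision is easy to reproduce. The sketch below uses Python with its built-in sqlite3 as a stand-in for MySQL (an assumption so that it is self-contained). It concatenates the two example pairs as text and then stores the same pairs under a composite primary key.

```python
import sqlite3

pairs = [(25, 34), (253, 4)]

# Concatenating the ids as text collapses both pairs into a single value.
concatenated = {f"{tag_id}{image_id}" for tag_id, image_id in pairs}

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE link (
        tag_id   INTEGER NOT NULL,
        image_id INTEGER NOT NULL,
        PRIMARY KEY (tag_id, image_id)  -- composite key: columns stay separate
    )
    """
)
with conn:
    conn.executemany("INSERT INTO link VALUES (?, ?)", pairs)

stored = conn.execute("SELECT COUNT(*) FROM link").fetchone()[0]
```

Both rows are accepted under the composite key, while the concatenated form leaves only one distinct key for the two pairs.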
Will you always go from tag to image, or will you also want to go from image to tag? If you need to go in both directions, then you should create two indexes, or a primary key and an index, with the fields in both directions. Like:
create index link_tag_image on link(tag_id, image_id);
create index link_image_tag on link(image_id, tag_id);
If you make only the first (for example), then consider this query:
select tag.name
from image
join link on image.image_id=link.image_id
join tag on tag.tag_id=link.tag_id
where image.foo='bar'
This seems plausible enough: find all the tags that match images that meet a certain condition. But without the second index, this query could take a very long time, because the db will have to read the entire link table sequentially to find all the records with a given image_id.
There is no need for a primary key in the link table, although a compound key is a good idea. Uniqueness can be achieved by using UNIQUE (tag_ids, image_ids)
Yes, your primary key should be a compound/composite key of tag_id and image_id, i.e. PRIMARY KEY (tag_id, image_id). There's no need for an extra autoincrement column in this case.
When working with MySQL Workbench it's highly advisable because without a primary key it won't allow any access to your tables other than read only, which is a pain when trying to test your database. Although it does seem wasteful to have a PK that is never going to be referenced in a relationship.

simple DB design question

This is probably a very stupid question, but I am just not sure which solution is the most elegant and the best(most performant) way to go in the following scenario.
I have the following tables:
Customer, Company, Meter, Reading
All of the tables above are supposed to be linked to one or more records of a "Comment" table. Which is the best way to model this relationship?
I am seeing two solutions here:
1.) use m:n relationships: CustomerComment, CompanyComment, etc. -> easy to extend later on, but a lot of new tables
2.) use 1:n relationships: Comment table has a field for the PK of the tables above (Customer_id, Company_id, ...) -> minimal table approach, but "harder" to extend since I would have to add a new field to the comment table whenever there is a new table that needs to have comments
The target is a modular application, which may or may not have any of those four tables.
Which one is the better one - or are there more?
Thanks!
This is the problem with using integers for primary keys. You have a few solutions you can use.
Make the true unique ID for any given row of Customer, Company, Meter, Reading a UUID. Maybe the database design requires the primary key to be an integer, but that is OK. This means you never have to add fields to the COMMENTS table if you have a new type in your system. It will always reference rows by the type's ID.
Your tables can look like this:
CUSTOMER
    ID UUID
COMPANY
    ID UUID
METER
    ID UUID
COMMENTS
    ID
    RELATED_TO UUID
    COMMENT TEXT
Now you can attach comments to any table that has a unique ID.
If you want to support referential constraints, add an OBJECT table.
OBJECT is a table that holds the IDs of all the pieces of data you have in your system. We really start building a system in which you can associate any comment with anything you want. This may not be suitable in your design.
OBJECT
    ID UUID
CUSTOMER
    ID UUID
    FOREIGN_KEY (ID) REFERENCES OBJECT(ID) ON DELETE CASCADE
COMPANY
    ID UUID
    FOREIGN_KEY (ID) REFERENCES OBJECT(ID) ON DELETE CASCADE
METER
    ID UUID
    FOREIGN_KEY (ID) REFERENCES OBJECT(ID) ON DELETE CASCADE
COMMENTS
    ID
    RELATED_TO UUID
    COMMENT TEXT
    FOREIGN_KEY (RELATED_TO) REFERENCES OBJECT(ID) ON DELETE CASCADE
This complicates the design in order to ensure you don't need to add 2 tables for each new type in the system. Each design has sacrifices. In this one you've complicated things by requiring that every entry, whether it be Company, Customer or Meter, has an associated ID in the Object table so you can put a foreign key on it.
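A minimal sketch of the OBJECT variant might look like this. It uses Python's built-in sqlite3 instead of MySQL (an assumption so that it runs as-is), stores UUIDs as text, and shows only CUSTOMER; the name column and the sample values are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
CREATE TABLE object (id TEXT PRIMARY KEY);
CREATE TABLE customer (
    id   TEXT PRIMARY KEY REFERENCES object (id) ON DELETE CASCADE,
    name TEXT                              -- hypothetical column
);
CREATE TABLE comments (
    id         INTEGER PRIMARY KEY,
    related_to TEXT NOT NULL REFERENCES object (id) ON DELETE CASCADE,
    comment    TEXT NOT NULL
);
""")

customer_id = str(uuid.uuid4())
with conn:
    # Every entity row needs its OBJECT row first; that is the extra cost.
    conn.execute("INSERT INTO object VALUES (?)", (customer_id,))
    conn.execute("INSERT INTO customer VALUES (?, ?)", (customer_id, "ACME"))
    conn.execute(
        "INSERT INTO comments (related_to, comment) VALUES (?, ?)",
        (customer_id, "pays late"),
    )

comments_before = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]

# Deleting the OBJECT row cascades to the customer and to its comments.
with conn:
    conn.execute("DELETE FROM object WHERE id = ?", (customer_id,))

remaining = conn.execute(
    "SELECT (SELECT COUNT(*) FROM customer) + (SELECT COUNT(*) FROM comments)"
).fetchone()[0]
```

A new commentable type only needs its own table with a foreign key to OBJECT; COMMENTS itself stays unchanged.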
I prefer one for each pair - CustomerComment, CompanyComment, etc. It eventually will speed up your queries, and while it isn't as 'extensible' as a single CommentLink table, you'll need to make schema changes when you add something else that needs comments anyway.
I would use separate tables; that way you can keep the referential constraints simple.