I've seen a lot of discussion about this, and I'm looking for your suggestions. I'm using PHP and MySQL. I have a users table that looks like this:
users
------------------------------
uid(pk) | username | password
------------------------------
12 | user1 | hashedpw
------------------------------
and another table which stores updates by the user
updates
--------------------------------------------
uid | date | content
--------------------------------------------
12 | 2011-11-17 08:21:01 | updated profile
12 | 2011-11-17 11:42:01 | created group
--------------------------------------------
The user's profile page will show the 5 most recent updates of a user. The questions are:
1. For the updates table, would it be possible to make uid and date a composite primary key, with uid referencing uid from users?
2. Or would it be better to just create another column in updates which auto-increments and is used as the primary key (while uid is an FK to uid in users)?
Your idea (under 1.) rests on the assumption that a user can never do two "updates" within one second. That is very poor design. You never know what functions you will implement in the future, but chances are that some day 1 click leads to 2 actions and therefore 2 lines in this table.
I say "updates" quoted because I see this more as a logging table. And who knows what you may want to log somewhere in the future.
As for unusual primary keys: don't do it, it almost always comes right back in your face and you have to do a lot of work to add a proper autoincremented key afterwards.
It depends on the requirement but a third possibility is that you could make the key (uid, date, content). You could still add a surrogate key as well but in that case you would presumably want to implement both keys - a composite and a surrogate - not just one. Don't make the mistake of thinking you have to make an either/or choice.
Whether it is useful to add the surrogate or not depends on how it's being used - don't add a surrogate unless or until you need it. In any case uid I would assume to be a foreign key referencing the users table.
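A minimal sketch of the surrogate-key variant, to make the "two updates in one second" problem concrete. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption, chosen only so the example runs anywhere); the id column, the index name, and the third row ("joined group") are made up for illustration.

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    uid      INTEGER PRIMARY KEY,
    username TEXT NOT NULL
);
CREATE TABLE updates (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key (hypothetical name)
    uid     INTEGER NOT NULL REFERENCES users(uid),
    date    TEXT NOT NULL,
    content TEXT NOT NULL
);
CREATE INDEX idx_updates_uid_date ON updates (uid, date);
""")

conn.execute("INSERT INTO users (uid, username) VALUES (12, 'user1')")
rows = [
    (12, "2011-11-17 08:21:01", "updated profile"),
    (12, "2011-11-17 11:42:01", "created group"),
    # Same user, same second: a (uid, date) primary key would reject this row.
    (12, "2011-11-17 11:42:01", "joined group"),
]
conn.executemany("INSERT INTO updates (uid, date, content) VALUES (?, ?, ?)", rows)


def recent_updates(uid, limit=5):
    """Return the newest updates for one user, newest first."""
    cur = conn.execute(
        "SELECT date, content FROM updates "
        "WHERE uid = ? ORDER BY date DESC, id DESC LIMIT ?",
        (uid, limit),
    )
    return cur.fetchall()
```

Calling recent_updates(12) gives the rows for the profile page. The id column also breaks ties between rows that share a timestamp, so the ordering stays stable.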
Related
I am creating an application that involves a friend system such as the one in Facebook. The way I structured this in my SQL database is by having a friend table with the columns ID, accountID1, accountID2, so that each of the two accounts involved in the friendship is recorded. The problem is that a friendship can be recorded in two different ways, for example:
ID | accountID1 | accountID2
1 | 1 | 2
2 | 2 | 1
If I make the combination unique, it does not protect against this. How can I create a constraint in MySQL that prevents a friendship from being stored in two different ways, to ensure data integrity? Or is there a different way of storing this information that avoids the problem in the first place?
The final solution I used was to get rid of the ID for the friends table and make a composite primary key out of the two account IDs: PrimaryKey(accountID0, accountID1). This ensures that the combination is unique. Then I created a "before insert" trigger that switches the values so that the smaller account ID is always in accountID0. This method has worked perfectly and caused no problems so far.
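A rough sketch of the same idea, again with Python's sqlite3 standing in for MySQL (an assumption). Note the swap: instead of the before-insert trigger described above, the ordering is done in application code, and a CHECK constraint keeps unordered pairs out of the table. The function name add_friendship is hypothetical. Also be aware that MySQL versions before 8.0.16 parse CHECK constraints but do not enforce them, which is one reason a trigger was a sensible choice there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE friends (
    accountID0 INTEGER NOT NULL,
    accountID1 INTEGER NOT NULL,
    PRIMARY KEY (accountID0, accountID1),
    CHECK (accountID0 < accountID1)  -- smaller ID always comes first
)
""")


def add_friendship(a, b):
    """Store the pair in canonical order.

    Returns False if the friendship already exists or if a == b.
    """
    low, high = min(a, b), max(a, b)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO friends (accountID0, accountID1) VALUES (?, ?)",
                (low, high),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this in place, add_friendship(1, 2) followed by add_friendship(2, 1) stores a single row, because both calls map to the pair (1, 2).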
We have a SQL database with a table (1) for users and a table (2) for users' saved information. Each piece of information is one row in table (2). My question is the following: if we intend to grow the number of users to more than 1.000.000, and each user can have more than 10 pieces of information, which of the following is the better way to build our DB?
a) Having 2 tables - 1 for users and 1 for information from all users, related to users with ID
b) Having a separate table for each user.
Thanks in advance.
A single table for all users' information is definitely the better choice. Think about it from the DB's perspective. In the first case, you are searching 1.000.000 (or more) rows by a sorted ID. In the second case, you first have to search through 1.000.000 tables to find the right one. So go for option A.
I'm going to agree that option A is the better of the two options presented.
That being said, I would personally break up the information for the users into more tables as well. This would all be connected using foreign keys and will allow for more specific querying of the information.
SQL is not really horizontally scalable, so if some users end up with less or more information than others, you'll have NULL columns, and these have to be dealt with in various ways.
By using separate tables, you can still have all of the information contained, but not have to worry if one user has a home and cell phone number, while another only has a cell number.
If and when you do need to access a lot of the information at once, SQL is very good at dealing with this through joins and the like.
Option B is not bad, it just does not fit SQL. It would work if the DB in question were document based instead of table based. In that case, creating a single document for each user is a good idea, and likely preferred.
Option C)
table for users with a unique UserID as Clustered Index (Primary Key)
table for Type of saved information with a unique InformationID as Clustered Index (Primary Key)
table for UserInformation with unique UserInformationID as Clustered Index (Primary Key), a column for UserID (nonclustered index, foreign key to user table) and a column for InformationID (nonclustered index, foreign key to Information table). Have a "Value" or similar column to hold the data being saved as it relates to the type of information.
Example:
Users Table
UserID | UserName
1 | UserName1
2 | UserName2
Information Table
InfoID | InfoName
1 | FavoriteColor
2 | FavoriteNumber
3 | Birthday
UserInformation Table
ID | UserID | InfoID | Value
1 | 1 | 1 | Blue
2 | 1 | 2 | 7
3 | 1 | 3 | '11/01/1999'
4 | 2 | 3 | '05/16/1960'
This method allows for you to save any combination of values for any user without recording any of the non-supplied user information. It keeps the information table 'clean' because you won't need to keep adding columns for each new piece of information you wish to track. Just add a new record to the Info table, and then record only the values submitted to the UserInformation table.
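Here is a small runnable sketch of Option C using the example data above. Python's sqlite3 is an assumption (the answer talks about clustered indexes, which SQLite does not expose in the same way), and the index names and the helper user_profile are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Users (UserID INTEGER PRIMARY KEY, UserName TEXT NOT NULL);
CREATE TABLE Information (InfoID INTEGER PRIMARY KEY, InfoName TEXT NOT NULL);
CREATE TABLE UserInformation (
    ID     INTEGER PRIMARY KEY,
    UserID INTEGER NOT NULL REFERENCES Users(UserID),
    InfoID INTEGER NOT NULL REFERENCES Information(InfoID),
    Value  TEXT NOT NULL
);
-- Secondary indexes on the two foreign key columns (hypothetical names).
CREATE INDEX idx_userinfo_user ON UserInformation (UserID);
CREATE INDEX idx_userinfo_info ON UserInformation (InfoID);

INSERT INTO Users VALUES (1, 'UserName1'), (2, 'UserName2');
INSERT INTO Information VALUES
    (1, 'FavoriteColor'), (2, 'FavoriteNumber'), (3, 'Birthday');
INSERT INTO UserInformation (UserID, InfoID, Value) VALUES
    (1, 1, 'Blue'), (1, 2, '7'), (1, 3, '11/01/1999'), (2, 3, '05/16/1960');
""")


def user_profile(user_id):
    """Return {InfoName: Value} for the values a user actually supplied."""
    cur = conn.execute(
        "SELECT i.InfoName, ui.Value "
        "FROM UserInformation AS ui "
        "JOIN Information AS i ON i.InfoID = ui.InfoID "
        "WHERE ui.UserID = ?",
        (user_id,),
    )
    return dict(cur.fetchall())
```

One trade-off worth keeping in mind: every Value is stored as text, so type checking (numbers, dates) moves out of the schema and into the application.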
I'm making a site that will be a subscription based service that will provide users several courses based on whatever they signed up for. A single user can register in multiple courses.
Currently the db structure is as follows:
User
------
user_id | pwd | start | end
Courses
-------
course_id | description
User_course_subscription
------------------------
user_id | course_id | start | end
course_chapters
---------------
course_id | title | description | chapter_id | url |
The concern is that with the user_course_subscription table I don't know (at least at the moment) how to have one user with multiple course subscriptions, unless I enter the same user_id multiple times with a different course_id each time. Alternatively I could add many columns in the format calculus_1, chem_1 etc., but that would give me a ton of columns as the list of courses grows.
I was wondering if having the user_id put in multiple times is the most optimal way to do this? Or is there another way to structure the table (or maybe I'd have to restructure all the tables)?
Your database schema looks fine. Don't worry, you're on the right track. As for the User_course_subscription table, user_id and course_id together form the primary key. This is called a joint primary key (also known as a composite primary key) and is basically fine.
Values are still unique because no user subscribes to the same course twice. Your business logic code should ensure this anyway. For the database part: You might want to look up in your database system's manual how to set joint primary keys up properly when creating the table (syntax might differ).
If you don't like this idea, you can also create a pseudo primary key, that is having:
user_course_subscription
------------------------
user_course_subscription_id | user_id | course_id | start | end
...where user_course_subscription_id is just an auto-incremented integer. This way, you can use user_course_subscription_id to identify records. This might make things easier in some places of your code, because you don't always have to use two values.
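A minimal sketch of the joint primary key variant, using Python's sqlite3 as a stand-in for your actual database (an assumption). The columns are renamed to start_date and end_date here only because END is an SQL keyword; the function names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE user_course_subscription (
    user_id    INTEGER NOT NULL,
    course_id  INTEGER NOT NULL,
    start_date TEXT,   -- "start" in the question
    end_date   TEXT,   -- "end" in the question (END is an SQL keyword)
    PRIMARY KEY (user_id, course_id)
)
""")


def subscribe(user_id, course_id, start_date, end_date):
    """Insert one subscription; return False if this user/course pair exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO user_course_subscription VALUES (?, ?, ?, ?)",
                (user_id, course_id, start_date, end_date),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def courses_for(user_id):
    """List the course IDs one user is subscribed to."""
    cur = conn.execute(
        "SELECT course_id FROM user_course_subscription "
        "WHERE user_id = ? ORDER BY course_id",
        (user_id,),
    )
    return [row[0] for row in cur]
```

The same user_id appears once per course, which is exactly the repetition the question was worried about, and the key stops the same pair from being entered twice. If you go with the auto-incremented id instead, it is probably worth keeping a UNIQUE constraint on (user_id, course_id) so the database still rejects duplicates.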
As for having calculus_1, chem_1 etc.: don't do this. You might want to read up on database normalization, as mike pointed out. 1NF through 3NF in particular are very common in database design.
The only reason not to follow normal forms is performance, and then again, in most cases optimization is premature. If you're concerned, stress-test the prototype of your application under realistic (expected) conditions and measure response times to get some hard evidence.
I don't know what's the meaning of the start and end columns in the user table. But you seem to have no redundancy.
You should check out the Wikipedia article on Boyce-Codd normal form. There is a useful example.
I've created a database with three tables in it:
Restaurant
restaurant_id (autoincrement, PK)
Owner
owner_id (autoincrement, PK)
restaurant_id (FK to Restaurant)
Deal
deal_id (autoincrement)
owner_id (FK to Owner)
restaurant_id (FK to Restaurant)
(PK: deal_id, owner_id, restaurant_id)
There can be many owners for each restaurant. I chose two foreign keys for Deal so I can reference the deal by either the owner or the restaurant. The deal table would have three primary keys, two being foreign keys. And it would have two one-to-many relationships pointing to it. All of my foreign keys are primary keys and I don't know if I'll regret doing it like this later on down the road. Does this design make sense, and seem good for what I'm trying to achieve?
Edit: What I really need to accomplish here is this: when an owner is logged in and viewing their account, I want them to be able to see and edit all the deals associated with that particular restaurant. Because there can be more than one owner per restaurant, I need to be able to run a query something like: select * from deals where restaurant_id = restaurant_id. In other words, if I'm an owner and I'm logged in, I need to be able to get all of the deals that are related not just to me, the owner, but to all of the owners associated with this restaurant.
You're having some trouble with terminology.
A table can only ever have one primary key. It is not possible to create a table with two different primary keys. You can create a table with two different unique indexes (which are much like a primary key), but only one primary key can exist.
What you're asking about is whether you should have a composite or compound primary key; a primary key using more than one column.
Your design is okay, but as written you probably have no need for the column deal_id. It seems to me that restaurant_id and owner_id together are enough to uniquely identify a row in Deal. (This may not be true if one owner can have two different ownership stakes in a single restaurant as the result of recapitalization or buying out another owner, but you don't mention anything like that in your problem statement).
In this case, deal_id is largely wasted storage. There might be an argument to be made for using the deal_id column if you have many tables that have foreign keys pointing to Deal, or if you have instances in which you want to display to the user Deals for multiple restaurants and owners at the same time.
If one of those arguments sways you to adopt the deal_id column, then it, and only it, should be the primary key. There would be nothing added by including the other two columns since the autoincrement value itself would be unique.
If you have a unique field, this should be the PK; that would be the incremented field.
In this specific case it gives you nothing at all to add more fields to this key. It actually somewhat impacts performance (don't ask me how much, you'll have to benchmark it).
If you create two foreign keys in the Deal table, one to the restaurant and one to the owner, the logic is that a table could appear in a deal without an owner, or an owner could appear in a deal without the table being identified on it. You could still identify the table in that case, because it is used as a foreign key in the Owner table. But if you are going to fill in every column you defined as a foreign key, I think it becomes redundant. I'm not sure how you will use the Deal table later on, but going by its name I assume it records whether a restaurant table has been reserved by a customer. Given how you designed your database, you can already identify which table they reserved without making the table a foreign key in Deal, because the Owner table already carries it as a foreign key. You just have to be careful when defining relationships between your tables and avoid redundancy as much as possible. :)
I think it is not best.
First of all, the Deal table PK should be the deal_id. There is no reason to add additional columns to it, and if you did want to refer to a deal from another table, you'd have to include the restaurant_id and owner_id as well, which is not good. Whether deal_id should also be the clustered index (a.k.a. index organized on this column) depends on the data access pattern. Will deal_id values most often be used for lookup, or will you primarily be looking deals up by owner_id or restaurant_id?
Also, using two separate FKs the way you have described it (as far as I can tell!) would allow a deal to have an owner and restaurant combination that is not valid (an owner that does not belong to that restaurant). In the Deal table, instead of one FK to Owner and one FK to Restaurant, if you must have both columns, there should be a composite FK to only the Owner table on (OwnerID, RestaurantID), with a corresponding unique key in the Owner table to allow this link up.
However, with such a simple table structure I don't really see the problem in leaving RestaurantID out of the Deal table, since the OwnerID always fully implies the RestaurantID. Obviously your deals cannot be linked only with the restaurant, because that would imply a 1:M relationship on Deal:Owner. The cost of searching based on Restaurant through the Owner table shouldn't really be that bad.
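To illustrate the composite FK option, here is a rough sketch with Python's sqlite3 standing in for the real database (an assumption). Column names follow the question (owner_id, restaurant_id); the sample IDs and helper names are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Restaurant (restaurant_id INTEGER PRIMARY KEY);
CREATE TABLE Owner (
    owner_id      INTEGER PRIMARY KEY,
    restaurant_id INTEGER NOT NULL REFERENCES Restaurant(restaurant_id),
    UNIQUE (owner_id, restaurant_id)   -- target for the composite FK
);
CREATE TABLE Deal (
    deal_id       INTEGER PRIMARY KEY,  -- deal_id alone is the primary key
    owner_id      INTEGER NOT NULL,
    restaurant_id INTEGER NOT NULL,
    FOREIGN KEY (owner_id, restaurant_id)
        REFERENCES Owner (owner_id, restaurant_id)
);
INSERT INTO Restaurant VALUES (1), (2);
INSERT INTO Owner VALUES (100, 1), (101, 1), (200, 2);
""")


def add_deal(owner_id, restaurant_id):
    """Insert a deal; return False if the owner/restaurant pair is invalid."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Deal (owner_id, restaurant_id) VALUES (?, ?)",
                (owner_id, restaurant_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def deals_for_restaurant(restaurant_id):
    """All deals for a restaurant, regardless of which owner created them."""
    cur = conn.execute(
        "SELECT deal_id, owner_id FROM Deal "
        "WHERE restaurant_id = ? ORDER BY deal_id",
        (restaurant_id,),
    )
    return cur.fetchall()
```

With this layout, add_deal(100, 2) is rejected because owner 100 belongs to restaurant 1, and the "all deals for my restaurant" query from the question's edit needs no join.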
It's not wrong, it works. But it's not recommended.
Autoincrement primary keys work without foreign keys (or master keys).
In some databases, you cannot use several fields as a single primary key.
Compound (or composite) primary keys are more difficult to handle in a query.
Compound Primary Key Query Example:
SELECT
D.*
FROM
Restaurant AS R,
Owner AS O,
Deal AS D
WHERE
(1=1) AND
(D.RestaurantKey = R.RestaurantKey) AND
(D.OwnerKey = O.OwnerKey)
Versus
Single Primary Key Query Example:
SELECT
D.*
FROM
Owner AS O,
Deal AS D
WHERE
(D.OwnerKey = O.OwnerKey)
Sometimes you have to change a record's foreign key value so that it points to another record. For example, your customers have already ordered, the deal record is registered, and they decide to move from one restaurant table to another. The data must then be updated in both the "Owner" and "Deal" tables.
+-----------+-------------+
| OwnerKey | OwnerName |
+-----------+-------------+
| 1 | Anne Smith |
+-----------+-------------+
| 2 | John Connor |
+-----------+-------------+
| 3 | Mike Doe |
+-----------+-------------+
+-----------+-------------+-------------+
| OwnerKey | DealKey | Food |
+-----------+-------------+-------------+
| 1 | 1 | Hamburger |
+-----------+-------------+-------------+
| 2 | 2 | Hot-Dog |
+-----------+-------------+-------------+
| 3 | 3 | Hamburger |
+-----------+-------------+-------------+
| 1 | 3 | Soda |
+-----------+-------------+-------------+
| 2 | 1 | Apple Pie |
+-----------+-------------+-------------+
| 3 | 3 | Chips |
+-----------+-------------+-------------+
If you use compound primary keys, you have to create a new record for "Owner", and new records for "Deals", copy the other fields, and delete the previous records.
If you use single keys, you just have to change the foreign key of Table, without inserting or deleting new records.
Cheers.
Say I have the following table:
TABLE: product
============================================================
| product_id | name | invoice_price | msrp |
------------------------------------------------------------
| 1 | Widget 1 | 10.00 | 15.00 |
------------------------------------------------------------
| 2 | Widget 2 | 8.00 | 12.00 |
------------------------------------------------------------
In this model, product_id is the PK and is referenced by a number of other tables.
I have a requirement that each row be unique. In the example above, a row is defined to be the name, invoice_price, and msrp columns. (Different tables may have varying definitions of which columns define a "row".)
QUESTIONS:
1. In the example above, should I make name, invoice_price, and msrp a composite key to guarantee uniqueness of each row?
2. If the answer to #1 is "yes", this would mean that the current PK, product_id, would not be defined as a key; rather, it would be just an auto-incrementing column. Would that be enough for other tables to use to create relationships to specific rows in the product table?
3. Note that in some cases, the table may have 10 or more columns that need to be unique. That'll be a lot of columns defining a composite key! Is that a bad thing?
I'm trying to decide if I should try to enforce such uniqueness in the database tier or the application tier. I feel I should do this in the database level, but I am concerned that there may be unintended side effects of using a non-key as a FK or having so many columns define a composite key.
When you have a lot of columns that you need to create a unique key across, create your own "key" using the data from the columns as the source. This would mean creating the key in the application layer, but the database would "enforce" the uniqueness. A simple method would be to use the md5 hash of all the sets of data for the record as your unique key. Then you just have a single piece of data you need to use in relations.
md5 is not guaranteed to be unique, but it may be good enough for your needs.
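A sketch of the hash approach, with the hash computed in the application and the uniqueness enforced by the database. Python's sqlite3 and hashlib are assumptions (the question does not name a language), and the row_hash column, the separator choice, and the function names are hypothetical. Prices are passed as strings so that the same value always hashes the same way.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE product (
    product_id    INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    invoice_price TEXT NOT NULL,
    msrp          TEXT NOT NULL,
    row_hash      TEXT NOT NULL UNIQUE  -- hypothetical column holding the MD5
)
""")


def row_hash(name, invoice_price, msrp):
    """MD5 hex digest over the columns that define a unique row."""
    # The unit separator keeps ("ab", "c") distinct from ("a", "bc").
    joined = "\x1f".join([name, invoice_price, msrp])
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def add_product(name, invoice_price, msrp):
    """Insert a product; return False if an identical row already exists."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO product (name, invoice_price, msrp, row_hash) "
                "VALUES (?, ?, ?, ?)",
                (name, invoice_price, msrp, row_hash(name, invoice_price, msrp)),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The weak spot is normalization: "10.00" and "10.0" hash differently, so the application has to format values consistently before hashing.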
First off, your intuition to do it in the DB layer is correct if you can do it easily. This means even if your application logic changes, your DB constraints are still valid, lowering the chance of bugs.
But, are you sure you want uniqueness on that? I could easily see the same widget having different prices, say for sale items or what not.
I would recommend against enforcing uniqueness unless there's a real reason to.
You might have something like this (obviously, don't use * in production code)
# get the lowest price for an item that's currently active
select *
from product p
where p.name = "widget 1" # a non-primary index on product.name would be advised
and p.active
order by sale_price asc
limit 1
You can define composite primary keys and also unique indexes. As long as your requirement is met, defining composite unique keys is not a bad design. Clearly, the more columns you add, the slower it gets to update and search the keys, but if the business requirement calls for it, I don't think it is a negative, as database engines have highly optimized routines for this.
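One way to combine the two, sketched with Python's sqlite3 as a stand-in for the real database (an assumption): product_id stays the primary key that other tables reference, and a separate composite UNIQUE constraint covers the columns that define a row. The function name insert_product is hypothetical, and prices are kept as text only to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE product (
    product_id    INTEGER PRIMARY KEY,       -- still the key other tables use
    name          TEXT NOT NULL,
    invoice_price TEXT NOT NULL,
    msrp          TEXT NOT NULL,
    UNIQUE (name, invoice_price, msrp)       -- composite unique key
)
""")


def insert_product(name, invoice_price, msrp):
    """Insert a product and return its product_id, or None on a duplicate."""
    try:
        with conn:
            cur = conn.execute(
                "INSERT INTO product (name, invoice_price, msrp) VALUES (?, ?, ?)",
                (name, invoice_price, msrp),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None
```

So product_id does not have to stop being the key: foreign keys elsewhere keep pointing at it, while the unique constraint does the row-level deduplication.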