I have a pretty typical user table setup for my web application:
user_id | username
--------------------
0 greg
1 john
... ...
Both fields are indexed and unique, so I can look up a user by id or username quickly.
I want to keep a friends table, and am not sure whether to store the user_id values or usernames in that table:
user_id_1 | user_id_2
--------------------------
or
username_1 | username_2
--------------------------
I am going to want to get a list of friends for a user, so it would be convenient to immediately have the usernames in the friends table instead of doing a join on the users table:
select * from friends where username_1 = 'greg';
If I'm storing user ids, I need to do a join to get the usernames. Will that be costly?
select u.username
from friends f
join users u on u.user_id = f.user_id_2
where f.user_id_1 = ?;
Using user ids allows me to let users change usernames flexibly, but I'm not letting them do that anyway. Any advice would be great.
Thanks
A join on the IDs won't be too bad. The ID may be smaller to store on disk. Also, I would imagine a list of friends would have something other than just user names, in which case, you have to join no matter what.
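To make the ID-based option concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the real database. The table and column names come from the question; the sample rows, the composite primary key on friends, and the helper name friend_names are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the real database (an assumption);
# the extra user 'mary' and the friendships are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id  INTEGER PRIMARY KEY,
        username TEXT NOT NULL UNIQUE
    );
    CREATE TABLE friends (
        user_id_1 INTEGER NOT NULL REFERENCES users(user_id),
        user_id_2 INTEGER NOT NULL REFERENCES users(user_id),
        PRIMARY KEY (user_id_1, user_id_2)
    );
    INSERT INTO users VALUES (0, 'greg'), (1, 'john'), (2, 'mary');
    INSERT INTO friends VALUES (0, 1), (0, 2);
""")


def friend_names(conn, user_id):
    """Return the usernames of user_id's friends, resolved with one join."""
    rows = conn.execute(
        """
        SELECT u.username
        FROM friends AS f
        JOIN users AS u ON u.user_id = f.user_id_2
        WHERE f.user_id_1 = ?
        ORDER BY u.username
        """,
        (user_id,),
    ).fetchall()
    return [name for (name,) in rows]


greg_friends = friend_names(conn, 0)
```

The friends table only ever holds integers, and the usernames come back in the same single query, so nothing has to be looked up row by row.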
Well, as you said, using id semantics means you can change the username without having to deal with cascading effects. For most cases PK / UNQ + FK indexes will make joins thundering fast, but you may have a point for huge tables (for which you will eventually need some kind of external index, or other tool anyway).
The ID will be smaller if you use numeric values. Also the index search will be faster. Here you'll find the data types for MySQL 5.0.
Also, I don't know how you are using the index, but I'd recommend adding an auto-increment field. MySQL requires an AUTO_INCREMENT column to be indexed, so add a key along with it. For an integer column you can do that like this:
ALTER TABLE `Database`.`tableName` ADD COLUMN `indexName` INTEGER NOT NULL AUTO_INCREMENT, ADD UNIQUE KEY (`indexName`);
Let's say I have a users table:
| id | username | email | address |
And posts table:
| id | post | user_id | date |
When I want to show posts, each time I need to go to the users table to retrieve the username from user_id. I want to avoid using a JOIN for this simple data retrieval, so what I do is add another column to the posts table:
| id | post | user_id | username | date |
This way I will not have to use a JOIN to retrieve the username when showing posts.
Do you think that this is better?
No. Your alternative structure is vulnerable to inconsistencies (e.g. if a user changes his name; read about third normal form here: http://en.wikipedia.org/wiki/Third_normal_form#.22Nothing_but_the_key.22).
Why don't you want to use JOINs? Have you set up appropriate indexes?
I think it depends on the design and on future plans, but I would suggest you not do that.
Avoiding the join may look like better performance right now, but as your application grows, this denormalized table structure will work against you.
For instance, if a poster changes their username, how would you handle that? By updating the whole posts table? If your data could exceed 10 million rows, that will be tough, because the update can lock the table while it runs.
So I would not recommend this.
The cost of the join is negligible compared with the cost of having to run updates like that frequently.
If the [id] of [users] table is the primary key, I think it is good enough to use JOIN.
Alternatively, if you select a limited number of posts, such as 10, you can also try this SQL:
select id, post, user_id,
(select username from users where id = user_id) as username, date
from posts
limit 0, 10
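As a rough check of the two approaches above, the following sketch runs both the JOIN and the correlated subquery against the normalized tables and then renames a user. It uses Python's sqlite3 module in place of MySQL; the sample rows and dates are invented, and the variable names are only illustrative.

```python
import sqlite3

# SQLite in memory stands in for MySQL (an assumption); rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE posts (
        id      INTEGER PRIMARY KEY,
        post    TEXT NOT NULL,
        user_id INTEGER NOT NULL REFERENCES users(id),
        date    TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'greg'), (2, 'john');
    INSERT INTO posts VALUES
        (1, 'first post',  1, '2024-01-01'),
        (2, 'second post', 2, '2024-01-02'),
        (3, 'third post',  1, '2024-01-03');
""")

# Variant 1: plain JOIN on the primary key of users.
join_rows = conn.execute("""
    SELECT p.id, p.post, p.user_id, u.username, p.date
    FROM posts AS p
    JOIN users AS u ON u.id = p.user_id
    ORDER BY p.id
    LIMIT 10
""").fetchall()

# Variant 2: correlated subquery, as suggested for small result sets.
subquery_rows = conn.execute("""
    SELECT p.id, p.post, p.user_id,
           (SELECT username FROM users WHERE id = p.user_id) AS username,
           p.date
    FROM posts AS p
    ORDER BY p.id
    LIMIT 10
""").fetchall()

# Renaming a user is a single-row update; no rows in posts are touched.
conn.execute("UPDATE users SET username = 'gregory' WHERE id = 1")
renamed = conn.execute("""
    SELECT u.username
    FROM posts AS p
    JOIN users AS u ON u.id = p.user_id
    WHERE p.id = 1
""").fetchone()[0]
```

Both variants return the same rows here, and the rename shows up in old posts without any change to the posts table, which is the consistency argument made in the answers.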
So I'm new to databases and looking for some advice on what I'm sure is fairly simple. First, I'm using MySQL as my database. I currently have two tables, one for storing user accounts and details:
TABLE user
id | username | password | email_address | user_devices | contact_method
and another for storing video content by producers which looks like:
TABLE series
id | series_title | still_broadcasting | last_updated |
I would like to implement a feature where users can select series they wish to be notified about when new releases become available, and also select how to be notified (email or push notification) and how often (on arrival, hourly, daily, weekly). What's the best way to go about this?
I've thought of these ideas myself but am looking for a second opinion or a better way altogether (all ideas except 4 involve storing how and how often to notify the user in the user table):
1. Adding a text column called following to the user table, holding a CSV of series
2. Adding multiple boolean columns to the user table, one for each series
3. Adding a text column to the series table, holding a CSV of the IDs of the users following the series
4. Creating an entirely new table for notifications, though I don't really see the purpose of this as it seems very redundant
I then plan to add cron jobs to my server to send the notifications to users regularly.
Thanks in advance for any help.
First of all, it might be worth giving some articles on basic database design a read. A quick Google search turned up this one, which covers identifying relationships:
http://www.datanamic.com/support/lt-dez005-introduction-db-modeling.html
Your best bet is to use a linking table i.e.
CREATE TABLE userHasSeries (
userID INT,
seriesID INT
);
This can then be used in an INNER JOIN query to get the users choices. What you are doing here is an n:m link between 2 tables. An example inner join would be
SELECT
u.id AS userID,
u.username,
s.id AS seriesID,
s.series_title,
s.still_broadcasting,
s.last_updated
FROM user AS u
INNER JOIN userHasSeries AS uhs
ON uhs.userID = u.id
INNER JOIN series AS s
ON s.id = uhs.seriesID
If user.user_devices is also a comma-separated list, I would strongly advise that you adopt a similar n:m approach there as well.
A partial answer which complements what has been written in other answers:
Don't keep a list of devices in the 'user_devices' field - break this out into a separate table. In fact, you'll need two tables: one to list the various devices, and one a join table which has two fields: user_id and device_id. This will enable you to track which user has which device, but also to provide a list of users per device.
If I were you I would add a third table as follows:
TABLE user
id | username | password | email_address | user_devices | contact_method | notification_type
TABLE series
id | series_title | still_broadcasting | last_updated
TABLE followings
id | user_id | series_id
In notification_type I would store one of (on arrival, hourly, daily, or weekly), and in the followings table I would store all of each user's preferred series.
Doing it this way makes it easy to add, delete, update, or select a user's preferred series. They will all be simple SQL queries, and you avoid parsing comma-separated strings.
For example, if you want to get all the preferred series of a user:
SELECT * FROM followings AS f INNER JOIN series AS s ON f.series_id = s.id WHERE f.user_id = ?
If you want to get all the users that prefer a series:
SELECT * FROM followings AS f INNER JOIN user AS u ON f.user_id = u.id WHERE f.series_id = ?
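The linking-table idea can be sketched end to end as follows, using Python's sqlite3 module instead of MySQL. The followings table here uses a composite primary key rather than a separate id column, which is an assumption on my part (it stops a user from following the same series twice); the helper names and sample rows are also made up.

```python
import sqlite3

# SQLite in memory stands in for MySQL (an assumption); rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE series (id INTEGER PRIMARY KEY, series_title TEXT NOT NULL);
    CREATE TABLE followings (
        user_id   INTEGER NOT NULL REFERENCES user(id),
        series_id INTEGER NOT NULL REFERENCES series(id),
        PRIMARY KEY (user_id, series_id)  -- one row per (user, series) pair
    );
    INSERT INTO user VALUES (1, 'greg'), (2, 'john');
    INSERT INTO series VALUES (10, 'Series A'), (11, 'Series B');
""")


def follow(conn, user_id, series_id):
    # OR IGNORE is SQLite syntax; MySQL spells this INSERT IGNORE.
    conn.execute(
        "INSERT OR IGNORE INTO followings (user_id, series_id) VALUES (?, ?)",
        (user_id, series_id),
    )


def unfollow(conn, user_id, series_id):
    conn.execute(
        "DELETE FROM followings WHERE user_id = ? AND series_id = ?",
        (user_id, series_id),
    )


def series_for_user(conn, user_id):
    rows = conn.execute(
        """
        SELECT s.series_title
        FROM followings AS f
        INNER JOIN series AS s ON f.series_id = s.id
        WHERE f.user_id = ?
        ORDER BY s.series_title
        """,
        (user_id,),
    ).fetchall()
    return [title for (title,) in rows]


def users_for_series(conn, series_id):
    rows = conn.execute(
        """
        SELECT u.username
        FROM followings AS f
        INNER JOIN user AS u ON f.user_id = u.id
        WHERE f.series_id = ?
        ORDER BY u.username
        """,
        (series_id,),
    ).fetchall()
    return [name for (name,) in rows]


follow(conn, 1, 10)
follow(conn, 1, 11)
follow(conn, 1, 10)  # duplicate follow, silently ignored
follow(conn, 2, 11)
```

A cron job that sends notifications would call something like users_for_series for each series with a new release; following and unfollowing are single-row inserts and deletes, with no string parsing involved.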
For storing friend relationships in a social network, is it better to have another table with columns relationship_id, user1_id, user2_id, time_created, pending, or should the confirmed friends' user_ids be serialized/imploded into a single long string and stored alongside the other user details like user_id, name, dateofbirth, address, with a limit of something like 5000 friends, similar to Facebook?
Are there any better methods? The first method will create a huge table! The second one has one column with a really long string...
On the profile page of each user, his friends need to be retrieved from the database to show around 30 of them, similar to Facebook, so I think the first method of using a separate table will cause a huge number of database queries?
The most proper way to do this would be to have the table of Members (obviously), and a second table of Friend relationships.
You should never ever store foreign keys in a string like that. What's the point? You can't join on them, sort on them, group on them, or any other things that justify having a relational database in the first place.
If we assume that the Member table looks like this:
MemberID int Primary Key
Name varchar(100) Not null
--etc
Then your Friendship table should look like this:
Member1ID int Foreign Key -> Member.MemberID
Member2ID int Foreign Key -> Member.MemberID
Created datetime Not Null
--etc
Then, you can join the tables together to pull a list of friends
SELECT m.*
FROM Member m
INNER JOIN Friendship f ON f.Member2ID = m.MemberID
WHERE f.Member1ID = @MemberID
(This is specifically SQL Server syntax, but I think it's pretty close to MySQL. @MemberID is a parameter.)
This is always going to be faster than splitting a string and making 30 extra SQL queries to pull the relevant data.
Separate table as in method 1.
Method 2 is bad because you would have to unserialize it each time and won't be able to do JOINs on it; plus UPDATEs will be a nightmare if a user changes his name, email or other properties.
Sure, the table will be huge, but you can index it on Member1_id, set the foreign key back to your user table, have static row sizes, and maybe even limit the number of friends a single user can have. I think it won't be an issue with MySQL if you do it right, even if you hit a few million rows in your relationship table.
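The worry about "a huge amount of database queries" can be checked with a small sketch: one query returns up to 30 friends for a profile page. This uses Python's sqlite3 module rather than MySQL or SQL Server. Storing each friendship once with the lower ID in Member1ID, the extra index on Member2ID, and the helper names are all assumptions made for this example, not something the answers prescribe.

```python
import sqlite3

# SQLite in memory stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Member (
        MemberID INTEGER PRIMARY KEY,
        Name     TEXT NOT NULL
    );
    CREATE TABLE Friendship (
        Member1ID INTEGER NOT NULL REFERENCES Member(MemberID),
        Member2ID INTEGER NOT NULL REFERENCES Member(MemberID),
        Created   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (Member1ID, Member2ID),
        CHECK (Member1ID < Member2ID)  -- each pair is stored exactly once
    );
    -- The primary key covers lookups by Member1ID; this covers Member2ID.
    CREATE INDEX idx_friendship_member2 ON Friendship (Member2ID);
    INSERT INTO Member VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid'), (4, 'Dee');
""")


def add_friendship(conn, a, b):
    """Store the pair with the lower ID first so (a, b) and (b, a) coincide."""
    low, high = sorted((a, b))
    conn.execute(
        "INSERT INTO Friendship (Member1ID, Member2ID) VALUES (?, ?)",
        (low, high),
    )


def friends_of(conn, member_id, limit=30):
    """Fetch up to `limit` friends of a member in a single query."""
    return conn.execute(
        """
        SELECT m.MemberID, m.Name
        FROM (
            SELECT Member2ID AS FriendID FROM Friendship WHERE Member1ID = ?
            UNION ALL
            SELECT Member1ID AS FriendID FROM Friendship WHERE Member2ID = ?
        ) AS f
        JOIN Member AS m ON m.MemberID = f.FriendID
        ORDER BY m.Name
        LIMIT ?
        """,
        (member_id, member_id, limit),
    ).fetchall()


add_friendship(conn, 2, 1)
add_friendship(conn, 1, 3)
add_friendship(conn, 4, 2)
```

With both columns indexed, each half of the UNION ALL is an index lookup, so the profile page needs one round trip regardless of how large the Friendship table grows.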
I'm planning to build a database project.
One of the tables has a lot of attributes.
My question is: what is better, to divide the class into 2 separate tables or to put all the attributes into one table? Below is an example:
create table User ( id, name, surname, ... show_name, show_photos, ... )
or
create table User ( id, name, surname, ... )
create table UserPrivacy ( usr_id, show_name, show_photos, ... )
I suppose the performance is similar, since I can use an index.
It's best to put all the attributes in the same table.
If you start storing attribute names in a table, you're storing meta data in your database, which breaks first normal form.
Besides, keeping them all in the same table simplifies your queries.
Would you rather have:
SELECT show_photos FROM User WHERE user_id = 1
Or
SELECT up.show_photos FROM User u
LEFT JOIN UserPrivacy up USING(user_id)
WHERE u.user_id = 1
Joins are okay, but keep them for associating separate entities and 1->N relationships.
There is a limit to the number of columns, and only if you think you might hit that limit would you do anything else.
There are legitimate reasons for storing name-value pairs in a separate table, but fear of adding columns isn't one of them. For example, creating a name-value table might, in some circumstances, make it easier for you to query a list of attributes. However, most database engines and access layers, including PDO in PHP, provide metadata facilities through which you can easily get a list of columns for a table (attributes for an entity).
Also, please note that your id field on User should be user_id, not just id, unless you're using a framework such as Ruby on Rails, whose convention is just id. 'user_id' is preferred because with just id, your joins look like this:
ON u.id = up.user_id
Which seems odd, and the preferred way is this:
ON u.user_id = up.user_id
or more simply:
USING(user_id)
Don't be afraid to 'add yet another attribute'. It's normal, and it's okay.
I'd say the 2 separate tables, especially if you are using an ORM. In most cases it's best to have each table correspond to a particular object and have its fields or "attributes" be the things that are required to describe that object.
You don't need 'show_photos' to describe a User but you do need it to describe UserPrivacy.
You should consider splitting the table if all of the privacy attributes are nullable and will most probably have values of NULL.
This will help you to keep the main table smaller.
If the privacy attributes will mostly be filled, there is no point in splitting the table, as it will require extra JOINs to fetch the data.
Since this appears to be a one to one relationship, I would normally keep it all in one table unless:
You would be near the limit of the number of bytes that can be stored in a row - then you should split it out.
Or if you will normally be querying the main table separately and won't need those fields much of the time.
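To show what the split costs at read time, here is a minimal sketch of the two-table variant using Python's sqlite3 module. The table and column names follow the question; the sample rows, the helper name privacy_for, and the choice to treat a missing UserPrivacy row as "show everything" are assumptions for illustration only.

```python
import sqlite3

# SQLite in memory stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User (
        id      INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        surname TEXT NOT NULL
    );
    CREATE TABLE UserPrivacy (
        usr_id      INTEGER PRIMARY KEY REFERENCES User(id),  -- one-to-one
        show_name   INTEGER NOT NULL,
        show_photos INTEGER NOT NULL
    );
    INSERT INTO User VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
    INSERT INTO UserPrivacy VALUES (1, 1, 0);  -- user 2 has no privacy row
""")


def privacy_for(conn, user_id):
    """Return privacy flags, falling back to defaults when no row exists."""
    row = conn.execute(
        """
        SELECT COALESCE(up.show_name, 1), COALESCE(up.show_photos, 1)
        FROM User AS u
        LEFT JOIN UserPrivacy AS up ON up.usr_id = u.id
        WHERE u.id = ?
        """,
        (user_id,),
    ).fetchone()
    if row is None:  # no such user at all
        return None
    return {"show_name": bool(row[0]), "show_photos": bool(row[1])}
```

The main table stays narrow, but every read of the flags needs the LEFT JOIN plus a default for users without a row, which is the trade-off the answers above are weighing.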
If some columns either do not depend on the primary key, or hold values drawn repeatedly from a fixed set, move those columns into a different table and maintain a one-to-one relationship.
Why not have a User table and Features table, e.g.:
create table User ( id int primary key, name varchar(255) ... )
create table Features (
user_id int,
feature varchar(50),
enabled bit,
primary key (user_id, feature)
)
Then the data in your Features table would look like:
user_id | feature     | enabled
-------------------------------
291     | show_photos | 1
291     | show_name   | 1
292     | show_photos | 0
293     | show_name   | 0
I would suggest something different. It seems likely that in the future you will be asked for 'yet another attribute' to manage. Rather than add a column, you could just add a row to an attributes table:
TABLE Attribute
(
ID
Name
)
TABLE User
(
ID
...
)
TABLE UserAttributes
(
UserID FK User.ID
Attribute FK Attribute.ID
Value...
)
Good comments from everyone. I should have been clearer in my response.
We do this quite a bit to handle special-cases where customers ask us to tailor our site for them in some way. We never 'pivot' the NVP's into columns in a query - we're always querying "should I do this here?" by looking for a specific attribute listed for a customer. If it is there, that's a 'true'. So rather than having these be a ton of boolean-columns, most of which would be false or NULL for most customers, AND the tendency for these features to grow in number, this works well for us.
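The "is this attribute listed for this customer?" check described above might look like the following sketch, written with Python's sqlite3 module. Table and column names follow the outline in this answer; the Value column is left out for brevity, and the sample rows and the helper name has_attribute are made up.

```python
import sqlite3

# SQLite in memory stands in for the real database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Attribute (
        ID   INTEGER PRIMARY KEY,
        Name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE User (
        ID   INTEGER PRIMARY KEY,
        Name TEXT NOT NULL
    );
    CREATE TABLE UserAttributes (
        UserID    INTEGER NOT NULL REFERENCES User(ID),
        Attribute INTEGER NOT NULL REFERENCES Attribute(ID),
        PRIMARY KEY (UserID, Attribute)
    );
    INSERT INTO Attribute VALUES (1, 'show_photos'), (2, 'show_name');
    INSERT INTO User VALUES (291, 'Ann'), (292, 'Bob');
    INSERT INTO UserAttributes VALUES (291, 1), (291, 2), (292, 2);
""")


def has_attribute(conn, user_id, name):
    """True if the named attribute is listed for the user, else False."""
    (found,) = conn.execute(
        """
        SELECT EXISTS (
            SELECT 1
            FROM UserAttributes AS ua
            JOIN Attribute AS a ON a.ID = ua.Attribute
            WHERE ua.UserID = ? AND a.Name = ?
        )
        """,
        (user_id, name),
    ).fetchone()
    return bool(found)
```

Presence of a row means 'true' and absence means 'false', so adding a new feature is one INSERT into Attribute rather than an ALTER TABLE.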
I have a pretty basic question on which is the preferred way of storing data in my database.
I have a table called "users" with each user getting a username and user_id. Now, I want to make a table called "comments" for users to comment on news.
Is it better to have a column in comments called "username" that stores the logged-in user's name, or a column called "user_id"? If I use user_id, my SQL statement would need another select statement: "(SELECT username FROM users WHERE users.id = comments.user_id) as username". It seems like performance would be better just storing the username.
I thought I read to avoid duplicate data in a database though.
Which is better?
Thanks
Typically, you use ID fields to link tables together. The reason (in your situation) is that you might allow people to change their username, and you don't want to have to update every place it is stored.
Therefore, put the user_id in your comments table and pull the username out on a join, as you've shown.
If the user_id is the primary key then you should use user_id instead of username, if you want to use username instead of user_id then why do you have a user_id in the first place?
If there's the potential for the database to grow large, store the user_id in the comments table. Less overhead. Also consider that usernames may be modified more easily this way.
Data should be stored in (at least) third normal form, so you should use user_id as the primary key in the users table and as a foreign key in the comments table, and use this to get the details:
SELECT comments.*, users.username
FROM comments
INNER JOIN users ON users.user_id = comments.user_id;
If you are getting the comments for a particular article, you could do it like this:
SELECT comments.*, users.username
FROM comments
INNER JOIN users ON users.user_id = comments.user_id
WHERE comments.article_id = '$current_article_id';
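Here is a small runnable sketch of the per-article query, using Python's sqlite3 module as a stand-in for MySQL. Instead of interpolating the article ID into the SQL string, it binds the value as a parameter. The comment_id and body columns, the index name, the sample rows, and the helper name comments_for_article are assumptions made for illustration.

```python
import sqlite3

# SQLite in memory stands in for MySQL (an assumption); rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id  INTEGER PRIMARY KEY,
        username TEXT NOT NULL UNIQUE
    );
    CREATE TABLE comments (
        comment_id INTEGER PRIMARY KEY,
        article_id INTEGER NOT NULL,
        user_id    INTEGER NOT NULL REFERENCES users(user_id),
        body       TEXT NOT NULL
    );
    CREATE INDEX idx_comments_article ON comments (article_id);
    INSERT INTO users VALUES (1, 'greg'), (2, 'john');
    INSERT INTO comments VALUES
        (1, 100, 1, 'Nice article'),
        (2, 100, 2, 'I disagree'),
        (3, 101, 1, 'First!');
""")


def comments_for_article(conn, article_id):
    """Return (comment_id, username, body) rows for one article."""
    return conn.execute(
        """
        SELECT c.comment_id, u.username, c.body
        FROM comments AS c
        INNER JOIN users AS u ON u.user_id = c.user_id
        WHERE c.article_id = ?  -- bound parameter, not string interpolation
        ORDER BY c.comment_id
        """,
        (article_id,),
    ).fetchall()


article_comments = comments_for_article(conn, 100)
```

Because the article ID is passed separately from the SQL text, the database treats it as a value rather than as part of the statement, whatever it contains.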
Storing the userid (integer) will mean faster JOINs later. Unless you plan on having people dig through the database by hand, there's really no reason to use the username
I'm pretty sure storing the user id in the comments table is sufficient. If you're returning rows from the comments table, just use the JOIN statement.
Cheers
Which is going to be a unique identifier? The user_id, I'd bet, or you can't have two "John Smith"s in your system.
And if volume is much of a concern, text matching the username field is going to be more expensive than linking to the users table in your query in the long term.
Numeric values are cheaper to join and index than an alphanumeric ID. Use a number to uniquely identify a row. Another benefit is that the PK doesn't need to change if a user needs to change their username. The last benefit is that this is the design used by most modern web frameworks, such as Django and Rails.