I am designing a simple Twitter-style site (for study), but with a small difference: users can follow other users, keywords, and lists. I want to know how to create a following table to store who follows what.
Is the approach below correct?
Following table:
id (primary key of the following table)
type (1 = user, 2 = keyword, 3 = list)
idtype (id of the followed item, in the table indicated by type)
user (the follower's user id)
However, there isn't a keyword table, so I don't know how idtype would reference a keyword.
What is the best approach?
It's incorrect because you can't create a foreign key from idtype to the parent table, since the "parent table" changes depending on type. By the way, if a user can follow multiple keywords, then you won't escape having a separate table for that (unless you want to break 1NF by "packing" several values into the same field, which is a really bad idea).
There are a couple of ways to resolve this; probably the simplest one is to use a separate id field for each of the possible parent tables, and then constrain them so only one of them can be non-NULL.
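For illustration, a minimal sketch of that option (column names are assumptions; note that MySQL only enforces CHECK constraints from version 8.0.16 on):

CREATE TABLE following (
    id INT NOT NULL AUTO_INCREMENT,
    follower_id INT NOT NULL,            -- the user who follows
    followed_user_id INT NULL,
    followed_keyword_id INT NULL,
    followed_list_id INT NULL,
    PRIMARY KEY (id),
    -- exactly one of the three parent ids must be set
    CHECK ( (followed_user_id IS NOT NULL)
          + (followed_keyword_id IS NOT NULL)
          + (followed_list_id IS NOT NULL) = 1 )
);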
However, since InnoDB tables are clustered and secondary indexes in clustered tables are expensive, I'd rather go with something like the sketch below (tweets table not shown; table and column names are illustrative):
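CREATE TABLE follows_user (
    followed_user_id INT NOT NULL,   -- the user being followed
    follower_id INT NOT NULL,        -- the user who follows
    PRIMARY KEY (followed_user_id, follower_id)
) ENGINE=InnoDB;

CREATE TABLE follows_keyword (
    keyword VARCHAR(50) NOT NULL,    -- there is no keyword table, so the keyword itself is the key
    follower_id INT NOT NULL,
    PRIMARY KEY (keyword, follower_id)
) ENGINE=InnoDB;

CREATE TABLE follows_list (
    followed_list_id INT NOT NULL,
    follower_id INT NOT NULL,
    PRIMARY KEY (followed_list_id, follower_id)
) ENGINE=InnoDB;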
This will enable you to very efficiently answer the query: "which users follow the given user (or keyword or list)". If you need to answer: "which users (or keywords or lists) the given user follows", reverse the order of fields in the PKs shown above. If you need both, then you'd need indexes in both directions (and pay the clustering price).
Related
Hi, I'm designing an item catalog using MySQL and the Sequelize ORM (NodeJS).
Suppose I have a product list with different attributes based on its category (attributes_id in this case).
I would like to get a product by using a JOIN with the appropriate attribute table. The design should be scalable, as we will have more than a hundred attribute tables.
Roughly the statement will look like this:
JOIN
if ( product.attributes_id == 1 ) 'attributes_car'
elseif ( product.attributes_id == 2 ) 'attributes_food'
BUT the number of elseif cases will grow to more than a hundred later.
So the question is how to design attributes_id. Is it a good idea to make it a foreign key into the database metadata (like INFORMATION_SCHEMA) so that it points to another table? Should I introduce another table to manage such a relationship?
One option is dynamic SQL, but I don't think it is a good idea, because the if/else cases will keep growing.
Or should I give up designing a query and just implement such logic on the NodeJS side using the ORM?
Thank you!
One solution would be to create an additional table that stores attribute_id and table_name.
CREATE TABLE attribute_tablename (
    attribute_id INT,
    table_name VARCHAR(25),
    PRIMARY KEY (attribute_id, table_name)
);
You can also add a foreign key to the product table if you want.
Then you only need an insert into this table for every new attribute table that is added.
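For instance (a sketch; the product columns and the two-step flow are assumptions), the application resolves the table name first and then builds the real join itself:

-- step 1: resolve which attribute table applies to the product
SELECT t.table_name
FROM product p
JOIN attribute_tablename t ON t.attribute_id = p.attributes_id
WHERE p.id = 42;

-- step 2: the application interpolates the returned name into the
-- actual query (e.g. via Sequelize's raw-query interface):
-- SELECT * FROM product p
-- JOIN attributes_car a ON a.product_id = p.id
-- WHERE p.id = 42;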
I'm helping a friend design a database but I'm curious if there is a general rule of thumb for the following:
TABLE_ORDER
OrderNumber
OrderType
The column OrderType has the possibility of coming from a preset list of order types. Should I allow VARCHAR values in the OrderType column (e.g. Production Order, Sales Order, etc.)? Or should I separate it out into another table and reference it from TABLE_ORDER as a foreign key, like the following?
TABLE_ORDER
OrderNumber
OrderTypeID
TABLE_ORDER_TYPE
ID
OrderType
If the order type list is fixed and will not change, you could opt not to make a separate table. But in that case, do not make it a VARCHAR; make it an ENUM.
An ENUM indexes better, and you end up with arguably the same kind of database as when you use an ID with a lookup table.
But if there is any chance at all that you will need to add types, just go for the second option. You can add an interface later, and you can easily build "get all types" kinds of pages, etc.
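For example (a sketch using the values from the question):

CREATE TABLE table_order (
    order_number INT NOT NULL PRIMARY KEY,
    order_type ENUM('Production Order', 'Sales Order') NOT NULL
);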
I would say use another table, say "ReferenceCodes", for example:
Type, Name, Description, Code
Then you can just use the Code throughout the database and not worry about the name associated with that code. If you use a name (for example, order type in your case), it would be really difficult to change the name later on. This is what we actually do in our system.
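A minimal sketch of such a table (column types are assumptions):

CREATE TABLE reference_codes (
    type        VARCHAR(30)  NOT NULL,  -- e.g. 'ORDER_TYPE'
    code        CHAR(10)     NOT NULL,  -- the stable key other tables store
    name        VARCHAR(60)  NOT NULL,  -- display name; safe to change later
    description VARCHAR(255),
    PRIMARY KEY (type, code)
);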
In a perfect world, any column that can contain duplicate data should be an id or an ENUM. This helps you make sure that the data is always internally consistent and can reduce database size as well as speed up queries.
For something like this structure, I would probably create a master_object table that you could use for multiple types. OrderType would reference the master_object table. You could then use the same table for other data. For example, let's say you had another table - Payments, with a column of PaymentType. You could use the master_object table to also store the values and meta-data for that column. This gives you quite a bit of flexibility without forcing you to create a bunch of small tables, each containing 2-10 rows.
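A rough sketch of that idea (table and column names are assumptions):

CREATE TABLE master_object (
    id          INT         NOT NULL AUTO_INCREMENT,
    object_type VARCHAR(30) NOT NULL,   -- e.g. 'ORDER_TYPE', 'PAYMENT_TYPE'
    value       VARCHAR(60) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uq_type_value (object_type, value)
);

-- TABLE_ORDER.OrderTypeID and Payments.PaymentType would then both be
-- foreign keys referencing master_object.id.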
Brian
If the list is small (fewer than 10 items), you could choose to model it as in your first option but add a column constraint to limit the inputs to the values in your list. This forces the entries to belong to your list, but your list should not change often.
e.g. check order_type in ('Val1','Val2',...'Valn')
If the list will ever change, is used in multiple tables, must support multiple languages, or faces any other design criterion that demands variability, then create your type table (you are always safe with this choice, which is why it is the most used).
You can collect all such tables into a 'codes' table that generalizes the concept
CREATE TABLE Codes (
Code_Class CHARACTER VARYING(30) NOT NULL,
Code_Name CHARACTER VARYING(30) NOT NULL,
Code_Value_1 CHARACTER VARYING(30),
Code_Value_2 CHARACTER VARYING(30),
Code_Value_3 CHARACTER VARYING(30),
CONSTRAINT PK_Codes PRIMARY KEY (Code_Class, Code_Name)
);
INSERT INTO codes ( code_class, code_name, code_value_1 )
VALUES ( 'STATE', 'New York', 'NY' ),
       ( 'STATE', 'California', 'CA' );
-- ... one row per remaining state
You can then place an UPDATE/INSERT trigger on the column that should be constrained to the list of states. Let's say an employee table has a column EMP_STATE to hold state short forms.
The trigger would simply run a SELECT like this (v_state_name and v_state_short_name are local variables DECLAREd in the trigger body; in MySQL the error is raised with SIGNAL):

SELECT code_name
     , code_value_1
  INTO v_state_name, v_state_short_name
  FROM codes
 WHERE code_class = 'STATE'
   AND code_value_1 = NEW.EMP_STATE;

IF v_state_name IS NULL THEN
    SIGNAL SQLSTATE '45000'
        SET MESSAGE_TEXT = 'EMP_STATE is not a valid state code';
END IF;
This can be extended to other types:
INSERT INTO codes ( code_class, code_name )
VALUES ( 'ORDER_TYPE', 'Production' ),
       ( 'ORDER_TYPE', 'Sales' );
-- ... one row per remaining order type
SELECT code_name
  INTO v_order_type_name
  FROM codes
 WHERE code_class = 'ORDER_TYPE'
   AND code_name = 'Sales';
This last method, although generally applicable, can be overused. It also has the downside that you cannot use different data types for the code_name and code_value_* columns.
The general rule of thumb: create a 'TYPE' table (e.g. ORDER_TYPE) to hold the values you wish to constrain an attribute to, use an ID as the primary key, and use a single sequence to generate all such IDs (across all your 'TYPE' tables). The many TYPE tables may clutter your model, but the meaning will be clear to your developers (the ultimate goal).
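Sketched out (names are illustrative; MySQL has no sequences, so AUTO_INCREMENT stands in for the single shared sequence here):

CREATE TABLE order_type (
    id   INT         NOT NULL AUTO_INCREMENT,
    name VARCHAR(30) NOT NULL UNIQUE,
    PRIMARY KEY (id)
);

CREATE TABLE table_order (
    order_number  INT NOT NULL,
    order_type_id INT NOT NULL,
    PRIMARY KEY (order_number),
    FOREIGN KEY (order_type_id) REFERENCES order_type (id)
);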
I have a person table, and I want users to be able to create custom many-to-many relations of information with persons: educations, residences, employments, languages, and so on. These might require different numbers of columns. E.g.
Person_languages(person_fk,language_fk)
Person_Educations(person,institution,degree,field,start,end)
I thought of something like this (rough MySQL DDL, with the foreign-key targets noted in comments):
CREATE TABLE Tables (
    table_id INT PRIMARY KEY AUTO_INCREMENT,
    table_name_fk INT,   -- FK to a table-name lookup table
    person_fk INT,       -- FK to Person
    table_description TEXT
);
Table holding all custom table names and descriptions.
CREATE TABLE Table_columns (
    column_id INT PRIMARY KEY AUTO_INCREMENT,
    table_fk INT,        -- FK to Tables
    column_name_fk INT,  -- FK to a column-name lookup table
    rank_column INT      -- display order of the column
);
Table holding the columns in each custom table and the order they are to be displayed in.
CREATE TABLE Table_rows (
    row_id INT PRIMARY KEY AUTO_INCREMENT,
    table_fk INT,   -- FK to Tables
    row_nr INT
);
Table holding the rows of each custom table.
CREATE TABLE Table_cells (
    cell_id INT PRIMARY KEY AUTO_INCREMENT,
    table_fk INT,              -- FK to Tables
    row_fk INT,                -- FK to Table_rows
    column_fk INT,             -- FK to Table_columns
    cell_content_type_fk INT,  -- FK to Content_types
    cell_object_id INT
);
Table holding cell info.
If any custom table started to be used by most persons and grew large, the idea was to then extract it into a dedicated, hard-coded many-to-many table.
Is this a stupid idea? Is there a better way to do this?
I strongly advise against such a design; you are on the road to an extremely fragmented and hard-to-read schema.
IIUC, your base problem is that you have a common set of (universal) properties for a person, which may be extended by other (non-universal) properties.
I'd tackle this by keeping the universal properties in the person table and creating two more tables: property_types, which translates a property name into an INT primary key, and person_properties, which combines a person PK, a property PK and a value.
If you set the PK of this table to be (person,property) you get the best possible index locality for the person, which makes requesting all properties for a person a very fast query.
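A minimal sketch of that layout (table and column names are assumptions):

CREATE TABLE property_types (
    property_id INT         NOT NULL AUTO_INCREMENT,
    name        VARCHAR(64) NOT NULL UNIQUE,
    PRIMARY KEY (property_id)
);

CREATE TABLE person_properties (
    person_id   INT NOT NULL,   -- references the person table's PK
    property_id INT NOT NULL,
    value       VARCHAR(255),
    PRIMARY KEY (person_id, property_id),   -- the index locality described above
    FOREIGN KEY (property_id) REFERENCES property_types (property_id)
);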
My web application allows a user to define from 1 up to 30 emails (could be anything else).
Which of these options is best?
1) ...store the data inside only one column using a separator, like this:
[COLUMN emails] peter@example.com,mary@example.com,john@example.com
Structure:
emails VARCHAR(1829)
2) ...or save the data using distinct columns, like this:
[COLUMN email1] peter@example.com
[COLUMN email2] mary@example.com
[COLUMN email3] john@example.com
[...]
Structure:
email1 VARCHAR(60)
email2 VARCHAR(60)
email3 VARCHAR(60)
[...]
email30 VARCHAR(60)
Thank you in advance.
It depends on how you are going to use the data and how fixed the limit of 30 is. If it is an advantage to quickly query for the 3rd address, or to filter using WHERE clauses and such, use distinct fields; otherwise it might not be worth the effort of creating the columns.
Having the data in a database still has the advantage of concurrent access by several users.
Number two is the better option, without question. If you do the first one (comma separated), then it negates the advantages of using a RDBMS (you can't run an efficient query on your emails in that case, so it may as well be a flat file).
Number 2 is better than number 1.
However, you should consider a third option: a normalized structure with a separate emails table holding a foreign key to your user record. This would let you define an index if you wanted to search by email to find a user, and place a constraint ensuring no duplicate emails are registered, if you wanted that.
Neither one is a very good option.
Option 1 is a poor idea because it makes looking a user up by email a complex, inefficient task. You are effectively required to perform a full text search on the email field in the user record to find one email.
Option 2 is really a WORSE idea, IMO, because it makes any surrounding code a huge pain to write. Suppose, again, that you need to look up all users who have a value X. You now need to enumerate 30 columns and check each one to see if that value exists. Painful!
Storing data in this manner -- 1-or-more of some element of data -- is very common in database design, and as Adam has previously mentioned, is best solved in MOST cases by using a normalized data structure.
A correct table structure, written in MySQL since this was tagged as such, might look like:
Users table:
CREATE TABLE user (
user_id int auto_increment,
...
PRIMARY KEY (user_id)
);
Emails table:
CREATE TABLE user_email (
user_id int,
email char(60) not null default '',
FOREIGN KEY (user_id) REFERENCES user (user_id) ON DELETE CASCADE
);
The FOREIGN KEY statement is optional -- the design will work without it, however, that line causes the database to force the relationship. For example, if you attempt to insert a record into user_email with a user_id of 10, there MUST be a corresponding user record with a user_id of 10, or the query will fail. The ON DELETE CASCADE tells the database that if you delete a record from the user table, all user_email records associated with it will also be deleted (you may or may not want this behavior).
This design of course also means that you need to perform a join when you retrieve a user record. A query like this:
SELECT user.user_id, user_email.email FROM user LEFT JOIN user_email ON user.user_id = user_email.user_id WHERE <your where clause>;
Will return one row for EACH user_email address stored in the system. If you have 5 users and each user has 5 email addresses, the above query will return 25 rows.
Depending on your application, you may want to get one row per user but still have access to all the emails. In that case you might try an aggregate function like GROUP_CONCAT which will return a single row per user, with a comma-delimited list of emails belonging to that user:
SELECT user.user_id, GROUP_CONCAT(user_email.email) AS user_emails FROM user LEFT JOIN user_email ON user.user_id = user_email.user_id WHERE <your where clause> GROUP BY user.user_id;
Again, depending on your application, you may want to add an index to the email column.
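For example (the index name is arbitrary):

CREATE INDEX idx_user_email ON user_email (email);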
Finally, there ARE some situations where you do not want a normalized database design, and a single-column design with delimited text might be more appropriate, although those situations are few and far between. For most normal applications, this type of normalized design is the way to go and will help it perform and scale better.
I am currently writing my first real PHP application, and I would like to know how to design and implement MySQL views properly.
In my particular case, user data is spread across several tables (as a consequence of database normalization), and I was thinking of using a view to group the data into one large table:
CREATE VIEW `Users_Merged` (
name,
surname,
email,
phone,
role
) AS (
SELECT name, surname, email, phone, 'Customer'
FROM `Customer`
)
UNION (
SELECT name, surname, email, tel, 'Admin'
FROM `Administrator`
)
UNION (
SELECT name, surname, email, tel, 'Manager'
FROM `manager`
);
This way I can use the view's data from the PHP app easily, but I don't really know how much this can affect performance.
For example:
SELECT * from `Users_Merged` WHERE role = 'Admin';
Is this the right way to filter the view's data, or should I filter BEFORE creating the view itself?
(I need this to have a list of users and the functionality to filter them by role).
EDIT
Specifically, what I'm trying to obtain is denormalization of three tables into one. Is my solution correct?
See Denormalization on Wikipedia.
In general, the database engine will perform the optimization for you. That means that the engine is going to figure out that the users table needs to be filtered before being joined to the other tables.
So, go ahead and use your view and let the database worry about it.
If you detect poor performance later, use MySQL EXPLAIN to get MySQL to tell you what it's doing.
PS: Your data design allows for only one role per user; is that what you wanted? If so, and if the example query you gave is one you intend to run frequently, make sure to index the role column in users.
If you have <1000 users (which seems likely), it doesn't really matter how you do it. If the user list is unlikely to change for long periods of time, the best you can probably do in terms of performance is to load the user list into memory and not go to the database at all. Even if user data were to change in the meantime, you could update the in-memory structure as well as the database and, again, not have to read user information from the DB.
You would probably be much better off normalizing the Administrators, Customers, Managers and what-have-you into one uniform table with a discriminator column "Role". That would save a lot of duplication, which is essentially the reason to do normalization in the first place. You can then add the role-specific details to distinct tables that you use with the User table in a join.
Your query could then look as simple as:
SELECT
`Name`, `Surname`, `Email`, `Phone`, `Role`
FROM `User`
WHERE
`User`.`Role` IN('Administrator','Manager','Customer', ...)
This is also easier for the database to process than a set of UNIONs.
If you go a step further, you could add a UserRoleCoupling table (instead of the Role column in User) that holds all the roles each user has:
CREATE TABLE `UserRoleCoupling` (
UserID INT NOT NULL, -- assuming your User table has an ID column of INT
RoleID INT NOT NULL,
PRIMARY KEY(UserID, RoleID)
);
And put the actual role information into a separate table as well:
CREATE TABLE `Role` (
ID INT NOT NULL UNIQUE AUTO_INCREMENT,
Name VARCHAR(64) NOT NULL,
PRIMARY KEY (Name)
);
Now you can have multiple roles per User and use queries like
SELECT
`U`.`Name`
,`U`.`Surname`
,`U`.`Email`
,`U`.`Phone`
,GROUP_CONCAT(`R`.`Name`) `Roles`
FROM `User` `U`
INNER JOIN `UserRoleCoupling` `UGC` ON `UGC`.`UserID` = `U`.`ID`
INNER JOIN `Role` `R` ON `R`.`ID` = `UGC`.`RoleID`
GROUP BY
`U`.`Name`, `U`.`Surname`, `U`.`Email`, `U`.`Phone`
This would give you the basic user details and a comma-separated list of all assigned role names.
In general, the best way to normalize a database structure is to make the tables as generic as possible without being redundant. So don't add administrator- or customer-specific details to the user table; instead, use a relationship between User and Administrator to find the specific administrator details. The way you're doing it now isn't really normalized.
I'll see if I can find my favorite book on database normalization and post the ISBN when I have time later.