I'm developing an app that requires me to design the database. I'm wondering which design would be better in the following scenario:
Approach 1:
Have one user table with all the user fields viz.
id | uid | username | first_name | last_name | profession
OR
Approach 2:
Table I:
id | uid | username
Table II:
uid | key | value |
1 | 'first_name' | John
2 | 'last_name' | Donald and so on
The first approach favours more columns to store the user data, while the second approach relies on multiple tables and stores data into several rows for each user.
The second approach would mean that, for each user, the user_meta table will have a large number of rows, while approach #1 will be more compact.
Questions:
Which approach is better in terms of performance and speed of queries?
Is there any rule for designing the database where you've to decide whether to store the data in rows vs columns?
The first model you propose is a regular relational design. It is widely used, very efficient in terms of speed and storage space, but it requires you to understand the data model before you store the data; adding an additional field would require a schema change.
The second model you propose is commonly known as "Entity-Attribute-Value" or EAV. You'll find a detailed question here.
It's worth thinking this through though - imagine a screen which lists all users who have logged in today. In your first model, you issue a single query - select * from users where last_logged_in >= '1 Jan 2015'
Now imagine that query in model 2 - you'd have something like
select u.*, ln.value as last_name, fn.value as first_name
from users u
left outer join metadata ln on u.user_id = ln.user_id
and ln.key = 'last_name'
left outer join metadata fn on u.user_id = fn.user_id
and fn.key = 'first_name'
where u.last_logged_in >= '1 Jan 2015'
Two outer joins, and a complex query once you go beyond this trivial example.
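To make the comparison concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table name users_wide, the sample rows, and the ISO date strings are assumptions made for illustration. It only shows that both models can return the same rows, with the EAV model needing one extra join per attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Model 1: one column per attribute
    CREATE TABLE users_wide (
        user_id INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name TEXT,
        last_logged_in TEXT
    );
    INSERT INTO users_wide VALUES
        (1, 'John', 'Donald', '2015-01-02'),
        (2, 'Jane', 'Smith', '2014-12-30');

    -- Model 2: core table plus key/value rows (EAV)
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, last_logged_in TEXT);
    CREATE TABLE metadata (user_id INTEGER, key TEXT, value TEXT);
    INSERT INTO users VALUES (1, '2015-01-02'), (2, '2014-12-30');
    INSERT INTO metadata VALUES
        (1, 'first_name', 'John'), (1, 'last_name', 'Donald'),
        (2, 'first_name', 'Jane'), (2, 'last_name', 'Smith');
""")

# Model 1: a single-table query.
wide_rows = conn.execute(
    "SELECT user_id, first_name, last_name FROM users_wide "
    "WHERE last_logged_in >= '2015-01-01'"
).fetchall()

# Model 2: one join per attribute that has to be shown.
eav_rows = conn.execute("""
    SELECT u.user_id, fn.value, ln.value
    FROM users u
    LEFT JOIN metadata fn ON fn.user_id = u.user_id AND fn.key = 'first_name'
    LEFT JOIN metadata ln ON ln.user_id = u.user_id AND ln.key = 'last_name'
    WHERE u.last_logged_in >= '2015-01-01'
""").fetchall()

print(wide_rows == eav_rows)
```

Each further attribute shown on the screen adds another join to the second query, while the first query only gains a column name.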
If you have a lot of additional data, and you don't expect to use it as a major part of the relational model (i.e. as a criterion in a join or where clause), you can use MySQL's support for JSON or XML.
This allows you to store data whose schema you may not know at design time, and which is "sparse" (i.e. not all records have all fields populated), but it's slightly more awkward to query and populate into your client language.
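As a rough sketch of that trade-off, the snippet below keeps the optional attributes as one JSON document per user and decodes it on the client side. It uses Python's sqlite3 with a plain TEXT column as a stand-in (MySQL 5.7 and later offer a native JSON column type instead). The column name extra and the helpers add_user and load_user are hypothetical names chosen for this example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, extra TEXT)"
)

def add_user(username, **extra):
    """Store the optional attributes as a single JSON document."""
    conn.execute(
        "INSERT INTO users (username, extra) VALUES (?, ?)",
        (username, json.dumps(extra)),
    )

def load_user(username):
    """Fetch one user and merge the decoded JSON into a plain dict."""
    row = conn.execute(
        "SELECT username, extra FROM users WHERE username = ?", (username,)
    ).fetchone()
    return {"username": row[0], **json.loads(row[1])}

add_user("jdonald", profession="plumber", nickname="JD")
add_user("jsmith")  # sparse: this user has no optional attributes

john = load_user("jdonald")
jane = load_user("jsmith")
```

The extra step of encoding and decoding is the "slightly more awkward" part; in exchange, no schema change is needed when a new optional attribute appears.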
You can actually use a combination of the two. For the common data which you can define, stick to a table with fixed column names.
Then when you add attributes which are (for example) customer defined, then use the second method to supplement the data.
I am trying to normalise my MySQL 5.7 data schema and am struggling with replacing the SQL queries.
At the moment there is one table containing all attributes of each article:
article_id | title | ref_id | dial_c_id
The task is to retrieve all articles which match two given attributes (ref_id and dial_c_id) and also retrieve all their other attributes.
With just one table, this is straightforward:
SELECT *
FROM test.articles_test
WHERE
ref_id = '127712'
AND dial_c_id = 51
Now in my effort to normalise, I have created a second table, which stores the attributes of each article and removed the ones in table articles:
table 1:
article_id | title
table 2:
article_id | attr_group | attribute
1 | ref_id | 51
1 | dial_c_id | 33
1 | another | 5
2 | ..
I would like to retrieve all article details, including ALL attributes, for the articles which match ref_id and dial_c_id with this two-table schema.
Somehow like this:
SELECT
a.article_id,
a.title,
attr.*
FROM test.articles_test a
INNER JOIN attributes attr ON a.article_id = attr.article_id
AND ref_id = '127712'
AND dial_c_id = 51
How can this be done?
You have used an Entity-Attribute-Value table to record your attributes.
This is the opposite of normalization.
Name the rule of normalization that guided you to put different attributes into the same column. You can't, because this is not a normalization practice.
To accomplish your query with your current EAV design, you need to pivot the result so you get something as if you had your original table.
SELECT * FROM (
SELECT
a.article_id,
a.title,
MAX(CASE attr_group WHEN 'ref_id' THEN attribute END) AS ref_id,
MAX(CASE attr_group WHEN 'dial_c_id' THEN attribute END) AS dial_c_id
-- ...others...
FROM test.articles_test a
INNER JOIN attributes attr ON a.article_id = attr.article_id
GROUP BY a.article_id, a.title) AS pivot
WHERE pivot.ref_id = '127712'
AND pivot.dial_c_id = 51
While the above query can produce the result you want, the performance will be terrible. It has to create a temp table for the subquery, containing all data from both tables, then apply the WHERE clause against the temp table.
You're really better off with each attribute in its own column in your original table.
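To see what the pivot actually does, here is a small trace of it on sample data, using Python's sqlite3 as a stand-in for MySQL. The table name articles, the titles, and the attribute values are made up for illustration. The MAX(CASE ...) expressions fold the attribute rows of each article back into one row, which is then filtered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (article_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE attributes (article_id INTEGER, attr_group TEXT, attribute TEXT);
    INSERT INTO articles VALUES (1, 'Watch A'), (2, 'Watch B');
    INSERT INTO attributes VALUES
        (1, 'ref_id', '127712'), (1, 'dial_c_id', '51'),
        (2, 'ref_id', '127712'), (2, 'dial_c_id', '33');
""")

# Every article is pivoted first; the filter runs on the pivoted rows afterwards.
pivot_sql = """
    SELECT * FROM (
        SELECT a.article_id, a.title,
               MAX(CASE attr_group WHEN 'ref_id' THEN attribute END) AS ref_id,
               MAX(CASE attr_group WHEN 'dial_c_id' THEN attribute END) AS dial_c_id
        FROM articles a
        INNER JOIN attributes attr ON a.article_id = attr.article_id
        GROUP BY a.article_id, a.title
    ) AS pivot
    WHERE pivot.ref_id = '127712' AND pivot.dial_c_id = '51'
"""
matches = conn.execute(pivot_sql).fetchall()
print(matches)
```

Note that the attribute values come back as text, because a single attribute column has to hold every kind of value.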
I understand that you are trying to allow for many attributes in the future. This is a common problem.
See my answer to
How to design a product table for many kinds of product where each product has many parameters
But you shouldn't call it "normalised," because it isn't. It's not even denormalised. It's derelational.
You can't just use words to describe anything you want — especially not the opposite of what the word means. I can't let the air out of my bicycle tire and say "I'm inflating it."
You commented that you're trying to make your database "scalable." You also misunderstand what the word "scalable" means. By using EAV, you're creating a structure where the queries needed are difficult to write and inefficient to execute, and the data takes 10x space. It's the opposite of scalable.
What you mean is that you're trying to create a system that is extensible. This is complex to implement in SQL, but I describe several solutions in the other Stack Overflow answer to which I linked. You might also like my presentation Extensible Data Modeling with MySQL.
For example, I have a URL like domain.com/transport/cars
Based on the URL, I want to select from MySQL and show a list of ads for cars.
I want to choose the fastest method (the one that takes the least time to show results and uses the fewest resources).
Comparing 2 ways
First way
Mysql table transport with rows like
FirstLevSubcat | Text
---------------------------------
1 | Text1 car
2 | Text1xx lorry
1 | Text another car
FirstLevSubcat Type is int
Then another mysql table subcategories
Id | NameOfSubcat
---------------------------------
1 | cars
2 | lorries
3 | dogs
4 | flats
Query like
SELECT Text, AndSoOn FROM transport
WHERE
FirstLevSubcat = (SELECT Id FROM subcategories WHERE NameOfSubcat = 'cars')
Or, instead of SELECT Id FROM subcategories, get the Id from an XML file or from a PHP array.
Second way
Mysql table transport with rows like
FirstLevSubcat | Text
---------------------------------
cars | Text1 car
lorries | Text1xx lorry
cars | Text another car
FirstLevSubcat Type is varchar or char
And query simply
SELECT Text, AndSoOn FROM transport
WHERE FirstLevSubcat = 'cars'
Please advise which way would use fewer resources and take less time to show results. I read that selecting on an int is faster than selecting on a varchar (SQL SELECT speed int vs varchar).
So, as I understand it, the first way would be better?
The first design is much better, because you separate two facts in your data:
There is a category 'cars'.
'Text1 car' is in the Category 'cars'.
Imagine, in your second design you enter another car, but type in 'cors' instead of 'cars'. The dbms doesn't see this, and so you have created another category with a single entry. (Well, in MySQL you could use an enum column instead to circumvent this issue, but this is not available in most other dbms. And anyhow, whenever you want to rename your category, say from 'cars' to 'vans', then you would have to change all existing records plus alter the table, instead of simply renaming the entry once in the subcategories table.)
So stay away from your second design.
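The two benefits described above can be sketched with a foreign key. This example uses Python's sqlite3 as a stand-in for MySQL; the foreign key constraint, the UNIQUE constraint, and the sample rows are assumptions that are not in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE subcategories (Id INTEGER PRIMARY KEY, NameOfSubcat TEXT UNIQUE);
    CREATE TABLE transport (
        FirstLevSubcat INTEGER NOT NULL REFERENCES subcategories(Id),
        Text TEXT
    );
    INSERT INTO subcategories VALUES (1, 'cars'), (2, 'lorries');
    INSERT INTO transport VALUES (1, 'Text1 car'), (2, 'Text1xx lorry');
""")

try:
    # 99 is not a known category, so the database rejects the row.
    conn.execute("INSERT INTO transport VALUES (99, 'Text another car')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Renaming a category touches exactly one row in subcategories.
conn.execute("UPDATE subcategories SET NameOfSubcat = 'vans' WHERE Id = 1")
vans = conn.execute("""
    SELECT t.Text FROM transport t
    WHERE t.FirstLevSubcat = (SELECT Id FROM subcategories WHERE NameOfSubcat = 'vans')
""").fetchall()
```

With the varchar design, the mistyped category would have been stored silently, and the rename would have needed an update of every matching transport row.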
As to Praveen Prasannan's comment on subqueries and joins: that is nonsense. Your query is straightforward and good. You want to select from transport where the category is the desired one. Perfect. There are two groups of people who would prefer a join here:
Beginners who simply don't know better and always join from the start and try to sort things out in the end.
Experienced programmers who know that some dbms often handle joins better than sub-queries. But this is a pessimistic habit. Better write your queries such that they are easy to read and maintain, as you are already doing, and only change this in case grave performance issues occur.
Yup. As the SO link in your question suggests, int comparison is faster than character comparison and yields a faster fetch. Keeping this in mind, the first design would be considered the better design. However, subqueries are never recommended. Use a join instead.
eg:
SELECT t.Text, t.AndSoOn FROM transport t
INNER JOIN subcategories s ON s.ID = t.FirstLevSubcat
WHERE s.NameOfSubcat = 'cars'
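Whichever form you prefer, the two queries are meant to return the same rows. A minimal check, assuming Python's sqlite3 as a stand-in for MySQL and the sample rows from the question (the ORDER BY is added only to make the comparison deterministic):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subcategories (Id INTEGER PRIMARY KEY, NameOfSubcat TEXT);
    CREATE TABLE transport (FirstLevSubcat INTEGER, Text TEXT);
    INSERT INTO subcategories VALUES (1, 'cars'), (2, 'lorries');
    INSERT INTO transport VALUES
        (1, 'Text1 car'), (2, 'Text1xx lorry'), (1, 'Text another car');
""")

# Subquery form, as in the question.
subquery_rows = conn.execute("""
    SELECT Text FROM transport
    WHERE FirstLevSubcat = (SELECT Id FROM subcategories WHERE NameOfSubcat = 'cars')
    ORDER BY Text
""").fetchall()

# Join form, as in the answer above.
join_rows = conn.execute("""
    SELECT t.Text FROM transport t
    INNER JOIN subcategories s ON s.Id = t.FirstLevSubcat
    WHERE s.NameOfSubcat = 'cars'
    ORDER BY t.Text
""").fetchall()

print(subquery_rows == join_rows)
```

Which of the two runs faster depends on the database and its optimizer, so it is worth measuring on real data rather than assuming.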
So I'm new to databases and looking for some advice on what I am sure is fairly simple. First, I'm using MySQL as my DB. I currently have two tables, one for storing user accounts and details:
TABLE user
id | username | password | email_address | user_devices | contact_method
and another for storing video content by producers which looks like:
TABLE series
id | series_title | still_broadcasting | last_updated |
I would like to implement a feature where users can select series for which they wish to be notified when new releases become available, and also select how to be notified about these releases (email or push notification) and how often (on arrival, hourly, daily, weekly). I am wondering what's the best way to go about doing this?
I've thought of these ideas by myself but am looking for a second opinion or a better way altogether (all ideas except 4 involve storing how to notify the user, along with how often, in the user table):
1. adding a text column to the user table called following and just storing CSVs of the series
2. adding multiple boolean columns to the user table, one for each series
3. adding a text column to the series table with CSVs of the ID numbers of the users following the series
4. creating an entirely new table for notifications, though I don't really see the purpose of this as it's very redundant
I then plan to just add cron jobs to my server to actually go about regularly sending notifications to users.
Thanks in advance for any help.
First of all, it might be worth giving some articles on basic database design a read. A quick google turned up this which covers identifying relationships
http://www.datanamic.com/support/lt-dez005-introduction-db-modeling.html
Your best bet is to use a linking table i.e.
CREATE TABLE userHasSeries (
userID INT,
seriesID INT
);
This can then be used in an INNER JOIN query to get the users choices. What you are doing here is an n:m link between 2 tables. An example inner join would be
SELECT
u.id AS userID,
u.username,
s.id AS seriesID,
s.series_title,
s.still_broadcasting,
s.last_updated
FROM user AS u
INNER JOIN userHasSeries AS uhs
ON uhs.userID = u.id
INNER JOIN series AS s
ON s.id = uhs.seriesID
If user_devices is also a comma-separated list, I would strongly advise that you adopt a similar n:m approach there as well.
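Here is a minimal sketch of the linking table in use, with Python's sqlite3 standing in for MySQL. The composite primary key, the foreign key references, the sample rows, and the helper followers_of are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE series (id INTEGER PRIMARY KEY, series_title TEXT);
    CREATE TABLE userHasSeries (
        userID INTEGER NOT NULL REFERENCES user(id),
        seriesID INTEGER NOT NULL REFERENCES series(id),
        PRIMARY KEY (userID, seriesID)  -- a user can follow a series only once
    );
    INSERT INTO user VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO series VALUES (10, 'Series A'), (11, 'Series B');
    INSERT INTO userHasSeries VALUES (1, 10), (1, 11), (2, 11);
""")

def followers_of(series_id):
    """Return the usernames following one series, sorted by name."""
    rows = conn.execute("""
        SELECT u.username
        FROM user AS u
        INNER JOIN userHasSeries AS uhs ON uhs.userID = u.id
        WHERE uhs.seriesID = ?
        ORDER BY u.username
    """, (series_id,))
    return [name for (name,) in rows]

print(followers_of(11))
```

Following or unfollowing a series becomes a single-row INSERT or DELETE on userHasSeries, with no string parsing involved.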
A partial answer which complements what has been written in other answers:
Don't keep a list of devices in the 'user_devices' field - break this out into a separate table. In fact, you'll need two tables: one to list the various devices, and one a join table which has two fields: user_id and device_id. This will enable you to track which user has which device, but also to provide a list of users per device.
If I were you I would add a third table as following:
TABLE user
id | username | password | email_address | user_devices | contact_method | notification_type
TABLE series
id | series_title | still_broadcasting | last_updated
TABLE followings
id | user_id | series_id
In notification_type I would put one of (on arrival, hourly, daily, weekly), and in the followings table I would store all of the user's preferred series.
Doing it this way makes it easy to add, delete, update, or select a user's preferred series. All of these are simple SQL queries, and you avoid parsing comma-separated strings.
For example, if you want to get all preferred series of a user:
SELECT * FROM followings AS f INNER JOIN series AS s ON f.series_id = s.id WHERE f.user_id = ?
If you want to get all users that prefer a series:
SELECT * FROM followings AS f INNER JOIN user AS u ON f.user_id = u.id WHERE f.series_id = ?
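A cron job could build on the second query to find who is due a notification. The sketch below is one possible shape, using Python's sqlite3 as a stand-in for MySQL; the function due_notifications, the sample users, and the idea of filtering by notification_type per run are assumptions, not something the question specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (
        id INTEGER PRIMARY KEY, username TEXT,
        contact_method TEXT, notification_type TEXT
    );
    CREATE TABLE followings (
        id INTEGER PRIMARY KEY, user_id INTEGER, series_id INTEGER
    );
    INSERT INTO user VALUES
        (1, 'alice', 'email', 'daily'),
        (2, 'bob', 'push', 'hourly'),
        (3, 'carol', 'email', 'daily');
    INSERT INTO followings (user_id, series_id) VALUES (1, 10), (2, 10), (3, 11);
""")

def due_notifications(series_id, notification_type):
    """Return (username, contact_method) pairs for one series and one schedule."""
    return conn.execute("""
        SELECT u.username, u.contact_method
        FROM followings AS f
        INNER JOIN user AS u ON f.user_id = u.id
        WHERE f.series_id = ? AND u.notification_type = ?
        ORDER BY u.username
    """, (series_id, notification_type)).fetchall()

# An hourly job would ask only for the 'hourly' subscribers of an updated series.
print(due_notifications(10, 'hourly'))
```

If users should be able to pick a different frequency per series, notification_type would move from the user table to the followings table.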
I'm trying to select some data from a MySQL database.
I have a table containing business details, and a separate one containing a list of trades. As we have multiple trades
business_details
id | business_name | trade_id | package_id
1 | Happy News | 12 | 1
This is the main table, contains the business name, the trade ID and the package ID
shop_trades
id | trade
1 | newsagents
This contains the trade type of the business
configuration_packages
id | name_of_trade_table
1 | shop_trades
2 | leisure_trades
This contains the name of the trade table to look in
So, basically, if I want to find the trade type (e.g., newsagent, fast food, etc) I look in the XXXX_trades table. But I first need to look up the name of XXXX from the configuration_packages table.
What I would normally do is 2 SQL queries:
SELECT business_details.*, configuration_packages.name_of_trade_table
FROM business_details, configuration_packages
WHERE business_details.package_id = configuration_packages.id
AND business_details.id = '1'
That gives me the name of the database table to look in for the trade name, so I look up the name of the table
SELECT trade FROM XXXX WHERE id='YYYY'
Where XXXX is the name of the table returned as part of the first query and YYYY is the id of the package, again returned from the first query.
Is there a way to combine these two queries so that I only run one?
I've used subqueries before, but only on the SELECT side of the query - not the FROM side.
Typically, this is handled by a union in a single query.
Normalization gets you to a logical model. This helps better understand the data. It is common to denormalize when implementing the model. Subtypes as you have here are commonly implemented in two ways:
Separate tables, as you have, which makes retrieval difficult. This is what led to your question about how to retrieve the data.
A common table for all subtypes with a subtype indicator. This may result in columns which are always null for certain subtypes. It simplifies data access, and may alter the way that the subtypes are handled in code.
If the extra columns for a subtype are relatively rarely accessed, then you may use a hybrid implementation where the common columns are in the type table, and some or all of the subtype columns are in a subtype table. This is more complex to code.
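One way to read the union suggestion is to stack the subtype tables into a single view tagged with the source table name, and then join against that view. The sketch below assumes Python's sqlite3 as a stand-in for MySQL; the view name all_trades, the helper trade_of, the sample rows, and the assumption that trade_id is the key into the trade table are all illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE business_details (
        id INTEGER PRIMARY KEY, business_name TEXT,
        trade_id INTEGER, package_id INTEGER
    );
    CREATE TABLE shop_trades (id INTEGER PRIMARY KEY, trade TEXT);
    CREATE TABLE leisure_trades (id INTEGER PRIMARY KEY, trade TEXT);
    CREATE TABLE configuration_packages (
        id INTEGER PRIMARY KEY, name_of_trade_table TEXT
    );

    INSERT INTO business_details VALUES
        (1, 'Happy News', 12, 1), (2, 'Fun Bowl', 12, 2);
    INSERT INTO shop_trades VALUES (12, 'newsagents');
    INSERT INTO leisure_trades VALUES (12, 'bowling alley');
    INSERT INTO configuration_packages VALUES
        (1, 'shop_trades'), (2, 'leisure_trades');

    -- Tag every row with the table it came from, then stack the subtypes.
    CREATE VIEW all_trades AS
        SELECT 'shop_trades' AS trade_table, id, trade FROM shop_trades
        UNION ALL
        SELECT 'leisure_trades' AS trade_table, id, trade FROM leisure_trades;
""")

def trade_of(business_id):
    """Look up a business's trade in one query, whatever table it lives in."""
    row = conn.execute("""
        SELECT t.trade
        FROM business_details AS b
        INNER JOIN configuration_packages AS p ON p.id = b.package_id
        INNER JOIN all_trades AS t
            ON t.trade_table = p.name_of_trade_table AND t.id = b.trade_id
        WHERE b.id = ?
    """, (business_id,)).fetchone()
    return row[0] if row else None

print(trade_of(1), trade_of(2))
```

The view has to list every trade table explicitly, which is a hint that a single trades table with a type column would be simpler to maintain.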
That's not possible, and it sounds like a problem with your model.
Why don't you put shop_trades and leisure_trades into the same table, with one column to distinguish between the two?
If this is possible, try this
SELECT trade
FROM (SELECT 'TABLE_NAME' FROM 'INFORMATION_SCHEMA'.'TABLES'
WHERE 'TABLE_SCHEMA'='*schema name*')
WHERE id='YYYY'
UPDATE:
I think the code I have above is not possible. :|
I'm planning to build a database project.
One of the tables has a lot of attributes.
My question is: which is better, to divide the class into 2 separate tables or to put all of the attributes into one table? Below is an example:
create table User ( id, name, surname, ..., show_name, show_photos, ... )
or
create table User ( id, name, surname, ... )
create table UserPrivacy ( usr_id, show_name, show_photos, ... )
I suppose the performance is similar, since I can use an index.
It's best to put all the attributes in the same table.
If you start storing attribute names in a table, you're storing meta data in your database, which breaks first normal form.
Besides, keeping them all in the same table simplifies your queries.
Would you rather have:
SELECT show_photos FROM User WHERE user_id = 1
Or
SELECT up.show_photos FROM User u
LEFT JOIN UserPrivacy up USING(user_id)
WHERE u.user_id = 1
Joins are okay, but keep them for associating separate entities and 1->N relationships.
There is a limit to the number of columns, and only if you think you might hit that limit would you do anything else.
There are legitimate reasons for storing name-value pairs in a separate table, but fear of adding columns isn't one of them. For example, creating a name-value table might, in some circumstances, make it easier for you to query a list of attributes. However, most database engines and access libraries, including PDO in PHP, include reflection methods whereby you can easily get a list of columns for a table (attributes for an entity).
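The same kind of reflection is available from Python's DB-API, shown here with sqlite3 as a stand-in for MySQL. The column list of this User table is an assumption for the example. In MySQL itself, INFORMATION_SCHEMA.COLUMNS exposes the same information.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE User (user_id INTEGER PRIMARY KEY, name TEXT, "
    "show_name INTEGER, show_photos INTEGER)"
)

# cursor.description holds one 7-item tuple per result column; item 0 is the name.
cursor = conn.execute("SELECT * FROM User LIMIT 0")
columns = [col[0] for col in cursor.description]
print(columns)
```

No rows need to be fetched for this, so listing the attributes of an entity does not require a name-value table.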
Also, please note that your id field on User should be user_id, not just id, unless you're using Ruby on Rails, whose convention is just id. 'user_id' is preferred because with just id, your joins look like this:
ON u.id = up.user_id
Which seems odd, and the preferred way is this:
ON u.user_id = up.user_id
or more simply:
USING(user_id)
Don't be afraid to 'add yet another attribute'. It's normal, and it's okay.
I'd say the 2 separate tables, especially if you are using an ORM. In most cases it's best to have each table correspond to a particular object and have its fields or "attributes" be the things that are required to describe that object.
You don't need 'show_photos' to describe a User but you do need it to describe UserPrivacy.
You should consider splitting the table if all of the privacy attributes are nullable and will most probably have values of NULL.
This will help you to keep the main table smaller.
If the privacy attributes will mostly be filled, there is no point in splitting the table, as it will require extra JOINs to fetch the data.
Since this appears to be a one to one relationship, I would normally keep it all in one table unless:
You would be near the limit of the number of bytes that can be stored in a row - then you should split it out.
Or if you will normally be querying the main table separately and won't need those fields much of the time.
If some columns are not identified by or dependent on the primary key of the table, or hold values from a fixed set that is used repeatedly, make a different table for those columns and maintain a one-to-one relationship.
Why not have a User table and Features table, e.g.:
create table User ( id int primary key, name varchar(255) ... )
create table Features (
user_id int,
feature varchar(50),
enabled bit,
primary key (user_id, feature)
)
Then the data in your Features table would look like:
user_id | feature | enabled
---------------------------------
291 | show_photos | 1
291 | show_name | 1
292 | show_photos | 0
293 | show_name | 0
I would suggest something different. It seems likely that in the future you will be asked for 'yet another attribute' to manage. Rather than add a column, you could just add a row to an attributes table:
TABLE Attribute
(
ID
Name
)
TABLE User
(
ID
...
)
TABLE UserAttributes
(
UserID FK Users.ID
Attribute FK Attributes.ID
Value...
)
Good comments from everyone. I should have been clearer in my response.
We do this quite a bit to handle special-cases where customers ask us to tailor our site for them in some way. We never 'pivot' the NVP's into columns in a query - we're always querying "should I do this here?" by looking for a specific attribute listed for a customer. If it is there, that's a 'true'. So rather than having these be a ton of boolean-columns, most of which would be false or NULL for most customers, AND the tendency for these features to grow in number, this works well for us.
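The "should I do this here?" lookup described above can be sketched as a presence check. This uses Python's sqlite3 as a stand-in; the attribute names, the sample customers, the composite primary key, and the helper has_attribute are hypothetical and only illustrate the pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Attribute (ID INTEGER PRIMARY KEY, Name TEXT UNIQUE);
    CREATE TABLE User (ID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE UserAttributes (
        UserID INTEGER REFERENCES User(ID),
        Attribute INTEGER REFERENCES Attribute(ID),
        Value TEXT,
        PRIMARY KEY (UserID, Attribute)
    );
    INSERT INTO Attribute VALUES (1, 'custom_footer'), (2, 'beta_reports');
    INSERT INTO User VALUES (100, 'Acme'), (101, 'Globex');
    INSERT INTO UserAttributes VALUES (100, 1, NULL);
""")

def has_attribute(user_id, name):
    """True when a row exists for this user and attribute; the value is ignored."""
    row = conn.execute("""
        SELECT EXISTS (
            SELECT 1
            FROM UserAttributes AS ua
            INNER JOIN Attribute AS a ON a.ID = ua.Attribute
            WHERE ua.UserID = ? AND a.Name = ?
        )
    """, (user_id, name)).fetchone()
    return bool(row[0])

print(has_attribute(100, 'custom_footer'), has_attribute(101, 'custom_footer'))
```

Because the check only ever looks up one attribute for one customer, it avoids the pivoting cost that makes name-value tables slow in reporting queries.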