Supoose I have the following:
tbl_options
===========
id name
1 experience
2 languages
3 hourly_rate
tbl_option_attributes
=====================
id option_id name value
1 1 beginner 1
2 1 advanced 2
3 2 english 1
4 2 french 2
5 2 spanish 3
6 3 £10 p/h 10
7 3 £20 p/h 20
tbl_user_options
================
user_id option_id value
1 1 2
1 2 1
1 2 2
1 2 3
1 3 20
In the above example tbl_user_options stores option data for the user. We can store multiple entries for some options.
Now I wish to extend this, i.e. for "languages" I want the user to be able to specify their proficiency in a language (basic/intermediate/advanced). There will also be other fields that will have extended attributes.
So my question is, can these extended attributes be stored in the same table (tbl_user_options) or do I need to create more tables? Obviously if I put in a field "language_proficiency" it won't apply to the other fields. But this way I only have one user options table to manage. What do you think?
EDIT: This is what I propose
tbl_user_options
================
user_id option_id value lang_prof
1 1 2 null
1 2 1 2
1 2 2 3
1 2 3 3
1 3 20 null
My gut instinct would be to split the User/Language/Proficiency relationship out into its own tables. Even if you kept it in the same table with your other options, you'd need to write special code to handle the language case, so you might as well use a new table structure.
Unless your data model is in constant flux, I would rather have tbl_languages and tabl_user_languages tables to store those types of data:
tbl_languages
================
lang_id name
1 English
2 French
3 Spanish
tbl_user_languages
================
user_id lang_id proficiency hourly_rate
1 1 1 20
1 2 2 10
2 2 1 15
2 2 3 20
3 3 2 10
Designing a system that is "too generic" is a Turing tarpit trap for a relational SQL database. A document-based database is better suited to arbitrary key-value stores.
Excepting certain optimisations, your database model should match your domain model as closely as possible to minimise the object-relational impedance mismatch.
This design lets you display a sensible table of user language proficiencies and hourly rates with only two inner joins:
SELECT
ul.user_id,
u.name,
l.name,
ul.proficiency,
ul.hourly_rate
FROM tbl_user_languages ul
INNER JOIN tbl_languages l
ON l.lang_id = ul.lang_id
INNER JOIN tbl_users u
ON u.user_id = ul.user_id
ORDER BY
l.name, u.hour
Optionally you can split out a list of language proficiencies into a tbl_profiencies table, where 1 == Beginner, 2 == Advanced, 3 == Expert and join it onto tbl_user_languages.
i'm thinking it's a mistake to put "languages" as an option. while reading your text it seems to me that english is an option, and it might have an attribute from option_attributes.
Related
Which of these methods would be the most efficient way of storing, retrieving, processing and searching a large (millions of records) index of stored URLs along with there keywords.
Example 1: (Using one table)
TABLE_URLs-----------------------------------------------
ID DOMAIN KEYWORDS
1 mysite.com videos,photos,images
2 yoursite.com videos,games
3 hissite.com games,images
4 hersite.com photos,pictures
---------------------------------------------------------
Example 2: (one-to-one Relationship from one table to another)
TABLE_URLs-----------------------------------------------
ID DOMAIN KEYWORDS
1 mysite.com
2 yoursite.com
3 hissite.com
4 hersite.com
---------------------------------------------------------
TABLE_URL_KEYWORDS---------------------------------------------
ID DOMAIN_ID KEYWORDS
1 1 videos,photos,images
2 2 videos,games
3 3 games,images
4 4 photos,pictures
---------------------------------------------------------
Example 3: (one-to-one Relationship from one table to another (Using a reference table))
TABLE_URLs-----------------------------------------------
ID DOMAIN
1 mysite.com
2 yoursite.com
3 hissite.com
4 hersite.com
---------------------------------------------------------
TABLE_URL_TO_KEYWORDS------------------------------------
ID DOMAIN_ID KEYWORDS_ID
1 1 1
2 2 2
3 3 3
4 4 4
---------------------------------------------------------
TABLE_KEYWORDS-------------------------------------------
ID KEYWORDS
1 videos,photos,images
2 videos,games
3 games,images
4 photos,pictures
---------------------------------------------------------
Example 4: (many-to-many Relationship from url to keyword ID (using reference table))
TABLE_URLs-----------------------------------------------
ID DOMAIN
1 mysite.com
2 yoursite.com
3 hissite.com
4 hersite.com
---------------------------------------------------------
TABLE_URL_TO_KEYWORDS------------------------------------
ID DOMAIN_ID KEYWORDS_ID
1 1 1
2 1 2
3 1 3
4 2 1
5 2 4
6 3 4
7 3 3
8 4 2
9 4 5
---------------------------------------------------------
TABLE_KEYWORDS-------------------------------------------
ID KEYWORDS
1 videos
2 photos
3 images
4 games
5 pictures
---------------------------------------------------------
My understanding is that Example 1 would take the largest amount of storage space however searching through this data would be quick (Repeat keywords saved multiple times, however keywords are sat next to the relevant domain)
wWhereas Example 4 would save a tons on storage space but searching through would take longer. (Not having to store duplicate keywords, however referencing multiple keywords for each domain would take longer)
Could anyone give me any insight or thoughts on which the best method would be to utilise when designing a database that can handle huge amounts of data? With the foresight that you may want to display a URL with its assosicated keywords OR search for one or more keywords and bring up the most relevant URLs
You do have a many-to-many relationship between url and keywords. The canonical way to represent this in a relational database is to use a bridge table, which corresponds to example 4 in your question.
Using the proper data structure, you will find out that the queries will be much easier to write, and as efficient as it gets.
I don't know what drives you to think that searchin in a structure like the first one will be faster. This requires you to do pattern matching when searching for each single keyword, which is notably slow. On the other hand, using a junction table lets you search for exact matches, which can take advantage of indexes.
Finally, maintaining such a structure is also much easier; adding or removing keywords can be done with insert and delete statements, while other structures require you do do string manipulation in delimited list, which again is tedious, error-prone and inefficient.
None of the above.
Simply have a table with 2 string columns:
CREATE TABLE domain_keywords (
domain VARCHAR(..) NOT NULL,
keyword VARCHAR(..) NOT NULL,
PRIMARY KEY(domain, keyword),
INDEX(keyword, domain)
) ENGINE=InnoDB
Notes:
It will be faster.
It will be easier to write code.
Having a plain id is very much a waste.
Normalizing the domain and keyword buys little space savings, but at a big loss in efficiency.
"Huse database"? I predict that this table will be smaller than your Domains table. That is, this table is not your main concern for "huge".
I have a table users and some other tables like images and products
Table users:
user_id user_name
1 andrew
2 lutz
3 sophie
4 michael
5 peter
6 oscor
7 anton
8 billy
9 henry
10 jon
Tables images:
user_id img_type img_url
1 0 url1
1 1 url4
2 0 url5
7 0 url7
8 0 url8
9 1 url9
Table Products
user_id prod_id
1 5
1 55
2 555
8 5555
9 5
9 55
I use this kind of SELECT:
SELECT * FROM
(SELECT user.user_id,user.user_name, img.img_type, prod.prod_id FROM
users
LEFT JOIN images img ON img.user_id = users.user_id
LEFT JOIN products prod ON prod.user_id = users.user_id
WHERE user.user_id <= 5) AS users
ORDER BY user.user_id ASC
The result should be the following output. Due to performance improvements, I use ORDER BY and an inner select. If I put a LIMIT 5 within the inner or outer select, things won't work. MySQL will hard LIMIT the results to 5. However I need the LIMIT of 5 (pagination) found unique user_id results which would lead to 9 in this case.
Can I use maybe an if-statement to push an array with found user_id and break/finish up the select when the array consist of 5 UIDs? Or can I modify somehow the select?
user_id user_name img_type prod_id
1 andrew 0 5
1 andrew 1 5
1 andrew 0 55
1 andrew 1 55
2 lutz 0 5
2 lutz 0 55
3 sophie null null
4 michael null null
5 peter null null
results: 9
LIMIT 5 and user_id <= 5 do not necessarily give you the same results. One reason: There are multiple rows (after the JOINs) for user_id = 1. This is because there can be multiple images and/or multiple products for a given 'user'.
So, first decide which you want.
LIMIT without ORDER BY gives you an arbitrary set of rows. (Yeah, it is somewhat predictable, but you should not depend on it.)
ORDER BY + LIMIT usually implies gathering all the potentially relevant rows, sorting them, then doing the "limit". There are sometimes ways around this sluggishness.
LEFT leads to the NULLs you got; did you want that?
What do you want pagination to do if you are displaying 5 items per page, but user 1 has 6 images? You need to think about this edge case before we can help you with a solution. Maybe you want all of user 1 on a page, even if it exceeds 5? Maybe you want to break in the middle of '1'; but then we need an unambiguous way to know where to continue from for the next page.
Probably any viable solution will not use nested SELECTs. As you are finding out, it leads to "errors". Think of it this way: First find all the rows you need to display on all the pages, then carve out 5 for the current page.
Here are some more musings on pagination: http://mysql.rjweb.org/doc.php/pagination
I'm trying to join a few tables in MySQL. Our setup is a little unique so I try to explain as good as I can.
I have a table 'INVENTORY' that represents the current items on stock.
These items are stored in a table 'COMPONENT'
Components are being used in installations.
Every user can have multiple installations and the same component can be used in multiple installation as well.
To uniquely map a component to an installation, it can be assigned to a PRODUCT. a product as has a 1-1 relationship with an installation. A component is not directly related to an installation
To finally assign a product to a specific installation a mapping table COMPOMENT_PRODUCT is used.
Example:
A component is like a part, lets say a screw. This screw is used in a computer. The very same screw can be used on multiple computers. But each computer can only be used on one specific installation.
TABLE COMPOMENT_PRODUCT
COMPOMENT_ID PRODUCT_ID
1 1
1 2
2 1
2 2
So we have the components C1 and C2 relevant for two installations.
TABLE INVENTORY
COMPOMENT_ID INSTALLATION_ID ON_STOCK
1 1 5
1 2 2
What I want to achieve
Now, I want to retrieve the inventory state for all components. But, not every component has an inventory record. In these cases, the ON_STOCK value from the inventory shall be NULL
That means, for this example I'd expect the following results
COMPOMENT_ID PRODUCT_ID ON_STOCK
1 1 5
1 2 2
2 1 NULL
2 2 NULL
But executing this query:
SELECT DISTINCT
COMPONENT_PRODUCT.COMPONENT_ID,
COMPONENT_PRODUCT.PRODUCT_ID,
INVENTORY.ON_STOCK
FROM INVENTORY
RIGHT JOIN COMPONENT_PRODUCT ON COMPONENT_PRODUCT.COMPONENT_ID =
INVENTORY.COMPONENT_ID
returns the following resultset:
COMPONENT_ID PRODUCT_ID ON_STOCK
1 1 5
1 2 5
1 1 2
1 2 2
2 1 (null)
2 2 (null)
Now, my next thought was, "of course, this is how joins behave, okay I need to group the results". But the way SQL works, the aggregation is not entirely predictable. SO when I
GROUP BY COMPONENT_PRODUCT.COMPONENT_ID,COMPONENT_PRODUCT.PRODUCT_ID
I get this result:
COMPONENT_ID PRODUCT_ID ON_STOCK
1 1 5
1 2 5
2 1 (null)
2 2 (null)
I have prepared a Fiddle here: http://sqlfiddle.com/#!9/71ca87
What am I forgetting here? Thanks in advance for any pointers.
Try this query -
SELECT DISTINCT
COMPONENT_PRODUCT.COMPONENT_ID,
COMPONENT_PRODUCT.PRODUCT_ID,
INVENTORY.ON_STOCK
FROM INVENTORY
RIGHT JOIN COMPONENT_PRODUCT ON COMPONENT_PRODUCT.COMPONENT_ID =
INVENTORY.COMPONENT_ID
AND COMPONENT_PRODUCT.PRODUCT_ID = INVENTORY.INSTALLATION_ID
I am pretty new to mysql and this site. I got an old mysql database (100.000 entries) to migrate to our new system. This is the old table:
CUSTOMER
Customer_ID Name Categories
1 Bob 1,2
2 Phil NULL
3 Ines 10,8
4 Carol 1
5 Rick 13,2
And i need the following structure:
CUSTOMER
Customer_ID Name
1 Bob
2 Phil
3 Ines
4 Carol
5 Rick
Category
Category_ID Category_Name
1 Biker
2 Doctors
3 Teacher
... ...
13 Drivers
CustomerHasCategory
Customer_ID Category_ID
1 1
1 2
3 10
3 8
4 1
5 13
5 2
Thanks for any help.
I also had this problem but not in MySQL. I solved it with Python using the Pandas library. So, the exact steps I followed won't be useful for you. However, I'll show you the general idea behind the solution I used.
Below is image of the original column
First, I splitted the text into columns using the comas as the delimiter.
Next, I 'stacked' the columns
Finally, I removed the artefact column(s). So, I have only the ID and the values columns. This creates a one-to-many relationship.
I'm building a e-Commerce platform (PHP + MySQL) and I want to add a attribute (feature) to products, the ability to specify (enable/disable) the selling status for specific city.
Here are simplified tables:
cities
id name
==========
1 Roma
2 Berlin
3 Paris
4 London
products
id name cities
==================
1 TV 1,2,4
2 Phone 1,3,4
3 Book 1,2,3,4
4 Guitar 3
In this simple example is easy to query (using FIND_IN_SET or LIKE) to check the availability of product for specific city.
This is OK for 4 city in this example or even 100 cities but will be practical for a large number of cities and for very large number of products?
For better "performance" or better database design should I add another table to table to JOIN in query (productid, cityid, status) ?
availability
id productid cityid status
=============================
1 1 1 1
2 1 2 1
3 1 4 1
4 2 1 1
5 2 3 1
6 2 4 1
7 3 1 1
8 3 2 1
9 3 3 1
10 3 4 1
11 4 3 1
For better "performance" or better database design should I add
another table
YES definitely you should create another table to hold that information likewise you posted rather storing in , separated list which is against Normalization concept. Also, there is no way you can gain better performance when you try to JOIN and find out the details pf products available in which cities.
At any point in time if you want to get back a comma separated list like 1,2,4 of values then you can do a GROUP BY productid and use GROUP_CONCAT(cityid) to get the same.