I'm creating a website for use by a Youth Organisation to organize events, by providing a place for upcoming events to be listed and to be signed up to by members. In my database I have a table full of tags that can be assigned to events, quite like this website where you can tag your questions. I have another table to store the information about the events, for example title, description, requirements, date, etc.
I want to connect these tables, so that when an event is created it is assigned one primary tag and any number of secondary tags from those in the tag table. Currently, I have a linking table with fields for the event ID, the tag ID, and whether the tag is the primary tag or not. However, because I had to set the fields to unique in order to create a relation, I cannot store multiple rows with the same event ID or tag ID.
My question is: what is the best way to structure my database for the functionality described above? Furthermore, if what I am doing is correct, how can I link the tables without either field in the linking table being a primary or unique key?
tblEvents
tblTags
As you have what's referred to as a "many-to-many" relationship (many events can have many tags), you need an intermediate table that handles the assignment. Splitting the data up this way is a standard part of normalisation.
In this case, you need 3 columns: AssignmentID, EventID, TagID
Put all your tags in the same database table, but flag each one as either primary or secondary and handle the 1 primary tag per event within your code outside of the database.
For example your tag table could look like:
ETagID  ETagName  ETagColour  ETagPrimary  ETagDel
1       first     red         1            0
2       second    blue        0            0
3       third     green       0            0
4       forth     yellow      1            0
5       fifth     orange      0            0
6       sixth     white       0            0
and your assignment table:
AssignmentID  EventID  TagID
1             1        1
2             1        2
3             1        5
4             2        4
5             3        4
6             3        1
7             4        4
As your code outside of the SQL handles the insertions in the first place, you can now query your tables using joins to pull out the tags for a given event:
SELECT ETagName, ETagColour FROM TagTable
JOIN AssignmentTable ON AssignmentTable.TagID = TagTable.ETagID
JOIN EventTable ON EventTable.EID = AssignmentTable.EventID
WHERE EventTable.EID = <some value> AND TagTable.ETagDel = 0
This would select all the tag names and colours for that specific event that aren't deleted.
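A minimal runnable sketch of the layout above, using Python's built-in sqlite3 module instead of MySQL so it runs without a server. Table and column names follow the answer. The ETitle column, the sample event title, marking tag 5 as deleted, and the composite primary key on (EventID, TagID) in place of AssignmentID are assumptions made for illustration. The composite key is one way to link the tables without making either ID unique on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EventTable (EID INTEGER PRIMARY KEY, ETitle TEXT NOT NULL);
CREATE TABLE TagTable (
    ETagID     INTEGER PRIMARY KEY,
    ETagName   TEXT NOT NULL,
    ETagColour TEXT NOT NULL,
    ETagDel    INTEGER NOT NULL DEFAULT 0   -- 0 = live, 1 = deleted
);
-- Neither column is unique on its own; only the (EventID, TagID) pair is.
CREATE TABLE AssignmentTable (
    EventID INTEGER NOT NULL REFERENCES EventTable(EID),
    TagID   INTEGER NOT NULL REFERENCES TagTable(ETagID),
    PRIMARY KEY (EventID, TagID)
);
INSERT INTO EventTable VALUES (1, 'Camping weekend');   -- hypothetical event
-- Tag 5 is flagged as deleted here (an assumption) to show the filter working.
INSERT INTO TagTable VALUES
    (1, 'first', 'red', 0), (2, 'second', 'blue', 0), (5, 'fifth', 'orange', 1);
INSERT INTO AssignmentTable VALUES (1, 1), (1, 2), (1, 5);
""")

# Same join as in the answer, with the event ID passed as a parameter.
rows = conn.execute("""
    SELECT ETagName, ETagColour FROM TagTable
    JOIN AssignmentTable ON AssignmentTable.TagID = TagTable.ETagID
    JOIN EventTable ON EventTable.EID = AssignmentTable.EventID
    WHERE EventTable.EID = ? AND TagTable.ETagDel = 0
    ORDER BY TagTable.ETagID
""", (1,)).fetchall()
print(rows)
```

The same event ID appears in three assignment rows, which is exactly what a unique constraint on EventID alone would have prevented.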
An important thing to note is not to overcomplicate things. If your primary and secondary tags store the same info except for being either primary or secondary, then it's pointless separating them into individual tables. Flagging them like I mentioned will be sufficient and reduces the number of tables required.
Hopefully this points you in the right direction moving forwards.
Update:
As per the recent comment, you can handle the allocation of the primary tag within the assignment table. Create the same table as above, but include the primary flag column too:
AssignmentID  EventID  TagID  PrimaryFlag
1             1        1      1
2             1        2      0
3             1        5      0
4             2        4      1
5             3        4      1
6             3        1      0
7             4        4      1
Then within the query, you can also select the status of the tag using a slightly modified version of the one written before:
SELECT ETagName, ETagColour, AssignmentTable.PrimaryFlag FROM TagTable
JOIN AssignmentTable ON AssignmentTable.TagID = TagTable.ETagID
JOIN EventTable ON EventTable.EID = AssignmentTable.EventID
WHERE EventTable.EID = <some value> AND TagTable.ETagDel = 0
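If you want to make sure the primary tag appears at the top of that list, you can also bolt on a descending sort (PrimaryFlag is 1 for the primary tag, so ascending order would put it last):
ORDER BY AssignmentTable.PrimaryFlag DESC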
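Here is a small sketch of the assignment table with the primary flag, again run through Python's sqlite3 so it can be tried locally. The answer suggests enforcing "one primary tag per event" in application code. The partial unique index below is an alternative I have swapped in for illustration, and it relies on SQLite's `CREATE INDEX ... WHERE` syntax. As far as I know MySQL has no direct equivalent, so treat that part as an assumption that would need adapting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE AssignmentTable (
    AssignmentID INTEGER PRIMARY KEY,
    EventID      INTEGER NOT NULL,
    TagID        INTEGER NOT NULL,
    PrimaryFlag  INTEGER NOT NULL DEFAULT 0,
    UNIQUE (EventID, TagID)          -- a tag is attached to an event at most once
);
-- Assumption (SQLite-specific): at most one row per event may have PrimaryFlag = 1.
CREATE UNIQUE INDEX one_primary_per_event
    ON AssignmentTable (EventID) WHERE PrimaryFlag = 1;
INSERT INTO AssignmentTable (EventID, TagID, PrimaryFlag)
    VALUES (1, 1, 1), (1, 2, 0), (1, 5, 0), (2, 4, 1);
""")

# DESC puts the row with PrimaryFlag = 1 first.
ordered = conn.execute(
    "SELECT TagID, PrimaryFlag FROM AssignmentTable "
    "WHERE EventID = ? ORDER BY PrimaryFlag DESC, TagID",
    (1,),
).fetchall()
print(ordered)

# Trying to give event 1 a second primary tag violates the partial index.
try:
    conn.execute(
        "INSERT INTO AssignmentTable (EventID, TagID, PrimaryFlag) VALUES (1, 3, 1)"
    )
    second_primary_rejected = False
except sqlite3.IntegrityError:
    second_primary_rejected = True
print(second_primary_rejected)
```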
Related
I'm currently working on a little project that uses MySQL. However I'm struggling with the database design. Currently I've come up with 2 designs, one stores more data but is actually the way I want it to be, however this way makes it really hard to work with the data. The other way is I think more basic and simplifies a lot of things but stores less data.
Design 1
Example data items table
id  description  time_created
1   Car          2021-04-17 17:30:00
2   Bike         2021-04-17 17:30:00
Example data user_items table
id  user_id  item_id  time_achieved
1   1        1        2021-04-17 17:30:04
2   1        1        2021-04-17 17:30:03
3   1        1        2021-04-17 17:30:17
4   1        1        2021-04-17 17:30:22
5   1        1        2021-04-17 17:30:34
6   1        2        2021-04-17 17:30:42
7   1        2        2021-04-17 17:30:54
Design 2
Example data items table
id  description  time_created
1   Car          2021-04-17 17:30:00
2   Bike         2021-04-17 17:30:00
Example data user_items table
id  user_id  item_id  count
1   1        1        5
2   1        2        2
Basically we have items that can be anything, they include a description to specify what they actually are. A user can collect items (a lot). These are stored in the user_items table which contains a FK user_id and item_id to the users and items table. The users table is left out for simplicity.
As you can see, design 1 stores a lot more rows in the user_items table, which allows us to add more information (time_achieved and more) per item that a user achieved. However, this results in more rows and probably a harder time querying. Design 2, on the other hand, simply adds a count column to record how many items the user has, but this is very limiting because we cannot add more data (achieved time, etc.) per user_item.
I'm not sure if design 1 is the right and only design for what we want to achieve. Basically we really want to store additional metadata per user_item but I just don't know if this is the right design since it quickly fills up the database. Does anyone have a suggestion/idea for an alternative design which stores less data than design 1 but still allows to add more info per user_item?
Thanks in advance.
Does anyone have a suggestion/idea for an alternative design which stores less data than design 1 but still allows to add more info per user_item?
Design 1 should work.
This design will also work but quickly fills up, more efficient.
id, item_id, Item_des, Item_qty, user_id, username, time_created, all in one table.
Some of the values will be repeated.
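To illustrate why Design 1 need not be hard to query, here is a minimal sketch that derives Design 2's count column from Design 1's rows with a GROUP BY. It uses Python's sqlite3 rather than MySQL so it runs standalone; the data is the sample from the question, and nothing else is assumed beyond the column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_items (
    id            INTEGER PRIMARY KEY,
    user_id       INTEGER NOT NULL,
    item_id       INTEGER NOT NULL,
    time_achieved TEXT NOT NULL
);
INSERT INTO user_items (user_id, item_id, time_achieved) VALUES
    (1, 1, '2021-04-17 17:30:04'),
    (1, 1, '2021-04-17 17:30:03'),
    (1, 1, '2021-04-17 17:30:17'),
    (1, 1, '2021-04-17 17:30:22'),
    (1, 1, '2021-04-17 17:30:34'),
    (1, 2, '2021-04-17 17:30:42'),
    (1, 2, '2021-04-17 17:30:54');
""")

# One row per (user, item) with the number of times it was achieved,
# which is the information Design 2 stores in its count column.
counts = conn.execute("""
    SELECT user_id, item_id, COUNT(*) AS item_count
    FROM user_items
    GROUP BY user_id, item_id
    ORDER BY user_id, item_id
""").fetchall()
print(counts)
```

So the per-row metadata is kept, and the counts can still be computed on demand.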
Which of these methods would be the most efficient way of storing, retrieving, processing and searching a large (millions of records) index of stored URLs along with their keywords?
Example 1: (Using one table)
TABLE_URLs-----------------------------------------------
ID DOMAIN KEYWORDS
1 mysite.com videos,photos,images
2 yoursite.com videos,games
3 hissite.com games,images
4 hersite.com photos,pictures
---------------------------------------------------------
Example 2: (one-to-one Relationship from one table to another)
TABLE_URLs-----------------------------------------------
ID DOMAIN KEYWORDS
1 mysite.com
2 yoursite.com
3 hissite.com
4 hersite.com
---------------------------------------------------------
TABLE_URL_KEYWORDS---------------------------------------------
ID DOMAIN_ID KEYWORDS
1 1 videos,photos,images
2 2 videos,games
3 3 games,images
4 4 photos,pictures
---------------------------------------------------------
Example 3: (one-to-one Relationship from one table to another (Using a reference table))
TABLE_URLs-----------------------------------------------
ID DOMAIN
1 mysite.com
2 yoursite.com
3 hissite.com
4 hersite.com
---------------------------------------------------------
TABLE_URL_TO_KEYWORDS------------------------------------
ID DOMAIN_ID KEYWORDS_ID
1 1 1
2 2 2
3 3 3
4 4 4
---------------------------------------------------------
TABLE_KEYWORDS-------------------------------------------
ID KEYWORDS
1 videos,photos,images
2 videos,games
3 games,images
4 photos,pictures
---------------------------------------------------------
Example 4: (many-to-many Relationship from url to keyword ID (using reference table))
TABLE_URLs-----------------------------------------------
ID DOMAIN
1 mysite.com
2 yoursite.com
3 hissite.com
4 hersite.com
---------------------------------------------------------
TABLE_URL_TO_KEYWORDS------------------------------------
ID DOMAIN_ID KEYWORDS_ID
1 1 1
2 1 2
3 1 3
4 2 1
5 2 4
6 3 4
7 3 3
8 4 2
9 4 5
---------------------------------------------------------
TABLE_KEYWORDS-------------------------------------------
ID KEYWORDS
1 videos
2 photos
3 images
4 games
5 pictures
---------------------------------------------------------
My understanding is that Example 1 would take the largest amount of storage space, but searching through the data would be quick (repeated keywords are saved multiple times, but the keywords sit next to the relevant domain).
Whereas Example 4 would save a ton of storage space, but searching through it would take longer (duplicate keywords are not stored, but referencing multiple keywords for each domain would take longer).
Could anyone give me any insight or thoughts on which method would be best when designing a database that can handle huge amounts of data? Bear in mind that you may want to display a URL with its associated keywords OR search for one or more keywords and bring up the most relevant URLs.
You do have a many-to-many relationship between url and keywords. The canonical way to represent this in a relational database is to use a bridge table, which corresponds to example 4 in your question.
Using the proper data structure, you will find out that the queries will be much easier to write, and as efficient as it gets.
I don't know what drives you to think that searching in a structure like the first one will be faster. It requires you to do pattern matching when searching for each single keyword, which is notably slow. On the other hand, using a junction table lets you search for exact matches, which can take advantage of indexes.
Finally, maintaining such a structure is also much easier; adding or removing keywords can be done with insert and delete statements, while other structures require you to do string manipulation on a delimited list, which again is tedious, error-prone and inefficient.
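As a rough sketch of the bridge-table approach (Example 4), the following builds the three tables with the question's sample data and looks up domains by an exact keyword match. It runs on Python's sqlite3 rather than MySQL. The lower-case table names, the extra index with the keyword column first, and the `domains_for` helper are my own assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE urls     (id INTEGER PRIMARY KEY, domain  TEXT NOT NULL UNIQUE);
CREATE TABLE keywords (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL UNIQUE);
CREATE TABLE url_to_keywords (
    domain_id   INTEGER NOT NULL REFERENCES urls(id),
    keywords_id INTEGER NOT NULL REFERENCES keywords(id),
    PRIMARY KEY (domain_id, keywords_id)
);
-- Assumed extra index so keyword -> domain lookups do not scan the table.
CREATE INDEX idx_keyword_first ON url_to_keywords (keywords_id, domain_id);

INSERT INTO urls VALUES
    (1, 'mysite.com'), (2, 'yoursite.com'), (3, 'hissite.com'), (4, 'hersite.com');
INSERT INTO keywords VALUES
    (1, 'videos'), (2, 'photos'), (3, 'images'), (4, 'games'), (5, 'pictures');
INSERT INTO url_to_keywords VALUES
    (1, 1), (1, 2), (1, 3), (2, 1), (2, 4), (3, 4), (3, 3), (4, 2), (4, 5);
""")

def domains_for(keyword):
    """Return the domains tagged with exactly this keyword (hypothetical helper)."""
    sql = """
        SELECT u.domain FROM urls u
        JOIN url_to_keywords uk ON uk.domain_id = u.id
        JOIN keywords k ON k.id = uk.keywords_id
        WHERE k.keyword = ?          -- exact match, no LIKE '%...%' needed
        ORDER BY u.domain
    """
    return [row[0] for row in conn.execute(sql, (keyword,))]

games = domains_for("games")
print(games)

# Removing a keyword from a domain is a plain DELETE, with no string surgery.
conn.execute("DELETE FROM url_to_keywords WHERE domain_id = 2 AND keywords_id = 4")
games_after_delete = domains_for("games")
print(games_after_delete)
```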
None of the above.
Simply have a table with 2 string columns:
CREATE TABLE domain_keywords (
domain VARCHAR(..) NOT NULL,
keyword VARCHAR(..) NOT NULL,
PRIMARY KEY(domain, keyword),
INDEX(keyword, domain)
) ENGINE=InnoDB
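A quick sketch of how the two-column table above might be used, including the "one or more keywords, most relevant first" search from the question. It uses Python's sqlite3 with TEXT columns, since the VARCHAR lengths are left open above and ENGINE=InnoDB is MySQL-specific. The `rank` helper and ranking by number of matching keywords are assumptions of mine, not something this answer prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE domain_keywords (
    domain  TEXT NOT NULL,
    keyword TEXT NOT NULL,
    PRIMARY KEY (domain, keyword)
);
CREATE INDEX idx_keyword_domain ON domain_keywords (keyword, domain);
""")

# Sample data from the question, one row per (domain, keyword) pair.
pairs = [
    ("mysite.com", "videos"), ("mysite.com", "photos"), ("mysite.com", "images"),
    ("yoursite.com", "videos"), ("yoursite.com", "games"),
    ("hissite.com", "games"), ("hissite.com", "images"),
    ("hersite.com", "photos"), ("hersite.com", "pictures"),
]
conn.executemany("INSERT INTO domain_keywords VALUES (?, ?)", pairs)

def rank(keywords):
    """Domains matching any keyword, most matches first (hypothetical helper)."""
    marks = ",".join("?" * len(keywords))   # one placeholder per keyword
    sql = f"""
        SELECT domain, COUNT(*) AS hits
        FROM domain_keywords
        WHERE keyword IN ({marks})
        GROUP BY domain
        ORDER BY hits DESC, domain
    """
    return conn.execute(sql, keywords).fetchall()

ranked = rank(["videos", "games"])
print(ranked)
```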
Notes:
It will be faster.
It will be easier to write code.
Having a plain id is very much a waste.
Normalizing the domain and keyword buys little space savings, but at a big loss in efficiency.
"Huge database"? I predict that this table will be smaller than your Domains table. That is, this table is not your main concern for "huge".
I am wondering if any of you would be able to help me. I am trying to loop through Table 1 (which has duplicate plant codes) and, for each unique plant code, create a new row in the two other tables. For the non-unique PTypeID, I can link any one of the PTypeIDs for each insert; it doesn't matter which I choose. I would like to set the rest of the fields, like name, myself. I am just stuck on the logic of how to insert into one table while looping through another. So here is the data:
Table 1
PlantCode PlantID PTypeID
MEX 1 10
USA 2 11
USA 2 12
AUS 3 13
CHL 4 14
Table 2
PTypeID PtypeName PRID
123 Supplier 1
23 General 2
45 Customer 3
90 Broker 4
90 Broker 5
Table 3
PCreatedDate PRID PRName
2005-03-21 14:44:27.157 1 Classification
2005-03-29 00:00:00.000 2 Follow Up
2005-04-13 09:27:17.720 3 Step 1
2005-04-13 10:31:37.680 4 Step 2
2005-04-13 10:32:17.663 5 General Process
Any help at all would be greatly appreciated.
I'm unclear on what relationship there is between Table 1 and either of the other two, so this is going to be a bit general.
First, there are two options and both require a select statement to get the unique values of PlantCode out of table1, along with one of the PTypeId's associated with it, so let's do that:
select PlantCode, min(PTypeId)
from table1
group by PlantCode;
This gets the lowest valued PTypeId associated with the PlantCode. You could use max(PTypeId) instead which gets the highest value if you wanted: for 'USA' min will give you 11 and max will give you 12.
Having selected that data, you can either write some code (C#, C++, Java, whatever) to read through the results row by row and insert new data into table2 and table3, or do it all in SQL. I'm not going to show the former, but I'll show how to do it using pure SQL.
insert into table2 (PTypeId, PTypeName, PRID)
select PTypeId, 'YourChoiceOfName', 24 -- set PRID to 24 for all
from
(
select PlantCode, min(PTypeId) as PTypeId
from table1
group by PlantCode
) x;
and follow that with a similar insert.... select... for table3.
Hope that helps.
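For anyone wanting to try the INSERT ... SELECT before running it on real data, here is a minimal sketch using Python's sqlite3 with the question's Table 1 rows. Starting from an empty table2 and the loose column types are assumptions made to keep the result easy to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (PlantCode TEXT, PlantID INTEGER, PTypeID INTEGER);
CREATE TABLE table2 (PTypeID INTEGER, PTypeName TEXT, PRID INTEGER);

INSERT INTO table1 VALUES
    ('MEX', 1, 10), ('USA', 2, 11), ('USA', 2, 12), ('AUS', 3, 13), ('CHL', 4, 14);

-- One new table2 row per distinct PlantCode, using its lowest PTypeID.
INSERT INTO table2 (PTypeID, PTypeName, PRID)
SELECT PTypeID, 'YourChoiceOfName', 24
FROM (
    SELECT PlantCode, MIN(PTypeID) AS PTypeID
    FROM table1
    GROUP BY PlantCode
) x;
""")

inserted = conn.execute(
    "SELECT PTypeID, PTypeName, PRID FROM table2 ORDER BY PTypeID"
).fetchall()
print(inserted)
```

Five source rows produce four inserts, because the two USA rows collapse into one.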
I need some advice on how to set up my tables. I currently have a product table and a product codes table.
In the codes table I have an id and a title such as:
1 567902
2 345789
3 345678
There can be many items in this table.
In my product table I have the usual product id, title, etc., but also a code id column in which I'm currently storing a comma-separated list of ids for any codes the product needs to reference.
In that column I could end up with ids like: 2,5,6,9
I'm going to need to be able to search the products table for a specific set of code ids, and this is where I've run into problems: using id IN ($var) or FIND_IN_SET is proving problematic. I've been advised to restructure it, which I'm happy to do; I'm just wondering what the best method would be.
Sounds like you have two choices. If this is a 1 to many relationship, then you need to have the foreign key in the code table, not the product table.
i.e.
codeId code productId
1 567902 2
2 345789 6
3 345678 9
4 345690 9
The other option is to have another table which contains productId and codeId (both as foreign keys), this is a many-to-many relationship. This is what you should go for if a code can be assigned to multiple products (I assume not). It will look something like this:
codeId productId
1 2
1 10
2 6
3 9
4 9
I think the first option is what you need.
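If it helps, here is a rough sketch of the second option (a separate productId/codeId table), including moving existing comma-separated lists into it and then searching for products by a set of code ids. It uses Python's sqlite3; the table names, product titles, the `legacy` dictionary standing in for the old column, and the `products_with_any` helper are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE codes    (id INTEGER PRIMARY KEY, code  TEXT NOT NULL);
CREATE TABLE product_codes (
    productId INTEGER NOT NULL REFERENCES products(id),
    codeId    INTEGER NOT NULL REFERENCES codes(id),
    PRIMARY KEY (productId, codeId)
);
INSERT INTO codes VALUES (1, '567902'), (2, '345789'), (3, '345678'), (4, '345690');
""")

# Hypothetical stand-in for the old comma-separated column: product id -> "id,id,..."
legacy = {2: "1", 6: "2", 9: "3,4", 10: "1"}

for product_id, csv_ids in legacy.items():
    conn.execute(
        "INSERT INTO products VALUES (?, ?)", (product_id, f"Product {product_id}")
    )
    # One row per (product, code) pair replaces the delimited string.
    conn.executemany(
        "INSERT INTO product_codes (productId, codeId) VALUES (?, ?)",
        [(product_id, int(code_id)) for code_id in csv_ids.split(",")],
    )

def products_with_any(code_ids):
    """Product ids referencing at least one of the given code ids."""
    marks = ",".join("?" * len(code_ids))
    sql = (
        "SELECT DISTINCT productId FROM product_codes "
        f"WHERE codeId IN ({marks}) ORDER BY productId"
    )
    return [row[0] for row in conn.execute(sql, code_ids)]

found = products_with_any([1, 3])
print(found)
```

With one row per pair, a plain IN on an integer column does the job and FIND_IN_SET is no longer needed.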
I have a table User that stores user information - such as name, date of birth, locations, etc.
I have also created a link table called User_Options - for the purpose of storing multi-value attributes - this basically stores the checkbox selections.
I have a front-end form for the user to fill in and create their user profile. Here are the tables I have created to generate the checkbox options:
Table User_Attributes
=====================
id attribute_name
---------------------
1 Hobbies
2 Music
Table User_Attribute_Options
======================================
id user_attribute_id option_name
--------------------------------------
1 1 Reading
2 1 Sports
3 1 Travelling
4 2 Rock
5 2 Pop
6 2 Dance
So, on the front-end form there are two sets of checkbox options - one set for Hobbies and one set for Music.
And here are the User tables:
Table User
========================
id name age
------------------------
1 John 25
2 Mark 32
Table User_Options
==================================================
id user_id user_attribute_id value
--------------------------------------------------
1 1 1 1
2 1 1 2
3 1 2 4
4 1 2 5
5 2 1 2
6 2 2 4
(in the above table 'user_attribute_id' is the ID of the parent attribute and 'value' is the ID of the attribute option).
So I'm not sure that I've done all this correctly, or efficiently. I know there is a method of storing hierarchical data in the same table but I prefer to keep things separate.
My main concern is with the User_Options table - the idea behind this is that there only needs to be one link table that stores multi-value attributes, rather than have a table for each and every multi-value attribute.
The only thing I can see that I'd change is that in the association table, User_Options, you have an id that doesn't seem to serve a purpose. The primary key for that table would be all three columns, and I don't think you'd be referring to the options a user has by an id--you'd be getting them by user_id/user_attribute_id. For example, give me all the user options where user is 1 and user attribute id is 2. Having those records uniquely keyed with an additional field seems extraneous.
I think otherwise the general shape of the tables and their relationships looks right to me.
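A small sketch of what dropping the surrogate id might look like, with the three columns as a composite primary key and the "user 1, attribute 2" lookup mentioned above. It uses Python's sqlite3 and the question's sample rows; the integer column types are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User_Options (
    user_id           INTEGER NOT NULL,
    user_attribute_id INTEGER NOT NULL,
    value             INTEGER NOT NULL,   -- id of the chosen attribute option
    PRIMARY KEY (user_id, user_attribute_id, value)
);
INSERT INTO User_Options VALUES
    (1, 1, 1), (1, 1, 2), (1, 2, 4), (1, 2, 5), (2, 1, 2), (2, 2, 4);
""")

# All options where the user is 1 and the user attribute id is 2 (Music).
music_for_user_1 = [
    row[0]
    for row in conn.execute(
        "SELECT value FROM User_Options "
        "WHERE user_id = ? AND user_attribute_id = ? ORDER BY value",
        (1, 2),
    )
]
print(music_for_user_1)

# The composite key also stops the same checkbox being stored twice.
try:
    conn.execute("INSERT INTO User_Options VALUES (1, 2, 4)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```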
There's nothing wrong with how you've done it.
It's possible to make things more extensible at the price of more linked table references (and in the composition of your queries). It's also possible to make things flatter, and less extensible and flexible, but your queries will be faster.
But, as is usually the case, there's more than one way to do it.