I am working on a MySQL database that needs to represent multivalued attributes, and I am trying to find a way to create a parent and a child table. The idea is to have an employee table and a hobby table. The hobbies would be the multivalued attribute, since an employee can have more than one hobby. My question is: when creating these tables, should I use 3NF and add a third table to show the relation between the two, or is there a way to implement this with just two tables? I'm not very familiar with multivalued attributes, so I need help with which keys each table should use and what the final form should look like. I believe I need 3NF, with the hobby id as the primary key of the hobby table, the employee id as the primary key of the employee table, and a relation table containing both the employee id and the hobby id as foreign keys referencing the other two tables. Any help or suggestions on representing multivalued attributes would be greatly appreciated!
You have a table for employees already. It probably has a primary key we'll call employee_id.
You need a table for hobbies. It will need a primary key we'll call hobby_id.
Then, you need a way to relate employees and hobbies many-to-many. That's implemented with a third table, let's call it employees_hobbies. Using a name like that is a good idea, because the next guy to work on your code will recognize its purpose right away.
employees_hobbies should have two columns, employee_id and hobby_id. Those two columns together should be the composite primary key. Then, to confer a hobby on an employee, you add a row to employees_hobbies containing the two id values. If an employee drops a hobby, you delete the row.
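As a minimal sketch of that junction table, here is the idea run against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL, and the sample rows and the add_hobby / drop_hobby helpers are hypothetical names made up for illustration. It shows that the composite primary key rejects a duplicate employee/hobby pair.

```python
import sqlite3

# SQLite stands in for MySQL; table and column names follow the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE hobbies   (hobby_id    INTEGER PRIMARY KEY, hobbyname TEXT NOT NULL);
CREATE TABLE employees_hobbies (
    employee_id INTEGER NOT NULL REFERENCES employees(employee_id),
    hobby_id    INTEGER NOT NULL REFERENCES hobbies(hobby_id),
    PRIMARY KEY (employee_id, hobby_id)   -- composite key, no surrogate id
);
INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob');   -- hypothetical sample data
INSERT INTO hobbies   VALUES (10, 'chess'), (11, 'golf');
""")

def add_hobby(employee_id, hobby_id):
    """Confer a hobby on an employee by adding one row."""
    conn.execute("INSERT INTO employees_hobbies VALUES (?, ?)", (employee_id, hobby_id))

def drop_hobby(employee_id, hobby_id):
    """Drop a hobby by deleting that row."""
    conn.execute(
        "DELETE FROM employees_hobbies WHERE employee_id = ? AND hobby_id = ?",
        (employee_id, hobby_id),
    )

add_hobby(1, 10)
add_hobby(1, 11)

# The composite primary key refuses the same pair twice.
try:
    add_hobby(1, 10)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

drop_hobby(1, 11)
remaining = conn.execute(
    "SELECT employee_id, hobby_id FROM employees_hobbies"
).fetchall()
```

In MySQL the table definitions would look much the same, with InnoDB enforcing the two foreign keys.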
If you want a list of employees showing their hobbies, you do this
SELECT e.name, GROUP_CONCAT(h.hobbyname) hobbies
FROM employees e
LEFT JOIN employees_hobbies eh ON e.employee_id = eh.employee_id
LEFT JOIN hobbies h ON eh.hobby_id = h.hobby_id
GROUP BY e.employee_id, e.name
Use LEFT JOIN operations here to keep employees without any hobbies (all work and no play) in your list.
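To see the difference the LEFT JOIN makes, here is a small sketch using Python's sqlite3 module as a stand-in for MySQL (SQLite also has GROUP_CONCAT). The two employees and their hobbies are made-up sample data, and hobbies_by_name is a hypothetical helper; Bob has no hobbies at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE hobbies   (hobby_id    INTEGER PRIMARY KEY, hobbyname TEXT);
CREATE TABLE employees_hobbies (
    employee_id INTEGER, hobby_id INTEGER,
    PRIMARY KEY (employee_id, hobby_id)
);
INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob');      -- Bob has no hobbies
INSERT INTO hobbies   VALUES (10, 'chess'), (11, 'golf');
INSERT INTO employees_hobbies VALUES (1, 10), (1, 11);
""")

query = """
SELECT e.name, GROUP_CONCAT(h.hobbyname) AS hobbies
FROM employees e
LEFT JOIN employees_hobbies eh ON e.employee_id = eh.employee_id
LEFT JOIN hobbies h ON eh.hobby_id = h.hobby_id
GROUP BY e.employee_id, e.name
"""

def hobbies_by_name(sql):
    """Run the query and return {employee name: set of hobby names}."""
    return {name: set(h.split(",")) if h else set() for name, h in conn.execute(sql)}

with_left_join = hobbies_by_name(query)
# Same query with inner joins: employees without hobbies drop out.
with_inner_join = hobbies_by_name(query.replace("LEFT JOIN", "JOIN"))
```

With the LEFT JOINs Bob stays in the list with an empty hobby set; with plain JOINs he disappears.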
If you want to find the five most common hobbies and the employees doing them, try this:
-- COUNT(eh.employee_id) rather than COUNT(*), so a hobby nobody has counts as 0
SELECT COUNT(eh.employee_id) hobbycount, h.hobbyname,
GROUP_CONCAT(e.name ORDER BY e.name) people
FROM hobbies h
LEFT JOIN employees_hobbies eh ON h.hobby_id = eh.hobby_id
LEFT JOIN employees e ON eh.employee_id = e.employee_id
GROUP BY h.hobby_id, h.hobbyname
ORDER BY 1 DESC
LIMIT 5
This way of handling many-to-many relationships gives you all kinds of ways of slicing and dicing your data.
MySQL is made for this sort of thing and handles it very efficiently at small scale and large, opinions to the contrary notwithstanding.
(Avoid putting a surrogate primary id key into your employees_hobbies table. It adds no value.)
Multivalued attributes are not a great fit for MySQL.
The best way to store them depends on the price you can pay and on your access needs. If you need an index, because the database is big and heavily loaded, you will probably need two tables:
employee ( id , name )
employee_hobbies ( id , employeeid , hobbyid )
In the simplest case, or if you want easy access, you can instead add a text field to the employee table, store the comma-separated hobby ids there, and select with the FIND_IN_SET() function.
E.g., with a single table:
employee ( id , name , MVA VARCHAR(512) )
You need to be sure that all the comma-separated ids will fit within the field's size.
SELECT * FROM employee WHERE FIND_IN_SET(some_hobbyid , MVA)
The advantage of this method is fewer queries.
The disadvantage is that it may be slower than the first, since FIND_IN_SET() cannot use an index on the column.
There are also advantages for high-load systems when importing into Sphinx, but that is another story.
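A rough sketch of the comma-separated variant, using Python's sqlite3 module. FIND_IN_SET() is MySQL-specific and SQLite does not have it, so this emulates it by wrapping both the list and the id in commas. The helper names and sample rows are hypothetical. It also shows why a plain substring match is not a substitute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, MVA VARCHAR(512));
INSERT INTO employee VALUES (1, 'Ann', '1,5'), (2, 'Bob', '11,12');  -- sample data
""")

def employees_with_hobby(hobby_id):
    """Emulate FIND_IN_SET(hobby_id, MVA): match ',id,' inside ',list,'."""
    sql = ("SELECT id FROM employee "
           "WHERE instr(',' || MVA || ',', ',' || ? || ',') > 0 ORDER BY id")
    return [row[0] for row in conn.execute(sql, (str(hobby_id),))]

def naive_substring_match(hobby_id):
    """Plain LIKE on the raw list: hobby 1 also matches 11 and 12."""
    sql = "SELECT id FROM employee WHERE MVA LIKE '%' || ? || '%' ORDER BY id"
    return [row[0] for row in conn.execute(sql, (str(hobby_id),))]

exact = employees_with_hobby(1)        # only Ann really has hobby 1
sloppy = naive_substring_match(1)      # Bob matches too, because of '11'
```

Either way the database has to scan every row and parse the string, which is the cost this approach trades for having a single table.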
I am trying to implement a database with multivalued attributes and build a filter-based search on it. For example, I want my people_table to contain id, name, address, hobbies, interests (hobbies and interests are multivalued). The user will be able to check many attributes, and the SQL should return only the people who have all of them.
I did some research and found a few ways to implement this, but I can't decide which one is best.
The first is to have one table with the basic info of people (id, name, address), two more for the multivalued attributes, and one more that contains only the keys of the other tables (I understand how to create these tables, but I don't know yet how to implement the search).
The second is to have one table with the basic info and then one table per attribute value. So I would have 20 or more tables (football, paint, golf, music, hiking, etc.) that contain only the ids of the people. Then, when the user checks hobbies and activities, I would get the desired results with JOINs (I am not sure about the complexity, so I don't know how fast it will be if the user checks many boxes).
The last is an implementation that I didn't find on the internet (and I know there is a reason :) ), but to my mind it is the easiest to implement and the fastest in terms of complexity: use only one table that has the basic info as usual plus all the attributes as boolean columns. So if I have 1000 people in my table, there will be only 1000 iterations, which I imagine will be fast enough with AND conditions.
So my question is: can I use the third implementation, or is there a big disadvantage that I'm missing? And which of the first two ways do you suggest?
That is a typical n to m relation. It works like this
persons table
------------
id
name
address
interests table
---------------
id
name
person_interests table
----------------------
person_id
interest_id
person_interests contains a record for each interest of a person. To get the interests of a person do:
select i.name
from interests i
join person_interests pi on pi.interest_id = i.id
join persons p on pi.person_id = p.id
where p.name = 'peter'
You could also create tables for hobbies in the same way. To get the hobbies, do the same in a separate query. To get both in one query, you can do something like this:
select p.id, p.name,
i.name as interest,
h.name as hobby
from persons p
left join person_interests pi on pi.person_id = p.id
left join interests i on pi.interest_id = i.id
left join person_hobbies ph on ph.person_id = p.id
left join hobbies h on ph.hobby_id = h.id
where p.name = 'peter'
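For the filter search the question asks about ("return only those who have all of them"), one common pattern on top of these tables is to filter on the checked interests, group by person, and keep the groups whose distinct count equals the number of boxes checked. The sketch below uses Python's sqlite3 module as a stand-in for MySQL; the people_with_all helper and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE persons   (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE interests (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person_interests (
    person_id   INTEGER REFERENCES persons(id),
    interest_id INTEGER REFERENCES interests(id),
    PRIMARY KEY (person_id, interest_id)
);
INSERT INTO persons   VALUES (1, 'peter', ''), (2, 'mary', ''), (3, 'john', '');
INSERT INTO interests VALUES (1, 'football'), (2, 'golf'), (3, 'music');
INSERT INTO person_interests VALUES (1, 1), (1, 2), (2, 1), (3, 1), (3, 2), (3, 3);
""")

def people_with_all(interest_ids):
    """Names of the people who have every interest in interest_ids."""
    placeholders = ", ".join("?" for _ in interest_ids)
    sql = f"""
        SELECT p.name
        FROM persons p
        JOIN person_interests pi ON pi.person_id = p.id
        WHERE pi.interest_id IN ({placeholders})
        GROUP BY p.id, p.name
        HAVING COUNT(DISTINCT pi.interest_id) = ?
        ORDER BY p.name
    """
    params = list(interest_ids) + [len(interest_ids)]
    return [row[0] for row in conn.execute(sql, params)]

football_and_golf = people_with_all([1, 2])
all_three = people_with_all([1, 2, 3])
```

The query stays the same shape no matter how many boxes the user checks; only the IN list and the expected count change.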
The basic way to deal with this is with a many-to-many join table. Each user can have many hobbies. Each hobby can have many users. That's basic stuff you can find information about anywhere, and @juergend already covered that.
The harder part is tracking different information about various hobbies and interests. Like if their hobby is "baseball" you might want to track what position they play, but if their hobby is "travel" you might want to track their favorite countries. Doing this with typical SQL relationships will lead to a rapid proliferation of tables and columns.
A hybrid approach is to use the new JSON data type to store some unstructured data. To expand on @juergend's example, you might add a field to Person_Interests which can store some of those details about that person's interest.
create table Person_Interests (
InterestID integer references Interests(ID),
PersonID integer references Persons(ID),
Details JSON
);
And now you could add that Person 45 has Interest 12 (travel), their favorite country is Djibouti, and they've been to 45 countries.
insert into person_interests
(InterestID, PersonID, Details)
values (12, 45, '{"favorite_country": "Djibouti", "countries_visited": 45}');
And you can use JSON search functions to find, for example, everyone whose favorite country is Djibouti.
select p.id, p.name
from person_interests pi
join persons p on p.id = pi.personid
where pi.details->"$.favorite_country" = "Djibouti"
The advantage here is flexibility: interests and their attributes aren't limited by your database schema.
The disadvantage is performance. The JSON data type isn't the most efficient, and indexing a JSON column in MySQL is complicated. Good indexing is critical to good SQL performance. So as you figure out common patterns you might want to turn commonly used attributes into real columns in real tables.
The other option would be to use table inheritance. This is a feature of Postgres, not MySQL, and I'd recommend considering switching. Postgres also has better and more mature JSON support and JSON columns are easier to index.
With table inheritance, rather than having to write a completely new table for every different interest, you can make specific tables which inherit from a more generic one.
create table person_interests_travel (
FavoriteCountry text,
CountriesVisited text[]
) inherits(person_interests);
This still has InterestID, PersonID, and Details, but it's added some specific columns for tracking their favorite country and countries they've visited.
Note the text[] type. PostgreSQL also supports arrays, so you can store real lists without having to create another join table. You can also do this in MySQL with a JSON field, but arrays offer type constraints that JSON does not.
I have two tables, one named as employee , contains the details of an employee, with the primary keys as employee_id and employee_name.
The other named as assignment, with primary key as assign_id.
Now, there are two columns in the table employee. One is preference_1 and other is preference_2. These both can contain the assign_id from table assignment. Preference 1 has to be filled by all the employees and preference 2 is optional but no more than two preferences should be allowed.
How do I link both these tables ?
preference_1 and preference_2 should be two separate tables, not inside employee table. You can have an employee_ID inside the pref_1, pref_2 AND assignment tables.
Maybe something like this:
SELECT e.employee_name, a1.assignment_name AS firstPreference, a2.assignment_name AS secondPreference
FROM employee e
JOIN assignment a1 ON e.preference_1 = a1.assign_id
LEFT JOIN assignment a2 ON e.preference_2 = a2.assign_id
Make both preference_1 and preference_2 foreign keys to the assignment table's assign_id; since preference_2 is optional, it can be left nullable. That is, preference_1 and preference_2 each reference assignment(assign_id).
As for the relationship, to me it looks like many-to-many, because an employee may work on multiple assignments and a single assignment may be assigned to more than one employee.
Table creation:
Put a NOT NULL constraint on preference_1, since every employee must fill it in (preference_1 int not null), but make preference_2 a nullable column (preference_2 int null).
That makes sure every employee has at least one assignment. If the optional preference_2 is filled in, he or she will work on two assignments.
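Here is a minimal sketch of those constraints, run against SQLite from Python as a stand-in (the question does not say which database is in use). The employee primary key is simplified to employee_id alone, and the sample rows and the try_insert helper are hypothetical. It checks that a missing preference_1 and a dangling preference_2 are both rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE assignment (
    assign_id       INTEGER PRIMARY KEY,
    assignment_name TEXT NOT NULL
);
CREATE TABLE employee (
    employee_id   INTEGER PRIMARY KEY,     -- simplified from the question's key
    employee_name TEXT NOT NULL,
    preference_1  INTEGER NOT NULL REFERENCES assignment(assign_id),
    preference_2  INTEGER NULL     REFERENCES assignment(assign_id)
);
INSERT INTO assignment VALUES (1, 'Payroll'), (2, 'Audit');   -- sample data
""")

def try_insert(row):
    """Insert one employee row; return True if the constraints accept it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

only_first_pref = try_insert((1, 'Ann', 1, None))   # preference_2 left empty
both_prefs      = try_insert((2, 'Bob', 1, 2))
no_first_pref   = try_insert((3, 'Cid', None, 2))   # violates NOT NULL
dangling_pref   = try_insert((4, 'Dee', 1, 99))     # 99 is not an assignment
```

Having exactly two columns is what caps the number of preferences at two; nothing here stops both columns from holding the same assignment, which would need an extra check.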
While querying, you can do a join that matches either preference, like
select e.*, a.assign_id from employee e join assignment a on
a.assign_id = e.preference_1
or a.assign_id = e.preference_2
I'm creating a table for a college major. The table is called major. The columns will be majorID, majorName, and requiredCourses.
In Access how can I make requiredCourses a multivalue field? Required courses will be around 20 courses.
Thank you for your help.
You need to create a many-to-many relationship (a major requires many courses, and a course can be required by many majors). The way it is usually done is like this:
You need to create a new table for the courses. Call it course. The table will contain CourseID, CourseName, etc. CourseID will be the primary key of this table.
You will need to create another table that will act as a link between your major and your course tables. The table can be called something like majorCourses. The table will contain at least these two fields: majorID and courseID (you can of course add more fields, like dateAdded, isInactive, etc.).
To link your tables you will need to JOIN these tables, like this:
SELECT m.majorID, m.majorName, c.courseID, c.CourseName
FROM major m
INNER JOIN (majorCourses mc INNER JOIN course c ON mc.courseID = c.courseID)
ON m.majorID = mc.majorID
I'm working on a personal project for timekeeping on various projects, but I'm not sure of the best way to structure my database.
A simplified breakdown of the structure is as follows:
Each client can have multiple reports.
Each report can have multiple line items.
Each line item can have multiple time records.
There will ultimately be more relationships, but that's the basis of the application. As you can see, each item is related to the item beneath it in a one-to-many relationship.
My question is, should I relate each table to each "parent" table above it? Something like this:
clients
    id
reports
    id
    client_id
line_items
    id
    report_id
    client_id
time_records
    id
    report_id
    line_item_id
    client_id
And as it cascaded down, there would be more and more foreign keys added to each new table.
My initial reaction is that this is not the correct way to do it, but I would love to get some second (and third!) opinions.
The advantage of the way you're doing it is that you could check all time records for, say, a specific client id without needing a join. But really, it isn't necessary. All you need is to store a reference back up one "level" so to speak. Here are some examples from the "client" perspective:
To get a specific client's reports: (simple; same as current schema you suggest)
SELECT * FROM `reports`
WHERE `client_id` = ?;
To get a specific client's line items: (new schema; don't need "client_id" in table)
SELECT `line_items`.* FROM `line_items`
JOIN `reports` ON `reports`.`id` = `line_items`.`report_id`
JOIN `clients` ON `clients`.`id` = `reports`.`client_id`
WHERE `clients`.`id` = ?;
To get a specific client's time entries: (new schema; don't need "client_id" or "report_id" in table)
SELECT `time_records`.* FROM `time_records`
JOIN `line_items` ON `line_items`.`id` = `time_records`.`line_item_id`
JOIN `reports` ON `reports`.`id` = `line_items`.`report_id`
JOIN `clients` ON `clients`.`id` = `reports`.`client_id`
WHERE `clients`.`id` = ?;
So, the revised schema would be:
clients
    id
reports
    id
    client_id
line_items
    id
    report_id
time_records
    id
    line_item_id
EDIT:
Additionally, I would consider using views to simplify the queries (I assume you'll use them often), definitely create indexes on the join columns, and use foreign key constraints to enforce referential integrity (InnoDB only).
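To make the revised schema concrete, here is a sketch with foreign keys, indexes on the join columns, and a view that hides the joins. It runs on SQLite through Python's sqlite3 module as a stand-in for MySQL/InnoDB. The view name client_time_records, the index names, and the sample rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE clients      (id INTEGER PRIMARY KEY);
CREATE TABLE reports      (id INTEGER PRIMARY KEY,
                           client_id    INTEGER NOT NULL REFERENCES clients(id));
CREATE TABLE line_items   (id INTEGER PRIMARY KEY,
                           report_id    INTEGER NOT NULL REFERENCES reports(id));
CREATE TABLE time_records (id INTEGER PRIMARY KEY,
                           line_item_id INTEGER NOT NULL REFERENCES line_items(id));

-- Index each column used to walk back up one level.
CREATE INDEX idx_reports_client    ON reports(client_id);
CREATE INDEX idx_line_items_report ON line_items(report_id);
CREATE INDEX idx_time_records_item ON time_records(line_item_id);

-- Hypothetical view: every time record together with the client that owns it.
CREATE VIEW client_time_records AS
SELECT r.client_id, tr.id AS time_record_id
FROM time_records tr
JOIN line_items li ON li.id = tr.line_item_id
JOIN reports r     ON r.id  = li.report_id;

INSERT INTO clients      VALUES (1), (2);
INSERT INTO reports      VALUES (1, 1), (2, 2);
INSERT INTO line_items   VALUES (1, 1), (2, 1), (3, 2);
INSERT INTO time_records VALUES (1, 1), (2, 2), (3, 3);
""")

def time_records_for(client_id):
    """Time record ids for one client, with no client_id stored on the records."""
    sql = ("SELECT time_record_id FROM client_time_records "
           "WHERE client_id = ? ORDER BY time_record_id")
    return [row[0] for row in conn.execute(sql, (client_id,))]

client_1_records = time_records_for(1)
client_2_records = time_records_for(2)
```

Callers query the view as if client_id lived on time_records, while the tables themselves keep only the one-level-up reference.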
No. If there is no direct relation between the elements of the model, then there should be no direct relation between the corresponding tables. Otherwise your data will have redundancies and you will run into problems when updating.
This is the right way:
clients
    id
reports
    id
    client_id
line_items
    id
    report_id
time_records
    id
    line_id
You don't need client_id on the line_items table if you never join line items directly to clients, because you can get it through the reports table. The same goes for the other FKs.
I recommend thinking through the reports and queries you need over this data before creating redundant foreign keys that can complicate your development.
Adding redundant FKs later is not difficult if you need them: a few ALTER TABLE statements and UPDATE ... SELECT statements will solve the problem.
If line_items doesn't hold much information, you can denormalize and move that information into time_records.
Anywhere there is a direct relationship between two tables, you should use foreign keys to keep the data integrity. Personally, I would look at a structure like this:
Client
    ClientId
Report
    ReportId
    ClientId
LineItem
    LineItemId
    ReportId
TimeRecord
    TimeRecordId
    LineItemId
In this example, you do not need ClientId in LineItem because you have that relationship through the Report table. The major disadvantage of having ClientId in all of your tables is that if the business logic does not enforce consistency of these values (a bug is in the code) you can run into situations where you get different values depending on which table you search through:
Report:
    ReportId = 3
    ClientId = 2
LineItem:
    LineItemId = 1
    ReportId = 3
    ClientId = 3
In the above situation, you would be looking at ClientId = 2 if your query went through Report and ClientId = 3 if your query went through LineItem. It is difficult, once this happens, to determine which relationship is correct and where the bug is.
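If you already have a redundant ClientId on LineItem, a query like the one sketched below can at least find the rows that disagree. It loads exactly the two rows from the example above into an in-memory SQLite database via Python (a stand-in for whatever database you use) and lists each line item whose ClientId differs from its report's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Report   (ReportId INTEGER PRIMARY KEY, ClientId INTEGER);
CREATE TABLE LineItem (LineItemId INTEGER PRIMARY KEY,
                       ReportId INTEGER, ClientId INTEGER);
INSERT INTO Report   VALUES (3, 2);      -- ReportId = 3, ClientId = 2
INSERT INTO LineItem VALUES (1, 3, 3);   -- LineItemId = 1, ReportId = 3, ClientId = 3
""")

# Each row: (LineItemId, ClientId via Report, ClientId stored on LineItem)
mismatches = conn.execute("""
    SELECT li.LineItemId, r.ClientId, li.ClientId
    FROM LineItem AS li
    INNER JOIN Report AS r ON r.ReportId = li.ReportId
    WHERE r.ClientId <> li.ClientId
""").fetchall()
```

The query tells you that the two values conflict, but not which one is right, which is the point made above.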
Also, I would advocate for not having id columns, but instead more explicit names to describe what the id is used for. (ReportId or ClientId) In my opinion, this makes Joins easier to read. As an example:
SELECT COUNT(1) AS NumberOfLineItems
FROM Client AS c
INNER JOIN Report AS r ON c.ClientId = r.ClientId
INNER JOIN LineItem AS li ON r.ReportId = li.ReportId
WHERE c.ClientId = 12
As a personal opinion, I would have:
clients
id
time_records
id
client_id
report
line_item
report_id
That way all of your fields are over in the time_records table. You can then do something like:
SELECT *
FROM `time_records`
WHERE `time_records`.`client_id` = 16542
AND `time_records`.`report` = 164652
ORDER BY `time_records`.`id` ASC
I'm wondering if this is even possible.
I want to join 2 tables based on the data of table 1.
For example, table 1 has a column food whose data is "hotdog".
And I have a table called hotdog.
Is it possible to do a JOIN like this?
SELECT * FROM table1 t join t.food on id = foodid
I know it doesn't work, but is it even possible? Is there a workaround?
Thanks in advance.
No, you can't join to a different table per row in table1, not even with dynamic SQL as @Cade Roux suggests.
You could join to the hotdog table for rows where food is 'hotdog' and join to other tables for other specific values of food.
SELECT * FROM table1 JOIN hotdog ON id = foodid WHERE food = 'hotdog'
UNION
SELECT * FROM table1 JOIN apples ON id = foodid WHERE food = 'apples'
UNION
SELECT * FROM table1 JOIN soups ON id = foodid WHERE food = 'soup'
UNION
...
This requires that you know all the distinct values of food, and that all the respective food tables have compatible columns so you can UNION them together.
What you're doing is called polymorphic associations. That is, the foreign key in table1 references rows in multiple "parent" tables, depending on the value in another column of table1. This is a common design mistake of relational database programmers.
For alternative solutions, see my answers to:
Possible to do a MySQL foreign key to one of two possible tables?
Why can you not have a foreign key in a polymorphic association?
I also cover solutions for polymorphic associations in my presentation Practical Object Oriented Models In SQL, and in my book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming.
Only with dynamic SQL. It is also possible to left join many different tables and use CASE based on type, but the tables would all have to be known in advance.
It would be easier to recommend an appropriate design if we knew more about what you are trying to achieve, what your design currently looks like and why you've chosen that particular table design in the first place.
-- Say you have a table of foods:
id INT
foodtype VARCHAR(50) (right now it just contains 'hotdog' or 'hamburger')
name VARCHAR(50)
-- Then hotdogs:
id INT
length INT
width INT
-- Then hamburgers:
id INT
radius INT
thickness INT
Normally I would recommend some system for constraining only one auxiliary table to exist, but for simplicity, I'm leaving that out.
SELECT f.*, hd.length, hd.width, hb.radius, hb.thickness
FROM foods f
LEFT JOIN hotdogs hd
ON hd.id = f.id
AND f.foodtype = 'hotdog'
LEFT JOIN hamburgers hb
ON hb.id = f.id
AND f.foodtype = 'hamburger'
Now you will see that such a query can be code generated (or, for a very slow prototype, built as dynamic SQL on the fly) from SELECT DISTINCT foodtype FROM foods, given certain assumptions about table names and access to the table metadata.
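A rough sketch of that kind of code generation, in Python against SQLite as a stand-in database. The assumptions are loud ones: each auxiliary table is named foodtype plus "s", the metadata comes from SQLite's PRAGMA table_info, the foodtype values are trusted (they are pasted into the SQL text), and build_query and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE foods      (id INT, foodtype VARCHAR(50), name VARCHAR(50));
CREATE TABLE hotdogs    (id INT, length INT, width INT);
CREATE TABLE hamburgers (id INT, radius INT, thickness INT);
INSERT INTO foods      VALUES (1, 'hotdog', 'Chicago dog'), (2, 'hamburger', 'Cheeseburger');
INSERT INTO hotdogs    VALUES (1, 15, 3);
INSERT INTO hamburgers VALUES (2, 5, 2);
""")

def build_query(conn):
    """Generate one LEFT JOIN per distinct foodtype found in foods."""
    types = [row[0] for row in
             conn.execute("SELECT DISTINCT foodtype FROM foods ORDER BY foodtype")]
    select_cols = ["f.*"]
    joins = []
    for i, foodtype in enumerate(types):
        table = foodtype + "s"   # ASSUMPTION: table name is the type plus "s"
        alias = f"t{i}"
        # Table metadata: column name is field 1 of each PRAGMA table_info row.
        cols = [r[1] for r in conn.execute(f"PRAGMA table_info({table})")
                if r[1] != "id"]
        select_cols += [f"{alias}.{c}" for c in cols]
        joins.append(f"LEFT JOIN {table} {alias} "
                     f"ON {alias}.id = f.id AND f.foodtype = '{foodtype}'")
    return ("SELECT " + ", ".join(select_cols) + " FROM foods f "
            + " ".join(joins) + " ORDER BY f.id")

cursor = conn.execute(build_query(conn))
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
```

Adding a new food type and its table changes the column list of the result, so whatever consumes the rows has to cope with that.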
The problem is that ultimately whoever consumes the result of this query will have to be aware of new columns showing up whenever a new table is added.
So the question moves back to your client/consumer of the data - how is it going to handle the different types? And what does it mean for different types to be in the same set? And if it needs to be aware of the different types, what's the drawback of just writing different queries for each type or changing a manual query when new types are added given the relative impact of such a change anyway?