I am using MySQL Workbench to manage a database that was handed down to me for a development task. Unfortunately, the schema is a nightmare: no primary keys on numerous tables, lots of column duplication, etc.
First off, I want to add some uniqueness so that I can begin normalizing. I have a 'students' table where a student (with an ID) works on a project that belongs to a specific term (Fall 2014, Spring 2015, etc.).
Since the same student can work on the same project two semesters in a row for whatever reason, the only way to tell those rows apart is a (student ID, term) PK. So I actually do need that composite PK.
How might I alter the existing tables and set a composite PK?
EDIT:
To clarify: the schema contains a users table with actual student information (first/last name, email, term). The students table would more aptly be named projects, as it references the students only by ID and then lists the project they worked on, in the semester they worked on it. So at the very least, students.id would also be an FK referencing users.
I know the above doesn't quite make any sense; I'm just trying to keep this to one step at a time because the application depends on the schema and I don't want to introduce any new bugs at this point.
To clarify even further, here is what the users and students tables look like:
students
id  project  termInitiated  ...
20  XYZ      Summer 2013
20  XYZ      Fall 2013
23  ABC      Fall 2013
24  ABC      Fall 2014
...

users
studentId  firstName  lastName  termInitiated
20         A          AA        Summer 2013
20         A          AA        Fall 2013
23         Z          ZZ        Fall 2013
24         Y          YY        Fall 2014
...
Unfortunately, due to the way it is set up, I cannot make studentId a PK by itself, as the same student could be working on the same project for multiple semesters in a row.
The best fix would be a globally unique identifier that refers to the same student across different terms, but that would introduce a huge number of bugs right now that I do not have the time to fix. Thus, I think a composite PK is the best solution given my limitations.
You may need to grant yourself the ALTER privilege if the composite key is not being added. Take a look at this: https://kb.mediatemple.net/questions/788/How+do+I+grant+privileges+in+MySQL%3F#dv
Here is adrian's link: ALTER TABLE to add a composite primary key
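In MySQL the statement itself is `ALTER TABLE students ADD PRIMARY KEY (id, termInitiated);`, but it fails if any (id, termInitiated) pair already occurs twice, so it is worth checking for duplicates first. The sketch below uses Python's built-in sqlite3 module as a runnable stand-in for MySQL. SQLite cannot add a primary key to an existing table, so the hypothetical helper `add_composite_pk` rebuilds the table instead; the duplicate check is the same GROUP BY query you would run in MySQL Workbench. Column names come from the sample data above.

```python
import sqlite3

def find_duplicate_keys(conn):
    """Return (id, termInitiated, count) for every pair that occurs more than once."""
    return conn.execute(
        """
        SELECT id, termInitiated, COUNT(*)
        FROM students
        GROUP BY id, termInitiated
        HAVING COUNT(*) > 1
        """
    ).fetchall()

def add_composite_pk(conn):
    """Hypothetical helper: rebuild students with PRIMARY KEY (id, termInitiated).

    SQLite has no ALTER TABLE ... ADD PRIMARY KEY, so copy into a new table.
    MySQL equivalent: ALTER TABLE students ADD PRIMARY KEY (id, termInitiated);
    """
    with conn:                 # commit on success, roll back on error
        conn.execute("BEGIN")  # put the CREATE TABLE in the same transaction
        conn.execute(
            """
            CREATE TABLE students_new (
                id            INTEGER NOT NULL,
                project       TEXT,
                termInitiated TEXT    NOT NULL,
                PRIMARY KEY (id, termInitiated)
            )
            """
        )
        conn.execute("INSERT INTO students_new SELECT id, project, termInitiated FROM students")
        conn.execute("DROP TABLE students")
        conn.execute("ALTER TABLE students_new RENAME TO students")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, project TEXT, termInitiated TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(20, "XYZ", "Summer 2013"), (20, "XYZ", "Fall 2013"),
     (23, "ABC", "Fall 2013"), (24, "ABC", "Fall 2014")],
)
conn.commit()

dups = find_duplicate_keys(conn)
if dups:
    print("Resolve these rows before adding the key:", dups)
else:
    add_composite_pk(conn)
```

Once the key exists, inserting a second (20, 'Fall 2013') row is rejected, which is exactly the uniqueness the question asks for.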
My suggestion is to add a new field to your table that serves as the primary key and holds the unique combination of the two fields.
For example, in your case, add a field, say pid, to the students table:
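ALTER TABLE students
ADD pid VARCHAR(100);
Then populate it:
UPDATE students SET pid = CONCAT(id, '-', termInitiated);
and only then make it the primary key (declaring it PRIMARY KEY in the first statement fails on a table that already holds more than one row, because every existing row would get the same empty value):
ALTER TABLE students ADD PRIMARY KEY (pid);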
This gives you the unique combination of id and termInitiated in the pid field.
You can use it as the primary key, and whenever you want to select it or join it with another table, you reference it through the same combination of the two fields.
For example,
to join the students table with users:
SELECT * FROM users
INNER JOIN students
ON CONCAT(users.studentId, '-', users.termInitiated) = students.pid;
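To see that the concatenated key behaves as described, here is a small sketch using Python's sqlite3 as a stand-in for MySQL (SQLite spells CONCAT as the `||` operator). It builds both tables from the sample data in the question, populates pid, and compares the join through pid with a join on the two columns directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (studentId INTEGER, firstName TEXT, lastName TEXT, termInitiated TEXT);
    CREATE TABLE students (id INTEGER, project TEXT, termInitiated TEXT, pid TEXT);
    """
)
conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", [
    (20, "A", "AA", "Summer 2013"), (20, "A", "AA", "Fall 2013"),
    (23, "Z", "ZZ", "Fall 2013"), (24, "Y", "YY", "Fall 2014"),
])
conn.executemany("INSERT INTO students (id, project, termInitiated) VALUES (?, ?, ?)", [
    (20, "XYZ", "Summer 2013"), (20, "XYZ", "Fall 2013"),
    (23, "ABC", "Fall 2013"), (24, "ABC", "Fall 2014"),
])

# Populate the surrogate key, e.g. '20-Summer 2013'.
conn.execute("UPDATE students SET pid = id || '-' || termInitiated")

# Join through the concatenated key, as in the answer above.
via_pid = conn.execute(
    """
    SELECT u.firstName, s.project, s.termInitiated
    FROM users u
    JOIN students s ON u.studentId || '-' || u.termInitiated = s.pid
    ORDER BY s.id, s.termInitiated
    """
).fetchall()

# Join on the two columns directly; no derived column is needed.
via_columns = conn.execute(
    """
    SELECT u.firstName, s.project, s.termInitiated
    FROM users u
    JOIN students s ON u.studentId = s.id AND u.termInitiated = s.termInitiated
    ORDER BY s.id, s.termInitiated
    """
).fetchall()

print(via_pid == via_columns)
```

Both joins return the same rows. The two-column join is usually the simpler choice: the expression on the users side of the pid join generally keeps the optimizer from using an index on users.studentId, and pid has to be kept in sync whenever id or termInitiated changes.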
I hope this will work for you.
Please correct me if I am wrong.
Thank you.
Related
I have a table as such:
id  entity_id  first_year  last_year  sessions_attended  age
1   2020       1996        2008       3                  34.7
2   2024       1993        2005       2                  45.1
3   ...        ...         ...
id is an auto-increment primary key, and entity_id is a foreign key that must be unique in the table.
I have a query that calculates first and last year of attendance, and I want to be able to update this table with fresh data each time it is run, only updating the first and last year columns:
This is my insert/update for "first year":
insert into my_table (entity_id, first_year)
  ( select contact_id, @sd := year(start_date)
    from
      ( select contact_id, event_id, start_date from participations
        join events on participations.event_id = events.id where events.event_type_id = 7
        group by contact_id order by event_id ASC) as starter)
ON DUPLICATE KEY UPDATE first_year_85 = @sd;
I have one similar that does "last year", identical except for the target column and the order by.
The queries alone return the desired values, but I am having issues with the insert/update queries. When I run them, I end up with the same values for both fields (the correct first_year value).
Does anything stand out as the cause for this?
Anecdotal Note: This seems to work on MySQL 5.5.54, but when run on my local MariaDB, it just exhibits the above behavior...
Update:
Not my table design to dictate. This is a CRM that allows custom fields to be defined by end-users, I am populating the data via external queries.
The participations table holds all event registrations for all entity_ids, but the start dates are held in a separate events table, hence the join.
The variable is there because ON DUPLICATE KEY UPDATE will not accept a reference to the column without it.
Age is actually slightly more involved: It is age by the start date of the next active event of a certain type.
Fields are being "hard" updated as the values in this table are being pulled by in-CRM reports and searches, they need to be present, can't be dynamically calculated.
Since you have a 'natural' PK (entity_id), why have the id?
age? Are you going to have to change that column daily, or at least monthly? Not a good design. It would be better to have the constant birth_date in the table, then compute the ages in SELECT.
"calculates first and last year of attendance" -- This implies you have a table that lists all years of attendance (yoa)? If so, MAX(yoa) and MIN(yoa) would probably be a better way to compute things.
One rarely needs @variables in queries.
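A sketch of that suggestion: compute both years with MIN() and MAX() in one grouped query and upsert them without user variables. In MySQL/MariaDB the update clause would read `ON DUPLICATE KEY UPDATE first_year = VALUES(first_year), last_year = VALUES(last_year)` (MySQL 8.0.20+ deprecates VALUES() here in favour of a row alias). The sketch runs on Python's sqlite3 as a stand-in, where the equivalent is `ON CONFLICT ... DO UPDATE` with `excluded` (SQLite 3.24+). Table and column names follow the question; the event rows are invented for illustration. One plausible cause of the original symptom is that both queries rely on ORDER BY inside a derived table, plus a non-aggregated column under GROUP BY, to pick the earliest or latest row, which is not guaranteed; MariaDB's optimizer, for instance, is free to ignore ORDER BY in a FROM-clause subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (id INTEGER PRIMARY KEY, event_type_id INTEGER, start_date TEXT);
    CREATE TABLE participations (contact_id INTEGER, event_id INTEGER);
    CREATE TABLE my_table (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        entity_id  INTEGER NOT NULL UNIQUE,  -- the unique key the upsert hits
        first_year INTEGER,
        last_year  INTEGER
    );
    """
)
# Illustrative data: contact 2020 attended type-7 events in 1996, 2001 and 2008,
# plus one event of another type that must be ignored.
conn.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (1, 7, "1996-05-01"), (2, 7, "2001-05-01"), (3, 7, "2008-05-01"), (4, 3, "2010-01-01"),
])
conn.executemany("INSERT INTO participations VALUES (?, ?)", [
    (2020, 1), (2020, 2), (2020, 3), (2020, 4),
])
# A stale row that the upsert should overwrite.
conn.execute("INSERT INTO my_table (entity_id, first_year, last_year) VALUES (2020, 0, 0)")

UPSERT = """
INSERT INTO my_table (entity_id, first_year, last_year)
SELECT p.contact_id,
       MIN(CAST(strftime('%Y', e.start_date) AS INTEGER)),
       MAX(CAST(strftime('%Y', e.start_date) AS INTEGER))
FROM participations p
JOIN events e ON e.id = p.event_id
WHERE e.event_type_id = 7  -- SQLite needs a WHERE here to parse the upsert unambiguously
GROUP BY p.contact_id
ON CONFLICT(entity_id) DO UPDATE SET
    first_year = excluded.first_year,
    last_year  = excluded.last_year
"""

with conn:
    conn.execute(UPSERT)

print(conn.execute("SELECT entity_id, first_year, last_year FROM my_table").fetchall())
```

Because both columns come from the same aggregate query, there is no second statement and no variable whose value could leak from one column into the other.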
Munch on my comments; come back for more thoughts after you provide a new query, SHOW CREATE TABLE, EXPLAIN, and some sample data.
In MySQL, I was advised to store the multiple choice options for "Drugs" as a separate table user_drug where each row is one of the options selected by a particular user. I was also advised to create a 3rd table drug that describes each option selected in table user_drug. Here is an example:
user
id name income
1 Foo 10000
2 Bar 20000
3 Baz 30000
drug
id name
1 Marijuana
2 Cocaine
3 Heroin
user_drug
user_id drug_id
1 1
1 2
2 1
2 3
3 3
As you can see, table user_drug can contain the multiple drugs selected by a particular user, and table drug tells you what drug each drug_id is referring to.
I was told a foreign key should tie tables user_drug and drug together, but I've never dealt with foreign keys, so I'm not sure how to do that.
Wouldn't it be easier to get rid of the drug table and simply store the TEXT value of each drug in user_drug? Why or why not?
If adding the 3rd table drug is better, then how would I implement the Foreign Key structure, and how would I normally retrieve the respective values using those Foreign Keys?
(I find it far easier to use just 2 tables, but I've heard Foreign Keys are helpful in that they ensure a proper value is entered, and that it is also a lot faster to search and sort for a drug_id than a text value, so I want to be sure.)
Wouldn't it be easier to get rid of the drug table and simply store the TEXT value of each drug in user_drug? Why or why not?
Easier, yes.
But not better.
Your data would not be normalized, wasting lots of space to store the table.
The index on that field would occupy far more space, again wasting space and slowing things down.
If you want to query a drop-down list of possible values, that's trivial with a separate table, hard (read: slow) with just text in a field.
If you just drop text values into one table, it's hard to keep misspellings out; with a separate lookup table, preventing misspellings is easy.
If adding the 3rd table drug is better, then how would I implement the Foreign Key structure
ALTER TABLE user_drug ADD FOREIGN KEY fk_drug(drug_id) REFERENCES drug(id);
and how would I normally retrieve the respective values using those Foreign Keys?
SELECT u.name, d.name as drug
FROM user u
INNER JOIN user_drug ud ON (ud.user_id = u.id)
INNER JOIN drug d ON (d.id = ud.drug_id)
Don't forget to declare the primary key for table user_drug as
PRIMARY KEY (user_id, drug_id)
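Putting those pieces together, here is a runnable sketch that uses Python's sqlite3 as a stand-in for MySQL (SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; InnoDB enforces them by default). It creates the three tables from the question with the composite primary key, runs the join above, and shows the foreign key rejecting a drug_id that does not exist in drug.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript(
    """
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT NOT NULL, income INTEGER);
    CREATE TABLE drug (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE user_drug (
        user_id INTEGER NOT NULL,
        drug_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, drug_id),              -- no duplicate user/drug pairs
        FOREIGN KEY (user_id) REFERENCES user(id),
        FOREIGN KEY (drug_id) REFERENCES drug(id)    -- drug_id must exist in drug
    );
    """
)
conn.executemany("INSERT INTO user VALUES (?, ?, ?)",
                 [(1, "Foo", 10000), (2, "Bar", 20000), (3, "Baz", 30000)])
conn.executemany("INSERT INTO drug VALUES (?, ?)",
                 [(1, "Marijuana"), (2, "Cocaine"), (3, "Heroin")])
conn.executemany("INSERT INTO user_drug VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1), (2, 3), (3, 3)])
conn.commit()

# Resolve each drug_id back to its name.
rows = conn.execute(
    """
    SELECT u.name, d.name AS drug
    FROM user u
    INNER JOIN user_drug ud ON ud.user_id = u.id
    INNER JOIN drug d ON d.id = ud.drug_id
    ORDER BY u.id, d.id
    """
).fetchall()
print(rows)

# The foreign key rejects an id that has no row in drug (the "herion" problem).
try:
    conn.execute("INSERT INTO user_drug VALUES (1, 99)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```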
Alternatively
You can use an enum
CREATE TABLE example (
id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
example ENUM('value1','value2','value3')
-- , other_fields .....
);
You don't get all the benefits of a separate table, but if you just want a few values (e.g. yes/no or male/female/unknown) and you want to make sure it's limited to only those values it's a good compromise.
It is also much more self-documenting and robust than magic constants (1 = male, 2 = female, 3 = unknown... but what happens if we insert 4?).
Wouldn't it be easier to get rid of the drug table and simply store the TEXT value of each drug in user_drug? Why or why not?
Normally, you'd have lots of other columns on the drug table: things like description, medical information, chemical properties, etc. In that case, you wouldn't want to duplicate all of that information on every record of the user_drug table. In this particular case, however, you've only got one column, so that issue is not really a big deal.
Also, you want to be sure that the drug referenced in the user_drug table actually exists. For example, if you store the field as text, then you could have heroin and its related misspellings like haroin or herion. This will give you problems when you try to select all heroin records later. Using a foreign key to a lookup table forces the id to exist in that table, so you can be absolutely sure that all references to heroin are accurate.
I have created a 'shops' and a 'customers' table and an intermediate table customers_shops. Every shop has a site_url web address, except that some customers use an alternative url to access the shop's site (this url is unique to a particular customer).
In the intermediate table below, I have added an additional field, shop_site_url. My understanding is that this is in 2nd normal form, as the shop_site_url field is unique to a particular customer and shop (and therefore won't be duplicated for different customers/shops). Also, since it depends on customer and shop, I think this is in 3rd normal form. I'm just not used to using the 'mapping' table (customers_shops) to hold additional fields. Does the design below make sense, or should I reserve intermediate tables purely as a way to convert many-to-many relationships to one-to-one?
######
customers
######
id INT(11) NOT NULL PRIMARY KEY
name VARCHAR(80) NOT NULL
######
shops
######
id INT(11) NOT NULL PRIMARY KEY
site_url TEXT
######
customers_shops
######
id INT(11) NOT NULL PRIMARY KEY
customer_id INT(11) NOT NULL
shop_id INT(11) NOT NULL
shop_site_url TEXT //added for a specific url for customer
Thanks
What you are calling an "intermediate" table is not a special type of table. There is only one kind of table and the same design principles ought to be applicable to all.
Well, let's create the table, insert some sample data, and look at the results.
id cust_id shop_id shop_site_url
--
1 1000 2000 NULL
2 1000 2000 http://here-an-url.com
3 1000 2000 http://there-an-url.com
4 1000 2000 http://everywhere-an-url-url.com
5 1001 2000 NULL
6 1001 2000 http://here-an-url.com
7 1001 2000 http://there-an-url.com
8 1001 2000 http://everywhere-an-url-url.com
Hmm. That doesn't look good. Let's ignore the alternative URL for a minute. To create a table that resolves a m:n relationship, you need a constraint on the columns that make up the m:n relationship.
create table customers_shops (
  customer_id integer not null,
  shop_id integer not null,
  primary key (customer_id, shop_id),
  foreign key (customer_id) references customers (id),
  foreign key (shop_id) references shops (id)
);
(I dropped the "id" column, because it tends to obscure what's going on. You can add it later, if you like.)
Insert some sample data . . . then
select customer_id as cust_id, shop_id
from customers_shops;
cust_id shop_id
--
1000 2000
1001 2000
1000 2001
1001 2001
That's closer. You should have only one row for each combination of customer and shop in this kind of table. (This is useful data even without the url.) Now what do we do about the alternative URLs? That depends on a couple of things.
Do customers access the sites through only one URL, or might they use more than one?
If the answer is "only one", then you can add a column to this table for the URL, and make that column unique. It's a candidate key for this table.
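A sketch of the "only one URL" case, using Python's sqlite3 as a stand-in for MySQL: the composite primary key allows one row per customer and shop, and the alternative URL is a nullable UNIQUE column on that same row. The column name alt_site_url, the helper url_for, and the sample names and URLs are hypothetical. Both SQLite and MySQL allow several NULLs in a UNIQUE column, so customers without an alternative URL do not collide with each other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE shops (id INTEGER PRIMARY KEY, site_url TEXT);
    CREATE TABLE customers_shops (
        customer_id  INTEGER NOT NULL,
        shop_id      INTEGER NOT NULL,
        alt_site_url TEXT UNIQUE,   -- hypothetical name; NULL means "use shops.site_url"
        PRIMARY KEY (customer_id, shop_id),
        FOREIGN KEY (customer_id) REFERENCES customers(id),
        FOREIGN KEY (shop_id) REFERENCES shops(id)
    );
    """
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1000, "Ann"), (1001, "Bob")])
conn.executemany("INSERT INTO shops VALUES (?, ?)",
                 [(2000, "http://shop2000.example"), (2001, "http://shop2001.example")])
conn.executemany("INSERT INTO customers_shops VALUES (?, ?, ?)", [
    (1000, 2000, None),
    (1001, 2000, "http://here-an-url.com"),
    (1000, 2001, None),
    (1001, 2001, None),
])
conn.commit()

def url_for(customer_id, shop_id):
    """Hypothetical helper: the customer's alternative URL if set, else the shop's own."""
    row = conn.execute(
        """
        SELECT COALESCE(cs.alt_site_url, s.site_url)
        FROM customers_shops cs
        JOIN shops s ON s.id = cs.shop_id
        WHERE cs.customer_id = ? AND cs.shop_id = ?
        """,
        (customer_id, shop_id),
    ).fetchone()
    return row[0] if row else None

print(url_for(1001, 2000), url_for(1000, 2000))
```

A second row for the same customer and shop is rejected by the primary key, which is the constraint the table was missing.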
If the answer is "more than one--at the very least the site url and the alternative url", then you need to make more decisions about constraints, because altering this table to allow multiple urls for each combination of customer and shop cuts across the grain of this requirement:
the shop_site_url field is unique to a particular customer and shop (therefore won't be duplicated for different customers/shops)
Essentially, I'm asking you to decide what this table means--to define the table's predicate. For example, these two different predicates lead to different table structures.
customer 'n' has visited the web site for shop 'm' using url 's'
customer 'n' is allowed to visit the web site for shop 'm' using alternate url 's'
Your schema does indeed make sense, as shop_site_url is an attribute of the relationship itself. You might want to give it a more meaningful name in order to distinguish it from shops.site_url.
Where else would you put this information? It's not an attribute of a shop, and it's not an attribute of a customer. You could put this in a separate table, if you wanted to avoid having a NULLable column, but you'd end up having to have a reference to your intermediate table from this new table, which probably would look even weirder to you.
Relationships can have attributes, just like entities can have attributes.
Entity attributes go into columns in entity tables. Relationship attributes, at least for many-to-many relationships, go in relationship tables.
It sounds as though, in general, URL is determined by the combination of shop and customer. So I would put it in the shop-customer table. The fact that many shops have only one URL suggests that there is a fifth normal form that is more subtle than this. But I'm too lazy to work it out.
I need a schema for fitness classes.
The booking system needs to store the maximum number of students a class can take, the number of students who have booked to join the class, the student IDs, the date and time, etc.
A student table needs to store the classes he/she has booked. But this may not be needed if I store student IDs in the class table.
I am hoping to get some good ideas.
Thanks in advance.
Student: ID, Name, ...
Class: ID, Name, MaxStudents, ...
Student_in_Class: STUDENT_ID, CLASS_ID, DATE_ENROLL
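To show how these three tables cover the booking rule, here is a sketch using Python's sqlite3 as a stand-in for MySQL. Table and column names follow the answer; the helper book() and the sample rows are hypothetical. The number of students booked is not stored anywhere: it is counted from Student_in_Class, and the booking is refused once MaxStudents is reached. BEGIN IMMEDIATE takes SQLite's write lock before counting; in MySQL a comparable approach would be to lock the class row with SELECT ... FOR UPDATE inside the transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions explicitly
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Student (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Class (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL, MaxStudents INTEGER NOT NULL);
    CREATE TABLE Student_in_Class (
        STUDENT_ID  INTEGER NOT NULL,
        CLASS_ID    INTEGER NOT NULL,
        DATE_ENROLL TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (STUDENT_ID, CLASS_ID),   -- a student books a given class once
        FOREIGN KEY (STUDENT_ID) REFERENCES Student(ID),
        FOREIGN KEY (CLASS_ID) REFERENCES Class(ID)
    );
    """
)

def book(student_id, class_id):
    """Hypothetical helper: enroll a student unless the class is full. Returns True on success."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before counting
    try:
        booked, max_students = conn.execute(
            """
            SELECT COUNT(sic.STUDENT_ID), c.MaxStudents
            FROM Class c
            LEFT JOIN Student_in_Class sic ON sic.CLASS_ID = c.ID
            WHERE c.ID = ?
            GROUP BY c.ID
            """,
            (class_id,),
        ).fetchone()
        if booked >= max_students:
            conn.execute("ROLLBACK")
            return False
        conn.execute(
            "INSERT INTO Student_in_Class (STUDENT_ID, CLASS_ID) VALUES (?, ?)",
            (student_id, class_id),
        )
        conn.execute("COMMIT")
        return True
    except Exception:
        conn.execute("ROLLBACK")
        raise

conn.executemany("INSERT INTO Student (ID, Name) VALUES (?, ?)", [(1, "Ann"), (2, "Ben"), (3, "Cy")])
conn.execute("INSERT INTO Class (ID, Name, MaxStudents) VALUES (10, 'Spin', 2)")
results = [book(1, 10), book(2, 10), book(3, 10)]
print(results)
```

Counting instead of storing the number of bookings means the figure can never drift out of sync with the actual enrollments, and the classes a student has booked are simply the Student_in_Class rows with that STUDENT_ID.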
*I'm not a MySQL guru; I typically deal with MS SQL, but I think you'll get the idea. You may need to dig into the MySQL docs a little to find data types that match the ones I've suggested. Also, I've given brief explanations of some types to clarify what they're for, since this is MySQL and not MS SQL.
Class_Enrollment - stores the classes each student is registered for
Class_Enrollment_ID INT IDENTITY PK ("identity is made specifically to serve as an id, and it's a field that the system manages on its own. A value is assigned automatically when a new record is created. I would try to find something similar in MySQL"; in MySQL this is AUTO_INCREMENT)
Class_ID INT FK
Student_ID INT FK
Date_Time smalldatetime ("smalldatetime stores the date over a smaller range of years than datetime, plus the time up to minutes")
put a unique constraint index on class_id and student_id to prevent duplicates
Class - stores your classes
Class_ID INT IDENTITY PK
Name VARCHAR('size') UNIQUE CONSTRAINT INDEX ("a UNIQUE CONSTRAINT INDEX is like a PK, but you can have more than one in a table")
Max_Enrollment INT ("unless you have a different max for different sessions of the same class, you only need to define max enrollment once per class, so it belongs in the Class table, not the Class_Enrollment table")
Student - stores your students
Student_ID INT IDENTITY PK
First_Name VARCHAR('size')
Last_Name VARCHAR('size')
Date_of_Birth smalldatetime ("smalldatetime can store just the date; it automatically puts 0's for the time, which works fine")
put a unique constraint index on fname, lname, and date of birth to eliminate duplicates (you may have two John Smiths, but two John Smiths with the exact same birth date in the same database is unlikely unless it's a very large database. Otherwise, consider using first name, last name, and phone as a unique constraint)
Newish to MySQL DBs here. I have a table of USERS and a table of TEAMS. A user can be on more than one team. What's the best way to store the relationship between a user and the teams he's on?
Let's say there are hundreds of teams, each team consists of about 20 users, and on average a user could be on about 10 teams. Also note that users can change teams from time to time.
I can think of adding a column to my TEAMS table that holds a list of user IDs, but then I'd have to add a column to my USERS table that holds a list of team IDs. Although this might be a solution, it seems messy for updating membership. It seems like there might be a smarter way to handle this information... Like another table perhaps? Thoughts?
Thanks!
P.S. What's the best field type for storing a list, and what's the best way to delimit it?
What's the best field type for storing a list, and what's the best way to delimit it?
It's usually a really bad idea to try to store multiple values in a single column. It's hell to process and you'll never get proper referential integrity.
What you're really looking for is a join table. For example:
CREATE TABLE user_teams (
user_id INT NOT NULL,
team_id INT NOT NULL,
PRIMARY KEY (user_id, team_id),
FOREIGN KEY (user_id) REFERENCES users(id),
FOREIGN KEY (team_id) REFERENCES teams(id)
);
so there can be any number of team_ids for one user and any number of user_ids for one team. (But the primary key ensures there aren't duplicate mappings of the same user-and-team.)
Then to select team details for a user you could say something like:
SELECT teams.*
FROM user_teams
JOIN teams ON teams.id= user_teams.team_id
WHERE user_teams.user_id= (...some id...);
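A runnable sketch of the join table in use, with Python's sqlite3 standing in for MySQL. The user and team names are hypothetical, as are the helpers teams_for and move. It shows the team lookup from the answer and how a user changes teams: one DELETE and one INSERT on user_teams, with no delimited list to parse and rewrite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE teams (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE user_teams (
        user_id INTEGER NOT NULL,
        team_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, team_id),   -- no duplicate memberships
        FOREIGN KEY (user_id) REFERENCES users(id),
        FOREIGN KEY (team_id) REFERENCES teams(id)
    );
    """
)
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "alice"), (2, "bob")])
conn.executemany("INSERT INTO teams VALUES (?, ?)", [(10, "red"), (11, "blue"), (12, "green")])
conn.executemany("INSERT INTO user_teams VALUES (?, ?)", [(1, 10), (1, 11), (2, 11)])

def teams_for(user_id):
    """Hypothetical helper: names of the teams a user belongs to."""
    return [name for (name,) in conn.execute(
        """
        SELECT teams.name
        FROM user_teams
        JOIN teams ON teams.id = user_teams.team_id
        WHERE user_teams.user_id = ?
        ORDER BY teams.id
        """,
        (user_id,),
    )]

def move(user_id, old_team_id, new_team_id):
    """Hypothetical helper: switch a user from one team to another in one transaction."""
    with conn:
        conn.execute("DELETE FROM user_teams WHERE user_id = ? AND team_id = ?",
                     (user_id, old_team_id))
        conn.execute("INSERT INTO user_teams VALUES (?, ?)", (user_id, new_team_id))

move(1, 10, 12)
print(teams_for(1), teams_for(2))
```

The reverse question (which users are on a team) is the same join filtered on user_teams.team_id, which is exactly what the list-column approach makes awkward.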