Appropriate use of a junction table? - mysql

I need to match one or more rows in one table to one or more rows in another table. I think this works with a third table (a junction table), but I'm not certain.
I'm not concerned about speed or anything fancy like that at the moment, just a reliable table design. The two important reporting elements here are:
being able to generate paycheque hours AND a list of tasks performed on the same report
being able to tell who worked on each task. The data may come in without individual hours (i.e. a team spent x hours on the task), so we won't always be able to narrow down exactly how much time each employee on a team spent on one task. We are fine with this limitation.
Here is an example:
"HOURS TABLE" records hours worked.
Fred and Joe each work 8 hours on day 1 - so this is two rows in the db
Frank worked 8 hours on day 2
"HOURS TABLE"
eeid  hours  junction
1     8      1
2     8      1
3     8      2
The second table records what was worked on (it is called the units table, but what I actually want is to record the hours against a task).
Day 1, Fred and Joe built a bench (12hrs), drove around (3hrs) and cleaned up the shop (1hr), so this is three rows in the "TASK TABLE"
Day 2, Marilyn spent 8 hours on the same bench, so this is one row in the "TASK TABLE"
"TASK TABLE"
ItemID  hours  junction
1       12     1
2       3      1
3       1      1
1       8      2
The third table is the "JUNCTION TABLE". It serves no purpose other than to tie it all together.
"JUNCTION TABLE"
ID
1
2
Issues:
Fred and Joe might be a team of 1 or a team of 10.
Multiple employees may work on the same things (like building a bench), and I need to tie each row in the "TASK TABLE" to the row(s) in the "HOURS TABLE"
I'm not sure whether the junction table should just be a reference. Wikipedia shows an example where it actually stores data, but I don't think that would be appropriate here
I don't know how I will write the query for my reports, mostly because I have never used a junction table before, but I can't imagine it is that much harder
I suppose I could suggest that we force the employees to hand in separate time sheets. This would remove all of these challenges, as one record could hold the hours per task and those records could be summed by day to get the daily hours. I don't know enough about how bad a junction table is to say whether this should be suggested.

Many-to-many relationships in SQL require a join table (what you are calling a junction table). A foreign key constraint can only model a one-to-many relationship, so the join table needs one foreign key for each side.
Your requirements are confusing, but here's a simple example where one or more USER can be assigned to one or more UNIT:
create table user (
    id int not null auto_increment,
    primary key (id)
);
create table unit (
    id int not null auto_increment,
    primary key (id)
);
-- one row per (user, unit) pairing
create table user_unit (
    user_id int,
    unit_id int,
    primary key (user_id, unit_id),
    foreign key (user_id) references user(id),
    foreign key (unit_id) references unit(id)
);
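For the reporting side of the question, here is a rough sketch built on the question's own three tables and sample rows. It runs against SQLite through Python's sqlite3 module only so it can be tried without a MySQL server (that substitution is an assumption of this sketch, not part of the original design); the two SELECT statements are plain SQL and should carry over to MySQL largely unchanged.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE junction (id INTEGER PRIMARY KEY);
    CREATE TABLE hours (eeid INTEGER, hours INTEGER,
                        junction INTEGER REFERENCES junction(id));
    CREATE TABLE task (ItemID INTEGER, hours INTEGER,
                       junction INTEGER REFERENCES junction(id));
    INSERT INTO junction (id) VALUES (1), (2);
    INSERT INTO hours VALUES (1, 8, 1), (2, 8, 1), (3, 8, 2);
    INSERT INTO task VALUES (1, 12, 1), (2, 3, 1), (3, 1, 1), (1, 8, 2);
""")

# Paycheque hours: total hours per employee.
paycheque = conn.execute(
    "SELECT eeid, SUM(hours) FROM hours GROUP BY eeid ORDER BY eeid"
).fetchall()

# Who worked on each task: rows that share a junction id belong together.
who_did_what = conn.execute("""
    SELECT t.ItemID, h.eeid
    FROM task AS t
    JOIN hours AS h ON h.junction = t.junction
    ORDER BY t.ItemID, h.eeid
""").fetchall()

print(paycheque)
print(who_did_what)
```

The join only says which employees share a junction id with a task; it cannot split the task hours between them, which matches the limitation the question already accepts.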

Related

Disadvantage of "combined" lookup table in mySQL vs individual lookup tables

Are there big disadvantages (maybe in query speed etc.) to using only ONE combined lookup table (in a MySQL database) to store "links" between tables, compared with having individual lookup tables? I am asking because in my project scenario I would end up with over a hundred individual lookup tables, which I assume will be a lot of work to set up and maintain. To make an easier example, here is a simplified scenario with only 4 tables:
Table: teacher
teacherID  name
1          Mr. X
2          Mrs. Y
Table: student
studentID  name
4          Tom
5          Chris
Table: class
classID  name
7        Class A
8        Class B
Table: languageSpoken
languageSpokenID  name
10                English
11                German
======================= INDIVIDUAL LOOKUP TABLES ==========================
Table: student_teacher
studentID  teacherID
4          1
5          1
Table: student_class
studentID  classID
4          7
5          8
Table: student_languageSpoken
studentID  languageSpokenID
4          10
4          11
====== VS ONE COMBINED LOOKUP TABLE (with one helper table) =====
helper table: allTables
tableID  name
1        teacher
2        student
3        class
4        languageSpoken
table: lookupTable
table_A  ID_A  table_B  ID_B
1        1     2        4
1        1     2        5
3        7     2        4
3        8     2        5
Your second lookup schema is practically unusable.
You refer to a table by its name/index. But you cannot use this reference directly (a table name cannot be parameterized), so you need to build conditional join expressions or use dynamic SQL. This is slower.
Your lookup table is reversible, i.e. the same reference can be written in two ways. Of course, you could add a CHECK constraint such as CHECK (table_A < table_B) (which also prevents self-references), but this again degrades performance.
Your lookup table does not prevent nonexistent relations (for example, class and language are not related, but nothing prevents creating a row for such a relation). Again, this needs an additional constraint and costs performance.
There are more disadvantages... but I'm too lazy to list them all.
Another very important point: foreign key constraints, which ensure referential integrity, cannot be used in the "combined lookup" approach. They would need to be simulated by complex and error-prone triggers. Overall the "combined lookup" approach is just a horrible idea. – sticky bit
There is a rule: relations that have nothing to do with each other must be kept separate.
In the first schema, can a student study in more than one class at the same time? If not, then you do not need the student_class lookup table at all, and class_id is simply an attribute in the student table.
Lookup tables are usually static, so there shouldn't be much maintenance overhead. If you do update the lookup data, however, with a single combined table you have to manage the life cycle of a subset of its rows, which can get tricky compared to just truncating a table when new data becomes available. I would also be careful if your lookup tables have different schemas: in a combined table, columns that apply only to a given "type" of row would have to be nullable. You may not be able to implement the right foreign keys, and foreign keys are what help you keep your data consistent (in production systems); without them, if you happen to use the wrong id, you get a nonsensical value. If this is a school project, especially for a database class, you will be dinged for not using textbook normalization.
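To make the referential-integrity point concrete, here is a small sketch comparing the question's student_class table with the combined lookupTable. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption made so the example runs without a server; SQLite needs foreign keys switched on explicitly, whereas InnoDB enforces them by default).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
    CREATE TABLE student (studentID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE class (classID INTEGER PRIMARY KEY, name TEXT);
    -- individual lookup table: the database knows what each column points at
    CREATE TABLE student_class (
        studentID INTEGER NOT NULL REFERENCES student(studentID),
        classID INTEGER NOT NULL REFERENCES class(classID),
        PRIMARY KEY (studentID, classID)
    );
    -- combined lookup table: no foreign keys are possible
    CREATE TABLE lookupTable (table_A INTEGER, ID_A INTEGER,
                              table_B INTEGER, ID_B INTEGER);
    INSERT INTO student VALUES (4, 'Tom'), (5, 'Chris');
    INSERT INTO class VALUES (7, 'Class A'), (8, 'Class B');
""")

conn.execute("INSERT INTO student_class VALUES (4, 7)")  # valid link

try:
    # Class 99 does not exist, so the foreign key rejects this row.
    conn.execute("INSERT INTO student_class VALUES (4, 99)")
    individual_rejected = False
except sqlite3.IntegrityError:
    individual_rejected = True

# The same bogus link (table 3 = class, id 99) goes into the combined table unchecked.
conn.execute("INSERT INTO lookupTable VALUES (3, 99, 2, 4)")
combined_rows = conn.execute("SELECT COUNT(*) FROM lookupTable").fetchone()[0]

print(individual_rejected, combined_rows)
```

The individual table refuses the dangling reference, while the combined table stores it, which is the consistency gap the answers above describe.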

Database design: Value(s) per user per day

I'm setting up a system where for every user (1000+), I want to add a set of values every single day.
Hypothetically:
A system where I can log when Alice and Bob woke up and what they had for dinner on August 1st, 2019 or 2024.
Any suggestions on how to best structure the database tables?
A person table with a primary person ID? (rows: n)
A date table with a primary date ID? (rows: m)
And a personDate table with the person ID and date ID as foreign keys? (rows: n x m)
I don't think you need a date table unless you want to use it to make specific queries easier, such as a left join against the dates to see which days are missing events. Even then, I would stick to DATE or DATETIME as the column type and avoid making a separate surrogate foreign key. A surrogate key won't save any space, will potentially perform worse, and will be more difficult for the developer to use.
This seems simple and fine to me. I wouldn't worry too much about the performance based upon the number of elements alone. You can insert a billion records with no problem and that implies a very large site.
Just don't insert records if the event didn't happen. In other words you want your database to grow in relation to the real usage. Avoid growth based upon phantom events and you should be okay.
person
    person_id
action
    action_id
personAction
    person_id
    action_id
    action_datetime
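The three-table outline above could be sketched roughly as follows. This runs on SQLite via Python's sqlite3 module as a stand-in for MySQL (an assumption for the sake of a runnable example); the name columns and the sample rows are hypothetical additions for illustration, and in MySQL action_datetime would be a DATETIME column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE action (action_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE personAction (
        person_id INTEGER NOT NULL REFERENCES person(person_id),
        action_id INTEGER NOT NULL REFERENCES action(action_id),
        action_datetime TEXT NOT NULL,  -- DATETIME in MySQL
        PRIMARY KEY (person_id, action_id, action_datetime)
    );
    INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO action VALUES (1, 'woke up'), (2, 'had dinner');
    -- Only events that actually happened get a row.
    INSERT INTO personAction VALUES
        (1, 1, '2019-08-01 07:00:00'),
        (2, 1, '2019-08-01 08:30:00'),
        (1, 2, '2019-08-01 19:00:00'),
        (2, 1, '2019-08-02 09:00:00');
""")

# Everything logged on August 1st, 2019, filtered directly on the datetime column.
events = conn.execute("""
    SELECT p.name, a.name, pa.action_datetime
    FROM personAction AS pa
    JOIN person AS p ON p.person_id = pa.person_id
    JOIN action AS a ON a.action_id = pa.action_id
    WHERE date(pa.action_datetime) = '2019-08-01'
    ORDER BY pa.action_datetime
""").fetchall()

print(events)
```

No date table is involved: the day is derived from the stored datetime, and the table grows only with real events.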

Composite primary key in MySQL on existing table

I am using MySQL Workbench to manage a database that was handed down to me for a development task. Unfortunately, the schema is a nightmare: no primary keys for numerous tables, lots of column duplication, etc.
First off, I wanted to add some uniqueness so that I can begin normalizing somehow. I have a 'students' table where a student (with an ID) works on a project that belongs to a specific term (Fall 2014, Spring 2015, etc.)
Since the same student can work on the same project two semesters in a row for whatever reason, the only way to tell the rows apart would be to have a (student ID, term) PK. So I actually do need that composite PK.
How might I alter the existing tables and set a composite PK?
EDIT:
To clarify: the schema contains a users table with actual student information (First/Last name, Email, Term). The students table would more aptly be named projects, as it references only the students by ID and then lists the project they worked on, in the semester that they worked on it. So at the very least, students.id would also be a FK from users.
I know the above doesn't quite make any sense; I'm just trying to keep this to one step at a time because the application depends on the schema and I don't want to introduce any new bugs at this point.
To clarify even further, here is what the users and students tables look like:
students
id  project  termInitiated  ...
20  XYZ      Summer 2013
20  XYZ      Fall 2013
23  ABC      Fall 2013
24  ABC      Fall 2014
...
users
studentId  firstName  lastName  termInitiated
20         A          AA        Summer 2013
20         A          AA        Fall 2013
23         Z          ZZ        Fall 2013
24         Y          YY        Fall 2014
...
Unfortunately, due to the way it is set up, I cannot have studentId be a PK by itself, as the same student could be working on the same project multiple semesters in a row.
The best fix would be a globally unique identifier that could refer to the same student in different terms, but this would introduce a huge number of bugs right now that I do not have the time to fix. Thus, I think that a composite PK would be the best solution given my limitations.
You may need to grant yourself the ALTER privilege if the composite key is not being added. Take a look at this: https://kb.mediatemple.net/questions/788/How+do+I+grant+privileges+in+MySQL%3F#dv
Here is adrian's link: ALTER TABLE to add a composite primary key
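On an existing MySQL table, a composite key is normally added with ALTER TABLE students ADD PRIMARY KEY (id, termInitiated); and that statement fails if any (id, termInitiated) pair already occurs twice. A possible pre-check is sketched below using the sample rows from the question. It runs on SQLite via Python's sqlite3 module (an assumption so it works without a server); because SQLite cannot add a primary key to an existing table, a unique index stands in for the composite key here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER, project TEXT, termInitiated TEXT);
    INSERT INTO students VALUES
        (20, 'XYZ', 'Summer 2013'),
        (20, 'XYZ', 'Fall 2013'),
        (23, 'ABC', 'Fall 2013'),
        (24, 'ABC', 'Fall 2014');
""")

# Step 1: look for pairs that would violate the composite key.
duplicates = conn.execute("""
    SELECT id, termInitiated, COUNT(*)
    FROM students
    GROUP BY id, termInitiated
    HAVING COUNT(*) > 1
""").fetchall()

# Step 2: with no duplicates, add the uniqueness rule
# (unique index here; ADD PRIMARY KEY (id, termInitiated) in MySQL).
conn.execute(
    "CREATE UNIQUE INDEX students_id_term ON students (id, termInitiated)"
)

# Step 3: a repeated (id, term) pair is now rejected.
try:
    conn.execute("INSERT INTO students VALUES (20, 'XYZ', 'Fall 2013')")
    repeat_rejected = False
except sqlite3.IntegrityError:
    repeat_rejected = True

print(duplicates, repeat_rejected)
```

If the duplicate check returns rows, those need to be cleaned up before the key can be added.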
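My suggestion is to add a new column to your table to act as the primary key, holding the unique combination of the two fields.
For example, in your case, add a column (say pid) to the students table:
ALTER TABLE students
ADD pid VARCHAR(100);
Then populate it:
UPDATE students SET pid = CONCAT(id, '-', termInitiated);
Once every row has its own value, declare the column as the primary key (declaring it when the column is first added would fail on a table that already has rows, because they would all start with the same value):
ALTER TABLE students
ADD PRIMARY KEY (pid);
So you will have the unique combination of id and termInitiated in the pid field.
You can use it as the primary key, and if you want to select on it or join it with another table, you can reference it through the combination of the two fields.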
For example, if you want to join the students table with users, you can join it like this:
SELECT * FROM users
INNER JOIN students
ON CONCAT(users.studentId, '-', users.termInitiated) = students.pid;
I hope this works for you.
Please correct me if I am wrong.
Thank you.

Database many-to-many intermediate tables: extra fields

I have created a 'shops' and a 'customers' table and an intermediate table customers_shops. Every shop has a site_url web address, except that some customers use an alternative url to access the shop's site (this url is unique to a particular customer).
In the intermediate table below, I have added an additional field, shop_site_url. My understanding is that this is in 2nd normal form, as the shop_site_url field is unique to a particular customer and shop (and therefore won't be duplicated for different customers/shops). Also, since it depends on customer and shop, I think this is in 3rd normal form. I'm just not used to using the 'mapping' table (customers_shops) to contain additional fields. Does the design below make sense, or should I reserve intermediate tables purely as a way to convert many-to-many relationships into one-to-many ones?
######
customers
######
id INT(11) NOT NULL PRIMARY KEY
name VARCHAR(80) NOT NULL
######
shops
######
id INT(11) NOT NULL PRIMARY KEY
site_url TEXT
######
customers_shops
######
id INT(11) NOT NULL PRIMARY KEY
customer_id INT(11) NOT NULL
shop_id INT(11) NOT NULL
shop_site_url TEXT //added for a specific url for customer
Thanks
What you are calling an "intermediate" table is not a special type of table. There is only one kind of table and the same design principles ought to be applicable to all.
Well, let's create the table, insert some sample data, and look at the results.
id  cust_id  shop_id  shop_site_url
--
1   1000     2000     NULL
2   1000     2000     http://here-an-url.com
3   1000     2000     http://there-an-url.com
4   1000     2000     http://everywhere-an-url-url.com
5   1001     2000     NULL
6   1001     2000     http://here-an-url.com
7   1001     2000     http://there-an-url.com
8   1001     2000     http://everywhere-an-url-url.com
Hmm. That doesn't look good. Let's ignore the alternative URL for a minute. To create a table that resolves a m:n relationship, you need a constraint on the columns that make up the m:n relationship.
create table customers_shops (
    -- the key columns in the question's customers and shops tables are both named id
    customer_id integer not null references customers (id),
    shop_id integer not null references shops (id),
    primary key (customer_id, shop_id)
);
(I dropped the "id" column, because it tends to obscure what's going on. You can add it later, if you like.)
Insert some sample data . . . then
select customer_id as cust_id, shop_id
from customers_shops;
cust_id  shop_id
--
1000     2000
1001     2000
1000     2001
1001     2001
That's closer. You should have only one row for each combination of customer and shop in this kind of table. (This is useful data even without the url.) Now what do we do about the alternative URLs? That depends on a couple of things.
Do customers access the sites through only one URL, or might they use more than one?
If the answer is "only one", then you can add a column to this table for the URL, and make that column unique. It's a candidate key for this table.
If the answer is "more than one--at the very least the site url and the alternative url", then you need to make more decisions about constraints, because altering this table to allow multiple urls for each combination of customer and shop cuts across the grain of this requirement:
the shop_site_url field is unique to a particular customer and shop (therefore won't be duplicated for different customers/shops)
Essentially, I'm asking you to decide what this table means--to define the table's predicate. For example, these two different predicates lead to different table structures.
customer 'n' has visited the web site for shop 'm' using url 's'
customer 'n' is allowed to visit the web site for shop 'm' using alternate url 's'
Your schema does indeed make sense, as shop_site_url is an attribute of the relationship itself. You might want to give it a more meaningful name in order to distinguish it from shops.site_url.
Where else would you put this information? It's not an attribute of a shop, and it's not an attribute of a customer. You could put this in a separate table, if you wanted to avoid having a NULLable column, but you'd end up having to have a reference to your intermediate table from this new table, which probably would look even weirder to you.
Relationships can have attributes, just like entities can have attributes.
Entity attributes go into columns in entity tables. Relationship attributes, at least for many-to-many relationships, go in relationship tables.
It sounds as though, in general, URL is determined by the combination of shop and customer. So I would put it in the shop-customer table. The fact that many shops have only one URL suggests that there is a fifth normal form that is more subtle than this. But I'm too lazy to work it out.
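As an illustration of a relationship attribute living in the relationship table, here is a sketch for the "only one alternative URL per customer and shop" case. It runs on SQLite via Python's sqlite3 module as a stand-in for MySQL (an assumption so it can be run as-is). The column name alt_site_url, the customer names and the example URLs are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE shops (id INTEGER PRIMARY KEY, site_url TEXT);
    CREATE TABLE customers_shops (
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        shop_id INTEGER NOT NULL REFERENCES shops(id),
        alt_site_url TEXT,  -- attribute of the relationship, NULL if none
        PRIMARY KEY (customer_id, shop_id)  -- one row per customer/shop pair
    );
    INSERT INTO customers VALUES (1000, 'Ann'), (1001, 'Ben');
    INSERT INTO shops VALUES (2000, 'http://shop.example');
    INSERT INTO customers_shops VALUES
        (1000, 2000, NULL),
        (1001, 2000, 'http://alt.example');
""")

# Effective URL per customer: the alternative one if set, else the shop's own.
urls = conn.execute("""
    SELECT cs.customer_id, COALESCE(cs.alt_site_url, s.site_url)
    FROM customers_shops AS cs
    JOIN shops AS s ON s.id = cs.shop_id
    ORDER BY cs.customer_id
""").fetchall()

# A second row for the same customer/shop pair violates the composite key.
try:
    conn.execute(
        "INSERT INTO customers_shops VALUES (1000, 2000, 'http://another.example')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(urls, duplicate_rejected)
```

If customers may use several URLs per shop, the key would have to change, which is the predicate decision discussed above.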

How to handle fragmentation of auto_increment ID column in MySQL

I have a table with an auto_increment field, and sometimes rows get deleted, so auto_increment leaves gaps. Is there any way to avoid this? If not, at the very least, how do I write an SQL query that:
1. Alters the auto_increment value to be max(current value) + 1
2. Returns the new auto_increment value?
I know how to write parts 1 and 2, but can I put them in the same query?
If that is not possible:
How do I "select" (return) the auto_increment value or auto_increment value + 1?
Renumbering will cause confusion. Existing reports will refer to record 99, and yet if the system renumbers it may renumber that record to 98, now all reports (and populated UIs) are wrong. Once you allocate a unique ID it's got to stay fixed.
Using ID fields for anything other than simple unique numbering is going to be problematic. Having a requirement for "no gaps" is simply inconsistent with the requirement to be able to delete. Perhaps you could mark records as deleted rather than delete them. Then there are truly no gaps. Say you are producing numbered invoices: you would have a zero value cancelled invoice with that number rather than delete it.
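The "mark as deleted instead of deleting" idea might look something like this. The sketch runs on SQLite via Python's sqlite3 module as a stand-in for MySQL (an assumption so it is runnable without a server); the invoice table and its amount and cancelled columns are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (
        id INTEGER PRIMARY KEY,               -- AUTO_INCREMENT column in MySQL
        amount INTEGER NOT NULL,
        cancelled INTEGER NOT NULL DEFAULT 0  -- soft-delete flag
    );
    INSERT INTO invoice (amount) VALUES (100), (250), (75);
""")

# Instead of DELETE, turn invoice 2 into a zero-value cancelled invoice.
conn.execute("UPDATE invoice SET amount = 0, cancelled = 1 WHERE id = 2")

# The numbering keeps no gaps...
all_ids = [row[0] for row in conn.execute("SELECT id FROM invoice ORDER BY id")]

# ...while normal queries simply skip cancelled rows.
active_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM invoice WHERE cancelled = 0 ORDER BY id"
    )
]

print(all_ids, active_ids)
```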
There is a way to manually insert the id even in an autoinc table. All you would have to do is identify the missing id.
However, don't do this. It can be very dangerous if your database is relational. It is possible that the deleted id was used elsewhere. When removed, it would not present much of an issue, perhaps it would orphan a record. If replaced, it would present a huge issue because the wrong relation would be present.
Consider that I have a table of cars and a table of people
car
carid  ownerid  name
person
personid  name
And that there is some simple data
car
1  1  Van
2  1  Truck
3  2  Car
4  3  Ferrari
5  4  Pinto
person
1  Mike
2  Joe
3  John
4  Steve
and now I delete person John.
person
1 Mike
2 Joe
4 Steve
If I added a new person, Jim, into the table, and he got an id which filled the gap, then he would end up getting id 3
1 Mike
2 Joe
3 Jim
4 Steve
and by relation, would be the owner of the Ferrari.
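The scenario above can be replayed in a few lines. This sketch uses SQLite through Python's sqlite3 module as a stand-in for MySQL (an assumption so it runs anywhere) and deliberately declares no foreign key, mirroring the situation where nothing stops the orphaned reference; ferrari_owner is a hypothetical helper for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (personid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE car (carid INTEGER PRIMARY KEY, ownerid INTEGER, name TEXT);
    INSERT INTO person VALUES (1, 'Mike'), (2, 'Joe'), (3, 'John'), (4, 'Steve');
    INSERT INTO car VALUES (1, 1, 'Van'), (2, 1, 'Truck'), (3, 2, 'Car'),
                           (4, 3, 'Ferrari'), (5, 4, 'Pinto');
""")


def ferrari_owner(connection):
    """Return the name of the Ferrari's owner, or None if the owner row is gone."""
    row = connection.execute("""
        SELECT p.name
        FROM car AS c
        LEFT JOIN person AS p ON p.personid = c.ownerid
        WHERE c.name = 'Ferrari'
    """).fetchone()
    return row[0]


owner_before = ferrari_owner(conn)

conn.execute("DELETE FROM person WHERE name = 'John'")
owner_after_delete = ferrari_owner(conn)  # orphaned car, no owner found

# Fill the gap by inserting the id explicitly, as described above.
conn.execute("INSERT INTO person (personid, name) VALUES (3, 'Jim')")
owner_after_reuse = ferrari_owner(conn)

print(owner_before, owner_after_delete, owner_after_reuse)
```

After the delete the car is merely orphaned; after the id is reused it silently belongs to the wrong person.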
I generally agree with the wise people on this page (and on the duplicate questions) advising against reusing auto-incremented IDs. It is good advice, but I don't think it's up to us to decide the rights or wrongs of asking the question. Let's assume the developer knows what they want to do and why.
The answer is, as mentioned by Travis J, you can reuse an auto-increment id by including the id column in an insert statement and assigning the specific value you want.
Here is a point to put a spanner in the works: MySQL itself (at least 5.6 InnoDB) will reuse an auto-increment ID in the following circumstance:
Delete any number of rows with the highest auto-increment ids
Stop and start MySQL
Insert a new row
The inserted row will have an id calculated as max(id)+1; it does not continue from the id that was deleted.
As djna said in their answer, it's not good practice to alter database tables in such a way, and there is no need to if you have chosen the right schema and data types. That said, regarding this part of your question:
I have a table with an auto_increment field and sometimes rows get deleted so auto_increment leaves gaps. Is there any way to avoid this?
If your table has too many gaps in its auto-increment column (probably as a result of many test INSERT queries), if you want to keep the id values from growing too large by removing the gaps, and if the id column is just a counter with no relation to any other column in your database, then this may be what you (or anyone else looking for such a thing) are looking for:
SOLUTION
remove the original id column
add it again with auto_increment on
But if you just want to reset the auto_increment to the first available value:
ALTER TABLE `table_name` AUTO_INCREMENT=1
Not sure if this will help, but in SQL Server you can reseed identity fields. It seems there's an ALTER TABLE statement in MySQL to achieve this, e.g. to set the id to continue at 59446:
ALTER TABLE table_name AUTO_INCREMENT = 59446;
I'm thinking you should be able to combine a query to get the largest value of auto_increment field, and then use the alter table to update as needed.