How do I properly structure my relational mySQL database - mysql

I am making a database that is for employee scheduling. I am, for the first time ever, making a relational mySQL database so that I can efficiently manage all of the data. I have been using the mySQL Workbench program to help me visualize how this is going to go. Here is what I have so far:
What I have pictured in my head is that, based on the drawing, I would set the schedule in the schedule table which uses references from the other tables as shown. Then when I need to display this schedule, I would pull everything from the schedule table. Whenever I've worked with a database in the past, it hasn't been of the normalized type, so I would just enter the data into one table and then pull the data out from that one table. Now that I'm tackling a much larger project I am sure that having all of the tables split (normalized) like this is the way to go, but I'm having trouble seeing how everything comes together in the end. I have a feeling it doesn't work the way I have it pictured, #grossvogel pointed out what I believe to be something critical to making this all work and that is to use the join function to pull the data.
The reason I started with a relational database was so that if I made a change to (for example) the shift table and instead of record 1 being "AM" I wanted it to be "Morning", it would then automatically change the relevant sections through the cascade option.
The reason I'm posting this here is because I am hoping someone can help fill in the blanks and to point me in the right direction so I don't spend a lot of hours only to find out I made a wrong turn at the beginning.

Maybe the piece you're missing is the idea of using a query with joins to pull in data from multiple tables. For instance (just incorporating a couple of your tables):
SELECT Dept_Name, Emp_Name, Stat_Name ...
FROM schedule
INNER JOIN departments on schedule.Dept_ID = departments.Dept_ID
INNER JOIN employees on schedule.Emp_ID = employees.Emp_ID
INNER JOIN status on schedule.Stat_ID = status.Stat_ID
...
where ....
Note also that a schedule table that contains all of the information needed to be displayed on the final page is not in the spirit of relational data modeling. You want each table to model some entity in your application, so it might be more appropriate to rename schedule to something like shifts if each row represents a shift. (I usually use singular names for tables, but there are multiple perspectives there.)

This is, frankly, a very difficult question to answer because you could get a million different answers, each with their own merits. I'd suggest you take a look at these (there are probably better links out there too, these just seemed like good points to note) :
http://www.devshed.com/c/a/MySQL/Designing-a-MySQL-Database-Tips-and-Techniques/
http://en.wikipedia.org/wiki/Boyce%E2%80%93Codd_normal_form
http://www.sitepoint.com/forums/showthread.php?66342-SQL-and-RDBMS-Database-Design-DO-s-and-DON-Ts
I'd also suggest you try explaining what it is you want to achieve in more detail rather than just post the table structure and let us try to figure out what you meant by what you've done.
Often by trying to explain something verbally you may come to the realisations you need without anyone else's input at all!
One thing I will mention is that you don't have to denormalise a table to report certain values together, you should be considering views for that kind of thing...

Related

How can I combine two tables and merge fields?

Apologies is this is an incredibly obvious question but I'm new to access and have learned it over the past week through a rather daunting project.
Basically I am in the process of designing a database to track everything the company does: people it knows, things it does, meetings it holds, etc.
I think I've structured it pretty well, but what I would like is an "activity" table. Because the activities are diverse, they are all in multiple tables (events, meetings, enquiries, etc), but they all have a sort of general description field. Is there some way to make a table or form that will pull in all of that data and sort of lay it out as:
Date - Type - Description.
Type would be based on where it came from, so if the meetings table was the source of the information, then it would be typed as meeting.
I realise a simpler way to do this would have been to have an activities table with categories as breaking it up seems easier than combining it, but the number of fields that would mean having on one table made that seem impossible to run.
This seems like something that should be simple, but I'm struggling.
Any thoughts? Thanks in advance!

How to set up relational database tables for this many-to-many relationship?

I have a type of data called a chain. Each chain is made up of a specific sequence of another type of data called a step. So a chain is ultimately made up of multiple steps in a specific order. I'm trying to figure out the best way to set this up in MySQL that will allow me to do the following:
Look up all steps in a chain, and get them in the right order
Look up all chains that contain a step
I'm currently considering the following table set up as the appropriate solution:
TABLE chains
id date_created
TABLE steps
id description
TABLE chains_steps (this would be used for joins)
chain_id step_id step_position
In the table chains_steps, the step_position column would be used to order the steps in a chain correctly. It seems unusual for a JOIN table to contain its own distinct piece of data, such as step_position in this case. But maybe it's not unusual at all and I'm just inexperienced/paranoid.
I don't have much experience in all this so I wanted to get some feedback. Are the three tables I suggested the correct way to do this? Are there any viable alternatives and if so, what are the advantages/drawback?
You're doing it right.
Consider a database containing the Employees and Projects tables, and how you'd want to link them in a many-to-many fashion. You'd probably come up with an Assignments table (or Project_Employees in some naming conventions).
At some point you'd decide you want not only to store each project assignment, but you'd also want to store when the assignment started, and when it finished. The natural place to put that is in the assignment itself; it doesn't make sense to store it either with the project or with the employee.
In further designs you might even find it necessary to store further information about the assignment, for example in an employee review process you may wish to store feedback related to their performance in that project, so you'd make the assignment the "one" end of a relationship with a Review table, which would relate back to Assignments with a FK on assignment_id.
So in short, it's perfectly normal to have a junction table that has its own data.
That looks fine, and it's not unusual for the join table to contain a position/rank field.
Look up all steps in a chain, and get them in the right order
SELECT * FROM chains_steps
LEFT JOIN steps ON steps.id = chains_steps.step_id
WHERE chains_steps.chain_id = ?
ORDER BY chains_steps.step_position ASC
Look up all chains that contain a step
SELECT DISTINCT chain_id FROM chains_steps
LEFT JOIN chains ON chains.id = chains_steps.chain_id
I think that the plan you've outlined is the correct approach. Don't worry too much about the presence of step_position on your mapping table. After all the step_position is a bit of data that is directly related to a step in the context of a chain. So the chains_steps table is the right place for it IMHO.
Some things to think about:
Foreign keys - use 'em!
Unique key on the chains_steps table - can a step be present in more than one position in a single chain? What about in different chains?
Good luck!

How slow is the LIKE query on MySQL? (Custom fields related)

Apologies if this is redundant, and it probably is, I gave it a look but couldn't find a question here that fell in with what I wanted to know.
Basically we have a table with about ~50000 rows, and it's expected to grow much bigger than that. We need to be able to allow admin users to add in custom data to an item based on its category, and users can just pick which fields defined by the administrators they want to add info to.
Initially I had gone with an item_categories_fields table which pairs up entries from item_fields to item_categories, so admins can add custom fields and reuse them across categories for consistency. item_fields has a relationship to item_field_values which links values with fields, which is how we handled things in .NET. The project is using CAKEPHP though, and we're just learning as we go, so it can get a bit annoying at times.
I'm however thinking of maybe just adding an item_custom_fields table that is essentially the item_id and a text field that stores XMLish formatted data. This is just for the values of the custom fields.
No problems if I want to fetch the item by its id as the required data is stored in the items table, but what if I wanted to do a search based on a custom field? Would a
SELECT * FROM item_custom_fields
WHERE custom_data LIKE '%<material>Plastic</material>%'
(user input related issues aside) be practical if I wanted to fetch items made of plastic in this case? Like how slow would that be?
Thanks.
Edit: I was afraid of that as realistically this thing will be around 400k rows for that one table at launch, thanks guys.
Any LIKE query that starts with % will not use any indexes you have on the column, so the query will scan the whole table to find the result.
The response time for that depends highly on your machine and the size of the table, but it definitely won't be efficient in any shape or form.
Your previous/existing solution (if well indexed) should be quite a bit faster.

Implementing Comments and Likes in database

I'm a software developer. I love to code, but I hate databases... Currently, I'm creating a website on which a user will be allowed to mark an entity as liked (like in FB), tag it and comment.
I get stuck on database tables design for handling this functionality. Solution is trivial, if we can do this only for one type of thing (eg. photos). But I need to enable this for 5 different things (for now, but I also assume that this number can grow, as the whole service grows).
I found some similar questions here, but none of them have a satisfying answer, so I'm asking this question again.
The question is, how to properly, efficiently and elastically design the database, so that it can store comments for different tables, likes for different tables and tags for them. Some design pattern as answer will be best ;)
Detailed description:
I have a table User with some user data, and 3 more tables: Photo with photographs, Articles with articles, Places with places. I want to enable any logged user to:
comment on any of those 3 tables
mark any of them as liked
tag any of them with some tag
I also want to count the number of likes for every element and the number of times that particular tag was used.
1st approach:
a) For tags, I will create a table Tag [TagId, tagName, tagCounter], then I will create many-to-many relationships tables for: Photo_has_tags, Place_has_tag, Article_has_tag.
b) The same counts for comments.
c) I will create a table LikedPhotos [idUser, idPhoto], LikedArticles[idUser, idArticle], LikedPlace [idUser, idPlace]. Number of likes will be calculated by queries (which, I assume is bad). And...
I really don't like this design for the last part, it smells badly for me ;)
2nd approach:
I will create a table ElementType [idType, TypeName == some table name] which will be populated by the administrator (me) with the names of tables that can be liked, commented or tagged. Then I will create tables:
a) LikedElement [idLike, idUser, idElementType, idLikedElement] and the same for Comments and Tags with the proper columns for each. Now, when I want to make a photo liked I will insert:
typeId = SELECT id FROM ElementType WHERE TypeName == 'Photo'
INSERT (user id, typeId, photoId)
and for places:
typeId = SELECT id FROM ElementType WHERE TypeName == 'Place'
INSERT (user id, typeId, placeId)
and so on... I think that the second approach is better, but I also feel like something is missing in this design as well...
At last, I also wonder which the best place to store counter for how many times the element was liked is. I can think of only two ways:
in element (Photo/Article/Place) table
by select count().
I hope that my explanation of the issue is more thorough now.
The most extensible solution is to have just one "base" table (connected to "likes", tags and comments), and "inherit" all other tables from it. Adding a new kind of entity involves just adding a new "inherited" table - it then automatically plugs into the whole like/tag/comment machinery.
Entity-relationship term for this is "category" (see the ERwin Methods Guide, section: "Subtype Relationships"). The category symbol is:
Assuming a user can like multiple entities, a same tag can be used for more than one entity but a comment is entity-specific, your model could look like this:
BTW, there are roughly 3 ways to implement the "ER category":
All types in one table.
All concrete types in separate tables.
All concrete and abstract types in separate tables.
Unless you have very stringent performance requirements, the third approach is probably the best (meaning the physical tables match 1:1 the entities in the diagram above).
Since you "hate" databases, why are you trying to implement one? Instead, solicit help from someone who loves and breathes this stuff.
Otherwise, learn to love your database. A well designed database simplifies programming, engineering the site, and smooths its continuing operation. Even an experienced d/b designer will not have complete and perfect foresight: some schema changes down the road will be needed as usage patterns emerge or requirements change.
If this is a one man project, program the database interface into simple operations using stored procedures: add_user, update_user, add_comment, add_like, upload_photo, list_comments, etc. Do not embed the schema into even one line of code. In this manner, the database schema can be changed without affecting any code: only the stored procedures should know about the schema.
You may have to refactor the schema several times. This is normal. Don't worry about getting it perfect the first time. Just make it functional enough to prototype an initial design. If you have the luxury of time, use it some, and then delete the schema and do it again. It is always better the second time.
This is a general idea
please donĀ“t pay much attention to the field names styling, but more to the relation and structure
This pseudocode will get all the comments of photo with ID 5
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff="photo"
AND actions.typeAction = "comment"
This pseudocode will get all the likes or users who liked photo with ID 5
(you may use count() to just get the amount of likes)
SELECT * FROM actions
WHERE actions.id_Stuff = 5
AND actions.typeStuff="photo"
AND actions.typeAction = "like"
as far as i understand. several tables are required. There is a many to many relation between them.
Table which stores the user data such as name, surname, birth date with a identity field.
Table which stores data types. these types may be photos, shares, links. each type must has a unique table. therefore, there is a relation between their individual tables and this table.
each different data type has its table. for example, status updates, photos, links.
the last table is for many to many relation storing an id, user id, data type and data id.
Look at the access patterns you are going to need. Do any of them seem to made particularly difficult or inefficient my one design choice or the other?
If not favour the one that requires the fewer tables
In this case:
Add Comment: you either pick a particular many/many table or insert into a common table with a known specific identifier for what is being liked, I think client code will be slightly simpler in your second case.
Find comments for item: here it seems using a common table is slightly easier - we just have a single query parameterised by type of entity
Find comments by a person about one kind of thing: simple query in either case
Find all comments by a person about all things: this seems little gnarly either way.
I think your "discriminated" approach, option 2, yields simpler queries in some cases and doesn't seem much worse in the others so I'd go with it.
Consider using table per entity for comments and etc. More tables - better sharding and scaling. It's not a problem to control many similar tables for all frameworks I know.
One day you'll need to optimize reads from such structure. You can easily create agragating tables over base ones and lose a bit on writes.
One big table with dictionary may become uncontrollable one day.
Definitely go with the second approach where you have one table and store the element type for each row, it will give you a lot more flexibility. Basically when something can logically be done with fewer tables it is almost always better to go with fewer tables. One advantage that comes to my mind right now about your particular case, consider you want to delete all liked elements of a certain user, with your first approach you need to issue one query for each element type but with the second approach it can be done with only one query or consider when you want to add a new element type, with the first approach it involves creating a new table for each new type but with the second approach you shouldn't do anything...

Guidelines for join/link/many to many tables

I have my own theories on the best way to do this, but I think its a common topic and I'd be interested in the different methods people use. Here goes
Whats the best way to deal with many-to-many join tables, particularly as far as naming them goes, what to do when you need to add extra information to the relationship, and what to do whene there are multiple relationships between two tables?
Lets say you have two tables, Users and Events and need to store the attendees. So you create EventAttendees table. Then a requirement comes up to store the organisers. Should you
create an EventOrganisers table, so each new relationship is modelled with a join table
or
rename EventAttendees to UserEventRelationship (or some other name, like User2Event or UserEventMap or UserToEvent), and an IsAttending column and a IsOrganiser column i.e. You have a single table which you store all relationship info between two attendees
or
a bit of both (really?)
or
something else entirely?
Thoughts?
The easy answer to a generic question like this is, as always, "It all depends on the details".
But in general, I try to create fewer tables when this can be done without abusing the data definitions unduly. So in your example, I would probably add an isOrganizer column to the table, or maybe an attendeeType to allow for easy future expansion from audience/organizer to audience/organizer/speaker/caterer or whatever may be needed. Creating an extra table with essentially identical columns, where the table name is in effect a flag identifying the "attendee type", seems to me the wrong way to go both from a pristine design perspective and also from a practical point of view.
A single table is more flexible. With one table and a type field, if we want to know just the organizers -- like when we're sending invitations to a planning meaning -- fine, we write "select userid from userevent where eventid=? and attendeetype='O'". If we want to know everyone who will be there -- like when we're printing name cards for the lunch tables -- we just don't include the attendeetype test.
But suppose we have two tables. Then if we want just the organizers, okay, that's easy, join on the organizer table. But if we want both organizers and audience, then we have to do a union, which makes for more complicated queries and is usually slow. And if you're thinking, What's the big deal doing a union?, note that there may be more to the query. Perhaps a person can have multiple phone numbers and we care about this, so the query is not just joining user and eventAttendee but also phone. Maybe we want to know if they've attended previous conferences because we give special deals to "alumni", so we have to join in eventAttendee a second time, etc etc. A ten-table join with a union can get very messy and confusing to read.