MySql table with potentially *very* many columns - mysql

A friend who is a recruiter for software engineers wants me to create an app for him.
He wants to be able to search candidates' CVs based on skills.
As you can imagine, there are potentially hundreds, possibly thousands of skills.
What's the best way to represent the candidate in a table? I am thinking skill_1, skill_2, skill_n, etc, but somewhere out there there is a candidate with more than n skills.
Also, it is possible that more skills will be added to the database in future.
So, what's the best way to represent a candidate's skills?
[Update] for #zohar, here's a rough first pass at teh schema. Any comments?

You need three tables (at least):
One table for candidates, that will contain all the details such as name, contact information, the cv (or a link to it) and all other relevant details.
One table for skills - that will contain the skill name, and perhaps a short description (if that's relevant)
and one table to connect candidates to skills - candidatesToSkills - that will have a 1 to many relationship with both tables - and a primary key that is the combination of the candidate id and the skill id.
This is the relational way of creating a many to many relationship.
As a bonus, you can also add a column for skill level - beginner, intermediate, skilled, expert etc'.
You might also want to add a table for job openings and another table to connect that to the skills table, so that you can easily find the most suitable candidate for the job based on the required skills. (but please note that skills is not the only match needed - other points to match are geographic location, salary expectations, etc'.)

Related

Difference between two separate tables and one table with column

I have users and doctors. It is an app that both regular people and doctors can use. I'm not sure which way to setup my database.
Two tables - users and doctors tables
One table, extra column - users table with column user_type
Which one is the best way to do it? What are the pros and cons of both?
Keeping separate tables for me is good especially if you make user table to be as basic as possible (with username and password). This way you know that even if requirements changes in the future and you need to add another user type with other additional fields, then you don't have to worry about other sections of your code. I am just thinking about the future changes or additions. Doctors is abstract. Again this might be the problem in the future when you have to categorize them (e.g by their specialty).
Just my two cents worth.
2 tables is your best approach. Although not the 2 tables you have suggested.
If you analyze your situation a little more basically, you have users and they belong to 1 of 2 groups (general public and doctors).
This is a standard user group situation, so using 1 table to store user specific data with a reference to a user_group table (ie user belongs to group) is a good solution.
This also allows you to add groups later on - renal patient, renal doctor, cardiac patient, cardiac doctor ...
Using a setup with users and user_groups is usually a good way to go.

Database design for tracking and sharing expenses

I want to design a web application for keeping track of the finance of the members of an organization. Certain functions are very similar to Splittr. I define my requirements using the MWE database diagrams:
"Finance" tables: Each user will have one personal finance account, for which I am using the following three red tables:
"SharedExpense" tables: Each user can have shared expenses with other users in many 'shared-expense-groups'. For each group, I am using the following three blue tables:
(Note how each user can define amount of their share, and own category of the shared expense. UserShare table uses a composite primary key.)
The problem: I have to relate the users to their 3 personal "Finance" tables and the 3N "SharedExpense" tables (where N is the number of 'shared-expense-groups' the user belongs to).
Attempted Solutions:
Multiple databases. Each user has a unique database for their "Finance" tables. Each 'shared-expense-group' has a unique database on the server. I can then relate the users from one master database with the following four purple tables:
Drawbacks: Foreign keys come from different databases, large number of databases to be backed up.
Multiple tables. I can create all the tables in the same database and relate all of them with the four green master tables:
Here, the number of tables is a potential problem. If there are M users and N 'shared-expense-groups, then there will be 3M + 3N tables!
The question: Is there more elegant and simpler database design for the purpose? If not, which of the above two solutions is better and why?
Links to relevant, previous StackOverflow Q&A:
Personal finance app database design
Database design for tracking progress over time
SQL for a Household Bill splitting db
Comparing 1 Database with Many Tables to Multiple Databases with Fewer Tables in Each
There is to much to describe all the challenges in a summary, but I'll pick out a few.
Fundamental design violations: such as a table/database for each user
entity design, 3NF: such as category.budget and ledger.transaction_type
referential integrity/relationship design:
account is for one user, but account table does not contain the user id;
usershare is a subset of ledger, but they both point to a user;
object naming concerns:
clear and consistent naming entities, based on real usage. Is a member a user or a user a member? If they are the same, choose one name. If they are not the same, the design is different. Do staff use client or customer rather than member?
consistency in your key naming. The key name should directly tie it to the source entity. Members.ID should be referenced as members_id, rather than user_id. However, see the next entry before correcting this.
be consistent in your entity plurality. The general consensus is that the name should describe a single record (User) rather than all the records (Users).
ledger.spent_on - that name is not obviously a date. It could be pointing to a user or category as well. An attribute name should describe the attribute without needing additional explanation. For example, ledger.Purchase_Date is self explanatory. It should also be clear how it relates to the entity. UserShare.Share doesn't really tell me what it contains.
Sorry to be blunt, but I would start over. Consider what you have as a good trial run and start again using the additional information you have.
Ask questions of your designs (Are all users members? Are all members users?). If the answer is anything other than Yes or No, break it down further.
Try what-if scenarios (What if a shared ledger exceeds the category budget? How will previous spending be perceived if the category budget changes?)
Consider what reporting questions may be asked (Who went over budget? How much are we spending on this category?) and then consider the query to answer the question.
Read up on 3NF and maybe some of the higher normalization levels as well. Whereas 3NF is pretty nearly the minimum normalization, the higher levels become increasingly specialized and may or may not be appropriate for you design.
The better you understand your data AND business, the better your design will be, and the better your end product will turn out.

Improving my Database Design for future scalability

Well, I am working on a project which might involve thousands of users & I don't have much experience in databases especially when it involves relationships between entities.
Let me explain my scenario. First there's an User who can login into our system using his credentials. We have a module in our system, which will enable him to create Projects. So that brings a relationship between User table & Projects table.
Now there's another module, namely Team Creation Module, it does what it says. Out of the list of available members, he can pick who he likes and add them to a team. So there are tables for that Members & Team. Furthermore, a member can be a part of many teams and a team can have many members & a "User" can be member as well.
I have a designed the database myself but I am not sure if it is good or bad one. Moreover, I would really appreciate if someone can point me to good tutorials which shows how to insert or update into tables involving relationships.
Here's my design till now:
Update
After a discussion with someone on IRC, I came up with a revised design. I merged "User" & "Members" table as User is also a Member.
My question still remains the same, Am I on right track?
It's great that you're thinking long-term, but your solution won't work long-term.
This is not the first time this sort of thing has been tried before. Rely on the wisdom of those that have messed up before. Read data modeling pattern books.
Abstract and Normalize. That's how you get to a good long-term solution.
At least read up on The Party Model. A group and individual are actually the same (abstract) thing.
Put actually different things in different tables. An Address and Member don't belong in the same table.
"Am I on the right track" is not a useful question - we have no way of telling, because it depends on where you are headed.
A couple of things:
it's a good idea to name the relation columns after the relationship. For instance, in the first diagram, the "owner" of the project should not be called users_user_id - that's meaningless. Call it "owner_id" or something that meaningfully describes the relationship between the project and members table.
in the second diagram, you appear to have a "many to many" relationship between members and projects in the members table - but there's no efficient way of storing the id of more than one project in the members table. You need to factor that out into a joining table - projects_members, for instance, just like you did with teams_members.
the "teams_members" table has a primary key called tm_id. A purist would tell you this is wrong - the unique identifier for that table should be the combination of member_id and team_id. You don't need another unique identifier - and in fact it's harmful, because you must guarantee uniqueness of the member_id and team_id combination.
As Neil says, you probably want to start reading up on this. I can recommend 'Database Systems: Design, Implementation, and Management' by Coronel et al.

Method To Create Database for Tv Shows

This is my first question to stackoverflow so if i do something wrong please let me know i will fix it as soon as possible.
So i am trying to make a database for Tv Shows and i would like to know the best way and to make my current database more simple (normalization).
I would to be able to have the following structure or similar.
Fringe
Season 1
Episodes 1 - 10(whatever there are)
Season 2
Episodes 1 - 10(whatever there are)
... (so on)
Burn Notice
Season 1
Episodes 1 - 10(whatever there are)
Season 2
Episodes 1 - 10(whatever there are)
... (so on)
... (More Tv Shows)
Sorry if this seems unclear. (Please ask for clarification)
But the structure i have right now is 3 tables (tvshow_list, tvshow_episodes, tvshow_link)
//tvshow_list//
TvShow Name | Director | Company_Created | Language | TVDescription | tv_ID
//tvshow_episodes//
tv_ID | EpisodeNum | SeasonNum | EpTitle | EpDescription | Showdate | epid
//tvshow_link//
epid | ep_link
The Director and the company are linked by an id to another table with a list of companies and directors.
I am pretty sure that there is an more simplified way of doing this.
Thanks for the help in advance,
Krishanthan Lingeswaran
The basic concept of Normalization is the idea that you should only store one copy of any item of data that you have. It looks like you've got a good start already.
There are two basic ways to model what you're trying to do here, with episodes and shows. In the database world, we you might have heard the term "one to many" or "many to many". Both are useful, it just depends on your specific situation to know which is the correct one to use. In your case, the big question to ask yourself is whether a single episode can belong to only one show, or can an episode belong to multiple shows at once? I'll explain the two forms, and why you need to know the answer to that question.
The first form is simply a foreign key relationship. If you have two tables, 'episodes' and 'shows', in the episodes table, you would have a column named 'show_id' that contains the ID of one (and only one!) show. Can you see how you could never have an episode belong to more than one show this way? This is called a "one to many" relationship, i.e. a show can have many episodes.
The second form is to use an association table, and this is the form you used in your example. This form would allow you to associate an episode with multiple shows and is therefore called a "many to many" relationship.
There is some benefit to using the first form, but it's not really that big of a deal in most cases. Your queries will be a little bit shorter because you only have to join 2 tables to get episodes->shows but the other table is just one more join. It really comes down to figuring out if you need a "one to many" or "many to many" type relationship.
An example of a situation where you would need a many-to-many relationship would be if you were modeling a library and had to keep track of who checked out which book. You'd have a table of books, a table of users, and then a table of "books to users" that would have an id, a book_id, and a user_id and would be a many-to-many relationship.
Hope that helps!
I am pretty sure that there is an more simplified way of doing this.
Not as far as I know. Your schema is close to the simplest you can make for what I presume is the functionality you're asking for. "Improvements" on it really only make it more complicated, and should be added as you judge the need emerges on your side. The following examples come to mind (none of which really simplify your schema).
I would standardize your foreign key and primary key names. An example would be to have the columns shows.id, episodes.id, episodes.show_id, link.id, link.episode_id.
Putting SeasonNum as what I presume will be an int in the Episodes table, in my opinion, violates the normalization constraint. This is not a major violation, but if you really want to stick to it, I would create a separate Seasons table and associate it many-to-one to the Shows table, and then have the Episodes associate only with the Seasons. This gives you the opportunity to, for instance, attach information to each season. Also, it prevent repetition of information (while the type of the season ID foreign key column in the Episodes table would ostensibly still be an INT, a foreign key philosophically stores an association, what you want, versus dumb data, what you have).
You may consider putting language, director, and company in their own tables rather than your TV show list. This is the same concern as above and in your case a minor violation of normalization.
Language, director, and company all have interesting issues attached to them regarding the level of the association. Most TV shows have different directors for different episodes. Many are produced in multiple languages and by several different companies and sometimes networks. So at what level do you plan on storing this information? I'm not a software architect, so someone else can better answer this question than me, but I'd set up a polymorphic many-to-many association for languages, directors, and companies and an inheritance model that allows for these values to be specified on an episode-by-episode, season-by-season, or show-by-show basis, inheriting the value from its parent if none are provided.
Bottom line concerning all these suggestions: Pick what's appropriate for your project. If you don't need the functionality afforded by this level of associations, and you don't mind manually entering in repetitive data (you might end up implementing an auto-complete system to help you), you can gloss over some of the normalization constraints.
Normalization is merely a suggestion. Pick what's right for you and learn from your mistakes.

A Beginner Question on database design

this is a follow-up question on my previous one.We junior year students are doing website development for the univeristy as volunteering work.We are using PHP+MySQL technique.
Now I am mainly responsible for the database development using MySQL,but I am a MySQL designer.I am now asking for some hints on writing my first table,to get my hands on it,then I could work well with other tables.
The quesiton is like this,the first thing our website is going to do is to present a Survey to the user to collect their preference on when they want to use the bus service.
and this is where I am going to start my database development.
The User Requirement Document specifies that for the survey,there should be
Customer side:
Survery will be available to customers,with a set of predefined questions and answers and should be easy to fill out
Business side:
Survery info. will be stored,outputed and displayable for analysis.
It doesnt sound too much work,and I dont need to care about any PHP thing,but I am just confused on :should I just creat a single table called " Survery",or two tables "Survey_business" and "Survey_Customer",and how can the database store the info.?
I would be grateful if you guys could give me some help so I can work along,because the first step is always the hardest and most important.
Thanks.
I would use multiple tables. One for the surveys themselves, and another for the questions. Maybe one more for the answer options, if you want to go with multiple-choice questions. Another table for the answers with a record per question per answerer. The complexity escalates as you consider multiple types of answers (choice, fill-in-the-blank single-line, free-form multiline, etc.) and display options (radio button, dropdown list, textbox, yada yada), but for a simple multiple-choice example with a single rendering type, this would work, I think.
Something like:
-- Survey info such as title, publish dates, etc.
create table Surveys
(
survey_id number,
survey_title varchar2(200)
)
-- one record per question, associated with the parent survey
create table Questions
(
question_id number,
survey_id number,
question varchar2(200)
)
-- one record per multiple-choice option in a question
create table Choices
(
choice_id number,
question_id number,
choice varchar2(200)
)
-- one record per question per answerer to keep track of who
-- answered each question
create table Answers
(
answer_id number,
answerer_id number,
choice_id number
)
Then use application code to:
Insert new surveys and questions.
Populate answers as people take the surveys.
Report on the results after the survey is in progress.
You, as the database developer, could work with the web app developer to design the queries that would both populate and retrieve the appropriate data for each task.
only 1 table, you'll change only the way you use the table for each ocasion
customers side insert data into the table
business side read the data and results from the same table
Survey.Customer sounds like a storage function, while Survey.Business sounds like a retrieval function.
The only tables you need are for storage. The retrieval operations will take place using queries and reports of the existing storage tables, so you don't need additional tables for those.
Use a single table only. If you were to use two tables, then anytime you make a change you would in effect have to do everything twice. That's a big pain for maintenance for you and anyone else who comes in to do it in the future.
most of the advice/answers so far are applicable but make certain (unstated!) assumptions about your domain
try to make a logical model of the entities and attributes that are required to capture the requirements, examine the relationships, consider how the data will be used on both sides of the process, and then design the tables. Talk to the users, talk to the people that will be running the reports, talk to whoever is designing the user interface (screens and reports) to get the complete picture.
pay close attention the the reporting requirements, as they often imply additional attributes and entities not extant in the data-entry schema
i think 2 tables needed:
a survey table for storing questions and choices for answer. each survey will be stored in one row with a unique survey id
other table is for storing answers. i think its better to store each customers answer in one row with a survey id and a customer id if necessary.
then you can compute results and store them in a surveyResults view.
Is the data you're presenting as the questions and answers going to be dynamic? Is this a long-term project that's going to have questions swapped in and out? If so, you'll probably want to have the questions and answers in your database as well.
The way I'd do it would be to define your entities and figure out how to design your tables so relationships are straightforward. Sounds to me like you have three entities:
Question
Answer
Completed Survey
Just a sample elaboration of what Steven and Chris has mentioned above.
There are gonna be multiple tables, if there are gonna be multiple surveys, and each survey has a different set of questions, and if same user can take multiple surveys.
Customer Table with CustID as the primary key
Questions Table with a Question ID as the primary key. If a question cannot belong to more than one survey (a N:1 relationship), then can also have Survey ID (of table Survey table mentioned in point 3) as one of the values in the table.
But if a Survey to Question relationship is N:M, then
(SurveryID, QuestionID) would become a composite key for the SurveyTable, else it would just have the SurveyID with the high level details of the survey like description.
UserSurvey table which would contain (USerID, SurveryID, QuestionID, AnswerGiven)
[Note: if same user can take the same survey again and again, either the old survey has to be updated or the repeat attempts have to stored as another rows with some serial number)