Database structure for simple waiting times project with CSV data and MySql - mysql

Suppose I have some sample data like that shown below (with a lot more entries), and my main use case is to look up a specific ailment and provide a list of waiting times for the different hospitals which offer that treatment.
Not being very experienced with DB design, I don't know whether in this example there is an advantage to using separate tables with links between them, or whether a simple import of the CSV into a single table will suffice.
If I used separate tables, I'm guessing they would be for hospital and ailment perhaps?
I would be very grateful if someone could tell me the best approach for this.
ID,Main Department,Specific Complaint,Hospital ,Waiting time
1,Cardiology,general,Hospital 1,7
2,Cardiology,general,Hospital 2,7
3,Cardiology,general,Hospital 3,7
4,Cardiology,general,Hospital 4,21
5,Cardiology,traumatology,Hospital 1,8
6,Cardiology,traumatology,Hospital 2,7
7,Dermatology,general,Hospital 1,21
8,Dermatology,general,Hospital 2,14
9,Dermatology,general,Hospital 3,21
10,Dermatology,erysipelas,Hospital 1,7
11,Dermatology,erysipelas,Hospital 3,7
...

One detail you must understand: SO is not a teaching site; tutorials abound for that. It is meant to address specific problems that arise when developing solutions. That being said, I like this type of question, so here goes.
The type of solution to implement (simple CSV or complete database) depends on the volume of data and the type of reports you require.
CSV is quick to implement.
A database takes more time, but will allow you to produce more complex reports than CSV, through the use of queries.
CSV is often used as a medium to load or extract data, but it is not as powerful when it comes to queries.
A database can be expanded. Ex. today you only consider the name of the hospital. You could expand your table to include the address, phone number, ... You could also expand your model to add insurance company links, doctors, ...
Basic modeling:
Identify your objects. Ex. here I would consider ailment, hospital, complaint.
Identify relations between objects, and their type. Ex. ailment and hospital are linked, and that link is n-n. Meaning 1 ailment can be treated in many hospitals, and 1 hospital can treat many ailments.
I am not certain what to do with complaint. In your question you do not specify if all hospitals treat all (ailment - complaint) duos or not. More on that later.
As you define your structure, make sure you apply the normal forms. In most cases, forms 1-3 are enough.
1NF: atomic values and no repeating groups. Ex. you would not create a table with a hospital column and a second column holding a list of ailments separated by commas. Instead, 1 line == 1 hospital <-> 1 ailment.
2NF: 1NF is achieved and every non-key attribute depends on the whole primary key, not just on part of it. Ex. you should not create a table linking ailment and wait time. The wait time is not dependent on the ailment alone, it is dependent on the combination of ailment and hospital.
3NF: 2NF is achieved and there are no transitive functional dependencies, i.e. no non-key attribute depends on another non-key attribute. Ex. if A is dependent on B and B is dependent on C, then A is transitively dependent on C.
Some critical questions must be answered before you can model your data:
A hospital can treat a certain ailment. In all cases?
Can you have: hospital 1 can treat ailment 1 when the complaint is A and B, but not C?
Ex. all hospitals can provide primary care for cardiac patients, but cardiac surgery can only be performed at some hospitals.
In that case, you cannot link ailment and hospital together directly. A combination of (ailment,complaint) can. And this will impact wait time.
Based on reality, I will link (ailment and complaint) and link this duo to hospital.
Here is my first model, "for fun", which might need to be modified for your needs:
Wait time is in table Hospital_Treads_Ailment_has_Complaint. In my model, a hospital can only estimate the wait time once they know which ailment and which complaint the patient has.
A final exercise I do to test my model is try the main queries I need. If one query cannot be done with the model, it needs to be changed.
Which hospital treats cardiac problems? Ok, select hospital where ailment == cardiology, complaint == *.
Which hospital can accept patients who have trauma? Ok, select hospital where ailment == *, complaint == trauma.
and so on...
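To make the model and the two test queries above concrete, here is a minimal sketch. It is not the diagram from this answer: the table and column names (hospital, ailment, complaint, hospital_treatment, wait_days) are assumptions chosen for illustration, and Python's built-in sqlite3 module stands in for MySQL only so that the example runs without a server. The rows are taken from the sample CSV in the question.

```python
import sqlite3

# Hypothetical schema: three lookup tables plus one link table that carries
# the wait time for each (hospital, ailment, complaint) combination.
SCHEMA = """
CREATE TABLE hospital  (hospital_id  INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE ailment   (ailment_id   INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE complaint (complaint_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE hospital_treatment (
    hospital_id  INTEGER NOT NULL REFERENCES hospital(hospital_id),
    ailment_id   INTEGER NOT NULL REFERENCES ailment(ailment_id),
    complaint_id INTEGER NOT NULL REFERENCES complaint(complaint_id),
    wait_days    INTEGER NOT NULL,
    PRIMARY KEY (hospital_id, ailment_id, complaint_id)
);
"""

# (ailment, complaint, hospital, waiting time), copied from the sample CSV.
ROWS = [
    ("Cardiology", "general", "Hospital 1", 7),
    ("Cardiology", "general", "Hospital 4", 21),
    ("Cardiology", "traumatology", "Hospital 1", 8),
    ("Cardiology", "traumatology", "Hospital 2", 7),
    ("Dermatology", "general", "Hospital 2", 14),
    ("Dermatology", "erysipelas", "Hospital 3", 7),
]

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for ailment, complaint, hospital, wait in ROWS:
        # INSERT OR IGNORE keeps a single row per distinct name.
        conn.execute("INSERT OR IGNORE INTO ailment(name) VALUES (?)", (ailment,))
        conn.execute("INSERT OR IGNORE INTO complaint(name) VALUES (?)", (complaint,))
        conn.execute("INSERT OR IGNORE INTO hospital(name) VALUES (?)", (hospital,))
        conn.execute(
            """INSERT INTO hospital_treatment
               SELECT h.hospital_id, a.ailment_id, c.complaint_id, ?
               FROM hospital h, ailment a, complaint c
               WHERE h.name = ? AND a.name = ? AND c.name = ?""",
            (wait, hospital, ailment, complaint),
        )
    return conn

def wait_times(conn, ailment, complaint=None):
    """Hospitals and wait times for an ailment; complaint=None means any complaint."""
    sql = """SELECT h.name, c.name, t.wait_days
             FROM hospital_treatment t
             JOIN hospital  h ON h.hospital_id  = t.hospital_id
             JOIN ailment   a ON a.ailment_id   = t.ailment_id
             JOIN complaint c ON c.complaint_id = t.complaint_id
             WHERE a.name = ? AND (? IS NULL OR c.name = ?)
             ORDER BY t.wait_days, h.name"""
    return conn.execute(sql, (ailment, complaint, complaint)).fetchall()

waiting_db = build_db()
print(wait_times(waiting_db, "Cardiology"))                  # ailment == cardiology, complaint == *
print(wait_times(waiting_db, "Cardiology", "traumatology"))  # one specific duo
```

If both lookups can be written against the schema, the model passes the test described above; if one cannot, the structure needs another iteration.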

Related

ER Diagram, Physical Data Model Relations

I am trying to create a very simple database Supermarket management system.
And it seems that I am having a problem with how relations work between entities. I am using PowerDesigner to create the ERD and then generate everything from it (LDM, PDM, OOM). Is this a bad idea?
Now for my main problem. It's between these 3 tables:
Employee(Cashier)
Customer
Orders(Receipt).
The way I did it is:
The customer gathers the products he wants to buy and presents them to the employee, then the employee gets the order for the customer from the machine, so:
There is a relation between the Customer and the Employee (Many to Many) : each customer can request_order from one or more Employee and each Employee can get_order to one or more Customer.
There is a relation between the Employee and the Orders (1 To Many) : each Employee can get one or more orders, each order is fetched by one employee.
The problem is if I want to know the customer related to that specific order......I can't.
How do I fix this? How can I get the specific order that customer made.
I am still very new to this, so sorry for any obvious mistakes.
I am sticking to the Relational Database context, that you have tagged.
Data Modelling is an iterative process. There is a lot more definition that is needed, before the data model can be complete. Rather than answering the specifics that you request, which would be limited to one iteration; one increment, allow me to provide something more complete, several iterations progressed.
If it is useful, please discuss this data model, and progress it to fulfil all your requirements.
Of course it is too small as an inline graphic; it is also available as a PDF: Supermarket Data Model.
The Standard for Relational Data Modelling since 1983 is IDEF1X. For those unfamiliar with the Standard, refer to the short IDEF1X Introduction.
I am using PowerDesigner to create the ERD and then generate everything from it (LDM, PDM, OOM). Is this a bad idea?
PowerDesigner is great. Just ignore the Oracle-specific nonsense, it pushes you into considering the physical far too early.
Skip the ERD, it is brain-dead in the context of the Relational paradigm, and surpassed by IDEF1X, which is specific to that paradigm.
Use the Entity Level display for ERD equivalence.
For small projects you can ignore the academic distinctions {CDM; LDM; PDM; OOM, etc}.
There is actually just one model: it is "conceptual" at the beginning, and you just progress to "logical", and last, when the "logical" is stable, to the "physical".
Understand that the whole process is Logical.
Unfortunately, in PD you have to have separate "models" or files for each.
Now for my main problem. It's between these 3 tables:
I have solved that issue. And exposed others.
each customer can request_order from one or more Employee and each Employee can get_order to one or more Customer
each Employee can get_order to one or more Customer
Yes, but that is the overall result. In each shopping or presentation instance:
a customer can request_order from one Employee (Cashier)
an Employee can get_order from one Customer
The problem is if I want to know the customer related to that specific order......I can't. How do I fix this?
Solved: Each Order is Identified by (CustomerId, DateTime), ie. the Customer who created the Order.
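As a rough sketch of that key (not the data model from the PDF), the following uses Python's built-in sqlite3 module in place of a real server. The table and column names (customer, employee, customer_order, ordered_at) and the sample rows are assumptions made up for illustration. The point is that the Order carries the Customer in its own key and the Employee as a plain foreign key, so the many-to-many between Customer and Employee falls out of the Order rows and needs no table of its own.

```python
import sqlite3

shop = sqlite3.connect(":memory:")
shop.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked to
shop.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE customer_order (
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    ordered_at  TEXT    NOT NULL,   -- ISO-8601 timestamp
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    PRIMARY KEY (customer_id, ordered_at)   -- the Order is identified by its Customer
);
""")

# Hypothetical sample rows.
shop.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
shop.executemany("INSERT INTO employee VALUES (?, ?)", [(10, "Carol"), (11, "Dave")])
shop.executemany(
    "INSERT INTO customer_order VALUES (?, ?, ?)",
    [
        (1, "2024-01-05T09:30:00", 10),
        (1, "2024-01-06T17:45:00", 11),
        (2, "2024-01-05T09:35:00", 10),
    ],
)

def orders_of(customer_id):
    """Every order a customer made, with the cashier who handled it."""
    return shop.execute(
        """SELECT o.ordered_at, e.name
           FROM customer_order o
           JOIN employee e ON e.employee_id = o.employee_id
           WHERE o.customer_id = ?
           ORDER BY o.ordered_at""",
        (customer_id,),
    ).fetchall()

def customers_served_by(employee_id):
    """The overall many-to-many result, derived from the orders."""
    rows = shop.execute(
        """SELECT DISTINCT c.name
           FROM customer_order o
           JOIN customer c ON c.customer_id = o.customer_id
           WHERE o.employee_id = ?
           ORDER BY c.name""",
        (employee_id,),
    )
    return [name for (name,) in rows]

print(orders_of(1))
print(customers_served_by(10))
```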
Note
Do not mix Process elements (eg. Get_Order) with Data elements (eg. the data model). The two areas are separate, and governed by quite different science. Here we are solving the Data; only the Data; and nothing but the Data. After that, the Process Model is easy.
RecordIds are anti-Relational. They are certainly not needed in a Relational database. Read my other Answers for detailed explanations.
Relational Keys (aka Compound Keys or Composite Keys) are standard fare in a Relational database. They provide far more integrity than a RecordId based file ever can.
You need to be more precise (state the exact sequence) in defining how an Order is created.
Please feel free to comment or ask specific questions.

MySql table with potentially *very* many columns

A friend who is a recruiter for software engineers wants me to create an app for him.
He wants to be able to search candidates' CVs based on skills.
As you can imagine, there are potentially hundreds, possibly thousands of skills.
What's the best way to represent the candidate in a table? I am thinking skill_1, skill_2, skill_n, etc, but somewhere out there there is a candidate with more than n skills.
Also, it is possible that more skills will be added to the database in future.
So, what's the best way to represent a candidate's skills?
[Update] for #zohar, here's a rough first pass at the schema. Any comments?
You need three tables (at least):
One table for candidates, that will contain all the details such as name, contact information, the cv (or a link to it) and all other relevant details.
One table for skills - that will contain the skill name, and perhaps a short description (if that's relevant)
and one table to connect candidates to skills - candidatesToSkills - that sits on the "many" side of a one-to-many relationship with each of the other two tables, and has a primary key that is the combination of the candidate id and the skill id.
This is the relational way of creating a many to many relationship.
As a bonus, you can also add a column for skill level - beginner, intermediate, skilled, expert, etc.
You might also want to add a table for job openings and another table to connect that to the skills table, so that you can easily find the most suitable candidate for the job based on the required skills. (But please note that skills are not the only match needed - other points to match are geographic location, salary expectations, etc.)
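Here is a small sketch of the three tables and of a "has all of these skills" search. The names candidate, skill and candidate_skill, the level column values and the sample rows are illustrative assumptions, and Python's built-in sqlite3 module is used in place of MySQL only so the example is self-contained.

```python
import sqlite3

cv_db = sqlite3.connect(":memory:")
cv_db.executescript("""
CREATE TABLE candidate (candidate_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE skill     (skill_id     INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE candidate_skill (
    candidate_id INTEGER NOT NULL REFERENCES candidate(candidate_id),
    skill_id     INTEGER NOT NULL REFERENCES skill(skill_id),
    level        TEXT,                       -- optional bonus column
    PRIMARY KEY (candidate_id, skill_id)     -- one row per candidate/skill pair
);
-- Hypothetical sample data.
INSERT INTO candidate VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
INSERT INTO skill VALUES (1, 'Python'), (2, 'MySQL'), (3, 'Go');
INSERT INTO candidate_skill VALUES
    (1, 1, 'expert'), (1, 2, 'intermediate'),
    (2, 1, 'beginner'),
    (3, 1, 'intermediate'), (3, 2, 'skilled'), (3, 3, 'expert');
""")

def candidates_with_all(skills):
    """Names of candidates who have every skill in the given list."""
    wanted = sorted(set(skills))  # duplicates would break the count below
    placeholders = ", ".join("?" for _ in wanted)
    sql = f"""SELECT c.name
              FROM candidate c
              JOIN candidate_skill cs ON cs.candidate_id = c.candidate_id
              JOIN skill s ON s.skill_id = cs.skill_id
              WHERE s.name IN ({placeholders})
              GROUP BY c.candidate_id, c.name
              HAVING COUNT(DISTINCT s.skill_id) = ?
              ORDER BY c.name"""
    return [name for (name,) in cv_db.execute(sql, (*wanted, len(wanted)))]

print(candidates_with_all(["Python", "MySQL"]))
print(candidates_with_all(["Go"]))
```

Adding a new skill is then one more row in skill, and a candidate with any number of skills is just that many rows in candidate_skill, so no skill_1 ... skill_n columns are needed.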

Designing a correct table structure

We are trying to save the Family History of a particular foreign job applicant. Below are the details we have to save.
Family Member: Father|Mother|1st Brother| 2nd Brother| 1st Sister| etc etc
Health Status: Alive|Deceased
Health Condition (Negative/Positive): Arthritis |Asthma |COPD |Diabetes etc etc
Health Condition (Comment): Arthritis |Asthma |COPD |Diabetes etc etc
Overall Comment
Below is its UI, so you can understand it better.
Now our problem is creating a database table for storing this information. Below are the things to be considered.
If the job applicant likes, he can provide the data of any number of family members.
There are hundreds of items that come under "General Data", so we can't create a column in the table for every single item.
"Overall History Comment" is a comment about the entire family history, not related to a particular member.
The table design we made is below
Here is some sample input to the table.
FamilyHistory
a) 1,1,1st Brother,Alive, Asthma, Not serious
b) 2,1,1st Brother,Alive, Cancer, Lung Cancer
c) 3,2,2nd Sister,Alive, Asthma,serious
d) 4,2,2nd Sister,Alive, Diabetes,serious
OverallComment
a) 1,1,1,Overall Condition Normal
b) 2,3,2,NULL
However we feel this design is bad due to the below points.
Have a look at a) and b) of the FamilyHistory input. The 1st brother of the job applicant has 2 health conditions. To enter this, 2 rows are inserted and all the details about him are repeated except the differing health conditions.
Can you please let me know how to make this design better?
My first thought is: Why do you record this data? What is it good for? I cannot imagine its use. However, the answer will help design.
Is it important whether the second sister or the father has arthritis? If not, then why distinguish the two? You could go without family member types. (If you want to, use a text field where you type in '2nd sis', 'mom', 'father', whatever.)
Are you going to have reports on that (e.g. 20% of our applicants told us they have family members with cancer)? Or will you always simply look up one applicant and see their family entries? If the latter, you could make this one text column where you simply type in all members and their health (or have your program write this).
Another point: Why is OverallComment a separate table? Do you need this for internationalization? Or for database-wide text search? If not, make this a column in the related table instead.
applicant ( applicant_id , name , comment )
If you need the relational model for queries and reports, then have one table for the family member:
family_member ( family_member_id , applicant_id , family_member_type, alive, comment )
And another table for the several disease entries per member:
family_member_disease ( family_member_id , disease_id , comment )
Maybe you should add dates. E.g.: When was the father reported to be alive?
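A sketch of the relational variant, for the case where you do need reports. The table outlines are the ones given above (spelled disease here); the separate disease lookup table, the column types, the applicant names and the use of Python's built-in sqlite3 module instead of MySQL are assumptions. The family and disease rows mirror the sample input from the question.

```python
import sqlite3

hist = sqlite3.connect(":memory:")
hist.executescript("""
CREATE TABLE applicant (applicant_id INTEGER PRIMARY KEY, name TEXT NOT NULL, comment TEXT);
CREATE TABLE disease   (disease_id   INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE family_member (
    family_member_id   INTEGER PRIMARY KEY,
    applicant_id       INTEGER NOT NULL REFERENCES applicant(applicant_id),
    family_member_type TEXT    NOT NULL,
    alive              INTEGER NOT NULL,     -- 1 = alive, 0 = deceased
    comment            TEXT
);
CREATE TABLE family_member_disease (
    family_member_id INTEGER NOT NULL REFERENCES family_member(family_member_id),
    disease_id       INTEGER NOT NULL REFERENCES disease(disease_id),
    comment          TEXT,
    PRIMARY KEY (family_member_id, disease_id)
);
-- The overall comment lives on the applicant row; names are placeholders.
INSERT INTO applicant VALUES (1, 'Applicant A', 'Overall Condition Normal'),
                             (2, 'Applicant B', NULL);
INSERT INTO disease VALUES (1, 'Asthma'), (2, 'Cancer'), (3, 'Diabetes');
-- Each family member is stored once, however many conditions they have.
INSERT INTO family_member VALUES (1, 1, '1st Brother', 1, NULL),
                                 (2, 2, '2nd Sister', 1, NULL);
INSERT INTO family_member_disease VALUES
    (1, 1, 'Not serious'), (1, 2, 'Lung Cancer'),
    (2, 1, 'serious'),     (2, 3, 'serious');
""")

def share_of_applicants_with(disease_name):
    """Fraction of applicants reporting a family member with this disease."""
    (affected,) = hist.execute(
        """SELECT COUNT(DISTINCT fm.applicant_id)
           FROM family_member fm
           JOIN family_member_disease fmd ON fmd.family_member_id = fm.family_member_id
           JOIN disease d ON d.disease_id = fmd.disease_id
           WHERE d.name = ?""",
        (disease_name,),
    ).fetchone()
    (total,) = hist.execute("SELECT COUNT(*) FROM applicant").fetchone()
    return affected / total if total else 0.0

print(share_of_applicants_with("Cancer"))
```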

Database design - table design for modeling a hierarchy

I am designing a laboratory information system (LIS) and am confused on how to design the tables for the different laboratory tests. How should I deal with a table that has an attribute with multiple values and each of the multiple values of that attribute can also have multiple values as well?
Here's some of the data in my LIS design...
HEMATOLOGY <-------- Lab group
**************************************************************
CBC <-------- Sub group 1
    RBC <-------- Component
    WBC
    Hemoglobin
    Hematocrit
    MCV
    MCH
    MCHC
    Platelet count
Hemoglobin
Hematocrit
WBC differential
    Neutrophils
    Lymphocytes
    Monocytes
    Eosinophils
    Basophils
Platelet count
Reticulocyte count
ESR
Bleeding time
Clotting time
Pro-time
Peripheral smear
Malarial smear
ABO
RH typing
CLINICAL MICROSCOPY <-------- Lab Group
**************************************************************
Routine urinalysis <-------- Sub group 1
    Visual Examination <-------- Sub group 2
        Color <-------- Component
        Turbidity
        Specific Gravity
    Chemical Examination
        pH
        protein
        glucose
        ketones
        RBC
        Hbg
        bilirubin
        specific gravity
        nitrite for bacteria
        urobilinogen
        leukocyte esterase
    Microscopic Examination
        Red Blood Cells (RBCs)
        White Blood Cells (WBCs)
        Epithelial Cells
        Microorganisms (bacteria, trichomonads, yeast)
        Trichomonads
        Casts
        Crystals
Occult Blood
Pregnancy Test
...This hierarchy of data also gets repeated in other lab groupings in my design (e.g. Blood chemistry, Serology, etc)...
Another question is, how am I gonna deal with a component (for example, RBC) which can be a member of one or more lab groups?
I already implemented a solution to my problem by making separate tables: one for lab group, one for sub group 1, one for sub group 2 and one for component. I then created another table to consolidate all of them by placing a foreign key to each in it. The only trade-off is that some of the rows in this table may have null values. I'm not satisfied with my design, so I'm hoping someone could give me advice on how to make it right; any help would be greatly appreciated.
Here are a couple options:
If it is just the hierarchy above you are modeling, and there is no other data involved, then you can do it in two tables:
One problem with this is that you do not enforce that, for example, a sub_group must be a child of a lab_group, or that a component must be child of either a sub_group_1 or a sub_group_2, but you could enforce these requirements in your application tier instead.
The plus side of this approach is that the schema is nice and simple. Even if the entities have more data associated with them, it might still be worth modeling the hierarchy like this and have some separate tables for the entities themselves.
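Since the two-table layout is only described in words here, the following is a guess at what it could look like rather than the original schema: one table for every node of the hierarchy and one link table for the parent/child pairs. The names entity, entity_link and kind are assumptions, and Python's built-in sqlite3 module stands in for MySQL. The link table lets a component such as RBC hang under more than one parent, and a recursive query walks a whole lab group (MySQL 8.0 and later also support recursive common table expressions).

```python
import sqlite3

lis = sqlite3.connect(":memory:")
lis.executescript("""
CREATE TABLE entity (
    entity_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    kind      TEXT NOT NULL
              CHECK (kind IN ('lab_group', 'sub_group_1', 'sub_group_2', 'component'))
);
CREATE TABLE entity_link (
    parent_id INTEGER NOT NULL REFERENCES entity(entity_id),
    child_id  INTEGER NOT NULL REFERENCES entity(entity_id),
    PRIMARY KEY (parent_id, child_id)
);
-- A small slice of the hierarchy from the question.
INSERT INTO entity VALUES
    (1, 'Hematology', 'lab_group'),
    (2, 'CBC', 'sub_group_1'),
    (3, 'RBC', 'component'),
    (4, 'WBC', 'component'),
    (5, 'Clinical Microscopy', 'lab_group'),
    (6, 'Routine urinalysis', 'sub_group_1'),
    (7, 'Chemical Examination', 'sub_group_2');
-- RBC (3) is linked under both CBC (2) and Chemical Examination (7).
INSERT INTO entity_link VALUES (1, 2), (2, 3), (2, 4), (5, 6), (6, 7), (7, 3);
""")

def subtree(lab_group):
    """All entities below a lab group as (name, kind, depth) rows."""
    sql = """
    WITH RECURSIVE tree(entity_id, depth) AS (
        SELECT entity_id, 0 FROM entity WHERE name = ? AND kind = 'lab_group'
        UNION ALL
        SELECT l.child_id, t.depth + 1
        FROM entity_link l JOIN tree t ON l.parent_id = t.entity_id
    )
    SELECT e.name, e.kind, t.depth
    FROM tree t JOIN entity e ON e.entity_id = t.entity_id
    ORDER BY t.depth, e.name"""
    return lis.execute(sql, (lab_group,)).fetchall()

print(subtree("Clinical Microscopy"))
```

Note that nothing in these two tables stops a component from being linked directly under a lab group; as said above, that rule would have to live in the application tier.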
If you want to enforce the correct relationships at the data level, then you are going to have to split it out into separate tables. Maybe something like this:
This assumes that each sub_group_1 is only related to a single lab_group. If this is not the case then add a link table between lab_group and sub_group_1. Likewise for the sub_group_1 -> sub_group_2 relationship.
There is a single link table between component and sub_group_1 and sub_group_2. This allows a single component to be related to several sub_group_1 and sub_group_2 entities. The fact it is a single table means that a lot of the sub_group_1_id and sub_group_2_id records will be null (like you mentioned in your question). You could prevent the nulls by having two separate link tables:
sub_group_1_component with a foreign key to sub_group_1 and a foreign key to component
sub_group_2_component with a foreign key to sub_group_2 and a foreign key to component
The reason I didn't put this in the diagram is that for me, having to query two tables rather than one to get all the component -> sub_group relationships is too much of a pain. For the sake of a little denormalisation (allowing a few nulls) it is much easier to query a single table. If you find yourself allowing a lot of nulls (like a single link table for the relationships between all the entities here) then that is probably denormalising too much.
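A sketch of the single-link-table variant, again with Python's built-in sqlite3 module standing in for MySQL. The table names follow the ones used in this answer; the link table name component_link, the column details and the sample rows are assumptions. The CHECK constraint is an addition of mine, not part of the design described above: it makes sure each link row points at exactly one of the two sub group levels, so the nulls stay under control.

```python
import sqlite3

lab = sqlite3.connect(":memory:")
lab.executescript("""
CREATE TABLE lab_group (lab_group_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE sub_group_1 (
    sub_group_1_id INTEGER PRIMARY KEY,
    lab_group_id   INTEGER NOT NULL REFERENCES lab_group(lab_group_id),
    name           TEXT NOT NULL
);
CREATE TABLE sub_group_2 (
    sub_group_2_id INTEGER PRIMARY KEY,
    sub_group_1_id INTEGER NOT NULL REFERENCES sub_group_1(sub_group_1_id),
    name           TEXT NOT NULL
);
CREATE TABLE component (component_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE component_link (
    component_id   INTEGER NOT NULL REFERENCES component(component_id),
    sub_group_1_id INTEGER REFERENCES sub_group_1(sub_group_1_id),
    sub_group_2_id INTEGER REFERENCES sub_group_2(sub_group_2_id),
    -- exactly one of the two parents must be set
    CHECK ((sub_group_1_id IS NULL) <> (sub_group_2_id IS NULL))
);
INSERT INTO lab_group   VALUES (1, 'Hematology'), (2, 'Clinical Microscopy');
INSERT INTO sub_group_1 VALUES (1, 1, 'CBC'), (2, 2, 'Routine urinalysis');
INSERT INTO sub_group_2 VALUES (1, 2, 'Chemical Examination');
INSERT INTO component   VALUES (1, 'RBC');
-- RBC belongs to CBC (a sub_group_1) and to Chemical Examination (a sub_group_2).
INSERT INTO component_link VALUES (1, 1, NULL), (1, NULL, 1);
""")

def parents_of(component_name):
    """Names of every sub group a component is linked to, from one table."""
    rows = lab.execute(
        """SELECT COALESCE(s1.name, s2.name) AS parent
           FROM component_link l
           JOIN component c ON c.component_id = l.component_id
           LEFT JOIN sub_group_1 s1 ON s1.sub_group_1_id = l.sub_group_1_id
           LEFT JOIN sub_group_2 s2 ON s2.sub_group_2_id = l.sub_group_2_id
           WHERE c.name = ?
           ORDER BY parent""",
        (component_name,),
    )
    return [parent for (parent,) in rows]

print(parents_of("RBC"))
```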
Personally, I would create 3 tables using relationships for the values. It gives you the ability to create limitless arrays of values. Just try to make sure you give great column names, or your head will spin for days. :)
Also, null values aren't a problem. Look into all the different types of joins.

Method To Create Database for Tv Shows

This is my first question on Stack Overflow, so if I do something wrong please let me know and I will fix it as soon as possible.
I am trying to make a database for TV shows, and I would like to know the best way to do it and how to make my current database simpler (normalization).
I would like to be able to have the following structure or similar.
Fringe
    Season 1
        Episodes 1 - 10 (whatever there are)
    Season 2
        Episodes 1 - 10 (whatever there are)
    ... (so on)
Burn Notice
    Season 1
        Episodes 1 - 10 (whatever there are)
    Season 2
        Episodes 1 - 10 (whatever there are)
    ... (so on)
... (More Tv Shows)
Sorry if this seems unclear. (Please ask for clarification)
But the structure I have right now is 3 tables (tvshow_list, tvshow_episodes, tvshow_link)
//tvshow_list//
TvShow Name | Director | Company_Created | Language | TVDescription | tv_ID
//tvshow_episodes//
tv_ID | EpisodeNum | SeasonNum | EpTitle | EpDescription | Showdate | epid
//tvshow_link//
epid | ep_link
The Director and the company are linked by an id to another table with a list of companies and directors.
I am pretty sure that there is an more simplified way of doing this.
Thanks for the help in advance,
Krishanthan Lingeswaran
The basic concept of Normalization is the idea that you should only store one copy of any item of data that you have. It looks like you've got a good start already.
There are two basic ways to model what you're trying to do here, with episodes and shows. In the database world, you might have heard the terms "one to many" and "many to many". Both are useful; it just depends on your specific situation which one is correct. In your case, the big question to ask yourself is whether a single episode can belong to only one show, or whether an episode can belong to multiple shows at once. I'll explain the two forms, and why you need to know the answer to that question.
The first form is simply a foreign key relationship. If you have two tables, 'episodes' and 'shows', in the episodes table, you would have a column named 'show_id' that contains the ID of one (and only one!) show. Can you see how you could never have an episode belong to more than one show this way? This is called a "one to many" relationship, i.e. a show can have many episodes.
The second form is to use an association table, and this is the form you used in your example. This form would allow you to associate an episode with multiple shows and is therefore called a "many to many" relationship.
There is some benefit to using the first form, but it's not really that big of a deal in most cases. Your queries will be a little bit shorter because you only have to join 2 tables to get episodes->shows but the other table is just one more join. It really comes down to figuring out if you need a "one to many" or "many to many" type relationship.
An example of a situation where you would need a many-to-many relationship would be if you were modeling a library and had to keep track of who checked out which book. You'd have a table of books, a table of users, and then a table of "books to users" that would have an id, a book_id, and a user_id and would be a many-to-many relationship.
Hope that helps!
I am pretty sure that there is an more simplified way of doing this.
Not as far as I know. Your schema is close to the simplest you can make for what I presume is the functionality you're asking for. "Improvements" on it really only make it more complicated, and should be added as you judge the need emerges on your side. The following examples come to mind (none of which really simplify your schema).
I would standardize your foreign key and primary key names. An example would be to have the columns shows.id, episodes.id, episodes.show_id, link.id, link.episode_id.
Putting SeasonNum as what I presume will be an int in the Episodes table, in my opinion, violates the normalization constraint. This is not a major violation, but if you really want to stick to it, I would create a separate Seasons table and associate it many-to-one to the Shows table, and then have the Episodes associate only with the Seasons. This gives you the opportunity to, for instance, attach information to each season. Also, it prevents repetition of information (while the type of the season ID foreign key column in the Episodes table would ostensibly still be an INT, a foreign key philosophically stores an association, what you want, versus dumb data, what you have).
You may consider putting language, director, and company in their own tables rather than your TV show list. This is the same concern as above and in your case a minor violation of normalization.
Language, director, and company all have interesting issues attached to them regarding the level of the association. Most TV shows have different directors for different episodes. Many are produced in multiple languages and by several different companies and sometimes networks. So at what level do you plan on storing this information? I'm not a software architect, so someone else can better answer this question than me, but I'd set up a polymorphic many-to-many association for languages, directors, and companies and an inheritance model that allows for these values to be specified on an episode-by-episode, season-by-season, or show-by-show basis, inheriting the value from its parent if none are provided.
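For what it's worth, here is roughly what the standardized key names and the separate Seasons table could look like. This is only a sketch under assumptions: the column names beyond shows.id, episodes.id, link.id and link.episode_id are mine, the episode titles are placeholders, the link table is called links, and Python's built-in sqlite3 module is used instead of MySQL so it runs on its own.

```python
import sqlite3

tv = sqlite3.connect(":memory:")
tv.executescript("""
CREATE TABLE shows (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE seasons (
    id         INTEGER PRIMARY KEY,
    show_id    INTEGER NOT NULL REFERENCES shows(id),
    season_num INTEGER NOT NULL,
    UNIQUE (show_id, season_num)          -- a show has each season only once
);
CREATE TABLE episodes (
    id          INTEGER PRIMARY KEY,
    season_id   INTEGER NOT NULL REFERENCES seasons(id),
    episode_num INTEGER NOT NULL,
    title       TEXT,
    UNIQUE (season_id, episode_num)
);
CREATE TABLE links (
    id         INTEGER PRIMARY KEY,
    episode_id INTEGER NOT NULL REFERENCES episodes(id),
    url        TEXT NOT NULL
);
-- Placeholder rows; the titles are not real episode names.
INSERT INTO shows VALUES (1, 'Fringe'), (2, 'Burn Notice');
INSERT INTO seasons VALUES (1, 1, 1), (2, 1, 2), (3, 2, 1);
INSERT INTO episodes VALUES
    (1, 1, 1, 'Fringe S1E1'), (2, 1, 2, 'Fringe S1E2'),
    (3, 2, 1, 'Fringe S2E1'), (4, 3, 1, 'Burn Notice S1E1');
""")

def episode_listing(show_name):
    """(season, episode, title) rows for one show, in viewing order."""
    return tv.execute(
        """SELECT s.season_num, e.episode_num, e.title
           FROM shows sh
           JOIN seasons  s ON s.show_id   = sh.id
           JOIN episodes e ON e.season_id = s.id
           WHERE sh.name = ?
           ORDER BY s.season_num, e.episode_num""",
        (show_name,),
    ).fetchall()

print(episode_listing("Fringe"))
```

The cost is one more join than keeping SeasonNum on the episode row, which is the trade-off to weigh for a small project.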
Bottom line concerning all these suggestions: Pick what's appropriate for your project. If you don't need the functionality afforded by this level of associations, and you don't mind manually entering in repetitive data (you might end up implementing an auto-complete system to help you), you can gloss over some of the normalization constraints.
Normalization is merely a suggestion. Pick what's right for you and learn from your mistakes.