Designing a correct table structure - mysql

We are trying to save the Family History of a particular foreign job applicant. Below are the details we have to save.
Familiy Member: Father|Mother|1st Brother| 2nd Brother| 1st Sister| etc etc
Health Status: Alive|Deceased
Health Condition (Negative/Positive): Arthritis |Asthma |COPD |Diabetes etc etc
Health Condition (Comment): Arthritis |Asthma |COPD |Diabetes etc etc
Overall Comment
Below is its UI, so you can understand it better.
Now our problem creating a database table for storing these information. Below are the things to be considered.
If the job applicant like, he can provide the data of any number of family members.
There are hundreds of items to come under "General Data". So we can't create columns in table for every single items in that.
"Overall History Comment" is a comment about the entire family history, not related to a particular member.
The table design we made is below
Here are some sample input to the table.
FamilyHistory
a) 1,1,1st Brother,Alive, Asthma, Not serious
b) 2,1,1st Brother,Alive, Cancer, Lung Cancer
c) 3,2,2nd Sister,Alive, Asthma,serious
d) 4,2,2nd Sister,Alive, Diabetes,serious
OverallComment
a) 1,1,1,Overall Condition Normal
b) 2,3,2,NULL
However we feel this design is bad due to the below points.
Have a look at the a) and b) of Family History input. The 1st brother of the job applicant have 2 health conditions. To enter this, 2 rows are inserted and all the details about him are repeated except the different health conditions.
Can you please let me know how to make this design better?

My first thought is: Why do you record this data? What is it good for? I cannot imagine its use. However, the answer will help design.
Is it important whether the second sister or the father has arthritis? If not, then why distinguish the two? You could go without Family member types. (If you want so use a text field where you type in '2nd sis', 'mom', 'father', whatever.)
Are you going to have reports on that (e.g. 20% of our applicants told us they have family members with cancer)? Or will you always simply look up one applicant and see their family entries? If the latter, you could make this one text column where you simply type in all members and their health (or have your program write this).
Another point: Why is OverallComment a separate table? Do you need this for internationalization? Or for database-wide text search? If not, make this a column in the related table instead.
applicant ( applicant_id , name , comment )
If you need the relational model for queries and reports, then have one table for the family member:
family_member ( family_member_id , applicant_id , family_member_type, alive, comment )
And another table for the several desease entries per member:
family_member_desease ( family_member_id , desease_id , comment )
Maybe you should add dates. E.g.: When was the father reported to be alive?

Related

Database ER Model weekday availability

I've got a annoying design issue when designing a database and it's models. Essentially, the database got clients and customers which should be able to make appointments with eachother. The clients should have their availability (on a general week basis) stored in the database, and this needs to be added to the appointment model. The solution does not require or want precise hours for the availability, just one value for each day - ranging from "not available", to "maybe available " to "available". The only solution i've come up with so far includes having all 7 days stored in a row for each client, but it looks nasty.
So here's some of what I got so far:
Client model:
ClientId
Service,
Fee
Customer-that-uses-Client model:
CustomerId
ServiceNeed
Availability-model:
ClientID (FK/PK)
Monday, (int)
...
...
Sunday (int)
And finally, appointment model:
AppointmentId
ClientID
CustomerID
StartDate
Hourse
Problem: is there any way i can redesign the avilability model to ... well, need less fields and still get each day stored with a (1-3) value depending on the clients availability ? Would also be really good if the appointment model wouldnt need to reference all that data from the availability model...
Problem
Answering the narrow question is easy. However, noting the Relational Database tag, there are a few problems in your model, that render it somewhat less than Relational.
Eg. the data content in each logical row needs to be unique. (Uniqueness on the Record id, which is physical, system-generated, and not from the data, cannot provide row uniqueness.) The Primary Key must be "made up from the data", which is of course the only way to make the data row unique.
Eg. values such as Day of availability and AvailabilityType are not constrained, and they need to be.
Relational Data Model
With the issues fixed, the answer looks like this:
Notation
All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993.
My IDEF1X Introduction is essential reading for those who are new to the Relational Model or data modelling.
Content
In the Relational Model, there is a large emphasis on constraining the data, such that the database as a whole contains only valid data.
The only solution i've come up with so far includes having all 7 days stored in a row for each client, but it looks nasty.
Yes. What you had was a repeating attribute (they are named Monday..Sunday, which may not look like a repeating attribute, but it is one, no less than a CSV list). That breaks Codd's Second Normal Form.
The solution is to place the single element in a subordinate table ProviderAvailable.
Day of availability and AvailabilityType are now constrained to a set of values.
The rows in Provider (sorry, the use of "Client" in this context grates on me) and Customer are now unique, due to addition of a Name. The users will not use an internal number to identify such entities, they will use a name, usually a ShortName.
Once the model is tightened up, and all the columns are defined, if Name (not a combination of LastName, FirstName, Initial) is unique, you can eliminate the RecordId, and elevate the Name AK to the PK.
Not Modelled
You have not asked, and I have not modelled these items, but I suspect they will come up as you progress in the development.
A Provider (Client) provides 1 Service. There may be more than 1 in future.
A Customer, seeking 1 Service, can make an Appointment with any Provider (who may or may not provide that Service). You may want to constrain each Appointment to a Provider who provides the sought Service.
As per my comment. It depends on how tight you want this Availability/Reservation system to be. Right now, there is nothing to prevent more than one Customer reserving one Provider on a particular Day, ie. a double-booking.
Normalize that availability table: instead of
ClientID (FK/PK)
Monday, (int)
...
...
Sunday (int)
go with
ClientID (PK/FK)
weekday integer value (0-6 or maybe 1-7) (PK)
availability integer value 1-3
This table has a compound primary key, made of (ClientID, weekday) because each client may have either zero or one entry for each of the seven weekdays.
In this table, you might have these rows:
43 2 3 (on Tuesdays = 2, client 43 is Available =3)
43 3 2 (on Wednesdays = 3, client 43 is MaybeAvailable =2)
If the row is missing, it means the client is unavailable. an availability value of 1 also means that.

MySql table with potentially *very* many columns

A friend who is a recruiter for software engineers wants me to create an app for him.
He wants to be able to search candidates' CVs based on skills.
As you can imagine, there are potentially hundreds, possibly thousands of skills.
What's the best way to represent the candidate in a table? I am thinking skill_1, skill_2, skill_n, etc, but somewhere out there there is a candidate with more than n skills.
Also, it is possible that more skills will be added to the database in future.
So, what's the best way to represent a candidate's skills?
[Update] for #zohar, here's a rough first pass at teh schema. Any comments?
You need three tables (at least):
One table for candidates, that will contain all the details such as name, contact information, the cv (or a link to it) and all other relevant details.
One table for skills - that will contain the skill name, and perhaps a short description (if that's relevant)
and one table to connect candidates to skills - candidatesToSkills - that will have a 1 to many relationship with both tables - and a primary key that is the combination of the candidate id and the skill id.
This is the relational way of creating a many to many relationship.
As a bonus, you can also add a column for skill level - beginner, intermediate, skilled, expert etc'.
You might also want to add a table for job openings and another table to connect that to the skills table, so that you can easily find the most suitable candidate for the job based on the required skills. (but please note that skills is not the only match needed - other points to match are geographic location, salary expectations, etc'.)

Database structure for simple waiting times project with CSV data and MySql

Suppose I have some sample data like that shown below (with a lot more entries), and my main use case is to look up a specific aliment and provide a list of waiting times for different hospitals which offer that treatment.
Not being very experienced at all with DB design, I don't know whether in this example there is an advantage to using separate tables with links between then or if a simple import of the CSV to a single table will suffice.
If I used separate tables, I'm guessing they would be for hospital and ailment perhaps?
I would be very grateful if someone tell me the best approach for this.
ID,Main Department,Specific Complaint,Hospital ,Waiting time
1,Cardiology,general,Hospital 1,7
2,Cardiology,general,Hospital 2,7
3,Cardiology,general,Hospital 3,7
4,Cardiology,general,Hospital 4,21
5,Cardiology,traumatology,Hospital 1,8
6,Cardiology,traumatology,Hospital 2,7
7,Dermatology,general,Hospital 1,21
8,Dermatology,general,Hospital 2,14
9,Dermatology,general,Hospital 3,21
10,Dermatology,erysipelas,Hospital 1,7
11,Dermatology,erysipelas,Hospital 3,7
...
One detail you must understand, SO is not a teaching site, tutorials abound for that. It is more to address specific problems that arise when developing solutions. That being said, I like this type of question, so here goes.
The type of solution to implement (simple CSV or complete database) depends on the volume of data, and type type of reports you require.
CSV is quick to implement.
Database takes more time, but will allow you to produce more complex reports than CSV, through the use of queries.
CSV is often used as a medium to load or extract data, but as for queries it is not as powerful.
A database can be expanded. Ex. today you only consider the name of the hospital. You could expand your table to include the address, phone number, ... You could also expand your model to add insurance company links, doctors, ...
Basic modeling:
Identify your objects. Ex. here I would consider ailment, hospital, complaint.
Identify relations between objects, and their type. Ex. ailment and hospital are linked, the that link is n-n. Meaning 1 ailment can be treated in many hospitals, and 1 hospital can treat many ailments.
I am not certain what to do with complaint. In your question you do not specify if all hospitals treat all (ailment - complaint) duos or not. More on that later.
As you define your structure, make sure you apply the normal forms. In most cases, forms 1-3 are enough.
1NF: atomic values and no repeating groups. Ex. you would create table with columns hospital and ailments separated by commas. 1 line == 1 hospital <-> 1 ailment.
2NF: 1NF is achieved and all the non-key attributes are dependent on the primary key. Ex. you should not create a table linking ailment and wait time. The wait time is not dependent on the ailment, it is dependent on the combination of ailment and hospital.
3NF: 2NF is achieved and there are no transitive functional dependencies. So A is dependent on B, B is dependant on C, so A is transitively dependent on C.
Some critical questions must be answered before you can model your data:
A hospital can treat a certain ailment. In all cases?
Can you have: hospital 1 can tread ailment 1 when the complaint is A and B, but not C?
Ex. all hospitals can provide primary care for cardiac patients, but cardiac surgery can only be performed as some hospitals.
In that case, you cannot link ailment and hospital together directly. A combination of (ailment,complaint) can. And this will impact wait time.
Based on reality, I will link (ailment and complaint) and link this duo to hospital.
Here is my first model, "for fun", which might need to be modified for your needs:
Wait time is in table Hospital_Treads_Ailment_has_Complaint. In my model, an hospital can only estimate the wait time once they know which ailment and which complaint the patient has.
A final exercise I do to test my model is try the main queries I need. If one query cannot be done with the model, it needs to be changed.
Which hospital treats cardiac problems? Ok, select hospital where ailment == cardiology, complaint == *.
Which hospital can accept patients who have trauma. Ok, select hospital where ailment == *, complaint == trauma.
and so on...

SQL in PHPMYADMIN

I help with an SQL (using phpmyadmin) to join these tables and create a CLUB MEMBERSHIP list for a particular club, however I need to indicate whether the member is a club president, vice president,etc or just an ordinary member:
CLUBS: CLUBID,PRESIDENTID(memberID),VICEPRESIDENTID(memberID),TREASURER(memberID),
SECRETARY(MemberID)
MEMBERS_CLUB:
MEMBERID,CLUBID
MEMBERS:
MEMBERID, NAME,ADDRESS
There are probably half a dozen ways to solve this, and you certainly could come up with a way to make this table structure work for you, but it's probably not going to be pretty. Part of making this work well is determining what information you need to get out as well as store in the database. In this structure, it's very hard to get the member's officer status from their name, so we can improve that by changing your structure. On the other hand, if all you ever needed was a list of officers for each club, your current structure would be okay.
You could add a "member status" field to MEMBERS_CLUB (and remove the four corresponding columns from CLUBS). Each member gets a row for each club or position they hold.
SELECT `MEMBERS`.`NAME`, `MEMBERS_CLUB`.`STATUS`, `CLUBS`.`CLUBID`
FROM `MEMBERS`, `MEMBERS_CLUB`, `CLUBS`
WHERE `MEMBERS`.`MEMBERID` = `MEMBERS_CLUB`.`MEMBERID`
which is close but gives us duplicates if there are duplicates in the table, for instance if you have two entries for Bob, one listing him as president and one listing him as secretary. By using GROUP_CONCAT() we can accomplish exactly what you are looking for while dealing properly with duplicated names:
SELECT `MEMBERS`.`NAME`, GROUP_CONCAT(`MEMBERS_CLUB`.`STATUS`), `CLUBS`.`CLUBID`
FROM `MEMBERS`, `MEMBERS_CLUB`, `CLUBS`
WHERE `MEMBERS`.`MEMBERID` = `MEMBERS_CLUB`.`MEMBERID`
GROUP BY `NAME`

How to display item as 'in transit' instead of to specific location id (foreign key)?

I have following requirements for item management.
Item can be moved from location 'A' to 'B'. And later on it can also be moved from 'B' to 'C' location.
History should be maintained for each item to display it location wise items for specific period, can be display item wise history.
Also I need to display items 'in transit' on particular date.
Given below is the database design:
item_master
-----------
- ItemId
- Item name
- etc...
item_location_history
------------------
- ItemId
- LocationId (foreign key of location_master)
- Date
While item is being transported I want to insert data in following way:
1. At the time of transport I want to enter item to be moved from location 'A' to 'In Transit' on particular date. As there is possibilities that item remains in 'in transit' state for several days.
2. At the time of receive at location 'B' I want to insert item to be moved from 'In Transit' to location 'B' on particular date and so on.
This way I will have track of both 'In Transit' state and item location.
What is the best way to achieve this? What changes I need to apply to the above schema? Thanks.
Initial Response
What is the best way to achieve this?
This is a simple and common Data Modelling Problem, and the answer (at least in the Relational Database context) is simple. I would say, every database has at least a few of these. Unfortunately, because the authors who write books about the Relational Model, are in fact completely ignorant of it, they do not write about this sort of simple straight-forward issue, or the simple solution.
What you are looking for is an OR gate. In this instance, because the Item is in a Location XOR it is InTransit, you need an XOR gate.
In Relational terms, this is a Basetype::Subtype structure. If it is implemented properly, it provides full integrity, and eliminates Nulls.
As far as I know, it is the only Relational method. Beware, the methods provided by famous writers are non-relational, monstrous, massively inefficient, and they don't work.
###Record ID
But first ... I would not be serving you if I didn't mention that your files have no integrity right now, you have a Record Filing System. This is probably not your fault, in that the famous writers know only pre-1970's Record Filing Systems, so that is all that they can teach, but the problem is, they badge it "relational", and that is untrue. They also have various myths about the RM, such as it doesn't support hierarchies, etc.
By starting with an ID stamped on every table, the data modelling process is crippled
You have no Row Uniqueness, as is required for RDBS.
an ID is not a Key.
If you do not understand that, please read this answer.
I have partially corrected those errors:
In Item, I have given a more useful PK. I have never heard any user discuss an Item RecordId, they always uses Codes.
Often those codes are made up of components, if so, you need to record those components in separate columns (otherwise you break 1NF).
Item needs an Alternate Key on Name, otherwise you will allow duplicate Names.
In Location, I have proposed a Key, which identifies an unique physical location. Please modify to suit.
If Location has a Name, that needs to be an AK.
I have not given you the Predicates. These are very important, for many reasons. The main reason here, is that it will prove the insanity of Record IDs. If you want them, please ask.
If you would like more information on Predicates, visit this Answer, scroll down (way down!) to Predicate, and read that section. Also check the ERD for them.
###Solution
What changes [do] I need to apply to the above schema?
Try this:
Item History Data Model
(Obsolete, refer below for the updated mode, in the context of the progression)
If you are not used to the Notation, please be advised that every little tick, notch, and mark, the solid vs dashed lines, the square vs round corners, means something very specific. Refer to the IDEF1X Notation for a full explanation, or Model Anatomy.
If you have not encountered Subtypes implemented properly before, please read this Subtype Overview
That is a self-contained document, with links to code examples
There is also an SO discussion re How to implement referential integrity in subtypes.
When contemplating a Subtype cluster, consider each Basetype::Subtype pair as a single unit, do not perceive them as two fragments, or two halves. Each pair in one fact.
ItemHistory is an event (a fact) in the history of an Item.
Each ItemHistory fact is either a Location fact XOR an InTransit fact.
Each of those facts has different attributes.
Notice that the model represents the simple, honest, truth about the real world that you are engaging. In addition to the integrity, etc, as discussed above, the result is simple straight-forward code: every other "solution" makes the code complex, in order to handle exception cases. And some "solutions" are more horrendous than others.
Dr E F Codd gave this to us in 1970. It was implemented it as a modelling method in 1984, named IDEF1X. That became the standard for Relational Databases in 1993. I have used it exclusively since 1987.
But the authors who write books, allegedly on the Relational Model, have no knowledge whatsoever, about any of these items. They know only pre-1970's ISAM Record Filing Systems. They do not even know that they do not have the Integrity, Power, or Speed of Relational Databases, let alone why they don't have it.
Date, Darwen, Fagin, Zaniolo, Ambler, Fowler, Kimball, are all promoting an incorrect view of the RM.
Response to Comments
1) ItemHistory, contains Discriminator column 'InTransit'.
Correct. And all the connotations that got with that: it is a control element; its values better be constrained; etc.
Shall it be enum with the value Y / N?
First, understand that the value-stored has meaning. That meaning can be expressed any way you like. In English it means {Location|InTransit}.
For the storage, I know it is the values for the proposition InTransit are {True|False}, ...
In SQL (if you want the real article, which is portable), I intended it as a BIT or BOOLEAN. Think about what you want to show up in the reports. In this case it is a control element, so it won't be present in the user reports. There I would stick to InTransit={0|1}.
But if you prefer {Y|N}, that is fine. Just keep that consistent across the database (do not use {0|1} in one place and {Y|N} in another).
For values that do show up in reports, or columns such as EventType, I would use {InTransit|Location}.
In SQL, for implementation, if it BOOLEAN, the domain (range-of-values) is already constrained. nothing further is required.
If the column were other BOOLEAN,` you have two choices:
CHECKConstraint
CHECK #InTransit IN ( "Y", "N" )
Reference or Lookup Table
Implement a table that contains only the valid domain. The requirement is a single column, the Code itself. And you can add a column for short Descriptor that shows up in reports. CHAR(12)works nicely for me.
ENUM
There is no ENUM in SQL. Some of the non-SQL databases have it. Basically it implements option [2] above, with a Lookup table, under the covers. It doesn't realise that the rows are unique, and so it Enumerates the rows, hence the name, but it adds a column for the number, which is of course an ID replete with AUTOINCREMENT, so MySQL falls into the category of Stupid Thing to Do as described in this answer (scroll down to the Lookup Table section).
So no, do not use ENUM unless you wish to be glued at the hip to a home-grown, stupid, non-SQL platform, and suffer a rewrite when the database is ported to a real SQL platform. The platform might be stupid, but that is not a good reason to go down the same path. Even if MySQL is all you have, use one of the two SQL facilities given above, do not use ENUM.
2) Why is'ItemHistoryTransit' needed as 'Date' column
(DATETIME,not DATE, but I don't think that matters.)
[It] is there in ItemHistory?
The standard method of constraining (everything in the database is constrained) the nature of teh Basetype::Subtype relationship is, to implement the exact same PK of the Basetype in the Subtype. The Basetype PK is(ItemCode, DateTime).
[Why] will only Discriminator not work?
It is wrong, because it doesn't follow the standard requirement, and thus allows weird and wonderful values. I can't think of an instance where that could be justified, even if a replacement constraint was provided.
Second, there can well be more than two occs of ItemEventsthat are InTransitper ItemCode,`which that does not allow.
Third, it does not match the Basetype PK value.
Solution
Actually, a better name for the table would be ItemEvent. Labels are keys to understanding.
I have given the Predicates, please review carefully.
Data model updated.
Item Event Data Model
You could add a boolean field for in_transit to item_location_history so when it is getting moved from Location A to Location B, you set the LocationId to Location B (so you know where it is going) but then when it actually arrives you log another row with LocationId as LocationB but with in_transit as false. That way you know when it arrived also.
If you don't need to know where it is headed when it is "in transit" then you could just add "In Transit" as a location and keep your schema the same. In the past with an inventory applicaiton, I went as far as making each truck a location so that we knew what specific truck the item was in.
One of the techniques I've adopted over the years is to normalize transitional attributes (qty, status, location, etc.) from the entity table. If you also want to track the history, just version (versionize?) the subsequent status table.
create table ItemLocation(
ItemID int,
Effective date,
LocationID int,
Remarks varchar( 256 ),
constraint PK_ItemLocation primary key( ItemID, Effective ),
constraint FK_ItemLocation_Item foreign key( ItemID )
references Items( ID ),
constraint FK_ItemLocation_Location foreign key( LocationID )
references Locations( ID )
);
There are several good design options, I've shown the simplest, where "In transit" is implied. Consider the following data:
ItemID Effective LocationID Remarks
====== ========= ========== ===============================
1001 2015-04-01 15 In location 15
1001 2015-04-02 NULL In Transit [to location xx]
1001 2015-04-05 17 In location 17
Item 1001 appears in the database when it arrives at location 15, where it spends one whole day. The next day it is removed and shipped. Three days later it arrives at location 17 where it is remains to this day.
Implied meanings are generally frowned upon and are indeed easy to overdo. If desired, you can add an actual status field to contain "In location" and "In Transit" values. You may well consider such a course if you think there could be other status values added later (QA Testing, Receiving, On Hold, etc.). But for just two possible values, In Location or In Transit, implied meaning works.
At any rate, you know the current whereabouts of any item by fetching the LocationID with the latest Effective date. You also have a history of where the item is at any date -- and both can be had with the same query.
declare AsOf date = sysdate;
select i.*, il.Effective, IfNull( l.LocationName, 'In Transit' ) as Location
from Items i
join ItemLocation il
on il.ItemID = i.ID
and il.Effective =(
select Max( Effective )
from ItemLocation
where ItemID = il.ItemID
and Effective <= AsOf )
left join Locations l
on l.ID = il.LocationID;
Set the AsOf value to "today" to get the most recent location or set it to any date to see the location as of that date. Since the current location will be far and away the most common query, define a view that generates just the current location and use that in the join.
join CurrentItemLocation cil
on cil.ItemID = i.ID
left join Locations l
on l.ID = cil.LocationID;