Converting a medical datasheet into a MySQL database design

I am trying to develop a medical symptom checker app, so I need to convert my Excel datasheet, which contains over 190k records, into a MySQL database. I have asked and read multiple related questions, but I still find it difficult to create an efficient, proper database design.
Please take a look at the design of the app (1st image) to get an idea how the app works.
Steps the user follows to check symptom(s):
User chooses gender, age and bodypart
App shows all (common/less common) symptoms of chosen bodypart
User chooses symptom
App asks whether more symptoms apply (only if the symptom has additional symptoms in the database). The user can tick up to 2 additional symptoms.
App shows all (common/less common) diseases of chosen symptom and additional symptoms. The order (weight) of the diseases is dependent on the selected age, gender, bodypart, main symptom and selected additional symptoms.
User chooses disease
App shows disease information
Attributes:
age, gender, bodypart, symptom, disease
age: the app queries the database using IDs: 0-5 is 1, 6-17 is 2, 18-59 is 3, 60+ is 4
gender: the app queries the database using IDs: male is 1, female is 0
bodypart: the app queries the database using IDs: 'Head front' is 1, 'Neck front' is 2, etc.
symptom: name, critical. The critical flag tells the user that he/she needs to contact their doctor immediately.
disease: name, critical, description, tests and treatment. The critical flag tells the user that he/she needs to contact their doctor immediately.
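To make the ID encoding concrete, here is a minimal Python sketch of the age and gender mapping described above. The names `age_range_id` and `GENDER_ID` are hypothetical and not part of the original app.

```python
# Hypothetical helper: map an age in whole years to the age-range ID the
# app sends to the database (0-5 -> 1, 6-17 -> 2, 18-59 -> 3, 60+ -> 4).
GENDER_ID = {"male": 1, "female": 0}

def age_range_id(age: int) -> int:
    """Return the age-range ID for an age in whole years."""
    if age < 0:
        raise ValueError("age must be non-negative")
    if age <= 5:
        return 1
    if age <= 17:
        return 2
    if age <= 59:
        return 3
    return 4

print(age_range_id(4), age_range_id(30), age_range_id(60))  # 1 3 4
```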
I already have a database which contains all data and possible combinations of input/output. Unfortunately it is not designed to be used in an app (2nd image).
As you can see in the 2nd image, the order of the diseases (disease weight) is dependent on the selected age, gender, bodypart, symptom and selected additional symptoms (additional symptoms that apply). Each symptom may have up to 2 additional symptoms. The user can check either 0, 1 or 2 of the additional symptoms and the order of the disease will be different for each of these options.
Each symptom is either common (1) or less common (0). It depends on the user input (age, gender, bodypart).
A disease with weight <= 5 is considered a common disease, and diseases with weight > 5 are considered less common. Of course, this also depends on the user input (age, gender, bodypart, symptom, additional symptoms). I have tried many things, but I still don't know how to design this feature properly.
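Once the weights for a given input combination have been fetched, the common/less common split is simple. The following is a minimal sketch. It assumes that a lower weight should be listed first, which matches common diseases (weight <= 5) appearing above less common ones. The function name and sample diseases are illustrative only.

```python
from typing import List, Tuple

# Rule from the question: weight <= 5 is "common", weight > 5 is "less common".
COMMON_MAX_WEIGHT = 5

def split_by_commonness(diseases: List[Tuple[str, int]]):
    """Split (name, weight) pairs into (common, less_common), each sorted by
    ascending weight, so common diseases can be listed first."""
    ordered = sorted(diseases, key=lambda d: d[1])
    common = [d for d in ordered if d[1] <= COMMON_MAX_WEIGHT]
    less_common = [d for d in ordered if d[1] > COMMON_MAX_WEIGHT]
    return common, less_common

common, rare = split_by_commonness([("Flu", 7), ("Cold", 2), ("Migraine", 5)])
print(common)  # [('Cold', 2), ('Migraine', 5)]
print(rare)    # [('Flu', 7)]
```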
Could anyone help me designing a suitable database?
UPDATE 1
Basically, we need to keep 3 queries in mind when designing the database:
Get all symptoms (symptom.id, symptom.name, symptom.critical, symptom commonness (common/less common)) that belong to the combination of selected age = $age, gender = $gender AND bodypart = $bp
Get all additional symptoms (symptom.id, symptom.name) of the selected symptom
Get all diseases (disease.id, disease.name, disease.critical, disease weight) that belong to the combination of selected age, gender, bodypart, main symptom and additional symptoms.
[Image 1: App design]
[Image 2: Excel datasheet]

A Disease table is fairly simple; it contains columns H..O, with duplicates removed. Plus there is a unique ID for each row. (See AUTO_INCREMENT) I'm unclear on whether disease_weight belongs in the Disease table or somewhere else.
Symptoms might be best implemented as a SET datatype.
Another table contains columns of gender, age_range, body_part, symptoms, and disease_id (and maybe disease_weight).
The main SELECTs that I see are
SELECT symptoms FROM table2
WHERE age_range = $ar
AND gender = $gender
AND body_part = $bp
AND FIND_IN_SET($symptom1, symptoms);
To get the possible secondary symptoms.
(You have not explained how a user will enter an age_range; I assume it will end up in your CGI language as $ar, etc.)
SELECT d.name, ...
FROM table2 t
JOIN Diseases d ON d.disease_id = t.disease_id
WHERE age_range ...
AND symptoms & $symptoms;
(I may have a syntax error on the SET operators.)
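Below is a runnable sketch of the two-table idea. It assumes SQLite in place of MySQL. SQLite has no SET type, so the symptoms column is emulated as an integer bitmask. This mirrors how MySQL stores SET values internally, and it is what the `symptoms & $symptoms` test relies on. Table and column names follow the answer; the `SYMPTOM_BITS` mapping and the sample rows are made up.

```python
import sqlite3

# Assumption: emulate a MySQL SET column with an integer bitmask,
# one bit per symptom (MySQL numbers SET members the same way).
SYMPTOM_BITS = {"fever": 1, "cough": 2, "headache": 4, "nausea": 8}

def mask(names):
    """Combine symptom names into a single bitmask."""
    m = 0
    for n in names:
        m |= SYMPTOM_BITS[n]
    return m

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE diseases (disease_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE table2 (
    gender INTEGER NOT NULL, age_range INTEGER NOT NULL, body_part INTEGER NOT NULL,
    symptoms INTEGER NOT NULL,  -- bitmask standing in for a SET column
    disease_id INTEGER NOT NULL REFERENCES diseases(disease_id),
    disease_weight INTEGER NOT NULL
);
""")
conn.executemany("INSERT INTO diseases (name) VALUES (?)", [("Flu",), ("Migraine",)])
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 3, 1, mask(["fever", "cough"]), 1, 2),
     (1, 3, 1, mask(["headache", "nausea"]), 2, 4)],
)

def diseases_for(gender, age_range, body_part, chosen):
    """Diseases whose symptom set overlaps the chosen symptoms, by weight."""
    rows = conn.execute(
        """SELECT d.name, t.disease_weight
           FROM table2 t JOIN diseases d ON d.disease_id = t.disease_id
           WHERE t.gender = ? AND t.age_range = ? AND t.body_part = ?
             AND (t.symptoms & ?) <> 0
           ORDER BY t.disease_weight""",
        (gender, age_range, body_part, mask(chosen)),
    )
    return rows.fetchall()

print(diseases_for(1, 3, 1, ["fever"]))  # [('Flu', 2)]
```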
If there are other SELECTs, you need to think about them now, not later.
You have not explained how this dataset will be updated; that could be an issue, too.
You did not actually ask how to get from Excel to MySQL; let's finish the database design first.

I know this question is old, but from my research, a lot of symptom checker applications use an API to access this kind of data.
Here is one of the APIs used. I'm not sure whether you are using your own database, but that is bad practice, because the information you use might be wrong or outdated.


How to add new relations and make weight dependent in a MySQL database?

I am working on a health app and I'm designing a database (MySQL), which should store symptoms and diseases and bodyparts. The app will work as follows:
User chooses gender, age and bodypart (front/back)
App shows all (common/less common) symptoms of chosen bodypart
User chooses symptom
App shows all (common/less common) diseases of chosen symptom.
User chooses disease
App shows disease information
I got some help so far, but I still need some help finding a solution:
Making weight in symptom_disease dependent on the selected age and gender (order of the listed diseases should depend on the selected age and gender).
Some symptoms should have additional symptoms, that the user can choose as an extra. So for example, when the user chooses 'Head' -> 'Behavioral disturbances' (common), the app should display 2 extra checkboxes 'Depressed' and 'Drugs abuse'. The order of diseases list should depend on these inputs.
Note: The weight determines if a symptom/disease is common or less common. The common symptoms/diseases are listed above the less common symptoms/diseases.
The weight determines if a symptom/disease is common or less common. The common symptoms/diseases are listed above the less common symptoms/diseases.
That is fine. A bit simplistic, but fine. That is ranking by weight or ORDER BY weight.
Making weight in symptom_disease dependent on the selected age and gender (order of the listed diseases should depend on the selected age and gender)
That tells me the result you want (which is also fine). But the data that is required to produce that result ...
weight per disease per symptom per age, or
weight per disease per symptom per age per gender
... is absent. You cannot produce something from nothing.
Concept
The whole concept is immature. (I am, of course, speaking about the party who gave you this job, not you, who is trying to implement it.) Eg. we know that:
white people who live in sun-drenched countries suffer melanoma, etc., which the locals don't
south-east Asians who live in Western society have more diabetes and heart disease than they do in their own countries
many diseases, eg. breast cancer, have a hereditary factor
The point is, the concept is very simplistic. When that simplistic approach is thought through carefully, hopefully by medically-trained and experienced professionals, they will come up with something that is much more indicative than a symptom_disease.weight, which then has to be modulated up or down based on age and gender. They will come up with many factors that influence weight, age and gender being just two such factors.
After the system is implemented, if not before, you are going to need:
many more factors or categories
that must each have either a weight value or a factor value
which (a) the user chooses, and (b) influences the end result (probability of the disease)
Further, during such resolution:
some of those factors might turn out to be a weight (another weight, at that factor level) rather than a factor
during that process, it may turn out that the symptom_disease.weight is actually an intermediate result, not a stored value at the symptom_disease level
that might lead to weight at the disease level, that is modulated by each factor.
Result Weight
Once the data is structured and organised, obtaining resulting weight based on the user's choice of symptoms, categories, and factors is easy. It is just a single SELECT against the database.
I would use Dynamic SQL to construct the query, and then execute it, because the alternative is to code every possible query, and select the one that is relevant to the user request.
However, for a starting point, for proof of concept, just code a few separate queries.
You don't need a function on the database side. On the code side, a function is a good idea, because it eliminates duplicate code.
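Here is a minimal sketch of the Dynamic SQL idea, in Python with SQLite standing in for MySQL (an assumption). The table `symptom_disease` and its columns are hypothetical. The key design choice is that column names come only from a fixed whitelist, and every user-supplied value is bound through a placeholder. Building the query dynamically therefore does not open the door to SQL injection.

```python
import sqlite3
from typing import Any, Dict, List, Tuple

# Hypothetical whitelist: the only column names ever spliced into the SQL text.
FILTER_COLUMNS = {"age_range", "gender", "body_part", "symptom_id"}

def build_query(choices: Dict[str, Any]) -> Tuple[str, List[Any]]:
    """Build one parameterized SELECT from whichever factors the user chose."""
    clauses: List[str] = []
    params: List[Any] = []
    for column in sorted(choices):  # sorted for a stable, predictable SQL string
        if column not in FILTER_COLUMNS:
            raise ValueError(f"unknown filter column: {column}")
        clauses.append(f"{column} = ?")  # the value goes through a placeholder
        params.append(choices[column])
    sql = "SELECT disease_id, weight FROM symptom_disease"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY weight", params

# Tiny in-memory table with made-up rows, to show the query executing.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE symptom_disease (age_range INT, gender INT, body_part INT, "
             "symptom_id INT, disease_id INT, weight INT)")
conn.executemany("INSERT INTO symptom_disease VALUES (?, ?, ?, ?, ?, ?)",
                 [(3, 1, 1, 10, 100, 4), (3, 1, 1, 10, 101, 2), (2, 1, 1, 10, 102, 1)])

sql, params = build_query({"gender": 1, "age_range": 3})
print(sql)
print(conn.execute(sql, params).fetchall())  # [(101, 2), (100, 4)]
```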
Record ID
This creature poses many problems, and guarantees a non-database. Please read this Answer. You may find the whole post useful, if not, the minimum you need is from the top, up to and including False Teachers.
Likewise, prefixing the column names with a file identifier is an error. In SQL, you can qualify a column name, as and when required, with the file name. Instead of s_name and bp_name:
SELECT symptom.name,
bodypart.name,
...
Data Model
The most important issue at this early stage is to provide a database that is easy to extend, in line with those expected changes, which means pure Relational design and adherence to Standards. You must place all data, rules about data, and everything that relates to data in the database (not in code).
[Image: Symptom Disease Data Model (Demonstrative)]
In any case, right now, given what you have described, and considering my explanation above, we have two options. I have provided a separate data model for each.
1 Modulated Weight
Your client provides a weight per disease per symptom only, plus factors per age, gender, etc.
The factors will be used to modulate the weight.
Notice that the age Key is an age range, e.g. 1-18; 19-25; 26-35; 36-55; etc.
Factor is a REAL between 0 and some reasonable maximum (eg. 100). Eg. 1.0 for age means the weight is unchanged with age; 0.5 means the weight is halved with age; 2.0 means the weight is doubled with age; etc.
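To show how the factors could modulate the weight in this first model, here is a minimal sketch that multiplies the stored symptom_disease weight by one factor per dimension. Multiplication is an assumption that is consistent with the examples above (0.5 halves, 2.0 doubles). The function name and the cap of 100 are illustrative.

```python
from math import prod  # Python 3.8+

MAX_FACTOR = 100.0  # "some reasonable maximum" (assumed value)

def modulated_weight(base_weight: float, factors: dict) -> float:
    """Apply each dimension's factor (age, gender, ...) to the base
    symptom_disease weight: 1.0 = unchanged, 0.5 = halved, 2.0 = doubled."""
    for name, f in factors.items():
        if not 0.0 <= f <= MAX_FACTOR:
            raise ValueError(f"factor {name} out of range: {f}")
    return base_weight * prod(factors.values())

print(modulated_weight(4, {"age": 0.5, "gender": 2.0}))                     # 4.0
print(modulated_weight(4, {"age": 0.5, "gender": 2.0, "ethnicity": 1.5}))  # 6.0
```

Adding a new Dimension such as Ethnicity is just one more entry in the dictionary. The stored base weight does not change.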
The vectors or Dimensions are across the top. As and when new factors are identified (eg. Ethnicity), you can add another factor (Dimension), in the same way that age and gender have been added.
2 Specific Weight
Your client provides a specific weight per disease per symptom per age per gender, etc. There are no factors.
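For contrast, here is a sketch of the second model. The weight is stored directly under the full key, and no factors are applied. SQLite stands in for MySQL, and the table name `specific_weight` and its rows are hypothetical. The composite primary key enforces one weight per disease per symptom per age range per gender.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one weight per disease per symptom per age range per gender.
conn.execute("""
CREATE TABLE specific_weight (
    symptom_id INTEGER NOT NULL,
    disease_id INTEGER NOT NULL,
    age_range  INTEGER NOT NULL,
    gender     INTEGER NOT NULL,
    weight     INTEGER NOT NULL,
    PRIMARY KEY (symptom_id, disease_id, age_range, gender)
)""")
conn.executemany(
    "INSERT INTO specific_weight VALUES (?, ?, ?, ?, ?)",
    [(7, 1, 3, 1, 2), (7, 2, 3, 1, 6), (7, 1, 3, 0, 5), (7, 2, 3, 0, 1)],
)

def ranked_diseases(symptom_id, age_range, gender):
    """No factors: the weight is read directly for the full key."""
    return conn.execute(
        """SELECT disease_id, weight FROM specific_weight
           WHERE symptom_id = ? AND age_range = ? AND gender = ?
           ORDER BY weight""",
        (symptom_id, age_range, gender),
    ).fetchall()

print(ranked_diseases(7, 3, 1))  # [(1, 2), (2, 6)]
print(ranked_diseases(7, 3, 0))  # [(2, 1), (1, 5)]
```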
Discussion
In both models, when tables are added as and when new factors are identified, the core (five tables, top centre) do not change. However, in the second model, I have differentiated age and gender on this basis:
Gender is modelled as a definitive, a Dimension, and thus I have worked it through the core tables, as the Relational Key. Relational Keys are important for navigation (as in reducing it) and providing great power. Here in addition to that, I am saying gender is definitive.
The corollary is, you do not want definitive factors to be excluded from the Key of the core tables, because it will reduce navigation ability and speed.
Age is modelled as an additional Dimension that can be added easily, after the core tables are stable, as and when new factors are identified.
One is not better than the other. The point I am making here (and this applies to both models, but it is more visible in the Specific Weight model) is that the Dimensions that comprise the core tables need to be thought out carefully. Not because anything that you work into the Relational Keys is more difficult to change (and has an impact on existing code), but the other way around: what is genuinely a definitive part of the core tables, and forms the core symptom::disease indicator, is stable. It must be determined and modelled early.
Eg. I don't understand why you have symptom as being a child of bodypart, I can imagine symptoms that are not specific to a bodypart, but I will leave that as is for now.
The ugly-as-sin fix without changing the structure is to have a special bodypart indicating the whole body.
Rather than { Head/Behavioural/...}, think about categories for symptoms such as { Physical | Mental | Behaviour }. Or Behaviour { Smoking | Alcohol } may be a separate dimension that contributes to disease.
Eg. I can imagine symptoms that are specifically male or female only. I have modelled gender two ways, so that you can evaluate the difference, and the impact:
Divorced and isolated in 1 Modulated Weight
As a classifier (and therefore Identifier) of bodypart, symptom, and disease in 2 Specific Weight.
The point of all that is:
The definitive factors need to be exposed; evaluated; determined, early, such that the core tables are stable.
All factors that are relevant to disease diagnosis must be identified and implemented. Not merely Age and Gender, but Ethnicity; Hereditary diseases; Behaviour; Location; etc.
Please feel free to comment, ask questions, etc.
There is a method you can use to break a spectrum of choices into categories according to spectrum units, in this case age.
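create table Symptom_Disease(
  Sex char( 1 ) not null,
  Age smallint not null, -- lower cutoff of the age group
  Weight smallint not null,
  BodyPart int not null,
  SymptomID int not null,
  DiseaseID int not null,
  -- the key must cover the whole combination; (Sex, Age) alone allows only one row per sex and age
  constraint PK_Symptom_Disease primary key( Sex, BodyPart, SymptomID, DiseaseID, Age ),
  -- table-level foreign keys: older MySQL versions parse but ignore inline column REFERENCES
  constraint FK_SD_BodyPart foreign key( BodyPart ) references BodyPart( BP_ID ),
  constraint FK_SD_Symptom foreign key( SymptomID ) references Symptoms( S_ID ),
  constraint FK_SD_Disease foreign key( DiseaseID ) references Disease( D_ID )
);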
You have the sex, age, body part and symptom when you query:
select sd.Sex, sd.Age, d.*, sd.Weight
from Symptom_Disease sd
join Disease d
  on d.D_ID = sd.DiseaseID
where sd.Sex = :Sex
  and sd.BodyPart = :BodyPart
  and sd.SymptomID = :Symptom
  and sd.Age = (
    -- highest age cutoff that does not exceed the patient's age
    select Max( Age )
    from Symptom_Disease
    where Sex = sd.Sex
      and BodyPart = sd.BodyPart
      and SymptomID = sd.SymptomID
      and Age <= :Age )
order by sd.Weight;
For example, you might have entries (of various weights) for ages 0, 2, 12, 21, 35 and 70. Each age is the lower cutoff of its age group: the entries for 0 cover ages 0 and 1, the entries for 2 cover ages 2 through 11, and so forth. If your patient's age is 25, the subquery picks 21, the highest cutoff less than or equal to the patient's age, so the query returns all the diseases of each weight for that (Sex, BodyPart, Symptom) combination at age 21.
Just change sd.SymptomID = :Symptom to sd.SymptomID in( :S1, :S2,...) to allow for multiple symptoms. If so, you might want to add sd.SymptomID to the order by clause.
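The lower-cutoff lookup can be checked end to end with the following sketch. It runs a version of the query, using the column names from the CREATE TABLE statement, in SQLite (standing in for MySQL, an assumption) through Python's sqlite3 module, which accepts the same :name placeholders. The rows and disease names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Disease (D_ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Symptom_Disease (
    Sex CHAR(1) NOT NULL, Age SMALLINT NOT NULL, Weight SMALLINT NOT NULL,
    BodyPart INT NOT NULL, SymptomID INT NOT NULL, DiseaseID INT NOT NULL,
    PRIMARY KEY (Sex, BodyPart, SymptomID, DiseaseID, Age)
);
INSERT INTO Disease VALUES (1, 'Croup'), (2, 'Bronchitis'), (3, 'COPD');
-- (Sex, Age, Weight, BodyPart, SymptomID, DiseaseID); Age = lower cutoff
INSERT INTO Symptom_Disease VALUES
    ('M', 0, 1, 5, 9, 1), ('M', 0, 4, 5, 9, 2),
    ('M', 21, 2, 5, 9, 2), ('M', 21, 6, 5, 9, 3),
    ('M', 60, 1, 5, 9, 3), ('M', 60, 3, 5, 9, 2);
""")

QUERY = """
SELECT sd.Age, d.Name, sd.Weight
FROM Symptom_Disease sd
JOIN Disease d ON d.D_ID = sd.DiseaseID
WHERE sd.Sex = :Sex AND sd.BodyPart = :BodyPart AND sd.SymptomID = :Symptom
  AND sd.Age = (SELECT MAX(Age) FROM Symptom_Disease
                WHERE Sex = sd.Sex AND BodyPart = sd.BodyPart
                  AND SymptomID = sd.SymptomID AND Age <= :Age)
ORDER BY sd.Weight
"""

def lookup(sex, body_part, symptom, age):
    """Diseases for the age group whose lower cutoff is closest below `age`."""
    params = {"Sex": sex, "BodyPart": body_part, "Symptom": symptom, "Age": age}
    return conn.execute(QUERY, params).fetchall()

print(lookup("M", 5, 9, 25))  # [(21, 'Bronchitis', 2), (21, 'COPD', 6)]
```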

Designing a correct table structure

We are trying to save the Family History of a particular foreign job applicant. Below are the details we have to save.
Family Member: Father|Mother|1st Brother| 2nd Brother| 1st Sister| etc etc
Health Status: Alive|Deceased
Health Condition (Negative/Positive): Arthritis |Asthma |COPD |Diabetes etc etc
Health Condition (Comment): Arthritis |Asthma |COPD |Diabetes etc etc
Overall Comment
Below is its UI, so you can understand it better.
Our problem is creating a database table to store this information. Below are the things to be considered:
If the job applicant likes, they can provide data for any number of family members.
There are hundreds of items that come under "General Data", so we can't create a column in the table for every single item.
"Overall History Comment" is a comment about the entire family history, not related to a particular member.
The table design we made is below
Here is some sample input to the tables.
FamilyHistory
a) 1,1,1st Brother,Alive, Asthma, Not serious
b) 2,1,1st Brother,Alive, Cancer, Lung Cancer
c) 3,2,2nd Sister,Alive, Asthma,serious
d) 4,2,2nd Sister,Alive, Diabetes,serious
OverallComment
a) 1,1,1,Overall Condition Normal
b) 2,3,2,NULL
However, we feel this design is bad for the following reason:
Look at a) and b) of the FamilyHistory input. The job applicant's 1st brother has 2 health conditions. To enter this, 2 rows are inserted, and all the details about him are repeated except for the different health conditions.
Can you please let me know how to make this design better?
My first thought is: Why do you record this data? What is it good for? I cannot imagine its use. However, the answer will help design.
Is it important whether the second sister or the father has arthritis? If not, then why distinguish the two? You could do without family member types. (If you want, use a text field where you type in '2nd sis', 'mom', 'father', whatever.)
Are you going to have reports on that (e.g. 20% of our applicants told us they have family members with cancer)? Or will you always simply look up one applicant and see their family entries? If the latter, you could make this one text column where you simply type in all members and their health (or have your program write this).
Another point: Why is OverallComment a separate table? Do you need this for internationalization? Or for database-wide text search? If not, make this a column in the related table instead.
applicant ( applicant_id , name , comment )
If you need the relational model for queries and reports, then have one table for the family member:
family_member ( family_member_id , applicant_id , family_member_type, alive, comment )
And another table for the several disease entries per member:
family_member_disease ( family_member_id , disease_id , comment )
Maybe you should add dates. E.g.: When was the father reported to be alive?
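Here is a minimal sketch of the three-table layout. It assumes SQLite in place of the original database, plus a small disease lookup table, which is implied by the disease ID column but not shown in the answer. It stores the question's sample data for the 1st brother. He appears once in family_member, and his two conditions become two rows in the link table, so nothing about him is repeated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE applicant (
    applicant_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    comment TEXT                      -- overall family-history comment
);
CREATE TABLE disease (disease_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE family_member (
    family_member_id INTEGER PRIMARY KEY,
    applicant_id INTEGER NOT NULL REFERENCES applicant(applicant_id),
    family_member_type TEXT NOT NULL, -- e.g. '1st Brother'
    alive INTEGER NOT NULL,           -- 1 = alive, 0 = deceased
    comment TEXT
);
CREATE TABLE family_member_disease (
    family_member_id INTEGER NOT NULL REFERENCES family_member(family_member_id),
    disease_id INTEGER NOT NULL REFERENCES disease(disease_id),
    comment TEXT,
    PRIMARY KEY (family_member_id, disease_id)
);
INSERT INTO applicant VALUES (1, 'Applicant A', 'Overall Condition Normal');
INSERT INTO disease VALUES (1, 'Asthma'), (2, 'Cancer');
-- the 1st brother is stored once ...
INSERT INTO family_member VALUES (1, 1, '1st Brother', 1, NULL);
-- ... and each of his conditions is one row here
INSERT INTO family_member_disease VALUES (1, 1, 'Not serious'), (1, 2, 'Lung Cancer');
""")

rows = conn.execute("""
    SELECT fm.family_member_type, d.name, fmd.comment
    FROM family_member fm
    JOIN family_member_disease fmd ON fmd.family_member_id = fm.family_member_id
    JOIN disease d ON d.disease_id = fmd.disease_id
    WHERE fm.applicant_id = 1
    ORDER BY d.name
""").fetchall()
print(rows)
# [('1st Brother', 'Asthma', 'Not serious'), ('1st Brother', 'Cancer', 'Lung Cancer')]
```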

Proper way to model user groups

So I have this application that I'm drawing up, and I've started to think about my users. My initial thought was to create a table for each group type, but having thought it over, I'm not sure that this is the best way.
Example:
// Users
Users [id, name, email, age, etc]
// User Groups
Player [id, years playing, etc]
Ref [id, certified, etc]
Manufacturer Rep [id, years employed, etc]
So everyone would be making an account, but each user would have a different group. They can also be in multiple different groups. Each group has its own list of different columns. So what is the best way to do this? Let's say I have 5 groups. Do I need 8 tables + a relational table connecting each one to the user table?
I just want to be sure that this is the best way to organize it before I build it.
Edit:
A player would have columns regarding the gear that they use to play, the teams they've played with, events they've gone to.
A ref would have info regarding the certifications they have and the events they've reffed.
Manufacturer reps would have info regarding their position within the company they rep.
A parent would have information regarding how long they've been involved with the sport, perhaps relations with the users they are parent of.
Just as an example.
Edit 2:
Player Table:
id
user id
started date
stopped date
rank
Ref Table:
id
user id
started date
stopped date
is certified
certified by
verified
Photographer / Videographer / News Reporter Table:
id
user id
started date
stopped date
worked under name
website / channel link
about
verified
Tournament / Big Game Rep Table:
id
user id
started date
stopped date
position
tourney id
verified
Store / Field / Manufacturer Rep Table:
id
user id
started date
stopped date
position
store / field / man. id
verified
This is what I planned out so far. I'm still new to this so I could be doing it completely wrong. And it's only five groups. It was more until I condensed it some.
I find it weird to have so many entities that are so different from each other, but I will ignore this and get to the question.
It depends on the group criteria you need. In the case you described, where each group has its own columns and information, I think your design is a good one, especially if you need the information in a readable form in the database. If you need all groups in a single table, you will have to save the group-relevant information in some kind of object (a blob, an XML string or another format), but then you lose the ability to filter on these criteria in the database.
In a relational Database I would do it using the design you described.
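Here is a minimal sketch of that design for two of the groups from Edit 2, assuming SQLite in place of MySQL. Each group table carries a user_id that points at the shared users row. This lets a user belong to several groups without an extra linking table. The UNIQUE constraint on user_id (one row per user per group) is an assumption; drop it if a user can have several stints in the same role. The column player_rank stands in for the question's "rank", because RANK is a reserved word in MySQL 8.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT NOT NULL UNIQUE
);
-- One table per group; user_id links back to the shared users row.
CREATE TABLE player (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL UNIQUE REFERENCES users(id),
    started_date TEXT, stopped_date TEXT, player_rank TEXT
);
CREATE TABLE ref (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL UNIQUE REFERENCES users(id),
    started_date TEXT, stopped_date TEXT,
    is_certified INTEGER NOT NULL DEFAULT 0, certified_by TEXT,
    verified INTEGER NOT NULL DEFAULT 0
);
INSERT INTO users VALUES (1, 'Sam', 'sam@example.com'), (2, 'Alex', 'alex@example.com');
INSERT INTO player (user_id, started_date, player_rank) VALUES (1, '2015-04-01', 'Pro');
INSERT INTO ref (user_id, started_date, is_certified) VALUES (1, '2018-06-01', 1);
""")

def groups_of(user_id):
    """Which groups a user belongs to, derived from the group tables."""
    sql = """SELECT 'player' FROM player WHERE user_id = ?
             UNION ALL
             SELECT 'ref' FROM ref WHERE user_id = ?"""
    return sorted(r[0] for r in conn.execute(sql, (user_id, user_id)))

print(groups_of(1))  # ['player', 'ref']
print(groups_of(2))  # []
```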
The design of your tables greatly depends on the requirements of your software.
E.g. your description of users led me in the wrong direction: I was at first thinking about a "normal" user of a piece of software, basically a name, login information and so on. This I would never split over different tables, as it makes tasks like login and session handling really complicated.
Another point that surprised me was that you want to store the equipment in columns of those users' tables. Usually the relationship between a person and their equipment is not 1:1, and in most cases the number of pieces of equipment varies. So you usually have a 1:n relationship between users and their equipment: you design an equipment table and refer there to the owner's user id.
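The 1:n equipment relationship can be sketched like this, assuming SQLite in place of MySQL. The table and column names (`equipment`, `owner_user_id`, `kind`, `model`) and the rows are hypothetical. Each equipment row points at its owner, so a user can own any number of items without adding columns to the user's table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- 1:n: each equipment row points at its owner, so a user can own any number.
CREATE TABLE equipment (
    id INTEGER PRIMARY KEY,
    owner_user_id INTEGER NOT NULL REFERENCES users(id),
    kind TEXT NOT NULL,
    model TEXT
);
INSERT INTO users VALUES (1, 'Sam'), (2, 'Alex');
INSERT INTO equipment (owner_user_id, kind, model) VALUES
    (1, 'helmet', 'Model X'), (1, 'gloves', 'Model Y'), (2, 'helmet', 'Model Z');
""")

# LEFT JOIN so users without equipment still show up with a count of 0.
counts = dict(conn.execute(
    "SELECT u.name, COUNT(e.id) FROM users u "
    "LEFT JOIN equipment e ON e.owner_user_id = u.id "
    "GROUP BY u.id, u.name ORDER BY u.id"
).fetchall())
print(counts)  # {'Sam': 2, 'Alex': 1}
```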
But once you have an idea of which data you have in your application and which relationships exist between your data, the design of the tables is rather straightforward.
The good news is that your data model and database design will develop over time. Try to start with a basic model covering the majority of your use cases, then slowly add more use cases and aspects.
As long as you are in the planning and early implementation phase, it is rather easy to change your database design.

DB design - store selection in database

I'm working on a web application where I need to do some research before I implement the database. I hope you can help me make some good decisions before I start to code.
Today I have a database that, among other things, contains about two million contacts in a table:
Contact:
cid, name, phone, address, etc...
Users of the application can search the contact table based on different criteria, and get a list of contacts.
Users are stored in a separate database table:
User: uid, name, email, etc...
Now I want to make the users able to store a search result as a selection. The selection has to be a list of cid's representing every contact in the search result the user got. When the selection is stored, a user can open the selection and add notes, statuses etc to the different contacts in the selection.
My first thought is to make a selection table and a selection-contact mapping table like this:
Selection: sid, name, description, uid, etc
SelectionContactMap: sid, cid, status, note, etc...
With an average selection size between 1 000 and 100 000 contacts, and several thousand users storing many selections, I see that the SelectionContactMap table is going to grow very big very fast.
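The proposed Selection / SelectionContactMap layout can be sketched as follows, with SQLite standing in for MySQL (an assumption). The composite primary key (sid, cid) prevents duplicate contacts within a selection. In MySQL/InnoDB, the primary key is also the clustered index, so the rows of one selection sit next to each other in that index. A single INSERT ... SELECT copies a search result into the map. The city search and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contact (cid INTEGER PRIMARY KEY, name TEXT, city TEXT);
CREATE TABLE Selection (
    sid INTEGER PRIMARY KEY, name TEXT NOT NULL, description TEXT,
    uid INTEGER NOT NULL
);
-- Composite primary key: one row per contact per selection; all rows of
-- a selection share the sid prefix of the key.
CREATE TABLE SelectionContactMap (
    sid INTEGER NOT NULL REFERENCES Selection(sid),
    cid INTEGER NOT NULL REFERENCES Contact(cid),
    status TEXT,
    note TEXT,
    PRIMARY KEY (sid, cid)
);
INSERT INTO Contact (name, city) VALUES ('Ann', 'Oslo'), ('Bob', 'Bergen'), ('Cat', 'Oslo');
""")

def save_selection(uid, name, city):
    """Store the current result of a (hypothetical) city search as a selection."""
    cur = conn.execute("INSERT INTO Selection (name, uid) VALUES (?, ?)", (name, uid))
    sid = cur.lastrowid
    # INSERT ... SELECT copies the matching cids in one statement.
    conn.execute("INSERT INTO SelectionContactMap (sid, cid) "
                 "SELECT ?, cid FROM Contact WHERE city = ?", (sid, city))
    return sid

sid = save_selection(uid=1, name="Oslo", city="Oslo")
conn.execute("UPDATE SelectionContactMap SET status = 'called', note = 'Call back Monday' "
             "WHERE sid = ? AND cid = ?", (sid, 1))
print(conn.execute("SELECT cid, status FROM SelectionContactMap WHERE sid = ? ORDER BY cid",
                   (sid,)).fetchall())  # [(1, 'called'), (3, None)]
```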
The database is MySql and the application is written in PHP. I'm on a limited budget so I can not throw unlimited hardware on the task.
Am I on the wrong track here?
Do you have any suggestions to solve this the best possible way?
Other database?
MySql specific suggestions, table type etc?
Other database design?
Any comments and suggestions are appreciated.
Thanks in advance :)
-- Tor Inge
Question: What happens if the results of the query change - eg: a selected contact no longer has the chosen attribute or a new contact gets added?
If the answer is "The result set should be updated" - then you want to store the criteria in the database, not the results themselves.
If you need to cache the results for a period of time, this may be better handled by the application, not the database.
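Here is a minimal sketch of storing the criteria rather than the results, assuming SQLite in place of MySQL. The criteria are kept as a JSON string. The table `SavedSearch`, the searchable columns and the rows are hypothetical. Because the stored criteria are re-run each time, a contact added later shows up in the saved search automatically.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contact (cid INTEGER PRIMARY KEY, name TEXT, city TEXT);
-- Store the search criteria, not the matching cids.
CREATE TABLE SavedSearch (sid INTEGER PRIMARY KEY, uid INTEGER NOT NULL,
                          name TEXT NOT NULL, criteria TEXT NOT NULL);
INSERT INTO Contact (name, city) VALUES ('Ann', 'Oslo'), ('Bob', 'Bergen');
""")

ALLOWED = {"name", "city"}  # hypothetical searchable columns (whitelist)

def save_search(uid, name, criteria):
    conn.execute("INSERT INTO SavedSearch (uid, name, criteria) VALUES (?, ?, ?)",
                 (uid, name, json.dumps(criteria)))

def run_search(sid):
    """Re-run the stored criteria so the result reflects the current data."""
    (raw,) = conn.execute("SELECT criteria FROM SavedSearch WHERE sid = ?", (sid,)).fetchone()
    criteria = json.loads(raw)
    cols = sorted(criteria)
    if not set(cols) <= ALLOWED:
        raise ValueError("unsupported criterion")
    where = " AND ".join(f"{c} = ?" for c in cols) or "1 = 1"
    sql = f"SELECT cid, name FROM Contact WHERE {where} ORDER BY cid"
    return conn.execute(sql, [criteria[c] for c in cols]).fetchall()

save_search(uid=1, name="Oslo contacts", criteria={"city": "Oslo"})
print(run_search(1))  # [(1, 'Ann')]
conn.execute("INSERT INTO Contact (name, city) VALUES ('Cat', 'Oslo')")
print(run_search(1))  # [(1, 'Ann'), (3, 'Cat')]
```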

Searching a database of names

I have a MySQL database containing the names of a large collection of people. Each person in the database could have one or all of the following name types: first, last, middle, maiden or nick. I want to provide a way for people to search this database to see if a person exists in the database.
Are there any off the shelf products that would be suited to searching a database of peoples names?
With a bit of ingenuity, MySQL will do just what you need... The following gives a few ideas how this could be accomplished.
Your table: (I call it tblPersons)
PersonID (primary key of sorts)
First
Last
Middle
Maiden
Nick
Other columns for extra info (address, whatever...)
By keeping the table as-is and building an index on each of the name-related columns, the following query provides an inefficient but plausible way of finding all persons whose name somehow matches a particular name ('Jack' in the example):
SELECT * from tblPersons
WHERE First = 'Jack' OR Last = 'Jack' OR Middle = 'Jack'
OR Maiden = 'Jack' OR Nick = 'Jack'
Note that the application is not limited to searching for a single value across all the name types. The user can also input a specific set of criteria, for example to search for the First Name 'John' and Last Name 'Lennon' and the Profession 'Artist' (if such info is stored in the db), etc.
Also, note that even with this single-table approach, one of the features of your application could be to let the user tell the search logic whether this is a "given" name (like Paul, Samantha or Fatima) or a "surname" (like Black, McQueen or Dupont). The main purpose of this is that some names can be either (for example Lewis or Hillary), and by being, optionally, a bit more specific in their query, end users can get SQL to automatically weed out many irrelevant records. We'll get back to this kind of feature in the context of an alternative, more efficient database layout.
Introducing a "Names" table.
Instead of (or in addition to) storing the various names in the tblPersons table, we can introduce an extra table and relate it to tblPersons.
tblNames
PersonID (used to relate with tblPersons)
NameType (single letter code, say F, L, M, U, N for First, Last...)
Name
We'd then have ONE record in tblPersons for each individual, but as many records in tblNames as they have names (but when they don't have a particular name, few people for example have a Nickname, there is no need for a corresponding record in tblNames).
Then the query would become
SELECT DISTINCT P.* from tblPersons P
JOIN tblNames N ON N.PersonID = P.PersonID
WHERE N.Name = 'Jack'
Such a layout/structure would be more efficient. Furthermore this query would lend itself to offer the "given" vs. "surname" capability easily, just by adding to the WHERE clause
AND N.NameType IN ('F', 'M', 'N') -- for the "given" names
(or)
AND N.NameType IN ('L', 'U', 'N') -- for the "surname" types. Note that
-- we put nickname in there, but could just as easily remove it.
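The tblNames layout and the optional "given"/"surname" filter can be sketched like this, with SQLite standing in for MySQL (an assumption). The index name and sample people are made up. DISTINCT matters here, because one person can match through several name types, such as a first name and a nickname that are both 'Jack'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblPersons (PersonID INTEGER PRIMARY KEY, Address TEXT);
CREATE TABLE tblNames (
    PersonID INTEGER NOT NULL REFERENCES tblPersons(PersonID),
    NameType CHAR(1) NOT NULL,  -- F, L, M, U (maiden), N (nick)
    Name TEXT NOT NULL
);
CREATE INDEX ix_names_name ON tblNames (Name, NameType);
INSERT INTO tblPersons VALUES (1, NULL), (2, NULL), (3, NULL);
INSERT INTO tblNames VALUES
    (1, 'F', 'Jack'), (1, 'L', 'Black'), (1, 'N', 'Jack'),
    (2, 'F', 'Paul'), (2, 'L', 'Jack'),
    (3, 'F', 'Lewis'), (3, 'L', 'Hillary');
""")

GIVEN = ("F", "M", "N")
SURNAME = ("L", "U", "N")

def find_person_ids(name, types=None):
    """PersonIDs with `name` under any name type, or only under `types`."""
    sql = ("SELECT DISTINCT P.PersonID FROM tblPersons P "
           "JOIN tblNames N ON N.PersonID = P.PersonID WHERE N.Name = ?")
    params = [name]
    if types:
        sql += " AND N.NameType IN (%s)" % ", ".join("?" * len(types))
        params.extend(types)
    return [r[0] for r in conn.execute(sql + " ORDER BY P.PersonID", params)]

print(find_person_ids("Jack"))           # [1, 2]
print(find_person_ids("Jack", GIVEN))    # [1]
print(find_person_ids("Jack", SURNAME))  # [1, 2]
```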
Another advantage of this approach is that it allows storing other kinds of names there; for example, the SOUNDEX form of every name could be added under its own NameType(s), making it easy to find names even when the spelling is approximate.
Finally, another improvement could be to introduce a separate lookup table containing the most common abbreviations of given names (Pete for Peter, Jack for John, Bill for William, etc.) and to use it for search purposes. (The name columns used for providing the display values would remain as provided in the source data, but the extra lookup/normalization at the level of the search would increase recall.)
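Here is a minimal sketch of such a lookup table, assuming SQLite in place of MySQL. The table name `tblNameVariants` and its rows are hypothetical. tblNames keeps names exactly as entered; the variants are used only at search time. A search therefore matches the typed name, its canonical form (Pete to Peter), and the known variants of a canonical name (Peter to Pete).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblNames (PersonID INTEGER NOT NULL, NameType CHAR(1) NOT NULL,
                       Name TEXT NOT NULL);
-- Hypothetical lookup of common given-name variants.
CREATE TABLE tblNameVariants (Variant TEXT NOT NULL, Canonical TEXT NOT NULL,
                              PRIMARY KEY (Variant, Canonical));
INSERT INTO tblNames VALUES (1, 'F', 'Peter'), (2, 'F', 'William'), (3, 'F', 'Pete');
INSERT INTO tblNameVariants VALUES ('Pete', 'Peter'), ('Bill', 'William'), ('Jack', 'John');
""")

def search_given(name):
    """Find people whose given name matches `name` or a known variant of it."""
    sql = """
    WITH forms(Name) AS (
        SELECT ?
        UNION SELECT Canonical FROM tblNameVariants WHERE Variant = ?
        UNION SELECT Variant FROM tblNameVariants WHERE Canonical = ?
    )
    SELECT DISTINCT N.PersonID
    FROM tblNames N JOIN forms f ON f.Name = N.Name
    WHERE N.NameType = 'F'
    ORDER BY N.PersonID
    """
    return [row[0] for row in conn.execute(sql, (name, name, name))]

print(search_given("Pete"))   # [1, 3]
print(search_given("Bill"))   # [2]
```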
You shouldn't need to buy a product to search a database, databases are built to handle queries.
Have you tried running your own queries on it? For example: (I'm imagining what the schema looks like)
SELECT * FROM names WHERE first_name='Matt' AND last_name='Way';
If you've tried running some queries, what problems did you encounter that makes you want to try a different solution?
What does the schema look like?
How many rows are there?
Have you tried indexing the data in any way?
Please provide some more information to help answer your question.