So I have many data measurement values for different kinds of measurements.
For simplicity's sake, let's say they're height and weight values for two people.
Which do you think (and why) is the best method for storing the data? In actuality we're talking about 1000s of patients and a lot of data.
Method 1

TableNums
name    id
height  1
weight  2

height
id  value
1   140
2   130

weight
id  value
1   70
2   60

In this method I have a separate table for each measurement type. This one seems good for readability and adding new measurements in the future. But it would also make for a lot of tables.
or
Method 2

TableNums
name    id
height  1
weight  2

attributes
id  type_id  value  unique_id
1   1        140    1
2   1        130    2
1   2        70     3
2   2        60     4

This method seems less readable but would require only one table for the measurements.
Which do you guys think is better practice?
Thanks,
Ben
I recommend something like this:
Table Measurement:
MeasurementId PK
MeasurementType varchar -- height, weight, etc
MeasurementUnit varchar -- kg, cm, etc
Table patientMeasurement:
PatientId -- FK to patient
MeasurementId -- FK to measurement
value float
MeasurementDateTime datetime
other fields.
The PK of patientMeasurement could be composite (PatientId, MeasurementId, MeasurementDateTime) or a separate field.
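To make the suggestion concrete, here is a minimal sketch of those two tables using Python's built-in sqlite3 module. The patient table is left out, the timestamps are invented for illustration, and the helper measurements_for is a hypothetical name; the values are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Measurement (
    MeasurementId   INTEGER PRIMARY KEY,
    MeasurementType TEXT NOT NULL,   -- height, weight, etc.
    MeasurementUnit TEXT NOT NULL    -- cm, kg, etc.
);
CREATE TABLE patientMeasurement (
    PatientId           INTEGER NOT NULL,  -- would be an FK to a patient table (omitted here)
    MeasurementId       INTEGER NOT NULL REFERENCES Measurement(MeasurementId),
    value               REAL NOT NULL,
    MeasurementDateTime TEXT NOT NULL,
    PRIMARY KEY (PatientId, MeasurementId, MeasurementDateTime)
);
INSERT INTO Measurement VALUES (1, 'height', 'cm'), (2, 'weight', 'kg');
-- Timestamps below are made up for the example.
INSERT INTO patientMeasurement VALUES
    (1, 1, 140, '2020-01-01 09:00'),
    (2, 1, 130, '2020-01-01 09:05'),
    (1, 2, 70,  '2020-01-01 09:00'),
    (2, 2, 60,  '2020-01-01 09:05');
""")

def measurements_for(patient_id):
    """Return (type, value, unit) rows for one patient."""
    return conn.execute(
        """SELECT m.MeasurementType, pm.value, m.MeasurementUnit
           FROM patientMeasurement pm
           JOIN Measurement m ON m.MeasurementId = pm.MeasurementId
           WHERE pm.PatientId = ?
           ORDER BY m.MeasurementId""",
        (patient_id,),
    ).fetchall()

patient_1 = measurements_for(1)
```

Adding a new kind of measurement is then one extra row in Measurement rather than a new table.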
I have the 2 dimension tables and 1 fact table. They look like this:
Dimension table 1: Investor_DM
Investor_nr  Investor_name
1            Jack
2            Kelly

Dimension table 2: Company_DM
Company_nr  Company_name
1           Apple
2           Microsoft

Fact table: Positions
Company_nr  Investor_nr  InvestmentDate  InvestmentSize
1           1            2022-jan-02     $1,000
2           1            2022-feb-03     $1,500
1           2            2022-jun-02     $7,000
2           2            2022-jul-03     $7,500

I assigned Investor_nr and Company_nr as PK's in their respective dimension tables, and assigned FK's in the fact table in which I refer to the PK's.
What I want is when I look for data WHERE Investor_nr = '1', I only get the investments in the fact table of investor 1, etc. However, I still get all investments even though I have created one-to-many relationships between the tables. Same goes for the company dimension table.
This is my query:
SELECT *
FROM Positions
WHERE Investor_nr = '1'
;
I would hope that if I SELECT * from all 3 tables, the Investor_nr's on each row correspond between the fact and dimension table, but this is not the case. Same goes for Company_nr.
What am I missing here? Am I using the wrong query, or are my relationships wrong?
I tried multiple queries, sometimes only on 1 table and sometimes 2 or 3 tables, but I always get data from investors I did not filter on.
Thanks!
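The multi-table query is not shown in the post, so the following is only a guess at one common cause: listing several tables in FROM without join conditions pairs every fact row with every dimension row, regardless of any foreign keys. A minimal sketch with Python's sqlite3, assuming the sample data above (dates rewritten in ISO form and amounts stored as plain integers for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Investor_DM (Investor_nr INTEGER PRIMARY KEY, Investor_name TEXT);
CREATE TABLE Company_DM  (Company_nr  INTEGER PRIMARY KEY, Company_name  TEXT);
CREATE TABLE Positions (
    Company_nr     INTEGER REFERENCES Company_DM(Company_nr),
    Investor_nr    INTEGER REFERENCES Investor_DM(Investor_nr),
    InvestmentDate TEXT,
    InvestmentSize INTEGER
);
INSERT INTO Investor_DM VALUES (1, 'Jack'), (2, 'Kelly');
INSERT INTO Company_DM  VALUES (1, 'Apple'), (2, 'Microsoft');
INSERT INTO Positions VALUES
    (1, 1, '2022-01-02', 1000),
    (2, 1, '2022-02-03', 1500),
    (1, 2, '2022-06-02', 7000),
    (2, 2, '2022-07-03', 7500);
""")

# No join condition: every fact row is combined with every dimension row.
unjoined = conn.execute(
    "SELECT * FROM Positions, Investor_DM, Company_DM"
).fetchall()

# Explicit join conditions, plus a filter on the fact table's own column.
joined = conn.execute("""
    SELECT i.Investor_name, c.Company_name, p.InvestmentDate, p.InvestmentSize
    FROM Positions p
    JOIN Investor_DM i ON i.Investor_nr = p.Investor_nr
    JOIN Company_DM  c ON c.Company_nr  = p.Company_nr
    WHERE p.Investor_nr = 1
    ORDER BY p.InvestmentDate
""").fetchall()
```

Foreign keys only constrain what can be stored; they do not filter or join anything by themselves, so the ON clauses still have to be written out in each query.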
Which of these methods would be the most efficient way of storing, retrieving, processing and searching a large (millions of records) index of stored URLs along with their keywords?
Example 1: (Using one table)
TABLE_URLs-----------------------------------------------
ID DOMAIN KEYWORDS
1 mysite.com videos,photos,images
2 yoursite.com videos,games
3 hissite.com games,images
4 hersite.com photos,pictures
---------------------------------------------------------
Example 2: (one-to-one Relationship from one table to another)
TABLE_URLs-----------------------------------------------
ID DOMAIN KEYWORDS
1 mysite.com
2 yoursite.com
3 hissite.com
4 hersite.com
---------------------------------------------------------
TABLE_URL_KEYWORDS---------------------------------------------
ID DOMAIN_ID KEYWORDS
1 1 videos,photos,images
2 2 videos,games
3 3 games,images
4 4 photos,pictures
---------------------------------------------------------
Example 3: (one-to-one Relationship from one table to another (Using a reference table))
TABLE_URLs-----------------------------------------------
ID DOMAIN
1 mysite.com
2 yoursite.com
3 hissite.com
4 hersite.com
---------------------------------------------------------
TABLE_URL_TO_KEYWORDS------------------------------------
ID DOMAIN_ID KEYWORDS_ID
1 1 1
2 2 2
3 3 3
4 4 4
---------------------------------------------------------
TABLE_KEYWORDS-------------------------------------------
ID KEYWORDS
1 videos,photos,images
2 videos,games
3 games,images
4 photos,pictures
---------------------------------------------------------
Example 4: (many-to-many Relationship from url to keyword ID (using reference table))
TABLE_URLs-----------------------------------------------
ID DOMAIN
1 mysite.com
2 yoursite.com
3 hissite.com
4 hersite.com
---------------------------------------------------------
TABLE_URL_TO_KEYWORDS------------------------------------
ID DOMAIN_ID KEYWORDS_ID
1 1 1
2 1 2
3 1 3
4 2 1
5 2 4
6 3 4
7 3 3
8 4 2
9 4 5
---------------------------------------------------------
TABLE_KEYWORDS-------------------------------------------
ID KEYWORDS
1 videos
2 photos
3 images
4 games
5 pictures
---------------------------------------------------------
My understanding is that Example 1 would take the largest amount of storage space, but searching through this data would be quick (repeated keywords are saved multiple times, but the keywords sit next to the relevant domain).
Whereas Example 4 would save a ton of storage space, but searching through it would take longer (duplicate keywords are not stored, but referencing multiple keywords for each domain would take longer).
Could anyone give me any insight or thoughts on which method would be best when designing a database that can handle huge amounts of data, bearing in mind that you may want to display a URL with its associated keywords OR search for one or more keywords and bring up the most relevant URLs?
You do have a many-to-many relationship between url and keywords. The canonical way to represent this in a relational database is to use a bridge table, which corresponds to example 4 in your question.
Using the proper data structure, you will find out that the queries will be much easier to write, and as efficient as it gets.
I don't know what drives you to think that searching in a structure like the first one will be faster. It requires pattern matching when searching for each single keyword, which is notably slow. On the other hand, a junction table lets you search for exact matches, which can take advantage of indexes.
Finally, maintaining such a structure is also much easier: adding or removing keywords can be done with insert and delete statements, while the other structures require string manipulation on a delimited list, which is tedious, error-prone and inefficient.
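As a rough sketch of the bridge-table approach, here is Example 4 loaded into an in-memory SQLite database through Python's sqlite3. The lower-case table names, the index name and the helper domains_for are assumptions made for the example; the rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE urls     (id INTEGER PRIMARY KEY, domain  TEXT NOT NULL UNIQUE);
CREATE TABLE keywords (id INTEGER PRIMARY KEY, keyword TEXT NOT NULL UNIQUE);
CREATE TABLE url_to_keywords (
    domain_id  INTEGER NOT NULL REFERENCES urls(id),
    keyword_id INTEGER NOT NULL REFERENCES keywords(id),
    PRIMARY KEY (domain_id, keyword_id)
);
-- Supports the keyword -> domains direction.
CREATE INDEX idx_keyword_domain ON url_to_keywords (keyword_id, domain_id);
INSERT INTO urls VALUES
    (1, 'mysite.com'), (2, 'yoursite.com'), (3, 'hissite.com'), (4, 'hersite.com');
INSERT INTO keywords VALUES
    (1, 'videos'), (2, 'photos'), (3, 'images'), (4, 'games'), (5, 'pictures');
INSERT INTO url_to_keywords VALUES
    (1, 1), (1, 2), (1, 3), (2, 1), (2, 4), (3, 4), (3, 3), (4, 2), (4, 5);
""")

def domains_for(keyword):
    """Exact-match lookup: no LIKE or string splitting needed."""
    rows = conn.execute(
        """SELECT u.domain
           FROM keywords k
           JOIN url_to_keywords uk ON uk.keyword_id = k.id
           JOIN urls u ON u.id = uk.domain_id
           WHERE k.keyword = ?
           ORDER BY u.id""",
        (keyword,),
    )
    return [row[0] for row in rows]

games_before = domains_for("games")

# Maintenance is a plain INSERT or DELETE on the bridge table.
conn.execute("INSERT INTO url_to_keywords VALUES (?, ?)", (4, 4))  # hersite.com gains 'games'
conn.execute(
    "DELETE FROM url_to_keywords WHERE domain_id = ? AND keyword_id = ?", (2, 4)
)  # yoursite.com loses 'games'

games_after = domains_for("games")
```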
None of the above.
Simply have a table with 2 string columns:
CREATE TABLE domain_keywords (
domain VARCHAR(..) NOT NULL,
keyword VARCHAR(..) NOT NULL,
PRIMARY KEY(domain, keyword),
INDEX(keyword, domain)
) ENGINE=InnoDB
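For anyone who wants to try this layout quickly, below is a small sketch using Python's sqlite3 rather than MySQL. SQLite has no inline INDEX clause or ENGINE option, so the secondary index is created separately, and TEXT stands in for the unspecified VARCHAR lengths; the index name is an assumption. The data comes from Example 1 in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE domain_keywords (
    domain  TEXT NOT NULL,
    keyword TEXT NOT NULL,
    PRIMARY KEY (domain, keyword)
);
CREATE INDEX idx_keyword_domain ON domain_keywords (keyword, domain);
""")

rows = [
    ("mysite.com", "videos"), ("mysite.com", "photos"), ("mysite.com", "images"),
    ("yoursite.com", "videos"), ("yoursite.com", "games"),
    ("hissite.com", "games"), ("hissite.com", "images"),
    ("hersite.com", "photos"), ("hersite.com", "pictures"),
]
conn.executemany("INSERT INTO domain_keywords VALUES (?, ?)", rows)

# Display a domain with its keywords.
keywords_of_mysite = [
    row[0]
    for row in conn.execute(
        "SELECT keyword FROM domain_keywords WHERE domain = ? ORDER BY keyword",
        ("mysite.com",),
    )
]

# Search for several keywords and rank domains by how many of them match.
ranked = conn.execute(
    """SELECT domain, COUNT(*) AS hits
       FROM domain_keywords
       WHERE keyword IN (?, ?)
       GROUP BY domain
       ORDER BY hits DESC, domain""",
    ("videos", "games"),
).fetchall()
```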
Notes:
It will be faster.
It will be easier to write code.
Having a plain id is very much a waste.
Normalizing the domain and keyword buys little space savings, but at a big loss in efficiency.
"Huge database"? I predict that this table will be smaller than your Domains table. That is, this table is not your main concern for "huge".
I have a teacher who doesn't like to explain to the class but loves putting up review questions for upcoming tests. Can anyone explain the image above? My main concern is the red underline, which shows that supplier and supplierPhone are repeated values. I thought that repeated values occurred when there were many occurrences of the same item in a column.
Another question I have: if Supplier is a repeating value, why isn't Part_Name a repeating value, since both have 2 items with the same name in their columns?
Example:
It's repeated because the result of the tuple is always the same. E.g. ABC Plastics will always have the same phone number, therefore having 2 rows with ABC Plastics means that we have redundant information in the phone number.
Part1 Company1 12341234
Part2 Company1 12341234
We could represent the same information with:
Part1 Company1
Part2 Company1
And
Company1 12341234.
Therefore having two rows with the same phone number is redundant.
This should answer your second question as well.
Essentially, you're looking for column pairs (X, Y) such that whenever a tuple (X, Y) exists, any other tuple (X, Y') has Y' = Y.
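That rule can be checked mechanically. The snippet below is only an illustrative sketch in Python: the function name violations and the column names part, supplier and phone are made up for the example, and the rows are the two from above.

```python
from collections import defaultdict

def violations(rows, x, y):
    """Return the X values that appear with more than one distinct Y value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[x]].add(row[y])
    return {key: values for key, values in seen.items() if len(values) > 1}

rows = [
    {"part": "Part1", "supplier": "Company1", "phone": "12341234"},
    {"part": "Part2", "supplier": "Company1", "phone": "12341234"},
]

# Supplier always comes with the same phone, so storing the phone per row is redundant.
supplier_to_phone = violations(rows, "supplier", "phone")

# Supplier does not determine the part, so the part column carries real information.
supplier_to_part = violations(rows, "supplier", "part")
```

An empty result means Y is determined by X and can move to its own table keyed by X.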
Looks like five tables to me.
model (entity)
modelid description price
1 Laserjet 423.56
2 256 Colour 89.99
part (entity)
partid name
PF123 Paper Spool
LC423 Laserjet cartridge
MT123 Power supply
etc
bill_of_materials (many to many relationship model >--< part )
modelid partid qty
1 PF123 2
1 LC423 4
1 MT123 1
2 MT123 2
supplier (entity)
supplier_id phone name
1 416-234-2342 ABC Plastics
2 905.. Jetson Carbons
3 767... ACME Power Supply
etc.
part_supplier (many to many relationship part >--< supplier )
part_id supplier_id
PF123 1
LC423 2
MT123 3
etc.
You have one row in model, part, supplier for each distinct entity.
You have rows in bill_of_materials for each part that goes into each model.
You have a row in part_supplier for each supplier that can furnish each part. Notice that more than one part can come from one supplier, and more than one supplier can furnish each part. That's a many-to-many relationship.
The trick: Figure out what physical things you have in your application domain. Then make a table for each one. Then figure out how they relate to each other (that's what makes it relational.)
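Here is a minimal sketch of the five tables in SQLite via Python's sqlite3, to show how the two many-to-many tables are walked in a query. It assumes the second model row is modelid 2 (as bill_of_materials implies), keeps the truncated phone numbers exactly as posted, and uses a hypothetical helper name, suppliers_for_model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE model    (modelid INTEGER PRIMARY KEY, description TEXT, price REAL);
CREATE TABLE part     (partid TEXT PRIMARY KEY, name TEXT);
CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, phone TEXT, name TEXT);
CREATE TABLE bill_of_materials (
    modelid INTEGER REFERENCES model(modelid),
    partid  TEXT    REFERENCES part(partid),
    qty     INTEGER,
    PRIMARY KEY (modelid, partid)
);
CREATE TABLE part_supplier (
    part_id     TEXT    REFERENCES part(partid),
    supplier_id INTEGER REFERENCES supplier(supplier_id),
    PRIMARY KEY (part_id, supplier_id)
);
INSERT INTO model VALUES (1, 'Laserjet', 423.56), (2, '256 Colour', 89.99);
INSERT INTO part VALUES
    ('PF123', 'Paper Spool'), ('LC423', 'Laserjet cartridge'), ('MT123', 'Power supply');
-- The second and third phone numbers are truncated in the post and kept as-is.
INSERT INTO supplier VALUES
    (1, '416-234-2342', 'ABC Plastics'),
    (2, '905..', 'Jetson Carbons'),
    (3, '767...', 'ACME Power Supply');
INSERT INTO bill_of_materials VALUES
    (1, 'PF123', 2), (1, 'LC423', 4), (1, 'MT123', 1), (2, 'MT123', 2);
INSERT INTO part_supplier VALUES ('PF123', 1), ('LC423', 2), ('MT123', 3);
""")

def suppliers_for_model(modelid):
    """List (part name, quantity, supplier name) for every part in a model."""
    return conn.execute(
        """SELECT p.name, b.qty, s.name
           FROM bill_of_materials b
           JOIN part p           ON p.partid = b.partid
           JOIN part_supplier ps ON ps.part_id = p.partid
           JOIN supplier s       ON s.supplier_id = ps.supplier_id
           WHERE b.modelid = ?
           ORDER BY p.partid""",
        (modelid,),
    ).fetchall()

model_1 = suppliers_for_model(1)
```

Each supplier's phone number is stored once, in supplier, no matter how many parts that supplier furnishes.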
id user_id apt_id name value datetime
1 1 1 bp 109 ....
2 1 1 sugar 180 ....
3 2 2 bp 170 ....
I am trying to design the table this way because the set of values per patient is not fixed: sometimes a patient will have both bp and sugar stored, sometimes only bp.
Am I right in creating this design? If so, how do I get the records of a single patient?
Thanks,
If I'm not wrong, user_id is your patient ID in this scenario. In that case, use the query below to get a single patient's records:
select * from Patienttable where user_id = '1'
This returns the records for a single patient, i.e. for user_id = 1.
Output:
id user_id apt_id name value datetime
1 1 1 bp 109 ....
2 1 1 sugar 180 ....
Note: Replace 1 with whichever user_id you want.
Others may disagree, but I wouldn't do it this way, unless you had several changing symptoms that you collect at different appointments. If it's a small collection (some of which are not collected), I would just add them as columns to the appointment table, and leave the sugar column as NULL when it's not collected.
user_id apt_id bp sugar datetime
1 1 109 180 ....
2 1 170 NULL ....
The model you're proposing is a variant of Entity-Attribute-Value design, which has some strengths and some weaknesses. Aaron Bertrand had a good writeup of when an EAV design is useful, and what the costs are for that design. Based on the scenario you described, I don't think it's the best fit.
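If the name/value layout is kept, the column-per-measurement view can still be produced at query time with conditional aggregation. A small sketch using Python's sqlite3, where the table name patient_values is an assumption and the datetime column is left out because its values are elided in the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient_values (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL,
    apt_id  INTEGER NOT NULL,
    name    TEXT NOT NULL,
    value   INTEGER NOT NULL
);
INSERT INTO patient_values VALUES
    (1, 1, 1, 'bp', 109),
    (2, 1, 1, 'sugar', 180),
    (3, 2, 2, 'bp', 170);
""")

# One output row per appointment; a measurement that was not taken comes back as NULL.
wide = conn.execute("""
    SELECT user_id, apt_id,
           MAX(CASE WHEN name = 'bp'    THEN value END) AS bp,
           MAX(CASE WHEN name = 'sugar' THEN value END) AS sugar
    FROM patient_values
    GROUP BY user_id, apt_id
    ORDER BY user_id, apt_id
""").fetchall()
```

Every new measurement name needs another CASE expression in the query, which is part of the cost of the name/value design.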
I have a general question about MySQL database table design. I have a table that contains ~ 650 thousand records, with approximately 100 thousand added per year. The data is requested quite frequently, 1.6 times per second on average.
It has the following structure right now
id port_id date product1_price product2_price product3_price
1 1 2012-01-01 100.00 200.00 155.00
2 2 2012-01-01 NULL 150.00 255.00
3 3 2012-01-01 300.00 NULL 355.00
4 1 2012-01-02 200.00 250.00 355.00
5 2 2012-01-02 400.00 230.00 255.00
Wouldn't it be better to store the data in this manner?
id port_id date product price
1 1 2012-01-01 1 100
1 2 2012-01-01 1 200
1 3 2012-01-01 1 300
1 1 2012-01-02 1 240
Advantages of the alternative design:
with the second design we don't have to store NULL values (if there is no such product in the port)
we can add new products easily, compared to the first design, where each new product requires a new column
Disadvantages of the alternative design:
The number of records will increase from 650 000 to 650 000 * number_of_products minus all NULL records; that will be approximately 2.1 million records.
In both cases we have id column as PRIMARY_KEY and UNIQUE key on combination of port_id and date.
So the question is: which way to go? Disk space does not matter, the speed of the queries is the most important aspect.
Thank you for your attention.
It seems that will depend on the definition of the product table.
If the set of products is static, with at most three of them, then changing the current design won't help much.
The current design does smell, but whether to change it is a business-dependent decision.
By the way, any change must be made with caution, considering the side effects on the product table and everything that uses it.
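For comparison, here is a sketch of what the one-row-per-product layout might look like, using Python's sqlite3 and the five sample rows from the question. The table name port_prices is an assumption. Note that in this layout the unique key has to cover (port_id, date, product), since (port_id, date) alone is no longer unique. It says nothing about MySQL query speed, which would have to be measured on the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE port_prices (
    id      INTEGER PRIMARY KEY,
    port_id INTEGER NOT NULL,
    date    TEXT NOT NULL,
    product INTEGER NOT NULL,
    price   REAL NOT NULL,
    UNIQUE (port_id, date, product)
);
""")

# The current wide rows: (port_id, date, product1_price, product2_price, product3_price).
wide_rows = [
    (1, "2012-01-01", 100.0, 200.0, 155.0),
    (2, "2012-01-01", None, 150.0, 255.0),
    (3, "2012-01-01", 300.0, None, 355.0),
    (1, "2012-01-02", 200.0, 250.0, 355.0),
    (2, "2012-01-02", 400.0, 230.0, 255.0),
]

# Unpivot: one narrow row per price that is actually present.
narrow_rows = [
    (port_id, day, product, price)
    for port_id, day, *prices in wide_rows
    for product, price in enumerate(prices, start=1)
    if price is not None
]
conn.executemany(
    "INSERT INTO port_prices (port_id, date, product, price) VALUES (?, ?, ?, ?)",
    narrow_rows,
)

row_count = conn.execute("SELECT COUNT(*) FROM port_prices").fetchone()[0]

# A single-price lookup filters on all three key columns.
price = conn.execute(
    "SELECT price FROM port_prices WHERE port_id = ? AND date = ? AND product = ?",
    (1, "2012-01-01", 2),
).fetchone()[0]
```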