I'm very new to databases and I have a quick question.
How would I design my MySQL database if I have these fields:
ID,
lat,
long,
date - multiple dates,
time - multiple times
I know I should put it into two tables, right? And how would those two tables look?
Thanks!
Your first table might be called "location" and it would have an "id" column as its primary key, along with two columns called "latitude" and "longditude" (which could be varchar or a numeric type, depending what your application requires). Your second table might be called "location_event" and it could have an "id" column as its primary key, along with a foreign key column called "location_id" that is a reference to the primary key of the "location" table. This "location_event" table would also have a "date" column and a "time" column (of types date and time respectively).
It's hard to tell what you're trying to do from the terse description but third normal form dictates that any column should be dependent on:
the key.
the whole key.
nothing but the key.
To that end, I'd say my initial analysis would generate:
Location
LocId primary key
Lat
Long
Events
LocId foreign key Location(LocId)
Date
Time
This is based on my (possibly flawed) analysis that you want to store a location at which zero or more events can happen.
It's good practice to put the events in a separate table since the alternative is to have arrays of columns which is never a good idea.
As far as I can guess the date en time are couple always appearing together. In that case I would suggest two tables, location and time.
CREATE TABLE location (
id INT NOT NULL,
lat FLOAT NOT NULL,
long FLOAT NOT NULL,
PRIMARY KEY (id)
)
CREATE TABLE time (
id INT NOT NULL,
locationid INT NOT NULL,
date DATE NOT NULL,
time DATE NOT NULL
)
Optionally you can add a foreign key constraint
ALTER TABLE time ADD CONSTRAINT location_fk_constraint FOREIGN KEY location_fk_constraint (locationid)
REFERENCES location (id)
ON DELETE CASCADE
ON UPDATE CASCADE;
OK, let's say, for the sake of argument, that you are talking about longitude and latitude. That these are entries in some kind of list (perhaps a sea log? Arrgh, me maties!) of longitude and latitude. And that each of these long/lat pairs may appear more than once in the list.
Perhaps you want to build a database that figures out how many appearances each long/lat pair has, and when each appearance happened?
So how's this: First we have a table of the long/lat pairs, and we'll give each of those an ID.
ID long lat
-- ----- -----
1 11111 22222
2 33333 44444
3 55555 66666
Next, we'll have another table, which will assign each appearance of the long/lat pairs a date/time:
ID date time
-- ---- -----
1 1/1/1900 12:30
1 2/2/1900 12:31
1 3/2/1900 12:30
2 1/1/1930 08:21
Let's say you'll call the first table "longlat" and the second one "appearances".
You could find all the appearances of a single long/lat pair by doing something like:
SELECT date,time FROM appearances
LEFT JOIN longlat ON appearances.ID=longlat.ID
WHERE longlat.long = 11111 AND longlat.lat = 22222
You could count how many times something happened at a longitude of 11111, by doing:
SELECT count(ID) FROM appearances
LEFT JOIN longlat ON appearances.ID=longlat.ID
WHERE longlat.long = 11111
Hope that helps! I gotta admit, it's really quite annoying to try and guess what people mean... Try making yourself more clear in the future, and you'll see that the help you'll get will be that much more useful, concise and targeted at what you need.
Good luck!
Related
It's been a while when I work with databases that's why I need some advice about what I'm doing and thinking. Lately I've been involve with a project that a primary keys would be a custom generated. This is the set up in supply table supply id should be a varchar why? They wanted to know when it is purchase sample:
supply table
supply id: ( sup-4/23/2017-123456 ) Explanation sup for supply 2/23/2017 for the current date 123456 random 6 digits
property table
property id ( 123456-123456-123456 ) Explanation first 6 digits general ID from the accounting system, second 6 digits specific id based on general id, third 6 digits random 6 digits.
What did I think:
I will create an ID primary key with varchar datatype look at the first image
I create supply_id primary key int and I created another supply_code varchar datatype look at the second image
I need some advise for what is right between the two. I'm thinking that there would be a problem in each of my solution. I'm always thinking to use a standard primary like using INT now with a custom primary key.
I have this design.
Table models:
id - primary key
title - varchar(256)
Table model_instances:
id - primary key
model_id - foreign key to app_models.id
title - varchar(256)
Table model_fields:
id - pk
model_id - foreign key to models.id
instance_id - foreign key to model_instances.id
title - name of the field
type - enum [text, checkbox, radio, select, 'etc']
Table model_field_values:
instance_id - forein key model_instance.id
field_id - foreign key to model_fields.id
value - text
Also there can be many values for some field (like for multiple select dropdown)
The problem is: value is always text field, because I want to store different types of data (text, datetime, integer) and this table contains all values for all instances of all models.
For example, if I have 10 models and every model has 1000 instances with 10 fields then model_field_values (at minimum) would contain 100000 rows, if some fields are multiple, then it would contain (120000-150000 rows).
SQL's select using value field would be slow.
Solution 1:
For every model create new model_field_values like:
model.id = 1, model_field_values_1
...
model.id = 10, model_field_values_10
Solution 2:
Because model_fields contains all fields for model, we can create model_field_values like this
model_fields for model.id=1 (by primary key): 1 - text, 2 - integer, 3 - datetime, 4 - smalltext
Fields for model_field_values_1: field_1 text, field_2 integer, field_3 datetime, field_4 varchar(256)
This solution is not good for fields with multiple values, because every multiple value should have another table with link to the row in model_field_values_1, but it is good for searching through database because mysql would use native datatypes in where clauses (not text fields).
May be I miss something? May be there is a better design?
This database would be used in crm-system, where user can create different model with many instances in these models, so I can not preconfigure all tables with all columns.
Note: 200,000 rows (two tenths of a megarow) is, in the usual operation of MySQL, a medium sized table. It's generally possible to index such a table fairly efficiently. http://use-the-index-luke.com/
That being said, I think I understand your problem. It is, in the jargon of object-oriented design, polymorphism.
You have this model_field_value table, containing
instance_id
field_id
value
Your problem is, the value's native data type is sometimes VARCHAR(255), sometimes DATETIME or maybe TIMESTAMP, and sometimes INT.
And you'll sometimes need to do queries like this one
SELECT fv.instance_id
FROM model_field_value fv
WHERE fv.field_id = something
AND fv.value >= '2017-01-01'
AND fv.value < '2018-01-01'
to find DATETIME values that happened in calendar year 2017. For example.
This is generally a pain in the neck with key/value storage like what you need. For a query like my example to be sargable, you need to be able to put an index on a DATETIME column. But if you don't have such a column, you can't index it. Duh.
Here's a suggestion. Give your table these columns.
instance_id INT pk fk
field_id INT pk fk
value VARCHAR(255) a text representation of every value.
value_double DOUBLE a numeric representation of every numeric value, or NULL
value_ts TIMESTAMP a timestamp value if possible, or NULL
This table will contain redundant data, and you'll have to be very careful when you're writing it to make sure it's correct. But you will be able to put indexes on the value_ts and value_double columns, so you can make those kinds of queries sargable.
Just an idea.
We're developing a monitoring system. In our system values are reported by agents running on different servers. This observations reported can be values like:
A numeric value. e.g. "CPU USAGE" = 55. Meaning 55% of the CPU is in
use).
Certain event was fired. e.g. "Backup completed".
Status: e.g. SQL Server is offline.
We want to store this observations (which are not know in advance and will be added dynamically to the system without recompiling).
We are considering adding different columns to the observations table like this:
IntMeasure -> INTEGER
FloatMeasure -> FLOAT
Status -> varchar(255)
So if the value we whish to store is a number we can use IntMeasure or FloatMeasure according to the type. If the value is a status we can store the status literal string (or a status id if we decide to add a Statuses(id, name) table).
We suppose it's possible to have a more correct design but would probably become to slow and dark due to joins and dynamic table names depending on types? How would a join work if we can't specify the tables in advance in the query?
I haven't done a formal study, but from my own experience I would guess that more than 80% of database design flaws are generated from designing with performance as the most important (if not only) consideration.
If a good design calls for multiple tables, create multiple tables. Don't automatically assume that joins are something to be avoided. They are rarely the true cause of performance problems.
The primary consideration, first and foremost in all stages of database design, is data integrity. "The answer may not always be correct, but we can get it to you very quickly" is not a goal any shop should be working toward. Once data integrity has been locked down, if performance ever becomes an issue, it can be addressed. Don't sacrifice data integrity, especially to solve problems that may not exist.
With that in mind, look at what you need. You have observations you need to store. These observations can vary in the number and types of attributes and can be things like the value of a measurement, the notification of an event and the change of a status, among others and with the possibility of future observations being added.
This would appear to fit into a standard "type/subtype" pattern, with the "Observation" entry being the type and each type or kind of observation being the subtype, and suggests some form of type indicator field such as:
create table Observations(
...,
ObservationKind char( 1 ) check( ObservationKind in( 'M', 'E', 'S' )),
...
);
But hardcoding a list like this in a check constraint has a very low maintainability level. It becomes part of the schema and can be altered only with DDL statements. Not something your DBA is going to look forward to.
So have the kinds of observations in their own lookup table:
ID Name Meaning
== =========== =======
M Measurement The value of some system metric (CPU_Usage).
E Event An event has been detected.
S Status A change in a status has been detected.
(The char field could just as well be int or smallint. I use char here for illustration.)
Then fill out the Observations table with a PK and the attributes that would be common to all observations.
create table Observations(
ID int identity primary key,
ObservationKind char( 1 ) not null,
DateEntered date not null,
...,
constraint FK_ObservationKind foreign key( ObservationKind )
references ObservationKinds( ID ),
constraint UQ_ObservationIDKind( ID, ObservationKind )
);
It may seem strange to create a unique index on the combination of Kind field and the PK, which is unique all by itself, but bear with me a moment.
Now each kind or subtype gets its own table. Note that each kind of observation gets a table, not the data type.
create table Measurements(
ID int not null,
ObservationKind char( 1 ) check( ObservationKind = 'M' ),
Name varchar( 32 ) not null, -- Such as "CPU Usage"
Value double not null, -- such as 55.00
..., -- other attributes of Measurement observations
constraint PK_Measurements primary key( ID, ObservationKind ),
constraint FK_Measurements_Observations foreign key( ID, ObservationKind )
references Observations( ID, ObservationKind )
);
The first two fields will be the same for the other kinds of observations except the check constraint will force the value to the appropriate kind. The other fields may differ in number, name and data type.
Let's examine an example tuple that may exist in the Measurements table:
ID ObservationKind Name Value ...
==== =============== ========= =====
1001 M CPU Usage 55.0 ...
In order for this tuple to exist in this table, a matching entry must first exist in the Observations table with an ID value of 1001 and an observation kind value of 'M'. No other entry with an ID value of 1001 can exist in either the Observations table or the Measurements table and cannot exist at all in any other of the "kind" tables (Events, Status). This works the same way for all the kind tables.
I would further recommend creating a view for each kind of observation which will provide a join of each kind with the main observation table:
create view MeasurementObservations as
select ...
from Observations o
join Measurements m
on m.ID = o.ID;
Any code that works solely with measurements would need to only hit this view instead of the underlying tables. Using views to create a wall of abstraction between the application code and the raw data greatly enhances the maintainability of the database.
Now the creation of another kind of observation, such as "Error", involves a simple Insert statement to the ObservationKinds table:
F Fault A fault or error has been detected.
Of course, you need to create a new table and view for these error observations, but doing so will have no impact on existing tables, views or application code (except, of course, to write the new code to work with the new observations).
Just create it as a VARCHAR
This will allow you to store whatever data you require in it. It is much more difficult to do queries based on the number in the field such as
Select * from table where MyVARCHARField > 50 //get CPU > 50
However if you think you want to do this, then either you need a field per item or a generalised table such as
Create Table
Description : Varchar
ValueType : Varchar //Can be String, Float, Int
ValueString: Varchar
ValueFloat: Float
ValueInt : Int
Then when you are filling the data you can put your value in the correct field and select like this.
Select Description ,ValueInt from table where Description like '%cpu%' and ValueInt > 50
I had a used two columns for a similar problem. First column was for data type and second value contained data as a Varchar.
First column had codes ( e.g. 1= integer, 2 = string, 3 = date and so on), which could be combined to compare values. ( e.g. find the max integer where type=1)
I did not have joins, but i think you can use this approach. It will also help you if tomorrow more data types are introduced.
In MySQL, I was advised to store the multiple choice options for "Drugs" as a separate table user_drug where each row is one of the options selected by a particular user. I was also advised to create a 3rd table drug that describes each option selected in table user_drug. Here is an example:
user
id name income
1 Foo 10000
2 Bar 20000
3 Baz 30000
drug
id name
1 Marijuana
2 Cocaine
3 Heroin
user_drug
user_id drug_id
1 1
1 2
2 1
2 3
3 3
As you can see, table user_drug can contain the multiple drugs selected by a particular user, and table drug tells you what drug each drug_id is referring to.
I was told a Foreign Key should tie tables user_drug and drug together, but I've never dealt with Foreign Key's so I'm not sure how to do that.
Wouldn't it be easier to get rid of the drug table and simply store the TEXT value of each drug in user_drug? Why or why not?
If adding the 3rd table drug is better, then how would I implement the Foreign Key structure, and how would I normally retrieve the respective values using those Foreign Keys?
(I find it far easier to use just 2 tables, but I've heard Foreign Keys are helpful in that they ensure a proper value is entered, and that it is also a lot faster to search and sort for a drug_id than a text value, so I want to be sure.)
Wouldn't it be easier to get rid of the drug table and simply store the TEXT value of each drug in user_drug? Why or why not?
Easier, yes.
But not better.
Your data would not be normalized, wasting lots of space to store the table.
The index on that field would occupy way more space again wasting space and slowing things down.
If you want to query a drop-down list of possible values, that's trivial with a separate table, hard (read: slow) with just text in a field.
If you just drop text fields in 1 table, it's hard to ensure misspellings do not get in there, with a separate link table preventing misspellings is easy.
If adding the 3rd table drug is better, then how would I implement the Foreign Key structure
ALTER TABLE user_drug ADD FOREIGN KEY fk_drug(drug_id) REFERENCES drug(id);
and how would I normally retrieve the respective values using those Foreign Keys?
SELECT u.name, d.name as drug
FROM user u
INNER JOIN user_drug ud ON (ud.user_id = u.id)
INNER JOIN drug d ON (d.id = ud.drug_id)
Don't forget to declare the primary key for table user_drug as
PRIMARY KEY (user_id, drug_id)
Alternatively
You can use an enum
CREATE TABLE example (
id UNSIGNED INTEGER NOT NULL PRIMARY KEY AUTO_INCREMENT,
example ENUM('value1','value2','value3'),
other_fields .....
You don't get all the benefits of a separate table, but if you just want a few values (e.g. yes/no or male/female/unknown) and you want to make sure it's limited to only those values it's a good compromise.
And much more self documenting and robust than magic constants (1=male, 2=female, 3= unknown,... but what happens if we insert 4?)
Wouldn't it be easier to get rid of the drug table and simply store
the TEXT value of each drug in user_drug? Why or why not?
Normally, you'd have lots of other columns on the drug table -- things like description, medical information, chemical properties, etc. In that case, you wouldn't want to duplicate all of that information on every record of the user_drug table. In this particular case however, you've only got one column, so that issue is not really a big deal.
Also, you want to be sure that the drug referenced in the user_drug table actually exists. For example, if you store the field as text, then you could have heroin and its related misspellings like haroin or herion. This will give you problems when you try to select all heroin records later. Using a foreign key to a lookup table forces the id to exist in that table, so you can be absolutely sure that all references to heroin are accurate.
I'm building this tool for classifying data. Basically I will be regularly receiving rows of data in a flat-file that look like this:
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
And I have a list of categories to break these rows up into, for example:
Original Cat1 Cat2 Cat3 Cat4 Cat5
---------------------------------------
a:b:c:d:e a b c d e
As of right this second, there category names are known, as well as number of categories to break the data down by. But this might change over time (for instance, categories added/removed...total number of categories changed).
Okay so I'm not really looking for help on how to parse the rows or get data into a db or anything...I know how to do all that, and have the core script mostly written already, to handle parsing rows of values and separating into variable amount of categories.
Mostly I'm looking for advice on how to structure my database to store this stuff. So I've been thinking about it, and this is what I came up with:
Table: Generated
generated_id int - unique id for each row generated
generated_timestamp datetime - timestamp of when row was generated
last_updated datetime - timestamp of when row last updated
generated_method varchar(6) - method in which row was generated (manual or auto)
original_string varchar (255) - the original string
Table: Categories
category_id int - unique id for category
category_name varchar(20) - name of category
Table: Category_Values
category_map_id int - unique id for each value (not sure if I actually need this)
category_id int - id value to link to table Categories
generated_id int - id value to link to table Generated
category_value varchar (255) - value for the category
Basically the idea is when I parse a row, I will insert a new entry into table Generated, as well as X entries in table Category_Values, where X is however many categories there currently are. And the category names are stored in another table Categories.
What my script will immediately do is process rows of raw values and output the generated category values to a new file to be sent somewhere. But then I have this db I'm making to store the data generated so that I can make another script, where I can search for and list previously generated values, or update previously generated entries with new values or whatever.
Does this look like an okay database structure? Anything obvious I'm missing or potentially gimping myself on? For example, with this structure...well...I'm not a sql expert, but I think I should be able to do like
select * from Generated where original_string = '$string'
// id is put into $id
and then
select * from Category_Values where generated_id = '$id'
...and then I'll have my data to work with for search results or form to alter data...well I'm fairly certain I can even combine this into one query with a join or something but I'm not that great with sql so I don't know how to actually do that..but point is, I know I can do what I need from this db structure..but am I making this harder than it needs to be? Making some obvious noob mistake?
My suggestion:
Table: Generated
id unsigned int autoincrement primary key
generated_timestamp timestamp
last_updated timestamp default '0000-00-00' ON UPDATE CURRENT_TIMESTAMP
generated_method ENUM('manual','auto')
original_string varchar (255)
Table: Categories
id unsigned int autoincrement primary key
category_name varchar(20)
Table: Category_Values
id unsigned int autoincrement primary key
category_id int
generated_id int
category_value varchar (255) - value for the category
FOREIGN KEY `fk_cat`(category_id) REFERENCES category.id
FOREIGN KEY `fk_gen`(generated_id) REFERENCES generated.id
Links
Timestamps: http://dev.mysql.com/doc/refman/5.1/en/timestamp.html
Create table syntax: http://dev.mysql.com/doc/refman/5.1/en/create-table.html
Enums: http://dev.mysql.com/doc/refman/5.1/en/enum.html
I think this solution is perfect for what you want to do. The Categories list is now flexible so that you can add new categories or retire old ones (I would recommend thinking long and hard about it before agreeing to delete a category - would you orphan record or remove them too, etc.)
Basically, I'm saying you are right on target. The structure is simple but it will work well for you. Great job (and great job giving exactly the right amount of information in the question).