It's been a while since I've worked with databases, so I need some advice about what I'm doing and thinking. Lately I've been involved in a project where the primary keys are custom generated. This is the setup: in the supply table, the supply ID should be a varchar. Why? They want to be able to tell when the item was purchased. Sample:
supply table
supply id: ( sup-4/23/2017-123456 ) Explanation: "sup" for supply, 4/23/2017 for the current date, 123456 a random 6-digit number
property table
property id ( 123456-123456-123456 ) Explanation: the first 6 digits are the general ID from the accounting system, the second 6 digits are a specific ID based on the general ID, and the last 6 digits are random.
What I have considered:
I will create an ID primary key with a varchar data type (see the first image).
Or I create a supply_id primary key as an int and a separate supply_code column with a varchar data type (see the second image).
I need some advice on which of the two is right. I suspect each of my solutions has its own problems. I have always used a standard primary key such as an INT, and now I'm working with a custom primary key.
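For reference, a minimal sketch of the second option, assuming MySQL syntax (the column sizes and names here are only illustrative, not the project's actual schema):

CREATE TABLE supply (
    supply_id   INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- surrogate key used for joins
    supply_code VARCHAR(30)  NOT NULL,                  -- business code, e.g. 'sup-4/23/2017-123456'
    PRIMARY KEY (supply_id),
    UNIQUE KEY uq_supply_code (supply_code)             -- the business identifier stays unique
);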
Related
I have this design.
Table models:
id - primary key
title - varchar(256)
Table model_instances:
id - primary key
model_id - foreign key to models.id
title - varchar(256)
Table model_fields:
id - pk
model_id - foreign key to models.id
instance_id - foreign key to model_instances.id
title - name of the field
type - enum [text, checkbox, radio, select, 'etc']
Table model_field_values:
instance_id - foreign key to model_instances.id
field_id - foreign key to model_fields.id
value - text
Also, some fields can have many values (e.g. a multiple-select dropdown).
The problem is: value is always a text field, because I want to store different types of data (text, datetime, integer), and this table contains all values for all instances of all models.
For example, if I have 10 models and every model has 1,000 instances with 10 fields, then model_field_values would contain at least 100,000 rows; if some fields are multi-valued, it would contain 120,000-150,000 rows.
A SQL SELECT filtering on the value field would be slow.
Solution 1:
For every model, create a new model_field_values table, like:
model.id = 1, model_field_values_1
...
model.id = 10, model_field_values_10
Solution 2:
Because model_fields contains all fields for a model, we can create model_field_values tables like this:
model_fields for model.id=1 (by primary key): 1 - text, 2 - integer, 3 - datetime, 4 - smalltext
Fields for model_field_values_1: field_1 text, field_2 integer, field_3 datetime, field_4 varchar(256)
This solution is not good for fields with multiple values, because every multi-valued field needs another table linking back to the row in model_field_values_1, but it is good for searching because MySQL can use native data types in WHERE clauses (not text fields).
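A minimal sketch of such a per-model table, assuming MySQL, the field layout above, and an INT id on model_instances (the table and column names are only illustrative):

CREATE TABLE model_field_values_1 (
    instance_id INT NOT NULL,      -- one row per instance of model 1
    field_1     TEXT,              -- model_fields.id = 1 (text)
    field_2     INT,               -- model_fields.id = 2 (integer)
    field_3     DATETIME,          -- model_fields.id = 3 (datetime)
    field_4     VARCHAR(256),      -- model_fields.id = 4 (small text)
    PRIMARY KEY (instance_id),
    FOREIGN KEY (instance_id) REFERENCES model_instances (id)
);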
Maybe I'm missing something? Maybe there is a better design?
This database would be used in a CRM system, where users can create different models with many instances, so I cannot preconfigure all tables with all columns.
Note: 200,000 rows (two tenths of a megarow) is, in the usual operation of MySQL, a medium sized table. It's generally possible to index such a table fairly efficiently. http://use-the-index-luke.com/
That being said, I think I understand your problem. It is, in the jargon of object-oriented design, polymorphism.
You have this model_field_value table, containing
instance_id
field_id
value
Your problem is, the value's native data type is sometimes VARCHAR(255), sometimes DATETIME or maybe TIMESTAMP, and sometimes INT.
And you'll sometimes need to do queries like this one
SELECT fv.instance_id
FROM model_field_value fv
WHERE fv.field_id = something
AND fv.value >= '2017-01-01'
AND fv.value < '2018-01-01'
to find DATETIME values that happened in calendar year 2017. For example.
This is generally a pain in the neck with key/value storage like what you need. For a query like my example to be sargable, you need to be able to put an index on a DATETIME column. But if you don't have such a column, you can't index it. Duh.
Here's a suggestion. Give your table these columns.
instance_id INT pk fk
field_id INT pk fk
value VARCHAR(255) a text representation of every value.
value_double DOUBLE a numeric representation of every numeric value, or NULL
value_ts TIMESTAMP a timestamp value if possible, or NULL
This table will contain redundant data, and you'll have to be very careful when you're writing it to make sure it's correct. But you will be able to put indexes on the value_ts and value_double columns, so you can make those kinds of queries sargable.
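A minimal sketch of that table in MySQL, using the column list above plus the indexes that make those queries sargable:

CREATE TABLE model_field_value (
    instance_id  INT NOT NULL,
    field_id     INT NOT NULL,
    value        VARCHAR(255),     -- text representation of every value
    value_double DOUBLE NULL,      -- numeric representation of numeric values, or NULL
    value_ts     TIMESTAMP NULL,   -- timestamp representation if possible, or NULL
    PRIMARY KEY (instance_id, field_id),
    KEY idx_field_value_double (field_id, value_double),
    KEY idx_field_value_ts (field_id, value_ts)
);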
Just an idea.
We're developing a monitoring system. In our system, values are reported by agents running on different servers. The observations reported can be values like:
A numeric value, e.g. "CPU USAGE" = 55 (meaning 55% of the CPU is in use).
A certain event was fired, e.g. "Backup completed".
A status, e.g. SQL Server is offline.
We want to store these observations (which are not known in advance and will be added dynamically to the system without recompiling).
We are considering adding different columns to the observations table like this:
IntMeasure -> INTEGER
FloatMeasure -> FLOAT
Status -> varchar(255)
So if the value we wish to store is a number, we can use IntMeasure or FloatMeasure according to the type. If the value is a status, we can store the status literal string (or a status id, if we decide to add a Statuses(id, name) table).
We suppose a more correct design is possible, but wouldn't it probably become too slow and murky due to joins and dynamic table names depending on the types? How would a join work if we can't specify the tables in advance in the query?
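For reference, a rough sketch of the single-table approach we are considering (the column names are placeholders, and SQL Server syntax is assumed):

CREATE TABLE Observations (
    Id           INT IDENTITY PRIMARY KEY,
    Name         VARCHAR(255) NOT NULL,   -- e.g. 'CPU USAGE', 'Backup completed'
    ObservedAt   DATETIME NOT NULL,
    IntMeasure   INT NULL,                -- populated when the value is an integer
    FloatMeasure FLOAT NULL,              -- populated when the value is a float
    Status       VARCHAR(255) NULL        -- populated when the value is a status string
);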
I haven't done a formal study, but from my own experience I would guess that more than 80% of database design flaws are generated from designing with performance as the most important (if not only) consideration.
If a good design calls for multiple tables, create multiple tables. Don't automatically assume that joins are something to be avoided. They are rarely the true cause of performance problems.
The primary consideration, first and foremost in all stages of database design, is data integrity. "The answer may not always be correct, but we can get it to you very quickly" is not a goal any shop should be working toward. Once data integrity has been locked down, if performance ever becomes an issue, it can be addressed. Don't sacrifice data integrity, especially to solve problems that may not exist.
With that in mind, look at what you need. You have observations you need to store. These observations can vary in the number and types of attributes and can be things like the value of a measurement, the notification of an event and the change of a status, among others and with the possibility of future observations being added.
This would appear to fit into a standard "type/subtype" pattern, with the "Observation" entry being the type and each type or kind of observation being the subtype, and suggests some form of type indicator field such as:
create table Observations(
...,
ObservationKind char( 1 ) check( ObservationKind in( 'M', 'E', 'S' )),
...
);
But hardcoding a list like this in a check constraint has a very low maintainability level. It becomes part of the schema and can be altered only with DDL statements. Not something your DBA is going to look forward to.
So have the kinds of observations in their own lookup table:
ID Name Meaning
== =========== =======
M Measurement The value of some system metric (CPU_Usage).
E Event An event has been detected.
S Status A change in a status has been detected.
(The char field could just as well be int or smallint. I use char here for illustration.)
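A sketch of that lookup table itself (the column names are assumptions, chosen to match the foreign key used below):

create table ObservationKinds(
    ID      char( 1 )      primary key,
    Name    varchar( 32 )  not null,
    Meaning varchar( 128 ) not null
);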
Then fill out the Observations table with a PK and the attributes that would be common to all observations.
create table Observations(
ID int identity primary key,
ObservationKind char( 1 ) not null,
DateEntered date not null,
...,
constraint FK_ObservationKind foreign key( ObservationKind )
references ObservationKinds( ID ),
constraint UQ_ObservationIDKind unique( ID, ObservationKind )
);
It may seem strange to create a unique index on the combination of Kind field and the PK, which is unique all by itself, but bear with me a moment.
Now each kind or subtype gets its own table. Note that each kind of observation gets a table, not each data type.
create table Measurements(
ID int not null,
ObservationKind char( 1 ) check( ObservationKind = 'M' ),
Name varchar( 32 ) not null, -- Such as "CPU Usage"
Value double not null, -- such as 55.00
..., -- other attributes of Measurement observations
constraint PK_Measurements primary key( ID, ObservationKind ),
constraint FK_Measurements_Observations foreign key( ID, ObservationKind )
references Observations( ID, ObservationKind )
);
The first two fields will be the same for the other kinds of observations except the check constraint will force the value to the appropriate kind. The other fields may differ in number, name and data type.
Let's examine an example tuple that may exist in the Measurements table:
ID ObservationKind Name Value ...
==== =============== ========= =====
1001 M CPU Usage 55.0 ...
In order for this tuple to exist in this table, a matching entry must first exist in the Observations table with an ID value of 1001 and an observation kind of 'M'. No other entry with an ID value of 1001 can exist in the Observations table or the Measurements table, and none can exist at all in any of the other "kind" tables (Events, Statuses). This works the same way for all the kind tables.
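To make that concrete, a hedged example of how such a tuple could be created (SQL Server syntax assumed, since the DDL above uses identity):

-- Parent row first: establishes the ID and fixes the kind as 'M'.
insert into Observations( ObservationKind, DateEntered )
values( 'M', '2017-06-01' );

-- Child row reuses the generated ID together with the matching kind.
insert into Measurements( ID, ObservationKind, Name, Value )
values( scope_identity(), 'M', 'CPU Usage', 55.0 );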
I would further recommend creating a view for each kind of observation which will provide a join of each kind with the main observation table:
create view MeasurementObservations as
select ...
from Observations o
join Measurements m
on m.ID = o.ID;
Any code that works solely with measurements would need to only hit this view instead of the underlying tables. Using views to create a wall of abstraction between the application code and the raw data greatly enhances the maintainability of the database.
Now the creation of another kind of observation, such as "Error", involves a simple Insert statement to the ObservationKinds table:
F Fault A fault or error has been detected.
Of course, you need to create a new table and view for these error observations, but doing so will have no impact on existing tables, views or application code (except, of course, to write the new code to work with the new observations).
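A sketch of that new subtype table and its view, following the same pattern (the Message column is an assumption):

create table Faults(
    ID int not null,
    ObservationKind char( 1 ) check( ObservationKind = 'F' ),
    Message varchar( 255 ) not null,   -- description of the fault or error
    constraint PK_Faults primary key( ID, ObservationKind ),
    constraint FK_Faults_Observations foreign key( ID, ObservationKind )
        references Observations( ID, ObservationKind )
);

create view FaultObservations as
select o.ID, o.DateEntered, f.Message
from Observations o
join Faults f
    on f.ID = o.ID;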
Just create it as a VARCHAR
This will allow you to store whatever data you require in it. However, it will be much more difficult to do queries based on the number in the field, such as
Select * from table where MyVARCHARField > 50 -- get CPU > 50
However, if you think you will want to do this, then you either need a field per item or a generalised table such as:
Create Table
    Description : Varchar
    ValueType   : Varchar -- can be String, Float, Int
    ValueString : Varchar
    ValueFloat  : Float
    ValueInt    : Int
Then when you are inserting the data you can put each value in the correct field and select like this:
Select Description ,ValueInt from table where Description like '%cpu%' and ValueInt > 50
I used two columns for a similar problem. The first column held the data type and the second contained the data as a varchar.
The first column held type codes (e.g. 1 = integer, 2 = string, 3 = date, and so on), which could be combined with the value to run comparisons (e.g. find the max integer where type = 1).
I did not have joins, but I think you can use this approach. It will also help you if more data types are introduced tomorrow.
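A minimal sketch of that kind of comparison in MySQL (the table and column names here are hypothetical):

-- Find the maximum integer value: filter by the type code, then cast the varchar.
SELECT MAX(CAST(data_value AS SIGNED)) AS max_int
FROM typed_values
WHERE data_type = 1;   -- 1 = integer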
Is it ok to use data type varbinary for foreign keys?
Why?
I have an EvalAnswer table with a FK to a Score table.
The score is sensitive and should be encrypted. The encrypt/decrypt happens in the asp.net (4.0) project and not in sql server (2008), so the data type needs to be varbinary.
EDIT: more info
Of course.
I have these columns: Id, Score, ScoreText, Description, Index
The Id is an incremental counter. (PK)
The Score is the score as a number (such as 1).
The ScoreText is the score as a letter (Score 1 equals letter A).
The Description is a comment for every score.
The reason I have it like this is also that there are special situations, such as one of the questions only having scoring from 1-4 while the rest have 1-5.
So every question has a score of 1, but the description differs from another question's score of 1.
So if I have 5 questions, this gives 5*5 = 25 rows in the Score table (all with different descriptions).
On page load I get the correct scoring (with description) for every dropdown list, normally 1-5.
But when the user has saved the scoring, I need to know the earlier saved score for every question on page load.
Therefore I have a relation between EvalAnswer and the scoring.
There are questions with a relation to the Score table that are NOT sensitive.
But some are, and for those I need to hide the relation between EvalAnswer and Score.
What might be a bad design is the fact that I use the same table (the Score table) both to show the available scoring for every question and to hold what the user has chosen (this is the FK from EvalAnswer to Score).
Please advise.
I suggest adding an ID int column to the Score table and reference this field from the EvalAnswer table.
This means your table scripts will change to
CREATE TABLE Score (
Id int not null identity primary key
, Code varbinary(max) --> The new field containing the encrypted information
, Score int
, ScoreText varchar(5)
, Description varchar(max)
, Index int)
CREATE TABLE EvalAnswer (
Id int not null identity,
ScoreId int not null references Score(Id)
...
)
As you can see, the "old" Id field has now become the Code field. The new Id field is an identity column containing a unique number.
There is nothing against using a varbinary column in a foreign key, but it will make querying and debugging much harder.
Also note there is a 900-byte limit on the width of an index key, which you might easily hit when storing an encrypted blob.
I'm building this tool for classifying data. Basically I will be regularly receiving rows of data in a flat-file that look like this:
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
a:b:c:d:e
And I have a list of categories to break these rows up into, for example:
Original Cat1 Cat2 Cat3 Cat4 Cat5
---------------------------------------
a:b:c:d:e a b c d e
As of right this second, the category names are known, as well as the number of categories to break the data down into. But this might change over time (for instance, categories added/removed, the total number of categories changed).
Okay, so I'm not really looking for help on how to parse the rows or get data into a db or anything... I know how to do all that, and have the core script mostly written already, to handle parsing rows of values and separating them into a variable number of categories.
Mostly I'm looking for advice on how to structure my database to store this stuff. So I've been thinking about it, and this is what I came up with:
Table: Generated
generated_id int - unique id for each row generated
generated_timestamp datetime - timestamp of when row was generated
last_updated datetime - timestamp of when row last updated
generated_method varchar(6) - method in which row was generated (manual or auto)
original_string varchar (255) - the original string
Table: Categories
category_id int - unique id for category
category_name varchar(20) - name of category
Table: Category_Values
category_map_id int - unique id for each value (not sure if I actually need this)
category_id int - id value to link to table Categories
generated_id int - id value to link to table Generated
category_value varchar (255) - value for the category
Basically the idea is when I parse a row, I will insert a new entry into table Generated, as well as X entries in table Category_Values, where X is however many categories there currently are. And the category names are stored in another table Categories.
What my script will immediately do is process rows of raw values and output the generated category values to a new file to be sent somewhere. But then I have this db I'm making to store the data generated so that I can make another script, where I can search for and list previously generated values, or update previously generated entries with new values or whatever.
Does this look like an okay database structure? Anything obvious I'm missing or potentially gimping myself on? For example, with this structure...well...I'm not a sql expert, but I think I should be able to do like
select * from Generated where original_string = '$string'
// id is put into $id
and then
select * from Category_Values where generated_id = '$id'
...and then I'll have my data to work with for search results or a form to alter data. I'm fairly certain I can even combine this into one query with a join or something, but I'm not that great with SQL, so I don't know how to actually do that. But the point is, I know I can do what I need with this db structure. Am I making this harder than it needs to be? Making some obvious noob mistake?
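For what it's worth, a sketch of that combined query over the tables described above (column names taken from the question):

SELECT g.generated_id, g.original_string, c.category_name, cv.category_value
FROM Generated g
JOIN Category_Values cv ON cv.generated_id = g.generated_id
JOIN Categories c ON c.category_id = cv.category_id
WHERE g.original_string = '$string';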
My suggestion:
Table: Generated
id int unsigned auto_increment primary key
generated_timestamp timestamp
last_updated timestamp default '0000-00-00 00:00:00' ON UPDATE CURRENT_TIMESTAMP
generated_method ENUM('manual','auto')
original_string varchar (255)
Table: Categories
id int unsigned auto_increment primary key
category_name varchar(20)
Table: Category_Values
id int unsigned auto_increment primary key
category_id int
generated_id int
category_value varchar (255) - value for the category
FOREIGN KEY `fk_cat` (category_id) REFERENCES Categories (id)
FOREIGN KEY `fk_gen` (generated_id) REFERENCES Generated (id)
Links
Timestamps: http://dev.mysql.com/doc/refman/5.1/en/timestamp.html
Create table syntax: http://dev.mysql.com/doc/refman/5.1/en/create-table.html
Enums: http://dev.mysql.com/doc/refman/5.1/en/enum.html
I think this solution is perfect for what you want to do. The Categories list is now flexible, so you can add new categories or retire old ones (I would recommend thinking long and hard before agreeing to delete a category - would you orphan its records or remove them too, etc.).
Basically, I'm saying you are right on target. The structure is simple but it will work well for you. Great job (and great job giving exactly the right amount of information in the question).
I'm very new to databases and I have a quick question.
How would I design my MySQL database if I have these fields:
ID,
lat,
long,
date - multiple dates,
time - multiple times
I know I should put it into two tables, right? And how would those two tables look?
Thanks!
Your first table might be called "location" and it would have an "id" column as its primary key, along with two columns called "latitude" and "longitude" (which could be varchar or a numeric type, depending on what your application requires). Your second table might be called "location_event" and it could have an "id" column as its primary key, along with a foreign key column called "location_id" that is a reference to the primary key of the "location" table. This "location_event" table would also have a "date" column and a "time" column (of types date and time respectively).
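A minimal sketch of that layout, assuming MySQL and numeric coordinate columns (the names follow the description above):

CREATE TABLE location (
    id        INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    latitude  DECIMAL(9,6) NOT NULL,
    longitude DECIMAL(9,6) NOT NULL
);

CREATE TABLE location_event (
    id          INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    location_id INT NOT NULL,
    date        DATE NOT NULL,
    time        TIME NOT NULL,
    FOREIGN KEY (location_id) REFERENCES location (id)
);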
It's hard to tell what you're trying to do from the terse description but third normal form dictates that any column should be dependent on:
the key.
the whole key.
nothing but the key.
To that end, I'd say my initial analysis would generate:
Location
LocId primary key
Lat
Long
Events
LocId foreign key Location(LocId)
Date
Time
This is based on my (possibly flawed) analysis that you want to store a location at which zero or more events can happen.
It's good practice to put the events in a separate table since the alternative is to have arrays of columns which is never a good idea.
As far as I can guess, the date and time are a couple that always appear together. In that case I would suggest two tables, location and time.
CREATE TABLE location (
id INT NOT NULL,
lat FLOAT NOT NULL,
`long` FLOAT NOT NULL, -- backticks because LONG is a reserved word in MySQL
PRIMARY KEY (id)
)
CREATE TABLE time (
id INT NOT NULL,
locationid INT NOT NULL,
date DATE NOT NULL,
time TIME NOT NULL,
PRIMARY KEY (id)
)
Optionally you can add a foreign key constraint
ALTER TABLE time ADD CONSTRAINT location_fk_constraint FOREIGN KEY location_fk_constraint (locationid)
REFERENCES location (id)
ON DELETE CASCADE
ON UPDATE CASCADE;
OK, let's say, for the sake of argument, that you are talking about longitude and latitude. That these are entries in some kind of list (perhaps a sea log? Arrgh, me maties!) of longitude and latitude. And that each of these long/lat pairs may appear more than once in the list.
Perhaps you want to build a database that figures out how many appearances each long/lat pair has, and when each appearance happened?
So how's this: First we have a table of the long/lat pairs, and we'll give each of those an ID.
ID long lat
-- ----- -----
1 11111 22222
2 33333 44444
3 55555 66666
Next, we'll have another table, which will assign each appearance of the long/lat pairs a date/time:
ID date time
-- ---- -----
1 1/1/1900 12:30
1 2/2/1900 12:31
1 3/2/1900 12:30
2 1/1/1930 08:21
Let's say you'll call the first table "longlat" and the second one "appearances".
You could find all the appearances of a single long/lat pair by doing something like:
SELECT date,time FROM appearances
LEFT JOIN longlat ON appearances.ID=longlat.ID
WHERE longlat.long = 11111 AND longlat.lat = 22222
You could count how many times something happened at a longitude of 11111, by doing:
SELECT count(appearances.ID) FROM appearances
LEFT JOIN longlat ON appearances.ID=longlat.ID
WHERE longlat.long = 11111
Hope that helps! I gotta admit, it's really quite annoying to try and guess what people mean... Try making yourself more clear in the future, and you'll see that the help you'll get will be that much more useful, concise and targeted at what you need.
Good luck!