When to replace a database column with an ID instead - mysql

I'm helping a friend design a database but I'm curious if there is a general rule of thumb for the following:
TABLE_ORDER
OrderNumber
OrderType
The column OrderType can only take values from a preset list of order types. Should I allow VARCHAR values in the OrderType column (e.g. Production Order, Sales Order, etc.)? Or should I separate it out into another table and reference it from TABLE_ORDER as a foreign key, like the following?:
TABLE_ORDER
OrderNumber
OrderTypeID
TABLE_ORDER_TYPE
ID
OrderType

If the order type list is fixed and will not change, you could opt not to make a separate table. But in that case, do not make it a VARCHAR; make it an ENUM.
MySQL can index this better, and you end up with arguably the same kind of database as when you use an ID with a lookup table.
But if there is any chance at all that you will need to add types, go for the second option. With a lookup table you can add an admin interface later, and you can easily build "get all types" kinds of pages, etc.
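As a sketch of the ENUM option, using the column names from the question (the two type values shown are assumptions):

```sql
CREATE TABLE TABLE_ORDER (
    OrderNumber INT PRIMARY KEY,
    -- MySQL stores an ENUM as a small integer internally (1 byte for
    -- up to 255 values) while accepting and displaying the strings
    OrderType ENUM('Production Order', 'Sales Order') NOT NULL
);
```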

I would say use another table, say "ReferenceCodes", for example:
Type, Name, Description, Code
Then you can just use the Code throughout the database and need not worry about the name associated with that code. If you use a name (for example, the order type in your case), it would be really difficult to change that name later on. This is what we actually do in our system.
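A minimal sketch of such a ReferenceCodes table (column sizes and the primary key choice are assumptions):

```sql
CREATE TABLE ReferenceCodes (
    Type        VARCHAR(30)  NOT NULL,  -- e.g. 'ORDER_TYPE'
    Name        VARCHAR(50)  NOT NULL,  -- e.g. 'Production Order'
    Description VARCHAR(255),
    Code        INT          NOT NULL,
    PRIMARY KEY (Type, Code)
);
-- Order rows store only the Code, so the displayed Name
-- can later be changed without touching any order data.
```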

In a perfect world, any column that would otherwise repeat the same values over and over should be an ID or an ENUM. This helps keep the data internally consistent, and it can reduce database size as well as speed up queries.
For something like this structure, I would probably create a master_object table that you could use for multiple types. OrderType would reference the master_object table. You could then use the same table for other data. For example, let's say you had another table - Payments, with a column of PaymentType. You could use the master_object table to also store the values and meta-data for that column. This gives you quite a bit of flexibility without forcing you to create a bunch of small tables, each containing 2-10 rows.
Brian
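One way the shared master_object table described above could be laid out (a sketch; all names are assumptions):

```sql
CREATE TABLE master_object (
    id          INT AUTO_INCREMENT PRIMARY KEY,
    object_type VARCHAR(30) NOT NULL,  -- e.g. 'ORDER_TYPE', 'PAYMENT_TYPE'
    name        VARCHAR(50) NOT NULL,
    UNIQUE (object_type, name)
);
-- TABLE_ORDER.OrderTypeID and Payments.PaymentTypeID would both
-- reference master_object(id), avoiding many tiny lookup tables.
```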

If the list is small (fewer than 10 items, say), then you could model it as in your first option but add a column constraint to limit the inputs to the values in your list. This forces entries to belong to your list, but your list should not change often.
e.g. CHECK (order_type IN ('Val1', 'Val2', ..., 'Valn'))
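Spelled out as DDL, the constrained first option might look like this sketch (note that MySQL only enforces CHECK constraints from version 8.0.16 onward; older versions parse and silently ignore them):

```sql
CREATE TABLE TABLE_ORDER (
    OrderNumber INT PRIMARY KEY,
    order_type  VARCHAR(20) NOT NULL,
    CHECK (order_type IN ('Production Order', 'Sales Order'))
);
```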
If the list will ever change, if it is used in multiple tables, or if you are required to support multiple languages or meet any other design criterion that demands variability, then create your type table (you are always safe with this choice, which is why it is the most used).
You can collect all such tables into a 'codes' table that generalizes the concept
CREATE TABLE Codes (
Code_Class CHARACTER VARYING(30) NOT NULL,
Code_Name CHARACTER VARYING(30) NOT NULL,
Code_Value_1 CHARACTER VARYING(30),
Code_Value_2 CHARACTER VARYING(30),
Code_Value_3 CHARACTER VARYING(30),
CONSTRAINT PK_Codes PRIMARY KEY (Code_Class, Code_Name)
);
insert into codes ( code_class, code_name, code_value_1 )
values ( 'STATE', 'New York', 'NY' ),
       ( 'STATE', 'California', 'CA' ),
       ...;
You can then place an UPDATE/INSERT trigger on the table.column being changed that should be constrained to a list of states. Let's say an employee table has a column EMP_STATE to hold state short forms.
The trigger would simply call a select statement like
SELECT code_name
, code_value_1
INTO v_state_name, v_state_short_name
FROM codes
WHERE code_class = 'STATE'
AND code_value_1 = new.EMP_STATE;
if( not found ) then
raise( some error to fail the trigger and the insert );
end if;
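In MySQL trigger syntax, the pseudocode above might be rendered like this (the employee table and EMP_STATE column are as assumed in the answer; the trigger name is invented):

```sql
DELIMITER //
CREATE TRIGGER trg_check_emp_state
BEFORE INSERT ON employee
FOR EACH ROW
BEGIN
    IF NOT EXISTS (SELECT 1
                   FROM codes
                   WHERE code_class = 'STATE'
                     AND code_value_1 = NEW.EMP_STATE) THEN
        -- raise an error to fail the trigger, and therefore the INSERT
        SIGNAL SQLSTATE '45000'
            SET MESSAGE_TEXT = 'Unknown state code';
    END IF;
END//
DELIMITER ;
```

An equivalent BEFORE UPDATE trigger would be needed to cover updates as well.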
This can be extended to other types:
insert into codes ( code_class, code_name )
values ( 'ORDER_TYPE', 'Production' ),
       ( 'ORDER_TYPE', 'Sales' ),
       ...;
select code_name
into v_type_name
from codes
where code_class = 'ORDER_TYPE'
and code_name = 'Sales';
This last method, although generally applicable, can be over-used. It also has the downside that you cannot use different data types for the values (code_name and the code_value_* columns are all strings).
The general rule of thumb: create a 'TYPE' table (e.g. ORDER_TYPE) to hold the values you wish to constrain an attribute to, use an ID as the primary key, and use a single sequence to generate all such IDs (for all your 'TYPE' tables). The many TYPE tables may clutter your model, but the meaning will be clear to your developers (the ultimate goal).
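The recommended layout for this question would then be a sketch like the following. (MySQL itself has no sequence objects, so the "single sequence" would have to be emulated, e.g. with a counter table, or replaced by a per-table AUTO_INCREMENT.)

```sql
CREATE TABLE ORDER_TYPE (
    ID        INT PRIMARY KEY,          -- drawn from the shared sequence
    OrderType VARCHAR(30) NOT NULL UNIQUE
);
CREATE TABLE TABLE_ORDER (
    OrderNumber INT PRIMARY KEY,
    OrderTypeID INT NOT NULL,
    FOREIGN KEY (OrderTypeID) REFERENCES ORDER_TYPE(ID)
);
```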

Related

inserting multiple values into one row mySQL

How can I insert multiple values into one row?
My query
insert into table_RekamMedis values ('RM001', '1999-05-01', 'D01', 'Dr Zurmaini', 'S11', 'Tropicana', 'B01', 'Sulfa', '3dd1');
I can't insert two values into one row. Is there another way to do it?
I'm ignorant of the human language you use, so this is a guess.
You have two entities in your system. One is dokter, the other is script (prescription). Your requirement is to store zero or more scripts for each dokter. That is, the relationship between your entities is one-to-many.
In a relational database management system (SQL system) you do that with two tables, one per entity. Your dokter table will contain a unique identifier for each doctor, and the doctor's descriptive attributes.
CREATE TABLE dokter(
dokter_id BIGINT AUTO_INCREMENT PRIMARY KEY NOT NULL,
nama VARCHAR (100),
kode VARCHAR(10),
/* others ... */
);
And you'll have a second table for script
CREATE TABLE script (
script_id BIGINT AUTO_INCREMENT PRIMARY KEY NOT NULL,
dokter_id BIGINT NOT NULL,
kode VARCHAR(10),
nama VARCHAR(100),
dosis VARCHAR(100),
/* others ... */
);
Then, when a doctor writes two prescriptions, you insert one row in dokter and two rows in script. You make the relationship between script and dokter by putting the correct dokter_id into each script row.
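For example, the data from the question's single-row INSERT could be split across the two tables like this (the second prescription row, and its dosage, are invented for illustration; the dokter_id value assumes this is the first doctor inserted):

```sql
INSERT INTO dokter (nama, kode) VALUES ('Dr Zurmaini', 'D01');
-- suppose the auto-generated dokter_id is 1
INSERT INTO script (dokter_id, kode, nama, dosis)
VALUES (1, 'B01', 'Sulfa',   '3dd1'),
       (1, 'B02', 'Anymiem', '1dd1');
```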
Then you can retrieve this information with a query like this:
SELECT dokter.dokter_id, dokter.nama, dokter.kode,
script.script_id, script.kode, script.nama, script.dosis
FROM dokter
LEFT JOIN script ON dokter.dokter_id = script.dokter_id
Study up on entity-relationship data design. It's worth your time to learn and will enhance your career immeasurably.
You can't store multiple values in a single field but there are various options to achieve what you're looking for.
If you know that a given field can only have a set number of values then it might make sense to simply create multiple columns to hold these values. In your case, perhaps Nama obat only ever has 2 different values so you could break out that column into two columns: Nama obat primary and Nama obat secondary.
But if a given field could have any amount of values, then it would likely make sense to create a table to hold those values so that it looks something like:
NoRM    NamaObat
=====   ========
RM001   Sulfa
RM001   Anymiem
RM001   ABC
RM002   XYZ
And then you can combine that with your original table with a simple join:
SELECT * FROM table_RekamMedis JOIN table_NamaObat ON table_RekamMedis.NoRM = table_NamaObat.NoRM
The above takes care of storing the data. If you then want to query the data such that the results are presented in the way you laid out in your question, you could combine the multiple NamaObat fields into a single field using GROUP_CONCAT which could look something like:
SELECT GROUP_CONCAT(NamaObat SEPARATOR '\n')
...
GROUP BY NoRM
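Put together, the whole query might look like this (using the table and column names sketched above; table_NamaObat is the assumed name of the new table):

```sql
SELECT r.NoRM,
       GROUP_CONCAT(n.NamaObat SEPARATOR '\n') AS NamaObat
FROM table_RekamMedis r
JOIN table_NamaObat n ON r.NoRM = n.NoRM
GROUP BY r.NoRM;
```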

char(2) or enum or tinyint in MySQL

I store CountryCode in my database, and I have only 5 options to store in the column CountryCode: "EG, AE, BH, QA, KW".
Should I use char(2) or tinyint or enum('EG', 'AE', 'BH', 'QA', 'KW'), and why?
Use the 2-letter standard country_codes.
And make it CHAR(2) CHARACTER SET ascii. And debate between ascii_bin (which disallows case folding) and ascii_general_ci (for case folding).
That would be 2 bytes.
ENUM and TINYINT UNSIGNED would be only one byte, but the total number of countries is dangerously close to 256. At that point you would need a 2-byte ENUM or SMALLINT.
An argument in favor of CHAR(2): It is human readable (mostly). And, if you need more info about each country (full name, population, etc), you can still have a table with PRIMARY KEY(country_code) and easily (and efficiently) JOIN when needed.
Your list of 5 country codes is too likely to grow and change; don't use ENUM.
In general, ENUM should be limited to very short lists that are unlikely to change. Also, consider starting the list with something like 'unknown' instead of making the field NULLable.
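A sketch of the CHAR(2) approach with the optional per-country detail table mentioned above (table and column names are assumptions):

```sql
CREATE TABLE countries (
    country_code CHAR(2) CHARACTER SET ascii NOT NULL PRIMARY KEY,
    full_name    VARCHAR(100) NOT NULL,
    population   BIGINT
);
CREATE TABLE orders (
    id           INT AUTO_INCREMENT PRIMARY KEY,
    country_code CHAR(2) CHARACTER SET ascii NOT NULL,
    -- efficient lookups and JOINs via the 2-byte code
    FOREIGN KEY (country_code) REFERENCES countries(country_code)
);
```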
If you're quite sure the list of accepted values is not going to grow, I would go with the ENUM to have cleaner values, avoiding faulty inputs like 'Bh', 'eg', 'kW', or the like.
ENUMs are fine, but there are drawbacks in terms of maintenance:
listing the allowed values requires accessing the definition of the table
adding new possible values to the list requires modifying the structure of the table
if more than one table has a CountryCode column, you need to re-create the same ENUM in each of them
So this should be used only in cases where the list is not meant to change over time and a single column uses it.
In all other cases, it is simpler to have a referential table that stores the values, and create foreign keys in the referencing table(s):
-- referential table
create table countries (countryCode varchar(2) primary key);
insert into countries values ('EG'), ('AE'), ('BH'), ('QA'), ('KW');
-- referencing table
create table mytable (
    id int, -- and/or other columns of the table ...
    countryCode varchar(2),
    foreign key (countryCode) references countries(countryCode)
);
(Note: MySQL parses but silently ignores an inline references clause on a column definition, so the table-level foreign key syntax above is required for the constraint to actually be enforced.)
With this technique, you get the full benefit and flexibility of foreign keys: easy maintenance, data integrity, possible indexing, nice options such as on delete cascade, and so on.

How to store a data whose type can be numeric, date or string in mysql

We're developing a monitoring system. In our system, values are reported by agents running on different servers. The observations reported can be values like:
A numeric value, e.g. "CPU USAGE" = 55 (meaning 55% of the CPU is in use).
A certain event was fired, e.g. "Backup completed".
A status, e.g. SQL Server is offline.
We want to store these observations (which are not known in advance and will be added dynamically to the system without recompiling).
We are considering adding different columns to the observations table like this:
IntMeasure -> INTEGER
FloatMeasure -> FLOAT
Status -> varchar(255)
So if the value we wish to store is a number, we can use IntMeasure or FloatMeasure according to the type. If the value is a status, we can store the status literal string (or a status id, if we decide to add a Statuses(id, name) table).
We suppose a more correct design is possible, but would it become too slow and opaque due to joins and dynamic table names depending on types? How would a join work if we can't specify the tables in advance in the query?
I haven't done a formal study, but from my own experience I would guess that more than 80% of database design flaws are generated from designing with performance as the most important (if not only) consideration.
If a good design calls for multiple tables, create multiple tables. Don't automatically assume that joins are something to be avoided. They are rarely the true cause of performance problems.
The primary consideration, first and foremost in all stages of database design, is data integrity. "The answer may not always be correct, but we can get it to you very quickly" is not a goal any shop should be working toward. Once data integrity has been locked down, if performance ever becomes an issue, it can be addressed. Don't sacrifice data integrity, especially to solve problems that may not exist.
With that in mind, look at what you need. You have observations you need to store. These observations can vary in the number and types of attributes and can be things like the value of a measurement, the notification of an event and the change of a status, among others and with the possibility of future observations being added.
This would appear to fit into a standard "type/subtype" pattern, with the "Observation" entry being the type and each type or kind of observation being the subtype, and suggests some form of type indicator field such as:
create table Observations(
...,
ObservationKind char( 1 ) check( ObservationKind in( 'M', 'E', 'S' )),
...
);
But hardcoding a list like this in a check constraint has a very low maintainability level. It becomes part of the schema and can be altered only with DDL statements. Not something your DBA is going to look forward to.
So have the kinds of observations in their own lookup table:
ID  Name         Meaning
==  ===========  =======
M   Measurement  The value of some system metric (CPU_Usage).
E   Event        An event has been detected.
S   Status       A change in a status has been detected.
(The char field could just as well be int or smallint. I use char here for illustration.)
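The lookup table itself, which the FK_ObservationKind constraint below references, might be created like this (a sketch; column sizes are assumptions):

```sql
CREATE TABLE ObservationKinds (
    ID      char(1)      PRIMARY KEY,
    Name    varchar(32)  NOT NULL,
    Meaning varchar(255)
);
INSERT INTO ObservationKinds VALUES
    ('M', 'Measurement', 'The value of some system metric (CPU_Usage).'),
    ('E', 'Event',       'An event has been detected.'),
    ('S', 'Status',      'A change in a status has been detected.');
```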
Then fill out the Observations table with a PK and the attributes that would be common to all observations.
create table Observations(
    ID int auto_increment primary key,
    ObservationKind char( 1 ) not null,
    DateEntered date not null,
    ...,
    constraint FK_ObservationKind foreign key( ObservationKind )
        references ObservationKinds( ID ),
    constraint UQ_ObservationIDKind unique( ID, ObservationKind )
);
It may seem strange to create a unique index on the combination of Kind field and the PK, which is unique all by itself, but bear with me a moment.
Now each kind or subtype gets its own table. Note that each kind of observation gets a table, not the data type.
create table Measurements(
ID int not null,
ObservationKind char( 1 ) check( ObservationKind = 'M' ),
Name varchar( 32 ) not null, -- Such as "CPU Usage"
Value double not null, -- such as 55.00
..., -- other attributes of Measurement observations
constraint PK_Measurements primary key( ID, ObservationKind ),
constraint FK_Measurements_Observations foreign key( ID, ObservationKind )
references Observations( ID, ObservationKind )
);
The first two fields will be the same for the other kinds of observations except the check constraint will force the value to the appropriate kind. The other fields may differ in number, name and data type.
Let's examine an example tuple that may exist in the Measurements table:
ID    ObservationKind  Name       Value  ...
====  ===============  =========  =====
1001  M                CPU Usage  55.0   ...
In order for this tuple to exist in this table, a matching entry must first exist in the Observations table with an ID value of 1001 and an observation kind value of 'M'. No other entry with an ID value of 1001 can exist in either the Observations table or the Measurements table and cannot exist at all in any other of the "kind" tables (Events, Status). This works the same way for all the kind tables.
I would further recommend creating a view for each kind of observation which will provide a join of each kind with the main observation table:
create view MeasurementObservations as
select ...
from Observations o
join Measurements m
on m.ID = o.ID;
Any code that works solely with measurements would need to only hit this view instead of the underlying tables. Using views to create a wall of abstraction between the application code and the raw data greatly enhances the maintainability of the database.
Now the creation of another kind of observation, such as "Error", involves a simple Insert statement to the ObservationKinds table:
F Fault A fault or error has been detected.
Of course, you need to create a new table and view for these error observations, but doing so will have no impact on existing tables, views or application code (except, of course, to write the new code to work with the new observations).
Just create it as a VARCHAR.
This will allow you to store whatever data you require in it. However, it then becomes much more difficult to run queries based on the number in the field, such as
SELECT * FROM my_table WHERE MyVARCHARField > 50 -- get CPU > 50
However, if you think you will want to do this, then you need either a field per item or a generalised table such as
CREATE TABLE my_table (
    Description VARCHAR(255),
    ValueType   VARCHAR(10),  -- can be 'String', 'Float' or 'Int'
    ValueString VARCHAR(255),
    ValueFloat  FLOAT,
    ValueInt    INT
);
Then when you are filling in the data you can put your value in the correct field, and select like this:
SELECT Description, ValueInt FROM my_table WHERE Description LIKE '%cpu%' AND ValueInt > 50;
I used two columns for a similar problem. The first column held the data type and the second held the data as a VARCHAR.
The first column held codes (e.g. 1 = integer, 2 = string, 3 = date, and so on), which could be combined with the value column to compare entries (e.g. find the max integer where type = 1).
I did not have joins, but I think you can use this approach. It will also help you if more data types are introduced tomorrow.
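A sketch of that two-column layout (table and column names are invented for illustration):

```sql
CREATE TABLE observed_values (
    value_type TINYINT      NOT NULL,  -- 1 = integer, 2 = string, 3 = date
    value      VARCHAR(255) NOT NULL
);
-- find the max integer value; a CAST is needed because
-- everything is stored as text
SELECT MAX(CAST(value AS SIGNED))
FROM observed_values
WHERE value_type = 1;
```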

MYSQL|Datatype for states

I need to figure out which datatype to use for states.
Should it be SET or VARCHAR or anything else?
CREATE TABLE actors(
state SET('USA', 'Germany', ...)
)
alternatively
CREATE TABLE actors(
state VARCHAR(30)
)
Assuming there are going to be tens or even hundreds of countries, it's best to use a separate table.
CREATE TABLE states(
state_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
name VARCHAR(30)
);
It's also recommended to add a foreign key on state_id, so that deleting a state from your database wouldn't break other data depending on it.
If each actor is going to be assigned only to one state (1:1), you can use column in the actors table.
CREATE TABLE actors(
actor_id INT ...,
state_id INT,
)
Or if each actor can be assigned to more states (1:N), use another table for these relations:
CREATE TABLE actors(
actor_id INT ...,
)
CREATE TABLE actors_to_states(
actor_id INT,
state_id INT
)
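Listing each actor together with their states then becomes a two-join query, e.g. (a sketch against the tables defined above):

```sql
SELECT a.actor_id, s.name AS state
FROM actors a
JOIN actors_to_states ats ON ats.actor_id = a.actor_id
JOIN states s             ON s.state_id   = ats.state_id;
```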
SET is a compound datatype containing values from a predefined set of possible values. If a table contains such data then, according to relational database theory, it is not in 1NF. So there are only a few special cases where this approach is reasonable. In most cases I suggest using a separate table for countries, as in the example below:
CREATE TABLE countries (id SMALLINT, name VARCHAR(100));
To answer these types of questions, you should do a little bit of data analysis and ask some questions of your data, like:
What is the maximum size of my data?
In your case it will be the country with the longest name. Note this down and add 20 to remain on the safe side.
Will my data always contain numbers, or characters, or a combination?
In your case, only characters, so it is VARCHAR.
Also plan your data model such that you don't need to edit it afterwards. I would not recommend using SET in that case.
I recommend you use the standard abbreviations (US for USA, DE for Germany) and put it in
country_code CHAR(2) CHARACTER SET ascii NOT NULL
That way, it is compact (2 bytes) and readable by users. Then, if you want, you can have another table that spells out the country names.
If an actor can belong to multiple states, then this won't work, and you do need to have a SET. If you need that, we can discuss it further.

MySQL database with user created tables with custom column numbers

I have a person table, and I want users to be able to create custom many-to-many relations of information with persons: educations, residences, employments, languages, and so on. These might require different numbers of columns, e.g.
Person_languages(person_fk, language_fk)
Person_Educations(person, institution, degree, field, start, end)
I thought of something like this (not correct SQL):
create Tables(
table_id PRIMARY_KEY,
table_name_fk FOREIGN_KEY(Table_name),
person_fk FOREIGN_KEY(Person),
table_description TEXT
)
Table holding all custom table name and descriptions
create Table_columns(
column_id PRIMARY_KEY,
table_fk FOREIGN_KEY(Tables),
column_name_fk FOREIGN_KEY(Columns),
rank_column INT,
)
Table holding the columns in each custom table and the order they are to be displayed in.
create Table_rows(
row_id PRIMARY_KEY,
table_fk FOREIGN_KEY(Tables),
row_nr INT,
)
Table holding the rows of each custom table.
create Table_cells(
cell_id PRIMARY_KEY,
table_fk FOREIGN_KEY(Tables),
row_fk FOREIGN_KEY(Table_rows),
column_fk FOREIGN_KEY(Table_columns),
cell_content_type_fk FOREIGN_KEY(Content_types),
cell_object_id INT,
)
Table holding cell info.
If any custom table starts to be used with most persons and becomes large, the idea was to maybe then extract it into a separate hard-coded many-to-many table just for that table.
Is this a stupid idea? Is there a better way to do this?
I strongly advise against such a design - you are on the road to an extremely fragmented and hard to read design.
IIUC, your base problem is that you have a common set of (universal) properties for a person that may be extended by other (non-universal) properties.
I'd tackle this by keeping the universal properties in the person table and creating two more tables: property_types, which maps a property name to an INT primary key, and person_properties, which combines the person PK, the property PK, and the value.
If you set the PK of this table to (person, property), you get the best possible index locality for the person, which makes requesting all properties of a person a very fast query.
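A sketch of the two extra tables described (names and column sizes are assumptions):

```sql
CREATE TABLE property_types (
    property_id INT AUTO_INCREMENT PRIMARY KEY,
    name        VARCHAR(64) NOT NULL UNIQUE
);
CREATE TABLE person_properties (
    person_id   INT NOT NULL,
    property_id INT NOT NULL,
    value       VARCHAR(255),
    -- (person, property) PK keeps all of one person's rows adjacent
    PRIMARY KEY (person_id, property_id),
    FOREIGN KEY (property_id) REFERENCES property_types(property_id)
);
-- all properties of one person:
-- SELECT pt.name, pp.value
-- FROM person_properties pp
-- JOIN property_types pt USING (property_id)
-- WHERE pp.person_id = ?;
```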