I wasn't sure what to call this post but here is my problem:
I have two tables: Online_Module and Offline_Module. These two tables are used in my program to determine whether a learning module has to be taken online or on location.
Now I also have a table called Academy. An Academy consists of many modules. For this I wanted to create the following sub-table: Academy_has_Module
And here lies the problem. Because Online_Module and Offline_Module are not in the same table, one of the values in my Academy_has_Module will always be null.
Here are some pictures that show the buildup of these tables:
As you can see, one of the values will always be null. I want to know, what is best practice in situations like this?
1) Create a table Module, holding the fields common to online and offline modules:
id INT,
description
material
status
category
2) Then keep the offline/online module tables (edit: minus the fields, except the id field, that you are now keeping in the Module table), but make them reference the new Module table with an FK, using the Module table as an intermediate link.
Edit:
Now, not to overwhelm you with a lot of stuff, but there are several questions you have to ask yourself, e.g.:
- can a module be only offline or only online?
- if yes, do I want to enforce it 100% in the DB?
Because with my solution, you can have one Module referenced by several Offline/Online modules. There are ways to solve it, but I think they would go far beyond what you asked; I am just mentioning it so you know.
As to getting the information (maybe not the best way, but this is my level for now; if anyone knows better, feel free to teach me a new trick :)). Notice: ugly coding, too tired for full coding style :D
select *, case
    when OffM.Id is null and OnM.Id is null then 'No module!'
    when OffM.Id is not null and OnM.Id is not null then 'Too many modules!'
    when OffM.Id is null and OnM.Id is not null then 'Online module!'
    when OffM.Id is not null and OnM.Id is null then 'Offline module!'
  end -- probably a different, better way to compare?
from Academy_has_Module as AHM
join Module as M
  on AHM.Module = M.Id
left join OfflineModule as OffM
  on OffM.ModuleID = M.Id
left join OnlineModule as OnM
  on OnM.ModuleID = M.Id
But now, from what I understand, if you added a ModuleType column to the Module table, ORM frameworks (I don't really use them that much, to be honest, so no first-hand experience) can use it to return an object of the correct class. But this goes much deeper into the whole architecture and technology used in your project and is outside the scope of this question, and even of my actual experience.
EDIT2:
OK, one more thing that came to my mind: is it not reasonable to change the online/offline module tables so that they share the same structure somehow? For instance, an online module:
- could have a mentor/responsible person as well
- could be open from/to a datetime
- offline module doesn't have a name?
- location for an online module would be e.g. 'Online'
and just merge the tables together with a ModuleType, either a constraint or an FK to an enumeration table (not sure what the right term is in English). Maybe a little bit forced, but it could make your life simpler. (Again, a lot of this depends on overall requirements; I have never seen even a simple table being added in a single iteration. It always influences something and propagates through the design.) Sometimes it's better to waste a few bytes of space per record than to try to be too cute and get bitten in the posterior down the road.
Have a nice day
This is superclass/subclass design (or super-entity/sub-entity if preferred). To expand on @VladislavZalesak's answer, place the common attributes into a super-entity table. But don't forget to implement the proper data integrity checks.
create table Module(
ID int not null primary key,
ModType char( 1 ) not null,
Name varchar( 45 ) not null,
Description text,
MaterialID int,
StatusID int,
CategoryID int,
constraint TypeCK check( ModType in( 'O', 'F' ))
);
create unique index ModuleType_UIX on Module( ID, ModType );
Now, if ID is the PK, it must be unique. So why, you ask, do we create a unique index with ID and ModType? So we can reference them as a group:
create table OnlineModule(
ID int not null primary key,
ModType char( 1 ) not null default 'O',
Price double,
Time varchar( 45 ),
constraint OnlineTypeCK check( ModType = 'O' ),
constraint OnlineTypeFK foreign key( ID, ModType )
references Module( ID, ModType )
);
create table OfflineModule(
ID int not null primary key,
ModType char( 1 ) not null default 'F',
StartDate datetime,
EndDate datetime,
Mentor varchar( 45 ),
constraint OfflineTypeCK check( ModType = 'F' ),
constraint OfflineTypeFK foreign key( ID, ModType )
references Module( ID, ModType )
);
Now your Academy_has_Module table needs only the one FK to Module, which gives most of the information you need, even the fact that it is offline or online.
Both submodules have an identity relationship with the module table, not just by ID, which would be technically sufficient, but also by module type, which implements the sub/super relationship. The check constraints keep the structure enforced.
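To see the composite key and the check constraints working together, here is a minimal sketch. It is not the DDL above verbatim: it uses Python's built-in sqlite3 module rather than MySQL, trims the columns, and the sample rows and the try_insert helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
create table Module(
    ID      integer not null primary key,
    ModType char(1) not null check( ModType in ('O', 'F') ),
    Name    varchar(45) not null,
    unique( ID, ModType )            -- target of the composite FK
);
create table OnlineModule(
    ID      integer not null primary key,
    ModType char(1) not null default 'O' check( ModType = 'O' ),
    Price   real,
    foreign key( ID, ModType ) references Module( ID, ModType )
);
""")

# Illustrative sample data (assumed, not from the original post).
conn.execute("insert into Module values (1, 'O', 'SQL basics')")
conn.execute("insert into Module values (2, 'F', 'Workshop')")
conn.execute("insert into OnlineModule(ID, Price) values (1, 9.99)")  # accepted

def try_insert(sql):
    """Return True if the statement is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Module 2 is offline, so the (2, 'O') pair does not exist in Module: FK violation.
wrong_subtype = try_insert("insert into OnlineModule(ID, Price) values (2, 5.0)")
# Forcing ModType = 'F' to satisfy the FK trips the check constraint instead.
wrong_type = try_insert(
    "insert into OnlineModule(ID, ModType, Price) values (2, 'F', 5.0)"
)
print(wrong_subtype, wrong_type)
```

Either way the row is refused, so an offline module can never pick up an online subtype row by accident.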
You can create views which join module to either submodule to show all the fields of each type of submodule.
It's extensible in that you can add new submodules. But if you think that would be likely, you might want to create a submodule_type table and define the ModType field of Module as an FK to that table. That would eliminate having to alter the check constraint of Module every time you added a new subtype.
Kinda cumbersome when you first look at it but it's flexible and solves some up-front design problems.
I am creating a table for dietary_supplement where a supplement can have many ingredients.
I am having trouble designing the table for the ingredients.
The issue is that an ingredient can have many names or an acronym.
For example, vitaminB1 has other names like Thiamine and thiamin.
An acronym BHA can stand for both Butylated hydroxyanisole and beta hydroxy acid(this is actually an ingredient for skincare products but I am using it anyways because it makes a good example).
I am also concerned about the spacing and "-". For example, someone can spell vitaminA without spacing and someone can write vitamin A. Also, beta hydroxy acid can also be written as β-hydroxy acid(with "-") or β hydroxy acid(without "-").
I have two options in mind:
1) Put all the names for one ingredient in a single column, using a semicolon to separate the names, e.g. beta hydroxy acid;BHA;β-hydroxy acid;β hydroxy acid
- This would be easy, but I am not sure it is a smart way to design the database when I have to perform search actions etc.
2) Create a table for all the names and relate it to a table for ingredients.
- This is the option I am leaning towards, but I wonder if there are better ways to do this. And do I have to create separate rows for the same items with differences in spacing and "-"?
Make a mapping table of 'name' to 'canonical_name' (or id). It would have rows like
Thiamine vitaminB1
thiamin vitaminB1
vitaminB1 vitaminB1
B1 vitaminB1
By using a collation ending with _ci, you don't need to worry about capitalization.
When ingesting the data for a supplement, first look up the name to get the canonical_name, then use the latter in any other table(s).
In that 2-column table, have
PRIMARY KEY(name),
INDEX(canonical_name, name)
so that you can go either direction.
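A rough sketch of that lookup in both directions, using Python's sqlite3 module instead of MySQL. The table name ingredient_name and the helper functions are made up for illustration, the table is keyed by name because several names share one canonical_name, and SQLite's NOCASE collation stands in for a MySQL _ci collation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table ingredient_name(
    name           text not null collate nocase,  -- case-insensitive lookups
    canonical_name text not null,
    primary key( name )
);
create index ingredient_by_canonical on ingredient_name( canonical_name, name );
""")
conn.executemany(
    "insert into ingredient_name values (?, ?)",
    [("Thiamine", "vitaminB1"), ("thiamin", "vitaminB1"),
     ("vitaminB1", "vitaminB1"), ("B1", "vitaminB1")],
)

def canonical(name):
    """Map any known spelling to its canonical name, or None if unknown."""
    row = conn.execute(
        "select canonical_name from ingredient_name where name = ?", (name,)
    ).fetchone()
    return row[0] if row else None

def aliases(canonical_name):
    """List every name recorded for one canonical ingredient."""
    rows = conn.execute(
        "select name from ingredient_name where canonical_name = ?",
        (canonical_name,),
    )
    return [r[0] for r in rows]

print(canonical("THIAMINE"))
print(aliases("vitaminB1"))
```

An ambiguous acronym such as BHA would need a key on (name, canonical_name) instead, plus some way for the user to pick between the candidates.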
Create one table for ingredients and one for supplements, give them a column in common, and join on that column when you want to select.
It might be something like this:
CREATE TABLE Ingredient (
Id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, ImagePath VARCHAR(63)
, Description TEXT
-- other ingredient's non-name dependent properties
);
CREATE TABLE IngredientName (
Id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, IngredientId INTEGER UNSIGNED NOT NULL
, IsMain TINYINT(1) UNSIGNED NOT NULL DEFAULT 0
, Name VARCHAR(63) NOT NULL
, KEY IX_IngredientName_IngredientId_IsMain (IngredientId, IsMain)
, UNIQUE KEY IX_IngredientName_IngredientId_Name (IngredientId, Name)
, CONSTRAINT FK_IngredientName_IngredientId FOREIGN KEY (`IngredientId`) REFERENCES `Ingredient` (`Id`) ON DELETE CASCADE ON UPDATE CASCADE
);
Or you can add Ingredient.Name, which would hold the main name, and then get rid of IngredientName.IsMain.
For spaces, you should apply some name normalization in your application, such as removing consecutive spaces, capitalizing, and normalizing spaces around commas, dashes, etc. Sure, you can apply such normalization in the database in a trigger if you like.
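As one possible shape for that application-side normalization, here is a small Python sketch. The function names and the exact rules (case folding, treating dashes like spaces, optionally ignoring spaces altogether) are assumptions chosen for illustration, not a fixed recipe.

```python
import re
import unicodedata

def normalize_name(raw):
    """Reduce spelling variants of an ingredient name to one comparable form."""
    text = unicodedata.normalize("NFKC", raw)  # fold Unicode compatibility forms
    text = text.casefold()                     # ignore capitalization
    # Treat ASCII and Unicode dashes like spaces, then collapse runs to one space.
    text = re.sub(r"[\s\-\u2010-\u2015]+", " ", text)
    return text.strip()

def match_key(raw):
    """Stricter key that also ignores spacing, so 'vitamin A' matches 'vitaminA'."""
    return normalize_name(raw).replace(" ", "")

print(normalize_name("β-hydroxy acid") == normalize_name("β hydroxy  acid"))
print(match_key("Vitamin A") == match_key("vitaminA"))
```

Storing the normalized form in an indexed column next to the display name keeps searches simple, at the cost of occasionally matching names that differ only in spacing or dashes.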
There are some other possibilities.
You should first think about what the use cases for the DB would be.
This is very important. There is no 'best universal DB design'.
If you need some special search cases, you might need a special DB design, or at least indexes.
P.S. I believe that putting different names in one field as a something-separated value is a bad idea.
We're developing a monitoring system. In our system, values are reported by agents running on different servers. The observations reported can be values like:
- A numeric value, e.g. "CPU USAGE" = 55 (meaning 55% of the CPU is in use).
- A certain event was fired, e.g. "Backup completed".
- A status, e.g. SQL Server is offline.
We want to store these observations (which are not known in advance and will be added dynamically to the system without recompiling).
We are considering adding different columns to the observations table like this:
IntMeasure -> INTEGER
FloatMeasure -> FLOAT
Status -> varchar(255)
So if the value we wish to store is a number, we can use IntMeasure or FloatMeasure according to the type. If the value is a status, we can store the status literal string (or a status id if we decide to add a Statuses(id, name) table).
We suppose a more correct design is possible, but it would probably become too slow and obscure because of joins and dynamic table names depending on types. How would a join work if we can't specify the tables in advance in the query?
I haven't done a formal study, but from my own experience I would guess that more than 80% of database design flaws are generated from designing with performance as the most important (if not only) consideration.
If a good design calls for multiple tables, create multiple tables. Don't automatically assume that joins are something to be avoided. They are rarely the true cause of performance problems.
The primary consideration, first and foremost in all stages of database design, is data integrity. "The answer may not always be correct, but we can get it to you very quickly" is not a goal any shop should be working toward. Once data integrity has been locked down, if performance ever becomes an issue, it can be addressed. Don't sacrifice data integrity, especially to solve problems that may not exist.
With that in mind, look at what you need. You have observations you need to store. These observations can vary in the number and types of attributes and can be things like the value of a measurement, the notification of an event and the change of a status, among others and with the possibility of future observations being added.
This would appear to fit into a standard "type/subtype" pattern, with the "Observation" entry being the type and each type or kind of observation being the subtype, and suggests some form of type indicator field such as:
create table Observations(
...,
ObservationKind char( 1 ) check( ObservationKind in( 'M', 'E', 'S' )),
...
);
But hardcoding a list like this in a check constraint has a very low maintainability level. It becomes part of the schema and can be altered only with DDL statements. Not something your DBA is going to look forward to.
So have the kinds of observations in their own lookup table:
ID  Name         Meaning
==  ===========  ============================================
M   Measurement  The value of some system metric (CPU_Usage).
E   Event        An event has been detected.
S   Status       A change in a status has been detected.
(The char field could just as well be int or smallint. I use char here for illustration.)
Then fill out the Observations table with a PK and the attributes that would be common to all observations.
create table Observations(
ID int identity primary key,
ObservationKind char( 1 ) not null,
DateEntered date not null,
...,
constraint FK_ObservationKind foreign key( ObservationKind )
references ObservationKinds( ID ),
constraint UQ_ObservationIDKind unique( ID, ObservationKind )
);
It may seem strange to create a unique index on the combination of Kind field and the PK, which is unique all by itself, but bear with me a moment.
Now each kind or subtype gets its own table. Note that each kind of observation gets a table, not the data type.
create table Measurements(
ID int not null,
ObservationKind char( 1 ) check( ObservationKind = 'M' ),
Name varchar( 32 ) not null, -- Such as "CPU Usage"
Value double not null, -- such as 55.00
..., -- other attributes of Measurement observations
constraint PK_Measurements primary key( ID, ObservationKind ),
constraint FK_Measurements_Observations foreign key( ID, ObservationKind )
references Observations( ID, ObservationKind )
);
The first two fields will be the same for the other kinds of observations except the check constraint will force the value to the appropriate kind. The other fields may differ in number, name and data type.
Let's examine an example tuple that may exist in the Measurements table:
ID    ObservationKind  Name       Value  ...
====  ===============  =========  =====
1001  M                CPU Usage  55.0   ...
In order for this tuple to exist in this table, a matching entry must first exist in the Observations table with an ID value of 1001 and an observation kind value of 'M'. No other entry with an ID value of 1001 can exist in either the Observations table or the Measurements table and cannot exist at all in any other of the "kind" tables (Events, Status). This works the same way for all the kind tables.
I would further recommend creating a view for each kind of observation which will provide a join of each kind with the main observation table:
create view MeasurementObservations as
select ...
from Observations o
join Measurements m
on m.ID = o.ID;
Any code that works solely with measurements would need to only hit this view instead of the underlying tables. Using views to create a wall of abstraction between the application code and the raw data greatly enhances the maintainability of the database.
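A compact sketch of the whole arrangement (lookup table, supertype, one subtype, and its view), run through Python's sqlite3 module. The column list of the view, the sample rows, and the trimmed-down table definitions are assumptions for illustration; the DDL in this answer is the reference.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
create table ObservationKinds(
    ID   char(1) primary key,
    Name varchar(32) not null
);
create table Observations(
    ID              integer primary key,
    ObservationKind char(1) not null references ObservationKinds( ID ),
    DateEntered     date not null,
    unique( ID, ObservationKind )
);
create table Measurements(
    ID              integer not null,
    ObservationKind char(1) not null check( ObservationKind = 'M' ),
    Name            varchar(32) not null,
    Value           real not null,
    primary key( ID, ObservationKind ),
    foreign key( ID, ObservationKind )
        references Observations( ID, ObservationKind )
);
create view MeasurementObservations as
    select o.ID, o.DateEntered, m.Name, m.Value
    from Observations o
    join Measurements m on m.ID = o.ID;

-- Illustrative rows only.
insert into ObservationKinds values ('M', 'Measurement'), ('E', 'Event'), ('S', 'Status');
insert into Observations values (1001, 'M', '2024-01-15'), (1002, 'E', '2024-01-15');
insert into Measurements values (1001, 'M', 'CPU Usage', 55.0);
""")

# Application code queries the view and never touches the base tables.
high_cpu = conn.execute(
    "select ID, Name, Value from MeasurementObservations where Value > ?", (50,)
).fetchall()
print(high_cpu)
```

The event observation 1002 never shows up in this view, so measurement code does not have to filter by kind itself.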
Now the creation of another kind of observation, such as "Error", involves a simple Insert statement to the ObservationKinds table:
F Fault A fault or error has been detected.
Of course, you need to create a new table and view for these error observations, but doing so will have no impact on existing tables, views or application code (except, of course, to write the new code to work with the new observations).
Just create it as a VARCHAR.
This will allow you to store whatever data you require in it. However, it is much more difficult to query on the number in the field, such as
Select * from table where MyVARCHARField > 50 -- get CPU > 50
However, if you think you want to do this, then you need either a field per item or a generalised table such as
Create Table
Description : Varchar
ValueType : Varchar -- Can be String, Float, Int
ValueString: Varchar
ValueFloat: Float
ValueInt : Int
Then when you are filling the data you can put your value in the correct field and select like this.
Select Description ,ValueInt from table where Description like '%cpu%' and ValueInt > 50
I used two columns for a similar problem. The first column held the data type and the second held the data as a VARCHAR.
The first column had codes (e.g. 1 = integer, 2 = string, 3 = date, and so on), which could be used in conditions to compare values (e.g. find the max integer where type = 1).
I did not have joins, but I think you can use this approach. It will also help you if more data types are introduced tomorrow.
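A small sketch of the type-code approach, using Python's sqlite3 module. The table name, column names, and type codes are hypothetical. It also shows the main trap: values stored as text compare as text unless they are cast first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table observation_values(
    description text    not null,
    value_type  integer not null,   -- hypothetical codes: 1 = integer, 2 = string
    value       text    not null
);
insert into observation_values values
    ('CPU USAGE',  1, '55'),
    ('CPU USAGE',  1, '9'),
    ('SQL Server', 2, 'offline');
""")

# Without a cast the comparison is textual, and '9' sorts after '55'.
max_as_text = conn.execute(
    "select max(value) from observation_values where value_type = 1"
).fetchone()[0]

# Filtering on the type code first makes the cast safe and the comparison numeric.
max_as_int = conn.execute(
    "select max(cast(value as integer)) from observation_values where value_type = 1"
).fetchone()[0]

print(max_as_text, max_as_int)
```

Every numeric query has to remember both the type filter and the cast, which is the price of keeping everything in one text column.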
I'm developing a classifieds site. And I'm totally stuck at database design level.
An advertisement can only be in one category.
In my database I have table called "ads", which has columns, common for all advertisements.
CREATE TABLE Ads (
AdID int not null,
AdDate datetime not null,
AdCategory int not null,
AdHeading varchar(255) not null,
AdText varchar(255) not null,
etc...
);
I also have a lot of categories.
Ads that are posted in "cars" category, for example, have additional columns like make, model, color, etc. Ads, posted in "housing" have columns like housing type, sqft. etc...
I did something like:
CREATE TABLE Cars (
AdID int not null,
CarMake varchar (255) not null,
CarModel varchar(255) not null,
...
);
CREATE TABLE Housing (
AdID int not null,
HousingType varchar (255) not null
...
);
AdId in those is a foreign key to Ads.
But when I need to retrieve information from Ads, I have to look up all those additional tables and check if AdId in Ads equals to AdId in those tables.
For every category I need a new table. I'm gonna end up with like 15 tables or so.
I had an idea to have boolean columns in the Ads table, like is_Cars, is_Housing, etc., but having 15 columns where 14 would be NULL seems horrible.
Is there any better way to design this database? I need my database to be in a 3rd normal form, this is the most important requirement.
Don't worry too much - it's a well known dilemma, there are no 'silver bullets' and all solutions have some trade-offs. Your solution sounds good to me, and is commonly used in the industry. On the down side it has JOINS as you mentioned (which is a well-known trade-off of normalization anyway), and also each new product type requires a new TABLE. On the up side the table structure precisely reflects your business logic, it's readable and efficient in storage.
Your other suggestion, as far as I understand, was a single table where each row has a "type" indication - car, house etc (btw no need for multiple columns such as 'is_car', 'is_house' - it's simpler to have a single column 'type', e.g. type=1 indicates car, type=2 indicates house etc). Then multiple columns where some of them are unused for some product types.
Well, here the advantage is capability to add new types dynamically (even user-defined types) without changing the database schema. Also no 'JOINs'. On the down side you'll be storing & retrieving lots of 'null' cells, and also the schema would be less descriptive: e.g. it's harder to put a constraint "carModel column is not nullable", because it is nullable for houses (you can use triggers, but it's less readable).
Personally I prefer the 1st solution (of course depending on the use case, but the 1st solution is my first instinct). And I can use it with some peace of mind after considering the trade-offs, e.g. understanding that I'm tolerating those JOINs as payment for a readable & compact schema.
One, you are confusing categories and product specifications.
Two, you need to read up on Table Inheritance.
If you don't mind nulls, use Single Table Inheritance. All "categories" (cars, houses, ...) go in one table and have a "type" column.
If you don't like nulls, use Class Table Inheritance. Make a master table with the primary keys that you point your category foreign key at. Make child tables for each type (cars, houses, ...) whose primary key is also a foreign key to the master table. This is easier with an ORM like Hibernate.
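Here is a minimal sketch of Class Table Inheritance for the ads example, using Python's sqlite3 module. The category codes, the DETAIL_TABLES mapping, and the load_ad helper are made up for illustration. The point is that the category on the master row tells you which single child table to read, so there is no need to probe all of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
create table Ads(
    AdID       integer primary key,
    AdCategory integer not null,          -- assumed codes: 1 = cars, 2 = housing
    AdHeading  varchar(255) not null
);
create table Cars(
    AdID     integer primary key references Ads( AdID ),
    CarMake  varchar(255) not null,
    CarModel varchar(255) not null
);
create table Housing(
    AdID        integer primary key references Ads( AdID ),
    HousingType varchar(255) not null
);
insert into Ads values (1, 1, 'Low mileage'), (2, 2, 'Sunny flat');
insert into Cars values (1, 'Volvo', 'V70');
insert into Housing values (2, 'Apartment');
""")

# Hypothetical mapping from category code to child table. It lives in code,
# so the table name never comes from user input.
DETAIL_TABLES = {1: "Cars", 2: "Housing"}

def load_ad(ad_id):
    """Fetch the common columns, then the one child row the category points to."""
    ad = conn.execute(
        "select AdCategory, AdHeading from Ads where AdID = ?", (ad_id,)
    ).fetchone()
    if ad is None:
        return None
    table = DETAIL_TABLES[ad[0]]
    cur = conn.execute(f"select * from {table} where AdID = ?", (ad_id,))
    row = cur.fetchone()
    columns = [c[0] for c in cur.description]
    details = dict(zip(columns, row)) if row else {}
    return {"heading": ad[1], "category": ad[0], "details": details}

print(load_ad(1))
```

An ORM with inheritance mapping does essentially this lookup for you and hands back an object of the right subclass.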
I am currently developing a database storage solution for product inventory information for the company I work for. I am using MySql, and I am having a hard time coming up with an efficient, feasible format for the data storage.
As it works right now, we have ~25000 products to keep track of. For each product, there are about 20 different categories that we need to track information for(quantity available, price, etc..). This report is downloaded and updated every 3-4 days, and it is stored and updated in excel right now.
My problem is that the only solution I have come up with so far is to create separate tables for each one of the categories mentioned above, using foreign keys based off of the product SKUs, and cascading to update each respective table. However, this method would require that every table add 24000 rows each time the program is run, given that each product needs to be updated for the date it was run. The problem with this is that the data will be stored for around a year, so the tables will grow extensively. My research into other database formats has yielded some examples, but none on this scale. They are geared towards adding maybe 100 rows a day.
Does anybody know or have any ideas of a suitable way to set up this kind of database, or is the method I described above suitable and within the limitations of the MySql tables?
Thanks,
Mike
25,000 rows is nothing to MySQL, or to a flat file for that matter. Do not initially worry about data volume. I've worked on many retail database schemas, and products are usually defined by either a static or an arbitrary-length set of attributes. Your data quantity ends up not being that far off either way.
Static:
create table products (
product_id integer primary key auto_increment
, product_name varchar(255) -- or whatever
, attribute1_id -- FK
, attribute2_id -- FK
, ...
, attributeX_id -- FK
);
create table attributes (
attribute_id integer primary key -- whatever
, attribute_type -- Category?
, attribute_value varchar(255)
);
Or, obviously, arbitrary-length:
create table products (
product_id integer primary key auto_increment
, product_name varchar(255) -- or whatever
);
create table product_attributes (
product_id integer
, attribute_id integer
, -- other stuff you want like date of assignment
, primary key (product_id , attribute_id)
);
create table attributes (
attribute_id integer primary key -- whatever
, attribute_type -- Category?
, attribute_value varchar(255)
);
I would not hesitate to shove a few hundred million records into a basic structure like either.
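To show how the arbitrary-length variant could carry the periodic report, here is a sketch using Python's sqlite3 module. It goes beyond the DDL above in one respect, which is an assumption on my part: the assignment date (called assigned_on here) is made part of the key of product_attributes so that every download adds new rows instead of overwriting old ones. Sample data and the helper name are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table products(
    product_id   integer primary key,
    product_name varchar(255) not null
);
create table attributes(
    attribute_id    integer primary key,
    attribute_type  varchar(64) not null,    -- e.g. 'quantity', 'price'
    attribute_value varchar(255) not null
);
create table product_attributes(
    product_id   integer not null references products( product_id ),
    attribute_id integer not null references attributes( attribute_id ),
    assigned_on  date not null,              -- assumed: date of the report run
    primary key( product_id, attribute_id, assigned_on )
);
insert into products values (1, 'Widget');
insert into attributes values
    (1, 'quantity', '10'), (2, 'quantity', '8'), (3, 'price', '4.50');
insert into product_attributes values
    (1, 1, '2024-01-01'), (1, 3, '2024-01-01'),
    (1, 2, '2024-01-04'), (1, 3, '2024-01-04');
""")

def latest_attributes(product_id):
    """Return {attribute_type: value} from the most recent run for one product."""
    rows = conn.execute(
        """
        select a.attribute_type, a.attribute_value
        from product_attributes pa
        join attributes a on a.attribute_id = pa.attribute_id
        where pa.product_id = ?
          and pa.assigned_on = (select max(assigned_on)
                                from product_attributes
                                where product_id = ?)
        """,
        (product_id, product_id),
    ).fetchall()
    return dict(rows)

print(latest_attributes(1))
```

At roughly 25,000 products times 20 attributes per run, a year of runs every few days stays in the tens of millions of rows, which fits the sizing remark above.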
I'm trying to create a relation where any of four different parts may be included, but any collection of the same parts should be handled as unique.
Example:
An assignment must have an assigned company, may optionally have an assigned location, workgroup and program.
An assignment may not have a workgroup without a location.
Let's assume we have companies A, B, C; locations X, Y, Z; workgroups I, J, K and programs 1, 2, 3.
So valid relations could include
A - X - I - 1
A - Z - 2
B - Y
C
C - 3
B - Z - K
But invalid relations would include
A - K (Workgroup without location)
Y - K - 1 (No company)
So, to create my table, I've created
companyID INT NOT NULL,
FOREIGN KEY companyKEY (companyID) REFERENCES company (companyID),
locationID INT,
FOREIGN KEY locationKEY (locationID) REFERENCES location (locationID),
workgroupID INT,
FOREIGN KEY workgroupKEY (workgroupID) REFERENCES workgroup (workgroupID),
programID INT,
FOREIGN KEY programKEY (programID) REFERENCES program (programID),
UNIQUE KEY companyLocationWorkgroupProgramKEY (companyID, locationID, workgroupID, programID)
I figure this would handle all my relations besides the necessity of an assignment to have a location if there is a workgroup (which I can happily do programmatically or with triggers, I think).
However, when I test this schema, it allows me to enter the following...
INSERT INTO test VALUES (1, null, null, null), (1, null, null, null);
...without complaint. I'm guessing that (1, null, null, null) does not equal itself because nulls are included. If this is the case, is there any way I can handle this relation?
Any help would be appreciated!
This is a Feature (though not what I expected, either).
This thread suggests making your key a Primary key to get the behavior you expected:
This is a feature - a NULL value is an undefined value, therefore two NULL values are not the same. Can be a little confusing but makes sense when you think about it.
A UNIQUE index does ensure that non-NULL values are unique; you could specify that your column not accept NULL values.
The only way I can think of handling this without additional triggers/programming would be to have a single "None of the Above" value in each of the referenced tables, so that your test would look like
INSERT INTO test VALUES (1, NO_LOCATION, NO_WORKGROUP, NO_PROGRAM),
(1, NO_LOCATION, NO_WORKGROUP, NO_PROGRAM)
Where the NO_* identifiers are the right type/length for your ID columns. This would then fail, as you'd expect it.
In MySQL, NULL is not equal to NULL, or to anything else. That is why the UNIQUE key doesn't work here. You should use another default value for blanks, like zero.
I think it's important to note that there is a proper way for NULL values to be interpreted and handled, and the behavior exhibited by the OP is exactly what's intended. You can disregard that behavior, and you can handle your query any way you want without objection from me, but it might be well to "Accept" an answer that describes some form of Best Practices, rather than a non-standard personal preference.
Or if you don't agree with the consensus Best Practice, you can just not Accept any answer.
It's not a race to get an answer accepted as quickly as possible. Deliberation and collaboration are also intended to be part of the process, I think.
I see that this was asked in 2009. However it is often requested from MySQL: https://bugs.mysql.com/bug.php?id=8173 and https://bugs.mysql.com/bug.php?id=17825 for example. People can click on affects me to try and get attention from MySQL.
Since MySQL 5.7 we can now use the following workaround:
ALTER TABLE test
ADD generatedLocationID INT AS (ifNull(locationID, 0)) NOT NULL,
ADD generatedWorkgroupID INT AS (ifNull(workgroupID, 0)) NOT NULL,
ADD generatedProgramID INT AS (ifNull(programID, 0)) NOT NULL,
ADD UNIQUE INDEX (companyID, generatedLocationID, generatedWorkgroupID, generatedProgramID);
The generated columns are virtual generated columns, so they are not stored in the table itself. When a row is inserted or updated, the unique index causes the values of the generated columns to be computed on the fly, which is a very quick operation.
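The same idea can be tried out quickly with Python's sqlite3 module. This is only an analogue, not the MySQL statement above: SQLite treats NULLs in a UNIQUE constraint as distinct too, and here a unique index on ifnull(...) expressions plays the role of the generated columns. It assumes 0 is never a real ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table test(
    companyID   integer not null,
    locationID  integer,
    workgroupID integer,
    programID   integer,
    unique( companyID, locationID, workgroupID, programID )
);
""")

# Both rows are accepted: NULLs never compare equal inside the unique key.
conn.execute("insert into test values (1, null, null, null)")
conn.execute("insert into test values (1, null, null, null)")
duplicates_before = conn.execute("select count(*) from test").fetchone()[0]

conn.execute("delete from test")
# Index the coalesced values instead, so NULL behaves like one ordinary value.
conn.execute("""
create unique index test_no_null_dupes on test(
    companyID, ifnull(locationID, 0), ifnull(workgroupID, 0), ifnull(programID, 0)
)
""")
conn.execute("insert into test values (1, null, null, null)")
try:
    conn.execute("insert into test values (1, null, null, null)")
    second_insert_rejected = False
except sqlite3.IntegrityError:
    second_insert_rejected = True

print(duplicates_before, second_insert_rejected)
```

The nullable columns keep their NULLs, so the foreign keys to location, workgroup and program still work without "none of the above" placeholder rows.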