How to display item as 'in transit' instead of to specific location id (foreign key)? - mysql

I have the following requirements for item management.
An item can be moved from location 'A' to 'B', and later on it can also be moved from location 'B' to 'C'.
History should be maintained for each item, so that items can be displayed location-wise for a specific period, and history can also be displayed item-wise.
I also need to display the items that are 'in transit' on a particular date.
Given below is the database design:
item_master
-----------
- ItemId
- Item name
- etc...
item_location_history
------------------
- ItemId
- LocationId (foreign key of location_master)
- Date
While item is being transported I want to insert data in following way:
1. At the time of transport, I want to record the item as moved from location 'A' to 'In Transit' on a particular date, as it is possible for an item to remain in the 'in transit' state for several days.
2. At the time of receipt at location 'B', I want to record the item as moved from 'In Transit' to location 'B' on a particular date, and so on.
This way I will keep track of both the 'In Transit' state and the item location.
What is the best way to achieve this? What changes I need to apply to the above schema? Thanks.

Initial Response
What is the best way to achieve this?
This is a simple and common Data Modelling Problem, and the answer (at least in the Relational Database context) is simple. I would say, every database has at least a few of these. Unfortunately, because the authors who write books about the Relational Model, are in fact completely ignorant of it, they do not write about this sort of simple straight-forward issue, or the simple solution.
What you are looking for is an OR gate. In this instance, because the Item is in a Location XOR it is InTransit, you need an XOR gate.
In Relational terms, this is a Basetype::Subtype structure. If it is implemented properly, it provides full integrity, and eliminates Nulls.
As far as I know, it is the only Relational method. Beware, the methods provided by famous writers are non-relational, monstrous, massively inefficient, and they don't work.
Record ID
But first ... I would not be serving you if I didn't mention that your files have no integrity right now, you have a Record Filing System. This is probably not your fault, in that the famous writers know only pre-1970's Record Filing Systems, so that is all that they can teach, but the problem is, they badge it "relational", and that is untrue. They also have various myths about the RM, such as it doesn't support hierarchies, etc.
By starting with an ID stamped on every table, the data modelling process is crippled.
You have no Row Uniqueness, as is required for an RDBMS.
An ID is not a Key.
If you do not understand that, please read this answer.
I have partially corrected those errors:
In Item, I have given a more useful PK. I have never heard any user discuss an Item RecordId, they always use Codes.
Often those codes are made up of components, if so, you need to record those components in separate columns (otherwise you break 1NF).
Item needs an Alternate Key on Name, otherwise you will allow duplicate Names.
In Location, I have proposed a Key, which identifies an unique physical location. Please modify to suit.
If Location has a Name, that needs to be an AK.
I have not given you the Predicates. These are very important, for many reasons. The main reason here, is that it will prove the insanity of Record IDs. If you want them, please ask.
If you would like more information on Predicates, visit this Answer, scroll down (way down!) to Predicate, and read that section. Also check the ERD for them.
Solution
What changes [do] I need to apply to the above schema?
Try this:
Item History Data Model
(Obsolete, refer below for the updated model, in the context of the progression)
If you are not used to the Notation, please be advised that every little tick, notch, and mark, the solid vs dashed lines, the square vs round corners, means something very specific. Refer to the IDEF1X Notation for a full explanation, or Model Anatomy.
If you have not encountered Subtypes implemented properly before, please read this Subtype Overview
That is a self-contained document, with links to code examples
There is also an SO discussion re How to implement referential integrity in subtypes.
When contemplating a Subtype cluster, consider each Basetype::Subtype pair as a single unit, do not perceive them as two fragments, or two halves. Each pair is one fact.
ItemHistory is an event (a fact) in the history of an Item.
Each ItemHistory fact is either a Location fact XOR an InTransit fact.
Each of those facts has different attributes.
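The Basetype::Subtype idea above can be sketched in a few lines. This is only a minimal illustration using Python's built-in sqlite3 module, not the model from the diagram: the column names (EventDateTime, LocationCode, Destination) and the sample data are assumptions, and the exclusivity is enforced here by carrying the Discriminator into each Subtype's foreign key, which is one common technique and not necessarily the one the linked documents use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE ItemHistory (               -- Basetype: one row per event
    ItemCode      TEXT    NOT NULL,
    EventDateTime TEXT    NOT NULL,      -- ISO text, e.g. '2015-04-02 09:00'
    InTransit     INTEGER NOT NULL CHECK (InTransit IN (0, 1)),  -- Discriminator
    PRIMARY KEY (ItemCode, EventDateTime),
    UNIQUE (ItemCode, EventDateTime, InTransit)  -- target of the Subtype FKs
);
CREATE TABLE ItemHistoryLocation (       -- Subtype: Item arrived at a Location
    ItemCode      TEXT    NOT NULL,
    EventDateTime TEXT    NOT NULL,
    InTransit     INTEGER NOT NULL CHECK (InTransit = 0),
    LocationCode  TEXT    NOT NULL,      -- assumed column
    PRIMARY KEY (ItemCode, EventDateTime),
    FOREIGN KEY (ItemCode, EventDateTime, InTransit)
        REFERENCES ItemHistory (ItemCode, EventDateTime, InTransit)
);
CREATE TABLE ItemHistoryTransit (        -- Subtype: Item left for a destination
    ItemCode      TEXT    NOT NULL,
    EventDateTime TEXT    NOT NULL,
    InTransit     INTEGER NOT NULL CHECK (InTransit = 1),
    Destination   TEXT    NOT NULL,      -- assumed column
    PRIMARY KEY (ItemCode, EventDateTime),
    FOREIGN KEY (ItemCode, EventDateTime, InTransit)
        REFERENCES ItemHistory (ItemCode, EventDateTime, InTransit)
);
""")

def record_event(item, when, location=None, destination=None):
    """Insert the Basetype row and its single Subtype row in one transaction."""
    in_transit = 1 if location is None else 0
    with conn:
        conn.execute("INSERT INTO ItemHistory VALUES (?, ?, ?)",
                     (item, when, in_transit))
        if in_transit:
            conn.execute("INSERT INTO ItemHistoryTransit VALUES (?, ?, 1, ?)",
                         (item, when, destination))
        else:
            conn.execute("INSERT INTO ItemHistoryLocation VALUES (?, ?, 0, ?)",
                         (item, when, location))

def in_transit_on(as_of):
    """Items whose latest event at or before as_of is an InTransit event."""
    rows = conn.execute("""
        SELECT h.ItemCode
        FROM ItemHistory h
        WHERE h.InTransit = 1
          AND h.EventDateTime = (SELECT MAX(x.EventDateTime)
                                 FROM ItemHistory x
                                 WHERE x.ItemCode = h.ItemCode
                                   AND x.EventDateTime <= ?)
        ORDER BY h.ItemCode""", (as_of,))
    return [r[0] for r in rows]

# Hypothetical sample data: item W1 moves A -> (in transit) -> B
record_event("W1", "2015-04-01 08:00", location="A")
record_event("W1", "2015-04-02 09:00", destination="B")
record_event("W1", "2015-04-05 16:00", location="B")
```

With this sample data, in_transit_on("2015-04-03 23:59") lists W1 and in_transit_on("2015-04-06 00:00") lists nothing. Note that the sketch only stops an event from getting the wrong Subtype row; making sure every Basetype row has its Subtype row is left to the transaction in record_event.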
Notice that the model represents the simple, honest, truth about the real world that you are engaging. In addition to the integrity, etc, as discussed above, the result is simple straight-forward code: every other "solution" makes the code complex, in order to handle exception cases. And some "solutions" are more horrendous than others.
Dr E F Codd gave this to us in 1970. It was implemented as a modelling method in 1984, named IDEF1X. That became the standard for Relational Databases in 1993. I have used it exclusively since 1987.
But the authors who write books, allegedly on the Relational Model, have no knowledge whatsoever, about any of these items. They know only pre-1970's ISAM Record Filing Systems. They do not even know that they do not have the Integrity, Power, or Speed of Relational Databases, let alone why they don't have it.
Date, Darwen, Fagin, Zaniolo, Ambler, Fowler, Kimball, are all promoting an incorrect view of the RM.
Response to Comments
1) ItemHistory, contains Discriminator column 'InTransit'.
Correct. And all the connotations that go with that: it is a control element; its values better be constrained; etc.
Shall it be enum with the value Y / N?
First, understand that the value-stored has meaning. That meaning can be expressed any way you like. In English it means {Location|InTransit}.
For the storage, I think of it as the values for the proposition InTransit, which are {True|False}, ...
In SQL (if you want the real article, which is portable), I intended it as a BIT or BOOLEAN. Think about what you want to show up in the reports. In this case it is a control element, so it won't be present in the user reports. There I would stick to InTransit={0|1}.
But if you prefer {Y|N}, that is fine. Just keep that consistent across the database (do not use {0|1} in one place and {Y|N} in another).
For values that do show up in reports, or columns such as EventType, I would use {InTransit|Location}.
In SQL, for implementation, if it is BOOLEAN, the domain (range-of-values) is already constrained. Nothing further is required.
If the column were other than BOOLEAN, you have two choices:
CHECK Constraint
CHECK ( InTransit IN ( 'Y', 'N' ) )
Reference or Lookup Table
Implement a table that contains only the valid domain. The requirement is a single column, the Code itself. And you can add a column for short Descriptor that shows up in reports. CHAR(12) works nicely for me.
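To make the two choices concrete, here is a small sketch of both, run through Python's built-in sqlite3 module so it can be tried without a server. The table names (EventChecked, EventLookedUp, TransitCode) are made up for the illustration, and the {Y|N} domain is just the one discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for the lookup-table variant
conn.executescript("""
-- Choice 1: CHECK constraint on the column itself
CREATE TABLE EventChecked (
    ItemCode      TEXT NOT NULL,
    EventDateTime TEXT NOT NULL,
    InTransit     TEXT NOT NULL CHECK (InTransit IN ('Y', 'N')),
    PRIMARY KEY (ItemCode, EventDateTime)
);

-- Choice 2: Reference (Lookup) table holding only the valid domain
CREATE TABLE TransitCode (
    Code       TEXT PRIMARY KEY,
    Descriptor TEXT NOT NULL UNIQUE      -- short text for reports
);
INSERT INTO TransitCode VALUES ('Y', 'In Transit'), ('N', 'Location');

CREATE TABLE EventLookedUp (
    ItemCode      TEXT NOT NULL,
    EventDateTime TEXT NOT NULL,
    InTransit     TEXT NOT NULL REFERENCES TransitCode (Code),
    PRIMARY KEY (ItemCode, EventDateTime)
);
""")

def is_rejected(table, value):
    """Try to store one event with the given InTransit value; report rejection."""
    try:
        with conn:
            conn.execute(
                f"INSERT INTO {table} VALUES ('W1', '2015-04-02 ' || ?, ?)",
                (value, value))  # value reused in the key only to keep rows distinct
    except sqlite3.IntegrityError:
        return True
    return False
```

Both variants accept 'Y' and 'N' and reject anything else, for example is_rejected("EventChecked", "X") and is_rejected("EventLookedUp", "X") are both true. The lookup table has the extra benefit that the Descriptor can be joined into reports.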
ENUM
There is no ENUM in SQL. Some of the non-SQL databases have it. Basically it implements option [2] above, with a Lookup table, under the covers. It doesn't realise that the rows are unique, and so it Enumerates the rows, hence the name, but it adds a column for the number, which is of course an ID replete with AUTOINCREMENT, so MySQL falls into the category of Stupid Thing to Do as described in this answer (scroll down to the Lookup Table section).
So no, do not use ENUM unless you wish to be glued at the hip to a home-grown, stupid, non-SQL platform, and suffer a rewrite when the database is ported to a real SQL platform. The platform might be stupid, but that is not a good reason to go down the same path. Even if MySQL is all you have, use one of the two SQL facilities given above, do not use ENUM.
2) Why is 'ItemHistoryTransit' needed as 'Date' column
(DATETIME, not DATE, but I don't think that matters.)
[It] is there in ItemHistory?
The standard method of constraining (everything in the database is constrained) the nature of the Basetype::Subtype relationship is to implement the exact same PK of the Basetype in the Subtype. The Basetype PK is (ItemCode, DateTime).
[Why] will only Discriminator not work?
It is wrong, because it doesn't follow the standard requirement, and thus allows weird and wonderful values. I can't think of an instance where that could be justified, even if a replacement constraint was provided.
Second, there can well be more than two occurrences of ItemEvents that are InTransit per ItemCode, which that does not allow.
Third, it does not match the Basetype PK value.
Solution
Actually, a better name for the table would be ItemEvent. Labels are keys to understanding.
I have given the Predicates, please review carefully.
Data model updated.
Item Event Data Model

You could add a boolean field for in_transit to item_location_history so when it is getting moved from Location A to Location B, you set the LocationId to Location B (so you know where it is going) but then when it actually arrives you log another row with LocationId as LocationB but with in_transit as false. That way you know when it arrived also.
If you don't need to know where it is headed when it is "in transit" then you could just add "In Transit" as a location and keep your schema the same. In the past, with an inventory application, I went as far as making each truck a location so that we knew what specific truck the item was in.

One of the techniques I've adopted over the years is to normalize transitional attributes (qty, status, location, etc.) from the entity table. If you also want to track the history, just version (versionize?) the subsequent status table.
create table ItemLocation(
    ItemID      int,
    Effective   date,
    LocationID  int,
    Remarks     varchar( 256 ),
    constraint PK_ItemLocation primary key( ItemID, Effective ),
    constraint FK_ItemLocation_Item foreign key( ItemID )
        references Items( ID ),
    constraint FK_ItemLocation_Location foreign key( LocationID )
        references Locations( ID )
);
There are several good design options, I've shown the simplest, where "In transit" is implied. Consider the following data:
ItemID  Effective   LocationID  Remarks
======  ==========  ==========  ===============================
1001    2015-04-01  15          In location 15
1001    2015-04-02  NULL        In Transit [to location xx]
1001    2015-04-05  17          In location 17
Item 1001 appears in the database when it arrives at location 15, where it spends one whole day. The next day it is removed and shipped. Three days later it arrives at location 17, where it remains to this day.
Implied meanings are generally frowned upon and are indeed easy to overdo. If desired, you can add an actual status field to contain "In location" and "In Transit" values. You may well consider such a course if you think there could be other status values added later (QA Testing, Receiving, On Hold, etc.). But for just two possible values, In Location or In Transit, implied meaning works.
At any rate, you know the current whereabouts of any item by fetching the LocationID with the latest Effective date. You also have a history of where the item is at any date -- and both can be had with the same query.
-- MySQL syntax: a user variable holds the as-of date
set @AsOf = current_date();
select i.*, il.Effective, IfNull( l.LocationName, 'In Transit' ) as Location
from Items i
join ItemLocation il
    on il.ItemID = i.ID
    and il.Effective =(
        select Max( il2.Effective )
        from ItemLocation il2
        where il2.ItemID = il.ItemID
            and il2.Effective <= @AsOf )
left join Locations l
    on l.ID = il.LocationID;
Set the AsOf value to "today" to get the most recent location or set it to any date to see the location as of that date. Since the current location will be far and away the most common query, define a view that generates just the current location and use that in the join.
join CurrentItemLocation cil
on cil.ItemID = i.ID
left join Locations l
on l.ID = cil.LocationID;
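The view itself is not spelled out above, so here is one way it might look. This is a sketch only, run on SQLite through Python's sqlite3 module so that it is self-contained; the view body, the Name column on Items, and the sample names ("Widget", "Location 15", "Location 17") are assumptions, and dates are stored as ISO text so that MAX and <= compare correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items     (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Locations (ID INTEGER PRIMARY KEY, LocationName TEXT NOT NULL);
CREATE TABLE ItemLocation (
    ItemID     INTEGER NOT NULL REFERENCES Items (ID),
    Effective  TEXT    NOT NULL,                  -- 'YYYY-MM-DD'
    LocationID INTEGER REFERENCES Locations (ID), -- NULL implies "In Transit"
    Remarks    TEXT,
    PRIMARY KEY (ItemID, Effective)
);

-- Latest row per item that is already effective today
CREATE VIEW CurrentItemLocation AS
SELECT il.ItemID, il.Effective, il.LocationID
FROM ItemLocation il
WHERE il.Effective = (SELECT MAX(x.Effective)
                      FROM ItemLocation x
                      WHERE x.ItemID = il.ItemID
                        AND x.Effective <= date('now'));

INSERT INTO Items VALUES (1001, 'Widget');
INSERT INTO Locations VALUES (15, 'Location 15'), (17, 'Location 17');
INSERT INTO ItemLocation VALUES
    (1001, '2015-04-01', 15,   'In location 15'),
    (1001, '2015-04-02', NULL, 'In Transit'),
    (1001, '2015-04-05', 17,   'In location 17');
""")

def current_locations():
    """Current whereabouts of every item, using the view."""
    return conn.execute("""
        SELECT i.Name, COALESCE(l.LocationName, 'In Transit')
        FROM Items i
        JOIN CurrentItemLocation cil ON cil.ItemID = i.ID
        LEFT JOIN Locations l ON l.ID = cil.LocationID
        ORDER BY i.Name""").fetchall()

def locations_as_of(as_of):
    """Whereabouts of every item on the given date (the general query)."""
    return conn.execute("""
        SELECT i.Name, il.Effective, COALESCE(l.LocationName, 'In Transit')
        FROM Items i
        JOIN ItemLocation il
          ON il.ItemID = i.ID
         AND il.Effective = (SELECT MAX(x.Effective)
                             FROM ItemLocation x
                             WHERE x.ItemID = il.ItemID
                               AND x.Effective <= ?)
        LEFT JOIN Locations l ON l.ID = il.LocationID
        ORDER BY i.Name""", (as_of,)).fetchall()
```

With the three sample rows, locations_as_of('2015-04-03') reports the item as 'In Transit' since 2015-04-02, while current_locations() reports 'Location 17'.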

Related

Database ER Model weekday availability

I've got an annoying design issue when designing a database and its models. Essentially, the database has clients and customers, who should be able to make appointments with each other. The clients should have their availability (on a general weekly basis) stored in the database, and this needs to be added to the appointment model. The solution does not require or want precise hours for the availability, just one value for each day, ranging from "not available" through "maybe available" to "available". The only solution I've come up with so far includes having all 7 days stored in a row for each client, but it looks nasty.
So here's some of what I got so far:
Client model:
ClientId
Service,
Fee
Customer-that-uses-Client model:
CustomerId
ServiceNeed
Availability-model:
ClientID (FK/PK)
Monday, (int)
...
...
Sunday (int)
And finally, appointment model:
AppointmentId
ClientID
CustomerID
StartDate
Hourse
Problem: is there any way I can redesign the availability model to ... well, need fewer fields and still get each day stored with a (1-3) value depending on the client's availability? It would also be really good if the appointment model wouldn't need to reference all that data from the availability model...
Problem
Answering the narrow question is easy. However, noting the Relational Database tag, there are a few problems in your model, that render it somewhat less than Relational.
Eg. the data content in each logical row needs to be unique. (Uniqueness on the Record id, which is physical, system-generated, and not from the data, cannot provide row uniqueness.) The Primary Key must be "made up from the data", which is of course the only way to make the data row unique.
Eg. values such as Day of availability and AvailabilityType are not constrained, and they need to be.
Relational Data Model
With the issues fixed, the answer looks like this:
Notation
All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993.
My IDEF1X Introduction is essential reading for those who are new to the Relational Model or data modelling.
Content
In the Relational Model, there is a large emphasis on constraining the data, such that the database as a whole contains only valid data.
The only solution I've come up with so far includes having all 7 days stored in a row for each client, but it looks nasty.
Yes. What you had was a repeating attribute (they are named Monday..Sunday, which may not look like a repeating attribute, but it is one, no less than a CSV list). That breaks Codd's Second Normal Form.
The solution is to place the single element in a subordinate table ProviderAvailable.
Day of availability and AvailabilityType are now constrained to a set of values.
The rows in Provider (sorry, the use of "Client" in this context grates on me) and Customer are now unique, due to addition of a Name. The users will not use an internal number to identify such entities, they will use a name, usually a ShortName.
Once the model is tightened up, and all the columns are defined, if Name (not a combination of LastName, FirstName, Initial) is unique, you can eliminate the RecordId, and elevate the Name AK to the PK.
Not Modelled
You have not asked, and I have not modelled these items, but I suspect they will come up as you progress in the development.
A Provider (Client) provides 1 Service. There may be more than 1 in future.
A Customer, seeking 1 Service, can make an Appointment with any Provider (who may or may not provide that Service). You may want to constrain each Appointment to a Provider who provides the sought Service.
As per my comment. It depends on how tight you want this Availability/Reservation system to be. Right now, there is nothing to prevent more than one Customer reserving one Provider on a particular Day, ie. a double-booking.
Normalize that availability table: instead of
ClientID (FK/PK)
Monday, (int)
...
...
Sunday (int)
go with
ClientID (PK/FK)
weekday integer value (0-6 or maybe 1-7) (PK)
availability integer value 1-3
This table has a compound primary key, made of (ClientID, weekday) because each client may have either zero or one entry for each of the seven weekdays.
In this table, you might have these rows:
43 2 3 (on Tuesdays = 2, client 43 is Available =3)
43 3 2 (on Wednesdays = 3, client 43 is MaybeAvailable =2)
If the row is missing, it means the client is unavailable. An availability value of 1 also means that.
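As a quick illustration of that table, the following sketch builds it in SQLite via Python's sqlite3 module and looks up a client's availability for a weekday. The table name ClientAvailability, the 1-7 weekday numbering (Monday = 1, which matches the Tuesday = 2 example above) and the label text are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ClientAvailability (
    ClientID     INTEGER NOT NULL,
    Weekday      INTEGER NOT NULL CHECK (Weekday BETWEEN 1 AND 7),      -- 1 = Monday
    Availability INTEGER NOT NULL CHECK (Availability BETWEEN 1 AND 3),
    PRIMARY KEY (ClientID, Weekday)    -- at most one row per client per weekday
);
INSERT INTO ClientAvailability VALUES (43, 2, 3), (43, 3, 2);
""")

LABELS = {1: "not available", 2: "maybe available", 3: "available"}

def availability(client_id, weekday):
    """Return the availability label; a missing row counts as not available."""
    row = conn.execute(
        "SELECT Availability FROM ClientAvailability"
        " WHERE ClientID = ? AND Weekday = ?",
        (client_id, weekday)).fetchone()
    return LABELS[row[0] if row else 1]
```

Here availability(43, 2) gives "available", availability(43, 3) gives "maybe available", and any weekday without a row, such as availability(43, 5), gives "not available".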

Classpass.com like database design

I am trying to get my head around creating a classpass-like database design. I'm new to database design and there are a few things that I am not quite sure how to implement.
You can check the classpass example:
https://classpass.com/classes
https://classpass.com/studios
EDIT 1: So here is the idea: Each city has multiple neighbourhoods, each having multiple studios/venues.
After reading spencer7593's comment, here is what I came up with and the things that are still not quite clear:
So what I am not quite sure about is:
I am not sure how to store the venue/studio address and geolocation. Is it better to have a Region table which defines id | name | parent_id and stores the cities and the neighborhoods recursively? Or to add foreign key constraints to city and neighborhoods? Should I store the lat/lon in the venue table, in the address table or even in a separate locations table? I would like to be able to perform searches like:
show me venues in that neighborhood or city
show me venues which are in radius XX from position
Each class should have a schedule and currently I am not sure how to design it. For example: Spinning class, Mo, We, Fr from 9 AM till 10 AM. I would like to be able to do queries like:
show me venues, which have spinning classes on Mo
or show me all classes in category Spinning, Boxing for example
or even show me venues offering spinning classes
Should I create an extra table schedules here? Or just create some kind of view which creates the schedule? If it's an extra table, how should I describe start, end of each day of the week?
@Dimitar,
Even though @rhavendc is correct that this question should be placed in Database Administrators, I will answer your questions in order, to the best of my knowledge.
I am not sure how to store the venue/studio address and geolocation. [...]
You can easily find geolocations by searching on the web; take MyGeoPosition, for example.
I would like to be able to perform searches like
show me venues in that neighborhood or city.
You can do this easily. There are a few ways to do it, and each way will require a bit of tweaking with your ERD design. With the example I attached below, you can run a query to list all the venues with the address_id followed by the city id. The yellow entities are the one I added to ensure integrity.
For example:
-- venue.name is using the "[table].[field]" format to help
-- the engine recognize where the field is coming from.
-- This is useful if you are pulling the fields of the
-- same name from different tables.
select venue.name, city.name
from venue join
address using (address_id) join
city using (city_id);
NOTE: You don't have to include the city_name. I just threw it in there so you can try it out to see all the venues matching it.
If you would like to do it by the neighborhood, you would have to tweak the ERD I gave you by adding neighbor_id to the ADDRESS table; I have attached the example below. You would also have to add neighborhood_id. From there, you can run a query like this:
Using this ERD:
-- Remember the format from the previously mentioned code.
select venue.name, neighborhood.name
from venue join
address using (address_id) join
neighborhood using (neighbor_id);
show me venues which are in radius XX from position
You can calculate the distance (in miles, kilometers, etc.) between two points from their longitude and latitude using the haversine formula.
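For reference, the haversine formula fits in a few lines. The sketch below is in Python and assumes a spherical Earth with a mean radius of 6371 km, so results are approximate; the function names and the (name, lat, lon) tuple layout for venues are made up for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the sphere is an approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def venues_within(venues, lat, lon, radius_km):
    """Names of venues, given as (name, lat, lon) tuples, inside the radius."""
    return [name for name, vlat, vlon in venues
            if haversine_km(lat, lon, vlat, vlon) <= radius_km]
```

For a large venue table you would normally not compute this for every row; a cheap first filter on a latitude/longitude bounding box in the WHERE clause, followed by the exact distance on the remaining rows, is a common arrangement.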
Each class should have a schedule and currently I am not sure how to design it. For example: Spinning class, Mo, We, Fr from 9 AM till 10 AM. I would like to be able to do queries like:
show me venues, which have spinning classes on Mo
or show me all classes in category Spinning, Boxing for example
or even show me venues offering spinning classes
This can be easily derived from either of the ERDs I attached here. In the CLASS table, I added a field called parent_class_id which gets the class_id from the same table. This uses recursion, and I know this is a bit of a headache to understand. This recursion will allow the classes with assigned parent class to show that the classes are also offered at different times.
You can get this result by doing so:
-- Remember the format from the previously mentioned code.
select class1.name, class1.class_id, class2.class_id
from class as class1,
class as class2
where class1.parent_class_id = class2.class_id;
or even show me venues offering spinning classes
This may be a tricky one... If you are wondering which venues are offering spinning classes, where spinning is either part of or the name of the class, not a category, it's simple.
Try this...
-- Remember the format from the previously mentioned code.
select venue_id
from venue join
class using (venue_id)
where class_name = 'spinning';
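NOTE: Keep in mind that whether a search on a literal is case-sensitive depends on the SQL engine and on the collation in use (MySQL's default collations, for example, are case-insensitive). If in doubt, you could try using where UPPER(class_name) = 'SPINNING'.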
If the class name may include words other than "spinning" in its name, use this instead: where UPPER(class_name) like '%SPINNING%'.
If you are wondering which classes are offering spinning classes where spinning is a category, that's where the tricky bit comes in. I believe you would have to use a subquery for this.
Try this:
-- Remember the format from the previously mentioned code.
select class_id
from class join
class_category using (class_id)
where cat_id = (select cat_id
from category
where name = 'spinning');
Again, depending on the engine and collation, literal searches may be case-sensitive. Make sure the literal is in the correct upper or lower case.
Should I create an extra table schedules here? Or just create some kind of view which creates the schedule? If it's an extra table, how should I describe start, end of each day of the week?
Yes and no. You could, but if you can understand recursion in database systems, you don't have to.
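If you do go with an extra table, one possible shape is a row per class, weekday and start time. The sketch below is only an assumption about how that could look, not part of the ERDs above; it runs on SQLite through Python's sqlite3 module, and the class_schedule table, its columns, the Monday = 1 numbering and the sample studios are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE venue (venue_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE class (
    class_id   INTEGER PRIMARY KEY,
    venue_id   INTEGER NOT NULL REFERENCES venue (venue_id),
    class_name TEXT    NOT NULL
);
CREATE TABLE class_schedule (            -- hypothetical schedule table
    class_id   INTEGER NOT NULL REFERENCES class (class_id),
    weekday    INTEGER NOT NULL CHECK (weekday BETWEEN 1 AND 7),  -- 1 = Monday
    start_time TEXT    NOT NULL,         -- 'HH:MM', 24-hour clock
    end_time   TEXT    NOT NULL,
    PRIMARY KEY (class_id, weekday, start_time),
    CHECK (start_time < end_time)
);
INSERT INTO venue VALUES (1, 'Studio One'), (2, 'Studio Two');
INSERT INTO class VALUES (10, 1, 'Spinning'), (20, 2, 'Boxing');
-- Spinning: Mo, We, Fr from 9 AM till 10 AM
INSERT INTO class_schedule VALUES
    (10, 1, '09:00', '10:00'),
    (10, 3, '09:00', '10:00'),
    (10, 5, '09:00', '10:00'),
    (20, 1, '18:00', '19:00');
""")

def venues_with_class_on(class_name, weekday):
    """Venue names offering the named class on the given weekday."""
    rows = conn.execute("""
        SELECT DISTINCT venue.name
        FROM venue
        JOIN class USING (venue_id)
        JOIN class_schedule USING (class_id)
        WHERE UPPER(class_name) = UPPER(?) AND weekday = ?
        ORDER BY venue.name""", (class_name, weekday))
    return [r[0] for r in rows]
```

With the sample rows, venues_with_class_on('spinning', 1) returns the one studio that has Spinning on Mondays, and the same call for weekday 2 returns nothing.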
Hope this helps. :)
Entity Relationship Modeling.
An entity is a person, place, thing, concept or event that can be uniquely identified, is important to the business, and we can store information about.
Based on information in the question, some candidates to consider as entities might be:
studio
class
rating
neighborhood
city
For each entity, what uniquely identifies it? Figure out the candidate keys.
And figure out the relationships between the entities, and the cardinalities. (What is related to what, and how many, required or optional?)
Is a studio related to a class?
Can a studio have more than one class?
Can a studio have zero classes?
Can a class be related to more than one studio?
Is a neighborhood related to zero, one or more city?
Can a studio be related to more than one neighborhood?
Once you've got the entities and relationships, getting the attributes assigned to each entity is pretty straightforward. Just make sure every attribute is dependent on the key, the whole key, and nothing but the key.
FIRST
Your question is not suited to Stack Overflow; I guess it would be better posted in Database Administrators.
SECOND
Here is some reading material, just to give you a good start on building your database:
Data Modeling (It's kinda broad but it's for the better)
Logical Data Model (Short but comprehensive one)
THIRD
Basically, when designing your database you should first know all the data that will be needed in your system, and group it (if needed) to keep the design small. Normalize it to reduce data redundancy.
EXAMPLE
Let's assume that the venue table would be your main table, the center of all the transactions in your system. venue may then have subdata, for example branch, which may hold the different branch locations, and branch may have subdata too, for example schedule, teacher and/or class, which may also be related to each other (subdata gets data from other subdata), and so on with dependent tables.
Then you can also create independent tables that still have connections with others. For example, the neighborhood table may contain the neighborhood location and the main venue location (so it should get the id of the selected venue from the venue table), and so on with related and independent tables.
NOTE
Just remember the "one-to-one, one-to-many" relationships. If a piece of data is going to hold many kinds of subdata, split them into different tables. If it is going to hold only one (1) kind of subdata, put it all in one table.

MySQL: Data structure for transitive relations

I tried to design a data structure for easy and fast querying (delete, insert and update speed does not really matter for me).
The problem: transitive relations, one entry could have relations through other entries whose relations I don't want to save separately for every possibility.
That means: I know that Entry-A is related to Entry-B and also know that Entry-B is related to Entry-C. Even though I don't know explicitly that Entry-A is related to Entry-C, I want to be able to query it.
What I think the solution is:
Eliminating the transitive part when inserting, deleting or updating.
Entry:
id
representative_id
I would store them as sets, like group of entries (not mysql set type, the Math set, sorry if my English is wrong). Every set would have a representative entry, all of the set elements would be related to the representative element.
A new insert would insert the Entry and set the representative as itself.
If the newly inserted entry should be connected to another, I simply set the representative id of the newly inserted entry to the referred entry's rep.id.
Attach B to A
It doesn't matter, If I need to connect it to something that is not a representative entry, It would be the same, because every entry in the set would have the same rep.id.
Attach C to B
Detach B-C: The detached item would have become a representative entry, meaning it would relate to itself.
Detach B-C and attach C to X
Deletion:
If I delete a non-representative entry, it is self explanatory. But deleting a rep.entry is harder a bit. I need to chose a new rep.entry for the set and set every set member's rep.id to the new rep.entry's rep.id.
So, delete A in this:
Would result in this:
What do you think about this? Is it a correct approach? Am I missing something? What should I improve?
Edit:
Querying:
So, if I want to query every entry that is related to a certain entry, whose id I know:
SELECT *
FROM entries a
LEFT JOIN entries b ON (a.rep_id = b.rep_id)
WHERE a.id = :id
SELECT * FROM AlkReferencia
WHERE rep_id=(SELECT rep_id FROM AlkReferencia
WHERE id=:id);
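To check the idea end to end, here is a rough sketch of the insert, attach and query steps described above, using SQLite through Python's sqlite3 module. The name column and the helper function names are made up for the example, and attach() makes one assumption beyond the description: it moves the whole set of the attached entry, so that joining two existing sets keeps a single representative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE entries (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE,
    rep_id INTEGER REFERENCES entries (id)
)""")

def insert_entry(name):
    """Insert a new entry that is its own representative."""
    with conn:
        cur = conn.execute("INSERT INTO entries (name) VALUES (?)", (name,))
        conn.execute("UPDATE entries SET rep_id = id WHERE id = ?",
                     (cur.lastrowid,))
    return cur.lastrowid

def rep_of(entry_id):
    """Representative id of the set the entry belongs to."""
    return conn.execute("SELECT rep_id FROM entries WHERE id = ?",
                        (entry_id,)).fetchone()[0]

def attach(entry_id, target_id):
    """Merge the whole set of entry_id into the set of target_id."""
    new_rep, old_rep = rep_of(target_id), rep_of(entry_id)
    with conn:
        conn.execute("UPDATE entries SET rep_id = ? WHERE rep_id = ?",
                     (new_rep, old_rep))

def related(entry_id):
    """Names of all other entries in the same set."""
    rows = conn.execute("""
        SELECT b.name
        FROM entries a
        JOIN entries b ON b.rep_id = a.rep_id
        WHERE a.id = ? AND b.id <> a.id
        ORDER BY b.name""", (entry_id,))
    return [r[0] for r in rows]

# Hypothetical usage: A-B and B-C are known, A-C follows from the shared rep_id
a, b, c = insert_entry("A"), insert_entry("B"), insert_entry("C")
attach(b, a)
attach(c, b)
```

After these calls, related(a) returns B and C, and related(c) returns A and B, without an explicit A-C row ever being stored. Detaching and deleting a representative would need the extra bookkeeping described in the question and are not covered here.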
About the application that requires this:
Basically, I am storing vehicle part numbers (references), one manufacturer can make multiple parts that can replace another and another manufacturer can make parts that are replacing other manufacturer's parts.
Reference: One manufacturer's OEM number to a certain product.
Cross-reference: A manufacturer can make products that objective is to replace another product from another manufacturer.
I must connect these references in such a way that, when a customer searches for a number (it doesn't matter what kind of number he has), I can list an exact result and the alternative products.
To use the example above (last picture): B, D and E are different products we may have in store. Each one has a manufacturer and a string name/reference (i called it number before, but it can be almost any character chain). If I search for B's reference number, I should return B as an exact result and D,E as alternatives.
So far so good. BUT I need to upload these reference numbers. I can't just migrate them from an ALL-IN-ONE database. Most of the time, when I upload references I got from a manufacturer (most of the time entered manually, but I can use catalogs too), I only get a list where the manufacturer tells which other reference numbers point to his numbers.
Example.:
Asas filter manufacturer, "AS 1" filter has these cross references (means, replaces these):
GOLDEN SUPER --> 1
ALFA ROMEO --> 101000603000
ALFA ROMEO --> 105000603007
ALFA ROMEO --> 1050006040
RENAULT TRUCKS (RVI) --> 122577600
RENAULT TRUCKS (RVI) --> 1225961
ALFA ROMEO --> 131559401
FRAD --> 19.36.03/10
LANDINI --> 1896000
MASSEY FERGUSON --> 1851815M1
...
It would take ages to write all of the AS 1 references down, but there are many (~1500 ?). And that is ONE filter. There are more than 4000 filters and I need to store their references (and these are only the filters). I think you can see that I can't connect everything, but I must know that Alfa Romeo 101000603000 and 105000603007 are the same, even when I only know (AS 1 --> alfa romeo 101000603000) and (as 1 --> alfa romeo 105000603007).
That is why I want to organize them as sets. Each set member would only connect to one other set member, with a rep_id, that would be the representative member. And when someone would want to (like, admin, when uploading these references) attach a new reference to a set member, I simply INSERT INTO References (rep_id,attached_to_originally_id,refnumber) VALUES([rep_id of the entry what I am trying to attach to],[id of the entry what I am trying to attach to], "16548752324551..");
Another thing: I don't need to worry about insert, delete, update speed that much, because it is an admin task in our system and will be done rarely.
It is not clear what you are trying to do, and it is not clear that you understand how to think & design relationally. But you seem to want rows satisfying "[id] is a member of the set named by member [rep_id]".
Stop thinking in terms of representations and pointers. Just find fill-in-the-(named-)blank statements ("predicates") that say what you know about your application situations and that you can combine to ask about your application situations. Every statement gets a table ("relation"). The columns of the table are the names of the blanks. The rows of the table are the ones that make its statement true. A query has a statement built from its table's statements. The rows of its result are the ones that make its statement true. (When a query has JOIN of table names its statement ANDs the tables' statements. UNION ORs them. EXCEPT puts in AND NOT. WHERE ANDs a condition. Dropping a column by SELECT corresponds to logical EXISTS.)
Maybe your application situations are a bunch of cells with values and pointers. But I suspect that your cells and pointers and connections and attaching and inserting are just your way of explaining & justifying your table design. Your application seems to have something to do with sets or partitions. If you really are trying to represent relations then you should understand that a relational table represents (is) a relation. Regardless, you should determine what your table statements are. If you want design help or criticism tell us more about your application situations, not about representation of them. All relational representation is by tables of rows satisfying statements.
Do you really need to name sets by representative elements? If we don't care what the name is then we typically use a "surrogate" name that is chosen by the DBMS, typically via some integer auto-increment facility. A benefit of using such a membership-independent name for a set is that we don't have to rename, in particular by choosing an element.

design database relating to time attribute

I want to design a database which is described as follows:
Each product has only one status at any point in time. However, the status of a product can change during its lifetime. How could I design the relationship between product and status so that all products with a specific status at the current time can easily be queried? In addition, could anyone please give me some in-depth details about designing databases that deal with time durations, as in the problem above? Thanks for any help.
Here is a model to achieve your stated requirement.
Link to Time Series Data Model
Link to IDEF1X Notation for those who are unfamiliar with the Relational Modelling Standard.
Normalised to 5NF; no duplicate columns; no Update Anomalies, no Nulls.
When the Status of a Product changes, simply insert a row into ProductStatus, with the current DateTime. No need to touch previous rows (which were true, and remain true). No dummy values which report tools (other than your app) have to interpret.
The DateTime is the actual DateTime that the Product was placed in that Status; the "From", if you will. The "To" is easily derived: it is the DateTime of the next (DateTime > "From") row for the Product; where it does not exist, the value is the current DateTime (use ISNULL).
The first model is complete; (ProductId, DateTime) is enough to provide uniqueness, for the Primary Key. However, since you request speed for certain query conditions, we can enhance the model at the physical level, and provide:
An Index (we already have the PK Index, so we will enhance that first, before adding a second index) to support covered queries (those based on any arrangement of { ProductId | DateTime | Status } can be supplied by the Index, without having to go to the data rows). This changes the Status::ProductStatus relation from Non-Identifying (broken line) to Identifying (solid line).
The PK arrangement is chosen on the basis that most queries will be Time Series, based on Product⇢DateTime⇢Status.
The second index is supplied to enhance the speed of queries based on Status.
In the Alternate Arrangement, that is reversed; ie, we mostly want the current status of all Products.
In all renditions of ProductStatus, the DateTime column in the secondary Index (not the PK) is DESCending; the most recent is first up.
I have provided the discussion you requested. Of course, you need to experiment with a data set of reasonable size, and make your own decisions. If there is anything here that you do not understand, please ask, and I will expand.
Responses to Comments
Report all Products with Current State of 2
SELECT  p.ProductId,
        p.Description
    FROM  Product       p,
          ProductStatus ps
    WHERE p.ProductId   = ps.ProductId  -- Join
    AND   ps.StatusCode = 2             -- Request
    AND   ps.DateTime   = (             -- Current Status on the left ...
        SELECT MAX(ps_inner.DateTime)   -- Current Status row for outer Product
            FROM  ProductStatus ps_inner
            WHERE p.ProductId = ps_inner.ProductId
        )
ProductId is Indexed, leading col, both sides
DateTime is Indexed, 2nd col in Covered Query Option
StatusCode is Indexed, 3rd col in Covered Query Option
Since DateTime in the Index is DESCending, only one fetch is required to satisfy the inner query
The rows are required at the same time, for the one query; they are close together (due to the Clustered Index); almost always on the same page due to the short row size.
This is ordinary SQL, a subquery, using the power of the SQL engine, Relational set processing. It is the one correct method, there is nothing faster, and any other method would be slower. Any report tool will produce this code with a few clicks, no typing.
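For anyone who wants to try the query without a full server, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The column types, product descriptions and sample rows are my assumptions for illustration (the real definitions are in the linked model), and the old-style comma join is written as an explicit JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (
    ProductId   INTEGER PRIMARY KEY,
    Description TEXT NOT NULL
);
CREATE TABLE ProductStatus (
    ProductId  INTEGER NOT NULL REFERENCES Product (ProductId),
    DateTime   TEXT    NOT NULL,   -- moment the Product entered the Status
    StatusCode INTEGER NOT NULL,
    PRIMARY KEY (ProductId, DateTime)
);
-- Hypothetical sample data
INSERT INTO Product VALUES (1, 'Widget'), (2, 'Gadget'), (3, 'Gizmo');
INSERT INTO ProductStatus VALUES
    (1, '2024-01-01 09:00:00', 1),
    (1, '2024-02-01 09:00:00', 2),   -- Widget is currently in Status 2
    (2, '2024-01-05 09:00:00', 2),
    (2, '2024-03-01 09:00:00', 3),   -- Gadget was in Status 2, has moved on
    (3, '2024-01-10 09:00:00', 2);   -- Gizmo is currently in Status 2
""")

# Products whose latest ProductStatus row has StatusCode 2
current_in_status_2 = conn.execute("""
    SELECT p.ProductId, p.Description
    FROM   Product p
    JOIN   ProductStatus ps ON p.ProductId = ps.ProductId
    WHERE  ps.StatusCode = 2
    AND    ps.DateTime = (
               SELECT MAX(ps_inner.DateTime)
               FROM   ProductStatus ps_inner
               WHERE  ps_inner.ProductId = p.ProductId
           )
    ORDER BY p.ProductId
""").fetchall()

print(current_in_status_2)  # [(1, 'Widget'), (3, 'Gizmo')]
```

Gadget is excluded even though it has a Status 2 row, because that row is not its most recent one.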
Two Dates in ProductStatus
Columns such as DateTimeFrom and DateTimeTo are gross errors. Let's take it in order of importance.
It is a gross Normalisation error. "DateTimeTo" is easily derived from the single DateTime of the next row; it is therefore redundant, a duplicate column.
The precision does not come into it: that is easily resolved by virtue of the DataType (DATE, DATETIME, SMALLDATETIME). Whether you display one less second, microsecond, or nanosecond, is a business decision; it has nothing to do with the data that is stored.
Implementing a DateTo column is a 100% duplicate (of DateTime of the next row). This takes twice the disk space. For a large table, that would be significant unnecessary waste.
Given that it is a short row, you will need twice as many logical and physical I/Os to read the table, on every access.
And twice as much cache space (or put another way, only half as many rows would fit into any given cache space).
By introducing a duplicate column, you have introduced the possibility of error (the value can now be derived two ways: from the duplicate DateTimeTo column or the DateTimeFrom of the next row).
This is also an Update Anomaly. When any DateTimeFrom is Updated, the DateTimeTo of the previous row has to be fetched (no big deal as it is close) and Updated (big deal as it is an additional verb that can be avoided).
"Shorter" and "coding shortcuts" are irrelevant, SQL is a cumbersome data manipulation language, but SQL is all we have (Just Deal With It). Anyone who cannot code a subquery really should not be coding. Anyone who duplicates a column to ease minor coding "difficulty" really should not be modelling databases.
Note well, that if the highest order rule (Normalisation) was maintained, the entire set of lower order problems are eliminated.
Think in Terms of Sets
Anyone having "difficulty" or experiencing "pain" when writing simple SQL is crippled in performing their job function. Typically the developer is not thinking in terms of sets, and the Relational Database is a set-oriented model.
For the query above, we need the Current DateTime; since ProductStatus is a set of Product States in chronological order, we simply need the latest, or MAX(DateTime) of the set belonging to the Product.
Now let's look at something allegedly "difficult", in terms of sets. For a report of the duration that each Product has been in a particular State: the DateTimeFrom is an available column, and defines the horizontal cut-off, a sub set (we can exclude earlier rows); the DateTimeTo is the earliest of the sub set of Product States.
SELECT  p.ProductId,
        p.Description,
        [DateFrom] = ps.DateTime,
        [DateTo]   = (
        SELECT MIN(ps_inner.DateTime)                     -- earliest in subset
            FROM  ProductStatus ps_inner
            WHERE p.ProductId       = ps_inner.ProductId  -- our Product
            AND   ps_inner.DateTime > ps.DateTime         -- defines subset, cutoff
        )
    FROM  Product       p,
          ProductStatus ps
    WHERE p.ProductId   = ps.ProductId
    AND   ps.StatusCode = 2             -- Request
Thinking in terms of getting the next row is row-oriented, not set-oriented processing. Crippling, when working with a set-oriented database. Let the Optimiser do all that thinking for you. Check your SHOWPLAN, this optimises beautifully.
Inability to think in sets, thus being limited to writing only single-level queries, is not a reasonable justification for: implementing massive duplication and Update Anomalies in the database; wasting online resources and disk space; guaranteeing half the performance. Much cheaper to learn how to write simple SQL subqueries to obtain easily derived data.
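As a rough illustration of deriving the "To" value rather than storing it, here is a sketch of the duration query run against SQLite from Python. The sample rows and the fixed AS_OF value are assumptions; COALESCE plays the role of ISNULL, supplying a stand-in for the current DateTime when there is no later row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductStatus (
    ProductId  INTEGER NOT NULL,
    DateTime   TEXT    NOT NULL,
    StatusCode INTEGER NOT NULL,
    PRIMARY KEY (ProductId, DateTime)
);
-- Hypothetical sample data
INSERT INTO ProductStatus VALUES
    (1, '2024-01-01 09:00:00', 1),
    (1, '2024-02-01 09:00:00', 2),
    (1, '2024-02-15 09:00:00', 3),
    (2, '2024-01-05 09:00:00', 2);   -- still in Status 2, no later row
""")

# Fixed stand-in for "now", so that the result is reproducible
AS_OF = "2024-03-01 00:00:00"

periods_in_status_2 = conn.execute("""
    SELECT ps.ProductId,
           ps.DateTime AS DateFrom,
           COALESCE(
               (SELECT MIN(ps_inner.DateTime)            -- earliest in subset
                FROM   ProductStatus ps_inner
                WHERE  ps_inner.ProductId = ps.ProductId
                AND    ps_inner.DateTime  > ps.DateTime  -- defines subset
               ),
               ?                                         -- open-ended: use AS_OF
           ) AS DateTo
    FROM   ProductStatus ps
    WHERE  ps.StatusCode = 2
    ORDER BY ps.ProductId
""", (AS_OF,)).fetchall()

for row in periods_in_status_2:
    print(row)
```

Product 1 gets a DateTo of '2024-02-15 09:00:00' (its next row), while Product 2, which has no later row, gets the AS_OF value. No DateTimeTo column is stored anywhere.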
"In addition, could anyone please give me some in-depth details about design database which related to time duration as problem above?"
Well, there exists a 400-page book entitled "Temporal Data and the Relational Model" that addresses your problem.
That book also addresses numerous problems that the other responders have not addressed in their responses, for lack of time or for lack of space or for lack of knowledge.
The introduction of the book also explicitly states that "this book is not about technology that is (commercially) available to any user today.".
All I can observe is that users wanting temporal features from SQL systems are, to put it plain and simple, left wanting.
PS
Even if those 400 pages could be "compressed a bit", I hope you don't expect me to give a summary of the entire meaningful content within a few paragraphs here on SO ...
tables similar to these:
product
-----------
product_id
status_id
name
status
-----------
status_id
name
product_history
---------------
product_id
status_id
status_time
then write a trigger on product to record the status and timestamp (sysdate) on each update where the status changes
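A minimal sketch of such a trigger, written for SQLite and driven from Python so it can be run as-is. The trigger name and sample rows are made up, and SQLite's CURRENT_TIMESTAMP stands in for sysdate; the syntax on your DBMS will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE status (
    status_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE product (
    product_id INTEGER PRIMARY KEY,
    status_id  INTEGER NOT NULL REFERENCES status (status_id),
    name       TEXT NOT NULL
);
CREATE TABLE product_history (
    product_id  INTEGER NOT NULL REFERENCES product (product_id),
    status_id   INTEGER NOT NULL REFERENCES status (status_id),
    status_time TEXT    NOT NULL
);

-- Record a history row only when the status actually changes
CREATE TRIGGER product_status_change
AFTER UPDATE OF status_id ON product
WHEN NEW.status_id <> OLD.status_id
BEGIN
    INSERT INTO product_history (product_id, status_id, status_time)
    VALUES (NEW.product_id, NEW.status_id, CURRENT_TIMESTAMP);
END;

-- Hypothetical sample data
INSERT INTO status  VALUES (1, 'new'), (2, 'shipped');
INSERT INTO product VALUES (10, 1, 'Widget');
""")

conn.execute("UPDATE product SET status_id = 2 WHERE product_id = 10")       # logged
conn.execute("UPDATE product SET name = 'Widget v2' WHERE product_id = 10")  # not logged
conn.execute("UPDATE product SET status_id = 2 WHERE product_id = 10")       # unchanged, not logged

history = conn.execute(
    "SELECT product_id, status_id FROM product_history"
).fetchall()
print(history)  # [(10, 2)]
```

Note that this only fires on updates, so the initial status given at insert time is not in product_history unless you add an insert trigger as well.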

			
				
Google "bi-temporal databases" and "slowly changing dimensions".
These are two names for essentially the same pattern.
You need to add two timestamp columns to your product table: "VALID_FROM" and "VALID_TO".
When your product status changes you add a NEW row with "VALID_FROM" of now() or some other known effective date/time, and set its "VALID_TO" to 9999-12-31 23:59:59 or some other date ridiculously far into the future.
You also need to zap the "9999-12-31..." date on the previously current row to the current "VALID_FROM" time - 1 microsecond.
You can then easily query the product status at any given time.
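Here is a rough sketch of the pattern using SQLite from Python. The helper names set_status and status_at are hypothetical, and I have simplified one thing: instead of subtracting a microsecond, the previous row's VALID_TO is set to the new VALID_FROM and the lookup treats each row as a half-open interval (VALID_FROM <= t < VALID_TO).

```python
import sqlite3

FAR_FUTURE = "9999-12-31 23:59:59"

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE product (
        product_id INTEGER NOT NULL,
        status     TEXT    NOT NULL,
        valid_from TEXT    NOT NULL,
        valid_to   TEXT    NOT NULL,
        PRIMARY KEY (product_id, valid_from)
    )
""")

def set_status(product_id, status, effective):
    """Close the currently open row (if any), then add the new current row."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "UPDATE product SET valid_to = ? "
            "WHERE product_id = ? AND valid_to = ?",
            (effective, product_id, FAR_FUTURE),
        )
        conn.execute(
            "INSERT INTO product VALUES (?, ?, ?, ?)",
            (product_id, status, effective, FAR_FUTURE),
        )

def status_at(product_id, moment):
    """Return the status in effect at the given moment, or None."""
    row = conn.execute(
        "SELECT status FROM product "
        "WHERE product_id = ? AND valid_from <= ? AND ? < valid_to",
        (product_id, moment, moment),
    ).fetchone()
    return row[0] if row else None

# Hypothetical usage
set_status(1, "new", "2024-01-01 00:00:00")
set_status(1, "shipped", "2024-02-01 00:00:00")
print(status_at(1, "2024-01-15 00:00:00"))  # new
print(status_at(1, "2024-02-01 00:00:00"))  # shipped
```

The timestamps are stored as ISO-8601 text, so plain string comparison orders them correctly.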

The best way to structure this database?

At the moment I'm doing this:
gems(id, name, colour, level, effects, source)
id is the primary key and is not auto-increment.
A typical row of data would look like this:
id => 40153
name => Veiled Ametrine
colour => Orange
level => 80
effects => +12 sp, +10 hit
source => Ametrine
(Some of you gamers might see what I'm doing here :) )
But I realise this could be sorted a lot better. I have studied database relationships and secondary keys in my A-Level computing class but never got as far as to set one up properly. I just need help with how this database should be organised, like what tables should have what data with what secondary and foreign keys?
I was thinking maybe 3 tables: gem, effects, source. Which then have relationships to each other?
Can anyone shed some light on this? Is a complex way like I'm proposing really the way to go or should I just carry on with what I'm doing?
Cheers.
I happen to be passingly familiar with the environment you're describing (:))
Despite what you have convinced yourself, what you are doing is not particularly complex.
Anyway, currently, you have a table with no relationships. It's simple. It's easy. Each gem exists in the database.
If you were to move to the three tables that you proposed, you would also need to include link tables to assemble the tables into useable data, especially since (and mind, I'm not quite sure how your distinctions boil out) the effects and source table are involved in a many-to-x relationship: each gem has up to two effects, and each effect has up to Y gems where it is present // each source has up to Z gems.
I'd stick with the single table. The individual records may be longer, but it's much simpler, and you'll encounter fewer errors than if you were trying to establish linking tables or the like.
Questions to ask yourself:
Is there a 1 to 1 relationship between gem, effects, and source?
Would you more often be pulling effects without pulling data from gem?
If the proposed tables have a 1 to 1 relationship then I'd suggest leaving them combined in one table. The only time I would consider splitting them out in this condition is if I only needed data from effects without needing other data AND these tables were going to be large enough to justify having them stored on different drives. Otherwise, you're just making work for yourself, adding more storage requirements and reaping exactly zero benefits.
You should also consider whether you will need the effects information for actual usage, or display only. If it is display only, no big deal to have it in one column in a table. If you have to use it, for example to apply the +12 and +10 appropriately, then I think you should put each occurrence of it in a separate column. Accordingly, you should have a separate table for effects, and then a separate table storing which gems have which effects, maybe gemeffects. The Effects table might have better descriptions of what "sp" stands for, maybe the min and max ranges, etc. The GemEffects table would just have the gem id, the value, and the effect itself. For example
Effects
effect => hit
desc => How many hit points
minimum => 0
maximum => 100
GemEffects
id => 40153
effect => sp
value => 12
and
id => 40153
effect => hit
value => 10
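A small sketch of those two tables in SQLite (via Python), to show how the values become usable numbers rather than display text. I have renamed desc to description because DESC is an SQL keyword, and the description and range on the sp row are placeholders, since I only gave values for hit above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Effects (
    effect      TEXT PRIMARY KEY,
    description TEXT    NOT NULL,
    minimum     INTEGER NOT NULL,
    maximum     INTEGER NOT NULL
);
CREATE TABLE GemEffects (
    id     INTEGER NOT NULL,                            -- gem id
    effect TEXT    NOT NULL REFERENCES Effects (effect),
    value  INTEGER NOT NULL,
    PRIMARY KEY (id, effect)
);
INSERT INTO Effects VALUES
    ('hit', 'How many hit points', 0, 100),
    ('sp',  'placeholder description', 0, 100);         -- placeholder row
INSERT INTO GemEffects VALUES
    (40153, 'sp',  12),
    (40153, 'hit', 10);
""")

# Fetch the numeric bonuses of one gem, ready to be applied
bonuses = dict(conn.execute(
    "SELECT effect, value FROM GemEffects WHERE id = ? ORDER BY effect",
    (40153,),
).fetchall())

print(bonuses)  # {'hit': 10, 'sp': 12}
```

With the values stored as integers there is no need to parse strings like "+12 sp, +10 hit" in application code.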
You would answer your own question if you do a simple exercise: describe in a natural, descriptive language your system. Which entities, their attributes, how they interact with other entities, etc. Underline substantives and verbs. Ask what entities do you mean to manage (eg: will there be an interface to manage the "effects" table?) You'll be surprised how it all gets assembled naturally.
Now for your example, I'd suggest two approaches (without syntactic details)
1) to gain experience in relational design, with some complexity overhead, and granular over each entity
gem (id, name, color_id, source_id, effect_assoc_id)
color (id, name)
source (id, name)
effect (id, value, nature_id)
nature (id, name)
effect_assoc (id, gem_id, effect_id)
2) straight to the point, possibly valid depending on the cardinality of your relations
just carry on ;)
From your description, I'd go with #1.
I would recommend the following:
1. Move all effects into their own table (e.g., ID, Name, Description, Enabled, ...)
2. Move source into its own table (e.g., ID, Name, Description, Enabled, ...)
3. Drop gems "effects" column (migrates to step 5 below)
4. Convert the gems "source" column into a foreign key value that corresponds to the PK from the "source" table
5. Add a new table to link a single gem entity to zero or more effect entities
   Example: tbl_GemsEffectsLink, with two columns named "GemID" and "EffectID" that by themselves are foreign keys back to the entity tables and, when taken together, make up the composite primary key.
A sample view of this link table would be as follows:
GemID EffectID
1 1
1 2
2 1
2 2
2 3
So, in summary, you would have the following tables:
gems
effects
source
gemseffectslink
With each table having the following columns:
gems
    id (PK)
    name
    colour
    level
    sourceid (FK)
effects
    id (PK)
    name
    description
    enabled
    ...
source
    id (PK)
    name
    description
    enabled
    ...
gemseffectslink
    gemid (FK)
    effectid (FK)
Lastly, this assumes each gem can have zero or more effects, a single source (you can enforce NULL or NOT NULL for this gem.sourceid FK field), and that the level integer value is just that (i.e., not representing something more robust and exhaustive in that there exists some type of "Level" entity and the value of "80" in your sample data row uniquely identifies one of these "Level" entities).
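To show how the pieces fit together, here is a minimal sketch of these tables in SQLite, run from Python. The column types, the trimmed column lists (description and enabled are left out), the ids in effects and source, and the NOT NULL choice on sourceid are all assumptions; the gem row comes from your sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE source  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE effects (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE gems (
    id       INTEGER PRIMARY KEY,
    name     TEXT    NOT NULL,
    colour   TEXT    NOT NULL,
    level    INTEGER NOT NULL,
    sourceid INTEGER NOT NULL REFERENCES source (id)
);
CREATE TABLE gemseffectslink (
    gemid    INTEGER NOT NULL REFERENCES gems (id),
    effectid INTEGER NOT NULL REFERENCES effects (id),
    PRIMARY KEY (gemid, effectid)          -- composite primary key
);
INSERT INTO source  VALUES (1, 'Ametrine');
INSERT INTO effects VALUES (1, '+12 sp'), (2, '+10 hit');
INSERT INTO gems    VALUES (40153, 'Veiled Ametrine', 'Orange', 80, 1);
INSERT INTO gemseffectslink VALUES (40153, 1), (40153, 2);
""")

# Reassemble one gem with its source and all of its effects
gem_rows = conn.execute("""
    SELECT g.name, s.name, e.name
    FROM   gems g
    JOIN   source s          ON s.id    = g.sourceid
    JOIN   gemseffectslink l ON l.gemid = g.id
    JOIN   effects e         ON e.id    = l.effectid
    WHERE  g.id = ?
    ORDER BY e.id
""", (40153,)).fetchall()
print(gem_rows)

# The composite PK stops the same effect being linked to a gem twice
try:
    conn.execute("INSERT INTO gemseffectslink VALUES (40153, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```

The join returns one row per effect, so the gem above comes back as two rows, one for each of its effects.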
Hope this helps!
Michael