Optimal solution for: Keeping state of entity in database - mysql

Problem
When I design structure of a database, I often work with tables that can have a state assigned. For example a response for an offer - this response table can, for instance, have these states:
waiting - response was created and waits for approval by owner of offer
cancelled - response was cancelled by its author
approved - response was approved by the author of offer
rejected - response was rejected by the author of offer
expired - response expired together with associated offer
I am considering these two solutions
1. Solution
Create table response_state and keep its key in the response table as foreign key
Pros:
All states are together in one table
New state can be added easily
Cons:
Synchronization of response_state foreign key value with other response's columns is necessary. For example, for expiration - when expiration day is reached, state has to be changed to "expired".
2. Solution
Put approved/rejected/cancelled logical value columns into response and create view view_response_state that will contain a column with state name according to values in these columns and the expiration date.
For example, if approved is false, rejected is false, cancelled is false and expiration_date >= today, then state is "waiting", etc.
Pros:
No synchronization needed, all data are kept once in the db
Cons:
When I want to add a new state I have to change table response and provide view_response_state with logic of identification of such state
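For reference, Solution 2 could be sketched roughly as below. The column types, the presence of expiration_date on response itself, and the order in which the flags are checked are all assumptions, not something stated above.

```sql
-- Hypothetical schema for Solution 2; names and types are assumptions.
CREATE TABLE response (
    response_id     INT AUTO_INCREMENT PRIMARY KEY,
    offer_id        INT NOT NULL,
    approved        BOOLEAN NOT NULL DEFAULT FALSE,
    rejected        BOOLEAN NOT NULL DEFAULT FALSE,
    cancelled       BOOLEAN NOT NULL DEFAULT FALSE,
    expiration_date DATE NOT NULL
);

-- The state is derived on read, so nothing has to be kept in sync.
-- The precedence of the WHEN branches is an assumption.
CREATE VIEW view_response_state AS
SELECT response_id,
       CASE
           WHEN cancelled THEN 'cancelled'
           WHEN approved  THEN 'approved'
           WHEN rejected  THEN 'rejected'
           WHEN expiration_date < CURRENT_DATE THEN 'expired'
           ELSE 'waiting'
       END AS state
FROM response;
```

Adding a new state here means an ALTER TABLE plus a new WHEN branch, which is the con listed above.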
Question
My question is, which approach would you choose? Or is there a better approach?

In my opinion you need three tables.
One is to be called "response_state". It contains five rows, one for each of your response states. If you need to add a new state, just INSERT it into this table. It has the columns "response_state_id" and "response_state_name". Little tables like this are often called codelist tables.
Another is to be called "offer". It will have an offer_id and other information as needed about the offer.
The third is "response." It contains the following columns.
response_id pk, autoincrement
offer_id fk to offer table
response_state_id fk to response_state table
response_timestamp
(other columns relating to the response as needed)
This table works as follows: Anytime the state of a response changes, you INSERT a row to this table showing the new state. You never UPDATE these rows. You might DELETE old ones in a purge process for completed transactions.
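A minimal sketch of the three tables, assuming MySQL with InnoDB. The column types and the numeric ids given to the five states are my assumptions; only the table and key names come from the description above.

```sql
-- Codelist table: one row per state.
CREATE TABLE response_state (
    response_state_id   INT PRIMARY KEY,
    response_state_name VARCHAR(20) NOT NULL UNIQUE
);

INSERT INTO response_state (response_state_id, response_state_name) VALUES
    (1, 'waiting'), (2, 'cancelled'), (3, 'approved'),
    (4, 'rejected'), (5, 'expired');   -- id values are assumptions

CREATE TABLE offer (
    offer_id INT AUTO_INCREMENT PRIMARY KEY
    -- other offer columns as needed
);

CREATE TABLE response (
    response_id        INT AUTO_INCREMENT PRIMARY KEY,
    offer_id           INT NOT NULL,
    response_state_id  INT NOT NULL,
    response_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (offer_id) REFERENCES offer (offer_id),
    FOREIGN KEY (response_state_id)
        REFERENCES response_state (response_state_id)
);

-- Create one offer, then record two state changes for it.
-- Each change is a new row; existing rows are never updated.
INSERT INTO offer () VALUES ();
INSERT INTO response (offer_id, response_state_id) VALUES (1, 1);  -- waiting
INSERT INTO response (offer_id, response_state_id) VALUES (1, 3);  -- approved
```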
When you need to find the current state of an offer, you run a query like this. It pulls only the most recent response to each offer from the table.
SELECT r.offer_id, r.response_state_id, rs.response_state_name
FROM response AS r
JOIN response_state AS rs ON r.response_state_id = rs.response_state_id
JOIN (
SELECT MAX(response_id) as latest_id,
offer_id
FROM response
GROUP BY offer_id
) AS recent ON r.response_id = recent.latest_id
This is a really cool way to handle this because it retains the history of responses to each offer. Because it's an INSERT-only solution it's inherently robust against various kinds of race conditions if lots of responses come in on top of each other.

Related

How to handle changes in a relationship which would have an impact if there was an update [duplicate]

What is the best-practice for maintaining the integrity of linked data entities on update?
My scenario
1. I have two entities, "Client" and "Invoice" [Client is a definition and Invoice is a transaction].
2. After issuing many invoices to the client, it happens that the client information needs to be changed, e.g. "his billing address/location changed or business name ... etc". It's normal that the users must be able to update the client information to keep the integrity of the data in the system.
3. In the invoice "transaction entity" I don't store just the client id but also all the client information related to the invoice, like "client name, address, contact", and that's a well-known approach for storing data in transaction entities.
4. If the user creates a new invoice, the new client information will be stored in the invoice record along with the same client-id (very obvious!).
My Questions
1. Is it okay to bind the "Client" data entity from different locations for the insert and the update? [Explanation: if I follow the approach from steps 1-4, I have to bind the client entity from the client table when creating a new invoice, but when updating or printing the invoice I have to bind the client entity from the invoice table, otherwise the data won't be consistent. So how can I keep the data integrity without creating spaghetti code in the DAL to handle these custom data-binding requirements?]
2. I came across a system that saved all previous versions of an entity's data before each update ("keeping history of all versions"). If I want to use the same method to avoid the custom binding, how can I do this in terms of database design using MySQL? [Explanation: some invoices were created with version 1.0 of the client, then the client info was updated and its version became 1.1, and new invoices were created with the latest version. So is it good to follow this methodology? And how should I design my entities/tables to fulfil the requirements of entity versioning and binding?]
Please point me to any book or reference that can steer me in the right direction.
Thanks,
What you need to do is leave the table the way it is. You are correct, you should be storing the customer information in the invoice for history of where the items were shipped to. When it changes, you should NOT update this information except for any invoices which have not yet been shipped. To maintain this type of information, you need a trigger on the customer table that looks for invoices that have not been shipped and updates those addresses automatically.
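Such a trigger might look roughly like this in MySQL. The column names (billing_address, shipped_at and so on) are hypothetical, and I am assuming an invoice counts as "not shipped" while shipped_at is NULL.

```sql
-- Hypothetical tables; only the customer/invoice split comes from the question.
CREATE TABLE IF NOT EXISTS customer (
    customer_id     INT AUTO_INCREMENT PRIMARY KEY,
    name            VARCHAR(100) NOT NULL,
    billing_address VARCHAR(255) NOT NULL
);

CREATE TABLE IF NOT EXISTS invoice (
    invoice_id      INT AUTO_INCREMENT PRIMARY KEY,
    customer_id     INT NOT NULL,
    customer_name   VARCHAR(100) NOT NULL,   -- copy taken at invoice time
    billing_address VARCHAR(255) NOT NULL,   -- copy taken at invoice time
    shipped_at      DATETIME NULL,           -- NULL = not yet shipped (assumption)
    FOREIGN KEY (customer_id) REFERENCES customer (customer_id)
);

-- Push customer changes only into invoices that have not shipped yet;
-- shipped invoices keep the data they were shipped with.
CREATE TRIGGER customer_after_update
AFTER UPDATE ON customer
FOR EACH ROW
    UPDATE invoice
    SET    customer_name   = NEW.name,
           billing_address = NEW.billing_address
    WHERE  customer_id = NEW.customer_id
      AND  shipped_at IS NULL;
```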
If you want to save historical versions of the client information, the correct process is to create an audit table and populate it through a trigger.
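As a rough illustration of the audit-table idea, assuming MySQL and a customer table with the hypothetical columns shown here:

```sql
-- Hypothetical customer table; column names are assumptions.
CREATE TABLE IF NOT EXISTS customer (
    customer_id     INT AUTO_INCREMENT PRIMARY KEY,
    name            VARCHAR(100) NOT NULL,
    billing_address VARCHAR(255) NOT NULL
);

-- One audit row per change, holding the values as they were before it.
CREATE TABLE customer_audit (
    audit_id        INT AUTO_INCREMENT PRIMARY KEY,
    customer_id     INT NOT NULL,
    name            VARCHAR(100),
    billing_address VARCHAR(255),
    changed_at      TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Copy the old values into the audit table before every update.
CREATE TRIGGER customer_before_update_audit
BEFORE UPDATE ON customer
FOR EACH ROW
    INSERT INTO customer_audit (customer_id, name, billing_address)
    VALUES (OLD.customer_id, OLD.name, OLD.billing_address);
```

The live customer table keeps its shape, so existing queries against it are unaffected.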
Data integrity in this case is simply through a foreign key to the customer id. The id itself should not ever change or be allowed to change by the user and should be a surrogate number such as an integer. Because you should not be changing the address information in the actual invoice (unless it has not been shipped, in which case you had better change it or the product will be shipped to the wrong place), this is sufficient to maintain data integrity. This also allows you to see where the stuff was actually shipped but still look up the current info about the client through the use of the foreign key.
If you have clients that change (companies bought by other companies), you can either run a process on the server to update the customer id of old records or create a table structure that shows which client ids belong to a current parent id. The first is easier to do if you aren't talking about changing millions of records.
"This is a business case where data must be denormalized to preserve historical records of what was shipped where. His design is not incorrect."
Sorry for adding this as a new response, but the "add comment" button still doesn't show.
"His design" is indeed not incorrect ... because it is normalized !!!
It is normalized because it is not at all times true that the address corresponding to an invoice functionally depends on the customer ID exclusively.
So : normalization, yes I do think so. Not that normalization is the only issue involved here.
I'm not completely clear on what you are getting at, but I think you want to read up on normalization, available in many books on relational databases and SQL. I think what you will end up with is two tables connected by a foreign key, but perhaps some soul-searching per previous sentence will help you clarify your thoughts.

How to track change(Update/delete) in MYSQL for later query (NOT FOR LOG)

I have researched some questions on Stack Overflow, but what I want is for later querying, not for logging purposes.
I have a project that needs to get value from certain moment.
For example
I have a user table
User:
id
name
address
Pet:
id
name
type
Adoption:
id
user_id
pet_id
Data:
User:
1, John, One Street
Pet:
1, Lucy, Cat
Adoption:
1, 1, 1
Let's say the user changes their address, so it looks like
User:
1, John, Another Street
And what I need to know is:
What was the address (or other field) of the user when they adopted the pet?
What I am thinking of is to always create a new row in the same table (in this case user) and have the new row refer to the previous row
User:
2, 1, John, Another Street ( where 1 is referring to the previous id / updated from)
1, NULL, John, One Street, deleted (NULL means this is newly created data)
The advantage of this is that it's easy to query (I just query as usual).
The downside is that the table will become huge from recording every update. Is there any solution?
Thank you
This is what I do sometimes:
For any field whose value changes I need to track, I design a separate changes table.
For example, for the address field, which is a concept associated with the user entity and is not a direct property of the adoption entity, I define the table:
UserAddressChanges(UserID, Address, ChangeDateTime, ChangerPersonID)
This way, the changes data may be used in any other sub-system or system, independent of your current adoption use-case.
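A sketch of how that table could answer the "address at adoption time" question, assuming MySQL. The adopted_at column on Adoption is my addition (the question's table does not have it), and I assume the user's initial address is also recorded as a row in the changes table.

```sql
CREATE TABLE UserAddressChanges (
    UserID          INT NOT NULL,
    Address         VARCHAR(255) NOT NULL,
    ChangeDateTime  DATETIME NOT NULL,
    ChangerPersonID INT,
    PRIMARY KEY (UserID, ChangeDateTime)
);

-- adopted_at is a hypothetical column needed for point-in-time lookups.
CREATE TABLE Adoption (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    user_id    INT NOT NULL,
    pet_id     INT NOT NULL,
    adopted_at DATETIME NOT NULL
);

-- For each adoption, pick the latest address change made at or
-- before the moment of adoption.
SELECT a.id AS adoption_id, c.Address
FROM Adoption AS a
JOIN UserAddressChanges AS c
  ON c.UserID = a.user_id
WHERE c.ChangeDateTime = (
    SELECT MAX(c2.ChangeDateTime)
    FROM UserAddressChanges AS c2
    WHERE c2.UserID = a.user_id
      AND c2.ChangeDateTime <= a.adopted_at
);
```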
I use in-table change tracking for very simple tables like:
UniversityManagers(PersonID, AssignDateTime, AssignorPersonID)
For more complex tables with frequent changes (and usually few references to previous data) where I need full record logging, I separate the main table (of current records) from the log table, which has extra fields such as LogID, ChangeDateTime, ChangerPersonID, ChangerIP, ...
There are different approaches to this.
Perhaps the simplest is to denormalize the data. If there is data you need at the point of adoption, include it as columns in the adoption table. This address is the "point-in-time" address.
This method is useful for simple things, but it does not scale well. And you have to pre-define the columns you want.
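In its simplest form, the denormalized variant could look like the sketch below (MySQL assumed). The address_at_adoption column name is hypothetical.

```sql
CREATE TABLE IF NOT EXISTS `User` (
    id      INT PRIMARY KEY,
    name    VARCHAR(100) NOT NULL,
    address VARCHAR(255) NOT NULL
);

-- The adoption row carries its own copy of the address.
CREATE TABLE Adoption (
    id                  INT AUTO_INCREMENT PRIMARY KEY,
    user_id             INT NOT NULL,
    pet_id              INT NOT NULL,
    address_at_adoption VARCHAR(255) NOT NULL  -- hypothetical column name
);

-- Copy the user's current address at the moment the adoption is recorded.
INSERT INTO Adoption (user_id, pet_id, address_at_adoption)
SELECT u.id, 1, u.address
FROM `User` AS u
WHERE u.id = 1;
```

Later updates to the user's address leave address_at_adoption untouched.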
The next step is to create audit tables for all your tables, or at least all tables of interest. Every time a record changes in user, a new record is added into userAudit. Audit tables are usually maintained using triggers.
The advantage of audit tables is that they do not clutter the existing table (and logic). The same queries work on the existing tables.
Finally, you can just cave in and realize that your data model is overly simplified. You really have slowly changing dimensions. This data can be represented using version effective dates and version end dates for each row. The user table ends up looking like:
user_id name address version_eff_dt version_end_dt
Because user_id is no longer a primary key, you might want two tables users and userHistory, or something like that.
This is a "correct" representation of the data at any point in time. However, it usually requires restructuring queries because a single user appears multiple times in the table -- and user_id is no longer the primary key.
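A sketch of the versioned layout and a point-in-time lookup, assuming MySQL. Using NULL in version_end_dt to mark the current version is my convention, and the date literal is only an example value.

```sql
CREATE TABLE userHistory (
    user_id        INT NOT NULL,
    name           VARCHAR(100) NOT NULL,
    address        VARCHAR(255) NOT NULL,
    version_eff_dt DATETIME NOT NULL,
    version_end_dt DATETIME NULL,   -- NULL = still current (assumption)
    PRIMARY KEY (user_id, version_eff_dt)
);

-- Address of user 1 as of a given moment: the version whose
-- effective period contains that moment.
SELECT h.address
FROM userHistory AS h
WHERE h.user_id = 1
  AND h.version_eff_dt <= '2020-01-15 00:00:00'
  AND (h.version_end_dt IS NULL
       OR h.version_end_dt > '2020-01-15 00:00:00');
```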

Database ER Model weekday availability

I've got an annoying design issue when designing a database and its models. Essentially, the database has clients and customers who should be able to make appointments with each other. The clients should have their availability (on a general weekly basis) stored in the database, and this needs to be added to the appointment model. The solution does not require or want precise hours for the availability, just one value for each day, ranging from "not available" to "maybe available" to "available". The only solution I've come up with so far includes having all 7 days stored in a row for each client, but it looks nasty.
So here's some of what I got so far:
Client model:
ClientId
Service,
Fee
Customer-that-uses-Client model:
CustomerId
ServiceNeed
Availability-model:
ClientID (FK/PK)
Monday, (int)
...
...
Sunday (int)
And finally, appointment model:
AppointmentId
ClientID
CustomerID
StartDate
Hours
Problem: is there any way I can redesign the availability model to ... well, need fewer fields and still get each day stored with a (1-3) value depending on the client's availability? It would also be really good if the appointment model wouldn't need to reference all that data from the availability model...
Problem
Answering the narrow question is easy. However, noting the Relational Database tag, there are a few problems in your model, that render it somewhat less than Relational.
Eg. the data content in each logical row needs to be unique. (Uniqueness on the Record id, which is physical, system-generated, and not from the data, cannot provide row uniqueness.) The Primary Key must be "made up from the data", which is of course the only way to make the data row unique.
Eg. values such as Day of availability and AvailabilityType are not constrained, and they need to be.
Relational Data Model
With the issues fixed, the answer looks like this:
Notation
All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993.
My IDEF1X Introduction is essential reading for those who are new to the Relational Model or data modelling.
Content
In the Relational Model, there is a large emphasis on constraining the data, such that the database as a whole contains only valid data.
The only solution i've come up with so far includes having all 7 days stored in a row for each client, but it looks nasty.
Yes. What you had was a repeating attribute (they are named Monday..Sunday, which may not look like a repeating attribute, but it is one, no less than a CSV list). That breaks Codd's Second Normal Form.
The solution is to place the single element in a subordinate table ProviderAvailable.
Day of availability and AvailabilityType are now constrained to a set of values.
The rows in Provider (sorry, the use of "Client" in this context grates on me) and Customer are now unique, due to addition of a Name. The users will not use an internal number to identify such entities, they will use a name, usually a ShortName.
Once the model is tightened up, and all the columns are defined, if Name (not a combination of LastName, FirstName, Initial) is unique, you can eliminate the RecordId, and elevate the Name AK to the PK.
Not Modelled
You have not asked, and I have not modelled these items, but I suspect they will come up as you progress in the development.
A Provider (Client) provides 1 Service. There may be more than 1 in future.
A Customer, seeking 1 Service, can make an Appointment with any Provider (who may or may not provide that Service). You may want to constrain each Appointment to a Provider who provides the sought Service.
As per my comment. It depends on how tight you want this Availability/Reservation system to be. Right now, there is nothing to prevent more than one Customer reserving one Provider on a particular Day, ie. a double-booking.
Normalize that availability table: instead of
ClientID (FK/PK)
Monday, (int)
...
...
Sunday (int)
go with
ClientID (PK/FK)
weekday integer value (0-6 or maybe 1-7) (PK)
availability integer value 1-3
This table has a compound primary key, made of (ClientID, weekday) because each client may have either zero or one entry for each of the seven weekdays.
In this table, you might have these rows:
43 2 3 (on Tuesdays = 2, client 43 is Available =3)
43 3 2 (on Wednesdays = 3, client 43 is MaybeAvailable =2)
If the row is missing, it means the client is unavailable. An availability value of 1 also means that.
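As a sketch in MySQL, the normalized table could be declared like this. The table name and column types are assumptions, and I picked the 1-7 numbering for weekdays. Note that MySQL only enforces CHECK constraints from version 8.0.16 on; older versions parse and ignore them.

```sql
CREATE TABLE client_availability (
    ClientID     INT NOT NULL,       -- would also be an FK to the client table
    weekday      TINYINT NOT NULL,   -- 1-7 numbering (assumption)
    availability TINYINT NOT NULL,   -- 1 = not, 2 = maybe, 3 = available
    PRIMARY KEY (ClientID, weekday),
    CHECK (weekday BETWEEN 1 AND 7),
    CHECK (availability BETWEEN 1 AND 3)
);

-- The two example rows from above.
INSERT INTO client_availability (ClientID, weekday, availability) VALUES
    (43, 2, 3),
    (43, 3, 2);

-- Availability of client 43 on weekday 5; a missing row is
-- treated as 1 (not available).
SELECT COALESCE(
    (SELECT availability
     FROM client_availability
     WHERE ClientID = 43 AND weekday = 5),
    1) AS availability;
```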

Database design: Managing old and new data in database table

I have a table Student with field as followed,
Student table (one record per student)
student_id
Name
Parent_Name
Address_line1, Address_line2, Address_line3
Photo_path
Signature_file_path
Preferred_examcity_choice1, Preferred_examcity_choice2, Preferred_examcity_choice3
Gender
Nationality
.
.
.
I am inserting into this table on Registration form completion through the web interface.
Now there is one more module in the web interface for updating the student data. On every update request I update the student table record and insert a new entry in student_data_change_request. A student can change records any number of times.
student_data_change_request
request_id(auto_incr PK)
old_name
new_name
old_photo_path
new_photo_path
old_signature_file_path
new_signature_file_path
Now coming to the problem: earlier, students were allowed to change very few fields. Now the client wants to allow the candidate to update more fields (around 20), and adding old and new columns for each corresponding column isn't elegant or preferred (I guess); I will end up creating 40 columns to keep track of 20 columns. So how should I redesign my table? Suggestions are welcome.
One approach is to have a shadow table named (table)_xx that has the same columns, the time, date, update/insert/delete flag, user or whatever and no referential integrity. Set a trigger to update that table from the source whenever anything happens.
If you've got genuine business requirements that need history then do those properly but this pattern is great as a general audit, debugging and forensic tool.
It's also really easy to automate/script as you just generate it from the DB metadata.
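Roughly, for the student table and an UPDATE trigger, assuming MySQL. Only a few of the student columns are shown, and the names of the extra bookkeeping columns are hypothetical. Similar triggers would be needed for INSERT and DELETE.

```sql
-- Cut-down version of the student table, for illustration only.
CREATE TABLE IF NOT EXISTS student (
    student_id INT AUTO_INCREMENT PRIMARY KEY,
    Name       VARCHAR(100),
    Photo_path VARCHAR(255)
);

-- Shadow table: same columns plus bookkeeping, no keys or constraints.
CREATE TABLE student_xx (
    student_id  INT,
    Name        VARCHAR(100),
    Photo_path  VARCHAR(255),
    changed_at  DATETIME NOT NULL,
    change_type CHAR(1) NOT NULL,    -- 'I', 'U' or 'D'
    changed_by  VARCHAR(100)
);

-- Record the new values of the row after every update.
CREATE TRIGGER student_after_update
AFTER UPDATE ON student
FOR EACH ROW
    INSERT INTO student_xx
        (student_id, Name, Photo_path, changed_at, change_type, changed_by)
    VALUES
        (NEW.student_id, NEW.Name, NEW.Photo_path, NOW(), 'U', USER());
```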
Usually a history table looks like:
request_id
column_name
old_value
new_value
dt
request_id and column_name form the primary key. When you update the student table, you insert a new entry in student_data_change_request for each updated column.
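In MySQL this could be sketched as follows. The student_id column is my addition so that a request can be tied to a student, the column types are assumptions, and the sample values are made up. request_id is assumed to be assigned by the application, since one request spans several rows.

```sql
CREATE TABLE student_data_change_request (
    request_id  INT NOT NULL,
    column_name VARCHAR(64) NOT NULL,
    student_id  INT NOT NULL,        -- assumption, not in the list above
    old_value   TEXT,
    new_value   TEXT,
    dt          DATETIME NOT NULL,
    PRIMARY KEY (request_id, column_name)
);

-- One request that changed two fields produces two rows.
INSERT INTO student_data_change_request
    (request_id, column_name, student_id, old_value, new_value, dt)
VALUES
    (1, 'Name', 7, 'Jon Doe', 'John Doe', NOW()),
    (1, 'Photo_path', 7, '/photos/7_old.jpg', '/photos/7_new.jpg', NOW());
```

Allowing a 21st field to be changed then needs no schema change, only new column_name values.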
Edited:
Another way:
request_id
value_type
name
photo_path
signature_file_path
...
and insert a first entry with the old values and a second entry with the new values. The column value_type marks the entry as old or new.
I would rather have just one table, with an additional column for effective date. Then a view that picks up just the most recent row for each student_id becomes your first "table". If for some reason you must show "current" and "most recently changed" values side-by-side, that is another view.
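One possible shape for this in MySQL, with hypothetical table and view names and only a few of the student columns shown:

```sql
-- Every change to a student is stored as a new row.
CREATE TABLE student_version (
    student_id     INT NOT NULL,
    effective_date DATETIME NOT NULL,
    Name           VARCHAR(100),
    Photo_path     VARCHAR(255),
    PRIMARY KEY (student_id, effective_date)
);

-- The view plays the role of the original one-row-per-student table.
CREATE VIEW student_current AS
SELECT s.*
FROM student_version AS s
WHERE s.effective_date = (
    SELECT MAX(s2.effective_date)
    FROM student_version AS s2
    WHERE s2.student_id = s.student_id
);
```

Existing read queries can target student_current, while updates become INSERTs into student_version.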
As usual, it all depends on how you intend to use the data.
My strong preference in these cases is the solution #mathguy suggests - embedding the concept of time in the main table design. This allows you to ask the question "what was this student's address on 1 Jan?", or "who had signature x on 12 Feb?".
If you have to report or execute business logic that reflects the status at any point in time, this design works really well. For instance, if you have to report on how many students lived in a particular address for a given term, you want to know when the records were valid.
But not all applications care about "time" - sometimes, you just want to have an audit table, so you can trace what happened over time in case of anomalies.
In that case, #loztinspace's solution is useful - but in my experience, this rapidly escalates into more work, because those who want to inspect the audit records cannot or should not get access to a SQL prompt on your production environment.

Access query is duplicating unique records / Linked table issues

I hope someone can help me with this:
I have a simple query combining a list of names and basic details with another table containing more specific information. Some names will necessarily appear more than once and arbitrary distinctions like "John Smith 1" and "John Smith 2" are not an option, so I have been using an autonumber to keep the records distinct.
The problem is that my query is creating two records for each name that appears more than once. For example, there are two clients named 'Sophoan', each with a different id number, and the query has picked up each one twice resulting in four records (in total there are 122 records when there should only be 102). 'Unique values' is set to 'yes'.
I've researched as much as I can and am completely stuck. I've tried to tinker with sql but it always comes back with errors, I presume because there are too many fields in the query.
What am I missing? Or is a query the wrong approach and I need to find another way to combine my tables?
Project in detail: I'm building a database for a charity which has two main activities: social work and training. The database is to record their client information and the results of their interactions with clients (issues they asked for help with, results of training workshops etc.). Some clients will cross over between activities, which the organisation wants to track, hence all registered clients go into one list and individual tables spin off from that to collect data for each specific activity the client takes part in. This query is supposed to be my solution for combining these tables for data entry by the user.
At present I have the following tables:
AllList (master list of client names and basic contact info; 'Social Work Register' and 'Participant Register' join to this table by 'Name')
Social Work Register (list of social work clients with full details of each case)
Social Work Follow-up Table (used when staff call social work clients to see how their issue is progressing; the register has too many columns to hold this as well; joined to Register by 'Client Name')
Participants Register (list of clients for training and details of which workshops they attended and why they were absent if they missed a session)
Individual workshop tables x14 (each workshop includes a test and these tables record the clients' answers and their score for each individual test; there will be more than 20 of these when the database is finished; all joined to the 'Participants Register' by 'Participant Name')
Queries:
Participant Overview Query (links the attendance data from the 'Register' with the grading data from each Workshop to present a read-only overview; this one seems to work perfectly)
Social Work Query (non-functional; intended to link the 'Client Register' to the 'AllList' for data entry so that when a new client is registered it creates a new record in both tables, with the records matched together)
Participant Query (not yet attempted; as above, intended to link the 'Participant Register' to the 'AllList' for data entry)
BUT I realised that queries can't be used for data entry, so this approach seems to be a dead end. I have had some success with using subforms for data entry but I'm not sure if it's the best way.
So, what I'm basically hoping to achieve is a way to input the same data to two tables simultaneously (for new records) and have the resulting records matched together (for new entries to existing records). But it needs to be possible for the same name to appear more than once as a unique record (e.g. three individuals named John Smith).
[N.B. There are more tables that store secondary information but aren't relevant to the issue as they are not and will not be linked to any other tables.]
I realised that queries can't be used for data entry
Actually, non-complex queries are usually editable as long as the table whose data you want to edit remains 'at the core' of the query. Access applies a number of factors to determine if a query is editable or not.
Most of the time, it's fairly easy to figure out why a query has become non-editable.
Ask yourself the question: if I edit that data, how will Access ensure that exactly that data will be updated, without ambiguity?
If your tables have defined primary keys and these are part of your query, and if there are no grouping, calculated fields (fields that use some function to change or test the value of that field), or complex joins, then the query should remain editable.
You can read more about that here:
How to troubleshoot errors that may occur when you update data in Access queries and in Access forms
Dealing with Non-Updateable Microsoft Access Queries and the Use of Temporary Tables.
So, what I'm basically hoping to achieve is a way to input the same data to two tables simultaneously (for new records) and have the resulting records matched together (for new entries to existing records). But it needs to be possible for the same name to appear more than once as a unique record (e.g. three individuals named John Smith).
This remark actually proves that you have design issues in your database.
A basic tenet of Database Design is to remove redundancy as much as possible. One of the reasons is actually to avoid having to update the same data in multiple places.
Another remark: you are using the Client's name as a Natural Key. Frankly, it is not a very good idea. Generally, you want to make sure that what constitutes a Primary key for a table is reliably unique over time.
Using people's names is generally the wrong choice because:
People change names; for instance, in many cultures women change their family name when they get married.
There could also have been a typo when entering the name, and it can be hard to correct if that data is used as a Foreign Key in several different tables.
As your database grows, you are likely to end up with some people having the same name, creating conflicts or forcing the user to alter the name so it doesn't create a duplicate.
The best way to enforce uniqueness of records in a table is to use the default AutoNumber ID field proposed by Access when you create a new table. This is called a Surrogate key.
It's not meant to be edited, changed or even displayed to the user. Its sole purpose is to allow the primary key of a table to be unique and non-changing over time, so it can reliably be used as a way to reference a record from one table to another (if a table needs to refer to a particular record, it will contain a field that will hold that ID. That field is called a Foreign Key).
The names you have for your tables are not precise enough: think of each table as an Entity holding related data.
The fact that you have a table called AllList means that its purpose isn't that well thought out; it sounds like a catch-all rather than a carefully crafted entity.
Instead, if this is your list of clients, then simply call it Client. Each record of that table holds the information for a single client (whether to use plural or singular is up to you, just stick to your choice though, being consistent is hugely important).
Instead of using the client's name as a key, create an ID field, an Autonumber, and set it as Primary Key.
Let's also rename the "Social Work Register", which holds the Client's cases, simply as ClientCase. That relationship seems clear from your description of the table but it's not clear in the table name itself (by the way, I know Access allows spaces in table and field names, but it's a really bad idea to use them if you care at least a little bit about the future of your work).
In that, create a ClientID Number field (a Foreign Key) that will hold the related Client's ID in the ClientCase table.
You don't talk about the relationship between a Client and its Cases. This is another area where you must be clear: how many cases can a single Client have?
At most 1 Case ? (0 or 1 Case)
exactly 1 Case?
at least one Case? (1 or more Cases)
any number of Cases? (0 or more Cases)
Knowing this is important for selecting the right type of JOIN in your queries. It's a crucial part of the design assumptions when building your database.
For instance, in the most general case, assuming that a Client can have 0 or more cases, you could have a report that displays the Client's Name and the number of cases related to them like this:
SELECT Client.Name,
Count(ClientCase.ID) AS CountOfCases
FROM Client
LEFT JOIN ClientCase
ON Client.ID = ClientCase.ClientID
GROUP BY Client.Name
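One caveat, given that several clients can share a name: grouping by Client.Name alone would merge two different clients called 'Sophoan' into a single row. A variant that groups on the surrogate key as well keeps them apart. This is a sketch using the same table and field names as above.

```sql
SELECT Client.ID,
       Client.Name,
       Count(ClientCase.ID) AS CountOfCases
FROM Client
LEFT JOIN ClientCase
       ON Client.ID = ClientCase.ClientID
GROUP BY Client.ID, Client.Name
```

Each client then gets exactly one row, including clients with no cases, for whom the count is 0.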
You've described your basic design a bit more, but that's not enough. Show us the actual table structures and the SQL of the queries you tried. From the description you give, it's hard to really understand the actual details of the design and to tell you why it fails and how to make it work.