MySQL: Custom Auto-Generated Key (AUTO_INCREMENT / multiple-column index) - mysql

MySQL is used as database.
As part of inventory system, I need generate stock number, that is unique identifier for asset. Client has requirement that this number is not just autoincremented integer, but follows pattern:
#BusinessUnit#YYYY#Number
, where
#BusinessUnit = string representing business unit;
#YYYY = current year;
#Number=n = Unique number for this BusinessUnit & This Year: n-th item asset registered in system this year and ready for sale.
For example,lets say we have we have various users entering assets for 2 Business Units = {NY, CA}. Stock numbers would be expected as follows:
NY201100001
NY201100002
CA201100001
NY201100003
CA201100002
So far based on manuals available, first thought would be using AUTO_INCREMENT and have separate table for each business unit with trigger on insert, where after insert from numberic auto-generated id update inventory table containing all business unit assets with generated id with concatenated business unit and year in front.
Also as first thing in new year reset AUTO_INCREMENT = 0 - alter all tables.
Is there any better way and ability avoid need create multiple tables, can I just create somehow multi-column index? If yes, could you please provide appropriate table definition sample?

DISREGARD THE CLIENT (partially).
Create your tables with auto-increment "InventoryID" for purposes of guaranteeing simple, disconnected context to anything else on the record. Create a SECOND column that is their "InventoryIDUnit" which can be a "candidate" key matching the business rules you are responsible for keeping. When a search is done on the "InventoryIDUnit" (specifically formatted field), internally and through the rest of your system, you'll have the INTERNAL numeric for joining the rest of the way down through the system.
Think of a customer order system. If you had it based on a person's name, how many "Jane Doe" versions out there... are they the same or not... Internally, customers have an ID and all orders go back to that common ID. Then, Jane gets married and is now "Jane Smith"... Are you going to go back through the data and rename all the entries to the new name??? That's the whole purpose of a surrogate key.

Related

How to track change(Update/delete) in MYSQL for later query (NOT FOR LOG)

I have research some question in stackoverflow, but what I want is for later query purpose, not for logging purose.
I have a project that needs to get value from certain moment.
For example
I have a user table
User:
id
name
address
Pet:
id
name
type
Adoption:
id
user_id
pet_id
Data:
User:
1, John, One Street
Pet:
1, Lucy, Cat
Adoption:
1, 1, 1
Let's say the user change address so it look like
User:
1, John, Another Street
And what I need is
What is the address(or other field) of the user when they adopt the pet.
What I am thinking of is always create a new row in same table(in this case user) and refer the new row to the previous row
User:
2, 1, John, Another Street ( where 1 is referring to the previous id / updated from)
1, NULL, John, One Street, deleted (NULL means this is newly created data)
The advantage of using this is, it's easy to query(I just query like usual
The downside is the table will be so huge to record every update. Is there any solution?
Thank you
This is what i do sometimes:
For any field that i need to track value changes, i design a separate changes table.
For example, for the address field that is a concept associated with the user entity and is not a direct property of the adoption entity, i define the table:
UserAddressChanges(UserID, Address, ChangeDateTime, ChangerPersonID)
This way, the changes data may be used in any other sub-system or system, independent of your current adoption use-case.
I use in-table change tracking for very simple tables like:
UniversityManagers(PersonID, AssignDateTime, AssignorPersonID)
For more complex tables with frequent changes (and usually, few refers to previous data) where i need full record logging, i separate the main table (of current records) and the log table which have extra fields such as LogID, ChangeDateTime, ChangerPersonID, ChangerIP, ...
There are different approaches to this.
Perhaps the simplest is to denormalize the data. If there is data you need at the point of adoption, include it as columns in the adoption table. This address is the "point-in-time" address.
This method is useful for simple things, but it does not scale well. And you have to pre-define the columns you want.
The next step is to create audit tables for all your tables, or at least all tables of interest. Every time a record changes in user, a new record is added into userAudit. Audit tables are usually maintained using triggers.
The advantage of audit tables is that they do not clutter the existing table (and logic). The same queries work on the existing tables.
Finally, you can just cave in and realize that your data model is overly simplified. You really have slowly changing dimensions. This data can be represented using version effective dates and version end dates for each row. The user table ends up looking like:
user_id name address version_eff_dt version_end_dt
Because user_id is no longer a primary key, you might want two tables users and userHistory, or something like that.
This is a "correct" representation of the data at any point in time. However, it usually requires restructuring queries because a single user appears multiple times in the table -- and user_id is no longer the primary key.

Database ER Model weekday availability

I've got a annoying design issue when designing a database and it's models. Essentially, the database got clients and customers which should be able to make appointments with eachother. The clients should have their availability (on a general week basis) stored in the database, and this needs to be added to the appointment model. The solution does not require or want precise hours for the availability, just one value for each day - ranging from "not available", to "maybe available " to "available". The only solution i've come up with so far includes having all 7 days stored in a row for each client, but it looks nasty.
So here's some of what I got so far:
Client model:
ClientId
Service,
Fee
Customer-that-uses-Client model:
CustomerId
ServiceNeed
Availability-model:
ClientID (FK/PK)
Monday, (int)
...
...
Sunday (int)
And finally, appointment model:
AppointmentId
ClientID
CustomerID
StartDate
Hourse
Problem: is there any way i can redesign the avilability model to ... well, need less fields and still get each day stored with a (1-3) value depending on the clients availability ? Would also be really good if the appointment model wouldnt need to reference all that data from the availability model...
Problem
Answering the narrow question is easy. However, noting the Relational Database tag, there are a few problems in your model, that render it somewhat less than Relational.
Eg. the data content in each logical row needs to be unique. (Uniqueness on the Record id, which is physical, system-generated, and not from the data, cannot provide row uniqueness.) The Primary Key must be "made up from the data", which is of course the only way to make the data row unique.
Eg. values such as Day of availability and AvailabilityType are not constrained, and they need to be.
Relational Data Model
With the issues fixed, the answer looks like this:
Notation
All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993.
My IDEF1X Introduction is essential reading for those who are new to the Relational Model or data modelling.
Content
In the Relational Model, there is a large emphasis on constraining the data, such that the database as a whole contains only valid data.
The only solution i've come up with so far includes having all 7 days stored in a row for each client, but it looks nasty.
Yes. What you had was a repeating attribute (they are named Monday..Sunday, which may not look like a repeating attribute, but it is one, no less than a CSV list). That breaks Codd's Second Normal Form.
The solution is to place the single element in a subordinate table ProviderAvailable.
Day of availability and AvailabilityType are now constrained to a set of values.
The rows in Provider (sorry, the use of "Client" in this context grates on me) and Customer are now unique, due to addition of a Name. The users will not use an internal number to identify such entities, they will use a name, usually a ShortName.
Once the model is tightened up, and all the columns are defined, if Name (not a combination of LastName, FirstName, Initial) is unique, you can eliminate the RecordId, and elevate the Name AK to the PK.
Not Modelled
You have not asked, and I have not modelled these items, but I suspect they will come up as you progress in the development.
A Provider (Client) provides 1 Service. There may be more than 1 in future.
A Customer, seeking 1 Service, can make an Appointment with any Provider (who may or may not provide that Service). You may want to constrain each Appointment to a Provider who provides the sought Service.
As per my comment. It depends on how tight you want this Availability/Reservation system to be. Right now, there is nothing to prevent more than one Customer reserving one Provider on a particular Day, ie. a double-booking.
Normalize that availability table: instead of
ClientID (FK/PK)
Monday, (int)
...
...
Sunday (int)
go with
ClientID (PK/FK)
weekday integer value (0-6 or maybe 1-7) (PK)
availability integer value 1-3
This table has a compound primary key, made of (ClientID, weekday) because each client may have either zero or one entry for each of the seven weekdays.
In this table, you might have these rows:
43 2 3 (on Tuesdays = 2, client 43 is Available =3)
43 3 2 (on Wednesdays = 3, client 43 is MaybeAvailable =2)
If the row is missing, it means the client is unavailable. an availability value of 1 also means that.

Database design without a 1 column table

I have been working on a database design and I'm stuck hitting a wall. I'm ending up with what I'm reading is not a normalized database structure but I'm having issues trying to find a "more correct" design and if this design is acceptable how do I execute it in Access?
TLDR: If a table with a single column set as an auto number is an acceptable design, how do you go about inserting a record in it using Access?
The segment of the database of concern is creating a structure for storing companies. Requirements for this is that any changes need to be approved by another user and all historical changes need to be captured so that it can be easily reverted also a company can have multiple aliases but only one legal name.
There is three tables in my solution but one of them is a single column table. From what I've read 95% of people on stack overflow all think its a very bad idea but I've found one post were people are that there are cases for it. I think this is not normal also because I can't find a way to just create a new record in a table with only an auto number column (In Access I have not tried others yet).
Table Structure
Company Names : ID, Company ID, Is Legal Name, Created By, Created On, Approved On, Approved By, Event ID, Is Active
(A company could have a few different names known to the public: TD vs Toronto Dominion. Each name is inserted here with a reference to the company it belongs to)
Companies : ID (Auto Number)
(A company exists and this is its ID)
Companies History : ID, Company ID, Market ID, Holding Company ID, Created By, Created On, Approved On, Approved By, Event ID, Is Active
(These are the historical changes that have been made to the company and who did them and who approved them)
Column Notes:
Event ID : is a FK reference to a table holding each record of actions that have either created, updated or deleted records. (User Research using method [y], Typo Fix, ...)
Is Active : Since deleting records is not possible (historical records need to be kept) this column is used to track if this record is to be included in queries.
Options I see and their issues:
I could get rid of the companies table and make Companies History : ID be the new company id but I find that in that case each time I want to update a company I would need to update each FK reference to the previous company id (I don't think this would be a very normalized approach)
Another Option I see is that I get rid of Companies table and use Company Names : ID as the company id and I would add a column to Company Names called Alias of Company ID. I find that solution adds a log of complexity to my stored data where an alias has company information that differs from the entry that was aliased.
Another Option is that I could add the columns: Created By, Created On, Approved On, Approved By, Event ID and Is Active but this would be duplicating information found in the first record for this company in the Companies History table and this isn't adding any real description to this record.
Anther Option is that I make the Companies table a mirror of Companies History and that when I update or insert a record in Companies I would also insert a record Companies History. With this solution I find that again I duplicate information, that newest record in "Companies History" would hold the same information found in last Inserted or Updated record in in Companies
Another option but is to replace the Companies : ID auto number with a short text and I just get the hash of the current timestamp + a random int. I can now insert new records into this table using access but I feel that this is overkill since I just need the exact same functionality as the auto number.
Another option is move only the legal name into Companies table but now when the legal name of a company changes I have no way of tracking this. Also if I want a list of all names I need to use a union on Companies and Company Names. I find that using unions can reduce performances of queries and I use them only when explicitly needed.
If I don't want to duplicate any information and I don't want to update all FK it seems that I need a table with a single column. If this is acceptable how do I go about inserting a record into a table with a single column set to auto number in Access.
If Companies can be derived from CompanyNames (select distinct CompanyId from CompanyNames), there is no point storing again that information. Just replace that table by a view if you want it (but it as little added value).
On the other hand, if CreatedOn refers to the Company creation (not the row creation) then it is obviously a property of the Company, and I would rather work with
Companies --> Aliases.
But of course I don't know the ins and outs of the reality you're dealing with.

Database design: Managing old and new data in database table

I have a table Student with field as followed,
Student table (one record per student)
student_id
Name
Parent_Name
Address_line1, Address_line2, Addess_line
Photo_path
Signature_file_path
Preferred_examcity_choice1,Preferred_examcity_choice1, Preferred_examcity_choice3
Gender
Nationality
.
.
.
I am inserting into this table on Registration form completion through the web interface.
Now there is one more module in a web interface for updating the student data, on every update request I am updating the student table records and inserting the new entry in student_data_change_request. student can change records any number of times.
student_data_change_request
request_id(auto_incr PK)
old_name
new_name
old_photo_path
new_photo_path
old_signature_file_path
new_signature_file_path
Now coming to problem, earlier students were allowed to change very few fields, now client want to allow the candidate to update more number of fields(around 20 fields) and adding old and new columns for the corresponding column isn't elegant and preferred(I guess), I will end up creating 40 columns to keep track of 20 columns. So how should I redesign my table? suggestions are welcomed.
One approach is to have a shadow table named (table)_xx that has the same columns, the time, date, update/insert/delete flag, user or whatever and no referential integrity. Set a trigger to update that table from the source whenever anything happens.
If you've got genuine business requirements that need history then do those properly but this pattern is great as a general audit, debugging and forensic tool.
It's also really easy to automate/script as you just generate it from the DB metadata.
Usually historical table looks like:
request_id
column_name
old_value
new_value
dt
request_id and column_name are primary key. When you update student table you insert new entry in student_data_change_request for each updating column.
Edited:
Another way:
request_id
value_type
name
photo_path
signature_file_path
...
and insert first entry with old values and second entry with new values. Colum value_type is mark old or new.
I would rather have just one table, with an additional column for effective date. Then a view that picks up just the most recent row for each student_id becomes your first "table". If for some reason you must show "current" and "most recently changed" values side-by-side, that is another view.
As usual, it all depends on how you intend to use the data.
My strong preference in these cases is the solution #mathguy suggests - embedding the concept of time in the main table design. This allows you to ask the question "what was this student's address on 1 Jan?", or "who had signature x on 12 Feb?".
If you have to report or execute business logic that reflects the status at any point in time, this design works really well. For instance, if you have to report on how many students lived in a particular address for a given term, you want to know when the records were valid.
But not all applications care about "time" - sometimes, you just want to have an audit table, so you can trace what happened over time in case of anomalies.
In that case, #loztinspace's solution is useful - but in my experience, this rapidly escalates into more work, because those who want to inspect the audit records can or should not get access to a SQL prompt on your production environment.

Database Historization

We have a requirement in our application where we need to store references for later access.
Example: A user can commit an invoice at a time and all references(customer address, calculated amount of money, product descriptions) which this invoice contains and calculations should be stored over time.
We need to hold the references somehow but what if the e.g. the product name changes? So somehow we need to copy everything so its documented for later and not affected by changes in future. Even when products are deleted, they need to reviewed later when the invoice is stored.
What is the best practise here regarding database design? Even what is the most flexible approach e.g. when the user want to edit his invoice later and restore it from the db?
Thank you!
Here is one way to do it:
Essentially, we never modify or delete the existing data. We "modify" it by creating a new version. We "delete" it by setting the DELETED flag.
For example:
If product changes the price, we insert a new row into PRODUCT_VERSION while old orders are kept connected to the old PRODUCT_VERSION and the old price.
When buyer changes the address, we simply insert a new row in CUSTOMER_VERSION and link new orders to that, while keeping the old orders linked to the old version.
If product is deleted, we don't really delete it - we simply set the PRODUCT.DELETED flag, so all the orders historically made for that product stay in the database.
If customer is deleted (e.g. because (s)he requested to be unregistered), set the CUSTOMER.DELETED flag.
Caveats:
If product name needs to be unique, that can't be enforced declaratively in the model above. You'll either need to "promote" the NAME from PRODUCT_VERSION to PRODUCT, make it a key there and give-up ability to "evolve" product's name, or enforce uniqueness on only latest PRODUCT_VER (probably through triggers).
There is a potential problem with the customer's privacy. If a customer is deleted from the system, it may be desirable to physically remove its data from the database and just setting CUSTOMER.DELETED won't do that. If that's a concern, either blank-out the privacy-sensitive data in all the customer's versions, or alternatively disconnect existing orders from the real customer and reconnect them to a special "anonymous" customer, then physically delete all the customer versions.
This model uses a lot of identifying relationships. This leads to "fat" foreign keys and could be a bit of a storage problem since MySQL doesn't support leading-edge index compression (unlike, say, Oracle), but on the other hand InnoDB always clusters the data on PK and this clustering can be beneficial for performance. Also, JOINs are less necessary.
Equivalent model with non-identifying relationships and surrogate keys would look like this:
You could add a column in the product table indicating whether or not it is being sold. Then when the product is "deleted" you just set the flag so that it is no longer available as a new product, but you retain the data for future lookups.
To deal with name changes, you should be using ID's to refer to products rather than using the name directly.
You've opened up an eternal debate between the purist and practical approach.
From a normalization standpoint of your database, you "should" keep all the relevant data. In other words, say a product name changes, save the date of the change so that you could go back in time and rebuild your invoice with that product name, and all other data as it existed that day.
A "de"normalized approach is to view that invoice as a "moment in time", recording in the relevant tables data as it actually was that day. This approach lets you pull up that invoice without any dependancies at all, but you could never recreate that invoice from scratch.
The problem you're facing is, as I'm sure you know, a result of Database Normalization. One of the approaches to resolve this can be taken from Business Intelligence techniques - archiving the data ina de-normalized state in a Data Warehouse.
Normalized data:
Orders table
OrderId
CustomerId
Customers Table
CustomerId
Firstname
etc
Items table
ItemId
Itemname
ItemPrice
OrderDetails Table
ItemDetailId
OrderId
ItemId
ItemQty
etc
When queried and stored de-normalized, the data warehouse table looks like
OrderId
CustomerId
CustomerName
CustomerAddress
(other Customer Fields)
ItemDetailId
ItemId
ItemName
ItemPrice
(Other OrderDetail and Item Fields)
Typically, there is either some sort of scheduled job that pulls data from the normalized datas into the Data Warehouse on a scheduled basis, OR if your design allows, it could be done when an order reaches a certain status. (Such as shipped) It could be that the records are stored at each change of status (with a field called OrderStatus tacking the current status), so the fully de-normalized data is available for each step of the oprder/fulfillment process. When and how to archive the data into the warehouse will vary based on your needs.
There is a lot of overhead involved in the above, but the other common approach I'm aware of carries even MORE overhead.
The other approach would be to make the tables read-only. If a customer wants to change their address, you don't edit their existing address, you insert a new record.
So if my address is AddressId 12 when I first order on your site in Jamnuary, then I move on July 4, I get a new AddressId tied to my account. (Say AddressId 123123 because your site is very successful and has attracted a ton of customers.)
Orders I palced before July 4 would have AddressId 12 associated with them, and orders placed on or after July 4 have AddressId 123123.
Repeat that pattern with every table that needs to retain historical data.
I do have a third approach, but searching it is difficult. I use this in one app only, and it actually works out pretty well in this single instance, which had some pretty specific business needs for reconstructing the data exactly as it was at a specific point in time. I wouldn't use it unless I had similar business needs.
At a specific status, serialize the data into an Xml document, or some other document you can use to reconstruct the data. This allows you to save the data as it was at the time it was serialized, retaining original table structure and relaitons.
When you have time-sensitive data, you use things like the product and Customer tables as lookup tables and store the information directly in your Orders/orderdetails tables.
So the order table might contain the customer name and address, the details woudl contain all relevant information about the produtct including especially price(you never want to rely on the product table for price information beyond the intial lookup at teh time of the order).
This is NOT denormalizing, the data changes over time but you need the historical value, so you must store it at the time the record is created or you will lose data intergrity. You don't want your financial reports to suddenly indicate you sold 30% more last year because you have price updates. That's not what you sold.