How to design a MySQL table that tracks the Status of each Asset, as well as every old Status?

I would like to create a table that tracks the status of each asset as well as each past status. Basically I want to keep a log of all status changes.
Do I create a timestamp for each status update and make every update its own row, linked back to the asset through the asset id, then sort by the timestamp to get the statuses in order? I can see this table getting unwieldy if there are tons of rows per asset and the table grows linearly over time.
This is for a MySQL database.

Here is an example of how I have designed a database table for tracking/logging purposes.
Columns:
auto-increment PK (if you don't have a better PK)
timestamp
tracked object id (asset_id in your case)
event type (you probably won't need this; it's explained below)
content (this could also be named status in your case)
My example is very simplified, but the main idea is to insert each record as its own row. Create the table with proper primary keys and indexes to get good search performance.
With this structure you should be able to search by asset or by status, get the latest changes, and so on. The exact structure depends on your needs, so I have usually modified it to fit them.
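A minimal sketch of such a table; asset_status_log and its column names are illustrative assumptions, not anything from your schema:

CREATE TABLE asset_status_log (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- surrogate PK
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    asset_id   INT UNSIGNED NOT NULL,                    -- tracked object id
    event_type VARCHAR(32) NOT NULL,                     -- optional, see below
    status     VARCHAR(64) NOT NULL,                     -- the content column
    PRIMARY KEY (id),
    KEY idx_asset_time (asset_id, created_at)
);

-- Latest status per asset; relies on the auto-increment id growing over time.
SELECT l.*
FROM asset_status_log l
JOIN (SELECT asset_id, MAX(id) AS max_id
      FROM asset_status_log
      GROUP BY asset_id) latest ON latest.max_id = l.id;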
Don't worry too much about the event type column. I included it because most implementations like this are based on event sourcing. Here is a link to one article that explains it: http://scottlobdell.me/2017/01/practical-implementation-event-sourcing-mysql/
I suggest reading more about event sourcing to judge whether that design could work in your case. Look at the database example in particular, because it is similar to mine.
The result is a journal of status changes. How the data is read, handled, and displayed is then up to your code.
About the linear growth… I would say it is not a big problem. Of course, if you can say more about what "tons of rows" means, then ask. I have not seen any scaling problems with this design; the same structure works very well with relational and NoSQL databases alike. MySQL also has features, such as partitioning, to optimize this kind of structure if the data size becomes a problem.
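For example, if the table does grow very large, range partitioning by time is one such feature. A sketch against the illustrative table above (note that MySQL requires the partitioning column to be part of every unique key, hence the primary key change):

ALTER TABLE asset_status_log
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (id, created_at);

ALTER TABLE asset_status_log
PARTITION BY RANGE (UNIX_TIMESTAMP(created_at)) (
    PARTITION p2023 VALUES LESS THAN (UNIX_TIMESTAMP('2024-01-01 00:00:00')),
    PARTITION p2024 VALUES LESS THAN (UNIX_TIMESTAMP('2025-01-01 00:00:00')),
    PARTITION pmax  VALUES LESS THAN MAXVALUE
);

Old partitions can then be dropped cheaply instead of deleting rows one by one.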

Related

Database Architecture for logging

This is something that has bothered me for a long time, and I have still been unable to find an answer.
I have a huge system with a lot of different features. What is common to this system is, of course, that my users can
create, update, read & delete
different parts of my system.
For simplicity's sake, let's say I have an application that has the following features:
Document administration
Video administration
User administration
Salary administration
(Please note that I picked these at random, just to make the point that each would have its own separate tables and they are not necessarily connected.)
Now I wish to create some sort of logging system, so that whenever someone creates, updates, or deletes an entity it will be recorded.
As far as I can see, I can do this in two ways.
1.
Create a logging table for each of the 4 features in my system. However, with this method I am required to create a logging table for each new feature I add. I would also have to combine data from X number of tables if I wished to build a combined log, which could potentially be a huge task!
2.
I could create a single logging table with a separate target column for each feature (target_document_id, target_video_id, and so on).
However, once again I would have to add a column for each new feature I add.
So my question is: what is the best way to design a logging database architecture?
Or is there an easier way?
Instead of one target_xx for each feature, you could do it this way:
target_id | target_type
----------+------------
        1 | video
        4 | document
        5 | user
        2 | user
Or, even better: create a table of target types and store only the corresponding type id in target_type.
Something like this:
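A minimal sketch of that idea; all table and column names here are illustrative assumptions:

-- Lookup table of target types ('video', 'document', 'user', ...).
CREATE TABLE target_type (
    id   TINYINT UNSIGNED NOT NULL AUTO_INCREMENT,
    name VARCHAR(32) NOT NULL,
    PRIMARY KEY (id)
);

-- One central log; target_id points into the feature's own table.
CREATE TABLE action_log (
    id          BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_id     INT UNSIGNED NOT NULL,
    action      ENUM('create','update','delete') NOT NULL,
    target_id   INT UNSIGNED NOT NULL,
    target_type TINYINT UNSIGNED NOT NULL,
    created_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id),
    KEY idx_target (target_type, target_id),
    CONSTRAINT fk_log_type FOREIGN KEY (target_type) REFERENCES target_type (id)
);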
If you want to capture the creation and update date for each table, I would just use MySQL's column defaults and the ON UPDATE clause. You can define the fields like this for a table:
-- DATETIME defaults of CURRENT_TIMESTAMP require MySQL 5.6+;
-- your_table is a placeholder (TABLE itself is a reserved word).
ALTER TABLE your_table
ADD COLUMN CreateDate DATETIME DEFAULT CURRENT_TIMESTAMP,
ADD COLUMN LastModifiedDate DATETIME ON UPDATE CURRENT_TIMESTAMP;
You can add these two fields to all tables. If you want to use one central table for logging instead (which might be more difficult to manage, because you always need joins, and performance may be worse), then I would work with triggers.
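A hedged sketch of one such trigger, feeding the central action_log table sketched above; the document table and its modified_by column are hypothetical:

DELIMITER //
CREATE TRIGGER document_after_update
AFTER UPDATE ON document
FOR EACH ROW
BEGIN
    -- NEW refers to the row as it looks after the update.
    INSERT INTO action_log (user_id, action, target_id, target_type)
    VALUES (NEW.modified_by, 'update', NEW.id,
            (SELECT id FROM target_type WHERE name = 'document'));
END//
DELIMITER ;

One trigger per table and per event (insert/update/delete) keeps the central log populated without any application code.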

member action table data model suggestion

I'm trying to add an action table, but I'm currently at odds as to how to approach the problem.
Before I go into more detail:
We have members who can perform different actions on our website:
add an image
update an image
rate an image
post a comment on image
add a blog post
update a blog post
comment on a blog post
etc, etc
The action table allows our users to "watch" other members' activities by adding them to a watch list.
I currently created a table called member_actions with the following columns:
[UserID] [actionDate] [actionType] [refID]
[refID] can be a reference either to the image ID in the DB or blogpost ID, or an id column of another actionable table (eg. event)
[actionType] is an Enum column with action names such as (imgAdd,imgUpdate,blogAdd,blogUpdate, etc...)
[actionDate] will decide which records get deleted every 90 days... so we won't be keeping the actions forever
The current MySQL query I came up with is:
SELECT act.*,
img.Title, img.FileName, img.Rating, img.isSafe, img.allowComment AS allowimgComment,
blog.postTitle, blog.firstImageSRC AS blogImg, blog.allowComments AS allowBlogComment,
event.Subject, event.image AS eventImg, event.stimgs, event.ends,
imgrate.Rating
FROM member_actions act
LEFT JOIN member_img img ON (act.actionType="imgAdd" OR act.actionType="imgUpdate")
AND img.imgID=act.refID AND img.isActive AND img.isReady
LEFT JOIN member_blogpost blog ON (act.actionType="blogAdd" OR act.actionType="blogUpdate")
AND blog.id=act.refID AND blog.isPublished AND blog.isPublic
LEFT JOIN member_event event ON (act.actionType="eventAdd" OR act.actionType="eventUpdate")
AND event.id=act.refID AND event.isPublished
LEFT JOIN img_rating imgrate ON act.actionType="imgRate" AND imgrate.UserID=act.UserID AND imgrate.imgID=act.refID
LEFT JOIN member_favorite imgfav ON act.actionType="imgFavorite" AND imgfav.UserID=act.UserID AND imgfav.imgID=act.refID
LEFT JOIN img_comment imgcomm ON (act.actionType="imgComment" OR act.actionType="imgCommentReply") AND imgcomm.imgID=act.refID
LEFT JOIN blogpost_comment blogcomm ON (act.actionType="blogComment" OR act.actionType="blogCommentReply") AND blogcomm.blogPostID=act.refID
ORDER BY act.actionDate DESC
LIMIT XXXXX,20
OK, so basically: given that I'll be deleting actions older than 90 days every week or so, would it make sense to go with this query for displaying the member action history?
OR should I add a new text column to the member_actions table called [actionData], where I can store a few details in JSON or XML format for fast querying of the member_actions table?
It adds to the table size but reduces query complexity, and in any case the table will be periodically purged of old entries.
The assumption is that eventually we'll have no more than a few hundred thousand members, so I'm concerned about the size of the member_actions table with its text [actionData] column holding those details.
I'm leaning towards the [actionData] model, but any recommendations or considerations will be appreciated.
Another consideration is that the entries for an img or blog could get deleted, so I could have an action with no reference record... which certainly adds to the problem.
Thanks in advance.
Because you are dealing with user-interface issues, performance is key. All the joins do take time, even with indexes. And querying the database is likely to lock records in all the tables (or indexes), which can slow down inserts.
So, I lean towards denormalizing the data, by maintaining the text in the record.
However, a key consideration is whether the text can be updated after the fact. That is, you will load the data when it is created. Can it then change? The problem of maintaining the data in light of changes (which could involve triggers and stored procedures) could introduce a lot of additional complexity.
If the data is static, this is not an issue. As for table size, I don't think you should worry about that too much; databases are designed to manage memory. The table is maintained in a page cache, which should hold the pages for currently active members. You can always increase memory, especially for 100,000 users, which is well within the realm of today's servers.
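A sketch of that denormalization, assuming MySQL 5.7+ for the JSON type (a plain TEXT column works on older versions); the values shown are illustrative:

ALTER TABLE member_actions
    ADD COLUMN actionData JSON NULL;  -- or TEXT on MySQL < 5.7

-- At write time, copy the few fields needed for display into the action row.
INSERT INTO member_actions (UserID, actionDate, actionType, refID, actionData)
VALUES (42, NOW(), 'imgAdd', 1001,
        JSON_OBJECT('Title', 'Sunset', 'FileName', 'sunset.jpg', 'Rating', 4.5));

Reading the feed then becomes a single-table query ordered by actionDate, with no joins and no dependency on the referenced rows still existing.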
I'd be wary of this approach - as you add kinds of actions that you want to monitor, the join is going to keep growing (and so will the sparse extra columns in the select statement).
I don't think it would be that scary to have a couple of extra columns in this table - and this query sounds like it would be running fairly frequently, so making it efficient seems like it would be a good idea.

Proper way to store requests in MySQL (or any) database

What is the "proper" (most normalized?) way to store requests in the database? For example, a user submits an article. This article must be reviewed and approved before it is posted to the site.
Which is the more proper way:
A) store it in the Articles table with an "Approved" field which is 0, 1, or 2 (denied, approved, pending)
OR
B) Have an ArticleRequests table which has the same fields as Articles, and upon approval, move the row data from ArticleRequests to Articles.
Thanks!
Since every article is going to have an approval status, and each time an article is requested you're very likely going to need to know that status, keep it inline with the table.
Do consider calling the field ApprovalStatus, though. You may also want a related table containing each of the statuses, unless they aren't going to change very often (or ever).
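A minimal sketch of that suggestion; the names are assumptions layered on the question's Articles table:

CREATE TABLE ApprovalStatus (
    id   TINYINT UNSIGNED NOT NULL,
    name VARCHAR(16) NOT NULL,
    PRIMARY KEY (id)
);

INSERT INTO ApprovalStatus (id, name)
VALUES (0, 'Denied'), (1, 'Approved'), (2, 'Pending');

ALTER TABLE Articles
    ADD COLUMN ApprovalStatus TINYINT UNSIGNED NOT NULL DEFAULT 2,  -- pending
    ADD CONSTRAINT fk_articles_status
        FOREIGN KEY (ApprovalStatus) REFERENCES ApprovalStatus (id);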
EDIT: Reasons to keep fields in related tables are:
If the related field is not always applicable, or may frequently be null.
If the related field is only needed in rare scenarios and is better described by using a foreign key into a related table of associated attributes.
In your case those above reasons don't apply.
Definitely do 'A'.
If you do B, you'll be creating a new table with the same fields as the other one and that means you're doing something wrong. You're repeating yourself.
I think it's better to store the data in the main table with a specific status, because then it's not necessary to move data between tables: once an article is approved, it appears on the site immediately. If you don't want to keep disapproved articles, you can create a cron script that removes the unnecessary data or moves it to an archive table. That way you put less load on your DB, because you can schedule the removal of old articles for a quiet time, for example at night.
Regarding the problem of using the approval status in every query: if you are planning a very popular site with a high search or listing load, you will use a standalone search server such as Sphinx or Solr (MySQL is not a good solution for those purposes), and you will feed it only the rows with status = 'Approved'. Using delta indexing helps you keep that data up to date.

versioning each field vs history date field?

Which do you recommend, and why?
I have a few tables; when I make a change to the data, it should go to a history (audit) table with an effective date.
The other solution is versioning each field: inserting a new row whenever the data changes.
Which is the best method for invoice information, where the item name and price change all the time?
These are slowly changing dimensions, type 2 and type 4, respectively.
Both methods are valid, and either may be more appropriate for your needs, depending on your model and query requirements.
Basically, type 2 (versioning) is more appropriate when you need to query historical values as often as the current one, while type 4 (history table) is better suited when you query the current value more often and there are more queries (more queries to develop, I mean) against the most recent value.
A system we use and are happy with:
For each table that requires history, we create a similar table, adding a timestamp field at the end which becomes part of the PK.
On each update to the original table, we insert into the history table using the same conditions:
UPDATE x SET ... WHERE something;
INSERT INTO x_history
SELECT x.*, CURRENT_TIMESTAMP FROM x WHERE something;
That keeps your data clean and your tables slim.
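A sketch of how the paired history table can be created, assuming the original table is x with a single-column PK named id (illustrative names):

CREATE TABLE x_history LIKE x;  -- copies columns and indexes

ALTER TABLE x_history
    ADD COLUMN archived_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    DROP PRIMARY KEY,
    ADD PRIMARY KEY (id, archived_at);  -- the timestamp becomes part of the PK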
My personal preference would be to use the Observer pattern in your application and to implement a separate history table. This means you can pull the data from the history table when you need it, and you don't compromise the speed of querying the main table.

Versioned and indexed data store

I have a requirement to store all versions of an entity in an easily indexed way and was wondering if anyone has input on what system to use.
Without versioning, the system is simply a relational database with a row per, for example, person. If the person's state changes, that row is changed to reflect this. With versioning, the entry should be updated in such a way that we can always go back to a previous version. If I could use a temporal database, this would come for free, and I would be able to ask 'what was the state of all people, as of yesterday at 2pm, living in Dublin and aged 30'. Unfortunately there don't seem to be any mature open-source projects that can do temporal.
A really nasty way to do this is just to insert a new row per state change. This leads to duplication, as a person can have many fields but only one changes per update. It is also quite slow to select the correct version of every person given a timestamp.
In theory it should be possible to use a relational database and a version control system to mimic a temporal database but this sounds pretty horrendous.
So I was wondering if anyone has come across something similar before and how they approached it?
Update
As suggested by Aaron, here's the query we currently use (in MySQL). It's definitely slow on our table with >200k rows. (id = table key; person_id = id per person, duplicated if the person has many revisions.)
SELECT name
FROM person p
WHERE p.id = (SELECT MAX(id)
              FROM person
              WHERE person_id = p.person_id
                AND timestamp <= :timestamp);
Update
It looks like the best way to do this is with a temporal DB, but given that there aren't any open-source ones out there, the next best method is to store a new row per update. The only problems are the duplication of unchanged columns and a slow query.
There are two ways to tackle this. Both assume that you always insert new rows. In every case, you must insert a timestamp (created) which tells you when a row was "modified".
The first approach uses a number to count how many versions you already have. The primary key is the object key plus the version number. The drawback of this approach is that you need a SELECT MAX(version) to make a modification. In practice, this is rarely an issue, since for any update from the app you must first load the current version of the person, modify it (and increment the version), and then insert the new row. The real problem is that this design makes it hard to run bulk updates directly in the database (for example, assigning a property to many users).
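A sketch of this first approach, restructuring the question's person table (column names illustrative):

CREATE TABLE person (
    person_id INT UNSIGNED NOT NULL,   -- stable object key
    version   INT UNSIGNED NOT NULL,   -- 1, 2, 3, ... per object
    created   TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    name      VARCHAR(100) NOT NULL,
    city      VARCHAR(100),
    PRIMARY KEY (person_id, version)
);

-- Current version of one person (42 is an example key):
SELECT *
FROM person p
WHERE p.person_id = 42
  AND p.version = (SELECT MAX(version) FROM person WHERE person_id = 42);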
The next approach uses links in the database. Instead of a composite key, you give each object a new key and you have a replacedBy field which contains the key of the next version. This approach makes it simple to find the current version (... where replacedBy is NULL). Updates are a problem, though, since you must insert a new row and update an existing one.
To solve this, you can add a back pointer (previousVersion). This way, you can insert the new rows and then use the back pointer to update the previous version.
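A sketch of the linked approach with both pointers; person_v and its columns are illustrative:

CREATE TABLE person_v (
    id               BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, -- one key per version
    person_id        INT UNSIGNED NOT NULL,
    created          TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    replaced_by      BIGINT UNSIGNED NULL,  -- key of the next version, NULL if current
    previous_version BIGINT UNSIGNED NULL,  -- back pointer to the replaced row
    name             VARCHAR(100) NOT NULL,
    PRIMARY KEY (id),
    KEY idx_person (person_id, replaced_by)
);

-- Current versions:
SELECT * FROM person_v WHERE replaced_by IS NULL;

-- After inserting a new version with previous_version set, link the old row:
UPDATE person_v p_old
JOIN person_v p_new ON p_new.previous_version = p_old.id
SET p_old.replaced_by = p_new.id
WHERE p_old.replaced_by IS NULL;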
Here is a (somewhat dated) survey of the literature on temporal databases: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.6988&rep=rep1&type=pdf
I would recommend spending a good while sitting down with those references and/or Google Scholar to try to find some good techniques that fit your data model. Good luck!