Versioning each field vs history date field? - MySQL

Which do you recommend and why?
I have a few tables; when I make a change to the data, it should go to a history (audit) table with an effective date.
The other solution is versioning each field, inserting a new row when making changes to the data.
Which is the best method for the invoice information? Item name and price are always changing.

These are slowly changing dimensions, type 2 and type 4, respectively.
Both methods are valid; which is more appropriate depends on your model and query requirements.
Basically, type 2 (versioning) is more appropriate when you need to query historical values as often as the current ones, while type 4 (history table) is better suited when you query the current value more often and most of the queries you have to develop run against the most recent value.
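As a rough illustration of type 2 for invoice data, each change inserts a new version row; the table and column names here are hypothetical:

CREATE TABLE invoice_item (
  item_id    INT NOT NULL,
  version    INT NOT NULL,
  name       VARCHAR(100) NOT NULL,
  price      DECIMAL(10,2) NOT NULL,
  valid_from DATETIME NOT NULL,
  valid_to   DATETIME NULL,  -- NULL marks the current version
  PRIMARY KEY (item_id, version)
);

-- Current price:
SELECT price FROM invoice_item WHERE item_id = 42 AND valid_to IS NULL;

-- Price as it was on a given date:
SELECT price FROM invoice_item
WHERE item_id = 42
  AND valid_from <= '2015-06-01'
  AND (valid_to > '2015-06-01' OR valid_to IS NULL);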

A system we use and are happy with:
For each table that requires history, we create a similar table and add a timestamp field at the end, which becomes part of the PK.
On each update of the original table, we insert into the history table using the same conditions:
UPDATE x SET ... WHERE <condition>;
INSERT INTO x_history
SELECT *, NOW() FROM x WHERE <condition>;
That keeps your data clean and your tables slim.
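Sketched out for a hypothetical invoice table; here the trailing timestamp that completes the history PK comes from a column default rather than being selected explicitly:

CREATE TABLE invoice_history (
  invoice_id INT NOT NULL,
  item_name  VARCHAR(100) NOT NULL,
  price      DECIMAL(10,2) NOT NULL,
  changed_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (invoice_id, changed_at)
);

-- Every update is followed by a snapshot with the same WHERE clause:
UPDATE invoice SET price = 19.99 WHERE invoice_id = 42;
INSERT INTO invoice_history (invoice_id, item_name, price)
SELECT invoice_id, item_name, price FROM invoice WHERE invoice_id = 42;

An AFTER UPDATE trigger can take the snapshot automatically if you would rather not repeat the condition in application code.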

My personal preference would be to use the Observer pattern in your application and to implement a separate history table. This means you can pull the data from the history table when you need it, without compromising the speed of querying the main table.

Related

How to design a MySQL table that tracks the Status of each Asset, as well as every old Status?

I would like to create a table that tracks the status of each asset as well as each past status. Basically I want to keep a log of all status changes.
Do I create a timestamp for each updated status and have every update be its own separate row, linked back to the asset through the assetid? Then sort by the timestamp to get these statuses in order? I can see this table getting unwieldy if there are tons of rows for each asset and the table grows linearly over time.
This is for a MySQL database.
Here is an example of how I have designed a database table for tracking/logging purposes.
Columns:
auto increment pk (if you don't have a better pk)
timestamp
tracked object id (asset_id in your case)
event type (you probably don't need this, but it is explained below)
content (this can be also named status in your case)
My example is very simplified, but the main idea is to insert each record as its own row. Create the table with proper primary keys and indexes to get good search performance.
Using this structure you should be able to search by asset, by status, get the latest changes, and so on. The exact structure depends on your needs, so I have usually adapted it to fit them; a sketch is below.
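A minimal sketch of such a log table in MySQL, with assumed names and types:

CREATE TABLE asset_status_log (
  id         BIGINT AUTO_INCREMENT PRIMARY KEY,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  asset_id   INT NOT NULL,
  event_type VARCHAR(30) NOT NULL DEFAULT 'status_change',
  content    VARCHAR(100) NOT NULL,  -- the status value
  KEY idx_asset_time (asset_id, created_at)
);

-- Full history for one asset, newest first:
SELECT created_at, content
FROM asset_status_log
WHERE asset_id = 123
ORDER BY created_at DESC;

-- Latest status per asset:
SELECT l.asset_id, l.content
FROM asset_status_log l
JOIN (
  SELECT asset_id, MAX(created_at) AS latest
  FROM asset_status_log
  GROUP BY asset_id
) m ON m.asset_id = l.asset_id AND m.latest = l.created_at;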
Don't worry too much about the event_type column. I included it because most implementations like this are based on event sourcing. Here is a link to one article that explains it: http://scottlobdell.me/2017/01/practical-implementation-event-sourcing-mysql/
I suggest reading more about event sourcing to see whether that design could work in your case. Look at the database example in particular, because it is similar to mine.
As a result, you get a journal of status changes. How the data is then read, handled, and displayed is up to your code.
About the linear growth: I would say it is not a big problem. Of course, if you can quantify what "tons of rows" means, then ask. I have not seen any scaling problems; the same structure works very well with relational and NoSQL databases, and MySQL also has features to optimize this kind of structure if the size of the data becomes a problem.

Redshift Usage - 1 row by 400 columns per user or (20-400) rows by 4 columns per user

We are building an analytics engine which has to store an attribute preference score for each user. We are expecting 400 attributes, and they may change (at what frequency is not yet known). We are planning to store this in Redshift.
My questions are:
Should we store 1 row per user with 400 columns (1 column per attribute),
or should we go for a table structure like
(uid, attribute id, attribute value, preference score), which would be (20-400) rows by 4 columns per user?
Which kind of storage would lead to better performance in Redshift?
Should we really consider NoSQL for this?
Note:
1. This is a backend for a real-time application with an increasing number of users.
2. For processing, the above table has to be read with the entire set of attributes for one user, i.e. indirectly building a 1x400 matrix at runtime.
Please help me decide which design would be ideal for such a use case. Thank you.
You can go for tables like the ones given in this example and then use the bitwise functions documented here:
http://docs.aws.amazon.com/redshift/latest/dg/r_bitwise_examples.html
For your problem, I would suggest a two-table design. It is more pain in the beginning but will help in the future.
The first table would be a key-value style base table that stores all the raw data. It is effectively future-proof: you can add or remove attributes and this table keeps working.
The second table is the wide, N-column table (400 in your case), which you build from the first. You can start it with a bare minimum set of columns, say only 50 out of those 400, so that querying it is really fast, and refresh its structure periodically to match the current reporting requirements. You will also always have the base table in case you need to backfill any data. A sketch of both tables follows.
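A sketch of the two-table layout in Redshift SQL; the names and the DISTKEY/SORTKEY choices are assumptions:

-- Base key-value table: attributes can come and go without schema changes
CREATE TABLE user_attribute_score (
  uid             BIGINT NOT NULL,
  attribute_id    INT NOT NULL,
  attribute_value VARCHAR(64),
  score           DECIMAL(8,4)
)
DISTKEY (uid)
SORTKEY (uid, attribute_id);

-- Wide table rebuilt periodically from the base table,
-- starting with only the hottest attributes:
CREATE TABLE user_attribute_wide (
  uid    BIGINT NOT NULL,
  attr_1 DECIMAL(8,4),
  attr_2 DECIMAL(8,4)
  -- ... add columns as reporting needs grow
)
DISTKEY (uid);

-- Refresh by pivoting the base table, one CASE per column:
INSERT INTO user_attribute_wide
SELECT uid,
       MAX(CASE WHEN attribute_id = 1 THEN score END) AS attr_1,
       MAX(CASE WHEN attribute_id = 2 THEN score END) AS attr_2
FROM user_attribute_score
GROUP BY uid;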

MySQL Database Design Questions

I am currently working on a web service that stores and displays currency data.
I have two MySQL tables, CurrencyTable and CurrencyValueTable.
The CurrencyTable holds the names of the currencies as well as their description and so forth, like so:
CREATE TABLE CurrencyTable ( name VARCHAR(20), description TEXT, .... );
The CurrencyValueTable holds the values of the currencies during the day - a new value is inserted every 2 minutes when the market is open. The table looks like this:
CREATE TABLE CurrencyValueTable ( currency_name VARCHAR(20), value FLOAT, `datetime` DATETIME, ....);
I have two questions regarding this design:
1) I have more than 200 currencies. Is it better to have a separate CurrencyValueTable for each currency or hold them all in one table?
2) I need to be able to show the current (latest) value of the currency. Is it better to just add such a field to the CurrencyTable and update it every two minutes, or is it better to use a statement like:
SELECT value FROM CurrencyValueTable WHERE currency_name = ? ORDER BY `datetime` DESC LIMIT 1
The second option seems slower. I am leaning towards the first one (which is also easier to implement).
Any input would be greatly appreciated!!
p.s. - please ignore SQL syntax / other errors, I typed it off the top of my head..
Thanks!
To your questions:
I would use one table. Especially if you need to report on or compare data across multiple currencies, that will be far easier with a single table.
If you don't have a need to track the history of each currency's value, then go ahead and just update a single value -- but in that case, why even have a separate table? You can just add "latest value" as a field in the currency table and update it there. If you do need to track history, then you will need the two tables and the SQL you posted will work.
As an aside, instead of FLOAT I would use DECIMAL(10,2). As of MySQL 5.0, DECIMAL arithmetic is exact, which avoids the rounding surprises FLOAT produces with currency values.
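A sketch combining both suggestions (hypothetical details added; the composite primary key keeps the "latest value" query cheap even if you keep full history):

CREATE TABLE CurrencyTable (
  name         VARCHAR(20) PRIMARY KEY,
  description  TEXT,
  latest_value DECIMAL(10,2)  -- denormalized copy, refreshed every 2 minutes
);

CREATE TABLE CurrencyValueTable (
  currency_name VARCHAR(20) NOT NULL,
  value         DECIMAL(10,2) NOT NULL,
  `datetime`    DATETIME NOT NULL,
  PRIMARY KEY (currency_name, `datetime`)
);

-- Latest value straight from the history table,
-- fast because it walks the primary key backwards:
SELECT value
FROM CurrencyValueTable
WHERE currency_name = 'EUR'
ORDER BY `datetime` DESC
LIMIT 1;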
It is better to have one table holding all currencies
If there is need for historical prices, then the table needs to hold them. A reasonable compromise in many situations is to split the price table into a full list of historical prices and another table which only has the current prices.
Using the FLOAT data type can be troublesome; be sure you know what you are doing. If not, use an exact numeric type such as DECIMAL.
As your web service is transactional, it is better to access fewer tables at the same time. Since you will be reading and writing a lot, I would suggest having a single table.
It is better to add a field to the CurrencyTable and update it rather than hitting two tables for a single request.

Versioned and indexed data store

I have a requirement to store all versions of an entity in an easily indexed way and was wondering if anyone has input on what system to use.
Without versioning, the system is simply a relational database with a row per, for example, person. If the person's state changes, that row is changed to reflect this. With versioning, the entry should be updated in such a way that we can always go back to a previous version. If I could use a temporal database this would come for free, and I would be able to ask "what was the state of all people, as of yesterday at 2pm, living in Dublin and aged 30". Unfortunately there don't seem to be any mature open-source projects that can do temporal.
A really nasty way to do this is just to insert a new row per state change. This leads to duplication, since a person can have many fields but typically only one changes per update. It is also quite slow to select the correct version for every person given a timestamp.
In theory it should be possible to use a relational database and a version control system to mimic a temporal database but this sounds pretty horrendous.
So I was wondering if anyone has come across something similar before and how they approached it?
Update
As suggested by Aaron, here's the query we currently use (in MySQL). It's definitely slow on our table with >200k rows. (id = table key; person_id = per-person id, duplicated if the person has many revisions.)
select name from person p where p.id = (select max(id) from person where person_id = p.person_id and timestamp <= :timestamp)
Update
It looks like the best way to do this is with a temporal DB, but given that there aren't any open-source ones out there, the next best method is to store a new row per update. The only problems are the duplication of unchanged columns and the slow query.
There are two ways to tackle this. Both assume that you always insert new rows. In every case, you must insert a timestamp (created) which tells you when a row was "modified".
The first approach uses a number to count how many versions you already have. The primary key is the object key plus the version number. The drawback is that you need a SELECT MAX(version) to make a modification. In practice this is rarely an issue, since for any update from the app you must first load the current version of the person, modify it (incrementing the version), and then insert the new row. The real problem is that this design makes it hard to run bulk updates directly in the database (for example, assigning a property to many users).
The next approach uses links in the database. Instead of a composite key, each row gets its own key, plus a replacedBy field which contains the key of the next version. This approach makes it simple to find the current version (... WHERE replacedBy IS NULL). Updates are a problem, though, since you must insert a new row and update an existing one.
To solve this, you can add a back pointer (previousVersion). That way you can insert the new rows first and then use the back pointer to update the previous version.
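A sketch of both designs in MySQL, with assumed names and columns:

-- Approach 1: composite key of object key + version number
CREATE TABLE person_versioned (
  person_id INT NOT NULL,
  version   INT NOT NULL,
  name      VARCHAR(100),
  created   TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (person_id, version)
);

-- Approach 2: each row has its own key plus forward/back pointers
CREATE TABLE person_linked (
  id               INT AUTO_INCREMENT PRIMARY KEY,
  person_id        INT NOT NULL,
  name             VARCHAR(100),
  created          TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  replaced_by      INT NULL,  -- key of the next version; NULL = current
  previous_version INT NULL   -- back pointer, set at insert time
);

-- Current version of every person:
SELECT * FROM person_linked WHERE replaced_by IS NULL;

-- After inserting a new version, link the old one via the back pointer:
UPDATE person_linked p_old
JOIN person_linked p_new ON p_new.previous_version = p_old.id
SET p_old.replaced_by = p_new.id
WHERE p_old.replaced_by IS NULL;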
Here is a (somewhat dated) survey of the literature on temporal databases: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.6988&rep=rep1&type=pdf
I would recommend spending a good while sitting down with those references and/or Google Scholar to try to find some good techniques that fit your data model. Good luck!

MySQL - Best method of saving and loading items

So in my older work, I always used the 'text' data type to store items, like so:
0=4151:54;1=995:5000;2=521:1;
So basically: slot=item:amount;
I've been looking into the best ways of storing information in a SQL database, and everywhere I go it says that using text like this is a big performance hit.
I was thinking of doing something else, like having a table with the following columns:
id, owner_id, slot_id, item_id, amount
Whereas now I can just insert a row for each item a character allocates. But I have no clue how to save them, since a slot's item can change, etc. A character has 28 inventory slots and 500 bank slots; should I insert them all at registration, or is there a smarter way to save the items?
Yes, use that structure. Using text to store relational data defeats the purpose of a relational database.
I don't see what you mean by insert them all at registration. Can you not insert them as you need to?
Edit
Based on your previous comment I would recommend only inserting a slot as it is needed (if I understand your problem). It may be an idea to keep the ID of the slot in the application, if need be.
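A sketch of that insert-as-needed approach, with assumed names; the unique key on (owner_id, slot_id) means a slot change is a single upsert:

CREATE TABLE character_item (
  id       INT AUTO_INCREMENT PRIMARY KEY,
  owner_id INT NOT NULL,
  slot_id  SMALLINT NOT NULL,
  item_id  INT NOT NULL,
  amount   INT NOT NULL DEFAULT 1,
  UNIQUE KEY uq_owner_slot (owner_id, slot_id)
);

-- Place (or replace) an item in a slot; no row exists until a slot is used:
INSERT INTO character_item (owner_id, slot_id, item_id, amount)
VALUES (7, 0, 4151, 54)
ON DUPLICATE KEY UPDATE item_id = VALUES(item_id), amount = VALUES(amount);

-- Empty a slot:
DELETE FROM character_item WHERE owner_id = 7 AND slot_id = 0;

Empty slots simply have no row, so registration does not need to insert anything.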
If I understand you correctly that a slot's item can change, then you want to further abstract the mapping between item_id and the item:
entry_tbl.item_id->item_rel_realitems_tbl.real_id->items_tbl
This way, all entries with an item_id point to a table that maps those ids to a mutable item. When you UPDATE an item in items_tbl, every entry in entry_tbl sees the change automatically through the mapping.
Another JOIN is needed, however. I would also use stored procedures to abstract the mechanism from the semantics.
I am not sure I understand the wording of your question, though.
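A sketch of that indirection, with hypothetical table definitions matching the chain above (entry_tbl is assumed to have the owner_id/slot_id/item_id/amount columns from the question):

CREATE TABLE items_tbl (
  real_id INT PRIMARY KEY,
  name    VARCHAR(50) NOT NULL,
  price   INT NOT NULL
);

CREATE TABLE item_rel_realitems_tbl (
  item_id INT PRIMARY KEY,  -- the id that entries reference
  real_id INT NOT NULL,     -- the item it currently resolves to
  FOREIGN KEY (real_id) REFERENCES items_tbl (real_id)
);

-- Resolve every entry to its current item:
SELECT e.owner_id, e.slot_id, i.name, e.amount
FROM entry_tbl e
JOIN item_rel_realitems_tbl r ON r.item_id = e.item_id
JOIN items_tbl i ON i.real_id = r.real_id;

-- Change what an id means without touching any entries:
UPDATE item_rel_realitems_tbl SET real_id = 999 WHERE item_id = 4151;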