Which data model to use to maintain every day historical data for a customer - mysql

I need to maintain the daily closing balance of each customer and plot a line graph of that balance for the last 365 days. Which data model is preferred for maintaining this data?
MySQL, Cassandra or any other databases ?

The obvious solution is a table keyed on [client id, date], with the closing balance as the value.
The question is how you fill that data in. You could run a job at the end of each day, but the big question is how to make the system reliable: if the job fails and is restarted the next day, will it still produce data for the previous day?
A typical way of addressing this type of problem is "event sourcing". This is a fancy way of saying that you store a record for every operation executed on a balance. Every add/deduct should be there, including the client id and date. These records may also carry a "resulting balance", in which case the last operation for a customer on a given day holds that day's closing balance.
In the end, you will have a list of transactions and will be able to "replay" previous events to get balances. It's your choice whether to keep an actual table of daily balances per client, as it may be cheaper to look up that data than to recalculate it every time.
In the banking industry, every transaction is stored as a separate record for this specific reason: to be able to produce different reports, including closing balances per day.
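As a minimal sketch of the replay idea, assuming a hypothetical `transactions` table with `client_id`, `tx_date` and a signed `amount` in minor units (none of these names come from the question), daily closing balances can be derived from the transaction log like this. SQLite is used only so the example runs on its own; the same queries should carry over to MySQL.

```python
import sqlite3
from datetime import date, timedelta

def daily_closing_balances(conn, client_id, first_day, last_day):
    """Replay a client's transactions and return {day: closing balance}.

    Days without transactions carry the previous day's balance forward.
    """
    # Balance accumulated before the requested window starts.
    (balance,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM transactions "
        "WHERE client_id = ? AND tx_date < ?",
        (client_id, first_day.isoformat()),
    ).fetchone()
    # Net movement per day inside the window.
    per_day = dict(conn.execute(
        "SELECT tx_date, SUM(amount) FROM transactions "
        "WHERE client_id = ? AND tx_date BETWEEN ? AND ? GROUP BY tx_date",
        (client_id, first_day.isoformat(), last_day.isoformat()),
    ).fetchall())
    closing = {}
    day = first_day
    while day <= last_day:
        balance += per_day.get(day.isoformat(), 0)
        closing[day] = balance
        day += timedelta(days=1)
    return closing

# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY, client_id INTEGER, tx_date TEXT, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO transactions (client_id, tx_date, amount) VALUES (?, ?, ?)",
    [(1, "2024-01-01", 100), (1, "2024-01-01", -30), (1, "2024-01-03", 50)],
)
balances = daily_closing_balances(conn, 1, date(2024, 1, 1), date(2024, 1, 4))
```

Because the balances are computed from the log rather than by a once-a-day job, a failed job can simply be rerun later for any past day. The result could also be written into a daily-balance table if lookups need to be cheaper.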

Related

Summing a ledger over long period of time. (reconciliation, snapshots, rolling sum?)

We are building a warehouse stock management system and have a stock movements table that records stock into, through and out of the system, for each product and each location where it is stored, e.g.:
10 units of Product A is received into Location A
10 units Product A are moved to Location B and removed from Location A.
1 unit is removed (sold) from Location B
... and so on.
This means that to work out how much of each product is stored in each location we would run:
"SELECT location, product, SUM(qty) FROM stock_movements GROUP BY location, product"
(we actually use Eloquent but I have used SQL for an example)
Over time, this will mean our stock movements table will grow to millions of rows, and I am wondering how best to manage this. The options I can think of:
Sum the rows as grouped above and accept it may get slow over time. I'm not sure how many rows it will take before it actually starts to cause performance issues. When requesting a whole inventory log via our API, the rows would have to be summed for every product, so this adds up to a fairly large calculation.
Create a snapshot of the summed rows every day/week/month etc. on a cron and then just add the sum of the most recent rows on the fly.
Create a separate table with a live stock level which is added to and subtracted with every stock movement. The stock movements table shows an entire history of all movements while the new table just shows the live amounts. We would use database transactions here to ensure they keep in sync.
Is there a defined and best practice way to handle this kind of thing already? Would love to hear your thoughts!
The good news is that your system is already where a lot of people say the database world should be moving: event sourcing. ES just stores every event against an object, in this case your location, and in order to get the current state you have to start with an empty object and replay all of that object's events.
Of course, this can be time-consuming, and your last two bullet points are the standard ways of dealing with it. First, you can create regular snapshots with the current-as-of-then totals for that location, and then when someone asks for the current-as-of-now totals you only need to replay events since the last snapshot. Second, you can have a separate table of current values, and whenever you insert a record into your event store you also update the current value. If they ever get out-of-sync, you can always start fresh and replay the entire event series again.
Both of these scenarios are typically managed through an intermediary queue service, like SQL Server's Service Broker, RabbitMQ, or Amazon's SQS: instead of inserting an event directly into your event store, you send the change into a queue, and the code that processes the queue updates your snapshot.
Good luck!
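A rough sketch of the snapshot-plus-replay option, assuming a hypothetical `stock_snapshots` table that remembers the last movement id it covers (the column names here are illustrative, not taken from the question). SQLite is used so the example is self-contained; the queue layer is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock_movements (
    id INTEGER PRIMARY KEY, location TEXT, product TEXT, qty INTEGER);
CREATE TABLE stock_snapshots (
    location TEXT, product TEXT, qty INTEGER, last_movement_id INTEGER,
    PRIMARY KEY (location, product));
""")

def record_movement(conn, location, product, qty):
    """Append one event; positive qty is stock in, negative is stock out."""
    with conn:
        conn.execute(
            "INSERT INTO stock_movements (location, product, qty) VALUES (?, ?, ?)",
            (location, product, qty))

def take_snapshot(conn):
    """Store totals for all movements so far, plus the last id they cover."""
    with conn:  # commit on success, roll back on error
        (last_id,) = conn.execute(
            "SELECT COALESCE(MAX(id), 0) FROM stock_movements").fetchone()
        conn.execute("DELETE FROM stock_snapshots")
        conn.execute(
            "INSERT INTO stock_snapshots "
            "SELECT location, product, SUM(qty), ? FROM stock_movements "
            "WHERE id <= ? GROUP BY location, product",
            (last_id, last_id))

def current_stock(conn):
    """Snapshot totals plus only the movements recorded after the snapshot."""
    (last_id,) = conn.execute(
        "SELECT COALESCE(MAX(last_movement_id), 0) FROM stock_snapshots").fetchone()
    rows = conn.execute(
        "SELECT location, product, SUM(qty) FROM ("
        "  SELECT location, product, qty FROM stock_snapshots"
        "  UNION ALL"
        "  SELECT location, product, qty FROM stock_movements WHERE id > ?"
        ") AS combined GROUP BY location, product",
        (last_id,))
    return {(loc, prod): qty for loc, prod, qty in rows}

# The movements from the question: receive 10, move them to B, sell 1.
record_movement(conn, "A", "widget", 10)
record_movement(conn, "A", "widget", -10)
record_movement(conn, "B", "widget", 10)
take_snapshot(conn)                      # e.g. run from cron
record_movement(conn, "B", "widget", -1)
stock = current_stock(conn)
```

If the snapshot is ever suspected to be wrong, calling `take_snapshot` again rebuilds it from the full movement history, which is the "start fresh and replay" fallback described above.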

Database structure for storing schedules/cron job?

I am stuck with a problem. In an app's db, I have a schedule table that stores user-provided schedules, e.g.
Daily
Every Week
Twice a Week
Every 3rd (or any user chosen) day of week
Every Month
Twice a month
Every x day of month
Every x month of year
And so on. These schedules will then provide a reference point for scheduling different tasks or identifying their recurrence.
I am not able to think of a proper database structure for it. The best I can get is to have a table with following columns:
Day
Week
Month
Year
type
Then store the specified schedule in the related column and provide the type.
E.g. "Every week" could be stored as 1 in the Week column plus 1 (a designated value for repeating whole), or something like that.
The problem with this approach is that this table is going to be used very frequently and the data retrieved will not be straightforward. It will need calculation to work out the schedule type, and hence will require complex db queries to get each type of schedule.
I am implementing it in a Laravel app, in case that suggests another approach. It's a SaaS app with a huge amount of data related to the schedule table.
Any help will be very much appreciated. Thanks
I suggest you are approaching the problem backwards.
Devise several rules. Code the rules in your app, not in SQL. When inserting an event, pre-fill a calendar through the next 12 months with all occurrences of the event. Every month, go through all events and extend the "pre-fill" through another month (13 months hence).
Now the SELECTs are simple and fast.
SELECT ... WHERE date = '...'
has all the events for that day (assuming it is within 12 months).
The complexity is on inserting. But presumably you insert less often than you select.
The table with the event definitions would be only as complex as needed for your app to figure out what to do. Perhaps
start_date DATE,
frequency ENUM('day', 'week', 'month', ...),
multiplier TINYINT, -- this lets you say "every second week"
offset TINYINT, -- to get "15th of every month"
Twice a week would be two entries.
Better yet, there are several packages (in Perl, shell, etc) that provide a very rich language for expressing event-date-patterns. Furthermore, you may be able to simply 'call' it to do all the work for you!
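To make the "code the rules in your app and pre-fill a calendar" idea concrete, here is a small sketch in Python. It is only an illustration under assumptions: the `occurrences` function is hypothetical, it supports just the 'day', 'week' and 'month' frequencies, and it takes the day of the month from `start_date` instead of using a separate offset column.

```python
import calendar
from datetime import date, timedelta

def occurrences(start_date, frequency, multiplier, until):
    """Expand one rule into concrete dates from start_date through until (inclusive).

    frequency is 'day', 'week' or 'month'; multiplier=2 with 'week' means
    every second week. Monthly rules keep start_date's day of the month and
    skip months that do not have that day (e.g. the 31st in April).
    """
    if multiplier < 1:
        raise ValueError("multiplier must be at least 1")
    dates = []
    if frequency in ("day", "week"):
        step = timedelta(days=multiplier * (7 if frequency == "week" else 1))
        current = start_date
        while current <= until:
            dates.append(current)
            current += step
    elif frequency == "month":
        year, month = start_date.year, start_date.month
        while (year, month) <= (until.year, until.month):
            days_in_month = calendar.monthrange(year, month)[1]
            if start_date.day <= days_in_month:
                candidate = date(year, month, start_date.day)
                if candidate <= until:
                    dates.append(candidate)
            # Advance by `multiplier` months, rolling over into later years.
            month += multiplier
            year, month = year + (month - 1) // 12, (month - 1) % 12 + 1
    else:
        raise ValueError(f"unsupported frequency: {frequency!r}")
    return dates

# Every second week, starting on 1 January 2024.
biweekly = occurrences(date(2024, 1, 1), "week", 2, date(2024, 2, 1))
# Monthly on the 31st; months without a 31st are skipped.
monthly = occurrences(date(2024, 1, 31), "month", 1, date(2024, 4, 30))
```

Each returned date would become one row in the pre-filled calendar table, so the daily lookup stays a plain equality match on the date column.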

MySQL Database Structure For Employee Timeclock

I'm working on an app that is partly an employee time clock. It's not too complex but I want to make sure I head in the right direction the first time. I currently have this table structure:
id - int
employee_id - int (fk)
timestamp - mysql timestamp
event_code - int (1 for clock in, 0 for clock out)
I've got everything working where if their last event was a "clock in" they only see the "clock out" button and vice versa.
My problem is that we will need to run a report that shows how many hours an employee has worked in a month and also total hours during the current fiscal year (Since June 1 of the current year).
It seems like I could store each clock in and clock out in the same record and maybe even calculate the minutes worked between the two events and store that in a column called "worked". Then I would just need to sum that column for an employee to get their total time.
Should I keep the structure I have, move to all on one row per pair of clock in and out events, or is there a better way that I'm totally missing?
I know human error is also a big issue for time clocks since people often forget to clock in or out and I'm not sure which structure can handle that easier.
Is MySQL Timestamp a good option or should I use UNIX Timestamp?
Thanks for any advice/direction.
Rich
I would go with two tables:
One table should be simple log of what events occurred, like your existing design.
The second table contains the calculated working hours. There are columns for the logged in and logged out times and perhaps also a third column with the time difference between them precalculated.
The point is that the calculation of how many hours an employee has worked is complicated, as you mention. Employees may complain that they worked longer hours than your program reports. In this case you want to have access to the original log of all events with no information loss so that you can see and debug exactly what happened. But this raw format is slow and difficult to work with in SQL so for reporting purposes you also want the second table so that you can quickly generate reports with weekly, monthly or yearly sums.
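One possible way to derive the second table from the raw log is sketched below. It assumes the events for a single employee are already sorted by time and uses the question's event codes (1 for clock in, 0 for clock out); the `pair_events` helper and its handling of missing punches are illustrative choices, not a prescribed policy.

```python
from datetime import datetime, timedelta

def pair_events(events):
    """Turn a time-ordered list of (timestamp, event_code) into worked intervals.

    Returns (intervals, anomalies). Each interval is (clock_in, clock_out, worked).
    Anomalies are events with no partner: a clock in followed by another
    clock in, or a clock out with no preceding clock in.
    """
    intervals, anomalies = [], []
    open_in = None
    for ts, code in events:
        if code == 1:
            if open_in is not None:
                anomalies.append((open_in, 1))  # previous clock in never closed
            open_in = ts
        elif open_in is None:
            anomalies.append((ts, 0))           # clock out without a clock in
        else:
            intervals.append((open_in, ts, ts - open_in))
            open_in = None
    return intervals, anomalies

events = [
    (datetime(2024, 6, 3, 9, 0), 1),
    (datetime(2024, 6, 3, 17, 0), 0),
    (datetime(2024, 6, 4, 9, 0), 1),    # employee forgot to clock out
    (datetime(2024, 6, 5, 9, 0), 1),
    (datetime(2024, 6, 5, 12, 30), 0),
]
intervals, anomalies = pair_events(events)
total_worked = sum((worked for _, _, worked in intervals), timedelta())
```

The intervals would be stored in the reporting table, while the anomalies can be flagged for a manager to correct. Because the raw log is untouched, the pairing can be rerun after any correction.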
Is MySQL Timestamp a good option or should I use UNIX Timestamp?
Timestamp is good because there are lots of MySQL functions that work well with timestamp. You might also want to consider using datetime which is very similar to timestamp.
Related
Should I use field 'datetime' or 'timestamp'?

Storing Calendar Data in MySQL

Just a quick architecture question really on storing calendar data.
Basically, I have a database of services for rental. On the front end, there is a calendar to show either "Available" or "Unavailable" for every future date. In the back-end the user can set any date/date range to available or unavailable (1 or 0) on a jQuery calendar.
The question I have is how would you go about storing this data in mysql and retrieving it on the front end?
Possibly have all dates available and store the unavailable dates? Then if they are set to available again, remove the record for that date?
Cheers,
RJ
Possibly have all dates available and store the unavailable dates? Then if they are set to available again, remove the record for that date?
Yes, I'd go with that, except I would not remove the record when renting expires - you'll easily know a renting expired because it's in the past, so you automatically keep the history of renting as well.
After all, there is an infinite number of available dates (in the future and, in some sense, in the past as well), so you'd have to artificially limit the supported range of dates if you went the other way around (and stored free dates).
Also, I'm guessing you want some additional information in case a service is rented (e.g. name of the renter) and there would be nowhere to store that if renting were represented by a non-existent row!
Since the granularity of renting is a whole day, I think you are looking at a database structure similar to this (the original diagram, with RENTING and RENTING_DAY tables, is not preserved here):
Note how RENTING_DAY PK naturally prevents overlaps.
Alternatively, you might ditch the RENTING_DAY and have START_DATE and END_DATE directly in RENTING, but this would require explicit range overlap checks, which may not scale ideally.
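Since the diagram is not available, the following is only a guess at what such a structure could look like: a hypothetical `renting` table plus a `renting_day` table whose primary key is (service, day). The column names are assumptions. The sketch shows how that key rejects an overlapping booking without any explicit range check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE renting (
    renting_id INTEGER PRIMARY KEY,
    service_id INTEGER NOT NULL,
    renter_name TEXT NOT NULL);
CREATE TABLE renting_day (
    service_id INTEGER NOT NULL,
    day TEXT NOT NULL,
    renting_id INTEGER NOT NULL REFERENCES renting(renting_id),
    PRIMARY KEY (service_id, day));  -- one renting per service per day
""")

def rent(conn, service_id, renter_name, days):
    """Book all of the given days or none of them.

    Returns True on success and False if any day is already taken.
    """
    try:
        with conn:  # rolls back the renting row too if a day collides
            cur = conn.execute(
                "INSERT INTO renting (service_id, renter_name) VALUES (?, ?)",
                (service_id, renter_name))
            conn.executemany(
                "INSERT INTO renting_day (service_id, day, renting_id) VALUES (?, ?, ?)",
                [(service_id, day, cur.lastrowid) for day in days])
        return True
    except sqlite3.IntegrityError:
        return False

first = rent(conn, 1, "Alice", ["2024-07-01", "2024-07-02"])
second = rent(conn, 1, "Bob", ["2024-07-02", "2024-07-03"])      # overlaps on 2 July
other_service = rent(conn, 2, "Bob", ["2024-07-02", "2024-07-03"])
```

A date is then unavailable on the front end exactly when a `renting_day` row exists for it.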
Decide whether the default is Available or Unavailable.
Possibly have all dates available and store the unavailable dates?
So default is Available?
Then you can store unavailable_start and unavailable_end as date fields. For single days, unavailable_start = unavailable_end. Then it's easy to query for a month or any date range and return the unavailability periods in that range. Then have jQuery parse the result to display the calendar details for those dates.
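The range query for this design could look roughly like the sketch below. The `unavailability` table name and the `service_id` column are assumptions; `unavailable_start` and `unavailable_end` come from the answer. Dates are stored as ISO strings so that plain comparison orders them correctly in SQLite; in MySQL they would be DATE columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE unavailability ("
    "service_id INTEGER, unavailable_start TEXT, unavailable_end TEXT)")
conn.executemany(
    "INSERT INTO unavailability VALUES (?, ?, ?)",
    [
        (1, "2024-06-28", "2024-07-02"),  # spans a month boundary
        (1, "2024-07-10", "2024-07-10"),  # single day: start = end
        (1, "2024-08-05", "2024-08-09"),
    ])

def unavailable_periods(conn, service_id, range_start, range_end):
    """Return periods overlapping [range_start, range_end], both ends inclusive.

    Two ranges overlap when each one starts on or before the other one ends.
    """
    return conn.execute(
        "SELECT unavailable_start, unavailable_end FROM unavailability "
        "WHERE service_id = ? AND unavailable_start <= ? AND unavailable_end >= ? "
        "ORDER BY unavailable_start",
        (service_id, range_end, range_start)).fetchall()

july = unavailable_periods(conn, 1, "2024-07-01", "2024-07-31")
```

The returned periods can be serialised as JSON for the calendar widget, which marks every date inside them as unavailable.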

Slowly Changing Dimensions in SSAS and SSRS

I have a project where establishments are inspected anything from once every 6 months to once every 3 years and the results of the inspection scorecard are recorded as a record in a type 2 slowly changing dimension table [tblInspections], using StartDate and EndDate to cover the period between inspections for which this scorecard is valid. The inspections table is linked to [tblEstablishments] which contains other details about other fixed dimensions such as location and business type.
So currently, we are providing aggregated reports of current situation (where EndDate is null) and also audit reports for the history of any one establishment (On EstablishmentID)
My next task is to provide more detailed analysis reports of trends of the scorecard results and I need to provide historical aggregated results of the situation on the last day of each month.
My problem is that despite knowing exactly what I want, I am now unsure how to get there.
1) Do I start by writing ETL process to build a cube based on all the historical results working out what all the aggregates would have been at the end of each month?
2) Am I then able to just process the current records at the end of each month, effectively adding a new slice onto the end of an existing cube without reprocessing from scratch? (If so, how?)
3) Is there another way of doing this? Does Analysis Services have better ways of dealing with SCDs automatically when determining historical status at any point in time by selecting the correct record from multiple records with start and end date?
Any advice and pointers to tutorials related to this would be much appreciated.
First, I think you are going to want to build a new periodic (monthly) snapshot fact table if you are trying to analyze the inspection results across establishments (and other dimensions, like time/date). Then you can build the ETL process to populate this new fact table. Finally, you can model the fact table as a new measure group in a new or existing cube. Be sure to pay attention to the aggregation property of the measures in this new measure group: typically you don't want to sum periodic snapshot measures (think about what happens if you sum your bank account balance at the end of each month and look at it by year).
Yes, you will run your ETL at the end of each month, which will add more rows to your periodic (monthly) snapshot fact table. Then you can just process the cube and you are all set.
Analysis Services handles SCD2 dimensions quite well (assuming you are using surrogate keys; you are, aren't you?). I think the business process that you are trying to model (inspections) is what is causing some confusion, because it's no longer a dimension in this new analysis; it has become a fact (a periodic snapshot fact).
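As an illustration of the monthly ETL step, the sketch below picks, for a given month end, the scorecard row that was valid on that day and writes it to a snapshot fact table. Several things are assumptions: the `factMonthlySnapshot` table, the `Score` column, and the convention that `EndDate` is exclusive (equal to the next row's `StartDate`). It runs against SQLite here, not SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblInspections (
    EstablishmentID INTEGER, Score INTEGER, StartDate TEXT, EndDate TEXT);
CREATE TABLE factMonthlySnapshot (
    MonthEnd TEXT, EstablishmentID INTEGER, Score INTEGER,
    PRIMARY KEY (MonthEnd, EstablishmentID));
""")
conn.executemany(
    "INSERT INTO tblInspections VALUES (?, ?, ?, ?)",
    [
        (1, 60, "2023-11-10", "2024-02-15"),  # superseded scorecard
        (1, 80, "2024-02-15", None),          # current scorecard
        (2, 90, "2024-01-20", None),
    ])

def load_month(conn, month_end):
    """Insert one row per establishment for the scorecard valid on month_end.

    Deleting first makes the load safe to rerun for the same month.
    """
    with conn:
        conn.execute(
            "DELETE FROM factMonthlySnapshot WHERE MonthEnd = ?", (month_end,))
        conn.execute(
            "INSERT INTO factMonthlySnapshot "
            "SELECT ?, EstablishmentID, Score FROM tblInspections "
            "WHERE StartDate <= ? AND (EndDate IS NULL OR EndDate > ?)",
            (month_end, month_end, month_end))

# Backfill history once, then run the same function at each month end.
for month_end in ("2024-01-31", "2024-02-29"):
    load_month(conn, month_end)

# Snapshot measures are averaged per month here, not summed across months.
averages = dict(conn.execute(
    "SELECT MonthEnd, AVG(Score) FROM factMonthlySnapshot GROUP BY MonthEnd"))
```

The same function covers both the initial historical load and the incremental monthly run, so the cube only needs to process the newly added rows.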