Summing a ledger over a long period of time (reconciliation, snapshots, rolling sum?) - MySQL

We are building a warehouse stock management system and have a stock movements table that records stock into, through, and out of the system, for each product and each location where it is stored. e.g.
10 units of Product A are received into Location A
10 units of Product A are moved to Location B and removed from Location A.
1 unit is removed (sold) from Location B
... and so on.
This means that to work out how much of each product is stored in each location, we would run:
"SELECT location, product, SUM(qty) FROM stock_movements GROUP BY location, product"
(we actually use Eloquent but I have used SQL for an example)
Over time, this will mean our stock movements table grows to millions of rows, and I am wondering how best to manage this. The options I can think of:
- Sum the rows as grouped above and accept it may get slow over time. I'm not sure how many rows it will take before it actually starts to cause performance issues, but when requesting a whole inventory log via our API every row would have to be summed for every product, so this adds up to a fairly large calculation.
- Create a snapshot of the summed rows every day/week/month etc. via a cron job, then just add the sum of the most recent rows on the fly.
- Create a separate table with a live stock level that is added to and subtracted from on every stock movement, something like the sketch below. The stock movements table shows the entire history of all movements, while the new table just shows the live amounts. We would use database transactions here to ensure they stay in sync.
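For concreteness, a minimal sketch of that third option, assuming a hypothetical stock_levels table with a unique key on (product_id, location_id); the column names are illustrative:

    START TRANSACTION;

    -- Record the movement in the history table
    INSERT INTO stock_movements (product_id, location_id, qty, moved_at)
    VALUES (42, 7, -1, NOW());

    -- Upsert the live total; relies on the UNIQUE key on (product_id, location_id)
    INSERT INTO stock_levels (product_id, location_id, qty)
    VALUES (42, 7, -1)
    ON DUPLICATE KEY UPDATE qty = qty + VALUES(qty);

    COMMIT;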
Is there a defined and best practice way to handle this kind of thing already? Would love to hear your thoughts!

The good news is that your system is already where a lot of people say the database world should be moving: event sourcing. ES just stores every event against an object, in this case your location, and in order to get the current state you have to start with an empty object and replay all of that object's events.
Of course, this can be time-consuming, and your last two bullet points are the standard ways of dealing with it. First, you can create regular snapshots with the current-as-of-then totals for each location, and then when someone asks for the current-as-of-now totals you only need to replay the events since the last snapshot. Second, you can keep a separate table of current values, and whenever you insert a record into your event store you also update the current value. If they ever get out of sync, you can always start fresh and replay the entire event series again.
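As a sketch of the snapshot variant (the stock_snapshots table and the moved_at column are assumptions on my part):

    -- Nightly cron: materialise the current totals
    INSERT INTO stock_snapshots (product_id, location_id, qty, taken_at)
    SELECT product_id, location_id, SUM(qty), NOW()
    FROM stock_movements
    GROUP BY product_id, location_id;

    -- On request: latest snapshot plus any movements made since it was taken
    SELECT s.product_id, s.location_id,
           s.qty + COALESCE(SUM(m.qty), 0) AS current_qty
    FROM stock_snapshots s
    LEFT JOIN stock_movements m
           ON m.product_id  = s.product_id
          AND m.location_id = s.location_id
          AND m.moved_at    > s.taken_at
    WHERE s.taken_at = (SELECT MAX(taken_at) FROM stock_snapshots)
    GROUP BY s.product_id, s.location_id, s.qty;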
Both of these scenarios are typically managed through an intermediary queue service, like SQL Server's Service Broker, RabbitMQ, or Amazon's SQS: instead of inserting an event directly into your event store, you send the change into a queue, and the code that processes the queue updates your snapshot.
Good luck!

Related

Which data model to use to maintain everyday historical data for a customer

I need to maintain the everyday closing balance of a customer and plot a line graph of that balance for the last 365 days. Which data model is preferred for maintaining this data?
MySQL, Cassandra, or any other database?
The obvious solution would be to have a table keyed on [client id, date] with the closing balance as the value.
The question is how you fill that data in. You could have a job that runs at the end of each day, but the big question is how to make the system reliable: if the job fails and is restarted the next day, will it still produce the data for the previous day?
The typical way of addressing this type of problem is "event sourcing". This is a fancy way of saying that you store a record for every operation executed on a balance. Every add/deduct should be there, including the client id and date. These records may also carry a "resulting balance", which means the last operation for a customer in a day holds that day's closing balance as well.
In the end you will have a list of transactions, and you will be able to "replay" previous events to get balances. It's your choice whether to also keep an actual table of daily balances per client, as it may be cheaper to look that data up than to recalculate it every time.
In the banking industry, every transaction is stored as a separate record for this specific reason: to be able to produce different reports, including closing balances per day.
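A minimal sketch of that layout (table and column names are hypothetical): the closing balance for a day is simply the last event of the day.

    -- One row per operation, carrying the running balance
    CREATE TABLE balance_events (
        id                BIGINT AUTO_INCREMENT PRIMARY KEY,
        client_id         INT           NOT NULL,
        happened_at       DATETIME      NOT NULL,
        amount            DECIMAL(12,2) NOT NULL,  -- positive = add, negative = deduct
        resulting_balance DECIMAL(12,2) NOT NULL
    );

    -- Closing balance per client per day = the last event of that day
    SELECT e.client_id, DATE(e.happened_at) AS bal_date, e.resulting_balance
    FROM balance_events e
    JOIN (
        SELECT client_id, DATE(happened_at) AS bal_date, MAX(id) AS last_id
        FROM balance_events
        GROUP BY client_id, DATE(happened_at)
    ) lastev ON lastev.last_id = e.id;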

Database Size Management

I am in the final stages of building my website and I am getting a little nervous that I am doing something wrong with my database.
I am building a Laravel/MySQL site that allows users to add events, so I have an Events table. I allow users to choose dates for their event for the next six months, which means one event can have around 180 event dates; these I save to a separate Shows table. Each of those show dates can have tickets and prices (VIP, general, etc.) in a Tickets table. That means just one event can create a huge number of entries:
1 event -> 180 shows (one per day) -> 5 ticket types per show = 900 rows in the Tickets table
As far as I can tell this is the correct way to do it, but it seems like my Tickets table is going to get massive quickly. I will be using Elasticsearch to filter through the data.
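For reference, a sketch of that three-level layout as plain tables (the column details are illustrative):

    CREATE TABLE events  (id INT AUTO_INCREMENT PRIMARY KEY,
                          name VARCHAR(255) NOT NULL);

    CREATE TABLE shows   (id INT AUTO_INCREMENT PRIMARY KEY,
                          event_id  INT  NOT NULL,
                          show_date DATE NOT NULL,
                          FOREIGN KEY (event_id) REFERENCES events(id));

    CREATE TABLE tickets (id INT AUTO_INCREMENT PRIMARY KEY,
                          show_id INT          NOT NULL,
                          type    VARCHAR(32)  NOT NULL,  -- vip, general, ...
                          price   DECIMAL(8,2) NOT NULL,
                          FOREIGN KEY (show_id) REFERENCES shows(id));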

How to calculate the difference between times and store the result in a MySQL database (VB.NET)?

I'm quite new to VB and I'm working on a project to record the details of employees clocking in and clocking out. I want to make it so that when the 'clock in' button is clicked the time starts recording, and when the 'clock out' button is pressed the time stops recording. Once clock out is clicked, the hours between clock in and clock out should be calculated and stored in a MySQL database.
This information will be output to a DataGrid showing the time and date of when the employee clocked in.
Then the number of hours will be multiplied by a pre-defined hourly wage, which is already stored inside one of the tables in my MySQL database.
Any help would be appreciated.
You should store the event instead of the result.
Store a row for the clock-in time as well as a row for the clock-out time.
Then you will need a procedure, either in your database or in the application, that iterates over the rows and matches clock-ins to clock-outs.
This approach will let the application crash/terminate and restart without losing data.
Alternatively, you could put the clock-in and clock-out in the same record (different columns), and just write the clock-out into the first row that matches the employee and has a null clock-out.
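A minimal sketch of the event-style layout (hypothetical names), with a pairing query that assumes the events alternate cleanly between in and out for each employee:

    -- One row per button press
    CREATE TABLE clock_events (
        id          BIGINT AUTO_INCREMENT PRIMARY KEY,
        employee_id INT              NOT NULL,
        event_type  ENUM('in','out') NOT NULL,
        pressed_at  DATETIME         NOT NULL
    );

    -- Pair each clock-in with the earliest later clock-out
    SELECT i.employee_id,
           i.pressed_at      AS clock_in,
           MIN(o.pressed_at) AS clock_out,
           TIMESTAMPDIFF(MINUTE, i.pressed_at, MIN(o.pressed_at)) / 60.0 AS hours_worked
    FROM clock_events i
    JOIN clock_events o
      ON o.employee_id = i.employee_id
     AND o.event_type  = 'out'
     AND o.pressed_at  > i.pressed_at
    WHERE i.event_type = 'in'
    GROUP BY i.employee_id, i.pressed_at;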
I would have the clock-in button fire an event in the program that creates a record for the employee ID (which I'm assuming you have at that point).
Then, once the clock-out button is clicked, you would fire an event that goes out to your database and pulls in the first record it finds with the employee ID you are looking for, a valid clock-in time, and a null clock-out time. If the program doesn't find anything matching all that criteria, you would have to handle it however you want; I would do the lookup when the employee logs in, and only allow them access to the clock-in button if there is no open record, and only allow them to use the clock-out button if one is found for their ID.
Once you have that record in memory, you should set the clock-out time and calculate the difference using the clock-in time that was written to the database earlier.
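In SQL terms, that last step might look like the sketch below for the single-record variant (table and column names are hypothetical):

    -- Close the open shift for this employee and store the elapsed hours
    UPDATE time_records
    SET clock_out    = NOW(),
        hours_worked = TIMESTAMPDIFF(MINUTE, clock_in, NOW()) / 60.0
    WHERE employee_id = 123
      AND clock_out IS NULL
    ORDER BY clock_in
    LIMIT 1;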
I would use a stored procedure in the database to handle adding/updating/managing the record, and do all the calculations and whatever else you want to do at clock-in/out time inside the program itself. But I think it's all just preference as far as where the actual processing takes place.
The most obvious reason for this approach is that the program can be shut down between clock-ins and clock-outs without losing anything at all. If you try to keep track of it all in memory, you will lose all your clock-ins once the program is shut down for whatever reason (closed manually, "End Task"ed through Task Manager, an unhandled error).

Making a table that keeps logs of updates in MySQL

I'd like to make a table that keeps track of a separate, frequently updated table on a day-to-day basis. For example, I currently have a table that tracks inventory, and once a day I'd like to run a report that gives me information like how many new items were added, how many items were sold, etc., and have each of those figures stored as a separate column in the table. Is this possible? I've done some research trying to find a solution but haven't had any luck yet.
Way 1: use a database trigger, which fires when you insert/update/delete a row.
Way 2: in your application code (Java, for example), keep track of the insert/remove events in an in-memory counter (you can use Spring AOP to detect the events, and memory or memcached to hold the numbers), then use a scheduled job to write the data to a table and reset the counters every day (in Java the JDK provides classes for this, or you can use the Quartz framework).
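A minimal sketch of way 1, assuming an inventory table and a hypothetical per-day counter table:

    CREATE TABLE daily_inventory_stats (
        stat_date   DATE PRIMARY KEY,
        items_added INT NOT NULL DEFAULT 0,
        items_sold  INT NOT NULL DEFAULT 0
    );

    DELIMITER //
    CREATE TRIGGER inventory_after_insert
    AFTER INSERT ON inventory
    FOR EACH ROW
    BEGIN
        -- Bump today's counter, creating the row on first use
        INSERT INTO daily_inventory_stats (stat_date, items_added)
        VALUES (CURRENT_DATE, 1)
        ON DUPLICATE KEY UPDATE items_added = items_added + 1;
    END//
    DELIMITER ;

A similar trigger on the sales (or delete) path would maintain items_sold.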

Slowly Changing Dimensions in SSAS and SSRS

I have a project where establishments are inspected anywhere from once every 6 months to once every 3 years, and the results of the inspection scorecard are recorded as a record in a type 2 slowly changing dimension table [tblInspections], using StartDate and EndDate to cover the period between inspections for which the scorecard is valid. The inspections table is linked to [tblEstablishments], which contains details about other fixed dimensions such as location and business type.
So currently we provide aggregated reports of the current situation (where EndDate is null) and also audit reports of the history of any one establishment (on EstablishmentID).
My next task is to provide more detailed analysis of trends in the scorecard results, and I need to provide historical aggregated results for the situation on the last day of each month.
My problem is that despite knowing exactly what I want, I am now unsure how to get there.
1) Do I start by writing an ETL process to build a cube based on all the historical results, working out what all the aggregates would have been at the end of each month?
2) Am I then able to just process the current records at the end of each month, effectively adding a new slice onto the end of an existing cube without reprocessing from scratch? (If so, how?)
3) Is there another way of doing this? Does Analysis Services have better ways of dealing with SCDs automatically when determining the historical status at any point in time, by selecting the correct record from multiple records with start and end dates?
Any advice and pointers to tutorials related to this would be much appreciated.
First, I think you are going to want to build a new periodic (monthly) snapshot fact table if you are trying to analyze inspection results across establishments (and other dimensions, like time/date). Then you can build the ETL process to populate this new fact table. Finally, you can model the fact table as a new measure group in a new or existing cube. Be sure to pay attention to the aggregation property of the measures in this new measure group: typically you don't want to sum periodic snapshot measures (think about what happens if you sum your bank account balance at the end of each month and look at it by year).
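As a sketch of that ETL step (the fact table and key/measure names here are illustrative): for each month-end date, pick the inspection record whose validity period covers that date and insert one snapshot row per establishment.

    -- Month-end snapshot load; run once per month-end date
    INSERT INTO FactInspectionSnapshot (SnapshotDate, EstablishmentKey, Score)
    SELECT '2024-01-31', e.EstablishmentKey, i.Score
    FROM tblEstablishments e
    JOIN tblInspections i
      ON i.EstablishmentID = e.EstablishmentID
     AND i.StartDate <= '2024-01-31'
     AND (i.EndDate IS NULL OR i.EndDate >= '2024-01-31');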
Yes, you will run your ETL at the end of each month, which will add more rows to your periodic (monthly) snapshot fact table. Then you can just process the cube and you are all set.
Analysis Services handles SCD2 dimensions quite well (assuming you are using surrogate keys... you are, aren't you?). I think the business process you are trying to model (inspections) is what is causing some confusion, because it's no longer a dimension in this new analysis: it has become a fact (a periodic snapshot fact).