I have a project where establishments are inspected anything from once every 6 months to once every 3 years and the results of the inspection scorecard are recorded as a record in a type 2 slowly changing dimension table [tblInspections], using StartDate and EndDate to cover the period between inspections for which this scorecard is valid. The inspections table is linked to [tblEstablishments] which contains other details about other fixed dimensions such as location and business type.
Currently, we provide aggregated reports of the current situation (where EndDate is null) and audit reports of the history of any one establishment (by EstablishmentID).
My next task is to provide more detailed analysis reports of trends of the scorecard results and I need to provide historical aggregated results of the situation on the last day of each month.
My problem is that despite knowing exactly what I want, I am now unsure how to get there.
1) Do I start by writing an ETL process to build a cube based on all the historical results, working out what all the aggregates would have been at the end of each month?
2) Am I then able to just process the current records at the end of each month, effectively adding a new slice onto the end of an existing cube without reprocessing from scratch? (If so, how?)
3) Is there another way of doing this? Does Analysis Services have better ways of dealing with SCDs automatically when determining historical status at any point in time by selecting the correct record from multiple records with start and end date?
Any advice and pointers to tutorials related to this would be much appreciated.
First, I think you are going to want to build a new periodic (monthly) snapshot fact table if you are trying to analyze the inspection results across establishments (and other dimensions, like time/date). Then you can build the ETL process to populate this new fact table. Finally, you can model the fact table as a new measure group in a new or existing cube. Be sure to pay attention to the aggregation property of the measures in this new measure group: typically you don't want to sum periodic snapshot measures across time (think about what happens if you sum your bank account balance at the end of each month and look at it by year).
Yes, you will run your ETL at the end of each month, which will add more rows to your periodic (monthly) snapshot fact table. Then you can just process the cube and you are all set.
Analysis Services handles SCD2 dimensions quite well (assuming you are using surrogate keys... you are, aren't you?). I think the business process that you are trying to model (inspections) is what is causing some confusion, because it's no longer a dimension in this new analysis; it has become a fact (a periodic snapshot fact).
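As a rough illustration of that ETL step, here is a minimal sketch in Python with an in-memory SQLite database standing in for the real warehouse. The names tblInspections, EstablishmentID, StartDate and EndDate come from the question; InspectionKey, Score and factInspectionMonthly are hypothetical, and the sketch assumes a row is valid from StartDate up to, but not including, EndDate.

```python
import sqlite3
from datetime import date, timedelta

def month_ends(first, last):
    """Yield the last calendar day of each month from `first` through `last`."""
    year, month = first.year, first.month
    while True:
        # First day of the following month, minus one day.
        next_first = date(year + (month == 12), month % 12 + 1, 1)
        end = next_first - timedelta(days=1)
        if end > last:
            return
        yield end
        year, month = next_first.year, next_first.month

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblInspections (
    InspectionKey   INTEGER PRIMARY KEY,  -- hypothetical surrogate key
    EstablishmentID INTEGER,
    Score           INTEGER,              -- hypothetical scorecard measure
    StartDate       TEXT,
    EndDate         TEXT                  -- NULL marks the current row
);
CREATE TABLE factInspectionMonthly (      -- hypothetical snapshot fact table
    SnapshotDate    TEXT,
    EstablishmentID INTEGER,
    InspectionKey   INTEGER,
    Score           INTEGER,
    PRIMARY KEY (SnapshotDate, EstablishmentID)
);
INSERT INTO tblInspections VALUES
    (1, 100, 70, '2023-01-10', '2023-03-15'),
    (2, 100, 90, '2023-03-15', NULL),
    (3, 200, 50, '2023-02-20', NULL);
""")

def load_snapshot(conn, snapshot_date):
    """Store the SCD2 row that was valid for each establishment on snapshot_date."""
    d = snapshot_date.isoformat()
    with conn:
        conn.execute("""
            INSERT OR REPLACE INTO factInspectionMonthly
            SELECT ?, EstablishmentID, InspectionKey, Score
            FROM tblInspections
            WHERE StartDate <= ?
              AND (EndDate IS NULL OR EndDate > ?)
        """, (d, d, d))

# One-off backfill of history; afterwards only the newest month end is loaded.
for end in month_ends(date(2023, 1, 1), date(2023, 4, 30)):
    load_snapshot(conn, end)

# Average (not sum) the snapshot measure within each month end.
trend = conn.execute("""
    SELECT SnapshotDate, AVG(Score), COUNT(*)
    FROM factInspectionMonthly
    GROUP BY SnapshotDate
    ORDER BY SnapshotDate
""").fetchall()
for row in trend:
    print(row)
```

The same load_snapshot call serves both the historical backfill and the monthly run, and because it replaces rows for a date it can be re-run for a month end without duplicating facts.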
I am looking for a way to store auto-generated reports. There are about 10-15 columns and 100-3000 rows depending on the report but each report is consistent in column count.
I am looking for a way to organise and store these reports as one large group without creating an entire new database and thousands of tables to store each individual report.
The reports need to be queryable so they can be subdivided by team/area/person etc as each report can be a combination of 3-4 different sub-reports depending on how you split/sort the data.
I am using Python to collect and sort the data from the database, so using MariaDB/MySQL would be preferred, but I'm happy to use something else if there is a pre-existing connection library for it.
To sum up, I need something similar to an Excel spreadsheet, with each table being a sheet and the sheet name being the date it was generated, so I can select by the date generated.
Think through the goals.
Is this a legal issue? That is, do you need to produce an unalterable report as something "official", a la a non-editable .pdf?
Or, at the opposite extreme, do you need to be able to generate (or regenerate) any report for any timeframe?
Is performance an issue? (Either perceived or real)
I like to build and maintain Summary Table(s) for any "Data Warehouse" application. And build "reports" that take as a parameter a date range and a small number of other things. And have the report generation so fast that it does not matter if multiple people are pulling reports at random times.
15 columns and 3000 rows is usually excessive. If pulling a report is trivial enough, it can be less 'massive'; just get the parts you want, without such bulk.
http://mysql.rjweb.org/doc.php/summarytables
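A minimal sketch of that idea, in Python with SQLite standing in for MariaDB/MySQL: one summary table holds every day's figures, and a "report" is just a query over a date range. The table daily_summary, its columns and the two functions are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE daily_summary (
        day         TEXT    NOT NULL,  -- ISO date the figures belong to
        team        TEXT    NOT NULL,
        area        TEXT    NOT NULL,
        item_count  INTEGER NOT NULL,
        total_value REAL    NOT NULL,
        PRIMARY KEY (day, team, area)
    )
""")

def add_to_summary(conn, day, team, area, item_count, total_value):
    """Fold one batch of figures into the summary row for (day, team, area)."""
    with conn:
        cur = conn.execute(
            "UPDATE daily_summary "
            "SET item_count = item_count + ?, total_value = total_value + ? "
            "WHERE day = ? AND team = ? AND area = ?",
            (item_count, total_value, day, team, area),
        )
        if cur.rowcount == 0:  # no row yet for this key, so create it
            conn.execute(
                "INSERT INTO daily_summary VALUES (?, ?, ?, ?, ?)",
                (day, team, area, item_count, total_value),
            )

def report(conn, start, end, team=None):
    """Aggregate the summary rows for a date range, optionally for one team."""
    sql = ("SELECT team, area, SUM(item_count), SUM(total_value) "
           "FROM daily_summary WHERE day BETWEEN ? AND ?")
    params = [start, end]
    if team is not None:
        sql += " AND team = ?"
        params.append(team)
    sql += " GROUP BY team, area ORDER BY team, area"
    return conn.execute(sql, params).fetchall()

add_to_summary(conn, "2024-01-01", "north", "A", 3, 30.0)
add_to_summary(conn, "2024-01-01", "north", "A", 2, 20.0)  # merges into the same row
add_to_summary(conn, "2024-01-02", "north", "A", 1, 10.0)
add_to_summary(conn, "2024-01-02", "south", "B", 4, 40.0)
add_to_summary(conn, "2024-02-01", "north", "A", 7, 70.0)

january = report(conn, "2024-01-01", "2024-01-31")
january_south = report(conn, "2024-01-01", "2024-01-31", team="south")
print(january)
print(january_south)
```

Selecting "the sheet for a given date" then becomes a call with start and end set to the same day, and no per-report tables are needed.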
We are building a warehouse stock management system and have a stock movements table that records stock into, through and out of the system, for each product and each location it is stored. i.e.
10 units of Product A is received into Location A
10 units Product A are moved to Location B and removed from Location A.
1 unit is removed (sold) from Location B
... and so on.
This means that to work out how much of each product is stored in each location we would run:
"SELECT location, product, SUM(qty) FROM stock_movements GROUP BY location, product"
(we actually use Eloquent but I have used SQL for an example)
Over time, this will mean our stock movements table will grow to millions of rows and I am wondering the way to best manage this. The options I can think of:
Sum the rows as grouped above and accept it may get slow over time. I'm not sure how many rows it will take before it actually starts to cause any performance issues. When requesting a whole inventory log via our API, each row would have to be summed for every product, so this will add up to a fairly large calculation.
Create a snapshot of the summed rows every day/week/month etc. on a cron and then just add the sum of the most recent rows on the fly.
Create a separate table with a live stock level which is added to and subtracted with every stock movement. The stock movements table shows an entire history of all movements while the new table just shows the live amounts. We would use database transactions here to ensure they keep in sync.
Is there a defined and best practice way to handle this kind of thing already? Would love to hear your thoughts!
The good news is that your system is already where a lot of people say the database world should be moving: event sourcing. ES just stores every event against an object, in this case your location, and in order to get the current state you have to start with an empty object and replay all of that object's events.
Of course, this can be time-consuming, and your last two bullet points are the standard ways of dealing with it. First, you can create regular snapshots with the current-as-of-then totals for that location, and then when someone asks for the current-as-of-now totals you only need to replay events since the last snapshot. Second, you can have a separate table of current values, and whenever you insert a record into your event store you also update the current value. If they ever get out-of-sync, you can always start fresh and replay the entire event series again.
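The first approach (a snapshot plus a replay of only the newer events) might look roughly like the sketch below, written in Python against an in-memory SQLite database. The stock_movements table and its location, product and qty columns follow the question; the id column, the stock_snapshots table and the function names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock_movements (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,  -- assumed ordering key
    location TEXT,
    product  TEXT,
    qty      INTEGER                             -- negative when stock leaves
);
CREATE TABLE stock_snapshots (                   -- hypothetical snapshot table
    last_movement_id INTEGER,                    -- newest movement included
    location TEXT,
    product  TEXT,
    qty      INTEGER
);
""")

def record(conn, location, product, qty):
    """Append one movement event."""
    with conn:
        conn.execute(
            "INSERT INTO stock_movements (location, product, qty) VALUES (?, ?, ?)",
            (location, product, qty),
        )

def take_snapshot(conn):
    """Replace the snapshot with totals over every movement recorded so far."""
    with conn:
        last_id = conn.execute(
            "SELECT COALESCE(MAX(id), 0) FROM stock_movements").fetchone()[0]
        conn.execute("DELETE FROM stock_snapshots")
        conn.execute("""
            INSERT INTO stock_snapshots
            SELECT ?, location, product, SUM(qty)
            FROM stock_movements
            WHERE id <= ?
            GROUP BY location, product
        """, (last_id, last_id))

def current_stock(conn):
    """Snapshot totals plus only the movements recorded after the snapshot."""
    last_id = conn.execute(
        "SELECT COALESCE(MAX(last_movement_id), 0) FROM stock_snapshots"
    ).fetchone()[0]
    rows = conn.execute("""
        SELECT location, product, SUM(qty) FROM (
            SELECT location, product, qty FROM stock_snapshots
            UNION ALL
            SELECT location, product, qty FROM stock_movements WHERE id > ?
        )
        GROUP BY location, product
        ORDER BY location, product
    """, (last_id,)).fetchall()
    return {(loc, prod): qty for loc, prod, qty in rows}

record(conn, "Location A", "Product A", 10)   # received
record(conn, "Location A", "Product A", -10)  # moved out of A
record(conn, "Location B", "Product A", 10)   # moved into B
take_snapshot(conn)                           # e.g. run from cron
record(conn, "Location B", "Product A", -1)   # sold after the snapshot

stock = current_stock(conn)
print(stock)
```

Only the single post-snapshot movement is replayed here, so the cost of a read depends on how many events arrived since the last snapshot rather than on the size of the whole history.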
Both of these scenarios are typically managed through an intermediary queue service, like SQL Server's Service Broker, RabbitMQ, or Amazon's SQS: instead of inserting an event directly into your event store, you send the change into a queue, and the code that processes the queue will update your snapshot.
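As a sketch of the second approach combined with a queue, the Python below uses the standard library's in-process queue.Queue purely as a stand-in for a real broker, and SQLite as a stand-in for the production database. The stock_levels table and all function names are hypothetical. Each event is written to the movements table and applied to the live totals inside one transaction, and the totals can be rebuilt from the full history if they ever drift.

```python
import queue
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stock_movements (
    id       INTEGER PRIMARY KEY,
    location TEXT,
    product  TEXT,
    qty      INTEGER
);
CREATE TABLE stock_levels (            -- hypothetical table of live totals
    location TEXT,
    product  TEXT,
    qty      INTEGER NOT NULL,
    PRIMARY KEY (location, product)
);
""")

events = queue.Queue()  # stand-in for a real message broker

def apply_event(conn, location, product, qty):
    """Store the event and adjust the live total in a single transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO stock_movements (location, product, qty) VALUES (?, ?, ?)",
            (location, product, qty),
        )
        cur = conn.execute(
            "UPDATE stock_levels SET qty = qty + ? WHERE location = ? AND product = ?",
            (qty, location, product),
        )
        if cur.rowcount == 0:  # first movement for this location/product
            conn.execute(
                "INSERT INTO stock_levels VALUES (?, ?, ?)",
                (location, product, qty),
            )

def drain(conn, q):
    """Process every event currently waiting in the queue."""
    while True:
        try:
            event = q.get_nowait()
        except queue.Empty:
            break
        apply_event(conn, *event)

def rebuild_levels(conn):
    """Start fresh and replay the whole event history into stock_levels."""
    with conn:
        conn.execute("DELETE FROM stock_levels")
        conn.execute("""
            INSERT INTO stock_levels
            SELECT location, product, SUM(qty)
            FROM stock_movements
            GROUP BY location, product
        """)

def levels(conn):
    rows = conn.execute("SELECT location, product, qty FROM stock_levels").fetchall()
    return {(loc, prod): qty for loc, prod, qty in rows}

events.put(("Location A", "Product A", 10))
events.put(("Location A", "Product A", -10))
events.put(("Location B", "Product A", 10))
events.put(("Location B", "Product A", -1))
drain(conn, events)

live = levels(conn)
print(live)
```

Reading the inventory is then a plain SELECT on stock_levels, while stock_movements keeps the full audit trail.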
Good luck!
Lifecycle report
Before I spend a few hundred hours working on a report I wanted to get some feedback to determine if SSRS can do what I want to accomplish.
I have a database that stores data about product lifecycles. The database stores a variety of stages that a product may be in the form "Product Name: Stage Description: start date: end date".
When I query the database the data comes back with one row per stage, so if a product has 4 stages (example: planned, Production, Retire, Remove) I get 4 rows of data. This is the first problem that can probably be solved in SQL by concatenating all stages into a single row.
The data needs to be displayed horizontally plotted against a time scale (in years and quarters) with different colors to represent each stage using the associated start and end date for each stage. In addition, there are special conditions where we may pay extended support for a product that has gone "end of life" from a manufacturer. I would like to indicate where this occurs with a symbol (like a $ sign).
If SSRS cannot do this I would be interested in other suggested products.
Comments, opinions and suggestions appreciated.
[Lifecycle Report]
Does the Socrata SODA API support a method to query out all the dates a dataset has been updated? Basically a changelog for the dataset that has an object for every modification/update to a dataset.
There is an existing question that asks for the last modified date (you can get it through the "/data.json API available on all Socrata-powered sites").
There is also a method to get the modified dates of individual rows using System Fields and the :updated_at field. But this is incomplete: a data provider might update every row each time. This means there is no guarantee that we are really getting back a history of modifications, just the top layer of modification on each row.
I'm looking for the complete list of modification dates, at least. We are trying to get a sense of activity on datasets and we need to know how often they are being updated.
Unfortunately, Max, we don't offer what you're looking for. We've got the last time the dataset and metadata were modified, but not a changelog of every single time that there was a change.
A surprisingly large number of datasets change very frequently, as often as every 5 minutes.
I'm relatively new to Access and VBA but I have managed to get some basic VBA tricks working in both Access and Excel. Now I've got a challenge that I can't seem to crack. I'm building a database to track maintenance of a small trucking fleet. I've got most of the tables and forms I need to do the basic tracking and management of equipment and maintenance in place.
One of the things we're tracking is called PMs, which stands for preventative maintenance (lube jobs and oil changes). We do those on calendar intervals for trailers and mileage intervals for tractors. Right now, I'm trying to get the calendar-tracked equipment working. I have a table called tblEquipmentMaster, which is where all the specifics for each piece of equipment are kept (make, model, year, VIN, etc.), and that table has a field called LastPMDate. All the maintenance records go in two other tables: tblMaintenance, which records the unit number, vendor, invoice date and invoice amount, and tblMaintenanceDetails, which records each line item of work that was performed on the unit (i.e. replaced water pump, replaced headlight, etc.).
The maintenance details table also contains a drop down list of standard maintenance codes to allow for easier searching of certain maintenance items later. One of those codes is PM. I also have several forms built to interact with these tables including a data entry form for adding new maintenance records.
What I'm trying to accomplish is to have the LastPMDate field for any unit number in tblEquipmentMaster automatically update to match the InvoiceDate field in tblMaintenance anytime an invoice is entered for that unit number which has a line item containing the code PM.
I've tried building an update query to do this but in addition to changing the LastPMDate field like I want it to, it also ends up changing the invoice dates for all previous PM invoices to the date of the last invoice which contained a PM. Not good.
So my question is, would an update query be the best way to do this or would I be better off with some sort of VBA solution? I have an add record button on my maintenance invoice data entry form which users use as a save record/clear form button when all the info for an invoice has been entered. I'm thinking some VBA code tied to the on_click of that button which would look at the invoice you just added, determine if it contains the PM maintenance code, then update LastPMDate field for that unit number with invoice date from that invoice would be a good way to do it but I honestly have no idea what functions or methods I'd need to get that to work.
Any insights or suggestions appreciated.
It's very hard to follow the flow of what you are describing, even though I have experience with PM and AM (autonomous maintenance).
What is missing is a description of the relationships between the three tables.
Nevertheless I prefer VBA solutions (maybe because I started programming when everything had to be written...). With VBA you can finely control your workflow.
If I understood correctly, the first table is tblMaintenance, which holds the invoice data.
You would then scan tblMaintenanceDetails to find the line items coded PM, and filter tblEquipmentMaster to the matching unit so that its LastPMDate can be updated with the invoice date from the first table.
Did I succeed in giving you an idea of how to solve your problem?
Let me know.
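To make the logic concrete, here is a minimal sketch in Python with SQLite rather than Access, so the SQL dialect and the calling code would need adapting for VBA. The table names, LastPMDate and InvoiceDate come from the question; UnitNumber, InvoiceID, DetailID, MaintCode and Description are guesses at how the three tables relate. The point is that only tblEquipmentMaster is the target of the UPDATE, so invoice dates are never modified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblEquipmentMaster (
    UnitNumber TEXT PRIMARY KEY,       -- assumed key
    LastPMDate TEXT
);
CREATE TABLE tblMaintenance (
    InvoiceID   INTEGER PRIMARY KEY,   -- assumed key
    UnitNumber  TEXT,
    InvoiceDate TEXT
);
CREATE TABLE tblMaintenanceDetails (
    DetailID    INTEGER PRIMARY KEY,   -- assumed key
    InvoiceID   INTEGER,               -- assumed link to tblMaintenance
    MaintCode   TEXT,                  -- assumed column for the standard code
    Description TEXT
);
INSERT INTO tblEquipmentMaster VALUES ('T1', NULL), ('T2', NULL);
INSERT INTO tblMaintenance VALUES
    (1, 'T1', '2024-01-05'),
    (2, 'T1', '2024-03-10'),
    (3, 'T1', '2024-02-20'),
    (4, 'T2', '2024-04-01');
INSERT INTO tblMaintenanceDetails VALUES
    (1, 1, 'PM',     'lube job'),
    (2, 2, 'REPAIR', 'replaced water pump'),
    (3, 3, 'PM',     'oil change'),
    (4, 3, 'REPAIR', 'replaced headlight'),
    (5, 4, 'REPAIR', 'replaced headlight');
""")

def refresh_last_pm(conn, unit_number):
    """Set LastPMDate to the newest invoice date that has a PM line item.

    If the unit has no PM invoices, LastPMDate is left unchanged.
    """
    with conn:
        conn.execute("""
            UPDATE tblEquipmentMaster
            SET LastPMDate = COALESCE((
                SELECT MAX(m.InvoiceDate)
                FROM tblMaintenance AS m
                JOIN tblMaintenanceDetails AS d ON d.InvoiceID = m.InvoiceID
                WHERE m.UnitNumber = tblEquipmentMaster.UnitNumber
                  AND d.MaintCode = 'PM'
            ), LastPMDate)
            WHERE UnitNumber = ?
        """, (unit_number,))

# Would be called after an invoice is saved, e.g. from the save button's handler.
refresh_last_pm(conn, "T1")
refresh_last_pm(conn, "T2")

last_pm = dict(conn.execute(
    "SELECT UnitNumber, LastPMDate FROM tblEquipmentMaster").fetchall())
print(last_pm)
```

Taking the maximum PM invoice date, rather than the date of the invoice just entered, keeps LastPMDate correct even when an older invoice is keyed in late.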