I have customer dimension table and the location of customer can change.
The customerid filters the sales fact table.
I have 2 options:
Slowly changing dimension type 2 to hold 1 new record for each customer's location changes
Or
Store the location at the time of data load into the sales fact table.
Both ways allow me to see sales by location (although it's a customer location, the etl will place it on fact table).
The later option saves me from implementing SCD on dim table.
What are factors to decide which of the 2 approaches is suitable?
Your fact table should contain things that we measure, count, total. Your dimensions should be descriptive elements that allow users to slice their data along an axis - basically answer the "by" part of their request
I want to see total sales by year and month across this customer based regional hierarchy
Don't take my word for it, grab a data warehousing book or go read the freely available information from the Kimball Group
Storing the customer data on the fact is a bad idea regardless of your database engine. To satisfy a query like the above, the storage engine needs to read in the entirety of your fact table and the supporting dimensions. It could read (Date, RegionId, CustomerId, SalesAmount) which likely costs something like 16 bytes per row times however many rows you have. Or, it can read (Date, RegionId, CustomerName, CustomerAddress, CustomerCity, CustomerState, CustomerPostalCode, SalesAmount) at a cost of what, 70 bytes per row? That's an inflation to
store your data (disk is cheap but that's not the point)
read your data (basic physics, the more data you wrote to disk, the longer it takes to read it back out)
less available memory for other queries (you're in a multi-user/query environment, when you hog resources, there's less for others)
write data (ETL processing is going to take longer because you have to write more pages to disk than you should have)
inability to optimize (What if the business just wants to see "Total Sales by Year and Month" - no customer hierarchy. The database engine will still have to read all the pages with all that useless customer data just to get at the things the user actually wanted)
Finally, the most important takeaway from the Data Warehouse Toolkit is on like page 1. The biggest reason that Data Warehouse projects fails is that IT drives the requirements and it sounds like you're thinking of doing that to avoid creating a SCD type 2 dimension. If the business problem you're attempting to solve is that they need to be able to see sales data associated to the customer data at the point of time it happened, you have a Type 2 customer dimension.
Yes, technologies like Columnstore Compression can reduce the amount of storage required but it's not free because now you're adding workload to the cpu. Maybe you have it, maybe you don't. Or, you model it correctly and then do the compression as well and you still come out ahead in a proper dimensional model.
How you model location depends on what it relates to. If it is an attribute of a sale then it belongs as its own dim related to the sale. If it is an attribute of a customer (such as their home address) then it belongs in the customer dim. If the location is an attribute of both a sale and a customer then it belongs in both
I would like to commend to readers the answers here and here for the depth and thought that went into them. I stumbled across them while searching for something tangential for a project I'm working on, and I got caught up reading them from top to bottom.
I am trying to build a niche-market app using these principles (namely, double-entry accounting), with job-costing thrown in. The above answers have been extremely helpful in reshaping my concept of what both the accounting and the database-ing should look and work like. However, I'm having a hard time integrating the job-costing portion of the equation into the excellent graphical examples that were provided.
There were several transaction examples using the House, account holders, fees, etc. I have a few other specific use-cases I would love to get some input on:
I have no customers. I buy a property (usually cash goes out, a liability (loan) is created, an asset (the property) is created), spend a bunch of money to fix it up (either cash out at a store, credit card charges at a store, or a check written to a vendor, which debits the property asset and debits or credits the funding source), and then sell it (cash comes in, the loan is paid off, and hopefully there's more cash left than what I spent on the project). This likely creates more ledger entries than I've listed above, but I'm not an accountant. I think I understand that all my costs go toward my basis in the property, and if my net proceeds are greater than my basis, then I've made money, and if not, then not.
So what I need to record are expenses that a) come from a specific account (i.e. company checking account or owner's Best Buy card etc.), b) are generally associated with a specific job (but not always - I do have the occasional overhead expense like office supplies), and c) are always associated with a cost code (i.e. '100.12 - Window Materials', '100.13 - Window Labor', etc.).
Frequently I receive bills from vendors that are due sometime in the future. I would like to track the bills received but not-yet-paid for a given job (committed costs). I think this transaction looks like this, but I'm not really sure:
As you may have surmised from my quip above about the "owner's Best Buy card," I sometimes (more often than I should) use my personal funds for company- and job-related expenses. I think (again with the caveat that I'm a layman) that all of those expenditures credit "Owner's Equity," and debit/credit other accounts as needed.
I've been keeping track of all of this in a big, ugly spreadsheet, which is why I'm trying to build an app to replace it - the spreadsheet method doesn't work very well and it certainly won't scale.
Preliminary
For those reading this Answer, please note that the context is as follows, in increments:
Derived Account Balance vs Stored Account Balance
Relational Data Model for Double-Entry Accounting
If you have not availed yourself to that, this Answer may not make sense.
I will respond in a sequence that is Normalised, which is of course different to the way you have laid out the problem.
Principle & Correction
There are a few, more than one, errors in your stated problem which you are not aware of, so the first step is awareness; understanding. Once a problem is correctly and precisely declared, it is easy to solve. These are errors that developers commonly make, so they need to be understood as such ... long before an app is contemplated.
1 First Principle
I've been keeping track of all of this in a big, ugly spreadsheet [the spreadsheet method doesn't work very well and it certainly won't scale], which is why I'm trying to build an app to replace it
If the manual (or the previous computerised) system is broken, and you implement a new or replacement app that is based on it, you are guaranteed to carry that broken-ness into the app.
Worse, if this is not understood, a third app can be written, promising to fix the problems in the second app, but it too, is guaranteed to migrate the problems that were not fixed in the first and second app.
Therefore, you must identify and correct every single problem in the system that you are replacing, including testing, before you can design an app and database that has any chance of success.
Scaling is the least of our worries. How any particular thing works with any other thing is the problem.
The fact that you have one great big ugly spreadsheet means that you have an overall perspective: humans can do that, we can fly by the seat of our pants, but computers cannot, they require explicit instructions.
2 Second Principle
I've been keeping track of all of this in a big, ugly spreadsheet [...] - the spreadsheet method doesn't work very well
Why does it not work [as it stands] ?
Reason 1 of 2.
You make a mistake that developers commonly make: you inspect and study the the bits and pieces of a thing, which is in the physical realm, and try to figure out how the thing works. Guaranteed failure, because how a thing works; its purpose; etc, is in the intellectual realm, not the physical.
I won't detail it here, but the larger problem must be noted. This error is a specific instance of a larger error, and very common, that:
developers focus on the functions of the GUI,
instead of the demand, which is to
correctly define the data and its relations, upon which the functions of the GUI are existentially dependent.
A person who has not learned about internal combustion, cannot figure out how to build an engine from looking at the parts of an engine that has been taken apart, even if the parts are laid out carefully. Let alone one with injectors or turbo-chargers. The principle of internal combustion is logical, the parts are physical.
Here you have looked at the spreadsheets that others have used to do their Accounting, and perhaps copied that, without understanding what they are doing with the spreadsheets.
Case in point.
You have examined the first and second linked Answers, and you think you can figure out how to apply that to a new app that fixes the dirty big spreadsheet problem.
Many developers think that if they work out the nuts and bolts, copy-paste-and-substitute, somehow the app will work. Note the carefully thought-out, but still incomplete, graphics that details perceived transactions.
They are missing the logical realm, and messing with the physical realm without the demanded understanding of what they are messing with.
In a word, forget about the pretty graphics for the Transactions, both yours and mine, and seek to understand the Logic (this principle) and the Accounting Standard [3].
"Test driven development" aka "code the minimum" aka "trial and error" is a totally bankrupt method, it has no scientific basis (marketing, yes, but science, no), and it is guaranteed to fail. Dangerous, because the cost is ongoing, never finite.
And to keep failing, if you understand the above.
More precisely, it is anti-science, in that it contradicts the science for building apps and databases.
So the first step is to break that great big spreadsheet down into logical units that have a purpose. And certainly, link each referencing spreadsheet column to the right columns in the referenced spreadsheet ... such that any Amount value is never duplicated.
3 Third Principle
I've been keeping track of all of this in a big, ugly spreadsheet [...] - the spreadsheet method doesn't work very well
Why does it not work, either as it stands, or when the spreadsheet has been divided into logical units ?
Reason 2 of 2.
Lack of Standards.
Since the subject matter is Accounting, we must use Accounting Standards.
That single great big ugly spreadsheet is ready evidence that you have not used an Accountant to set it up. And of course, you cannot set up a set of spreadsheets to do your Accounting without either understanding Accounting or using a qualified Accountant.
Therefore the second step is to either get an Accountant, or obtain a good understanding of Accounting. Note again, the ready evidence of your carefully thought out transactions: despite the fact that you are a very capable person, you cannot figure out the Accounting logic that is in the first and second linked Answers, let alone the Accounting that you need for your app (or your manual system).
So the best advice I can give you is, as stated in the Double-Entry Accounting Answer, find some good Tutorials on the web, and study them.
If you did that, or hired an Accountant to set up your books, you would split the single big fat spreadsheet into standard Accounting Spreadsheets:
Balance Sheet:
Asset or Liability
Profit & Loss:
Revenue or Expense
and one more set (later)
Another way of stating this principle is this. When one is ignorant that a Standard exists, or worse, when one knowingly chooses to not comply with it, one is left in the dangerous position of re-inventing the wheel, from scratch. Aka "Test driven development", aka "code the minimum possible", aka "trial and error". That means that one will go through an entire series of increments of development, which can be eliminated by observance of the Standard.
Problem & Solution
Now that we understand the principles, we can move on to determination of the specific problems, and their solutions. Each of these is a specific application of the Third Principle.
4 Property/Mortgage Treatment
I have no customers. I buy a property (usually cash goes out, a liability (loan) is created, an asset (the property) is created), spend a bunch of money to fix it up (either cash out at a store, credit card charges at a store, or a check written to a vendor, which debits the property asset and debits or credits the funding source), and then sell it
I am not saying that you have not heeded the advice I have given in the Double-Entry Answer. I am saying you have not appreciated the gravity of the advice; what it means in an Accounting context (before we venture into the database context).
Money represents value. Money; value, cannot be created or destroyed. It can only be moved. From one bucket to another. The demand is to have your buckets defined and arranged properly, according to [3].
The property is not created, it already exists. When you buy a property, there is a movement of your cash to the bank, and a movement of their property to you. In the naïve sense only, the property is now an "asset", the mortgage is now a "liability". That naïveté will be clarified into proper accounting buckets later.
You are, in fact, operating as a small single-branch bank; a cooperative; a casino. The precise context for the Double-Entry Accounting Answer. The following is true for
either a corrected set of spreadsheets,
or for following and implementing the Double-Entry Accounting Answer (if you go directly into the app ... without testing the correction to your single spreadsheet).
This is really important to understand, because it has to do with legislation in your country, which you have not mentioned. That legislation will be known to you as Taxation, or your Tax Return for the business. Even if you hold just one property at any one time.
Your "customer" is each bank that is engaged for each property. Name it for the property.
Each mortgage (property) should be set up as an External Account. That will allow you to conduct only those transactions that are actually related to it, against it. Loan Payments; Bank Charges; Expenses; etc. There will be no incoming money, until the property is sold.
In any case, the External Account will match the Bank Statement that the bank gives you for the mortgage account (which you did not mention, but which is a fundamental requirement of Accounting).
As defined in the Double-Entry Accounting Answer, every transaction on an ExternalAccount will have one Double-Entry leg in the Ledger. More, later.
Whether it is an Asset or a Liability in Accounting terms, is a function of the Ledger entry, not a function of the External Account. (By all means, we know it represents a property, which by a naïve perspective is an "asset", until it starts losing money, when it by naïve perspective, becomes a "liability".)
Another way of defining this point is, the bank loan represents a contract, upon which money (value) "changes hands" (is moved). The bank which you engaged is the "customer", the External Account. You must keep all income and expense related to the contract, with the contract.
niche-market app ...
I have a few other specific use-cases ...
No, you don't. There is nothing new under the sun. If you set up your books correctly (multiple linked spreadsheets using Accounting Standards), this is a vanilla use case. Hopefully my explanation has demonstrated that fact.
5 Ledger
Where the above points have to do with the intellectual realm, the understanding of each problem and therein the solution, which causes little work in the physical realm, this point, which has the same demand for the intellectual, is onerous at the physical level. That is, the number of keystrokes; checking; changes; checking ... before you get it set up correctly.
Although the first linked Answer deals with:
Derived vs Stored Account Balance (efficient and audit-able processing re month end),
and the second linked Answer deals with:
Double-Entry Accounting (implementation of an over-arching Accounting Standard in an existing Accounting system, a higher level of audit-ability),
neither explains the Ledger in detail.
The Ledger is the central article of any Accounting system.
The Double-Entry system is not a stand-alone article, but an advancement to that Ledger.
The data model is the specific how to set the database up correctly for both the app, and any reporting client s/w to use, uneventfully.
You do not have a true Ledger. The single big spreadsheet is not a Ledger.
You must set up the Ledger, according to [3]. At best, some of the items in that spreadsheet will be entries in the Ledger, but note, you will perceive them quite differently, due to the corrections set forth in [1][2][3].
Note that when we say "put that in the Ledger" or "that is not in the Ledger", which is for simplicity, what we mean precisely is a reference to single Ledger Entry, which is identified by a specific Account Number in the Ledger.
In the data model, this is LedgerNo.
Likewise, when we say "Accounts", we mean precisely a single Account Number in the Ledger.
If a transaction is not in the Ledger (a specific Account Number, a LedgerNo, one leg of the DEA Credit/Debit), it is not in the "accounts", it is not accounted for.
This is where you will set up genuine Accounts for Assets, and for Liabilities. This is for Internal purposes, in the Ledger, as declared in the margin for Internal in the data model.
The best advice I can give you is, trawl the web for Tutorials on Accounting; determine which are good; study them carefully, with a view to setting up a proper Ledger for your purposes.
The simple answer is, the Ledger is an Hierarchy of Account Numbers.
Wherein the leaf level is an actual AccountNo that can be transacted against,
and the non-leaf levels exist for the purpose of aggregation, no transactions allowed.
Whenever the Ledger is reported (or any derivative of the Ledger, such as BalanceSheet or Profit & Loss):
the hierarchy is shown by indentation,
the transactional Account entries show the Current Balance for the current month
and the aggregate Account entries show the aggregate for the tree under it
[your graphics re transactions]
First and foremost, every Transaction is in the Ledger. That means one leg of the Double-Entry Accounting Transaction is in the Ledger. Look at § 5 in my Double-Entry Accounting Answer, notice that every Business Transaction has at least one blue entry (do not worry about the other details).
Second, the other DEA leg is:
either in the Ledger, meaning that the money moved between one Ledger Account LedgerNo and another Ledger Account LedgerNo. Notice the Business Transactions where both sides are blue.
or in an External Account, meaning that the money moved between one Ledger Account LedgerNo and an External Account AccountNo. Notice the Business Transactions where one side is blue and the other is green.
When you understand that, and you have your Ledger set up, there will be no "??" in your graphics, and the blue/green will be shown. (Do not re-do your graphics, I expect that this Answer will suffice.)
Your "asset/liab" designation is not correct. More precisely, it is premature to make that declaration before the Ledger is fully defined and arranged. First set up your Ledger, with Asset/Liability for each entry in mind. Then you will not have to declare "asset/liab" on each transaction, because that is a function of the Ledger Account Number LedgerNo, not a function of the transaction.
expenses that a) come from a specific account (i.e. company checking account or owner's Best Buy card etc.),
Ledger-ExternalAccount
(one DEA leg in the Ledger, the other leg in the External Account). Noting the caveats above. The other DEA leg will be to one of these (hierarchy):
Expense/Property Improvement/Structure/Material
Expense/Property Improvement/Structure/Labour
Expense/Property Improvement/Fitting/Material
Expense/Property Improvement/Fitting/Labour
Expense/Property Improvement/Furniture
expenses that c) are always associated with a cost code (i.e. '100.12 - Window Materials', '100.13 - Window Labor', etc.).
You will no longer have "cost codes", they will all be Ledger Account Numbers LedgerNos, because the Ledger is where you account for anything and everything.
One DEA leg in the Ledger, the other leg in the External Account for the particular property. The hierarchy will be the same as the previous point.
expenses that b) are generally associated with a specific job
Ledger-ExternalAccount
(one DEA leg in the Ledger, the other in the External Account).
(but not always - I do have the occasional overhead expense like office supplies)
Ledger-Ledger
one DEA leg in the Ledger for an Expense or Liability LedgerNo ... that the money was paid to
Expense/Regular/Office Supplies
the other leg in the Ledger for a Revenue or Asset LedgerNo ... that the money was paid from
Revenue/Monthly Payable
6 Credit & Other Card Treatment
credit card charge
Best Buy card
Each of your cards represents a contract, an Account that that needs to be transacted against, that must be balanced against the monthly statement provided by the institution that issued the card.
Set up each one as an External Account, one DEA leg here, the other in the Ledger.
"owner's Best Buy card" is not clear to me (who is the owner, you or the property owner ... if the latter then the assumption thus far, that "you" buy and sell properties is incorrect.)
In any case, I believe I have given enough detail for you to figure it out.
Do not amalgamate an owner's property Account and their Best Buy card into one External Account: keep separate External Accounts for each.
7 Job Costing
Notice that I am addressing this last, because once you fix the big problems, the problems that remain, are small. What you set out as the big problems (job costing; profit/loss per property) are, once the Ledger has been set up correctly for your business, actually small problems.
As far as I can see, Job Costing is the only remaining point that I have not addressed. First, the issue to be understood here is, the difference between Actuals and Estimates. Everything I have discussed thus far are Actuals.
For Estimates, the Standard procedure is to set up a separate Account structure (tree in the hierarchy) in the Ledger. These are often called Suspense Accounts, as in money that is held in suspense.
Treated properly, these Accounts will prevent you from closing or finalising an External Account before all the Estimates have been transferred to Actuals (Suspense to zero).
The Business Transactions are exactly the same as for Actuals.
This will provide precise tracking of such figures, and also the difference when an item moves from Estimate to Actual.
8 Data Model • Job Costing
Noting that the data model in the first and second linked Answers are complete for the purpose, wherein the Ledger is not expanded:
this Answer deals with explanation of the Ledger, and this data model gives the full definition of a Ledger
Arranged by AccountType
A single-parent hierarchy
Only the leaf level LedgerAccount may be transacted against
The intermediate level LedgerIntermediate is for summarising the tree below it.
I have further Normalised Transaction
expanded External Account to show a Person vs an Organisation
All constraints are made explicit.
Obviously too large for an inline graphic. Here is a PDF in two pages:
the Data Model alone (as above)
the Data Model with sample data and notes, it includes all the examples covered in the Answer
Note the indentation in the Ledger, which denotes the Account hierarchy
Comments
How do you insert the first ledger (e.g. 100 Asset, no parent)?
The Ledger is a Tree, a Single Parent Hierarchy (aka "one way" for strange reasons), as per Account Hierarchy. A root row is required. In a database build operation (using DDL from a file), we generally do all our CREATE TABLEs, followed by all our ADD CONSTRAINT FKs. Insert the root row in with the CREATE TABLE.
After the
CREATE TABLE Ledger
do
INSERT Ledger VALUES ( 0, 0, "I", "AL", "Root", ... ).
After the
CREATE TABLE LedgerIntermediate
do
INSERT LedgerIntermediate VALUES ( 0 ).
Given that the reverse of Comprises is belongs to, all first-level Ledgers eg. Fees, House, Interbank and your Asset would belong to this root row.
Creating the right database structure from a manual tariff
I have been assigned a rather challenging database design and thought someone may be able to give me a few pointers to help get going. We currently have a warehouse goods in and goods out system and now we would like to use the data to calculate storage charges.
The database already holds the following: Goods date in, Goods date out, Consignment weight, Number of pieces, Dimensions, Description of goods, Storage container type (if applicable). The data is held in MySQL which may not be suitable for the tariff structure below.
Here is the charging structure for Band 1,2,3,4. We have about 12 bands dependent on Customer size and importance. All the other bands are derivatives of the following:
BAND 1
On arrival in our facility
€0.04 per kilo + €4.00 per consignment for general cargo
€0.07 per kilo for MAGAZINES – NO STORAGE CHARGE
STORAGE CHARGES AFTER 5 DAYS
€4.00 per intact pallet max size 120x100x160cm (Standard warehouse wooden pallet)
€6.50 per cubic metre on loose cargo or out of gauge cargo.
CARGO DELIVERED IN SPECIFIC CONTAINERS
20FT PALLET ONLY - €50.00
40FT PALLET ONLY - €20.00
BAND 2
0.04 per kilo no min charge
STORAGE CHARGES AFTER 6 DAYS
€2.50 per cubic metre
CONTAINERS
20FT PALLET ONLY - €50.00
40FT PALLET ONLY - €20.00
BAND 3
€0.03 per kilo + €3.00 per consignment up to 2000kg
€0.02 per kilo + €2.00 per consignment over 2000kg
STORAGE CHARGES AFTER 5 DAYS
€4.00 per pallet max size 120x100x160
€0.04 per kilo loose cargo
BAND 4
€5.00 per pallet
STORAGE CHARGES AFTER 4 DAYS
€5.00 per pallet max size 120x100x160
My thoughts so far are to collect the charging band on arrival of the freight then try and fit the tariff into a table with some normalisation such as container type.
Anyone had experience of this type of manual to system conversion?
Probably the algorithm for computing the tariff is too messy to do in SQL. So, let's approach your question from a different point of view.
Build the algorithm in your client language (Java/PHP/VB/...).
As you are doing step 1, think about what data is needed - perhaps a 2-column array of "days" and "Euros"? Maybe something involving "kilos"? Maybe there are multiple patterns -- days and/or kilos?
Build the table or tables necessary to store those arrays.
Decide how to indicate that kilos is irrelevant -- perhaps by leaving out any rows in the kilo table? Or an extra column that gives size/weight?
My point is that the algorithm needs to drive the task; the database is merely a persistent store.
Here's another approach. Instead of having columns for days, kilos, etc, just have a JSON string of whatever factors are needed. In the client code, decode the JSON, then have suitable IF statements to act on kilos if present, ELSE ... etc.
Again, the database is just a persistent store; the client is driving the format by what is convenient to it.
In either implementation, there would be a column for the type of item involved, plus the Band. The SELECT would have ORDER BY Band. Note that there would be no concept of 12; any number could be implemented.
Performance? Fetching all ~12 rows and stepping through them -- this should not be a performance problem. If you had a thousand bands, you might notice a slight delay.