How to design a relational model for double-entry accounting with job costing - relational-database

I would like to commend to readers the answers here and here for the depth and thought that went into them. I stumbled across them while searching for something tangential for a project I'm working on, and I got caught up reading them from top to bottom.
I am trying to build a niche-market app using these principles (namely, double-entry accounting), with job-costing thrown in. The above answers have been extremely helpful in reshaping my concept of what both the accounting and the database-ing should look and work like. However, I'm having a hard time integrating the job-costing portion of the equation into the excellent graphical examples that were provided.
There were several transaction examples using the House, account holders, fees, etc. I have a few other specific use-cases I would love to get some input on:
I have no customers. I buy a property (usually cash goes out, a liability (loan) is created, an asset (the property) is created), spend a bunch of money to fix it up (either cash out at a store, credit card charges at a store, or a check written to a vendor, which debits the property asset and debits or credits the funding source), and then sell it (cash comes in, the loan is paid off, and hopefully there's more cash left than what I spent on the project). This likely creates more ledger entries than I've listed above, but I'm not an accountant. I think I understand that all my costs go toward my basis in the property, and if my net proceeds are greater than my basis, then I've made money, and if not, then not.
So what I need to record are expenses that a) come from a specific account (i.e. company checking account or owner's Best Buy card etc.), b) are generally associated with a specific job (but not always - I do have the occasional overhead expense like office supplies), and c) are always associated with a cost code (i.e. '100.12 - Window Materials', '100.13 - Window Labor', etc.).
Frequently I receive bills from vendors that are due sometime in the future. I would like to track the bills received but not-yet-paid for a given job (committed costs). I think this transaction looks like this, but I'm not really sure:
As you may have surmised from my quip above about the "owner's Best Buy card," I sometimes (more often than I should) use my personal funds for company- and job-related expenses. I think (again with the caveat that I'm a layman) that all of those expenditures credit "Owner's Equity," and debit/credit other accounts as needed.
I've been keeping track of all of this in a big, ugly spreadsheet, which is why I'm trying to build an app to replace it - the spreadsheet method doesn't work very well and it certainly won't scale.

Preliminary
For those reading this Answer, please note that the context is as follows, in increments:
Derived Account Balance vs Stored Account Balance
Relational Data Model for Double-Entry Accounting
If you have not availed yourself of those, this Answer may not make sense.
I will respond in a sequence that is Normalised, which is of course different to the way you have laid out the problem.
Principle & Correction
There are several errors in your stated problem which you are not aware of, so the first step is awareness; understanding. Once a problem is correctly and precisely declared, it is easy to solve. These are errors that developers commonly make, so they need to be understood as such ... long before an app is contemplated.
1 First Principle
I've been keeping track of all of this in a big, ugly spreadsheet [the spreadsheet method doesn't work very well and it certainly won't scale], which is why I'm trying to build an app to replace it
If the manual (or the previous computerised) system is broken, and you implement a new or replacement app that is based on it, you are guaranteed to carry that broken-ness into the app.
Worse, if this is not understood, a third app can be written, promising to fix the problems in the second app, but it too, is guaranteed to migrate the problems that were not fixed in the first and second app.
Therefore, you must identify and correct every single problem in the system that you are replacing, including testing, before you can design an app and database that has any chance of success.
Scaling is the least of our worries. How any particular thing works with any other thing is the problem.
The fact that you have one great big ugly spreadsheet means that you have an overall perspective: humans can do that, we can fly by the seat of our pants, but computers cannot, they require explicit instructions.
2 Second Principle
I've been keeping track of all of this in a big, ugly spreadsheet [...] - the spreadsheet method doesn't work very well
Why does it not work [as it stands] ?
Reason 1 of 2.
You make a mistake that developers commonly make: you inspect and study the bits and pieces of a thing, which is in the physical realm, and try to figure out how the thing works. Guaranteed failure, because how a thing works; its purpose; etc, is in the intellectual realm, not the physical.
I won't detail it here, but the larger problem must be noted. This error is a specific instance of a larger error, and very common, that:
developers focus on the functions of the GUI,
instead of the demand, which is to
correctly define the data and its relations, upon which the functions of the GUI are existentially dependent.
A person who has not learned about internal combustion, cannot figure out how to build an engine from looking at the parts of an engine that has been taken apart, even if the parts are laid out carefully. Let alone one with injectors or turbo-chargers. The principle of internal combustion is logical, the parts are physical.
Here you have looked at the spreadsheets that others have used to do their Accounting, and perhaps copied that, without understanding what they are doing with the spreadsheets.
Case in point.
You have examined the first and second linked Answers, and you think you can figure out how to apply that to a new app that fixes the dirty big spreadsheet problem.
Many developers think that if they work out the nuts and bolts, copy-paste-and-substitute, somehow the app will work. Note the carefully thought-out, but still incomplete, graphics that detail perceived transactions.
They are missing the logical realm, and messing with the physical realm without the demanded understanding of what they are messing with.
In a word, forget about the pretty graphics for the Transactions, both yours and mine, and seek to understand the Logic (this principle) and the Accounting Standard [3].
"Test driven development" aka "code the minimum" aka "trial and error" is a totally bankrupt method, it has no scientific basis (marketing, yes, but science, no), and it is guaranteed to fail. Dangerous, because the cost is ongoing, never finite.
And to keep failing, if you understand the above.
More precisely, it is anti-science, in that it contradicts the science for building apps and databases.
So the first step is to break that great big spreadsheet down into logical units that have a purpose. And certainly, link each referencing spreadsheet column to the right columns in the referenced spreadsheet ... such that any Amount value is never duplicated.
3 Third Principle
I've been keeping track of all of this in a big, ugly spreadsheet [...] - the spreadsheet method doesn't work very well
Why does it not work, either as it stands, or when the spreadsheet has been divided into logical units ?
Reason 2 of 2.
Lack of Standards.
Since the subject matter is Accounting, we must use Accounting Standards.
That single great big ugly spreadsheet is ready evidence that you have not used an Accountant to set it up. And of course, you cannot set up a set of spreadsheets to do your Accounting without either understanding Accounting or using a qualified Accountant.
Therefore the second step is to either get an Accountant, or obtain a good understanding of Accounting. Note again, the ready evidence of your carefully thought out transactions: despite the fact that you are a very capable person, you cannot figure out the Accounting logic that is in the first and second linked Answers, let alone the Accounting that you need for your app (or your manual system).
So the best advice I can give you is, as stated in the Double-Entry Accounting Answer, find some good Tutorials on the web, and study them.
If you did that, or hired an Accountant to set up your books, you would split the single big fat spreadsheet into standard Accounting Spreadsheets:
Balance Sheet:
Asset or Liability
Profit & Loss:
Revenue or Expense
and one more set (later)
Another way of stating this principle is this. When one is ignorant that a Standard exists, or worse, when one knowingly chooses to not comply with it, one is left in the dangerous position of re-inventing the wheel, from scratch. Aka "Test driven development", aka "code the minimum possible", aka "trial and error". That means that one will go through an entire series of increments of development, which can be eliminated by observance of the Standard.
Problem & Solution
Now that we understand the principles, we can move on to determination of the specific problems, and their solutions. Each of these is a specific application of the Third Principle.
4 Property/Mortgage Treatment
I have no customers. I buy a property (usually cash goes out, a liability (loan) is created, an asset (the property) is created), spend a bunch of money to fix it up (either cash out at a store, credit card charges at a store, or a check written to a vendor, which debits the property asset and debits or credits the funding source), and then sell it
I am not saying that you have not heeded the advice I have given in the Double-Entry Answer. I am saying you have not appreciated the gravity of the advice; what it means in an Accounting context (before we venture into the database context).
Money represents value. Money (value) cannot be created or destroyed; it can only be moved, from one bucket to another. The demand is to have your buckets defined and arranged properly, according to [3].
The property is not created, it already exists. When you buy a property, there is a movement of your cash to the bank, and a movement of their property to you. In the naïve sense only, the property is now an "asset", the mortgage is now a "liability". That naïveté will be clarified into proper accounting buckets later.
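This "movement between buckets" principle can be shown in a minimal sketch, with invented account names (the real model uses LedgerNos, not strings): every transaction debits one bucket and credits another, so the books always sum to zero.

```python
# A sketch of the principle, not the data model itself: account names are invented.
def post(ledger, debit_account, credit_account, amount):
    """Move value between buckets: debit one, credit the other."""
    ledger[debit_account] = ledger.get(debit_account, 0) + amount
    ledger[credit_account] = ledger.get(credit_account, 0) - amount

ledger = {}
# Buy a $300,000 property: $60,000 cash down plus a $240,000 mortgage.
post(ledger, "Asset/Property/123 Main St", "Asset/Bank/Cheque", 60_000)
post(ledger, "Asset/Property/123 Main St", "Liability/Mortgage/123 Main St", 240_000)

# Nothing is created or destroyed: the books always sum to zero.
assert sum(ledger.values()) == 0
assert ledger["Asset/Property/123 Main St"] == 300_000
```

Note that the "asset" bucket for the property grows only because other buckets shrink or go negative; the total is invariant.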
You are, in fact, operating as a small single-branch bank; a cooperative; a casino. The precise context for the Double-Entry Accounting Answer. The following is true for
either a corrected set of spreadsheets,
or for following and implementing the Double-Entry Accounting Answer (if you go directly into the app ... without testing the correction to your single spreadsheet).
This is really important to understand, because it has to do with legislation in your country, which you have not mentioned. That legislation will be known to you as Taxation, or your Tax Return for the business. Even if you hold just one property at any one time.
Your "customer" is each bank that is engaged for each property. Name it for the property.
Each mortgage (property) should be set up as an External Account. That will allow you to conduct only those transactions that are actually related to it, against it. Loan Payments; Bank Charges; Expenses; etc. There will be no incoming money, until the property is sold.
In any case, the External Account will match the Bank Statement that the bank gives you for the mortgage account (which you did not mention, but which is a fundamental requirement of Accounting).
As defined in the Double-Entry Accounting Answer, every transaction on an ExternalAccount will have one Double-Entry leg in the Ledger. More, later.
Whether it is an Asset or a Liability in Accounting terms, is a function of the Ledger entry, not a function of the External Account. (By all means, we know it represents a property, which by a naïve perspective is an "asset", until it starts losing money, when it by naïve perspective, becomes a "liability".)
Another way of defining this point is, the bank loan represents a contract, upon which money (value) "changes hands" (is moved). The bank which you engaged is the "customer", the External Account. You must keep all income and expense related to the contract, with the contract.
niche-market app ...
I have a few other specific use-cases ...
No, you don't. There is nothing new under the sun. If you set up your books correctly (multiple linked spreadsheets using Accounting Standards), this is a vanilla use case. Hopefully my explanation has demonstrated that fact.
5 Ledger
Where the above points have to do with the intellectual realm, the understanding of each problem and therein the solution, which causes little work in the physical realm, this point, which has the same demand for the intellectual, is onerous at the physical level. That is, the number of keystrokes; checking; changes; checking ... before you get it set up correctly.
Although the first linked Answer deals with:
Derived vs Stored Account Balance (efficient and audit-able processing re month end),
and the second linked Answer deals with:
Double-Entry Accounting (implementation of an over-arching Accounting Standard in an existing Accounting system, a higher level of audit-ability),
neither explains the Ledger in detail.
The Ledger is the central article of any Accounting system.
The Double-Entry system is not a stand-alone article, but an advancement to that Ledger.
The data model is the specific "how": how to set the database up correctly for both the app, and any reporting client s/w to use, uneventfully.
You do not have a true Ledger. The single big spreadsheet is not a Ledger.
You must set up the Ledger, according to [3]. At best, some of the items in that spreadsheet will be entries in the Ledger, but note, you will perceive them quite differently, due to the corrections set forth in [1][2][3].
Note that when we say "put that in the Ledger" or "that is not in the Ledger", which is for simplicity, what we mean precisely is a reference to single Ledger Entry, which is identified by a specific Account Number in the Ledger.
In the data model, this is LedgerNo.
Likewise, when we say "Accounts", we mean precisely a single Account Number in the Ledger.
If a transaction is not in the Ledger (a specific Account Number, a LedgerNo, one leg of the DEA Credit/Debit), it is not in the "accounts", it is not accounted for.
This is where you will set up genuine Accounts for Assets, and for Liabilities. This is for Internal purposes, in the Ledger, as declared in the margin for Internal in the data model.
The best advice I can give you is, trawl the web for Tutorials on Accounting; determine which are good; study them carefully, with a view to setting up a proper Ledger for your purposes.
The simple answer is, the Ledger is an Hierarchy of Account Numbers.
Wherein the leaf level is an actual AccountNo that can be transacted against,
and the non-leaf levels exist for the purpose of aggregation, no transactions allowed.
Whenever the Ledger is reported (or any derivative of the Ledger, such as BalanceSheet or Profit & Loss):
the hierarchy is shown by indentation,
the transactional Account entries show the Current Balance for the current month
and the aggregate Account entries show the aggregate for the tree under it
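The hierarchy and its reporting rules can be sketched as follows; all LedgerNos, names and balances here are invented for illustration, and a real implementation would hold this in the Ledger table rather than a dict:

```python
# Invented sample Ledger: leaf accounts carry balances and may be transacted
# against; non-leaf accounts exist only to aggregate the tree beneath them.
accounts = {
    # LedgerNo: (parent LedgerNo, name, balance or None for non-leaf)
    100: (None, "Asset", None),
    110: (100,  "Bank/Cheque", 25_000),
    120: (100,  "Property/123 Main St", 300_000),
    500: (None, "Expense", None),
    510: (500,  "Property Improvement/Structure/Material", 4_200),
}

def subtotal(ledger_no):
    """Leaf: its balance. Non-leaf: the aggregate of the tree under it."""
    parent, name, balance = accounts[ledger_no]
    if balance is not None:
        return balance
    return sum(subtotal(no) for no, (p, _, _) in accounts.items() if p == ledger_no)

def report(parent=None, depth=0):
    """Print the Ledger with indentation denoting the hierarchy."""
    for no, (p, name, _) in accounts.items():
        if p == parent:
            print(f"{'    ' * depth}{no}  {name}  {subtotal(no):>10,}")
            report(no, depth + 1)

report()
assert subtotal(100) == 325_000   # Asset aggregates its leaf accounts
```

The same walk, restricted to the Asset/Liability or Revenue/Expense subtrees, yields the Balance Sheet and Profit & Loss respectively.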
[your graphics re transactions]
First and foremost, every Transaction is in the Ledger. That means one leg of the Double-Entry Accounting Transaction is in the Ledger. Look at § 5 in my Double-Entry Accounting Answer, notice that every Business Transaction has at least one blue entry (do not worry about the other details).
Second, the other DEA leg is:
either in the Ledger, meaning that the money moved between one Ledger Account LedgerNo and another Ledger Account LedgerNo. Notice the Business Transactions where both sides are blue.
or in an External Account, meaning that the money moved between one Ledger Account LedgerNo and an External Account AccountNo. Notice the Business Transactions where one side is blue and the other is green.
When you understand that, and you have your Ledger set up, there will be no "??" in your graphics, and the blue/green will be shown. (Do not re-do your graphics, I expect that this Answer will suffice.)
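The two-leg rule above (the "blue/blue" vs "blue/green" patterns) can be expressed as a tiny validation sketch, with invented data:

```python
# Sketch only: each leg is a (side, amount) pair, side is 'Ledger' or 'External'.
def classify(legs):
    """Classify a two-leg DEA transaction, enforcing the rules above."""
    assert len(legs) == 2, "double-entry: exactly two legs"
    assert legs[0][1] + legs[1][1] == 0, "the two legs must balance"
    sides = sorted(side for side, _ in legs)
    assert "Ledger" in sides, "one leg is always in the Ledger"
    return "Ledger-Ledger" if sides == ["Ledger", "Ledger"] else "Ledger-ExternalAccount"

# Overhead expense: money moves between two Ledger accounts ("blue/blue").
assert classify([("Ledger", 120), ("Ledger", -120)]) == "Ledger-Ledger"
# Property expense: one Ledger leg, one External Account leg ("blue/green").
assert classify([("Ledger", 4_200), ("External", -4_200)]) == "Ledger-ExternalAccount"
```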
Your "asset/liab" designation is not correct. More precisely, it is premature to make that declaration before the Ledger is fully defined and arranged. First set up your Ledger, with Asset/Liability for each entry in mind. Then you will not have to declare "asset/liab" on each transaction, because that is a function of the Ledger Account Number LedgerNo, not a function of the transaction.
expenses that a) come from a specific account (i.e. company checking account or owner's Best Buy card etc.),
Ledger-ExternalAccount
(one DEA leg in the Ledger, the other leg in the External Account). Noting the caveats above. The other DEA leg will be to one of these (hierarchy):
Expense/Property Improvement/Structure/Material
Expense/Property Improvement/Structure/Labour
Expense/Property Improvement/Fitting/Material
Expense/Property Improvement/Fitting/Labour
Expense/Property Improvement/Furniture
expenses that c) are always associated with a cost code (i.e. '100.12 - Window Materials', '100.13 - Window Labor', etc.).
You will no longer have "cost codes", they will all be Ledger Account Numbers LedgerNos, because the Ledger is where you account for anything and everything.
One DEA leg in the Ledger, the other leg in the External Account for the particular property. The hierarchy will be the same as the previous point.
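The migration from legacy cost codes to Ledger Account Numbers amounts to a simple mapping; the codes and account paths below are invented examples, and in the database each path would of course be a LedgerNo in the hierarchy:

```python
# Hypothetical mapping: codes and Ledger paths are illustrative only.
cost_code_to_ledger = {
    "100.12": "Expense/Property Improvement/Structure/Material",  # Window Materials
    "100.13": "Expense/Property Improvement/Structure/Labour",    # Window Labor
}

def ledger_account(cost_code):
    """Resolve a legacy cost code to its Ledger account path."""
    return cost_code_to_ledger[cost_code]

assert ledger_account("100.12") == "Expense/Property Improvement/Structure/Material"
```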
expenses that b) are generally associated with a specific job
Ledger-ExternalAccount
(one DEA leg in the Ledger, the other in the External Account).
(but not always - I do have the occasional overhead expense like office supplies)
Ledger-Ledger
one DEA leg in the Ledger for an Expense or Liability LedgerNo ... that the money was paid to
Expense/Regular/Office Supplies
the other leg in the Ledger for a Revenue or Asset LedgerNo ... that the money was paid from
Revenue/Monthly Payable
6 Credit & Other Card Treatment
credit card charge
Best Buy card
Each of your cards represents a contract, an Account that needs to be transacted against, that must be balanced against the monthly statement provided by the institution that issued the card.
Set up each one as an External Account, one DEA leg here, the other in the Ledger.
"owner's Best Buy card" is not clear to me (who is the owner, you or the property owner? If the latter, then the assumption thus far, that "you" buy and sell properties, is incorrect).
In any case, I believe I have given enough detail for you to figure it out.
Do not amalgamate an owner's property Account and their Best Buy card into one External Account: keep separate External Accounts for each.
7 Job Costing
Notice that I am addressing this last because, once you fix the big problems, the problems that remain are small. What you set out as the big problems (job costing; profit/loss per property) are, once the Ledger has been set up correctly for your business, actually small problems.
As far as I can see, Job Costing is the only remaining point that I have not addressed. First, the issue to be understood here is, the difference between Actuals and Estimates. Everything I have discussed thus far are Actuals.
For Estimates, the Standard procedure is to set up a separate Account structure (tree in the hierarchy) in the Ledger. These are often called Suspense Accounts, as in money that is held in suspense.
Treated properly, these Accounts will prevent you from closing or finalising an External Account before all the Estimates have been transferred to Actuals (Suspense to zero).
The Business Transactions are exactly the same as for Actuals.
This will provide precise tracking of such figures, and also the difference when an item moves from Estimate to Actual.
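A sketch of the Suspense mechanism, with invented account names: the Estimate-to-Actual transfer is an ordinary two-leg transaction, and closing the External Account is blocked until Suspense is back to zero.

```python
# Illustrative only: balances start at zero; account names are invented.
balances = {
    "Suspense/123 Main St": 0,
    "Expense/Improvement/123 Main St": 0,
    "Liability/Committed/123 Main St": 0,
}

def post(debit, credit, amount):
    balances[debit] += amount
    balances[credit] -= amount

def close_allowed(prop):
    """An External Account may be closed only once its Suspense is zero."""
    return balances[f"Suspense/{prop}"] == 0

# Estimate: a vendor bill received but not yet paid (a committed cost).
post("Suspense/123 Main St", "Liability/Committed/123 Main St", 5_000)
assert not close_allowed("123 Main St")

# The work is invoiced and paid: transfer the Estimate to Actual.
post("Expense/Improvement/123 Main St", "Suspense/123 Main St", 5_000)
assert close_allowed("123 Main St")
```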
8 Data Model • Job Costing
Noting that the data models in the first and second linked Answers are complete for the purpose, wherein the Ledger is not expanded:
this Answer deals with explanation of the Ledger, and this data model gives the full definition of a Ledger
Arranged by AccountType
A single-parent hierarchy
Only the leaf level LedgerAccount may be transacted against
The intermediate level LedgerIntermediate is for summarising the tree below it.
I have further Normalised Transaction
expanded External Account to show a Person vs an Organisation
All constraints are made explicit.
Obviously too large for an inline graphic. Here is a PDF in two pages:
the Data Model alone (as above)
the Data Model with sample data and notes, it includes all the examples covered in the Answer
Note the indentation in the Ledger, which denotes the Account hierarchy
Comments
How do you insert the first ledger (e.g. 100 Asset, no parent)?
The Ledger is a Tree, a Single Parent Hierarchy (aka "one way" for strange reasons), as per Account Hierarchy. A root row is required. In a database build operation (using DDL from a file), we generally do all our CREATE TABLEs, followed by all our ADD CONSTRAINT FKs. Insert the root row in with the CREATE TABLE.
After the CREATE TABLE Ledger, do:
INSERT Ledger VALUES ( 0, 0, "I", "AL", "Root", ... )
After the CREATE TABLE LedgerIntermediate, do:
INSERT LedgerIntermediate VALUES ( 0 )
Given that the reverse of Comprises is belongs to, all first-level Ledgers, e.g. Fees, House, Interbank and your Asset, would belong to this root row.
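The root-row technique can be demonstrated in SQLite with a simplified Ledger table; this is a sketch (the real model has more columns), and the self-referencing root row (0, 0) satisfies its own foreign key:

```python
# Sketch of the root-row technique with a cut-down Ledger (columns simplified).
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.execute("""
    CREATE TABLE Ledger (
        LedgerNo        INTEGER PRIMARY KEY,
        ParentLedgerNo  INTEGER NOT NULL REFERENCES Ledger(LedgerNo),
        Name            TEXT NOT NULL
    )""")
# The root row references itself.
db.execute("INSERT INTO Ledger VALUES (0, 0, 'Root')")
# First-level accounts belong to the root.
db.execute("INSERT INTO Ledger VALUES (100, 0, 'Asset')")
db.execute("INSERT INTO Ledger VALUES (200, 0, 'Liability')")

rows = db.execute(
    "SELECT LedgerNo, Name FROM Ledger "
    "WHERE ParentLedgerNo = 0 AND LedgerNo <> 0 ORDER BY LedgerNo"
).fetchall()
assert rows == [(100, 'Asset'), (200, 'Liability')]
```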

Related

Is It Efficient and Scalable for a Neural Network to Rely on Weights that Require Database Interaction?

I'm a high school senior interested in computer science and I have been programming for almost nine years now. I've recently become interested in machine learning and I have decided to implement a neural network. I haven't begun to code it yet and have been in the designing stage for a while now. The objective of the program is to analyze a student's paper, along with some other information, and then predict what grade the student will receive, much like PaperRater. However, I plan to make it far more personal than PaperRater.
The program has four inputs: one is the student's paper, the second is the student's id (i.e., the primary key), the third is the teacher's id, and finally the course id. I am implementing this on a website where only registered, verified users can submit their papers for grading. The contents of the paper are going to be weighed in relation to the relationship between the teacher and student and in relation to the course difficulty. The network adapts to the teacher's grading habits for certain classes, the relationship between the teacher and student (e.g., if a teacher dislikes a student you might expect to see a drop in the student's grades), and the course level (e.g., a teacher shouldn't grade a freshman's paper as harshly as a senior's paper).
However, this approach poses some considerable problems. There is an inherent limit imposed, where the numbers of students, teachers and courses prove to be too much and everything blows up! That's because there is no magic number which can account for every combination of student, teacher and course.
So, I've concluded that each teacher, student, and course must have an individual (albeit arbitrary) weight associated with them, not present in the Neural Network itself. The teacher's weight would describe her grading difficulty, and the student's weight would describe her ability as a writer. The weight of the course would describe the difficulty of the course. Of course, as more and more data is aggregated, the weights should adapt to become more accurate representations.
I realize that there is a relation between teachers and students, teachers and courses, and students and courses; therefore, I plan to make three respective hidden layers which sum the weights of its inputs and apply an activation function. How could I store the weights associated with each teacher, student and course, though?
I have considered storing it in their respective tables, but I don't know how well that would scale (or for that matter, if it would work). I also considered storing it in a file and calling it like that, but I'm sure that would be even worse than storing it in a database.
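For what it's worth, the per-entity weight described here is an ordinary primary-key read, sketched below with SQLite; the table and column names are invented for illustration:

```python
# Sketch: one weight column per entity table, fetched by primary key.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY, weight REAL NOT NULL DEFAULT 1.0);
    CREATE TABLE teachers (teacher_id INTEGER PRIMARY KEY, weight REAL NOT NULL DEFAULT 1.0);
    INSERT INTO students VALUES (1, 0.8);
    INSERT INTO teachers VALUES (7, 1.3);
""")

def entity_weight(table, key_col, key):
    """A primary-key lookup: served by the PK index, so it scales with n."""
    (w,) = db.execute(f"SELECT weight FROM {table} WHERE {key_col} = ?", (key,)).fetchone()
    return w

assert entity_weight("students", "student_id", 1) == 0.8
assert entity_weight("teachers", "teacher_id", 7) == 1.3
```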
So the main question I have is: is it (objectively) efficient, in terms of space and computational complexity, and scalable, to store and manage separate, individual weights for each possible element of certain inputs in a SQL database outside of the neural network, if there is a finite (not necessarily small) number of possible choices for such inputs, and still receive a reasonable output?
Regardless, I would like an explanation as to how come. I believe it would be just fine, but I can't justify it myself and so I'm asking for help. Thanks in advance!
(P.S.: If you realize any problems with my approach not covered in the scope of this question, or have general advice, please include it as an addendum to your answer or please message me).

Do security concerns justify a DB design that adds multiple rows to a transaction table (1 per unit) for a SINGLE transaction for an INVENTORY system?

EDIT / UPDATE to question:
The following question applies, specifically, to an INVENTORY application.
Specifically, the application supports a large number (thousands) of locations, each with a "medium" quantity of products (enough to fit on the shelves of a typical auto body shop). However, the system might expand to support much larger warehouses.
I do not personally understand why an inventory application would be any different from any other large-scale database application, but the programmer I mention below has told me that in his experience, it is common for inventory applications to utilize the design discussed in this question.
The programmer also states that the design in question applies specifically for applications where there is a high theft rate; i.e. where people working in the auto body shops have an inclination to slip a few items of product here and there off the shelves. Even in this scenario, I do not understand why having the backend system designed as indicated in this question (with the user interface unchanged) would warrant the design indicated.
Therefore, an addendum to the question below is: is the design discussed below appropriate specifically for an INVENTORY application in which the theft rate of shelf stockers on the floor is high?
I had a significant difference of opinion with another programmer who presented a fundamentally different schema design proposal for an inventory database that is being built from scratch.
I would like to know whether the schema concept presented is either a standard design, and/or whether it should be considered the design of choice, or at least on par with other potential designs of choice, for an inventory system.
The inventory system is rather simple conceptually, consisting mostly of paint and other liquids, as well as some solid parts, that are stocked on the shelves of auto body shops throughout the country. The items are delivered by truck, entered into the system as additions to the inventory, and then removed from the inventory in order to fulfill repair orders.
The programmer presented a design in which the transaction table (which records every time any inventory items are added to, or removed from, the shop's inventory) has multiple rows added to the table for every individual transaction, one per unit of quantity in the transaction.
For example, if a truck driver delivers 10 cans of a particular blue paint to the shop, and these 10 cans are entered into the system via a single transaction, then 10 rows would nonetheless be added to the transaction table, one per can of paint; each row would be absolutely identical except for the auto-incrementing primary key.
Specifically:
CREATE TABLE IF NOT EXISTS `transactions` (
`transaction_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`part_id` int(10) unsigned, -- FK: parts.part_id
`user_id` int(10) unsigned, -- FK: users.user_id
`action` enum('ADD','REMOVE'),
`action_date` datetime,
PRIMARY KEY (`transaction_id`)
);
Notice, in particular, that the transaction table does not have a quantity column (in case, say, 10 cans of paint were added in one transaction). Instead, there will be 10 rows added to this table for this single transaction (all with the ADD flag set).
From an ease of use, data size, programmatic complexity, and performance perspective, I have confidence that this design is distinctly not as good as a design that has a quantity column and only one row per transaction. In this example, there would be only a single row added, and the quantity column would be set to 10.
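The difference between the two designs can be sketched in SQLite (columns simplified, part_id invented): the quantity-column design stores one row per transaction, while the row-per-unit design stores N identical rows for a quantity of N.

```python
# Sketch contrasting the two transaction-table designs (simplified columns).
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE txn_qty  (txn_id INTEGER PRIMARY KEY, part_id INT, action TEXT, quantity INT);
    CREATE TABLE txn_unit (txn_id INTEGER PRIMARY KEY, part_id INT, action TEXT);
""")

# Ten cans of blue paint delivered in one transaction.
db.execute("INSERT INTO txn_qty (part_id, action, quantity) VALUES (42, 'ADD', 10)")
db.executemany("INSERT INTO txn_unit (part_id, action) VALUES (?, 'ADD')", [(42,)] * 10)

(qty_rows,)  = db.execute("SELECT COUNT(*) FROM txn_qty").fetchone()
(unit_rows,) = db.execute("SELECT COUNT(*) FROM txn_unit").fetchone()
assert (qty_rows, unit_rows) == (1, 10)

# Both designs report the same stock level; they differ only in row volume.
(stock,) = db.execute("SELECT SUM(quantity) FROM txn_qty WHERE part_id = 42").fetchone()
assert stock == 10
```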
The justification given by the programmer is security against malicious users of various sorts.
He agrees that this design is a relatively small additional layer of security in addition to other typical security precautions, but he points out various sorts of potential security features of the one row per unit quantity approach.
In particular, it is easier to take advantage of granular database permissions to prevent malicious users who have hacked into the system (or who are internal employees with access to the system at some level) from deleting entire rows or making bulk row changes, than it is to prevent malicious users from modifying a single quantity field in a single row of data.
In other words, the concept is "write once, freeze in stone, one row for every item".
I did point out that the "write once, freeze in stone" concept also applies to the approach with a quantity column, but it is true that all a malicious user needs to do to "rewrite history" in the latter case is modify a single field in a single row to make significant changes; whereas the only way to make significant changes in the proposed system above is to either delete many rows or make changes to rows in bulk.
I note that from a programmatic perspective, the two designs are quite different. The code to handle the multiple rows per transaction approach will be more complex, to a certain extent, than the code to handle the single row per transaction approach.
The programmer also pointed out that the very fact of the added code complexity using his suggested approach would make it less likely that a programmer in the future would make a mistake that could cause a major inaccuracy to appear in the data. In response, I pointed out that the opposite also holds true: a programmer could more easily make an off-by-one error with his more complex design that could go undetected for many transactions, causing a potentially equally significant inaccuracy to appear in the data.
It does seem to me that any way you cut it, slice it, or dice it, there will be noticeably more complex coding logic to handle the programmatic interface to this proposed design, and there will be noticeably more overhead in terms of performance and scaling. In other words, it's a fundamentally different database design approach, whose only justification is security.
(It's my sense that these security arguments applied much more strongly about 20 years ago, when the overall level of security for web applications and servers was much less sophisticated than it is today. I have the sense that there's only a 1% security gain, at the cost of 30% overhead in programmer time and server scaling costs.)
My question is this: Should the design described here be seriously considered for this system? Or is it clearly true that the schema design with only a single row per transaction - and a quantity column - is a better choice?
ADDITIONAL COMMENT RELATED TO EDIT OF QUESTION AT TOP:
I do not understand how the design discussed in this question would assist with security against gaming the system by employees who stock (and remove items from) the shelves, given that the USER INTERFACE of the application remains unchanged.
Am I missing an important detail about security design? Is there some way in which the backend design discussed in this question would make it more difficult for shelf-stockers to game the system?
Alternatively, is there some business reason why administrators of the system who have access to the backend systems with varying levels of user permission would also have an inducement to game the system in such a way that the given design approach is warranted?
I'm really scratching my head over this one.

Object Oriented style programming for interaction between objects

I am trying to write a program in object-oriented style. I have some confusions when coding the interaction between two objects.
Scenario:
Person (John) gives Person (Betty) $5.
Possible solutions (pseudo code):
A) John.pays(Betty, 5);
B) Betty.receives(John, 5);
C) Bank.transfer(John, Betty, 5);
D)
begin transaction:
John.decrease(5);
Betty.increase(5);
end transaction:
E) Service.transferMoney(John, Betty, 5); // Service is a generic service object
Please tell me which one is a more appropriate way of coding in OOP way, and the reason behind it. I am looking for some guidelines like "Tell, Don't Ask" rule.
Thanks.
One thing I've noticed is that people that are new to OOP get caught up in trying to map the physical world into the code they are writing. Do you really care that John and Betty are people or are you actually wanting to depict a bank account? I think your choice of objects in the example actually make it harder to figure out the solution to the problem.
The important parts of this are
1) Where to put the logic of how to move the money.
2) Where to store the data of how much money each person has.
You need to decide if you want to talk about the problem in the context of a person or a customer of a bank (may be a person, company, or something else). I'm guessing you are talking about a customer, because assuming it is a person would be limiting and misleading. Also, a Bank is a pretty generic term: is it the big brick building with people inside of it, or is it the online website with several different pages that do different things?
A bank account object can have a method (possibly static depending on how you decide to store your data and what all you are going to use your object for) that knows how to transfer from one account to another. The logic of how to transfer does not belong to Betty or John or a bank; it belongs to a bankAccount, which can have special logic based on the type of account if there are fees involved or the like. If you gave that logic to the bank you would end up with a giant bank class with methods for everything from greeting a customer to dealing with money in very specific account types. Each account type may have different rules for how it handles transfers. Think of times where you may want to show a transfer or deposit as pending.
If you are just solving the problem of transferring money, there is no need to create a bunch of objects. Based on the known requirements and presumed future requirements, the below would be a good choice.
CheckingAccount.Transfer(johnsAccountNo, bettysAccountNo, amount)
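A hedged sketch of that idea (the class and method names are invented for illustration, and a class-level dict stands in for real storage):

```python
# Hypothetical sketch: the transfer logic lives on the account class,
# not on Person or Bank. A class-level dict stands in for real storage.

class CheckingAccount:
    _accounts = {}  # account number -> balance

    @classmethod
    def open(cls, account_no, opening_balance):
        cls._accounts[account_no] = opening_balance

    @classmethod
    def transfer(cls, from_no, to_no, amount):
        # account-type-specific rules (fees, pending states) would go here
        if cls._accounts[from_no] < amount:
            raise ValueError("insufficient funds")
        cls._accounts[from_no] -= amount
        cls._accounts[to_no] += amount

CheckingAccount.open("johnsAccountNo", 20)
CheckingAccount.open("bettysAccountNo", 0)
CheckingAccount.transfer("johnsAccountNo", "bettysAccountNo", 5)
```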
Can I ask a question now? Who controls the money? Does John decide the transaction amount, does Betty, or some unspecified 3rd party?
The reason I am asking is because there is no real right or wrong answer here, just one that might be more flexible or robust than the others. If this is a real-life situation then I would model the transaction as something that both parties have to agree on before it proceeds, with the person spending the money (John) initiating it. Something like answer C and @Mysterie Man's:
tx transaction_request = John.WantsToBuyFor(5); //check if John can
if( Betty.AgreesWith( transaction_request ) ) //check if Betty wants
{
transaction_request.FinalizeWith(Betty); //Do it with Betty
}
and the FinalizeWith function does the math
void FinalizeWith(Person party)
{
requestor.cash -= amount;
party.cash += amount;
}
Of course you might want to add some description of what item is John buying.
The answer to this question is a long and complicated one that you'll get in bits and pieces from a large number of people. Why only in bits and pieces? Because the correct answer depends almost entirely upon what your system's requirements are.
One trap you will have to make sure you don't fall into, however, is this one. Read the answers you get here. You'll get a lot of good advice. (Pay the most attention to the advice that's been voted up a lot.) Once you've read and understood those, read Steve Yegge's rant (and understand it!) as well. It will save you sanity in the long run.
I'd vote for none of the above :)
Why is John paying Betty? That's an important question, as it explains where the entry point is. Let's say John owes Betty money, and it's payday.
public class John
{
public void onPayday()
{
Betty.Receive(5.0f);
}
}
This is if you want to go with a pure object-interaction style approach, of course.
The difference here is that we don't have an outside routine coordinating the interactions between John and Betty. Instead, we have John responding to external events, and choosing when to interact with Betty. This style also leads to very easy descriptions of desired functionality - eg "on payday, John should pay Betty."
This is a pretty good example of what Inversion of Control means - the objects are interacting with each other, rather than being manipulated by some external routine. It's also an example of Tell, Don't Ask, as the objects are telling each other things (John was told it's payday, John tells Betty to accept 5 dollars).
There are a number of alternate solutions here. For instance,
Betty.Receives(John.Gives(5))
This assumes that the Gives function returns the amount given.
tx = CashTransaction(John, Betty);
tx.Transfer(5);
This assumes the first parameter is the Payor, and the second is the Payee; then you can perform multiple transactions without creating new objects.
Things can be modeled in a number of ways. You should choose the one that most closely resembles what you are trying to model.
There is one property of pure OOP that can help with the example which easily passes under the radar, but the object-capability model makes explicit and centers on. The linked document ("From Objects to Capabilities" (FOtC)) goes into the topic in detail, but (in short) the point of capabilities is that the ability of an object to affect its world is limited to objects it has references to. That may not seem significant at first, but is very important when it comes to protecting access and affects what methods of a class are available in methods of other classes.
Option A) gives account John access to account Betty, while option B) gives Betty access to account John; neither is desirable. With option C), account access is mediated by a Bank, so only Banks could steal or counterfeit money. Option D) is different than the other three: the others show a message being sent but not the implementation, while D) is a method implementation that doesn't show what message it handles, nor what class it handles it for. D) could easily be the implementation for any of the first three options.
FOtC has the beginning of a solution that includes a few other classes:
sealers & unsealers,
purses, which are a little like accounts in that they contain money but don't necessarily have an owner.
mints, which are the only things that can create purses with positive balances
A mint has a sealer/unsealer pair, which it endows to a purse whenever the mint creates one. Purses oversee balance changes; they use the sealer when decreasing a balance, and the unsealer to transfer from one purse to another. Purses can spawn empty purses. Because of the use of sealers & unsealers, a purse only works with other purses created by the same mint. Someone can't write their own purse to counterfeit money; only an object with access to a mint can create money. Counterfeiting is prevented by limiting access to mints.
Anyone with access to a purse can initiate a transaction by spawning an empty purse and transferring money from the first purse into it. The temporary purse can then be sent to a recipient, which can transfer money from the temporary purse to some other purse that it owns. Theft is prevented by limiting access to purses. For example, a bank holds purses on behalf of clients in accounts. Since a bank has access only to the purses of its clients' accounts and temporary purses, only a client's bank can steal from the client (though note that in a transfer between bank accounts, there are two clients that can be victimized, hence two potential thieves).
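As a rough illustration only - real object-capability designs use sealer/unsealer pairs, which this sketch simplifies to a private "brand" object shared by purses from the same mint:

```python
# Greatly simplified sketch of the mint/purse idea. Real versions use
# sealer/unsealer pairs; here a private brand object plays that role.

class Purse:
    def __init__(self, brand, balance):
        self._brand = brand
        self.balance = balance

    def spawn(self):
        # anyone holding a purse can spawn an empty purse of the same mint
        return Purse(self._brand, 0)

    def deposit(self, source, amount):
        # transfers only work between purses created by the same mint
        if source._brand is not self._brand:
            raise ValueError("purse from a different mint")
        if source.balance < amount:
            raise ValueError("insufficient funds")
        source.balance -= amount
        self.balance += amount

class Mint:
    def __init__(self):
        self._brand = object()  # unforgeable token unique to this mint

    def make_purse(self, balance):
        # only a mint can create money (a positive opening balance)
        return Purse(self._brand, balance)
```

A payment then looks like: spawn a temporary purse, deposit into it from your own purse, hand the temporary purse to the recipient, who deposits from it into a purse they hold.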
This system is missing some important details, such as monetary authorities (which hold references to one or more mints) to create money.
All in all, monetary transactions are tricky to implement securely, and thus may not be the best examples to learn the basics of OOP.
If you really want to get OOPy, try the following
Person Betty,John;
CashTransfer PocketMoney;
PocketMoney.from = John;
PocketMoney.to = Betty;
PocketMoney.amount = 20.00;
PocketMoney.transfer();
The point of OOP isn't to make code more like written language, but to have objects with different methods and parameters to make code more readable.
So from the above code, you can see that John is giving Betty $20 in pocket money. The code is meaningful, allowing for easier code readability, as well as understandability.
My vote: C. Where C does what D does (e.g. doesn't lose money, etc.).
In this small example, "the bank" is a perfectly valid entity which knows how much money John and Betty have. Neither John nor Betty should be able to lie to the bank.
Don't be afraid to invert (or not) the logic in an "OO" program as required for the situation.
You should model according to your domain. Option C looks best choice as it will separate the transaction logic into the Bank\Service class.
This is a question I often struggle with myself as a novice programmer. I agree that "C" seems like the best choice. In something like this, I think it's best to use a "neutral" entity such as the "bank". This actually models most real life transactions of importance since most transactions of import utilize checks and/or credit (a neutral 3rd party).
Being new to OOP and finally using some OOP, I'd say that it should be A and B.
We are focusing on persons and it's up to each person to handle his money. We don't know if he's going to use the bank or if he's just getting cash directly from Betty.
You create a Person class with two methods: send and receive. It also must have a public var named balance to keep track of each person's balance.
You create two Person objects: Betty and John. Use the methods accordingly, like John.sends(Betty, 5). That should debit John's balance and credit Betty's as well.
What if they want to use the bank? Add another method, say... Transfer(acct) whatever it is.
That's what I would think.
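A minimal sketch of that Person class, assuming the method names suggested above:

```python
# Minimal sketch of the Person-centric design described above;
# the method and attribute names follow the answer's suggestion.

class Person:
    def __init__(self, name, balance=0):
        self.name = name
        self.balance = balance  # the public balance var the answer mentions

    def sends(self, other, amount):
        if self.balance < amount:
            raise ValueError("not enough cash")
        self.balance -= amount
        other.receives(amount)

    def receives(self, amount):
        self.balance += amount

john = Person("John", 20)
betty = Person("Betty")
john.sends(betty, 5)
```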

First-time database design: am I overengineering? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
Background
I'm a first year CS student and I work part time for my dad's small business. I don't have any experience in real world application development. I have written scripts in Python, some coursework in C, but nothing like this.
My dad has a small training business and currently all classes are scheduled, recorded and followed up via an external web application. There is an export/"reports" feature but it is very generic and we need specific reports. We don't have access to the actual database to run the queries. I've been asked to set up a custom reporting system.
My idea is to create the generic CSV exports and import (probably with Python) them into a MySQL database hosted in the office every night, from where I can run the specific queries that are needed. I don't have experience in databases but understand the very basics. I've read a little about database creation and normal forms.
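For what it's worth, the import step can be quite short in Python. This sketch uses sqlite3 standing in for MySQL and an in-memory string standing in for the exported CSV file, so it is self-contained; the table and column names are invented:

```python
# Self-contained sketch of the nightly CSV import step. sqlite3 stands
# in for MySQL and io.StringIO for the exported file; the table and
# column names are invented.
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (session_id INTEGER, teacher TEXT, status_id INTEGER)"
)

# In the real system this would be: open("sessions_export.csv")
export = io.StringIO("session_id,teacher,status_id\n1,Smith,0\n2,Jones,1\n")

rows = [(r["session_id"], r["teacher"], r["status_id"])
        for r in csv.DictReader(export)]
conn.executemany("INSERT INTO sessions VALUES (?, ?, ?)", rows)
conn.commit()
```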
We may start having international clients soon, so I want the database to not explode if/when that happens. We also currently have a couple big corporations as clients, with different divisions (e.g. ACME parent company, ACME healthcare division, ACME bodycare division)
The schema I have come up with is the following:
From the client perspective:
Clients is the main table
Clients are linked to the department they work for
Departments can be scattered around a country: HR in London, Marketing in Swansea, etc.
Departments are linked to the division of a company
Divisions are linked to the parent company
From the classes perspective:
Sessions is the main table
A teacher is linked to each session
A statusid is given to each session. E.g. 0 - Completed, 1 - Cancelled
Sessions are grouped into "packs" of an arbitrary size
Each pack is assigned to a client
I "designed" (more like scribbled) the schema on a piece of paper, trying to keep it normalised to the 3rd form. I then plugged it into MySQL Workbench and it made it all pretty for me.
Example queries I'll be running
Which clients with credit still left are inactive (those without a class scheduled in the future)
What is the attendance rate per client/department/division (measured by the status id in each session)
How many classes has a teacher had in a month
Flag clients who have low attendance rate
Custom reports for HR departments with attendance rates of people in their division
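To make the first of those queries concrete, here is a sketch against a cut-down version of the schema (table and column names are guesses based on the description above; sqlite3 is used so it runs standalone):

```python
# Cut-down, hypothetical version of the schema described above.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients  (client_id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
CREATE TABLE packs    (pack_id INTEGER PRIMARY KEY, client_id INTEGER, credit INTEGER);
CREATE TABLE sessions (session_id INTEGER PRIMARY KEY, pack_id INTEGER,
                       scheduled_on TEXT, status_id INTEGER);
""")
conn.executemany("INSERT INTO clients VALUES (?,?,?)",
                 [(1, "Alice", 1), (2, "Bob", 1)])
conn.executemany("INSERT INTO packs VALUES (?,?,?)",
                 [(1, 1, 3), (2, 2, 0)])
# Alice has credit but nothing scheduled in the future; Bob has a future session.
conn.executemany("INSERT INTO sessions VALUES (?,?,?,?)",
                 [(1, 1, "2009-01-01", 0), (2, 2, "2999-01-01", 0)])

# "Which clients with credit still left are inactive" as a join:
inactive = conn.execute("""
    SELECT c.name
    FROM clients c JOIN packs p ON p.client_id = c.client_id
    WHERE p.credit > 0
      AND NOT EXISTS (SELECT 1 FROM sessions s
                      JOIN packs p2 ON s.pack_id = p2.pack_id
                      WHERE p2.client_id = c.client_id
                        AND s.scheduled_on > DATE('now'))
""").fetchall()
```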
Question(s)
Is this overengineered or am I headed the right way?
Will the need to join multiple tables for most queries result in a big performance hit?
I have added a 'lastsession' column to clients, as it is probably going to be a common query. Is this a good idea or should I keep the database strictly normalised?
Thanks for your time
Some more answers to your questions:
1) You're pretty much on target for someone who is approaching a problem like this for the first time. I think the pointers from others on this question thus far pretty much cover it. Good job!
2 & 3) The performance hit you will take will largely be dependent on having and optimizing the right indexes for your particular queries / procedures and more importantly the volume of records. Unless you are talking about well over a million records in your main tables you seem to be on track to having a sufficiently mainstream design that performance will not be an issue on reasonable hardware.
That said, and this relates to your question 3, with the start you have you probably shouldn't really be overly worried about performance or hyper-sensitivity to normalization orthodoxy here. This is a reporting server you are building, not a transaction based application backend, which would have a much different profile with respect to the importance of performance or normalization. A database backing a live signup and scheduling application has to be mindful of queries that take seconds to return data. Not only does a report server function have more tolerance for complex and lengthy queries, but the strategies to improve performance are much different.
For example, in a transaction based application environment your performance improvement options might include refactoring your stored procedures and table structures to the nth degree, or developing a caching strategy for small amounts of commonly requested data. In a reporting environment you can certainly do this but you can have an even greater impact on performance by introducing a snapshot mechanism where a scheduled process runs and stores pre-configured reports and your users access the snapshot data with no stress on your db tier on a per request basis.
All of this is a long-winded rant to illustrate that what design principles and tricks you employ may differ given the role of the db you're creating. I hope that's helpful.
You've got the right idea. You can however clean it up, and remove some of the mapping (has*) tables.
What you can do is in the Departments table, add CityId and DivisionId.
Besides that, I think everything is fine...
The only changes I would make are:
1- Change your VARCHAR to NVARCHAR; if you might be going international, you may want Unicode.
2- Change your int id's to GUIDs (uniqueidentifier) if possible (this might just be my personal preference). Assuming you eventually get to the point where you have multiple environments (dev/test/staging/prod), you may want to migrate data from one to the other. Have GUID Ids makes this significantly easier.
3- Three layers for your Company -> Division -> Department structure may not be enough. Now, this might be over-engineering, but you could generalize that hierarchy such that you can support n-levels of depth. This will make some of your queries more complex, so that may not be worth the trade-off. Further, it could be that any client that has more layers may be easily "stuffable" into this model.
4- You also have a Status in the Client Table that is a VARCHAR and has no link to the Statuses table. I'd expect a little more clarity there as to what the Client Status represents.
No. It looks like you're designing at a good level of detail.
I think that Countries and Companies are really the same entity in your design, as are Cities and Divisions. I'd get rid of the Countries and Cities tables (and Cities_Has_Departments) and, if necessary, add a boolean flag IsPublicSector to the Companies table (or a CompanyType column if there are more choices than simply Private Sector / Public Sector).
Also, I think there's an error in your usage of the Departments table. It looks like the Departments table serves as a reference to the various kinds of departments that each customer division can have. If so, it should be called DepartmentTypes. But your clients (who are, I assume, attendees) do not belong to a department TYPE, they belong to an actual department instance in a company. As it stands now, you will know that a given client belongs to an HR department somewhere, but not which one!
In other words, Clients should be linked to the table that you call Divisions_Has_Departments (but that I would call simply Departments). If this is so, then you must collapse Cities into Divisions as discussed above if you want to use standard referential integrity in the database.
By the way, it's worth noting that if you're generating CSVs already and want to load them into a mySQL database, LOAD DATA LOCAL INFILE is your best friend: http://dev.mysql.com/doc/refman/5.1/en/load-data.html . Mysqlimport is also worth looking into, and is a command-line tool that's basically a nice wrapper around load data infile.
Most things have already been said, but I feel that I can add one thing: it is quite common for younger developers to worry about performance a little bit too much up-front, and your question about joining tables seems to go into that direction. This is a software development anti-pattern called 'Premature Optimization'. Try to banish that reflex from your mind :)
One more thing: Do you believe you really need the 'cities' and 'countries' tables? Wouldn't having a 'city' and 'country' column in the departments table suffice for your use cases? E.g. does your application need to list departments by city and cities by country?
Following comments based on role as a Business Intelligence/Reporting specialist and strategy/planning manager:
I agree with Larry's direction above. IMHO, It's not so much over engineered, some things just look a little out of place. To keep it simple, I would tag client directly to a Company ID, Department Description, Division Description, Department Type ID, Division Type ID. Use Department Type ID and Division Type ID as references to lookup tables and internal reporting/analysis fields for long term consistency.
The Packs table contains a "Credit" column; shouldn't that actually be tied to the Client base table, so if they have many packs you can see how much credit is left for future classes? The application can take care of the calculation and store it centrally in the Client table.
Company info could use many more fields, including the obvious address/phone/etc. information. I'd also be prepared to add in D&B "DUNs" columns (Site/Branch/Ultimate) long term, Dun and Bradstreet (D&B) has a huge catalog of companies and you'll find later down the road their information is very helpful for reporting/analysis. This will take care of the multiple division issue you mention, and allow you to roll up their hierarchy for sub/division/branches/etc. of large corps.
You don't mention how many records you'll be working with, which could imply setting yourself up for a large development initiative that could have been done quicker and with far fewer headaches with prepackaged "reporting" software. If you're not dealing with a large database (< 65,000 rows), make sure MS-Access, OpenOffice (Base) or related report/app dev solutions couldn't do the trick. I use Oracle's free APEX software quite a bit myself; it comes with their free database Oracle XE, just download it from their site.
FYI - Reporting insight: for large databases, you typically have two database instances a) transaction database for recording each detailed record. b) reporting database (data mart/data warehouse) housed on a separate machine. For more information search google both Star Schema and Snowflake Schema.
Regards.
I want to address only the concern that joining to multiple tables will cause a performance hit. Do not be afraid to normalize because you will have to do joins. Joins are normal and expected in relational databases, and they are designed to handle them well. You will need to set PK/FK relationships (for data integrity; this is important to consider in designing), but in many databases FKs are not automatically indexed. Since they will be used in the joins, you will definitely want to start by indexing the FKs. PKs generally get an index on creation as they have to be unique. It is true that data warehouse design reduces the number of joins, but usually one doesn't get to the point of data warehousing until one has millions of records needing to be accessed in one report. Even then, almost all data warehouses start with a transactional database to collect the data in real time, and then data is moved to the warehouse on a schedule (nightly or monthly or whatever the business need is). So this is a good start even if you need to design a data warehouse later to improve report performance.
I must say your design is impressive for a first year CS student.
It isn't over-engineered; this is how I would approach the problem. Joining is fine, there won't be much of a performance hit (it's completely necessary unless you de-normalise the database, which isn't recommended!). For statuses, see if you can use an enum datatype instead to optimise that table away.
I've worked in the training / school domain and I thought I'd point out that there's generally a M:1 relationship between what you call "sessions" (instances of a given course) and the course itself. In other words, your catalog offers the course ("Spanish 101" or whatever), but you might have two different instances of it during a single semester (Tu-Th taught by Smith, Wed-Fri taught by Jones).
Other than that, it looks like a good start. I bet you'll find that the client domain (graphs leading to "clients") is more complex than you've modeled, but don't go overboard with that until you've got some real data to guide you.
A few things came to mind:
The tables seemed geared to reporting, but not really running the business. I would think when a client signs up, there's essentially an order being placed for the client attending a list of sessions, and that order might be for multiple employees in one company. It would seem an "order" table would really be at the center of your system and driving your data capture and eventual reporting. (Compare the paper documents you've been using to run the business with your database design to see if there's a logical match.)
Companies often don't have divisions. Employees sometimes change divisions/departments, maybe even mid-session. Companies sometimes add/delete/rename divisions/departments. Make sure the possible real-time changing contents of your tables doesn't make subsequent reporting/grouping difficult. With so much contact data split over so many tables, you might have to enforce very strict data entry validation to keep your reports meaningful and inclusive. E.g., when a new client is added, make sure his company/division/department/city match the same values as his coworkers'.
The "packs" concept isn't clear at all.
Since you indicate it's a small business, it would be surprising if performance would be an issue, considering the speed and capacity of current machines.

"Proper" way to give clients or managers a reality check on software estimates

Looking back at my past projects I often encounter this one:
A client or a manager presents a task to me and asks for an estimate. I give an estimate, say 24 hours. They also ask a business analyst, and from what I've heard their experience is mostly non-technical. They give an estimate, say 16 hours. In the end, they would consider the value given by the analyst, even though aside from providing an estimate on my side, I've explained to them the feasibility of the task on the technical side. They treat the analyst's estimate as a "fact of life", even though it is only an estimate and the true value is in the actual task itself. Worse, I see a pattern: they tend to be biased toward the lower value (say I presented a lower estimate than the analyst; they quickly take it) regardless of the feasibility of the task. If you have read Peopleware, they are the types of people who, given a set of work hours, will do anything and everything in their power to shorten it, even though that is not really possible.
Do you have specific negotiation skills and tactics that you used before to avoid this?
If I can help it, I would almost never give a number like "24 hours". Doing so makes several implicit assumptions:
The estimate is accurate to within an hour.
All of the figures in the number are significant figures.
The estimate is not sensitive to conditions that may arise between the time you give the estimate and the time the work is complete.
In most cases these are demonstrably wrong. To avoid falling into the trap posed by (1), quote ranges to reflect how uncertain you are about the accuracy of the estimate: "3 weeks, plus or minus 3 days". This also takes care of (2).
To close the loophole of (3), state your assumptions explicitly: "3 weeks, plus or minus 3 days, assuming Alice and Bob finish the Frozzbozz component".
IMO, being explicit about your assumptions this way will show a greater depth of thought than the analyst's POV. I'd much rather pay attention to someone who's thought about this more intensely than someone who just pulled a number out of the air, and that will certainly count for plus points on your side of the negotiation.
Do you not have a work breakdown structure that validates your estimate?
If your manager/customer does not trust your estimate, you should be able to easily prove it beyond the ability of an analyst.
Nothing makes your estimate intrinsically better than his beyond the breakdown that shows it to be true. Something like this for example:
Gather Feature Requirements (2 hours)
Design Feature (4 hours)
Build Feature
1 easy form (4 hours)
1 easy business component (4 hours)
1 easy stored procedure (2 hours)
Test Feature
3 easy unit tests (4 hours)
1 regression test (4 hours)
Deploy Feature
1 easy deployment (4 hours)
==========
(28 hours)
Then you say "Okay, I came up with 28 hours, show me where I am wrong. Show me how you can do it in 16."
Sadly, Scott Adams had a lot to contribute to this debate:
Dilbert: "In a perfect world the project would take eight months. But based on past projects in this company, I applied a 1.5 incompetence multiplier. And then I applied an LWF of 6.3."
Pointy-Haired Boss: "LWF?"
Alice: "Lying Weasel Factor."
You can "control" clients a little easier than managers since the only power they really have is to not give the work to you (that solves your incorrect estimates problem pretty quickly).
But you just need to point out that it's not the analyst doing the work, it's you. And nobody is better at judging your times than you are.
It's a fact of life that people paying for the work (including managers) will focus on the lower figure. Many times I've submitted proper estimates with lower (e.g., $10,000) and upper bounds (e.g., $11,000) and had emails back saying that the clients were quite happy that I'd quoted $10,000 for the work.
Then, for some reason, they take umbrage when I bill them $10,500. You have to make it clear up front that estimates are, well, estimates, not guarantees. Otherwise they wouldn't be paying time-and-materials but fixed-price (and the fixed price would be considerably higher to cover the fact that the risk is now yours, not theirs).
In addition, you should include all assumptions and risks in any quotes you give. This will both cover you and demonstrate that your estimate is to be taken more seriously than some back-of-an-envelope calculation.
One thing you can do to try to fix this over time, and improve your estimating skills as well, is to track all of the estimates you make, and match those up with the actual time taken. If you can go back to your boss with a list of the last twenty estimates from both you and the business analyst, and the time each actually took, it will be readily apparent whose estimates you should trust.
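A sketch of what that tracking might look like (all numbers invented): average the actual-to-estimate ratio per estimator; the closer to 1.0, the better calibrated they are.

```python
# Invented numbers for illustration: (estimated hours, actual hours)
# pairs for each estimator.

def mean_error_ratio(history):
    """Average of actual/estimate; 1.0 means perfectly calibrated."""
    return sum(actual / estimate for estimate, actual in history) / len(history)

developer = [(24, 26), (16, 15), (40, 44)]
analyst = [(16, 26), (10, 15), (24, 44)]
```

A ratio well above 1.0 means chronic underestimation; laying the two side by side makes the "whose estimates to trust" argument with data instead of opinion.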
Under no circumstances give a single figure, give a best, worst and a most likely. If you respond correctly then the next question should be "How do I get a more accurate number" to which the answer should be more detailed requirements and/or design depending where you are in the lifecycle.
Then you give another, more refined range of best, most likely and worst. This continues until you are done.
This is known as the cone of uncertainty. I have lost count of the number of times I have drawn it on a whiteboard when talking estimates with clients.
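One common way to turn a best/most likely/worst triple into a single quotable range is the standard three-point (PERT-style) formula, sketched here:

```python
# Three-point estimate: weight the most likely value and derive a
# spread from the best/worst gap. Numbers below are invented.

def three_point(best, likely, worst):
    expected = (best + 4 * likely + worst) / 6
    spread = (worst - best) / 6
    return expected, spread

expected, spread = three_point(16, 24, 48)
# quote a range like "expected, give or take spread" rather than one number
```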
Do you have specific negotiation skills and tactics that you used before to avoid this?
Don't work for such people.
Seriously.
Changing their behavior is beyond your control.