Uplift models or binary classification - binary

I can’t figure out what kind of task it is: uplift modeling or binary classification. There is a dataset. Data on which of the customers is the one who will not buy without "flg", but will buy with "flg"! There are two features: "flg" - whether the communication was made (сalling, email, sms), "purchased" - whether a purchase was made. Usually in such tasks, they ask to find the uplift metric. But I need to predict "new_target" which is 1 if the purchase is made after communication and if the purchase is not made without communication, 0 otherwise. Metric AUC. What approach can I use?
I tried to use the models uplift. But I get a low metric... It seems to me that here it is necessary to encode these features somehow

Related

How to design a relational model for double-entry accounting with job costing

I would like to commend to readers the answers here and here for the depth and thought that went into them. I stumbled across them while searching for something tangential for a project I'm working on, and I got caught up reading them from top to bottom.
I am trying to build a niche-market app using these principles (namely, double-entry accounting), with job-costing thrown in. The above answers have been extremely helpful in reshaping my concept of what both the accounting and the database-ing should look and work like. However, I'm having a hard time integrating the job-costing portion of the equation into the excellent graphical examples that were provided.
There were several transaction examples using the House, account holders, fees, etc. I have a few other specific use-cases I would love to get some input on:
I have no customers. I buy a property (usually cash goes out, a liability (loan) is created, an asset (the property) is created), spend a bunch of money to fix it up (either cash out at a store, credit card charges at a store, or a check written to a vendor, which debits the property asset and debits or credits the funding source), and then sell it (cash comes in, the loan is paid off, and hopefully there's more cash left than what I spent on the project). This likely creates more ledger entries than I've listed above, but I'm not an accountant. I think I understand that all my costs go toward my basis in the property, and if my net proceeds are greater than my basis, then I've made money, and if not, then not.
So what I need to record are expenses that a) come from a specific account (i.e. company checking account or owner's Best Buy card etc.), b) are generally associated with a specific job (but not always - I do have the occasional overhead expense like office supplies), and c) are always associated with a cost code (i.e. '100.12 - Window Materials', '100.13 - Window Labor', etc.).
Frequently I receive bills from vendors that are due sometime in the future. I would like to track the bills received but not-yet-paid for a given job (committed costs). I think this transaction looks like this, but I'm not really sure:
As you may have surmised from my quip above about the "owner's Best Buy card," I sometimes (more often than I should) use my personal funds for company- and job-related expenses. I think (again with the caveat that I'm a layman) that all of those expenditures credit "Owner's Equity," and debit/credit other accounts as needed.
I've been keeping track of all of this in a big, ugly spreadsheet, which is why I'm trying to build an app to replace it - the spreadsheet method doesn't work very well and it certainly won't scale.
Preliminary
For those reading this Answer, please note that the context is as follows, in increments:
Derived Account Balance vs Stored Account Balance
Relational Data Model for Double-Entry Accounting
If you have not availed yourself to that, this Answer may not make sense.
I will respond in a sequence that is Normalised, which is of course different to the way you have laid out the problem.
Principle & Correction
There are a few, more than one, errors in your stated problem which you are not aware of, so the first step is awareness; understanding. Once a problem is correctly and precisely declared, it is easy to solve. These are errors that developers commonly make, so they need to be understood as such ... long before an app is contemplated.
1 First Principle
I've been keeping track of all of this in a big, ugly spreadsheet [the spreadsheet method doesn't work very well and it certainly won't scale], which is why I'm trying to build an app to replace it
If the manual (or the previous computerised) system is broken, and you implement a new or replacement app that is based on it, you are guaranteed to carry that broken-ness into the app.
Worse, if this is not understood, a third app can be written, promising to fix the problems in the second app, but it too, is guaranteed to migrate the problems that were not fixed in the first and second app.
Therefore, you must identify and correct every single problem in the system that you are replacing, including testing, before you can design an app and database that has any chance of success.
Scaling is the least of our worries. How any particular thing works with any other thing is the problem.
The fact that you have one great big ugly spreadsheet means that you have an overall perspective: humans can do that, we can fly by the seat of our pants, but computers cannot, they require explicit instructions.
2 Second Principle
I've been keeping track of all of this in a big, ugly spreadsheet [...] - the spreadsheet method doesn't work very well
Why does it not work [as it stands] ?
Reason 1 of 2.
You make a mistake that developers commonly make: you inspect and study the the bits and pieces of a thing, which is in the physical realm, and try to figure out how the thing works. Guaranteed failure, because how a thing works; its purpose; etc, is in the intellectual realm, not the physical.
I won't detail it here, but the larger problem must be noted. This error is a specific instance of a larger error, and very common, that:
developers focus on the functions of the GUI,
instead of the demand, which is to
correctly define the data and its relations, upon which the functions of the GUI are existentially dependent.
A person who has not learned about internal combustion, cannot figure out how to build an engine from looking at the parts of an engine that has been taken apart, even if the parts are laid out carefully. Let alone one with injectors or turbo-chargers. The principle of internal combustion is logical, the parts are physical.
Here you have looked at the spreadsheets that others have used to do their Accounting, and perhaps copied that, without understanding what they are doing with the spreadsheets.
Case in point.
You have examined the first and second linked Answers, and you think you can figure out how to apply that to a new app that fixes the dirty big spreadsheet problem.
Many developers think that if they work out the nuts and bolts, copy-paste-and-substitute, somehow the app will work. Note the carefully thought-out, but still incomplete, graphics that details perceived transactions.
They are missing the logical realm, and messing with the physical realm without the demanded understanding of what they are messing with.
In a word, forget about the pretty graphics for the Transactions, both yours and mine, and seek to understand the Logic (this principle) and the Accounting Standard [3].
"Test driven development" aka "code the minimum" aka "trial and error" is a totally bankrupt method, it has no scientific basis (marketing, yes, but science, no), and it is guaranteed to fail. Dangerous, because the cost is ongoing, never finite.
And to keep failing, if you understand the above.
More precisely, it is anti-science, in that it contradicts the science for building apps and databases.
So the first step is to break that great big spreadsheet down into logical units that have a purpose. And certainly, link each referencing spreadsheet column to the right columns in the referenced spreadsheet ... such that any Amount value is never duplicated.
3 Third Principle
I've been keeping track of all of this in a big, ugly spreadsheet [...] - the spreadsheet method doesn't work very well
Why does it not work, either as it stands, or when the spreadsheet has been divided into logical units ?
Reason 2 of 2.
Lack of Standards.
Since the subject matter is Accounting, we must use Accounting Standards.
That single great big ugly spreadsheet is ready evidence that you have not used an Accountant to set it up. And of course, you cannot set up a set of spreadsheets to do your Accounting without either understanding Accounting or using a qualified Accountant.
Therefore the second step is to either get an Accountant, or obtain a good understanding of Accounting. Note again, the ready evidence of your carefully thought out transactions: despite the fact that you are a very capable person, you cannot figure out the Accounting logic that is in the first and second linked Answers, let alone the Accounting that you need for your app (or your manual system).
So the best advice I can give you is, as stated in the Double-Entry Accounting Answer, find some good Tutorials on the web, and study them.
If you did that, or hired an Accountant to set up your books, you would split the single big fat spreadsheet into standard Accounting Spreadsheets:
Balance Sheet:
Asset or Liability
Profit & Loss:
Revenue or Expense
and one more set (later)
Another way of stating this principle is this. When one is ignorant that a Standard exists, or worse, when one knowingly chooses to not comply with it, one is left in the dangerous position of re-inventing the wheel, from scratch. Aka "Test driven development", aka "code the minimum possible", aka "trial and error". That means that one will go through an entire series of increments of development, which can be eliminated by observance of the Standard.
Problem & Solution
Now that we understand the principles, we can move on to determination of the specific problems, and their solutions. Each of these is a specific application of the Third Principle.
4 Property/Mortgage Treatment
I have no customers. I buy a property (usually cash goes out, a liability (loan) is created, an asset (the property) is created), spend a bunch of money to fix it up (either cash out at a store, credit card charges at a store, or a check written to a vendor, which debits the property asset and debits or credits the funding source), and then sell it
I am not saying that you have not heeded the advice I have given in the Double-Entry Answer. I am saying you have not appreciated the gravity of the advice; what it means in an Accounting context (before we venture into the database context).
Money represents value. Money; value, cannot be created or destroyed. It can only be moved. From one bucket to another. The demand is to have your buckets defined and arranged properly, according to [3].
The property is not created, it already exists. When you buy a property, there is a movement of your cash to the bank, and a movement of their property to you. In the naïve sense only, the property is now an "asset", the mortgage is now a "liability". That naïveté will be clarified into proper accounting buckets later.
You are, in fact, operating as a small single-branch bank; a cooperative; a casino. The precise context for the Double-Entry Accounting Answer. The following is true for
either a corrected set of spreadsheets,
or for following and implementing the Double-Entry Accounting Answer (if you go directly into the app ... without testing the correction to your single spreadsheet).
This is really important to understand, because it has to do with legislation in your country, which you have not mentioned. That legislation will be known to you as Taxation, or your Tax Return for the business. Even if you hold just one property at any one time.
Your "customer" is each bank that is engaged for each property. Name it for the property.
Each mortgage (property) should be set up as an External Account. That will allow you to conduct only those transactions that are actually related to it, against it. Loan Payments; Bank Charges; Expenses; etc. There will be no incoming money, until the property is sold.
In any case, the External Account will match the Bank Statement that the bank gives you for the mortgage account (which you did not mention, but which is a fundamental requirement of Accounting).
As defined in the Double-Entry Accounting Answer, every transaction on an ExternalAccount will have one Double-Entry leg in the Ledger. More, later.
Whether it is an Asset or a Liability in Accounting terms, is a function of the Ledger entry, not a function of the External Account. (By all means, we know it represents a property, which by a naïve perspective is an "asset", until it starts losing money, when it by naïve perspective, becomes a "liability".)
Another way of defining this point is, the bank loan represents a contract, upon which money (value) "changes hands" (is moved). The bank which you engaged is the "customer", the External Account. You must keep all income and expense related to the contract, with the contract.
niche-market app ...
I have a few other specific use-cases ...
No, you don't. There is nothing new under the sun. If you set up your books correctly (multiple linked spreadsheets using Accounting Standards), this is a vanilla use case. Hopefully my explanation has demonstrated that fact.
5 Ledger
Where the above points have to do with the intellectual realm, the understanding of each problem and therein the solution, which causes little work in the physical realm, this point, which has the same demand for the intellectual, is onerous at the physical level. That is, the number of keystrokes; checking; changes; checking ... before you get it set up correctly.
Although the first linked Answer deals with:
Derived vs Stored Account Balance (efficient and audit-able processing re month end),
and the second linked Answer deals with:
Double-Entry Accounting (implementation of an over-arching Accounting Standard in an existing Accounting system, a higher level of audit-ability),
neither explains the Ledger in detail.
The Ledger is the central article of any Accounting system.
The Double-Entry system is not a stand-alone article, but an advancement to that Ledger.
The data model is the specific how to set the database up correctly for both the app, and any reporting client s/w to use, uneventfully.
You do not have a true Ledger. The single big spreadsheet is not a Ledger.
You must set up the Ledger, according to [3]. At best, some of the items in that spreadsheet will be entries in the Ledger, but note, you will perceive them quite differently, due to the corrections set forth in [1][2][3].
Note that when we say "put that in the Ledger" or "that is not in the Ledger", which is for simplicity, what we mean precisely is a reference to single Ledger Entry, which is identified by a specific Account Number in the Ledger.
In the data model, this is LedgerNo.
Likewise, when we say "Accounts", we mean precisely a single Account Number in the Ledger.
If a transaction is not in the Ledger (a specific Account Number, a LedgerNo, one leg of the DEA Credit/Debit), it is not in the "accounts", it is not accounted for.
This is where you will set up genuine Accounts for Assets, and for Liabilities. This is for Internal purposes, in the Ledger, as declared in the margin for Internal in the data model.
The best advice I can give you is, trawl the web for Tutorials on Accounting; determine which are good; study them carefully, with a view to setting up a proper Ledger for your purposes.
The simple answer is, the Ledger is an Hierarchy of Account Numbers.
Wherein the leaf level is an actual AccountNo that can be transacted against,
and the non-leaf levels exist for the purpose of aggregation, no transactions allowed.
Whenever the Ledger is reported (or any derivative of the Ledger, such as BalanceSheet or Profit & Loss):
the hierarchy is shown by indentation,
the transactional Account entries show the Current Balance for the current month
and the aggregate Account entries show the aggregate for the tree under it
[your graphics re transactions]
First and foremost, every Transaction is in the Ledger. That means one leg of the Double-Entry Accounting Transaction is in the Ledger. Look at § 5 in my Double-Entry Accounting Answer, notice that every Business Transaction has at least one blue entry (do not worry about the other details).
Second, the other DEA leg is:
either in the Ledger, meaning that the money moved between one Ledger Account LedgerNo and another Ledger Account LedgerNo. Notice the Business Transactions where both sides are blue.
or in an External Account, meaning that the money moved between one Ledger Account LedgerNo and an External Account AccountNo. Notice the Business Transactions where one side is blue and the other is green.
When you understand that, and you have your Ledger set up, there will be no "??" in your graphics, and the blue/green will be shown. (Do not re-do your graphics, I expect that this Answer will suffice.)
Your "asset/liab" designation is not correct. More precisely, it is premature to make that declaration before the Ledger is fully defined and arranged. First set up your Ledger, with Asset/Liability for each entry in mind. Then you will not have to declare "asset/liab" on each transaction, because that is a function of the Ledger Account Number LedgerNo, not a function of the transaction.
expenses that a) come from a specific account (i.e. company checking account or owner's Best Buy card etc.),
Ledger-ExternalAccount
(one DEA leg in the Ledger, the other leg in the External Account). Noting the caveats above. The other DEA leg will be to one of these (hierarchy):
Expense/Property Improvement/Structure/Material
Expense/Property Improvement/Structure/Labour
Expense/Property Improvement/Fitting/Material
Expense/Property Improvement/Fitting/Labour
Expense/Property Improvement/Furniture
expenses that c) are always associated with a cost code (i.e. '100.12 - Window Materials', '100.13 - Window Labor', etc.).
You will no longer have "cost codes", they will all be Ledger Account Numbers LedgerNos, because the Ledger is where you account for anything and everything.
One DEA leg in the Ledger, the other leg in the External Account for the particular property. The hierarchy will be the same as the previous point.
expenses that b) are generally associated with a specific job
Ledger-ExternalAccount
(one DEA leg in the Ledger, the other in the External Account).
(but not always - I do have the occasional overhead expense like office supplies)
Ledger-Ledger
one DEA leg in the Ledger for an Expense or Liability LedgerNo ... that the money was paid to
Expense/Regular/Office Supplies
the other leg in the Ledger for a Revenue or Asset LedgerNo ... that the money was paid from
Revenue/Monthly Payable
6 Credit & Other Card Treatment
credit card charge
Best Buy card
Each of your cards represents a contract, an Account that that needs to be transacted against, that must be balanced against the monthly statement provided by the institution that issued the card.
Set up each one as an External Account, one DEA leg here, the other in the Ledger.
"owner's Best Buy card" is not clear to me (who is the owner, you or the property owner ... if the latter then the assumption thus far, that "you" buy and sell properties is incorrect.)
In any case, I believe I have given enough detail for you to figure it out.
Do not amalgamate an owner's property Account and their Best Buy card into one External Account: keep separate External Accounts for each.
7 Job Costing
Notice that I am addressing this last, because once you fix the big problems, the problems that remain, are small. What you set out as the big problems (job costing; profit/loss per property) are, once the Ledger has been set up correctly for your business, actually small problems.
As far as I can see, Job Costing is the only remaining point that I have not addressed. First, the issue to be understood here is, the difference between Actuals and Estimates. Everything I have discussed thus far are Actuals.
For Estimates, the Standard procedure is to set up a separate Account structure (tree in the hierarchy) in the Ledger. These are often called Suspense Accounts, as in money that is held in suspense.
Treated properly, these Accounts will prevent you from closing or finalising an External Account before all the Estimates have been transferred to Actuals (Suspense to zero).
The Business Transactions are exactly the same as for Actuals.
This will provide precise tracking of such figures, and also the difference when an item moves from Estimate to Actual.
8 Data Model • Job Costing
Noting that the data model in the first and second linked Answers are complete for the purpose, wherein the Ledger is not expanded:
this Answer deals with explanation of the Ledger, and this data model gives the full definition of a Ledger
Arranged by AccountType
A single-parent hierarchy
Only the leaf level LedgerAccount may be transacted against
The intermediate level LedgerIntermediate is for summarising the tree below it.
I have further Normalised Transaction
expanded External Account to show a Person vs an Organisation
All constraints are made explicit.
Obviously too large for an inline graphic. Here is a PDF in two pages:
the Data Model alone (as above)
the Data Model with sample data and notes, it includes all the examples covered in the Answer
Note the indentation in the Ledger, which denotes the Account hierarchy
Comments
How do you insert the first ledger (e.g. 100 Asset, no parent)?
The Ledger is a Tree, a Single Parent Hierarchy (aka "one way" for strange reasons), as per Account Hierarchy. A root row is required. In a database build operation (using DDL from a file), we generally do all our CREATE TABLEs, followed by all our ADD CONSTRAINT FKs. Insert the root row in with the CREATE TABLE.
After the
CREATE TABLE Ledger
do
INSERT Ledger VALUES ( 0, 0, "I", "AL", "Root", ... ).
After the
CREATE TABLE LedgerIntermediate
do
INSERT LedgerIntermediate VALUES ( 0 ).
Given that the reverse of Comprises is belongs to, all first-level Ledgers eg. Fees, House, Interbank and your Asset would belong to this root row.

Is it possible to train the sentiment classification model with the labeled data and then use it to predict sentiment on data that is not labeled?

I want to do sentiment analysis using machine learning (text classification) approach. For example nltk Naive Bayes Classifier.
But the issue is that a small amount of my data is labeled. (For example, 100 articles are labeled positive or negative) and 500 articles are not labeled.
I was thinking that I train the classifier with labeled data and then try to predict sentiments of unlabeled data.
Is it possible?
I am a beginner in machine learning and don't know much about it.
I am using python 3.7.
Thank you in advance.
Is it possible to train the sentiment classification model with the labeled data and then use it to predict sentiment on data that is not labeled?
Yes. This is basically the definition of what supervised learning is.
I.e. you train on data that has labels, so that you can then put it into production on categorizing your data that does not have labels.
(Any book on supervised learning will have code examples.)
I wonder if your question might really be: can I use supervised learning to make a model, assign labels to another 500 articles, then do further machine learning on all 600 articles? Well the answer is still yes, but the quality will fall somewhere between these two extremes:
Assign random labels to the 500. Bad results.
Get a domain expert assign correct labels to those 500. Good results.
Your model could fall anywhere between those two extremes. It is useful to know where it is, so know if it is worth using the data. You can get an estimate of that by taking a sample, say 25 records, and have them also assigned by a domain expert. If all 25 match, there is a reasonable chance your other 475 records also have been given good labels. If e.g. only 10 of the 25 match, the model is much closer to the random end of the spectrum, and using the other 475 records is probably a bad idea.
("10", "25", etc. are arbitrary examples; choose based on the number of different labels, and your desired confidence in the results.)

What is the best way to structure a database for a stock exchange?

I am trying to make a stock market simulator and i want to be as real as possible.
My question is: Nasdaq has 3000+ companies and in their database of stocks, right?! but is it one entry line for every share of every symbol on the sql db like the following example?
Company Microsoft = MSFT
db `companies_shares`
ID symbol price owner_id* company_id last_trade_datetime
1 msft 58.99 54334 101 2019-06-15 13:09:32
2 msft 58.99 54334 101 2019-06-15 13:09:32
3 msft 58.91 2231 101 2019-06-15 13:32:32
4 msft 58.91 544 101 2019-06-15 13:32:32
*owner_id = user id of the person that last bought the share.
Or is it calculate based on the shares available to trade and demand to buy and sell provided by the market maker? for exemple:
I've already tried the first example by it takes a lot space in my db and i'm concerned about the band width of all those trades, especially when millions of requests(trades) are being made every minute.
I've already tried the first example by it takes a lot space in my db and i'm concerned about the band width of all those trades, especially when millions of requests(trades) are being made every minute.
What is the best solution? Database or math?
Thanks in advance.
You might also want to Google many to many relationships.
Think about it this way. One person might own many stocks. One stock might be held by many people. That is a many to many relationship and usually modelled using three tables in a physical database. This is often written as M:M
Also, people might buy or sell a single company on multiple occasions this would likely be modelled using another table. From the person perspective there will be many transactions so we have a new type of relationship one (person) to many (transactions). This is often written as a 1:M relationship.
As to what to store, as a general rule it is best to store the atomic pieces of data. For example for a transaction, store the customer I'd, transaction date/time, the quantity bought or sold and the price at the very least.
You might also want to read up about normalization. Usually 3rd normal form is a good level to strive for, but a lot of this is a "it depends upon your circumstance and what you need to do". Often people will denormalize for speed of access at the expense of more storage and potentially more complicated updating....
You also mentioned performance, more often than not big companies such as NASDAQ. will use multiple layers if IT infrastructure. Each layer will have a different role and thus different functional characteristics and performance characteristics. Often they will be multiple servers operating together in a cluster. For example they might use a NoSQL system to manage the high volume of trading. From there there might be a feed (e.g. kafka) into other systems for other purposes (e.g. fraud prevention, analytics, reporting and so on).
You also mention data volumes. I do not know how much data you are talking about, but at one financial customer I've worked at have several peta bytes of storage (1 peta byte = 1000 TB) running on over 300 servers just for their analytics platform. They were probably on the medium to large size as far as financial institutions go.
I hope this helps point you in the right direction.

How to generalise over multiple dependent actions in Reinforcement Learning

I am trying to build an RL agent to price paid for airline seats (not the ticket). The general set up is:
After choosing their flights (for n people on a booking), a customer will view a web page with the available seat types and their prices visible.
They select between zero and n seats from a seat map with a variety of different prices for different seats, to be added to their booking.
After perhaps some other steps, they pay for the booking and the agent is rewarded with the seat revenue.
I have not decided on a general architecture yet. I want to take various booking and flight information into account, so I know I will be using function approximation (most likely a neural net) to generalise over the state space.
However, I am less clear on how to set up my action space. I imagine an action would amount to a vector with a price for each different seat type. If I have, for example, 8 different seat types, and 10 different price points for each, this gives me a total of 10^8 different actions, many of which will be very similar. Additionally, each sub-action (pricing one seat type) is somewhat dependent on the others, in the sense that the price of one seat type will likely affect the demand (and hence reward contribution) for another. Hence, I doubt the problem can be decomposed into a set of sub-problems.
I'm interested if there has been any research into dealing with a problem like this. Clearly any agent I build needs some way to generalise across actions to some degree, since collecting real data on millions of actions is not possible, even just for one state.
As I see it, this comes down to two questions:
Is it possible to get an agent to understand actions in relative terms? Say for example, one set of potential prices is [10, 12, 20]. Can I get my agent to realise that there is a natural ordering there, and that the first two pricing actions are more similar to each other than to the third possible action?
Further to this, is it possible to generalise from this set of actions - could an agent be set up to understand that the set of prices [10, 13, 20] is very similar to the first set?
I haven't been able to find any literature on this, especially relating to the second question - any help would be much appreciated!
Correct me if I'm wrong, but I am going to assume this is what you are asking and will answer accordingly.
I am building an RL and it needs to be smart enough to understand that if I were to buy one airplane ticket, it will subsequently affect the price of other airplane tickets because there is now less supply.
Also, the RL agent must realize that actions very close to each other are relatively similar actions, such as [10, 12, 20] ≈ [10, 13, 20]
1) In order to provide memory to your RL agent, you can do this in two ways. The easy way is to feed the states as a vector of past purchased tickets, as well as the current ticket.
Example: Let's say we build the RL to remember at least the past 3 transactions. At the very beginning, our state vector will be [0, 0, 3], meaning that there was no purchases of tickets previously (the zeros), and currently, we are purchasing ticket #3. Then, the next time step's state vector can be [0, 3, 6], telling the RL agent that previously, ticket #3 has been picked, and now we're buying ticket #6. The neural network will learn that the state vector [0, 0, 6] should map to a different outcome compared to [0, 3, 6], because in the first case, ticket #6 was the first ticket purchased and there was lots of supply. But in the 2nd case, ticket 3 was already sold, so now all the remaining tickets went up in price.
The proper and more complex way would be to use a recurrent neural network as your function approximator for your RL agent. The recurrent neural network architecture allows for certain "important" states to be remembered by the neural network. In your case, the amount of tickets purchased previously is important, so the neural network will remember the previously purchased tickets, and calculate the output accordingly.
2) Any function approximation reinforcement learning algorithm will automatically generalize sets of actions close to each other. The only RL architectures that would not do this would be tabular based approaches.
The reason is the following:
We can think of these function approximators simply as a line. Neural networks simply build a highly nonlinear continuous line (neural networks are trained using backpropagation and gradient descent, so they must be continuous), and a set of states will map to a unique set of outputs. Because it is a line, sets of states that are very similar SHOULD map to outputs that are also very close. In the most basic case, imagine y = 2x. If our input x = 1, our y = 2. And if our input x is 1.1, which is very close to 1, our output y = 2.2, which is very close to 2 because they are on a continuous line.
For tabular approach, there is simply a matrix. On the y axis, you have the states, and on the x axis, you have the actions. In this approach, the states and actions are discrete. Depending on the discretization, the difference can be massive, and if the system is poorly discretized, the actions very close to each other MAY not be generalized.
I hope this helps, please let me know if anything is unclear.

Is It Efficient and Scalable for a Neural Network to Rely on Weights that Require Database Interaction?

I'm a high school senior interested in computer science and I have been programming for almost nine years now. I've recently become interested in machine learning and I have decided to implement a neural network. I haven't begun to code it yet and have been in the designing stage for a while now. The objective of the program is to analyze a student's paper, along with some other information, and then predict what grade the student will receive, much like PaperRater. However, I plan to make it far more personal than PaperRater.
The program has four inputs, one is the student's paper, the second is the student's id (i.e, primary key), third is the teacher's id, and finally the course id. I am implementing this on a website where registered, verified users alone can submit their papers for grading. The contents of the paper are going to be weighed in relation to the relationship between the teacher and student and in relation to the course difficulty. The network adapts to the teacher's grading habits for certain classes, the relationship between the teacher and student (e.g., if a teacher dislikes a student you might expect to see a drop in the student's grades), and the course-level (e.g., a teacher shouldn't grade a freshman's paper as harshly as a senior's paper).
However, this approach poses some considerable problems. There is an inherent limit imposed, where the numbers of students, teachers and courses prove to be too much and everything blows up! That's because there is no magic number which can account for every combination of student, teacher and course.
So, I've concluded that each teacher, student, and course must have an individual (albeit arbitrary) weight associated with them, not present in the Neural Network itself. The teacher's weight would describe her grading difficulty, and the student's weight would describe her ability as a writer. The weight of the course would describe the difficulty of the course. Of course, as more and more data is aggregated, the weights should adapt to become more accurate representations.
I realize that there is a relation between teachers and students, teachers and courses, and students and courses; therefore, I plan to make three respective hidden layers which sum the weights of its inputs and apply an activation function. How could I store the weights associated with each teacher, student and course, though?
I have considered storing it in their respective tables, but I don't know how well that would scale (or for that matter, if it would work). I also considered storing it in a file and calling it like that, but I'm sure that would be even worse than storing it in a database.
So the main question I have is: is it (objectively) efficient, in terms of space and computational complexity, and scalable, to store and manage separate, individual weights for each possible element of certain inputs in a SQL database outside of the neural network, if there are a finite (not necessarily small) amount of possible choices for such inputs, and still receive a reasonable output?
Regardless, I would like an explanation as to how come. I believe it would be just fine, but I can't justify it myself and so I'm asking for help. Thanks in advance!
(P.S.: If you realize any problems with my approach not covered in the scope of this question, or have general advice, please include it as an addendum to your answer or please message me).