When ranking is more important than fitting the value of y in a regression model

Let's say you have a model that predicts a specific user's purchases over a specific period of time.
When I build a model that predicts whether or not a user will buy, and sort users by that probability, it seems to work well.
However, when I build a model that predicts the purchase amount and sort users by the predicted amount, it does not achieve the expected performance.
For me, it is important to predict that user A will pay more than user B; matching the exact purchase amount is not important.
What metrics and models can be used in this case?
I am using LightGBM regression as a base.
There is a large variance in the purchase amount: most users spend nothing over a given period, while purchasers spend from a minimum of $1,000 to a maximum of $100,000.
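Since only the ordering matters here, a rank-based metric such as Spearman's rho (or NDCG) is a natural fit for evaluation, and LightGBM also ships a ranking objective (lambdarank, via LGBMRanker). A minimal sketch of the metric using scipy; the arrays are made-up stand-ins, not real data:

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical example: actual spend and two models' predictions.
y_true = np.array([0, 0, 1_000, 5_000, 100_000])
pred_a = np.array([0.1, 0.2, 0.5, 0.7, 0.9])   # preserves the true ordering
pred_b = np.array([0.9, 0.1, 0.5, 0.7, 0.2])   # scrambles the true ordering

# Spearman's rho compares rankings only, so the predicted scale
# does not have to match the dollar amounts at all.
rho_a, _ = spearmanr(y_true, pred_a)
rho_b, _ = spearmanr(y_true, pred_b)
print(rho_a, rho_b)  # rho_a is much higher because the ordering is preserved
```

Optimizing such a metric directly (e.g. with a lambdarank objective, or a two-stage buy-probability × amount model) is one way to decouple "who pays more" from "how much exactly".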

Related

Reinforcement Learning with DQN approach on stock prices

I have programmed a reinforcement model with a DQN approach that is supposed to make purchase decisions based on stock prices.
For the training I use two stock prices. One has an upward trend and one has a downward trend. The time period for both is 1 year (100,000 data points).
As observation I use the price data of the last 1000 data points.
For the training I first collect 100 episodes (one episode is one run through the complete price series, with the upward- or downward-trending series chosen at random). Per episode I get about 1000 actions (buy, sell, skip).
Then the training takes place with a batch size of 64.
The problem is that the model specializes on one of the stock prices and generates a good reward there. For the other stock price, however, it is very bad and I get a negative reward.
It seems that the model does not try to optimize the average profit over all episodes (upward/downward trend).
As a reward I simply take the money I make per trade, as profit or loss. I have set the discount factor to 1.0.
Does anyone have an idea what the problem could be?
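The collection scheme described above can be sketched as follows; all names are hypothetical and the random-walk prices are a stand-in for the real series:

```python
import random

WINDOW = 1000        # observation: the last 1000 price points
EPISODES = 100       # episodes collected before training
SERIES_LEN = 100_000 # one year of data points

def make_price_series(trend):
    """Stand-in random walk for the real trending price series."""
    price, series = 100.0, []
    for _ in range(SERIES_LEN):
        price += trend + random.uniform(-0.5, 0.5)
        series.append(price)
    return series

upward, downward = make_price_series(+0.01), make_price_series(-0.01)

replay = []
for _ in range(EPISODES):
    # One episode = one run through a complete series, with the
    # upward- or downward-trending series chosen at random.
    prices = random.choice([upward, downward])
    # ~1000 actions per episode -> one decision every 100 points.
    for t in range(WINDOW, SERIES_LEN, 100):
        # The observation is the window prices[t - WINDOW:t]; store
        # indices rather than slice copies to keep the buffer small.
        action = random.choice(["buy", "sell", "skip"])  # placeholder policy
        replay.append((prices, t, action))
```

If both regimes really do appear in the replay buffer in roughly equal proportion, specialization on one of them usually points at the reward scale or the observation encoding rather than the collection loop itself.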

Using Poisson Regression to Estimate Rate of Purchase Among Products of Different "Size" (Non-Integers)

An organization is interested in modelling the sales rate of cases of product sold each week. The product is a luxury item, so distributions of sales tend to be small and right-skewed. A typical month (4 weeks) of sales might look like {0, 1, 1, 4}.
While we were originally developing the analysis, this seemed like an obvious application of GLM--specifically Poisson regression to estimate the mean sales rate.
However, the sales team has recently come back and mentioned that they actually sell the product in many smaller sizes, such as 750 mL bottles and 187 mL samples. I could easily convert the sales into equivalent units (a case contains 9 L of product), but this would result in some non-integer sales units. If the previous 4-week sales distribution had all been 750 mL bottles, for example, the new distribution would look like {0, 0.0833, 0.0833, 0.333}.
I would like to be able to model the sales rate in terms of a common unit (a case, or 9L) and I thought I could use an offset term to do this, but I've run into difficulties whenever there are zero products sold (offset term is also zero).
My understanding is that the non-integer values preclude the direct use of a Poisson likelihood to model these data (without some sort of transformation). I could simply try a normal linear model, but the sales data are still discrete (e.g., they can only occupy a handful of values determined by the volume of product and number of units sold). I still feel like a discrete model would be more appropriate but am stumped at how to account for the different "sizes" of product appearing in the data without simply running a separate model for each product.
Have you ever handled data like these in a similar fashion, and how did you make this accommodation?
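The unit conversion itself is straightforward; a small helper expressing mixed package sizes in case-equivalents (9 L per case) reproduces the numbers above:

```python
CASE_ML = 9_000  # one case = 9 L of product

def case_equivalents(units_sold, package_ml):
    """Convert a count of packages of a given size into case-equivalents."""
    return units_sold * package_ml / CASE_ML

# The 4-week example {0, 1, 1, 4}, if every sale had been a 750 mL bottle:
weeks = [case_equivalents(n, 750) for n in [0, 1, 1, 4]]
print([round(w, 4) for w in weeks])  # [0.0, 0.0833, 0.0833, 0.3333]
```

Since the converted values are non-negative, continuous, and zero-inflated, a Tweedie GLM (variance power between 1 and 2, available in both statsmodels and LightGBM) is one family often suggested for data like this; that is a pointer, not the model the question settled on.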

How to deal with different subsampling time in economic datasets for deep learning?

I am building a deep learning model for macro-economic prediction. However, the indicators vary widely in their sampling frequency, ranging from minutes to annual.
[Image: example dataframe]
The dataframe contains 'Treasury Rates (DGS1-20)', which is sampled daily, and 'Inflation Rate (CPALT...)', which is sampled monthly. These features are essential for the model to train on, and dropping the NaN rows would leave too little data.
I've read some books and articles about how to deal with missing data, including downsampling to monthly time frames, swapping the NaNs with -1, interpolating between the last and next value, etc. But the methods I've read about mostly deal with datasets where about 10% of the values are missing, whereas in my case the monthly-sampled 'Inflation (CPI)' is more than 90% missing once it is combined with the 'Treasury Rate' dataset.
I was wondering if there was any workaround to handle missing values, particularly for economic data where the sampling time gap ranges so widely. Thank you
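One standard workaround is to align everything on the finest grid and forward-fill the slower series, on the reasoning that a monthly figure like CPI remains the "current" value until the next release. A minimal pandas sketch; the series values and names are made up:

```python
import pandas as pd

days = pd.date_range("2020-01-01", "2020-03-31", freq="D")

# Daily series (stand-in for the treasury rates).
treasury = pd.Series(range(len(days)), index=days, name="treasury", dtype=float)

# Monthly series (stand-in for CPI), known only on release dates.
cpi = pd.Series(
    [2.1, 2.3, 2.2],
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    name="cpi",
)

# Reindex the monthly series to the daily grid and carry the last
# known value forward instead of leaving 90%+ of the rows as NaN.
cpi_daily = cpi.reindex(days).ffill()

df = pd.concat([treasury, cpi_daily], axis=1)
print(df.loc["2020-01-15"])  # treasury 14.0, cpi 2.1 (January value carried forward)
```

Forward-filling avoids look-ahead leakage (unlike interpolation between the last and next value, which uses a number that was not yet published at that date).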

How to generalise over multiple dependent actions in Reinforcement Learning

I am trying to build an RL agent to price paid seating on airline bookings (not the ticket itself). The general set up is:
After choosing their flights (for n people on a booking), a customer will view a web page with the available seat types and their prices visible.
They select between zero and n seats from a seat map with a variety of different prices for different seats, to be added to their booking.
After perhaps some other steps, they pay for the booking and the agent is rewarded with the seat revenue.
I have not decided on a general architecture yet. I want to take various booking and flight information into account, so I know I will be using function approximation (most likely a neural net) to generalise over the state space.
However, I am less clear on how to set up my action space. I imagine an action would amount to a vector with a price for each different seat type. If I have, for example, 8 different seat types, and 10 different price points for each, this gives me a total of 10^8 different actions, many of which will be very similar. Additionally, each sub-action (pricing one seat type) is somewhat dependent on the others, in the sense that the price of one seat type will likely affect the demand (and hence reward contribution) for another. Hence, I doubt the problem can be decomposed into a set of sub-problems.
I'm interested if there has been any research into dealing with a problem like this. Clearly any agent I build needs some way to generalise across actions to some degree, since collecting real data on millions of actions is not possible, even just for one state.
As I see it, this comes down to two questions:
Is it possible to get an agent to understand actions in relative terms? Say for example, one set of potential prices is [10, 12, 20]. Can I get my agent to realise that there is a natural ordering there, and that the first two pricing actions are more similar to each other than to the third possible action?
Further to this, is it possible to generalise from this set of actions - could an agent be set up to understand that the set of prices [10, 13, 20] is very similar to the first set?
I haven't been able to find any literature on this, especially relating to the second question - any help would be much appreciated!
Correct me if I'm wrong, but I am going to assume this is what you are asking and will answer accordingly.
I am building an RL agent, and it needs to be smart enough to understand that if I were to buy one airplane ticket, it will subsequently affect the price of other airplane tickets because there is now less supply.
Also, the RL agent must realize that actions very close to each other are relatively similar actions, such as [10, 12, 20] ≈ [10, 13, 20]
1) In order to provide memory to your RL agent, you can do this in two ways. The easy way is to feed the states as a vector of past purchased tickets, as well as the current ticket.
Example: let's say we build the RL agent to remember at least the past 3 transactions. At the very beginning, our state vector will be [0, 0, 3], meaning that there were no purchases of tickets previously (the zeros), and currently we are purchasing ticket #3. Then the next time step's state vector can be [0, 3, 6], telling the RL agent that ticket #3 was picked previously, and now we're buying ticket #6. The neural network will learn that the state vector [0, 0, 6] should map to a different outcome than [0, 3, 6], because in the first case ticket #6 was the first ticket purchased and there was plenty of supply, whereas in the second case ticket #3 was already sold, so all the remaining tickets went up in price.
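The fixed-window state encoding described above can be sketched directly; the ticket IDs are the hypothetical ones from the example:

```python
from collections import deque

HISTORY = 2  # remember the past 2 transactions, plus the current ticket

def make_state(history, current_ticket):
    """Left-pad the purchase history with zeros and append the current ticket."""
    padded = [0] * (HISTORY - len(history)) + list(history)
    return padded + [current_ticket]

history = deque(maxlen=HISTORY)  # oldest purchases fall off automatically

# First purchase: no history yet -> [0, 0, 3]
s1 = make_state(history, 3)
history.append(3)

# Second purchase: ticket #3 remembered -> [0, 3, 6]
s2 = make_state(history, 6)

print(s1, s2)  # [0, 0, 3] [0, 3, 6]
```

The deque with maxlen keeps the vector a fixed length, which is what a feed-forward function approximator needs as input.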
The proper and more complex way would be to use a recurrent neural network as your function approximator for your RL agent. The recurrent neural network architecture allows for certain "important" states to be remembered by the neural network. In your case, the amount of tickets purchased previously is important, so the neural network will remember the previously purchased tickets, and calculate the output accordingly.
2) Any function approximation reinforcement learning algorithm will automatically generalize sets of actions close to each other. The only RL architectures that would not do this would be tabular based approaches.
The reason is the following:
We can think of these function approximators simply as a line. Neural networks simply build a highly nonlinear continuous line (neural networks are trained using backpropagation and gradient descent, so they must be continuous), and a set of states will map to a unique set of outputs. Because it is a line, sets of states that are very similar SHOULD map to outputs that are also very close. In the most basic case, imagine y = 2x. If our input x = 1, our y = 2. And if our input x is 1.1, which is very close to 1, our output y = 2.2, which is very close to 2 because they are on a continuous line.
For the tabular approach, there is simply a matrix: on one axis you have the states, and on the other, the actions. In this approach the states and actions are discrete. Depending on the discretization, the difference can be massive, and if the system is poorly discretized, actions very close to each other may not be generalized.
I hope this helps, please let me know if anything is unclear.

How does SPSS assign factor scores for cases where underlying variables were pairwise deleted?

Here's a simplified example of what I'm trying to figure out from a report. All analyses are being run in SPSS, which I don't have and don't use (my experience is with SAS and R).
They were running a regression to predict overall meal satisfaction from food type ordered, self-reported food flavor, and self-reported food texture.
But food flavor and texture are highly correlated, so they conducted a factor analysis, found food flavor and texture load on one factor, and used the factor scores in the regression.
However, about 40% of respondents don't have responses on self-reported food texture, so they used pairwise deletion while making the factors.
My question is when SPSS calculates the factor scores and outputs them as new variables in the data set, what does it do with people who had an input for a factor that was pairwise deleted?
How does it calculate (if it calculates it at all) factor scores for those people who had a response pairwise deleted during the creation of the factors and who therefore have missing data for one of the variables?
Factor scores are a linear combination of their scaled inputs. That is, given normalized variables X1, ..., Xn with loadings L1, ..., Ln:
f = L1·X1 + L2·X2 + ... + Ln·Xn
In your case n = 2. Now suppose one of the X_i is missing. How do you take a sum involving a missing value? Clearly, you cannot... unless you impute it.
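As a toy illustration of the weighted sum and why a missing input forces a choice (the loadings here are made up, and mean imputation is only one of several options such software might apply):

```python
def factor_score(loadings, values):
    """f = sum(L_i * X_i); refuses to produce a score if any X_i is missing."""
    if any(v is None for v in values):
        return None  # cannot take a sum involving a missing value
    return sum(L * x for L, x in zip(loadings, values))

loadings = [0.5, 0.5]          # hypothetical loadings for flavor and texture
flavor, texture = 1.2, None    # texture response missing for this respondent

print(factor_score(loadings, [flavor, texture]))  # None: no score without imputation

# One possible fix: impute the mean, which is 0 for standardized inputs.
imputed = [v if v is not None else 0.0 for v in [flavor, texture]]
print(factor_score(loadings, imputed))  # 0.6 = 0.5 * 1.2 + 0.5 * 0.0
```

Whether SPSS emits a missing factor score or silently imputes depends on the options used, which is exactly why checking with the analyst is the right move.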
I don't know what the analyst who created your report did. I suggest you ask them.