Regression with given upper and lower bounds for the target value

I am using several regressors, such as XGBoost, gradient boosting, random forest, and decision trees, to predict a continuous target value.
I also have some complementary information: I know that my prediction (target value), based on all the features I have, should fall within a given range.
Is there any way to take these bounds into account more effectively, as a feature to any of these algorithms, instead of only checking the range on the predicted values and post-processing them?
Note that simply adding the lower and upper bounds of the target value as features does not necessarily make these algorithms learn to keep the prediction within the given range. I am looking for a more effective way to take these bounds into account as given data.
Thanks
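One way to build the bounds into the model rather than into post-processing is to train on a transformed target. The sketch below is only an illustration under stated assumptions: the logit rescaling, the function names to_unbounded and to_bounded, and the sample rows are all hypothetical and not from the question. Any of the regressors mentioned could be fitted on the transformed values, and the bounds can still be passed as ordinary feature columns as well.

```python
import math

def to_unbounded(y, lower, upper, eps=1e-6):
    """Map a target in [lower, upper] to the real line with a logit transform."""
    # Rescale to (0, 1) and clip away from the endpoints so the logit stays finite.
    p = (y - lower) / (upper - lower)
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def to_bounded(z, lower, upper):
    """Inverse map: any real-valued model output lands back inside [lower, upper]."""
    # Numerically stable sigmoid for both signs of z.
    if z >= 0:
        s = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        s = e / (1.0 + e)
    return lower + (upper - lower) * s

# Hypothetical training rows: (target, lower bound, upper bound).
rows = [(12.0, 10.0, 20.0), (3.5, 0.0, 5.0), (99.0, 50.0, 100.0)]

# Train any regressor on z instead of y (the model itself is omitted here).
z_train = [to_unbounded(y, lo, hi) for y, lo, hi in rows]

# Whatever the model outputs, the back-transformed prediction respects the bounds.
for z_pred, (_, lo, hi) in zip([-8.0, 0.0, 40.0], rows):
    y_pred = to_bounded(z_pred, lo, hi)
    assert lo <= y_pred <= hi
```

The design choice here is that the constraint lives in the inverse transform, so it holds for every prediction regardless of how well the model was trained.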

Determining the convex hull in the presence of outliers

I wrote software to create and optimize a racing line on a racetrack.
Now I want to integrate real data recorded from GPS, so I need to obtain the g-g diagram, where g is the acceleration. The real g-g diagram is a set of points in a scatter plot. I need the contour of that scatter plot to use as the boundary of the acceleration limits.
To get data to work with, I recorded myself on two different racetracks.
The code I wrote translates the x-y coordinates to polar R-theta.
Then I divide the circle into a fixed number of sectors (say, 20).
I calculate the histogram of all R values in each sector, then from the histogram I take the last bin with an acceptable number of samples.
Then I draw these lines, and this is the result:
It's not bad, but this boundary lies a little inside the real data; the real acceleration is a little bigger. I cannot simply take the max value, because that way I would include absurd values (like 3g in a right-hand corner, certainly an error). Moreover, the limit changes if I change the number of bins in the histogram, and I cannot find a way to choose the right number of bins.
How can I determine the "true" convex hull, ignoring the outliers?
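A possible variation on the sector approach described above is to replace the histogram cutoff with a per-sector quantile of R, which removes the dependence on the number of bins. This is a minimal sketch, not the original code: the function name sector_boundary and the quantile value q=0.98 are assumptions and would need tuning against real data.

```python
import math

def sector_boundary(points, n_sectors=20, q=0.98):
    """Return one boundary radius per angular sector, or None for empty sectors.

    points: iterable of (gx, gy) accelerations.
    q: quantile used as the boundary in place of the maximum (assumed, tunable).
    """
    radii = [[] for _ in range(n_sectors)]
    width = 2.0 * math.pi / n_sectors
    for gx, gy in points:
        r = math.hypot(gx, gy)
        theta = math.atan2(gy, gx) % (2.0 * math.pi)  # angle in [0, 2*pi)
        idx = min(int(theta / width), n_sectors - 1)   # guard against rounding
        radii[idx].append(r)

    boundary = []
    for rs in radii:
        if not rs:
            boundary.append(None)
            continue
        rs.sort()
        # Nearest-rank quantile: the few largest radii above q are ignored.
        k = min(len(rs) - 1, int(q * len(rs)))
        boundary.append(rs[k])
    return boundary

# 99 plausible samples at 1g plus one 3g outlier, all in the first sector.
samples = [(1.0, 0.0)] * 99 + [(3.0, 0.0)]
print(sector_boundary(samples, n_sectors=4)[0])  # 1.0
```

With a quantile, the only parameter left is how large a fraction of the points in each sector is treated as outliers.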

Custom Openai Gym Environment with Stable-baselines

I am trying to create a simple 2D grid world OpenAI Gym environment in which the agent heads to the terminal cell from anywhere in the grid world. For example, in the 5x5 grid world below, X is the current agent location and O is the terminal cell the agent is heading to.
.....
.....
..X..
.....
....O
My action space is a discrete value in [0, 4), representing up, left, down, and right respectively. The observation space is a 1D box that denotes the agent's current position in the grid world, for example [12] (indices run from 0 to size*size-1). I am wondering what the differences are between ways of defining the observation space. For example, besides my current definition, an observation space for the same environment could be defined as follows, to name a few:
a discrete value i, where i represents the current location of the agent
a 2D matrix that is all zeros except for a 1 at the agent's current location
maybe others
How are these different in terms of stable-baselines algorithms or others?
What are the differences between ways of defining the observation space?
I think the better question is:
What are the reasons for the differences between ways of defining the observation space?
To define the observation space, two things need to be determined:
What information does the algorithm need?
This is largely determined by what information you can collect and by the agent's objective. E.g., if you want the agent to reach the target in a maze, you may provide information like the agent's current location, the directions of obstacles around the agent, the target direction, etc.
What form should the input information take?
This is largely determined by what information you use and by the agent's solution (i.e., the algorithm itself). Sometimes you have multiple choices and need to run experiments to find out which works best for a given algorithm, just like the few you listed.
So in general, the reason for the different ways of defining the observation space is to better suit different objectives and algorithms.
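To make the alternatives from the question concrete, here is a small sketch of how the same agent position could be encoded under each definition. It uses plain Python lists so it runs without Gym; the helper names are hypothetical, and the mapping to gym.spaces.Discrete, Box, or MultiDiscrete in the docstrings is only a suggestion of which space would describe each encoding.

```python
SIZE = 5  # the grid is SIZE x SIZE

def as_index(row, col):
    """Single integer in [0, SIZE*SIZE), e.g. for a Discrete(SIZE*SIZE) space."""
    return row * SIZE + col

def as_coords(row, col):
    """(row, col) pair, e.g. for a Box or MultiDiscrete space of shape (2,)."""
    return [row, col]

def as_one_hot_grid(row, col):
    """SIZE x SIZE matrix of zeros with a 1 at the agent's cell."""
    grid = [[0] * SIZE for _ in range(SIZE)]
    grid[row][col] = 1
    return grid

# Agent at the centre of the 5x5 grid, as in the example in the question.
print(as_index(2, 2))            # 12
print(as_coords(2, 2))           # [2, 2]
print(as_one_hot_grid(2, 2)[2])  # [0, 0, 1, 0, 0]
```

All three carry the same information; what changes is the shape and scale of the input the policy network sees, which is why it can be worth trying more than one with a given algorithm.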

Can I find price floors and ceilings with cuda

Background
I'm trying to convert an algorithm from sequential to parallel, but I am stuck.
Point and Figure Charts
I am creating point and figure charts.
Decreasing
While the stock is going down, add an O every time it breaks through the floor.
Increasing
While the stock is going up, add an X every time it breaks through the ceiling.
Reversal
If the stock reverses direction, but the change is less than a reversal threshold (3 units), do nothing. If the change is greater than the reversal threshold, start a new column (X or O).
Sequential vs Parallel
Sequentially, this is pretty straightforward. I keep a variable for the floor and ceiling. If the current price breaks through the floor or ceiling, or changes more than the reversal threshold, I can take the appropriate action.
My question is, is there a way to find these reversal points in parallel? I'm fairly new to thinking in parallel, so I'm sorry if this is trivial. I am trying to do this in CUDA, but I have been stuck for weeks. I have tried using the finite difference algorithms from NVIDIA. These produce local max / min but not the reversal points. Small fluctuations produce numerous relative max / min, but most of them are trivial because the change is not greater than the reversal size.
My question is, is there a way to find these reversal points in parallel?
One possible approach:
Use thrust::unique to remove periods where the price is numerically constant.
Use thrust::adjacent_difference to produce the 1st difference data.
Use thrust::adjacent_difference on the 1st difference data to get the 2nd difference data, i.e. the points where there is a change in the sign of the slope.
Use these points of change in sign of slope to identify separate regions of data: build a key vector from these (e.g. with a prefix sum). This key vector segments the price data into "runs" where the price change is in a particular direction.
Use thrust::exclusive_scan_by_key on the 1st difference data to produce the net change of the run.
Wherever the net change of the run exceeds a threshold, flag it as a "reversal".
Your description of what constitutes a reversal may also be slightly unclear. The above method would not flag a reversal on certain data patterns that you might classify as a reversal. I suspect you are looking beyond a single run as I have defined it here. If that is the case, there may be a method to address that as well - with more steps.
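The data flow of the steps listed above can be sketched in sequential Python, just to show what each primitive contributes; it is not CUDA code. Each step corresponds to a data-parallel primitive. Two assumptions to note: the function name find_runs is hypothetical, and the sketch sums each run with a by-key reduction (the counterpart of thrust::reduce_by_key) instead of the by-key scan, since only the total per run is needed here.

```python
from itertools import accumulate, groupby

def find_runs(prices, threshold=3):
    """Return (run_key, net_change) for runs whose net change reaches threshold."""
    # 1. Drop consecutive duplicates (the role of thrust::unique).
    p = [k for k, _ in groupby(prices)]
    # 2. First differences (the role of thrust::adjacent_difference).
    d = [b - a for a, b in zip(p, p[1:])]
    # 3. Flag positions where the sign of the slope changes.
    flags = [0] + [1 if (a > 0) != (b > 0) else 0 for a, b in zip(d, d[1:])]
    # 4. A prefix sum of the flags gives one key per monotonic run.
    keys = list(accumulate(flags))
    # 5. Sum the differences within each key: the net change of the run.
    net = {}
    for k, x in zip(keys, d):
        net[k] = net.get(k, 0) + x
    # 6. Keep only runs whose net change reaches the threshold.
    return [(k, c) for k, c in sorted(net.items()) if abs(c) >= threshold]

prices = [10, 11, 11, 12, 14, 13, 14, 15, 10, 9]
print(find_runs(prices))  # [(0, 4), (3, -6)]
```

In this sample the one-unit dip from 14 to 13 splits the rise into separate runs, which illustrates the caveat above about reversals that span more than a single run.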

Recurrent NNs: what's the point of parameter sharing? Doesn't padding do the trick anyway?

The following is how I understand the point of parameter sharing in RNNs:
In regular feed-forward neural networks, every input unit is assigned an individual parameter, which means that the number of input units (features) corresponds to the number of parameters to learn. In processing e.g. image data, the number of input units is the same over all training examples (usually a constant pixel size * pixel size * RGB channels).
However, sequential input data like sentences can come in highly varying lengths, which means that the number of parameters will not be the same depending on which example sentence is processed. That is why parameter sharing is necessary for efficiently processing sequential data: it makes sure that the model always has the same input size regardless of the sequence length, as it is specified in terms of transition from one state to another. It is thus possible to use the same transition function with the same weights (input to hidden weights, hidden to output weights, hidden to hidden weights) at every time step. The big advantage is that it allows generalization to sequence lengths that did not appear in the training set.
My questions are:
Is my understanding of RNNs, as summarized above, correct?
In the actual Keras code example I looked at for LSTMs, they padded the sentences to equal lengths beforehand. Doesn't this wash away the whole purpose of parameter sharing in RNNs?
Parameter Sharing
Being able to efficiently process sequences of varying length is not the only advantage of parameter sharing. As you said, you can achieve that with padding. The main purpose of parameter sharing is a reduction of the parameters that the model has to learn. This is the whole purpose of using an RNN.
If you learned a different network for each time step and fed the output of the first model to the second, etc., you would end up with a regular feed-forward network. For 20 time steps, you would have 20 models to learn. In convolutional nets, parameters are shared by the convolutional filters because we can assume that there are similar interesting patterns in different regions of the picture (for example a simple edge). This drastically reduces the number of parameters we have to learn. Analogously, in sequence learning we can often assume that there are similar patterns at different time steps. Compare 'Yesterday I ate an apple' and 'I ate an apple yesterday'. These two sentences mean the same, but the 'I ate an apple' part occurs at different time steps. By sharing parameters, you only have to learn what that part means once. Otherwise, you'd have to learn it for every time step where it could occur in your model.
There is a drawback to sharing the parameters. Because our model applies the same transformation to the input at every time step, it now has to learn a transformation that makes sense for all time steps. So it has to remember which word came at which time step, i.e. 'chocolate milk' should not lead to the same hidden and memory state as 'milk chocolate'. But this drawback is small compared to using a large feed-forward network.
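The parameter saving can be illustrated with a quick count. This is only back-of-the-envelope arithmetic for a simple (non-LSTM) recurrent cell; the layer sizes are made-up illustrative values, and the helper rnn_params is hypothetical.

```python
def rnn_params(input_dim, hidden_dim):
    """Weights of one simple RNN cell: input-to-hidden, hidden-to-hidden, bias."""
    return input_dim * hidden_dim + hidden_dim * hidden_dim + hidden_dim

input_dim, hidden_dim, time_steps = 100, 64, 20  # illustrative sizes

shared = rnn_params(input_dim, hidden_dim)                 # one cell reused at every step
unshared = time_steps * rnn_params(input_dim, hidden_dim)  # a separate cell per step

print(shared)    # 10560
print(unshared)  # 211200
```

The unshared count grows linearly with the number of time steps, while the shared count stays fixed however long the sequence is.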
Padding
As for padding the sequences: the main purpose is not directly to let the model predict sequences of varying length. Like you said, this can be done by using parameter sharing. Padding is used for efficient training, specifically to keep the number of computational graphs during training low. Without padding, we have two options for training:
We unroll the model for each training sample. So, when we have a sequence of length 7, we unroll the model to 7 time steps, feed the sequence, do back-propagation through the 7 time steps and update the parameters. This seems intuitive in theory. But in practice, this is inefficient, because TensorFlow's computational graphs don't allow recurrence; they are feed-forward.
The other option is to create the computational graphs before starting training. We let them share the same weights and create one computational graph for every sequence length in our training data. But when our dataset has 30 different sequence lengths, this means 30 different graphs during training, so for large models this is not feasible.
This is why we need padding. We pad all sequences to the same length and then only need to construct one computational graph before starting training. When you have both very short and very long sequences (lengths 5 and 100, for example), you can use bucketing and padding. This means you pad the sequences to different bucket lengths, for example [5, 20, 50, 100]. Then, you create a computational graph for each bucket. The advantage is that you don't have to pad a sequence of length 5 to 100, as you would waste a lot of time on "learning" the 95 padding tokens in there.
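Bucketing and padding as described above could look roughly like this. It is a framework-free sketch: the function name bucket_and_pad and the padding id 0 are assumptions, and a real pipeline would additionally batch the sequences within each bucket.

```python
from bisect import bisect_left

BUCKETS = [5, 20, 50, 100]  # bucket lengths from the example above
PAD = 0                     # assumed id of the padding token

def bucket_and_pad(sequences, buckets=BUCKETS, pad=PAD):
    """Group sequences by the smallest bucket that fits and pad to that length."""
    out = {b: [] for b in buckets}
    for seq in sequences:
        i = bisect_left(buckets, len(seq))
        if i == len(buckets):
            raise ValueError("sequence longer than the largest bucket")
        b = buckets[i]
        out[b].append(list(seq) + [pad] * (b - len(seq)))
    return out

batches = bucket_and_pad([[7, 3, 9], [4] * 12, [1, 2, 3, 4, 5]])
print(len(batches[5]), len(batches[20]))  # 2 1
print(batches[5][0])                       # [7, 3, 9, 0, 0]
```

A sequence of length 3 gets only 2 padding tokens here, instead of the 97 it would need if everything were padded to length 100.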

Storing vector coordinates in MySQL

I am creating a database to track the (normalized) coordinates of events within a coordinate system. Think: a basketball shot chart, where coordinates of shot attempts are stored relative to where they were taken on the basketball court, in both positive and negative directions from center court.
I'm not exactly sure the best way to store this information in a database in order to give myself the most flexibility in utilizing the data. My options are:
Store a JSON object in a TEXT/CHAR column with X and Y properties
Store each X and Y coordinate in two DECIMAL columns
Use MySQL's spatial POINT object to store the coordinate
My goal is to store a normalized vector2 (as a percentage of the bounding box), so I can map the positions back out onto a rectangle of any size.
It would be nice to be able to do calculations, like distance from another point, but my understanding of spatial objects is that they are more for geographical coordinates than for a normalized vector. The other options, however, make calculations a bit more difficult, though calculations aren't currently a definite requirement for my project.
Is it possible to use spatial POINT for this and would calculations be similar to that of measuring geographical points?
It is possible to use POINT, but it may be more of a hassle retrieving or modifying the values as it is stored in binary form. You won't be able to view or modify the field directly; you would use an SQL statement to get the components or create a new POINT to replace the old one.
They are stored as numbers and you can do normal mathematical operations on them. Geospatial-type calculations on distance would use other geospatial data types such as LINESTRING.
To insert a point you would have to create a point from two numbers (I think for your case, there would be no issues with the size of the numbers):
INSERT INTO coordinatetable(testpoint) VALUES (ST_GeomFromText('POINT(-100473882.33 2133151132.13)'));
INSERT INTO coordinatetable(testpoint) VALUES (ST_GeomFromText('POINT(0.3 -0.213318973)'));
To retrieve it you would have to select the X and Y values separately:
SELECT ST_X(testpoint), ST_Y(testpoint) FROM coordinatetable;
For your case, I would go with storing the X and Y coordinates in two DECIMAL columns. It's easier to retrieve and modify, and having the X and Y coordinates separate gives you direct access to the coordinates rather than having to extract the values you want from data stored in a single field. For larger data sets, it may speed up your queries.
For example:
Whether the player is past half court only requires Y-coordinate
How much help the player could possibly get from the backboard would rely more on the X-coordinate than the Y-coordinate (X closer to zero => Straighter shot)
Whether the player usually scores from locations close to the long edges of the court would rely more on the X-coordinate than the Y-coordinate (X approaches 1 or -1)
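As a rough illustration of the two-column layout, here is a self-contained sketch. Several things in it are assumptions: it uses Python's built-in sqlite3 as a stand-in for MySQL (so the columns are REAL rather than DECIMAL), the table name shots is hypothetical, coordinates are taken to be normalized to [-1, 1] with the origin at center court, and "past half court" is read as y > 0.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shots (id INTEGER PRIMARY KEY, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO shots (x, y) VALUES (?, ?)",
    [(0.3, -0.213318973), (-0.9, 0.4), (0.0, 0.75)],
)

# Filtering on a single coordinate needs no unpacking of a combined value.
past_half = conn.execute("SELECT id FROM shots WHERE y > 0 ORDER BY id").fetchall()
print(past_half)  # [(2,), (3,)]

def to_pixels(x, y, width, height):
    """Map normalized coordinates in [-1, 1] onto a width x height rectangle."""
    return ((x + 1) / 2 * width, (y + 1) / 2 * height)

def distance(a, b):
    """Euclidean distance between two (x, y) pairs in normalized units."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

print(to_pixels(0.0, 0.75, 940, 500))  # (470.0, 437.5)
```

Because the stored values are plain numbers, distance and rescaling are ordinary arithmetic and do not need any spatial functions.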