What is the "predictions for different feature values" curve in a catboost feature plot? - catboost

The plot I am referring to can be found here. It is reproduced by calling the calc_feature_statistics function.
It is clear to me what the blue and orange curves (mean target and mean prediction) represent.
What is the red line (predictions for different feature values)?

From the link:
To calculate it, the value of the feature is successively changed to fall into every bucket for every input object. The value for a bucket on the graph is calculated as the average for all objects when their feature values are changed to fall into this bucket.
As far as I understand it, this means the following.
For example, suppose you have a categorical feature with three possible values: 'Moscow', 'London', 'New York'. Then:
Set every value of this feature in the training data to 'Moscow' and calculate the average prediction over all of the data with the model trained earlier. This is the point on the red line for the bucket 'Moscow'.
Repeat the previous step with the value 'London'. This is the point on the red line for the bucket 'London'.
Do the same for 'New York'.
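The procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions, not CatBoost's actual implementation: `predictions_for_feature_values` and `toy_predict` are hypothetical names, the "model" is a made-up linear function, and the cities are encoded as the numbers 0, 1 and 2 purely for the example.

```python
import numpy as np

def predictions_for_feature_values(predict, X, feature_idx, bucket_values):
    """For each bucket value, overwrite one feature for every row and average the predictions."""
    X = np.asarray(X, dtype=float)
    averages = []
    for value in bucket_values:
        X_modified = X.copy()
        X_modified[:, feature_idx] = value  # every object now falls into this bucket
        averages.append(float(np.mean(predict(X_modified))))
    return averages

# Hypothetical stand-in for a trained model: prediction = 2 * feature0 + feature1.
def toy_predict(X):
    return 2.0 * X[:, 0] + X[:, 1]

X_train = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0]])

# Feature 0 uses a toy encoding: 0 = 'Moscow', 1 = 'London', 2 = 'New York'.
red_line = predictions_for_feature_values(toy_predict, X_train, 0, [0.0, 1.0, 2.0])
print(red_line)  # one averaged prediction per bucket
```

Each entry of `red_line` is one point of the red curve: the average prediction when the whole dataset is forced into that bucket while all other features keep their original values.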

If I have a multi-output regression model (output: a,b,c,d) based on some input (x,y,z), can I provide a prior (b) to predict one or more of the other outputs?

If the inputs to my model are x,y,z and my outputs are continuous variables a,b,c,d, I can obviously use the model to predict the vector [a,b,c,d] from [x,y,z].
However, what happens if I already know the value of b as a prior before inference? Can I run the network in such a way that I predict [a,c,d] based on [x,y,z,b]?
Real-world example: I have an image with pixel locations (x,y) and pixel values (R,G,B). I build a neural implicit model to predict pixel values from pixel locations. Now suppose I have the green values for some pixels as well as their locations. Can I use these green values as a prior with my original network to get an improved result? Note that I am not interested in training a new network on this data.
In mathematical terms: I have a network f(x,y,z) -> (a,b,c,d). How can I perform f(x,y,z|b) -> (a,c,d)?
I have not tried much yet. I was thinking of passing the prior value back through the network, but I am somewhat lost.

PointNet can't predict segmentation on custom point cloud

I'm currently working on my bachelor project and I'm using the PointNet deep neural network.
My project group and I have created a dataset of point clouds (each an unsorted list of x 3D coordinates) and segmentation files, but we can't train PointNet to predict segmentation with the dataset.
Each segmentation file is a list with the same number of rows as there are points in the corresponding point cloud, and each row is either a 1 or a 2, depending on whether the corresponding point belongs to segment 1 or 2.
When PointNet predicts, it outputs a list of x elements, where each element is the segment that PointNet predicts the corresponding point belongs to.
When we run the benchmark dataset from the original PointNet implementation, the system runs and can predict segmentation, so we know that the error is somewhere in our dataset, even though we have tried our best to make it look like the original benchmark dataset.
The implemented PointNet uses PyTorch conv2d, maxpool2d and linear layers. For calculating the loss, both the nn.functional.nll_loss and the nn.NLLLoss functions have been used. When using nn.NLLLoss, the weight parameter was set to a tensor of [1,100] to combat potential imbalance in the data.
These are the things we have tried:
We have tried downsampling the point clouds, i.e. removing points using voxel downsampling
We have tried rescaling and normalizing all values so they lie between 0 and 1, using the formula (data - np.min(data)) / (np.max(data) - np.min(data))
We have tried running a Euclidean clustering function on the data, so that each scanned object is handled by itself
We have tried replicating another dataset that was created from the same raw data and that we know has worked before
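For reference, the min-max normalization formula quoted in the list above can be written as a small function. This is a minimal sketch, assuming the minimum and maximum are taken over all coordinates of one point cloud at once (as the quoted formula does); the function name `min_max_normalize` and the guard for constant input are additions for illustration.

```python
import numpy as np

def min_max_normalize(points):
    """Scale all values of one point cloud into [0, 1] using its global min and max."""
    points = np.asarray(points, dtype=float)
    lo, hi = np.min(points), np.max(points)
    if hi == lo:
        # Assumption: a constant cloud is mapped to zeros to avoid dividing by zero.
        return np.zeros_like(points)
    return (points - lo) / (hi - lo)

# Toy point cloud with two 3D points.
cloud = np.array([[2.0, 4.0, 6.0], [4.0, 8.0, 10.0]])
normalized = min_max_normalize(cloud)
print(normalized)
```

Because a single minimum and maximum are shared by all three axes, the relative proportions of the cloud are preserved; normalizing each axis separately would stretch the shape.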
In the attached link, images of the datafiles with a description can be found.
Cheers everyone

Trying to contour a CSV file in QGIS

I have rainfall data which I have imported as a csv file. It's 185 lines like this:
Name, Longitude, Latitude, Elevation, TotalPrecipitation
BURLINGTON, -72.932, 41.794, 505, M
BAKERSVILLE, -73.008, 41.842, 686, 42.40
BARKHAMSTED, -72.964, 41.921, 710, M
NORFOLK 2 SW, -73.221, 41.973, 1340, 44.22
Looking at the layer properties, the latitude and longitude are brought in as "double", but the rainfall amounts come in as "text", so I can't contour them.
How can I get beyond this point, and where do I go to do the contouring? Do I go to Vector:Contour? Will it understand that M is missing data, or will the Ms still exist if this is converted to "double"?
I'm a little confused. Thanks for the help.
I think I can help.
Since your points are scattered irregularly across an area, you could proceed as follows:
Load the CSV into QGIS as a point layer whose attribute table includes your most important value, Total Precipitation. Let's call it the TEST layer.
Processing Toolbox -> TIN Interpolation -> select the TEST layer. As the interpolation attribute, choose "Total precipitation". Use the green "+" symbol to add this selection. Don't forget the Extent option, where you define the bounds of your interpolation. I would not exceed the extent of the layer I am working on. The output raster size is also important: avoid a small number of rows. A value of about 10 can be used to keep the interpolation efficient.
https://www.qgistutorials.com/en/docs/3/interpolating_point_data.html
Main bar -> Raster -> Extraction -> Contour
As the input layer, select the raster interpolated from TEST in the previous step. The interval between contour lines can be 10 (10 mm in your case). For the attribute name, enter PRECIPITATION, then click Run.
Your precipitation lines are ready! Now you can Right-Click -> Properties -> Symbology (to change the color) or -> Labels (to add labels based on the precipitation attribute).
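The steps above assume the precipitation column is numeric, while the question's file contains "M" for missing readings, which is why the column is read as text. One possible way around this (an assumption, not something the answer above prescribes) is to filter the file before loading it. The sketch below uses only the Python standard library; `drop_missing_rows` is a hypothetical helper name and the sample rows are the ones quoted in the question.

```python
import csv
import io

def drop_missing_rows(csv_text, value_column="TotalPrecipitation", missing_marker="M"):
    """Return rows whose value column is numeric, with that column converted to float."""
    # skipinitialspace handles the blank after each comma in the header and the data rows.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    rows = []
    for row in reader:
        value = row[value_column].strip()
        if value == missing_marker:
            continue  # skip stations with no reading
        row[value_column] = float(value)
        rows.append(row)
    return rows

sample = """Name, Longitude, Latitude, Elevation, TotalPrecipitation
BURLINGTON, -72.932, 41.794, 505, M
BAKERSVILLE, -73.008, 41.842, 686, 42.40
BARKHAMSTED, -72.964, 41.921, 710, M
NORFOLK 2 SW, -73.221, 41.973, 1340, 44.22
"""

cleaned = drop_missing_rows(sample)
for row in cleaned:
    print(row["Name"], row["TotalPrecipitation"])
```

Writing the remaining rows back out with `csv.DictWriter` would give a file in which every precipitation value is a number, so the column no longer has to be imported as text.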

Why Caffe's Accuracy layer's bottoms consist of InnerProduct and Label?

I'm new to Caffe. In the MNIST example, I thought the label should be compared with the output of the softmax layer, but that is not the case in lenet.prototxt.
I wonder why the InnerProduct result and the label are used to compute the accuracy; it seems unreasonable. Did I miss something in the layers?
The output dimension of the last inner product layer is 10, which corresponds to the number of classes (digits 0~9) of your problem.
The loss layer takes two blobs: the first is the prediction (ip2) and the second is the label provided by the data layer.
loss_layer = SoftmaxLossLayer(name="loss", bottoms=[:ip2,:label])
It does not produce any outputs. All it does is compute the loss function value, report it when backpropagation starts, and initiate the gradient with respect to ip2. This is where all the magic starts.
After training (in the TEST phase), the desired result comes from the last layer, which multiplies its weights with ip1, and the class (one of the 10 neurons) with the maximum value is chosen. Because softmax is monotonic, the class with the largest ip2 score is also the class with the largest softmax probability, so the accuracy can be computed from ip2 directly.
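The following NumPy sketch checks numerically that picking the class with the maximum raw score gives the same result as picking it after softmax, which is why an accuracy computation does not need the softmax output. It does not use Caffe itself: `ip2` here is just a random array standing in for the scores of the last inner product layer, and `softmax` and `accuracy` are helper functions written for this example.

```python
import numpy as np

def softmax(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

def accuracy(scores, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    return float(np.mean(np.argmax(scores, axis=1) == labels))

rng = np.random.default_rng(0)
ip2 = rng.normal(size=(6, 10))         # stand-in scores: 6 samples, 10 classes
labels = rng.integers(0, 10, size=6)   # stand-in labels

same_argmax = np.array_equal(np.argmax(ip2, axis=1),
                             np.argmax(softmax(ip2), axis=1))
print(same_argmax)
print(accuracy(ip2, labels) == accuracy(softmax(ip2), labels))
```

Softmax only rescales each row into probabilities without changing the order of its entries, so the predicted class and therefore the accuracy are unchanged.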

Disabling interpolation in SSRS 2012 line charts

I have a chart where the x axis is comprised of dates. I give the chart data rows, with each row corresponding to a unique date, but occasionally dates are missing. Currently, the line chart's series just connects the existing points with a straight line. What I want is for the line to stop before non-existing datapoints such that the missing points are obvious.
Let me know if I can provide more information.
I would customize the Empty Points properties for that series, e.g. setting the Empty Points / Color property to "No Color".
For more info:
http://msdn.microsoft.com/en-us/library/dd207051.aspx