The default metric for feature importance in CatBoost

I am using both CatBoost classification and regression models and having a hard time figuring out which metric is used for feature importance.
According to the CatBoost docs, it is PredictionValuesChange for non-ranking metrics and LossFunctionChange for ranking metrics.
I don't understand what ranking and non-ranking mean here. You can rank the probabilities generated by any model.
In what cases are ranking/non-ranking metrics used?

Ranking covers all the metrics that fall under the following categories:
catboost ranking metrics
The rest are considered non-ranking and use PredictionValuesChange for feature importance.
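As a quick illustration (my own sketch, assuming a recent catboost version and made-up toy data), you can also request either importance type explicitly instead of relying on the default:

```python
from catboost import CatBoostClassifier, Pool
import numpy as np

# toy binary classification data
X = np.random.rand(500, 5)
y = (X[:, 0] + 0.5 * X[:, 1] > 0.8).astype(int)
train_pool = Pool(X, y)

model = CatBoostClassifier(iterations=100, verbose=False)
model.fit(train_pool)

# Default for a non-ranking loss such as Logloss: PredictionValuesChange
print(model.get_feature_importance(type="PredictionValuesChange"))

# LossFunctionChange needs a dataset to re-evaluate the loss on
print(model.get_feature_importance(data=train_pool, type="LossFunctionChange"))
```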

Related

How are Elastic Net penalties applied to Logistic Regression's Maximum Likelihood cost function?

I understand how Ridge / Lasso / Elastic Net penalties are applied to linear regression's cost function, but I am trying to figure out how they are applied to logistic regression's maximum likelihood cost function.
I've looked through pages on Google, and it looks like it can be done (I believe scikit-learn's logistic regression models accept L1 and L2 parameters, and I've seen some YouTube videos saying the penalties can be applied to logistic models too), and I've found how they are added to the sum-of-squared-residuals cost function, but I am curious how the penalties are applied with the maximum likelihood cost function. Is it maximum likelihood minus the penalties?
I got an answer by posting on Stats Stack Exchange (link). I'll repost the answer from ofer-a here to help anyone searching Stack Overflow for a similar answer.
The elastic net terms are added to the maximum likelihood cost function, i.e. the final cost function is:

$$\min_{w} \; -\sum_{i}\Big[ y_i \log \sigma(w^\top x_i) + (1-y_i)\log\big(1-\sigma(w^\top x_i)\big) \Big] \;+\; \lambda_1 \lVert w \rVert_1 \;+\; \lambda_2 \lVert w \rVert_2^2$$

The first term is the (negative log) likelihood, the second term is the L1-norm part of the elastic net, and the third term is the L2-norm part; i.e. the model is trying to minimize the negative log likelihood and also trying to keep the weights small.
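For completeness, here is a sketch of how this looks in scikit-learn, which the question mentions; note that scikit-learn parametrizes the penalty with C and l1_ratio rather than separate lambda terms:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# penalty="elasticnet" requires the saga solver
clf = LogisticRegression(
    penalty="elasticnet",
    solver="saga",
    l1_ratio=0.5,   # 0.0 = pure L2 (ridge), 1.0 = pure L1 (lasso)
    C=1.0,          # inverse of overall penalty strength: smaller C = stronger regularization
    max_iter=5000,
)
clf.fit(X, y)
print(clf.coef_)
```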

How feature importance is calculated in regression trees?

In classification with a decision tree or random forest, we use Gini impurity or information gain to decide which feature to split on at a parent/intermediate node. But if we are doing regression with a decision tree or random forest, how is feature importance calculated, or how are the features selected?
For regression (feature selection), the goal of a split is to get two children with the lowest variance among the target values.
You can compare the 'criterion' parameter of the regression vs. classification tree classes in the sklearn library to get a better idea.
You can also check this video: https://www.youtube.com/watch?v=nSaOuPCNvlk
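As a small sketch (my own example, assuming a recent scikit-learn version), the variance-reduction criterion and the resulting feature importances look like this:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=6, random_state=0)

# criterion="squared_error" means splits are chosen to minimize variance in the children
reg = RandomForestRegressor(n_estimators=100, criterion="squared_error",
                            random_state=0)
reg.fit(X, y)

# normalized total impurity (variance) reduction attributed to each feature
print(reg.feature_importances_)
```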

Using OLS regression on binary outcome variable

I have previously been told that -- for reasons that make complete sense -- one shouldn't run OLS regressions when the outcome variable is binary (i.e. yes/no, true/false, win/loss, etc). However, I often read papers in economics/other social sciences in which researchers run OLS regressions on binary variables and interpret the coefficients just like they would for a continuous outcome variable. A few questions about this:
Why do they not run a logistic regression? Is there any disadvantage/limitation to using logit models? In economics, for example, I very often see papers using OLS regression for binary variables rather than logit. Can logit only be used in certain situations?
In general, when can one run an OLS regression on ordinal data? If I have a variable that captures "number of times in a week survey respondent does X", can I - in any circumstance - use it as a dependent variable in a linear regression? I often see this being done in literature as well, even though we're always told in introductory statistics/econometrics that outcome variables in an OLS regression should be continuous.
Applying OLS to a binary outcome is known as the Linear Probability Model (LPM). Compared to a logistic model, the LPM has advantages in implementation and interpretation that make it an appealing option for researchers conducting impact analysis. In an LPM, the parameters represent mean marginal effects, whereas in logistic regression the parameters represent log odds ratios. To get mean marginal effects from a logistic regression, we need to calculate that derivative for every data point and then take the mean of those derivatives. While logistic regression and the LPM usually yield the same expected average impact estimate [1], researchers often prefer the LPM for estimating treatment impacts.
In general, yes, we can definitely apply OLS to an ordinal outcome. As in the binary case, applying OLS to a binary or ordinal outcome results in violations of the OLS assumptions. However, many econometricians believe the practical effect of violating these assumptions is minor, and that the simplicity of interpreting an OLS model outweighs the technical correctness of an ordered logit or probit model, especially when the ordinal outcome looks quasi-normal.
Reference:
[1] Deke, J. (2014). Using the linear probability model to estimate impacts on binary outcomes in randomized controlled trials. Mathematica Policy Research.
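As an illustration of the marginal-effects point above, here is a rough sketch (my own example using statsmodels, not taken from the cited reference) comparing the LPM slope with the logit model's average marginal effect on simulated data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
p = 1 / (1 + np.exp(-(0.5 + 0.8 * x)))   # true probability of y = 1
y = rng.binomial(1, p)                    # binary outcome
X = sm.add_constant(x)

lpm = sm.OLS(y, X).fit()                  # linear probability model
logit = sm.Logit(y, X).fit(disp=0)

# the LPM slope and the logit average marginal effect are usually close
print("LPM slope:              ", lpm.params[1])
print("Logit avg marginal eff.:", logit.get_margeff(at="overall").margeff[0])
```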

How to find the confidence score of svm model?

While going through the different evaluation methods for models, I tried some text classification using an SVM model with lots of features, and after training, the model is able to classify the text.
My question is: how do I calculate a confidence score for the SVM model? I have checked lots of examples using predict_proba and the decision function.
Many times, predict_proba doesn't align with the model's prediction and gives wrong probabilities, and if I use the decision function, how would I interpret the distance as a confidence score? How should we define the threshold?
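For reference, a minimal sketch of the two options in scikit-learn (assuming SVC on toy data; predict_proba requires probability=True and relies on internal cross-validated Platt scaling, which is why it can disagree with predict on borderline points):

```python
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", probability=True, random_state=0)
clf.fit(X_tr, y_tr)

print(clf.predict(X_te[:5]))
print(clf.predict_proba(X_te[:5]))       # Platt-scaled probabilities
print(clf.decision_function(X_te[:5]))   # signed distance to the separating hyperplane:
                                         # larger magnitude = farther from the boundary
```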

Example how to use catboost with the time series data

In the introduction/promo video (https://www.youtube.com/watch?v=s8Q_orF4tcI) you mentioned that CatBoost can analyse historical time series data for weather forecasts.
But I was not able to find anything like this in the tutorials: https://github.com/catboost/catboost/tree/master/catboost/tutorials
Here are some examples of time series models using CatBoost (no affiliation):
Kaggle: CatBoost - forget about time series
Forecasting Time Series with Gradient Boosting
One thing I have seen mentioned, though I don't have first-hand experience with it, is using the has_time parameter to specify that the observations should be treated in order (and not randomized), based on a timestamp column.
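Here is a rough sketch of that idea (my own example, not from the tutorials above, and assuming has_time behaves as documented): simple lag features plus has_time=True so the rows are treated in chronological order:

```python
import numpy as np
import pandas as pd
from catboost import CatBoostRegressor

# toy daily series: trend + weekly seasonality + noise
t = np.arange(730)
y = 10 + 0.02 * t + 3 * np.sin(2 * np.pi * t / 7) + np.random.normal(0, 0.5, len(t))

df = pd.DataFrame({"y": y})
for lag in (1, 7, 14):                    # lag features as predictors
    df[f"lag_{lag}"] = df["y"].shift(lag)
df = df.dropna()

X, target = df.drop(columns="y"), df["y"]
split = int(len(df) * 0.8)                # chronological train/test split, no shuffling

model = CatBoostRegressor(iterations=300, has_time=True, verbose=False)
model.fit(X.iloc[:split], target.iloc[:split])
print(model.predict(X.iloc[split:])[:5])
```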