Coefficients of a support vector regression (SVR) model using grid search (GridSearchCV) and Pipeline in scikit-learn

I am having trouble accessing the coefficients of a support vector regression (SVR) model in scikit-learn when the model is embedded in a pipeline and a grid search.
Consider the following example:
from sklearn.datasets import load_iris
import numpy as np
from sklearn.model_selection import GridSearchCV  # sklearn.grid_search in older scikit-learn versions
from sklearn.svm import SVR
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import Pipeline
iris = load_iris()
X_train = iris.data
y_train = iris.target
clf = SVR(kernel='linear')
select = SelectKBest(k=2)
steps = [('feature_selection', select), ('svr', clf)]
pipeline = Pipeline(steps)
grid = GridSearchCV(pipeline, param_grid={"svr__C":[10,10,100],"svr__gamma": np.logspace(-2, 2)})
grid.fit(X_train, y_train)
This seems to work fine, but when I try to access the coefficients of the best-fitting model
grid.best_estimator_.coef_
I get an error message: AttributeError: 'Pipeline' object has no attribute 'coef_'.
I also tried to access the individual steps of the pipeline:
pipeline.named_steps['svr']
but could not find the coefficients there.

Just happened to come across the same problem and this post
had the answer:
grid.best_estimator_ contains an instance of the pipeline, which consists of steps. The last step should always be the estimator, so you should always find the coefficients at:
grid.best_estimator_.steps[-1][1].coef_
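Equivalently, since the step was registered under the name 'svr' when the pipeline was built, the fitted step can be addressed by name instead of by position (a small sketch based on the example above):
# Access the fitted SVR step by name; best_estimator_ is the pipeline refit on the full training set
best_svr = grid.best_estimator_.named_steps['svr']
print(best_svr.coef_)       # weights of the linear SVR
print(best_svr.intercept_)  # bias term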

Related

Huggingface TFRobertaForSequenceClassification.from_pretrained("cardiffnlp/twitter-roberta-base-emotion", num_labels=1) give ValueError

Reproducible on Google Colab (transformers 4.24.0):
from transformers import TFRobertaForSequenceClassification
model = TFRobertaForSequenceClassification.from_pretrained("cardiffnlp/twitter-roberta-base-emotion", num_labels=1)
I would like to set num_labels=1 because I want to use the model for regression, but the above code gives a ValueError:
ValueError: cannot reshape array of size 3072 into shape (768,1)
I remember this working for the DistilBERT model. Is this the right way of calling it, or is this not supported by HF for anything other than the BERT family?
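No accepted answer is quoted here, but one thing that may be worth trying (an assumption on my part, not a confirmed fix): the error looks like the checkpoint's 4-label classification head (768 x 4 = 3072 values) being forced into a 768 x 1 head, and from_pretrained accepts an ignore_mismatched_sizes flag for exactly this kind of head-size mismatch:
from transformers import TFRobertaForSequenceClassification

# Assumption: skip the mismatched classifier weights and re-initialize the head;
# the new single-output head then has to be fine-tuned before it is useful.
model = TFRobertaForSequenceClassification.from_pretrained(
    "cardiffnlp/twitter-roberta-base-emotion",
    num_labels=1,
    ignore_mismatched_sizes=True,
)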

Add a TensorBoard metric from my PettingZoo environment

I'm using TensorBoard to see the progress of the PettingZoo environment that my agents are playing. I can see the reward go up over time, which is good, but I'd like to add other metrics that are specific to my environment, i.e. I'd like TensorBoard to show me more charts with my metrics and how they improve over time.
The only way I could figure out how to do that was by inserting a few lines into the learn method of OnPolicyAlgorithm that's part of SB3. This works and I got the charts I wanted:
(The two bottom charts are the ones I added.)
But obviously editing library code isn't a good practice. I should make the modifications in my own code, not in the libraries. Is there currently a more elegant way to add a metric from my PettingZoo environment into TensorBoard?
You can add a callback to record your own logs; see the example below. In this case the callback is called at every step. There are other callback hooks you can use depending on your use case.
import numpy as np
from stable_baselines3 import SAC
from stable_baselines3.common.callbacks import BaseCallback
model = SAC("MlpPolicy", "Pendulum-v1", tensorboard_log="/tmp/sac/", verbose=1)
class TensorboardCallback(BaseCallback):
    """
    Custom callback for plotting additional values in tensorboard.
    """
    def __init__(self, verbose=0):
        super(TensorboardCallback, self).__init__(verbose)

    def _on_step(self) -> bool:
        # Log scalar value (here a random variable)
        value = np.random.random()
        self.logger.record('random_value', value)
        return True

model.learn(50000, callback=TensorboardCallback())
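To log something that actually comes from your environment rather than a random number, one option (a sketch, assuming your wrapped environment exposes an attribute, here the hypothetical episode_score) is to read it off the vectorized training environment inside the callback:
import numpy as np
from stable_baselines3.common.callbacks import BaseCallback

class EnvMetricCallback(BaseCallback):
    """Logs a value read from the training environment; the attribute name is a placeholder."""
    def _on_step(self) -> bool:
        # get_attr pulls the attribute from every sub-environment of the VecEnv
        scores = self.training_env.get_attr('episode_score')
        self.logger.record('env/episode_score', float(np.mean(scores)))
        return True

model.learn(50000, callback=EnvMetricCallback())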

How to solve an EfficientDet with PyTorch runtime error?

I want to train EfficientDet with PyTorch Lightning, but I have a problem.
Most of the issues so far were version errors, so I matched the versions the model expects:
from effdet.config.model_config import efficientdet_model_param_dict
from effdet import get_efficientdet_config, EfficientDet, DetBenchTrain
from effdet.efficientdet import HeadNet
When I import effdet, I get a runtime error like this:
RuntimeError:
object has no attribute nms:
File "/home/ubuntu/anaconda3/envs/pytorch1.7.1_p37/lib/python3.7/site-packages/torchvision/ops/boxes.py", line 35
"""
_assert_has_ops()
return torch.ops.torchvision.nms(boxes, scores, iou_threshold)
~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
'nms' is being compiled since it was called from '_batched_nms_vanilla'
# Ideally for GPU we'd use a higher threshold
# keep only top max_det_per_image scoring predictions
batch_detections.append(detections)
return torch.stack(batch_detections, dim=0)
How can I solve this problem? I want to get the model training quickly.
Also, this is my version status:
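No answer is quoted here, but this "object has no attribute nms" failure is commonly reported when the installed torchvision build does not match the installed torch build, so a first step (a minimal diagnostic, not a fix) is to print what is actually installed and reinstall a matching pair if they disagree:
import torch
import torchvision

# Sanity-check the installed builds; effdet needs a torch/torchvision pair built together
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("cuda:", torch.version.cuda)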

Palantir Foundry: how to allow a dynamic number of inputs in compute (Code Repository)

I have a folder where I will upload one file every month. The file will have the same format every month.
First problem:
The idea is to concatenate all the files in this folder into one file. Currently I am hardcoding the filenames (filename[0], filename[1], filename[2], ...), but imagine later I will have 50 files: should I explicitly add them all to the transform_df decorator? Is there another way to handle this?
Second problem:
Currently I have, let's say, 4 files (2021_07, 2021_08, 2021_09, 2021_10), and whenever I add the file representing the 2021_12 data I want to avoid changing the code.
If I add input_5 = Input(path_to_2021_12_do_not_exists), the code will not run and gives an error.
How can I implement the code for future files and let it ignore an input that does not exist, without manually adding a new value to my code each month?
Thank you
# from pyspark.sql import functions as F
from transforms.api import transform_df, Input, Output
from pyspark.sql.functions import to_date, year, col
from pyspark.sql.types import StringType
from myproject.datasets import utils
from pyspark.sql import DataFrame
from functools import reduce
input_dir = '/Company/Project_name/'
prefix_filename = 'DataInput1_'
suffixes = ['2021_07', '2021_08', '2021_09', '2021_10', '2021_11', '2021_12']
filenames = [input_dir + prefix_filename + suffixe for suffixe in suffixes]
@transform_df(
    Output("/Company/Project_name/Data/clean/File_concat"),
    input_1=Input(filenames[0]),
    input_2=Input(filenames[1]),
    input_3=Input(filenames[2]),
    input_4=Input(filenames[3]),
)
def compute(input_1, input_2, input_3, input_4):
    input_dfs = [input_1, input_2, input_3, input_4]
    dfs = []

    def transformation_input(df):
        # some transformation
        return df

    for input_df in input_dfs:
        dfs.append(transformation_input(input_df))
    dfs = reduce(DataFrame.unionByName, dfs)
    return dfs
This question comes up a lot; the simple answer is that you don't. Defining datasets and executing a build on them are two different steps executed at different stages.
Whenever you commit your code and run the checks, your overall Python code is executed during the renderSchrinkwrap stage, except for the compute part. This allows Foundry to discover what datasets exist and publish them.
Publishing involves creating your dataset and putting whatever is inside your compute function into the jobspec of the dataset, so Foundry knows what code to execute whenever you run a build.
Once you hit build on the dataset, Foundry will only pick up whatever is on the jobspec and execute it. Any other code has already run during your checks, and it has run just once.
So any dynamic input/output would require you to re-run checks on your repo, which means that some code change would have had to happen, since checks are part of the CI process, not part of the build.
Taking a step back, assuming each of your input files has the same schema, Foundry would expect you to have all of those files in the same dataset as append transactions.
This might not be possible though if, for instance, the only indication of the "year" of the data is embedded in the filename; but your sample code indicates that you expect all these datasets to have the same schema and to union together easily.
You can do this manually through the Dataset Preview - just use the Upload File button or drag-and-drop the new file into the Preview window - or, if it's an "end user" workflow, with a File Upload Widget in a Workshop app. You may need to coordinate with your Foundry support team if this widget isn't available.
A bit late to the post, but for anyone who is interested in an answer to most of the question: dynamically determining file names from within a folder is not doable, although some level of dynamic input is possible, as follows:
# from pyspark.sql import functions as F
from transforms.api import transform, Input, Output
from pyspark.sql.functions import to_date, year, col
from pyspark.sql.types import StringType
from myproject.datasets import utils
from pyspark.sql import DataFrame
# from functools import reduce
from transforms.verbs.dataframes import union_many # use this instead of reduce
input_dir = '/Company/Project_name/'
prefix_filename = 'DataInput1_'
suffixes = ['2021_07', '2021_08', '2021_09', '2021_10', '2021_11', '2021_12']
filenames = [input_dir + prefix_filename + suffixe for suffixe in suffixes]
inputs = {'input{}'.format(index): Input(filename) for (index, filename) in enumerate(filenames)}

@transform(
    output=Output("/Company/Project_name/Data/clean/File_concat"),
    **inputs
)
def compute(output, **kwargs):
    # Extract dataframes from input datasets
    input_dfs = [dataset_df.dataframe() for dataset_name, dataset_df in kwargs.items()]
    dfs = []

    def transformation_input(df):
        # some transformation
        return df

    for input_df in input_dfs:
        dfs.append(transformation_input(input_df))
    # dfs = reduce(DataFrame.unionByName, dfs)
    unioned_dfs = union_many(*dfs)
    # with the transform decorator the result is written to the output explicitly
    output.write_dataframe(unioned_dfs)
A couple of points:
A dynamic input dict is created.
That dict is read into the transform using **kwargs.
Using the transform decorator (not transform_df), we can extract the dataframes.
(Not in the question) Multiple dataframes are combined using the union_many function from the transforms.verbs library.

How do you export .caffemodels to other applications?

Is it possible to translate the info in a .caffemodel file so that it can be read by (for example) MATLAB? That is, is there a way to write your model using something other than prototxt and import the weights trained using Caffe?
If the answer is "Nope, it's a binary file and will always remain that way", is there some documentation regarding the structure of the file so that one could extract the important information somehow?
As you know, .caffemodel consists of weights and biases.
A simple way to read weights and biases for a caffemodel given the prototxt would be to just load the network in Python and read the weights.
You can use:
import caffe
net = caffe.Net(<prototxt-file>, <model-file>, <phase>)
and access the params from net.params
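To get those weights into MATLAB, a minimal sketch (assuming pycaffe and scipy are available; the file names below are placeholders) is to dump net.params into a .mat file:
import caffe
import scipy.io

net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

# net.params maps layer names to blobs: blobs[0] holds the weights, blobs[1] the biases
mat_dict = {}
for layer_name, blobs in net.params.items():
    mat_dict[layer_name + '_weights'] = blobs[0].data
    if len(blobs) > 1:
        mat_dict[layer_name + '_bias'] = blobs[1].data

scipy.io.savemat('caffemodel_params.mat', mat_dict)  # load() in MATLAB reads this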
I'll take VGG as an example:
from caffe.proto import caffe_pb2
import sys

net = caffe_pb2.NetParameter()
caffemodel = sys.argv[1]
with open(caffemodel, 'rb') as f:
    net.ParseFromString(f.read())
for i in net.layer:
    print(i.ListFields()[0][-1])
#conv1
#relu1
#norm1
#pool1
#conv2
#relu2
#norm2
#pool2
#conv3
#relu3
#conv4
#relu4
#conv5
#relu5
#pool5
#fc6
#relu6
#drop6
#fc7
#relu7
#drop7
#fc8
#prob