How do space-filling parameter grids change when we call them with parsnip and recipes parameter objects? - tidymodels

According to this (https://dials.tidymodels.org/reference/grid_max_entropy.html) page, the output of a grid function may vary depending on whether we use a parameters object produced from a parsnip model or recipes recipe, or a parameter object created directly. However, it is not clear how one can obtain the parameter range when using a parameters object created from a model or recipe. For example, the default range of neighbors() is [1, 10]. When I use
myGrid <- grid_latin_hypercube(extract_parameter_set_dials(x), size = 25)
where x is a model, and obtain the range using
range(myGrid$neighbors)
I get some values for neighbors that fall outside [1, 10]. How can I obtain the default range of neighbors when I use a parameters object created from a model or recipe?

As you saw in the documentation:
the parsnip and recipe packages may override the default ranges for specific models and preprocessing steps. If the grid function uses a parameters object created from a model or recipe, the ranges may have different defaults (specific to those models)
You can see the ranges by looking at the object element of the parameter set object.
library(tidymodels)
knn_spec <- nearest_neighbor(neighbors = tune())
extract_parameter_set_dials(knn_spec)$object
#> [[1]]
#> # Nearest Neighbors (quantitative)
#> Range: [1, 15]
Created on 2022-05-13 by the reprex package (v2.0.1)

Related

Difference between freezing layer with requires_grad and not passing params to optim in PyTorch

Let's say I train an autoencoder.
I want to freeze the parameters of the encoder for the training, so only the decoder trains.
I can do this using:
# assuming it's a single layer called 'encoder'
model.encoder.weight.requires_grad = False
Or I can pass only the decoder's parameters to the optimizer. Is there a difference?
The most practical way is to iterate through all parameters of the module you want to freeze and set requires_grad to False. This gives you the flexibility to switch your modules on and off without having to initialize a new optimizer each time. You can do this using the parameters() generator available on all nn.Modules:
for param in module.parameters():
    param.requires_grad = False
This method is model agnostic since you don't have to worry whether your module contains multiple layers or sub-modules.
Alternatively, you can call the function nn.Module.requires_grad_ once as:
module.requires_grad_(False)
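For completeness, here is a minimal sketch contrasting the two approaches from the question; the toy autoencoder is illustrative, not the original model:
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(10, 4)
        self.decoder = nn.Linear(4, 10)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()

# Option 1: freeze the encoder. Autograd skips these parameters entirely,
# so no gradients are computed or stored for them.
for param in model.encoder.parameters():
    param.requires_grad = False

# Option 2: optimize only the decoder. Encoder gradients are still
# computed on backward(), but no update is ever applied to the encoder.
optimizer = torch.optim.Adam(model.decoder.parameters(), lr=1e-3)
So there is a difference: both keep the encoder fixed, but only the requires_grad approach also skips the gradient computation, and an optimizer never applies momentum or weight decay to parameters it was not given.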

In kdb, how do I assign a list of symbols each to a corresponding list of values?

For example, I want to pass a dictionary with 10 pairs into a function to bypass the 8-parameter valence limit. I then want each key of the dictionary to be assigned as a local variable bound to its value, and to use those as the parameters for my function. Alternatively, is there a better way to pull this off?
I am afraid there is no way to functionally assign a value to a local-scope variable. E.g.
{eval parse "a: 10";}1b
creates a global variable a.
You may fix some namespace, e.g. .l, keep the variables there, and clear the namespace before the function returns, e.g.:
{
  eval each (:),'flip (`$".l.",/:string key x;value x);
  r: .l.a + .l.b + .l.c;
  delete from `.l;
  r
}`a`b`c!1 2 3
But getting values directly from the input dictionary (like x[`a]) seems an easier and clearer approach.
apply (.) helps to simplify invoking other functions using dictionary values as parameters. This may be what you are looking for. E.g.
{
  f: {x+y+z};
  f . x`a`b`c
}`a`b`c!1 2 3

Is there a tool available within Foundry that can automatically populate column descriptions? If so, what is it called?

We are looking to see if there is a tool within the Foundry platform that will allow us to keep a list of field descriptions and, when the dataset builds, populate those descriptions automatically. Does this exist, and if so, what is the tool called?
Yes: if you upgrade your Code Repository to version 1.184.0+, this feature is released and available from that point onwards.
The method you use to push output column descriptions is to provide a new optional argument to your TransformOutput.write_dataframe(), namely column_descriptions.
This argument should be a dict with keys of column names and values of column descriptions (up to 200 characters in length for stability reasons).
The code will automatically compute the intersection of the column names available on your pyspark.sql.DataFrame and the keys in the dict you provide, so it won't try to put descriptions on columns that don't exist.
The code you use to run this process looks like this:
from transforms.api import transform, Input, Output

@transform(
    my_output=Output("/my/output"),
    my_input=Input("/my/input"),
)
def my_compute_function(my_input, my_output):
    my_output.write_dataframe(
        my_input.dataframe(),
        column_descriptions={
            "col_1": "col 1 description"
        }
    )
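If your descriptions start out as a list of field/description pairs, as in the question, you can build the dict before writing; the field_descriptions name below is hypothetical:
# Hypothetical list of (column name, description) pairs, e.g. maintained
# as a constant in the repository or loaded from a reference source.
field_descriptions = [
    ("col_1", "col 1 description"),
    ("col_2", "col 2 description"),
]

# Build the dict, truncating to the 200-character limit mentioned above.
column_descriptions = {name: desc[:200] for name, desc in field_descriptions}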

How to find centrality of nodes within clusters using igraph and Python

I'm working on network analysis and I'm new to Python. I want to find the centrality of every node within a cluster using igraph and pandas.
I have tried the following:
Creating a graph:
tuples = [tuple(x) for x in data.values]
g = igraph.Graph.TupleList(tuples, directed=False, weights=True)
Community detection using the fast-greedy algorithm:
fm = g.community_fastgreedy()
fm1 = fm.as_clustering()
Clusters like this are formed:
[1549] 96650006, 966543799, 966500080
[1401] 96650006, 966567865, 966500069, 966500071
Now, I would like to get the eigenvector centrality for each number within a cluster, so that I know which is the most important number within a cluster.
I am not very familiar with the eigenvector centrality in igraph, but here is the solution I came up with:
# initial code is the same as yours
import numpy as np

# iterate over all created subgraphs:
for subgraph in fm1.subgraphs():
    # this is basically already what you want
    cents = subgraph.eigenvector_centrality()
    # additionally get the index of the maximum value
    max_idx = np.argmax(cents)
    print(subgraph.vs[max_idx])  # gets the correct vertex element
Essentially, you want to utilize the option to access the created clusters as a graph (.subgraphs() allows you to do exactly that). The rest is then "just" simple manipulation of the graph object to get the element with the respective maximum eigenvector centrality.
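Since the question mentions pandas, the loop can be extended to collect every node's centrality into a single DataFrame. This is a sketch that continues from the code above and assumes the vertices carry the default name attribute that Graph.TupleList assigns:
import pandas as pd

rows = []
for cluster_id, subgraph in enumerate(fm1.subgraphs()):
    cents = subgraph.eigenvector_centrality()
    for vertex, cent in zip(subgraph.vs, cents):
        rows.append({
            "cluster": cluster_id,
            "node": vertex["name"],  # vertex id from the original tuple list
            "centrality": cent,
        })

df = pd.DataFrame(rows)
# most central node per cluster
top_nodes = df.loc[df.groupby("cluster")["centrality"].idxmax()]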

Seq2Seq in CNTK: runtime error: Function only supports 2 dynamic axes

I am trying to implement a basic translation model in CNTK using LSTMs, where the input and output are sentences in different languages.
To achieve this I am creating the model as follows:
def create_model(x):
    with c.layers.default_options():
        m = c.layers.Recurrence(c.layers.LSTM(input_vocab_size))(x)
        m = sequence.last(m)
        y = c.layers.Recurrence(c.layers.LSTM(label_vocab_size))(m)
        return y
batch_axis = Axis.default_batch_axis()
input_seq_axis = Axis('inputAxis')
input_dynamic_axes = [batch_axis, input_seq_axis]
raw_input = input_variable(shape = (input_vocab_dim), dynamic_axes = input_dynamic_axes, name = 'raw_input')
z = create_model(raw_input)
But I am getting the following error:
RuntimeError: Currently PastValue/FutureValue Function only supports input operand with 2 dynamic axis (1 sequence-axis and 1 batch-axis)
As I understand it, dynamic axes are the axes that get decided after the data is loaded, in this case the batch size and the length of the input sentence. I don't think I am changing the dynamic axes of the input anywhere.
Any help is highly appreciated.
The last() operation strips the dynamic axis, since it reduces the input sequence to a single value (the thought vector).
The thought vector should then become the initial state for the second recurrence. So it should not be passed as the data argument to the second recurrence.
In the current version, the initial_state argument of Recurrence() cannot be data-dependent. This will soon be possible: it is already under code review and will be merged to master shortly.
Until then, there is a more complicated way to pass a data-dependent initial state, where you manually construct the recurrence (without the Recurrence() layer) and manually add the initial hidden state in the recurrence. It is illustrated in the sequence-2-sequence sample.
This might be:
input_dynamic_axes= [Axis.default_batch_axis(), Axis.default_dynamic_axis()]
The first one will be the number of samples in your minibatch; the second will be the sequence length, automagically inferred by CNTK.
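Put together with the declaration from the question, that suggestion would look roughly like this (a sketch; the value of input_vocab_dim is only an example):
from cntk import input_variable
from cntk.axis import Axis

input_vocab_dim = 943  # example vocabulary size

# Use the default batch and sequence axes, so the input carries exactly
# 1 batch axis and 1 sequence axis, as PastValue/FutureValue expect.
input_dynamic_axes = [Axis.default_batch_axis(), Axis.default_dynamic_axis()]
raw_input = input_variable(shape=(input_vocab_dim,),
                           dynamic_axes=input_dynamic_axes,
                           name='raw_input')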