Build network with shortcut using torch - deep-learning

I now have a network with 2 inputs X and Y.
X is concatenated with Y and then passed through the network to get result1. At the same time, X is concatenated with result1 as a shortcut.
It's easy if there is only one input.
branch = nn.Sequential()
branch:add(....) --some layers
net = nn.Sequential()
net:add(nn.ConcatTable():add(nn.Identity()):add(branch))
net:add(...)
But when it comes to two inputs I don't actually know how to do it. Besides, nngraph is not allowed. Does anyone know how to do it?

You can use the table modules; have a look at this page: https://github.com/torch/nn/blob/master/doc/table.md
net = nn.Sequential()
triple = nn.ParallelTable()
duplicate = nn.ConcatTable()
duplicate:add(nn.Identity())
duplicate:add(nn.Identity())
triple:add(duplicate)
triple:add(nn.Identity())
net:add(triple)
net:add(nn.FlattenTable())
-- at this point the network transforms {X,Y} into {X,X,Y}
separate = nn.ConcatTable()
separate:add(nn.SelectTable(1))
separate:add(nn.NarrowTable(2,2))
net:add(separate)
-- now you get {X,{X,Y}}
parallel_XY = nn.ParallelTable()
parallel_XY:add(nn.Identity()) -- preserves X
parallel_XY:add(...) -- whatever you want to do from {X,Y}
net:add(parallel_XY)
parallel_Xresult = nn.ParallelTable()
parallel_Xresult:add(...) -- whatever you want to do from {X,result}
net:add(parallel_Xresult)
output = net:forward({X,Y})
The idea is to start with {X,Y}, duplicate X, and then do your operations. This is clearly a bit convoluted; nngraph exists precisely to make this kind of wiring easier.
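Not part of the original question, but for readers using Python/PyTorch rather than Lua torch, here is a minimal sketch of the same data flow; the layer types and sizes are assumptions for illustration only:
import torch
import torch.nn as nn

class ShortcutNet(nn.Module):
    # all dimensions below are illustrative assumptions
    def __init__(self, x_dim=8, y_dim=4, hidden=16):
        super().__init__()
        self.branch = nn.Sequential(              # plays the role of "branch" above
            nn.Linear(x_dim + y_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden))
        self.head = nn.Linear(x_dim + hidden, 1)  # consumes the shortcut concat

    def forward(self, x, y):
        result1 = self.branch(torch.cat([x, y], dim=1))  # X concat Y -> branch -> result1
        shortcut = torch.cat([x, result1], dim=1)        # X concat result1 (the shortcut)
        return self.head(shortcut)

out = ShortcutNet()(torch.randn(2, 8), torch.randn(2, 4))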

Related

Sequence to Sequence Loss

I'm trying to figure out how sequence to sequence loss is calculated. I am using the huggingface transformers library in this case, but this might actually be relevant to other DL libraries.
So to get the required data we can do:
from transformers import EncoderDecoderModel, BertTokenizer
import torch
import torch.nn.functional as F
torch.manual_seed(42)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
MAX_LEN = 128
tokenize = lambda x: tokenizer(x, max_length=MAX_LEN, truncation=True, padding=True, return_tensors="pt")
model = EncoderDecoderModel.from_encoder_decoder_pretrained('bert-base-uncased', 'bert-base-uncased') # initialize Bert2Bert from pre-trained checkpoints
input_seq = ["Hello, my dog is cute", "my cat cute"]
output_seq = ["Yes it is", "ok"]
input_tokens = tokenize(input_seq)
output_tokens = tokenize(output_seq)
outputs = model(
    input_ids=input_tokens["input_ids"],
    attention_mask=input_tokens["attention_mask"],
    decoder_input_ids=output_tokens["input_ids"],
    decoder_attention_mask=output_tokens["attention_mask"],
    labels=output_tokens["input_ids"],
    return_dict=True)
idx = output_tokens["input_ids"]
logits = F.log_softmax(outputs["logits"], dim=-1)
mask = output_tokens["attention_mask"]
Edit 1
Thanks to @cronoik I was able to replicate the loss calculated by huggingface as:
output_logits = logits[:,:-1,:]
output_mask = mask[:,:-1]
label_tokens = output_tokens["input_ids"][:, 1:].unsqueeze(-1)
select_logits = torch.gather(output_logits, -1, label_tokens).squeeze()
huggingface_loss = -select_logits.mean()
However, since the last two tokens of the second target sequence are just padding, shouldn't we calculate the loss as:
seq_loss = (select_logits * output_mask).sum(dim=-1, keepdims=True) / output_mask.sum(dim=-1, keepdims=True)
seq_loss = -seq_loss.mean()
^This takes into account the length of each output sequence and masks out the padding. I think this is especially useful when we have batches of varying-length outputs.
OK, I found out where I was making the mistakes, all thanks to this thread in the HuggingFace forum.
The output labels need to be set to -100 at the masked positions; the transformers library does not do that for you.
One silly mistake I made was with the mask: it should have been output_mask = mask[:, 1:] instead of mask[:, :-1].
1. Using Model
We need to set the labels to -100 at the masked positions of the output. It is important to use clone, as shown below:
labels = output_tokens["input_ids"].clone()
labels[output_tokens["attention_mask"]==0] = -100
outputs = model(
    input_ids=input_tokens["input_ids"],
    attention_mask=input_tokens["attention_mask"],
    decoder_input_ids=output_tokens["input_ids"],
    decoder_attention_mask=output_tokens["attention_mask"],
    labels=labels,
    return_dict=True)
2. Calculating Loss
So the final way to replicate it is as follows:
idx = output_tokens["input_ids"]
logits = F.log_softmax(outputs["logits"], dim=-1)
mask = output_tokens["attention_mask"]
# shift things
output_logits = logits[:,:-1,:]
label_tokens = idx[:, 1:].unsqueeze(-1)
output_mask = mask[:,1:]
# gather the logits and mask
select_logits = torch.gather(output_logits, -1, label_tokens).squeeze()
-select_logits[output_mask==1].mean(), outputs["loss"]
The above, however, ignores the fact that the tokens come from two different sequences. So an alternate way of calculating the loss could be:
seq_loss = (select_logits * output_mask).sum(dim=-1, keepdims=True) / output_mask.sum(dim=-1, keepdims=True)
seq_loss.mean()
Thanks for sharing. However, the current version of transformers (as of today) actually does not "shift" anymore, so the following is no longer needed:
# shift things
output_logits = logits[:,:-1,:]
label_tokens = idx[:, 1:].unsqueeze(-1)
output_mask = mask[:,1:]
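As a sanity check on the calculation above, the loss can also be reproduced with plain cross entropy over the -100-masked labels. This is a sketch reusing outputs and labels from the snippets above, and it assumes a transformers version that uses the labels unshifted (as noted in the comment above); the variable names here are just illustrative:
# reproduce outputs["loss"] with standard cross entropy on the -100-masked labels
vocab_size = outputs["logits"].shape[-1]
manual_loss = F.cross_entropy(
    outputs["logits"].reshape(-1, vocab_size),  # (batch * seq_len, vocab)
    labels.reshape(-1),                         # (batch * seq_len,)
    ignore_index=-100)                          # padded positions are skipped
print(manual_loss, outputs["loss"])             # should match up to floating-point error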

anova_test not returning Mauchly's for three way within subject ANOVA

I am using a data set called sleep (found here: https://drive.google.com/file/d/15ZnsWtzbPpUBQN9qr-KZCnyX-0CYJHL5/view) to run a three way within subject ANOVA comparing Performance based on Stimulation, Deprivation, and Time. I have successfully done this before using anova_test from rstatix. I want to look at the sphericity output but it doesn't appear in the output. I have got it to come up with other three way within subject datasets, so I'm not sure why this is happening. Here is my code:
anova_test(data = sleep, dv = Performance, wid = Subject, within = c(Stimulation, Deprivation, Time))
I also tried to save it to an object and use get_anova_table, but that didn't look any different.
sleep_aov <- anova_test(data = sleep, dv = Performance, wid = Subject, within = c(Stimulation, Deprivation, Time))
get_anova_table(sleep_aov, correction = "GG")
This is an ideal dataset I pulled from the internet, so I'm starting to think the data had a W of 1 (perfect sphericity) and so rstatix is skipping this output. Is this something anova_test does?
Here also is my code using a dataset that does return Mauchly's:
weight_loss_long <- pivot_longer(data = weightloss, cols = c(t1, t2, t3), names_to = "time", values_to = "loss")
weight_loss_long$time <- factor(weight_loss_long$time)
anova_test(data = weight_loss_long, dv = loss, wid = id, within = c(diet, exercises, time))
Not an expert at all, but it might be because your factors have only two levels.
From anova_summary() help:
"Value
Return an object of class anova_test: a data frame containing the ANOVA table for independent measures ANOVA. However, for repeated/mixed measures ANOVA, a list containing the following components is returned:
ANOVA: a data frame containing ANOVA results
Mauchly's Test for Sphericity: If any within-Ss variables with more than 2 levels are present, a data frame containing the results of Mauchly's test for Sphericity. Only reported for effects that have more than 2 levels because sphericity necessarily holds for effects with only 2 levels.
Sphericity Corrections: If any within-Ss variables are present, a data frame containing the Greenhouse-Geisser and Huynh-Feldt epsilon values, and corresponding corrected p-values. "

Trying to define a function that creates lists from files and uses random.choices to choose an element from the weighted lists

I'm trying to define a function that will create lists from multiple text files and print a random element from one of the weighted lists. I've managed to get the function to work with random.choice for a single list.
def test_rollitems():
    my_commons = open('common.txt')
    all_common_lines = my_commons.readlines()
    common = []
    for i in all_common_lines:
        common.append(i)
    y = random.choice(common)
    print(y)
When I tried adding a second list to the function it wouldn't work and my program just closes when the function is called.
def Improved_rollitem():
    # create the lists from the files
    my_commons = open('common.txt')
    all_common_lines = my_commons.readlines()
    common = []
    for i in all_common_lines:
        common.append(i)
    my_uncommons = open('uncommon.txt')
    all_uncommon_lines = my_uncommons.readlines()
    uncommon = []
    for i in all_uncommon_lines:
        uncommon.apend(i)
    y = random.choices([common,uncommon], [80,20])
    print(y)
Can anyone offer any insight into what I'm doing wrong or missing?
Never mind, I figured this out on my own! I was having issues with Geany, so I installed PyCharm and was able to work through the issue. The correct code is:
def Improved_rollitem():
    # create the lists from the files
    my_commons = open('common.txt')
    all_common_lines = my_commons.readlines()
    common = []
    for i in all_common_lines:
        common.append(i)
    my_uncommons = open('uncommon.txt')
    all_uncommon_lines = my_uncommons.readlines()
    uncommon = []
    for i in all_uncommon_lines:
        uncommon.append(i)
    y = random.choices([common,uncommon], [.8,.20])
    if y == [common]:
        for i in [common]:
            print(random.choice(i))
    if y == [uncommon]:
        for i in [uncommon]:
            print(random.choice(i))
If there's a better way to do something like this, it would certainly be cool to know though.
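For what it's worth, here is a more compact sketch of the same idea, using the same file names and 80/20 weighting as above; the function name is just illustrative. random.choices returns a one-element list, so [0] unwraps the chosen pool before an item is picked from it:
import random

def roll_item(common_path='common.txt', uncommon_path='uncommon.txt'):
    with open(common_path) as f:
        common = f.read().splitlines()
    with open(uncommon_path) as f:
        uncommon = f.read().splitlines()
    pool = random.choices([common, uncommon], weights=[80, 20])[0]  # pick a pool by weight
    return random.choice(pool)                                      # then pick an item from it

print(roll_item())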

How to get dataset into array

I have worked through all the tutorials and searched for "load csv tensorflow" but just can't get the logic of it all. I'm not a total beginner, but I don't have much time to complete this, and I've been suddenly thrown into TensorFlow, which is unexpectedly difficult.
Let me lay it out:
A very simple CSV file of 184 columns that are all float numbers. A row is simply today's price, three buy signals, and the previous 180 days' prices:
close = tf.placeholder(float, name='close')
signals = tf.placeholder(bool, shape=[3], name='signals')
previous = tf.placeholder(float, shape=[180], name = 'previous')
This article: https://www.tensorflow.org/guide/datasets
It covers how to load the data pretty well. It even has a section on converting to numpy arrays, which is what I need to train and test the net. However, as the author says in the article leading to this web page, it is pretty complex. It seems like everything is geared toward data manipulation, whereas our data is already normalized (nothing has really changed in AI since 1983 in terms of inputs, outputs, and layers).
Here is a way to load it, but not into NumPy, and with no example of using the data without manipulating it.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    with open('/BTC1.csv') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        line_count = 0
        for row in csv_reader:
            ?????????
            line_count += 1
I need to know how to get the csv file into the
close = tf.placeholder(float, name='close')
signals = tf.placeholder(bool, shape=[3], name='signals')
previous = tf.placeholder(float, shape=[180], name = 'previous')
so that I can follow the tutorials to train and test the net.
Your question is not entirely clear to me. If I understand correctly (tell me if I'm wrong), you are asking how to feed data into your model? There are several ways to do so.
Use placeholders with feed_dict during the session. This is the most basic and easiest way, but it often suffers from training performance issues. For further explanation, check this post.
Use queues. Hard to implement and badly documented; I don't suggest it, because it has been superseded by the third method.
tf.data API.
...
So to answer your question by the first method:
# get your array outside the session
import csv
import numpy as np

with open('/BTC1.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    dataset = np.asarray([row for row in csv_reader], dtype=np.float32)  # parse the strings as floats
close_col = dataset[:, 0]        # today's price
signal_cols = dataset[:, 1:4]    # the three buy signals
previous_cols = dataset[:, 4:]   # the previous 180 days of prices
# let's say you load 100 rows each time for training
batch_size = 100
# define your placeholders as before
...
with tf.Session() as sess:
    ...
    for i in range(number_iter):
        start = i * batch_size
        end = (i + 1) * batch_size
        sess.run(train_operation, feed_dict={close: close_col[start:end],
                                             signals: signal_cols[start:end],
                                             previous: previous_cols[start:end]})
By the third method:
# retrieve your columns like before (e.g. wrap the CSV reading above in a function)
...
# let's say you load 100 rows each time for training
batch_size = 100
# construct your input pipeline
c_col, s_col, p_col = wrapper(filename)
batch = tf.data.Dataset.from_tensor_slices((c_col, s_col, p_col))
batch = batch.shuffle(c_col.shape[0]).batch(batch_size)  # mix the data, then assemble batches
iterator = batch.make_initializable_iterator()
iter_init_operation = iterator.initializer
c_it, s_it, p_it = iterator.get_next()  # get-next-batch op, evaluated automatically at each iteration within the session
# replace your close, signals, previous placeholders in your model by c_it, s_it, p_it when you define your model
...
with tf.Session() as sess:
    # you need to initialize the variables and the iterator
    sess.run([tf.global_variables_initializer(), iter_init_operation])
    ...
    for i in range(number_iter):
        sess.run(train_operation)  # no feed_dict needed, the iterator feeds the batches
Good luck!
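As a side note, if the CSV really is purely numeric, the loading step can also be done in a single numpy call. This is a minimal sketch; the path comes from the question, and the absence of a header row is an assumption:
import numpy as np

data = np.loadtxt('/BTC1.csv', delimiter=',', dtype=np.float32)  # shape (num_rows, 184)
close_col = data[:, 0]        # today's price
signal_cols = data[:, 1:4]    # the three buy signals
previous_cols = data[:, 4:]   # the previous 180 days of prices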

Is there an easy way to sort regressors by the impact they make in Stata?

Instead of the traditional regression output, I want to get a table with two columns A and B. Column A contains a list of regressors and column B contains their impacts which equal:
b_hat(x) / sigma(x)
where b_hat(x) is the marginal effect on the dependent variable of a 1 unit change in x, and sigma(x) is the standard deviation of x.
It would be great if the list is sorted by impact.
This is a blunt-force way of doing what you want. You can get the idea and use it to write some sort of custom program to do something like this on the fly. It is really rough, so I would appreciate comments to make it cleaner or more robust; right now it would be pretty sensitive to rather minor things.
tempfile temp
sysuse auto, clear
local set "headroom trunk length turn displacement "
foreach var of varlist `set' {
egen `var'_std=std(`var')
}
local set "headroom_std trunk_std length_std turn_std displacement_std "
reg mpg `set'
preserve
mat A=e(b)
clear
set obs 1
g name = "`set'"+"cons"
split name,p(" ")
drop name
g index=_n
reshape long name,i(index) j(num) string
save `temp'
clear
svmat A
g index=_n
reshape long A,i(index) j(num)
tostring num, replace
merge 1:1 num using `temp', nogen assert(3)
drop if name=="cons"
gsort -A
replace num=string(_n)
keep num name index
reshape wide name,i(index) j(num) string
egen ordered=concat(name1-name5),p(" ")
local set2 =ordered[1]
dis "`set'"
dis "`set2'"
restore
reg mpg `set2'
Here's an approach in which the variable names are entered only once. It uses the -mm_ranks- command from Ben Jann's MOREMATA package at SSC and puts the sorted result into a new Stata data set.
sysuse auto, clear
local lhs turn
local rhs length foreign weight
putmata a = (`rhs'), view replace
mata: st_matrix("sd",sqrt(diagonal(variance(a))))
reg `lhs' `rhs'
matrix b = r(table)'
matrix b = b[1..rowsof(b)-1,1]
mata: c = abs(st_matrix("b"):/st_matrix("sd"))
mata: rank = rows(c):-mm_ranks(c):+1 /* SSC package MOREMATA */
mata: st_matrix("n",(c,rank))
mat n = (b, sd, n)
mat colnames n = beta sd impact rank
clear
svmat n , names(col)
tempfile t1
save `t1'
clear
mat np = n'
svmat np, names(col)
keep in 1
xpose, varname clear
keep _varname
qui merge 1:1 _n using `t1'
drop _merge
sort rank
list