Why does the loss function not decrease significantly in Flux.jl?

After experimenting with the activation function and the number of epochs, I still cannot fit the model to y, which is a function of the input data.
```julia
using Flux, Plots, Statistics

# Build a 5×100 input matrix and a 1×100 target that is a nonlinear
# function of the first input row.
x = Array{Float64}(rand(5, 100));
w = [diff(x[1, :]); 0] ./ x[1, :];
y1 = cumsum(cos.(cumsum(w)));
scatter(y1)
y = reshape(y1, (1, 100));
data = [(x, y)];

# Note: there is no activation function between the two Dense layers,
# and the trailing `identity` is a no-op.
model = Chain(Dense(5 => 100), Dense(100 => 1), identity)
model[1].weight;

loss(m, x, y) = Flux.mse(m(x), y)
Flux.mse(model(x), y)
# Sanity check: Flux.mse really is the mean of the squared errors.
Flux.mse(model(x), y) == mean((model(x) .- y) .^ 2)

opt_state = Flux.setup(Adam(), model)  # `ADAM` is the deprecated spelling of `Adam`
loss_history = []
epochs = 10000
for epoch in 1:epochs
    Flux.train!(loss, model, data, opt_state)
    # print report
    train_loss = Flux.mse(model(x), y)
    push!(loss_history, train_loss)
    println("Epoch = $epoch : Training Loss = $train_loss")
end

ŷ = model(x)
Flux.mse(model(x), y)
Y = reshape(ŷ, (100, 1));
scatter(Y)
```
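A likely culprit, though this is my reading rather than a confirmed diagnosis: the Chain above has no nonlinearity between its Dense layers, so the whole model is a single affine map and can only fit a linear function of x, no matter how many epochs it trains. A minimal PyTorch sketch of the same collapse (names and shapes are mine):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two stacked Linear layers with no activation in between...
stacked = nn.Sequential(nn.Linear(5, 100), nn.Linear(100, 1))

# ...are equivalent to ONE Linear layer whose parameters are the
# composition of the two: W = W2 @ W1, b = W2 @ b1 + b2.
W1, b1 = stacked[0].weight, stacked[0].bias
W2, b2 = stacked[1].weight, stacked[1].bias
collapsed = nn.Linear(5, 1)
with torch.no_grad():
    collapsed.weight.copy_(W2 @ W1)
    collapsed.bias.copy_(W2 @ b1 + b2)

x = torch.randn(8, 5)
print(torch.allclose(stacked(x), collapsed(x), atol=1e-5))  # True
```

If that is the issue, adding an activation, e.g. `Dense(5 => 100, relu)` in Flux, gives the model the capacity to fit a nonlinear target.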

Related

Tensor conversion requested dtype int64 for Tensor with dtype float32 when creating CNN model

I tried to create a CNN but I got an error. Could you figure out what I did wrong?
```python
import tensorflow as tf
# Assumed imports for the L/M/K aliases used below.
from tensorflow.keras import backend as K, layers as L, models as M

C1, C2 = tf.constant(70, dtype="float32"), tf.constant(1000, dtype="float32")

def score(y_true, y_pred):
    tf.dtypes.cast(y_true, tf.float32)  # note: the cast results are discarded;
    tf.dtypes.cast(y_pred, tf.float32)  # they are never assigned back
    sigma = y_pred[:, 2] - y_pred[:, 0]
    fvc_pred = y_pred[:, 1]
    # sigma_clip = sigma + C1
    sigma_clip = tf.maximum(sigma, C1)
    delta = tf.abs(y_true[:, 0] - fvc_pred)
    delta = tf.minimum(delta, C2)
    sq2 = tf.sqrt(tf.dtypes.cast(2, dtype=tf.float32))
    metric = (delta / sigma_clip) * sq2 + tf.math.log(sigma_clip * sq2)
    return K.mean(metric)

def mloss(_lambda):
    def loss(y_true, y_pred):
        # qloss is defined elsewhere in the original code
        return _lambda * qloss(y_true, y_pred) + (1 - _lambda) * score(y_true, y_pred)
    return loss

def make_model():
    z = L.Input((9,), name="Patient")
    x = L.Dense(100, activation="relu", name="d1")(z)
    x = L.Dense(100, activation="relu", name="d2")(x)
    p1 = L.Dense(3, activation="linear", name="p1")(x)
    p2 = L.Dense(3, activation="relu", name="p2")(x)
    preds = L.Lambda(lambda x: x[0] + tf.cumsum(x[1], axis=1), name="preds")([p1, p2])
    model = M.Model(z, preds, name="CNN")
    model.compile(loss=mloss(0.8), optimizer="adam", metrics=[score])
    return model

net = make_model()
net.fit(z[tr_idx], y[tr_idx], batch_size=200, epochs=1000,
        validation_data=(z[val_idx], y[val_idx]), verbose=0)
```
The error:

```
ValueError: Tensor conversion requested dtype int64 for Tensor with dtype
float32: <tf.Tensor 'CNN/preds/add:0' shape=(None, 3) dtype=float32>
```

I tried a type cast, but it didn't solve the problem.
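A plausible fix, assuming the targets `y` are integer-typed (which would make Keras feed `y_true` to the loss as int64 while the predictions are float32): cast the targets before calling `fit`. Note also that `tf.dtypes.cast` returns a new tensor, so inside `score` the results would have to be assigned back (`y_true = tf.dtypes.cast(y_true, tf.float32)`) to have any effect. A self-contained sketch of the target cast:

```python
import numpy as np

# Hypothetical integer targets, as they might come out of a DataFrame.
y = np.array([[2315, 2315, 2315], [1820, 1820, 1820]])
print(y.dtype)  # int64 -> Keras would feed y_true as int64

y = y.astype(np.float32)  # cast once, before net.fit(...)
print(y.dtype)  # float32 -> matches the model's float32 predictions
```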

Why is the network not learning with this loss?

I've been playing around a bit with PyTorch and have created a convolutional network with a total of 3 layers. I created a loss function that takes the results from the first layers and tries to minimize the norm; `view2` holds the data after those layers as a matrix. During training the error did not change at all, and the loss was equal to 1 the whole time. I know this code doesn't make much sense, but I am very interested in why it is not working.
```python
# Assumed imports and device setup.
import numpy as np
import scipy.io as sio
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from numpy import linalg as LA

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

data = sio.loadmat('ORL_32x32.mat')
x, y = data['fea'], data['gnd']
x, y = data['fea'].reshape((-1, 1, 32, 32)), data['gnd']
y = np.squeeze(y - 1)  # y in [0, 1, ..., K-1]

class ConvAutoencoder(nn.Module):
    def __init__(self):
        super(ConvAutoencoder, self).__init__()
        ## encoder layers ##
        # conv layers (depth from 1 --> 3), 3x3 kernels
        self.conv1 = nn.Conv2d(1, 3, 3)
        self.conv2 = nn.Conv2d(3, 3, 3)
        self.conv3 = nn.Conv2d(3, 3, 3)
        self.conv4 = nn.Conv2d(3, 3, 3)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        return x

    def test1(self, x):
        # output after the first two conv layers
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        return x

    def test2(self, x):
        # output after all four conv layers
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        return x

def my_loss(novi2):
    # NOTE: novi2 is a NumPy array here, so this tensor carries no gradient
    return torch.tensor(LA.norm(novi2)).to(device)

model = ConvAutoencoder().to(device)
epochs = 950
lossList = []
view2 = np.zeros((576, 400))
view3 = np.zeros((576, 400))
losses = torch.tensor(0.).to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)

if not isinstance(x, torch.Tensor):
    x = torch.tensor(x, dtype=torch.float32, device=device)
x = x.to(device)
if isinstance(y, torch.Tensor):
    y = y.to('cuda').numpy()
K = len(np.unique(y))

for epoch in range(epochs):
    view2 = np.zeros((576, 400))
    view3 = np.zeros((576, 400))
    output = model.test2(x.to(device)).cpu().detach().numpy()
    output1 = model.test1(x.to(device)).cpu().detach().numpy()
    for i in range(numclass):  # numclass is defined elsewhere in the original code
        lovro = output[i]
        lovro = lovro[[0]]
        lovro = lovro.squeeze(axis=0)
        lovro = lovro.flatten()
        for j in range(576):
            view2[j][i] = lovro[j]
    for i in range(numclass):
        lovro = output[i]
    loss = my_loss(view2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print('Epoch %02d' % (epoch))
```
The way you implemented your loss does not really look "differentiable". I put that in quotation marks because what you are observing is the difference between mathematical differentiation and backpropagation. There is no functional dependency in the underlying computation graph between your model's variables and your loss. The reason is that you copied values into a NumPy array: while your loss depends on the values of `view2`, it does not depend on the outputs of your model. You have to avoid any such value assignments when defining your computation.
```python
x = np.array([0])
x[0] = output_of_network
loss = LA.norm(x)                  # wrong: the copy breaks the graph
loss = LA.norm(output_of_network)  # correct
```
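To make this concrete, here is a minimal sketch of my own (not the answerer's code) that keeps the whole computation in PyTorch tensors, so the norm is differentiable and the weights actually receive gradients:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(0)

conv1 = nn.Conv2d(1, 3, 3)
conv2 = nn.Conv2d(3, 3, 3)
optimizer = optim.Adam(list(conv1.parameters()) + list(conv2.parameters()), lr=1e-3)

x = torch.randn(8, 1, 32, 32)  # dummy batch standing in for the ORL images

for epoch in range(5):
    features = F.relu(conv2(F.relu(conv1(x))))  # stay in torch: no .numpy(), no copies
    loss = torch.norm(features)                 # differentiable Frobenius norm
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    print(f'Epoch {epoch:02d}: loss = {loss.item():.4f}')  # now the loss actually changes
```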

What causes the model to be unable to predict using other data?

I am using ResNet50 here to create a regression model, and I ran into a problem when testing the model on other data. The dataset has 2050 rows, which I split into 1500 rows of training data and 550 rows of testing data. During training I had good results and could predict quite accurately, but when I test the model on the held-out testing data, the prediction results are bad.
Below is the model loss result (plot not included). The code:
```python
# Assumed imports.
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

Insole = pd.read_csv('1119_Rwalk40s1_list.txt', header=None, low_memory=False)
SIData = np.asarray(Insole)

df = pd.read_csv('1119_Rwalk40s1.csv', low_memory=False)
columns = ['Fx', 'Fy', 'Fz', 'Mx', 'My', 'Mz']
selected_df = df[columns]
FCDatas = selected_df[:2050]

SmartInsole = np.array(SIData)
FCData = np.array(FCDatas)

xX = SmartInsole
yY = FCData

# Note: the scalers and PCA below are fit on the FULL dataset,
# i.e. before the train/test split.
scaler_x = MinMaxScaler(feature_range=(0, 1))
scaler_x.fit(xX)
xscale = scaler_x.transform(xX)
scaler_y = MinMaxScaler(feature_range=(0, 1))
scaler_y.fit(yY)
yscale = scaler_y.transform(yY)

SIDataPCA = xscale
pca = PCA(n_components=12)
pca.fit(SIDataPCA)
SIdata_pca = pca.transform(SIDataPCA)

# For Training
trainX = SIdata_pca[:1500]
trainY = yscale[:1500]
# For Testing
testX = SIdata_pca[1500:]  # was SIdata_pca[1500], which selects a single row
testY = yscale[1500:]

X_train, X_test, y_train, y_test = train_test_split(trainX, trainY, test_size=0.20, random_state=2)
```
Below is my ResNet model structure. First, the identity block:
```python
from tensorflow.keras import layers, models  # assumed imports

def identity_block(input_tensor, units):
    x = layers.Dense(units)(input_tensor)
    x = layers.Activation('relu')(x)
    x = layers.Dense(units)(x)
    x = layers.Activation('relu')(x)
    x = layers.Dense(units)(x)
    # residual connection: the input must have the same width as x
    x = layers.add([x, input_tensor])
    x = layers.Activation('relu')(x)
    return x
```
Below is the dense block (`dens_block`):
```python
def dens_block(input_tensor, units):
    x = layers.Dense(units)(input_tensor)
    x = layers.Activation('relu')(x)
    x = layers.Dense(units)(x)
    x = layers.Activation('relu')(x)
    x = layers.Dense(units)(x)
    # project the input so the shapes match before the residual add
    shortcut = layers.Dense(units)(input_tensor)
    x = layers.add([x, shortcut])
    x = layers.Activation('relu')(x)
    return x
```
The ResNet50 model:
```python
from tensorflow.keras.optimizers import Adam  # assumed import

def ResNet50Regression():
    Res_input = layers.Input(shape=(12,))
    width = 32
    x = dens_block(Res_input, width)
    x = identity_block(x, width)
    x = identity_block(x, width)
    x = dens_block(x, width)
    x = identity_block(x, width)
    x = identity_block(x, width)
    x = dens_block(x, width)
    x = identity_block(x, width)
    x = identity_block(x, width)
    x = layers.Dense(6, activation="sigmoid")(x)
    model = models.Model(inputs=Res_input, outputs=x)
    return model

model = ResNet50Regression()
model.compile(loss='mse', optimizer=Adam(), metrics=['mse'])
history = model.fit(X_train, y_train,
                    batch_size=32,
                    epochs=50,
                    validation_data=(X_test, y_test),
                    verbose=2)
model.save('Resnet50-1203.h5')
ypred = model.predict(trainX)
```
```python
import matplotlib.pyplot as plt  # assumed import

colors = ['red', 'green', 'brown', 'teal', 'gray', 'black', 'maroon', 'orange', 'purple']
colors2 = ['green', 'red', 'orange', 'black', 'maroon', 'teal', 'blue', 'gray', 'brown']
x = np.arange(0, 1500) * 40 / 1500
for i in range(0, 6):
    plt.figure(figsize=(15, 6))
    plt.plot(x, trainY[0:1500, i], color=colors[i])
    plt.plot(x, ypred[0:1500, i], markerfacecolor='none', color=colors2[i])
    plt.title('Result for ResNet Regression (Training Data)')
    plt.ylabel(columns[i])
    plt.xlabel('Time(s)')
    plt.legend(['FP Data', 'SI Prediction'], loc='best')
    # plt.savefig('Regression Result.png'[i])
    plt.show()
```
Code for testing the model on the held-out data:
```python
from tensorflow.keras.models import load_model  # assumed import

new_model = load_model('Resnet50-1203.h5')
new_model.evaluate(testX, testY)  # was model.evaluate(...), which uses the old handle
Test_xX_model = new_model.predict(testX)

colors = ['red', 'green', 'brown', 'teal', 'gray', 'black', 'maroon', 'orange', 'purple']
colors2 = ['green', 'red', 'orange', 'black', 'maroon', 'teal', 'blue', 'gray', 'brown']
x = np.arange(0, 550) * 40 / 550
for i in range(0, 6):
    plt.figure(figsize=(15, 6))
    plt.plot(x, testY[0:550, i], color=colors[i])
    plt.plot(x, Test_xX_model[0:550, i], markerfacecolor='none', color=colors2[i])
    plt.title('Result for ResNet Regression (Testing Data)')
    plt.ylabel(columns[i])
    plt.xlabel('Time(s)')
    plt.legend(['FP Data', 'SI Prediction'], loc='best')
    plt.show()
```
One of the training-data prediction results and one of the testing-data prediction results (plots not included). What should I do in this case?
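One thing worth checking, as a guess rather than a confirmed diagnosis: in the preprocessing above, the MinMaxScaler and PCA are fit on the full 2050-row dataset before the split, so information from the test rows leaks into the training pipeline, and the sequential split means rows 1500: may lie in a regime the model never saw. A leakage-free variant (a sketch with dummy array shapes standing in for the insole and force-plate data) fits the transforms on the training rows only:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler

# Dummy stand-ins for the insole/force-plate arrays used above.
xX = np.random.rand(2050, 89)
yY = np.random.rand(2050, 6)

# Fit scalers and PCA on the TRAINING rows only...
scaler_x = MinMaxScaler().fit(xX[:1500])
scaler_y = MinMaxScaler().fit(yY[:1500])
pca = PCA(n_components=12).fit(scaler_x.transform(xX[:1500]))

# ...then apply the frozen transforms to both splits.
trainX = pca.transform(scaler_x.transform(xX[:1500]))
trainY = scaler_y.transform(yY[:1500])
testX = pca.transform(scaler_x.transform(xX[1500:]))
testY = scaler_y.transform(yY[1500:])
```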

PyTorch running out of memory when initialising CNN model

I am trying to implement a SAGEConv classifier using PyTorch Geometric. Here's the model definition:
```python
import torch
import torch.nn.functional as F
# SAGEConv added to the imports; it was used below but never imported
from torch_geometric.nn import SAGEConv, GraphConv, TopKPooling, GatedGraphConv
from torch_geometric.nn import global_mean_pool as gap, global_max_pool as gmp

embed_dim = 128

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = SAGEConv(embed_dim, 128)
        self.pool1 = TopKPooling(128, ratio=0.8)
        self.conv2 = SAGEConv(128, 128)
        self.pool2 = TopKPooling(128, ratio=0.8)
        self.conv3 = SAGEConv(128, 128)
        self.pool3 = TopKPooling(128, ratio=0.8)
        # one embedding row per item id -- this is where the allocation happens
        self.item_embedding = torch.nn.Embedding(num_embeddings=df.item_id.max() + 1,
                                                 embedding_dim=embed_dim)
        self.lin1 = torch.nn.Linear(256, 128)
        self.lin2 = torch.nn.Linear(128, 64)
        self.lin3 = torch.nn.Linear(64, 1)
        self.bn1 = torch.nn.BatchNorm1d(128)
        self.bn2 = torch.nn.BatchNorm1d(64)
        self.act1 = torch.nn.ReLU()
        self.act2 = torch.nn.ReLU()

    def forward(self, data):
        x, edge_index, batch = data.x, data.edge_index, data.batch
        x = self.item_embedding(x)
        x = x.squeeze(1)
        x = F.relu(self.conv1(x, edge_index))
        x, edge_index, _, batch, _ = self.pool1(x, edge_index, None, batch)
        x1 = torch.cat([gmp(x, batch), gap(x, batch)], dim=1)
        x = F.relu(self.conv2(x, edge_index))
        x, edge_index, _, batch, _ = self.pool2(x, edge_index, None, batch)
        x2 = torch.cat([gmp(x, batch), gap(x, batch)], dim=1)
        x = F.relu(self.conv3(x, edge_index))
        x, edge_index, _, batch, _ = self.pool3(x, edge_index, None, batch)
        x3 = torch.cat([gmp(x, batch), gap(x, batch)], dim=1)
        x = x1 + x2 + x3
        x = self.lin1(x)
        x = self.act1(x)
        x = self.lin2(x)
        x = self.act2(x)
        x = F.dropout(x, p=0.5, training=self.training)
        x = torch.sigmoid(self.lin3(x)).squeeze(1)
        return x
```
I get the following error:

```
RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:75] data.
DefaultCPUAllocator: not enough memory: you tried to allocate 603564952576
bytes. Buy new RAM!
```

when initialising the model:

```python
device = torch.device('cuda')
model = Net().to(device)
```

I am new to PyTorch. Any help would be appreciated. Thank you!
Turns out that `df.item_id.max()` was too large because I had not subset the data, so the embedding table's dimensionality was far too high.
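For scale (my arithmetic, assuming float32 weights): 603,564,952,576 bytes / (128 dims × 4 bytes) ≈ 1.18 billion embedding rows, so the raw ids must have reached into the billions. A common fix, sketched here with a hypothetical `df`, is to remap the raw ids to a contiguous 0..n-1 range so the table has one row per distinct item:

```python
import pandas as pd
import torch

# Hypothetical frame with sparse, very large raw item ids.
df = pd.DataFrame({'item_id': [7, 1_000_000_003, 42, 7]})

# Remap raw ids to 0..n_unique-1 so the embedding table stays small.
codes, uniques = pd.factorize(df['item_id'])
df['item_idx'] = codes

embedding = torch.nn.Embedding(num_embeddings=len(uniques), embedding_dim=128)
print(embedding.num_embeddings)  # 3 rows instead of ~1e9
```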

Why does Deep Adaptive Input Normalization (DAIN) normalize time series data across rows?

The DAIN paper describes how a network learns to normalize time series data by itself; here is how the authors implemented it. The code leads me to think that normalization happens across rows, not columns. Can anyone explain why it is implemented that way? I always thought one normalizes time series only across columns, to preserve each feature's true information.
Here is the piece that does the normalization:
```python
class DAIN_Layer(nn.Module):
def __init__(self, mode='adaptive_avg', mean_lr=0.00001, gate_lr=0.001, scale_lr=0.00001, input_dim=144):
super(DAIN_Layer, self).__init__()
print("Mode = ", mode)
self.mode = mode
self.mean_lr = mean_lr
self.gate_lr = gate_lr
self.scale_lr = scale_lr
# Parameters for adaptive average
self.mean_layer = nn.Linear(input_dim, input_dim, bias=False)
self.mean_layer.weight.data = torch.FloatTensor(data=np.eye(input_dim, input_dim))
# Parameters for adaptive std
self.scaling_layer = nn.Linear(input_dim, input_dim, bias=False)
self.scaling_layer.weight.data = torch.FloatTensor(data=np.eye(input_dim, input_dim))
# Parameters for adaptive scaling
self.gating_layer = nn.Linear(input_dim, input_dim)
self.eps = 1e-8
def forward(self, x):
# Expecting (n_samples, dim, n_feature_vectors)
# Nothing to normalize
if self.mode == None:
pass
# Do simple average normalization
elif self.mode == 'avg':
avg = torch.mean(x, 2)
avg = avg.resize(avg.size(0), avg.size(1), 1)
x = x - avg
# Perform only the first step (adaptive averaging)
elif self.mode == 'adaptive_avg':
avg = torch.mean(x, 2)
adaptive_avg = self.mean_layer(avg)
adaptive_avg = adaptive_avg.resize(adaptive_avg.size(0), adaptive_avg.size(1), 1)
x = x - adaptive_avg
# Perform the first + second step (adaptive averaging + adaptive scaling )
elif self.mode == 'adaptive_scale':
# Step 1:
avg = torch.mean(x, 2)
adaptive_avg = self.mean_layer(avg)
adaptive_avg = adaptive_avg.resize(adaptive_avg.size(0), adaptive_avg.size(1), 1)
x = x - adaptive_avg
# Step 2:
std = torch.mean(x ** 2, 2)
std = torch.sqrt(std + self.eps)
adaptive_std = self.scaling_layer(std)
adaptive_std[adaptive_std <= self.eps] = 1
adaptive_std = adaptive_std.resize(adaptive_std.size(0), adaptive_std.size(1), 1)
x = x / (adaptive_std)
elif self.mode == 'full':
# Step 1:
avg = torch.mean(x, 2)
adaptive_avg = self.mean_layer(avg)
adaptive_avg = adaptive_avg.resize(adaptive_avg.size(0), adaptive_avg.size(1), 1)
x = x - adaptive_avg
# # Step 2:
std = torch.mean(x ** 2, 2)
std = torch.sqrt(std + self.eps)
adaptive_std = self.scaling_layer(std)
adaptive_std[adaptive_std <= self.eps] = 1
adaptive_std = adaptive_std.resize(adaptive_std.size(0), adaptive_std.size(1), 1)
x = x / adaptive_std
# Step 3:
avg = torch.mean(x, 2)
gate = F.sigmoid(self.gating_layer(avg))
gate = gate.resize(gate.size(0), gate.size(1), 1)
x = x * gate
else:
assert False
return x
```
I am not sure either, but they do transpose in the `forward` function of the MLP class (`x = x.transpose(1, 2)`). So it seems to me that they normalize over time for each feature.
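A small sketch of my own (not from the repo) showing why `torch.mean(x, 2)`, with the expected `(n_samples, dim, n_feature_vectors)` layout, averages over the time axis and yields one mean per feature rather than one per time step:

```python
import torch

# (n_samples, dim, n_feature_vectors) = (1 sample, 2 features, 3 time steps)
x = torch.tensor([[[1., 2., 3.],        # feature 0 over time
                   [10., 20., 30.]]])   # feature 1 over time

avg = torch.mean(x, 2)  # mean over the last (time) axis
print(avg)              # tensor([[ 2., 20.]]) -- one mean per feature

# Subtracting it centers each feature's series, not each time step:
print(x - avg.unsqueeze(2))
```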