I have a model fitted with data, but I am having trouble using the predict function.
d = {'df_Size': [1, 3, 5, 8, 10, 15, 18], 'RAM': [3676, 6532, 9432, 13697, 16633, 23620, 27990]}
df = pd.DataFrame(data=d)
df
X = np.array(df['df_Size']).reshape(-1, 1)
y = np.array(df['RAM']).reshape(-1, 1)
model = LinearRegression()
model.fit(X, y)
print(model.score(X, y))
Then, when I try to predict:
X_Size = 25
X_Size
prediction = model.predict(X_Size)
I get the following error:
ValueError: Expected 2D array, got scalar array instead:
array=25.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
I think I am passing the 25 in the wrong format, but I would just like some help on getting the predicted RAM for a df_Size of 25.
Thanks,
You need to pass the predictor in the same shape as the training data (basically a single column):
X.shape
Out[11]: (7, 1)
You can do:
model.predict(np.array(25).reshape(1,1))
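For reference, here is a minimal end-to-end sketch; it just restates the code above with the imports spelled out:

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

d = {'df_Size': [1, 3, 5, 8, 10, 15, 18],
     'RAM': [3676, 6532, 9432, 13697, 16633, 23620, 27990]}
df = pd.DataFrame(data=d)

X = np.array(df['df_Size']).reshape(-1, 1)  # shape (7, 1): one column, one feature
y = np.array(df['RAM']).reshape(-1, 1)

model = LinearRegression()
model.fit(X, y)

# predict() expects a 2D array: one row per sample, one column per feature
prediction = model.predict(np.array([[25]]))  # shape (1, 1)
print(prediction)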
I am trying to implement the dot-product score and, more generally, similarity scores between the encoder output and the decoder hidden state in Keras.
I have the idea to use tf.keras.layers.dot(encoder_output, decoder_state) to calculate the product score, but there is an error in the multiplication of these two values.
class Attention(tf.keras.Model):
    def __init__(self, units):
        super().__init__()
        self.units = units

    def call(self, decoder_state, encoder_output):
        score = tf.keras.layers.dot([encoder_output, decoder_state], axes=[2, 1])
        attention_weights = tf.nn.softmax(score, axis=1)
        context_vector = attention_weights * encoder_output
        context_vector = tf.reduce_sum(context_vector, axis=1)
        return context_vector, attention_weights
batch_size = 16
units = 32
input_length = 20
decoder_state = tf.random.uniform(shape=[batch_size, units])
encoder_output = tf.random.uniform(shape=[batch_size, input_length, units])
attention = Attention(units)
context_vector, attention_weights = attention(decoder_state, encoder_output)
I am getting the following error:
/usr/local/lib/python3.6/dist-packages/six.py in raise_from(value, from_value)
InvalidArgumentError: Incompatible shapes: [16,20] vs. [16,20,32] [Op:Mul]
It is probably a very simple fix, but as I am new to this I am not able to work out the exact method that needs to be called here.
I have tried reshaping the values of encoder_output, but this still does not work.
I would appreciate help fixing this.
I am just putting @Ayush Srivastava's comment as a response so that the post gets an answer.
Basically, the error occurs because you are trying to multiply two tensors (namely attention_weights and encoder_output) with different shapes, so you need to reshape decoder_state first.
Here is the full answer:
class Attention(tf.keras.Model):
    def __init__(self, units):
        super().__init__()
        self.units = units

    def call(self, decoder_state, encoder_output):
        decoder_state = tf.keras.layers.Reshape((decoder_state.shape[1], 1))(decoder_state)
        score = tf.keras.layers.dot([encoder_output, decoder_state], axes=[2, 1])
        attention_weights = tf.nn.softmax(score, axis=1)
        context_vector = attention_weights * encoder_output
        context_vector = tf.reduce_sum(context_vector, axis=1)
        return context_vector, attention_weights
Shapes:
decoder_state before reshape: (16, 32)
decoder_state after reshape: (16, 32, 1)
encoder_output: (16, 20, 32)
score: (16, 20, 1)
attention_weights: (16, 20, 1)
context_vector before sum: (16, 20, 32)
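As a quick sanity check (a sketch, assuming the fixed Attention class above is in scope and TF 2.x eager execution), you can rerun the test harness from the question and print the resulting shapes:

import tensorflow as tf

batch_size, units, input_length = 16, 32, 20
decoder_state = tf.random.uniform(shape=[batch_size, units])
encoder_output = tf.random.uniform(shape=[batch_size, input_length, units])

attention = Attention(units)
context_vector, attention_weights = attention(decoder_state, encoder_output)
print(context_vector.shape)     # expected: (16, 32)
print(attention_weights.shape)  # expected: (16, 20, 1)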
I am trying to define a function in which I want part of the expression capped at a maximum value. I try to do this using min(), but it returns:
The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
My code:
def f(x, beta):
    K_w = (1 + ((0.5*D) / (0.5*D + x))**2)**2
    K_c = min(11, (3.5*(x/D)**(-0.5)))  # <-- this is what gives me the problem. It should cap K_c at 11, but that does not work.
    K_tot = (K_c**2 + K_w**2 + 2*K_c*K_w*np.cos(beta))**0.5
    return K_tot
x = np.linspace(0, 50, 100)
beta = np.linspace(0, 3.14, 180)
X, Y = np.meshgrid(x, beta)
Z = f(X, Y)
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.contour3D(X, Y, Z, 100, cmap = 'viridis')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z');
I expected K_c to be limited to 11, but it gave a
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
I might be making a rookie mistake, but help is much appreciated!
Consider using np.clip, whose documentation can be found in the NumPy reference:
np.clip(3.5*(x/D)**(-0.5), None, 11)
for your case.
For example,
>>> import numpy as np
>>> np.clip([1, 2, 3, 15], None, 11)
array([ 1, 2, 3, 11])
The problem with your code is that min is comparing a number with an array, which is not what it expects: the comparison produces a boolean array, whose overall truth value is ambiguous.
Alternatively, here is a list comprehension approach:
A = [1, 2, 3, 15]
B = [min(11, a) for a in A]
print(B)
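Applied to the function in the question, a sketch of the fix might look like this (D is not defined in the question, so a placeholder value is used here for illustration):

import numpy as np

D = 1.0  # placeholder; substitute your actual value of D

def f(x, beta):
    K_w = (1 + ((0.5*D) / (0.5*D + x))**2)**2
    K_c = np.clip(3.5 * (x/D)**(-0.5), None, 11)  # element-wise cap at 11
    # note: x = 0 still triggers a divide-by-zero warning, as in the original code
    K_tot = (K_c**2 + K_w**2 + 2*K_c*K_w*np.cos(beta))**0.5
    return K_tot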
I am new to TensorFlow (and this is my first question on Stack Overflow).
As a learning tool, I am trying to do something simple. (4 days later I am still confused)
I have one CSV file with 36 columns (3500 records) with 0s and 1s.
I am envisioning this file as a flattened 6x6 matrix.
I have another CSV file with 1 column of ground truth, 0 or 1 (3500 records), which indicates whether at least 4 of the 6 elements on the 6x6 matrix's diagonal are 1s.
I am not sure I have processed the CSV files correctly.
I am confused as to how to create the features dictionary and labels, and how they fit into a DNNClassifier.
I am using TensorFlow 1.6, Python 3.6
Below is the small amount of code I have so far.
import tensorflow as tf
import os
def x_map(line):
    rDefaults = [[] for cl in range(36)]
    x_row = tf.decode_csv(line, record_defaults=rDefaults)
    return x_row

def y_map(line):
    line = tf.string_to_number(line, out_type=tf.int32)
    y_row = tf.one_hot(line, depth=2)
    return y_row
x_path_file = os.path.join('D:', 'Diag', '6x6_train.csv')
y_path_file = os.path.join('D:', 'Diag', 'HasDiag_train.csv')
filenames = [x_path_file]
x_dataset = tf.data.TextLineDataset(filenames)
x_dataset = x_dataset.map(x_map)
x_dataset = x_dataset.batch(1)
x_iter = x_dataset.make_one_shot_iterator()
x_next_el = x_iter.get_next()
filenames = [y_path_file]
y_dataset = tf.data.TextLineDataset(filenames)
y_dataset = y_dataset.map(y_map)
y_dataset = y_dataset.batch(1)
y_iter = y_dataset.make_one_shot_iterator()
y_next_el = y_iter.get_next()
init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    x_el = sess.run(x_next_el)
    y_el = sess.run(y_next_el)
The output for x_el is:
(array([1.], dtype=float32), array([1.], dtype=float32), array([1.], dtype=float32), array([1.], dtype=float32), array([1.], dtype=float32), array([0.] ... it goes on...
The output for y_el is:
[[1. 0.]]
You're pretty much there for a minimal working model. The main issue I see is that tf.decode_csv returns a tuple of tensors, whereas I expect you want a single tensor with all values. Easy fix:
x_row = tf.stack(tf.decode_csv(line, record_defaults=rDefaults))
That should work... but it fails to take advantage of many of the awesome things the tf.data.Dataset API has to offer, like shuffling, parallel threading etc. For example, if you shuffle each dataset, those shuffling operations won't be consistent. This is because you've created two separate datasets and manipulated them independently. If you create them independently, zip them together then manipulate, those manipulations will be consistent.
Try something along these lines:
def get_inputs(
        count=None, shuffle=True, buffer_size=1000, batch_size=32,
        num_parallel_calls=8, n_dims=6,
        x_paths=[x_path_file], y_paths=[y_path_file]):
    """
    Get x, y inputs.

    Args:
        count: number of epochs. None indicates infinite epochs.
        shuffle: whether or not to shuffle the dataset
        buffer_size: used in shuffle
        batch_size: size of batch. See outputs below
        num_parallel_calls: used in map. Note if > 1, intra-batch ordering
            will be shuffled
        n_dims: side length of the square grid (6 for the 6x6 case)
        x_paths: list of paths to x-value files.
        y_paths: list of paths to y-value files.

    Returns:
        x: (batch_size, n_dims**2) tensor of flattened grid values
        y: (batch_size, 2) tensor of 1-hot labels
    """
    def x_map(line):
        rDefaults = [[] for cl in range(n_dims**2)]
        x_row = tf.stack(tf.decode_csv(line, record_defaults=rDefaults))
        return x_row

    def y_map(line):
        line = tf.string_to_number(line, out_type=tf.int32)
        y_row = tf.one_hot(line, depth=2)
        return y_row

    def xy_map(x, y):
        return x_map(x), y_map(y)

    x_ds = tf.data.TextLineDataset(x_paths)
    y_ds = tf.data.TextLineDataset(y_paths)
    combined = tf.data.Dataset.zip((x_ds, y_ds))
    combined = combined.repeat(count=count)
    if shuffle:
        combined = combined.shuffle(buffer_size)
    combined = combined.map(xy_map, num_parallel_calls=num_parallel_calls)
    combined = combined.batch(batch_size)
    x, y = combined.make_one_shot_iterator().get_next()
    return x, y
To experiment/debug,
x, y = get_inputs()
with tf.Session() as sess:
    xv, yv = sess.run((x, y))
    print(xv.shape, yv.shape)
For use in an estimator, pass the function itself.
estimator.train(get_inputs, max_steps=10000)
def get_eval_inputs():
    return get_inputs(
        count=1, shuffle=False,
        x_paths=[x_eval_paths],
        y_paths=[y_eval_paths])

estimator.evaluate(get_eval_inputs)
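If you specifically want a DNNClassifier, one possible hookup is sketched below. This goes beyond the answer above and makes assumptions about your setup: canned estimators expect a features dictionary, so x is wrapped under a key ('x' here is just a placeholder name) with a matching numeric column, and the one-hot y is converted back to class indices:

def train_input_fn():
    x, y = get_inputs()
    # canned estimators expect (features_dict, labels); labels as class indices, not one-hot
    return {'x': x}, tf.argmax(y, axis=-1)

feature_columns = [tf.feature_column.numeric_column('x', shape=(36,))]
estimator = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[64, 64],
    n_classes=2)
estimator.train(train_input_fn, max_steps=10000)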
I am trying to create a CNN implemented with data augmentation in pytorch to classify dogs and cats. The issue that I am having is that when I try to input my dataset and enumerate through it I keep getting this error:
Traceback (most recent call last):
File "<ipython-input-55-6337e0536bae>", line 75, in <module>
for i, (inputs, labels) in enumerate(trainloader):
File "/usr/local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 188, in __next__
batch = self.collate_fn([self.dataset[i] for i in indices])
File "/usr/local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 188, in <listcomp>
batch = self.collate_fn([self.dataset[i] for i in indices])
File "/usr/local/lib/python3.6/site-packages/torchvision/datasets/folder.py", line 124, in __getitem__
img = self.transform(img)
File "/usr/local/lib/python3.6/site-packages/torchvision/transforms/transforms.py", line 42, in __call__
img = t(img)
File "/usr/local/lib/python3.6/site-packages/torchvision/transforms/transforms.py", line 147, in __call__
return F.resize(img, self.size, self.interpolation)
File "/usr/local/lib/python3.6/site-packages/torchvision/transforms/functional.py", line 197, in resize
return img.resize((ow, oh), interpolation)
File "/usr/local/lib/python3.6/site-packages/PIL/Image.py", line 1724, in resize
raise ValueError("unknown resampling filter")
ValueError: unknown resampling filter
and I really don't know what's wrong with my code. I have provided the code below:
# Creating the CNN
# Importing the libraries
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.autograd import Variable
import torchvision
from torchvision import transforms
#Creating the CNN Model
class CNN(nn.Module):
    def __init__(self, nb_outputs):
        super(CNN, self).__init__() # activates the inheritance and allows the use of all the tools in nn.Module
        # making the 2 convolutional layers that will be used in the convolutional neural network
        self.convolution1 = nn.Conv2d(in_channels = 1, out_channels = 32, kernel_size = 5) # kernel_size -> the dimension of the feature detector, e.g. kernel_size = 5 => feature detector of size 5x5
        self.convolution2 = nn.Conv2d(in_channels = 32, out_channels = 64, kernel_size = 2)
        # making 2 full connections, one to connect the inputs of the ANN to the hidden layer and another to connect the hidden layer to the outputs of the ANN
        self.fc1 = nn.Linear(in_features = self.count_neurons((1, 64, 64)), out_features = 40)
        self.fc2 = nn.Linear(in_features = 40, out_features = nb_outputs)

    def count_neurons(self, image_dim):
        x = Variable(torch.rand(1, *image_dim)) # this variable represents a fake image that lets us compute the number of neurons
        # to pass the elements of the tuple image_dim into the function as separate arguments we add a * before image_dim
        # since x will be going into our neural network we convert it into a torch Variable using the Variable() function
        x = F.relu(F.max_pool2d(self.convolution1(x), 3, 2)) # apply the convolution to x, then max pooling, then activate all the neurons in the pooling layer
        x = F.relu(F.max_pool2d(self.convolution2(x), 3, 2)) # the signals are now propagated up to the second convolutional layer
        # now flatten x to obtain the number of neurons in the flattening layer
        return x.data.view(1, -1).size(1) # this flattens x into one long vector and returns its size, i.e. the number of neurons that will be input to the ANN
        # even though x is not a real image, the size of the flattened vector only depends on the dimensions of the input image, so x just needs the same dimensions as the image

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.convolution1(x), 3, 2)) # apply the convolution to x, then max pooling, then activate all the neurons in the pooling layer
        x = F.relu(F.max_pool2d(self.convolution2(x), 3, 2))
        # flattening layer of the CNN
        x = x.view(x.size(0), -1)
        # x is now the input to the ANN
        x = F.relu(self.fc1(x)) # propagate the signals from the flattened layer to the fully connected layer and activate the neurons by breaking linearity with the relu function
        x = F.sigmoid(self.fc2(x))
        # x is now the output neurons of the ANN
        return x
train_tf = transforms.Compose([transforms.RandomHorizontalFlip(),
                               transforms.Resize(64,64),
                               transforms.RandomRotation(20),
                               transforms.RandomGrayscale(.2),
                               transforms.ToTensor()])
test_tf = transforms.Compose([transforms.Resize(64,64),
                              transforms.ToTensor()])

training_set = torchvision.datasets.ImageFolder(root = './dataset/training_set',
                                                transform = train_tf)
test_set = torchvision.datasets.ImageFolder(root = './dataset/test_set',
                                            transform = test_tf)

trainloader = torch.utils.data.DataLoader(training_set, batch_size = 32,
                                          shuffle = True, num_workers = 0)
testloader = torch.utils.data.DataLoader(test_set, batch_size = 32,
                                         shuffle = False, num_workers = 0)
#training the model
cnn = CNN(1)
cnn.train()
loss = nn.BCELoss()
optimizer = optim.Adam(cnn.parameters(), lr = 0.001) #the optimizer => Adam optimizer
nb_epochs = 25
for epoch in range(nb_epochs):
    train_loss = 0.0
    train_acc = 0.0
    total = 0.0
    for i, (inputs, labels) in enumerate(trainloader):
        inputs, labels = Variable(inputs), Variable(labels)
        cnn.zero_grad()
        outputs = cnn(inputs)
        loss_error = loss(outputs, labels)
        optimizer.step()
        _, pred = torch.max(outputs.data, 1)
        total += labels.size(0)
        train_loss += loss_error.data[0]
        train_acc += (pred == labels).sum()
    train_loss = train_loss/len(trainloader)
    train_acc = train_acc/total
    print('Epoch: %d, loss: %.4f, accuracy: %.4f' %(epoch+1, train_loss, train_acc))
The folder arrangement for the code is ./dataset/training_set, and inside the training_set folder are two more folders, one for all the cat images and the other for all the dog images. Each image is named either dog.xxxx.jpg or cat.xxxx.jpg, where xxxx represents the number, so the first cat image is cat.1.jpg, going up to cat.4000.jpg. The test_set folder uses the same format. The number of training images is 8000 and the number of test images is 2000. If anyone can point out my error I would greatly appreciate it.
Thank you
Try to set the desired size in transforms.Resize as a tuple:
transforms.Resize((64, 64))
PIL is using the second argument (in your case 64) as the interpolation method.
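For example, the training and test pipelines from the question would become (only the Resize calls change; everything else is as posted):

train_tf = transforms.Compose([transforms.RandomHorizontalFlip(),
                               transforms.Resize((64, 64)),  # tuple: (height, width)
                               transforms.RandomRotation(20),
                               transforms.RandomGrayscale(.2),
                               transforms.ToTensor()])
test_tf = transforms.Compose([transforms.Resize((64, 64)),
                              transforms.ToTensor()])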
Also note that every transform must go inside the brackets passed to torchvision.transforms.Compose([...]); written that way, with the size given as a tuple, this will not give the error.
I am using statsmodels OLS to fit a series of points to a line:
import statsmodels.api as sm
Y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13, 14, 15]
X = [[73.759999999999991], [73.844999999999999], [73.560000000000002],
[73.209999999999994], [72.944999999999993], [73.430000000000007],
[72.950000000000003], [73.219999999999999], [72.609999999999999],
[74.840000000000003], [73.079999999999998], [74.125], [74.75],
[74.760000000000005]]
ols = sm.OLS(Y, X)
r = ols.fit()
preds = r.predict()
print preds
And I get the following results:
[ 7.88819844 7.89728869 7.86680961 7.82937917 7.80103898 7.85290687
7.8015737 7.83044861 7.76521269 8.00369809 7.81547643 7.92723304
7.99407312 7.99514256]
These are about 10 times off. What am I doing wrong? I tried adding a constant, but that just makes the values 1000 times bigger. I don't know much about statistics, so maybe there is something I need to do with the data?
I think you have switched your response and your predictor, as Michael Mayer suggested in his comment. If you plot the data with the predictions from your model, you get something like this:
import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt
Y = np.array([1,2,3,4,5,6,7,8,9,11,12,13,14,15])
X = np.array([ 73.76 , 73.845, 73.56 , 73.21 , 72.945, 73.43 , 72.95 ,
73.22 , 72.61 , 74.84 , 73.08 , 74.125, 74.75 , 74.76 ])
Design = np.column_stack((np.ones(14), X))
ols = sm.OLS(Y, Design).fit()
preds = ols.predict()
plt.plot(X, Y, 'ko')
plt.plot(X, preds, 'k-')
plt.show()
If you switch X and Y, which is what I think you want, you get:
Design2 = np.column_stack((np.ones(14), Y))
ols2 = sm.OLS(X, Design2).fit()
preds2 = ols2.predict()
print preds2
[ 73.1386399 73.21305699 73.28747409 73.36189119 73.43630829
73.51072539 73.58514249 73.65955959 73.73397668 73.88281088
73.95722798 74.03164508 74.10606218 74.18047927]
plt.plot(Y, X, 'ko')
plt.plot(Y, preds2, 'k-')
plt.show()
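As a small usage sketch going beyond the answer (the values 16 and 17 are arbitrary new inputs), you can also predict X for new Y values by building the same kind of design matrix:

new_y = np.array([16, 17])
new_design = np.column_stack((np.ones(len(new_y)), new_y))
print(ols2.predict(new_design))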