The number of MiniBatchSize always 1 even when i change it in minibatchqueue parameters to 32? in MATLAB - deep-learning

I follow the example in Matlab for CNN with Multiple Outputs
Train Network with Multiple Outputs
https://www.mathworks.com/help/deeplearning/ug/train-network-with-multiple-outputs.html
But with a colour image as input data with two output as below:
Name Size Bytes Class Attributes
XTrain 128x128x3x2016 792723456 double
Name Size Bytes Class Attributes
yTrain1 1x2016 2230 categorical
Name Size Bytes Class Attributes
yTrain2 1x2016 16128 double
I define minibatchqueue object as below:
mbq = minibatchqueue(dsTrain,...
'MiniBatchSize',32,...
'MiniBatchFcn', #preprocessData,...
'MiniBatchFormat',{'SSCB','',''});
But the number of Batch size in output is always 1 as
mbq =
minibatchqueue with 3 outputs and properties:
Mini-batch creation:
MiniBatchSize: 1
PartialMiniBatch: 'return'
MiniBatchFcn: #preprocessDataJwan
DispatchInBackground: 0
Outputs:
OutputCast: {'single' 'single' 'single'}
OutputAsDlarray: [1 1 1]
MiniBatchFormat: {'SSCB' '' ''}
OutputEnvironment: {'auto' 'auto' 'auto'}
even I check the data size
[dlX,dlY1,dlY2] = next(mbq);
size(dlX)
size(dlY1)
size(dlY2)
ans =
128 128 3 2016
ans =
2 2016
ans =
1 2016
please, can anyone help me with that?

Related

Understanding the output of a pyomo model, number of solutions is zero?

I am using this code to create a solve a simple problem:
import pyomo.environ as pyo
from pyomo.core.expr.numeric_expr import LinearExpression
model = pyo.ConcreteModel()
model.nVars = pyo.Param(initialize=4)
model.N = pyo.RangeSet(model.nVars)
model.x = pyo.Var(model.N, within=pyo.Binary)
model.coefs = [1, 1, 3, 4]
model.linexp = LinearExpression(constant=0,
linear_coefs=model.coefs,
linear_vars=[model.x[i] for i in model.N])
def caprule(m):
return m.linexp <= 50
model.capme = pyo.Constraint(rule=caprule)
model.obj = pyo.Objective(expr = model.linexp, sense = maximize)
results = SolverFactory('glpk', executable='/usr/bin/glpsol').solve(model)
results.write()
And this is the output:
# ==========================================================
# = Solver Results =
# ==========================================================
# ----------------------------------------------------------
# Problem Information
# ----------------------------------------------------------
Problem:
- Name: unknown
Lower bound: 50.0
Upper bound: 50.0
Number of objectives: 1
Number of constraints: 2
Number of variables: 5
Number of nonzeros: 5
Sense: maximize
# ----------------------------------------------------------
# Solver Information
# ----------------------------------------------------------
Solver:
- Status: ok
Termination condition: optimal
Statistics:
Branch and bound:
Number of bounded subproblems: 0
Number of created subproblems: 0
Error rc: 0
Time: 0.09727835655212402
# ----------------------------------------------------------
# Solution Information
# ----------------------------------------------------------
Solution:
- number of solutions: 0
number of solutions displayed: 0
It says the number of solutions is 0, and yet it does solve the problem:
print(list(model.x[i]() for i in model.N))
Will output this:
[1.0, 1.0, 1.0, 1.0]
Which is a correct answer to the problem. what am I missing?
The interface between pyomo and glpk sometimes (always?) seems to return 0 for the number of solutions. I'm assuming there is some issue with the generalized interface between the pyomo core module and the various solvers that it interfaces with. When I use glpk and cbc solvers on this, it reports the number of solutions as zero. Perhaps those solvers don't fill that data element in the generalized interface. Somebody w/ more experience in the data glob returned from the solver may know precisely. That said, the main thing to look at is the termination condition, which I've found to be always accurate. It reports optimal.
I suspect that you have some mixed code from another model in your example. When I fix a typo or two (you missed the pyo prefix on a few things), it solves fine and gives the correct objective value as 9. I'm not sure where 50 came from in your output.
(slightly cleaned up) Code:
import pyomo.environ as pyo
from pyomo.core.expr.numeric_expr import LinearExpression
model = pyo.ConcreteModel()
model.nVars = pyo.Param(initialize=4)
model.N = pyo.RangeSet(model.nVars)
model.x = pyo.Var(model.N, within=pyo.Binary)
model.coefs = [1, 1, 3, 4]
model.linexp = LinearExpression(constant=0,
linear_coefs=model.coefs,
linear_vars=[model.x[i] for i in model.N])
def caprule(m):
return m.linexp <= 50
model.capme = pyo.Constraint(rule=caprule)
model.obj = pyo.Objective(expr = model.linexp, sense = pyo.maximize)
solver = pyo.SolverFactory('glpk') #, executable='/usr/bin/glpsol').solve(model)
results = solver.solve(model)
print(results)
model.obj.pprint()
model.obj.display()
Output:
Problem:
- Name: unknown
Lower bound: 9.0
Upper bound: 9.0
Number of objectives: 1
Number of constraints: 2
Number of variables: 5
Number of nonzeros: 5
Sense: maximize
Solver:
- Status: ok
Termination condition: optimal
Statistics:
Branch and bound:
Number of bounded subproblems: 0
Number of created subproblems: 0
Error rc: 0
Time: 0.00797891616821289
Solution:
- number of solutions: 0
number of solutions displayed: 0
obj : Size=1, Index=None, Active=True
Key : Active : Sense : Expression
None : True : maximize : x[1] + x[2] + 3*x[3] + 4*x[4]
obj : Size=1, Index=None, Active=True
Key : Active : Value
None : True : 9.0

Token indices sequence length is longer than the specified maximum sequence length for this model (651 > 512) with Hugging face sentiment classifier

I'm trying to get the sentiments for comments with the help of hugging face sentiment analysis pretrained model. It's returning error like Token indices sequence length is longer than the specified maximum sequence length for this model (651 > 512) with Hugging face sentiment classifier.
Below I'm attaching the code please look at it
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
import transformers
import pandas as pd
model = AutoModelForSequenceClassification.from_pretrained('/content/drive/MyDrive/Huggingface-Sentiment-Pipeline')
token = AutoTokenizer.from_pretrained('/content/drive/MyDrive/Huggingface-Sentiment-Pipeline')
classifier = pipeline(task='sentiment-analysis', model=model, tokenizer=token)
data = pd.read_csv('/content/drive/MyDrive/DisneylandReviews.csv', encoding='latin-1')
data.head()
Output is
Review
0 If you've ever been to Disneyland anywhere you...
1 Its been a while since d last time we visit HK...
2 Thanks God it wasn t too hot or too humid wh...
3 HK Disneyland is a great compact park. Unfortu...
4 the location is not in the city, took around 1...
Followed by
classifier("My name is mark")
Output is
[{'label': 'POSITIVE', 'score': 0.9953688383102417}]
Followed by code
basic_sentiment = [i['label'] for i in value if 'label' in i]
basic_sentiment
Output is
['POSITIVE']
Appending the total rows to empty list
text = []
for index, row in data.iterrows():
text.append(row['Review'])
I'm trying to get the sentiment for all the rows
sent = []
for i in range(len(data)):
sentiment = classifier(data.iloc[i,0])
sent.append(sentiment)
The error is :
Token indices sequence length is longer than the specified maximum sequence length for this model (651 > 512). Running this sequence through the model will result in indexing errors
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-19-4bb136563e7c> in <module>()
2
3 for i in range(len(data)):
----> 4 sentiment = classifier(data.iloc[i,0])
5 sent.append(sentiment)
11 frames
/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1914 # remove once script supports set_grad_enabled
1915 _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1916 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
1917
1918
IndexError: index out of range in self
some of the sentences in your Review column of the data frame are too long. when these sentences are converted to tokens and sent inside the model they are exceeding the 512 seq_length limit of the model, the embedding of the model used in the sentiment-analysis task was trained on 512 tokens embedding.
to fix this issue you can filter out the long sentences and keep only smaller ones (with token length < 512 )
or you can truncate the sentences with truncating = True
sentiment = classifier(data.iloc[i,0], truncation=True)
If you're tokenizing separately from your classification step, this warning can be output during tokenization itself (as opposed to classification).
In my case, I am using a BERT model, so I have MAX_TOKENS=510 (leaving room for the sequence-start and sequence-end tokens).
token = AutoTokenizer.from_pretrained("your model")
tokens = token.tokenize(
text, max_length=MAX_TOKENS, truncation=True
)
Now, when you run your classifier, the tokens are guaranteed not to exceed the maximum length.

Dimension problem when converting a MATLAB .m script into an Octave compatible syntax

I want to run a MATLAB script M-file to reconstruct a point cloud in Octave. Therefore I had to rewrite some parts of the code to make it compatible with Octave. Actually the M-file works fine in Octave (I don't get any errors) and also the plotted point cloud looks good at first glance, but it seems that the variables are only half the size of the original MATLAB variables. In the attached screenshots you can see what I mean.
Octave:
MATLAB:
You can see that the dimension of e.g. M in Octave is 1311114x3 but in MATLAB it is 2622227x3. The actual number of rows in my raw file is 2622227 as well.
Here you can see an extract of the raw file (original data) that I use.
Rotation angle Measured distance
-0,090 26,295
-0,342 26,294
-0,594 26,294
-0,846 26,295
-1,098 26,294
-1,368 26,296
-1,620 26,296
-1,872 26,296
In MATLAB I created my output variable as follows.
data = table;
data.Rotationangle = cell2mat(raw(:, 1));
data.Measureddistance = cell2mat(raw(:, 2));
As there is no table function in Octave I wrote
data = cellfun(#(x)str2num(x), strrep(raw, ',', '.'))
instead.
Octave also has no struct2array function, so I had to replace it as well.
In MATLAB I wrote.
data = table2array(data);
In Octave this was a bit more difficult to do. I had to create a struct2array function, which I did by means of this bug report.
%% Create a struct2array function
function retval = struct2array (input_struct)
%input check
if (~isstruct (input_struct) || (nargin ~= 1))
print_usage;
endif
%convert to cell array and flatten/concatenate output.
retval = [ (struct2cell (input_struct)){:}];
endfunction
clear b;
b.a = data;
data = struct2array(b);
Did I make a mistake somewhere and could someone help me to solve this problem?
edit:
Here's the part of my script where I'm using raw.
delimiter = '\t';
startRow = 5;
formatSpec = '%s%s%[^\n\r]';
fileID = fopen(filename,'r');
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'HeaderLines' ,startRow-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
fclose(fileID);
%% Convert the contents of columns containing numeric text to numbers.
% Replace non-numeric text with NaN.
raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
for col=1:length(dataArray)-1
raw(1:length(dataArray{col}),col) = mat2cell(dataArray{col}, ones(length(dataArray{col}), 1));
end
numericData = NaN(size(dataArray{1},1),size(dataArray,2));
for col=[1,2]
% Converts text in the input cell array to numbers. Replaced non-numeric
% text with NaN.
rawData = dataArray{col};
for row=1:size(rawData, 1)
% Create a regular expression to detect and remove non-numeric prefixes and
% suffixes.
regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\.]*)+[\,]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\.]*)*[\,]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
try
result = regexp(rawData(row), regexstr, 'names');
numbers = result.numbers;
% Detected commas in non-thousand locations.
invalidThousandsSeparator = false;
if numbers.contains('.')
thousandsRegExp = '^\d+?(\.\d{3})*\,{0,1}\d*$';
if isempty(regexp(numbers, thousandsRegExp, 'once'))
numbers = NaN;
invalidThousandsSeparator = true;
end
end
% Convert numeric text to numbers.
if ~invalidThousandsSeparator
numbers = strrep(numbers, '.', '');
numbers = strrep(numbers, ',', '.');
numbers = textscan(char(numbers), '%f');
numericData(row, col) = numbers{1};
raw{row, col} = numbers{1};
end
catch
raw{row, col} = rawData{row};
end
end
end
You don't see any raw in my workspaces because I clear all temporary variables before I reconstruct my point cloud.
Also my original data in row 1311114 and 1311115 look normal.
edit 2:
As suggested here is a small example table to clarify what I want and what MATLAB does with the table2array function in my case.
data =
-0.0900 26.2950
-0.3420 26.2940
-0.5940 26.2940
-0.8460 26.2950
-1.0980 26.2940
-1.3680 26.2960
-1.6200 26.2960
-1.8720 26.2960
With the struct2array function I used in Octave I get the following array.
data =
-0.090000 26.295000
-0.594000 26.294000
-1.098000 26.294000
-1.620000 26.296000
-2.124000 26.295000
-2.646000 26.293000
-3.150000 26.294000
-3.654000 26.294000
If you compare the Octave array with my original data, you can see that every second row is skipped. This seems to be the reason for 1311114 instead of 2622227 rows.
edit 3:
I tried to solve my problem with the suggestions of #Tasos Papastylianou, which unfortunately was not successful.
First I did the variant with a struct.
data = struct();
data.Rotationangle = [raw(:,1)];
data.Measureddistance = [raw(:,2)];
data = cell2mat( struct2cell (data ).' )
But this leads to the following structure in my script. (Unfortunately the result is not what I would like to have as shown in edit 2. Don't be surprised, I only used a small part of my raw file to accelerate the run of my script, so here are only 769 lines.)
[766,1] = -357,966
[767,1] = -358,506
[768,1] = -359,010
[769,1] = -359,514
[1,2] = 26,295
[2,2] = 26,294
[3,2] = 26,294
[4,2] = 26,296
Furthermore I get the following error.
error: unary operator '-' not implemented for 'cell' operands
error: called from
Cloud_reconstruction at line 137 column 11
Also the approach with the dataframe octave package didn't work. When I run the following code it leads to the error you can see below.
dataframe2array = #(df) cell2mat( struct(df).x_data );
pkg load dataframe;
data = dataframe();
data.Rotationangle = [raw(:, 1)];
data.Measureddistance = [raw(:, 2)];
dataframe2array(data)
error:
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 147 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 106 column 20
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 176 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 106 column 20
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 147 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 107 column 23
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 176 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 107 column 23
error: RHS(_,2): but RHS has size 768x1
error: called from
df_matassign at line 179 column 11
subsasgn at line 172 column 14
Cloud_reconstruction at line 107 column 23
Both error messages refer to the following part of my script where I'm doing the reconstruction of the point cloud in cylindrical coordinates.
distLaserCenter = 47; % Distance between the pipe centerline and the blind zone in mm
m = size(data,1); % Find the length of the first dimension of data
zincr = 0.4/360; % z increment in mm per deg
data(:,1) = -data(:,1);
for i = 1:m
data(i,2) = data(i,2) + distLaserCenter;
if i == 1
data(i,3) = 0;
elseif abs(data(i,1)-data(i-1)) < 100
data(i,3) = data(i-1,3) + zincr*(data(i,1)-data(i-1));
else abs(data(i,1)-data(i-1)) > 100;
data(i,3) = data(i-1,3) + zincr*(data(i,1)-(data(i-1)-360));
end
end
To give some background information for a better understanding. The script is used to reconstruct a pipe as a point cloud. The surface of the pipe was scanned from inside with a laser and the laser measured several points (distance from laser to the inner wall of the pipe) at each deg of rotation. I hope this helps to understand what I want to do with my script.
Not sure exactly what you're trying to do, but here's a toy example of how a struct could be used in an equivalent manner to a table:
matlab:
data = table;
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];
table2array(data)
octave:
data = struct();
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];
cell2mat( struct2cell (data ).' )
Note the transposition operation (.') before passing the result to cell2mat, since in a table, the 'fieldnames' are arranged horizontally in columns, whereas the struct2cell ends up arranging what used to be the 'fieldnames' as rows.
You might also be interested in the dataframe octave package, which performs similar functions to matlab's table (or in fact, R's dataframe object): https://octave.sourceforge.io/dataframe/ (you can install this by typing pkg install -forge dataframe in your console)
Unfortunately, the way to display the data as an array is still not ideal (see: https://stackoverflow.com/a/55417141/4183191), but you can easily convert that into a tiny function, e.g.
dataframe2array = #(df) cell2mat( struct(df).x_data );
Your code can then become:
pkg load dataframe;
data = dataframe();
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];
dataframe2array(data)

Pyevolve - sum of chromosome = 1

# Genome instance, 1D List of 20 elements
genome = G1DList.G1DList(20)
Sets the range max and min of the 1D List
genome.setParams(rangemin=0, rangemax=1)
Change the initializator to Real values
genome.initializator.set(Initializators.G1DListInitializatorReal)
This give 20 elements between 0 and 1. I need that the sum of all elements in the chromosome to be equal to 1 . Any idea how to do this?
I found the answer. I created my own Initializator which is a modified copy of G1DListInitializatorReal:
def G1DListInitializatorRealSumEqualOne(genome, **args):
""" Real initialization function of G1DList
This initializator accepts the *rangemin* and *rangemax* genome parameters.
"""
range_min = genome.getParam("rangemin", 0)
range_max = genome.getParam("rangemax", 100)
genome.genomeList = [random.uniform(range_min, range_max) for i in xrange(genome.getListSize())]
genome.genomeList[:] = [x / sum(genome.genomeList) for x in genome.genomeList]
And then changed the code in my question to:
# Genome instance, 1D List of 20 elements
genome = G1DList.G1DList(20)
Sets the range max and min of the 1D List
genome.setParams(rangemin=0, rangemax=1)
Change the initializator to Real values
genome.initializator.set(G1DListInitializatorRealSumEqualOne)

Error in eval(expr, envir, enclos) while using Predict function

When I try to run predict() on the dataset, it keeps giving me error -
Error in eval(expr, envir, enclos) : object 'LoanRange' not found
Here is the part of dataset -
LoanRange Loan.Type N WAFICO WALTV WAOrigRev WAPTValue
1 0-99999 Conventional 109 722.5216 63.55385 6068.239 0.6031879
2 0-99999 FHA 30 696.6348 80.00100 7129.650 0.5623650
3 0-99999 VA 13 698.6986 74.40525 7838.894 0.4892977
4 100000-149999 Conventional 860 731.2333 68.25817 6438.330 0.5962638
5 100000-149999 FHA 285 673.2256 82.42225 8145.068 0.5211495
6 100000-149999 VA 125 704.1686 87.71306 8911.461 0.5020074
7 150000-199999 Conventional 1291 738.7164 70.08944 8125.979 0.6045117
8 150000-199999 FHA 403 672.0891 84.65318 10112.192 0.5199632
9 150000-199999 VA 195 694.1885 90.77495 10909.393 0.5250807
10 200000-249999 Conventional 1162 740.8614 70.65027 8832.563 0.6111419
11 200000-249999 FHA 348 667.6291 85.13457 11013.856 0.5374226
12 200000-249999 VA 221 702.9796 91.76759 11753.642 0.5078298
13 250000-299999 Conventional 948 742.0405 72.22742 9903.160 0.6106858
Following is the code used for predicting count data N after determining the overdispersion-
model2=glm(N~Loan.Type+WAFICO+WALTV+WAOrigRev+WAPTValue, family=quasipoisson(link = "log"), data = DF)
summary(model2)
This is what I have done to create a sequence of count and use predict function-
countaxis <- seq (0,1500,150)
Y <- predict(model2, list(N=countaxis, type = "response")
At this step, I get the error -
Error in eval(expr, envir, enclos) : object 'LoanRange' not found
Can someone please point me where is the problem here.
Think about what exactly you are trying to predict. You are providing the predict function values of N (via countaxis), but in fact the way you set up your model, N is your response variable and the remaining variables are the predictors. That's why R is asking for LoanRange. It actually needs values for LoanRange, Loan.Type, ..., WAPTValue in order to predict N. So you need to feed predict inputs that let the model try to predict N.
For example, you could do something like this:
# create some fake data to predict N
newdata1 = data.frame(rbind(c("0-99999", "Conventional", 722.5216, 63.55385, 6068.239, 0.6031879),
c("150000-199999", "VA", 12.5216, 3.55385, 60.239, 0.0031879)))
colnames(newdata1) = c("LoanRange" ,"Loan.Type", "WAFICO" ,"WALTV" , "WAOrigRev" ,"WAPTValue")
# ensure that numeric variables are indeed numeric and not factors
newdata1$WAFICO = as.numeric(as.character(newdata1$WAFICO))
newdata1$WALTV = as.numeric(as.character(newdata1$WALTV))
newdata1$WAPTValue = as.numeric(as.character(newdata1$WAPTValue))
newdata1$WAOrigRev = as.numeric(as.character(newdata1$WAOrigRev))
# make predictions - this will output values of N
predict(model2, newdata = newdata1, type = "response")