Dimension problem when converting a MATLAB .m script into an Octave compatible syntax - octave

I want to run a MATLAB script M-file to reconstruct a point cloud in Octave. Therefore I had to rewrite some parts of the code to make it compatible with Octave. Actually the M-file works fine in Octave (I don't get any errors) and also the plotted point cloud looks good at first glance, but it seems that the variables are only half the size of the original MATLAB variables. In the attached screenshots you can see what I mean.
Octave:
MATLAB:
You can see that the dimension of e.g. M in Octave is 1311114x3 but in MATLAB it is 2622227x3. The actual number of rows in my raw file is 2622227 as well.
Here you can see an extract of the raw file (original data) that I use.
Rotation angle Measured distance
-0,090 26,295
-0,342 26,294
-0,594 26,294
-0,846 26,295
-1,098 26,294
-1,368 26,296
-1,620 26,296
-1,872 26,296
In MATLAB I created my output variable as follows.
data = table;
data.Rotationangle = cell2mat(raw(:, 1));
data.Measureddistance = cell2mat(raw(:, 2));
As there is no table function in Octave I wrote
data = cellfun(@(x)str2num(x), strrep(raw, ',', '.'))
instead.
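For reference, a minimal sketch of the same conversion on a tiny, hypothetical excerpt of raw (the 2x2 cell below is made up for illustration). It uses str2double instead of cellfun with str2num, which is my substitution and not part of the original script; it lets the row count of the result be checked directly against the input.

```octave
% Hypothetical 2x2 excerpt of the raw cell array (comma-decimal strings)
raw = {'-0,090', '26,295'; '-0,342', '26,294'};

% Swap the decimal comma for a point, then convert every cell at once
data = str2double (strrep (raw, ',', '.'));

% The numeric matrix should have the same shape as the cell array
disp (size (raw));
disp (size (data));
```

Comparing size (raw) with size (data) right after the conversion is a cheap way to see at which step rows go missing.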
Octave also has no table2array function, so I had to replace it as well.
In MATLAB I wrote.
data = table2array(data);
In Octave this was a bit more difficult to do. I had to create a struct2array function, which I did by means of this bug report.
%% Create a struct2array function
function retval = struct2array (input_struct)
%input check
if (~isstruct (input_struct) || (nargin ~= 1))
print_usage;
endif
%convert to cell array and flatten/concatenate output.
retval = [ (struct2cell (input_struct)){:}];
endfunction
clear b;
b.a = data;
data = struct2array(b);
Did I make a mistake somewhere and could someone help me to solve this problem?
edit:
Here's the part of my script where I'm using raw.
delimiter = '\t';
startRow = 5;
formatSpec = '%s%s%[^\n\r]';
fileID = fopen(filename,'r');
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'HeaderLines' ,startRow-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
fclose(fileID);
%% Convert the contents of columns containing numeric text to numbers.
% Replace non-numeric text with NaN.
raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
for col=1:length(dataArray)-1
raw(1:length(dataArray{col}),col) = mat2cell(dataArray{col}, ones(length(dataArray{col}), 1));
end
numericData = NaN(size(dataArray{1},1),size(dataArray,2));
for col=[1,2]
% Converts text in the input cell array to numbers. Replaced non-numeric
% text with NaN.
rawData = dataArray{col};
for row=1:size(rawData, 1)
% Create a regular expression to detect and remove non-numeric prefixes and
% suffixes.
regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\.]*)+[\,]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\.]*)*[\,]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
try
result = regexp(rawData(row), regexstr, 'names');
numbers = result.numbers;
% Detected commas in non-thousand locations.
invalidThousandsSeparator = false;
if numbers.contains('.')
thousandsRegExp = '^\d+?(\.\d{3})*\,{0,1}\d*$';
if isempty(regexp(numbers, thousandsRegExp, 'once'))
numbers = NaN;
invalidThousandsSeparator = true;
end
end
% Convert numeric text to numbers.
if ~invalidThousandsSeparator
numbers = strrep(numbers, '.', '');
numbers = strrep(numbers, ',', '.');
numbers = textscan(char(numbers), '%f');
numericData(row, col) = numbers{1};
raw{row, col} = numbers{1};
end
catch
raw{row, col} = rawData{row};
end
end
end
You don't see any raw in my workspaces because I clear all temporary variables before I reconstruct my point cloud.
Also my original data in row 1311114 and 1311115 look normal.
edit 2:
As suggested here is a small example table to clarify what I want and what MATLAB does with the table2array function in my case.
data =
-0.0900 26.2950
-0.3420 26.2940
-0.5940 26.2940
-0.8460 26.2950
-1.0980 26.2940
-1.3680 26.2960
-1.6200 26.2960
-1.8720 26.2960
With the struct2array function I used in Octave I get the following array.
data =
-0.090000 26.295000
-0.594000 26.294000
-1.098000 26.294000
-1.620000 26.296000
-2.124000 26.295000
-2.646000 26.293000
-3.150000 26.294000
-3.654000 26.294000
If you compare the Octave array with my original data, you can see that every second row is skipped. This seems to be the reason for 1311114 instead of 2622227 rows.
edit 3:
I tried to solve my problem with the suggestions of @Tasos Papastylianou, which unfortunately was not successful.
First I did the variant with a struct.
data = struct();
data.Rotationangle = [raw(:,1)];
data.Measureddistance = [raw(:,2)];
data = cell2mat( struct2cell (data ).' )
But this leads to the following structure in my script. (Unfortunately the result is not what I would like to have as shown in edit 2. Don't be surprised, I only used a small part of my raw file to accelerate the run of my script, so here are only 769 lines.)
[766,1] = -357,966
[767,1] = -358,506
[768,1] = -359,010
[769,1] = -359,514
[1,2] = 26,295
[2,2] = 26,294
[3,2] = 26,294
[4,2] = 26,296
Furthermore I get the following error.
error: unary operator '-' not implemented for 'cell' operands
error: called from
Cloud_reconstruction at line 137 column 11
Also the approach with the dataframe octave package didn't work. When I run the following code it leads to the error you can see below.
dataframe2array = @(df) cell2mat( struct(df).x_data );
pkg load dataframe;
data = dataframe();
data.Rotationangle = [raw(:, 1)];
data.Measureddistance = [raw(:, 2)];
dataframe2array(data)
error:
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 147 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 106 column 20
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 176 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 106 column 20
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 147 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 107 column 23
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 176 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 107 column 23
error: RHS(_,2): but RHS has size 768x1
error: called from
df_matassign at line 179 column 11
subsasgn at line 172 column 14
Cloud_reconstruction at line 107 column 23
Both error messages refer to the following part of my script where I'm doing the reconstruction of the point cloud in cylindrical coordinates.
distLaserCenter = 47; % Distance between the pipe centerline and the blind zone in mm
m = size(data,1); % Find the length of the first dimension of data
zincr = 0.4/360; % z increment in mm per deg
data(:,1) = -data(:,1);
for i = 1:m
data(i,2) = data(i,2) + distLaserCenter;
if i == 1
data(i,3) = 0;
elseif abs(data(i,1)-data(i-1)) < 100
data(i,3) = data(i-1,3) + zincr*(data(i,1)-data(i-1));
else abs(data(i,1)-data(i-1)) > 100;
data(i,3) = data(i-1,3) + zincr*(data(i,1)-(data(i-1)-360));
end
end
To give some background information for a better understanding. The script is used to reconstruct a pipe as a point cloud. The surface of the pipe was scanned from inside with a laser and the laser measured several points (distance from laser to the inner wall of the pipe) at each deg of rotation. I hope this helps to understand what I want to do with my script.
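To make the intent of the loop above easier to follow, here is a vectorized sketch of the same bookkeeping, assuming data is already a plain numeric N-by-2 matrix. The five sample rows and the names theta, r, d, wrap and cloud are hypothetical; the constants are the ones from the script.

```octave
% Hypothetical sample: rotation angle (deg) and measured distance (mm)
data = [-358.5 26.295; -359.0 26.294; -359.5 26.294; -0.1 26.296; -0.6 26.296];
distLaserCenter = 47;              % mm, as in the script above
zincr = 0.4/360;                   % z increment in mm per deg

theta = -data(:,1);                % flip the sign of the rotation angle
r = data(:,2) + distLaserCenter;   % radius measured from the pipe centerline
d = diff (theta);                  % angle step between consecutive samples
wrap = abs (d) >= 100;             % a large jump marks a completed revolution
d(wrap) = d(wrap) + 360;           % unwrap that step, like the else branch
z = [0; cumsum(zincr * d)];        % axial position accumulated along the scan

[x, y] = pol2cart (deg2rad (theta), r);  % cylindrical -> Cartesian
cloud = [x, y, z];                 % one row per measured point
```

Because every step operates on whole columns, this version fails immediately if data is still a cell array, which is the situation behind the unary minus error shown above.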

Not sure exactly what you're trying to do, but here's a toy example of how a struct could be used in an equivalent manner to a table:
matlab:
data = table;
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];
table2array(data)
octave:
data = struct();
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];
cell2mat( struct2cell (data ).' )
Note the transposition operation (.') before passing the result to cell2mat, since in a table, the 'fieldnames' are arranged horizontally in columns, whereas the struct2cell ends up arranging what used to be the 'fieldnames' as rows.
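A small sketch of why the transposition matters, using the same toy struct (the variable names c, stacked and side_by_side are only for illustration):

```octave
data = struct ();
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];

c = struct2cell (data);          % 2x1 cell: one row per field
stacked = cell2mat (c);          % fields end up below each other
side_by_side = cell2mat (c.');   % fields end up next to each other

disp (size (stacked));
disp (size (side_by_side));
```

Note that this only works when each field holds a numeric column; if the fields hold cell arrays, the result stays a cell array.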
You might also be interested in the dataframe octave package, which performs similar functions to matlab's table (or in fact, R's dataframe object): https://octave.sourceforge.io/dataframe/ (you can install this by typing pkg install -forge dataframe in your console)
Unfortunately, the way to display the data as an array is still not ideal (see: https://stackoverflow.com/a/55417141/4183191), but you can easily convert that into a tiny function, e.g.
dataframe2array = @(df) cell2mat( struct(df).x_data );
Your code can then become:
pkg load dataframe;
data = dataframe();
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];
dataframe2array(data)

Related

Gnuplot one-liner to generate a titled line for each row in a CSV file

I've been trying to figure out gnuplot but haven't been getting anywhere for seemingly 2 reasons. My lack of understanding gnuplot set commands, and the layout of my data file. I've decided the best option is to ask for help.
Getting this gnuplot command into a one-liner is the hope.
Example rows from my CSV data file (MyData.csv):
> _TitleRow1_,15.21,15.21,...could be more, could be less
> _TitleRow2_,16.27,16.27,101,55.12,...could be more, could be less
> _TitleRow3_,16.19,16.19,20.8,...could be more, could be less
...(over 100 rows)
Contents of MyData.csv rows will always be a string as the first column for title, followed by an undetermined amount of decimal values. (Each row gets appended to periodically, so specifying an open ended amount of columns to include is needed)
What I'd like to happen is to generate a line graph showing a line for each row in the csv, using the first column as a row title, and the following numbers generating the actual line.
This is what I'm trying:
gnuplot -e 'set datafile separator ","; set key autotitle columnhead; plot "MyData.csv"'
Which results in:
set datafile separator ","; set key autotitle columnhead; plot "MyData.csv"
^
line 0: Bad data on line 2 of file MyData.csv
This looks like an amazing tool and I'm looking forward to learning more about it. Thanks in advance for any hints/assistance!
Your datafile format is very unfortunate for gnuplot which prefers data in columns.
Although, you can also plot rows (which is not straightforward in gnuplot, but see an example here). This requires a strict matrix, but the problem with your data is that you have a variable column count.
Actually, your CSV is not a "correct" CSV, because a CSV should have the same number of columns for all rows, i.e. if one row has less data than the row with maximum data the line should be filled with ,,, as many as needed. That's basically what the script below is doing.
With this you can plot rows with the option matrix (check help matrix). However, you will get some warnings warning: matrix contains missing or undefined values which you can ignore.
Alternatively, you could transpose your data (with variable column count maybe not straightforward). Maybe there are external tools which can do it easily. With gnuplot-only it will be a bit cumbersome (and first you would have to fill your shorter rows as in the example below).
Maybe there is a simpler and better gnuplot-only solution which I am currently not aware of.
Data: SO73099645.dat
_TitleRow1_, 1.2, 1.3
_TitleRow2_, 2.2, 2.3, 2.4, 2.5
_TitleRow3_, 3.2, 3.3, 3.4
Script:
### plotting rows with variable columns
reset session
FILE = "SO73099645.dat"
getColumns(s) = (sum [i=1:strlen(s)] (s[i:i] eq ',') ? 1 : 0) + 1
set datafile separator "\t"
colCount = 0
myNaNs = myHeaders = ''
stats FILE u (rowCount=$0+1, c=getColumns(strcol(1)), c>colCount ? colCount=c : 0) nooutput
do for [i=1:colCount] {myNaNs=myNaNs.',NaN' }
set table $Data
plot FILE u (s=strcol(1),c=getColumns(s),s.myNaNs[1:(colCount-c)*4]) w table
unset table
set datafile separator ","
stats FILE u (myHeaders=sprintf('%s "%s"',myHeaders,strcol(1))) nooutput
myHeader(n) = word(myHeaders,n)
set key noenhanced
plot for [row=0:rowCount-1] $Data matrix u 1:3 every ::1:row::row w lp pt 7 ti myHeader(row+1)
### end of script
As "one-liner":
FILE = "SO73099645.dat"; getColumns(s) = (sum [i=1:strlen(s)] (s[i:i] eq ',') ? 1 : 0) + 1; set datafile separator "\t"; colCount = 0; myNaNs = myHeaders = ''; stats FILE u (rowCount=$0+1, c=getColumns(strcol(1)), c>colCount ? colCount=c : 0) nooutput; do for [i=1:colCount] {myNaNs=myNaNs.',NaN' }; set table $Data; plot FILE u (s=strcol(1),c=getColumns(s),s.myNaNs[1:(colCount-c)*4]) w table; unset table; set datafile separator ","; stats FILE u (myHeaders=sprintf('%s "%s"',myHeaders,strcol(1))) nooutput; myHeader(n) = word(myHeaders,n); set key noenhanced; plot for [row=0:rowCount-1] $Data matrix u 1:3 every ::1:row::row w lp pt 7 ti myHeader(row+1)
Result:

Why octave error with function huffmandeco about large index types?

I've got a little MATLAB script which I'm trying to understand. It doesn't do very much: it only reads a text from a file, then encodes and decodes it with the Huffman functions.
But it throws an error while decoding:
"error: out of memory or dimension too large for Octave's index type
error: called from huffmandeco>dict2tree at line 95 column 19"
I don't know why, because I debugged it and don't see a large index type.
I added the part which calculates p from the input text.
%text is a random input text file in ASCII
%calculate the relative frequency of every Symbol
for i=0:127
nlet=length(find(text==i));
p(i+1)=nlet/length(text);
end
symb = 0:127;
dict = huffmandict(symb,p); % Create dictionary
compdata = huffmanenco(fdata,dict); % Encode the data
dsig = huffmandeco(compdata,dict); % Decode the Huffman code
I can only use Octave instead of MATLAB, so I don't know whether this is an unexpected error. I use Octave version 6.2.0 on Win10. I also tried the version for large data, but it didn't change anything.
Maybe anyone knows the error in this context?
EDIT:
I debugged the code again. In the function huffmandeco I found the following function:
function tree = dict2tree (dict)
L = length (dict);
lengths = zeros (1, L);
## the depth of the tree is limited by the maximum word length.
for i = 1:L
lengths(i) = length (dict{i});
endfor
m = max (lengths);
tree = zeros (1, 2^(m+1)-1)-1;
for i = 1:L
pointer = 1;
word = dict{i};
for bit = word
pointer = 2 * pointer + bit;
endfor
tree(pointer) = i;
endfor
endfunction
The maximum length m in this case is 82. So the function calculates:
tree = zeros (1, 2^(82+1)-1)-1.
So it's obvious why the error called a too large index type.
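To put that number in perspective, here is a small sketch of the arithmetic. It assumes a build with a 64-bit index type; the names n, idx_max and exceeds are only for illustration.

```octave
m = 82;                                % longest code word reported above
n = 2^(m+1) - 1;                       % elements requested by zeros (1, n)
idx_max = double (intmax ("int64"));   % assumed 64-bit index limit
exceeds = n > idx_max;                 % true when the request cannot be indexed

printf ("requested elements: %g\n", n);
printf ("index limit:        %g\n", idx_max);
```

The requested length is several orders of magnitude beyond the limit, so no amount of memory would make this call succeed.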
But there must be a solution or another error, because the code is tested before.
I haven't weeded through the code enough to know why yet, but huffmandict is not ignoring zero-probability symbols the way it claims to. Nor have I been able to find a bug report on Savannah, but again I haven't searched thoroughly.
A workaround is to limit the symbol list and their probabilities to only the symbols that actually occur. Using containers.Map would be ideal, but in Octave you can do that with a couple of the outputs from unique:
% Create a symbol table of the unique characters in the input string
% and the indices into the table for each character in the string.
[symbols, ~, inds] = unique(textstr);
inds = inds.'; % just make it easier to read
For the string
textstr = 'Random String Input.';
the result is:
>> symbols
symbols = .IRSadgimnoprtu
>> inds
inds =
Columns 1 through 19:
4 6 11 7 12 10 1 5 15 14 9 11 8 1 3 11 13 16 15
Column 20:
2
So the first symbol in the input string is symbols(4), the second is symbols(6), and so on.
From there, you just use symbols and inds to create the dictionary and encode/decode the signal. Here's a quick demo script:
textstr = 'Random String Input.';
fprintf("Starting string: %s\n", textstr);
% Create a symbol table of the unique characters in the input string
% and the indices into the table for each character in the string.
[symbols, ~, inds] = unique(textstr);
inds = inds.'; % just make it easier to read
% Calculate the frequency of each symbol in table
% max(inds) == numel(symbols)
p = histc(inds, 1:max(inds))/numel(inds);
dict = huffmandict(symbols, p);
compdata = huffmanenco(inds, dict);
dsig = huffmandeco(compdata, dict);
fprintf("Decoded string: %s\n", symbols(dsig));
And the output:
Starting string: Random String Input.
Decoded string: Random String Input.
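If you would rather keep the numeric 0:127 symbol table from the question, the same idea can be sketched by dropping the unused entries before building the dictionary. This is only an illustration; keep is a name I made up, and the sketch stops before the huffmandict call so that it runs without the communications package.

```octave
textstr = 'Random String Input.';
symb = 0:127;                                          % full ASCII table
p = histc (double (textstr), symb) / numel (textstr);  % relative frequencies

keep = p > 0;        % symbols that actually occur in the text
symb = symb(keep);   % reduced symbol list
p = p(keep);         % matching probabilities, still summing to 1

printf ("%d of 128 symbols remain\n", numel (symb));
```

The reduced symb and p can then be passed to huffmandict in place of the full table, and the encoder input has to contain only values from the reduced list.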
To encode strings other than the original input string, you would have to map the characters to symbol indices (ensuring that all symbols in the string are actually present in the symbol table, obviously):
>> [~, s_idx] = ismember('trogdor', symbols)
s_idx =
15 14 12 8 7 12 14
>> compdata = huffmanenco(s_idx, dict);
>> dsig = huffmandeco(compdata, dict);
>> fprintf("Decoded string: %s\n", symbols(dsig));
Decoded string: trogdor

incremental search method script errors

I wrote my very first octave script which is a code for the incremental search method for root finding but I encountered numerous errors that I found hard to understand.
The following is the script:
clear
syms x;
fct=input('enter your function in standard form: ');
f=str2func(fct); % This built in octave function creates functions from strings
Xmax=input('X maximum= ');
Xinit=input('X initial= ');
dx=input('dx= ');
epsi=input('epsi= ');
N=10; % the amount by which dx is decreased in case a root was found.
while (x<=Xmax)
f1=f(Xinit);
x=x+dx
f2=f(x);
if (abs(f2)>(1/epsi))
disp('The function approches infinity at ', num2str(x));
x=x+epsi;
else
if ((f2*f1)>0)
x=x+dx;
elseif ((f2*f1)==0)
disp('a root at ', num2str );
x=x+epsi;
else
if (dx < epsi)
disp('a root at ', num2str);
x=x+epsi;
else
x=x-dx;
dx=dx/N;
x=x+dx;
end
end
end
end
when running it the following errors showed up:
>> Incremental
enter your function in standard form: 1+(5.25*x)-(sec(sqrt(0.68*x)))
warning: passing floating-point values to sym is dangerous, see "help sym"
warning: called from
double_to_sym_heuristic at line 50 column 7
sym at line 379 column 13
mtimes at line 63 column 5
Incremental at line 3 column 4
warning: passing floating-point values to sym is dangerous, see "help sym"
warning: called from
double_to_sym_heuristic at line 50 column 7
sym at line 379 column 13
mtimes at line 63 column 5
Incremental at line 3 column 4
error: wrong type argument 'class'
error: str2func: FCN_NAME must be a string
error: called from
Incremental at line 4 column 2
Below is the flowchart of the incremental search method:
The problem happens in this line:
fct=input('enter your function in standard form: ');
Here input takes the user input and evaluates it. It tries to convert it into a number. In the next line,
f=str2func(fct)
you assume fct is a string.
To fix the problems, tell input to just return the user's input unchanged as a string (see the docs):
fct=input('enter your function in standard form: ', 's');
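One further point, which is my assumption and not something stated above: once fct is a plain expression string such as the one entered in the question, str2func still needs to be told what the argument is, so the string has to start with @(x). A minimal sketch, with the user input replaced by a fixed string:

```octave
% Stands in for: fct = input ('enter your function in standard form: ', 's');
fct = '1+(5.25*x)-(sec(sqrt(0.68*x)))';

% Prepend the argument list so the string describes an anonymous function
f = str2func (['@(x) ' fct]);

y0 = f(0);   % 1 + 0 - sec(0)
disp (y0);
```

With this approach the syms x line is not needed, since x is only the argument name of the anonymous function.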

Nonlinear fits in Octave

Currently I'm using the nonlin_curvefit function from GNU Octave's 'optim' package to fit data. This time I also need the uncertainty of the returned parameters to determine the quality of the fit. After reading through the documentation I tried using the function curvefit_stat.
However, I always get errors when using this function, and I can't make any sense of the error messages. I'm using Octave 4.2.2 from Ubuntu 18.04's default repository.
A standalone minimal example and the error messages are below. The start parameters init_cvg usually produce a good result, while init_dvg usually results in a poor fit:
1;
x = linspace(-2*pi, 2*pi, 600);
ydata = 3.4*sin(1.6*x) + rand(size(x))*1.3;
f = @(p, x) p(1)*sin(p(2).*x);
function y = testfun(p ,x)
y = p(1).*sin(p(2).*x);
endfunction
init_cvg = [1; 1.1];
init_dvg = [1; 1.0];
[pc, mod_valc, cvgc, outpc] = nonlin_curvefit(f, init_cvg, x, ydata);
[pd, mod_vald, cvgd, outpd] = nonlin_curvefit(f, init_dvg, x, ydata);
hold off
plot(x, ydata, "b.");
hold on;
plot(x, mod_valc, "r-");
plot(x, mod_vald, "color", [0.9 0.4 0.3]);
settings = optimset("ret_covp", true);
covpc = curvefit_stat(f, pc, x, ydata, settings);
covpd = curvefit_stat(f, pd, x, ydata, settings);
puts("sqrt(diag(covpc))")
sqrt(diag(covpc))
puts("sqrt(diag(covpd))")
sqrt(diag(covpd))
The first error message occurs when I use f as a model function, the second occurs when I use testfun instead:
>> curvefit_stat_TEST
error: label of objective function must be specified
error: called from
__residmin_stat__ at line 566 column 7
curvefit_stat at line 56 column 7
curvefit_stat_TEST at line 25 column 7
>> curvefit_stat_TEST
error: 'p' undefined near line 8 column 7
error: called from
testfun at line 8 column 5
curvefit_stat_TEST at line 25 column 7
>>
Could somebody confirm this error ?
I would appreciate any help.
I found the problem.
I needed to add "objf_type", "wls" as arguments to optimset.
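A condensed sketch of the example with that change applied. It assumes the optim package is installed, and that curvefit_stat returns a structure (called info here, a name I chose) whose covp field holds the covariance matrix, as I understand the package documentation.

```octave
pkg load optim

x = linspace (-2*pi, 2*pi, 600);
ydata = 3.4*sin (1.6*x) + rand (size (x))*1.3;
f = @(p, x) p(1)*sin (p(2).*x);

% Fit first, then ask for the statistics of the fitted parameters
[pc, mod_valc] = nonlin_curvefit (f, [1; 1.1], x, ydata);

% "objf_type", "wls" labels the objective as weighted least squares
settings = optimset ("ret_covp", true, "objf_type", "wls");
info = curvefit_stat (f, pc, x, ydata, settings);

% Assumed field name: standard deviations would be sqrt (diag (info.covp))
```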

Retrieve blob field from mySQL database with MATLAB

I'm accessing public mySQL database using JDBC and mySQL java connector. exonCount is int(10), exonStarts and exonEnds are longblob fields.
javaaddpath('mysql-connector-java-5.1.12-bin.jar')
host = 'genome-mysql.cse.ucsc.edu';
user = 'genome';
password = '';
dbName = 'hg18';
jdbcString = sprintf('jdbc:mysql://%s/%s', host, dbName);
jdbcDriver = 'com.mysql.jdbc.Driver';
dbConn = database(dbName, user , password, jdbcDriver, jdbcString);
gene.Symb = 'CDKN2B';
% Check to make sure that we successfully connected
if isconnection(dbConn)
qry = sprintf('SELECT exonCount, exonStarts, exonEnds FROM refFlat WHERE geneName=''%s''',gene.Symb);
result = get(fetch(exec(dbConn, qry)), 'Data');
else
fprintf('Connection failed: %s\n', dbConn.Message);
end
Here is the result:
result =
[2] [18x1 int8] [18x1 int8]
[2] [18x1 int8] [18x1 int8]
result{1,2}'
ans =
50 49 57 57 50 57 48 49 44 50 49 57 57 56 54 55 51 44
This is wrong. The length of vectors in 2nd and 3rd columns should match the numbers in the 1st column.
The 1st blob, for example, should be [21992901; 21998673]. How I can convert it?
Update:
Just after submitting this question I thought the values might be the character codes of a string.
And it was confirmed:
>> char(result{1,2}')
ans =
21992901,21998673,
So now I need to convert the data in all blobs into numeric vectors. I'm still looking for a vectorized way to do it, since the number of rows can be large.
This will convert your character data to numeric vectors for all except the first column of data in result, placing the results back into the appropriate cells:
result(:,2:end) = cellfun(@(x) str2num(char(x'))',... %# Apply fcn to each cell
result(:,2:end),... %# Input cells
'UniformOutput',false); %# Output as a cell array
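As a quick check of the idea on a single cell, here is a sketch that uses the bytes shown in the question. It parses the text with sscanf, which is my substitution for str2num; the names blob, txt and nums are only for illustration.

```octave
% Bytes of the first blob as printed in the question (18x1 int8)
blob = int8 ([50 49 57 57 50 57 48 49 44 50 49 57 57 56 54 55 51 44]).';

txt = char (blob.');            % character codes -> comma-separated text
nums = sscanf (txt, '%d,');     % read integers, skipping the commas

disp (txt);
disp (nums);
```

Since the format string consumes each trailing comma, the comma at the end of the blob does not produce an extra empty element.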
I suggest using textscan
exons = cellfun(@(x) textscan(char(x'),'%d','Delimiter',','),...
result(:,2:end),'UniformOutput',false);
To get a cell array for each of the two numbers, you can replace the format string by %d,%d and drop the Delimiter option.
Here is what I do:
function res = blob2num(x)
res = str2double(regexp(char(x'),'[^,]+','match')');
then
exons = cellfun(@blob2num,result(:,2:3)','UniformOutput',0)
exons =
[2x1 double] [2x1 double]
[2x1 double] [2x1 double]
Is there a better solution, maybe at the step of retrieving the data?