Can't read previously saved data from *.m files - Octave

[layer1 layer2 layer3] = trainNeuralNetwork4L(tlab, tvec, clab, cvec, 150, 100, 10);
save layerOne100SECOND.m layer1
save layerTwo100SECOND.m layer2
save layerThree100SECOND.m layer3
[efficiency errorsMatrix] = testClassifier4L(layer1, layer2, layer3, 150, 100, 10, tstv, tstl)
1 min 55 s
efficiency = 0.96150
[....]
load "layerTwo100SECOND.m"
layerTwo100SECOND
parse error near line 6 of file /home/yob/studies/rob/lab5/src/layerTwo100SECOND.m
syntax error
>>> 0.3555228566483329 1.434063629132475 0.3947326168010625 -0.2081288665103496 2.116026824600183 -3.72004826748463 -5.971912014167303 -1.831568668193203 -0.5698533706125537 -0.302019433067382 2.105773052363495 -1.386054572212726 1.379784981138861 2.086342965563345 1.686560884521974 1.501297857975125 5.491292848790862 -3.068496819708705 1.709375867569474 -0.0007631747244577478 -3.408706829842817 3.633531634060732 -4.848485685095641 -7.071386223304461 1.005495674207059 1.729698733795992 1.332654214742491 -2.757799109392227 0.5703177663227227 -3.962183321109198 -1.862612684812663 0.002426506616464667 -1.0133423788506 0.9856584491014603 3.261391305445486 -0.238792116035831 7.213403195852512 -0.4550088635822298 2.014786513359268 5.439781417403554 -1.780067076293333 -1.141234270367437 -3.716379329290984 1.329603499392993 0.6289460687541696 1.38704906311103 -1.799460630680088 -1.231927489757737 -1.199171465361949 6.464325931161664 0.7819466841352927 1.518220081499355 -0.3605511334486079 6.646043807207327 -1.885519415534916 1.164993883529136 -0.6867734922571105 -3.487015662787853 0.6052594571214193 0.9747958246654298 -6.681621035920442 6.539828816493673 0.4174688104699146 1.804835542540412 3.099980655618463 0.1957057586983393 -0.5199262355448695 -0.05556003295310553 0.5458621853042805 4.053727148988344 5.08596174444348 -4.4719975219626 4.718638484049811 4.579389030123606 -0.3683947372431971 0.9758069969974679 0.4742051227060113 6.761326112144753 0.9816521216523206 1.790072342537753 0.4513686207416066 -2.880053219384659 -3.256083938937911 3.099498881741825 -0.4967119404782309 -0.6140345297878478 -0.9933076418596357 7.522343253108136 4.93675021253316 -2.693878828387868 -1.358775970578509 -0.7940899801569826 4.867002040829598 4.418439759567837 -2.014761152547027 0.2349575211823655 -4.494720934106189 -2.674441246174409 -0.8495958842163256 0.1921793737146104
^
Why can't I read the previously saved data? Is there any way to use it again?

OK, I dealt with it. I had to call: load -ascii filename. (Invoking the .m file by name makes Octave try to parse it as a script, which is presumably where the parse error came from; load -ascii reads it as plain numeric data instead.)


Cannot load a CSV file in Colab using tf.compat.v1.keras.utils.get_file

I have mounted my Google Drive and have a CSV file in a folder. I am following the tutorial. However, when I call tf.keras.utils.get_file(), I get a ValueError as follows.
data_folder = r"/content/drive/My Drive/NLP/project2/data"
import os
print(os.listdir(data_folder))
It returns:
['crowdsourced_labelled_dataset.csv',
'P2_Testing_Dataset.csv',
'P2_Training_Dataset_old.csv',
'P2_Training_Dataset.csv']
TRAIN_DATA_URL = os.path.join(data_folder, 'P2_Training_Dataset.csv')
train_file_path = tf.compat.v1.keras.utils.get_file("train.csv", TRAIN_DATA_URL)
But this returns:
Downloading data from /content/drive/My Drive/NLP/project2/data/P2_Training_Dataset.csv
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-16-5bd642083471> in <module>()
2 TRAIN_DATA_URL = os.path.join(data_folder, 'P2_Training_Dataset.csv')
3 TEST_DATA_URL = os.path.join(data_folder, 'P2_Testing_Dataset.csv')
----> 4 train_file_path = tf.compat.v1.keras.utils.get_file("train.csv", TRAIN_DATA_URL)
5 test_file_path = tf.compat.v1.keras.utils.get_file("eval.csv", TEST_DATA_URL)
(6 intermediate frames omitted)
/usr/lib/python3.6/urllib/request.py in _parse(self)
382 self.type, rest = splittype(self._full_url)
383 if self.type is None:
--> 384 raise ValueError("unknown url type: %r" % self.full_url)
385 self.host, self.selector = splithost(rest)
386 if self.host:
ValueError: unknown url type: '/content/drive/My Drive/NLP/project2/data/P2_Training_Dataset.csv'
What am I doing wrong please?
As per the docs, this is the signature of tf.compat.v1.keras.utils.get_file:
tf.keras.utils.get_file(
    fname,
    origin,
    untar=False,
    md5_hash=None,
    file_hash=None,
    cache_subdir='datasets',
    hash_algorithm='auto',
    extract=False,
    archive_format='auto',
    cache_dir=None
)
By default the file at the url origin is downloaded to the cache_dir ~/.keras, placed in the cache_subdir datasets, and given the filename fname. The final location of a file example.txt would therefore be ~/.keras/datasets/example.txt.
Returns:
Path to the downloaded file
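For example, with a genuinely remote origin the call behaves as documented. (A hedged illustration; the URL here is the Titanic CSV used in the TensorFlow tutorials, not your data:)
train_file_path = tf.compat.v1.keras.utils.get_file(
    "train.csv",
    "https://storage.googleapis.com/tf-datasets/titanic/train.csv")
# -> downloads to ~/.keras/datasets/train.csv and returns that path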
Since you already have the data in your Drive, there's no need to download it again (and IIUC, the function expects an accessible URL). Also, there's no need to obtain the file name from a function call, because you already know it.
Assuming the drive is mounted, you can replace your file paths as below:
train_file_path = os.path.join(data_folder, 'P2_Training_Dataset.csv')
test_file_path = os.path.join(data_folder, 'P2_Testing_Dataset.csv')
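If you then want to continue with the tutorial's tf.data pipeline, these local paths can be fed straight into the CSV loader. A minimal sketch, assuming tf.data.experimental.make_csv_dataset as in the tutorial; 'label' is a hypothetical column name, so replace it with the actual target column of P2_Training_Dataset.csv:
import tensorflow as tf

train_dataset = tf.data.experimental.make_csv_dataset(
    train_file_path,
    batch_size=5,
    label_name='label',  # hypothetical; use your dataset's real target column
    num_epochs=1,
    ignore_errors=True)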

Dimension problem when converting a MATLAB .m script into an Octave compatible syntax

I want to run a MATLAB script (M-file) that reconstructs a point cloud in Octave. Therefore I had to rewrite some parts of the code to make it compatible with Octave. The M-file actually works fine in Octave (I don't get any errors) and the plotted point cloud also looks good at first glance, but it seems that the variables are only half the size of the original MATLAB variables. The attached screenshots show what I mean.
Octave: [workspace screenshot]
MATLAB: [workspace screenshot]
You can see that the dimension of e.g. M in Octave is 1311114x3 but in MATLAB it is 2622227x3. The actual number of rows in my raw file is 2622227 as well.
Here you can see an extract of the raw file (original data) that I use.
Rotation angle Measured distance
-0,090 26,295
-0,342 26,294
-0,594 26,294
-0,846 26,295
-1,098 26,294
-1,368 26,296
-1,620 26,296
-1,872 26,296
In MATLAB I created my output variable as follows.
data = table;
data.Rotationangle = cell2mat(raw(:, 1));
data.Measureddistance = cell2mat(raw(:, 2));
As there is no table function in Octave, I wrote
data = cellfun(@(x) str2num(x), strrep(raw, ',', '.'))
instead.
Octave also has no struct2array function, so I had to replace it as well.
In MATLAB I wrote:
data = table2array(data);
In Octave this was a bit more difficult to do. I had to create a struct2array function, which I did by means of this bug report.
%% Create a struct2array function
function retval = struct2array (input_struct)
  % input check
  if (~isstruct (input_struct) || (nargin ~= 1))
    print_usage;
  endif
  % convert to cell array and flatten/concatenate output.
  retval = [ (struct2cell (input_struct)){:} ];
endfunction
clear b;
b.a = data;
data = struct2array(b);
Did I make a mistake somewhere and could someone help me to solve this problem?
edit:
Here's the part of my script where I'm using raw.
delimiter = '\t';
startRow = 5;
formatSpec = '%s%s%[^\n\r]';
fileID = fopen(filename,'r');
dataArray = textscan(fileID, formatSpec, 'Delimiter', delimiter, 'HeaderLines', startRow-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
fclose(fileID);
%% Convert the contents of columns containing numeric text to numbers.
% Replace non-numeric text with NaN.
raw = repmat({''}, length(dataArray{1}), length(dataArray)-1);
for col = 1:length(dataArray)-1
  raw(1:length(dataArray{col}), col) = mat2cell(dataArray{col}, ones(length(dataArray{col}), 1));
end
numericData = NaN(size(dataArray{1},1), size(dataArray,2));
for col = [1,2]
  % Converts text in the input cell array to numbers. Replaced non-numeric
  % text with NaN.
  rawData = dataArray{col};
  for row = 1:size(rawData, 1)
    % Create a regular expression to detect and remove non-numeric prefixes and
    % suffixes.
    regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\.]*)+[\,]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\.]*)*[\,]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
    try
      result = regexp(rawData(row), regexstr, 'names');
      numbers = result.numbers;
      % Detected commas in non-thousand locations.
      invalidThousandsSeparator = false;
      if numbers.contains('.')
        thousandsRegExp = '^\d+?(\.\d{3})*\,{0,1}\d*$';
        if isempty(regexp(numbers, thousandsRegExp, 'once'))
          numbers = NaN;
          invalidThousandsSeparator = true;
        end
      end
      % Convert numeric text to numbers.
      if ~invalidThousandsSeparator
        numbers = strrep(numbers, '.', '');
        numbers = strrep(numbers, ',', '.');
        numbers = textscan(char(numbers), '%f');
        numericData(row, col) = numbers{1};
        raw{row, col} = numbers{1};
      end
    catch
      raw{row, col} = rawData{row};
    end
  end
end
You don't see any raw in my workspaces because I clear all temporary variables before I reconstruct my point cloud.
Also, my original data in rows 1311114 and 1311115 look normal.
edit 2:
As suggested, here is a small example table to clarify what I want and what MATLAB does with the table2array function in my case.
data =
-0.0900 26.2950
-0.3420 26.2940
-0.5940 26.2940
-0.8460 26.2950
-1.0980 26.2940
-1.3680 26.2960
-1.6200 26.2960
-1.8720 26.2960
With the struct2array function I used in Octave I get the following array.
data =
-0.090000 26.295000
-0.594000 26.294000
-1.098000 26.294000
-1.620000 26.296000
-2.124000 26.295000
-2.646000 26.293000
-3.150000 26.294000
-3.654000 26.294000
If you compare the Octave array with my original data, you can see that every second row is skipped. This seems to be the reason for 1311114 instead of 2622227 rows.
edit 3:
I tried to solve my problem with the suggestions of @Tasos Papastylianou, which unfortunately were not successful.
First I did the variant with a struct.
data = struct();
data.Rotationangle = [raw(:,1)];
data.Measureddistance = [raw(:,2)];
data = cell2mat( struct2cell (data ).' )
But this leads to the following structure in my script, which is unfortunately not the result I would like to have, as shown in edit 2. (Don't be surprised: I only used a small part of my raw file to speed up the script, so there are only 769 lines here.)
[766,1] = -357,966
[767,1] = -358,506
[768,1] = -359,010
[769,1] = -359,514
[1,2] = 26,295
[2,2] = 26,294
[3,2] = 26,294
[4,2] = 26,296
Furthermore, I get the following error:
error: unary operator '-' not implemented for 'cell' operands
error: called from
Cloud_reconstruction at line 137 column 11
The approach with the dataframe Octave package didn't work either. When I run the following code, it leads to the error you can see below.
dataframe2array = @(df) cell2mat( struct(df).x_data );
pkg load dataframe;
data = dataframe();
data.Rotationangle = [raw(:, 1)];
data.Measureddistance = [raw(:, 2)];
dataframe2array(data)
error:
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 147 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 106 column 20
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 176 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 106 column 20
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 147 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 107 column 23
warning: Trying to overwrite colum names
warning: called from
df_matassign at line 176 column 13
subsasgn at line 172 column 14
Cloud_reconstruction at line 107 column 23
error: RHS(_,2): but RHS has size 768x1
error: called from
df_matassign at line 179 column 11
subsasgn at line 172 column 14
Cloud_reconstruction at line 107 column 23
Both error messages refer to the following part of my script where I'm doing the reconstruction of the point cloud in cylindrical coordinates.
distLaserCenter = 47;  % Distance between the pipe centerline and the blind zone in mm
m = size(data,1);      % Find the length of the first dimension of data
zincr = 0.4/360;       % z increment in mm per deg
data(:,1) = -data(:,1);
for i = 1:m
  data(i,2) = data(i,2) + distLaserCenter;
  if i == 1
    data(i,3) = 0;
  elseif abs(data(i,1)-data(i-1)) < 100
    data(i,3) = data(i-1,3) + zincr*(data(i,1)-data(i-1));
  else abs(data(i,1)-data(i-1)) > 100;
    data(i,3) = data(i-1,3) + zincr*(data(i,1)-(data(i-1)-360));
  end
end
To give some background information for a better understanding: the script is used to reconstruct a pipe as a point cloud. The surface of the pipe was scanned from inside with a laser, and the laser measured several points (distance from the laser to the inner wall of the pipe) at each degree of rotation. I hope this helps to understand what I want to do with my script.
Not sure exactly what you're trying to do, but here's a toy example of how a struct could be used in an equivalent manner to a table:
matlab:
data = table;
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];
table2array(data)
octave:
data = struct();
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];
cell2mat( struct2cell (data ).' )
Note the transposition operation (.') before passing the result to cell2mat, since in a table, the 'fieldnames' are arranged horizontally in columns, whereas the struct2cell ends up arranging what used to be the 'fieldnames' as rows.
You might also be interested in the dataframe octave package, which performs similar functions to matlab's table (or in fact, R's dataframe object): https://octave.sourceforge.io/dataframe/ (you can install this by typing pkg install -forge dataframe in your console)
Unfortunately, the way to display the data as an array is still not ideal (see: https://stackoverflow.com/a/55417141/4183191), but you can easily convert that into a tiny function, e.g.
dataframe2array = @(df) cell2mat( struct(df).x_data );
Your code can then become:
pkg load dataframe;
data = dataframe();
data.A = [1;2;3;4;5];
data.B = [10;20;30;40;50];
dataframe2array(data)

How to unpickle inferSent and load model?

I had working code that simply loads the InferSent model. Now it won't unpickle the model.
MODEL_PATH = "./encoder/infersent1.pkl"
params_model = {'bsize': 64, 'word_emb_dim': 300, 'enc_lstm_dim': 2048,
                'pool_type': 'max', 'dpout_model': 0.0, 'version': model_version}
inferSent = InferSent(params_model)
print(MODEL_PATH)
inferSent.load_state_dict(torch.load(MODEL_PATH))
use_cuda = False
inferSent = inferSent.cuda() if use_cuda else inferSent
# If infersent1 -> use GloVe embeddings. If infersent2 -> use InferSent embeddings.
W2V_PATH = './dataset/GloVe/glove.840B.300d.txt' if model_version == 1 else '../dataset/fastText/crawl-300d-2M.vec'
inferSent.set_w2v_path(W2V_PATH)
UnpicklingError: invalid load key, '<'.
The reason for this problem is that your pickle file was not downloaded properly.
Check the size of your file; it should be around 160 MB. For some reason, the links in the InferSent repo don't work. You can build your own NLI model using the train_nli.py script provided in the repository.
python train_nli.py --word_emb_path 'Your word embedding(for example GloVe/fastText)'
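A quick way to confirm a bad download is to inspect the file itself; an unpickling error with load key '<' usually means the "pickle" is actually an HTML error page. A minimal check, assuming MODEL_PATH as above:
import os

print(os.path.getsize(MODEL_PATH) / 1e6, "MB")  # should be around 160 MB
with open(MODEL_PATH, 'rb') as f:
    print(f.read(32))  # b'<!DOCTYPE html' or b'<html' means you saved a web page, not the model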

Automating a process for multiple CSV files

I've been looking around and couldn't find an answer, so here it is.
I'm trying to find a way to automate converting the content of a CSV file into something else for machine-learning purposes. I have the content of a single line like this:
0, 0, 0, -2.3145, 5.567...... 65, 65, 125, 70.
(516 columns)
And trying to change it to this:
0,
0,
-2.3145,
5.567
....
65,
65,
125,
70.
(516 rows)
So basically transposing the data from horizontal to vertical (single row to single column).
It's easily done using Excel, but the problem is that I have 4000+ CSV files, so it takes a lot of time.
On top of that, I have to store the first 512 rows in a CSV in one folder and the last 4 rows in a CSV in another folder, with both files keeping the same name.
Eg:
features(folder)
1.CSV
2.CSV
.....
4000+.CSV
labels(folder)
1.CSV
2.CSV
.....
4000+.CSV
Any suggestions on how I can speed things up? I tried writing my own program, but I'm stumped on changing the data from a row to a column. I've only managed to split the single CSV file into its 4000+ pieces.
EDIT:
I've tested putting the CSV rows into a list and then storing the list into CSVs; the code looks like this:
import csv

with open('FFTMIM16_512L1H1S0D0_1194.csv', 'r') as f:
    reader = csv.reader(f)
    your_list = list(reader)
print(your_list[0:512])
print(your_list[512:516])
print(your_list)
with open('test.csv', 'w', newline='') as fa:
    writer = csv.writer(fa)
    writer.writerows(your_list[0:511])
with open('test1.csv', 'w', newline='') as fb:
    writer = csv.writer(fb)
    writer.writerows(your_list[512:516])
It works, but I just need to run it in a loop. A problem I don't understand: if I save the values from 0 to 512 in test.csv, it shows 512 rows, but when I store from 513 to 516 in test1.csv, it only shows three rows instead of the four I need. Changing the fb slice from 512 to 516 works, which doesn't make sense to me, because the value at 512 in test.csv is 0 while in test1.csv it is 69. Why is that? From what I understand of array indexing, it starts from 0 and goes up to the number I need. Or is that not the case in Python?
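(For reference, Python slices are half-open: your_list[a:b] starts at index a and stops just before index b. A quick demonstration:)
your_list = list(range(516))
print(len(your_list[0:512]))    # 512 -- indices 0..511
print(len(your_list[512:516]))  # 4   -- indices 512..515
print(len(your_list[513:516]))  # 3   -- indices 513..515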
EDIT 2:
My new code is as follows:
import csv
import os
import glob
#import itertools

directory = input("INPUT FOLDER: ")
output1 = input("FEATURES FOLDER: ")
output2 = input("LABELS FOLDER: ")
in_files = os.path.join(directory, '*.csv')
for in_file in glob.glob(in_files):
    with open(in_file) as input_file:
        reader = csv.reader(input_file)
        your_list = list(reader)
    filename = os.path.splitext(os.path.basename(in_file))[0] + '.csv'
    with open(os.path.join(output1, filename), 'w', newline='') as output_file1:
        writer = csv.writer(output_file1)
        writer.writerow(your_list[0:512])
    with open(os.path.join(output2, filename), 'w', newline='') as output_file2:
        writer = csv.writer(output_file2)
        writer.writerow(your_list[512:516])
It shows the output as I wanted, but now it stores quotes and square brackets as well, e.g. ['0.0'], ['2.321223']. How do I remove these?
I don't understand why you can't do it programmatically if you have your 4000+ pieces; just write every piece on a new line?
In my opinion the easiest way, though not automatic, would be an editor like Notepad++.
There you can replace "," with "\r\n", or, if you want to keep the ",", replace it with ",\r\n".
If you want it automated, I don't see a non-programmatic way around it.
By the way, if you use Python with NumPy/SciPy, you can just use the .transpose() function, as sketched below.
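A minimal sketch of that approach, assuming one 516-value row per input file and the features/labels folder layout from the question (the file name is illustrative):
import numpy as np

# Read one wide CSV row, turn it into a column, then split it:
# the first 512 values become the features file, the last 4 the labels file.
row = np.loadtxt('1.CSV', delimiter=',')  # shape (516,)
col = row.reshape(-1, 1)                  # shape (516, 1)
np.savetxt('features/1.CSV', col[:512], delimiter=',')
np.savetxt('labels/1.CSV', col[512:], delimiter=',')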
*Edit to your comment:
What do you mean by "split from the first to the 512"? If you want parts of size 512, it would be something like:
new_array = []
temp_array = []
k = 0
for num in your_array:
    temp_array.append(num)
    k += 1
    if k % 512 == 0:
        new_array.append(temp_array)
        k = 0
        temp_array = []
# to append the last block, which might not be 512 sized
if len(temp_array) > 0:
    new_array.append(temp_array)
# Save arrays
for i in range(len(new_array)):
    saveToCsv(array=new_array[i], name="csv_" + str(i))
Your new_array would now be an array filled with 512-sized arrays.
There might be mistakes here; I did not test the code. To save, you only need a function saveToCsv(array, name) which saves an array into a file, for example like the sketch below.
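A minimal sketch of such a helper, assuming one value per output row (the name and format are illustrative):
import csv

def saveToCsv(array, name):
    # Write each element of 'array' on its own row of '<name>.csv'.
    with open(name + '.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        for value in array:
            writer.writerow([value])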

How to create a Feature Collection from a GeoJSON

I have a Feature Collection of Polygons and MultiPolygons, and currently I have to write it to a temporary file first in order to load it with geopandas.GeoDataFrame.from_file(tmp_json_file); I'm looking for a way to do it without the temporary file. I've tried geopandas.GeoDataFrame.from_features(): it works pretty well for a Feature Collection of simple Polygons, but I can't make it work for a Feature Collection of Polygons and MultiPolygons. I was thinking about doing something like below, but it's not working yet.
features_collection = []
for feature in json_data['features']:
    tmp_properties = {'id': feature['properties']['id']}
    if is_multipolygon(feature):
        tmp = Feature(geometry=MultiPolygon(feature['geometry']['coordinates']), properties=tmp_properties)
    else:
        tmp = Feature(geometry=Polygon(feature['geometry']['coordinates']), properties=tmp_properties)
    features_collection.append(tmp)
collection = FeatureCollection(features_collection)
return geopandas.GeoDataFrame.from_features(collection['features'])
The GeoJSON is taken from an API that returns territories (some territories are modeled as a single Polygon, others as a set of polygons formatted as a MultiPolygon).
The GeoJSON is structured as follows: http://pastebin.com/PPdMUGkY
I'm getting the following error from the function above:
Traceback (most recent call last):
  File "overlap.py", line 210, in <module>
    print bdv_json_to_geodf(contours_bdv)
  File "overlap.py", line 148, in json_to_geodf
    return geopandas.GeoDataFrame.from_features(collection['features'])
  File "/Library/Python/2.7/site-packages/geopandas/geodataframe.py", line 179, in from_features
    d = {'geometry': shape(f['geometry'])}
  File "/Library/Frameworks/GEOS.framework/Versions/3/Python/2.7/site-packages/shapely/geometry/geo.py", line 40, in shape
    return MultiPolygon(ob["coordinates"], context_type='geojson')
  File "/Library/Frameworks/GEOS.framework/Versions/3/Python/2.7/site-packages/shapely/geometry/multipolygon.py", line 64, in __init__
    self._geom, self._ndim = geos_multipolygon_from_py(polygons)
  File "/Library/Frameworks/GEOS.framework/Versions/3/Python/2.7/site-packages/shapely/geometry/multipolygon.py", line 138, in geos_multipolygon_from_py
    N = len(ob[0][0][0])
TypeError: object of type 'float' has no len()
For me this works if I just feed the json_data features to GeoDataFrame.from_features:
In [17]: gdf = geopandas.GeoDataFrame.from_features(json_data['features'])
In [18]: gdf.head()
Out[18]:
geometry id
0 (POLYGON ((-0.58570861816406 44.810461337462, ... 2
1 (POLYGON ((-0.5851936340332 44.816550206151, -... 1
2 POLYGON ((-0.58805465698242 44.824018340447, -... 5
3 POLYGON ((-0.59412002563477 44.821664359038, -... 9
4 (POLYGON ((-0.58502197265625 44.817159057661, ... 12
The resulting GeoDataFrame has a mixture of Polygons and MultiPolygons like in the input data:
In [19]: gdf.geom_type.head()
Out[19]:
0 MultiPolygon
1 MultiPolygon
2 Polygon
3 Polygon
4 MultiPolygon
dtype: object
I tried this with GeoPandas 0.2, shapely 1.5.15, pandas 0.18.1 on Windows.
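A self-contained version of that approach, assuming the GeoJSON has already been fetched from the API into a string (geojson_text here is a hypothetical variable holding the response body), so no temporary file is needed:
import json
import geopandas

json_data = json.loads(geojson_text)  # geojson_text: the raw API response body
gdf = geopandas.GeoDataFrame.from_features(json_data['features'])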