problem when trying to load data with textscan - octave

I want to load a set of data that contains variables type string and float. But when I use textscan in octave, my data doesn't load. I got matrix 1x6 (i have 6 features), but in this matrix I got cells that contains nothing(cells that are 0x1).
my code:
filename='data1.txt';
fileID = fopen(filename,'r');
data = textscan(fileID,'%f %s %s %f %f %s','Delimiter',',');
fclose(fileID);
when I for example try data(1):
>> data(1)
ans =
{
[1,1] = [](0x1)
}
>>
there is it
there is my set
Also my file id isn't -1.
I had been searching for in the ethernet problem like this but I couldn't find any.
I tried to delete headers in data and smaller training set but it don't work.
Pls help.

Don't use textscan. Textscan is horrible, and one should only use it when they're trying to parse data when there's no better way available.
Your data is a standard csv file. Just use csv2cell from the io package.

Related

Q/KDB+ / CSV upload and WSFULL

Pardon me but I'm a Q novice and couldn't find a solution. The code below appends a four-column CSV file to a KDB+ database. This code worked well but, now that my database is large, it throws a WSFULL error. Perhaps there is a more memory efficient way to write it. Please help:
// FILE_LOADER.q
\c 520 500
if [(count .z.x) < 1;
show `$"usage: q loadcsv.q inputfile destfile
where inputfile and destfile are absolute or relative paths to
the files. Inputfile has the following fields:
DATE, TICKER, FIELD, VALUE. DATE is of type date,
TICKER and FIELD are strings, and VALUE is converted to a float.
Any string VALUEs will show up as nulls.";
exit 1
]
f1: hsym `$.z.x[0]
f2: hsym `$.z.x[1]
columns: `DATE`TICKER`FIELD`VALUE
if [() ~ key f1; show ("Input file '",.z.x[0],"' not found");exit 1]
x: .Q.fsn[{f2 upsert flip columns!("DSSF";",")0:x};f1;4194000000]
show ("loaded ",(string x)," characters into the kdb database")
exit 0
First just from trying this out I assume your input csv file never has a header? If it does you'll need a slight code change so kdb is aware.
You are correct that it's a memory issue so what you can do is just decrease the chunk size. You are reading in 4194000000 bytes at a time right now. Try lowering this in accordance with available memory.
If you are still seeing issues it may be your garbage collection setting. You could force a gc after each read/upsert.
.Q.fsn[{f2 upsert flip columns!("DSSF";",")0:x;**.Q.gc[]**};f1;4194000000]

It is possible to obtain the mean of different files to make some computations with it after?

I have a code to calculate the mean of the first five values of each column of a file, for then use these values as a reference point for all set. The problem is that now I need to do the same but for many files. So I will need to obtain the mean of each file to then use these values again with the originals files. I have tried in this way but I obtain an error. Thanks.
%%% - Loading the file of each experiment
myfiles = dir('*.lvm'); % To load every file of .lvm
for i = 1:length(myfiles) % Loop with the number of files
files=myfiles(i).name;
mydata(i).files = files;
mydata(i).T = fileread(files);
arraymean(i) = mean(mydata(i));
end
The files that I need to compute are more or less like this:
Delta_X 3.000000 3.000000 3.000000
***End_of_Header***
X_Value C_P1N1 C_P1N2 C_P1N3
0.000000 -0.044945 -0.045145 -0.045705
0.000000 -0.044939 -0.045135 -0.045711
3.000000 -0.044939 -0.045132 -0.045706
6.000000 -0.044938 -0.045135 -0.045702
Your first line results in 'myfiles' being a structure array with components that you will find defined when you type 'help dir'. In particular, the names of all the files are contained in the structure element myfiles(i).name. To display all the file names, type myfiles.name. So far so good. In the for loop you use 'fileread', but fileread (see help fileread) returns the character string rather than the actual values. I have named your prototype .lvm file DinaF.lvm and I have written a very, very simple function to read the data in that file, by skipping the first three lines, then storing the following matrix, assumed to have 4 columns, in an array called T inside the function and arrayT in the main program
Here is a modified script, where a function read_lvm has been included to read your 'model' lvm file.
The '1' in the first line tells Octave that there is more to the script than just the following function: the main program has to be interpreted as well.
1;
function T=read_lvm(filename)
fid = fopen (filename, "r");
%% Skip by first three lines
for lhead=1:3
junk=fgetl(fid);
endfor
%% Read nrow lines of data, quit when file is empty
nrow=0;
while (! feof (fid) )
nrow=nrow + 1;
thisline=fscanf(fid,'%f',4);
T(nrow,1:4)=transpose(thisline);
endwhile
fclose (fid);
endfunction
## main program
myfiles = dir('*.lvm'); % To load every file of .lvm
for i = 1:length(myfiles) % Loop with the number of files
files=myfiles(i).name;
arrayT(i,:,:) = read_lvm(files);
columnmean(i,1:4)=mean(arrayT(i,:,:))
end
Now the tabular values associated with each .lvm file are in the array arrayT and the mean for that data set is in columnmean(i,1:4). If i>1 then columnmean would be an array, with each row containing the files for each lvm file. T
This discussion is getting to be too distant from the initial question. I am happy to continue to help. If you want more help, close this discussion by accepting my answer (click the swish), then ask a new question with a heading like 'How to read .lvm files in Octave'. That way you will get the insights from many more people.

using a variable to identify file in 'print -dpdf file_name'

I am trying to use a formatted string to identify the file location when using 'print -dpdf file_name' to write a plot (or figure) to a file.
I've tried:
k=1;
file_name = sprintf("\'/home/user/directory to use/file%3.3i.pdf\'",k);
print -dpdf file_name;
but that only gets me a figure written to ~/file_name.pdf which is not what I want. I've tried several other approaches but I cannot find an approach that causes the the third term (file_name, in this example) to be evaluated. I have not found any other printing function that will allow me to perform a formatted write (the '-dpdf' option) of a plot (or figure) to a file.
I need the single quotes because the path name to the location where I want to write the file contains spaces. (I'm working on a Linux box running Fedora 24 updated daily.)
If I compute the file name using the line above, then cut and paste it into the print statement, everything works exactly as I wish it to. I've tried using
k=1;
file_name = sprintf("\'/home/user/directory to use/file%3.3i.pdf\'",k);
print ("-dpdf", '/home/user/directory to use/file001.pdf');
But simply switching to a different form of print statement doesn't solve the problem,although now I get an error message:
GPL Ghostscript 9.16: **** Could not open the file '/home/user/directory to use/file001.pdf' .
**** Unable to open the initial device, quitting.
warning: broken pipe
if you use foo a b this is the same as foo ("a", "b"). In your case you called print ("-dpdf", "file_name")
k = 1;
file_name = sprintf ("/home/user/directory to use/file%3.3i.pdf", k);
print ("-dpdf", file_name);
Observe:
>> k=1;
>> file_name = sprintf ('/home/tasos/Desktop/a folder with spaces in it/this is file number %3.3i.pdf', k)
file_name = /home/tasos/Desktop/a folder with spaces in it/this is file number 001.pdf
>> plot (1 : 10);
>> print (gcf, file_name, '-dpdf')
Tadaaa!
So yeah, no single quotes needed. The reason single quotes work when you're "typing it by hand" is because you're literally creating the string on the spot with the single quotes.
Having said that, it's generally a good idea when generating absolute paths to use the fullfile command instead. Have a look at it.
Tasos Papastylianou #TasosPapastylianou provided great help. My problem is now solved.

Python 3 code to read CSV file, manipulate then create new file....works, but looking for improvements

This is my first ever post here. I am trying to learn a bit of Python. Using Python 3 and numpy.
Did a few tutorials then decided to dive in and try a little project I might find useful at work as thats a good way to learn for me.
I have written a program that reads in data from a CSV file which has a few rows of headers, I then want to extract certain columns from that file based on the header names, then output that back to a new csv file in a particular format.
The program I have works fine and does what I want, but as I'm a newbie I would like some tips as to how I can improve my code.
My main data file (csv) is about 57 columns long and about 36 rows deep so not big.
It works fine, but looking for advice & improvements.
import csv
import numpy as np
#make some arrays..at least I think thats what this does
A=[]
B=[]
keep_headers=[]
#open the main data csv file 'map.csv'...need to check what 'r' means
input_file = open('map.csv','r')
#read the contents of the file into 'data'
data=csv.reader(input_file, delimiter=',')
#skip the first 2 header rows as they are junk
next(data)
next(data)
#read in the next line as the 'header'
headers = next(data)
#Now read in the numeric data (float) from the main csv file 'map.csv'
A=np.genfromtxt('map.csv',delimiter=',',dtype='float',skiprows=5)
#Get the length of a column in A
Alen=len(A[:,0])
#now read the column header values I want to keep from 'keepheader.csv'
keep_headers=np.genfromtxt('keepheader.csv',delimiter=',',dtype='unicode_')
#Get the length of keep headers....i.e. how many headers I'm keeping.
head_len=len(keep_headers)
#Now loop round extracting all the columns with the keep header titles and
#append them to array B
i=0
while i < head_len:
#use index to find the apprpriate column number.
item_num=headers.index(keep_headers[i])
i=i+1
#append the selected column to array B
B=np.append(B,A[:,item_num])
#now reshape the B array
B=np.reshape(B,(head_len,36))
#now transpose it as thats the format I want.
B=np.transpose(B)
#save the array B back to a new csv file called 'cmap.csv'
np.savetxt('cmap.csv',B,fmt='%.3f',delimiter=",")
Thanks.
You can greatly simplify your code using more of numpy capabilities.
A = np.loadtxt('stack.txt',skiprows=2,delimiter=',',dtype=str)
keep_headers=np.loadtxt('keepheader.csv',delimiter=',',dtype=str)
headers = A[0,:]
cols_to_keep = np.in1d( headers, keep_headers )
B = np.float_(A[1:,cols_to_keep])
np.savetxt('cmap.csv',B,fmt='%.3f',delimiter=",")

Working with a big CSV file in MATLAB

I have to work with a big CSV file, up to 2GB. More specifically I have to upload all this data to the mySQL database, but before I have to make a few calculation on that, so I need to do all this thing in MATLAB (also my supervisor want to do in MATLAB because he familiar just with MATLAB :( ).
Any idea how can I handle these big files?
You should probably use textscan to read the data in in chunks and then process. This will probably be more efficient than reading a single line at a time. For example, if you have 3 columns of data, you could do:
filename = 'fname.csv';
[fh, errMsg] = fopen( filename, 'rt' );
if fh == -1, error( 'couldn''t open file: %s: %s', filename, errMsg ); end
N = 100; % read 100 rows at a time
while ~feof( fh )
c = textscan( fh, '%f %f %f', N, 'Delimiter', ',' );
doStuff(c);
end
EDIT
These days (R2014b and later), it's easier and probably more efficient to use a datastore.
There is good advice on handling large datasets in MATLAB in this file exchange item.
Specific topics include:
* Understanding the maximum size of an array and the workspace in MATLAB
* Using undocumented features to show you the available memory in MATLAB
* Setting the 3GB switch under Windows XP to get 1GB more memory for MATLAB
* Using textscan to read large text files and memory mapping feature to read large binary files