Extract two columns from a txt file with commas and text - Octave

I have a problem reading some data from a txt file. I'd appreciate any suggestions; thank you in advance!
I have a txt file with some lines of text/numbers at the top, followed by two tab-separated columns of numbers (which, additionally, use commas instead of dots as decimal separators).
I want to extract the two columns without the text, and replace the commas with dots in order to plot them.
I tried importdata so that I could replace the commas, but it separates every single character, so I get 36k elements instead of 2048.
I tried dlmread, but it ignores the second column...
I have no idea how to proceed without modifying every single file manually.
Here is an example of the file:
Data from FLMS012901__118__10-30-26-589.txt Node
Date: Tue Jul 05 10:30:26 CEST 2022
User: Myself
Number of Pixels in Spectrum: 2048
>>>>>Begin Spectral Data<<<<<
338,147 -2183,94
338,527 -2183,94
338,906 -2183,94
339,286 -2251,25
Any suggestions?
EDIT:
Apparently there was already a solution, though a bit slow:
% Read file in as a series of strings
fid = fopen('data.txt', 'rb');
strings = textscan(fid, '%s', 'Delimiter', '');
fclose(fid);
% Replace all commas with decimal points
decimal_strings = regexprep(strings{1}, ',', '.');
% Convert to doubles and join all rows together
data = cellfun(@str2num, decimal_strings, 'uni', 0);
data = cat(1, data{:});

On the sample that you provide, the following works:
>> [a,b,c,d] = textread("SO_73502149.txt","%f,%f %f,%f", "headerlines", 6);
>> format free
>> [a+b/1000, c+sign(c).*d/100]
ans =
338.147 -2183.94
338.527 -2183.94
338.906 -2183.94
339.286 -2251.25
However, there are some possible traps: depending on how decimal figures are handled in your file, you should adapt the post-processing. If, for 338.10, the file prints 338,1 instead of 338,10, the decoding would be a bit harder. And whenever c becomes zero, sign(c) would kill the decimal part. A less trivial post-processing would be required in both cases.
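A more robust route (a minimal sketch, assuming the file is named data.txt and the numeric block always follows a single >>>>>Begin Spectral Data<<<<< marker) is to replace the commas in the raw text before any numeric parsing, so no digit recombination is needed and the traps above disappear:
fid = fopen('data.txt', 'r');
txt = fread(fid, Inf, 'char=>char')';    % whole file as one char row vector
fclose(fid);
txt = strrep(txt, ',', '.');             % decimal commas -> dots
% keep only what follows the marker line, then parse the two columns
marker = '>>>>>Begin Spectral Data<<<<<';
body = txt(strfind(txt, marker) + numel(marker) : end);
data = sscanf(body, '%f %f', [2, Inf])'; % N-by-2 matrix
plot(data(:,1), data(:,2));
Because the replacement happens on the raw text, 338,1 and 338,10 both decode to the intended values.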

Related

How to replace MATLAB's timeseries and synchronize functions in Octave?

I have a MATLAB script that I would like to run in Octave, but it turns out that the timeseries and synchronize functions from MATLAB are not yet implemented in Octave. So my question is whether there is a way to express or replace these functions in Octave.
For context, I have two text files with different numbers of rows, which I want to synchronize into one text file with the same number of rows over time. The content of the text files is:
Text file 1:
1st column contains the distance
2nd column contains the time
Text file 2:
1st column contains the angle
2nd column contains the time
Here is the part of my code that I use in MATLAB to synchronize the files.
ts1 = timeseries(distance,timed);
ts2 = timeseries(angle,timea);
[ts1 ts2] = synchronize(ts1,ts2,'union');
distance = ts1.Data;
angle = ts2.Data;
Thanks in advance for your help.
edit:
Here are some example files.
input distance
input rotation angle
output
The synchronize function seems to create a common timeseries from two separate ones (here, specifically via their union), and then use interpolation (here 'linear') to find interpolated values for both distance and angle at the common timepoints.
An example of how to achieve this to get the same output in octave as your provided output file is as follows.
Note: I had to preprocess your input files first to replace 'decimal commas' with dots, and then 'tabs' with commas, to make them valid csv files.
Distance_t = csvread('input_distance.txt', 1, 0); % skip header row
Rotation_t = csvread('input_rotation_angle.txt', 1, 0); % skip header row
Common_t = union( Distance_t(:,2), Rotation_t(:,2) );
InterpolatedDistance = interp1( Distance_t(:,2), Distance_t(:,1), Common_t );
InterpolatedRotation = interp1( Rotation_t(:,2), Rotation_t(:,1), Common_t );
Output = [ InterpolatedRotation, InterpolatedDistance ];
Output = sortrows( Output, -1 ); % sort according to column 1, in descending order
Output = Output(~isna(Output(:,2)), :); % remove NA entries
(Note: the step removing NA entries was necessary because we did not request extrapolation during the interpolation step, so some of the common timepoints fall outside the original time range of one of the series; Octave marks the corresponding interpolated values as NA.)
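For completeness, that preprocessing can itself be scripted in Octave rather than done by hand. A minimal sketch for one file (the raw and cleaned file names here are assumptions):
raw = fileread('input_distance_raw.txt');  % tab-separated, decimal commas
raw = strrep(raw, ',', '.');               % decimal commas -> dots
raw = strrep(raw, sprintf('\t'), ',');     % tabs -> comma separators
fid = fopen('input_distance.txt', 'w');
fputs(fid, raw);
fclose(fid);
The same two strrep calls clean the rotation angle file.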

Opening a file of varying row and column structure in Scilab

I habitually use csvRead in Scilab to read my data files; however, I am now faced with one that contains blocks of 200 rows, each preceded by 3 lines of headers, all of which I would like to take into account.
I've tried specifying a range of data following the example on the Scilab help website for csvRead (the example is right at the bottom of https://help.scilab.org/doc/6.0.0/en_US/csvRead.html), but I always come out with the same error messages:
The line and/or column indices are outside of the limits
or
Error in the column structure.
My first three lines are headers, which I know can cause a problem, but even if I omit them from my block range, I still have the same problem.
Otherwise, my data is ordered such that I have my three lines of headers (two lines containing a header over just one or two columns, one line containing a header over all columns), 200 lines of data, and a blank line. This represents the data from one image, and I have about 500 images in the file. I would like to be able to read and process all of them and keep track of the headers, because they state the image number, which I need to reference later. Example:
DTN-dist_Devissage-1_0006_0,,,,,,
L0,,,,,,
X [mm],Y [mm],W [mm],exx [1] - Lagrange,eyy [1] - Lagrange,exy [1] - Lagrange,Von Mises Strain [1] - Lagrange
-1.13307,-15.0362,-0.00137507,7.74679e-05,8.30045e-05,5.68249e-05,0.00012711
-1.10417,-14.9504,-0.00193334,7.66086e-05,8.02914e-05,5.43132e-05,0.000122655
-1.07528,-14.8647,-0.00249155,7.57493e-05,7.75786e-05,5.18017e-05,0.0001182
Does anyone have a solution to this?
My current code, following an adapted version of the Scilab help example, looks like this (I have tried varying the blocksize and iblock values to include/omit headers):
blocksize=200;
C1=1;
C2=14;
iblock=1
while (%t)
R1=(iblock-1)*blocksize+4;
R2=blocksize+R1-1;
irange=[R1 C1 R2 C2];
V=csvRead(filepath+filename,",",".","",[],"",irange);
iblock=iblock+1
end
Errors
The CSV
A lot of your problem comes from the inconsistent number of commas in your csv file. Opening it in LibreOffice Calc and saving it puts the right number of commas, even on empty lines.
R1
Your current code doesn't position R1 at the beginning of the values. The right formula is
R1=(iblock-1)*(blocksize+blanksize+headersize)+1+headersize;
End of file
Currently your code raises an error at the end of the file because R1 becomes greater than the number of lines. To solve this, you can specify the maximum number of blocks or test the value of R1 against the number of lines.
Improved solution for a much bigger file
When solving your problem with a big file, two issues were raised:
We need to know the number of blocks or the number of lines.
Each call of csvRead is really slow because it processes the whole file at each call (1 s / block!).
My idea was to read the whole file and store it in a string matrix (since mgetl has been improved since 6.0.0), then use csvTextScan on a submatrix. Doing so also removes the manual writing of the number of blocks/lines.
The code follows :
clear all
clc
s = filesep()
filepath='.'+s;
filename='DTN_full.csv';
// the header is important as it has the image name
headersize=3;
blocksize=200;
C1=1;
C2=14;
iblock=1
// let's save everything. Good for the example.
bigstruct = struct();
// Read all the value in one pass
// then using csvTextScan is much more efficient
text = mgetl(filepath+filename);
nlines = size(text,'r');
while ( %t )
mprintf("Block #%d",iblock);
// Lets read the header
R1=(iblock-1)*(headersize+blocksize+1)+1;
R2=R1 + headersize-1;
// if R1 or R2 is bigger than the number of lines, stop
if sum([R1,R2] > nlines )
mprintf('; End of file\n')
break
end
// We use csvTextScan only on the lines that matter;
// this speeds up the program, since csvRead reads the whole file
// every time it is used.
H=csvTextScan(text(R1:R2),",",".","string");
mprintf("; %s",H(1,1))
R1 = R1 + headersize;
R2 = R1 + blocksize-1;
if sum([R1,R2]> nlines )
mprintf('; End of file\n')
break
end
mprintf("; rows %d to %d\n",R1,R2)
// Lets read the values
V=csvTextScan(text(R1:R2),",",".","double");
iblock=iblock+1
// Let's save these data
bigstruct(H(1,1)) = V;
end
and returns
Block #1; DTN-dist_0005_0; rows 4 to 203
....
Block #178; DTN-dist_0710_0; rows 36112 to 36311
Block #179; End of file
Time elapsed 1.827092s

What does 'multiline strings are different' mean in RIDE (Robot Framework) output?

I am trying to compare the data of two csv files and followed the process below in RIDE:
${csvA} =    Get File    ${filePathA}
${csvB} =    Get File    ${filePathB}
Should Be Equal As Strings    ${csvA}    ${csvB}
Here are the contents of my two csv files:
csvA data
Harshil,45,8.03,DMJ
Divy,55,8,VVN
Parth,1,9,vvn
kjhjmb,44,0.5,bugg
csvB data
Harshil,45,8.03,DMJ
Divy,55,78,VVN
Parth,1,9,vvnbcb
acc,5,6,afafa
As some of the data does not match, when I run the code in RIDE the result is FAIL. But the log shows the data below:
Multiline strings are different:
--- first
+++ second
@@ -1,4 +1,4 @@
Harshil,45,8.03,DMJ
-Divy,55,8,VVN
-Parth,1,9,vvn
-kjhjmb,44,0.5,bugg
+Divy,55,78,VVN
+Parth,1,9,vvnbcb
+acc,5,6,afafa
I would like to know the meaning of the --- first, +++ second, and @@ -1,4 +1,4 @@ content.
Thanks in advance!
When robot compares multiline strings (data that has newlines in it), it shows the differences in the same format as the standard unix tool diff. Those markers are all part of what's called a unified diff. Even though you pass in raw data, robot treats the data as two files and shows the differences between them in a format familiar to most programmers.
Here are two references to read more about the format:
What does "## -1 +1 ##" mean in Git's diff output?. (stackoverflow)
the diff man page (gnu.org)
In short, the @@ line gives you a reference for which line numbers are different (here @@ -1,4 +1,4 @@ means the hunk covers lines 1 through 4 of the first string and lines 1 through 4 of the second), and the + and - prefixes show you which lines are different.
In your specific example it's telling you that three lines differed between the two strings: the lines beginning with Divy, Parth, and kjhjmb in the first string (marked -) were replaced by the lines beginning with Divy, Parth, and acc in the second (marked +). Since the line beginning with Harshil shows neither + nor -, it was identical in both strings.

Using a variable to identify the file in 'print -dpdf file_name'

I am trying to use a formatted string to identify the file location when using 'print -dpdf file_name' to write a plot (or figure) to a file.
I've tried:
k=1;
file_name = sprintf("\'/home/user/directory to use/file%3.3i.pdf\'",k);
print -dpdf file_name;
but that only gets me a figure written to ~/file_name.pdf, which is not what I want. I've tried several other approaches, but I cannot find one that causes the third term (file_name, in this example) to be evaluated. I have not found any other printing function that will let me perform a formatted write (the '-dpdf' option) of a plot (or figure) to a file.
I need the single quotes because the path name to the location where I want to write the file contains spaces. (I'm working on a Linux box running Fedora 24 updated daily.)
If I compute the file name using the line above, then cut and paste it into the print statement, everything works exactly as I wish it to. I've tried using
k=1;
file_name = sprintf("\'/home/user/directory to use/file%3.3i.pdf\'",k);
print ("-dpdf", '/home/user/directory to use/file001.pdf');
But simply switching to a different form of print statement doesn't solve the problem, although now I get an error message:
GPL Ghostscript 9.16: **** Could not open the file '/home/user/directory to use/file001.pdf' .
**** Unable to open the initial device, quitting.
warning: broken pipe
If you use foo a b, this is the same as foo("a", "b"). In your case you called print("-dpdf", "file_name"), so the literal string file_name was used as the file name. Pass the variable instead:
k = 1;
file_name = sprintf ("/home/user/directory to use/file%3.3i.pdf", k);
print ("-dpdf", file_name);
Observe:
>> k=1;
>> file_name = sprintf ('/home/tasos/Desktop/a folder with spaces in it/this is file number %3.3i.pdf', k)
file_name = /home/tasos/Desktop/a folder with spaces in it/this is file number 001.pdf
>> plot (1 : 10);
>> print (gcf, file_name, '-dpdf')
Tadaaa!
So yeah, no single quotes needed. The reason single quotes work when you're "typing it by hand" is that you're literally creating the string on the spot with them.
Having said that, it's generally a good idea when generating absolute paths to use the fullfile command instead. Have a look at it.
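For instance (a minimal sketch; these path components are just examples), fullfile assembles the parts and inserts the correct separators for you:
k = 1;
file_name = fullfile('/home/user', 'directory to use', sprintf('file%3.3i.pdf', k));
print('-dpdf', file_name);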
Tasos Papastylianou (@TasosPapastylianou) provided great help. My problem is now solved.

Finding a string between two strings in a file

This is a bit of a .json file I need to find information in:
"title":
"Spring bank holiday","date":"2012-06-04","notes":"Substitute day","bunting":true},
{"title":"Queen\u2019s Diamond Jubilee","date":"2012-06-05","notes":"Extra bank holiday","bunting":true},
{"title":"Summer bank holiday","date":"2012-08-27","notes":"","bunting":true},
{"title":"Christmas Day","date":"2012-12-25","notes":"","bunting":true},
{"title":"Boxing Day","date":"2012-12-26","notes":"","bunting":true},
{"title":"New Year\u2019s Day","date":"2013-01-01","notes":"","bunting":true},
{"title":"Good Friday","date":"2013-03-29","notes":"","bunting":false},
{"title":"
The file is much longer, but it is one long line of text.
I would like to display what bank holiday it is after a certain date, and also if it involves bunting.
I've tried grep and sed but I can't figure it out.
I'd like something like this:
[command] between [date] and [}] display [title] and [bunting]/[no bunting]
[title] should be just "Christmas Day" or something else
Forgot to mention:
I would like to achieve this in bash shell, either from the prompt or from a short bit of code.
You should use a proper JSON parser in a decent programming language; then you can do a lot of work safely without too much code. How about this little Python script:
#!/usr/bin/env python
import json

# Load the whole file as a list of holiday records
with open('my.json') as jsonFile:
    holidays = json.load(jsonFile)

for holiday in holidays:
    if holiday['date'] > '2012-05-06':
        print(holiday['date'], ':', holiday['title'],
              "bunting" if holiday['bunting'] else "no bunting")
        break  # in case you only want one line of output
I could not figure out what exactly the output should be; if you can be more specific, I can adjust my example.
You can try this with awk:
awk -F"}," '{for(i=1;i<=NF;i++){print $i}}' file.json | awk -F"\"[:,]\"?" '$4>"2013-01-01"{printf "%s:%s:%s\n" ,$2,$4,$8}'
Since the json file is one long line, we first split it into multiple json records on },. Then each individual record is split on a " followed by either : or ,, with an optional closing ". We then only output the record if it is after a certain date.
This will find all records after Jan 1 2013.
EDIT:
The 2nd awk splits each individual json record into key-value pairs using a sub-string starting with ", followed by either a : or ,, and an optional ending ".
So in your example it will split on either ",", ":" or ":.
All odd fields are keys, and all even fields are values (hence $4 being the date in your example). We then check if $4 (the date) is after 2013-01-01.
I noticed I made a mistake on the optional " (it should be followed by ? instead of *) in the split, which I have now corrected; I also used the printf function to display the values.